How the Ruby debugger Byebug works with the TracePoint API

In the series ‘How it works’ I look at the awesome code of others. This week I look at Byebug, a Ruby 2.0 debugger gem.

What is Byebug?

Byebug is an amazing debugger gem for Ruby 2.0. It implements debugging on top of the TracePoint API that Ruby 2.0 introduced. It is a gem with a native extension, meaning it needs to be compiled from C before it can be used. We are looking at Byebug 1.8.2 in this post.

Why am I reviewing Byebug?

Byebug is the first debugger for Ruby 2.0. It’s elegant and easy to use, and it’s written by an amazing developer, David Rodriguez. It’s the go-to reference project for anyone who wants to write a native, well-tested gem.

Terminology

REPL - The debugger, the inspector, the thing where you type commands after it has hit a breakpoint. REPL stands for read-eval-print loop.

Overview

Byebug consists of a couple of parts:

A C extension that hooks into the TracePoint API and deals with low-level concepts such as stack frames.
A breakpoint and catchpoint system.
A Ruby support library which acts as a bridge to the C API and allows for command processing and a REPL.
A beautiful test suite.

Personally, I find the code and the library absolutely stunning. It’s both beautiful and elegant.

The gem file structure

Running the tree command gives the following file structure. I’ve removed a lot of files, but this is the basic outline.

.
├── Rakefile
├── bin
│   └── byebug
├── byebug.gemspec
├── ext
│   └── byebug
│       ├── Makefile
│       ├── breakpoint.c
│       ├── breakpoint.o
│       ├── byebug.bundle
│       ├── byebug.c
│       ├── byebug.h
│       ├── byebug.o
│       ├── context.c
│       ├── context.o
│       ├── extconf.rb
├── lib
│   ├── byebug
│   │   ├── command.rb
│   │   ├── commands
│   │   │   ├── breakpoints.rb
│   │   │   ├── ....
│   │   ├── context.rb
│   │   ├── remote.rb
│   │   └── ....
│   ├── byebug.bundle
│   └── byebug.rb
├── logo.png
└── test
    ├── examples
    │   ├── breakpoint.rb
    │   └── variables.rb
    ├── stepping_test.rb
    ├── support
    │   ├── breakpoint.rb
    │   ├── context.rb
    │   ├── matchers.rb
    │   ├── test_dsl.rb
    │   └── test_interface.rb
    └── variables_test.rb
10 directories, 131 files

Let’s go over them quickly:

The Rakefile contains tasks to build and test the gem. Running rake compile builds the native extension using the rake-compiler gem. The output of the C extension goes to ./lib/byebug.bundle, after which require will load it. Check my other article on making a native gem for details on how this works.
The byebug.gemspec contains the dependencies and the specification for Byebug. You can see it’s a native gem because it has s.extensions defined.

require File.dirname(__FILE__) + '/lib/byebug/version'

Gem::Specification.new do |s|
  s.name        = 'byebug'
  s.version     = Byebug::VERSION
  s.authors     = ['David Rodriguez', 'Kent Sibilev', 'Mark Moseley']
  s.email       = '[email protected]'
  s.license     = 'BSD'
  s.homepage    = 'http://github.com/deivid-rodriguez/byebug'
  s.summary     = %q{Ruby 2.0 fast debugger - base + cli}
  s.description = %q{Byebug is a Ruby 2.0 debugger. It's implemented using the
    Ruby 2.0 TracePoint C API for execution control and the Debug Inspector C
    API for call stack navigation.  The core component provides support that
    front-ends can build on. It provides breakpoint handling and bindings for
    stack frames among other things and it comes with an easy to use command
    line interface.}

  s.required_ruby_version     = '>= 2.0.0'

  s.files            = `git ls-files`.split("\n")
  s.test_files       = `git ls-files -- test/*`.split("\n")
  s.executables      = ['byebug']
  s.extra_rdoc_files = ['README.md']
  s.extensions       = ['ext/byebug/extconf.rb']

  s.add_dependency "columnize", "~> 0.3.6"
  s.add_dependency "debugger-linecache", '~> 1.2.0'

  s.add_development_dependency 'rake', '~> 10.1.0'
  s.add_development_dependency 'rake-compiler', '~> 0.9.1'
  s.add_development_dependency 'mocha', '~> 0.14.0'
end

You can see Byebug depends on rake, rake-compiler and mocha as development dependencies, and on debugger-linecache (responsible for caching lines of code for showing context) and columnize (responsible for showing information in columns) as runtime dependencies.
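To make the difference between the two kinds of dependencies concrete, here is a minimal sketch that builds a throwaway specification and asks RubyGems which dependencies are which. The gem name 'demo' and the constant DEMO_SPEC are placeholders for this example and are not part of Byebug.

```ruby
require 'rubygems'

# A throwaway spec; 'demo' is a placeholder name, not Byebug's real gemspec.
DEMO_SPEC = Gem::Specification.new do |s|
  s.name    = 'demo'
  s.version = '0.0.1'

  # Needed by everyone who installs the gem.
  s.add_dependency 'columnize', '~> 0.3.6'

  # Only needed by people working on the gem itself.
  s.add_development_dependency 'rake', '~> 10.1.0'
end

p DEMO_SPEC.runtime_dependencies.map(&:name)
p DEMO_SPEC.development_dependencies.map(&:name)
```

Only the runtime dependencies get installed when somebody runs gem install.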

Let’s continue:

The ./lib/byebug directory contains the debugging support framework written in Ruby. It handles command processing for the REPL and contains the code for managing breakpoints.
The ./test directory contains the test suite for the gem.

The debugging process explained

The debugging process uses a Ruby API called TracePoint, which basically allows you to hook into the Ruby interpreter and its execution process.

You can register a callback (hook) that runs whenever a line of Ruby gets executed, or when a certain event happens, such as an exception being raised or a method returning. Byebug works by hooking into these events using the TracePoint API.
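As a small illustration of registering such hooks from plain Ruby, the sketch below listens for method calls, returns and raised exceptions. The method names risky and trace_events are made up for this example.

```ruby
# A method that raises and rescues, so we get to see a :raise event.
def risky
  raise ArgumentError, "boom"
rescue ArgumentError
  :rescued
end

# Runs `risky` with tracing switched on and returns the recorded events.
def trace_events
  events = []
  trace = TracePoint.new(:call, :return, :raise) do |tp|
    # For :raise we record the exception class, otherwise the method name.
    detail = tp.event == :raise ? tp.raised_exception.class : tp.method_id
    events << [tp.event, detail]
  end
  trace.enable { risky } # tracing is only active inside this block
  events
end

p trace_events
```

Byebug does the same kind of registration, only from C and with more event types.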

The basic process (highly oversimplified) works like this:

You run your regular Ruby program.
You require the Byebug gem.
Byebug registers a couple of TracePoint events using the TracePoint API from C. (In pseudo code: a ‘call-this-hook-on-every-executed-ruby-line’ event and a ‘call-this-hook-when-an-exception-occurs’ event.)
It enables the tracepoints, and from then on the Byebug hook code gets called on every line.
It checks whether there are breakpoints defined for that line and file, and if so it breaks into the debug REPL and gives you a prompt, where it waits for commands.
Additionally, when an exception happens it breaks into the REPL as well.

Something to understand is that the code executed inside a TracePoint hook is not itself being traced. That is a good thing, or else we would end up in some weird tracepoint’ception.

I’ve built a small pseudo debugger in Ruby to explain the concept. Save it as tinydebug.rb and run it via ruby tinydebug.rb

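state = :break # :break prompts on every line, :step runs until the saved depth is reached again
size = 0       # stack depth saved when stepping over a method

# Here we hook into the TracePoint API, this block gets executed on every line.
trace = TracePoint.new(:line) do |tp|

  # Look up the source text of the line that is about to run.
  lines = File.read(tp.path).split(/\n/)
  line  = lines[tp.lineno - 1]

  puts "#{tp.path}: #{tp.lineno} - #{line}"
  p tp.binding.eval('local_variables')

  # While stepping over, break again once the stack is back at (or below) the saved depth.
  state = :break if state == :step && caller.size <= size

  if state == :break
    action = $stdin.gets.to_s.strip
    puts "use n,s,bt" unless %w(s bt n).include? action
    puts caller if action == 'bt' # print the backtrace of the traced code

    if action == 's'
      state = :step
      size = caller.size
    end
  end
end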

# From here on we enable the tracepoint API
trace.enable

puts "Use n to execute next, and s to step over a method"

def myfunc
  a = "im a local val"
  puts "Hey i am in a method"
  puts "I'm in a method"
end

puts "line one"
myfunc
puts "line two"

How the C extension works

A part of Byebug is written in C, primarily for the following reasons:

Speed. It’s fast. Breakpoint checking and the TracePoint callbacks are done in C.
Low level access. Some things like binding-as-caller are not accessible from Ruby and therefore done in C.

A quick glance at some interesting lines and files in the C code.

In byebug.c, the Init_byebug function gets called when the C library is loaded.

void
Init_byebug()
{
  mByebug = rb_define_module("Byebug");
  rb_define_module_function(mByebug, "setup_tracepoints",
                                     Byebug_setup_tracepoints, 0);
  rb_define_module_function(mByebug, "remove_tracepoints",
                                     Byebug_remove_tracepoints, 0);
  rb_define_module_function(mByebug, "context", Byebug_context, 0);
  rb_define_module_function(mByebug, "breakpoints", Byebug_breakpoints, 0);
  rb_define_module_function(mByebug, "add_catchpoint",
                                     Byebug_add_catchpoint, 1);
  rb_define_module_function(mByebug, "catchpoints", Byebug_catchpoints, 0);
  rb_define_module_function(mByebug, "_start", Byebug_start, 0);
  /* ... SNIP ... */
}

You can see here that it defines a module named Byebug and attaches C functions to it as module functions.

If we take a look at Byebug_start, you can see that it sets up the tracepoints in case they have not been registered yet.

static VALUE
Byebug_start(VALUE self)
{
  VALUE result;

  if (BYEBUG_STARTED)
    result = Qfalse;
  else
  {
    Byebug_setup_tracepoints(self);
    result = Qtrue;
  }

  if (rb_block_given_p())
    rb_ensure(rb_yield, self, Byebug_stop, self);

  return result;
}
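For readers who are more at home in Ruby than in C, this is roughly what that function does, written as a plain Ruby sketch. The module name TinyBug is made up, and the real tracepoint setup is replaced by a simple flag. The rb_ensure(rb_yield, ...) call corresponds to the begin/ensure around yield.

```ruby
module TinyBug
  @started = false

  def self.started?
    @started
  end

  def self.setup_tracepoints
    @started = true # the real code creates and enables the tracepoints here
  end

  def self.stop
    @started = false
  end

  # Returns true when this call switched the debugger on, false if it was
  # already running. With a block, the debugger is stopped again afterwards,
  # even when the block raises.
  def self.start
    result = !started?
    setup_tracepoints if result
    if block_given?
      begin
        yield
      ensure
        stop
      end
    end
    result
  end
end

p TinyBug.start # => true
p TinyBug.start # => false, it was already started
```

Because of the ensure, passing a block is a convenient way to debug just one piece of code and be sure tracing is switched off afterwards.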

You can see here which tracepoints are registered: in short, ‘line execution’, ‘call’ (including block calls and class definitions), ‘C call’, ‘return’, ‘C return’ and ‘exception raised’ events.

static VALUE
Byebug_setup_tracepoints(VALUE self)
{
  if (catchpoints != Qnil) return Qnil;

  breakpoints = rb_ary_new();
  catchpoints = rb_hash_new();
  context = context_create();

  tpLine = rb_tracepoint_new(Qnil,
    RUBY_EVENT_LINE,
    process_line_event, NULL);

  tpCall = rb_tracepoint_new(Qnil,
    RUBY_EVENT_CALL | RUBY_EVENT_B_CALL | RUBY_EVENT_CLASS,
    process_call_event, NULL);

  tpCCall = rb_tracepoint_new(Qnil,
    RUBY_EVENT_C_CALL,
    process_c_call_event, NULL);

  tpReturn = rb_tracepoint_new(Qnil,
    RUBY_EVENT_RETURN | RUBY_EVENT_B_RETURN | RUBY_EVENT_END,
    process_return_event, NULL);

  tpCReturn = rb_tracepoint_new(Qnil,
    RUBY_EVENT_C_RETURN,
    process_c_return_event, NULL);

  tpRaise = rb_tracepoint_new(Qnil,
    RUBY_EVENT_RAISE,
    process_raise_event, NULL);

  rb_tracepoint_enable(tpLine);
  rb_tracepoint_enable(tpCall);
  rb_tracepoint_enable(tpCCall);
  rb_tracepoint_enable(tpReturn);
  rb_tracepoint_enable(tpCReturn);
  rb_tracepoint_enable(tpRaise);

  return Qnil;
}
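The same idea of one tracepoint per group of events can be sketched in plain Ruby. The module TraceGroups and the method greet are made up for this example, and the C-call tracepoints are left out to keep it short.

```ruby
module TraceGroups
  LOG = []

  # One TracePoint per group of events, each with its own handler.
  TRACES = {
    line:   TracePoint.new(:line)                    { |tp| LOG << [:line, tp.lineno] },
    call:   TracePoint.new(:call, :b_call, :class)   { |tp| LOG << [:call, tp.event] },
    return: TracePoint.new(:return, :b_return, :end) { |tp| LOG << [:return, tp.event] },
    raise:  TracePoint.new(:raise)                   { |tp| LOG << [:raise, tp.raised_exception.class] }
  }

  def self.setup
    TRACES.each_value(&:enable)
  end

  def self.remove
    TRACES.each_value(&:disable)
  end

  # Runs the block with all tracepoints enabled and returns what was logged.
  def self.record
    LOG.clear
    setup
    yield
    LOG.dup
  ensure
    remove
  end
end

def greet
  "hello"
end

TraceGroups.record { greet }.each { |entry| p entry }
```

Grouping events like this lets every handler stay small, which is what the process_*_event functions in the C code do as well.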

The actual processing of the lines happens in the function process_line_event.

static void
process_line_event(VALUE trace_point, void *data)
{
  EVENT_SETUP;
  VALUE breakpoint = Qnil;
  VALUE file    = rb_tracearg_path(trace_arg);
  VALUE line    = rb_tracearg_lineno(trace_arg);
  VALUE binding = rb_tracearg_binding(trace_arg);
  int moved = 0;

  EVENT_COMMON();

  if (dc->stack_size == 0) dc->stack_size++;

  if (dc->last_line != rb_tracearg_lineno(trace_arg) ||
      dc->last_file != rb_tracearg_path(trace_arg))
  {
    moved = 1;
  }

  if (RTEST(tracing))
    call_at_tracing(context, dc, file, line);

  if (moved || !CTX_FL_TEST(dc, CTX_FL_FORCE_MOVE))
  {
    dc->steps = dc->steps <= 0 ? -1 : dc->steps - 1;
    if (dc->stack_size <= dc->dest_frame)
    {
      dc->lines = dc->lines <= 0 ? -1 : dc->lines - 1;
      dc->dest_frame = dc->stack_size;
    }
  }

  if (dc->steps == 0 || dc->lines == 0 ||
      (CTX_FL_TEST(dc, CTX_FL_ENABLE_BKPT) &&
      (!NIL_P(
       breakpoint = find_breakpoint_by_pos(breakpoints, file, line, binding)))))
  {
    call_at_line_check(context, dc, breakpoint, file, line);
  }

  cleanup(dc);
}

You can see here the call to find_breakpoint_by_pos which brings us to the following.

How breakpoints are implemented

Breakpoints are not a Ruby concept, but one created by Byebug. What happens under the hood is the following:

The TracePoint hook gets called on every line being executed and calls find_breakpoint_by_pos.
Byebug checks this line against its collection of breakpoints. Each breakpoint in this collection has a filename and a line number. It checks whether the current file and line number match. If so, it returns the breakpoint.
The function call_at_line_check is called with the given breakpoint.
That in turn calls call_at_breakpoint.

static VALUE
call_at_breakpoint(VALUE context_obj, debug_context_t *dc, VALUE breakpoint)
{
  dc->stop_reason = CTX_STOP_BREAKPOINT;
  return call_at(context_obj, dc, rb_intern("at_breakpoint"), 1, breakpoint, 0);
}

And that will call the method at_breakpoint on the Context object, which is defined in context.rb.

def at_breakpoint(brkpnt)
  handler.at_breakpoint(self, brkpnt)
end

And the handler will show the REPL.
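The whole chain can be sketched in a few lines of plain Ruby. This is a simplified illustration and not Byebug's actual code: the module TinyBreak is made up, and instead of showing a REPL the at_breakpoint stand-in just remembers which breakpoint was hit.

```ruby
module TinyBreak
  Breakpoint = Struct.new(:file, :line)

  BREAKPOINTS = []
  HITS = []

  def self.add_breakpoint(file, line)
    BREAKPOINTS << Breakpoint.new(File.expand_path(file), line)
  end

  # Returns the breakpoint registered for this position, or nil.
  def self.find_breakpoint_by_pos(file, line)
    BREAKPOINTS.find { |b| b.file == File.expand_path(file) && b.line == line }
  end

  # Stand-in for at_breakpoint: a real debugger would open the REPL here.
  def self.at_breakpoint(breakpoint)
    HITS << breakpoint
  end

  # Runs the block with a :line hook that checks every line for breakpoints.
  def self.run
    trace = TracePoint.new(:line) do |tp|
      breakpoint = find_breakpoint_by_pos(tp.path, tp.lineno)
      at_breakpoint(breakpoint) if breakpoint
    end
    trace.enable { yield }
  end
end

def demo_target
  value = 1
  value + 1 # the breakpoint is set on this line
end

# The line of `def demo_target` plus two is the line we want to break on.
TinyBreak.add_breakpoint(__FILE__, method(:demo_target).source_location[1] + 2)
TinyBreak.run { demo_target }
puts "breakpoints hit: #{TinyBreak::HITS.size}"
```

Note that the hook runs for every single line, whether there is a breakpoint or not, which is one of the reasons Byebug does this check in C.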

How stepping works

Stepping is nothing more than leaving the REPL, letting Ruby execute the next line, and then being called back into the REPL.
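In process_line_event this shows up as the steps counter, which is decremented on every line and triggers a stop when it reaches zero. Here is a minimal Ruby sketch of that counter. The names stops_when_stepping and four_lines are made up, and instead of opening a REPL the sketch records the line and pretends the user asked for the same step again.

```ruby
# Runs the block and returns the line numbers where a debugger that always
# steps `step_size` lines at a time would stop.
def stops_when_stepping(step_size)
  stops = []
  steps = step_size
  trace = TracePoint.new(:line) do |tp|
    steps -= 1
    if steps == 0
      stops << tp.lineno # a real debugger would open the REPL here
      steps = step_size  # pretend the user typed the same step command again
    end
  end
  trace.enable { yield }
  stops
end

def four_lines
  a = 1
  b = 2
  c = 3
  a + b + c
end

p stops_when_stepping(1) { four_lines }
p stops_when_stepping(2) { four_lines }
```

With a step size of 1 every executed line is a stop, with a step size of 2 only every second one.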

How stepping over works

Stepping over is a bit more complicated. It works by checking the current size of the stack and saving it in a variable. Byebug then continues away from the REPL, and each time the TracePoint hook is called it checks whether the current stack size is the same as the saved one. If it is, it breaks into the REPL.

This way, when execution goes into a new method, the stack size is larger than the saved one, so Ruby keeps executing every line without breaking until the stack size is back to the saved value.
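Here is a minimal sketch of that bookkeeping in Ruby. It is an illustration under a few assumptions: the stack depth is tracked by counting :call and :return events, the first stop happens on the first line of a given method (as if a breakpoint was set there), and every stop is answered with another step over. The names lines_when_stepping_over, inner and outer are made up.

```ruby
# Runs the block and returns the lines where a debugger would stop when the
# user keeps stepping over, starting at the first line of `start_method`.
def lines_when_stepping_over(start_method)
  stops = []
  depth = 0   # how many method calls deep we currently are
  saved = nil # the depth that was saved when the user stepped over
  trace = TracePoint.new(:line, :call, :return) do |tp|
    case tp.event
    when :call   then depth += 1
    when :return then depth -= 1
    when :line
      # First stop: we arrive in start_method, like hitting a breakpoint.
      saved = depth if saved.nil? && tp.method_id == start_method
      if saved && depth <= saved
        stops << tp.lineno # a real debugger would open the REPL here
        saved = depth      # pretend the user stepped over again
      end
    end
  end
  trace.enable { yield }
  stops
end

def inner
  21 * 2
end

def outer
  total = inner # stepped over, we never stop inside inner
  total + 1
end

p lines_when_stepping_over(:outer) { outer }
```

The line inside inner runs at a greater depth than the saved one, so it is executed without stopping, and only the two lines of outer are reported.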

The REPL (command processor)

The command processor is nothing more than an abstraction layer over the REPL. It basically allows for pluggable commands. You can find the code for this in lib/byebug/commands

module Byebug
  # Implements byebug "continue" command.
  class ContinueCommand < Command
    self.allow_in_post_mortem = true
    self.need_context         = false

    def regexp
      /^\s* c(?:ont(?:inue)?)? (?:\s+(\S+))? \s*$/x
    end

    def execute
      if @match[1] && !@state.context.dead?
        filename = File.expand_path(@state.file)
        return unless line_number = get_int(@match[1], "Continue", 0, nil, 0)
        return errmsg "Line #{line_number} is not a stopping point in file " \
                      "\"#{filename}\"\n" unless
          LineCache.trace_line_numbers(filename).member?(line_number)

        Byebug.add_breakpoint filename, line_number
      end
      @state.proceed
    end

    class << self
      def names
        %w(continue)
      end

      def description
        %{c[ont[inue]][ nnn]

          Run until program ends, hits a breakpoint or reaches line nnn}
      end
    end
  end
end
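To show how such pluggable commands can be tied together, here is a minimal sketch of a command processor. It is not Byebug's implementation: the module TinyCommands, the registry via the inherited hook and the HelpCommand are assumptions made for this example. Only the regular expression of the continue command is taken from the code above.

```ruby
module TinyCommands
  REGISTRY = []

  class Command
    # Every subclass registers itself, which is what makes commands pluggable.
    def self.inherited(subclass)
      super
      REGISTRY << subclass
    end

    # Remembers the match so that execute can look at the captured arguments.
    def match(input)
      @match = regexp.match(input)
    end
  end

  class ContinueCommand < Command
    def regexp
      /^\s* c(?:ont(?:inue)?)? (?:\s+(\S+))? \s*$/x
    end

    def execute
      @match[1] ? "continue until line #{@match[1]}" : "continue"
    end
  end

  class HelpCommand < Command
    def regexp
      /^\s* h(?:elp)? \s*$/x
    end

    def execute
      "commands: continue, help"
    end
  end

  # Finds the first command whose regexp matches the input and runs it.
  def self.process(input)
    command = REGISTRY.map(&:new).find { |cmd| cmd.match(input) }
    command ? command.execute : "Unknown command: #{input}"
  end
end

puts TinyCommands.process("c")
puts TinyCommands.process("cont 12")
puts TinyCommands.process("help")
```

Adding a new command is then just a matter of dropping a new subclass into the commands directory, without touching the processor itself.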

The test suite

The test suite is absolutely amaze-balls because it’s very declarative. It therefore takes fewer brain cycles to read, which lets you spend those on real problems.

it 'must leave on the same line by default' do
  enter 'next'
  debug_file('stepping') { $state.line.must_equal 10 }
end
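One way to get such a declarative style is to replace the keyboard with an interface that replays queued commands. The sketch below shows the idea only. ScriptedInterface and run_session are made-up names, and the toy session loop stands in for a real debugger. It is not how Byebug's test helpers are actually written.

```ruby
# A fake interface: instead of reading from the keyboard it replays
# commands that the test queued up front.
class ScriptedInterface
  attr_reader :output

  def initialize
    @queue  = []
    @output = []
  end

  # Queue the commands the "user" will type, in order.
  def enter(*commands)
    @queue.concat(commands)
  end

  # Called by the debugger whenever it wants the next command.
  def read_command
    @queue.shift || 'continue'
  end

  def report(message)
    @output << message
  end
end

# A toy session: every "next" moves one line, "continue" ends the session.
def run_session(interface)
  line = 1
  loop do
    command = interface.read_command
    break if command == 'continue'
    line += 1 if command == 'next'
    interface.report("at line #{line}")
  end
  line
end

interface = ScriptedInterface.new
interface.enter 'next', 'next'
p run_session(interface)
```

A test then only has to say which commands are entered and what the state should be afterwards.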

And these are the main parts of Byebug. Come back to get more content like this.