How the Ruby debugger Byebug works with the TracePoint API
In the series ‘How it works’ I look at other people’s awesome code. This week I look
at Byebug, a Ruby 2.0 debugger gem.
What is Byebug?
Byebug is an amazing Ruby 2.0 debugger gem. It debugs Ruby 2.0 programs using the
TracePoint API. It is a gem with native extensions, meaning its C code must be compiled
before it can be used. We are looking at byebug 1.8.2 in this post.
Why am I reviewing Byebug?
Byebug is the first debugger for Ruby 2.0. It’s elegant and easy to use, and it’s written by an amazing developer, David Rodriguez. It’s
the go-to reference project for anyone who wants to write a well-tested native gem.
REPL: the debugger prompt, the inspector, the place where you type commands after execution has hit a breakpoint. REPL stands for read-eval-print loop.
Byebug consists of a few parts:
A C extension that hooks into the TracePoint API and deals with low-level concepts such as stack frames.
A breakpoint and catchpoint system.
A Ruby support library, which acts as a bridge to the C API and handles command processing and the REPL.
A beautiful test suite.
Personally, I find the code and the library absolutely stunning. It’s both beautiful and elegant.
The gem file structure
Running the tree command on the gem produces the following file structure. I’ve removed a lot of files, but this is
the basic outline.
Let’s go over them quickly:
The Rakefile contains tasks to build and test the gem. rake compile builds the native extension using the rake-compiler gem.
The compiled C extension is written to ./lib/byebug.bundle (.bundle is the extension used on OS X; on Linux it would be a .so file), after which require can load it. Check my other article
on making a native gem for details on how this works.
The byebug.gemspec contains the specification for byebug, including its dependencies. You can tell it’s a native gem because it defines s.extensions.
Byebug depends on rake, rake-compiler and mocha as development dependencies, and on debugger-linecache (which caches lines of source code for showing context) and columnize (which formats output in columns) as runtime dependencies.
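To show what those gemspec pieces look like, here is a trimmed-down sketch built with RubyGems’ Gem::Specification. It is not byebug’s actual gemspec: the summary is a placeholder and the ext/byebug/extconf.rb path is an assumption based on the usual rake-compiler layout. The dependency names are the ones listed above.

```ruby
require "rubygems"

# Hypothetical, trimmed-down gemspec with the parts discussed above.
# The summary is a placeholder and the extconf.rb path is an assumption.
spec = Gem::Specification.new do |s|
  s.name    = "byebug"
  s.version = "1.8.2"
  s.summary = "Ruby 2.0 debugger" # placeholder text
  s.authors = ["David Rodriguez"]

  # Declaring extensions is what marks this as a native gem: at install time
  # RubyGems runs this extconf.rb to generate a Makefile and compile the C code.
  s.extensions = ["ext/byebug/extconf.rb"]

  # Runtime dependencies
  s.add_dependency "columnize"
  s.add_dependency "debugger-linecache"

  # Development-only dependencies
  s.add_development_dependency "rake"
  s.add_development_dependency "rake-compiler"
  s.add_development_dependency "mocha"
end

puts spec.extensions.inspect
puts spec.runtime_dependencies.map(&:name).inspect
puts spec.development_dependencies.map(&:name).inspect
```

Splitting the two kinds of dependency means users installing the gem only pull in columnize and debugger-linecache; rake, rake-compiler and mocha are only needed by people working on the gem itself.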
The ./lib/byebug directory contains the debugging support framework written in Ruby. It handles command processing for the REPL and contains the code for managing breakpoints.
The ./test directory contains the gem’s test suite.
The debugging process explained
The debug process uses Ruby’s TracePoint API, introduced in Ruby 2.0, which lets you hook into the Ruby interpreter.
You can register a callback (hook) that gets called whenever a Ruby line is executed or when a certain event happens, such
as an exception being raised or a method returning. Byebug works by hooking into these events through the TracePoint API.
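To make the hook idea concrete, here is a small plain-Ruby use of the TracePoint class. Byebug registers its hooks from C, so this is only an illustration of the same mechanism; the divide method and the events array are made up for the example.

```ruby
# A method we want to observe.
def divide(a, b)
  a / b
end

events = []

# Register one hook for method calls, method returns and raised exceptions.
# The block runs every time one of those events happens while tracing is on.
trace = TracePoint.new(:call, :return, :raise) do |tp|
  if tp.event == :raise
    events << [:raise, tp.raised_exception.class]
  else
    events << [tp.event, tp.method_id]
  end
end

# Tracing is only active for the duration of the block.
trace.enable do
  divide(10, 2)
  begin
    divide(1, 0)
  rescue ZeroDivisionError
    # rescued on purpose; the :raise event has already been recorded
  end
end

events.each { |event| p event }
```

The same TracePoint object can be enabled and disabled repeatedly, which is handy for a debugger that only wants to pay the tracing cost while it is active.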
The basic process (highly oversimplified) works like this:
You run your regular ruby program.
You require the byebug gem.
Byebug registers a couple of TracePoint events using the TracePoint API from C. (In pseudocode: ‘call-this-hook-on-every-executed-ruby-line’ and ‘call-this-hook-when-an-exception-occurs’.)
It enables the tracepoints, and from then on Byebug’s hook code gets called on every line.
The hook checks whether a breakpoint is defined for that file and line. If so, it breaks into the debug REPL and gives you a prompt where it waits for commands.
Additionally, when an exception that matches a catchpoint is raised, it breaks into the REPL as well.
Something to understand is that the code executed when a tracepoint is hit is not itself traced. That is a good thing,
or else we would end up in some weird tracepoint’ception.
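You can check this behaviour with a few lines of plain Ruby. In this sketch (log_call and work are hypothetical helpers), the hook itself calls a Ruby method; if hook code were traced, that call would fire the :call hook again and recurse without end.

```ruby
# A Ruby method that the hook itself calls.
def log_call(name)
  name.to_s
end

def work
  :done
end

seen = []

trace = TracePoint.new(:call) do |tp|
  seen << tp.method_id
  # If hook code were traced, this call would fire :call again and recurse forever.
  log_call(tp.method_id)
end

trace.enable { work }
p seen
```

Only the call to work is recorded; the call to log_call made from inside the hook never reaches the hook.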
I’ve built a small pseudo debugger in Ruby to explain the concept. Save it as tinydebug.rb and run it with ruby tinydebug.rb.
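A minimal pseudo debugger in that spirit might look like the following sketch. It is not the original listing: the TinyDebugger class, break_at, the BREAK_LINE constant and the scripted StringIO input are all hypothetical, chosen so the file runs unattended. It enables a :line TracePoint, checks each executed line against a breakpoint table and, on a hit, evaluates commands in the paused frame through TracePoint#binding until it reads continue.

```ruby
require "stringio"

# tinydebug.rb: a toy line-breakpoint debugger built on TracePoint.
# TinyDebugger and its methods are hypothetical; this is not Byebug's code.
class TinyDebugger
  attr_reader :log # values produced by REPL commands, kept for inspection

  def initialize(input: $stdin, output: $stdout)
    @input = input
    @output = output
    @breakpoints = {} # absolute path => array of line numbers
    @log = []
    @trace = TracePoint.new(:line) { |tp| on_line(tp) }
  end

  def break_at(path, line)
    (@breakpoints[File.expand_path(path)] ||= []) << line
  end

  # Runs the block with the :line hook enabled and returns the block's value.
  def run(&block)
    @trace.enable(&block)
  end

  private

  # Called for every executed line; code in here is not traced itself.
  def on_line(tp)
    lines = @breakpoints.fetch(File.expand_path(tp.path), [])
    repl(tp) if lines.include?(tp.lineno)
  end

  # Minimal REPL: evaluate each input line in the paused frame until
  # "continue" or end of input.
  def repl(tp)
    @output.puts "Breakpoint hit at #{tp.path}:#{tp.lineno}"
    loop do
      @output.print "(tinydebug) "
      command = @input.gets
      break if command.nil? || command.strip == "continue"
      begin
        value = tp.binding.eval(command)
        @log << value
        @output.puts "=> #{value.inspect}"
      rescue StandardError => e
        @output.puts "Error: #{e.message}"
      end
    end
  end
end

BREAK_LINE = __LINE__ + 3 # line number of "z = y + 1" below
def compute(x)
  y = x * 2
  z = y + 1
  z
end

# Scripted input so the example runs unattended.
debugger = TinyDebugger.new(input: StringIO.new("x\ny\ncontinue\n"))
debugger.break_at(__FILE__, BREAK_LINE)
result = debugger.run { compute(5) }
puts "compute returned #{result}"
```

When the breakpoint hits, the :line event fires before z = y + 1 runs, so x and y are already set but z is still nil. Passing input: $stdin instead of the StringIO lets you type expressions at the (tinydebug) prompt yourself.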
How the C extension works
Part of Byebug is written in C, primarily for the following reasons:
Speed. Breakpoint checking and the tracepoint callbacks run in C, which keeps the overhead of the per-line hook low.
Low-level access. Some things, like binding-of-caller (getting the binding of a caller’s frame), are not accessible from Ruby and are therefore done in C.
Here is a quick glance at some interesting files and lines in the C code.
In byebug.c, the Init_byebug function gets called when the C library is loaded.
You can see here that it defines a module named Byebug.
If we take a look at Byebug_start, you can see that it sets up the tracepoints if that has not been done yet.
You can see here which tracepoints are registered; in short, ‘exception raised’, ‘line execution’, ‘class’ and ‘return’ events.
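The "only set up the tracepoints once" logic can be mirrored in Ruby. The sketch below is a hypothetical Ruby-level analogue of Byebug_start (the MiniByebug module, its events array and the start/stop methods are made up); it registers hooks for the four event types named above and refuses to register them twice.

```ruby
# Hypothetical Ruby analogue of the start logic described for Byebug_start.
module MiniByebug
  @tracepoints = nil # nil means "not started yet"
  @events = []

  class << self
    attr_reader :events

    def started?
      !@tracepoints.nil?
    end

    # Returns true if this call created and enabled the tracepoints,
    # false if they were already running.
    def start
      return false if started?

      @tracepoints = [
        TracePoint.new(:raise)  { |tp| @events << [:raise, tp.raised_exception.class] },
        TracePoint.new(:line)   { |tp| @events << [:line, tp.lineno] },
        TracePoint.new(:class)  { |tp| @events << [:class, tp.self] },
        TracePoint.new(:return) { |tp| @events << [:return, tp.method_id] }
      ]
      @tracepoints.each(&:enable)
      true
    end

    def stop
      return false unless started?

      @tracepoints.each(&:disable)
      @tracepoints = nil
      true
    end
  end
end

first = MiniByebug.start  # creates and enables the tracepoints
second = MiniByebug.start # already running, nothing to do
class Demo; end           # fires a :class event while tracing is on
stopped = MiniByebug.stop
p [first, second, stopped]
```

Keeping the "already started" check in one place means callers can invoke start as often as they like without stacking duplicate hooks.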
The actual processing of each line happens in the function process_line_event.
There you can see the call to find_breakpoint_by_pos, which brings us to breakpoints.
How breakpoints are implemented
Breakpoints are not a Ruby concept but one created by Byebug. Under the hood, the following happens:
The TracePoint hook gets called on every executed line and calls find_breakpoint_by_pos.
Byebug checks the current line against its collection of breakpoints. Each breakpoint in this collection has a filename and a line number. If the current file and line number match, it returns that breakpoint.
The function call_at_line_check is called with the matching breakpoint.
That function calls call_at_breakpoint.
call_at_breakpoint in turn calls the at_breakpoint method on the Context object (defined in context.rb).
The handler then shows the REPL.
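The lookup-and-dispatch chain above can be modeled in a few lines of plain Ruby. This is a hypothetical sketch: Breakpoint, find_breakpoint_by_pos, check_line and Context here are illustrative Ruby stand-ins, not Byebug’s C functions or its real Context class, and check_line collapses call_at_line_check and call_at_breakpoint into one step.

```ruby
# Hypothetical model of a breakpoint: an id plus a file and line number.
Breakpoint = Struct.new(:id, :file, :line)

# Plays the role of find_breakpoint_by_pos: return the first breakpoint
# whose file and line match the current position, or nil.
def find_breakpoint_by_pos(breakpoints, file, line)
  breakpoints.find { |bp| bp.file == file && bp.line == line }
end

# Stand-in for the Context object; a real debugger would open the REPL here.
class Context
  attr_reader :hits

  def initialize
    @hits = []
  end

  def at_breakpoint(breakpoint)
    @hits << breakpoint.id
  end
end

# Collapses call_at_line_check / call_at_breakpoint: dispatch only on a match.
def check_line(context, breakpoints, file, line)
  bp = find_breakpoint_by_pos(breakpoints, file, line)
  context.at_breakpoint(bp) if bp
end

breakpoints = [Breakpoint.new(1, "app.rb", 10), Breakpoint.new(2, "lib/util.rb", 3)]
context = Context.new

# Simulate the hook being called for a sequence of executed lines.
[["app.rb", 9], ["app.rb", 10], ["lib/util.rb", 3], ["app.rb", 11]].each do |file, line|
  check_line(context, breakpoints, file, line)
end

p context.hits # => [1, 2]
```

A linear scan is fine for a handful of breakpoints; since this check runs on every executed line, it is also one of the hot paths that motivates doing it in C.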
How stepping works
Stepping is nothing more than leaving the REPL, letting Ruby continue to the next line, and then being called back into the REPL.
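Here is a sketch of that idea in Ruby, assuming a hypothetical Stepper class and a scripted list of commands standing in for the REPL. "Stepping" is just a flag: while it is set, the next :line event pauses unconditionally, regardless of breakpoints.

```ruby
# Hypothetical sketch: "step" means leave the REPL but pause again on the very
# next :line event. A scripted command list stands in for the real REPL.
class Stepper
  attr_reader :stops # line numbers where execution paused

  def initialize(commands)
    @commands = commands # e.g. [:step, :step, :continue]
    @stops = []
    @stepping = true # pause on the first traced line
  end

  def run(&block)
    TracePoint.new(:line) { |tp| on_line(tp) }.enable(&block)
  end

  private

  def on_line(tp)
    # Only pause on lines from this file while in stepping mode.
    return unless @stepping && tp.path == __FILE__

    @stops << tp.lineno
    # "REPL": :step keeps stepping mode on, anything else resumes normally.
    @stepping = (@commands.shift == :step)
  end
end

BLOCK_START = __LINE__ + 3 # line number of "a = 1" below
stepper = Stepper.new([:step, :step, :continue])
stepper.run do
  a = 1
  b = a + 1
  c = b + 1
  c + 1
end
p stepper.stops
```

The debugger pauses on the first three lines of the block; after continue it lets the last line run without stopping.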
How stepping over works
Stepping over is a bit more complicated. It works by checking the current size of the stack and saving it in a
variable, then leaving the REPL. Each time the TracePoint hook is called back, it checks whether the current stack size is the same as the saved one; if it is, it breaks into the REPL.
This way, when execution enters a new method the stack is deeper than the saved size, so Byebug keeps executing lines without breaking
until the stack is back to the saved size.
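A Ruby sketch of the same idea follows. The StepOver class, the helper method and the scripted commands are hypothetical, and the stack size is tracked by counting :call and :return events, a simplification that ignores blocks. When the user types next, the current depth is saved, and :line events are ignored while the depth is greater than the saved value.

```ruby
# Hypothetical sketch of "step over": save the stack depth when the user types
# next, then ignore :line events until execution is back at that depth.
class StepOver
  attr_reader :stops # line numbers where execution paused

  def initialize(commands)
    @commands = commands # scripted REPL input, e.g. [:next, :next, :continue]
    @stops = []
    @depth = 0 # current stack depth, tracked via :call/:return
    @target = 0 # saved depth: pause only when @depth <= @target
    @active = true
  end

  def run(&block)
    TracePoint.new(:line, :call, :return) { |tp| handle(tp) }.enable(&block)
  end

  private

  def handle(tp)
    case tp.event
    when :call then @depth += 1
    when :return then @depth -= 1
    when :line
      return unless @active && tp.path == __FILE__ && @depth <= @target

      @stops << tp.lineno
      if @commands.shift == :next
        @target = @depth # remember the stack size we are stepping over from
      else
        @active = false # :continue, never pause again
      end
    end
  end
end

def helper(n)
  m = n * 2 # skipped: depth is 1 here, deeper than the saved 0
  m + 1
end

LINE0 = __LINE__ + 3 # line number of "x = helper(1)" below
stepper = StepOver.new([:next, :next, :continue])
stepper.run do
  x = helper(1)
  y = helper(x)
  z = x + y
  z
end
p stepper.stops
```

Because the check is @depth <= @target rather than strict equality, this sketch also pauses once the current method returns to a shallower caller, which is what you usually want from next at the end of a method.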
The REPL (command processor)
The command processor is nothing more than an abstraction layer over the REPL. It allows for pluggable commands.
You can find its code in lib/byebug/commands.
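The pluggable-command idea can be sketched as follows. Everything here is hypothetical (Command, ContinueCommand, EvalCommand, CommandProcessor and the state hash): each command class declares a regexp for the input it handles and an execute method, subclasses register themselves automatically, and the processor simply dispatches to the first match.

```ruby
# Base class: every subclass registers itself, so new commands plug in automatically.
class Command
  @registry = []

  class << self
    attr_reader :registry

    def inherited(subclass)
      super
      Command.registry << subclass
    end
  end

  def initialize(state)
    @state = state # shared debugger state (binding, flags, ...)
  end
end

class ContinueCommand < Command
  def self.regexp
    /\A\s*c(?:ontinue)?\s*\z/
  end

  def execute(_match)
    @state[:running] = true
    "continuing"
  end
end

class EvalCommand < Command
  def self.regexp
    /\A\s*(?:p|eval)\s+(.+)\z/
  end

  # Evaluate the captured expression in the paused frame's binding.
  def execute(match)
    @state[:binding].eval(match[1]).inspect
  end
end

class CommandProcessor
  def initialize(state)
    @state = state
  end

  # Dispatch the input line to the first command whose regexp matches.
  def process(input)
    Command.registry.each do |klass|
      match = klass.regexp.match(input)
      return klass.new(@state).execute(match) if match
    end
    "Unknown command: #{input}"
  end
end

x = 42
state = { binding: binding, running: false }
processor = CommandProcessor.new(state)
r1 = processor.process("p x + 1")
r2 = processor.process("frobnicate")
r3 = processor.process("continue")
puts r1, r2, r3
```

Adding a command means adding one subclass; neither the processor nor the REPL loop has to change.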
The test suite
The test suite is absolutely amaze-balls because it’s very declarative. It therefore takes fewer brain cycles to read, which
leaves you free to spend them on real problems.
And those are the main parts of Byebug. Come back for more content like this.