< Prev^ UpNext >

Where To Start

The first thing to decide when creating a project is what to name it. I'm going to stay fairly pedestrian and non-clever here, and after a quick check with Google to try and avoid duplicating names with another project, just name it "ruby-explorer". (I really wanted to use "t-rex", with the justification that it stands for "The Ruby Explorer". Sadly, that name is already taken by multiple projects.)

So, naming taken care of, I've created a Github repository for the project.

All good software starts the same way: bite off a manageable chunk and make it work. I find that it's best to have that chunk be a working application, however rudimentary. Starting with a working app helps avoid painting yourself into a corner with an unworkable design. You'll find out fast if you're headed that way, while the cost to correct it is lowest. In the interest of keeping things simple for the first pass, it'll be a command-line app.

For starters, let's say that I want to know all of the files loaded by a Rails project. This should be a fairly straightforward exercise, with plenty of room to add extra functionality once that's up and running.

A bit of terminology here: when I want to talk about the code we're exploring, I'll use the adjective "target". So I might talk about the "target application", or "target code", for example.

Figuring Out the Clever Bit: A Spike

There's a bit of exploration to do here before I start on the app itself; the problem at hand is the basic question of how to intercept calls to Ruby's core methods. The terminology I like for this sort of exploration is taken from Extreme Programming (XP), and is "spike". A spike is just a bit of exploratory code. It is not production code, nor is it ever intended to be. Its sole purpose is to learn something.

To begin with the smallest piece, I'll tackle the simple problem of noting every time that the target code calls require. I won't worry about calls to require_relative or load; if I can handle require in the spike, then I'm confident I can handle all three in the production code.

For target code, we'll use a simple "hello, world" program with one require at the top:

require 'socket' # Arbitrary module; not actually used

puts "In the subject app"

Only slightly more code is needed to spy on the target:

module Kernel
  alias_method :original_require, :require

  def require(name)
    puts "doing require(#{name})"
    return_value= original_require(name)
    puts "require(#{name}) returned #{return_value}"
    return_value
  end
end

There's some subtle stuff going on here, though. First, we're opening up the Kernel module, one of the foundations of Ruby. The ability to re-open and "monkey-patch" an existing module or class is one of the things that makes sophisticated metaprogramming possible. Next, we use alias-method to create a new name for require. It is important to distinguish between the code for a method and the name of the method. After the alias_method, we have two names for the same code. The same code will be invoked if we call require or original_require.

Next, we define require in the usual way. This creates a new chunk of code and names it Kernel::require. The original code for require is still there, and we can get to it via the name original_require, even though the name require is now attached to our new block of code.

After managing this trick, the rest is straightforward; we tell the user that we're about to call require, actually do the call to the original require method, making sure to remember what it returned, and tell the user about the return value. Then we actually return the value.

The code for the initial spike is checked in under the directory spikes/wrap-require.

We can use Ruby's -r option to force it to load our inspection code before the target app, so that we can get our hooks into require before the target can call it. When we try all this out, it looks like:

jmax@deepthought $ ruby -r ./wrap_require.rb subject_app.rb
doing require(socket)
doing require(socket.so)
require(socket.so) returned true
doing require(io/wait)
require(io/wait) returned true
require(socket) returned true
In the subject app
jmax@deepthought $

And we've already uncovered something interesting. The target's call require 'socket' does two requires of its own, for socket.so and io/wait. The second require isn't particularly odd; it's just loading up another chunk of code that it wants to use. The first one, though, is loading up a binary file; a shared library, probably written in C. This is a peek into the guts of a Ruby extension; if there's non-Ruby code involved, it gets packaged as a shared library, and the .so then gets loaded with require, just as a Ruby file would be.

All this is almost completely transparent to the target code; the only things affected are:

The first thing is simply an artifact of this being a spike; we're investigating how to wrap an existing method, and haven't made any effort to leave $stdout untouched. In production code, we'll have to avoid polluting the target's output.

The second somewhat unavoidable. By the very nature of metaprogramming, we're going to be executing some extra code; all we can do is keep the overhead as low as possible.

We'll deal with the third item, not polluting the target code's name space, next.

Summarizing what we've learned from our spike, there are three points:

I'll record these items in the project's Github wiki on a Notes page, so I don't forget them. As I address them, and as I discover new things, I'll update that page.

< Prev^ UpNext >