Getting to Know the Ruby Standard Library – TSort

TSort is an interesting part of the ruby standard library that performs topological sorting. Topological sorting is used in package management, analyzing source code, and evaluating prerequisites. RubyGems uses TSort to determine what order to install gems in when there are multiple dependencies. Let’s take a look at TSort in action.

Here is an example of a JobRunner which will take a bunch of tasks with prerequisites, and then tell you which order they should be performed in:

require 'tsort'

class JobRunner
  include TSort

  Job = Struct.new(:name, :dependencies)
  def initialize()
    @jobs = Hash.new{|h,k| h[k] = []}
  end

  alias_method :execute, :tsort

  def add(name, dependencies=[])
    @jobs[name] = dependencies
  end

  def tsort_each_node(&block)
    @jobs.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @jobs[node].each(&block)
  end
end

if __FILE__ == $0
  runner = JobRunner.new
  runner.add('breakfast', ['serve'])
  runner.add('serve', ['cook'])
  runner.add('cook', ['buy eggs','buy bacon'])
  puts runner.execute
end

Running this file will show that you need to buy the eggs and bacon before you can cook and then serve breakfast.

buy eggs
  buy bacon
  cook
  serve
  breakfast

Hardly an achievement of complex reasoning, but as the number of prerequisites grow, this becomes vastly more useful.

TSort

Let’s take a look at the source (qw tsort if you have qwandry). The first thing to notice is that TSort is a module instead of a class. This means that instead of extending it, or calling it directly we will include it in another class. Notice that in the JobRunner above we called include TSort at the very beginning. This means our class will now include TSort’s functionality, but in order for it to work we need to implement a few methods. From the TSort rdoc:

TSort requires two methods to interpret an object as a graph,
  tsort_each_node and tsort_each_child.

  * tsort_each_node is used to iterate for all nodes over a graph.
  * tsort_each_child is used to iterate for child nodes of a given node.

Our implementation of tsort_each_node iterates over each job, while tsort_each_child will iterate over each prerequisite for the job. These methods allow TSort to provide two useful methods. The first is strongly_connected_components, which will return each node or sets of nodes forming a circular dependency. The second is tsort which we used above to return the nodes in a sorted order so that each of their prerequisites are satisfied. Lets take a look at what each of these does:

def tsort
    result = []
    tsort_each {|element| result << element}
    result
  end

This is simple enough, it just collects all the results from tsort_each and returns them. Following this thread we see that tsort_each then iterates over the results of each_strongly_connected_component, so lets take a closer look at that, and perhaps we can figure out how TSort works.

def each_strongly_connected_component # :yields: nodes
    id_map = {}
    stack = []
    tsort_each_node {|node|
      unless id_map.include? node
        each_strongly_connected_component_from(node, id_map, stack) {|c|
          yield c
    #...

From this snippet of code we can see that TSort is going to iterate over each of the nodes (jobs in our case), while doing this it will keep track of the position each node has in the stack with the id_map hash. Next let’s see what each_strongly_connected_component_from does with the node.

def each_strongly_connected_component_from(node, id_map={}, stack=[])
  minimum_id = node_id = id_map[node] = id_map.size
  stack_length = stack.length
  stack << node
  #...

We can see that id_map and stack are passed around to keep track of what has been seen already. Jumping to the end of the method, we see that when we finish, TSort is going to yield back an array of nodes:

if node_id == minimum_id
    component = stack.slice!(stack_length .. -1)
    component.each {|n| id_map[n] = nil}
    yield component
  end

We know from our example above that this eventually returns all the nodes in their proper order, and tsort_each expects the arrays to have a length of 1 so we can assume that if everything goes correctly stack.slice!(stack_length .. -1) should return an array with a length of 1. After it returns this array, it clears the entries from the id_map. We can deduce that TSort sorts the nodes by pushing items onto the stack, and then returning the top of it when there are no remaining prerequisites for the node.

Now let’s look at the main part of this method:

#...
tsort_each_child(node) {|child|
  if id_map.include? child
    child_id = id_map[child]
    minimum_id = child_id if child_id && child_id < minimum_id
  else
    sub_minimum_id =
      each_strongly_connected_component_from(child, id_map, stack) {|c|
        yield c
      }
    minimum_id = sub_minimum_id if sub_minimum_id < minimum_id
  end
}
#...

For each child we see that if we have already evaluated the node. If its index in the stack is less than our current minimum_id we treat it as the new minimum. Otherwise we recursively call each_strongly_connected_component_from for this node, which will push the child onto the stack and push its prerequisites onto the stack as well. Once there are no more nodes left to push on, the loop exits and the node will get popped off. If there is a cycle (nodes that depend on each other), all the nodes in the cycle will be returned. If this is starting to look familiar to you, this essentially amounts to a depth first search of all the nodes.

Recap

Our exploration of TSort has revealed a useful library, and shown us one way to perform a depth first search on a data structure that may include cycles, which is pretty neat. Curious about other applications of TSort? The strongly_connected_components that TSort returns also happens to be useful for detecting tightly coupled methods and classes which depend on each other which is useful when doing refactoring and static analysis of source code. This is just one of the handy things we can do with ruby’s standard library.

blog comments powered by Disqus
Monkey Small Crow Small