Sep 13

XPath From the Command Line Using Ruby

Category: Programming, Ruby, XML, XPath

There are other ways of doing this, but I thought it would be fun to write a command-line XPath script in Ruby using REXML. (Full working example here.)

Parsing the Command Line Arguments

The first step is to set up the command line options. I want to have two arguments:

  1. an optional file argument for specifying the XML file (defaults to STDIN)
  2. an xpath argument for specifying the XPath expression

require 'optparse'
require 'ostruct'
 
options = OpenStruct.new
# The block parameter is named parser so it does not shadow the outer opts variable
opts = OptionParser.new do |parser|
    parser.banner = "Usage: xpath.irb -x PATH [-f FILENAME]"
    parser.on("-f", "--file FILENAME",
        "Optionally specify XML file (defaults to STDIN)") do |file|
            options.xml_file = file
    end
    parser.on("-x", "--xpath PATH",
        "Specify XPath expression") do |xpath|
            options.xpath = xpath
    end
end
 
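begin
    opts.parse(ARGV)
rescue OptionParser::ParseError => err
    # Report the problem and the usage text on STDERR, then exit with a failure status
    warn err.message
    warn opts.to_s
    exit 1
end
 
# The xpath expression is the one required option
if options.xpath.nil?
    warn "Must specify an xpath expression."
    warn opts.to_s
    exit 1
end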
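
To try the option handling without touching the real command line, the same definitions can be wrapped in a method and fed an explicit array. This is only a sketch: the parse_args helper and the sample arguments (books.xml, //title) are made up for illustration and are not part of the script above.

```ruby
require 'optparse'
require 'ostruct'

# Hypothetical helper (not in the original script): builds the same two
# options and parses whatever argument array it is given.
def parse_args(argv)
  options = OpenStruct.new
  parser = OptionParser.new do |o|
    o.banner = "Usage: xpath.irb -x PATH [-f FILENAME]"
    o.on("-f", "--file FILENAME", "XML file (defaults to STDIN)") do |file|
      options.xml_file = file
    end
    o.on("-x", "--xpath PATH", "XPath expression") do |xpath|
      options.xpath = xpath
    end
  end
  parser.parse(argv) # parse (without the bang) leaves argv itself untouched
  options
end

# Illustrative argument lists, not real files
with_file  = parse_args(["-x", "//title", "-f", "books.xml"])
stdin_only = parse_args(["--xpath", "/catalog/book"])

puts "xpath=#{with_file.xpath} file=#{with_file.xml_file}"
puts "xpath=#{stdin_only.xpath} file=#{stdin_only.xml_file.inspect}"
```

An option that was never passed simply reads back as nil from the OpenStruct, which is what the nil check on options.xpath relies on.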

Applying the XPath Expression

Now I just need to get the XML document and apply the XPath expression to it:

require 'rexml/document'
require 'rexml/xpath'
 
...
 
doc = nil
if options.xml_file
    puts options.xml_file
    # The block form closes the file once the document has been parsed
    doc = File.open(options.xml_file) { |file| REXML::Document.new(file) }
else
    doc = REXML::Document.new(STDIN)
end
 
puts "<result>"
REXML::XPath.each(doc, options.xpath){ |element| puts element }
puts "</result>"
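
For a feel of what those two REXML calls do, here is a small self-contained sketch that runs an expression against an inline string instead of a file or STDIN. The sample document and the xpath_matches helper are invented for illustration, and it uses REXML::XPath.match (which returns an array) rather than each, so the results can be inspected before printing.

```ruby
require 'rexml/document'
require 'rexml/xpath'

# Made-up sample data, purely for illustration
SAMPLE_XML = <<~XML
  <catalog>
    <book><title>First</title></book>
    <book><title>Second</title></book>
  </catalog>
XML

# Hypothetical helper: parses the XML and returns every match as a string
def xpath_matches(xml, expression)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, expression).map(&:to_s)
end

puts "<result>"
xpath_matches(SAMPLE_XML, "//title").each { |match| puts match }
puts "</result>"
```

With this sample, the two title elements are printed between the result tags. An expression such as //title/text() would return the text nodes instead of the elements.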

Notes

Ruby seems to do pretty well on the usability level here (similar Java examples replace the two lines of XML-specific code with about a page), but the speed is just terrible. I’m clocking half a second on a tiny document with a simple query.
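
One way to see where the time goes is to time the parse and the query separately with Ruby's Benchmark module. The sketch below is an assumption about how one might measure it, not how the half-second figure above was obtained: the time_parse_and_query helper and the generated document are made up, and it only covers work done inside the process, so interpreter startup and library loading are not included.

```ruby
require 'benchmark'
require 'rexml/document'
require 'rexml/xpath'

# Hypothetical helper: times parsing and querying as two separate steps
def time_parse_and_query(xml, expression)
  doc = nil
  matches = nil
  parse_seconds = Benchmark.realtime { doc = REXML::Document.new(xml) }
  query_seconds = Benchmark.realtime { matches = REXML::XPath.match(doc, expression) }
  { parse: parse_seconds, query: query_seconds, matches: matches.length }
end

# Small generated document, for illustration only
tiny_xml = "<root>" + ("<item>x</item>" * 100) + "</root>"

timing = time_parse_and_query(tiny_xml, "//item")
puts format("parse: %.4fs, query: %.4fs, matches: %d",
            timing[:parse], timing[:query], timing[:matches])
```

Comparing these numbers with the wall-clock time of the whole script gives a rough idea of how much is spent outside parsing and querying.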

To be fair, I did the same thing using Xalan by modifying this example, and it was also ridiculously slow. Xalan does a lot better if you have a precompiled XPath expression or XSLT script, but I usually find myself generating these things on the fly, which means Xalan doesn’t help much.

Anyway, for now, if you are looking for performance, I’d suggest libxml (example here).

The Author

Michael Smit is a software engineer in Seattle, Washington, who works for Amazon.
