Feb 18

Converting hex to Binary in 4 Languages

Category: C, Perl, Programming, Python, Ruby

I’ve been playing around with some scripting languages recently so I thought I’d do a small example in a few different languages for laughs.

The Program


So basically I wanted to write a script that takes a string of hexadecimal as input and outputs a stream of binary data:



echo 00 01 02 03 04 | ./myscript > binary_out
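To pin down what the script is supposed to produce, here is a minimal Python sketch of the expected behaviour. The function name hex_to_bytes is made up for illustration, and it leans on the built-in bytes.fromhex rather than any of the approaches discussed below.

```python
def hex_to_bytes(text: str) -> bytes:
    """Turn whitespace-separated hex digits into raw bytes (hypothetical helper)."""
    # Drop every run of whitespace (spaces, tabs, newlines) before decoding.
    digits = "".join(text.split())
    # bytes.fromhex needs an even number of hex digits and raises ValueError otherwise.
    return bytes.fromhex(digits)


if __name__ == "__main__":
    # Five hex pairs in, five raw bytes out.
    print(hex_to_bytes("00 01 02 03 04\n"))  # prints b'\x00\x01\x02\x03\x04'
```

The point is just that the output is five raw bytes, not the ten ASCII characters that spell them.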

Ruby

I’ve been playing around with the Ruby scripting language of Ruby on Rails fame recently. I came up with the following:

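#!/bin/ruby
class String
  # Convert a string of hex digits (whitespace ignored) into raw bytes.
  def hex_to_binary
    # Use a regex: "\s" in a double-quoted string is just a space character.
    temp = gsub(/\s/, "")
    ret = []
    (0...temp.size / 2).each { |index| ret[index] = [temp[index * 2, 2]].pack("H2") }
    # Join the one-byte strings so a single binary string gets written out.
    ret.join
  end
end
STDOUT.write(STDIN.read.hex_to_binary)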

NOTE: the Ruby expert tells me that extending global class types is usually not a good idea. I would tend to agree, but that raises the question of why they allow you to do it.

NOTE #2: Tim has pointed out in the comments that you could just as easily do

STDOUT.write(STDIN.read.gsub(/\s/,'').to_a.pack("H*"))

Which is absolutely correct, but less interesting to talk about, so I’m going to leave the original up anyway :) The correct example is pretty much identical to the Perl example, except you have to convert your string to an array first (note that String#to_a was removed in Ruby 1.9, so on newer versions you would wrap the string in an array literal instead). The only annoyance is that you have to dig a bit through documentation to find it, though not nearly as much as with Python.

First I use Ruby’s ability to dynamically extend any class at any time to add a new method, hex_to_binary, to the String class. This is cute, but I’m not sure how kosher it is. Anyway, gsub replaces any whitespace with the empty string. (0...temp.size()/2) creates a range of integers from 0 up to, but not including, half the size of temp, and each iterates over every element in that range.

The bit in the braces is a block: a closure, or a way of wrapping a chunk of code as an object (in Java you might use an anonymous class). The |index| indicates the closure takes a single argument, which in this case is the current element being iterated over.

Finally, [temp[index*2, 2]] creates a single-element array containing a two-character chunk of the text, and the pack method takes those two hexadecimal characters and returns a byte of data.

Once we have the method all the final line does is read from STDIN, call hex_to_binary on the resultant string, and write it to STDOUT.
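For anyone who doesn’t read Ruby, the same pair-at-a-time walk can be sketched in Python. This is only a rough transliteration of the steps described above, not a claim about how Ruby’s pack works internally, and the comments map each line back to the Ruby it imitates.

```python
def hex_to_binary(text: str) -> bytes:
    """Pair-by-pair hex decoding, mirroring the Ruby walk-through (illustrative only)."""
    temp = "".join(text.split())                 # like gsub: strip all whitespace
    ret = []
    for index in range(len(temp) // 2):          # like (0...temp.size/2).each
        pair = temp[index * 2 : index * 2 + 2]   # like temp[index*2, 2]
        ret.append(bytes([int(pair, 16)]))       # like [pair].pack("H2"): two digits, one byte
    return b"".join(ret)                         # glue the single bytes together


if __name__ == "__main__":
    print(hex_to_binary("de ad be ef"))
```

One thing the sketch makes obvious: because the loop runs over size/2 pairs, a dangling odd digit at the end of the input is silently ignored.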

So all in all probably on the confusing side if you don’t know any Ruby, but pretty elegant once you play with the language a bit.

Python


I’ve been using Python on and off because of the SCons build system.
It seems to be pretty popular in NASA circles these days. I’m not completely sold myself, but here we are….

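#!/usr/bin/env python3
import sys
import binascii
import re

# Raw string so that \s reaches the regex engine untouched.
p = re.compile(r'\s+')
hex_string = sys.stdin.read()
hex_string = p.sub('', hex_string)
# unhexlify returns bytes; on Python 3 those go to the underlying binary stream.
sys.stdout.buffer.write(binascii.unhexlify(hex_string))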

So this is pretty straightforward, I suppose. Read standard in, replace any whitespace with the empty string, and call the convenient (if poorly named) unhexlify to convert the string to binary. On the plus side, I’d say the code is probably clearer to a C/C++ coder than the Ruby example. On the negative side, Ruby seems to have a pretty coherent set of features, whereas I spent a lot of time looking through a mishmash of libraries to find stuff in Python. Some things are methods of the class, others are functions in libraries, and which is where and why isn’t all that clear.
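One thing worth knowing about unhexlify is that it is strict about its input. Here is a small sketch, assuming Python 3; the helper names unhex and try_unhex are invented for this example.

```python
import binascii
import re


def unhex(text: str) -> bytes:
    """Strip whitespace, then decode hex digits into bytes (hypothetical helper)."""
    cleaned = re.sub(r"\s+", "", text)
    # Encode to ASCII first so unhexlify receives a bytes-like object.
    return binascii.unhexlify(cleaned.encode("ascii"))


def try_unhex(text: str):
    """Return the decoded bytes, or None if the input is not valid hex."""
    try:
        return unhex(text)
    except binascii.Error:
        # Raised for an odd number of digits or for non-hex characters.
        return None


if __name__ == "__main__":
    print(try_unhex("48 65 6c\n6c 6f"))  # valid input
    print(try_unhex("abc"))              # odd number of digits
```

So unlike the pair-at-a-time Ruby loop, a stray odd digit here is an error rather than something that gets quietly dropped.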


Perl


I suppose no set of scripting examples could exclude Perl….

#!/usr/bin/perl
chomp(@lines = <STDIN>);
$line = join('', @lines);
$line =~ s/\s+//g;
print pack "H*", $line;

OK, so in Perl the character prefixing a variable determines its type, so @lines is an array and $line is a scalar (and %map would be a hash). @lines = <STDIN> reads all the data on STDIN and breaks it up by newline. There is some context dependency to this line, in that $line = <STDIN> would just read a single line.

Anyway, chomp removes the trailing newline from each of the strings in the array. join('', @lines) concatenates the elements of the array using the empty string as the separator.
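If the list-versus-scalar context business is unfamiliar, the rough Python analogue is the difference between readlines() and readline(). The sketch below uses io.StringIO as a stand-in for STDIN, and slurp_and_join is a made-up name.

```python
import io


def slurp_and_join(stream) -> str:
    """Read every line, drop trailing newlines, and glue the pieces together."""
    lines = stream.readlines()                       # roughly list context: every line
    chomped = [line.rstrip("\n") for line in lines]  # roughly chomp on each element
    return "".join(chomped)                          # roughly join('', @lines)


if __name__ == "__main__":
    data = "00 01\n02 03\n"
    # Whole input, newlines removed; the spaces are still there at this point.
    print(repr(slurp_and_join(io.StringIO(data))))
    # Roughly scalar context: only the first line, newline included.
    print(repr(io.StringIO(data).readline()))
```

Note that the spaces survive this step, which is why the regex substitution is still needed afterwards.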

In Perl, regular expressions are applied with $variable =~ regex_here. The regular expression may simply match or actually replace text in the string. In this case, s/\s+//g, the leading 's' means that this is a substitution. Anything matched by the expression between the first pair of '/'s is replaced by what sits between the second and third. The trailing 'g' indicates that all matches should be replaced rather than just the first.
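To see what the trailing 'g' buys you, here is a rough Python counterpart. The count argument of re.sub plays the opposite role: leaving it out replaces everything, and count=1 stops after the first match. The variable names are just for illustration.

```python
import re

text = "00 01\n02"

# Like s/\s+//g: every run of whitespace is removed.
all_removed = re.sub(r"\s+", "", text)

# Like s/\s+// without the g: only the first run of whitespace is removed.
first_only = re.sub(r"\s+", "", text, count=1)

if __name__ == "__main__":
    print(repr(all_removed))  # prints '000102'
    print(repr(first_only))   # prints '0001\n02'
```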

pack is similar to the Ruby pack method from the first example. The H says to pack hexadecimal characters (high nibble first), and the * says to consume every remaining character in the string.

I think it says something that I didn’t really use many of the obscure features of perl here….

In C

It only seems fair to have a C comparison here; here is a good example. All things considered, I prefer any of the above.

The Author

Michael Smit is a software engineer in Seattle, Washington, who works for Amazon.

5 comments

  1. [...] apparently the hex to binary in 4 languages portion of this web page is what gets the most google hits and who am I to argue? So without [...]

  2. Tim Huang March 14th, 2008 3:57 pm

    for Ruby,
    STDIN.read.to_i(base=16).to_s(base=2)

  3. mike March 14th, 2008 7:08 pm

    Thanks Tim, but that isn’t what I’m trying to do. I want to output binary data (as in not ASCII) not base two integer string. Additionally that code, although concise, doesn’t handle spaces or newlines properly.

  4. Tim Huang March 14th, 2008 11:39 pm

    How about this?

    STDOUT.write(STDIN.read.gsub(/\s/,'').to_a.pack("H*"))

  5. mike March 15th, 2008 6:02 am

    Duly noted and updated:)
