Jul 20

Boost Spirit Part II (Attaching Actions to Your Grammar)

Category: C++, Programming

In the last article we covered how to define a grammar in the boost spirit library. This article will show you how to use the data parsed by the grammar with actions. The code for this example is here.

Actions and Functors

Spirit expressions can be associated with arbitrary code that is executed when those expressions are matched. As with STL, spirit uses functors (if you haven’t used functors please check this) which are attached to rules using the [] operator:

some_rule= (rule_a | rule_b)[some_functor];

When the contents of (rule_a | rule_b) is matched some_functor is called and passed the matched section of text. If a rule defines multiple actions they are called in the order they are matched (with some subtleties noted at the end of the article).

The functor prototype can depend on the type of rule being parsed, but most likely you’ll be using one that takes a pointer to the start and end of the text that was matched:

struct struct_functor_type{
    void operator()(const char *start, const char *end){
        //do whatever
void function_functor(const char *start, const char *end){

Binding Actions to Methods

Now many of the examples in the spirit documentation use functions which operate on global data as the functors. For the most part I don’t like working on global data so I’m going to use method references instead. There is a very mild trick to this which the boost function and bind libraries make possible.

From a function pointer perspective, C++ methods are basically C functions with an extra this argument:

class c{
    void method(const char *start, const char *end);
//c::method is essentially
void c_method(c *this, const char *start, const char *end);

Since spirit is expecting two arguments we need to wrap the method call so the “this” argument is bound to an object pointer. Bind allows you to set arguments as follows:

c c_obj;
function2<void, const char *, const char *> func =
    bind(&c::method, &c_obj, _1, _2);
func(NULL, NULL); //equivalent to c_obj.method(NULL, NULL)

Extending The Example

The Grammar

So now we are going to apply this strategy to the example from the previous article. The grammar will take references to functors in its constructor which will then be attached to the relevant rules as actions.

In this case I’m going to bind an action to each of the arguments to a rule (i.e. type, name, etc) and one more action to the end of that record. As an example let’s just bind the property name, value and end:

//typedef the standard functor callback type to save typing
typedef function2<void, const char *, const char *> grammar_cb;
//add functors of interest to the constructor
struct record_grammar:public grammar<record_grammar>{
    record_grammar( grammar_cb &property_name,
        grammar_cb &property_value,
        grammar_cb &end_property):
    //bind actions to relevant rules.
    template<class ScannerT>
    struct definition{
        definition(const record_grammar &self){
            prop_decl = (
                identifier[self.property_name] >> ch_p('=') >>
        rule<ScannerT> const& start(){return record_list;}
    grammar_cb &property_name;
    grammar_cb &property_value;
    grammar_cb &end_property;

The Document

Assuming we’ve defined additional actions for records in the grammar using the technique above, now we’re going to write a class that can uses the grammar to actually do something. In this case I’m going to defined two structs that represent records and properties:

struct property_info{
    string name, value;
struct record_info{
    string type;
    string name;
    vector<property_info> properties;
    vector<record_info> sub_records;

Now I’m going to write a class that builds a set of record_info objects from an input string.

class document{
    parse_info<> parse(const std::string &data){
        grammar_cb property_name_f(
            bind(&document::property_name, this, _1,_2));
        grammar_cb property_value_f(
            bind(&document::property_value, this, _1,_2));
        grammar_cb end_property_f(
            bind(&document::end_property, this, _1,_2));
        grammar_cb record_type_f(
            bind(&document::record_type, this, _1,_2));
        grammar_cb record_name_f(
            bind(&document::record_name, this, _1,_2));
        grammar_cb end_record_f(
            bind(&document::end_record, this, _1, _2));
        record_grammar g(
        parse_info<> info = 
            boost::spirit::parse(data.c_str(), g >> eps_p, space_p);
        return info;
    const record_info & result(){
        return record_stack.top();
    void property_name(const char *start, const char *end){
        current_property.name = string(start, end);
    void property_value(const char *start, const char *end){
        current_property.value = string(start, end);
    void end_property(const char *start, const char *end){
    void record_type(const char *start, const char *end){
        record_stack.top().type = string(start, end);
    void record_name(const char *start, const char *end){
        record_stack.top().name = string(start, end);
    void end_record(const char *start, const char *end){
        record_info top = record_stack.top();
    stack<record_info> record_stack;
    property_info current_property;

The full example along with code to print out the result can be found here.

Potential Problems

Ambiguous Grammar

One non-obvious aspect of actions is that an ambiguous grammar can result in an action getting called more times than you had expected. For instance, I originally wrote the record rule as follows:

record = (
    identifier[self.record_type] >>
    identifier[self.record_name] >> ch_p('{') >>
            (prop_list >> ch_p(',') >> record_list) |
            prop_list |
     >> ch_p('}')

Given this input:

Type Name{

The parser will first match the prop_list in (prop_list >> ch_p(',') >> record_list) causing any actions in prop_list. When it doesn’t find a “,”, the parser will then correctly match the stand-alone prop_list, calling all the actions a second time.

Different Parse Iterators

Another problem with boost is that the arguments to your functor callbacks change type depending on what type of iterator you use in the parse call. If you use a C string the arguments are simply char *’s, but if you use a file iterator or some other iterator the arguments must be specified in terms of that iterator.

This makes it very hard to write parser logic in the style above that can be used with more than a single type of iterator.

Improving the Error Messages

For information on improving the error messages for your grammar take a look at the next article.

The Author

Michael Smit is a software engineer in Seattle, Washington who works for amazon


5 Comments so far

  1. jose October 9th, 2008 11:45 am

    Thank you for providing these great spirit articles.

    Maybe you should provide the same example with Spirit 2 qi. The doubt I have about Spirit 2 is when
    to use spirit::lex vs spirit::qi given that the lexer also allows semantic actions and the qi PEG parser doesn’t require using the lexer


  2. [...] article expands the example from the previous two, here and here, to add more specific error [...]

  3. [...] on to the next article in order to learn how to do something useful with your grammar. The AuthorMichael Smit is a [...]

  4. ilham November 6th, 2008 11:15 am

    Thank you for the articles on spirit. Just a tiny editorial comment. Is it possible to use a less bright cyan color for highlighted keywords in the examples? It’s pretty difficult to read them on white.

  5. mike November 6th, 2008 11:18 am

    Good question. I’m using a standard wordpress plugin to do the code formatting. I think all I have to do is change the CSS provided with the plugin. I’ll look into it.