Dec 10

Spirit vs. Lex/Yacc et al.

What are the differences and when should I use one or the other?

What is Lex/Yacc?


Lex and Yacc are fairly ancient Unix tools (Flex and GNU Bison are the common free implementations) that you can use to parse custom languages with LALR grammars, typically programming languages. They are actually separate programs, each of which generates C code from its own specification language.

Lex generates code for splitting text into tokens. A token is basically the combination of an identifier that describes its type, such as INTEGER, FLOAT_LITERAL, or STRING_LITERAL, and the data associated with that token.

Yacc allows you to map particular sequences of tokens to C code for handling those tokens. For instance the sequence NAME_CONSTANT EQUALS INTEGER_CONSTANT might describe an assignment (var_name = 12).
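To make the token idea concrete, here is a minimal hand-written sketch in C++. It is not what Lex or Yacc would generate; the names TokenType, Token, tokenize, and is_assignment are made up for illustration. It only shows a token as a type plus its text, and a check for the NAME_CONSTANT EQUALS INTEGER_CONSTANT sequence mentioned above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds, mirroring the names used in the text.
enum class TokenType { NAME_CONSTANT, EQUALS, INTEGER_CONSTANT, UNKNOWN };

// A token pairs a type with the text it was built from.
struct Token {
    TokenType type;
    std::string text;
};

// Split the input into tokens, skipping whitespace (the job Lex automates).
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Identifier: letters, digits, and underscores.
            while (i < input.size() &&
                   (std::isalnum(static_cast<unsigned char>(input[i])) || input[i] == '_')) {
                ++i;
            }
            tokens.push_back({TokenType::NAME_CONSTANT, input.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            // Integer literal: a run of digits.
            while (i < input.size() && std::isdigit(static_cast<unsigned char>(input[i]))) {
                ++i;
            }
            tokens.push_back({TokenType::INTEGER_CONSTANT, input.substr(start, i - start)});
        } else if (c == '=') {
            ++i;
            tokens.push_back({TokenType::EQUALS, "="});
        } else {
            // Anything else is recorded as a single unknown character.
            ++i;
            tokens.push_back({TokenType::UNKNOWN, input.substr(start, 1)});
        }
    }
    return tokens;
}

// Recognize the token sequence for an assignment (the job Yacc automates).
bool is_assignment(const std::vector<Token>& t) {
    return t.size() == 3 &&
           t[0].type == TokenType::NAME_CONSTANT &&
           t[1].type == TokenType::EQUALS &&
           t[2].type == TokenType::INTEGER_CONSTANT;
}
```

With this sketch, is_assignment(tokenize("var_name = 12")) returns true. In a real Yacc grammar the sequence would be written as a rule with a block of C code attached to it, rather than as a hand-coded check.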

The LALR acronym (Look-Ahead LR) basically describes the limitations of the parser in terms of what grammars can be described and parsed. (If you really want to know what it means, look it up on Wikipedia.)

What is Spirit?


Spirit is also a framework for parsing custom grammars, but it is a domain-specific language embedded in C++ itself and ships as part of the Boost libraries.

What’s the Difference?


In-Language DSL vs. External Tool


Spirit is a C++ library and as such does not require any additional tools. There is no pre-compile step involved. Usually this would imply that Spirit has less helpful error messages than lex/yacc, since grammar mistakes surface as ordinary C++ compiler errors. In practice lex/yacc has terrible error messages anyway.

LALR vs. EBNF


Yacc generates LALR parsers, which, without getting into the details, makes particular grammars very tedious and brittle to represent in the tool and hard to maintain.

Spirit, on the other hand, uses an EBNF-style notation to define its grammar, which means you can represent a lot more stuff in a much more compact form.

Functional Vs. Procedural


Both Yacc and Spirit are essentially functional languages that allow you to apply procedural code based on pattern matching.

Unfortunately, Spirit is written in a procedural language that lacks a good closure feature. This means Spirit code is frustratingly similar to what you might write in Yacc (or preferably ANTLR) while not quite allowing you to program in the same style.

Which should I use?


For parsing small custom languages in C++, use Spirit. It doesn’t involve an extra tool dependency, it’s readable by developers who are used to EBNF notation, and it beats what you usually get, which is a pile of line.find_first_of(' ') statements and the like that are easy to break and hard to maintain. Spirit describes the grammar, not the steps used to parse it.

For major projects with sophisticated custom grammars I’d suggest using a modern parser generator like ANTLR. Spirit is very cool as far as it goes, but it does suffer from its medium. C++ is not a pretty language and C++ DSLs are not for the faint of heart.

So basically don’t use lex/yacc at all, use Spirit for everyday mini-languages in C++, and use a modern parser generator like ANTLR for everything else.

The Author

Michael Smit is a software engineer in Seattle, Washington who works for Amazon.
