You lex into tokens and then parse the tokens. Separating the two stages makes the code cleaner, more extensible, and easier to maintain. For example, why deal with the hassle of worrying, while parsing, about whether you're looking at a string that happens to contain digits or an actual number?
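A minimal sketch of what I mean (the token names and spec are invented for illustration): the lexer settles the character-level question once, and the parser only ever compares token kinds.

```python
import re

# Hypothetical token spec: the lexer decides "number vs. string" once,
# so the parser never looks at raw characters again.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),      # 42
    ("STRING", r'"[^"]*"'),  # "42" stays a string, digits and all
    ("WS",     r"\s+"),      # whitespace, skipped below
]

def lex(source):
    pos = 0
    while pos < len(source):
        for kind, pattern in TOKEN_SPEC:
            m = re.match(pattern, source[pos:])
            if m:
                if kind != "WS":
                    yield (kind, m.group())
                pos += m.end()
                break
        else:
            raise SyntaxError(f"unexpected character {source[pos]!r}")

print(list(lex('42 "42"')))  # [('NUMBER', '42'), ('STRING', '"42"')]
```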
The part of my parser that parses tokens looks identical whether or not I'm using a separate lexing stage. I write a rule that expects a number, I write a rule that expects a string. Whether the item is a terminal (as it would be with a lexer) or a non-terminal (as it would be in a scannerless parser) doesn't really change how the rule is written.
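Concretely, something like this (a rough sketch, not from any particular library): the rule has the same shape either way — check, consume, return the value plus the new position.

```python
# With a lexer, "number" is a terminal consumed from a token stream.
def parse_number_tokens(tokens, i):
    kind, text = tokens[i]
    if kind != "NUMBER":
        raise SyntaxError("expected a number")
    return int(text), i + 1

# Scannerless, "number" is a non-terminal over raw characters.
# Same shape: check, consume, return the value and the new position.
def parse_number_chars(src, i):
    j = i
    while j < len(src) and src[j].isdigit():
        j += 1
    if j == i:
        raise SyntaxError("expected a number")
    return int(src[i:j]), j

print(parse_number_chars("123+4", 0))  # (123, 3)
```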
Lexers follow the maximal-munch rule: at each position, the longest possible match wins. Without a separate lexing stage, that's kinda hard to emulate within a parser (depends on the type of parser, I guess), as you'd end up with additional ambiguities.
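To make that concrete, here's a sketch of maximal munch (invented token names): try every token pattern at the current position and keep the longest match, so ">=" can never be split into ">" followed by "=".

```python
import re

# Hypothetical operator tokens where maximal munch matters.
PATTERNS = [("GE", r">="), ("GT", r">"), ("ASSIGN", r"="), ("NUMBER", r"\d+")]

def next_token(src, pos):
    # Maximal munch: try every pattern and keep the LONGEST match,
    # so ">=" lexes as one GE token, never as GT followed by ASSIGN.
    best = None
    for kind, pattern in PATTERNS:
        m = re.match(pattern, src[pos:])
        if m and (best is None or m.end() > best[2]):
            best = (kind, m.group(), m.end())
    if best is None:
        raise SyntaxError(f"unexpected character {src[pos]!r}")
    return best

print(next_token(">=1", 0))  # ('GE', '>=', 2), not ('GT', '>', 1)
```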
So it sounds like lexers have specific advantages with parsing technologies that admit ambiguities? Since I've used PEGs and recursive-descent parsers almost exclusively, that might explain why I haven't had issues.
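In the PEG world, I suppose ordered choice is quietly doing that job for me: I write the longer alternative first and the grammar commits to it, which is exactly the decision maximal munch would have made in a lexer. Roughly (hypothetical helper names):

```python
def match_literal(src, pos, lit):
    # PEG-style primitive: succeed with the new position, or fail with None.
    return pos + len(lit) if src.startswith(lit, pos) else None

def comparison_op(src, pos):
    # Ordered choice: ">=" is tried before ">", so the longer
    # alternative wins deterministically, with no ambiguity to resolve.
    for lit in (">=", ">"):
        new_pos = match_literal(src, pos, lit)
        if new_pos is not None:
            return lit, new_pos
    return None

print(comparison_op(">= 2", 0))  # ('>=', 2): the longer alternative wins
```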
Yes, but if and when you need to update your list of available terminals, or extend the parser because a change to the language introduces ambiguity, having the lexing and parsing be separate reduces the overhead significantly.
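For instance, with a table-driven lexer like the one sketched above, adding a new operator such as "**" is a one-line change to the token list (hypothetical spec; it relies on the longer pattern being tried first). None of the parser rules that consume terminals have to change:

```python
# Before: "**" would lex as two MUL tokens.
TOKEN_SPEC = [("MUL", r"\*"), ("NUMBER", r"\d+"), ("WS", r"\s+")]

# After: one new entry, placed before MUL so the longer match is tried
# first. The parser keeps asking for token kinds exactly as before.
TOKEN_SPEC = [("POW", r"\*\*"), ("MUL", r"\*"), ("NUMBER", r"\d+"), ("WS", r"\s+")]
```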