You lex into tokens and then parse the tokens. Separating the two stages makes the code cleaner, more extensible, and easier to maintain. For example, why deal with the hassle of worrying, while parsing, about whether you're looking at a string that happens to contain digits or an actual number?
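A minimal sketch of what I mean (the token names and spec are invented for illustration): the lexer settles the character-level question once, and the parser only ever compares token kinds.

```python
import re

# Hypothetical token spec: the lexer decides "number vs. string" once,
# so the parser never looks at raw characters again.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),      # 42
    ("STRING", r'"[^"]*"'),  # "42" stays a string, digits and all
    ("WS",     r"\s+"),      # whitespace, skipped below
]

def lex(source):
    pos = 0
    while pos < len(source):
        for kind, pattern in TOKEN_SPEC:
            m = re.match(pattern, source[pos:])
            if m:
                if kind != "WS":
                    yield (kind, m.group())
                pos += m.end()
                break
        else:
            raise SyntaxError(f"unexpected character {source[pos]!r}")

print(list(lex('42 "42"')))  # [('NUMBER', '42'), ('STRING', '"42"')]
```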
The part of my parser that parses tokens looks identical whether or not I'm using a separate lexing stage. I write a rule that expects a number, I write a rule that expects a string. Whether the item is a terminal (as it would be with a lexer) or a non-terminal (as it would be in a scannerless parser) doesn't really change how the rule is written.
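Concretely, something like this (a rough sketch, not from any particular library): the rule has the same shape either way — check, consume, return the value plus the new position.

```python
# With a lexer, "number" is a terminal consumed from a token stream.
def parse_number_tokens(tokens, i):
    kind, text = tokens[i]
    if kind != "NUMBER":
        raise SyntaxError("expected a number")
    return int(text), i + 1

# Scannerless, "number" is a non-terminal over raw characters.
# Same shape: check, consume, return the value and the new position.
def parse_number_chars(src, i):
    j = i
    while j < len(src) and src[j].isdigit():
        j += 1
    if j == i:
        raise SyntaxError("expected a number")
    return int(src[i:j]), j

print(parse_number_chars("123+4", 0))  # (123, 3)
```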
Lexers follow the maximal-munch rule: at each position, the longest possible match wins. Without a separate lexing stage, that's kinda hard to emulate within a parser (depends on the type of parser, I guess), as you'd end up with additional ambiguities.
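To make that concrete, here's a sketch of maximal munch (invented token names): try every token pattern at the current position and keep the longest match, so ">=" can never be split into ">" followed by "=".

```python
import re

# Hypothetical operator tokens where maximal munch matters.
PATTERNS = [("GE", r">="), ("GT", r">"), ("ASSIGN", r"="), ("NUMBER", r"\d+")]

def next_token(src, pos):
    # Maximal munch: try every pattern and keep the LONGEST match,
    # so ">=" lexes as one GE token, never as GT followed by ASSIGN.
    best = None
    for kind, pattern in PATTERNS:
        m = re.match(pattern, src[pos:])
        if m and (best is None or m.end() > best[2]):
            best = (kind, m.group(), m.end())
    if best is None:
        raise SyntaxError(f"unexpected character {src[pos]!r}")
    return best

print(next_token(">=1", 0))  # ('GE', '>=', 2), not ('GT', '>', 1)
```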
So it sounds like lexers have specific advantages with parsing technologies that admit ambiguities? Since I've used PEGs and recursive-descent parsers almost exclusively, that might explain why I haven't had issues.
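In the PEG world, I suppose ordered choice is quietly doing that job for me: I write the longer alternative first and the grammar commits to it, which is exactly the decision maximal munch would have made in a lexer. Roughly (hypothetical helper names):

```python
def match_literal(src, pos, lit):
    # PEG-style primitive: succeed with the new position, or fail with None.
    return pos + len(lit) if src.startswith(lit, pos) else None

def comparison_op(src, pos):
    # Ordered choice: ">=" is tried before ">", so the longer
    # alternative wins deterministically, with no ambiguity to resolve.
    for lit in (">=", ">"):
        new_pos = match_literal(src, pos, lit)
        if new_pos is not None:
            return lit, new_pos
    return None

print(comparison_op(">= 2", 0))  # ('>=', 2): the longer alternative wins
```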
Yes, but if and when you need to update your list of available terminals, or extend the parser because a change to the language introduces ambiguity, having the lexing and parsing be separate reduces the overhead significantly.
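For instance, with a table-driven lexer like the one sketched above, adding a new operator such as "**" is a one-line change to the token list (hypothetical spec; it relies on the longer pattern being tried first). None of the parser rules that consume terminals have to change:

```python
# Before: "**" would lex as two MUL tokens.
TOKEN_SPEC = [("MUL", r"\*"), ("NUMBER", r"\d+"), ("WS", r"\s+")]

# After: one new entry, placed before MUL so the longer match is tried
# first. The parser keeps asking for token kinds exactly as before.
TOKEN_SPEC = [("POW", r"\*\*"), ("MUL", r"\*"), ("NUMBER", r"\d+"), ("WS", r"\s+")]
```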