I've written a lot of Python crawlers and have never really had an issue with lxml. It never seems to choke, and I'm sure I must have come across some pretty funky markup in my time. lxml has always handled things fine for me. Maybe I've just gotten lucky, I guess?