jurgenv's comments | Hacker News

GLR, GLL and Earley's algorithm (with tree construction) are labeled "context-free general" parsing algorithms because they can handle _any_ context-free grammar, whereas the algorithms associated with sub-classes of CFGs, such as LR, LALR and LL, do not accept all CFGs. This comes in handy in particular for languages which are unambiguous yet require a lot of lookahead or context to avoid parser action conflicts. The context-free general algorithms are popular among people who have to rapidly prototype languages or implement many different language processing tools without the opportunity to spend months on a parser. We pay in speed: GLR and GLL typically run nearly linearly on normal files, while Earley has worse average-case behavior, often quadratic.
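
For a rough feel of how such an algorithm works, here is a toy Earley recognizer in Python (my own encoding, not how Rascal or any real GLL/GLR engine is implemented); the palindrome grammar at the end is an example of a CFG that is unambiguous yet needs unbounded lookahead, so LR/LALR/LL generators reject it while a general algorithm handles it fine:

    # Minimal Earley recognizer over a grammar encoded as a dict from
    # nonterminal to a list of alternatives (tuples of symbols); terminals
    # are any symbols that are not keys of the dict.
    def earley_recognize(grammar, start, tokens):
        # an Earley item is (head, body, dot, origin)
        chart = [set() for _ in range(len(tokens) + 1)]
        for body in grammar[start]:
            chart[0].add((start, body, 0, 0))
        for i in range(len(tokens) + 1):
            added = True
            while added:                                    # predict/complete to a fixed point
                added = False
                for (head, body, dot, origin) in list(chart[i]):
                    if dot < len(body) and body[dot] in grammar:       # predict
                        for alt in grammar[body[dot]]:
                            item = (body[dot], alt, 0, i)
                            if item not in chart[i]:
                                chart[i].add(item)
                                added = True
                    elif dot == len(body):                             # complete
                        for (h2, b2, d2, o2) in list(chart[origin]):
                            if d2 < len(b2) and b2[d2] == head:
                                item = (h2, b2, d2 + 1, o2)
                                if item not in chart[i]:
                                    chart[i].add(item)
                                    added = True
            if i < len(tokens):                                        # scan
                for (head, body, dot, origin) in chart[i]:
                    if dot < len(body) and body[dot] not in grammar and body[dot] == tokens[i]:
                        chart[i + 1].add((head, body, dot + 1, origin))
        return any(head == start and dot == len(body) and origin == 0
                   for (head, body, dot, origin) in chart[len(tokens)])

    # Palindromes over {a, b}: unambiguous, but no fixed lookahead suffices.
    palindrome = {"S": [("a", "S", "a"), ("b", "S", "b"), ("a",), ("b",), ()]}
    print(earley_recognize(palindrome, "S", list("abba")))   # True
    print(earley_recognize(palindrome, "S", list("abab")))   # False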

Another benefit of context-free general implementations is composability of grammars. Since CFGs are closed under composition, you can build modular grammar formalisms on top of a context-free general parser.
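
As a toy illustration of that closure property, using the same dict-of-productions encoding as the recognizer sketch above (nothing Rascal-specific): composing two grammars is just merging their alternatives, and the result is still a CFG that a general parser accepts directly, whereas a deterministic parser generator may report fresh conflicts for the merged grammar.

    def compose(g1, g2):
        # alternatives of shared nonterminals are simply concatenated
        merged = {}
        for nt in set(g1) | set(g2):
            merged[nt] = g1.get(nt, []) + g2.get(nt, [])
        return merged

    exprs = {"E": [("E", "+", "E"), ("n",)]}
    stmts = {"S": [("E", ";"), ("if", "E", "then", "S", "else", "S")],
             "E": [("(", "E", ")")]}
    combined = compose(stmts, exprs)   # still a CFG; a general parser takes it as-is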

Another main drawback is that (un)ambiguity of context-free grammars is undecidable, so prototypes generated using GLR, GLL and Earley may produce more than one tree, and in release mode you might end up reporting an unexpected static error to your users in that case. We deal with this drawback using random testing/fuzzing and some static analysis.
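
The fuzzing side can be as simple as deriving random sentences from the grammar and flagging any sentence for which the general parser returns more than one tree. A sketch, with the tree-counting parser left abstract and all names my own:

    import random

    # 'count_parse_trees' stands for whatever GLL/GLR/Earley parser you actually
    # use (it should report how many trees the input has); only the sentence
    # generator is spelled out here. The depth cut-off biases derivations toward
    # the shortest alternatives, so termination assumes the grammar has some
    # short or empty alternative to fall back on.
    def random_sentence(grammar, symbol, depth=0, max_depth=8):
        if symbol not in grammar:                 # terminal
            return [symbol]
        alts = grammar[symbol]
        if depth >= max_depth:
            alts = sorted(alts, key=len)[:1]
        out = []
        for sym in random.choice(alts):
            out.extend(random_sentence(grammar, sym, depth + 1, max_depth))
        return out

    def fuzz_for_ambiguity(grammar, start, count_parse_trees, trials=1000):
        for _ in range(trials):
            sentence = random_sentence(grammar, start)
            if count_parse_trees(grammar, start, sentence) > 1:
                return sentence                   # a concrete witness of ambiguity
        return None                               # no counterexample found (not a proof!)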


Working with code analysis and transformation techniques at the same time was always, for us, about gluing database and logic programming tech to functional programming and algebraic specification tech. Having those linguistically integrated before you even start thinking about a language processing tool makes working with Rascal a lot of fun.


Goals are often similar, but the methods are very different. Epsilon and Rascal share many design decisions. Rascal is based fully on immutable data, including relations, while EMF is based on objects.
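
To make the "immutable data, including relations" point concrete, here is roughly what that style looks like with plain Python values as a stand-in (Rascal's rel type and closure operations are far more concise; this is not its actual API):

    calls = frozenset({("main", "parse"), ("parse", "lex"), ("lex", "read")})

    def closure(rel):
        # transitive closure computed by building new values; nothing is mutated
        result = rel
        while True:
            step = frozenset((a, d) for (a, b) in result for (c, d) in result if b == c)
            combined = result | step
            if combined == result:
                return result
            result = combined

    reachable = closure(calls)   # 'calls' itself is untouched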


Thanks! That's exactly what I wanted to know.


What was developed on IBM time and what was written as an Eclipse plugin is EPL. The rest is BSD2.


That was version 0.1, bootstrapped off SDF2. Since 2009 Rascal has been based on a top-down general algorithm called GLL. It's still scannerless with declarative disambiguation, like SGLR. Currently it's evolving towards data-dependent GLL to be able to model the offside rule, the C pre-processor and symbol table, and other "interesting" stuff we encounter in programming languages. See the thesis work by Afroozeh and Izmaylova.


CodeBuff also learns from examples. Check out the paper at the SLE conference; it has all the nitty-gritty details of how it learns the whitespace rules from the example files.


Haven't tried, but if you train it using only correct Python I bet it can only produce correct Python. I'd have to check to be totally sure there's no weird corner case.


The grammar is used to compute the code layout features that CodeBuff learns from: it parses each file and associates spacing and indentation features with the trees' contextual features.

Then, when we pretty-print a new file, we parse it with the _same parser_, and the learned features are matched against the tree at hand to recover the "right" spacing and indentation learned from the example code.
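
Very roughly, and as my own simplification rather than CodeBuff's actual feature vector or matching algorithm (the Token type below is just a stand-in for what the parser produces): training records the whitespace observed in front of each token, keyed by a few tree-derived context features; formatting then parses the new file with the same grammar and replays the most common whitespace seen for each token's context.

    from collections import Counter, defaultdict, namedtuple

    # Stand-in for the parser's output; CodeBuff derives richer, tree-based features.
    Token = namedtuple("Token", "type text preceding_ws parent_rule")

    def context_key(tok, prev):
        # placeholder context features: token kind, enclosing rule, previous token kind
        return (tok.type, tok.parent_rule, prev.type if prev else None)

    def train(token_streams):
        model = defaultdict(Counter)
        for tokens in token_streams:              # one token stream per training file
            prev = None
            for tok in tokens:
                model[context_key(tok, prev)][tok.preceding_ws] += 1
                prev = tok
        return model

    def reformat(tokens, model, default_ws=" "):
        out, prev = [], None
        for tok in tokens:
            votes = model.get(context_key(tok, prev))
            ws = votes.most_common(1)[0][0] if votes else default_ws
            out.append(ws + tok.text)
            prev = tok
        return "".join(out)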


It's language-parametric, so it can work for any language for which you have a grammar and a set of example files to train on. In this sense it helps the authors of formatting tools.

For the users of a specific language it could also be an improvement, since the formatter can be configured with completely arbitrary code examples. But we have to do a bit more work to streamline that use case.

