Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Part of it also that this isn't a regular language. The PCRE is more powerful a language than a Chomsky type 3 language in that there are strings that can be matched by a PCRE (such as a prime number expressed in unary) that are not recognized in a pure regular language.

Extending finite automata to efficiently match Perl-compatible regular expressions - http://dx.doi.org/10.1145/1544012.1544037



I don't know why people keep pointing this out - RegEx's not being regular languages has been true for basically all of history (it is not just pcre, traditional unix (basic) regexes also have this). Most people's only experience with "regular" things have been with non-regular regexes. Grep is 51 years old at this point.


Because recognizing a prime number using an actual regular expression would prove that prime numbers form a regular language, which would be an incredible result.

And somehow, some people like me are in computer science mode when reading such sentences, such reminders wakes us up: "Oh, ok, not actually regular, not such a big deal"


> Because recognizing a prime number using an actual regular expression would prove that prime numbers form a regular language, which would be an incredible result.

It would be literally incredible, because the pumping lemma shows that it's false.


> because the pumping lemma shows that it's false

Ah yes, I even used to teach this actually. Sorry for the understatement xD.


> Ah yes, I even used to teach this actually. Sorry for the understatement xD.

No worries! Mainly I was just pleased with myself for remembering the pumping lemma, and kind of wished that I knew how to contact my professor (from back in the mid-1990s, when an intro CS course was taught out of Sipser and involved almost no actual programming) and tell him that he did such a good job that it stuck with me some 30 years later!


The OP's title uses the word "regular", and it's about doing mathy things which puts the brain in math mode, so it's helpful to point out that this is only works with non-regular regexes.


The POSIX standard for regular expressions (which grep implemented half a century ago) doesn't support back references. Even I-Regexp (RFC 9485) doesn't support it.

    This specification describes an interoperable regular expression (abbreviated as "regexp") flavor, I-Regexp.

    I-Regexp does not provide advanced regular expression features such as capture groups, lookahead, or backreferences. It supports only a Boolean matching capability, i.e., testing whether a given regular expression matches a given piece of text.
It wasn't until '97 when PCRE was released to mimic Perl's handling of regex and some time after that that GNU grep added -P as an option (BSD doesn't appear to support PCRE).

While PCRE is a defacto standard (I been heard uttering "ugh, that only handles posix regex"), for most of the history of regex they were only as powerful as a NDFA.


POSIX BREs do support backrefs. See section 9.3.6 BREs Matching Multiple Characters point 3 at https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1...

Backref support was added to grep between 6th edition and 7th edition unix

6th edition grep manual: http://man.cat-v.org/unix-6th/1/grep

7th edition grep manual: http://man.cat-v.org/unix_7th/1/grep

Both of those refer to ed(1) for the syntax of regular expressions

6th edition ed manual: http://man.cat-v.org/unix-6th/1/ed

7th edition ed manual: http://man.cat-v.org/unix_7th/1/ed

POSIX EREs do not support backrefs. This goes back to the 1970s because egrep used a different regex matching algorithm to grep — egrep compiled the regex to a DFA which could not match backrefs, unlike grep’s nondeterministic algorithm — and egrep also had different syntax.


Before seeing the regex, I was thinking, how can you possibly recognize a prime number with a regular language?

The answer is, you don't, this regex doesn't describe a regular language


You lack theory of mind, people may "experience" regexes in practice but not make the careful distinction/connection to the elementary theory (and theorems, e.g. about the limitations of regular expressions) taught in CS majors at university, this is not some unusual disconnect, but happens often and in many disciplines whenever transferring any knowledge from academia to industry.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: