"C is not perfect." Yup, it has some "dark corners" or whatever you want to call its...

pests · on March 11, 2013

According to the linked presentation, slide 13:

"The C specification says that when there is such an ambiguity, munch as much as possible. (The "greedy lexer rule".)"

So j+++++k turns into:

j++ ++ + k

Which is clarified on the next slide.

graycat · on March 11, 2013

Wow!

I would have guessed that j++ ++ was not legal syntax.

So, I was wrong: There are two ways to parse that mess. So, there is ambiguity. And the way they resolve the ambiguity is their 'greedy' rule! Wow!

Net, that tricky stuff is too tricky for me.

There was a famous investor in Boston who said that he invested only in companies that an idiot could run well, because the chances were too high that, too soon, some idiot would be running the company.

Well, I want code, or at least language syntax, that any idiot can understand: for now, me, and later some of the people who might be working for me!

You are way ahead of me on C, and you leave me more afraid of it than I was. But then I was always afraid of it and, in particular, never wrote ++.

graycat · on March 12, 2013

Okay, some clarity from actually running some simple code! Or if K&R didn't make a lot of details clear to me in my fast reading, then maybe some simple test cases will!!!

So, my first issue was the C statement

     i = j+++++k;

So, to make some tests, I dusted off my ObjectRexx script for doing C compiles, links, and execution.

Platform: Windows XP SP3 with recent updates. And apparently somehow I have

     Visual Studio 2008 x86 32 bit

installed, and it has relevant "tools", e.g., a C/C++ compiler, linker, etc.

I don't use IDEs or Visual Studio; instead, like apparently a significant fraction of readers at HN, I write code with my favorite text editor (e.g., KEdit) and some command line scripts (using ObjectRexx, which is elegant, though for better access to Windows services, etc., likely I should convert to Microsoft's PowerShell).

So, I typed in some C code and tried to compile it. Then I encountered again one of the usually unmentioned problems in computing: Software installation and system management. Several hours later I had a C/C++ 'compile, load, and go' (CLG) script working, but my throat was sore from screaming curses at the perversity of 'system management' -- a project of a few minutes with a prerequisite of several hours of system management mud wrestling.

For the mud wrestling, the first problem was that, since my last use of C, I had changed my usual boot partition from D to E. Next, the version of C installed on E was different from that on D, and the installation on D would not run when E was booted. Bummer.

Next, the C compiler, linker, etc. want a lot of environment variables. Fine with me; generally I like the old PC/DOS idea of environment variables.

However, apparently Microsoft was never very clear on just what software, when, could change the environment variables where. At least I wasn't clear.

So, booting from my partition E, the C/C++ tools want environment variables set as in

     E:\Program Files\Microsoft Visual Studio 9.0\Common7\Tools\vsvars32.bat

Okay. Nice little BAT file.

If I run the BAT file from a console window, it changes the environment variables as needed by C/C++. But in console windows I run a little 'shell script' I wrote in ObjectRexx. It has a few nice features for directory tree walking, etc. But when I run the BAT file from the command line of a console window that is running my little shell script, after the BAT file is done and returns, the environment variables have been restored to what they were before running the BAT file. If I use a statement, say,

     set >t1

at the end of the BAT file, then file t1 shows that the environment variable values have been changed while the BAT file was still running.

So, sure, there is a 'stack' of invocations of processes, applications, or whatever in the console window and its address space, and, somehow, since my shell script was in the stack, when the BAT file quit, the stack and its collection of environment variables were popped back to what they had been.

But eventually I relented, gave up on this little project taking just a few minutes, slowed down, thought a little, read some old notes, discovered that I should change the environment variables within my ObjectRexx script, using an ObjectRexx function for that purpose, as needed by C/C++ CLG, found the needed changes, implemented them, and, presto, got a C/C++ CLG script that works while my shell script is running and while I am booted from my drive E.

On to the C question:

For 'types', the test program has

     int i, j, k;

For

     i = j+++++k;

my guess was that this would parse only one way,

     i = (j++) + (++k)

and be legal. And as I recall (though I likely no longer have good notes), some years ago on OS/2, PC/DOS, or an IBM mainframe,

     i = j+++++k;

was legal.

Not now! With the C/C++ tools with

     Visual Studio 2008 x86 32 bit

statement

     i = j+++++k;

gives C/C++ compiler error message

     error C2105:  '++' needs l-value

So, ++ needs an 'l-value' (a 'left value'): an expression, such as a variable, that designates a modifiable object, i.e., something the 'operator' ++ can increment.

So, it wasn't clear how the compiler was parsing. So, I tried

     i = j++ ++ +k;

and it also resulted in

     error C2105:  '++' needs l-value

So, likely the ++ that is causing the problem is the second one.

So, I tried

    i = (j++)++ + k;

and still got

     error C2105:  '++' needs l-value

Then I tried

    i = j++ + ++k;

and it worked as I would hope: k was incremented by 1 and added to the old value of j, the sum was assigned to i, and j was incremented by 1.

Then I tried

    i = j+++k;

Surprise! It's legal! j and k are added and the sum is assigned to i, and then j is incremented by 1.

So, I long ago concluded that, to understand some of the tricky, sparse syntax of the language not clearly explained in K&R, one has to write and run test cases like these. Bummer. But, as below, here I'm significantly wrong.

Possible to make sense out of this?

Maybe: If we start reading

Brian W. Kernighan and Dennis M. Ritchie, 'The C Programming Language, Second Edition', ISBN 0-13-110362-8, Prentice-Hall, Englewood Cliffs, New Jersey, 1988.

in "Appendix A: Reference Manual" on page 191, then we hear about 'tokens' and 'white space' to separate tokens.

Okay, no doubt + and ++ are such 'tokens'.

Continuing, right away on page 192 we have

"If the input stream has been separated into tokens up to a given input character, the next token is the longest string of characters that could constitute a token."

I would have said "up to and including a given input character", but K&R are 'sparse'!

So, with this parsing rule, in

     j+++k

the tokens are

     j ++ + k

which is essentially

     (j++) + k

which is legal, but in

     j+++++k

the tokens are

     j ++ ++ + k

which would be essentially

     (j++)++ + k

where the second ++ does not have an 'L-value' to act on.

So, my remark that

     j+++++k

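can parse only one legal way is irrelevant, because that is not how C parsing works: the characters are first split into tokens, greedily, without regard to whether the resulting tokens can then form a legal expression.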

Basically, I was assuming a 'token getting' parsing rule like ones I've implemented a few times in my own work: There are tokens and delimiters, and a 'token' is the longest string of characters bounded by delimiters but not containing a delimiter. The delimiters are white space, (), etc.

K&R seems to have a point: My parsing rule would have trouble with just

     j>=k

and, instead, would require writing

     j >= k

which I do anyway.

Generally, though, the C syntax is sparse and tricky, so tricky that it is likely to be error prone.

Back to writing Visual Basic .NET.