exxo_'s comments

exxo_ · on June 2, 2021

If one can measure the whole boot process and verify the attestation "remotely", why would he need secureboot on top of that?

viraptor · on June 2, 2021

You need secureboot to be able to ensure that the boot process is the one you set up. Otherwise the attacker can observe it once and replace it with their own version doing whatever they want and saying "yup, here's your magic number, I totally generated in a legit way not read from a saved store".

exxo_ · on Oct 3, 2020

There are several things that can impact performance on "traditional" container runtimes. For example, cgroups, LSMs, seccomp (especially with spectre mitigations), network NS/bridges, etc. There are also more subtle things like being able to do CMA, or deal with shared memory. Most runtimes let you opt out but this becomes difficult to manage and secure with multiple users.

exxo_ · on Oct 3, 2020

It is the same idea, we actually considered it at first. There are some differences in the implementation though and we built enroot with the idea of being more extensible. We also have a plugin for SLURM (https://github.com/NVIDIA/pyxis)

exxo_ · on Feb 5, 2020

Simple unprivileged container runtimes: https://github.com/NVIDIA/enroot https://github.com/hpc/charliecloud

exxo_ · on Nov 23, 2017

Maybe it's time to have national referendums...

exxo_ · on June 13, 2017

Speed vs readability:

https://github.com/coreutils/coreutils/blob/master/src/yes.c

https://github.com/openbsd/src/blob/master/usr.bin/yes/yes.c

avar · on June 13, 2017

One thing to keep in mind when looking at GNU programs is that they're often intentionally written in an odd style to remove all questions of Unix copyright infringement at the time that they were written.

The long-standing advice when writing GNU utilities used to be that if the program you were replacing was optimized for minimizing CPU use, write yours to minimize memory use, or vice-versa. Or in this case, if the program was optimized for simplicity, optimize for throughput.

It would have been very easy for the nascent GNU project to unintentionally produce a line-by-line equivalent of BSD yes.c, which would have potentially landed them in the 80/90s equivalent of the Google v.s. Oracle case.

endgame · on June 13, 2017

Came here to post this, so instead I'll just back you up with a source:

https://www.gnu.org/prep/standards/standards.html#Reading-No...

pge · on June 13, 2017

"Add a programming language for extensibility and write part of the program in that language" wow, that's just asking for trouble...

jerf · on June 13, 2017

Remember that "yes" is the bottom end of complexity here, not the top. As the utility grows larger that stops looking like "asking for trouble" and starts looking like "often the only sensible solution". Expecting people to extend programs in C is often "asking for trouble", after all!

(Lately I've been really tempted to pull a Cato and start terminating my every HN post with "C delenda est." https://en.wikipedia.org/wiki/Carthago_delenda_est)

rjbwork · on June 13, 2017

Never go full Cato.

emodendroket · on June 13, 2017

Yeah, I mean, who even uses emacs?

colund · on June 13, 2017

Youngster!

gertef · on June 13, 2017

That's the LISP programming philosophy of the creator and host of this web forum

k__ · on June 13, 2017

Does writing it in a different language suffice?

xapata · on June 13, 2017

Depends how different.

exxo_ · on June 13, 2017

This is the first revision:

https://github.com/coreutils/coreutils/blob/ccbd1d7dc5189f46...

h2hn · on June 13, 2017

And here is the madness: :)

https://github.com/coreutils/coreutils/commit/35217221c211f3...

alkonaut · on June 13, 2017

And the motivation: it's used as test inputs. Not sure I agree with completely destroying the readability of a perfectly readable file - but ok.

acqq · on June 13, 2017

The motivation, as far as I see, was only that it "may be used."

But see the post of belorn where he argues that the error handling seems to be good.

Paul-ish · on June 13, 2017

Is the date on that accurate, did that happen in 2016? Kinda hurts the theory that they were trying to avoid Unix similarity.

jwilk · on June 13, 2017

It says: Mar 9, 2015

simias · on June 13, 2017

One point in favor of this simple version is that it's immediately obvious that it doesn't do the same thing as the OpenBSD version. In OpenBSD `yes a b c` will only print "a" while in GNU it prints "a b c". I did not catch that when I was reading the more complicated modern version.

pulse7 · on June 13, 2017

(joke) They copied the API of the main() function... :-)

jorvi · on June 13, 2017

At the risk of sounding like a copyright newbie, shouldn't that be covered by just doing a 'clean room' implementation? As long as you can verifiably prove that you didn't copy the source, it should fall under general use (as there's really only one way of doing such a thing), right? Much like Apple can't patent rectangles, although they tried.

nullc · on June 13, 2017

Even if you eventually "win" you already lost when plausible litigation began.

ceronman · on June 13, 2017

Normally I think readability is more important than speed. But in this particular case, I think GNU is doing the right thing optimizing the code to the limit.

This is the beautiful part of Unix: small tools that do only one thing well. Programs following this philosophy are very good abstractions. They do one very well defined thing so you can use them without having to understand how they work. I have used Unix for years and I've never felt the need to read the source code for `yes`. And because they do a very small thing, even if you need to read them, the overhead of optimization is not that much, for example, the optimized GNU yes is just under 100 LOC if you remove comments and help boilerplate. Yes, it's longer than the BSD version, but it's just a matter of minutes to understands what it does.

IshKebab · on June 13, 2017

I totally disagree. Nobody will ever want to use `yes` at 10 GB/s. They will want it to be reliable, and this sort of over-optimisation increases the risk of bugs.

joosters · on June 13, 2017

I've used 'yes' many times to generate huge amounts of data quickly. Back then, it never had the small string optimisation, but you could always run 'yes InsertReallyLongStringHere' to spew out data much faster than /dev/urandom or even /dev/zero

I'm glad it runs fast, and I hope that all OS utilities are optimised (and tested, of course!) instead of making their source code pretty. The fact is, most people want to use programs, not read them.

currysausage · on June 13, 2017

> The fact is, most people want to use programs, not read them.

I want to use safe programs, and programs with readable code are more likely to be properly audited.

joosters · on June 13, 2017

Audited UNIX tools - does such a thing exist?

_cx2w · on June 13, 2017

openbsd.org

khedoros1 · on June 13, 2017

Sounds like you want to stick to something with a non-GNU userspace, apparently.

masklinn · on June 13, 2017

> you could always run 'yes InsertReallyLongStringHere' to spew out data much faster than /dev/zero

That really doesn't make any sense, /dev/zero should be at least as fast as yes.

joosters · on June 13, 2017

/dev/zero should be at least as fast as yes

I agree, all I remember is that when I tried it, /dev/zero sometimes sucked performance-wise. I can't recall the exact circumstances as it was some time ago, and could have been on any of Linux/FreeBSD/SunOS/HP-UX/IRIX - perhaps it was the fastest common way at the time?

On a recent x64 Linux, /dev/zero seems plenty fast enough now:

  $ dd bs=8k count=819200 if=/dev/zero of=/dev/null
  819200+0 records in
  819200+0 records out
  6710886400 bytes (6.7 GB, 6.2 GiB) copied, 0.331137 s, 20.3 GB/s

  $ yes | dd bs=8k count=819200 of=/dev/null
  819200+0 records in
  819200+0 records out
  6710886400 bytes (6.7 GB, 6.2 GiB) copied, 0.959551 s, 7.0 GB/s

Tepix · on June 13, 2017

No need for "dd", let "pv" get the data from /dev/zero directly:

$ pv < /dev/zero > /dev/null [ 16GiB/s]

But the version of yes using vmsplice() is even faster than that on my machine.

pbhjpbhj · on June 13, 2017

What's the line to test `yes` with `pv`?

    pv < /usr/bin/yes > /dev/null

doesn't seem to work properly. FWIW I get 330MiB/s vs 8.4GiB/s for /dev/zero.

[Incidentally first I've heard of pv but I've known about dd for a decade or two].

masklinn · on June 13, 2017

< and > are for file redirection, yes is a binary so you want to pipe its stdout into pv:

    yes | pv > /dev/null

pbhjpbhj · on June 14, 2017

So how did that redirect even work; should we be doing a `mknod` to make a "yes" device to make the comparison work (can we, does it help other than in my naive imagination).

BenjiWiebe · on June 14, 2017

The reason that redirect worked is because it was using the contents of the yes program instead of its output.

kevindqc · on June 13, 2017

I know this works, but how come we can see the output of pv when it is redirected to /dev/null? Maybe I just don't understand how pipes and redirection works since I rarely use Linux :(

masklinn · on June 14, 2017

> I know this works, but how come we can see the output of pv when it is redirected to /dev/null?

From pv's man page:

> Its standard input will be passed through to its standard output and progress will be shown on standard error.

> Maybe I just don't understand how pipes and redirection works since I rarely use Linux :(

The Windows/DOS command line has the same concepts[0], though it's probably used less often: by default a process has 3 FDs for STDIN (0), STDOUT (1) and STDERR (2).

At a shell

* you can feed a file to STDIN via <$FILE (so `</dev/zero pv` will feed the data of /dev/zero to pv), the 0 is optional

* you can pipe an output to a command (other | command) or the output of a command to an other (command | other)

* you can redirect the STDOUT or STDERR to files via 1> and 2> (the "1" is optional so 1> and > are the same thing) (you can redirect both independently)

* you can "merge" either output stream to the other by redirecting them to &FD so 1>&2 will redirect STDOUT (1) to STDERR (2) and 2>&1 will redirect STDERR (2) to STDOUT (1), you can combine that with a regular redirection so e.g. `command >foo 2>&1 ` or with a pipe (`command 2>&1 | other`)

And you can actually create more FDs to use in your pipelines[1] though I don't remember ever seeing

[0] https://support.microsoft.com/en-us/help/110930/redirecting-...

[1] http://tldp.org/LDP/abs/html/io-redirection.html

wutbrodo · on June 13, 2017

You're redirecting stdout to /dev/null and pv is writing to stderr. If you use &> instead, stderr and stdout will both be redirected to /dev/null and you will see no output at all.

moosingin3space · on June 13, 2017

Maybe this?

    yes | pv > /dev/null

wutbrodo · on June 13, 2017

That's not quite apples-to-apples though:

cat /dev/zero | pv > /dev/null

uses a pipe like the 'yes' line above, but runs substantially slower on my computer than the redirect-only version.

MichaelBurge · on June 13, 2017

'yes' to a file is how I sometimes benchmark disk speed. It should have fewer system calls than reading from /dev/zero and then writing.

I actually checked just now, and it looks like you'd be making twice as many system calls with /dev/zero compared to generating the data locally:

    strace dd count=1 bs=512 if=/dev/zero of=test
    open("/dev/zero", O_RDONLY)             = 3
    dup2(3, 0)                              = 0
    close(3)                                = 0
    lseek(0, 0, SEEK_CUR)                   = 0
    open("test2", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
    dup2(3, 1)                              = 1
    close(3)                                = 0
    read(0,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
    write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
    close(0)                                = 0
    close(1)

While 'strace yes > test2' is just a constant stream of write() calls.

The difference matters if you're benchmarking e.g. some new SSD compared to a tmpfs on a machine with 100+ GB of RAM. It's always better if the tools have less overhead, because the comparison is more meaningful.

Also consider that it can be faster to write to a local network than to disk. I've never done it, but I imagine that the kernel's not going to want to deal with your /dev/zero calls if it's spending all of its time writing to a 10GB switch. I can imagine some very specialized storage servers that could spend most of their time writing from memory buffers to a network switch, or if you're troubleshooting a slowdown in the networking itself.

wolfgang42 · on June 13, 2017

When I started this comment, I didn't think you were measuring what you thought you were measuring with those straces. 'strace yes > test2' only watches 'yes', not '> test2' (which is handled by the shell). Here's what your command outputs:

    $ strace yes > /tmp/test2
    [various initialization steps]
    write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    [...]

To measure everything, I started up a whole new shell, expecting to see a 'write(1, ...)', a 'read(1)', and a 'write(f, ...)':

    $ strace -f sh -c 'yes > /tmp/test2'
    [various initialization steps]
    [pid 15839] write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    [pid 15839] write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    [pid 15839] write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    [pid 15839] write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    [...]

How does this possibly work? File descriptor 1 is supposed to be the terminal, not a file! Of course, the magic of file redirection:

    open("/tmp/test2", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3 # Open the file, get FD 3
    fcntl(1, F_DUPFD, 10)                   = 10 # Copy FD 1 (the terminal, STDOUT) to FD 10 temporarily
    close(1)                                = 0  # Close original FD 1
    fcntl(10, F_SETFD, FD_CLOEXEC)          = 0  # Close FD 10 (the terminal, copy of STDOUT) when exec() is called
    dup2(3, 1)                              = 1  # Copy FD 3 (the file) to FD 1 (STDOUT)
    close(3)                                = 0  # Close FD 3 (the file's original descrptor)

kccqzy · on June 13, 2017

I had the impression that you thought "yes" pipes the output to shell, and then the shell writes to disk. That's incorrect. " > file" means redirecting to a file, and therefore all write system calls actually write to disk.

wolfgang42 · on June 13, 2017

That's exactly what I thought. Now, had you asked me how redirection worked I would have said "the redirection operators cause the shell to attach a file to the descriptor," but I'd never actually thought through the implications of that in terms of what syscalls get made, and the output of strace presented a rather visceral demonstration of the implications of this clever bit of design.

yk · on June 13, 2017

The ability to get 10 GB/s dummy date into a pipe from the command line could come in handy at some point, for stress testing or something. I am not sure if it is over-optimized. (And even with the optimizations the risk of bugs should be very small.)

masklinn · on June 13, 2017

Of course you could just pipe from /dev/zero which will easily do 10GB/s on every machine.

yk · on June 13, 2017

Just tried it,

    cat /dev/zero|pv >/dev/null

gives me roughly 2/3 the speed of yes. (5 GB/s vs 7 GB/s) Plus yes gives you an arbitrary string, instead of only zeroes.

Tepix · on June 13, 2017

Tobik already wrote it

but here's todays Useless Use of Cat Award http://porkmail.org/era/unix/award.html

e12e · on June 14, 2017

Well, sometimes using cat is faster, as I discovered to my surprise recently:

https://news.ycombinator.com/item?id=14414610

But in that case one could probably say that Gnu awk is just not very good at input handling (as Mawk doesn't appear to benefit from an extra "cat").

tobik · on June 13, 2017

That's because you're using cat and a pipe. Try this instead:

  pv > /dev/null < /dev/zero

mikeash · on June 13, 2017

For anyone who is like me and finds it uncomfortable that things are now out of order, note that you can still put the input redirection in front:

  </dev/zero pv >/dev/null

vram22 · on June 13, 2017

Or even this:

pv </dev/zero >/dev/null

which is a common way of doing it (for any command with any inputs and outputs, not just the above ones), i.e.:

command < input_source > output_dest

All three pv command invocation variants, the one here and the two above, work. And it becomes more clear why they are the same, when you know that the redirections are done by the shell + kernel, and the command (such as pv) does not even know about it. So in all three cases, it does not see the redirections, because they are already done by the time the command starts running (with its stdin and stdout redirected to / from those respective sources). And that is precisely why the command works the same whether it is reading from the keyboard or a file or pipe, and whether it is writing to the screen or a file or pipe.

duozerk · on June 13, 2017

I believe /dev/zero writes data one byte at a time; that's likely the reason why.

[edit] That's actually inaccurate (and badly expressed), see comments below.

adwn · on June 13, 2017

/dev/zero doesn't "write" anything in the sense that yes writes, since it's a character device and not a program. The Linux kernel's implementation of /dev/zero does not write one byte at a time.

duozerk · on June 13, 2017

You're right of course; and actually I believe the kernel will simply provide as many bytes as the read() requested; so the speed should mostly depend on how you access /dev/zero. IE, the user above was using cat and I think with dd and a proper block size it'd be much faster.

sqeaky · on June 13, 2017

I was under the impression that cat automatically used a sane size for reading. Now that I think of it I cannot think of a source, other that to point out my own anecdotal experience.

When I was writing raspbian images to SD cards to use on a raspabery Cat and DD took within a few seconds of each other on an operation longer then a minute. Since then I have been using cat where I could, but I didn't think to right down the numbers though.

e12e · on June 14, 2017

For something somewhat related, see parts of this thread:

https://news.ycombinator.com/item?id=14414610

Note that cat+gnu awk was faster than just gnu awk - but mawk was faster still (reading a not entirely small file).

And in a similar vein of gp comparing Gnu and Openbsd, note that openbsd cat is a little more convoluted than the simplest possible implementation (at least to my eyes):

https://github.com/openbsd/src/blob/master/bin/cat/cat.c

https://github.com/coreutils/coreutils/blob/master/src/cat.c

(That is, Gnu "cat" and OpenBSD "cat" are less different than Gnu "yes" and OpenBSD "yes").

problems · on June 13, 2017

yes will give repeated data though, not just zeros, seems like it's more useful here - as others have pointed out, also uses less syscalls than /dev/zero.

belorn · on June 13, 2017

If you want to maximize readability and simplicity then writing `yes` in C is a bad choice. It is much easier, cleaner and shorter to write it in python or just use the shell which would normally be using the `yes` in the first place. Since `yes` is used in shells and builtins can be considered very reliable, here is a implementation as a shell function:

    yes(){ while :; do echo "${1:-y}"; done; }

Python:

    import sys 
    while True:
      if len(sys.argv) > 1:
        print(sys.argv[1])
      else:
        print("y")

And if you don't need the features of `yes` and only need a loop that prints 'y', then there really is hard to beat the simplicity of:

  while:; do echo "y"; done

raptorfactor · on June 13, 2017

Is it really "easier, cleaner and shorter to write it in python"? Did you look at the OpenBSD implementation?

https://github.com/openbsd/src/blob/master/usr.bin/yes/yes.c

It's essentially line for line identical to your python code...

belorn · on June 13, 2017

The C example has three includes, two conditionals, two loops, and one function definition. The python example has a single include, conditional, and loop.

For readability purpose it is easier to go through each lines of the python program than the OpenBSD C code. Its not massively different, but its distinguishable enough that I would choose the python version if I wanted to maximize readability, minimize syntax requirement and did not want to use shell script.

The Shell function is in my view the superior choice if the audience is a programmer than know the shell script syntax. It is just a single loop and is written in the environment that the program is intended to be used in. The only drawback there is the speed.

mikeash · on June 13, 2017

Most of what makes the C program bigger comes from the fact that the C program does more. Your python example doesn't call pledge(). Remove that from the C program and it drops to one include, one conditional, and two loops. Further, counting the two loops against C doesn't make any sense: it's entirely up to the programmer whether to have a conditional containing two loops, or a loop containing a conditional. Both languages could naturally do it either way.

teamhappy · on June 13, 2017

> The python example has a single include, conditional, and loop.

... and python. Don't forget to count python.

BenjiWiebe · on June 14, 2017

Exactly. That's the reason I write stuff in C instead of my favorite interpreted language, Ruby. When you write something in C, that's it. No large interpreter plus runtime needed.

_cx2w · on June 13, 2017

[flagged]

sctb · on June 13, 2017

You've been on a tear of uncivil and unsubstantive commenting, and it has to stop. Often a good strategy for this is to slow down. High-quality reflective posts are what we're after instead of dismissive, snarky reflexive ones, and the former come more slowly.

_RPM · on June 13, 2017

This is innificient. Why do the argv check in every iteration of the while loop? It's not going to change between iterations.

th4dv · on June 13, 2017

> Nobody will ever want to use `yes` at 10 GB/s

Just because you say so?

bastawhiz · on June 13, 2017

If the speed of yes is bounded by memory speed, doing anything useful will almost certainly consume that data at a far lower rate. Putting it on a disk, pushing it over a network, etc. will almost always be slower than yes is able to generate data.

cestith · on June 13, 2017

The typical use case is piping it into another running program. Maybe someone wants to do that really quickly rather than putting it on disk or pushing it over a network.

tomohawk · on June 13, 2017

It's not about running 'yes' at 10GB/s, it's about less overhead to do a simple job. If this version of yes is 100x faster, that implies it using 1% of a cpu to do the same work that would otherwise occupy 100% of a cpu. This leaves more of the machine to do what is likely to be the intended task.

0xbadcafebee · on June 13, 2017

Unix userland tools have evolved over decades to be as efficient as possible because they have historically underpinned everything the operating system does. The faster each tool works, the faster the other tools that depend on them work. If increased efficiency results in a bug, that bug can then be fixed, making it a net gain for system stability.

avar · on June 13, 2017

I'm having a hard time imagining a case where even a grossly complex 100,000 line implementation of yes(1) couldn't be trivially proven to be correct.

mikeash · on June 13, 2017

Easy! Write a program that does a brute force check of Goldbach's conjecture on all integers. For each integer that passes the check, print a line. If you can prove this correct (or incorrect) you'll probably win a Fields medal.

acdha · on June 13, 2017

I agree with the general point that you could prove such a simple program correct relatively easily but that does still have a cost, which is always a concern in an open-source project. You still need someone to step up and do that work and continue to verify it periodically in the future – if that code is doing complicated things with buffering, that opens up possible odd bugs due to stdlib, gcc, maybe even kernel behaviour changes which might not affect simpler programs.

Not a huge bit of work to be sure but for a non-commercial project you might have trouble finding a volunteer who cares about that tedium.

_cx2w · on June 13, 2017

Absolutely, I once trivially proved a 1000,000 line implementation of return 0 to be correct. I don't know why all comments are bothered by how much overkill this yes implementation is. Maybe they don't hold 100,000 PhDs like we do, am I right?

nobodyorother · on June 13, 2017

Hey, 640K is enough for anyone!

YSFEJ4SWJUVU6 · on June 13, 2017

It's not really optimized to the limit – or perhaps it is, but then the limit is fairly easy to reach.

When I saw this item here I reached for the terminal and wrote a simple (and non-compliant) alternative that simply initialises 8150 bytes of "y\n" and then loops a write forever. I understand that it is not a fully standard-compliant yes, and that maybe GNU yes is indeed fast, but that awfully simple program that takes all of 10 lines (everything included) and took me all of a minute to write performs just as well as far as pv is concerned.

(I eventually completed a feature complete yes but I still think that simply not using `puts` is hardly optimising to the limit.)

scandox · on June 13, 2017

Yeah I got a surprise when I wanted to see how strlen was implemented:

https://github.com/lattera/glibc/blob/master/string/strlen.c

_wmd · on June 13, 2017

If you're on amd64 that's the wrong file:

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86...

The pcmpeqb instruction is from SSE 4, it compares 16 bytes per op

belorn · on June 13, 2017

A nitpick, but I notice that the BSD implementation do not catch errors when printing. In theory it could get EINTR and only write partial amounts of argv[1], especially if the argument string is really long and the program runs for a extended amount of time. The GNU version do catch EINTR.

Naturally it could be that the C function puts used by most people in OpenBSD is implemented with built in EINTR catching loop, or that OpenBSD do not interrupts writes.

joosters · on June 13, 2017

You'd have to check the various POSIX standards to be sure - and have to then verify that the OS/libc actually follows them, but you can pretty much rely on every libc's I/O wrapper functions to handle interrupted system calls or incomplete writes. I've never seen any code check the return value of a printf() to verify that all the characters were printed.

As you say, the GNU versions definitely handle EINTR - the linux man page for puts() just says it returns a non-negative value on success, it's not even specified whether it returns the number of bytes written or not.

LukeShu · on June 13, 2017

`puts` is an "stdio" function, not a system call. It won't EINTR, it correctly resumes if the underlying write() EINTRs. If it does get an error, it will return EOF, then you'll have to call `ferror()` to find out which error.

But the point stands; there's no error handling in the OpenBSD version. But that could be considered a design decision; the OpenBSD version never gives up on writing data until you kill it; the GNU version bails at the first error.

asveikau · on June 13, 2017

I think the GNU version is surprisingly readable for what it does. I have seen code that is a lot less readable than this and doesn't have the performance benefits described in the article.

agumonkey · on June 13, 2017

I wonder how much of BSD is written in this canonical style.

I think unix v6 was mostly as clear. Linux suffered the real world penalty. But maybe BSD managed to keep it's source poetic.

hultner · on June 13, 2017

In general I love reading the code base of BSD-systems, NetBSD in particular is really beautiful and easy to follow.

realusername · on June 13, 2017

I generally look at some BSD sources when I want to know how some unix tools are working, they are always much much more readable than the GNU equivalent.

zakk · on June 13, 2017

I am wondering why the GNU version uses

atexit (close_stdout);

Aren't all streams closed at exit? And why closing stdout, anyway?

tyingq · on June 13, 2017

It's to make sure the buffer is flushed. Streams are closed, but not explicitly flushed at exit. Stdout only because it's the only filehandle with a buffer in yes.

LukeShu · on June 13, 2017

In normal use, GNU `yes` does unbuffered IO on stdout. However, it does use buffered IO for --help and --version messages; it sets atexit(close_stdout) to cover both of those cases at once, rather than handling them both separately.

mallaidh · on June 14, 2017

And for a third example, Busybox's implementation:

https://git.busybox.net/busybox/tree/coreutils/yes.c

_pfxa · on June 13, 2017

Mah, how many times do you read a program and how many times do you execute it?

masklinn · on June 13, 2017

According to the article/experiment you don't have to bork up the code much, just copy your stuff into a big buffer before you start printing it (and print using write(2) instead of puts)

microtherion · on June 13, 2017

This works easily for the default case, which prints "y\n" (two bytes), which is likely to divide BUFSIZ. To handle the general case with the same efficiency, you have to have a buffer size that is a multiple of both BUFSIZ and the length of the string to be printed. It appears that GNU yes will not do that and simply does unaligned writes in this case, which is likely to be considerably slower (possibly slower than write would have been).

masklinn · on June 13, 2017

> It appears that GNU yes will not do that and simply does unaligned writes in this case, which is likely to be considerably slower (possibly slower than write would have been).

Why would it be slower to do a single, say, 8190 bytes write instead of 2730 3-byte writes?

AstralStorm · on June 13, 2017

Generally writes not aligned to cache line are slightly slower on most common architectures, vastly slower on others. (Such as many MIPS)

Small write calls themselves incur a considerable syscall overhead.

emmelaich · on June 13, 2017

This is why Linux and GNU has won.

It's just a trade off. For utilities whose behaviour doesn't change we're happy to improve the speed.

hultner · on June 13, 2017

And it's also a source of many bugs (in the past and likely in the future as well). In most use cases today I'd rather have a slightly slower userland which is easily read (and audited) then one which compromises quality for speed in edge cases.

kw71 · on June 13, 2017

IOW, Can't leave anything alone.

_lqaf · on June 13, 2017

You're always welcome to port old, slow utilities forward. That's the beauty of open source!

kw71 · on June 13, 2017

This work certainly has value but it is frustrating to keep up with everything being changed. And if changes affect me do the fashions and attitudes of those making the changes have synchronicity with the way that I use computers?

In the old days I think we would have left yes in c because the compiler will build it on any platform.

My first experience with Linux is porting land.c to one of the commercial Unix. Now I first met TCP/IP on a 3B2 and later met systems sold by SMI and DEC and SCO and we all mostly constructed packets the same way and a guy called Stevens had written some nice books about this that everyone had. I think I recall the commercial and free BSD also did things the usual way. But whoever figured out this interesting phenomenon land.c demonstrated happened to be a Linux user and this platform had some different ideas about it. I saw rewriting the relevant parts as an annoying, menial task.

exxo_ · on Sept 13, 2016

We (NVIDIA) recently moved away from Quay/Github/Jenkins to Gitlab for our deep learning automation and the experience so far has been truly amazing. We were able to automate our most complex DL container pipeline in a matter of days. We still have to workaround some Gitlab limitations (e.g. issues [CE]17069, [CE]18994, [CE]18106, [EE]224) but overall it's great to see everything working in harmony (i.e. Docker registry, CI pipelines, Git repositories, Runners on-premises). On a personal note, I would like to see more storage on Githost.io instances considering the fact that you can't easily delete pipeline traces and that Docker images can quickly add up.

sytse · on Sept 14, 2016

Thank you so much for commenting. It is great to hear that the deep learning automation department of Nvidia is using GitLab and is happy with everything working in harmony.

Regarding your suggestions:

I asked to prioritize https://gitlab.com/gitlab-org/gitlab-ce/issues/17069

We're already actively discussing https://gitlab.com/gitlab-org/gitlab-ce/issues/18994

Not sure about https://gitlab.com/gitlab-org/gitlab-ce/issues/18106

https://gitlab.com/gitlab-org/gitlab-ee/issues/224 looks interesting

Please comment in the issues if you have additional details about the use case or questions.

The costs of GitHost.io correlate with the storage since they are Digital Ocean instances. Not sure how to solve. Maybe by allowing to use their networked storage, but this seems complex. Consider emailing support@gitlab.com if you have any questions or suggestions.

Snappy · on Sept 14, 2016

You can also switch your container registry to use S3, which might be more cost-effective. I'm not positive if GitHost.io supports that, but it likely does.

exxo_ · on May 20, 2016

I think you are missing the point, it has nothing to do with installing the NVIDIA drivers through Docker.

What you are showing[1] is how to install NVIDIA drivers on CoreOS the hackish way (not persistent, no driver libs, no DKMS, no UVM, no KMS...)

Regarding rkt, it's not supported at the moment but a similar approach could be taken. As for the Docker CLI wrapper, you can avoid it if you really need to.

joshuak · on May 21, 2016

While my code here is definitely hackish, I can't argue with that, I have to say I'm hard pressed to see how running a container to activate a driver is hackish when the comparison at hand is modifying the Docker CLI and requiring a Docker plugin.

I run the driver container at startup, and never shut it down. How is this not persistent? DKMS and other build/deployment choices are not obviated by my approach, so I'm not sure that's relevant.

Looking more deeply at the "Why NVIDIA Docker" in the repo wiki doesn't provide any enlightenment either. In fact it doesn't really explain why docker itself must be modified. The only explanation really is lack of container portability, but driver containers are portable within the scope of a given kernel version. Certainly modified docker cli and plugin requirements are much less portable.

It seems to me like someone at nVidia simply didn't realize that they could run a container in privileged mode and effectively install the driver system wide for all containers.

exxo_ · on May 21, 2016

If you want more insights, I suggest you read the section "Internals".

I'm not going to dwell on the details but there are many reason why doing so can go horribly wrong. Believe me, we (NVIDIA) evaluated our options and know the implications of running our drivers within containers.

Do you really know what --privileged do? If so, you know that there is no such thing as "install the driver system wide". For that you would have to circumvent the namespaces and a bunch of other things that Docker put in place.

"portable within the scope of a given kernel" [and driver] "version"

Well that's not what I call portable :) With nvidia-docker you can build a CUDA image on your laptop and deploy it anywhere in the cloud or on premises without a single modification.

joshuak · on May 21, 2016

Ok great, thanks! I'll check this out when we run into these issues.

exxo_ · on Feb 20, 2016

You can already have dots in IPv6 addresses if it's an IPv4-mapped/compatible address (e.g. ::FFFF:129.144.52.38)

exxo_ · on Feb 4, 2016

You need the NVIDIA drivers and the nvidia-docker plugin.

$ docker-machine create --driver amazonec2 --amazonec2-instance-type g2.2xlarge ...

$ docker-machine ssh <host> # install the NVIDIA driver and nvidia-docker-plugin

$ eval `docker-machine env <host>`

$ ssh-add ~/.docker/machine/machines/<host>/id_rsa

$ NV_HOST="ssh://ubuntu@<ip>:" nvidia-docker run mybuild/tensorflow

Step 2 can be skipped if you use a custom AMI.