Actually, 'find' will also stat() each entry no matter what.
Many of the standard tools that most people would intuitively expect to be well optimized (find, rsync, gzip) are embarrassingly inefficient under the hood and turn belly up when confronted with data of any significant size.
That probably stems from the fact that most of the development on these tools took place at a time when 1 GB hard drives were "huge" and SMP was "high end".
The only issue I'm aware of with gzip is actually in zlib, which stored its byte counters in 32 bits; but those counters were strictly optional, and it works fine with data that overflows them. The zlib window size may be only 32K, but bzip2 doesn't do that much better with 900K blocks and a better algorithm, so I wouldn't call it embarrassingly inefficient.
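If it helps, here's roughly where that 32K figure comes from. A minimal stdin-to-stdout compressor sketch, modeled on zlib's own zpipe.c example (assumes zlib is installed): deflateInit2() caps windowBits at 15, i.e. a 2^15-byte history window; adding 16 merely requests a gzip wrapper.

    /* Sketch: compress stdin to stdout with zlib's gzip framing.
       windowBits tops out at 15 (a 32K LZ77 window); +16 = gzip header. */
    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    int main(void)
    {
        z_stream strm;
        memset(&strm, 0, sizeof(strm));

        /* 15 is the largest window zlib accepts; compare bzip2's
           900K blocks. Error handling kept minimal for brevity. */
        if (deflateInit2(&strm, Z_BEST_COMPRESSION, Z_DEFLATED,
                         15 + 16, 8, Z_DEFAULT_STRATEGY) != Z_OK)
            return 1;

        unsigned char in[16384], out[16384];
        int flush;
        do {
            strm.avail_in = fread(in, 1, sizeof(in), stdin);
            flush = feof(stdin) ? Z_FINISH : Z_NO_FLUSH;
            strm.next_in = in;
            do {
                strm.avail_out = sizeof(out);
                strm.next_out = out;
                deflate(&strm, flush);
                fwrite(out, 1, sizeof(out) - strm.avail_out, stdout);
            } while (strm.avail_out == 0);
        } while (flush != Z_FINISH);

        deflateEnd(&strm);
        return 0;
    }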
How do you tell a file from a directory without stat()ing it? The d_type field is not portable. Since find and other tools like it need to recursively descend a directory tree, a stat() for each file to determine its type is unavoidable.
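For what it's worth, here's a sketch of the compromise portable tools end up with: trust d_type on filesystems that fill it in, and fall back to one lstat() per entry otherwise. The DT_* constants are a BSD/glibc extension rather than POSIX, hence the #ifdef; the fixed-size path buffer is just to keep the sketch short.

    /* Sketch: recursive descent using d_type when available,
       falling back to lstat() when the filesystem reports DT_UNKNOWN
       (or on systems without d_type at all). */
    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>

    static void walk(const char *path)
    {
        DIR *dir = opendir(path);
        if (!dir)
            return;

        struct dirent *ent;
        while ((ent = readdir(dir)) != NULL) {
            if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
                continue;

            char child[4096];
            snprintf(child, sizeof(child), "%s/%s", path, ent->d_name);

            int is_dir;
    #ifdef DT_DIR
            if (ent->d_type != DT_UNKNOWN) {
                /* Fast path: the filesystem told us the type for free. */
                is_dir = (ent->d_type == DT_DIR);
            } else
    #endif
            {
                /* Slow path: one lstat() per entry, as noted above. */
                struct stat st;
                if (lstat(child, &st) != 0)
                    continue;
                is_dir = S_ISDIR(st.st_mode);
            }

            puts(child);
            if (is_dir)
                walk(child);
        }
        closedir(dir);
    }

    int main(int argc, char **argv)
    {
        walk(argc > 1 ? argv[1] : ".");
        return 0;
    }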
But times have changed, and development isn't dead. Why haven't they been updated? The optimizations you're implying are often straightforward and well understood, not major undertakings to implement.
"But times have changed, and development isn't dead. Why haven't they been updated?"
Maybe because listing 8M files is not a common use case, and there just isn't the motivation to rework otherwise perfectly working code. It's not an itch anyone needs to scratch.