Would love to have a database filesystem and be able to use standard Linux tools to query it, make changes, back it up, etc.
For example, having a folder of contacts with each file named after the person and having key/value pairs. Similar to how static site generators use YAML/TOML/JSON.
Open your nearest shell and experiment with this. In my scripts and programs I've stored data just in files many times: simple, transparent and effective. Works everywhere. On today's hardware you can brute-force your way through most searches; unless you have a huge amount of data or a lot of concurrent tasks, it doesn't matter much.
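As a minimal sketch of that approach (the contact layout and field names here are invented, not any standard):

```python
import pathlib

# One file per contact, named after the person; each line is "key: value",
# like the front matter static site generators use.
contacts = pathlib.Path("contacts")
contacts.mkdir(exist_ok=True)
(contacts / "ada-lovelace.txt").write_text("email: ada@example.com\ncity: London\n")

def load(path):
    """Parse a contact file into a dict of key/value pairs."""
    return dict(line.split(": ", 1) for line in path.read_text().splitlines())

# The "query" is a brute-force scan over every file, grep-style.
for path in contacts.glob("*.txt"):
    fields = load(path)
    if fields.get("city") == "London":
        print(path.stem, fields["email"])
```

From the shell, `grep -l 'city: London' contacts/*.txt` does the same query.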
This reminds me a little bit of the filesystem from BeOS (RIP) and Haiku. You can attach any metadata to any file you want, and it has a system to create queries on that metadata.
Very much miss this. I love how they used this technique to build their email client. Emails were just files on the filesystem with a filesystem plugin that made their metadata queryable. So the email client was just a file browser window with some pre-saved queries that would let you find all emails by date, sender, etc.
If you do this, you can add all the files to git and get historical views, changelogs... and if you distribute the "database", you can even know who made what changes!
Along the lines of what I was thinking. Would be really neat! I already do this to an extent, but with a virtual file system it could automatically sort/categorize/etc like a regular database.
This makes me wonder how efficient filesystems are with millions of files in a single directory. Do they create some sort of index? Are there limits to the number of files a directory can hold? Is that what inodes are for? I remember seeing “inode” limits on some VPS I was using a while back.
Indeed, putting too many files in a single directory is inefficient, both for read and write. When writing a file (or changing metadata like permissions), the entire directory inode may have to be rewritten. When searching for a file or opening it, the entire file list for the directory needs to be read (in the worst case where the file is at the end of the list).
When you need to store lots of files on disk, it's a common pattern to spread them out in subdirectories. For instance, instead of `files/2d8af74bcb29ad84`, you would have `files/2d/8a/f74bcb29ad84`.
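A sketch of that sharding scheme (the helper name and parameters are just for illustration):

```python
import pathlib

def sharded_path(root, name, levels=2, width=2):
    """Spread files across subdirectories by peeling off fixed-width prefixes."""
    parts = [name[i * width:(i + 1) * width] for i in range(levels)]
    return pathlib.Path(root, *parts, name[levels * width:])

print(sharded_path("files", "2d8af74bcb29ad84"))
# -> files/2d/8a/f74bcb29ad84; each level holds at most 256 subdirectories
```

With hex names, two 2-character levels cap every intermediate directory at 256 entries, which any filesystem handles comfortably.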
> When writing a file (or changing metadata like permissions), the entire directory inode may have to be rewritten.
That isn't how inode filesystems work -- if you change a file's permissions, it's just an inode update on the file -- not the containing directory.
Even in DOS-type filesystems (FAT/exFAT), it's just a record update in the corresponding dirent for that file.
If you add a new file to a directory, that causes an mtime update on the directory's inode.
The rest is accurate -- many older filesystems have lookup performance that scales poorly with directory size (for DOS filesystems and BSD UFS, you have to do a full directory scan). Also ls defaults to sorting output, which is O(N log N) and can be slow in large directories.
Large directory sizes suck on NFS and parallel file systems like Panasas, Lustre, GPFS, and the like. Python and Rust's Cargo also suck on networked file systems and would be greatly improved by pushing things into a sqlite file.
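A hedged sketch of that idea using Python's built-in sqlite3 module (the schema and names are invented for illustration):

```python
import sqlite3

# One SQLite file instead of thousands of tiny files on a network mount.
con = sqlite3.connect("cache.db")
con.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, data BLOB)")

def put(path, data):
    con.execute("INSERT OR REPLACE INTO files VALUES (?, ?)", (path, data))
    con.commit()

def get(path):
    row = con.execute("SELECT data FROM files WHERE path = ?", (path,)).fetchone()
    return row[0] if row else None

put("pkg/__init__.py", b"# package marker\n")
print(get("pkg/__init__.py"))
```

On a networked filesystem this trades a metadata round trip per file for a handful of sequential reads from one file.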
most things suck on networked file systems. I still have trauma from having to use NFS more than a decade ago, on systems that would freeze on boot because the network couldn't be reached or the NFS server was down.
I'd also never put SQLite on NFS. Locking is often broken on NFS, unless you can guarantee a homogeneous environment. I can imagine some Excel guy in marketing launching his SQLite UI on Windows and completely hosing it all.
NFS locking broken... Excel? SQLite UI on Windows? Are you sure you're not confusing SMB with NFS? I've had SMB file caching muck everything up. Not NFS.
That's sometimes true, but it depends a lot on the filesystem used, and sometimes on the options used when creating it. ReiserFS, ext4 and FAT32 won't have the same performance profile.
I think breaking large collections into subdirectories is mainly done to ensure things work even on filesystems that don't deal well with very large directories, and because it makes inspecting the files manually a little more convenient: many file explorers (especially the GUI ones) have trouble with large directories.
What about on the other end? Why not have each hex digit be a directory along a path? Then you have very, very few files per directory at the cost of deeper hierarchy. What's the practical downside?
Directories are just another type of file. So, if you do this, and your filenames are n characters long, you'll end up needing to do n file accesses just to find the file you're looking for. Unless the underlying file system does something to make that particular access pattern fast, well... it's going to be stupidly slow after a certain point.
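To put rough numbers on it, a quick comparison of path depth (and hence component lookups) under each scheme:

```python
name = "2d8af74bcb29ad84"

deep = "/".join(name)                                  # one hex digit per level
shallow = "/".join([name[:2], name[2:4], name[4:]])    # two 2-char levels

print(deep, deep.count("/") + 1)        # 2/d/8/a/... -> 16 components to resolve
print(shallow, shallow.count("/") + 1)  # 2d/8a/f74bcb29ad84 -> 3 components
```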
> Unless the underlying file system does something to make that particular access pattern fast
I don't know exactly how Linux does it. Windows hands off whole paths to the filesystem, so this idea is possible there.
In FreeBSD, there is a generic routine (lookup(9)) that goes component by component, so at each step the filesystem is only asked to resolve a single component to a vnode. I think a clever filesystem implementation (in FreeBSD) could look at the remaining path and kick off asynchronous prefetch... but I am not aware of anything doing this.
There are two modes you can implement for your filesystem: in the so-called 'high level' mode you get the whole path. In the 'low level' mode the filesystem asks you for one piece of the path at a time.
The low level mode seemed faster in my tests, and I think it's also closer to how the Linux kernel works internally?
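For context, here's what the high-level (path-based) mode looks like; a minimal read-only sketch assuming the third-party fusepy bindings and an existing mount point:

```python
import errno, stat, time
from fuse import FUSE, FuseOSError, Operations  # pip install fusepy

DATA = b"hello from userspace\n"

class HelloFS(Operations):
    """High-level mode: every callback receives a full path string."""
    def getattr(self, path, fh=None):
        now = time.time()
        if path == "/":
            return dict(st_mode=stat.S_IFDIR | 0o755, st_nlink=2,
                        st_atime=now, st_mtime=now, st_ctime=now)
        if path == "/hello":
            return dict(st_mode=stat.S_IFREG | 0o444, st_nlink=1, st_size=len(DATA),
                        st_atime=now, st_mtime=now, st_ctime=now)
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        return [".", "..", "hello"]

    def read(self, path, size, offset, fh):
        return DATA[offset:offset + size]

if __name__ == "__main__":
    FUSE(HelloFS(), "/tmp/mnt", foreground=True, ro=True)
```

The low-level mode (e.g. the pyfuse3 bindings) instead resolves one (parent inode, name) pair per lookup call, which mirrors the kernel's own component-by-component resolution.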
> Are there limits to the number of files a directory can hold?
Yes, depending on the file system.
For example, ext4 with default settings uses 32-bit hashes in its directory index. Collisions are normally tolerated, but once enough filenames hash to the same value to fill an index block, you can no longer add such files to the directory (ENOSPC error).
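For intuition on how quickly collisions appear, the birthday bound for a 32-bit hash (a back-of-the-envelope estimate, not ext4's exact on-disk scheme):

```python
import math

bits = 32
# Birthday bound: number of random names for a ~50% chance that two share a hash.
n_half = math.sqrt(2 * 2**bits * math.log(2))
print(f"~{n_half:,.0f} entries for a 50% collision chance")  # ~77,000
```

So a directory with millions of entries is all but guaranteed to contain collisions; ext4 tolerates them until the degenerate case above, where a whole index block shares one hash value.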
Because it tries to make people work in a way they don’t want to work. People want to organize their files in folders. They don’t want to tag them all with labels or tags and they don’t want to input all that metadata.
Perhaps they want to tag a few, and it’s useful to have some autodetected metadata but they don’t want to tag them all and they don’t want a gigantic ‘Untagged files’ list. They want folders and they want more than a flat folder list, they want nested folders.
You can implement it, it’s not hard and most modern file systems have all the features you need. But users will hate it and won’t use it the way you want.
You are right that users don't want to do that busy work.
But I am less sure users actually want folders.
Some power-users, sure. But most normal people don't want to deal with folders, either.
For evidence: look at the guy who saves everything on his overflowing desktop.
Any system that allows people to find their stuff, and perhaps make a few annotations, will be good for them.
Google Photos is almost a good example: I don't have to annotate anything, yet I can search for e.g. pictures of snow, or by location.
(I say only 'almost', because while impressive, that system isn't good enough yet to find obscure stuff or to work on contextual cues like 'those pictures I took at home after we came back from shopping sometime in the last few months'.)
That’s all very nice but a filesystem needs to be able to deal with every type of file on the planet. Which means you can’t automatically detect the contents.
And really, a lot of users don’t want an interface that stops them from doing what they want just because someone else dumps all their files on the desktop.
Apple tried this with iCloud and had to go back. Because, while it makes for nice presentation and usability, there are a lot of users it can’t cater for.