What is an OS good for if not to run programs? You either port existing programs...

vilunov · on July 25, 2023

Often the user-space API limits the design of internals and available features. I for one would like to see the classical hierarchical filesystem gone, but POSIX (and UNIX) at its core is about the filesystem.

If you need to run the existing programs, feel free to develop a translation layer, at the same time allowing a new type of OS paradigm to emerge.

yjftsjthsd-h · on July 25, 2023

Nobody's stopping you from slapping on a compat layer, and ex. Haiku does that, but it's more work than just being unixy to start with.

drpixie · on July 26, 2023

Yes please, something that doesn't have files, or processes. Both are too low level and require us to write or use enormous amounts of code to do very simple tasks that are performed by almost all applications. Very wasteful.

But lets stay away from adding another layer for our nice new abstraction. A new layer means the old ones are still there are will be used because programmers are lazy. It's usually easier to reuse someone else's code (that access the old layers) or reuse old methods and algorithms. In both cases, we're moving back towards the old stuff, the stuff we want to leave behind.

huhlig · on July 26, 2023

Curious, what would you replace the classic Hierarchical File System with? A flat key value store? We’ve tried that and it’s a nightmare.

mike_hearn · on July 26, 2023

Here's a few ideas I've been stockpiling for the day I retire and my wife will hopefully let me just fiddle with operating systems all day. None of these necessarily would work well or make sense, they're just stuff that would be satisfying to explore. NB: All this can be done in userspace with enough hacks.

1. Make strong file typing work well. Filesystems ignore the question of type, leaving it to ad-hoc conventions like extensions. Figuring out what a file really is takes a lot of effort, is often duplicated by apps and the OS and has a long history of introducing security vulnerabilities. There's also no first class notion of "type casting". Format conversion is delegated to an ad-hoc and inconsistent set of tools which you have to learn about and obtain yourself, even though it's a very common need.

2. Unify files and directories. Make open() and read() work on directories. You need (1) for this.

A lot of cruft and complexity in computing comes from the fact that most things only understand byte arrays (e.g. http, email attachments...), not directories. Directories have no native serialization in any OS, unless you want to stretch and call DMG a fundamental part of macOS. Instead it's delegated to an ancient set of utilities and "archive" formats. As a consequence a big part of app development has historically been about coming up with use-case specific file formats, many of which are ad hoc pseudo-filesystems. This task is hard and tedious! There are lots of badly designed file formats out there, and my experience is that these days very few devs actually know how to design file formats, which is part of why so much computing has migrated to cloud SaaS (where you ignore files and work only with databases). Many new file formats are just ZIPs.

An operating system (or userspace "operating environment") could fix this by redefining some cases that are errors in POSIX today to be well defined operations. For example given a file that has some sort of hierarchical inner structure like a .zip (.jar, .docx), .csv, .json, .java, .css and so on, allow it to be both read as a byte array but also allow you to list it like a directory. Plugins would implement a lossless two-way conversion, and the OS would convert back and forth behind the scenes. So for example you could write:

    echo red > styles.css/.main-content/background-color

and the CSS file would be updated. Likewise, you could do:

    mkdir "Fun Project"
    curl --post -d "Fun Project" https://whatever.com/cgi-bin/receive-directory

The trick here is to try and retain compatibility with as much software as possible whilst meaningfully upgrading it with new functionality. Redefining error cases is one way to do this (because you have some confidence they won't be important to existing programs).

There are UI issues to solve. You'd probably want some notion of an item having "bias", so explorers can decide whether it makes more sense to e.g. open a clicked icon in an app or explore inside it. You'd need to find ways to express the duality in a GUI nicely and so on.

But if you get it right it would allow app devs to think of their data in terms of tiny files and then let the OS handle the annoying aspects like what happens when you drag such a directory onto an email. Apple/NeXT tried to do this with their bundles but it never really worked because it wasn't integrated into the core OS in any way, it was just a UI hack in the Finder. For example early iLife apps represented documents as bundles, but eventually they gave up and did a file format because moving iLife documents around was just too difficult in a world of protocols that don't understand directories.

3. Ghost files.

There's an obvious question here of how to handle data that can be worked with in many formats. For example, images often need to be converted back and forth and the conversion may not be lossless. I want to find a syntax that lets you "cast" using just names. The obvious and most compatible approach is that every format the system understands is always present, and filled out on demand. Example:

    $ take /tmp/foo
    $ wget https://www.example.com/face.jpg
    $ ls
    face.jpg
    $ imgcat face.webp
    <now you see the face>

In this case, the OS has a notion of files that could exist but currently don't, and which don't appear in directory listings. Instead they are summoned on demand by the act of trying to open them for reading. You could imagine a few different UXs for handling divergence here, i.e. what if you edit the JPG after you accessed the WEBP version.

4. Unify KV stores and the filesystem

Transactional sorted KV stores are a very general and powerful primitive. A few things make filesystems not a perfect replacement for RocksDB. One is that a file is a heavyweight thing. Not only must you go to kernel mode to get it, but every file has things like permissions, mtimes, ctimes and so on, which for lightweight key/value pairs is overkill. So I'd like a way to mark a directory as "lite". Inside a lite directory files don't have stored metadata (attempting to querying them always returns dummy values or errors), and you can't create subdirectories either. Instead it's basically a KV store and the FS implementation uses an efficient KV store approach, like an LSM tree. Reading such a directory as a byte stream gives you an SSTable.

Also, contemporary kernels don't have any notion of range scans. If you write `ls foo*` then that's expanded by the shell, not filtered out by doing an efficient partial scan inside the kernel, so you can get nonsense like running out of command line space (especially easy on Windows). But to unify FS and KV stores you need efficient range scans.

There have been attempts at transactional filing systems - NTFS does this. But it's never worked well and is deprecated. Part of the reason UNIX doesn't have this is because the filesystem is a composite of different underlying storage engines, so to do transactionality at the level of the whole FS view you'd need 2PC between very different pieces of code and maybe even machines, which is quite hard. Lite directories, being as they are fixed in one place, would support transactions within them.

5. Transient directories

Free disk space management is a constant PITA in most operating systems. Disk space fills up and then you're kicked to a variety of third party cleaner tools. It sucks, especially as most of your storage is probably just a local cache of stuff generated or obtained from elsewhere.

In my research OS/OE, "filectories" or whatever they're called can be tagged with expiry times, notions of priority, and URLs+hashes. The OS indexes these and when free disk space runs out the OS will do start deleting the lowest priority files to free up space. Temporary files go first, then files that were downloaded but weren't used for a while (re-downloaded on demand), and so on.

In such an OS you wouldn't have a clear notion of free disk space. Instead as you ran out of disk space, rare operations would just get slower. You could also arrange for stuff to be evicted to remote caches or external drives instead of being deleted.

drpixie · on July 26, 2023

6. The basic OS abstraction is typed objects.

Objects are not serialised and stored in files, they exist (only) as the fundamental entity of the OS. These objects are not read/saved from disk, they exist until destroyed and are managed transparently by the OS - similar to the manner to which virtual memory pages are transparently managed by Unix/Linux.

Objects represent reasonably high level entities, perhaps images, sounds, etc, but probably not individual integers or strings. Objects may reference ("contain") other objects.

Objects are strongly typed and implement common sets of abilities. All picture objects can show themselves, duplicate themselves, etc.

mike_hearn · on July 26, 2023

I've moved away from this idea over time even though it's intuitively attractive:

1. OOP is closely tied to language semantics but languages disagree on exactly what objects, methods and type signatures are, and programmers disagree on what languages they'd like to use.

2. From the end user's perspective it's often useful to separate code and data.

This isn't an anti-OOP position. I use OOP for my own programming and it serves me well. And modern operating systems are all strongly OOP albeit mostly for communication between micro-services rather than working with data.

One reason the files+apps paradigm dominates and not objects (despite several attempts at changing that) is because it allows apps and formats to compete in a loosely coupled market. I don't want to be tied to a specific image editor because I have an image object that happens to be created by some primitive thing, I want to work with the pixels using Photoshop. The concept of files, apps and file associations lets me do that even though it's not well supported at the bottom layers of the OS stack.

But OOP is fundamentally about combining code and data. So an image object, in this context, would have to be something more like a codec implementation that implements an IPixels interface. But again, even then, why would the two be combined tightly? What if I want to swap in a faster codec implementation that I found?

Follow this reasoning to its conclusion and you decide that what's needed is actually an OS that can transparently convert data into various different static formats on the fly, and which has a much deeper and more sophisticated concept of file associations. PNG<->JPEG should be automatic obviously but also, one of those formats you convert to might for instance be dynamically loadable code or a serialized object graph. For example, imagine you have an image viewer written in Kotlin or Java. You also have a complex file that nonetheless meets some common standard, maybe a new proprietary image format, and you'd like your general image viewer to be able to load it directly without conversion to some intermediate format. Then you could write a converter that "converts" the image to a Java JAR which is then automatically dynamically loaded by the OS frameworks and asked to return an object graph that conforms to some interface:

    var image = ImageRenderer.load("foo.compleximage")
    if (image instanceof WidgetFactory) { /* put it into a gui */ }

Java already has a framework for abstracting images of course. The trick here is that the app itself doesn't have to actually have a plugin for this new format. Nor does the OS need to be specific to Java. Instead the OS just needs a framework for format conversion, and one of the formats can be dynamically synthesized code instead of what we conventionally think of as a file format.