Hacker News

That seems like a problem for compiler optimizers to solve, not programmers.


It's actually the domain of build systems. Splitting code into as many independent files as possible gives the build system more data to work with, allowing it to compile parts of the program in parallel and to recompile only what is necessary.

If a file contains two functions and the developer changes one of them, both functions will be recompiled. If two files contain one function each, only the file with the changed function will be recompiled.
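That per-file logic can be sketched in a few lines. This is a toy model with invented file names, and "compiling" is just writing a placeholder .o file, but the timestamp comparison is the same one make and most build systems apply:

```python
import os
import tempfile

def build(sources):
    """Recompile every source whose .o is missing or older; return what ran."""
    recompiled = []
    for src in sources:
        obj = src[:-2] + ".o"
        if not os.path.exists(obj) or os.path.getmtime(src) > os.path.getmtime(obj):
            with open(obj, "w") as f:
                f.write("object code")      # stand-in for: cc -c src
            recompiled.append(os.path.basename(src))
    return recompiled

d = tempfile.mkdtemp()
srcs = [os.path.join(d, name) for name in ("f.c", "g.c")]
for s in srcs:
    with open(s, "w") as f:
        f.write("/* one function per file */")

first = build(srcs)                   # clean build: both files compile
later = os.path.getmtime(srcs[0]) + 60
os.utime(srcs[0], (later, later))     # "edit" only f.c
second = build(srcs)                  # only f.c recompiles
print(first, second)
```

With one function per file, the second build touches exactly one object; with both functions in one file, any edit invalidates both.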

Build times increase with language power and complexity as well as the size of the project. Avoiding needless work is always a major victory.


In C++, the experience is the opposite - a "unity build", where everything is #included into a single translation unit, tends to be faster:

https://mesonbuild.com/Unity-builds.html

http://onqtam.com/programming/2018-07-07-unity-builds/

https://buffered.io/posts/the-magic-of-unity-builds/
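To illustrate the mechanism (a sketch with invented file names, not any particular tool's implementation): a unity build just generates one source file that #includes every translation unit, then compiles that single file once:

```python
import os
import tempfile

# Hypothetical list of a project's translation units.
sources = ["lexer.cpp", "parser.cpp", "main.cpp"]

# The generated unity file is nothing more than a list of #includes,
# so every header is parsed once and the compiler runs once.
unity = "".join('#include "%s"\n' % s for s in sources)

d = tempfile.mkdtemp()
with open(os.path.join(d, "unity.cpp"), "w") as f:
    f.write(unity)
# A real build would now run a single command, e.g.:  c++ -c unity.cpp

print(unity)
```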


Unity builds are useful too, but they have limitations. They are equivalent to full rebuilds and can't be done in parallel. The optimizations they enable can also be achieved via link-time optimization. Language features that leverage file scope can interact badly with this type of build: for example, two files that each define a `static` helper or anonymous-namespace symbol with the same name will collide once merged into one translation unit. They also require a lot of memory, since the compiler reads and processes the entire source code of the project and its dependencies at once.

Unity builds improve compilation times because the preprocessor and compiler are invoked only once. They are most useful in projects with lots of huge dependencies that require the inclusion of complex headers. The effect is less pronounced in simpler projects, and they shouldn't be necessary at all in languages that have an actual module system instead of a preprocessor, such as Rust or Zig.


I have to clarify a little bit: it is faster on one core. If you have multiple cores, having your translation unit count in the same order of magnitude as your core count will be faster. There is a lot more redundant work going on, but the parallelism can make up for it.


> If a file contains two functions and the developer changes one of them, both functions will be recompiled. If two files contain one function each, only the file with the changed function will be recompiled.

Still sounds like a compiler problem


Managing multiple files also seems like a problem for IDEs to solve, not programmers.


That brings to mind an interesting idea for an IDE: having one big virtual file that you edit, which gets split into multiple physical files on disk (based on module/class/whatever). Although, thinking about it, there are some languages that would make such automatic restructuring rather difficult.


You've just described Leo (leoeditor.com), where you're effectively editing a gigantic single XML file hidden by a GUI. The structuring is only occasionally automatic - mostly manual. It has Python available the way Emacs has Elisp.

Git conflict resolution of that single file is intractable, so I convert the representation into thousands of tiny files for git, which I reassemble into the xml for Leo.


Yes! Why can't OOP language IDEs simply represent the source code of classes, interfaces, and other type definitions directly, without revealing anything about the files they reside in? Source code being stored in files is a mundane technical detail.


> That brings to mind an interesting idea for an IDE: having one big virtual file that you edit, which gets split into multiple physical files on disk

If you're going to work with it as one big file, then what's the point of multiple physical files anyway? Just store it as one big file then.


Why store it as a (text) file at all? Why not store the code in a database? Or as binary? Then you can store metadata pertaining to the code and not just the code itself. Unreal Blueprints are an interesting way of structuring code and providing a componentized API. It would be interesting if they were more closely integrated with the code itself. Then you could manipulate data flows, code, and even do debugging from inside the same interface.

Yes, this is all pie in the sky stuff, but it's interesting to think about.
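A minimal sketch of the idea, with an invented schema (using Python's built-in sqlite3 for illustration): each definition becomes a row, metadata lives beside the code, and any "file" is just a query the editor materializes.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE definition (
        name     TEXT PRIMARY KEY,
        kind     TEXT,      -- 'function', 'class', 'interface', ...
        source   TEXT,      -- the code itself
        author   TEXT,      -- example metadata stored beside the code
        modified TEXT
    )
""")
db.execute(
    "INSERT INTO definition VALUES (?, ?, ?, ?, ?)",
    ("parse_header", "function", "def parse_header(buf): ...",
     "alice", "2020-01-01"),
)

# A "file view" or "class view" is then just a query:
rows = db.execute(
    "SELECT name, source FROM definition WHERE kind = 'function'"
).fetchall()
print(rows)
```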


I have been toying with the idea of storing programming projects in a single sqlite3 database, but never saw enough value to actually pursue it.

As you mentioned though, it's interesting to think about.



