There must have been a very good reason they tested the generational GC as they did.
Your comparison of heap allocation under a generational GC with stack allocation is not correct. Yes, allocating in the nursery is very fast, only a couple of cycles. The stack frame gets allocated once per function call. Yes, when a GC runs, you have to scan the stack as well as the heap.
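To make "a couple of cycles" concrete, here is a toy sketch of bump-pointer allocation, which is how nursery allocation is usually described. The `nursery` type and its methods are hypothetical names made up for illustration. A real collector works on raw memory and copies survivors out before reusing the space.

```go
package main

import "fmt"

// nursery is a hypothetical toy arena: allocation is a bounds check plus a pointer bump.
type nursery struct {
	buf  []byte
	next int // offset of the first free byte
}

func newNursery(size int) *nursery {
	return &nursery{buf: make([]byte, size)}
}

// alloc hands out size bytes, or nil when the nursery is full.
// A nil result marks the point where a real runtime would run a minor GC.
func (n *nursery) alloc(size int) []byte {
	if size < 0 || size > len(n.buf)-n.next {
		return nil
	}
	p := n.buf[n.next : n.next+size : n.next+size]
	n.next += size
	return p
}

// reset models the end of a minor collection in which nothing survived.
func (n *nursery) reset() { n.next = 0 }

func main() {
	n := newNursery(64)
	a := n.alloc(16)
	b := n.alloc(48)
	c := n.alloc(1) // nursery exhausted, a minor GC would be due
	fmt.Println(len(a), len(b), c == nil)
	n.reset()
	fmt.Println(n.alloc(1) != nil)
}
```

The fast path really is that short. The cost under discussion sits behind the `nil` branch.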
But what you are completely missing is that allocating on the heap eventually triggers a GC, which takes CPU resources to perform. After a collection of the nursery, all surviving objects are promoted to the older generations. This promotion not only takes work, but also grows the older generations, which are more expensive to collect. So, while heap allocation with a generational GC is very cheap, it is not free. A high allocation count causes more frequent GC runs, and objects might be promoted to older generations prematurely. As a consequence, a program that allocates less will perform better. Avoiding a large number of heap allocations is a good way to increase your program's performance, be it by doing stack allocation or by reusing buffers, for example.
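As a minimal sketch of the buffer reuse idea, assuming Go: the function names `encodeFresh` and `encodeInto` are made up for illustration, and `testing.AllocsPerRun` from the standard library is used to count heap allocations per call. The package-level `sink` is there only to force the fresh buffer onto the heap.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// sink is a package-level variable, so anything stored in it escapes to the heap.
var sink []byte

// encodeFresh returns a newly allocated buffer on every call.
func encodeFresh(v int64) []byte {
	return strconv.AppendInt(make([]byte, 0, 32), v, 10)
}

// encodeInto overwrites a caller-owned buffer; if its capacity suffices,
// no new heap allocation is needed.
func encodeInto(dst []byte, v int64) []byte {
	return strconv.AppendInt(dst[:0], v, 10)
}

func main() {
	scratch := make([]byte, 0, 32) // allocated once, reused for every call

	fresh := testing.AllocsPerRun(1000, func() { sink = encodeFresh(12345) })
	reused := testing.AllocsPerRun(1000, func() { sink = encodeInto(scratch, 12345) })

	fmt.Printf("fresh buffer: %.0f allocs/op, reused buffer: %.0f allocs/op\n", fresh, reused)
}
```

The reused version gives the collector nothing new to trace or promote, which is the point being made above.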
> There must have been a very good reason they tested the generational GC as they did.
The reason was that they didn't have time to implement copying GC, per the talk. That's fair as far as engineering schedules are concerned. It says nothing about how good generational GC is in general.
> But what you are completely missing is that allocating on the heap eventually triggers a GC, which takes CPU resources to perform.
It causes a minor GC only. Those are very cheap.
Yes, there are potential add-on costs. But it's been repeatedly shown that with a fast generational GC, the benefit of escape analysis for garbage collection is marginal. That's why Java HotSpot took so long to implement it. The main benefit of escape analysis in HotSpot, in fact, is that it enables optimizations such as SROA-like scalar replacement and lock elision, not that it makes garbage collection faster. Generational GCs really are that good.
IMHE, generations are a nightmare to operate for high-performance servers at scale, because you have to balance the sizes of those heaps manually, and the right balance can change abruptly with code changes or workload fluctuations.
Go allocations are indeed costlier, but the performance-critical sections of an application can be profiled and optimized to remove allocations.
I'd rather have Go's amazingly low GC latency and slightly higher allocation costs than the operational nightmare of HotSpot.
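A small sketch of what "optimizing to remove allocations" can look like in Go. The `point` type and both functions are hypothetical. The first forces a heap allocation by storing a pointer in a package-level variable, the second keeps the value local so escape analysis can leave it on the stack. Building with `go build -gcflags=-m` prints the compiler's escape decisions, and `testing.AllocsPerRun` confirms them at run time.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

var (
	escaped *point // storing a pointer here forces the value onto the heap
	in      = 3    // package-level input, so the compiler cannot fold it away
	result  int
)

// sumHeap publishes a pointer to p, so p has to be heap allocated.
func sumHeap(x, y int) int {
	p := &point{x, y}
	escaped = p
	return p.x + p.y
}

// sumStack keeps p local, so no heap allocation is needed.
func sumStack(x, y int) int {
	p := point{x, y}
	return p.x + p.y
}

func main() {
	heap := testing.AllocsPerRun(1000, func() { result = sumHeap(in, in) })
	stack := testing.AllocsPerRun(1000, func() { result = sumStack(in, in) })
	fmt.Printf("escaping version: %.0f allocs/op, local version: %.0f allocs/op\n", heap, stack)
}
```

In a real service you would find the hot spots with the allocation profile from `pprof` first, then apply this kind of change only where it shows up.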
Automatic management of generations has never fully worked in Java. Every new JDK version just adds more knobs. Sounds like you have a different experience?