There is a very common pattern in the world where people conflate *goals* with *...

saurik · on Jan 27, 2023

> And then, within the matrix of those message busses, we don't need Erlang's restart capability. We have an abundance of ways to restart processes, from systemd, to kubernetes, to any number of other ways.

I agree with pretty much all of your comment (which clearly comes from a place of deep experience), but the thing that keeps bringing me back to the ideas of Erlang--potentially trying in vain to implement similar concepts in the languages I actually work in (including developing a way to manage fibers in C++ coroutines that work similar to Erlang processes so I could debug background behaviors)--is the idea that these restartable and isolated units simply aren't large enough to be managed by the operating system and systemd or kubernetes of all things: they are things like individual user connections. While there are plentiful easy ways to do shared-nothing concurrency in the world attached to virtually every software project and framework these days, they are all orders of magnitude more expensive than what Erlang was doing, even with its silly little kind of inefficient VM.

llimllib · on Jan 27, 2023

It's worth noting that jerf implemented process trees including restarting services in golang - so I'm curious in what way he means that we don't need Erlang's restart capability when he added it to his own library.

jerf · on Jan 27, 2023

We do not need Erlang's exact solution.

The suture interface for a service in the latest version is:

    type Service interface {
        Serve(ctx context.Context) error
    }

Where's all the stuff for gen_server? https://www.erlang.org/doc/man/gen_server.html Where's the "start_link" versus "start_monitor" distinction? Where's "cast" versus "call"? Heck, where's "stop"?

The answer is, those things are all Erlangisms. They make sense in Erlang. But in Go, the supervisor tree doesn't need to enforce any of those things. Cast or call or multicall or whatever else you like in the code that talks to those services. When I got down to it I couldn't even justify a "Start" or "Initialize" call; I just couldn't help but notice it was completely redundant and it could just be in the Serve function.

What you need is that a single crash does not take down your entire service. It does not have to be in Erlang supervisor trees with Erlang gen_servers that have the exact features and behaviors as Erlang supervisors and the exact APIs. It doesn't even have to be in an OS process the same way as Erlang; cloud lambda functions in many ways solve the same problems in a completely different way. Getting too stuck in Erlang's thoughtspace inhibits your ability to design solutions. There are many ways to make it so a single crash does not take down your entire service. Erlang's way is not the only way.

sangnoir · on Jan 27, 2023

> Heck, where's "stop"?

I'm intermediate (at best) in Go, but I've always found server teardown in Go a little clunky: most solutions I've seen are variations of sending a KILL signal to a channel that the server is listening on, which technically is still message-passing, though unstandardized, unlike stop/n.

jimbokun · on Jan 27, 2023

That sounds very interesting. What's the project that interface is from?

llimllib · on Jan 27, 2023

https://pkg.go.dev/github.com/thejerf/suture/v4

camgunz · on Jan 27, 2023

I read that list as "Erlang doesn't have exclusive rights to X"

btbuildem · on Jan 27, 2023

In a way, this "totalizing" can be looked at as a pro rather than a con.

Yes, today we have an entire collection of ecosystems of services that can provide the key functionalities as described above. Each of these technologies comes with its own long tail of dependencies, security issues, maintenance effort, plain old computational overhead, etc.

Meanwhile, this 30-year old technology provides matching functionality (yes, admittedly with syntax and object types that simultaneously induce vertigo and motion sickness), but all the bugs have long been eradicated or encased in amber, and pound-for-pound it will rip circles around an alternative solution that's dragging a Java VM or a megaton of node_modules along with it, wrapped in docker images and k8s yamls.

I use yaws as my go-to webserver. It's nuke-proof. It's simple*, and it Just Works. Good luck finding a haxxor that can breach it. I believe that it's in large part due to the very simple conceptual building blocks it's constructed out of (the gen_ behaviours OP describes).

ellyagg · on Jan 29, 2023

> In a way, this "totalizing" can be looked at as a pro rather than a con.

Yep, if you want to write a new concurrent system, you will move much faster living inside this integrated environment.

And of course vertical scaling is easier than horizontal. It'll be a long time before you outgrow a huge server.

And if it's not, that's a nice problem to have. So you split off parts of the app into other services, and erlang/elixir is wonderful at communicating with/controlling other network addressable services.

The problem with erlang is that it's both harder to get started and has a lower ceiling than some other languages. But there's a huge middle class of software that would really benefit from it if they got over the initial hump.

brightball · on Jan 27, 2023

> We don't need Erlang clusters for redundancy any more. You just run multiple copies of a service against the message bus, on multiple systems for redundancy.

The thing about Erlang is that you never needed clusters at all. The redundancy was built into each instance with the runtime. When you build that way, everything naturally scales out horizontally with additional processors and/or physical nodes.

You can't do that in any other language without building the entire system for it from the ground up.

Using multiple systems for redundancy means counting on the entire system going down. The Erlang way isolates this impact to 1 of potentially millions of parallel actions on the system itself. Using other systems for redundancy, the other million actions in progress on this server go down with it. The difference in the level of redundancy is significant.

But I do agree with you that we don't need it for most systems because most systems simply aren't that complex. The benefit of the BEAM comes from simplifying complexity, which tends to evolve over time. Elixir, Phoenix and LiveView will likely lead to earlier adoption of the BEAM in projects before the complexity ramps up which will show a long term benefit.

jerf · on Jan 27, 2023

"The thing about Erlang is that you never needed clusters at all."

IIRC, if you read the original thesis, the reason for clusters is just that there's always that chance an entire machine will go down, so if you want high reliability, you have no choice but to have a second one.

The OP is correct in that the key to understanding every design decision in Erlang is to look at it through the lens of reliability. It also helps to think about it in terms of phone switches, where the time horizon for reliability is in milliseconds. I am responsible for many "reliable" systems that have a high need for reliability, but not quite on that granularity. A few seconds pause, or the need for a client to potentially re-issue a request, is not as critical as missing milliseconds in a phone call.

brightball · on Jan 27, 2023

That's true. You do always need to plan for machine redundancy but hopefully machines don't completely fail that often. I can't remember the last time I experienced an instance failure that wasn't a data center wide impact.

It impacts how you architect certain solutions though. For example, if you've got users connected with websockets you're suddenly able to maintain their state right there with the connection.

In a situation where you can't rely on state on the server itself, every websocket connection has to relay to some backend system like Redis/DB, etc since the state can't be counted on at the connection layer.

josevalim · on Jan 27, 2023

Your comment seems to consider Erlang as a tool to build heterogeneous systems, where you have multiple distinct applications, written in the same or different technologies, talking to each other.

In such cases, I would agree with part of your criticisms because it is indeed the wrong tool for the job. Erlang was not designed to solve this problem: the serialization format is centered around Erlang. The distribution messages reflect the semantics of processes, messaging, monitoring, etc. Intercommunication is not the focus. Even in its early days, the distribution was used to provide tolerance against hardware failures by running two identical systems. So I find comparing Kafka and Erlang to be an apples to oranges scenario.

In my opinion, Erlang shines for building homogeneous systems: multiple instances of the same application running in a cluster. Precisely because all I need is Erlang. It comes with a unified programming model for both local and distributed execution. Look at how Phoenix uses it to provide features such as distributed pubsub and presence out-of-the-box, features which either require external tools - and additional complexity - or simply do not exist in other platforms. And the beauty in designing such solutions is that you start with the concurrent version and then naturally evolve into making it distributed (if necessary).

I also find the comparison equally misses the mark between restarts/fault-tolerance and Kubernetes. Because, once again, they mostly work at different levels. The classical example is using supervisors to model database connections, something you simply cannot delegate to k8s. But a more recent example comes from working on Nx, which communicates to the GPU. You can stream data in and out of the GPU, but what happens when the code streaming data errors out? You need to develop a synchronization mechanism to make sure the GPU does not get stuck. And what happens if the synchronization mechanism fails? With Erlang I can model and test all of those failure scenarios quite elegantly. Perhaps there are better approaches lurking out there, but it certainly isn't k8s.

When it comes to k8s, they mostly complement each other. Erlang's tooling for restarting _machines_ is basically non-existent (there is -heart or the stand-by system described by Joe) and k8s addresses that. Erlang doesn't have service discovery, k8s covers that gap. But, even then, there is no assumption you must use Erlang clustering. It is opt-in, you don't have to use it, and in the "worst case" scenario, you can deploy Erlang just as any other language.

OkayPhysicist · on Jan 27, 2023

The totalizing environment is a huge boon, rather than a downside. The utility of tooling is directly tied to how predictable the entire architecture of your application is. If, at the highest level, your projects always consist of a random hodgepodge of services written in a variety of flavor-of-the-week programming languages, 100% of your top-level tooling will need to be custom built for your project. In contrast, if all of your projects consist of "Erlang" or maybe "Erlang+a database", then suddenly you can create tooling that works on any project. For example, Observer lets you look at the entire process tree of your application, for any Erlang application. If your entire project is an Erlang application, you now have complete visibility into your entire application's current state. It's practically magic.

That was my biggest point, but you posted a long comment with a few other disagreeable points, too:

1. re:Conway's Law : Just write the code in Erlang (read: Elixir in this day and age) to begin with. Rewriting large applications in new languages is rarely worth it anyways, that's not a failure of the better language you want to switch to, it's a failure of the poor choice of language you started with.

2. re: But it's not on the cutting edge anymore: And yet it is. Erlang and Elixir are the only languages in anything remotely resembling widespread use to not have an utterly pathological concurrency story. Async-Await is an awkward crutch to shoehorn concurrency into languages that were never designed to support it, Go and its goroutines entirely misses the point of the exercise and simultaneously encourages you to write mutable state and punishes you for doing so, and the Actor model libraries for other languages are just half-baked, bug-ridden implementations of a small fraction of Erlang.

bcrosby95 · on Jan 27, 2023

> Erlang couldn't work with Conway's Law without totally converting your entire company to it, which just isn't going to happen.

I think this says more than you think it does.

I've worked in software development professionally for 20 years. I have never worked at a company that had more than 1 backend language. A large percentage of developers work at small companies.

Erlang/Elixir/BEAM isn't for the Googles of the world. And that's fine. Tech needs to stop its infatuation with these companies. Just because something is right for those companies, doesn't mean it's right for yours. Ignoring this has done a lot of damage over the years. Production complexity is a killer.

People love to talk about scaling (and here, I'm talking about scaling a company), but only ever mention one aspect: scaling up. Things also have to scale down. Technological choices tend to sacrifice one for the other.

But BEAM languages have a wider scaling band than most other technological choices. And that is in large part because of everything it offers out of the box.

weatherlight · on Jan 27, 2023

No one ever talks about scaling down systems, everything is always additive.

stevan · on Jan 27, 2023

> In the light of this statement, the answer to what I think is the thesis question of that entire piece:

> "This begs the question: why aren't language and library designers stealing the structure behind Erlang's behaviours, rather than copying the ideas of lightweight processes and message passing?"

> [...]

> But for the world we live in today, that's a price not worth paying. We don't need the Erlang message bus to be the only message bus. The Erlang message bus is, frankly, not very good, and it's actively terrible if you want to use it for one non-Erlang process to communicate to another. We don't need the Erlang message bus. We have a dozen message busses, some in the cloud, some commercial, some open source, some that double as retention (Kafka), all of which scale better, none of which tie you to Erlang's funky data types.

I asked why they were not stealing the structure behind Erlang's behaviours, I didn't suggest anyone should steal Erlang's message bus or anything else.

> And then, within the matrix of those message busses, we don't need Erlang's restart capability. We have an abundance of ways to restart processes, from systemd, to kubernetes, to any number of other ways.

I don't think restarting the process from systemd or kubernetes is comparable with a supervisor tree. First of all the tree gives you a way to structure and control the restarts, e.g. frequently failing processes should be further down the tree or they will cause their sister nodes to get restarted etc. The other obvious difference is speed.

> We don't need Erlang's behaviors. We have interfaces, traits, [...]

Yet I don't know of any other language which uses interfaces in a way which they achieve the benefits (listed in the article) that behaviours in Erlang (e.g. gen_server) give you, do you?

jerf · on Jan 27, 2023

"I asked why they were not stealing the structure of behind Erlang's behaviours, I didn't suggest anyone should steal Erlang's message bus or anything else."

To some extent I know... but to some extent the answer is these things are all tied together. Erlang is a really tight ball of solutions to its own problems at times. I don't mean that in a bad way, but it all works together. It needs "behaviors" because it didn't have any of the other things I mentioned.

When I went to implement behaviors (https://www.jerf.org/iri/post/2930/ ), I discovered they just weren't worth copying into Go. You ask if any other language uses interfaces to achieve what Erlang does; my perspective is that I've seen people try to port "behaviors" into two or three other languages now, and they're always these foreign things that are very klunky, and solve problems better solved other ways.

"I don't think restarting the process from systemd or kubernetes is comparable with a supervisor tree."

It isn't, but the problem is...

"First of all the tree gives you a way to structure and control the restarts, e.g. frequently failing processes should be further down the tree or they will cause their sister nodes to get restarted etc."

I don't need that. I've been using supervisor trees for over a decade now, and they rarely, if ever, go down more than "application -> services". Maybe somebody out there has "trees" that go down six levels and have super complicated bespoke restart operations on each level and branch, but they must be the exception.

To the extent that I have deep trees, they're for composition, not because I need the complicated behaviors. A thing that used to be a single process service is now three processes, and to hide that, I make that thing a supervisor of its own so that the upper levels still just see an ".Add()" operation, instead of having to know about all the bits and pieces.

"The other obvious difference is speed."

Certainly, but those are only a couple of the options.

"Yet I don't know of any other language which uses interfaces in a way which they achieve the benefits (listed in the article) that behaviours in Erlang (e.g. gen_server) give you, do you?"

This is a case of what I'm talking about. Don't confuse Erlang's particular solution for being the only possible solution. Erlang's behaviors are basically the Template pattern (https://en.wikipedia.org/wiki/Template_method_pattern ) written into the language rather than implemented through objects. If you look for the exact Erlang behaviors out in the wild, you won't hardly find anything. If you look for things that solve the same problems, there's tons of them. A lambda function in AWS is a solution to that problem. The suture library I wrote is a different one. Java frameworks have their own solutions in all sorts of different ways.
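
To show what the Template pattern reading looks like outside Erlang, here is a small Go sketch in which a generic loop owns the goroutine and mailbox while the user supplies only callbacks. The callback names loosely echo gen_server's handle_call and handle_cast, but everything here (`Handler`, `Server`, `Start`, `counter`) is hypothetical and far short of OTP: no linking, monitoring, timeouts, or hot code upgrade. It assumes Go 1.18 or later for generics.

```go
package main

import "fmt"

// Handler is the part the user writes: callbacks over a state value.
type Handler[S any] interface {
	Init() S
	HandleCall(req any, state S) (reply any, next S)
	HandleCast(msg any, state S) S
}

type message struct {
	payload any
	reply   chan any // nil for casts
}

// Server is the generic part: it owns the goroutine, mailbox and loop.
type Server struct {
	mailbox chan message
	done    chan struct{}
}

// Start launches the loop and threads the state through the callbacks.
func Start[S any](h Handler[S]) *Server {
	s := &Server{mailbox: make(chan message, 16), done: make(chan struct{})}
	go func() {
		defer close(s.done)
		state := h.Init()
		for m := range s.mailbox {
			if m.reply != nil {
				var r any
				r, state = h.HandleCall(m.payload, state)
				m.reply <- r
			} else {
				state = h.HandleCast(m.payload, state)
			}
		}
	}()
	return s
}

// Call sends a request and waits for the reply (synchronous).
func (s *Server) Call(req any) any {
	reply := make(chan any, 1)
	s.mailbox <- message{payload: req, reply: reply}
	return <-reply
}

// Cast sends a message without waiting for a reply (asynchronous).
func (s *Server) Cast(msg any) { s.mailbox <- message{payload: msg} }

// Stop closes the mailbox and waits for the loop to drain and exit.
func (s *Server) Stop() {
	close(s.mailbox)
	<-s.done
}

// counter is an example handler: casts add to the total, calls read it.
type counter struct{}

func (counter) Init() int                            { return 0 }
func (counter) HandleCall(req any, n int) (any, int) { return n, n }
func (counter) HandleCast(msg any, n int) int {
	if d, ok := msg.(int); ok {
		return n + d
	}
	return n
}

func main() {
	s := Start[int](counter{})
	s.Cast(2)
	s.Cast(3)
	fmt.Println(s.Call("get")) // 5
	s.Stop()
}
```

Whether this is worth having in Go is exactly the disagreement in this subthread; the sketch only shows that the shape is expressible with an interface plus a generic loop.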

To put it another way, whereas in 1998 people having the problems Erlang solved was rare, today we all have them. We can't be blundering around with no solutions since we are all too blinkered to use Erlang which just solves them all. That makes no sense. There are far more distributed systems concerned with reliability out there now implemented in not-Erlang than in Erlang. We are not all just blundering along in a fog of confusion, unaware of the existence of architecture, modularity, and abstraction. If programmers have a flaw, it's too much architecture rather than too little.

Maybe that's one of the problems with the Erlang writing. It's all implicitly written from a perspective of the 90s, where this is all a surprise to people, and it kind of seeps in if you let it. But that's not where the world is right now. It is not news that we need to be reliable. It is not news that we want to run on multiple systems. I've got non-technical managers asking me about this stuff at work whenever I propose a design. There's been a ton of work on all of these issues. It's not all good, by any means! But there's now too many solutions moreso than not enough.

stevan · on Jan 27, 2023

> To some extent I know... but to some extent the answer is these things are all tied together. Erlang is a really tight ball of solutions to its own problems at times. I don't mean that in a bad way, but it all works together. It needs "behaviors" because it didn't have any of the other things I mentioned.

I can see why you would say that regarding the supervisor behaviour, but I don't see how your argument applies to the other five behaviours I wrote about. Let's keep it simple and focus on, say, `gen_server`?

> This is a case of what I'm talking about. Don't confuse Erlang's particular solution for being the only possible solution.

I'm not, in fact I mention that I've started working on a small prototype to explore implementing behaviours in a different way at the end of the post.

> Erlang's behaviors are basically the Template pattern (https://en.wikipedia.org/wiki/Template_method_pattern ) written into the language rather than implemented through objects. If you look for the exact Erlang behaviors out in the wild, you won't hardly find anything. If you look for things that solve the same problems, there's tons of them.

My understanding of Joe's thesis is that we can compose a small set of behaviours into complex systems. He and OTP expose and highlight this structure. I don't doubt that others have also discovered the usefulness of this structure, but I don't see anyone else try to help bring that structure to the forefront so that we can improve the state of software.

Your argument seems to be that this structure has now become implicit in the tools that have been developed since then, but I think again this misses the point: the structure is simple and should be made explicit not hidden away behind 500k lines of C (systemd) or almost 4M lines of Go (kubernetes).

macintux · on Jan 27, 2023

Additionally, the fact that a very small set of standard behaviours has been re-used time and time again for big software projects means they’re extremely well-tested, and well-documented.

dmitriid · on Jan 27, 2023

> We don't need Erlang's behaviors. We have interfaces, traits, object orientation, and even just pushing that entire problem up to the OS process level, or writing a cloud function, and any number of ways of achieving the same goal.

None of these achieve the same goal. Or they result in significantly more complicated and brittle systems. Or they only achieve that goal insofar as you need to glue several heterogenous systems together.

> The reason why language designers aren't rushing to copy Erlang is that what was excellent and amazing in 2000 (and, again let me underline, I mean that very seriously, it was a cutting edge platform built with a lot of vision and moxie) is, in 2023, mediocre.

The main reason is that it is borderline impossible to retrofit the Erlang model onto an existing language. Adding concurrency alone may be a decade-long project (see OCaml). Adding all of the guarantees that the Erlang VM provides... well.

And on top of that too many people completely ignore anything in Erlang beyond "lightweight processes/actors".

The fact that you can have an isolated process that you can monitor and observe, and have a guaranteed notification that it failed/succeeded without affecting the rest of the system is a) completely ignored and b) nearly impossible to retrofit onto existing systems.

And there are exceedingly few new languages that even think about concurrency at all. And async/await is not even remotely state of the art (but people are busy grafting them onto all languages they can lay their hands on).

State of the art still is mutexes and killing your entire program if something fails. Often both of those.

lvass · on Jan 27, 2023

>But when the chaos settles and best practices emerge, something that I'd say is at least a good 5 years away

That's incredibly optimistic. I am 30 and have basically no hope this will happen in my lifetime, and in the meanwhile, BEAM works great. I agree with all your points, except when you write 2023 as if we're doing things better now. Research in this area hasn't really borne as much fruit over the last 30 years as you make it look like.

mononcqc · on Jan 27, 2023

I'm not quite sure how Erlang's world is totalizing. It has ways to ship things in a very integrated manner, but I have shipped and operated Erlang software that was containerized the same as everything else, in the same K8s cluster as the rest, with the same controls as everything else, with similar feature flags and telemetry as the rest, using the same Kafka streams with the same gRPC (or Thrift or Avro) messages as the rest, invisibly different from other applications in the cluster to the operator in how they were run aside from generating stack traces that look different when part of it crashes.

That it _also_ ships with other ways of doing things in no way constrains or limits your decisions, and most modern Erlang (or Elixir) applications I have maintained ran the same way.

You still get message passing (to internal processes), supervision (with shared-nothing and/or immutability mechanisms that are essential to useful supervision and fault isolation), the ability to restart within the host, but also from systemd or whatever else.

None of these mechanisms are mutually exclusive so long as you build your application from the modern world rather than grabbing a book from 10-15 years ago explaining how to do things 10-15 years ago.

And you don't _need_ any of what Erlang provides, the same way you don't _need_ containers (or k8s), the same way you don't _need_ OpenTelemetry, the same way you don't _need_ an absolutely powerful type system (as Go will demonstrate). But they are nice, and they are useful, and they can be a bad fit to some problems as well.

Live deploys are one example of this. Most people never actually used the feature. Those who need it found ways (and I wrote one that fits in somewhat nicely with modern kubernetes deployments in https://ferd.ca/my-favorite-erlang-container.html) but in no way has anyone been forced to do it. In fact, the most common pattern is people wanting to eventually use that mechanism and finding out they had not structured their app properly to do it and needing to give it a facelift. Because it was never necessary nor totalizing.

Erlang isn't the only solution anymore, that's true, and it's one of the things that makes its adoption less of an obvious thing in many corners of the industry. But none of the new solutions in the 2023 reality are mutually exclusive with Erlang either. They're all available to Erlang as well, and to Elixir.

And while the type system is underpowered (and there are ongoing areas of research there -- I think at least 3-4 competing type systems are being developed and experimented with right now), and the syntax remains what it is, I still strongly believe that what people copied from Erlang were the easy bits that provide the least benefit.

There is still nothing to this day, whether in Rust or Go or Java or Python or whatever, that lets you decompose and structure a system for its components to have the type of isolation they have, a clarity of dependency in blast radius and faults, nor the ability to introspect things at runtime interactively in production that Erlang (and by extension, languages like Elixir or Gleam) provide.

I've used them, I worked in them, and it doesn't compare on that front. Regardless of if Erlang is worth deploying your software in production for, the approach it has becomes as illuminating as the stacks that try and push concepts such as lack of side-effects and purity and what they let you transform in how you think about problems and their solutions.

That part hasn't been copied, and it's still relevant to this day in structuring robust systems.

petejodo · on Jan 27, 2023

> We don't need the Erlang message bus to be the only message bus. The Erlang message bus is, frankly, not very good

Genuinely curious why it's not very good. Were you speaking solely from the perspective of non-Erlang processes? And also specifically regarding remote messages rather than local?

jerf · on Jan 27, 2023

There's two ways a non-erlang process can speak Erlang terms.

It can implement the Erlang node protocol and show up as a full Erlang node. Neat capability, but the impedance mismatch between systems is something fierce, because the protocol deeply assumes that you're basically Erlang, e.g., not just that processes have mailboxes but that mailboxes have the exact same semantics as Erlang, and you have to implement process linking with the exact same semantics, etc. It's difficult.

Alternatively, you can write a proxy in Erlang where the first process speaks to an Erlang server to send a term to some other Erlang process that will then relay the message in whatever form. This will either be a custom protocol, in which case this is an awful lot of code to write for such a task, or a common messaging protocol in which case you don't need Erlang. That is, you can speak rabbitmq, but that doesn't need to be implemented in Erlang.

The production system I maintained on Erlang did this a lot, because I was forced to have a lot of Perl code interacting with the Erlang system. Back in the day it was a fine choice; it was a time when you couldn't just pop on to google and turn up a dozen battle-ready message busses in two minutes. Now, though, it is far better for the Erlang code to just be another node on your common bus than for it to be the message bus.