We launched our #TalkConcurrency campaign last week with a fantastic interview with one of the founding fathers of concurrency; Sir Tony Hoare. This week we continue with the co-creator of Erlang, Joe Armstrong, as he walks us through his experiences with concurrency since 1985.
In the mid 80s, whilst working at the Ericsson Computer Science laboratory, Joe and his colleagues were looking for an approach to developing fault-tolerant and scalable systems. This resulted in the Erlang style concurrency as we know it today. During our visit to Cambridge University’s Department of Computer Science, we asked Joe, Tony, and Carl to discuss their experiences of concurrent systems and the future direction of concurrent software development.
The panel discussion will follow, but for now, here is an insightful discussion about how concurrency models have developed, and where they will be heading in the future with Joe Armstrong.
About Joe Armstrong
Joe made his name by co-creating Erlang alongside Robert Virding and Mike Williams in the 1980s at the Ericsson Computer Science Labs. Before that, he was debugging programs in exchange for beer whilst studying at University College London. He later received a Ph. D. in computer science from the Royal Institute of Technology (KTH) in Stockholm, Sweden in 2003.
Joe is the author of a number of key books on the topic of Erlang and beyond this including Concurrent Programming in Erlang, Programming Erlang: Software for a Concurrent World and Coders At Work.
You can read and watch more about Joe’s Erlang journey via our #OpenErlang campaign.
Joe Armstrong: My name is Joe Armstrong. I’m here to tell you about a style of concurrent programming that we evolved from about 1985.
My introduction to this was when I started working at Ericsson and I got the problem of trying to figure out how to build a fault-tolerant system. At the time, Ericsson built large telephone exchanges that had hundreds of thousands of users, and a key requirement in building these systems was that they should never go down. In other words, they had to be completely fault-tolerant.
There was actually a fairly long history of this. Ericsson started building systems like this in about 1974, in the mid ‘70s. By the time I came along, which was around about 1985, they were a world-leading manufacturer of large fault-tolerant systems. I got a job in the computer science lab to see how we could program these systems in the future.
Initially, I wasn’t really interested in concurrency as such, I was interested in how you make fault-tolerant systems. A characteristic of these systems were that they handle hundreds of thousands of telephone calls at the same time. If you imagine a system that has 100,000 subscribers talking to each other, you could view this as being 50,000 pairs of people talking to each other.
Obviously, there is concurrency in the way the problem is set up. If we have 100,000 people using a telephone exchange, we have 100,000 parallel activities going on. The natural way to model this is with 100,000 processes grouped into pairs. That’s 100,000 people talking, it’s 50,000 pairs of two people. It seemed a natural way to describe this.
There was also a tradition of doing this using multiple processes. The system that I first looked at or the one that was being used in Ericsson consisted of two processors. One with an active processor that was doing all the work, the second was a standby processor that immediately took over if the first processor failed. This was the starting point. That was back in 1985. We were in a computer science lab and I started trying to describe this in a number of different programming languages. There was a multi-programming language project to try and model this in different languages.
Sooner or later, I stumbled upon Smalltalk. The way that Smalltalk described things in terms of objects and messages seemed very good, but it wasn’t a true type of concurrency I was interested in. The problem with Smalltalk was that if things failed, there wasn’t really a good failure model, and it wasn’t really concurrent. I went over to try to model failure.
Actually, through some accidental circumstances, I was introduced to Prolog, and I thought I could model all of this in Prolog. Prolog had a very bad failure model. If a computation fails in Prolog, you just get a saying, “No.” Not very good. I slowly modified this. Over a period of three or four years, from about 1985 to 1989, we developed this style of programming which became this language called Erlang. During the time, this project grew from myself to including Robert Virding and Mike Williams. We evolved this style of programming that is now called Erlang.
In 1990, I think it was a sort of hallelujah moment, we went to a conference in Bournemouth and it was about how to program distributed systems. At the time, everybody was building tightly-coupled distributed systems. It was rather embarrassing because, at the end of each talk, we took turns in sticking up our hand and asking the embarrassing question, “What happens if one of the nodes fail?”
These people would say, “Well, we would just assume the nodes aren’t going to fail.” The answer to the question was, “Well, the whole system doesn’t work.” We would shrug our shoulders and say, “Well, yet I know a system that doesn’t work.” We were in this rather strange position of thinking, “Hang on. The rest of the world is completely wrong and we are right. We’re doing it the right way.”
I had always viewed failure as being central to a problem. We cannot assume when we’re building a big system that the individual nodes will not fail. I had viewed building systems as building them from a lot of independent components, and that any one of those components could fail at any point in time and we have to live with that.
Once you’ve split the world into parallel components, the only way they can talk to each other is through sending messages. This is almost a biological, a physical model of the world. When a group of people sit and talk to each other, you can view them as having independent models in their heads and they talk to each other through language. Language is messages and what’s in their brain is a state machine. This seemed a very natural way of thinking.
I’m also minded to think that the original Von Neumann and Turing always thought that computation to be viewed as a biological process of communicating machines. It seemed a very natural way to model things. In fact, it seems rather strange recording a set of talks about the origin of concurrency because the way the whole world works is concurrent. If we look at the internet, it’s billions of devices all connected together, all using message passing, and all with private memories. The Internet of Things and the Web consists of loads of nodes with private memory communicating through message passing.
To me, it is very strange that that model breaks down when you get to individual applications inside a computer. It seems very strange to not have concurrency. This is a funny state of affairs! In the world of programming, the vast majority of all programming languages make it very easy to write sequential programs and very difficult to write concurrent programs. My goal was to make it very easy to write concurrent programs. Consequently, it might be a bit more difficult to write sequential programs. Of course, when multi-cores came along, what we had done then mapped very well onto parallel programs.
Up to that point, concurrent programs were actually sequential programs that were interleaved rather quickly in an operating system. When multi-cores came along, the possibility emerged to execute those programs in parallel, so we were immediately able to take advantage of parallel cores. In fact, that’s probably the reason why Erlang has spread in the last 15 to 20 years because of the way it scales naturally onto massive multi-core computers.
What were the key points that we learned perhaps in the early days of Erlang? This is the period from about 1985 to 1989, this is a period when we did a lot of experiments. I think what we tried to do was structure the system into primitives and libraries. We had to choose primitives that made it easy to write the libraries. What a normal program will do is use the libraries. For example, there are many very, very difficult problems like leadership election or maintaining write append buffers between processes. They turn out to be very difficult to program, so they’re done in the libraries, they’re not done in the kernel of the language.
The work in the late ’80s was to identify which primitives we had to have in the language in order to build the libraries. This was not at all obvious. One primitive that came– It’s actually an idea of Mike Williams, was the notion of a link. That extended our handling across remote nodes. The idea was you could link two processes together. If you’ve got one process here, another process here, you could put a link between them. The meaning of the link was that if one of the processes failed, the other process would receive a message saying that the remote process had failed.
This link mechanism was the enabling factor that allowed us to write a lot of libraries on top of that. What users of Erlang or Elixir will see, I think it’s called supervision trees where we link together collections of processes and build them into trees, but the underlying mechanism is the link mechanism. That’s I think one of the things we learned there was how to build these primitives. In fact, we tried lots of different primitives. We tried to implement buffers between processes and things like that. It turned out that these are very difficult to implement in a virtual machine, so we stripped down the virtual machine to the primitives that we needed.
I think one of the things we learned is that in order to implement a concurrent language, you have to do three things at a very primitive level. Message passing should be extremely fast, context switching should be extremely fast, and there should be a built-in error processing mechanism. Without that, it’s really impossible to build languages like this. That’s what’s built into the virtual machine and gets inherited by all languages that are implemented on top of this virtual machine.
[00:10:16] [END OF AUDIO]