Let's #TalkConcurrency Panel Discussion with Sir Tony Hoare, Joe Armstrong, and Carl Hewitt
by Erlang Solutions
When considering a panel to discuss concurrency, you’d be hard-pushed to find a higher calibre than Sir Tony Hoare, Joe Armstrong, and Carl Hewitt, all greats within the industry and beyond. Over the past couple of weeks, we’ve been releasing their individual interviews: a storyboard of the lifeline of concurrency and its models over the past few decades.
Here we have the full panel discussion, hosted by Francesco Cesarini, about their experience in the concurrency field, and where they see concurrency heading in the future.
[Full Panel Discussion Transcript]
QU 1 - What problems were you trying to solve when you created actors, communicating sequential processes and the Erlang type of concurrency respectively?
Francesco Cesarini: Concurrent programming has been around for decades. Concurrency is when multiple events, code snippets or programs are perceived to be executing at the same time. Unlike imperative languages, which use routines, or object-oriented languages, which use objects, concurrency-oriented languages use processes, actors or agents as the main building blocks.
Whilst these concurrency foundations have remained the same and stable, the problems we’re solving today in the computer science world have changed a lot compared to when these concepts were originally put together in the '70s and '80s. Back then, there was no IoT. There was no web; there were no massive multi-user online games, video streaming, automated trading or online transactions. The internet has changed it all and, in doing so, it has helped propel concurrency into future mainstream languages. Today we’re very fortunate to have Professor Tony Hoare, Professor Carl Hewitt and Dr. Joe Armstrong; three visionaries who in the '70s and '80s helped lay the foundations of the most widely spread concurrency models as we know them today. So, welcome and thank you for being here.
Interviewees: Thank you.
Francesco: The first question I’d like to ask is, what problems were you trying to solve when you created actors, communicating sequential processes and the Erlang type of concurrency respectively?
Carl Hewitt: I think the biggest thing that we had was, we had some early success with Planner, right? There were these capability systems running around, there was functional programming running around. The most important realisation we came to was that logic programming and functional programming couldn’t do the kind of concurrency that needed to be done.
Joe Armstrong: You’re right.
Carl: At the same time, we realised it was possible to unify all these things together, so that functional programs and logic programs, all these digital things, were special cases of just one concept for modelling digital computation. You can get by with just one fundamental concept, and that was the real thing. Of course, we thought, “Well, there’s plenty of parallelism out there, there are all these machines, we’ll make this work.” The hardware just wasn’t there at the time and the software wasn’t there at the time, but now we’re moving into a realm of having tens of thousands of cores on one chip, and these aren’t wimpy GPU cores. These are the real things, with extremely low latencies among them, so with good engineering we’ll be able to achieve latencies between actors passing messages in the order of 10 nanoseconds. We’re going to need that for the new class of applications that we’re going to be doing, which is scalable intelligent systems. There’s now this enormous technology race on.
Francesco: What inspired CSP?
Tony Hoare: It was the promise of the microprocessor. The microprocessors were then fairly small and they all had rather small stores and they weren’t connected to each other but people were talking about connecting large numbers of microprocessors mainly in order to get the requisite speed. I based CSP design on what would be efficient, controllable and reliable programming for distributed systems of that kind. So that’s a basic justification for concentrating on a process that didn’t share memory with other processes, which certainly makes the programming a great deal simpler.
The problem at that time was the cost of connecting the processes together, the cost and the overhead. The devices for doing this were based quite often on buffered communication, which involves local memory management at each node. I knew that since you had to call a software item to perform communication that the overhead would just escalate as people thought of new and clever things as people always do with software, don’t they? I wanted the hardware instruction for output and for input to be built into the machine code, in which the individual components were programmed.
Now, a measure of the success of the transputer, which with the efforts of David May was implemented some years later, '85 as opposed to '78, is that he got the overhead for communication so low that if you wanted to program an assignment even, you could program it by forking another process: a process which performs an output of the value to be assigned and another process for inputting the value that is intended to be assigned, use communication for that and then join the two processes again. All within a factor of 10 to 20 ordinary instruction cycles, which was way above anything that any other hardware system could touch. Because the communication was synchronised, it was possible to do it at the hardware level. There was another reason for further pursuing the synchronised communication: I was studying the formal semantics of the language by describing how the traces of execution of each individual process were interleaved. If you have synchronised communication they behave like a sort of zip fastener, where each zip links in with a single zap and the train of synchronisations forms a clear sequence, with interleaving only occurring in the gaps between the synchronised communications. A combination of practice and theory seemed to converge on making synchronised communication the standard.
Of course, I realised you very often need buffered communication but that isn’t very difficult to implement on a very low overhead communication basis by just setting up a finite buffer as a process in the memory of the computer, which mediates between the outputting process and the inputting process.
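Editor's sketch: Tony's point that a finite buffer can be built as a process on top of synchronised communication can be illustrated roughly as follows. This is a minimal Python sketch, not transputer or occam code. The names `SyncChannel`, `cell`, `make_buffer` and `demo` are hypothetical, and the rendezvous is only approximated with `queue.Queue.join()`. A buffer of capacity N is modelled as a chain of N one-place copying processes, each of which inputs a value and then outputs it.

```python
import queue
import threading


class SyncChannel:
    """Approximate rendezvous: send() returns only after a receiver took the value."""

    def __init__(self):
        self._q = queue.Queue(maxsize=1)
        self._send_lock = threading.Lock()  # one sender at a time

    def send(self, value):
        with self._send_lock:
            self._q.put(value)
            self._q.join()  # wait until the receiver has called task_done()

    def recv(self):
        value = self._q.get()
        self._q.task_done()  # releases the blocked sender
        return value


_DONE = object()  # sentinel used to shut the pipeline down


def cell(inp, out):
    """One-place buffer process: input a value, then output it, forever."""
    while True:
        value = inp.recv()
        out.send(value)
        if value is _DONE:
            return


def make_buffer(capacity):
    """Chain `capacity` cells; returns the (input, output) channels of the buffer."""
    channels = [SyncChannel() for _ in range(capacity + 1)]
    for left, right in zip(channels, channels[1:]):
        threading.Thread(target=cell, args=(left, right), daemon=True).start()
    return channels[0], channels[-1]


def demo():
    inp, out = make_buffer(3)
    # The producer can run ahead by three items before any consumer is listening.
    for i in range(3):
        inp.send(i)
    received = [out.recv() for _ in range(3)]
    inp.send(_DONE)
    out.recv()  # drain the sentinel so every cell terminates
    return received
```

In this sketch the producer and consumer never share memory with the buffer; they only communicate with the cells at either end, which mediate between the outputting and the inputting process.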
Francesco: You picked synchronous message passing because it was fast enough and it solved the problem?
Tony: Fast enough? It was as fast as it could possibly be. I’m talking about what is now 10 nanoseconds, that’s the sort of speed you need to be built right into the software.
Francesco: Exactly. Not only that, but the solution was much simpler, which is perfect. Joe, what about you?
Joe: I started from a different angle. I wanted to build fault-tolerant systems, and pretty soon I realised that you can’t make a fault-tolerant system on one computer, because the entire computer might crash, so I needed lots of independent computers. I’d read your CSP book and played with the transputer and I thought, “This is great, this sort of lockstep [mimics thumps]. How does it work in a context where the message passing is not internal?” It had to be remote; I did want it to be remote in case the thing crashed, and I couldn’t get this to be synchronous.
I was a physicist and I’m thinking, “Messages take time, and they propagate through space; there’s no guarantee they get there.” If you send a message to something else and nothing comes back, you don’t know if the communication is broken or if the computer is broken. Even if it gets there and the computer receives it, the computer might not do anything with it, so you really can’t trust anything, basically.
I just wanted to model what’s going on in the real world. I had read your book, and this observational equivalence really struck me; I thought it was the most important principle in computer science. Basically, we’ve got black boxes that communicate, but we shouldn’t care what programming language they’re written in, provided they obey the protocol. So I thought it was central that we wrote down the protocol. Because we couldn’t formally prove it in the sense you would want to do inside one system, I thought, “Well, we’ve got to just dynamically check everything.” So, we need to build a world where there are parallel processes communicating through message passing, and I thought they cannot have shared memory, because if they have shared memory and the remote machine crashes, you get dangling pointers that you can’t dereference, and you don’t want that. So that was a sort of guiding principle. I didn’t know about the actor stuff at the time and, I don’t know, how else can you build systems?
Carl: That’s right [chuckles].
Joe: We are four people. We’ve got our state machines and we’re sending messages to each other. We don’t, actually, know if the message has been received.
Carl: That’s right.
Joe: And this and this. I thought, because I used to be a physicist, that the program and the data have got to be at the same point in space-time for the computation to occur. I thought, why are people just moving the data and not the programs? We could move both of them to some intermediate point in the middle and perform the computation there. I think in part of the system we can use strong algebras and lockstep, and it’s very beautiful. In another part of the system we can’t; it seems to be a mix between mathematics and engineering.
The mathematics can be applied to part of the system, and best engineering practice can be applied to other parts of the system; it’s a delicate balance between the two. I was pursuing the engineering aspects of this and just trying to make a language to make it relatively easy to do this. I thought we’re treading on a minefield. There are certain bits of software, like leadership election, that are terribly complicated, and then there are bits that are terribly easy.
It struck me as rather strange that there was this paradox: there are things that are terribly simple in sequential languages that are impossible in concurrent languages, and then there’s the other way around, things that are terribly simple in concurrent languages and impossible in sequential languages.
Tony: I agree with you completely about the central importance of buffered messaging, and indeed it has some nice mathematical properties that synchronised messaging doesn’t have. But the synchronised messaging paradigm really has another reason, and that is to treat the input and output as a single atomic action. I think of atomic actions in the sense of actors, and in the sense of Petri nets too, I think.
Carl: That was the other thing that mystified us, because we were thinking, well, we want to be like the thing that was done for the old sequential computation by Turing and Church. There was a universal model. They nailed that thing. We wanted to do the same thing for concurrency, so we thought, well, the only possible way to do that is to base it on physics, because no matter what they do, you can’t get around physics. [laughs]
Tony: I agree, yes.
Carl: This put constraints on, and also we wanted to be distributed. We thought, well, okay. Also, we wanted to be multi-core. So that means, if it’s distributed on the IoT, that there’s no in-between place to buffer messages. The message leaves here before it arrives there, right? We can’t synchronise these IoT things, so the fundamental communication primitive has to be asynchronous and unbuffered. If you want to have a buffer, that’s just another actor. You do puts and gets on your buffer, right, and sure.
Tony: Now, you’ve got this all upside down.
Tony: Look at the actual physics, the electronics, it’s all local. If you have a 10 nanosecond communication time on-chip and you don’t take advantage of it by doing synchronised communication then your overhead is good. You can’t use it for everything. Basically, both of them are necessary, which is fundamental…so shall we say postpone discussion?
Carl: You see, we don’t have 10 nanoseconds. It’s only the average. In some cases, okay, it’s going to take us a long time to get a message across a chip, but we have to, through compacting garbage collectors, get the locality and so on. On average it’s only 10 nanoseconds, but when a core on one side of the chip sends a message to the other side of the chip, it goes through this fantastically complicated interconnect. It’s like the internet on a chip between these two cores. Again, the way that they build these things, there is no buffer. You assemble a message in one core and you give it to the communication system and it appears on the other side, and there’s no buffer.
Joe: I think, we don’t really want two different ways to program.
Carl: That’s right.
Joe: If you’ve got the World Wide Web, and I’ve got a process in Cambridge that’s talking to one in Stanford, it’s messages. I write it this way: I have to send and receive. I have to allow for failure; my message might not get through. And suddenly, if they collapse this program onto a single processor, where they’re both in the same place, I don’t want to change the way I program. I want to write exactly the same thing. I don’t want two different mental models. I can use mutexes and I can use all these fancy things, but I don’t want to– [crosstalk] Where I can see that–
Carl: Have you seen the chip? Now with having 10,000 cores on the chip, the core on the other side of the chip might as well be in Los Angeles, get ready. It’s distributed programming on a single chip– [crosstalk]
Joe: And WiFi and things like this are going to change that as well when we have– [crosstalk]
Tony: When you can build the buffered communication with a 10 nanosecond average delay, I will come around to your point of view.
Carl: Oh, we can do that [laughs] Average, the trick is the average. In some cases, it’s going to be– It could be seconds, that’ll be very few of them.
Tony: That’s why I think the Erlang idea of distinguishing local from remote is so important, and they both exist, which is fundamental. I’m not going to argue about it. If you take remote as fundamental, well, you’re welcome to it, as long as you give me my transactions. The things that happen locally really do happen in a way that no other agent in the system can detect any time or state in which some of the actions have happened and others have not.
Carl: I am absolutely good. Inside an actor, it’s not visible to the other actors.
Joe: That’s right, but I thought that the function call really is like a black box you use.
Tony: Absolutely. Yes.
Joe: You’re sending a message into this thing that doesn’t know, and you get a return value. Only it’s got different semantics, because exactly once is trivial, I mean that’s how it works, but exactly once doesn’t work in a distributed system; it’s at most once or at least once. You have all the funny impossibility things happening then. It’s funny that this is different local [crosstalk] in a different model.
Tony: I think there’s– I’ll bring in one of my other buzzwords, abstraction. Modern systems are built out of layers of class declarations. A class declaration can itself be used, called by classes higher up the hierarchy of abstraction. The method calls to a lower class are treated theoretically, though they may not be implemented that way, as transactions when reasoning in the higher-level class.
They will be implemented by method bodies, which are far from atomic in the lower class. Each class has an appropriate level of granularity at which it regards certain groupings of the actions as being inseparable in time. At the same time, it produces nonatomic things, which are just method bodies, which simulate the atomicity at higher levels. Now the simulation can be very good, because the one restriction about disjointness that I would like to preserve is that each level doesn’t share anything with the levels above and below, which I think practical programmers, perhaps you could check this, would regard as a reasonable thing to declare–
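Editor's sketch: Joe's distinction between a local call (exactly once) and a remote message (at most once or at least once) can be made concrete with a small deterministic simulation. Everything here is hypothetical illustration rather than Erlang behaviour: `Receiver`, `lossy_call` and the two `send_*` functions are invented names, and the "network" is just a list of `(drop_request, drop_reply)` flags, one pair per attempt. De-duplicating by message id is one common way to make at-least-once delivery behave like exactly-once processing; it is an assumption added here, not something the panel describes.

```python
class Receiver:
    """Remote party. Optionally remembers message ids to ignore duplicates."""

    def __init__(self, dedupe):
        self.dedupe = dedupe
        self.seen = set()
        self.applied = []  # payloads that actually took effect

    def handle(self, msg_id, payload):
        if self.dedupe and msg_id in self.seen:
            return "ack"  # duplicate: acknowledge again, do not re-apply
        self.seen.add(msg_id)
        self.applied.append(payload)
        return "ack"


def lossy_call(receiver, msg_id, payload, drop_request, drop_reply):
    """One attempt. Returns the reply, or None if the request or the reply was lost."""
    if drop_request:
        return None
    reply = receiver.handle(msg_id, payload)
    return None if drop_reply else reply


def send_at_most_once(receiver, msg_id, payload, faults):
    """Try once and never retry: the message may be lost, never duplicated."""
    drop_request, drop_reply = faults[0]
    return lossy_call(receiver, msg_id, payload, drop_request, drop_reply) == "ack"


def send_at_least_once(receiver, msg_id, payload, faults):
    """Retry until acknowledged: the message may be delivered more than once."""
    for drop_request, drop_reply in faults:
        if lossy_call(receiver, msg_id, payload, drop_request, drop_reply) == "ack":
            return True
    return False
```

The sender cannot tell a lost request from a lost reply, which is why retrying can apply the same message twice unless the receiver recognises the duplicate.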
Francesco: By the way, it’s correct, layering abstraction.
Joe: I entirely agree with you, but the question that interests me is, what happens when the system becomes huge? Because if you got this little tight system, you could prove anything you want.
Tony: That’s right.
Joe: You may or may not be able to prove things about, but the idea of the real practice is– [crosstalk]
Tony: The real payoff comes when– [crosstalk]
Joe: Really big. How I know when–
Tony: Bigger the better.
Joe: Yes, but imagine it’s changing more rapidly, than you can prove its properties. Imagine– [crosstalk]
Tony: The reason why you could–
Joe: Imagine it’s always inconsistent.
Tony: I don’t have to imagine these things– [crosstalk]
Joe: [crosstalk] any attempt to make it consistent is impossible.
Tony: Do you know Hoare’s great saying, that inside every large program there’s a small program trying to get out?
Carl: Yes, but you never find it.
Tony: You never find it because you didn’t put it there. At the very top level of abstraction, you will have a very powerful atomic action. You can write a small program which describes, at a large scale, what a very large program does.
Joe: The thing that really scares me is people developing large applications that they don’t understand. Then they get so complex, and they put it inside the black box and seal it [crosstalk] layers, so you end up with gigabytes of stuff.
Tony: Who talked about sealing it? What is the part of that program that changes most frequently? The top layers change. You can change, they have an interface. Well defined interface [crosstalk].
Joe: Many programs don’t have a well-defined interface. They should have, I entirely agree.
Carl: These intelligent systems don’t work that way. They’re not like operating systems, okay? These ontologies have a massive amount of information. The layering doesn’t work anymore for these massive ontologies. They’re just chock-full of inconsistency.
[00:21:17] [END OF AUDIO]
QU 2 - Is there anything forgotten which should be known, or anything which you feel has been overshadowed, which is important?
Francesco: Is there anything forgotten which should be known, or anything which you feel has been overshadowed, which is important? I think maybe one or two key points.
Joe: I could talk. I have lectures that have gone for hours.
Francesco: [laughs] I wish we had hours.
Carl: That’s a scam.
Joe: Now you’ve got me going. Now it becomes interesting.
Francesco: Anything from CSP, which you feel has been omitted or forgotten, which would help us today?
Tony: If there is, I’m sure I’ve forgotten it. [laughter] I do think this- the new factor, which I hope will become more significant, has been becoming more significant, and that is tooling for program construction. A good set of tools, which really supports the abstraction hierarchy that I’m talking about, and enables you to design and implement the top layers first by simulation, of course, of the lower layers, it’s the sort of stub technique that it will actually encourage programmers to design things by talking about its major components first. The second thing is that the tools must extend well into the testing phase. As you will know, large programs these days are subject to changes daily. Every one of those changes has to be consistent with what’s gone before and correct not to introduce any new strange behaviours. I use a Fitbit. Changes are just extraordinary.
Joe: Why do I have to change the software once a day?
Tony: No, they’re a little bit less frequent. I have to exercise once a day. I think that’s the problem.
Joe: People keep telling me this, you’ve got to upgrade your operating system. Then they say, “Well, that’s because of security things.” I don’t really have much confidence in them. [laughter] If they said we have to change it once every 20 years, I could believe that it was reliable, but telling me that I have to change it once every six weeks is crazy.
Tony: You need an overall development and delivery system in which you can reliably deliver small changes to large programs on a very frequent basis.
Joe: Without breaking everything.
Tony: Without breaking everything. Well, I think I have a rather unpleasant dream, I think it is, that you’ll get your customers to do the testing. The beta testing has always been a useless technique. Well, you set up a sandbox in which you deliver the new functionality. If it fails in any way, you go back to the old functionality, you treat it as a large scale transaction, you go back and you run the old functionality for that customer and you report it. That report gets straight to the person who is responsible for that piece of code in the form of a failed trace.
Joe: Why don’t they do that? Should have done it 20 years ago.
Tony: Because it’s actually not very easy.
Joe: You get a black box and record it, you know like a flight recorder on a plane.
Tony: The screens were not really powerful enough to do a large scale trace. In the case of concurrency, you mustn’t use logs. Those logs of individual threads are a pain to correlate when you do get the communication. You’ve got to use, in fact, a causality diagram, an arbitrary network, and the software to manipulate those on a large scale I think will take some time to develop.
[00:05:06] [END OF AUDIO]
QU 3 - Linear structures vs causal structures
Francesco: Even with concurrency, you need to be able to extract the linear execution of your program from process-to-process. It’s something which there is work being done on it and structure deal.
Tony: I would say you have to analyse the causal structure, not the linear structure.
Francesco: That’s true.
Carl: I think what we’ve forgotten, which we knew in the early days of intelligent systems, is that these systems are to be isomorphic with large human organisations. These complex intelligent systems are going to run on very much the same principles that Stanford runs on. You say, “Well, what are the specifications for Stanford University?” Well, we have principles and we have ethics and we have guidelines, but you really don’t have formal specifications. Anything you think is going to work for programs, for large programs, that wouldn’t work for something like Stanford University, is not going to work, because they’re basically isomorphic.
Therefore, I think that what we do is we do keep logs for these things. Stanford keeps records of all kinds, and that’s so that if something goes wrong, we can look back and try to see how we can do better in the future and also to assess accountability and responsibility. That is the fundamental thing, is that that’s going to be the structure of these large scale information systems that we are constructing.
Francesco: My key points I’m taking home here are, simplicity, where you need to have small programs or programs which become complex but the units are small. It makes sense to see a process, an actor or an agent as maybe one of the building blocks, which is small, it’s containable, it’s manageable. The second point is I think the importance of no shared memory, correct me if I’m wrong. This no shared memory approach then brings us into both distribution and scalability and multi-core. Those are the key points I’m taking home.
Joe: I think one of the things we’ve forgotten is the importance of protocols and of describing them accurately. We build a load of systems, but we don’t write down what the protocols between them are, and we don’t describe the ordering of messages and things like that. You would think that it would be easy to reverse engineer a client-server application just by looking at the messages, but you can trace the messages and then you say, “Where’s the specification that says what the order should be?” There is no specification. Then you have to guess what the ordering means.
Tony: One can use finite state machines [crosstalk] specifying these things. CSP would.
Joe: People don’t, that’s the problem. In fact, all the international RFCs are pretty good.
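Editor's sketch: the exchange above, writing down the allowed ordering of messages and using a finite state machine to specify it, can be illustrated with a small trace checker. The protocol below is entirely hypothetical (the state and message names `idle`, `login`, `request` and so on are invented for illustration and come from no real RFC). The point is only that once the ordering is written down as a transition table, a recorded trace can be checked mechanically instead of guessed at.

```python
# Hypothetical client-server protocol: (current state, message) -> next state.
PROTOCOL = {
    ("idle", "login"): "authenticating",
    ("authenticating", "login_ok"): "ready",
    ("authenticating", "login_failed"): "idle",
    ("ready", "request"): "awaiting_reply",
    ("awaiting_reply", "reply"): "ready",
    ("ready", "logout"): "idle",
}


def check_trace(trace, start="idle"):
    """Replay a recorded message trace against the protocol table.

    Returns (ok, position, state): on success, position is len(trace) and
    state is the final state; on failure, position is the index of the first
    message that the protocol does not allow in the state reached so far.
    """
    state = start
    for index, message in enumerate(trace):
        next_state = PROTOCOL.get((state, message))
        if next_state is None:
            return (False, index, state)
        state = next_state
    return (True, len(trace), state)
```

For example, a trace that sends `request` before the login has been confirmed is rejected at that message, with the checker reporting that the session was still in the `authenticating` state.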
Tony: I think I would have a small concession to make to sharing. You’re allowed to share something between two processes at most. Obvious examples are the communications channel. What’s the use of that if you can’t share it between the outputter and the inputter? Now, if you have a hierarchical structure like I’ve been describing, the behaviour of a shared object is programmed as a process inside the lower level class, so that even if you only use the same programming language, it’s a highly non-deterministic structure, which in a different context I used to call a monitor, which accepts procedure calls, accepts communications from all sides.
Everybody who uses it has to register and has to conform to an interface protocol which governs the sequence which makes the sharing safe in a sense, which is important at the higher level and implemented in the lower level.
Carl: The fundamental source of indeterminacy in these systems, where you have all these zillions of actors sending messages, is the order in which the messages are received; that’s where the arbitration occurs in the system. If you have something, for example, and this is work that was done by Tony, like a readers-writers scheduler, you’ve got these read messages and write messages coming in from the great world out there. You’re sitting here defending this database; you’re scheduling it so that it’s never the case that there are two writers in the database, and it’s never the case that there are a reader and a writer.
You’re sitting here taking these requests from all comers; you don’t know who’s going to be reading and writing this database, and you’re scheduling all of that. You have your own internal variables, and they must be kept very private: the number of readers and the number of writers you’ve got in the database, for example. The indeterminacy is in these messages that are coming in from the outside world, which you are then scheduling for the database. That is the fundamental and irrevocable source of the indeterminacy.
Tony: I agree. That’s why it’s built into CSP. The fundamental choice construct allows you to wait for the first of two messages to arrive. Cut it down to two; two doesn’t scale, but just bear with me for a bit! The great advantage of this is that if you have to order the receipts of these two messages, you will double the waiting time. If you wait for two things at the same time, you wait twice as fast.
Carl: Well, here’s the actors’ view of these two messages coming in. You take them in the order in which they’re received. If you want to process them in a different order inside, fine, but the idea is you don’t want to have a queue of things waiting; you want to take everything that comes inside so that you can then properly schedule the order in which you process it. It’s like your mail: you take the mail that comes in, and you may not want to pay the first bill that comes in. You’ll process it later, but you take it as it comes in because that’s much more efficient.
Joe: What Erlang does is, every process has got a mailbox. Incoming messages just end up in the mailbox in order, and the program gets an interrupt saying, “Hey, there’s something in the mailbox,” and then it can do what the hell it likes; it just picks them out. “I want to take that one out, take that one out. I’m going to go to sleep again.”
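Editor's sketch: the readers-writers scheduler Carl describes can be sketched as a single object that takes messages in arrival order and keeps its reader and writer counts private. This is a simplified, single-threaded Python illustration, not Carl's or Tony's formulation. The class name, the message shapes `(kind, client)` and the first-come-first-served granting policy are all assumptions made for the example.

```python
from collections import deque


class ReadersWriterScheduler:
    """Guards a database: never two writers, never a reader together with a writer."""

    def __init__(self):
        # Private state: nothing outside the scheduler reads or writes these.
        self._readers = 0
        self._writing = False
        self._waiting = deque()
        self.granted = []  # order in which access was granted, for inspection

    def receive(self, message):
        """Take one message, in whatever order the outside world delivers it."""
        kind, client = message
        if kind in ("read", "write"):
            self._waiting.append((kind, client))
        elif kind == "read_done":
            self._readers -= 1
        elif kind == "write_done":
            self._writing = False
        self._schedule()

    def _schedule(self):
        # Assumed policy: grant strictly in arrival order, so a waiting writer
        # is not overtaken by later readers.
        while self._waiting:
            kind, client = self._waiting[0]
            if kind == "read" and not self._writing:
                self._readers += 1
            elif kind == "write" and not self._writing and self._readers == 0:
                self._writing = True
            else:
                return
            self._waiting.popleft()
            self.granted.append((kind, client))
```

Feeding the same requests in a different arrival order produces a different grant order, which is the indeterminacy Carl is pointing at: it lives in the order of receipt, not inside the scheduler.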
Carl: That’s an excessive amount of overhead.
Joe: But it makes the programming a lot easier.
Carl: I don’t think so, because you can program it much more easily if you take it all inside as it arrives and don’t have this separate queue out there.
Joe: But then you have an MxM-state machine…
Carl: Well, it’s not so bad for readers, writers.
Francesco: It depends on the problem you’re solving, very much.
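Editor's sketch: the mailbox Joe describes, where messages queue in arrival order and the process picks out the ones it wants, can be approximated as below. This is a Python illustration of the idea only, not Erlang's actual `receive` implementation. The `Mailbox` class and its method names are hypothetical, and unlike a real Erlang process this version returns `None` instead of going to sleep when nothing matches.

```python
class Mailbox:
    """Messages are stored in arrival order; the owner may take them out of order."""

    def __init__(self):
        self._messages = []

    def deliver(self, message):
        """Called by senders: append to the end of the mailbox."""
        self._messages.append(message)

    def receive(self, matches):
        """Remove and return the oldest message for which matches(message) is true.

        Non-matching messages stay in the mailbox, in their original order.
        Returns None if nothing matches (a real process would wait instead).
        """
        for index, message in enumerate(self._messages):
            if matches(message):
                return self._messages.pop(index)
        return None

    def pending(self):
        """Snapshot of what is still waiting, oldest first."""
        return list(self._messages)
```

Carl's objection corresponds to the scan inside `receive`: every selective pick may walk past messages that were left behind earlier, which is the overhead he prefers to avoid by taking everything inside on arrival.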
[00:07:53] [END OF AUDIO]
QU 4 - Why is concurrency at scale today still done with legacy languages that have concurrency bolted on as an afterthought? What do we need to do to change this?
Francesco: Why is concurrency at scale today still done with legacy languages that have concurrency bolted on as an afterthought? I think concurrency needs to be designed into the language from scratch. It’s very, very hard to write the framework and bolt it on. What do we need to do to change this?
Joe: Survival of the fittest.
Carl: Often it’s a new project, okay, like the moon project or, heaven forbid, the Manhattan Project or the icon project, that enables new things to be brought in, because otherwise capitalism is basically a very incremental hill-climbing process. The most sensible financial thing for capitalists to do is to bolt something on, because you get the most rapid buck for the least investment in the short term, but then you end up with monsters like C++ [laughs] and things like that if you just keep pursuing that path. I think that because we’re now engaged in this giant race among these nations to create these scalable intelligent systems, and they’re good at creating these large projects to do that, there is some opportunity now for innovation, because that’s not the standard hill climbing.
Joe: I think hardware changes precede software. I think if you keep the hardware the same, you get an S-shaped curve: you get rapid development in the beginning, then you get up to the top end and nothing much happens, and then new hardware comes along and suddenly there’s a lot of change. So Erlang is billions and billions of times faster than it was, but that’s due to clock speeds; it’s not due to clever programming.
Carl: Well, the clocks aren’t going up now. We’re now faced with two fundamental evolutions: having thousands of powerful cores on one chip.
Carl: Also having all these IoT devices, those are two huge hardware changes.
Joe: I always thought that gigabyte memories, and certainly petabyte memories when they come, are to me like an atomic bomb. If you imagine the combination of petabyte memories with LiFi and communication at tens of gigabits per second, and like 10,000 Cray-Ones in a little thing like your fingernail, everywhere, in every single light bulb, that’s like an atomic bomb hitting software. What we’re going to do with it, nobody’s got a clue.
Carl: Well, that’s the thing: the stacked carbon nanotube chips that they’re working on now are going to give us these thousands of cores on a chip. Also, they make the memory out of the same stuff they make the processor out of. That’s different from now, where we make the DRAMs out of different stuff from what we make the processors out of, so we can’t combine them.
Joe: I was completely blown away a couple of weeks ago. I saw a newspaper article about farm bots, and this company had made three little robots. One was a seed planting robot, a tiny little thing. It will go around and plant seeds. Then there was a watering robot. It walked around and looked at the seeds. Then there was the weeding robot that had a pair of scissors on the bottom. It went around snipping the things, and suddenly there was this realisation that in farming we could watch every single seed individually, and the amount of energy to do so, I think they were claiming, was like 5% of the energy of ploughing; using a plough is a terribly inefficient use of energy.
When we’ve got computing at this scale, we can tackle traditional problems in completely different ways and we have to do that for the benefit of mankind not to build things to feed your cat when you’re out. To improve the efficiency of farming and things like that. It’s amazing.
Francesco: What you didn’t know is that the farm bot was actually powered by Erlang.
Speaker 4: No, I didn’t.
Joe: It was open source, and all you need is a 3D printer and you can print these things and have them running around in your garden.
[00:04:16] [END OF AUDIO]
QU 5 - The future of concurrent programming and immutability
Francesco: I think there are a lot of claims about the future of concurrent programming languages. Some people claim that there’ll be a lot of features taken from functional programming languages. The first kind of feature which comes to mind is immutability.
Carl: The essential thing about actors is that they change. They get all their power of concurrency because they change. Now, the messages they send between each other are immutable because they have to exist as photons and there’s no way to change the photons en route. By definition, the messages are immutable, but the actors have to change. They get all their power of modularity over functional programming because they do change. They change a lot, which functional programming can’t do, right?
Francesco: Yes, but it’s only the actors which can change their own data.
Carl: They change it, that’s right. [crosstalk]
Francesco: From the outside, yes.
Carl: As our friends say, change comes from within. You can’t change me but you can send me a message so I can change myself.
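The exchange above can be illustrated with a minimal Python sketch, assuming an actor is modelled as a thread with a private mailbox. The names `AccountActor`, `Deposit` and `GetBalance` are hypothetical and do not come from the discussion. The messages are frozen (immutable) values, and only the actor's own thread ever touches its state. For simplicity this sketch handles one message at a time, which is more sequential than the internal parallelism Carl argues for.

```python
import queue
import threading
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Deposit:
    """Immutable message: it cannot be altered once created."""
    amount: int


@dataclass(frozen=True)
class GetBalance:
    """Immutable request that carries the mailbox to reply to."""
    reply_to: queue.Queue


class AccountActor:
    """Hypothetical actor: only its own thread reads or writes _balance."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._balance = 0  # private, mutable state
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, message):
        # The only thing the outside world can do is send a message.
        self._mailbox.put(message)

    def _run(self):
        while True:
            message = self._mailbox.get()
            if isinstance(message, Deposit):
                self._balance += message.amount  # change comes from within
            elif isinstance(message, GetBalance):
                message.reply_to.put(self._balance)


def demo_balance():
    """Send two deposits, then ask the actor for its balance."""
    actor = AccountActor()
    actor.send(Deposit(10))
    actor.send(Deposit(5))
    reply = queue.Queue()
    actor.send(GetBalance(reply))
    return reply.get(timeout=1)
```

Because the mailbox is first-in, first-out and there is a single sender, the balance request is handled after both deposits.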
Francesco: It’s a form of isolation, and I think these are ideas which come from functional programming, but they’ve also been heavily influenced by object-oriented programming. I’m thinking of Alan Kay’s objects: objects don’t share memory, and objects communicate with message passing.
Carl: You should mention Kristen Nygaard and Ole-Johan Dahl for that.
Tony: I think this is a crucial argument. If you’re writing programs that interact with the real world, you’ve got to construct a model of the real world inside the computer, as it’s done just universally in the design of real-time systems. The real world has things called objects and the objects do sit in a certain place, more or less. They can move around but the movement of objects, the existence of objects, the sequentiality of the actions performed by the same object, these are features of the real world. The objects change. Functional programming doesn’t address the real world. I think functional programs are wonderful. [chuckling]
I really, really admire functional. If I had my choice, I’d always use functional programming.
Carl: You don’t have a choice.
Carl: You can’t do the readers/writers scheduling as a functional program. It just makes no sense, it can’t do it. The scheduler has to be continually changing its internal state, as the read and the write messages come in. It’s got to be buffering up the reads and buffering up the writes and letting some reads – It’s just always changing and you can’t do that.
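A rough Python sketch of the kind of scheduler Carl is describing, assuming a simple first-come, first-served policy (the discussion does not specify one). The class `ReadWriteScheduler` and its method names are hypothetical. The point it illustrates is that the scheduler's internal state (active readers, active writer, buffered requests) changes with every message it receives.

```python
from collections import deque


class ReadWriteScheduler:
    """Hypothetical readers/writers scheduler with strict arrival order.

    Any number of readers may be active together; a writer needs exclusive
    access. Requests that cannot be admitted yet are buffered.
    """

    def __init__(self):
        self.active_readers = 0
        self.writer_active = False
        self.waiting = deque()  # buffered (kind, client) requests

    def request(self, kind, client):
        """Handle a 'read' or 'write' request; return clients admitted now."""
        self.waiting.append((kind, client))
        return self._admit()

    def release(self, kind):
        """Handle a client finishing; return clients admitted as a result."""
        if kind == "read":
            self.active_readers -= 1
        else:
            self.writer_active = False
        return self._admit()

    def _admit(self):
        granted = []
        while self.waiting:
            kind, client = self.waiting[0]
            if kind == "read" and not self.writer_active:
                self.active_readers += 1
            elif (kind == "write" and not self.writer_active
                  and self.active_readers == 0):
                self.writer_active = True
            else:
                break  # head of the queue must wait, so everyone behind waits
            self.waiting.popleft()
            granted.append(client)
        return granted
```

With this policy a read that arrives behind a buffered write waits for that write, which avoids starving writers at the cost of some read parallelism.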
Joe: Alan Kay said the big thing about object-oriented programming was the messages. It was the messaging structure that was the important bit. That was what had been lost. Then the next thing comes: we’ve got your immutable messages, which I totally agree with. Then we need some kind of notation to write down the sequences of allowed messages, which you’ve got in CSP and which people tend to ignore: a state machine in CSP describing the allowed sequencing of messages.
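As an illustration of what writing down the allowed sequences of messages could look like, here is a small Python sketch. It is a plain finite state machine, not CSP notation, and the file-like protocol (`open`, `read`, `write`, `close`) is an invented example rather than anything mentioned by the panel.

```python
# Hypothetical protocol: a resource must be opened before it is read or
# written, and nothing may follow "close" except another "open".
ALLOWED = {
    ("idle", "open"): "ready",
    ("ready", "read"): "ready",
    ("ready", "write"): "ready",
    ("ready", "close"): "idle",
}


def accepts(messages, state="idle"):
    """Return True if the message sequence is allowed by the protocol."""
    for message in messages:
        key = (state, message)
        if key not in ALLOWED:
            return False  # this message is not allowed in this state
        state = ALLOWED[key]
    return True
```

A receiver could run each incoming message through such a table and reject anything that arrives out of order.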
Carl: The only thing about the actor model was to minimise sequentiality as much as possible. Sequentiality is evil. You have arbitration in front of the actor, in terms of the order in which it’s going to take the messages in, because that’s irreducible. As soon as an actor takes a message in, it wants to run everything inside of itself in parallel, to the extent that it can. That is its goal: the maximum amount of internal parallelism inside an actor.
[00:03:53] [END OF AUDIO]
QU 6 - Can a solution with shared state be made robust and safe? And can a solution which communicates with message passing be made fast?
Francesco: Can a solution with shared state be made robust and safe?
Carl: You mean shared memory in which you do assignments, on loads and stores? No way.
Francesco: A second question is can a solution which communicates with message passing be made fast?
Carl: Yes, but only if you have the right kind of processors. In that respect, Tony was a pioneer with the transputer, realising that in order to do this at speed, you have to have hardware that’s suitable, and the hardware previously was not. We’re going to have to do that again. The RISC processor is not suitable. We have to do better than that.
Francesco: What is the implication for the future of software development?
Tony: I think to capture– The test for capturing the essence of concurrency is that you can use the same language for the design of hardware and of software because the interface between those will become fluid. You’ve got to have a hierarchical design philosophy in which you can program each individual 10 nanoseconds at the same time as you program over a 10-year time span. Sequentiality and concurrency enter into both those scales. Bridging the scale of granularity of time and space is what every application has to do. The language can help do that. That’s a real criterion for designing the language. [crosstalk]
Carl: Each semicolon hurts performance because you have to finish up the thing that’s before the semicolon before you can start the thing after the semicolon. The ideal concurrent program has no semicolons. [laughs] No sequentiality.
Tony: I played with functional programming…
Carl: No, no, it still has to do the state change, but it has to have these macro state-change things, like enqueuing and dequeuing and allowing guys in the queues to proceed. These macro things are there so that you don’t have to spray your program full of semicolons, but still have the state change. It’s not functional.
Joe: I have played with some of these highly concurrent languages. I played with Strand, which was highly concurrent, and it was terrible because it had the opposite problem: you created too many parallel processes. To your surprise, some tiny thing had created 6 million parallel processes to do virtually nothing- [crosstalk]
Tony: There is a wonderful way of controlling concurrency. If you got a concurrency problem, try and make it more sequential. Anyway–
Carl: Maybe that’s being religious.
Tony: I would say it’s all in my religion, which is that if you have programmed or unprogrammed components, there are two ways of composing them. One is sequentially, which requires that all the causal chains go forward from one to the other, never backwards. The other is one in which the causal chains can go in either direction between the operands. You have to tell the programmer that he is the person who has to worry about deadlocks. Some actually–
Carl: I think we’ve solved the deadlock problem by the following mechanism. Whenever an actor sends a request to another actor, the system says, “Okay, we’re keeping statistics on what’s going on.” If we don’t get a response back within a certain number of standard deviations, then the program that issued the request is thrown an exception: “Too long, it took too long.” Right? Now, you can try again, but a program will never deadlock, right? It will always terminate. [laughs]
Joe: We’ve done that for 30 years.
Carl: Fair enough. Okay, he’s already got the solution.
Joe: In fact, with deadlock– I talked to Francesco and I said, “I’ve only been hit in the face by deadlock once or twice in 30 years, because we use Carl’s mechanism.” On the other hand, you do have the nasty problem where the message doesn’t come back within this time, the timeout fires, and then the message comes just after that. You’ve got a lot of messing around to throw it away. That’s another tricky problem.
Carl: It is, that’s right.
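One common way to deal with the late reply Joe describes is to tag every request with a unique identifier and to drop any reply whose identifier is no longer awaited. The Python sketch below shows that bookkeeping only. The class `PendingRequests` and its methods are hypothetical names, and this is not a description of how Erlang or any actor system implements it.

```python
import itertools


class PendingRequests:
    """Hypothetical bookkeeping for outstanding request identifiers."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = set()

    def start(self):
        """Register a new request and return its unique identifier."""
        request_id = next(self._ids)
        self._pending.add(request_id)
        return request_id

    def timed_out(self, request_id):
        """Give up on a request; a reply that arrives later is stale."""
        self._pending.discard(request_id)

    def accept(self, request_id):
        """Return True if the reply is still wanted, False if it is stale."""
        if request_id in self._pending:
            self._pending.remove(request_id)
            return True
        return False
```

A caller that retries after a timeout gets a fresh identifier, so the answer to the abandoned attempt cannot be mistaken for the answer to the retry.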
Tony: Well, the problem was solved in the same way in the transputer language Occam, in which every time you waited for something you could specify a time limit. The responsibility is put on the programmer to manage deadlock in that way. I have deprecated that way of managing deadlocks, but I think it’s going to be inevitable anyway.
Joe: I remember with Occam the abstractions were great but the transputer didn’t do fair scheduling. When you’re waiting for things, some of the things sort of lower down weren’t fairly scheduled.
Carl: You don’t want to put the burden on the programmer to specify the amount of time. It’s like how you don’t want to put the programmer in the business of doing garbage collection with frees. You want the system to handle it automatically; therefore it will be keeping the statistics and the number of standard deviations that it’s taken in the past.
Joe: Of course, what you said about timeouts: Tony, I gave a talk about Erlang and you were in the audience and you had one question. You said, “How do you choose the value for the timeout?” You had immediately hit on the key.
Carl: The answer is, as with garbage collection, you don’t put the burden on the programmer; you put the burden on the system to keep the statistics and throw the exception.
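A minimal Python sketch of the statistics-based timeout Carl describes, assuming the limit is the mean of past response times plus `k` sample standard deviations. The class `AdaptiveTimeout`, the helper `await_reply`, the parameter `k` and the fallback `default` are all assumptions made for illustration; the panel does not give a formula.

```python
import queue
import statistics


class AdaptiveTimeout:
    """Hypothetical tracker that derives a time limit from past latencies."""

    def __init__(self, k=3.0, default=1.0):
        self.k = k              # number of standard deviations allowed
        self.default = default  # limit used until two samples exist
        self.samples = []

    def record(self, seconds):
        """Remember how long a completed request took."""
        self.samples.append(seconds)

    def limit(self):
        """Return the current time limit in seconds."""
        if len(self.samples) < 2:
            return self.default  # stdev needs at least two samples
        mean = statistics.mean(self.samples)
        return mean + self.k * statistics.stdev(self.samples)


def await_reply(reply_box, tracker):
    """Wait for a reply, raising TimeoutError instead of blocking forever."""
    try:
        return reply_box.get(timeout=tracker.limit())
    except queue.Empty:
        raise TimeoutError("too long") from None
```

The caller never picks a number itself; it only records observed latencies, and the limit follows from them.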
Tony: At the level of the abstraction hierarchy, which you are now living, you choose a level which is appropriate.
Joe: And I must say the telecoms people actually did it very well, because they have two protocols. The first is remote procedure calls that are known to terminate very quickly: you send a message to something and get an immediate answer back, so it’s okay to busy-wait for that. The second is for when you know that it’s going to take a long time, so you send an acknowledgement back, and then the caller knows it has to wait a long time. The protocol designers have to think, “Which of these two cases should I use?”, so that it’s very explicit.
Tony: Absolutely the right answer.
Joe: All the telecoms protocols use that. Virtually none of the software systems use that.
Tony: In a concurrent system, you have the concept of a transaction, an atomic event which stretches across more than two components. That is a very important idea for which there are many implementations, and therefore, I don’t know, people are reluctant to put it into programming languages.
Joe: A remote procedure call should actually say: I send you a message, and what I should get back immediately is one of two things, either “here’s the answer” or “I’m going to give you the answer within 10 seconds”. You should tell me how long you think it’s going to take.
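The two-case reply Joe describes could be modelled roughly as follows in Python. The types `Answer` and `Pending`, the functions `first_reply` and `wait_budget`, and the thresholds are hypothetical names and values chosen for illustration, not part of any telecoms protocol.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Answer:
    """Immediate reply that carries the result itself."""
    value: object


@dataclass(frozen=True)
class Pending:
    """Immediate acknowledgement with the server's own time estimate."""
    eta_seconds: float


Reply = Union[Answer, Pending]


def first_reply(cost_seconds: float, compute: Callable[[], object],
                quick_limit: float = 0.01) -> Reply:
    """Server side: answer at once if cheap, otherwise acknowledge with an ETA."""
    if cost_seconds <= quick_limit:
        return Answer(compute())
    return Pending(eta_seconds=cost_seconds)


def wait_budget(reply: Reply, slack: float = 2.0) -> float:
    """Client side: how long to keep waiting after the first reply."""
    if isinstance(reply, Answer):
        return 0.0  # nothing left to wait for
    return reply.eta_seconds * slack  # trust the estimate, with some slack
```

The design choice is that the side which knows the cost states it, so the caller does not have to guess a timeout.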
Tony: This is built into Occam because if you just didn’t mention anything, it wouldn’t assume to.
Carl: Yes, but transactions have never been successful for distributed systems, and now everything is a distributed system, including what’s going on in a chip. I have my doubts that transactions are going to be a part of the future of concurrency-
Tony: At a low level.
Carl: -but within an actor when you get some message it tries to do that. Even then, it’s got the problem between any pair of instructions that can be blown away.
Francesco: Yes, that’s how we achieve scale. Any transactions are basically serialised through a process or an actor. You do, however, need to make sure that you’ve got the fault tolerance around it in case you lose an actor-
Joe: That’s right. Exactly, yes.
Francesco: -because you then need to replay. That’s done in a different layer. You’re actually hiding the complexity away from the programmers.
Joe: I was going to say, if you had good clock synchronisation down to– Say your IDs have got clock synchronisation down to 100 nanoseconds?
Carl: If you can cross a chip, that’s good enough.
Joe: No, no, but across the world. If you have really good– Whatever the granularity of time synchronisation, if you think you can trust that, a lot of problems would go away but it’s very difficult.
Tony: Levels of granularity–
Joe: We could use supernovas and stars, and measure the time-
Carl: Google’s pursuing that, and now it’s causing them a tremendous amount of problems. They thought that they could rely on that global time synchronisation. They find that they can’t, that there’s a tail, right? The time synchronisation was cutting off that tail, causing unreliability problems. Now they’re going back to what Tony was talking about, namely the causal model, because message passing, like the semicolon, also moves irreversibly forward in time. It creates a chain of messages from here to there that is irreversible.
Joe: I used to work with astronomy and the astronomers could get clock synchronisation down to about a nanosecond. If you could propagate that out or - of course you can’t - but that’s the best you could probably do.
Tony: Just accept that you have to live at different levels of granularity, but you don’t want to import all the problems of the lower levels every time you write a higher-level thing. Higher-level things tend to be slower because they’re implemented in terms of lower-level things, and therefore the high levels, which is where the real application-oriented actions happen, are relatively not quite so sensitive to overhead as the lower levels.
Carl: Yes, I agree with Tony. Recently, now that we have these IoT devices, we have to have something, since an unseen actor is going to live or might have a distributed implementation, but an actor then is for a group of IoT devices, like the IoT devices in your house, right? You need to group that as a new unit of abstraction: you and your IoT devices are now a citadel. We have to do that. Currently we have firewalls, which are just terrible. We need a new level of synchronisation, a new level of security: these citadels, which protect a unit of IoT devices and people from the Internet, from the bandits on the Internet, and they have to be grouped together. Then within that, they use cryptographic protocols between the IoT devices so you know you really can trust what’s happening.
Joe: I was going to say what do you think about distributed protocols where you deliberately slow everything down? For example Bitcoin, that’s the fundamental design of things. Well, we’ve got to propagate this to the entire world. That will take 10 seconds and therefore we have to slow every computation down so that it takes 10 seconds.
If we get faster processors we will make the computation more difficult but it still takes 10 seconds.
Carl: If your business model is to make things slower, your competitor is going to beat you.
[00:11:46] [END OF AUDIO]
QU 7 - What have been the most disappointing developments in concurrency in the last few decades?
Francesco: What is the development, I think, which has most depressed you in the last 10, 15 years, 30 years?
Joe: Interested or depressed?
Francesco: Depressed you. That made you sad, made you angry.
Joe: The bitcoin proof of work algorithm…
Carl: No, no. It’s the mass surveillance that Snowden revealed. Right? That is really being done; surveillance is being done on a totally amazing level. The amount of information that companies and the intelligence agencies are collecting on us is just astounding. The question is, will they get everything? Because we’re all about to be wearing holo glasses in 10 years or so as the replacement for our cell phones, and they have a backdoor into the holo glasses. They see and hear everything that you see, hear and do. It’s an absolutely terrifying prospect, but you can’t resist it. You’ll have to use them in your job. Right now I couldn’t be functional in my life if I gave this up. I would no longer be competent, right? I could no longer coordinate with people; I couldn’t get my job done. The same will be true with the holo glasses once they get them lightweight, like the ones that Tony wears, so that they don’t make you look like a bug-eyed monster like the current entertainment ones do, right? That’s happening, because the companies in Silicon Valley have the prototypes and big companies will be shipping them in just a couple of years.
Tony: Well, the central-level interference in elections and referenda is even more horrific, because it really is very easy now to buy votes. When this happened in the Roman Republic, when people got rich enough to buy votes, the Republic failed and certainly couldn’t maintain a democracy. I think the political implications are dreadful.
[00:02:00] [END OF AUDIO]
QU 8 - Concurrency going mainstream
Francesco: Will we be seeing concurrency-oriented programming becoming mainstream? It’s an excellent idea.
Carl: It has to. It has to. That’s right. If anything we see the default applications, the default system is going to become an intelligence system, because now we’re going to have the capability to do it. In order to get the response times down, like the people doing with the glasses, you’d think that if you got a server on the internet, you think you’re doing pretty good if you’re giving a 100 millisecond response time. Well, the Holo glasses, they laugh at 100-milliseconds. They talk about 10 [laughs]. That puts an enormous force on how fast the thing has to perform and the only way to do it is with the concurrency.
Joe: Now, I think we’re going to go to the sort of structure the brain has. When I was working at Ericsson, you’d look at how mobile phones are made. I think they’ve got a video codec and an audio codec. The brain has got its visual codec and it’s got the audio-visual part of the brain with specialised hardware for that. If you look at the sort of chips we built, there was a lot of confusion. There were a lot of different video codecs.
Then somebody would say, “This is the best codec and we’ll build that in hardware, and this is the best audio codec.” Then the speech recognition. These become standard components. You bake them into a tiny little chip, wired up with a lot of memory and very fast communications. Then I think the development stops until we get a new generation of chips, neural network chips that are very, very fast, and that will change how we program.
[00:01:43] [END OF AUDIO]
QU 9 - What are your views on blockchain and decentralised web? What role do you see concurrency playing?
Francesco: What are your views on blockchain and, say, Solid, you know, Sir Tim Berners-Lee’s decentralised web? What role do you see concurrency playing in both?
Carl: Blockchains are very slow and they’re easily hacked like, for example, in Bitcoin, the Chinese bit-miners own the majority of the bitcoins and so they can outvote anybody else then, right? That won’t work.
The other thing is that we’ve learned that performance is enormously important, and you have competition, and you have to have a business model to have any effect on the world. So unless Solid can compete in the business model and in performance, it won’t matter, even if it has great, nice ideas. It was once thought (I disagree with Joe here) that backlinks were a great idea, but backlinks completely don’t scale, so to have a scalable web it was absolutely necessary to use one-way links. For example, actor addresses don’t have back pointers, because that would just completely kill performance. There might be some popular actor, and there might be millions of actors that have its address and could send it a message, but that one guy can’t be held responsible for knowing everybody who has its address.
The scalability has now become a crucial issue and that’s a driving force for concurrency because concurrency is the only way to get the scale and performance.
Joe: I think deployment is a problem because even if somebody made an open source privacy application, it needs 50 million users to take off. Apple and Google and everybody have dominated this way of deploying something to hundreds of millions of people.
Carl: That’s right.
Joe: It’s very difficult to break, the first one to get a hundred million users wins basically.
Carl: You have to have a business model. I think that for the citadels, where each home has a citadel when you get the internet, the business model is again going to be advertising, because how do you compete with free? There’s a business to be had in matching your citadel with merchants that want to sell to you, which is basically some of the advertising business. If somebody would build a citadel based on that, then they could fund the whole thing out of advertising, as Google does currently with a centralised model.
The problem that we have is how you bootstrap that: how do you get a big player to make the conversion? It completely scares them because it’s contrary to their current business model.
Joe: What I don’t know– It’s the asymmetry in knowledge: Google knows everything about us but we know nothing about Google. When people start to realise that that asymmetry can be used for political purposes and economic purposes, they will demand– Maybe something like how AT&T was split up. Why isn’t Google being split up? Why doesn’t the European Union have something like Google to deploy its services?
Carl: Exactly, but note they have toxic knowledge. Having access to our sensitive information in their servers is actually going to be very bad for them, because once the people in England realise that the Americans have all this intimate knowledge of British citizens in their data centres, they’ll realise that’s a national security risk. That’s, for example, why Uber was kicked out of China. The Chinese government didn’t want a foreign company to know about the travel habits of the citizens of Beijing, so they bought them out.
Storing sensitive information is actually toxic to these companies. They just don’t realise it yet. They’re going to get the view now that they’re being forced to store all the information in each country: you have to store the Chinese citizens’ information in China, and then you have to be domiciled in China, which means you’ve just been broken up; you can’t be an international company. Not only that, if you’ve got the sensitive information in your data centres, then the security service of your country is going to come and say, “Look, I want to have it.”
Then they discover they don’t just want the bits, they want your toolchain. If you’re Google or Microsoft, the only way they can manage that is to use your toolchain, so then they have this little building inside your company. That’s a pain in the tail: they have two companies that they have to get bits from. They want you to standardise your stack, and then the company, because it’s got this sensitive information, becomes a prisoner of the government, because now the government wants the information.
Francesco: We’ve gone from concurrency to resilience to scale to kind of social-political area and they’re all linked together.
Carl: That’s right.
Francesco: There’s no doubt about it.
[00:05:15] [END OF AUDIO]
QU 10 - How would you sum up the future of concurrency in one sentence?
Francesco: How would you sum up the future in one sentence?
Joe: I don’t know. I always imagine a historian in 200 or 300 years’ time writing the history of this period. It would just be like the Dark Ages, the ages of confusion. Will it end with computer failures that kill millions of people, or will it transition into something that is for the benefit of mankind? We don’t know at the moment and I don’t know how long it will be before we know. Maybe we will know in 20 years’ time or 50 years’ time, but at the moment it’s very confused. I don’t know where we’re going.
Tony: I don’t really have anything to say about the distant future. I would like to go back to a point by making a suggestion about security, which is enforced by runtime checks. The way that security is enforced at the moment is by sandboxes. If we extend the idea of abstraction downwards, then we get the idea that you can specify security protocol by interrupting the progress of the higher level users and checking that they conform to the protocols in real time all the time. Conceptually, we’re reusing the same concept of layering.
You can then have, obviously, what I might call dungeons of security where you’re digging underneath the program to check that it’s satisfying protocols which are believed by people to prove things, will implement your desires as to what can and cannot happen.
Carl: We’re now embarked on the most complex engineering project that we have ever done. That is to build the technology stack for these scalable intelligent systems. The Chinese minister of Sciences said they think they can do it by 2025. The only way to build them is to use massive concurrency. It gives you the performance, the modularity, the reliability, and the security that you need. The big question is, what will they now be used for? We want to use them for things like pain management, which is a huge problem in the US, is to have pain management without opioid addiction. Our solution is to use these scalable intelligent systems. They could be used for other things. They could actually become the basis of universal mass surveillance. We are at a turning point.
Tony: Why can’t we use things that don’t scale? That seems very hard.
Carl: The economics demand it. If it’s not scalable–
Tony: I’m not forbidding from using scalable techniques, but not all the ordinary people who work at most at two levels of abstraction and scale use the same concepts which are inappropriate to use at the highest levels.
Carl: This technology stack for these things, as you say, they’re all these levels, they’re different abstractions, et cetera. These are complex beasts.
Francesco: This leaves some food for thought. Thank you so much for being part of this.
All: Thank you.
[00:03:55] [END OF AUDIO]