Delivered 2015-10-17 at QCon 2015, Shanghai, China



Prepared Talk






Transcript (unedited)

Mike: Ni hao?

Together: Ni hao.

Mike: That’s all I know. Okay, one more thing to prove to my friends that I am actually here. I need your picture. Yes, hey, excellent, thank you.

Okay. I’m very honored to be here to talk with you. This is my first visit to Shanghai, and only my second visit to China. I’m so excited to meet so many people here and to talk to you about this subject that’s very important to me. It’s a bit of a strange title, yes: Dreams, Lies, and what is this Autonomous Web? So just quickly, this is me. This is how you find me on the GitHub, and the Twitter, and the LinkedIn, and this is me. As long as you write in English I can talk to you on WeChat, but I don’t understand Chinese.

I work in a group called API Academy. It’s a very interesting group of researchers and thinkers around the world who look at the trends, and the future of the internet, and the web, and computing. I’m very happy to follow Chris, the previous speaker, who was talking about optimizing machines, because the talk that I want to give to you today is about optimizing the network. Our future is not to program machines but to program the network. We used to program registers. Remember yesterday if you saw Peter Newmark’s talk, he showed you the earliest days of all the vacuum tubes. This is what we programmed 50 years ago. Now, we program a machine, a component, a web server. But in the future, we will actually program the entire network.

So we are at a very important crossroads. More and more devices per person, year after year, an average of seven devices per person. And if you think about all of the rural areas in the world that have no devices yet, we will have billions and billions of devices to deal with, and exabytes and yottabytes of data, huge amounts of data to deal with. And we have this history of all sorts of programming languages, language after language, year after year, we keep inventing new languages. Why? Because we’re struggling, because we have a challenge ahead of us. We have more and more downloads, over and over, more and more downloads onto our tiny little phones. What happens when we have devices in the room, on my shoes, on my wrist, in my body? We can’t download those apps. This is a huge problem. And we have all these APIs trying to connect these apps together. Many, many millions of APIs. Why? Because we’re trying to make connections across the network.

So as we reach 50 billion connected devices just in the next few years…and that’s not enough, we want personal assistants, we want smart houses, we want the cars to drive themselves. These are huge challenges, and we’re in for a big crash. I like this picture because it actually tells a great story. It tells a story about big data, how we have all of this data but we’re not sure what to do with it yet. About neural networks, about the way we try to connect things together but we keep running into problems. Everyone is trying to come up with a way to mimic the brain, but most of us are approaching it like it’s a computer, like it’s a single machine. The brain is not a single machine, it is billions of machines. Each neuron is its own machine. You heard Chris talk about microservices doing one thing, and one thing well. That’s the way the brain works.

We’re not going to program computers, we have to focus on networks. And the networks are not so pretty as this, with these simple lines, no. They span the entire globe. I work with some people who are actually working on the internet for outer space, so they span the galaxy. This is incredibly powerful and incredibly difficult. This is what the network looks like, this is the most powerful network we have today, the World Wide Web, and this is what it looks like. This could be a picture of the brain. The World Wide Web is a single application that spans the globe and eventually will span across planets. That is what we will program.

And to do that we have to start thinking differently. You may have seen this phrase, "Those who cannot remember the past are doomed to repeat it." The mistakes we make, if we forget them, we will make again. Across all of the designs of the last 70 years of computing we often make the same mistake, we think, "If I could just make it faster, it would be better." Faster is not good enough, we need to make it bigger. And that means it could be slower. Remember, Chris talked about the idea of asynchronous programming. In asynchronous programming things don’t all arrive at the same time; in real life things don’t all arrive at the same time. It took me many hours to get here. It will take many seconds for that picture to appear on Twitter. This is life, it’s not instant.

But here is a quote I like even more, "Those who ignore the mistakes of the future, are bound to make them". This is our opportunity to not make new mistakes. We can learn now and create a new autonomous web, a new place that acts like the brain. This is a challenge to us. So I have one more quote, this talk is going to be very far-reaching, and I love this quote from Andre Gide the Frenchman, "One does not discover new lands without consenting to lose sight of the shore for a very long time". I want us now for the next 30 minutes to lose sight of the shore, to go far off and think about what that future could be and what the challenges are that we have. And we can do that a few different ways. Let’s talk about dreams.

I find dreams fascinating. As a small child, I used to have the same recurring dream over and over again and I wondered why, and I became really fascinated with learning about dreams. This dreamy image was created by a Google computer. It looks at real images, compares them to its own memory, and creates a new image. The Google computer is hallucinating, is imagining. Google’s DeepDream computer creates these strange images. Here it’s creating new images never seen before, it’s combining things together. We might think that this is sort of a bad program, but in fact it’s much like how our brains work, it’s how we imagine things. And imagination makes us incredibly unique.

So Google’s DeepDream creates these imaginations and they’re based on archetypes, the ideas that we have in our head. It is assembling new things based on items it already has in its own memory. These archetypes help us to learn about how the brain works. This is the brain when it’s dreaming. When you’re lying still and doing nothing many parts of the brain are operating. Why? What’s going on? We’re hallucinating. And why do we hallucinate? Dreams are the way we hallucinate, it’s the way we practice, it’s the way we learn. We imagine, what would happen if? What if I took the left turn instead of the right? What if I met him at the store, what will I say? What if we meet at the coffee shop, will she say hello to me? All these things we imagine, this is what our brain does.

So many of us think, "Oh, we should really focus on this big data. This is the future, big, big data." This is a storage facility in the U.S., in the desert, for one yottabyte of data, massive amounts of data. Yet we have all this data, and we don’t have any intelligence. The brain itself, how much data does the brain store? How much do we imagine, or suspect, the brain can hold? 100 terabytes is the most common estimate. There are some that say less, some that say more, but this is a common one. That is about 100,000 gigabytes. Based on how much information we take in every day, just the fact that I’m looking here, that I’m walking here, all this information, I take in a couple of gigs a day. We basically…our brains have the ability to store 250 years of data. That’s amazing, just one brain. Not a big building, just one brain.

But does our brain actually store all that data? Every single bit of data, over our entire life? The answer is no. This is very important, our brain does not store all that data. Instead, we take information in, into our immediate memory, we hold onto it, so I can remember how to say, ni hao. I will forget tomorrow, because that will fall out of my working memory, but eventually we decide what’s important and what’s not and we place that in long-term storage. Dreams are how we do garbage collection in our minds, in our brains. One of the key functions of sleep is to decide what to keep and what to throw away. We do not build computers to decide what to keep and throw away, we just simply keep everything. That makes it even harder for us to accomplish our task.

Pruning data into long-term memory is why we hallucinate, why we dream. There’s a great book on this topic, "The Secret World of Sleep." And it talks about many people who have difficulty sleeping, difficulty pruning their memories. In fact, it tells the story of one person who cannot forget, and it is very painful. Every single little thing reminds him of a memory from 25 years ago. He can’t even function. He has to go into a room that is all dark just to try to relax, and his memory still works. It’s terrible. Forgetting makes us more efficient, more effective. Forgetting is important.

So we must learn to forget, we forget all the time. We make decisions, sometimes those decisions are not so good. "I’m sorry, I forgot your name." But we do this, this is what we do. Forgetting is important and so is choosing. This is also an incredibly difficult task, learning to choose is hard, learning to choose well is even harder. There’s a great author by the name of Barry Schwartz who tells this story. As our possibilities get larger, it’s incredibly harder to decide. So he has this book called "The Paradox of Choice" which is a very good book. And he understands this idea of choice. As we store more data and we’re trying to figure out which piece of data to use, choice becomes even harder. Data is the enemy.

So this is a model of how we wake and sleep and how our brains operate. And there’s chemistry involved, we don’t need to talk about it now, but what’s important is we decrease the amount of information and we consolidate it. So this is how we have to think about big data, we have to decrease it. I love this quote from Edward Tufte, the great data visualization scientist, "If you torture big data long enough, it will tell you what you want to know". I can look through lots and lots of data and find exactly the answer I’m looking for; it may not be the right answer, but it was the one I wanted. And many of us know this, you look for the answer you want, not for the answer that’s right. If we build an autonomous web, a web where things can actually act on our behalf, we have to teach machines to hallucinate and we have to teach them to forget. Wow, this is important. We have to learn to prune, to get rid of data and imagine things.

Okay. Two, lies. Lies are incredibly powerful. Lies are why we have society. We all have the same belief, we all have the same notion. We all say, "Oh, you look good, you look fine. You’re not fat." "Yes, I am. I am fat." But I want to speak just a little bit, I’m going to pick on this one thing, because lies are also something we must deal with. We know this, this autonomous car, this Google Car. So we’re very excited about self-driving cars. I love this quote from Michael Luis [SP], "A key to start the car, that’s simple. A car itself is rather complicated, but driving a car in traffic, that’s complex. Why is it complex? Because I cannot predict, I can only react. I cannot know ahead of time that something will happen so that I can avoid it, I must react when it happens". Reactive programming, as Chris mentioned.

So here’s a great quote, I won’t go through all of it. But the way the Google Car actually works is it’s memorized all the roads, all the sidewalks, all the stop signs, all the traffic lights, ahead of time. It already knows what the entire driving area looks like before it leaves the house. It doesn’t really react or make decisions based on unexpected consequences, it simply drives the path it already knows. It has everything memorized. It knows where all the roads and all the signs are, all the left turns and right turns, where you cannot turn on this light. So it knows that already and it just pays attention to the cars and the pedestrians. In fact, if someone stands in front of a Google Car, it stops and waits for someone else to come along and drive the car. If a duck walks in front of the car, it stops. It doesn’t know how to deal with that. It only knows how to deal with what’s already in its memory, what it’s already memorized.

It doesn’t really react. Except in real life, in real-life complex systems, individuals react as a group. This is a flock of birds that groups itself, they pay attention to each other. These are ants; ants actually have a very complex network of communication. They don’t know ahead of time what’s going on, even down to the very biological level, bacteria and viruses. They react as a group, but they don’t know what’s going on ahead of time, they don’t have a map. So our ideas about how these computers are going to work, how these automated systems work, are really based on a false idea. They are all based on statistics right now, but these are complex, complex systems. Traffic is not statistical. You can’t use statistics to get through traffic.

IBM Watson is going to use its technology to help us with medicine and all sorts of things, based on statistics. And in fact, they even say on their website, they offer you the benefits without the complexity. This is trouble. All IBM Watson can do is predict based on the past, not react to the now. Google’s entire system is based on the notion of statistical checking. Its spelling, its translate service, all of that is just statistics. And in fact, a person in charge of research at Google has said, eventually we will get to the point where statistics do not give us any more value. We can’t get any better, we’re running to the end of this road on statistics, because learning is complex. Statistics are not learning.

Okay. So what does this really mean? We know that the brain is a complex system, we know that simply memorizing ahead of time won’t work, so what will work? If we want to create a web where I can ask a bot or an autonomous piece of software to go get something for me, to go purchase something for me, to take my vehicle to the store, to communicate with my friends, to look at all sorts of information and give me some advice, we really have to go back a ways. So I’m gonna go back to sort of the earliest days of information theory. Yesterday, Peter went back through the history of computing; I’m gonna go back even further, to the mid-1800s. A physicist by the name of James Maxwell is thinking about this notion of thermodynamics, of predicting the path of molecules. Molecules bounce all over, and when they bounce a lot they create heat.

In fact, he’s trying to figure out if you could ever break the law of conservation of energy. So he imagines this creature, what he calls the "demon", that sits and notices which molecules are fast and which are slow, and it has a magic door and splits them apart. He invents this idea in his head, he hallucinates, he imagines. So he’s trying to see if we could fight the second law of thermodynamics. Then, a little bit later, about a decade later, Ludwig Boltzmann comes up with this idea: how can we predict where a molecule will be if we can’t separate the fast and the slow ones? What we do is we come up with a statistical pattern. Remember, we talked about the statistics? And it turns out all the possible states of where the molecules might be form a sort of…what was later called an eigenstate, like a probability of what might happen.

All we really know about the world is an estimate. Most of us want to think that we know exactly what’s going on. We never do, it’s simply an estimate. Later, decades later, Claude Shannon, one of the people who really invented information theory in the modern age says, "This idea of being able to predict or not predict is actually not as important as whether or not there’s some bit of surprise, some unique bit of information that can tell us something." And he calls this entropy. He uses the word entropy in a very different way than we use in physics. For him, entropy is good, entropy is the extra thing that’s unique. Later on, it’s renamed surprisal, the surprise of seeing an outcome, that’s the information.
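Shannon's measure can be written down in a couple of lines. As a sketch in Python: surprisal is how unlikely an outcome was, and entropy is the surprisal you expect on average, so a message that never changes carries zero information.

```python
import math

def surprisal(p):
    """The 'surprise' of an outcome with probability p, in bits.
    Rare events carry more information than common ones."""
    return -math.log2(p)

def entropy(probs):
    """Expected surprisal over a whole distribution (Shannon entropy)."""
    return sum(p * surprisal(p) for p in probs if p > 0)

# A message that is always the same is boring: zero bits of information.
always_same = entropy([1.0])       # 0.0
# A fair coin flip is maximally surprising for two outcomes: one full bit.
fair_coin = entropy([0.5, 0.5])    # 1.0
```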

If I send information from here to there and it’s always the same, I get bored, I don’t care. But if suddenly there is a new bit of information, that’s entropy, that’s interesting. So it’s the new information that becomes incredibly interesting. Alan Turing takes this idea about information and predictability and creates what’s called the Turing machine, the tape machine, the very first computer, in the '30s. He creates what we now know as a computer. In essence, the problem is, if I write a program, do I know if it will ever end? Do I know ahead of time if it will ever stop? Can I predict the outcome? And, in fact, it turns out we can’t. We can’t even predict when a machine is done.

Finally, mathematician Kurt Gödel has this idea. He says, "This statement is not provable". Is that true? Is that a true statement? "This statement is not provable". If it’s true, then it can’t be proved. If it’s false, then it’s actually true because it’s not provable. He plays a trick, he treats the data like the program, he treats the math like a word, and that’s exactly how we build our computers. We build our computers to put program and data in the same place, right? John von Neumann, when he finally decides to build computers in the '50s, in the U.S., he uses this idea of putting memory and program in the same place. And that’s the way our brains work. Our brains are the program and they are the data. Big data is the wrong idea, data and program together is the right idea. And, in fact, this is how genes, the actual genes in our body, work. They’re programmed as long string messages and those messages are inside the cell itself, they’re not outside. They are the program and they are the data that makes us who we are.

Finally, I wanna talk about Roy Fielding. Do we know this name, Roy Fielding? REST, do we know this word, REST? APIs, yes. Roy Fielding invents this idea of REST architecture in the 1990s. And it’s based on the idea of not being able to predict how the network works, only being able to see the next person in the chain. Creating software that’s only concerned about the next link, not the final destination. Where the information itself, where the data is the program, where we actually ship messages along that don’t just have data or names, they actually have instructions. "This is how you edit, this is how you search, this is how you write."
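As a sketch of that idea (the shape here is loosely in the style of Siren, and the URLs and field names are made up for illustration): the message carries not just the data but the actions, so a client can discover at runtime how to edit or search.

```python
# A hypothetical hypermedia message: data AND instructions travel together.
message = {
    "properties": {"title": "Blue shirt", "size": "M", "price": 25.0},
    "links": [{"rel": "self", "href": "http://api.example.org/shirts/42"}],
    "actions": [
        # The message itself tells the client HOW to edit...
        {"name": "edit", "method": "PUT",
         "href": "http://api.example.org/shirts/42",
         "fields": ["size", "price"]},
        # ...and how to search, so neither is hard-coded in the client.
        {"name": "search", "method": "GET",
         "href": "http://api.example.org/shirts",
         "fields": ["size"]},
    ],
}

# A generic client discovers what it is allowed to do at runtime:
available = [action["name"] for action in message["actions"]]
```

If the server adds, removes, or changes an action tomorrow, this client keeps working; it reads the instructions out of the message instead of baking them into its code.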

Most of us today do not write programs this way. We actually have all of the instructions in the code, not the message. That means, if the message changes or if we want the network to do something different, that code is no good, and we have to get rid of it and upload a new one. In Fielding’s world, we don’t have to do this; we actually send the code and the messages together, and they call that hypermedia. So we’re talking about complex systems, networks with no central control. There’s no central control in this room, we’re all individuals. We may act as a group, we all come in, we sit down, but we are not the ones…there’s not one person in control of all of us. If I wave my hand, you don’t do what I say, I think, I hope.

So the web is a complex system. The web is very much like those systems. So what does that mean? What’s our web like today? Most of the way the web works is through these formats, these message models, HTML or CSS. But there’s a whole new set of models: HAL, and Collection+JSON, and Siren, and Hydra, and UBER. These are all designed to contain both program and data in the message together. And they have varying levels of entropy, of interest. There’s this very simple media type, it’s easy to understand, it just sends URLs, just sends simple addresses. So it’s very low entropy and it’s very low energy to use. It’s very easy to understand.

Plain text has high entropy. It’s very hard to predict what’s going to appear and it takes a lot of code, a lot of energy to make it work. HTML is about in the middle, it has some structure but you can’t predict a lot of things. This ability to understand the difference between the entropy level and the energy needed is how we organize our computer systems. From a machine point of view, the less predictable, the more energy I have to use. So this is incredibly important. Energy is computing power, coding time, how much source code I need, how much memory, all those things. The more unpredictable, the more memory and power I need.

Most web applications today, they just work once and we write a new one. Every time I download something on my phone, they say, "Oh no, you have to download a new one, the other one’s broken. We fixed it." Over, and over, and over. Can you imagine how that would work in biology? There could be no species; they would all die. Every time something changed, if the weather changed, everybody would die, it’s already broken.

HTTP itself has been around for more than 20 years. Do we think it’s going to be around in another 20 years? Most of the programming languages, most of the formats assume HTTP is the only way to communicate, yet all our devices, our watches and all the machines in our buildings, they don’t use HTTP. What happens in 20 years, 30 years, 40 years, 50 years? It’s hard for us to remember that HTTP just 25 years ago didn’t exist, and nobody thought it was very good. We have this quote from Abraham Maslow, "When all you have is a hammer, everything looks like a nail". "I will use HTTP for everything, oh, yes it’s fine, it’s good, yes." "I will use JSON for everything and HTTP and the world is good." No. Who knows what will happen in 10, 15, 20 years? How long will this last? Eventually, what else will become dominant? We need to design our systems so that they’re not tied to just one protocol or one format, we have to adapt. We have to build adaptable systems.

I have this quote from Lord Kelvin, you know the Kelvin temperature scale, in science, in physics? He predicted that the world could be no more than 40 million years old because he measured the heat. And even though people explained to him that that heat measurement was not correct, and so on and so forth, he said, "No, no, no, I will never admit that the world could be more than 40 million years old". Of course we know it’s more than a billion years old. He was so stuck on his idea that he didn’t want to leave the shore, he didn’t want to accept a new idea.

Okay. So what can we do now to solve these problems? We can lower our entropy, we can decouple from protocols and we can focus on programming the network. We need more media types, more message models. I listed six that have come up in the last five years. We need dozens. Just like in real life, just like in nature, we need lots of these to exist and see if they live or die. Normally, these formats have three levels of entropy: the structure, the format, like whether it’s curly braces or angle brackets; the protocol, as to what you can do with these messages, what’s possible; and the actual meaning, like whether I’m doing accounting or WeChat or user management or e-commerce. That’s the domain, right? That’s different from all the others.

Most of the time we mix these up and make them difficult to separate. So when I wanna do e-commerce, I always have to use JSON over HTTP. Why? I should be able to do e-commerce over WebSockets or MQTT or CoAP. I should be able to do it with JSON, or Siren, or UBER, or XML. Separate them, and that lowers the overall entropy, makes it more powerful. The higher the surprise, the higher the dependence on custom code. If we don’t standardize, we have to keep writing the same code over and over. And every line of code introduces a new bug. We have to learn to stop writing code.
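A minimal Python sketch of that separation (the function names and the tiny `order` domain message are mine, purely for illustration): the domain message stays plain data, and the format, like the protocol underneath it, becomes a pluggable detail chosen at the edge.

```python
import json
import xml.etree.ElementTree as ET

# The domain message: plain data, no commitment to a format or a protocol.
order = {"item": "shirt", "qty": 2}

def to_json(msg):
    return json.dumps(msg)

def to_xml(msg):
    root = ET.Element("order")
    for key, value in msg.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Formats are registered, not hard-wired; a transport table (HTTP, MQTT,
# CoAP...) could be swapped in exactly the same way.
FORMATS = {"json": to_json, "xml": to_xml}

def encode(msg, format_name):
    return FORMATS[format_name](msg)
```

Swapping JSON for XML, or HTTP for MQTT, then touches a lookup table rather than the domain logic.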

So there’s this model, this is the model that I created about five or six years ago to start analyzing messages about what’s possible and what’s not, it’s called H-Factors. Can I do a link, embedded, and originated, and so on and so forth; we won’t talk about the details today. But there are models for analyzing messages to see what actions you have in the message. The code is in the message, the program is in the message. And there are models for creating domains, application domains, that I can share separate from the message. This is a site called [inaudible 00:32:49] which I and a few friends are working on. Tom mentioned the last book, "RESTful Web APIs"; it introduces this idea of separating the domain from the protocol.

What we need are more types that talk to machines, not humans. It needs to be optimized for a machine. We need the ability to tell a machine what it’s doing, not tell a human, and then we can start creating programs for machines. Machines are like little pets, they’re very simple. I can teach them a command. Remember Chris Bailey talked about, do one thing and one thing well? McIlroy’s rule for Unix systems, that’s what we need to create. We need to create languages to let machines do one thing and one thing well. "You find a proper shirt in my size. You find the store where that shirt is. You check out with my credit card at that store for that shirt. You make sure that it’s shipped from that store to me." Each program doing one thing. Languages that are matched to the program.

We create machine-readable messages. It’s going to lower the cost of programming, it’s gonna lower the cost of getting work done. We need to get rid of HTTP eventually. We might as well start thinking about it now. It will not last for a century, it will not last for 50 years, maybe. We need to move on. We need to start designing systems that assume other protocols. We need to start creating new ways to focus on the entire network and not just one machine, like one server or one client. When I write an application today I assume that millions of machines that I’ve never met will use my service, how is that possible? People 10 years from now will use my service and I’ve never met them, I never talked to them. That’s the kind of network we need.

Richard Taylor, at UC Irvine in California, has this great quote, "The World Wide Web is a distributed hypermedia application. It’s one application. We’re programming that, that’s what we’re programming". So there’s this idea about the way the web works, it works through links. Barabási and Albert created this model called preferential attachment. Normally in the world we have this idea where everything is a bell curve, right? Everything is normally distributed, but in real life that’s not true. In real life, we have some nodes that have many links and some that have very few. This is how Google became rich. Google understood that some pages have lots of links, and they sold ads on those pages.

Yahoo tried to create a catalog of every page, they had humans create a hierarchy, but the web has no hierarchy. Google said, "Forget that, we just need to know which pages are popular". So this exists in real life, this is the way our friends work, this is the way our cities work, this is the way our countries work: through preferential attachment, through links, popularity. So we can lower the entropy in messages, we can reduce the dependence on protocols, and start thinking about linking the network rather than just writing for one machine.
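Preferential attachment is simple enough to simulate in a few lines. In this sketch (the network size and thresholds are illustrative), each new node links to an existing node with probability proportional to that node's current degree, and a few hubs collect many links while most nodes stay leaves.

```python
import random

def preferential_attachment(n_nodes, seed=42):
    """Grow a network where each new node links to an existing node with
    probability proportional to its degree (Barabasi-Albert style, one
    link per new node)."""
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}   # seed network: two nodes joined by one link
    targets = [0, 1]        # one entry per link endpoint, so picking a
                            # random entry IS degree-proportional sampling
    for new in range(2, n_nodes):
        old = rng.choice(targets)
        degree[old] += 1
        degree[new] = 1
        targets.extend([old, new])
    return degree

degrees = preferential_attachment(2000)
hub = max(degrees.values())                            # a few rich hubs...
typical = sorted(degrees.values())[len(degrees) // 2]  # ...most stay poor
```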

Okay. There’s some hard stuff too. If we wanna create this place where there is autonomy we have to think about the future. How do we create applications that have no central control? How do we actually allow them to adapt to changes? And eventually, how do we allow them to have their own life cycle, right? Many of us today operate in a system where our job is to decide when to start writing a program, maintain it for many years, and then decide to take the program offline. We call this the software development lifecycle. How can we automate that? How can we get software to just take itself offline?

So there’s a gentleman by the name of John Conway, a mathematician, and he created this thing called The Game of Life. Does anybody recognize this, The Game of Life? Sometimes it’s a programming project and it’s based on some very, very simple rules, it’s called cellular automata, automated cells that talk to each other. This is actually one of the programs running on my screen right now. And it’s one of the programs that runs perpetually. And it’s based on some simple rules about if there is a square, then other things grow around it, and if there are too many the square dies. And this creates the shape. This is actually called the glider gun shape. And it turns out, in Conway’s mathematical model, which is actually very, very simple, it’s like four rules, you can play this by hand or you could write a program. There are some shapes that work perpetually, over and over; there are some shapes that run for a while and just die, then there is nothing; and there are some that just become static, they don’t do anything.
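Conway's rules really are that small. A minimal sketch in Python: a live cell with two or three live neighbours survives, a dead cell with exactly three live neighbours is born, and everything else dies.

```python
from collections import Counter

def step(live):
    """One generation of Conway's Game of Life.
    `live` is the set of (row, col) coordinates of living cells."""
    # Count live neighbours for every cell adjacent to a live cell.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {cell for cell, n in neighbour_counts.items()
            if n == 3 or (n == 2 and cell in live)}

# The "blinker" is one of the shapes that works perpetually: three cells
# in a row flip between vertical and horizontal forever.
blinker = {(0, 1), (1, 1), (2, 1)}
```

Shapes that die out return an empty set after a few generations; shapes like the glider gun keep producing new cells forever.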

This is starting to program a network, each one of these is a cell in the network. This idea of cellular automata was the way we thought we were gonna actually create computers in the 1950s. But it turned out to be difficult math, so instead we created the one we have today, with the central processing unit and memory and program and a clock. But originally it was going to be this instead. There are people still today working on this model. As a matter of fact, there is this gentleman by the name of Stephen Wolfram, who has this website called Wolfram’s Atlas. And he takes all of these models and he puts them together and he uses them as search engine properties. And he collects a great deal of information and he shares it with everyone based on these cellular automata, it’s the cells that learn.

These cells exist in nature. I was at a conference last year at Columbia University in New York; there are people at Microsoft growing cells, growing actual biological cells, that use this technique. In the future web, we will not create a single program, we’ll create lots of tiny ones that cooperate together to create the activity we want. We’ll program a network of machines. We know from Google and from Netflix and from Facebook and Microsoft, they have thousands and thousands of machines, and if one dies, that’s okay, they put another one in. That’s what we will write, that’s the program we will write.

There’s a great application called Robby the soda-can robot [SP], which actually goes about and learns how to collect up things on a grid. Not through the changing of the program, but through the changing of the data, it learns. So we have to model adaptation, we have to model the ability to learn. Now, the problem with Robby is there’s a central scorekeeper. In life we don’t have a central scorekeeper, we have resources; you die if there is not enough food.

There’s a system design called Random Boolean Networks which has this notion of no central scorekeeper. We won’t talk a lot about it today, but there are people already exploring this idea that I don’t have a central place that decides who wins and loses; the cells interact among themselves. It’s a complex system. More importantly, we need to figure out how to model competition, who wins, who loses? We fight for resources, we fight for memory, we fight for clock time on the CPU, we fight for electricity. I don’t know what we’ll do yet. But this is the big challenge ahead of us in our future.
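A Random Boolean Network also fits in a few lines. In this toy sketch (the parameters and helper names are mine, not a standard), each node reads k random inputs through its own random truth table, nobody keeps a global score, and because the state space is finite the network must eventually fall into an attractor, a cycle of states it repeats forever.

```python
import random

def make_rbn(n, k=2, seed=7):
    """Build a toy Random Boolean Network: n nodes, each wired to k random
    inputs, each with its own random boolean rule. No central scorekeeper;
    global behaviour emerges from purely local updates."""
    rng = random.Random(seed)
    inputs = [tuple(rng.randrange(n) for _ in range(k)) for _ in range(n)]
    rules = [tuple(rng.randrange(2) for _ in range(2 ** k)) for _ in range(n)]
    state = tuple(rng.randrange(2) for _ in range(n))

    def step(state):
        # Each node looks only at its own k inputs and applies its own rule.
        return tuple(
            rules[i][sum(state[src] << b for b, src in enumerate(inputs[i]))]
            for i in range(n)
        )

    return state, step

# With only 2**8 possible states, the network must revisit one eventually.
state, step = make_rbn(8)
seen = set()
while state not in seen:
    seen.add(state)
    state = step(state)
```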

Okay. Let’s wrap it up. We’ve gone very, very far, very, very far. Physics and biology and the mathematics of the future, what does that really mean for us? It turns out the stuff we’re doing today is really based on biological systems, on these complex systems. We can take advantage of that today if we remember that that’s what we’re really doing. The problem is most of what we’re actually executing ignores that and tries to do everything as a one-off random mess. We’re not taking advantage of each other, we’re not taking advantage of other programs on a system. And we lack this ability to work together with individual programs.

We can start today by creating less entropy, more machine-readable code, decoupling from a single protocol and treating the network as the place we want to program, not the machine. I’m programming the network, I’m adding to the network. And in the future we’re gonna give up central control. I won’t decide exactly everything; it will be like driving a car, I will react to what’s going on around me in order to get to the place that I’m going. But to do that, we have to realize this autonomous web. How long will that take? I love this quote from Douglas Hofstadter, a mathematician who wrote about Gödel, whom we talked about earlier. He has a law: "Things take longer than you think, even when you take into account Hofstadter’s Law".

Things will probably take longer than we think. I’ve actually been talking about this for five years and we’re only making a tiny bit of progress, but we’re very close. The thing is, we have to remember not to make the mistakes of the future. To take the time now and think about things differently. And that way we can safely leave the shore, we can get to a future, even when we don’t know exactly where that’s going to be. We must be willing to lose sight of the shore in order to make progress. Thank you.