Building Effective Microservice Teams

Overview

Status: Delivered 2016-03-28 @ QConSP 2016, Sao Paulo, Brazil
Last Updated: 2016-12-02 14:22:26 EST
Summary: Home
Slides: PDF
Prepared Talk: NA
Video: InfoQ
Transcript: HTML
Notes: This is an unedited transcript of the presentation.

Transcript

So this talk is about optimizing teams in a distributed world, or building microservice teams; that’s the title in the text. And I refer to it as Conway’s Three Other Laws. Who knows the name Mel Conway? This could be fun then. Okay, very good. So, this is me, this is me everywhere, this is me on LinkedIn and GitHub and Twitter and I would love to connect with you and hear what you’re working on, and what your teams are working on. So please feel free to connect with me or talk about anything with me on this topic and others.

I work in a group called the API Academy which is a very interesting collection of thought leaders in Britain and Canada and U.S. We have a person in Germany but he is now a CTO of a large organization so sort of an incredibly smart group of people that I learned a great deal from. And that’s one of the reasons I get to come and talk to you today, as they encourage us to talk to listen and to share. Speaking of sharing, we’re just finishing up a book on microservice architecture with our API Academy team so this should be available in early release, I think probably next month. So I’ll be tweeting about it and I would love to hear your feedback.

This book is not about the code as much as it is about the business value and the culture changes and the way that your organization ends up working. So I’m very excited to share that, and some of the things I’ll be talking about today are from the book. So, I’ll start out with this story. I visited Amsterdam last year and was talking about the same general topic. There is a wonderful place called the NEMO Museum, it actually looks like a boat but it’s really a museum and you can walk up to what looks like the upper deck and view the bay. But what I find really curious is that next to the bay, where the ships are, there are these really odd devices. These really strange machines that apparently were very important in shipping at one time, they helped bring ships in or move cargo or something.

What I find fascinating about them is here they are, they’re sort of enshrined tools that we know nothing about, at least, you know, to me. Now, there’d probably be somebody in Amsterdam that understands them. And it reminds me that very often when we’re working in companies, we’re working in teams, there are all sorts of things we don’t know anything about. Often we want the one button solution, you know, if I could just click here and get a perfect team doing perfect work and I don’t have to worry about managing them or any of those things, I’d be all set, right?

And I would love to be able to give you that button and tell you to just do this. Look, watch my talk and you’re set. Unfortunately, all I really have for you are lots of tools, and it turns out these tools have been laying around where you are, where you walk every day, in all the people around you, and you maybe just don’t recognize what they’re like, and that’s really what we’ll get a chance to talk about today. So, I wanna talk about this notion of effective teams. I really enjoyed listening to Jonathan’s talk if you were here… How many people were here for Jonathan’s talk in the first [inaudible 00:03:32]? Great talk, if you didn’t catch his talk get his slides, find him. He’s got wonderful direct experience from working at Spotify about teams.

This is a quote I love from Microsoft. Microsoft did some research in 2009, "Organizational metrics predict software failure proneness with precision of 85%. Not the code, not the lines of code, not the number of check-ins, not the number of bugs, the organizational metrics give you an 85% accuracy of how many bugs you’re gonna have." The organization that matters, the organization that counts, the tech is easy, it’s the organization that matters. And it matters even more so if you start doing this thing people are calling microservice because microservice is actually just exploding the same project, the same idea into lots and lots of smaller parts. And then all of the social aspects, all of the challenges exist now in multiple versions and they don’t all behave the same.

So having effective teams in microservices is incredibly important: to think about the kind of tooling you’re gonna use, the kind of tools you’re gonna bring to bear. Sam Newman, who’s written probably one of the best books on microservices right now, it’s called "Building Microservices" with O’Reilly, wrote in a blog post that "Microservices allow organizations to align the architecture of their systems to the structure of their teams." And this is a really important little bit. What happens is microservices actually give us another opportunity to start rethinking the way we interact with each other, just as we’re starting to rethink the way we’re gonna get code to interact.

This is another opportunity for us to rethink the way we interact and it turns out the way we interact is actually much more important, and we call this demystifying Conway’s Law and that’s referring to a gentleman by the name of Mel Conway. So, this is Mel Conway. In 1967 Mel Conway writes an article about committees and how committees work together. Mel Conway has actually had a rather amazing career. He first, in the '50s, built an assembler language for Burroughs computers which later became Remington Rand which became IBM computers.

And he called the assembler the SAVE or S-A-V-E assembler, not because those letters mean anything but because back then everything was written out on cards. And every card would have the word SAVE stamped on it, so nobody would throw them away. He built a first universal compiler for mainframes, so he’s building the Java runtime and the .NET runtime in the 1950s for mainframes. For mainframes, this is amazing. He wrote his paper on coroutines in '63. This paper is still cited today for people who are designing compilers and parsers.

Medical computing; he was an Apple fellow, he actually built the compiler systems for Pascal for the Mac and the Apple II. He is currently working on a project called Humanize The Craft, which really rethinks the way we code. But it’s this one article that I want to talk about, this article from 1967 called "How Do Committees Invent?" And it comes out of his experience at Remington Rand when we used to build mainframe computers as bespoke hardware devices. You know, we all talk about, like, bespoke code or custom code, we talk about trying to create them in a more generic way.

This was back when we created a mainframe physically to solve one problem, like how to lob shells at our enemies or how to schedule airplanes. The Sabre system was a bespoke hardware system when it was originally built. And this is where we get the notion of project based organizations. Jonathan talked about this notion of project based organizations, how Spotify is so different. So Mel Conway is at the very beginning of this idea of creating a project based organization. There’s a whole community and theory about the idea of project based, how we bring people together from lots of different places to solve a particular problem and then go apart.

And this is what got Mel thinking about when we get these committees together to solve a problem, how do they behave and how does that affect output. So he writes this article called "How Do Committees Invent?" Originally, he wanted to submit it to the "Harvard Business Review." They rejected it because they said his idea wasn’t well supported. So, it gets printed in an IT magazine called "Datamation." His idea has really only a tiny bit to do with IT, it has mostly to do with people, but we tend to associate this as an IT article. It’s not a long article. If you haven’t read it I definitely encourage you to read it. It’s easy to find on the Internet and it’s well worth reading, it’s less than 4000 words.

This is the punchline of the article. It’s a rather long sentence. "Any organization that designs a system will inevitably produce a design whose structure is a copy of the organization’s communication structure." Inevitably is actually the keyword. An easier version: a system’s design is a copy of the organization’s communication structure. Not the organizational chart, by the way, but the way we talk to each other. Do we have lots of memos, a few memos, do I trust you, do I not trust you? Do I have to get permission from three people or two people or no one? That affects the system that you build. Communication dictates design.

One of the first things you’ll hear people start to think about, and I think I’ve heard this from some microservice people, is, "If I want the software to look a certain way what’s the first thing I’m gonna do? I’m gonna start to think about how I organize my teams." The first thing that I’m gonna do. So that’s called Conway’s Law, that first thing. It was called Conway’s Law by Frederick Brooks, Mel Conway didn’t think it was a law. And Brooks has written "The Mythical Man-Month." Who’s seen this book? Everybody should have a copy of this book, it’s a fantastic book by Frederick Brooks.

Frederick Brooks has his own law, we all know this law, we’ve all experienced this law. Adding manpower to a late project makes it later. You know, you’re falling behind, I’d like to give you five more team members. No, please God, no. You’ll just go put them somewhere else. Now why is that true? Why was Brooks' Law true? It’s because of the combinations of communication that happens when we start adding more people to the team. And there’s basically an exponential pattern here, here’s a really simple mathematical pattern here.

If I basically triple the size of the team in a general way, I increase by orders of magnitude, the complication of communicating the same information to everybody. In 5 people I got 10 ways, in 15 people 100 ways, in 50 people 1000 ways, 150 people, remember Jonathan said at 150 they wanted to break this group up. At 150 people there are over 10,000 different communication paths that I need to manage. And it’s really incredibly difficult to get everybody to follow or get everybody on board when you’ve got 10,000 possible ways of viewing the same sentence.
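A quick sketch of the arithmetic behind those numbers, assuming every pair of people on the team counts as one communication path (the usual pairwise reading of Brooks' Law). The function name is made up for illustration. Under that assumption the count is n(n-1)/2, so it grows with the square of the team size, and the rounded figures quoted above sit close to the exact values.

```python
from math import comb


def communication_paths(team_size: int) -> int:
    """Number of distinct person-to-person pairs in a team of the given size."""
    return comb(team_size, 2)  # same as team_size * (team_size - 1) // 2


for size in (5, 15, 50, 150):
    print(f"{size:>3} people -> {communication_paths(size):>6} paths")
# Exact counts: 10, 105, 1225 and 11175
```

Going from 5 to 15 people triples the team but multiplies the pairs by roughly ten, which is the jump the talk is pointing at.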

And this really goes to another researcher, a British social anthropologist by the name of Robin Dunbar. Who’s heard of Dunbar’s Number? Cool. Dunbar’s Number, basically Robin Dunbar has this idea that our mental capacity limits the number of social interactions that we can manage on a stable basis. He basically goes on to say, because of his social research, that the size of the brain actually dictates the size of the community. And he says one of the things that’s really amazing about the way that humans work is that we have a larger brain and we can expand that in several different ways.

Dunbar’s Number occurs in lots of ways. Most people know the 150 number, because that’s the one that gets popularized. In the beginning, Facebook used the 1500 number as a limit to the number of people you can have as friends in Facebook because of Dunbar’s research. They found out that that’s a terrible idea, that online our cognitive capacities are quite different, so they had to learn from that experience. One of the things that I find fascinating…I don’t know if Jonathan…I don’t think Jonathan is here, but Spotify talks a lot about Dunbar’s Number when they talk about arranging their tribes and their squads and all these other things.

So this has become a powerful way to start thinking about it. As a matter of fact, the sort of the numbers that Dunbar talks about is intimate, trusted, close, and casual. And it turns out that these are if you think about, you think about where startups are on this list and what happens when you grow a company. There are levels when suddenly, you know, it’s five people in a room and everybody sort of understands, they know exactly what you’re talking about you don’t need to explain it, I know exactly what he’s gonna do, we’re looking great. Then you start to grow, you get up to about 15 and people say, "Yeah, yeah, we just need to, let’s be clear and we’ll be good, I trust you. Everything’s fine."

But then you get up to that 35 or sometimes up around 50 range and somebody is gonna say, "You know, this used to be a fun company. It’s not anymore, now we’re having meetings, there’s paperwork, there’s somebody in human resources. There’s somebody in legal. This is getting complicated." And then pretty soon you have what? A culture problem, right? And it’s because of the way our cognitive systems work, because of Fred Brooks. Because communication dictates design. These are things that have been sitting right in front of us for decades and decades. So that’s Conway’s First Law.

Conway’s First Law tells us that team size is important. So we need to make teams as small as necessary. Not as large as necessary, as small as necessary. What’s the least number of people I can get this done with? It’s sort of like a minimum viable product, right, you know this idea. What’s the simplest solution that works, what’s the smallest team that works? So I’m gonna use Spotify slides on this because they’ve written a great piece called "Scaling Spotify." I don’t think Jonathan mentioned it but it’s another thing I would encourage reading.

And this notion of squads, these squads are about seven people, which is well within Dunbar’s levels of 5 to 15. And their tribes are groups of squads and they have these guilds that share information. We won’t spend a lot of time on it, it’s a great piece but Spotify really understands what this means. So this is a recommendation we give companies when we are consulting with them. If you don’t have some kind of relationship with every member of your team, your team is probably too big. Now, relationship doesn’t mean that you go out drinking together every day or that you know the name of the cat or what their kids are doing in school or so on and so forth, but at least have some kind of conversational relationship with someone.

I was at a company just about six months ago and we were talking about some of this, and it was in a small group and I actually heard, over in a side conversation, a manager say, "Who is she, is she on our team?" Okay, and I was like, "Done." Right? So if you don’t know, you’re gonna have trouble, it’s gonna change the way you communicate. So we recommend that you aim for a team size of Dunbar level 1 or 2, of 5 to 15. This is what we see time and time again when we look at companies, whether they’re mature companies or startup companies, whether they’re, you know, like a cloud native company or they’re doing gas drilling somewhere. These are the team sizes that we see over and over again.

There’s lots of other things about how often they change and things like that that we really don’t have time to get into. But somewhere around these levels, these intimate and trusted levels, you can get a lot of work done with a small team. When you begin to grow past that you get what’s called communication overhead. When Dunbar was working on his research he looked at chimpanzee groups at 150, like a tribe of 150. 40% of the time, 40% of the work of that tribe was grooming each other. Just maintaining relationships, that’s overhead, that’s communication overhead. 40% at 150 people; communication overhead is almost zero at five.

So I said the other three laws, this paper is actually full of great ideas. That’s just one of them, the one that people have noticed. But there are three other ones I want to call out and I’m gonna call them the second, third and fourth law. So, Conway’s Second Law and this is a quote from the paper, "There’s never enough time to do something right but there’s always enough time to do it over." And this is another thing that I think many of us have experienced. Eventually, somewhere along the line someone’s gonna say, "Look, we don’t have time for that we have to make the deadline, we’ll do that in the next version. Don’t worry about it, don’t get so upset about it. We’ll have a chance to do it again in two years when we change all the software anyway."

We all experience this message, doing it over is sort of what we do, right? Oh, there’s a new framework, there’s a new thing or there’s a new client library or there’s a new server library, it’s all called [inaudible 00:17:21] now it’s all called containers, now it’s all called [inaudible 00:17:23] we do it over, right? We’ve been doing it over for a half a century. What we’re really doing is we’re engaging in tradeoffs. This is totally natural, totally real and unavoidable. And actually there’s a great book that talks about efficiency thoroughness tradeoffs.

We trade off between being efficient and being incredibly thorough, where the simple tradeoff that we all make, that we all know about, is when you finally decide to ship even though you know there are bugs in the code, okay? I could try to get every bug out but I’ll never make the ship date. As a matter of fact, you know, I hear all the time, do you want it right or do you want it on time, right? We always have these tradeoff things in our heads. There’s a great book called the "ETTO Principle" by Erik Hollnagel. Anybody seen this book?

I totally encourage you to read this book, it’s very readable, very easy to read. Erik is actually another Scandinavian and he works mostly in the medical fields in helping make sure that doctors and nurses don’t make mistakes when your heart is open on the table. It’s a kind of a good job to be in, I’m glad he’s doing it. He’s also a big hero of the DevOps Community because Erik talks a lot about this notion of doing tradeoffs in complex systems. He has a couple of ideas I wanted to bring today, this is a little tough and I’m not sure how it will even translate, satisficing versus sacrificing.

Satisficing is a consequence of a limited cognitive capacity. Things are so complicated, I just skip things. That’s okay, I don’t need to know, go ahead, right? Sacrificing is when I know exactly what’s going on but I just don’t have enough time. I could do this a whole lot better if I had more time, I just can’t do it. There’s a quote that’s been attributed to a lot of writers, I picked Mark Twain because I like that particular attribution. He said, "I’m sorry, I wrote a very long letter. I just didn’t have time to write a shorter one."

So, sacrificing is when we say, "I just don’t, I have limited resources." Satisficing is when it’s so complicated I can’t figure it out. Problem is too complicated, ignore the details; we do this all the time, right? You don’t really want to know how the Internet works, right, it’s kind of scary. You know, there are messages bouncing around and servers are not answering and things get rerouted, it’s a little too… Just as long as it gets there I’m fine. Not enough resources, I just give up a feature, just drop it, you know, and just ship on time, so this is normal. This is the way we always work, this is the way we always work in systems.

The other thing that he talks about, which I really love, is kind of related: safety 1 and safety 2. In safety 1, the idea is I want to take as many bugs out of the system as I possibly can before I ship it, because then it’ll be a stable system. Of course, the reality is I can never get all the bugs out, never. There are lots of bugs I never find, somebody else will find them. Usually it’s the first user that I give it to and she finds it right away. Damn it.

What Hollnagel talks about is safety 2 or resilience, putting enough good things in the system that it survives the bugs. John Gall, a person I didn’t get a chance to put in a lot of material about, has this book called "Systemantics." And one of his principles is, "Stable systems run in failure mode." Right? I think Jeremy talked about this idea earlier today in the keynote. So in safety 2, resilient systems, or sometimes they’re called antifragile systems, we put enough good things in the bag that they outweigh the bad. The bad that we might never find, because we just don’t have time to find all of that.

I’m just gonna put this graph up because it’s from his book. He has this notion of the idea of loose coupling or tight interaction or integration as a way to think about how complex a system is and how resilient it can be. Down in the bottom, when we have loose coupling and loose interaction integration, that’s most manufacturing lines. At Toyota, I can stop the manufacturing line anytime I want and we can discuss the problem. Way up in the top corner is a nuclear power plant, if somebody flips the wrong switch it’s pretty much done. It’s a terribly highly interactive, integrated, tightly coupled system, right?

So when you design software you want to make sure you design your stuff more like an assembly line and less like a nuclear power plant and you’ll be okay, right? That’s why we think about this loosely coupled asynchronous interaction that actually, you know, pushes us further down in that lower left quadrant. So the enemy in all this by the way is intractability, the enemy is not being able to understand what’s going on. I can’t fix something I don’t understand. I can understand about taking a feature away because I’ve got timing but if I’ve missed an important aspect of the system I’m in real big trouble.

So what happens? Systems grow, they keep growing bigger and bigger so we have all this code. We’ve all looked at this, it’s like, what is all this stuff? Oh, it’s from years ago, never mind, just go down to the bottom, comment that section out and write another one, right? We know exactly how these systems grow. Rate of change increases, they want it faster and they want it faster than faster, and expectations keep rising. They want it better while they want it faster while it grows, right? So that’s where intractability comes in. Sometimes, suddenly it’s difficult to get a handle on what’s going on and we’ve now named that thing a monolith, right?

That’s the new name for that thing that I don’t understand, it’s a monolith, nobody knows what’s inside that, right? Even when you ask somebody what a monolith is, they’ll say, "Well it’s a big thing." They don’t know what’s in it either. So, how do we fight this? We fight this with continuous delivery. One of the great things about continuous delivery, it’s tiny little bits over and over, over and over and over. Whether it’s at Spotify or at Etsy or any of these organizations, they do lots of things over and over and over again.

Conway’s Second Law is about problem size, not team size. You need to make the solution as small as necessary too. You need to make sure that what you’re delivering is tractable, is understandable. So this is actually a chart of continuous delivery at Etsy over a space of a few years. You don’t need to worry about the numbers, it’s the number of daily releases. And you can see in the beginning there’s maybe 4, 5, 6, 10, 20, getting higher, 30 releases per day, to the point where at one point it’s 40 to 50 releases per day, 250, 260 days a year, 10,000 releases to production a year. Think about that, think how they do that. They don’t release the entire system 10,000 times a year, they release small parts of it.

In fact, one of the things that’s kind of buried in the diagram is that the darker color is configuration only change. And you can see in the beginning there’s almost no change on configuration, it’s all code, and eventually it starts to get more and more related to just changing configuration rather than running code. This configuration is often much more tractable than code, this is one of the things that Etsy has learned and they talk about. So turning this into a tractable system means I can start to release smaller and smaller parts, and that means I can reason about the release.

If I’ve got one bug fix in a release and I put that release into production and it doesn’t work, do I know what caused it? Yes. How about if I have 10 bug fixes? No. Remember the chart from Brooks, 5 people 10 different ways, so on and so forth; it works for bugs the same way, works for changes the same way. If I put 150 changes into a big update every six months, there are more than 10,000 possible ways that could go wrong, right? Keeping it small really matters. So if you or your own team cannot explain all the code in the release, it’s too big, it’s too big, stop.

This is very hard for people, this one is tough. Like, no, no, no, we’ll just put another one in, one at a time is infinitely easier to deal with. Execute small releases, lots of small releases instead of just a few large ones. If you’re releasing every quarter, every six months, every year you’re just really banging your hand with a hammer. It’ll feel so good when you stop.

Smaller releases, and many more of them, are gonna be much easier to solve if something goes wrong, much easier to get history from, much easier to understand and learn from over time. Jez Humble, who is sort of the founder of this notion of continuous delivery, says, "In an IT organization if something hurts, you do it more often." At Etsy, your first day you’re hired, they put a little button on your desk. And they say, "Okay, you’re releasing to production, you go ahead and press that button, because that’s what we do here, we release to production, that’s our job." And if I remember correctly you push the button and then they turn the lights off and they sound a siren and they blame you for running the whole system into the ground as a joke.

But the idea of small releases is incredibly important. Okay, how we doing here, we’re doing good. Conway’s Third Law, this is a weird one. I had to look this up. Does anybody know this word, homomorphism? Any math people? Look, I got a couple of nods, that’s amazing. It’s cool. So, there’s a homomorphism, there’s a direct connection between the graph of a system and the graph of the organization in the way the organization interoperates.

So homomorphism is this ability to transform a set into another set in a way that preserves all the relationships, all the details. It’s like graph to graph, okay? This is incredibly important in mathematics and it’s also incredibly important in your organization. So this is actually an illustration from the paper, it basically says, "Look, if you’ve got some users, people who do user programming, people who do system programming and engineers who’re building the hardware, those are the deliverables you’re gonna get. You’re gonna get a hardware deliverable, a system deliverable, and an application deliverable."

If your general system is divided between two groups that all meet together, you’re gonna end up creating software that looks just like that. Communication dictates design. Eric S. Raymond has a much simpler way of explaining it from his book, the "Hacker’s Dictionary." He says, "If you’ve got four groups working on a compiler you get a four pass compiler. Change that to three groups, you know what you’re gonna get, a three pass compiler."
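To make the graph-to-graph idea concrete, here is a minimal sketch. It assumes the system is a set of component-to-component interfaces, the organization is a set of team-to-team communication channels, and each component has exactly one owning team. All names (`is_structure_preserving`, the component and team labels) are hypothetical, and the same-team shortcut is a simplifying assumption rather than something taken from the paper.

```python
def is_structure_preserving(system_edges, org_edges, owner):
    """Return True if every interface between two components is matched by a
    communication channel between the teams that own those components."""
    talks = {frozenset(pair) for pair in org_edges}
    for a, b in system_edges:
        team_a, team_b = owner[a], owner[b]
        if team_a == team_b:
            continue  # assumption: a team always communicates with itself
        if frozenset((team_a, team_b)) not in talks:
            return False
    return True


# Hypothetical three-part system, loosely following the hardware / system /
# application split described above.
system_edges = [("hardware", "system_software"), ("system_software", "application")]
owner = {
    "hardware": "engineers",
    "system_software": "system_programmers",
    "application": "user_programmers",
}
org_edges = [
    ("engineers", "system_programmers"),
    ("system_programmers", "user_programmers"),
]

print(is_structure_preserving(system_edges, org_edges, owner))  # True

# Add an interface whose owning teams never talk to each other.
extra = system_edges + [("hardware", "application")]
print(is_structure_preserving(extra, org_edges, owner))  # False
```

The second call fails because a hardware-to-application interface would need the engineers and the user programmers to talk directly, and in this made-up organization they do not.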

So, this is just directly telling us how things work, how systems work. And it really talks about this notion of cross team interdependencies. When I’m over here and I’ve got all these groups, and you think about how they can get even more and more complicated, you think about Spotify, you think about what Jeremy was talking about with 29 lambda functions, right, they all are deliverables. All of a sudden it becomes really important that you understand who’s dependent on who else. If I have to wait for your team to release I’m not independent.

And when you set up lots of microservices in lots of teams it’s important to make sure you don’t create a whole new set of dependencies. Each team needs to be fully independent, they can release on their own schedule at their own speed at their own time. Gartner has this thing that they’re calling bimodal, right, bimodal. So my first question to the Gartner analyst was why two? That’s it, you only get two speeds, that’s it. That’s a terrible car, right? Lots of speeds is reality, right, lots of speeds.

So if you have to hold a release until some other team is ready you’re not an independent team. And now your company works at the pace of the slowest team in the group. Nobody can go faster, it’s incredibly dangerous. So this is another study from Microsoft, large scale software teams and how they work on interactions. Responses, I think from about 80 team leaders, on how they deal with dependencies. You know, we have a whole industry in managing dependencies, right? My industry is actually pretty simple, I get rid of them. I don’t need a tool to tell me what my dependencies are, I get rid of them.
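The "pace of the slowest team" effect can be sketched with a toy model. It assumes each team has its own release interval in days and that a team holding a release for another team can go no faster than that team, directly or transitively. The function name, the team names and the numbers are all made up for illustration.

```python
def effective_cadence(own_cadence_days, waits_on):
    """For each team, the release interval it can actually achieve when it
    must hold releases for every team it waits on, directly or transitively."""

    def resolve(team, seen):
        if team in seen:  # guard against dependency cycles
            return own_cadence_days[team]
        seen = seen | {team}
        slowest = own_cadence_days[team]
        for other in waits_on.get(team, ()):
            slowest = max(slowest, resolve(other, seen))
        return slowest

    return {team: resolve(team, frozenset()) for team in own_cadence_days}


# Hypothetical teams: checkout could ship daily, billing ships quarterly.
own = {"checkout": 1, "catalog": 14, "billing": 90}

# Fully independent teams keep their own pace.
print(effective_cadence(own, {}))
# {'checkout': 1, 'catalog': 14, 'billing': 90}

# checkout waits on catalog, catalog waits on billing: everyone ships quarterly.
print(effective_cadence(own, {"checkout": ["catalog"], "catalog": ["billing"]}))
# {'checkout': 90, 'catalog': 90, 'billing': 90}
```

In this model a single blocking dependency is enough to drag a daily-release team down to a quarterly one, which is why removing the dependency beats tracking it.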

I love some of the answers here, they’re a little creepy. I avoid unreasonable people, right? I interact with people I trust, or I just cancel the project completely. That must have made Microsoft really happy: what, what, we’ve cancelled projects? Actually, these are the more fun ones, right: never take critical dependencies, eliminate code dependencies, have a backup plan to ship without the dependency. That’s called a circuit breaker, I think, in code, right? And minimize code dependencies, right, minimizing these things is incredibly important.
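For readers who have not met the pattern, here is a minimal sketch of a circuit breaker as the "backup plan" idea applied at runtime. It is an illustration under stated assumptions, not any particular library's API: the class name, the `max_failures` and `reset_after` parameters and the injectable `clock` are all hypothetical.

```python
import time


class CircuitBreaker:
    """Stop calling a failing dependency for a while and use a fallback instead."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, dependency, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()         # open: skip the dependency entirely
            self.opened_at = None         # half-open: allow one trial call
        try:
            result = dependency()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0                 # success closes the circuit again
        return result


def flaky_recommendations():
    # Hypothetical dependency that is currently down.
    raise TimeoutError("recommendation service did not answer")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
for _ in range(3):
    # The third call never touches the dependency: the circuit is already open.
    print(breaker.call(flaky_recommendations, fallback=lambda: []))
```

The point of the design is that the caller always gets an answer, here an empty list, so the feature degrades instead of blocking the release or the request.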

We all know what it’s like when you’re trying to, like, build a package and things go wrong. Did npm find out what dependency problems were last week? Who knows this, the npm project, right, left-pad, right? Now, I remember the discussions about whether or not we should be shipping the packages or just shipping the references and letting them all resolve at build time, right. And everybody’s, "Oh, build time is gonna be better, right?" It just takes one case, right, one case in a complex system and suddenly tens of thousands of packages won’t build tonight, right? Dependencies are deadly and dangerous.

All right. So Conway’s Fourth Law, Conway’s Fourth Law is kind of a summary of a bunch of other things: disintegration. Structures of large systems tend to disintegrate during development more so than small systems, things fall apart. I think we’ve all probably been on a project where you could tell this thing was never gonna ship. That the ship date kept getting further and further and further out. It’s just never gonna make it, and once it does ship you’re pretty sure that nobody wants this, right, that’s what this is about. There are three reasons he says disintegration occurs. When we realize it’s gonna be large then we want to add more people, and of course we know how that’s gonna work out, right?

Adding more people makes it later, that’s the first reason; we kind of get freaky and somebody says let’s just add more people. There’s also, do we know Parkinson’s Law? Work will expand to fill the space allotted, right? So Parkinson has this really, really kind of perverse way of looking at the way committees and politics and governments work. And basically he says not only is it the bad news that we add more people, there’s a sort of an incentive in sociopolitical systems to have more people reporting to you, right? So somebody says, "Mike, I’m gonna give you 10 more people." Great, I got 10 more people reporting to me. It’s making things worse.

The second reason is since we’ve added more people, now communication is worse, right, because of Dunbar, right? So now what happens I’ve added five more people that’s another set of numbers that I have to add to the whole complex mix because people can’t keep track of things. And then thirdly because of that graph of the way the organization communicates and the software looks as I add more people and can’t communicate with them well the software errors start to rise, the code looks like crap. It’s just sort of the hilariously inevitable thing because communication dictates design.

So Conway’s Fourth Law tells us that time is actually against large teams and large projects, time is the danger. That’s one of the reasons you do short release cycles. Short release cycles help you try to beat time. If you release every two weeks it’s only within two weeks that something could go terribly wrong and off the rails. But if you release every two months or every six months you can go way off the rails before you get released. Time is a danger.

There’s a group called the Standish Group and they have a wonderful report they’ve been doing for about 20 years. It’s actually called the CHAOS Report and it’s focusing only on large scale projects, large scale corporate or governmental IT projects, and they chronicle the failures. He talked about celebrating failures; this is a whole company that’s made an industry of documenting them and then selling these stories to you. It’s sort of like this giant schadenfreude, we know what this is, right? I love finding out how other people failed, right? As long as they don’t write up about me I’ll be okay.

But we won’t go into the details. Complexity, C1 to C5, across the top, size from S1 to S5 down the bottom; you can see once you get into high complexity, high size, the failure rate is astronomical, the opportunity for failure just increases exponentially. Small projects released over and over are gonna give you a much better shot. That’s really what microservices is trying to pull off here, right, give you a tractable piece that you can release over and over again that’s not complex connected to somebody else.

So if your release dates are missed, it’s often because your scope is too big, you’re trying to do too much, stop. We’ve got some companies we work with, one of the first things we tell them is you need to start releasing every two weeks. And if two weeks comes up and you haven’t actually shipped a bug fix or shipped a change, you release anyway because the release team needs work. You release, you release, you release.

As a matter of fact, when I worked with a couple of…I was like an outsourced programmer for a while. I knew how to manage groups if we had a rhythm for release. If they missed a rhythm then I knew we were in trouble, you know, if my rhythm was every six months I wouldn’t know for six months if we were in trouble or not. But if it was every week or every two weeks it was a lot easier for me. So aim for a scope that supports a release cycle of two weeks or less, if you’re doing anything more you’re just hurting yourself and hurting your team. Work on that right now, it’s not easy there’s a lot of infrastructure. There’s a lot of material that you have to get together to do that but that’s one of the first things.

Okay, we got a few more minutes and then I want to make sure I have time for questions. This is Mel Conway now by the way, this guy is not dead. He’s alive and living in Boston and he is a very interesting guy. So, Conway’s Laws help us succeed when we’re working in microservice teams. So because the system is a copy of the communications structure, we actively manage communications. We actively manage the teams and the team sizes because that’s gonna affect our code.

There’s a great article called "Global Software Development." Somebody was asking in the previous session about remoting. Think about how remoting affects what we just talked about. If I’ve got five people here in Sao Paulo and one person in Rio, that’s two teams, sorry, that’s two releases. Let’s design it that way, right. Take advantage of what you have there. So, remoting affects things in a lot of ways. So one of them is the idea of uncertainty; you can reduce uncertainty by reducing the complexity of the code in the releases, make them smaller, keep the code simpler. You also need to lower the…I’m sorry, complexity is code, uncertainty is communication.

You need to increase communication value, so how do you do that? Real time chat tools, video conferencing, forums and newsgroups, wikis and websites to document. All of these things are attempts to improve the quality of communication. That will improve the code. Code reviews are not going to do it, right? I spent a week writing a thousand lines of code, now you’re gonna tell me it was wrong. No. Every day I check in, every day I ask a question, every day somebody asks me a question. Shorten that feedback loop, reduce the effort required to locate and interact with the right people at the right time.

I think it’s PayPal that has this notion that if you’re working as a remote worker you have to have a video camera on when you’re at your desk, not so they know you’re there, but so that when I walk by the screen, oh, Mike is here, I can talk to Mike now. Like, I try to recreate that availability and interaction from being in a colocated space. So there’s never enough time. It’s the one that Conway talks about. So all we have to remember is that it’s continually repeating. It’s okay if I don’t get the bug fix in today’s release because there will be a release tomorrow. I don’t have to worry about it, it’s continuous. So I don’t have to worry about that lack of resources, I only need to focus on tractability.

Small frequent changes, testing along the way, reduces the inherent risk in deploying code. As a matter of fact, if you think about what Etsy does, 10,000 deployments a year, they don’t deploy from the same place. There are people all over the organization that are deploying. And they’re not doing end-to-end testing. You can’t do end-to-end testing when the system changes 10 times a day. You do bench testing and then do runtime testing, canary builds and other things that actually tell you what the runtime is like.
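As a rough sketch of what a canary check at runtime can look like: send a small share of traffic to the new build, compare its error rate to the current build, and only continue the rollout if it is not noticeably worse. Everything here is an assumption for illustration (the `Stats` type, the `tolerance` and `min_requests` thresholds, the function name); it is not how Etsy or any particular tool does it.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Request and error counts observed for one build (hypothetical shape)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_is_healthy(baseline: Stats, canary: Stats,
                      tolerance: float = 0.01, min_requests: int = 100) -> bool:
    """Return True if the canary build looks safe to roll out further."""
    if canary.requests < min_requests:
        return False  # not enough runtime traffic yet to judge
    # Allow the canary to be at most `tolerance` worse than the current build.
    return canary.error_rate <= baseline.error_rate + tolerance


if __name__ == "__main__":
    current = Stats(requests=10_000, errors=50)
    print(canary_is_healthy(current, Stats(requests=500, errors=3)))
    print(canary_is_healthy(current, Stats(requests=500, errors=25)))
```

The point of the sketch is the shape of the decision: it is made from what the runtime actually reports, on a small slice of traffic, rather than from an end-to-end test run before release.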

So you can remove a lot of ceremony from this process. Implement small changes, test those changes immediately, deploy them constantly. So if I write something and it sits in the queue for a week before it finally gets deployed and then it gets deployed and there’s a problem, I’m probably on another project by now. And now it’s gonna be a huge mental game for me to kind of resurrect what I did last week and figure out what that is. Whereas, if I do it today and you release it today, you release it in the morning, I’ve got a much better chance of being able to solve any problems that might come up. Shorten the feedback loop as much as you possibly can.

Complex systems are often complex because they are intractable, and they’re often intractable because we can’t trace the paths, right? Stock markets, economies, weather, my family, too complicated, right? Because it takes too long for things to show up. "Dad, you told me a week ago." Like, I don’t remember. You know, Conway’s Third Law about homomorphism, about the mapping between the system graph and the organization, means that you get to organize your teams.

You organize your teams to get the code you want, this is called microservices, right? If you think about what ThoughtWorks has been talking about when they talk about microservices organized around business capabilities, not around projects, that’s the whole idea. Like, this is the capability, this is what I want. This is what Jonathan was talking about at Spotify all the time, right? This is what they do.

So rather than these silo teams, I’m the front end, I’m the middleware, I’m the backend, you mix them together. You’re the front end, you’re the middleware, you’re the backend, you’re the test person. You teach all of us, all of us, right? Now, we’ve got a nice squad, we’ve got a nice team. So organize by product or business unit, and make sure that test and deploy are all in there. Make sure you include storage and business process and UI as all owned by the same group, don’t try to pass something off.

We see this time and time again at Amazon and Hootsuite and Spotify and Etsy and all these other companies. Allow teams autonomy within their boundary, Jonathan talked about this as well. I don’t really care about what happens inside that box in terms of technical terms. As long as when it comes outside the box, it works with everything else we have. If you pick some technology that doesn’t work with the rest of us you got a problem on your hands. But if you decide you want to use some other tool or some other bit of information that’s fine with me.

Require teams to interoperate, not integrate. This is one that we’ve added and we’ve seen this as really important. It’s important that I understand the interface, not that I understand the innards. So if your team does something in a certain way, that is something the rest of us don’t really need to know about as long as it’s consistent, as long as your contracts and your promises are kept. So make sure teams own their own complete life cycle, that includes support at the end by the way.
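The interface-versus-innards point can be sketched in a few lines of code. This is only an illustration under made-up names (`InventoryService`, `DictBackedInventory`, `ListBackedInventory`, `can_fulfil`): two teams keep the same promise with completely different internals, and the caller only ever depends on the contract.

```python
from typing import Protocol


class InventoryService(Protocol):
    """The contract other teams rely on (hypothetical example)."""

    def stock_level(self, sku: str) -> int: ...


class DictBackedInventory:
    """One team's innards: a dictionary lookup."""

    def __init__(self) -> None:
        self._stock = {"widget": 3}

    def stock_level(self, sku: str) -> int:
        return self._stock.get(sku, 0)


class ListBackedInventory:
    """Another team's innards: one list entry per physical item."""

    def __init__(self) -> None:
        self._items = ["widget", "widget", "widget"]

    def stock_level(self, sku: str) -> int:
        return self._items.count(sku)


def can_fulfil(inventory: InventoryService, sku: str, quantity: int) -> bool:
    # The caller knows the interface only, never how the answer is produced.
    return inventory.stock_level(sku) >= quantity


if __name__ == "__main__":
    for service in (DictBackedInventory(), ListBackedInventory()):
        print(type(service).__name__, can_fulfil(service, "widget", 2))
```

Either team can change how it stores things tomorrow; as long as `stock_level` keeps its promise, nobody outside the box has to know.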

And it includes when it’s time to get rid of a service like nobody’s using this. Mike, you know, it’s a great project and all but nobody’s using the service anymore, we need to get rid of it. Or another one that’s equally important is, you know, we’re gonna need to break this team up but somebody’s gonna need to own these three services. I can’t just kill the team and orphan the services.

We talked to Uber, the taxi cab people, 400 engineers, 800 microservices. Eight hundred microservices, really? That’s two services a person. Well, actually a lot of them we don’t understand what they do anymore but we’re all afraid to kill them, right? Don’t orphan your stuff, it could get worse. And the Fourth Law is that large structures fail faster than small structures so you keep your teams as small as necessary but no smaller. And that brings me to Jeff Bezos. Do we know this story, the two pizza rule? Jeff Bezos got really upset when he starts building all this infrastructure that’s all gonna be service, everything’s a service.

And he was told by his team, we’re gonna have to have a lot more meetings and he said, "No, no, we’re not having more meetings." We need to explain things. And he says, "If you have to explain things then there are too many people in the room. You need to break this into smaller groups." And he created the rule that said, "If a team can’t be fed by two pizzas that team is too big." Now those are American pizzas so it’s, you know, the big team. But, you know, he had this inherent notion that you want teams to be autonomous and not have to explain everything to anybody or get permission from anyone.

So make them small, resist the urge to grow teams just to meet a deadline. Consider Dunbar’s rule when you’re sizing your teams. And be prepared to break them up into smaller teams, Jonathan talked about this very same thing. It’s better to be too small than too big, it’s better to be a little bit overworked than to have people sit around creating trouble, thinking of new projects that nobody needs.

Okay, increase communications, support continuous processes, organize your teams, and keep them small. These are the four things that we think are really important and, you know, it’s half a century old advice. And it applies to lots and lots of things, not just to microservices, not just to IT but lots and lots of stuff. And that’s what I have. Thanks.

Questions

Moderator: Questions?
Question: Yeah, I had a question, thinking about continuous delivery. You know, I tend to think about that in terms of like web applications or internal services. Is there anything useful from that that you could take if you’re building a product that, you know, your customers install?
Mike: That’s a really good point. So the idea of continuous delivery usually focuses on where distribution is not a challenge, right. So, usually I’m in a corporation or I’m on a website. We do see companies doing this and it primarily affects mobile apps. So often what you end up doing is you need to create a way that mobile apps can be updated on a regular basis, and configuration is a great way to do that. So configuration is often something that I don’t have to wait for the Apple Store to approve or validate.

So I can actually add new features and new functionality to applications in that way so we definitely see people doing that and that can work on the desktop side as well. What it really amounts to is lowering the cost of getting your customers, your individual users to do an update. And if you can make that zero in other words if those updates are automatic in some way then it works really well. We definitely see companies doing that. It’s definitely a challenge, but it’s a different one. Yes?
Question: Hi. Do you have opinions or thoughts or same kind of advice for small teams like two people, one person or three people? Any problems you see or how to solve them or any research in this area?
Mike: So the question has to do with what if you’re just a really small team, are there any other things to watch out for. Here’s the thing that we’re starting to see. This is just gonna be anecdotal, I don’t have any research to back this up. What we’re finding out is small teams build a monolith. Because when there’s only three of you, it makes no sense to have 20 services, right? But what happens is as you grow you need to start sort of slicing things up.

So one of the things we’ve noticed, we’ve run into some very small startup teams and they’re saying, "Oh, we’re gonna have 20 services," and we watch them sort of struggle to, you know, basically juggle 20 things and it’s a dumb idea. So don’t be too worried if you’re just a small team to build the so-called monolith or something that doesn’t look like it’s the hipster’s thing. Do what sort of makes sense in your group, I think is something we found. Does that help? I mean, that’s one of the things that we’ve noticed. Yes?
Moderator: So thank you again, Mike.
Mike: Thank you very very much. I’ll be here all week. Thanks.