My talk "Understand, Automate, and Collaborate for Development Speed with Microservices" is now available on InfoQ.
Recently I've moved from teaching simply about the Life Preserver itself to merging in more recent ideas from Domain-Driven Design, especially the great work by Alberto Brandolini on Event Storming. This article is a brief write-up of my take on why this approach is so powerful and how I tend to teach and promote it on the Microservices and DDD courses.
The "Big Misinterpretation" of DDD
Domain Driven Design (DDD) is a wonderful technique that attempts to bring our designs closer to the domains we are working on.
The theory is that the closer our implementations are to the domain, the better our software will be. With this goal in mind, DDD has been extremely successful throughout our industry.
But, as architects and designers, we have an embarrassing love affair with making early decisions on the structure of our software.
This emphasis on structure was NOT the intention of DDD.
Unfortunately, DDD adoption and interpretation exacerbated this through the misinterpretation that the place for the Ubiquitous Language is in the things, specifically in the Domain Objects.
Domain Objects became a massive problem as they entangled multiple concerns across a software design in their goal of being ubiquitous. They became Entities, Message Payloads, Form Backing Objects … they were promiscuous across the system and a source of entanglement and system fragility.
The situation got even worse as people packaged Domain Objects into libraries for reuse, increasing their footprint and their ossifying effect: the Domain Objects became the brittle part of your system that could not evolve because they were now being used everywhere.
Shared domain object libraries could not be evolved without everything evolving in lock-step.
Frameworks emerged that had a Domain Object and Entity focus. The goal was to make things easier, but not necessarily to make things simpler or improve your design. Examples of the criminals in this space include Rails, Grails, Spring Roo, Entity Framework, (N)Hibernate, etc.
You could detect the problem by encountering the unintended, and unwanted, 40-file-commit Ripple Effect.
Entity, Database, and Domain Class-First as an approach became very popular and caused these problems as it became the default architectural style of the Enterprise Layered Architecture.
The whole thing exploded into systems that could not change as the Domain Classes, the Things, went from being Ubiquitous in one area, to being Canonical across all areas.
Then there was an epiphany!
It is not the things that matter in early stages of design…
…it is the things that happen.
We have a name for the things that happen in software design, they are referred to as Events.
The epiphany in DDD was that it is the things that happen that should be the first consideration when designing some software.
This newer approach was called Events-First.
Events turn out to better capture the ubiquitous language of a domain or system. When collaborating with non-technical stakeholders, more often than not the easiest way to describe the system is in terms of the things that happen, not the things that do the work.
The things that happen are the Events.
It turns out that this approach works well if you’re evolving an existing system, or working on a new one.
The technique of Event Storming is the first design step on this journey.
Event Storming is a collaborative activity where you bring together domain experts and technical architects and designers to discover the ubiquitous language of a system or context.
The intention is to try to capture a system in terms of the things that happen, the Events.
Using Post-its, you then arrange these events in a rough order of how they might happen, without at first considering in any way how they happen or what technologies or supporting structure might be present in their creation and propagation.
One technique is to think of yourself as a Detective arriving at a crime-scene and simply ask yourself of the system you are working on with your team, “What are the Facts?”
Limit yourself to events that describe what you can know in the system you are working on and can influence.
What Makes a Good Event?
When designing your events, attempt to make concepts explicit. If several low-level events manifest a higher-level fact, make that fact an event. Equally, if a high-level fact could be seen as being made up of smaller-scale events, then make sure you have those events too.
Having a LOT of events is rarely a problem.
Make your events completely self-contained and self-describing.
Make your events technology and implementation agnostic.
Once you have the happy-path set of events, explore the events that can happen when things go wrong in your context.
This approach helps you ask the question “What events do we need to know?”, which is a powerful technique to help explore boundary conditions and assumptions that might affect real estimates of how complex the software will be to build.
Events are immutable; after all, they are Facts.
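As a minimal sketch of these properties (the event name and fields here are hypothetical, not from any particular system), a self-contained, self-describing, immutable event might look like this in Python:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4

@dataclass(frozen=True)  # frozen=True makes every instance immutable, like a Fact
class PayrollRunCompleted:
    """Self-contained and self-describing: everything needed to understand the Fact."""
    employee_count: int
    total_paid_pence: int  # technology-agnostic: plain data, no ORM or framework types
    event_id: str = field(default_factory=lambda: str(uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

event = PayrollRunCompleted(employee_count=42, total_paid_pence=9_500_000)
```

Because the instance is frozen, any attempt to change a field raises an error: a Fact, once recorded, cannot be rewritten.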
Lay Out your Events to Explore Causality
Consider then creating a Causality Graph of your Events to explore assumptions around what order Events might need to occur in, and what order you have simply assumed they might need to go in.
The more events that are not causally linked, the more options you have about how those events are generated and handled in the emerging structure that you’ll eventually design.
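One way to make those causal assumptions explicit is to sketch the graph in code and ask which pairs of events have no causal path between them in either direction; those pairs are your free ordering choices. The event names below are hypothetical:

```python
# Hypothetical causality graph: an edge means "must happen before".
causes = {
    "OrderPlaced": ["PaymentTaken", "StockReserved"],
    "PaymentTaken": ["OrderShipped"],
    "StockReserved": ["OrderShipped"],
    "OrderShipped": [],
}

def reachable(graph, start):
    """All events causally downstream of `start`."""
    seen, stack = set(), [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def causally_unlinked(graph):
    """Pairs with no causal path either way: free ordering (and handling) choices."""
    events = sorted(graph)
    down = {e: reachable(graph, e) for e in events}
    return [(a, b) for i, a in enumerate(events) for b in events[i + 1:]
            if b not in down[a] and a not in down[b]]

print(causally_unlinked(causes))  # -> [('PaymentTaken', 'StockReserved')]
```

Here payment and stock reservation are not causally linked, so the eventual structure is free to generate and handle them in any order, or in parallel.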
Bounding with a “Bounded Context”
It’s now time to introduce a boundary around your ubiquitous language, as expressed by the events.
The “Life Preserver” diagram is simply a useful tool made up of a couple of concentric circles that are a visual representation of your Bounded Context and Anti-Corruption Layer.
The Bounded Context is related to teams and team structure.
It is common for one team to look after and develop one or more Bounded Contexts, but two teams with different goals should ideally not work on the same Bounded Context, as friction over the speed of change and dilution of responsibility usually occur in that case.
If a team inherits a “Heritage” (Legacy) software system, it should remain in its own Bounded Context as it will exhibit its own ubiquitous language of the time and place and team it was originally developed in. Do not be tempted to merge the Heritage context with your current context as you will dilute the ubiquitous language and advantageous human comprehension in both systems.
Bounded Contexts rarely make sense as nested concepts. Teams focus on, and work towards, distinct goals and the resulting distinct areas in systems, and Bounded Contexts follow this model.
Finally, To Structure…
First, discover the stateless services that exist within your system. These are the services that do not maintain any state whatsoever and that consume, as well as emit, Events.
Second, consider using Repository services to capture state where the model for modifying the state as well as reading it is the same.
Be wary of leaky abstractions around data persistence technologies and make your repositories agnostic of whatever technology you choose.
Finally look to implement microservices using CQRS if you need to vary the performance characteristics, and models, for modification and enquiry of state…
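The second step, a technology-agnostic Repository, can be sketched as an interface plus interchangeable implementations (the names here are hypothetical, for illustration only):

```python
from abc import ABC, abstractmethod

class AccountRepository(ABC):
    """Callers depend on this interface, never on the persistence technology."""

    @abstractmethod
    def save(self, account_id, account):
        ...

    @abstractmethod
    def find(self, account_id):
        ...

class InMemoryAccountRepository(AccountRepository):
    """One swappable implementation; a database-backed one would share the interface,
    keeping the leaky details of the chosen technology out of the rest of the system."""

    def __init__(self):
        self._store = {}

    def save(self, account_id, account):
        self._store[account_id] = account

    def find(self, account_id):
        return self._store.get(account_id)
```

Because nothing outside the repository knows about the storage choice, the persistence technology can change without rippling through the Bounded Context.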
Beating Complexity in State
Complexity, entanglement of concerns, is the not-so-silent killer of software development projects. The biggest cause of accidental, or incidental, complexity in software is Us, and we can work to remove it by Organising, Reducing and Encapsulating, and looking for Events First.
The second biggest cause of accidental complexity in software is using the wrong data model for the wrong job.
Domain Driven Design (DDD) gives us the simple Repository pattern for the storage and retrieval of data but unfortunately this pattern tends to entangle the concerns of Write and Read.
This is not a problem if the models for Write and Read are the same, in fact it has some real advantages as you can ensure complete consistency over a data set in a Repository.
But when the needs of write and read are different, and they often are, we tend to need something more.
Beyond the model used, it turns out that `writing` and `reading` have further different characteristics, for example in their consistency, performance, and scaling needs.
When the need to manipulate data and the need to read it are entangled, we end up with `systemic complexity` that in turn results in a system component that resists comprehension and change.
We need to disentangle these concerns if we can in order to increase the simplicity of the system so that we can develop, and operate, the software more easily.
Separating Concerns: Command Query Responsibility Segregation
The first step towards simplicity is to separate the two models. This is at the heart of the Command Query Responsibility Segregation (CQRS) pattern.
The key feature of this approach is to unpick the accidental complexity of entanglement between writing and reading into two system components: the Write Model and the Read Model.
Commands and The Write Model
The Write model captures whatever is necessary to report that some important state in the system has been modified. That is why this model is often referred to as the modification model.
Commands are received by the write model and processed to produce events that report what modification has occurred.
Commands represent transactionally consistent manipulations of the Write Model, and should succeed or fail entirely. The Write Model will maintain whatever state is necessary, in whatever optimised form, to produce the events that it is responsible for.
The job of an Aggregate is to take a command and execute it to produce one or more valid events.
The Events represent full, complete, self-describing and immutable Facts about the system. In turn a Command ideally contains all that is necessary to produce the expected, valid Events.
In this respect it’s important to remember that the Events are not merely side-effects of processing the commands, they are the actual data result of the commands. Commands and Aggregates only exist to produce an important set of Events, facts, about our domain.
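A minimal sketch of that idea, using a hypothetical BankAccount aggregate (the names and events are assumed for illustration): the command either fails entirely or produces events, and the events are the actual result:

```python
class BankAccount:
    """A hypothetical aggregate: its state exists only to validate commands
    and produce valid events."""

    def __init__(self, past_events=()):
        self.balance = 0
        for event in past_events:   # rebuild internal state from prior Facts
            self._apply(event)

    def _apply(self, event):
        kind, amount = event
        self.balance += amount if kind == "Deposited" else -amount

    def withdraw(self, amount):
        """Command handler: succeeds or fails entirely (transactionally consistent)."""
        if amount > self.balance:
            raise ValueError("insufficient funds: command rejected, no events produced")
        event = ("Withdrawn", amount)
        self._apply(event)
        return [event]              # the events are the data result, not a side effect

account = BankAccount(past_events=[("Deposited", 100)])
new_events = account.withdraw(30)   # -> [("Withdrawn", 30)]
```

Note that a rejected command produces no events at all; there is no half-applied state to clean up.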
What do you do if the Command does not contain all the necessary details to produce these Events? Sometimes the processing of a command requires an aggregate to have some awareness of data, or more accurately Events, from other parts of the system. Sometimes this data can be sourced from views local to the Bounded Context.
Of course this can represent a race condition where an aggregate is trying to execute a command and the requisite information is not yet available in the supporting view or local cache.
This is a normal state in an eventually consistent system, where isolation and partitioning is valued at the sacrifice of system-level consistency (such as in adaptable, antifragile, microservice-based systems). This can also occur if an erroneous, or malicious, command is being processed by the Aggregate.
To deal with this, the sender of the event that triggers the command on the aggregate may have to handle the command being reported as failed. It is then the responsibility of the sender to decide, using its larger business-context knowledge, whether to back off and retry the command (one or more times with increasing delays is a common strategy) or to fail the larger business operation.
In some respects this strategy is similar to using a circuit-breaker where the circuit is the interaction between the initiator and the aggregate, and eventually the circuit is broken for the business operation trying to be conducted.
In the case where the sender is a Saga, additional processing may be necessary in order to roll back through compensating commands.
Queries and the Read Model
The Write Model provides a great starting point, but it raises the question: "How do I then query my data?".
In CQRS this responsibility falls to one (or more) Read Models.
A Read Model is responsible for listening to a number of different events and maintaining an optimised version of the state that can, in a performant fashion, meet the needs of responding to one or more types of Query.
Events as the Communication Between Models
So how do Read Models keep up-to-date? By subscribing to one or more event streams that are being emitted by the Write Models in the system.
What events the Read Model is interested in will be dictated by the Query or Queries that the Read Model will have to provide answers to.
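A sketch of such a Read Model (the view and event names are hypothetical): it handles only the events its queries need and simply ignores the rest of the stream:

```python
class PayrollSummaryView:
    """A hypothetical Read Model: an optimised projection serving specific queries."""

    def __init__(self):
        self.total_paid = 0
        self.run_count = 0

    def when(self, event):
        kind, payload = event
        if kind == "PayrollRunCompleted":   # only the events this view's queries need
            self.run_count += 1
            self.total_paid += payload
        # every other event in the stream is ignored by this view

    def average_per_run(self):
        """The Query this view exists to answer, precomputed for fast reads."""
        return self.total_paid / self.run_count if self.run_count else 0

view = PayrollSummaryView()
for event in [("PayrollRunCompleted", 100), ("EmployeeHired", 1),
              ("PayrollRunCompleted", 300)]:
    view.when(event)   # in practice, delivered via a subscription to the event stream
```

Because each view is just a projection of the events, you can add a new Read Model for a new Query without touching the Write Model at all.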
Events are the Key
The use of CQRS relies on designing the right events. Events are the most important part of the Ubiquitous Language of the domain, and so should make those domain concepts as explicit as possible.
Reducing the Fragility of State using Event Sourcing
Event Sourcing is where you record the events that come out of an aggregate.
An event store simply reliably stores your events in a guaranteed sequence order that can then be queried and replayed.
Store your events in a robust and resilient place so that aggregates and views can be re-created from the events at any point in time.
This means aggregates and views are allowed to be fragile: they can be thrown away and rebuilt from the stored events whenever needed.
Snapshotting represents a compromise: replay is shortened and old versions can be retired, but at the cost of a deliberate amount of coupling to data migration.
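A minimal sketch of the idea, using an in-memory stand-in for a real event store: the log is append-only and ordered, so fragile state can always be rebuilt, and a snapshot simply shortens the replay:

```python
class EventStore:
    """In-memory stand-in: appends preserve a guaranteed order; replay from any point."""

    def __init__(self):
        self._log = []

    def append(self, event):
        self._log.append(event)

    def replay(self, since=0):
        return list(self._log[since:])

def rebuild_balance(events, snapshot=0):
    """Views/aggregates are disposable: recreate them from events, optionally
    starting from a snapshot so only later events need replaying."""
    balance = snapshot
    for kind, amount in events:
        balance += amount if kind == "Deposited" else -amount
    return balance

store = EventStore()
store.append(("Deposited", 100))
store.append(("Withdrawn", 30))

full = rebuild_balance(store.replay())                           # full replay
shortcut = rebuild_balance(store.replay(since=1), snapshot=100)  # snapshot + tail
```

Both paths arrive at the same state; the snapshot only trades replay time for a coupling to the snapshot's data format.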
Where to go next?
Courses that work through this approach, with a specific eye to building microservices, are available.
I shall be exploring some of these concepts as I finish the Antifragile Software book, and I shall be serialising some of the implementation examples from that book here on this blog over time.
Further reading on some of the topics discussed here is available.
"What are you doing at Atomist?", I get this question a LOT.
Today I'm really excited to see the first post from Rod on this subject and there's lots more to share in the coming days, weeks and months.
As part of the team at Atomist I've been busy working with Rod and everyone on our new, and soon to be open-sourced, tools for writing and evolving your own software codebases.
After the great response to Jess's awesome talk at Elm Conf last week, I'll be talking about and demonstrating some of the current Atomist toolset as part of my talks at JAX London in October, and µCon 2016 in November.
Hope to see you at either of those conferences to do some coding with Atomist!
Naming is critical to the human comprehension of your system, and this obviously extends to service naming. Human comprehension is king: the biggest limiting factor on your software system accepting and thriving on change is you and your comprehension of it, so naming is a critical concern.
When naming microservices I recommend you focus on naming a service in terms of what it does, not what it is. “PayrollSummaryAggregator” or “PayrollTableView” are named in this preferred way.
There are a few things to avoid, in no particular order.
If you find yourself developing a service and wanting to use “and” in its title, then it’s a good indication that the service is doing too much. This is a good and natural thing; we often build services that start out seeming to do one thing, then realise they do too much, which leads to reduction and new-service extraction.
What about Type?
Calling a service a “Service” does not make it a single-concern service that aids comprehension.
Should you add “Service” to a service’s name? The rule here is: only if you have other types of artefact in your system.
If you have stable and reusable libraries then having some things called “Library” and others called “Service” probably aids understanding.
But use a type with care; it can easily lead to too high-level a name and hide multiple concerns and complexity in the service.
A Word on Codenames
Codenaming services is fun and often used when we’re not sure what a service will do … yet. However, these codenames should be dropped at the earliest opportunity.
Avoiding Acronyms, IYC (If You Can)
The danger of naming microservices by what they do is that it can lead to long service names. That in itself is not a problem, but it can lead us lazy humans to strip them down to acronyms in conversation (both digital and verbal), which then reduces the comprehension of others.
Try to avoid acronymising service names, even if they are long, unless you are in a context where the acronym has been clarified.
by Russ Miles, Wednesday 10th February, 2016
I'm pleased to announce that my QCon London 2016 talk on my Microservices Mega-Architectures track has been announced.
This year I'm going to be speaking along with my friend Sylvain Hellegouarch on the topic of TDD and Microservices. Specifically, in a world where you want to be evolving and improving services all the time, how do you do that with confidence? And what do you need to care about?
We have a lot to share in that space, so expect a deep dive into the practicalities of creating test-driven microservices and system resilience.
I look forward to seeing you at the conference!
by Russ Miles, Wednesday 10th February 2016
I am really excited to finally announce that I'm part of the mission, product and company called Atomist.
It's exciting times indeed! I cannot wait to share what we're doing. Watch this space!
by Russ Miles & Sylvain Hellegouarch (Simplicity Itself), Wednesday 10th February, 2016
In the previous post I talked about why software, typically, is no friend of change. While you can certainly build software and deploy it in a monolithic fashion to get some of the options to embrace change, the Microservices architectural style is focussed on designing to respond to the very nature of any live system: change! And to do this quickly!
Because microservices are actively designed for change, engineering teams can welcome business variations in a more confident manner, increasing comprehension and trust between all the stakeholders.
In fact, I emphasise exactly this in my most recent talk on the "Technical Journey to Microservices". Microservices are most often adopted for adaptability (embracing change), speed and antifragility.
Rather than the anger and guilt that an agile team feels when "one small change" is proposed, change is embraced as a normal aspect of software research and development.
by Russ Miles, Wednesday 10th February, 2016
It's a dimly-lit setting. Stress fills the air. I walk into the comfortable anteroom of a church for my first meeting with my soon-to-be-fellow addicts...
As I sit down with a luke-warm cup of coffee, the opening rite begins:
"My name is CORBA and I was a great idea, with poor execution..."
"My name is the Monolith, and I seemed a great idea for domain discovery..."
"My name is Russ Miles and I used to promote the Monolith-first approach..."
Welcome to "Architectures Anonymous"
I have learned so much in the last year, working on real projects as a consultant all over the planet with the lovely folks at my company, Simplicity Itself. Now I'm a reformed Consultant, working on something new. More on that soon.
This post is inspired by this great article from a while back by Stefan Tilkov. Stefan puts it to us to:
"think about the subsystems you build, and build them as independently of each other as possible"
In other words, think about small, independent services (Microservices) early. Going for microservices first, rather than building a monolith.
So why have I in the past been much more hesitant to go in this direction? Why do I need to confess that I too hedged my bets and thought that the monolith-first approach was the better way? I'd be the first to admit that many of these rules are entirely context-dependent (and that includes the people and skills, not just the domain!).
My take, and those of some great people like Sam Newman and Martin Fowler, was to recommend building a better monolith first (I use my Life Preserver process and tool to do this visually), so that you are ready for the options of microservices but you can still work speedily on the monolith as you are in an intense period of discovery at the beginning of a product's life.
And there lies the reason for my change of heart. It was an argument based on Speed.
The implication, and the evidence I'd seen until recently, was that while you were in the discovery phase of a product, when you were researching and developing the core domain itself, a team hacking away on a monolith would be faster. It seemed to make sense, and I even conducted an experiment that, in a limited way, proved it.
More experiments have come and gone though, and I can say now that it is not easier to discover and develop software as a monolith; at least, it shouldn't be easier...
If it were easier and simpler to create smaller, independently evolving microservices, then we could be at least as fast at discovering the domain as with a monolithic approach. In fact, with the isolation it becomes possible for many people to work on the same discovery process and not step on each other's toes, so in all likelihood we could be faster!
The problem is that it is not currently simple and easy to develop with microservices. It's painful.
But that is changing.
I've seen that, as the inertia of developing microservices-based systems drops through the emerging best-practice design patterns and tools that I talk about, the approach becomes better and better suited to the green-field, microservices-first case.
With better tools and understanding, the microservices-based approach will not be slower at domain discovery, it will likely be faster.
I think the industry is slightly in mourning for the monolith. There's a real chance we'll look back at it in 10 years and say "wasn't it simpler in the old days...". Of course it wasn't, but time does have a way of increasing the tint on rose-tinted goggles...
by Russ Miles, Wednesday 10th February, 2016
I'm rapidly pulling together the code for Book 2 of "Antifragile Software: Building Adaptable Software with Microservices".
As this extends into a number of repositories, both staging sandboxes for experiments as well as more static examples lifted directly from the book, I thought it was time to collect the lot together into an organisation on GitHub that readers can subscribe to and follow.
The Antifragile Software GitHub organisation is born. If you're reading the book and want an early glimpse of the supporting code as I bring it together, this is one place to start. Also, if you haven't already, follow me on twitter as I'll be announcing small updates to the book there (rather than drowning the readers in day-to-day email announcements!).
by Russ Miles & Sylvain Hellegouarch (Simplicity Itself), 10th February, 2016
I was just introduced to this wonderful talk by Richard Cook on systems resilience by my good friend, Sylvain Hellegouarch.
Richard's main point seems to be that the fundamental battle any software delivery and operation team faces is shaped by three, often implicit, boundaries.
In the early 2000s, it was acknowledged that the initial view was only a partial understanding of the system’s input and so agile methodologies were gradually defined. The aim here was to iterate as often as you can so as to correct your path early.
Although a step in the right direction, this did not pan out as well as one could have hoped. Indeed, changes could be introduced during the life of the project, but they could still take a long time to be delivered into production. This was the "Elephant in the Standup" effect, where your software becomes grossly in conflict with reality over time.
It's time for this to change. Right now the process can absorb incoming changes but the software, something you might hope would be wonderfully malleable, resists these changes.
If we don't architect and design to thrive on change then we are destined to build systems that will be perceived to be holding the business back.
We need to fix the software design and architecture, not just the process. This is at the heart of Antifragile Software as enabled by microservices.