Recently I've moved from teaching simply about the Life Preserver itself to merging in more recent ideas from Domain-Driven Design, especially the great work by Alberto Brandolini on Event Storming. This article is a brief write-up of my take on why this approach is so powerful and how I tend to teach and promote it on the Microservices and DDD courses.
The "Big Misinterpretation" of DDD
Domain Driven Design (DDD) is a wonderful technique that attempts to bring our designs closer to the domains we are working on.
The theory being that the closer we are to the domain in our implementations, the better our software will be. With this goal in mind, DDD has been extremely successful throughout our industry.
But, as architects and designers, we have an embarrassing love affair with making early decisions on the structure of our software.
This emphasis on structure was NOT the intention of DDD.
The adoption and interpretation of DDD has unfortunately exacerbated this through the misinterpretation that the place for the Ubiquitous Language is in the things, specifically in the Domain Objects.
Domain Objects became a massive problem as they entangled multiple concerns across a software design in their goal of being ubiquitous. They became Entities, Message Payloads, Form Backing Objects … they were promiscuous across the system and a source of entanglement and system fragility.
The situation got even worse as people packaged Domain Objects into libraries for reuse, effectively increasing their footprint and amplifying their ossifying effect: the Domain Objects became the brittle part of your system that could not evolve because they were now used everywhere.
Shared domain object libraries could not be evolved without everything evolving in lock-step.
Frameworks emerged that had a Domain Object and Entity focus. The goal was to make things easier, but not necessarily to make things simpler or improve your design. Examples of criminals in this space include Rails, Grails, Spring Roo, Entity Framework, N/Hibernate, etc.
You could detect the problem by encountering the unintended, and unwanted, 40-file commit Ripple Effect.
An Entity-, Database-, and Domain-Class-First approach became very popular and caused these problems as it became the default architectural style of the Enterprise Layered Architecture.
The whole thing exploded into systems that could not change as the Domain Classes, the Things, went from being Ubiquitous in one area, to being Canonical across all areas.
Then there was an epiphany!
It is not the things that matter in early stages of design…
…it is the things that happen.
We have a name for the things that happen in software design, they are referred to as Events.
Consider "Events-First"
The epiphany in DDD was that it is the things that happen that should be the first consideration when designing software.
This newer approach was called Events-First.
Events turn out to better capture the ubiquitous language of a domain or system. When collaborating with non-technical stakeholders, more often than not the easiest way to describe the system is in terms of the things that happen, not the things that do the work.
The things that happen are the Events.
It turns out that this approach works well whether you're evolving an existing system or working on a new one.
The technique of Event Storming is the first design step on this journey.
Event Storming
Event Storming is a collaborative activity where you bring together domain experts and technical architects and designers to discover the ubiquitous language of a system or context.
The intention is to try to capture a system in terms of the things that happen, the Events.
Using post-its, you then arrange these events in a rough order of how they might happen, without at first considering how they happen or what technologies or supporting structure might be present in their creation and promulgation.
One technique is to think of yourself as a Detective arriving at a crime-scene and simply ask yourself of the system you are working on with your team, “What are the Facts?”
Limit yourself to events that describe what you can know, in the system you are working on and can influence.
What Makes a Good Event?
When designing your events, attempt to make concepts explicit. If several low-level events manifest a higher-level fact, make that fact an event. Equally, if a high-level fact could be seen as being made up of smaller-scale events, then make sure you have those events too.
Having a LOT of events is rarely a problem.
Make your events completely self-contained and self-describing.
Make your events technology and implementation agnostic.
Once you have the happy-path set of events, explore the events that can happen when things go wrong in your context.
This approach helps you ask the question "What events do we need to know about?", which is a powerful technique for exploring boundary conditions and assumptions that might affect realistic estimates of how complex the software will be to build.
Events are immutable, after all they are Facts.
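To make these qualities concrete, here's a minimal sketch, assuming Python, of what such an event might look like; the ordering domain and all field names are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4

@dataclass(frozen=True)  # frozen: an Event is an immutable Fact
class OrderPlaced:
    # Self-contained: the payload carries everything a consumer needs,
    # with no references back to live objects and no types tied to a
    # particular persistence technology.
    order_id: str
    customer_id: str
    line_items: tuple    # a tuple rather than a list, so the Fact stays immutable
    total_pence: int
    # Identity and provenance make the event self-describing.
    event_id: UUID = field(default_factory=uuid4)
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

fact = OrderPlaced("order-42", "customer-7", (("SKU-1", 2),), 1998)
```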
Lay Out your Events to Explore Causality
Consider then creating a Causality Graph of your Events to explore what order Events genuinely need to occur in, versus what order you have simply assumed they need to go in.
The more events that are not causally linked, the more options you have about how those events are generated and handled in the emerging structure that you’ll eventually design.
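As a rough illustration, assuming the same hypothetical ordering domain, a causality graph can be as simple as a mapping from each event to the events it must precede; you can then check which pairs of events carry no ordering constraint at all:

```python
# An edge A -> B records the assumption that A must happen before B.
# All event names here are hypothetical.
causes = {
    "OrderPlaced":    ["PaymentTaken", "StockReserved"],
    "PaymentTaken":   ["OrderConfirmed"],
    "StockReserved":  ["OrderConfirmed"],
    "OrderConfirmed": ["OrderShipped"],
}

def causally_linked(a, b, graph):
    """True if a can reach b, or b can reach a, through causal edges."""
    def reaches(src, dst):
        seen, stack = set(), [src]
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, []))
        return False
    return reaches(a, b) or reaches(b, a)

# PaymentTaken and StockReserved are not causally linked, so we are free
# to generate and handle them concurrently, in either order.
print(causally_linked("PaymentTaken", "StockReserved", causes))  # False
```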
Bounding with a “Bounded Context”
It’s now time to introduce a boundary around your ubiquitous language, as expressed by the events.
The “Life Preserver” diagram is simply a useful tool made up of a couple of concentric circles that are a visual representation of your Bounded Context and Anti-Corruption Layer.
The Bounded Context is related to teams and team structure.
It is common for one team to look after and develop one or more Bounded Contexts, but ideally two teams with different goals should not work on the same Bounded Context, as friction over the speed of change and a dilution of responsibility usually occur in that case.
If a team inherits a “Heritage” (Legacy) software system, it should remain in its own Bounded Context as it will exhibit its own ubiquitous language of the time and place and team it was originally developed in. Do not be tempted to merge the Heritage context with your current context as you will dilute the ubiquitous language and advantageous human comprehension in both systems.
Bounded Contexts rarely make sense as nested concepts. Teams focus on and work towards distinct goals and the resulting distinct areas in systems, and so Bounded Contexts follow this model.
Finally, To Structure…
First, discover the stateless services that exist within your system. These are the services that maintain no state whatsoever and both consume and emit Events.
Second, consider using Repository services to capture state where the model for modifying the state as well as reading it is the same.
Be wary of leaky abstractions around data persistence technologies and make your repositories agnostic of whatever technology you choose.
Finally look to implement microservices using CQRS if you need to vary the performance characteristics, and models, for modification and enquiry of state…
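As a sketch of that second step, assuming Python and the same hypothetical ordering domain, the point is that a Repository's interface speaks only in domain terms, so the persistence technology stays swappable behind it:

```python
from abc import ABC, abstractmethod
from typing import Optional

class OrderRepository(ABC):
    """A technology-agnostic repository: callers see domain terms only,
    never sessions, connection strings, or query languages."""

    @abstractmethod
    def save(self, order) -> None: ...

    @abstractmethod
    def find_by_id(self, order_id: str) -> Optional[object]: ...

class InMemoryOrderRepository(OrderRepository):
    """One interchangeable implementation; a document- or relational-store
    version could replace it without touching any caller."""

    def __init__(self):
        self._orders = {}

    def save(self, order) -> None:
        self._orders[order.order_id] = order

    def find_by_id(self, order_id: str):
        return self._orders.get(order_id)
```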
Beating Complexity in State
Complexity, entanglement of concerns, is the not-so-silent killer of software development projects. The biggest cause of accidental, or incidental, complexity in software is Us, and we can work to remove it by Organising, Reducing and Encapsulating, and looking for Events First.
The second biggest cause of accidental complexity in software is using the wrong data model for the wrong job.
Domain Driven Design (DDD) gives us the simple Repository pattern for the storage and retrieval of data but unfortunately this pattern tends to entangle the concerns of Write and Read.
This is not a problem if the models for Write and Read are the same, in fact it has some real advantages as you can ensure complete consistency over a data set in a Repository.
But when the needs of write and read are different, and they often are, we tend to need something more.
Beyond the model used, it turns out that writing and reading have further different characteristics when it comes to:
- Fault Tolerance
- Performance and Scalability
- Consistency
When both the need to manipulate and read data are entangled, we end up with systemic complexity that in turn results in a system component that resists comprehension and change.
We need to disentangle these concerns if we can in order to increase the simplicity of the system so that we can develop, and operate, the software more easily.
Separating Concerns: Command Query Responsibility Segregation
The first step towards simplicity is to separate the two models. This is at the heart of the Command Query Responsibility Segregation (CQRS) pattern.
The key feature of this approach is to unpick the accidental complexity of the entanglement between write and read concerns into two system components: the Write Model and the Read Model.
Commands and The Write Model
The Write model captures whatever is necessary to report that some important state in the system has been modified. That is why this model is often referred to as the modification model.
Commands are received by the write model and processed to produce events that report what modification has occurred.
Commands represent transactionally consistent manipulations of the Write Model, and should succeed or fail entirely. The Write Model will maintain whatever state is necessary, in whatever optimised form, to produce the events that it is responsible for.
The job of an Aggregate is to take a command and execute it to produce one or more valid events.
The Events represent full, complete, self-describing and immutable Facts about the system. In turn a Command ideally contains all that is necessary to produce the expected, valid Events.
In this respect it’s important to remember that the Events are not merely side-effects of processing the commands, they are the actual data result of the commands. Commands and Aggregates only exist to produce an important set of Events, facts, about our domain.
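A minimal sketch of that relationship, assuming Python and hypothetical `PlaceOrder`/`OrderPlaced` types: the aggregate accepts a command, validates it against the little state it keeps, and returns the resulting Events:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PlaceOrder:      # the Command: ideally self-contained
    order_id: str
    customer_id: str
    total_pence: int

@dataclass(frozen=True)
class OrderPlaced:     # the Event: the actual data result of the command
    order_id: str
    customer_id: str
    total_pence: int

class OrderAggregate:
    """Keeps only the state it needs to judge whether a command is valid."""

    def __init__(self):
        self._placed = set()

    def handle(self, cmd: PlaceOrder) -> list:
        # The command succeeds or fails as a whole.
        if cmd.order_id in self._placed:
            raise ValueError(f"order {cmd.order_id} was already placed")
        self._placed.add(cmd.order_id)
        # The Events are not side-effects of the command; they ARE its result.
        return [OrderPlaced(cmd.order_id, cmd.customer_id, cmd.total_pence)]
```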
What do you do if the Command does not contain all the necessary details to produce these Events? Sometimes the processing of a command requires an aggregate to have some awareness of data, or more accurately Events, from other parts of the system. Sometimes this data can be sourced from views local to the Bounded Context.
Of course this can represent a race condition where an aggregate is trying to execute a command and the requisite information is not yet available in the supporting view or local cache.
This is a normal state in an eventually consistent system, where isolation and partitioning is valued at the sacrifice of system-level consistency (such as in adaptable, antifragile, microservice-based systems). This can also occur if an erroneous, or malicious, command is being processed by the Aggregate.
To deal with this, the sender of the event that triggers the command on the aggregate may have to deal with the fact that the command will be reported as failed. It is then the responsibility of the sender to decide, using its larger business-context knowledge, whether to back off and retry the command (one or more times with increasing delays is a common strategy) or fail the larger business operation.
In some respects this strategy is similar to using a circuit-breaker where the circuit is the interaction between the initiator and the aggregate, and eventually the circuit is broken for the business operation trying to be conducted.
In the case where the sender is a Saga, additional processing may be necessary in order to roll back through compensating commands.
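A sketch of what that back-off-and-retry responsibility might look like on the sender's side; the helper name and the use of an exception to signal a failed command are assumptions, not a prescribed API:

```python
import time

def send_with_backoff(aggregate, command, attempts=3, base_delay=0.1):
    """Retry a failed command with increasing delays, then give up and let
    the caller (for example a Saga) fail or compensate the larger business
    operation, much as a tripped circuit-breaker would."""
    for attempt in range(attempts):
        try:
            return aggregate.handle(command)
        except ValueError:           # assumed failure signal from the aggregate
            if attempt == attempts - 1:
                raise                # the "circuit" is broken for this operation
            time.sleep(base_delay * 2 ** attempt)
```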
Queries and the Read Model
The Write Model provides a great starting point, but it does then beg the question: "How do I query my data?"
In CQRS this responsibility falls to one (or more) Read Models.
A Read Model is responsible for listening to a number of different events and maintaining an optimised version of the state that can, in a performant fashion, meet the needs of responding to one or more types of Query.
Events as the Communication Between Models
So how do Read Models keep up-to-date? By subscribing to one or more event streams that are being emitted by the Write Models in the system.
What events the Read Model is interested in will be dictated by the Query or Queries that the Read Model will have to provide answers to.
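Pulling the last two sections together, here is a minimal sketch of a Read Model, again assuming Python and the hypothetical `OrderPlaced` event: it applies each event it subscribes to and maintains exactly the shape of state its Query needs:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderPlaced:       # the hypothetical event emitted by the Write Model
    order_id: str
    customer_id: str
    total_pence: int

class CustomerSpendReadModel:
    """Optimised for a single query: how much has each customer spent?"""

    def __init__(self):
        self._spend = defaultdict(int)

    def apply(self, event) -> None:
        # Called for every event on the subscribed stream(s); anything the
        # query does not need is simply ignored.
        if isinstance(event, OrderPlaced):
            self._spend[event.customer_id] += event.total_pence

    def total_spend(self, customer_id: str) -> int:
        # Answering the Query is a cheap lookup: no joins, no recomputation.
        return self._spend[customer_id]

model = CustomerSpendReadModel()
model.apply(OrderPlaced("order-42", "customer-7", 1998))
print(model.total_spend("customer-7"))  # 1998
```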
Events are the Key
The use of CQRS relies on designing the right events. Events are the most important part of the Ubiquitous Language of the domain, and so should make those domain concepts as explicit as possible.
Reducing the Fragility of State using Event Sourcing
Event Sourcing is where you record the events that come out of an aggregate.
An event store simply stores your events reliably, in a guaranteed sequence order that can then be queried and replayed.
Store your events in a robust and resilient place so that aggregates and views can be re-created from the events at any point in time.
This means aggregates and views can afford to be fragile: they are disposable, since they can always be rebuilt from the events.
Snapshotting represents a compromise: replay is shortened and old versions can be retired, but at the cost of a deliberate amount of coupling to data migration.
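As an illustration only, since a real event store would be durable and replicated, the contract can be sketched as an ordered, append-only log plus replay, with snapshotting layered on top; the `restore`/`apply` view protocol is an assumption:

```python
class InMemoryEventStore:
    """An in-memory stand-in for an event store; the contract is an
    append-only log with a guaranteed sequence order."""

    def __init__(self):
        self._log = []

    def append(self, event) -> None:
        self._log.append(event)   # list position is the sequence order

    def replay(self, from_seq: int = 0):
        # Aggregates and views can be rebuilt by replaying from any point,
        # which is what lets those components be fragile and disposable.
        return iter(self._log[from_seq:])

def rebuild(view, store, snapshot=None, snapshot_seq=0):
    """Snapshotting as a compromise: restore a saved snapshot, then replay
    only the events recorded after it. Replay is shorter, at the price of
    coupling to the snapshot's data format when it needs migrating."""
    if snapshot is not None:
        view.restore(snapshot)    # assumed view protocol: restore() + apply()
    for event in store.replay(snapshot_seq):
        view.apply(event)
    return view
```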
Where to go next?
Courses that work through this approach, with a specific eye to building microservices, are available.
I shall be exploring some of these concepts as I finish the Antifragile Software book, and I shall be serialising some of the implementation examples from that book here on this blog over time.
Further reading on some of the topics discussed here is available from:
- “The Morning Paper” by Adrian Colyer: specifically the summary on “Data on the Outside versus Data on the Inside” in respect of Event design for internal and external Events: https://blog.acolyer.org/2016/09/13/data-on-the-outside-versus-data-on-the-inside/
- “The Mythical Man Month”, Fred Brooks