Whenever we’re teaching and talking about these techniques, one question that comes up over and over is "where should I do validation? Does that belong with my business logic, in the domain model, or is that an infrastructural concern?"
As with any architectural question, the answer is: it depends!
The most important consideration is that we want to keep our code well-separated so that each part of the system is simple. We don’t want to clutter our code with irrelevant detail.
When people use the word "validation" they usually mean a process where they test the inputs of an operation to make sure that they match some criteria. Inputs that match the criteria are considered "valid", and inputs that don’t are "invalid".
If the input is invalid, the operation can’t continue and should exit with some kind of error.
In other words, validation is about creating pre-conditions. We find it useful to separate our pre-conditions into three sub-types: syntax, semantics, and pragmatics.
In linguistics, the syntax of a language is the set of rules that govern the structure of grammatical sentences. For example, in English, the sentence "Allocate three units of TASTELESS-LAMP to order twenty-seven" is grammatically sound while the phrase "hat hat hat hat hat hat wibble" is not.
How does this map to our application? Here are some examples of syntactic rules:
-
An Allocate command must have an orderid, a sku, and a quantity.
-
A quantity is an integer.
-
A sku is a string with a maximum length of 64 characters.
-
An orderid is a string with a maximum length of 16 characters.
These are rules about the shape and structure of incoming data. An Allocate command without a sku or an orderid isn’t a valid message. It’s the equivalent of the phrase "Allocate three to".
We tend to validate these rules at the edge of the system. Our rule of thumb is that a message handler should only ever receive a message that is well-formed and contains all required information.
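To make the four rules above concrete, here is a minimal sketch that checks them using only the standard library. The names `MalformedMessage` and `parse_allocate` are hypothetical and exist only for this illustration. Note that this sketch is strict about types and does not coerce a string like "3" into an integer.

```python
import json


# Hypothetical error type for this sketch; not part of the application code.
class MalformedMessage(Exception):
    pass


# Maximum lengths taken from the syntactic rules listed above.
MAX_LENGTHS = {"orderid": 16, "sku": 64}


def parse_allocate(raw: str) -> dict:
    """Check the shape of an Allocate payload and return it as a dict."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise MalformedMessage(f"not valid JSON: {e}")
    if not isinstance(data, dict):
        raise MalformedMessage("payload must be a JSON object")

    # Rule: an Allocate command must have an orderid, a sku, and a quantity.
    missing = [f for f in ("orderid", "sku", "qty") if f not in data]
    if missing:
        raise MalformedMessage(f"missing fields: {', '.join(missing)}")

    # Rule: orderid and sku are strings with a maximum length.
    for field, limit in MAX_LENGTHS.items():
        value = data[field]
        if not isinstance(value, str) or len(value) > limit:
            raise MalformedMessage(
                f"{field} must be a string of at most {limit} characters"
            )

    # Rule: a quantity is an integer (bool is excluded, since it subclasses int).
    if not isinstance(data["qty"], int) or isinstance(data["qty"], bool):
        raise MalformedMessage("qty must be an integer")

    return data
```

A handler that receives the result of `parse_allocate` can rely on all three fields being present and correctly typed, so it needs no defensive checks of its own.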
One option is to put your validation logic on the message type itself:
import json
from dataclasses import dataclass

from schema import And, Schema, Use


@dataclass
class Allocate(Command):
    # A single leading underscore avoids name mangling. The attribute has no
    # type annotation, so @dataclass does not treat it as a field.
    _schema = Schema({  #(1)
        'orderid': str,
        'sku': str,
        'qty': And(Use(int), lambda n: n > 0),
    })

    orderid: str
    sku: str
    qty: int

    @classmethod
    def from_json(cls, data):  #(2)
        data = json.loads(data)
        return cls(**cls._schema.validate(data))
-
The schema library (https://pypi.org/project/schema/) lets us describe the structure and validation of our messages in a nice declarative way.
-
The from_json method reads a string as JSON and turns it into our message type.
This can get repetitive, though, since we need to specify our fields twice. We might want to introduce a helper library that unifies the validation and declaration of our message types:
import json
from dataclasses import make_dataclass

from schema import And, Schema, Use


def command(name, **fields):  #(1)
    # Parse the JSON string first, then validate the resulting dict.
    schema = Schema(And(Use(json.loads), fields))
    cls = make_dataclass(name, fields.keys())  #(2)
    cls.from_json = lambda s: cls(**schema.validate(s))  #(3)
    return cls


def greater_than_zero(x):
    return x > 0


quantity = And(Use(int), greater_than_zero)  #(4)

Allocate = command(  #(5)
    'Allocate',
    orderid=str,
    sku=str,
    qty=quantity,
)

AddStock = command(
    'AddStock',
    sku=str,
    qty=quantity,
)
-
The command function takes a message name, plus kwargs for the fields of the message payload, where the name of the kwarg is the name of the field, and the value is the parser.
-
We use the make_dataclass function from the dataclasses module to dynamically create our message type.
-
We patch the from_json method onto our dynamic dataclass.
-
We can create reusable parsers for quantity, sku, and so on to keep things DRY.
-
Declaring a message type becomes a one-liner.
Earlier, we said that we want to avoid cluttering our code with irrelevant detail. In particular, we don’t want to code defensively inside our domain model. Instead, we want to make sure that requests are known to be valid before our domain model or use-case handlers see them. This helps our code to stay clean and maintainable over the long term. We sometimes refer to this as "validating at the edge of the system".
Back in Chapter 6 we said that the message bus was a great place to put cross-cutting concerns, and validation is a perfect example of that. Here’s how we might change our bus to perform validation for us.
class MessageBus:

    def handle_message(self, name: str, body: str):
        try:
            # Look up the message class by name (avoid shadowing the builtin `type`).
            message_type = next(
                mt for mt in EVENT_HANDLERS if mt.__name__ == name
            )
            message = message_type.from_json(body)
            self.handle([message])
        except StopIteration:
            raise KeyError(f"Unknown message name {name}")
        # ValidationError is assumed to be our application's own error type;
        # the schema library itself raises SchemaError.
        except ValidationError as e:
            print(
                f'invalid message of type {name}\n'
                f'{body}\n'
                f'{e}'
            )
            raise

Here’s how we might use that method from our Flask API endpoint:
@app.route("/change_quantity", methods=['POST'])
def change_batch_quantity():
    try:
        # get_data(as_text=True) returns the raw request body as a string.
        bus.handle_message('ChangeBatchQuantity', request.get_data(as_text=True))
    except ValidationError as e:
        return bad_request(e)
    except exceptions.InvalidSku as e:
        return jsonify({'message': str(e)}), 400
    return 'OK', 200


def bad_request(e: ValidationError):
    return e.code, 400

And here’s how we might plug it in to our asynchronous message processor:
def handle_change_batch_quantity(m, bus: messagebus.MessageBus):
    try:
        bus.handle_message('ChangeBatchQuantity', m)
    except ValidationError:
        print('Skipping invalid message')
    except exceptions.InvalidSku as e:
        print(f'Unable to change stock for missing sku {e}')

Notice that our entry points are solely concerned with how to get a message from the outside world and how to report success or failure. Our message bus takes care of validating our requests and routing them to the correct handler, and our handlers are exclusively focused on the logic of our use case.
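The snippets above depend on Flask, the schema library, and the rest of the application, so they can’t be run in isolation. As a rough illustration of the same flow, here is a self-contained sketch that uses only the standard library. Everything in it is an assumption made for the example: the `ValidationError` class, the `ref` and `qty` fields of `ChangeBatchQuantity`, and the handler registry passed to the bus.

```python
import json
from dataclasses import dataclass


class ValidationError(Exception):
    """Stand-in for whatever error the validation layer raises (assumption)."""


@dataclass
class ChangeBatchQuantity:
    # Field names are illustrative; the real message may differ.
    ref: str
    qty: int

    @classmethod
    def from_json(cls, body: str):
        try:
            data = json.loads(body)
            ref, qty = data["ref"], data["qty"]
        except (ValueError, KeyError, TypeError) as e:
            raise ValidationError(f"malformed ChangeBatchQuantity: {e!r}")
        if not isinstance(ref, str) or not isinstance(qty, int):
            raise ValidationError("ref must be a string and qty an integer")
        return cls(ref, qty)


class MessageBus:
    def __init__(self, handlers):
        # Maps message type -> handler function.
        self.handlers = handlers

    def handle_message(self, name: str, body: str):
        by_name = {mt.__name__: mt for mt in self.handlers}
        if name not in by_name:
            raise KeyError(f"Unknown message name {name}")
        # Validation happens here, before any handler sees the message.
        message = by_name[name].from_json(body)
        return self.handlers[by_name[name]](message)


# Usage: the handler only ever receives well-formed messages.
handled = []
bus = MessageBus({ChangeBatchQuantity: handled.append})
bus.handle_message("ChangeBatchQuantity", '{"ref": "batch-1", "qty": 5}')
```

Because parsing and validation happen in `handle_message`, an entry point only needs to catch `ValidationError` and decide how to report it, for example as a 400 response or a skipped message.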
While syntax is concerned with the structure of messages, semantics is the study of meaning in messages. The sentence "allocate no dogs from ellipsis hat" is syntactically valid, and has the same structure as the sentence "allocate one teapot to order five", but it’s meaningless.
{
    "orderid": "superman",
    "sku": "zygote",
    "qty": -1
}

We can read this JSON blob as an Allocate command but we can’t successfully execute it, because it’s nonsense.
We tend to validate semantic concerns at the message handler layer with a kind of contract-based programming.
"""
This module contains pre-conditions that we apply to our handlers.
"""


class MessageUnprocessable(Exception):  #(1)

    def __init__(self, message):
        self.message = message


class ProductNotFound(MessageUnprocessable):  #(2)
    """
    This exception is raised when we try to perform an action on a product
    that doesn't exist in our database.
    """

    def __init__(self, message):
        super().__init__(message)
        self.sku = message.sku


def product_exists(event, uow):  #(3)
    product = uow.products.get(event.sku)
    if product is None:
        raise ProductNotFound(event)
-
We use a common base class for errors that mean a message is invalid.
-
Using a specific error type for this problem makes it easier to report on and handle the error. For example, it’s easy to map ProductNotFound to a 404 in Flask.
-
product_exists is a precondition. If the condition is False, we raise an error.
This keeps the main flow of our logic in the service layer clean and declarative:
# services.py

from allocation import ensure


def allocate(event, uow):
    line = model.OrderLine(event.orderid, event.sku, event.qty)
    with uow:
        # Precondition: argument order matches product_exists(event, uow).
        ensure.product_exists(event, uow)
        product = uow.products.get(line.sku)
        product.allocate(line)
        uow.commit()

We can extend this technique to make sure that we apply messages idempotently. For example, we want to make sure that we don’t insert a batch of stock more than once.
If we get asked to create a batch that already exists, we’ll log a warning and continue to the next message.
class SkipMessage(Exception):
    """
    This exception is raised when a message can't be processed, but there's no
    incorrect behaviour. For example, we might receive the same message multiple
    times, or we might receive a message that is now out of date.
    """

    def __init__(self, reason):
        self.reason = reason


def batch_is_new(event, uow):
    batch = uow.batches.get(event.batchid)
    if batch is not None:
        raise SkipMessage(f"Batch with id {event.batchid} already exists")

Introducing a SkipMessage exception lets us handle these cases in a generic way in our message bus.
class MessageBus:

    def handle_message(self, message):
        try:
            ...
        except SkipMessage as e:
            # logging.warn is deprecated; logging.warning is the supported name.
            logging.warning(f"Skipping message {message.id} because {e.reason}")

There are a couple of pitfalls to be aware of here. Firstly, we need to be sure that we’re using the same unit of work that we use for the main logic of our use case. Otherwise we open ourselves to irritating concurrency bugs.
Secondly, we should try to avoid putting all our business logic into these pre-condition checks. As a rule of thumb, if a rule can be tested inside our domain model, then it should be tested in the domain model.
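To see the idempotency check working end to end, here is a small, self-contained sketch. It is an illustration under stated assumptions rather than the application’s real code: `FakeUnitOfWork`, the `CreateBatch` message, the `add_batch` handler, and the `handle` helper are all hypothetical names invented for this example. The precondition runs inside the same unit of work as the main logic, as recommended above.

```python
import logging
from dataclasses import dataclass


class SkipMessage(Exception):
    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason


@dataclass
class CreateBatch:
    # Hypothetical message type for this sketch.
    batchid: str
    sku: str
    qty: int


class FakeUnitOfWork:
    """In-memory stand-in for a real unit of work (hypothetical)."""

    def __init__(self):
        self.batches = {}  # batchid -> message that created the batch
        self.committed = False

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False  # never swallow exceptions

    def commit(self):
        self.committed = True


def batch_is_new(event, uow):
    # Precondition: raise SkipMessage if the batch was already created.
    if uow.batches.get(event.batchid) is not None:
        raise SkipMessage(f"Batch with id {event.batchid} already exists")


def add_batch(event, uow):
    with uow:
        batch_is_new(event, uow)  # same unit of work as the main logic
        uow.batches[event.batchid] = event
        uow.commit()


def handle(event, uow, handler):
    """Run a handler, treating SkipMessage as a logged no-op."""
    try:
        handler(event, uow)
        return True
    except SkipMessage as e:
        logging.warning("Skipping message because %s", e.reason)
        return False


uow = FakeUnitOfWork()
event = CreateBatch("batch-1", "TASTELESS-LAMP", 10)
first = handle(event, uow, add_batch)   # the batch is created
second = handle(event, uow, add_batch)  # the duplicate is skipped
```

Delivering the same message twice leaves exactly one batch in the fake unit of work, and the second delivery is logged as a warning instead of failing.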
Pragmatics is the study of how we understand language in context. After we have parsed a message and grasped its meaning, we still need to process it in context. For example, if you get a comment on a pull request saying "I think this is very brave", it may mean that the reviewer admires your courage, unless they’re British, in which case they’re trying to tell you that what you’re doing is insanely risky, and only a fool would attempt it. Context is everything.
Places we can do validation, and different types of validation:
-
event/command schemas
-
at service layer
-
in model (business rules)
-
at exit boundaries (?)
Topics to discuss:
-
Validate at the edges, don’t program defensively inside
-
Difference between syntax and semantics
-
Discuss patterns for validating messages
-
Talk about reasons for loosely validating messages in the consumer, tolerant reader, etc.