If you look up best practices for schemas, they will inevitably start talking about evolvable schemas. Don’t get me wrong: evolvable schemas are great, but they instantly pull the schema away from being something that looks like a contract and towards something that causes confusion.
If all fields can be modified then it’s not a contract!
A system I’ve been working on recently has enforced evolvable schemas for a number of years. It’s an iteratively built system, with some product pivots along the way. My team is building a new service within the system that, as a proof of concept, needs to handle the minimal valid data.
The obvious starting point is to look at the schemas to understand which fields are mandatory and which are optional. The problem is that, with the mantra that everything must be evolvable, every field defaults to null, even primary keys.
The upshot is that you can build a service that produces data that is valid according to the schema, yet no downstream service can understand your output. Consumers can deserialise the messages, but then have to discard them because of missing fields. Evolvable schemas cannot be used to enforce even basic data validity. And once you head down the path of optional fields only being valid in certain combinations, there is no chance the schema will help you.
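To make that concrete, here is a minimal sketch in plain Python. It is not Avro and not our real schema: `ORDER_SCHEMA`, the field names, and the consumer's requirements are all made up for illustration. It only shows how an empty message can pass a nullable-everything schema and still be useless downstream.

```python
# Hypothetical "evolvable" schema: every field is optional with a null default.
ORDER_SCHEMA = {
    "order_id": {"type": str, "default": None},
    "customer_id": {"type": str, "default": None},
    "amount_pence": {"type": int, "default": None},
}

# What a downstream consumer actually needs. Note that the schema does not say this.
REQUIRED_BY_CONSUMER = ("order_id", "customer_id")


def apply_defaults(record, schema):
    """Fill in any missing field with its schema default."""
    return {name: record.get(name, spec["default"]) for name, spec in schema.items()}


def conforms_to_schema(record, schema):
    """Schema-level check: known fields only, each value null or of the declared type."""
    for name, value in record.items():
        if name not in schema:
            return False
        if value is not None and not isinstance(value, schema[name]["type"]):
            return False
    return True


def consumer_can_use(record):
    """Business-level check: the fields the consumer relies on must be present."""
    return all(record.get(name) is not None for name in REQUIRED_BY_CONSUMER)


# A producer that sets nothing at all still emits a "valid" message.
empty_message = apply_defaults({}, ORDER_SCHEMA)
schema_ok = conforms_to_schema(empty_message, ORDER_SCHEMA)
usable = consumer_can_use(empty_message)
print(schema_ok, usable)  # True False
```

The gap between `conforms_to_schema` and `consumer_can_use` is exactly the part of the contract that the schema no longer carries.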
The answer is twofold:
- Meticulously document your schemas. This is another reason why I’m not a huge fan of Avro: the documentation flow is poor when the schema is written in JSON.
- Constantly run end-to-end (e2e) tests with authentic data. The contracts must be tested as part of the CI pipelines to ensure validity. The contract tests must also be kept up to date so that they represent the data that is actually required.
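
As a rough sketch of what such a contract test could look like, again in plain Python with made-up names: `REQUIRED_FIELDS`, the refund fields, and the sample messages are assumptions, and in a real pipeline the samples would be authentic messages captured from the producer rather than literals.

```python
# Hypothetical contract for an "order" message; field names are illustrative only.
REQUIRED_FIELDS = ("order_id", "customer_id")


def contract_violations(message):
    """Return a list of human-readable contract violations (empty if none)."""
    problems = []
    for name in REQUIRED_FIELDS:
        if message.get(name) is None:
            problems.append(f"missing required field: {name}")
    # A combination rule that a nullable-everything schema cannot express.
    if message.get("refund_reason") is not None and message.get("refunded_at") is None:
        problems.append("refund_reason set without refunded_at")
    return problems


def test_sample_messages_honour_contract():
    # Stand-ins for real producer output.
    samples = [
        {"order_id": "o-1", "customer_id": "c-9"},
        {
            "order_id": "o-2",
            "customer_id": "c-9",
            "refund_reason": "damaged",
            "refunded_at": "2024-01-05T10:00:00Z",
        },
    ]
    for message in samples:
        assert contract_violations(message) == [], contract_violations(message)


if __name__ == "__main__":
    test_sample_messages_honour_contract()
    print("contract holds for all samples")
```

Keeping the rules in one small function means the test doubles as the documentation of what is really required, which is the bit the schema can't tell you.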