Sergey Shishkin

software craftsmanship in practice

Data/Actions Impedance Mismatch

While working on hypermedia APIs recently I came across a very subtle design smell. It manifests itself in excessive use of CRUD semantics and in designing resources after data entities. To preempt early debate, I’ll repeat that the smell is very subtle and that in some situations the same design may fit well.

Let’s consider a valid case first. CouchDB, a NoSQL document-oriented database, provides an HTTP API which can be considered RESTful to a degree: it returns meaningful status codes, respects HTTP caching, and uses meaningful HTTP methods. In CouchDB, documents and document collections are all resources: they have unique identifiers (URLs), and the client interacts with resources through their representations.

Another such example is the Atom Publishing Protocol (AtomPub). It has the same uniform CRUD interface over resources, which are collections and items. AtomPub even has nice hypermedia controls – links – which make it perfectly RESTful. One thing these two APIs have in common is that neither of them exposes any domain logic. At most, some CRUD operations may require authorization and basic input validation, but no context-dependent logic.

For a utility API like a database or a publishing protocol it’s not a big deal though – database domain logic is just CRUD. For any but the most simplistic API, however, CRUD is a symptom of an Anemic Domain Model. It is the opposite of the pit of success that any customer-focused service wants its API to be. But why exactly is that the case?

If resources are modeled solely after data entities, the data versus operations impedance mismatch arises. On the read side everything is fine – given meaningful links between resources, the client can happily navigate all the way through the exposed API data. But as soon as anything needs to be changed, the client is left alone with the HTTP POST, PUT and DELETE methods, which all have very strict semantics and operate on the resource level. And no, the PATCH method is not a solution here, since the problem is not how fine-grained the client’s control over data is.

The problem is that POST, PUT, DELETE and even PATCH all have semantics that are too generic to represent any valuable domain operation. POST doesn’t define what the posted resource representation should look like. PUT means replacing the representation in its entirety, while PATCH allows partial updates. However, neither PUT nor PATCH can communicate which parts of the representation may be updated or what the user’s intent was, which makes them almost unusable for modeling domain operations. And DELETE is almost never a good business operation (again, data stores and content management systems aside) if a resource was important enough to be uniquely addressable in the first place.
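To make the lost intent concrete, here is a minimal Python sketch. The customer resource, its field names and the `put_body` helper are all made up for illustration; they don't come from any real API. It only shows that two different business intents can collapse into the very same PUT body.

```python
import json

# Hypothetical customer representation; field names are assumptions.
current = {"id": 42, "name": "Ann Lee", "address": "1 Old Road", "status": "active"}


def put_body(resource, **changes):
    """Build the full replacement representation that a PUT would carry."""
    return json.dumps({**resource, **changes}, sort_keys=True)


# Intent 1: the customer has moved (may call for re-verification, new tax rules...).
relocated = put_body(current, address="9 New Street")

# Intent 2: a clerk corrects a data entry mistake (should trigger nothing at all).
corrected = put_body(current, address="9 New Street")

# Both requests are byte-for-byte identical: the intent never reaches the server.
print(relocated == corrected)  # True
```

A PATCH carrying only the `address` field would be smaller, but it would be just as silent about why the address changed.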

Not only does complexity grow on the client, the server implementation gets complicated too. The server has to reverse-engineer the lost intent of the user from representation diffs in order to apply domain validation. Different HTTP methods are called in different contexts and require different authorization, but many web frameworks map resource actions to methods on a resource-bound class. This leads to violations of the Single Responsibility Principle and is prone to authorization and validation errors. Moreover, read and write operations often come with different non-functional requirements, so it makes more sense to separate reads from writes, at least at the implementation level.
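The reverse-engineering mentioned above could look roughly like the following Python sketch. The order resource, its `status` and `address` fields, and the intent names are hypothetical; the point is only the shape of the guessing game a data-centric server ends up playing.

```python
def diff(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    return {
        key: (old.get(key), new.get(key))
        for key in old.keys() | new.keys()
        if old.get(key) != new.get(key)
    }


def infer_intent(old, new):
    """Guess the domain operation behind a PUT from the representation diff."""
    changed = diff(old, new)
    if not changed:
        return "no-op"
    if set(changed) == {"status"}:
        if changed["status"] == ("open", "cancelled"):
            return "cancel-order"
        if changed["status"] == ("open", "shipped"):
            return "ship-order"
    if set(changed) == {"address"}:
        return "change-delivery-address"
    # Several fields changed at once, or an unexpected transition: which
    # validation and authorization rules apply now is anyone's guess.
    return "unknown"


order = {"id": 7, "status": "open", "address": "1 Old Road"}

print(infer_intent(order, {**order, "status": "cancelled"}))
print(infer_intent(order, {**order, "status": "shipped", "address": "9 New Street"}))
```

Every new domain operation adds another branch to `infer_intent`, and each branch needs its own validation and authorization, all living in one resource-bound handler.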

The described problems result in the impedance mismatch between the server and the client. If the client is not a simple forms-over-data application, it “thinks” in domain operations (tasks, actions) for the write part of the domain. A solely data-centric API just doesn’t fit there.

I hope to have made a case for the design smell in this blog post, and I plan to outline a solution in an upcoming post.