Earlier today, I had a very interesting Twitter exchange with Darrel Miller. That exchange inspired me to share some of my thoughts about media types, in the context of Hypermedia Architecture and APIs.

How Many Media Types Do We Need?

Before we ask that question, what we really should be asking is: why do we even need media types? Or Hypermedia architecture in APIs?

We Want Hypermedia Architecture When We Care About Evolvability

Increasingly, we need architectures that last and don’t require versioning every year. Most software systems are actually not good at this. There’s one specific architectural style (that of the World Wide Web) which has proven over 25 years that it is very good at evolvability:

“[Hypermedia Arch. Style] is software design on the scale of decades: every detail is intended to promote software longevity and independent evolution.” - Roy T. Fielding

How does Hypermedia style achieve this? By breaking the tight coupling between the client and the server. In Hypermedia style, the server guides the client by communicating affordances, while the client exchanges messages with the server, moving the application from one state to another.
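
To make the decoupling concrete, here is a minimal sketch of a client that only knows an entry point and a link relation name, and lets the server tell it where to go. Everything in it is an assumption made for illustration: the in-memory `RESPONSES` table stands in for a real server, and the `links` layout is not taken from any particular media type.

```python
# Hypothetical in-memory "server": maps a URL to a response document.
# The "links" layout is an assumption for illustration, not a real media type.
RESPONSES = {
    "/": {"links": {"orders": "/v2/orders"}},
    "/v2/orders": {
        "items": ["order-1", "order-2"],
        "links": {"self": "/v2/orders", "create": "/v2/orders/new"},
    },
}


def fetch(url):
    """Stand-in for an HTTP GET; returns the document stored at url."""
    return RESPONSES[url]


def follow(document, rel):
    """Follow the link the server advertised under the given relation name."""
    links = document.get("links", {})
    if rel not in links:
        raise LookupError(f"server does not currently afford '{rel}'")
    return fetch(links[rel])


# The client hardcodes only the entry point and the relation name "orders".
# It never builds the "/v2/orders" URL itself.
home = fetch("/")
orders = follow(home, "orders")
print(orders["items"])
```

If the server later moves orders to a different URL, only the link in the entry document changes; the client code above stays the same.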

Road Signs

Basically: affordances communicated by the server are like road signs. They tell you what you can do, where, and how. At the end of the day, it’s still the API client which “drives” the API “roads”, but it can navigate paths dynamically if the people responsible for the roads care to post signs about direction, speed/weight limits, reroutes etc. Without road signs you would need to memorize the entire map before you could drive anywhere. And what happens when the map is outdated? The same thing that happens when the documentation for an API is outdated: you get lost.

Are Road Signs Good?

When the first cars started getting on the roads, we can be sure things were very simple: you didn’t have speed limits, you had very few roads, and people generally knew directions without any signs. Most certainly, when road signs first started appearing they were a huge nuisance for drivers: now you had to do more work to drive around, obey additional laws etc. And the road signs were probably not standardized initially, so they could be more confusing than helpful.

Today, good road signs are a major contributor to the usability of roads (we will get to the point of GPS, hang tight). As soon as you have many roads and many drivers, you cannot do without them. You wouldn’t dream of doing without them.

But can’t we communicate directions without “media types”? There must surely be many ways a server can expose available controls to the client, right? Yes, there are, but if everybody did their own thing, it would be like road signs that are different in every municipality and all around the world.

Familiarity Is a Major Usability Factor

In the “Hypermedia world” we talk a lot about “affordances” and that when we see a chair we know it “affords to be sat on, to put things on, or to throw it”, but we often skip the bit about: how do we know all of that? We can only understand what objects afford doing with them if we are familiar with their affordances!

“Objects are only useful to us if they communicate affordances in ways familiar to us” - yours truly.

When we publish an API, the clients of the API are our users. As @honzajavorek said at the latest Nordic APIs:

“You are designing your APIs for your users, not your database tables!”

Evolvability is the concern of API publishers. They are the ones who need to evolve server-side code without breaking the clients. The clients themselves only care about it indirectly: when and if we break their code. What clients do care about directly is that Hypermedia APIs currently require more work on the client side. This is due to the lack of proper standardization. No API client developer currently loves Hypermedia style. We can do better.

Layered Architecture

Most APIs do things that are the same, regardless of the industry or app-specific requirements of the API:

  • Pagination
  • Internationalization
  • Rights Management
  • Linking to other data items
  • Templated querying
  • Facilities for content update
  • Ability to provide metadata about semantic meaning of app-specific attributes
  • etc.

The mechanics of doing these are the same for every API. Yet, every API does them differently. Just using a hypermedia type doesn’t solve the problem of familiarity: if you invent a new media type every time you write a new API, then from the perspective of the API client, you are just making my life harder. The win is very small: basically a promise that you won’t break my code in the future, but that’s your problem to solve, anyway!
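
Pagination is a good example of such shared mechanics. The sketch below shows how a single generic helper could walk any paged collection, as long as every API agreed on how to advertise the next page. The page layout (`items`, `links`, a `next` relation) and the `PAGES` table are assumptions made up for this illustration, not part of any existing standard.

```python
# Hypothetical paged collection; the page layout ("items", "links") is assumed.
PAGES = {
    "/articles?page=1": {"items": ["a", "b"], "links": {"next": "/articles?page=2"}},
    "/articles?page=2": {"items": ["c", "d"], "links": {"next": "/articles?page=3"}},
    "/articles?page=3": {"items": ["e"], "links": {}},
}


def get_page(url):
    """Stand-in for an HTTP GET against a paged API."""
    return PAGES[url]


def iter_items(start_url, fetch_page):
    """Yield every item of a paged collection by following 'next' links."""
    url = start_url
    while url is not None:
        page = fetch_page(url)
        yield from page.get("items", [])
        # Stop when the server no longer advertises a next page.
        url = page.get("links", {}).get("next")


all_items = list(iter_items("/articles?page=1", get_page))
print(all_items)
```

The helper knows nothing about articles or page numbers. It could be written once and reused against every API that posts the same “road sign” for the next page.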

We are currently posting road signs that are different from city to city, from state to state etc. That hugely decreases their usability!

Instead, we could use a layered approach:

  1. Media type addresses concerns that are the same for all APIs (kinda like HTML does for websites).
  2. Profiles allow definition of semantics specific to an industry or an application. They should be layered to create additional opportunities for familiarity: if I am aware of a profile standard in the news industry, and I point a client that knows the corresponding media type and news industry profiles at CNN’s or NPR’s API, I should be able to extract 80% of the value, even if 20% of the semantics they expose are org-specific.

So in this sense:

  1. Media type is for standardizing a lot of mechanics of exchanging messages and everything that is definitely common for any API. It should be kept very small and it should concentrate on defining all H-Factors to achieve maximum generality.

  2. Profiles are where you would define additional semantics: both the ontology (attributes) and additional link relations. There are too many link relations (even standard ones) for media types to enforce their implementation in clients.
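
Here is a rough sketch of how that layering might look from the client side. The profile contents, the attribute names, and the document layout (`data`, `links`) are all hypothetical; real profiles would be published and shared rather than hardcoded like this.

```python
# Hypothetical industry profile: maps attribute names to their agreed meaning.
NEWS_PROFILE = {
    "headline": "title of the story",
    "byline": "author credit",
    "published": "publication timestamp",
    "body": "full text of the story",
}
# Hypothetical organization-specific profile layered on top of the news one.
ORG_PROFILE = {"showSlug": "identifier of the show the story aired on"}


def interpret(document, known_profiles):
    """Split a document's attributes into understood and unknown ones."""
    vocabulary = {}
    for profile in known_profiles:
        vocabulary.update(profile)
    declared = document.get("data", {})
    understood = {k: v for k, v in declared.items() if k in vocabulary}
    unknown = sorted(k for k in declared if k not in vocabulary)
    return understood, unknown


# Example document; "data" and "links" stand for the media type layer.
story = {
    "data": {
        "headline": "Example headline",
        "byline": "A. Reporter",
        "published": "2015-01-01T00:00:00Z",
        "body": "Example text.",
        "showSlug": "morning-show",
    },
    "links": {"self": "/stories/1"},
}

# A client that knows only the industry profile still understands most fields.
understood, unknown = interpret(story, [NEWS_PROFILE])
print(sorted(understood), unknown)
```

A client that also loads `ORG_PROFILE` would understand the remaining attribute, but it does not need to in order to get most of the value.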

It’s a dangerous analogy, because the HTML/CSS combination is geared towards rendering unstructured content, but if you can forget that for a second and remember that APIs are all about structured data, a media type/profile separation is as crucial as separating markup from stylesheets in HTML/CSS. It is the same basic layering of concerns, in a different context.

Ratios for Efficiency

Why is layering efficient, and why can it solve the animosity client devs have towards Hypermedia APIs? Because there’s a huge opportunity for “not repeating yourselves”. I reckon (call it Irakli’s Conjecture if you will) that, in most APIs:

  1. 30% of requirements are standard-enough to be covered by a media type
  2. 50% of requirements are standard-enough to be covered by industry- and market-specific profiles
  3. Only 20% are really specific to your application.

Now imagine that there are standard SDKs (like cURL for HTTP) to cover #1 and #2. Then an API client only has to write the remaining 20% of the code they would normally write. And if we reach the state at which Profiles can be made fully machine-readable and automated (ALPS?), then you get something fully automated, like a GPS for cars, and life is truly beautiful.

Would client devs still “hate” Hypermedia when and if it can make their lives that much easier? No chance!

So, How Many Media Types Do We Need?

People won’t write standard SDKs for hundreds of media types. To have good, solid solutions we need to constrain this space. Ideally, I think we really only need one media type. In reality, humans can never agree on one way of doing things (hey, we still drive on different sides of the road in different countries!), so there will probably be a handful. That is fine: it will create some competition. However, if there are more than 3-4 major media types, then we are doing things wrong, and I think that could be the biggest threat to Hypermedia taking off as a predominant API architectural style.

Don’t create media types unnecessarily! Profiles are what you need.