Following is a transcript of a talk that was given at the APIStrategy and Practice Conference in San Francisco, in October 2013.
I was invited here today to tell you about a massive content platform that we built as a Hypermedia API, and why it is revolutionary.
But before we get to the good part, I need to make a confession:
All content management systems are broken in a bad way.
How do I know? Because I have been part of the problem. I have spent a decade building content management systems for large multinationals, the US Federal Government and a number of leading news organizations. I've built my share of proprietary ones, and for years I was also a top contributor to Drupal, the open-source CMS that some of you probably use.
This is my public confession that we, despite all our best intentions, have failed miserably. No modern CMS is really architected for what publishers need.
What do publishers need? They need to easily store, exchange and explore content.
Unfortunately, these are not the concerns that any CMS is architected for. The mismatch between these needs and the available solutions isn't just a problem for NPR, the New York Times or Reuters. We live in the age of 'Everybody Is a Publisher'.
A vivid example of how deeply broken content management systems are is how many people are choosing GitHub to publish websites. And these are not just some geeks. The website part of healthcare.gov was built on top of Git. Ironically, it was also the only part that actually worked.
If people are choosing code versioning systems to publish content, it's safe to say: "Houston, we have a problem".
The way we discovered the solution to this problem was somewhat serendipitous.
About a year ago, my team and I were asked to develop a content platform where hundreds of publishers could easily store, exchange and explore content.
The irony of these three requirements being exactly what I knew a modern content management system needed wasn't lost on me.
Building this kind of distributed system is hard and very risky, so we immediately started looking for a success story: anybody who had built anything this decentralized. The answer was right in front of us. It's called the World Wide Web, and we know it works.
The key to the web's architecture is Hypermedia.
Hypermedia is the matter the World Wide Web is made of. Much like the physical world is built of interacting elementary particles (bosons and fermions), the web is essentially a universe of myriad interacting hypermedia documents.
Despite Hypermedia coming into the spotlight relatively recently, it is no baby. Hypermedia is more than twenty years old. Twenty-three to be precise. And much like we ourselves were at that age: it's largely misunderstood, very rebellious, but also full of potential.
Most of Hypermedia's potential comes from three core traits:
Now, HTML is really special. It's just a handful of tags, but the wealth of creative user interfaces and user experiences that people are able to build with them is truly astounding. You have a handful of simple rules and you get enormous creativity. That's magical. That's the kind of thing we wanted.
We didn't want something too prescriptive. Such things either die at birth due to bloat or end up as least-common-denominator solutions like RSS. RSS is fine, but it already exists. We wanted the kind of magic HTML has, not another RSS.
To achieve this, we needed to "stand on the shoulders of giants". We started with Mike Amundsen's Collection+JSON media type and Mark Nottingham's Home Document specification, added years of our own experience with content APIs and content management systems, and created a media type we called Collection Document.
Collection Document is a recursive media type: a document and a collection at the same time. As a collection, it contains other documents, which can in turn contain other documents, and so on. This recursion allows us to describe very complex domains.
But it's also simple. It has only three top-level elements: attributes, links and errors. Links are the most important part, since they communicate the behavior and relationships of the content. For link types, rather than inventing new ones, we tried hard to use popular, standard IETF link relation types whenever possible.
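The recursion and the three top-level elements can be illustrated with a minimal Python sketch. It assumes that documents are plain JSON objects keyed by URL, and that each "item" link entry is an object with an "href". The URLs, titles and the helpers `validate_top_level` and `walk` are hypothetical and do not come from the published media type.

```python
# Hypothetical documents keyed by URL; titles and URLs are made up.
# Only the three top-level elements named in the talk are used.
ALLOWED_TOP_LEVEL = {"attributes", "links", "errors"}

DOCS = {
    "https://api.example.org/docs/blog": {
        "attributes": {"title": "Station Blog"},
        "links": {"item": [{"href": "https://api.example.org/docs/post-1"},
                           {"href": "https://api.example.org/docs/post-2"}]},
    },
    "https://api.example.org/docs/post-1": {
        # A post is a document and a collection at once: it contains a photo.
        "attributes": {"title": "First post"},
        "links": {"item": [{"href": "https://api.example.org/docs/photo-1"}]},
    },
    "https://api.example.org/docs/post-2": {
        "attributes": {"title": "Second post"},
        "links": {},
    },
    "https://api.example.org/docs/photo-1": {
        "attributes": {"title": "Photo"},
        "links": {},
        "errors": [],
    },
}


def validate_top_level(doc):
    """True if the document uses only attributes, links and errors."""
    return set(doc) <= ALLOWED_TOP_LEVEL


def walk(url, depth=0, seen=None):
    """Follow "item" links recursively, returning indented titles.

    Each URL is visited at most once, which also guards against cycles."""
    seen = set() if seen is None else seen
    if url in seen:
        return []
    seen.add(url)
    doc = DOCS[url]
    lines = ["  " * depth + doc["attributes"]["title"]]
    for link in doc["links"].get("item", []):
        lines.extend(walk(link["href"], depth + 1, seen))
    return lines


print("\n".join(walk("https://api.example.org/docs/blog")))
```

Running it prints the blog, then its two posts, with the photo nested under the first post. Tracking visited URLs is a design choice made for this sketch. On the web, a document can be reachable from several parents, and links can even form a cycle back to the starting document.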
Following are the most important link relation types that Collection Document employs:
Creating lists or buckets of documents is one of the most important tasks in content management, so we support both possible ways of doing it.
Item link: a way to define collections top-down, where a document points to the other documents it contains. It's a 'contains' relationship, suitable for scenarios like "blog contains blog posts" or "news story contains asset documents".
Collection link: a bottom-up approach, where the child documents themselves point to the parent document they're associated with. It's a 'belongs to' relationship, suitable for things like topics.
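The following minimal sketch shows how a client might combine both membership styles. It assumes a hypothetical document shape in which links are grouped by relation type and each link has an "href". The `members` helper and the URLs are made up for illustration.

```python
# Hypothetical documents keyed by URL. "/post-1" is listed top-down by the
# blog's "item" link; "/post-2" declares bottom-up, via its own "collection"
# links, that it belongs to the blog and to a topic.
DOCS = {
    "/blog": {"attributes": {"title": "Blog"},
              "links": {"item": [{"href": "/post-1"}]}},
    "/post-1": {"attributes": {"title": "Post 1"}, "links": {}},
    "/post-2": {"attributes": {"title": "Post 2"},
                "links": {"collection": [{"href": "/blog"},
                                         {"href": "/topic-politics"}]}},
    "/topic-politics": {"attributes": {"title": "Politics"}, "links": {}},
}


def members(parent_url, docs):
    """Members of parent_url from its item links plus children's collection links."""
    found = []
    # Top-down ("contains"): the parent lists what it contains.
    for link in docs[parent_url]["links"].get("item", []):
        found.append(link["href"])
    # Bottom-up ("belongs to"): children point at the parent they belong to.
    for url, doc in docs.items():
        for link in doc["links"].get("collection", []):
            if link["href"] == parent_url and url not in found:
                found.append(url)
    return found


print(members("/blog", DOCS))            # ['/post-1', '/post-2']
print(members("/topic-politics", DOCS))  # ['/post-2']
```

The bottom-up case needs a scan here, or an index or query in a real system. That is the trade-off for not having to rewrite a topic document every time a story is tagged with it.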
Permission links define who can access and modify a document. They point to a particular type of Collection Document that holds this information.
Query links are parameterized search URLs that you can run to explore and segment data.
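Query links lend themselves naturally to URI templates. The sketch below assumes that a query link carries an RFC 6570 template in a hypothetical `href-template` field. It implements only the form-style query expression `{?a,b}`, not the full RFC 6570 grammar, and the variable names are made up.

```python
import re
from urllib.parse import quote

# Hypothetical query link; "href-template" and the variable names are assumed.
QUERY_LINK = {"href-template": "https://api.example.org/docs{?tag,profile,limit}"}

# Matches RFC 6570 form-style query expressions such as {?tag,limit}.
FORM_QUERY = re.compile(r"\{\?([^}]+)\}")


def expand(template, params):
    """Expand {?a,b,...} expressions; other RFC 6570 operators are not handled."""
    def replace(match):
        pairs = []
        for name in match.group(1).split(","):
            value = params.get(name)
            if value is None:
                continue  # undefined variables are skipped (RFC 6570)
            # Percent-encode everything except unreserved characters.
            pairs.append(f"{name}={quote(str(value), safe='')}")
        return "?" + "&".join(pairs) if pairs else ""
    return FORM_QUERY.sub(replace, template)


template = QUERY_LINK["href-template"]
print(expand(template, {"tag": "election 2012", "limit": 10}))
# https://api.example.org/docs?tag=election%202012&limit=10
print(expand(template, {}))
# https://api.example.org/docs
```

Undefined variables are simply dropped, as RFC 6570 specifies, so one template can serve both a broad search and a narrow one.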
Edit links are the form templates that you can submit to modify data.
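Edit links, treated as form templates, can be sketched in a similar way. The edit link's shape (`href`, `method`, and `fields` with a `required` flag) and the `build_submission` helper are assumptions made for illustration. The sketch only builds the request and does not send it.

```python
import json

# Hypothetical edit link: this shape is an assumption, not the published format.
EDIT_LINK = {
    "href": "https://api.example.org/docs/post-1",
    "method": "PUT",
    "fields": [
        {"name": "title", "required": True},
        {"name": "teaser", "required": False},
    ],
}


def build_submission(edit_link, values):
    """Fill the form template and return (method, url, json_body).

    Raises ValueError if a required field is missing. Keys the template does
    not declare are dropped, because the template decides what is editable."""
    allowed = {f["name"] for f in edit_link["fields"]}
    missing = [f["name"] for f in edit_link["fields"]
               if f["required"] and f["name"] not in values]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    attributes = {k: v for k, v in values.items() if k in allowed}
    body = json.dumps({"attributes": attributes}, sort_keys=True)
    return edit_link["method"], edit_link["href"], body


print(build_submission(EDIT_LINK, {"title": "Hi", "color": "red"}))
```

Here the server, not the client, decides which fields are editable and how to submit them. That is the same division of labor an HTML form provides.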
All primary keys and relationships in a Collection Document are URL-based, which lets us leverage many of HTTP's built-in security, caching and routing capabilities.
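To show why URL-based keys help, here is a sketch of a client that resolves relative hrefs against the document's own URL and caches documents by absolute URL. The `Client` class and the in-memory `REMOTE` dict, which stands in for HTTP GETs, are hypothetical. A production client would more likely rely on HTTP caching headers than on an unbounded dict.

```python
from urllib.parse import urljoin

# Hypothetical remote documents keyed by absolute URL (stands in for HTTP).
# Both stories reference the same photo: one via a relative href, one absolute.
REMOTE = {
    "https://api.example.org/docs/story-1": {
        "attributes": {"title": "Story 1"},
        "links": {"item": [{"href": "/docs/photo-1"}]},
    },
    "https://api.example.org/docs/story-2": {
        "attributes": {"title": "Story 2"},
        "links": {"item": [{"href": "https://api.example.org/docs/photo-1"}]},
    },
    "https://api.example.org/docs/photo-1": {
        "attributes": {"title": "Photo"},
        "links": {},
    },
}


class Client:
    """Resolves hrefs to absolute URLs and caches documents by URL."""

    def __init__(self, remote):
        self.remote = remote
        self.cache = {}
        self.fetches = 0  # counts simulated GETs, one per distinct URL

    def get(self, url):
        if url not in self.cache:
            self.fetches += 1
            self.cache[url] = self.remote[url]
        return self.cache[url]

    def items(self, url):
        """Return the documents linked from url's "item" links."""
        doc = self.get(url)
        return [self.get(urljoin(url, link["href"]))
                for link in doc["links"].get("item", [])]


client = Client(REMOTE)
client.items("https://api.example.org/docs/story-1")
client.items("https://api.example.org/docs/story-2")
print(client.fetches)  # 3: two stories plus one shared photo
```

The URL itself acts as the identity, so the shared photo is fetched once, even though the two stories reference it in different ways.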
The current implementation of the media type is defined on top of JSON, mostly because JSON is a very simple format that is widely accepted by API developers. The media type itself, though, could easily be ported to XML or even expressed as a microformats extension of HTML5.
We found that, using these basic rules, the Collection Document media type can easily support many use cases. I can't walk through all of them, but I would like to tell you about one that really brought this whole thing home for us.
When we started building the Public Media Platform (PMP), we faced a dilemma. NPR already has a large API with hundreds of content producers in it. Once we launched PMP, what should they do? Would producers have to send content to both the NPR API and the PMP API? This was a big question that, candidly, we had trouble answering.
So we did what all good engineers do (#chuckle): we ignored the problem and kept going forward.
When we were done building PMP, the answer became obvious. Using Hypermedia, and religiously using URLs to reference everything (even internally), allowed us to "break" the biggest constraint of traditional APIs: the siloing of content. In "classical" APIs, all documents need to live within the API itself. The Twitter API, for example, can't make sense of a tweet that was created and is stored in another API. But PMP can, because all of its content is referenced by URLs pointing to certain types of documents. Where those documents live is largely irrelevant.
I repeat: the core data your API relies on can live somewhere else, as long as you can reference it!
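One way to picture this is a PMP-style collection whose item links point at documents hosted by other APIs. In the sketch below, the host names, the paths and the `fetch` function (a dict lookup standing in for an HTTP GET) are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical hosts standing in for separate APIs. The collection on
# pmp.example.org stores only URLs; the stories live on other servers.
HOSTS = {
    "pmp.example.org": {
        "/docs/news": {
            "attributes": {"title": "News"},
            "links": {"item": [
                {"href": "https://npr-api.example.org/story/1"},
                {"href": "https://station.example.org/story/9"},
            ]},
        },
    },
    "npr-api.example.org": {
        "/story/1": {"attributes": {"title": "Story from the NPR API"}, "links": {}},
    },
    "station.example.org": {
        "/story/9": {"attributes": {"title": "Story from a station CMS"}, "links": {}},
    },
}


def fetch(url):
    """Simulated HTTP GET: route by host, then by path."""
    parts = urlsplit(url)
    return HOSTS[parts.netloc][parts.path]


def titles(collection_url):
    """List item titles, wherever each item document happens to live."""
    collection = fetch(collection_url)
    return [fetch(link["href"])["attributes"]["title"]
            for link in collection["links"].get("item", [])]


print(titles("https://pmp.example.org/docs/news"))
```

The collection never copies the stories. It only needs their URLs, so in a setup like this a producer could keep publishing to its existing API.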
That is groundbreaking in the API world. But, ironically, it's natural for the rest of the web. You don't save your page directly into Google's database to make it searchable, and you don't have to publish your web page into somebody else's database to make links. With APIs, though, you kind of had to do such silly things. Now we don't need to anymore.
When you use Hypermedia properly, and specifically if you use the Collection Document media type, you end up with a true web of APIs: a truly distributed, robust and versatile system. That's something APIs weren't able to do before.
Now, that is quite magical.