All Content Management Systems (CMS) Are Broken In a Bad Way.

Following is a transcript of a talk that was given at the API Strategy and Practice Conference in San Francisco in October 2013.

PMP Stylized as a Breaking Bad poster

Part One: Your CMS Is Broken. And So Is Mine.

I was invited here, today, to tell you about a massive content platform that we built as a Hypermedia API and why it is revolutionary.

But before we get to the good part, I need to make a confession:

All content management systems are broken in a bad way.

How do I know? I know because I have been part of the problem. I have spent a decade building content management systems for large multinationals, the US federal government and a number of leading news organizations. I’ve done my share of proprietary ones, and for years I was also a top contributor to open-source Drupal, which some of you probably use.

This is my public confession that we, despite all our best intentions, have failed miserably. No modern CMS is really architected for what publishers need.

What do publishers need? They need:

An easy way to connect to many content sources.
The ability to push content to many destinations: web, mobile and probably B2B.
A CMS architected for the “Cloud”, so they never have to worry about silly things like “are my servers up?” or “do they scale?”

Unfortunately, these are not the concerns that any CMS is architected for. The mismatch between the needs and the solutions isn’t just a problem for NPR, the New York Times or Reuters. We live in the age of ‘Everybody Is a Publisher’.

A vivid example of how deeply broken content management systems are is how many people are choosing GitHub for publishing websites. And these are not just some geeks. The website part of healthcare.gov was built on top of Git. Ironically, it was also the only part that actually worked.

If people are choosing code versioning systems to publish content, it’s safe to say: “Houston, we have a problem”.

Part Two: Public Media Platform

We discovered the solution to this problem somewhat serendipitously.

About a year ago, my team and I were asked to develop a content platform where hundreds of publishers could easily store, exchange and explore content.

The system:

had to be built for the cloud and be scalable
had to facilitate decentralized content sourcing
and had to allow publishing to a wide variety of destinations

The irony of these three requirements being exactly what I knew a modern content management system needed wasn’t lost on me.

Building this kind of distributed system is hard and very risky. We immediately started looking for a success story of anybody having built anything so decentralized. The answer was right in front of us: it’s called the World Wide Web and we know it works.

Better yet, very smart people, such as Roy Fielding, Mike Amundsen, Mark Nottingham, Ioseb Dzmanashvili, Jon Moore and others, have spent years analyzing what makes the web’s architecture work.

The key to the web architecture is: Hypermedia.

Part Three: Hypermedia

Hypermedia is the matter of which the World Wide Web is made. Much like the physical world is built of interacting elementary particles (Bosons and Fermions), the web is essentially a universe of myriad interacting hypermedia documents.

Despite Hypermedia coming into the spotlight relatively recently, it is no baby. Hypermedia is more than twenty years old. Twenty-three to be precise. And much like we ourselves were at that age: it’s largely misunderstood, very rebellious, but also full of potential.

Most of Hypermedia’s potential comes from three core traits:

the robustness of the HTTP protocol,
universal use of URLs
and the extremely versatile media types it uses: HTML and CSS.

Now HTML is really special. It’s just a handful of tags, but the wealth of creative user-interfaces and user experiences that people are able to build with them is truly astounding. You have a handful of simple rules and you get enormous creativity. That’s magical. That’s the kind of thing we wanted.

We didn’t want something too prescriptive. Such things are either dead at birth due to bloat or are least-common-denominator solutions like RSS. RSS is fine, but it already exists. We wanted magic like HTML, not something like RSS.

To achieve this challenging task we needed to “stand on the shoulders of giants”. We started with Mike Amundsen’s Collection+JSON media type and Mark Nottingham’s Home Document specification. We added years of our own experience with content APIs and content management systems and created a media type we called Collection Document.

Collection Document is a recursive media type that is a document and a collection at the same time. As a collection, it contains other documents, which can in turn contain other documents, and so on. This recursion allows us to describe very complex domains.

But it’s also simple: it has only three top-level elements: attributes, links and errors. Links are the most important part, since that’s what communicates the behavior and relationships of the content. For link types, rather than inventing new ones, we really tried to use popular, standard IETF link relation types whenever possible.
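To make the shape concrete, here is a minimal sketch of what such a document might look like as JSON-style data. Only the three top-level elements (attributes, links, errors) come from the talk; the attribute names, the layout of each link, the helper functions and the example.org URLs are all assumptions made up for illustration.

```python
import json

# Hypothetical document. Only the three top-level keys come from the talk;
# every attribute name, link shape and URL below is an illustrative assumption.
story = {
    "attributes": {
        "title": "Example story",
        "published": "2013-10-01T12:00:00Z",
    },
    "links": {
        "profile": [{"href": "https://api.example.org/profiles/story"}],
        "item": [
            {"href": "https://api.example.org/docs/audio-1"},
            {"href": "https://api.example.org/docs/image-7"},
        ],
    },
    "errors": [],
}

TOP_LEVEL_KEYS = {"attributes", "links", "errors"}


def is_collection_document(doc):
    """Check that a dict uses nothing beyond the three top-level elements."""
    return isinstance(doc, dict) and set(doc) <= TOP_LEVEL_KEYS


def linked_urls(doc, rel):
    """Return the URLs a document links to under one relation type."""
    return [link["href"] for link in doc.get("links", {}).get(rel, [])]


if __name__ == "__main__":
    print(json.dumps(story, indent=2))
```

The recursion lives in the links: each URL under "item" would point at another document of the same shape, which can have item links of its own.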

Collection.Document media type diagram

Following are the most important link relation types that Collection Document employs:

Profile: allows defining additional semantics on top of the media type. This is how you would define attributes of specific content types, which is an important task in content publishing. Profiles are also inheritable, allowing re-use and collaboration. Profile definitions are themselves instances of the collection document type and are saved just like any other document. Which means: to define a new profile, you don’t have to go through a standards body. Innovation around profiles is decentralized.
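One way to picture profile inheritance is a small sketch in which profiles are ordinary documents in a store keyed by URL. The talk does not say how inheritance is expressed, so the "extends" relation name, the "fields" attribute and the URLs here are assumptions, not the platform's actual vocabulary.

```python
# Hypothetical in-memory store keyed by URL. Profiles are ordinary documents;
# the "extends" relation name and the "fields" attribute are assumptions.
PROFILES = {
    "https://api.example.org/profiles/base": {
        "attributes": {"fields": ["title", "published"]},
        "links": {},
        "errors": [],
    },
    "https://api.example.org/profiles/story": {
        "attributes": {"fields": ["byline", "teaser"]},
        "links": {
            "extends": [{"href": "https://api.example.org/profiles/base"}]
        },
        "errors": [],
    },
}


def profile_fields(url, store, seen=None):
    """Collect field names from a profile and every profile it inherits from."""
    seen = set() if seen is None else seen
    if url in seen:  # guard against circular inheritance
        return []
    seen.add(url)
    profile = store[url]
    fields = []
    # Inherited fields come first, then the profile's own additions.
    for parent in profile["links"].get("extends", []):
        for name in profile_fields(parent["href"], store, seen):
            if name not in fields:
                fields.append(name)
    for name in profile["attributes"].get("fields", []):
        if name not in fields:
            fields.append(name)
    return fields
```

Because a profile is just another document at a URL, anybody could publish a new one that extends an existing one without asking a central authority, which is the decentralization described above.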

Creating lists or buckets of documents is one of the most important tasks in content management, so we provide both of the two possible ways of doing so.

Item link: is a way to define collections top-down. This is when a document points to other documents that it contains. It’s a ‘contains’ relationship, suitable for “blog contains blog posts” or “news story contains asset documents” scenarios.
Collection link: is a bottom-up approach. In this case, child documents themselves point to the parent document they’re associated with. It’s a ‘belongs to’ relationship, suitable for things like topics.
Permission links define who can access and modify the documents and point to a type of Collection.document that contains such information.
Query links are parametrized search URLs that you can run to explore and segment data.
Edit links are the form templates that you can submit to modify data.
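As an illustration of a parametrized query link, the sketch below expands a URL template into a concrete search URL. It assumes the template uses the `{?name,...}` query form from RFC 6570 and handles only that form; the "href-template" key, the parameter names and the URL are hypothetical.

```python
import re
from urllib.parse import quote


def expand_query_template(template, **params):
    """Expand a `{?a,b}` style query template, skipping parameters not supplied.

    Handles only the form-style query operator of RFC 6570, not the whole
    specification.
    """
    def replace(match):
        names = match.group(1).split(",")
        pairs = [
            f"{name}={quote(str(params[name]), safe='')}"
            for name in names
            if name in params
        ]
        return "?" + "&".join(pairs) if pairs else ""

    return re.sub(r"\{\?([^}]*)\}", replace, template)


# Hypothetical query link; the key names and URL are assumptions.
query_link = {
    "rel": "query",
    "href-template": "https://api.example.org/docs{?tag,limit}",
}

url = expand_query_template(
    query_link["href-template"], tag="public radio", limit=10
)
```

A client never has to hard-code the search endpoint: it reads the template from the document, fills in the parameters it cares about and follows the resulting URL.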

All primary keys and relationships in a Collection Document are URL-based, allowing us to leverage a lot of built-in security, caching and routing capabilities of the HTTP protocol.
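With URLs as the only keys, the two ways of building a list described above can be sketched with a plain dictionary standing in for the document store. The documents, URLs and helper names are made up for illustration; only the "item" and "collection" relation names come from the talk.

```python
# Hypothetical store keyed by URL, since URLs act as primary keys.
STORE = {
    "https://api.example.org/docs/blog": {
        "attributes": {"title": "Blog"},
        # Top-down: the parent lists what it contains.
        "links": {"item": [{"href": "https://api.example.org/docs/post-1"}]},
        "errors": [],
    },
    "https://api.example.org/docs/post-1": {
        "attributes": {"title": "First post"},
        "links": {},
        "errors": [],
    },
    "https://api.example.org/docs/post-2": {
        "attributes": {"title": "Second post"},
        # Bottom-up: the child says which parent it belongs to.
        "links": {
            "collection": [{"href": "https://api.example.org/docs/blog"}]
        },
        "errors": [],
    },
}


def hrefs(doc, rel):
    """Return the URLs a document links to under one relation type."""
    return [link["href"] for link in doc["links"].get(rel, [])]


def members(parent_url, store):
    """Union of top-down ("item") and bottom-up ("collection") membership."""
    found = list(hrefs(store[parent_url], "item"))
    for url, doc in store.items():
        if parent_url in hrefs(doc, "collection") and url not in found:
            found.append(url)
    return found
```

Here `members("https://api.example.org/docs/blog", STORE)` picks up the first post through the parent's item link and the second through the child's own collection link.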

The current implementation of the media type is defined on top of JSON, mostly because JSON is a very simple format that is widely accepted by API developers. But the media type itself is easily portable to XML, or even to a microformats extension of HTML5.

We found that, using these basic rules, the Collection.doc media type can easily facilitate many use-cases. We can’t demonstrate all of them, but I would like to tell you about one that really brought this whole thing home for us.

Web of APIs

When we started building the Public Media Platform, we faced a dilemma. NPR already has a large API with hundreds of content producers in it. Once we launch PMP, what should they do? Will producers have to send content to both the NPR API and the PMP API? This was a big question that, candidly, we had trouble answering.

So we did what all good engineers do (#chuckle): we ignored the problem and kept going forward.

When we were done building PMP, the answer became obvious. The use of Hypermedia and the religious use of URLs for referencing everything (even internally) allowed us to “break” the biggest constraint of traditional APIs: the siloing of content. In “classical” APIs, all documents need to live within the API itself. It’s not like the Twitter API can make sense of a tweet that was created and is stored in another API. But PMP can, because all of its content is just URLs to certain types of documents. Where these documents live is largely irrelevant.

I repeat: you can have core data that your API relies on live somewhere else, as long as you can reference it!
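A small sketch of that idea: a document on one API links to items that live on two different hosts, and the client resolves both the same way. The two hosts, the documents and the `fetch` stand-in are hypothetical; a real client would issue HTTP GET requests instead of reading from a dictionary.

```python
from urllib.parse import urlparse

# Two hypothetical APIs simulated as one dict keyed by absolute URL.
WEB = {
    "https://pmp.example.org/docs/story-1": {
        "attributes": {"title": "Story"},
        "links": {
            "item": [
                {"href": "https://pmp.example.org/docs/image-1"},
                {"href": "https://other-api.example.com/audio/42"},
            ]
        },
        "errors": [],
    },
    "https://pmp.example.org/docs/image-1": {
        "attributes": {"title": "Image"},
        "links": {},
        "errors": [],
    },
    "https://other-api.example.com/audio/42": {
        "attributes": {"title": "Audio"},
        "links": {},
        "errors": [],
    },
}


def fetch(url):
    """Stand-in for an HTTP GET; a real client would call the network here."""
    return WEB[url]


def resolve_items(url, fetch=fetch):
    """Follow "item" links and report each item's title and the host it lives on."""
    doc = fetch(url)
    return [
        (
            fetch(link["href"])["attributes"]["title"],
            urlparse(link["href"]).netloc,
        )
        for link in doc["links"].get("item", [])
    ]
```

The resolver never checks which host a link points to. As long as the URL returns a document of the expected type, it does not matter which API stores it.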

That is groundbreaking in the API world. But ironically it’s natural for the rest of the web. You don’t save your page directly into Google’s database to make it searchable. And you don’t have to publish your webpage into somebody else’s database to make linkages. You kind of had to do such silly things for APIs, though. But now we don’t need to anymore.

When you use Hypermedia properly, and specifically if you use the Collection.Document media type, you end up with a true web of APIs. A truly distributed, robust and versatile system. Something that APIs weren’t able to do before.

Now, that is quite magical.