On April 22nd, 2011, I was in Washington, DC, preparing for my new job at NPR. It was a dream come true: to head technology at the media organization I had been a lifelong fan of.


The job started on Monday, and it was only Friday, so I did what any responsible person starting a job at NPR should do: I grabbed a bottle of good old single malt and sat down to think.


In the next 15 minutes or so, I will try to share with you some of what was going through my head that night. Those ideas eventually became the basis of our technology strategy.

Let's start with the main point:

All CMSs Suck

Virtually all modern content management systems are completely broken.

How do I know? I know because I have been part of the problem. I have spent a decade building content management systems for large multinationals, the US Federal Government, and a number of leading news organizations. I've done my share of proprietary ones, and for years I was also a top contributor to open-source Drupal, which some of you probably use.

I'm part of the problem

This is my public confession that we, despite all our best intentions, have failed miserably. No modern CMS is really architected for what publishers need.

Most content management systems were designed years ago, for a much simpler world. We now live in an incredibly fragmented and complex one.

When we put a piece of content together, its parts come from many different sources: wire feeds such as Reuters and the Associated Press, various social media outlets, and partner news organizations all contribute to the input.

Distributed World

The output is equally fragmented. Once a team of journalists produces the final content, it is pushed to many publishing channels: the website, iOS and Android apps, satellite distribution, and back to partner organizations and news networks.

Despite the complexity of the process, it all has to work seamlessly. During breaking news or coverage of major events, the allowed margin of error is pretty much zero.

What do publishers need

So what do publishers need? They need:

  • An easy way to connect to many content sources.
  • The ability to push content to many destinations: web, mobile and probably B2B.
  • A CMS architected for the "Cloud", so they never have to worry about silly questions like "Are my servers up?" or "Do they scale?"
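The first two needs boil down to fan-in and fan-out around a single content store. Below is a minimal Python sketch of that shape, assuming hypothetical names throughout: the `Story` fields, the feed field names (`guid`, `headline`, `text`, and so on), and the adapter functions are illustrations, not any real feed's format. A real system would persist content and talk to real feeds and channels instead of calling plain functions.

```python
import html
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Story:
    """Hypothetical normalized shape that every incoming source is mapped into."""
    id: str
    title: str
    body: str
    source: str


# Source adapters turn one feed's raw payload into a Story (field names are assumed).
def from_wire(raw: dict) -> Story:
    return Story(id=raw["guid"], title=raw["headline"], body=raw["text"], source="wire")


def from_partner(raw: dict) -> Story:
    return Story(id=raw["id"], title=raw["title"], body=raw["content"], source="partner")


# Destination adapters render a Story for one channel.
def to_web(story: Story) -> str:
    return f"<h1>{html.escape(story.title)}</h1><p>{html.escape(story.body)}</p>"


def to_mobile(story: Story) -> str:
    # Mobile gets a short teaser: the title plus the first 40 characters of the body.
    return f"{story.title}: {story.body[:40]}"


@dataclass
class Hub:
    """Fan-in from many sources, fan-out to many destinations."""
    sources: Dict[str, Callable[[dict], Story]]
    destinations: Dict[str, Callable[[Story], str]]
    store: Dict[str, Story] = field(default_factory=dict)

    def ingest(self, source_name: str, raw: dict) -> Story:
        # Normalize the raw payload with the matching adapter, then store it.
        story = self.sources[source_name](raw)
        self.store[story.id] = story
        return story

    def publish(self, story_id: str) -> Dict[str, str]:
        # Push one stored story to every registered destination.
        story = self.store[story_id]
        return {name: render(story) for name, render in self.destinations.items()}
```

With this shape, adding a new source or a new channel means registering one more adapter; nothing else in the hub changes.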

Unfortunately, no CMS is architected around these concerns.

So let's talk about how such a system should be built.


1. APIs First

APIs are the pipes of the digital world. They are the only universal way to connect anything to anything else on the web. Unfortunately, since the web started in a desktop-centric world, APIs were an afterthought. Historically, we built a website first and then maybe added an API as a window into our content.

This is the wrong approach. Your website is just one of many publishing destinations. Increasingly, it's not even the most important one, since mobile viewership is clearly on the rise. Don't treat your website as special. Put all of your content and functionality into APIs, and deliver everything through those APIs to any destination, whether it's a website or Google Glass.
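Here is a minimal Python sketch of "APIs first", assuming a hypothetical in-memory store and hypothetical function names (`api_get_story`, `website_page`, `mobile_card`). The point is the dependency direction: the website and the mobile app are both just consumers of the same API, and neither reads the content store directly.

```python
import json
from typing import Dict

# Hypothetical content store standing in for the real backend.
STORIES: Dict[str, dict] = {
    "42": {"id": "42", "title": "Budget Talks Resume", "summary": "Lawmakers return to the table."},
}


def api_get_story(story_id: str) -> str:
    """The API is the single way in: it returns JSON, not HTML.

    In this sketch an unknown id simply raises KeyError.
    """
    return json.dumps(STORIES[story_id])


def website_page(story_id: str) -> str:
    # The website is just another API client; it never touches STORIES.
    story = json.loads(api_get_story(story_id))
    return f"<article><h1>{story['title']}</h1><p>{story['summary']}</p></article>"


def mobile_card(story_id: str) -> dict:
    # A mobile app consumes the same API and chooses its own presentation.
    story = json.loads(api_get_story(story_id))
    return {"headline": story["title"]}
```

Because every channel goes through `api_get_story`, adding a new destination never requires a new path into the data.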


2. Hypermedia

In publishing, we need things to just work. Editors should never have to worry whether heavy traffic brought down their wonderful piece of content, or whether something called "replication" broke and that's why nothing is publishing. Things, however complex, must just work, under any load.

This is hard. Where do we look for help?

The most scalable, most distributed system humankind has ever created is the World Wide Web. We can learn a lot from its design. Better yet, very smart people have already spent years analyzing what makes the web's architecture work.

The key to the scalability of the web's architecture is Hypermedia.

Hypermedia is the matter of which the World Wide Web is made. Much like the physical world is built of interacting elementary particles (bosons and fermions), the web is essentially a universe of myriad interacting hypermedia documents.

"Hypermedia" sounds alien and intimidating. What is Hypermedia? It's a type of content that not only carries data but also links to other documents and can be interacted with.
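To make that concrete, here is a minimal Python sketch of a hypermedia client. The documents, the example.org URLs, and the link relation names (`latest`, `author`) are all hypothetical, and a dictionary lookup stands in for an HTTP GET. Each document carries data plus links, and the client only knows the entry point and the meaning of each relation; every other URL is discovered from the documents themselves.

```python
from typing import Dict, List

# Hypothetical documents keyed by URL, standing in for HTTP responses.
# Each one carries data plus links that tell the client where it can go next.
DOCUMENTS: Dict[str, dict] = {
    "https://api.example.org/": {
        "data": {"name": "Example News API"},
        "links": {"latest": "https://api.example.org/stories/latest"},
    },
    "https://api.example.org/stories/latest": {
        "data": {"title": "Morning Headlines"},
        "links": {"author": "https://api.example.org/people/7"},
    },
    "https://api.example.org/people/7": {
        "data": {"name": "Jane Reporter"},
        "links": {},
    },
}


def fetch(url: str) -> dict:
    # A real client would issue an HTTP GET here; this is a dictionary lookup.
    return DOCUMENTS[url]


def follow(start_url: str, relations: List[str]) -> dict:
    """Start at the entry point and follow links by relation name, not by hardcoded URL."""
    doc = fetch(start_url)
    for rel in relations:
        doc = fetch(doc["links"][rel])
    return doc
```

For example, `follow("https://api.example.org/", ["latest", "author"])` reaches the author document without the client ever constructing a URL. If the server later moves its stories elsewhere, clients that follow the `latest` link keep working, which is a large part of why this style scales.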

Most of you are already familiar with one Hypermedia type: HTML.

We take HTML for granted, but HTML is really special. It's just a handful of tags, yet the wealth of creative user interfaces and user experiences that people are able to build with HTML is truly astounding. You have a handful of simple rules and you get enormous creativity. That's magical. That's the kind of thing you need when you are building something revolutionary.

However, HTML as a hypermedia type was designed for the human-centric web: for websites and for rendering content. We said that we need to build APIs first, and HTML isn't ideal for exchanging structured content via APIs.

Collection Document

There are, however, other hypermedia types that were designed for that very purpose. As a matter of fact, when building PMP (the Public Media Platform), we designed a very robust one for media organizations. It's called Collection.Document, and you can find all the information about it at cdoc.io. I highly recommend checking it out.
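The sketch below does not reproduce the Collection.Document specification (cdoc.io is the authority on that). It only illustrates the general idea of a hypermedia collection for APIs: structured items instead of HTML markup, with paging expressed as links. The field names (`attributes`, `items`, `links`, `next`) and the URLs are assumptions chosen for illustration.

```python
from typing import Dict, Iterator, Optional

# Hypothetical paged collection documents keyed by URL.
# Field names are illustrative assumptions, not the Collection.Document spec.
PAGES: Dict[str, dict] = {
    "/stories?page=1": {
        "attributes": {"title": "Latest stories"},
        "items": [
            {"attributes": {"title": "Story A"}, "links": {"self": "/stories/a"}},
            {"attributes": {"title": "Story B"}, "links": {"self": "/stories/b"}},
        ],
        "links": {"next": "/stories?page=2"},
    },
    "/stories?page=2": {
        "attributes": {"title": "Latest stories"},
        "items": [
            {"attributes": {"title": "Story C"}, "links": {"self": "/stories/c"}},
        ],
        "links": {},  # no "next" link: this is the last page
    },
}


def iter_items(start: str) -> Iterator[dict]:
    """Yield every item in a collection by following "next" links.

    The client never builds page URLs itself; it stops when the
    server stops offering a "next" link.
    """
    url: Optional[str] = start
    while url is not None:
        page = PAGES[url]
        yield from page["items"]
        url = page["links"].get("next")
```

Collecting `item["attributes"]["title"]` over `iter_items("/stories?page=1")` yields Story A, Story B and Story C. Unlike scraping HTML, the client reads structured fields directly and lets the links drive navigation.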


3. Linked APIs

To build a truly interconnected system that allows us to freely publish content in the Internet of Things, it isn't enough to just build APIs first, or to just use Hypermedia. We need to combine both of these tools to create something I call Linked APIs.

Linked APIs are a new breed of APIs that fix a significant flaw with the current generation of APIs.

The problem with current APIs is that most of them, at best, open narrow windows in the solid walls surrounding siloed data islands. Even the largest and best-known APIs, such as those provided by Twitter, Facebook or Google, only operate on data within their own databases.

Take Twitter as an example: there is a lot that you can do with its public API, but in the end all of the content created through it resides on Twitter's servers. The same is true for Facebook, of course.

In that sense, current APIs create isolated, guarded data islands in the universe of the web, which is very "anti-web": the web was created in the spirit of decentralized, equal participation. On the web, everybody publishes everywhere and owns their data, and we have ways to reach that data through hyperlinks, RSS feeds, activity streams, Google search and other methods. APIs have not reached that stage of maturity yet. They are highly centralized in terms of data storage, and virtually none of them ever link to other APIs.

We need APIs that link to each other. Hyperlinks were essential for the growth of the human web. They are equally essential for the Internet of Things ahead of us.
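Here is a minimal Python sketch of what a linked API could look like, under loudly stated assumptions: the two APIs, their hostnames, paths and link relation names are hypothetical, and a host registry stands in for real HTTP. A story in a newsroom's API does not copy the partner's photo; it links to it in the partner's own API, and the client follows that link across hosts exactly as it would within one API.

```python
from typing import Dict
from urllib.parse import urlparse

# Two hypothetical, independently operated APIs. Each owns and stores its own data.
NEWSROOM_API: Dict[str, dict] = {
    "/stories/1": {
        "data": {"title": "River Cleanup Begins"},
        # This link points into a *different* API; the newsroom keeps no copy of the photo.
        "links": {"photo": "https://photos.example.net/images/99"},
    },
}
PARTNER_PHOTO_API: Dict[str, dict] = {
    "/images/99": {
        "data": {"caption": "Volunteers at the river", "credit": "Partner Station"},
        "links": {},
    },
}

# Hypothetical registry mapping hosts to the API that serves them.
# A real client would replace this lookup with HTTP requests.
HOSTS: Dict[str, Dict[str, dict]] = {
    "news.example.org": NEWSROOM_API,
    "photos.example.net": PARTNER_PHOTO_API,
}


def fetch(url: str) -> dict:
    # Route the request to whichever API owns the URL's host.
    parsed = urlparse(url)
    return HOSTS[parsed.netloc][parsed.path]


def resolve(url: str, rel: str) -> dict:
    """Fetch a document and follow one of its links, whichever API it points to."""
    doc = fetch(url)
    return fetch(doc["links"][rel])
```

Calling `resolve("https://news.example.org/stories/1", "photo")` returns the partner's photo document, credit included, while the partner stays the owner and the single source of that data.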

We can only truly have open and free data if we jailbreak it out of the silos where it is currently stashed away. Linked APIs are the key to data freedom on the web. They are the engine of that freedom.

Let's get the engine cranking!