Google Protocol Buffers - the Good, the Bad and the Ugly

Google has open-sourced its "language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more": Protocol Buffers.

Google claims it's like XML, but better, mostly because: "[Protocol Buffers] are 3 to 10 times smaller, and 20 to 100 times faster".

My feelings about this news are mixed. As I was just telling Doug, you can't argue with Google when it comes to matters of performance and speed, BUT you can feel disgusted that a giant like Google uses its muscle to diminish and harm a crucial standard like XML. XML took so long to get adopted, made so much possible and is still so fragile that you can't take this matter lightly.

OK, maybe the PB thing is faster and smaller and blah, blah, blah and maybe it's not as cumbersome as CORBA was, so it's not total evil, BUT (I repeat - BUT) let's be honest here - not everybody is Google and I can bet 90% of systems just do not care about the same things Google does. So, XML is fine for most applications.

However, now that Google is pushing one more of its bloated technologies (want another example? Think GWT), a lot of people will adopt it just because it's a Google thing. And it may harm XML, and it may harm the industry.

So, you see - as much as we all love open-source, sometimes when open-source gets intermixed with big, corporate politics - things can go south.

And last but not least, if you want a more object-oriented, smaller, faster exchange format, there is JSON! JSON is well adopted and supported, so why, God, why do mere mortals like ourselves need Protocol Buffers?

We don't. Please, don't force us.

u r crazee. xml, json, and pb are COMPLETELY DIFFERENT paradigms for COMPLETELY DIFFERENT purposes. your entire post is dripping with stupidity. how did we create a generation of dumb programmers?

Nice Article

Anonymous,

you are wrong, son. Google itself calls Protocol Buffers an XML alternative in their press-release. You should learn how to write properly, before you call anybody stupid, too.

Irakli,

nice analysis. I second your concern.

A major difference between protocol buffers and JSON is that protocol buffers use a binary format, while JSON is plain text.

Because it's binary, the format is more compact and easier for a computer to interpret, which generally makes protocol buffers faster than JSON.
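To make the size difference concrete, here is a minimal sketch that encodes one record as JSON and as a fixed binary layout. It does not use Protocol Buffers itself: Python's struct module stands in for "a binary format", and the record and its field names are made up for illustration.

```python
import json
import struct

# Hypothetical record: an id, a temperature reading and a flag.
record = {"id": 1234567, "temperature": 21.5, "active": True}

# Text encoding: field names and punctuation travel with every record.
as_json = json.dumps(record).encode("utf-8")

# Fixed binary layout (an assumption for this sketch):
# little-endian uint32, float64, bool. No field names are sent.
LAYOUT = "<Id?"
as_binary = struct.pack(
    LAYOUT, record["id"], record["temperature"], record["active"]
)

print(len(as_json), "bytes as JSON,", len(as_binary), "bytes as binary")

# Decoding is a single unpack call against the agreed layout.
decoded = struct.unpack(LAYOUT, as_binary)
```

The catch is visible in the last line: the receiver has to know the layout in advance, which is the job a .proto schema does for Protocol Buffers.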

Minor Improvement

You can use compression during transmission if the data size is a concern. The biggest problem with PB is indeed the fact that it is binary. Binary exchange protocols have shown time and time again that they have interoperability and portability issues.
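The compression suggestion can be sketched with nothing but the standard library. This is an illustration, not a benchmark: the payload is a made-up list of records, and real savings depend on how repetitive the data is.

```python
import gzip
import json

# Hypothetical payload: many records that repeat the same field names.
records = [{"id": i, "name": "sensor", "active": True} for i in range(1000)]

raw = json.dumps(records).encode("utf-8")
compressed = gzip.compress(raw)

print(len(raw), "bytes raw,", len(compressed), "bytes gzipped")

# The receiver reverses both steps to get the original structure back.
restored = json.loads(gzip.decompress(compressed).decode("utf-8"))
```

Compression shrinks what goes over the wire, but both ends still pay for producing and parsing the full text.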

Like I said in the article, nobody is arguing about performance - PB is definitely faster, but in most cases, for most applications, that level of optimization is not required, so that is irrelevant. On the other hand, PB has the potential to do significant damage to interoperability in the industry if it is positioned as an XML alternative. Unfortunately, Google made the mistake of clearly stating that on their website: they called it "better than XML".

Binary interoperability and portability issues?

That's quite an assertion regarding the portability of binary data. If byte order is your issue, that's child's play compared to XML portability issues! Ever take a look at an XML file containing DATE and REAL NUMBER data? You generate the XML data in the US and read it back in Germany -- this causes parsing errors using the most common text conversions. That's a portability issue in my book.

At the beginning of every XML-based project I have to spec out: here's how we format dates, here's how we format numbers, here's how we avoid parsing errors. Most programmers don't think in cross-platform or internationalized ways, so they constantly rediscover these same problems. The mistakes are allowed by free text formatting/conversion, and they are actually not possible in the binary PB.
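A small sketch of the kind of mistake described above, assuming values are written as free text with one locale's conventions and read back with another's. The sample strings are invented, and the binary half uses Python's struct module rather than PB.

```python
import struct
from datetime import datetime

# A number written with a comma as the decimal separator (German style).
german_text = "3,14"

try:
    float(german_text)
    parsed_ok = True
except ValueError:
    parsed_ok = False  # the default parser expects "3.14"

# The same date text means different days under month-first
# and day-first conventions.
ambiguous = "04/05/2008"
month_first = datetime.strptime(ambiguous, "%m/%d/%Y")  # April 5
day_first = datetime.strptime(ambiguous, "%d/%m/%Y")    # May 4

# A binary double has no decimal separator to disagree about.
wire = struct.pack("<d", 3.14)
(value,) = struct.unpack("<d", wire)
```

Text formats can avoid this too, but only if every writer and reader sticks to one agreed, locale-independent representation.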

Don't brush aside the well-documented cost of parsing text -- many white papers now blame XML parsing as being the biggest performance bottleneck in the exchange of information. XML is not good enough for many of the tasks we use it for. I hear "good enough" all the time in this line of work, and people end up rewriting "good enough" when it doesn't scale to unanticipated new use cases.

There are 2 big reasons to use PB over JSON and XML:

1) Positional binding. While it's true that JSON is less bloated than XML (which is overbloated), it still sends the name of the attribute with each record. That creates an enormous amount of overhead. PB, on the other hand, binds fields by a small number assigned in the schema and doesn't send the attribute names at all.

2) Binary data. Yes, there is still binary data! Images, files, etc. that need to be transferred over the network. Binary data is natural for PB while JSON and XML offer no efficient way to transfer binary data (Base64 is not a very efficient way). Try uploading a 100MB file over SOAP and over PB, you'll see the difference.
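Both points can be seen in the bytes themselves. In the Protocol Buffers wire format each field is introduced by a key built from its field number and a wire type, so names never travel with the data, and byte strings are written raw behind a length prefix. The hand-rolled encoder below is a minimal sketch of that idea, not the official library; the helper names are made up for illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


WIRETYPE_VARINT = 0  # integers
WIRETYPE_LEN = 2     # length-delimited: strings, bytes, nested messages


def encode_int_field(field_number: int, value: int) -> bytes:
    key = (field_number << 3) | WIRETYPE_VARINT
    return encode_varint(key) + encode_varint(value)


def encode_bytes_field(field_number: int, payload: bytes) -> bytes:
    key = (field_number << 3) | WIRETYPE_LEN
    return encode_varint(key) + encode_varint(len(payload)) + payload


# Field 1 holding the integer 150 takes three bytes: 08 96 01.
message = encode_int_field(1, 150)

# Field 2 holding three raw bytes: key, length, then the bytes untouched.
blob = encode_bytes_field(2, b"\x00\xff\x10")

print(message.hex(), blob.hex())
```

The JSON equivalent of the first field would have to spell out the attribute name every time, and the second would need Base64.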

The biggest advantage XML and JSON hold over PB is that they are readable by a human being. Unless that's needed, PB is a better alternative, IMO.

Well, Protocol Buffers is not for mere mortals, glad you've seen this through. In fact, there are quite a few former Google engineers who are very accustomed to and fond of using PB for the wire protocol. And their buddies, who are still working at Google, wanted to help them diminish the pain of using XML, JSON or whatever, so they open-sourced the PB technologies.
Remember, Protocol Buffers is not for mere mortals like yourselves.

Nice article, BUT

JSON wasn't a standard in 2001, and now it is popular because it is small and integrates with JavaScript very easily.

PB sounds good because of one thing - It's BINARY! You don't have the roughly 33% overhead of using Base64 to encode the data. That is crucial on a low-bandwidth or heavily used network. The other thing is, of course, Google. If this is not the company to create (enforce) a standard, who is? I'm not aware of any standardized binary stream for an inter-process/object communication protocol. Google has the power to make Protocol Buffers popular.
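The Base64 overhead is easy to check: every 3 input bytes become 4 output characters, so the encoded form is about a third larger. A quick sketch, using random bytes as a stand-in for a binary file:

```python
import base64
import os

raw = os.urandom(3000)           # stand-in for binary file contents
encoded = base64.b64encode(raw)  # 4 output bytes per 3 input bytes

overhead = len(encoded) / len(raw) - 1
print(f"{len(raw)} -> {len(encoded)} bytes, overhead {overhead:.1%}")
```

Wrapping the result in a JSON string adds a little more for the quotes and the field name.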

An idea came to me. Why not use ZIP as the protocol? You have expandability (add new files), security (it supports encryption, even though it's weak) and speed (it's binary). The problem is that if somebody asks me for a binary stream and I recommend ZIP, he will be like "Why? Who uses it? Why not use something standard like Google's PB?"
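For what it's worth, the ZIP idea can be tried in a few lines. This is only a sketch of using an in-memory archive as a message container; the entry names are invented, and encryption is left out because Python's zipfile module cannot create encrypted archives.

```python
import io
import zipfile

# Build the "message" in memory: each entry is one named part.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("header.json", '{"version": 1}')
    archive.writestr("payload.bin", b"\x00\x01\x02\x03")

wire_bytes = buffer.getvalue()  # what would be sent over the network

# The receiver opens the bytes as an archive and picks out the parts.
with zipfile.ZipFile(io.BytesIO(wire_bytes)) as archive:
    names = archive.namelist()
    payload = archive.read("payload.bin")
```

Adding a new part later is just another writestr call, which is the expandability mentioned above. The per-entry headers and central directory make it heavy for small messages, though.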

I'm currently interested in removing the roughly 33% overhead of encoding a binary file with Base64 and then using JSON.
A binary protocol would be nice; unfortunately, PB is NOT implemented for JavaScript.