Google Protocol Buffers - the Good, the Bad and the Ugly
Google has released as open source its "language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more": Protocol Buffers.
Google claims it's like XML, but better mostly because: "[Protocol Buffers] are 3 to 10 times smaller, and 20 to 100 times faster".
My feelings about this news are mixed. Like I was just telling Doug, you can't argue with Google when it comes to matters of performance and speed, BUT you can feel disgusted at the fact that a giant like Google uses its muscle to diminish and harm a crucial standard like XML. XML took so long to get adopted, made so much possible and is still so fragile, that you can't take this matter lightly.
OK, maybe the PB thing is faster and smaller and blah, blah, blah and maybe it's not as cumbersome as CORBA was, so it's not total evil, BUT (I repeat - BUT) let's be honest here - not everybody is Google and I can bet 90% of systems just do not care about the same things Google does. So, XML is fine for most applications.
However, now that Google is pushing one more of its bloated technologies (want another example? Think GWT) - a lot of people will adopt it just because it's a Google thing. And it may harm XML, and it may harm the industry.
So, you see - as much as we all love open-source, sometimes when open-source gets intermixed with big, corporate politics - things can go south.
And last but not least, if you want a more object-oriented, smaller, faster exchange format, there is JSON! JSON is well adopted and supported, so why, God, why do mere mortals like ourselves need Protocol Buffers?
We don't. Please, don't force us.


u r crazee. xml, json, and pb are COMPLETELY DIFFERENT paradigms for COMPLETELY DIFFERENT purposes. your entire post is dripping with stupidity. how did we create a generation of dumb programmers?
Nice Article
Anonymous,
you are wrong, son. Google itself calls Protocol Buffers an XML alternative in their press-release. You should learn how to write properly, before you call anybody stupid, too.
Irakli,
nice analysis. I second your concern.
A major difference between protocol buffers and JSON is that protocol buffers use a binary format, while JSON is plain text.
Because it's binary, the format is more compact and easier to interpret by a computer - which makes protocol buffers faster than JSON.
Minor Improvement
You can use compression during transmission if the data size is a concern. The biggest problem with PB is indeed the fact that it is binary. Binary exchange protocols have proven time and time again to have interoperability and portability issues.
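The compression point can be sketched with the Python standard library alone. The record layout and field names below are invented for illustration and do not come from any real system.

```python
import json
import zlib

# Hypothetical payload: 1000 records that repeat the same field names.
records = [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(1000)]

raw = json.dumps(records).encode("utf-8")
compressed = zlib.compress(raw)

# Compare the size on the wire before and after compression.
print(len(raw), len(compressed))

# Round trip to confirm that nothing is lost.
restored = json.loads(zlib.decompress(compressed).decode("utf-8"))
```

Compression shrinks the repeated field names, but the receiver still has to decompress and parse the text, so it addresses size rather than parsing cost.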
Like I said in the article, nobody is arguing about performance - PB is definitely faster, but in most cases, for most applications - that level of optimization is not required, so that is irrelevant. On the other hand - PB has the potential to do significant damage to interoperability in the industry if it is positioned as an XML alternative. Unfortunately, Google made the mistake of clearly stating that on their website: they called it "better than XML".
Binary interoperability and portability issues?
That's quite an assertion regarding the portability of binary data. If byte order is your issue, that's child's play compared to XML portability issues! Ever take a look at an XML file containing DATE and REAL NUMBER data? You generate the XML data in the US and read it back in Germany -- this causes parsing errors using the most common text conversions. That's a portability issue in my book.
I have to spec out at the beginning of every XML-based project, here's how we format dates, here's how we format numbers, here's how we avoid parsing errors. Most programmers don't think in cross-platform or internationalized ways, so they constantly rediscover these same problems. The mistakes are allowed by free text formatting/conversion, and they are actually not possible in the binary PB.
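A small sketch of the date and number problem, using only the Python standard library. The sample values are made up, and the fixed-width double at the end is a hand-packed stand-in for a binary field (Protocol Buffers encodes a `double` as 8 little-endian bytes), not a call into the Protocol Buffers library.

```python
import struct
from datetime import date, datetime

# The same text means two different dates depending on the reader's convention.
ambiguous = "03/04/2008"
us_date = datetime.strptime(ambiguous, "%m/%d/%Y").date()  # month first
de_date = datetime.strptime(ambiguous, "%d/%m/%Y").date()  # day first

# Text fix: agree on ISO 8601 up front, which has one reading.
iso_text = date(2008, 3, 4).isoformat()
iso_date = date.fromisoformat(iso_text)

# The same text also means two different numbers.
number_text = "1.234"
us_value = float(number_text)  # "." is the decimal point
de_value = float(number_text.replace(".", "").replace(",", "."))  # "." groups thousands

# Binary alternative: a fixed-width little-endian double has a single reading.
packed = struct.pack("<d", 1.234)
unpacked = struct.unpack("<d", packed)[0]
```

The text formats work once everyone agrees on the convention; the binary form simply leaves no convention to agree on.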
Don't brush aside the well-documented cost of parsing text -- many white papers now blame XML parsing as being the biggest performance bottleneck in the exchange of information. XML is not good enough for many of the tasks we use it for. I hear "good enough" all the time in this line of work, and people end up rewriting "good enough" when it doesn't scale to unanticipated new use cases.
There are 2 big reasons to use PB over JSON and XML:
1) Positional binding. While it's true that JSON is less bloated compared to XML (which is over bloated), it still sends the name of the attribute with each record. That creates an enormous amount of overhead. PB, on the other hand, uses positional binding and doesn't send the attribute names at all.
2) Binary data. Yes, there is still binary data! Images, files, etc. that need to be transferred over the network. Binary data is natural for PB while JSON and XML offer no efficient way to transfer binary data (Base64 is not a very efficient way). Try uploading a 100MB file over SOAP and over PB, you'll see the difference.
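To make both points concrete, here is a hand-rolled sketch of the Protocol Buffers wire format in plain Python. It does not use the Protocol Buffers library, and the helper names are hypothetical. Strictly speaking, the wire format identifies each field by a small number assigned in the schema rather than by position, but the effect described above is the same: no attribute names are sent.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Key is (field_number << 3) | wire_type; wire type 0 means varint."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_bytes_field(field_number: int, data: bytes) -> bytes:
    """Wire type 2 means length-delimited: key, length, then the raw bytes."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


# Field number 1 carrying the integer 150, versus the same value in JSON.
binary_record = encode_varint_field(1, 150)
json_record = json.dumps({"id": 150}).encode("utf-8")
print(len(binary_record), len(json_record))

# Field number 2 carrying 300 raw bytes: 3 bytes of framing, no re-encoding.
blob = bytes(300)
binary_blob = encode_bytes_field(2, blob)
print(len(binary_blob) - len(blob))
```

The field name `id` appears only in the JSON record; on the binary side it lives in the shared schema, which is why both ends need that schema to make sense of the bytes.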
The biggest advantage XML and JSON hold over PB is that they are readable by a human being. Unless that's needed PB is a better alternative, IMO.
Well, Protocol Buffers is not for mere mortals, glad you've seen this through. In fact, there are quite a few former Google engineers who are very accustomed to and fond of using PB for the wire protocol. And their buddies, who are still working at Google, wanted to help them diminish the pain of using XML, JSON or whatever, so they open-sourced the PB technologies.
Remember, Protocol Buffers is not for mere mortals like yourselves.
Nice article, BUT
JSON wasn't a standard in 2001, and now it is popular, because it is small and can be integrated with JavaScript very easily.
PB sounds good because of one thing - It's BINARY! You don't have the 30% overhead of using Base64 to encode the data. That is crucial on a low-bandwidth or heavily used network. The other thing is, of course, Google. If this is not the company to create (enforce) a standard, who will? I'm not aware of any standardized binary stream for inter-process/object communication protocols. Google has the power to make this Protocol Buffers popular.
An idea came to me. Why not use ZIP for the protocol? You have the expandability (add new files), security (supports encryption, even though it's weak), speed (it's binary). The problem is if somebody asks me for a binary stream, and I recommend him to use ZIP, he will be like "Why? Who uses it? Why not use something standard like Google's PB."
I'm currently interested in removing the 30% overhead of encoding a binary file with Base64 and then using JSON.
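The Base64 overhead is easy to measure with the Python standard library. Base64 maps every 3 bytes to 4 characters, so the growth is about one third, close to the 30% figure quoted here. The payload and the `file` key are placeholders for illustration.

```python
import base64
import json
import os

# Placeholder binary payload standing in for a file.
payload = os.urandom(3000)

# Base64 turns every 3 input bytes into 4 output characters.
encoded = base64.b64encode(payload)
overhead = len(encoded) / len(payload) - 1
print(f"{overhead:.1%}")

# JSON has no binary type, so the bytes have to travel as a string.
as_json = json.dumps({"file": encoded.decode("ascii")})
decoded = base64.b64decode(json.loads(as_json)["file"])
```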
A binary protocol will be nice, unfortunately PB is NOT implemented for JavaScript.
and there is AMF3
For a long time Macromedia, now Adobe, had its own binary protocol for communication between the Flash player and their backend applications. Last year (if I am correct) they released the AMF3 protocol and many implementations for different languages now exist.
It has been proven over and over again that its encoding, decoding and transfer speeds are far better than writing, transmitting and parsing XML documents, even with the native XML support since ActionScript3.
I would like to see an adoption of either protocol (AMF3, PB) in the realization of popular WebServices. What if Facebook, Youtube, Flickr and Amazon supported this content-type?
Say NO to the pointy-bracket language called XML.
used it.
Hi.
I used pb to build a payload mechanism to sit on top of our ipc layer; it behaved very well, was very easy to figure out and it is extensible. I'm pretty much agnostic regarding the format argument, so I won't venture an opinion (admittedly this is partly driven by either ignorance or indifference, or a combination of both). pb replaced our in-house codecs nicely and gives us an API that is easily inferred from the IDL. When I can present my co-workers with data-packaging code that is 1/10 the size (compared with the in-house code that needs to be maintained), and which is more extensible, faster and easier to comprehend than the original, it makes a good argument. Not a silver bullet of course, but then again, I don't believe in werewolves. I don't really know what that means, but it makes me feel witty for some reason.
PB is good thing, so is XML
Thank you for sharing your experience, it is great.
I just want to add this point: the original blog post did not in any way try to diminish PB per se. Of course there's value in binary protocols, especially if one is well tested by a huge company like Google and put out as an open standard. That's a great thing.
However, it becomes a problem when it is positioned as an "XML killer". Now that is NOT a wise thing or a nice thing. While there are definitely cases where PB is more appropriate than XML (and you and the previous commenter gave excellent examples, I think), that does not in any way mean that XML is crap and we should just all drop it and use PB instead.
As a conclusion: it's been a year since this blog post, and clearly PB has not even started to "kill" XML, not even remotely, so I guess that shows how different the roles of the two are.
Thank you