by cout 1103 days ago
With the mention of the parser's performance, I am reminded of Rich Kilmer's 2004 RubyConf presentation. Rich used Ruby to test the reliability of a distributed system with hundreds of Java VMs. Some of the serialized data was stored as XML (over 1M lines!), which was slow to parse and load. Rich modified the program to serialize the data as Ruby code, which loaded much faster (https://www.infoq.com/news/2007/06/infoq-interview-rich-kilm...):

> Chad Fowler and I over basically 2 weeks, took the Java Debug Wire Protocol specification [...] turned it into a DSL in Ruby, and used that DSL to generate the packets for sending and receiving data. So we used the DSL in Ruby as a generator to generate Ruby code, as the whole protocol and then I used that at Darpa. They were trying to say “we could freeze the agent society from within the agent society will send messages, and it took about 7-8 minutes for all the messages to propagate and everything to freeze and then go quiet. I had a Ruby process that was running all 300 VM’s were underneath it, and I could freeze it in about a half of second. All 300 of them! And you actually could watch the CPU use because we had a monitor and the CPU use has dropped to zero. And it freaked them out. And what was great was you could turn it back on, and all the agents came back on. But time had been lost. And it was like alien abduction lost time, ten minutes went away, “what happened to us?” It was a bizarre thing because they were agents and they were planning on things and all of sudden 10 minutes just went away. But it was interesting to show how Ruby could actually be used as this kind of harnesses to wrap around things like systems.

2 comments

Nice cover story to hide your ~totally existing~ time machine ;)
>Some of the serialized data was stored as XML (over 1M lines!), which was slow to parse and load. Rich modified the program to serialize the data as Ruby code, which loaded much faster

So he took a data format designed for human readability and converted it to a data format that's designed purely to be read by Ruby, and people are surprised that it's faster?

I would argue that an XML file that's "over 1M lines!" is no longer human readable.
That's not the point though.
I think a bigger difference is that the XML is parsed by a Ruby program and the generated Ruby code is parsed by a C program.
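For anyone curious what "serialize the data as Ruby code" can look like, here is a minimal sketch. It is not Rich's actual code; the `records` data and its field names are made up for illustration. The idea is to write plain hashes and arrays out as Ruby literals and load them back with `eval`, so the interpreter's own parser does the reading instead of an XML library.

```ruby
# Hypothetical records standing in for the serialized test data.
records = [
  { "vm" => 1, "state" => "running", "agents" => ["a1", "a2"] },
  { "vm" => 2, "state" => "frozen",  "agents" => [] },
]

# Serialize as Ruby source: inspect on plain hashes, arrays, strings and
# integers yields a literal that the interpreter can read back.
source = "[\n" + records.map { |r| "  #{r.inspect}," }.join("\n") + "\n]\n"

# Loading is just evaluation, so the interpreter's built-in parser does the work.
# Only do this with data you generated yourself: eval runs arbitrary code.
loaded = eval(source)

puts loaded == records
```

In practice the source string would be written to a file and read back with `load` or `eval(File.read(path))`. Whether this beats a given XML parser depends on the parser and the data, so treat it as an illustration of the approach rather than a benchmark.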
Using something like protobuf would have required fewer steps and adhered to a standardized format.
It sounds like the novel idea was thinking to do that in the first place.
"Use a more performant serialization format" is hardly novel. It's why things like protobuf exist.
lol what do you want here. "ok you're right, the approach that worked is bad we should have waited for the astute technical insight of an anonymous internet commenter two decades later." you got it bud A++ are you looking for work you seem delightful.
I'm not sure why you're so upset over this. All I've done is point out that some people put a great deal of engineering effort into solving a problem that would have been trivially solved by choosing a different serialization format. Even 10 years ago the technology to do so was widely available and standardized.

But I mean sure, WOOHOO, what a great achievement!