by jnewhouse 3360 days ago
Author here, let me know if you have any questions/want more details.
3 comments

Thanks for writing about your experiences. Why not use an existing serialization framework, such as Protobuf, instead of building something in-house?
I don't think Protobuf was available for public use when we came up with this format, which began around 2005. We use Protobuf internally, and some of our columns are actually byte[] values containing Protobuf data. We now support Parquet and are doing more work with other big data tools, but we've had a hard time matching the performance of our custom stuff.
1) Can you provide any more details about how Rowfiles are structured and/or implemented? Specifically, how does it handle nested objects? Does it support `transient`? Do `writeObject` and/or `readObject` come into play?

2) Do you feel this is a generic enough solution that you would consider submitting it as a JSR?

It natively supports a limited set of Columns: basically boxed primitives, java.util.Date, joda.time.DateTime, and arrays and double arrays of both boxed and unboxed versions of the preceding. The list of Columns in use determines how a Row is read from and written to a byte buffer. The byte buffer is almost entirely the fields' data, with one or two bytes before each field describing how that field is encoded. Nested objects aren't handled out of the box, but you can define a UserRowField that serializes any Serializable class to bytes and deserializes it back. This gets used a lot for our SQL map-reduce function. The downside is that you need to have the UserRowField implementation in your classpath in order to read the Row, which is not generally the case.
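To make that layout a bit more concrete, here is a minimal Java sketch of the general idea, not the actual Rowfile code. Everything in it is an assumption made for illustration: the names RowSketch, ColumnType, TAG_NULL and TAG_PRESENT are hypothetical, only three column types are covered, and the single tag byte only distinguishes null from present, whereas the real encoding bytes presumably carry more information.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

// Hypothetical sketch: none of these names come from the real Rowfile code.
class RowSketch {
    // Assumed one-byte tags written before each field to say how it is encoded.
    static final byte TAG_NULL = 0;
    static final byte TAG_PRESENT = 1;

    // Assumed subset of column types; the real format supports more.
    enum ColumnType { LONG, DOUBLE, DATE }

    // The column list drives the write: one tag byte, then the raw field data.
    static byte[] write(List<ColumnType> columns, Object[] values) {
        // Worst case per field: 1 tag byte + 8 data bytes.
        ByteBuffer buf = ByteBuffer.allocate(columns.size() * 9);
        for (int i = 0; i < columns.size(); i++) {
            Object v = values[i];
            if (v == null) {
                buf.put(TAG_NULL); // a null field costs a single byte
                continue;
            }
            buf.put(TAG_PRESENT);
            switch (columns.get(i)) {
                case LONG:   buf.putLong((Long) v); break;
                case DOUBLE: buf.putDouble((Double) v); break;
                case DATE:   buf.putLong(((Date) v).getTime()); break;
            }
        }
        // Trim to the bytes actually written.
        return Arrays.copyOf(buf.array(), buf.position());
    }

    // The same column list drives the read; no field names or types are stored.
    static Object[] read(List<ColumnType> columns, byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        Object[] out = new Object[columns.size()];
        for (int i = 0; i < columns.size(); i++) {
            if (buf.get() == TAG_NULL) {
                continue; // leave out[i] as null
            }
            switch (columns.get(i)) {
                case LONG:   out[i] = buf.getLong(); break;
                case DOUBLE: out[i] = buf.getDouble(); break;
                case DATE:   out[i] = new Date(buf.getLong()); break;
            }
        }
        return out;
    }
}
```

In this sketch the reader has to be given the same column list the writer used, because the bytes themselves carry no schema. That is the same kind of dependency as the UserRowField case mentioned above, where the reader needs the implementation on its classpath.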
So... what's quantcast?
We're a big data advertising and measurement company based in San Francisco. We run online display ad campaigns for marketers across realtime bidding (RTB) exchanges, such as those run by Google and AppNexus. We also provide a publisher product to give site owners insights into their audience. Stack Overflow's profile is at https://www.quantcast.com/stackoverflow.com.