by bhntr3
3487 days ago
I've looked at the code and messed around with Arrow. It seems like a performance optimization that solves a small sliver of the problem. It could help with the Parquet/Thrift version issues they mentioned, but I don't see any guarantee it won't introduce its own version and compatibility problems. If the initial implementations are as buggy as the ones described in TFA, it could actually be a lot worse. In general, I've learned to be skeptical of any new big data solution. Hadoop and Hive are clumsy, but as someone on my team said, "they've found and fixed the tens of thousands of bugs". It seems to take five years before any significant new solution is stable and reliable enough to be used on large, complex workloads, which makes me really uncertain how we get out of this situation. Maybe something like Arrow is a silver bullet that fixes everything with minimal complexity and thus few bugs. But I'm skeptical.