| HN Mirror



	by raja_sekar 2430 days ago
	Author of the repo here. It is definitely not orders of magnitude faster, and I don't think I claimed that anywhere. But yes, the JVM is sometimes a problem for in-memory big data processing. Spark itself tried to address this with its Tungsten engine, which circumvents huge Java objects by using native types through JNI (sun.misc.Unsafe). This is why DataFrames are generally much faster than RDDs (which typically use Java objects), and why only certain native types are allowed in DataFrames.

	This project was just for exploring the feasibility of implementing Spark in a native language. Closure serialization can be a nightmare here. If it actually translated to even 2-4x better performance than Spark, which is very difficult to achieve considering the years of optimization that went into Spark (Spark DataFrames are already highly optimized), it could be a good alternative and could reduce cloud costs a bit, especially if the Python APIs remain compatible.

	So I just thought of open-sourcing it; if others see the benefits, it will automatically grow with the help of the community. It is still a long, long way from Spark-level maturity. Spark is indeed a huge ecosystem built upon an already big Hadoop ecosystem.
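
A rough illustration of the boxed-objects point above, written as a Python analogy rather than Spark or JVM code. It assumes CPython and only compares the memory taken by a list of individually allocated float objects with the same values packed as raw 8-byte doubles; the helper names `boxed_size` and `packed_size` are made up for this sketch, and the comparison says nothing about Spark's actual numbers.

```python
import sys
from array import array


def boxed_size(values):
    """Approximate bytes for a list plus each separately allocated float object."""
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)


def packed_size(values):
    """Bytes for the same values stored contiguously as raw C doubles."""
    return sys.getsizeof(array("d", values))


# Hypothetical sample data: 100,000 floating-point values.
values = [float(i) for i in range(100_000)]

boxed = boxed_size(values)
packed = packed_size(values)

print(f"boxed objects: {boxed} bytes")
print(f"packed doubles: {packed} bytes")
```

The packed layout only works because every element has the same fixed-width primitive type, which loosely mirrors why a columnar, native-type representation restricts what can be stored in it.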

1 comments

fourthark 2429 days ago

> if others see the benefits, it will automatically grow with the help of the community

There’s nothing automatic about it. You or someone else will need to put a lot of work into leading the community, merging pull requests, debugging, etc.

(Sad to say, promotion too, in a lot of cases.)


raja_sekar 2429 days ago

I didn't mean it that way. Instead of the code sitting idle on my laptop, it might at least be useful to someone, and if it really proves to be beneficial, then people might contribute to it. I'm not denying your point: it does require a huge effort from some people to get it to a mature, production-ready stage.
