by llbeansandrice 1989 days ago
> I agree with a few other commentators here that Hadoop/Spark isn't being used a lot in their production environments

I guess I'm the odd man out because that's all I've used for this kind of work: Spark, Hive, Hadoop, Scala, Kafka, etc.

2 comments

I should have specified more thoroughly.

I am not seeing Spark being chosen for new data eng roll-outs. It is still very prevalent in existing environments because it still works well. (used at $lastjob myself)

However, I am still seeing a lot of Spark for machine-learning work by data scientists. Distributed ML feels like it is getting split into a different toolkit than distributed DE.

I guess it depends on what jobs you're looking for. There are a lot of existing companies/teams (like mine) looking to hire people, but we're on the "old stack" using Kafka, Scala, Spark, etc. We don't do any ML stuff; I'm on the pipeline side of it. The data scientists down the line tend to use Hive/SparkSQL/Athena for a lot of work, but I'm much less involved with that.

Not all jobs are new pasture and I think that's forgotten very frequently.

I'd love to do some Kafka, Scala and Spark. What kind of exp are you looking for?
I'm also the odd one out; so many enterprises are moving to Spark on Databricks.