by datadrivenangel 498 days ago
If you have experience with any data frame library (like Pandas) and SQL, you can pick up PySpark pretty easily... with the one caveat that writing good data pipelines in any language gets much harder once you start looking at ways to actually process big data (~20+ TB). Modern SQL engines are so good, though.