	by lenwood 1465 days ago
	Agree. I've built data pipelines for several projects and have found that the cleanest, and often fastest, solution is to use SQL to structure the data as needed. This is anecdotal and I'm no SQL expert, but I haven't come across a situation where R or Pandas dataframes worked better than a well-written query for data manipulation. It also simplifies collaboration across teams: not everyone in my company uses the same toolset for analysis, but we all have access to the same database. Other tools are better suited to analysis, or to expanding the data with input from other sources, but within our own data, SQL wins.
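	A rough sketch of the kind of thing I mean, not a real pipeline: the `customers` and `orders` tables, their columns, and the `monthly_totals` helper are all made up for illustration, and SQLite stands in for whatever shared database a team actually uses. The join and aggregation happen in the query, so any tool that can talk to the database gets rows already in the shape it needs.

```python
import sqlite3


def monthly_totals(conn):
    """Return (customer, month, total) rows, shaped entirely by the query."""
    query = """
        SELECT c.name                          AS customer,
               strftime('%Y-%m', o.ordered_at) AS order_month,
               SUM(o.amount)                   AS total
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        GROUP BY c.name, strftime('%Y-%m', o.ordered_at)
        ORDER BY c.name, order_month
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        ordered_at TEXT,
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'acme'), (2, 'globex');
    INSERT INTO orders VALUES
        (1, 1, '2020-01-05', 10.0),
        (2, 1, '2020-01-20', 5.0),
        (3, 1, '2020-02-01', 7.5),
        (4, 2, '2020-01-11', 3.0);
""")

rows = monthly_totals(conn)
for customer, order_month, total in rows:
    print(customer, order_month, total)
```

	The same query text can be run from R, Pandas, or a BI tool, which is the collaboration benefit: the shaping logic lives in one place rather than being reimplemented per toolset.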