| HN Mirror



	by alephxyz 1237 days ago
	I've worked with many data scientists whose typical SQL usage was to load entire rows (or with simple 'with' filtering) into Pandas / R dataframes and then do all their work there. I think it's a combination of Pandas and R having much simpler APIs and a ton of documentation on Stack Overflow, and modern hardware just being so good that you can load a big enough chunk of your dataset in-memory on a laptop.

3 comments

bigger_cheese 1237 days ago

I mostly use SAS, and I tend to prefer plain SQL queries. Where I typically depart from SQL and jump into code is what SAS calls "By Group processing" (example: https://support.sas.com/kb/26/013.html#).

I am not as familiar with R. The last time I worked in R (some years ago), the equivalent R code was something like this. Caution: I'm no expert in writing R, so there might be a better / more intuitive way...

Output_data <- merge(x = T1, y = T2, by = "Date", all.x = TRUE) %>% mutate(My_var = NAME) %>% fill(My_var)  # needs dplyr and tidyr loaded

In SQL the equivalent would need OVER (PARTITION BY ...), which is less intuitive for me to write.
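
For comparison, here is a minimal sketch of what the window-function version of that fill-down could look like, run through Python's built-in sqlite3 module. The table contents are made up for illustration, and it assumes the bundled SQLite is 3.25 or newer (the first release with window functions). Databases that support IGNORE NULLS may offer a shorter route; this is just one common pattern.

```python
import sqlite3

# Hypothetical data mirroring the R snippet: T1 is a date spine,
# T2 holds a NAME only on the dates where it changes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (Date TEXT PRIMARY KEY);
    CREATE TABLE T2 (Date TEXT PRIMARY KEY, NAME TEXT);
    INSERT INTO T1 VALUES ('2021-01-01'), ('2021-01-02'), ('2021-01-03'),
                          ('2021-01-04'), ('2021-01-05');
    INSERT INTO T2 VALUES ('2021-01-01', 'A'), ('2021-01-04', 'B');
""")

FILL_DOWN = """
WITH joined AS (
    -- Left join keeps every T1 date, like merge(..., all.x = TRUE).
    SELECT T1.Date AS Date, T2.NAME AS NAME
    FROM T1 LEFT JOIN T2 ON T2.Date = T1.Date
),
grouped AS (
    -- COUNT(NAME) skips NULLs, so the running count only goes up on rows
    -- that have a value; each value and the gaps after it share one grp.
    SELECT Date, NAME, COUNT(NAME) OVER (ORDER BY Date) AS grp
    FROM joined
)
-- Each grp holds at most one non-NULL NAME; MAX spreads it over the gaps.
SELECT Date, MAX(NAME) OVER (PARTITION BY grp) AS My_var
FROM grouped
ORDER BY Date
"""

rows = conn.execute(FILL_DOWN).fetchall()
for row in rows:
    print(row)
```

Dates before the first known NAME land in group 0 and stay NULL, which matches what a downward fill does with leading gaps.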


o_nate 1236 days ago

Hard for me to imagine anyone who finds the Pandas API more intuitive than plain old SQL. I can't do anything in Pandas without looking up syntax.


VeninVidiaVicii 1237 days ago

Guilty! I live in data.table in R, which is essentially an ideological implementation of SQL, but with much terser syntax.

https://cran.r-project.org/web/packages/data.table/vignettes...


waffletower 1237 days ago

Never feel guilty using that superior workflow when your dataset can comfortably reside in memory.
