Hacker News new | ask | show | jobs
by gaha 1796 days ago
The way I love doing these kinds of things is by literally using three lines of R code:

library(sqldf)

tab1 = read.csv("file1.csv")

sqldf("select * from tab1")

sqldf [1] is a library with which you can access R dataframes just like tables in SQL (it is actually using SQLite in the background). I do not have much experience with programming in R itself, but you barely need it. This approach has the advantage that it is very flexible, and you get the power of the (IMHO) best plotting library ggplot with it. Of course, you can also do more complicated stuff like joining multiple dataframes and basically everything else you can do with SQLite, and then store results back in R dataframes. This workflow works if you use an IDE which lets you execute single lines or blocks of codes like R-Studio. Then you also get a nice GUI with it, but there are also plugins for VI/Emacs that work very well.

[1] https://cran.r-project.org/web/packages/sqldf/index.html

EDIT: code formatting

2 comments

Checkout the R bindings for DuckDB[0]. You should find that it does the same thing (i.e. run SQL against a dataframe/file on disk) much faster for many SQL operations.

[0] https://duckdb.org/docs/api/r

Came here just to recommend DuckDB! :-) Huge fan. It's unreasonably fast for how easy it is to use.
If you’re going to use R you might as well learn a bit of dplyr or data.table.

You get that autocomplete goodness and the vast majority of sql can be reduced down to single lines.