| HN Mirror


by capnrefsmmat 3778 days ago

My biggest complaint about R isn't the inconsistency and obtuseness -- I've been using it long enough to get familiar with the documentation and the zillions of varieties of apply. My problem is the data structures.

R has only a few core data structures: vectors, lists, arrays, and matrices. Data frames are built on top of lists, and admittedly data frames are incredibly useful for statistics -- there's a reason pandas exists, and a reason data analysis is much more tedious in other languages.

But there are no hash maps or sets (lists have named elements, but with O(n) indexing; the only hash tables available use environments and accept limited types of keys), no tuples, no structural or record types, stacks and queues only recently became available on CRAN (through C), and so on.

This leads to the folk belief that the only way to optimize R is to vectorize code or to write it in C or C++ (with Rcpp, for instance). No statistical programmer ever thinks about choosing the right data structure for the job, since you basically only ever use lists and data frames. Fast operations on data structures (like graph algorithms) have to be written in C. There's just no way to do it in R.

When I co-taught a statistical computing course, covering the basics of data structures and algorithms, I included some homework assignments where the difference between a fast and a slow algorithm was the choice of data structure. R users struggled because they had very little available to them. If their code wasn't fast because they were doing O(n) list lookups in a loop, there wasn't anything they could do to fix it.
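
To make that gap concrete, here is a minimal sketch in Python rather than R, purely for illustration. The function names and inputs are hypothetical and are not taken from the course. Both functions count how many query values appear in a collection; the first scans a list on every lookup, the second hashes into a set.

```python
def count_hits_list(needles, haystack):
    """Count needles found in haystack, using a list.

    Each `in` test scans the list, so the loop costs O(len(needles) * len(haystack)).
    """
    items = list(haystack)
    return sum(1 for x in needles if x in items)


def count_hits_set(needles, haystack):
    """Same count, using a set.

    Each `in` test is an average O(1) hash lookup, so the loop costs
    roughly O(len(needles) + len(haystack)).
    """
    items = set(haystack)
    return sum(1 for x in needles if x in items)
```

The two functions return the same answer; only the container changes, which is the kind of fix that is hard to reach for when hash-based sets are not readily available.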

I hope Python and Julia can eat R's lunch. Some day I'll have to get around to trying Julia for a serious project...

1 comment

hadley 3777 days ago

The lack of data structures in R is a totally fixable problem.


capnrefsmmat 3777 days ago

Sure, through packages, but you'd need to adapt the entire standard library to take advantage of them, so you could pass new data structures to built-in functions and get meaningful results.

Generic iterators would also be extremely useful to build in, so it's easy to work with a wide variety of structures.


hadley 3777 days ago

And generic functions allow you to fill in those missing pieces from a package too.
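
A rough sketch of that idea, using Python's `functools.singledispatch` as a stand-in for R's generic functions (an analogy, not R's actual mechanism). The `Stack` class and the `size` generic are hypothetical names invented for this example: a package defines a new structure and then registers a method so an existing generic works on it.

```python
from functools import singledispatch


class Stack:
    """Hypothetical package-provided data structure (not a built-in)."""

    def __init__(self, items=()):
        self._items = list(items)

    def push(self, x):
        self._items.append(x)


@singledispatch
def size(obj):
    """Generic function: the default method falls back to len()."""
    return len(obj)


@size.register(Stack)
def _(obj):
    """Method supplied by the package for its own type."""
    return len(obj._items)
```

Code that calls `size(x)` does not need to change when a new structure appears; the package that introduces the structure supplies the method.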
