HN Mirror



	by klelatti 2002 days ago
	I'm an actuary with a strong interest in this area - would be very interested to hear more especially on your R vs Python experience.

5 comments

meztez 2002 days ago

It came down to the IDE, workflow, and data.table.

RStudio is an absolute killer solution from the get-go. Package management in R is simple and robust. Shiny is the new Excel pivot table on performance enhancing code.

Python has more contributors and more users. It also creates a lot more noise. Business people may feel like it is a programmer's tool. R feels more approachable.

In the end, both are great solutions, but we decided on R because we believe in the people contributing to the ecosystem, mostly RStudio. Somewhere down the line, there might be a transition to Julia.

klelatti 2002 days ago

Thanks - really interesting, especially on the RStudio point.

hnracer 2002 days ago

I've used R (3 years) and Python (8+ years) in data science and much prefer Python, because it can do things that aren't just pure data analysis, and because pandas is so amazingly good compared to R's data matrix solutions, in my opinion. I believe that the algorithmic trading industry has gone fully into Python and away from R for these reasons.

meztez 2002 days ago

R has data.table. It is the game changer, as I agree that base R data.frames do not cut it for performance. tibble will come close once they incorporate more of the data.table performance tricks.

https://h2oai.github.io/db-benchmark/

hnracer 2002 days ago

Does R have robust CSV parsing? I remember using the default and it'd be extremely finicky about getting the header and index flags right and wouldn't typecast numeric columns properly (instead they'd end up as factors and not play nice)

st1ck 2002 days ago

The Python version of data.table has very fast CSV parsing (compared to pandas), and it didn't have issues like those you mention. Even if data.table had issues with CSV parsing, you could probably use Apache Arrow to parse the CSV into an Arrow table and then convert it to a data.table (but that is probably suboptimal).

alexhutcheson 2002 days ago

https://readr.tidyverse.org/

bostonfincs 2002 days ago

Personally, I have never had a problem with R CSV parsing.

disgruntledphd2 2001 days ago

It happens, but mostly because other formats don't produce usable CSVs. The biggest problem is if there are any free-entry text fields (common for customer/business name), and there isn't full quoting around these fields, base R will break.

I believe both fread and readr::read_csv do the right thing here, but the base-R perspective on data manipulation before read.csv is to use Perl (the R-core team are pretty old-school, to be fair).
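To make the quoting problem concrete, here is a minimal sketch using Python's standard `csv` module rather than R. The sample rows and the `malformed` helper are made up for illustration. It only shows why an unquoted free-text field containing a comma shifts the columns, not how any particular R reader behaves.

```python
import csv
import io

# Hypothetical sample data: the business name contains a comma.
unquoted = "id,name,balance\n1,Acme, Inc.,100.50\n"
quoted = 'id,name,balance\n1,"Acme, Inc.",100.50\n'


def parse(text):
    """Return (header, data_rows) for a CSV string."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]


def malformed(header, rows):
    """Rows whose field count differs from the header's."""
    return [row for row in rows if len(row) != len(header)]


header, bad_rows = parse(unquoted)
_, good_rows = parse(quoted)

# Without quoting, the comma inside the name splits it into two fields,
# so the row has one more field than the header.
print(malformed(header, bad_rows))

# With full quoting, the name stays in one field and nothing is flagged.
print(malformed(header, good_rows))
```

Checking field counts against the header like this is a cheap way to spot the problem before the file reaches whatever reader you use.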

eyeball 2002 days ago

h2o's data.table clone is fine

https://github.com/h2oai/datatable

alexilliamson 2001 days ago

I've been a heavy user of all 3, and pandas syntax is a nightmare compared to dplyr or data.table in R. That being said, I still use pandas because I prefer Python for non-analysis.

jimmyjimjimmy 2002 days ago

I'm a CPA. When I started learning to code, I looked for whatever was most like a spreadsheet. R fit the bill, with built-in frames.

2Gkashmiri 2002 days ago

Oh.. similar line for me, accounting/tax law. Excel is bread and butter because all year-end financials are prepared and finalised in Excel. Although I have used LibreOffice on my personal machine, it also kinda works.

For a couple of years I have tried to Excel-macro myself a balance sheet template which does most of the copy-pasting from previous years, does bank interest calculations and all.

It would be interesting to know how a US CPA works, because it's all accounting package > Excel > e-file.

thetwentyone 2002 days ago

I'm on mobile, but do also consider https://JuliaActuary.org (something that I personally have contributed to).

klelatti 2002 days ago

Looks really interesting, thanks. I've seen some interesting insurance projects using Julia, e.g.

https://www.youtube.com/watch?v=__gMirBBNXY

jgalt212 2001 days ago

R is better if your raw data is already tabular. I prefer Python if the raw data is unstructured / semi-structured. You can make the case that once Python has converted the data to tabular then move to R, but at that point I like the soup to nuts to be in one language.
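As a rough illustration of the "convert to tabular first" step, here is a minimal sketch using only the Python standard library. The policy records, field names, and the `flatten` helper are all invented for the example. It assumes the semi-structured input is nested JSON with some keys missing.

```python
import csv
import io
import json

# Hypothetical semi-structured input: nested records with uneven keys.
raw = """
[
  {"policy": "P-1", "holder": {"name": "Ann", "age": 41}, "claims": [120.0, 80.5]},
  {"policy": "P-2", "holder": {"name": "Bob"}, "claims": []}
]
"""


def flatten(record):
    """Turn one nested record into a flat dict with a fixed set of columns."""
    holder = record.get("holder", {})
    claims = record.get("claims", [])
    return {
        "policy": record["policy"],
        "holder_name": holder.get("name"),
        "holder_age": holder.get("age"),  # None when the key is missing
        "claim_count": len(claims),
        "claim_total": sum(claims),
    }


rows = [flatten(record) for record in json.loads(raw)]

# Write the flat rows as CSV so any tabular tool (R included) can read them.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
table_csv = buffer.getvalue()
print(table_csv)
```

Once the data is in this shape, the choice between handing the CSV to R or staying in Python is the trade-off described above.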