by doodledoodahs 842 days ago
OK, since you're here!

(this all prefaced with a massive thank you for tidyverse, without which R is very crusty).

I love R for interactive work and quick analyses, but I'm currently trying to integrate various bits of R code into a large document-building pipeline and wishing I could use Python for it:

- Exception handling and error processing seem a pain in R. Maybe I'm doing it wrong, but it feels like a mess and not nearly as ergonomic as python. `tryCatch` seems to have gotchas related to scope because the error handling is in a function. The distinction between warning, stop etc seems odd. The option to stop on warnings isn't useful because older packages seem to abuse warnings as messages. I have just discovered `safely` which is helpful, but then you have to unwrap lists in pipelines which feels clunky.

- Related, I _really_ wish we could just drop model objects or other tibbles as single objects directly into a tibble cell rather than as list(df). Unpacking lists and checking that the objects inside them exist is much more of a pain (e.g. can't just do `filter(!is.na(df_col))`).

- I really miss defaultdict from python, and dictionaries generally.

- Passing variable names as strings to dynamically generate things seems clunky compared with python. Again, it may be because I'm doing it wrong, but I end up having to wrap things in `!!sym` the whole time and the NSE semantics seem hard to remember (I only use R about 20% of the time). I liked cur_data() for passing a df row to a function but this now seems deprecated.

- String formatting -- fstrings are just great. Glue is OK, but escaping special characters seems more tricksy. Jinjar is OK, not quite jinja.

- purrr is nice, but furrr just isn't a drop-in replacement. Making http requests in parallel seems non-trivial compared to doing it with python. Is there an easy way to do it without creating multiple processes? Why can't I just do something like `. %>% mutate_parallel(response=GET(url), workers=10) %>% ...`?
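For readers who don't use Python, here is a rough sketch of the three idioms the list above leans on: a thread pool for parallel I/O inside a single process, `defaultdict`, and f-strings. It is only an illustration, not the poster's code. `fake_get` is a hypothetical stand-in for a real HTTP call (it just sleeps), and the example.com URLs are placeholders.

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def fake_get(url: str) -> dict:
    """Hypothetical stand-in for an HTTP GET; sleeps to mimic network latency."""
    time.sleep(0.05)
    # Pretend odd-numbered pages are missing so there are two status groups.
    page = int(url.rsplit("/", 1)[-1])
    return {"url": url, "status": 404 if page % 2 else 200}


def fetch_all(urls, workers=10):
    """Run fake_get over urls on threads in one process; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_get, urls))


urls = [f"https://example.com/page/{i}" for i in range(20)]
responses = fetch_all(urls, workers=10)

# defaultdict: group URLs by status without checking whether the key exists yet.
by_status = defaultdict(list)
for r in responses:
    by_status[r["status"]].append(r["url"])

# f-string: expressions go inline, and doubled braces give literal braces.
summary = [
    f"{status}: {len(group)} urls {{sample: {group[0]}}}"
    for status, group in sorted(by_status.items())
]
print("\n".join(summary))
```

Threads suit this case because the work is waiting on I/O rather than computing, so no extra processes are needed.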


Amen to that. Can I add the following:

- 5 different ways to do wide to long and long to wide over the years even in the tidyverse.

- A lot of dependencies to connect to DBs and difficult programs. Rstudio/Posit does have some premium libraries but they should be made free and bundled with the tidyverse to really promote the ecosystem.

- Shiny support to save interactive charts and tables. This is a massive problem for me. If I have a heavily stylized HTML table with a bunch of css, I need to rely on webshot, webshot2 which are both alpha or beta versions and they are poorly documented. How can I evangelize R if my deployments cannot be used properly by my community?

What are the premium packages you're talking about? As far as I know all of our R packages are 100% open source.

I'd love to hear more about why you're using webshot etc to take screenshots of your shiny app. A more typical workflow would be to generate a separate HTML/PDF with quarto/RMarkdown.

Thanks for responding and your amazing work with the tidyverse. I am the "R-guy" in my finservices company and we have a paid rconnect dev/qa/prod and rserver pro licences for a few hundred users.

The packages I think are the dependencies of some DB connectivity libraries. https://www.rstudio.com/tags/databases/ - these are the ones I was referring to.

Re webshot my use case is: I have a heavily modified DT table in a shiny app. Users log in, play around with the DT table, update ggplots etc and then download the snapshot and send it to a WORD file. I can't move away from word and use html or pdf because we need the word file formatted by editors for publication and they need to follow the corpo guidelines. So, I am having to use webshot to grab a screenshot of the tagged html instead of natively handling it. I tried using officedown and a few other methods and it just didn't work.

ps: I hope the rebrand goes great and I am rooting for you.

Oh, you mean the pro drivers? Unfortunately we can't give those away because we have to pay several $100k a year just to get access for our customers. Most of the pro drivers do have equivalent open source versions that you should be able to use instead.

Hmmm, I'd still try generating the table with quarto (since you can output word documents), or try gt (https://gt.rstudio.com), which I know has much greater control over output, and supports RTF output (https://gt.rstudio.com/reference/as_rtf.html) which should import cleanly into word.

PDF in knitr is tied to TeX. Webshot and other capture is better because CSS styles work without translation to TeX.

> The distinction between warning, stop etc seems odd. The option to stop on warnings isn't useful because older packages seem to abuse warnings as messages.

Use suppressWarnings() to silence misbehaving functions or withCallingHandlers() to stop or handle specific conditions.

> Passing variable names as strings to dynamically generate things seems clunky compared with python.

Can you give me an elegant example in Python? Because I don't understand what you want to generate dynamically.

That said, I dislike the tidyverse solution as well. Too much abstraction for not enough benefit over a base solution with substitute().
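On the question of a Python example: one guess at what the original complaint has in mind (the poster did not confirm this) is that in Python a column name is just a string, so it can be passed to a function and used as a key with no quoting or unquoting step. The sketch below uses plain dicts so it runs without extra packages; `rows` and `total_by` are made-up names for illustration. With pandas the same idea is usually written `df.groupby(group_col)[value_col].sum()`.

```python
from collections import defaultdict

# Made-up data: each row is a dict keyed by column name.
rows = [
    {"region": "north", "product": "a", "sales": 10},
    {"region": "north", "product": "b", "sales": 5},
    {"region": "south", "product": "a", "sales": 7},
]


def total_by(rows, group_col: str, value_col: str) -> dict:
    """Sum value_col within each group_col; both are passed as plain strings."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    return dict(totals)


# The column to group by is chosen at call time, as an ordinary string.
by_region = total_by(rows, "region", "sales")
by_product = total_by(rows, "product", "sales")
print(by_region, by_product)
```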

For the most common cases, the tidyverse now only requires {{ }}. This allows you to tell tidyeval functions that you have the name of a df-var stored in an env-var. Do you have specific cases that you find frustrating?