| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by notafraudster 2633 days ago

This seemed interesting, but when I went through the "Accepted Stack Overflow" links on the main page, I thought "how would I do this in an R tidyverse stack?" and set the goal that my responses should be shorter, clearer, or ideally both, and that I would favour clearer answers to code golf, except that when posting to HN I collapse the code into a single line while in R there would be linebreaks at each semicolon or after each pipe operator (%>%). Here are three examples below:

"Customized sort based on multiple columns of CSV". In R, something like this: `library(tidyverse); read_delim("file.tsv", delim = "@") %>% arrange(.[[2]]) %>% group_by(.[[2]]) %>% arrange(match(.[[3]], c("arch.", "var." "ver.", "anci.", "fam.")), .[[3]]) %>% group_by(.[[2]], .[[3]]) %>% mutate(n = n()) %>% arrange(desc(n)) %>% ungroup() %>% select(1:4)`

"Extract text from HTML table". In R, something like this would suffice: `library(rvest); library(tidyverse); read_html(URL_GOES_HERE) %>% html_nodes("div.scoreTableArea") %>% html_table() %>% write_delim("out.csv", delim = "\t")`

"Get n-th Field of Each Create Referring to Another File". In R: `library(tidyverse); file1 = read_delim("file1.txt", delim = " ", col_names = FALSE); chunks = readChar("file2.txt", 999999) %>% str_split(";") %>% unlist() %>% map(function(x) { matches = str_match(str_trim(x), '^create table "(.)"([^(])\\(((.|\n)*)\\)$'); title = matches[, 2]; fields = matches[, 4] %>% str_split(",") %>% unlist() %>% str_trim(); return(tibble(table_name = rep(title, length(fields)), n = 1:length(fields), field = fields)) }) %>% bind_rows(); file1 %>% left_join(chunks, by = c("X1" = "table_name", "X2" = "n"))`

The third example trades off a little clarity for a little robustness by adding a regex instead of assuming the SQL table definition is one field per line.

2 comments

kazinator 2633 days ago

There is no HTML parsing library in TXR, yet the code still looks good.

TXR Lisp has support for that type of functional transformation of structured data, with fairly tidy syntax. If a need for a full blown HTML parsing library arises, someone will come up with one; maybe me. It could end up integrated into the TXR flex/Yacc parser, which would make it fast.

In the "Get n-th Field" task, what we can do is snarf the data as a string, then remove all the commas and semicolons. It then parses as a TXR Lisp with the lisp-parse function, resulting in this:

  (create table (qref "def" something)
   (f01 char (10) f02 char (10) f03 char (10) f04 date)
   create table (qref "abc" something)
   (x01 char (10) x02 char (1) x03 char (10))
   create table (qref "ghi" something)
   (z01 char (10) z02 intr (10) z03 double (10) z04 char (10) z05 char (10)))

That seems to open an avenue to a solution. E.g. we can now partition it into pieces that start with the create symbol:

  28> (partition *26 (op where (op eq 'create)))
  ((create table (qref "def" something) (f01 char (10) f02 char (10) f03 char (10) f04 date))
   (create table (qref "abc" something) (x01 char (10) x02 char (1) x03 char (10)))
   (create table (qref "ghi" something) (z01 char (10) z02 intr (10) z03 double (10) z04 char (10) z05
                                         char (10))))

Now the (qref "def" something) parts are in fixed positions, followed by fixed-shape triplets.

Only problem with this type of solution is that it takes the example data too literally. The user's actual data might not cleanly parse this way.

kazinator 2631 days ago

> when posting to HN I collapse the code into a single line

If you just put two spaces of indentationo on every line, you get a verbatim block in typewriter font,

  like
  this.