This seemed interesting, but when I went through the "Accepted Stack Overflow" links on the main page, I kept thinking "how would I do this in an R tidyverse stack?" I set the goal that my responses should be shorter, clearer, or ideally both, and that I would favour clarity over code golf. One caveat: when posting to HN I collapse the code into a single line, while in R there would be a linebreak at each semicolon and after each pipe operator (%>%). Here are three examples.
"Customized sort based on multiple columns of CSV". In R, something like this: `library(tidyverse); read_delim("file.tsv", delim = "@") %>% arrange(.[[2]]) %>% group_by(.[[2]]) %>% arrange(match(.[[3]], c("arch.", "var.", "ver.", "anci.", "fam.")), .[[3]]) %>% group_by(.[[2]], .[[3]]) %>% mutate(n = n()) %>% arrange(desc(n)) %>% ungroup() %>% select(1:4)`
"Extract text from HTML table". In R, something like this would suffice: `library(rvest); library(tidyverse); read_html(URL_GOES_HERE) %>% html_nodes("div.scoreTableArea") %>% html_table() %>% write_delim("out.csv", delim = "\t")`
"Get n-th Field of Each Create Referring to Another File". In R: `library(tidyverse); file1 = read_delim("file1.txt", delim = " ", col_names = FALSE); chunks = readChar("file2.txt", 999999) %>% str_split(";") %>% unlist() %>% map(function(x) { matches = str_match(str_trim(x), '^create table "(.*)"([^(]*)\\(((.|\n)*)\\)$'); title = matches[, 2]; fields = matches[, 4] %>% str_split(",") %>% unlist() %>% str_trim(); return(tibble(table_name = rep(title, length(fields)), n = 1:length(fields), field = fields)) }) %>% bind_rows(); file1 %>% left_join(chunks, by = c("X1" = "table_name", "X2" = "n"))`
The third example trades off a little clarity for a little robustness by using a regex instead of assuming the SQL table definition has one field per line.
TXR Lisp has support for that type of functional transformation of structured data, with fairly tidy syntax. If a need for a full-blown HTML parsing library arises, someone will come up with one; maybe me. It could end up integrated into the TXR flex/Yacc parser, which would make it fast.
In the "Get n-th Field" task, what we can do is snarf the data as a string, then remove all the commas and semicolons. What is left then parses as TXR Lisp with the lisp-parse function, resulting in this:
That seems to open an avenue to a solution. E.g. we can now partition it into pieces that start with the create symbol:
Now the (qref "def" something) parts are in fixed positions, followed by fixed-shape triplets.
The only problem with this type of solution is that it takes the example data too literally. The user's actual data might not parse cleanly this way.
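
For readers who do not have TXR at hand, the strip-and-partition idea can be roughly sketched in Python. This is only an illustration under assumptions, not the TXR Lisp code: the sample text in `SQL_TEXT`, the three-tokens-per-field layout, and the helper names `tokenize`, `parse`, `partition_at` and `nth_field` are all made up here, and a small hand-written reader stands in for lisp-parse.

```python
import re

# Hypothetical input: the thread's real data is not shown, so this only
# imitates its rough shape (a qualified table name, then field triplets).
SQL_TEXT = '''
create table "def".orders (
  id int key,
  total int value
);
create table "def".users (
  id int key,
  name text value
);
'''

def tokenize(text):
    # Drop commas and semicolons, then split what is left into parentheses,
    # quoted names (optionally followed by .symbol), and bare symbols.
    cleaned = re.sub(r"[,;]", " ", text)
    return re.findall(r'\(|\)|"[^"]*"(?:\.\w+)?|[^\s()"]+', cleaned)

def parse(tokens):
    # Build nested lists, one list per parenthesised group.
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            group = stack.pop()
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    return stack[0]

def partition_at(items, marker):
    # Start a new piece at every occurrence of `marker`.
    pieces = []
    for item in items:
        if item == marker or not pieces:
            pieces.append([])
        pieces[-1].append(item)
    return pieces

def nth_field(pieces, table, n, width=3):
    # Assumes position 2 holds the qualified name and position 3 the field
    # group, with every field taking `width` tokens. n is 1-based.
    for piece in pieces:
        if piece[2].split(".")[-1] == table:
            return piece[3][(n - 1) * width]
    return None

pieces = partition_at(parse(tokenize(SQL_TEXT)), "create")
print(nth_field(pieces, "users", 2))  # prints: name
```

The fixed positions in `nth_field` are exactly the weakness mentioned above: a definition with a different number of tokens per field would silently return the wrong token.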