

by kazinator 2597 days ago

There is no HTML parsing library in TXR, yet the code still looks good.

TXR Lisp supports that kind of functional transformation of structured data, with fairly tidy syntax. If the need for a full-blown HTML parsing library arises, someone will come up with one; maybe me. It could end up integrated into TXR's flex/Yacc parser, which would make it fast.

In the "Get n-th Field" task, what we can do is snarf the data as a string, then remove all the commas and semicolons. The result then parses as TXR Lisp with the lisp-parse function, giving this:

  (create table (qref "def" something)
   (f01 char (10) f02 char (10) f03 char (10) f04 date)
   create table (qref "abc" something)
   (x01 char (10) x02 char (1) x03 char (10))
   create table (qref "ghi" something)
   (z01 char (10) z02 intr (10) z03 double (10) z04 char (10) z05 char (10)))
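For readers without TXR at hand, the strip-and-parse step can be approximated in Python. This is a sketch, not lisp-parse, and it makes several assumptions:

- The input string is a hypothetical reconstruction of the task data, worked backward from the parsed output above.
- The tiny reader handles only what this data needs: parentheses, bare words, integers, and a "quoted".name pair, which it turns into a ("qref", ...) tuple.
- It does no error checking, for example on unbalanced parentheses.

```python
import re

# HYPOTHETICAL input: reconstructed from the parsed output above, not the
# actual task data (only the first two tables, for brevity).
DDL = """
create table "def".something (
  f01 char(10),
  f02 char(10),
  f03 char(10),
  f04 date
);
create table "abc".something (
  x01 char(10),
  x02 char(1),
  x03 char(10)
);
"""

# Tokens: a quoted name followed by .ident, a paren, or any other run of
# text without whitespace or parens.
TOKEN = re.compile(r'"([^"]*)"\.(\w+)|[()]|[^\s()]+')


def read_forms(text):
    """Drop commas and semicolons, then read the text as nested lists."""
    text = re.sub(r"[,;]", "", text)
    stack = [[]]  # stack[-1] is the list currently being filled
    for m in TOKEN.finditer(text):
        tok = m.group(0)
        if tok == "(":
            stack.append([])
        elif tok == ")":
            inner = stack.pop()
            stack[-1].append(inner)
        elif m.group(1) is not None:
            # "def".something becomes ("qref", "def", "something"),
            # mirroring TXR's (qref "def" something)
            stack[-1].append(("qref", m.group(1), m.group(2)))
        elif tok.isdigit():
            stack[-1].append(int(tok))
        else:
            stack[-1].append(tok)
    return stack[0]


forms = read_forms(DDL)
print(forms)
```

The trick is the same as in the TXR version. Once the separators are gone, the DDL happens to be well-formed nested-list syntax, so a generic reader does all the work.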

That seems to open an avenue to a solution. For example, we can now partition the list into pieces that each start with the create symbol:

  28> (partition *26 (op where (op eq 'create)))
  ((create table (qref "def" something) (f01 char (10) f02 char (10) f03 char (10) f04 date))
   (create table (qref "abc" something) (x01 char (10) x02 char (1) x03 char (10)))
   (create table (qref "ghi" something) (z01 char (10) z02 intr (10) z03 double (10) z04 char (10) z05
                                         char (10))))
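The same partition step can be mimicked in Python. This sketch hard-codes the parsed list from above, with symbols as strings, (qref ...) as a tuple, and (10)-style sizes as one-element lists. The where and partition functions here are small hypothetical stand-ins, not TXR's implementations. They only reproduce the behavior visible in this REPL session: a split index of 0 produces no empty leading piece.

```python
# The parsed data from above, in Python form.
FORMS = [
    "create", "table", ("qref", "def", "something"),
    ["f01", "char", [10], "f02", "char", [10], "f03", "char", [10], "f04", "date"],
    "create", "table", ("qref", "abc", "something"),
    ["x01", "char", [10], "x02", "char", [1], "x03", "char", [10]],
    "create", "table", ("qref", "ghi", "something"),
    ["z01", "char", [10], "z02", "intr", [10], "z03", "double", [10],
     "z04", "char", [10], "z05", "char", [10]],
]


def where(pred, seq):
    """Indices (ascending) of the elements of seq that satisfy pred."""
    return [i for i, x in enumerate(seq) if pred(x)]


def partition(seq, indices):
    """Cut seq just before each ascending index; index 0 adds no empty piece."""
    pieces, start = [], 0
    for i in indices:
        if 0 < i < len(seq):
            pieces.append(seq[start:i])
            start = i
    pieces.append(seq[start:])
    return pieces


pieces = partition(FORMS, where(lambda x: x == "create", FORMS))
for p in pieces:
    print(p)
```

Computing the split indices from the list itself with where mirrors the (op where (op eq 'create)) argument in the TXR expression.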

Now the (qref "def" something) parts are in fixed positions. Each is followed by a column list made of name, type, and size triplets, except f04 date, which has no size.
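Because of those fixed positions, pulling a piece apart needs only positional destructuring plus a short loop over the column list. Here is a Python sketch of that. It hard-codes the three pieces from the partition output above and uses hypothetical helper names (columns, describe). The loop treats a one-element list right after a type as the size, which also copes with a column such as f04 date that has none.

```python
# The three pieces from the partition output above, in Python form.
PIECES = [
    ["create", "table", ("qref", "def", "something"),
     ["f01", "char", [10], "f02", "char", [10], "f03", "char", [10], "f04", "date"]],
    ["create", "table", ("qref", "abc", "something"),
     ["x01", "char", [10], "x02", "char", [1], "x03", "char", [10]]],
    ["create", "table", ("qref", "ghi", "something"),
     ["z01", "char", [10], "z02", "intr", [10], "z03", "double", [10],
      "z04", "char", [10], "z05", "char", [10]]],
]


def columns(spec):
    """Group a flat column list into (name, type, size) tuples.

    A one-element list right after the type is taken as the size;
    when it is absent (as for f04 date) the size is None.
    """
    cols, i = [], 0
    while i + 1 < len(spec):
        name, typ = spec[i], spec[i + 1]
        i += 2
        size = None
        if i < len(spec) and isinstance(spec[i], list):
            size = spec[i][0]
            i += 1
        cols.append((name, typ, size))
    return cols


def describe(piece):
    """Unpack one piece positionally: create, table, (qref ...), columns."""
    _create, _table, (_qref, schema, table), spec = piece
    return schema, table, columns(spec)


for piece in PIECES:
    schema, table, cols = describe(piece)
    print(schema, table, cols)
```

Suppose "field" in the task means a column. Then the n-th field of a table is simply an index into its grouped column list, e.g. describe(PIECES[0])[2][n].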

The only problem with this kind of solution is that it takes the example data too literally. The user's actual data might not parse this cleanly.