by textmode 2878 days ago
When processing HTML, XML or JSON with sed, I have often used tr to reformat the input into something sed-friendly (e.g., delete newlines, add a non-printable delimiter, then replace the delimiter with newlines). However, an easy alternative to tr for this is flex.
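
For reference, the tr approach described above might look roughly like the sketch below. The sample input, the `reformat` function name and the choice of octal 001 as the delimiter are assumptions made for illustration; any byte that cannot occur in the input would do.

```shell
#!/bin/sh
# Sketch of the tr-based approach: put each tag on its own line, then use sed.
# The delimiter (octal 001) and the sample input are illustrative assumptions.

# A literal control character to use as a temporary delimiter.
d=$(printf '\001')

reformat() {
    # 1. delete carriage returns and newlines so the input becomes one line
    # 2. insert the delimiter in front of every '<'
    # 3. turn each delimiter into a newline
    tr -d '\r\n' | sed "s/</$d</g" | tr '\001' '\n'
}

# Each <li> now starts its own line, so a line-based sed expression can pick it out.
printf '<ul>\n<li>one</li><li>two\n</li></ul>\n' | reformat | sed -n 's/^<li>//p'
```

With the sample input this prints `one` and `two` on separate lines, even though the second item was split across lines in the original.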

As an example, below is a one-off/reusable HTML/XML reformatter in flex. This makes HTML/XML easier for me to read. It also makes it very easy to process with sed and other line-based utilities.

    flex -8iCrfa 038.l
    cc -static lex.yy.c

    ftp -4o - http://web.archive.org/web/20130814000845/http://zombofant.net/blog/ | ./a.out | less

    cat 038.l

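    %{
    /* Shorthand used in the actions below. */
    #define echo ECHO
    #define jmp BEGIN
    #define nl putchar(10)                  /* newline */
    #define ind fputs("\40\40\40",stdout)   /* three-space indent */
    %}
    %s xa xb
    xa \11|\40
    %%
    ^\x0d\x0a jmp xb;   /* CRLF-only line: enter markup state xb */
    \<{xa}*script nl;ind;echo;jmp xa;   /* opening script tag: enter script state xa */
    <xa>\<{xa}*\/script{xa}*\> echo;jmp xb;   /* closing script tag: back to xb */
    <xa>{xa}{xa}* putchar(32);   /* collapse runs of tabs and spaces */
    <xa>. echo;
    <xb>\< nl;ind;echo;   /* every tag starts a new, indented line */
    <xb>\> echo;
    <xb>{xa}{xa}* putchar(32);
    <xb>. echo;
    .|\n ;   /* drop newlines, and everything seen while still in the initial state */
    %%
    int main(){ yylex(); return 0; }
    int yywrap(){ nl; return 1; }   /* nonzero: no more input */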
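
Once every tag sits on its own line, line-based tools become straightforward to apply. The sketch below is only an illustration: `sample` is a hypothetical stand-in that prints text of the shape the reformatter emits (one tag per line, three-space indent), and `links` is a made-up helper name, not part of the program above.

```shell
#!/bin/sh
# Hypothetical stand-in for the reformatter's output:
# one tag per line, each line indented by three spaces.
sample() {
    printf '   <a href="/one">first\n   </a>\n   <p>text\n   </p>\n   <a href="/two">second\n   </a>\n'
}

# Print the href value of every line that starts with an <a ...> tag.
links() {
    sed -n 's/^ *<a [^>]*href="\([^"]*\)".*/\1/p'
}

sample | links
```

In real use, `sample` would be replaced by the output of the compiled scanner, e.g. `./a.out < 1.xml | links`.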