
Hacker News

by 1vuio0pswjnm7 1488 days ago

Problem 2 - Extract href value from <a> tags in NYT front page

Create a file called 2.l containing

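    /* fileno() is POSIX, not C89; declare it so -std=c89 -pedantic does not warn */
    int fileno(FILE *);
    /* jmp: switch start condition (same expansion as flex's BEGIN) */
    #define jmp (yy_start) = 1 + 2 *
    /* echo: write the matched text to yyout (same expansion as flex's ECHO) */
    #define echo do {if(fwrite(yytext,(size_t)yyleng,1,yyout)){}}while(0)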
   
   %s xa xb
   %option noyywrap noinput nounput
   %%
   \<a jmp xa;
   <xa>\40href=\" jmp xb;
   <xb>\" jmp 0;
   <xb>[^\"]* echo;putchar(10);
   .|\n
   %%
   int main(){ yylex();exit(0);}

Compile

    flex -8iCrf 2.l
    cc -std=c89 -Wall -pedantic -I$HOME -pipe lex.yy.c -static -o yy2

And finally,

    yy2 < 1.htm

This is faster than Python and requires fewer resources.
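
For anyone who does not use flex: the start conditions in 2.l amount to a small three-state machine. Below is a rough hand-written C sketch of the same logic, not the code flex generates. The names `extract_hrefs` and `starts_with_ci` are made up for illustration, the input is assumed to be a NUL-terminated buffer, and the sketch does not reproduce flex's longest-match and rule-order tie-breaking in every corner case.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names; the states mirror INITIAL, xa and xb in 2.l. */
enum state { INITIAL, XA, XB };

/* Case-insensitive prefix test, standing in for flex's -i option. */
static int starts_with_ci(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Write each href value that follows an "<a" to out, one per line.
   Returns the number of values written. */
static int extract_hrefs(const char *html, FILE *out)
{
    enum state st = INITIAL;
    const char *p = html;
    int count = 0;

    assert(html != NULL && out != NULL);

    while (*p) {
        if (st == XB) {
            if (*p == '"') {                /* <xb>\"      : closing quote */
                st = INITIAL;
                p++;
            } else {                        /* <xb>[^\"]*  : the value */
                size_t n = strcspn(p, "\"");
                fwrite(p, 1, n, out);
                fputc('\n', out);
                count++;
                p += n;
            }
        } else if (st == XA && starts_with_ci(p, " href=\"")) {
            st = XB;                        /* <xa>\40href=\" */
            p += 7;
        } else if (starts_with_ci(p, "<a")) {
            st = XA;                        /* \<a */
            p += 2;
        } else {
            p++;                            /* .|\n : discard */
        }
    }
    return count;
}
```

To try it on a saved page, read the file into a buffer and call `extract_hrefs(buf, stdout)`. Like the flex rules, it only sees double-quoted values preceded by a single space, and `<a` also matches the start of longer tag names.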

1 comment

lgas 1485 days ago

It's hard to imagine an environment where the speed/resource difference between that approach and Python would matter.

Can't see reaching for something like that instead of something like

    curl -s url | htmlq a --attribute href
