Hacker News
by 1vuio0pswjnm7 1429 days ago
Below is one for PDF. Compile the 052.l file with something like

     x=$1;
     flex -8iCrf $x;
     cc -O3 -std=c89 -W -Wall -pedantic -I$HOME -pipe lex.yy.c -static -o yy${x%.l};
     strip -s yy${x%.l};
     test -d yy||mkdir yy;
     export PATH=$PATH:$HOME/yy;
     exec mv yy${x%.l} yy;

"yy045" is a small program to remove chunked transfer encoding.

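For anyone unfamiliar with chunked transfer encoding, here is a rough idea of what such a filter has to do. This is not the yy045 source, which is not shown here. It is a minimal POSIX sh sketch with a hypothetical function name, `dechunk`, and it assumes the HTTP response headers have already been stripped from stdin.

```shell
#!/bin/sh
# dechunk: hypothetical helper, NOT the yy045 program from this comment.
# Reads a chunked HTTP/1.1 body on stdin (headers already removed)
# and writes the reassembled payload to stdout.
dechunk() {
    while IFS= read -r line; do
        # Keep only the leading hex digits: this drops the trailing CR
        # and any ";extension" that follows the chunk size.
        size=${line%%[!0-9A-Fa-f]*}
        # Skip the bare CRLF that follows each chunk's data.
        [ -n "$size" ] || continue
        n=$((0x$size))
        # A zero-length chunk marks the end of the body.
        if [ "$n" -eq 0 ]; then break; fi
        # Copy exactly n bytes of chunk data, byte for byte.
        dd bs=1 count="$n" 2>/dev/null
    done
}
```

Because the chunk data is copied by byte count with dd, newlines inside the payload pass through untouched. The price is speed: bs=1 is slow on large bodies.
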
These programs are to be used in pipelines, something like

      echo https://www.bezem.de/pdf/ReservedWordsInC.pdf|yy025|nc -vv h1b 80|yy052 >1.pdf

"h1b" is a HOSTS file entry for a TLS-enabled forward proxy on localhost.

"yy025" is a small program that generates HTTP.

Interestingly, I think curl was modified in recent years to detect binary data on stdout. I just tested the following and it extracted the PDF automatically.

       curl https://www.bezem.de/pdf/ReservedWordsInC.pdf > 1.pdf

However, one thing that curl does _not_ do is HTTP/1.1 pipelining. I use pipelining on a daily basis. That is where these programs become useful for me.
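
To make the pipelining point concrete: pipelining means writing several requests down one connection before reading any response, with the server answering in the same order. Below is a minimal sketch of a request generator in POSIX sh. It is not yy025. The function names `emit_get` and `pipeline_requests` are made up for illustration, and it assumes every URL is on the same host and that a bare GET with `Host` and `Connection` headers is enough.

```shell
#!/bin/sh
# emit_get: hypothetical helper. Prints one HTTP/1.1 GET request for
# URL $1. If $2 is non-empty it is sent as the Connection header value.
emit_get() {
    rest=${1#*://}              # drop the scheme
    host=${rest%%/*}            # everything up to the first slash
    case $rest in
        */*) path=/${rest#*/} ;;
        *)   path=/ ;;
    esac
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\n' "$path" "$host"
    if [ -n "${2:-}" ]; then printf 'Connection: %s\r\n' "$2"; fi
    printf '\r\n'
}

# pipeline_requests: reads URLs from stdin, one per line, and writes
# all requests back to back so they can be sent down one connection.
pipeline_requests() {
    prev=
    while IFS= read -r url; do
        [ -n "$url" ] || continue
        # HTTP/1.1 connections are persistent by default, so the
        # earlier requests carry no Connection header.
        if [ -n "$prev" ]; then emit_get "$prev"; fi
        prev=$url
    done
    # Only the final request asks the server to close the connection.
    if [ -n "$prev" ]; then emit_get "$prev" close; fi
}
```

Usage would be something like `printf '%s\n' URL1 URL2 | pipeline_requests | nc -vv h1b 80`, with the responses then arriving one after another on stdout.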

       cat > 052.l

       /* PDF file carver */
       /* PDFs can contain newlines */
       /* yy045 removes them so don't use yy045 */
   
    #define echo ECHO
    #define jmp BEGIN
    int fileno(FILE *);
   
   xa "%PDF-"
   xb "%%EOF" 
   
   %s xa 
   %option noyywrap nounput noinput
   %%
   
   {xa} echo;jmp xa;
   <xa>{xb} echo;jmp 0;
   <xa>.|\n|\r echo;
   .|\n
   
   %%
   int main(){ yylex();exit(0) ;}

   ^D
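
A quick way to sanity-check what the carver wrote is to look for the same two markers the rules above key on, `%PDF-` at the start and `%%EOF` near the end. This is only a sketch: `looks_like_pdf` is a hypothetical helper name, and the 1024-byte tail window is an arbitrary assumption. It does not validate the PDF in any real sense.

```shell
#!/bin/sh
# looks_like_pdf: hypothetical helper. Succeeds if file $1 starts with
# "%PDF-" and contains "%%EOF" somewhere in its last 1024 bytes.
looks_like_pdf() {
    # Read the first five bytes only and compare them to the header.
    [ "$(dd if="$1" bs=5 count=1 2>/dev/null)" = '%PDF-' ] || return 1
    # LC_ALL=C keeps grep from tripping over non-text bytes.
    tail -c 1024 "$1" | LC_ALL=C grep -q '%%EOF'
}
```

For example, `looks_like_pdf 1.pdf && echo ok` after running the pipeline. If HTTP headers leaked into the output, the first check fails.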