by bringtheaction
3048 days ago
Speaking of text conversion and git: I usually don't commit non-textual data aside from necessary files like image and audio assets. One time, though, I committed some PDFs in a "samples" directory for a tool I made to extract data from a set of PDF files. Later I removed one of them and noticed that "git show" displayed the text contents of the PDF in the diff. I found that rather mind-blowing, given how much trouble I had had extracting text myself, while git was casually showing me an ASCII rendering of the document that more or less preserved its layout. This prompted me to look further into the open source text extraction tools available, and I ended up finding one that was better than the one I had originally selected and been building on. Happily, my own tools were built in such a way that I could reuse most of the code I had written for the previous tool. During the rewrite I also realized that I could write the new code in a much cleaner way, so there were basically only upsides to switching tools and rewriting some of my code :)