by gcanyon
320 days ago
The answer seems obvious to me:

1. PDFs support arbitrary attached/included metadata in whatever format you like.
2. So everything that produces PDFs should attach the same information in a machine-friendly format.
3. Then everyone who wants to "parse" the PDF can refer to the metadata instead.

From a practical standpoint: my first name is Geoff. Half the resume parsers out there interpret my name as "Geo" and "ff" separately, because that's how the text gets placed into the PDF. This happens with PDFs from multiple source applications.
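As a rough illustration of steps 1 to 3, here is a minimal sketch, assuming JSON as the "machine-friendly format" and a hand-rolled writer rather than any particular PDF library. It registers the data in the catalog's /EmbeddedFiles name tree, the standard place PDF readers list attachments. The function name, the file name `resume.json`, and the `given_name` field are all hypothetical; a real resume generator would also draw the visible text on the page.

```python
import json


def build_pdf_with_attachment(name: str, payload: dict) -> bytes:
    """Return a minimal one-page PDF that embeds `payload` as a JSON file.

    Hypothetical helper. Assumes `name` contains no parentheses or
    backslashes, because it is written as a PDF literal string unescaped.
    """
    data = json.dumps(payload, sort_keys=True).encode("utf-8")
    objects = [
        # 1: catalog -> page tree + embedded-files name tree
        (f"<< /Type /Catalog /Pages 2 0 R "
         f"/Names << /EmbeddedFiles << /Names [({name}) 4 0 R] >> >> >>").encode(),
        # 2: page tree with a single page
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # 3: a blank US Letter page (the visible resume would be drawn here)
        b"<< /Type /Page /Parent 2 0 R /Resources << >> /MediaBox [0 0 612 792] >>",
        # 4: file specification pointing at the embedded stream
        f"<< /Type /Filespec /F ({name}) /UF ({name}) /EF << /F 5 0 R >> >>".encode(),
        # 5: the embedded JSON bytes; #2F is the escaped '/' in application/json
        b"<< /Type /EmbeddedFile /Subtype /application#2Fjson /Length "
        + str(len(data)).encode() + b" >>\nstream\n" + data + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []  # byte offset of each object, needed for the xref table
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # mandatory free-list head (object 0)
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()  # fixed 20-byte entries
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)


pdf = build_pdf_with_attachment("resume.json", {"given_name": "Geoff"})
```

Writing `pdf` to a file gives a document whose attachment panel should show `resume.json`; a parser following step 3 would read that JSON instead of reconstructing the name from positioned glyphs.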
If you're interested in helping out the resume parsers, take a look at the accessibility tree. Not every PDF renderer generates accessible PDFs, but accessible PDFs can help shitty AI parsers get their names right.
As for the ff problem, that's probably the resume analyzer not being able to cope with non-ASCII text such as the ﬀ ligature (U+FB00, a single character standing for "ff"). You may be able to configure the PDF renderer not to generate ligatures like that, though the text will often look uglier as a result.
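The comment above suggests fixing this on the rendering side. On the parsing side, a swapped-in technique is Unicode NFKC normalization, available in Python's standard `unicodedata` module, which maps compatibility characters such as U+FB00 back to the plain letters they represent. The sketch below assumes the extracted text actually contains the ligature character; it does not help if the extractor has already split "Geo" and "ff" into separate text runs. The function name `normalize_extracted` is hypothetical.

```python
import unicodedata


def normalize_extracted(text: str) -> str:
    """Hypothetical clean-up step for text pulled out of a PDF.

    NFKC normalization replaces compatibility characters such as the
    U+FB00 'ff' ligature with the plain letters they stand for; the
    split/join then collapses any whitespace runs into single spaces.
    """
    return " ".join(unicodedata.normalize("NFKC", text).split())


raw = "Geo\ufb00"                  # 'Geo' followed by the single ligature character
print(raw.isascii())               # False: this is what trips ASCII-only parsers
print(normalize_extracted(raw))    # Geoff
```

The same call also folds other Latin ligatures, for example U+FB01 ("ﬁ") becomes "fi".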