Hacker News new | ask | show | jobs
by tlarkworthy 2443 days ago
So a regex is used for the weak classifiers, just they are not trusted. So you say a thing ending in -ND is 90% chance of being a part number. So you have regexes with wiggle room. Then u dictate hard knowledge that table headers must be above data, and then the MIP solver has the freedom to override the regexes classifiers with knowledge from elsewhere. This works well if you have some really strong top level structural knowledge.