by mSparks 3504 days ago
My first reaction would be to parse until you hit a problem, then use a string distance function and a genetic algorithm to find the problematic characters.

In other words, find multiple candidate fixes that each result in a valid JSON object and choose the one with the shortest distance from the original text.

Then, of course, log the changes.
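A rough sketch of that idea in Python, under some loud assumptions: it swaps the genetic algorithm for a plain breadth-first search over single-character edits near the position where the parser fails, and it uses the number of edits as a stand-in for a proper string distance function. The names `repair`, `neighbours`, `error_pos`, `CANDIDATE_CHARS` and `WINDOW` are made up for illustration. Only `json.loads` and `json.JSONDecodeError` come from the standard library.

```python
import json
from collections import deque

CANDIDATE_CHARS = '"{}[],:'  # assumed set of structural characters worth trying
WINDOW = 2                   # assumed search radius around the parser's error offset


def error_pos(text):
    """Return the offset where parsing fails, or None if text is valid JSON."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        return err.pos


def neighbours(text, pos):
    """Yield (description, new_text) for every single-character edit near pos."""
    lo, hi = max(0, pos - WINDOW), min(len(text), pos + WINDOW)
    for i in range(lo, hi + 1):
        if i < len(text):
            yield f"delete {text[i]!r} at {i}", text[:i] + text[i + 1:]
        for ch in CANDIDATE_CHARS:
            yield f"insert {ch!r} at {i}", text[:i] + ch + text[i:]
            if i < len(text):
                yield (f"replace {text[i]!r} with {ch!r} at {i}",
                       text[:i] + ch + text[i + 1:])


def repair(text, max_edits=2):
    """Breadth-first search, so the first valid candidate uses the fewest edits.

    Returns (repaired_text, list_of_edit_descriptions), or (None, []) if no
    candidate within max_edits parses.
    """
    queue = deque([(text, [])])
    seen = {text}
    while queue:
        current, edits = queue.popleft()
        pos = error_pos(current)
        if pos is None:
            return current, edits
        if len(edits) == max_edits:
            continue
        for desc, candidate in neighbours(current, pos):
            if candidate not in seen:
                seen.add(candidate)
                queue.append((candidate, edits + [desc]))
    return None, []


fixed, edits = repair('{"a": 1 "b": 2}')
print(fixed)
for change in edits:
    print("changed:", change)  # the "log the changes" step
```

This only guarantees something that parses, not the data the writer intended, which is why the edit log matters. A genetic algorithm or a real distance metric would slot in where the breadth-first search and the edit count are used here.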

I do something similar with CSVs. MSSQL is notorious for spitting out junk inside CSV files.

Also, I can guess how it was created.

The code is probably in C, and a rare edge case is overwriting memory before the data hits the file.