by PaulHoule
1478 days ago
For a long time data analysis products have had "profiling" tools (https://en.wikipedia.org/wiki/Data_profiling) that look at the values in a column and make inferences about it, such as "these are all integers between 35 and 89". Most of those work at the level of the whole column, but I worked at a firm that developed a convolutional network classifier that could take either a single data point (say "1999-08-24") or the column header text plus the data point ("Independence Date", "1998-08-24") and guess at the data type (e.g. "date", "address", ...). It worked really well but wasn't explainable. Another disadvantage was that there were some things it was never going to figure out, such as the checksum on credit card numbers: https://en.wikipedia.org/wiki/Luhn_algorithm
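
For reference, the Luhn check is only a few lines when written as an explicit rule. This is a minimal sketch, not anything from the classifier described above; the function name `luhn_valid` and the choice to ignore spaces and dashes are my own assumptions for illustration.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Assumption: separators such as spaces or dashes are ignored.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False

    total = 0
    # Walk from the rightmost digit (the check digit) to the left,
    # doubling every second digit along the way.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # a well-known test card number
print(luhn_valid("4111 1111 1111 1112"))  # last digit changed
```

A rule like this could sit next to a learned classifier as a cheap, explainable check on columns it labels as card numbers.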