|
Hi HN, I am looking for the most accurate tool I can use to infer data types from tabular data (CSV, TSV, Excel). I need to be able to make some small customizations, if possible, to the detection algorithm. For example, if I have a 9-digit number starting with 0, it should be treated as a string. So far I have found Frictionless Framework [0], which seems good, but I can't see any way of specifying customizations to the profiling algorithm, and Data Profiler [1], which uses ML for type detection. It seems I should be able to train some new rules with it, but I need a CUDA-capable machine, which I do not have at the moment. Hoping the collective HN brain can point me to something better if it exists.
[0] - https://framework.frictionlessdata.io/
[1] - https://github.com/capitalone/DataProfiler |
https://pandas.pydata.org/pandas-docs/stable/reference/api/p... is pretty powerful (see also the "parse_dates" and "converters" parameters). See also read_excel().
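A minimal sketch of the "converters" and "parse_dates" parameters, assuming the truncated link above refers to pandas.read_csv. The sample data and the column names (account_id, amount, opened) are made up for illustration. Using str as the converter keeps the raw text of the column, so leading zeros survive:

```python
import io

import pandas as pd

# Hypothetical sample data; the column names are invented for this example.
CSV_TEXT = (
    "account_id,amount,opened\n"
    "012345678,10.5,2021-03-01\n"
    "123456789,7.25,2021-04-15\n"
)

df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    # The converter receives each cell as text; str returns it unchanged,
    # so "012345678" is not turned into the integer 12345678.
    converters={"account_id": str},
    # Parse this column as dates instead of leaving it as text.
    parse_dates=["opened"],
)

print(df.dtypes)
print(df["account_id"].tolist())
```

The remaining columns (here, amount) still go through the default inference, so only the columns you name are affected.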
You can also use procedural code to look at the column data and change the type:
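One way this could look, as a rough sketch rather than a definitive approach: read every column as text, then decide per column whether to convert it. The helper infer_column, the sample data, and the column names are hypothetical; the "9 digits starting with 0" rule comes from the question above.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names are invented for this example.
CSV_TEXT = (
    "account_id,quantity,price\n"
    "012345678,3,10.50\n"
    "123456789,12,7.25\n"
)


def infer_column(series: pd.Series) -> pd.Series:
    """Convert a text column to a numeric type unless a custom rule says otherwise."""
    values = series.dropna()
    # Custom rule: if any value is 9 digits starting with 0, keep the column as text.
    if values.str.fullmatch(r"0\d{8}").any():
        return series
    try:
        # Raises if any value cannot be parsed as a number.
        return pd.to_numeric(series)
    except (ValueError, TypeError):
        return series


# dtype=str disables the built-in inference so every cell arrives as text.
raw = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)
typed = pd.DataFrame({name: infer_column(raw[name]) for name in raw.columns})

print(typed.dtypes)
```

Further rules (dates, booleans, and so on) could be added as extra checks inside the helper before the numeric fallback.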