What kinds of data issues can you detect?
What data formats do you accept?
Is there a limit on data set size?