by Yomguithereal 113 days ago
> ‘Safely’. An attacker who has control over a row in that file can easily embed data that satisfies the statistical checks, thus injecting data.

This is clearly not the sort of thing you should expose to anyone, it is an optimization technique. In the same way, you would not use a fast but DoS-able hash function for a hashmap exposed to untrusted input.

1 comment

> This is clearly not the sort of thing you should expose to anyone, it is an optimization technique.

If you’re optimizing a workflow that uses CSV files and the data in those files is under your control, speeding up parallel reading of CSV files wouldn’t be on my list; replacing CSV with, say, Parquet would. An easy-to-implement alternative would be to simply write multiple CSV files.
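
To make the "multiple CSV files" alternative concrete, here is a rough Python sketch using only the standard library. The function names (`write_shards`, `read_shard`, `read_all`) and the `part-NNNN.csv` naming are made up for illustration, and it assumes you control the writer. Since each shard is a complete CSV file, a reader never has to guess where a row starts, even when fields contain quoted newlines.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_shards(rows, out_dir, n_shards):
    """Distribute rows round-robin over n_shards CSV files; return their paths."""
    out_dir = Path(out_dir)
    # Hypothetical naming scheme: part-0000.csv, part-0001.csv, ...
    paths = [out_dir / f"part-{i:04d}.csv" for i in range(n_shards)]
    files = [p.open("w", newline="") for p in paths]
    try:
        writers = [csv.writer(f) for f in files]
        for i, row in enumerate(rows):
            writers[i % n_shards].writerow(row)
    finally:
        for f in files:
            f.close()
    return paths


def read_shard(path):
    """Parse one shard from its first byte, so no row-boundary guessing is needed."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


def read_all(paths):
    """Read every shard concurrently and concatenate the rows, shard by shard."""
    # Threads keep the sketch simple; for CPU-bound parsing a
    # ProcessPoolExecutor would likely be the better fit.
    with ThreadPoolExecutor() as pool:
        return [row for shard in pool.map(read_shard, paths) for row in shard]
```

Usage would look something like `paths = write_shards(rows, "out", 8)` followed by `rows_back = read_all(paths)`. Note that round-robin sharding does not preserve the original row order across shards; if order matters, writing contiguous ranges per shard would be one option.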