Hacker News new | ask | show | jobs
by mywittyname 2223 days ago
That behavior is technically correct because those are all valid representations of that value, e.g., 007449938 = 7449938 = 7.449938E6. Pretty much any plain text (or binary) data format will have this same issue, unless it supports additional metadata for display purposes.

If you intended the data to be a string value, then it should have been enclosed in quotes.

4 comments

The problem is that when you have a number like 123451234512345, it may get rewritten as 1.234512E+14, and then you are loosing information.
Your assertion sounded incorrect so I tested it. Quoting large numbers does not preserve them on a save.

Adding a ' before it should work, but for data extracted from arbitrary systems you'll have few guarantees of the sort. Formatting the cells in Excel to display the full value instead of the truncated value also works from memory, but you won't always know ahead of time if this happened later in the file.

In our case it was often mailing identifier barcodes so any loss of precision made them entirely useless.

What if the value is an ID rather than an integer? CSV doesn't specify whether the data type is a string or a number. The schema needs to be inferred by the parser/user.
> If you intended the data to be a string value, then it should have been enclosed in quotes.

That's not part of any CSV specification I've seen, including RFC4180.