by geofft
3581 days ago
> suggestions for speeding it up in the tracker is to just remove that and work on raw bytes (FFS)

This is valid, because UTF-8 was designed to make it valid. The UTF-8 encoding of a comma, 0x2C (also its ASCII encoding), never appears as part of any other character's UTF-8 encoding. Every byte of a multi-byte UTF-8 sequence is in the range 0x80 to 0xFF, so no ASCII byte can show up inside one. The same holds for the double quote, 0x22. So scanning for 0x22 and 0x2C bytes, without stopping to decode the other UTF-8 sequences along the way, produces the correct result for any valid UTF-8 input string. You then fully decode UTF-8 for individual fields only when needed (and if you're doing a string compare against a target value that's already UTF-8, you never need to decode that field at all).
Is Go's internal representation of the target string UTF-8?