|
|
|
|
|
by 1letterunixname
814 days ago
|
|
There most certainly is getwchar() and fgetwc()/getwc() on anything that's POSIX C95, so that's more or less everything that's not a vintage antique. Reading individual UTF-8 codepoints is a trivial exercise if byte width getchar() were available, and portable C code to do so would be able to run on anything made after 1982. IIRC, they don't teach how to write portable C code in Comp Sci programs anymore and it's a shame. Never read a file completely into memory at once unless there is zero chance of it being a huge file because this is an obvious DoS vector and waste of resources. |
|
Apologies for the imprecision: by OS API, I meant syscall, at least on POSIX systems. The functions you refer to are C stdio things. Note also they implement on top of read(2) one of the two options I mentioned: "loop over getting the next N bytes and getting all complete characters so far (with some extra complexity around characters that cross chunk boundaries)".
btw, if we're being precise, getwchar gets a code point, and character might mean grapheme instead. Same is true for the `str::chars` call in the LLM's Rust snippet. The docstring for that method mentions this [1] because it was written in this century after people thought about this stuff a bit.
> portable C code to do so would be able to run on anything made after 1982.
Our comments are part of a thread discussing this prompt [2] that specifically requests Rust and this snippet in response [3]. Not portable C code. You can use those C stdio functions from Rust, but you really shouldn't without a very good reason. Rust has its own IO library that is safe and well integrated with other Rust things like `#![derive(Debug)]`.
[1] https://doc.rust-lang.org/std/primitive.str.html#method.char...
[2] https://news.ycombinator.com/item?id=39910542
[3] https://news.ycombinator.com/item?id=39910542