by woodruffw 814 days ago
If we're being sticklers, this isn't reading character-by-character: it's performing a buffered read, which then gets iterated over.

Here's what it does if I feed your nitpick into it verbatim:

    use std::fs::File;
    use std::io::{self, Read};
    use std::str::from_utf8;

    fn read_file_character_by_character(path: &str) -> io::Result<()> {
        // Open the file in read-only mode.
        let mut file = File::open(path)?;

        let mut buffer = [0; 4]; // UTF-8 character max byte length
        let mut temp_buffer = vec![];

        while let Ok(bytes_read) = file.read(&mut buffer[..1]) { // Read byte by byte
            if bytes_read == 0 { // End of file
                break;
            }

            temp_buffer.push(buffer[0]);

            if let Ok(s) = from_utf8(&temp_buffer) {
                if s.chars().count() == 1 { // Complete character read
                    print!("{}", s);
                    temp_buffer.clear(); // Clear temp buffer for next character
                }
            }
        }

        Ok(())
    }

    fn main() {
        let path = "path/to/your/file.txt";

        if let Err(e) = read_file_character_by_character(path) {
            eprintln!("Error reading file: {}", e);
        }
    }
Unlike the original version, this version compiles and seems to basically work correctly. However, the design is misleading: `buffer` is declared as an array of 4 bytes but only the first byte is ever used. The code also has suboptimal performance (one `read` call per byte on an unbuffered `File`) and error handling (`while let Ok(...)` silently ends the loop on an I/O error instead of returning it, and bytes that never form valid UTF-8 just pile up in `temp_buffer`), though that's not the end of the world.
All true. As I said in another fork of the thread, this comes down to part of what humans will still be valuable for in this loop: distilling poor requirements into better requirements.
I wouldn't say it's a nit. The file may be 10s of GB. Do you want to read it to a string?
The buffered read didn't do that; it used the default buffered reader implementation. IIRC that implementation currently defaults to an 8 KiB buffer, which is a little too small for high throughput, but it is substantially more performant than making a syscall per byte and doesn't spend much memory.
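To make the syscall-per-byte point concrete, here is a rough sketch that counts how often the underlying reader gets hit, with and without `BufReader`. `CountingReader` and the two `count_reads_*` functions are hypothetical names I made up for illustration; the counter only stands in for syscalls, since the test input is an in-memory slice rather than a real file.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical helper: wraps any reader and counts calls to `read`.
/// On a real `File`, each of these calls would be a syscall.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Reads the input one byte at a time with no buffering.
/// Returns (bytes consumed, calls made to the underlying reader).
fn count_reads_unbuffered<R: Read>(inner: R) -> io::Result<(usize, usize)> {
    let mut reader = CountingReader { inner, calls: 0 };
    let mut byte = [0u8; 1];
    let mut total = 0;
    loop {
        let n = reader.read(&mut byte)?;
        if n == 0 {
            break; // end of input
        }
        total += n;
    }
    Ok((total, reader.calls))
}

/// Same one-byte-at-a-time loop, but through a default-capacity `BufReader`.
/// The loop still asks for single bytes; the buffer absorbs them.
fn count_reads_buffered<R: Read>(inner: R) -> io::Result<(usize, usize)> {
    let mut reader = BufReader::new(CountingReader { inner, calls: 0 });
    let mut byte = [0u8; 1];
    let mut total = 0;
    loop {
        let n = reader.read(&mut byte)?;
        if n == 0 {
            break; // end of input
        }
        total += n;
    }
    Ok((total, reader.get_ref().calls))
}
```

For a 20,000-byte input the unbuffered loop hits the underlying reader at least once per byte, while the buffered loop should need only a handful of calls. The exact number depends on the default buffer capacity, which is an implementation detail of the standard library.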
I was talking about this:

    let mut file = File::open(path)?;
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
The original prompt is a bit under-specified. (But hey, that certainly matches the real world!)

You're going to have to buffer at least a little, to figure out where the USV / grapheme boundary is, depending on our definition of "character". To me, a BufReader is appropriate here; it avoids lots of tiny reads to the kernel, which is probably the right behavior in a real case.

To me, "read character by character" vaguely implies something that's going to yield a stream of characters. (Again, for some definition there.)
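For what it's worth, here is a minimal sketch of what "a stream of characters" could look like if we take "character" to mean a Unicode scalar value and assume the input is UTF-8. `Utf8Chars` and `invalid_utf8` are hypothetical names, not anything from the standard library. It doesn't attempt grapheme clusters, it doesn't retry interrupted reads, and it makes no effort to resynchronize after an invalid sequence.

```rust
use std::io::{self, BufReader, Read};
use std::str;

/// Hypothetical adapter: yields Unicode scalar values decoded from a
/// UTF-8 byte stream, reading through a `BufReader`.
struct Utf8Chars<R> {
    reader: BufReader<R>,
}

impl<R: Read> Utf8Chars<R> {
    fn new(inner: R) -> Self {
        Utf8Chars { reader: BufReader::new(inner) }
    }
}

/// Hypothetical helper that builds the error returned for malformed input.
fn invalid_utf8() -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, "invalid UTF-8 sequence")
}

impl<R: Read> Iterator for Utf8Chars<R> {
    type Item = io::Result<char>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut bytes = [0u8; 4]; // a UTF-8 sequence is at most 4 bytes

        // Read the leading byte; a zero-length read here is a clean end of input.
        match self.reader.read(&mut bytes[..1]) {
            Ok(0) => return None,
            Ok(_) => {}
            Err(e) => return Some(Err(e)),
        }

        // The leading byte tells us how long the sequence claims to be.
        let len = match bytes[0] {
            0x00..=0x7F => 1,
            0xC0..=0xDF => 2,
            0xE0..=0xEF => 3,
            0xF0..=0xF7 => 4,
            _ => return Some(Err(invalid_utf8())), // stray continuation byte, etc.
        };

        // Pull in the continuation bytes; hitting EOF mid-sequence is an error.
        if let Err(e) = self.reader.read_exact(&mut bytes[1..len]) {
            return Some(Err(e));
        }

        // Let the standard library do the real validation
        // (overlong forms, surrogates, out-of-range values).
        match str::from_utf8(&bytes[..len]) {
            Ok(s) => s.chars().next().map(Ok),
            Err(_) => Some(Err(invalid_utf8())),
        }
    }
}
```

Usage would be something like `for c in Utf8Chars::new(File::open(path)?) { print!("{}", c?); }`. Memory use stays bounded by the `BufReader` buffer no matter how large the file is, and I/O errors surface to the caller instead of ending the loop silently.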