by raggi 814 days ago
it does an ok job with this task:

    use std::fs::File;
    use std::io::{self, BufReader, Read};

    fn read_file_character_by_character(path: &str) -> io::Result<()> {
        // Open the file in read-only mode.
        let file = File::open(path)?;

        // Create a buffered reader to read the file more efficiently.
        let reader = BufReader::new(file);

        // `chars` method returns an iterator over the characters of the input.
        // Note that it returns a Result<(char, usize), io::Error>, where usize is the byte length of the char.
        for char_result in reader.chars() {
            match char_result {
                Ok(c) => print!("{}", c),
                Err(e) => return Err(e),
            }
        }

        Ok(())
    }

    fn main() {
        let path = "path/to/your/file.txt";

        if let Err(e) = read_file_character_by_character(path) {
            eprintln!("Error reading file: {}", e);
        }
    }

Only problem is that the critical `chars` method doesn't actually exist. Rust's standard library has a `chars` method for strings, but not for Readers.

(Also, the comment about the iterator element type is inconsistent with the code following it. Based on the comment, `c` would be of type `(char, usize)`, but then trying to print it with {} would fail because tuples don't implement Display.)

good catch. after I fed it rustc's error output, it produced:

    use std::fs::File;
    use std::io::{self, Read};

    fn read_file_character_by_character(path: &str) -> io::Result<()> {
        let mut file = File::open(path)?;
        let mut contents = String::new();

        file.read_to_string(&mut contents)?;

        for c in contents.chars() {
            println!("{}", c);
        }

        Ok(())
    }

    fn main() {
        let path = "path/to/your/file.txt";

        if let Err(e) = read_file_character_by_character(path) {
            eprintln!("Error reading file: {}", e);
        }
    }
But this doesn't read the file char-by-char; it reads the whole file into a string first and then iterates over that.
What would you expect? There's no OS API for "read one character", except in, say, ASCII, where 1 byte = 1 code point = 1 character. And it'd be hideously inefficient anyway. So you either loop over getting the next N bytes and getting all complete characters so far (with some extra complexity around characters that cross chunk boundaries), or you read the whole thing into a single buffer and iterate over the characters. This code does the latter. If this tool can't respond by asking requirements questions, I'd consider either choice valid.
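For reference, the first option (read N bytes, emit every complete character, carry the split tail over) might look roughly like this. It's a sketch assuming "character" means Rust's `char` (a Unicode scalar value) and that invalid UTF-8 should be an error; `for_each_char` is a made-up name, and the 8 KiB chunk size is arbitrary. It leans on the real `Utf8Error::valid_up_to` / `error_len` pair, where `error_len() == None` means "input ended mid-character", which is exactly the chunk-boundary case.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads `reader` in fixed-size chunks and calls
/// `on_char` for each complete UTF-8 code point. Bytes of a character that
/// straddle two chunks are carried over to the next read.
fn for_each_char<R: Read>(mut reader: R, mut on_char: impl FnMut(char)) -> io::Result<()> {
    let mut buf = [0u8; 8192];
    let mut carry = 0; // undecoded bytes (at most 3) sitting at the start of `buf`
    loop {
        let n = reader.read(&mut buf[carry..])?;
        if n == 0 {
            // EOF: leftover bytes mean the input ended mid-character.
            if carry > 0 {
                return Err(io::Error::new(io::ErrorKind::InvalidData, "truncated UTF-8 at end of input"));
            }
            return Ok(());
        }
        let filled = carry + n;
        let valid = match std::str::from_utf8(&buf[..filled]) {
            Ok(s) => s.len(),
            // error_len() == None: the chunk ended inside a character, so
            // decode the valid prefix and keep the tail for the next read.
            Err(e) if e.error_len().is_none() => e.valid_up_to(),
            Err(e) => return Err(io::Error::new(io::ErrorKind::InvalidData, e)),
        };
        let text = std::str::from_utf8(&buf[..valid]).expect("prefix already validated");
        for c in text.chars() {
            on_char(c);
        }
        // Move the incomplete tail to the front of the buffer.
        buf.copy_within(valid..filled, 0);
        carry = filled - valid;
    }
}

fn main() -> io::Result<()> {
    // Any Read works here; a File from File::open(path)? would be used the same way.
    let input = "héllo, 世界\n".as_bytes();
    for_each_char(input, |c| print!("{c}"))?;
    Ok(())
}
```

Memory use stays bounded by the chunk size no matter how big the file is, which is the main thing this buys over `read_to_string`.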

Of course, in real life, I do expect to get requirements questions back from an engineer when I assign a task. Seems more practical than anticipating everything up-front into the perfect specification/prompt. Why shouldn't I expect the same from an LLM-based tool? Are any of them set up to do that?

There most certainly is getwchar() and fgetwc()/getwc() on anything that's POSIX C95, so that's more or less everything that's not a vintage antique.

Reading individual UTF-8 codepoints would be a trivial exercise even with only a byte-width getchar(), and portable C code to do so would be able to run on anything made after 1982. IIRC, they don't teach how to write portable C code in Comp Sci programs anymore, and it's a shame.

Never read a file completely into memory at once unless there is zero chance of it being a huge file, because this is an obvious DoS vector and a waste of resources.
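If a whole-file read really is what you want, one partial mitigation is to cap it. A sketch, assuming rejecting oversize input is acceptable; `read_to_string_capped` is a hypothetical name, while `Read::take` and `String::from_utf8` are the real std APIs doing the work. Streaming avoids the problem entirely; this only puts a ceiling on it.

```rust
use std::io::{self, Read};

/// Hypothetical helper: read all of `reader` as UTF-8, but refuse inputs
/// larger than `max_bytes` instead of growing the buffer without bound.
fn read_to_string_capped<R: Read>(reader: R, max_bytes: u64) -> io::Result<String> {
    let mut bytes = Vec::new();
    // Read at most one byte past the cap; seeing that extra byte means "too big".
    reader.take(max_bytes.saturating_add(1)).read_to_end(&mut bytes)?;
    if bytes.len() as u64 > max_bytes {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "input exceeds size limit"));
    }
    // Validate UTF-8 only after the size check, so a cap landing mid-character
    // can't masquerade as an encoding error.
    String::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() -> io::Result<()> {
    // A File works the same way, e.g. read_to_string_capped(File::open(path)?, 1 << 20).
    let text = read_to_string_capped("hello".as_bytes(), 1024)?;
    println!("{text}");
    match read_to_string_capped("hello".as_bytes(), 4) {
        Ok(_) => println!("unexpectedly accepted"),
        Err(e) => println!("rejected: {e}"),
    }
    Ok(())
}
```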

> There most certainly is getwchar() and fgetwc()/getwc() on anything that's POSIX C95, so that's more or less everything that's not a vintage antique.

Apologies for the imprecision: by OS API, I meant syscall, at least on POSIX systems. The functions you refer to are C stdio things. Note also they implement on top of read(2) one of the two options I mentioned: "loop over getting the next N bytes and getting all complete characters so far (with some extra complexity around characters that cross chunk boundaries)".

btw, if we're being precise, getwchar gets a code point, and character might mean grapheme instead. Same is true for the `str::chars` call in the LLM's Rust snippet. The docstring for that method mentions this [1] because it was written in this century after people thought about this stuff a bit.
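A small illustration of the code point vs. grapheme gap, using only std. Both strings below render as "é" (one grapheme), but one is a single precomposed code point and the other is `e` plus U+0301 COMBINING ACUTE ACCENT. `str::chars` counts the second as two. `byte_and_char_counts` is just a throwaway helper for the demo; actual grapheme segmentation isn't in std and is usually done with the third-party `unicode-segmentation` crate.

```rust
/// Returns (UTF-8 byte length, number of code points) for `s`.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let decomposed = "e\u{301}"; // 'e' + U+0301 COMBINING ACUTE ACCENT
    let precomposed = "\u{e9}"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
    // Both display as one user-perceived character.
    println!("{decomposed} {:?}", byte_and_char_counts(decomposed)); // é (3, 2)
    println!("{precomposed} {:?}", byte_and_char_counts(precomposed)); // é (2, 1)
}
```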

> portable C code to do so would be able to run on anything made after 1982.

Our comments are part of a thread discussing this prompt [2], which specifically requests Rust, and this snippet in response [3]. Not portable C code. You can use those C stdio functions from Rust, but you really shouldn't without a very good reason. Rust has its own IO library that is safe and well integrated with other Rust things like `#[derive(Debug)]`.

[1] https://doc.rust-lang.org/std/primitive.str.html#method.char...

[2] https://news.ycombinator.com/item?id=39910542

[3] https://news.ycombinator.com/item?id=39910542

A few notes:

- It should be generating `path: impl AsRef<Path>` to be properly generic.

- It's not setting a nonzero exit code on error.

- Edge case handling is a vital property for production-usable tools at scale. I wonder whether it can yet handle special cases, such as generating a conditionally compiled Linux version that uses the splice syscall when the arguments are 2 file handles.
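Roughly what the `impl AsRef<Path>` suggestion looks like in practice. `count_chars` is a made-up example function; the non-generic `inner` function is a common idiom (std's own `fs` functions are written this way) so the real body is compiled once rather than once per caller type.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical example: generic over anything path-like
/// (&str, String, &Path, PathBuf, OsString, ...).
fn count_chars(path: impl AsRef<Path>) -> io::Result<usize> {
    // Non-generic inner function: the generic shell only converts.
    fn inner(path: &Path) -> io::Result<usize> {
        Ok(fs::read_to_string(path)?.chars().count())
    }
    inner(path.as_ref())
}

fn main() -> io::Result<()> {
    let file = std::env::temp_dir().join("asref_path_demo.txt");
    fs::write(&file, "héllo")?;
    // The same signature accepts a &PathBuf, a &Path, and an owned String.
    println!("{}", count_chars(&file)?);
    println!("{}", count_chars(file.as_path())?);
    println!("{}", count_chars(file.to_string_lossy().into_owned())?);
    fs::remove_file(&file)?;
    Ok(())
}
```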

On errors, it exits with success.
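One way to fix that, sketched with `std::process::ExitCode` (stable since Rust 1.61). `print_chars` is a hypothetical stand-in for the thread's `read_file_character_by_character`. Declaring `fn main() -> io::Result<()>` and using `?` would also exit nonzero, but it prints the error's `Debug` form rather than a custom message.

```rust
use std::fs;
use std::io;
use std::process::ExitCode;

/// Hypothetical stand-in for the thread's read_file_character_by_character.
fn print_chars(path: &str) -> io::Result<()> {
    for c in fs::read_to_string(path)?.chars() {
        print!("{c}");
    }
    Ok(())
}

fn main() -> ExitCode {
    let path = "path/to/your/file.txt";
    if let Err(e) = print_chars(path) {
        eprintln!("Error reading file: {e}");
        // Nonzero status so shells and scripts can detect the failure.
        return ExitCode::FAILURE;
    }
    ExitCode::SUCCESS
}
```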
fwiw, the benchmark that matters really has nothing to do with authoring code.

the typing of code is the easy part even though it's a part a lot of folks are somewhat addicted to.

the things which have far more value are applying value judgements to requirements, correlating and incorporating sparse and inaccurate diagnostic information into a coherent debugging strategy, and so on. there will come a time when it can assist with these too, probably starting with requirements distillation, but complex debugging is a novel problem-solving area where we've yet to see substantial movement.

so if you want to stave off the robots coming for you, get good at debugging hard problems, and learn to make really great use of tools that accelerate the typing out of solutions to baseline product requirements.

If we're being sticklers, this isn't reading character-by-character: it's performing a buffered read, which then gets iterated over.
here's what it does if I feed your nitpick into it verbatim:

    use std::fs::File;
    use std::io::{self, Read};
    use std::str::from_utf8;

    fn read_file_character_by_character(path: &str) -> io::Result<()> {
        // Open the file in read-only mode.
        let mut file = File::open(path)?;

        let mut buffer = [0; 4]; // UTF-8 character max byte length
        let mut temp_buffer = vec![];

        while let Ok(bytes_read) = file.read(&mut buffer[..1]) { // Read byte by byte
            if bytes_read == 0 { // End of file
                break;
            }

            temp_buffer.push(buffer[0]);

            if let Ok(s) = from_utf8(&temp_buffer) {
                if s.chars().count() == 1 { // Complete character read
                    print!("{}", s);
                    temp_buffer.clear(); // Clear temp buffer for next character
                }
            }
        }

        Ok(())
    }

    fn main() {
        let path = "path/to/your/file.txt";

        if let Err(e) = read_file_character_by_character(path) {
            eprintln!("Error reading file: {}", e);
        }
    }
Unlike the original version, this version compiles and seems to basically work correctly. However, the design is misleading: `buffer` is declared as an array of 4 bytes but only the first byte is ever used. The code also has suboptimal performance and error handling, though that's not the end of the world.
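For comparison, a sketch of that loop that actually uses the 4-byte buffer and propagates I/O errors instead of silently stopping at the first one. It assumes "character" means a Unicode scalar value; `read_char` is a hypothetical helper, not a std function. The leading byte determines the sequence length, `read_exact` fills the rest of the same buffer, and `std::str::from_utf8` does the full validation.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical helper: reads one UTF-8 encoded code point from `reader`.
/// Returns Ok(None) at a clean end of input.
fn read_char<R: Read>(reader: &mut R) -> io::Result<Option<char>> {
    let mut buf = [0u8; 4]; // maximum UTF-8 length of one code point
    // Leading byte; zero bytes read here means a clean EOF.
    if reader.read(&mut buf[..1])? == 0 {
        return Ok(None);
    }
    // The leading byte's high bits say how many bytes the sequence has.
    let len = match buf[0] {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF7 => 4,
        _ => {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "invalid UTF-8 leading byte"))
        }
    };
    // Continuation bytes go into the rest of the same buffer; EOF here is
    // reported as UnexpectedEof instead of being silently ignored.
    reader.read_exact(&mut buf[1..len])?;
    // Full validation: overlong forms, surrogates, bad continuation bytes.
    let s = std::str::from_utf8(&buf[..len])
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok(s.chars().next())
}

fn main() -> io::Result<()> {
    // A BufReader<File> from File::open(path)? works the same way; BufReader
    // keeps these tiny reads from each becoming a syscall.
    let mut reader = BufReader::new("héllo, 世界\n".as_bytes());
    while let Some(c) = read_char(&mut reader)? {
        print!("{c}");
    }
    Ok(())
}
```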
all true, as I said in another fork of the thread, this comes down to part of what humans will still be valuable for in this loop: distilling poor requirements into better requirements.
I wouldn't say it's a nit. The file may be tens of GB. Do you want to read it into a string?
The buffered read didn’t do that; it used the default buffered reader implementation. IIRC that implementation currently defaults to an 8 KiB buffer, which is a little too small to be efficient for high throughput, but substantially more performant than making a syscall per byte, and without spending too much memory.
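If the default window is too small for a given workload, `BufReader::with_capacity` picks a different one. A sketch: `count_bytes` is a made-up helper that walks the reader's internal buffer via `fill_buf`/`consume`, and the 256 KiB figure is an arbitrary illustration, not a measured recommendation.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical helper: total byte count of `inner`, read through a
/// BufReader whose buffer size the caller chooses.
fn count_bytes<R: Read>(inner: R, capacity: usize) -> io::Result<usize> {
    let mut reader = BufReader::with_capacity(capacity, inner);
    let mut total = 0;
    loop {
        // fill_buf only calls the inner reader once the buffer is used up.
        let n = reader.fill_buf()?.len();
        if n == 0 {
            return Ok(total); // EOF
        }
        total += n;
        reader.consume(n);
    }
}

fn main() -> io::Result<()> {
    // BufReader::new uses std's default capacity.
    println!("default capacity: {} bytes", BufReader::new(io::empty()).capacity());
    let data = vec![b'x'; 1_000_000];
    // 256 KiB is an arbitrary example value, not a tuned one.
    println!("{} bytes", count_bytes(&data[..], 256 * 1024)?);
    Ok(())
}
```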
I was talking about this:

    let mut file = File::open(path)?;
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
The original prompt is a bit under-specified. (But hey, that certainly matches the real world!)

You're going to have to buffer at least a little to figure out where the USV (Unicode scalar value) or grapheme boundary is, depending on our definition of "character". To me, a BufReader is appropriate here; it avoids lots of tiny reads to the kernel, which is probably the right behavior in a real case.

To me, "read character by character" vaguely implies something that's going to yield a stream of characters. (Again, for some definition there.)
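A minimal sketch of the "yield a stream of characters" reading, assuming "character" means Rust's `char` (a Unicode scalar value), not a grapheme. `CharStream` is a hypothetical type, not something std provides; it builds on the real `Read::bytes` adapter and uses `u8::leading_ones` on the first byte to get the sequence length. Each item is an `io::Result<char>`, so I/O and decoding errors surface per item instead of ending the loop silently.

```rust
use std::io::{self, BufReader, Bytes, Read};

/// Hypothetical adapter: yields one `char` at a time from any byte source.
struct CharStream<R: Read> {
    bytes: Bytes<R>,
}

impl<R: Read> CharStream<R> {
    fn new(reader: R) -> Self {
        CharStream { bytes: reader.bytes() }
    }
}

fn invalid_data<E>(err: E) -> io::Error
where
    E: Into<Box<dyn std::error::Error + Send + Sync>>,
{
    io::Error::new(io::ErrorKind::InvalidData, err)
}

impl<R: Read> Iterator for CharStream<R> {
    type Item = io::Result<char>;

    fn next(&mut self) -> Option<io::Result<char>> {
        // `?` ends the iteration cleanly at EOF.
        let first = match self.bytes.next()? {
            Ok(b) => b,
            Err(e) => return Some(Err(e)),
        };
        // The count of leading 1 bits encodes the sequence length.
        let len = match first.leading_ones() {
            0 => 1,
            n @ 2..=4 => n as usize,
            _ => return Some(Err(invalid_data("invalid UTF-8 leading byte"))),
        };
        let mut buf = [first, 0, 0, 0];
        for slot in &mut buf[1..len] {
            *slot = match self.bytes.next() {
                Some(Ok(b)) => b,
                Some(Err(e)) => return Some(Err(e)),
                None => return Some(Err(invalid_data("truncated UTF-8 sequence"))),
            };
        }
        // from_utf8 rejects overlong encodings, surrogates, bad continuation bytes.
        Some(match std::str::from_utf8(&buf[..len]) {
            Ok(s) => Ok(s.chars().next().expect("non-empty valid UTF-8")),
            Err(e) => Err(invalid_data(e)),
        })
    }
}

fn main() -> io::Result<()> {
    // In real use: BufReader::new(File::open(path)?). The BufReader matters
    // because `Bytes` makes one `read` call per byte on its inner reader.
    let reader = BufReader::new("héllo, 世界\n".as_bytes());
    for c in CharStream::new(reader) {
        print!("{}", c?);
    }
    Ok(())
}
```

Because it's an ordinary iterator, the usual adapters apply, e.g. `CharStream::new(r).collect::<io::Result<String>>()`.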