by Felk 1940 days ago
I got a function that assigned the same expression to three variables. Then it declared a void function with documentation stating "returns true on success, false otherwise". Apparently that code was written by a human, which makes me doubt either the correctness of that website or the quality of the code it was fed.
4 comments

First code it showed me had getXXX() methods returning void, each of which contained nothing but a printf using the same string variable with no apparent connection to XXX, along with invalid format strings. Surely code this nonsensical has to be generated. Yet when I clicked "GPT2" it said I was wrong.
Don't underestimate the power of failed merges and indifference
I know some closed source projects that suffered from that kind of getter. Sadly the coders complaining most about it also were the ones who committed the most crimes against common sense on it.
This made me worried, so I went and spot-checked 5-6. Using the "cheat sheet" I was always able to guess correctly, so I think the site is working fine.

The list of packages the real snippets are drawn from is here (maybe if you want to avoid using them... ;) ):

https://moyix.net/~moyix/sample_pkgnames.txt

Note that the GPT samples are prompted with 128 characters randomly selected from those same packages, so you will see GPT2-generated code that mentions the package name etc. However, these packages were not used for training.

Same thought here - apparently humans read from uninitialized arrays immediately after declaring them! That said, it is still a pretty fun website :)
Are people surprised to see low quality code in the wild? Guess what, most code is subpar.
I actually ran into a case where I wanted to do this, but was forced not to.

What was the scenario? I had a couple of small, fixed-size char buffers and I wanted to swap their valid portions, but the obvious choice of swap_ranges(a, a + max(na, nb), b) would run into this issue. (n.b. this wouldn't be correct for non-POD types anyway, but we're talking about chars.)

On top of it being annoying not to be able to do the convenient thing, it made life harder when debugging, because the "correct" solution does not preserve the bit patterns (0xCC/0xCD or whatever) that the debug build injects into uninitialized arrays, which makes it harder to tell when I later read an uninitialized element from a swapped-from array.
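
For readers who want to see what the "correct" solution might look like, here is a minimal sketch, assuming two char buffers of equal capacity with separately tracked valid lengths. The name swap_valid and its signature are hypothetical, not taken from any real codebase. It swaps only the prefix that is initialized in both buffers and then copies the remaining tail one way, so it never reads past either valid length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: exchange the valid prefixes of two equal-capacity
// char buffers without reading any element past either valid length.
// na and nb are the valid lengths; they are exchanged as well.
template <std::size_t N>
void swap_valid(char (&a)[N], std::size_t& na, char (&b)[N], std::size_t& nb) {
    assert(na <= N && nb <= N);
    const std::size_t common = std::min(na, nb);
    // Both buffers are initialized over [0, common), so swapping is safe.
    std::swap_ranges(a, a + common, b);
    // Only the longer buffer holds valid data past `common`; copy it across.
    if (na > nb) {
        std::copy(a + common, a + na, b + common);
    } else if (nb > na) {
        std::copy(b + common, b + nb, a + common);
    }
    std::swap(na, nb);
}
```

Note that the buffer that ends up shorter keeps its old bytes past its new length instead of the debug fill pattern, which is exactly the debugging drawback described above.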

Why would you ever want to swap an uninitialized value into a buffer? You're wasting CPU cycles writing out data that you are guaranteed to never want to use. Why not just do a copy from the source buffer to the uninitialized one (as that is likely the half of the swap that is desired)?
I literally explained why I want to do that in my last paragraph?
Your last comment just talks about not detecting when you read uninitialized values, but obviously, you wouldn't read uninitialized values _if you never wrote them_?

Unless your use case is swapping with an uninitialized buffer to mark a buffer as "done" and detect further use of it?

You're not understanding what I'm saying. I'm talking about what I see in the debugger when I'm debugging. When you see 0xCC in a variable in the debugger you know you probably had an out of bounds read. Because in debug mode the compiler and runtime leave these markers in uninitialized memory. For that to be helpful you need to swap until the max of the initialized sizes of both arrays, so that you preserve these markers. You defeat that helpful feature if you copy the uninitialized portion of the buffer instead of swapping.
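
A rough illustration of the point about markers, under loud assumptions: the fill pattern is simulated by writing 0xCC explicitly (a real debug build would do this itself, and portable code should not rely on reading uninitialized memory), and swap_to_max is a hypothetical helper name. Swapping up to the larger of the two valid lengths moves the fill bytes from the shorter buffer into the tail of the longer one, so the swapped-from buffer still looks "uninitialized" past its new length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Assumption: stands in for the pattern a debug build writes into
// uninitialized memory; 0xCC is used here purely for illustration.
constexpr unsigned char kFill = 0xCC;

// Hypothetical helper: swap the first max(na, nb) bytes of both buffers,
// so fill bytes from the shorter buffer land in the longer buffer's tail.
template <std::size_t N>
void swap_to_max(unsigned char (&a)[N], std::size_t& na,
                 unsigned char (&b)[N], std::size_t& nb) {
    assert(na <= N && nb <= N);
    std::swap_ranges(a, a + std::max(na, nb), b);
    std::swap(na, nb);
}
```

With both buffers pre-filled with kFill, a holding 5 valid bytes and b holding 2, a call to swap_to_max leaves every byte of a past index 2 equal to kFill, so a later stray read of a[3] would stand out in the debugger.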
Since these are snippets from a random position in the file, it's possible that the code that initialized them was outside the snippet?
Maybe the probability of GPT2 generating that sequence is nearly 0. Sometimes weird edge cases are more human.