by tehjoker 530 days ago
I write C++, but I had to teach myself and always wondered why others use imprecise types. Portability is one possibility, but then you can't know whether your data structure will break for a given input.

History and tradition at this point. Bit-sized integers and the other “meaningful” integer types like size_t weren’t added to the languages themselves until C99 and C++11. A lot of us learned those languages before that, and lots of code still exists from that time, or at least code bases that have evolved from that time.

I think it actually comes from the opposite of portability. Access to different kinds of systems wasn’t common then. If you were learning and working on a system where int is 32 bits and pointers are 32 bits, and other possibilities are just vague mentions in whatever books you’re learning from, it’s very easy to get into the habit of thinking that int is the right type for a 32-bit quantity and for something that can hold a pointer.

The lack of explicitly sized ints is actually a pro-portability feature but it prioritizes speed and ease of implementation of arithmetic operations over bitwise operations. The minimum ranges for each type can be used as a guide for average users to write correct and portable arithmetic and carefully-written bitwise operations. But most people would rather not think about the number of bits being variable at all.
Sort of. It was kind of handy when int would be the natural machine size, 16-bit on 16-bit hardware, 32 on 32. But making int be 64-bit leaves a gap, so it’s generally stuck at 32-bit even on 64-bit hardware. And then people couldn’t agree on whether long should be 32 or 64 on 64-bit platforms, so now none of the basic integer types will typically give you the natural machine size on both 32 and 64-bit targets. At this point, if you want the “biggest integer that goes fast on this hardware” then your best bet is probably intptr_t or size_t.
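
To make the point about minimum ranges versus actual widths concrete, here is a minimal sketch. The helper name `storage_bits` is made up for illustration, and the `intptr_t` check assumes a platform that provides that optional type (mainstream ones do).

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Storage size of T in bits on whatever platform this is compiled for.
// (Hypothetical helper; it counts any padding bits as well.)
template <typename T>
constexpr std::size_t storage_bits() {
    return sizeof(T) * CHAR_BIT;
}

// What the standards guarantee: minimum ranges only, never exact widths.
static_assert(storage_bits<int>() >= 16, "int holds at least 16 bits");
static_assert(storage_bits<long>() >= 32, "long holds at least 32 bits");
static_assert(storage_bits<long long>() >= 64, "long long holds at least 64 bits");

// Assumption: intptr_t exists here and is at least pointer-sized, which is
// why it tends to track the machine word better than int or long do.
static_assert(sizeof(std::intptr_t) >= sizeof(void*), "intptr_t can hold a pointer");
```

On common 64-bit Linux targets `storage_bits<long>()` is typically 64, while on 64-bit Windows it is typically 32, which is the disagreement described above.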
There were/are machines where the char size is not 8 bits, and the ints are not sized in powers of 2. These machines are now rare but I think they still exist. This references some historical examples: https://retrocomputing.stackexchange.com/questions/12794/wer...
Oh wow, I didn't know size_t was so recent.
At least for C++, it's older than C++11; a lot of us used the "C++0x" pseudo-standard for a long time (it was mostly the draft of what later became C++11; as the C++0x name indicates, it was originally intended to be finished before 2010), and on most C++ compilers headers and types from C99 were available even when compiling C++ code (excluding MSVC, which really dragged its feet in implementing C99, and which AFAIK to this day still hasn't fully implemented all mandatory C99 features).
I believe it was somewhat older as part of typical C and C++ implementations, but didn’t get standardized for a while. A big part of the older C and C++ standards is about unifying and codifying things that implementations were already doing.
I'm not sure what you mean by "imprecise types", but if you mean something like using an `int` for an array index instead of `size_t` or something, I can tell you why I do it. Using `int` lets you use -1 as an easy invalid index, and iterating backwards is a straightforward modification of the normal loop: `for (int i = max; i >= 0; --i)`. That loop fails if using `size_t`, since it is never negative. Actually `size_t` may not even be correct for STL containers, it might be `std::vector::size_type` or something. Also, I don't think I've encountered an array with more than 2 billion items. And some things, like movie data, are usually iterated over using pointers. As you say `int` is easy to type.
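
For reference, a rough sketch of the backwards loop in both flavors. The function names `reversed_signed` and `reversed_unsigned` are invented for this example; the signed version assumes the container size fits in an `int`.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Signed index: the loop condition `i >= 0` works because i can reach -1.
// Assumption: v.size() fits in an int (checked in debug builds).
std::vector<int> reversed_signed(const std::vector<int>& v) {
    assert(v.size() <= static_cast<std::size_t>(INT_MAX));
    std::vector<int> out;
    out.reserve(v.size());
    for (int i = static_cast<int>(v.size()) - 1; i >= 0; --i) {
        out.push_back(v[static_cast<std::size_t>(i)]);
    }
    return out;
}

// Unsigned index: `i >= 0` would always be true for size_t, so test
// before decrementing instead. The body sees i = size-1 down to 0.
std::vector<int> reversed_unsigned(const std::vector<int>& v) {
    std::vector<int> out;
    out.reserve(v.size());
    for (std::size_t i = v.size(); i-- > 0;) {
        out.push_back(v[i]);
    }
    return out;
}
```

Both return the elements back to front; the unsigned form just needs the less obvious `i-- > 0` condition.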

Also, for something like half my programming life, a 2+GB array was basically unobtainable.

By precise, I meant more the byte width (uint32_t vs uint64_t etc). The other kinds of types help you track what the purpose of something is, but don't really assist with correctness at the machine level.

In my work, I have a lot of data that is > 2GB, so int32_t vs uint32_t is very meaningful, and usually using a uint32_t is just delaying upgrading to int64_t or uint64_t.
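
A small sketch of why the width matters at that scale. The helper `can_index` and the two element counts are made up for illustration; it only checks whether the last valid index fits in the chosen type.

```cpp
#include <cstdint>
#include <limits>

// True if every index 0 .. count-1 is representable in Index.
// (Hypothetical helper, not from any library.)
template <typename Index>
constexpr bool can_index(std::uint64_t count) {
    return count == 0 ||
           count - 1 <= static_cast<std::uint64_t>(std::numeric_limits<Index>::max());
}

constexpr std::uint64_t three_billion = 3000000000ULL;
constexpr std::uint64_t five_billion = 5000000000ULL;

// 3 billion elements: too many for int32_t, still fine for uint32_t.
static_assert(!can_index<std::int32_t>(three_billion), "int32_t tops out near 2.1 billion");
static_assert(can_index<std::uint32_t>(three_billion), "uint32_t reaches about 4.29 billion");

// 5 billion elements: uint32_t only delayed the problem.
static_assert(!can_index<std::uint32_t>(five_billion), "uint32_t is no longer enough");
static_assert(can_index<std::int64_t>(five_billion), "int64_t has plenty of room");
```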

Going in the other direction, a point cloud can usually be represented using 3 uint16_t and that saves a lot of memory vs using uint32_t or uint64_t.
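
As a rough illustration of the savings, assuming a plain struct of three coordinates with no extra padding (true on typical platforms with 8-bit bytes). `Point3` and `cloud_bytes` are names invented for this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// One point with three coordinates of the chosen integer type.
template <typename Coord>
struct Point3 {
    Coord x, y, z;
};

// Bytes needed to store n points contiguously.
template <typename Coord>
constexpr std::size_t cloud_bytes(std::size_t n) {
    return n * sizeof(Point3<Coord>);
}

// On typical platforms a point is 6, 12, or 24 bytes respectively,
// so a million points take roughly 6 MB, 12 MB, or 24 MB.
constexpr std::size_t small_cloud  = cloud_bytes<std::uint16_t>(1000000);
constexpr std::size_t medium_cloud = cloud_bytes<std::uint32_t>(1000000);
constexpr std::size_t large_cloud  = cloud_bytes<std::uint64_t>(1000000);
```

The catch is that uint16_t coordinates only cover 0 to 65535, so this works when positions are quantized to a grid of that resolution.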

If you want an index that can go negative, then the right type is ssize_t, not int.