by adomanico 3366 days ago
It does seem to be affecting the end user binary size: http://imgur.com/a/FEgQY

These numbers were all around 60mb on Xcode 8.2.1

3 comments

60 millibits is pretty small.

(Pardon the joke, but while the "m" obviously means "M" from context, I really wish people -- especially computer engineers -- wouldn't say bits when they mean something eight times as large.)

I think you have a good point. The whole reason for standards (like SI units) is so that people don't need to guess or interpret. I don't get why this was downvoted.
A bit isn't a divisible unit, so nobody is going to be confused as to whether "mb" stands for "millibits". As for the capitalization of "b", if we're going to be pedantic then we should say "Mo" for "megaoctets", since "byte" is, as far as IEEE standards are concerned, of ambiguous length. But I think we can trust people enough to not spend too long puzzling over whether "mb" means megabytes or megabits, just as we can trust them to assume that byte implies eight bits in this context.
"A bit isn't a divisible unit"

What? Sure it is. When you measure the information content (or the entropy) of a message, you very frequently get non-integer numbers of bits per (character/unit/message/whatever).

Written English, for instance, has 1.46 bits of information per character.

Or 146 cb.
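
To illustrate how a non-integer number of bits per character falls out of the arithmetic, here is a minimal Python sketch. It computes only the simple per-symbol frequency entropy of a string. Figures like the one quoted above presumably come from models that also account for context, so this is not a way to reproduce that number. The function name `entropy_bits_per_symbol` is made up for this example.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(message: str) -> float:
    """Shannon entropy of the empirical symbol distribution, in bits per symbol."""
    counts = Counter(message)
    total = len(message)
    # H = -sum(p * log2(p)) over the observed symbol frequencies
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# 'a' occurs 2/3 of the time and 'b' 1/3, so the result is a fraction of a bit.
print(f"{entropy_bits_per_symbol('aab'):.3f}")   # 0.918
# Four equally likely symbols need exactly two bits each.
print(f"{entropy_bits_per_symbol('abcd'):.3f}")  # 2.000
```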

That is expressing a ratio. A more concrete example: 0.46 of a person doesn't exist, even if that ratio is useful for expressing statistics.
I think "gaining half a bit of information" about something can correspond to stuff that lets you update your probability distribution about it?

I'm not sure.

Or to express it another way, a variable with 3 possible states has 0.5 bits more capacity than one with 2.
0.585... bits more capacity, but yeah (bits = log2(states)).
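
For anyone who wants to check that figure, a quick Python sketch of bits = log2(states). The helper name `capacity_bits` is an illustrative choice, not something from the thread.

```python
import math

def capacity_bits(states: int) -> float:
    """Information capacity, in bits, of a variable with the given number of states."""
    return math.log2(states)

# Difference in capacity between a 3-state and a 2-state variable.
extra = capacity_bits(3) - capacity_bits(2)
print(f"{extra:.3f}")  # 0.585
```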
> you very frequently get non-integer numbers of bits per (character/unit/message/whatever)

That'd be an average, and it's like when you have 2.58 people per household. Presumably most people do not keep about 6/10 of a person around.

The point of a bit (in information theory) is that it is the smallest possible (read: not divisible) unit of information.

> But I think we can trust people enough to not spend too long puzzling over whether "mb" means megabytes or megabits

Please don't assume this. I have the great pleasure of working with network engineers, who have apparently globally decided that bits are a perfectly reasonable measurement of throughput and react very differently to speed in Mb/s and MB/s. I'm not trying to be pedantic or say that this is how it should be; I'm just saying that people really do use both units, it is horribly confusing, and anything you can do to avoid ambiguity is appreciated.

Well, no. If we're going to be pedantic, we should say MiB (mebibytes), because file sizes on disk are expressed in multiples of powers of 2, but the SI prefixes are multiples of 10.

So a 1 megabyte file (as reported by the file system) is actually 1048576 bytes, which technically - sorry, I mean pedantically - speaking, is 1 mebibyte.

To make matters worse, disk manufacturers use the decimal prefixes, so our nice 1 terabyte drive is 931 mebibytes, but is reported by the file system as 931 megabytes (not MiB).

Finally, memory manufacturers use the binary prefix, so 1 megabyte of RAM is actually 1 mebibyte (1048576 bytes).

A bit of a mess, no?

All the above is, IMHO, a consequence of imprecision. If we get used to being loose with our terminology, we risk carrying that attitude over into our work product, with sometimes regrettable results.

So I'll continue to strive to be pedantic (translation: precise).

> Well, no. If we're going to be pedantic, we should say MiB (mebibytes), because file sizes on disk are expressed in multiples of powers of 2,

Not true anymore. OS X (and I assume iOS) reports sizes in power-of-10 units.

If you think about it, it is really user-hostile to express file sizes as powers-of-two. Who can remember that a "GiB" is 1073741824 bytes?

I didn't know OS X used the decimal prefixes, but that just means it's less true, not untrue. There are still many more systems out there that use the binary prefix: I imagine most *nix, and I'm not sure about Windows. And RAM is still power of 2.

I don't think it's terribly user hostile to express sizes as powers of two when you work with these kinds of numbers for a living, especially when it's near the bare metal (Erlang binary data type FTW!)

But I do think it's user hostile to have two different units depending on what you're looking at. If it were all decimal or all binary, it would be much easier.

You probably meant 1TB to be 931 Gibibytes, didn't you?
Ah crap, yes, I did. Got to stop this middle of the night posting...
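
The decimal versus binary arithmetic in this sub-thread can be checked with a few lines of Python. This is only a sketch: the constant names are chosen for the example, and it assumes "1 terabyte" on the drive label means exactly 10^12 bytes.

```python
# SI (decimal) prefixes, as used on drive labels
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12
# IEC (binary) prefixes
KiB, MiB, GiB, TiB = 2**10, 2**20, 2**30, 2**40

drive_bytes = 1 * TB  # assumed: a "1 TB" drive holds exactly 10**12 bytes

print(MiB)                             # 1048576
print(GiB)                             # 1073741824
print(f"{drive_bytes / GiB:.2f} GiB")  # 931.32 GiB
print(f"{drive_bytes / TiB:.3f} TiB")  # 0.909 TiB
```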
While we're at it, one should use “Mi”, not “M”. M is still 1000 * 1000, while Mi is 1024 * 1024.

That amount of sloppiness in any other engineering discipline would just finish you off immediately.

Agreed. Fun fact: Wolfram Alpha understands mebibytes, which is useful if you want to quickly convert between networking specs (say megabits) and "real" computer units.

Then again, maybe doing more simple math by actually using one's brain wouldn't hurt either. :)

Edit: And yes, I'm aware that you'll never get the converted speed of what is written on the network device's box. But sometimes it's nice to have an upper limit you can compare to at least.

Many, many wire protocols use 5 bits of bandwidth to send 4 bits of information, for various reasons. So dividing by 10 gives you a better estimate.

Of course when gigabit became a thing, your practical throughput was more like 75 MBps for a very long time, and being off by 25% in capacity planning is a pretty big error (one I've seen numerous engineers make, and a few make both, which means you're off by 40%)
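
A rough Python sketch of the estimate described above. It assumes a line code that spends 5 wire bits per 4 data bits (so 10 wire bits per byte), which is not true of every link, and it takes the 75 MBps figure from the comment as given. The function name is hypothetical.

```python
def mbytes_per_s(megabits_per_s: float, wire_bits_per_byte: int = 8) -> float:
    """Convert a line rate in Mb/s to MB/s, given how many wire bits carry one byte."""
    return megabits_per_s / wire_bits_per_byte

gigabit = 1000.0                         # Mb/s
naive = mbytes_per_s(gigabit)            # divide by 8: ignores coding overhead
with_coding = mbytes_per_s(gigabit, 10)  # divide by 10: assumes 5 wire bits per 4 data bits
observed = 75.0                          # MB/s, the practical figure quoted above

print(naive, with_coding)                                         # 125.0 100.0
print(f"{1 - observed / with_coding:.0%} below the /10 estimate")  # 25% below the /10 estimate
print(f"{1 - observed / naive:.0%} below the /8 estimate")         # 40% below the /8 estimate
```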

After using many Unix tools that have this convention, I'm ok with 10 M referring to 10 * 1024 * 1024 bytes (10 MiB), contrasted with 10 MB meaning 10,000,000 bytes.
macOS (we are talking about Xcode) is using the SI definition though. 10 mega should be 10 million.
Ah, I didn't see your comment before I wrote mine. And I agree violently with your point of view.
This is gatekeeping, since the message is coded to be obvious to those "in the know" (of course mb means mebibyte!) but is a barrier to those who are trying to learn more (mb probably means megabits per second? it's a unit for measuring download speed? why is the 's' left off?)

This is putting the burden of collaboration in the wrong place: it shouldn't be a question of, can we expect a reasonable engineer in the industry to understand this unambiguously (with some deductions); but rather, can I hold down the shift key when typing the abbreviation for megabytes.

Obviously this depends on the actual audience; don't bother following this in team chat, where speed is more important than clarity.

It is divisible in information and coding theory, where it makes perfect sense.
Because we solve practical problems, and do not nitpick over what is technically correct. We are not drones and easily understand that in this context it's megabytes.

Just look at all that "technically it's mebibytes bla bla bla" in replies. No one cares. Write some code. Or better - go outside.

space launches have crashed because of confusion over standard units.

in that case it was confusion between metric and certain fantasy engineering units, but an error of 1000/1024 will cause troubles just as badly.

so with that attitude maybe don't write that code, and better stay inside or a rocket might fall on your head.

but for serious, that correction probably has taught more than 10 people the difference between uppercase B = bytes, lowercase b = bits, uppercase M = mega = 1 000 000, lowercase m = milli, MiB = mebibyte = 1024 x 1024 bytes = 1 048 576 bytes, or at least made them aware of the important fact there is a difference. while your pedantry about nitpicking has taught nobody anything except to always be alert cause there's people like you that like to offload mental ballast and use wrong units because they insist their errors can be inferred and corrected from context... which is an important lesson also, but as a warning, not to defend the behaviour.

This. Thank you.
Insightful comment which took me 2700 mS to write.
You can downvote?
Yeah, when your karma is high enough. I don't know what the amount is that leads to downvote enlightenment though.
I believe it is 1000.
It's less than 1Mk (one megakarma).
It's around the 500 mark at the moment for comment downvoting I believe.
And I really wish people wouldn't start this argument when it's clear what was meant from context. We can't all get our wishes. :)
Huh, and I thought mb was "make bucket", like aws s3 mb.

http://docs.aws.amazon.com/cli/latest/reference/s3/mb.html

Since you said that the "m" is obvious from context, I'll compromise from now on and use "mB" exclusively
You should have included the whole window in the screenshot to prevent confusion: http://imgur.com/a/e2wb3

Specifically to those who don't use iTunes Connect, this page is titled "Estimated App Store file sizes for Build", and the (?) callout says "This is the amount of disk space the app will take up on the customer's device."

No, that's the wrong side of the store you are looking at :)

That's the size as uploaded by the developer, not downloaded to user.

Then that's pretty confusing when the (?) button next to Install Size says "This is the amount of disk space the app will take up on the customer's device."
These numbers matched exactly to what is currently in the AppStore for the last release we have ready for sale
As it says below "You need to look at the download size on the end user device, not the binary size in iTunesConnect."

Is that what you are doing?

Because otherwise, it includes the bitcode size. (i don't pretend this makes sense. only that it is :P)

In any case, i would still bet more heavily on it being displayed wrong than anything else.

If i was to increase the binary size of first party apps at google by even 1%, there would be a mob with pitchforks at my desk in less than an hour.

I can't imagine apple is really different.

I'm not sure the same can be assumed for Apple based on several things such as internal secrecy, differing organisational hierarchies, their willingness to focus on things that aren't tangible metrics like end user file download size.

Between the org and culture differences, I'm just not so sure Apple can safely be assumed to be as similar as you expect.

None of the things you mention change the fact that if you increase the iOS system image size by 3x, it will likely no longer fit in the default firmware partition :)
I can't imagine anywhere besides Google that would actually notice or care about that.
Gonna disagree: pretty much anywhere that doesn't generate shitty apps cares about even small size regressions.
If so, it seems very confusing that the header is "Install Size".