| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by CyberDildonics 1922 days ago

I'm not even sure I understand what you are asking, do you think that memory allocation is always bumping a pointer a uniform amount with no memory being freed, no variance in allocation size and no interlaced lifetimes?

You originally said that memory allocation was 'just a few instructions' when the reality is that it is complex and avoiding allocation is the first thing to do when optimizing software.

Saying that memory allocation is simple and fast based on a scenario that should never happen because it's useless is a little silly. No, that is not how allocators work.

1 comments

chrisseaton 1922 days ago

> do you think that memory allocation is always bumping a pointer a uniform amount with no memory being freed, no variance in allocation size and no interlaced lifetimes?

In modern language implementations with moving collectors allocation is literally what I've described in the fast path. Not theoretical. That's literally the instructions used.

> You originally said that memory allocation was 'just a few instructions'

Here it is in a real industrial compiler and garbage collector, for example.

https://github.com/oracle/graal/blob/f9530b0948c58a66845e84f...

It's precisely as I've described.

link

CyberDildonics 1922 days ago

It is not at all. That's like someone looking at perfect weather, blowing up an air mattress on the beach and wondering why anyone would need a house.

You are only on the 'fast path' when a single allocation of an array would have been much faster anyway. It isn't difficult to be 'fast' when accomplishing something trivial that would be even faster if done directly.

Why do you keeping saying the same thing while ignoring freeing memory, different lifetimes and wildly different sizes of allocations (which are why memory allocators actually exist)?

This idea that memory allocation is cheap is a blight on performance. Are you really allocating memory inside your hot loops and having it not show up when you profile?

link

chrisseaton 1922 days ago

> You are only on the 'fast path'

Isn't that what I've said all along? I said 'amortised' in my very first comment. Most allocations are from the fast path... that's why they bother to make it fast.

> when a single allocation of an array would have been much faster anyway

TLAB allocation is heterogenous (which solves your size question.) An array isn't. TLAB can be evacuated and fragmention-free (which solves your lifetime problem.) An array can't.

> Why do you keeping saying the same thing while ignoring freeing memory

I said 'Deallocation though - yes that's slow!' in my very first comment.

link

CyberDildonics 1922 days ago

Your original comment was trying to argue that allocations aren't a big performance problem. Anyone with experience optimizing knows this is not true and that minimizing allocations is the first thing to look at because it will most likely have the largest payoff with the least effort. Incorrect preconceptions and misinformation doesn't help people.

Saying 'allocations are fast' because you can make unnecessary allocations cheaper while ignoring deallocation is basically a lie, it's playing some sort of language game while telling people the opposite of what is actually true.

> Most allocations are from the fast path... that's why they bother to make it fast.

This is where you have a deep misconception. Doing millions of allocations that could be one allocation is not fast. If you want to add 1000000 to a variable do you loop through it one million times and increment it by 1 every time?

> TLAB allocation is heterogenous (which solves your size question.) An array isn't. TLAB can be evacuated and fragmention-free (which solves your lifetime problem.) An array can't.

That's like building a skyscraper out of legos because they fit together so well. It's nonsensical, especially due to pages and memory mapping.

> I said 'Deallocation though - yes that's slow!' in my very first comment.

Do you have a house with a kitchen and no bathroom? The discussion is that excessive memory allocation is a huge factor in performance. Trying to play language games to ignore the entire cycle doesn't change that and misleads people. Allocated memory needs to be deallocated. Lots of tiny allocations of a few bytes each are a mistake. They cause huge performance problems and should be obviously unnecessary. You wouldn't pick one piece of cereal out of the box, you would pour it in.

Do you profile your programs? Do you work on optimizing them? I have a hard time believing you are doing something performance sensitive while trying to rationalize massive memory allocation waste.

link

chrisseaton 1922 days ago

> This is where you have a deep misconception. Doing millions of allocations that could be one allocation is not fast. If you want to add 1000000 to a variable do you loop through it one million times and increment it by 1 every time?

Great example... because guess what happens with TLAB allocation during optimisation? It will do exactly what you say and combine multiple allocations into a simple bump, like it would with any other arithmetic!

> Anyone with experience optimizing knows this is not true ... Do you profile your programs? Do you work on optimizing them? I have a hard time believing you are doing something performance sensitive

I'm not going to keep arguing on this forever. But I have a PhD in language implementation and optimisation and I've be working professionally and publishing in the field for almost a decade. I'd be surprised to find out I have deep misconceptions on the topic.

link

CyberDildonics 1921 days ago

> Great example... because guess what happens with TLAB allocation during optimisation? It will do exactly what you say and combine multiple allocations into a simple bump, like it would with any other arithmetic!

Before you were saying it isn't a problem at all, now you are saying it isn't a problem because a single java allocator tries to work around it? That's a bandaid over something that you are trying to say isn't even a problem.

> I'm not going to keep arguing on this forever. But I have a PhD in language implementation and optimisation and I've be working professionally and publishing in the field for almost a decade. I'd be surprised to find out I have deep misconceptions on the topic.

But are you profiling and optimizing actual software? That's the real question, because as I keep saying, anyone optimizing software knows that lots of small allocations are the first thing to look for. You still haven't addressed this, even though the whole thread is about decreasing allocations for optimization and you seem to be saying it isn't a problem, which is misinformation.

link

kragen 1921 days ago

> Anyone with experience optimizing knows this is not true...This is where you have a deep misconception. …Do you profile your programs?

There's Dunning–Kruger, and then there's Dunning–Kruger of the level "telling a guy whose Ph.D. dissertation, Specialising Dynamic Techniques for Implementing The Ruby Programming Language, is about code optimization on GraalVM, that he has a deep misconception, and questioning whether he has any experience optimizing".

I repeat my complaint about "the climate of boastful intellectual vacuity this site fosters."

link

CyberDildonics 1921 days ago

You realize this person that you are calling an expert claimed that they optimize software by putting tiny memory allocations into their tight loops right?

I'm not really sure how anything I'm saying is even controversial unless someone is desperate for it to be.

Anyone who has profiled and optimized software has experience weeding out excessive memory allocations since it is almost always the lowest hanging fruit.

No matter how fast allocating a few bytes is, doing what you are ultimately trying to do with those bytes is much faster.

Allocating 8 bytes in 150 cycles might seem fast, until you realize that modern CPUs can deal with multiple integers or floats on every single cycle.

A 12 year old CPU can allocate space for, and add together well over 850 million floats to 850 million other floats in a single second on a single core. You can download ISPC and verify this for yourself. By your own numbers, the allocation alone would take about a minute and a half.

Neither of you has confronted this. I'm actually fascinated by the lengths you both have gone to specifically to avoid confronting this. Saying that lots of small allocations has no impact on speed is counter to basically all optimization advice and here I have explained why that is.

link