by webignition 2126 days ago
What are the memory leak problems?

I was under the impression that PHP's stateless nature, with a whole new instance of everything coming into being for each request, somewhat sidesteps the capacity for problematic memory leaks that could otherwise exist for processes that remain alive between requests.

4 comments

> I was under the impression that PHP's stateless nature, with a whole new instance of everything coming into being for each request

A couple issues here.

1. Even if the PHP process exits after each request, memory leaks during the request are still an issue. If I'm looping through 1,000 elements of an array and leaking memory each time, I could end up with a peak memory usage of 100MB when 1MB would've sufficed. That forces me to scale vertically when I might not otherwise need to.

2. Most mature applications have worker processes that may be longer-lived than serving an HTTP request. If PHP leaks memory, it's less suited to those tasks. Maybe you write those in Go instead, but then you have to ask yourself why you didn't just write the whole application in Go.

1. This is likely user error due to a poor understanding of how PHP memory management works. Unless you're using poorly written extensions then memory management in PHP is quite straightforward.

2. There's a large amount of PHP libraries built to do exactly what you're describing. Long lived PHP processes are live in production across the world.

Straight up misinformation.

Note: I wrote PHP professionally for more than 10 years, and I still maintain several PHP code bases.

> This is likely user error due to a poor understanding of how PHP memory management works.

This is not what the comment I was responding to was talking about. They were talking about memory leaks that were not in userspace. PHP's bug tracker has 943 bugs related to memory leaks[1]. That has nothing to do with understanding "how PHP memory management works".

> Unless you're using poorly written extensions then memory management in PHP is quite straightforward.

How exactly do you know whether an extension is poorly written? In many cases, it doesn't matter because there's exactly one extension that does what you need.

You could also be using poorly-written libraries.

> There's a large amount of PHP libraries built to do exactly what you're describing. Long lived PHP processes are live in production across the world.

I know. I use some, and I have several long-lived PHP processes running right now.

If a memory leak is in PHP itself, no library can fix it.

> Straight up misinformation.

Disagreeing with someone (or not understanding them) doesn't mean they're spreading misinformation.

My point was that memory leaks in the PHP virtual machine are still a problem even if PHP usually runs statelessly. Do you disagree?

1. https://bugs.php.net/search.php?search_for=leak&boolean=0&li...

Ok, but looking at that same bug tracker there are only a handful of memory leaks in PHP core (i.e. not related to an extension), the most serious of which is being actively worked on: https://bugs.php.net/bug.php?id=76982

This is true. As far as I understand, the only issue with memory leaks has been around very long-running processes that create cyclical references, which keep refcounting from "noticing" that something can be freed. I have never had a memory leak problem with a web service, and I wrote* the PHP backend for Grooveshark (45 million monthly users, averaging 2 hours per visit, running on half a dozen front-end web servers; sadly, I don't remember RPS figures at this point).

*not to take sole credit. I created the RPC framework and initial API, and oversaw a small team of very talented engineers who added a significant chunk of the functionality powering Grooveshark over the years.
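
The cyclic-reference case mentioned above can be sketched roughly as follows. This is a minimal illustration, not anything from Grooveshark's code: it assumes PHP 7.4 or later, and the `Node` class and `makeCycles()` function are hypothetical names made up for the example. The collector is switched off first so the effect is easy to see; with it left on, PHP would normally reclaim such cycles on its own once enough candidates pile up.

```php
<?php
// Hypothetical class, used only to build a reference cycle.
final class Node
{
    public ?Node $other = null;
    public string $payload;

    public function __construct()
    {
        // Roughly 100 KB per node so the effect shows up in memory_get_usage().
        $this->payload = str_repeat('x', 100000);
    }
}

// Hypothetical helper: creates $count pairs of objects that point at each other.
function makeCycles(int $count): void
{
    for ($i = 0; $i < $count; $i++) {
        $a = new Node();
        $b = new Node();
        $a->other = $b;
        $b->other = $a; // $a and $b now keep each other's refcount above zero
    }
}

gc_disable();                    // refcounting alone cannot free the pairs
$before = memory_get_usage();
makeCycles(100);
$held = memory_get_usage() - $before;      // memory still held by unreachable pairs

gc_enable();
$collected = gc_collect_cycles();          // ask the cycle collector to run now
$afterGc = memory_get_usage() - $before;   // memory held once the cycles are gone

printf("held before GC: %d bytes, after GC: %d bytes, collected: %d\n",
    $held, $afterGc, $collected);
```

In a long-running worker, calling `gc_collect_cycles()` between jobs, or breaking the cycle by hand (for example setting `$a->other = null`), are the usual ways to keep this kind of growth in check.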

Most web workloads will use some sort of process cache, like php-fpm, simply because spawning a process for each request is very inefficient. So the process and any extensions used can leak memory over time across requests.

php-fpm is typically configured to restart those processes after some number of requests, just in case there is such a leak. But I have not seen those leaks happen between process restarts. State is still wiped clean between requests, so leaks should be rare and hard to trigger, and are most likely the result of faulty extensions.

Manipulating complex data structures that are large and/or trying to manage long running processes can cause memory leaks. I often find I have to jump through hoops that I don't have to in other languages to deal with these kinds of problems.

For instance, I usually have to resort to queues, which just adds complexity where I typically don't want it.

> Manipulating complex data structures that are large

I assume it's not the case here, but one of the footguns with PHP is that passing large variables/arrays around and then writing to them triggers copy-on-write, which can end up multiplying the number of duplicates of that data being kept alive. Pass by reference, unset(), generators and the like aren't used very often because most runs are small stateless requests, but sometimes they are needed.
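
A rough sketch of the copy-on-write behaviour described above, and of how `unset()` and generators avoid holding the data. It assumes a stock PHP 7.1+ CLI; `squaresAsArray()` and `squaresAsGenerator()` are hypothetical helpers invented for the example, and the exact byte counts will vary by PHP version.

```php
<?php
// Hypothetical helper: builds the whole list in memory before returning it.
function squaresAsArray(int $n): array
{
    $out = [];
    for ($i = 0; $i < $n; $i++) {
        $out[] = $i * $i;
    }
    return $out;
}

// Hypothetical helper: yields one value at a time instead of building a list.
function squaresAsGenerator(int $n): Generator
{
    for ($i = 0; $i < $n; $i++) {
        yield $i * $i;
    }
}

$n = 200000;
$base = memory_get_usage();

$original = squaresAsArray($n);
$afterBuild = memory_get_usage() - $base;   // one array in memory

$copy = $original;                          // no copy yet: both names share one array
$afterAssign = memory_get_usage() - $base;

$copy[0] = -1;                              // first write separates the copy
$afterWrite = memory_get_usage() - $base;   // now two full arrays are alive

unset($copy, $original);                    // release both arrays explicitly
$afterUnset = memory_get_usage() - $base;

$sum = 0;
foreach (squaresAsGenerator($n) as $square) {
    $sum += $square;                        // only one value is held at a time
}
$afterGenerator = memory_get_usage() - $base;

printf("build: %d, assign: %d, write: %d, unset: %d, generator: %d\n",
    $afterBuild, $afterAssign, $afterWrite, $afterUnset, $afterGenerator);
```

The assignment itself is cheap; it is the later write that doubles the footprint, which is why the duplicates tend to appear far from the line that looks responsible.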

We resorted to only giving PHP ready-to-transmute data, so it didn’t handle any of the manipulation unless it was straightforward by our own criteria; anything involving too much searching or filtering was handled before the data reached PHP. (Yes, I realize there are other problems here, but our scale was large enough that these problems had to be tackled piecemeal, and they aren’t relevant beyond that for this discussion.)

Ultimately I’ve had fewer issues with Python for this type of workload, and that is eventually where we moved.