Facebook Porting PHP To JVM (nerds-central.blogspot.com)
44 points by code-dog 5055 days ago
12 comments

At this point I have to just laugh. They literally already paid for the development of a prototype for a faster PHP, and decided not to pursue that route: http://morepypy.blogspot.com/2012/07/hello-everyone.html. I have to assume they have some insane internal politics, because there's no sane technical reason to develop HipHop (static subset-of-php to C++ compiler), then start the HipHop VM, then fund the PHP on PyPy prototype, and now apparently move to a PHP on the JVM.
I don't understand what's wrong with exploring your options. Why would you assume that the reasoning has something to do with politics rather than trying to find a better technical solution to their problem?
They're not exploring a research space, they're exploring a space where the options are incredibly well known, and they're jumping from solution to solution, even after they appear to find a success.
If they have enough resources, they can afford to try all options in parallel. Wasteful, but although it may seem strange, it isn't like programming language implementation engineering is expensive compared to data centers.
Yeah, god forbid they feel that their initial success could be improved upon; they should put down their tools as soon as they get their first version that works.

/sarcasm

I'm not sure why you'd laugh. Facebook is investigating a bunch of different options which may or may not yield fruit.
Standard feasibility work for any large project? Alternatively, all of their explorations may have an application in a stack somewhere in the organisation. At any rate, testing and validating or invalidating alternatives is time well spent if you have the engineering capacity.
Phalanger ( http://phalanger.codeplex.com/ ) is an implementation of PHP running on the CLR with a fairly good compatibility story and sometimes much better performance.

Phalanger worked well even without using the DLR (dynamic language runtime) so a JVM implementation shouldn't be a massive undertaking (except for the lexer and parser parts...)

There are already at least 2 implementations of PHP on the JVM:

http://quercus.caucho.com/ http://www.projectzero.org/php/

Has anyone around here actually used either of the above and would care to comment? I would be curious to know how compatible they actually are with the average chunk of PHP code and, where they aren't, how hard it is to work out what's wrong and get it running.
Quercus is known to run WordPress, MediaWiki, and Drupal without patches.
I work at Facebook and have been involved in the Hiphop for PHP project for the past few years.

We have our own VM written and it isn't based on the JVM.

Bytecode: https://github.com/facebook/hiphop-php/blob/master/doc/bytec...

Code: https://github.com/facebook/hiphop-php/tree/master/src/runti...

It should be noted that there is no direct evidence that this is happening, other than:

"The presence of Facebook engineers at the Java Language Summit in San-Francisco"

Is the author aware that Facebook is a multi-language shop? It's why they have Thrift, after all.

Why do you quote selectively? The next part is: "along with their interest in implementing PHP using invokedynamic on the JVM".

Also, JVM Language Summit, not Java Language Summit. (OP got it wrong too.) More on http://wiki.jvmlangsummit.com/Main_Page

I quoted selectively because the author gave no source for "their interest in implementing PHP".

None. No links. No names. No quotes. No press releases. No blogs. No email archive. Nothing. Not a skerrick. Not a sausage. Not a sniff.

Most likely this was a hallway conversation at JVM Language Summit (which it seems that OP attended). I don't see any particular reason to doubt his claim.
I do, because he seems to have a bee in his bonnet about the evils of interpreters.

I'm also a bit wary of leaping from one-guy-musing-in-a-hallway to FACEBOOK PORTING PHP TO JAVA.

It's the summer of 2018. Facebook shares drop to 17 cents as their PHP to ASM compiler project nears completion.
No reason it should take that long. The article suggests 6 person-years, which could mean 6 people for 1 year.
Just like nine women can have a baby in one month?
exactly. One makes the head, two make each arm, etc...

of course, you can't do that with computer software

In this case a team could work across the project: one on the parser/lexer, one on AST translation, one on code generation, two on the runtime, and one on build/CI. I think that makes 6.
try reading The Mythical Man Month http://en.wikipedia.org/wiki/The_Mythical_Man-Month

unfortunately my boss thinks as you do..

You either never read the book or are interpreting it wrong.

The premise is that as you add more people to an EXISTING project, the amount of time it will take to complete will increase. However, if all of those people are involved from the beginning, then you are not subject to the same phenomenon - within reason of course.

So while it might not be "6" developers working on it for a year, it could potentially be 9 very devoted developers cranking this out with good results towards the end of the year.

Brooks also talks about another issue: in a team of 6 there are (6 choose 2) = 15 edges in the communication graph. I think this is what kyriakos was referring to.
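
To put rough numbers on the communication-graph point: a minimal sketch, assuming every pair of team members has to talk to each other (the function name is made up for illustration).

```python
from math import comb

def communication_edges(team_size: int) -> int:
    """Number of distinct pairs (edges) in a fully connected team."""
    return comb(team_size, 2)

# Team sizes mentioned in this thread: 6, 9, and "about 20".
for n in (6, 9, 20):
    print(n, communication_edges(n))
```

The count grows quadratically, n(n-1)/2, so going from 6 to 20 people multiplies the number of pairs by more than ten. Whether every pair really needs to communicate is a separate question; splitting the work into streams is exactly how teams avoid the full graph.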
Errmm, if you actually read the book properly you'll realise that the baby/mothers analogy applies to synchronous sets of work and workloads that cannot be broken down into parallel running tasks. Once architected to a suitable degree I'd imagine that there are at least a few streams of work that could be carried out concurrently.
Yes! Someone gets it!
AFAIK they have about 20 people on this... unless I've misconstrued a friend of mine's elliptical comments. In other news: when is the VM being built by all the people Google hired from the CLR team going to be released?
I guess someone must really love PHP, though I'm slightly confused as to why.

If you're going to make such a massive undertaking anyway (it's not just the new runtime, all existing code will have to be re-tested and re-debugged), why not sink the man-hours into putting it into a new language better suited to the task from the start?

Because there are millions of man hours of PHP from FB and it will cost less than one hundred thousand to port that work to the JVM.

Yes, the code will have to be retested, but if you rewrite then you rewrite AND retest. Working code makes a great test suite, something you lose in a rewrite.

Yes, that sounds fine (and I'm aware of the Spolsky article posted in another comment.)

But the fact is that a system of that size simply must already be split up into discrete services or components. (If it's not, then that should be their first priority. But I can't imagine Facebook is running everything they do out of one process).

So they could port each service/component one at a time, rigorously testing and improving performance as they go. Then they'd not only get the benefit of a better runtime, but also a safer, faster language. In fact, if they're anything like most companies I know, they're always in the process of rewriting one service or another to improve performance or features, whether in a new language or not. All they'd need to do would be to switch to the new language whenever they were refactoring existing code anyway.

Of course I can't make the decision for them, and they are rightly hesitant to do an entire rewrite. But it seems like they're going to fairly extreme lengths to stay with PHP.

> But the fact is that a system of that size simply must already be split up into discrete services or components.

Come on, they created Thrift, I think we can assume they use it?

I understand your thinking, but in your opinion what language is suited for this particular task? Personally I doubt there's a platform that's suitable for this scale; one has to create one's own, and that's what Facebook is doing.
This is where someone posts the Joel on Software link about Netscape (I'm on my iPhone else I would).

Also, developing a new implementation of PHP allows site development to continue in the meantime.

I agree with Joel about a lot of things but I don't think history lines up with his famous comments about the Netscape rewrite. Netscape was well on the decline before they began the rewrite and its problems were more lack of commercial focus (eg. their failed attempts at "groupware") rather than the technical decision to rewrite Navigator. And without that rewrite it is unlikely that Firefox would have ever existed and without Firefox, Netscape (now Mozilla) would have slid into the abyss of irrelevance a long time ago.

So while there are certainly good arguments to be made that rewrites can bite you in the ass and should often (though not always, IMO) be avoided, I wish people would stop referencing Joel's circa-2000 post unless they are willing to rewrite it in hindsight and still attempt to make his argument.

Why do these companies like the JVM so much? Twitter, and now Facebook. Does it have that great of performance?
It's one of the most scalable, robust, secure, proven mainstream web platforms available. And at almost any server-side scale it performs better than PHP, Ruby, and Python.

As far as mainstream languages go, only C and C++ have a performance and scalability edge, but C++ at least comes with more complexity. JVM seems to hit the performance:complexity sweet spot.

... and you have to work really hard to make C or C++ run any faster than the JVM. For all normal levels of programmer effort there is no difference or the JVM is faster.
There aren't that many problems with the JVM compared with Java. Most of what makes java suck is the language and what makes the performance sluggish is the standard lib as well as the programming style the community encourages.

As much as I love to rag on java the language the JVM is generally a solid piece of work that makes reasonable tradeoffs.

How does it compare with the CLR, or things like LLVM?
I'd say it's on par with the CLR, similar corner cases, similar annoyances.

The biggest problem I've had with the CLR is threads bouncing between CPUs but I'd attribute that to Windows as I worked around it with some Win32 calls to pin threads to processors.

Can't really compare to LLVM, I haven't personally encountered any of the known llvm bugs (clang to be fair) with Obj-C.

The JVM has a huge advantage over the CLR: Mono is nowhere near as well supported as the JVM is by Oracle, and Microsoft's CLR is Windows-only, which makes it far too expensive for cloud computing.
If you're looking to stretch a budget I'd look to companies like Hetzner rather than looking to the cloud.

I'd try to increase the revenue my server generates rather than decrease the cost of servers, but then I'm the kind of guy who thinks it's possible to pull more than 5 cents an hour in revenue from a server. If you only get 5 cents per hour, then it would be important to use Linux so you maintain your 2 cent per hour profit margin.

Yes. http://shootout.alioth.debian.org/u64q/which-programming-lan...

Note that Java compares very favorably with C. (Of course, the real thing to take away from that chart is: why don't all the JVM users switch to Haskell? It's more concise, safer, and has a better community.)

Many JVM developers for whom programming is craft rather than a day job are definitely looking at both Scala and Haskell. Lots of excellent stuff going on with both, as you point out - concise, safer.
I'll just have to trust you on that I suppose.
That is true. As someone who thinks both ways (craft and day job) the issue with Scala (that with which I have the most experience) is the tooling.
The JVM can be bootstrapped relatively easily compared to Haskell. It's just C++ code. If I gave you a computer with a C and C++ compilers, and the ghc source, what would you do?
I'd make a snide comment about receiving a phone call from 1980 asking for their computer back.
Now that benchmark you linked has a hidden bias towards Java. If you read the methodology listed in their FAQ, they mention they ran each benchmark for Java 66 times in the same JVM instance before discarding the first 65 results, which leaves out the initial iterations before JIT has kicked in. For servers, which perform many similar operations for each client and are rarely restarted, the benchmark you posted is probably valid. But for jobs that wouldn't benefit from JIT, Java would perform much more slowly.
NOT TRUE.

If you read the methodology listed in their FAQ, they state in bold - "Time measurements include program startup time".

http://shootout.alioth.debian.org/help.php#time

If you read the section "What about Java?" they mention ADDITIONAL measurements, which are only shown on the Help page, and indicate how little difference JVM startup time makes once these programs have run for 5 seconds and 20 seconds.

http://shootout.alioth.debian.org/help.php#java
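
For anyone unsure what the two measurement styles being argued about look like, here is a minimal sketch. The timings are made-up numbers, not data from the benchmark site, and the function name is hypothetical.

```python
from statistics import mean

def summarize(timings, warmup):
    """Return (mean over all runs, mean after dropping the first `warmup` runs)."""
    if not 0 <= warmup < len(timings):
        raise ValueError("warmup must leave at least one run")
    return mean(timings), mean(timings[warmup:])

# Hypothetical wall-clock seconds for repeated runs in one process:
# the first run pays startup and compilation cost, later runs do not.
runs = [2.0, 1.0, 1.0, 1.0]
with_startup, steady_state = summarize(runs, warmup=1)
print(with_startup, steady_state)
```

A measurement that includes startup corresponds to the first number; one that discards warm-up iterations corresponds to the second. The longer the program runs, the closer the two get, which is the point the Help page measurements are making.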

Is there any official reference to back up the claim? I doubt Facebook would spend effort on doing this since they have already ported PHP to C++ with HipHop.
Let's ask this one question. Why rewrite Python in Python? Why rewrite Ruby in Java? They may have realized that PHP is a good solution for their existing framework, the framework has an enormous unit test harness which would take a very long time to translate, and they have nothing against the language at the company. If they are planning a JVM implementation of PHP, it is probably because they have developers there who have experienced the speed increase of other JVM languages like JRuby, Jython, etc. I'd wager money that they will probably use one of the current open source translations and build on top of that, maybe gut it like crazy and use it as a foundation. A full rewrite of PHP on the JVM would probably be extremely difficult timewise.
> Facebook are looking to move PHP on.

Did Google Translate write the article?

This is all great but... why?
Because:

1. the JVM is an open, stable, mature, fast platform for server applications.

2. Facebook are not moving off PHP any time soon.

3. For stuff outside its written-in-C standard lib, PHP is quite slow. PHP's garbage collection, JITting etc is nowhere near as advanced as the JVM's.

4. Since porting dynamic languages to the JVM is a well-worn pathway, why not try porting PHP and see how it performs?

If Facebook can reduce their CPU and RAM requirements by just 10%, it pays for itself many times over vs the stock PHP runtime. And based on what I've seen with JRuby (going from 50Mb MRI runtimes down to 2.5Mb runtimes that are 3-10x faster) that's entirely reasonable.

What kind of computations were you doing on MRI which resulted in such performance boost by switching to JRuby? I have tried Rails under JRuby many times, the performance is the same as 1.9.3 and memory consumption usually higher.
Redmine. Runs faster and smaller per instance of the application.
Looking at my post again, I guess I didn't do a very good job.

I didn't mean 2.5Mb per copy of the JVM. I meant per runtime within the JVM. These are run as threads, rather than processes.

So we're really comparing

    n instances of MRI * 50Mb

vs

    1 JVM + (n * 2.5Mb)

Where the cost of the JVM itself is amortised over the application instances.

Sorry, what I said originally is easily misleading.
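
To make the amortisation concrete, a rough sketch of the comparison above. The 50Mb and 2.5Mb figures are the ones quoted in this comment; the 200Mb fixed cost for the JVM itself is purely an assumed placeholder, and the function names are made up.

```python
def mri_total_mb(n, per_instance_mb=50.0):
    """Total memory for n separate MRI processes."""
    return n * per_instance_mb

def jruby_total_mb(n, jvm_base_mb, per_runtime_mb=2.5):
    """Total memory for one JVM hosting n JRuby runtimes as threads."""
    return jvm_base_mb + n * per_runtime_mb

def break_even_instances(jvm_base_mb, per_instance_mb=50.0, per_runtime_mb=2.5):
    """Smallest n at which the single-JVM setup uses less memory than n MRI processes."""
    if per_runtime_mb >= per_instance_mb:
        raise ValueError("the JVM setup never wins with these figures")
    n = 1
    while jruby_total_mb(n, jvm_base_mb, per_runtime_mb) >= mri_total_mb(n, per_instance_mb):
        n += 1
    return n

# ASSUMPTION: 200Mb of fixed JVM overhead, chosen only for illustration.
print(break_even_instances(jvm_base_mb=200.0))
print(mri_total_mb(10), jruby_total_mb(10, jvm_base_mb=200.0))
```

With that assumed overhead, the single JVM comes out ahead from a handful of instances onward, and the gap widens with every instance added. A different fixed cost moves the break-even point but not the shape of the argument.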

My reaction exactly. I have great respect for the JVM, but PHP could simply be replaced by JavaScript for example.

It seems easier to me to port PHP to Rhino or node.js than to bring PHP to the JVM. Especially considering CoffeeScript, PHP just can't keep up.

was this post just written with a JavaScript-buzzword generator?
Yes.js