by pudquick 4711 days ago
This is incorrect.

The article involves compilation of the PyPy interpreter (not CPython) into JS.

This is why I asked about someone attempting a benchmark comparison between the two.

Still, it's a fun project. Good luck to the author.

The quoted 139 MB includes the full Python standard library, which certainly accounts for a majority of that size. repl.it also compiles the entire Python standard library, so why should the two be significantly different in size?
For anyone following here at this point, I've downloaded the entirety of the repl.it python engine by using os.walk() on the root directory (/), causing my browser to download every .js, .py, etc. file it can find and store them locally, uncompressed, on my machine. It even amusingly found some .exe files hosted in the distutils directory.

The entirety of the repl.it emscripten CPython project is 24MB, uncompressed. This includes the entire standard library that it ships with and all the '_underscore.so.js' emscripten-compiled shared objects. Compressed via zip it's 4.7MB. For comparison, this is almost identical (within a few MB) in size to a clean install of Python on my local workstation. At this point I am assuming that repl.it includes most, if not all, of the standard modules.

(And for reference: The core CPython engine, translated minus modules, weighs in at 4.6MB uncompressed and 800KB compressed)
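The os.walk() measurement described above can be approximated with a few lines of Python. This is only a sketch of the general idea, not the exact script that was run: the function name tree_size is made up for illustration, and it sums on-disk file sizes rather than triggering any browser downloads.

```python
import os


def tree_size(root):
    """Return (file_count, total_bytes) for every file found under root."""
    count = 0
    total = 0
    # os.walk yields (directory path, subdirectory names, file names)
    # for root and every directory below it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # Skip entries that cannot be stat'ed (e.g. broken symlinks).
                continue
            count += 1
    return count, total
```

Calling tree_size("/") would cover the same ground as walking the root directory, and dividing the byte total by 1024 * 1024 gives a figure in MB.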

Downloading the prebuilt PyPy project from http://pypy.org/download.html I see that uncompressed the project is 55MB in size.

Removing all .txt and pure .py files, I'm left with 37MB (and that's being generous) of 'code' files that potentially are being translated. And that's with shared objects - not a static compile - so there's possibly duplicated code in there that wouldn't be present in a single monolithic executable.

I stand by my assertion that 139MB is significantly different in size and that the translation is what accounts for the majority of that size: 84MB if I'm generous and subtract the full 55MB download, 102MB if I'm slightly more realistic and subtract only the 37MB of 'code' files.

As much as there may eventually be a speed benefit if everything works out here, the current size of the project definitely moves it out of the realm of anything I'd want to attempt loading into a browser.

A big part of the current size problem is the way that the stdlib files are bundled - the contents of each file are encoded, byte-for-byte, as a list of base-10 integers. So "hello" gets bundled as "[104, 101, 108, 108, 111]", resulting in quite a bit of overhead.
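To get a feel for that overhead, here is a small sketch that reproduces the encoding from the "hello" example and compares sizes. The helper names are made up for illustration, and the ", " separator simply follows the example above; the real bundler's exact formatting may differ.

```python
def bundle_as_int_list(data: bytes) -> str:
    """Encode bytes as a bracketed list of base-10 integers, one per byte."""
    # list(data) turns b"hello" into [104, 101, 108, 108, 111];
    # str() then renders it with ", " between the numbers.
    return str(list(data))


def overhead_ratio(data: bytes) -> float:
    """Characters in the bundled form per byte of original data."""
    return len(bundle_as_int_list(data)) / len(data)


encoded = bundle_as_int_list(b"hello")
print(encoded)                   # [104, 101, 108, 108, 111]
print(overhead_ratio(b"hello"))  # 5.0
```

With this formatting, five bytes of source become 25 characters of JavaScript. Dropping the spaces would help somewhat, but each byte would still cost several characters.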

I agree that 139MB is pretty ridiculous for any practical purpose! I'm going to work on lazily loading just the files that are needed, which should make a big difference.
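The general shape of lazy loading is roughly the following. This is a hypothetical sketch only, not how pypy.js does or will do it: the LazyFiles class and the fetch callback are invented names, and in a browser the fetch would be a network request rather than a dictionary lookup.

```python
class LazyFiles:
    """Fetch a file's contents on first access, then serve it from a cache."""

    def __init__(self, fetch):
        # fetch is any callable mapping a path to that file's bytes
        # (hypothetical; stands in for a network request).
        self._fetch = fetch
        self._cache = {}
        self.fetch_count = 0

    def read(self, path):
        if path not in self._cache:
            # Only files that are actually requested get fetched.
            self._cache[path] = self._fetch(path)
            self.fetch_count += 1
        return self._cache[path]


# Stand-in for a remote stdlib: nothing is loaded until read() asks for it.
remote = {"os.py": b"# os module", "json/__init__.py": b"# json package"}
files = LazyFiles(remote.__getitem__)
files.read("os.py")
files.read("os.py")  # second read is served from the cache
```

A program that only imports a handful of modules then pays for those files alone instead of the whole bundle.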

Huh. I didn't realize PyPy was so much larger than CPython.

I was aware of the ridiculous size of emscripten's output. When I experimented with emscripten a couple months ago, "Hello World" in C produced about 2000 lines of JavaScript. Another ~700 line program in C ran about 150,000 to 200,000 lines, depending on various settings.

repl.it does not include the full standard library. repl.it uses dlopen to load separately compiled Python modules at runtime.

pypy.js on the other hand does include the full standard library.

They are two different implementations of the standard library.
The PyPy project did not reimplement the entire Python standard library. The goal of PyPy is to provide a drop-in replacement for the canonical CPython interpreter. Right now, PyPy is (very) compatible with existing Python 2.7.3 code.[1]

1: http://pypy.org/compat.html - Standard library modules supported by PyPy. Note that large parts of python library are implemented in pure python, so they don't have to be listed there.

I didn't say the entire library was reimplemented, just that they are two different implementations.