Hacker News new | ask | show | jobs
by nicebill8 2373 days ago
I assume the first script isn't the benchmarked program, but that it just generates it. The pure parsing of the second script would be the benchmark.

Having said that, the fact it takes 5 seconds is pretty arbitrary—we can't really say it's good or bad unless there is some basis for comparison.

2 comments

I might have not explained properly, the output of the program was saved into the file,

  python generate.py > big.py
then this file was executed with Python, that took 5 seconds:

  time python big.py
and yes, I had no idea what to expect, hence the test.
Some context:

- big.py is almost 15MB

- Running `lua5.3 big.py` takes about 1/6 the time on my Linux VM. (I had to use a big.py only half the size because this VM was too small for the original.)

So Python isn't the fastest at this but I wouldn't call it terrible. Basically all of that time was parsing/compiling, since the time is the same with big.py changed to `def foo(): x0=0 # etc.`

a million sequential memory accesses should not take 5 seconds. Maybe 5 ms. Most likely the time is spent on VM /interpreter startup and shutdown, OS overhead, etc.
That's not a million sequential memory accesses.

Assuming a fairly naive python interpreter, it involves

- Parsing a million lines of code

- Outputting bytecode for a million lines of code

- Running a million of lines of bytecode, including

- Allocating a million int objects (ints are heap allocated in python)

- Hashing a million identifiers

- Putting those million identifiers into a hashmap, each pointing to the corresponding into object

- Deallocating those million int object

Plus VM/Interpreter startup/shutdown. But we can benchmark that by running an empty python file and it's not significant.

Given the downvotes, I'm fairly certain I missed the direction of discussion. 5 seconds is an astronomical amount of time to do something as described in the parent comment.
Is 5 seconds such a long time really? For parsing a file, then creating and evaluating 1 million local variables, all inserted into the local namespace?

Made me wonder, how long does it take to compile a million line long C program? Let's see:

  N = int(1E6)

  print("""#include <stdio.h>
  int main() {
  """)

  for x in range(N):
     print (f"int x{x}={x};")

  print("""
    return 0;
  }
  """)
now generate the code and compile it:

  time gcc big.c 
drumroll ... ummm ... look at that ... doesnt finish, hangs forever (or at least longer that care to stick around wich was 10 minutes) ...

notably for 10,000 variables it takes just 0.3 seconds, but raising that number one order of magnitude to 100,000 makes it hang, 100% CPU 3% memory used. Something has an awful scaling behind the scenes.

But now we're timing an optimizing compiler, and not an interpreted program of 1 million lines. I understand that you couldn't generate a program of 1 million lines, but still, I would expect compilation to take longer than execution for just about any non-looping program. Again, I am missing the direction of this discussion.

Compilation is technically NP-Hard, meaning it scales very poorly, as you described, whereas interpretation is not (no register allocation, etc).

https://cs.stackexchange.com/questions/22435/time-complexity...

the main purpose of the exercise was to explore what happens when you throw 1M lines of code to a language,

our estimates to what happens could be wildly inaccurate,

I was pleased with Python finishing in 5 seconds because I thought it might not even work at all due to some internal implementation detail that I was not aware of - like the ruby example that fails with a stack error. I am pleased to see that Python can handle it.

As for the GCC compilation I find it quite unexpected that it compiles 10K lines in just 0.3 seconds, but then seems to hang on 100K lines. Not so easy to explain.

That code doesn’t imply simply 1 million sequential memory accesses in Python. Python is not C.
it has to read and parse 1 million lines, just reading the file with wc takes 22ms with

  wc big.py
so 5ms is way underestimating the job