Hacker News new | ask | show | jobs
by jvanderbot 2373 days ago
a million sequential memory accesses should not take 5 seconds. Maybe 5 ms. Most likely the time is spent on VM /interpreter startup and shutdown, OS overhead, etc.
3 comments

That's not a million sequential memory accesses.

Assuming a fairly naive python interpreter, it involves

- Parsing a million lines of code

- Outputting bytecode for a million lines of code

- Running a million of lines of bytecode, including

- Allocating a million int objects (ints are heap allocated in python)

- Hashing a million identifiers

- Putting those million identifiers into a hashmap, each pointing to the corresponding into object

- Deallocating those million int object

Plus VM/Interpreter startup/shutdown. But we can benchmark that by running an empty python file and it's not significant.

Given the downvotes, I'm fairly certain I missed the direction of discussion. 5 seconds is an astronomical amount of time to do something as described in the parent comment.
Is 5 seconds such a long time really? For parsing a file, then creating and evaluating 1 million local variables, all inserted into the local namespace?

Made me wonder, how long does it take to compile a million line long C program? Let's see:

  N = int(1E6)

  print("""#include <stdio.h>
  int main() {
  """)

  for x in range(N):
     print (f"int x{x}={x};")

  print("""
    return 0;
  }
  """)
now generate the code and compile it:

  time gcc big.c 
drumroll ... ummm ... look at that ... doesnt finish, hangs forever (or at least longer that care to stick around wich was 10 minutes) ...

notably for 10,000 variables it takes just 0.3 seconds, but raising that number one order of magnitude to 100,000 makes it hang, 100% CPU 3% memory used. Something has an awful scaling behind the scenes.

But now we're timing an optimizing compiler, and not an interpreted program of 1 million lines. I understand that you couldn't generate a program of 1 million lines, but still, I would expect compilation to take longer than execution for just about any non-looping program. Again, I am missing the direction of this discussion.

Compilation is technically NP-Hard, meaning it scales very poorly, as you described, whereas interpretation is not (no register allocation, etc).

https://cs.stackexchange.com/questions/22435/time-complexity...

the main purpose of the exercise was to explore what happens when you throw 1M lines of code to a language,

our estimates to what happens could be wildly inaccurate,

I was pleased with Python finishing in 5 seconds because I thought it might not even work at all due to some internal implementation detail that I was not aware of - like the ruby example that fails with a stack error. I am pleased to see that Python can handle it.

As for the GCC compilation I find it quite unexpected that it compiles 10K lines in just 0.3 seconds, but then seems to hang on 100K lines. Not so easy to explain.

That code doesn’t imply simply 1 million sequential memory accesses in Python. Python is not C.
it has to read and parse 1 million lines, just reading the file with wc takes 22ms with

  wc big.py
so 5ms is way underestimating the job