Hacker News new | ask | show | jobs
by skohan 2591 days ago
It depends on the implementation. In some cases, where arrays are required to be stored in contiguous memory, appending to an array will result in a copy anyway since the entire contents will have to be moved to a new, larger block of memory.
2 comments

Typically (I don't know about in Javascript in particular), a vector has both a length and a capacity. The capacity represents total allocated space (always >= the length). If excess capacity is available when you append, that gets used rather than having to copy the array. The capacity grows exponentially (powers of 2 or 1.5 or something). This means that if you take an empty vector and append N elements one at a time, it does O(N) total copying of existing values rather than O(N^2). Another way to put it is that a single append is amortized constant time.

Here's a stackoverflow answer which talks a bit more about this: https://cs.stackexchange.com/a/9382

edit: oops, nvm
You're forgetting that half the array is never copied. The total number of copies doesn't exceed a linear cN, so it's O(N), not O(N log(N)).

The constant bound on the number of copies depends on the growth factor. With a growth factor of 2, the constant is 2. As long as the growth factor is greater than 1, the number of copies is linear.

You can verify this yourself with some trivial code:

    struct CountsCopies {
      CountsCopies() {}
      CountsCopies(const CountsCopies& other) { ++CopyCount(); }
      CountsCopies& operator=(const CountsCopies& other) { ++CopyCount(); return *this; }
      static std::size_t& CopyCount() {
        static std::size_t count = 0;
        return count;
      }
    };

    std::size_t last_copies = 0;
    std::vector<CountsCopies> v;
    for (int i = 0; i < 1000; ++i) {
      const std::size_t copies = CountsCopies::CopyCount();
      if (copies != last_copies) {
        std::cout << "size=" << i << " copies=" << copies
                  << " ratio=" << copies / static_cast<double>(i) << '\n';
        last_copies = copies;
      }
      v.emplace_back();
    }
Output:

    size=2 copies=1 ratio=0.5
    size=3 copies=3 ratio=1
    size=5 copies=7 ratio=1.4
    size=9 copies=15 ratio=1.66667
    size=17 copies=31 ratio=1.82353
    size=33 copies=63 ratio=1.90909
    size=65 copies=127 ratio=1.95385
    size=129 copies=255 ratio=1.97674
    size=257 copies=511 ratio=1.98833
    size=513 copies=1023 ratio=1.99415
No, since the early array copies are on smaller arrays they take far less than O(n) time. The total time in copies is 1 + 2 + 4 + ... + 2^m + ... 2^(floor log2 n), which equals 2^(ceil log2 n) - 1, or O(n).
The number of copies required is a geometric series. With a growth factor of 2x, the number of total copies is only 2N - 1. Thus, it is O(N).

It's O(N) for any geometric growth rate, but using 2x makes it easy to see, because every copy is one larger than all previous copies combined. Consider a concrete example for N=9: 1 + 2 + 4 + 8 = 7 + 8.

It really does depend on the implementation. Only the logical address space needs to be contiguous, not the underlying physical memory. A clever enough implementation could work hand-in-hand with a clever enough allocator, so that realloc would just add new logical pages to the end of the underlying buffer when it’s time to double its size without having to copy anything.
Your virtual address space needs room for that, clever allocators can’t violate the laws of physics. Also, you can only do this at page size granules.

At any-rate, it’s not like copying will happen anyways during GC.

Is this really done? It seems like this approach would require all arrays to be allocated with at least a page's worth of memory, which seems incredibly wasteful if you have a bunch of small arrays with a handful of floats or something.
”It seems like this approach would require all arrays to be allocated with at least a page's worth of memory”

I don’t know whether it is done in practice, but you can postpone doing that until the array gets ‘big’, for some definition of ‘big’.

Allocating a bunch of small arrays is going to be wasteful (performance wise) anyway. That being said, I'm not sure if this is done frequently or not.