Hacker News new | ask | show | jobs
by adunn 4194 days ago
This is great news. PHP doesn't have many structured data types, so arrays (aka maps) are basically used for everything. Any improvement to them will impact the entire application.

It would be nice to have separate types for arrays and maps though. I don't understand why they were combined to begin with. Simplicity? Seems like there are more edge cases and gotchas the way things are now.

4 comments

PHP has a standard library with plenty of collections: http://php.net/manual/en/book.spl.php

Stack - http://php.net/manual/en/book.spl.php Queue - http://php.net/manual/en/class.splqueue.php PriorityQueue - http://php.net/manual/en/class.splpriorityqueue.php Real Maps - http://php.net/manual/en/class.splobjectstorage.php

It's a shame some people are not aware of these.

Well, yeah, I agree that the SPL types are nice to have. My complaint is that PHP's only first-class array type is a weird array/map combo. Have you ever seen anything like that in another language?

The SPL types are definitely a welcome addition to the language, but they feel like add-ons. Definitely not first-class. The standard array functions don't work with SPL types (array_map, etc.) even though the SPL types are iterable.

More to my point, missing from SPL is a dynamically-sized array that's not based on linked lists. Linked lists don't have O(1) lookup. This type of array really should be first-class, but it’s completely missing from the language. Please correct me if I’m overlooking something!

That said, I believe having separate, first-class, dynamic arrays and maps would strike a better balance of dynamism, performance, and predictability compared to the single existing first-class array/map.

Lua also have hybrid array, maps and object. It does work well when implemented in a sane way. Lua has different way of iterating the table if you want an array or maps.
... though I wouldn't necessarily call Lua's way of doing things "sane". E.g. if Lua encounters a `nil` somewhere in your table, it will stop iterating.
It's certainly sane in the sense that it was an intentional, if perhaps unusual, design decision. I personally like it since it results in an efficient implementation of sparse arrays.

In any event, Lua 5.2 will respect the __len, __pairs and __ipairs metamethods, so you can tweak this behavior if you need to store nils in your tables and iterate over them.

The world would be a beautiful place if everything intentional were sane :-) It's bitten me a couple of times (e.g. when unpacking arguments in a function to feed them to another function, where one of the arguments is nil) but I can see how it could be the desired behavior in some cases.
> My complaint is that PHP's only first-class array type is a weird array/map combo. Have you ever seen anything like that in another language?

Um, Javascript?

Javascript's arrays cannot be used as maps or associative arrays. Arrays require numeric, consecutive keys. Otherwise it's an Object.

--

Clarification edit: Creating a new key on an array using arr['key']=1 creates a property on the arr Object but does not add an element to the standard array.

http://stackoverflow.com/questions/8630471/strings-as-keys-o...

Arrays are always objects in JavaScript, and they can be extended like any other object or used as maps (barring key conflicts).
Adding non-numeric keys does not remove an Array's "arrayness" in JavaScript

  var x = [];
  console.log(Object.prototype.toString.call(x));
  //[object Array]
  
  x[0] = 1;
  console.log(x[0]);
  //1
  
  x["test"] = 2;
  console.log(x["test"]);
  //2
  
  console.log(Object.prototype.toString.call(x));
  //[object Array]
  
  x.map
  //function map() { [native code] }

  var y = {};
  
  console.log(Object.prototype.toString.call(y));
  //[object Object]
  
  y.map
  //undefined
His point was:

    x = [1]; x["test"] = 555; x.map(function (y) { return y; });
    // [1]
That's [1], not [1, 555]. You can access the array from the map interface, but not the other way around.
It does internally. Pretty sure that's even defined in the standard.
Arrays are full objects e.g. function(){ var x = []; x.foo = "bar"; return x.foo;} returns "bar".

That is the whole reason prototype.js library broke the use of for (var x in somearray) because the library set properties on Array.prototype.

Nope.

Array are objects,but all objects aren't Arrays.

    var a={};

    a instanceof Array // false
Furthermore,Javascript objects aren't maps.
No primitive types: tuples, lists, sets. Having only arrays to work with and being weak typed results in a lot of headache.
Those aren't primitive types
What are those called then? Wikipedia definition of primitive type is pretty vague. I don't think collections/containers can't be called primitive.
Collections are generally not primitives unless they are implemented in the language as primitives.
While I agree with you, Erlang for instance has collections as primitives (lists/tuples), but then offers more complex collections (such as gb_tree) as part of the stdlib.
I'd call them "composite types".

"Tuple" tends to refer to fixed-size collections where each element may have a different type. The tuple type is the (cartesian) product of those types. For example, the type "a tuple containing 3 Booleans" can be written as `Boolean * Boolean * Boolean`.

Since Boolean is type 2 (ie. it contains 2 elements, `True` and `False`), this makes our 3-Boolean tuple `2 * 2 * 2 = 8`. True enough, it has 8 elements: `(True, True, True)`, `(True, True, False)`, `(True, False, True)`, `(True, False, False)`, `(False, True, True)`, `(False, True, False)`, `(False, False, True)` and `(False, False, False)`.

Likewise, a tuple like `(1, "hello", True)` has type `Int * String * Boolean`.

That's why tuples are "composite types", they're the product of other types.

Arbitrary-length collections are more complicated, since they require recursion. The simplest is a singly-linked list with all elements of the same type `T`, which is given by the equation `list(T) = 1 + (T * list(T))`. `1` is the unit type `void`, which has one element (`NULL`), which represents the "nil" at the end of the list. The `T * list(T)` is a tuple containing a `T` and a `list(T)`, ie. it's a "cons cell". The `+` is a (tagged) union.

We can solve recursive equations like this using the greatest-fixed-point combinator `mu a`,to get `mu a. list = 1 + T * a` (see http://debasishg.blogspot.co.uk/2012/01/learning-type-level-... ).

For heterogeneous collections, where the element types can differ, things get more complicated, since we need to establish the (potentially infinite) structure of the type, and where each component type fits in.

Dynamic languages use one big recursive type, so all of these types just-so-happen to be the same, and we get tuples-of-tuples(-of-tuples, etc.) via the "top-level" recursive nature of that single type.

In contrast, a "primitive type" is one that's not made up out of other, simpler types. We can usually treat the empty type 0 and the unit type 1 as "primitive", since we can't define them out of simpler types. We don't have to use those as our primitives though, since we could choose some 'larger' type, like 5 (the type with 5 elements) as primitive, then use quotients and subtraction to define the others (eg. `5 / 5 = 1` and `5 - 5 = 0`) but it's much more elegant to take 0 and 1 as primitive.

Particular languages may choose to make other types primitive, eg. Int32 or Float64, eg. if they want to treat them specially with hardware optimisation and such.

> It would be nice to have separate types for arrays and maps though. I don't understand why they were combined to begin with. Simplicity? Seems like there are more edge cases and gotchas the way things are now.

Combining arrays and maps in one type was the cause for a remotely exploitable vulnerability in Drupal this year, https://www.drupal.org/SA-CORE-2014-005.

I commented on that at https://lwn.net/Articles/618530/. Quoting from that comment:

"[...] most uses will treat it either as an array (list of items) or as a key/value store (map from key to value, or sometimes set of values), but rarely as both at the same time. [...] In this vulnerability, the programmer expected a sequence, and was handed a mapping. [...] all uses of a single variable should be consistent (never use a sequence method on a mapping variable or a mapping method on a sequence variable). As shown in this vulnerability, "foreach ($data as $i => $value)" is a mapping method; it should never be used on a sequence, even if it works."

> It would be nice to have separate types for arrays and hash tables though. I don't understand why they were combined to begin with. Simplicity? Seems like there are more edge cases and gotchas the way things are now.

There is no "hash table" type in PHP user land.

There are no arrays in PHP.

There are only hash tables that are called "array" for simplicity.

They aren't really hash tables either, because they additionally store the order of the keys (in a normal hash table, order would be arbitrary)
How do they maintain their order?
By using a doubly linked list, where the first element in the "bucket", contains the next pointer to the next element in the hash table. Read zend_hash.h & zend_hash.c It is fairly complicated and explaining it in depth is beyond the scope of this comment.

This "bucket" also handles the collisions by using separate chaining. There is actually two "next" pointers, one for the chains, and one for the next element in order of insertion. Very confusing and requires reading through the code and playing with it.

You put things in order by memory address and they stay in order when you access them. An array is just a special case of a hash map -- one with a trivial hash function.

That's why the lend themselves to the same syntax so well.

You made my day, guys, please don't stop :)
You're right, thanks for pointing that out.
> arrays (aka maps) are basically used for everything.

This is interesting. In fact, I believe object properties share the same mechanism as associative arrays, that is, $a->b will actually lookup the hash of "b" in $a. Does this new hashtable layout influence object properties/methods too? That would be huge!

> Does this new hashtable layout influence object properties/methods too?

Not sure, but I don't think so. Ppl often think object properties and arrays are much alike, but the array's HashTable struct and the object's store struct are very different. The main performance gains are not about the buckets (which are simple and quite alike) but the array's hashtable idea.

Often objects (php>=5.4) have a better performance than arrays; arrays have an undefined length while with good code, all object properties are defined at compile time. Because of this, you don't need to store the data in a hashtable. Nikic has a post about this subject too: https://gist.github.com/nikic/5015323