Hacker News

by rkp8000 1206 days ago

For those confused about how the hyperdimensional computing (HDC) approach (also known as "vector symbolic architectures"/VSA or "holographic reduced representations"/HRR) described in the article differs from the use of vector embeddings in more mainstream artificial neural networks:

1. HDC is largely organized around the observation that certain symbolic-like operations become vastly simplified in high-dimensional spaces and can be performed with simple algebraic manipulations.

For instance, if you have a dictionary of words, each represented by a random (but fixed) high-dimensional vector, you can store subsets of these words just by summing their vector representations together. This works because random high-dim vectors are nearly orthogonal with very high probability. This implies that the sum of several such vectors will have essentially a zero dot product with words (i.e. their vectors) not included in the subset and a much larger dot product (~1 if the vectors are all normalized) with words included in the subset (as long as the number of words in the subset is sufficiently small relative to the vector dimension). Hence the sum encodes the subset, since membership can be checked with a single dot product.
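A minimal sketch of the subset-as-sum idea in plain Python, assuming random bipolar vectors with entries ±1/√d (so each has unit norm) and an arbitrary membership threshold of 0.5. The vocabulary, `D = 10_000`, and the helper names are illustrative, not from the article.

```python
import random

D = 10_000  # vector dimension (illustrative choice)

def random_hv(rng, d=D):
    """Random bipolar vector with entries +/- 1/sqrt(d), so its squared norm is 1 (up to rounding)."""
    s = 1.0 / d ** 0.5
    return [s if rng.random() < 0.5 else -s for _ in range(d)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bundle(vectors):
    """Encode a set by summing its members' vectors elementwise."""
    return [sum(column) for column in zip(*vectors)]

rng = random.Random(0)
vocab = ["cat", "dog", "fish", "bird", "tree", "car", "house", "river"]
codebook = {w: random_hv(rng) for w in vocab}

subset = ["cat", "fish", "river"]
encoded = bundle([codebook[w] for w in subset])

def contains(set_vec, word, threshold=0.5):
    """Membership check: the dot product is ~1 for members and ~0 for non-members."""
    return dot(set_vec, codebook[word]) > threshold

for w in vocab:
    print(f"{w:6s} dot={dot(encoded, codebook[w]):+.3f} member={contains(encoded, w)}")
```

Members should score close to 1 and non-members close to 0; the spread around those values scales roughly like √(k/d) for a subset of k words, which is why the subset has to stay small relative to d.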

Notably, this also works when the dictionary size is exponentially large relative to the vector dimension, since it is possible to sample an exponentially large number of near-orthogonal vectors in a high-dimensional vector space (unlike in a low-dimensional one). This is also mathematically very similar to a [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter), a probabilistic data structure commonly used for set membership queries when the set is extremely large (e.g. all possible URLs). (Note, though, that this comes at the expense of being able to decode/recover the set elements directly from the summed representation.)
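For the "exponentially many near-orthogonal vectors" claim, here is one standard back-of-the-envelope argument (my choice of bound, not taken from the article), assuming $N$ random vectors $x_1, \dots, x_N$ with i.i.d. uniform $\pm 1/\sqrt{d}$ entries: Hoeffding's inequality bounds a single pairwise dot product, and a union bound over all pairs does the rest.

$$
\begin{aligned}
\langle x, y\rangle &= \frac{1}{d}\sum_{i=1}^{d} s_i, \qquad s_i = d\,x_i y_i \in \{-1,+1\} \text{ i.i.d. uniform},\\
\Pr\big[\,|\langle x, y\rangle| \ge \varepsilon\,\big] &\le 2\exp\!\left(-\frac{d\varepsilon^2}{2}\right) \quad \text{(Hoeffding)},\\
\Pr\big[\,\exists\, i \ne j:\ |\langle x_i, x_j\rangle| \ge \varepsilon\,\big] &\le \binom{N}{2}\, 2\exp\!\left(-\frac{d\varepsilon^2}{2}\right) < N^2 \exp\!\left(-\frac{d\varepsilon^2}{2}\right).
\end{aligned}
$$

The right-hand side is below 1 whenever $N < \exp(d\varepsilon^2/4)$. For example, with $d = 10{,}000$ and $\varepsilon = 0.1$ that allows $N$ up to $e^{25} \approx 7 \times 10^{10}$ vectors that are all pairwise within 0.1 of orthogonal, whereas in low dimensions the count stays small.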

HDC/VSAs capitalize on other nice properties of random high-dimensional vectors as well, e.g. the ability to "bind" two words together (in HRRs, via circular convolution). See [Kanerva 2009](http://rctn.org/vs265/kanerva09-hyperdimensional.pdf) for a nice review of various examples like this. There are also various similar ways to represent sequences of words, tree structures, graphs, etc., all within a fixed- but high-dim vector space, so they provide a nice way to represent "syntactic" structure in data objects, if you will.
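Circular convolution is the binding operator of Plate's HRRs (Kanerva's binary spatter codes use elementwise XOR instead). The sketch below, with hypothetical role/filler names and `D = 512`, binds two role-filler pairs into one record vector and queries it by circular correlation, which approximately inverts the binding; a nearest-neighbor "cleanup" against the known fillers then recovers the answer. It uses the direct O(d²) sum for readability; a practical implementation would compute the convolution with FFTs.

```python
import math
import random

D = 512  # small on purpose: the direct convolution below costs O(D^2) in pure Python

def hrr_vector(rng, d=D):
    """Random HRR vector with i.i.d. N(0, 1/d) entries (expected squared norm 1)."""
    sigma = 1.0 / math.sqrt(d)
    return [rng.gauss(0.0, sigma) for _ in range(d)]

def bind(a, b):
    """Circular convolution: c[k] = sum_j a[j] * b[(k - j) mod d]."""
    d = len(a)
    return [sum(a[j] * b[(k - j) % d] for j in range(d)) for k in range(d)]

def unbind(c, a):
    """Circular correlation with a, an approximate inverse of binding with a:
    y[k] = sum_j a[j] * c[(k + j) mod d]."""
    d = len(a)
    return [sum(a[j] * c[(k + j) % d] for j in range(d)) for k in range(d)]

def cosine(u, v):
    num = sum(x * y for x, y in zip(u, v))
    return num / math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))

rng = random.Random(42)
roles = {name: hrr_vector(rng) for name in ("color", "shape")}
fillers = {name: hrr_vector(rng) for name in ("red", "blue", "square", "circle")}

# Record {color: red, shape: square} = bind(color, red) + bind(shape, square)
pair1 = bind(roles["color"], fillers["red"])
pair2 = bind(roles["shape"], fillers["square"])
record = [x + y for x, y in zip(pair1, pair2)]

# Query "what is the color?": unbind the color role, then compare against known fillers
decoded = unbind(record, roles["color"])
scores = {name: cosine(decoded, vec) for name, vec in fillers.items()}
print(max(scores, key=scores.get), {k: round(v, 2) for k, v in scores.items()})
```

The decoded vector is only a noisy copy of `red` (cosine similarity well below 1), which is why VSA pipelines typically end with a cleanup step against a codebook like `fillers`.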

2. None of the above emerges as a property of network training. In fact, a network isn't even necessarily required, which could be a feature or a bug, depending on your perspective.

As a feature, this is useful in the sense that it provides an immediate no-training-required way of representing objects with fairly complex relational structure (e.g. a sequence of words or a tree structure) as a fixed- but high-dimensional vector. In a traditional neural network (although this may be less true for very modern LLMs) trained to do something with, say, sentences of at most 20 words, the network would likely fail to "figure out how to represent" sentences composed of 100 words. With the HDC/VSA approach, however, you get a representation of such a sentence right off the bat, and don't have to worry about it being inside/outside your training dataset. The "utility" of such a representation is not necessarily obvious, and will depend on exactly how it is created, but IMO it is nice to know that there is a systematic way of constructing one that does not interfere with others (with very high probability).
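To make the "representation right off the bat" point concrete, here is one common VSA-style sequence encoding, assumed for illustration rather than taken from the article: mark each word's position by applying a fixed permutation (here a cyclic shift) as many times as its index, then sum. Nothing is trained, and a 100-word sentence is encoded exactly like a 5-word one. The vocabulary, `D`, and helper names are hypothetical.

```python
import random

D = 10_000  # vector dimension (illustrative)

def random_bipolar(rng, d=D):
    """Random vector with i.i.d. uniform entries in {-1, +1}."""
    return [1 if rng.random() < 0.5 else -1 for _ in range(d)]

def rotate(v, k):
    """Cyclic shift right by k (negative k shifts left); used as the position permutation."""
    k %= len(v)
    return v[-k:] + v[:-k]

def encode_sequence(words, codebook, d=D):
    """s = sum over positions k of rotate(codebook[word_k], k)."""
    acc = [0] * d
    for pos, word in enumerate(words):
        acc = [a + b for a, b in zip(acc, rotate(codebook[word], pos))]
    return acc

def decode_position(seq_vec, pos, codebook):
    """Undo position pos's shift, then return the codebook word with the largest dot product."""
    probe = rotate(seq_vec, -pos)
    return max(codebook, key=lambda w: sum(a * b for a, b in zip(probe, codebook[w])))

rng = random.Random(1)
vocab = [f"word{i}" for i in range(100)]            # hypothetical vocabulary
codebook = {w: random_bipolar(rng) for w in vocab}
sentence = [rng.choice(vocab) for _ in range(100)]  # a 100-word "sentence"
seq_vec = encode_sequence(sentence, codebook)

for pos in (0, 37, 99):
    print(pos, sentence[pos], decode_position(seq_vec, pos, codebook))
```

Decoding a position means undoing its shift and picking the nearest codebook vector. The crosstalk from the other words grows roughly like √(length/d), so for a fixed d there is still a practical limit on sentence length, just not one set by a training set.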

On the other hand, one of the main drawbacks of this approach (at least so far) is that it has generally not been obvious how to make these systems learn robustly, so that e.g. vector representations could change through learning to capture more semantic relationships between words and objects. Nonetheless, given the above, one can imagine how the HDC/VSA approach may provide something akin to a useful inductive bias or initialization for representation learning in more trainable systems.

It will certainly be interesting to see if and how these might get incorporated into modern AI systems in the coming years.