| HN Mirror

This is a great thread, thanks! Somehow I missed it when looking for prior art.

When I initially started implementing this I was hung up on similar concerns. For example in GPT2/PotatoGPT the MLP player is 4x the width of the residual stream. I went down a rabbit hole of addition and multiplication in Typescript types (the type system is Turing complete, so it's technically possible!) and after crashing my TS language server a bunch I switched tacticts.

Where I ended up was to use symbolic equivalence, which turned out to be more ergonomic anyway, i.e.

  type Multiply<A extends number, B extends number> = 
    number & { label: `${A} * ${B}` }
  const Multiply = <A extends number, B extends number>(a: A, b: B) => 
    a * b as Multiply<A, B>;

such that

  tensor([
    params.EmbeddingDimensions, // This is a literal with known size
    Multiply(4, params.EmbeddingDimensions)] as const)

is inferred as

  Tensor<readonly [768, Multiply<4, 768>]>

Notably, switching to a more symbolic approach makes it easier for type checking dimensions that can change at runtime, so something like:

  tensor([Var(tokens.length, 'Sequence Length'), 
          Multiply<4, Var(tokens.length, 'Sequence Length')>])

infers as

  Tensor<readonly [
     Var<'Sequence Length'>, 
     Multiply<4, Var<'Sequence Length'>>]>

And you'll get all the same correctness constraints that you would if these were known dimensions.

The downside to this approach is that typescript won't know that Multiply<4, Var<'A'>> is equivalent to Multiply<Var<'A'>, 4> but in practice I haven't found this to be a problem.

Finally, on more complicated operators/functions that compose dimensions from different variables Typescript is also very capable, albeit not the most ergonomic. You can check my code for matrix multiplication and Seb's writeup for another example of a zip function).