by whimsicalism
1129 days ago
That work looks really interesting! I am also excited about type safety when it comes to tensors. My understanding was that this type-safe approach to tensor shapes had run into issues because it was difficult (maybe impossible?) to reason about the shape of some common operators at compile time. But perhaps those operators are not really necessary. [0]
Some sort of typed 'named tensor' that could be combined with einsum notation at runtime would be awesome, i.e. (I don't really know TS/JS well, so treat this as pseudocode):
import { torch } from 'pytorch' as t
import { torch.nn } from 'pytorch' as nn
const tensorA: Tensor[Batch, Seq, Emb] = t.randn([10,10,10]) // initialize tensor
const transformLayer = nn.Einsum((Batch, Seq, Emb),(Emb)->(Batch, Seq))
const tensorB: Tensor[Emb2] = t.randn([20])
const transformedOutput = transformLayer(tensorA, tensorB) // type error: Emb2 does not match Emb
[0]: https://github.com/pytorch/pytorch/issues/26889

When I initially started implementing this I was hung up on similar concerns. For example, in GPT2/PotatoGPT the MLP layer is 4x the width of the residual stream. I went down a rabbit hole of addition and multiplication in TypeScript types (the type system is Turing complete, so it's technically possible!) and, after crashing my TS language server a bunch, I switched tactics.
Where I ended up was to use symbolic equivalence, which turned out to be more ergonomic anyway: a dimension built from other dimensions is inferred as a symbolic type such as Multiply<4, Var<'A'>> rather than as a computed number.
Notably, switching to a more symbolic approach makes it easier to type check dimensions that can change at runtime: such a dimension is inferred as a named variable like Var<'A'>, and you'll get all the same correctness constraints that you would if these were known dimensions.
The downside to this approach is that TypeScript won't know that Multiply<4, Var<'A'>> is equivalent to Multiply<Var<'A'>, 4>, but in practice I haven't found this to be a problem.
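For anyone who wants to play with the idea, here is a minimal sketch of symbolic dimensions using branded number types. It is not PotatoGPT's actual code: the helper names (variable, multiply, zeros, Same) and the two-dimensional Tensor interface are assumptions made up for illustration. Only the Var and Multiply type names come from the comment above.

```typescript
// Hypothetical sketch; none of these helpers are a real library API.

// Branded numbers: plain numbers at runtime, distinct symbols at the type level.
type Var<Label extends string> = number & { readonly __var: Label };
type Multiply<A extends number, B extends number> =
  number & { readonly __mul: readonly [A, B] };

// Tag a size that is only known at runtime (e.g. sequence length) with a name.
function variable<L extends string>(value: number, _label: L): Var<L> {
  return value as Var<L>;
}

// Multiply at runtime, but record only the expression in the type.
function multiply<A extends number, B extends number>(a: A, b: B): Multiply<A, B> {
  return (a * b) as Multiply<A, B>;
}

interface Tensor<Rows extends number, Cols extends number> {
  readonly rows: Rows;
  readonly cols: Cols;
  readonly data: Float32Array;
}

function zeros<R extends number, C extends number>(rows: R, cols: C): Tensor<R, C> {
  return { rows, cols, data: new Float32Array(rows * cols) };
}

const seq = variable(7, "Seq");   // Var<"Seq">
const hidden = multiply(4, seq);  // Multiply<4, Var<"Seq">>
const mlp = zeros(seq, hidden);

// Compiles only if the inferred shape matches the symbolic shape we expect.
const checked: Tensor<Var<"Seq">, Multiply<4, Var<"Seq">>> = mlp;

// Type-level equality test: true only if X and Y are mutually assignable.
type Same<X, Y> = [X] extends [Y] ? ([Y] extends [X] ? true : false) : false;

// The downside mentioned above: operand order is part of the symbol,
// so this annotation only accepts `false`.
const orderMatters: Same<Multiply<4, Var<"A">>, Multiply<Var<"A">, 4>> = false;
```

The compiler never does any arithmetic here; it only compares symbols, which is presumably why this is much gentler on the language server than type-level multiplication.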
Finally, on more complicated operators/functions that compose dimensions from different variables, TypeScript is also very capable, albeit not the most ergonomic. (You can check my code for matrix multiplication and Seb's writeup for another example of a zip function.)
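To give a flavour of what composing dimensions from different arguments can look like, here is a rough sketch of a shape-checked matrix multiply. It is not the code referred to above: Matrix, matrix and matmul are hypothetical names, and the row-major number[] storage is an assumption chosen to keep the example short.

```typescript
// Hypothetical sketch; not the implementation referenced in the comment.
interface Matrix<R extends number, C extends number> {
  readonly rows: R;
  readonly cols: C;
  readonly data: number[]; // row-major
}

function matrix<R extends number, C extends number>(
  rows: R,
  cols: C,
  data: number[],
): Matrix<R, C> {
  if (data.length !== rows * cols) throw new Error("data does not match shape");
  return { rows, cols, data };
}

// The left operand fixes A; the right operand must then have exactly
// A["cols"] rows. The result takes its rows from a and its columns from b.
function matmul<A extends Matrix<number, number>, N extends number>(
  a: A,
  b: Matrix<A["cols"], N>,
): Matrix<A["rows"], N> {
  const out: number[] = new Array(a.rows * b.cols).fill(0);
  for (let i = 0; i < a.rows; i++) {
    for (let j = 0; j < b.cols; j++) {
      for (let k = 0; k < a.cols; k++) {
        out[i * b.cols + j] += a.data[i * a.cols + k] * b.data[k * b.cols + j];
      }
    }
  }
  return { rows: a.rows, cols: b.cols, data: out };
}

const x = matrix(2, 3, [1, 2, 3, 4, 5, 6]); // Matrix<2, 3>
const w = matrix(3, 2, [1, 0, 0, 1, 1, 1]); // Matrix<3, 2>
const y: Matrix<2, 2> = matmul(x, w);       // data: [4, 5, 10, 11]

// matmul(x, x) should be rejected at compile time: x has 2 rows,
// but the second argument needs 3 (the column count of the first).
```

Tying the second argument to A["cols"] instead of giving both arguments a shared K type parameter is deliberate: with a shared parameter, TypeScript may infer K as a union of the two literals (e.g. 3 | 4) and quietly accept a mismatch.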