Hacker News new | ask | show | jobs
by robbedpeter 1642 days ago
The transformer in transformer (TnT) model looks promising - you can set up multiple overlapping domains of attention, at arbitrary scales over the input.