Hacker News
by darkbatman 237 days ago
Judging by the paper, the memory needed per layer seems to be higher than in a transformer architecture. Pretty sure that would blow up GPU VRAM at scale.