Hacker News
by lucidrains, 2612 days ago
I've tried it as well and got good syntactic results. For programs that actually make sense, I think we will need more layers and attention heads. Perhaps someone will fork gpt-2 and add the sparse transformer to it.