Hacker News
by lucidrains 2612 days ago
I've tried it as well and got good syntactic results. For programs that actually make sense, I think we will need more layers and attention heads. Perhaps someone will fork GPT-2 and add the sparse transformer to it.