Hacker News
by dreaminvm 1158 days ago
Happy to see this type of work that is truly open source and commercially usable. Is this the entire corpus or a subset? Do you intend to release any new iterations?

I've been thinking of starting similar efforts at another BigCorp by hosting a UL2 or GPT-J instance.

1 comment

15k is the entire corpus we have right now. Hopefully others can join up in releasing additional samples that can be merged in over time.

We'll definitely keep iterating on Dolly and releasing everything openly.