by littlestymaar 652 days ago
Is that really the case though? Available compute seems unlikely to be the limiting factor here; data is, since it's far scarcer than what's used to train LLMs. I suspect Google mostly used publicly available data for training, unless they signed deals beforehand with biotechnology companies that have access to more data. That's possible, of course, but it doesn't feel very Google-y.

Yes, all data Google used was public. We have enough compute from YC (thanks YC!) to do this. The main thing is the technical infrastructure - processing the data, efficient loading at training time, proper benchmarking, etc. We are building these now.
Thanks for the answer! It's much better to have the definitive answer rather than rely on gut feeling (even though it was right in this case).

Keep up the good work!

How much compute does YC give you access to, btw? Is that just things like Azure credits, or does YC have actual hardware?