|
|
|
|
|
by momojo
107 days ago
|
|
I'm surprised the point/comment ratio is this skewed. There's so much meat in the post to chew on. I like your writing. This was one of those blogs where I can tell you spent a massive amount of time on the technical, but simplified it to layman's terms. I hope you keep putting out stuff :). I have a couple questions: 1. I think this quote should be raising *many more* eyebrows. > The astounding thing about Goliath wasn’t that is was a huge leap in performance, it was that the damn thing functioned at all. To this day, I still don’t understand why this didn’t raise more eyebrows. You put a cat's brain into a dog's head and its still breathing! It didn't flatline immediately! Is yesterday's news? This seems like the biggest take away. Why isn't every <MODEL_PROVIDER> attempting LLM-surgery at this moment? Have you noticed any increasede discourse in this area? 2. You mentioned you spent the beginning of your career looking at brains in biotech. How did you end up in a basement of GPU's, working not in biotech, but still kind of looking at brains? Again, great post! |
|
On your questions: 1) A few other papers have been mentioned in the thread, like Solar10.7B. They duplicated the whole transformer stack, and it kinda helped. But as I found experimentally, that probably not a great idea. You are duplicating 'organs' (i.e. input processing stuff), that should only have one copy. Also, that paper didn't see immediate improvements; they had to do continued pre-training to see benefits. At that point, I'm guessing the big labs stopped bothering. Limited by hardware, I had to find unusual angles to approach this topic.
2) Nah, no more wetware for me. I did a half decade of research at a big neurobiology institute, and while it was very enjoyable, I can truly say that grant writing and paper review are 'not my thing'. This reason this info was delayed so long is that I wanted a paper in the AI field to go along with my papers in other fields. But as a Hobbyist with no official affiliation, and the attention span of a gnat, I gave up and started a blog instead. Maybe someone will cite it?