Hacker News
by two_in_one 871 days ago
From the post:

> I implemented imperative code that does what I’m proposing the transformer is doing. It produces outputs very similar to the transformer.

This means there is probably a way to bypass transformers and get the same results. It would be interesting if it turned out to be more efficient. For example, given a foundation model, one could train something else to reproduce its outputs and run that on a much smaller device.

1 comment

I explained in another comment that this does not bypass transformers and is not more efficient: https://news.ycombinator.com/item?id=39254966