Hacker News
by pankajdoharey 155 days ago
If they are real slimeballs, they can justify it by saying: you see, we use speculative decoding, so we use a smaller, faster model first, and then the answer is enhanced by a larger model, blah blah... "for the best user experience."