There's some secret sauce behind it, but mostly it's just using relatively inexpensive cloud inference hardware very effectively. It turns out most of the common NLP frameworks leave a good deal of performance on the table. On top of that, general cost-cutting methods like running on spot instances matter a lot for keeping the cloud bill down.