by punk_ihaq 1070 days ago
> Is it expected to be slow?

Probably, yes. The slowness comes from the Replicate API, not from Streamlit. The docs for the 13b API [0] say:

> Predictions typically complete within 9 seconds.

Whereas for the 70b API [1]:

> Predictions typically complete within 18 seconds. The predict time for this model varies significantly based on the inputs.
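If you want to confirm where the time goes in your own app, one option is to time the model call separately from the rest of the page. This is only a rough sketch: `timed` and `fake_predict` are hypothetical names, and `fake_predict` just sleeps as a stand-in for the real Replicate request, which you would swap in yourself.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_predict(prompt: str, delay_s: float = 0.05) -> str:
    # Hypothetical stand-in for the remote model call (assumption, not the
    # real API); replace the body with your actual request.
    time.sleep(delay_s)
    return f"echo: {prompt}"


if __name__ == "__main__":
    reply, seconds = timed(fake_predict, "hello")
    print(f"model call took {seconds:.2f}s")
```

If the number printed for the model call accounts for most of the wait you see in the browser, the delay is on the API side rather than in the UI.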

[0] https://replicate.com/a16z-infra/llama13b-v2-chat

[1] https://replicate.com/replicate/llama70b-v2-chat