by iambateman 1201 days ago
If I were running this in a server context, would the 50 GB of RAM be required to respond to one request, or can it be used to respond to multiple requests simultaneously?
2 comments

I'm very late to this question, but I believe that amount is only required once, while the context tensor will need to be created per request. I haven't confirmed that, though.
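As a rough sketch of that reasoning: if the 50 GB is a one-time cost and each request only adds its own context tensor, total memory grows linearly with concurrency. The function name and the 2 GB per-request figure below are made up purely for illustration; only the 50 GB comes from the question, and whether it covers the fixed part alone is an assumption.

```python
def estimate_memory_gb(shared_gb: float,
                       context_gb_per_request: float,
                       concurrent_requests: int) -> float:
    """Estimate total RAM if the shared part is loaded once and each
    in-flight request allocates its own context tensor.

    All names and the cost model are illustrative assumptions, not
    measurements of any real model or server.
    """
    if concurrent_requests < 0:
        raise ValueError("concurrent_requests must be non-negative")
    # Shared cost is paid once; per-request cost scales with concurrency.
    return shared_gb + context_gb_per_request * concurrent_requests


# Hypothetical numbers: 50 GB shared, 2 GB of context per request.
for n in (1, 4, 8):
    print(n, "requests:", estimate_memory_gb(50.0, 2.0, n), "GB")
```

If this model holds, serving more requests at once costs only the per-request part each time, not another 50 GB.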
I'd assume that all the calculations used for 1 request would already eat up that amount of memory, but I could be wrong!