This will synchronously block until ‘chat.ask’ returns though. Be prepared to be paying for the memory of your whole app tens/low hundreds of MB of memory being held alive doing nothing (other than handling new chunks) until whatever streaming API this is using under the hood is finished streaming.
Yes, it does. Ruby has a global interpreter lock (GIL) that prevents multiple threads to be executed by the interpreter at the same time, so Puma does have threads, they just can’t run Ruby code at the same time. They can hide IO though.