by joevandyk 454 days ago
From https://rubyllm.com/#have-great-conversations

    # Stream responses in real-time
    chat.ask "Tell me a story about a Ruby programmer" do |chunk|
      print chunk.content
    end

This will synchronously block until ‘chat.ask’ returns, though. Be prepared to pay for your whole app's memory (tens to low hundreds of MB) being held alive, doing nothing other than handling new chunks, until whatever streaming API this uses under the hood has finished streaming.
Threads?
Rails is a hot ball of global mutable state. Good luck with threads.
The default Rails application server is Puma, and it uses threads.
Yes, it does. Ruby has a global interpreter lock (GIL) that prevents multiple threads from executing Ruby code at the same time, so Puma does have threads; they just can't run Ruby code in parallel. They can hide IO latency, though.
The GIL is released during common IO operations, like the HTTP requests that power LLM communication.
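A rough sketch of that point, not taken from RubyLLM: `fake_stream` and `run_concurrently` are hypothetical helpers, and `sleep` stands in for waiting on a streaming HTTP response. Since a sleeping thread gives up the GIL, three simulated streams should overlap rather than run back to back.

```ruby
# Hypothetical stand-in for a streaming LLM call: each "chunk" arrives
# after a blocking wait. sleep is an assumption used here in place of
# real network IO; like blocking IO, it lets other threads run.
def fake_stream(id, chunks: 3, delay: 0.1)
  chunks.times.map do |i|
    sleep delay
    "stream#{id}-chunk#{i}"
  end
end

# Run `count` simulated streams, one thread each, and time the whole batch.
def run_concurrently(count)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = count.times.map { |id| Thread.new { fake_stream(id) } }
  results = threads.map(&:value) # value joins the thread and returns its result
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [results, elapsed]
end

results, elapsed = run_concurrently(3)
puts "#{results.flatten.size} chunks in #{elapsed.round(2)}s"
```

Run one after another, the three streams would spend about 0.9s waiting; with threads the waits overlap. This only shows that waiting is concurrent. Each blocked thread still holds its own stack and whatever request state it references, so the memory concern raised above is a separate question.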
That looks good, I didn't see that earlier.