by dragonwriter 526 days ago
When you say RAG calls, are you talking about requests to the external datastore (usually a vector DB), or the repeated calls to the LLM?

“Guardrails” are often just calls to one or more (usually smaller) classification/moderation models.


call to external db, and then to llm with retrieved context.

also business rules, no?

> call to external db, and then to llm with retrieved context.

Right, neither of those is CPU-intensive in a different way than LLM inference itself (the latter is LLM inference itself).
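To make that concrete, here is a rough sketch of what the calling side of such a request might look like. The functions `search_datastore` and `call_llm` are hypothetical stand-ins (not any real client library), and the sleeps only simulate network waits; the point is that the caller mostly waits on I/O and does a little string work in between.

```python
import asyncio

# Hypothetical stand-ins: a real app would call a vector DB client
# and an LLM API here. The sleeps only simulate network latency.
async def search_datastore(query: str) -> list[str]:
    await asyncio.sleep(0.01)  # waiting on the external datastore
    return [f"doc about {query}"]

async def call_llm(prompt: str) -> str:
    await asyncio.sleep(0.01)  # waiting on the inference server
    return f"answer based on: {prompt}"

async def rag_answer(query: str) -> str:
    docs = await search_datastore(query)          # step 1: retrieval
    prompt = "\n".join(docs) + "\n\nQ: " + query  # cheap string work in the caller
    return await call_llm(prompt)                 # step 2: LLM call with retrieved context

print(asyncio.run(rag_answer("guardrails")))
```

Under these assumptions the heavy compute happens on the datastore and the inference server, not in the process that orchestrates the two calls.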

> also business rules, no?

Business rules can vary quite a bit in content and complexity, but they tend to be either simple enough that they won’t impose much additional load, or complex enough that you are probably going to want to simply use an existing rules engine (many of which, regardless of their implementation language, have Python bindings), which is going to behave the same way no matter what language you call it from.
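As an illustration of the "simple enough" end of that range, a minimal sketch might look like the following. The `Request` fields, the rule names, and the limits are all made up for the example; each rule is just a cheap predicate evaluated per request.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Request:
    user_tier: str
    prompt: str

# Hypothetical rules: each entry is a name plus a predicate that must hold.
RULES: list[tuple[str, Callable[[Request], bool]]] = [
    ("prompt_not_empty", lambda r: bool(r.prompt.strip())),
    ("prompt_length_ok", lambda r: len(r.prompt) <= 4000),
    ("tier_allowed", lambda r: r.user_tier in {"free", "pro"}),
]

def violated_rules(req: Request) -> list[str]:
    """Return the names of all rules the request fails, in rule order."""
    return [name for name, ok in RULES if not ok(req)]

print(violated_rules(Request(user_tier="trial", prompt="  ")))
```

A handful of checks like these cost a few comparisons per request, which is negligible next to a datastore round trip or an LLM call.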

Fair points!