Infinity embedding engine added to KubeAI
by samosx
638 days ago
Just merged and released the [Infinity support PR](https://github.com/substratusai/kubeai/pull/197) in KubeAI, which adds Infinity as an embedding engine. This means you can serve embeddings on your local Kubernetes clusters through an OpenAI-compatible API. Infinity is a high-performance, low-latency embedding engine: https://github.com/michaelfeil/infinity
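Because the API is OpenAI-compatible, an embeddings call is a plain HTTP POST with a JSON body. The sketch below shows the request and response shapes using only the Python standard library. It assumes the endpoint sits at `/embeddings` under the same base URL the OpenAI client uses further down (`http://localhost:8000/openai/v1`) and that responses follow the standard OpenAI embeddings schema (`data`, `index`, `embedding`). The helper names are hypothetical, and the sample payload is made up for illustration, not real model output.

```python
import json
import urllib.request


def build_embeddings_request(base_url, model, inputs):
    """Build (but do not send) a POST request for an OpenAI-style embeddings endpoint.

    Hypothetical helper: assumes the endpoint is <base_url>/embeddings,
    as on OpenAI-compatible servers.
    """
    body = json.dumps({"model": model, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings_response(raw):
    """Return the embedding vectors from a JSON response, ordered by their 'index' field."""
    payload = json.loads(raw)
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


if __name__ == "__main__":
    req = build_embeddings_request(
        "http://localhost:8000/openai/v1", "bge-embed-text-cpu", ["Your text goes here."]
    )
    print(req.get_method(), req.full_url)
    # Sending it requires the port-forward to be running:
    # with urllib.request.urlopen(req) as resp:
    #     vectors = parse_embeddings_response(resp.read())

    # Made-up response in the OpenAI embeddings schema, deliberately out of order.
    sample = json.dumps({
        "object": "list",
        "data": [
            {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
            {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
        ],
        "model": "bge-embed-text-cpu",
    })
    print(parse_embeddings_response(sample))
```

Sorting by `index` rather than trusting list order keeps the vectors aligned with the inputs when several texts are embedded in one call.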
KubeAI is a Kubernetes Operator for running OSS ML serving engines: https://github.com/substratusai/kubeai

How do you use it? Deploy to any K8s cluster by running:
```
helm repo add kubeai https://www.kubeai.org
helm install kubeai kubeai/kubeai --wait --timeout 10m
cat > model-values.yaml << EOF
catalog:
  bge-embed-text-cpu:
    enabled: true
    features: ["TextEmbedding"]
    owner: baai
    url: "hf://BAAI/bge-small-en-v1.5"
    engine: Infinity
    resourceProfile: cpu:1
    minReplicas: 1
EOF
helm install kubeai-models kubeai/models -f ./model-values.yaml
```

Forward the kubeai service to localhost:
```
kubectl port-forward svc/kubeai 8000:80
```

Afterwards, you can use the OpenAI Python client to get embeddings:
```
from openai import OpenAI
# Assumes port-forward of kubeai service to localhost:8000.
client = OpenAI(api_key="ignored", base_url="http://localhost:8000/openai/v1")
response = client.embeddings.create(
input="Your text goes here.",
model="bge-embed-text-cpu"
)
print(response)
```

What’s next?
- Support for autoscaling based on metrics reported by Infinity.
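Metric-based autoscaling commonly follows the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current × currentMetric / targetMetric), skipped when the ratio is within a tolerance of 1 and clamped to the replica bounds. The sketch below illustrates that rule only. It is a hypothetical example, not KubeAI's planned implementation. The metric (for example in-flight requests per replica as reported by Infinity), the target value, and `max_replicas` are assumptions. `min_replicas` mirrors the `minReplicas` field in the values file.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """HPA-style proportional scaling (hypothetical sketch, not KubeAI's code).

    current_metric: average per-replica value of the scaling metric,
    e.g. in-flight requests reported by the engine (assumed metric).
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        # Scale proportionally, rounding up so load is never under-provisioned.
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(2, 30, 10))    # 6: load is 3x the target
    print(desired_replicas(4, 10.5, 10))  # 4: within tolerance, no change
    print(desired_replicas(3, 2, 10))     # 1: scale down, floored at min_replicas
```

The tolerance band prevents the replica count from flapping when the metric hovers around its target.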