Hacker News
by jiayq84 987 days ago
To show some actual coding examples, we have open-sourced the Python library at https://github.com/leptonai/leptonai/. With it, launching a common HuggingFace model is a one-liner. For example, if you have a GPU, running Stable Diffusion XL is as simple as:

pip install -U leptonai

lep photon run -n sdxl -m hf:stabilityai/stable-diffusion-xl-base-1.0 --local

And you have a local OpenAPI server that runs it! Go to http://0.0.0.0:8080/docs, or use your favorite OpenAPI client.
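If you would rather inspect the server from code than from the /docs page, here is a minimal sketch. It assumes the server publishes its OpenAPI document at /openapi.json next to /docs (a common convention, not something confirmed above); fetch_spec, list_endpoints and the sample spec are hypothetical names used only for illustration.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def fetch_spec(base_url="http://0.0.0.0:8080"):
    """Download the OpenAPI document. The /openapi.json path is an assumption."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)


def list_endpoints(spec):
    """Return sorted (METHOD, path) pairs declared in an OpenAPI document."""
    endpoints = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints)


# Made-up spec, only to show the shape of the data; real paths will differ.
sample_spec = {"paths": {"/run": {"post": {}}, "/healthz": {"get": {}}}}
for method, path in list_endpoints(sample_spec):
    print(method, path)

# Against a running server you would call: list_endpoints(fetch_spec())
```

If 0.0.0.0 does not resolve on your machine, passing base_url="http://localhost:8080" is the usual substitute.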

We've been building AI API services using these tools ourselves. The easiest way to try out Lepton is to head to https://lepton.ai/playground and use our API service for popular models: Stable Diffusion, LLaMA, WhisperX, and other interesting showcases.

We are proud of our performance. For example, we have what are probably the fastest LLaMA 7B and 70B model APIs, and inference costs $0.80 per 1 million tokens, which we believe is the most affordable price on the market. In addition, during the open beta phase, calling these services is free when you sign up for the Lepton AI platform.
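To make the pricing concrete, here is a back-of-the-envelope sketch. It assumes the quoted $0.80 per 1 million tokens applies uniformly to every token counted (the post does not say how input and output tokens are billed); estimate_cost_usd is a hypothetical helper, not part of the library.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.8  # figure quoted in the post


def estimate_cost_usd(num_tokens, price_per_million=PRICE_PER_MILLION_TOKENS_USD):
    """Estimate the inference cost in USD for a given token count."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens / 1_000_000 * price_per_million


# A 128-token completion, and a workload of 10,000 such completions.
print(f"one completion:     ${estimate_cost_usd(128):.7f}")
print(f"10,000 completions: ${estimate_cost_usd(128 * 10_000):.4f}")
```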

Under the hood, we built a platform that lets you run things easily in the cloud. For example, if you find Pygmalion to be a great conversation model but you don't have a GPU, use Lepton's Remote() capability to launch a service:

from leptonai import Remote

pygmalion = Remote("hf:PygmalionAI/pygmalion-2-7b", resource_shape="gpu.a10")

Wait a few minutes for the model to download and start, and you can then use it as if it were a standard Python function:

print(pygmalion.run(inputs="Once upon a time", max_new_tokens=128))
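Because the model takes a few minutes to come up, a script may want to retry the first call rather than fail immediately. The sketch below is a generic retry helper, not part of leptonai: wait_until_ready is a hypothetical name, and it assumes that a call made before the service is ready raises an exception, which the post does not confirm.

```python
import time


def wait_until_ready(call, timeout_s=600.0, interval_s=10.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Retry `call` until it succeeds or `timeout_s` seconds have elapsed.

    `sleep` and `clock` are injectable so the helper can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return call()
        except Exception as exc:  # assumption: "not ready yet" surfaces as an exception
            if clock() >= deadline:
                raise TimeoutError("service did not become ready in time") from exc
            sleep(interval_s)


# Usage with the object from the post (not executed here):
# text = wait_until_ready(
#     lambda: pygmalion.run(inputs="Once upon a time", max_new_tokens=128)
# )
# print(text)
```

Catching every Exception keeps the sketch short; in practice you would narrow it to whatever error the client actually raises while the deployment is starting.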

If you are interested in the operational details, the fully managed platform exposes fine-grained controls at https://dashboard.lepton.ai/. We also support BYOC (bring your own compute) if you are an enterprise needing more autonomy over infrastructure.