| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by stosssik 63 days ago

A curated list of LLM APIs with permanent free tiers, no trial credits, no credit card traps.

Info included: rate limits, max context, and supported modalities.

Here's the list per provider:

Cohere (https://cohere.com/)

  • Command A (111B)
  • Command R+
  • Command R
  • + 3 more model (https://docs.cohere.com/docs/models)

Google Gemini (https://ai.google.dev/)

  • Gemini 2.5 Flash
  • Gemini 2.5 Flash-Lite

Mistral AI (https://mistral.ai/)

  • Mistral Small 4
  • Mistral Medium 3
  • Mistral Large 3
  • + 3 more model (https://docs.mistral.ai/getting-started/models/models_overview/)

Z AI (Zhipu AI) (https://z.ai/)

  • GLM-4.7-Flash
  • GLM-4.5-Flash
  • GLM-4.6V-Flash

Inference providers - Third-party platforms that host open-weight models from various sources.

Cerebras (https://cerebras.ai/)

  • llama3.1-8b
  • gpt-oss-120b
  • qwen-3-235b-a22b-instruct-2507
  • zai-glm-4.7

Cloudflare Workers AI (https://developers.cloudflare.com/workers-ai/)

  • @cf/meta/llama-3.3-70b-instruct-fp8-fast
  • @cf/meta/llama-3.1-8b-instruct-fp8-fast
  • @cf/meta/llama-3.2-11b-vision-instruct
  • + 5 more models (https://developers.cloudflare.com/workers-ai/models/)

GitHub Models (https://github.com/marketplace/models)

  • gpt-4.1
  • gpt-4.1-mini
  • gpt-4o
  • + 7 more models (https://github.com/marketplace/models)

Groq (https://groq.com/)

  • llama-3.3-70b-versatile
  • llama-3.1-8b-instant
  • llama-4-scout-17b-16e-instruct
  • + 7 more models (https://console.groq.com/docs/models)

Hugging Face (https://huggingface.co/)

  • Meta-Llama-3.1-8B-Instruct
  • Mistral-7B-Instruct-v0.3
  • Mixtral-8x7B-Instruct-v0.1
  • Phi-3.5-mini-instruct
  • Qwen2.5-7B-Instruct

Kilo Code (https://kilocode.ai/)

  • bytedance-seed/dola-seed-2.0-pro:free - Modality: Text | Rate Limit: ~200 req/hr
  • x-ai/grok-code-fast-1:optimized:free - Modality: Text (code) | Rate Limit: ~200 req/hr
  • nvidia/nemotron-3-super-120b-a12b:free
  • arcee-ai/trinity-large-thinking:free - Modality: Text (reasoning) | Rate Limit: ~200 req/hr
  • openrouter/free - Modality: Text | Rate Limit: ~200 req/hr

LLM7.io (https://llm7.io/)

  • deepseek-r1-0528 - Modality: Text (reasoning) | Rate Limit: 30 RPM (120 with token)
  • deepseek-v3-0324 - Modality: Text | Rate Limit: 30 RPM (120 with token)
  • gemini-2.5-flash-lite - Modality: Text + Vision | Rate Limit: 30 RPM (120 with token)
  • + 3 more model (https://llm7.io/)

NVIDIA NIM (https://build.nvidia.com/)

  • deepseek-ai/deepseek-r1
  • nvidia/llama-3.1-nemotron-ultra-253b-v1
  • nvidia/nemotron-3-super-120b-a12b
  • + 3 more models (https://build.nvidia.com/models)

Ollama Cloud (https://ollama.com/cloud)

  • llama3.1:cloud
  • deepseek-r1:cloud
  • qwen2.5:cloud
  • gemma2:cloud
  • mistral:cloud

OpenRouter (https://openrouter.ai/)

  • deepseek/deepseek-r1-0528:free
  • deepseek/deepseek-chat-v3-0324:free
  • qwen/qwen3.6-plus:free
  • + 9 more free models (https://openrouter.ai/models?q=free)

SiliconFlow (https://siliconflow.com/)

  • Qwen/Qwen3-8B
  • deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  • + 3 more model (https://siliconflow.com/models)

RPM = requests per minute • RPD = requests per day. TPM = Tokens per minute • TPD = Tokens per day • RPS = Requests per second • All endpoints are OpenAI SDK-compatible.