| A curated list of LLM APIs with permanent free tiers, no trial credits, no credit card traps. Info included: rate limits, max context, and supported modalities. Here's the list per provider: Cohere (https://cohere.com/) • Command A (111B)
• Command R+
• Command R
• + 3 more model (https://docs.cohere.com/docs/models)
Google Gemini (https://ai.google.dev/) • Gemini 2.5 Flash
• Gemini 2.5 Flash-Lite
Mistral AI (https://mistral.ai/) • Mistral Small 4
• Mistral Medium 3
• Mistral Large 3
• + 3 more model (https://docs.mistral.ai/getting-started/models/models_overview/)
Z AI (Zhipu AI) (https://z.ai/) • GLM-4.7-Flash
• GLM-4.5-Flash
• GLM-4.6V-Flash
Inference providers - Third-party platforms that host open-weight models from various sources.Cerebras (https://cerebras.ai/) • llama3.1-8b
• gpt-oss-120b
• qwen-3-235b-a22b-instruct-2507
• zai-glm-4.7
Cloudflare Workers AI (https://developers.cloudflare.com/workers-ai/) • @cf/meta/llama-3.3-70b-instruct-fp8-fast
• @cf/meta/llama-3.1-8b-instruct-fp8-fast
• @cf/meta/llama-3.2-11b-vision-instruct
• + 5 more models (https://developers.cloudflare.com/workers-ai/models/)
GitHub Models (https://github.com/marketplace/models) • gpt-4.1
• gpt-4.1-mini
• gpt-4o
• + 7 more models (https://github.com/marketplace/models)
Groq (https://groq.com/) • llama-3.3-70b-versatile
• llama-3.1-8b-instant
• llama-4-scout-17b-16e-instruct
• + 7 more models (https://console.groq.com/docs/models)
Hugging Face (https://huggingface.co/) • Meta-Llama-3.1-8B-Instruct
• Mistral-7B-Instruct-v0.3
• Mixtral-8x7B-Instruct-v0.1
• Phi-3.5-mini-instruct
• Qwen2.5-7B-Instruct
Kilo Code (https://kilocode.ai/) • bytedance-seed/dola-seed-2.0-pro:free - Modality: Text | Rate Limit: ~200 req/hr
• x-ai/grok-code-fast-1:optimized:free - Modality: Text (code) | Rate Limit: ~200 req/hr
• nvidia/nemotron-3-super-120b-a12b:free
• arcee-ai/trinity-large-thinking:free - Modality: Text (reasoning) | Rate Limit: ~200 req/hr
• openrouter/free - Modality: Text | Rate Limit: ~200 req/hr
LLM7.io (https://llm7.io/) • deepseek-r1-0528 - Modality: Text (reasoning) | Rate Limit: 30 RPM (120 with token)
• deepseek-v3-0324 - Modality: Text | Rate Limit: 30 RPM (120 with token)
• gemini-2.5-flash-lite - Modality: Text + Vision | Rate Limit: 30 RPM (120 with token)
• + 3 more model (https://llm7.io/)
NVIDIA NIM (https://build.nvidia.com/) • deepseek-ai/deepseek-r1
• nvidia/llama-3.1-nemotron-ultra-253b-v1
• nvidia/nemotron-3-super-120b-a12b
• + 3 more models (https://build.nvidia.com/models)
Ollama Cloud (https://ollama.com/cloud) • llama3.1:cloud
• deepseek-r1:cloud
• qwen2.5:cloud
• gemma2:cloud
• mistral:cloud
OpenRouter (https://openrouter.ai/) • deepseek/deepseek-r1-0528:free
• deepseek/deepseek-chat-v3-0324:free
• qwen/qwen3.6-plus:free
• + 9 more free models (https://openrouter.ai/models?q=free)
SiliconFlow (https://siliconflow.com/) • Qwen/Qwen3-8B
• deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
• deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
• + 3 more model (https://siliconflow.com/models)
RPM = requests per minute • RPD = requests per day. TPM = Tokens per minute • TPD = Tokens per day • RPS = Requests per second • All endpoints are OpenAI SDK-compatible. |