LLM inference providers — low-latency, custom weights, aggregators
Groq, Cerebras, SambaNova, Together, Fireworks, OpenRouter, LiteLLM, Replicate, fal, Modal, Baseten, SiliconFlow — where to run models without babysitting GPUs.
Tools in this cluster
- Groq
Groq: popular AI product—see the official site for features, pricing, supported regions, data handling, and latest model lineup.
Inference / Hosting - Cerebras Inference
Wafer-scale inference service from Cerebras claiming extreme token throughput on popular open LLMs—great for latency-sensitive interactive apps; verify model list and quotas on the site.
Inference / Hosting - SambaNova Cloud
SambaNova Cloud: popular AI product—see the official site for features, pricing, supported regions, data handling, and latest model lineup.
Inference / Hosting - Together AI
Together AI: popular AI product—see the official site for features, pricing, supported regions, data handling, and latest model lineup.
Inference / Hosting - Fireworks AI
Fireworks AI: popular AI product—see the official site for features, pricing, supported regions, data handling, and latest model lineup.
Inference / Hosting - OpenRouter
OpenRouter: popular AI product—see the official site for features, pricing, supported regions, data handling, and latest model lineup.
Inference / Hosting - LiteLLM
Open-source proxy gateway exposing 100+ LLM vendors through one OpenAI-compatible API—routing, budgets, fallbacks, and logging without reinventing plumbing.
Inference / Hosting - Replicate
Replicate: popular AI product—see the official site for features, pricing, supported regions, data handling, and latest model lineup.
Inference / Hosting - fal
fal: popular AI product—see the official site for features, pricing, supported regions, data handling, and latest model lineup.
Inference / Hosting - Modal
Modal: popular AI product—see the official site for features, pricing, supported regions, data handling, and latest model lineup.
Inference / Hosting - Baseten
Baseten: popular AI product—see the official site for features, pricing, supported regions, data handling, and latest model lineup.
Inference / Hosting - 硅基流动 SiliconFlow
硅基流动 SiliconFlow: popular AI product—see the official site for features, pricing, supported regions, data handling, and latest model lineup.
Inference / Hosting