W&B Inference provides access to several open-source foundation models. Each model has different strengths and use cases.

## Model catalog

| Model | Model ID (for API usage) | Type | Context Window | Parameters | Description |
|---|---|---|---|---|---|
| DeepSeek V3.1 | `deepseek-ai/DeepSeek-V3.1` | Text | 161k | 37B-671B (Active-Total) | Large hybrid model that supports both thinking and non-thinking modes via prompt templates. |
| Google Gemma 4 31B | `google/gemma-4-31B-it` | Text, Vision | 262k | 31B (Total) | Dense model designed for advanced reasoning, agentic workflows, and longer contexts; natively trained on 140+ languages. |
| Meta Llama 3.3 70B | `meta-llama/Llama-3.3-70B-Instruct` | Text | 128k | 70B (Total) | Multilingual model excelling in conversational tasks, detailed instruction-following, and coding. |
| Meta Llama 3.1 70B | `meta-llama/Llama-3.1-70B-Instruct` | Text | 128k | 70B (Total) | Efficient conversational model optimized for responsive multilingual chatbot interactions. |
| Meta Llama 3.1 8B | `meta-llama/Llama-3.1-8B-Instruct` | Text | 128k | 8B (Total) | Efficient conversational model optimized for responsive multilingual chatbot interactions. |
| Microsoft Phi 4 Mini 3.8B | `microsoft/Phi-4-mini-instruct` | Text | 128k | 3.8B (Total) | Compact, efficient model ideal for fast responses in resource-constrained environments. |
| MiniMax M2.5 | `MiniMaxAI/MiniMax-M2.5` | Text | 197k | 10B-230B (Active-Total) | Highly sparse MoE model designed for high throughput and low latency, with strong coding capabilities. |
| Moonshot AI Kimi K2.5 | `moonshotai/Kimi-K2.5` | Text, Vision | 262k | 32B-1T (Active-Total) | Multimodal Mixture-of-Experts model with 32 billion activated parameters and 1 trillion total parameters. |
| NVIDIA Nemotron 3 Super 120B | `nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8` | Text | 262k | 12B-120B (Active-Total) | LatentMoE model designed to deliver strong agentic, reasoning, and conversational capabilities. |
| OpenAI GPT OSS 120B | `openai/gpt-oss-120b` | Text | 131k | 5.1B-117B (Active-Total) | Efficient Mixture-of-Experts model designed for high-reasoning, agentic, and general-purpose use cases. |
| OpenAI GPT OSS 20B | `openai/gpt-oss-20b` | Text | 131k | 3.6B-20B (Active-Total) | Lower-latency Mixture-of-Experts model trained on OpenAI's Harmony response format, with reasoning capabilities. |
| OpenPipe Qwen3 14B Instruct | `OpenPipe/Qwen3-14B-Instruct` | Text | 32.8k | 14.8B (Total) | Efficient multilingual, dense, instruction-tuned model, optimized by OpenPipe for building agents with finetuning. |
| Qwen3 235B A22B Thinking-2507 | `Qwen/Qwen3-235B-A22B-Thinking-2507` | Text | 262k | 22B-235B (Active-Total) | High-performance Mixture-of-Experts model optimized for structured reasoning, math, and long-form generation. |
| Qwen3 235B A22B-2507 | `Qwen/Qwen3-235B-A22B-Instruct-2507` | Text | 262k | 22B-235B (Active-Total) | Efficient multilingual Mixture-of-Experts instruction-tuned model, optimized for logical reasoning. |
| Qwen3 30B A3B | `Qwen/Qwen3-30B-A3B-Instruct-2507` | Text | 262k | 3.3B-30.5B (Active-Total) | 30.5B MoE instruction-tuned model with enhanced reasoning, coding, and long-context understanding. |
| Qwen3 Coder 480B A35B | `Qwen/Qwen3-Coder-480B-A35B-Instruct` | Text | 262k | 35B-480B (Active-Total) | Mixture-of-Experts model optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning. |
| Qwen3.5 35B A3B | `Qwen/Qwen3.5-35B-A3B` | Text, Vision | 262k | 3B-35B (Active-Total) | Open-weights multimodal MoE model built for efficient, high-throughput inference across chat, reasoning, and agentic tasks. |
| Z.AI GLM 5 | `zai-org/GLM-5-FP8` | Text | 200k | 40B-744B (Active-Total) | Mixture-of-Experts model for long-horizon agentic tasks with strong performance on reasoning and coding. |
| Meta Llama 4 Scout (deprecated) | `meta-llama/Llama-4-Scout-17B-16E-Instruct` | Text, Vision | 64k | 17B-109B (Active-Total) | Multimodal model integrating text and image understanding, ideal for visual tasks and combined analysis. |
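When choosing among these models, the columns that usually matter first are Type (whether the model accepts vision input) and Context Window. As a minimal sketch, you could encode a few rows of the catalog as data and select a model ID by constraint; the `CatalogEntry` and `pick_model` names below are illustrative helpers, not part of any W&B API, and the entries simply mirror rows of the table above.

```python
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    model_id: str
    modalities: tuple      # e.g. ("text",) or ("text", "vision")
    context_window: int    # approximate context length in tokens

# A small subset of the catalog above, for illustration.
CATALOG = [
    CatalogEntry("meta-llama/Llama-3.1-8B-Instruct", ("text",), 128_000),
    CatalogEntry("moonshotai/Kimi-K2.5", ("text", "vision"), 262_000),
    CatalogEntry("openai/gpt-oss-120b", ("text",), 131_000),
]

def pick_model(need_vision: bool, min_context: int) -> str:
    """Return the first catalog entry that satisfies the constraints."""
    for entry in CATALOG:
        if need_vision and "vision" not in entry.modalities:
            continue
        if entry.context_window < min_context:
            continue
        return entry.model_id
    raise ValueError("no model in the catalog satisfies the constraints")
```

For example, `pick_model(need_vision=True, min_context=200_000)` returns `"moonshotai/Kimi-K2.5"`, the only entry in this subset that is both multimodal and long-context.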

## Using model IDs

When using the API, specify the model by its Model ID from the table above. For example:

```python
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[...],
)
```
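A fuller, standard-library-only sketch of the same request follows. It assumes W&B Inference exposes an OpenAI-compatible chat-completions endpoint; the base URL shown and the `WANDB_API_KEY` environment variable are assumptions to verify against your project settings, and your deployment may require additional headers (for example, a project identifier).

```python
import json
import os
import urllib.request

# Assumed endpoint for W&B Inference's OpenAI-compatible API;
# confirm the exact URL in your project settings.
BASE_URL = "https://api.inference.wandb.ai/v1"

def build_payload(model_id: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(model_id: str, prompt: str) -> str:
    """POST a chat completion and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(model_id, prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme: bearer token from your W&B API key.
            "Authorization": f"Bearer {os.environ['WANDB_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `chat("meta-llama/Llama-3.1-8B-Instruct", "Say hello.")` then issues the request and returns the model's reply as a string.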

## Next steps