Model-Provider Matrix

I put this page together to save you the time I already burned smoke-testing model and path combinations while writing the guide. This is not a benchmark and it is not trying to answer "what is the best model?" It answers a much more practical question:

If you follow the tutorial as written, which models are likely to work cleanly, and where are you going to hit annoying provider-specific edge cases?

I compiled these results on March 29, 2026 using the free-tier API keys for Ollama and Hugging Face, reran the new direct Gemini API checks on March 31, 2026 with a live GEMINI_API_KEY using google-genai 1.69.0, and then added live GitHub Models checks on April 1, 2026 with GITHUB_TOKEN against GitHub's hosted inference API. Model routing, account entitlements, local model inventory, and platform behavior can all change, so treat this as a field note from a real run, not a timeless truth.

What I actually checked

I focused on the parts most likely to waste a your time if they go wrong:

JSON summarization
Schema-constrained extraction
Multi-turn chat where the lesson depends on it
Tool calling
Embeddings where the lesson depends on them

If I were handing this guide to a friend today

If you just want the shortest path to examples that behaved well for me, start here:

Path	Model I would start with	I would use it for	Caveat
OpenAI	`gpt-5.4-nano`	Cheap validation of the foundation app flow and tool calling	Pair it with `text-embedding-3-small` for embeddings.
Gemini	`gemini-2.5-flash`	Foundation lesson contracts through the direct Gemini API	Validated on March 31, 2026 with `google-genai 1.69.0`; uses the native Gemini SDK surface rather than an OpenAI-compatible client. If you need hosted Gemini tuning, that is a Vertex AI path, not the Gemini API path validated here.
Anthropic	`claude-haiku-4-5-20251001`	Cheap structured-output and tool-calling checks	Use the updated `output_config` examples. Prompt-only "return JSON" was not reliable enough.
Hugging Face Inference Providers	`Qwen/Qwen2.5-7B-Instruct`	Summarization, extraction, and tool calling	Smaller routed models were either unavailable or did not hold the contract.
GitHub Models	`openai/gpt-4.1`	Foundation lesson contracts and hosted GitHub Models examples through GitHub auth	Validated here with `GITHUB_TOKEN`, GitHub-specific headers, and publisher/model IDs such as `openai/gpt-4.1`. On the token I checked on April 1, 2026, Anthropic and Google models were not present in the catalog.
Ollama Cloud	`gpt-oss:20b`	Hosted Ollama chat, JSON summarization, and tool-calling smoke tests	Good if you want the Ollama path without local hardware. Do not assume direct Ollama Cloud embeddings are available; use local Ollama or the hybrid path for retrieval lessons.
Local Ollama	`qwen3.5:latest`	Local summarization, extraction, and tool calling	You still need to `ollama pull qwen3.5:latest` before first run.
Local Ollama	`embeddinggemma:latest`	Local embeddings for the beginner retrieval path	Pull it explicitly before running the example.

How I am using the labels

Status	What I mean by it
Pass	The example worked as written in the tutorial.
Partial	Part of the surface worked, but an important part did not or needed a caveat.
Fail	The example did not hold its contract as written. A learner would probably hit confusing output or an error.
Unavailable	The model was not visible on the account or provider surface I was using that day.

Compatibility by path

Select a path tab to see which models I tested, what worked, and where I hit issues.

OpenAI

OpenAI was the most straightforward path during validation. The foundation lesson contracts held across all models I tested, and embeddings worked without surprises.

Default models (currently in lesson code)

Model	What I tested	Status	Notes
`gpt-4o-mini`	JSON summarization, schema-constrained extraction, multi-turn chat, tool calling	Pass	The foundation API lesson contracts held once the response had enough output tokens to finish the JSON object.
`gpt-4o`	Basic chat availability for judge or distillation examples	Pass	A direct chat probe worked. I did not run full eval or distillation workflows.
`gpt-4.1-nano`	Small-model summarization or classification defaults	Pass	A direct chat probe worked.
`gpt-4.1-mini`	Basic chat availability for memory and specialist examples	Pass	A direct chat probe worked.
`gpt-4o-mini-2024-07-18`	Base-model availability for fine-tuning examples	Pass	A direct chat probe worked. I did not run a fine-tuning job.
`text-embedding-3-small`	Embeddings	Pass	Returned a 1536-dimension vector.

Lower-cost alternatives

Model	What I tried	Status	Notes
`gpt-5.4-nano`	Foundation app flow: `/health`, `/echo`, `/summarize-request`, `/summarize`, `/chat`, `/extract/bug-report`, `/chat-with-tools`	Pass	This was the cleanest low-cost end-to-end validation path.
`text-embedding-3-small`	Embeddings for the retrieval path	Pass	Returned the expected embedding vector shape and remains the cheapest good fit here.

Model	What I tested	Status	Notes
`gemini-2.5-flash`	Basic chat, structured summarization, multi-turn chat, schema-constrained extraction, tool calling	Pass	All five foundation lesson contracts held on the direct Gemini API with the native `google-genai` SDK. The smoke run returned valid typed JSON for summarization and extraction and completed the explicit manual tool-calling loop cleanly.
`gemini-embedding-001`	Embeddings for the retrieval path	Pass	Returned 768-dimension vectors when I requested `output_dimensionality=768`, matching the retrieval sample now in the lesson.

Model	What I tested	Status	Notes
`claude-sonnet-4-6`	Structured summarization, schema-constrained extraction, tool calling	Pass	The updated `output_config` examples worked cleanly.

Model	What I tried	Status	Notes
`claude-haiku-4-5-20251001`	Prompt-only JSON summarization and extraction	Fail	It returned fenced ```json blocks, so direct `json.loads(response.content[0].text)` fail without manipulating the response.
`claude-haiku-4-5-20251001`	Structured output via `output_config` plus tool calling	Pass	Clean parseable JSON for summarization and extraction, and tool use worked.
`claude-3-5-haiku-latest`	Small-model smoke-test target	Unavailable	This model was not exposed on this key during validation.

Model	What I tested	Status	Notes
`openai/gpt-oss-120b:fastest`	Plain chat	Pass	A normal chat request returned `ok`. This still appears in text-only prompt lessons.
`Qwen/Qwen2.5-7B-Instruct`	JSON summarization, schema-constrained extraction, tool calling	Pass	This is now the structured-output default in building-with-apis.md because it actually held all three contracts on this token.
`meta-llama/Llama-3.1-8B-Instruct`	Basic chat generation	Pass	Normal text generation worked.
`sentence-transformers/all-MiniLM-L6-v2`	Embeddings	Pass	Returned a 384-dimension vector.

Model	What I tried	Status	Notes
`Qwen/Qwen2.5-3B-Instruct`	Small-model smoke-test target	Unavailable	It was not available through the enabled providers on this token.
`Qwen/Qwen3-4B-Instruct-2507`	JSON summarization through the OpenAI-compatible router	Pass	The simple summarizer path worked.
`Qwen/Qwen3-4B-Instruct-2507`	Schema-constrained extraction through `InferenceClient.chat_completion`	Fail	It ignored the schema and returned prose or markdown instead of JSON.
`Qwen/Qwen2.5-7B-Instruct`	JSON summarization, schema-constrained extraction, tool calling	Pass	This was the smallest Hugging Face model I validated that held all the tested foundation contracts on this token.

Model	What I tested	Status	Notes
`openai/gpt-4.1`	Basic chat, JSON summarization, multi-turn chat, schema-constrained extraction, tool calling	Pass	This held the same foundation lesson contracts as the direct OpenAI path once I used GitHub's required headers and publisher/model ID format. I also validated a full manual tool-calling round trip, not just tool-call emission.
`openai/text-embedding-3-small`	Embeddings	Pass	The embeddings endpoint returned usable vectors through GitHub's hosted inference API.

Model	What I tested	Status	Notes
`qwen3.5:latest`	JSON summarization, schema-constrained extraction, tool calling	Pass	This is now the default local Ollama generation model across the runnable lesson code.
`embeddinggemma:latest`	Embeddings	Pass	Returned a 768-dimension vector.
`nomic-embed-text:latest`	Embeddings	Pass	Returned a 768-dimension vector.

Model	What I tested	Status	Notes
`gpt-oss:20b`	JSON summarization and tool calling	Pass	This is now the default hosted Ollama model in the cloud summarization and tool-calling examples because it held those contracts cleanly at lower cost.
`gpt-oss:120b`	Hosted JSON extraction fallback	Pass	This is the cloud extraction fallback in building-with-apis.md. It gave me clean JSON when I used `format="json"` plus Pydantic validation on my side.

Model	What I tried	Status	Notes
`qwen2.5:3b-instruct`	Small-model smoke-test target	Unavailable	It was not visible on this key during validation.
`ministral-3:3b`	JSON summarization	Fail	It returned fenced JSON, so the lesson's direct `json.loads(...)` path was not beginner-safe.
`gemma3:4b`	JSON summarization	Fail	Same problem: markdown fences instead of raw JSON.
`gpt-oss:20b`	Schema-constrained extraction	Fail	It ignored the schema and returned a markdown table instead of JSON.

What I had to patch to make the defaults trustworthy

I replaced the old Hugging Face structured-output defaults because openai/gpt-oss-120b:fastest and Qwen/Qwen3-32B did not hold the lesson contracts as written.
I split Ollama into explicit local and cloud variants in the lesson so learners do not have to infer where a caveat applies. Local Ollama remains the main strict schema path. Ollama Cloud now has its own full examples where the hosted behavior was good enough to justify them.
For embedding-heavy lessons, I introduced an explicit Ollama hybrid path: local Ollama for embeddings, Ollama Cloud for generation. That keeps the retrieval lessons honest about what I could actually validate.
I still think Ollama Cloud is a reasonable hosted on-ramp if you want the Ollama ecosystem without a local GPU. I just do not want to over-promise on strict schema extraction when my own smoke tests did not justify that.
For hosted Ollama extraction, the most honest pattern I found was format="json" plus BugReport.model_validate_json(...) on the app side. That held up on gpt-oss:120b. The stricter format=<json schema> path did not.
I made the Ollama lessons explicit about ollama pull ... because "Ollama is running" is not enough if the exact model in the code is missing.

How I would use this page

I would read this page alongside Choosing a Provider, Build with APIs, Not Chat Apps, and Model Selection and Serving.

If you want Ollama but do not have the hardware or patience for local setup yet, I would start with Ollama Cloud for the chat-first and summarization-heavy parts of the guide. When you hit embedding-heavy or structured-output-heavy sections, I would switch to local Ollama or the hybrid path rather than assume direct cloud embeddings will be there.

If you change lesson code, provider SDKs, or default model IDs, rerun the smoke tests and update this page with the new date, model IDs, and caveats. The point here is not proving that a model is good in the abstract, but to keep learners from burning an hour on a failure mode that we already know how to avoid.

Model-Provider Matrix

What I actually checked

If I were handing this guide to a friend today

How I am using the labels

Compatibility by path

OpenAI

Default models (currently in lesson code)

Lower-cost alternatives

Gemini

Gemini API vs. Vertex AI

Current status

Provider-specific pattern

Anthropic

Default models (currently in lesson code)

Lower-cost alternatives

Provider-specific pattern

Hugging Face Inference Providers

Default models (currently in lesson code)

Lower-cost alternatives

Provider-specific pattern

GitHub Models

Current status

Platform-specific pattern

Ollama (Local)

Default models (currently in lesson code)

Provider-specific pattern

Ollama Cloud

Default models (currently in lesson code)

Lower-cost alternatives

Provider-specific pattern

What I had to patch to make the defaults trustworthy

How I would use this page