Field Terms We Don't Teach (and Why)
You'll encounter terms in blog posts, vendor documentation, and tutorial sites that don't appear in this learning path, or that appear here with different categorizations than you'll find elsewhere. That's deliberate.
The AI field has a terminology problem. The same word means different things to different vendors. Distinct concepts get lumped together. Marketing categories masquerade as technical ones. And things that are really just applications of a technique get listed as separate techniques alongside the technique they're built on.
This page documents the terms and categorizations we intentionally exclude, recategorize, or use differently from common external sources. If you Google something and find a categorization that contradicts what we teach, check here first. We may have made a deliberate choice, and this page explains why.
How to use this page
- When you encounter a term elsewhere that contradicts the curriculum: Check here for whether the difference is intentional.
- When you're confused about categorization: The distinction between "technique" and "application of a technique" is the most common source of confusion. If something sounds like a separate technique but its description is "SFT but with different data," it's an application.
- When you want to go deeper: The "premature" section points you toward real techniques that are worth learning once you've completed the curriculum.
- When a topic feels important but unfamiliar: The "adjacent disciplines" section covers topics that are real but belong to different engineering roles than the one this curriculum teaches.
This page will grow over time as we encounter more terminology confusion in the field.
Fine-tuning categorizations
External resources (including Google Cloud's fine-tuning guide) often list these as distinct "types of fine-tuning" alongside SFT, LoRA, and DPO. I don't categorize them that way because they're applications of those techniques, not separate techniques. It's like saying git and GitHub are two different version control systems.
Few-shot learning
What external sources say: A type of fine-tuning where the model is given a few examples.
What it actually is: A prompting technique. You put examples in the prompt at inference time. No weights change. No training happens. This is prompt engineering, covered in Module 1. Categorizing it alongside supervised fine-tuning conflates inference-time and training-time techniques, two fundamentally different operations.
Where we teach the real concept: Prompt Engineering Fundamentals covers few-shot prompting as a prompt construction technique.
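To make the distinction concrete, here's a minimal few-shot sketch (assuming the OpenAI SDK and an illustrative sentiment task; any chat-capable model behaves the same way). All the "learning" lives in the prompt text; no weights are touched.

```python
# Few-shot "learning" happens entirely in the prompt at inference time.
# Illustrative sentiment task; assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

few_shot_prompt = """Classify each review as positive or negative.

Review: "The battery lasts all day." -> positive
Review: "It broke after a week." -> negative
Review: "Setup took five minutes and it just worked." ->"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; any chat model works the same way
    messages=[{"role": "user", "content": few_shot_prompt}],
)
print(response.choices[0].message.content)  # expected: positive
# No gradient updates, no changed weights: the examples vanish when the call ends.
```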
Transfer learning
What external sources say: A type of fine-tuning where the model leverages knowledge from pre-training.
What it actually is: The general paradigm that makes fine-tuning possible at all. Every fine-tuning technique is transfer learning, where you're transferring knowledge from the pre-trained model to the task-specific model. Listing it as a "type" alongside the specific techniques is like listing "cooking" as a type of recipe alongside "stir-fry" and "braising."
Where we teach the real concept: The concept is implicit throughout Module 8. When we fine-tune with LoRA, we're doing transfer learning. We just don't use a separate label for the general paradigm.
Domain-specific fine-tuning
What external sources say: A type of fine-tuning where the model is adapted to a particular domain.
What it actually is: SFT applied to domain-specific data. The technique is SFT. What changes is the data curation strategy: you collect training examples from the target domain. This is an application of SFT, not a separate technique.
Where we teach the real concept: Fine-Tuning teaches SFT with data curation from your specific run logs and failure clusters, which is domain-specific fine-tuning in practice, without the misleading separate label.
Multi-task learning
What external sources say: A type of fine-tuning where the model is trained on multiple tasks simultaneously.
What it actually is: SFT with a training dataset that includes examples from multiple tasks. The technique is still SFT. The data curation strategy includes variety across tasks. This is a valid technique at scale, but listing it as a separate "type" alongside SFT obscures the fact that it's SFT with a different dataset composition.
Why I'm excluding it: This learning path focuses on bounded, single-task fine-tuning because that's where beginners should start. Multi-task training introduces task interference and data balancing problems that are premature for someone doing their first fine-tune.
Sequential fine-tuning
What external sources say: A type of fine-tuning where the model is adapted to a series of related tasks in stages.
What it actually is: Running SFT multiple times in sequence. Each round is standard SFT. The "sequential" part is a training strategy, not a technique. The main risk is catastrophic forgetting between stages, which we cover as a concept in the fine-tuning lesson.
Why I'm excluding it: It's an advanced training strategy, not a foundational technique. If you understand SFT, catastrophic forgetting, and evaluation you can figure out sequential training when you need it.
Terminology we use differently
These terms appear in the curriculum, but we use them differently from how some external sources do.
"Inferencing" → inference
What external sources say: "Inferencing" as a gerund for running a model.
What I use here: "Inference" as both noun and verb. "Run inference," not "do inferencing." The "-ing" form is non-standard and adds no precision. This is a minor style choice, but consistency matters when learners are building vocabulary.
"LRM" → reasoning model
What external sources say: "Large Reasoning Model" (LRM) as a category name for models like o3.
What I use here: "Reasoning model." The "LRM" acronym hasn't formally been adopted across provider documentation the way "LLM" has. "Reasoning model" is more descriptive and used more consistently across OpenAI, Anthropic, and Google docs.
"RAG database" → retrieval method
What external sources say: "RAG database" or "vector database" as synonymous with RAG.
What I use here: RAG is a pattern, not a database choice. The retrieval step can use any backing store or search method: grep, BM25, SQL, AST, vector search, graph traversal, or combinations. A vector database is one option for one part of the pattern. See the Retrieval Method Chooser.
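A minimal sketch of why the pattern is retrieval-agnostic. The `retrieve` and `generate` callables are hypothetical stand-ins for whatever backing store and model you actually use:

```python
# RAG is retrieve -> augment -> generate. The retrieval step is pluggable.
from typing import Callable

Retriever = Callable[[str], list[str]]   # question -> relevant passages
Generator = Callable[[str], str]         # prompt -> model output

def answer(question: str, retrieve: Retriever, generate: Generator) -> str:
    # Works identically whether `retrieve` is backed by grep, BM25,
    # SQL full-text search, a vector index, or a combination of them.
    context = "\n\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)
```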
"Prompt tricks" → prompt engineering
What external sources say: Prompt engineering as a collection of tricks, hacks, or magic phrases.
What I use here: Prompt engineering is writing contracts. It's decomposition, constraint specification, output schema design, and debugging. The "trick" framing implies prompting is a bag of hacks you memorize. The "contract" framing means you're designing the interface between your system and the model.
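A hypothetical illustration of the contract framing: the prompt below specifies the input, constraints, and an output schema the caller can validate, rather than relying on a memorized magic phrase.

```python
import json

# A prompt written as a contract: explicit input, constraints, output schema.
PROMPT_TEMPLATE = """You are extracting invoice fields.

Input: a plain-text invoice, below.
Output: JSON only, matching this schema exactly:
  {{"vendor": string, "total": number, "currency": "USD" | "EUR"}}

Constraints:
- If a field is missing, use null. Do not guess.
- No prose before or after the JSON.

Invoice:
{invoice_text}
"""

def parse_reply(reply: str) -> dict:
    # Because the contract is explicit, violations surface here,
    # not three components downstream.
    return json.loads(reply)
```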
"Agentic AI" → AI engineering (with agents as one tool)
What external sources say: "Agentic AI" as a category of AI systems.
What I use here: Agents are a tool, not a category. An agent is a model + tools + control loop. Whether to use an agent (vs. a simpler pipeline) is an engineering decision, not an identity. I teach agents in Module 3 alongside non-agentic approaches because the question is always "does this task need an agent?", not "how do I make everything agentic?"
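A minimal sketch of that definition, with `call_model` and the tool registry as hypothetical stand-ins:

```python
import json

# Agent = model + tools + control loop. Everything here is a stand-in.
TOOLS = {"search_docs": lambda query: f"results for {query!r}"}  # hypothetical tool

def run_agent(task: str, call_model, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):                       # the control loop
        reply = call_model(messages)                 # the model
        tool_call = reply.get("tool_call")
        if tool_call:                                # the tools
            result = TOOLS[tool_call["name"]](**tool_call["args"])
            messages.append({"role": "tool", "content": json.dumps(result)})
        else:
            return reply["content"]                  # no tool needed: done
    return "max steps reached"
```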
"AWS Bedrock" / "Vertex AI" → cloud provider surfaces, not separate providers
What external sources say: AWS Bedrock and Google Vertex AI listed alongside OpenAI and Anthropic as "AI providers."
Why I'm excluding them as separate provider tabs: Bedrock and Vertex AI are deployment surfaces for models that already have direct provider APIs. Claude on Bedrock is still Claude. Gemini on Vertex AI is still Gemini. Adding them as separate provider paths would mean maintaining duplicate code for the same model behavior with different SDK calls: complexity without pedagogical value. Instead, the curriculum teaches through direct provider SDKs and provides a Cloud Provider Surfaces reference with translation tables for learners who access models through cloud platforms.
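To see why they're surfaces rather than providers, compare the same model family through both paths (a sketch with illustrative model IDs; check current provider documentation for exact identifiers):

```python
# Same model, two SDK surfaces.
import json

import anthropic
import boto3

prompt = [{"role": "user", "content": "Say hello."}]

# Direct provider API (assumes ANTHROPIC_API_KEY is set)
direct = anthropic.Anthropic()
r1 = direct.messages.create(
    model="claude-3-haiku-20240307",  # illustrative model name
    max_tokens=64,
    messages=prompt,
)

# The same model family through the Bedrock deployment surface
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
r2 = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative Bedrock ID
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 64,
        "messages": prompt,
    }),
)
# Different SDK calls, same underlying model behavior.
```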
"GitHub Copilot SDK" → agent platform / runtime, not provider
What external sources say: GitHub markets Copilot SDK as a way to build agentic systems into your own applications.
Why I'm excluding it as a provider tab: That capability is real, but the layer is different. Copilot SDK gives you an agent runtime: sessions, tool execution, control flow, and the surrounding Copilot orchestration layer. The curriculum's provider tabs mean "the direct model API surface you call when teaching message structure, structured outputs, embeddings, and tool calls at the contract level." Copilot SDK sits above that layer. It is better understood later alongside agent frameworks and orchestration systems than alongside direct provider APIs.
"GitHub Models" → hosted inference / routing platform, supported path but not direct provider API
What external sources say: GitHub Models exposes a real inference API that lets you call multiple publishers' models through GitHub authentication.
What I use here: GitHub Models is a supported hosted inference path in the curriculum now. The clarification is about layer, not exclusion: it is not a direct provider API in the same category as OpenAI, Gemini, or Anthropic. It is a platform layer in front of multiple publishers. That's why the chooser, lessons, and reference pages talk about it as a hosted inference path rather than pretending it is the native publisher API for a model family.
Tools that overlap more than they differ
Some tools get compared as if they're competing alternatives when they're actually layers of the same stack, or the same engine with different interfaces. Learners spend time researching "Ollama vs LM Studio vs llama.cpp" when the real question is simpler than it looks.
llama.cpp vs Ollama vs LM Studio
The confusion: Three tools for running local models. Blog posts compare them head-to-head as if you need to pick one. Benchmarks show marginal speed differences. Feature matrices list dozens of checkboxes.
The relationship they don't make obvious: Ollama and LM Studio are both built on top of llama.cpp. They're not alternatives to it; they're convenience layers. llama.cpp is the inference engine. Ollama wraps it in a developer-friendly daemon with a REST API and model registry. LM Studio wraps it (plus Apple's MLX engine on Mac) in a desktop GUI with visual controls. Choosing between them isn't like choosing between PostgreSQL and MySQL. It's like choosing between using PostgreSQL directly, through an ORM, or through a GUI client.
What I use in this curriculum:
- Ollama is the default for local inference throughout the learning path. It provides a REST API at `http://localhost:11434` that works like any other provider endpoint, which means your code doesn't need to know whether the model is local or hosted (see the sketch after this list). One command to install, one command to run a model. That's the right abstraction for AI engineering work.
- llama.cpp appears in the Hardware Guide as the option for maximum control: custom quantization, unusual hardware (ARM, Raspberry Pi, edge devices), or fully auditable open-source requirements. If you need to tune GPU layer offloading or KV-cache quantization, llama.cpp gives you those knobs.
- LM Studio isn't taught in the curriculum. It's a good tool for exploring models visually, letting you adjust temperature and context length with sliders and compare model outputs side by side. But it requires a GUI, can't run headless, and is closed source. For the programmatic, API-driven work this curriculum teaches, Ollama is a better fit.
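Here's what "works like any other provider endpoint" means in practice: a minimal sketch using Ollama's OpenAI-compatible `/v1` route, assuming you've already pulled an illustrative model (e.g. `ollama run llama3.2`):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local Ollama daemon
    api_key="ollama",                      # placeholder; Ollama ignores it
)

response = client.chat.completions.create(
    model="llama3.2",  # illustrative local model tag
    messages=[{"role": "user", "content": "One sentence on what a KV cache is."}],
)
print(response.choices[0].message.content)
```

Swap the `base_url` and model name and the same code talks to a hosted provider. That's the abstraction at work.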
When the choice actually matters:
| If you need... | Use |
|---|---|
| A local API endpoint for your application | Ollama |
| Visual exploration of model behavior | LM Studio |
| Maximum performance on unusual hardware | llama.cpp |
| Apple Silicon MLX acceleration | LM Studio (uses MLX engine instead of llama.cpp) |
| Headless/containerized deployment | Ollama or llama.cpp (LM Studio requires a GUI) |
| Fine-grained control over inference parameters | llama.cpp |
**Some practical advice:** Start with Ollama. If you later need something it doesn't provide, such as MLX acceleration, raw performance tuning, or headless GPU management, you'll know exactly what you need and why. Many developers use LM Studio for discovery and Ollama for development. That's not hedging; it's using each tool for what it's good at.
Concepts that are real but premature
These are legitimate techniques I've chosen not to teach in detail, not because they're wrong, but because they require prerequisites or infrastructure beyond the scope of this curriculum.
RLHF (Reinforcement Learning from Human Feedback)
What it is: A training method that uses a separately trained reward model to optimize the language model's behavior based on human preferences.
Why I mention but don't teach it: RLHF requires training two models (the reward model and the language model), collecting human preference data at scale, and managing a more complex training loop. DPO achieves similar goals with a simpler setup. I teach DPO as the accessible entry point to preference optimization and reference RLHF for learners who need the full machinery.
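For scale, here's roughly what the DPO setup looks like with Hugging Face's `trl` library (a sketch; the model name and hyperparameters are illustrative, and `trl`'s API has shifted between versions): one trainer, one dataset of preference pairs, no separately trained reward model.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# DPO consumes preference pairs: {"prompt", "chosen", "rejected"} per row.
dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-out", beta=0.1),  # beta: preference strength
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```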
ORPO, KTO, SimPO (newer preference methods)
What they are: ORPO (Odds Ratio Preference Optimization), KTO (Kahneman-Tversky Optimization), and SimPO (Simple Preference Optimization) are alternatives to DPO that simplify or improve preference-based training in various ways.
Why I'm excluding them: The landscape is still stabilizing. DPO teaches the core concept of preference-based optimization, and the mechanics transfer to newer methods. I'll revisit these as the field converges.
Full fine-tuning (all parameters)
What it is: Updating every parameter in the model during training, not just LoRA adapters.
Why I included PEFT instead: Full fine-tuning requires multi-GPU setups and significantly more VRAM than the curriculum assumes. PEFT/LoRA/QLoRA achieves most of the same benefits on consumer hardware. I mention full fine-tuning in the fine-tuning landscape table so learners know it exists.
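A minimal sketch of why PEFT fits consumer hardware, assuming Hugging Face's `peft` library (the model name and target modules are illustrative and architecture-specific):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
config = LoraConfig(
    r=8,                                   # adapter rank: small = cheap
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections (model-specific)
    lora_dropout=0.05,
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```

Only the adapter weights train; the base model stays frozen. That's what brings VRAM requirements down from multi-GPU to single-GPU territory.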
Patterns that skip measurement
These are development patterns you'll encounter in the wild that produce working results often enough to feel productive, but bypass the measurement and attribution discipline the curriculum teaches. They aren't necessarily wrong in every context, but they're problematic as defaults, and learners who adopt them early will struggle to diagnose failures later.
Retry loops without evals ("the Ralph Wiggum loop")
What it is: A pattern where an AI agent is given a prompt and run in a loop: if the output isn't done, the same prompt is fed back and the agent tries again. The simplest version is literally `while :; do cat PROMPT.md | claude; done`. The loop runs until a completion signal appears or a maximum iteration count is hit.
Why it's appealing: It works for hackathons and demos. Iteration beats one-shot perfection, and the pattern requires almost no setup. There are real examples of impressive results from brute-force retry loops.
Why I don't teach it:
- **No failure attribution.** When the loop takes 30 iterations instead of 3, you have no way to know why. Was the prompt unclear? Was the task too broad? Did the model need different context? The loop doesn't distinguish between "almost right on attempt 1" and "fundamentally wrong for 29 attempts, then lucky on attempt 30." Without attribution, you can't improve the system. You can only run the loop again and hope.
- **No cost visibility.** Each iteration costs tokens. A loop running overnight with no cost tracking or token budgeting is exactly the operational blindness that Module 6 teaches you to avoid. An impressive result that cost $297 in API calls is less impressive when you ask: what did the wasted iterations cost, and could the same result have been achieved in 3 targeted iterations with better decomposition?
- **Prompt-as-specification without decomposition.** The pattern treats a large prompt as a monolithic specification and hopes repeated attempts will converge on the right output. The curriculum teaches the opposite: decompose complex tasks into steps, constrain each step's output, and verify before proceeding. Decomposition is more work up front but dramatically more reliable and debuggable.
- **It trains the wrong instinct.** A learner who reaches for retry loops when things fail is learning to throw compute at problems. A learner who reaches for evals, failure attribution, and the optimization ladder is learning to diagnose problems. The second instinct compounds; the first just costs more over time.
What the disciplined version looks like: If you added eval checks between iterations (run the benchmark, check whether the failure count decreased, attribute the remaining failures, and decide whether to continue, change the prompt, or change the approach), you'd have an automated version of the harness from Module 6. That's a legitimate pattern. The difference between a retry loop and an eval-driven improvement loop is measurement, and that difference is what production work depends on.
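A minimal sketch of that eval-driven loop, with `run_agent` and `run_evals` as hypothetical stand-ins for your own harness:

```python
def run_agent(prompt_path: str) -> None:
    """Hypothetical: one agent attempt against the task prompt."""
    raise NotImplementedError

def run_evals(benchmark_dir: str) -> int:
    """Hypothetical: run the eval suite, return the failing-case count."""
    raise NotImplementedError

best_failures = float("inf")
for i in range(10):                         # bounded, not open-ended
    run_agent("PROMPT.md")
    failures = run_evals("benchmark/")      # measure between iterations
    print(f"iteration {i}: {failures} failing cases")
    if failures == 0:
        break                               # done, and you know why
    if failures >= best_failures:
        # Plateaued: stop and attribute failures instead of re-rolling.
        raise SystemExit("no progress -- change the prompt or the approach")
    best_failures = failures
```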
The short version: Iteration is good; blind iteration is vibe coding with a while loop.
Concepts from adjacent disciplines
These are valid, important topics, but they belong to different engineering disciplines than the one this curriculum teaches. AI engineering (building applications on top of models via inference APIs) is not the same as ML engineering (training and serving models), ML infrastructure (managing compute and storage for training), or data engineering (building pipelines that feed training). If you encounter these topics and feel like you should be learning them, check whether they're actually relevant to the work you're doing.
AI storage
What it is: Specialized storage infrastructure designed to handle the I/O demands of AI training workloads, such as moving massive datasets between storage and GPUs, managing model checkpoints, and sustaining the throughput that training pipelines require. Vendors like VAST Data, NetApp, and others market purpose-built storage systems for these workloads.
Why it sounds relevant: "AI" is in the name. If you're learning AI engineering, surely you need to understand AI storage?
Why it's a different discipline: AI storage solves infrastructure problems for teams that train models at scale, such as data center operators, ML platform engineers, and infrastructure teams managing GPU clusters. This curriculum teaches you to use models via inference APIs. The storage layer between your application and an API call is HTTP. You don't manage the provider's training infrastructure any more than a web developer manages Cloudflare's CDN nodes.
When it would become relevant to you: If you move into ML infrastructure, start training large models from scratch (not fine-tuning with QLoRA on a single GPU), or build internal ML platforms for an organization. At that point, storage I/O becomes a bottleneck you'll feel directly. For the work this curriculum teaches (building applications that call inference APIs, fine-tuning small models with PEFT, and running local models with Ollama), your laptop's SSD is fine.
Further reading: VAST Data: Why AI Storage Matters gives a good overview of the infrastructure concerns for anyone curious about what this discipline involves.
Cross-references
- Common Category Mistakes — conceptual confusions within the curriculum itself
- Glossary — term definitions used throughout the curriculum
- Fine-Tuning Landscape — the full technique comparison table