Why This Learning Path Is Ordered This Way

Everyone's journey is different, but I've found that most of the available content is either taught in isolation (e.g. build an agent, add RAG, fine-tune your model), targets a specific use case without stating what that use case is or what problem it solves, or assumes you already have many of the building blocks. The OpenAI and Anthropic docs will expose you to plenty of useful concepts and guides, but they can feel like an ocean of data without a destination.

The order of topics in this learning path is itself an architectural decision. If you're new to AI Engineering, you may not even know what questions to ask or how to approach the topics. For example, where is the line between machine learning and AI engineering? The answer is obvious to experienced engineers, but less so to someone who isn't already working in the space. In short, this is the learning path I wish I had when I started learning about AI Engineering.

This page explains the reasoning behind the structure so you can follow the sequence with confidence, or make an informed decision to deviate.

The ordering problem

A serious AI Engineering path has to balance five competing goals:

Learn the fundamentals instead of cargo-culting tools
Feel the pain that motivates better architectures
Avoid decision fatigue
Avoid vendor lock-in
Still arrive at a system that resembles production reality

Those goals pull you in different directions. The sequence outlined below is designed to satisfy all five without settling for surface-level interfaces.

Three ways curricula typically go wrong (IMHO)

In my experience, AI engineering curricula usually miss the mark in one of three ways.

Survey-first learning. You learn agents, then RAG, then evals, then fine-tuning as independent modules. This creates breadth, but the pieces don't connect when you try to build something real.

Magic-path learning. You follow a "best stack" recipe without first feeling the problem each tool solves. This creates compliance, not judgment.

Topic-without-system learning. You dive deep into an advanced topic, such as multi-agent orchestration, fine-tuning, or knowledge graphs, without being anchored in a system where you can see the consequences. For example, a standalone multi-agent tutorial will have you building agent loops, but without a benchmark or evals in place, you'll have no way to tell if the agents are actually producing better results or just adding complexity. You end up iterating in circles because you never defined what "good" looks like. The topic makes sense in isolation, but without the surrounding system, you can't measure whether it's helping.

Contrastive, project-layered learning

This curriculum is built around a fourth option:

Build one serious system. At each layer, start with the cheapest plausible baseline, observe the failure, write down what broke, and only then adopt the next tool.

That gives you the benefits of sequence without losing the "why." You'll never wonder "why am I learning this?" because you'll have just hit the wall that makes it obvious.

This is fundamentally a first-principles approach to AI engineering. Rather than teaching solutions and hoping you'll encounter the problems later, we'll surface the problem first, experience its consequences, and then introduce the tool that addresses it. That's why we support multiple paths (the principle transfers across all of them), why we start with failure modes before solutions (understanding why something fails teaches you more than knowing what to use), and why this curriculum is sequenced around observable pain rather than topic popularity.

The central rule

If you can't explain your current system from its logs, traces, and evals, you're not ready to add more complexity.

The sequence at a glance

Module	What you build	Why it belongs here
Foundation Sprint	Python, FastAPI, LLM mental models, API fluency, structured outputs, retrieval basics	Gives you the foundation so the main path never feels mystical
Benchmark and Harness	Benchmark questions and a run log	Defines "better" before infrastructure hides the signal
Agent and Tool Building	Raw tool-calling agent, then a framework-backed version, then MCP	Teaches the mechanics before abstractions
Code Retrieval	Naive baseline, AST-aware, graph/hybrid, context compilation	Lets you feel naive retrieval fail before adopting code-aware retrieval
RAG and Grounded Answers	RAG pipeline, evidence bundles, retrieval routing	Turns retrieval into grounded answer generation
Observability and Evals	Telemetry, cost/outcome tracking, harness, retrieval evals, tool/trace evals	Makes the system visible and accountable before optimization
Orchestration and Memory	Subagents, specialists, A2A, thread memory, long-term memory	Added only after the single-agent system is observable and measurable
Optimization	Optimization taxonomy, distillation, fine-tuning	Reserved for stable systems with eval coverage
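
To make the "Benchmark questions and a run log" row concrete, here is a minimal sketch of what that pairing could look like. Everything in it is an assumption for illustration: the names (BenchmarkQuestion, RunRecord, run_benchmark), the file paths in the questions, and the substring grading rule are hypothetical, not the format the curriculum prescribes. The point is only that "better" gets a definition you can rerun before any infrastructure exists.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class BenchmarkQuestion:
    qid: str
    question: str
    # Hypothetical grading rule: the answer must mention this string.
    expected_substring: str


@dataclass
class RunRecord:
    qid: str
    system: str
    answer: str
    passed: bool


def run_benchmark(
    system_name: str,
    answer_fn: Callable[[str], str],
    questions: list[BenchmarkQuestion],
) -> list[RunRecord]:
    """Ask every benchmark question and record one graded row per question."""
    records = []
    for q in questions:
        answer = answer_fn(q.question)
        passed = q.expected_substring.lower() in answer.lower()
        records.append(RunRecord(q.qid, system_name, answer, passed))
    return records


def to_jsonl(records: list[RunRecord]) -> str:
    """Serialize the run log as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(r)) for r in records)


# Made-up questions and paths, in the spirit of the code retrieval questions.
QUESTIONS = [
    BenchmarkQuestion("q1", "Where is parse_config defined?", "config/loader.py"),
    BenchmarkQuestion("q2", "What calls parse_config?", "main.py"),
]


def baseline_system(question: str) -> str:
    # Stand-in for a real system: it gives the same answer to every question.
    return "It is defined in config/loader.py."


records = run_benchmark("baseline-v0", baseline_system, QUESTIONS)
pass_rate = sum(r.passed for r in records) / len(records)

print(to_jsonl(records))
print(f"pass rate: {pass_rate:.2f}")
```

Substring matching is a crude grader, and that's fine for a first pass. A cheap baseline that fails visibly on "What calls parse_config?" tells you more than an elaborate setup you can't inspect.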

Why code indexing appears before "general RAG"

We anchor the journey to a single project (the "anchor project") so that each lesson builds on the last, the way it would in production. For the anchor project, the first retrieval problem isn't "chat with PDFs." Instead, we focus on questions like:

"Where is this symbol defined?"
"What calls this function?"
"What breaks if I change this interface?"
"Which files should I read to understand this subsystem?"

These are code retrieval questions. They fail in specific ways under naive chunk-and-embed retrieval: symbol boundaries get split, exact identifiers are missed, semantically related prose outranks executable code, and relationship questions have no structure to traverse.

So this curriculum doesn't skip naive retrieval. You build it first as a deliberate baseline, watch it fail, and only then upgrade the retrieval method. That way, we're never asking you to "just trust the advanced indexing stack."
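
The first of those failure modes, symbol boundaries getting split, is easy to see in a few lines. The sketch below is not the anchor project's indexer: it uses Python's standard ast module on a made-up two-function snippet, and the helper names (naive_chunks, ast_chunks, is_whole_unit) are hypothetical. It only contrasts fixed-size character windows with one chunk per top-level definition.

```python
import ast

# A made-up source file with two small functions.
SOURCE = '''def load(path):
    with open(path) as f:
        return f.read()


def parse(text):
    lines = text.splitlines()
    return [line.strip() for line in lines if line]
'''


def naive_chunks(source: str, size: int) -> list[str]:
    """Fixed-size character windows that ignore code structure."""
    return [source[i:i + size] for i in range(0, len(source), size)]


def ast_chunks(source: str) -> list[str]:
    """One chunk per top-level function or class, taken from the syntax tree."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                chunks.append(segment)
    return chunks


def is_whole_unit(chunk: str) -> bool:
    """True if the chunk parses as valid Python on its own."""
    try:
        ast.parse(chunk)
        return True
    except SyntaxError:
        return False


naive = naive_chunks(SOURCE, size=80)
aware = ast_chunks(SOURCE)

print("naive chunks that parse on their own:", [is_whole_unit(c) for c in naive])
print("AST chunks that parse on their own:  ", [is_whole_unit(c) for c in aware])
```

With an 80-character window, the first chunk ends partway through the signature of parse, so neither of the first two chunks contains that function whole. The AST-based version returns each function intact. Whether a chunk parses is only a rough proxy for retrieval quality, but it makes the boundary problem visible before you commit to a heavier indexing approach.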

Why evals come before orchestration

Multi-agent systems multiply ambiguity. If we can't measure a single-agent system's behavior, we definitely can't measure a multi-agent system's behavior. Evals and observability come first so that when you add subagents, you'll be able to tell whether they're actually helping.
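
Here is a rough sketch of what "being able to tell" could mean in practice, assuming you already have per-question pass/fail results from the same benchmark for both versions of the system. The function name compare_runs and the result values are made up for illustration. They are not measurements from any real system.

```python
def compare_runs(
    baseline: dict[str, bool], candidate: dict[str, bool]
) -> dict[str, list[str]]:
    """Bucket the questions both runs answered by how the outcome changed."""
    shared = sorted(baseline.keys() & candidate.keys())
    return {
        "fixed": [q for q in shared if not baseline[q] and candidate[q]],
        "regressed": [q for q in shared if baseline[q] and not candidate[q]],
        "unchanged": [q for q in shared if baseline[q] == candidate[q]],
    }


# Illustrative, made-up results: question id -> did the answer pass?
single_agent = {"q1": True, "q2": False, "q3": True, "q4": False}
with_subagents = {"q1": True, "q2": True, "q3": False, "q4": False}

report = compare_runs(single_agent, with_subagents)
net_change = len(report["fixed"]) - len(report["regressed"])

print(report)
print("net change in passing questions:", net_change)
```

In this invented example the subagent version fixes one question and breaks another, so the net change is zero. Without the per-question comparison, that would look like progress simply because the system got more elaborate.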

Why optimization comes last

Distillation and fine-tuning change the model itself. If retrieval is bad, you should fix retrieval, not fine-tune around it. If prompts are unclear, you should fix prompts, not train the model to tolerate them. Optimizing too early wastes time, because you end up training around problems that belong to other layers of the system. Optimization lives at the end here because it's the most permanent intervention and should only be applied to a system that's already working well enough to measure.

What's next

Continue to How to Use the Path. That page explains the teaching conventions you'll keep seeing: problem-to-tool maps, Retrieval Lab Notes, and the three-label tool framing.