Prompt Engineering Fundamentals
You now understand what the model sees at inference time: a sequence of messages (system, user, assistant, tool) within a fixed token budget. This lesson teaches you how to structure that input deliberately.
Prompt engineering isn't a collection of tricks. It's the practice of designing clear, testable contracts between you and the model. A well-engineered prompt defines what the model should do, what evidence it should use, what format the output should take, and how it should handle edge cases. When the output is wrong, a well-engineered prompt tells you where to look.
What you'll learn
- Design prompt contracts with explicit expectations for behavior, format, and failure handling
- Use few-shot examples to steer model behavior through demonstration rather than instruction
- Decompose complex tasks into prompt sequences that are each testable independently
- Debug prompt failures by isolating prompt issues from context issues and model limitations
- Explain what context engineering is and why it matters more than any single prompting technique
Concepts
Prompt contract: the set of explicit expectations your prompt establishes: what the model should do, what input it is given, what format the output should take, and what it should do when the input is incomplete or ambiguous. A prompt without a clear contract is untestable. You cannot tell whether the model followed it or not.
Few-shot examples: examples of input/output pairs included in the prompt to demonstrate the expected behavior. Few-shot examples are often more effective than lengthy instructions because they show the model what you want rather than telling it. They also make the prompt contract concrete and verifiable.
Chain-of-thought: a prompting technique that asks the model to show its reasoning steps before giving a final answer. This can improve accuracy on multi-step problems by encouraging the model to decompose its reasoning. It is one technique among several, useful when reasoning is complex, but not always necessary and not free (it costs tokens and latency).
Prompt decomposition: breaking a complex task into smaller, independently testable prompt steps. Instead of one prompt that does retrieval + analysis + formatting + citation, use a sequence of focused prompts where each has a clear contract and verifiable output. This makes debugging easier: when something breaks, you can identify which step failed.
Context engineering: the discipline of selecting, packaging, and budgeting the information a model sees at inference time. Prompts, retrieved evidence, tool results, memory, and conversation history are all parts of context. Context engineering is arguably the core skill of AI engineering. A bigger context window does not substitute for better context selection. More tokens can actually degrade quality if the context is noisy, stale, or contradictory.
Context rot: degradation of output quality caused by accumulated, stale, or conflicting context. Symptoms include: the model ignores recent instructions in favor of earlier ones, retrieved evidence contradicts itself, conversation history introduces noise, or memory entries are outdated. Context rot is the natural consequence of not actively managing what goes into the context window. You will encounter it again in retrieval (Module 4), context compilation (Module 4), and memory (Module 7).
Walkthrough
Setup: prompt lab
Use the same provider you chose in Choosing a Provider or in the previous lesson. If you already configured more than one provider, keep them. Cross-provider comparison is a feature, not a mistake.
Continue in the llm-experiments/ directory you created in the previous lesson. If you are starting fresh, recreate the minimal setup now and create prompt_lab.py, the script you will use throughout this lesson to test prompt variants side by side.
The contract experiments themselves do not change across providers. Only the client initialization, model name, and response parsing differ. Pick your provider tab below and use that version for the rest of the lesson.
mkdir llm-experiments && cd llm-experiments
python -m venv .venv && source .venv/bin/activate
pip install openai
export OPENAI_API_KEY="sk-..."

# prompt_lab.py
from openai import OpenAI
client = OpenAI()
def run_prompt(name, system, user, temperature=0):
    """Run a prompt and print the result with a label."""
    print(f"\n{'='*60}")
    print(f"Variant: {name}")
    print(f"{'='*60}")
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        temperature=temperature,
    )
    print(response.choices[0].message.content)
    print(f"Tokens: {response.usage.total_tokens}")
    return response.choices[0].message.content

You will add prompt variants to this file as you work through the sections below. To verify the setup works, add a quick test call at the bottom:
# --- Verify setup ---
if __name__ == "__main__":
    run_prompt(
        "setup test",
        system="You are a helpful assistant.",
        user="Say 'Prompt lab is working.' and nothing else.",
    )

python prompt_lab.py

Expected:
============================================================
Variant: setup test
============================================================
Prompt lab is working.
Tokens: ~30
Once you see output, delete the setup test. You will replace it with real experiments below.
Prompt contracts, not prompt tricks
A prompt contract has four parts:
- Role and task: what the model is and what it should do. Be specific here. "You are a code reviewer" is weaker than "You are a code reviewer that identifies security vulnerabilities in Python web applications. You only flag issues you can explain with a specific code reference."
- Input specification: what the model is being given. Name the inputs explicitly: "You will receive a Python function and a list of known CVE patterns." The model cannot infer what you intended to provide.
- Output specification: the exact format of the expected response. Use structured output schemas (covered in Build with APIs) whenever possible. If the output is natural language, specify structure: "Respond with a list of findings. Each finding has: file, line, vulnerability type, explanation, severity."
- Edge case behavior: what the model should do when the input is ambiguous, incomplete, or outside scope. "If no vulnerabilities are found, return an empty list. Do not invent issues."
A prompt without an output specification is untestable. A prompt without edge case behavior will surprise you in production.
Try it now. Add these three variants to your prompt_lab.py and run them:
# --- Add to prompt_lab.py ---
CODE_SAMPLE = """
def get_user(id):
    return db.execute(f"SELECT * FROM users WHERE id = {id}")
"""
# Variant A: vague contract
run_prompt(
    "A: vague contract",
    system="You are a code reviewer.",
    user=f"Review this code:\n{CODE_SAMPLE}",
)

# Variant B: specific contract, no output format
run_prompt(
    "B: specific role, no output format",
    system="You are a code reviewer that identifies security vulnerabilities in Python web applications.",
    user=f"Review this code:\n{CODE_SAMPLE}",
)

# Variant C: full contract
run_prompt(
    "C: full contract",
    system=(
        "You are a code reviewer that identifies security vulnerabilities "
        "in Python web applications. You only flag issues you can explain "
        "with a specific code reference.\n\n"
        "For each finding, respond with:\n"
        "- line: the approximate line number\n"
        "- type: the vulnerability type\n"
        "- explanation: why it is a vulnerability\n"
        "- severity: low, medium, high, or critical\n\n"
        "If no vulnerabilities are found, respond with: No issues found."
    ),
    user=f"Review this code:\n{CODE_SAMPLE}",
)

python prompt_lab.py

Compare the three outputs:
- Variant A will give a generic, unfocused review. It might mention style, naming, and security in a jumble
- Variant B will focus on security but in an unpredictable format
- Variant C will produce a structured finding with the exact fields you specified
This is what a prompt contract does: it turns a vague request into a testable, verifiable output.
Few-shot examples as contracts
Instead of writing long instructions, show the model what you want:
Given this function:

def get_user(id):
    return db.execute(f"SELECT * FROM users WHERE id = {id}")

Your findings:
- file: example.py, line: 2, type: SQL injection, explanation: f-string interpolation of user input into SQL query, severity: high

Given this function:

def health():
    return {"status": "ok"}

Your findings:
[]
Two examples establish the contract more reliably than a paragraph of instructions. Include at least one positive example (there is something to find) and one negative example (there is nothing to find, and the correct answer is empty).
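One way to keep few-shot examples maintainable is to store them as data and assemble the system prompt from them. This is a sketch, not part of any API: `FEW_SHOT_EXAMPLES` and `build_system` are illustrative names.

```python
# Hypothetical helper: store few-shot pairs as data, then assemble the
# system prompt from them so examples are easy to add, remove, and review.
FEW_SHOT_EXAMPLES = [
    (
        'def get_user(id):\n    return db.execute(f"SELECT * FROM users WHERE id = {id}")',
        "- file: example.py, line: 2, type: SQL injection, "
        "explanation: f-string interpolation of user input into SQL query, severity: high",
    ),
    (
        'def health():\n    return {"status": "ok"}',
        "[]",  # negative example: the correct answer is empty
    ),
]

def build_system(base_instructions, examples):
    """Append formatted input/output demonstrations to the base contract."""
    parts = [base_instructions]
    for code, findings in examples:
        parts.append(f"Given this function:\n{code}\nYour findings:\n{findings}")
    return "\n\n".join(parts)

system = build_system(
    "You are a code reviewer that identifies security vulnerabilities "
    "in Python web applications.",
    FEW_SHOT_EXAMPLES,
)
```

Pass the assembled `system` string to `run_prompt` from your prompt lab; because the examples live in one list, adding a third case is a one-line change rather than a prompt rewrite.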
Decomposition over complexity
When a task requires retrieval, analysis, and formatting, do not write one giant prompt. Break it into steps:
- Retrieve: select the relevant evidence (a separate prompt or retrieval call)
- Analyze: given the evidence, answer the question (its own prompt with a clear contract)
- Format: structure the answer for the consumer (its own prompt or just a schema)
Each step has a clear input, a clear output, and can be tested independently. When the final output is wrong, you can check each step's output to find where the failure occurred.
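The shape of such a pipeline can be sketched in a few lines. This is a toy illustration: `call_model` is a stub standing in for a real `run_prompt`-style call, and the word-overlap retrieval is a placeholder for a real retrieval step (covered in Module 4).

```python
def call_model(system, user):
    # Stub: in the lab, replace this with run_prompt(...) from prompt_lab.py.
    return f"[model output for: {user[:40]}...]"

def retrieve(question, documents):
    """Step 1: select evidence. Toy version: keep docs sharing a word with the question."""
    terms = set(question.lower().split())
    return [d for d in documents if terms & set(d.lower().split())]

def analyze(question, evidence):
    """Step 2: answer the question from the evidence only."""
    system = ("Answer using only the evidence provided. "
              "If the evidence is insufficient, say so.")
    user = f"Question: {question}\nEvidence:\n" + "\n".join(evidence)
    return call_model(system, user)

def format_answer(raw_answer):
    """Step 3: structure the answer for the consumer."""
    return {"answer": raw_answer.strip()}

docs = ["The deploy script runs on merge.", "Lunch is at noon."]
evidence = retrieve("When does the deploy script run?", docs)
result = format_answer(analyze("When does the deploy script run?", evidence))
```

Because each step returns a plain value, you can print and inspect `evidence` before `analyze` ever runs — exactly the property that makes decomposed pipelines debuggable.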
Reasoning scaffolds
Sometimes the model needs to reason through a problem before producing an answer. Chain-of-thought is one approach: ask the model to think step by step before giving a conclusion. But it is not the only approach, and it is not always the best one.
Use reasoning scaffolds when:
- The task involves multiple logical steps
- The model frequently produces incorrect answers without reasoning
- You need to audit the model's reasoning for correctness
Skip reasoning scaffolds when:
- The task is simple extraction or formatting
- You are optimizing for speed and cost
- The model already produces correct answers without them
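A lightweight way to compare the two modes is a wrapper that appends the reasoning instruction, so the same contract can run with and without the scaffold. A sketch, with `with_cot` as an illustrative name:

```python
def with_cot(system):
    """Append a chain-of-thought scaffold to an existing system prompt."""
    return (
        system
        + "\n\nBefore answering, reason step by step. "
        "Then give your final answer on its own line, prefixed with 'Answer:'."
    )

# In the lab, run the same input through both variants and compare
# accuracy, token usage, and latency, e.g.:
# run_prompt("direct", system=CONTRACT, user=QUESTION)
# run_prompt("scaffolded", system=with_cot(CONTRACT), user=QUESTION)
```

Keeping the scaffold as a wrapper (rather than baking it into every prompt) makes it easy to measure whether it actually earns its token cost for your task.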
Context engineering in practice
Every prompt exists within a context budget. You are always making tradeoffs:
- More few-shot examples improve reliability but consume tokens
- Longer system prompts are more precise but leave less room for retrieved evidence
- Conversation history provides continuity but accumulates noise over time
Start with the smallest context that produces correct output. Add context only when you can measure that it improves results. This is context engineering: deliberate selection and budgeting, not "put everything in the prompt."
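You can make these tradeoffs visible with a simple per-piece budget check. This is a rough sketch: the ~4-characters-per-token heuristic is a common approximation for English text, and exact counts need a real tokenizer (for OpenAI models, the tiktoken library).

```python
def estimate_tokens(text):
    """Rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_budget(system, few_shot_examples, user, budget=4000):
    """Break the prompt into pieces so you can see what is eating the budget."""
    pieces = {
        "system": estimate_tokens(system),
        "few_shot": sum(estimate_tokens(e) for e in few_shot_examples),
        "user": estimate_tokens(user),
    }
    return sum(pieces.values()) <= budget, pieces
```

Logging the `pieces` breakdown for each request tells you which part to trim first when a prompt outgrows its budget — usually the few-shot examples or accumulated history, not the instructions.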
A note for later: stable prompt structure (consistent system prompts, consistent few-shot formatting) improves cacheability. Prompt caching (covered in Module 6) can significantly reduce cost and latency, but only if your prompts have stable prefixes. Design for stability now; measure the benefit later.
Debugging prompts
When the model produces wrong output, diagnose before changing anything:
- Is it a prompt issue? Are the instructions ambiguous? Is the contract unclear? Is the output format underspecified? Test: simplify the prompt to the minimum and check if the problem persists.
- Is it a context issue? Is the model seeing the wrong evidence, too much evidence, or stale evidence? Test: manually inspect what is in the context window. Is the answer supported by the evidence provided?
- Is it a model limitation? Is the task genuinely beyond the model's capability? Test: try a larger or more capable model. If it works, the issue is model capability, not your prompt.
This diagnostic (introduced in the previous lesson) becomes a daily habit. Resist the urge to tweak the prompt without diagnosing first. Most prompt changes are guesses, and guesses compound into unmaintainable prompt spaghetti.
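One way to make the diagnosis systematic is to run the same input through a few controlled variants and compare. A sketch: `diagnose` is an illustrative helper, and `run` stands in for a `run_prompt`-style callable (stubbed here so the shape is visible).

```python
def diagnose(run, system, user, evidence):
    """Run controlled variants of one request; comparing them localizes the failure."""
    with_evidence = f"{user}\n\nEvidence:\n{evidence}"
    return {
        # Baseline: the configuration that is misbehaving.
        "full": run(system, with_evidence),
        # Better without evidence? Suspect a context issue.
        "no_evidence": run(system, user),
        # Better with a minimal prompt? Suspect a prompt issue.
        "minimal_prompt": run("You are a helpful assistant.", with_evidence),
    }

# Stub runner for illustration; in the lab, wrap run_prompt instead.
results = diagnose(
    lambda s, u: f"({len(s)} system chars, {len(u)} user chars)",
    "full contract...", "Question?", "Evidence text.",
)
```

If all three variants fail the same way, the remaining suspect is model capability, and the next test is a more capable model.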
Exercises
- Write a prompt contract for a task of your choice (code review, bug triage, document summarization, or meeting notes extraction). Include all four parts: role/task, input specification, output specification, and edge case behavior.
- Add two few-shot examples to your prompt: one positive case and one negative case. Test whether the model follows the examples more reliably than the instructions alone.
- Take a complex task and decompose it into 2-3 prompt steps. Run each step independently and verify the output at each stage.
- Deliberately break your prompt by providing contradictory instructions or noisy context. Diagnose the failure: is it a prompt issue, a context issue, or a model limitation?
- Optional: rerun one of your prompt experiments on a second provider. Keep the prompt contract identical and note what changed in client setup, model naming, response parsing, and token/usage reporting.
Completion checkpoint
You can:
- Show a prompt contract with all four parts (role, input, output, edge cases)
- Show how the same prompt contract ports cleanly between at least two provider surfaces
- Show few-shot examples that make the contract concrete
- Decompose a multi-step task into independently testable prompt steps
- Diagnose a prompt failure by isolating prompt vs context vs model issues
- Explain what context engineering is and why "more context" is not always better
- Explain which parts of a prompt experiment stay stable across OpenAI, Gemini, Anthropic, Hugging Face, and Ollama, and which parts are provider-specific plumbing
Connecting to the project
The prompt contracts and decomposition patterns you practiced here are standalone exercises. In the next lesson, you'll apply them directly to the FastAPI project you started in lesson 1. Your summarizer, extraction, and tool-calling endpoints will all use prompt contracts you design.
Beyond Module 1, prompt engineering isn't a one-time skill. You'll write prompt contracts for:
- Retrieval graders in Module 4
- Eval rubrics in Module 6
- Specialist agents in Module 7
- Distillation teachers in Module 8
Context engineering and context rot will resurface every time we decide what evidence to include in the model's context.
What's next
Building with APIs. Prompt contracts only become engineering when they live in code, so the next lesson turns them into requests, structured outputs, sessions, and tool calls.
References
Start here
- Anthropic: Prompt engineering guide — the best single resource on structuring prompts for production use
Build with this
- OpenAI: Prompt engineering guide — practical techniques with examples, covers few-shot, chain-of-thought, and structured outputs
- OpenAI: Structured outputs — enforce output schemas so your prompt contracts are machine-verifiable
- Gemini text generation guide — system instructions and request config on the direct Gemini API
- Gemini structured output — JSON schema support and typed parsing on Gemini
- Hugging Face: Prompt engineering — official guidance on prompt design in the open-model ecosystem
- Hugging Face: Structured outputs with LLMs — schema-constrained outputs on Hugging Face-hosted models
- Ollama: Structured outputs — Ollama's guide to schema-constrained outputs
Deep dive
- Ollama Modelfile reference — system prompts, templates, and model behavior customization for Ollama
- Brex: Prompt engineering guide — open-source examples of production prompt patterns from a real engineering team