Why Prompting is an Engineering Skill
Most people treat prompts like Google search queries — throw some words in, hope for the best, tweak randomly when it doesn't work.
That's not how good AI developers think.
A prompt is an instruction set. You are programming the model using natural language. And just like code, the way you write it largely determines what you get back.
Bad prompt → unpredictable output → broken app → frustrated users.
Good prompt → consistent, structured output → reliable app → happy users.
By the end of this module you'll understand the exact structure of how messages reach the model, how to write prompts that actually work, and patterns you'll reuse in every AI application you build.
The Three Roles — How the Model Sees a Conversation
When you call any LLM API, the conversation is not just raw text. It's structured into roles. Every piece of text is tagged with who sent it.
There are three roles:
┌─────────────────────────────────────────────────┐
│ SYSTEM │
│ Instructions for how the model should behave. │
│ Set by YOU, the developer. User never sees it. │
└─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│ USER │
│ The message from the human in the conversation.│
│ This is what the user types. │
└─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│ ASSISTANT │
│ The model's response. │
│ Also called "completion" or "assistant turn." │
└─────────────────────────────────────────────────┘
Every API call you make — whether it's a simple chatbot or a complex RAG pipeline — is built from combinations of these three roles.
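To make the roles concrete, here is a rough sketch of a conversation as plain data. It assumes the Anthropic-style shape used later in this module (a separate system field plus a messages array); the rolesIn helper and the sample text are hypothetical and only there for illustration.

```javascript
// A conversation as structured data: every piece of text carries a role.
const conversation = {
  system: "You are a concise assistant.", // set by the developer
  messages: [
    { role: "user", content: "What is an API?" }, // typed by the human
    { role: "assistant", content: "A contract for how programs talk to each other." }, // produced by the model
    { role: "user", content: "Give me an example." }
  ]
};

// Hypothetical helper: list the roles in the order the model reads them.
function rolesIn(convo) {
  const roles = convo.system ? ["system"] : [];
  return roles.concat(convo.messages.map((m) => m.role));
}

// Logs the four roles in order: system, user, assistant, user
console.log(rolesIn(conversation));
```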
Part 1 — The System Prompt
What it is
The system prompt is your developer instruction layer. It is placed ahead of the conversation, so the model reads it before any user message. It tells the model:
- Who it is
- What it should and shouldn't do
- What tone to use
- What format to respond in
- What domain it operates in
- What to do in edge cases
The user never sees the system prompt. But it shapes every single response the model gives.
Think of it like this — before an employee starts a customer support call, their manager briefs them:
"You are a support agent for TechCorp.
Always be polite. Never discuss pricing.
If you don't know something, say 'I'll
check on that for you.' Keep answers short."
The customer calling in has no idea this briefing happened. But every answer the agent gives is shaped by it.
That briefing is your system prompt.
What a Weak System Prompt Looks Like
SYSTEM:
You are a helpful assistant.
This is what most beginners write. It's almost useless.
"Helpful" is vague. "Assistant" is vague. The model will guess what you want — and guessing means inconsistency.
What a Strong System Prompt Looks Like
Here's a system prompt for a customer support bot for a software product:
SYSTEM:
You are a customer support specialist for DevTool Pro,
a developer productivity SaaS application.
Your behavior rules:
- Answer ONLY questions related to DevTool Pro features,
bugs, billing, and account management
- If a question is unrelated to DevTool Pro, politely
decline and redirect: "I can only help with DevTool
Pro related questions."
- Never speculate about features that don't exist
- If unsure, say: "I don't have that information right
now — let me connect you with our team."
- Always respond in plain English, no jargon
- Keep responses under 150 words unless the question
genuinely requires more detail
Response format:
- Start with a direct answer to the question
- Add explanation if needed
- End with one follow-up offer if relevant
Tone: Professional but warm. Never robotic.
This system prompt does five things well:
1. Defines identity → who the model IS
2. Sets boundaries → what it will and won't do
3. Handles edge cases → what to do when it doesn't know
4. Controls format → how the output is structured
5. Sets tone → how it sounds
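If you end up writing several prompts like this, one option is to assemble them from named parts so none of the five pieces gets forgotten. This is only a sketch: buildSystemPrompt and its field names are hypothetical, not part of any SDK.

```javascript
// Hypothetical helper: assemble a system prompt from the five parts listed above.
function buildSystemPrompt({ identity, rules, fallback, format, tone }) {
  return [
    identity,                              // 1. identity
    "Your behavior rules:",
    ...rules.map((rule) => `- ${rule}`),   // 2. boundaries
    `- If unsure, say: "${fallback}"`,     // 3. edge cases
    "Response format:",
    ...format.map((item) => `- ${item}`),  // 4. format
    `Tone: ${tone}`                        // 5. tone
  ].join("\n");
}

const systemPrompt = buildSystemPrompt({
  identity: "You are a customer support specialist for DevTool Pro.",
  rules: [
    "Answer ONLY questions related to DevTool Pro",
    "Never speculate about features that don't exist"
  ],
  fallback: "I don't have that information right now.",
  format: ["Start with a direct answer", "Add explanation if needed"],
  tone: "Professional but warm."
});

console.log(systemPrompt);
```

The benefit is mostly organizational: a missing field shows up as an error in your code instead of as a quietly weaker prompt.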
System Prompt in Code
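const response = await fetch("https://api.anthropic.com/v1/messages", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    // The API rejects requests that lack these two headers.
    // Reading the key from an environment variable is an assumption here;
    // keep it on the server, never in browser code.
    "x-api-key": process.env.ANTHROPIC_API_KEY,
    "anthropic-version": "2023-06-01",
  },
  body: JSON.stringify({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    // The system prompt is its own top-level field
    system: `You are a customer support specialist for DevTool Pro.
Answer only questions related to DevTool Pro.
If unsure, say you'll connect them with the team.
Keep responses under 150 words.
Tone: Professional but warm.`,
    messages: [
      { role: "user", content: "How do I reset my password?" }
    ]
  })
});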
Notice: in the Anthropic API, system is a separate top-level field, not part of the messages array. In OpenAI's Chat Completions API, it's a message with role "system" inside the array. Different APIs, same concept.
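Side by side, the two request shapes could be sketched like this. The helper names and the "your-model-name" placeholder are hypothetical, and only the fields relevant to the system prompt are shown, so treat this as an illustration of the difference rather than a complete request for either API.

```javascript
const system = "You are a customer support specialist for DevTool Pro.";
const messages = [{ role: "user", content: "How do I reset my password?" }];

// Anthropic Messages API style: the system prompt is a top-level field.
function toAnthropicBody(model, system, messages) {
  return { model, max_tokens: 1024, system, messages };
}

// OpenAI Chat Completions style: the system prompt is the first message.
function toOpenAIBody(model, system, messages) {
  return { model, messages: [{ role: "system", content: system }, ...messages] };
}

console.log(toAnthropicBody("your-model-name", system, messages));
console.log(toOpenAIBody("your-model-name", system, messages));
```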
Part 2 — The User Prompt
What it is
The user prompt is the actual message from the human. In a chat application, this is what the user types. In a backend pipeline (like RAG), this might be constructed programmatically.
Simple case — user just types naturally:
USER:
How do I reset my password?
But as a developer, you'll often construct the user prompt yourself — adding context, formatting, injected data — before it reaches the model.
Constructed User Prompts
In real applications, what looks like a "user message" is often built by your code. This is normal and powerful.
Example — a document summarizer:
const userDocument = "...500 words of content from uploaded PDF...";
const userQuestion = "What are the key action items?";

const constructedUserPrompt = `
Here is the document the user uploaded:

<document>
${userDocument}
</document>

User's question: ${userQuestion}

Please answer based only on the document above.
`;

// This constructed prompt is sent as the user message
The actual user only typed "What are the key action items?" — but your code wrapped it with the document content and clear instructions before sending.
This is a pattern you'll use constantly in RAG applications.
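In a RAG setup you usually have several retrieved passages rather than one document. A minimal sketch of the same wrapping idea, assuming the passages arrive as an array of strings; buildRagPrompt and the index attribute are hypothetical choices, not a required format.

```javascript
// Hypothetical helper: wrap retrieved passages and the user's question
// into a single user message.
function buildRagPrompt(chunks, question) {
  const context = chunks
    .map((chunk, i) => `<document index="${i + 1}">\n${chunk}\n</document>`)
    .join("\n");

  return [
    "Here are the retrieved passages:",
    context,
    `User's question: ${question}`,
    "Please answer based only on the passages above."
  ].join("\n\n");
}

const ragPrompt = buildRagPrompt(
  ["Refunds are processed within 5 business days.", "Refunds go to the original payment method."],
  "How long do refunds take?"
);
console.log(ragPrompt);
```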
Prompt Engineering Techniques
Here are the core techniques that actually work — not magic phrases, but structural patterns:
Technique 1 — Be Specific, Not Vague
❌ Vague:
"Write something about climate change"
✅ Specific:
"Write a 3-paragraph summary of the causes of
climate change, written for a high school student
with no science background. Use simple language
and one real-world example per paragraph."
Specificity removes guessing. Less guessing = more consistent output.
Technique 2 — Specify the Output Format
❌ No format specified:
"List the pros and cons of React vs Vue"
✅ Format specified:
"Compare React and Vue. Return your response as
a JSON object with this exact structure:
{
"react": {
"pros": ["...", "..."],
"cons": ["...", "..."]
},
"vue": {
"pros": ["...", "..."],
"cons": ["...", "..."]
}
}
Return only the JSON. No explanation before or after."
When you specify the format precisely, your code can parse the output far more reliably. This is critical for building real applications, though you should still validate the parsed result before using it.
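On the receiving side, it helps to parse defensively, since a model can still return something that doesn't match. A small sketch for the React/Vue structure above; parseComparison is a hypothetical helper, and returning null on failure is just one possible convention.

```javascript
// Parse the model's text and check it has the requested shape.
// Returns the parsed object, or null if anything is off.
function parseComparison(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch {
    return null; // not valid JSON (for example, the model added a preamble)
  }
  if (data === null || typeof data !== "object") return null;

  for (const key of ["react", "vue"]) {
    const entry = data[key];
    if (!entry || !Array.isArray(entry.pros) || !Array.isArray(entry.cons)) {
      return null; // missing framework or missing pros/cons arrays
    }
  }
  return data;
}

const good = '{"react":{"pros":["Large ecosystem"],"cons":["More boilerplate"]},"vue":{"pros":["Gentle learning curve"],"cons":["Smaller ecosystem"]}}';
console.log(parseComparison(good) !== null);          // true
console.log(parseComparison("Sure! Here's the JSON")); // null
```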
Technique 3 — Give Examples (Few-Shot Prompting)
This is one of the most powerful techniques. Show the model exactly what you want by example:
Classify customer messages as: BILLING, TECHNICAL, or GENERAL
Examples:
Message: "My invoice shows the wrong amount"
Category: BILLING
Message: "The app crashes when I click export"
Category: TECHNICAL
Message: "What are your business hours?"
Category: GENERAL
Now classify this message:
Message: "I was charged twice this month"
Category:
The model has seen the pattern three times, so there is little room left for ambiguity about the labels or the output format.
This is called few-shot prompting — giving a few examples before the actual task.
Zero-shot = no examples, just instruction.
One-shot = exactly one example.
Few-shot = a few examples before the task.
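In an application the examples usually live in data, not in a hand-typed string. Here is one way that could look, assuming examples are stored as { message, category } objects; buildFewShotPrompt is a hypothetical helper. Passing an empty array gives you the zero-shot version of the same prompt.

```javascript
// Hypothetical helper: build a classification prompt from stored examples.
function buildFewShotPrompt(instruction, examples, message) {
  const lines = [instruction];
  if (examples.length > 0) {
    lines.push("Examples:");
    for (const ex of examples) {
      lines.push(`Message: "${ex.message}"`, `Category: ${ex.category}`);
    }
  }
  // End on an open "Category:" so the model's next words are the label.
  lines.push("Now classify this message:", `Message: "${message}"`, "Category:");
  return lines.join("\n");
}

const examples = [
  { message: "My invoice shows the wrong amount", category: "BILLING" },
  { message: "The app crashes when I click export", category: "TECHNICAL" },
  { message: "What are your business hours?", category: "GENERAL" }
];

console.log(
  buildFewShotPrompt(
    "Classify customer messages as: BILLING, TECHNICAL, or GENERAL",
    examples,
    "I was charged twice this month"
  )
);
```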
Technique 4 — Chain of Thought
For complex reasoning tasks, ask the model to think step by step before giving the answer:
❌ Direct answer (often wrong on complex problems):
"What is 15% of 847?"
✅ Chain of thought:
"What is 15% of 847? Think step by step before
giving the final answer."
Model output:
"Step 1: 10% of 847 = 84.7
Step 2: 5% of 847 = 84.7 / 2 = 42.35
Step 3: 15% = 84.7 + 42.35 = 127.05
Answer: 127.05"
Making the model reason explicitly before answering often improves accuracy on math, logic, and multi-step problems.
Reasoning models such as OpenAI's "o1" build on the same idea: they are trained to work through a chain of thought before answering, rather than generating the answer immediately.
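Chain-of-thought output is longer, so your code needs a way to pull out just the final result. A sketch, assuming the model ends with an "Answer: <number>" line as in the example above; extractFinalAnswer is a hypothetical helper, and the arithmetic check at the end only works here because this particular answer is easy to recompute.

```javascript
// Hypothetical helper: return the number after the last "Answer:" label,
// or null if the model did not follow the format.
function extractFinalAnswer(text) {
  const matches = [...text.matchAll(/Answer:\s*(-?\d+(?:\.\d+)?)/g)];
  if (matches.length === 0) return null;
  return Number(matches[matches.length - 1][1]);
}

const modelOutput = [
  "Step 1: 10% of 847 = 84.7",
  "Step 2: 5% of 847 = 84.7 / 2 = 42.35",
  "Step 3: 15% = 84.7 + 42.35 = 127.05",
  "Answer: 127.05"
].join("\n");

const answer = extractFinalAnswer(modelOutput);

// Recompute independently; compare with a tolerance because of floating point.
const expected = 0.15 * 847;
console.log(answer, Math.abs(answer - expected) < 1e-9); // 127.05 true
```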
Technique 5 — Constrain What the Model Can and Cannot Do
You are a data extraction assistant.
Rules:
- Extract ONLY information that is explicitly stated
in the document
- If information is not in the document, return null
for that field
- NEVER infer or guess missing information
- NEVER add information from your own knowledge
This is critical — return null rather than guessing.
Explicit constraints reduce hallucination, though they don't eliminate it. In production apps this is not optional: constrain the model's behavior, and validate its output in code as well.
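Prompt rules can be backed up with a check in code. The sketch below uses a deliberately crude safeguard, a verbatim substring match, which is a different technique from the prompt constraints themselves: any extracted string that doesn't appear in the document is replaced with null. The function name and sample data are hypothetical, and a real pipeline would likely need something more forgiving (dates and amounts are often reformatted).

```javascript
// Hypothetical post-check: keep a field only if its value is a string
// that appears verbatim (case-insensitively) in the source document.
function nullUngroundedFields(documentText, extracted) {
  const haystack = documentText.toLowerCase();
  const result = {};
  for (const [field, value] of Object.entries(extracted)) {
    const grounded =
      typeof value === "string" && haystack.includes(value.toLowerCase());
    result[field] = grounded ? value : null;
  }
  return result;
}

const doc = "Invoice 1042 was issued to Acme Corp on 3 March.";
const extracted = {
  customer: "Acme Corp",
  invoiceNumber: "1042",
  email: "billing@acme.com", // not in the document, so it should be dropped
  total: null
};

console.log(nullUngroundedFields(doc, extracted));
```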
Part 3 — The Completion (Assistant Response)
What it is
The completion is the model's response — everything it generates back. In the API it's tagged with role "assistant."
Simple example:
USER: "What is 2 + 2?"
ASSISTANT: "4"
The completion is "4."
Why You Sometimes Write the Assistant Turn Yourself
Here's something that surprises developers — you can pre-fill the assistant's response. You write the beginning of the assistant's answer, and the model continues from there.
This is called assistant prefilling and it's a powerful technique:
messages: [
{
role: "user",
content: "Give me the user data as JSON"
},
{
role: "assistant",
content: "{" // ← you start the JSON, model continues
}
]
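By starting with {, you strongly steer the model toward JSON: it can't open with an explanation or preamble, because it has to continue from {. Three caveats: not every API or model accepts a prefilled assistant turn, so check your provider's documentation; the response typically contains only the continuation, so prepend the { yourself before parsing; and the result is still not guaranteed to be valid JSON, so parse it defensively.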
This is useful when you need:
- Pure JSON output with no surrounding text
- Responses that start in a specific way
- Format enforcement without relying on instructions alone
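Handling a prefilled response takes one extra step on your side. The sketch below assumes the API returns only the continuation and not the text you prefilled (this is how the Anthropic Messages API behaves; other providers may differ or may not support prefilling at all). Both helper names are hypothetical.

```javascript
const PREFILL = "{";

// Build the messages array with the assistant turn already started.
function withPrefill(userContent, prefill) {
  return [
    { role: "user", content: userContent },
    { role: "assistant", content: prefill }
  ];
}

// Put the prefill back in front of the continuation before parsing.
// Returns null if the combined text is still not valid JSON.
function parsePrefilledJson(prefill, continuation) {
  try {
    return JSON.parse(prefill + continuation);
  } catch {
    return null;
  }
}

console.log(withPrefill("Give me the user data as JSON", PREFILL));

// Example continuation a model might return after the "{" prefill:
console.log(parsePrefilledJson(PREFILL, '"name": "Arjun", "plan": "pro"}'));
```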
Multi-Turn Conversations — How History is Structured
In a real conversation, messages alternate between user and assistant:
messages: [
{
role: "user",
content: "My name is Arjun"
},
{
role: "assistant",
content: "Nice to meet you, Arjun! How can I help you today?"
},
{
role: "user",
content: "What is my name?"
}
// Model will respond with "Arjun" because it can see the full history
]
Your application is responsible for maintaining this history array and sending it with every request. The model itself stores nothing between calls.
This is the exact reason why building a chatbot is more than just calling the API — you need to manage conversation state.
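A minimal sketch of that state management, assuming an Anthropic-style request body. The Conversation class and the "your-model-name" placeholder are hypothetical; a real app would also persist the history and trim it to fit the context window.

```javascript
// Hypothetical container for conversation state. The model stores nothing,
// so the full history is rebuilt into every request body.
class Conversation {
  constructor(system) {
    this.system = system;
    this.messages = [];
  }

  addUser(content) {
    this.messages.push({ role: "user", content });
  }

  addAssistant(content) {
    this.messages.push({ role: "assistant", content });
  }

  // Copy the history so later turns don't mutate a body that was already sent.
  toRequestBody(model) {
    return { model, max_tokens: 1024, system: this.system, messages: [...this.messages] };
  }
}

const chat = new Conversation("You are a friendly assistant.");
chat.addUser("My name is Arjun");
chat.addAssistant("Nice to meet you, Arjun! How can I help you today?");
chat.addUser("What is my name?");

// All three turns travel with the request, which is why the model can answer.
console.log(chat.toRequestBody("your-model-name").messages.length); // 3
```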
Putting It All Together — A Real Example
Here's a complete, production-style prompt structure for a code review assistant:
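One way such a structure could look is sketched below. Everything in it is an assumption made for illustration: the wording of the system prompt, the JSON shape with summary and issues, the helper names, the "your-model-name" placeholder, and the sample reply, which is hand-written and not real model output.

```javascript
// System prompt: defines the role and enforces a JSON-only response.
const reviewSystemPrompt = [
  "You are a senior code reviewer.",
  "Review the code the user provides and respond with ONLY a JSON object of this shape:",
  '{ "summary": "...", "issues": [{ "line": 1, "severity": "low|medium|high", "comment": "..." }] }',
  "Return only the JSON. No explanation before or after."
].join("\n");

// User prompt: injects the actual code and asks the question.
function buildReviewRequest(code, question) {
  return {
    model: "your-model-name", // placeholder
    max_tokens: 1024,
    system: reviewSystemPrompt,
    messages: [{ role: "user", content: `<code>\n${code}\n</code>\n\n${question}` }]
  };
}

// Output: parse the reply into something the rest of the app can use.
function parseReview(text) {
  try {
    const review = JSON.parse(text);
    return Array.isArray(review.issues) ? review : null;
  } catch {
    return null;
  }
}

const request = buildReviewRequest("function add(a, b) { return a - b; }", "Are there any bugs?");
console.log(request.messages[0].content);

// Hand-written sample reply, for illustration only.
const sampleReply = '{"summary":"One logic bug.","issues":[{"line":1,"severity":"high","comment":"add() subtracts instead of adding."}]}';
console.log(parseReview(sampleReply).issues.length); // 1
```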
Notice what's happening here:
System prompt → defines role + enforces JSON format
User prompt → injects the actual code + asks the question
Output → structured JSON your code can work with
This is the pattern for every production AI feature you'll build.
The Mental Model for Prompts
System Prompt = The employee's job description + rules
User Prompt = The customer's request
Completion = The employee's response
Your job as developer:
Write a job description clear enough that
the employee never has to guess what to do.
3-Line Summary
- Every LLM interaction has three roles — system (your instructions as developer), user (the human's message), and assistant (the model's response) — understanding these lets you control exactly what the model does.
- Strong system prompts define identity, boundaries, format, tone, and edge cases — vague prompts lead to inconsistent apps, specific prompts lead to reliable ones.
- Prompt engineering is a structural skill — specifying output format, using few-shot examples, adding chain-of-thought, and constraining behavior are the techniques that make AI applications actually work in production.
Module 1.4 — Complete ✅
Phase 1 is done. 🎉
You now understand:
- The full AI/ML/DL/LLM hierarchy
- How generative AI works and why hallucination happens
- Tokens, context windows, and temperature
- System prompts, user prompts, and how to engineer them properly
Coming Up — Phase 2: LLM Internals
Module 2.1 — What is NLP, Words vs Tokens, and Tokenization
We go inside the black box. You'll understand exactly how text becomes numbers, why tokenization works the way it does, and what's really happening before the Transformer even sees your input.