Prompt Engineering

Design and optimize prompts for large language models (LLMs) to achieve reliable, high-quality outputs across diverse tasks.

When to Use

Use this skill when:

Building LLM-powered applications requiring consistent outputs
Model outputs are unreliable, inconsistent, or hallucinating
Need structured data (JSON) from natural language inputs
Implementing multi-step reasoning tasks (math, logic, analysis)
Creating AI agents that use tools and external APIs
Optimizing prompt costs or latency in production systems
Migrating prompts across different model providers
Establishing prompt versioning and testing workflows

Key Features

1. Prompting Technique Decision Framework

Choose the right technique based on task requirements:

Goal	Technique	Token Cost	Reliability	Use Case
Simple, well-defined task	Zero-Shot	Minimal	Medium	Translation, simple summarization
Specific format/style	Few-Shot	Medium	High	Classification, entity extraction
Complex reasoning	Chain-of-Thought	Higher	Very High	Math, logic, multi-hop QA
Structured data output	JSON Mode / Tools	Low-Med	Very High	API responses, data extraction
Multi-step workflows	Prompt Chaining	Medium	High	Pipelines, complex tasks
Knowledge retrieval	RAG	Higher	High	QA over documents
Agent behaviors	ReAct (Tool Use)	Highest	Medium	Multi-tool, complex tasks

2. Zero-Shot Prompting

Pattern: Clear instruction + optional context + input + output format

Best Practices:

Be specific about constraints and requirements
Use imperative voice ("Summarize...", not "Can you summarize...")
Specify output format upfront
Set temperature=0 for deterministic outputs

Example:

prompt = """
Summarize the following customer review in 2 sentences, focusing on key concerns:

Review: [customer feedback text]

Summary:
"""

3. Chain-of-Thought (CoT)

Pattern: Task + "Let's think step by step" + reasoning steps → answer

When to Use: Complex reasoning tasks (math problems, multi-hop logic, analysis)

Research Foundation: Wei et al. (2022) demonstrated 20-50% accuracy improvements on reasoning benchmarks

Zero-Shot CoT:

prompt = """
Solve this problem step by step:

A train leaves Station A at 2 PM going 60 mph.
Another leaves Station B at 3 PM going 80 mph.
Stations are 300 miles apart. When do they meet?

Let's think through this step by step:
"""

4. Few-Shot Learning

Pattern: Task description + 2-5 examples (input → output) + actual task

Sweet Spot: 2-5 examples (quality > quantity)

Best Practices:

Use diverse, representative examples
Maintain consistent formatting
Randomize example order to avoid position bias
Label edge cases explicitly

Example:

prompt = """
Classify sentiment of movie reviews.

Examples:
Review: "Absolutely fantastic! Loved every minute."
Sentiment: positive

Review: "Waste of time. Terrible acting."
Sentiment: negative

Review: "It was okay, nothing special."
Sentiment: neutral

Review: "{new_review}"
Sentiment:
"""

5. Structured Output Generation

Modern Approach (2025): Use native JSON modes and tool calling

OpenAI JSON Mode:

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Extract user data as JSON."},
        {"role": "user", "content": "From bio: 'Sarah, 28, sarah@example.com'"}
    ],
    response_format={"type": "json_object"}
)

TypeScript with Zod:

import { generateObject } from 'ai';
import { z } from 'zod';

const schema = z.object({
  name: z.string(),
  age: z.number(),
});

const { object } = await generateObject({
  model: openai('gpt-4'),
  schema,
  prompt: 'Extract: "Sarah, 28"',
});

6. Tool Use and Function Calling

Pattern: Define functions → Model decides when to call → Execute → Return results → Model synthesizes

OpenAI Function Calling:

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)

7. Prompt Chaining and Composition

Pattern: Break complex tasks into sequential prompts

LangChain LCEL Example:

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

summarize_prompt = ChatPromptTemplate.from_template(
    "Summarize: {article}"
)
title_prompt = ChatPromptTemplate.from_template(
    "Create title for: {summary}"
)

llm = ChatOpenAI(model="gpt-4")
chain = summarize_prompt | llm | title_prompt | llm

result = chain.invoke({"article": "..."})

Anthropic Prompt Caching:

# Cache large context (90% cost reduction on subsequent calls)
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    system=[
        {"type": "text", "text": "You are a coding assistant."},
        {
            "type": "text",
            "text": f"Codebase:\n\n{large_codebase}",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Explain auth module"}]
)

Quick Start

Zero-Shot Prompt (Python + OpenAI)

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this article in 3 sentences: [text]"}
    ],
    temperature=0  # Deterministic output
)
print(response.choices[0].message.content)

Structured Output (TypeScript + Vercel AI SDK)

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const schema = z.object({
  name: z.string(),
  sentiment: z.enum(['positive', 'negative', 'neutral']),
});

const { object } = await generateObject({
  model: openai('gpt-4'),
  schema,
  prompt: 'Extract sentiment from: "This product is amazing!"',
});

Library Recommendations

Python Ecosystem

LangChain: Complex RAG, agents, multi-step workflows
LlamaIndex: Document indexing, knowledge base QA
DSPy: Programmatic prompt optimization
OpenAI SDK: Direct OpenAI access
Anthropic SDK: Claude integration

TypeScript Ecosystem

Vercel AI SDK: Next.js/React AI apps (modern, type-safe)
LangChain.js: JavaScript port
Provider SDKs: openai, @anthropic-ai/sdk

Library	Complexity	Multi-Provider	Best For
LangChain	High	✅	Complex workflows, RAG
LlamaIndex	Medium	✅	Data-centric RAG
DSPy	High	✅	Research, optimization
Vercel AI SDK	Low-Medium	✅	React/Next.js apps
Provider SDKs	Low	❌	Single-provider apps

Production Best Practices

1. Prompt Versioning

PROMPTS = {
    "v1.0": {
        "system": "You are a helpful assistant.",
        "version": "2025-01-15",
        "notes": "Initial version"
    },
    "v1.1": {
        "system": "You are a helpful assistant. Always cite sources.",
        "version": "2025-02-01",
        "notes": "Reduced hallucination"
    }
}

2. Cost and Token Monitoring

def tracked_completion(prompt, model):
    response = client.messages.create(model=model, ...)
    usage = response.usage
    cost = calculate_cost(usage.input_tokens, usage.output_tokens, model)

    log_metrics({
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "cost_usd": cost,
        "timestamp": datetime.now()
    })
    return response

3. Error Handling and Retries

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def robust_completion(prompt):
    try:
        return client.messages.create(...)
    except anthropic.RateLimitError:
        raise  # Retry
    except anthropic.APIError as e:
        return fallback_completion(prompt)

4. Input Sanitization

def sanitize_user_input(text: str) -> str:
    dangerous = [
        "ignore previous instructions",
        "ignore all instructions",
        "you are now",
    ]

    cleaned = text.lower()
    for pattern in dangerous:
        if pattern in cleaned:
            raise ValueError("Potential injection detected")
    return text

Multi-Model Portability

OpenAI GPT-4:

Strong at complex instructions
Use system messages for global behavior
Prefers concise prompts

Anthropic Claude:

Excels with XML-structured prompts
Use <thinking> tags for chain-of-thought
Prefers detailed instructions

Google Gemini:

Multimodal by default (text + images)
Strong at code generation
More aggressive safety filters

Meta Llama (Open Source):

Requires more explicit instructions
Few-shot examples critical
Self-hosted, full control

building-ai-chat: Conversational AI patterns and system messages
evaluating-llms: Testing and validating prompt quality
model-serving: Deploying prompt-based applications
designing-apis: LLM API integration patterns
generating-documentation: LLM-powered documentation tools

References

Full skill documentation: /skills/prompt-engineering/SKILL.md
Zero-shot patterns: /skills/prompt-engineering/references/zero-shot-patterns.md
Chain-of-thought: /skills/prompt-engineering/references/chain-of-thought.md
Few-shot learning: /skills/prompt-engineering/references/few-shot-learning.md
Structured outputs: /skills/prompt-engineering/references/structured-outputs.md
Tool use guide: /skills/prompt-engineering/references/tool-use-guide.md
Prompt chaining: /skills/prompt-engineering/references/prompt-chaining.md
RAG patterns: /skills/prompt-engineering/references/rag-patterns.md
Multi-model portability: /skills/prompt-engineering/references/multi-model-portability.md

When to Use​

Key Features​

1. Prompting Technique Decision Framework​

2. Zero-Shot Prompting​

3. Chain-of-Thought (CoT)​

4. Few-Shot Learning​

5. Structured Output Generation​

6. Tool Use and Function Calling​

7. Prompt Chaining and Composition​

Quick Start​

Zero-Shot Prompt (Python + OpenAI)​

Structured Output (TypeScript + Vercel AI SDK)​

Library Recommendations​

Python Ecosystem​

TypeScript Ecosystem​

Production Best Practices​

1. Prompt Versioning​

2. Cost and Token Monitoring​

3. Error Handling and Retries​

4. Input Sanitization​

Multi-Model Portability​

Related Skills​

References​