A practitioner's reference for building reliable, production-grade AI systems with LLMs.
Every pattern shows what breaks, what fixes it, and why.
Most prompt engineering guides are either academic (useless) or a list of tricks (forgettable). This repo is different: it's 20 patterns I've used in production, each documented with a real failure, a real fix, and the mechanics behind why the fix works.
If you're evaluating AI engineers or building AI products, this repo answers the question: "Does this person actually understand how LLMs work, or are they just vibing with ChatGPT?"
| Tier | Focus |
|---|---|
| Foundational | Control the basics: reasoning, format, tone |
| Intermediate | Handle complexity: edge cases, reliability |
| Advanced | Build systems: architecture, safety, scale |
| Production | Ship and maintain: monitoring, ops, rollback |
Tip
Short on time? Start with these three patterns. They're the highest-leverage for most real-world problems:
- System Prompt Architecture (#11): The difference between a demo and a product.
- Evaluation-Driven Iteration (#16): Stop tuning prompts by feel.
- Adversarial Guardrails (#12): What separates "shipping" from "shipping safely."
For a scannable one-pager, see CHEATSHEET.md. For full end-to-end system designs, see USE-CASES.md.
Every pattern in this repo follows this format. Here's a preview of the shape of the problem:
Caution
❌ Naive Prompt
Analyze this candidate's resume and tell me if they're a good fit
for our Senior ML Engineer role.
What breaks: Vague "yes" or "no" with surface-level reasoning. Pattern-matches on keyword overlap rather than evaluating fit signals. The output feels like a coin flip with extra words. Ask this same prompt 5 times and you'll get 5 different answers, each confidently wrong in its own way.
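One way to put a number on the "5 runs, 5 answers" problem is to measure how often repeated runs agree. The sketch below is illustrative only: `agreement_rate` and `make_fake_model` are hypothetical names, and the fake model stands in for whatever client you actually call.

```python
from collections import Counter
from typing import Callable, List


def agreement_rate(ask: Callable[[str], str], prompt: str, runs: int = 5) -> float:
    """Share of runs that return the most common answer (case-insensitive)."""
    answers: List[str] = [ask(prompt).strip().lower() for _ in range(runs)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / runs


def make_fake_model(verdicts: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a real model call: cycles through canned verdicts."""
    state = {"calls": 0}

    def ask(prompt: str) -> str:
        verdict = verdicts[state["calls"] % len(verdicts)]
        state["calls"] += 1
        return verdict

    return ask


flaky = make_fake_model(["Yes", "No", "yes", "Maybe", "No"])
print(agreement_rate(flaky, "Is this candidate a good fit?"))  # prints 0.4
```

A low rate on a fixed input is a sign that the prompt, not the candidate, is deciding the answer.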
Note
✅ Engineered Prompt
You are a senior technical recruiter evaluating ML Engineer
candidates. Analyze this candidate step by step:
1. SKILLS MATCH: List each required skill. For each, note whether
the resume demonstrates it and at what level. Cite evidence.
2. EXPERIENCE DEPTH: Years of relevant experience, scale of
systems built, progression of responsibility.
3. GAPS: Hard gaps (missing must-haves) vs. soft gaps (nice-to-haves).
4. CULTURE SIGNALS: Collaboration, communication, leadership
indicators from project descriptions.
5. FINAL ASSESSMENT: Based ONLY on steps 1-4, provide a fit score
(1-10) with your top concern and top strength.
Why it works: Chain-of-thought forces the model to generate reasoning tokens that become context for the conclusion. Explicit anchoring ("Based ONLY on steps 1-4") discourages the model from contradicting its own analysis, because the final score is tied to reasoning it has already written. See full breakdown in Pattern #1.
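The structure of the engineered prompt can also be assembled programmatically, so the anchoring sentence always matches the number of steps. This is a minimal sketch, not the repo's own tooling; `build_anchored_prompt` and its parameters are hypothetical names, and the step wording is abbreviated from the example above.

```python
from typing import List, Tuple


def build_anchored_prompt(
    persona: str,
    steps: List[Tuple[str, str]],
    final_label: str,
    final_instruction: str,
) -> str:
    """Number the analysis steps, then append a final step that is
    explicitly restricted to the steps before it."""
    lines = [f"{persona} Analyze this candidate step by step:"]
    for number, (label, instruction) in enumerate(steps, start=1):
        lines.append(f"{number}. {label}: {instruction}")
    last = len(steps)
    lines.append(
        f"{last + 1}. {final_label}: Based ONLY on steps 1-{last}, {final_instruction}"
    )
    return "\n".join(lines)


prompt = build_anchored_prompt(
    persona="You are a senior technical recruiter evaluating ML Engineer candidates.",
    steps=[
        ("SKILLS MATCH", "List each required skill and cite resume evidence."),
        ("EXPERIENCE DEPTH", "Years of relevant experience and scale of systems built."),
        ("GAPS", "Hard gaps (missing must-haves) vs. soft gaps (nice-to-haves)."),
        ("CULTURE SIGNALS", "Collaboration, communication, leadership indicators."),
    ],
    final_label="FINAL ASSESSMENT",
    final_instruction="provide a fit score (1-10) with your top concern and top strength.",
)
print(prompt)
```

Deriving the "steps 1-N" range from the list means adding or removing a step cannot leave the anchor pointing at the wrong range.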
Five end-to-end system designs showing how patterns combine. Each pulls from real work I've done in career coaching, restaurant operations, and AI SaaS.
- 5-stage pipeline combining extraction, analysis, and verification. Prevents fabricated experience.
- Restaurant Operations Analysis: Weekly reports for multi-location restaurant groups. Context management + self-consistency.
- Adversarial-safe scraping pipeline with dynamic few-shot matching against user profiles.
- Guardrails-first classifier with layered defense and human-in-the-loop for uncertainty.
- Handles 70% of tickets autonomously with multi-model routing and graceful degradation.
- Patterns production AI systems share, and why single-pattern solutions don't work in the real world.
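Two of the designs above mention multi-model routing with graceful degradation and human-in-the-loop handling for uncertain cases. The sketch below shows one way those ideas could fit together. It is an assumption-heavy illustration, not the design documented in USE-CASES.md: `Result`, `route`, the 0.8 threshold, and the fake models are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    label: str
    confidence: float
    handled_by: str


Model = Callable[[str], Result]


def route(ticket: str, models: List[Model], min_confidence: float = 0.8) -> Result:
    """Try each model in order; skip ones that fail or are unsure.
    If none is confident enough, hand the ticket to a human."""
    for model in models:
        try:
            result = model(ticket)
        except Exception:
            continue  # graceful degradation: fall through to the next model
        if result.confidence >= min_confidence:
            return result
    return Result(label="needs_review", confidence=0.0, handled_by="human")


def primary(ticket: str) -> Result:
    raise TimeoutError("primary model unavailable")  # simulated outage


def backup(ticket: str) -> Result:
    # Simulated classifier: confident only about refund requests.
    confident = "refund" in ticket.lower()
    return Result("refund" if confident else "other", 0.9 if confident else 0.4, "backup")


print(route("I want a refund", [primary, backup]).handled_by)     # prints backup
print(route("Something odd happened", [primary, backup]).handled_by)  # prints human
```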
The four categories aren't independent: each builds on the previous. You won't get much from Prompt Versioning (#20) if you haven't mastered System Prompt Architecture (#11), and Chain of Verification (#13) assumes you already know Chain-of-Thought (#1).
Progress through the tiers as your systems grow in complexity. Most indie projects need only Foundational + Intermediate. Production apps need Advanced. Anything serving real users at scale needs Production.
Every pattern file in this repo follows the same template so you can scan quickly:
# Pattern Name
Category · Difficulty · Impact
## When To Use → Decide if this applies to your problem
## The Problem → Naive prompt + what goes wrong + why
## The Pattern → Engineered prompt with inline explanations
## Why It Works → Model mechanics and underlying theory
## Real-World Example → Concrete scenario from career/restaurant/SaaS
## Common Mistakes → 2-3 pitfalls people hit in practice
## Related Patterns → Which other patterns complement this one
Read one top-to-bottom. Then use the rest as a reference.
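Because every pattern file shares one template, a contributor could check a draft against it before opening a PR. The following is a hypothetical helper, not something shipped with the repo; it assumes each section appears as a level-2 Markdown heading with exactly the names listed in the template.

```python
import re
from typing import List

# Section names taken from the template; exact heading text is an assumption.
REQUIRED_SECTIONS = [
    "When To Use",
    "The Problem",
    "The Pattern",
    "Why It Works",
    "Real-World Example",
    "Common Mistakes",
    "Related Patterns",
]


def missing_sections(markdown: str) -> List[str]:
    """Return required level-2 headings that the document lacks, in template order."""
    found = {
        match.group(1).strip()
        for match in re.finditer(r"^##\s+(.+?)\s*$", markdown, flags=re.MULTILINE)
    }
    return [section for section in REQUIRED_SECTIONS if section not in found]


draft = """# Example Pattern
## When To Use
## The Problem
## The Pattern
## Why It Works
## Real-World Example
## Related Patterns
"""
print(missing_sections(draft))  # prints ['Common Mistakes']
```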
Across all visuals, categories use consistent colors:
| Category | Color | Hex |
|---|---|---|
| Foundational | 🟦 Blue | #3B82F6 |
| Intermediate | 🟧 Amber | #F59E0B |
| Advanced | 🟪 Purple | #8B5CF6 |
| Production | 🟩 Green | #10B981 |
Found a pattern that's missing? Spotted a case where my engineered prompt fails? PRs welcome. The bar for new patterns is high:
- Real problem: Document an actual failure mode, not a hypothetical.
- Real fix: Engineered prompt must be something you'd ship.
- Real reasoning: Explain why it works, not just that it works.
- Fits the template: Follow the existing pattern file structure.
Sayem Islam

Prompt Specialist & AI Evaluator

Building AI products and evaluating them for a living. Career pivoter (restaurant operations → tech) who genuinely cares about why AI systems work or don't. This repo is the reference I wish existed when I started working with LLMs in production.

hello@sayemislam.com · sayemislam.com · LinkedIn · mysecondact.io
⭐ If this helped you, star the repo. If it didn't, tell me why and I'll fix it.
Last updated: 2026 · MIT License · Built with attention to detail
