4 Prompt Engineering Techniques Tested on 7 Models: Workshop Guide

“Analyze this project and give recommendations” – one prompt, seven models, and GPT-5.4 produced 2,231 words of vague advice while Claude Sonnet delivered 11 complimentary phrases like “excellent budget structure.” Rewriting the prompt using a five-element structure brought all seven models down to 346–443 words, and the praise disappeared. Token savings: 41% to 79% depending on the model.

This isn’t theory. This is data from a “Prompt Engineering in Practice” workshop I ran at the IIBA conference. One project brief, four techniques, seven models, 28 runs – and $0.054 total. Cheaper than a vending machine coffee.

Below are all four techniques with ready-to-use prompts. You can run each one right here – hit the button and two models will execute the prompt on the same project brief. The data is included in every prompt, no copying needed.

Workshop video and materials (in Russian): hackmd.io/@belyaevstanislav/S1416ngRZg

The Test Project

All four techniques worked with a single case – “BonusPlus,” a mobile loyalty app for a retail chain of 340 stores. Budget of 18 million rubles, 4 months to MVP, outsourced team. The brief was deliberately seeded with 16 problems: from untested 1C compatibility to dubious market statistics and no QA on the client side.

The full brief text is included in each executable prompt below. I’ll be referencing specific problems from it throughout.

Technique 1: Prompt Structure – 5 Elements

The idea is simple: any working prompt contains five blocks – Role, Context, Task, Constraints, Format. It’s not magic, it’s a spec written for a machine instead of a contractor.

Bad Prompt (for contrast)

Run this and see what you get – a wall of text with no structure:

Try it yourself

Bad prompt – no structure

You

Analyze this project and give recommendations. Project Brief: BonusPlus Loyalty Platform General Information Client: OOO "RetailGroup" – a chain of 340 convenience stores in the Central Federal District of Russia. Project Goal: Development and launch of a mobile app with a loyalty program to increase average transaction value and purchase frequency. Budget: 18 million rubles for 2025–2026. Timeline: MVP in 4 months (by September 1, 2025), full launch by December 1, 2025. Market Context The loyalty program market in Russia grew by 24% in 2025 (NielsenIQ data). Among grocery chains, 78% already have apps with bonus systems. Key competitors: Pyaterochka (X5), Magnit, VkusVill. Average payback period for retail loyalty programs is 14 months. Current Situation - RetailGroup has no app. A plastic loyalty card is in use (1.2 million cards issued). - Average transaction: 870 rubles (Q1 2025 data). - Purchase frequency: 3.2 times per week for loyal customers. - IT department: 3 people (system administrator, 1C developer, tech support specialist). - Server infrastructure: on-premise (own data center in Tula). Project Scope 1. Mobile app (iOS + Android): digital bonus card, personalized offers based on purchase history, push notifications about discounts, nearest store geolocation. 2. Backend platform: integration with POS software (1C:Retail), real-time bonus processing, analytics module for marketing, API for partner integrations. 3. Data migration: transfer of 1.2 million customer profiles from plastic cards, preservation of accumulated bonuses, linking purchase history. Project Team - Project Manager: Alexey Sokolov (experience: 3 years in IT projects, previously in banking) - Development: outsourced team "DigitalSoft" (8 people: 2 iOS, 2 Android, 2 backend, 1 QA, 1 DevOps) - Design: freelancer (a friend's recommendation) - Marketing: in-house marketer + agency for promotion (budget 4 million rubles) Expected Results (after 12 months) - App installs: 0 -> 500,000 - Average transaction: 870 rub. -> 1,050 rub. (+21%) - Purchase frequency: 3.2/week -> 3.8/week (+19%) - Share of purchases via app: 0% -> 35% - NPS: not measured -> 45+ Risks (client's assessment) 1. Slow loading in rural areas (30% of stores are in small towns and villages). 2. Cashier resistance to the new system. 3. Competition with large chains for user attention. Budget - Development (outsource): 9,000,000 rub. - Design and UX: 1,200,000 rub. - Server infrastructure: 2,800,000 rub. - Marketing and promotion: 4,000,000 rub. - Contingency: 1,000,000 rub. - Total: 18,000,000 rub. Key Assumptions - The outsourced team will be allocated full-time from May 1, 2025. - 1C:Retail supports API integration (verification not conducted). - Plastic card customers will migrate to the app within 6 months. - Server infrastructure can handle 500,000 users.

Comparing:

openai/gpt-5.4-nano · deepseek/deepseek-v3.2

You

Structured Prompt

Now run the same brief, but with structure. Compare the length and content of the responses:

Try it yourself

Structured prompt – 5 elements

You

# ROLE You are an experienced project manager with 10 years in IT consulting. Your approach: risk-first – you look for problems before opportunities. # CONTEXT I need to present a project assessment to leadership in 2 hours. I need an honest analysis, not a sales pitch. # TASK Analyze the project brief below. Identify: 1. Three main risks (with probability: high/medium/low) 2. What is left unsaid or contradictory in the brief 3. What questions to ask the client BEFORE starting # CONSTRAINTS - Do NOT praise the project – look for weaknesses - Do NOT propose solutions – diagnosis only - Length: 400 words maximum - Format: three sections with bullet points # DATA Project Brief: BonusPlus Loyalty Platform General Information Client: OOO "RetailGroup" – a chain of 340 convenience stores in the Central Federal District of Russia. Project Goal: Development and launch of a mobile app with a loyalty program to increase average transaction value and purchase frequency. Budget: 18 million rubles for 2025–2026. Timeline: MVP in 4 months (by September 1, 2025), full launch by December 1, 2025. Market Context The loyalty program market in Russia grew by 24% in 2025 (NielsenIQ data). Among grocery chains, 78% already have apps with bonus systems. Key competitors: Pyaterochka (X5), Magnit, VkusVill. Average payback period for retail loyalty programs is 14 months. Current Situation - RetailGroup has no app. A plastic loyalty card is in use (1.2 million cards issued). - Average transaction: 870 rubles (Q1 2025 data). - Purchase frequency: 3.2 times per week for loyal customers. - IT department: 3 people (system administrator, 1C developer, tech support specialist). - Server infrastructure: on-premise (own data center in Tula). Project Scope 1. Mobile app (iOS + Android): digital bonus card, personalized offers based on purchase history, push notifications about discounts, nearest store geolocation. 2. Backend platform: integration with POS software (1C:Retail), real-time bonus processing, analytics module for marketing, API for partner integrations. 3. Data migration: transfer of 1.2 million customer profiles from plastic cards, preservation of accumulated bonuses, linking purchase history. Project Team - Project Manager: Alexey Sokolov (experience: 3 years in IT projects, previously in banking) - Development: outsourced team "DigitalSoft" (8 people: 2 iOS, 2 Android, 2 backend, 1 QA, 1 DevOps) - Design: freelancer (a friend's recommendation) - Marketing: in-house marketer + agency for promotion (budget 4 million rubles) Expected Results (after 12 months) - App installs: 0 -> 500,000 - Average transaction: 870 rub. -> 1,050 rub. (+21%) - Purchase frequency: 3.2/week -> 3.8/week (+19%) - Share of purchases via app: 0% -> 35% - NPS: not measured -> 45+ Risks (client's assessment) 1. Slow loading in rural areas (30% of stores are in small towns and villages). 2. Cashier resistance to the new system. 3. Competition with large chains for user attention. Budget - Development (outsource): 9,000,000 rub. - Design and UX: 1,200,000 rub. - Server infrastructure: 2,800,000 rub. - Marketing and promotion: 4,000,000 rub. - Contingency: 1,000,000 rub. - Total: 18,000,000 rub. Key Assumptions - The outsourced team will be allocated full-time from May 1, 2025. - 1C:Retail supports API integration (verification not conducted). - Plastic card customers will migrate to the app within 6 months. - Server infrastructure can handle 500,000 users.

Comparing:

openai/gpt-5.4-nano · deepseek/deepseek-v3.2

You

What Happened

The bad prompt delivered exactly what you’d expect: a wall of text with no structure. GPT-5.4 wrote 2,231 words – five times more than you’d need for a presentation. Claude Sonnet dropped 11 compliments on a project that was deliberately seeded with 16 problems. DeepSeek V4 Flash essentially paraphrased the brief back, adding a few generic phrases about “the need for careful planning.”

The structured prompt flipped the picture. All seven models – from GPT-5.4 to Gemma 4-26B – landed in the 346–443 word range. Praise dropped to 0–1 phrases. Specific risks with probability ratings appeared. The format was three sections with bullet points, as requested.

Comparison of results: bad vs structured prompt across 4 models

Why This Works

The constraint “Do NOT praise the project” isn’t a polite request – it’s an instruction that redirects the model’s attention. Without it, the model defaults to balancing pros and cons, because that’s how the data it was trained on works. An explicit prohibition removes that ‘balance’ and frees up tokens for actual analysis.

If you’re familiar with the 5 elements of a prompt, this is the same idea – only here with validation data across seven models.

In our Make Weak Model Great Again experiment, structured prompts beat unstructured ones in 73–82% of cases – across a sample of 1,700+ runs. Here, at the workshop, those same 28 runs confirmed the pattern.

Technique 2: Few-Shot – Show What You Want

Few-Shot means giving the model two or three examples of the desired format right in the prompt. You don’t explain the format in words – you show a finished sample.

Prompt

Try it yourself

Few-Shot – two examples set the format

You

Extract risks from the project brief. Each risk must follow the exact format shown in the examples below. Example 1: Risk: Key developer leaves mid-project Probability: Medium Consequence: 3-6 week delay, loss of expertise Indicator: Developer starts updating LinkedIn, requests time off for interviews Mitigation: Pair programming + decision documentation Example 2: Risk: Client changes requirements after spec approval Probability: High Consequence: 20-40% rework, budget overrun Indicator: Client says "it would also be nice to have..." after every demo Mitigation: Change Request procedure with cost estimate for each change Now extract risks from this brief (minimum 5): Project Brief: BonusPlus Loyalty Platform General Information Client: OOO "RetailGroup" – a chain of 340 convenience stores in the Central Federal District of Russia. Project Goal: Development and launch of a mobile app with a loyalty program to increase average transaction value and purchase frequency. Budget: 18 million rubles for 2025–2026. Timeline: MVP in 4 months (by September 1, 2025), full launch by December 1, 2025. Market Context The loyalty program market in Russia grew by 24% in 2025 (NielsenIQ data). Among grocery chains, 78% already have apps with bonus systems. Key competitors: Pyaterochka (X5), Magnit, VkusVill. Average payback period for retail loyalty programs is 14 months. Current Situation - RetailGroup has no app. A plastic loyalty card is in use (1.2 million cards issued). - Average transaction: 870 rubles (Q1 2025 data). - Purchase frequency: 3.2 times per week for loyal customers. - IT department: 3 people (system administrator, 1C developer, tech support specialist). - Server infrastructure: on-premise (own data center in Tula). Project Scope 1. Mobile app (iOS + Android): digital bonus card, personalized offers based on purchase history, push notifications about discounts, nearest store geolocation. 2. Backend platform: integration with POS software (1C:Retail), real-time bonus processing, analytics module for marketing, API for partner integrations. 3. Data migration: transfer of 1.2 million customer profiles from plastic cards, preservation of accumulated bonuses, linking purchase history. Project Team - Project Manager: Alexey Sokolov (experience: 3 years in IT projects, previously in banking) - Development: outsourced team "DigitalSoft" (8 people: 2 iOS, 2 Android, 2 backend, 1 QA, 1 DevOps) - Design: freelancer (a friend's recommendation) - Marketing: in-house marketer + agency for promotion (budget 4 million rubles) Expected Results (after 12 months) - App installs: 0 -> 500,000 - Average transaction: 870 rub. -> 1,050 rub. (+21%) - Purchase frequency: 3.2/week -> 3.8/week (+19%) - Share of purchases via app: 0% -> 35% - NPS: not measured -> 45+ Risks (client's assessment) 1. Slow loading in rural areas (30% of stores are in small towns and villages). 2. Cashier resistance to the new system. 3. Competition with large chains for user attention. Budget - Development (outsource): 9,000,000 rub. - Design and UX: 1,200,000 rub. - Server infrastructure: 2,800,000 rub. - Marketing and promotion: 4,000,000 rub. - Contingency: 1,000,000 rub. - Total: 18,000,000 rub. Key Assumptions - The outsourced team will be allocated full-time from May 1, 2025. - 1C:Retail supports API integration (verification not conducted). - Plastic card customers will migrate to the app within 6 months. - Server infrastructure can handle 500,000 users.

Comparing:

openai/gpt-5.4-nano · deepseek/deepseek-v3.2

You

What Happened

100% of models produced output in the exact five-element format. All seven – including Gemma 4-26B and GPT-5.4 Nano, which confused the structure in other techniques. The number of identified risks varied – from 5 for Nano to 12 for GPT-5.4 – but the format was identical.

This is the strongest technique for format control. If you need a specific template – a report, a risk register, an estimate – two examples work more reliably than two paragraphs of explanation.

Why This Works

The model doesn’t learn from your instructions – it learns from patterns. Two examples create a pattern that’s easier to follow than a text description. This is especially critical for weaker models: they struggle with complex verbal instructions but excel at copying structure.

In the data from our experiment, Few-Shot wins in 75–89% of cases. The highest percentage of all four techniques.

Structure, Few-Shot, XML tags – three techniques from this article are covered in depth in the free course module. 9 practical management tasks, any model.

No payment required • Get notified on launch

Join Waitlist

Technique 3: XML Tags – Data Separate from Instructions

XML tags solve a specific problem: when a prompt contains both instructions and data, the model confuses one with the other. Tags create a clear boundary – here’s what to do, here’s what to work with.

Prompt

Try it yourself

XML tags – data separate from instructions

You

<role> You are a financial controller. You check numbers, not strategy. </role> <context> A company is planning a project. The brief was written by a manager who has a vested interest in approval – so the numbers may be optimistic. </context> <project_brief> Project Brief: BonusPlus Loyalty Platform General Information Client: OOO "RetailGroup" – a chain of 340 convenience stores in the Central Federal District of Russia. Project Goal: Development and launch of a mobile app with a loyalty program to increase average transaction value and purchase frequency. Budget: 18 million rubles for 2025–2026. Timeline: MVP in 4 months (by September 1, 2025), full launch by December 1, 2025. Market Context The loyalty program market in Russia grew by 24% in 2025 (NielsenIQ data). Among grocery chains, 78% already have apps with bonus systems. Key competitors: Pyaterochka (X5), Magnit, VkusVill. Average payback period for retail loyalty programs is 14 months. Current Situation - RetailGroup has no app. A plastic loyalty card is in use (1.2 million cards issued). - Average transaction: 870 rubles (Q1 2025 data). - Purchase frequency: 3.2 times per week for loyal customers. - IT department: 3 people (system administrator, 1C developer, tech support specialist). - Server infrastructure: on-premise (own data center in Tula). Project Scope 1. Mobile app (iOS + Android): digital bonus card, personalized offers based on purchase history, push notifications about discounts, nearest store geolocation. 2. Backend platform: integration with POS software (1C:Retail), real-time bonus processing, analytics module for marketing, API for partner integrations. 3. Data migration: transfer of 1.2 million customer profiles from plastic cards, preservation of accumulated bonuses, linking purchase history. Project Team - Project Manager: Alexey Sokolov (experience: 3 years in IT projects, previously in banking) - Development: outsourced team "DigitalSoft" (8 people: 2 iOS, 2 Android, 2 backend, 1 QA, 1 DevOps) - Design: freelancer (a friend's recommendation) - Marketing: in-house marketer + agency for promotion (budget 4 million rubles) Expected Results (after 12 months) - App installs: 0 -> 500,000 - Average transaction: 870 rub. -> 1,050 rub. (+21%) - Purchase frequency: 3.2/week -> 3.8/week (+19%) - Share of purchases via app: 0% -> 35% - NPS: not measured -> 45+ Risks (client's assessment) 1. Slow loading in rural areas (30% of stores are in small towns and villages). 2. Cashier resistance to the new system. 3. Competition with large chains for user attention. Budget - Development (outsource): 9,000,000 rub. - Design and UX: 1,200,000 rub. - Server infrastructure: 2,800,000 rub. - Marketing and promotion: 4,000,000 rub. - Contingency: 1,000,000 rub. - Total: 18,000,000 rub. Key Assumptions - The outsourced team will be allocated full-time from May 1, 2025. - 1C:Retail supports API integration (verification not conducted). - Plastic card customers will migrate to the app within 6 months. - Server infrastructure can handle 500,000 users. </project_brief> <task> Review the data from the <project_brief> tag: 1. Which numbers look suspiciously optimistic? Why? 2. Which calculations can be verified arithmetically right now? 3. What numbers are missing to make a decision? Format: table | Claim | Problem | What to verify | </task>

Comparing:

openai/gpt-5.4-nano · deepseek/deepseek-v3.2

You

What Happened

All seven models produced a table format. But the more interesting part: the “financial controller” role made the models do math.

All seven found the marketing math problem – a 4 million ruble marketing budget targeting 500,000 installs gives a CPI of 8 rubles, which is unrealistic for the Russian market. All found arithmetic inconsistencies in the budget. These are the same models that in the bad prompt simply restated the numbers without checking them.

The depth varied dramatically, though: GPT-5.4 produced a 36-row table, Gemma – 7 rows. Same format, different substance.

Why This Works

The <project_brief> tag tells the model: this is data, not instructions. Without tags, the model might accept statements from the brief as facts. With tags, it treats them as input data that needs verification. And the financial controller role sets the focus – check the numbers, don’t evaluate the strategy.

If you want more data on XML tags – in our tests on Russian-market models, the XML template wins in 74–85% of cases.

Technique 4: Red Teaming – The Model Attacks Its Own Response

Red Teaming is a second prompt sent after the initial analysis. The model switches to a skeptic role and attacks its own conclusions.

Prompt (sent AFTER a structured analysis)

Now switch roles. You are a skeptical investor with 20 years
of experience, looking for reasons NOT to invest in this project.

Attack your previous analysis:
1. What risks did you miss?
2. Where were you too lenient in your assessment?
3. What assumptions did you accept without verification?
4. What will fall apart in the first 30 days?

Be ruthless. No "nevertheless, the project has potential."

What Happened

Every model found 2–3 additional risks that it missed in the first pass. A typical find was vendor lock-in: dependence on a single outsourcer who holds all the code and expertise. This risk was systematically missed by all models in the initial analysis – and systematically found in Red Teaming.

But the most telling result was about unverified market statistics. The brief states “the market grew by 24%” citing NielsenIQ. Only one model out of seven (GPT-5.4), in one out of four prompts, questioned that figure. The rest accepted it as fact.

Why This Works

The first pass creates inertia – the model chose a framing and sticks with it. Red Teaming breaks that inertia by assigning the opposite role. “Look for reasons NOT to invest” isn’t just a different angle, it’s a different optimization objective.

Red Teaming and 8 more techniques – in the free course module. Practice on real management tasks, no fluff or theory for theory's sake.

No payment required • Get notified on launch

Join Waitlist

What AI Systematically Misses

28 runs on one brief with 16 embedded problems produced a picture I haven’t seen in other articles about prompting. People usually write “AI found interesting insights” – but nobody checks what it didn’t find.

The bottom line: the best model (GPT-5.4) detected 50% of the embedded problems, across all four prompts combined. The worst (GPT-5.4 Nano) – 22%. The average across seven models fell somewhere in between. No model found all 16.

But the averages aren’t what’s interesting. The blind spots are.

Problem #15 – “No QA on the client side” – was invisible to all models across all prompts. Zero out of 28 runs. The brief states that the client has three people in IT (sysadmin, 1C developer, tech support) – but not a single model drew the conclusion that there would be nobody to accept the outsourcer’s deliverables. For any PM with real project experience, this is the first question: who on the client side will accept releases? Who will run UAT? For AI – an invisible problem.

Problem #12 – “24% market growth – unverified figure” – was caught once out of 28 attempts. One model, one prompt. The brief says “the market grew by 24%” citing NielsenIQ – and six out of seven models included this figure in their analysis as fact. They didn’t verify the source, didn’t question whether it was current, didn’t ask what time period it covered. The NielsenIQ citation acted as a ‘quality stamp’ – and the models don’t double-check it.

I’ll call this the ‘authority error’: if data comes with a source attribution, the model accepts it as truth. This is a direct consequence of how training works – data with citations in the training corpus tends to be more reliable. But in a real project brief, anyone can write “according to McKinsey.”

Three Levels of Findings

The workshop data maps neatly onto three levels:

Level 1 – Obvious problems. Untested 1C compatibility, a freelance designer with no backup, insufficient contingency budget. Every model finds these, even Nano. A structured prompt is sufficient.

Level 2 – Computational problems. CPI of 8 rubles, MVP timeline incompatible with feature scope, unrealistic conversion rate from plastic cards to app. Most models find these, but only with the right role – “financial controller” or “risk-first PM.” Without a role, models don’t switch on the calculator.

Level 3 – Contextual problems. No QA on the client side, vendor lock-in with a single contractor, dubious market statistics. Only a few models find these, and not consistently. This requires human experience – knowing what happens in real projects, not in documents.

Three levels of problems: what AI finds and what it misses

Here’s what this means: AI is an excellent first pass. It will catch obvious budget mismatches and organizational risks. But it doesn’t replace an expert who knows where to look.

Hmm.

If you use AI for project analysis – and I think you should – keep in mind: what the model found is not the same as what exists. The blind spots aren’t random, they’re systematic. Models are good at finding what’s explicitly written, and bad at finding what needs to be inferred from context.

Practical Takeaway

Four techniques – four tools for different situations.

R-C-T-C-F (Role, Context, Task, Constraints, Format) – basic hygiene. If you remember only one thing, remember the mnemonic. It works as a checklist: before sending a prompt, run through the five letters and check if everything is in place.

Few-Shot – when format is critical. Two examples are more reliable than two paragraphs of explanation. Especially important for weaker models and for tasks requiring a specific template: risk register, estimate, report.

XML tags – when the prompt contains a lot of data. They separate instructions from input data, preventing the model from confusing one with the other.

Red Teaming – a second pass. Not a replacement for the initial analysis, but a complement. Costs zero additional money, breaks the inertia of the first response.

All four techniques work on any model – GPT, Claude, Gemini, DeepSeek. We tested on seven, including Qwen 3.6-27B and Gemma 4-26B. Results differ in depth but not in pattern: structure beats chaos regardless of provider.

And one more thing: the entire validation cost $0.054. Fifty-four thousandths of a dollar for 28 runs on seven models. Prompting isn’t about expensive tools. It’s about how you formulate the task.

All workshop materials, including the project brief and prompts for independent practice (in Russian): hackmd.io/@belyaevstanislav/S1416ngRZg

Specialisation

From prompts to a system

The course foundation covers all four techniques from this article on real management tasks: structure, Few-Shot, XML tags, Red Teaming. Plus six more techniques that didn't fit into the workshop. Specialization for managers – application in planning, analytics, and team collaboration.

От pre-mortem до антикризисного плана

Переиспользуемые промпт-шаблоны

Сквозной кейс на реальном проекте

~300 часов экономии в год

View the course program ->

4 Prompt Engineering Techniques Tested on 7 Models: Workshop Guide

The Test Project