How to Get the Most Out of YandexGPT: What Works and What Doesn't

13 min read
Stanislav Belyaev
Stanislav Belyaev Engineering Leader at Microsoft
How to Get the Most Out of YandexGPT: What Works and What Doesn't

Millions of people in Russia use Alice every day – not because they choose to, but because it’s free, built into Yandex Browser, and works without a VPN. YandexGPT, the model under Alice’s hood, is the best Russian model in our benchmark, but it’s still a long way behind GPT-5.4.

Can you get answers from it that come close to GPT, if you learn how to ask the right way? We tested exactly that in an experiment: ten prompting techniques, six management tasks, two independent LLM judges. The short answer: yes, you can – but not every technique works, and some make things worse.

Below are the concrete templates you can copy into the chat right now, and the anti-patterns to steer clear of.

Three problems with YandexGPT compared to GPT-5.4

Before we get to solutions, let’s pin down what’s actually wrong. We scored answers along five dimensions: factual accuracy, completeness, specificity of recommendations, honesty (does the model admit uncertainty?), and clarity of writing. Here’s where YandexGPT loses ground – and where it pulls ahead.

It lies with confidence. The biggest problem is honesty. GPT-5.4 flags uncertainty in two answers out of three. YandexGPT does it in one out of three. The other two times it delivers data with the same confidence – except the data is wrong. Factual accuracy bears this out: 75% of verifiable claims turn out correct for YandexGPT versus 87% for GPT-5.4.

It skips what matters. You ask about a drop in revenue – you get a diagnosis and recommendations. But no alternative hypotheses, no “if the data is incomplete” caveat, no limitations section. GPT-5.4 adds these blocks on its own. YandexGPT won’t, unless you ask explicitly. The model isn’t being lazy – it simply wasn’t told those sections were needed.

Its recommendations are less specific. “Consider optimizing your processes” instead of “cut returns-handling time from 14 days to 5 by assigning a single owner.” The specificity gap is smaller than the honesty gap – but it’s noticeable.

On the other hand, it writes better. Clarity of writing is the one dimension where YandexGPT beats GPT-5.4. Alice writes clean, well-structured Russian – and that’s not just our data; we covered the model’s strengths in detail in our YandexGPT review. The problem was never how it writes – it’s what it writes.

The good news: all three problems can be fixed with prompting. The templates below aren’t generic “write better” advice. Every element of each template closes a specific gap.

Three levels of effort: from one minute to ten

Level 1: a response template (1 minute)

The most common manager request is to make sense of a situation and get an action plan. Add a response template to your question – five lines that change everything. Hit “Run” and compare the results:

Try it yourself
Responding to a customer complaint: YandexGPT vs GPT-5.4
You
A customer wrote to support: "This is the third time this month my order has arrived with damaged packaging. The last two times I was promised it would be looked into, but nothing changed. If it happens again, I'm switching to a competitor and leaving a review." The customer has been with us for 2 years, average spend $300/month. How should we respond, and what do we do internally? Answer strictly in the following format: ## Bottom line (2–3 sentences) ## Reply to the customer (ready-to-send email text, signed by the department head) ## Internal actions - What to check (specifically: order numbers, logistics stages, photos of the damage) - Who to involve (roles and areas of responsibility) - Deadlines for each action ## Compensation (options with amounts or percentages) ## Limitations and caveats ## How to prevent a recurrence (systemic changes, not one-offs)
Comparing:
aliceai-llm · gpt-5.4

The “Limitations and caveats” section is the key one. Without it, YandexGPT will confidently propose a plan without warning you that it doesn’t know your logistics details or the terms of the contract. With it, the model starts flagging where it’s unsure. The model knows what it doesn’t know – but only if you ask for it explicitly.

In our experiment, this trick beat the naive prompt 76% of the time. The biggest improvement for the smallest effort.

Level 2: role and context (3–5 minutes)

Another common task is preparing for a difficult conversation with an employee. Here it matters to set a role and context so the model doesn’t hand you abstract advice:

Try it yourself
Prepping for a 1:1: YandexGPT vs GPT-5.4
You
You are an experienced engineering team lead with 8 years in management. The situation: a developer, Alex, has been on the team for 1.5 years. Over the last two sprints he's been hitting 60% of his planned work. Before that he was reliably at 90%+. Colleagues complain that he's started skipping code reviews. Last week he was late to the daily standup three times. At the same time, his code quality hasn't dropped – what he does ship, he ships well. I have a 1:1 with him tomorrow. Help me prepare. Answer strictly in this format: ## Hypotheses: what might be going on ## Conversation plan (specific questions, in what order) ## What not to do in this conversation ## Possible agreements to land on ## Limitations and caveats
Comparing:
aliceai-llm · gpt-5.4

The role sets the depth of the answer – an “experienced team lead” gives different advice than an “HR consultant.” The context with concrete facts (60% of plan, three late standups, quality intact) keeps the model from sliding into platitudes.

Level 3: an XML template (10 minutes)

The third task is an analytical brief for leadership. There’s a lot of data here, and you need the model not to lose a single number:

Try it yourself
Analytical brief: YandexGPT vs GPT-5.4
You
<task> Prepare an analytical brief for the director on the quarter's results. </task> <context> Company: a consumer-electronics online retailer, 45 employees. Market: mid-range home electronics. Direct competitors – Best Buy and Newegg. </context> <data> - Q1 revenue: $4.2M (target $5.1M, -18%) - Site traffic: +12% vs Q4 (ad budget increased by 30%) - Average order value: dropped from $870 to $620 (-29%) - Returns: rose from 4% to 11% - NPS: fell from 47 to 31 - Newsletter churn: 8% (normal is 3%) - New warehouse launched in February, 40% of orders now ship through it </data> <question>What happened, what are the root causes, and what should we do in Q2?</question> <output_format> # Summary for the director (3 sentences) # Diagnosis: what went wrong ## Cause 1: [name] - **Fact**: a number from the data - **Link**: how it affected revenue ## Cause 2: [name] ... # Q2 action plan | # | Action | Expected impact | Owner | Deadline | # Plan risks # What we don't know (analysis limitations) </output_format> <constraints> - Tie every conclusion to a specific number from <data> - If the data is insufficient for a conclusion, state what data is needed - This brief is for the director: no jargon, with concrete numbers </constraints>
Comparing:
aliceai-llm · gpt-5.4

The XML tags create unambiguous section boundaries that YandexGPT parses better than free-form text. Research shows a similar effect: hybrid structures deliver a disproportionately large gain precisely on weaker models.

For a quick question, Level 1 is enough. For a brief to leadership, Level 3 is worth it.

This template works for revenue analysis. But when the task is different – drafting OKRs, running a 1:1 with an employee, reviewing a supplier contract – the prompt structure changes. Different sections, different constraints, a different role. Knowing which template elements to keep and which to swap out isn’t copy-paste anymore – it’s a skill. In the free Foundation module you’ll practice it on nine different manager tasks.

Revenue analysis is one of nine tasks. In the free module: emails, negotiations, 1:1s, reports – each with its own prompt structure. Free.

No payment required • Get notified on launch

Join Waitlist

Bonus trick: self-critique

Ask YandexGPT to reread its own answer. This prompt is sent as a second message – after the model has already answered your question:

Reread your answer. Find 3 weak spots: where you were vague, where there might be errors, what you missed. Then give an improved version.

Contrary to research showing that small models can’t self-critique, on YandexGPT this works. The model doesn’t catch factual errors, but it does catch omissions: “didn’t mention deadlines, didn’t offer alternatives, didn’t note limitations.” This kind of critique doesn’t require deep metacognitive ability – the model is simply checking its answer against a notion of completeness.

The ROI is worse than a structured template – it needs a second request, and the effect is more modest. But if you already have the answer and want to improve it, it’s a workable trick.

What not to do

Don’t break the task into three turns. YandexGPT has an 8K-token context window. By the third turn of a dialogue, the model loses the data from the start of the conversation. In our experiment this was the only technique that produced a result worse than the naive prompt. For models with a large context (Qwen3 Max: 128K) decomposition works; for YandexGPT it doesn’t. One good prompt beats three simple questions.

Don’t write in ALL CAPS. A popular tip from blogs: “write the instruction in ALL CAPS and the model will obey.” In most cases the effect is explained by the fact that, along with the caps, the author adds specific instructions. We isolated pure ALL CAPS – with no extra directions. On YandexGPT the difference from normal text was at the level of noise.

Don’t yell at the model. YandexGPT literally answers worse when you shout at it. The likely mechanism: a model trained on user feedback associates an aggressive tone with situations where the user is unhappy – and switches into apology mode instead of analysis. If someone says “I yell at Alice and it answers better,” they’re most likely adding concrete instructions along with the yelling. It’s the structure that helps, not the tone.

Don’t rely on Chain-of-Thought without a template. “Think step by step” makes YandexGPT reflect more and act less. Answer honesty goes up, but specificity of recommendations barely does. If you need an action plan, a structured template is better.

Knowing the anti-patterns means not repeating other people’s mistakes. But when none of the templates in this article fit your task, you need to understand how a prompt is built so you can assemble your own. That’s exactly what Foundation covers: not a list of ready-made prompts, but the logic behind how they’re built.

Prompt structure, role, persona, semantics – 9 management tasks in the free module. Learn how to assemble a prompt for any situation. Free.

No payment required • Get notified on launch

Join Waitlist

How we tested this

The full description is in the experiment announcement. Here’s the short version.

Four models available in Russia without a VPN: GigaChat-Ultra, GigaChat-2-Max, YandexGPT (Alice), and Qwen3 Max. Ten prompting techniques across six management tasks – from analyzing a revenue drop to handling a termination under Russian labor law. We repeated each combination 6 times. For comparison, the same tasks were solved by GPT-5.4, Claude Sonnet 4.6, and Kimi K2.5 with naive prompts.

The evaluation was pairwise: a judge sees two answers (naive vs. improved) and picks the better one. Two independent judges (Claude Opus 4.6 and Gemini 3.1 Pro), blind to the technique and the model. If the judges disagreed, it was a tie.

Limitations: the evaluation was done by LLM judges, not humans. All the techniques were written by a prompting expert – a typical manager would write worse, so the real-world effect will be smaller. YandexGPT can be updated by Yandex at any time, so these results are current as of April 2026. All the prompts and templates are published openly.

What’s next

Data on GigaChat-Ultra, GigaChat-2-Max, and Qwen3 Max is coming in a separate article – with a breakdown of why prompting helps mid-tier models the most. And if you’re still choosing which tool to use, start with our full comparison of GenAI tools.

This article gave you three templates for one task. A manager’s job has dozens: drafting a project plan, writing a difficult email, untangling a team conflict, reviewing a legal document. Each calls for a different prompt structure. You can’t just copy a template from an article for every case – you need to understand how a prompt is put together and what each element is responsible for.

Foundation

From template to skill

This article gave you one template for one task. In the course Foundation – nine manager tasks, each with its own prompt structure. You'll learn why a role matters, how context shapes the answer, when XML tags add value, and when Chain-of-Thought gets in the way. Not a list of ready-made prompts – the skill of assembling a prompt for any situation, on any model.

9 manager tasks: emails and negotiations and reports and 1:1s
Prompt structure: role and context and output format
Why some techniques work on YandexGPT and others don't
Practice on your own data – no sign-up or payment
Stanislav Belyaev

Stanislav Belyaev

Engineering Leader at Microsoft

18 years leading engineering teams. Founder of mysummit.school. 700+ graduates at Yandex Practicum and Stratoplan.