Tool Comparison

40 GigaChat Case Studies vs the Benchmark: Checking Sber's Numbers

May 26, 2026

23 min read

Sber, Russia’s largest bank and the company behind GigaChat, released a sponsored showcase: forty business cases from companies that deployed GigaChat and reported the results. EdTech, MedTech, HRTech, cybersecurity, PropTech. Polished cards, concrete numbers, real startups.

Pictured: the “One step ahead” promo slide from the Sber500×GigaChat accelerator – 40 startups across 9 industries. Claimed effects: business processes up to 16x faster, costs down by up to 90%, up to 95% task automation, and revenue up by up to 30%.

We have a benchmark of our own: 29 models, 4,308 independent evaluations on managerial tasks. In it, GigaChat sits dead last – 29th out of 29 after the second wave of testing. That puts the showcase and our data in direct tension.

Not because Sber is lying. The cases are real, the startups exist, the automation works. The question is different: was this the optimal model for the tasks they were solving?

AI Benchmarks Are Losing Their Meaning – So How Do You Pick a Model?

May 3

7 min

In March we broke down how LLM benchmarks actually work – GPQA Diamond, SWE-bench, Chatbot Arena. In April we tested 53 models and found that the quality gap between the top models is tenths of a point – while the price gap spans three orders of magnitude.

Now for the next question. What if the benchmarks themselves are starting to break?

99% Quality at 1.4% of the Price: What's Wrong with the AI Model Market

Apr 26

8 min

Most managers pick an AI model the same way: grab the most expensive one available. The logic makes sense – pricier means better. That’s how enterprise software worked for the last twenty years.

The AI model market in 2026 works differently. The cost per query ranges from $0.0001 to $0.17 – three orders of magnitude. And the actual quality difference between the top ten models? 0.24 points on a five-point scale. Meanwhile, Wharton / GBK Collective reports that a third of corporate AI projects never get past the pilot stage. And Epoch AI shows that only 5.6% of users apply AI in any genuinely deep way.
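The spread quoted above is easy to check by hand. The snippet below is a minimal sketch that uses only the figures from the paragraph; the variable names are illustrative and not part of the original study.

```python
import math

# Per-query cost bounds quoted in the paragraph above (USD).
cheapest = 0.0001
priciest = 0.17

# Quality gap between the top ten models, on a five-point scale.
quality_gap = 0.24
scale_max = 5

price_ratio = priciest / cheapest
orders = math.log10(price_ratio)
gap_share = quality_gap / scale_max

print(f"price ratio: {price_ratio:.0f}x")
print(f"orders of magnitude: {orders:.2f}")
print(f"quality gap as share of scale: {gap_share:.1%}")
```

On these numbers the price differs by a factor of about 1,700 (a little over three orders of magnitude), while the quality gap is under 5% of the scale.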

Maybe the question isn’t which model is best, but whether paying a premium delivers proportionally better results for typical management tasks.

We tested it. The answer was harsher than we expected.

How to Get the Most Out of YandexGPT: What Works and What Doesn't

Apr 23

13 min

Millions of people in Russia use Alice every day – not because they choose to, but because it’s free, built into Yandex Browser, and works without a VPN. YandexGPT, the model under Alice’s hood, is the best Russian model in our benchmark, but it’s still a long way behind GPT-5.4.

Can you get answers from it that come close to GPT, if you learn how to ask the right way? We tested exactly that in an experiment: ten prompting techniques, six management tasks, two independent LLM judges. The short answer: yes, you can – but not every technique works, and some make things worse.

Below are the concrete templates you can copy into the chat right now, and the anti-patterns to steer clear of.

GigaChat Ultra Thinking: Thinks Longer – Answers Worse?

Mar 26

7 min

GigaChat Ultra Thinking takes longer to think and uses more compute, yet it scores 3.3% lower on management tasks than the version without reasoning. This is not a bug or a fluke – it’s a pattern documented in academic papers over the past two years.

This week, Sber unveiled GigaChat Ultra – a new flagship model with a reasoning mode (Thinking). The model is available for free via web, mobile apps, and a Telegram bot. We immediately added both variants to our AI model research for managers: we ran them through all 32 scenarios using our unified methodology, scored them with both LLM judges, and compared them against the other 52 models.

Kimi by Moonshot in 2026: K3, K2.6, K2.7-Code and Agents for Managers

Mar 18

18 min

Can an open-source Chinese model beat the closed flagships from OpenAI and Anthropic on availability? Based on our independent testing, the new Kimi K3 (released July 16, 2026) took 2nd place out of 47 models. The only model above it is GPT-5.6 Sol, which is blocked in restricted markets. That makes Kimi K3 the strongest model available without a VPN wherever the Western flagships are restricted.

Chat Z.AI (GLM-5) Review 2026: Pricing, Benchmarks & Agent Mode

Mar 16

15 min

On February 6, 2026, an anonymous model called “Pony Alpha” appeared on OpenRouter – free, with zero details about its creators. The AI community immediately set about identifying it. Its coding abilities came remarkably close to Claude Opus 4.5. When asked “who are you?”, the model responded: “I am GLM.” But when prompted to write a web page describing itself, it wrote: “I am Claude, created by Anthropic.”

Best AI for Managers in Russia: 52 Models, 3,300+ Evaluations

Mar 15

11 min

We conducted a large-scale study: 52 models, evaluations from two independent LLM judges, across 8 categories of management tasks. This is the most comprehensive Russian-language AI ranking for managers available today.

The question remains the same: which AI actually works for a manager in Russia – without VPN, without workarounds?

GenAI Tools Comparison 2026: Which AI Should a Manager Choose?

Mar 7

9 min

As of March 2026, the generative AI market offers dozens of tools. Every vendor claims to be the leader, and the marketing materials compete on volume rather than evidence. How does a manager choose a tool that actually solves real problems?

LLM Benchmarks Explained: MMLU, Chatbot Arena & SWE-bench Leaderboard (2026)

Mar 6

6 min

Imagine you’re choosing a company car for your team. One dealer says: “Our car is the fastest.” Another: “We have the best fuel economy.” A third: “We lead in safety.” They’re all right – but each is measuring something different. Without understanding what exactly is being measured and how, you can’t compare the options objectively.
