April 30, 2026 · ChessIQ · Trust systems · Product engineering

ChessIQ's Analysis Engine Just Got More Honest

Better labels, cleaner trust boundaries, and more reliable puzzles.

Some software bugs are easy to miss because nothing visibly breaks. No crash. No error message. Just quietly wrong numbers, quietly wrong labels, quietly wrong conclusions - and a user who trusts them. ChessIQ had a few of those. This post is about fixing them.

What changed for players

If you've been using ChessIQ to analyze your games, you may notice a few things feel different.

Move labels are more conservative

A move that previously got flagged as a "blunder" might now show up as a "mistake" or even an "inaccuracy." The engine is not getting softer. It is being more careful.

The old system could assign a severe label based mostly on raw centipawn loss. That is useful evidence, but it does not always tell the whole story. A large evaluation swing in a position that was already effectively lost does not mean the same thing as a move that turns a winning position into a draw or loss.

ChessIQ now looks more closely at the actual swing in expected result, not just the raw evaluation delta.

Centipawn loss still matters. It is still visible. It still supports accuracy scoring. But it no longer gets the only vote on how severe a move really was.

Accuracy scores are more stable

Provisional analysis - the kind the engine produces while it is still thinking - no longer bleeds into your accuracy score.

Before, if you clicked through a game quickly, some numbers could be influenced by half-finished engine output. That data is now properly quarantined until the engine has enough confidence to commit it.

Fast feedback is useful. False confidence is not.

Puzzles are more reliable

A training puzzle is only useful if the "correct" answer is actually correct.

There was a subtle flaw where the blunder a puzzle was built around could sometimes appear somewhere in the accepted solution set. In plain English: the system could accidentally build a puzzle that accepted the original mistake as a correct answer.

That is fixed.

Puzzle generation now validates that the played blunder does not appear anywhere in the accepted moves. Blunder puzzles also stay in the blunder lane instead of drifting into opportunity or conversion framing.

Training should not accidentally teach the move that caused the problem.
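For readers who want to see the shape of that check, here is a minimal sketch. It is not ChessIQ's code: the `PuzzleCandidate` type, its field names, and the choice to compare moves as normalized UCI strings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PuzzleCandidate:
    # Hypothetical shape; moves are UCI strings such as "e2e4".
    fen: str
    played_blunder: str
    accepted_moves: Tuple[str, ...]


def is_valid_blunder_puzzle(candidate: PuzzleCandidate) -> bool:
    """Reject a candidate whose accepted answers include the blunder itself."""
    if not candidate.accepted_moves:
        return False  # a puzzle with no accepted answer cannot be solved
    # Normalize so that "G2G4 " and "g2g4" count as the same move.
    accepted = {move.strip().lower() for move in candidate.accepted_moves}
    return candidate.played_blunder.strip().lower() not in accepted
```

The point of running this at generation time is that a bad candidate never reaches a player; it is dropped before it becomes a puzzle.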

Progress indicators are more honest

The loading copy no longer oversells how far along analysis is. That is a small change, but it matters. If the app is still working, it should say that plainly. Chess analysis is complicated enough without pretending unfinished work is finished.

What changed under the hood

This was a three-pass overhaul. Each pass addressed a different part of the analysis trust chain.

Pass 1: Provisional data quarantine

The core problem was that in-progress analysis labels were leaking into surfaces that should only consume finalized data. That included accuracy scores, critical moment detection, game summaries, puzzle generation, motif stats, chart annotations, and PGN exports. The fix was making label finality explicit.

A label is now either:

committed - the engine finished and the data is trustworthy
provisional - the engine is still working and the label is not ready to be treated as truth

Every downstream consumer was updated to respect that boundary.
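One way to make that boundary hard to bypass is to give every consumer a single gate to go through. The sketch below assumes a `Finality` enum and a `MoveLabel` record; both names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Finality(Enum):
    PROVISIONAL = "provisional"  # engine is still searching
    COMMITTED = "committed"      # engine finished; safe to consume


@dataclass(frozen=True)
class MoveLabel:
    ply: int
    label: str  # e.g. "blunder", "mistake", "inaccuracy"
    finality: Finality


def committed_only(labels: List[MoveLabel]) -> List[MoveLabel]:
    """The one gate that summaries, puzzles, and exports would go through."""
    return [item for item in labels if item.finality is Finality.COMMITTED]


def count_blunders(labels: List[MoveLabel]) -> int:
    # Example consumer: a provisional "blunder" is invisible to it.
    return sum(1 for item in committed_only(labels) if item.label == "blunder")
```

A live board view can still show provisional labels. It just reads the raw list knowingly instead of going through the gate.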

A few important semantics were hardened at the same time. The mover's color is now derived from the actual board position rather than inferred from ply count. Mate-score comparisons were also tightened, including edge cases where both sides are in forced-mate territory.

Those are the kinds of bugs that rarely announce themselves. They just make analysis feel a little inconsistent. Fixing them makes the whole system more stable.
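Both fixes are small enough to sketch. The function names below are hypothetical, and the mate ordering is one reasonable convention rather than a description of ChessIQ's internals. It relies on two facts: the second field of a FEN string names the side to move, and a UCI `score mate N` is positive when the side to move delivers mate and negative when it is being mated.

```python
from typing import Tuple


def mover_from_fen(fen: str) -> str:
    """Read the side to move from the FEN active-color field."""
    fields = fen.split()
    if len(fields) < 2 or fields[1] not in ("w", "b"):
        raise ValueError(f"not a usable FEN: {fen!r}")
    return "white" if fields[1] == "w" else "black"


def mover_from_ply(ply: int) -> str:
    """The fragile shortcut: assumes White made the first recorded move."""
    return "white" if ply % 2 == 0 else "black"


def score_key(kind: str, value: int) -> Tuple[int, int]:
    """Sort key where a larger key is better for the side to move.

    Winning mates beat any centipawn score, and a faster mate is better.
    Losing mates are worse than any centipawn score, and a slower one is
    less bad. `mate 0` (already mated) sorts last.
    """
    if kind == "cp":
        return (1, value)
    if kind == "mate":
        if value > 0:
            return (2, -value)
        return (0, -value)
    raise ValueError(f"unknown score kind: {kind!r}")
```

The ply shortcut fails for any game that starts from a position with Black to move: ply 0 is then a Black move, and every later label would be attributed to the wrong side.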

Pass 2: Expected-score plumbing and timeout hardening

Stockfish can produce WDL estimates - win, draw, and loss - alongside centipawn evaluation. ChessIQ was previously relying too heavily on centipawn loss. This pass carried WDL data through the analysis pipeline and computed expected-score loss for each move.

That gives ChessIQ a better practical question to ask:

Did this move meaningfully change the likely result of the game?

That is different from asking only:

How much did the engine evaluation move?

Both questions matter. They just should not be treated as the same question.
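To make the difference concrete, here is a rough sketch of the arithmetic. It assumes the engine was started with Stockfish's `UCI_ShowWDL` option, so that info lines carry `wdl <win> <draw> <loss>` as per-mille values from the side to move's point of view. The function names are hypothetical, and counting a draw as half a point is the usual convention rather than something specific to ChessIQ.

```python
import re
from typing import Optional

_WDL = re.compile(r"\bwdl (\d+) (\d+) (\d+)\b")


def expected_score(info_line: str) -> Optional[float]:
    """Expected score in [0, 1] for the side to move, or None without WDL."""
    match = _WDL.search(info_line)
    if match is None:
        return None
    win, draw, loss = (int(group) for group in match.groups())
    total = win + draw + loss
    if total == 0:
        return None
    return (win + 0.5 * draw) / total


def expected_score_loss(before_mover: float, after_opponent: float) -> float:
    """How much expected score the mover gave up by playing the move.

    `after_opponent` is measured after the move, when the opponent is the
    side to move, so it is flipped back to the mover's perspective first.
    """
    after_mover = 1.0 - after_opponent
    return max(0.0, before_mover - after_mover)
```

With these definitions, dropping from 0.90 to an even position costs about 0.40 of expected score, while a large centipawn swing in a position that was already at 0.02 can cost almost nothing. That is the gap the old centipawn-only view could not see.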

This pass also tightened timeout handling. When the engine is interrupted mid-search, stale output from the previous position can leak into the next search. ChessIQ now stops the engine more carefully, waits for it to become ready again, clears stale state, and only then allows the next search to proceed.

That work is not flashy. It is what keeps analysis honest.
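The stop-and-resynchronize sequence can be sketched in terms of plain UCI commands. The `EngineIO` interface here is a hypothetical stand-in for whatever transport the app really uses, and the ordering is an assumption about one safe way to do it: send `stop`, discard everything up to the interrupted search's `bestmove`, then send `isready` and wait for `readyok`. Waiting for `bestmove` first matters because the UCI protocol allows an engine to answer `isready` while a search is still winding down.

```python
from typing import Callable, List, Optional, Protocol


class EngineIO(Protocol):
    """Hypothetical transport; a real app might wrap a browser worker."""

    def send(self, command: str) -> None: ...

    def read_line(self) -> Optional[str]: ...  # None: nothing arrived in time


def _drain_until(engine: EngineIO, done: Callable[[str], bool], limit: int) -> List[str]:
    """Read and throw away lines until `done` matches; return what was read."""
    drained: List[str] = []
    for _ in range(limit):
        line = engine.read_line()
        if line is None:
            raise TimeoutError("engine went quiet before resynchronizing")
        drained.append(line)
        if done(line.strip()):
            return drained
    raise TimeoutError("too much output without a terminator")


def interrupt_search(engine: EngineIO, limit: int = 10_000) -> List[str]:
    """Stop the running search and return only once the engine is in sync.

    The returned lines are for logging. None of them should be parsed as
    analysis, including the interrupted search's bestmove.
    """
    engine.send("stop")
    drained = _drain_until(engine, lambda text: text.startswith("bestmove"), limit)
    engine.send("isready")
    drained += _drain_until(engine, lambda text: text == "readyok", limit)
    return drained
```

Only after `interrupt_search` returns would the caller clear its own cached state and send the next `position` and `go` commands.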

Pass 3: WDL-gated severity and search fencing

The final pass wired expected-score loss into the classification system itself.

The key design decision:

Centipawn loss still supports accuracy scoring, but user-facing move severity is now gated by expected-score loss. So if a move looks terrible in centipawns but barely changes the practical result, ChessIQ can soften the label. A severe centipawn drop might still matter, but it should not automatically become a severe user-facing judgment.

At the same time, mate swings still override normal handling. If a move allows or misses a forced mate, that remains a special case.

ChessIQ now also tracks why a label was chosen. A classification can be based on mate, expected-score loss, or centipawn loss. That gives the app a path toward clearer explanations: not just "Blunder," but why the move earned that label.
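A sketch of what that decision order could look like follows. Everything specific in it is an assumption: the thresholds are placeholders rather than ChessIQ's values, the "good" label and the choice to label every mate swing a blunder are simplifications, and the rule "use expected-score loss whenever WDL is available, fall back to centipawns otherwise" is only one plausible reading of the gating described above.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder thresholds, chosen only to make the example run.
ES_BLUNDER, ES_MISTAKE, ES_INACCURACY = 0.20, 0.10, 0.05
CP_BLUNDER, CP_MISTAKE, CP_INACCURACY = 300, 100, 50


@dataclass(frozen=True)
class Classification:
    label: str   # "blunder", "mistake", "inaccuracy", or "good"
    reason: str  # "mate", "expected_score", or "centipawn"


def _grade(value: float, blunder: float, mistake: float, inaccuracy: float) -> str:
    if value >= blunder:
        return "blunder"
    if value >= mistake:
        return "mistake"
    if value >= inaccuracy:
        return "inaccuracy"
    return "good"


def classify(cp_loss: int, es_loss: Optional[float], mate_swing: bool) -> Classification:
    # 1. Allowing or missing a forced mate is handled before anything else.
    if mate_swing:
        return Classification("blunder", "mate")
    # 2. With WDL available, the practical result decides the severity.
    if es_loss is not None:
        label = _grade(es_loss, ES_BLUNDER, ES_MISTAKE, ES_INACCURACY)
        return Classification(label, "expected_score")
    # 3. Without WDL, centipawn loss is the only evidence left.
    label = _grade(cp_loss, CP_BLUNDER, CP_MISTAKE, CP_INACCURACY)
    return Classification(label, "centipawn")
```

Under these placeholder numbers, a 600-centipawn drop that costs only 0.01 of expected score comes out as "good" with reason "expected_score", while the same drop with no WDL data is still a "blunder" with reason "centipawn". Carrying the reason alongside the label is what makes a later explanation possible.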

The lowest-level fix was adding search fencing. Every Stockfish search now owns its output. A stale best move or analysis line from a previous search can no longer be parsed as evidence for the current position.

This is the kind of bug that is hard to reproduce deterministically, but when it happens, it can create strange classification behavior. The fix is simple in principle: every piece of engine output needs to belong to the search that produced it. Now it does.
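The ownership idea can be illustrated with a small bookkeeping class. This is a sketch under assumptions, not the shipped implementation: `SearchFence` and its methods are hypothetical names, and it treats the UCI `bestmove` line as the marker that ends a search's output.

```python
from typing import Optional


class SearchFence:
    """Tracks which search, if any, currently owns the engine's output."""

    def __init__(self) -> None:
        self._next_id = 1
        self._active: Optional[int] = None  # id of the search that owns output
        self._draining = False  # True after cancel() until the stale bestmove

    def begin(self) -> int:
        """Start a new search; refuse if the previous one is not fenced off."""
        if self._active is not None or self._draining:
            raise RuntimeError("previous search has not been fenced off")
        self._active = self._next_id
        self._next_id += 1
        return self._active

    def cancel(self) -> None:
        """Abandon the active search; its remaining output becomes stale."""
        if self._active is not None:
            self._active = None
            self._draining = True

    def owner_of(self, line: str) -> Optional[int]:
        """Return the id of the search that owns `line`, or None if stale."""
        is_final = line.startswith("bestmove")
        if self._draining:
            if is_final:
                self._draining = False  # the cancelled search has ended
            return None
        owner = self._active
        if is_final:
            self._active = None  # this search is complete
        return owner
```

A parser built on this would ignore any line whose owner is None, so a late `bestmove` from a cancelled search can never be attributed to the position that came after it.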

Verification

The full test suite passes, including focused regression coverage for classification, accuracy, puzzle generation, cache behavior, timeout handling, and stale Stockfish output.

The final verification pass reached:

397 / 397 tests passing

That does not mean the analysis engine is perfect. It does mean this trust-chain pass is now covered by tests instead of just intuition.

Why this matters

Chess analysis tools can have a credibility problem when they present unstable engine output as settled truth. A label like "blunder" carries weight. It affects how a player feels about a move, whether they study it, whether it becomes a training puzzle, and how they understand the story of the game.

Getting that wrong quietly is worse than getting it wrong obviously.

ChessIQ's analysis pipeline is now more careful about the difference between:

what the engine is still considering
what the engine has finished computing
what looks bad by centipawn loss
what actually changed the expected result of the game

That distinction matters.

The goal is not to make ChessIQ less critical. The goal is to make it more accurate about when criticism is deserved.

What comes next

The next pass is about exposing more of this nuance in the interface itself. ChessIQ now has the underlying data needed to explain whether a move label came from a mate swing, expected-score loss, or centipawn loss. The next step is turning that into clear "why this label?" explanations without cluttering the analysis view.

Not every user wants engine internals. But every user deserves to know whether a label is settled, what evidence supports it, and why the app thinks the move mattered.

ChessIQ is available at ChessIQ.ca. The analysis engine runs in your browser, with no account required and analyzed game data stored locally on your device.