Why I Stopped Trusting Browser Multithreading Checks
Why production startup needs probing, fallback, and recovery.
Feature detection can say multithreading is supported, but only runtime validation, fallback paths, and recovery logic make browser-native analysis reliable.
When I started building browser-native chess analysis, I assumed multithreading support was a straightforward feature-detection problem.
If the browser exposed the right APIs, multithreaded Stockfish should work.
That assumption was clean, common, and wrong.
The original mental model
Most browser code follows a familiar rule: if the API exists, it is safe to use.
if ("IntersectionObserver" in window) {
  // safe to use
}
I treated multithreaded engine startup the same way. If these checks passed, I expected startup to be reliable:
- SharedArrayBuffer exists
- crossOriginIsolated is true
- Worker support is available
In theory, that stack should be enough.
In production, it was not.
What actually happened
Even with all capability signals present, startup could still fail.
Sometimes failure was obvious. Other times it was subtle:
- Workers initialized inconsistently
- The engine stalled after launch
- Behavior changed between environments with the same apparent support
- Certain startup and caching states produced hard-to-reproduce failures
From the outside, the browser looked fully compatible. At runtime, compatibility was conditional.
The shift: startup is a system, not a check
I stopped treating startup as a single yes/no decision and started treating it as a layered system:
- Capability detection
- Runtime probing
- Fallback behavior
- Failure classification
- Recovery logic
Each layer handled a different failure mode.
1) Capability detection is necessary, not sufficient
The baseline checks still matter. If SharedArrayBuffer, cross-origin isolation, or worker support is missing, multithreading is not possible.
But those checks only answer: Could this work?
They do not answer: Will this work reliably right now?
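As a rough illustration, the baseline check might look like the sketch below. The function name `detectThreadingCapability` and the `scope` parameter are assumptions made for this example (the parameter exists only so the check can run outside a browser); the three properties it reads are the ones listed earlier.

```javascript
// Sketch: capability detection answers "could this work?" and nothing more.
// `scope` defaults to the global object; pass a stub to exercise it elsewhere.
function detectThreadingCapability(scope = globalThis) {
  const missing = [];
  if (typeof scope.SharedArrayBuffer !== "function") missing.push("SharedArrayBuffer");
  if (scope.crossOriginIsolated !== true) missing.push("crossOriginIsolated");
  if (typeof scope.Worker !== "function") missing.push("Worker");
  // `capable: true` is permission to try multithreading, not proof it will work.
  return { capable: missing.length === 0, missing };
}
```

Returning the list of missing signals, rather than a bare boolean, makes it easier to report why multithreading was never attempted.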
2) Probe the engine, do not just trust the browser
After startup, the engine runs a short validation search.
If the probe behaves correctly, multithreaded mode is accepted. If the probe fails or looks unstable, the system falls back.
This validates runtime behavior directly instead of trusting capability signals as proof.
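A minimal sketch of such a probe, under stated assumptions: `engine` is a hypothetical wrapper with `send(command)` and `onLine(handler)` (which returns an unsubscribe function), not a real Stockfish API. The commands `position startpos` and `go depth N`, and the `bestmove` reply, come from the UCI protocol. The depth and timeout values are placeholders, and treating a timeout as the only sign of instability is a simplification.

```javascript
// Sketch: run a short search and accept the engine only if it answers in time.
function probeEngine(engine, { depth = 6, timeoutMs = 3000 } = {}) {
  return new Promise((resolve) => {
    let settled = false;
    let unsubscribe = () => {};
    let timer = null;

    const finish = (ok, reason) => {
      if (settled) return; // ignore late replies after a timeout, and vice versa
      settled = true;
      clearTimeout(timer);
      unsubscribe();
      resolve({ ok, reason });
    };

    // No "bestmove" within the window counts as a failed probe.
    timer = setTimeout(() => finish(false, "timeout"), timeoutMs);

    unsubscribe = engine.onLine((line) => {
      if (line.startsWith("bestmove")) finish(true, "bestmove received");
    });

    try {
      engine.send("position startpos");
      engine.send(`go depth ${depth}`);
    } catch (err) {
      finish(false, "send failed");
    }
  });
}
```

The probe always resolves and never rejects, so the caller can branch on `ok` without a separate error path.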
3) Fallback is part of the design, not an edge case
Even in capable environments, startup can fail for reasons outside normal feature checks.
So startup always includes a deterministic fallback path:
try multithreaded startup
↓
run probe validation
↓
success -> continue in multithread mode
failure -> switch to single-thread mode
Single-thread analysis is slower, but it preserves reliability and continuity.
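The flow above could be sketched roughly as follows. All three callbacks (`startMultiThreaded`, `startSingleThreaded`, `probe`) are hypothetical and supplied by the caller, and the optional `terminate()` cleanup is an assumption about what an engine handle might offer.

```javascript
// Sketch: try multithreaded startup, validate it, otherwise fall back.
async function startWithFallback({ startMultiThreaded, startSingleThreaded, probe }) {
  let engine = null;
  try {
    engine = await startMultiThreaded();
    const result = await probe(engine);
    if (result.ok) return { engine, mode: "multi" };
  } catch (err) {
    // A startup or probe error is handled the same way as a failed probe.
  }
  // Dispose of the failed multithreaded engine if it supports it.
  if (engine && typeof engine.terminate === "function") engine.terminate();
  return { engine: await startSingleThreaded(), mode: "single" };
}
```

Because both branches return the same shape, the rest of the application does not need to know which path was taken, only which `mode` it ended up in.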
4) Failure modes need distinct handling
Different failures should not collapse into one generic error.
In practice, startup improved once failures were classified explicitly:
- Offline and engine assets not cached
- Engine runtime script failed to load
- Multithread probe failed
- Generic initialization failure
This improved both user messaging and operational debugging.
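One way to express that classification is a small function over the facts known at failure time. This is a sketch: the constant names, the input fields, and the order in which the cases are checked are all assumptions chosen for illustration.

```javascript
// Sketch: map what is known about a failed startup to one of four categories.
const StartupFailure = Object.freeze({
  OFFLINE_UNCACHED: "offline-uncached",
  SCRIPT_LOAD: "script-load",
  PROBE_FAILED: "probe-failed",
  GENERIC: "generic",
});

function classifyStartupFailure({ online, assetsCached, scriptLoaded, probeRan, probePassed }) {
  // Offline with nothing cached: the engine could never have loaded.
  if (!online && !assetsCached) return StartupFailure.OFFLINE_UNCACHED;
  // The runtime script itself did not load.
  if (!scriptLoaded) return StartupFailure.SCRIPT_LOAD;
  // The engine started, but the multithread probe did not pass.
  if (probeRan && !probePassed) return StartupFailure.PROBE_FAILED;
  // Anything else stays generic rather than being guessed at.
  return StartupFailure.GENERIC;
}
```

Each category can then drive its own user message and its own log entry.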
5) Long-running analysis needs recovery
Startup reliability is only part of the problem.
Long-running browser workloads can stall. Background tab throttling, worker hangs, and position-specific lockups all happen in real sessions.
So analysis runs with watchdog logic: if updates stop for too long, the run is restarted automatically.
To users, analysis continues. Internally, the system has recovered from a stalled engine.
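A minimal watchdog sketch, assuming a caller-supplied `restart` callback and a stall threshold in milliseconds. The class name and its methods are hypothetical, and the clock is injectable only so the logic can be exercised without real timers.

```javascript
// Sketch: restart analysis when engine updates stop arriving for too long.
class AnalysisWatchdog {
  constructor({ stallMs, restart, now = Date.now }) {
    this.stallMs = stallMs;
    this.restart = restart;
    this.now = now;
    this.lastUpdate = now();
    this.restarts = 0;
  }

  // Call whenever the engine emits an analysis update.
  noteUpdate() {
    this.lastUpdate = this.now();
  }

  // Call periodically (for example from setInterval). Returns true if it restarted.
  check() {
    if (this.now() - this.lastUpdate < this.stallMs) return false;
    this.restarts += 1;
    this.lastUpdate = this.now(); // give the restarted run a fresh window
    this.restart();
    return true;
  }
}
```

Browsers also throttle timers in background tabs, so a periodic `check()` may itself fire late; the threshold likely needs enough slack to avoid restarting a run that is merely throttled.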
What surprised me most
Getting Stockfish running in the browser was not the hard part.
Making browser-native analysis dependable under real-world conditions required significantly more system design than I expected.
The broader lesson
This changed how I think about browser capability checks.
Feature detection is necessary. It is not always sufficient.
For browser-native systems doing serious work, the safest approach is not to trust capability signals alone. It is to validate behavior directly, provide graceful fallback, and recover automatically when runtime conditions degrade.
Capability signals are permission, not proof.