Bitcoin Node Sync: When More Cores Make It Slower
"Just throw more CPU cores at it." That's the instinct when something seems slow, right? More parallel processing, faster results.
I thought the same when my Bitcoin node was validating background blocks at 14 blocks per second. With an 8-core CPU mostly idle at 18%, surely using more cores would speed things up?
Spoiler: It made things 50% slower.
The Experiment
My node was running Bitcoin Knots 28.1 in AssumeUTXO mode, validating historical blocks in the background from block 346,000 towards the snapshot base at 840,000. The validation was using only a fraction of available resources:
Validation: 14.0 blocks/s
Disk I/O: 7.4%
Network: 4.8%
Memory: 54%
Everything looked underutilized. The obvious move: increase the par parameter (which controls the number of script verification threads) from its default to par=8 to use all 8 cores.
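Concretely, the experiment was a one-line change in bitcoin.conf (the thread count matches my 8-core machine):

```ini
# bitcoin.conf
par=8  # use all 8 cores for script verification
```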
The Result: Slower, Not Faster
After restarting with par=8 and letting it stabilize, the numbers told a different story:
CPU Usage: 11.8% (↓ from 17.9%)
Validation: 7.4 blocks/s (↓ from 14.0 b/s)
The node got SLOWER and used LESS CPU. What happened?
This is counterintuitive. More threads should mean more work done, not less. But the key insight: the threads aren't doing less work because they're lazy - they're waiting.
Understanding Thread Contention
Bitcoin Core's validation process involves:
1. Download block from peers (sequential I/O)
2. Verify signatures within the block (parallelizable)
3. Update the UTXO database (random I/O, serialized)
4. Advance to the next block (sequential)
While step 2 (signature verification) can use multiple cores effectively, the critical path is step 3: updating the chainstate database. This is a LevelDB with millions of small, random read/write operations.
The chainstate database has inherent serialization points. Multiple threads trying to access it simultaneously don't speed things up - they create lock contention. Threads spend time waiting for database locks instead of doing useful work.
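The effect can be illustrated with a toy model (my own simplification, not Bitcoin Core code): Amdahl's law for the parallelizable share of the work, plus a per-thread cost for lock contention and context switching. The constants are made up for illustration:

```python
# Toy throughput model: a fixed fraction of the work is serialized
# behind the database lock, and each extra thread adds coordination
# overhead. The constants are illustrative, not measured values.
def relative_speedup(threads, serial_frac=0.6, overhead_per_thread=0.08):
    amdahl = serial_frac + (1.0 - serial_frac) / threads  # classic Amdahl's law
    contention = overhead_per_thread * (threads - 1)      # lock/context-switch cost
    return 1.0 / (amdahl + contention)

for n in (1, 2, 4, 8):
    print(f"{n} threads: {relative_speedup(n):.2f}x")
```

With these (made-up) constants, throughput peaks at 2 threads and drops below the single-thread baseline at 8 threads - the same qualitative shape as the par=8 regression above.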
What's Happening With par=8
With more threads active:
- More threads compete for the same database locks
- Context switching overhead increases
- Threads block each other on cs_main (Bitcoin Core's global lock)
- CPU cycles are wasted on synchronization, not validation
The result: CPU usage goes DOWN (threads waiting, not working) and throughput goes DOWN (more contention overhead).
Why AssumeUTXO Background Validation is Disk-Bound
During AssumeUTXO background validation, your node validates every block from genesis to the snapshot base (840,000 blocks). This is:
- Over 60 million UTXOs to track
- Billions of signature verifications
- Hundreds of GB of chainstate database operations
Research and profiling consistently show: disk I/O is the limiting factor, not CPU. Even high-end hardware sees disk reads at 150-220 MB/s while CPU cores sit at 30% utilization.
"Bitcoin Core's overall speed is significantly affected by the random-access speed of the chainstate directory. If your chainstate is on a magnetic disk, that will very probably be your biggest performance bottleneck."
What Actually Improved Performance
After understanding the problem, I reverted par to its default and instead increased dbcache from 4 GB to 8 GB:
# bitcoin.conf
dbcache=8000 # 8 GB cache (was 4000)
# par removed - default auto-detection is optimal
The logic: if disk I/O is the bottleneck, keep more data in RAM to reduce disk access. The dbcache parameter controls how much RAM Bitcoin Core uses for the UTXO cache.
Research shows that increasing dbcache from 4 GB to 8 GB yields approximately 10-24% performance improvement, depending on the validation phase.
More importantly: fewer disk flushes = less I/O contention = smoother validation.
Why Default par is Better
Bitcoin Core's default par=0 (auto-detect) was chosen after extensive testing. It typically uses one thread per core minus one, capped at 15, which balances parallelism against overhead.
For disk-bound workloads like background validation, 2-4 script verification threads are optimal. More than that creates diminishing returns or even regression due to contention.
The Right Hardware Makes The Difference
If validation is disk-bound, the solution isn't more CPU power - it's faster storage. The key insight:
- Chainstate database (~15-20 GB): needs NVMe SSD
- Block files (~600 GB): can stay on slow HDD
An NVMe SSD provides 100,000+ random IOPS compared to an HDD's ~100. That's a 1,000x improvement where it actually matters.
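Back-of-the-envelope arithmetic, using the round figures above, for one million random chainstate reads:

```python
# Rough arithmetic with the illustrative IOPS figures from the text.
random_reads = 1_000_000
hdd_iops = 100       # typical magnetic disk
nvme_iops = 100_000  # conservative NVMe figure

print(random_reads / hdd_iops / 3600)  # hours on HDD: ~2.8
print(random_reads / nvme_iops)        # seconds on NVMe: 10.0
```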
A practical setup:
• Chainstate on internal NVMe SSD
• Blocks on external USB HDD (cheap, capacity)
• 8-16 GB RAM for large dbcache
• Any modern CPU (even old i3 works fine)
This setup costs €150-250 total with used hardware.
Lessons from Real-World Testing
Through this optimization journey, several key lessons emerged:
1. Measure, Don't Assume
Low CPU utilization doesn't mean "not enough threads" - it often means "threads are blocked." Adding more threads to a contended resource makes things worse, not better.
2. Understand Your Bottleneck
Bitcoin validation has different phases with different bottlenecks:
- Early blocks (2009-2012): CPU-bound (few UTXOs, simple validation)
- Middle blocks (2013-2017): Disk-bound (UTXO set growth)
- Recent blocks (2018+): CPU-bound again (SegWit, Taproot complexity)
AssumeUTXO background validation goes through all phases, but the disk-bound middle section dominates the time.
3. Trust the Defaults
Bitcoin Core's default settings are well-tuned. The par parameter exists for specific use cases (extremely high-end hardware, or intentionally leaving cores free), not as a "make it faster" knob.
4. Cache is King
Every byte you can keep in RAM is a byte you don't have to fetch from disk. Increasing dbcache has a much larger impact than tuning thread counts.
The Broader Implications
This pattern - where adding parallelism degrades performance - appears throughout distributed systems. It's called thread contention and it's a fundamental challenge in concurrent programming.
Bitcoin Core developers have worked hard to reduce contention points (lock-free data structures, reduced cs_main scope), but some serialization is inherent to the problem: you can't update a database in parallel without coordination.
Background validation runs at low priority by design. The node is already usable (you can send transactions, validate new blocks), so there's no rush. Bitcoin Core keeps your system responsive and avoids thermal issues on low-power hardware.
The Future: SwiftSync
While Bitcoin Core doesn't currently provide user controls for background validation speed, there's promising development that will fundamentally change the equation: SwiftSync.
What is SwiftSync?
SwiftSync is a proposed optimization that uses "hints files" (~88-100 MB compressed) to indicate which UTXOs remain unspent at specific block heights. This enables near-stateless, fully parallelizable validation.
SwiftSync has demonstrated a 5.28x speedup of Initial Block Download compared to default Bitcoin Core settings:
• Standard IBD: ~41 hours
• With SwiftSync: ~8 hours
The speedup comes from eliminating unnecessary database writes for temporary UTXOs that are created and spent within the sync window.
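A toy sketch of that idea (my own illustration, not the actual SwiftSync design or data structures): if a hints file tells you up front which outputs are still unspent at the end of the sync window, then outputs created and spent inside the window never need to touch the database at all:

```python
# Toy model: blocks create and spend named outputs. The "hints" set
# lists outputs still unspent at the end of the sync window.
# This is an illustration of the write-skipping idea only, not SwiftSync.
def db_writes_naive(blocks):
    # one write per created output, one delete per spent output
    return sum(len(b["creates"]) + len(b["spends"]) for b in blocks)

def db_writes_with_hints(blocks, hints):
    # persist only the outputs that survive the whole window
    created = {out for b in blocks for out in b["creates"]}
    return len(created & hints)

blocks = [
    {"creates": {"a", "b"}, "spends": set()},
    {"creates": {"c"}, "spends": {"a"}},  # "a" lives and dies in-window
]
print(db_writes_naive(blocks))                   # 4 database operations
print(db_writes_with_hints(blocks, {"b", "c"}))  # 2 database operations
```

The real design also has to prove that the skipped outputs were in fact spent, which is where the aggregate validation work comes in; this sketch only shows why the database traffic shrinks.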
Impact on Background Validation
Importantly, SwiftSync doesn't just speed up initial sync - it also accelerates AssumeUTXO background validation. When James O'Beirne questioned whether SwiftSync was necessary given AssumeUTXO already provides a fast path to usability, developer Ruben Somsen clarified:
"SwiftSync speeds up assumeUTXO's background validation, making it a nice addition for users of assumeUTXO."
This means my 23-hour background validation estimate could potentially drop to 4-5 hours with SwiftSync - without requiring additional hardware or configuration changes.
Current Status
As of 2025, SwiftSync exists as a proof-of-concept implementation in Rust, with active development and testing showing promising results. However:
- It's not yet integrated into Bitcoin Core
- No timeline has been announced for mainnet release
- Community discussion continues on optimal implementation
For now, the optimizations discussed earlier (NVMe storage, large dbcache, default par settings) remain the best approach. But SwiftSync represents an exciting future where background validation completes much faster without the hardware investments currently required.
Practical Recommendations
If you're running a Bitcoin node and want optimal performance:
- Use NVMe for chainstate - this is non-negotiable for good performance
- Increase dbcache - 50% of available RAM is a safe starting point
- Leave par at default - or use par=-2 to leave 2 cores free
- Be patient - background validation takes days by design
Example configuration for 16 GB RAM system:
# bitcoin.conf
dbcache=8000 # 8 GB cache
maxmempool=300 # 300 MB mempool
par=0 # Auto-detect (default)
# or: par=-2 # Leave 2 cores free
Conclusion
The intuition that "more cores = faster" breaks down when the workload is limited by a serialized resource like disk I/O. In Bitcoin's case, throwing more threads at background validation creates contention that actually slows things down.
The real optimizations are:
- ✅ Fast storage (NVMe) for the critical path
- ✅ Large cache (dbcache) to minimize disk access
- ✅ Default thread settings that avoid contention
- ❌ Not adding more CPU cores
- ❌ Not increasing thread counts beyond defaults
Understanding where your system's true bottleneck lies - and addressing it rather than acting on assumptions - is the key to real performance improvements.
Need Help Optimizing Your Node?
I set up and optimize Bitcoin nodes on custom hardware - tuned for your specific use case. From home mining heat integration to high-performance validation rigs.
Get in Touch