Performance Benchmarking

SSI uses a two-tier benchmarking system to track performance over time and investigate regressions.

Tier 1: Timing

./scripts/benchmark-timing.py

Runs timing measurements across benchmark sites. Takes approximately 4–7 minutes. Results are stored in benchmarks/timing-history.toml and published to the ssi-benchmarks Codeberg repo.

Run after every commit to main to maintain a performance history.

Tier 2: Profiling

./scripts/benchmark-profiling.py

Runs DWARF profiling, generates flamegraphs, and produces differential analysis against a baseline. Takes approximately 15–20 minutes. Requires Tier 1 data for the current version. Results are published to the ssi-benchmarks Codeberg repo.

Run when timing measurements show changes worth investigating.

Benchmark Sites

Benchmark sites are located in tests/test-examples/ alongside the functional test sites. Each benchmark site is sized to produce meaningful timing measurements and exercises specific SSI features.

Validation Performance (v0.89+)

v0.89 separated validation logic from the build pipeline, which significantly improved build performance.

Impact:

Build speed: 5–7% improvement in standard deployments
Up to 44.9% improvement observed in some configurations (79.01ms → 43.51ms average)

Architecture change: Validation now runs only when explicitly requested via ssi validate, removing the overhead from the standard build path.

# Before: validation always active
ssi deploy site/ output/  # 79.01ms average

# After: validation optional
ssi deploy site/ output/  # 43.51ms average (validation disabled)
ssi validate site/        # Full validation on demand
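
The 44.9% figure follows directly from the two averages quoted above. As a quick check, the sketch below recomputes it. The helper name percent_improvement is illustrative and is not part of SSI or its benchmark scripts.

```python
def percent_improvement(before_ms: float, after_ms: float) -> float:
    """Return the reduction in time as a percentage of the 'before' value."""
    return (before_ms - after_ms) / before_ms * 100.0


# Averages quoted in this section: 79.01ms before, 43.51ms after.
improvement = percent_improvement(79.01, 43.51)
print(f"{improvement:.1f}%")  # prints "44.9%"
```

The same helper can be applied to any pair of entries from the timing history when comparing two versions.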

Common Bottlenecks

Based on profiling:

File I/O: Batch operations where possible
String allocations: Reuse buffers in hot paths
Path resolution: Cache resolved paths
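
As a language-neutral illustration of the path-caching advice, here is a minimal Python sketch that memoizes path resolution. It is not SSI's implementation. The function name resolve is hypothetical, and the sketch assumes the filesystem does not change during a run, since cached results would otherwise go stale.

```python
import os
from functools import lru_cache


@lru_cache(maxsize=None)
def resolve(path: str) -> str:
    """Resolve a path once; repeated calls with the same string hit the cache."""
    return os.path.realpath(path)


if __name__ == "__main__":
    resolve(".")                 # first call does the filesystem work
    resolve(".")                 # second call is served from the cache
    print(resolve.cache_info())  # reports hits, misses, and cache size
```

Call resolve.cache_clear() between builds if files may have moved.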

Token matching uses the Aho-Corasick algorithm (via the daachorse crate), which scans for all configured tokens in a single pass, so it is not a bottleneck.
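
To show why a single pass suffices, the following is a small textbook Aho-Corasick sketch in Python. It is only an illustration of the algorithm: daachorse is a Rust crate with its own, more compact automaton representation, and the token strings used here are hypothetical rather than SSI's actual tokens.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for the given patterns."""
    goto = [{}]  # goto[state][char] -> next state
    fail = [0]   # fail[state] -> state for the longest proper suffix
    out = [[]]   # out[state] -> patterns that end at this state

    # Phase 1: insert every pattern into a trie.
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)

    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Scan text once and return (start_index, pattern) for every match."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches


if __name__ == "__main__":
    # Hypothetical token names, for illustration only.
    automaton = build_automaton(["{{title}}", "{{date}}"])
    print(find_all("<h1>{{title}}</h1> {{date}}", automaton))
    # prints [(4, '{{title}}'), (19, '{{date}}')]
```

The scan visits each input character once regardless of how many tokens are configured, so matching cost grows with input size rather than with the number of tokens.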

Interpreting Results

Single runs can be noisy; watch patterns over time
System load affects timing measurements
Use instruction counts alongside wall time for more stable comparisons
Profile with flamegraphs before optimizing
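
Because single runs are noisy, it helps to summarize several runs rather than compare individual numbers. The sketch below is a generic illustration of that idea and is not how benchmark-timing.py works internally. The function names time_runs and summarize are assumptions made for this example.

```python
import statistics
import time
from typing import Callable


def time_runs(fn: Callable[[], object], runs: int = 10) -> list[float]:
    """Call fn `runs` times and return each wall-clock duration in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples: list[float]) -> dict[str, float]:
    """Summarize samples with statistics that tolerate a few noisy runs."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Stand-in workload; replace with the operation being measured.
    samples = time_runs(lambda: sum(range(100_000)))
    print(summarize(samples))
```

The median and minimum are less sensitive to background load than the mean, and a large standard deviation suggests the system was not idle during the run.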

Troubleshooting

Inconsistent timing results

Ensure the system is idle during benchmarking
Disable CPU frequency scaling
Close unnecessary applications

Missing baseline data

Run ./scripts/benchmark-timing.py to generate the baseline data
Commit the updated history file
Ensure version numbers are current, since Tier 2 requires Tier 1 data for the current version