Super-Simple Includes Documentation

v0.244.0

Performance Benchmarking

SSI uses a two-tier benchmarking system to track performance over time and investigate regressions.

Tier 1: Timing

./scripts/benchmark-timing.py

Runs timing measurements across benchmark sites. Takes approximately 4–7 minutes. Results are stored in benchmarks/timing-history.toml and published to the ssi-benchmarks Codeberg repo.

Run after every commit to the main branch to maintain a performance history.
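This page does not describe how the timing script works internally. A single timing measurement usually looks something like the minimal sketch below. A few warm-up runs are discarded, and then the workload is timed repeatedly and summarised by its median and range. `time_runs` and its parameters are hypothetical. In practice the timed callable would launch an SSI build (for example through `subprocess.run`) rather than the stand-in workload used here.

```python
import statistics
import time
from typing import Callable


def time_runs(fn: Callable[[], object], runs: int = 5, warmup: int = 1) -> dict:
    """Time `fn` repeatedly and summarise wall-clock results (hypothetical helper)."""
    # Warm-up runs populate filesystem and CPU caches; their timings are discarded.
    for _ in range(warmup):
        fn()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": runs,
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would invoke the ssi binary instead.
    result = time_runs(lambda: sum(range(100_000)), runs=5)
    print(f"median {result['median_ms']:.2f} ms over {result['runs']} runs")
```

The median is reported rather than the mean because a single slow outlier run shifts it far less.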

Tier 2: Profiling

./scripts/benchmark-profiling.py

Runs DWARF profiling, generates flamegraphs, and produces differential analysis against a baseline. Takes approximately 15–20 minutes. Requires Tier 1 data for the current version. Results are published to the ssi-benchmarks Codeberg repo.

Run when timing measurements show changes worth investigating.
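This page does not specify the exact method behind the differential analysis. One common approach, used by differential flamegraphs, compares per-stack sample counts between a baseline profile and the current one.

The sketch below assumes profiles in the "folded" stack format (`frame;frame;frame count`, one stack per line). `parse_folded` and `diff_folded` are hypothetical names. Each stack's count is normalised by its profile's total, so runs of different lengths remain comparable.

```python
from collections import Counter


def parse_folded(text: str) -> Counter:
    """Parse folded stacks ("a;b;c 42" per line) into {stack: samples}."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The sample count is the last space-separated field.
        stack, _, count = line.rpartition(" ")
        counts[stack] += int(count)
    return counts


def diff_folded(baseline: Counter, current: Counter) -> list:
    """Return (stack, share_delta) pairs, largest absolute change first.

    Shares are fractions of each profile's total samples, so a positive
    delta means the stack takes a larger share of time in `current`.
    """
    base_total = sum(baseline.values()) or 1
    cur_total = sum(current.values()) or 1
    stacks = set(baseline) | set(current)
    deltas = [
        (stack, current[stack] / cur_total - baseline[stack] / base_total)
        for stack in stacks
    ]
    return sorted(deltas, key=lambda item: abs(item[1]), reverse=True)
```

For example, suppose a `read_file` stack drops from 60 of 100 samples to 30 of 100, while a `render` stack rises from 40 to 70. Both stacks then report a share change of 0.3, in opposite directions.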

Benchmark Sites

Benchmark sites are located in tests/test-examples/ alongside the functional test sites. Each benchmark site is sized to produce meaningful timing measurements and exercises specific SSI features.

Validation Performance (v0.89+)

A significant improvement was made in v0.89 by separating validation logic from the build pipeline:

Impact:

  • Build speed: 5–7% improvement in standard deployments
  • Up to 44.9% improvement observed in some configurations (79.01ms → 43.51ms average)
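The 44.9% figure follows directly from the two averages, which also correspond to a speedup of roughly 1.82×:

```latex
\frac{79.01\,\text{ms} - 43.51\,\text{ms}}{79.01\,\text{ms}} = \frac{35.50}{79.01} \approx 0.449 = 44.9\%,
\qquad
\frac{79.01\,\text{ms}}{43.51\,\text{ms}} \approx 1.82
```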

Architecture change: Validation now runs only when explicitly requested via ssi validate, removing the overhead from the standard build path.

# Before: validation always active
ssi deploy site/ output/  # 79.01ms average

# After: validation optional
ssi deploy site/ output/  # 43.51ms average (validation disabled)
ssi validate site/        # Full validation on demand
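The architecture change can be modelled as a build function that never calls validation, with validation exposed as its own entry point. This is a hypothetical Python sketch of the idea, not SSI's implementation (SSI is written in Rust). The names `deploy`, `validate`, `render`, and `check_page` and the "empty page" rule are all illustrative.

```python
from pathlib import Path


def render(page: Path) -> str:
    # Placeholder for include processing; a real build would expand includes here.
    return page.read_text(encoding="utf-8")


def check_page(page: Path) -> list[str]:
    # Hypothetical validation rule: flag pages with no content.
    if not page.read_text(encoding="utf-8").strip():
        return [f"{page.name}: empty page"]
    return []


def deploy(site: Path, output: Path) -> int:
    """Build every page; no validation work happens on this path."""
    output.mkdir(parents=True, exist_ok=True)
    count = 0
    for page in sorted(site.glob("*.html")):
        (output / page.name).write_text(render(page), encoding="utf-8")
        count += 1
    return count


def validate(site: Path) -> list[str]:
    """Run validation only when explicitly requested."""
    problems = []
    for page in sorted(site.glob("*.html")):
        problems.extend(check_page(page))
    return problems
```

Keeping the two entry points separate means the cost of `check_page` is paid only by `validate`, never by `deploy`.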

Common Bottlenecks

Based on profiling, the most common bottlenecks and their remedies are:

  1. File I/O: Batch operations where possible
  2. String allocations: Reuse buffers in hot paths
  3. Path resolution: Cache resolved paths
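The three remedies can be illustrated together in a short Python sketch. It is not SSI's code, and `resolve_include`, `render_all`, `write_batch`, and the include layout are hypothetical. The sketch does three things:

  • Memoises resolved paths with `functools.lru_cache`.
  • Reuses a single `io.StringIO` buffer across pages instead of allocating a new string builder per page.
  • Writes all output in one batch after rendering finishes.

```python
import io
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def resolve_include(base_dir: str, name: str) -> Path:
    """Resolve an include path once; repeated lookups are served from the cache."""
    return (Path(base_dir) / name).resolve()


def render_all(pages: dict[str, list[str]], base_dir: str) -> dict[str, str]:
    """Render each page from its list of include names, reusing one buffer."""
    buf = io.StringIO()
    rendered = {}
    for page_name, includes in pages.items():
        # Clear and reuse the same buffer rather than creating a new one.
        buf.seek(0)
        buf.truncate(0)
        for name in includes:
            buf.write(resolve_include(base_dir, name).read_text(encoding="utf-8"))
        rendered[page_name] = buf.getvalue()
    return rendered


def write_batch(output_dir: Path, rendered: dict[str, str]) -> None:
    """Write every rendered page in a single pass once rendering is done."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for name, text in rendered.items():
        (output_dir / name).write_text(text, encoding="utf-8")
```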

Token matching uses the Aho-Corasick algorithm (via the daachorse crate), which scans for all configured tokens in a single pass over the input, so it is not a bottleneck.
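To show why a single pass is enough, here is a compact Aho-Corasick matcher in Python. It illustrates the technique only. It is not daachorse's API, and the textbook tokens below stand in for SSI's actual token syntax.

The automaton is a trie of all tokens plus failure links. On a mismatch the matcher follows a failure link instead of rewinding, so the input is read once however many tokens are configured.

```python
from collections import deque


def build_automaton(tokens: list[str]):
    """Build goto/fail/output tables for an Aho-Corasick automaton."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # failure link for each state
    output = [[]]  # tokens recognised on reaching each state
    for token in tokens:
        state = 0
        for ch in token:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(token)
    # Breadth-first traversal: failure links of shallower states are ready first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state (suffix tokens).
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text: str, automaton):
    """Yield (start_index, token) for every match in one left-to-right pass."""
    goto, fail, output = automaton
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for token in output[state]:
            yield i - len(token) + 1, token
```

For example, `list(find_all("ushers", build_automaton(["he", "she", "his", "hers"])))` returns `[(1, 'she'), (2, 'he'), (2, 'hers')]`. All three overlapping matches are found while each of the six characters is read once.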

Interpreting Results

  • Single runs can be noisy; watch patterns over time
  • System load affects timing measurements
  • Use instruction counts alongside wall time for more stable comparisons
  • Profile with flamegraphs before optimizing
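One way to watch patterns over time is to flag a new result only when it falls outside the run-to-run noise seen in recent history. The sketch below measures that noise with the median and the median absolute deviation (MAD) of past results.

The function name, the default band of 3 MADs, and the 1% minimum band are illustrative assumptions, not values taken from SSI's scripts. The check works the same for instruction counts as for wall-clock times.

```python
import statistics


def classify(history: list[float], latest: float, k: float = 3.0, floor: float = 0.01) -> str:
    """Classify `latest` against past results (lower is better).

    The tolerance band is k times the median absolute deviation, but never
    narrower than `floor` (a fraction) of the median, so a perfectly flat
    history does not flag tiny changes.
    """
    if len(history) < 3:
        return "insufficient data"
    centre = statistics.median(history)
    mad = statistics.median(abs(x - centre) for x in history)
    band = max(k * mad, floor * centre)
    if latest > centre + band:
        return "regression"
    if latest < centre - band:
        return "improvement"
    return "noise"
```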

Troubleshooting

Inconsistent timing results

  • Ensure the system is idle during benchmarking
  • Disable CPU frequency scaling
  • Close unnecessary applications
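Parts of this checklist can be checked automatically before a run. The sketch below is Linux-specific and hypothetical. It reads the CPU frequency governor from sysfs at `/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor`, which may be absent in virtual machines and containers. It also reads the one-minute load average from `os.getloadavg()`.

The 0.5 load threshold is an arbitrary illustrative value. Switching the governor to `performance` (for example with `cpupower frequency-set -g performance`) is one common way to stop frequency scaling.

```python
import os
from pathlib import Path

GOVERNOR_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")


def preflight_warnings(governor_path: Path = GOVERNOR_PATH, max_load: float = 0.5) -> list[str]:
    """Return warnings about conditions that commonly add timing noise."""
    warnings = []
    try:
        governor = governor_path.read_text().strip()
    except OSError:
        governor = None  # no cpufreq interface available on this system
    if governor is not None and governor != "performance":
        warnings.append(f"CPU governor is '{governor}'; frequency scaling may skew timings")
    if hasattr(os, "getloadavg"):  # not available on Windows
        load1, _, _ = os.getloadavg()
        if load1 > max_load:
            warnings.append(f"1-minute load average is {load1:.2f}; the system is not idle")
    return warnings


if __name__ == "__main__":
    for message in preflight_warnings():
        print("warning:", message)
```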

Missing baseline data

  • Run ./scripts/benchmark-timing.py to generate timing data for the current version
  • Commit the updated history file (benchmarks/timing-history.toml)
  • Ensure version numbers are current