Performance Benchmarks

Official performance benchmarks and testing methodology for Matchy.

Overview

Matchy provides built-in benchmarking via the matchy bench command. All benchmarks use real-world data patterns and measure build time, load time, and query throughput.

Running Benchmarks

Quick Benchmark

matchy bench ip

Runs the default IP benchmark (1M entries).

Custom Benchmark

matchy bench pattern --count 100000 --query-count 1000000

Benchmark Types

  • ip - IPv4 and IPv6 address lookups
  • literal - Exact string matching
  • pattern - Glob pattern matching
  • combined - Mixed workload (IPs + patterns)

See matchy bench command for full options.

Official Results

Generated with version 0.5.2 on Apple M-series hardware

IP Address Lookups

Configuration: 100,000 IPv4 addresses, 100,000 queries

| Metric | Value |
|---|---|
| Build time | 0.04s |
| Build rate | 2.76M IPs/sec |
| Database size | 586 KB |
| Load time | 0.54ms |
| Query throughput | 5.80M queries/sec |
| Query latency | 0.17µs |

Key characteristics:

  • Bounded lookup depth: at most 32 trie steps for IPv4 and 128 for IPv6 (one per address bit), independent of database size
  • Binary trie traversal
  • Cache-friendly sequential access
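
To make the bounded-depth claim concrete, here is a minimal sketch of a bitwise trie doing longest-prefix matching. It is a toy illustration of the general technique, not Matchy's implementation; the `BinaryTrie` class and its node layout are hypothetical, and a real database would keep one trie per address family.

```python
import ipaddress


class BinaryTrie:
    """Toy bitwise trie mapping CIDR prefixes to values (illustrative only)."""

    def __init__(self):
        # Each node is [child_for_bit_0, child_for_bit_1, value_or_None].
        self.root = [None, None, None]

    def insert(self, cidr, value):
        net = ipaddress.ip_network(cidr)
        bits = int(net.network_address)
        width = net.max_prefixlen  # 32 for IPv4, 128 for IPv6
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (width - 1 - i)) & 1
            if node[bit] is None:
                node[bit] = [None, None, None]
            node = node[bit]
        node[2] = value

    def lookup(self, address):
        """Return (longest-prefix value or None, number of trie steps taken)."""
        addr = ipaddress.ip_address(address)
        bits = int(addr)
        width = addr.max_prefixlen
        node = self.root
        best = node[2]
        steps = 0
        for i in range(width):
            node = node[(bits >> (width - 1 - i)) & 1]
            if node is None:
                break  # no deeper prefix can match
            steps += 1
            if node[2] is not None:
                best = node[2]  # remember the most specific match so far
        return best, steps


trie = BinaryTrie()
trie.insert("10.0.0.0/8", "internal")
trie.insert("10.1.2.3/32", "host")

for ip in ("10.1.2.3", "10.9.9.9", "192.168.0.1"):
    print(ip, trie.lookup(ip))
```

The step count never exceeds the address width, no matter how many prefixes are stored, which is why lookup cost does not grow with entry count.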

String Literal Matching

Configuration: 50,000 literal strings, 50,000 queries

| Metric | Value |
|---|---|
| Build time | 0.01s |
| Build rate | 4.03M literals/sec |
| Database size | 3.00 MB |
| Load time | 0.49ms |
| Query throughput | 4.58M queries/sec |
| Query latency | 0.22µs |

Key characteristics:

  • O(1) hash table lookups
  • FxHash for fast non-cryptographic hashing
  • Zero-copy memory access

Pattern Matching (Globs)

Configuration: 10,000 glob patterns, 50,000 queries

| Metric | Value |
|---|---|
| Build time | <0.01s |
| Build rate | 4.08M patterns/sec |
| Database size | 62 KB |
| Load time | 0.27ms |
| Query throughput | 4.57M queries/sec |
| Query latency | 0.22µs |

Key characteristics:

  • Aho-Corasick automaton
  • Parallel pattern matching
  • Glob wildcard support
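
The "parallel" property of an Aho-Corasick automaton is that one pass over the input reports matches for every pattern at once. The sketch below shows the textbook algorithm for plain literal patterns. It is not Matchy's code, and how Matchy maps glob wildcards onto the automaton is not covered here; `build_automaton` and `find_all` are hypothetical names.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto/fail/output tables for a set of literal patterns."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # fail[state] -> state to fall back to on a mismatch
    out = [set()]  # out[state] -> patterns that end at this state

    # Phase 1: build a trie of all patterns.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)

    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())  # depth-1 states fall back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit matches ending at the suffix
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every match, in a single pass."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


automaton = build_automaton(["he", "she", "his", "hers"])
print(sorted(find_all("ushers", automaton)))
```

Scan time depends on the input length and the number of matches, not on how many patterns were compiled, which is consistent with pattern throughput staying close to literal throughput in the tables above.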

Combined Database

Configuration: 10,000 IPs + 10,000 patterns, 50,000 queries

| Metric | Value |
|---|---|
| Build time | 0.01s |
| Build rate | 1.41M entries/sec |
| Database size | 2.29 MB |
| Load time | 0.46ms |
| Query throughput | 15.43K queries/sec |
| Query latency | 64.83µs |

Key characteristics:

  • Realistic mixed workload
  • Combined IP and pattern searches
  • Production-like performance
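
The latency figures in the four result tables are consistent with the reciprocal of single-threaded throughput. A short worked check, using only the throughput values reported above (the helper name `latency_us` is hypothetical):

```python
def latency_us(queries_per_sec):
    """Mean per-query latency in microseconds for a given throughput."""
    return 1_000_000 / queries_per_sec


# Throughput values copied from the result tables above.
reported_qps = {
    "ip": 5.80e6,
    "literal": 4.58e6,
    "pattern": 4.57e6,
    "combined": 15.43e3,
}

for name, qps in reported_qps.items():
    print(f"{name:9s} {latency_us(qps):8.2f} µs")
```

The computed values agree with the tables to within rounding of the published throughput figures.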

Performance Factors

Database Size

| Entries | Build Time | Query Throughput |
|---|---|---|
| 10K | <0.01s | 6.5M queries/sec |
| 100K | 0.04s | 5.8M queries/sec |
| 1M | 0.35s | 5.2M queries/sec |
| 10M | 3.5s | 4.8M queries/sec |

Query throughput drops by only about 26% (6.5M to 4.8M queries/sec) as the database grows 1,000-fold from 10K to 10M entries, due to memory-mapped access and efficient data structures.

Hit Rate Impact

| Hit Rate | Throughput | Notes |
|---|---|---|
| 0% | 6.2M/sec | Early termination |
| 10% | 5.8M/sec | Default benchmark |
| 50% | 5.5M/sec | Realistic workload |
| 100% | 5.0M/sec | Data extraction overhead |

Throughput falls by about 19% between a 0% and a 100% hit rate (6.2M to 5.0M queries/sec), because every hit adds result extraction work that a miss avoids.

Trusted Mode

| Mode | Throughput | Notes |
|---|---|---|
| Safe | 4.9M/sec | UTF-8 validation |
| Trusted | 5.8M/sec | ~18% faster |

Use the --trusted flag only for databases you built or otherwise control, since it skips the UTF-8 validation performed in safe mode.

Memory Usage

Per-Database Overhead

  • Handle: ~200 bytes
  • File mapping: 0 bytes (OS-managed)
  • Query state: 0 bytes (stack-allocated)

Sharing Between Processes

With 10 processes using a 1GB database:

  • Without mmap: 10 × 1GB = 10GB RAM
  • With mmap: ~1GB RAM (shared pages)

Memory-mapped databases are shared between processes automatically by the OS.
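
The sharing behavior comes from the operating system's page cache rather than from anything specific to Matchy. A minimal sketch using Python's standard `mmap` module illustrates the idea; the temporary file stands in for a database file, and `open_readonly_mapping` is a hypothetical helper.

```python
import mmap
import os
import tempfile


def open_readonly_mapping(path):
    """Map a whole file read-only; the OS loads pages lazily on first access."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file. The mapping remains valid after
        # the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


# Create a small stand-in "database" file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADER" + bytes(4096))
    path = tmp.name

first = open_readonly_mapping(path)
# A second mapping of the same file, as another process would create.
# Both mappings are backed by the same cached file pages.
second = open_readonly_mapping(path)

header = first[:6]          # touches only the page holding the header
mapped_size = len(second)   # whole file is addressable without reading it
print(header, mapped_size)

first.close()
second.close()
os.remove(path)
```

Because read-only mappings of one file are backed by the same physical pages, ten processes mapping a 1GB file need roughly 1GB of RAM in total, as stated above, rather than ten private copies.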

Scalability

Vertical Scaling

  • Single-threaded: 5.8M queries/sec
  • 4 threads: 23M queries/sec (4×)
  • 8 threads: 46M queries/sec (8×)

Throughput scales linearly with thread count because queries are thread-safe, read-only operations.
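
As a quick check of the linear-scaling claim, scaling efficiency can be computed as measured throughput divided by thread count times single-threaded throughput. The figures below are the ones listed above; `scaling_efficiency` is a hypothetical helper.

```python
SINGLE_THREAD_QPS = 5.8e6

# Thread count -> measured queries/sec, from the list above.
measured = {4: 23e6, 8: 46e6}


def scaling_efficiency(threads, qps, base=SINGLE_THREAD_QPS):
    """Fraction of ideal linear scaling achieved (1.0 means perfectly linear)."""
    return qps / (threads * base)


for threads, qps in measured.items():
    print(f"{threads} threads: {scaling_efficiency(threads, qps):.1%} of linear")
```

Both configurations come out at about 99% of ideal, which matches the rounded 4× and 8× multipliers.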

Horizontal Scaling

Multiple servers can use the same database:

  • NFS/shared storage: All servers access one copy
  • Local copies: Each server loads independently
  • Hot reload: Update without restart

Comparison to Alternatives

vs. Traditional Databases

| Feature | Matchy | PostgreSQL | Redis |
|---|---|---|---|
| IP lookups/sec | 5.8M | 50K | 200K |
| Pattern matching | Yes | Slow | No |
| Memory usage | Low (mmap) | High | High |
| Startup time | <1ms | Seconds | Seconds |
| Concurrent reads | Unlimited | Limited | Limited |

vs. In-Memory Structures

| Feature | Matchy | HashMap | Regex Set |
|---|---|---|---|
| Query speed | 5.8M/sec | 10M/sec | 10K/sec |
| Memory | O(1) | O(n) | O(n) |
| Load time | <1ms | Seconds | Seconds |
| Persistence | Built-in | Manual | Manual |

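Matchy gives up some query speed relative to an in-memory HashMap (5.8M vs. 10M queries/sec) in exchange for lower memory use, sub-millisecond load time, and built-in persistence.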

Benchmarking Methodology

Data Generation

Benchmarks use realistic synthetic data:

  • IPs: Mix of /32 addresses and CIDR ranges
  • Literals: Domain-like strings
  • Patterns: Realistic glob patterns

Measurement

  1. Build time: Time to compile entries
  2. Save time: Disk write performance
  3. Load time: Memory-mapping overhead (averaged over 3 runs)
  4. Query time: Batch query throughput
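
The measurement steps can be sketched as a small timing harness. This is an illustration of the methodology only: a Python `dict` stands in for the database, `build`, `load`, and `run_queries` are hypothetical stand-ins, and the save step is omitted because it depends on the on-disk format.

```python
import time


def timed(fn, *args, runs=1):
    """Return (last_result, mean_seconds) over `runs` calls of fn(*args)."""
    total = 0.0
    result = None
    for _ in range(runs):
        start = time.perf_counter()
        result = fn(*args)
        total += time.perf_counter() - start
    return result, total / runs


def build(entries):
    return dict(entries)  # stand-in for compiling entries into a database


def load(db):
    return db  # stand-in for memory-mapping a saved database file


def run_queries(db, queries):
    return sum(1 for q in queries if q in db)  # number of hits


entries = [(f"host{i}.example.com", i) for i in range(100_000)]
# Every tenth generated name falls inside the entry range: a 10% hit rate.
queries = [f"host{i}.example.com" for i in range(0, 1_000_000, 10)]

db, build_seconds = timed(build, entries)
db, load_seconds = timed(load, db, runs=3)  # averaged over 3 runs
hits, query_seconds = timed(run_queries, db, queries)

qps = len(queries) / query_seconds
print(f"build {build_seconds:.4f}s  load {load_seconds * 1e3:.4f}ms")
print(f"{qps / 1e6:.2f}M queries/sec, hit rate {hits / len(queries):.0%}")
```

Timing the whole query batch with one pair of clock reads, rather than timing each query, keeps timer overhead out of sub-microsecond measurements.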

Hardware

Official benchmarks run on:

  • CPU: Apple M-series (ARM64)
  • RAM: 16GB+
  • Storage: SSD

Results vary by hardware but relative performance remains consistent.

Reproducing Benchmarks

Local Testing

# IP benchmark
matchy bench ip -n 100000 --query-count 100000

# Pattern benchmark
matchy bench pattern -n 10000 --query-count 50000

# Combined benchmark
matchy bench combined -n 20000 --query-count 50000

Continuous Integration

# Run benchmarks and check for regressions
matchy bench ip > results.txt
grep "QPS" results.txt
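
A regression check needs a baseline and a tolerance in addition to the raw number. The sketch below assumes the saved output contains a throughput figure written like the ones in this document (for example "5.80M queries/sec"). That format is an assumption; the actual matchy bench output may label the value differently, so adjust the pattern to match. `parse_qps` and `regressed` are hypothetical helpers.

```python
import re

_UNITS = {"": 1.0, "K": 1e3, "M": 1e6}


def parse_qps(text):
    """Return the first '<number>[K|M] queries/sec' figure as a float, or None.

    The expected format is an assumption, not a documented output contract.
    """
    match = re.search(r"(\d+(?:\.\d+)?)\s*([KM]?)\s*queries/sec", text)
    if match is None:
        return None
    return float(match.group(1)) * _UNITS[match.group(2)]


def regressed(current_qps, baseline_qps, tolerance=0.10):
    """True if throughput fell more than `tolerance` below the baseline."""
    return current_qps < baseline_qps * (1 - tolerance)


sample_output = "Query throughput: 5.80M queries/sec"  # hypothetical line
current = parse_qps(sample_output)
print(current, regressed(current, baseline_qps=5.8e6))
```

In a CI job, the script would read results.txt instead of `sample_output` and exit with a non-zero status when `regressed` returns True. A tolerance of 10% is an arbitrary starting point; shared CI runners often need a wider margin.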

Custom Workloads

# Build your own database
matchy build -i custom.csv -o test.mxy

# Benchmark it
time matchy query test.mxy < queries.txt

Performance Tuning

For Best Query Performance

  1. Use --trusted for controlled databases
  2. Reuse database handles
  3. Use memory-mapped files (automatic)
  4. Keep database on fast storage (SSD)
  5. Use direct IP lookup when possible

For Best Build Performance

  1. Sort input data by type
  2. Use batch additions
  3. Pre-allocate if entry count known
  4. Use multiple builders in parallel

For Lowest Memory

  1. Use memory-mapped mode (default)
  2. Share databases between processes
  3. Close unused databases promptly
  4. Use validated mode (skips validation cache)

See Also