matchy bench

Benchmark matchy by generating test databases and measuring build time, load time, and query performance.

Synopsis

matchy bench [OPTIONS] [TYPE]

Description

The matchy bench command generates synthetic test databases of various types and sizes, then benchmarks:

Build time: How long it takes to create the database
Load time: How long it takes to open/memory-map the database
Query performance: Throughput and latency for lookups

This is useful for performance testing, capacity planning, and comparing different database types and configurations.

Arguments

`[TYPE]`

Type of database to benchmark. Default: ip

Valid values:

ip - IP address databases
literal - Exact string match databases
pattern - Glob pattern databases
combined - Mixed database with all entry types

matchy bench ip         # Benchmark IP lookups
matchy bench pattern    # Benchmark pattern matching
matchy bench combined   # Benchmark mixed workload

Options

`-n, --count <COUNT>`

Number of entries to test with. Default: 1000000

matchy bench ip --count 100000      # Small database
matchy bench ip --count 10000000    # Large database

`-o, --output <OUTPUT>`

Output file for the test database. If not specified, uses a temporary file.

matchy bench pattern --output test.mxy

`-k, --keep`

Keep the generated database file after benchmarking (otherwise it's deleted).

matchy bench ip --output bench.mxy --keep

`--load-iterations <LOAD_ITERATIONS>`

Number of load iterations to average. Default: 3

matchy bench ip --load-iterations 10

`--query-count <QUERY_COUNT>`

Number of queries to run in the batch benchmark. Default: 100000

matchy bench ip --query-count 1000000  # 1M queries

`--hit-rate <HIT_RATE>`

Percentage of queries that should match (0-100). Default: 10

A lower hit rate tests "not found" performance, while a higher hit rate tests match performance.

matchy bench ip --hit-rate 50    # 50% of queries find matches
matchy bench ip --hit-rate 90    # 90% of queries find matches
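
To make the hit-rate idea concrete, here is a minimal Python sketch of how a query set with a given hit rate could be assembled. It is an illustration only, not matchy's actual generator; `build_queries`, its parameters, and the `miss-` key prefix are hypothetical.

```python
import random

def build_queries(known_keys, query_count, hit_rate, seed=0):
    """Return query_count queries, of which hit_rate percent are known keys."""
    rng = random.Random(seed)
    hits = round(query_count * hit_rate / 100)
    # Hits: sampled from keys that exist in the database
    queries = [rng.choice(known_keys) for _ in range(hits)]
    # Misses: keys that cannot be in known_keys (hypothetical "miss-" prefix)
    queries += [f"miss-{i}" for i in range(query_count - hits)]
    rng.shuffle(queries)
    return queries

known = [f"key-{i}" for i in range(100)]
known_set = set(known)
queries = build_queries(known, query_count=1000, hit_rate=10)
matched = sum(q in known_set for q in queries)
print(f"{matched} of {len(queries)} queries match")
```

With a hit rate of 10, most lookups in this sketch exercise the "not found" path, which is the behavior the option description refers to.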

`--trusted`

Trust the database and skip UTF-8 validation. This is faster, but use it only with databases from trusted sources.

matchy bench pattern --trusted

`--pattern-style <PATTERN_STYLE>`

Pattern style for pattern benchmarks. Default: complex

Valid values:

prefix - Prefix patterns like prefix*
suffix - Suffix patterns like *.suffix
mixed - Mix of prefix and suffix
complex - Complex patterns with wildcards and character classes

matchy bench pattern --pattern-style prefix
matchy bench pattern --pattern-style complex

`-h, --help`

Print help information.

Examples

Basic IP Benchmark

$ matchy bench ip --count 1000

Pattern Benchmark with Custom Settings

$ matchy bench pattern --count 500 --pattern-style prefix

Combined Benchmark

$ matchy bench combined --count 300

Save Benchmark Database

matchy bench ip --count 1000000 --output benchmark.mxy --keep

This creates a database you can inspect or query later:

matchy inspect benchmark.mxy
matchy query benchmark.mxy "192.0.2.1"

High Hit Rate Benchmark

matchy bench ip --hit-rate 90 --query-count 1000000

Tests performance when most queries find matches (realistic for allowlist/blocklist scenarios).

Low Hit Rate Benchmark

matchy bench ip --hit-rate 5 --query-count 1000000

Tests "not found" performance (realistic for threat intelligence databases where most IPs are not threats).

Benchmark Types

IP Benchmarks

Generates random IPv4 and IPv6 addresses:

Mix of /32 addresses and CIDR ranges
Realistic distribution
Tests binary trie performance
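
For readers unfamiliar with the data structure, the following Python sketch shows a longest-prefix match over a binary trie. It is a teaching example built on the standard `ipaddress` module, not matchy's implementation; the `BinaryTrie` class and its method names are assumptions made for illustration.

```python
import ipaddress

class BinaryTrie:
    """Toy binary trie: one dict node per address bit, separate roots per IP version."""

    def __init__(self):
        self.roots = {4: {}, 6: {}}

    def insert(self, cidr, value):
        net = ipaddress.ip_network(cidr)
        bits = format(int(net.network_address), f"0{net.max_prefixlen}b")
        node = self.roots[net.version]
        # Walk only the prefix bits, creating nodes as needed
        for bit in bits[:net.prefixlen]:
            node = node.setdefault(bit, {})
        node["value"] = value

    def lookup(self, address):
        addr = ipaddress.ip_address(address)
        bits = format(int(addr), f"0{addr.max_prefixlen}b")
        node, best = self.roots[addr.version], None
        for bit in bits:
            # Remember the most specific match seen so far
            if "value" in node:
                best = node["value"]
            node = node.get(bit)
            if node is None:
                return best
        return node.get("value", best)

trie = BinaryTrie()
trie.insert("192.0.2.0/24", "range")
trie.insert("192.0.2.1/32", "host")
print(trie.lookup("192.0.2.1"), trie.lookup("192.0.2.7"), trie.lookup("198.51.100.1"))
```

A lookup visits at most one node per address bit, so its cost is bounded by the address length rather than by the number of entries.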

Literal Benchmarks

Generates random strings:

Domain-like strings (e.g., subdomain.example.com)
Tests hash table performance
O(1) average-case lookup complexity

Pattern Benchmarks

Generates glob patterns based on style:

Prefix: prefix* patterns
Suffix: *.suffix patterns
Mixed: Combination of prefix and suffix
Complex: Wildcards, character classes [abc], negation [!xyz]

Tests Aho-Corasick automaton performance.
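
The pattern styles above can be tried out with Python's standard `fnmatch` module, as in the sketch below. This only illustrates glob semantics; it does not use an Aho-Corasick automaton, and matchy's glob syntax may differ from Python's in details. The sample patterns and hostnames are made up for the example.

```python
from fnmatch import fnmatchcase

# (pattern, text) pairs, one per style described above
examples = {
    "prefix":  ("mail*", "mail.example.com"),
    "suffix":  ("*.example.com", "www.example.com"),
    "class":   ("host[abc].example.com", "hostb.example.com"),
    "negated": ("host[!xyz].example.com", "hostx.example.com"),
}

results = {
    name: fnmatchcase(text, pattern)
    for name, (pattern, text) in examples.items()
}

for name, matched in results.items():
    pattern, text = examples[name]
    print(f"{name:8} {pattern!r} vs {text!r}: {matched}")
```

The negated case does not match because `x` is in the excluded set `[!xyz]`.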

Combined Benchmarks

Generates databases with all three types:

Equal distribution (33.3% each)
Tests mixed workload performance
Realistic production scenario

Performance Factors

Benchmark results depend on:

Database Size

Larger databases → slightly slower queries
Build time scales linearly
Load time remains constant (memory-mapped)

Entry Type

IPs: Very fast (~7M queries/sec)
Literals: Very fast (~8M queries/sec)
Patterns: Moderate (~1-2M queries/sec)

Hit Rate

High hit rate → slightly slower (data extraction overhead)
Low hit rate → faster (early termination)

Hardware

CPU speed affects query throughput
RAM speed affects load performance
Storage type affects build time

Pattern Complexity

Simple patterns (prefix/suffix) → faster
Complex patterns → slower
More patterns → more states to traverse

Interpreting Results

Build Time

How long it takes to compile entries into optimized format:

1M entries: ~1-3 seconds (typical)
Scales approximately linearly
One-time cost

Load Time

How long it takes to memory-map the database:

Should be <1ms for any size
Instant startup time
Memory-mapped: pages are read on demand rather than loaded into RAM up front

Query Performance

Good performance:

IPs: >5M queries/sec
Literals: >6M queries/sec
Patterns: >1M queries/sec

Acceptable performance:

IPs: 2-5M queries/sec
Literals: 3-6M queries/sec
Patterns: 500k-1M queries/sec

Investigate if slower:

Check system load
Verify no swap usage
Check disk I/O (shouldn't be any after load)
Try --trusted flag
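
The thresholds above can be encoded in a small helper for scripted checks. This is a sketch that copies the numbers from this page; the `rate` function and `THRESHOLDS` table are hypothetical names, and the measured queries/sec value has to be read from the benchmark output by hand.

```python
# (good_above, acceptable_from) in queries/sec, taken from the lists above
THRESHOLDS = {
    "ip": (5_000_000, 2_000_000),
    "literal": (6_000_000, 3_000_000),
    "pattern": (1_000_000, 500_000),
}

def rate(kind, queries_per_sec):
    """Classify a measured throughput as good, acceptable, or investigate."""
    good_above, acceptable_from = THRESHOLDS[kind]
    if queries_per_sec > good_above:
        return "good"
    if queries_per_sec >= acceptable_from:
        return "acceptable"
    return "investigate"

print(rate("ip", 7_000_000))
print(rate("literal", 4_000_000))
print(rate("pattern", 300_000))
```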

Use Cases

Capacity Planning

# Test with production-sized database
matchy bench combined --count 5000000 --query-count 10000000

Use results to estimate:

Queries your system can handle
Memory requirements
Build time for updates
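
As a rough aid for these estimates, the arithmetic can be sketched in Python. The functions below are hypothetical helpers: the build estimate relies on the "scales approximately linearly" observation from this page, and every input is a number you measure or choose yourself.

```python
def estimate_build_seconds(measured_seconds, measured_entries, target_entries):
    """Linear extrapolation of build time to a different entry count."""
    return measured_seconds * target_entries / measured_entries

def throughput_headroom(measured_qps, required_qps):
    """How many times the required load the measured throughput could absorb."""
    return measured_qps / required_qps

# Example inputs (made up): 2.0 s to build 1M entries, 2M queries/sec measured,
# and a production requirement of 250k queries/sec.
build_5m = estimate_build_seconds(2.0, 1_000_000, 5_000_000)
headroom = throughput_headroom(2_000_000, 250_000)
print(f"estimated build time for 5M entries: {build_5m:.1f} s")
print(f"throughput headroom: {headroom:.1f}x")
```

Treat the results as first approximations and confirm them with a benchmark at the real database size.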

Performance Regression Testing

# Run before changes
matchy bench pattern --count 1000000 > before.txt

# Make changes...

# Run after changes
matchy bench pattern --count 1000000 > after.txt

# Compare results (timings vary between runs, so judge the size of the change)
diff before.txt after.txt
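
Because timings differ slightly on every run, a plain diff will always show changes. One option is to read the throughput figure from each file and compare it against a tolerance, as in this Python sketch. The helper names and the 5% default tolerance are assumptions; the sketch takes the numbers as arguments and does not parse matchy's output format.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) * 100 / before

def is_regression(before_qps, after_qps, tolerance_pct=5.0):
    """True if throughput dropped by more than tolerance_pct percent."""
    return percent_change(before_qps, after_qps) < -tolerance_pct

# Example inputs (made up): queries/sec read from before.txt and after.txt
before_qps = 2_000_000
after_qps = 1_800_000
print(f"change: {percent_change(before_qps, after_qps):+.1f}%")
print("regression" if is_regression(before_qps, after_qps) else "within tolerance")
```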

Hardware Comparison

# Run same benchmark on different systems
matchy bench combined --count 1000000

Compare:

Query throughput
Build time
Load time

Optimization Validation

# Test with validation
matchy bench ip --count 1000000

# Test without validation (trusted)
matchy bench ip --count 1000000 --trusted

Compare the difference to see validation overhead.

Exit Status

0: Benchmark completed successfully
1: Error (out of memory, disk full, etc.)
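
When benchmarks run from a script, the exit status can be checked before the output is used. The following Python sketch wraps a command with the standard `subprocess` module; `run_benchmark` is a hypothetical helper, and it works with any command line passed to it.

```python
import subprocess

def run_benchmark(cmd):
    """Run a command and return its stdout; raise if the exit status is non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"benchmark failed with exit status {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

For example, `run_benchmark(["matchy", "bench", "ip", "--count", "1000"])` would return the benchmark report on success and raise on exit status 1.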
