Database Concepts

This chapter covers the fundamental concepts of Matchy databases.

What is a Database?

A Matchy database is a binary file containing:

Entries - IP addresses, CIDR ranges, patterns, or exact strings
Data - Structured information associated with each entry
Indexes - Optimized data structures for fast lookups

Databases use the `.mxy` extension by convention, though any extension works.

Immutability

Databases are read-only once built. You cannot add, remove, or modify entries in an existing database.

To update a database:

1. Create a new builder
2. Add all entries (old + new + modified)
3. Build the new database
4. Atomically replace the old file

This ensures readers always see a consistent state and enables safe concurrent access.
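The last step, atomically replacing the old file, is commonly done by writing the new database to a temporary file and renaming it over the target. The following is a minimal sketch using only the Rust standard library. `replace_database` and the `.mxy.tmp` suffix are illustrative names, not part of Matchy, and `bytes` stands in for whatever the builder produced.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to a temporary sibling file, then rename it over `target`.
/// A reader that opens `target` sees either the old file or the new one,
/// never a partially written mix.
fn replace_database(target: &Path, bytes: &[u8]) -> io::Result<()> {
    // Keep the temporary file in the same directory: a rename cannot
    // cross filesystem boundaries.
    let tmp = target.with_extension("mxy.tmp");

    let mut file = fs::File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?; // flush to disk before the swap
    drop(file);

    // Swap the new file into place in a single step.
    fs::rename(&tmp, target)
}
```

On Unix-like systems the rename is atomic, and processes that already mapped the old file keep reading the old contents until they reopen the path.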

Entry Types

Matchy automatically detects four types of entries:

| Entry Type | Example | Matches |
|------------|---------|---------|
| IP Address | `192.0.2.1` | Exact IP address |
| CIDR Range | `10.0.0.0/8` | All IPs in range |
| Pattern | `*.example.com` | Strings matching glob |
| Exact String | `example.com` | Exact string only |

You don't need to specify the type - Matchy infers it from the format.
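To make format-based inference concrete, here is a rough sketch of how an entry could be classified from its text alone. The rules below (try to parse an IP, then an `address/prefix` pair, then look for glob metacharacters) are an assumption for illustration. Matchy's actual detection rules may differ, and `EntryKind` and `classify` are hypothetical names.

```rust
use std::net::IpAddr;

#[derive(Debug, PartialEq)]
enum EntryKind {
    IpAddress,
    CidrRange,
    Pattern,
    ExactString,
}

/// Guess the entry type from its textual form (illustrative rules only).
fn classify(entry: &str) -> EntryKind {
    // A bare IPv4 or IPv6 address parses directly.
    if entry.parse::<IpAddr>().is_ok() {
        return EntryKind::IpAddress;
    }
    // "address/prefix" with a valid address and an in-range prefix length.
    if let Some((addr, prefix)) = entry.split_once('/') {
        if let (Ok(ip), Ok(len)) = (addr.parse::<IpAddr>(), prefix.parse::<u8>()) {
            let max: u8 = if ip.is_ipv4() { 32 } else { 128 };
            if len <= max {
                return EntryKind::CidrRange;
            }
        }
    }
    // Glob metacharacters mark a pattern; anything else is a literal string.
    if entry.contains(|c: char| matches!(c, '*' | '?' | '[')) {
        EntryKind::Pattern
    } else {
        EntryKind::ExactString
    }
}
```

Applied to the examples in the table, `classify("10.0.0.0/8")` yields `CidrRange` and `classify("*.example.com")` yields `Pattern`.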

Auto-Detection

When you query a database, Matchy automatically:

1. Checks if the query is an IP address → searches IP tree
2. Checks for exact string match → searches hash table
3. Searches patterns → uses Aho-Corasick algorithm

This makes querying simple: `db.lookup("anything")` works for all types.
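The dispatch order above can be sketched with a toy in-memory model. Everything here is hypothetical: `ToyDb`, `Hit`, and `in_network` are not Matchy types. A linear scan stands in for the IP tree, and a simple `*`-prefix suffix check stands in for the Aho-Corasick stage. Stopping at the first stage that hits, and returning nothing for an IP query with no matching range, are also assumptions.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

#[derive(Debug, PartialEq)]
enum Hit {
    Ip(String),
    Exact(String),
    Pattern(String),
}

/// Hypothetical stand-in for a database; not Matchy's real layout.
struct ToyDb {
    cidrs: Vec<(Ipv4Addr, u8, String)>,      // (network, prefix length, data)
    exact: HashMap<String, String>,          // literal string -> data
    suffix_patterns: Vec<(String, String)>,  // ("*.example.com", data)
}

impl ToyDb {
    fn lookup(&self, query: &str) -> Option<Hit> {
        // Step 1: an IP-shaped query is answered from the IP structure only.
        if let Ok(ip) = query.parse::<Ipv4Addr>() {
            return self
                .cidrs
                .iter()
                .find(|(net, len, _)| in_network(ip, *net, *len))
                .map(|(_, _, data)| Hit::Ip(data.clone()));
        }
        // Step 2: exact string match via hash lookup.
        if let Some(data) = self.exact.get(query) {
            return Some(Hit::Exact(data.clone()));
        }
        // Step 3: fall back to the pattern search.
        self.suffix_patterns
            .iter()
            .find(|(pat, _)| pat.strip_prefix('*').map_or(false, |s| query.ends_with(s)))
            .map(|(_, data)| Hit::Pattern(data.clone()))
    }
}

/// True if `ip` falls inside `net/len` (assumes `len` is at most 32).
fn in_network(ip: Ipv4Addr, net: Ipv4Addr, len: u8) -> bool {
    let mask = if len == 0 { 0 } else { u32::MAX << (32 - u32::from(len)) };
    (u32::from(ip) & mask) == (u32::from(net) & mask)
}
```

The caller never states what kind of query it is sending: the same `lookup` call handles `"10.1.2.3"`, `"example.com"`, and `"www.example.com"`.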

Memory Mapping

Databases use memory mapping (mmap) for instant loading:

Traditional Database          Matchy Database
─────────────────────        ─────────────────
1. Open file                 1. Open file
2. Read into memory          2. Memory map
3. Parse format              3. Done! (<1ms)
4. Build data structures
   (100-500ms for large DB)

Memory mapping has several benefits:

Instant loading - Opening a database only maps the file; nothing is parsed or copied up front, so loading takes under 1 millisecond regardless of size.

Shared memory - The OS shares memory-mapped pages across processes automatically:

64 processes with a 100MB database = ~100MB RAM total
Traditional approach = 64 × 100MB = 6,400MB RAM

Large databases - Work with databases larger than available RAM. The OS pages data in and out as needed.

Binary Format

Databases use a compact binary format based on MaxMind's MMDB specification:

IP tree - Binary trie for IP address lookups (MMDB compatible)
Hash table - For exact string matches (Matchy extension)
Aho-Corasick automaton - For pattern matching (Matchy extension)
Data section - Structured data storage (MMDB compatible)

This means:

Standard MMDB readers can read the IP portion
Matchy can read standard MMDB files (like GeoIP databases)
Files are cross-platform (the same file works on Linux, macOS, and Windows)

Building a Database

The general workflow is:

1. Create a builder - Specify the match mode (case-sensitive or case-insensitive)
2. Add entries - Add IP addresses, patterns, and strings with their associated data
3. Build - Generate the optimized binary format
4. Save - Write to file

Querying a Database

The query process:

1. Open database - Memory map the file
2. Query - Call lookup with any string
3. Get result - Receive match data or None

Query Results

Queries return one of:

IP match - IP address or CIDR range matched
Pattern match - One or more patterns matched
Exact match - Exact string matched
No match - No entries matched

For pattern matches, Matchy returns all matching patterns and their associated data. This is useful when multiple patterns match (e.g., `*.com` and `example.*` both match `example.com`).
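To illustrate how one query can hit several patterns, the sketch below tests a query against each pattern in turn and collects every match. It uses a plain wildcard matcher that supports only `*` and `?`, which is a deliberate simplification and not the Aho-Corasick approach Matchy uses. `glob_match` and `matching_patterns` are hypothetical helper names.

```rust
/// Match `text` against a glob where `*` matches any run of characters
/// and `?` matches exactly one character.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Most recent `*` in the pattern and the text index it currently covers up to.
    let mut star: Option<(usize, usize)> = None;

    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            // Remember the star and first try matching it against nothing.
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Mismatch: let the last `*` absorb one more character and retry.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Only trailing `*`s may remain unmatched in the pattern.
    p[pi..].iter().all(|&c| c == '*')
}

/// Return every pattern that matches `query`, in input order.
fn matching_patterns<'a>(patterns: &[&'a str], query: &str) -> Vec<&'a str> {
    patterns
        .iter()
        .copied()
        .filter(|p| glob_match(p, query))
        .collect()
}
```

With the patterns `*.com`, `example.*`, and `*.org`, the query `example.com` returns the first two.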

Database Size

Database size depends on:

Number of entries
Pattern complexity (more patterns = larger automaton)
Data size (structured data per entry)

Typical sizes:

1,000 entries - ~50-100KB
10,000 entries - ~500KB-1MB
100,000 entries - ~5-10MB
1,000,000 entries - ~50-100MB

Pattern-heavy databases are larger due to the Aho-Corasick automaton.
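The typical sizes listed above all work out to roughly 50-100 bytes per entry. As a back-of-the-envelope aid, that rate can be turned into a small estimator. `estimate_size_bytes` is a hypothetical helper, and the per-entry rate is only inferred from those figures. Real sizes depend on the data stored and on how many patterns there are.

```rust
/// Rough (low, high) size range in bytes, assuming the ~50-100 bytes
/// per entry implied by the typical sizes listed in this section.
fn estimate_size_bytes(entries: u64) -> (u64, u64) {
    const LOW_BYTES_PER_ENTRY: u64 = 50;
    const HIGH_BYTES_PER_ENTRY: u64 = 100;
    (entries * LOW_BYTES_PER_ENTRY, entries * HIGH_BYTES_PER_ENTRY)
}
```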

Thread Safety

Databases are thread-safe for concurrent queries:

Multiple threads can safely query the same database
Memory-mapped data is read-only
No locking required

Builders are NOT thread-safe:

Don't share a builder across threads
Build databases sequentially
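The lock-free read pattern can be sketched with the standard library alone. Here an immutable `HashMap` behind an `Arc` stands in for an opened database. This is an assumption for illustration and does not use Matchy's types, and `parallel_lookups` is a hypothetical name. Because nothing mutates the shared table, no `Mutex` is needed.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

/// Run each query on its own thread against one shared, read-only table.
fn parallel_lookups(
    table: Arc<HashMap<String, u32>>,
    queries: Vec<String>,
) -> Vec<Option<u32>> {
    let handles: Vec<_> = queries
        .into_iter()
        .map(|q| {
            // Cloning the Arc copies a pointer, not the table itself.
            let table = Arc::clone(&table);
            thread::spawn(move || table.get(&q).copied())
        })
        .collect();

    // Join in spawn order so results line up with the queries.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

A builder, by contrast, is mutated as entries are added, so in this model it would stay owned by a single thread.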

Compatibility

Databases are:

✅ Platform-independent - Same file on Linux, macOS, Windows
✅ Tool-independent - CLI-built databases work with APIs
✅ Language-independent - Rust-built databases work with C
✅ MMDB-compatible - Can read standard MaxMind databases

Next Steps

Now that you understand database concepts, dive into specific topics:

Entry Types - Deep dive on IP, CIDR, patterns, strings
Pattern Matching - Glob syntax and matching rules
Data Types and Values - What data you can store
Performance Considerations - Optimization strategies
