Database Concepts
This chapter covers the fundamental concepts of Matchy databases.
What is a Database?
A Matchy database is a binary file containing:
- Entries - IP addresses, CIDR ranges, patterns, or exact strings
- Data - Structured information associated with each entry
- Indexes - Optimized data structures for fast lookups
Databases use the .mxy extension by convention, though any extension works.
Immutability
Databases are read-only once built. You cannot add, remove, or modify entries in an existing database.
To update a database:
- Create a new builder
- Add all entries (old + new + modified)
- Build the new database
- Atomically replace the old file
This ensures readers always see a consistent state and enables safe concurrent access.
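The final "atomically replace" step can be sketched with the Python standard library alone. This is a minimal illustration, not Matchy tooling: the function name replace_database is hypothetical, and the bytes passed in stand for whatever a builder produced. It assumes a POSIX-style filesystem, where a rename over an existing file is atomic.

```python
import os
import tempfile


def replace_database(path: str, new_bytes: bytes) -> None:
    """Write new_bytes to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_bytes)
            tmp.flush()
            os.fsync(tmp.fileno())  # get the bytes on disk before the swap
        os.replace(tmp_path, path)  # rename over the old file in one step
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

On POSIX systems, a process that already has the old file open or mapped keeps seeing the old contents until it reopens the path, so a reader never observes a half-written file.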
Entry Types
Matchy automatically detects four types of entries:
Entry Type | Example | Matches |
---|---|---|
IP Address | 192.0.2.1 | Exact IP address |
CIDR Range | 10.0.0.0/8 | All IPs in range |
Pattern | *.example.com | Strings matching glob |
Exact String | example.com | Exact string only |
You don't need to specify the type - Matchy infers it from the format.
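To make the idea of format-based inference concrete, here is a rough sketch that sorts the four table examples into their types. The rules are assumptions chosen for illustration (in particular, treating any of `*`, `?`, `[` as a glob marker); Matchy's actual detection logic may differ.

```python
import ipaddress


def classify_entry(entry: str) -> str:
    """Guess an entry's type from its textual form (illustrative rules only)."""
    try:
        ipaddress.ip_address(entry)
        return "ip"
    except ValueError:
        pass
    if "/" in entry:
        try:
            # strict=False accepts ranges whose host bits are set.
            ipaddress.ip_network(entry, strict=False)
            return "cidr"
        except ValueError:
            pass
    # Assumption: any glob metacharacter marks the entry as a pattern.
    if any(ch in entry for ch in "*?["):
        return "pattern"
    return "exact"


kinds = [classify_entry(e) for e in
         ["192.0.2.1", "10.0.0.0/8", "*.example.com", "example.com"]]
```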
Auto-Detection
When you query a database, Matchy automatically:
- Checks if the query is an IP address → searches IP tree
- Checks for exact string match → searches hash table
- Searches patterns → uses Aho-Corasick algorithm
This makes querying simple: db.lookup("anything") works for all types.
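The dispatch order above can be modeled with a toy class. Everything here is hypothetical: ToyDatabase is not Matchy's API, a list scan stands in for the IP tree, a dict for the hash table, and fnmatch for the Aho-Corasick automaton. It also assumes that an IP query which misses the IP index returns no match, and that the most specific range wins.

```python
import fnmatch
import ipaddress


class ToyDatabase:
    """Hypothetical stand-in: three separate indexes behind one lookup()."""

    def __init__(self, networks, exact, patterns):
        self.networks = [(ipaddress.ip_network(n), d) for n, d in networks.items()]
        self.exact = dict(exact)
        self.patterns = dict(patterns)

    def lookup(self, query: str):
        # 1. Is the query an IP address? If so, only the IP index is searched.
        try:
            addr = ipaddress.ip_address(query)
        except ValueError:
            addr = None
        if addr is not None:
            hits = [(net, data) for net, data in self.networks if addr in net]
            if not hits:
                return None
            # Assumption: the longest prefix (most specific range) wins.
            return ("ip", max(hits, key=lambda h: h[0].prefixlen)[1])
        # 2. Exact string match.
        if query in self.exact:
            return ("exact", self.exact[query])
        # 3. Pattern search, collecting every pattern that matches.
        matched = [d for p, d in self.patterns.items() if fnmatch.fnmatchcase(query, p)]
        return ("pattern", matched) if matched else None


db = ToyDatabase(
    networks={"10.0.0.0/8": "internal", "10.1.0.0/16": "lab"},
    exact={"example.com": "exact-hit"},
    patterns={"*.example.com": "wildcard-hit"},
)
```

A caller passes any string to db.lookup and never has to say which index it belongs to.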
Memory Mapping
Databases use memory mapping (mmap) for instant loading:
Traditional Database           Matchy Database
─────────────────────          ─────────────────
1. Open file                   1. Open file
2. Read into memory            2. Memory map
3. Parse format                3. Done! (<1ms)
4. Build data structures
   (100-500ms for large DB)
Memory mapping has several benefits:
Instant loading - Databases load in under 1 millisecond regardless of size, because mapping a file does not read its contents up front.
Shared memory - The OS shares memory-mapped pages across processes automatically:
- 64 processes with a 100MB database = ~100MB RAM total
- Traditional approach = 64 × 100MB = 6,400MB RAM
Large databases - Work with databases larger than available RAM. The OS pages data in and out as needed.
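As a small illustration of the mechanism (not of Matchy itself), the snippet below maps a file read-only with Python's mmap module and reads a few bytes from the end. The file name and contents are placeholders created just for the example.

```python
import mmap
import os
import tempfile

# Create a 1 MiB stand-in file; a real database would already exist on disk.
path = os.path.join(tempfile.mkdtemp(), "example.mxy")
with open(path, "wb") as f:
    f.write(b"\x00" * (1024 * 1024 - 4) + b"tail")

with open(path, "rb") as f:
    # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
    view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

size = len(view)
# Slicing only touches the pages that back these bytes; the rest of the
# file is never copied into the process.
last_four = view[size - 4:size]
view.close()
```

Creating the mapping costs about the same for a small file as for a large one, since the OS loads pages only when they are first accessed.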
Binary Format
Databases use a compact binary format based on MaxMind's MMDB specification:
- IP tree - Binary trie for IP address lookups (MMDB compatible)
- Hash table - For exact string matches (Matchy extension)
- Aho-Corasick automaton - For pattern matching (Matchy extension)
- Data section - Structured data storage (MMDB compatible)
This means:
- Standard MMDB readers can read the IP portion
- Matchy can read standard MMDB files (like GeoIP databases)
- The format is cross-platform (the same file works on Linux, macOS, and Windows)
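One concrete piece of the MMDB layout is the marker that precedes the metadata section: the MaxMind DB specification says to search for the last occurrence of the bytes `\xab\xcd\xefMaxMind.com`. The sketch below locates it in a byte string. The function name is hypothetical, and it is an assumption that a Matchy-built file keeps this marker in the same role.

```python
from typing import Optional

# Marker that precedes the metadata section in the MaxMind DB (MMDB) format.
METADATA_MARKER = b"\xab\xcd\xefMaxMind.com"


def find_metadata_offset(blob: bytes) -> Optional[int]:
    """Return the offset where MMDB metadata starts, or None if no marker."""
    pos = blob.rfind(METADATA_MARKER)  # the spec calls for the last occurrence
    if pos == -1:
        return None
    return pos + len(METADATA_MARKER)


# Synthetic bytes for demonstration; not a real database.
sample = b"search-tree-and-data" + METADATA_MARKER + b"metadata-bytes"
offset = find_metadata_offset(sample)
```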
Building a Database
The general workflow is:
- Create a builder - Specify match mode (case-sensitive or not)
- Add entries - Add IP addresses, patterns, strings with associated data
- Build - Generate optimized binary format
- Save - Write to file
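The four steps can be walked through with a toy builder. This is only a sketch of the workflow: ToyBuilder, its methods, and the JSON output are all hypothetical and bear no relation to Matchy's real API or binary format. Lower-casing keys is likewise just one assumed way to model a case-insensitive match mode.

```python
import json
import os
import tempfile


class ToyBuilder:
    """Hypothetical builder; names and JSON output are illustrative only."""

    def __init__(self, case_sensitive: bool = True):
        self.case_sensitive = case_sensitive
        self.entries = {}

    def add(self, key: str, data: dict) -> None:
        # In case-insensitive mode, normalize keys once at build time.
        if not self.case_sensitive:
            key = key.lower()
        self.entries[key] = data

    def build(self) -> bytes:
        # Sorting keys gives identical bytes for identical input.
        return json.dumps(self.entries, sort_keys=True).encode("utf-8")


builder = ToyBuilder(case_sensitive=False)            # 1. create a builder
builder.add("192.0.2.1", {"threat": "high"})          # 2. add entries
builder.add("*.Example.com", {"category": "test"})
image = builder.build()                               # 3. build
out_path = os.path.join(tempfile.mkdtemp(), "toy.mxy")
with open(out_path, "wb") as f:                       # 4. save
    f.write(image)
```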
Querying a Database
The query process:
- Open database - Memory map the file
- Query - Call lookup with any string
- Get result - Receive match data or None
Query Results
Queries return one of:
- IP match - IP address or CIDR range matched
- Pattern match - One or more patterns matched
- Exact match - Exact string matched
- No match - No entries matched
For pattern matches, Matchy returns all matching patterns and their associated data.
This is useful when multiple patterns match (e.g., *.com and example.* both match example.com).
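The multi-match behavior can be reproduced with a simple linear scan. Note that this swaps in Python's fnmatch for the Aho-Corasick automaton, so it shows the result shape rather than the algorithm; the pattern table and the match_all helper are made up for the example.

```python
import fnmatch

# Hypothetical pattern table: glob -> associated data.
patterns = {
    "*.com": {"tld": "com"},
    "example.*": {"brand": "example"},
    "*.org": {"tld": "org"},
}


def match_all(query: str, table: dict) -> dict:
    """Return every pattern that matches the query, with its data."""
    return {p: d for p, d in table.items() if fnmatch.fnmatchcase(query, p)}


hits = match_all("example.com", patterns)
```

Here hits contains both `*.com` and `example.*`, each with its own data, while `*.org` is left out.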
Database Size
Database size depends on:
- Number of entries
- Pattern complexity (more patterns = larger automaton)
- Data size (structured data per entry)
Typical sizes:
- 1,000 entries - ~50-100KB
- 10,000 entries - ~500KB-1MB
- 100,000 entries - ~5-10MB
- 1,000,000 entries - ~50-100MB
Pattern-heavy databases are larger due to the Aho-Corasick automaton.
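The typical sizes listed above all work out to roughly 50-100 bytes per entry, which gives a quick back-of-the-envelope estimate. The helper below is hypothetical and only restates that rule of thumb; real sizes depend on pattern count and on how much data each entry carries.

```python
from typing import Tuple


def estimate_size_bytes(entries: int) -> Tuple[int, int]:
    """Low/high size estimate from the ~50-100 bytes-per-entry rule of thumb."""
    return entries * 50, entries * 100


low, high = estimate_size_bytes(100_000)  # the 100,000-entry row: ~5-10MB
```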
Thread Safety
Databases are thread-safe for concurrent queries:
- Multiple threads can safely query the same database
- Memory-mapped data is read-only
- No locking required
Builders are NOT thread-safe:
- Don't share a builder across threads
- Build databases sequentially
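The reader side of this model can be illustrated with several threads querying one shared, read-only structure and taking no locks. The mapping below is a stand-in for an opened database, not Matchy's API, and the lookup function is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from types import MappingProxyType

# Read-only view standing in for an opened database.
database = MappingProxyType({"example.com": "blocked", "example.org": "allowed"})


def lookup(query: str):
    # No lock needed: nothing ever writes to the shared data.
    return database.get(query)


queries = ["example.com", "example.org", "missing.example"] * 100

with ThreadPoolExecutor(max_workers=8) as pool:
    # map() returns results in the same order as the queries.
    results = list(pool.map(lookup, queries))
```

Since no thread ever mutates the data, the reads cannot interfere with each other. A builder, by contrast, accumulates state as entries are added, which is why it should stay on a single thread.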
Compatibility
Databases are:
- ✅ Platform-independent - Same file on Linux, macOS, Windows
- ✅ Tool-independent - CLI-built databases work with APIs
- ✅ Language-independent - Rust-built databases work with C
- ✅ MMDB-compatible - Can read standard MaxMind databases
Next Steps
Now that you understand database concepts, dive into specific topics:
- Entry Types - Deep dive on IP, CIDR, patterns, strings
- Pattern Matching - Glob syntax and matching rules
- Data Types and Values - What data you can store
- Performance Considerations - Optimization strategies