Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Why Matchy Exists

The Problem

Many applications need to match IP addresses and strings against large datasets. Common use cases include:

  • Threat intelligence: checking IPs and domains against blocklists
  • GeoIP lookups: finding location data for IP addresses
  • Domain categorization: classifying websites by patterns
  • Network security: matching against indicators of compromise

Traditional approaches have significant limitations:

Hash tables provide fast exact lookups, but can't match patterns. You can't use a hash table to match phishing.evil.com against a pattern like *.evil.com.

Sequential scanning works for patterns but doesn't scale. With 10,000 patterns, you perform 10,000 comparisons per lookup. This approach quickly becomes a bottleneck.

Multiple data structures add complexity. Using a hash table for exact matches, a tree for IP ranges, and pattern matching for domains means maintaining three separate systems.

Serialization overhead slows down loading. Traditional databases need to parse and deserialize data on startup, which can take hundreds of milliseconds or more.

Memory duplication wastes resources. In multi-process applications, each process loads its own copy of the database, multiplying memory usage.

The Solution

Matchy addresses these problems with a unified approach:

Automatic type detection means one database holds IPs, CIDR ranges, exact strings, and patterns. You don't need to know which type you're querying - Matchy figures it out.

Optimized data structures provide efficient lookups for each type. IPs use a binary search tree. Exact strings use hash tables. Patterns use the Aho-Corasick algorithm.

Memory mapping eliminates deserialization. Databases are memory-mapped files that load in under a millisecond. The operating system shares pages across processes automatically.

Compact binary format reduces size. Matchy uses a space-efficient binary representation similar to MaxMind's MMDB format.

Performance

A typical Matchy database can perform:

  • 7M+ IP address lookups per second
  • 1M+ pattern matches per second (with 50,000 patterns)
  • Sub-microsecond latency for individual queries
  • Sub-millisecond loading time

Compatibility

Matchy can read standard MaxMind MMDB files, making it a drop-in replacement for GeoIP databases. It extends the MMDB format to support string matching and patterns while maintaining compatibility with existing files.

When to Use Matchy

Matchy is designed for applications that need:

  • Fast lookups against large datasets
  • Pattern matching in addition to exact matches
  • IP address and string matching in the same database
  • Minimal memory overhead in multi-process architectures
  • Quick database loading without deserialization

If you only need exact string matching and already have a solution that works, Matchy might be overkill. But if you need patterns, IPs, and efficiency at scale, Matchy was built for you.