Backed by mandō
Sieve is the research-grade pattern matching infrastructure that finds exactly what matters — across files, directories, and entire codebases — with the precision of a formal system and the speed of a compiled engine. We built this from first principles because the world's information problem isn't retrieval. It's recognition.
Capabilities
Sieve ships with a comprehensive set of matching primitives, each designed to collapse the distance between a question and its answer. These are not features we added. They are properties of the system.
Our pattern engine can ignore case distinctions entirely, so you find "Error", "error", and
"ERROR" in a single pass. No preprocessing. No separate indices. Just pass -i and
the distinction dissolves.
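To make the semantics concrete, here is a minimal Python sketch of case-insensitive line filtering using the standard `re` module's `IGNORECASE` flag. It illustrates what -i means, not how Sieve implements it; `match_lines` and the sample log lines are hypothetical.

```python
import re
from typing import Iterable, List


def match_lines(pattern: str, lines: Iterable[str], ignore_case: bool = False) -> List[str]:
    """Return the lines containing a match; ignore_case=True mirrors -i (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = re.compile(pattern, flags)  # compile once, reuse for every line
    return [line for line in lines if compiled.search(line)]


log = ["Error: disk full", "error: retrying", "ERROR: giving up", "all good"]
print(match_lines("error", log))                    # only the lowercase line
print(match_lines("error", log, ignore_case=True))  # all three spellings
```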
Point Sieve at a directory and it walks the entire tree. Every file, every subdirectory, every
nested structure — surfaced with the -r flag. The -R
variant follows symbolic links, for when your corpus topology demands it.
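A rough Python sketch of the traversal, assuming the standard library's `os.walk` is a fair stand-in for Sieve's tree walk. Setting `follow_symlinks=True` also descends into directories reached through symbolic links, the behavior described for -R. `walk_files` is a hypothetical name, and the sketch approximates -r rather than reproducing every symlink rule of the real tool.

```python
import os
from typing import Iterator


def walk_files(root: str, follow_symlinks: bool = False) -> Iterator[str]:
    """Yield every file path under `root`, descending into all subdirectories.

    follow_symlinks=False: symlinked directories are listed but not entered (roughly -r).
    follow_symlinks=True: symlinked directories are entered as well (roughly -R).
    """
    for dirpath, dirnames, filenames in os.walk(root, followlinks=follow_symlinks):
        dirnames.sort()  # sorting in place makes os.walk visit subdirectories in a stable order
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

For example, `list(walk_files("./src"))` returns every file path beneath `./src` in a stable, sorted order.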
The -E engine unlocks the full power of extended regex: alternation, repetition
quantifiers, and grouping without backslash escaping. Your queries become shorter, more
expressive, and easier to read.
For research teams that need lookahead, lookbehind, non-greedy quantifiers, and named capture
groups, the -P engine provides PCRE, one of the most expressive pattern
grammars available, natively and with zero configuration overhead.
When you need the signal density, not the signal itself. The -c flag returns a count
of matching lines per file — a one-pass frequency analysis that lets you map
pattern distribution across an entire corpus without materializing results.
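The distinction between counting lines and counting matches is easy to miss, so here is a small Python sketch of the per-file line count described for -c. An in-memory dictionary stands in for files on disk; `count_matching_lines` and the sample data are hypothetical.

```python
import re
from typing import Dict, Iterable


def count_matching_lines(pattern: str, files: Dict[str, Iterable[str]]) -> Dict[str, int]:
    """Map each file name to the number of lines containing at least one match.

    A line with several matches still counts once: the unit is the line, not the match.
    """
    compiled = re.compile(pattern)
    return {name: sum(1 for line in lines if compiled.search(line))
            for name, lines in files.items()}


corpus = {
    "main.py": ["# TODO: refactor", "x = 1  # TODO TODO", "print(x)"],
    "config.py": ["DEBUG = False"],
}
print(count_matching_lines("TODO", corpus))  # {'main.py': 2, 'config.py': 0}
```

The second `TODO` on the same line does not raise the count, and files with no matches still report `0`.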
Every match carries its origin coordinate. With -n, Sieve prefixes each result with
the exact line number in the source file, creating an audit trail from discovery back to
context. Essential for compliance and reproducibility.
Sometimes the answer is everything the pattern doesn't match. The -v flag inverts
the match predicate, returning only non-matching lines. It's exclusion as a first-class
operation, not an afterthought.
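A minimal Python sketch of the inverted predicate, assuming a simple line-oriented model; `non_matching` and the sample configuration lines are hypothetical. Here it strips comment lines from a config file.

```python
import re
from typing import Iterable, Iterator


def non_matching(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines with no match, i.e. the inverted predicate described for -v."""
    compiled = re.compile(pattern)
    for line in lines:
        if not compiled.search(line):
            yield line


config = ["# comment", "host = example.org", "", "# another", "port = 8080"]
print(list(non_matching(r"^#", config)))  # ['host = example.org', '', 'port = 8080']
```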
A match without context is a coordinate without a map. -A, -B, and
-C extract trailing, leading, or surrounding lines around each match. Configurable
window size. Full situational awareness around every hit.
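Context extraction is the most involved of these primitives, because windows around nearby matches can overlap. The Python sketch below, an illustration rather than Sieve's implementation, merges overlapping windows and emits `--` between groups that are not contiguous, the layout used in the demo output on this page. `before` plays the role of -B, `after` of -A, and setting both to the same value corresponds to -C. The function name and the sample log lines are hypothetical.

```python
import re
from typing import List, Sequence


def with_context(pattern: str, lines: Sequence[str], before: int = 0, after: int = 0) -> List[str]:
    """Return matching lines plus up to `before` leading and `after` trailing lines.

    Overlapping windows are merged; "--" separates non-contiguous groups
    (only when some context was requested).
    """
    compiled = re.compile(pattern)
    keep = set()
    for i, line in enumerate(lines):
        if compiled.search(line):
            keep.update(range(max(0, i - before), min(len(lines), i + after + 1)))
    out: List[str] = []
    previous = None
    for i in sorted(keep):
        if previous is not None and i != previous + 1 and (before or after):
            out.append("--")  # gap between two separate windows
        out.append(lines[i])
        previous = i
    return out


sample_log = [  # hypothetical log contents
    "Starting server",
    "Connection established on port 8080",
    "Error: timeout waiting for response from upstream",
    "Retrying with exponential backoff",
    "Cache warmed",
    "Request processed in 42ms",
    "FATAL ERROR: unable to allocate memory",
    "Shutting down gracefully",
]
print("\n".join(with_context("(?i)error", sample_log, before=1, after=1)))
```

With this sample input, the result has the same shape as the `-i -C 1` demo: one leading line, the match, one trailing line, then a separator before the next window.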
When you need only the matched substring, not the line that contains it. The -o flag
isolates exactly the characters that satisfy the pattern — nothing more, nothing
less. Clean signal, zero noise.
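A short Python sketch of match-only output using `re.finditer`, which walks the non-overlapping matches in each line from left to right. The helper name and sample input are hypothetical; the pattern is the one from the version-number demo on this page.

```python
import re
from typing import Iterable, List


def only_matches(pattern: str, lines: Iterable[str]) -> List[str]:
    """Return every matched substring, one entry per match, in the spirit of -o."""
    compiled = re.compile(pattern)
    found: List[str] = []
    for line in lines:
        # Skip empty matches, which carry no characters to print.
        found.extend(m.group(0) for m in compiled.finditer(line) if m.group(0))
    return found


versions = ["python 3.12.14 and 2.7.18", "lib 1.0.0", "tool v4.2.1 (stable)"]
print(only_matches(r"[0-9]+\.[0-9]+\.[0-9]+", versions))  # ['3.12.14', '2.7.18', '1.0.0', '4.2.1']
```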
The -w flag constrains matches to whole words, respecting the natural boundaries in
your data. Search for "log" without matching "logging", "catalog", or "blog". Surgical
specificity, built into the engine.
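One way to approximate whole-word matching in Python, assuming "word character" means a letter, digit, or underscore: wrap the pattern in a negative lookbehind and a negative lookahead for `\w`. This is a sketch of the semantics, not Sieve's engine; `whole_word_lines` is hypothetical.

```python
import re
from typing import Iterable, List


def whole_word_lines(pattern: str, lines: Iterable[str]) -> List[str]:
    """Keep lines where `pattern` matches with no word character immediately before or after."""
    compiled = re.compile(rf"(?<!\w)(?:{pattern})(?!\w)")
    return [line for line in lines if compiled.search(line)]


sample = ["write to the log", "logging enabled", "product catalog", "my blog", "log: ok"]
print(whole_word_lines("log", sample))  # ['write to the log', 'log: ok']
```

Lookarounds are used instead of `\b` so that patterns starting or ending with punctuation still behave sensibly: `\b` would instead require a word character right next to a leading or trailing punctuation mark.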
How It Works
Sieve reduces the entire pattern-matching workflow to a deterministic, single-pass pipeline: every line is read once and tested against the compiled pattern. No indexing phase. No warm-up period. You describe the pattern; we find every instance.
Express your query as a literal string, basic regex, extended regex, or Perl-compatible regular expression. Sieve accepts all four grammars. The pattern is compiled once and applied everywhere.
Pass one file, many files, or an entire directory tree. Use --include and
--exclude to filter by file pattern, so Sieve only examines what matters. The
search space is bounded by your intent, not your infrastructure.
Choose your result shape: full matching lines, only the matched text, counts, file names only, or context windows. Add line numbers for provenance. Add color for visual clarity. The output is as structured as you need it to be.
Sieve streams results to standard output as they're found — no buffering, no final materialization step. Pipe the output into any downstream process. The pattern match is the first stage in an arbitrarily complex data pipeline.
Our Thesis
We founded Sieve on a conviction that most people in our industry have overlooked: that the ability to locate a pattern in a body of text is not a convenience feature or a nice-to-have. It is the single most important operation in information work. Every search engine, every log analyzer, every security scanner, every code review tool, every compliance audit — at the very bottom of the abstraction stack, there is a pattern, and something finding it.
And yet for years, the state of the art has been fragmented across ad-hoc implementations, underpowered full-text search indices, and vendor-locked API endpoints that approximate the answer rather than computing it. We think that's wrong. We think the world needs a single, composable, research-grade pattern matching layer that operates at the speed of the filesystem itself. Not an approximation. Not a probability distribution over possible results. An exact answer.
This is what Sieve is. A formal system masquerading as a product. A compiler for questions about text. We are building from first principles because the problem deserves it, and because we believe that when you give people a tool that is precise, fast, and composable, they will build things on top of it that neither we nor they can predict today. That is the kind of compound leverage we are optimizing for.
Competitive Analysis
We compared Sieve, capability by capability, against traditional full-text search and regex libraries. The results speak for themselves.
| Capability | Sieve | Traditional Full-Text Search | Regex Libraries |
|---|---|---|---|
| Indexing required | ✓ None | ✗ Hours | ✓ None |
| Recursive directory search | ✓ Native | ✗ Requires agent | ✗ Not included |
| Perl-compatible regex | ✓ Built-in | ✗ No | ◐ Depends |
| Context window extraction | ✓ Configurable | ◐ Limited | ✗ No |
| File pattern filtering | ✓ --include/--exclude | ✗ External config | ✗ N/A |
| Composable output | ✓ Stdout-native | ✗ API only | ◐ Manual |
| Zero-config deployment | ✓ Single binary | ✗ Server cluster | ◐ Library dep |
Live Demo
Real commands. Real output. Highlighted matches.
$ sieve -rn "import" ./src
./src/main.py:1:from collections import defaultdict
./src/main.py:2:import sys
./src/main.py:3:import os
./src/utils.py:1:import json
./src/utils.py:2:from pathlib import Path
$ sieve -i -C 1 "error" server.log
Connection established on port 8080
Error: timeout waiting for response from upstream
Retrying with exponential backoff
--
Request processed in 42ms
FATAL ERROR: unable to allocate memory
Shutting down gracefully
$ sieve -Eo "[0-9]+\.[0-9]+\.[0-9]+" versions.txt
3.12.14
2.7.18
1.0.0
4.2.1
$ sieve -rc "TODO" ./src
./src/main.py:3
./src/utils.py:1
./src/config.py:0
./src/server.py:7
Pricing
Every tier is built around real capabilities. You unlock more of the engine as you grow.
Basic literal string matching for individuals exploring pattern recognition.
Unlocks: -F
Full regex engine with recursive traversal for serious practitioners.
Unlocks: -E, -i, -r / -R, -n, -v, -w
Perl regex, context windows, and enterprise output controls for teams.
Unlocks: -P, -A -B -C, -o, -m, --color, -H / -h
Full API access, pattern file injection, null-terminated output, and dedicated infrastructure.
Unlocks: -f, -e, -Z, -q, -x
What people are saying
"We replaced our entire Elasticsearch cluster with Sieve. The recursive search alone saved us six figures in infrastructure. No indexing, no shards, no cluster management. Just patterns and answers."
"The Perl-compatible regex engine is genuinely world-class. Lookahead, lookbehind, non-greedy quantifiers — all native. I used to chain three different tools together to get what Sieve gives me in one invocation."
"I run Sieve as the first step in every incident response. Case-insensitive search across a million log lines with context windows — I can go from alert to root cause in under 30 seconds."
"The composability is the real unlock. Sieve doesn't try to be a platform — it's a primitive. I pipe it into everything. It's the one tool in my stack I literally cannot remove without rewriting half my automation."
"Our compliance team uses the count-and-file-name modes to audit regulatory patterns across 40,000 documents. Sieve found issues in our corpus that three previous vendors missed entirely."
FAQ
Sieve operates directly on the byte stream of your files. Rather than building an inverted index upfront (which introduces latency, storage overhead, and staleness), our engine compiles the pattern into a finite automaton that processes input line by line in a single pass. The result is search that is always fresh and requires zero preprocessing. The tradeoff is that search time grows with corpus size rather than index size, but for ad-hoc queries over files that change often, a single pass on modern hardware is frequently faster than building and maintaining an index.
Sieve supports four distinct pattern grammars: basic regular expressions (the default), extended
regular expressions (-E), Perl-compatible regular expressions (-P),
and fixed-string literal matching (-F). Each grammar is selected via a single flag.
There is no configuration file and no compatibility matrix. You choose the grammar that matches
the complexity of your query.
Absolutely. Sieve is stdout-native by design. It outputs results as a stream, which means it
composes naturally with any downstream tool via standard pipes. The -q (quiet) flag
produces no output at all and communicates results purely via exit code, making it ideal for
conditional logic in scripts. The -Z flag uses null-terminated output for safe
handling of filenames containing special characters.
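A minimal Python sketch of quiet mode. The exit-status convention used here (0 when at least one line matches, 1 when none does) is an assumption borrowed from grep; this page does not spell out Sieve's codes. `quiet_status` is a hypothetical helper.

```python
import re
from typing import Iterable


def quiet_status(pattern: str, lines: Iterable[str]) -> int:
    """Return a shell-style exit status without printing anything.

    Assumed convention (grep-style, not confirmed for Sieve): 0 = match found, 1 = no match.
    """
    compiled = re.compile(pattern)
    # any() stops at the first matching line, so the remaining input is never read.
    return 0 if any(compiled.search(line) for line in lines) else 1
```

Because `any()` short-circuits, the scan can stop at the first match, which is what makes a quiet check cheap inside shell conditionals such as `if ...; then`.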
By default, Sieve detects binary files and suppresses output to prevent garbled terminal
rendering. You can override this behavior with the -a flag, which processes binary
files as text. For corpora that mix binary and text content, the default behavior ensures clean
output, while the -I flag silently skips binary files entirely.
Use --include, --exclude, and --exclude-dir with glob patterns. For example,
--include="*.py" restricts the search to Python files, while
--exclude-dir=node_modules omits entire directories from recursive traversal. These
filters are applied before the pattern matching phase, so irrelevant files are never even
opened.
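A Python sketch of filtering before any file is opened, under the assumption that globs are matched case-sensitively against base names, as `fnmatch.fnmatchcase` does. Pruning `dirnames` in place is how `os.walk` skips excluded directories without descending into them. `candidate_files` is a hypothetical helper, not Sieve's implementation.

```python
import fnmatch
import os
from typing import Iterator, Optional, Sequence


def candidate_files(
    root: str,
    include: Optional[Sequence[str]] = None,
    exclude: Sequence[str] = (),
    exclude_dirs: Sequence[str] = (),
) -> Iterator[str]:
    """Yield paths under `root` that pass the glob filters; no file is opened here."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] stops os.walk from descending into excluded directories.
        dirnames[:] = sorted(
            d for d in dirnames if not any(fnmatch.fnmatchcase(d, g) for g in exclude_dirs)
        )
        for name in sorted(filenames):
            if include is not None and not any(fnmatch.fnmatchcase(name, g) for g in include):
                continue  # not covered by any --include glob
            if any(fnmatch.fnmatchcase(name, g) for g in exclude):
                continue  # matched an --exclude glob
            yield os.path.join(dirpath, name)
```

For instance, `candidate_files(".", include=["*.py"], exclude_dirs=["node_modules"])` yields only Python files and never enters `node_modules`.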
Sieve is currently onboarding teams in private beta. Request access and our team will be in touch within 24 hours.