Backed by mandō

Pattern intelligence
for the entire corpus.

Sieve is the research-grade pattern matching infrastructure that finds exactly what matters — across files, directories, and entire codebases — with the precision of a formal system and the speed of a compiled engine. We built this from first principles because the world's information problem isn't retrieval. It's recognition.

See it work →
SOC 2 Type II
ISO 27001
GDPR Compliant
HIPAA Ready

Every pattern. Every file.
Every answer.

Sieve ships with a comprehensive set of matching primitives, each designed to collapse the distance between a question and its answer. These are not features we added. They are properties of the system.

-i

Case-Invariant Matching

Our pattern engine ignores letter case on request, so you find "Error", "error", and "ERROR" in a single pass. No preprocessing. No separate indices. Just pass -i and the distinction dissolves.
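A minimal sketch of what -i implies, written in Python rather than in Sieve's own engine. It assumes Python's re module as a stand-in; match_lines and the sample log are hypothetical names invented for illustration.

```python
import re

def match_lines(pattern, lines, ignore_case=False):
    """Return the lines that contain at least one match for pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = re.compile(pattern, flags)  # compile once, reuse for every line
    return [line for line in lines if compiled.search(line)]

# Hypothetical sample input
log = ["Error: disk full", "no error here", "FATAL ERROR", "all good"]

print(match_lines("error", log))                    # case-sensitive search
print(match_lines("error", log, ignore_case=True))  # comparable to passing -i
```

The case-sensitive call finds one line; the case-insensitive call finds three, with no second pass over the data.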

-r

Recursive Corpus Traversal

Point Sieve at a directory and it walks the entire tree. Every file, every subdirectory, every nested structure — surfaced with the -r flag. The -R variant follows symbolic links, for when your corpus topology demands it.
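The traversal can be pictured with a short Python sketch. This is an illustration under stated assumptions, not Sieve's implementation: walk_files is a hypothetical helper, and the follow_symlinks switch is only meant to mirror the difference between -r and -R described above.

```python
import os

def walk_files(root, follow_symlinks=False):
    """Yield the path of every file under root, parent directories first."""
    for dirpath, dirnames, filenames in os.walk(root, followlinks=follow_symlinks):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

Calling walk_files("./src") would yield each file path lazily, so a search can begin before the whole tree has been listed.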

-E

Extended Regular Expressions

The -E engine unlocks the full power of extended regex: alternation, repetition quantifiers, and grouping, all without backslash escaping. Patterns that would need a thicket of backslashes in basic regex stay short and readable.

-P

Perl-Compatible Regex

For research teams that need lookahead, lookbehind, non-greedy quantifiers, and named capture groups. The -P engine gives you PCRE — the most powerful pattern grammar available — natively, with zero configuration overhead.
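The four constructs named above can be tried in a few lines of Python. Python's re module is not PCRE, so treat this as an approximation: it is used here only because it happens to support the same lookahead, lookbehind, non-greedy, and named-group syntax. The sample text is made up.

```python
import re

# Hypothetical sample text
text = "price: $42 USD, cost: $7 EUR, <b>bold</b> and <i>italic</i>"

# Lookahead: digits only when followed by " USD"
usd = re.findall(r"\d+(?= USD)", text)

# Lookbehind: digits only when preceded by a dollar sign
dollars = re.findall(r"(?<=\$)\d+", text)

# Greedy vs non-greedy: .* runs to the last closing tag, .*? stops at the first
greedy = re.findall(r"<\w>.*</\w>", text)
lazy = re.findall(r"<\w>.*?</\w>", text)

# Named capture groups: refer to parts of the match by name
first = re.search(r"\$(?P<amount>\d+) (?P<currency>[A-Z]{3})", text)

print(usd, dollars)
print(greedy)
print(lazy)
print(first.group("amount"), first.group("currency"))
```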

-c

Statistical Match Counting

When you need the signal density, not the signal itself. The -c flag returns a count of matching lines per file — a one-pass frequency analysis that lets you map pattern distribution across an entire corpus without materializing results.
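One detail worth seeing in code: the count is of matching lines, not of individual matches. The Python sketch below assumes an in-memory corpus (a dict of hypothetical file names to contents) and a hypothetical helper, count_matching_lines.

```python
import re

def count_matching_lines(pattern, files):
    """Map each file name to its number of matching lines (not total matches)."""
    compiled = re.compile(pattern)
    return {
        name: sum(1 for line in text.splitlines() if compiled.search(line))
        for name, text in files.items()
    }

# Hypothetical corpus: the second line of main.py contains the pattern twice
corpus = {
    "main.py": "# TODO: a\nx = 1  # TODO TODO\n",
    "config.py": "DEBUG = False\n",
}

print(count_matching_lines("TODO", corpus))
```

A line with two occurrences still counts once, and files with no hits are reported with a count of zero rather than omitted.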

-n

Line-Number Provenance

Every match carries its origin coordinate. With -n, Sieve prefixes each result with the exact line number in the source file, creating an audit trail from discovery back to context. Essential for compliance and reproducibility.
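As a rough sketch of the provenance format, the Python below numbers lines from 1 and emits path:line:text. The helper name and sample source are assumptions made for illustration.

```python
import re

def numbered_matches(pattern, path, lines):
    """Yield 'path:line_number:text' for each matching line, counting from 1."""
    compiled = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if compiled.search(line):
            yield f"{path}:{number}:{line}"

# Hypothetical file contents
source = ["from collections import defaultdict", "x = 1", "import sys"]

for hit in numbered_matches("import", "main.py", source):
    print(hit)
```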

-v

Inverse Pattern Logic

Sometimes the answer is everything the pattern doesn't match. The -v flag inverts the match predicate, returning only non-matching lines. It's exclusion as a first-class operation, not an afterthought.
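Inversion is a one-word change to the match predicate, as this Python sketch suggests. The helper and the access-log lines are hypothetical.

```python
import re

def non_matching_lines(pattern, lines):
    """Return only the lines where pattern does NOT match."""
    compiled = re.compile(pattern)
    return [line for line in lines if not compiled.search(line)]

# Hypothetical access log: filter out health-check noise
log = ["GET /health 200", "GET /api/users 200", "GET /health 200", "POST /api/login 500"]

print(non_matching_lines("/health", log))
```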

-A -B -C

Contextual Window Extraction

A match without context is a coordinate without a map. -A, -B, and -C extract trailing, leading, or surrounding lines around each match. Configurable window size. Full situational awareness around every hit.
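A sketch of how a context window could be computed, in Python. It assumes the behaviour suggested by the demo output elsewhere on this page: overlapping windows merge, and a "--" line separates groups that are not adjacent. context_windows and its before/after parameters are hypothetical stand-ins for -B and -A.

```python
import re

def context_windows(pattern, lines, before=0, after=0):
    """Return matching lines plus surrounding context.

    Overlapping windows are merged; '--' separates non-adjacent groups.
    """
    compiled = re.compile(pattern)
    keep = set()
    for i, line in enumerate(lines):
        if compiled.search(line):
            start = max(0, i - before)
            stop = min(len(lines), i + after + 1)
            keep.update(range(start, stop))

    output = []
    previous = None
    for i in sorted(keep):
        if previous is not None and i != previous + 1:
            output.append("--")  # gap between two groups of lines
        output.append(lines[i])
        previous = i
    return output

# Hypothetical input; "(?i)" makes the pattern case-insensitive
lines = ["a", "Error: timeout", "b", "c", "d", "FATAL ERROR", "e"]
for row in context_windows("(?i)error", lines, before=1, after=1):
    print(row)
```

Setting before and after to the same value corresponds to -C; setting only one of them corresponds to -B or -A.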

-o

Precision Extraction

When you need only the matched substring, not the line that contains it. The -o flag isolates exactly the characters that satisfy the pattern — nothing more, nothing less. Clean signal, zero noise.
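To make the difference from line-oriented output concrete, here is a small Python sketch that returns every matched substring, one per result, including multiple hits on the same line. The helper name and input are illustrative assumptions.

```python
import re

def only_matching(pattern, lines):
    """Return each matched substring, in order, across all lines."""
    compiled = re.compile(pattern)
    return [m.group(0) for line in lines for m in compiled.finditer(line)]

# Hypothetical input: the first line holds two version strings
lines = ["python 3.12.14 and 2.7.18", "no version here", "release 1.0.0"]

print(only_matching(r"[0-9]+\.[0-9]+\.[0-9]+", lines))
```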

-w

Word-Boundary Intelligence

The -w flag constrains matches to whole words, respecting the natural boundaries in your data. Search for "log" without matching "logging", "catalog", or "blog". Surgical specificity, built into the engine.
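The "log" example can be reproduced with a word-boundary assertion in Python. This is a simplified sketch: it treats the search term as a literal word and relies on Python's \b, which may not define word characters exactly as Sieve does.

```python
import re

def whole_word_lines(word, lines):
    """Return lines containing word as a whole word, not as part of a longer one."""
    compiled = re.compile(r"\b" + re.escape(word) + r"\b")
    return [line for line in lines if compiled.search(line)]

# Inputs taken from the example above, plus one sentence
candidates = ["log", "logging", "catalog", "blog", "rotate the log now"]

print(whole_word_lines("log", candidates))
```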

From pattern to answer in four steps

Sieve reduces the entire pattern-matching workflow to a deterministic pipeline that makes a single streaming pass over the input, so cost scales with the amount of text searched. No indexing phase. No warm-up period. You describe the pattern; we find every instance.

01

Define the pattern

Express your query as a literal string, basic regex, extended regex, or Perl-compatible regular expression. Sieve accepts all four grammars. The pattern is compiled once and applied everywhere.

02

Specify the corpus

Pass one file, many files, or an entire directory tree. Use --include and --exclude to filter by file pattern, so Sieve only examines what matters. The search space is bounded by your intent, not your infrastructure.

03

Configure the output

Choose your result shape: full matching lines, only the matched text, counts, file names only, or context windows. Add line numbers for provenance. Add color for visual clarity. The output is as structured as you need it to be.

04

Execute and compose

Sieve streams results to standard output as they're found — no buffering, no final materialization step. Pipe the output into any downstream process. The pattern match is the first stage in an arbitrarily complex data pipeline.
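Streaming can be sketched with a Python generator. The point of the example is that results are produced one at a time, so a downstream stage can stop early even when the input never ends. stream_matches and endless_log are hypothetical names, and the generator is an analogy for a pipe, not Sieve's actual I/O layer.

```python
import itertools
import re

def stream_matches(pattern, lines):
    """Lazily yield matching lines as they are encountered."""
    compiled = re.compile(pattern)
    for line in lines:
        if compiled.search(line):
            yield line

def endless_log():
    """Hypothetical unbounded input: every third line is an error."""
    for n in itertools.count(1):
        yield f"line {n}: " + ("ERROR" if n % 3 == 0 else "ok")

# The downstream stage takes two results and stops; nothing is materialized
first_two = list(itertools.islice(stream_matches("ERROR", endless_log()), 2))
print(first_two)
```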

Pattern matching is a fundamental primitive.

We founded Sieve on a conviction that most people in our industry have overlooked: that the ability to locate a pattern in a body of text is not a convenience feature or a nice-to-have. It is the single most important operation in information work. Look beneath every search engine, every log analyzer, every security scanner, every code review tool, every compliance audit: at the very bottom of the abstraction stack, there is a pattern, and something finding it.

And yet for years, the state of the art has been fragmented across ad-hoc implementations, underpowered full-text search indices, and vendor-locked API endpoints that approximate the answer rather than computing it. We think that's wrong. We think the world needs a single, composable, research-grade pattern matching layer that operates at the speed of the filesystem itself. Not an approximation. Not a probability distribution over possible results. An exact answer.

This is what Sieve is. A formal system masquerading as a product. A compiler for questions about text. We are building from first principles because the problem deserves it, and because we believe that when you give people a tool that is precise, fast, and composable, they will build things on top of it that neither we nor they can predict today. That is the kind of compound leverage we are optimizing for.

Infrastructure-grade performance

0
GB/s
Throughput on commodity hardware
0
%
Of production servers run our core engine
4
Regex grammars
Basic, Extended, Perl-compatible, Fixed-string
0
Indexing required
Stream-native. No pre-processing step.

How Sieve compares

We compared Sieve against traditional full-text search and regex libraries, capability by capability. The results speak for themselves.

Capability                    Sieve                    Traditional Full-Text Search    Regex Libraries
Indexing required             ✓ None                   ✗ Hours                         ✓ None
Recursive directory search    ✓ Native                 ✗ Requires agent                ✗ Not included
Perl-compatible regex         ✓ Built-in               ✗ No                            ◐ Depends
Context window extraction     ✓ Configurable           ◐ Limited                       ✗ No
File pattern filtering        ✓ --include/--exclude    ✗ External config               ✗ N/A
Composable output             ✓ Stdout-native          ✗ API only                      ◐ Manual
Zero-config deployment        ✓ Single binary          ✗ Server cluster                ◐ Library dep

See Sieve in action

Real commands. Real output. Highlighted matches.

Recursive search with line numbers
$ sieve -rn "import" ./src

./src/main.py:1:from collections import defaultdict
./src/main.py:2:import sys
./src/main.py:3:import os
./src/utils.py:1:import json
./src/utils.py:2:from pathlib import Path

Case-insensitive match with context
$ sieve -i -C 1 "error" server.log

  Connection established on port 8080
  Error: timeout waiting for response from upstream
  Retrying with exponential backoff
--
  Request processed in 42ms
  FATAL ERROR: unable to allocate memory
  Shutting down gracefully

Extended regex with only-matching
$ sieve -Eo "[0-9]+\.[0-9]+\.[0-9]+" versions.txt

3.12.14
2.7.18
1.0.0
4.2.1

Count matches per file
$ sieve -rc "TODO" ./src

./src/main.py:3
./src/utils.py:1
./src/config.py:0
./src/server.py:7

Simple, transparent pricing

Every tier is built around real capabilities. You unlock more of the engine as you grow.

Starter
$0
Free forever

Basic literal string matching for individuals exploring pattern recognition.

  • Fixed-string matching -F
  • Single-file search
  • Case-sensitive only
  • 100 queries / day
  • Standard output
Pro
$29
per month

Full regex engine with recursive traversal for serious practitioners.

  • Everything in Starter
  • Extended regex -E
  • Case-insensitive -i
  • Recursive search -r / -R
  • Line numbers -n
  • Invert match -v
  • Word matching -w
  • Unlimited queries
Enterprise
$79
per seat / month

Perl regex, context windows, and enterprise output controls for teams.

  • Everything in Pro
  • Perl-compatible regex -P
  • Context extraction -A -B -C
  • Only-matching mode -o
  • Max-count limiting -m
  • Color output --color
  • File-name control -H / -h
  • SSO & audit logs
  • Priority support
Research
Custom
annual agreement

Full API access, pattern file injection, null-terminated output, and dedicated infrastructure.

  • Everything in Enterprise
  • Pattern file input -f
  • Multi-pattern -e
  • Null-terminated output -Z
  • Include/exclude filters
  • Quiet mode -q
  • Line-match mode -x
  • Dedicated SLA
  • On-prem deployment

Trusted by teams who ship

"We replaced our entire Elasticsearch cluster with Sieve. The recursive search alone saved us six figures in infrastructure. No indexing, no shards, no cluster management. Just patterns and answers."
Karen Leung
VP of Infrastructure, ScaleForge
"The Perl-compatible regex engine is genuinely world-class. Lookahead, lookbehind, non-greedy quantifiers — all native. I used to chain three different tools together to get what Sieve gives me in one invocation."
Raj Joshi
Principal Engineer, Codex Labs
"I run Sieve as the first step in every incident response. Case-insensitive search across a million log lines with context windows — I can go from alert to root cause in under 30 seconds."
Valeria Ortiz
SRE Lead, UpGrid
"The composability is the real unlock. Sieve doesn't try to be a platform — it's a primitive. I pipe it into everything. It's the one tool in my stack I literally cannot remove without rewriting half my automation."
David Ström
Solo Developer & Indie Hacker
"Our compliance team uses the count-and-file-name modes to audit regulatory patterns across 40,000 documents. Sieve found issues in our corpus that three previous vendors missed entirely."
Tanya Wu
CTO, NormativeAI

Common questions

How does Sieve achieve search without indexing?

Sieve operates directly on the byte stream of your files. Rather than building an inverted index upfront (which introduces latency, storage overhead, and staleness), our engine compiles the pattern once into a finite automaton and applies it line by line in a single pass. The result is search that is always fresh and requires zero preprocessing. The tradeoff is that every query reads the corpus again, so search time grows with corpus size rather than index size. For data that changes often or is searched only occasionally, that is typically cheaper than building and maintaining an index.

What regex dialects does Sieve support?

Sieve supports four distinct pattern grammars: basic regular expressions (the default), extended regular expressions (-E), Perl-compatible regular expressions (-P), and fixed-string literal matching (-F). Each grammar is selected via a single flag. There is no configuration file and no compatibility matrix. You choose the grammar that matches the complexity of your query.

Can I use Sieve in automated pipelines?

Absolutely. Sieve is stdout-native by design. It outputs results as a stream, which means it composes naturally with any downstream tool via standard pipes. The -q (quiet) flag produces no output at all and communicates results purely via exit code, making it ideal for conditional logic in scripts. The -Z flag uses null-terminated output for safe handling of filenames containing special characters.
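A sketch of exit-code signalling in Python. The text above does not say which codes Sieve uses, so the convention here (0 when something matched, 1 when nothing did) is an assumption borrowed from common command-line practice; quiet_search is a hypothetical helper.

```python
import re

def quiet_search(pattern, lines):
    """Return an exit status and print nothing.

    Assumed convention: 0 if any line matches, 1 if none do.
    """
    compiled = re.compile(pattern)
    # any() stops at the first match, so the rest of the input is never scanned
    return 0 if any(compiled.search(line) for line in lines) else 1

status = quiet_search("ERROR", ["ok", "ERROR: boom", "ok"])
print("match found" if status == 0 else "no match")
```

In a script, the returned status is what a conditional would branch on; no output needs to be parsed.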

Is Sieve suitable for binary files?

By default, Sieve detects binary files and suppresses output to prevent garbled terminal rendering. You can override this behavior with the -a flag, which processes binary files as text. For corpora that mix binary and text content, the default behavior ensures clean output while the -I flag silently skips binary files entirely.
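How the detection works is not specified here. One common heuristic, shown below purely as an assumption, is to look for a NUL byte near the start of the file. looks_binary and sample_size are hypothetical names.

```python
def looks_binary(data, sample_size=8192):
    """Heuristic: treat data as binary if a NUL byte appears in the first sample.

    This is an assumed approach, not a description of Sieve's detector.
    """
    return b"\x00" in data[:sample_size]

print(looks_binary(b"plain text\n"))          # text
print(looks_binary(b"\x89PNG\r\n\x00\x00"))   # contains NUL bytes
```

A search loop could use such a check to skip a file outright (as -I is described) or to search it anyway (as -a is described).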

How do I search only specific file types in a directory?

Use --include and --exclude with glob patterns to filter files, and --exclude-dir to skip directories. For example, --include="*.py" restricts the search to Python files, while --exclude-dir=node_modules omits entire directories from recursive traversal. These filters are applied before the pattern matching phase, so irrelevant files are never even opened.
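The "never even opened" property can be illustrated in Python: filtering happens on names during traversal, before any file is read. This sketch makes simplifying assumptions (a single include glob, exact directory names for exclusion), and select_files is a hypothetical helper rather than Sieve's filter logic.

```python
import fnmatch
import os

def select_files(root, include=None, exclude_dirs=()):
    """Yield file paths under root that pass the name filters."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories
        dirnames[:] = sorted(d for d in dirnames if d not in exclude_dirs)
        for name in sorted(filenames):
            if include is None or fnmatch.fnmatchcase(name, include):
                yield os.path.join(dirpath, name)
```

For instance, select_files("./project", include="*.py", exclude_dirs=("node_modules",)) would yield only Python files outside node_modules, and only those paths would ever be opened for matching.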

Join the research preview.

Sieve is currently onboarding teams in private beta. Request access and our team will be in touch within 24 hours.


See mandō's portfolio →