SINGULR — One Line. One Truth. Zero Duplicates.

The Deduplication Manifesto

Every dataset is lying to you. Rows repeat. Records echo. Your pipeline is haunted by phantom duplicates that inflate your metrics, corrupt your models, and waste your compute.

We built SINGULR because deduplication isn't a feature — it's a philosophy. A single source of truth isn't a nice-to-have. It's the only acceptable state of your data.

One line. One truth. Zero noise. That's the SINGULR promise.

Everything you need to eliminate redundancy

Occurrence Counting

Prefix each output line with the number of times it occurred in a row. Know not just what's duplicated, but how often. Powered by the --count engine.
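A minimal Python sketch of adjacent-run counting, assuming --count behaves like the option of the same name in GNU coreutils uniq (a count per run of identical adjacent lines, printed right-aligned in seven columns). `count_adjacent` is a hypothetical name for illustration, not SINGULR's API.

```python
from itertools import groupby


def count_adjacent(lines):
    """Yield (count, line) for each run of identical adjacent lines."""
    for line, run in groupby(lines):
        yield sum(1 for _ in run), line


if __name__ == "__main__":
    # The second "a" is not adjacent to the first run, so it is counted separately.
    for count, line in count_adjacent(["a", "a", "b", "a"]):
        print(f"{count:7d} {line}")
```

Because only neighbors are compared, a value that reappears later in the stream starts a new count; sorting the input first turns these per-run counts into totals.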

Duplicate Isolation

Surface only the lines that repeat, with one representative per duplicate group. Strip the noise, keep the signal. The --repeated filter exposes every offender.
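A minimal sketch of the --repeated filter, assuming it follows GNU coreutils `uniq -d`: emit one line for each run of two or more identical adjacent lines. `repeated` is a hypothetical helper, not part of SINGULR.

```python
from itertools import groupby


def repeated(lines):
    """Yield one representative line per run of two or more identical adjacent lines."""
    for line, run in groupby(lines):
        if sum(1 for _ in run) > 1:
            yield line


if __name__ == "__main__":
    print(list(repeated(["x", "x", "y", "z", "z", "z"])))
```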

Full Duplicate Expansion

Don't just flag duplicates: see every single occurrence. The -D mode prints every line of each duplicate group, giving you complete transparency into repeated data.
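A minimal sketch of full duplicate expansion, assuming -D matches GNU coreutils `uniq -D` (`--all-repeated`): every line of every adjacent run longer than one is printed, and lines with no identical neighbor are dropped. `all_repeated` is a hypothetical name.

```python
from itertools import groupby


def all_repeated(lines):
    """Yield every line that belongs to a run of two or more identical adjacent lines."""
    for _, run in groupby(lines):
        group = list(run)
        if len(group) > 1:
            yield from group


if __name__ == "__main__":
    print(list(all_repeated(["x", "x", "y", "z", "z"])))
```

Compared with a one-line-per-group summary, this keeps the full multiplicity of each group, which is what a forensic review usually needs.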

Unique-Only Mode

Extract lines that have no identical neighbor. On sorted input, that means lines that appear exactly once, so --unique finds the truly original entries in any dataset with zero false positives.
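A minimal sketch of unique-only filtering, assuming --unique follows GNU coreutils `uniq -u`: keep only runs of length one. `unique_only` is a hypothetical helper. The second call shows why sorted input matters: on unsorted data a line that repeats non-adjacently is still reported as unique.

```python
from itertools import groupby


def unique_only(lines):
    """Yield lines whose adjacent run has length exactly one."""
    for line, run in groupby(lines):
        if sum(1 for _ in run) == 1:
            yield line


if __name__ == "__main__":
    print(list(unique_only(["a", "a", "b", "c", "c"])))  # ['b']
    print(list(unique_only(["b", "a", "b"])))            # ['b', 'a', 'b']: "b" is a false positive
    print(list(unique_only(sorted(["b", "a", "b"]))))    # ['a']
```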

Case-Insensitive Matching

Ignore differences in case when comparing. With --ignore-case, "ERROR" and "error" count as duplicates.
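A minimal sketch of case-insensitive adjacent deduplication that keeps the first line of each run. Using `str.casefold` as the normalization is an assumption: SINGULR's exact case rules, especially for non-ASCII text, are not stated on this page. `dedupe_ignore_case` is a hypothetical name.

```python
from itertools import groupby


def dedupe_ignore_case(lines):
    """Collapse runs of adjacent lines that differ only in case, keeping the first line of each run."""
    for _, run in groupby(lines, key=str.casefold):
        yield next(run)


if __name__ == "__main__":
    print(list(dedupe_ignore_case(["ERROR", "error", "Error", "warn"])))
```

Only the comparison is case-insensitive; the line that is emitted keeps its original casing.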

Field-Aware Comparison

Skip leading fields and characters before comparing. Target the exact data columns that matter with --skip-fields and --skip-chars.
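A minimal sketch of field- and character-skipping, assuming the GNU coreutils `uniq` definitions behind the identically named options: a field is a run of blanks (spaces or tabs) followed by non-blank characters, fields are skipped before characters, and the blanks that start the next field stay in the compared text. `comparison_key` and `dedupe_fields` are hypothetical names.

```python
from itertools import groupby

BLANKS = " \t"


def comparison_key(line, skip_fields=0, skip_chars=0):
    """Return the part of `line` that is compared after skipping fields, then characters."""
    i, n = 0, len(line)
    for _ in range(skip_fields):
        if i >= n:
            break
        while i < n and line[i] in BLANKS:      # leading blanks of the field
            i += 1
        while i < n and line[i] not in BLANKS:  # the field's non-blank body
            i += 1
    i = min(i + skip_chars, n)
    return line[i:]


def dedupe_fields(lines, skip_fields=0, skip_chars=0):
    """Keep the first line of each adjacent run whose comparison keys match."""
    def key(line):
        return comparison_key(line, skip_fields, skip_chars)

    for _, run in groupby(lines, key=key):
        yield next(run)


if __name__ == "__main__":
    print(list(dedupe_fields(["1 apple", "2 apple", "3 pear"], skip_fields=1)))
```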

Precision Window

Compare only the first N characters of each line. Scope your deduplication to exactly the data boundaries you define with --check-chars.
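A minimal sketch of a comparison window: adjacent lines are treated as duplicates when their first N characters match, and the first line of each run is kept. Characters here are Python code points, which is an assumption; a byte-oriented tool may count multibyte text differently. `dedupe_check_chars` is a hypothetical name.

```python
from itertools import groupby


def dedupe_check_chars(lines, check_chars):
    """Collapse adjacent lines whose first `check_chars` characters are identical."""
    for _, run in groupby(lines, key=lambda line: line[:check_chars]):
        yield next(run)


if __name__ == "__main__":
    log = ["2024-01-01 a", "2024-01-01 b", "2024-01-02 c"]
    # A 10-character window covers just the date, so one line per day survives.
    print(list(dedupe_check_chars(log, 10)))
```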

How it works

Pre-Index Your Data

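SINGULR compares each line only with the line directly before it, so duplicates are caught only when they are adjacent. Pre-index your stream by sorting it (for example, with sort) so identical records land next to each other; duplicates that are not adjacent pass through untouched.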
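A minimal sketch of why pre-indexing matters, using a hypothetical `adjacent_dedupe` helper that keeps the first line of each run of identical neighbors. Plain `sorted` stands in for whatever pre-indexing step your pipeline uses.

```python
from itertools import groupby


def adjacent_dedupe(lines):
    """Keep the first line of each run of identical adjacent lines."""
    return [line for line, _ in groupby(lines)]


if __name__ == "__main__":
    unsorted = ["b", "a", "b", "a"]
    print(adjacent_dedupe(unsorted))          # ['b', 'a', 'b', 'a']: no two equal lines touch
    print(adjacent_dedupe(sorted(unsorted)))  # ['a', 'b']
```

Sorting groups equal lines together, so a single adjacent pass then yields each distinct line exactly once; the trade-off is that the sort itself must see the whole input.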

Configure Your Filter

Choose your mode: count occurrences, isolate duplicates, extract uniques, or run full expansion. Layer in case normalization, field skipping, and character windowing.

Execute at Line Speed

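SINGULR keeps only the previous line in memory while it compares, so memory use stays constant no matter how many lines you stream (it is bounded by the longest line). No hash tables. No bloom filters. Pure streaming deduplication over pre-indexed pipelines.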
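A minimal streaming sketch of the adjacent-comparison loop: it holds only the previous line, so memory does not grow with the number of lines. Treating a final line without a trailing newline as equal to the same text with one, and always writing a newline, are assumptions modeled on GNU coreutils `uniq`. `stream_dedupe` is a hypothetical name.

```python
import io


def stream_dedupe(infile, outfile):
    """Write the first line of each run of identical adjacent lines, reading one line at a time."""
    prev = None  # the only state kept between iterations
    for raw in infile:
        line = raw[:-1] if raw.endswith("\n") else raw
        if line != prev:
            outfile.write(line + "\n")
            prev = line


if __name__ == "__main__":
    demo_out = io.StringIO()
    stream_dedupe(io.StringIO("a\na\nb\na"), demo_out)
    print(repr(demo_out.getvalue()))  # 'a\nb\na\n'
```

Passing `sys.stdin` and `sys.stdout` instead of the in-memory buffers turns the same function into a command-line filter.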

Ship Clean Data

Output your deduplicated stream to any destination — files, pipelines, downstream services. Group separators and null-terminated output supported natively.
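A minimal sketch of grouped output with a configurable record delimiter: every record is written, and an empty record separates consecutive groups of identical adjacent records. Passing a NUL byte as the delimiter gives NUL-terminated input and output. Using the delimiter itself as the separator is an assumption modeled on GNU coreutils `uniq --group` with `-z`; `group_output` is a hypothetical name.

```python
from itertools import groupby


def group_output(data: bytes, delimiter: bytes = b"\n") -> bytes:
    """Emit every record, with an empty record between runs of identical adjacent records."""
    records = data.split(delimiter)
    if records and records[-1] == b"":
        records.pop()  # drop the empty piece left by a trailing delimiter
    out = []
    for index, (_, run) in enumerate(groupby(records)):
        if index > 0:
            out.append(b"")  # empty record marks the boundary between groups
        out.extend(run)
    return b"".join(record + delimiter for record in out)


if __name__ == "__main__":
    print(group_output(b"a\na\nb\n"))
    print(group_output(b"x\x00y\x00y\x00", delimiter=b"\x00"))
```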

Pricing

Starter

$0/mo

Default deduplication
Up to 10K lines/stream
Single mode per query
Community support

Start Free

What teams are saying

"We were shipping duplicate rows to production for months. SINGULR caught 12 million redundant records in our first run. Our data warehouse costs dropped 34% overnight."

Jordan Krause VP Data, Meridian Analytics

"The adjacent-comparison architecture is genius. Pre-index once, deduplicate forever. We process 800M events/day and SINGULR adds zero measurable overhead."

Priya Nambiar Staff Engineer, Tessera Cloud

"I used to pipe through sort | some-dedup-tool and pray. SINGULR's count mode and case normalization changed how we think about data quality. It's embarrassingly fast."

Marcus Lindqvist CTO, NordStream Data

"The --unique mode is a cheat code. Finding the one-off anomalies in a 50GB log file used to take hours. SINGULR does it in a single streaming pass."

Dani Sakamoto SRE Lead, Uplink Systems

The team behind SINGULR

Camille Reeves

CEO & Co-Founder

Former Principal Engineer at Snowflake. Believes duplicates are a moral failing.

Sana Zhao

CTO & Co-Founder

PhD in streaming algorithms from CMU. Wrote her thesis on adjacent-pair comparison at scale.

Ellis Marchetti

Head of Product

Previously led data quality at Stripe. Obsessed with the purity of output streams.

FAQ

SINGULR is architected for adjacent-line deduplication: it detects a duplicate only when it sits directly after an identical line, so unsorted data must be sorted (pre-indexed) first. This isn't a limitation; it's a feature. Adjacent comparison delivers deterministic, O(1)-memory deduplication with zero hash collisions. Think of it as optimized-by-default.

The --repeated flag outputs one representative line per duplicate group. -D expands to show every single occurrence. Use --repeated for summaries, -D for full forensic analysis.

Absolutely. Use --skip-fields=N to ignore leading fields and --skip-chars=N to bypass leading characters. Combine with --check-chars=N to compare only a specific character window. Full precision targeting.

Yes. Enable --ignore-case and SINGULR normalizes casing before every comparison. "DATA", "Data", and "data" are recognized as duplicates.

SINGULR reads and writes newline-delimited records by default. Enable --zero-terminated to use NUL instead of newline as the record delimiter, or use --group to print every line with an empty-line separator between duplicate groups for structured downstream processing.

Kill duplicates.
