Backed by mandō

Kill duplicates.

Request Access →

The Deduplication Manifesto

Every dataset is lying to you. Rows repeat. Records echo. Your pipeline is haunted by phantom duplicates that inflate your metrics, corrupt your models, and waste your compute.

We built SINGULR because deduplication isn't a feature — it's a philosophy. A single source of truth isn't a nice-to-have. It's the only acceptable state of your data.

One line. One truth. Zero noise. That's the SINGULR promise.

Everything you need to eliminate redundancy

Occurrence Counting

Prefix every line with its exact occurrence count. Know not just what's duplicate — know how duplicate it is. Powered by the --count engine.
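
As a rough illustration of what occurrence counting over adjacent lines looks like, here is a minimal Python sketch. It is not SINGULR's implementation; the function name `count_runs` is hypothetical, and the sketch assumes counts are kept per run of adjacent identical lines.

```python
from itertools import groupby

def count_runs(lines):
    """Yield (count, line) for each run of identical adjacent lines."""
    for line, run in groupby(lines):
        yield sum(1 for _ in run), line

rows = ["a", "a", "b", "a"]
print(list(count_runs(rows)))  # [(2, 'a'), (1, 'b'), (1, 'a')]
```

Under that assumption the trailing "a" is counted on its own, because it is not adjacent to the first two.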

Duplicate Isolation

Surface only lines that appear more than once. Strip the noise, keep the signal. The --repeated filter exposes every offender.

Full Duplicate Expansion

Don't just flag duplicates — see every single occurrence. The -D mode gives you complete transparency into repeated data.
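
The difference between the --repeated filter and the -D mode can be pictured with a short Python sketch. The function names `repeated` and `all_repeated` are hypothetical, and the code assumes both modes look only at runs of adjacent identical lines; it is an illustration, not SINGULR's code.

```python
from itertools import groupby

def repeated(lines):
    """One representative line per run of two or more identical adjacent lines."""
    for line, run in groupby(lines):
        if len(list(run)) > 1:
            yield line

def all_repeated(lines):
    """Every line that belongs to a run of two or more identical adjacent lines."""
    for _, run in groupby(lines):
        members = list(run)
        if len(members) > 1:
            yield from members

rows = ["a", "a", "b", "c", "c", "c"]
print(list(repeated(rows)))      # ['a', 'c']
print(list(all_repeated(rows)))  # ['a', 'a', 'c', 'c', 'c']
```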

Unique-Only Mode

Extract lines that appear exactly once. Find the truly original entries in any dataset with --unique — zero false positives.
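
A unique-only filter can be sketched the same way. The name `unique_only` is hypothetical, and the sketch assumes "exactly once" is judged per run of adjacent lines.

```python
from itertools import groupby

def unique_only(lines):
    """Yield lines whose run of identical adjacent lines has length one."""
    for line, run in groupby(lines):
        if len(list(run)) == 1:
            yield line

print(list(unique_only(["a", "a", "b", "c", "c"])))  # ['b']
```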

Case-Insensitive Matching

Normalize casing before comparison. "ERROR" and "error" are the same duplicate — SINGULR treats them that way with --ignore-case.
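
A minimal sketch of case-insensitive matching, assuming that normalization means case folding and that the first line of each run is the one kept. Both are assumptions for illustration, and `dedup_ignore_case` is a hypothetical name.

```python
from itertools import groupby

def dedup_ignore_case(lines):
    """Keep the first line of each run whose members match case-insensitively."""
    for _, run in groupby(lines, key=str.casefold):
        yield next(run)

print(list(dedup_ignore_case(["ERROR", "error", "Error", "ok"])))  # ['ERROR', 'ok']
```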

Field-Aware Comparison

Skip leading fields and characters before comparing. Target the exact data columns that matter with --skip-fields and --skip-chars.

Precision Window

Compare only the first N characters of each line. Scope your deduplication to exactly the data boundaries you define with --check-chars.
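
Field skipping, character skipping, and the precision window can all be thought of as building a comparison key from each line. The sketch below is one way to picture that; `comparison_key` and `dedup` are hypothetical names, and it assumes fields are separated by whitespace and that skips are applied before the window. SINGULR's exact rules may differ.

```python
from itertools import groupby

def comparison_key(line, skip_fields=0, skip_chars=0, check_chars=None):
    """Build the part of the line that is actually compared."""
    rest = line
    if skip_fields:
        # Assumption: fields are whitespace-separated.
        parts = line.split(None, skip_fields)
        rest = parts[skip_fields] if len(parts) > skip_fields else ""
    rest = rest[skip_chars:]
    return rest if check_chars is None else rest[:check_chars]

def dedup(lines, **opts):
    """Keep the first line of each run of adjacent lines with equal keys."""
    for _, run in groupby(lines, key=lambda l: comparison_key(l, **opts)):
        yield next(run)

rows = ["09:00 disk full", "09:05 disk full", "09:07 disk ok"]
print(list(dedup(rows, skip_fields=1)))                 # ['09:00 disk full', '09:07 disk ok']
print(list(dedup(rows, skip_fields=1, check_chars=4)))  # ['09:00 disk full']
```

Skipping the timestamp field makes the first two lines compare equal; narrowing the window to four characters makes all three compare equal.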

How it works

01. Pre-Index Your Data

SINGULR is optimized for pre-indexed data streams: input ordered so that identical lines sit next to each other. Feed it structured, ordered input for maximum throughput, since each line is compared only with the one before it.
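
To see why ordering matters for adjacent comparison, consider this small Python sketch. It assumes that sorting is one acceptable way to pre-index a stream; `dedup_adjacent` is a hypothetical helper, not part of SINGULR.

```python
from itertools import groupby

def dedup_adjacent(lines):
    """Collapse each run of identical adjacent lines to a single line."""
    return [line for line, _ in groupby(lines)]

rows = ["b", "a", "b", "a"]
print(dedup_adjacent(rows))          # ['b', 'a', 'b', 'a']
print(dedup_adjacent(sorted(rows)))  # ['a', 'b']
```

On the unordered input nothing is removed, because no two identical lines are adjacent; after ordering, each value appears once.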

02. Configure Your Filter

Choose your mode: count occurrences, isolate duplicates, extract uniques, or run full expansion. Layer in case normalization, field skipping, and character windowing.

03. Execute at Line Speed

SINGULR compares each line only with the previous one, so memory use stays O(1) in the number of lines. No hash tables. No Bloom filters. Pure streaming deduplication over pre-indexed pipelines.
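
A streaming pass that remembers only the previous line can be sketched in a few lines of Python. This is an illustration of the general technique under that assumption, not SINGULR's engine; `stream_dedup` is a hypothetical name.

```python
import io

def stream_dedup(lines):
    """Yield each line that differs from the one immediately before it."""
    previous = object()  # sentinel that compares unequal to every string
    for line in lines:
        if line != previous:
            yield line
        previous = line  # the only state carried between lines

source = io.StringIO("a\na\nb\nb\nb\nc\n")
print("".join(stream_dedup(source)), end="")
```

Because the generator holds a single line at a time, memory use does not grow with the length of the stream, only with the length of the longest line.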

04. Ship Clean Data

Output your deduplicated stream to any destination — files, pipelines, downstream services. Group separators and null-terminated output supported natively.
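
The sketch below shows one possible reading of group separators and null-terminated output: every line is written, an empty record is placed between runs of identical lines, and the record terminator is configurable. The function `render_groups` and its `terminator` parameter are hypothetical and only illustrate the idea.

```python
from itertools import groupby

def render_groups(lines, terminator="\n"):
    """Write every line, with an empty record between runs of identical lines."""
    records = []
    for i, (_, run) in enumerate(groupby(lines)):
        if i:
            records.append("")  # empty record separates one group from the next
        records.extend(run)
    return "".join(record + terminator for record in records)

print(repr(render_groups(["a", "a", "b"])))   # 'a\na\n\nb\n'
print(repr(render_groups(["a", "b"], "\0")))  # 'a\x00\x00b\x00'
```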

  • 47B+ lines deduplicated
  • 0ms per-comparison latency
  • O(1) memory complexity
  • 99.97% adjacent-match accuracy

Pricing

Starter
$0/mo
  • Default deduplication
  • Up to 10K lines/stream
  • Single mode per query
  • Community support
Start Free

Enterprise
$299/mo
  • Everything in Pro
  • Null-terminated streams
  • Group separators
  • Custom comparison windows
  • Dedicated SLA
Contact Sales

Sovereign
Custom
  • Everything in Enterprise
  • On-prem dedup cluster
  • Pre-indexing pipeline addon
  • White-glove onboarding
  • Dedicated dedup engineer
Talk to Us

What teams are saying

"We were shipping duplicate rows to production for months. SINGULR caught 12 million redundant records in our first run. Our data warehouse costs dropped 34% overnight."

Jordan Krause VP Data, Meridian Analytics

"The adjacent-comparison architecture is genius. Pre-index once, deduplicate forever. We process 800M events/day and SINGULR adds zero measurable overhead."

Priya Nambiar Staff Engineer, Tessera Cloud

"I used to pipe through sort | some-dedup-tool and pray. SINGULR's count mode and case normalization changed how we think about data quality. It's embarrassingly fast."

Marcus Lindqvist CTO, NordStream Data

"The --unique mode is a cheat code. Finding the one-off anomalies in a 50GB log file used to take hours. SINGULR does it in a single streaming pass."

Dani Sakamoto SRE Lead, Uplink Systems

The team behind SINGULR

Camille Reeves

CEO & Co-Founder

Former Principal Engineer at Snowflake. Believes duplicates are a moral failing.

Sana Zhao

CTO & Co-Founder

PhD in streaming algorithms from CMU. Wrote her thesis on adjacent-pair comparison at scale.

Ellis Marchetti

Head of Product

Previously led data quality at Stripe. Obsessed with the purity of output streams.

FAQ

SINGULR is architected for adjacent-line deduplication, which delivers maximum performance on pre-indexed data streams. This isn't a limitation — it's a feature. Pre-indexing ensures deterministic, O(1) memory deduplication with zero hash collisions. Think of it as optimized-by-default.

The --repeated flag outputs one representative line per duplicate group. -D expands to show every single occurrence. Use --repeated for summaries, -D for full forensic analysis.

Absolutely. Use --skip-fields=N to ignore leading fields and --skip-chars=N to bypass leading characters. Combine with --check-chars=N to compare only a specific character window. Full precision targeting.

Yes. Enable --ignore-case and SINGULR normalizes casing before every comparison. "DATA", "Data", and "data" are recognized as duplicates.

SINGULR outputs newline-delimited results by default. Enable --zero-terminated for NUL-delimited output, or use --group to insert empty-line separators between duplicate groups for structured downstream processing.

Get access

Join the teams eliminating duplicates at scale.