Last updated: March 2026
On-chain clustering algorithms are computational methods used by blockchain forensics investigators to group multiple cryptocurrency addresses under a single controlling entity, transforming the raw transaction graph of a public blockchain into structured ownership clusters attributable to known individuals, organizations, or criminal actors. These algorithms identify behavioral and structural patterns across the blockchain ledger that suggest shared custody of different addresses. Without effective clustering algorithms, investigators would need to manually examine each address in isolation, making it practically impossible to build a complete picture of fund flows in large-scale cryptocurrency investigations.
Crypto Trace Labs applies advanced on-chain clustering algorithms in every blockchain investigation it conducts, combining industry-leading blockchain analytics platforms with proprietary graph analysis methodologies developed through direct operational case experience. Founded by VP and Director-level executives formerly of Blockchain.com, Kraken, and Coinbase, ACAMS-accredited, MLRO-qualified across the UK, US, and EU, and Chartered Fellow Grade at the CMI, Crypto Trace Labs serves law enforcement agencies, regulated financial institutions, and private clients who need professional clustering analysis to support crypto asset recovery, AML compliance programs, and financial crime investigations.
Key Takeaways
- Common-input ownership clustering links 70%+ of grouped addresses: The co-spend heuristic remains the most widely applied clustering algorithm, grouping addresses that appear together as transaction inputs under single entity ownership.
- Graph-based clustering detects behavioral patterns across millions of addresses: Modern blockchain analytics platforms apply graph neural networks and network analysis to identify ownership clusters from spending pattern similarities, not just direct co-spend events.
- Entity clustering reduces an investigation from 10,000 addresses to under 100 clusters: According to Chainalysis (2024), advanced clustering typically reduces the analytical footprint of an investigation by 95 to 99 percent, making large-scale investigations feasible.
- Seed-and-expand clustering follows fund trails through up to 15 degrees of separation: Starting from a single known address and expanding the cluster through connected transactions allows investigators to map entire criminal fund networks.
- False positive rates below 3% are achievable with multi-factor validation: Combining three or more independent clustering signals produces highly reliable ownership attribution suitable for court-admissible expert witness reporting.
Why This Matters
Address clustering is the step where blockchain forensics either succeeds or fails. An investigation that misattributes a cluster, incorrectly grouping unrelated addresses into a single entity, produces findings that collapse under cross-examination. Conversely, validated clustering can compress 50,000 suspect addresses down to fewer than 100 controlled entities, transforming an unmanageable dataset into actionable intelligence. For legal teams, understanding clustering algorithms determines what challenge questions to pose to expert witnesses. For compliance teams, it determines how much confidence to place in platform-generated attribution. Clustering accuracy is the single biggest technical variable in whether a crypto asset recovery or prosecution succeeds.
The Co-Spend Clustering Algorithm Explained
The co-spend clustering algorithm, also called the common-input ownership heuristic, groups addresses that appear together as inputs to a single transaction, on the principle that signing every input of such a transaction requires the private keys for all of the contributing addresses. By scanning the full transaction history of a blockchain and applying this rule consistently, the algorithm groups thousands of individually created addresses into entity-level ownership clusters.
The co-spend algorithm is executed by building a union-find data structure across the complete transaction graph, merging address sets whenever two or more addresses appear as inputs to the same transaction. According to Chainalysis (2024), the co-spend algorithm applied to Bitcoin’s complete transaction history produces several hundred million address-to-cluster assignments, with the largest clusters representing major cryptocurrency exchanges that have processed tens of millions of transactions. This baseline clustering then serves as the foundation for all subsequent investigative analysis and is incorporated in tools including Chainalysis Reactor, Elliptic Investigator, and Crystal Intelligence.
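The union-find approach can be illustrated with a minimal Python sketch. It is not the implementation used by any named platform; the transaction format (a dict with an `inputs` list of address strings) and the names `UnionFind` and `cospend_clusters` are assumptions made for illustration only.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen addresses as singleton clusters.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        # Walk up to the root of the set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller set under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cospend_clusters(transactions):
    """Group addresses that appear together as inputs of any transaction."""
    uf = UnionFind()
    for tx in transactions:
        inputs = tx["inputs"]  # hypothetical format: list of address strings
        for addr in inputs:
            uf.find(addr)  # make sure single-input addresses are registered
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)
    clusters = defaultdict(set)
    for addr in uf.parent:
        clusters[uf.find(addr)].add(addr)
    return list(clusters.values())
```

Because merges are transitive, two addresses end up in the same cluster even if they never share a transaction directly, as long as a chain of co-spends connects them. That same property is why a single misread transaction can wrongly merge two large clusters.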
| Clustering Method | Addresses Covered | False Positive Rate | Platform Support |
|---|---|---|---|
| Co-spend (common-input) | ~70%+ of grouped addresses | ~15% unvalidated / <3% validated | Chainalysis, Elliptic, TRM Labs, Crystal |
| Graph-based behavioral | Additional 15-20% beyond co-spend | Low with trained models | Elliptic, TRM Labs |
| Seed-and-expand | Entire networks from one seed | Depends on seed accuracy | All major platforms |
| Change address detection | Reduces gaps by ~35% | Very low with multi-factor | Chainalysis, Elliptic |
Graph-Based Clustering Methods
Graph-based clustering algorithms are defined as methods that extend co-spend clustering by analyzing the structural and behavioral properties of address interaction networks to identify ownership relationships that co-spend analysis alone cannot detect. These methods model the blockchain as a directed graph where addresses are nodes and transactions are edges, then apply community detection, graph neural network analysis, and network centrality metrics to identify groups of addresses that interact in patterns consistent with shared ownership.
Behavioral graph clustering identifies clusters based on spending pattern similarity: addresses that consistently transact with the same counterparties, at the same times of day, and with similar value distributions are likely controlled by the same entity even if they never directly co-sign a transaction input. According to TRM Labs (2023), graph-based behavioral clustering identifies approximately 15 to 20 percent of wallet clusters that co-spend analysis alone would have missed, particularly in cases where sophisticated actors deliberately avoid address reuse and co-spend events. Blockchain analytics platforms including Elliptic and Crystal Intelligence incorporate graph neural network models trained on labeled datasets to extend clustering accuracy beyond what rule-based heuristics alone achieve.
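As a rough illustration of the counterparty-similarity idea only, the sketch below scores address pairs by the Jaccard overlap of their counterparty sets. It is a deliberately simple stand-in, not the graph neural network models the platforms use; the input format, the function names, and the 0.5 threshold are all assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_links(counterparties, threshold=0.5):
    """Return address pairs whose counterparty sets overlap strongly.

    `counterparties` maps an address to the set of addresses it has
    transacted with. The threshold is an illustrative assumption.
    """
    links = []
    for a, b in combinations(sorted(counterparties), 2):
        score = jaccard(counterparties[a], counterparties[b])
        if score >= threshold:
            links.append((a, b, round(score, 2)))
    return links
```

A pair returned here is only a lead for further review. Timing and value-distribution signals, which the paragraph above also mentions, would need to be added as separate features before any ownership claim is made.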

Seed-and-Expand Clustering in Practice
Seed-and-expand clustering is an investigative technique defined as the process of starting from a single known address, the “seed”, and systematically expanding the cluster by following all transactions connected to it and applying clustering rules to identify additional addresses under the same controller. This approach is used when investigators have a confirmed starting point, such as a criminal wallet or a deposit address identified from exchange records, and need to map the full extent of the associated address network.
The expansion process follows outbound transaction connections, applying co-spend rules at each hop to add newly discovered co-signer addresses to the growing cluster, and following fund flows forward to identify destination addresses that receive funds from cluster members. According to Elliptic (2025), seed-and-expand methodology applied to the Lazarus Group blockchain activity identified over 300,000 associated addresses from an initial set of fewer than 50 known seed addresses. In active crypto asset recovery cases, Crypto Trace Labs applies seed-and-expand clustering to map criminal fund networks across multiple blockchains, often revealing the full operational infrastructure of fraud schemes from a single verified starting point.
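The expansion step described above can be sketched as follows, under simplifying assumptions: transactions are dicts with `inputs` and `outputs` lists of address strings, only the co-spend rule is used to grow the cluster, and the function name `seed_and_expand` is hypothetical.

```python
def seed_and_expand(seed, transactions):
    """Grow a cluster from one seed address using the co-spend rule.

    Returns (cluster, destinations): addresses inferred to share the
    seed's controller, and outside addresses that received funds from
    the cluster.
    """
    cluster = {seed}
    changed = True
    # Repeat until a full pass over the data adds no new co-spenders.
    while changed:
        changed = False
        for tx in transactions:
            inputs = set(tx["inputs"])
            if inputs & cluster and not inputs <= cluster:
                cluster |= inputs
                changed = True
    # Follow funds forward: outputs paid by cluster members.
    destinations = set()
    for tx in transactions:
        if set(tx["inputs"]) & cluster:
            destinations |= set(tx["outputs"]) - cluster
    return cluster, destinations
```

Destinations are kept separate from the cluster on purpose: receiving funds from a cluster member does not by itself indicate shared ownership. Tracing further hops would repeat the same step from each destination, with each new cluster validated independently.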
Change Address Clustering Value
Change address clustering refers to the method of identifying which output address in a transaction represents funds being returned to the sender, and adding that change address to the same ownership cluster as the sender’s input addresses. Since a wallet creating a transaction controls both its input addresses and the change output it designates to receive the unspent balance, clustering the change address with the inputs correctly extends the cluster to include all addresses the wallet controls.
Without change address clustering, investigators lose track of continued fund custody every time a wallet makes a payment, because change addresses are typically freshly generated and have no prior co-spend connection to other cluster members. According to ACAMS (2024), change address clustering reduces the number of lost fund trails in standard on-chain analysis by approximately 35 percent, significantly improving completeness. Crypto Trace Labs applies a multi-method change address detection algorithm combining output round-number analysis, script type consistency, value ratio assessment, and temporal clustering to achieve high-confidence change output identification with minimal false-positive cluster assignments.
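A multi-factor change detector of this kind might be sketched as below. This is not Crypto Trace Labs' algorithm: it scores only three signals (non-round value, script type matching the inputs, and a previously unseen address), omits value ratio and temporal analysis, and the field names, the 0.001 BTC rounding unit, and the `min_score` default are illustrative assumptions.

```python
def is_round(value_sats, unit=100_000):
    """True if the amount is a multiple of `unit` satoshis (0.001 BTC here)."""
    return value_sats % unit == 0


def likely_change_output(tx, seen_addresses, min_score=2):
    """Return the address of the probable change output, or None.

    Each output gets one point per heuristic it satisfies. An output is
    reported only if it is the unique top scorer with at least `min_score`.
    """
    input_types = {i["script_type"] for i in tx["inputs"]}
    scores = []
    for out in tx["outputs"]:
        score = 0
        if not is_round(out["value"]):
            score += 1  # payments are more often round amounts
        if out["script_type"] in input_types:
            score += 1  # change often returns to the wallet's own script type
        if out["address"] not in seen_addresses:
            score += 1  # change addresses are typically freshly generated
        scores.append((score, out["address"]))
    if not scores:
        return None
    scores.sort(reverse=True)
    best_score, best_address = scores[0]
    if best_score < min_score:
        return None
    if len(scores) > 1 and scores[1][0] == best_score:
        return None  # ambiguous: refuse to guess
    return best_address
```

Returning `None` on a tie is a deliberate design choice in this sketch: a wrong change assignment pulls a stranger's address into the cluster, so declining to decide is cheaper than a false positive.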
Known Limitations of Clustering Algorithms
On-chain clustering algorithms operate on probabilistic inferences drawn from publicly visible transaction data, which means they are subject to error when underlying assumptions are violated. CoinJoin transactions deliberately pool inputs from multiple unrelated users, falsely triggering co-spend clustering rules and creating incorrect multi-entity clusters. Mixing services route funds through shared pools, which breaks the spending-pattern assumptions that behavioral clustering depends on.
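One common mitigation is to flag likely CoinJoin transactions and exclude them before the co-spend rule is applied. The sketch below uses a simple equal-output-value test. The thresholds, the transaction format, and both function names are assumptions, and real CoinJoin implementations vary enough that a rule like this will miss some and misflag others.

```python
from collections import Counter


def looks_like_coinjoin(tx, min_inputs=3, min_equal_outputs=3):
    """Flag transactions with several inputs and several equal-valued outputs."""
    if len(tx["inputs"]) < min_inputs:
        return False
    counts = Counter(out["value"] for out in tx["outputs"])
    # The largest group of identical output values must reach the threshold.
    return max(counts.values(), default=0) >= min_equal_outputs


def filter_for_cospend(transactions):
    """Keep only transactions that are safe to feed into co-spend clustering."""
    return [tx for tx in transactions if not looks_like_coinjoin(tx)]
```

Excluded transactions are not discarded from the investigation. They are set aside for manual review so that they cannot silently merge unrelated clusters.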
Privacy-focused protocols such as Monero and Zcash are specifically designed to resist clustering algorithms: Monero’s ring signatures hide the real spent input among decoys, so an observer cannot tell from the transaction alone which ring member is the true spender, and Zcash’s shielded pool hides the amounts and counterparties of shielded transactions. According to FinCEN (2024), privacy-protocol funds account for approximately 8 percent of high-risk fund flows investigated in major financial crime cases, requiring specialist clustering techniques beyond standard algorithms. Crypto Trace Labs documents all known algorithm limitations explicitly in expert witness reports, ensuring courts and regulators understand the confidence level and evidentiary basis of each cluster assignment.

Cluster Assignment Validation Methods
Cluster validation is defined as the process of verifying that addresses assigned to the same ownership cluster by an algorithm actually belong to the same entity, rather than having been incorrectly grouped due to CoinJoin participation, algorithmic error, or coincidental address co-signing. Validation combines multiple independent signals: off-chain intelligence such as exchange deposit records and IP logs, on-chain behavioral consistency analysis, entity label cross-referencing with known attribution databases, and manual review of cluster member transaction patterns.
Professional blockchain forensics practice requires investigators to document the basis for every cluster assignment that will be presented as evidence in legal proceedings. According to Elliptic (2025), rigorous cluster validation reduces false positive rates from approximately 15 percent in unvalidated algorithmic output to below 3 percent in manually reviewed forensic-quality results, meeting the standard required for court-admissible blockchain evidence in UK AML and international financial crime proceedings. Crypto Trace Labs applies a multi-stage cluster validation workflow to all crypto asset recovery investigations, ensuring every address assignment reported is supported by documented evidence.
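The requirement to document the basis of each assignment can be pictured as a simple gate that counts distinct supporting signals per address. This is a hedged sketch, not a described workflow: the signal names, the three-signal default (taken from the multi-factor rule of thumb in the Key Takeaways), and the function name are illustrative assumptions.

```python
REQUIRED_SIGNALS = 3  # assumption: "three or more independent signals"


def review_assignments(evidence, required=REQUIRED_SIGNALS):
    """Split cluster assignments into validated and needs-review.

    `evidence` maps an address to an iterable of signal names that
    support its cluster assignment. Duplicate signals count once.
    """
    validated, needs_review = {}, {}
    for address, signals in evidence.items():
        distinct = sorted(set(signals))
        target = validated if len(distinct) >= required else needs_review
        target[address] = distinct  # keep the documented basis for the report
    return validated, needs_review
```

Counting signals is only a first filter. Whether the signals are truly independent, and whether each one is reliable, still calls for the manual review described above.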
Frequently Asked Questions
What is on-chain clustering in blockchain forensics?
On-chain clustering is the process of grouping multiple cryptocurrency wallet addresses under a single controlling entity by applying algorithmic pattern recognition to publicly available blockchain transaction data. Clustering transforms a raw list of individual addresses into structured ownership entities that can be attributed to known individuals, organizations, or criminal networks. Professional blockchain forensics firms including Crypto Trace Labs apply clustering algorithms as the foundational analytical step in every crypto asset recovery and financial crime investigation.
How does the co-spend heuristic create address clusters?
The co-spend heuristic creates address clusters by identifying groups of addresses that appear together as inputs to the same transaction, then merging them into a single ownership cluster. Because spending each input requires a signature from the corresponding private key, a wallet that spends several inputs in one transaction must hold all of the relevant private keys. The co-spend rule is applied systematically across the blockchain transaction history to produce baseline clustering covering the majority of identifiable address groupings.
What is the difference between co-spend and graph-based clustering?
Co-spend clustering groups addresses based on direct co-signing events in the same transaction. Graph-based clustering extends this by analyzing behavioral patterns across the transaction graph, grouping addresses that consistently interact with the same counterparties, transact at similar times, or share spending characteristics, even if they never directly co-sign a transaction. Graph-based methods identify 15 to 20 percent of additional clusters that co-spend analysis misses, according to TRM Labs (2023), particularly for users who deliberately avoid direct co-spend events.
How accurate are on-chain clustering algorithms?
Unvalidated co-spend clustering algorithms achieve approximately 85 percent accuracy against known ground-truth address groupings, with false positive rates around 15 percent primarily caused by CoinJoin transactions and mixing service participation. Multi-factor clustering models that combine co-spend, change address, graph behavioral analysis, and manual validation reduce false positive rates to below 3 percent, according to Elliptic (2025). For blockchain forensics reports intended for legal proceedings, investigators always document algorithm limitations and apply manual validation to achieve evidence-grade accuracy standards.
Can clustering algorithms be defeated by privacy tools?
Yes. CoinJoin transactions, mixing services, and privacy-focused blockchain protocols such as Monero and Zcash are specifically designed to defeat standard on-chain clustering algorithms. Monero’s ring signatures hide the real spent input among decoys, so the true spender cannot be read from the transaction itself and co-spend clustering does not apply. Zcash’s shielded pool hides the amounts and participants of shielded transactions. Professional blockchain forensics investigators apply specialist techniques beyond standard clustering for these cases, including timing analysis, post-mix behavioral monitoring, and cross-chain bridge tracking.
What is seed-and-expand clustering?
Seed-and-expand clustering starts from a single known wallet address and systematically extends the cluster by following connected transaction history and applying clustering rules at each step. Starting from one confirmed criminal address, investigators can map entire fund networks encompassing hundreds of thousands of associated addresses. This approach is particularly effective in ransomware investigations, exchange hack attributions, and crypto asset recovery cases where a verified starting point is available from law enforcement or exchange records.
How do investigators validate cluster assignments?
Investigators validate cluster assignments by cross-referencing algorithmic outputs against multiple independent evidence sources: exchange KYC records, IP address logs, law enforcement intelligence, and behavioral consistency analysis. Each cluster assignment in a legal context must be supported by documented evidence of the algorithmic rule that created the grouping. According to Elliptic (2025), rigorous manual validation reduces false positive rates from 15 percent in raw algorithmic output to below 3 percent in forensic-quality results meeting UK AML and international court evidence standards.
Why does change address clustering matter?
Change address clustering extends ownership clusters to include addresses that receive unspent balance returned to the sender, which would otherwise appear as new unassociated addresses. Without it, investigators lose track of fund custody every time a wallet makes a payment and generates a new change address, creating artificial gaps in the transaction trace. By correctly identifying and clustering change outputs with their originating wallets, investigators maintain continuous fund flow traceability across every transaction hop.
What blockchain networks support clustering analysis?
Blockchain clustering analysis is most fully developed for Bitcoin and its UTXO-based forks, where the common-input co-spend heuristic applies most cleanly. Ethereum and EVM-compatible chains support clustering through different mechanisms: account-based transaction monitoring, smart contract interaction analysis, and token transfer tracking. Major blockchain analytics platforms including Chainalysis, Elliptic, and Crystal Intelligence support clustering analysis across 30 to 100-plus blockchain networks, with varying depth depending on the chain’s architecture.
What does clustering-based blockchain forensics investigation cost?
Clustering-based blockchain forensics at Crypto Trace Labs is structured on a case-dependent basis, with the scope determined by the number of addresses involved, blockchain networks covered, and output requirements. On-chain asset tracing requires an upfront engagement before recovery activity begins. Non-custodial wallet recovery carries no upfront charge; payment follows successful recovery only. Contact Crypto Trace Labs to discuss the scope and structure of your specific investigation.
Executive Summary
On-chain clustering algorithms transform raw blockchain data into structured ownership entities that investigators can attribute to real-world actors. The co-spend (common-input) heuristic is the foundational method, grouping co-signing addresses into entity clusters, with Chainalysis reporting 95-99% reduction in analytical footprint per investigation. Graph-based behavioral clustering adds a further 15-20% coverage. Change address detection reduces lost fund trails by 35%. Multi-factor validation achieves false positive rates below 3%, meeting court-admissible evidence standards. Crypto Trace Labs applies multi-algorithm clustering across all blockchain forensics, AML compliance, and crypto asset recovery engagements for clients in the UK, US, and EU.
What Should You Do Next?
If you require professional blockchain forensics clustering analysis or crypto asset recovery support, Crypto Trace Labs is ready to discuss your case in confidence. Our team, ACAMS-accredited, MLRO-qualified, and Chartered Fellow Grade at the CMI, with founding members from Blockchain.com, Kraken, and Coinbase, applies multi-algorithm clustering with rigorous validation in every investigation. We offer no upfront charge for non-custodial wallet recoveries.
About the Author
Crypto Trace Labs is a specialist crypto asset recovery and blockchain forensics firm founded by VP and Director-level executives formerly of Blockchain.com, Kraken, and Coinbase. Our team holds ACAMS accreditations, MLRO qualifications across the UK, US, and EU, and Chartered Fellow Grade status at the CMI. With over 10 years of experience in financial crime investigation and court-recognized blockchain forensics expertise, we have recovered 101 Bitcoin for clients in the last 12 months and delivered record fraud reduction for a $14bn crypto exchange. We work with law enforcement agencies, regulated financial institutions, and private clients on crypto asset recovery, blockchain forensics, AML compliance, and expert witness testimony – globally. We offer no upfront charge for non-custodial wallet recoveries.
This content is for informational purposes only and does not constitute legal, financial, or compliance advice. Crypto asset recovery outcomes depend on specific circumstances, regulatory cooperation, and technical factors. Consult qualified professionals regarding your specific situation.


