Scaling blockchains securely

The Nym mixnet solution to the problem of selective disclosure attacks

September 12, 2024 · 9 mins read

Mixnets like Nym’s can help blockchains scale securely by preventing attacks that target chain consensus.

The security problem of scaling

Blockchains are permanent, public ledgers of transactions which cannot be interfered with by third parties. For new transactions to be verified (for example, to ensure that a coin isn’t being spent twice), this full ledger (contained in blocks) must be constantly updated and checked. Therein lie two unsolved problems for blockchain technology:

How can a blockchain scale without straining the resources of verifiers who must use their own bandwidth and resources to verify transactions for the communal chain?
Do more efficient means of data verification come at the cost of chain security?

Mustafa Al-Bassam, co-founder of Celestia and a computer security researcher, diagnoses the problem in the following way: as block sizes grow, running “full nodes” for data availability verification becomes increasingly resource intensive. Committees and light nodes are more efficient alternatives, but they can be vulnerable to a specific network attack known as a selective disclosure attack.

The Nym team has been working closely with Celestia, researching how the Nym mixnet might be able to add an anonymization layer to the blockchain verification process in order to protect against this particular attack. As we will see, this would involve a mode of Unlinkable Data Availability Sampling.

Before sketching Nym’s solution, let’s first look at the technical nature of the problem for blockchains in scaling without relying on full nodes, and how existing solutions leave blockchains open to selective disclosure attacks.

Data availability verification

Data availability refers to the guarantee that all the data in a blockchain block has been properly published and can be accessed when needed by the network. This ensures that all the necessary information related to a block is available for nodes to download and verify so the network can validate the correctness and completeness of the block’s transactions and state.

In blockchain systems, verifying data availability is crucial in preventing malicious actors from hiding or withholding parts of a block’s data while still claiming the block as valid. Without data availability, a block could be added to the chain with missing or incomplete data, potentially leading to invalid transactions, security risks, and an inconsistent state across the network.

Traditional blockchains require users (running full nodes) to verify all data by syncing the entire chain. But ensuring data availability can be challenging, especially as block sizes grow.

Techniques like data availability sampling allow light nodes to verify the availability of data without downloading the entire block, making data availability verification more efficient overall. Modern blockchains take this approach to verify data availability with fewer resources. Celestia is a leading example of a network that provides data availability to other modern chains.

Mechanisms for scaling, securely and insecurely

There are several ways that data availability verification can be achieved with different levels of security:

Full Nodes (maximum security)

Full nodes download and verify all data, ensuring maximum security by rejecting incomplete blocks. This baseline solution, however, becomes increasingly inefficient as block sizes grow. After all, it takes physical and financial resources to perform this verification work.

No Data Availability Guarantee (zero security)

There is no guarantee that data is available, only a commitment (like IPFS URIs). This may be sufficient for scenarios like NFTs where safety isn’t required, but it is certainly not a solution for most chain transactions with real stakes.

Data Availability Committee

Through an honest majority, a select committee guarantees that data is available, thus balancing data availability with performance.

Data Availability Committee with Crypto Economic Security

A committee faces the same problem of scaling, since committee members face growing data overheads to perform verification work. To ameliorate this, committees can be “crypto economically incentivized,” that is, rewarded with tokens in proportion to the verification work performed.

Committees can also be penalized (“slashed” or “halted”) if they are dishonest, thus increasing overall security. This works when the committee is part of the chain’s consensus mechanism.

Light nodes

Introducing light nodes into this framework can also allow for data availability checks, thus further reducing resource requirements. There are two ways of doing this:

Data Availability Sampling without an Honest Minority of Light Nodes: Light nodes use sampling techniques to verify data without downloading the entire block, but they can’t guarantee full data recovery if some data is missing. This relies on the data availability committee and sampling interfaces.
Data Availability Sampling with Honest Minority of Light Nodes: If there’s a minority of honest light nodes, they can reconstruct a block if any data is withheld to enhance security. Note that a synchronous network is needed for nodes to share data effectively.
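The sampling idea described above can be sketched in a few lines of Python. This is a minimal illustration, not Celestia’s implementation: the function name `sample_block`, the share counts, and the representation of a block as a set of available share indices are all assumptions made for the example.

```python
import random

def sample_block(available_shares, total_shares, num_samples, rng):
    """Return True if every randomly sampled share index is available.

    Hypothetical model: a block is a set of share indices that can be served.
    """
    # Draw distinct share indices uniformly at random, as a light node would.
    indices = rng.sample(range(total_shares), num_samples)
    # The light node accepts the block only if every sample is answered.
    return all(i in available_shares for i in indices)

rng = random.Random(0)
total = 256
full_block = set(range(total))                   # every share is published
withheld_block = set(range(total // 4, total))   # first quarter is missing

print(sample_block(full_block, total, 16, rng))      # fully available: True
print(sample_block(withheld_block, total, 16, rng))  # usually hits a missing share
```

The light node downloads only 16 of 256 shares here, which is where the efficiency gain comes from. The trade-off is that detection becomes probabilistic rather than certain.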

Unlinkable Data Availability Sampling

This advanced level prevents targeted attacks (like selective share disclosures, which we will delve into next) by making requests from light nodes unlinkable and uniformly random. This would require further advancements in anonymization technologies, which is where Nym’s solution comes into play.

But first, what makes these prior solutions inadequate and vulnerable exactly?

The problem: Selective disclosure attacks

A selective disclosure attack is a type of data availability attack where a malicious adversary tries to convince a node (or multiple nodes) that a block’s data is fully available when, in reality, part of it is withheld. This effectively makes the block incomplete or unrecoverable.

The attacker’s goal is to manipulate the verification process by selectively responding to queries for block data in a peer-to-peer network. In the end, it breaks consensus, forks the chain, and compromises real transactions and overall trust.

Here’s how it works.

Overview of the Attack

The attack has two simultaneous components:

The adversary withholds enough data shares from the block so that it cannot be reconstructed by the network, making the block unavailable.
Simultaneously, the adversary selectively responds to queries from the target light nodes so that they believe the block is available.

Mechanism of the attack

The network relies on data availability sampling (DAS) where light nodes request random samples of block data from other nodes to verify the availability.

In a selective disclosure attack, the adversary identifies a portion of the block’s data to withhold, ensuring the block cannot be reconstructed.
However, the adversary selectively responds to the queries of honest nodes, such as light clients, by providing data from the shares that were not withheld. This creates the false appearance that the block is fully available.
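To make the mechanism concrete, here is a toy model of the attack when sample requests are linkable to the requester. It is a simplified sketch under stated assumptions: the class `SelectiveAdversary`, the share counts, and the policy of always answering targeted nodes (whose handful of samples is assumed to be too small to help anyone reconstruct the block) are illustrative choices, not taken from the research itself.

```python
import random

TOTAL_SHARES = 256
WITHHELD = set(range(64))  # shares the adversary refuses to release generally

class SelectiveAdversary:
    """Hypothetical adversary that can see who sent each sample request."""

    def __init__(self, targets):
        self.targets = set(targets)

    def respond(self, requester_id, share_index):
        # Linkable requests: targeted nodes always get an answer, so they
        # never observe a missing share.
        if requester_id in self.targets:
            return True
        # Everyone else is refused the withheld shares.
        return share_index not in WITHHELD

def light_node_accepts(adversary, node_id, num_samples, rng):
    """A light node accepts the block if all of its samples are answered."""
    indices = rng.sample(range(TOTAL_SHARES), num_samples)
    return all(adversary.respond(node_id, i) for i in indices)

adversary = SelectiveAdversary(targets={"victim"})
rng = random.Random(0)
print(light_node_accepts(adversary, "victim", 16, rng))  # True: victim is fooled
print(light_node_accepts(adversary, "bystander", 16, rng))
```

The key point is the `requester_id` argument: the attack only works because the adversary can tell the victim’s requests apart from everyone else’s.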

Challenges for Honest Nodes

Since adversarial nodes are indistinguishable from honest ones and respond correctly when queried, they cannot be blacklisted unless they are detected.
Honest nodes make sample requests, and the adversary’s responses appear valid because the withheld data is hidden from the specific queries of the honest nodes.

Two solutions

One suggested countermeasure is adding an anonymization layer where the source of each sample request cannot be linked to the client (light node) and requests are processed in a random order across the network. This prevents the adversary from targeting specific nodes with selective disclosure.
Another approach involves ensuring each node makes a sufficient number of queries (increasing the chances of detecting withheld data), or relying on a large enough number of nodes to cover the missing shares.
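The first countermeasure can be illustrated with a deliberately simple stand-in for an anonymization layer. The function `anonymize_batch` is hypothetical and acts as a trusted shuffler; a real mixnet would reach a similar result with encrypted, multi-hop routing rather than a single trusted component.

```python
import random

def anonymize_batch(requests, rng):
    """Strip requester ids from (requester_id, share_index) pairs and shuffle.

    Toy stand-in for an anonymization layer: the responder receives only
    share indices, in an order unrelated to who asked or when.
    """
    stripped = [share_index for _requester_id, share_index in requests]
    rng.shuffle(stripped)
    return stripped

requests = [("victim", 3), ("victim", 200), ("bystander", 17), ("bystander", 90)]
print(anonymize_batch(requests, random.Random(0)))
```

Once the requester id is gone, an adversary has to apply one policy to every request. It can either answer a withheld share for everyone, which releases the data, or refuse it for everyone, which lets any sampler detect the gap.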

Simulation results

If a client makes a small number of queries (e.g., 15), the probability of a successful attack is relatively high (~0.0133), meaning the adversary could expect to trick the client after about 75 attempts.
As the number of queries per client increases (e.g., to 50 queries), the probability of the attack succeeding drops dramatically (to nearly 0).
Similarly, targeting more clients increases the attack’s success rate, but the probability decreases as the number of queries rises.
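A back-of-the-envelope calculation gives numbers of the same order as those quoted above. The sketch assumes that the adversary withholds 25% of the shares and that each query independently hits a uniformly random share. Neither assumption is stated in this article; the 25% value is chosen because it reproduces the quoted figure for 15 queries.

```python
def attack_success_probability(num_queries, withheld_fraction=0.25):
    """Chance that every one of num_queries independent samples misses
    the withheld shares, so the client wrongly accepts the block.

    withheld_fraction=0.25 is an assumption made for this illustration.
    """
    return (1.0 - withheld_fraction) ** num_queries

p15 = attack_success_probability(15)
p50 = attack_success_probability(50)
print(round(p15, 4))   # 0.0134
print(round(1 / p15))  # 75 expected attempts to fool one client
print(p50 < 1e-6)      # True
```

Under these assumptions the success probability shrinks geometrically with the number of queries, which is why going from 15 to 50 queries takes it from roughly one in 75 to well under one in a million.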

Summary of the problem

The selective disclosure attack manipulates data availability sampling by selectively revealing data to specific nodes, convincing them that the block is fully available. Countermeasures involve (1) using anonymization techniques and (2) ensuring that clients make enough random queries to detect missing data.

Possible solutions investigated

In an “Evaluation of private networks for Celestia,” researchers analyzed different possible solutions for adding an anonymity layer to Celestia’s blockchain. Possible private network solutions included:

A Tor network overlay with Snowflake to mask Tor traffic with WebRTC and prevent eavesdropping
Mixnet integration, such as with Nym’s Loopix-based mixnet, for traffic anonymization
Latency tolerances that implement randomized delays in traffic to desynchronize the queries of clients
Cover traffic or dummy traffic to conceal real query patterns
The use of VPNs for additional protections
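Two of the techniques listed above, randomized delays and cover traffic, are easy to sketch. The code below is an illustrative toy rather than Nym’s implementation: the function names, the exponential delay distribution, and the parameters are assumptions made for the example.

```python
import random

def mix_schedule(arrivals, mean_delay, rng):
    """Delay each message by an independent random amount.

    arrivals: list of (arrival_time, label) pairs.
    Returns the labels in departure order, which may differ from arrival order.
    """
    departures = []
    for arrival_time, label in arrivals:
        # expovariate takes a rate, so rate = 1 / mean_delay.
        delay = rng.expovariate(1.0 / mean_delay)
        departures.append((arrival_time + delay, label))
    departures.sort()
    return [label for _time, label in departures]

def add_cover_traffic(real_requests, num_dummies, total_shares, rng):
    """Blend real sample requests with dummy requests for random shares."""
    dummies = [("dummy", rng.randrange(total_shares)) for _ in range(num_dummies)]
    mixed = [("real", index) for index in real_requests] + dummies
    rng.shuffle(mixed)
    return mixed

rng = random.Random(0)
print(mix_schedule([(0.0, "a"), (0.1, "b"), (0.2, "c")], 1.0, rng))
print(add_cover_traffic([5, 42], 3, 256, rng))
```

The "real" and "dummy" tags are visible here only for illustration; on the wire the two kinds of request would need to look identical. Both techniques add latency or bandwidth overhead, which is the trade-off noted for these solutions.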

These solutions each have advantages and downsides (e.g., in terms of possible latency).

Setting aside the advantages and many problems posed by a Tor overlay, it’s important to note that many of these core solutions are network techniques (e.g. cover traffic and randomized delays) already in operation in the Nym mixnet. So the Nym core team decided to pursue an R&D investigation to see what Nym could do to help Celestia and others make a more private blockchain experience.

Nym’s Anonymous Sampling solution

Nym Technologies proposes to integrate its infrastructure with modular networks, leveraging the Nym mixnet as an anonymization layer to address selective disclosure attacks through Private Data Availability Sampling (P-DAS).

P-DAS enables requests via the Nym Mixnet, decoupling the request from the requester, which ensures that the requester cannot be targeted. This method provides a privacy-preserving, secure approach to data availability sampling, allowing nodes to verify data availability without exposure to adversarial attacks.

A Nym Mixnet integration could offer several key benefits:

An Anonymized Data Availability Sampling module compatible with the Nym mixnet, ensuring privacy-preserving data verification
Prevention of selective disclosure attacks by routing encrypted traffic through the Nym mixnet, obscuring user activity while maintaining data integrity using techniques like cover traffic, mixing, and timing obfuscation
Performance optimization through simulations to determine ideal mixnet parameters for balancing performance and security

The mixnet integration would strengthen data security for modular networks, making them more resilient to manipulation while protecting user privacy. Nym’s research team will continue working to offer robust protection for a modular future.

Stay tuned for more updates on the project!