Data Privacy Day special: a primer on metadata
Learn what metadata is, who can access it and how to protect it
Data Privacy Day is 28 January this year. It’s the perfect occasion to take stock of where we are when it comes to online privacy protections. The good news is that end-to-end encryption is becoming firmly established as the norm. There is now broad recognition that privacy protections enable secure online communication. In addition, secure communications are the very basis for meaningful online interactions, from our social lives through to commerce.
But as the sheer volume of digital communications continues to increase — and with the advent of AI-driven surveillance — it is no longer the content of your message that matters, but its metadata. In fact, the metadata can often reveal things about your life far more accurately than the content. Your message content might contain your thoughts and opinions, but the metadata reveals your actions.
Consider this metaphor: you have a bird’s eye view of a busy city. You are watching the cars as they travel through the bustling traffic of the city. Now, from your position you cannot see what or who is inside any given car. But you CAN see its colour and size, what house it leaves from at what time, where it is going, how frequently it travels between those addresses and more. From that information, you can deduce quite a lot about what is going on: who is visiting whom, the nature of their relationship, even details about their health, habits, and wealth. This, in short, is metadata.
In this blog post we give a primer on metadata surveillance and what can be done about it.
What is ‘metadata’?
Firstly, what is metadata really? It is the data about your message, including its:
- Origin
- Type
- Destination
- Time
- Length
- Size
…and pretty much any other identifiable patterns of your messages.
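To make this concrete, here is a minimal Python sketch of the kind of record an on-path observer could keep for each encrypted message, and of how a handful of such records already add up to a pattern. The `FlowRecord` type, the `contact_profile` helper and the example hostnames are hypothetical and chosen for illustration. Note that no field holds the message content.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """Metadata an on-path observer can log for one encrypted message.

    The payload is never stored: only origin, destination, time and size,
    which stay visible even when the content is encrypted.
    """
    origin: str
    destination: str
    timestamp: float  # seconds since some reference point
    size: int         # bytes on the wire


def contact_profile(records):
    """Summarise who talks to whom, how often, how much, and over what period."""
    profile = defaultdict(lambda: {"count": 0, "bytes": 0, "first": None, "last": None})
    for r in records:
        entry = profile[(r.origin, r.destination)]
        entry["count"] += 1
        entry["bytes"] += r.size
        entry["first"] = r.timestamp if entry["first"] is None else min(entry["first"], r.timestamp)
        entry["last"] = r.timestamp if entry["last"] is None else max(entry["last"], r.timestamp)
    return dict(profile)


# Hypothetical observations: no message content, only metadata.
records = [
    FlowRecord("alice-home", "clinic.example", 100.0, 1200),
    FlowRecord("alice-home", "clinic.example", 3700.0, 800),
    FlowRecord("bob-home", "bank.example", 200.0, 500),
]
for pair, stats in contact_profile(records).items():
    print(pair, stats)
```

Even this crude summary answers the questions from the city metaphor: which addresses talk to each other, how often, and over what span of time.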
What can it be used for?
Metadata can be analysed to reveal internet traffic patterns, and these patterns act like a fingerprint. In fact, one type of attack is called “website fingerprinting”. In such an attack, a malicious actor downloads all the pages of a website, analyses the metadata of each page (for example, the sizes and timing of its encrypted responses), and then matches those profiles against observed traffic. The attacker can then learn which specific page a person is looking at, which is worrying, for example, on a website that provides health advice.
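The two steps of the attack can be sketched as a nearest-match search over packet sizes. This minimal Python sketch is a simplification under stated assumptions: the page paths and byte counts are invented for illustration, each page is reduced to the sequence of its encrypted response sizes, and the closest fingerprint wins. Real attacks typically use richer features (timing, direction, bursts of packets) and trained classifiers rather than this simple distance.

```python
from itertools import zip_longest


def trace_distance(a, b):
    """Sum of absolute differences between two packet-size sequences.

    The shorter sequence is padded with zeros, so missing packets
    also count towards the difference.
    """
    return sum(abs(x - y) for x, y in zip_longest(a, b, fillvalue=0))


def guess_page(fingerprints, observed_trace):
    """Return the known page whose fingerprint is closest to the observed trace."""
    return min(fingerprints, key=lambda page: trace_distance(fingerprints[page], observed_trace))


# Step 1 (done in advance): the attacker loads each page of a hypothetical
# health-advice site and records the sizes, in bytes, of the encrypted responses.
fingerprints = {
    "/advice/flu": [15200, 4300, 880],
    "/advice/pregnancy": [15200, 9100, 2400, 650],
    "/advice/depression": [15200, 6200, 1900],
}

# Step 2: sizes seen on a victim's connection; the content itself stays encrypted.
observed = [15180, 6230, 1880]
print(guess_page(fingerprints, observed))  # → /advice/depression
```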
In short, metadata can reveal:
- Which specific web page you visit, even over HTTPS
- Which web domain you request over encrypted DNS
- What you are typing, even in an encrypted web application (search, taxes, health)
- What you say in a voice conversation over an encrypted voice channel
…and more.
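To see how the third item is possible, consider a page that suggests search results as you type. The minimal Python sketch below assumes such a page sends the full prefix typed so far with every keystroke. Each request is encrypted with a toy cipher (not TLS, and not for real use) whose output is always the plaintext length plus a fixed overhead, as with real stream and AEAD ciphers. The names `toy_encrypt`, the `q=` parameter and the query are hypothetical.

```python
import hashlib
import hmac
import os

NONCE_SIZE = 12
TAG_SIZE = 32  # full HMAC-SHA256 output


def toy_encrypt(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Toy encrypt-then-MAC, for illustration only.

    Keystream: SHA-256 over (key, nonce, counter). Like real stream and AEAD
    ciphers, the ciphertext body is exactly as long as the plaintext; only a
    fixed overhead (nonce + tag) is added.
    """
    nonce = os.urandom(NONCE_SIZE)
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(enc_key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


enc_key, mac_key = os.urandom(32), os.urandom(32)
query = "insulin dosage"  # hypothetical search, typed one key at a time

# A search-as-you-type page sends the whole prefix after every keystroke.
wire_sizes = [
    len(toy_encrypt(enc_key, mac_key, f"q={query[:i]}".encode()))
    for i in range(1, len(query) + 1)
]
print(wire_sizes)  # grows by exactly one byte per keystroke
```

The content is unreadable, yet an observer still counts the characters and sees when each key was pressed. Real protocols add framing and sometimes padding, so actual numbers differ, but unless padding hides the lengths, the one-step pattern survives encryption; the sizes of the suggestion responses can then help narrow down which characters were typed.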
Why is metadata valuable?
Metadata reveals a lot, but there are other reasons why it is valuable for anyone wanting to spy on people.
- It is machine-readable, which makes it easy to analyse in large volumes and the preferred target of AI-driven surveillance.
- It takes up little storage compared to, say, a video file, so it is easy to collect and store in bulk.
- It is openly visible to anyone on the network path!
- It is largely unprotected by law.
Who has access to metadata?
Unfortunately, because metadata is largely unprotected, many actors have direct access to it. These include:
- Internet Service Providers
- Internet Exchanges
- Autonomous systems
- BGP routers
- The internet backbone
- Even your Wi-Fi router and local network (LAN)
…and eavesdroppers, including, famously, the NSA.
So what can we do about it?
The most effective protection against metadata surveillance today is a mixnet. Let’s go back to our original metaphor. Imagine you are looking down at a busy city, but now all the cars are identical. What’s more, they pass through tunnels and under bridges, and when they emerge, it is impossible to tell which car is which or whether any of them have changed lanes.
This is essentially what a mixnet does. It wraps internet packets in layers of encryption that make all packets look identical. It then ‘mixes’ these packets by routing them through three layers of so-called ‘mix nodes’, where they are mixed with other identical-looking packets and small, random time delays are added. This makes it practically impossible to trace and analyse who is talking to whom. A mixnet prevents a snooper from knowing:
- What you are saying
- Who you are communicating with (sending or receiving messages)
- When you are communicating
- How long you are communicating
- From where you are communicating
- The amount of data you are sending or receiving
- Any patterns in your communications
- Whether you are communicating at all
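The mechanics described above (identical-looking packets, layered encryption, and mixing with delays) can be sketched in a few dozen lines of Python. This is a toy model under loudly stated assumptions: SHA-256 in counter mode stands in for a real cipher, there is no integrity protection or routing header, every packet takes the same fixed path through three hypothetical nodes, and the random exponential delays are only simulated. Nym’s actual mixnet uses the Sphinx packet format and a far more careful design; `wrap`, `peel`, `mix_node` and `PAYLOAD_SIZE` are illustrative names, not Nym’s API.

```python
import hashlib
import os
import random

PAYLOAD_SIZE = 256  # hypothetical fixed payload size, in bytes
NONCE_SIZE = 16
NUM_LAYERS = 3      # three layers of mix nodes, as described above


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """SHA-256 in counter mode: a toy stand-in for a real stream cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def pad(message: bytes) -> bytes:
    """Length-prefix and zero-pad so every payload has the same size."""
    if len(message) > PAYLOAD_SIZE - 2:
        raise ValueError("message too long for one packet")
    return len(message).to_bytes(2, "big") + message.ljust(PAYLOAD_SIZE - 2, b"\0")


def wrap(message: bytes, node_keys) -> bytes:
    """Sender: add one encryption layer per node; the last node's layer is innermost."""
    packet = pad(message)
    for key in reversed(node_keys):
        nonce = os.urandom(NONCE_SIZE)  # fresh per packet and per layer
        packet = nonce + xor(packet, keystream(key, nonce, len(packet)))
    return packet


def peel(packet: bytes, key: bytes) -> bytes:
    """Mix node: remove its own layer, then top up with random bytes so the size never changes."""
    nonce, body = packet[:NONCE_SIZE], packet[NONCE_SIZE:]
    return xor(body, keystream(key, nonce, len(body))) + os.urandom(NONCE_SIZE)


def mix_node(packets, key, mean_delay=0.05):
    """Peel every packet, give each a random exponential delay, release in order of delay.

    Delays are simulated: nothing actually waits, only the output order changes.
    """
    delayed = [(random.expovariate(1 / mean_delay), peel(p, key)) for p in packets]
    delayed.sort(key=lambda item: item[0])
    return [p for _, p in delayed]


def unpad(packet: bytes) -> bytes:
    """Receiver: read the length prefix and drop padding and trailing junk."""
    length = int.from_bytes(packet[:2], "big")
    return packet[2:2 + length]


node_keys = [os.urandom(32) for _ in range(NUM_LAYERS)]
messages = [b"hello", b"meet at noon", b"see you"]
packets = [wrap(m, node_keys) for m in messages]
for key in node_keys:  # each layer of the mixnet in turn
    packets = mix_node(packets, key)
print(sorted(unpad(p) for p in packets))
```

Because each node strips a layer keyed with a fresh per-packet nonce and tops the packet back up to the same size, an observer comparing a node’s inputs and outputs sees equal-sized, unrelated-looking bytes, and the random delays break the link between arrival order and departure order.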
Nym is a decentralised, permissionless mixnet run by node operators all over the world. This is cybersecurity for and by a global community. What’s more, a utility token, NYM, incentivises this community of operators and supporters to contribute and keeps the network sustainable in the long term.
We are always looking for new community members to run nodes and create privacy-enhanced applications on top of Nym. Join our community to participate and keep up to date.
Tools
- Chat webapp
- Ethereum RPC mixer
- Ethereum transaction submitter using Nym
- Mixnet speed dashboard
- Mixnodes telegram monitoring bot — Nodes.Guru
- Is Nym mixnet up
- Pastenym — anon text sharing service
Privacy loves company
Discord // Telegram // Element // Twitter
The internet is global and so is Nym: join the Nym Community wherever you are and help build the private internet today.
English // 中文 // Русский // Türkçe // Tiếng Việt // 日本語 // Français // Español // Português // 한국어