I recently published an essay on increasing centralization in Internet privacy in Meatspace Press’s new book, “Eaten by the Internet”. I admire the work of many of my co-contributors, so this was a cool experience. You can read the book online here. I’ve reproduced the essay here on my blog.

…

Trust issues: internet privacy and centralised infrastructure

In the internet ecosystem, content delivery networks (CDNs) are the biggest piece of infrastructure you haven’t heard of. CDNs are systems of servers spread out across the world that cache content (like images, videos and web pages) and serve it to users from their nearest location. This reduces load times and improves website performance. For example, if you live in New Delhi, when you do a search on Google, the website that is sent to your browser is actually served from a server in India, not from Google’s HQ in Mountain View, California. CDNs sell scale and pool content; by their very nature, they centralise the internet.

Deploying thousands of servers across the world is expensive, which means there are only a handful of mainstream CDNs. These CDNs wield infrastructural power in deciding what users can see and access online. They are also increasingly prominent in the field of internet privacy, where they are starting to serve the role of trusted infrastructure in new privacy-preserving protocols. While these protocols are important for privacy, the reliance on expensive infrastructure has the effect of making them deployable only by large tech companies. This has ramifications for the politics of access to privacy on the internet, and it creates a danger that smaller organisations acting in the public interest will not be able to afford to provide privacy for their users.

New privacy protocols

The central privacy problem with traditional encrypted connections between your phone and the server that stores the content is that this server learns both “who” and “what”: “who” is requesting “what” content. A server that learns that Alice (“who”) has visited an abortion clinic website (“what”) can leak that sensitive information, possibly endangering Alice. If all the server learns is that someone is accessing an abortion clinic website, however, that is a meaningful privacy improvement. This improvement is even more pronounced if a lot of people are accessing that website. The anonymity of crowds is good for user privacy.

Breaking this link between “who” and “what” in users’ internet access is a major principle in recent privacy protocols and products. Oblivious DoH [1], Oblivious HTTP [2], and Apple’s Private Relay [3] all build privacy protections into the technical infrastructures that connect networks, users, and servers. They do this by adding one or more servers called “relays” between the device that requests the content and the server that stores it. A relay doesn’t learn “what” you’re accessing, but it does know “who” you are. It relays your request on your behalf. The final server learns “what” you’re accessing (it has to, because it has to send the content back to you), but it doesn’t learn “who” you are, because it only talks to the relay. No one party learns everything, so no single party can say definitively that a particular user did something.
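To make the split concrete, here is a minimal Python sketch of the relay arrangement described above. It is a toy model, not an implementation of Oblivious HTTP or Private Relay: the names `Relay`, `Target`, `fetch` and `toy_seal` are hypothetical, and the XOR keystream stands in for the real public-key encryption those protocols use. The point is only to show which party ends up holding which piece of information.

```python
import hashlib
import secrets


def toy_seal(message: bytes, key: bytes) -> bytes:
    """XOR the message with a SHAKE-256 keystream.

    Toy stand-in for real encryption (assumption for illustration only);
    it reuses the keystream and must not be used for anything real.
    """
    stream = hashlib.shake_256(key).digest(len(message))
    return bytes(m ^ s for m, s in zip(message, stream))


toy_open = toy_seal  # XOR is its own inverse


class Target:
    """Final server: can read the request, but only ever talks to the relay."""

    def __init__(self) -> None:
        self.key = secrets.token_bytes(32)  # plays the role of the server's key
        self.log: list[tuple[str, str]] = []  # (sender it saw, what was asked)

    def handle(self, sender: str, sealed: bytes) -> bytes:
        request = toy_open(sealed, self.key).decode()
        self.log.append((sender, request))
        return toy_seal(f"content of {request}".encode(), self.key)


class Relay:
    """Relay: knows who is asking, but only sees an opaque blob."""

    def __init__(self, target: Target) -> None:
        self.target = target
        self.log: list[tuple[str, bytes]] = []  # (client identity, sealed blob)

    def forward(self, client_id: str, sealed: bytes) -> bytes:
        self.log.append((client_id, sealed))
        # The target sees the relay as the sender, never the client.
        return self.target.handle("relay", sealed)


def fetch(client_id: str, url: str, relay: Relay, target_key: bytes) -> str:
    """Client side: seal the request for the target, send it via the relay."""
    sealed = toy_seal(url.encode(), target_key)
    return toy_open(relay.forward(client_id, sealed), target_key).decode()


target = Target()
relay = Relay(target)
page = fetch("alice", "clinic.example/info", relay, target.key)
```

After the call, `relay.log` holds Alice’s identity next to a blob it cannot read, while `target.log` holds the readable request next to the relay’s identity. Linking “who” to “what” would require the two operators to combine their logs.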

The underlying principle here is to divide up the trust. Other solutions that draw on this principle propose a form of privacy-preserving measurement, in which sensitive user data is split up and sent to two separate servers. That way, no one server ever learns one user’s specific data, but it is still possible to get aggregate statistics from everyone who contributed. Approached as a matter of trust, this multiple-server approach to privacy divides trust between two parties. User privacy is protected as long as the two parties don’t collude by sharing information with each other, since doing so would defeat the point of the separation. This approach creates the need for a trusted third party: an organisation or company that agrees to serve as one of the parties and promises not to collude with the other party.
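One common way to split data like this is additive secret sharing. The essay does not name a specific scheme, so the Python sketch below is an assumption made for illustration, and `MODULUS`, `split` and `aggregate` are hypothetical names. Each user turns their value into two random-looking shares, one per server. Each server adds up the shares it receives, and only the combination of the two server totals reveals the aggregate.

```python
import secrets

# Hypothetical modulus; it only needs to be larger than the true total.
MODULUS = 2**61 - 1


def split(value: int) -> tuple[int, int]:
    """Split a value into two shares that sum to it modulo MODULUS.

    Each share on its own is a uniformly random number and says
    nothing about the value.
    """
    share_a = secrets.randbelow(MODULUS)
    share_b = (value - share_a) % MODULUS
    return share_a, share_b


def aggregate(values: list[int]) -> int:
    """Simulate two non-colluding servers computing a sum."""
    total_a = 0  # running total held by server A
    total_b = 0  # running total held by server B
    for value in values:
        share_a, share_b = split(value)  # done on the user's device
        total_a = (total_a + share_a) % MODULUS
        total_b = (total_b + share_b) % MODULUS
    # Only the two totals are combined, never individual shares.
    return (total_a + total_b) % MODULUS


# Five users each report 1 ("yes") or 0 ("no") to a sensitive question.
yes_count = aggregate([1, 0, 1, 1, 0])
```

If the two servers did pool their individual shares, they could reconstruct every user’s answer, which is why the scheme depends on them not colluding.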

Trust-as-a-service

To be a trusted third party is to take on an infrastructural role. The business model is taking money from companies who want to provide users with content while preserving their privacy. In other words, trusted third parties sell trust-as-a-service. Trusted third parties are not a new concept, but what distinguishes the role of a trusted third party in new internet privacy protocols is the traffic component. If there is protection in the crowd, as these protocols assume, the trusted organisation needs to be able to handle a lot of users. This makes content delivery networks, or CDNs, appear as a natural fit to take on this role.

CDNs already provide infrastructural services at a large scale to companies on the internet. They are also expensive, because they provide a service that only big players require. For example, a content provider like Netflix needs to distribute its content worldwide in order to meet global demand. The need for a CDN only arises when such demand is high and revenue is flowing; it is hardly surprising that CDNs therefore take a big slice of the profit in exchange for expanding the pie.

$$$

Protecting users’ privacy, however, should not be expensive. It should be cheap and easy. While it is great that privacy-enhancing protocols are being developed, with increasingly innovative ways of splitting up the trust, we should consider how expensive or difficult it is to implement and run them. If the infrastructure bill is so high that only a company the size of Google could deploy or implement these privacy standards, that is a serious problem.

It puts users in a position where their only choice for retaining their privacy on the internet will be to trust Big Tech and a commercial third party. This trusted third party is completely invisible to the users who are supposed to trust it. Even worse, there are very few infrastructure companies in the world capable of running such a service, which further centralises internet infrastructure.

Should users just embrace CDNs as their only chance at workable privacy protocols online? No! The good news is that we have a precedent for relying on organisations that act in the public interest in these situations. The ubiquitous adoption of an older privacy-enhancing protocol, HTTPS, which ensures that your connection to a website is secure, was largely enabled by the Let’s Encrypt project. Let’s Encrypt effectively acted as a trusted third party, issuing the digital certificates the protocol requires for free. Crucially, Let’s Encrypt is run by a non-profit using donations and grants; it effectively acts as public interest infrastructure [4]. While it remains to be seen how the infrastructure-reliant privacy protocols currently in development end up being deployed and with what business relationships, the active participation of ISRG (the parent organisation of Let’s Encrypt) in developing these protocols is heartening [5]. One hopes that they would take on a similar non-profit-driven role in the actual operation of these systems. There is also active work being done on developing cryptographic privacy protocols that are cheap and easy to run while guaranteeing similar privacy properties. This should be encouraged and supported. We can expect to see these protocols starting to be widely deployed in the coming few years.

We need privacy-preserving protocols that can serve the interests of civil society and small organisations that cannot afford the massive infrastructure bills associated with CDN corporations. We need privacy on the internet to be accessible for everyone. Privacy is a political right. It cannot and should not be a premium service.