Can’t close the barn door

Posted on January 10, 2012 by Benjamin Hartley

So, SOPA is the news of the day, in terms of the Internet and security; it has been for well over a month now.
In case you’re not familiar, SOPA is the Stop Online Piracy Act. It will “authorize the U.S. Department of Justice to seek court orders against websites outside U.S. jurisdiction accused of infringing on copyrights, or of enabling or facilitating copyright infringement.”

I won’t bore you with the typical arguments about how it’ll infringe on free speech, or weaken safe harbor, etc. These arguments have been made, and they may have some validity, but let’s talk technology.

SOPA is the most recent in a long line of legislation intended to regulate the Internet. Such legislation is doomed to failure. The Internet was designed to be impossible to regulate. SOPA focuses on preventing search engines from directing users to sites, and ordering domain name registrars to delist sites. While there are other provisions, these are the primary tools for stopping piracy outside of US jurisdiction. They’re supremely ineffective tools, because neither search engines nor DNS servers are necessary for the function of the Internet.

To understand this, let’s step back and look at what the Internet really is.

The Internet’s precursors were created in the 1960s as a result of an initiative by DARPA (then known as ARPA), the Defense Advanced Research Projects Agency. DARPA is notable for investing in all sorts of interesting projects that might have military applications. Many are successful, and result in some of the most powerful technologies of our time. Granted, many are pretty off-the-wall and don’t look like they’ll ever amount to anything, but that’s the risk you take.
The Internet was created to enable communications even against attempts to disrupt the network – even against the loss of most metropolitan areas, such as might happen during a nuclear war. This is actually very hard to do: you have to come up with a design that works even if all of your central nodes are gone.
The Internet as we know it today has a number of elegant solutions which make it the most robust communications network ever known.
The first is in the data packet. All data sent on the Internet is broken up into packets – even when it’s called “streaming”, it actually consists of content that has been broken up into separate packets which are then reassembled at the destination. Each packet, in turn, has a portion that says where the information is going (the address) and a portion which contains the actual data (the payload). This means that any given packet can be lost or corrupted, and the rest of the message will still get through. Granted, with encryption or compression this might be a moot point, since one missing piece can make the data around it unreadable, but on the other hand error correction can make the whole thing even more robust.
Beyond that, there are the routing protocols. The various routing protocols differ in their details, but they all serve roughly the same function. When a router receives a packet, it looks at the destination address and tries to find a route to that address. What’s especially clever is that if a given route fails, the router can then select an alternate route. In this way, the Internet can be self-healing. Bandwidth might drop as alternate routes are used, but so long as a path exists the message can still get through. And that path isn’t limited to even the same medium as was used in the past: Internet data can be sent over copper, satellite, radio, laser, physical media, even carrier pigeon!
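The self-healing behavior can be sketched in a few lines of Python. This is only an illustration under some loud assumptions: the five-node network `LINKS` and the `find_route` helper are invented for this example, and a plain breadth-first search stands in for what real routing protocols do in a far more distributed and sophisticated way.

```python
from collections import deque

# Hypothetical network: each node lists the neighbors it is linked to.
LINKS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}


def find_route(links, src, dst, failed=frozenset()):
    """Return the shortest path from src to dst that avoids failed links.

    `failed` is a set of frozenset({a, b}) pairs, one per broken link.
    Returns None when no path exists at all.
    """
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in links.get(node, []):
            if nxt in seen or frozenset((node, nxt)) in failed:
                continue  # already visited, or the link is down
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


# Normal operation: the short route through B.
normal = find_route(LINKS, "A", "D")

# The B-D link fails: traffic goes the long way around through C and E.
rerouted = find_route(LINKS, "A", "D", failed={frozenset(("B", "D"))})

# Only when every path is cut does the message fail to get through.
cut_off = find_route(
    LINKS, "A", "D",
    failed={frozenset(("B", "D")), frozenset(("E", "D"))},
)
```

The rerouted path is longer, which is the bandwidth and latency cost mentioned above, but the destination is still reachable.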

Now, I haven’t mentioned DNS or search engines so far. That’s because we don’t need either.

DNS, the Domain Name System, is a technology that maps human-readable names to IP addresses. The addresses to which I alluded earlier are numerical. In IPv4 they’re a 32-bit binary number; in the newer IPv6 they’re a whopping 128 bits. Rendered into decimal, they’re a bit more manageable, but not by all that much – would you like to memorize strings of numbers like “192.168.15.106” for every website you visit? DNS is a service that your computer accesses which translates the much easier to recall names, like www.google.com, into addresses like 74.125.227.147. It’s a nice convenience, but you don’t actually need it. And you’re not locked in to any one DNS server – you can set up your own, or you can actually use one that’s based outside of US jurisdiction.
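If you want to see the numbers behind the dotted notation, Python’s standard `ipaddress` module will show them without touching the network. The `describe` helper below is just a name I made up for this sketch, and the IPv6 address is an arbitrary example (the loopback address).

```python
import ipaddress


def describe(address: str) -> tuple[int, int, int]:
    """Return (IP version, address width in bits, address as an integer)."""
    ip = ipaddress.ip_address(address)
    return ip.version, ip.max_prefixlen, int(ip)


# The IPv4 example from the text: four decimal bytes packed into 32 bits.
version4, bits4, number4 = describe("192.168.15.106")
binary4 = format(number4, "032b")

# An IPv6 address is the same idea, just 128 bits wide.
version6, bits6, number6 = describe("::1")

print(version4, bits4, number4, binary4)
print(version6, bits6, number6)
```

The dotted form is only a friendlier way of writing one big number, and that number is all the network itself needs to deliver a packet.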

And search engines?
Same thing – they’re a convenience. There isn’t even a specification on what a search engine is. And as you doubtless know, you can use whatever search engine you like, again including ones that are based outside of US jurisdiction.

There are technical ways to close these loopholes, of course. But, thanks to the structure of the Internet, there are workarounds for those as well. The Internet was designed to be hard to disrupt. From a technical standpoint, attempts to regulate the Internet are basically the same as trying to disrupt it; it’s simply not a technology which was designed to be regulated.