I finally got around to watching "A New Way to Look at Networking" yesterday, a talk given by Van Jacobson at Google in 2006 (yes, it has been on my todo list for a long time). It is definitely worth watching if you are interested in networking.
A couple of quick comments (nothing particularly deep; these are mostly for my own reference later):
- He says that the current Internet was designed for conversations between end nodes but we’re using it for information dissemination.
- Me: This distinction relies on the data being disseminated to each user being identical. However, in the vast majority of cases even data that is superficially identical, such as web site content, is actually unique for each visitor; any site with advertisements or customizable features is a good example. As a result we are still using the Internet for conversations in most situations.
- He outlines the development of networking:
- The phone network was about connecting wires. Conversations were implicit.
- The Internet added metadata (the source and destination) to the data, which allowed a much more resilient network to be built. The Internet is about conversations between end nodes.
- He wants to add another layer where content is addressable rather than the source or destination.
- He argues for making implicit information explicit so the network can make more intelligent decisions.
- This is what IP did by adding the source and destination to data.
- His idea of identifying the data rather than the source or destination is very interesting. A consequence of this model is that data must be immutable, identifiable, and carry built-in metadata such as the version and the date. It strikes me how the internal operation of the Git version control system matches these requirements.
- At the moment I write this, cc7feea39bed2951cc29af3ad642f39a99dfe8d3 uniquely identifies the current version (content) of Linus’s kernel development tree (see the sketch below).
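As a rough sketch of the mechanics: Git names a blob by hashing a small type/size header followed by the raw content, so the identifier is derived purely from the data itself. Commit IDs like the one above are computed the same way over a commit object. A minimal Python version (my own illustration, not code from the talk):

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    # Git hashes a header ("blob <size>\0") followed by the raw
    # bytes; the resulting SHA-1 is the object's name.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"hello world\n"))
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad -- the same ID that
# `echo "hello world" | git hash-object --stdin` prints
```

The same bytes always get the same name, anywhere, with no authority handing out identifiers.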
If you’ve ever looked at Freenet, its way of identifying data is very similar. Freenet aims for total anonymity, so you can’t tag people or nodes, only data. Freenet is essentially a global namespace whose basic datum is the CHK (content hash key), where the “filename” (URL) is a hash of the data’s contents. You gain local namespaces through a PKI.
I bring this up because I always thought it was a brilliant model for dealing with redundancy in, for example, the web. Any time two people request the same data, they’re requesting the same file, even if they don’t know it. This has great implications for caching. If web pages were more modular (e.g., if a web page weren’t one monolithic document but were composed of smaller documents), this would be even better: all of your static content could indeed be static.
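To make the caching point concrete, here is a toy content-addressed store (my sketch, with SHA-256 standing in for Freenet's CHK; the `ContentStore` class and its `put`/`get` methods are made-up names):

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: data is filed and fetched by
    the hash of its contents, never by who published it."""
    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key  # the "filename" is derived from the data itself

    def get(self, key: str) -> bytes:
        return self._blobs[key]

store = ContentStore()
# Two publishers with identical bytes produce the same key,
# so any cache between them needs to hold only one copy.
k1 = store.put(b"<div>static site header</div>")
k2 = store.put(b"<div>static site header</div>")
assert k1 == k2
```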
Of course, the problem with this sort of filesystem is the birthday paradox: store enough documents and eventually two different contents will hash to the same key. I prefer to live in my dream world where hash collisions never happen ;)
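For scale, a back-of-the-envelope birthday bound (an approximation assuming a uniformly distributed hash; 160 bits corresponds to a SHA-1-sized identifier like Git's):

```python
import math

def collision_probability(n_items: float, hash_bits: int) -> float:
    # Birthday approximation: p ~= 1 - exp(-n^2 / 2^(b+1))
    return -math.expm1(-(n_items ** 2) / 2.0 ** (hash_bits + 1))

# Even a trillion distinct documents under a 160-bit hash leave
# a collision probability of roughly 3e-25.
print(collision_probability(1e12, 160))
```

The paradox only really bites once the number of items approaches 2^(b/2), around 2^80 for a 160-bit hash, so the dream world is not far from reality.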