This Thing Called the World Wide Web

I remember when there was no World Wide Web, when the Internet was not called the Internet. For today’s emerging generation of computer users, however, computer use IS the Internet or the Web. The journey from a room-sized machine churning out processed data to global information access from a handheld device weighing less than a pound probably does not fascinate you the way it fascinates me. But I believe that a better understanding of what the web is made up of will help you understand why it sometimes doesn’t do what you want it to do. Please understand that this is a very basic description of the process, and it is not, in any way, designed to give you enough information to say “I know how to fix DNS problems.” It’s just the basics.

The CAST of Characters
You, the computer user
Mike, the software developer
Carol, the website designer
Steve, the network engineer
A host of extras who have built the hardware that all this stuff runs on

The Settings
Your Home
A software company
Carol’s home
XYZ Networking
ABC Communications (telephone company)
The world

The Props
Your computer
Mike’s computer
Carol’s computer
The servers at XYZ Networking
A billion dollars of switching equipment owned by ABC Communications
Miles and miles and miles of cables of different kinds

The Story
When you press the power button on your computer, it does a whole bunch of stuff that gets it ready for you to use. By the time you are presented with the desktop, you’ve already been served by the “extras” who built the hardware, and by Mike and his colleagues who built the operating system. But you still need some more of Mike’s talents: you need to open a browser. That would be Internet Explorer, Mozilla Firefox, or Google Chrome, to name the three most popular. So if someone tells you “Open up a browser window,” they want you to start one of these programs.

Your browser has a home page: the page you see as soon as your browser opens. You can tell your browser which page to show, but as soon as it opens that page, you’ve already accessed the World Wide Web. Your browser has sent a request for your home page, and a web server has fulfilled that request. Sounds simple, but here are the nuts and bolts of it:
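That “request” is, underneath, just a few lines of text. Here is a minimal Python sketch of roughly what a browser sends when it asks for a site’s home page. The name www.example.com is a placeholder, and real browsers add many more headers and today usually send the request over an encrypted HTTPS connection.

```python
def build_get_request(host: str, path: str = "/") -> str:
    """Return a bare-bones HTTP/1.1 GET request for `path` on `host`.

    HTTP/1.1 requires a Host header; every line ends with CRLF, and an
    empty line marks the end of the headers.
    """
    lines = [
        f"GET {path} HTTP/1.1",   # what we want: the page at `path`
        f"Host: {host}",          # which website on that server we mean
        "Connection: close",      # ask the server to hang up when done
        "",                       # empty line: end of headers
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # "/" is the site's home page.
    print(build_get_request("www.example.com"))
```

The `Host` line matters because, as you’ll see with Steve’s servers, one machine may host many websites, and this line tells it which one you want.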

Just like your mail needs an address to get to you, or from you to where you want to send it, web traffic needs an address to get where it needs to go. Every device that wants to send and receive information needs an address, and we call that an IP address. The IP stands for Internet Protocol, and when someone wants to know your IP address, they will always say “IP address”; they will (almost) never call it an “Internet Protocol address.”

IP addresses are assigned at two levels. Within a home or an organization, addresses are handed out by the local network. In a company, there would be a server set up to do that. In your average home network, either your Internet Service Provider’s access device or your wireless router will hand out addresses as new devices request access to the network. Out on the public internet, your Internet Service Provider assigns the address that the rest of the world sees for your connection.

Websites have addresses too. For example, one of Google’s IP addresses has been 74.125.228.34, and typing an address like that into your browser’s address bar can take you to Google’s page, just as typing www.google.com does. This website also has an address; your computer has one. But it’s too difficult to remember the numerical addresses of all the websites we want to visit, and fortunately, we don’t have to. That’s a really good thing, because we’re running out of IP addresses in that format, and the new format is even harder to remember. It looks like this: FE80:0000:0000:0000:0202:B3FF:FE1E:8329. So we’re all glad we can type in the name instead of the number. As you hit “Go” or “Enter” or whatever you do to start the page loading, the transaction is handed off to ABC Communications, our hypothetical internet service provider.
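If you want to poke at the two address formats yourself, Python’s built-in `ipaddress` module understands both. This short sketch uses the two example addresses from the text and shows why the old format is “running out”: it simply has far fewer possible addresses than the new one.

```python
import ipaddress

# The two example addresses from the text.
old_style = ipaddress.ip_address("74.125.228.34")
new_style = ipaddress.ip_address("FE80:0000:0000:0000:0202:B3FF:FE1E:8329")

print(old_style.version)     # 4  (IPv4: four numbers from 0 to 255)
print(new_style.version)     # 6  (IPv6: eight groups of hexadecimal digits)

# Runs of zero groups can be shortened to "::", so nobody types the long form.
print(new_style.compressed)  # fe80::202:b3ff:fe1e:8329

# How many addresses each format allows in total.
ipv4_total = ipaddress.IPv4Network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.IPv6Network("::/0").num_addresses
print(ipv4_total)            # 4294967296, fewer than one per person on Earth
print(ipv6_total)            # 2**128, roughly 3.4 x 10**38
```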

Now we’re going to jump to the other end of the transaction and then come back in a bit to tie them together.

Carol is a web designer who works from home. She almost certainly has an IP address, but she wouldn’t necessarily need one if she had some other means of getting her work to you. As Carol finishes the web design, she uploads the files that make up the website to a web server. The files are stored there until someone asks to see them, and someone asks to see them by typing a request into a browser window. By convention, a website name that starts with “www” points to a web server: a server, or a dedicated portion of a server, whose purpose is to store web pages and serve them up on request. Some portion of a server might also be dedicated to file transfer, and the same organization whose website you visited may also have a File Transfer Protocol site, or FTP site. That site’s name would typically start with “ftp” instead of “www.” Steve, the network engineer at XYZ Networking, has set up the web server that Carol loads her website onto, and he has taken care of a lot of details that make it all look like magic. Carol isn’t the only one who loads her files onto Steve’s servers; Steve has a roomful of servers, each capable of hosting multiple websites and serving up pages with lightning speed. Since Carol works from her home and Steve works from his server farm, their internet services are vastly different. Carol might consider hosting her website on her own system at home, but that would require an investment in hardware, software, and skills, not to mention a business-class internet service. Most ISPs frown on using residential internet service to host a website, because the traffic to and from a site is much heavier than a standard user’s normal web traffic. So Steve gets Carol set up with the information she needs to load her website and make changes to it as necessary.
A website is made up of a number of different pages, and each page is a file. When you click on a menu item, you’re taken to a new page. What’s really happening is that you are calling up the file that contains that page. Behind the scenes, the file structure of a website looks similar to the file structure on your computer, with folders to keep it all organized. But remember that this is just at Steve’s location. The indexed web contains at least 1.59 BILLION web pages!
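Here is a small Python sketch of that “address in, file out” idea for a simple site made of plain files. The folder /srv/sites/carol and the page names are hypothetical, and the “folder requests get index.html” rule is a common web server convention rather than a universal law; many real sites build pages on the fly instead of storing each one as a file.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical folder on Steve's server where Carol's site files live.
DOCUMENT_ROOT = PurePosixPath("/srv/sites/carol")


def url_to_file(url: str) -> PurePosixPath:
    """Map a page address to the file a simple static web server would send."""
    path = urlsplit(url).path or "/"
    # A request for a folder conventionally gets that folder's index page.
    if path.endswith("/"):
        path += "index.html"
    # Drop the leading "/" so the path lands inside the document root.
    # (A real server must also reject tricks like "/../" that escape the root;
    # this sketch does not.)
    return DOCUMENT_ROOT / path.lstrip("/")


print(url_to_file("https://www.example.com/"))
# /srv/sites/carol/index.html
print(url_to_file("https://www.example.com/recipes/bread.html"))
# /srv/sites/carol/recipes/bread.html
```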

So now let’s say that Carol is working from her home—in St. Petersburg, Russia. And let’s say that Steve is working from his network services job—in Reno, Nevada. And let’s say that you want to read Carol’s work from your hotel room—in Sydney, Australia. Now that you’ve gotten an idea of the players and their roles, how do we pull all of it together?

Well, one of the first things your browser needs to know before it reaches out of your home (through your gateway, your portal to the World Wide Web) is “Who is my Domain Name System server?” That server is normally provided by your ISP, but if you’re really geeky, there are other DNS servers you can tell your network adapter to use. If you don’t specify one, you’ll use the one your ISP provides by default. Since internet traffic is guided by NUMBERS, but we all type in NAMES, there must be some way to turn the NAMES into NUMBERS. Right? Right. It’s called the Domain Name System, and it handles that translation based on information provided by people like Steve.

At the top of the system are the root servers. At the time of writing, there were 130 of them. (You will hear some geeks say that there are 12 or 13, but that figure actually counts the organizations in charge of the root servers and the named root server identities they run, not the physical machines.) Each root server has a file on it that tells where to find the authoritative DNS servers for all the top-level domains. A top-level domain is the last part of the website name, like “dot-com,” “dot-org,” “dot-edu,” or “dot-gov.” Your company or your ISP has a DNS server as well, and its job is to reach out to other DNS servers and find whatever it doesn’t already know, until some DNS server reaches one of the servers in charge of the top-level domain names and addresses. Each DNS server in the process then caches, or stores, the information for later use.
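To make that chain of questions concrete, here is a toy Python model of what your ISP’s DNS server does: ask a root server who handles the top-level domain, ask that server who handles the domain, ask the domain’s own server for the number, then cache the answer. Every server label, name, and address in it is made up (203.0.113.x is a block reserved for documentation), and real DNS involves network messages, record types, and expiry times that this sketch leaves out.

```python
# Toy model of a caching DNS resolver. All names, server labels, and
# addresses are hypothetical; 203.0.113.0/24 is reserved for examples.

# What the root servers know: which server handles each top-level domain.
ROOT = {"com": "tld-com-server"}

# What the ".com" server knows: which server is authoritative for each domain.
TLD_SERVERS = {"tld-com-server": {"example.com": "steve-dns"}}

# What Steve's authoritative server knows: the actual numbers.
AUTHORITATIVE = {"steve-dns": {"www.example.com": "203.0.113.10"}}

cache: dict[str, str] = {}   # answers remembered for later use
asked: list[str] = []        # log of servers contacted, to show the cache at work


def resolve(name: str) -> str:
    """Translate a NAME into a NUMBER, walking root -> TLD -> authoritative."""
    if name in cache:
        return cache[name]                 # answered without asking anyone
    labels = name.split(".")
    tld = labels[-1]                       # "www.example.com" -> "com"
    domain = ".".join(labels[-2:])         # -> "example.com" (simplified)

    tld_server = ROOT[tld]
    asked.append("root")
    auth_server = TLD_SERVERS[tld_server][domain]
    asked.append(tld_server)
    address = AUTHORITATIVE[auth_server][name]
    asked.append(auth_server)

    cache[name] = address                  # store it for next time
    return address


print(resolve("www.example.com"))  # 203.0.113.10, after walking the whole chain
print(resolve("www.example.com"))  # 203.0.113.10, straight from the cache
print(asked)                       # ['root', 'tld-com-server', 'steve-dns']
```

The second lookup never leaves the resolver, which is why caching makes the whole system fast enough to be usable.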

Web traffic travels along wires and cables as signals, and those numerical addresses tell the switching equipment owned by the telephone companies where those signals ought to go, in a process that is way too complicated to explain in a blog post that is already long enough. All of this is going on millions of times a second all over the globe. Check out this image of a portion of the routing paths of the internet.

[Image: internet map, showing a portion of the routing paths of the internet]
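The full routing process really is too complicated for one post, but the core decision each piece of equipment makes can be sketched in a few lines: look at the destination number, find the most specific matching entry in a routing table, and send the signal out that link. The Python below is a toy under that assumption; the table entries and link names are invented for Steve’s story, and real backbone routers hold hundreds of thousands of entries that are updated constantly.

```python
import ipaddress

# A made-up routing table: "addresses in this block go out this link".
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "default: upstream carrier"),
    (ipaddress.ip_network("203.0.113.0/24"), "link toward Reno data center"),
    (ipaddress.ip_network("203.0.113.0/28"), "link to Steve's server rack"),
]


def next_hop(destination: str) -> str:
    """Pick the outgoing link using longest-prefix matching."""
    addr = ipaddress.ip_address(destination)
    # Of all the blocks that contain the address, the most specific one
    # (the longest prefix) wins.
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop


print(next_hop("203.0.113.10"))   # link to Steve's server rack
print(next_hop("203.0.113.200"))  # link toward Reno data center
print(next_hop("198.51.100.7"))   # default: upstream carrier
```

Each router along the way makes only this one local choice; the signal reaches Reno because every hop in the chain makes a sensible one.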

So now that you see how much is involved in checking out a new recipe on the web, you can see that we should never be surprised when it sometimes doesn’t work correctly; we ought to be amazed that it ever works at all! There are an astonishing number of points of failure between your computer and my web host. Some of those are hardware-related, some are software-related, and some are caused by human error. (Of the three, human error is the most difficult to track down.)

Boggles the mind, doesn’t it?