As I was searching through AntiOnline, looking for ways a "newbie" like me could contribute, I realized that nobody had posted (or at least, I could not find) a general tutorial on HTTP. "Hey!" I said to myself, "I know (in general) how HTTP works! I'm sure someone would appreciate a tutorial on HTTP." So, I have set off to write this thing, and I hope it will be useful to someone out there.

HTTP, or the Hypertext Transfer Protocol, is one of the most important protocols used on the Internet today. It is the protocol that web clients (browsers) use to transfer web pages (and other files) from a web server to your computer. It can also be used to transfer files from your computer to a web server. Actually, the term "files" can be confusing, because when I send this tutorial in, I am not sending a file per se; I am sending the text that I am typing into the form. A better term (the one used in the RFC for HTTP) is resource; a resource can be any type of data. By the way, HTTP is defined in RFC 2616 (which I am going to be referring to a lot during the course of writing this). I would suggest that you read the RFC after reading this, because I am not going to be creating a comprehensive list of request methods, status codes, and headers. I could not shorten the RFC's lists without leaving out important parts, and I do not think I should copy and paste the lists into here; that would be a waste of space.

Even though the RFC can be confusing, HTTP is really very simple. Still, I think I'll include an example that you can try on your computer before confusing you with too many definitions. You can send HTTP messages to a server from any basic telnet client. Just open up your telnet client and connect to port 80 of any computer running a web server (not every computer will have its web server running on port 80, but most will). Here's what I did to connect to google.com:
Code:
telnet> open google.com 80
Trying 216.239.35.100...
Connected to www.google.com (216.239.35.100).
Escape character is '^]'.
Oh, before I go on, everyone please do not use google.com. The steps I am outlining are standardized, and should work on any web server. I do not want Google to think that they are being attacked by hackers. Also, no, in case you were getting suspicious, this is not illegal; this is the same thing your web client does (except, your web client probably does it faster than you can type). Okay, now onto our next step: requesting a page.
Code:
GET / HTTP/1.1
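Note that the request ends with a blank line. It is necessary: it tells the server that you have finished sending headers. Just press Enter twice after you have typed the "GET / HTTP/1.1". This segment of code is fairly self-explanatory. You want to get the "/" page (actually, you are asking for the index of the "/" folder of the web site since you do not know the name of any files). "HTTP/1.1" simply indicates to the server that you are using version 1.1 of HTTP instead of 1.0 or (gasp) 0.9. Strictly speaking, the RFC requires an HTTP/1.1 request to also carry a "Host:" header line (for example, "Host: google.com") before the blank line, and a strict server may answer 400 Bad Request without it; the server I tried was forgiving. All this, so far, is equivalent to typing "http://google.com/" into your web client's address bar. After pressing Enter twice, you should get a response similar to this: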
Code:
HTTP/1.1 200 OK
Content-Length: 2486
Server: GWS/2.0
Date: Wed, 01 Jan 2003 21:34:40 GMT
Content-Type: text/html
Cache-control: private
Set-Cookie: PREF=ID=109a32574e737804:TM=1041456880:LM=1041456880:S=IQSsfydCTaF_UqVz; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com

<html>
...
</html>
I replaced the HTML code with ellipses because the actual HTML has no significance in this tutorial. The first line that you get is the status line of the reply, which carries the HTTP version, a numeric status code, and a short reason phrase. In this case, the code was 200 OK. This is the response that you probably want your client to receive when you are surfing the web. One code that I am sure everyone is familiar with is the 404 Not Found code (grimace). The rest of the lines, up to the blank line, are headers, which tell you about the resource and ask your client to set a cookie (which I would not worry about at this point). Everything after the blank line is the message body, here the HTML page itself.
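For anyone who would rather script this than type it into telnet, here is a minimal sketch in Python of the same exchange. The function names (build_request, parse_status_line, fetch) are my own inventions for illustration, not anything from the RFC. The sketch also sends "Host" and "Connection: close" headers when a host name is given, which the telnet session above did not do; that is an assumption on my part so that strict servers accept the request and the read loop ends when the server hangs up.

```python
import socket


def build_request(target="/", host=None):
    """Return the bytes of a GET request: request line, optional headers, blank line."""
    lines = ["GET {} HTTP/1.1".format(target)]
    if host is not None:
        # RFC 2616 requires a Host header in HTTP/1.1 requests.
        lines.append("Host: {}".format(host))
        # Ask the server to close the connection so that reading until EOF terminates.
        lines.append("Connection: close")
    # Each line ends in CRLF, and one more CRLF forms the blank line that ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 200 OK' into (version, code, reason)."""
    parts = line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    # The reason phrase may contain spaces ("Not Found") or be missing entirely.
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason


def fetch(host, port=80, target="/"):
    """Send one GET request and return the raw response bytes (needs network access)."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(target, host))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

To try it, call something like raw = fetch("your.own.server") (a placeholder host name; please point it at a server you run yourself), then pass raw.split(b"\r\n", 1)[0].decode("latin-1") to parse_status_line to see the version, code, and reason phrase.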

Now that you have gotten a (hopefully) clear example of a simple HTTP transaction, I will go into some definitions of what an HTTP message is. All HTTP messages are either requests, like what we typed, or responses, like what we received. Each message starts with a request line, like "GET / HTTP/1.1", or a status line, like "HTTP/1.1 200 OK". Note that when I say line, I mean a string of text followed by a CRLF. CRLF is a carriage return character followed by a line feed character, though the RFC recommends that tolerant applications also accept a bare line feed as a line break. In short, in this case, CRLF = Enter key equivalent. I hope that did not confuse anyone. After the request or status line come as many header lines, like "Content-Type: text/html", as you or the server wants. Then comes a blank line and, depending on the type of request or status line, a message body.
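The layout just described (start line, header lines, blank line, body) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: split_message and is_response are hypothetical names, the function accepts either CRLF or a bare line feed as a line break, and it ignores details a real parser must handle, such as headers continued over several lines.

```python
def split_message(raw):
    """Split raw HTTP message bytes into (start_line, headers, body)."""
    # The blank line that ends the headers is two line breaks in a row.
    # Look for both the CRLF form and the bare-LF form and use whichever comes first.
    best = None
    for sep in (b"\r\n\r\n", b"\n\n"):
        pos = raw.find(sep)
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, len(sep))
    if best is None:
        head, body = raw, b""  # no blank line seen: treat everything as the head
    else:
        head, body = raw[:best[0]], raw[best[0] + best[1]:]

    # Normalise CRLF to LF inside the head, then split it into lines.
    lines = [l.decode("latin-1") for l in head.replace(b"\r\n", b"\n").split(b"\n")]
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        # Split on the first colon only, since values such as dates contain colons.
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


def is_response(start_line):
    """A status line begins with the HTTP version; a request line begins with a method."""
    return start_line.startswith("HTTP/")
```

Headers are kept as a list of (name, value) pairs rather than a dictionary because the same header name may legitimately appear more than once in a message.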

I should probably describe requests in more depth. First, a request line consists of "Method URL HTTP-Version CRLF", with the three parts separated by single spaces (the RFC calls the middle part the Request-URI). And, as I said before, then you have optional header lines, a blank line, and an optional message body. An example of a method would be "GET". An example of a URL would be "/". An example of an HTTP-Version would be "HTTP/1.1".
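As a small illustration of that format, here is a sketch in Python that takes a request line apart. The name parse_request_line is made up for this example, and the checks are deliberately minimal: it only insists on exactly three space-separated parts and a version that starts with "HTTP/".

```python
def parse_request_line(line):
    """Split a request line such as 'GET / HTTP/1.1' into (method, url, version)."""
    # Drop the trailing CRLF (or bare LF) if the caller left it on.
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError("expected 'Method URL HTTP-Version', got {!r}".format(line))
    method, url, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError("not an HTTP version: {!r}".format(version))
    return method, url, version
```

Calling parse_request_line("GET / HTTP/1.1\r\n") gives back the three pieces named in the paragraph above: the method "GET", the URL "/", and the version "HTTP/1.1".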

Well, I hope this has given you enough information to have a general understanding of web clients and servers. I definitely have not given you enough information to program a full-fledged client or server though. If you want to do that, you really need to read the RFC. Also, if you are interested, read RFC 2617, HTTP Authentication. That RFC goes into how password-protected web sites can be created and accessed via HTTP.

Okay, that's it. If anyone has any suggestions or corrections, please send them to me. Thanks for reading.