With all of the posts asking TCP/IP questions recently, I decided it would be a good idea to make this thread. I have uploaded some of these files before, but searching through 20,000 posts is not easy when you need a quick check of something. Actually, this won't be an easy quick check either, as there is going to be a **** load of information here.

I'm not posting this as a tutorial, because most of the information in this was not written by me. However, if a lot of people reply and enjoy this, or somehow lead me to believe this is a good thread, then maybe I'll write a tutorial about this. I have got a TCP/IP class under my Crimson Ghost shaped belt, so I do have a fairly good understanding of this material.

Some of this may not seem to fit into TCP/IP, and I did think about that, and also the fact that some of this is old or just outdated, but I am posting that information anyway so people new to all of this can see how things worked back in the day.

As for the information that may seem off topic, I just think it's important to add as much here as possible. Some of the information dealing with Hacking and so on may seem out of place, but any hacker that actually knows what they are talking about has a good understanding of how TCP/IP and the protocols it includes all work. So I put it in.

Some of this deals with security, some of it does not. If you find any blatant errors or misleading information, please reply and point it out. I'd like people to actually learn things the right way from this, and not read it and be flamed for a misunderstanding or something.

On with the text:

I have tried to list who actually wrote these, but some are older than the machine I am typing this on. I have also tried to get the originals of all of them, so that you can see who wrote them and so credit can be given to those who actually deserve it.

First: Some general tips to keep you from being a lamer/**** head online:

I put this here because if you plan on learning TCP/IP, you are more than likely going to be online. And hopefully this guide will help you out in not being a dick head.

Tip #1 - Hacking. Hacking is NOT e-mail bombing, being an IRC warrior or
harassing someone. It is the long honored trade of learning about Operating
Systems, Unix for example as it's the most popular, and getting to know how
it works inside and out. How to program for it, how to manipulate it and
control it. If you don't know about an Operating System you want to get
into, go learn about it. There is no magic command or word or program, that
I know of, that will let you get in and take over any system with a wave of a
wand. It just don't happen that way.

Tip #2 - This goes with Tip #1, go to school, or get some books and learn.
Read all you can about Unix, C and TCP/IP. I won't lie, it won't happen over
night, it's taken me 2 years of hard core dedication to get to the state I'm
at now, and still I can't keep up with it! For the beginner, get a book on
Unix, learn it, read it over and over until you KNOW it.

Tip #3 - Resources, and where do you get them? Well, the best place to find
information on the latest security flaws and holes is not CERT. They post
only after the problem is fixed and every sysadmin and their mother knows
the fix. Go to News Groups. Not the lame ones like "alt.hackers", the only
people you find there are little kids on AOL wanting to know those magic
words. Get on groups like "comp.security.unix", these are where the BIG boys
hang out. The CEO from Sun Microsystems posts to it, head honchos from
Novell and University Professors all use these. They post questions and
possible fixes to holes no one has even thought about yet. They are gold
mines.

Tip #4 - I know this may sound lame to the more veteran hacker, but get
involved in a group. The ones I'm in are always passing new information to
each other and doing, or working on, little projects. I'm always amazed at
what I learn from other members of my group.

Tip #5 - Don't e-mail people with kick ass web sites and ask them to help
you be a hacker. Most of the time they think you are just some lamer and
trash your e-mail. Like I said, it's taken me two years to get where I am
now, I'm NOT going to take two years to teach you what I know. Like I said
above, get a book, go to school. Read, read, read....

Tip #6 - Screw IRC, the people that you may talk to there are either there
just to chat, or are big head ego inflated dicks. There is NO way anyone
will learn anything on IRC, save how to be an IRC warrior. So skip it.

Tip #7 - Before you go around bugging other people asking them tons of
questions, go look for the answer yourself. That's a key aspect of a good
hacker: to be able to track down and find little tidbits of information on
obscure topics.

Tip #8 - I think I might have said this before, but get familiar with C or a C
based programming language like PERL, Java, or VC++. 90% of hacking has to
do with you writing, or using, some sort of script written in C or PERL to
open up an exploit or hole.

Well that's all I can think of for now......


----------------------------------------------------------------------------

bbuster@succeed.net
___________________________________________________________________________

An Introduction to TCP/IP:

Introduction to the Internet Protocols

Computer Science Facilities Group
RUTGERS
The State University of New Jersey

3 July 1987

This is an introduction to the Internet networking protocols (TCP/IP).
It includes a summary of the facilities available and brief
descriptions of the major protocols in the family.

Copyright (C) 1987, Charles L. Hedrick. Anyone may reproduce this
document, in whole or in part, provided that: (1) any copy or
republication of the entire document must show Rutgers University as
the source, and must include this notice; and (2) any other use of
this material must reference this manual and Rutgers University, and
the fact that the material is copyright by Charles Hedrick and is used
by permission.



Unix is a trademark of AT&T Technologies, Inc.



Table of Contents


1. What is TCP/IP?
2. General description of the TCP/IP protocols
   2.1 The TCP level
   2.2 The IP level
   2.3 The Ethernet level
3. Well-known sockets and the applications layer
   3.1 An example application: SMTP
4. Protocols other than TCP: UDP and ICMP
5. Keeping track of names and information: the domain system
6. Routing
7. Details about Internet addresses: subnets and broadcasting
8. Datagram fragmentation and reassembly
9. Ethernet encapsulation: ARP
10. Getting more information



This document is a brief introduction to TCP/IP, followed by advice on
what to read for more information. This is not intended to be a
complete description. It can give you a reasonable idea of the
capabilities of the protocols. But if you need to know any details of
the technology, you will want to read the standards yourself.
Throughout the text, you will find references to the standards, in the
form of "RFC" or "IEN" numbers. These are document numbers. The final
section of this document tells you how to get copies of those
standards.



1. What is TCP/IP?


TCP/IP is a set of protocols developed to allow cooperating computers
to share resources across a network. It was developed by a community
of researchers centered around the ARPAnet. Certainly the ARPAnet is
the best-known TCP/IP network. However as of June 1987, at least 130
different vendors had products that support TCP/IP, and thousands of
networks of all kinds use it.

First some basic definitions. The most accurate name for the set of
protocols we are describing is the "Internet protocol suite". TCP and
IP are two of the protocols in this suite. (They will be described
below.) Because TCP and IP are the best known of the protocols, it
has become common to use the term TCP/IP or IP/TCP to refer to the
whole family. It is probably not worth fighting this habit. However
this can lead to some oddities. For example, I find myself talking
about NFS as being based on TCP/IP, even though it doesn't use TCP at
all. (It does use IP. But it uses an alternative protocol, UDP,
instead of TCP. All of this alphabet soup will be unscrambled in the
following pages.)

The Internet is a collection of networks, including the Arpanet,
NSFnet, regional networks such as NYsernet, local networks at a number
of University and research institutions, and a number of military
networks. The term "Internet" applies to this entire set of networks.
The subset of them that is managed by the Department of Defense is
referred to as the "DDN" (Defense Data Network). This includes some
research-oriented networks, such as the Arpanet, as well as more
strictly military ones. (Because much of the funding for Internet
protocol developments is done via the DDN organization, the terms
Internet and DDN can sometimes seem equivalent.) All of these
networks are connected to each other. Users can send messages from
any of them to any other, except where there are security or other
policy restrictions on access. Officially speaking, the Internet
protocol documents are simply standards adopted by the Internet
community for its own use. More recently, the Department of Defense
issued a MILSPEC definition of TCP/IP. This was intended to be a more
formal definition, appropriate for use in purchasing specifications.
However most of the TCP/IP community continues to use the Internet
standards. The MILSPEC version is intended to be consistent with it.

Whatever it is called, TCP/IP is a family of protocols. A few provide
"low-level" functions needed for many applications. These include IP,
TCP, and UDP. (These will be described in a bit more detail later.)
Others are protocols for doing specific tasks, e.g. transferring files
between computers, sending mail, or finding out who is logged in on
another computer. Initially TCP/IP was used mostly between
minicomputers or mainframes. These machines had their own disks, and
generally were self-contained. Thus the most important "traditional"
TCP/IP services are:

- file transfer. The file transfer protocol (FTP) allows a user on
any computer to get files from another computer, or to send files
to another computer. Security is handled by requiring the user
to specify a user name and password for the other computer.
Provisions are made for handling file transfer between machines
with different character sets, end of line conventions, etc. This
is not quite the same thing as more recent "network file system"
or "netbios" protocols, which will be described below. Rather,
FTP is a utility that you run any time you want to access a file
on another system. You use it to copy the file to your own
system. You then work with the local copy. (See RFC 959 for
specifications for FTP.)

- remote login. The network terminal protocol (TELNET) allows a
user to log in on any other computer on the network. You start a
remote session by specifying a computer to connect to. From that
time until you finish the session, anything you type is sent to
the other computer. Note that you are really still talking to
your own computer. But the telnet program effectively makes your
computer invisible while it is running. Every character you type
is sent directly to the other system. Generally, the connection
to the remote computer behaves much like a dialup connection.
That is, the remote system will ask you to log in and give a
password, in whatever manner it would normally ask a user who had
just dialed it up. When you log off of the other computer, the
telnet program exits, and you will find yourself talking to your
own computer. Microcomputer implementations of telnet generally
include a terminal emulator for some common type of terminal.
(See RFC's 854 and 855 for specifications for telnet. By the
way, the telnet protocol should not be confused with Telenet, a
vendor of commercial network services.)

- computer mail. This allows you to send messages to users on
other computers. Originally, people tended to use only one or
two specific computers. They would maintain "mail files" on
those machines. The computer mail system is simply a way for you
to add a message to another user's mail file. There are some
problems with this in an environment where microcomputers are
used. The most serious is that a micro is not well suited to
receive computer mail. When you send mail, the mail software
expects to be able to open a connection to the addressee's
computer, in order to send the mail. If this is a microcomputer,
it may be turned off, or it may be running an application other
than the mail system. For this reason, mail is normally handled
by a larger system, where it is practical to have a mail server
running all the time. Microcomputer mail software then becomes a
user interface that retrieves mail from the mail server. (See
RFC 821 and 822 for specifications for computer mail. See RFC
937 for a protocol designed for microcomputers to use in reading
mail from a mail server.)

These services should be present in any implementation of TCP/IP,
except that micro-oriented implementations may not support computer
mail. These traditional applications still play a very important role
in TCP/IP-based networks. However more recently, the way in which
networks are used has been changing. The older model of a number of
large, self-sufficient computers is beginning to change. Now many
installations have several kinds of computers, including
microcomputers, workstations, minicomputers, and mainframes. These
computers are likely to be configured to perform specialized tasks.
Although people are still likely to work with one specific computer,
that computer will call on other systems on the net for specialized
services. This has led to the "server/client" model of network
services. A server is a system that provides a specific service for
the rest of the network. A client is another system that uses that
service. (Note that the server and client need not be on different
computers. They could be different programs running on the same
computer.) Here are the kinds of servers typically present in a
modern computer setup. Note that these computer services can all be
provided within the framework of TCP/IP.

- network file systems. This allows a system to access files on
another computer in a somewhat more closely integrated fashion
than FTP. A network file system provides the illusion that disks
or other devices from one system are directly connected to other
systems. There is no need to use a special network utility to
access a file on another system. Your computer simply thinks it
has some extra disk drives. These extra "virtual" drives refer
to the other system's disks. This capability is useful for
several different purposes. It lets you put large disks on a few
computers, but still give others access to the disk space. Aside
from the obvious economic benefits, this allows people working on
several computers to share common files. It makes system
maintenance and backup easier, because you don't have to worry
about updating and backing up copies on lots of different
machines. A number of vendors now offer high-performance
diskless computers. These computers have no disk drives at all.
They are entirely dependent upon disks attached to common "file
servers". (See RFC's 1001 and 1002 for a description of
PC-oriented NetBIOS over TCP. In the workstation and
minicomputer area, Sun's Network File System is more likely to be
used. Protocol specifications for it are available from Sun
Microsystems.)

- remote printing. This allows you to access printers on other
computers as if they were directly attached to yours. (The most
commonly used protocol is the remote lineprinter protocol from
Berkeley Unix. Unfortunately, there is no protocol document for
this. However the C code is easily obtained from Berkeley, so
implementations are common.)

- remote execution. This allows you to request that a particular
program be run on a different computer. This is useful when you
can do most of your work on a small computer, but a few tasks
require the resources of a larger system. There are a number of
different kinds of remote execution. Some operate on a command
by command basis. That is, you request that a specific command
or set of commands should run on some specific computer. (More
sophisticated versions will choose a system that happens to be
free.) However there are also "remote procedure call" systems
that allow a program to call a subroutine that will run on
another computer. (There are many protocols of this sort.
Berkeley Unix contains two servers to execute commands remotely:
rsh and rexec. The man pages describe the protocols that they
use. The user-contributed software with Berkeley 4.3 contains a
"distributed shell" that will distribute tasks among a set of
systems, depending upon load. Remote procedure call mechanisms
have been a topic for research for a number of years, so many
organizations have implementations of such facilities. The most
widespread commercially-supported remote procedure call protocols
seem to be Xerox's Courier and Sun's RPC. Protocol documents are
available from Xerox and Sun. There is a public implementation
of Courier over TCP as part of the user-contributed software with
Berkeley 4.3. An implementation of RPC was posted to Usenet by
Sun, and also appears as part of the user-contributed software
with Berkeley 4.3.)

- name servers. In large installations, there are a number of
different collections of names that have to be managed. This
includes users and their passwords, names and network addresses
for computers, and accounts. It becomes very tedious to keep
this data up to date on all of the computers. Thus the databases
are kept on a small number of systems. Other systems access the
data over the network. (RFC 882 and 883 describe the name server
protocol used to keep track of host names and Internet addresses
on the Internet. This is now a required part of any TCP/IP
implementation. IEN 116 describes an older name server protocol
that is used by a few terminal servers and other products to look
up host names. Sun's Yellow Pages system is designed as a
general mechanism to handle user names, file sharing groups, and
other databases commonly used by Unix systems. It is widely
available commercially. Its protocol definition is available
from Sun.)

- terminal servers. Many installations no longer connect terminals
directly to computers. Instead they connect them to terminal
servers. A terminal server is simply a small computer that only
knows how to run telnet (or some other protocol to do remote
login). If your terminal is connected to one of these, you
simply type the name of a computer, and you are connected to it.
Generally it is possible to have active connections to more than
one computer at the same time. The terminal server will have
provisions to switch between connections rapidly, and to notify
you when output is waiting for another connection. (Terminal
servers use the telnet protocol, already mentioned. However any
real terminal server will also have to support name service and a
number of other protocols.)

- network-oriented window systems. Until recently, high-
performance graphics programs had to execute on a computer that
had a bit-mapped graphics screen directly attached to it.
Network window systems allow a program to use a display on a
different computer. Full-scale network window systems provide an
interface that lets you distribute jobs to the systems that are
best suited to handle them, but still give you a single
graphically-based user interface. (The most widely-implemented
window system is X. A protocol description is available from
MIT's Project Athena. A reference implementation is publicly
available from MIT. A number of vendors are also supporting
NeWS, a window system defined by Sun. Both of these systems are
designed to use TCP/IP.)

Note that some of the protocols described above were designed by
Berkeley, Sun, or other organizations. Thus they are not officially
part of the Internet protocol suite. However they are implemented
using TCP/IP, just as normal TCP/IP application protocols are. Since
the protocol definitions are not considered proprietary, and since
commercially-supported implementations are widely available, it is
reasonable to think of these protocols as being effectively part of
the Internet suite. Note that the list above is simply a sample of
the sort of services available through TCP/IP. However it does
contain the majority of the "major" applications. The other
commonly-used protocols tend to be specialized facilities for getting
information of various kinds, such as who is logged in, the time of
day, etc. However if you need a facility that is not listed here, we
encourage you to look through the current edition of Internet
Protocols (currently RFC 1011), which lists all of the available
protocols, and also to look at some of the major TCP/IP
implementations to see what various vendors have added.



2. General description of the TCP/IP protocols


TCP/IP is a layered set of protocols. In order to understand what
this means, it is useful to look at an example. A typical situation
is sending mail. First, there is a protocol for mail. This defines a
set of commands which one machine sends to another, e.g. commands to
specify who the sender of the message is, who it is being sent to, and
then the text of the message. However this protocol assumes that
there is a way to communicate reliably between the two computers.
Mail, like other application protocols, simply defines a set of
commands and messages to be sent. It is designed to be used together
with TCP and IP. TCP is responsible for making sure that the commands
get through to the other end. It keeps track of what is sent, and
retransmits anything that did not get through. If any message is too
large for one datagram, e.g. the text of the mail, TCP will split it
up into several datagrams, and make sure that they all arrive
correctly. Since these functions are needed for many applications,
they are put together into a separate protocol, rather than being part
of the specifications for sending mail. You can think of TCP as
forming a library of routines that applications can use when they need
reliable network communications with another computer. Similarly, TCP
calls on the services of IP. Although the services that TCP supplies
are needed by many applications, there are still some kinds of
applications that don't need them. However there are some services
that every application needs. So these services are put together into
IP. As with TCP, you can think of IP as a library of routines that
TCP calls on, but which is also available to applications that don't
use TCP. This strategy of building several levels of protocol is
called "layering". We think of the applications programs such as
mail, TCP, and IP, as being separate "layers", each of which calls on
the services of the layer below it. Generally, TCP/IP applications
use 4 layers:

- an application protocol such as mail

- a protocol such as TCP that provides services needed by many
applications

- IP, which provides the basic service of getting datagrams to
their destination

- the protocols needed to manage a specific physical medium, such
as Ethernet or a point to point line.
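
The four layers listed above can be pictured as a chain of function calls, each one wrapping whatever the layer above hands it. What follows is only a toy sketch in Python: the bracketed "headers" are made-up placeholders rather than real wire formats, and the function names, the mail command, the addresses, and the port numbers are all assumptions chosen for illustration.

```python
# Toy model of layering: every layer prepends its own header to the
# data it receives from the layer above. The bracketed headers are
# placeholders for illustration, not real protocol formats.

def application_layer(message: str) -> bytes:
    # The mail protocol only produces commands and message text.
    return message.encode("ascii")

def tcp_layer(payload: bytes, src_port: int, dst_port: int) -> bytes:
    # TCP adds the information needed for reliable delivery.
    return f"[TCP {src_port}->{dst_port}]".encode("ascii") + payload

def ip_layer(segment: bytes, src_addr: str, dst_addr: str) -> bytes:
    # IP adds the addresses needed to get the datagram to its destination.
    return f"[IP {src_addr}->{dst_addr}]".encode("ascii") + segment

def ethernet_layer(datagram: bytes) -> bytes:
    # The physical-medium layer wraps the datagram for one specific network.
    return b"[ETH]" + datagram

frame = ethernet_layer(
    ip_layer(
        tcp_layer(application_layer("MAIL FROM:<user@example.edu>"), 1000, 25),
        "128.6.4.194",
        "128.6.5.194",
    )
)
print(frame)
```

Reading the printed result from left to right shows the headers in the order the lower layers would strip them off at the receiving end.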

TCP/IP is based on the "catenet model". (This is described in more
detail in IEN 48.) This model assumes that there are a large number
of independent networks connected together by gateways. The user
should be able to access computers or other resources on any of these
networks. Datagrams will often pass through a dozen different
networks before getting to their final destination. The routing
needed to accomplish this should be completely invisible to the user.
As far as the user is concerned, all he needs to know in order to
access another system is an "Internet address". This is an address
that looks like 128.6.4.194. It is actually a 32-bit number. However
it is normally written as 4 decimal numbers, each representing 8 bits
of the address. (The term "octet" is used by Internet documentation
for such 8-bit chunks. The term "byte" is not used, because TCP/IP is
supported by some computers that have byte sizes other than 8 bits.)
Generally the structure of the address gives you some information
about how to get to the system. For example, 128.6 is a network
number assigned by a central authority to Rutgers University. Rutgers
uses the next octet to indicate which of the campus Ethernets is
involved. 128.6.4 happens to be an Ethernet used by the Computer
Science Department. The last octet allows for up to 254 systems on
each Ethernet. (It is 254 because 0 and 255 are not allowed, for
reasons that will be discussed later.) Note that 128.6.4.194 and
128.6.5.194 would be different systems. The structure of an Internet
address is described in a bit more detail later.
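
To make the "32-bit number written as 4 decimal numbers" idea concrete, here is a small Python sketch that converts between the two forms and pulls the address apart the way the Rutgers example does. The function names are hypothetical, and the split into a 16-bit network number, one subnet octet, and one host octet is simply the Rutgers convention described above, not a general rule.

```python
def dotted_to_int(address: str) -> int:
    """Convert an address like '128.6.4.194' to its 32-bit value."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a dotted-quad address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet   # each octet contributes 8 bits
    return value

def int_to_dotted(value: int) -> str:
    """Convert a 32-bit value back to dotted-decimal form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

addr = dotted_to_int("128.6.4.194")
print(addr, int_to_dotted(addr))

# Rutgers-style interpretation of the same address (an assumption that
# only holds for networks laid out like the example in the text).
network = int_to_dotted(addr & 0xFFFF0000)   # 128.6.0.0, assigned centrally
ethernet = (addr >> 8) & 0xFF                # 4, which campus Ethernet
host = addr & 0xFF                           # 194, which system on it
print(network, ethernet, host)
```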

Of course we normally refer to systems by name, rather than by
Internet address. When we specify a name, the network software looks
it up in a database, and comes up with the corresponding Internet
address. Most of the network software deals strictly in terms of the
address. (RFC 882 describes the name server technology used to handle
this lookup.)

TCP/IP is built on "connectionless" technology. Information is
transferred as a sequence of "datagrams". A datagram is a collection
of data that is sent as a single message. Each of these datagrams is
sent through the network individually. There are provisions to open
connections (i.e. to start a conversation that will continue for some
time). However at some level, information from those connections is
broken up into datagrams, and those datagrams are treated by the
network as completely separate. For example, suppose you want to
transfer a 15000 octet file. Most networks can't handle a 15000 octet
datagram. So the protocols will break this up into something like 30
500-octet datagrams. Each of these datagrams will be sent to the
other end. At that point, they will be put back together into the
15000-octet file. However while those datagrams are in transit, the
network doesn't know that there is any connection between them. It is
perfectly possible that datagram 14 will actually arrive before
datagram 13. It is also possible that somewhere in the network, an
error will occur, and some datagram won't get through at all. In that
case, that datagram has to be sent again.
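
The 15000-octet example can be imitated in a few lines of Python. This is a sketch of the idea only, not of any real protocol implementation: the function names are hypothetical, and tagging each piece with its offset is an assumption made here so that the receiving side has something to sort on.

```python
import random

DATAGRAM_SIZE = 500   # octets of data per datagram, as in the example above

def split_into_datagrams(data: bytes, size: int = DATAGRAM_SIZE):
    # Tag every piece with its offset in the file so that the receiver
    # can put the pieces back in order no matter how they arrive.
    return [(offset, data[offset:offset + size])
            for offset in range(0, len(data), size)]

def reassemble(datagrams) -> bytes:
    # Sort by offset, then glue the pieces back together.
    return b"".join(chunk for _, chunk in sorted(datagrams))

file_data = bytes(i % 256 for i in range(15000))
datagrams = split_into_datagrams(file_data)

random.shuffle(datagrams)           # the network may deliver them in any order
restored = reassemble(datagrams)

print(len(datagrams), restored == file_data)   # 30 True
```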

Note by the way that the terms "datagram" and "packet" often seem to
be nearly interchangeable. Technically, datagram is the right word to
use when describing TCP/IP. A datagram is a unit of data, which is
what the protocols deal with. A packet is a physical thing, appearing
on an Ethernet or some wire. In most cases a packet simply contains a
datagram, so there is very little difference. However they can
differ. When TCP/IP is used on top of X.25, the X.25 interface breaks
the datagrams up into 128-byte packets. This is invisible to IP,
because the packets are put back together into a single datagram at
the other end before being processed by TCP/IP. So in this case, one
IP datagram would be carried by several packets. However with most
media, there are efficiency advantages to sending one datagram per
packet, and so the distinction tends to vanish.



2.1 The TCP level


Two separate protocols are involved in handling TCP/IP datagrams. TCP
(the "transmission control protocol") is responsible for breaking up
the message into datagrams, reassembling them at the other end,
resending anything that gets lost, and putting things back in the
right order. IP (the "internet protocol") is responsible for routing
individual datagrams. It may seem like TCP is doing all the work.
And in small networks that is true. However in the Internet, simply
getting a datagram to its destination can be a complex job. A
connection may require the datagram to go through several networks at
Rutgers, a serial line to the John von Neumann Supercomputer Center, a
couple of Ethernets there, a series of 56Kbaud phone lines to another
NSFnet site, and more Ethernets on another campus. Keeping track of
the routes to all of the destinations and handling incompatibilities
among different transport media turns out to be a complex job. Note
that the interface between TCP and IP is fairly simple. TCP simply
hands IP a datagram with a destination. IP doesn't know how this
datagram relates to any datagram before it or after it.

It may have occurred to you that something is missing here. We have
talked about Internet addresses, but not about how you keep track of
multiple connections to a given system. Clearly it isn't enough to
get a datagram to the right destination. TCP has to know which
connection this datagram is part of. This task is referred to as
"demultiplexing." In fact, there are several levels of demultiplexing
going on in TCP/IP. The information needed to do this demultiplexing
is contained in a series of "headers". A header is simply a few extra
octets tacked onto the beginning of a datagram by some protocol in
order to keep track of it. It's a lot like putting a letter into an
envelope and putting an address on the outside of the envelope.
Except with modern networks it happens several times. It's like you
put the letter into a little envelope, your secretary puts that into a
somewhat bigger envelope, the campus mail center puts that envelope
into a still bigger one, etc. Here is an overview of the headers that
get stuck on a message that passes through a typical TCP/IP network:

We start with a single data stream, say a file you are trying to send
to some other computer:

......................................................

TCP breaks it up into manageable chunks. (In order to do this, TCP
has to know how large a datagram your network can handle. Actually,
the TCP's at each end say how big a datagram they can handle, and then
they pick the smallest size.)

.... .... .... .... .... .... .... ....

TCP puts a header at the front of each datagram. This header actually
contains at least 20 octets, but the most important ones are a source
and destination "port number" and a "sequence number". The port
numbers are used to keep track of different conversations. Suppose 3
different people are transferring files. Your TCP might allocate port
numbers 1000, 1001, and 1002 to these transfers. When you are sending
a datagram, this becomes the "source" port number, since you are the
source of the datagram. Of course the TCP at the other end has
assigned a port number of its own for the conversation. Your TCP has
to know the port number used by the other end as well. (It finds out
when the connection starts, as we will explain below.) It puts this
in the "destination" port field. Of course if the other end sends a
datagram back to you, the source and destination port numbers will be
reversed, since then it will be the source and you will be the
destination. Each datagram has a sequence number. This is used so
that the other end can make sure that it gets the datagrams in the
right order, and that it hasn't missed any. (See the TCP
specification for details.) TCP doesn't number the datagrams, but the
octets. So if there are 500 octets of data in each datagram, the
first datagram might be numbered 0, the second 500, the next 1000, the
next 1500, etc. Finally, I will mention the Checksum. This is a
number that is computed by adding up all the octets in the datagram
(more or less - see the TCP spec). The result is put in the header.
TCP at the other end computes the checksum again. If they disagree,
then something bad happened to the datagram in transmission, and it is
thrown away. So here's what the datagram looks like now.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 your data ... next 500 octets                 |
|                            ......                             |

If we abbreviate the TCP header as "T", the whole file now looks like
this:

T.... T.... T.... T.... T.... T.... T....
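
To make the header layout a little more concrete, here is a minimal
Python sketch that packs and unpacks the fixed 20-octet part of a TCP
header. The function names, the default window of 4096, and the port
numbers are assumptions made up for illustration, and the checksum is
simply left at zero rather than computed.

```python
import struct

# Fixed 20-octet TCP header in network (big-endian) byte order:
# source port, destination port, sequence number, acknowledgment number,
# data offset + reserved + flags (16 bits), window, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")

def build_tcp_header(src_port, dst_port, seq, ack=0, flags=0, window=4096):
    """Pack a header with no options; the checksum is left as 0 in this sketch."""
    data_offset = 5  # header length in 32-bit words (5 * 4 = 20 octets)
    offset_and_flags = (data_offset << 12) | (flags & 0x3F)
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           offset_and_flags, window, 0, 0)

def parse_tcp_header(raw):
    """Unpack the first 20 octets of a segment into a dictionary."""
    src, dst, seq, ack, off_flags, window, checksum, urgent = \
        TCP_HEADER.unpack(raw[:20])
    return {"src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
            "data_offset": off_flags >> 12, "flags": off_flags & 0x3F,
            "window": window, "checksum": checksum, "urgent": urgent}

# Three datagrams of 500 octets each are numbered 0, 500 and 1000,
# because TCP numbers octets rather than datagrams.
headers = [build_tcp_header(1000, 21, seq) for seq in range(0, 1500, 500)]
```

Parsing `headers[1]` gives back source port 1000, destination port 21
and sequence number 500.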

You will note that there are items in the header that I have not
described above. They are generally involved with managing the
connection. In order to make sure the datagram has arrived at its
destination, the recipient has to send back an "acknowledgement".
This is a datagram whose "Acknowledgement number" field is filled in.
For example, sending a packet with an acknowledgement of 1500
indicates that you have received every octet numbered below 1500 and
expect octet 1500 next.
If the sender doesn't get an acknowledgement within a reasonable
amount of time, it sends the data again. The window is used to
control how much data can be in transit at any one time. It is not
practical to wait for each datagram to be acknowledged before sending
the next one. That would slow things down too much. On the other
hand, you can't just keep sending, or a fast computer might overrun
the capacity of a slow one to absorb data. Thus each end indicates
how much new data it is currently prepared to absorb by putting the
number of octets in its "Window" field. As the computer receives
data, the amount of space left in its window decreases. When it goes
to zero, the sender has to stop. As the receiver processes the data,
it increases its window, indicating that it is ready to accept more
data. Often the same datagram can be used to acknowledge receipt of a
set of data and to give permission for additional new data (by an
updated window). The "Urgent" field allows one end to tell the other
to skip ahead in its processing to a particular octet. This is often
useful for handling asynchronous events, for example when you type a
control character or other command that interrupts output. The other
fields are beyond the scope of this document.
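
The interplay of acknowledgements and the window can be sketched with
a toy simulation. This is only an illustration under simplifying
assumptions: nothing is ever lost, the receiver acknowledges everything
in one go, and its window stays fixed. The function name and the
numbers are invented for the example.

```python
def simulate_transfer(total_octets, segment_size, window):
    """Toy sender: never keeps more than `window` unacknowledged octets in flight."""
    if segment_size <= 0 or window <= 0:
        raise ValueError("segment_size and window must be positive")
    next_seq = 0   # number of the next octet to send
    acked = 0      # every octet below this number has been acknowledged
    events = []
    while acked < total_octets:
        # Keep sending while the receiver's window still has room.
        while next_seq < total_octets and next_seq - acked < window:
            room = window - (next_seq - acked)
            size = min(segment_size, total_octets - next_seq, room)
            events.append(("send", next_seq, size))
            next_seq += size
        # Window is full (or the data is finished): wait for the acknowledgement.
        acked = next_seq
        events.append(("ack", acked))
    return events

events = simulate_transfer(total_octets=2000, segment_size=500, window=1000)
```

With a window of 1000 octets the sender gets two 500-octet datagrams
out before it has to stop and wait, instead of waiting after every
single one.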



2.2 The IP level


TCP sends each of these datagrams to IP. Of course it has to tell IP
the Internet address of the computer at the other end. Note that this
is all IP is concerned about. It doesn't care about what is in the
datagram, or even in the TCP header. IP's job is simply to find a
route for the datagram and get it to the other end. In order to allow
gateways or other intermediate systems to forward the datagram, it
adds its own header. The main things in this header are the source
and destination Internet address (32-bit addresses, like 128.6.4.194),
the protocol number, and another checksum. The source Internet
address is simply the address of your machine. (This is necessary so
the other end knows where the datagram came from.) The destination
Internet address is the address of the other machine. (This is
necessary so any gateways in the middle know where you want the
datagram to go.) The protocol number tells IP at the other end to
send the datagram to TCP. Although most IP traffic uses TCP, there
are other protocols that can use IP, so you have to tell IP which
protocol to send the datagram to. Finally, the checksum allows IP at
the other end to verify that the header wasn't damaged in transit.
Note that TCP and IP have separate checksums. IP needs to be able to
verify that the header didn't get damaged in transit, or it could send
a message to the wrong place. For reasons not worth discussing here,
it is both more efficient and safer to have TCP compute a separate
checksum for the TCP header and data. Once IP has tacked on its
header, here's what the message looks like:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|     Fragment Offset     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |        Header Checksum        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Source Address                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Destination Address                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               TCP header, then your data ......               |
|                                                               |

If we represent the IP header by an "I", your file now looks like
this:

IT.... IT.... IT.... IT.... IT.... IT.... IT....
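
As a rough illustration of the IP header and its checksum, the sketch
below builds a 20-octet header with no options and fills in the header
checksum using the usual ones-complement sum of 16-bit words. The
function names and the default time to live of 64 are assumptions for
the example; protocol number 6 is the value assigned to TCP.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """Ones-complement of the ones-complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_ip_header(src, dst, payload_len, protocol=6, ttl=64, ident=0):
    """Build a 20-octet IPv4 header (no options) with its checksum filled in."""
    version_ihl = (4 << 4) | 5               # version 4, header of 5 words
    total_length = 20 + payload_len
    header = struct.pack("!BBHHHBBH4s4s", version_ihl, 0, total_length,
                         ident, 0, ttl, protocol, 0,
                         socket.inet_aton(src), socket.inet_aton(dst))
    checksum = internet_checksum(header)     # computed with the field zeroed
    return header[:10] + struct.pack("!H", checksum) + header[12:]

# A header for one datagram carrying a 20-octet TCP header plus 500 octets.
header = build_ip_header("128.6.4.194", "128.6.4.7", payload_len=520)
```

The receiving end can run `internet_checksum` over the header exactly
as it arrived; if nothing was damaged the result is zero.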

Again, the header contains some additional fields that have not been
discussed. Most of them are beyond the scope of this document. The
flags and fragment offset are used to keep track of the pieces when a
datagram has to be split up. This can happen when datagrams are
forwarded through a network for which they are too big. (This will be
discussed a bit more below.) The time to live is a number that is
decremented whenever the datagram passes through a system. When it
goes to zero, the datagram is discarded. This is done in case a loop
develops in the system somehow. Of course this should be impossible,
but well-designed networks are built to cope with "impossible"
conditions.

At this point, it's possible that no more headers are needed. If your
computer happens to have a direct phone line connecting it to the
destination computer, or to a gateway, it may simply send the
datagrams out on the line (though likely a synchronous protocol such
as HDLC would be used, and it would add at least a few octets at the
beginning and end).



2.3 The Ethernet level


However most of our networks these days use Ethernet. So now we have
to describe Ethernet's headers. Unfortunately, Ethernet has its own
addresses. The people who designed Ethernet wanted to make sure that
no two machines would end up with the same Ethernet address.
Furthermore, they didn't want the user to have to worry about
assigning addresses. So each Ethernet controller comes with an
address built in at the factory. In order to make sure that they
would never have to reuse addresses, the Ethernet designers allocated
48 bits for the Ethernet address. People who make Ethernet equipment
have to register with a central authority, to make sure that the
numbers they assign don't overlap any other manufacturer. Ethernet is
a "broadcast medium". That is, it is in effect like an old party line
telephone. When you send a packet out on the Ethernet, every machine
on the network sees the packet. So something is needed to make sure
that the right machine gets it. As you might guess, this involves the
Ethernet header. Every Ethernet packet has a 14-octet header that
includes the source and destination Ethernet address, and a type code.
Each machine is supposed to pay attention only to packets with its own
Ethernet address in the destination field. (It's perfectly possible
to cheat, which is one reason that Ethernet communications are not
terribly secure.) Note that there is no connection between the
Ethernet address and the Internet address. Each machine has to have a
table of what Ethernet address corresponds to what Internet address.
(We will describe how this table is constructed a bit later.) In
addition to the addresses, the header contains a type code. The type
code is to allow for several different protocol families to be used on
the same network. So you can use TCP/IP, DECnet, Xerox NS, etc. at
the same time. Each of them will put a different value in the type
field. Finally, there is a checksum. The Ethernet controller
computes a checksum of the entire packet. When the other end receives
the packet, it recomputes the checksum, and throws the packet away if
the answer disagrees with the original. The checksum is put on the
end of the packet, not in the header. The final result is that your
message looks like this:





+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Ethernet destination address (first 32 bits)          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ethernet dest (last 16 bits)  |Ethernet source (first 16 bits)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Ethernet source address (last 32 bits)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Type code           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          IP header, then TCP header, then your data           |
|                                                               |
                               ...
|                                                               |
|                       end of your data                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Ethernet Checksum                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

If we represent the Ethernet header with "E", and the Ethernet
checksum with "C", your file now looks like this:

EIT....C EIT....C EIT....C EIT....C EIT....C
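
Here is a small sketch of the Ethernet wrapping: a 14-octet header in
front and a 32-bit checksum on the end. Treat it as an approximation.
It uses the CRC-32 from Python's zlib module for the checksum and
appends it low-order octet first, which is my assumption about how the
value is laid out; it also ignores the minimum frame size and anything
the hardware adds. The function names and addresses are invented.

```python
import struct
import zlib

ETHERTYPE_IP = 0x0800  # type code assigned to IP

def build_frame(dst_mac: bytes, src_mac: bytes, ethertype: int,
                payload: bytes) -> bytes:
    """Prepend the 14-octet header and append a 4-octet CRC-32 trailer."""
    header = dst_mac + src_mac + struct.pack("!H", ethertype)  # 6 + 6 + 2
    body = header + payload
    trailer = struct.pack("<I", zlib.crc32(body))  # assumed byte order
    return body + trailer

def frame_is_intact(frame: bytes) -> bool:
    """Recompute the checksum the way the receiving controller would."""
    body, trailer = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body)) == trailer

frame = build_frame(bytes.fromhex("aabbccddeeff"),
                    bytes.fromhex("112233445566"),
                    ETHERTYPE_IP, b"IP header, TCP header, data")
damaged = frame[:20] + bytes([frame[20] ^ 0x01]) + frame[21:]
```

`frame_is_intact(frame)` is true, while flipping a single bit, as in
`damaged`, makes the check fail, so the packet would be thrown away.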

When these packets are received by the other end, of course all the
headers are removed. The Ethernet interface removes the Ethernet
header and the checksum. It looks at the type code. Since the type
code is the one assigned to IP, the Ethernet device driver passes the
datagram up to IP. IP removes the IP header. It looks at the IP
protocol field. Since the protocol type is TCP, it passes the
datagram up to TCP. TCP now looks at the sequence number. It uses
the sequence numbers and other information to combine all the
datagrams into the original file.
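
The unwrapping on the receiving side can be sketched the same way.
This is a simplified illustration, not how any particular system does
it: checksums are not verified, `make_frame` is a hypothetical helper
that only exists to produce test input, and the addresses and ports
are the ones used as examples in this document.

```python
import struct

ETHERTYPE_IP = 0x0800   # Ethernet type code for IP
IPPROTO_TCP = 6         # IP protocol number for TCP

def make_frame(seq, data):
    """Hypothetical helper: wrap data in TCP, IP and Ethernet (checksums zeroed)."""
    tcp = struct.pack("!HHIIHHHH", 1234, 21, seq, 0, 5 << 12, 4096, 0, 0) + data
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 0, 0, 64,
                     IPPROTO_TCP, 0, bytes([128, 6, 4, 194]),
                     bytes([128, 6, 4, 7])) + tcp
    ethernet = bytes(6) + bytes(6) + struct.pack("!H", ETHERTYPE_IP)
    return ethernet + ip + bytes(4)          # 4 octets stand in for the checksum

def unwrap(frame):
    """Peel off each header in turn; return (sequence number, data) or None."""
    ethertype, = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IP:            # not for IP: some other family
        return None
    ip = frame[14:]
    header_len = (ip[0] & 0x0F) * 4
    total_len, = struct.unpack("!H", ip[2:4])
    if ip[9] != IPPROTO_TCP:                 # not for TCP: some other protocol
        return None
    tcp = ip[header_len:total_len]           # total length excludes the trailer
    seq, = struct.unpack("!I", tcp[4:8])
    data_offset = (tcp[12] >> 4) * 4
    return seq, tcp[data_offset:]

def reassemble(segments):
    """Put the data back in sequence-number order."""
    return b"".join(data for _, data in sorted(segments))

# Frames arriving out of order are still reassembled correctly.
frames = [make_frame(5, b"world"), make_frame(0, b"hello")]
original = reassemble([unwrap(f) for f in frames])
```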

This ends our initial summary of TCP/IP. There are still some crucial
concepts we haven't gotten to, so we'll now go back and add details in
several areas. (For detailed descriptions of the items discussed here,
see RFC 793 for TCP, RFC 791 for IP, and RFC's 894 and 826 for
sending IP over Ethernet.)



3. Well-known sockets and the applications layer


So far, we have described how a stream of data is broken up into
datagrams, sent to another computer, and put back together. However
something more is needed in order to accomplish anything useful.
There has to be a way for you to open a connection to a specified
computer, log into it, tell it what file you want, and control the
transmission of the file. (If you have a different application in
mind, e.g. computer mail, some analogous protocol is needed.) This is
done by "application protocols". The application protocols run "on
top" of TCP/IP. That is, when they want to send a message, they give
the message to TCP. TCP makes sure it gets delivered to the other
end. Because TCP and IP take care of all the networking details, the
applications protocols can treat a network connection as if it were a
simple byte stream, like a terminal or phone line.

Before going into more details about applications programs, we have to
describe how you find an application. Suppose you want to send a file
to a computer whose Internet address is 128.6.4.7. To start the
process, you need more than just the Internet address. You have to
connect to the FTP server at the other end. In general, network
programs are specialized for a specific set of tasks. Most systems
have separate programs to handle file transfers, remote terminal
logins, mail, etc. When you connect to 128.6.4.7, you have to specify
that you want to talk to the FTP server. This is done by having
"well-known sockets" for each server. Recall that TCP uses port
numbers to keep track of individual conversations. User programs
normally use more or less random port numbers. However specific port
numbers are assigned to the programs that sit waiting for requests.
For example, if you want to send a file, you will start a program
called "ftp". It will open a connection using some random number, say
1234, for the port number on its end. However it will specify port
number 21 for the other end. This is the official port number for the
FTP server. Note that there are two different programs involved. You
run ftp on your side. This is a program designed to accept commands
from your terminal and pass them on to the other end. The program
that you talk to on the other machine is the FTP server. It is
designed to accept commands from the network connection, rather than
an interactive terminal. There is no need for your program to use a
well-known socket number for itself. Nobody is trying to find it.
However the servers have to have well-known numbers, so that people
can open connections to them and start sending them commands. The
official port numbers for each program are given in "Assigned
Numbers".
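
In code, the idea looks roughly like this. The small table and the
function names are made up for the example (a real system would look
the numbers up in its services database), and `open_connection` is
only a sketch of what a client does: it names the server's well-known
port and lets the operating system pick the port on its own end.

```python
import socket

# A few officially assigned server ports.
WELL_KNOWN_PORTS = {"ftp": 21, "telnet": 23, "smtp": 25}

def server_address(host, service):
    """Return the (address, port) pair a client must connect to."""
    return (host, WELL_KNOWN_PORTS[service])

def open_connection(host, service, timeout=10.0):
    """Connect to a server; return the socket and the local port we were given."""
    sock = socket.create_connection(server_address(host, service),
                                    timeout=timeout)
    local_port = sock.getsockname()[1]   # the "more or less random" number
    return sock, local_port
```

For example, `server_address("128.6.4.7", "ftp")` gives
`("128.6.4.7", 21)`.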

Note that a connection is actually described by a set of 4 numbers:
the Internet address at each end, and the TCP port number at each end.
Every datagram has all four of those numbers in it. (The Internet
addresses are in the IP header, and the TCP port numbers are in the
TCP header.) In order to keep things straight, no two connections can
have the same set of numbers. However it is enough for any one number
to be different. For example, it is perfectly possible for two
different users on a machine to be sending files to the same other
machine. This could result in connections with the following
parameters:

              Internet addresses         TCP ports
connection 1  128.6.4.194, 128.6.4.7     1234, 21
connection 2  128.6.4.194, 128.6.4.7     1235, 21

Since the same machines are involved, the Internet addresses are the
same. Since they are both doing file transfers, one end of the
connection involves the well-known port number for FTP. The only
thing that differs is the port number for the program that the users
are running. That's enough of a difference. Generally, at least one
end of the connection asks the network software to assign it a port
number that is guaranteed to be unique. Normally, it's the user's
end, since the server has to use a well-known number.
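
One way to picture this is a table keyed by the set of four numbers.
The sketch below is an illustration only; the class and field names
are invented, and real network software keeps a good deal more state
per connection.

```python
from collections import namedtuple

# A connection is identified by all four numbers taken together.
Connection = namedtuple(
    "Connection", ["local_addr", "local_port", "remote_addr", "remote_port"])

class ConnectionTable:
    def __init__(self):
        self._owners = {}

    def open(self, conn, owner):
        """Register a connection; the same four numbers may not be used twice."""
        if conn in self._owners:
            raise ValueError("connection already in use: %r" % (conn,))
        self._owners[conn] = owner

    def lookup(self, conn):
        """Find which conversation an arriving datagram belongs to."""
        return self._owners[conn]

table = ConnectionTable()
conn1 = Connection("128.6.4.194", 1234, "128.6.4.7", 21)
conn2 = Connection("128.6.4.194", 1235, "128.6.4.7", 21)
table.open(conn1, "first user's transfer")
table.open(conn2, "second user's transfer")   # only the local port differs
```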

Now that we know how to open connections, let's get back to the
applications programs. As mentioned earlier, once TCP has opened a
connection, we have something that might as well be a simple wire.
All the hard parts are handled by TCP and IP. However we still need
some agreement as to what we send over this connection. In effect
this is simply an agreement on what set of commands the application
will understand, and the format in which they are to be sent.
Generally, what is sent is a combination of commands and data. They
use context to differentiate. For example, the mail protocol works
like this: Your mail program opens a connection to the mail server at
the other end. Your program gives it your machine's name, the sender
of the message, and the recipients you want it sent to. It then sends
a command saying that it is starting the message. At that point, the
other end stops treating what it sees as commands, and starts
accepting the message. Your end then starts sending the text of the
message. At the end of the message, a special mark is sent (a dot in
the first column). After that, both ends understand that your program
is again sending commands. This is the simplest way to do things, and
the one that most applications use.
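
The switch between commands and message text can be sketched as a tiny
state machine on the receiving side. This is a simplification under a
few assumptions: lines have already been split, the command that starts
the message is taken to be DATA (as in the mail example in this
document), and the handling of message lines that themselves begin
with a dot is left out. The function name is invented.

```python
def split_session(lines):
    """Separate the command lines from the message text in a mail session."""
    commands, message = [], []
    in_message = False
    for line in lines:
        if in_message:
            if line == ".":          # a dot alone ends the message
                in_message = False
            else:
                message.append(line)
        else:
            commands.append(line)
            if line.upper() == "DATA":
                in_message = True    # everything after this is message text
    return commands, message

commands, message = split_session([
    "HELO topaz.rutgers.edu",
    "DATA",
    "Subject: meeting",
    "",
    "QUIT is just text here, not a command.",
    ".",
    "QUIT",
])
```

Note that the line containing the word QUIT inside the message is
treated as text, while the QUIT after the dot is a command again.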

File transfer is somewhat more complex. The file transfer protocol
involves two different connections. It starts out just like mail.
The user's program sends commands like "log me in as this user", "here
is my password", "send me the file with this name". However once the
command to send data is sent, a second connection is opened for the
data itself. It would certainly be possible to send the data on the
same connection, as mail does. However file transfers often take a
long time. The designers of the file transfer protocol wanted to
allow the user to continue issuing commands while the transfer is
going on. For example, the user might make an inquiry, or he might
abort the transfer. Thus the designers felt it was best to use a
separate connection for the data and leave the original command
connection for commands. (It is also possible to open command
connections to two different computers, and tell them to send a file
from one to the other. In that case, the data couldn't go over the
command connection.)

Remote terminal connections use another mechanism still. For remote
logins, there is just one connection. It normally sends data. When
it is necessary to send a command (e.g. to set the terminal type or to
change some mode), a special character is used to indicate that the
next character is a command. If the user happens to type that special
character as data, two of them are sent.
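
Here is a sketch of that escaping scheme. The special character in the
remote terminal protocol is the octet 255; the rest is a simplified
model that follows the description above and treats exactly one octet
after the special character as the command, which is an assumption for
the example rather than a full treatment of the protocol. The function
names are invented.

```python
IAC = 0xFF  # the special "next octet is a command" character

def escape_data(data: bytes) -> bytes:
    """Double any special character that the user typed as ordinary data."""
    return data.replace(bytes([IAC]), bytes([IAC, IAC]))

def split_stream(stream: bytes):
    """Separate ordinary data from one-octet commands in a received stream."""
    data = bytearray()
    commands = []
    i = 0
    while i < len(stream):
        octet = stream[i]
        if octet != IAC:
            data.append(octet)
            i += 1
        elif i + 1 < len(stream) and stream[i + 1] == IAC:
            data.append(IAC)             # doubled: it was really data
            i += 2
        elif i + 1 < len(stream):
            commands.append(stream[i + 1])
            i += 2
        else:
            break                        # lone trailing special character
    return bytes(data), commands

stream = escape_data(b"a\xffb") + bytes([IAC, 241]) + b"c"
received_data, received_commands = split_stream(stream)
```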

We are not going to describe the application protocols in detail in
this document. It's better to read the RFC's yourself. However there
are a couple of common conventions used by applications that will be
described here. First, the common network representation: TCP/IP is
intended to be usable on any computer. Unfortunately, not all
computers agree on how data is represented. There are differences in
character codes (ASCII vs. EBCDIC), in end of line conventions
(carriage return, line feed, or a representation using counts), and in
whether terminals expect characters to be sent individually or a line
at a time. In order to allow computers of different kinds to
communicate, each applications protocol defines a standard
representation. Note that TCP and IP do not care about the
representation. TCP simply sends octets. However the programs at
both ends have to agree on how the octets are to be interpreted. The
RFC for each application specifies the standard representation for
that application. Normally it is "net ASCII". This uses ASCII
characters, with end of line denoted by a carriage return followed by
a line feed. For remote login, there is also a definition of a
"standard terminal", which turns out to be a half-duplex terminal with
echoing happening on the local machine. Most applications also make
provisions for the two computers to agree on other representations
that they may find more convenient. For example, PDP-10's have 36-bit
words. There is a way that two PDP-10's can agree to send a 36-bit
binary file. Similarly, two systems that prefer full-duplex terminal
conversations can agree on that. However each application has a
standard representation, which every machine must support.
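
Converting to and from net ASCII is simple enough to sketch in a few
lines. The function names are invented, and the sketch assumes the
local system marks end of line with a single line feed unless told
otherwise.

```python
def to_net_ascii(text: str) -> bytes:
    """Encode text as ASCII with every line ended by carriage return, line feed."""
    return "".join(line + "\r\n" for line in text.splitlines()).encode("ascii")

def from_net_ascii(data: bytes, newline: str = "\n") -> str:
    """Decode net ASCII and switch to the local end of line convention."""
    return data.decode("ascii").replace("\r\n", newline)

wire = to_net_ascii("Subject: meeting\n\nLet's get together Monday at 1pm.\n")
local = from_net_ascii(wire)
```

Whatever the sending machine uses internally, what goes over the
connection is the same sequence of octets, so the receiver only has to
know one representation.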



3.1 An example application: SMTP


In order to give a bit better idea what is involved in the application
protocols, I'm going to show an example of SMTP, which is the mail
protocol. (SMTP is "simple mail transfer protocol".) We assume that a
computer called TOPAZ.RUTGERS.EDU wants to send the following message.

Date: Sat, 27 Jun 87 13:26:31 EDT
From: hedrick@topaz.rutgers.edu
To: levy@red.rutgers.edu
Subject: meeting

Let's get together Monday at 1pm.

First, note that the format of the message itself is described by an
Internet standard (RFC 822). The standard specifies the fact that the
message must be transmitted as net ASCII (i.e. it must be ASCII, with
carriage return/linefeed to delimit lines). It also describes the
general structure, as a group of header lines, then a blank line, and
then the body of the message. Finally, it describes the syntax of the
header lines in detail. Generally they consist of a keyword and then
a value.
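
The general structure is easy to pick apart in code. The sketch below
is a bare-bones illustration, not a complete parser: it assumes the
message is already in net ASCII, it keeps only the last value if a
keyword appears twice, and it does not handle header lines that
continue onto a second line. The function name is invented.

```python
def parse_message(raw: str):
    """Split a message into a dictionary of header lines and the body."""
    head, _, body = raw.partition("\r\n\r\n")     # the blank line divides them
    headers = {}
    for line in head.split("\r\n"):
        keyword, _, value = line.partition(":")   # split at the first colon only
        headers[keyword.strip()] = value.strip()
    return headers, body

raw_message = ("Date: Sat, 27 Jun 87 13:26:31 EDT\r\n"
               "From: hedrick@topaz.rutgers.edu\r\n"
               "To: levy@red.rutgers.edu\r\n"
               "Subject: meeting\r\n"
               "\r\n"
               "Let's get together Monday at 1pm.\r\n")
headers, body = parse_message(raw_message)
```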

Note that the addressee is indicated as LEVY@RED.RUTGERS.EDU.
Initially, addresses were simply "person at machine". However recent
standards have made things more flexible. There are now provisions
for systems to handle other systems' mail. This can allow automatic
forwarding on behalf of computers not connected to the Internet. It
can be used to direct mail for a number of systems to one central mail
server. Indeed there is no requirement that an actual computer by the
name of RED.RUTGERS.EDU even exist. The name servers could be set up
so that you mail to department names, and each department's mail is
routed automatically to an appropriate computer. It is also possible
that the part before the @ is something other than a user name. It is
possible for programs to be set up to process mail. There are also
provisions to handle mailing lists, and generic names such as
"postmaster" or "operator".

The way the message is to be sent to another system is described by
RFC's 821 and 974. The program that is going to be doing the sending
makes several queries to the name server to determine where to route
the message. The first query is to find out which machines handle mail
for the name RED.RUTGERS.EDU. In this case, the server replies that
RED.RUTGERS.EDU handles its own mail. The program then asks for the
address of RED.RUTGERS.EDU, which is 128.6.4.2. Then the mail program
opens a TCP connection to port 25 on 128.6.4.2. Port 25 is the
well-known socket used for receiving mail. Once this connection is
established, the mail program starts sending commands. Here is a
typical conversation. Each line is labelled as to whether it is from
TOPAZ or RED. Note that TOPAZ initiated the connection:

RED    220 RED.RUTGERS.EDU SMTP Service at 29 Jun 87 05:17:18 EDT
TOPAZ  HELO topaz.rutgers.edu
RED    250 RED.RUTGERS.EDU - Hello, TOPAZ.RUTGERS.EDU
TOPAZ  MAIL From:<hedrick@topaz.rutgers.edu>
RED    250 MAIL accepted
TOPAZ  RCPT To:<levy@red.rutgers.edu>
RED    250 Recipient accepted
TOPAZ  DATA
RED    354 Start mail input; end with <CRLF>.<CRLF>
TOPAZ  Date: Sat, 27 Jun 87 13:26:31 EDT
TOPAZ  From: hedrick@topaz.rutgers.edu
TOPAZ  To: levy@red.rutgers.edu
TOPAZ  Subject: meeting
TOPAZ
TOPAZ  Let's get together Monday at 1pm.
TOPAZ  .
RED    250 OK
TOPAZ  QUIT
RED    221 RED.RUTGERS.EDU Service closing transmission channel
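
The sending side of such a conversation can be sketched as a function
that produces the lines in order. This is a deliberately incomplete
illustration: a real mail program reads the reply after every command
and stops if the server refuses, which this sketch does not do, and the
function name is invented. (Python's standard library has an smtplib
module that handles the full exchange.)

```python
def client_commands(helo_name, sender, recipients, message_lines):
    """Return the lines the sending program transmits, in order."""
    lines = ["HELO " + helo_name, "MAIL From:<%s>" % sender]
    lines += ["RCPT To:<%s>" % recipient for recipient in recipients]
    lines.append("DATA")
    lines += message_lines           # header lines, a blank line, then the body
    lines.append(".")                # a dot alone on a line ends the message
    lines.append("QUIT")
    return lines

sent = client_commands(
    "topaz.rutgers.edu",
    "hedrick@topaz.rutgers.edu",
    ["levy@red.rutgers.edu"],
    ["Subject: meeting", "", "Let's get together Monday at 1pm."],
)
```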

First, note that commands all use normal text. This is typical of the
Internet standards. Many of the protocols use standard ASCII
commands. This makes it easy to watch what is going on and to
diagnose problems. For example, the mail program keeps a log of each
conversation. If something goes wrong, the log file can simply be
mailed to the postmaster. Since it is normal text, he can see what
was going on. It also allows a human to interact directly with the
mail server, for testing. (Some newer protocols are complex enough
that this is not practical. The commands would have to have a syntax
that would require a significant parser. Thus there is a tendency for
newer protocols to use binary formats. Generally they are structured
like C or Pascal record structures.) Second, note that the responses
all begin with numbers. This is also typical of Internet protocols.
The allowable respons