Linux SecAdmin - Look Who's Knocking
Written By Gary Freeman - a.k.a. Aciscorouter
Tuesday February 7 2006 @ 17:35

DISCLAIMER: This is a tutorial of sorts that takes you through a day-to-day problem and solution that I am often faced with in my Security Planning / Operations role for a large Telecommunications company. I make no assumptions as to where on the learning curve readers will be situated, and I don't even guarantee this will be a good read. In fact, given my exposure to and expertise with the tools used in this article, I may be missing the plot, and some may find an easier, softer way of doing what I was tasked to do. Having said all of this: for those I've confused, sorry, I tried to provide links for further reading. For those I've disgusted with my simplicity or seemingly Lamer approach, well, like you, I'm always learning and I'm open to criticism and advice. Finally, for those of you who found this article helpful in any way, even if it was only one person, I've succeeded in what I intended to do - PASS THE KNOWLEDGE ON :-)

Introduction
Why is it that when you Google for something you absolutely need, you can never find it? Well, case in point: I had a Squid proxy server left over from a decommissioning project that was still seeing tons of traffic when it shouldn't have been seeing any! The Linux server was locked down using sudo and no one knew the root password, so we had very few choices as to what programs we could run to view activity. The server was flaky and Netstat would never finish outputting the current activity. So the server folks approached me and asked if there was any way to find out which unique internal IP addresses were connecting to the five pre-configured proxy ports (8080, 8082, 8084, 8086, 8888).

As it turns out, the Squid admin user had access to the Tcpdump application and could run it against eth0. I got him to run Tcpdump and output two hours' worth of activity to a dump file during the lunch-hour web traffic spike. This produced a 470 MB text file that I had to SFTP from his server to my Linux box.

Alrighty then! What do I do with a honkin' text file that repeats the same info endlessly? We have employees and internal servers hitting the proxy ports, the proxy itself establishing connections to the web, the foreign sites replying to the proxy and then, finally, the proxy returning the data to the corporate host. A single conversation from an internal host connecting to the homepage of their favorite security tutorial site shows up as four separate HTTP flows in the dump, as illustrated below. I needed to strip out extraneous information and narrow the million-plus lines of data down to something sensible. So I started thinking about the commands that would be required, so that eventually I could write a shell script.
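To picture why one click multiplies like that, here's an illustration (the addresses and ports below are invented for the example) of the four flows a single page fetch leaves behind:
Code:
10.xx.xx.22.3663   > proxyserver.8888      (employee asks the proxy for the page)
proxyserver.41234  > www.example.com.80    (proxy fetches the page from the web)
www.example.com.80 > proxyserver.41234     (foreign site replies to the proxy)
proxyserver.8888   > 10.xx.xx.22.3663      (proxy hands the data back to the host)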

TCPDUMP
See man tcpdump for more information about the tool.
The first thing that had to be done was to capture the traffic to the server with the only tool I had available and so I typed:
Code:
tcpdump -i eth0 > tcpdumpfile.txt
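In hindsight, tcpdump's built-in capture filters could have trimmed the file down before it ever hit the disk, and the -n and -nn flags (which IKnowNot recommends, and which I come back to later) stop tcpdump from resolving addresses and ports to names. Something along these lines should do it, though I didn't think of it at the time:
Code:
tcpdump -n -nn -i eth0 'tcp port 8080 or tcp port 8082 or tcp port 8084 or tcp port 8086 or tcp port 8888' > tcpdumpfile.txt
Anyway, back to what actually happened.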
I let this run through our busiest web surfing period of the day, 11 AM until 1 PM. I stopped the command by typing:
Code:
ps ax | grep tcpdump    # to get the PID
kill <PID>
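(On systems that have it, pkill collapses those two steps into one:)
Code:
pkill tcpdump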
I then pulled the 470 MB text file to my local Linux box using SFTP and placed it into my user directory to manipulate.

GREP
See man grep for more information about the tool.
The next thing that had to be done was to blast through the huge 470 MB file, match the port numbers and output the matches to separate text files. This would make handling the data much easier, since each file would contain only the hits for one listening proxy port. Grep searches for lines matching a regular expression and displays the lines that match the specified pattern. Here's what the syntax should look like to get what I needed:
Code:
grep '\.8080' tcpdumpfile.txt > 8080-1.txt
grep '\.8082' tcpdumpfile.txt > 8082-1.txt
grep '\.8084' tcpdumpfile.txt > 8084-1.txt
grep '\.8086' tcpdumpfile.txt > 8086-1.txt
grep '\.8888' tcpdumpfile.txt > 8888-1.txt
As the code above illustrates, I was instructing Grep to match each port-number pattern and output the results of each search to a new file named after the port. The escaped period (\.) placed before the port number is insurance that Grep doesn't match those four digits inside the TCP sequence numbers that are usually present in the TCPdump output; note the backslash, since an unescaped dot is a regex wildcard that matches any character. So now I had five files, each containing only the sessions whose source or destination port matched one of the five listening proxy ports. The output of these files looked something like this:
Code:
tcpdumpfile.txt:11:18:07.024094 proxyserver.8888 > 10.xx.xx.22.3663: F 3250551350:3250551350(0) ack 271471715 win 6644 (DF)
tcpdumpfile.txt:11:18:07.024327 10.xx.xx.22.3663 > proxyserver.8888: . ack 1 win 63957 (DF)
tcpdumpfile.txt:11:18:10.296278 10.xx.xx.11.2466 > proxyserver.8888: P 1447232284:1447232603(319) ack 3955847267 win 63438 (DF)
tcpdumpfile.txt:11:18:10.296291 proxyserver.8888 > 10.xx.xx.11.2466: . ack 319 win 51210 (DF)
Great, looks somewhat readable, doesn't it? Well, the text files produced are still 121,000 lines long and about 15 MB. So the small excerpt you see above repeats itself many times over, and there is still plenty of information I don't care about.
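By the way, since those five Grep runs differ only in the port number, a little shell loop will do the same job in one line; a quick sketch using the same file-naming scheme as above:
Code:
for p in 8080 8082 8084 8086 8888; do grep "\.$p" tcpdumpfile.txt > "$p-1.txt"; done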

AWK
See man awk for more information about the tool.
Now, to massage the data! We use Awk to pull out only the source-address column of the TCPdump output, because now that we have individual files per port we no longer care about the port numbers, the destination addresses or any of the other extraneous data. Since Awk uses whitespace as its default field separator, we can identify the source address as the second column and instruct Awk to extract only that column and write it to a new file:
Code:
awk '/.*/ {print $2}' 8080-1.txt | sort -u > 8080-2.txt
awk '/.*/ {print $2}' 8082-1.txt | sort -u > 8082-2.txt
awk '/.*/ {print $2}' 8084-1.txt | sort -u > 8084-2.txt
awk '/.*/ {print $2}' 8086-1.txt | sort -u > 8086-2.txt
awk '/.*/ {print $2}' 8888-1.txt | sort -u > 8888-2.txt
With the above commands we are instructing Awk to match every line (the /.*/ wildcard pattern; strictly speaking it could be omitted, since an action with no pattern applies to every line) and print only column two, sorted with duplicate lines removed, which we redirect to another text file named for the port number (e.g. 8080) and the instance (e.g. -2). The output of the new text file looks something like this:
Code:
10.xx.16.12.4081
10.xx.16.5.3573
10.xx.19.180.1029
10.xx.19.180.1030
10.xx.19.180.1031
10.xx.19.180.1032
10.xx.19.180.1033
10.xx.19.180.1034
10.xx.19.180.1035
10.xx.19.180.1036
10.xx.19.180.1042
10.xx.19.180.1046
10.xx.19.180.1047
Notice that the file is sorted and exact duplicate lines are gone (the xx in the second octet is just me scrambling my addresses), but since there is still an ephemeral port number attached to the end of each IP address, the same address still appears many times over. We need to remove the port numbers before we can sort the addresses and remove the duplicates.
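(As an aside, Awk could lop the port off at the same time with its sub() function, taking care of the Cut and Sort steps below in one go, but I'll stick to one tool per job so each step stays easy to follow. For the curious, a sketch:)
Code:
awk '{addr = $2; sub(/\.[0-9]+$/, "", addr); print addr}' 8080-1.txt | sort -u > 8080.txt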

CUT
See man cut for more information about the tool.
So now we are getting down to the nitty-gritty and have to use the Cut command to strip away everything but the IP addresses, outputting the results to, yet, another file. There's no need to pipe the file through Cat; Cut can read it directly, splitting each line on a delimiter of our choosing:
Code:
cut -d. -f1-4 8080-2.txt > 8080-3.txt
cut -d. -f1-4 8082-2.txt > 8082-3.txt
cut -d. -f1-4 8084-2.txt > 8084-3.txt
cut -d. -f1-4 8086-2.txt > 8086-3.txt
cut -d. -f1-4 8888-2.txt > 8888-3.txt
OK, basically I defined the "." character as the delimiter and told Cut to output only fields 1 through 4 (the four octets of the IP address). This can be a problem if you have DNS names in your output, as I did; those entries were so few that I just deleted them manually. In future, as IKnowNot pointed out, tcpdump should be run with the -n and -nn options so that host IPs and port numbers aren't resolved to hostnames and port names. The output now looks like this, and it's obvious that we have distinct duplicates that can be sorted out:
Code:
10.xx.16.12
10.xx.16.5
10.xx.19.180
10.xx.19.180
10.xx.19.180
10.xx.19.180
10.xx.19.180
10.xx.19.180
10.xx.19.180
10.xx.19.180
10.xx.19.180
10.xx.19.180
10.xx.19.180

SORT
See man sort for more information about the tool.
So, finally, we sort each of the files and create a final output file with a list of all the unique IP addresses that have knocked on our server at the specified port. Remember, "-u" (or "--unique") makes Sort output only the first instance of each address it sees; the duplicates are dropped:
Code:
sort -u 8080-3.txt > 8080.txt
sort -u 8082-3.txt > 8082.txt
sort -u 8084-3.txt > 8084.txt
sort -u 8086-3.txt > 8086.txt
sort -u 8888-3.txt > 8888.txt
Now, finally, we have the results for each port, in a file named after the port we were querying.
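And since the whole point of the exercise was to see who's knocking, a quick line count shows how many unique hosts hit each port:
Code:
wc -l 8080.txt 8082.txt 8084.txt 8086.txt 8888.txt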

Wha-hoo! But man, wouldn't it have been easier just to script something from the start and save the effort? Yes, and now that we know what we are doing and how we are doing it, we can create a shell script in Korn or Bash to tie everything together for us. Open up your favorite Linux editor and cut and paste the code below:
Code:
#!/bin/ksh
#
# whoisknocking.sh
# (c)2006 Gary Freeman
#
# This script searches for unique IP addresses
# connecting to the user-specified port in a
# tcpdump log file.
#
# CHECK THE TOOL USAGE FIRST, BEFORE WE DELETE OR CREATE ANY FILES
#
if [ -z "$1" ] || [ -z "$2" ] ;
  then
  echo " "
  echo " usage: whoisknocking.sh <IP Port> <InputFile>"
  echo " "
  exit 1
  fi;
#
# LIST THE VARIABLES
#
filename1=$1-1.txt
filename2=$1-2.txt
filename3=$1-3.txt
filename4=$1.txt
#
# CLEAN UP ANY TRACES OF OLD TEXT FILES
#
rm -f "$filename1" "$filename2" "$filename3" "$filename4"
#
# GREP FOR THE PORT (WITH THE ESCAPED DOT) IN THE USER-SPECIFIED FILE
#
grep "\.$1" "$2" > "$filename1"
#
# STRIP EVERYTHING BUT COLUMN TWO (THE SOURCE ADDRESS)
#
awk '{print $2}' "$filename1" | sort -u > "$filename2"
#
# REMOVE THE PORT NUMBERS FROM THE OUTPUT
#
cut -d. -f1-4 "$filename2" > "$filename3"
#
# SORT THE RESULTS AND SHOW ONLY UNIQUE IP ADDRESSES
#
sort -u "$filename3" > "$filename4"
#
# FINALLY, CLEAN UP THE TEMP FILES
#
rm -f "$filename1" "$filename2" "$filename3"
Now save the file as whoisknocking.sh (or whatever you like) and make sure the file is executable by running the following command:
Code:
chmod +x whoisknocking.sh
Now run the command using the following syntax and press enter:
Code:
./whoisknocking.sh <server port> <full path to tcpdump file>
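And to cover all five proxy ports in one shot, wrap the script in a loop (shown here against the tcpdumpfile.txt from earlier):
Code:
for p in 8080 8082 8084 8086 8888; do ./whoisknocking.sh "$p" tcpdumpfile.txt; done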
Well, that concludes our broadcast :-) Hope you enjoyed the show. Stay tuned for future broadcasts. - GF