Dear all,

I'm currently looking into writing a program that uses HTTP fingerprinting to identify the vendor and version of an HTTP server.

You might say this isn't very difficult, since the server sends a "Server:" header telling you exactly what it is. In fact, many sites forge this header: there are commercially available add-ons for popular web servers which remove it (or substitute a forged one).

Similar tools exist for telnet and SMTP, and nmap and queso do the same thing at a lower level with their TCP fingerprinting.

Why, then, do it at the HTTP level?

Sometimes it's impossible to tell from the TCP signature or anything else what web server someone is running. Yet I believe that with a small number of requests (say, 3 or 4) I can get enough information to distinguish any of the top web servers easily, even if the admin tries as hard as possible to hide it.

An initial implementation seems easy in Perl. I have examined headers from several web servers, and there are a number of distinguishing features which will help enormously. These are:
- HTTP version supported
- Order of HTTP headers
- Presence of particular HTTP headers
- Format of "ETag" header
- Wording of the status message: "Not Found", "Object Not Found", "Not found", etc. (note this cannot be changed by the admin, although the HTML document returned can be)
- Reply to malformed requests
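
To make the list above concrete, here is a rough sketch of how several of those features might be pulled out of one raw response. It is in Python rather than Perl purely for illustration, and everything in it is an assumption of mine: the function name extract_features, the sample response, and the idea of reducing an ETag to a "shape" by collapsing runs of hex digits into "H". It does no network I/O and no matching against known servers; it only shows what could be recorded per probe.

```python
import re

# Hypothetical sample response, made up for illustration only.
SAMPLE = (
    b"HTTP/1.1 404 Object Not Found\r\n"
    b"Server: Hidden\r\n"
    b"Date: Mon, 01 Jan 2001 00:00:00 GMT\r\n"
    b"Content-Type: text/html\r\n"
    b'ETag: "1a2b-3c-4d5e6f"\r\n'
    b"\r\n"
    b"<html>not here</html>"
)


def extract_features(raw: bytes) -> dict:
    """Collect fingerprint-relevant features from one raw HTTP response."""
    # Keep only the header section; the body is ignored here.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")

    # Status line, e.g. "HTTP/1.1 404 Not Found".
    parts = lines[0].split(" ", 2)
    version = parts[0]
    status = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else None
    reason = parts[2] if len(parts) > 2 else ""

    header_order = []   # header names in the order the server sent them
    values = {}         # lower-cased name -> value (last occurrence wins)
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            continue    # skip anything that is not "Name: value"
        header_order.append(name)
        values[name.lower()] = value.strip()

    # Reduce the ETag to its layout: each run of hex digits becomes "H".
    etag = values.get("etag")
    etag_shape = re.sub(r"[0-9a-fA-F]+", "H", etag) if etag is not None else None

    return {
        "version": version,
        "status": status,
        "reason": reason,
        "header_order": header_order,
        "etag_shape": etag_shape,
    }


if __name__ == "__main__":
    for key, value in extract_features(SAMPLE).items():
        print(key, "=", value)
```

The thought is that the feature sets from the 3 or 4 probes (including a malformed request) would be compared against a table of known servers, but how best to do that comparison is still open.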

Your comments please.