August 2nd, 2005, 12:49 AM
Advanced Web Based Honeypot Techniques
GHDB operated by johnny.ihackstuff.com
The GHH project develops web based honeypots designed to lure "Google Hackers" using malicious search engine tactics, along with tools and documentation to allow others to develop customized honeypots, decreasing the exposure of vulnerable applications in the Google index.
This tutorial expands upon extension spoofing and transparent linking, and shows how to apply them in the creation of customized web based honeypots. The v1.1 honeypots and documentation released by GHH will be used as a reference for this tutorial.
Spoofed file extensions
While browsing through the Google Hacking Database (GHDB), you should notice that not all of the signatures target server side scripts (.php for example). This hack, for example:
That hack searches for files with the .txt extension. The contents of these files are usually interesting, and their exposure could introduce a vulnerability on the server that hosts them. In cases like these, the risk introduced to the environment is usually greater than that of a typical web application vulnerability.
Or perhaps these:
Depending on their contents, a database file such as this could cause extreme losses. To emulate file types like these, GHH relies on Apache .htaccess files to spoof the honeypot's file extension. We can then take advantage of server side scripting to log and handle the attack any way we want; if we're using GHH as the engine, that means logging remotely and applying signatures to the attack.
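To make the "log and handle the attack" step concrete, here is a minimal sketch of local logging from a spoofed-extension honeypot. It is not GHH's own remote logging or signature engine. The function name honeypot_log_line, the field order, and the honeypot.log filename are all assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of GHH): build one tab-separated log line
// from the request data PHP exposes in $_SERVER.
function honeypot_log_line(array $server, int $time): string
{
    $fields = [
        gmdate('Y-m-d\TH:i:s\Z', $time),   // UTC timestamp of the hit
        $server['REMOTE_ADDR'] ?? '-',     // visitor IP address
        $server['REQUEST_URI'] ?? '-',     // which honeypot file was requested
        $server['HTTP_REFERER'] ?? '-',    // where the visitor came from
        $server['HTTP_USER_AGENT'] ?? '-', // browser or crawler string
    ];
    // These values are attacker-controlled, so collapse tabs and newlines
    // to keep one request from forging extra log entries.
    $clean = array_map(
        fn($v) => preg_replace('/[\t\r\n]+/', ' ', (string) $v),
        $fields
    );
    return implode("\t", $clean) . "\n";
}

// Example use at the top of the honeypot script (assumed log location):
// file_put_contents(__DIR__ . '/honeypot.log',
//     honeypot_log_line($_SERVER, time()), FILE_APPEND | LOCK_EX);
```

Keeping the formatting in a pure function and the file write in a single call makes it easy to swap the write for a remote submission later.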
Following the previous tutorial on GHH v1.0 (which should still be compatible), we can leverage .htaccess and Apache to let our honeypot spoof another file extension. Place an .htaccess file in the same directory as the honeypot with the following lines:
#Change .mdb to your filetype
AddHandler application/x-httpd-php .mdb
AddType application/x-httpd-php .mdb
Apache and PHP will now interpret the .mdb file (or whichever extension you chose) as a PHP script. The only problem is that browsers won't behave normally when viewing some extensions (.mdb and .txt, for example). To handle this, we can place the following PHP code at the beginning of our honeypot:
<?php
header('Content-Type: text/plain'); //This line must change
//Rest of code...
This tells the browser to handle the file as a certain type of content. The text/plain type shown here is acceptable for a .sql, .txt, .log, or .dat file, or something similar. When the content reaches the attacker, the browser will behave as it should (we have already captured them, but it's best not to tip them off anyhow). A database file, for example, should open in Access, which requires sending 'Content-Type: application/msaccess' to the browser.
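If one honeypot script is reused under several spoofed extensions, the Content-Type can be chosen from the requested filename instead of being edited by hand. The following is only a sketch: the function name honeypot_content_type and the mapping table are assumptions, and the table only covers the extensions and types mentioned in this tutorial.

```php
<?php
// Hypothetical helper: pick the Content-Type to send for a spoofed extension.
function honeypot_content_type(string $path): string
{
    // Only the types discussed in this tutorial; extend as needed.
    $types = [
        'txt' => 'text/plain',
        'log' => 'text/plain',
        'sql' => 'text/plain',
        'dat' => 'text/plain',
        'mdb' => 'application/msaccess',
    ];
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    // Fall back to plain text for anything not listed.
    return $types[$ext] ?? 'text/plain';
}

// header() only works before any output has been sent to the browser.
if (!headers_sent()) {
    header('Content-Type: ' . honeypot_content_type($_SERVER['SCRIPT_NAME'] ?? ''));
}
//Rest of code...
```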
Content types available @:
Transparent linking
Transparent linking is the process of advertising your honeypot to search engines, but not to the casual users of your website. There are a few ways to do this, some better than others. The better your transparent link, the fewer false positives you'll have in your logs. The goal is for visitors to reach your honeypot from a search engine, and not from the site it's hosted on. This forces them to find the honeypot through the engine, and by that vector you can retrieve the search query they used against your site (intention and motive!)
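Retrieving the search query from a referred visit can be sketched as follows. This assumes the search engine passes the query in a q parameter of the referring URL and that the browser sends a Referer header at all. The function name search_query_from_referer and the example query are illustrative only.

```php
<?php
// Hypothetical helper: return the search terms from a Google referrer,
// or null when the visit did not come from a Google results page.
function search_query_from_referer(string $referer): ?string
{
    $host  = parse_url($referer, PHP_URL_HOST);
    $query = parse_url($referer, PHP_URL_QUERY);
    if (!is_string($host) || !is_string($query)) {
        return null; // empty, malformed, or query-less referrer
    }
    if (stripos($host, 'google.') === false) {
        return null; // referred from somewhere other than Google
    }
    parse_str($query, $params); // also URL-decodes the values
    return (isset($params['q']) && is_string($params['q'])) ? $params['q'] : null;
}

// Example use inside the honeypot:
// $searched = search_query_from_referer($_SERVER['HTTP_REFERER'] ?? '');
```

A null result is itself useful: a hit with no search referrer is more likely a false positive from your own site or a direct visit.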
Simply making an obvious hyperlink with some text in your top level website:
Obvious problems include users clicking on the link and filling your logs with false positives. Don't use this type of link.
The following CSS style will make the link the same color as your background. You should change black to match your background.
Then apply your style to the link.
<a href="http://yourwebsite.com/honeypot.php" class="camo">.</a>
This has its problems as well. It's cumbersome, because you might not know what the background behind the link will be. That makes a literally transparent link desirable; however, I haven't found any option other than the CSS Alpha() function, which doesn't seem to work well with text.
The following CSS will prevent the link from being shown to the user at all, as long as their browser renders CSS.
The link is now completely invisible, except in the source. The thought was that complete invisibility would be the best option; however, the GHH project learned the hard way that Google completely ignores display:none links because the style can be abused. Contrary to what seems to be the popular belief, Google does not index links with a CSS style of display:none (such a smart spider!). They will, however, be indexed by less powerful crawlers.
In order to leverage a disappearing link, you'll need to plug in some PHP to detect when the Googlebot comes around (You have to cater to Googlebot :))
This is also a pain, but it does the job. Other spiders aren't as smart as Googlebot and freely crawl links with the display:none style, so this technique will completely hide the link from casual visitors and still let it be discovered by Google.
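One way to sketch the Googlebot check is shown here. It matches on the User-Agent string alone, which can be spoofed, and it is not necessarily the code GHH ships. The function names is_googlebot and honeypot_link are assumptions made for illustration.

```php
<?php
// Hypothetical helper: treat any User-Agent containing "Googlebot" as Google's crawler.
function is_googlebot(string $userAgent): bool
{
    return stripos($userAgent, 'Googlebot') !== false;
}

// Hypothetical helper: emit the honeypot link, hidden with display:none
// for everyone except Googlebot.
function honeypot_link(string $url, string $userAgent): string
{
    $href = htmlspecialchars($url, ENT_QUOTES);
    if (is_googlebot($userAgent)) {
        // Googlebot skips display:none links, so give it a plain link.
        return '<a href="' . $href . '">.</a>';
    }
    // Casual visitors and other crawlers get the hidden version.
    return '<a href="' . $href . '" style="display:none">.</a>';
}

// Example use in the page that advertises the honeypot:
echo honeypot_link('http://yourwebsite.com/honeypot.php', $_SERVER['HTTP_USER_AGENT'] ?? '');
```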
The use of image maps can be a quick way to link multiple honeypots. Create nearly untouchable links in an image.
<area shape="rect" coords="0,0,0,0" href="http://yourdomain.com/honeypot">
Buddy linking is as simple as having other domains link to your honeypot. When they are crawled, spiders will hopefully follow the links to your site. Casual users of your site are not likely to cause false positives; however, users of your buddy's site may, so it's a good idea to stick to the tactics described here.
TELL the search engine where you are, and forget about linking. Most engines have a suggest feature; Google has Sitemaps. If you don't feel like using the Python tool or writing XML, there's the option to submit a text file with URLs separated by CRLFs. Check it out here:
The nature of GHH is to be known but not seen, which is why working with GHH is challenging. The concepts of Google Hacking and honeypots are simple; however, the design of the web and the design of a honeypot in tandem present the challenge of "hiding in plain sight" on the web. GHH is developed under that concept, which is useful in the creation of new tools related to the relevant attacks.
Benefits of GHH include very early warning of a potential attack, by catching attackers in their reconnaissance phase and learning their possible motives. GHH also improves other vulnerable targets' chances of survival on the web. By saturating a search engine index with specific false positives, it turns what was once a foolproof vector into a less reliable source of victims. So in short, it also benefits others.
I attached the installation flowchart that was just released in v1.1 of the GHH package since it's kind of handy to have a visual nearby. Comments encouraged and appreciated as always.