Web content filtering - Quick & Dirty

Before continuing, I am going to make the following assumptions:
1. You are using a Linux gateway machine. Any distro will do; I make no distinctions here: SUSE, Mandrake, Slackware, Gentoo, whatever floats your boat.
2. You are comfortable installing software on Linux.
3. You can use the command line.

If you are not comfortable with any of these, don't panic; a little study and you will be up to speed in no time.

This little tutorial will show you how to:
1. Block annoying web content
2. Block unwanted websites
3. Control what users on your network can access

Tools required
1. Gshield - http://muse.linuxmafia.org/gshield/ - nice little firewall script - makes transparent proxy set up a breeze
2. Squid - http://www.squid-cache.org - proxy server
3. Dansguardian - http://www.dansguardian.org - nice content filtering proxy that sits between squid and the end user.
4. Adzap - http://adzapper.sourceforge.net - great little squid redirector for removing those pesky ads from your display.

Basically, the setup we are chasing here is:

INTERNET >>>>> SQUID / ADZAP >>>>>> DANSGUARDIAN >>>>> END USERS

Ok, go ahead and install these on your Linux gateway machine.

Finished ... ok, let's proceed

SETTING UP GSHIELD

All going well, you should have gshield installed and have a file something like /etc/firewall/gShield.conf. Open it (I like nano, but your editor of choice will do):

nano -w /etc/firewall/gShield.conf

Look, it's well commented. Go ahead and make the necessary changes for your environment, but the important part for this tutorial is this section:

# ------------------------------------------- #
# Transparent Proxy #
# --------------------------------------------#

# If you wish to ensure web traffic is pumped thru
# a proxy regardless of the client configuration,
# set ENABLE_TRANSPROXY to "YES" and fill out the rest.

Set it to YES (that's a no-brainer), then enter the IP of your gateway and the port that squid is running on. The default is usually 3128, but I suggest changing it to some obscure number, for reasons I will explain later. Save the file.
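For the curious, a transparent proxy option like this generally boils down to a NAT redirect rule on the gateway. The sketch below shows the general iptables technique only; it is not gShield's actual rule set, and the interface name eth1 and port 3128 are assumptions, so substitute your own. The function just prints the command so you can look it over before running anything as root.

```shell
#!/bin/sh
# Print (not run) an iptables rule that redirects web traffic arriving
# on the LAN interface to a local proxy port.
# Assumption: gShield builds its own rules from gShield.conf, so this
# is for illustration only.
print_redirect_rule() {
    lan_if="$1"      # hypothetical LAN-side interface, e.g. eth1
    proxy_port="$2"  # the port squid listens on
    echo "iptables -t nat -A PREROUTING -i $lan_if -p tcp --dport 80 -j REDIRECT --to-ports $proxy_port"
}

print_redirect_rule eth1 3128
```

Printing the rule rather than applying it keeps the sketch harmless on a box where gShield is already managing the firewall.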

SETTING UP AD ZAPPING

Some like ads, some don't. I don't. This is an optional step, but I threw it in anyway.

Ok, onto ad zapping. Installed adzap? Of course you have. So go in and look for a script called wrapzap. On my machine it's located in /etc/adzapper, but yours could be in /usr/bin/local/adzap or wherever you installed it to; 'locate wrapzap' will set you right.

Open the wrapzap script and have a look. What I changed in this script was the line:

ZAP_MODE="CLEAR" # or "CLEAR"

You have to set CLEAR here or you will get those ugly black and yellow "This Ad has been zapped" placeholders all over the page, which are worse than the ads, I reckon. Now I get nice clear white space, or the ads just disappear behind a 1x1 pixel gif.

I also added a zap folder to my local apache server and pointed ZAP_BASE at it. It speeds things up, but it's a personal choice and not a necessity. If you do use your apache server for this and you get a "page cannot be found" where the ad should be, take careful note of the file the ad spot was trying to pull, create a 1x1 pixel gif with that file name, and drop it in the zap folder on your server. All should be well then.
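If you don't have an image editor handy on the gateway, a 1x1 transparent gif is small enough to write out byte by byte. This is just a sketch; the function name and the example path in the comment are made up for illustration, so use the file name your ad spot was actually asking for.

```shell
#!/bin/sh
# Write a 43-byte transparent 1x1 GIF to the path given as $1.
# The bytes are: GIF89a header, 1x1 screen, a two-colour palette,
# a graphic control block marking colour 0 transparent, and one pixel.
make_blank_gif() {
    printf 'GIF89a\001\000\001\000\200\000\000\000\000\000\377\377\377!\371\004\001\000\000\000\000,\000\000\000\000\001\000\001\000\000\002\002D\001\000;' > "$1"
}

# Example (hypothetical path and file name):
#   make_blank_gif /var/www/html/zap/missing-ad.gif
```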

Adzap also comes with a nice little updating script. Run it once in a while to grab the latest ad servers; it updates the squid-redirect file in the adzap/scripts folder.

SETTING UP SQUID

I will assume you have squid set up with all the correct ACLs so you can access the internet through it. If not, visit the squid website and read up on it.

I will make it easy. Transparent proxy setup can be a bit tricky, because certain portions of the squid.conf file need changing. Just copy the following, paste it at the end of the /etc/squid/squid.conf file, and make the necessary changes to suit your setup. (The httpd_accel_* directives apply to Squid 2.5 and earlier.)

# user and group squid runs as (use whatever you have for squid)
cache_effective_user squid
cache_effective_group squid
# edit to your local machine name/domain
visible_hostname yourbox.com
# path to wrapzap, wherever it lives on your machine
redirect_program /etc/adzapper/wrapzap
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
httpd_accel_single_host on
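# replace the placeholder with your squid port (same as in gShield.conf)
http_port ENTER_SQUID_PORT_HERE
# https_port only works on an SSL-enabled squid and needs its own port and certificate;
# most setups can leave it commented out
# https_port ENTER_SQUID_HTTPS_PORT_HERE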

With transparent proxying, all computers on your network have to come through squid to access the internet. If they don't, bad luck - no access !!

SETTING UP DANSGUARDIAN

Ok, onto Dansguardian.

The file you are looking for here is /etc/dansguardian/dansguardian.conf. It's very well commented.

The important section is:

# the port that DansGuardian listens to
# It needs to be greater than 1024
filterport = port you want dansguardian to listen on

# the ip of the proxy (default is the loopback - i.e. this server)
proxyip = IP of your gateway

# the port DansGuardian connects to proxy on
proxyport = Port of squid proxy server (yes same as is in squid.conf & gshield.conf)
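Since the same squid port has to appear in more than one file, a quick check can save some head scratching. The sketch below compares the first http_port line in squid.conf with proxyport in dansguardian.conf; the function name is made up, and it assumes the simple "http_port 3128" and "proxyport = 3128" forms shown in this tutorial.

```shell
#!/bin/sh
# Compare the port squid listens on with the port DansGuardian forwards to.
# Usage: check_proxy_ports /etc/squid/squid.conf /etc/dansguardian/dansguardian.conf
check_proxy_ports() {
    squid_conf="$1"
    dg_conf="$2"
    # First uncommented http_port line; strip an optional "host:" prefix.
    squid_port=$(awk '$1 == "http_port" { sub(/^.*:/, "", $2); print $2; exit }' "$squid_conf")
    # "proxyport = 3128" -> "3128"
    dg_port=$(sed -n 's/^proxyport[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$dg_conf" | head -n 1)
    if [ -n "$squid_port" ] && [ "$squid_port" = "$dg_port" ]; then
        echo "OK: both use port $squid_port"
    else
        echo "MISMATCH: squid http_port='$squid_port' dansguardian proxyport='$dg_port'"
        return 1
    fi
}
```

It does not look at gShield.conf, so check the port there by eye.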

I also pointed the "can't access" page at my local webserver, so it serves up the banned page when people try to access the naughty bits.

Bits and pieces of the Dansguardian config need uncommenting, so play with it until you get it right for your setup. The files are well written and easy to follow.

Look in the /etc/dansguardian folder. It has a large number of files for you to play with, like bannediplist, bannedextensionlist, bannedmimetypelist ... the list goes on. Open them up and play; add or remove whatever you think people on your network should or should not have.

The /etc/dansguardian/blacklist folder is the crux of the site blocking. You can download the blacklist from dansguardian.org once for free and then pay a small fee each month for updates, or grab the blacklist from squidguard.org for free and drop it in here. The choice is all yours; you can even make your own !!

I have set it up to ban downloads of .exe, .zip, .scr, .pif, .avi, .mp3 files, and the list goes on. It protects my monthly download quota from marauding teenage sons and keeps the RIAA and MPAA off my back. The boys have pretty much free run of the internet, but downloading and visiting illegal content is out.
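Adding extensions by hand works fine, but if you keep coming back to the list, a small helper saves retyping. This is a rough sketch that assumes the list holds one extension per line (such as .exe); the function name is made up, and entries that carry a trailing comment will not be recognised as duplicates.

```shell
#!/bin/sh
# Append extensions to an extension list file, one per line,
# skipping any that are already present as a whole line.
# Usage: ban_extensions /etc/dansguardian/bannedextensionlist .exe .mp3
ban_extensions() {
    list_file="$1"
    shift
    for ext in "$@"; do
        # -x: whole-line match, -F: literal string (so "." is not a wildcard)
        if ! grep -qxF -- "$ext" "$list_file" 2>/dev/null; then
            echo "$ext" >> "$list_file"
        fi
    done
}
```

Remember to restart dansguardian afterwards so it rereads the list.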

Now you need to fire up everything, so do it:

/etc/init.d/squid start
/etc/init.d/dansguardian start
/etc/init.d/gshield start

Hopefully all these startup scripts are where they should be.
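If you find yourself doing this a lot while tweaking, a little wrapper can run the same action on all three in the same order and stop at the first one that fails. A minimal sketch, assuming the init scripts live in /etc/init.d under the names used in this tutorial; the function name and the INIT_DIR variable are made up.

```shell
#!/bin/sh
# Run one action (start, stop, restart) on each service in order,
# stopping at the first failure.
# INIT_DIR can be overridden if your init scripts live elsewhere.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

filter_services() {
    action="$1"
    for svc in squid dansguardian gshield; do
        if ! "$INIT_DIR/$svc" "$action"; then
            echo "failed: $svc $action" >&2
            return 1
        fi
    done
}

# Example: filter_services restart
```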

Now go to a user's machine. Try setting the internet connection to "direct connection to the internet". All going well, you should get a failed page. This is good.

Set the user's connection to your gateway IP and the port that dansguardian is running on, and save. Try going to a porn site and you should hit the default blocked page; for fun, you can even make a custom page for this. If you don't hit the blocked page, go back to the dansguardian.conf file and tweak it.

For a bit of fun, open a console session and run tail -f /var/log/dansguardian/access.log, then watch where they can't go !!
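The full log gets noisy, so it can help to show only the blocked requests. This sketch assumes blocked entries in the access log carry the marker *DENIED*; check a few lines of your own log first, since the format depends on your dansguardian settings. The function name is made up.

```shell
#!/bin/sh
# Print only the blocked requests from a DansGuardian access log.
# Assumption: blocked entries contain the literal marker *DENIED*.
show_denied() {
    grep -F '*DENIED*' "$1"
}

# Example: show_denied /var/log/dansguardian/access.log
# Live view (GNU grep):
#   tail -f /var/log/dansguardian/access.log | grep -F --line-buffered '*DENIED*'
```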

Never fear for your own computer: aim it directly at the port squid operates on and 'bang', you have full access to the internet, complete with ad blocking. That brings me back to the obscure port for squid. Keep it a secret; if others on the network find out, they can easily point their connection at squid and bypass dansguardian. Don't leave your screen up where nosy ones can get at your browser settings, or like me you will be going in and changing the port all the time - teenage sons - sheesh !!

Remember, after any change to the conf files you have to restart the service to get the change to stick.

Hope I haven't forgotten anything. For most problems that may arise, the answers can be found in the logs, so check them regularly.