Geeklog

Geeklog Spam-X Plugin

(If you came here looking for Hendrickson Software Components' email spam filter of the same name, please click here.)

Introduction

The Geeklog Spam-X plugin was created to fight the problem of comment spam for Geeklog systems. If you are unfamiliar with comment spam, you may want to read the Comment Spam Manifesto.

Spam protection in Geeklog is mostly based on the Spam-X plugin, originally developed by Tom Willet. It has a modular architecture that allows it to be extended with new modules to fight the spammer's latest tricks, should the need arise.

What is being checked for spam?

Geeklog and the Spam-X plugin will check the following for spam:

Module Types

The Spam-X plugin was built to be expandable so that it can easily adapt to changes the comment spammers might make. There are three types of modules: Examine, Action, and Admin. Each new module is contained in a single file that can simply be dropped into the Spam-X directory to add it to the plugin.

Examine Modules

Geeklog ships with the following examine modules:

Spam Link Verification (SLV)

SLV is a centralized, server-based service that examines posts made on websites and detects when certain links show up in unusually high numbers. In other words, when a spammer starts spamming a lot of sites with the same URLs and those sites all report to SLV, the system will recognize this as a spam wave and will flag posts containing these URLs as spam.

Put another way, it's a dynamic blacklist that automatically updates itself when a spammer starts spamming for their site. Its accuracy and reaction speed improve as more sites use it.

SLV is a free service run by Russ Jones at www.linksleeve.org.

Privacy Notice: It should be stressed that using SLV means that information from your site is being sent to a third party's site. In some jurisdictions you may have to inform your users about this fact, so please check your local privacy laws.

Sending information to an external site may also be undesirable on some setups, e.g. on a company intranet. You can disable SLV support by removing the four files SLV.Examine.class.php, SLVbase.class.php, SLVreport.Action.class.php, and SLVwhitelist.Admin.class.php from your Spam-X directory (/path/to/geeklog/plugins/spamx). Or you can simply disable the Spam-X plugin entirely (or uninstall it).

The SLV Examine and Action modules will extract all URLs from a post and only send those to SLV (i.e. the rest of the post's content is not being sent). They also remove any links that contain your Geeklog site's URL. In case a post does not contain any external links, the modules simply do not contact SLV at all.
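The filtering step described above can be sketched as follows. This is only an illustration in Python (Spam-X itself is written in PHP); the function names, the URL pattern, and the example hosts are assumptions and are not taken from the plugin's code.

```python
import re

# Assumed, simplified URL pattern; the plugin's real extraction rules may differ.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def links_to_report(post_text, site_url):
    """Return the URLs in a post that do not contain the site's own URL."""
    site = site_url.rstrip("/").lower()
    return [url for url in URL_PATTERN.findall(post_text) if site not in url.lower()]


def should_contact_slv(post_text, site_url):
    """Only contact the external service when there is at least one external link."""
    return len(links_to_report(post_text, site_url)) > 0


post = "See http://spam.example/pills and http://www.example.com/article.php?story=1 for more."
print(links_to_report(post, "http://www.example.com"))
print(should_contact_slv("No links here.", "http://www.example.com"))
```

Only the list returned by links_to_report would be sent; the rest of the post never leaves the site.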

Personal Blacklist

The Personal Blacklist module lets you add keywords and URLs that typically exist in spam posts. When you're being hit by spam, make sure to add the URLs of those spam posts to your Personal Blacklist so that they can be filtered out automatically, should the spammer try to post them again.

This will also help you get rid of spam that made it through, as you can then use the Mass Delete Comments and Mass Delete Trackbacks modules to easily remove large numbers of spam posts from your database.

The Personal Blacklist also has an option to import the Geeklog censor list and ban all comments which contain one of those words. This list, or an expanded one, might be useful for a website that caters to children, as it prevents comments with offensive language from being posted.
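As a rough illustration of how a personal blacklist of keywords and URLs can be applied to a post, here is a minimal Python sketch. It assumes plain case-insensitive substring matching; the plugin's actual matching rules may differ, and the function name and sample entries are made up for this example.

```python
def matches_blacklist(post_text, blacklist):
    """Return the first blacklist entry found in the post, or None if there is none."""
    text = post_text.lower()
    for entry in blacklist:
        if entry.lower() in text:
            return entry
    return None


# Hypothetical entries: one spammed domain and one keyword.
blacklist = ["cheap-pills.example", "casino"]

print(matches_blacklist("Visit http://CHEAP-PILLS.example/now", blacklist))
print(matches_blacklist("Great article, thanks!", blacklist))
```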

IP Filter

Sometimes you will encounter spam that is coming from one or only a few IP addresses. By simply adding those IP addresses to the IP Filter module, any posts from these IPs will be blocked automatically.

In addition to single IP addresses, you can also add IP address ranges, either in CIDR notation or as simple from-to ranges.
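The three kinds of entries (single address, CIDR block, from-to range) can be checked along these lines. This is a Python sketch using the standard ipaddress module, not the plugin's PHP code; the exact from-to syntax ("start-end") and the function names are assumptions, and the example only covers IPv4.

```python
import ipaddress


def ip_matches(entry, client_ip):
    """Check one filter entry (single IP, CIDR block, or from-to range) against an address."""
    ip = ipaddress.ip_address(client_ip)
    entry = entry.strip()
    if "/" in entry:
        # CIDR notation, e.g. 192.0.2.0/24
        return ip in ipaddress.ip_network(entry, strict=False)
    if "-" in entry:
        # Assumed from-to syntax, e.g. 198.51.100.10-198.51.100.20
        start, end = (ipaddress.ip_address(part.strip()) for part in entry.split("-", 1))
        return start <= ip <= end
    return ip == ipaddress.ip_address(entry)


def is_blocked(entries, client_ip):
    """True if the client address matches any entry in the filter list."""
    return any(ip_matches(entry, client_ip) for entry in entries)


entries = ["203.0.113.7", "192.0.2.0/24", "198.51.100.10-198.51.100.20"]
print(is_blocked(entries, "192.0.2.200"))
print(is_blocked(entries, "198.51.100.21"))
```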

Please note that IP addresses aren't really a good filter criterion. While some ISPs and hosting services are known to host spammers, it won't help much to block an IP address belonging to one of the well-known ISPs. Often, the spammer will get a new IP address the next time they connect to the internet, while the blocked IP address will be reassigned and may end up being used by some innocent user.

IP of URL Filter

This module is only useful in a few special cases: Here you enter the IP address of a webserver that is used to host domains for which you may see spam. Some spammers have a lot of their sites on only a few webservers, so instead of adding lots of domains to your blacklist, you only add the IP addresses of those webservers. The Spam-X module will then check all the URLs in a post to see if any of these is hosted on one of those blacklisted webservers.
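The idea can be sketched like this: resolve the host of every URL in the post and compare the result with the list of blacklisted web server addresses. The Python below is an assumption-laden illustration, not the plugin's code; the function name is hypothetical, and the resolver is passed in as a parameter so the example can run without real DNS lookups.

```python
import socket
from urllib.parse import urlparse


def hosted_on_blacklisted_server(urls, blacklisted_ips, resolve=socket.gethostbyname):
    """Return True if any URL's host resolves to a blacklisted web server IP."""
    for url in urls:
        host = urlparse(url).hostname
        if not host:
            continue
        try:
            ip = resolve(host)
        except OSError:
            continue  # unresolvable hosts are simply skipped in this sketch
        if ip in blacklisted_ips:
            return True
    return False


# Stand-in resolver with made-up hosts and documentation-range addresses.
table = {"pills.example": "203.0.113.9", "blog.example": "198.51.100.4"}


def fake_resolve(host):
    return table[host]


print(hosted_on_blacklisted_server(["http://pills.example/buy"], {"203.0.113.9"}, resolve=fake_resolve))
```

A single blacklisted address therefore covers every domain hosted on that server.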

HTTP Header Filter

This module lets you filter for certain HTTP headers. Every HTTP request sent to your site is accompanied by a series of headers identifying, for example, the browser that your visitor uses, their preferred language, and other information.

With the Header filter module, you can block HTTP requests with certain headers. For example, some spammers are using Perl scripts to send their spam posts. The user agent (browser identification) sent by Perl scripts is usually something like "libwww-perl/5.805" (the version number may vary). So to block posts made by this user agent, you would enter:

Header:User-Agent
Content:^libwww-perl

This would block all posts from user agents beginning with "libwww-perl".
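To make the matching concrete, here is a small Python sketch of such a header rule. It assumes that the Content field is treated as a regular expression (as the leading "^" in the example suggests) and that matching is case-insensitive; both points, along with the function name, are assumptions rather than a description of the plugin's code.

```python
import re


def header_blocked(rules, request_headers):
    """rules: list of (header_name, pattern) pairs; request_headers: dict of name -> value."""
    # HTTP header names are case-insensitive, so normalize them before comparing.
    headers = {name.lower(): value for name, value in request_headers.items()}
    for header_name, pattern in rules:
        value = headers.get(header_name.lower())
        if value is not None and re.search(pattern, value, re.IGNORECASE):
            return True
    return False


rules = [("User-Agent", r"^libwww-perl")]

print(header_blocked(rules, {"User-Agent": "libwww-perl/5.805"}))
print(header_blocked(rules, {"User-Agent": "Mozilla/5.0"}))
```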

Action Modules

Once one of the examine modules detects a spam post, the action modules decide what to do with it. Most of the time, you will simply want to delete the post, which is what the Delete Action module does.

As the name implies, the Mail Admin Action module sends an email to the site admin when a spam post is encountered. Since this can result in quite a lot of email being sent, it is disabled by default.

Action modules have to be enabled specifically before they are used (examine modules, on the other hand, are activated by simply dropping them into the Spam-X directory). Every action module has a unique number. To enable a set of action modules, add up their numbers and enter the sum as the value of the spamx config variable in Geeklog's main configuration.

Example

The Delete Action module has the value 128, while the Mail Admin Action module has the value 8. So to activate both modules, add 128 + 8 = 136 and enter that in the Configuration admin panel.
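The arithmetic can be illustrated with a short Python sketch. The two values come from the text above; the constant names are illustrative, and the decoding step assumes the numbers are used as bit flags (both are powers of two), which is an assumption about how the sum is interpreted.

```python
# Values taken from the text above; the constant names are illustrative.
DELETE_ACTION = 128
MAIL_ADMIN_ACTION = 8


def enabled_actions(config_value):
    """List the action modules whose number is contained in the configured sum."""
    names = {DELETE_ACTION: "Delete", MAIL_ADMIN_ACTION: "Mail Admin"}
    return [name for bit, name in names.items() if config_value & bit]


spamx = DELETE_ACTION + MAIL_ADMIN_ACTION
print(spamx)                   # 136
print(enabled_actions(spamx))  # ['Delete', 'Mail Admin']
print(enabled_actions(128))    # ['Delete']
```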

The SLV Examine module is complemented by an SLV Action module that ensures that SLV is notified of spam posts caught by other examine modules. It "piggybacks" on the Delete Action module, i.e. when you activate the Delete Action module, you'll also enable the SLV Action module.

Admin Modules

The Admin modules for the Personal Blacklist, IP Filter, IP of URL Filter, and HTTP Header Filter modules provide you with a form to add new entries. To delete an existing entry, simply click on it.

With the SLV Whitelist admin module you can add URLs that you don't want to be reported to SLV. This is useful when posts on your site happen to contain certain URLs quite often but you don't want those to be considered spam by SLV.
Note that your site's URL (i.e. $_CONF['site_url']) is automatically whitelisted, so you don't need to add it here.

The Log View module lets you inspect and clear the Spam-X logfile. The logfile contains additional information about the spam posts, e.g. which IP address they came from, the user id (if posted by a logged-in user), and which of the examine modules caught the spam post.

In case a large number of spam posts made it through without being caught, the Mass Delete Comments and Mass Delete Trackbacks modules will help you get rid of them easily. Before you use these modules, make sure to add the URLs or keywords from those spam posts to your Personal Blacklist.

Note about MT-Blacklist

MT-Blacklist was a blacklist, i.e. a listing of URLs that were used in spam posts, originally developed for Movable Type (hence the name) and maintained by Jay Allen.

Maintaining a blacklist is a lot of work, and you're continually playing catch-up with the spammers. Therefore, Jay Allen eventually discontinued MT-Blacklist on the assumption that new and better methods to detect spam are now available.

Starting with Geeklog 1.4.1, Geeklog no longer uses MT-Blacklist. All MT-Blacklist entries are removed from the database when you upgrade to Geeklog 1.4.1 and the MT-Blacklist examine and admin modules are no longer included.

Trackback Spam

Trackbacks are also run through Spam-X before they are accepted by Geeklog. Some additional checks can be performed on trackbacks: Geeklog can be configured to check whether the site that supposedly sent the trackback actually contains a link back to your site. In addition, Geeklog can check whether the IP address of the site in the trackback URL matches the IP address that sent the trackback. Trackbacks that fail any of these tests are usually spam. Please refer to the configuration documentation for more information.
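The two additional trackback checks could look roughly like the following Python sketch. It is not Geeklog's implementation: the function names, the simplified href pattern, and the example hosts are assumptions, and the page content and the resolver are passed in as parameters so the example runs without network access.

```python
import re
from urllib.parse import urlparse


def links_back(page_html, site_url):
    """True if the linking page contains an href that points at our site."""
    site = site_url.rstrip("/").lower()
    hrefs = re.findall(r'href\s*=\s*["\']([^"\']+)["\']', page_html, re.IGNORECASE)
    return any(href.lower().startswith(site) for href in hrefs)


def sender_ip_matches(trackback_url, sender_ip, resolve):
    """True if the host named in the trackback URL resolves to the sending IP address."""
    host = urlparse(trackback_url).hostname
    return host is not None and resolve(host) == sender_ip


html = '<p>Nice post at <a href="http://www.example.com/article.php?story=42">this site</a>.</p>'
print(links_back(html, "http://www.example.com/"))
print(sender_ip_matches("http://blog.example/entry/1", "198.51.100.4", lambda host: "198.51.100.4"))
```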

Configuration

The Spam-X plugin's configuration can be changed from the Configuration admin panel:

Spam-X Main Settings

Variable | Default Value | Description
logging | true | Whether to log recognized spam posts in the spamx.log logfile (if set to true) or not (false).
admin_override | false | The Spam-X plugin will filter posts by any user - even site admins. This can be a problem sometimes, e.g. when you want to post a note about spam that itself contains "spammy" URLs or keywords. When this option is set to true then posts made by users in the 'spamx Admin' group are not checked for spam.
timeout | 5 | Timeout (in seconds) for contacting external services such as SLV.
notification_email | $_CONF['site_mail'] | Email address to which spam notifications are sent when the Mail Admin action module is enabled.
action | 128 | This only exists as a fallback in case $_CONF['spamx'] in Geeklog's main configuration is not set. I.e. $_CONF['spamx'] takes precedence.
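The precedence between the action setting and $_CONF['spamx'] can be pictured with this Python sketch. The dictionaries stand in for Geeklog's main configuration and the plugin's own settings; the function name and the dictionary layout are assumptions made for illustration only.

```python
def effective_action(conf, spamx_conf):
    """Prefer the main configuration value; fall back to the plugin's 'action' setting."""
    value = conf.get("spamx")
    if value is None:
        # Main configuration value is not set, so use the plugin's fallback (default 128).
        return spamx_conf.get("action", 128)
    return value


print(effective_action({"spamx": 136}, {"action": 128}))
print(effective_action({}, {"action": 128}))
```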

More Information

Further information as well as a support forum for the Spam-X plugin can be found on the Spam-X Plugin's Homepage and in the Geeklog Wiki.