Repel http:BL is used with the Apache webserver to identify friendly search engines and detect malicious web bots such as email address harvesters and comment spammers. It accesses the DNS blacklist registry compiled by Project Honeypot to reliably determine the type and threat level of robots visiting your server. Repel can for example be used to:
Eliminating the flood of requests often made by malicious bots may improve the performance of web servers. Preventing harvesting of email addresses may lead to less spam. Blocking comment spammers from posting may reduce clutter on blogs and message boards.
Repel is free, open-source software licensed under the LGPL. It requires Python 2.3.5 or later.
Download the latest version of Repel. You can activate it for Apache webservers using rewrite rules:
RewriteEngine on
RewriteLock /path/to/Apache/rewritelock.lock
RewriteMap REPEL "prg:/path/to/scripts/Repel/repel.py --key=honeypotkey"
Use your own key from Project Honeypot in place of honeypotkey. Alternatively, you can put the key in Repel's options file.
RewriteCond ${REPEL:%{REMOTE_ADDR}|OK} Suspicious|Malicious
RewriteRule ^.* - [F]
Place this rule on the line immediately following the rewrite condition.
You are encouraged to install your own honeypot and redirect harvesters and comment spammers to it. For example, you can define a rewrite condition as above and use a rewrite rule like:
RewriteCond ${REPEL:%{REMOTE_ADDR}|OK} Harvester|CommentSpammer
RewriteRule ^.* /cgi-bin/honeypot.py [L]
Note that each virtual host definition needs its own RewriteEngine on directive to enable rewriting. Optionally, place a RewriteOptions inherit directive in the Apache virtual host definition to apply the rewrite rules of the main server. Keep in mind that the main server's rewrite rules are applied after those of the virtual host, regardless of their order in the configuration file.
For questions about configuration or other concerns, please visit the Repel support forum.
Repel is technically a filter: it reads IP addresses from its input, looks each one up with a DNSBL query to a DNS server, and emits the results in the same order as the input, in a format suitable for regular expression matching. Start Repel with the --log option to log responses to a file so you can examine the format:
RewriteMap REPEL "prg:/path/to/scripts/Repel/repel.py --key=honeypotkey --log=httpbl.log"
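As a rough illustration of the lookup step described above, the following sketch builds the DNS name for an http:BL query and resolves it. It follows the query format published in the Project Honeypot http:BL API (access key, then the IPv4 octets in reverse order, then the dnsbl.httpbl.org zone). It is not taken from Repel's source; the function names build_query_name and lookup are hypothetical, and the code targets a current Python 3 rather than the Python 2 releases Repel was written for.

```python
import socket

# Zone name from the Project Honeypot http:BL API documentation.
HTTPBL_ZONE = "dnsbl.httpbl.org"


def build_query_name(access_key, ip_address):
    """Return the DNS name to resolve for one IPv4 address (hypothetical helper)."""
    octets = ip_address.strip().split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not an IPv4 address: %r" % ip_address)
    # DNSBL queries use the octets in reverse order.
    reversed_ip = ".".join(reversed(octets))
    return "%s.%s.%s" % (access_key, reversed_ip, HTTPBL_ZONE)


def lookup(access_key, ip_address):
    """Return the raw answer as a dotted quad, or None if the name does not resolve."""
    try:
        return socket.gethostbyname(build_query_name(access_key, ip_address))
    except socket.gaierror:
        # A failed resolution is treated here as "address not listed".
        return None
```

Only build_query_name is pure; lookup needs network access and a valid access key from Project Honeypot.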
When Repel identifies an IP address as a bot, it responds with a code consisting of four pairs of hexadecimal digits, separated by colons (e.g. "7F:01:01:06"). The meaning of these numbers (as decimals) is described in the Project Honeypot API. For your convenience, the code is followed by a combination of descriptive keywords that decode the result, so you can match these in the Apache rewrite conditions:
These labels identify friendly search engines in the response:
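To make the four-part code more concrete, here is a minimal sketch that decodes a string such as "7F:01:01:06" according to the field layout in the Project Honeypot http:BL API: the first number is always 127, the second is the number of days since last activity, the third is a threat score, and the fourth is a bit set describing the visitor type. The function name decode_code and the returned dictionary layout are hypothetical. The label names are borrowed from the rewrite conditions shown earlier, and their mapping to bits is an assumption; Repel's own keyword output may differ.

```python
# Visitor-type bit flags as documented in the Project Honeypot http:BL API.
# The label spellings are assumed from the rewrite conditions in this document.
TYPE_FLAGS = [(1, "Suspicious"), (2, "Harvester"), (4, "CommentSpammer")]


def decode_code(code):
    """Decode a code like '7F:01:01:06' into its http:BL fields (hypothetical helper)."""
    first, second, third, visitor_type = [int(pair, 16) for pair in code.split(":")]
    if first != 127:
        raise ValueError("unexpected first number in %r" % code)
    if visitor_type == 0:
        # Type 0 marks a search engine; the third number then identifies
        # the engine instead of carrying a threat score.
        return {"search_engine": True, "engine_id": third, "labels": []}
    labels = [name for bit, name in TYPE_FLAGS if visitor_type & bit]
    return {
        "search_engine": False,
        "days_since_last_activity": second,
        "threat_score": third,
        "labels": labels,
    }
```

For the example code "7F:01:01:06", the type value 6 sets the harvester and comment spammer bits, with one day since last activity and a threat score of 1.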
The command line can take the following options:
To get a list with other options, start the application with -h.
You can run Repel as a filter from a terminal/shell, for testing or batch processing. By default, it reads lines of IP addresses from standard input and outputs the decoded DNSBL response.
Alternatively, list one or more filenames on the command line as sources of IP addresses to look up. The output is in Apache RewriteMap format, with the original address first on each line.
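The filter behaviour described above can be sketched in a few lines. This is an illustration only, not Repel's implementation: run_filter is a hypothetical name, and the lookup argument stands in for whatever function performs the actual DNSBL query. Each output line has the address first, followed by whitespace and the result, which is the key/value layout Apache expects in a text RewriteMap.

```python
def run_filter(lines, lookup):
    """Yield one 'address result' line per input address, preserving input order.

    `lines` is any iterable of text lines; `lookup` is a caller-supplied
    function (hypothetical here) that maps an address to a result string.
    """
    for line in lines:
        address = line.strip()
        if not address:
            continue  # skip blank lines
        yield "%s %s" % (address, lookup(address))
```

To mimic the command-line behaviour, the lines could come from Python's standard fileinput.input(), which reads the files named on the command line and falls back to standard input when none are given.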
The tradeoff of using Repel is a slight increase in latency for the requests that require HTTP:BL verification, but this can be minimized by skipping the test for apparent human visitors.
When you have the basic configuration working, you may add additional rewrite conditions to bypass Repel for requests that almost certainly are made by humans, such as when the visitors have authenticated with a password, come from within your own domain, or have a cookie that proves earlier access.
You can speed up DNSBL queries by running a local DNS server. It reduces the time needed to query the DNSBL when there is a time lag between repeated requests from the same address.
Administrators of demanding web servers may consider the mod_httpbl Apache module as an alternative.
Visit the Repel support forum for other issues.