Spam is a problem that never goes away. Email spam. Comment spam.
I'd like to introduce you to Project Honey Pot.
By the way, three quick comments about the links to Project Honey Pot in this article. (i) They have a "referrer" query string in them. They don't have an affiliate scheme that pays out real money. It just gives what they call "karma" to my user account: They like to keep track of what their users give back to the project, and referring others to their site is one way to pay it forward. (ii) I find, semi-frequently, that their site is down with nginx "bad gateway" errors. If you get that problem, try again in 5 minutes, and the site should be back up. (iii) They don't use https. They should.
What Is Project Honey Pot?
Here's how they describe themselves:
Project Honey Pot is the first and only distributed system for identifying spammers and the spambots they use to scrape addresses from your website. Using the Project Honey Pot system you can install addresses that are custom-tagged to the time and IP address of a visitor to your site. If one of these addresses begins receiving email we not only can tell that the messages are spam, but also the exact moment when the address was harvested and the IP address that gathered it.
I'm going to have a go at explaining more clearly what they do, and how it works:
How Do They Collect Data?
A "honey pot" is a web page, email address, or other online service that you want spammers / abusers to visit. Typically, you want this so that you can identify who they are, and block them, or so that you can learn about how they work. You could create your own honey pot webpage or email address, and block any abusers you find. But that would be very ineffective: A spammer may visit your site only once, and by then it's too late.
Project Honey Pot invites website owners to create a honey pot page on their site. This page will, most frequently, contain an email address that spammers will harvest. No genuine emails will be sent to that address, so any email sent to that address is spam. By using a unique email address each time that honey pot page is served, Project Honey Pot can work out exactly which visit to your site led to the address being harvested. They can then analyse the visit, to work out how to block that same spammer / harvester in future.
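To make that mechanism concrete, here is a minimal Python sketch of the bookkeeping it implies: hand each page view a never-reused address, remember which visitor IP and time it was shown to, and look that visit up when mail later arrives at the address. This is not Project Honey Pot's actual code. The domain `trap.example.org`, the random-token format and the in-memory dictionary are all assumptions for illustration; a real system would persist the mapping somewhere durable.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass

# Hypothetical domain whose MX record points at the honey pot's mail servers.
TRAP_DOMAIN = "trap.example.org"


@dataclass(frozen=True)
class Visit:
    ip: str
    timestamp: float


# Maps each unique local part to the page view that revealed it.
issued: dict[str, Visit] = {}


def issue_trap_address(visitor_ip: str, now: float | None = None) -> str:
    """Return a fresh address and remember which visit it was shown to."""
    token = secrets.token_hex(8)  # 16 random hex chars; collisions are vanishingly unlikely
    issued[token] = Visit(visitor_ip, time.time() if now is None else now)
    return f"{token}@{TRAP_DOMAIN}"


def trace_harvest(recipient: str) -> Visit | None:
    """Given the recipient of an incoming spam email, find the harvesting visit."""
    local, _, domain = recipient.partition("@")
    if domain != TRAP_DOMAIN:
        return None
    return issued.get(local)


addr = issue_trap_address("203.0.113.7", now=1_500_000_000.0)
print(trace_harvest(addr))
# Visit(ip='203.0.113.7', timestamp=1500000000.0)
```

Because the address is never shown to anyone else, any mail it receives points straight back at one page view: one IP address, at one moment.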
Sometimes, the honey pot pages will contain other things, maybe a fake contact form, or a fake form to leave a comment. Again, anyone who completes that form is a spammer.
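The fake-form variant needs even less logic: because no genuine visitor has any reason to submit the form, every submission can be logged against the sender's IP address. A minimal sketch, assuming a hypothetical handler `handle_trap_form` that your web framework would call for POSTs to the trap form; the names and the in-memory log are illustrative only.

```python
from __future__ import annotations

import time

# Hypothetical in-memory log: visitor IP -> times it submitted the trap form.
caught: dict[str, list[float]] = {}


def handle_trap_form(visitor_ip: str, now: float | None = None) -> int:
    """Record a submission of the fake form on a honey pot page.

    No genuine visitor has a reason to submit this form, so every
    submission is treated as spam. Returns how many times this IP
    has now been caught.
    """
    stamp = time.time() if now is None else now
    caught.setdefault(visitor_ip, []).append(stamp)
    return len(caught[visitor_ip])
```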
What Do They Need?
To run this infrastructure, they need people to donate two kinds of technology:
- Webmasters can donate a webpage on their site to act as a honeypot. It's dead easy. Most websites run on servers that can serve PHP pages. You can log into your account on the Project Honey Pot site, and they'll walk you through creating a bespoke .php file for your site. Install it in a directory of your choice, activate it, and it's good to go. If, for some reason, your web host doesn't serve PHP pages, you can choose from ASP (.NET), Perl, Python and a few others.
- Domain registrants can donate an MX record. Project Honey Pot need a plentiful supply of email addresses, so that each visit to a honeypot can present a unique email address. For that, they need lots of domains (or subdomains) where all email sent to that domain can go to their mail servers to be analysed. (If they only had a few domains that they used repeatedly, spammers would learn which domains never to send to.) Many domains never need to receive email, so can afford to send their email to Project Honey Pot servers instead, and any domain could do this for a subdomain created for this purpose.
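The reason for wanting many donated domains can be sketched in a few lines: if each generated address lands on a randomly chosen domain from a large pool, a harvester has no small set of domains it can learn to avoid. The domain names and the `pick_trap_address` helper below are hypothetical, and a real pool would be far larger than three entries.

```python
from __future__ import annotations

import random
import secrets

# Hypothetical pool of donated (sub)domains whose MX records point at the
# honey pot operator's mail servers.
DONATED_DOMAINS = [
    "mail-trap.example.com",
    "hp.example.net",
    "spare.example.org",
]


def pick_trap_address(rng: random.Random | None = None) -> str:
    """Build a unique address on a randomly chosen donated domain.

    Spreading addresses across many domains means a harvester cannot
    simply learn one or two honey pot domains and skip them.
    """
    if rng is None:
        rng = random.Random()
    domain = rng.choice(DONATED_DOMAINS)
    return f"{secrets.token_hex(6)}@{domain}"
```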
If lots of people donate those things, they'll have lots of web page honey pots, and lots of email addresses, and they can harvest lots of data about the activities of spammers.
Some website owners can't (or don't wish to) install a honey pot page of their own. Project Honey Pot will give you a "quick link" instead: You can link to someone else's honey pot page. That way, you can still help generate bot traffic for the honey pot pages.
How Do You Use Their Data?
So Project Honey Pot gather lots of crowd-sourced data on spammers and their activity. How can you make use of this to prevent spammers visiting your website?
They offer an HTTP Blacklist (or http:bl) service. This provides an API whereby you can query their database to find out if a given visitor is likely to be a spammer. This uses DNS, so each query is relatively quick. You can decide how aggressive this check will be. They return a threat rating for any given IP address that tells you just how much spam activity they've seen from that address. You can decide how high the rating must be before you block the visitor.
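As I understand the http:bl API documentation, a lookup is an ordinary DNS A-record query for `<access key>.<IPv4 octets reversed>.dnsbl.httpbl.org`. An unlisted IP gets NXDOMAIN; a listed one gets `127.A.B.C`, where A is the number of days since the IP was last seen misbehaving, B is the threat score (0 to 255) and C is a bit field of visitor types (0 search engine, 1 suspicious, 2 harvester, 4 comment spammer). The Python sketch below builds the query, decodes the answer and applies a blocking rule. The thresholds (a score of 25, activity within 30 days) are my own arbitrary assumptions, not recommendations from Project Honey Pot, and `abcdefghijkl` is a placeholder, not a real access key.

```python
from __future__ import annotations

import ipaddress
import socket
from dataclasses import dataclass

HTTPBL_ZONE = "dnsbl.httpbl.org"

# Visitor-type bit flags in the last octet of a listed answer.
# A value of 0 (no flags set) means "search engine".
SUSPICIOUS = 1
HARVESTER = 2
COMMENT_SPAMMER = 4


@dataclass(frozen=True)
class HttpBLResult:
    days_since_last_activity: int  # second octet
    threat_score: int              # third octet (a search-engine id when type is 0)
    visitor_type: int              # fourth octet, bit field


def build_query(access_key: str, ip: str) -> str:
    """Return the DNS name to resolve: <key>.<reversed IPv4 octets>.dnsbl.httpbl.org."""
    addr = ipaddress.IPv4Address(ip)  # http:bl covers IPv4 only; raises ValueError otherwise
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{access_key}.{reversed_octets}.{HTTPBL_ZONE}"


def parse_answer(answer: str) -> HttpBLResult:
    """Decode an A-record answer of the form 127.<days>.<score>.<type>."""
    first, days, score, vtype = (int(octet) for octet in answer.split("."))
    if first != 127:
        # Anything else should be treated as an error (e.g. a bad query).
        raise ValueError(f"unexpected http:bl answer: {answer}")
    return HttpBLResult(days, score, vtype)


def lookup(access_key: str, ip: str) -> HttpBLResult | None:
    """Query http:bl over DNS; return None if the IP is not listed."""
    try:
        answer = socket.gethostbyname(build_query(access_key, ip))
    except OSError:
        # NXDOMAIN (not listed), but also any resolver failure: fail open.
        return None
    return parse_answer(answer)


def should_block(result: HttpBLResult | None,
                 min_score: int = 25, max_age_days: int = 30) -> bool:
    """Hypothetical policy: block recent, high-scoring, non-search-engine visitors."""
    if result is None or result.visitor_type == 0:
        return False
    return (result.threat_score >= min_score
            and result.days_since_last_activity <= max_age_days)


print(build_query("abcdefghijkl", "127.9.1.2"))
# abcdefghijkl.2.1.9.127.dnsbl.httpbl.org
```

Raising or lowering `min_score` and `max_age_days` is how you decide how aggressive the check is. Note that `lookup` treats any DNS failure as "not listed", so the check fails open rather than locking out genuine visitors when the resolver is unavailable.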
To use their http:bl service, you need an API key. For that, you need to create an account at Project Honey Pot. You also need to be an active contributor to their project, which is entirely reasonable. You do this by any one of the following:
- Running your own honey pot
- Giving them an MX record
- Referring other people to their website
Those who help the service to run can use its data to protect their own sites.
How To Use Project Honey Pot in Drupal
Lastly, how do you use the http:bl service in a Drupal site?
In one of two ways.
Option 1: Implement Bad Behavior. I know — it's misspelt.
Bad Behavior is a set of PHP scripts which prevents spambots from accessing your site by analyzing their actual HTTP requests and comparing them to profiles from known spambots. It goes far beyond User-Agent and Referer.
There is a Bad Behavior Drupal module.
- It's a mature project, with some experienced Drupal contributors behind it.
- It's dead-easy to install.
- If you use Drush, enabling the module will automatically download the Bad Behavior libraries.
- The issue queue and git repository both suggest that the module is very minimally maintained. (Last commit: 21 Oct 2014)
- New versions of the Bad Behavior library take forever for the module to support officially. [The most recent version of Bad Behavior is 2.2.19 released 25 Aug 2016. The module asks for 2.2.15, released 24 Dec 2013.]
- Only Drupal 7 is supported, with no hints of work on Drupal 8 even being planned, let alone begun. That's a shame: It's a good module, and a good library.
What's this got to do with Project Honey Pot? Bad Behavior implements support for Project Honey Pot. It's optional — you can use Bad Behavior without Project Honey Pot. But right there, within your Drupal module configuration settings, you can set the Project Honey Pot API key, and you're away.
Option 2: Use the Project Honey Pot module
There is an http:bl module. It says it's minimally maintained, but there's an active Drupal 8 (-dev) branch. The last commit (at time of writing) was April 10 2017 for 8.x-1.x and March 25 2017 for 7.x-1.x. This will allow you to use http:bl directly in your Drupal site.
The project page says "http:BL has been adopted for use to enhance protection on Drupal.org." I'm unclear whether the module itself is in production use on drupal.org, or whether that claim merely refers to the http:bl service.
Over To You
Comments are open. If you've got experiences of Project Honey Pot (good or bad), with or without Bad Behavior, please pile in.
I'll monitor this site to see how effective http:bl is at reducing spam. What I'm watching for is a reduction in the number of spam attempts that Mollom has to block.