Setting up SpamAssassin to use Pyzor, Razor and DCC within Virtualmin

Sat, 14/12/2024 - 11:46 -- James Oakley

I run a webserver on Virtualmin, and it also handles incoming email. Details of my setup may help you adapt what follows to your own server, so: I'm using AlmaLinux 9.5, Postfix 3.5.25, Procmail 3.22, and SpamAssassin 3.4.6. The system has Perl 5.32.1.

I use SpamAssassin to score incoming emails for spam, so I can filter accordingly. Within Virtualmin, I use SpamAssassin as a standalone command rather than using spamc to connect to the spamd daemon (within Virtualmin, that's set at Virtualmin > Email Settings > Spam and Virus Scanning > SpamAssassin client program). The reason is that this lets me have per-user settings and bayes databases. If you want to use the daemonised version of SpamAssassin, some of what follows may need adjusting.

I wanted to set SpamAssassin up to use 3 cloud / crowd based tools: Pyzor, Razor and DCC. They're each slightly different, and each detects spam that the others miss, so there is mileage in using all 3. However there is also value in using just 1 or 2, and in what follows each is independent of the others.

I could not find clear and up to date instructions on how to set up these tools to work with SpamAssassin within Virtualmin. There were various instructions for other systems with or without a control panel, and SpamAssassin has a very good documentation site of its own. But some of these instructions were written long ago for older versions, and none of them brings everything together, so I ended up having to piece things together and figure it out.

So here is what I've written up for myself for future reference. I hope it helps!

Pyzor

From their own website:

Pyzor is a collaborative, networked system to detect and block spam using
digests of messages.

Using Pyzor client a short digest is generated that is likely to uniquely
identify the email message. This digest is then sent to a Pyzor server to:

  • check the number of times it has been reported as spam or whitelisted as
    not-spam
  • report the message as spam
  • whitelist the message as not-spam

It's run by SpamExperts who operate their own commercial cloud SAAS spam filtering service. You can run your own Pyzor server, but most people will simply connect to the one run by SpamExperts, which means non-SpamExperts users get to benefit from the huge dataset they analyse regularly.

  1. dnf install pip
  2. pip install https://github.com/SpamExperts/pyzor/archive/refs/tags/release-1-1-2.zip (or whatever the latest release is)
  3. Append pyzor_options --homedir /etc/mail/spamassassin to /etc/mail/spamassassin/local.cf
  4. Add port 24441 outbound for TCP and UDP, IPv4 and IPv6, in your firewall
  5. Test: echo "test" | spamassassin -D pyzor 2>&1 | less
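
If you script this setup, step 3 is worth making safe to re-run, so that local.cf doesn't collect duplicate lines. Here's a minimal sketch of how that might look. The helper name append_once is my own invention (it isn't part of SpamAssassin or Pyzor), and it assumes local.cf lives at the path used above.

```shell
# append_once FILE LINE
# Append LINE to FILE unless an identical line is already present.
# (append_once is a hypothetical helper name, used here for illustration.)
append_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (run as root):
# append_once /etc/mail/spamassassin/local.cf 'pyzor_options --homedir /etc/mail/spamassassin'
```

Running it twice leaves a single copy of the line. If the pyzor client is on your PATH, running pyzor ping before step 5 should also tell you whether the server is reachable through your firewall.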

DCC

DCC, or Distributed Checksum Clearinghouses …

is an anti-spam content filter that runs on a variety of operating systems. The counts can be used by SMTP servers and mail user agents to detect and reject or filter spam or unsolicited bulk mail. DCC servers exchange or "flood" common checksums. The checksums include values that are constant across common variations in bulk messages, including "personalizations."

The idea of DCC is that if mail recipients could compare the mail they receive, they could recognize unsolicited bulk mail. A DCC server totals reports of checksums of messages from clients and answers queries about the total counts for checksums of mail messages. A DCC client reports the checksums for a mail message to a server and is told the total number of recipients of mail with each checksum. If one of the totals is higher than a threshold set by the client and according to local whitelists the message is unsolicited, the DCC client can log, discard, or reject the message.

So this is different from Pyzor. This is not about checking (a hash of) messages against known-spam. This is about stripping out common personalisations of bulk messages ("dear name", etc.), creating a hash, and logging that the message has been received. Each new person to receive that message can check how frequently an identical message has been received by someone else. If the same message is circulating widely enough, it could be indicative that it's spam. Obviously, this risks false positives with genuine bulk mailing lists, so SpamAssassin adds a moderate score to messages DCC flags, but not so high that it automatically gets treated as spam on that one flag alone. The DCC servers also return the probability a message is actually spam, which nudges up the SpamAssassin score a little further.
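
To make the counting idea concrete, here's a toy sketch. It is not DCC's actual algorithm (DCC computes its own fuzzy checksums, which are far more robust); the greeting pattern, the use of sha256sum and the function names are all assumptions for illustration only.

```shell
# toy_checksum: read a message body on stdin, drop any "Dear <name>," greeting
# line, and print a SHA-256 digest of what is left.
# (Illustration only; this is not how DCC really computes checksums.)
toy_checksum() {
    grep -v -i '^dear .*,$' | sha256sum | cut -d' ' -f1
}

# toy_count FILE...: print how many of the given message files share each
# digest, most common first.
toy_count() {
    for f in "$@"; do
        toy_checksum < "$f"
    done | sort | uniq -c | sort -rn
}
```

Two messages that differ only in the greeting line produce the same digest, so the count for that digest goes up with each recipient. That rising count is the signal DCC reports back.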

Here's how I set this up. Consult "Using Dcc" on the SpamAssassin documentation site, and also the page on installing.

  1. Install DCC (the dcc-* is to give you a cd command that works whatever the latest version number is)
    wget https://www.dcc-servers.net/dcc/source/dcc.tar.Z
    tar xfvz dcc.tar.Z
    cd dcc-*
    CFLAGS="-O2 -fstack-protector" DCC_CFLAGS="-O2 -fstack-protector" ./configure && make && make install
  2. Create a user to run DCC
    # create dcc user to run dccifd safely (Linux specific useradd arguments)
    useradd -m -U -d /var/dcc -s /bin/sh dcc
    # homedir needs to be owned by the user
    chown -hR dcc:dcc /var/dcc
  3. Create a systemd service. Create a file named /etc/systemd/system/dcc.service as follows:
    [Unit]
    Description=DCC (Distributed Checksum Clearinghouses) interface daemon
    
    [Service]
    Type=forking
    PermissionsStartOnly=true
    RuntimeDirectory=dcc
    ExecStart=/var/dcc/libexec/dccifd
    ExecStop=/usr/bin/pkill -u dcc ; /usr/bin/rm -f /var/dcc/dccifd
    User=dcc
    Group=dcc
    Nice=1
    
    [Install]
    WantedBy=multi-user.target
  4. Open port 6277 inbound and outbound for UDP in the firewall.
  5. systemctl daemon-reload; systemctl start dcc
  6. systemctl enable dcc
  7. The program /var/dcc/libexec/dccifd will run persistently as a non-root user, so if you have software that reports persistent programs add this to the whitelist (such as /etc/csf/csf.pignore)
  8. Check that it works: cdcc info should report at least one public DCC server not reporting an error
  9. Enable in SpamAssassin:
    1. Uncomment the following line in /etc/mail/spamassassin/v310.pre: loadplugin Mail::SpamAssassin::Plugin::DCC
    2. Add the following to /etc/mail/spamassassin/local.cf:
      use_dcc 1
      # When using dccifd, the socket will be searched for in dcc_home
      dcc_home /var/dcc
      dcc_timeout 8
      # If not using dccifd, dccproc is used
      dcc_path /usr/local/bin/dccproc
    3. Check SpamAssassin is now picking up DCC: echo "test" | spamassassin -D dcc 2>&1 | less
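
Step 9.1 can also be done non-interactively. Here's a minimal sketch, assuming GNU sed (as shipped on AlmaLinux) and that the line in v310.pre is commented out with a single leading "#". The helper name enable_dcc_plugin is hypothetical.

```shell
# enable_dcc_plugin FILE
# Remove the leading "#" from the DCC loadplugin line in FILE, if it is
# commented out. Lines for other plugins are left untouched.
# (enable_dcc_plugin is a hypothetical helper name; requires GNU sed for -i.)
enable_dcc_plugin() {
    sed -i 's/^#[[:space:]]*\(loadplugin[[:space:]]\{1,\}Mail::SpamAssassin::Plugin::DCC\)/\1/' "$1"
}

# Example (run as root):
# enable_dcc_plugin /etc/mail/spamassassin/v310.pre
# spamassassin --lint    # check the configuration still parses
```

Running it a second time is harmless, because the pattern only matches the commented-out form of the line.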

Razor

Razor is much older than the other two. The most recent release of the software you'll need to install dates to 2008! But it still works. Because of this, documentation is even thinner.

Vipul's Razor is a distributed, collaborative, spam detection and filtering network. The primary focus of the system is to identify and disable an email spam before its injection and processing is complete.

I'll be honest, I'm less clear what this one does, and in particular the extent to which it operates off hashes as opposed to the actual email content. For this reason, you may wish not to use this one. It also installs lots of Perl modules, and I wrote up one problem I had where I ended up with two copies of the File::Spec library. This is where its age lets it down.

You may wish to consult the SpamAssassin documentation pages on installing and using Razor.

  1. Install Perl dependencies
    1. Go to https://sourceforge.net/projects/razor/files/razor-agents-sdk, and download the .tar.bz2 file for the latest razor-agents-sdk
    2. Untar (tar -xjf)
    3. Install
      cd razor-agents-sdk-*/
      perl Makefile.PL
      make
      make test
      make install
      cd ..; rm -rf razor-agents-sdk-*
  2. Install the Razor Agent
    1. Go to https://sourceforge.net/projects/razor/files/razor-agents, and download the .tar.bz2 file for the latest razor-agents
    2. Untar (tar -xjf)
    3. Install
      cd razor-agents-*/
      perl Makefile.PL
      make
      make test
      make install
      cd ..; rm -rf razor-agents-*
  3. Register Razor (creates an account on their server so they can see which mail comes from which systems). Note, some documentation mentions the need to run another command, razor-client, but (contrary to those docs) this no longer needs running.
    razor-admin -create
    razor-admin -discover
    razor-admin -register
  4. Repeat the previous step for every account on the server that will use Razor. Unless you are using spamd, Virtualmin runs SpamAssassin as the mailbox user that is filtering the email (and those mailbox users don't have a shell assigned, so we need to specify it).
    su -s /bin/bash mailbox@example.com
    razor-admin -create
    razor-admin -discover
    razor-admin -register
  5. Test that it's working: echo "test" | spamassassin -D razor2 2>&1 | less
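
If you have more than a handful of mailboxes, registering Razor for each one by hand gets tedious. Here's a sketch of one way to generate the commands for review before running them. The helper name razor_register_cmds is hypothetical, and it assumes your mailbox usernames contain no single quotes.

```shell
# razor_register_cmds: read mailbox usernames (one per line) on stdin and
# print the su command that would register Razor for each of them.
# Nothing is executed; review the output, then pipe it to sh as root.
# (razor_register_cmds is a hypothetical helper name.)
razor_register_cmds() {
    while IFS= read -r user; do
        [ -n "$user" ] || continue   # skip blank lines
        printf "su -s /bin/bash -c 'razor-admin -create && razor-admin -discover && razor-admin -register' '%s'\n" "$user"
    done
}

# Example:
# printf 'mailbox@example.com\n' | razor_register_cmds
# printf 'mailbox@example.com\n' | razor_register_cmds | sh
```

Printing the commands rather than running them directly means you can check the list of users is what you expect first.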

A note on testing

In each of the 3 packages above, I've piped "test" to spamassassin in debug mode to see if the tool concerned is working.

SpamAssassin contains sample emails you can use for testing. So, taking Razor as the example, you can instead run a test on an actual sample email:

spamassassin -t -D razor2 < /usr/share/doc/spamassassin/sample-nonspam.txt
spamassassin -t -D razor2 < /usr/share/doc/spamassassin/sample-spam.txt

The -t is important; it means you're in test mode so it won't submit this to any external databases or use it for Bayesian training.

I'd suggest testing first by simply piping "test" in, to pick up any obvious problems quickly, and then test with a full sample email if you think it's working.
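
To check all three tools in one go, something like the following sketch may help. It assumes SpamAssassin's debug lines contain the text "dbg: " followed by the channel name and a colon, which is what I see on my system but may differ between versions. The function name check_channels is hypothetical.

```shell
# check_channels MESSAGE_FILE
# Run SpamAssassin in test mode once per debug channel and report how many
# debug lines each channel produced. Zero lines suggests that plugin is not
# being loaded at all.
# (check_channels is a hypothetical helper; the "dbg: <channel>:" pattern is
# an assumption about the debug output format.)
check_channels() {
    for channel in pyzor dcc razor2; do
        # grep -c exits non-zero when the count is 0, hence the "|| true"
        count=$(spamassassin -t -D "$channel" < "$1" 2>&1 | grep -c "dbg: $channel:" || true)
        printf '%s: %s debug lines\n' "$channel" "$count"
    done
}

# Example:
# check_channels /usr/share/doc/spamassassin/sample-spam.txt
```

This only tells you whether each plugin is producing debug output, not whether its lookups succeeded, so you'll still want to read the full debug output for anything that reports zero or looks suspicious.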

A note on privacy

For each user you set up to work with Razor, the commands you ran will create a folder ~/.razor. This contains a number of files, one of which is named razor-agent.conf. There is a man page for this file which explains most of the directives.

In particular, I'd draw your attention to report_headers.

When reporting spam, the entire email (headers and body) is sent to a Razor Nomination Server. When set to 0, all the headers are removed except headers beginning with "Content-" before sending, and a special header beginning with "X-Razor2" is added to note this action. The default is 1.

Make your own mind up. For me, I'll be setting this to 0 for each user that works with Razor.
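
Here's a sketch of how that change could be scripted for each user. It assumes razor-agent.conf uses "key = value" lines and that GNU sed is available. The function name set_report_headers_off is hypothetical.

```shell
# set_report_headers_off CONF
# Set report_headers to 0 in the given razor-agent.conf, adding the line if
# the directive is not there yet. Other directives are left untouched.
# (set_report_headers_off is a hypothetical helper; requires GNU sed for -i.)
set_report_headers_off() {
    conf=$1
    if grep -q '^report_headers[[:space:]]*=' "$conf"; then
        sed -i 's/^\(report_headers[[:space:]]*=\).*/\1 0/' "$conf"
    else
        printf 'report_headers = 0\n' >> "$conf"
    fi
}

# Example, run as the mailbox user:
# set_report_headers_off ~/.razor/razor-agent.conf
```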
