Spam is a waste of the receivers’ time, and, a waste of the sender’s optimism.
– Mokokoma Mokhonoana
One of the many tools in the defensive toolkit is the booby trap. It works as well digitally as it does in reality, as long as you lay the trap properly. So don’t put a carrot in your box trap and expect a leopard.
Background
The idea here is to augment the data sets we’re collecting from Ringo Dingo with some industry-specific spam indicators. The best way to do this is to let the spammers do our work for us: by setting up a related website on the cheap, we can make sure it’s easily found in web searches and let them do the rest.
Spam emails, whether from phishers, unsolicited sales or unsolicited affiliates, are usually quite generic and share a common construct. We want to capture a new data stream from spammers looking to sell particular types of services, so that we can tag these data sets for our learning models to use further down the line.
Approach
Build a website and add genuine reviews of genuine skip companies: search for UK-based skip companies and do a small amount of research on Companies House to highlight problem areas.

Add a simple contact form, a simple “request new skip firm” form and a mechanism to add new reviews. Make the captcha workable but not complex, using WordPress plugins rather than hCaptcha.




When there are a few hundred emails to use as a base data set, the trap can be dismantled.
Basic mechanisms were applied to keep out crawlers and scrapers, using tools such as ModSecurity and geoblocking.
Running The Trap
To start with, a couple of contact form submissions came in from “Eric Jones” and “Erica Johnson” for SEO and traffic enhancement. It was clear they’d made a token effort to look at the site, determined that it was fairly new and were trying to sell unsolicited services. Immediately there were really handy data points such as links with domain names and product names. Each of the domains would be spidered by RD as the contact emails came in, building a historical record of domain name linkage. Any URL builder links in the emails are automatically expanded, so we’re already capturing an interesting level of detail.
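As a rough illustration of the kind of data point described above, here is a minimal Python sketch that pulls linked domain names out of a message body. It is not how RD does it; the function name `extract_domains`, the regular expression and the sample text are all assumptions made for the example.

```python
import re
from urllib.parse import urlsplit

# Deliberately loose pattern: anything that looks like an http(s) link.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+', re.IGNORECASE)


def extract_domains(body: str) -> list[str]:
    """Return the unique hostnames linked in a message body, in first-seen order."""
    domains = []
    for url in URL_PATTERN.findall(body):
        # Strip sentence punctuation that often trails a pasted link.
        host = urlsplit(url.rstrip(".,;:!?)")).hostname
        if host and host not in domains:
            domains.append(host)
    return domains


# Hypothetical message body, invented for this example.
sample = (
    "Hi, I looked at your site. See https://seo-boost.example/offer?id=1 "
    "and http://traffic.example."
)
print(extract_domains(sample))
```

Each hostname returned could then be queued for spidering, which is where the historical record of linkage comes from.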

Over the next year the number of contact form submissions increased significantly. Three years later there are still contact form submissions coming in as the site is being shut down.

Results
As a result of these actions we now have the following:
- A large number of emails, one for each contact request form submission from the Skipreview.uk website
- Each email was analysed by the RD Thunderbird plugin, and the relevant data points extracted
- RD captures headers, message body and on-demand caches of DNS, WHOIS and other standard meta-data in its data mesh
- Meta-data acquired allows linkage between apparently unrelated senders or campaigns
The data sets therefore contain meta-data for all of the unsolicited senders, which is useful for later alignment and preparation of data sets for model training.
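To make the linkage idea concrete, the sketch below groups senders by the domains their messages link to, using only the Python standard library. It is an illustration, not RD's implementation: the helper names are hypothetical, the sample messages are invented, and it assumes the `From` header carries the submitter's address, which may not hold for every contact form plugin.

```python
import re
from collections import defaultdict
from email import message_from_string
from email.utils import parseaddr
from urllib.parse import urlsplit

URL_PATTERN = re.compile(r'https?://[^\s<>"]+', re.IGNORECASE)


def plain_text(msg) -> str:
    """Concatenate the text/plain parts of a parsed message."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload:
                charset = part.get_content_charset() or "utf-8"
                chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


def shared_domains(raw_messages) -> dict[str, list[str]]:
    """Map each linked domain to its senders, keeping domains with 2+ senders."""
    senders_by_domain = defaultdict(set)
    for raw in raw_messages:
        msg = message_from_string(raw)
        sender = parseaddr(msg.get("From", ""))[1].lower()
        for url in URL_PATTERN.findall(plain_text(msg)):
            host = urlsplit(url.rstrip(".,;:!?)")).hostname
            if host and sender:
                senders_by_domain[host].add(sender)
    return {d: sorted(s) for d, s in senders_by_domain.items() if len(s) > 1}


# Two invented messages from apparently unrelated senders.
raw_a = "From: Eric <eric@mail-a.example>\nSubject: SEO\n\nSee https://seo-boost.example/offer\n"
raw_b = (
    "From: Erica <erica@mail-b.example>\nSubject: Traffic\n\n"
    "Visit http://seo-boost.example/plans and https://other.example.\n"
)
print(shared_domains([raw_a, raw_b]))
```

The same grouping could be keyed on other captured meta-data, such as WHOIS registrant or name servers, instead of the linked domain.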
Common patterns emerged:
- Senders mostly use throwaway email accounts from standard providers, e.g. AOL, Microsoft / Hotmail and Gmail. This suggests that either spammers are setting up accounts and leaving them for a certain amount of time before using them for each campaign, or the email providers are not properly identifying spam sender behaviours
- The messages divided into three basic categories: SEO / traffic shaping consultants, multimedia / campaign designers and social media promotion sellers
- Many of the spammers find the site through standard web search engines, relying on the basic SEO applied to it, in order to sell SEO services
None of the above is a surprise of course, but the exercise showed how simple it was to set up a collection mechanism for useful data.
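The three categories above lend themselves to a first-pass tagger for labelling the data set before any model exists. The following is a minimal keyword-counting sketch; the category names and keyword lists are assumptions chosen for illustration, not terms taken from the collected emails.

```python
# Hypothetical keyword lists; tune these against real submissions.
CATEGORY_KEYWORDS = {
    "seo_traffic": ("seo", "ranking", "traffic", "backlink"),
    "multimedia_design": ("video", "design", "animation", "logo"),
    "social_media": ("followers", "instagram", "facebook", "social media"),
}


def tag_message(body: str) -> str:
    """Return the category whose keywords occur most often, or 'uncategorised'."""
    text = body.lower()
    scores = {
        category: sum(text.count(keyword) for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorised"


print(tag_message("We can boost your SEO ranking and traffic"))
print(tag_message("Get 10k Instagram followers this week"))
```

Substring counting like this is crude and will mislabel some messages, so the tags are best treated as a starting point for manual review.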
By the time the model training occurs the meta-data set itself may be stale, but it will form a great template for repeating the exercise and then having MLOps retrain as new data comes in (or as it determines it has itself become stale).
The trap itself will be dismantled simply by redirecting the domain elsewhere and shutting down the web and database server behind the scenes. All of five minutes’ work, and it took minimal effort to create!




