The Spoor Of Darkness: Dealing with Spam
An article that is only indirectly gaming-related today. Most of that relationship is narcissistic in nature, because this is an article about Campaign Mastery itself, and about the environment – the internet – in which it resides. But I thought it sufficiently important, in light of recent events, to publish anyway.
The recent DDoS (Distributed denial-of-service) attack on the DNS Servers belonging to Dyn DNS was alarming and disturbing – but it came as no surprise to me. The reason is Spam Patterns, specifically spam generated by automated malicious software known as Spambots.
You see, I recently migrated my anti-spam protocols to a new methodology that I had devised. As a result, I was paying more attention to the Spam being received by Campaign Mastery than usual, and tracking it back to the originating networks, and started seeing spambot-generated spam coming from unusual places.
IBM. AT&T. Time-Warner. AOL. MIT. Microsoft. Places like that.
That led me to suspect that something new was going on, and that systems that would normally be hardened and resistant to such rogue software were being compromised.
So I was expecting something to go down, and hoping that the relevant authorities had been paying attention to the same things that I was (or to even better indicators); hence it came as no surprise when something did happen. Alarming yes, but surprising, no.
But, as it turns out, those weren’t the primary originators of last Friday’s attack. And the spambot infiltration of unlikely places has continued since. In other words, last Friday had nothing to do with the compromised system security indicators that I was seeing; those are still out there. Add disturbing to the mix, because the implication is that another sword of Damocles is poised, ready to strike.
So far, everything I have described sounds like a great intro to a science-fiction / secret agent adventure – one attack revealing another – but this is all very real.
Why take spam seriously? Isn’t it, like, just an annoying inconvenience?
We’ve lived with Spam for so long that people are starting to treat it like part of the furniture, the price of being on the web. And that’s a serious problem for the entire internet.
Based on the spam that I have been analyzing over the last two months, automatically-generated spambot-created spam outnumbered the old-fashioned kind about 99-to-one. As I said, a spambot is a piece of rogue software that is somehow placed on a system, where it generates spurious email messages aimed more-or-less randomly.
It used to be the case that the major purpose of a spambot was to target websites that didn’t pay enough attention to the comments being posted on them so that the websites being linked to within the spam would rise in ranking on search engines such as Google, enabling them to ensnare unsuspecting visitors whose systems would then be compromised, enabling hackers to do whatever they wanted – distribution of viruses, identity theft, compromising of bank accounts, or – on command – becoming a vector for a denial-of-service attack. And, of course, raiding the email accounts of the users to further distribute itself.
Every piece of automatically-generated spam received from somewhere indicates that someone has gotten something behind the defenses of that system. What if the spambot itself harbors a more serious payload, and the spambot activities are just a means of announcing to the hackers, “I’m here, I’m in place, awaiting your orders”?
That’s why I take spam seriously, especially when two things start happening: I start seeing it turn up in places it’s never turned up before (but that could just mean that someone who has my email address has been infected with a spambot) and when it starts coming from places that I never expected it to originate. Lately, I’ve been seeing both – so I’m concerned.
Are internet providers doing enough to combat spam? And what more can be done?
In theory, the way to deal with spam is to identify the originating network and alert them to the abuse of their systems. They then identify whose accounts have been compromised, perform the appropriate level of spring cleaning, and the spambots go away. It doesn’t work that way; the networks seem to have gone out of their way to make it as difficult as possible to report spam and other abuse. Whether this is because their security specialists assume that everyone who knows enough to report system abuse is also a systems security expert, or because they just don’t want to be inundated with reports, doesn’t matter; the end result is the same.
With the rise of WordPress and other blogging platforms, there are more unskilled people looking after their own network security than ever before. If ever the time was right for a pushbutton solution, it’s now long overdue.
Here’s what should happen: A user logs into their website’s administration section, goes to their comments-handling utility, and sees that there is something in their spam folder. They open the spam folder within their inbox and look at each item, verifying that it is indeed spam. They then push a button or click a link that:
- extracts the relevant information from their database;
- constructs an email alerting the source of the abuse of their systems being compromised;
- appends the relevant information from the database;
- performs a lookup and obtains the email address for reporting systems abuse;
- sends that email to that address using the account registered as belonging to the administrator of the website receiving the spam;
- and then deletes the spam.
Having established with their login that they are an authorized and real human being, the recipient of that email knows to take it seriously – especially if they start getting hundreds or even thousands of alerts to the same problem. But if additional verification is required, a second password specifically required to authorize a spam report or some other real-human verification test is all that’s required.
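As a rough illustration of what such a button might do behind the scenes, here is a minimal Python sketch that composes (but does not send) the report. Everything in it is an assumption made for the example: the `SpamComment` fields, the `build_abuse_report` function, and the addresses, which use reserved documentation ranges and example domains. The lookup of the abuse-reporting address is not shown; it is simply passed in.

```python
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class SpamComment:
    """One entry from the spam folder (hypothetical fields)."""
    source_ip: str
    received_at: str  # timestamp as stored by the site
    excerpt: str      # first few words of the comment


def build_abuse_report(comment: SpamComment, abuse_address: str,
                       admin_address: str) -> EmailMessage:
    """Compose a report to the originating network; sending is left out."""
    msg = EmailMessage()
    msg["From"] = admin_address
    msg["To"] = abuse_address
    msg["Subject"] = f"Spam received from {comment.source_ip}"
    msg.set_content(
        "A spam comment was received from an address on your network.\n"
        f"Source IP: {comment.source_ip}\n"
        f"Received:  {comment.received_at}\n"
        f"Excerpt:   {comment.excerpt}\n"
    )
    return msg


# Illustrative values only; 192.0.2.0/24 is reserved for documentation.
report = build_abuse_report(
    SpamComment("192.0.2.17", "2016-10-25 09:14 UTC", "Cheap watches..."),
    abuse_address="abuse@example.net",
    admin_address="admin@example.org",
)
```

Deleting the spam afterwards, and any second-password check, would wrap around this step.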
This puts the onus on the people hosting the spam originators to do something about the problem instead of letting it fester, untreated. (Are you paying attention, Wordfence.com?)
None of that happens at the moment. When you boil the current situation right down, people have three choices: ignore the spam and hope it goes away, do all of the above manually (which takes a lot of time and effort), or simply delete it. Sources who generate a lot might get blacklisted or might not – which simply means that your website is no longer reachable from the compromised systems in any way. So far as that part of the internet is concerned, your site ceases to exist – for both legitimate users and the spammers.
I hate this approach. Not only does it potentially deny your website to the very people you want to read it, but it can be turned against people simply by making it look like spam is originating from a website that you want to deny to people. If enough internet users block a website, that business is no longer someone that those internet users can do business with. You don’t sell your goods and services, your business goes broke, you go away – leaving the market open to a shadier competitor who was willing to employ those tactics. And on top of that, it doesn’t actually solve the spam problem – it just hides it from you.
But, in the absence of the better alternative that I have described, that is the best choice that site administrators have.
A story of transition
If there were even a half-practical alternative, I wouldn’t use blacklisting, and wouldn’t have been using it since July 2014 (as described in “Fighting The Spam War”).
Ever since I instituted that anti-spam policy, I have been worried about blocking legitimate visitors to the site, condemning internet neighbors because of the one bad apple who happens to “live” next door. I did things that way because I saw no alternative. Which brings me back to my new anti-spam protocols.
I’ll be honest – they are a lot of work. I’d rather not do them. But the old procedures were no longer working effectively enough, and spam levels had started to rise to unacceptable levels – in fact, had been at those levels for a good six months or more. I was dealing with 200 spam “comments” a day on a good day and 450+ on a bad.
One of the major flaws in the old system is that it had no memory. A site could be blocked one day, go quiet for the next two or three, get unblocked, and then spam again. Or the site could be blocked, and hundreds of failed attempts to reach the site could be documented – and even if these were legitimate web traffic trying to visit Campaign Mastery, they would be treated as spam attempts because once a block was instituted, there was no way to distinguish between the two.
One particular network got blocked for four months, in which time it accumulated more than 5,000 attempts to reach the site. Were these all attempts to spam it? It seemed unlikely. And that was the straw that broke the back of the old, flawed system – when the potential for blocking legitimate traffic became more than I could tolerate.
Introducing the new Anti-Spam Protocols
The new protocols took me three months, on and off, to design and get close to being right (I’m still making minor tweaks). And at the heart of them is a spreadsheet which provides that memory that I spoke of. They are designed to be as little work as possible – but they still consume a good hour or three of my day, every day. That’s the bad news.
The good news is that they ultimately classify all spam originators into one of six categories:
- Do Not Block, which indicates that I have verified that legitimate traffic outweighs spam by a significant ratio;
- Block Individually, which indicates that each IP address should be given individual treatment because there is a significant likelihood of non-spam traffic;
- Evaluation in Progress, which indicates that a statistically-significant number of attempts to reach the website have been blocked relative to the number of spam comments originating from that source, which will enable the originating network to be classified into one of the preceding categories;
- Block Collectively, an interim state in which traffic from an originating network is blocked for a period of time relative to the nature and frequency of the actual spam received, and which may enable a statistical appraisal of the traffic originating from that network (this is a necessary precursor to category 3 assessment), and which is the default into which ALL traffic falls;
- Green Denied, which is a stage that can eventually lead to a classification into category six if egregious behavior persists;
- Block Permanently, indicating that sufficiently significant spam levels have been observed with no legitimate traffic that I am comfortable that a Permanent blacklisting will not affect real people who want to read what the site offers.
As I write this, 33 networks are in category two, one has made it all the way to category one, and one has been relegated to category five. By sheer coincidence, the two worst offenders to date (43 and 53 spam in 96 hours, respectively) are both blocked until the 20th of December. That’s 60 and 73 days, respectively. One is a network, the other is an individual IP address.
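For readers who think in code, the six categories map naturally onto an enumeration. This is only a sketch of how they could be represented; the spreadsheet itself is not published, and the names `Status`, `DEFAULT_STATUS` and `is_final` are invented for the example.

```python
from enum import IntEnum


class Status(IntEnum):
    """The six categories, numbered as in the list above."""
    DO_NOT_BLOCK = 1
    BLOCK_INDIVIDUALLY = 2
    EVALUATION_IN_PROGRESS = 3
    BLOCK_COLLECTIVELY = 4  # the default for all traffic
    GREEN_DENIED = 5
    BLOCK_PERMANENTLY = 6


DEFAULT_STATUS = Status.BLOCK_COLLECTIVELY


def is_final(status: Status) -> bool:
    """True for the two extremes that need no further processing."""
    return status in (Status.DO_NOT_BLOCK, Status.BLOCK_PERMANENTLY)
```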
This is clearly a major improvement on the old system for three reasons: (1) it’s more granular; (2) it blocks for finite periods rather than indefinitely if the traffic keeps coming; and (3) it permits analysis of the actual traffic instead of basing decisions on worst-case assumptions.
Eventually, it will permit me to make permanent decisions and stop using it – saving not only those 1-3 hours a day, but also the time that would have been lost under the old system.
So, how does it work?
The whole thing is actually based on the generic crime-and-punishment system of the American courts. Probation, suspended sentences, jail time, witness relocation, being let off with a warning for a first offense, consideration of past offenses, even a statute of limitations, all have analogues within the process. It has many of the same faults, flaws, and compromises, as well.
Caught committing a crime
A spam comment is received which has the originating IP address of (say) 345.367.400.894 (note that this, like all the examples, is a completely fictitious IP address which can’t actually exist in the real world).
The first thing I do is log it into a row of the spreadsheet, with the date on which I am doing so, the number of spam received from that IP since I last did my spam processing, and the number of those that conform to a recognizable spambot pattern. Then the spam is deleted, ensuring that if I get interrupted, it won’t get processed twice.
The First-Offense Warning/Suspended Sentence
When all the spam that has been received has been logged in, I sort the entire log by IP number. This groups the newly-arrived spam with the history of the originating record. What I do next depends on that history.
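The logging and sorting steps could be sketched in Python along these lines. The `LogRow` fields mirror the columns described above, but the names are assumptions, and the addresses come from ranges reserved for documentation. One detail worth showing: sorting IP addresses as plain text puts "198.51.100.20" ahead of "198.51.100.3", so the sketch sorts on the parsed address instead.

```python
from dataclasses import dataclass
from datetime import date
from ipaddress import ip_address


@dataclass
class LogRow:
    ip: str
    logged_on: date
    spam_count: int     # spam received since the last processing run
    spambot_count: int  # how many of those match a spambot pattern


# Illustrative entries only.
log = [
    LogRow("198.51.100.20", date(2016, 10, 24), 1, 1),
    LogRow("192.0.2.9", date(2016, 10, 25), 3, 2),
    LogRow("198.51.100.3", date(2016, 10, 25), 1, 0),
]

# Sort numerically by address, then by date, so that each new entry
# lands next to the history for the same address.
log.sort(key=lambda row: (ip_address(row.ip), row.logged_on))
```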
To keep the process clear, let’s assume that there is no history – that this is the first time that I have received spam from this particular IP address according to the system’s records.
The Trial
When this happens, I perform what is called a “Whois lookup” which identifies the owner of that particular IP address. That’s how I know that I’ve received spam recently from that impressive roll-call of companies that I named at the start of this article. I record the range of the network – that is, the range of IP addresses that belong to it, on the assumption that the entire network may have been compromised until proven (probably) otherwise.
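Once a network's range is on record, checking whether a new address belongs to it is straightforward. The sketch below uses Python's standard `ipaddress` module; the Whois lookup itself is not performed, the ranges are reserved documentation blocks standing in for real results, and `find_network` is a hypothetical helper.

```python
from ipaddress import ip_address, ip_network

# Ranges as they might be recorded after a Whois lookup (illustrative).
known_networks = [
    ip_network("198.51.100.0/24"),
    ip_network("203.0.113.0/24"),
]


def find_network(ip: str, networks):
    """Return the recorded network containing ip, or None if there is none."""
    addr = ip_address(ip)
    for net in networks:
        if addr in net:
            return net
    return None


first = find_network("198.51.100.94", known_networks)
second = find_network("198.51.100.96", known_networks)
unknown = find_network("192.0.2.1", known_networks)  # no history: needs a lookup
```

Two different addresses that resolve to the same recorded range, as `first` and `second` do here, would share one history.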
Some networks have a bad reputation with me, earned through being recognized as the source of a LOT of spam over the years. If I recognize the network as a known producer of spam, that gets recorded. Similarly, some countries generate more spam than others (and not a lot of web traffic to the site) – if I recognize the country of the network as being in my top-five unprotected countries for spam, that gets recorded as well. Both of these factors induce harsher treatment of the ‘crime’.
A few countries generate significant levels of traffic to the website. These are considered “protected” countries, in which a higher tolerance level for spam is justified by the traffic. My top-five traffic sources are the US, Canada, the UK, Australia, and France, and those constitute the ‘protected’ list at the moment. If the network from which the spam originated comes from one of those, that gets recorded as well.
There are some sources that I know to be internet providers to ordinary people (not just businesses), or to be more likely to have people interested in RPGs (which is what the website is about). That gets taken into account. Finally, there are some types of network servers that, while useful, are more prone to abuse because of the anonymity that they are designed to provide, called TOR networks. If the description of the network indicates such an origin, that gets taken into account as well.
All of these factors are weighted and weighed up according to a numeric calculation and compared with an “action trigger” that indicates a sufficiently serious level of spam that action is warranted. At the moment, that trigger is set to a value of “4”. Over time, it will increase, as will the weighting given to recidivists.
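The shape of that calculation can be sketched as follows. Only the trigger value of 4 comes from the article; every weight below is a made-up placeholder, as are the names `Evidence`, `severity` and `verdict`. The real weightings are not published here.

```python
from dataclasses import dataclass

ACTION_TRIGGER = 4  # the trigger value quoted above

# Hypothetical weights, chosen only to make the example work.
WEIGHTS = {
    "known_spam_network": 2,
    "top_five_spam_country": 1,
    "anonymizing_network": 1,
    "protected_country": -1,
    "likely_real_readers": -1,
}


@dataclass
class Evidence:
    spam_count: int
    spambot_count: int
    known_spam_network: bool = False
    top_five_spam_country: bool = False
    anonymizing_network: bool = False
    protected_country: bool = False
    likely_real_readers: bool = False


def severity(e: Evidence) -> int:
    """Add up the counts and flags (spambot-pattern spam counts twice here)."""
    score = e.spam_count + e.spambot_count
    for flag, weight in WEIGHTS.items():
        if getattr(e, flag):
            score += weight
    return score


def verdict(e: Evidence) -> str:
    """'block' once the trigger is reached, otherwise a 'warning'."""
    return "block" if severity(e) >= ACTION_TRIGGER else "warning"
```

Under these placeholder weights, a single spambot comment from an unknown network scores 2 and earns a warning, while the same comment from a known spam network scores 4 and reaches the trigger.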
If this target isn’t reached, the “judge” lets the “offender” off with a warning and a suspended sentence. If no further “crime” is committed in the time frame of that suspended sentence – which is calculated automatically based on the specifics logged – then it simply becomes part of the “criminal history” of that network, and – in due course – gets expunged completely, so that it will no longer be taken into consideration in future cases. The standard period at the moment is five days, but that can and does vary quite a lot. It could be as little as two – or as many as 6. Getting to a week generally means that the “crime” is sufficiently serious to merit immediate “incarceration”, i.e. blocking.
Second Offenses
Let’s say that the next day, a piece of spam arrives from 345.367.400.896. This is a different IP belonging, let’s say, to the same network as 345.367.400.894, which is serving its suspended sentence.
Right now, in this part of the process, I’m not interested in individuals; I’m hunting for criminal organizations. These are both part of the same network, and there’s no instruction to give individual treatment to them – the network again goes before the judge, and part of the “evidence” is the suspended sentence. Between the two crimes, there might or might not be enough guilt to warrant “jail time”. It’s about 50/50 and depends on the exact specifics. If there is a sentence, the whole network serves it, and it could be anything from 2 to 7 days. Three or four are the most likely.
Jail Time
So the network goes to jail, i.e. gets blocked. The sentence is actually divided into two equal halves – time to be served, and time to be out on parole. In the first half, the network is blocked, which means that traffic originating from that network is counted. Some of it will be spam, some of it might not be. That information gets recorded when the network is “released” and compared with the number of spam likely to have been received based on the tally which got the network “locked up”. This determines which type of parole is served – green or blue.
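The date arithmetic behind the two halves is simple enough to sketch. This assumes, as the 73-day example later in the article suggests, that the parole period is the same length as the block that precedes it; `sentence_schedule` and the dates are illustrative only.

```python
from datetime import date, timedelta


def sentence_schedule(blocked_on: date, block_days: int):
    """Return (release date, end of parole) for a block of block_days.

    Assumes the parole period is as long as the block itself.
    """
    release = blocked_on + timedelta(days=block_days)
    parole_ends = release + timedelta(days=block_days)
    return release, parole_ends


# A three-day block starting on an arbitrary example date.
release, parole_ends = sentence_schedule(date(2016, 11, 1), 3)
```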
“Green” parole
Based purely on the number of spam comments that got the network locked up, this tests whether or not the network should be treated as individuals instead of collectively. It is a significant step towards “Do Not Block” status. Green Parole means that for a specific period of time, the network is not blocked, no matter how much spam gets received, but that records are kept for each individual IP address within the network. At the end of the time period (or sooner if significant levels of spam are received), the network may receive “block individually” status, or it may even receive “do not block” status, though that’s rare. Or it may fail, indicating that virtually all the traffic that was blocked was in fact spam, leading to “green denied” status. Or it may simply get thrown back into the general population, but that is also rare.
It’s all about the pattern and quantity of spam received. The network is placed under a microscope and treated according to the results. If all the spam appears to be coming from only a small handful of IP addresses within the network, that’s a “block individually” result. If there’s a whole lot of spam coming from a lot of different IPs, that’s Green Denied – a permanent restriction to the general population which leaves the network vulnerable to being classified as “block permanently”.
Green Parole is the equivalent of being in witness protection so far as the anti-spam protocols are concerned. It’s a fresh start and a chance to weed out the bad apples. What it really means, though, is that there has been enough web traffic over the projected spam levels to permit statistical analysis of the member IPs. Right now, there are 7 networks on “Green Parole”.
“Blue” parole
Blue Parole is a lot less forgiving. For the second half of the sentence, the network is “released on parole” (unblocked). If any spam arrives during this time, the network not only goes back to jail for the other half of the sentence before commencing a new parole period, the fresh crime is added to the sentence as well. Right now, there are 28 networks on Blue Parole – which indicates either that the number of blocked “hits” was in line with the expected levels of spam, based on the evidence when it was “locked up”, or that there were no hits at all (so there is no data to use for an analysis).
What’s more, past periods of blue parole are counted, and judged quite harshly. That “green denied” network that’s blocked until December 20? It had served 8 periods of blue parole. If it re-offends in the 73-day parole period that follows its jail term, it will be “locked up” for at least 133 days. Since this is longer than I am willing to fuss over the calendar involved (more than 3 months), it will instead be “life imprisonment without the possibility of parole” – also known as Block Permanently.
A criminal record
Networks don’t receive a criminal record for life. Assuming that they make it through their blue parole, they enter gray status. Gray means that they are waiting for the statute of limitations to run out on their original crime. If jail time was not awarded to a network because they behaved themselves while sentence was suspended, they also have this status. The current statute of limitations is the period of the original sentence plus 14 days. During this time, anything on their record counts for 1/2 when a “new crime” is committed. This is just enough, in most cases, to turn what would otherwise have been a suspended sentence into new jail time.
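A small sketch of the statute of limitations, under one stated assumption: that the clock starts on the day the record enters this waiting status (the article does not say exactly when it starts). The function names and example values are hypothetical; the 14 days and the half weighting are the figures given above.

```python
from datetime import date, timedelta

EXTRA_DAYS = 14    # added to the original sentence length
AGED_WEIGHT = 0.5  # old records count for half


def limitation_expiry(status_began: date, sentence_days: int) -> date:
    """Date on which the record is expunged (assumed start: status_began)."""
    return status_began + timedelta(days=sentence_days + EXTRA_DAYS)


def effective_history(score: float, today: date,
                      status_began: date, sentence_days: int) -> float:
    """How much an old record counts when a new 'crime' is judged."""
    if today >= limitation_expiry(status_began, sentence_days):
        return 0.0  # expunged: no longer taken into consideration
    return score * AGED_WEIGHT


# A four-day sentence that entered the waiting status on 1 November:
expiry = limitation_expiry(date(2016, 11, 1), 4)
```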
Individual Judgment
Our hypothetical example had two spam arriving over a four day period. Let’s say that while it was blocked, 15 blocked hits were recorded by the system. While it’s possible that this was another 15 spam attempts, it doesn’t seem very likely – not in a 3- or 4- day period. It’s certainly enough to permit statistical analysis, so our example network goes into Parole Green. For the duration of the parole period, spam is counted but triggers no action – until it reaches a level great enough to account for those 15 hits. If it gets to the end of its parole period without enough spam to account for the traffic recorded, it earns individual treatment (status 2), also known as “Block individually”.
That means that each IP number within the network is treated as though it were a network in its own right. Since all the criminal activities to date have been logged against the network, and not the individual IP – they started as individual records but get conflated into a single record – they all get tossed out, and the network starts over with a clean slate.
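Reduced to its two most common outcomes, the Green Parole test is a single comparison. The sketch below is a deliberate simplification: it ignores the rarer results (“do not block”, or a return to the general population) and the per-IP pattern analysis, and `green_parole_outcome` is an invented name.

```python
def green_parole_outcome(blocked_hits: int, spam_during_parole: int) -> str:
    """Compare spam seen on parole with the hits counted while blocked."""
    if spam_during_parole >= blocked_hits:
        # The spam alone could explain every blocked hit.
        return "green denied"
    # Some of the blocked traffic was probably legitimate.
    return "block individually"


# The example above: 15 blocked hits, then only 2 spam during parole.
outcome = green_parole_outcome(blocked_hits=15, spam_during_parole=2)
```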
If you have an organization of as many as 33,554,432 individuals – and some networks have that and more – it doesn’t take many bad seeds to run up significant “jail time” when all their “criminal acts” are aggregated. It’s a lot harder for an individual to accumulate enough misdeeds in a short enough time period to permit analysis of them as an individual; many times, an IP might be locked up, paroled, released, and even have its sentences expunged through that statute of limitations, endlessly repeating this cycle of misbehavior.
But each time one is released, their information is checked on release, and if there is enough to make an assessment, it may earn a coveted “Do Not Block” status, indicating that its social probity – i.e. the internet traffic reaching the site from it – outweighs the spam that is received from it. If the one network earns enough of these and has no long-term inmates amongst its population, the entire network may be granted this status.
On the other hand, recidivism counts can accumulate; each jail term and parole period is longer than the last, and eventually the IP may be blocked for more than 3 months – earning it a “block permanently” status.
The Aging Process
After all the spam has been classified and treated according to its current status within the system, the final step is to roll forward the clock, dismissing any records that have aged beyond the “statute of limitations”, concluding parole periods, and so on. This also includes releasing from ‘prison’ any blocks that have expired, resetting the system ready for the next batch.
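In code, rolling the clock forward amounts to one pass over the records. This is a sketch under assumed field names (`expires_on`, `blocked_until`), not a description of the actual spreadsheet; lifting the block in Wordfence would still be a separate, manual step.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    network: str
    expires_on: date                      # statute of limitations runs out
    blocked_until: Optional[date] = None  # None means not currently blocked


def roll_forward(records, today: date):
    """Release expired blocks and drop records past their expiry date."""
    kept, released = [], []
    for rec in records:
        if rec.blocked_until is not None and rec.blocked_until <= today:
            released.append(rec.network)  # this block needs to be lifted
            rec.blocked_until = None
        if rec.expires_on > today:
            kept.append(rec)
    return kept, released


records = [
    Record("198.51.100.0/24", date(2016, 11, 30), blocked_until=date(2016, 11, 3)),
    Record("203.0.113.0/24", date(2016, 11, 2)),
]
kept, released = roll_forward(records, today=date(2016, 11, 3))
```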
Flaws and Weaknesses
No human process is without flaws, and this is no different. Its flaws and weaknesses, too, are largely reminiscent of the human institution of imprisonment.
First, I dislike the need to bias the results for known spam originators, both in terms of networks and nations, just as I dislike the use of racial profiling in criminal investigations. I do so simply because years of permitting the same thing to happen have shown that some people using those services or residing in those nations can’t be trusted, and it makes the system more prone to correctly processing and preventing the receipt of more spam.
Second, human error can and does happen. A network block can be reported as cleared but the block not actually removed, for example. Or I might misidentify someone by typing in the wrong IP address, resulting in the equivalent of wrongful imprisonment. That’s one of the reasons it was so important to have a statute of limitations and for all statuses to be subject to periodic review if site behavior changes.
Third, the system can break down in one of two ways: being flooded by too much spam to process, or as a result of not processing spam received at least daily and preferably two or three times a day.
Fourth, and finally, the entire technique is vulnerable to IP address spoofing, where the IP address that gets reported to me is false, a deception perpetrated by the spammer. There have been a few cases of this that I think I have detected (not those associated with the prestigious names I listed earlier, I must add). Briefly, when you block an IP address and more spam shows up supposedly from that IP address, I regard it as suspect in this respect, and treat it accordingly, discounting it in terms of judging the apparent network.
But my old methods were prone to the same failings, and lacked the safeguards built into the new one.
The Tools
All this is possible through a combination of three tools, two of them WordPress plugins. The first is Akismet, which learns to identify the spam and places it into a special “spam” folder, separating it out from real comments. While there was the occasional false positive in the first few months of use, for the remainder of the many years that Campaign Mastery has used it, there has been a 99.999% accuracy in that respect – perhaps one in one hundred thousand comments identified as spam are actually genuine. False negatives, situations in which Akismet is unsure, happen more frequently; but that’s exactly the way you would want it to be, when you think about it.
The second tool is the spreadsheet that I have constructed, and the procedures for using that spreadsheet. Almost all the operations are manual in nature, but some calculation gets performed automatically, especially the dates on which conditions expire. While it would be convenient for more of this operation to be automated, it’s not a huge deal – and does give me a greater level of control over the process.
The third tool is the one which provides the functionality to implement the judgments made using the spreadsheet. This is done with a plug-in named Wordfence, which is excellent in what it does and getting better all the time.
We’ve been using both of these for virtually as long as Campaign Mastery has been in operation, and in that time have processed 1,299,127 pieces of spam. Appalling though that number is, I would estimate that the combination has prevented at least three-to-five times that much spam from even reaching the site – call it 4 million pieces of spam over the last 8 years or so. Note that this number is higher than the spam levels reported earlier because there have been times when hundreds or even thousands of spam were arriving a day, as I described the last time I wrote about this subject.
Half a million spams a year. 1370 a day, average. Even if each one took only ten seconds to process, that’s still close to 4 hours every day. In fact, it often takes 20 or 30 seconds to assess each piece of potential spam – so if I had to operate without these tools, I would spend about 10 hours a day doing nothing but handling spam – on average.
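For anyone who wants to check the arithmetic, here it is spelled out in Python, starting from the rough estimate of 4 million spam over 8 years. The variable and function names are just for the example.

```python
total_spam = 4_000_000  # the rough estimate given above
years = 8

per_year = total_spam / years  # half a million
per_day = per_year / 365       # roughly 1370


def hours_per_day(seconds_each: float) -> float:
    """Daily handling time if every item takes seconds_each seconds."""
    return per_day * seconds_each / 3600


for seconds in (10, 20, 30):
    print(f"{seconds}s each: {hours_per_day(seconds):.1f} hours a day")
```

At 10 seconds each this comes to about 3.8 hours a day, and at 20 to 30 seconds it lands between roughly 7.6 and 11.4 hours, which is where the figure of about 10 hours comes from.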
In fact, one of the triggers that led to the creation of the new anti-spam protocols was spam handling routinely exceeding an hour a day, up from about 10 minutes. Under the new protocols, it still takes about an hour – but it’s a far more productive hour, with far greater confidence in what is being done, and what isn’t.
Protocol Impact
To date, I haven’t heard from anyone complaining that the site was unreachable. That’s good. Any disruptions in that respect are temporary, unless the source network is being really seriously abused. And it’s cut the Spam being received from that 150-350 daily tally to between 20 and 35 a day, and sometimes less – with the confidence that minimal impact is being experienced by the real visitors to the site. That is a ten-fold reduction, and that ratio is getting better all the time.
The End Goal
One of the main reasons why it’s a lot of work is because networks are being given every opportunity to deal with their own problems, i.e. to “reform” themselves. Networks and IPs have to be dealt with, time and time again. But there is light at the end of this particular tunnel, and it’s not an oncoming train.
The end goal is to have the entire internet classified into one of the two extremes – “block permanently” or “do not block ever”. When that happens, I will happily blacklist the real offenders, secure in the knowledge that genuine visitors to the website won’t be affected. And I can put the headache of spam behind me – so far as Campaign Mastery is concerned.
The Bigger Picture
There’s only one fly in that prescription: the fact that none of this is doing anything to actually stop the proliferation of spam at the source, and the potential for that continual flood to be a vector for more significant harm. Spam is, or at least can be, a conduit of evil – or, at the very least, the spoor of such a conduit, the visible manifestation of compromised network security. Last Friday showed what that can mean for all of us. It does me no good to be sitting in a protective cocoon if the internet itself goes down, in whole or in part.
What I have devised is a way to weed the spam producers out from (most) of the legitimate web traffic to my site. I can’t fix the entire internet, but I can try to keep my little corner of it clean – and if the spam producers stop getting the results they want, maybe they’ll stop wasting their time with it. Okay, maybe that’s too much to hope for.
But it would be a start if those who could would take spam a little more seriously – and act on that.
The Gaming Connection
I promised at the start of the article that there would be a gaming connection to all this, and here it is.
This is a practical example of something that I have been extolling in the pages of Campaign Mastery for quite some time – the power of analogy.
While I didn’t start out to model the new anti-spam protocols on the generic criminal-justice system, it was only when I grasped the similarities and consciously looked for other applications of that analogy that refinements like the “statute of limitations” revealed themselves. In fact, about half of the new spam protocol was a result of “analysis by analogy”.
I used this same principle to construct an entirely new view of psionics some time back (it was part of the Examining Psionics series back in 2010). And I try to apply it whenever I am trying to solve something complicated that I don’t fully grasp yet.
Not only does it illuminate aspects of the situation that I hadn’t thought of, not only does it give me a handle on understanding the things that I don’t understand, but it can actually suggest solutions to parts of the problem that I wasn’t even aware of. In fact, that’s how you tell that you have an illuminating analogy – if it doesn’t do at least one of those, and preferably all three, look for another one.
So think about that the next time you have a complicated situation to analyze – whether it’s trying to figure out what a villain’s grand scheme is, or how a character will react to an unusual situation, or trying to get a handle on some house rules (or just understand the official ones).