3.15.2006

The Horror: Fooling Thunderbird's Spam Filter with Stephen King

I really like Thunderbird, the Mozilla mail client, but its spam filtering is not particularly useful. I keep getting the same luxury watch and prescription drug emails over and over, and maybe 5% of them are actually filtered, even though I have been training the Bayesian "learning" filter for months.

The spammers seem to be fooling it by including invisible text of a non-spam-sounding nature. In the one I looked at this morning, at least, it was a sequence of rarely appearing words ("operatic hermaneutic escherichia") followed by some text from Misery:

With Annie Wilkes that is a question which has no sane answer. Being such a straight arrow was part of the reason for this amazing fecundity, but Annie herself was a bigger one. "You don't want to write my book and so you're making up tricks not to start."
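
To see why padding like this might work, here is a rough Python sketch of a naive Bayesian combiner. It is not Thunderbird's actual code; the per-token probabilities, the 0.4 default for unknown words, and the function name are all made up for illustration.

```python
import math

# Hypothetical per-token spam probabilities; a real filter would learn
# these from the mail you mark as junk or not junk.
TOKEN_SPAM_PROB = {
    "rolex": 0.99, "replica": 0.98, "pills": 0.97,
    "question": 0.10, "answer": 0.10, "reason": 0.10,
    "book": 0.08, "write": 0.12,
}
UNKNOWN_PROB = 0.4  # assumption: mildly innocent default for unseen words


def spam_score(tokens):
    """Combine per-token probabilities into one spam probability (naive Bayes)."""
    log_spam = 0.0
    log_ham = 0.0
    for token in tokens:
        p = TOKEN_SPAM_PROB.get(token.lower(), UNKNOWN_PROB)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # prod(p) / (prod(p) + prod(1 - p)), computed in log space
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


plain = ["rolex", "replica", "pills"]
padded = plain + ["question", "answer", "reason", "book", "write"]

print(spam_score(plain))   # very close to 1
print(spam_score(padded))  # noticeably lower
```

With these made-up numbers, five innocent-looking words are enough to pull the score below a 0.9 cutoff (also an assumption), even though the three spammy words are still there.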


I have to say this is pretty clever, in that it's not obvious to me that there's a good solution, short of filtering any HTML message in which more than a certain percentage of the text is "hidden" (figuring that out would be a little tricky, as they vary the font color within a range of "invisible"). And even if Thunderbird did that, there are probably lots of other places to hide text that doesn't scan like spam. Has Bayesian filtering failed? What say you, Paul Graham? Or is Thunderbird's implementation just broken because it doesn't pay enough attention to my "good" mail and is too worried about false positives, that is, about filtering things I actually want to see?
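
For what it's worth, the "percentage of hidden text" idea could be sketched in Python along these lines. This is only a guess at an approach, not anything Thunderbird does: it assumes a white background, only understands six-digit hex colors in a font color attribute or an inline style, and the tolerance value and all the names are invented.

```python
import re
from html.parser import HTMLParser

BACKGROUND = (255, 255, 255)  # assumption: message is shown on white
TOLERANCE = 40                # assumption: per-channel distance that still reads as invisible
VOID_TAGS = {"br", "img", "hr", "meta", "input", "link"}

HEX_COLOR = re.compile(r"#([0-9a-fA-F]{6})")
STYLE_COLOR = re.compile(r"(?<![\w-])color\s*:\s*(#[0-9a-fA-F]{6})")


def parse_color(value):
    """Return an (r, g, b) tuple for a #rrggbb value, or None."""
    match = HEX_COLOR.search(value or "")
    if not match:
        return None
    digits = match.group(1)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def is_hidden(color):
    """True if every channel is within TOLERANCE of the background."""
    return all(abs(c - b) <= TOLERANCE for c, b in zip(color, BACKGROUND))


class HiddenTextCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []        # (tag, hidden) for each open element
        self.hidden_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        color = parse_color(attrs.get("color"))
        if color is None:
            match = STYLE_COLOR.search(attrs.get("style") or "")
            color = parse_color(match.group(1)) if match else None
        if color is None:
            # No color of its own: inherit from the enclosing element.
            hidden = self.stack[-1][1] if self.stack else False
        else:
            hidden = is_hidden(color)
        self.stack.append((tag, hidden))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        count = len("".join(data.split()))  # non-whitespace characters
        self.total_chars += count
        if self.stack and self.stack[-1][1]:
            self.hidden_chars += count


def hidden_fraction(html):
    """Fraction of non-whitespace text drawn in a near-background color."""
    counter = HiddenTextCounter()
    counter.feed(html)
    counter.close()
    if counter.total_chars == 0:
        return 0.0
    return counter.hidden_chars / counter.total_chars


sample = ('<p>Buy watches</p>'
          '<font color="#fefefe">With Annie Wilkes that is a question</font>')
print(hidden_fraction(sample))  # 0.75
```

A filter could then treat anything above some cutoff as suspicious, though this misses stylesheets, named colors, tiny fonts, and every other hiding place a spammer might think of.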
