It was brought to my attention the other day that there are some concerns about e-mail addresses published on our college’s web site and the effect they have on spam. It turns out the filters here run through about 10,000,000 emails a day, about 7% of which are passed on as actual, legitimate messages. We are not a huge campus, but I’m going to guess that many of you would see a similar ratio. Naturally, this has prompted a conversation about obfuscating e-mail addresses. We’ll set aside the “closing the gate after the horse got out” metaphor for now, because these techniques can still help prevent spam from hitting new addresses, so at least we can lighten the load for our new users.
CSS (Code Direction)
This technique came to me by way of Silvan Mühlemann’s blog. Of all the methods, I think this is both the easiest and the coolest, and it works in Firefox and IE6. The problem is, it’s also the worst. It relies on the idea that you can take a string and, with a CSS property, reverse the flow of the text inside the selected element so that it becomes readable. So, when you write the address, you write moc.elpmaxe@liame, and CSS reverses it to display email@example.com. This is bad for two reasons. First, you can’t make it clickable. The CSS only works on content within the selector, so you can’t manipulate an href, and obviously putting the email in as a plain href is as bad as having it in the page normally in the first place. Secondly, it breaks copy + paste, because copying the text copies from the source, which is backwards, so pasting produces the original moc.elpmaxe@liame. If you make the link not clickable, you darn sure better not break copying. The frustrating part is that Mühlemann’s blog reported a 0% spam rate over a year and a half on an address encoded in this manner, so it appears to be great at stopping spam.
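To make the mechanics concrete, here is a minimal sketch that generates this kind of markup. It assumes the reversal is done with the CSS properties unicode-bidi: bidi-override and direction: rtl applied inline; the function name reversed_address_span is my own invention for illustration.

```python
from html import escape


def reversed_address_span(address: str) -> str:
    """Return a span whose source text is the address written backwards.

    The inline style asks the browser to lay the characters out right to
    left, so a reader sees the address in its normal order.
    """
    backwards = address[::-1]  # reverse the string character by character
    style = "unicode-bidi: bidi-override; direction: rtl;"
    return f'<span style="{style}">{escape(backwards)}</span>'


markup = reversed_address_span("email@example.com")
print(markup)
```

Note that the page source only ever contains the backwards string, which is exactly why copy + paste hands the visitor moc.elpmaxe@liame.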
CSS (display:none)
This faces pretty much all the same problems as the other CSS technique, but instead relies on a hidden span inside the email address, which breaks up the address in the source while leaving it human readable: email@<span style="display:none;">I hate spam</span>example.com. A user can read the address without issue, but still can’t copy it, and you still can’t make it a link.
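As a rough sketch of the idea, the snippet below builds that markup and then shows what is left when the tags are stripped without honoring the CSS. The helper name hidden_span_address and the tag-stripping regex are assumptions of mine, standing in for whatever a simple scraper might do.

```python
import re
from html import escape


def hidden_span_address(address: str, decoy: str = "I hate spam") -> str:
    """Put a display:none span with decoy text right after the @ sign."""
    local, domain = address.split("@", 1)
    hidden = f'<span style="display:none;">{escape(decoy)}</span>'
    return f"{escape(local)}@{hidden}{escape(domain)}"


markup = hidden_span_address("email@example.com")
print(markup)

# A scraper that strips tags but ignores CSS keeps the decoy text.
stripped = re.sub(r"<[^>]+>", "", markup)
print(stripped)
```

The stripped text contains the decoy in the middle of the address, so anything that reads the source rather than the rendered page ends up with a broken address.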
Character Entity Encoding
This is the practice of taking an email address and encoding all the characters into HTML entity values, so email@example.com becomes &#101;&#109;&#97;&#105;&#108;&#64;&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;. This is better than having an email in plain text (yielding a 62% decrease in spam volume over plain text), and it allows you to make it clickable. However, it’s straightforward enough that it comes in second behind plain text as the easiest to get past, though the decrease in spam volume was fairly significant.
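Here is a small sketch of full entity encoding, assuming decimal numeric entities (named or hexadecimal entities would work the same way). The function name encode_all_entities is just for illustration.

```python
from html import unescape


def encode_all_entities(address: str) -> str:
    """Replace every character with its decimal HTML entity."""
    return "".join(f"&#{ord(ch)};" for ch in address)


encoded = encode_all_entities("email@example.com")

# The browser decodes entities in both the href and the link text,
# so the link stays clickable and readable.
link = f'<a href="mailto:{encoded}">{encoded}</a>'
print(link)

# Decoding takes one standard-library call, which hints at why this
# method is easy for a crawler to get past.
print(unescape(encoded))
```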
A similar but alternative method, which appears to reduce spam load by 92% over plain text, is to encode only the “@” and “.” as entities, producing a mailto like email&#64;example&#46;com. This is probably because the crawlers are set to ignore single occurrences of encoded entities, and with them there, the address doesn’t match an email pattern (at least until they get smart enough to match this pattern).
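The sketch below illustrates that guess about crawlers. The regular expression is a deliberately naive pattern I made up to stand in for a harvester; real crawlers may well behave differently.

```python
import re
from html import unescape

# Hypothetical, deliberately simple address pattern for a harvester.
NAIVE_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def encode_separators(address: str) -> str:
    """Encode only the @ and . characters as decimal entities."""
    return address.replace("@", "&#64;").replace(".", "&#46;")


encoded = encode_separators("email@example.com")
print(encoded)

# The raw source no longer looks like an address to the naive pattern...
print(NAIVE_EMAIL.search(encoded))

# ...but a crawler that decodes entities first would still find it.
print(NAIVE_EMAIL.search(unescape(encoded)).group(0))
```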
Both of these methods can be considered viable for accessibility purposes, and they make a big enough impact that one could seriously consider employing them full time.
Insert Comments
Inserting comments results in addresses like email<!-- -->@<!-- @ -->example<!-- -->.<!-- . -->com. This, however, fails the test of making the address clickable. It is more effective than fully character encoding the address, but less so than selectively encoding the “@” and “.”, receiving about 444% more spam than that method. Comments decrease spam by 11% over full entity encoding of the address.
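A quick sketch of the comment technique follows. The helper insert_comments is my own naming, and the comment-stripping step is only an assumption about how a slightly smarter harvester could undo it.

```python
import re


def insert_comments(address: str) -> str:
    """Wrap the @ and each domain dot in HTML comments."""
    local, domain = address.split("@", 1)
    labels = domain.split(".")
    return local + "<!-- -->@<!-- @ -->" + "<!-- -->.<!-- . -->".join(labels)


markup = insert_comments("email@example.com")
print(markup)

# Removing comments is a one-line substitution, so the address is only
# hidden from crawlers that never bother to strip them.
recovered = re.sub(r"<!--.*?-->", "", markup)
print(recovered)
```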
Use “name AT site DOT com”
A List Apart Method
Rather than explain this, go read their tutorial. It’s very clever, and is probably the best alternative out there, but only if you are using PHP and can write some custom .htaccess URI rewrite rules.
So, given this boatload of information, where does it leave us? I think many of us in educational circles can use A List Apart’s system for any of our emails that show up in dynamically generated listings. Email addresses added to a page by an editor or such would have to be handled manually though (you can get around this with some additional work using Apache’s mod_substitute). My solution is a combination of techniques. Our CMS is Java based, so A List Apart’s methodology doesn’t exactly work. But what I can do is combine ROT13 encoding with a <noscript> alternative that incorporates an image generator and a character-encoded link to make it clickable. This would create an image representation of the address with proper alt text, so that screen readers can still interpret the address and users can still click it. I think this is a good blend in my case. There is a URIRewrite application on our server as well that would allow me to do some of the A List Apart system in the future. The point being, you don’t have to use only one solution; you can combine different options to try to get the best of every world. But there is no magic bullet if you are trying not to break accessibility.
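To give a feel for how the pieces of that combination could fit together, here is a rough server-side sketch in Python rather than the Java our CMS would actually use. Everything specific in it is an assumption for illustration: the data-rot13 attribute, the rot13-mail class, and the /email-image?id=42 image generator URL are all hypothetical, and the browser-side script that would decode the ROT13 value is not shown.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate letters by 13 places; applying it twice restores the text."""
    return codecs.encode(text, "rot_13")


def entity_encode(text: str) -> str:
    """Replace every character with its decimal HTML entity."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def obfuscated_markup(address: str, image_url: str) -> str:
    """Build a ROT13 placeholder plus a <noscript> image fallback.

    A page script (not shown, hypothetical) would read data-rot13,
    decode it, and fill in the link for visitors with JavaScript.
    """
    scrambled = rot13(f"mailto:{address}")
    encoded = entity_encode(address)
    return (
        f'<a class="rot13-mail" data-rot13="{scrambled}"></a>'
        f'<noscript><a href="mailto:{encoded}">'
        f'<img src="{image_url}" alt="{encoded}"></a></noscript>'
    )


# Hypothetical image generator URL.
markup = obfuscated_markup("email@example.com", "/email-image?id=42")
print(markup)
```

The fallback keeps the address available to screen readers through the entity-encoded alt text, at the cost of falling back to the weaker entity-encoding protection for visitors without JavaScript.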
For many of us, the horse may already be out of the gate, so closing it now might not do much. But we can at least try to ease the load on newly published addresses, and make the spammers’ job harder (and make email admins less likely to gripe at you). There’s no good excuse for handing over emails as plain text when we have tools to easily avoid it. And ultimately, if a human can read it, it’s inevitable that spammers will crack through it. For the time being, though, that process isn’t cost effective for them, so we might as well take advantage of it.