I am sure that everyone is aware of Spam, or "Unsolicited Commercial Email" as we are supposed to call it. In fact it doesn't bother me too much unless it starts to get offensive. I can hit the delete key as fast as it arrives and, with plenty of experience, I can recognise most of them from the titles. However it does bother some people and, as a lot of the addresses on this site belong to other people, I make every effort to protect them from it.
What is spam
There are a number of different types of spam which it is worth distinguishing.
- There is the genuine Unsolicited Commercial Email. These are put out on contract by genuine commercial companies who believe, rightly or wrongly, that this is a legitimate form of advertising. Although they are often unsolicited, it is possible that you may have originally accidentally “opted in” to receive them. The problem is that these lists are sold on to other companies who have no connection with the original one. The emails are often the graphical type with product images, genuine telephone numbers (albeit in the USA) and something real to sell. Unfortunately, among this type you will also get a few promoting pornography. The sender address will be genuine, often that of the bulk mail company, and the web addresses will work. It has been argued that employing the “click here to unsubscribe” should work for these because they are honest businesses. In practice I have found that this is rarely the case, and there is the alternative opinion that replying to them just confirms that it is a valid address. I don't know one way or the other.
- A second type are those promoting illegal activity. Get rich quick, non-prescription Viagra, passwords for porn sites and such like. They are usually plain text and badly written. The sender address will be fake, the web links will fail (possibly because the site has been pulled by the ISP before you get there). It is never worth replying to these. More than likely the mail will bounce. If it does get through then your address will be added to a list of higher grade confirmed addresses which they can sell for more money. The hardest thing with these is trying to guess the motive for sending them as they don’t seem to fulfil any purpose.
- There is a broad band of overlap between these two types. Genuine plain text, illegal anonymous graphical and all shades between.
- Another type worth mentioning is not really bulk email. They are generated by viruses that other people have caught. What these often do is send an email to everyone in the victim’s address book, which could include you. The best thing to do with these is delete them immediately. There is nothing you can do about it, and it is not even worth replying to let your friend know that they are infected because that address may have been faked as well.
- Finally there are chain mails. No one knows where they start from, but you received it directly from a well meaning friend. There are jokes, prayers, virus hoaxes, scams and the traditional “send money to the person at the top of the list” type. The only thing you can do is carefully and considerately educate the person that sent it to you.
The bulk mailers of both the main types (unless they buy the lists) use what is called a “harvesting spider” to collect email addresses from, amongst other places, web pages. This is a program which follows links from one site to the next looking for the addresses on people's home pages. There are two basic types. There is the raw spider which does all the link following for itself. These are generally employed by those mailers of the second type above who don't care what the addresses are. The other type employs commercial search engines to do the choosing work and is more selective. It can get lists of addresses for people interested in a particular topic by employing the keyword matching facilities of the search engine. Some harvesters employ both techniques.
The use of this technique has recently (March 2003) been confirmed by a research project carried out by the Center for Democracy & Technology. See their online report on the subject.
The objective of this page is to help you to make it as hard as possible for them. This is done by employing every means possible to disguise the fact that a page contains an email address whilst making it usable for the purpose for which it was intended—enabling your friends and customers to contact you.
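As a rough illustration of what you are up against, the sketch below shows the kind of pattern matching a simple harvester might apply to the source of a page. It can also be used to check your own pages for addresses left in plain sight. The function name findExposedAddresses and the regular expression are my own assumptions for illustration, not taken from any real harvesting program.

```javascript
// Sketch of the pattern matching a simple harvester might apply to a
// page's source. The regular expression is an illustrative assumption.
function findExposedAddresses(html) {
  var pattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  var found = html.match(pattern) || [];
  // Remove duplicates while keeping first-seen order.
  return found.filter(function (addr, i) {
    return found.indexOf(addr) === i;
  });
}

var page = "<p>Write to <a href='mailto:account@domain.com'>" +
           "account@domain.com</a></p>";
console.log(findExposedAddresses(page)); // the plain address is found
```

Anything this simple test can find, a real harvester will certainly find too. Passing it is no guarantee, since a harvester may decode entities or run scripts first.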
Techniques to avoid getting it
I have searched around the web and discovered several techniques which I will tell you about.
- Disguise the email address using “character entities,” especially the @ (at) sign, e.g. coding the address as account&#64;domain.com. The browser will convert the &#64; back to the @ sign and it will all look OK to the viewer. I suspect that most of the harvesters are wise to this one by now and will not be fooled (it is only a couple of lines of code to convert them back to the real ASCII text). There are some “encryption” programs available (even for sale!) which will do this to every character in the address. At the moment the technique seems to be effective but for how long?
- Make sure that you have a separate “contacts” web page which contains the email addresses, and code that page with a
<meta NAME="ROBOTS" CONTENT="NOINDEX" />
tag in the head. This is thought to be pretty effective. The commercial search engines will ignore the page as requested, so the harvesters that go by this route will not see the addresses. Many stand-alone spiders will also honour this request, not out of any consideration for you but because there are people on the web who hate spam with a vengeance and have invented special web pages designed to kill harvesting robots. These pages can have serious consequences for genuine search engines as well, so their authors keep those engines away by using the meta tag. The effectiveness of this is still being debated by the experts.
- Generate the email address using JavaScript or even with a server-side script. This technique means that the full email address never appears in the source code in one piece (or at all with the server-side option). It is written into the document at execution/display time, being made up from a combination of parts. That is the technique that I employ and it is fully described later.
- Use a web form. These can vary from the simple, which employ the user’s email client to send the mail, to the complex, which mail directly from your server. The former do not seem to offer much advantage because, at some stage, you still have to generate the email address to plug into the message. The latter would be very effective but are beyond the capabilities of many people and require facilities not often provided by web hosts. Between the two are “Guest Books” which may be effective but I haven’t had a chance to investigate them. They would only be suitable for handling messages back to the site owner and are prone to vandalism, especially if all the messages and replies are displayed online.
- Manually disguise your email address by inserting some humanly understandable rubbish into it. This can be quite effective, though your correspondents will have to type the address in by hand. There is no “click here” option using this method. The harvesters are wise to some of the more obvious techniques such as account@nospam.domain.com so try to make up something for yourself. I think that they also know about (at) instead of @ as I used that for a while but still got junk which I could trace back to that source. Also avoid causing pain to either your own or another ISP. There is a real domain called nospam.com which is virtually useless now, and account.nospam@domain.com causes your own ISP to have to process an unknown account name.
- Another suggestion is to create a graphic of your email address. It has the same problem of no “click here” option but is virtually unhackable by mechanical means.
- Regularly change your email address. This is a real pain but, once you are on the spam address lists, can be the only solution. To avoid having to tell everyone in your address book that it has changed, follow these simple rules.
- Keep your real email address very private.
- Make use of the alias facility (most ISPs allow four or five to cater for members of the family). Create an alias which you give out to your personal correspondence list. Never use it for mailing lists, in your signature block, on web sites or on bulletin boards.
- Make another alias for use in your signature block and also set that as your reply-to address (see the mail client help). If it gets spam infested then change it.
- (Optional) Make a third alias for web pages etc.
- (Optional) If you have access to unlimited email aliases then use a different one for every time you are asked to fill in a form and keep a note of them. That way you will discover who is selling your soul. I no longer advocate this method and suggest you turn this feature off. The spammers now use a technique called a “dictionary attack” where they will work through a huge number of names before the @ on your domain looking for good ones. If you have this facility for accepting any mail addressed to your domain you will get a huge amount of spam.
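To make the first technique in the list concrete, here is a minimal sketch of the character entity disguise, together with the couple of lines a harvester would need to undo it. The function names toEntities and fromEntities are my own, chosen for illustration.

```javascript
// Encode every character of a string as a decimal numeric entity.
function toEntities(text) {
  return text.split("").map(function (ch) {
    return "&#" + ch.charCodeAt(0) + ";";
  }).join("");
}

// The "couple of lines" a harvester needs to reverse the disguise.
function fromEntities(text) {
  return text.replace(/&#(\d+);/g, function (match, code) {
    return String.fromCharCode(parseInt(code, 10));
  });
}

var disguised = "account" + toEntities("@") + "domain.com";
console.log(disguised);               // account&#64;domain.com
console.log(fromEntities(disguised)); // account@domain.com
```

This shows why the disguise on its own only defeats the laziest harvesters, and why it is worth combining with the other techniques.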
Sorting it out when it arrives
With the best will in the world, you are not going to be able to avoid getting some. This means that you need to filter it out and quarantine it when it arrives so that your eyes are not offended and you are not tempted to open it. The best software that I have discovered for sorting your mail into “spam” and “ham” (and any other categories if you like) is POPFile, an open source freeware product. There are possibly other good products but I think those using Bayes' theorem to do the filtering stand the best long-term chance. They take a little while to teach good from bad (rather like children) but do a better job in the long run.
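To give a feel for how this kind of filter learns, here is a much simplified naive Bayes sketch. It is not POPFile's code. The BayesFilter name, the tokenising rule and the add-one smoothing are all my own assumptions, and a real filter does a great deal more.

```javascript
// Minimal naive Bayes sketch (illustrative only, not POPFile's method).
function BayesFilter() {
  // Object.create(null) avoids clashes with built-in property names.
  this.counts = { spam: Object.create(null), ham: Object.create(null) };
  this.totals = { spam: 0, ham: 0 };     // words seen per bucket
  this.messages = { spam: 0, ham: 0 };   // messages trained per bucket
  this.vocab = Object.create(null);      // every distinct word seen
}

BayesFilter.prototype.tokenize = function (text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
};

// Teach the filter that a message belongs in "spam" or "ham".
BayesFilter.prototype.train = function (text, bucket) {
  var words = this.tokenize(text);
  for (var i = 0; i < words.length; i++) {
    var w = words[i];
    this.counts[bucket][w] = (this.counts[bucket][w] || 0) + 1;
    this.totals[bucket]++;
    this.vocab[w] = true;
  }
  this.messages[bucket]++;
};

// Log of (prior for the bucket) times (smoothed word likelihoods).
BayesFilter.prototype.score = function (words, bucket) {
  var allMessages = this.messages.spam + this.messages.ham;
  var vocabSize = Object.keys(this.vocab).length;
  var logProb = Math.log(this.messages[bucket] / allMessages);
  for (var i = 0; i < words.length; i++) {
    var seen = this.counts[bucket][words[i]] || 0;
    logProb += Math.log((seen + 1) / (this.totals[bucket] + vocabSize));
  }
  return logProb;
};

BayesFilter.prototype.classify = function (text) {
  var words = this.tokenize(text);
  return this.score(words, "spam") > this.score(words, "ham") ? "spam" : "ham";
};

var filter = new BayesFilter();
filter.train("get rich quick with this amazing offer", "spam");
filter.train("cheap pills no prescription needed", "spam");
filter.train("minutes of the parish meeting attached", "ham");
filter.train("are you coming to lunch on sunday", "ham");
console.log(filter.classify("amazing offer on cheap pills")); // spam
console.log(filter.classify("lunch after the meeting"));      // ham
```

With only four training messages the result is fragile, which is why these filters need a little while to be taught before they do a good job.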
Addresses on web pages
Now for the meat of this page (not spam). What is the best way to obfuscate (nice word that) an address in a web page to combat all but the most sophisticated crawler? You will not stop them all this way. I have devised a method which combines a number of the techniques described above and is still “clickable” (you can see it in action on my family page). It a) disguises the @ sign with the numeric entity, b) breaks the email address into unrecognisable chunks which are combined back together with a JavaScript function, c) inserts HTML into the address to break it up further, d) displays the visible @ sign as a very small graphical gif which no robot will recognise, e) provides a nice little envelope email bullet to tell people what it is, and f) provides an alternative means of contact for those users who do not have JavaScript available. Here is an example:
(it is not a real address!) Here is the code that does it (three different options, the last one is new and quite slick). There may be some benefit in altering the order of the parameters so the account does not appear before the domain and, maybe, even breaking the domain up. I am sure that given the technique you can adapt it in any way that you want.
// Writes a clickable link showing the envelope icon and the person's name.
function mail(account, domain, realname) {
  document.write("<a href='mai" + "lto:" + account + "&#64;" + domain + "'>" +
    "<img border='0' src='mail.gif' alt='[EMail]' width='15' height='12' />" +
    realname + "</a>");
}

// Writes a clickable link showing the address itself, with the @ as an image.
function xmail(account, domain) {
  document.write("<a href='mai" + "lto:" + account + "&#64;" + domain + "'>" +
    "<img border='0' src='mail.gif' alt='[EMail]' width='15' height='12' />" +
    account +
    "<img border='0' src='at.gif' alt='@' width='15' height='12' />" +
    domain + "</a>");
}

// Opens the user's mail client by navigating to the mailto: address.
function mailto(account, domain) {
  window.location = 'mai' + 'lto:' + account + '%40' + domain;
}
The download contains this code plus the images, instructions and examples.
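If you want to try the suggestion of altering the parameter order and breaking up the domain, something along these lines might do. This is only a sketch and is not part of the download. The name buildMailLink and its parameters are hypothetical, and it returns the link as a string (rather than writing it directly) so that it can be tried outside a browser.

```javascript
// Hypothetical variant of mail(): the domain comes in two pieces and the
// account no longer appears before the domain in the call.
function buildMailLink(domainTail, account, domainHead, realname) {
  var at = "&#" + "64;";  // numeric entity for the @ sign, itself split up
  var address = account + at + domainHead + "." + domainTail;
  return "<a href='mai" + "lto:" + address + "'>" + realname + "</a>";
}

var link = buildMailLink("com", "account", "domain", "A. N. Other");
console.log(link);
// <a href='mailto:account&#64;domain.com'>A. N. Other</a>
// In a web page the string could then be passed to document.write(link).
```

The envelope image is left out here to keep the sketch short. It can be added to the string in the same way as in the functions above.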
This software is published Open Source Freeware. You are free to use the programs either in full or part without charge. I wouldn’t mind a credit and a link though <grin>.
No warranty is given or implied; you use the software at your own risk. No compensation can be considered regarding damage to data, computers or anything else arising as a result of using these programs. In particular I don't, and cannot, guarantee that you will be free from spam by using it.
Download
I try to keep my system virus free but you do check downloaded files yourself, don’t you? <grin>
8 Jul 2006
mail.exe (72K self extracting) or mail.zip (4K)
Signature of ZIP file.
-----BEGIN PGP SIGNATURE-----
Version: PGP Desktop 9.0.6 (Build 6060) - not licensed for commercial use: www.pgp.com

iQA/AwUBRK+M89+E1RVVVycKEQI5YgCfbyolyECKlwxo6UXf5GtkZvRioTsAoMCl
GvR43vaHM1LPpmVj73jzOTHZ
=A0S5
-----END PGP SIGNATURE-----
My public key can be found here or on the public key servers identified by Rick Parsons & west-penwith.org.uk.
Enjoy.