Machinehead Software Misc. Downloads

Making it difficult for S-p-a-m Bots To 'Harvest' Your E-m-a-i-l Address

A free HTML & Javascript tutorial by Nigel Jones. This page will only be of interest if you are operating a web site. We webmasters all share a common problem: the amount of junk e-m-a-i-l we have to download every day. These messages often have a 'remove me' option, and undoubtedly some senders will actually be honourable and act on it. Others will simply treat the reply as confirmation of your existence, making the list more valuable when they sell it to all their s-p-a-mer friends!

A bot or spider is a program that recursively downloads pages, building a list of the links on each one. It can only find your page by following links. Some, such as Googlebot, are welcome site visitors that also catalogue your pages and run various clever algorithms to assess them for relevance, so interested people can find you. S-p-a-m bots are not so clever; they are only interested in compiling lists of e-mail addresses so they can fill up your inbox with crap. Often the crap you receive isn't even in a language you understand. They are operated by fairly stupid people who don't seem to understand that junk e-mail makes people less likely to buy the product rather than more likely.
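The recursive crawl described above can be sketched in a few lines of JavaScript. This is a toy under stated assumptions: instead of fetching pages over the network it walks a hypothetical in-memory `site` object mapping page names to HTML, and it finds links with a simple regular expression rather than a real HTML parser. It shows why a page that nothing links to is never visited.

```javascript
// Toy crawler: walks a hypothetical in-memory "site" instead of the network,
// and finds links with a simple regex rather than a real HTML parser.
function extractLinks(html) {
  const links = [];
  const re = /href="([^"]+)"/g;
  let m;
  while ((m = re.exec(html)) !== null) {
    links.push(m[1]); // the URL between the quotes
  }
  return links;
}

function crawl(site, page, visited = new Set()) {
  // Stop at pages already seen, or links pointing off this toy site
  if (visited.has(page) || !Object.prototype.hasOwnProperty.call(site, page)) {
    return visited;
  }
  visited.add(page);
  for (const link of extractLinks(site[page])) {
    crawl(site, link, visited); // recurse into every linked page
  }
  return visited;
}

// Hypothetical three-page site: nothing links to orphan.html
const site = {
  "index.html": '<a href="contact.html">Contact</a>',
  "contact.html": '<a href="index.html">Home</a>',
  "orphan.html": "<p>Nobody links here</p>",
};

console.log([...crawl(site, "index.html")]); // [ 'index.html', 'contact.html' ]
```

Starting from index.html the crawler reaches contact.html, but orphan.html is never fetched because no page links to it.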

The problem is that if you run a web site, you need to display your e-m-a-i-l address so people can make legitimate enquiries.

The Standard HTML Link

The standard HTML for an e-m-a-i-l link takes the following form:
<a href="mailto:nonexistantname@nonexistantdomain.com">Some text or a graphic for you to click on</a>

Produces a link like this:
Some text or a graphic for you to click on
To automatically fill in the subject line and put in some body text:
<a href="mailto:nonexistantname@nonexistantdomain.com?subject=SUBJECTLINE TEXT&body=BODYTEXT">Some text or a graphic for you to click on</a>

Produces a link like this:
Some text or a graphic for you to click on
This format is really easy for a program to process: simply read the file a character at a time until 'mailto:' is found, keep reading up to the next quote ("), ?, & (or any other character that is illegal within an e-m-a-i-l address), then test that the characters collected form a correctly formatted e-m-a-i-l address. Easy as that! Interestingly, though, analysis of my 'deleted items' folder suggests the current breed of bots are still too stupid to harvest an address that has ?subject= appended. It looks like they just read up to the next quote and then validate (but I wouldn't want to rely on this lasting for ever!).
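The harvesting loop just described might look something like the sketch below. The function names `harvest` and `harvestToQuote`, the set of stop characters and the deliberately loose address pattern are all assumptions for illustration; real bots will vary. The second function models the lazier behaviour suggested by the 'deleted items' evidence, reading only up to the next quote before validating.

```javascript
// Hypothetical harvester following the steps described above.
// Characters treated as "illegal in an address": quotes, ?, &, <, > and whitespace.
const STOP_CHARS = "\"'?&<> \t\r\n";
// Deliberately loose address check, not a full RFC 5322 parser
const ADDRESS_RE = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;

function harvest(html) {
  const found = [];
  let pos = html.indexOf("mailto:");
  while (pos !== -1) {
    let i = pos + "mailto:".length;
    let candidate = "";
    // Read one character at a time until one that cannot be in an address
    while (i < html.length && !STOP_CHARS.includes(html[i])) {
      candidate += html[i];
      i++;
    }
    if (ADDRESS_RE.test(candidate)) found.push(candidate);
    pos = html.indexOf("mailto:", i); // look for the next link
  }
  return found;
}

// Lazier variant: take everything up to the next double quote, then validate
function harvestToQuote(html) {
  const found = [];
  const re = /mailto:([^"]*)"/g;
  let m;
  while ((m = re.exec(html)) !== null) {
    if (ADDRESS_RE.test(m[1])) found.push(m[1]);
  }
  return found;
}

const page =
  '<a href="mailto:nonexistantname@nonexistantdomain.com?subject=Hi">Mail me</a>';
console.log(harvest(page));        // [ 'nonexistantname@nonexistantdomain.com' ]
console.log(harvestToQuote(page)); // []
```

Against a link with ?subject= appended, `harvest` still recovers the address while `harvestToQuote` rejects it, which is consistent with the observation above. The smarter version is only a few lines longer, though, so the ?subject= trick is a weak defence on its own.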

An Alternative Technique

Below is at least a partial solution that will make an e-m-a-i-l address pretty tough to 'harvest'.

Step 1: Add the following code between the <head></head> tags of your web page:
<script src="mailer_script.js"></script>
This example instructs the browser to load the file mailer_script.js, making the mailme() function it contains available for use within the page. The example above also assumes that the .js file is in the same directory as the .html file. I have posted the script within a .zip file so you can download it easily from here: www.machinehead-software.co.uk/zips/mailer_script.zip

Step 2: You can call the function in a number of ways to cope with various common scenarios:
Example 1: (put the following into your page between the <body></body> tags)
<script>
<!---
mailme('nonexistantname','','','');
//--->
</script>
Result: the full address (nonexistantname@ followed by your domain) is displayed as a clickable link.

Example 2:
<script>
<!---
mailme('nonexistantname','e-m-a-i-l me','','');
//--->
</script>
Result: a clickable link reading 'e-m-a-i-l me'.

Example 3:
<script>
<!---
mailme('nonexistantname','click to complain','I want to make a complaint!','');
//--->
</script>
Result: a link reading 'click to complain' that opens a new message with the subject line 'I want to make a complaint!'.

Example 4:
<script>
<!---
mailme('nonexistantname','click to complain','I want to make a complaint!','This parrot is dead!');
//--->
</script>
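Result: a link reading 'click to complain' that opens a new message with the subject 'I want to make a complaint!' and the body 'This parrot is dead!'.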

Example 5:
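You can also use the function to display a graphic by putting the HTML for the graphic into the second argument. Don't forget to put a \ character in front of the quotes (" and ' become \" and \'), otherwise it won't work. Strictly speaking only the single quotes need escaping, because the argument itself is enclosed in single quotes, but escaping both does no harm.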
<script>
<!---
mailme('nonexistantname','<img src=\"images/animated_mhead_icon.gif\" height=32 width=32 alt=\"Please don\'t steal my logo\">','','');
//--->
</script>
Result: the image is displayed as the clickable link.

mailer_script.js contains the following simple function:

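function mailme(name,text,subject,body){
     //(example below assumes your domain is domain_name.co.uk)
     //edit the following 2 lines for the domain of your site
     var d="domain_name";
     var e=".co.uk";
     var addr=name + '@' + d + e;

     //optional subject and body; encodeURIComponent() escapes spaces,
     //& and ? so they can't break the link
     var params=[];
     if(subject!="")params.push('subject=' + encodeURIComponent(subject));
     if(body!="")params.push('body=' + encodeURIComponent(body));

     var str='<a href="mailto:' + addr;
     if(params.length>0)str+='?' + params.join('&');
     str+='">';
     //show the address itself unless link text (or image markup) was supplied
     if(text==""){
         str+=addr;
     }else{
         str+=text;
     }
     str+='</a>';
     document.write(str);
}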

The mailme() function simply sticks the arguments you supply together and writes a human-readable e-m-a-i-l link to the page. Don't forget to edit the text 'domain_name' and '.co.uk' to match your site's domain name. I also recommend that everyone gives the .js file a different name, and uses a different name for the function. Do this and it becomes very difficult for the bots to harvest your address; if we all do it in exactly the same way, harvesting becomes technically feasible.
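Because document.write() only exists inside a browser, a variant that returns the link markup instead of writing it is easier to test, and renaming it is trivial. The sketch below is hypothetical: `buildMailLink` is not part of mailer_script.js, and it URL-encodes the subject and body with the standard `encodeURIComponent()` so that spaces and characters such as & or ? cannot break the link. As in mailme(), the domain lives inside the function rather than in the page.

```javascript
// Hypothetical variant of mailme() that returns the markup instead of calling
// document.write(), so it can be tested outside a browser. Rename it freely.
function buildMailLink(name, text, subject, body) {
  // Assumed domain, as in mailer_script.js: edit for your own site
  const d = "domain_name";
  const e = ".co.uk";
  const addr = name + "@" + d + e;

  const params = [];
  // encodeURIComponent() turns spaces, & and ? into %-escapes
  if (subject) params.push("subject=" + encodeURIComponent(subject));
  if (body) params.push("body=" + encodeURIComponent(body));
  const query = params.length > 0 ? "?" + params.join("&") : "";

  // Show the address itself when no link text (or image markup) is given
  return '<a href="mailto:' + addr + query + '">' + (text || addr) + "</a>";
}

// In a page: document.write(buildMailLink('nonexistantname', '', '', ''));
console.log(buildMailLink("nonexistantname", "click to complain",
  "I want to make a complaint!", "This parrot is dead!"));
// <a href="mailto:nonexistantname@domain_name.co.uk?subject=I%20want%20to%20make%20a%20complaint!&body=This%20parrot%20is%20dead!">click to complain</a>
```

Keeping the string-building separate from the document.write() call also makes it easy to check the output of each example above before putting it on a live page.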

What if the browser doesn't support javascript?

If the browser doesn't support javascript, the mailme() function won't display anything, but you can put some alternative text or markup between <noscript></noscript> tags.

One technique is to break up your e-m-a-i-l address across the cells of a table, like this:

<noscript>
<table border=0 cellpadding=0 cellspacing=0><tr>
     <td>My e-mail address is: non</td>
     <td>existantname</td>
     <td>@non</td>
     <td>existantdomain</td>
     <td>.</td>
     <td>com</td>
</tr></table>
<font color=red>Please turn on your javascript if you want to click on my address as a link (this is a s-p-a-m reduction measure)</font>
</noscript>

Giving the following result:
My e-mail address is: non existantname @non existantdomain . com
Please turn on your javascript if you want to click on my address as a link (this is a s-p-a-m reduction measure)

Obviously it won't open the visitor's e-m-a-i-l program when clicked. But the address is still displayed by the browser in a human-readable form, and most browsers support javascript anyhow. Although it is technically feasible to write a program to harvest addresses disguised in this way, it is highly unlikely that the s-p-a-m b-o-t-s will be this smart in the foreseeable future. ;-)
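To illustrate why the table trick defeats a simple scanner but is still 'technically feasible' to harvest, the hypothetical sketch below first searches the raw <noscript> markup for an address, then strips the tags and whitespace and searches again. The tag-stripping regular expression and the address pattern are simplifications, not a description of any real bot.

```javascript
// Hypothetical sketch: how a smarter harvester could still read the
// table-split address from the <noscript> fallback shown above.
const fallback = `
<table border=0 cellpadding=0 cellspacing=0><tr>
     <td>My e-mail address is: non</td>
     <td>existantname</td>
     <td>@non</td>
     <td>existantdomain</td>
     <td>.</td>
     <td>com</td>
</tr></table>`;

// Simplified address pattern (not a full RFC 5322 check)
const ADDR_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

// 1. Scanning the raw markup finds nothing: the tags split the address apart
console.log(ADDR_PATTERN.test(fallback)); // false

// 2. Stripping tags and whitespace first glues the pieces back together
const text = fallback.replace(/<[^>]*>/g, "").replace(/\s+/g, "");
console.log(text.match(ADDR_PATTERN)[0]); // nonexistantname@nonexistantdomain.com
```

The extra step is cheap, which is why the table layout should be treated as a speed bump for today's bots rather than a guarantee.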

An even more secure technique would be to present your e-m-a-i-l address as a graphic. Users of browsers without javascript still won't be able to click the graphic, but it will take a very long time indeed before any of the bots become smart enough (and fast enough!) to process every graphic in a web site and determine whether it contains an address!

Limitations

The techniques described above won't stop you from getting s-p-a-m. The script won't remove you from any lists you might already be on. It won't stop the s-p-a-mers from selling these lists to others. It won't stop your address from being 'harvested' by other means (submission forms, humans with browsers etc). But at least a human visitor to my site is unlikely to send me adverts in Chinese etc!

Download the script

Because I'm a public-spirited kind of bloke, I have included the script as a .zip file so you can download it easily from here: www.machinehead-software.co.uk/zips/mailer_script.zip

License

This script is distributed on a completely free basis in an open source format. You may deploy it on any site you like (commercial or non-commercial) for free. Your usage of the script is entirely at your own risk, and neither Machinehead Software nor Nigel Jones assumes any liability whatsoever. There is no requirement for you to acknowledge me when you use the mailer_script.js script (although links back to my site are always appreciated). You may modify or reverse engineer the mailer_script.js script in any way you want (and you are actively encouraged to do so, for the reasons explained above!). You may distribute the code in any human- or machine-readable form you like, provided that you do so for free.

Statement

For the record - I HAVE NEVER 'OPTED IN' TO ANY MAILING LISTS!!!!!!.

I resent having to download irrelevant advertisements at my own expense. When I need your products or services I will seek out and visit your site. I am perfectly capable of finding pornography, cosmetics, bulk e-m-a-i-l systems, and even adverts in Chinese if I should feel the need! I don't currently have an intolerable s-p-a-m problem (but it's growing). I'm currently only getting about 10 s-p-a-m-s for each legitimate e-m-a-i-l, and most of these will lose me when my old e-m-a-i-l address mhead@xsystems.co.uk is switched off. (Got my own domain now, it's www.machinehead-software.co.uk ;-)

There are other ways to reduce the s-p-a-m payload. One option is to change e-m-a-i-l address when the ratio gets above about 20:1 (not a big problem, because my friends and customers can find the site just as easily as the s-p-a-m-ers can). Another possibility is to ask the sysop of your server to configure the firewall to reject future s-p-a-m-s. This second option is a bit of a 'sledgehammer to crack a nut', though, because it rejects all future e-m-a-i-l from a particular domain. That is hardly likely to be a problem with Taiwanese s-p-a-m servers, and fortunately Simon (who owns the XSystems server) is usually very cooperative in these matters ;-)

I think it is important to differentiate: not all unsolicited e-m-a-i-l is s-p-a-m. In the last four years or so I have received at least 3 useful bits of information from bulk e-m-a-i-l lists that I did not subscribe to. Unsolicited e-m-a-i-l-s are less likely to be classed as annoying s-p-a-m if they are small (no graphics, attachments etc) and infrequent (5 copies of the same ad on the same day is a very good way to annoy). Selective targeting is also very important. S-p-a-m bots are not selective, and the same goes for lists purchased from other s-p-a-m-ers. A list compiled from existing customers and people who have made previous enquiries about this or closely related subject matter is a better way, and can be acceptable provided it is used in a fairly limited fashion. Better still is not to bother at all except in exceptional circumstances. If you do a good job of constructing your site, and your product is any good, then people will come to you, and you will be too busy responding to legitimate enquiries to even find the time to create a s-p-a-m list. This is the way the web is supposed to work, and those clever chaps at Google, Altavista, Lycos etc are doing an excellent job of providing relevant information on demand.



Designed by Nigel Jones (registered voter!) and the Machinehead Programming Team

Site Hosted by XSec Hosting /Link for Hannah BCISGNET