Fight Spam

Spammers are mostly small businessmen who have been talked into believing in the power of mass mailing. But they're just as bad as the people who supply bulk email addresses, which are gathered by computer programs that crawl the Internet or read newsgroups. You should regularly search for your email address in Google Web and Google Groups to see whether it has leaked onto the Internet. It's possible that you, or a friend who knows your address, posted a document containing it, and a search engine picked it up. If so, ask the author to remove it or mangle it into a form that only a human can read, for instance youATyahooDOTcom instead of you@yahoo.com.
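A minimal sketch of that kind of mangling, assuming a plain AT/DOT substitution is all you want. The function name mangle is made up for illustration.

```python
def mangle(address):
    """Make an address harder for a crawler to harvest (hypothetical helper).

    Replaces "@" with "AT" and every "." with "DOT", as in the example above.
    """
    return address.replace("@", "AT").replace(".", "DOT")


print(mangle("you@yahoo.com"))  # prints youATyahooDOTcom
```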

This article is about how to read a mail header and thereby trace the path the mail went through. This does not tell you the real identity of the spammer; finding that out is the responsibility of the spammer's ISP.

All emails contain mail headers. Yahoo!Mail has an option called "Full Headers". Microsoft Outlook allows you to view headers in a small window. An example email with its header is shown below.

Return-Path: ca101@ultimatesuccess.net
Return-Path: <ca101@ultimatesuccess.net>
Received: from sww.shell.com by columbia.shell.com (SMI-8.6/SIC-SVR4)
	id XAA11958; Tue, 28 Jul 1998 23:35:20 -0500
From: ca101@ultimatesuccess.net
Received: from gate2.shellus.com (gate2.shell.com) by sww.shell.com (4.1/FEJ-1.1)
	id AA10178; Tue, 28 Jul 98 23:35:57 CDT
Received: by gate2.shellus.com; id XAA17296; Tue, 28 Jul 1998 23:35:12 -0500
Date: Tue, 28 Jul 1998 23:35:12 -0500
Message-Id: <199807290435.XAA17296@gate2.shellus.com>

Received: from unknown(208.29.220.141) by gate2.shellus.com via smap (3.2)
	id xma016157; Tue, 28 Jul 98 23:32:09 -0500
To: <yong@shell.com>
Subject: .FREE.$10.00.Calling.Card!!!
Content-Length: 4262

******************************************************************
This Commercial Email Message complies with the proposed United 
States Federal requirements for commercial email, as well as the
Washington State Commercial Email Bill. For additional information

see: http://www.wa.gov/wwweb/AGO/junkemail/ 

Further mailings to you may be stopped at no cost to you by sending
a reply to: ca101@ultimatesuccess.net with "REMOVE" in the subject.
*******************************************************************
Tuesday, July 28, 1998 3A

Dear Friend,

How would you like to receive absolutely.....

FREE! free! FREE! free! FREE! free! FREE!  

..a FREE $10.00
[snipped]

To report a spam mail to the spammer's ISP, we're only interested in the Received: lines. If nothing in the header was forged on the way to your mailbox, the bottom Received: line shows where the email originated and which machine received it, each Received: line above it records a relay, and the top Received: line tells you which machine your own mail server received the email from. Since we're dealing with spam, we should only trust the top one, because you know your own mail server won't cheat you (or how could you get your mail?), and so we can disregard all other Received: lines.

In the example above, the upper Received: lines were all added by machines inside the recipient's own network (columbia.shell.com, sww.shell.com and gate2.shellus.com), so the one that matters is the line where the gateway gate2.shellus.com accepted the message from outside. It gives you a real address: unknown(208.29.220.141). Don't bother to look up ultimatesuccess.net to see if it exists. You need to find out where unknown(208.29.220.141) is. Sometimes this address has a corresponding hostname like somemachine.somedomain.com. But in this case "unknown" means the IP 208.29.220.141 can't be resolved by the DNS at the receiving site, shellus.com, where gate2.shellus.com is (each Received: line is added by the machine named after the by keyword).

To find the owner of a particular machine, you need to do a whois search. If you're not behind a firewall, you can do it on the command line: whois -h whois.arin.net 208.29.220.141. Whether or not you're behind a firewall, you can run the query in your browser at ws.arin.net/cgi-bin/whois.pl. The query returns the following:

Sprint (NETBLK-SPRINTLINK-BLKS) SPRINTLINK-BLKS    208.0.0.0 - 208.31.255.255
Mycomsoftware (NETBLK-SPRINT-D01DDC) SPRINT-D01DDC
                                                   208.29.220.0 - 208.29.220.255

Sometimes the output tells you to submit the query to another whois server. Do so if that's the case. Otherwise, to single out one record, look it up with "!xxx", where xxx is the handle shown in parentheses after the name.
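If you'd rather script the lookup than type it, the whois protocol is simple enough to sketch in a few lines of Python: open a TCP connection to port 43, send the query followed by a line break, and read until the server closes the connection. The function names whois_query and in_block are made up for this illustration, and the reply format depends entirely on the server, so the sketch does not try to parse it.

```python
import ipaddress
import socket


def whois_query(query, server="whois.arin.net", port=43, timeout=10):
    """Send one query line to a whois server and return the raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def in_block(ip, first, last):
    """True if ip lies inside the inclusive address range first..last."""
    low = ipaddress.ip_address(first)
    high = ipaddress.ip_address(last)
    return low <= ipaddress.ip_address(ip) <= high


# Needs network access, so it is left commented out here:
# print(whois_query("208.29.220.141"))

# Check the address against the block reported in the output above.
print(in_block("208.29.220.141", "208.29.220.0", "208.29.220.255"))
```

The second function only confirms that the address really falls inside the block named in the reply, which is worth doing when the output lists several nested blocks.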

Now you know that the address belongs to the block from 208.29.220.0 to 208.29.220.255 and that the company that owns this block is Mycomsoftware. The next thing to do is search the abuse.net database. Aha, you get the address abuse@sprint.net! Now you know you need to complain to Sprint, which is Mycomsoftware's upstream ISP. If you had read the ARIN search result more carefully, you would have known that already. But searching the abuse.net database is always a good idea.

Not every complaint address looks like abuse@thecompany.com. For instance, yahoo.com registered the address mail-abuse@yahoo-inc.com with abuse.net. The abuse.net folks generally do a good job. But in the rare case where the complaint address looks like it belongs to the bad guy himself, you may want to complain to him and to his upstream ISP at the same time. You also need to find the upstream ISP when the abuse.net database has no address registered yet for the domain. In that case you may also want to forward the spam to postmaster@thecompany.com. According to RFC 822 (a.k.a. STD 11), page 32, every domain should have this postmaster user set up. Unfortunately many hostmasters, particularly those in Asia, are pretty ignorant of these rules. That's why, when you start to get Chinese or Taiwanese spam, it's very difficult to stop getting it!

Sometimes even the Received lines are forged. But there's one golden rule: the top Received line is authentic. Why? Because that line is added by your own (or your ISP's) mail server. Let's say you're using Yahoo!Mail. The top Received line has to be ..from... by ..yahoo.com... In the good old days, when spam was not that rampant and I was a low-paid programmer with lots of free time even during the day, I would peruse all lines in the mail header and validate each of them. Now I get close to 10 spam mails in my Inbox a day (even with the Yahoo!Mail BulkMail folder set up) and am more closely watched by my boss. I simply ignore all lines (including Received lines) below the first Received line and forward the spam to the ISP responsible.

There's another reason why ignoring everything below the first Received line works better now than it used to. If some Received lines below the first are valid, the machines that added the lines above them were simply relaying mail. That is, I'm at somemachine.x.com and I can use somemachine.y.com to send mail even though I don't belong to the y.com domain. Nowadays more and more mail server admins are aware of this abuse and have turned mail relay off.
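The "trust only the top Received line" rule can be sketched with Python's standard email module. This is only an illustration: the sample message, its hostnames and the helper names top_received and sender_ip are invented, and the IP pattern is a simple guess at what a dotted address looks like.

```python
import re
from email import message_from_string

# Four dot-separated groups of 1 to 3 digits; deliberately loose.
IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

SAMPLE = (
    "Received: from unknown (192.0.2.7) by mx.example.org;\n"
    "\tTue, 28 Jul 1998 23:32:09 -0500\n"
    "Received: from forged.example.com (10.1.2.3) by relay.example.com;\n"
    "\tTue, 28 Jul 1998 23:30:00 -0500\n"
    "From: ca101@ultimatesuccess.net\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def top_received(raw_message):
    """Return the first (topmost) Received header, or None if there is none."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []  # kept in order of appearance
    return received[0] if received else None


def sender_ip(raw_message):
    """Return the first IP-looking string in the top Received line, if any."""
    line = top_received(raw_message)
    if line is None:
        return None
    match = IP_RE.search(line)
    return match.group(0) if match else None


print(sender_ip(SAMPLE))  # prints 192.0.2.7
```

Everything below the first Received header is ignored on purpose, so a forged line further down cannot change the answer.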

If you still wish to decipher all the mail header lines, you can look at the IP addresses and timestamps they contain. Every Internet-routable IP address consists of exactly 4 numbers, each in the range 0 to 255; it does not start with 0, 127, 192.168 or 224, and it does not end with 0 or 255. Also, the time zone offset at the end of the timestamp should have its first two digits less than 25: they are the number of hours relative to GMT (Greenwich Mean Time). The zone may instead be given as a label such as EDT (Eastern Daylight Time), EST (Eastern Standard Time), PDT (Pacific Daylight Time) or MST (Mountain Standard Time). Therefore, something like

Received: from mail.example.com(208.29.260.141) by gate2.shellus.com via smap 
(3.2) id xma016157; Tue, 28 Jul 98 23:32:09 -5000

is bogus because the IP contains the number 260 and the time zone is 50 hours behind GMT. Such a Received line should be completely ignored. Sometimes even if these two criteria are met, the time could still be wrong. For instance, the Date: line says -0400 (EDT) but one of the Received: lines says -0700 (EDT) while both show the same hour in the hh:mm:ss part.
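The two sanity checks just described can be sketched as follows. This implements only the rules of thumb given above (octet range, the listed prefixes and endings, and offset hours below 25); the function names and the regular expressions are my own assumptions, and a dotted version number with four parts would be mistaken for an IP.

```python
import re

IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
TZ_RE = re.compile(r"\s[+-](\d{2})\d{2}\b")  # numeric offset such as -0500


def ip_problems(ip):
    """List the rules of thumb that a dotted address violates."""
    octets = [int(part) for part in ip.split(".")]
    problems = []
    if any(octet > 255 for octet in octets):
        problems.append("octet above 255")
    if octets[0] in (0, 127, 224) or octets[:2] == [192, 168]:
        problems.append("not a routable prefix")
    if octets[3] in (0, 255):
        problems.append("ends in 0 or 255")
    return problems


def received_problems(line):
    """Return a list of reasons to distrust one Received line (may be empty)."""
    problems = []
    for ip in IP_RE.findall(line):
        problems += [f"{ip}: {reason}" for reason in ip_problems(ip)]
    for hours in TZ_RE.findall(line):
        if int(hours) >= 25:
            problems.append(f"impossible offset of {int(hours)} hours")
    return problems


bogus = ("Received: from mail.example.com(208.29.260.141) by gate2.shellus.com "
         "via smap (3.2) id xma016157; Tue, 28 Jul 98 23:32:09 -5000")
for reason in received_problems(bogus):
    print(reason)
```

An empty list does not prove the line is genuine; it only means these particular checks found nothing.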

Here is how to read a Received line of a mail header, in plain English. Not every Received line has all of these parts; some lack the [IP] part, for example. Note that B and [IP] should be separated by a space, as sendmail requires.

Received: from A (B [IP]) by C (8.7.4/8.7.3) with ESMTP id ... for email_address Date

The sender gives the name A in the SMTP HELO command and sends the mail from host B, whose incoming SMTP connection has the IP address IP, to C (i.e., the mail is received by C), which runs Sendmail version 8.7.4 and adds this header line to the mail. The mail is delivered to email_address on Date.
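As a rough illustration, the A, B, [IP] and C parts of a sendmail-style line can be pulled apart with a regular expression. The pattern below is an assumption that fits only the layout shown above; other mail programs (smap in the earlier example, for instance) write the line differently and will simply not match. The hostnames in the sample are placeholders.

```python
import re

# from A (B [IP]) by C ...   where B and [IP] are each optional
RECEIVED_RE = re.compile(
    r"from\s+(?P<helo>\S+)\s+"
    r"\((?P<host>[^\s\[\]()]+)?\s*(?:\[(?P<ip>[^\]]+)\])?\)\s+"
    r"by\s+(?P<by>[^\s;]+)"
)


def parse_received(line):
    """Return a dict with helo, host, ip and by, or None if the layout differs."""
    match = RECEIVED_RE.search(line)
    return match.groupdict() if match else None


sample = ("Received: from A.example (B.example [192.0.2.7]) by C.example "
          "(8.7.4/8.7.3) with ESMTP id XAA00001 for <you@example.com>; "
          "Tue, 28 Jul 1998 23:35:12 -0500")
print(parse_received(sample))
```

Of the four fields, only ip and by are recorded by the receiving machine itself; helo is whatever the sender chose to claim.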

With this knowledge, you can also check whether the by part of each Received line matches the from part of the Received line above it. If not, the line could be forged. As I said before, the reason to start from the top Received line is that Received lines are added from the bottom up, following the sequential path of the network connection.
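That by/from comparison can be sketched like this. The helper names are invented, the host extraction is deliberately crude, and a mismatch is only a hint: a sender can legitimately announce a name that differs from the one the next machine uses for itself.

```python
import re

FROM_RE = re.compile(r"\bfrom\s+([^\s(;]+)", re.IGNORECASE)
BY_RE = re.compile(r"\bby\s+([^\s(;]+)", re.IGNORECASE)


def hop(line):
    """Return (from_host, by_host) for one Received line; either may be None."""
    sent_from = FROM_RE.search(line)
    received_by = BY_RE.search(line)
    return (
        sent_from.group(1).lower() if sent_from else None,
        received_by.group(1).lower() if received_by else None,
    )


def chain_breaks(received_top_down):
    """Return index pairs (upper, lower) where lower's by differs from upper's from."""
    hops = [hop(line) for line in received_top_down]
    breaks = []
    for i in range(len(hops) - 1):
        upper_from = hops[i][0]
        lower_by = hops[i + 1][1]
        if upper_from and lower_by and upper_from != lower_by:
            breaks.append((i, i + 1))
    return breaks


lines = [
    "from sww.shell.com by columbia.shell.com (SMI-8.6/SIC-SVR4) id XAA11958",
    "from gate2.shellus.com (gate2.shell.com) by sww.shell.com (4.1/FEJ-1.1) id AA10178",
    "from unknown(208.29.220.141) by gate2.shellus.com via smap (3.2) id xma016157",
]
print(chain_breaks(lines))  # prints []
```

The list must be given in the order the lines appear in the header, top first; lines with no from or no by part are skipped rather than flagged.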

Links

At Sam Spade, you can find all the tools on one page. At Anti-Spam campaign, you'll find all kinds of relevant information given in verbose mode. Alan Schwartz and Simson Garfinkel wrote a good book, Stopping Spam, published by O'Reilly. The newsgroup news.admin.net-abuse.email is also an important resource. Another good resource is Vicomsoft's introductory article.

Technically not spam, but otherwise interesting to read, is my July 2006 involvement in an email scam, almost.

© 1998,1999,2002,2005,2006 Yong Huang
