Blocking hotmail.com

I’ve just blocked people with email addresses from the hotmail.com domain from posting on this site. This will only affect people who put their email address into the comment form, so if it really bothers you just don’t enter an email address or use a different one. Why am I doing this? Because about 99% of the spam over the last week has been using various hotmail.com addresses, and I am fed up with moderating it.
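
For the curious, the check amounts to something like the sketch below. This is not the actual comment module code (that is Perl); `is_blocked_email` and `BLOCKED_DOMAINS` are names made up for illustration, and the sketch assumes an empty email field is always let through, as described above.

```shell
# Hypothetical sketch: BLOCKED_DOMAINS is an assumed space-separated list,
# not the site's real configuration.
BLOCKED_DOMAINS="hotmail.com"

# is_blocked_email ADDRESS: exit 0 when the address is in a blocked domain.
is_blocked_email() {
    case "$1" in
        *@*) ;;            # looks like an address, keep checking
        *) return 1 ;;     # empty or no "@": never blocked
    esac
    # Keep everything after the last "@" and lowercase it.
    domain=$(printf '%s' "${1##*@}" | tr '[:upper:]' '[:lower:]')
    for blocked in $BLOCKED_DOMAINS; do
        [ "$domain" = "$blocked" ] && return 0
    done
    return 1
}
```

Calling `is_blocked_email "someone@hotmail.com"` succeeds, while an empty string or an address at any other domain fails, so those comments would carry on to moderation as usual.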

Blog spam

I occasionally complain about blog spam, but it seems worth taking the time to mention that there have now been over 400,000 spam comments on stillhq.com. Specifically, the site currently tells me “1335 comments today, 1335 of them spam. 401209 comments ever, 400220 of the spam.” Dear spammers: you suck.

I wonder if there is anything useful which can be done with all this spam? Just in case, it’s available at http://www.stillhq.com/allcomments/.

An occasional rant about spam

    mikal@daedalus:~/blog-comments$ find . -type f -name "*.yes" | wc -l
    664
    mikal@daedalus:~/blog-comments$ find . -type f -name "*.no" | wc -l
    18361
    mikal@daedalus:~/blog-comments$ find . -type f -name "*.blocked" | wc -l
    32111
    mikal@daedalus:~/blog-comments$ find . -type f -name "*.badword" | wc -l
    5007
    mikal@daedalus:~/blog-comments$ du -sh
    506M    .

That’s 664 real comments on this site, 18361 I manually said no to, 32111 blocked based on originating IP, and 5007 that contained a bad word. Andrew currently donates 506 MB of disk to hosting just comments. That seems excessive to me. I’ll take the time to clean up the disk usage in the next couple of days.
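
The four find commands above can be folded into one small helper. This is only a sketch: `count_by_status` is a made-up name, and it assumes the only thing that distinguishes the states is the file suffix, as in the transcript.

```shell
# count_by_status DIR: print "<suffix> <count>" for each comment state.
# The suffix list comes from the find commands in this post.
count_by_status() {
    for suffix in yes no blocked badword; do
        # tr strips the padding some wc implementations add.
        count=$(find "$1" -type f -name "*.$suffix" | wc -l | tr -d ' ')
        printf '%s %s\n' "$suffix" "$count"
    done
}
```

Run as `count_by_status ~/blog-comments`, it prints one line per suffix, which is a bit easier to paste into a rant.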

Email subscription to comments

Hey all. Yesterday I finally got around to implementing email subscriptions to comments on posts in my custom comment module code for Blosxom. I run a custom comment module because of the static generation mode I use for the site, which helps reduce load on Andrew’s server.

Email subscription to comments on a post that you have commented on is the default, but it is easy for the user to turn it off. If you post and opt for email, you’ll also get an email when your own post survives moderation, which might be useful for some people.

It will be interesting to see whether willingness to be emailed comments is an effective spam signal. So far, with a sample of six spam comments, it is evenly split between the two options, which is interesting because it means some spam bots are smart enough to turn the check box off. Or are they using a POST without using my form at all?

(That makes me wonder if moving the URL for the submission CGI might reduce spam…)

If there is any interest in a public release of my uber crap Perl code let me know, and I might try to find the time to clean it up.

Black listing words in comments

I had yet another flood of comment spam over the last couple of days, so I spent some time last night writing some code to add word black listing to my IP based spam filtering. I also block over 100 IPs from posting to this blog now. It’s helped a lot — I am only black listing one word so far, and it’s already blocked 200 spam posts…
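
The idea is simple enough to sketch in shell, although the real filter lives in the Perl comment module and may well differ. Here `contains_banned_word` is a hypothetical helper and the word file (one banned word per line) is an assumed format.

```shell
# contains_banned_word TEXT WORDFILE: exit 0 when any line of WORDFILE
# occurs in TEXT. Matching is case-insensitive and on fixed strings.
# Caveats: this matches substrings, not whole words, and a blank line
# in WORDFILE would match every comment.
contains_banned_word() {
    printf '%s\n' "$1" | grep -q -i -F -f "$2"
}
```

A comment body that trips the check would be stored as blocked rather than sent on for moderation.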

Some details from the dawn of time: 559 posts which survived moderation, 12,665 posts which were manually blocked, 20,119 posts which were automatically blocked based on the submitter’s IP, and 256 blocked because of use of a banned word (since last night!). My blog comments now take 258 megabytes on disk.

Comment spam again

As In Search of L33t says, comment spam can be a large scale annoyance. In L33t’s words:

I am promising myself I am going to start blogging more. The main problem is that I am so tired of blog spam. Even with the comments turned off I am still getting blog spam. It depresses me a little to see so many blog comments that have absolutely nothing to do with my topics.

I have similar blog spam levels:

    mikal@daedalus:~/blog-comments$ du -sh
    79M     .
    mikal@daedalus:~/blog-comments$ find . -type f -name "*.no" | wc -l
    6064
    mikal@daedalus:~/blog-comments$ find . -type f -name "*.blocked" | wc -l
    4778
    mikal@daedalus:~/blog-comments$ find . -type f -name "*.yes" | wc -l
    483
    

Yes, that really is 79 meg of blog comments (admittedly including the metadata for recent comments). The most interesting bit is that blocked line. That’s the number of posts which have been automatically blocked since I started automatically blocking some posters. It’s been really effective: I now get around one or two comment spams a day in my email for moderation. The super secret algorithm? I block these IP addresses:

    84.19.184.26
    85.255.117.250
    203.142.1.182
    202.71.106.121
    85.249.136.194
    202.76.235.6
    202.75.62.79
    202.75.49.130
    202.75.49.134
    202.75.49.133
    202.75.49.131
    193.87.17.120
    

I recommend others give it a try, as it’s eliminated basically all of my comment spam. That’s right, it appears to me that almost all comment spam comes from these few IPs.
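
If you want to try it, the check itself is tiny. A minimal sketch, assuming the addresses above are saved one per line in a file (`is_blocked_ip` and the file layout are made up for this example, not lifted from my comment code):

```shell
# is_blocked_ip IP BLOCKLIST_FILE: exit 0 when IP is listed in the file.
is_blocked_ip() {
    # -F: fixed string, so dots are literal
    # -x: the whole line must match, so 202.75.49.13 does not hit 202.75.49.130
    # -q: no output, just an exit status
    grep -q -x -F -e "$1" "$2"
}
```

A CGI would call this with the submitter’s REMOTE_ADDR and file the comment as blocked when it succeeds.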

Blog comment spam

I occasionally comment on the amount of comment spam I get here.

But I felt further analysis might be a good idea, so I am now logging as much information as possible about the commenter when they submit a comment. I find the dump below fairly interesting (it’s for approximately the last 24 hours).

    mikal@daedalus:~/blog-comments$ find . -type f -name "*.info" -exec cat {} \; | \
    grep REMOTE_ADDR | sort | uniq -c | sort -n
          2 REMOTE_ADDR = 85.255.117.250
          3 REMOTE_ADDR = 203.142.1.182
          5 REMOTE_ADDR = 202.71.106.121
          8 REMOTE_ADDR = 202.75.62.79
          9 REMOTE_ADDR = 202.75.49.130
         11 REMOTE_ADDR = 202.76.235.6
         12 REMOTE_ADDR = 202.75.49.131
         13 REMOTE_ADDR = 202.75.49.134
         16 REMOTE_ADDR = 202.75.49.133
    mikal@daedalus:~/blog-comments$
    

I wonder if blocking specific IPs would help the spam level, or if stopping comments on some posts would help? There certainly seem to be some “hot spot” posts:

    264: /home/mikal/blog-comments/travel/usa/california/santaclara/000003
    179: /home/mikal/blog-comments/diary/lca2005/000029
    170: /home/mikal/blog-comments/linux/000038
    158: /home/mikal/blog-comments/diary/000796
    134: /home/mikal/blog-comments/diary/000795
    92: /home/mikal/blog-comments/pdfdb/000001
    87: /home/mikal/blog-comments/link/000065
    81: /home/mikal/blog-comments/diary/toys/000001
    79: /home/mikal/blog-comments/travel/usa/000006
    70: /home/mikal/blog-comments/diary/toys/mp101/pymediaserver/000001
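
A listing in that shape can be produced with a short pipeline. This is a sketch under a big assumption, namely that each post has its own directory holding one file per comment; `hot_spots` is a hypothetical helper, not the command that generated the numbers above.

```shell
# hot_spots DIR [N]: print the N directories under DIR that hold the
# most files, as "<count>: <directory>", busiest first (N defaults to 10).
hot_spots() {
    find "$1" -type f |
        sed 's|/[^/]*$||' |      # strip the file name, keep the directory
        sort | uniq -c |         # count files per directory
        sort -rn |               # busiest first
        head -n "${2:-10}" |
        awk '{ n = $1; sub(/^[ \t]*[0-9]+[ \t]+/, ""); print n ": " $0 }'
}
```

Something like `hot_spots ~/blog-comments 10` would then show which posts attract the most submissions.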
    

I think I will ponder more.

Another interesting comment spam

I think the spammers are getting smarter at dealing with moderation systems…

     Hi All. Help me please with my MSIE!
     In my title bar instead of "Microsoft Internet Explorer" a title appears as an advertisement about a site i visited
     "visit www . site353535 . com and register for free". What can I do to restore my browser? And  why on this site:
     http://xxxxxxx-xxx-xxxxx.xxxxxxxxxxxx.net (meet single woman)
     I constantly see "cannot resolve host name"? Waiting for your reply.
    

At least they’re starting to make me wonder if the comment is vaguely genuine before I kill it.