Blog spam

I occasionally complain about blog spam, but it seems I should take special time to mention that there have now been over 400,000 spam comments on this site. Specifically, the site currently tells me “1335 comments today, 1335 of them spam. 401209 comments ever, 400220 of them spam.” Dear spammers: you suck.

I wonder if there is anything useful which can be done with all this spam? Just in case, it’s available at

An occasional rant about spam

    mikal@daedalus:~/blog-comments$ find . -type f -name "*.yes" | wc -l
    mikal@daedalus:~/blog-comments$ find . -type f -name "*.no" | wc -l
    mikal@daedalus:~/blog-comments$ find . -type f -name "*.blocked" | wc -l
    mikal@daedalus:~/blog-comments$ find . -type f -name "*.badword" | wc -l
    mikal@daedalus:~/blog-comments$ du -sh
    506M    .

That’s 664 real comments on this site, 18361 I manually said no to, 32111 blocked based on originating IP, and 5007 that contained a bad word. Andrew currently donates 506 MB of disk to hosting just comments. That seems excessive to me. I’ll take the time to clean up the disk usage in the next couple of days.
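
The four `find | wc -l` runs above can be folded into a single pass over the tree. What follows is only a rough sketch: the suffix meanings are taken from the paragraph above, while the function name `tally_comments` and the `SUFFIX_MEANING` table are made up for illustration and are not part of the blog's actual code.

```python
from collections import Counter
from pathlib import Path

# Suffix meanings as described in the post; the dict itself is illustrative.
SUFFIX_MEANING = {
    ".yes": "approved",
    ".no": "manually rejected",
    ".blocked": "blocked by originating IP",
    ".badword": "blocked by banned word",
}


def tally_comments(root):
    """Count comment files under root, grouped by their status suffix."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        # Only regular files with one of the known status suffixes count.
        if path.is_file() and path.suffix in SUFFIX_MEANING:
            counts[path.suffix] += 1
    return counts


def format_tally(counts):
    """Render one line per suffix, e.g. '     664  .yes  (approved)'."""
    return [
        f"{counts[suffix]:8d}  {suffix}  ({meaning})"
        for suffix, meaning in SUFFIX_MEANING.items()
    ]
```

Calling `tally_comments(".")` from inside the comments directory and printing the lines from `format_tally` would give all four totals at once.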

Black listing words in comments

I had yet another flood of comment spam over the last couple of days, so I spent some time last night writing some code to add word black listing to my IP-based spam filtering. I also block over 100 IPs from posting to this blog now. It’s helped a lot: I am only black listing one word so far, and it’s already blocked 200 spam posts…
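
The post doesn't show the filtering code, so the following is only a guess at its general shape: check the submitter's IP first, then look for banned words in the comment body. The addresses (taken from the reserved documentation ranges), the banned word and the function name `classify_comment` are all placeholders.

```python
import re

# Placeholder data: the real block list and banned word are not given in the post.
BLOCKED_IPS = {"192.0.2.10", "198.51.100.7"}
BANNED_WORDS = {"examplebadword"}


def classify_comment(remote_addr, body, blocked_ips=BLOCKED_IPS, banned_words=BANNED_WORDS):
    """Return 'blocked', 'badword', or 'pending' for a submitted comment.

    'pending' means the comment still goes to manual moderation.
    """
    # IP check first: it is the cheapest test.
    if remote_addr in blocked_ips:
        return "blocked"
    # Compare whole lowercase words so substrings of longer words do not match.
    words = set(re.findall(r"[a-z0-9']+", body.lower()))
    if words & {w.lower() for w in banned_words}:
        return "badword"
    return "pending"
```

Matching whole words rather than substrings is a design choice made for this sketch. It avoids false positives on innocent words that happen to contain a banned one, at the cost of missing obfuscated spellings.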

Some details from the dawn of time: 559 posts which survived moderation, 12,665 posts which were manually blocked, 20,119 posts which were automatically blocked based on the submitter’s IP, and 256 blocked because of use of a banned word (since last night!). My blog comments now take 258 megabytes on disk.

Comment spam again

As In Search of L33t says, comment spam can be a large scale annoyance. In L33t’s words:

I am promising myself I am going to start blogging more. The main problem is that I am so tired of blog spam. Even with the comments turned off I am still getting blog spam. It depresses me a little to see so many blog comments that have absolutely nothing to do with my topics.

I have similar blog spam levels:

    mikal@daedalus:~/blog-comments$ du -sh
    79M     .
    mikal@daedalus:~/blog-comments$ find . -type f -name "*.no" | wc -l
    mikal@daedalus:~/blog-comments$ find . -type f -name "*.blocked" | wc -l
    mikal@daedalus:~/blog-comments$ find . -type f -name "*.yes" | wc -l

Yes, that really is 79 meg of blog comments (admittedly including the metadata for recent comments). The most interesting bit is that blocked line. That’s the number of posts which have been automatically blocked since I started automatically blocking some posters. It’s been really effective: I get around one or two comment spams in my email for moderation a day now. The super secret algorithm? I block these IP addresses:

I recommend others give it a try, as it’s eliminated basically all of my comment spam. That’s right, it appears to me that almost all comment spam comes from these few IPs.

Blog comment spam

I occasionally comment on the amount of comment spam I get here.

But I felt further analysis might be a good idea, so I am now logging as much information as possible about the commenter when they submit a comment. This dump below I find fairly interesting (it’s for approximately the last 24 hours).

    mikal@daedalus:~/blog-comments$ find . -type f -name "*.info" -exec cat {} \; | \
    grep REMOTE_ADDR | sort | uniq -c | sort -n
          2 REMOTE_ADDR =
          3 REMOTE_ADDR =
          5 REMOTE_ADDR =
          8 REMOTE_ADDR =
          9 REMOTE_ADDR =
         11 REMOTE_ADDR =
         12 REMOTE_ADDR =
         13 REMOTE_ADDR =
         16 REMOTE_ADDR =
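
The same count can be done without the shell pipeline. This is a sketch only, and it assumes each `.info` file holds lines of the form `REMOTE_ADDR = <address>`, as the output above suggests. The function names are invented here. Unlike `grep`, it matches the key exactly, so a line for some other key that merely contains `REMOTE_ADDR` is ignored.

```python
from collections import Counter
from pathlib import Path


def count_remote_addrs(root):
    """Count how many logged comments came from each REMOTE_ADDR value."""
    counts = Counter()
    for info in Path(root).rglob("*.info"):
        if not info.is_file():
            continue
        with info.open(errors="replace") as fh:
            for line in fh:
                key, sep, value = line.partition("=")
                # Assumed line format: "REMOTE_ADDR = 192.0.2.1"
                if sep and key.strip() == "REMOTE_ADDR":
                    counts[value.strip()] += 1
    return counts


def ascending_by_count(counts):
    """Order (address, count) pairs like `sort -n`: smallest count first."""
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
```

Having the counts in a `Counter` also makes it easy to ask for the heaviest offenders directly, for example with `counts.most_common(10)`.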

I wonder if blocking specific IPs would help the spam level, or if stopping comments on some posts would help? There certainly seem to be some “hot spot” posts:

    264: /home/mikal/blog-comments/travel/usa/california/santaclara/000003
    179: /home/mikal/blog-comments/diary/lca2005/000029
    170: /home/mikal/blog-comments/linux/000038
    158: /home/mikal/blog-comments/diary/000796
    134: /home/mikal/blog-comments/diary/000795
    92: /home/mikal/blog-comments/pdfdb/000001
    87: /home/mikal/blog-comments/link/000065
    81: /home/mikal/blog-comments/diary/toys/000001
    79: /home/mikal/blog-comments/travel/usa/000006
    70: /home/mikal/blog-comments/diary/toys/mp101/pymediaserver/000001

I think I will ponder more.

Another interesting comment spam

I think the spammers are getting smarter at dealing with moderation systems…

     Hi All. Help me please with my MSIE!
     In my title bar instead of "Microsoft Internet Explorer" a title appears as an advertisement about a site i visited
     "visit www . site353535 . com and register for free". What can I do to restore my browser? And  why on this site: (meet single woman)
     I constantly see "cannot resolve host name"? Waiting for your reply.

At least they’re starting to make me wonder if the comment is vaguely genuine before I kill it.

More on comment spam

Five funny comment spams today (although they were all the same, so it was less funny by the end):

    There has been a comment made on the post /diary/toys/000040.
    The commenter was Smart
    I am smart auto posting. We are posting from auto machine.

At least it’s honest. I wonder if this implies that there will be a flood of comment spam coming my way, although the flood gates seem pretty open already. It seems this bot is often quite honest.

Defending myself from the wrath of the Chris

Chris accuses my moderation robots of having failed him, which is hurtful given that I am said robot. The way comment spam detection works is currently like a moderated mailing list — all comments are treated as spam until I manually approve them, which I generally do about once a day. Therefore if you make a comment, it may take some time to appear (especially now that I live in a different timezone than most of my readers).
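
As a rough model of that workflow (not the blog's actual implementation), a default-deny queue might look like the sketch below. The `Comment` and `ModerationQueue` names and the status strings are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    post: str
    author: str
    body: str
    status: str = "pending"  # every comment starts out hidden


class ModerationQueue:
    """Default-deny moderation: nothing is shown until a human approves it."""

    def __init__(self):
        self._comments = []

    def submit(self, post, author, body):
        comment = Comment(post, author, body)
        self._comments.append(comment)
        return comment

    def approve(self, comment):
        comment.status = "approved"

    def reject(self, comment):
        comment.status = "rejected"

    def pending(self):
        """Comments still waiting for the once-a-day review."""
        return [c for c in self._comments if c.status == "pending"]

    def visible(self, post):
        """Only approved comments are ever rendered on a post."""
        return [c for c in self._comments if c.post == post and c.status == "approved"]
```

The delay between `submit` and `approve` is exactly the window in which a genuine comment looks as though it has been eaten.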

Fear not, Chris: your comment did survive moderation and is visible…

More on comment spam

What’s wrong with this picture?

    mikal@daedalus:~/blog-comments$ find . -type f -name "*.yes" -mtime -2 | wc -l
    mikal@daedalus:~/blog-comments$ find . -type f -name "*.no" -mtime -2 | wc -l

That’s 1 real comment (thanks Andrew) to 101 spam comments. Nice.