Topic: Behind the scenes: NLP and Spam

Daniel

  • Administrator
  • Experienced Linguist
  • Posts: 1576
  • Country: us
    • English
Behind the scenes: NLP and Spam
« on: December 18, 2013, 04:08:02 PM »
I just thought I'd start off the discussion in this Computational Linguistics forum with something behind-the-scenes. Spam is a major inconvenience on forums, and it was the main reason we had to create this new one. The old software just wasn't up to the job. Now, in addition to many other new features, the forum is generally better at dealing with spam: it's modern, with somewhat better security to begin with, and it has many more moderation options. So we're already in a better position to keep up with the spam.

But on top of that, I'm planning to test a custom spam system that will filter posts and look for problematic signs. It will incorporate just a little linguistic insight to stop the spammers. I won't be posting all of the details publicly (for obvious reasons), but it'll involve things like a keyword filter and checks for certain kinds of content. Every post will be checked, and suspicious ones will be hidden until a moderator approves them.
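For the curious, here is a minimal sketch of the general idea of a keyword filter that holds posts for moderator approval. It is not the forum's actual filter (those details are deliberately private); the keyword list and the names `needs_moderation` and `submit_post` are hypothetical.

```python
import re

# Hypothetical blocklist for illustration only; a real one would be kept private.
BLOCKED_KEYWORDS = ("cheap pills", "casino", "replica watches")

def needs_moderation(post_text: str, keywords=BLOCKED_KEYWORDS) -> bool:
    """Return True if any blocked keyword appears as a whole word or phrase (case-insensitive)."""
    for keyword in keywords:
        # \b anchors the match to word boundaries, so a keyword does not match inside a longer word.
        if re.search(r"\b" + re.escape(keyword) + r"\b", post_text, re.IGNORECASE):
            return True
    return False

def submit_post(post_text: str, queue: list, published: list) -> str:
    """Publish clean posts; hold flagged ones in a moderation queue."""
    if needs_moderation(post_text):
        queue.append(post_text)
        return "held"
    published.append(post_text)
    return "published"

# Usage
moderation_queue, visible_posts = [], []
print(submit_post("Buy cheap pills now", moderation_queue, visible_posts))  # held
print(submit_post("What is a phoneme?", moderation_queue, visible_posts))   # published
```

Matching on word boundaries rather than raw substrings avoids some obvious false positives, such as a short keyword matching inside an unrelated longer word.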

There are existing automated systems that do this, but they produce too many false positives, filtering out legitimate posts and making users wait. There may still be a few false positives here (and we'll approve your posts as soon as we see them!), but even with some minimal NLP (Natural Language Processing), the spam should be all but defeated.

If you have anything to contribute (such as ideas about what kinds of posts to filter), feel free to send me a PM. If it's especially general, or you just have questions, feel free to reply here. But I'll intentionally keep some of the details secret so the spammers can't figure out how to work around them. (For that reason, if you post a detail that might inadvertently help the spammers, I might edit your post in this thread.)

Anyway, here's to a spam-free forum!  8)


[Edit: OK, the filter's up and running. Goodbye, spam. It'll only improve with time.]
« Last Edit: December 18, 2013, 06:30:01 PM by djr33 »
Welcome to Linguist Forum! If you have any questions, please ask.

freknu

  • Forum Regulars
  • Serious Linguist
  • Posts: 397
  • Country: fi
    • Ostrobothnian (Norse)
Re: Behind the scenes: NLP and Spam
« Reply #1 on: December 19, 2013, 12:47:10 AM »
Is this the kind of filtering found in Thunderbird (and other email clients) and in many firewall/antivirus programs?

Daniel
Re: Behind the scenes: NLP and Spam
« Reply #2 on: December 19, 2013, 01:26:00 AM »
Points assigned based on (gradiently) suspicious behaviors?
That's part of it.
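Since points-based scoring is part of the approach, here is a minimal sketch of that general technique: each signal adds a weighted number of points per match, and a post at or above a threshold is flagged. The signals, weights, and threshold below are hypothetical values chosen for illustration, not the forum's real settings.

```python
import re

# Hypothetical (pattern, points-per-match) pairs; real signals would be kept private.
SIGNALS = [
    (re.compile(r"https?://", re.IGNORECASE), 3.0),                   # links
    (re.compile(r"\b(?:buy|cheap|discount)\b", re.IGNORECASE), 1.5),  # sales vocabulary
    (re.compile(r"!{2,}"), 1.0),                                      # runs like "!!"
]
THRESHOLD = 5.0  # hypothetical cut-off

def spam_score(text: str) -> float:
    """Add up points-per-match times the number of matches for every signal."""
    return sum(points * len(pattern.findall(text)) for pattern, points in SIGNALS)

def is_suspicious(text: str, threshold: float = THRESHOLD) -> bool:
    """Flag the post when its total score reaches the threshold."""
    return spam_score(text) >= threshold

print(spam_score("Buy now!! http://deals.example"))     # 5.5
print(is_suspicious("Buy now!! http://deals.example"))  # True
```

Because the score is graded rather than all-or-nothing, one weak signal (a single "!!") won't flag a post on its own, while several together will.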

But more than anything, the benefit is that it's customized, so we can block exactly the kind of spam that is coming to this forum at any given time (this week, today, this month, whatever). And it'll keep changing as the need changes.

The problem with automated systems is that they tend to look too broadly, while a system like this (with some regular maintenance) can be effective against even the worst spam. If needed, it can simply filter out all content that spammers rely on (like URLs).
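Filtering out URLs can be sketched with a regular expression. The pattern here is a rough assumption (it catches `http://`, `https://`, and bare `www.` links, but not every possible URL form), and `contains_url`, `strip_urls`, and the placeholder text are hypothetical.

```python
import re

# Rough, assumed URL pattern: http(s):// or www. links up to the next whitespace.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

def contains_url(text: str) -> bool:
    """Return True if the text appears to contain a link."""
    return URL_PATTERN.search(text) is not None

def strip_urls(text: str, placeholder: str = "[link removed]") -> str:
    """Replace every detected link with a placeholder."""
    return URL_PATTERN.sub(placeholder, text)

print(strip_urls("Great post! See http://spam.example/deal now"))
# Great post! See [link removed] now
```

Whether to strip links or only hold link-bearing posts for review is a policy choice; stripping is harsher and also affects legitimate links.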

Beyond that, I'm not sure yet. But it's very easy to add extra conditions, so I'll come up with some creative ones as needed. (Just looking at spam posts, they look different, and it's possible to quantify that :) )
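One way to quantify how spam posts look different is to turn each post into a handful of numbers that a scoring rule or classifier can then use. The features below (word count, uppercase ratio, link count, exclamation marks) and the name `post_features` are assumptions for illustration only.

```python
import re

def post_features(text: str) -> dict:
    """Compute a few simple, hypothetical surface features of a post."""
    words = re.findall(r"\w+", text)
    letters = [c for c in text if c.isalpha()]
    uppercase = sum(1 for c in letters if c.isupper())
    return {
        "word_count": len(words),
        # Share of letters that are capitals; "shouting" often shows up in spam.
        "uppercase_ratio": uppercase / len(letters) if letters else 0.0,
        "url_count": len(re.findall(r"https?://\S+", text)),
        "exclamations": text.count("!"),
    }

print(post_features("HELLO there!"))
# {'word_count': 2, 'uppercase_ratio': 0.5, 'url_count': 0, 'exclamations': 1}
```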
« Last Edit: December 19, 2013, 07:35:30 PM by djr33 »