Spam

The educational technology and digital learning wiki

Revision as of 14:46, 17 September 2010


Introduction

This article is mainly concerned with wiki spamming, which has been increasing over the years for two reasons:

  1. Its authors believe that inserting spam links improves their Google rankings. With a default MediaWiki installation this is actually not the case, since all external links include a rel="nofollow" HTML attribute.
  2. Popular wiki pages may be spammed with "sneaky" links that some readers will follow. Typical examples in this wiki are frequent attempts to insert links to cheating services, i.e. web sites that offer to write student papers for a fee. Note to students about cheating services: do not use them, because their quality is most often low, the paper doesn't match what your teacher expects from you, and the content often includes plagiarized sections. In other words: better turn in a bad assignment that you wrote "the night before". You'll get the same bad grade without having to pay for it, and you don't risk being punished...

Below we have collected some essential strategies and links for MediaWiki administrators who manage somewhat closed wikis (i.e. only registered users can edit). If you manage an open wiki, you will likely have to use additional strategies that are described in various MediaWiki manual pages.

Learn who your spammers are and block whole domains

This strategy may allow you to block whole domains (e.g. in the httpd.conf file or at the system level). Even with a demanding account-creation procedure, as in this wiki, wikis can still be spammed manually (typically by underpaid third-world people hired by a first-world company). Blocking out whole (sub)domains can help a bit...

If your MediaWiki is spammed, you first have to find out where the spammers connect from: either go through your web server logs (e.g. search for "submitlogin") or install an extension that shows the IP numbers of users. We recommend the latter strategy:

The CheckUser extension

It allows you to figure out where spammers come from (i.e. connect from) and may help you decide whether you should block a whole IP range or several ranges (e.g. a whole country). You can enter either user names or IP numbers, and then both trace and block a user.
It is installed on EduTechWiki and also on some Wikipedia/Wikimedia sites.
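To decide whether a whole range is worth blocking, it can help to see how many of the spammer IPs you collected (e.g. with CheckUser) fall into the same network. The following Python sketch groups IPv4 addresses by their enclosing /16 network with the standard ipaddress module; the /16 prefix length, the function name and the sample addresses are assumptions chosen for illustration.

```python
import ipaddress
from collections import Counter
from typing import Iterable


def group_by_network(ips: Iterable[str], prefix_len: int = 16) -> Counter:
    """Count how many IP addresses fall into each /prefix_len network."""
    counts: Counter = Counter()
    for ip in ips:
        # strict=False builds the enclosing network from a host address
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[net] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical addresses, e.g. copied from CheckUser results
    spammer_ips = ["203.177.12.4", "203.177.80.19", "180.190.3.77", "203.177.1.200"]
    for net, n in group_by_network(spammer_ips).most_common():
        print(f"{net}\t{n} address(es)")
```

A network that collects most of the hits is a candidate for a range block; a single address rarely justifies blocking a whole /16.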

Alternatively, dig through the web server access logs and then look up the IP numbers with a service such as ipgp.net (find domain names for IP numbers).
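If you go the log route, a few lines of Python can count which client IPs submitted the login/signup form. This sketch assumes Apache's common or combined log format, where the client IP is the first whitespace-separated field, and simply looks for the string "submitlogin" in each line as suggested above; the function name, the sample lines and the log path in the comment are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_login_ips(lines: Iterable[str], marker: str = "submitlogin") -> Counter:
    """Count client IPs of access-log lines that contain `marker`.

    Assumes Apache's common/combined log format, where the client IP
    is the first whitespace-separated field of each line.
    """
    counts: Counter = Counter()
    for line in lines:
        if marker in line:
            fields = line.split(maxsplit=1)
            if fields:
                counts[fields[0]] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines; with a real log you would pass an open file instead,
    # e.g. with open("/var/log/apache2/access.log") as log: count_login_ips(log)
    sample = [
        '203.177.12.4 - - [17/Sep/2010:14:46:00 +0200] "POST /index.php?title=Special:UserLogin&action=submitlogin&type=signup HTTP/1.1" 302 0',
        '198.51.100.7 - - [17/Sep/2010:14:47:10 +0200] "GET /index.php?title=Spam HTTP/1.1" 200 9000',
        '203.177.12.4 - - [17/Sep/2010:14:50:02 +0200] "POST /index.php?title=Special:UserLogin&action=submitlogin&type=signup HTTP/1.1" 302 0',
    ]
    for ip, n in count_login_ips(sample).most_common():
        print(ip, n)
```

The resulting counts can then be looked up one by one, or grouped into networks before deciding what to block.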

Block a domain at the web server level

If your spammers always show up from the same country, a last resort is to block everyone from that country (not very nice).

  • First, retrieve the IP ranges of the country you want to block from a site like IPDeny.com.

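Edit an Apache configuration file, e.g. /etc/apache2/apache2.conf, and use patterns like the following (Apache 2.2 mod_authz_host syntax, where a partial IP address matches every address that starts with it; Apache 2.4 uses "Require not ip" inside a <RequireAll> block instead):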

<Location "/">
Deny from 203.177.
Deny from 180.190.
...
</Location>

If you use virtual hosts, you will have to edit those configuration files too, or else put the rules into a separate file and include it in each of them like this:

Include /path/to/deny.conf
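As far as we know, IPDeny.com publishes its country lists as plain-text zone files with one CIDR block per line. Since Apache 2.2's "Deny from" also accepts network/prefix notation, such a list can be turned into the <Location> block shown above and saved as the file pulled in with Include. The following Python sketch does that; the function name and the sample ranges are our own assumptions, and the ipaddress module is used to reject malformed lines.

```python
import ipaddress
from typing import Iterable


def apache_deny_block(cidrs: Iterable[str], location: str = "/") -> str:
    """Build an Apache 2.2-style <Location> block denying the given CIDR ranges."""
    lines = [f'<Location "{location}">']
    for raw in cidrs:
        raw = raw.strip()
        if not raw or raw.startswith("#"):
            continue  # skip blank lines and comments
        # Raises ValueError on malformed input; strict=False normalizes host bits
        net = ipaddress.ip_network(raw, strict=False)
        lines.append(f"  Deny from {net}")
    lines.append("</Location>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical zone-file content (one CIDR block per line)
    zone = ["203.177.0.0/16", "180.190.0.0/16", ""]
    print(apache_deny_block(zone))
```

With a real zone file you would pass the open file instead of the list and redirect the output to your deny file.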

In addition, you could edit /etc/hosts.deny in order to block other connection attempts to your server, e.g. via ssh (provided sshd was built with TCP Wrappers support). But since your httpd is an independent daemon, banning hosts in /etc/hosts.deny won't ban them from your web server (remember that).

ALL: 192.168.
ALL: .example.com

You can put several IPs on a single line like this:

ALL: 203.177., 222.127., 192.168.
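The two pattern forms used above follow the TCP Wrappers rules described in hosts_access(5): a pattern ending in a dot matches addresses whose leading numeric fields equal it, and a pattern starting with a dot matches host names whose trailing components equal it. The Python sketch below mimics just these two rules so you can check what a line would cover; it is a simplified illustration with hypothetical function names, not the libwrap implementation (which also supports ALL, net/mask patterns, EXCEPT and more).

```python
def pattern_matches(pattern: str, address: str, hostname: str = "") -> bool:
    """Simplified check of one hosts.deny client pattern (not libwrap itself).

    Handles only the forms used on this page:
    - "203.177."     ends with '.':   the address starts with these fields
    - ".example.com" starts with '.': the host name ends with these components
    Any other pattern is compared literally with the address or host name.
    """
    if pattern.endswith("."):
        return address.startswith(pattern)
    if pattern.startswith("."):
        return hostname.lower().endswith(pattern.lower())
    return pattern == address or (bool(hostname) and pattern.lower() == hostname.lower())


def line_matches(line: str, address: str, hostname: str = "") -> bool:
    """Check a 'daemons: pattern, pattern ...' line; the daemon list is ignored."""
    _daemons, _, clients = line.partition(":")
    patterns = clients.replace(",", " ").split()
    return any(pattern_matches(p, address, hostname) for p in patterns)


if __name__ == "__main__":
    rule = "ALL: 203.177., 222.127., 192.168."
    print(line_matches(rule, "203.177.5.9"))  # True
    print(line_matches(rule, "10.0.0.1"))     # False
    print(line_matches("ALL: .example.com", "198.51.100.1", "mail.example.com"))  # True
```

Note that the trailing dot matters: "203.177." cannot accidentally match an address such as 203.17.x.x.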

More blocking

If you don't have access to your server machine, you can also block IPs from within MediaWiki, but this is more resource-intensive (the wiki's PHP code has to run for every blocked request, whereas the operating system or the web server can reject it earlier), and it will not protect other wikis that run on the same server from spamming. See Combating spam (MediaWiki.org).

Finally, to block access at the web server level, there also exist Apache modules (none tested) and firewall programs. Installing a firewall program is a good option if you

Fight MediaWiki spamming

There exist several strategies to fight spamming:

Registered users

To fight spamming, only registered users should be able to edit (implemented in EduTechWiki).

Edit LocalSettings.php and change:

# Anonymous visitors ('*') cannot edit, but can create an account and read pages
$wgGroupPermissions['*']['edit']            = false;
$wgGroupPermissions['*']['createaccount']   = true;
$wgGroupPermissions['*']['read']            = true;

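Why these three lines restrict editing to registered users: MediaWiki grants a user the rights of every group the user belongs to; anonymous visitors are only in the '*' group, while logged-in users are also in 'user', which may edit in the default configuration. The Python sketch below models that lookup with plain dictionaries (hypothetical names, not MediaWiki code) and ignores details such as revoked permissions or the autoconfirmed group.

```python
from typing import Sequence

# Simplified model of $wgGroupPermissions (not MediaWiki code)
GROUP_PERMISSIONS = {
    "*": {"edit": False, "createaccount": True, "read": True},
    "user": {"edit": True},  # MediaWiki's default for logged-in users
}


def user_can(right: str, groups: Sequence[str]) -> bool:
    """A right is granted if any of the user's groups grants it."""
    return any(GROUP_PERMISSIONS.get(g, {}).get(right, False) for g in groups)


ANONYMOUS = ["*"]
REGISTERED = ["*", "user"]

if __name__ == "__main__":
    print("anonymous can edit:", user_can("edit", ANONYMOUS))               # False
    print("registered can edit:", user_can("edit", REGISTERED))             # True
    print("anonymous can sign up:", user_can("createaccount", ANONYMOUS))   # True
```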
Light-weight user creation that requires some math

This can defeat some scripts.
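The idea behind a math question is simply that a naive signup script does not answer it. The following Python sketch shows only that principle: it generates an addition question and checks the submitted answer. Real captcha extensions also have to keep the expected answer on the server side between the two requests; that part is omitted here, and the function names are hypothetical.

```python
import random
from typing import Tuple


def make_question(rng: random.Random) -> Tuple[str, int]:
    """Return a simple addition question and its expected answer."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", a + b


def check_answer(expected: int, submitted: str) -> bool:
    """Accept the answer only if it parses as the expected integer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False


if __name__ == "__main__":
    # SystemRandom draws from the OS, so questions are harder to predict
    question, answer = make_question(random.SystemRandom())
    print(question)
    print(check_answer(answer, str(answer)))  # True
    print(check_answer(answer, "seven"))      # False
```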

Using captcha

Make account creation and (optionally) page editing more difficult with a captcha, i.e. users will have to type in a code that is generated by the wiki. This can defeat more scripts.

Light-weight solution
Make user creation even more difficult with reCAPTCHA. This is implemented in EduTechWiki.
Solving reCAPTCHAs also contributes to a digitization project (of scanned books).

In EduTechWiki we use roughly the following setup. However, at some point we may remove the captcha from page editing and install the Flagged Revisions system (see below) instead.

# Anti Spam ConfirmEdit
# Recaptcha relies on ConfirmEdit, but only ONE needs to be loaded
# require_once("extensions/ConfirmEdit/ConfirmEdit.php");

# ReCaptcha
# See the docs in extensions/recaptcha/ConfirmEdit.php
# http://wiki.recaptcha.net/index.php/Main_Page
require_once( "$IP/extensions/recaptcha/ReCaptcha.php" );
$recaptcha_public_key = '................';
$recaptcha_private_key = '................';

# Users must be registered; once they are in, they still must fill in captchas (at least over the summer)
$wgCaptchaTriggers['edit']          = true;
$wgCaptchaTriggers['addurl']        = false;
$wgCaptchaTriggers['create']        = true;
$wgCaptchaTriggers['createaccount'] = true;
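To make these trigger settings concrete: 'edit' and 'create' ask for a captcha on every edit or page creation, while 'addurl' is meant to fire only when an edit adds external links that were not in the previous version. The Python sketch below models that decision; it is our own simplification, not ConfirmEdit's code, and the URL regex and function names are illustrative assumptions.

```python
import re
from typing import Dict, Optional, Set

# Rough pattern for external links; MediaWiki's own link parsing is more thorough.
URL_RE = re.compile(r"https?://[^\s\]<>\"]+", re.IGNORECASE)

# Mirrors the $wgCaptchaTriggers values used above
EDUTECH_TRIGGERS: Dict[str, bool] = {
    "edit": True,
    "addurl": False,
    "create": True,
    "createaccount": True,
}


def external_links(text: str) -> Set[str]:
    return set(URL_RE.findall(text))


def needs_captcha(action: str, old_text: str = "", new_text: str = "",
                  triggers: Optional[Dict[str, bool]] = None) -> bool:
    """Model of when a captcha is shown for an action under the given triggers."""
    t = EDUTECH_TRIGGERS if triggers is None else triggers
    if t.get(action, False):
        return True
    # 'addurl': only edits that introduce links absent from the old text
    if action in ("edit", "create") and t.get("addurl", False):
        return bool(external_links(new_text) - external_links(old_text))
    return False


if __name__ == "__main__":
    print(needs_captcha("edit"))  # True, because 'edit' is switched on
    only_addurl = {"edit": False, "addurl": True, "create": False, "createaccount": True}
    print(needs_captcha("edit", "See http://a.example", "See http://a.example and more", only_addurl))  # False
    print(needs_captcha("edit", "text", "Buy at http://spam.example/x", only_addurl))  # True
```

With 'edit' switched on, as in the setup above, 'addurl' makes no difference, since every edit already triggers a captcha.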

Filtering edits and page names

Prevent creation of pages with bad words in the title and/or the text.

The built-in $wgSpamRegex variable

MediaWiki includes a $wgSpamRegex variable. The goal is to prevent three things: (a) bad words, (b) links to bad web sites and (c) CSS tricks that hide content.

Insert in LocalSettings.php something like:

$wgSpamRegex = "/badword1|badword2|abcdefghi-website\.com|display:\s*none|overflow:\s*auto;\s*height:\s*[0-4]px;/i";

I will not show ours here since I can't include it in this page ;)

Read the manual page for details. It includes a longer regular expression that you may adopt.

Don't forget to edit MediaWiki:Spamprotectiontext
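Before putting a pattern into LocalSettings.php, it can be worth testing it against some sample text. PHP's preg functions and Python's re module agree on the simple constructs used in the example (alternation, \s, character classes, case-insensitive matching), so the following Python sketch can serve as a quick check. The pattern is modeled on the example $wgSpamRegex above, with the CSS parts written out as display:\s*none and overflow:\s*auto...; it is not EduTechWiki's real filter, and the sample strings are made up.

```python
import re

# Modeled on the example $wgSpamRegex (PHP "/.../i" -> Python re.IGNORECASE)
SPAM_RE = re.compile(
    r"badword1|badword2|abcdefghi-website\.com"
    r"|display:\s*none"
    r"|overflow:\s*auto;\s*height:\s*[0-4]px;",
    re.IGNORECASE,
)


def is_spam(text: str) -> bool:
    """Return True if the edit text would be rejected by the pattern."""
    return SPAM_RE.search(text) is not None


if __name__ == "__main__":
    samples = [
        "A normal paragraph about wikis.",
        'Hidden <div style="display: none">cheap essays</div>',
        "Visit ABCDEFGHI-WEBSITE.COM today",
        '<div style="overflow:auto; height: 2px;">links</div>',
    ]
    for s in samples:
        print(is_spam(s), "|", s)
```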

Spam blacklist extensions (an alternative)

The SpamBlacklist extension prevents edits that contain URL hosts that match regular expression patterns defined in specified files or wiki pages.
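The core idea of such a blacklist can be sketched as: extract the URLs from the edit, take their host part, and reject the edit if any host matches one of the blacklist patterns. The following Python sketch is a simplified illustration of that idea, not the SpamBlacklist extension's code; the blacklist entries, the comment syntax and the function names are made-up assumptions.

```python
import re
from typing import Iterable, List
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s\]<>\"]+", re.IGNORECASE)


def compile_blacklist(patterns: Iterable[str]) -> List[re.Pattern]:
    """Compile blacklist lines (regex fragments); '#' starts a comment."""
    compiled = []
    for line in patterns:
        line = line.split("#", 1)[0].strip()
        if line:
            compiled.append(re.compile(line, re.IGNORECASE))
    return compiled


def blocked_hosts(text: str, blacklist: List[re.Pattern]) -> List[str]:
    """Return the hosts of URLs in `text` that match any blacklist pattern."""
    hits = []
    for url in URL_RE.findall(text):
        host = urlsplit(url).hostname or ""  # .hostname is lower-cased
        if any(p.search(host) for p in blacklist):
            hits.append(host)
    return hits


if __name__ == "__main__":
    bl = compile_blacklist([r"cheap-essays\.example  # paper mill", "casino"])
    text = "See http://www.cheap-essays.example/order and https://en.wikipedia.org/"
    print(blocked_hosts(text, bl))  # ['www.cheap-essays.example']
```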

rel = "nofollow"

Wiki spammers aim at two things:

  • Insert well-placed links in articles related to the spam's subject area, so that people will actually see and follow them (same principle as Google ads). This requires understanding an article's content. Since most paid wiki spammers are poorly trained and come from poor non-English-speaking countries, this strategy most often fails.
  • Get a better Google ranking. This second purpose will not work in this wiki: under the default configuration, MediaWiki adds rel="nofollow" to external links in wiki pages, to indicate that these are user-supplied, might contain spam, and should therefore not be used to influence page ranking algorithms. Popular search engines such as Google honor this attribute (Manual:Combating spam). Most wiki spammers abusing EduTechWiki are too stupid to know about this (some labour really must come cheap...).
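To verify that your wiki really emits rel="nofollow" on external links, you can inspect the anchors of a rendered page. The following Python sketch uses the standard html.parser module on an HTML string; treating every absolute http(s) link as external is a simplifying assumption (MediaWiki also marks external links with CSS classes such as "external", which we do not rely on), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class NofollowChecker(HTMLParser):
    """Collect absolute http(s) links whose rel attribute lacks 'nofollow'."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        rel = (attributes.get("rel") or "").lower().split()
        if href.startswith(("http://", "https://")) and "nofollow" not in rel:
            self.missing.append(href)


def links_without_nofollow(html: str) -> List[str]:
    checker = NofollowChecker()
    checker.feed(html)
    checker.close()
    return checker.missing


if __name__ == "__main__":
    page = ('<a href="/wiki/Spam">internal</a>'
            '<a rel="nofollow" class="external text" href="http://example.org">ok</a>'
            '<a href="http://spam.example">bad</a>')
    print(links_without_nofollow(page))  # ['http://spam.example']
```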

To some companies, wiki spamming may seem to be a good strategy, but most often it is not...

Flagged Revisions

Article validation allows Editor and Reviewer users to rate revisions of articles and set those revisions as the default revision to show upon normal page view. These revisions will remain the same even if included templates are changed or images are overwritten. This allows MediaWiki to act more like a Content Management System (CMS). I will probably install this some time soon - Daniel K. Schneider 11:32, 30 July 2010 (UTC).

Usage examples: Wikibooks (a Wikimedia site), German Wikipedia (each Wikipedia community can decide whether to adopt it and which configuration to use).

According to the FlaggedRevs Help page (retrieved July 31, 2010):

FlaggedRevs is an extension to the MediaWiki software that allows a wiki to monitor the changes that are made to pages, and to control more carefully the content that is displayed to the wiki's readers. Pages can be flagged by certain "editors" and "reviewers" to indicate that they have been reviewed and found to meet whichever criteria the wiki requires. Each subsequent version of the page can be "flagged" by those users to review new changes. A wiki can use a scale of such flags, with only certain users allowed to set each flag.

The ability to flag revisions makes it easier to co-ordinate the process of maintaining a wiki, since it is much clearer which edits are new (and potentially undesirable) and which have been accepted as constructive. It is possible, however, to configure pages so that only revisions that are flagged to a certain level are visible when the page is viewed by readers; hence, changes made by users who cannot flag the resulting version to a high enough level remain in a "draft" form until the revision is flagged by another user who can set a higher flag.

FlaggedRevs is extremely flexible and can be used in a wide range of configurations; it can be discreet enough to be almost unnoticeable, or it can be used to very tightly control a wiki's activity.
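The "draft until flagged" behaviour described in this quote can be illustrated with a toy model: readers see the newest revision whose flag level reaches a threshold, while reviewers work on the latest revision. This Python sketch models only the described behaviour; the numeric levels, the threshold and the Revision data class are assumptions, not FlaggedRevs' actual data structures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Revision:
    rev_id: int
    text: str
    flag_level: int = 0  # 0 = unreviewed; higher = reviewed to a higher standard


def stable_revision(history: List[Revision], min_level: int = 1) -> Optional[Revision]:
    """Newest revision flagged at or above min_level (None: nothing reviewed yet)."""
    for rev in reversed(history):  # history is ordered oldest -> newest
        if rev.flag_level >= min_level:
            return rev
    return None


def draft_revision(history: List[Revision]) -> Optional[Revision]:
    """Latest revision, flagged or not (what reviewers still have to check)."""
    return history[-1] if history else None


if __name__ == "__main__":
    history = [
        Revision(1, "Intro to wikis", flag_level=1),
        Revision(2, "Intro to wikis, expanded", flag_level=1),
        Revision(3, "Intro to wikis [http://spam.example cheap essays]"),
    ]
    print("readers see:", stable_revision(history).rev_id)   # 2
    print("pending draft:", draft_revision(history).rev_id)  # 3
```

In this model a spam edit (revision 3) stays invisible to readers until someone flags it, which is exactly why such a system can replace captchas on page editing.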

Clean up wiki spamming

Block user and revert changes

... obviously

Mass deletion and manipulation scripts

There are several command-line scripts that allow for some simple surgery (but read the code and the comments at the top before you use them!)

Extensions:

  • Nuke is an extension that makes it possible for sysops to mass-delete pages. If you have command-line access, you can also use deleteBatch.php.
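deleteBatch.php reads the titles of the pages to delete from a list file, one title per line. The following Python sketch prepares such a list from (title, creator) pairs; where those pairs come from (e.g. copied from Special:NewPages) and the function name are hypothetical, and you should still check the script's options in its header before running it.

```python
from typing import Iterable, List, Set, Tuple


def titles_to_delete(pages: Iterable[Tuple[str, str]], spammers: Set[str]) -> List[str]:
    """Titles of pages created by known spammer accounts, deduplicated, in input order."""
    seen: Set[str] = set()
    result: List[str] = []
    for title, creator in pages:
        if creator in spammers and title not in seen:
            seen.add(title)
            result.append(title)
    return result


if __name__ == "__main__":
    # Hypothetical (title, creator) pairs
    pages = [("Cheap essays", "Spammer1"), ("Wiki", "Admin"), ("Buy papers", "Spammer2")]
    # One title per line: redirect this output to a file for deleteBatch.php
    print("\n".join(titles_to_delete(pages, {"Spammer1", "Spammer2"})))
```

For example, save the output as spam_pages.txt and pass that file to maintenance/deleteBatch.php.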

Hide revisions with inappropriate content

Core MediaWiki has a feature (disabled by default) that adds a special page called Special:RevisionDelete. Deleted revisions and events will still appear in the page history and logs, but parts of their content will be inaccessible to the public. This is useful if you believe that some wiki spammers will link to older revisions in your wiki.

EduTechWiki settings:

$wgGroupPermissions['sysop']['deleterevision']  = true;

In addition, you may rename bad user names. IMHO this is not often needed for fighting spam, but it can be useful, for example, to rename inappropriate logins created by students.

Links

General

Legal issues and official policy

Note:

  • Wiki spamming is worse than e-mail spamming, because it relates to vandalism and therefore additional laws can apply.
  • Official EU and OECD websites are often unstable (link decay: e.g. the official www.oecd-antispam.org website, which is linked to from many places, is dead...)
USA (main direct or indirect source of spamming)
EU
UK

General wiki spamming

Examples from content guidelines - what is spam ?

Mediawiki

There are several MediaWiki pages dealing with spam. As of July 2010 they are not all up-to-date and coordinated.

  • Spam Filter (This is a MediaWiki development page. It includes extra information, e.g. cleanup scripts.)
  • Help:Spam (Wikia). Wikia is a commercial wiki hosting service with many user-managed subwikis that have their own aims and content policies.

Bibliography

  • West, Andrew G., Sampath Kannan and Insup Lee (2010). Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata. Technical Reports (CIS), Department of Computer & Information Science, University of Pennsylvania. PDF. See also: STiki (Spatio-Temporal analysis over Wikipedia)
  • Potthast, Martin, Benno Stein and Robert Gerling (2008). Automatic Vandalism Detection in Wikipedia. In Advances in Information Retrieval, pages 663-668.