Spam
Lookup IP addresses and domain names
Looking up where spam comes from may allow you to block whole domains (e.g. in the httpd.conf file or at the system level). Even when you use a difficult user account procedure, as in this wiki, a wiki can still be spammed manually (typically by underpaid workers in poor countries hired by companies in rich ones), and blocking whole domains can then help a bit.
If your MediaWiki is spammed, you first have to either go through your web server logs (e.g. search for "submitlogin") or install an extension that shows the IP addresses of users.
- The CheckUser extension
- allows you to figure out where spammers connect from and may help you decide whether you should block a whole IP range or several ranges (e.g. a whole country). You can enter either user names or IP numbers, and then both trace and block a user.
- Installed on EduTechWiki and also on some Wikipedia/Wikimedia sites.
Alternatively, dig through the web server access logs and then consult one of these:
- Whois.Net
- whois by IP
- Ping (see if a web or other server is alive; takes both IP numbers and names)
- Easywhois.com (alternative to whois.net)
- ipgp.net (find domain names for IP numbers)
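The log search mentioned above can be sketched as a small script. This Python sketch assumes the web server writes Apache-style common/combined log lines whose first field is the client IP address. The sample lines, the function names `count_hits` and `group_by_network`, and the grouping into /24 networks are illustrative assumptions, not part of MediaWiki or CheckUser.

```python
import re
from collections import Counter
from ipaddress import ip_address, ip_network

# Assumed layout: client address, two fields, [timestamp], "request line".
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)"')

# Hypothetical sample lines (documentation-only IP ranges).
SAMPLE_LOG = [
    '203.0.113.7 - - [16/Jul/2010:11:07:00 +0000] "POST /index.php?title=Special:UserLogin&action=submitlogin&type=signup HTTP/1.1" 200 5120',
    '203.0.113.9 - - [16/Jul/2010:11:07:05 +0000] "POST /index.php?title=Special:UserLogin&action=submitlogin&type=login HTTP/1.1" 200 4000',
    '203.0.113.7 - - [16/Jul/2010:11:08:00 +0000] "POST /index.php?title=Special:UserLogin&action=submitlogin&type=login HTTP/1.1" 200 4000',
    '198.51.100.20 - - [16/Jul/2010:11:09:00 +0000] "GET /index.php/Main_Page HTTP/1.1" 200 9000',
]


def count_hits(lines, needle="submitlogin"):
    """Count requests per client IP whose request line contains `needle`."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group("request").lower():
            hits[match.group("ip")] += 1
    return hits


def group_by_network(hits, prefix=24):
    """Sum per-IP counts into IPv4 networks of the given prefix length."""
    networks = Counter()
    for ip, count in hits.items():
        try:
            address = ip_address(ip)
        except ValueError:
            continue  # not a literal IP address, e.g. a resolved host name
        if address.version != 4:
            continue  # this sketch only groups IPv4 addresses
        networks[str(ip_network(f"{ip}/{prefix}", strict=False))] += count
    return networks


if __name__ == "__main__":
    per_ip = count_hits(SAMPLE_LOG)
    for ip, count in per_ip.most_common():
        print(ip, count)
    for network, count in group_by_network(per_ip).most_common():
        print(network, count)
```

Addresses that cluster in one network are candidates for a range block; the lookup services listed above can then tell you who owns that range. To scan a real log, pass an open file object instead of `SAMPLE_LOG`.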
Mediawiki spamming
There exist several strategies:
Registered users
To fight spamming, only registered users should be able to edit. Edit LocalSettings.php:
$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['*']['createaccount'] = true;
$wgGroupPermissions['*']['read'] = true;
- Light-weight user creation that requires some math
This can defeat some scripts
- Making user creation more difficult with captcha
This can defeat more scripts
- Making user creation even more difficult with recaptcha
- Also contributes to a digitization project.
- MediaWiki extension (reCAPTCHA)
- Learn more about the project
This extension is currently used in EduTechWiki with roughly the following setup:
# Anti Spam ConfirmEdit
# Recaptcha relies on ConfirmEdit, but only ONE needs to be loaded
# require_once("extensions/ConfirmEdit/ConfirmEdit.php");
# ReCaptcha
# See the docs in extensions/recaptcha/ConfirmEdit.php
# http://wiki.recaptcha.net/index.php/Main_Page
require_once( "$IP/extensions/recaptcha/ReCaptcha.php" );
$recaptcha_public_key = '................';
$recaptcha_private_key = '................';
# Users must be registered; once they are in, they still must fill in captchas (at least over the summer)
$wgCaptchaTriggers['edit'] = true;
$wgCaptchaTriggers['addurl'] = false;
$wgCaptchaTriggers['create'] = true;
$wgCaptchaTriggers['createaccount'] = true;
Filtering edits and page names
Prevent creation of pages with bad words in the title and/or the text.
- The built-in $wgSpamRegex variable
MediaWiki includes a $wgSpamRegex variable. The goal is to prevent three things: (a) bad words, (b) links to bad web sites and (c) CSS tricks to hide contents.
Insert in LocalSettings.php something like:
$wgSpamRegex = "/badword1|badword2|abcdefghi-website\.com|display_remove_:none|overflow_remove_:\s*auto;\s*height:\s*[0-4]px;/i";
I will not show ours here since I can't include it in this page ;)
Read the manual page for details. It includes a longer regular expression that you may adopt.
Don't forget to edit MediaWiki:Spamprotectiontext
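Before putting a pattern into LocalSettings.php, it can help to try it against a few sample edits. The following Python sketch does that for a pattern modelled on the sample above, assuming that the `_remove_` markers in the sample are placeholders to be deleted and that Python's `re` module treats this simple pattern the same way as PHP's PCRE does. The names `SPAM_REGEX`, `find_spam` and `SAMPLES` are hypothetical.

```python
import re

# Same alternatives as the sample $wgSpamRegex, without the _remove_ markers.
# re.IGNORECASE plays the role of the trailing /i in the PHP pattern.
SPAM_REGEX = re.compile(
    r"badword1|badword2|abcdefghi-website\.com"
    r"|display:none|overflow:\s*auto;\s*height:\s*[0-4]px;",
    re.IGNORECASE,
)

# Made-up edit texts to test against.
SAMPLES = {
    "link": "Visit http://www.ABCDEFGHI-website.com/ today",
    "hidden": '<div style="overflow: auto; height: 1px;">cheap pills</div>',
    "clean": "A normal sentence about educational technology.",
}


def find_spam(text):
    """Return the first fragment that matches the pattern, or None."""
    match = SPAM_REGEX.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(name, "->", find_spam(text))
```

Such a check also exposes gaps: as written, the `display:none` alternative does not match `display: none` with a space, so a real pattern probably wants `\s*` after the colon there as well.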
- Spam blacklist extensions (an alternative)
The SpamBlacklist extension prevents edits that contain URL hosts that match regular expression patterns defined in specified files or wiki pages.
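The idea of matching URL hosts against a list of patterns can be illustrated with a few lines of Python. This is only a sketch of the principle, not the extension's actual code: the helper names `load_patterns` and `blocked_hosts`, the sample blacklist lines, and the convention that `#` starts a comment are assumptions made for this example.

```python
import re
from urllib.parse import urlsplit

# Rough URL finder; stops at whitespace, angle brackets and a closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\]]+", re.IGNORECASE)

# Hypothetical blacklist: one regular expression fragment per line.
BLACKLIST_LINES = [
    "# hypothetical blacklist entries",
    r"cheap-pills\.example   # made-up spam domain",
    "",
    r"casino[0-9]*\.example",
]

TEXT = "See [http://www.Cheap-Pills.example/buy pills] and https://en.wikipedia.org/wiki/Spam"


def load_patterns(lines):
    """Compile one case-insensitive regex per line, ignoring blanks and comments."""
    patterns = []
    for line in lines:
        fragment = line.split("#", 1)[0].strip()
        if fragment:
            patterns.append(re.compile(fragment, re.IGNORECASE))
    return patterns


def blocked_hosts(text, patterns):
    """Return the hosts of URLs in `text` that match any blacklist pattern."""
    hits = []
    for url in URL_PATTERN.findall(text):
        host = urlsplit(url).hostname or ""
        if any(pattern.search(host) for pattern in patterns):
            hits.append(host)
    return hits


if __name__ == "__main__":
    print(blocked_hosts(TEXT, load_patterns(BLACKLIST_LINES)))
```

An edit would be rejected when `blocked_hosts` returns a non-empty list. Matching on the host only, rather than on the whole text, keeps harmless words in the article body from triggering the filter.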
rel = "nofollow"
Wiki spammers aim at two things:
- Insert well-placed links in articles that deal more or less with the subject area of the spam content, so that people will actually see them and then follow them (same principle as Google ads). This requires understanding an article's content. Since most wiki spammers are poorly trained and poorly paid people from poor non-English-speaking countries, this strategy most often fails.
- Get a better Google ranking. This second purpose will not work in this wiki, since MediaWiki by default marks external links with rel="nofollow", which tells search engines not to use them for ranking (see Manual:Combating spam). Most wiki spammers abusing EduTechWiki do not seem to know about this (some labour really must come cheap ...).
To some companies wiki spamming may seem to be a good strategy, but most often it is not...
Flagged Revisions
Article validation allows for Editor and Reviewer users to rate revisions of articles and set those revisions as the default revision to show upon normal page view. These revisions will remain the same even if included templates are changed or images are overwritten. This allows for MediaWiki to act more as a Content Management System (CMS).
- Install Extension:FlaggedRevs (retrieved July 31, 2010):
According to the FlaggedRevs Help page:
FlaggedRevs is an extension to the MediaWiki software that allows a wiki to monitor the changes that are made to pages, and to control more carefully the content that is displayed to the wiki's readers. Pages can be flagged by certain "editors" and "reviewers" to indicate that they have been reviewed and found to meet whichever criteria the wiki requires. Each subsequent version of the page can be "flagged" by those users to review new changes. A wiki can use a scale of such flags, with only certain users allowed to set each flag.
The ability to flag revisions makes it easier to co-ordinate the process of maintaining a wiki, since it is much clearer which edits are new (and potentially undesirable) and which have been accepted as constructive. It is possible, however, to configure pages so that only revisions that are flagged to a certain level are visible when the page is viewed by readers; hence, changes made by users who cannot flag the resulting version to a high enough level remain in a "draft" form until the revision is flagged by another user who can set a higher flag.
FlaggedRevs is extremely flexible and can be used in a wide range of configurations; it can be discreet enough to be almost unnoticeable, or it can be used to very tightly control a wiki's activity.
Links
General
- Spam (electronic) (Wikipedia)
- Spamdexing (Wikipedia)
- Six Apart Guide to Comment Spam (good reading for web log owners)
- Fight Comment Spam, Ban IP's A large list of banned IP addresses by Chieh Cheng. (There exist others)
- StopBadWare (in case someone managed to upload code, e.g. JavaScript)
- Report Of The Oecd Task Force On Spam: Anti-Spam Toolkit Of Recommended Policies And Measures (2006), PDF.
- Wikis: The Next Frontier for Spammers? (Netcraft, 2004).
Legal issues and official policy
Note:
- Wiki spamming is worse than e-mail spamming, because it relates to vandalism and therefore additional laws can apply.
- Official EU and OECD websites are often unstable (link decay, e.g. the www.oecd-antispam.org official website which is linked to from many places is dead ...)
- Anti-Spam Laws (good resource)
- E-mail spam legislation by country (Wikipedia)
- USA (main direct or indirect source of spamming)
- Spam Laws: The United States CAN-SPAM Act
- The CAN-SPAM Act: A Compliance Guide for Business
- CAN-SPAM Act of 2003 (Wikipedia)
- EU
- The European Coalition Against Unsolicited Commercial Email (EuroCAUCE).
- Unsolicited communications - Fighting Spam (EU Information society portal, retrieved 11:07, 16 July 2010 (UTC)).
- Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions on fighting spam, spyware and malicious software, retrieved 11:07, 16 July 2010 (UTC).
- UK
- Spam Law Summary (Scotch Spam)
- Privacy and Electronic Communications (EC Directive) Regulations 2003 (information commissioner's office)
General wiki spamming
- Wiki Spam (Wikimedia)
- Examples from content guidelines - what is spam ?
- Wikipedia:Spam (part of the Wikipedia content guidelines).
- Wikipedia:WikiProject Spam
- Wiki Spam (from the original wiki)
Mediawiki
- Combating spam (Mediawiki Manual)
- Anti spam features (Mediawiki)
- Spam Filter (This is a development page of MediaWiki. It includes extra information, e.g. cleanup scripts.)
- Help:Spam (Wikia) Wikia is a commercial wiki hosting service with many user-managed subwikis that have their own aims and content policies.
Bibliography
- West, Andrew G., Sampath Kannan and Insup Lee (2010). Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata, Department of Computer & Information Science, Technical Reports (CIS), University of Pennsylvania. PDF. See also: STiki (Spatio-Temporal analysis over Wikipedia)
- Potthast, M., B. Stein and R. Gerling (2008). Automatic vandalism detection in Wikipedia. In Advances in Information Retrieval, pages 663-668.