StatMediaWiki

The educational technology and digital learning wiki
Revision as of 14:42, 31 January 2012

Draft


Introduction

StatMediaWiki is a project that creates tools to collect and aggregate information available in a MediaWiki installation. StatMediaWiki is free software under the GPL v3 or higher license. There are currently two versions of this software: Classic (stable software) and Interactive (currently Beta).

See also:

Classic StatMediaWiki

Results are static HTML pages including tables and graphics that can help to analyze the status and development of the wiki. The tool seems to be well suited for summarizing student contributions, in particular when used over a limited time range (e.g. 6 months).

Interactive StatMediaWiki

This version is currently under development. It is an interactive application with several menus, which generate analyses, graphs and tables according to user instructions.

Installation

(under Ubuntu/Debian)

Get the software

This will retrieve the whole archive

svn checkout https://forja.rediris.es/svn/statmediawiki

Other software needed

(for now, we assume that you already have Python installed)

You may have to install some or all of the following:

apt-get install python-gnuplot
apt-get install python-mysqldb
apt-get install python-numpy
apt-get install python-scipy
apt-get install python-matplotlib

Optionally, you may also need Graphviz.

Create a database user with read-only access to the wiki database

Add a user to the MySQL server
  • E.g. user="analysis" password="xxx" with a SELECT privilege for database "MyWiki"
Add a .my.cnf configuration file to your home directory and specify the following four lines.
[client]
user = analysis
password = xxx
host = localhost
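
Before launching a long analysis, it can be worth checking that the option file is well formed. The following is a minimal sketch in Python 3 (an assumption; the tool itself was written for Python 2). It parses a `.my.cnf`-style text with the standard `configparser` module and reports missing options. The function name `read_client_section` is hypothetical and not part of StatMediaWiki.

```python
import configparser

def read_client_section(text):
    """Return the [client] options of a .my.cnf-style file as a dict."""
    # allow_no_value tolerates bare option names, which MySQL option files permit
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(text)
    return dict(parser["client"])

# Same four lines as in the example above
sample = """[client]
user = analysis
password = xxx
host = localhost
"""

settings = read_client_section(sample)
missing = [key for key in ("user", "password", "host") if key not in settings]
print("missing options:", missing)
```

To check a real file, read its contents (e.g. from `~/.my.cnf`) and pass the text to the function instead of `sample`.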
Running on another machine?

If you don't want to run the analysis scripts on the MediaWiki server, you should add privileges for remote MySQL access (not tested). Our small Sun Fire X4150 2-CPU MediaWiki server managed fine, with a typical load average of 1.2.

Usage of classic

Basically, you can launch a global analysis with the smw.py command line script. This will generate a website that includes the following statistics:

  • Global usage
  • Data per user (content evolution, activity, top pages, uploads, word cloud)
  • Data per page (content evolution, activity, work distribution, top users, word cloud)
  • Data per category
  • A tag cloud

All pages will be analysed (i.e. wiki pages, talk pages, user pages, user talk pages and so forth). I don't know if this is configurable.

Plot data are rendered as PNG, but can also be exported as CSV.

Performance

Depending on the size of your wiki, you will have to wait a few minutes, hours, days or weeks. For example, analysis of the following wiki took about 180 minutes on the server machine mentioned above.

 
Report period:	2006-03-10T00:00:00 – 2012-01-26T17:29:01
Total users:	69
Total pages:	516
Total edits:	7304
Total bytes:	3354641
Total files:	17
Total visits:	37944
Generated in:	2012-01-26T17:29:09.000042

The following one is still running after 4 days, i.e. the time taken seems to increase steeply, possibly exponentially, as a function of users * pages * edits:

Report period:	2006-08-21T00:00:00 – 2012-01-27T18:37:25
Total users:	529
Total pages:	834 
Total edits:	51957 
Total bytes:	8394153 
Total files:	134
Total visits:	256024 
Generated in:	2012-01-27T18:39:22.652603
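
To put the two reports side by side, here is a back-of-the-envelope comparison in Python 3. It computes the product users * pages * edits for both wikis and extrapolates the 180-minute run. The extrapolation assumes that run time is simply proportional to that product, which is a hypothetical baseline and has not been measured.

```python
# Figures copied from the two reports above
small = {"users": 69, "pages": 516, "edits": 7304}
large = {"users": 529, "pages": 834, "edits": 51957}

def workload(wiki):
    """Product users * pages * edits, the size measure used in the text."""
    return wiki["users"] * wiki["pages"] * wiki["edits"]

ratio = workload(large) / workload(small)

# ASSUMPTION (not measured): run time is proportional to the product
small_minutes = 180
estimated_days = small_minutes * ratio / (60 * 24)

print("workload ratio: %.1f" % ratio)
print("estimated run time under the linear assumption: %.1f days" % estimated_days)
```

The product is about 88 times larger for the second wiki, so even a purely proportional run time would come to roughly 11 days. A run that exceeds 4 days is therefore consistent with steep growth, but two data points are not enough to tell proportional from exponential growth.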

smw.py command line parameters

--outputdir: absolute path to the directory where the HTML report site will be generated.
--index: name of the main (initial) file of the report (by default, index.php)
--sitename: name of the wiki that will be shown on the title of the report
--siteurl: URL of the wiki
--subdir: path that has to be added to the URL to get to the wiki (by default /index.php)
--dbname: name of the database of the wiki
--tableprefix: prefix of the tables in the database (only required if you indicated one when installing MediaWiki)
--anonymous: replaces usernames by hashes (salted MD5). Use this if you plan to publish results.
--startdate: start date of the analysis. Example: --startdate=2010-01-01
--enddate: end date of the analysis
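
To illustrate what "salted MD5" means for --anonymous, here is a minimal sketch in Python 3. It is not taken from the StatMediaWiki source: the function name `anonymize`, the salt value and the way salt and name are combined are all assumptions made for the example.

```python
import hashlib

def anonymize(username, salt):
    """Return a salted MD5 hex digest standing in for a username."""
    # ASSUMPTION: salt is simply prepended; the real tool may combine them differently
    data = (salt + username).encode("utf-8")
    return hashlib.md5(data).hexdigest()

# Hypothetical salt and user names, for illustration only
salt = "keep-this-secret"
for name in ("Alice", "Bob"):
    print(name, "->", anonymize(name, salt))
```

The same user always maps to the same hash, so per-user statistics stay meaningful, while the salt makes it harder to recover names by hashing a list of known usernames.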

smw.py command line example

python statmediawiki/trunk/smw.py --outputdir="/web/analysis/dewiki" --sitename=DeWiki --siteurl=http://edutechwiki.unige.ch --subdir="/dewiki/" --dbname=dewiki

You should then see something like:

/export/home/schneide/statmediawiki/trunk/smwget.py:19: DeprecationWarning: the md5 module is deprecated; use hashlib instead
  import md5
---------------------------------------------------------------------------
Welcome to StatMediaWiki 1.1. Web: http://statmediawiki.forja.rediris.es
---------------------------------------------------------------------------
Loaded 14 categories
.....

And remember, the process can take quite a long time even for a small wiki.
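
Since run time grows quickly with wiki size, restricting the analysis to a limited period may help. The sketch below (Python 3) only assembles the command line from the parameters documented above. The helper name `build_smw_command` and the chosen dates are hypothetical, and it assumes that --enddate takes the same YYYY-MM-DD format as the --startdate example.

```python
import datetime

def build_smw_command(outputdir, sitename, siteurl, subdir, dbname, start, end):
    """Return the argument list for one smw.py run restricted to start..end."""
    return [
        "python", "statmediawiki/trunk/smw.py",
        "--outputdir=" + outputdir,
        "--sitename=" + sitename,
        "--siteurl=" + siteurl,
        "--subdir=" + subdir,
        "--dbname=" + dbname,
        # ASSUMPTION: --enddate uses the same format as --startdate
        "--startdate=" + start.isoformat(),
        "--enddate=" + end.isoformat(),
    ]

command = build_smw_command(
    outputdir="/web/analysis/dewiki",
    sitename="DeWiki",
    siteurl="http://edutechwiki.unige.ch",
    subdir="/dewiki/",
    dbname="dewiki",
    start=datetime.date(2011, 9, 1),   # hypothetical period
    end=datetime.date(2012, 1, 26),
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to launch the analysis.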

Links

Official
Other

Usage of interactive

Not tested so far - DKS/21:03, 26 January 2012 (CET)

Bugs

Title bug - version 1.1 as of Jan 26 2012

Titles that include double quotes will make at least two scripts fail, because the title is passed unescaped to gnuplot. Of course, one should use simple titles in a wiki, but try to teach this to education students (...)

Traceback:

gnuplot> plot "/tmp/tmpPSIym8.gnuplot/fifo" title "Edits in Résumé du "livre": Pingeon, D. (1982). La délinquance juvénile stigmatisée. (all users)" with boxes, "/tmp/tmptzDYWK.gnuplot/fifo" title "Edits in Résumé du "livre": Pingeon, D. (1982). La délinquance juvénile stigmatisée. (only anonymous users)" with boxes, "/tmp/tmpMVK5JA.gnuplot/fifo" title "Edits in Résumé du "livre": Pingeon, D. (1982). La délinquance juvénile stigmatisée. (only registered users)" with boxes
                                                                         ^
         line 0: ';' expected

or

gnuplot> plot "/tmp/tmp4caTry.gnuplot/fifo" title "Edits in "L'oiseau et le cachot, naissance de l'éducation correctionnelle en suisse romande 1800-1913" (all users)" with boxes, "/tmp/tmpC60A5f.gnuplot/fifo" title "Edits in "L'oiseau et le cachot, naissance de l'éducation correctionnelle en suisse romande 1800-1913" (only anonymous users)" with boxes, "/tmp/tmpgnJsqX.gnuplot/fifo" title "Edits in "L'oiseau et le cachot, naissance de l'éducation correctionnelle en suisse romande 1800-1913" (only registered users)" with boxes
                                                                                                   ^
         line 0: invalid character 


gnuplot> set title "Accumulative work distribution in "L'oiseau et le cachot, naissance de l'éducation correctionnelle en suisse romande 1800-1913""
                                                                                             ^
         line 0: invalid character 
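
One possible fix, sketched here in Python 3, is to escape the title before it is embedded in a gnuplot command. The helper `gnuplot_quote` is hypothetical and not part of StatMediaWiki 1.1. It relies on gnuplot interpreting backslash escapes inside double-quoted strings.

```python
def gnuplot_quote(text):
    """Return text as a double-quoted gnuplot string literal."""
    # Escape backslashes first, then the double quotes that end the literal
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

# Shortened version of one of the failing titles above
title = 'Edits in Résumé du "livre" (all users)'
print("set title " + gnuplot_quote(title))
```

A cruder alternative would be to replace double quotes in titles by single quotes before plotting.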

Namespace bug - version 1.1 as of Jan 26 2012

You will have to manually edit the Python code (see below) if you use extra namespaces in your wiki.

In the trace below, KeyError: 102 apparently refers to an extra namespace in use (according to Erkan Yilmaz, 2011-08-18).

Welcome to StatMediaWiki 1.1. Web: http://statmediawiki.forja.rediris.es
---------------------------------------------------------------------------
Loaded 105 categories
Loaded 1070 images
Loaded 2186 pages
Loaded 20644 revisions
Loaded 334 users
Traceback (most recent call last):
  File "statmediawiki/trunk/smw.py", line 55, in <module>
    main()
  File "statmediawiki/trunk/smw.py", line 43, in main
    smwload.load()
  File "/export/home/schneide/statmediawiki/trunk/smwload.py", line 42, in load
    fillFullpagetitles()
  File "/export/home/schneide/statmediawiki/trunk/smwload.py", line 68, in fillFullpagetitles
    smwconfig.pages[page_id]["full_page_title"] = page_props["page_namespace"] == 0 and page_props["page_title"] or '%s:%s' % (smwconfig.namespaces[page_props["page_namespace"]], page_props["page_title"])
KeyError: 102

Workaround according to Emilio José Rodríguez Posada (emijrp)

Do you know a bit of Python? You need to add your non-canonical namespaces to "line 159" in the smwload.py file. Look at the example below (new namespaces at end of line):

smwconfig.namespaces = {-2: "Media", -1: "Special", 0: "Main", 1: "Talk", 2: "User", 3: "User talk", 4: "Project", 5: "Project talk",
6: "File", 7: "File talk", 8: "MediaWiki", 9: "MediaWiki talk", 10: "Template", 11: "Template talk", 
12: "Help", 13: "Help talk", 14: "Category", 15: "Category talk", 102: "your namespace", 103: "other namespace"}

You can see all your namespaces in Special:AllPages (see the pull-down menu) or in LocalSettings.php.
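
As an alternative to listing every extra namespace by hand, the title lookup could fall back to a generic label. The sketch below (Python 3) mirrors the logic of the failing line in the traceback. It is not a patch from the StatMediaWiki authors: the function `full_page_title` and the fallback label "Namespace N" are assumptions.

```python
def full_page_title(page_props, namespaces):
    """Build 'Namespace:Title', tolerating namespace ids missing from the table."""
    ns = page_props["page_namespace"]
    title = page_props["page_title"]
    if ns == 0:
        return title  # main namespace: no prefix
    # Fall back to a generic label instead of raising KeyError
    prefix = namespaces.get(ns, "Namespace %d" % ns)
    return "%s:%s" % (prefix, title)

namespaces = {0: "Main", 1: "Talk", 2: "User"}  # shortened for the example
print(full_page_title({"page_namespace": 102, "page_title": "Foo"}, namespaces))
print(full_page_title({"page_namespace": 1, "page_title": "Foo"}, namespaces))
```

With this fallback the analysis would keep running on wikis with custom namespaces, at the cost of less readable page titles in the report.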

Bibliography

  • Rodríguez-Posada, Emilio J.; Juan Manuel Dodero, Manuel Palomo-Duarte, Inmaculada Medina-Bulo (2011). Learning-Oriented Assessment of Wiki Contributions: How to Assess Wiki Contributions in a Higher Education Learning Setting. Proceedings of CSEDU2011, 3rd International Conference on Computer Supported Education. Noordwijkerhout, The Netherlands. PDF Reprint
  • See also: Publicaciones (list of publications, mostly in Spanish at StatMediaWiki)