Mediawiki2latex

The educational technology and digital learning wiki

Draft

Introduction

Mediawiki2latex (also known as wb2pdf) is a tool created by Dirk Huenniger that exports MediaWiki pages and article collections to LaTeX, PDF, EPUB and ODT.

It can be used to (1) create print documents on demand and (2) export book projects that start from wiki pages as draft documents.


Disclaimer. This is not an official documentation page. Also, prior to Feb 10, 2019, this toolset was designed to work with standard installations, i.e. not our type of MediaWiki installation. A quick fix now allows creating PDFs from book collections on demand. Other functionality may be implemented at a later stage.

Using

Mediawiki2latex works best with Wikipedia. As of Feb 13, certain functionality does not work with this wiki, but may work on others. Below we introduce the options you have for using this platform. The command line is probably the most productive option.

Official online server

The official online server allows processing books within limits, so we recommend installing your own platform if you have a Debian/Ubuntu machine.

Using your own installation is faster and takes load off the official server.

Your own local server

You can run your own server, either as a public or as a local server:

mediawiki2latex -s PORT_NUMBER
e.g.
mediawiki2latex -s 8080

Command line

Again, some of these may not work with your wiki. Some combinations of parameters do not work, e.g. one cannot combine "bookmode" and "user templates".

See also: official manual. It includes more information.

  -V, -?, -v    --version, --help     show version number
  -o FILE       --output=FILE         output FILE (REQUIRED)
  -f START:END  --featured=START:END  run selftest on featured article numbers from START to END
  -x CONFIG     --hex=CONFIG          hex encoded full configuration for run
  -s PORT       --server=PORT         run in server mode, listening on the given port. E.g. 8080 starts it on http://localhost:8080/
  -t FILE       --templates=FILE      user template map FILE
  -r INTEGER    --resolution=INTEGER  maximum image resolution in dpi INTEGER, default = 300 DPI
  -u URL        --url=URL             input URL (REQUIRED)
  -p PAPER      --paper=PAPER         paper size, one of A4,A5,B5,letter,legal,executive
  -m            --mediawiki           use MediaWiki to expand templates
  -h            --html                use MediaWiki generated html as input (default). Will render expanded templates
  -k            --bookmode            uses a Pediapress collection "book" page, i.e. retrieves its list of included articles.
  -z            --zip                 output zip archive of latex source
  -b            --epub                output epub file
  -d            --odt                 output odt file
  -g            --vector              keep vector graphics in vector form
  -i            --internal            use internal template definitions
  -l DIRECTORY  --headers=DIRECTORY   use user supplied latex headers
  -c DIRECTORY  --copy=DIRECTORY      copy LaTeX tree to DIRECTORY
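
As a rough illustration of how these options combine, the sketch below prints (rather than runs) a command line for a chosen output format. The function name m2l_command and the output.FORMAT file name are made up for this example; only the flags are taken from the list above.

```shell
#!/bin/sh
# Hypothetical helper (not part of mediawiki2latex): prints the command line
# for a given output format so that it can be inspected before running it.
# Usage: m2l_command FORMAT URL [book] [TEMPLATE_FILE]
m2l_command() {
    format=$1 url=$2 book=${3:-} templates=${4:-}

    # Map the output format to the corresponding flag from the option list.
    case $format in
        pdf)  flag="" ;;
        epub) flag="-b" ;;
        odt)  flag="-d" ;;
        zip)  flag="-z" ;;
        *)    echo "unknown format: $format" >&2; return 1 ;;
    esac

    # The text notes that bookmode and user templates cannot be combined.
    if [ "$book" = "book" ] && [ -n "$templates" ]; then
        echo "bookmode (-k) cannot be combined with user templates (-t)" >&2
        return 1
    fi

    cmd="mediawiki2latex -o output.$format"
    [ -n "$flag" ] && cmd="$cmd $flag"
    [ "$book" = "book" ] && cmd="$cmd -k"
    [ -n "$templates" ] && cmd="$cmd -t '$templates'"
    # URLs are single-quoted because wiki URLs often contain parentheses.
    printf '%s\n' "$cmd -u '$url'"
}
```

For example, `m2l_command epub https://example.org/wiki/My_Book book` prints a bookmode EPUB command, while passing both `book` and a template file returns an error instead of a command.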

Example code for books (replace URL by your own)

  • Generate a PDF from a wiki book ("Collection" extension), starting from the HTML code
mediawiki2latex -o book.pdf -k -u https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres/Book_title
  • Create a LibreOffice document from a collection (see comments below with respect to LibreOffice)
mediawiki2latex -o book.odt -k -d -u https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres/XXX_YYY -c .
  • Create a zip file with latex and assets
mediawiki2latex -o book.zip -k -z -u https://edutechwiki.unige.ch/fr/BookNS:Books/Book_title
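
If several collections need to be exported, the book commands above could be wrapped in a loop. The following is only a sketch: the helper names book_url and build_books are invented here, and the base URL is copied from the examples and must be replaced by your own. Titles with non-ASCII characters are not percent-encoded by this sketch.

```shell
#!/bin/sh
# Base URL of the namespace holding the collections (taken from the examples
# above; replace it with your own).
BASE_URL="https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres"

# Hypothetical helper: MediaWiki page URLs use underscores instead of spaces.
book_url() {
    printf '%s/%s\n' "$BASE_URL" "$(printf '%s' "$1" | tr ' ' '_')"
}

# Hypothetical helper: builds one PDF per book title in bookmode (-k), e.g.
#   build_books "Book title" "Another book"
build_books() {
    for title in "$@"; do
        out="$(printf '%s' "$title" | tr ' ' '_').pdf"
        echo "Building $out"
        # Keep going with the next book if one export fails.
        mediawiki2latex -o "$out" -k -u "$(book_url "$title")" \
            || echo "failed: $title" >&2
    done
}
```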

Example code for articles

  • Create a page using wiki expansion (not working as of Feb 11 2019 in this wiki)
mediawiki2latex -o article.pdf -m -u "https://edutechwiki.unige.ch/fr/STIC:STIC_III_(2018)/Prototypes_de_physicalisation_-_broderie_machine"
  • Create a page using a user template map (the -t option specifies the template file)
mediawiki2latex -o article.pdf -u "https://edutechwiki.unige.ch/fr/STIC:STIC_III_(2018)/Prototypes_de_physicalisation_-_broderie_machine" -t /usr/share/mediawiki2latex/latex/templates.user

Template Tweaking

The easiest way is to use HTML mode, since templates will be expanded. However, you may then get unwanted content. Alternatively, you could retrieve pages in wiki mode, but you then have to define LaTeX templates. See the official documentation.

If you set $wgDefaultUserOptions['numberheadings'] = 1; in LocalSettings, remove it temporarily while mediawiki2latex downloads the articles. Alternatively, use wiki mode, if it works in your wiki.

Reduce image size to 400px. (I have to test if this works with thumbnails).

Wrapping of images. There seems to be a model that could be used (to do).

Exclude all templates you don't want, by editing /usr/share/mediawiki2latex/latex/templates.user and by using the "-t" option. The templates.user file is not read automatically by the system. E.g. add

["tutorial","LaTeXNullTemplate"],
["tutoriel","LaTeXNullTemplate"],
["syllabus","LaTeXNullTemplate"]
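
When many templates have to be excluded, the entries can be generated instead of typed. This is a minimal sketch: null_template_entries is a made-up helper name, and where the printed lines go inside your copy of templates.user depends on the list syntax already present in that file.

```shell
#!/bin/sh
# Hypothetical helper: prints one LaTeXNullTemplate entry per template name,
# comma-separated in the same style as the entries shown above.
null_template_entries() {
    sep=""
    for name in "$@"; do
        printf '%s["%s","LaTeXNullTemplate"]' "$sep" "$name"
        # Every entry except the first is preceded by a comma and a newline.
        sep=",
"
    done
    printf '\n'
}

# Example: null_template_entries tutorial tutoriel syllabus
```

The output can be pasted into a private copy of the template file, which is then passed to mediawiki2latex with the -t option.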

Copyright information / header

You can define your own headers by modifying and recompiling

document/headers/options.tex

Otherwise, use the --headers option.

(Not tested so far.)

Creating wiki books

Using transclusion

You could create a wiki page that includes other articles. However, there is a processing limit: if you include dozens of pages, you may experience slowdowns or exceed the maximum number of templates allowed. Nevertheless, as of April 2019, you must use this method to create books with template expansion (-m or -t flag).

= title 1 =
{{:MyPageOne}}
= title 2 =
{{:MyPageTwo}}
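
For longer books, the wikitext of such a page can be generated from a list of page names. The sketch below assumes that each chapter title is simply the page name with underscores replaced by spaces; transclusion_page is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prints wikitext that transcludes each given page under
# a level-1 heading derived from the page name (underscores become spaces).
transclusion_page() {
    for page in "$@"; do
        title=$(printf '%s' "$page" | tr '_' ' ')
        printf '= %s =\n{{:%s}}\n' "$title" "$page"
    done
}

# Example: transclusion_page MyPageOne MyPageTwo > my_book.wiki
```

The resulting text is meant to be pasted into the wiki page that serves as the book.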

Example using template expansion

mediawiki2latex -o book.pdf -u https://edutechwiki.unige.ch/fr/Daniel_K._Schneider/My_Book -t /usr/share/mediawiki2latex/latex/templates.user

Make sure that headings in included pages start at level "==" and not "=".
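
One way to check the heading levels before building is to scan the raw wikitext of each included page for level-1 headings. This is a sketch under assumptions: level1_headings is a made-up helper, and the URL in the usage comment is a placeholder, since the path to index.php depends on how your wiki is configured.

```shell
#!/bin/sh
# Hypothetical helper: reads wikitext on stdin and prints every line that is
# a level-1 heading, i.e. starts with a single "=" and ends with "=".
level1_headings() {
    grep -E '^=[^=].*=[[:space:]]*$'
}

# Usage (placeholder URL; action=raw returns the page's wikitext):
#   curl -s "https://example.org/index.php?title=MyPageOne&action=raw" \
#       | level1_headings
```

Any line printed by the helper is a heading that should be demoted before the page is transcluded.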

Using collection extension

The recommended solution is to use wiki books defined by the collection extension, i.e. use a feature from the alternative PediaPress technology.

As of Feb 2019, this works with our wikis using default mode (html-based). It fails using "wiki" template expansion.

Installation

See the official links below first. Below we just wrote down what worked in Feb 2019 on Ubuntu 18.x LTS.

(1) Install the (old) default version, which will also install lots of run time dependencies (compatible with your current Ubuntu system).

sudo apt-get install mediawiki2latex

(2) Then install the build time dependencies (as root) as explained on the installation page (Benutzer:Dirk Hünniger/wb2pdf/installing), i.e. about 30 packages:

sudo apt-get install ghc libghc-x509-dev libghc-pem-dev
sudo apt-get install libghc-regex-compat-dev libghc-http-dev cabal-install libghc-hxt-dev
sudo apt-get install libghc-split-dev libghc-blaze-html-dev libghc-file-embed-dev
sudo apt-get install libghc-highlighting-kate-dev  libghc-hxt-http-dev libghc-regex-pcre-dev
sudo apt-get install libghc-temporary-dev libghc-url-dev libghc-utf8-string-dev
sudo apt-get install libghc-utility-ht-dev libghc-http-conduit-dev libghc-happstack-server-dev
sudo apt-get install libghc-directory-tree-dev libghc-zip-archive-dev libghc-strict-dev
sudo apt-get install libghc-network-uri-dev libghc-tagsoup-dev libghc-word8-dev
sudo apt-get install ghostscript calibre latex2rtf libreoffice 

(3) Then install the new version from the git repository

git clone https://git.code.sf.net/p/wb2pdf/git wb2pdf-git
cd wb2pdf-git
make
sudo make install

To update:

  • cd into the wb2pdf-git directory
sudo git pull
sudo make install
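
The update steps could be collected in a small script. The sketch below is an assumption-laden variant, not the documented procedure: it adds an explicit make step (as in the initial installation), runs git pull without sudo (which presumes the clone belongs to your user), and offers a dry-run mode. The names run and update_wb2pdf are invented.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN=1 it only prints the command.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical update helper. The body runs in a subshell, so the caller's
# working directory is left unchanged.
update_wb2pdf() (
    repo=${1:-wb2pdf-git}
    run cd "$repo" || exit 1
    run git pull || exit 1
    run make || exit 1          # assumption: rebuild before installing
    run sudo make install
)

# Example: DRY_RUN=1; update_wb2pdf "$HOME/wb2pdf-git"
```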

(4) Add a list of templates you want the system to ignore

  • Edit file wb2pdf-git/latex/templates.user and reinstall ??
  • Else, edit file /usr/share/mediawiki2latex/latex/templates.user or copy it and then use it with the "-t" option.

(5) Add fonts

  • Not needed for Ubuntu

LibreOffice installation and creation tips

As of Jan 7 2020:

(1) Get the latest LibreOffice; see https://wiki.ubuntu.com/LibreOffice

sudo apt install python-software-properties
sudo apt-add-repository ppa:libreoffice/ppa
sudo apt update
sudo apt install libreoffice
$ libreoffice --version
 LibreOffice 6.3.4.2 30(Build:2)

(2) Make sure that ImageMagick has permission to transform PS and PDF files to PNG

In /etc/ImageMagick-6/policy.xml

 <policy domain="coder" rights="read|write" pattern="PS" />
 <policy domain="coder" rights="none|write" pattern="PS2" />
 <policy domain="coder" rights="none|write" pattern="PS3" />
 <policy domain="coder" rights="none|write" pattern="EPS" />
 <policy domain="coder" rights="read|write" pattern="PDF" />
 <policy domain="coder" rights="read|write" pattern="XPS" /> 
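
To see which rights are currently set, the policy file can be queried from the shell. The following sketch assumes that the attributes appear in the order shown above (domain, rights, pattern) and does not skip lines that are commented out in the XML; coder_rights is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prints the rights attribute of the "coder" policy for
# a given pattern (e.g. PDF) in an ImageMagick policy file.
# Usage: coder_rights PATTERN [POLICY_FILE]
coder_rights() {
    pattern=$1 file=${2:-/etc/ImageMagick-6/policy.xml}
    # The closing quote after the pattern keeps "PS" from matching "PS2".
    sed -n 's/.*domain="coder".*rights="\([^"]*\)".*pattern="'"$pattern"'".*/\1/p' "$file"
}

# Example: for p in PS PDF; do echo "$p: $(coder_rights "$p")"; done
```

If the helper prints none for PS or PDF, the conversion to PNG is likely to be refused until the policy is edited as shown above.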

(3) (Fixed) In versions older than Jan 13 2020, the ODT export could not find the image files; this is fixed now. As a workaround, I did the following: activate "copy to latex" (the -c option), then make sure that LibreOffice can find the images and formulas directories it is looking for, e.g. if you start from a Pediapress book definition:

mkdir somedirectory
cd somedirectory
mediawiki2latex -o ct.odt -d -u https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres/Initiation_%C3%A0_la_pens%C3%A9e_computationnelle_avec_JavaScript -k -c .

then

mv document/images/ . 
mv document/formulas/ .

Links