Making a local Wikipedia mirror

When internet connectivity is unreliable, or you expect to be offline for a while (for example, when travelling), a locally accessible copy of Wikipedia is useful to have.

Periodic Wikipedia dumps are available in ZIM format at https://wiki.kiwix.org/wiki/Content_in_all_languages. The ZIM format stores both formatted text and images in a single compressed file.

The kiwix-tools package can read ZIM files and display their content.

Downloading the Wikipedia dump

On that page, look for the “wikipedia (English)” pack in its “all maxi” version. It occupies about 90GB of disk space. Download it, preferably over BitTorrent, to reduce the load on the upstream servers.

On Ubuntu, the download can be done from the command line:

# Install the aria2 client (for downloading over HTTP and BitTorrent)
sudo apt-get install aria2

# Pull the latest torrent file and download its content
aria2c https://download.kiwix.org/zim/wikipedia_en_all_maxi.zim.torrent
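
Because the dump needs about 90GB, it can be worth checking the free disk space before starting the download. The following is a minimal sketch, not part of the original instructions: the has_free_space function name and the threshold passed to it are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the filesystem holding DIR
# has at least NEED_GB gigabytes available.
has_free_space() {
    dir=$1
    need_gb=$2
    # df -Pk prints one POSIX-format line per filesystem, in 1024-byte
    # blocks; the fourth field of that line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    need_kb=$((need_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$need_kb" ]
}
```

For example, has_free_space . 100 && aria2c https://download.kiwix.org/zim/wikipedia_en_all_maxi.zim.torrent would only start the download if the current directory has about 100GB free, which leaves some margin over the size quoted above.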

Using the dump

ZIM archives are read and served over HTTP by the kiwix-serve program from the kiwix-tools package, which has to be installed separately.

Official download archives are available from the Kiwix project. On Ubuntu, the package can be installed from the command line:

# Add the PPA that hosts the kiwix-tools package
sudo add-apt-repository ppa:kiwixteam/release

# Install the kiwix-tools package
sudo apt-get install kiwix-tools

# Adapt the port to an available TCP port and the file name to the file you downloaded
kiwix-serve --port=8080 wikipedia_en_all_maxi_2021-03.zim
# Point a web browser to that port (over HTTP) and enjoy your local mirror.
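
The dated file name changes with every dump, so it may be convenient to look it up instead of typing it. The sketch below assumes the downloaded files sit in the current directory and follow the name_YYYY-MM.zim pattern seen above; newest_zim is a hypothetical helper name, not part of kiwix-tools.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent ZIM file whose name
# starts with PREFIX followed by an underscore and a date.
newest_zim() {
    newest=
    for f in "$1"_*.zim; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        # The shell expands globs in sorted order, and the YYYY-MM
        # suffix sorts chronologically, so the last match is the latest.
        newest=$f
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

It could then be used as kiwix-serve --port=8080 "$(newest_zim wikipedia_en_all_maxi)". The function returns a non-zero status and prints nothing when no matching file exists.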

Bonus: Mirroring both Wikipedia and Project Gutenberg locally

Project Gutenberg is an effort to collect, scan, OCR, and digitally publish public domain (out-of-copyright) books. The Project Gutenberg archive can be downloaded in the same ZIM format.

On the same Kiwix content page, look for “gutenberg (English)” and download it. It takes about 70GB of disk space.

Alternatively, download it from the command line:

aria2c https://download.kiwix.org/zim/gutenberg_en_all.zim.torrent

To serve both Project Gutenberg and Wikipedia content, pass both files to kiwix-serve:

kiwix-serve --port=8080 gutenberg_en_all_2021-06.zim wikipedia_en_all_maxi_2021-03.zim
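
As more archives are added, listing each file by hand becomes tedious. The following sketch builds the same kind of command from every ZIM file found in a directory and prints it for review. The serve_command name and the idea of keeping all archives in one directory are assumptions for illustration; only the --port option and the file arguments come from the command above.

```shell
#!/bin/sh
# Hypothetical helper: print a kiwix-serve command that serves
# every ZIM file found in DIR on the given PORT.
serve_command() {
    dir=$1
    port=$2
    # Reuse the positional parameters as the argument list being built.
    set -- kiwix-serve "--port=$port"
    for f in "$dir"/*.zim; do
        # Skip the unexpanded pattern when the directory has no ZIM file.
        [ -e "$f" ] || continue
        set -- "$@" "$f"
    done
    # Fewer than three words means no ZIM file was found.
    [ "$#" -ge 3 ] || return 1
    printf '%s\n' "$*"
}
```

Calling serve_command . 8080 in the download directory prints the command without running it. The printed form does not quote file names, so it is only safe to copy and paste when the paths contain no spaces.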