
Mordecai3⚓︎

Mordecai3 is a powerful (but also resource-intensive) toolkit that identifies mentions of geographical locations in text and matches them to the Geonames.org knowledge base.

You can find additional information in the upstream repositories, openeventdata/es-geonames and ahalterman/mordecai3 (both cloned below).

Running Elasticsearch Geonames⚓︎

We had issues with the default configuration in the GitHub project, and the official Elasticsearch Docker documentation lists some additional settings that need adjusting. Here's how to get it running:

# Create folder
mkdir /srv/mordecai3
cd /srv/mordecai3

# Get projects
git clone git@github.com:openeventdata/es-geonames.git
git clone git@github.com:ahalterman/mordecai3.git

# Setup venv (see more below)
python3.11 -m venv venv
# Fix permissions (should be repeated when you install something new as a different user)
chmod -R g+w venv
chown -R mordecai:mordecai venv

source venv/bin/activate
pip install -r es-geonames/requirements.txt
pip install -r mordecai3/requirements.txt
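# Depending on the mordecai3 version, the spaCy model it uses may also need to be
# downloaded separately (check the mordecai3 README for the exact model name), e.g.:
# python -m spacy download en_core_web_trf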

# Give ownership to the dedicated user
chown -R mordecai:mordecai .

# Create folder for the ES index
mkdir geonames_index
chown -R mccadmin:mccadmin geonames_index/

# Install Docker Compose. Note that the commands below use the "docker compose"
# (v2 plugin) syntax; if this package only provides the legacy docker-compose
# binary, install the compose plugin instead.
sudo apt install docker-compose
# [adjust the files as shown below]

# Build index
cd es-geonames
chmod ug+x create_index.sh
./create_index.sh

# (optional) add your user to the mordecai group
sudo usermod -a -G mordecai [username]

The adjusted create_index.sh we ended up with:

# Stop on error
set -e

echo "Starting Docker container and data volume..."

# Start docker container
sudo docker compose up -d
sudo docker ps

echo "Downloading Geonames gazetteer..."
wget https://download.geonames.org/export/dump/allCountries.zip
wget https://download.geonames.org/export/dump/admin1CodesASCII.txt
wget https://download.geonames.org/export/dump/admin2Codes.txt
echo "Unpacking Geonames gazetteer..."
unzip allCountries.zip

echo "Waiting a bit for ES to start"
sleep 60
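# Optional, more robust alternative to the fixed sleep (a sketch): poll
# Elasticsearch until it responds before continuing, e.g.
# until curl -s "0.0.0.0:9200" > /dev/null; do sleep 5; done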

echo "Creating mappings for the fields in the Geonames index..."
curl -XPUT '0.0.0.0:9200/geonames' -H 'Content-Type: application/json' -d @geonames_mapping.json

echo "Change disk availability limits..."
curl -X PUT "0.0.0.0:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "10gb",
    "cluster.routing.allocation.disk.watermark.high": "5gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "4gb",
    "cluster.info.update.interval": "1m"
  }
}
'

echo "\nLoading gazetteer into Elasticsearch..."
python geonames_elasticsearch_loader.py

echo "Done"
And the adjusted docker-compose.yml:

version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.18
    volumes:
      - /srv/mordecai3/geonames_index:/usr/share/elasticsearch/data
    environment:
      - node.name=mordecai
      - cluster.name=mordecai
      - discovery.type=single-node
      - xpack.security.enabled=false
      - xpack.security.http.ssl.enabled=false
      - "ES_JAVA_OPTS=-Xms4g -Xmx4g"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - "0.0.0.0:9200:9200"
Finally, in the loader script (geonames_elasticsearch_loader.py), point the Elasticsearch client at 0.0.0.0 instead of localhost:

# replace
es = Elasticsearch(urls='http://localhost:9200/', timeout=60, max_retries=2)
# with
es = Elasticsearch(urls='http://0.0.0.0:9200/', timeout=60, max_retries=2)

Assuming the container is down, spin it up via:

cd /srv/mordecai3
source venv/bin/activate
cd es-geonames
sudo docker compose up
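Once Elasticsearch is up, a quick sanity check that it is reachable at the adjusted address and that the geonames index is populated:

curl 'http://0.0.0.0:9200/'
curl 'http://0.0.0.0:9200/geonames/_count?pretty'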

Setting up a kernel to be used in JupyterLab:

cd /srv/mordecai3
source venv/bin/activate
pip install ipykernel
python -m ipykernel install --user --name=global-mordecai3
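To confirm the kernel was registered (global-mordecai3 should appear in the list):

jupyter kernelspec list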

Using in Python/JupyterLab⚓︎

If you'd like to use Mordecai3, use the global-mordecai3 kernel. In addition to all the requirements of mordecai3 and es-geonames, it has most of the common packages installed as well.

Below is a minimal example to extract geolocations from a bit of text. This is a sketch based on the Geoparser interface shown in the mordecai3 README; it assumes the Elasticsearch Geonames container is running and reachable on port 9200:
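from mordecai3 import Geoparser

# Loading the geoparser and its models can take a while.
geo = Geoparser()

# Geoparse a short piece of text; the example sentence is taken from the
# mordecai3 README.
result = geo.geoparse_doc("I visited Alexanderplatz in Berlin.")
print(result)

# The resolved places, including their Geonames entries, should appear under
# result["geolocated_ents"].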

Random useful commands⚓︎

docker ps                # list running containers
docker kill [container]  # force-stop a container
docker images            # list local images
docker volume ls         # list volumes
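To stop the whole stack or inspect the Elasticsearch logs, run these from the es-geonames folder (where the docker-compose.yml lives):

sudo docker compose down
sudo docker compose logs elasticsearch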