Mordecai3
Mordecai3 is a powerful (but resource-intensive) toolkit that identifies mentions of geographical locations in text and matches them to the Geonames.org knowledge base.
You can find additional information here:
Running Elasticsearch Geonames
The default configuration in the es-geonames GitHub project did not work for us, and the official Elasticsearch Docker documentation lists further settings that need adjusting. Here's how to get it running:
# Create folder
mkdir /srv/mordecai3
cd /srv/mordecai3
# Get projects
git clone git@github.com:openeventdata/es-geonames.git
git clone git@github.com:ahalterman/mordecai3.git
# Set up venv (see more below)
python3.11 -m venv venv
# Fix permissions (should be repeated when you install something new as a different user)
chmod -R g+w venv
chown -R mordecai:mordecai venv
source venv/bin/activate
pip install -r es-geonames/requirements.txt
pip install -r mordecai3/requirements.txt
# Send ownership to dedicated user
chown -R mordecai:mordecai .
# Create folder for ES index
mkdir geonames_index
chown -R mccadmin:mccadmin geonames_index/
sudo apt install docker-compose
# [adjust files as shown below]
# Build index
cd es-geonames
chmod ug+x create_index.sh
./create_index.sh
# (optional) add user to the group
sudo usermod -a -G mordecai [username]
# Stop on error
set -e
echo "Starting Docker container and data volume..."
# Start docker container
sudo docker compose up -d
sudo docker ps
echo "Downloading Geonames gazetteer..."
wget https://download.geonames.org/export/dump/allCountries.zip
wget https://download.geonames.org/export/dump/admin1CodesASCII.txt
wget https://download.geonames.org/export/dump/admin2Codes.txt
echo "Unpacking Geonames gazetteer..."
unzip allCountries.zip
echo "Waiting a bit for ES to start"
sleep 60
echo "Creating mappings for the fields in the Geonames index..."
curl -XPUT '0.0.0.0:9200/geonames' -H 'Content-Type: application/json' -d @geonames_mapping.json
echo "Change disk availability limits..."
curl -X PUT "0.0.0.0:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "10gb",
    "cluster.routing.allocation.disk.watermark.high": "5gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "4gb",
    "cluster.info.update.interval": "1m"
  }
}
'
echo "\nLoading gazetteer into Elasticsearch..."
python geonames_elasticsearch_loader.py
echo "Done"
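The fixed `sleep 60` in the script is only a guess at how long Elasticsearch needs to start. A minimal Python sketch of an alternative, assuming the node is reachable at `http://0.0.0.0:9200` as configured here: poll Elasticsearch's standard `_cluster/health` endpoint until it reports `yellow` or `green` (a single-node cluster typically stays `yellow` once an index with replicas exists). The names `http_probe` and `wait_for_elasticsearch` are illustrative and not part of es-geonames.

```python
import json
import time
import urllib.request


def http_probe(base_url):
    """Return the cluster status string, or None if the node is not reachable yet."""
    url = f"{base_url.rstrip('/')}/_cluster/health"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp).get("status")
    except (OSError, ValueError):
        # OSError covers refused/reset connections and URLError;
        # ValueError covers a response body that is not valid JSON.
        return None


def wait_for_elasticsearch(base_url="http://0.0.0.0:9200", timeout=120.0,
                           interval=2.0, probe=http_probe, sleep=time.sleep):
    """Poll until the cluster is yellow or green.

    Returns True once the cluster is ready, False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe(base_url) in ("yellow", "green"):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Calling `wait_for_elasticsearch()` before creating the mappings would let the setup continue as soon as the node answers, and fail visibly if it never does.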
version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.18
    volumes:
      - /srv/mordecai3/geonames_index:/usr/share/elasticsearch/data
    environment:
      - node.name=mordecai
      - cluster.name=mordecai
      - discovery.type=single-node
      - xpack.security.enabled=false
      - xpack.security.http.ssl.enabled=false
      - "ES_JAVA_OPTS=-Xms4g -Xmx4g"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - "0.0.0.0:9200:9200"
# replace
es = Elasticsearch(urls='http://localhost:9200/', timeout=60, max_retries=2)
# with
es = Elasticsearch(urls='http://0.0.0.0:9200/', timeout=60, max_retries=2)
If the container is down, spin it up via:
cd /srv/mordecai3
source venv/bin/activate
cd es-geonames
sudo docker compose up
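Once the container is up, it can help to confirm that the `geonames` index is still there and populated. A minimal sketch, assuming the port mapping configured here and using Elasticsearch's standard `_count` endpoint; `fetch_json` and `geonames_doc_count` are hypothetical helper names, not part of es-geonames or mordecai3.

```python
import json
import urllib.request


def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def geonames_doc_count(base_url="http://0.0.0.0:9200", index="geonames",
                       fetch=fetch_json):
    """Return the number of documents in the index via GET /<index>/_count."""
    payload = fetch(f"{base_url.rstrip('/')}/{index}/_count")
    return int(payload["count"])


# From a second terminal, while the container is running:
# print(geonames_doc_count())
```

A count of 0, or an HTTP error for a missing index, suggests that the index needs to be rebuilt with `create_index.sh`.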
Setting up a kernel to be used in jupyterlab
cd /srv/mordecai3
source venv/bin/activate
pip install ipykernel
python -m ipykernel install --user --name=global-mordecai3
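To check that the kernel was registered against the right interpreter, the generated `kernel.json` can be inspected. A minimal sketch, assuming the default per-user Jupyter data directory on Linux (`~/.local/share/jupyter`, which `JUPYTER_DATA_DIR` or `XDG_DATA_HOME` can override); `user_kernel_dir` and `kernel_interpreter` are hypothetical helper names.

```python
import json
from pathlib import Path


def user_kernel_dir(name, home=None):
    """Default per-user kernelspec directory on Linux.

    Assumption: JUPYTER_DATA_DIR and XDG_DATA_HOME are not set.
    """
    home = Path(home) if home is not None else Path.home()
    return home / ".local" / "share" / "jupyter" / "kernels" / name


def kernel_interpreter(name="global-mordecai3", home=None):
    """Return the interpreter path recorded in kernel.json, or None if not installed."""
    spec_file = user_kernel_dir(name, home) / "kernel.json"
    if not spec_file.is_file():
        return None
    argv = json.loads(spec_file.read_text(encoding="utf-8")).get("argv", [])
    return argv[0] if argv else None


# print(kernel_interpreter())
```

If the install command was run with the venv active, the returned path should point into `/srv/mordecai3/venv`.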
Using in python/jupyterlab
To use Mordecai3 in a notebook, select the global-mordecai3 kernel.
Besides all requirements of mordecai3 and es-geonames, its environment has most of the common packages installed too.
# minimal example to extract geolocations from a bit of text
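A minimal sketch of such an example, assuming the `Geoparser` class and its `geoparse_doc` method as described in the mordecai3 README, a running Elasticsearch container, and the result keys shown in that README (`geolocated_ents`, `name`, `lat`, `lon`, `country_code3`). The helpers `extract_places` and `geoparse` are hypothetical and not part of mordecai3.

```python
def extract_places(result):
    """Flatten a geoparse result into (name, lat, lon, country) tuples.

    Assumes the layout shown in the mordecai3 README: a dict holding a
    'geolocated_ents' list with one dict per resolved place mention.
    """
    places = []
    for ent in result.get("geolocated_ents", []):
        places.append((ent.get("name"), ent.get("lat"),
                       ent.get("lon"), ent.get("country_code3")))
    return places


def geoparse(text):
    """Run Mordecai3 on a piece of text (needs the Elasticsearch index to be up)."""
    # Imported lazily because loading mordecai3 and its models is slow.
    from mordecai3 import Geoparser

    geo = Geoparser()
    return geo.geoparse_doc(text)


# In a notebook cell on the global-mordecai3 kernel:
# result = geoparse("I visited Alexanderplatz in Berlin.")
# extract_places(result)
```

When processing many texts, creating the `Geoparser` once and reusing it avoids reloading the models for every call.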
Random useful commands
docker ps
docker kill [container]
docker images
docker volume ls