Intro
I am building a chatbot that can tell you about upcoming movies for a selected city and date, all in Finnish. Analyzing Finnish text and words requires tools, and Voikko is one of the few that exist. For anyone who is not familiar with Finnish (my native language), a couple of examples may reveal some of its peculiarities compared to English.
Wikipedia defines the Finnish language as follows:
Finnish is the eponymous member of the Finnic language family and is typologically between fusional and agglutinative languages. It modifies and inflects nouns, adjectives, pronouns, numerals and verbs, depending on their roles in the sentence.
Some examples (quite easy ones, using the root word auto, meaning car):
- In a car translates to autossa, the root word being auto.
- With a car translates to autolla, the root word again being auto.
- Without a car translates to autotta or ilman autoa.
- With their cars translates to autoineen.
- Without their cars translates to autoittaan or ilman autojaan.
You should get the point ¯\_(ツ)_/¯
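To make the pattern concrete, here is a small Python sketch (an illustration only, not part of the chatbot) showing that in these particular examples the inflected forms all share the stem auto. The helper name shared_stem is hypothetical. Plain prefix matching like this breaks down quickly on real Finnish, for example with consonant gradation (katu becomes kadulla), which is why a morphological analyzer such as Voikko is needed.

```python
import os

# Inflected forms from the examples above, mapped to their English meaning.
FORMS = {
    "autossa": "in a car",
    "autolla": "with a car",
    "autotta": "without a car",
    "autoineen": "with their cars",
    "autoittaan": "without their cars",
}


def shared_stem(words):
    """Return the longest common prefix of the given words."""
    return os.path.commonprefix(list(words))


if __name__ == "__main__":
    # Iterating over a dict yields its keys, i.e. the inflected forms.
    print(shared_stem(FORMS))
```

Running the same helper on katu and kadulla returns only ka, which is not a usable base form.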
Voikko does all the heavy lifting, yet it can be very difficult to install because of outdated instructions and numerous compatibility issues. Without going into details, I explored numerous options for running it locally on my Mac (e.g. the Homebrew installation option no longer works) and within different Linux distributions and distro versions. In any case, I would be using Voikko from a Python app. The following describes how I finally resolved the installation issue with Docker.
The end result will expose Voikko’s Analyze function as a CLI App.
This article is not a Getting Started with Docker, so if you are not familiar with Docker tools, please go through the official Getting Started tutorials.
Prerequisites
- A recent macOS, Windows or Linux laptop should work. At the time of writing I am running macOS Sierra 10.12.6.
- You have Docker Community Edition running. At the time of writing I am running version 17.06.1-ce-mac24 (18950).
- A local Python 3 installation might be useful (more on this a bit later), though it is not required.
Building the container image
Since I would be using Voikko from a Python App, a good starting point would be the official Python image.
Testing things out and building the dev image
I will explain my choices as I go through the steps of building the image. To begin, I created a new Dockerfile in my project directory. After some trial and error with the Voikko installation, I ended up with the Debian 9 (Stretch) based image:
FROM python:3.6.2-stretch
After I found the “correct” Linux distro and version (Voikko can be really picky…), installing Voikko became rather easy:
# Install Voikko packages
# DEBIAN_FRONTEND prefixes the install command so that apt-get actually sees it
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y voikko-fi python-libvoikko
The next step in this expedition was to create a tiny Python app that uses Voikko underneath to analyze the given word(s).
For this, I created a file app.py in the project directory and hammered in the magical words:
#!/usr/bin/env python
import sys
from libvoikko import Voikko
print('Analysoidaan annetut sanat:\n')  # "Analyzing the given words:"
v = Voikko("fi")
# Skip the first argument, as it is the app name itself.
for a in sys.argv[1:]:
    print(f'Sanan {a} analyysi:')  # "Analysis of the word {a}:"
    print(v.analyze(a))
print('Annetut sanat analysoitu.')  # "The given words have been analyzed."
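For trying out the loop outside the container, a variation like the following may help. It is only a sketch: the analyzer is passed in as a parameter so the code runs without libvoikko installed. The names analyze_words and fake_analyze are hypothetical; inside the container you would pass Voikko("fi").analyze instead of the stand-in.

```python
import sys


def analyze_words(words, analyze):
    """Map each word to the result of the given analyze callable."""
    return {word: analyze(word) for word in words}


def fake_analyze(word):
    """Stand-in for Voikko's analyze; it only echoes the word back."""
    return [{"BASEFORM": word}]


if __name__ == "__main__":
    # Skip sys.argv[0], which is the script name itself.
    for word, analysis in analyze_words(sys.argv[1:], fake_analyze).items():
        print(f"Sanan {word} analyysi:")
        print(analysis)
```

Keeping the Voikko call behind a parameter means the argument handling can be checked on any machine, while the real analysis still happens only where Voikko is installed.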
I set the container’s workdir
WORKDIR /usr/src/app
… and copied the app to the container
COPY app.py ./
This is where the optional local Python installation comes in: had I needed some external packages, I would have installed them locally and created a normal requirements.txt file. In that case, I would then copy requirements.txt to the container and install the dependencies:
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
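I instructed the container to run the app when started. Anything given after the image name in docker run replaces the CMD default and is appended to the ENTRYPOINT as arguments: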
ENTRYPOINT ["python", "./app.py"]
CMD ["./app.py"]
Feeling victorious, I was ready to build the image and test the final product. As most of you, my dear readers, are already laughing: not so fast, champ!
So, after building with
docker build -t python3-voikko .
… and running the container with
docker run --rm -ti python3-voikko
… I was greeted with a friendly traceback telling me that libvoikko.py could not be found! Oopsie. So the easy-peasy installation was not all that was needed after all. Hmm. I started digging into the container:
docker run --rm -ti python3-voikko bash
… and found the location of the missing file. I could have created a symbolic link in a place where my beloved Python app could find the library in question, but I ended up copying the file to my app directory:
RUN cp /lib/python3/dist-packages/libvoikko.py /usr/src/app/libvoikko.py
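The hunt for the file can also be scripted. The following is a rough sketch, not what I actually ran in the container: a hypothetical helper, find_module_file, walks a directory tree looking for a given file name. Adding the directory it reports to sys.path (or to the PYTHONPATH environment variable) would be an alternative to copying the file.

```python
import os


def find_module_file(filename, root):
    """Return the paths of all files named `filename` found under `root`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


if __name__ == "__main__":
    # Searching from the filesystem root can take a while.
    for path in find_module_file("libvoikko.py", "/"):
        print(path)
```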
After rebuilding the image and running it again, feeling courageous, I even gave it an input word to analyze as a CLI argument:
docker run --rm -ti python3-voikko autossa
… and hey presto!
Analysoidaan annetut sanat:
Sanan autossa analyysi:
[{'BASEFORM': 'auto', 'CLASS': 'nimisana', 'FSTOUTPUT': '[Ln][Xp]auto[X]auto[Sine][Ny]ssa', 'NUMBER': 'singular', 'SIJAMUOTO': 'sisaolento', 'STRUCTURE': '=ppppppp', 'WORDBASES': '+auto(auto)'}]
Annetut sanat analysoitu.
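The result is a list of dictionaries, one per interpretation of the word. As a small sketch of post-processing, assuming the result is always shaped like the output above, a hypothetical helper base_forms collects the distinct BASEFORM values. The sample data is copied from the run above, so the snippet works without Voikko installed.

```python
# Sample analysis copied from the output above.
SAMPLE = [{
    'BASEFORM': 'auto',
    'CLASS': 'nimisana',
    'FSTOUTPUT': '[Ln][Xp]auto[X]auto[Sine][Ny]ssa',
    'NUMBER': 'singular',
    'SIJAMUOTO': 'sisaolento',
    'STRUCTURE': '=ppppppp',
    'WORDBASES': '+auto(auto)',
}]


def base_forms(analyses):
    """Return the distinct BASEFORM values, in first-seen order."""
    seen = []
    for analysis in analyses:
        base = analysis.get('BASEFORM')
        if base is not None and base not in seen:
            seen.append(base)
    return seen


if __name__ == "__main__":
    print(base_forms(SAMPLE))
```

An empty list means there was nothing to extract, for example when the analysis itself came back empty.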
Finally I got what I expected, and it was time to move along with the app development itself.
The final Dockerfile and app can be found in the GitHub repo.
Publishing the image on Docker Hub
So that you too could enjoy this fantastic end result, I have published the container image on Docker Hub.
This can be done rather easily (assuming you have already created your account).
Tag the previously built image (with your Docker Hub user_name, and with two tags, latest and 1.0):
docker tag 463278859e4d user_name/python3-voikko:latest
docker tag 463278859e4d user_name/python3-voikko:1.0
… or you can build and add 2 tags at the same time:
docker build -t user_name/python3-voikko:1.0 -t user_name/python3-voikko:latest .
Log in to Docker Hub (enter your credentials):
docker login
Push both tags:
docker push user_name/python3-voikko:latest
docker push user_name/python3-voikko:1.0
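If you publish often, the tag and push commands can be generated rather than typed. A minimal sketch, assuming the docker CLI is on your PATH: the helper publish_commands is hypothetical and only builds the command lines, so nothing is executed until you pass them to subprocess.run yourself.

```python
def publish_commands(image_id, repository, tags):
    """Build the docker tag and docker push commands for each tag."""
    commands = []
    for tag in tags:
        commands.append(["docker", "tag", image_id, f"{repository}:{tag}"])
    for tag in tags:
        commands.append(["docker", "push", f"{repository}:{tag}"])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in publish_commands("463278859e4d", "user_name/python3-voikko",
                                ["latest", "1.0"]):
        print(" ".join(cmd))
```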
The final thing to do is to write a good README on Docker Hub so that users learn how to use the image.
The python3-voikko image can be found on Docker Hub.