If you have ever shared code, you have quite likely said "well, it works on my machine" after watching others struggle to run it. Misconfiguration, version differences and missing dependencies are common causes of this problem.
In this post we will talk about Docker, a tool that lets us ensure that our software runs the same way on any machine with Docker installed, sparing us many headaches. We will illustrate it with a machine learning model written in Python.
What exactly is Docker?
Docker is a tool that lets us package our software in containers, which include all the dependencies and configuration needed to run it. The container is therefore self-sufficient and can run on any machine with Docker installed.
A container is always started from an image, the template that defines what the container contains. On Docker Hub we can find a wide variety of ready-to-use images, but we can also define our own images in a file called Dockerfile. This file contains a series of instructions that describe what the image should include.
In short, the Dockerfile defines an image, and a container is a running instance of that image.
Let's now see how to apply Docker to predictive machine learning models. We will start from an example Python project in which the model has already been exported (model.pkl) and a script (run_model.py) loads it and generates predictions for some input data. The project also has a requirements.txt file listing all its dependencies.
The first step is to create the Dockerfile. We explain each instruction below.
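The requirements.txt file is what makes dependency versions explicit, and version mismatches are one of the causes of "it works on my machine" mentioned at the start. As a hypothetical illustration (this helper is not part of the original project), the following Python sketch reads the exact name==version pins from a requirements file and reports every package whose installed version differs or that is not installed at all. It assumes pins are written with ==; ranges such as >= are simply skipped.

```python
from importlib import metadata
from pathlib import Path


def check_pins(requirements_path):
    """Hypothetical helper: compare exact pins (name==version) with installed packages.

    Returns {name: (pinned_version, installed_version_or_None)} for every pin
    that does not match the current environment.
    """
    mismatches = {}
    for raw in Path(requirements_path).read_text().splitlines():
        # Drop comments and environment markers such as "; python_version < '3.9'"
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # this sketch only checks exact pins
        name, pinned = (part.strip() for part in line.split("==", 1))
        name = name.split("[", 1)[0]  # ignore extras such as pkg[extra]
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # pinned but not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Calling print(check_pins("requirements.txt")) on two different machines would show whether their environments agree with the pins; an empty dictionary means every pin is satisfied. A Docker image built from the same requirements.txt removes the need for this kind of manual comparison.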
FROM python:3.8
WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
COPY run_model.py run_model.py
COPY model.pkl model.pkl
CMD ["python3", "run_model.py"]
First, the FROM instruction sets the base image on which ours is built, in this case the official Python image for version 3.8.
WORKDIR sets /app as the working directory, so the instructions that follow and the container's default command run from there. Next, we install the dependencies: COPY copies the requirements.txt file into the image, and RUN installs the packages it lists with pip. We then copy the model and the script that runs it. Finally, CMD defines the container's default command, which here runs the model script with python3.
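Once the Dockerfile is ready, we can build the image from the project directory with the following command (the final . tells Docker to use the current directory as the build context):
$ docker build -t my_first_image:v0.1 .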
If the build succeeds, the new image should appear in the list shown by
$ docker image ls
Now that we have the image, we can run a container from it with the following command:
$ docker run \
-v $PWD/input/:/input \
-v $PWD/output/:/output \
my_first_image:v0.1
Containers do not persist information on their own: anything written to a container's internal file system is lost once the container is removed. With the -v flag we can mount directories that live outside that internal file system, so their contents persist. When the source is a host path, as in this example, Docker calls this a bind mount; named volumes managed by Docker serve the same purpose. Both are very useful for sharing data between containers or with the host system itself. In our example, the input and output directories act as shared folders between the host and the container. When the container runs, the script reads the data we previously placed in input on the host, computes the predictions and writes them to a file inside output, which we can then open directly from the host.
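The post does not show run_model.py itself, so the following is only a hypothetical sketch of what it could look like, to make the input/output flow concrete. The function names, the file names data.csv and predictions.csv, the headerless numeric CSV format and the scikit-learn-style predict() method are all assumptions, not details from the original project.

```python
import csv
import pickle
from pathlib import Path


def load_model(model_path):
    # Unpickle the exported model; only load pickle files from trusted sources
    with open(model_path, "rb") as f:
        return pickle.load(f)


def read_rows(input_path):
    # Assumed input format: headerless CSV with one row of numeric features per line
    with open(input_path, newline="") as f:
        return [[float(value) for value in row] for row in csv.reader(f) if row]


def run_predictions(model, input_path, output_path):
    # Hypothetical core of run_model.py: read inputs, predict, write one prediction per line
    rows = read_rows(input_path)
    predictions = model.predict(rows)  # assumes a scikit-learn-style predict()
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        for prediction in predictions:
            writer.writerow([prediction])
```

Inside the container, the script could end with run_predictions(load_model("model.pkl"), Path("/input/data.csv"), Path("/output/predictions.csv")). Because WORKDIR is /app, the relative path model.pkl resolves to the copy made by the Dockerfile, while /input and /output are the directories mounted with -v, so predictions.csv appears in the output folder on the host.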
In this post we have introduced Docker and shown how to use it to run a machine learning model. For more information, we recommend consulting the official Docker documentation, and if you found this article interesting, we invite you to take a look at Damavis' blog.