Why?
Managing Python dependencies when working on multiple AI and data science problems can quickly become a nightmare. Sure, venv, pyenv, or pipenv can help, but they are not enough, especially if you need to synchronize your work across different machines with different operating systems and architectures.
The solution
Lately, I've been using Docker as my primary development tool for setting up different environments, and I'd like to share the two templates I rely on.
The two setups have a very similar structure:
- Dockerfile
- build.sh
- run.sh
- run_command.sh (only for the CUDA setup)
- .env
- working-dir/
- .env contains the name that Docker will use to build and run the container (a minimal example follows this list).
- build.sh, run.sh, and run_command.sh are utility scripts that help with building and spinning up the environment.
- working-dir is the directory shared between the host environment and the Docker container; you should place all the datasets, models, notebooks, etc., in this directory.
- Dockerfile is the core of the setup, containing all the commands to install dependencies and prepare the environment; any change or tweak for a specific setup should be done here.
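For reference, a minimal .env could look like this (ML_PROJECT_NAME is the variable the scripts below read; the value itself is just a placeholder):
# .env, sourced by build.sh and run.sh
ML_PROJECT_NAME=my-ml-project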
Usage
Using this setup is very simple (the full command sequence is shown after this list):
- create the .env file following the example provided in the GitHub repo
- run build.sh
- run run.sh
- enjoy :)
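Concretely, starting from a fresh clone, the whole flow looks like this (assuming a Unix-like shell; the chmod is only needed once):
# make the helper scripts executable (first run only)
chmod +x build.sh run.sh
# build the image, tagged with the ML_PROJECT_NAME from .env
./build.sh
# spin up the environment; Jupyter listens on http://localhost:8888
./run.sh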
Jupyter Setup
You can find the code here: docker-ml-boilerplate
The first setup is the one I use when I need to perform data analysis, where I only need Jupyter, Pandas, NumPy, and some data visualization libraries.
Dockerfile:
FROM jupyter/scipy-notebook
# -- Install dependencies
# RUN pip install ...
RUN mkdir /home/jovyan/working-dir
WORKDIR /home/jovyan/working-dir
# -- Run any setup scripts
build.sh:
#!/usr/bin/env bash
source ./.env
echo "[+] Build docker image:"
echo "${ML_PROJECT_NAME}"
docker build -t "$ML_PROJECT_NAME" .
run.sh:
#!/usr/bin/env bash
source ./.env
echo "[+] Running docker env:"
echo "${ML_PROJECT_NAME}"
docker run -p 8888:8888 -v "$(pwd)/working-dir":/home/jovyan/working-dir "$ML_PROJECT_NAME"
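On startup, the container prints a tokenized Jupyter URL to the console; open it in your browser. If you missed the startup logs, you can recover the URL from the running container (grab the container ID from docker ps; depending on the image version, jupyter notebook list may be needed instead):
docker exec <container-id> jupyter server list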
Machine Learning with CUDA support
You can find the code here: docker-ml-boilerplate-CUDA
I use the second template when I need to leverage my computer's GPU to train ML models.
(note: depending on your Docker installation, you may need to use sudo to run the docker commands)
Dockerfile:
FROM nvidia/cuda:11.4.3-runtime-ubuntu20.04
# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
# Install system dependencies
RUN apt-get update && apt-get install -y git python3-pip python3-dev python3-opencv libglib2.0-0
# Upgrade pip
RUN python3 -m pip install --upgrade pip
# Install PyTorch and torchvision
RUN pip3 install torch torchvision torchaudio -f https://download.pytorch.org/whl/cu111/torch_stable.html
RUN pip3 install jupyterlab numpy matplotlib pandas seaborn ipywidgets scipy
# RUN other pip install ...
RUN mkdir /home/working-dir
WORKDIR /home/working-dir
# -- Run any setup scripts
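Before building, it is worth checking that Docker can see your GPU at all; this requires the NVIDIA Container Toolkit on the host. A quick smoke test (not part of the template):
# if this prints your GPU table, the --gpus all flag used below will work
docker run --rm --gpus all nvidia/cuda:11.4.3-runtime-ubuntu20.04 nvidia-smi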
build.sh (same as the Jupyter setup):
run.sh:
#!/usr/bin/env bash
source ./.env
echo "[+] Running docker env:"
echo "${ML_PROJECT_NAME}"
docker run -v "$(pwd)/working-dir":/home/working-dir --gpus all -p 8888:8888 "$ML_PROJECT_NAME" jupyter lab --allow-root --ip=0.0.0.0
run_command.sh:
#!/usr/bin/env bash
source ./.env
echo "[+] Running docker env:"
echo "${ML_PROJECT_NAME}"
docker run -v "$(pwd)/working-dir":/home/working-dir --gpus all -p 8888:8888 "$ML_PROJECT_NAME" "$@"
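run_command.sh simply forwards its arguments to the container, so you can run one-off commands in the same environment. For example (train.py is a hypothetical script placed in working-dir/):
# verify that PyTorch sees the GPU from inside the container
./run_command.sh python3 -c "import torch; print(torch.cuda.is_available())"
# run a one-off training script from the shared directory
./run_command.sh python3 train.py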
Conclusions
Using Docker to handle Machine Learning environments has many advantages:
- Keeps all dependencies well managed and separated, avoiding the risk of breaking one project while updating another one.
- It is easy to replicate across multiple projects, and quick to tweak when you need to update a version or add a new dependency.
- Very easy to use: build the Docker image once, then run it each time you need to spin up your environment. Even if you need to update the Dockerfile, rebuilding the image is quick thanks to the Docker build cache.
- You can run your Docker image on one machine (e.g., a powerful desktop or server) and connect to the Jupyter web interface from a different machine (e.g., an old, low-budget laptop).
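For that last point, a simple way to reach the remote Jupyter instance is an SSH tunnel (user and gpu-server are placeholders for your own login and host):
# forward local port 8888 to port 8888 on the machine running the container
ssh -N -L 8888:localhost:8888 user@gpu-server
# then browse to http://localhost:8888 from the laptop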