Today I Learned (April 12, 2024)

How to minimize the size of Docker images containing Python executables, using pip install --user, multi-stage Docker image builds, and a minimal Python installation

First, we can tell pip where to install executables and libraries by setting the environment variable PYTHONUSERBASE and passing the --user flag to pip install. For example:

mkdir -p /tmp/theuserbase
export PYTHONUSERBASE=/tmp/theuserbase
pip install \
    --user \
    pre-commit
ls -la /tmp/theuserbase/bin/ # "binaries" (executables)
ls -la /tmp/theuserbase/lib/ # library code 

:link: https://luminousmen.com/post/why-use-pip-install-user
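
To double-check where pip install --user will put things, one can ask Python itself which user base it resolves. This is a minimal sketch, assuming python3 is on the PATH and reusing the /tmp/theuserbase directory from the example above. site.getuserbase() and site.getusersitepackages() are standard-library functions; the variable names user_base and user_site are only illustrative.

```shell
# Assumption: python3 is on PATH; /tmp/theuserbase is the directory used above.
export PYTHONUSERBASE=/tmp/theuserbase

# Ask Python which user base and user site-packages directory it resolves.
user_base="$(python3 -c 'import site; print(site.getuserbase())')"
user_site="$(python3 -c 'import site; print(site.getusersitepackages())')"

echo "user base: ${user_base}"
echo "user site: ${user_site}"
```

On Linux the user site directory should sit somewhere below /tmp/theuserbase/lib/, which is why the lib/ directory has to travel along with bin/.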

Now let’s combine this with a Docker multi-stage build to trim the image size. The final image should contain only a minimal Python installation plus the executables and libraries that pip installed under /tmp/theuserbase. A Dockerfile for this looks like the following:

# -- First stage (install dependencies, not minimized) -- #
FROM ubuntu:22.04 AS installer

ENV PYTHONUSERBASE=/tmp/theuserbase

RUN <<EOF
    set -eu
    apt-get update
    apt-get upgrade -y
    apt-get install -y python3 python3-pip 
    export PYTHONUSERBASE
    pip install --user --no-cache-dir pre-commit
EOF

# -- Second stage (minimal build, package dependencies + minimal Python installation) -- #
FROM ubuntu:22.04

ENV PYTHONUSERBASE=/tmp/theuserbase

# Copy over "binaries" (executables) and library packages from the installer image.
COPY --from=installer \
    ${PYTHONUSERBASE}/bin/ \
    ${PYTHONUSERBASE}/bin/
COPY --from=installer \
    ${PYTHONUSERBASE}/lib/ \
    ${PYTHONUSERBASE}/lib/

# Install only the minimal Python 3 package (e.g. without pip and whatever else clutters the installer image). 
RUN <<EOF
    set -eu
    apt-get update 
    apt-get upgrade --yes 
    apt-get install --yes python3-minimal 
EOF

ENV PATH="${PYTHONUSERBASE}/bin:${PATH}" 

ENTRYPOINT ["/bin/bash", "-c"]
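
The ENV PATH line in the Dockerfile above is what lets pre-commit be found by name in the final image. The same mechanism can be sketched in plain shell, without Docker. Here demo_base and demo-tool are hypothetical stand-ins for ${PYTHONUSERBASE} and an installed executable, created in a temporary directory.

```shell
# Hypothetical stand-in for ${PYTHONUSERBASE}, created in a temp dir.
demo_base="$(mktemp -d)"
mkdir -p "${demo_base}/bin"

# Hypothetical executable standing in for a pip-installed tool.
printf '#!/bin/sh\necho "hello from demo-tool"\n' > "${demo_base}/bin/demo-tool"
chmod +x "${demo_base}/bin/demo-tool"

# Same pattern as the ENV PATH line: prepend the bin directory.
PATH="${demo_base}/bin:${PATH}"
export PATH

# The tool now resolves by name.
resolved="$(command -v demo-tool)"
output="$(demo-tool)"
echo "${resolved}"
echo "${output}"
```

Prepending rather than appending means the copied executables take precedence over same-named tools elsewhere on the PATH.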

After building the image, we can confirm that the installed tool (pre-commit, just an example) works:

$ cd /tmp
$ docker build -t thetestimage . # assuming /tmp/Dockerfile has the content shown above
$ docker run --rm thetestimage 'pre-commit --version'
pre-commit 3.7.0

The image size is only about 170 MB. The equivalent fat image, i.e. the image built from the following untrimmed Dockerfile

FROM ubuntu:22.04 

RUN <<EOF
    set -eu
    apt-get update
    apt-get upgrade -y
    apt-get install -y python3 python3-pip
    pip install --user --no-cache-dir pre-commit
EOF

ENV PATH="/root/.local/bin:${PATH}"

ENTRYPOINT ["/bin/bash", "-c"]

was about 470 MB. With this approach, the minimized image comes out at roughly 36% of the fat image's size, i.e. less than half.
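
The size comparison can be worked through as a small shell calculation. This sketch uses the approximate figures quoted above (about 170 MB and 470 MB); the variable names are illustrative, and sizes measured on another machine or date will likely differ.

```shell
# Approximate sizes in MB, taken from the text above.
slim_mb=170
fat_mb=470

# Integer percentage of the fat image that the slim image occupies.
ratio_percent=$(( slim_mb * 100 / fat_mb ))
# Absolute saving in MB.
saved_mb=$(( fat_mb - slim_mb ))

echo "slim image is ~${ratio_percent}% of the fat image (${saved_mb} MB saved)"
# prints: slim image is ~36% of the fat image (300 MB saved)
```

To use measured values instead, the size of a local image in bytes can be read with docker image inspect --format '{{.Size}}' thetestimage.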