Using eStargz to reduce container startup time on Amazon EKS

A container image bundles executable code, library, and configuration. Images contain everything an application needs to run. It is a best practice to exclude any file that’s unnecessary for the application packaged in the image. A smaller image means that when you create a container, the container runtime (dockerd or containerd) will have to download fewer bits from the container registry, which will result in faster startup time.

There are cases when the container image becomes significantly large (>500 MB). A common scenario is machine learning workloads. ML workload containers usually package model data necessary for the application. The image size for these containers can easily span in to multiple GBs. As a result, these applications have a slower startup time when the runtime has to pull the image. In fact, research shows that pulling packages accounts for 76% of container start time, but only 6.4% of that data is read.

Lazy-pulling

There are two open source projects designed to improve container startup time. Stargz and SOCI are containerd plugins that reduce the cold start time by providing a way to run containers without downloading the entire image. They introduce the concept of lazy pulling, a technique that allows the runtime to download the bits from the container registry as needed.

Using lazy pulling significantly reduces the application startup time. eStargz is a lazily-pullable image format that is compatible with OCI runtimes and standard container registries like DockerHub, GitHub Container Registry.

eStargz

The eStargz image format is based on stargz image format by Container Registry Filesystem (CRFS) open source project. CRFS is a read-only FUSE filesystem that lets you mount a container image, served directly from a container registry, without pulling it all locally first. The project introduces Seekable tar.gz format, which makes tar.gz files seekable using an index.

Stargz Snapshotter plugin

Stargz snapshotter is implemented as a proxy plugin daemon (containerd-stargz-grpc) for containerd. When containerd starts a container, it queries the rootfs snapshots to stargz snapshotter daemon through a unix socket. This snapshotter remotely mounts queried eStargz layers from registries on the node and provides these mount points as remote snapshots to containerd. The plugin uses FUSE to mount eStargz layers directly from the container registry.

Running eStargz images on Amazon EKS

eStargz images are a little different from the images you’d build using docker build. In order to create an image that supports lazy pulling, you'll need an eStargz-aware image builder or a converter.

I am going to build my eStargz image using nerdctl, which is an eStargz-aware image builder. Since image size is not an issue, I will use Debian Jessie as the base image to demo. To make the image size artificially large, I will include twenty 50 MB files.

Lets generate large files containing random text:

mkdir files
for i in {0..20}; do base64 /dev/urandom | head -c 50000000 > files/file${i}.txt; done

Create a DockerFile:

cat > Dockerfile <<EOF
FROM debian:jessie
RUN apt-get update && apt-get install -y \
    vim
COPY files .

EOF

I am going to store my image in Amazon ECR. I’ll create an ECR repository:

ECR_URI=$(aws ecr create-repository \
  --repository-name estargz-demo \
  --query 'repository.repositoryUri' \
  --output text)

Use nerdctl to create container image:

sudo nerdctl build -t ${ECR_URI}:1 .

The image is not in eStargz formatted right now, I’ll have to convert it:

nerdctl image convert --estargz --oci \
  ${ECR_URI}:1 ${ECR_URI}:1-esgz

The resulting images:

nerdctl images
REPOSITORY                                                   TAG       IMAGE ID        CREATED           PLATFORM       SIZE       BLOB SIZE
account.dkr.ecr.us-west-2.amazonaws.com/estargz-demo    1         798b85a131ed    31 minutes ago    linux/amd64    1.2 GiB    828.8 MiB
account.dkr.ecr.us-west-2.amazonaws.com/estargz-demo    1-esgz    81b0ffd2a4a3    36 seconds ago    linux/amd64    0.0 B      832.3 MiB

aws ecr get-login-password | sudo nerdctl login \
   --username AWS \
   --password-stdin \
   $ECR_URI
nerdctl push ${ECR_URI}:1-esgz

The image is now available in ECR and I can start running it in my EKS cluster.

Create a managed node group that uses containerd

EKS nodes currently don’t enable containerd by default. I’ll create a node group that uses containerd.

Create environment variables for AWS Region and EKS cluster name:

AWS_REGION=<Your AWS Region>
CLUSTER_NAME=<Your EKS cluster's name>

First, I need to retrieve the id of the EKS optimized AMI in my region:

aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.23/amazon-linux-2/recommended/image_id --region us-west-2 --query "Parameter.Value" --output text

cat > containerd-mng.yaml <<EOF
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: $CLUSTER_NAME
  region: $AWS_REGION

managedNodeGroups:
  - name: containerd-mng
    minSize: 1
    maxSize: 1
    desiredCapacity: 1
    instanceType: m5.xlarge
    ami: $AMI_ID
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh Socrates --container-runtime containerd
    
EOF

Preparing EKS nodes to use eStargz

Next, I need to install eStargz snapshotter plugin on my worker node. I use AWS Systems Manager on my nodes, so I will connect to the containerd node.

aws ssm start-session --target <Instance ID of the worker node>

Connect to the node (using ssh or Systems Manager) and back up the current containerd config file (/etc/containerd/config.toml) and replace it with:

sudo mv /etc/containerd/config.toml /etc/containerd/config.toml.bak
cat > config.toml <<EOF
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"

[grpc]
address = "/run/containerd/containerd.sock"

[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
snapshotter = "stargz"
disable_snapshot_annotations = false

[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.5"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"

[proxy_plugins]
  [proxy_plugins.stargz]
    type = "snapshot"
    address = "/run/containerd-stargz-grpc/containerd-stargz-grpc.sock"
EOF
sudo mv config.toml /etc/containerd/config.toml

Install FUSE:

sudo yum install fuse -y
sudo modprobe fuse
sudo bash -c 'echo "fuse" > /etc/modules-load.d/fuse.conf'

Install the snapshotter from its GitHub repository:

wget https://github.com/containerd/stargz-snapshotter/releases/download/v0.12.1/stargz-snapshotter-v0.12.1-linux-amd64.tar.gz
sudo tar xvzf stargz-snapshotter-v0.12.1-linux-amd64.tar.gz -C /usr/local/bin
sudo wget -O /etc/systemd/system/stargz-snapshotter.service https://raw.githubusercontent.com/containerd/stargz-snapshotter/main/script/config/etc/systemd/system/stargz-snapshotter.service
sudo systemctl enable --now stargz-snapshotter

Finally, restart the containerd and kubelet:

sudo systemctl restart containerd
sudo systemctl restart kubelet

Test lazy-pulling

I will now run the pod using my eStargz formatted image. Create a manifest for a new pod and replace the node name with the newly created node:

cat > estargz-pod.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: stargz-demo
spec:
  containers:
  - name: stargz-demo
    image: ${ECR_URI}:1-esgz
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
  nodeName: ip-192-168-32-236.us-west-2.compute.internal
EOF

The pod started in 2 seconds:

k get pods
NAME                            READY   STATUS        RESTARTS      AGE
stargz-demo                     1/1     Running       0             2s

To compare, I ran the same pod on another node that didn’t use containerd. It took 45 seconds to start the same image. That’s a lot of improvement in pod startup time!!

Image pull time without eStargz:

Image pull with eStargz:

Prefetching files

What if you wanted the runtime to always download a file and disable lazy-pulling?

eStargz supports prefetching of files. This mitigates runtime performance drawbacks caused by the on-demand fetching of each file.

The example below always pulls ls and bash files before starting the container:

$ cat <<EOF > /tmp/record.json
{ "path" : "/usr/bin/bash" }
{ "path" : "/usr/bin/ls" }
EOF
$ nerdctl image convert --estargz --oci \
    --estargz-record-in=/tmp/record.json \
    ubuntu:21.04 ubuntu:21.04-ls

Seekable OCI (SOCI)

Seekable OCI (SOCI) is a technology open sourced by AWS that enables containers to launch faster by lazily loading the container image. SOCI works by creating an index (SOCI Index) of the files within an existing container image. This index is a key enabler to launching containers faster, providing the capability of extracting an individual file from a container image before downloading the entire archive.

SOCI borrows some of the design principles from stargz-snapshotter, but takes a different approach.

A SOCI index is generated separately from the container image, and is stored in the registry as an OCI Artifact and linked back to the container image by OCI Reference Types. This means that the container images do not need to be converted, image digests do not change, and image signatures remain valid.

Most OCI registries like DockerHub and ECR do not currently support the "referrers" feature. So you cannot use SOCI unless you run a local ORAS registry.

Conclusion

If you are looking to reduce container start time for your workloads, eStargz snapshotter is definitely worth a look. You’ll have to change your existing container build pipelines to add a step to convert images though. When SOCI support is available in OCI registries, you’ll be able to lazily-pull images without converting them first.

You’ll also have to install eStargz snapshotter on your nodes and configure containerd to use the plugin. Creating a custom AMI or using AWS Systems manager will be the best way to do that. You could also use a DaemonSet to configure the system, but keep in mind that you’ll need to account for restarting the containerd and kubelet.