May 17, 2024

Getting "Repository does not exist" for NVIDIA CUDA containers? This is how to prevent NVIDIA from breaking your builds.

NVIDIA's new container image support policy, announced in October 2022, presents a challenge for developers who rely on these images for their workflows. As images reach end-of-life (EOL) and are deleted from Docker Hub and NGC, previously working builds can break because the images they depend on can no longer be pulled.

The error looks something like this:

Error response from daemon: manifest for nvidia/cuda:12.0.1 not found: manifest unknown: manifest unknown

Many developers have experienced the frustration of encountering a "repository does not exist or may require 'docker login'" error when attempting to pull an older NVIDIA container image. This often occurs because the image has reached its EOL and been deleted, leaving builds dependent on that image broken. This can be particularly disruptive for projects with complex build pipelines or those relying on older versions of the CUDA toolkit.
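In a CI pipeline it can help to recognize this failure mode explicitly instead of surfacing a raw Docker error. The following is a minimal sketch, assuming you already capture the output of `docker pull` as text. The function name `missing_image_hint` and the hint wording are hypothetical, and the patterns are based only on the two messages quoted in this post, so Docker versions with different wording would need extra patterns.

```python
import re
from typing import Optional

# Patterns for the two error messages quoted in this post. Docker's exact
# wording can vary between versions, so treat these as a starting point.
MANIFEST_NOT_FOUND = re.compile(r"manifest for (\S+) not found")
REPO_MISSING = re.compile(r"repository does not exist or may require 'docker login'")


def missing_image_hint(stderr_text: str) -> Optional[str]:
    """Return a readable hint if the output looks like a deleted image.

    Returns None when neither known message is present.
    """
    match = MANIFEST_NOT_FOUND.search(stderr_text)
    if match:
        # group(1) is the image reference Docker reported as missing.
        return f"Image {match.group(1)} is no longer published; it may have reached EOL."
    if REPO_MISSING.search(stderr_text):
        return "Repository not found; the image may have been deleted or may need 'docker login'."
    return None


if __name__ == "__main__":
    sample = (
        "Error response from daemon: manifest for nvidia/cuda:12.0.1 not found: "
        "manifest unknown: manifest unknown"
    )
    print(missing_image_hint(sample))
```

A build script could call this on a failed pull and fail fast with the hint, rather than leaving the cause to be discovered from the raw log.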

Additionally, NVIDIA may update tags after they have been published, potentially breaking your builds.
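A common guard against overwritten tags is to reference base images by digest (`image@sha256:...`) rather than by tag. Digest pinning only protects against a tag being moved; it does not help once the image itself is deleted. The sketch below is one possible way to flag Dockerfile `FROM` lines that are not digest-pinned. The helper names are hypothetical, the parsing is deliberately simple (it ignores line continuations and `ARG` substitution), and the digest in the usage example is a placeholder, not a real image digest.

```python
import re
from typing import List

# A digest-pinned reference ends in "@sha256:" plus 64 lowercase hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True if the image reference is pinned by sha256 digest."""
    return DIGEST_RE.search(image_ref) is not None


def unpinned_base_images(dockerfile_text: str) -> List[str]:
    """Return FROM image references that are not pinned by digest."""
    stages = set()  # names of earlier build stages ("FROM ... AS name")
    unpinned = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Drop optional flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2].lower())
        # "scratch" and references to earlier stages are not registry pulls.
        if image.lower() == "scratch" or image.lower() in stages:
            continue
        if not is_digest_pinned(image):
            unpinned.append(image)
    return unpinned


if __name__ == "__main__":
    placeholder_digest = "sha256:" + "0" * 64  # placeholder, not a real digest
    dockerfile = (
        "FROM nvidia/cuda:12.0.1 AS build\n"
        f"FROM nvidia/cuda@{placeholder_digest}\n"
    )
    print(unpinned_base_images(dockerfile))
```

Run against a repository's Dockerfiles, a check like this can report tag-based references before an upstream tag change reaches a build.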

Why does this happen?

On Friday, October 28th, 2022, the Kitmaker Team at NVIDIA announced a new support policy for CUDA container images. This support policy makes life easier for NVIDIA, but has some consequences for developers relying on the container images.

NVIDIA's support policy

NVIDIA's new policy aims to address several challenges:

- Security: Old images might contain unpatched vulnerabilities, posing security risks
- Sustainability: Maintaining a vast library of rarely used images is resource-intensive for NVIDIA
- Disruption: The presence of outdated images can hinder development and adoption of newer versions of NVIDIA's software; removing them forces users to update

While these points are valid, the policy also causes builds to break at unexpected times, which leads to a lot of frustration and takes control away from the developer.

Caching images for uninterrupted workflows with Stablebuild

Stablebuild provides a solution to the image pull problem by offering a platform that caches Docker images indefinitely. This means that even after NVIDIA deletes or overwrites an image from its official repositories, Stablebuild users can still access and pull it.

We do this in the following way:

- Immutable Docker mirrors: Stablebuild always serves the same image once it's pulled, and does not change the version unless you decide to.
- Multi-architecture support: Stablebuild pulls the same image for various architectures, so that when you run it later on a different architecture it will still work.

By leveraging Stablebuild, developers can ensure their workflows remain uninterrupted even when NVIDIA's EOL policy removes images from official repositories.

Sign up for Stablebuild

Stablebuild offers a community tier, allowing developers to access its image caching capabilities for free. This provides a valuable resource for safeguarding your workflows against disruptions caused by NVIDIA's image deletion policy.

Sign up for Stablebuild today for FREE and ensure your projects remain resilient in the face of changing container image availability.