Persistent Data Management in Docker

By Anurag Singh

Updated on Aug 27, 2024

In this tutorial, we'll discuss persistent data management in Docker.

Docker is a powerful platform for developing, shipping, and running applications in containers. While containers are great for isolating applications and their dependencies, they are inherently ephemeral, meaning that data stored inside a container is lost when the container is removed. To handle persistent data, Docker provides two primary mechanisms: Volumes and Bind Mounts. This tutorial will explore how to use these options to persist data in Docker containers.

Prerequisites

Introduction to Docker Storage

Docker containers are designed to be ephemeral. Data written to a container's writable layer survives a stop and restart of that same container, but it is lost when the container is removed, and it is awkward to share with other containers or to back up. To keep data independent of any single container's lifecycle, Docker provides two main storage options:

  • Volumes: Managed by Docker and stored in a Docker-controlled directory on the Docker host (/var/lib/docker/volumes/ by default on Linux).
  • Bind Mounts: Link a directory or file on the host system to a directory or file in the container.

Both methods have their use cases and advantages, which we'll explore in this tutorial.

Understanding Volumes

Volumes are the preferred mechanism for persisting data generated by and used by Docker containers. They are managed by Docker, which means Docker handles where and how the data is stored on the host.

Creating and Using Volumes

Creating a Volume:

Volumes can be created explicitly using the docker volume create command:

docker volume create my_volume

This command creates a volume named my_volume that can be used by containers.

Using a Volume:

To use a volume with a container, you can specify the volume in the docker run command:

docker run -d --name my_container -v my_volume:/data busybox

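In this example, the volume my_volume is mounted at the /data directory inside the container. If my_volume does not exist yet, Docker creates it automatically. Any data written to /data persists even after the container is removed. Note that the busybox image's default command is a shell, which exits immediately when run detached without a terminal, so append a long-running command such as sleep 3600 if you want the container to stay up.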
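The persistence claim above can be checked with a short script. The following is a minimal sketch rather than part of the original tutorial: the function name volume_persistence_demo, the default volume name demo_volume, and the file hello.txt are hypothetical, and it assumes a running Docker daemon that can pull the busybox image.

```shell
#!/bin/sh
# Sketch: show that data in a named volume outlives the container that wrote it.
# volume_persistence_demo, demo_volume and hello.txt are illustrative names.
volume_persistence_demo() {
  vol="${1:-demo_volume}"
  docker volume create "$vol" >/dev/null || return 1
  # The first container writes a file into the volume and is removed on exit (--rm).
  docker run --rm -v "$vol":/data busybox \
    sh -c 'echo "hello from container 1" > /data/hello.txt' || return 1
  # A second, brand-new container mounts the same volume and reads the file back.
  docker run --rm -v "$vol":/data busybox cat /data/hello.txt
}

# Usage (requires a running Docker daemon):
#   volume_persistence_demo demo_volume
```

If the volume behaves as described, the second container should print the line written by the first one, even though the first container no longer exists.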

Inspecting a Volume:

You can inspect a volume to view its details, including its mount point on the host:

docker volume inspect my_volume

This command returns a JSON object with information about the volume.

Managing Volumes

Listing Volumes:

To list all volumes on the Docker host, use the following command:

docker volume ls

Removing a Volume:

Volumes can be removed using the docker volume rm command:

docker volume rm my_volume

Note: You cannot remove a volume that is still referenced by a container, whether that container is running or stopped. Remove the container first, then the volume.
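To see which containers are holding on to a volume before removing it, you can combine docker ps with its volume filter. This is a sketch under stated assumptions: remove_volume_if_unused is a hypothetical helper name, not a Docker command, and it only wraps the docker ps and docker volume rm commands.

```shell
#!/bin/sh
# Sketch: refuse to remove a volume while any container (running or stopped)
# still references it. remove_volume_if_unused is a hypothetical helper name.
remove_volume_if_unused() {
  vol="$1"
  # "docker ps -a" includes stopped containers; the volume filter keeps only
  # containers that mount the given volume.
  users=$(docker ps -a --filter "volume=$vol" --format '{{.Names}}') || return 1
  if [ -n "$users" ]; then
    echo "volume $vol is still used by: $users" >&2
    return 1
  fi
  docker volume rm "$vol"
}

# Usage (requires a running Docker daemon):
#   remove_volume_if_unused my_volume
```

The helper prints the names of the blocking containers instead of letting docker volume rm fail, which makes it clearer what has to be removed first.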

Understanding Bind Mounts

Bind Mounts allow you to mount a specific file or directory from the host filesystem into a container. Unlike volumes, bind mounts are dependent on the directory structure of the host machine.

Creating and Using Bind Mounts

Creating a Bind Mount:

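You create a bind mount by specifying the absolute path of the host directory or file in the docker run command. Note that with -v, Docker creates the host path as an empty directory if it does not already exist: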

docker run -d --name my_container -v /host/path:/container/path busybox

In this example, the directory /host/path on the Docker host is mounted to /container/path inside the container.

Using Bind Mounts:

Bind mounts are useful when you want a container to access or share data with the host filesystem. This can be particularly useful in development environments where you want changes made on the host to reflect immediately inside the container.
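Docker also accepts a more explicit --mount syntax for bind mounts, which spells out the type, source, and target as key=value pairs and reports an error if the host path does not exist. The sketch below builds such a specification. The helper name bind_mount_spec and the /app target are assumptions for illustration, and the final line only prints the resulting command rather than running it.

```shell
#!/bin/sh
# Sketch: build a --mount specification for a bind mount.
# bind_mount_spec is a hypothetical helper, not a Docker command.
bind_mount_spec() {
  host_path="$1"
  container_path="$2"
  mode="${3:-rw}"
  # Bind mount sources must be absolute paths.
  case "$host_path" in
    /*) ;;
    *) echo "host path must be absolute: $host_path" >&2; return 1 ;;
  esac
  spec="type=bind,source=$host_path,target=$container_path"
  # "readonly" is the --mount counterpart of the :ro suffix used with -v.
  if [ "$mode" = "ro" ]; then
    spec="$spec,readonly"
  fi
  echo "$spec"
}

# Print (without executing) a docker run command that bind-mounts the
# current directory read-only at /app.
echo docker run --rm --mount "$(bind_mount_spec "$(pwd)" /app ro)" busybox ls /app
```

In a development setup, mounting the project directory this way lets edits made on the host show up inside the container without rebuilding the image.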

Choosing Between Volumes and Bind Mounts

Choosing between volumes and bind mounts depends on your specific use case:

Volumes are managed by Docker and are ideal for storing persistent data that is independent of the host machine's filesystem structure. They are portable and work across different host environments.

Bind Mounts give you direct access to the host filesystem and are useful when you need a container to interact with specific files or directories on the host. However, bind mounts are tightly coupled to the host's filesystem, making them less portable.

Advanced Commands

1. Using Named Volumes with Specific Drivers

Docker allows you to use specific volume drivers to customize how and where your data is stored. For example, you can use the local driver with options to store volumes on a different filesystem or location.

docker volume create \
  --driver local \
  --opt type=tmpfs \
  --opt device=tmpfs \
  --opt o=size=100m,uid=1000 \
  my_tmpfs_volume

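This command creates a named volume my_tmpfs_volume that is backed by memory (tmpfs), with a size limit of 100 MiB and files owned by uid 1000. The options are passed through to the Linux mount call, so this example applies to Linux hosts. Because tmpfs lives in RAM, its contents are not persistent: they are lost once the tmpfs is unmounted, for example after a host reboot.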

2. Attaching a Volume to Multiple Containers

You can share a single volume between multiple containers. This is particularly useful when multiple containers need to read from or write to the same data.

docker run -d --name container1 -v my_shared_volume:/data busybox
docker run -d --name container2 -v my_shared_volume:/data busybox

In this example, both container1 and container2 share the my_shared_volume volume, which is mounted at /data inside both containers.

3. Mounting a Volume in Read-Only Mode

You can mount a volume or a bind mount in read-only mode to ensure that the container cannot modify the data.

docker run -d --name my_container -v my_volume:/data:ro busybox

The :ro option mounts my_volume read-only, so any attempt to write to /data inside the container fails with a read-only file system error.

4. Specifying Bind Mount Consistency

On Docker Desktop for Mac, bind mounts historically accepted a consistency option that controlled how file changes were synchronized between the host and the container. These options were specific to macOS; other host operating systems ignore them.

docker run -d --name my_container -v /host/path:/container/path:consistent busybox

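In this example, :consistent requests identical views of the file system on the host and in the container. The other modes, :cached and :delegated, relaxed that guarantee in exchange for performance. Recent Docker Desktop releases still accept these options but, according to Docker's documentation, no longer act on them, so treat them as legacy syntax.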

5. Backing Up and Restoring Data from a Volume

You can back up the contents of a Docker volume by creating a temporary container that uses the volume and then copying the data to the host filesystem.

docker run --rm -v my_volume:/data -v "$(pwd)":/backup busybox \
  tar cvf /backup/backup.tar /data

This command creates a backup of my_volume and stores it as backup.tar in the current directory.

To restore the data to a volume:

docker run --rm -v my_volume:/data -v "$(pwd)":/backup busybox \
  tar xvf /backup/backup.tar -C /

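This command restores the data from backup.tar to my_volume. It works because tar drops the leading / when it creates the archive, so the entries are stored as data/... and extracting with -C / places them back under /data.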
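The path handling in the backup and restore commands above is easy to get wrong, so here is a Docker-free sketch of the same tar round trip that you can run in any shell. The temporary directory layout is hypothetical: "$work/root" stands in for the container's / directory and "$work/restore_root" for the / of the restoring container.

```shell
#!/bin/sh
# Docker-free sketch of the tar round trip used for volume backups.
# "$work/root" plays the role of "/" in the backup container, and
# "$work/restore_root" plays the role of "/" in the restore container.
work=$(mktemp -d)
mkdir -p "$work/root/data" "$work/restore_root"
echo "hello" > "$work/root/data/hello.txt"

# Backup: archive the data directory relative to the stand-in root, which
# mirrors how tar stores /data without its leading slash.
tar cf "$work/backup.tar" -C "$work/root" data

# List the archive: entries carry the data/ prefix.
tar tf "$work/backup.tar"

# Restore: extract into the parent of data, just like "-C /" above.
tar xf "$work/backup.tar" -C "$work/restore_root"
cat "$work/restore_root/data/hello.txt"
```

Extracting into /data instead of / would produce a nested /data/data directory, which is why the restore command targets the parent directory.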

6. Inspecting Bind Mounts and Volumes in Running Containers

You can inspect the volumes and bind mounts used by a running container with the docker inspect command.

docker inspect --format='{{json .Mounts}}' my_container

This command outputs a JSON representation of all mounts (volumes and bind mounts) used by my_container, showing details like the source, destination, and options.
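When the raw JSON is more than you need, a Go template can print one line per mount. The following is a sketch, with list_mounts as a hypothetical helper name; it relies on the Type, Source, Destination, and RW fields that docker inspect reports for each mount.

```shell
#!/bin/sh
# Sketch: print one line per mount of a container.
# list_mounts is a hypothetical helper name.
list_mounts() {
  docker inspect --format \
    '{{range .Mounts}}{{.Type}} {{.Source}} -> {{.Destination}} (rw={{.RW}}){{println}}{{end}}' \
    "$1"
}

# Usage (requires a running Docker daemon and an existing container):
#   list_mounts my_container
```

Each output line shows whether the mount is a volume or a bind mount, where it lives on the host, where it appears in the container, and whether it is writable.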

7. Cleaning Up Unused Volumes

Over time, you may accumulate unused volumes that are no longer attached to any containers. You can remove these dangling volumes with the following command:

docker volume prune -f

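This command removes unused volumes without asking for confirmation (-f), freeing up disk space. Since Docker Engine 23.0 it removes only anonymous volumes by default; add --all (-a) to also remove unused named volumes.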
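Because pruning cannot be undone, it can help to list the candidates first. This sketch uses the dangling filter of docker volume ls, which selects volumes that no container references; list_dangling_volumes is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list volumes that are not referenced by any container.
# list_dangling_volumes is a hypothetical helper name.
list_dangling_volumes() {
  # --quiet prints only the volume names, one per line.
  docker volume ls --quiet --filter dangling=true
}

# Usage (requires a running Docker daemon):
#   list_dangling_volumes
```

Review the printed names before pruning, since a volume that looks unused may still hold the only copy of some data.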

8. Migrating Data Between Volumes

You can copy data between two Docker volumes by creating a temporary container that mounts both volumes and then using cp to copy the data. You can use rsync instead if you pick an image that includes it; the busybox image does not.

docker run --rm -v source_volume:/from -v target_volume:/to busybox \
  sh -c "cp -a /from/. /to/"
  
This command copies everything in source_volume, including hidden files, into target_volume, and the -a option preserves permissions and timestamps.
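The trailing /. in the source path is what makes cp copy the contents of the directory, hidden files included, rather than the directory itself. Here is a Docker-free sketch of that behavior using two temporary directories in place of the /from and /to mount points; the file names are hypothetical.

```shell
#!/bin/sh
# Docker-free sketch of "cp -a /from/. /to/" using temporary directories.
src=$(mktemp -d)   # stands in for /from (the source volume)
dst=$(mktemp -d)   # stands in for /to (the target volume)

echo "visible" > "$src/app.conf"
echo "hidden" > "$src/.env"
mkdir "$src/sub"
echo "nested" > "$src/sub/file.txt"

# Copy the contents of src, including dotfiles and subdirectories, into dst.
cp -a "$src/." "$dst/"

# -A lists hidden entries too, so .env shows up alongside app.conf and sub.
ls -A "$dst"
```

By contrast, a glob such as /from/* would skip dotfiles in most shells, which is one reason the /. form is a common choice for volume migrations.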

These advanced commands provide greater flexibility and control over how you manage persistent storage in Docker, allowing you to tailor storage solutions to your specific needs.

Conclusion

Persistent storage is a critical aspect of containerized applications, especially when dealing with databases, logs, or other stateful data. Docker provides flexible options with volumes and bind mounts, allowing you to choose the best approach for your application's needs.

  • Volumes offer a robust, Docker-managed solution for persisting data.
  • Bind Mounts provide direct access to the host filesystem, which can be useful in certain scenarios.

Understanding these options will help you make informed decisions about how to manage data in your Docker containers effectively.