Adding documentation about device storage drivers

Kent Knox 2017-02-27 13:37:59 -06:00
parent d9f4e4e070
commit f555443d36
2 changed files with 50 additions and 11 deletions


@ -14,8 +14,8 @@ sudo docker pull rocm/rocm-terminal
sudo docker run -it --rm --device="/dev/kfd" rocm/rocm-terminal
```
## ROCm-docker quick start
Instructions and a few asciicasts are available to help users quickly get running with rocm-docker. Visit the [quick start guide](quick-start.md) to find out more.
## ROCm-docker set up guide
[Installation instructions](quick-start.md) and asciicast demos are available to help users quickly get running with rocm-docker. Visit the set up guide to read more.
### F.A.Q
When working with the ROCm containers, the following are common and useful docker commands:
@ -23,11 +23,13 @@ When working with the ROCm containers, the following are common and useful docke
* A message like the following typically means your user does not have permissions to execute docker; use sudo or [add your user](https://docs.docker.com/engine/installation/linux/ubuntulinux/#/create-a-docker-group) to the docker group.
* `Cannot connect to the Docker daemon. Is the docker daemon running on this host?`
* Open another terminal into a running container
* `sudo docker exec -it <CONTAINER-NAME> env TERM=xterm-color bash -l`
* `sudo docker exec -it <CONTAINER-NAME> bash -l`
* Copy files from host machine into running docker container
* `sudo docker cp HOST_PATH <CONTAINER-NAME>:/PATH`
* Copy files from running docker container onto host machine
* `sudo docker cp <CONTAINER-NAME>:/PATH/TO/FILE HOST_PATH`
* If you receive messages about *no space left on device* when pulling images, check the storage driver used by the docker engine. If it's 'devicemapper', the image size limits imposed by the 'devicemapper' storage driver are the likely cause
* Follow the documentation in the [quick start guide](quick-start.md) to change the storage driver
#### Saving work in a container
Docker containers are typically ephemeral, and are discarded on exit when started with the '**--rm**' flag to `docker run`. However, there are times when it is desirable to close a container that has arbitrary work in it and serialize it back into a docker image. This may be to create a checkpoint in a long and complicated series of instructions, or to share the image with others through a docker registry such as Docker Hub.
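One common way to serialize a container is `docker commit`, optionally followed by `docker push`. The sketch below wraps those two real docker CLI commands; the function names and the example image name are hypothetical, and it assumes your user can run docker without sudo (see the F.A.Q. entry about the docker group).

```bash
# Hypothetical helpers around `docker commit` / `docker push`.
# Assumes the current user is in the docker group (otherwise prefix the
# docker commands with sudo).

checkpoint_container() {
  local container="$1"   # container name or ID, as listed by `docker ps -a`
  local image="$2"       # REPOSITORY[:TAG] for the new image
  # Snapshot the container's filesystem changes into a new image
  docker commit "$container" "$image"
}

publish_image() {
  local image="$1"
  # Requires a prior `docker login` for registries such as Docker Hub
  docker push "$image"
}
```

For example, `checkpoint_container my-rocm-session myuser/rocm-work:checkpoint1` followed by `publish_image myuser/rocm-work:checkpoint1` (both names hypothetical) saves and shares the state of a session.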
@ -55,7 +57,7 @@ The first method produces docker images with the smallest footprint and best bui
The setup script included in this repository provides some flexibility in how docker containers are constructed. Unfortunately, Dockerfiles do not have a preprocessor or template language, so build instructions are typically hardcoded. However, the setup script allows us to write a primitive 'template'; when run, it instantiates baked Dockerfiles with environment variables substituted in. For instance, to build both release and debug images, first run the setup script to generate release Dockerfiles and build the images. Then, run the setup script again, specify debug Dockerfiles, and build new images. The resulting docker images should have unique image names and not conflict with each other.
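The template idea can be illustrated with plain `sed` substitution. This is a minimal sketch, not setup.sh's actual implementation: the `@BUILD_CONFIG@` placeholder token, the function name, and the file names are all hypothetical.

```bash
# Hypothetical sketch of instantiating a "baked" Dockerfile from a template.
# The @BUILD_CONFIG@ token is an assumption; setup.sh's real syntax may differ.

render_dockerfile() {
  local template="$1"   # path to the template file
  local config="$2"     # e.g. "release" or "debug"
  local output="$3"     # path of the generated Dockerfile
  # Replace every placeholder occurrence; '|' is the sed delimiter so that
  # values containing '/' do not break the expression
  sed "s|@BUILD_CONFIG@|${config}|g" "$template" > "$output"
}
```

Generating `Dockerfile.release` and `Dockerfile.debug` from one template and building each with its own tag (for example `docker build -t myuser/rocm:debug -f Dockerfile.debug .`) keeps the two image variants from overwriting each other.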
## setup.sh
Currently, the setup.sh script checks to make sure that it is running on an **Ubuntu system**, as it makes a few assumptions about the availability of tools and file locations. If running rocm on a Fedora machine, inspect the source of setup.sh and issue the appropriate commands manually. There are a few parameters to setup.sh of a generic nature that affect all images built after running. If no parameters are given, built images will be based on Ubuntu 16.04 with rocm installed from Debian packages. Supported parameters can be queried with `./setup.sh --help`.
Currently, the setup.sh script checks to make sure that it is running on an **Ubuntu system**, as it makes a few assumptions about the availability of tools and file locations. If running rocm on a Fedora machine, inspect the source of setup.sh and issue the appropriate commands manually. There are a few parameters to setup.sh of a generic nature that affect all images built after running. If no parameters are given, built images will be based on Ubuntu 16.04 with rocm components installed from Debian packages downloaded from packages.amd.com. Supported parameters can be queried with `./setup.sh --help`.
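A check of this kind can be written against the standard `/etc/os-release` file, whose `ID` field is `ubuntu` on Ubuntu. The sketch below is similar in spirit to what setup.sh does but is not its actual code; the function name is hypothetical.

```bash
# Hypothetical "is this Ubuntu?" check based on os-release, NOT setup.sh's code.

is_ubuntu() {
  local os_release="${1:-/etc/os-release}"
  [ -r "$os_release" ] || return 1
  # os-release holds KEY=value lines; the value may be quoted
  local id
  id="$(sed -n 's/^ID=//p' "$os_release" | tr -d '"')"
  [ "$id" = "ubuntu" ]
}
```

On a Fedora machine, `is_ubuntu || echo "not Ubuntu: inspect setup.sh and run its steps manually"` would print the reminder instead of continuing.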
| setup.sh parameters | parameter [default]| description |
|-----|-----|-----|


@ -1,8 +1,12 @@
### Install rocm-kernel
[![Install rocm-kernel](https://asciinema.org/a/cv0r34re9hp9g5hoja8vyh803.png)](https://asciinema.org/a/cv0r34re9hp9g5hoja8vyh803)
# Preparing a machine to run with rocm and docker
* [Installing ROCK kernel](https://github.com/RadeonOpenCompute/ROCm#debian-repository---apt-get) on Ubuntu 14.04
* This step will eventually go away as newer linux kernel images trickle down into upcoming distros. Our kernel module developers (AMDGPU and AMDKFD) are contributing source back into the mainline linux kernel. This step of installing a ROCm specific kernel image is temporary.
The following instructions assume a fresh/blank machine to be prepared for the ROCm + Docker environment; no additional software has been installed other than the typical stock package updating.
I recommend installing the rocm kernel first. Depending on how distribution release cycles line up, the rocm kernel is often newer than the stock kernel shipping in most Linux distributions. The newer kernel often supports newer AMD hardware better, and stock video resolutions and hardware acceleration performance are typically improved. As of the time of this writing, ROCm officially supports the Ubuntu and Fedora Linux distributions. The following asciicast demonstrates updating the kernel on Ubuntu 14.04. More detailed instructions can be found on the Radeon Open Compute website:
* [Installing ROCK kernel](https://github.com/RadeonOpenCompute/ROCm#debian-repository---apt-get) on Ubuntu
### Step 1: Install rocm-kernel
[![Install rocm-kernel](https://asciinema.org/a/cv0r34re9hp9g5hoja8vyh803.png)](https://asciinema.org/a/cv0r34re9hp9g5hoja8vyh803)
```bash
wget -qO - http://packages.amd.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
@ -10,8 +14,41 @@ sudo sh -c 'echo deb [arch=amd64] http://packages.amd.com/rocm/apt/debian/ trust
> /etc/apt/sources.list.d/rocm.list'
sudo apt-get update && sudo apt-get install rocm-kernel
```
Reboot the machine after installing the ROCm kernel package so that the new kernel is loaded. You can verify that the ROCm kernel is running by typing the following command at a prompt:
### Build ROCm container using docker CLI
```bash
uname -r
```
The output should clearly identify the new ROCm kernel, for example `4.6.0-kfd-compute-rocm-rel-1.4-16`. In this case, the ROCm compute kernel is based on the Linux 4.6.0 kernel.
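The same check can be scripted. This sketch assumes that ROCm kernel release strings contain `rocm`, as in the example above; other ROCm releases may be named differently, and both function names are hypothetical.

```bash
# Hypothetical kernel checks; the "*rocm*" pattern is an assumption based on
# the example release string 4.6.0-kfd-compute-rocm-rel-1.4-16.

is_rocm_kernel() {
  local release="${1:-$(uname -r)}"
  case "$release" in
    *rocm*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the upstream Linux version (the text before the first '-')
kernel_base_version() {
  local release="${1:-$(uname -r)}"
  printf '%s\n' "${release%%-*}"
}
```

Called without arguments, both functions inspect the running kernel via `uname -r`; for the example string above, `kernel_base_version` prints `4.6.0`.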
### Step 2: Install docker
After verifying the new kernel is running, install the docker engine. Manual instructions for installing docker on various distros can be found on the [docker website](https://docs.docker.com/engine/installation/linux/), but perhaps the simplest method is to use a bash script provided by docker itself. If your organization permits running a bash script downloaded from the internet, open a bash prompt and execute the following line:
```bash
curl -sSL https://get.docker.com/ | sh
```
The above script detects the Linux distribution and the installed kernel, and installs docker accordingly. On a ROCm platform the script prints a warning that it does not recognize the rocm kernel; this is expected and can be safely ignored, as the script still installs docker correctly.
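After the script finishes, a quick sanity check confirms that the docker client is on the `PATH` and that the daemon responds. This is a sketch with hypothetical function names; `docker --version` and `docker version` are standard docker CLI commands.

```bash
# Hypothetical post-install check for the docker engine.

have_command() {
  command -v "$1" >/dev/null 2>&1
}

check_docker_install() {
  if ! have_command docker; then
    echo "docker not found on PATH; the install script may have failed" >&2
    return 1
  fi
  # Client version only; works even if the daemon is down
  docker --version
  # `docker version` also contacts the daemon, so it fails if the service is
  # not running or the user lacks permission (see the F.A.Q.)
  docker version >/dev/null
}
```

Run it as a user in the docker group; otherwise the second command fails with the *Cannot connect to the Docker daemon* message described in the F.A.Q.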
### Step 3: Verify/Change the docker device storage driver
The docker device storage driver manages how docker stores and accesses images and containers. Many drivers are available, and [documentation and thorough descriptions](https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/) of the storage driver architecture can be found on the official docker website. It is possible to check which storage driver docker is using by issuing a
```bash
sudo docker info
```
command at the command prompt and looking for the *'Storage Driver: '* output. It is hard to predict which storage driver docker will choose as the default on install, and defaults change over time, but in our experience we have run into problems with the *'devicemapper'* storage driver and large images. The *'devicemapper'* storage driver limits the maximum size of images and containers. If you work in a 'big data' field, such as DNN applications, the 10 GB default limit of *'devicemapper'* quickly becomes a constraint. There are two options available if you run into this limit:
1. Switch to a different storage driver
* **AMD recommends using 'overlay2'**, whose dependencies are met by the ROCm kernel and should be available
* [overlay2](https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/) does not impose a fixed per-image size limit; image size is bounded by the space available on the backing filesystem
* If 'overlay2' is not an option, storage drivers can be [chosen at service startup time](https://docs.docker.com/engine/userguide/storagedriver/selectadriver/) with the `--storage-driver=<name>` option
2. If you must stick with 'devicemapper', pass the [configuration option](https://docs.docker.com/engine/reference/commandline/dockerd/) `--storage-opt dm.basesize=<size>` to the docker daemon on service startup to raise the maximum image size
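Both options can also be set in the daemon configuration file `/etc/docker/daemon.json` through its `storage-driver` and `storage-opts` keys. The helper below is a hypothetical sketch that writes such a file; the 50G size is an arbitrary example, and the function refuses to overwrite an existing file so that other settings are not lost.

```bash
# Hypothetical helper that writes a minimal dockerd configuration file.
# "storage-driver" and "storage-opts" are standard daemon.json keys.

write_daemon_config() {
  local driver="$1"                          # e.g. overlay2 or devicemapper
  local path="${2:-/etc/docker/daemon.json}"
  if [ -e "$path" ]; then
    echo "$path already exists; merge the setting by hand" >&2
    return 1
  fi
  if [ "$driver" = "devicemapper" ]; then
    # Raise devicemapper's base device size from its 10G default (50G is an example)
    printf '{\n  "storage-driver": "devicemapper",\n  "storage-opts": ["dm.basesize=50G"]\n}\n' > "$path"
  else
    printf '{\n  "storage-driver": "%s"\n}\n' "$driver" > "$path"
  fi
}
```

Writing to `/etc/docker` requires root. Afterwards, restart the docker service (for example `sudo service docker restart`) and confirm the change with `sudo docker info`. Do not also pass `--storage-driver` on the daemon command line: dockerd refuses to start when the same option is set both as a flag and in `daemon.json`.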
The downside to switching to the 'overlay2' storage driver after creating and working with 'devicemapper' images is that existing images need to be recreated. As such, we recommend verifying that docker is set up to use the 'overlay2' storage driver before starting significant work.
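That verification can be automated by parsing the *'Storage Driver: '* line of `docker info`. A minimal sketch with hypothetical function names, reading the `docker info` text on stdin:

```bash
# Hypothetical helpers that parse `docker info` output read from stdin.

storage_driver_from_info() {
  # Matches lines like "Storage Driver: overlay2", optionally indented
  awk -F': ' '/^ *Storage Driver:/ { print $2; exit }'
}

warn_if_devicemapper() {
  local driver
  driver="$(storage_driver_from_info)"
  if [ "$driver" = "devicemapper" ]; then
    echo "warning: docker uses devicemapper; consider overlay2 before building large images" >&2
    return 1
  fi
  echo "storage driver: ${driver:-unknown}"
}
```

Usage: `sudo docker info | warn_if_devicemapper`. On docker releases whose `docker info` supports `--format`, `sudo docker info --format '{{.Driver}}'` prints the driver name directly.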
### Step 4a: Build ROCm container using docker CLI
[![asciicast](https://asciinema.org/a/5u0d81txy9tskiitcispluw9v.png)](https://asciinema.org/a/5u0d81txy9tskiitcispluw9v)
* Clone and build the container
@ -23,7 +60,7 @@ sudo docker build -t rocm/rocm-terminal rocm-terminal
sudo docker run -it --rm --device="/dev/kfd" rocm/rocm-terminal
```
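The run command above passes the host's `/dev/kfd` device into the container, so it only works when the ROCm kernel exposes that device. A small pre-flight check, sketched with a hypothetical function name:

```bash
# Hypothetical pre-flight check before `docker run --device="/dev/kfd" ...`

check_kfd_device() {
  local dev="${1:-/dev/kfd}"
  # /dev/kfd should be a character device when the ROCm kernel is loaded
  if [ ! -c "$dev" ]; then
    echo "$dev is not a character device; is the ROCm kernel loaded?" >&2
    return 1
  fi
}
```

For example: `check_kfd_device && sudo docker run -it --rm --device="/dev/kfd" rocm/rocm-terminal`.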
### Build ROCm container using docker-compose
### (optional) Step 4b: Build ROCm container using docker-compose
[![asciicast](https://asciinema.org/a/77cfxjz9ilt2x9ck27r9vanu7.png)](https://asciinema.org/a/77cfxjz9ilt2x9ck27r9vanu7)
* Clone and build the container using [docker-compose](https://docs.docker.com/compose/install/)
@ -33,7 +70,7 @@ git clone https://github.com/RadeonOpenCompute/ROCm-docker
cd ROCm-docker
sudo docker-compose run --rm rocm
```
### Verify successful build of ROCm-docker container
### Step 5: Verify successful build of ROCm-docker container
* Verify a working container-based ROCm software stack
* After step 4a or 4b, a bash login prompt to a running docker container should be available
* `hcc --version` should display version information of the AMD heterogeneous compiler
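The compiler check can also be run non-interactively. This sketch assumes the image from step 4a is tagged `rocm/rocm-terminal`, that `hcc` is on the `PATH` of a bash login shell (the verification above uses a login prompt), and that the current user can run docker without sudo; the function name is hypothetical.

```bash
# Hypothetical one-shot check that hcc runs inside the ROCm image.

verify_hcc() {
  local image="${1:-rocm/rocm-terminal}"
  # -l starts a login shell so profile scripts that set PATH are sourced
  docker run --rm --device=/dev/kfd "$image" bash -lc 'hcc --version'
}
```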