Unable to get the GPU version working on Docker

Linux Kernel: 5.15.0-0.bpo.3-amd64
OS: Debian 11 Bullseye
Docker Version: 20.10.12, build e91ed57
NVIDIA Driver: 510.47.03
CUDA: 11.6
GPU: NVIDIA Quadro T400 PNY 2GB


  1. Install Docker following the "Using DeepStack with NVIDIA GPUs" guide.

  2. Install the NVIDIA driver: sudo apt install nvidia-driver

  3. Install the CUDA drivers: sudo apt install cuda-drivers (without this step, DeepStack reports the CUDA error below)

    File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 135, in validate_cuda_device
      raise RuntimeError('Attempting to deserialize object on a CUDA '
    RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

  4. Install the NVIDIA Container Toolkit: sudo apt install nvidia-container-toolkit

  5. Edit /etc/nvidia-container-runtime/config.toml and change the line
    ldconfig = "@/sbin/ldconfig"
    to
    ldconfig = "/sbin/ldconfig"

  6. Start the DeepStack container:
    docker run --gpus all --name deepstack-gpu -d --restart unless-stopped -e THREADCOUNT=15 -e VISION-DETECTION=True -e MODE=High -v localstorage:/datastore -p 80:5000 deepquestai/deepstack:gpu

  7. Start deepstack-ui:
    docker run --name deepstack-ui -d -p 81:8501 -e DEEPSTACK_IP='deepstack.gravee.com' robmarkcole/deepstack-ui:latest
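
The config change in step 5 can also be scripted instead of edited by hand. This is only a minimal sketch: fix_ldconfig is a hypothetical helper name, and it assumes the ldconfig line appears in the file exactly as quoted in step 5. It keeps a .bak copy of the original file next to it.

```shell
# Hypothetical helper (not part of the NVIDIA tooling): rewrite the ldconfig
# entry in the given config file, leaving a .bak copy of the original.
#   $1 = path to the config file
fix_ldconfig() {
    sed -i.bak 's|ldconfig = "@/sbin/ldconfig"|ldconfig = "/sbin/ldconfig"|' "$1"
}

# Usage, from a root shell (the real file is not writable by normal users):
#   fix_ldconfig /etc/nvidia-container-runtime/config.toml
#   systemctl restart docker
```

If the line is not present in that exact form, sed leaves the file unchanged, so it is worth checking the result with grep ldconfig afterwards.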

No errors are produced, but no objects are detected either.
On the host, nvidia-smi shows a python3 process using the GPU:

| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA T400         Off  | 00000000:01:00.0 Off |                  N/A |
| 38%   40C    P0    N/A /  31W |   1615MiB /  2048MiB |      0%      Default |
|                               |                      |                  N/A |

| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|    0   N/A  N/A     30034      C   python3                          1612MiB |
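
One way to tell whether the problem is in DeepStack itself or in deepstack-ui would be to query the detection endpoint directly. This is a rough sketch, not something I have output from: detect_objects is a hypothetical helper, test.jpg stands for any local photo with an obvious object in it, and the default of localhost:80 assumes the -p 80:5000 mapping from step 6.

```shell
# Hypothetical helper: POST an image to DeepStack's object detection endpoint
# (/v1/vision/detection, form field "image") and print the raw JSON reply.
#   $1 = path to a local image file
#   $2 = host:port of the DeepStack container (default: localhost:80)
detect_objects() {
    curl -s -X POST -F "image=@$1" "http://${2:-localhost:80}/v1/vision/detection"
}

# Usage:
#   detect_objects test.jpg
# The container log may also show what happens per request:
#   docker logs deepstack-gpu
```

An empty predictions list in the reply would point at the GPU container rather than the UI.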

I'm really not sure where to go from here.
I’ve tried the previous version of DeepStack and get the same results.

The CPU version of DeepStack works fine on the same machine, and works exceptionally well with the new multithreading environment variable.

I have also tried installing the drivers from the .run file instead of the repo, but the version in the repo is the same as the current .run file.