AMD GPU support with the rocm detector and YOLOv8 pretrained model download (#9762)

* ROCm AMD/GPU based build and detector, WIP
* detectors/rocm: separate yolov8 postprocessing into own function; fix box scaling; use cv2.dnn.blobFromImage for preprocessing; assert on required model parameters
* AMD/ROCm: add couple of more ultralytics models; comments
* docker/rocm: make imported model files readable by all
* docker/rocm: readme about running on AMD GPUs
* docker/rocm: updated README
* docker/rocm: updated README
* docker/rocm: updated README
* detectors/rocm: separated preprocessing functions into yolo_utils.py
* detector/plugins: added onnx cpu plugin
* docker/rocm: updated container with limited label sets
* example detectors view
* docker/rocm: updated README.md
* docker/rocm: update README.md
* docker/rocm: do not set HSA_OVERRIDE_GFX_VERSION at all for the general version as the empty value broke rocm
* detectors: simplified/optimized yolov8_postprocess
* detector/yolo_utils: indentation, remove unused variable
* detectors/rocm: default option to conserve cpu usage at the expense of latency
* detectors/yolo_utils: use nms to prefilter overlapping boxes if too many detected
* detectors/edgetpu_tfl: add support for yolov8
* util/download_models: script to download yolov8 model files
* docker/main: add download-models overlay into s6 startup
* detectors/rocm: assume models are in /config/model_cache/yolov8/
* docker/rocm: compile onnx files into mxr files at startup
* switch model download into bash script
* detectors/rocm: automatically override HSA_OVERRIDE_GFX_VERSION for couple of known chipsets
* docs: rocm detector first notes
* typos
* describe builds (harakas temporary)
* docker/rocm: also build a version for gfx1100
* docker/rocm: use cp instead of tar
* docker.rocm: remove README as it is now in detector config
* frigate/detectors: renamed yolov8_preprocess->preprocess, pass input tensor element type
* docker/main: use newer openvino (2023.3.0)
* detectors: implement class aggregation
* update yolov8 model
* add openvino/yolov8 support for label aggregation
* docker: remove pointless s6/timeout-up files
* Revert "detectors: implement class aggregation"
  This reverts commit dcfe6bbf6f.
* detectors/openvino: remove class aggregation
* detectors: increase yolov8 postprocessing score threshold to 0.5
* docker/rocm: separate rocm distributed files into its own build stage
* Update object_detectors.md
* updated CODEOWNERS file for rocm
* updated build names for documentation
* Revert "docker/main: use newer openvino (2023.3.0)"
  This reverts commit dee95de908.
* reverted openvino detector
* reverted edgetpu detector
* scratched rocm docs from any mention of edgetpu or openvino
* Update docs/docs/configuration/object_detectors.md
  Co-authored-by: Nicolas Mowen <nickmowen213@gmail.com>
* renamed frigate.detectors.yolo_utils.py -> frigate.detectors.util.py
* clarified rocm example performance
* Improved wording and clarified text
* Mentioned rocm detector for AMD GPUs
* applied ruff formatting
* applied ruff suggested fixes
* docker/rocm: fix missing argument resulting in larger docker image sizes
* docs/configuration/object_detectors: fix links to yolov8 release files

---------

Co-authored-by: Nicolas Mowen <nickmowen213@gmail.com>
2024-02-10 14:41:46 +02:00
parent 64988c9be0
commit 44d8cdbba1
26 changed files with 1291 additions and 1 deletions
--- a/docs/docs/configuration/object_detectors.md
+++ b/docs/docs/configuration/object_detectors.md
@@ -397,3 +397,158 @@ detectors:
 ```

 :::
+
+## AMD/ROCm GPU detector
+
+### Setup
+
+The `rocm` detector supports running [ultralytics](https://github.com/ultralytics/ultralytics) yolov8 models on AMD GPUs and iGPUs. Use a frigate docker image with the `-rocm` suffix, for example `ghcr.io/blakeblackshear/frigate:stable-rocm`.
+
+Because the full ROCm software stack is large, there are also smaller images built for specific GPU chipsets:
+
+- `ghcr.io/blakeblackshear/frigate:stable-rocm-gfx900`
+- `ghcr.io/blakeblackshear/frigate:stable-rocm-gfx1030`
+- `ghcr.io/blakeblackshear/frigate:stable-rocm-gfx1100`
+
+### Docker settings for GPU access
+
+ROCm needs access to the `/dev/kfd` and `/dev/dri` devices. When docker or frigate is not run as root, the `video` (and possibly `render` and `ssl/_ssl`) groups should also be added.
+
+When running docker directly the following flags should be added for device access:
+
+```bash
+$ docker run --device=/dev/kfd --device=/dev/dri  \
+    ...
+```
+
+When using docker compose:
+
+```yaml
+services:
+  frigate:
+    # ...
+    devices:
+      - /dev/dri
+      - /dev/kfd
+    # ...
+```
+
+For reference on recommended settings see [running ROCm/pytorch in Docker](https://rocm.docs.amd.com/projects/install-on-linux/en/develop/how-to/3rd-party/pytorch-install.html#using-docker-with-pytorch-pre-installed).
+
+### Docker settings for overriding the GPU chipset
+
+Your GPU or iGPU might work without any special configuration, but in many cases manual settings are needed. The AMD/ROCm software stack ships with a limited set of GPU drivers, so for newer or missing models you will have to override the chipset version to an older or more generic version to get things working.
+
+AMD/ROCm also does not "officially" support integrated GPUs. Most of them still work fine, but they require setting the `HSA_OVERRIDE_GFX_VERSION` environment variable. See the [ROCm bug report](https://github.com/ROCm/ROCm/issues/1743) for context and examples.
+
+For the chipset-specific frigate rocm builds this variable is already set automatically.
+
+The general rocm frigate build automatically detects the following chipsets and sets the override accordingly:
+
+  - gfx90c -> 9.0.0
+  - gfx1031 -> 10.3.0
+  - gfx1103 -> 11.0.0
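
The chipset mapping listed above can be sketched in Python as follows. This is an illustration only and not the code added by this commit: the names `KNOWN_GFX_OVERRIDES` and `pick_gfx_override` are hypothetical, and the rule that an explicit user setting takes priority over auto-detection is an assumption.

```python
from typing import Mapping, Optional

# The three mappings documented above; any other chipset is left to the user.
KNOWN_GFX_OVERRIDES = {
    "gfx90c": "9.0.0",
    "gfx1031": "10.3.0",
    "gfx1103": "11.0.0",
}


def pick_gfx_override(gfx_name: str, env: Mapping[str, str]) -> Optional[str]:
    """Return the HSA_OVERRIDE_GFX_VERSION value to use, or None to leave it unset."""
    # Assumption: a non-empty value set by the user wins over auto-detection.
    explicit = env.get("HSA_OVERRIDE_GFX_VERSION")
    if explicit:
        return explicit
    # Unknown chipsets yield None so the variable stays unset rather than empty.
    return KNOWN_GFX_OVERRIDES.get(gfx_name)
```

Returning `None` for unknown chipsets, instead of an empty string, reflects the note in the commit message that an empty `HSA_OVERRIDE_GFX_VERSION` value broke rocm.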
+
+If you have a different chipset, you might need to set `HSA_OVERRIDE_GFX_VERSION` yourself at Docker launch. For example, if the version you want is `9.0.0`, configure it from the command line as:
+
+```bash
+$ docker run -e HSA_OVERRIDE_GFX_VERSION=9.0.0 \
+    ...
+```
+
+When using docker compose:
+
+```yaml
+services:
+  frigate:
+    # ...
+    environment:
+      HSA_OVERRIDE_GFX_VERSION: "9.0.0"
+```
+
+Figuring out which version you need can be complicated, as the chipset name and driver cannot be derived from the AMD brand name.
+
+  - first make sure that the rocm environment is running properly by running `/opt/rocm/bin/rocminfo` in the frigate container -- it should list both the CPU and the GPU with their properties
+  - find the chipset version you have (gfxNNN) from the output of `rocminfo` (see below)
+  - use a search engine to find the `HSA_OVERRIDE_GFX_VERSION` you need for the given gfx name ("gfxNNN ROCm HSA_OVERRIDE_GFX_VERSION")
+  - override `HSA_OVERRIDE_GFX_VERSION` with the relevant value
+  - if things are not working, check the frigate docker logs
+
+#### Figuring out if AMD/ROCm is working and found your GPU
+
+```bash
+$ docker exec -it frigate /opt/rocm/bin/rocminfo
+```
+
+#### Figuring out your AMD GPU chipset version
+
+We unset the `HSA_OVERRIDE_GFX_VERSION` to prevent an existing override from messing up the result:
+
+```bash
+$ docker exec -it frigate /bin/bash -c '(unset HSA_OVERRIDE_GFX_VERSION && /opt/rocm/bin/rocminfo |grep gfx)'
+```
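
If the `grep gfx` output needs to be processed in a script, the chipset names can be pulled out with a regular expression. The sketch below is not part of this commit; `find_gfx_names` is a hypothetical helper, and it assumes only that chipset names appear in the text as `gfx` followed by hexadecimal digits, without relying on any particular `rocminfo` layout.

```python
import re
from typing import List

# "gfx" followed by hex digits, e.g. gfx90c or gfx1030.
GFX_PATTERN = re.compile(r"\bgfx[0-9a-f]+\b")


def find_gfx_names(text: str) -> List[str]:
    """Return the distinct gfx chipset names found in text, in first-seen order."""
    names: List[str] = []
    for name in GFX_PATTERN.findall(text):
        if name not in names:
            names.append(name)
    return names
```

The text passed in would be the captured output of the `docker exec` command above. An empty result suggests that ROCm did not find a GPU.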
+
+### Yolov8 model download and available files
+
+The ROCm specific frigate docker containers automatically download yolov8 files from https://github.com/harakas/models/releases/tag/yolov8.1-1.1/ at startup --
+they fetch [yolov8.small.models.tar.gz](https://github.com/harakas/models/releases/download/yolov8.1-1.1/yolov8.small.models.tar.gz)
+and uncompress it into the `/config/model_cache/yolov8/` directory. After that the model files are compiled for your GPU chipset.
+
+Both the download and the compilation can take a couple of minutes, during which frigate will not be responsive. See the docker logs to follow the progress.
+
+Automatic model download can be enabled or disabled with the `DOWNLOAD_YOLOV8` environment variable (`1` or `0`), either from the command line
+
+```bash
+$ docker run ... -e DOWNLOAD_YOLOV8=1 \
+    ...
+```
+
+or when using docker compose:
+
+```yaml
+services:
+  frigate:
+    # ...
+    environment:
+      DOWNLOAD_YOLOV8: "1"
+```
+
+The download can also be triggered in regular frigate builds using the same environment variable. The following files will be available under `/config/model_cache/yolov8/`:
+
+- `yolov8[ns]_320x320.onnx` -- nano (n) and small (s) sized floating point model files usable by the `rocm` and `onnx` detectors that have been trained using the coco dataset (80 classes)
+- `yolov8[ns]-oiv7_320x320.onnx` -- floating point model files usable by the `rocm` and `onnx` detectors that have been trained using the google open images v7 dataset (601 classes)
+- `labels.txt` and `labels-frigate.txt` -- full and aggregated labels for the coco dataset models
+- `labels-oiv7.txt` and `labels-oiv7-frigate.txt` -- labels for the oiv7 dataset models
+
+The aggregated label files contain renamed labels leaving only `person`, `vehicle`, `animal` and `bird` classes. The oiv7 trained models contain 601 classes and so are difficult to configure manually -- using aggregate labels is recommended.
+
+Larger models (of `m` and `l` size and also at `640x640` resolution) can be found at https://github.com/harakas/models/releases/tag/yolov8.1-1.1/ but have to be installed manually.
+
+The oiv7 models have been trained using the larger google open images v7 dataset. They also contain many more detection classes (over 600), so using the aggregate label files is recommended. The large number of classes leads to a lower baseline for detection probability values and to higher resource consumption (the models are slower to evaluate).
+
+The `rocm` builds precompile the `onnx` files for your chipset into `mxr` files. If you change your hardware or GPU or have compiled the wrong versions you need to delete the cached `.mxr` files under `/config/model_cache/yolov8/`.
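
Clearing the compiled cache can be scripted. The following is a minimal sketch, not part of this commit: `clear_mxr_cache` is a hypothetical helper, and it assumes the `.mxr` files sit directly in the given directory, as the paragraph above describes for `/config/model_cache/yolov8/`.

```python
from pathlib import Path
from typing import List


def clear_mxr_cache(cache_dir: str, dry_run: bool = True) -> List[str]:
    """List compiled .mxr files directly under cache_dir and delete them unless dry_run."""
    matched: List[str] = []
    for path in sorted(Path(cache_dir).glob("*.mxr")):
        if not dry_run:
            path.unlink()  # remove the compiled model so it is rebuilt at next startup
        matched.append(path.name)
    return matched
```

Calling `clear_mxr_cache("/config/model_cache/yolov8")` only reports what would be removed. Passing `dry_run=False` deletes the files and leaves the `.onnx` sources in place.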
+
+### Frigate configuration
+
+You also need to modify the frigate configuration to specify the detector, labels and model file. Here is an example configuration running `yolov8s`:
+
+```yaml
+model:
+  labelmap_path: /config/model_cache/yolov8/labels.txt
+  model_type: yolov8
+detectors:
+  rocm:
+    type: rocm
+    model:
+      path: /config/model_cache/yolov8/yolov8s_320x320.onnx
+```
+
+Other settings available for the rocm detector:
+
+- `conserve_cpu: True` -- run ROCm/HIP synchronization in blocking mode, saving CPU at the cost of slightly higher latency and lower maximum throughput
+- `auto_override_gfx: True` -- enable or disable automatic gfx driver detection
+
+### Expected performance
+
+On an AMD Ryzen 3 5400U with integrated GPU (gfx90c), yolov8n runs in around 9 ms per image (about 110 detections per second) and yolov8s in around 18 ms (about 55 detections per second), both at 320x320 detector resolution.
+
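
The detections-per-second figures quoted above follow directly from the per-image inference time. A small sketch of the arithmetic, not part of this commit, with `detections_per_second` as a hypothetical helper and assuming a single detector processing images back to back with no other overhead:

```python
def detections_per_second(inference_ms: float) -> float:
    """Upper bound on detections per second for one detector running back to back."""
    if inference_ms <= 0:
        raise ValueError("inference_ms must be positive")
    return 1000.0 / inference_ms


# Inference times quoted in the documentation above.
for model, ms in (("yolov8n", 9.0), ("yolov8s", 18.0)):
    print(f"{model}: {detections_per_second(ms):.0f} detections/s")
```

This gives roughly 111 and 56 detections per second, consistent with the rounded figures in the text.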
--- a/docs/docs/frigate/hardware.md
+++ b/docs/docs/frigate/hardware.md
@@ -105,6 +105,12 @@ Frigate supports SBCs with the following Rockchip SoCs:

 Using the yolov8n model and an Orange Pi 5 Plus with RK3588 SoC inference speeds vary between 20 - 25 ms.

+#### AMD GPUs and iGPUs
+
+With the [rocm](../configuration/object_detectors.md#amdrocm-gpu-detector) detector Frigate can take advantage of many AMD GPUs and iGPUs.
+
+A mini PC with an AMD Ryzen 3 5400U iGPU takes about 9 ms to evaluate yolov8n.
+
 ## What does Frigate use the CPU for and what does it use a detector for? (ELI5 Version)

 This is taken from a [user question on reddit](https://www.reddit.com/r/homeassistant/comments/q8mgau/comment/hgqbxh5/?utm_source=share&utm_medium=web2x&context=3). Modified slightly for clarity.
--- a/docs/docs/frigate/installation.md
+++ b/docs/docs/frigate/installation.md
@@ -150,6 +150,10 @@ The community supported docker image tags for the current stable version are:
 - `stable-tensorrt-jp5` - Frigate build optimized for nvidia Jetson devices running Jetpack 5
 - `stable-tensorrt-jp4` - Frigate build optimized for nvidia Jetson devices running Jetpack 4.6
 - `stable-rk` - Frigate build for SBCs with Rockchip SoC
+- `stable-rocm` - Frigate build for [AMD GPUs and iGPUs](../configuration/object_detectors.md#amdrocm-gpu-detector), all drivers
+  - `stable-rocm-gfx900` - AMD gfx900 driver only
+  - `stable-rocm-gfx1030` - AMD gfx1030 driver only
+  - `stable-rocm-gfx1100` - AMD gfx1100 driver only

 ## Home Assistant Addon