Nvidia Jetson ffmpeg + TensorRT support (#6458)

* Non-Jetson changes

Required for later commits:
- Allow base image to be overridden (and don't assume its WORKDIR)
- Ensure python3.9
- Map hwaccel decode presets as strings instead of lists
Not required:
- Fix existing documentation
- Simplify hwaccel scale logic

* Prepare for multi-arch tensorrt build

* Add tensorrt images for Jetson boards

* Add Jetson ffmpeg hwaccel

* Update docs

* Add CODEOWNERS

* CI

* Change default model from yolov7-tiny-416 to yolov7-320

In my experience, the tiny models perform markedly worse without being
much faster.

* fixup! Update docs
Andrew Reiter
2023-07-26 06:50:41 -04:00
committed by GitHub
parent 680198148b
commit a96a951e23
28 changed files with 567 additions and 139 deletions

View File

@@ -11,16 +11,18 @@ It is highly recommended to use hwaccel presets in the config. These presets not
See [the hwaccel docs](/configuration/hardware_acceleration.md) for more info on how to set up hwaccel for your GPU / iGPU.
| Preset | Usage | Other Notes |
| --------------------- | ------------------------------ | ----------------------------------------------------- |
| preset-rpi-32-h264 | 32 bit Rpi with h264 stream | |
| preset-rpi-64-h264 | 64 bit Rpi with h264 stream | |
| preset-vaapi | Intel & AMD VAAPI | Check hwaccel docs to ensure correct driver is chosen |
| preset-intel-qsv-h264 | Intel QSV with h264 stream | If issues occur recommend using vaapi preset instead |
| preset-intel-qsv-h265 | Intel QSV with h265 stream | If issues occur recommend using vaapi preset instead |
| preset-nvidia-h264 | Nvidia GPU with h264 stream | |
| preset-nvidia-h265 | Nvidia GPU with h265 stream | |
| preset-nvidia-mjpeg | Nvidia GPU with mjpeg stream | Recommend restreaming mjpeg and using nvidia-h264 |
| preset-jetson-h264 | Nvidia Jetson with h264 stream | |
| preset-jetson-h265 | Nvidia Jetson with h265 stream | |
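For reference, a preset is applied by passing its name to `hwaccel_args`, either globally or per camera. A minimal sketch (the `driveway` camera name and the global/per-camera split are purely illustrative):

```yaml
ffmpeg:
  # Applied to every camera unless overridden
  hwaccel_args: preset-jetson-h264

cameras:
  driveway:
    ffmpeg:
      # Per-camera override for a camera that sends an h265 stream
      hwaccel_args: preset-jetson-h265
```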
### Input Args Presets

View File

@@ -246,3 +246,77 @@ If you do not see these processes, check the `docker logs` for the container and
These instructions were originally based on the [Jellyfin documentation](https://jellyfin.org/docs/general/administration/hardware-acceleration.html#nvidia-hardware-acceleration-on-docker-linux).
# Community Supported
## NVIDIA Jetson (Orin AGX, Orin NX, Orin Nano*, Xavier AGX, Xavier NX, TX2, TX1, Nano)
A separate set of docker images is available that is based on Jetpack/L4T. These images come with an `ffmpeg` build
with codecs that use the Jetson's dedicated media engine. If your Jetson host is running Jetpack 4.6, use the
`frigate-tensorrt-jp4` image; if your Jetson host is running Jetpack 5.0+, use the `frigate-tensorrt-jp5`
image. Note that the Orin Nano has no video encoder, so Frigate will use software encoding on this platform,
but the image will still allow hardware decoding and TensorRT object detection.
You will need to use the image with the nvidia container runtime:
### Docker Run CLI - Jetson
```bash
docker run -d \
  ...
  --runtime nvidia \
  ghcr.io/blakeblackshear/frigate-tensorrt-jp5
```
### Docker Compose - Jetson
```yaml
version: '2.4'
services:
  frigate:
    ...
    image: ghcr.io/blakeblackshear/frigate-tensorrt-jp5
    runtime: nvidia # Add this
```
:::note
The `runtime:` tag is not supported on older versions of docker-compose. If you run into this, you can instead use the nvidia runtime system-wide by adding `"default-runtime": "nvidia"` to `/etc/docker/daemon.json`:
```
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
```
:::
### Setup Decoder
The decoder you need to pass in the `hwaccel_args` will depend on the input video.
Below is a list of the supported codecs (you can run `ffmpeg -decoders | grep nvmpi` inside the container to see which ones your device supports):
```
V..... h264_nvmpi     h264 (nvmpi) (codec h264)
V..... hevc_nvmpi     hevc (nvmpi) (codec hevc)
V..... mpeg2_nvmpi    mpeg2 (nvmpi) (codec mpeg2video)
V..... mpeg4_nvmpi    mpeg4 (nvmpi) (codec mpeg4)
V..... vp8_nvmpi      vp8 (nvmpi) (codec vp8)
V..... vp9_nvmpi      vp9 (nvmpi) (codec vp9)
```
For example, for H264 video, you'll select `preset-jetson-h264`.
```yaml
ffmpeg:
  hwaccel_args: preset-jetson-h264
```
If everything is working correctly, you should see a significant reduction in ffmpeg CPU load and power consumption.
Verify that hardware decoding is working by running `jtop` (`sudo pip3 install -U jetson-stats`), which should show
that NVDEC/NVDEC1 are in use.

View File

@@ -101,7 +101,7 @@ detectors:
  # Required: name of the detector
  detector_name:
    # Required: type of the detector
    # Frigate provided types include 'cpu', 'edgetpu', 'openvino' and 'tensorrt' (default: shown below)
    # Additional detector types can also be plugged in.
    # Detectors may require additional configuration.
    # Refer to the Detectors configuration page for more information.
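As a concrete illustration of the `tensorrt` type mentioned above, a minimal detector entry might look like the following sketch (the `tensorrt` key name and `device` value simply mirror the TensorRT example later in this commit):

```yaml
detectors:
  tensorrt:
    type: tensorrt
    # Optional: select a specific GPU; 0 (the first GPU) is the default
    device: 0
```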
@@ -414,6 +414,8 @@ snapshots:
    # Optional: Per object retention days
    objects:
      person: 15
  # Optional: quality of the encoded jpeg, 0-100 (default: shown below)
  quality: 70

# Optional: RTMP configuration
# NOTE: RTMP is deprecated in favor of restream
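For reference, a short worked example of the `quality` option added above, in the context of a simple snapshot configuration (the `enabled` and `default` values here are illustrative, not part of this change):

```yaml
snapshots:
  enabled: true
  # Lower values shrink the jpeg at the cost of image quality
  quality: 70
  retention:
    default: 10
    objects:
      person: 15
```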

View File

@@ -196,7 +196,9 @@ The model used for TensorRT must be preprocessed on the same hardware platform t
The Frigate image will generate model files during startup if the specified model is not found. Processed models are stored in the `/config/model_cache` folder. Typically the `/config` path is mapped to a directory on the host already and the `model_cache` does not need to be mapped separately unless the user wants to store it in a different location on the host.
By default, the `yolov7-320` model will be generated, but this can be overridden by specifying the `YOLO_MODELS` environment variable in Docker. One or more models may be listed in a comma-separated format, and each one will be generated. To select no model generation, set the variable to an empty string, `YOLO_MODELS=""`. Models will only be generated if the corresponding `{model}.trt` file is not present in the `model_cache` folder, so you can force a model to be regenerated by deleting it from your Frigate data folder.
If you have a Jetson device with DLAs (Xavier or Orin), you can generate a model that will run on the DLA by appending `-dla` to your model name, e.g. specify `YOLO_MODELS=yolov7-320-dla`. The model will run on DLA0 (Frigate does not currently support DLA1). DLA-incompatible layers will fall back to running on the GPU.
If your GPU does not support FP16 operations, you can pass the environment variable `USE_FP16=False` to disable it.
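One way to set these options is through environment variables in the compose file; a minimal sketch, assuming the service layout from the earlier compose example (the DLA model variant only applies to Xavier/Orin boards):

```yaml
services:
  frigate:
    ...
    environment:
      # Generate both a GPU model and a DLA model at startup
      - YOLO_MODELS=yolov7-320,yolov7-320-dla
      # Disable FP16 operations if the GPU does not support them
      - USE_FP16=False
```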
@@ -252,11 +254,11 @@ detectors:
    device: 0 # This is the default, select the first GPU

model:
  path: /config/model_cache/tensorrt/yolov7-320.trt
  input_tensor: nchw
  input_pixel_format: rgb
  width: 320
  height: 320
```
## Deepstack / CodeProject.AI Server Detector

View File

@@ -101,12 +101,18 @@ This should show <50% CPU in top, and ~80% CPU without `-c:v h264_v4l2m2m`.
ffmpeg -c:v h264_v4l2m2m -re -stream_loop -1 -i https://streams.videolan.org/ffmpeg/incoming/720p60.mp4 -f rawvideo -pix_fmt yuv420p pipe: > /dev/null
```
**NVIDIA GPU**
```shell
ffmpeg -c:v h264_cuvid -re -stream_loop -1 -i https://streams.videolan.org/ffmpeg/incoming/720p60.mp4 -f rawvideo -pix_fmt yuv420p pipe: > /dev/null
```
**NVIDIA Jetson**
```shell
ffmpeg -c:v h264_nvmpi -re -stream_loop -1 -i https://streams.videolan.org/ffmpeg/incoming/720p60.mp4 -f rawvideo -pix_fmt yuv420p pipe: > /dev/null
```
**VAAPI**
```shell

View File

@@ -70,7 +70,7 @@ Inference speeds vary greatly depending on the CPU, GPU, or VPU used, some known
| Intel i5 1135G7 | 10 - 15 ms | |
| Intel i5 12600K | ~ 15 ms | Inference speeds on CPU were ~ 35 ms |
### TensorRT - Nvidia GPU
The TensorRT detector is able to run on x86 hosts that have an Nvidia GPU which supports the 12.x series of CUDA libraries. The minimum driver version on the host system must be `>=525.60.13`. The GPU must also support a Compute Capability of `5.0` or greater. This generally correlates to a Maxwell-era GPU or newer; check the [TensorRT docs for more info](/configuration/object_detectors#nvidia-tensorrt-detector).
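On x86 hosts, the GPU also has to be exposed to the container. One common way to do that is the compose device reservation syntax; a minimal sketch (the `stable-tensorrt` image tag is an assumption and is not part of this diff):

```yaml
services:
  frigate:
    ...
    image: ghcr.io/blakeblackshear/frigate:stable-tensorrt
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```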
@@ -87,6 +87,14 @@ Inference speeds will vary greatly depending on the GPU and the model used.
| Quadro P400 2GB | 20 - 25 ms |
| Quadro P2000 | ~ 12 ms |
### Community Supported:
#### Nvidia Jetson
Frigate supports all Jetson boards, from the inexpensive Jetson Nano to the powerful Jetson Orin AGX. It will [make use of the Jetson's hardware media engine](/configuration/hardware_acceleration#nvidia-jetson-orin-agx-orin-nx-orin-nano-xavier-agx-xavier-nx-tx2-tx1-nano) when configured with the [appropriate presets](/configuration/ffmpeg_presets#hwaccel-presets), and will make use of the Jetson's GPU and DLA for object detection when configured with the [TensorRT detector](/configuration/object_detectors#nvidia-tensorrt-detector).
Inference speed will vary depending on the YOLO model, Jetson platform and Jetson nvpmodel (GPU/DLA/EMC clock speed), but is typically 20-40 ms for most models. The DLA is more efficient than the GPU but not faster, so using the DLA will reduce power consumption but slightly increase inference time.
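Putting the pieces together, a hypothetical Jetson configuration that uses the hardware decoder plus a DLA-generated model could look like this sketch (it assumes `YOLO_MODELS=yolov7-320-dla` was set so the corresponding `.trt` file exists in `model_cache`):

```yaml
ffmpeg:
  hwaccel_args: preset-jetson-h264

detectors:
  tensorrt:
    type: tensorrt

model:
  # Generated at startup when YOLO_MODELS includes yolov7-320-dla
  path: /config/model_cache/tensorrt/yolov7-320-dla.trt
  input_tensor: nchw
  input_pixel_format: rgb
  width: 320
  height: 320
```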
## What does Frigate use the CPU for and what does it use a detector for? (ELI5 Version)
This is taken from a [user question on reddit](https://www.reddit.com/r/homeassistant/comments/q8mgau/comment/hgqbxh5/?utm_source=share&utm_medium=web2x&context=3). Modified slightly for clarity.

View File

@@ -93,6 +93,9 @@ The following officially supported builds are available:
The following community supported builds are available:
`ghcr.io/blakeblackshear/frigate:stable-tensorrt-jp5` - Frigate build optimized for nvidia Jetson devices running Jetpack 5
`ghcr.io/blakeblackshear/frigate:stable-tensorrt-jp4` - Frigate build optimized for nvidia Jetson devices running Jetpack 4.6
:::
```yaml