Audio events (#6848)

* Initial audio classification model implementation
* fix mypy
* Keep audio labelmap local
* Cleanup
* Start adding config for audio
* Add the detector
* Add audio detection process keypoints
* Build out base config
* Load labelmap correctly
* Fix config bugs
* Start audio process
* Fix startup issues
* Try to cleanup restarting
* Add ffmpeg input args
* Get audio detection working
* Save event to db
* End events if not heard for 30 seconds
* Use not heard config
* Stop ffmpeg when shutting down
* Fixes
* End events correctly
* Use api instead of event queue to save audio events
* Get events working
* Close threads when stop event is sent
* remove unused
* Only start audio process if at least one camera is enabled
* Add const for float
* Cleanup labelmap
* Add audio icon in frontend
* Add ability to toggle audio with mqtt
* Set initial audio value
* Fix audio enabling
* Close logpipe
* Isort
* Formatting
* Fix web tests
* Fix web tests
* Handle cases where args are a string
* Remove log
* Cleanup process close
* Use correct field
* Simplify if statement
* Use var for localhost
* Add audio detectors docs
* Add restream docs to mention audio detection
* Add full config docs
* Fix links to other docs

---------

Co-authored-by: Jason Hunter <hunterjm@gmail.com>
2023-07-01 07:18:33 -06:00
parent f1dc3a639c
commit c3b313a70d
28 changed files with 1090 additions and 69 deletions
--- a/docs/docs/configuration/audio_detectors.md
+++ b/docs/docs/configuration/audio_detectors.md
@@ -0,0 +1,63 @@
+---
+id: audio_detectors
+title: Audio Detectors
+---
+
+Frigate provides a built-in audio detector that runs on the CPU. Compared to object detection in images, audio detection is relatively lightweight, so running it on the CPU is the only option provided.
+
+## Configuration
+
+Audio events work by detecting a type of audio and creating an event. The event ends once that type of audio has not been heard for the configured amount of time. Audio events save a snapshot at the beginning of the event as well as recordings throughout the event. The recordings are retained according to the configured recording retention.
+
+### Enabling Audio Events
+
+Audio events can be enabled for all cameras or only for specific cameras.
+
+```yaml
+
+audio: # <- enable audio events for all cameras
+  enabled: True
+
+cameras:
+  front_camera:
+    ffmpeg:
+      ...
+    audio:
+      enabled: True # <- enable audio events for the front_camera
+```
+
+If you are using multiple streams, you must set the `audio` role on the stream that will be used for audio detection. This can be any stream, but the stream must include audio.
+
+:::note
+
+The ffmpeg process for capturing audio opens a separate connection to the camera, in addition to the connections used by the other roles assigned to the camera. For this reason, it is recommended to use the go2rtc restream for audio detection. See [the restream docs](/configuration/restream.md) for more information.
+
+:::
+
+```yaml
+cameras:
+  front_camera:
+    ffmpeg:
+      inputs:
+        - path: rtsp://.../main_stream
+          roles:
+            - record
+        - path: rtsp://.../sub_stream # <- this stream must have audio enabled
+          roles:
+            - audio
+            - detect
+```
+
+### Configuring Audio Events
+
+The included audio model can detect over 500 different types of audio, many of which are not practical to use. By default, `bark`, `scream`, `speech`, and `yell` are enabled, but this list can be customized.
+
+```yaml
+audio:
+  enabled: True
+  listen:
+    - bark
+    - scream
+    - speech
+    - yell
+```
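
Review note on the docs above: the rules they describe (the camera-level `audio.enabled` overrides the global value, and a camera with audio enabled needs an input carrying the `audio` role) can be sketched as a small check over an already-parsed config. This is a minimal illustration only, not Frigate's validation code. The helper names `audio_enabled` and `cameras_missing_audio_role` are hypothetical, and the config is assumed to be a plain `dict` shaped like the YAML in this diff.

```python
def audio_enabled(config: dict, camera: str) -> bool:
    """Resolve audio.enabled for one camera.

    Assumption: the camera-level value wins when present; otherwise the
    global value applies, defaulting to False as in the config reference.
    """
    global_enabled = config.get("audio", {}).get("enabled", False)
    camera_audio = config["cameras"][camera].get("audio", {})
    return camera_audio.get("enabled", global_enabled)


def cameras_missing_audio_role(config: dict) -> list[str]:
    """Return cameras that have audio enabled but no input with the `audio` role."""
    missing = []
    for name, camera in config.get("cameras", {}).items():
        if not audio_enabled(config, name):
            continue
        inputs = camera.get("ffmpeg", {}).get("inputs", [])
        if not any("audio" in inp.get("roles", []) for inp in inputs):
            missing.append(name)
    return missing


# Hypothetical config mirroring the YAML examples; stream paths are placeholders.
example_config = {
    "audio": {"enabled": False},
    "cameras": {
        "front_camera": {
            "audio": {"enabled": True},
            "ffmpeg": {
                "inputs": [
                    {"path": "rtsp://.../main_stream", "roles": ["record"]},
                    {"path": "rtsp://.../sub_stream", "roles": ["audio", "detect"]},
                ]
            },
        },
        "back_camera": {
            "audio": {"enabled": True},
            "ffmpeg": {
                "inputs": [{"path": "rtsp://.../main_stream", "roles": ["record", "detect"]}]
            },
        },
        "side_camera": {
            "ffmpeg": {
                "inputs": [{"path": "rtsp://.../main_stream", "roles": ["detect"]}]
            },
        },
    },
}

print(cameras_missing_audio_role(example_config))
```

In this example only `back_camera` is reported: `front_camera` has an `audio` input, and `side_camera` inherits the global `enabled: False`.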
--- a/docs/docs/configuration/index.md
+++ b/docs/docs/configuration/index.md
@@ -138,6 +138,20 @@ model:
  labelmap:
    2: vehicle

+# Optional: Audio Events Configuration
+# NOTE: Can be overridden at the camera level
+audio:
+  # Optional: Enable audio events (default: shown below)
+  enabled: False
+  # Optional: Number of seconds without detected audio before the event ends (default: shown below)
+  max_not_heard: 30
+  # Optional: Types of audio to listen for (default: shown below)
+  listen:
+    - bark
+    - scream
+    - speech
+    - yell
+
 # Optional: logger verbosity settings
 logger:
  # Optional: Default log verbosity (default: shown below)
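
Review note on `max_not_heard`: the behavior described in this hunk (an event ends once its audio type has not been heard for the configured number of seconds) can be sketched as a small per-label tracker. This is an illustration under stated assumptions, not the implementation in this commit. The `AudioEventTracker` class and its methods are hypothetical, timestamps are passed in explicitly, and the strict `>` comparison is an arbitrary choice.

```python
from dataclasses import dataclass, field


@dataclass
class AudioEventTracker:
    """Hypothetical tracker: one open event per audio label."""

    max_not_heard: float = 30.0  # seconds of silence before an event ends
    last_heard: dict[str, float] = field(default_factory=dict)

    def heard(self, label: str, now: float) -> bool:
        """Record a detection; return True if it opens a new event."""
        is_new = label not in self.last_heard
        self.last_heard[label] = now
        return is_new

    def expire(self, now: float) -> list[str]:
        """End and return every event silent for longer than max_not_heard."""
        ended = [
            label
            for label, timestamp in self.last_heard.items()
            if now - timestamp > self.max_not_heard
        ]
        for label in ended:
            del self.last_heard[label]
        return ended


tracker = AudioEventTracker(max_not_heard=30)
tracker.heard("bark", now=0.0)    # opens a "bark" event
tracker.heard("bark", now=10.0)   # extends the same event
print(tracker.expire(now=35.0))   # still open: only 25 s of silence
print(tracker.expire(now=41.0))   # ended: 31 s of silence
```

Passing `now` explicitly keeps the sketch deterministic; a real process would more likely read a clock on each audio chunk.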
--- a/docs/docs/configuration/object_detectors.md
+++ b/docs/docs/configuration/object_detectors.md
@@ -1,6 +1,6 @@
 ---
-id: detectors
-title: Detectors
+id: object_detectors
+title: Object Detectors
 ---

 Frigate provides the following builtin detector types: `cpu`, `edgetpu`, `openvino`, and `tensorrt`. By default, Frigate will use a single CPU detector. Other detectors may require additional configuration as described below. When using multiple detectors they will run in dedicated processes, but pull from a common queue of detection requests from across all cameras.
@@ -275,6 +275,6 @@ detectors:
    api_timeout: 0.1 # seconds
 ```

-Replace `<your_codeproject_ai_server_ip>` and `<port>` with the IP address and port of your CodeProject.AI server. 
+Replace `<your_codeproject_ai_server_ip>` and `<port>` with the IP address and port of your CodeProject.AI server.

 To verify that the integration is working correctly, start Frigate and observe the logs for any error messages related to CodeProject.AI. Additionally, you can check the Frigate web interface to see if the objects detected by CodeProject.AI are being displayed and tracked properly.
--- a/docs/docs/configuration/restream.md
+++ b/docs/docs/configuration/restream.md
@@ -67,6 +67,7 @@ cameras:
          roles:
            - record
            - detect
+            - audio # <- only necessary if audio detection is enabled
  http_cam:
    ffmpeg:
      output_args:
@@ -77,6 +78,7 @@ cameras:
          roles:
            - record
            - detect
+            - audio # <- only necessary if audio detection is enabled
 ```

 ### With Sub Stream
@@ -112,6 +114,7 @@ cameras:
        - path: rtsp://127.0.0.1:8554/rtsp_cam_sub # <--- the name here must match the name of the camera_sub in restream
          input_args: preset-rtsp-restream
          roles:
+            - audio # <- only necessary if audio detection is enabled
            - detect
  http_cam:
    ffmpeg:
@@ -125,6 +128,7 @@ cameras:
        - path: rtsp://127.0.0.1:8554/http_cam_sub # <--- the name here must match the name of the camera_sub in restream
          input_args: preset-rtsp-restream
          roles:
+            - audio # <- only necessary if audio detection is enabled
            - detect
 ```

--- a/docs/docs/frigate/hardware.md
+++ b/docs/docs/frigate/hardware.md
@@ -50,7 +50,7 @@ The OpenVINO detector type is able to run on:
 - 6th Gen Intel Platforms and newer that have an iGPU
 - x86 & Arm64 hosts with VPU Hardware (ex: Intel NCS2)

-More information is available [in the detector docs](/configuration/detectors#openvino-detector)
+More information is available [in the detector docs](/configuration/object_detectors#openvino-detector)

 Inference speeds vary greatly depending on the CPU, GPU, or VPU used, some known examples are below:

@@ -72,7 +72,7 @@ Inference speeds vary greatly depending on the CPU, GPU, or VPU used, some known

 ### TensorRT

-The TensortRT detector is able to run on x86 hosts that have an Nvidia GPU which supports the 11.x series of CUDA libraries. The minimum driver version on the host system must be `>=450.80.02`. Also the GPU must support a Compute Capability of `5.0` or greater. This generally correlates to a Maxwell-era GPU or newer, check the [TensorRT docs for more info](/configuration/detectors#nvidia-tensorrt-detector).
+The TensorRT detector is able to run on x86 hosts that have an Nvidia GPU which supports the 11.x series of CUDA libraries. The minimum driver version on the host system must be `>=450.80.02`. The GPU must also support a Compute Capability of `5.0` or greater, which generally corresponds to a Maxwell-era GPU or newer. Check the [TensorRT docs for more info](/configuration/object_detectors#nvidia-tensorrt-detector).

 Inference speeds will vary greatly depending on the GPU and the model used.
 `tiny` variants are faster than the equivalent non-tiny model, some known examples are below:
--- a/docs/docs/guides/getting_started.md
+++ b/docs/docs/guides/getting_started.md
@@ -71,7 +71,7 @@ cameras:
      ...
 ```

-More details on available detectors can be found [here](../configuration/detectors.md).
+More details on available detectors can be found [here](../configuration/object_detectors.md).

 Restart Frigate and you should start seeing detections for `person`. If you want to track other objects, they will need to be added according to the [configuration file reference](../configuration/index.md#full-configuration-reference).