DeepStream Concepts

DeepStream is NVIDIA's streaming analytics toolkit built on GStreamer. It provides a pipeline-based architecture for real-time video processing, computer vision, and AI inference. Understanding the core concepts of buffers, pads, and metadata flow is essential for effective DeepStream development.

Pipeline Architecture

Pipeline as a Directed Graph

DeepStream pipelines are implemented as directed graphs where:

  • Elements are nodes that perform specific operations (decoding, inference, tracking, etc.)
  • Pads are connection points between elements (source pads output data, sink pads receive data)
  • Buffers carry data and metadata through the pipeline

A typical DeepStream pipeline follows this pattern:

CameraSrc → Decoder → StreamMux → Inference → Tracker → OSD → Encoder/Sink

Each element consumes buffers from upstream elements and produces buffers for downstream elements. In practice, buffers are often drawn from a buffer pool rather than freshly allocated, and elements may modify the same GstBuffer in place, especially with GPU (NVMM) memory. The important nuance is that metadata accumulates on the same buffer object as it flows; it is not re-created at each stage.
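
As a rough sketch, this pattern can be assembled with the GStreamer Python bindings. The element choices, the RTSP URL, and the config/library paths below are assumptions that depend on your sources and DeepStream installation, not fixed names.

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# Illustrative single-stream pipeline following the pattern above:
# camera (RTSP) -> decoder -> streammux -> inference -> tracker -> OSD -> sink.
pipeline = Gst.parse_launch(
    "nvstreammux name=mux batch-size=1 width=1920 height=1080 ! "
    "nvinfer config-file-path=detector_config.txt ! "          # placeholder config file
    "nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so ! "
    "nvdsosd ! nvvideoconvert ! nveglglessink "                 # sink choice depends on platform
    "rtspsrc location=rtsp://camera.local/stream ! "            # placeholder camera URL
    "rtph264depay ! h264parse ! nvv4l2decoder ! mux.sink_0"
)

pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()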

GstBuffer Structure

A GstBuffer is the fundamental data container in GStreamer/DeepStream:

  • Payload: The actual video frame data (typically stored in GPU memory for DeepStream)
  • Metadata: Timestamps, format information (caps), and custom user data
  • DeepStream Metadata: NvDsBatchMeta structures containing detection results, tracking information, and other analytics data

The buffer acts as a carrier that maintains both the raw video data and all associated metadata as it flows through the pipeline.
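
For orientation, a minimal sketch (GStreamer Python bindings) that reads the generic GstBuffer fields from inside a pad-probe callback; DeepStream-specific metadata access is shown later.

from gi.repository import Gst

def buffer_info_probe(pad, info):
    buf = info.get_buffer()
    if buf is None:
        return Gst.PadProbeReturn.OK
    # Generic GstBuffer fields: timing and payload size.
    print(f"pts={buf.pts} duration={buf.duration} bytes={buf.get_size()}")
    # The negotiated format (caps) lives on the pad, not on the buffer itself.
    caps = pad.get_current_caps()
    if caps is not None:
        print("caps:", caps.to_string())
    return Gst.PadProbeReturn.OK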

Pad Negotiation

Elements connect through pads, which must negotiate compatible data formats:

  • Caps negotiation: Pads exchange capability information to determine compatible formats
  • Format matching: Downstream elements must accept the format produced by upstream elements
  • Example: nvvideoconvert might output video/x-raw(memory:NVMM), format=NV12; downstream elements must support this format
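
As a concrete sketch of the format example above, a capsfilter can pin the converter's output format during negotiation; the element and property values here are illustrative, and the snippet assumes a Gst.Pipeline is already being assembled.

from gi.repository import Gst

conv = Gst.ElementFactory.make("nvvideoconvert", "conv")
capsfilter = Gst.ElementFactory.make("capsfilter", "nvmm-caps")

# Restrict the converter's output to GPU (NVMM) NV12 so downstream
# DeepStream elements negotiate the format they expect.
capsfilter.set_property(
    "caps", Gst.Caps.from_string("video/x-raw(memory:NVMM), format=NV12"))

# pipeline.add(conv); pipeline.add(capsfilter); conv.link(capsfilter)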

Pad Probes

Pad probes provide a mechanism to intercept and inspect data at specific points in the pipeline:

  • Installation: Probes are attached to specific pads (typically source pads)
  • Callback execution: Every buffer passing through triggers the probe callback
  • Capabilities:
      • Inspect: Read metadata, timestamps, detection results
      • Modify: Attach new metadata or modify buffer properties
      • Control: Drop buffers or block data flow

Pad probes are particularly useful for accessing inference results, since nvinfer elements attach detection metadata to the buffers they emit. Note that probes can also be installed for events and queries, not just buffers.
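
A sketch of installing a buffer probe and checking for DeepStream batch metadata, assuming the DeepStream Python bindings (pyds) are available and the pipeline contains an nvinfer element named "primary-infer" (the name is an assumption).

import pyds
from gi.repository import Gst

def infer_src_probe(pad, info):
    gst_buffer = info.get_buffer()
    if gst_buffer is None:
        return Gst.PadProbeReturn.OK
    # NvDsBatchMeta was attached upstream by nvstreammux; nvinfer has
    # already added its detection results by the time we see this buffer.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    if batch_meta is not None:
        print("frames in batch:", batch_meta.num_frames_in_batch)
    return Gst.PadProbeReturn.OK   # let the buffer continue downstream

# `pipeline` is the Gst.Pipeline built earlier.
infer = pipeline.get_by_name("primary-infer")        # assumed element name
infer.get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, infer_src_probe)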

Data Flow and Element Interaction

Buffer Processing Flow

The pipeline processes data through the following sequence:

  1. Buffer Creation: Upstream elements produce buffers containing video data
  2. Buffer Transmission: Data flows through source pads to sink pads of downstream elements
  3. Buffer Processing: Downstream elements transform or annotate the buffer content
  4. Metadata Accumulation: DeepStream elements attach metadata to the same buffer as it flows
  5. Buffer Reuse: The same buffer object is continuously reused, accumulating metadata from multiple elements

Note that this does not mean a single buffer pointer is shared across all frames. Each frame has its own buffer; within that frame's lifetime, the same buffer travels the whole pipeline and accumulates metadata as it goes. Once a buffer is released, the pool recycles it for later frames.

Metadata Attachment Points

DeepStream elements add metadata at specific stages:

  • nvstreammux: Creates batch structure and adds NvDsBatchMeta
  • nvinfer: Adds NvDsObjectMeta containing detection results to frame metadata
  • nvtracker: Adds object tracking IDs to existing object metadata
  • nvdsosd: Reads metadata for overlay rendering but preserves the original data

All metadata remains attached to the same buffer object throughout the pipeline.
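
Inside a probe placed after nvtracker, this accumulated metadata can be walked from batch to frame to object. A sketch assuming the pyds bindings; the field names follow the standard DeepStream Python API.

import pyds

def walk_metadata(gst_buffer):
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    if batch_meta is None:
        return
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)     # added by nvstreammux
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj = pyds.NvDsObjectMeta.cast(l_obj.data)          # added by nvinfer
            rect = obj.rect_params
            print(f"stream {frame_meta.source_id}: class {obj.class_id} "
                  f"track {obj.object_id} "                     # object_id filled in by nvtracker
                  f"conf {obj.confidence:.2f} bbox "
                  f"({rect.left:.0f}, {rect.top:.0f}, {rect.width:.0f}, {rect.height:.0f})")
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break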

Buffer Structure Visualization

┌─────────────────────────────────┐
│ GstBuffer                       │
├─────────────────────────────────┤
│ Payload: Pixels (GPU/CPU mem)   │ ← Video frame data
│ Timestamp (PTS)                 │ ← Timing information
│ NvDsBatchMeta                   │ ← DeepStream metadata
│   └─ FrameMeta                  │   └─ Per-stream information
│       └─ ObjectMeta             │       └─ Detection/tracking data
└─────────────────────────────────┘

Each element can examine and add to the metadata "clipboard" attached to the buffer.

Data Access Methods

Appsink vs Pad Probe

Appsink:

  • Dedicated pipeline termination element
  • Pulls complete buffers (pixels + metadata) into application code
  • Suitable for CPU-based processing and final output

Pad Probe:

  • Mid-pipeline interception mechanism
  • Non-destructive inspection of buffers
  • Ideal for metadata extraction and debugging
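
A sketch of the appsink route, assuming the pipeline ends with nvvideoconvert producing system-memory frames and an appsink named "sink" (both names are assumptions); the callback reads pixels and DeepStream metadata from the same pulled buffer.

import pyds
from gi.repository import Gst

def on_new_sample(appsink):
    sample = appsink.emit("pull-sample")
    if sample is None:
        return Gst.FlowReturn.ERROR
    buf = sample.get_buffer()

    # Pixels: map the buffer for CPU access (format described by sample.get_caps()).
    frame_bytes = 0
    ok, map_info = buf.map(Gst.MapFlags.READ)
    if ok:
        frame_bytes = map_info.size
        buf.unmap(map_info)

    # Metadata: the same buffer still carries NvDsBatchMeta.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(buf))
    if batch_meta is not None:
        print(batch_meta.num_frames_in_batch, "frame(s),", frame_bytes, "bytes of pixels")
    return Gst.FlowReturn.OK

# `pipeline` is the Gst.Pipeline built earlier.
sink = pipeline.get_by_name("sink")            # assumed appsink name
sink.set_property("emit-signals", True)
sink.connect("new-sample", on_new_sample)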

Synchronization Considerations

Since metadata is attached in-band to buffers, the most reliable approach for synchronized access is:

  • Intercept the same buffer using either probes or appsink
  • Access pixels and metadata together from the same buffer object
  • Avoid splitting data flow, which would require manual synchronization

Example Pipeline Walkthrough

Pipeline Structure

CameraSrc → Decoder → StreamMux → Inference → Tracker → OSD → Convert → Appsink

Stage-by-Stage Buffer Evolution

1. Camera → Decoder

Input: Compressed video stream
Output: Raw video frames

Buffer State:
┌─────────────────────────────────┐
│ Pixels: Raw YUV (NV12, GPU)     │
│ PTS: 1668338300000 ns           │
│ Metadata: (empty)               │
└─────────────────────────────────┘
  • Camera source creates initial buffer
  • Decoder converts compressed frames to raw GPU surfaces
  • No DeepStream-specific metadata yet

2. nvstreammux

Input: Multiple video streams
Output: Batched frames

Buffer State:
┌─────────────────────────────────┐
│ Pixels: Batch of N frames       │
│ PTS: Per-frame timestamps       │
│ NvDsBatchMeta                   │
│   └─ FrameMeta (per stream)     │
│       • source_id               │
│       • width, height           │
│       • buf_pts                 │
└─────────────────────────────────┘
  • Combines multiple input streams into batches
  • Adds NvDsBatchMeta structure for DeepStream processing
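
A sketch of the commonly set nvstreammux properties (values are illustrative; the newer streammux variant ignores the scaling properties):

from gi.repository import Gst

streammux = Gst.ElementFactory.make("nvstreammux", "mux")
streammux.set_property("batch-size", 4)                  # number of sources batched together
streammux.set_property("width", 1920)                    # frames are scaled to this resolution
streammux.set_property("height", 1080)
streammux.set_property("batched-push-timeout", 40000)    # µs to wait before pushing a partial batch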

3. nvinfer (Detection)

Input: Batched frames
Output: Frames with detection metadata

Buffer State:
┌─────────────────────────────────┐
│ Pixels: Unchanged (GPU)         │
│ NvDsBatchMeta                   │
│   └─ FrameMeta                  │
│       └─ ObjectMeta             │
│           • class_id            │
│           • bbox {x, y, w, h}   │
│           • confidence          │
└─────────────────────────────────┘
  • Runs AI model inference on GPU
  • Attaches detection results as NvDsObjectMeta
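
nvinfer is driven by a configuration file that describes the model, labels, and thresholds; a minimal sketch, where the file path is a placeholder:

from gi.repository import Gst

pgie = Gst.ElementFactory.make("nvinfer", "primary-infer")
pgie.set_property("config-file-path", "pgie_detector_config.txt")   # placeholder path
pgie.set_property("batch-size", 4)                                  # should match nvstreammux batch-size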

4. nvtracker

Input: Frames with detections
Output: Frames with tracking IDs

Buffer State:
┌─────────────────────────────────┐
│ Pixels: Unchanged               │
│ NvDsBatchMeta                   │
│   └─ FrameMeta                  │
│       └─ ObjectMeta             │
│           • object_id (new)     │
│           • class_id            │
│           • bbox                │
│           • confidence          │
└─────────────────────────────────┘
  • Associates tracking IDs with detected objects
  • Enriches existing object metadata
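
The tracker delegates the actual tracking to a low-level library selected through its properties; a sketch, with paths that depend on the DeepStream installation:

from gi.repository import Gst

tracker = Gst.ElementFactory.make("nvtracker", "tracker")
tracker.set_property("tracker-width", 640)                # internal processing resolution
tracker.set_property("tracker-height", 384)
tracker.set_property("ll-lib-file",
    "/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so")  # installation-dependent
tracker.set_property("ll-config-file", "config_tracker_NvDCF_perf.yml")       # placeholder config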

5. nvdsosd (Overlay)

Input: Frames with tracking data
Output: Frames with visual overlays

Buffer State:
┌─────────────────────────────────┐
│ Pixels: + overlay graphics      │
│ NvDsBatchMeta: Intact           │
│   └─ FrameMeta                  │
│       └─ ObjectMeta (unchanged) │
└─────────────────────────────────┘
  • Renders bounding boxes and labels
  • Preserves all metadata

6. nvvideoconvert → Appsink

Input: GPU frames with overlays
Output: CPU frames ready for application

Buffer State:
┌─────────────────────────────────┐
│ Pixels: BGR (CPU memory)        │
│ NvDsBatchMeta: Complete         │
│   └─ All metadata preserved     │
└─────────────────────────────────┘
  • Converts from GPU to CPU memory
  • Applications can access both pixels and metadata

Probe Placement Strategy

Camera → Decoder → StreamMux → Infer → Tracker → OSD → Convert → Appsink
           ↑         ↑         ↑       ↑        ↑
        (probe)   (probe)   (probe)  (probe)  (probe)

Probe Locations:

  • After Decoder: Raw video data available
  • After StreamMux: Batch structure and frame metadata
  • After Infer: Detection results available
  • After Tracker: Complete object tracking data
  • After OSD: Final processed frames with overlays
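
A sketch of wiring one lightweight callback to several of these points, assuming the elements were given the names listed below when the pipeline was built (the names are assumptions):

from gi.repository import Gst

def stage_probe(pad, info, stage_name):
    buf = info.get_buffer()
    if buf is not None:
        print(f"[{stage_name}] pts={buf.pts}")
    return Gst.PadProbeReturn.OK

# `pipeline` is the Gst.Pipeline built earlier.
for name in ("decoder", "mux", "primary-infer", "tracker", "osd"):
    element = pipeline.get_by_name(name)
    if element is not None:
        element.get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, stage_probe, name)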

Key Implementation Considerations

  1. Probe Selection: Choose probe location based on required metadata
  2. CPU Conversion: Only convert to CPU when necessary for application processing
  3. Synchronization: Always access pixels and metadata from the same buffer
  4. Memory Management: GPU memory is more efficient for pipeline processing

Best Practices

Performance Optimization

  • Minimize CPU-GPU transfers: Keep data in GPU memory as long as possible
  • Batch processing: Use nvstreammux to process multiple streams efficiently
  • Memory pools: Leverage GStreamer's buffer pool system for consistent performance
  • Probe efficiency: Keep probe callbacks lightweight to avoid pipeline stalls

Debugging and Monitoring

  • Probe placement: Use probes strategically to inspect data at different pipeline stages
  • Metadata inspection: Access NvDsBatchMeta to verify detection and tracking results
  • Buffer analysis: Check buffer caps and timestamps for synchronization issues
  • Pipeline state: Monitor element states and error messages for troubleshooting

Common Pitfalls

  • Metadata loss: Avoid splitting data flow between separate branches
  • Memory leaks: Properly release buffers and metadata when using appsink
  • Synchronization errors: Always access pixels and metadata from the same buffer
  • Format mismatches: Ensure proper caps negotiation between elements