Building Smarter Camera Applications with esp-video

·4 mins·
ESP32-P4 ESP32-S3 ESP32-C6 Camera AI Video Multimedia
Author
Wang Yuxin
Embedded Software Engineer at Espressif
The esp-video component provides a solution for building camera applications on ESP32 series chips. This article introduces the esp-video component, explains how to use it, and gives an overview of the framework around it.

Overview

In recent years, the rapid evolution of IoT and AI technologies has significantly enhanced the sensing capabilities of connected devices. Camera sensors are typically used to capture images, providing devices with visual perception capabilities.

esp-video is a camera application development component provided by Espressif, designed to give developers a simple, efficient, and cost-effective solution for building vision applications. Developers can use the esp-video component to quickly drive various types of camera sensors. The key features of the component are:

[Figure: esp-video key features]

Use cases

The ESP32 chips can be connected to various camera sensors. The following applications have been implemented:

  • Image transmission: ESP32 series chips provide several connectivity options (Wi-Fi, BLE, Zigbee, Ethernet), enabling long-distance transmission of captured images and videos.

  • Segmented Capture: Integrated with ESP32’s Mesh network, this feature splits tasks between low-power devices (for detection) and high-power devices (for video capture).

  • AI Applications: ESP32-S and ESP32-P series of chips support on-device AI via esp-dl and esp-who, enabling features like face recognition, motion tracking, barcode scanning, and more. They also preprocess data (e.g., human detection) for cloud AI. Current applications include smart pet feeders, remote utility monitoring, automated grading, medical testing, and human presence detection.

    [Figure: camera usage scenarios]
  • Multi-Camera Systems:

    1. Dual-Focus Setup: Near-camera (close objects) + far-camera (distant objects).

    2. Color + Monochrome: Color imaging for detail, monochrome for enhanced night vision sensitivity.

    3. 360° Coverage: Directional modules (e.g., conference cameras) capture all participants around a table.

    [Figure: multi-camera applications]

    esp-video supports simultaneous connections to multiple camera sensors. The following figure shows the effect of running simple_video_server on the ESP32-P4-Function-EV-Board:

    [Figure: simple_video_server running on the ESP32-P4-Function-EV-Board]

esp-video and esp32-camera

esp32-camera is the first-generation camera application development component, which has a driver for image sensors compatible with the ESP32 series of chips. esp-video is an enhanced version of esp32-camera. The key differences are as follows:

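| Feature               | esp-video                  | esp32-camera       |
|-----------------------|----------------------------|--------------------|
| Supported Chips       | ESP32-S3, ESP32-P/C series | ESP32, ESP32-S2/S3 |
| Real-Time Performance | High frame rates           | Standard           |
| ISP Capability        | Built-in (ESP32-P Series)  | Not supported      |
| Video Encoding        | H.264/JPEG (high-speed)    | Not supported      |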

Key design principles

esp-video focuses on the following four core strengths:

  • User-Friendly Design: The component’s API aligns with the V4L2 (Video for Linux 2) standard, enabling developers to interact with cameras as easily as handling regular files—just call open() to get started.

  • High Performance: Optimized hardware-software integration and image processing algorithms (IPA) ensure fast startup, smooth preview, and responsive capture during photo operations.

  • Consistent Functionality: Across the supported chips (ESP32-S3, ESP32-P series, and ESP32-C series), the component provides unified device control through a standardized abstraction layer over the underlying interfaces. Camera sensors and system components such as ISPs, codecs, and VCMs can all be managed via ioctl(). As shown, common interfaces (MIPI-CSI, DVP, SPI, USB) share the same open() and ioctl() workflow.

    [Figure: supported common camera sensor interfaces]
  • Flexible Expansion: Developers can customize configurations for existing cameras or add new peripheral drivers and control commands to extend functionality.

Framework architecture

The entire camera application development framework follows a clear four-layer structure, and the esp-video component is located in the middleware layer. The main contents of each layer are:

  • Driver Layer: Low-level drivers for peripherals such as MIPI-CSI, DVP, SPI, I2C, I3C, ISP, JPEG, and H.264.

  • Device Layer: Abstracts hardware variations, reducing integration complexity while ensuring broad compatibility.

  • Middleware Layer: Routes commands to the correct device, executes them, and returns results to the application.

  • Application Layer: Offers a unified API to simplify development.

    [Figure: esp-video framework]
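
The routing role of the middleware layer can be illustrated with a toy model. The sketch below is not esp-video's implementation: every type, command code, and function name in it is hypothetical. It only shows the general pattern of looking up a device by name and forwarding a command to that device's handler table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical command codes; esp-video itself uses V4L2 VIDIOC_* requests. */
enum { CMD_GET_WIDTH = 1, CMD_SET_WIDTH = 2 };

struct video_device;

/* Device layer (toy model): every device exposes the same operation table. */
struct video_ops {
    int (*ioctl)(struct video_device *dev, int cmd, void *arg);
};

struct video_device {
    const char *name;            /* path the application would open */
    const struct video_ops *ops; /* device-specific handlers */
    int width;                   /* toy state standing in for hardware */
};

/* Handler for a made-up sensor device. */
static int sensor_ioctl(struct video_device *dev, int cmd, void *arg)
{
    switch (cmd) {
    case CMD_GET_WIDTH:
        *(int *)arg = dev->width;
        return 0;
    case CMD_SET_WIDTH:
        dev->width = *(int *)arg;
        return 0;
    default:
        return -1; /* unknown command */
    }
}

static const struct video_ops sensor_ops = { .ioctl = sensor_ioctl };

static struct video_device devices[] = {
    { "/dev/video0", &sensor_ops, 640 },
    { "/dev/video1", &sensor_ops, 320 },
};

/* Middleware layer (toy model): route the command to the matching device. */
int video_route_ioctl(const char *name, int cmd, void *arg)
{
    for (size_t i = 0; i < sizeof(devices) / sizeof(devices[0]); i++) {
        if (strcmp(devices[i].name, name) == 0) {
            return devices[i].ops->ioctl(&devices[i], cmd, arg);
        }
    }
    return -1; /* no such device */
}
```

Because the application only sees one entry point, a new device type can be added by registering another operation table without changing application code.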

Getting started

  1. Find supported camera sensors at esp_cam_sensor.
  2. Select a supported device from the list of supported video devices based on the hardware interface being used.
  3. Open the device using open().
  4. Retrieve image data with ioctl(fd, VIDIOC_DQBUF, ...).
  5. Return the buffer to the driver after use via ioctl(fd, VIDIOC_QBUF, ...), so that it can be filled with the next frame.

A simple example of capturing image data from a camera sensor is capture_stream. Explore more examples at esp-video/examples.

[Figure: esp-video code example]

Final thoughts

Looking for a quick, reliable way to build camera applications? esp-video combines an intuitive API with strong community backing—ideal for accelerating your project.

Feel free to use esp-video and share your feedback with us!

Contact: sales@espressif.com for project evaluation or development support.
