Low-latency on-device gesture recognition and data capture using ESP32-S3 and k-NN.
GestureRecog is an embedded system designed to recognize hand gestures using flex sensors. It features on-device training, data storage, and real-time inference, all while maintaining extreme portability and low power consumption.
- Framework: ESP-IDF v5.x (C-based).
- DSP Engine: ESP-DSP for SIMD-accelerated k-Nearest Neighbors (k-NN) inference.
- Storage: SPIFFS (Serial Peripheral Interface Flash File System) for persistent on-device dataset storage.
- OS: FreeRTOS for multi-tasking and deterministic sampling.
- Framework: Next.js 15 (React 19).
- Styling: Tailwind CSS 4.0.
- Visualization: Chart.js with
react-chartjs-2for real-time sensor telemetry. - Connectivity: Web Serial API (via
web-serial-polyfill) for browser-to-hardware communication (supports Android/Chrome).
The project uses the internal ADC1 unit. The default mapping for the 5-finger flex sensors is:
| Sensor | Finger | ESP32-S3 Pin | ADC Channel |
|---|---|---|---|
| Channel 0 | Thumb | GPIO 1 | ADC1_CH0 |
| Channel 1 | Index | GPIO 2 | ADC1_CH1 |
| Channel 2 | Middle | GPIO 3 | ADC1_CH2 |
| Channel 3 | Ring | GPIO 4 | ADC1_CH3 |
| Channel 4 | Pinky | GPIO 5 | ADC1_CH4 |
- Baud Rate: 115200 (via USB-Serial/JTAG).
- Sampling Frequency: 10Hz (10 samples per second).
- What: The ESP32-S3 samples the 5 flex sensors via ADC1.
-
Processing: An Exponential Moving Average (EMA) filter is applied to each channel with a smoothing factor (
$\alpha$ ) of 0.2. - Why: Flex sensors are susceptible to high-frequency electrical noise and mechanical vibration. EMA ensures a smooth signal without introducing significant latency.
- What: Tracks the minimum and maximum ADC values ever seen for each finger during the "Calibrate" mode.
- Processing: Raw 12-bit ADC values (0-4095) are mapped to a floating-point range (0.0 to 1.0).
- Why: Every hand is different, and flex sensors vary in resistance. Normalization ensures that "fully closed" is always 1.0 and "fully open" is always 0.0, making the gesture recognition logic universal across users.
- What: The system maintains a rolling buffer of the last 32 samples (approx. 3.2 seconds) for each of the 5 channels.
-
Processing: This creates a feature vector of 160 elements (
$5 \text{ channels} \times 32 \text{ samples}$ ). - Why: Static finger positions aren't enough for robust gesture recognition; the temporal "signature" of how a gesture is held provides much higher accuracy.
- What: When a user triggers the
storecommand (via the web UI), the current 160-element vector is appended to/spiffs/dataset.binwith a custom label (e.g., "Peace"). - Why: Storing data directly on the device allows it to function as a standalone peripheral. It doesn't need a cloud or PC to "remember" gestures after a reboot.
- What: Every 1 second (during Infer Mode), the live buffer is compared against every record in the SPIFFS dataset.
- Processing:
- Uses Euclidean Distance to find the closest match.
- Optimized with ESP-DSP SIMD instructions (
dsps_sub_f32anddsps_dotprod_f32).
- Why: k-Nearest Neighbors is ideal for this scale. It requires no complex backpropagation or training phase. By using S3 SIMD instructions, we can compare dozens of gestures in under 1ms, ensuring zero-lag predictions.
- What: Monitors the variance in the buffer. If the difference between min and max in a window is below a threshold (e.g., 100), the device marks itself as "Disabled."
- Why: Prevents "phantom" predictions when the glove is sitting still or not being worn.
The ESP32 communicates with the web dashboard via a custom telemetry protocol over Serial:
>r[idx]:[val]: Raw ADC value for a specific channel.>[idx]:[val]: Mapped (0-100) value for the UI progress bars.>max[idx]:[val]/>min[idx]:[val]: Current calibration bounds.>mode:[MODE]: Current operation state (TRAIN, INFER, CALIBRATE).Best Match: [Label] (Dist: 0.00): Prediction results for the dashboard bubble.
- Hardware: Connect 5 flex sensors to GPIOs 1-5 of an ESP32-S3.
- Build: Use
idf.py buildin theesp/directory. - Flash: Use
idf.py flash monitor. - Dashboard: Open the
frontend/and runbun dev(ornpm run dev) to access the Web Serial interface.