RECORD_ID: prj_01 STATUS: COMPLETED
[SUBJECT_MATTER]

edge ai on stm32 — comparing stm32n6570 and stm32mp257f

Comparative Edge AI project focused on deploying face and gesture recognition pipelines on STM32N6570-DK and STM32MP257F-DK2, with attention to latency, deployment complexity, and embedded constraints.

TIMEFRAME 2026
TECHNICAL_STACK
STM32N6570-DK · STM32MP257F-DK2 · Python · TensorFlow / Keras · OpenCV · ONNX Runtime · STM32CubeIDE · X-CUBE-AI · ST Edge AI Developer Cloud · FreeRTOS · OpenSTLinux

system_log // edge_ai_stm32

This project was a Master 2 academic team project focused on evaluating embedded AI deployment on two recent STMicroelectronics platforms: the STM32N6570-DK microcontroller and the STM32MP257F-DK2 microprocessor.

The goal was to compare two Edge AI execution strategies on real hardware through computer vision applications, while measuring practical constraints such as latency, runtime behavior, memory pressure, integration effort, and overall robustness.

mission_scope

We worked on two main application families:

  • face detection and face recognition
  • hand gesture recognition from 0 to 5 fingers

The project was designed as a comparative study between:

  • an MCU-oriented approach with strong embedded constraints and NPU acceleration,
  • an MPU-oriented approach offering more software flexibility through embedded Linux.

The initial specification targeted strong functional performance, including face detection precision above 90% and latency below 200 ms.
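These acceptance criteria translate into a simple pass/fail check during benchmarking. A minimal sketch, with thresholds taken from the specification above and a hypothetical helper name:

```python
# Acceptance thresholds from the project specification.
MIN_PRECISION = 0.90    # face detection precision above 90 %
MAX_LATENCY_MS = 200.0  # end-to-end latency below 200 ms

def meets_spec(precision: float, latency_ms: float) -> bool:
    """Return True when a measured run satisfies both targets."""
    return precision > MIN_PRECISION and latency_ms < MAX_LATENCY_MS
```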

hardware_targets

stm32n6570-dk

The STM32N6570-DK is built around an Arm Cortex-M55 running at 800 MHz and integrates ST’s Neural-ART accelerator. It provides 4.2 MB of internal SRAM and targets constrained, low-power, real-time embedded execution.

This platform represents the microcontroller side of Edge AI: tighter memory, lower-level integration, and stronger emphasis on optimization.

stm32mp257f-dk2

The STM32MP257F-DK2 follows a different philosophy. It combines a Linux-capable MPU architecture with a more flexible software environment, external memory support, and a more powerful execution context for AI workloads.

This platform represents the microprocessor side of Edge AI: richer software tooling, easier high-level development, but also greater system complexity.

software_pipeline

The project relied on ST’s embedded AI ecosystem to move models from training environments to target hardware.

For the MCU workflow, the main toolchain included:

  • STM32CubeMX
  • STM32CubeIDE
  • STM32CubeProgrammer
  • X-CUBE-AI / ST Edge AI tools

For the MPU workflow, the project used:

  • OpenSTLinux
  • Python
  • OpenCV
  • ONNX Runtime
  • VSINPUExecutionProvider for NPU acceleration when supported
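On the MPU side, provider selection decides whether inference runs on the NPU or falls back to the CPU. A minimal sketch of that fallback logic (the helper name is hypothetical; the provider strings are the ones ONNX Runtime reports):

```python
def select_providers(available):
    """Prefer the VeriSilicon NPU provider, fall back to CPU.

    `available` is the list returned by
    onnxruntime.get_available_providers() on the target.
    """
    preferred = ["VSINPUExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]
```

A session would then be created with `onnxruntime.InferenceSession(model_path, providers=select_providers(onnxruntime.get_available_providers()))`.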

The deployment pipeline followed the usual sequence:

  1. train or select a model,
  2. export it in a compatible format,
  3. quantize and optimize it,
  4. integrate it on target,
  5. benchmark inference and validate behavior in real conditions.
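Step 5 above can be sketched as a simple latency benchmark; `run_inference` is a hypothetical stand-in for the actual model call on either target:

```python
import time

def benchmark(run_inference, frame, warmup=5, iterations=50):
    """Average per-inference latency in milliseconds.

    A few warm-up calls are discarded so one-time costs
    (allocation, cache fills) do not skew the average.
    """
    for _ in range(warmup):
        run_inference(frame)
    start = time.perf_counter()
    for _ in range(iterations):
        run_inference(frame)
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1000.0
```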

model_choices

face_applications

For face detection on the MPU, the selected model was BlazeFace, chosen for its lightweight architecture and real-time performance under Linux.

On the MCU, the implementation used two models:

  • CenterFace for face detection,
  • MobileFaceNet for face recognition.

This MCU pipeline performed face detection first, then identity matching by comparing facial embeddings against a reference database using cosine similarity.
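The matching step compares a live embedding against stored reference embeddings. A minimal pure-Python sketch (names and threshold value are illustrative, not the project's actual code):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(embedding, reference_base, threshold=0.6):
    """Return the best-matching identity, or None if no score beats the threshold."""
    best_name, best_score = None, threshold
    for name, ref in reference_base.items():
        score = cosine_similarity(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```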

gesture_applications

Gesture recognition was intentionally implemented with two different strategies.

On the MPU, the team used a direct image classification approach based on MobileNetV2 adapted to 6 classes corresponding to 0 to 5 raised fingers.
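Because each class maps directly to a finger count, post-processing on the MPU reduces to an argmax over the six output scores. A minimal sketch (function name hypothetical):

```python
def finger_count_from_scores(scores):
    """Map the 6-way classifier output to a finger count (0-5)."""
    assert len(scores) == 6, "expected one score per class"
    return max(range(6), key=lambda i: scores[i])
```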

On the MCU, the approach was a hybrid, multi-stage pipeline:

  • palm detection,
  • extraction of 21 hand landmarks,
  • geometric post-processing to infer the final finger count.
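The geometric post-processing step can be sketched with a simple distance heuristic: a finger counts as raised when its tip lies farther from the wrist than its middle joint. Indices below follow the common 21-point hand landmark convention (wrist at 0, fingertips at 4, 8, 12, 16, 20); the exact heuristic used in the project may differ:

```python
import math

# (tip, middle-joint) landmark index pairs: thumb, index, middle, ring, pinky
FINGERS = [(4, 3), (8, 6), (12, 10), (16, 14), (20, 18)]
WRIST = 0

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def count_raised_fingers(landmarks):
    """Count raised fingers from 21 (x, y) hand landmarks."""
    wrist = landmarks[WRIST]
    return sum(
        1
        for tip, joint in FINGERS
        if dist(landmarks[tip], wrist) > dist(landmarks[joint], wrist)
    )
```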

This made the comparison more interesting because it was not only a hardware comparison, but also a comparison of algorithmic strategies for embedded gesture understanding.

implementation_notes

One of the most valuable parts of the project was understanding how different embedded environments change the deployment experience.

On the STM32MP257F-DK2, Linux made development more modular and easier to debug. Camera access, preprocessing, ONNX model execution, and display logic could be managed in Python with a relatively comfortable software stack.

On the STM32N6570-DK, integration was much lower level. The project required tighter control over buffers, tensor sizes, peripheral configuration, and the full inference loop. Camera acquisition, resizing, inference, and display all had to be organized inside a much more constrained embedded pipeline.

measured_results

  platform           application                    estimated inference   real fps   qualitative validation
  STM32MP257F-DK2    Face detection                 25 ms                 18         Very good
  STM32MP257F-DK2    Gesture recognition            33 ms                 22         Medium
  STM32N6570-DK      Face detection + recognition   129 ms                6          Very good
  STM32N6570-DK      Gesture detection              15 ms                 30         Very good
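Inference time alone bounds the achievable frame rate; measured FPS sits below that bound because capture, preprocessing, and display share the loop. A one-line sketch of the bound:

```python
def max_fps(inference_ms):
    """Theoretical frame-rate ceiling if inference were the only cost."""
    return 1000.0 / inference_ms
```

For the MCU face pipeline, 129 ms gives a ceiling of roughly 7.8 FPS, consistent with the measured 6 FPS once camera and display overhead are included.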

The MPU delivered fluid real-time execution and was especially effective for face detection.

The MCU delivered the best gesture pipeline in the project, reaching 30 FPS with strong robustness and proving that optimized AI workloads can run effectively on a constrained microcontroller.

constraints_and_lessons

The project also revealed that software complexity matters as much as raw compute power.

Even though the MPU offers a richer environment, OpenSTLinux setup and deployment complexity consumed a significant part of the project effort. In contrast, the MCU required more low-level engineering but gave a more convincing result for some optimized embedded AI tasks.

Another important lesson came from the gesture model on the MPU. Because the dataset was collected in controlled conditions, the model generalized less well when lighting, background, or orientation changed.

The project originally included an audio module, but that part was dropped during execution in order to secure the vision deliverables and maintain the project timeline.

project_outcome

This project gave me practical experience across the full embedded AI chain:

  • dataset preparation,
  • model training and export,
  • quantization,
  • deployment on embedded targets,
  • benchmark analysis,
  • and system-level trade-offs between MCU and MPU platforms.

More importantly, it helped me understand that successful Edge AI is not only about model accuracy. It also depends on integration cost, runtime constraints, memory limits, tooling maturity, and deployment realism.
