edge ai on stm32 — comparing stm32n6570 and stm32mp257f
Comparative Edge AI project focused on deploying face and gesture recognition pipelines on STM32N6570-DK and STM32MP257F-DK2, with attention to latency, deployment complexity, and embedded constraints.
system_log // edge_ai_stm32
This project was a Master 2 academic team project focused on evaluating embedded AI deployment on two recent STMicroelectronics platforms: the STM32N6570-DK microcontroller and the STM32MP257F-DK2 microprocessor.
The goal was to compare two Edge AI execution strategies on real hardware through computer vision applications, while measuring practical constraints such as latency, runtime behavior, memory pressure, integration effort, and overall robustness.
mission_scope
We worked on two main application families:
- face detection and face recognition
- hand gesture recognition, counting 0 to 5 raised fingers
The project was designed as a comparative study between:
- an MCU-oriented approach with strong embedded constraints and NPU acceleration,
- an MPU-oriented approach offering more software flexibility through embedded Linux.
The initial specification targeted strong functional performance, including face detection precision above 90% and latency below 200 ms.
hardware_targets
stm32n6570-dk
The STM32N6570-DK is built around an Arm Cortex-M55 running at 800 MHz and integrates ST’s Neural-ART accelerator. It provides 4.2 MB of internal SRAM and targets constrained, low-power, real-time embedded execution.
This platform represents the microcontroller side of Edge AI: tighter memory, lower-level integration, and stronger emphasis on optimization.
stm32mp257f-dk2
The STM32MP257F-DK2 follows a different philosophy. It combines a Linux-capable MPU architecture with a more flexible software environment, external memory support, and a more powerful execution context for AI workloads.
This platform represents the microprocessor side of Edge AI: richer software tooling, easier high-level development, but also greater system complexity.
software_pipeline
The project relied on ST’s embedded AI ecosystem to move models from training environments to target hardware.
For the MCU workflow, the main toolchain included:
- STM32CubeMX
- STM32CubeIDE
- STM32CubeProgrammer
- X-CUBE-AI / ST Edge AI tools
For the MPU workflow, the project used:
- OpenSTLinux
- Python
- OpenCV
- ONNX Runtime
- VSINPUExecutionProvider for NPU acceleration when supported
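The provider list passed to ONNX Runtime determines whether inference runs on the NPU or silently falls back to CPU. The helper below is a hypothetical sketch (`select_providers` is not from the project code); only the provider names are real ONNX Runtime identifiers:

```python
def select_providers(available):
    """Prefer the VeriSilicon NPU provider, falling back to CPU execution."""
    preferred = ["VSINPUExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]

# On target, this would be used roughly as follows (not executed here):
# import onnxruntime as ort
# session = ort.InferenceSession(
#     "model.onnx",
#     providers=select_providers(ort.get_available_providers()),
# )
```

Logging the providers actually selected at startup is a cheap way to catch the silent-CPU-fallback case during benchmarking.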
The deployment pipeline followed the usual sequence:
- train or select a model,
- export it in a compatible format,
- quantize and optimize it,
- integrate it on target,
- benchmark inference and validate behavior in real conditions.
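The quantization step in this sequence typically reduces weights to 8-bit integers. As an illustration of the underlying arithmetic (a generic sketch, not the exact scheme applied by ST's tools), symmetric per-tensor int8 quantization works like this:

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric per-tensor quantization: map [-max|w|, +max|w|] onto [-127, 127].
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale
```

The reconstruction error is bounded by half a quantization step, which is why well-conditioned vision models usually tolerate int8 deployment with little accuracy loss.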
model_choices
face_applications
For face detection on the MPU, the selected model was BlazeFace, chosen for its lightweight architecture suited to real-time detection under Linux.
On the MCU, the implementation used two models:
- CenterFace for face detection,
- MobileFaceNet for face recognition.
This MCU pipeline ran face detection first, then performed identity matching by comparing facial embeddings against a reference database with cosine similarity.
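The matching step can be sketched as follows. This is an illustrative reimplementation with hypothetical names (`identify`, `reference_db`), not the project's actual code; it assumes comparable embedding vectors and an accept threshold tuned on a validation set:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(embedding, reference_db, threshold=0.6):
    # reference_db maps identity name -> enrolled embedding vector.
    best_name, best_score = "unknown", -1.0
    for name, ref in reference_db.items():
        score = cosine_similarity(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return "unknown", best_score
    return best_name, best_score
```

The threshold trades false accepts against false rejects, so it matters as much as the embedding model itself.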
gesture_applications
Gesture recognition was intentionally implemented with two different strategies.
On the MPU, the team used a direct image classification approach based on MobileNetV2 adapted to 6 classes corresponding to 0 to 5 raised fingers.
On the MCU, the approach was more hybrid:
- palm detection,
- 21 hand landmarks extraction,
- geometric post-processing to infer the final finger count.
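The geometric post-processing step can be sketched like this, assuming MediaPipe-style 21-landmark indexing (wrist = 0, fingertips = 4, 8, 12, 16, 20) in image coordinates where y grows downward; the exact rules used in the project may differ:

```python
# Tip / PIP-joint index pairs for the four non-thumb fingers (MediaPipe indexing).
FINGER_PAIRS = [(8, 6), (12, 10), (16, 14), (20, 18)]

def count_fingers(landmarks, right_hand=True):
    """landmarks: list of 21 (x, y) points in normalized image coordinates."""
    count = 0
    # A finger is "raised" when its tip sits above its PIP joint (smaller y).
    for tip, pip in FINGER_PAIRS:
        if landmarks[tip][1] < landmarks[pip][1]:
            count += 1
    # The thumb extends sideways, so compare x of the tip against the IP joint.
    tip_x, ip_x = landmarks[4][0], landmarks[3][0]
    if (tip_x < ip_x) if right_hand else (tip_x > ip_x):
        count += 1
    return count
```

Rules like these assume a roughly upright hand, which is part of why landmark-based counting is robust per detection yet sensitive to hand orientation.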
This made the study richer: it compared not only two hardware targets but also two algorithmic strategies for embedded gesture understanding.
implementation_notes
One of the most valuable parts of the project was understanding how different embedded environments change the deployment experience.
On the STM32MP257F-DK2, Linux made development more modular and easier to debug. Camera access, preprocessing, ONNX model execution, and display logic could be managed in Python with a relatively comfortable software stack.
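As an illustration of what the Python-side preprocessing typically looks like (a generic sketch, not the project's exact code), the frame from OpenCV arrives as uint8 BGR in HWC layout and must be resized, rescaled, and transposed into the NCHW tensor an ONNX model expects:

```python
import numpy as np

def preprocess(frame, size=(128, 128)):
    """frame: HxWx3 uint8 BGR array (as returned by cv2.VideoCapture.read())."""
    h, w = frame.shape[:2]
    # Nearest-neighbour resize by index sampling (cv2.resize would be used in practice).
    ys = np.linspace(0, h - 1, size[1]).astype(int)
    xs = np.linspace(0, w - 1, size[0]).astype(int)
    resized = frame[ys][:, xs]
    # BGR -> RGB, scale to [0, 1], then HWC -> NCHW with a batch dimension.
    rgb = resized[..., ::-1].astype(np.float32) / 255.0
    return np.transpose(rgb, (2, 0, 1))[None, ...]
```

Input size, channel order, and normalization constants all depend on the exported model, so they have to be checked against the ONNX graph rather than assumed.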
On the STM32N6570-DK, integration was much lower level. The project required tighter control over buffers, tensor sizes, peripheral configuration, and the full inference loop. Camera acquisition, resizing, inference, and display all had to be organized inside a much more constrained embedded pipeline.
measured_results
| platform | application | estimated inference latency | measured FPS | qualitative validation |
|---|---|---|---|---|
| STM32MP257F-DK2 | Face detection | 25 ms | 18 | Very good |
| STM32MP257F-DK2 | Gesture recognition | 33 ms | 22 | Medium |
| STM32N6570-DK | Face detection + recognition | 129 ms | 6 | Very good |
| STM32N6570-DK | Gesture detection | 15 ms | 30 | Very good |
The MPU delivered fluid real-time execution and was especially effective for face detection.
The MCU delivered the best gesture pipeline in the project, running robustly at 30 FPS and proving that optimized AI workloads can execute effectively on a constrained microcontroller.
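One way to read these numbers: measured FPS reflects the whole loop (capture, preprocessing, inference, display), so the non-inference overhead per frame can be estimated directly from the table. A small sanity-check helper (hypothetical, using the figures above):

```python
def overhead_ms(inference_ms, measured_fps):
    # Total frame budget (1000 / FPS) minus inference time = everything else in the loop.
    return 1000.0 / measured_fps - inference_ms

# e.g. MPU face detection: 1000 / 18 - 25, roughly 30.6 ms per frame outside inference
```

For the MCU gesture pipeline the same arithmetic gives about 18 ms of non-inference work per frame, which shows that capture and display, not the model, set the 30 FPS ceiling.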
constraints_and_lessons
The project also revealed that software complexity matters as much as raw compute power.
Even though the MPU offers a richer environment, OpenSTLinux setup and deployment complexity consumed a significant part of the project effort. In contrast, the MCU required more low-level engineering but gave a more convincing result for some optimized embedded AI tasks.
Another important lesson came from the gesture model on the MPU. Because the dataset was collected in controlled conditions, the model generalized less well when lighting, background, or orientation changed.
The project originally included an audio module, but that part was dropped during execution in order to secure the vision deliverables and maintain the project timeline.
project_outcome
This project gave me practical experience across the full embedded AI chain:
- dataset preparation,
- model training and export,
- quantization,
- deployment on embedded targets,
- benchmark analysis,
- and system-level trade-offs between MCU and MPU platforms.
More importantly, it helped me understand that successful Edge AI is not only about model accuracy. It also depends on integration cost, runtime constraints, memory limits, tooling maturity, and deployment realism.