Dev.to
6/18/2026

I Ran Five Small Multimodal Models on a Jetson. The Fastest One Was Not the Best Baseline.
Short summary
The author benchmarked five small multimodal models (Gemma, Qwen, SmolVLM, InternVL, Qwen-Omni) on an NVIDIA Jetson for industrial edge AI. SmolVLM was the fastest (12.8s) but lacked the grounding needed for factory guidance. Gemma 4 E2B became the baseline: it balances speed (37.5s) with structured action-card generation and audit-trail compliance for maintenance, quality, and work-instruction workflows.
- Fastest model (SmolVLM2-2.2B, 12.8s latency) produced generic answers unsuitable for structured industrial guidance
- Gemma 4 E2B selected as baseline despite not being the fastest; it excels at audit trails, action boundaries, and workflow integration
- Model selection framework: local deployment, deterministic guards, and audit compliance matter more than leaderboard speed
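
To make "deterministic guards" and "audit trail" concrete, here is a minimal sketch of how a model's action-card output might be checked before anything acts on it. This is not the author's implementation: the `ActionCard` fields, the `ALLOWED_ACTIONS` whitelist, and the `guard` function are all hypothetical names chosen for illustration, and the sketch assumes the model is prompted to return a JSON object.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical action boundary: the only actions a card may request.
ALLOWED_ACTIONS = {"inspect", "escalate", "log_only"}


@dataclass
class ActionCard:
    # Hypothetical schema; the article's real card format is not shown here.
    workflow: str
    action: str
    rationale: str


def guard(raw_model_output: str, audit_log: list) -> Optional[ActionCard]:
    """Accept a card only if it parses and stays inside the allowed actions.

    Every decision, accepted or rejected, is appended to audit_log.
    """
    try:
        card = ActionCard(**json.loads(raw_model_output))
    except (json.JSONDecodeError, TypeError):
        # Not JSON, not an object, or missing/extra fields.
        audit_log.append(
            {"accepted": False, "reason": "malformed", "raw": raw_model_output}
        )
        return None

    if not isinstance(card.action, str) or card.action not in ALLOWED_ACTIONS:
        audit_log.append(
            {"accepted": False, "reason": "action_not_allowed", "card": asdict(card)}
        )
        return None

    audit_log.append({"accepted": True, "card": asdict(card)})
    return card
```

The check is deterministic in the sense that the same model output always yields the same decision, independent of which model produced it. Under this kind of design, a slower model that reliably emits valid cards can be a better baseline than a faster one whose generic answers would mostly be rejected.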
Generated with AI, which can make mistakes.



