Dev.to
6/16/2026

Three Models, Zero API Calls: Real-Time Meeting Intelligence on Apple Silicon
Short summary
Thunder Kitty 1.9.0 adds real-time meeting intelligence (topic segmentation and agenda tracking) that runs entirely on-device with zero API calls. It uses three models: all-mpnet-base-v2 for embeddings, Apple Foundation Models for labeling, and Qwen for summaries. Getting the sentence-embedding model onto Apple's Neural Engine revealed a CoreML conversion bug that silently drops position embeddings; the fix was to pre-compute the position information. The result: sub-20ms embeddings and offline-first meeting analysis.
- Thunder Kitty 1.9.0 ships on-device topic segmentation and agenda tracking in real time, with zero API calls
- Three-model architecture: sentence embeddings on the Neural Engine (5-20ms), Apple Foundation Models for labeling (200ms-2s), and Qwen 3.5 on the GPU for summaries
- Silent CoreML bug discovered: position_ids are dropped without warning during model conversion; fixed by pre-computing all position information across attention layers
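
To make the topic-segmentation idea concrete, here is a minimal sketch of one common way to segment a transcript from sentence embeddings: mark a boundary wherever the cosine similarity between consecutive utterances drops below a threshold. This is an assumption for illustration, not Thunder Kitty's confirmed algorithm. The function names, the 0.5 threshold, and the toy 3-dimensional vectors are all hypothetical; real all-mpnet-base-v2 embeddings have 768 dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def topic_boundaries(embeddings, threshold=0.5):
    """Return each index i where utterance i appears to start a new topic.

    Hypothetical rule: a boundary is any point where similarity to the
    previous utterance falls below `threshold` (an assumed value).
    """
    return [
        i
        for i in range(1, len(embeddings))
        if cosine(embeddings[i - 1], embeddings[i]) < threshold
    ]

# Toy stand-ins for sentence embeddings: two utterances on one topic,
# then two on another.
toy = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
]
boundaries = topic_boundaries(toy)  # the third utterance (index 2) starts a new topic
```

In a pipeline like the one summarized above, each detected segment would then be passed to the labeling model; a production version would likely smooth similarities over a window rather than compare single neighbors.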



