Three Models, Zero API Calls: Real-Time Meeting Intelligence on Apple Silicon

Short summary

Thunder Kitty 1.9.0 adds real-time meeting intelligence (topic segmentation and agenda tracking) that runs entirely on-device with zero API calls. It uses three models: all-mpnet-base-v2 for embeddings, Apple Foundation Models for labeling, and Qwen for summaries. Getting the sentence-embedding model onto Apple's Neural Engine revealed a silent CoreML bug that drops position embeddings during conversion; the fix was to pre-compute the position information. The result is sub-20ms embeddings and offline-first meeting analysis.

• Thunder Kitty 1.9.0 ships on-device topic segmentation and agenda tracking in real time with zero API calls
• Three-model architecture: sentence embeddings on the Neural Engine (5-20ms), Apple Foundation Models for labeling (200ms-2s), and Qwen 3.5 on the GPU for summaries
• Silent CoreML bug discovered: position_ids were dropped without warning during model conversion; fixed by pre-computing all position information across attention layers
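
To make the role of the sentence embeddings concrete, here is a minimal sketch of one common way to segment topics from them: compare each utterance's embedding with the previous one and mark a boundary when the cosine similarity drops. The summary does not say how Thunder Kitty actually segments topics, so the function names, the threshold of 0.5, and the toy 2-dimensional vectors are all illustrative assumptions (real all-mpnet-base-v2 embeddings have 768 dimensions).

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_boundaries(embeddings, threshold=0.5):
    """Return indices where a new topic is assumed to start.

    Hypothetical helper: index i is a boundary when utterance i is less
    similar than `threshold` to utterance i - 1. The threshold is an
    assumed value that would need tuning on real transcripts.
    """
    boundaries = []
    for i in range(1, len(embeddings)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            boundaries.append(i)
    return boundaries


# Toy stand-ins for sentence embeddings: two utterances on one topic,
# then two utterances on a different topic.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(topic_boundaries(toy_embeddings))  # prints [2]
```

In a pipeline like the one described, each detected segment could then be handed to the labeling and summary models; that hand-off is likewise an assumption, not something the summary spells out.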

Generated with AI, which can make mistakes.

#ai-tools #product-launch #research-breakthrough #open-source

Read full article at Dev.to
