Dev.to
6/16/2026
Three Models, Zero API Calls: Real-Time Meeting Intelligence on Apple Silicon

Short summary

Thunder Kitty 1.9.0 adds real-time meeting intelligence (topic segmentation and agenda tracking) that runs entirely on-device, using three models and zero API calls: all-mpnet-base-v2 for embeddings, Apple Foundation Models for labeling, and Qwen 3.5 for summaries. Getting the sentence-embedding model onto Apple's Neural Engine exposed a CoreML conversion bug that silently drops position embeddings; the fix was to pre-compute the position information. The result is sub-20ms embeddings and offline-first meeting analysis.
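To make the "pre-compute position information" idea concrete, here is a minimal sketch in plain Python. It is not Thunder Kitty's code and not the CoreML or all-mpnet-base-v2 API: the function names, the tiny `pos_table`, and the fixed sequence length are all assumptions chosen for illustration. It only shows the general principle that a run-time position lookup can be replaced by a constant computed ahead of time, so nothing is lost if the converter drops the position input.

```python
def add_positions_runtime(token_vecs, pos_table):
    """Baseline: build position ids at run time, then look each one up."""
    position_ids = range(len(token_vecs))
    return [
        [t + p for t, p in zip(vec, pos_table[i])]
        for vec, i in zip(token_vecs, position_ids)
    ]


def bake_positions(pos_table, seq_len):
    """Do the lookup once, ahead of time, for a fixed sequence length."""
    return [list(row) for row in pos_table[:seq_len]]


def add_positions_baked(token_vecs, baked):
    """Same arithmetic, but the position rows are now a plain constant."""
    return [
        [t + p for t, p in zip(vec, row)]
        for vec, row in zip(token_vecs, baked)
    ]


# Hypothetical toy data: 8 positions, 2-dimensional vectors.
pos_table = [[0.01 * i, -0.01 * i] for i in range(8)]
token_vecs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

baked = bake_positions(pos_table, len(token_vecs))
runtime_out = add_positions_runtime(token_vecs, pos_table)
baked_out = add_positions_baked(token_vecs, baked)
```

Both paths perform the same additions, so `runtime_out` and `baked_out` match; the trade-off in a sketch like this is that the baked constant is tied to one sequence length.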

  • Thunder Kitty 1.9.0 ships on-device topic segmentation and agenda tracking in real time with zero API calls
  • Three-model architecture: sentence embeddings on the Neural Engine (5-20ms), Apple Foundation Models for labeling (200ms-2s), and Qwen 3.5 on the GPU for summaries
  • CoreML bug discovered: position_ids are silently dropped during model conversion; fixed by pre-computing all position information across attention layers
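The summary does not say how topic segmentation uses the sentence embeddings. One common approach, shown here purely as an assumption, is to compare each sentence's embedding with the previous one and mark a topic boundary when cosine similarity drops below a threshold. The function names, the threshold of 0.5, and the 2-dimensional toy vectors are hypothetical; real all-mpnet-base-v2 embeddings have 768 dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def topic_boundaries(embeddings, threshold=0.5):
    """Return each index i where sentence i looks like the start of a new topic."""
    return [
        i
        for i in range(1, len(embeddings))
        if cosine(embeddings[i - 1], embeddings[i]) < threshold
    ]


# Toy embeddings: the first two sentences point one way, the last two another.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
boundaries = topic_boundaries(embeddings)  # [2]
```

Here the only sharp drop in similarity is between the second and third sentences, so the sketch reports a single boundary at index 2. A fixed threshold is the simplest choice; a production system would likely smooth over a window of sentences instead.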
