arXiv cs.CL
5/12/2026

jina-embeddings-v5-omni: Geometry-preserving Embeddings via Locked Aligned Towers
Short summary
Jina AI introduces GELATO, a method that builds multimodal embedding models on frozen backbones and trains only 0.35% of the weights (the connector layers). The jina-embeddings-v5-omni suite embeds text, image, audio, and video in a single semantic space while maintaining exact backward compatibility with the v5 Text models. Benchmarks show performance on par with larger systems, so multimodal applications can be deployed without retraining the core components.
- GELATO method freezes backbone models and trains only 0.35% of weights (connector layers)
- jina-embeddings-v5-omni supports text, image, audio, and video in unified embedding space
- Achieves competitive performance with larger multimodal models at dramatically lower training cost
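
The frozen-backbone idea in the points above can be illustrated with a toy sketch. This is not the paper's implementation: the linear "towers", the tiny dimensions, the squared-distance loss, and every name below (`text_tower`, `image_encoder`, `connector`, `train_step`) are assumptions made for illustration. It shows only the general pattern the summary describes: the backbones stay fixed, a small connector is the only thing updated, and text embeddings are therefore identical before and after training.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)

# Hypothetical toy sizes; real models are far larger.
TEXT_IN, IMG_IN, IMG_HID, EMB = 6, 8, 5, 4

# Frozen components: never written to during training.
text_tower = rand_matrix(EMB, TEXT_IN, rng)        # stand-in for the text model
image_encoder = rand_matrix(IMG_HID, IMG_IN, rng)  # stand-in for a frozen image encoder

# Trainable connector: maps image features into the text embedding space.
connector = rand_matrix(EMB, IMG_HID, rng)

def embed_text(x):
    return matvec(text_tower, x)

def embed_image(x):
    return matvec(connector, matvec(image_encoder, x))

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train_step(text_x, image_x, lr=0.25):
    """One normalized gradient step on ||embed_image - embed_text||^2.

    Only `connector` is modified; both towers are left untouched.
    """
    target = embed_text(text_x)
    feats = matvec(image_encoder, image_x)
    pred = matvec(connector, feats)
    norm2 = sum(f * f for f in feats) or 1.0  # avoid division by zero
    for i in range(EMB):
        err = 2.0 * (pred[i] - target[i])
        for j in range(IMG_HID):
            connector[i][j] -= (lr / norm2) * err * feats[j]

# A single made-up (text, image) pair that should land at the same point.
text_x = [rng.uniform(-1, 1) for _ in range(TEXT_IN)]
image_x = [rng.uniform(-1, 1) for _ in range(IMG_IN)]

text_before = embed_text(text_x)
loss_before = sq_dist(embed_image(image_x), text_before)
for _ in range(20):
    train_step(text_x, image_x)
loss_after = sq_dist(embed_image(image_x), text_before)
text_after = embed_text(text_x)

# Share of parameters that were trainable in this toy setup.
trainable = EMB * IMG_HID
total = trainable + EMB * TEXT_IN + IMG_HID * IMG_IN
trainable_fraction = trainable / total

print("text embeddings unchanged:", text_after == text_before)
print("alignment loss decreased:", loss_after < loss_before)
print("trainable fraction:", round(trainable_fraction, 3))
```

In this toy the connector is a large share of all parameters because the towers are tiny. The 0.35% figure in the summary refers to the actual models, where the frozen backbones dominate the parameter count.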