arXiv cs.LG
6/19/2026

Closing the Social-Semantic Gap: SPSD for Edge-Based Prompt Compression in Cloud LLM Inference
Short summary
Researchers present SPSD, a technique that compresses user prompts on-device by removing social scaffolding (politeness, repetition) before they are sent to cloud LLMs, saving about 100 tokens per call. In tests on 248 prompts with Gemma-2 and Llama-3.1, response quality was non-inferior to the uncompressed baseline, with an estimated energy saving of 70–270 µWh per call.
- SPSD uses an on-device small language model to strip social scaffolding before cloud LLM processing
- Average savings of 99.9 tokens per call, with response quality within a 1-point margin on a 15-point scale
- Estimated energy savings of 70–270 µWh per call; safety-critical domains are routed via rule-based gates
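
The flow described in the points above (gate first, then strip scaffolding) can be sketched roughly as follows. This is only an illustration under loud assumptions: the summary says SPSD uses an on-device small language model, whereas this sketch substitutes a handful of regular expressions as a stand-in, and the keyword list, patterns, and function names (`is_safety_critical`, `strip_scaffolding`, `prepare_prompt`) are all hypothetical rather than taken from the paper. It also assumes the gate passes safety-critical prompts through unchanged, which the summary does not specify.

```python
import re

# Hypothetical keyword list standing in for the rule-based safety gate.
SAFETY_KEYWORDS = ("overdose", "suicide", "chest pain", "dosage")

# Hypothetical politeness patterns. The paper uses a small language model
# for this step; regexes are used here only to keep the sketch runnable.
SCAFFOLDING_PATTERNS = [
    r"\b(hi|hello|hey)( there)?\b[,!.]?",
    r"\bplease\b,?",
    r"\bthank(s| you)( so much| in advance)?\b[,!.]?",
    r"\bcould you( kindly)?\b",
]


def is_safety_critical(prompt: str) -> bool:
    """Return True if the prompt mentions any gated keyword."""
    lowered = prompt.lower()
    return any(keyword in lowered for keyword in SAFETY_KEYWORDS)


def strip_scaffolding(prompt: str) -> str:
    """Remove politeness phrases and collapse leftover whitespace."""
    text = prompt
    for pattern in SCAFFOLDING_PATTERNS:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()


def prepare_prompt(prompt: str) -> str:
    """Pass gated prompts through unchanged; compress everything else."""
    if is_safety_critical(prompt):
        return prompt
    return strip_scaffolding(prompt)


if __name__ == "__main__":
    print(prepare_prompt("Hello! Could you please summarize this article? Thanks so much!"))
    print(prepare_prompt("Please tell me the safe dosage of ibuprofen"))
```

Running the gate before compression means a prompt in a gated domain is never altered, so any risk introduced by the compression step is confined to prompts the rules consider low-stakes.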