arXiv cs.LG
6/19/2026
Closing the Social-Semantic Gap: SPSD for Edge-Based Prompt Compression in Cloud LLM Inference

Short summary

Researchers present SPSD, a technique that compresses user prompts on-device by removing social scaffolding (politeness, repetition) before they are sent to cloud LLMs, saving about 100 tokens per call. Tests on 248 prompts with Gemma-2 and Llama-3.1 show non-inferior response quality and estimated energy savings of 70–270 µWh per call. Edge-based prompt distillation thus reduces inference cost while preserving output quality.
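To put the per-call figures in perspective, the arithmetic below scales them to a larger workload. It is a back-of-the-envelope sketch only: the token and energy numbers come from the summary, while the call volume, the function names, and the assumption that savings scale linearly with call count are illustrative and not taken from the paper.

```python
# Figures quoted in the summary; everything else here is a hypothetical input.
TOKENS_SAVED_PER_CALL = 99.9          # average tokens removed per call
ENERGY_SAVED_UWH = (70.0, 270.0)      # low/high estimate, microwatt-hours per call


def per_token_uwh(energy_uwh: float) -> float:
    """Implied energy saving per removed token, in microwatt-hours."""
    return energy_uwh / TOKENS_SAVED_PER_CALL


def total_savings_wh(calls: int, energy_uwh: float) -> float:
    """Total saving in watt-hours, assuming linear scaling (1 Wh = 1e6 µWh)."""
    return calls * energy_uwh / 1_000_000


if __name__ == "__main__":
    calls = 1_000_000  # hypothetical workload
    for estimate in ENERGY_SAVED_UWH:
        print(
            f"{estimate:.0f} µWh/call: "
            f"{per_token_uwh(estimate):.2f} µWh per token, "
            f"{total_savings_wh(calls, estimate):.0f} Wh over {calls:,} calls"
        )
```

Under these assumptions, a million calls would save roughly 70 to 270 Wh in total, which suggests the benefit matters mainly at very high request volumes.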

  • SPSD uses an on-device small language model to strip social scaffolding before cloud LLM processing
  • Average savings of 99.9 tokens per call, with response quality within a 1-point margin on a 15-point scale
  • Estimated energy savings of 70–270 µWh per call; safety-critical domains are routed via rule-based gates
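The flow described in these points (a rule-based gate first, then stripping of social scaffolding) might look roughly like the sketch below. It is not the paper's implementation: the keyword list, the regular expressions, and the names `distill` and `words_saved` are all hypothetical, and simple regexes stand in for the on-device small language model that SPSD actually uses. The summary also does not say where gated prompts are routed; this sketch assumes they are forwarded unchanged.

```python
import re

# Hypothetical keyword gate standing in for the rule-based safety routing.
SAFETY_GATE = re.compile(
    r"\b(overdose|suicide|chest pain|dosage|emergency)\b", re.IGNORECASE
)

# Hypothetical politeness patterns standing in for the on-device model.
SCAFFOLDING = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"\b(hi|hello|hey)\b( there)?[,!.]?",
        r"\bi hope (you are|you're) (doing )?well[,!.]?",
        r"\b(could|can|would) you (please |kindly )?",
        r"\bplease\b,?",
        r"\bthank(s| you)( so much| very much| in advance)?[,!.]?",
    )
]


def distill(prompt: str) -> str:
    """Return the prompt with social scaffolding removed, unless gated."""
    if SAFETY_GATE.search(prompt):
        return prompt  # assumed behavior: gated prompts pass through unchanged
    text = prompt
    for pattern in SCAFFOLDING:
        text = pattern.sub(" ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text or prompt  # never send an empty prompt


def words_saved(prompt: str) -> int:
    """Crude savings estimate using whitespace-separated words, not real tokens."""
    return len(prompt.split()) - len(distill(prompt).split())


if __name__ == "__main__":
    example = "Hello! Could you please summarize this article? Thanks so much!"
    print(distill(example))
    print(words_saved(example))
```

Running the gate before any rewriting means a prompt that matches a safety keyword is never altered, which is one plausible reading of "routed via rule-based gates". A real deployment would measure savings with the target model's tokenizer, not a word count.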

Generated with AI, which can make mistakes.
