Closing the Social-Semantic Gap: SPSD for Edge-Based Prompt Compression in Cloud LLM Inference

Short summary

Researchers present SPSD, a technique that compresses user prompts on-device by stripping social scaffolding (politeness, repetition) before the prompts are sent to a cloud LLM, cutting token usage by about 100 tokens per call. In tests on 248 prompts with Gemma-2 and Llama-3.1, response quality remained non-inferior, with estimated energy savings of 70–270 µWh per call. The upshot: edge-based prompt distillation can reduce cloud inference costs while preserving output quality.

• SPSD uses an on-device small language model to strip social scaffolding from prompts before cloud LLM processing
• Average savings of 99.9 tokens per call, with response quality within a 1-point non-inferiority margin on a 15-point scale
• Estimated energy savings of 70–270 µWh per call; prompts in safety-critical domains are routed through rule-based gates
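The bullets above describe a two-stage flow: a rule-based gate for safety-critical prompts, then on-device removal of social scaffolding. A minimal Python sketch of that flow follows, under explicit assumptions. The keyword list, the regex patterns, and the names `route`, `is_safety_critical`, and `strip_scaffolding` are all hypothetical. A few regular expressions stand in for the on-device small language model that SPSD actually uses. Whitespace splitting stands in for a real tokenizer. The gate is assumed to forward safety-critical prompts verbatim rather than compress them.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword gate; a stand-in for SPSD's rule-based safety routing.
SAFETY_KEYWORDS = {"overdose", "suicide", "chest pain", "dosage"}

# Hypothetical politeness patterns; a regex stand-in for the on-device small model.
SCAFFOLDING_PATTERNS = [
    r"\b(?:hi|hello|hey)\b[,!.]?",
    r"\bplease\b,?",
    r"\bthank(?:s| you)(?: so much| in advance)?\b[,!.]?",
    r"\bcould you\b",
    r"\bi was wondering if\b",
]


@dataclass
class RoutedPrompt:
    text: str          # prompt text that would be sent to the cloud LLM
    compressed: bool   # False when the safety gate bypassed compression
    tokens_saved: int  # whitespace-token difference (proxy for real tokens)


def whitespace_tokens(text: str) -> int:
    """Rough token count; a real system would use the cloud model's tokenizer."""
    return len(text.split())


def is_safety_critical(prompt: str) -> bool:
    """Rule-based gate: flag prompts containing any hypothetical safety keyword."""
    lowered = prompt.lower()
    return any(keyword in lowered for keyword in SAFETY_KEYWORDS)


def dedupe_sentences(text: str) -> str:
    """Drop sentences that repeat an earlier one, ignoring case and punctuation."""
    seen, kept = set(), []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        key = re.sub(r"[^\w\s]", "", sentence).lower().strip()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence)
    return " ".join(kept)


def strip_scaffolding(prompt: str) -> str:
    """Remove politeness phrases and repeated sentences, then tidy spacing."""
    text = prompt
    for pattern in SCAFFOLDING_PATTERNS:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", text)              # collapse leftover whitespace
    text = re.sub(r"\s+([,.!?])", r"\1", text)    # no space before punctuation
    text = dedupe_sentences(text.strip(" ,"))
    return text[:1].upper() + text[1:]            # restore a leading capital


def route(prompt: str) -> RoutedPrompt:
    """Gate first, so safety-critical prompts are never rewritten."""
    if is_safety_critical(prompt):
        return RoutedPrompt(prompt, compressed=False, tokens_saved=0)
    compressed = strip_scaffolding(prompt)
    saved = whitespace_tokens(prompt) - whitespace_tokens(compressed)
    return RoutedPrompt(compressed, compressed=True, tokens_saved=saved)


if __name__ == "__main__":
    print(route("Hi! Could you please summarize this paper? Summarize this paper. Thanks so much!"))
    print(route("Please tell me the maximum safe dosage of ibuprofen."))
```

Running the gate before compression means a safety-critical prompt is never touched by the compressor. A real deployment would replace `strip_scaffolding` with the small-model call and count tokens with the cloud model's own tokenizer.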

Generated with AI, which can make mistakes.


Read full article at arXiv cs.LG