arXiv cs.CL
6/18/2026

Want Better Synthetic Data? Steer It: Activation Steering for Low-Resource Language Generation
Short summary
Researchers propose activation steering to improve synthetic data generation for low-resource languages, avoiding costly few-shot prompting. By steering model activations to capture linguistic identity and well-formedness, the method generates more diverse training data across 11 languages using four open-source LLMs. Experiments show consistent gains in downstream classifier performance, especially for low-resource language tasks where training data is scarce.
- Activation steering improves synthetic data diversity without few-shot prompting overhead
- Technique works across 11 languages and four open-source LLMs
- Particularly effective for low-resource languages with limited training data