arXiv cs.CL
6/18/2026
Want Better Synthetic Data? Steer It: Activation Steering for Low-Resource Language Generation

Short summary

Researchers propose activation steering as a way to improve synthetic data generation for low-resource languages without relying on costly few-shot prompting. By steering model activations along directions that encode linguistic identity and well-formedness, the method generates more diverse training data across 11 languages with four open-source LLMs. Experiments show consistent gains in downstream classifier performance, especially for low-resource language tasks where training data is scarce.

  • Activation steering improves synthetic data diversity without few-shot prompting overhead
  • Technique works across 11 languages and four open-source LLMs
  • Particularly effective for low-resource languages with limited training data
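The summary does not say how the steering vectors are built. One common recipe, assumed here and not necessarily the authors' method, is difference-of-means steering: average a layer's hidden states over text in the target language, subtract the average over other text, and add a scaled copy of that difference to the hidden states during generation. The sketch below uses NumPy with toy random activations; the function names `steering_vector` and `steer`, the strength `alpha`, and the planted "language identity" direction are all hypothetical illustrations.

```python
import numpy as np


def steering_vector(target_acts: np.ndarray, baseline_acts: np.ndarray) -> np.ndarray:
    """Difference of mean hidden states (target minus baseline).

    Both inputs have shape (num_examples, hidden_dim), e.g. activations at one
    layer for text in the target language vs. text in other languages.
    """
    return target_acts.mean(axis=0) - baseline_acts.mean(axis=0)


def steer(hidden: np.ndarray, vector: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Add the scaled steering vector to every token position.

    hidden has shape (seq_len, hidden_dim); alpha is a hypothetical strength.
    """
    return hidden + alpha * vector


rng = np.random.default_rng(0)
hidden_dim = 8

# Toy data: target-language activations are shifted along dimension 0,
# standing in for a hypothetical "language identity" direction.
direction = np.zeros(hidden_dim)
direction[0] = 1.0
target = rng.normal(size=(1000, hidden_dim)) + direction
baseline = rng.normal(size=(1000, hidden_dim))

v = steering_vector(target, baseline)
h = rng.normal(size=(5, hidden_dim))  # hidden states for a 5-token sequence
h_steered = steer(h, v, alpha=2.0)
print("steering vector:", np.round(v, 2))
```

In a real LLM the addition would be applied at a chosen layer while sampling synthetic examples (for instance through a forward hook), and `alpha` trades off stronger language identity against fluency; both choices are assumptions here, not details taken from the paper.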

Generated with AI, which can make mistakes.
