https://arxiv.org/pdf/2509.09660
Steer-MoE
SteerMoE inserts a Mixture-of-Experts (MoE)-based steering module into each layer of an audio encoder, dynamically transforming audio representations into a space that an LLM can interpret. Concretely, at layer $l$, a shared router produces gating scores $g^{(l)}_k$, which are used to compute a steering adjustment $\Delta h^{(l)}$ as a weighted sum over expert vectors $e^{(l)}_k$. For example, in illustrative notation:
$$\Delta h^{(l)} = \sum_{k} g^{(l)}_k \, e^{(l)}_k$$
The adjusted hidden state is then passed through a linear projection and fed to the LLM as a “soft prompt”. Because this operates directly in a continuous vector space and skips discrete audio tokenization, it aims to minimize information loss while remaining “plug-and-play” (it does not require any modification to the LLM architecture). Notably, the shared router manages experts across all layers, improving parameter efficiency while enabling context-dependent steering/alignment.
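The flow described above can be sketched in plain Python for a single audio frame. This is a minimal illustration under stated assumptions, not the paper's implementation: the router is assumed to be a single linear map followed by a softmax, the steering adjustment is assumed to be added to the hidden state, the encoder layers themselves are omitted, and all names (`steer`, `encode`, `router_weights`, `experts_per_layer`, `projection`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def steer(hidden, router_weights, experts):
    """Steer one hidden state at one layer.

    Assumption: gating scores are softmax(router_weights @ hidden) and the
    adjustment (a gate-weighted sum of expert vectors) is added to hidden.
    """
    gates = softmax(matvec(router_weights, hidden))  # one score per expert
    delta = [
        sum(g * expert[i] for g, expert in zip(gates, experts))
        for i in range(len(hidden))
    ]
    return [h + d for h, d in zip(hidden, delta)], gates


def encode(hidden, router_weights, experts_per_layer, projection):
    """Apply steering at every layer, then project to the LLM width."""
    for experts in experts_per_layer:
        # A real encoder layer would transform `hidden` here; omitted.
        # The same router weights are reused at every layer (shared router).
        hidden, _ = steer(hidden, router_weights, experts)
    return matvec(projection, hidden)  # one "soft prompt" vector


# Toy sizes and random parameters, chosen only for illustration.
random.seed(0)
d_audio, d_llm, n_experts, n_layers = 4, 6, 3, 2
router_weights = [
    [random.gauss(0, 1) for _ in range(d_audio)] for _ in range(n_experts)
]
experts_per_layer = [
    [[random.gauss(0, 0.1) for _ in range(d_audio)] for _ in range(n_experts)]
    for _ in range(n_layers)
]
projection = [
    [random.gauss(0, 1) for _ in range(d_audio)] for _ in range(d_llm)
]

frame = [0.5, -1.0, 0.25, 2.0]
soft_prompt = encode(frame, router_weights, experts_per_layer, projection)
print(len(soft_prompt))  # 6, the assumed LLM embedding width
```

In this sketch the router weights are the only parameters shared across layers, while each layer keeps its own small set of expert vectors, which is one way to read the parameter-efficiency claim. The LLM itself is never touched: it only receives the projected vectors as input embeddings.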
Steer-MoE: Efficient Audio-Language Alignment with a...
Aligning pretrained audio encoders and Large Language Models (LLMs) offers a promising, parameter-efficient path to building powerful multimodal agents. However, existing methods often require...
https://arxiv.org/abs/2510.13558


Seonglae Cho