Audio

Explore UNISON, a single-model framework leveraging latent flow-matching and Qwen2.5-Omni-7B for diverse audio tasks, from text-to-audio to complex scene editing.

How UNISON Unifies Audio and Speech Generation with Deep LLM Fusion

UNISON is a unified latent flow-matching framework for audio and speech generation and editing. Using a single set of weights, it integrates text-to-audio, text-to-speech, zero-shot speaker cloning, mixed speech-and-sound scene generation, and audio/speech-in-scene editing—all in one model, one architecture, one forward pass, leveraging deep LLM fusion with Qwen2.5-Omni-7B.

Explore the Diffusion Transformer with Flow Matching that powers high-fidelity 48 kHz audio generation from natural language.

How MOSS-SoundEffect v2.0 Revolutionizes Text-to-Audio Synthesis

Discover MOSS-SoundEffect v2.0, a cutting-edge text-to-audio model using a 1.3B-parameter Diffusion Transformer and Flow Matching for superior sound generation. Learn about its capabilities, multilingual support, and optimal settings for creating diverse audio content.