TTS

How NAVA Generates Synchronized 720p Audio-Video from a Single Prompt
NAVA is a 6.3B-parameter joint audio-video generator that synthesizes synchronized 720p video and audio from a single prompt. It uses an Align-then-Fuse MMDiT architecture to establish audio-video correspondence, and it offers multi-speaker speech with timbre control, fast generation, and camera control described in natural language.

How UNISON Unifies Audio and Speech Generation with Deep LLM Fusion
UNISON is a unified latent flow-matching framework for audio and speech generation and editing. With a single set of weights, it covers text-to-audio, text-to-speech, zero-shot speaker cloning, mixed speech-and-sound scene generation, and in-scene audio and speech editing, all in one model, one architecture, and one forward pass. It relies on deep LLM fusion with Qwen2.5-Omni-7B.

Inside TML's Real-Time AI: Redefining Human-AI Collaboration
Explore how Thinking Machines Lab (TML) is overcoming AI's collaboration bottleneck with a novel multi-stream, micro-turn design and a dual-model architecture. Learn about TML-Interaction-Small, its real-time performance, and how it enables seamless human-AI interaction.

Sophia AI Launches Sovereign Greek LLM Suite
Sophia AI presents a live demo of its Greek-language LLM suite, which covers text generation, image/video creation, voice, and research agents. The suite emphasizes technological, linguistic, and data sovereignty, with EU-compliant servers and curated Greek datasets.