TTS


Explore NAVA's Align-then-Fuse MMDiT architecture for native audio-visual alignment, enabling precise multi-timbre control and language-described camera movements.

How NAVA Generates Synchronized 720p Audio-Video from a Single Prompt

NAVA is a 6.3B-parameter joint audio-video generator that synthesizes synchronized 720p video and audio from a single prompt. It uses an Align-then-Fuse MMDiT architecture to establish audio-video correspondence, and it supports multi-speaker speech with timbre control, fast generation, and language-described camera control.

Explore UNISON, a single-model framework leveraging latent flow-matching and Qwen2.5-Omni-7B for diverse audio tasks, from text-to-audio to complex scene editing.

How UNISON Unifies Audio and Speech Generation with Deep LLM Fusion

UNISON is a unified latent flow-matching framework for audio and speech generation and editing. A single set of weights covers text-to-audio, text-to-speech, zero-shot speaker cloning, mixed speech-and-sound scene generation, and audio/speech-in-scene editing, all in one architecture and one forward pass, with deep LLM fusion based on Qwen2.5-Omni-7B.

A Deep Dive into the Multi-Stream, Dual-Model Architecture Powering Next-Generation Interactive AI Systems

Inside TML's Real-Time AI: Redefining Human-AI Collaboration

Explore how Thinking Machines Lab (TML) is overcoming AI's collaboration bottleneck with a novel multi-stream, micro-turn design and a dual-model architecture. Learn about TML-Interaction-Small, its real-time performance, and how it enables seamless human-AI interaction.

Full-stack AI models designed for the Greek language, culture, and data sovereignty, addressing low adoption rates.

Sophia AI Launches Sovereign Greek LLM Suite

Sophia AI presents a live demo of its Greek-language LLM suite, including text generation, image/video creation, voice, and research agents. Sophia AI emphasizes technological, linguistic, and data sovereignty, with EU-compliant servers and curated Greek datasets.