Ideogram 4: First Open-Weight Foundation Model
Ideogram 4 is Ideogram's first open-weight text-to-image model, a state-of-the-art foundation model trained from scratch. It debuts a structured JSON prompting interface, native 2K resolution, and best-in-class control over layout, color palette, and typography. By releasing model weights publicly, Ideogram brings cutting-edge generative AI capabilities directly to researchers and developers who previously only had access to closed-source alternatives. The model supports extreme controllability: bounding-box coordinates, hex-color conditioning, and precise spatial arrangements can all be specified in a single JSON caption. This release marks a significant shift toward open, design-focused image generation.
Architecture: Single-Stream DiT with a Vision-Language Encoder
Ideogram 4 uses a fully single-stream Diffusion Transformer (DiT). Text and image tokens are concatenated into one sequence and processed jointly through 34 layers, enabling deep cross-modal interaction at every stage. Instead of a text-only encoder like CLIP, the model leverages Qwen3-VL-8B-Instruct, a full vision-language model that provides richer visual understanding. Hidden states from 13 intermediate layers are concatenated, giving the DiT multi-scale semantic features from surface tokens to deep compositional structure. A dual-branch classifier-free guidance scheme lets users independently refine prompt adherence and image quality. The 9.3B-parameter model natively handles any resolution from 256 to 2048 (multiples of 16), with aspect ratios up to 6:1, all from a single checkpoint.
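The resolution constraints above can be checked before a request reaches the model. The following is a minimal sketch, not part of the Ideogram codebase: the function name `validate_resolution` is hypothetical, and it assumes that the 256 to 2048 range applies to each side independently and that the 6:1 aspect-ratio limit is inclusive.

```python
def validate_resolution(width, height, min_side=256, max_side=2048,
                        multiple=16, max_aspect=6.0):
    """Return a list of problems with a requested size (empty list means OK).

    Assumptions (not confirmed by the source): the 256-2048 range is
    per side, and an aspect ratio of exactly 6:1 is allowed.
    """
    errors = []
    for name, value in (("width", width), ("height", height)):
        if not min_side <= value <= max_side:
            errors.append(f"{name}={value} is outside [{min_side}, {max_side}]")
        if value % multiple != 0:
            errors.append(f"{name}={value} is not a multiple of {multiple}")
    # Compare the long side to the short side so portrait and landscape
    # requests are treated the same way.
    short, long_ = sorted((width, height))
    if short > 0 and long_ / short > max_aspect:
        errors.append(f"aspect ratio {long_}:{short} exceeds {max_aspect}:1")
    return errors


if __name__ == "__main__":
    for size in [(2048, 1024), (1536, 256), (2048, 256), (1000, 1000)]:
        print(size, validate_resolution(*size) or "ok")
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem with a requested size at once.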
Benchmark Leadership: Best Open-Weight Image Model
Third-party and internal evaluations confirm Ideogram 4 as the leading open-weight image generator. On Design Arena's overall Elo leaderboard it ranks highest among open models, trailing only proprietary GPT and Gemini systems. In a blind typography test by professional designers (ContraLabs), it achieved a 47.9% first-place win rate, far ahead of the next best model (30.0%). The same designers rated it 3.55/5 for real-world client work, the top score. On LMArena, Ideogram ranks among the top 5 image generation labs overall. Internally, Bradley-Terry scores place it second only to GPT Image 2 medium. Open-source benchmarks show it closes the gap to closed models in spatial reasoning, object fidelity, prompt alignment, and text rendering. At 9.3B parameters it redefines parameter efficiency, outperforming models that are 2-9× larger.
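For readers unfamiliar with how Elo and Bradley-Terry scores are read, the sketch below shows the standard textbook conversion from a score difference to an expected head-to-head win probability. It is illustrative only: the exact scale and fitting procedure used by Design Arena, LMArena, or Ideogram's internal evaluation are not stated in this article, so the 400-point Elo scale and the natural-log Bradley-Terry scale are assumptions, and the numbers in the example are made up.

```python
import math


def bradley_terry_win_prob(score_a, score_b):
    """P(A preferred over B) under Bradley-Terry with log-strength scores."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))


def elo_win_prob(rating_a, rating_b):
    """P(A preferred over B) under the conventional 400-point Elo scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Hypothetical ratings, chosen only to illustrate the formulas.
    print(round(elo_win_prob(1250, 1200), 3))
    print(round(bradley_terry_win_prob(0.3, 0.0), 3))
```

Both functions are the same logistic curve with different scaling, which is why Elo leaderboards and Bradley-Terry fits usually produce the same ordering of models.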
JSON Prompting for Extreme Controllability
The model was trained exclusively on structured JSON captions, with each caption exhaustively describing all image content.
This yields more grounded supervision per training pair and makes JSON the most reliable prompt format.
Users can supply a colour_palette array of hex colors, bbox coordinates for precise placement of elements, and compositional_deconstruction for per-object descriptions.
The interface also supports best-in-class multilingual text rendering: signage, logos, multi-line text, and watermarks appear with high fidelity directly from the prompt.
For those who prefer plain text, a "magic prompt" system automatically expands a simple description into a full JSON caption before generation.
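A structured caption of the kind described above might be assembled as follows. This is a hedged sketch rather than the official schema: only the field names colour_palette, bbox, and compositional_deconstruction come from this article. The top-level "description" key, the per-object "description" key, the nesting of bbox inside each object, and the normalized [x0, y0, x1, y1] box format are all assumptions made for illustration, as is the helper name `build_caption`.

```python
import json
import re

HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def build_caption(description, palette, objects):
    """Serialize a JSON caption string.

    `objects` is a list of dicts with a "description" and a "bbox".
    The bbox format [x0, y0, x1, y1] in 0..1 is an assumption, not the
    documented Ideogram 4 schema.
    """
    for color in palette:
        if not HEX_COLOR.match(color):
            raise ValueError(f"not a 6-digit hex color: {color!r}")
    items = []
    for obj in objects:
        x0, y0, x1, y1 = obj["bbox"]
        if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
            raise ValueError(f"invalid bbox: {obj['bbox']!r}")
        items.append({"description": obj["description"],
                      "bbox": [x0, y0, x1, y1]})
    caption = {
        "description": description,             # hypothetical key
        "colour_palette": list(palette),        # field name from the article
        "compositional_deconstruction": items,  # field name from the article
    }
    return json.dumps(caption, indent=2)


if __name__ == "__main__":
    print(build_caption(
        "a poster for a jazz night",
        ["#1A2B3C", "#FFFFFF"],
        [{"description": "headline text 'JAZZ NIGHT'",
          "bbox": [0.1, 0.1, 0.9, 0.3]}],
    ))
```

Validating the palette and boxes before serialization catches malformed input early; consult the model card for the real schema before relying on these key names.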
Installation and Model Access
Model weights are gated on Hugging Face. To use Ideogram 4, first agree to the license on the model page (ideogram-4-nf4 or ideogram-4-fp8), then authenticate with a Hugging Face token:
hf auth login
Clone the ideogram4 GitHub repository and install the inference package:
pip install .
For an editable install, run pip install -e . instead (note the trailing dot).
The run_inference.py script handles generation; it requires an IDEOGRAM_API_KEY for the free magic-prompt service (obtain at developer.ideogram.ai).
Optional Hive safety screening can be enabled by setting HIVE_TEXT_MODERATION_KEY and HIVE_VISUAL_MODERATION_KEY.
python run_inference.py \
  --prompt "a ginger cat wearing a tiny wizard hat reading a spellbook" \
  --output out.png \
  --quantization "nf4" \
  --magic-prompt-key "$IDEOGRAM_API_KEY"
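The same invocation can be scripted from Python when generating many images. The sketch below only assembles the argument list from the flags shown in the command above and checks that the API key is present; the helper name `build_inference_command` is hypothetical, and no other flags or quantization values are assumed.

```python
import os


def build_inference_command(prompt, output="out.png", quantization="nf4",
                            env=None):
    """Build the argv list for run_inference.py using the flags shown above.

    Raises RuntimeError if IDEOGRAM_API_KEY is missing, since the
    magic-prompt service needs it.
    """
    env = os.environ if env is None else env
    api_key = env.get("IDEOGRAM_API_KEY")
    if not api_key:
        raise RuntimeError("IDEOGRAM_API_KEY is not set")
    return [
        "python", "run_inference.py",
        "--prompt", prompt,
        "--output", output,
        "--quantization", quantization,
        "--magic-prompt-key", api_key,
    ]


def hive_screening_configured(env=None):
    """True if both optional Hive moderation keys are present."""
    env = os.environ if env is None else env
    return bool(env.get("HIVE_TEXT_MODERATION_KEY")
                and env.get("HIVE_VISUAL_MODERATION_KEY"))
```

From the repository root, the returned list can be passed to `subprocess.run(cmd, check=True)`. Passing arguments as a list avoids shell quoting problems with prompts that contain quotes or spaces.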



