Z-Anime: Full Anime Fine-Tune on Z-Image Base

Full fine-tune family based on Alibaba's Z-Image S3-DiT, with variants for quality, speed, and low VRAM.

May 24, 2026

#Content Generation #Fine Tuning #Open Source #Python #Training

Z-Anime is a full fine-tune of the Z-Image Base architecture, not a LoRA merge. It provides anime-style generation with natural language prompting, high output diversity, and multiple variants: Base, Distill-8-Step, Distill-4-Step, GGUF, and AIO. It runs on 8GB of VRAM, and the repository includes the required VAE and text encoder.


A full fine-tune of Alibaba's Z-Image Base architecture: not a LoRA merge, but a fully trained, anime-focused model family.

Built on the S3-DiT (Single-Stream Diffusion Transformer, 6B parameters), Z-Anime inherits Z-Image Base's rich output diversity, strong controllability, full negative prompt support, and high ceiling for further fine-tuning, now adapted for anime-style generation.

Variants

Variant	Focus	Best For
Z-Anime Base	Highest quality	Final renders, full control
Z-Anime Distill-8-Step	Speed + quality balance	Everyday generation
Z-Anime Distill-4-Step	Maximum speed	Fast iteration, batches
GGUF Variants	Lower memory usage	Low VRAM / CPU / AMD-friendly workflows
AIO Variants	Single-file convenience	Easy ComfyUI setup
Diffusers Folder	`from_pretrained()` ready	Python pipelines, further fine-tuning

Key Features

Full fine-tune on Z-Image Base — not a LoRA merge
Rich anime aesthetics with strong style diversity
Natural language prompting — works best with descriptive prompts, not tag lists
High diversity across characters, poses, compositions, and layouts
LoRA training ready — strong base for further fine-tuning
Partially NSFW capable
8GB VRAM compatible
GGUF variants available
AIO variants available (Base, 4-Step, 8-Step)

Released Variants

Z-Anime Base

Full fine-tune on Z-Image Base — BF16 & FP8

Z-Anime Distill-8-Step

BF16 & FP8 — fast anime generation in 8 steps, CFG 1.0

Z-Anime Distill-4-Step

BF16 & FP8 — ultra-fast anime generation in 4 steps, CFG 1.0

GGUF Variants

Z-Anime-Base-Q8_0 — Q8_0 quantization (~6.73 GB)
Z-Anime-Base-Q4_K_S — Q4_K_S quantization (~4.2 GB)

AIO Variants

All-in-one checkpoints with image model + VAE + Text Encoder integrated in a single file. Available for Base, Distill-4-Step and Distill-8-Step — each in BF16 & FP8.

VAE & Text Encoder

The required VAE (ae.safetensors) and Text Encoder (qwen_3_4b, provided as BF16 and FP8 .safetensors files) are also included in this repository for users running the standard (non-AIO) variants.

Diffusers Folder

The full Diffusers-format folder (diffusers/) is included — drop-in compatible with ZImagePipeline.from_pretrained() for Python inference or further fine-tuning.

Version Formats

BF16 (~12GB)

Maximum precision. BFloat16 format with minimal quality compromise. Best for final renders, careful work, and LoRA training.

FP8 (~6GB)

Recommended for most users. Smaller files, faster downloads, and excellent quality with only minor tradeoffs compared to BF16.

GGUF

Optimized for lightweight inference setups, especially useful for low VRAM, CPU inference, or alternative backends.

AIO

All-in-one checkpoints with image model + Text Encoder + VAE integrated into a single file for the easiest setup. Available for Base, Distill-4-Step and Distill-8-Step.
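The approximate BF16 and FP8 sizes follow from simple arithmetic: bytes ≈ parameters × bits per weight ÷ 8. The sketch below assumes exactly 6B parameters and decimal gigabytes (1 GB = 10⁹ bytes); the helper names are hypothetical, and the real parameter count may differ slightly. It also works backwards from the listed GGUF file sizes to an average bits-per-weight figure.

```python
"""Back-of-the-envelope checkpoint sizes for a ~6B-parameter model.

Assumes exactly 6e9 parameters and decimal gigabytes (1 GB = 1e9 bytes);
both are illustrative assumptions, not measured values.
"""

PARAMS = 6_000_000_000  # assumed parameter count of the S3-DiT backbone


def weight_size_gb(params: int, bits_per_weight: int) -> float:
    """Approximate size of the raw weights in decimal GB."""
    return params * bits_per_weight / 8 / 1e9


def implied_bits_per_weight(file_size_gb: float, params: int) -> float:
    """Average bits per weight implied by a published file size."""
    return file_size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    print(f"BF16: ~{weight_size_gb(PARAMS, 16):.0f} GB")  # ~12 GB
    print(f"FP8:  ~{weight_size_gb(PARAMS, 8):.0f} GB")   # ~6 GB
    # Convert the listed GGUF sizes back to an average bits-per-weight figure
    for name, size_gb in [("Q8_0", 6.73), ("Q4_K_S", 4.2)]:
        bits = implied_bits_per_weight(size_gb, PARAMS)
        print(f"{name}: ~{bits:.1f} bits/weight")
```

Both GGUF figures land above their nominal 8 and 4 bits. That is plausible because these formats store per-block scale factors alongside the weights (Q8_0, for example, adds one FP16 scale per block of 32 weights), and any tensors kept at higher precision would add more. Treat the results as estimates.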

Z-Anime Base

The foundation of the Z-Anime family. A full fine-tune with the highest quality ceiling, the widest creative range, and full negative prompt support.

Recommended Settings

steps: 28-50
cfg: 3.0-5.0   # up to 9.0 possible
sampler: euler_ancestral
scheduler: beta
negative_prompt: strongly recommended

CFG Guide

3.0–5.0 → sweet spot for balanced quality and creativity
5.0–7.0 → tighter prompt adherence
7.0–9.0 → maximum control, but watch for oversaturation
Above 9.0 → not recommended
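The CFG guide can be encoded as a small lookup, for example to warn in a batch script before a run. `describe_base_cfg` is a hypothetical helper, not part of any Z-Anime tooling. Because the guide's ranges share endpoints, it treats each band as upper-inclusive (a value of exactly 5.0 counts as the sweet spot), which is an assumption.

```python
def describe_base_cfg(cfg: float) -> str:
    """Map a CFG value to the Z-Anime Base CFG guide bands.

    Bands are upper-inclusive (5.0 counts as the sweet spot); this boundary
    handling is an assumption, since the guide's ranges share endpoints.
    """
    if cfg < 3.0:
        return "below the documented range"
    if cfg <= 5.0:
        return "sweet spot: balanced quality and creativity"
    if cfg <= 7.0:
        return "tighter prompt adherence"
    if cfg <= 9.0:
        return "maximum control; watch for oversaturation"
    return "not recommended"


for value in (4.0, 6.5, 8.0, 10.0):
    print(value, "->", describe_base_cfg(value))
```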

Negative prompts have full effect on Z-Anime Base.

Z-Anime Distill-8-Step

Distilled from Z-Anime Base, this variant delivers strong anime results in just 8 steps while retaining most of the Base model's quality.

Recommended Settings

steps: 8
cfg: 1.0   # max ~1.5
sampler: euler_ancestral
scheduler: beta
negative_prompt: limited effect

CFG Guide

Best at CFG 1.0
Small increases to 1.3–1.5 are possible
Do not go above 1.5 — artifacts may appear
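The same idea applies to the distilled variants. The hypothetical `distill_cfg_warning` below returns a message when a CFG value departs from the documented guidance for the 8- and 4-step models, and `None` at the recommended 1.0. Values below 1.0 are not covered by the guide, so the function simply flags them; that treatment is an assumption.

```python
from typing import Optional

DISTILL_RECOMMENDED_CFG = 1.0  # best results per the CFG guide
DISTILL_MAX_CFG = 1.5          # above this, artifacts may appear


def distill_cfg_warning(cfg: float) -> Optional[str]:
    """Return a warning for a distilled-variant CFG value, or None at 1.0."""
    if cfg > DISTILL_MAX_CFG:
        return f"CFG {cfg} is above {DISTILL_MAX_CFG}: artifacts may appear"
    if cfg > DISTILL_RECOMMENDED_CFG:
        return f"CFG {cfg} is allowed, but 1.0 gives the most stable results"
    if cfg < DISTILL_RECOMMENDED_CFG:
        return f"CFG {cfg} is below 1.0, which the CFG guide does not cover"
    return None


for value in (1.0, 1.3, 2.0):
    print(value, "->", distill_cfg_warning(value))
```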

Negative prompts have only a limited effect. If your workflow includes ConditioningZeroOut, prefer it over a large negative prompt.

Z-Anime Distill-4-Step

Built for maximum throughput: rapid prototyping and quick batch generation.

Recommended Settings

steps: 4
cfg: 1.0   # max ~1.5
sampler: euler_ancestral
scheduler: beta
negative_prompt: limited effect

Tips for 4-Step

Stay at CFG 1.0 for most stable results
Put the most important visual details early in the prompt
An optional upscaler (e.g., hires fix or SeedVR2) can recover fine detail
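For scripted runs, the recommended settings of all three variants can be kept in one table. This is a sketch with hypothetical names (`SamplerSettings`, `RECOMMENDED`, `in_recommended_range`); the values are transcribed from the Recommended Settings blocks, with Base using its 3.0–5.0 sweet spot rather than the 9.0 maximum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerSettings:
    steps_range: tuple[int, int]    # inclusive min/max steps
    cfg_range: tuple[float, float]  # inclusive recommended CFG band
    sampler: str = "euler_ancestral"
    scheduler: str = "beta"
    negative_prompt: str = "limited effect"


# Values transcribed from each variant's Recommended Settings.
RECOMMENDED = {
    # Base: 3.0-5.0 is the sweet spot; up to 9.0 is possible
    "base": SamplerSettings((28, 50), (3.0, 5.0),
                            negative_prompt="strongly recommended"),
    # Distilled variants: CFG 1.0 recommended, ~1.5 maximum
    "distill-8step": SamplerSettings((8, 8), (1.0, 1.5)),
    "distill-4step": SamplerSettings((4, 4), (1.0, 1.5)),
}


def in_recommended_range(variant: str, steps: int, cfg: float) -> bool:
    """True if steps and CFG both fall inside the variant's recommended ranges."""
    s = RECOMMENDED[variant]
    return (s.steps_range[0] <= steps <= s.steps_range[1]
            and s.cfg_range[0] <= cfg <= s.cfg_range[1])


print(in_recommended_range("base", 40, 4.0))
print(in_recommended_range("distill-8step", 8, 2.0))
```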

Resolution Guide

Use Case	Resolution
Portrait / character art	832 × 1216
Landscape / scenes / backgrounds	1216 × 832
Square / general purpose	1024 × 1024
Tall / full body / wallpaper	768 × 1344
Cinematic / wide scenes	1920 × 1088
Detailed portraits	1024 × 1536

Supported range: approximately 512 × 512 to 2048 × 2048, at any aspect ratio. All main variants are designed to run on 8GB VRAM.
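The presets and the supported range can be applied programmatically. The sketch below clamps each side to the approximate 512–2048 range and rounds to a multiple of 16. The multiple-of-16 rule is an assumption inferred from the fact that every preset in the table is a multiple of 16, not a documented requirement; all names are hypothetical.

```python
PRESETS = {
    "portrait": (832, 1216),
    "landscape": (1216, 832),
    "square": (1024, 1024),
    "tall": (768, 1344),
    "cinematic": (1920, 1088),
    "detailed_portrait": (1024, 1536),
}

MIN_SIDE, MAX_SIDE = 512, 2048  # approximate supported range per side
MULTIPLE = 16  # assumption: every preset above is a multiple of 16


def snap_side(value: int) -> int:
    """Clamp one side to the supported range, then round to the nearest multiple of 16."""
    clamped = max(MIN_SIDE, min(MAX_SIDE, value))
    return (clamped + MULTIPLE // 2) // MULTIPLE * MULTIPLE


def snap_resolution(width: int, height: int) -> tuple[int, int]:
    return snap_side(width), snap_side(height)


print(snap_resolution(1920, 1080))
```

For example, `snap_resolution(1920, 1080)` returns `(1920, 1088)`, which matches the cinematic preset; the presets themselves pass through unchanged.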

Prompting Guide

Natural language works best — not tag lists.

✅ Good

A young anime girl with long silver hair and golden eyes, wearing a traditional shrine maiden outfit with white haori and red hakama. She stands in a sunlit bamboo forest, cherry blossoms falling softly around her. Warm afternoon light filtering through the trees, detailed fabric shading, expressive face, calm serene expression, high quality anime illustration with fine line work.

❌ Avoid

anime girl, silver hair, shrine maiden, bamboo, cherry blossom, warm light

Character Portraits

Detailed anime portrait of [character], soft rim lighting, expressive eyes with detailed reflections, fine hair strands, clean linework, professional anime illustration quality.

Action Scenes

Dynamic anime [scene], dramatic angle, motion energy, speed lines, particle effects, cinematic composition, detailed shading, high quality anime art.

Backgrounds & Landscapes

Anime [location] at [time of day], [lighting], [atmosphere], beautiful background art, wallpaper quality, highly detailed environment.
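The bracketed templates can be filled programmatically. `fill_template` is a hypothetical helper that swaps each `[placeholder]` for a keyword argument, turning spaces in the placeholder name into underscores (so `[time of day]` becomes `time_of_day`); a missing value raises `KeyError` instead of leaving a bracket in the prompt.

```python
import re


def fill_template(template: str, **values: str) -> str:
    """Replace [placeholder] markers with keyword values; missing keys raise KeyError."""
    def replace(match):
        key = match.group(1).replace(" ", "_")  # "[time of day]" -> time_of_day
        return values[key]
    return re.sub(r"\[([^\]]+)\]", replace, template)


BACKGROUND = ("Anime [location] at [time of day], [lighting], [atmosphere], "
              "beautiful background art, wallpaper quality, highly detailed environment.")

print(fill_template(
    BACKGROUND,
    location="mountain shrine",
    time_of_day="dusk",
    lighting="warm lantern light",
    atmosphere="light mist drifting between cedar trees",
))
```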

Installation

Step 1 — Download the version you want

Choose between:

Standard / Distill models in BF16 or FP8 (+ VAE + Text Encoder)
GGUF variants for low VRAM / CPU / AMD-friendly inference (+ VAE + Text Encoder)
AIO variants for single-file convenience (no extra VAE / Text Encoder needed)
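Which files a given setup needs can be summarized as a small function over the repository layout listed under Repository Structure. `required_files` is a hypothetical helper; the text encoder suffix is left to the caller, since the guide presents the encoders as interchangeable choices.

```python
def required_files(variant: str, fmt: str, text_encoder: str = "bf16") -> list[str]:
    """List repository paths needed for one Z-Anime setup.

    variant: "base", "distill-8step" or "distill-4step"
    fmt: "bf16", "fp8", "aio-bf16", "aio-fp8", or a GGUF quant ("q8_0", "q4_k_s")
    text_encoder: suffix of the qwen_3_4b file, e.g. "bf16", "fp8", "engineer-v4-fp8"
    """
    if fmt.startswith("aio-"):
        # AIO checkpoints already bundle the text encoder and VAE
        return [f"aio/z-anime-{variant}-{fmt}.safetensors"]
    if fmt in ("q8_0", "q4_k_s"):
        if variant != "base":
            raise ValueError("GGUF files are only listed for Z-Anime Base")
        model = f"gguf/z-anime-base-{fmt}.gguf"
    else:
        model = f"diffusion_models/z-anime-{variant}-{fmt}.safetensors"
    # Standard and GGUF setups also need a text encoder and the VAE
    return [model,
            f"text_encoder/qwen_3_4b-{text_encoder}.safetensors",
            "vae/ae.safetensors"]


print(required_files("distill-8step", "fp8", text_encoder="fp8"))
```

Each returned path could then be fetched with `huggingface_hub.hf_hub_download(repo_id="SeeSee21/Z-Anime", filename=path)`; that download step is not exercised here.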

Step 2 — Place the files

Standard BF16 / FP8 models

ComfyUI/models/diffusion_models/
├── z-anime-base-bf16.safetensors
├── z-anime-base-fp8.safetensors
├── z-anime-distill-8step-bf16.safetensors
├── z-anime-distill-8step-fp8.safetensors
├── z-anime-distill-4step-bf16.safetensors
└── z-anime-distill-4step-fp8.safetensors

GGUF variants

ComfyUI/models/unet/
├── z-anime-base-q8_0.gguf
└── z-anime-base-q4_k_s.gguf

Text Encoder

Two text encoders are included — pick one:

ComfyUI/models/clip/
└── qwen_3_4b-bf16.safetensors          # default (Z-Image standard, BF16)
   or
└── qwen_3_4b-fp8.safetensors           # default (Z-Image standard, FP8)
   or
└── qwen_3_4b-engineer-v4-bf16.safetensors   # alternative (Engineer V4, BF16)
   or
└── qwen_3_4b-engineer-v4-fp8.safetensors    # alternative (Engineer V4, FP8)

Default (qwen_3_4b-*): the standard Z-Image text encoder, repackaged as single .safetensors files (BF16 + FP8). This is the encoder the model was trained against.
Engineer V4 (qwen_3_4b-engineer-v4-*): an alternative full fine-tune of the Z-Image text encoder by BennyDaBall, drop-in compatible. It often produces more varied outputs from the same seed.

VAE

ComfyUI/models/vae/
└── ae.safetensors

AIO variants

For AIO versions, only the single checkpoint file is needed:

ComfyUI/models/checkpoints/
├── z-anime-base-aio-bf16.safetensors
├── z-anime-base-aio-fp8.safetensors
├── z-anime-distill-8step-aio-bf16.safetensors
├── z-anime-distill-8step-aio-fp8.safetensors
├── z-anime-distill-4step-aio-bf16.safetensors
└── z-anime-distill-4step-aio-fp8.safetensors
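The placement rules in Step 2 boil down to a few filename patterns. The hypothetical `comfyui_target_dir` below infers the destination folder from a file's name, assuming the naming scheme used in this repository; files named differently raise an error rather than being guessed.

```python
from pathlib import PurePosixPath


def comfyui_target_dir(filename: str) -> str:
    """Return the ComfyUI models/ subfolder for a Z-Anime repository file."""
    name = PurePosixPath(filename).name  # ignore any repo subfolder prefix
    if name.endswith(".gguf"):
        return "ComfyUI/models/unet"
    if "-aio-" in name:
        return "ComfyUI/models/checkpoints"
    if name.startswith("qwen_3_4b"):
        return "ComfyUI/models/clip"
    if name == "ae.safetensors":
        return "ComfyUI/models/vae"
    if name.startswith("z-anime-") and name.endswith(".safetensors"):
        return "ComfyUI/models/diffusion_models"
    raise ValueError(f"unrecognized file: {filename}")


for f in ("gguf/z-anime-base-q8_0.gguf", "vae/ae.safetensors"):
    print(f, "->", comfyui_target_dir(f))
```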

Step 3 — Load in ComfyUI

For standard BF16 / FP8 versions

Use Load Diffusion Model for the model, CLIP Loader for the text encoder, and VAE Loader for the VAE.

For GGUF versions

Load the GGUF model from models/unet/ (this requires the ComfyUI-GGUF custom nodes), and use the same CLIP and VAE as above.

For AIO versions

Use a standard Checkpoint Loader — no extra CLIP or VAE loading required.

Custom Nodes

rgthree-comfy
ComfyUI-Lora-Manager
ComfyUI-GGUF (only for GGUF variants)
ComfyUI-SeedVR2_VideoUpscaler (optional)

Using the Diffusers Folder (Python)

import torch
from diffusers import ZImagePipeline

# Load the Diffusers-format weights from the repository's diffusers/ subfolder
pipe = ZImagePipeline.from_pretrained(
    "SeeSee21/Z-Anime",
    subfolder="diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Settings within the recommended Base range (28-50 steps, CFG 3.0-5.0)
image = pipe(
    prompt="A young anime girl with long silver hair and golden eyes, "
           "shrine maiden outfit, sunlit bamboo forest, cherry blossoms, "
           "professional anime illustration, fine line work.",
    num_inference_steps=40,
    guidance_scale=4.0,
).images[0]

image.save("z-anime-output.png")

This format is also a clean starting point for further fine-tuning (LoRA or full fine-tune) with frameworks like OneTrainer, diffusers, or kohya-ss.

Official Workflow

A ready-to-use ComfyUI workflow (workflows/Z-Anime-Workflow-v1.json) supports all variants (Base / Distill-8 / Distill-4, BF16 / FP8 / GGUF / AIO) and includes:

Model switch (Diffusion / GGUF / AIO loaders)
Optional LoRA loader
Positive + Negative prompt nodes (with default anime negative)
Resolution presets
Generation plus an optional 1.5× upscale with side-by-side comparison
Built-in MarkdownNote guide with settings per variant
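The workflow's optional 1.5× upscale changes the output size, and a hypothetical helper can compute the target dimensions ahead of time. Rounding to a multiple of 16 is an assumption for illustration, not a documented requirement of the workflow.

```python
def upscaled_size(width: int, height: int, factor: float = 1.5,
                  multiple: int = 16) -> tuple[int, int]:
    """Target size for an upscale pass, rounded to the nearest multiple of 16."""
    def side(v: int) -> int:
        # Scale, round to the nearest multiple, and never go below one multiple
        return max(multiple, round(v * factor / multiple) * multiple)
    return side(width), side(height)


print(upscaled_size(832, 1216))
```

With the portrait preset, `upscaled_size(832, 1216)` gives `(1248, 1824)`. Larger presets can exceed the approximately 2048-pixel range from the Resolution Guide (1920 × 1088 becomes 2880 × 1632), so it may be worth checking before a diffusion-based hires pass.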

Repository Structure

Z-Anime/
├── README.md
├── config.json
│
├── diffusion_models/
│   ├── z-anime-base-bf16.safetensors
│   ├── z-anime-base-fp8.safetensors
│   ├── z-anime-distill-8step-bf16.safetensors
│   ├── z-anime-distill-8step-fp8.safetensors
│   ├── z-anime-distill-4step-bf16.safetensors
│   └── z-anime-distill-4step-fp8.safetensors
│
├── gguf/
│   ├── z-anime-base-q8_0.gguf
│   └── z-anime-base-q4_k_s.gguf
│
├── aio/
│   ├── z-anime-base-aio-bf16.safetensors
│   ├── z-anime-base-aio-fp8.safetensors
│   ├── z-anime-distill-8step-aio-bf16.safetensors
│   ├── z-anime-distill-8step-aio-fp8.safetensors
│   ├── z-anime-distill-4step-aio-bf16.safetensors
│   └── z-anime-distill-4step-aio-fp8.safetensors
│
├── text_encoder/
│   ├── qwen_3_4b-bf16.safetensors                  # default
│   ├── qwen_3_4b-fp8.safetensors                   # default
│   ├── qwen_3_4b-engineer-v4-bf16.safetensors      # alternative (BennyDaBall)
│   └── qwen_3_4b-engineer-v4-fp8.safetensors       # alternative (BennyDaBall)
│
├── vae/
│   └── ae.safetensors
│
├── diffusers/
│   ├── model_index.json
│   ├── scheduler/
│   ├── tokenizer/
│   ├── text_encoder/
│   ├── transformer/   (sharded safetensors + index)
│   └── vae/
│
├── images/
│   ├── cover.png
│   ├── workflow-cover.png
│   ├── workflow-overview.png
│   ├── 1.png
│   ├── 2.png
│   ├── 3.png
│   ├── 4.png
│   ├── 5.png
│   ├── 6.png
│   ├── 7.png
│   ├── 8.png
│   └── 9.png
└── workflows/
    └── Z-Anime-Workflow-v1.json

Version History

v1.0 — Initial Release

Z-Anime Base released in BF16 & FP8
Z-Anime Distill-8-Step released in BF16 & FP8
Z-Anime Distill-4-Step released in BF16 & FP8
GGUF variants added (Q8_0 ~6.73 GB, Q4_K_S ~4.2 GB)
AIO variants added — Base, Distill-4-Step and Distill-8-Step (each in BF16 & FP8)
VAE (ae.safetensors) and Text Encoder (qwen_3_4b, BF16 & FP8) included
Optimized across the family for the euler_ancestral and euler samplers with the beta scheduler, and for simple, practical use

Attribution

Base Architecture: Tongyi Lab (Alibaba) — Z-Image
Fine-Tune: SeeSee21
License: Apache 2.0
Architecture: S3-DiT (Single-Stream Diffusion Transformer, 6B parameters)
Engineer V4 Text Encoder: BennyDaBall/Qwen3-4b-Z-Image-Engineer-V4 — full fine-tune with SMART training, included as alternative text encoder
