Z-Anime | Full Anime Fine-Tune on Z-Image Base
Full fine-tune of Alibaba’s Z-Image Base model: not a LoRA merge, but a fully trained, anime-focused model family built on top of it.
Built on the S3-DiT architecture (Single-Stream Diffusion Transformer, 6B parameters), Z-Anime inherits Z-Image Base’s rich output diversity, strong controllability, full negative-prompt support, and high ceiling for further fine-tuning, now adapted for anime-style generation.
Variants
| Variant | Focus | Best For |
|---|---|---|
| Z-Anime Base | Highest quality | Final renders, full control |
| Z-Anime Distill-8-Step | Speed + quality balance | Everyday generation |
| Z-Anime Distill-4-Step | Maximum speed | Fast iteration, batches |
| GGUF Variants | Lower memory usage | Low VRAM / CPU / AMD-friendly workflows |
| AIO Variants | Single-file convenience | Easy ComfyUI setup |
| Diffusers Folder | from_pretrained() ready | Python pipelines, further fine-tuning |
Key Features
- Full fine-tune on Z-Image Base — not a LoRA merge
- Rich anime aesthetics with strong style diversity
- Natural language prompting — works best with descriptive prompts, not tag lists
- High diversity across characters, poses, compositions, and layouts
- LoRA training ready — strong base for further fine-tuning
- Partially NSFW capable
- 8GB VRAM compatible
- GGUF variants available
- AIO variants available (Base, 4-Step, 8-Step)
Released Variants
Z-Anime Base
Full fine-tune on Z-Image Base — BF16 & FP8
Z-Anime Distill-8-Step
BF16 & FP8 — fast anime generation in 8 steps, CFG 1.0
Z-Anime Distill-4-Step
BF16 & FP8 — ultra-fast anime generation in 4 steps, CFG 1.0
GGUF Variants
- Z-Anime-Base-Q8_0 — Q8_0 quantization (~6.73 GB)
- Z-Anime-Base-Q4_K_S — Q4_K_S quantization (~4.2 GB)
AIO Variants
All-in-one checkpoints with image model + VAE + Text Encoder integrated in a single file. Available for Base, Distill-4-Step and Distill-8-Step — each in BF16 & FP8.
VAE & Text Encoder
The required VAE (ae.safetensors) and Text Encoder (qwen_3_4b-*.safetensors, in BF16 and FP8) are also included in this repository for users running the standard (non-AIO) or GGUF variants.
Diffusers Folder
The full Diffusers-format folder (diffusers/) is included — drop-in compatible with ZImagePipeline.from_pretrained() for Python inference or further fine-tuning.
Version Formats
BF16 (~12GB)
Maximum precision. BFloat16 format with minimal quality compromise. Best for final renders, careful work, and LoRA training.
FP8 (~6GB)
Recommended for most users. Smaller files, faster downloads, and excellent quality with only minor tradeoffs compared to BF16.
GGUF
Optimized for lightweight inference setups, especially useful for low VRAM, CPU inference, or alternative backends.
AIO
All-in-one checkpoints with image model + Text Encoder + VAE integrated into a single file for the easiest setup. Available for Base, Distill-4-Step and Distill-8-Step.
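The sizes above follow roughly from the parameter count: file size ≈ parameters × bits per weight ÷ 8. A back-of-the-envelope sketch, assuming exactly 6B parameters for the image model alone. The Q8_0 and Q4_K figures are the standard GGUF block sizes; treating Q4_K as the main tensor type in Q4_K_S files is an assumption, and the helper name is hypothetical. The sketch ignores metadata and any tensors a GGUF file keeps at higher precision, so treat its GGUF numbers as lower bounds rather than exact file sizes.

```python
# Rough file-size estimate for the 6B image model: params x bits / 8 bytes.
# Assumes exactly 6e9 parameters; ignores metadata and tensors stored at
# higher precision, so GGUF estimates come out below the published sizes.
PARAMS = 6e9

BITS_PER_WEIGHT = {
    "BF16": 16.0,  # 2 bytes per weight
    "FP8": 8.0,    # 1 byte per weight
    "Q8_0": 8.5,   # blocks of 32 int8 weights plus one FP16 scale
    "Q4_K": 4.5,   # 256-weight super-blocks; assumed main type in Q4_K_S
}


def estimate_gb(params: float, bits_per_weight: float) -> float:
    """Return the estimated size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for fmt, bpw in BITS_PER_WEIGHT.items():
        print(f"{fmt:5s} ~{estimate_gb(PARAMS, bpw):.2f} GB")
```

The BF16 and FP8 estimates line up with the ~12GB and ~6GB figures above; the gap to the published GGUF sizes is what the ignored tensors and metadata account for.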
Z-Anime Base
The foundation of the Z-Anime family. A full fine-tune with the highest quality ceiling, the widest creative range, and full negative prompt support.
Recommended Settings
steps: 28-50
cfg: 3.0-5.0 # up to 9.0 possible
sampler: euler_ancestral
scheduler: beta
negative_prompt: strongly recommended
CFG Guide
- 3.0–5.0 → sweet spot for balanced quality and creativity
- 5.0–7.0 → tighter prompt adherence
- 7.0–9.0 → maximum control, but watch for oversaturation
- Above 9.0 → not recommended
Negative prompts have full effect on Z-Anime Base.
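The CFG guide is a lookup over value ranges. A sketch that encodes it, for example to annotate a CFG sweep in a batch script. The helper name is hypothetical. Because the guide's bands share their edges (5.0, 7.0), this sketch assigns edge values to the lower band, and values below 3.0, which the guide does not cover, are simply flagged.

```python
def base_cfg_advice(cfg: float) -> str:
    """Map a CFG value to the Z-Anime Base CFG guide bands.

    Shared band edges are assigned to the lower band (an arbitrary choice).
    """
    if cfg <= 0:
        raise ValueError("cfg must be positive")
    if cfg < 3.0:
        return "below the documented range"
    if cfg <= 5.0:
        return "sweet spot: balanced quality and creativity"
    if cfg <= 7.0:
        return "tighter prompt adherence"
    if cfg <= 9.0:
        return "maximum control; watch for oversaturation"
    return "not recommended"


if __name__ == "__main__":
    for value in (3.5, 6.0, 8.0, 10.0):
        print(value, "->", base_cfg_advice(value))
```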
Z-Anime Distill-8-Step
Distilled from Z-Anime Base, this variant delivers strong anime results in just 8 steps while keeping most of the quality.
Recommended Settings
steps: 8
cfg: 1.0 # max ~1.5
sampler: euler_ancestral
scheduler: beta
negative_prompt: limited effect
CFG Guide
- Best at CFG 1.0
- Small increases to 1.3–1.5 are possible
- Do not go above 1.5 — artifacts may appear
Negative prompts have only a limited effect. If your workflow includes a ConditioningZeroOut node, prefer it over a large negative prompt.
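Both distilled variants follow the same rules: a step count fixed by the distillation, CFG 1.0 as the most stable value, 1.3 to 1.5 as possible, and artifacts likely above 1.5. A sketch of a pre-flight check for batch scripts, with a hypothetical helper name. It warns on any step count other than the distilled one, which is stricter than this card states.

```python
def check_distilled_settings(steps: int, cfg: float, expected_steps: int = 8) -> list[str]:
    """Return warnings for a Distill-8-Step (or, with expected_steps=4, Distill-4-Step) run."""
    warnings = []
    if steps != expected_steps:
        # The model was distilled for a specific step count.
        warnings.append(f"steps={steps}: variant is distilled for {expected_steps} steps")
    if cfg > 1.5:
        warnings.append(f"cfg={cfg}: above 1.5, artifacts may appear")
    elif cfg > 1.0:
        warnings.append(f"cfg={cfg}: usable, but 1.0 is the most stable")
    return warnings


if __name__ == "__main__":
    print(check_distilled_settings(8, 1.0))  # no warnings expected
    print(check_distilled_settings(6, 2.0))
```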
Z-Anime Distill-4-Step
Built for maximum throughput: rapid prototyping and quick batch generation.
Recommended Settings
steps: 4
cfg: 1.0 # max ~1.5
sampler: euler_ancestral
scheduler: beta
negative_prompt: limited effect
Tips for 4-Step
- Stay at CFG 1.0 for most stable results
- Put the most important visual details early in the prompt
- An optional upscaler (e.g., hires fix or SeedVR2) can recover fine detail
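The three Recommended Settings blocks differ only in step count and CFG; sampler and scheduler are the same. A sketch that collects them into one table and turns a variant name into the `num_inference_steps` and `guidance_scale` arguments used in this card's diffusers example. The variant keys and helper are hypothetical, and the Base defaults (40 steps, CFG 4.0) are taken from that example rather than from a single documented value.

```python
from typing import Optional, TypedDict


class Preset(TypedDict):
    steps: int      # num_inference_steps
    cfg: float      # default guidance_scale
    max_cfg: float  # documented ceiling


# Hypothetical variant keys. Base defaults come from the card's diffusers
# example; distilled values come from their Recommended Settings blocks.
PRESETS: dict[str, Preset] = {
    "base": {"steps": 40, "cfg": 4.0, "max_cfg": 9.0},
    "distill-8step": {"steps": 8, "cfg": 1.0, "max_cfg": 1.5},
    "distill-4step": {"steps": 4, "cfg": 1.0, "max_cfg": 1.5},
}

# Shared by all three variants. These are ComfyUI KSampler names and are
# not translated into diffusers scheduler classes here.
SAMPLER = "euler_ancestral"
SCHEDULER = "beta"


def pipeline_kwargs(variant: str, cfg: Optional[float] = None) -> dict:
    """Return step/CFG keyword arguments for a diffusers-style pipeline call."""
    preset = PRESETS[variant]
    guidance = preset["cfg"] if cfg is None else cfg
    if guidance > preset["max_cfg"]:
        raise ValueError(
            f"cfg {guidance} exceeds the ~{preset['max_cfg']} ceiling for {variant}"
        )
    return {"num_inference_steps": preset["steps"], "guidance_scale": guidance}
```

For example, `pipe(prompt=prompt, **pipeline_kwargs("base"))`, assuming `pipe` holds weights for the matching variant.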
Resolution Guide
| Use Case | Resolution |
|---|---|
| Portrait / character art | 832 × 1216 |
| Landscape / scenes / backgrounds | 1216 × 832 |
| Square / general purpose | 1024 × 1024 |
| Tall / full body / wallpaper | 768 × 1344 |
| Cinematic / wide scenes | 1920 × 1088 |
| Detailed portraits | 1024 × 1536 |
Supported range: approximately 512 × 512 to 2048 × 2048, at any aspect ratio. All main variants are designed to run on 8GB VRAM.
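Every preset in the table is a multiple of 64 on both sides, and the cinematic preset uses 1088 rather than 1080. A sketch for snapping a custom size before generating. It assumes, though this card does not say so, that each side must be a multiple of 16 (8× VAE downsampling times 2×2 transformer patches, common for this class of DiT model), and it clamps each side to the approximate 512 to 2048 range above. The constants and helper names are hypothetical.

```python
# Presets from the Resolution Guide table (width, height).
RESOLUTIONS = {
    "portrait": (832, 1216),
    "landscape": (1216, 832),
    "square": (1024, 1024),
    "tall": (768, 1344),
    "cinematic": (1920, 1088),
    "detailed_portrait": (1024, 1536),
}

MULTIPLE = 16                   # assumption: 8x VAE downsampling x 2x2 patches
MIN_SIDE, MAX_SIDE = 512, 2048  # approximate supported range per side


def snap_resolution(width: int, height: int) -> tuple[int, int]:
    """Round each side to the nearest multiple of MULTIPLE, then clamp to range."""

    def snap(side: int) -> int:
        rounded = int(side / MULTIPLE + 0.5) * MULTIPLE  # round half up
        return max(MIN_SIDE, min(MAX_SIDE, rounded))

    return snap(width), snap(height)


if __name__ == "__main__":
    print(snap_resolution(1920, 1080))
```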
Prompting Guide
Natural language works best — not tag lists.
✅ Good
A young anime girl with long silver hair and golden eyes, wearing a traditional shrine maiden outfit with white haori and red hakama. She stands in a sunlit bamboo forest, cherry blossoms falling softly around her. Warm afternoon light filtering through the trees, detailed fabric shading, expressive face, calm serene expression, high quality anime illustration with fine line work.
❌ Avoid
anime girl, silver hair, shrine maiden, bamboo, cherry blossom, warm light
Character Portraits
Detailed anime portrait of [character], soft rim lighting, expressive eyes with detailed reflections, fine hair strands, clean linework, professional anime illustration quality.
Action Scenes
Dynamic anime [scene], dramatic angle, motion energy, speed lines, particle effects, cinematic composition, detailed shading, high quality anime art.
Backgrounds & Landscapes
Anime [location] at [time of day], [lighting], [atmosphere], beautiful background art, wallpaper quality, highly detailed environment.
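The templates above use bracketed placeholders. A small helper for filling them in a script; the dictionary keys and helper name are hypothetical, and multi-word placeholders such as [time of day] map to underscore keywords (`time_of_day`). The filled-in values should themselves be descriptive phrases, in line with the natural-language advice above.

```python
import re

# The three prompt templates from this card, verbatim.
TEMPLATES = {
    "portrait": (
        "Detailed anime portrait of [character], soft rim lighting, expressive eyes "
        "with detailed reflections, fine hair strands, clean linework, professional "
        "anime illustration quality."
    ),
    "action": (
        "Dynamic anime [scene], dramatic angle, motion energy, speed lines, particle "
        "effects, cinematic composition, detailed shading, high quality anime art."
    ),
    "background": (
        "Anime [location] at [time of day], [lighting], [atmosphere], beautiful "
        "background art, wallpaper quality, highly detailed environment."
    ),
}

_PLACEHOLDER = re.compile(r"\[([^\]]+)\]")


def fill_template(template: str, **values: str) -> str:
    """Replace each [placeholder] with the keyword whose name is the
    placeholder text with spaces turned into underscores."""

    def substitute(match):
        label = match.group(1)
        key = label.strip().replace(" ", "_")
        if key not in values:
            raise KeyError(f"missing value for [{label}]")
        return values[key]

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    print(fill_template(
        TEMPLATES["background"],
        location="mountain village",
        time_of_day="dusk",
        lighting="warm lantern light",
        atmosphere="light snowfall",
    ))
```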
Installation
Step 1 — Download the version you want
Choose between:
- Standard / Distill models in BF16 or FP8 (+ VAE + Text Encoder)
- GGUF variants for low VRAM / CPU / AMD-friendly inference (+ VAE + Text Encoder)
- AIO variants for single-file convenience (no extra VAE / Text Encoder needed)
Step 2 — Place the files
Standard BF16 / FP8 models
ComfyUI/models/diffusion_models/
├── z-anime-base-bf16.safetensors
├── z-anime-base-fp8.safetensors
├── z-anime-distill-8step-bf16.safetensors
├── z-anime-distill-8step-fp8.safetensors
├── z-anime-distill-4step-bf16.safetensors
└── z-anime-distill-4step-fp8.safetensors
GGUF variants
ComfyUI/models/unet/
├── z-anime-base-q8_0.gguf
└── z-anime-base-q4_k_s.gguf
Text Encoder
Two text encoders are included — pick one:
ComfyUI/models/clip/
└── qwen_3_4b-bf16.safetensors # default (Z-Image standard, BF16)
or
└── qwen_3_4b-fp8.safetensors # default (Z-Image standard, FP8)
or
└── qwen_3_4b-engineer-v4-bf16.safetensors # alternative (Engineer V4, BF16)
or
└── qwen_3_4b-engineer-v4-fp8.safetensors # alternative (Engineer V4, FP8)
- Default (qwen_3_4b-*): the standard Z-Image text encoder, repackaged as single .safetensors files (BF16 + FP8). This is what the model was trained against.
- Engineer V4 (qwen_3_4b-engineer-v4-*): an alternative full fine-tune of the Z-Image text encoder by BennyDaBall, drop-in compatible. It often produces more varied outputs from the same seed.
VAE
ComfyUI/models/vae/
└── ae.safetensors
AIO variants
For AIO versions, only the single checkpoint file is needed:
ComfyUI/models/checkpoints/
├── z-anime-base-aio-bf16.safetensors
├── z-anime-base-aio-fp8.safetensors
├── z-anime-distill-8step-aio-bf16.safetensors
├── z-anime-distill-8step-aio-fp8.safetensors
├── z-anime-distill-4step-aio-bf16.safetensors
└── z-anime-distill-4step-aio-fp8.safetensors
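The layout above is determined by file type and naming alone. A sketch of that mapping, for example for a script that sorts a download folder into a ComfyUI install; it relies only on the file names listed on this card, and the helper name is hypothetical.

```python
from pathlib import PurePosixPath


def comfyui_folder(filename: str) -> str:
    """Return the ComfyUI models/ subfolder for a Z-Anime release file.

    Rules follow the Step 2 layout and this repository's file names;
    anything unrecognized raises ValueError.
    """
    name = PurePosixPath(filename).name.lower()
    if name.endswith(".gguf"):
        return "models/unet"
    if not name.endswith(".safetensors"):
        raise ValueError(f"unexpected file type: {filename}")
    if "-aio-" in name:
        return "models/checkpoints"  # single-file model + VAE + text encoder
    if name.startswith("qwen_3_4b"):
        return "models/clip"         # default or Engineer V4 text encoder
    if name == "ae.safetensors":
        return "models/vae"
    if name.startswith("z-anime-"):
        return "models/diffusion_models"
    raise ValueError(f"unrecognized file: {filename}")
```

Joining the result onto your ComfyUI root gives a destination directory you could pass to `shutil.move`.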
Step 3 — Load in ComfyUI
For standard BF16 / FP8 versions
Use Load Diffusion Model for the model, Load CLIP for the text encoder, and Load VAE for the VAE.
For GGUF versions
Load the GGUF model from models/unet/ with the Unet Loader (GGUF) node from ComfyUI-GGUF, and use the same CLIP and VAE loaders as above.
For AIO versions
Use a standard Checkpoint Loader — no extra CLIP or VAE loading required.
Custom Nodes
- rgthree-comfy
- ComfyUI-Lora-Manager
- ComfyUI-GGUF (only for GGUF variants)
- ComfyUI-SeedVR2_VideoUpscaler (optional)
Using the Diffusers Folder (Python)
```python
import torch
from diffusers import ZImagePipeline

# Load the Diffusers-format folder from this repository in BF16 and move it to the GPU.
pipe = ZImagePipeline.from_pretrained(
    "SeeSee21/Z-Anime",
    subfolder="diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

# 40 steps at CFG 4.0, within the recommended Base ranges (28-50 steps, CFG 3.0-5.0).
image = pipe(
    prompt="A young anime girl with long silver hair and golden eyes, "
    "shrine maiden outfit, sunlit bamboo forest, cherry blossoms, "
    "professional anime illustration, fine line work.",
    num_inference_steps=40,
    guidance_scale=4.0,
).images[0]
image.save("z-anime-output.png")
```
This format is also a clean starting point for further fine-tuning (LoRA or full fine-tune) with frameworks like OneTrainer, diffusers, or kohya-ss.
Official Workflow
A ready-to-use ComfyUI workflow (workflows/Z-Anime-Workflow-v1.json) supports all variants (Base / Distill-8 / Distill-4, BF16 / FP8 / GGUF / AIO) and includes:
- Model switch (Diffusion / GGUF / AIO loaders)
- Optional LoRA loader
- Positive + Negative prompt nodes (with default anime negative)
- Resolution presets
- Generate + Optional 1.5× upscale with side-by-side compare
- Built-in MarkdownNote guide with settings per variant
Repository Structure
Z-Anime/
├── README.md
├── config.json
│
├── diffusion_models/
│ ├── z-anime-base-bf16.safetensors
│ ├── z-anime-base-fp8.safetensors
│ ├── z-anime-distill-8step-bf16.safetensors
│ ├── z-anime-distill-8step-fp8.safetensors
│ ├── z-anime-distill-4step-bf16.safetensors
│ └── z-anime-distill-4step-fp8.safetensors
│
├── gguf/
│ ├── z-anime-base-q8_0.gguf
│ └── z-anime-base-q4_k_s.gguf
│
├── aio/
│ ├── z-anime-base-aio-bf16.safetensors
│ ├── z-anime-base-aio-fp8.safetensors
│ ├── z-anime-distill-8step-aio-bf16.safetensors
│ ├── z-anime-distill-8step-aio-fp8.safetensors
│ ├── z-anime-distill-4step-aio-bf16.safetensors
│ └── z-anime-distill-4step-aio-fp8.safetensors
│
├── text_encoder/
│ ├── qwen_3_4b-bf16.safetensors # default
│ ├── qwen_3_4b-fp8.safetensors # default
│ ├── qwen_3_4b-engineer-v4-bf16.safetensors # alternative (BennyDaBall)
│ └── qwen_3_4b-engineer-v4-fp8.safetensors # alternative (BennyDaBall)
│
├── vae/
│ └── ae.safetensors
│
├── diffusers/
│ ├── model_index.json
│ ├── scheduler/
│ ├── tokenizer/
│ ├── text_encoder/
│ ├── transformer/ (sharded safetensors + index)
│ └── vae/
│
├── images/
│ ├── cover.png
│ ├── workflow-cover.png
│ ├── workflow-overview.png
│ ├── 1.png
│ ├── 2.png
│ ├── 3.png
│ ├── 4.png
│ ├── 5.png
│ ├── 6.png
│ ├── 7.png
│ ├── 8.png
│ └── 9.png
└── workflows/
└── Z-Anime-Workflow-v1.json
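Which files a setup needs follows from the repository layout above: a standard or GGUF setup needs a model file plus a text encoder and the VAE, while an AIO setup needs a single checkpoint. A sketch that lists the repository paths for one setup, using only the file names shown in this tree. The function name and argument values are hypothetical, and `precision` also selects the text encoder file.

```python
VARIANTS = ("base", "distill-8step", "distill-4step")
PRECISIONS = ("bf16", "fp8")
QUANTS = ("q8_0", "q4_k_s")  # GGUF files are only listed for Base
TEXT_ENCODERS = ("qwen_3_4b", "qwen_3_4b-engineer-v4")


def files_to_download(fmt: str, variant: str = "base", precision: str = "fp8",
                      quant: str = "q8_0", text_encoder: str = "qwen_3_4b") -> list[str]:
    """Return repository paths (relative to the repo root) for one setup.

    fmt is "standard", "gguf" or "aio".
    """
    if variant not in VARIANTS or precision not in PRECISIONS:
        raise ValueError("unknown variant or precision")
    if fmt == "aio":
        # AIO checkpoints already contain the text encoder and VAE.
        return [f"aio/z-anime-{variant}-aio-{precision}.safetensors"]
    if fmt == "standard":
        model = f"diffusion_models/z-anime-{variant}-{precision}.safetensors"
    elif fmt == "gguf":
        if variant != "base" or quant not in QUANTS:
            raise ValueError("GGUF files are only listed for Base (q8_0, q4_k_s)")
        model = f"gguf/z-anime-base-{quant}.gguf"
    else:
        raise ValueError(f"unknown format: {fmt}")
    if text_encoder not in TEXT_ENCODERS:
        raise ValueError(f"unknown text encoder: {text_encoder}")
    return [
        model,
        f"text_encoder/{text_encoder}-{precision}.safetensors",
        "vae/ae.safetensors",
    ]
```

Each returned path can be passed as `filename` to `huggingface_hub.hf_hub_download(repo_id="SeeSee21/Z-Anime", filename=path)`, which downloads the file and returns its local cached path.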
Version History
v1.0 — Initial Release
- Z-Anime Base released in BF16 & FP8
- Z-Anime Distill-8-Step released in BF16 & FP8
- Z-Anime Distill-4-Step released in BF16 & FP8
- GGUF variants added (Q8_0 ~6.73 GB, Q4_K_S ~4.2 GB)
- AIO variants added — Base, Distill-4-Step and Distill-8-Step (each in BF16 & FP8)
- VAE (ae.safetensors) and Text Encoder (qwen_3_4b.safetensors) included
- Optimized for euler_ancestral, euler + beta, and simple practical use across the family
Links
- Base Model: Tongyi-MAI/Z-Image
- Author: SeeSee21 on Hugging Face
Attribution
- Base Architecture: Tongyi Lab (Alibaba) — Z-Image
- Fine-Tune: SeeSee21
- License: Apache 2.0
- Architecture: S3-DiT (Single-Stream Diffusion Transformer, 6B parameters)
- Engineer V4 Text Encoder: BennyDaBall/Qwen3-4b-Z-Image-Engineer-V4 — full fine-tune with SMART training, included as alternative text encoder



