Fine Tuning


Full fine-tune family based on Alibaba's Z-Image S3-DiT, with variants for quality, speed, and low VRAM.

Z-Anime: Full Anime Fine-Tune on Z-Image Base

Z-Anime is a full fine-tune of the Z-Image Base architecture, not a LoRA merge. It provides anime-style generation with natural-language prompting and high output diversity, and comes in several variants: Base, Distill-8-Step, Distill-4-Step, GGUF, and AIO. It supports GPUs with 8 GB of VRAM and ships with its VAE and text encoder.

Enhanced lighting, sharper focus, natural skin texture, and improved anatomy for cinematic image generation.

Juggernaut Z V1: Cinematic Fine-Tune of Z-Image Base

Juggernaut Z V1 is a cinematic fine-tune of Z-Image Base, trained by KandooAI and released by RunDiffusion. It features dramatic lighting, sharper focus, natural skin, improved anatomy, and better ethnic diversity out of the box. It is available in FP16, FP8, and GGUF formats for Diffusers and other workflows.

A CLI tool that estimates VRAM usage for LoRA/QLoRA training on consumer GPUs, with benchmarking and calibration.

Can I Fine-Tune This? — Practical Guide to VRAM Estimation

Learn how to use canifinetune to predict whether your LLM fine-tuning configuration fits on your GPU before downloading any weights. The guide covers memory estimation, feasibility checks, configuration recommendations, benchmarking, and recipe generation for Hugging Face + PEFT + TRL.
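The guide does not spell out canifinetune's own formulas, so the following is only a hypothetical back-of-the-envelope estimator showing the kind of arithmetic such a tool performs: frozen base weights, LoRA adapters with their optimizer state, activations, and a fixed overhead, followed by a feasibility check. Every name (`LoraJob`, `estimate_vram_gib`, `fits`), byte count, and the 90% headroom factor is an assumption, and the activation term is a rough heuristic that assumes gradient checkpointing.

```python
from dataclasses import dataclass

GIB = 1024 ** 3

# Assumed storage cost of the frozen base weights, in bytes per parameter.
# "nf4" (4-bit) stands in for QLoRA; real 4-bit formats add small per-block scale overhead.
BASE_BYTES = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "int8": 1.0, "nf4": 0.5}


@dataclass
class LoraJob:
    """Hypothetical description of a LoRA/QLoRA training run."""
    n_params: float              # total parameters in the base model
    n_layers: int                # number of transformer blocks
    hidden: int                  # model width (assumes square hidden x hidden target projections)
    seq_len: int
    batch_size: int
    rank: int                    # LoRA rank r
    targets_per_layer: int = 4   # e.g. q/k/v/o projections
    base_dtype: str = "bf16"
    overhead_gib: float = 1.0    # assumed CUDA context + allocator slack


def estimate_vram_gib(job: LoraJob) -> dict:
    """Rough VRAM breakdown in GiB (heuristic sketch, not canifinetune's method)."""
    # Frozen base weights: no gradients or optimizer state are kept for them.
    weights = job.n_params * BASE_BYTES[job.base_dtype]

    # Each adapted hidden x hidden matrix gets A (r x hidden) and B (hidden x r).
    lora_params = job.n_layers * job.targets_per_layer * 2 * job.rank * job.hidden
    # Assumed fp32 adapter weights (4) + fp32 grads (4) + AdamW moments (8) = 16 bytes each.
    adapter_state = lora_params * 16

    # Assumes gradient checkpointing: one bf16 hidden state per layer is kept, plus
    # roughly one layer of full activations during recomputation (~34 bytes/token/width).
    tokens = job.seq_len * job.batch_size
    activations = tokens * job.hidden * (2 * job.n_layers + 34)

    parts = {
        "weights": weights / GIB,
        "adapters+optimizer": adapter_state / GIB,
        "activations": activations / GIB,
        "overhead": job.overhead_gib,
    }
    parts["total"] = sum(parts.values())
    return parts


def fits(job: LoraJob, gpu_gib: float, headroom: float = 0.9) -> bool:
    """Feasibility check that reserves ~10% of the GPU for fragmentation (assumption)."""
    return estimate_vram_gib(job)["total"] <= gpu_gib * headroom
```

For a hypothetical 7B model (32 layers, width 4096) at rank 16 with sequence length 2048 and batch size 1, this heuristic gives roughly 15 GiB with bf16 base weights and about 5.3 GiB with 4-bit base weights, which shows why QLoRA-style quantization of the frozen weights dominates the savings.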

An end-to-end training and inference system using NVFP4 quantization, Balanced SP, and a multi-shot attention sink for real-time, interactive long-video generation.

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

LongLive-2.0 presents the first end-to-end NVFP4 system for long video generation. It introduces Balanced Sequence Parallelism (SP) and NVFP4 quantization to accelerate both training and inference. On Blackwell GPUs, W4A4 inference (4-bit weights and activations) and a quantized KV cache reduce memory use and increase throughput. A clean training pipeline fine-tunes diffusion models directly into autoregressive models, using a standalone LoRA for real-time generation. A multi-shot attention sink keeps streaming generation stable. Experiments show speedups of up to 2.15× in training and 1.84× in inference, reaching 45.7 FPS with a 5B-parameter model.
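To make "NVFP4 quantization" concrete, here is a minimal fake-quantization (quantize, then dequantize) sketch. It assumes the NVFP4 layout as NVIDIA describes it: FP4 E2M1 values sharing one scale per 16-element block. As simplifications, the block scale stays in full precision instead of FP8 E4M3, the per-tensor FP32 scale is omitted, and ties round toward the smaller magnitude rather than to nearest-even. This is not LongLive-2.0's kernel; the function names are hypothetical.

```python
import math

# Non-negative magnitudes representable by FP4 E2M1 (the sign bit is separate).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = 6.0
BLOCK = 16  # one shared scale per 16 consecutive values


def round_to_e2m1(x: float) -> float:
    """Round an already-scaled value to the nearest E2M1 magnitude, keeping its sign."""
    mag = min(abs(x), FP4_MAX)  # saturate instead of overflowing
    nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))  # ties pick the smaller value
    return math.copysign(nearest, x)


def fake_quant_nvfp4(values: list[float]) -> list[float]:
    """Simulate block-scaled FP4 quantization and return the dequantized values."""
    out = []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        amax = max(abs(v) for v in block)
        # Map the block's largest magnitude onto FP4's largest value (6.0).
        scale = amax / FP4_MAX if amax > 0 else 1.0
        out.extend(round_to_e2m1(v / scale) * scale for v in block)
    return out
```

In a W4A4 setting both the weight and the activation operands of a matmul would pass through a quantizer like this; the small per-block scale is what lets 4-bit values track local dynamic range.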

Combining hierarchical latent tokenization with block-wise discrete diffusion and self-speculation for faster byte-level language models.

Fast Byte Latent Transformer: Efficient Byte-Level Generation via Diffusion and Speculation

This paper introduces BLT Diffusion (BLT-D), BLT Self-speculation (BLT-S), and BLT Diffusion+Verification (BLT-DV) to accelerate byte-level language models. By replacing autoregressive decoding with block-wise diffusion and verification, these methods reduce memory-bandwidth cost by over 50%, and by up to 92% with larger blocks, while remaining competitive on translation and code generation tasks.
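The summary does not detail how BLT-S verifies its drafts, so as a hedged illustration here is generic greedy speculative decoding: a cheap drafter proposes `k` bytes, the verifying model keeps the longest prefix it agrees with, and it supplies the first correction (or one bonus byte if everything matched). In BLT-S the drafter is the model itself; here `draft` and `target` are hypothetical callables mapping a byte context to the next byte.

```python
from typing import Callable, List

# Hypothetical greedy next-byte predictor: context -> next byte id (0..255).
Model = Callable[[List[int]], int]


def speculative_step(draft: Model, target: Model, context: List[int], k: int) -> List[int]:
    """One draft-then-verify step; returns the bytes to append (always at least one)."""
    # 1. Draft k bytes cheaply and autoregressively.
    proposal, ctx = [], list(context)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. Verify. A real system scores all k positions in one batched forward pass;
    #    this loop checks them one at a time for clarity.
    accepted, ctx = [], list(context)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            accepted.append(expected)  # first mismatch: keep the verifier's byte and stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target(ctx))  # all k accepted: the verifier adds one bonus byte
    return accepted


def generate(draft: Model, target: Model, prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Produces exactly the bytes greedy decoding with `target` alone would produce."""
    out = list(prompt)
    while len(out) - len(prompt) < n_new:
        out.extend(speculative_step(draft, target, out, k))
    return out[: len(prompt) + n_new]
```

The output is identical to plain greedy decoding with the verifier; the speedup comes from accepting several drafted bytes per verification pass, which is where bandwidth savings of the kind reported above would come from.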