Fine Tuning

Z-Anime: Full Anime Fine-Tune on Z-Image Base
Z-Anime is a full fine-tune of the Z-Image Base architecture, not a LoRA merge. It generates anime-style images from natural-language prompts with high output diversity, and it ships in several variants: Base, Distill-8-Step, Distill-4-Step, GGUF, and AIO. It runs on GPUs with 8 GB of VRAM and includes its VAE and text encoder.

Juggernaut Z V1: Cinematic Fine-Tune of Z-Image Base
Juggernaut Z V1 is a cinematic fine-tune of Z-Image Base, trained by KandooAI and released by RunDiffusion. It features dramatic lighting, sharper focus, natural skin, improved anatomy, and better ethnic diversity out of the box. Available in FP16, FP8, and GGUF formats for Diffusers and other workflows.

Can I Fine-Tune This? — Practical Guide to VRAM Estimation
Learn how to use canifinetune to predict whether your LLM fine-tuning configuration fits on your GPU before downloading any weights. Covers memory estimation, feasibility checks, configuration recommendations, benchmarking, and recipe generation for Hugging Face + PEFT + TRL.
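The core of any such feasibility check is bytes-per-parameter arithmetic. The sketch below is a rough static-memory estimate, not canifinetune's API or method; `FinetuneConfig`, `estimate_vram_gib`, and `fits` are hypothetical names. It assumes full fine-tuning uses mixed-precision AdamW (about 16 bytes per parameter), that LoRA keeps a frozen bf16 base while QLoRA keeps a 4-bit base, that the adapter trains an assumed 1% of parameters, and that activations are covered by a headroom factor.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class FinetuneConfig:
    n_params: float               # total model parameters
    method: str                   # "full", "lora", or "qlora" (hypothetical labels)
    trainable_frac: float = 0.01  # assumed fraction of params trained by the adapter


def estimate_vram_gib(cfg: FinetuneConfig) -> float:
    """Rough static-memory estimate in GiB; ignores activations and framework overhead."""
    if cfg.method == "full":
        # bf16 weights (2) + bf16 grads (2) + fp32 master copy (4) + two fp32 Adam moments (8)
        return cfg.n_params * 16 / GIB
    # Frozen base weights: bf16 for LoRA, ~4 bits for QLoRA (assumption; ignores quant constants)
    base_bytes = {"lora": 2.0, "qlora": 0.5}[cfg.method]
    trainable = cfg.n_params * cfg.trainable_frac
    # Adapter params pay the full 16-byte optimizer cost
    return (cfg.n_params * base_bytes + trainable * 16) / GIB


def fits(cfg: FinetuneConfig, gpu_gib: float, headroom: float = 0.8) -> bool:
    # Reserve ~20% of the card for activations, CUDA context, and fragmentation (assumed)
    return estimate_vram_gib(cfg) <= gpu_gib * headroom
```

Under these assumptions a 7B model needs roughly 104 GiB for full fine-tuning but about 14 GiB with LoRA, so only the LoRA run passes `fits(..., 24.0)` on a 24 GB card. A real estimator would also model activations as a function of batch size and sequence length.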

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation
LongLive-2.0 presents the first end-to-end NVFP4 system for long video generation. It introduces Balanced Sequence Parallelism (SP) and NVFP4 quantization to accelerate both training and inference. On Blackwell GPUs, W4A4 inference and a quantized KV cache reduce memory use and increase throughput. A streamlined training pipeline fine-tunes diffusion models directly into autoregressive models with a standalone LoRA for real-time generation, and a multi-shot attention sink keeps streaming generation stable. Experiments show up to 2.15× training speedup and 1.84× inference speedup, reaching 45.7 FPS with a 5B-parameter model.
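To make "NVFP4 quantization" concrete, here is a minimal sketch of block-scaled 4-bit quantization in the NVFP4 style: each 16-element block gets its own scale, and values are rounded to the FP4 E2M1 grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6). This is an illustrative simplification, not LongLive-2.0's implementation. It assumes the per-block scale stays in full precision, whereas real NVFP4 stores it in FP8 (E4M3) and adds a per-tensor FP32 scale. The function names are hypothetical.

```python
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # representable FP4 magnitudes
BLOCK = 16  # elements sharing one scale


def quantize_block(block):
    """Scale the block so its max magnitude maps to 6.0, then round to the E2M1 grid."""
    amax = max(abs(v) for v in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # full-precision scale (simplification)
    codes = []
    for v in block:
        # nearest representable magnitude, sign restored afterwards
        mag = min(FP4_E2M1, key=lambda g: abs(abs(v) / scale - g))
        codes.append(mag if v >= 0 else -mag)
    return scale, codes


def quantize(xs):
    return [quantize_block(xs[i:i + BLOCK]) for i in range(0, len(xs), BLOCK)]


def dequantize(blocks):
    out = []
    for scale, codes in blocks:
        out.extend(c * scale for c in codes)
    return out
```

Because each small block carries its own scale, one outlier only degrades precision for its 15 neighbours rather than the whole tensor. This locality is what makes 4-bit weights and activations (W4A4) usable in practice.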

Fast Byte Latent Transformer: Efficient Byte-Level Generation via Diffusion and Speculation
This paper introduces BLT Diffusion (BLT-D), BLT Self-speculation (BLT-S), and BLT Diffusion+Verification (BLT-DV) to accelerate byte-level language models. By replacing autoregressive decoding with block-wise diffusion and verification, the methods cut memory-bandwidth usage by over 50%, and by up to 92% with larger blocks, while remaining competitive on translation and code generation tasks.
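The "verification" step follows the general draft-then-verify idea behind speculative decoding. The sketch below shows generic greedy verification over bytes: a drafter proposes a block of bytes, the verifier accepts the longest prefix that matches its own greedy choices, and then it contributes one byte of its own. This is the standard technique, not the paper's BLT-DV procedure. `verify_draft` and `target_next` are hypothetical names, and the verifier is a stand-in callable.

```python
from typing import Callable, List


def verify_draft(prefix: List[int], draft: List[int],
                 target_next: Callable[[List[int]], int]) -> List[int]:
    """Greedy speculative verification: return the bytes committed this step."""
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in draft:
        expected = target_next(ctx)  # verifier's greedy next byte
        if tok != expected:
            # first mismatch: discard the rest of the draft, keep the verifier's byte
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    # whole draft accepted: the verifier contributes one bonus byte
    accepted.append(target_next(ctx))
    return accepted
```

For clarity the sketch calls the verifier once per position. Real systems score every drafted position in a single parallel forward pass, which is where the memory-bandwidth savings come from: the model weights are read once per block instead of once per byte.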