Is it possible to train Wan 2.2 14B i2v in 48GB of VRAM?
I can almost do it with the following command, but it always OOMs before it can write a checkpoint, so I can't resume:
accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 src/musubi_tuner/wan_train_network.py \
--task i2v-A14B \
--dit_high_noise /home/anon/Documents/ComfyUI/models/diffusion_models/wan2.2/wan2.2_i2v_high_noise_14B_fp16.safetensors \
--dit /home/anon/Documents/ComfyUI/models/diffusion_models/wan2.2/wan2.2_i2v_low_noise_14B_fp16.safetensors \
--dataset_config /home/anon/Documents/musubi-tuner/data/city-video-cfg/city-video-dataset.toml --sdpa --mixed_precision bf16 --fp8_base \
--optimizer_type adamw8bit --learning_rate 2e-4 --gradient_checkpointing --gradient_accumulation_steps 1 \
--max_data_loader_n_workers 2 --persistent_data_loader_workers --offload_inactive_dit \
--force_v2_1_time_embedding \
--network_module networks.lora_wan --network_dim 32 \
--timestep_sampling shift --timestep_boundary 900 --min_timestep 0 --max_timestep 1000 --discrete_flow_shift 3.0 \
--max_train_epochs 16 --save_every_n_epochs 1 --seed 23571113 \
--save_state \
--output_dir /home/anon/Documents/musubi-tuner/data/city-video-output/ --output_name wan2.2-14b-i2v-city.safetensors \
--logging_dir /home/anon/Documents/musubi-tuner/data/city-video-logs
I've tried setting --blocks_to_offload 32, but not only does that slow things down quite a bit, it still OOMs eventually anyway. I'm training on 836x480 video. Maybe I should go smaller?
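One thing I'm also considering, in case the save-time OOM comes from allocator fragmentation rather than a true capacity limit: setting PyTorch's CUDA allocator config before launching. This is a stock PyTorch knob, not anything musubi-tuner-specific, and whether it actually rescues the checkpoint step here is an assumption on my part:

```shell
# Let the CUDA caching allocator grow segments instead of hunting for
# large contiguous free blocks, which is a common cause of late OOMs.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Sanity-check that the variable is set before launching training.
echo "PYTORCH_CUDA_ALLOC_CONF=${PYTORCH_CUDA_ALLOC_CONF}"
```

Then I'd run the same accelerate launch command in that shell so the training process inherits the setting.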