With the following core settings and a 640x360 video dataset, it is possible to train a Wan 2.2 14B i2v LoRA with musubi-tuner without hitting OOM:
--task i2v-A14B --sdpa --mixed_precision fp16 --fp8_base \
--optimizer_type adamw8bit --learning_rate 2e-4 --gradient_checkpointing --gradient_accumulation_steps 1 \
--max_data_loader_n_workers 2 --persistent_data_loader_workers --offload_inactive_dit \
--network_module networks.lora_wan --network_dim 32 \
--timestep_sampling shift --timestep_boundary 900 --min_timestep 0 --max_timestep 1000 --discrete_flow_shift 3.0 \
--max_train_epochs 16 --save_every_n_epochs 1 --seed 23571113 \
--save_state
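These flags are only the core settings, not a complete command. As a rough sketch of how they would be wrapped in a full invocation, the lines below assume the standard musubi-tuner Wan training entry point; the script path, model filenames, output names, and the --dit_high_noise flag for the A14B high-noise expert are assumptions and may differ depending on your musubi-tuner version and where your models live:

accelerate launch --num_cpu_threads_per_process 1 --mixed_precision fp16 wan_train_network.py \
  --dataset_config dataset.toml \
  --dit /path/to/wan2.2_i2v_low_noise_14B.safetensors \
  --dit_high_noise /path/to/wan2.2_i2v_high_noise_14B.safetensors \
  --vae /path/to/wan_vae.safetensors \
  --t5 /path/to/umt5-xxl-enc.pth \
  --output_dir /path/to/output --output_name wan22_i2v_lora \
  [core settings above]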
836x480 was close: it almost fit in 48GB, but there would be occasional peak memory usage spikes that caused an OOM, and it would happen before a checkpoint could be written.
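Note that the training resolution (640x360 here, or 836x480 if you want to push it) is set in the dataset config TOML rather than on the command line. A minimal sketch of such a config is below; the key names and paths follow the typical musubi-tuner dataset config format but are assumptions, so check them against your version's dataset documentation:

# general settings shared by all datasets
[general]
resolution = [640, 360]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true
bucket_no_upscale = false

# one video dataset entry
[[datasets]]
video_directory = "/path/to/videos"
cache_directory = "/path/to/cache"
target_frames = [1, 25, 45]
frame_extraction = "head"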