>>102510635
The command line I'm using (on a 4090) is:
accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 flux_train_network.py --pretrained_model_name_or_path C:/ai/ComfyUI/models/unet/flux1-dev.safetensors --clip_l C:/ai/ComfyUI/models/clip/clip_l.safetensors --t5xxl C:/ai/ComfyUI/models/clip/t5xxl_fp16.safetensors --ae C:/ai/ComfyUI/models/vae/ae.safetensors --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 666 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --network_module networks.lora_flux --network_dim 32 --optimizer_type adamw8bit --learning_rate 1e-4 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base --highvram --max_train_epochs 10 --save_every_n_epochs 1 --dataset_config d:/ai/lora/data/dataset.toml --output_dir d:/ai/lora/output --output_name luisroyo_flux --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1.0
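For anyone who hasn't used sd-scripts before, the dataset.toml it points at is just kohya's standard dataset config format. A minimal sketch looks something like this (the image_dir, class_tokens, and num_repeats are placeholders, fill in your own):

[general]
shuffle_caption = false
caption_extension = ".txt"   # per-image caption files sitting next to the images
keep_tokens = 1

[[datasets]]
resolution = 1024            # flux trains fine at 1024
batch_size = 1
enable_bucket = true         # lets mixed aspect ratios get bucketed instead of cropped

  [[datasets.subsets]]
  image_dir = "d:/ai/lora/data/img"   # placeholder path
  class_tokens = "luisroyo style"     # placeholder; used as a fallback when an image has no caption file
  num_repeats = 10                    # placeholder repeat count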
The command is basically kohya's recommended way to run it, with a few little tweaks to the paths, the number of epochs, and the network_dim. As long as it runs and produces a lora, I can usually figure out how to wrangle the rest into something usable. It uses about 18GB of VRAM. I *think* you could probably update the command to use the fp8 versions of the models without too much agony.
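If you do try fp8, my guess (untested, and the filenames below are just what the common fp8 dumps tend to be called, so check your own) is that you keep --fp8_base and simply point the existing flags at the fp8 files, e.g.:
--pretrained_model_name_or_path C:/ai/ComfyUI/models/unet/flux1-dev-fp8.safetensors --t5xxl C:/ai/ComfyUI/models/clip/t5xxl_fp8_e4m3fn.safetensors
Since --fp8_base already casts the base model to fp8 at load, I'd expect that to mainly save load time and system RAM rather than VRAM, but that's a guess too.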
>>102510664
> Did you try that ai-toolkit?
I didn't try that one. I went with kohya-ss-gui, which I'd gotten good use out of for SDXL loras. Supposedly it supports flux, but I just could not make it work, so I ended up running sd-scripts from the command line instead (the command above).
>>102510725
Me too, friend.