>>102036232
I merged several medical datasets and I'm using unsloth to finetune mistral nemo with the following hyperparams:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

def train(model, tokenizer, dataset):
    trainer = SFTTrainer(
        model = model,
        train_dataset = dataset,
        dataset_text_field = "text",
        max_seq_length = MAX_SEQ_LENGTH,
        tokenizer = tokenizer,
        args = TrainingArguments(
            per_device_train_batch_size = 8,
            gradient_accumulation_steps = 4,
            learning_rate = 2e-4,
            warmup_steps = 10,
            #max_steps = 60,
            num_train_epochs = 1,
            fp16 = not is_bfloat16_supported(),
            bf16 = is_bfloat16_supported(),
            optim = "adamw_8bit",
            weight_decay = 0.01,
            lr_scheduler_type = "linear",
            logging_steps = 1,
            output_dir = "outputs",
        ),
    )
    trainer.train()
    model.save_pretrained(DUMP_LOCATION)
And I'm getting weirdly fluctuating loss, picrel. What am I doing wrong?
Semi-related question: how do I tell whether I'm overcooking it (overfitting) or undercooking it?
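For reference, with logging_steps = 1 the logged loss is raw per-batch loss, so some jitter is expected. Here's a minimal sketch of how I've been smoothing it to eyeball the actual trend (the log_history line is how transformers exposes logged metrics; the dummy numbers are just for illustration):

```python
def moving_average(losses, window=20):
    """Smooth a list of per-step losses with a trailing-window average."""
    smoothed = []
    for i in range(len(losses)):
        chunk = losses[max(0, i - window + 1) : i + 1]  # last `window` values seen so far
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# After training: losses = [log["loss"] for log in trainer.state.log_history if "loss" in log]
losses = [2.0, 1.5, 2.4, 1.1, 1.9, 0.9, 1.6, 0.8]  # dummy noisy curve
print(moving_average(losses, window=4))
```

If the smoothed curve still trends down, the jitter is probably just batch-to-batch variance rather than a real divergence.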