>>108542404
Only the last character of the prompt changed... I have this:
slot get_availabl: id 15 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 0.978
slot launch_slot_: id 15 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 15 | task 461 | processing task, is_child = 0
slot update_slots: id 15 | task 461 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 11847
slot update_slots: id 15 | task 461 | erased invalidated context checkpoint (pos_min = 0, pos_max = 11842, n_tokens = 11843, n_swa = 1024, pos_next = 11837, size = 9252.480 MiB)
slot update_slots: id 15 | task 461 | n_tokens = 11837, memory_seq_rm [11837, end)
srv log_server_r: done request: POST /v1/chat/completions 192.168.1.34 200
slot update_slots: id 15 | task 461 | prompt processing progress, n_tokens = 11843, batch.n_tokens = 6, progress = 0.999662
slot update_slots: id 15 | task 461 | created context checkpoint 3 of 32 (pos_min = 0, pos_max = 11836, n_tokens = 11837, size = 9247.793 MiB)
slot update_slots: id 15 | task 461 | n_tokens = 11843, memory_seq_rm [11843, end)
slot init_sampler: id 15 | task 461 | init sampler, took 1.65 ms, tokens: text = 11847, total = 11847
slot update_slots: id 15 | task 461 | prompt processing done, n_tokens = 11847, batch.n_tokens = 4
slot print_timing: id 15 | task 461 |
prompt eval time = 7738.53 ms / 10 tokens ( 773.85 ms per token, 1.29 tokens per second)
eval time = 11056.77 ms / 262 tokens ( 42.20 ms per token, 23.70 tokens per second)
total time = 18795.30 ms / 272 tokens
slot release: id 15 | task 461 | stop processing: n_tokens = 12108, truncated = 0
srv update_slots: all slots are idle
The delay between sending the request and seeing the first token of the response was like 8 seconds. To prompt-process 10 tokens? What? Why?