Threadly reminder: if you're using your LLM for coding, or anything else that requires repeating something already in context almost verbatim, and you're not using
--spec-type ngram-mod --spec-ngram-size-n 24 --draft-max 64
you're leaving a shitload of performance on the table.
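
For anyone wondering why this helps with near-verbatim repeats, here's a rough Python sketch of the general idea behind n-gram drafting. To be clear, this is not llama.cpp's actual code: `draft_from_ngram` is a made-up name, and I'm assuming its `n` and `draft_max` parameters play roughly the role of `--spec-ngram-size-n` and `--draft-max`.

```python
from typing import List, Sequence


def draft_from_ngram(tokens: Sequence[int], n: int = 24, draft_max: int = 64) -> List[int]:
    """Propose up to draft_max tokens by copying what followed the most
    recent earlier occurrence of the last n tokens in the context.

    Illustrative only: a hypothetical helper, not the llama.cpp implementation.
    """
    if len(tokens) <= n:
        return []  # not enough context to hold an earlier copy of the n-gram

    key = tuple(tokens[-n:])

    # Scan earlier start positions, most recent first, skipping the suffix itself.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tuple(tokens[start:start + n]) == key:
            follow = start + n
            return list(tokens[follow:follow + draft_max])

    return []  # no earlier match, so nothing to draft


# Toy context: the sequence 1, 2, 3 appeared earlier, followed by 4, 5, 9, 9.
context = [1, 2, 3, 4, 5, 9, 9, 1, 2, 3]
draft = draft_from_ngram(context, n=3, draft_max=4)
print(draft)
```

The drafted tokens are only guesses. In speculative decoding the main model checks them in a single batched pass and keeps the longest prefix it agrees with, so when the model really is re-typing something from context, many tokens get accepted per pass instead of one.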