>>108086701
> ./build/bin/llama-server --help 2>&1 | grep cache
-cl, --cache-list                   show list of models in cache
--swa-full                          use full-size SWA cache (default: false)
                                    whether to enable KV cache offloading (default: enabled)
-ctk, --cache-type-k TYPE           KV cache data type for K
-ctv, --cache-type-v TYPE           KV cache data type for V
-dt, --defrag-thold N               KV cache defragmentation threshold (DEPRECATED)
                                    page cache before using this
--offline                           Offline mode: forces use of cache, prevents network access
-ctkd, --cache-type-k-draft TYPE    KV cache data type for K for the draft model
-ctvd, --cache-type-v-draft TYPE    KV cache data type for V for the draft model
-lcs, --lookup-cache-static FNAME   path to static lookup cache to use for lookup decoding (not updated by
-lcd, --lookup-cache-dynamic FNAME  path to dynamic lookup cache to use for lookup decoding (updated by
-cram, --cache-ram N                set the maximum cache size in MiB (default: 8192, -1 - no limit, 0 -
--cache-prompt, --no-cache-prompt   whether to enable prompt caching (default: enabled)
--cache-reuse N                     min chunk size to attempt reusing from the cache via KV shifting,
--slot-save-path PATH               path to save slot kv cache (default: disabled)
--spec-type [none|ngram-cache|ngram-simple|ngram-map-k|ngram-map-k4v|ngram-mod]
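Several descriptions above are cut off, and two lines have no flag name, because grep only prints the lines that contain "cache" while the help text wraps long entries across several lines. A minimal sketch for pulling the neighbouring lines back in, assuming a grep that supports -A/-B (GNU and BSD grep both do). help_ctx is a made-up helper name, not something llama.cpp ships:

```shell
# help_ctx is a hypothetical helper, not part of llama.cpp.
# It reads help text on stdin and prints every line matching the pattern ($1)
# plus N lines of context ($2, default 1) before and after each match.
help_ctx() {
  grep -B "${2:-1}" -A "${2:-1}" -e "$1"
}

# Usage (assumes the binary path from the command above):
#   ./build/bin/llama-server --help 2>&1 | help_ctx cache
#   ./build/bin/llama-server --help 2>&1 | help_ctx cache 2
```

One line of context is usually enough to recover the flag name or the wrapped tail of a description. Bump the second argument if an entry spans more lines. grep separates non-adjacent groups with a "--" line.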