>>107910326
Config defaults from the PR: https://github.com/huggingface/transformers/pull/43031/files
vocab_size: int | None = 154880,
hidden_size: int | None = 2048,
intermediate_size: int | None = 10240,
moe_intermediate_size: int | None = 1536,
num_hidden_layers: int | None = 47,
num_attention_heads: int | None = 20,
num_key_value_heads: int | None = 20,
n_shared_experts: int | None = 1,
n_routed_experts: int | None = 64,
routed_scaling_factor: float | None = 1.8,
kv_lora_rank: int | None = 512,
q_lora_rank: int | None = 768,
qk_rope_head_dim: int | None = 64,
v_head_dim: int | None = 256,
qk_nope_head_dim: int | None = 192,
n_group: int | None = 1,
topk_group: int | None = 1,
num_experts_per_tok: int | None = 4,
norm_topk_prob: bool | None = True,
hidden_act: str | None = "silu",
max_position_embeddings: int | None = 202752,
initializer_range: float | None = 0.02,
rms_norm_eps: float | None = 1e-5,
use_cache: bool | None = True,
pad_token_id: int | None = None,
bos_token_id: int | None = 0,
eos_token_id: int | None = 1,
pretraining_tp: int | None = 1,
tie_word_embeddings: bool | None = False,
rope_parameters: RopeParameters | dict[str, RopeParameters] | None = None,
rope_interleave: bool | None = True,
mlp_layer_types=None,
attention_bias: bool | None = False,
attention_dropout: float | None = 0.0,
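Quick back-of-envelope from those defaults, not from the PR itself. My own assumptions: SwiGLU-style 3-matrix MLPs (which hidden_act="silu" usually implies), every layer being MoE (mlp_layer_types could make some dense), DeepSeek-style MLA caching (suggested by kv_lora_rank / qk_rope_head_dim), and only FFN weights counted, so no attention, embeddings, norms, or router.

# Rough sketch under the assumptions above; numbers come straight from the config defaults.
hidden_size = 2048
moe_intermediate_size = 1536
n_routed_experts = 64
n_shared_experts = 1
num_experts_per_tok = 4
num_hidden_layers = 47
kv_lora_rank = 512
qk_rope_head_dim = 64

def ffn_params(d_model: int, d_ff: int) -> int:
    # gate_proj + up_proj + down_proj of a SwiGLU block
    return 3 * d_model * d_ff

per_expert = ffn_params(hidden_size, moe_intermediate_size)
total_moe = (n_routed_experts + n_shared_experts) * per_expert      # stored per layer
active_moe = (num_experts_per_tok + n_shared_experts) * per_expert  # used per token per layer

print(f"per expert:        {per_expert / 1e6:.1f}M params")
print(f"per layer  total:  {total_moe / 1e6:.1f}M, active: {active_moe / 1e6:.1f}M")
print(f"all {num_hidden_layers} layers total: {total_moe * num_hidden_layers / 1e9:.2f}B, "
      f"active: {active_moe * num_hidden_layers / 1e9:.2f}B  (FFN only)")

# If it really is MLA, the KV cache per token per layer is the compressed latent
# plus the decoupled RoPE key, not full per-head K/V.
kv_per_tok_per_layer = kv_lora_rank + qk_rope_head_dim
print(f"MLA KV cache: {kv_per_tok_per_layer} values/token/layer, "
      f"{kv_per_tok_per_layer * num_hidden_layers * 2 / 1024:.1f} KiB/token in bf16")

So on the FFN side alone it pencils out to roughly 29B total / 2.2B active, less if some layers turn out to be dense.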
Also see https://github.com/vllm-project/vllm/pull/31386/files
llama.cpp? In 2 weeks, maybe.