I'm looking to merge exactly two finetunes that are, as far as I can tell, about equally good. Is just SLERPing between them the best way to do that? And if so, what's up with stuff like:
t:
  - filter: self_attn
    value: [0, 0.5, 0.3, 0.7, 1]
  - filter: mlp
    value: [1, 0.5, 0.7, 0.3, 0]
  - value: 0.5
>Note that we input a gradient of values for the interpolation factor t. The parameters for the self-attention and MLP layers will use different combinations of OpenPipe/mistral-ft-optimized-1218 and mlabonne/NeuralHermes-2.5-Mistral-7B. The other layers are a 50/50 mixture of the two models.
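My reading of that (an assumption on my part, I haven't checked the mergekit source) is that the list is just a set of anchor values spread evenly across the layer stack and linearly interpolated per layer, so t drifts from one model toward the other as you go deeper. Something like:

import numpy as np

# My guess at what the "gradient" means: the anchor values are spread
# evenly over the layer range and linearly interpolated per layer.
# (Assumption based on the guide's wording, not the mergekit source.)
anchors = [0, 0.5, 0.3, 0.7, 1]  # the self_attn t values from the config above
num_layers = 32                  # Mistral-7B has 32 decoder layers

anchor_pos = np.linspace(0, 1, num=len(anchors))  # where the anchors sit
layer_pos = np.linspace(0, 1, num=num_layers)     # normalized layer positions
t_per_layer = np.interp(layer_pos, anchor_pos, anchors)

for i, t in enumerate(t_per_layer):
    # t=0 keeps one model's weights, t=1 keeps the other's, in between is SLERPed
    print(f"layer {i:2d}: t = {t:.3f}")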
Like, is there a reason to think that's a good idea or is it just to demonstrate how to use the configuration file?
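For context, my mental model of what a plain two-model SLERP does per weight tensor is roughly the sketch below (simplified; I gather the real mergekit implementation also falls back to plain lerp when the tensors are nearly parallel, which this ignores):

import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors (my simplified sketch)."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    # angle between the two weight vectors, computed on normalized copies
    a_n = a_flat / (a_flat.norm() + eps)
    b_n = b_flat / (b_flat.norm() + eps)
    dot = torch.clamp((a_n * b_n).sum(), -1 + 1e-7, 1 - 1e-7)
    omega = torch.acos(dot)
    so = torch.sin(omega)
    # interpolate along the arc between the original (unnormalized) tensors
    out = (torch.sin((1 - t) * omega) / so) * a_flat + (torch.sin(t * omega) / so) * b_flat
    return out.reshape(a.shape).to(a.dtype)

With a flat t=0.5 every tensor is just the midpoint of that arc, which is why I'm wondering whether the per-layer gradient actually buys anything over that.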