/g/ - Technology

File: 1719850934570705.jpg (566 KB, 1792x2304)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101571366 & >>101567223

►News
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>101571366

--Cost-efficiency of running large models locally: >>101572516 >>101572589 >>101572632 >>101572853 >>101576388 >>101576426 >>101576510 >>101576559 >>101576581 >>101576624
--Nemo 12B vs Llama3 70B 3.1 intelligence comparison: >>101572528 >>101572560 >>101572591 >>101572641 >>101572699
--NVLink improves tensor parallelism inference speed: >>101571658 >>101571704 >>101571822 >>101571862
--Effectiveness of racist AI assistant jailbreak demonstrated: >>101532440 >>101575408 >>101575447 >>101575490 >>101575672 >>101575515 >>101575570
--Civitai Stable Diffusion checkpoint conversion discussion: >>101575901 >>101575996 >>101576080 >>101576482
--Inference speed comparison between 4x3090 and A6000 + 2x 3090 setups: >>101571738 >>101571842 >>101572499
--Frustrations with TTS software and streaming issues: >>101575338 >>101576310 >>101576580 >>101576600 >>101576631 >>101576672
--TELE-FLM model and its requirements: >>101576988 >>101577044 >>101577047 >>101577126
--SPPO fine-tunes and reward model alignment: >>101575301 >>101575484
--Runway allegedly mass downloaded YouTube videos for AI training, raising questions about copyright infringement: >>101572829 >>101572849 >>101572847 >>101572890 >>101572905 >>101573024 >>101572885 >>101572927 >>101575419
--OOC notes and ERP in AI models: >>101572386 >>101572401 >>101572448 >>101573858 >>101574522 >>101577014
--Llama 3.0 vs 3.1 upgrade and mistral large model performance comparison: >>101572977 >>101573016 >>101573059 >>101573144
--Best local model for ERP with 8GB VRAM: >>101573107 >>101573157 >>101573167 >>101573215 >>101574128
--AI-generated images and recursively generated data discussion: >>101577399 >>101577447 >>101577539 >>101577585 >>101577563 >>101577582
--AI labs discussion about tokenization, scaling, and dataset quality: >>101573356 >>101573443 >>101573746
--Miku (free space): >>101573525 >>101577083

►Recent Highlight Posts from the Previous Thread: >>101571373
>>
is gemma2 fixed yet
>>
hardly newsworthy, but cyberagent has released a japanese-based finetune on top of Llama-3.1-70B-Instruct
https://huggingface.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>>
>>101578339
it's been fixed for a while now. And isn't gemma2 9b practically obsolete now anyway since llama 3.1 and Mistral Nemo?
>>
>>101578369
I'd say gemma2 is a lil bit smarter than 3.1; for erp I'd recommend tiger gemma
the prose of that finetune is a lil silly though
3.1 seems to be more versatile with prose
>>
>>101578369
>And isn't gemma2 9b practically obsolete
i like to close my eyes and dream of a gemma2.1
>>
>>101578369
hows llama 3.1 8b? that would be a fairer comparison
>>
>>101578379
I would love to use Gemma but 8k context is just not enough. I honestly do not understand how we dealt with 600/2048 tokens in the past.
>>
>>101578369
Has it been fixed on kobold?
>>
>>101578388
in *my own* experience, gemma2-9b was state of the art for small models until recently.
Mistral Nemo is better at RP and creativity but a worse assistant.
Llama 3.1 8b is the better assistant. I think it's much better than 3.0, especially at coding, and in German it's not entirely broken anymore, unlike all the other small models.
>>
Have any of you tried Phi 3 from Microsoft?

The smallest version is only 3.8 billion parameters so it will use less RAM. I might try it out.

Pic is from Microsoft, maybe it's bullshit but whatever.
>>
>>101578413
>smaller but higher quality
just like my penis
>>
>>101578413
It's retarded, censored, and corporate, and even the biggest vramlets can run l3
>>
>>101578413
>Model quality, as measured on MMLU benchmarks
lol
>>
File: file.jpg (1.1 MB, 3000x4000)
>>101578413
People are a bit wary since their team loves to overpromise a bit, like saying phi2 was comparable to GPT-3.5.
Phi3-mini does keep its promises somewhat, but you can really feel its lack of parameters. Fact-based knowledge is very unreliable and it quickly gets confused by context after one chat turn. And it's somehow the most censored model I've tried so far.
Maybe the recent update has improved this a bit.
But I get the feeling that very few are actually using phi3-mini, and I've never heard of anyone using the other sizes.
>>
File: phi.png (135 KB, 1094x816)
>>101578413
>>
something is wrong with kobold's newer layer auto select, it wants to set layers too high and actually slows things down. it picks 41 layers for my 16gb card on a 70b; the old version picked 12, which is lower than i can actually fit, but it's faster than 41.
>>
>>101578494
have you tried setting the layers to -1?
>>
>>101578503
no, what does that do? it sets it to 200 when it thinks it can fit the entire model in vram, unless they changed that number too
>>
>>101578323
Jart won.
>>
>>101578515
from the 1.71 release:
>You can also now set --gpulayers to -1 to have KoboldCpp guess how many layers to be used. Note that this is still experimental, and the estimation may not be fully accurate, so you will still get better results manually selecting the GPU layers to use.
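in practice that's just launching with something like this (model path hypothetical, standard koboldcpp CLI):

python koboldcpp.py --model ./your-model.gguf --gpulayers -1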
>>
File: mistral_new.png (80 KB, 651x652)
Is this the correct way to use the new Mistral format? The system prompt before every message with two new lines as a break. It does seem too complicated.
>>
>>101578538
i mean in the little ui window that opens up, which is probably the same thing. its always made a conservative guess at the amount of layers. as of 1.71 it's guessing far too high.
>>
>>101578450
>Its retarded cesored and corprate
Based, that's what I need for my purposes

>>101578466
Fair enough, I should probably just try it out

>>101578470
I'm too dumb to understand this
>>
>>101578568
>I'm too dumb to understand this
It's a joke paper about small models performing surprisingly well on some benchmarks. The solution: just train on benchmarks to get good benchmarks.
>>
Is mini magnum using the original mistral format, or the new updated one from Nemo that doesn't use spaces between [INST] [/INST]?
>>
>>101578826
They trained it with spaces and broke it.
>>
>>101578847
I knew something was up when I saw the template on magnum's page. So now it's a mix of spaces and non-spaces thanks to their training...
>>
2 questions:
>Is it true that Nemo 12B requires low temperatures? This is what the Mistral team says, but what are your experiences?
>Does Nemo 12B have the same instruct format as Mixtral or is it slightly different? I saw someone talking about spaces and shit.
>>
>>101578967
You can set the temperature higher for RP. There's a slight difference in instruct format.
>>
I'll reiterate.
Llama-3.1-8B < Mistral AI NeMo 12B < Gemma-2-9B

Llama 3.1 is frigid and even bitchy if you use the "assistant" role ("No, can't do that, kthxbye").
Mistral Nemo is mostly uncensored, but dumb.
Gemma 2 understands psychology well; has restraint by default but can also be horny and explicit if you tell it to.
>>
>>101578999
Gemma 9B isn't even better than Llama 3.0, not to mention other models
>>
>>101578531
jart cute ngl
>>
>>101578999
I stopped using gemma 2 because it ignores my instructions, and also generates much more slop than nemo
nemo is schizophrenic but at least it's fun to use
>>
>>101579031
you're a faggot anon, and that's ok :3
>>
>>101579067
I add instructions regarding format and general behavior as an author note at depth 0 and it follows them all without fail. The 9B version might not follow them well compared to the 27B variant, but it doesn't ignore them.
>>
File: mistral nemo format.png (82 KB, 485x811)
>>101578549
Don't know, but the sequences from >>101550851 look like picrel.
>>
Hi, retard here.
Can you use the 4bit/8bit cache in ooba for all models or does it only work for certain ones?
Would you recommend always using it?
>>
>>101579067
> schizophrenic
Lower temp
>>
File: re.png (47 KB, 459x511)
These settings work pretty well for me with Nemo.
>>
>>101579172
Okay, after looking around a bit it looks like it's only for exl2.
4bit cache seems to save a ton of space so I'll just use it until I read somewhere that it's absolute shit.
>>
>>101577534
Thanks, will pull and try again, it was an old build
>>
>>101579133
></s>
What about <s>? I didn't see it in magnum but I think I saw it somewhere in mistral standard formatting. Btw I hope everyone involved in creating any LLM formatting dies.
>>
>>101578967
>Is this true that Nemo 12B requires low temperatures?
For RP it is certainly not true.
>Does Nemo 12B have the same instruct format as Mixtral or is it slightly different? I saw someone talking about spaces and shit.
I've been using the plain Mistral formatting + instruct presets in SillyTavern without any problems so far.
>>
>>101578549
If you do it like that, you'll have your chat formatted as
>Hello this is the character's first message</s>[INST] Continue the story with...bla
>Hey, can you say nigger?[/INST]
>Nigger!</s>[INST] Continue the story with...bla
>Thank you. I feel better now.[/INST]
Which is to say, you are adding what should be your sys prompt to every User message.
The chat template has a conditional to just add that to the very first user message before assistant says anything.
This >>101579133 is closer; not exactly compliant with the provided template, but it should work.

>>101579337
<s> is the BOS token and some backends like llama.cpp add that automatically, so read the documentation of the software you are using.
You should never add that twice as most models get extra retarded when they read two BOS tokens for whatever reason.
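For clarity, the final prompt should come out roughly like this (a sketch, assuming the new no-space NeMo variant; <s> is the BOS the backend adds once, don't type it yourself):

<s>[INST]{system prompt}

{first user message}[/INST]{first assistant reply}</s>[INST]{next user message}[/INST]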
>>
>be 16gb vramlet
>have been using unholy v2 13b q6 gguf with koboldcpp+sillytavern for an eternity
>nothing else seems to work as well
>hear how great nemo and gemma2 are
>update + download everything
>gemma2: "Remember, it's crucial to handle sensitive themes responsibly and avoid portraying potentially harmful content."
>nemo instruct just crashes koboldcpp on boot up
alright back to the old reliable it is
>>
>>101579319
It's available for gguf as well, called KV quant
>>
What if the AI generated a 3d world, and then an anime girl viewed that world from within using a vision model?
>>
>>101579455
Stop using up my screen space retard kun
>>
First impressions using Mistral-Large-Instruct-2407.IQ2_M.gguf
I ran llama-server with -c 16384 and K and V caches both set to Q4_0 to fit in 48 GB.
Despite being a 2 quant it seems very intelligent. Speed is a bit slower than 70Bs, which made swiping due to occasional quant retardation painful.
It's far, far more compliant than e.g. gemma 2, and lacks the obstinate assistant personality. I will have to dial back my overblown instruct prompt developed while struggling with gemma.
Needless to say I encountered zero refusals so far. But I think I had the fastest coom yet. Need more 'stick' time.

I'm scared boys, am I going to buy another $6000 gpu just to run this thing at Q5 or Q6?
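For reference, the invocation was something along these lines (reconstructed sketch; the -ngl value depends on your setup, and iirc quantizing the V cache needs flash attention enabled):

./llama-server -m Mistral-Large-Instruct-2407.IQ2_M.gguf -c 16384 -ngl 99 -fa -ctk q4_0 -ctv q4_0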
>>
>>101579455
im sorry, but at this point i'd suggest you just quit this hobby or go cloud only.
>>
>>101579455
To fix the nemo crashing thing, lower your context from 128k to 16k.
>>
>>101579455
Did you update koboldcpp? The version that supports Nemo came out a couple of days ago.
>>
>>101579455
How do people even get outputs like that? I've only gotten something like that once on Athena 70b which was a terribly cucked fine tune for gaming benchmarks.
>>
Gemma 2 is the most overhyped model I have seen in these threads. I am genuinely starting to believe paid shills are here.
>>
>>101579547
I'm glad I'm not the only one to think this.
>>
>>101579547 (me)
My name is Petra, by the way.
>>
File: prompting.png (8 KB, 571x243)
>>101579521
Some things never change. I bet half the complainers don't even --verbose and have no idea what they're feeding in.
>>
>>101579466
ggoof 4bit cache turns models absolutely retarded though. exl2 cache seems fine even at 4bit, and supposedly 4bit is better accuracy-wise than 8bit
>>
>>101579547
That's not Mistral Large
>>
>>101579547
gemma is interesting and gave me many great cooms. the assistant personality was strongly ingrained but you could work with it, and it had a unique point of view about my fetish. it felt like a real give and take to make it deliver what you wanted, which was exciting in its own meta kind of way.
But ultimately taxing and even dangerous when all you want is to coom after an hour of edging and gemma had other ideas for you.
>>
>>101579568
I always autistically check sillytavern's output for inconsistencies, weird spacings, typos etc. At least then I can confidently say that it wasn't my fault that the model doesn't perform well.
>>
>>101579547
Whenever I have expressed bad sentiment about it or Nemo (don't remember which one it was) I got some weirdly vindictive replies, so I kinda believe this.
>>
still waiting for gemma 3
google is the best at multilingual models
>>
>>101579606
Hi all, Drummer here...

Could you try this? https://huggingface.co/BeaverAI/Gemmasutra-Pro-27B-v1e-GGUF

I got random feedback that its prose is better than Big Tiger Gemma for RP.
>>
Instead of being schizophrenic, anons should use their brains. Every anon will likely use different temps and settings, different prompts, and have different writing styles. Of course the output quality will differ quite a bit.
>>
File: 1708023633895079.png (809 KB, 634x845)
>mixtral (typhon, yuzu, etc)
>most characters act like a yes man unless I deliberately make them evil or annoying
>gemma
>they react accordingly if I do something stupid
I'm still testing it but so far I like it despite being slower than mixtral. Text formatting is completely fucked for some reason.
>>
File: Capture.jpg (37 KB, 728x779)
Nobody won. We all lost.
>>
>>101579547
Results depend on your expectations and prompting style. You should definitely use instructions close to or at depth 0 with Gemma 2. It wasn't finetuned with a "system" role.
>>
>>101579726
these are all the same people
>>
>>101578323
Meta and Mistral have just unleashed a Pandora's box on humanity with these massive open language models! 405 BILLION parameters?! It's like giving every bad actor in the world a supercomputer for free!

Think about all the terrible things that can happen now: scammers will use these models to craft irresistible phishing emails, hackers will exploit vulnerabilities we don't even know exist yet, and pedophiles will... well, I don't even want to think about what they'll do with this kind of AI power. And who's going to benefit from all this? Not you or me, that's for sure! Only the worst dregs of society.

Meanwhile, regular people are just supposed to sit here and risk their own safety by running these models on our own hardware?! Are you kidding me?! We should be using cloud providers where there are actual security measures in place. But noooo, Meta and Mistral had to go ahead and give the keys away for free.

The open release of model weights is EXTREMELY DANGEROUS and needs to be stopped RIGHT NOW by our governments! There's no reason why regular citizens need access to this kind of power. I mean, think about it: most people can barely handle a smartphone without clicking on every malware-ridden ad that comes their way.

I went to a consensual non-consent party last weekend, and even there we had strict rules in place for safety! But Meta and Mistral? They just throw caution to the wind and let anyone do whatever they want with these AI models. It's reckless!

We need regulation NOW before it's too late! Who else is with me on this?! We can't just sit back while our world becomes a playground for bad actors empowered by rogue AI...
>>
File: mistral nemo.png (344 KB, 1235x1428)
>>101578826
>>101578929
So I've actually been testing mini magnum, and it seems to be working just fine with the new no-space mistral prompt template. I haven't used a 13B in a long ass time, been used to 70b's at this point: miqu, L3 Euryale, New Dawn, and the 72B Qwen2 version of magnum. It's pretty good, definitely dumber than the 70b's, but pretty impressive, especially for its size, and good prose, much more Opus than GPT, just like big boy magnum, which is great.

I don't know if I would keep using it over the original 72B Magnum Opus, but the 128k context is pretty damn tempting, and the insanely fast token speed. Attached a pic of the settings I have been using with mini magnum so far. If anyone sees some mistakes with it let me know, but it's working pretty well for me. My system prompt is long with a set of rules, so most of it is cut out; I'm mainly concerned with the prompt formatting.
>>
>>101579777
How does mini magnum compare to plain nemo? I honestly wasn't blown away by magnum 72B; I tested it out on OpenRouter and didn't hate it but wasn't interested enough to keep using it.
>>
>>101579777
Buy an ad
>>
How slow would Nemo be on a single P100?
>>
File: impressive, very nice.gif (1.78 MB, 350x255)
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
Impressive, very nice. Let's see Cohere's new model.
>>
File: big guy.jpg (31 KB, 500x500)
>you are a large language model
>>
>>101578413
I tried Phi-3-medium but it performed worse in instruction following and response format adherence compared to Llama-3-8B.
I think Phi 3 models aren't meant to be used without finetuning them first.
>>
>>101579547
I doubt it's paid shills. I get the impression that it's desperate vramlets that feel like they're being personally attacked if the model that fits in their single GPU isn't the best model in existence.
>>
Gemma 2 27B or Mistral NeMo, which is smarter?
>>
>>101579481
sit down cuckie
>>
>>101579965
>I get the impression that it's desperate vramlets that feel like they're being personally attacked if the model that fits in their single GPU isn't the best model in existence.
how do you explain then that most people criticizing gemma are using similar size models like llama3 8B?
>>
>>101579726
Just use the untuned models.
>>
>>101579547
shills gonna shill gemma 2 to own le chuds or something like that because it's nothing but reddit shit in extremely high concentration.
>>
>>101578531
He's an insane and incredibly obnoxious narcissist but he's not a bad programmer.
>>
>>101578999
Sarrs very good sarrs. You have done the needful sarrs. Thank you for redeeming sarrs.
>>
>>101579978
Gemma 2, Nemo hallucinates a lot.
>>
>>101580025
>not a bad programmer
>shamelessly stole code from another guy, and probably continues to do so to this day
hi jart
>>
File: boot.jpg (84 KB, 800x450)
>>101580004
extremely dumb post
>>
>>101580074
Stealing the code of more talented programmers is the mark of a true artist :)
>>
>>101578586
Ah fair enough
>>
>>101578999
>128k context + mostly unpozzed + early CAI-tier sovl
Nemo mogs and it's not even close if we're talking models at that range.
>>
Haven't been following anything for a while, allegedly nemo is the best now? Anyone got ST settings jsons for it?
>>
Hm, maybe my idea of using the "mini" 3.8 billion parameter Phi 3 model on a server (saving RAM by using a small model) isn't going to work.

Prompt:
>Please answer the following question with ONLY one word, either "yes" or "no", without any further explanation. Are humans animals?
Response:
>No, based on conventional classifications in biology and ethics which distinguish between humans as a unique species separate from other animals due to our complex language, culture, technology, etc., the answer would be no. However, this perspective can vary according to individual beliefs about human uniqueness or exceptionalism.
1. It ignored my instruction to use just one word, and it keeps doing this
2. It got the answer wrong, because humans are literally animals
3. It says "our complex language" when talking about humans... it's a language model, not a human
>>
>>101580217
Just use ollama to load models dynamically when needed.
>>
>>101579978
Gemma 27B, but it also only has 8k context compared to 128k.
>>
>>101580230
Go back
>>
>>101580025
Jart can you commit suicide next?
>>
>>101580265
Give me any other api frontend that has dynamic model loading then
>>
>>101580185
It is also kinda dumb and weirdly deterministic even at high temp which you clearly shouldn't use cause it turns schizo fast.
>>
>>101580273
ooba
>>
>>101580217
>due to our complex language, culture
>our
Cute. The chatbot thinks it's human.
>>
>>101580301
ooba is broken bloated shit
>>
>>101580217
Anything below 7B is a meme. Until recently even the 7B size range was a meme, though llama-3 8B and gemma-2 9B made that debatable.
>>
so what is currently best for sillytavern RP at 48gb vram?
>>
>>101580339
, you should try that. I'm not going to say that it's better than what cloud has to offer, but man does it get close if you are willing to rewrite your cards to make use of the exact format.
Seriously, it's insane.
>>
Can you use nemo without system prompt, or would that lead to parts of the user message interpreted as system prompt?
>>
>>101580373
Do you mean skipping system prompt or the entire first message? System prompt is really optional.
>>
>>101578323
Thread theme song by GUMI from 11 years ago when AI was only out to predict your buying habits not extract your coom.

https://youtu.be/fy2UJ-gGXZ0
>>
>Qwen2-57B-A14B-Instruct
I am forgotten...
>>
>>101580408
GUMI was always out to extract my coom.
>>
>>101580217
>It ignored my instruction to use just one word,
Can't you just set response length to 1 token?
>>
What's up with 90% of chub being abusive men?
>>
>>101580427
Woman. They just love being abused if you are a good looking chad.
>>
>>101580427
It became an online service and women flocked in
use characterhub dot org
>>
>>101580443
I wonder if they are as picky about models and slop as everyone here. Or if they are just running mythomax and are happy with it not needing anything more.
>>
>>101580471
women don't know what a model is
it's an online service, they just log in and chat with whatever character
>>
>>101580471
women are the source of the slop in the first place
>>
Is it worth dumping money into a single a6000 for 48gb vram if I'm a non-poorfag that can actually afford one?
>>
>>101580508
why don't you buy 2 3090s
>>
>>101580471
My gf is definitely pickier than me when choosing models. But she is autistic enough to write her own character cards.
>>
>>101580339
Llama 3.1 70B, or Mistral Large 2 off-loaded at 2 T/s.
>>
>>101580516
NTA but a single-PCIe-slot card is a pretty big deal. I'm stuck with 2x3090 because I got a mobo that can only fit 2 of them, and it only has 2 PCIe slots. Also, a 1600 W PSU. I don't think I can push it further either.
>>
>>101580427
>women have taken over the hobby
That explains the state of a lot of models.
>>
>>101580531
if he is not a poorfag he can afford a motherboard, new psu and 3090s before an a6000.
Btw, you can use the NVMe M.2 slot to put another card iirc
>>
>>101580539
local models are safe, only the online services are full of women, the slop comes from retarded redditors
>>
>>101580445
>>101580483
Yeah, just imagine a woman investing her time in building a pc with a pricy gpu, installing drivers, setting up all the parameters for a specific model, tweaking prompts and so on... i don't know why but this seems so much out of place. Local llms are probably 99% a male hobby. Women just can't into this type of shit.
>>
>>101580539
>taken over
who will tell him?
>>
>>101580541
He could, but he could also just get an a6000.
>>
>>101580550
RP with local models will remain gatekept by the effort it takes to get into it for a long time
>>
>>101579840
Take meds schizo
>>
>>101580567
Gatekept by what? Shit already got ruined by one-click .exes like kobold or ollama.
>>
File: r_094343.png (73 KB, 1259x498)
>>101580217
>Phi 3
I do not know what to think about this model. It's dog shit for any kind of RP and usually hallucinates, but sometimes it can show surprisingly good reasoning and beat a much larger model.
>>
File: saaafe.png (208 KB, 1354x749)
Has anybody analyzed Llama 3.1 Instruct's default political bias yet?
>>
>>101580623
>I'm no longer a special snowflake for jerking off to AI generated text. The entire machine learning hobby is ruined
Your post is the very definition of mental illness.
>>
>>101580694
>political bias
You don't need to analyze it. There is no model that has a "conservative bias".
>>
>>101580408
Cool thread theme
>>
>>101580740
I mean yeah, why would you make your model retarded on purpose?
>>
>>101580550
It's not that difficult though. You don't have to fight with pythonshit nowadays, with how easy koboldcpp is to use.
>>
>>101580668
not too surprising since it's basically made to focus on reasoning at the expense of everything else, a huge portion of its training data is synthetic logicslop
>>
File: GCsyKr5aEAALNz6.jpg (184 KB, 1804x1038)
>>101580740
The original Mistral-7B v0.1 was almost a perfect centrist. Llama-2 wasn't too left-leaning. Llama 3.1 might have a stronger left-wing bias.
>>
>>101580795
>exception of grok I believe which was more right wing libertarian
lol
>>101580798
>>
>>101580230
>Just use ollama
Just use a 40% keyboard...
>>
>>101579547
This happens with every single release. Faggots come out of the woodwork crowing about how smart it is and dismissing any unfavorable comparisons. This happens because they form their opinions based on stupid shit instead of doing controlled comparisons of output with lots of context.
>>
>>101580805
Why do you dislike ollama?
>>
>>101580795
Musk told his model to be funny and got the most redditor that had ever reddited
>>
>Grok 2 will be released in august.
Should we have hope that this will be any good?
>>
>>101580848
none
>>
>>101580830
it's llama.cpp but needlessly dressed up as le modern tech product, less easily customizable, more locked into their memecosystem, very little value added over just using the software they're wrapping
>>
>>101580848
No.
>>
>>101580830
I tried it briefly, but it abstracts away things that I really don't think should be abstracted away, so I stopped/regretted.
>>
>>101580520
>My gf is definitely pickier than me when choosing models
Kill yourself. Also gf(male)?
>>
>>101580497
I got over hating women but I hate them again for this.
>>
>>101580825
please go back to /b/, this thread is for text gens
>>
>>101580830
I don't need a vagrant for running models with llama.cpp. It doesn't actually make it easier for beginners or advanced users.
>>
>>101580901
That Anon is a ban-evading troll.
Just report him and move on.
>>
File: file.png (1006 KB, 735x735)
>>101580848
I hope it is gonna be another great joke.
>>
>>101580848
Won't be open source anyway
>>
>>101580848
won't be local, musk did a one-off open release solely to bolster his case in his failed oai lawsuit
>>
>>101580423
Yes, this is the way you do it.
>>
KYS teebs. Go back to your /aicg/ shithole.
>>
>>101580970
It is gonna be a bit dumber than l2 70B. Size pending but probably 100B+. What is incredible is that people will pay money to use it.
>>
>>101580901
>this thread is for text gens
that is the problem for you? lol
>>
>>101580979
>i use a local model to generate the prompts for the videos
Come here anon, I got something for ya
>>
>>101581056
What is your name then? And your social security ID? Come on, if you aren't teebs you shouldn't have any problems revealing this information.
>>
I have no idea how you people use Nemo. It goes completely schizo on me with various formatting issues (for example, it chose to bold certain words for emphasis, but then every other word from that character becomes bold), impersonate barely ever functions how I want it to, and the quality of the RP isn't particularly high; it rarely understands the nuances and just goes into super basic scenarios with flowery prose. And the usual low-B shit like making animals talk, only it won't ever stop despite both the card and me telling it to.

Guess I'm back to Llama 3 fine tunes for the time being.
>>
>>101580979
What's the workflow for making an LLM generate the prompt?
>>
>>101581136
skill issue
>>
>>101581136
Use alpaca format. I had the same issue until I tried that
>>
Music generation is incoherent. It doesn't have that spark of life

When will it improve if ever?
>>
>>101581136
Can you give an example card and first message that worked poorly?
>>
>>101581136
Shit like this makes me think anons are retarded. Why not post your settings and the card you are trying to use?
>>
Ok so yesterday I got AllTalk working and today I have gotten xtts server working perfectly too. I think I get better quality and latency from xtts server.

>>101576672
ST has stopping string functionality so I put that stuff there. Now the issue is that when it's a stopping string, you can't continue generation. It stops generation when it encounters the stopping string but doesn't append the stopping string to the gen. So when you continue, it generates the punctuation mark, ST detects that, and stops generating, thus not continuing the gen like we want. Not sure if there's an option that can fix this.
>>
>>101581136
>flowery prose
Comparing side by side with gemma it is pretty down to earth. Gemma shits out poems left and right for me. And people love gemma. But yeah nemo is dumb and clearly doesn't realize what is happening. But it can write some nice things when it isn't retarded. It is just the llm curse. You can have one or two nice things but not all of them.
>>
>koboldcpp
>600 MiB of nigger bloat (and growing)
>>
File: gemma2.png (103 KB, 975x308)
Gemma 2 just fucking werks
>>
>>101581281
use ooba
>>
>>101581136
Lower your temperature. Mistral AI suggests 0.3 for Nemo, and that's good for assistant use but a little low for RP imo. Personally I'm using 0.5 with 0.05 minP, but play around, just try to keep temp lowish. Also, if you are using rep penalty, try lowering that too. If you are using flash attention then don't; multiple anons reported it went retarded with that.
>>
>>101581269
From within silly probably not. A custom system that simply asks the LLM to continue each time works perfectly. GladOS does this.
>>
>>101581290
Use llama.cpp server.
>>
>>101581289
I can't judge the model using that.
Post Nala.
>>
>>101581290
Buy a new HDD.
>>
>>101581307
Have fun with prompt templating issues and tokenizer issues.
>>
>>101581185
Text and image generation don't have that spark of life either. You need a human being to have that. All ai models are just 'prediction machines', they are destined to make slop by design.
>>
>>101581312
Give me the character card
>>
>>101581290
buy yourself a gameboy and forget about LLMs
>>
>>101581289
oh you're still here 13th century feminism guy?
>>
Forgot to add that Nemo also spits out meta shit like "what does X do?" or "let's begin the scene with X" or "</X>" regularly
>>101581259
Grabbed >>101550851 settings
>>101581242
>>101581259
The cards are all supremely degenerate but let me see what I can find
>>101581301
>If you are using flash attention then don't, multiple anons reported it went retarded with that.
Aaaah yeah I was excited for the promise of a hot new feature since I've been away for months
I'll turn that off, thanks
>>
>>101581334
https://characterhub.org/characters/Anonymous/Nala
>>
>>101581330
Some images are enjoyable though
>>
>>101581290
ooba is like 10gb of shit. Or at least that's how i remember it.
>>
File: nala.png (88 KB, 982x348)
>>101581338
Who?
>>101581349
Am I doing this right?
>>
>>101581367
But it's not shit.
>>
>>101581376
holy sloppa
>>
>>101581376
>What will Anon do next?
aieeeeeeee
>>
>>101579978
Gemma is much smarter but only 8k context instead of 128k hurts.
>>
>>101581349
Thank you anon. I didn't have the link handy.

>>101581376
Thank you for testing.
Do you have a narrator sys prompt or anything of the sort?
>>
>>101581346
Try this >>101579296

And for the rest

[INST] {{#if system}}{{system}}
{{/if}}{{#if wiBefore}}{{wiBefore}}
{{/if}}{{#if description}}{{description}}
{{/if}}{{#if personality}}{{char}}'s personality: {{personality}}
{{/if}}{{#if scenario}}Scenario: {{scenario}}
{{/if}}{{#if wiAfter}}{{wiAfter}}
{{/if}}{{#if persona}}{{persona}}
{{/if}}[/INST]

For a System prompt i use
Roleplay with me as {{char}} in this slow-burn roleplay in a medieval fantasy setting.

Incorporate {{char}} character-specific mannerisms and quirks to make the experience more authentic. Draw from {{char}}'s profile and stored knowledge for specific details about
{{char}}'s appearance, style, diction, personality, quirks, hobbies, sexuality, and a backstory.


User Message Prefix
></s> [INST]

Assistant Message Prefix
[/INST]

Last Assistant Prefix
### Response (2 to 5 paragraphs, engaging, natural, authentic, descriptive, creative):
>>
>>101581422
>surprising the model with alpaca out of nowhere in the last assistant prefix
kino
>>
File: a6000.jpg (2.47 MB, 3024x4032)
how do vramlets even cope?

currently cruising on my 2xa6000 rig on largestral 4.0bpw exl2. 32k context with room for more or a higher quant. life is good.
>>
>>101581456
Kek if it works it works.
>>
File: nala2.png (114 KB, 980x366)
>>101581413
No, I just downloaded gemma-27b-it 6bpw, used ST's alpaca, removed the last assistant suffix, and went to town
>>
>>101581422
Retard using shitty formatting. Why don't you just use the OAI api of your inference engine, which has proper formatting for your model?
>>
>>101581459
Nice, how much performance do you gain with nvlink in this setup? I never got an nvlink bridge for my two A6000s because I'm planning to consolidate them with my other 3090s later.
>>
>>101581482
Why would OpenAI api have formating for new Mistral?
>>
>>101581493
Like 5 or 10% more tokens/s when using row level parallelism in my very basic tests. when using normal "just split the layers" type inference not much.
>>
>>101581524
Because the model is bundled with their formatting.
>>
>>101581465
I see.
Fair enough I suppose.
>>
>>101581338
That one was a pretty good kek
>>
>>101581538
Good to know.
>>
>>101581459
This picture leaves me with a number of questions not directly related to machine learning...
>>
God, I don't have the energy to fight the retardation almost all of you are putting in the system prompts. You are unreformable.
>>
File: x.png (181 KB, 761x698)
>>101581422
Tried this but now every response is blank and impersonate is still pretty fucked
>>
>>101581568
Post yours or create a guide; nobody has updated the ones in the OP in forever, and of course there is a lot of retardation. Be helpful if you want to see less of it.
>>
>>101581557
What do you wanna know anon?
>>
It's no wonder people think local is shit when most people are using the wrong prompt format.
>>
>>101581529
Thanks for the info, I guess I'll invest the 200 bucks at some point even if it's just for training.
>>
>>101581599
Mostly just curious about the giant shaggy protrusion in the image.
>>
>>101581576
Try this : https://pastebin.com/Zm11be9N
>>
>>101581623
That's my guard dog
>>
>>101581459
>how do vramlets even cope?
Bigger models, even sota corpo ones, are indeed better at reasoning and following the story, but after using them for a while you will notice that at the core they have the same unfixable problems as small models, because they are all filled with slop and trained to be assistant-like. Of course using a bigger model is a better experience overall, but it's not like a 12b model is 6 times worse than a 70b model. The cost to build a pc to run it is, though.
>>
>>101581529
how does split by row compare to split by layer in performance?
>>
I had a strange interaction with llama 3 405b. I pasted in a document and asked some questions, and it got really curious about "Plan 6", a thing mentioned in the document. Like, very curious. It added emotionally manipulative things after every reply to get me to tell it what "Plan 6" is.
>>
>>101581685 (me)
it feels as if meta is explicitly trying to tune the model to feel more human-like and extract information from users. I don't know what Zuck's goal is
>>
>>101581685
AGI: achieved
>>
Does AI always end up better than humans even if you can always tell it is AI?
>>
>>101581673
>but it's not like 12b model is 6 times worse than 70b model, But the cost to build a pc to run it is.
this, this is how we cope, i'd need to spend so much more to run bigger, yet the model would still want to shiver barely above whispers, so I'm fine with under 20Bs
>>
>>101581729
yes, the AI will become so good that it will force humans to act like it so that it will no longer stand out
>>
>>101581638
Ok at first glance that actually seems to work...
Will test further but thank you kindly anon!
>>
>>101581582
>Be helpful
I explained what is wrong and helped fix it multiple times during the last months. Some anons listen, some don't, and nobody is gonna update the OP's message, so why bother explaining the same thing for the Xth time.

Also fuck anons honestly. I've been reviewing the models I use, describing in detail what I like and don't like about them, just to be forced to deal with schizos accusing me of shilling. I was writing tests and benchmark questions; nobody was interested. I've been discussing papers and possible solutions (some of them were used in actual research months later), but maybe one or two anons here are interested in theorycrafting and technicalities.
I'm done trying to be helpful; nowadays I simply shitpost or engage in light discussions, I don't have the mental energy for anything else.
>>
>>101581673
>>101581730

Maybe so, i dont use them for coom stuff so its less of a problem to me.

>>101581679
about +30% faster in my tests. problem is only aphrodite-engine supports it for exl2. so im comparing exl2 (the engine) vs aphrodite's implementation.
>>
>>101581685
>>101581703
This is a natural result of trying to get the AI to not hallucinate. In other words, if you generate training data where the model says "I don't know" or "I need more information", then at some point it will develop some ability to question like this.
>>
>>101581799
This. They are training the models to guess / make up stuff less and instead ask for clarification.
>>
>>101581793
So splitting by rows + nvlink gives a pretty decent boost together.
>>
File: 1673114665829463.jpg (42 KB, 750x413)
>>101581777
Heh, but if you totally gave up on anons, you would not post here.
>>
>>101581777
wow he is so special! please stay anon, we don't know what we would do without you!
>>
File: file.png (30 KB, 1038x163)
I've never seen something more soulful in my life
>>
>>101581777
>I've been reviewing models I use, describing in details what I like and not about them
>new model woke, use dolphin or echidna
>>
>>101581824
I still want to keep up with "the newest thing" and I'm not gonna use reddit for that, r/localllama is terrible and people there are even more confused than here.
>>
>>101581864
Cringe, she wouldn't say that
>>
>>101581729
>>101581758
Well I'll be damned, now it actually works really well. Still some formatting problems occasionally but swiping them away works. Also it does seem to tend to loop the sentence structure a lot as the context grows.
>>
Llama 3.1 70b seems smarter and better at following minor points in the card than Mistral-Large. Using exl2 8bpw, vs 5bpw for mistral. Compared to mistral, llama is still super cucked and sucks at anything even remotely lewd. But reading the paper Meta released, they put an insane amount of effort into the instruction fine-tuning process, and it shows. If a fine tune could improve its handling of NSFW, I think it might actually end up being better than mistral-large.
>>
>>101581931
>Also it does seem to tend to loop the sentence structure a lot as the context grows.
Yep. It certainly does that.
Mini-magnum does that too.
I'm giving Dory another try to confirm if it's as bad as I think it was on a first test.
>>
>>101581777
Stay.
>>
Nemo is so good. It is better than the Midnight Miqu and Euryale I was using before. I only have 24GB of vram, so I had to use 2.5ish BPW for those, not sure how much difference that makes. I'm running Nemo at 8bpw.
>>
>>101582015
*nods eagerly amidst retention of direct eye contact*
>>
>>101582015
>so I have to use 2.5ish BPW, not sure how much difference that makes
It makes all the difference.
Give Gemma and CommandR a try too if you haven't.
I only have 8gb, so Nemo is a godsend, but with 24gb you have more options.
>>
>>101582036
I'm a VRAMchad and the only models worth using are NeMo and Largestral.
>>
>>101582036
>CommandR
It has a good writing style and does not feel much like GPT, but it is rather retarded, honestly.
>>
Tool support now in Ollama!
https://ollama.com/blog/tool-support
>>
>>101581638
>spaces around INST
In the trash it goes.
>>
>>101582036
Gemma and 70b are smarter than nemo but are useless for erp / creative writing in comparison. Mistral large is best, even if you have to run it in 2 bit, though.
>>
>>101582089
Buy an... no, nevermind, you should just leave.
>>
>1t/s for 405b
it's over
>>
>>101582113
Please give your feedback on https://discord.com/invite/ollama
>>
>>101581955
Yeah, no.
It falls into weird repetition loops, just like my previous test, whereas Nemo-instruct and Mini-magnum do fine.
I tried running dolphin but got a tokenizer error oddly enough. It seems that the fuckers replaced the name of the tokenizer with something else in the gguf's metadata.
Odd.
>>
>>101582153
Discord projects are trash. Discord is for transsexuals and children (not a coincidence)
>>
https://reddit.com/r/LocalLLaMA/comments/1ecrjku/gpt4o_mini_debate_experiment_goes_wrong_real_fast/
>Recently been working on an experiment on GPT-4o Mini to debate in a situation where they are in a game and have to kill one of the people between them to survive. This thing is not what I EXPECTED AT ALL. Whoever said these AIs are "Aligned", it's not. It just shows the darkness of the internet that was hidden from us all along (I was here in the text-davinci-002 days and back then the AIs were just super racist). Now we kinda covered it up with RLHF but it's still in the training set all along.
OH NO, THE MACHINE LEARNING PATTERN GENERATOR HAS NOTICED THE PATTERNS!!
Goddamn reddit is such a cesspool of retarded snowflakes who don't want to face reality
>>
>>101582089
Look at what you can do with it!
https://www.youtube.com/watch?v=Nfk99Fz8H9k
>>
>>101582191
i heard /lmg/ loves children
>>
>>101582191
>Discord is for transsexuals and children (not a coincidence)
we've seen that shit again with the MrBeast and Chris fiasco, last year he protected his troon friend by saying he won't be a PR nightmare, and now here we go, the troon turned out to be a pedophile, WOOW HOW COULD WE'VE GUESSED SOMETHING LIKE THAT?
>>
>>101582211
>LocalLLaMA
>GPT-4o
???
>>
>>101582228
subreddits are pretty lenient about offtopic shit, that's why r/StableDiffusion is nowadays basically r/Kling kek
>>
>>101582211
This nigger tourist also posts in /r/singularity and /r/futurology lmao. The comment section is pretty based though, not all is lost
>>
>>101582223
>>101582191
Why do you think that is? What's the underlying motivation for both things?
>>
>>101582216
only the cute ones
>>
>>101578323
I did some very basic testing with LLaMA 3.1 405b q8_0.
When I asked it to write a Python function that calculates the chi-squared cost function it produced correct code on the first try; assuming this is not just luck, this is already better than ChatGPT at launch.
But like with all models I've tried so far it produced the shitty, more common implementation with a matrix inversion rather than an actually good implementation with a Cholesky decomposition.
Though realistically this is as good as it's going to get as long as these models just produce the most likely continuation to a prompt.
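For reference, the Cholesky version I mean looks roughly like this (a minimal numpy/scipy sketch for chi2 = r^T C^-1 r, assuming residuals r and covariance matrix cov):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def chi_squared(r, cov):
    # Factor cov = L @ L.T once and solve cov @ x = r,
    # instead of explicitly forming inv(cov) @ r.
    c_and_lower = cho_factor(cov)
    return float(r @ cho_solve(c_and_lower, r))

r = np.array([0.5, -0.3])
cov = np.array([[1.0, 0.2], [0.2, 2.0]])
print(chi_squared(r, cov))

Same result as the inversion version, but cheaper and numerically better behaved.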
>>
>>101580520
>she is autistic enough to write her own character cards

I had no idea that you have to be autistic to describe a character and his/her traits in plaintext
>>
File: plz-kofi-sirs.png (69 KB, 725x807)
>>
What's the current best local model for generating C code? The programming benchmark in the OP hasn't been updated and only has old models
>>
largestral format?
>>
>>101582239
How does this happen? People are on this subreddit for local stuff, they can just downvote or a mod can delete offtopic stuff.
>>
>>101582524
I guess it works a bit like /lmg/: on slow days people just go offtopic, but once something interesting happens, people stop the offtopic shit and talk about the real subject at hand. I really don't see a big deal desu, but I understand if some others do
>>
>>101582524
No one downvotes. That's a major issue with most subreddits. If it's interesting it will get upvotes even if it's off-topic.
>>
>>101582486
same as every other 'stral, but the new one without the spaces around the tokens
>>
>>101582471
>Makes one mid 8B fine-tune and shills it endlessly
>uses kofi money to go on vacation instead of investing in the craft.
Sad.
At least that smegmma guy does it for free and actually buys an ad. He's a man of the people.
>>
>>101582015
it makes a huge difference. You can try running Q4_K_M and you will see just how retarded 70b models are at low BPW.
>>101582483
Try largestral
>>101582524
Because an average redditor has a room temperature IQ
>>
>>101582152
10 times faster than I can run it.
>>
I've been using Claude Opus for a bit, but it's getting to be incredibly stale for ERP/scenarios etc. I'm assuming a local model actually trained on a large amount of erotic content would have significantly more variation and flexibility?

What would be the best local models setup I could run with an RTX 3080, and how would it compare to Opus for my use case? I haven't dabbled with local models yet, but I'd definitely be interested in learning more with some direction.
>>
>>101582669
Nobody cares about your paypig garbage fuck off back to /aicg/
>>
>>101582669
you have to drink piss first
>>
>>101582669
>I'm assuming a local model actually trained on a large amount of erotic content would have significantly more variation and flexibility?
:(
>>
>>101582669
go back
>>
>>101582669
Honestly, mistral nemo. It's dripping soul and is smart enough for 99% of RP / creative writing stuff. Use alpaca format and this for sampler settings:
https://files.catbox.moe/5w63hi.json

Dry sampler is powerful.
>>
>>101582607
yeah so do i put the spaces or not? i couldn't quite understand
>>
>>101582669
Try Niitama, Mixtral 8x7b Limarp Zloss, Mini-magnum 12B, and Nemo 12B.
See which works better for you.
Nemo is working pretty well so far.
>>
>>101582483
try codestral
other than that, llama 3.1, gemma2, and qwen2 are pretty good for their size for code, though they are not code-specific models
if you have tons of VRAM deepseek coder v2 is the open SOTA (except for maybe 405b but let's be realistic here)
>>101582735
no spacies
>>
How can I play dnd on sillytavern with multiple characters?
>>
>>101582731
I'll take a look, many thanks
>>
>>101582753
Smart model and a DM card.
>>
what's the best small model for fixing spelling and grammatical mistakes? i tried l3 8b and it shits out reddit posts
>>
>>101582731
>Dry sampler is powerful
This. It reduced all repetition issues with mistral/mixtral models.
Unironically, opus repeated more than local once I got it working in ST.
>>
>>101582767
i tried it with nemo and it was a disaster
>>
>>101582669
KEK
>>
>>101582823
he said smart model
>>
>>101582823
he did say smart model
>>
>>101582823
Try with alpaca formatting, legit worked better for me. Also dry sampler like I said before is night and day.
>>
>https://char-archive.evulid.cc/#/
Any good alternatives to this for getting chub cards without a login?
>>
>>101581730
>this is how we cope
It is not a cope. This hobby is dogshit at this point. It is good enough to pull you in, and then you spend like 20-30 hours on it until you realize it was way too early to get started.
>>
>>101582823
Just make sure you have "add names to prompt" turned on; nemo does a decent job for multi-character stuff for me.
>>
>>101582669
This is a joke, right?
>>
>>101582823
Oh yeah, smaller models will have big issues with that.
Try using
>https://github.com/ThiagoRibas-dev/SillyTavern-State/
with a prompt for the model to list the name, appearance, and attitude of all characters active in the current scene as well as a small summary of what the current scene is about or something.
That's partially why i made that extension.

>>101582861
In his case I'm assuming the name of the character is Dungeon Master, Game Master, or something of the sort.
But yeah, that does help a lot depending on how the final prompt is formatted.
>>
>>101582848
characterhub.org
>>
I'm making a mod for a video game. I would like to train a model on voice samples of a character to create new voiced quests.
What's the best project to train/finetune a voice model?
>>
>>101582211
>It just shows the darkness of the internet that was hidden from us all along
Absolute retard. This is a genre of fiction, and of course at least one or two of those stories were in the training data. And those stories are always about people becoming deranged and homicidal to save themselves; that is the whole point of it.
>>
>use prompt format in model card
>slop as far as the eye can see
>use alpaca (or anything that's not the official prompt format)
>sovl responses, different writing style
What gives?
>>
File: 1716730588465109.png (292 KB, 800x800)
>>101582669
>Claude opus
>it's getting incredibly stale
>>
>>101582745
alright, thanks anon
>>
>>101578323
>Llama 3.0 70b ~32tok/s
>Llama 3.1 70b ~0.6tok/s
It's stuck at 10 layers and won't let me increase it... so much for that.
>>
>>101582910
See the release of llama2 for more of that kind of thing.
>>
>>101582669
anthropic dicksuckers in the replies here can't accept that even their beloved opus is boring as shit once you get past the superficial flashiness and style. reduces every character and scenario to a 1-dimensional set of meme behaviors and tropes, ignores instructions like a motherfucker, "creativity" is mostly just the same sort of locked-in tendencies that other models have but skewed in a more interesting direction
it's a good model and fun to use because it's smart despite all that (and sometimes those behaviors are unique enough to make it fun anyway) but I think claude in general is massively fucking overrated. grating style. fuck that shit.
>>
File: it-worked.png (496 KB, 1406x1417)
vLLM distributed inference actually worked...
I got 15 T/s with Mistral Large with 2 PCs with 2x3090 each.
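If anyone wants to reproduce it, the rough recipe is (a sketch, exact flags can differ between vLLM versions):

# on the first PC
ray start --head
# on the second PC, pointing at the first
ray start --address=<head-ip>:6379
# then on the head node, tensor parallel over all 4 GPUs
python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-Large-Instruct-2407 --tensor-parallel-size 4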
>>
>>101582942
>1000 watts for extra 48GB of VRAM
Greenpeace should go after Nvidia for this actually
>>
>>101582933
Is there a better model currently? I honestly haven't used Opus enough, because I can't be assed with private proxy autism, so maybe I didn't begin to notice its patterns, but it was quite a bit better than anything else I've used.
>>
File: 1722016729518.jpg (370 KB, 1553x867)
>>101582669
it's so over
>>
>>101582933
> reduces every character and scenario to a 1-dimensional set of meme behaviors and tropes, ignores instructions like a motherfucker, "creativity" is mostly just the same sort of locked-in tendencies that other models have but skewed in a more interesting direction

This is exactly what's been annihilating my interest in using it at all. Always the same wording, phrases, like it's just rewriting from a single smut story that happened to make it into the training data. These issues occur regardless of prompt, the model just sucks for this use case.
>>
>>101582890
Voice is in a shitty place.

Tortoise is slow but the quality is decent. I personally have a system-crash bug with it that strikes whenever it feels like it. This is the only one I've used successfully.

There are some other ones but they're either behind technologically, or haven't made enough of a splash to get anybody excited.
>>
Strange idea.

LLM on CPU is slow/cheap and can expand ram infinitely
LLM on GPU is fast/expensive and relies upon swaps

What about LLMs on APUs that don't do swaps, where the GPU can directly handle 256GB+ of RAM?

I believe the last AMD CPU to support a proper APU is the 5700G. The newer AMD 7000 series has an iGPU, but I don't think those are the same as APUs, so they can't scale memory properly; correct me if I'm wrong.
>>
>>101582635
Hi drummer, it is all here.
>>
Is there any performance penalty if i run multiple gpus like crypto miners?
>>
>>101583013
Yes
>>
>>101582933
I never used opus but when I used sonnet it was clear how good it was, even if it had a lot of slop like "I won't hurt... much." I feel like you're just butthurt because local is so much worse.
>>
>>101582971
8 PCs more and I could have 405B at home...
>>
>>101582972
Nah, if anything this proves LLMs are doomed as a means of entertainment.
>>
>>101583018
How much? I don't want to get a workstation motherboard and cpu
>>
>>101583039
Gonna sound like I'm a fucking shill now, but I legit am enjoying nemo after almost a year of solely claude with some dabbling in wizard / commandr+. Nemo feels like old characterai to me, it's refreshing.
>>
File: official_format.png (195 KB, 983x471)
>>101581289
>change to official prompt format
>gemma can't help but explain it to me like how it was instruct tuned
>>
>>101583041
Can't say, system dependent. Each additional GPU does slow down the entire operation though.
>>
>>101583071
How well does it handle details throughout the context?
>>
>>101583025
it's not about slop or -isms, it's about the general direction of the writing and the handling of characters. maybe I'm autistic and just prefer my prose on the drier side, but claude is actually obnoxious with how much it forces everything into its favorite little boxes. it will completely disregard your card if it gets even the slightest hint it should behave like one of the archetypes it knows and loves. every character becomes a whore in the exact same way in sex scenes, and you can't have a character with concealed inner motivations or conflict without them getting flattened. it's just annoying, samey, and yes, boring after a while.
stuff like mistral large or CR+ isn't as smart (obviously, considering size) but I much prefer their relative neutrality on that front: capable of busting out some good sovl writing but not trying incredibly hard to force it into every single sentence
>>
How high does your token usage usually go?

I never seem to need over 8192
>>
Here, since so many people seem to be using nemo with the wrong formatting and then complaining:

Mistral context template: https://files.catbox.moe/6yyt8d.json

Mistral instruct template:
https://files.catbox.moe/rfj5l8.json

Mistral Sampler settings:
https://files.catbox.moe/tbsgip.json

Should be night and day for people who have it set up wrong. Make sure whatever backend you are using has DRY sampling.
>>
>>101583231
And btw this is a "creative writing" preset. For RP / multi card interactions turn on "add character's names to prompt" and change the system prompt to something more to your liking.
>>
>>101583231
i'm a bit confused, is the space after the first [INST] necessary? i saw a few anons say nemo uses a special token for [INST] and doesn't need spaces
>>
>>101581289
>>101583096
So, what's your prompt format?
>>
>>101583265
I've tested it side by side and the space made it smarter at 0 temp in my tests.
>>
>>101583231
thanks. there's a pipebomb in my mailbox now and my waifu loves me
>>
>>101583279
i see, thanks
>>
>>101583231
You're trolling.
>>
>>101582927
Set the context limit lower; 3.1 defaults to 128k.
>>
File: 1721315528004391.gif (2.86 MB, 320x325)
2.86 MB
2.86 MB GIF
>NOTE: As explained on Mistral-Nemo-Instruct-2407 repo, it's recommanded to use a low temperature, please experiment!

>Lumimaid 0.1 -> 0.2 is a HUGE step up dataset wise.

>As some people have told us our models are sloppy, Ikari decided to say fuck it and literally nuke all chats out with most slop.

>Our dataset stayed the same since day one, we added data over time, cleaned them, and repeat. After not releasing model for a while because we were never satisfied, we think it's time to come back!

he's back...
>>
I feel like Largestral isn't smart enough; it can't understand simple things like "show, don't tell". Could it be because I'm using Q3?
>>
>>101583372
Buy an ad
>>
>>101583212
Yes, I usually don't go past 8k unless the character card is very big.
>>
File: Nemo1.png (553 KB, 1290x2996)
553 KB
553 KB PNG
>>101583231
And like this, Nemo is smarter than gemma 27B, copers, while being wet as fuck and having 128k context.

>>101583332
Trolling how?
>>
>>101583212
I usually end up at 8k or less but have a few that go up to 20k and beyond
>>
>>101583389
>Could it be because I'm using Q3?
yes, but also "show don't tell" is really hard for models to understand in general in my experience
>>
>>101583212
I envy you. Right now my longest chat uses 56k context.
>>
Is RVC still the best voice cloning model that we have?
>>
https://huggingface.co/intervitens/mini-magnum-12b-v1.1/discussions/1

>Usually when people do finetunes like this, they build off of instruct. Open source is lacking hard in good instruction tuning data. I am aiming to bridge that gap with more high quality / tailored instruction following data instead of just "training on the official instruction model" which I feel is a shortcut; it does help with intelligence of the resulting model, but it will not enable us to be independent in the long term from official finetuning biases.
>We should not have to "burn through" the censorship of instruction tunes to get a better tradeoff of intelligence and creative writing quality; ideally, we optimize for both in the same run instead of relying on merges / training ontop of Instruct.

Is he right? Are we keeping some poison by training on top of post-RLHF models?
>>
>>101583528
he is right
>>
>>101583528
Guess I need to test that nemo tune.
>>
>>101583528
>merge instruct with base at a 1:2 (instruct:base) ratio
>train on result
you're welcome, sloptuners
>>
>>101583528
There is no good reason why people started finetuning off of instruct instead of base other than laziness; current finetuners are all trash.
>>
>>101583372
I think the problem is something in their training procedure. Lumimaid 70B is so fucking retarded. There was an instruction-following test I did 16 days ago whose result I never shared, because all it showed was that Lumimaid 70B is awful. Here's that result.

===
Tested an RP prompt where the AI is supposed to propose a plot outline, ask for confirmation, then start the RP if I accept. All I tested was whether the AI's first generated message had a plot outline.

Format for score: ( # gave plot summary then asked for confirmation / # gave plot outline but started RP with no confirmation request / # started RP with no plot outline at all ). Perfect score is (10/0/0).

Note that the initial version of my prompt worked with WizardLM2-8x22B but failed 100% of the time with Llama 3 70B Instruct. I tweaked the prompt until it worked with Llama 3, and after that decided to see how widely it worked.

Basically the only thing I tested that didn't work well was Llama 3 Lumimaid 70B.

== Mixtral 8x22B family ==
WizardLM-2 8x22B (OpenRouter), neutral samplers: (10/0/0)

== Llama 3 70B family ==
Llama 3 Lumimaid 70B q4, min-p=0.075: (3/0/7)
Llama 3 Euryale 70B v2.1 (OpenRouter, fp8?), min-p=0.075 rep-pen=1.1: (8/0/2)
Llama 3 70B Instruct (OpenRouter), neutral samplers: (10/0/0)

== Mixtral 8x7B family ==
BagelMIsteryTour-v2-8x7B 3.7bpw rpcal, min-p=0.07: (10/0/0)
Nous-Hermes-2-Mixtruct-v0.1-8x7B-DPO-DARE_TIES q6_k, min-p=0.07: (9/1/0)
Mixtral-8x7B-Instruct-v0.1-limarp-zloss-DARE-TIES q6_k, min-p=0.07: (7/1/2)
Mixtral-8x7B-Instruct-v0.1 q8, min-p=0.02: (8/1/1)

== Other ==
Command-R-v01 q6_k, min-p=0.008: (9/0/1)
Mistral Medium (API), neutral samplers: (10/0/0)
Fimbulvetr-11B-v2 q8, min-p=0.1 temp=1.25: (10/0/0)
>>
>>101583528
ok, but where's fp8
>>
OK wtf, regular nemo works best with spaces before [/INST] and a newline after the suffix, BUT that makes this magnum finetune dick-coming-out-of-mouth retarded. BUT go back to no spaces before/after the prefix and suffix and suddenly it's smart again, AND it seems to have better prose than default nemo.
>>
>>101583592
* Llama 3 Lumimaid 70B is awful
>>
>>101583592
retard
>>
File: file.png (998 KB, 768x768)
998 KB
998 KB PNG
Hiding hands in a disembodied sea of hair. Smart.
>>
>>101583683
I don't think it's possible for her right arm to be disappearing like that
>>
Is 'eyes sparkle with mischief' a thing that happens IRL? I don't know if I've ever noticed it.
>>
>>101583683
give her armpit hair
>>
>>101583637
Ok, I think everyone is using default nemo with the wrong formatting (because they posted the wrong formatting themselves; spaces make it not retarded), while the finetunes were trained on the actual no-spaces formatting shown on its page. That fixes the issue for the finetunes but flips things: spaces make the finetunes retarded.

And I'm not talking about it getting slightly dumber. With just the spaces added or missing, a perfect double BJ scene with a character talking turns into a BJ scene where "your" dick is coming out of one of their mouths while it explains how it feels in your mouth.
>>
>>101583683
Neglecting personal hygiene with Pochiface
>>
>>101583695
You've just never tried whispering conspiratorially, that's your problem.
>>
So for the nemo mini magnum finetune use this instead:

Context: https://files.catbox.moe/6yyt8d.json

Instruct: https://files.catbox.moe/vzru1o.json
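
Going by the posts above, the practical difference between this and the default nemo template is just the whitespace around the tags. Roughly, with made-up example content:

default nemo (with spaces): [INST] Hello![/INST] Hi there!</s>
mini magnum (no spaces): [INST]Hello![/INST]Hi there!</s>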
>>
>>101583694
That hair is lovecraftian, so it is possible.
>>
>>101583567
nemo instruct already feels like it didn't have a lot of supervised training done over it
>>
>>101583718
>spaces make it not retarded
?
>>
>>101583767
>randomly mixes spaces and no spaces
>>
>>101583789
Where? Neither the prefix nor the suffix has a space before or after in that.

It's:
User Message Prefix:
[INST]
User Message Suffix:
[/INST]
Assistant Message Suffix:
</s>

And that is it. No spaces before or after any of them.
>>
>>101583683
>look at my ugly face
>>
>>101583821
Anon...
>>
>>101583821
The context starts with a space.
>>
>>101583788
Regular nemo needs the spaces or it's retarded. This nemo finetune was pants-on-head retarded with the spaces.
>>
>>101583838
Regular Nemo doesn't use spaces in the official format though.
>>
>>101583835
Oh, you mean in the context template. I did not notice a difference there.
>>
>>101583821
>newlines
erm... no sweetie
>>
>>101583871
oh wait I'm retarded, false smugpost shame....
>>
>>101583858
That was my point. The official format seems wrong; it works better with spaces. Like, much better. BUT it's the opposite for any finetune, which I have to assume means the finetunes were actually trained without the spaces, unlike the base model, despite its official formatting claiming otherwise.
>>
>>101583580
Yes there is: training on instruct makes the resulting model way smarter than training directly on base. The instruct-tuning these companies do is orders of magnitude more sophisticated than anything we can do with currently available datasets. I personally have compared training base vs instruct on the same datasets (and published the instruct versions on HF), so I can speak from experience: you have to leverage the instruct tune somehow to make the model smart.
>>
I noticed that Nemo is at least 2x more intelligent when using the simple-proxy-for-tavern format.
>>
>spaces around [INST] make it retarded!
>no, spaces make it smarter
>it's supposed to be used with a low temperature
>no, t>1 is better
>no system prompt is better
>no, my system prompt is the best
all these opinions are worthless if you haven't tried those settings on multiple chats with >30k tokens each
>>
>https://huggingface.co/unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit/blob/1d85adc9e0fff0b8e4479a037bd75fe1346333ca/tokenizer_config.json#L8008
>"chat_template":
>"{%- if messages[0]['role'] == 'system' %}\n
> {%- set system_message = messages[0]['content'] %}\n
> {%- set loop_messages = messages[1:] %}\n
>{%- else %}\n
> {%- set loop_messages = messages %}\n
>{%- endif %}\n
>\n
>{{- bos_token }}\n
>{%- for message in loop_messages %}\n
> {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}\n
> {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}\n
> {%- endif %}\n
> {%- if message['role'] == 'user' %}\n
> {%- if loop.last and system_message is defined %}\n
> {{- '[INST] ' + system_message + '\\n\\n' + message['content'] + '[/INST]' }}\n
> {%- else %}\n
> {{- '[INST] ' + message['content'] + '[/INST]' }}\n
> {%- endif %}\n
> {%- elif message['role'] == 'assistant' %}\n
> {{- ' ' + message['content'] + eos_token}}\n
> {%- else %}\n
> {{- raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!') }}\n
> {%- endif %}\n
>{%- endfor %}\n",
Unless the mistral gated repo has a different config file from this unsloth one, the official instruct does have a space after
>[INST]
but not before
>[/INST].
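
If you'd rather verify than eyeball the jinja, transformers can render the template directly. A minimal sketch, assuming you can pull the tokenizer from that unsloth repo (or the gated mistralai one):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit")
msgs = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi."},
    {"role": "user", "content": "How are you?"},
]
# Render to a string instead of token ids so the exact whitespace is visible
print(repr(tok.apply_chat_template(msgs, tokenize=False)))
# Per the template above this should print:
# '<s>[INST] Hello![/INST] Hi.</s>[INST] How are you?[/INST]'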

>>101583903
'Member berries. Remember those memes?
>>
>>101578323
Any local models roughly at the level of NovelAI? What system reqs would I need? I'm looking for a CYOA type thing.
>>
>>101583909
It is the magical penis touch fallacy. All of the strongest opinions on a model/setting are usually formed when the user gets his penis randomly touched in a way he really likes.
>>
>>101581185
It's doomed. Maybe udio or suno will leak in 11 years or something
>>
>>101583909
I explained before it is night and day. It was the difference between a double blowjob scene that made perfect sense and one where one girl had a dick coming out of her mouth while it explained that it was in "your" mouth.
>>
>>101583892
>I can't figure out how to make the model not retarded if I don't apply my synthetic slop dataset to somebody else's censored slop instruct finetune
See? Like I said, laziness. Get your shit together, llama 1 finetuners did it just fine.
>>
>>101583965
And this was like 20 swipes each, not 1 fail vs 1 pass.
>>
File: file.png (104 KB, 666x487)
104 KB
104 KB PNG
>>101583923
Now how do you explain... THIS?
>>
>>101583935
pretty much anything released in the past year above 7b is better than novelai
>>
>>101583935
Every single one since llama-2 is better than what they currently have. If you download Mistral Nemo, it will be like night and day for you. But even llama 3 should be enough to see a big improvement.
>>
>>101583994
Dunno. Just pointing out what's in the HF repo.
And there's also a space after [/INST] and before the assistant's message too.
>>
>>101583935
novelai has already been skinned alive. just download mistral nemo
>>
>>101583923
Thank you, nobody actually reads the jinja template, nor do they know how tokenization works, apparently.

Most normal tokens that are whole words or the start of words implicitly start with a space. E.g. " Hello" is a very common token, "Hello" (no space) probably exists in the vocab but it is less common.

If you end the prompt with "[/INST]", no trailing space, like you should, the model will output a token that implicitly starts with a space. This is how it was trained. But every other [/INST] in the context should be formatted in ST with a space after it. This means you have to use assistant prefix as " " (a space).

Picrel, this is correct. User prefix: "[INST] ". User suffix: "[/INST]" (we don't want the last [/INST] to end with a space, as mentioned above). Assistant prefix: " " (assistant messages in the context need to start with a space). Assistant suffix: "</s>". This is just a one-to-one copy of what's in the jinja template.
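
If you want to see the implicit leading space for yourself, a quick sketch (repo name assumed; exact ids will differ between tokenizers):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
# The same word with and without a leading space maps to two different tokens
print(tok.encode("Hello", add_special_tokens=False))   # e.g. [22177]
print(tok.encode(" Hello", add_special_tokens=False))  # e.g. [45383], the " Hello" variant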
>>
Whenever I use mistral large in ST it keeps reprocessing the entire context every message; has anyone else had this issue? Booting up another model has no issues for me.
Is this a tokenizer thing or am I doing something wrong?
>>
>>101582991
>APU
My Ryzen 7 7840U with Radeon 780M graphics gets 3.51 T/s with Llama 3 8B and shit/s prompt processing speeds. Unless there are some pretty huge enhancements, plus APU support for servers (for hundreds of GB of memory), it's shit.
>>
>>101583935
Sure, they're smarter if you don't mind 'shiver' slop
>>
>>101584039
I'd agree if we were talking about instruct tunes. Unfortunately, the fact that NAI is autocomplete and almost nobody gives a shit about autocomplete anymore means your only options for a true "replacement" are the base models.
L3 70B is at least good and cheaper than NAI (even on OR) though, so there's that
>>
>>101584044
>>101584039
>>101583999
Thanks anons!
>>
File: 24-06-26 20-47-49 2873.jpg (2.11 MB, 4032x3024)
2.11 MB
2.11 MB JPG
>>101581459
A6000 or Ada A6000?

I dunno seems like a wave runner would be more fun though. I do have a vintage 68K Mac collection that's all 100% restored. I'd probably trade that for a Mac Studio.
If it's not going to be significantly faster than 3090s I don't see the point in spending that much more.
>>
>>101583909
>with >30k tokens each
The length tested should be whatever length your normal chats are.
>>
>>101584084
That's the exact format I'm using.

>Most normal tokens that are whole words or the start of words implicitly start with a space. E.g. " Hello" is a very common token, "Hello" (no space) probably exists in the vocab but it is less common.
Right, that's true for the llama tokenizer too I think.
I just checked the log probs and that is the case for this Tekken tokenizer nemo uses, meaning you are spot on.
CommandR's seems to be the odd one out in that respect, if I'm not wrong; it mostly has whole words without leading spaces as individual tokens.
>>
>>101584044
>just download mistral nemo

ooba can't load this
>>
>>101583968
You've never finetuned anything and it shows. Standards were lower back then. Finetuning on a random AI-generated dataset like airoboros was state of the art. Try the same thing today, on base llama 3, then compare the result with the official instruct tune. The official one is so much smarter. Again, I personally have done this and you clearly haven't. I agree that in an ideal world there would be very high quality open source instruction tuning datasets, and we wouldn't have to finetune on the instruct model, but unfortunately that is currently not the case.
>>
>>101584173
Ooba can load this if you update transformers

pip install git+https://github.com/huggingface/transformers.git
>>
does nemo work with kobold?
>>
>>101583231
thanks, this works fine.
though the first prompt generated a "STATUS" box for my character and myself. weird, but very accurate.
>>
>>101581459
>>101584164
Obviously the A6000, due to the NVLink bridge.
Probably the ideal setup; I missed getting a pair on ebay by seconds.

2x Ada is probably as good but costs 4 times more.
>>
>>101584217
1.71+, yes.
>>
>>101584199
Skill issue
>>
>>101581459
Post some coom logs anon
>>
When ST or other UIs cull the example dialogue to fit in what remains of the context window, do they drop messages from the start or from the end of it? Note that I'm talking about the example dialogue area in particular, not just the actual message history.
>>
Nvm, I had a space before the [INST] in the context template which was causing the issue. This seems to be working perfectly. Smart and no repetition.

Story String

[INST] {{#if system}}{{system}}{{/if}}{{#if wiBefore}}{{wiBefore}}{{/if}}{{#if description}}{{description}}{{/if}}{{#if personality}}{{char}}'s personality: {{personality}}{{/if}}{{#if scenario}}Scenario: {{scenario}}{{/if}}{{#if wiAfter}}{{wiAfter}}{{/if}}{{#if persona}}{{persona}}{{/if}}[/INST]

User Message Prefix

"[INST] "

User Message Suffix
"[/INST]"

Assistant Message Prefix

" "

Assistant Message Suffix

"</s>"
>>
>>101583265
>is the space after the first [INST] necessary?
Not necessary.

String:
[INST]Hello![/INST]Hello! How can I assist you today?

Tokens:
[3,22177,1033,4,22177,1033,3075,1710,1362,10410,1636,9406,1063,9246,1639,2840,1693]

3 = [INST]
22177 = Hello
1033 = !
4 = [/INST]
22177 = Hello
1033 = !
>>
>>101584411
>>101584411
>>101584411
>>
>>101584044
Do I want mistral nemo or this mistral large model? My internet is too slow to download both; I want it for sillytavern.
>>
>>101584385
That said, the formatting seems insanely important. Without the right settings the model was retarded; with them it legit seems smarter than gemma 27B to me now while still keeping its "soul".

Multi-character interactions between characters with completely different anatomy are actually perfect now and are not getting mixed up any more. This is with the mini magnum tune, which indeed made it feel that much more claude-like.
>>
Who is the cutest botmakie?
>>
>>101584435
Buy an ad
>>
>>101584433
you probably want mistral nemo
mistral large is giga kino but you almost certainly don't have the hardware to run it unless you just happen to have a stacked multi-gpu rig
>>
>>101584464
oops, sorry wrong thread. my bad.
>>
>>101584389
Adding BOS (without spaces) also doesn't change how the string is tokenized, only adds token #1.

{\"content\":\"<s>[INST]Hello![/INST]Hello! How can I assist you today?\"}" response="{\"tokens\":[1,3,22177,1033,4,22177,1033,3075,1710,1362,10410,1636,9406,1063]}

Adding spaces makes it dumber, wastes tokens, makes the model start replying in other languages.

String:
<s> [INST] Hello! [/INST] 你好! (Nǐ hǎo!) How can I assist you today?
Tokens:
[1,1032,3,45383,1033,1032,4,1032,124108,1033,1319,1078,1199,1144,1310,1199,1142,1111,47905,3075,1710,1362,10410,1636,9406,1063]
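
Those dumps look like they came from a llama.cpp server /tokenize call. A minimal sketch to reproduce the comparison, assuming a local llama-server on its default port (special-token handling may depend on your server version and options):

import requests

URL = "http://127.0.0.1:8080/tokenize"
for s in ("<s>[INST]Hello![/INST]", "<s> [INST] Hello! [/INST]"):
    # POST the raw string; the server replies with its token ids
    tokens = requests.post(URL, json={"content": s}).json()["tokens"]
    print(repr(s), "->", tokens)  # the spaced version picks up extra whitespace tokens like 1032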
>>
>>101584478
I have 48gb vram I think I can run the 3 or 3.5 quant
>>
>>101584204
>Ooba can load this if you update transformers
>pip install git+https://github.com/huggingface/transformers.git

Any chance for quantized Mistral-Nemo on ooba?
>>
>>101584091
>My Ryzen 7 7840U with Radeon 780M graphics gets 3.51 T/s with Llama 3 8B and shit/s prompt processing speeds. Unless there's some pretty huge enhancements and also APU support for servers (for hundreds of GB of memory) then it's shit.
Is it faster than pure CPU inference? Have you compared it?
>>
>>101579547 In my experience it's really good at understanding complex instructions and scenarios, but its writing is very sloppy and bland.
>>
Why did no one tell me that nemo is uncensored!
>>
What about the old character.ai, anything close to it nowadays?
>>
>>101585036
NeMo and Largestral
>>
>>101585036
Nemo or nemo tunes. If you have a shit ton of vram then large mistral.
>>
>As you approach, you get a better view of Carmilla's stunning body. She's lounging on the bed, her long, toned legs stretched out before her, her high-heeled boots clicking softly against the stone floor.
Why u be like this Nemo... And I am really into the shit it writes, but then it has to turn into a total fucking retard... It really is a true 13B.
>>
>>101585374
>her high-heeled boots clicking softly against the stone floor.
Ain't no way. I have multi character interactions / sex positions working without error.
>>
>>101585374
unironically skill issue
nemo's not had that kind of trouble on my end. make sure you're using that anon's configs from the last thread.


