/g/ - Technology

File: 1703751088340993.jpg (755 KB, 1856x2464)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101600938 & >>101589136

►News
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1722057472013989.png (232 KB, 512x512)
►Recent Highlights from the Previous Thread: >>101600938

--Papers: >>101605355
--Mistral Nemo's context issues and potential solutions: >>101602310 >>101602329 >>101602641 >>101602828 >>101602980 >>101603030 >>101603070 >>101603092 >>101603185 >>101603227 >>101603364
--Llama3 quantization type and precision: >>101604291 >>101604353 >>101605347 >>101605562 >>101605576 >>101605722 >>101605651 >>101605701 >>101605786 >>101606078 >>101606448 >>101607040 >>101607086 >>101607352 >>101607411
--Running Mistral Large 2 locally with 3090 and 64GB RAM: >>101603131 >>101603139 >>101603231 >>101605203 >>101605384
--Good models that fit in 8GB VRAM: >>101607206 >>101607287
--Can GPT-like architectures ever match human intelligence?: >>101605727 >>101605795 >>101605797 >>101605905 >>101606349 >>101606395 >>101607240
--New PyTorch project for e2e quantization and inference: >>101601709
--LLMs' behavior when challenged and the importance of context: >>101605528 >>101605632 >>101606226 >>101606358 >>101606475 >>101606732
--Anon suggests a mixture of 70 billion 1-param experts: >>101602434 >>101602444 >>101602456
--Prompt passing as tokens in Ollama: >>101601982 >>101601998 >>101602051 >>101605347
--Nemo-instruct generates dragons for D&D Lorebook: >>101603305
--L40 and Ada 6000 GPU differences: >>101605613 >>101605650
--NeMoria-21b Nemo self-merge model: >>101603761
--MoE dead or not, Mistral legacy models, and upcoming updated model: >>101602134 >>101602171 >>101602173 >>101602232 >>101602252
--Clarification on model size and hardware requirements: >>101601931 >>101605285
--Best model for 6 GB VRAM GPU: >>101601201 >>101601239 >>101601330 >>101601374 >>101601251
--Anon seeks LLMS model recommendations for their 3090 GPU: >>101604297 >>101604426 >>101605121 >>101605245 >>101605257
--AI model responds with function definition instead of invocation: >>101604707
--Miku (free space): >>101601474 >>101601626 >>101604421

►Recent Highlight Posts from the Previous Thread: >>101601504
>>
Bitnet
>>
vramlets?
>>
llama 4 wen???
>>
>>101607819
yes?
>>
jepa jamba bitnet when?
>>
Is the current meta for stacking 3090s a romeD8-2t?
>>
>>101607819
Vramlets (people with less than 50 H100s)
>>
File: claude.png (67 KB, 2101x453)
why would claude care which of the suggested solutions worked in the end? It's not like I'm posting on some discussion board for others to see it
>>
Does your refusal to use proprietary models like Claude come down to privacy concerns, or do you just not see a reason to pay because local stuff is good enough?

Would you pay for proprietary stuff if there was an option to pay in crypto (like monero)?
>>
>>101607886
because you have a fundamental misunderstanding about how llms work
>>
>>101607858
If you have the space, then X98-8PLUS-V1.0
>>
>>101607953
privacy. I bought a gpu for AI and nothing else
>>
>>101607953
i just like the idea of running an ai on my own hardware, it feels nice.
>>
>>101607953
More like i already have experience with services that started out good, got progressively worse, and then started banning people when they didn't like what they were doing.
>>
File: claude-account-disabled.png (237 KB, 3456x1978)
>>101607953
>>
>>101607953
It's not even the payment, it's about the data. If they don't use my data at all (apart from running inference, ofc) and store it encrypted on their servers, I'll pay up, but as far as I know, only NAI does that rn and their model is a bit outdated nowadays
>>
>half of the thread says nemo is great
>half says it's shit
How do I get redpilled into joining the former? Even fucking Stheno worked much better for me.
>>
>>101608087
You can't get redpilled on taste. You either like it, or you don't.
>>
How does the new Llama compare to the corporate models now?
>>
>>101608087
For the 1000th time, it all depends on the settings anons use and their card; if their prompt is shit, then no matter the model, the output will be shit. Nemo is likely the best model for Vramlets right now. The only issue the model has is that the effective usable context is much lower than marketed.
>>
>>101607819
All of us became vramlets after 405B dropped.
>>
>>101608087
I suspect it depends on how vanilla your roleplays are in terms of format.
>>
>>101608122
>if their prompt is shit, then no matter the model, output will be shit
Well, as I alluded to in my post, shit like Stheno worked fine for me.
>>101608158
I guess it may be the case because I do weird shit and not really the "anon fucks 1girl" type of thing. But even when I attempted that for a test, it kept being extremely hesitant with characters going "no this is wrong i must refuse" until explicitly told otherwise. Oh and one of them got randomly shot at one point.
>>
>>101608176
>Oh and one of them got randomly shot at one point.
Ah, the AI Dungeon memories came flooding right back...
Anyway, not anyone of them. If you are using Stheno, then Niitama might work for you.
>>
>>101607953
if i could have accountless access to models (just a random user token that you top up by paying with monero)
and access them over tor / i2p, i'd use the service; otherwise it's gonna be local for me.

i don't even run completions that are that weird, it's just not anyone's business.

>>101608139
seriously some hardware maker should get their shit together and make accelerators with TB of vram, i'd pay $$$ for it.
>>
>>101608122
this is the worst kind of anon, believes in his magic sampler settings and telling his model to be creative, probably doesn't catch all of the stupid things his ai outputs, "prompt format is very important," "post logs"
>>
Jart won.
>>
oobabooga add Mistral-Large-Instruct-2407.i1-IQ2_M to your benchmark thanks
>>
>>101608087
The prose is somewhat fresh, but it hallucinates like a motherfucker and has the usual retardation in its param range. I've had better RPs with Lunaris-8B because at least it doesn't make random shit up and forget character details, though it's hindered by LLaMA slop. Granted, I haven't tried long context scenarios on finetunes of L3.1-8B.
>>
>>101608215
Ach yes, the magical anons who are full of bullshit every single time a new model is released and who use the same fucking sampler for all their models and complain that the output is shit are much better. Fuck off, faggot, learn to prompt.
>>
>>101607953
Free + offline is a fair trade off for local.
>>
>>101608267
FOTM fag has the memory of a goldfish
>>
>>101608285
>free
>you have to pay for hardware, electricity, real estate for your rig
Just pay Altman
>>
>>101608315
the electricity is cheap
hardware costs vary by autism but a single 3090 can be dual purpose
the real reason for local is to not have corpos sniffing at your activity and telling you their insane vision of what's right and wrong
>>
>>101608355
But a single 3090 isn't going to get you far.
>inb4 vramlet screeching
>>
>>101607953
Control. It can't be changed underneath me or taken away.
>>
>>101608355
Also, if it's somebody else's service, they can turn the service off, ban you, change the terms of the deal, etc.
Being able to control your own experience is paramount to me.
>>
>>101607953
if it was just cooming i would use it without a care, provided there was a private payment method, but i query way more than that and it's just way too much identifiable information to send to the cloud in such a tightly linked manner.
>>
What's better, official large or the lumimaid version?
>>
>>101607953
claude has insane positivity bias and denies everything
>UM USE THIS 3000 TOKEN JAILBREAK THAT MAKES THE OUTPUTS WORSE THEN IT WONT.. OOPS THEY PATCHED IT UHHH TRY THIS ONE INSTEAD
no
>>
>>101607953
I'm still using GPT and Claude for coding, but now that free models are finally good, is there a relatively cheap and privacy-friendly alternative for 405b or large?
>>
>>101608507
My Claude prefill is 3 words and it refuses nothing at all.
>>
>>101608511
>is there a relatively cheap and privacy-friendly alternative for 405b or large?
the smaller llama 3.1 models?
>>
>>101607886
>It's not like I'm posting on some discussion board for others to see it
The AI was trained on discussion board material so it's aping that behavior.

LLM has no ego. It's a Chinese Room that reads the document and adds to it according to the documents that it has studied. If you create a document that reads like a discussion board it will append to it to make it read more like a discussion board.
>>
>>101608547
>It's a Chinese Room
Prove you are not one as well.
>>
>>101608562
Bite me.
>>
>>101608592
I don't like chinese.
>>
>>101608562
I am not Chinese
>>
>>101608562
I'm a native speaker
>>
>>101608562
I am not a room
>>
So I fell asleep while my PC generated 2000+ Hatsune Miku gens overnight, and I woke up to my PC's fans running at 100% and the thing overheating; I had to shut it down before it melted down.
>>
>>101608762
And 20 of the gens are usable.
>>
>>101608562
didn't they do that mediocre amnesia sequel
>>
File: 64d3d1_11802331.png (1.26 MB, 1744x800)
>>101608762
There's so much Hatsune Miku that was made overnight
>>
>>101608791
Your devotion to the Miku is admirable.
>>
What's a good writing model for a 3080 12GB card? I want to try some creative writing.
>>
>>101608851
Claude and the 3080 is overkill for running SillyTavern.
>>
>>101608791
What a waste. They all look the same.
>>
>>101608851
Nemo does pretty well if you add a couple of snippets of text to its context for it to use as inspiration. At least at 32k context, I don't know if it loses the plot with a bigger context window.
>>
>>101608851
You could give magnum mini a try
Not the TOP TIER HIGH END 1T CLOUD POGCHAMP MODEL WITH EXTRA ONIONS, but it's pretty damn good for its size (at least in terms of prose quality), plus it's rather fast, so retries aren't as bad
>>
>>101608883
Well, yeah. I mean, that anon didn't change the prompt, and it looks like it's using that brother-sister incest game's lora, which funneled it even more.
Just imagine how much better they would have been with a randomized prompt...
>>
>>101607705
>►News
>(07/27)
>(07/26)
>(07/25)
>(07/24)
>(07/23)
>(07/22)
What do you think we'll get today?
>>
>>101609040
BitNet
>>
>>101608791
so much mental illness was made overnight
>>
>>101609040
C'mon, Cohere, do something. I want my hard-earned handout.
>>
>>101608562
Ching chong ping pong China will grow larger
>>
File: plap.jpg (160 KB, 2560x345)
Am I getting paranoid, or do I actually see gamemakers using ERP chatbots to write text for them? The slop is all over the place in the dialogue, it's hard not to notice.
Or do people actually write like that unironically in the first place, and it's the AI that mimics them too much?
I don't even know anymore. I just find it ironic how creatards are all against AI but resort to using it thinking nobody would notice.
>>
>>101609165
>no shivers down the spine
Nah, a human wrote this. Just an untalented one.
>>
>>101609156
Imagine a new Cohere model in the 30-70B range with 128k context and non-shitty KV cache, I'd be cumming buckets
>>
>>101609181
You've clearly never read a book in your life then. You fucking illiterate retard.
>>
File: shiver me timbers.png (131 KB, 2560x257)
>>101609181
>>
>>101609181
That's clearly AI, and I'm thinking it's Claude
>>
>>101609228
Nevermind then. It's AI.
>>101609227
None of the books I have read contain the stock phrase "shivers down your spine". Only a machine could write something so soulless.
>>
The same way DRY parallels Rep Pen, one could create an n-gram based analogue to Logit Bias, right?
That would be pretty cool.
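Roughly what I'm imagining, as a sketch (every name here is made up, nothing like this exists in any sampler as far as I know): instead of biasing single token ids, you bias whichever token would complete a user-listed n-gram given the current context.

import numpy as np

# hypothetical n-gram logit bias pass, run right before sampling the next token
def ngram_logit_bias(context_ids, logits, ngram_bias):
    # context_ids: token ids generated so far
    # logits: 1-D numpy array over the vocabulary for the next token
    # ngram_bias: dict mapping a tuple of token ids (the n-gram) to a bias value
    biased = logits.copy()
    for ngram, bias in ngram_bias.items():
        prefix, last = ngram[:-1], ngram[-1]
        # if the context currently ends with the n-gram's prefix, the next token
        # would complete it, so shift that token's logit up or down
        if len(prefix) == 0 or tuple(context_ids[-len(prefix):]) == tuple(prefix):
            biased[last] += bias
    return biased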
>>
File: what felt like slop.png (102 KB, 2560x190)
>>101609241
I do wonder who taught the machines all that, though.

This is Crisis point extraction, btw. Go say anon42 hi for using AI, I'm sure his fellow artists would be amused to learn about that.
>>
File: 401px-Gray1204.png (145 KB, 401x314)
Where can I download kyutai Moshi's weights? It was a mistake to trust the French
>>
>>101609269
No I don't think I will.
>>
So what's the current meta on using example dialogue? Seems like a lot of the new character cards don't bother having them. I'm on a 70B btw.
>>
Has anyone tried Undi's Largestral Lumimaid? I found it to be slightly brain damaged and too horny. Undster, I appreciate your effort of training new models, I really do, but have you tried to train it in a way that is a bit less damaging to intelligence or is coom the #1 priority for you? No hate, just asking.
>>
Just starting out with DRY, what's the meta for its settings?
>>
>>101609337
only useful for forcing people to use your personal brand of autistic formatting with bold for speech, double quotes for internal thoughts, and code blocks for actions
>>
>>101609337
It's completely optional. Can either improve or ruin a card. Some people throw slop straight from gpt 3.5 in there and then you wonder where the shivers came from. Always check it.
>>
>>101609349
Setting base multiplier to 0 and using rep pen instead, now fuck off.
>>
File: which one.jpg (225 KB, 2040x861)
Got pointed here for help.

I have ST setup and got recommended to use Mistral Nemo.

How do I download this shit lol

https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/tree/main

Got a 4090 GPU for reference
>>
>>101609347
Why are you saying this as if this is something easy and straightforward?
>>
>>101609385
>I have ST setup
Do you have something to run models with?
If not, download koboldcpp and the Q8 gguf from
>https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/tree/main
Then connect Silly to Koboldcpp;
>>
>>101609357
>>101609380
Thanks lads.
>>
>>101609387
>Why are you saying this as if this is something easy and straightforward?
I am not. I'm just asking if there was any effort, or are Undi's tunes just for cooming, which is completely okay.
>>
>>101609435
What quant did you use?
>>
>>101609456
Q6_K, temp 1, minp 0.05
>>
File: trinity.jpg (446 KB, 1176x1176)
>>101609347
>>
File: do i fuck with it.jpg (54 KB, 534x579)
>>101609403
>s
Cheers for explaining in clear english lol.

Do I fuck with these settings?
>>
>>101609466
Try to play with your sampler setting, we only tried it unquantized during our test. Also check if the gguf was made correctly. I was using a temp below one, try 0.7 maybe ?
To reply to your question, the ratio of SFW/NSFW data got smaller and smaller on the NSFW side, so it should be less horny.
>>
File: koboldcpp primer.png (335 KB, 1264x1594)
>>101609492
Not him but yeah, here's the explanation.
>>
>>101609492
Enable FlashAttention and increase the context size. Mistral Nemo works with 128k, but I'm not sure if that can fit into your VRAM
>>
My PC is too shit to do anything meaningful locally, so I'll just do it all on runpod and larp as one of you

I'm guessing the obedience and "positive bias" all the public models have is just a side effect from their safety policy and if I run this shit locally it won't be like that? I need pushback when I say/ask something wrong or stupid
>>
>>101609492
Yes. That context size is essentially how much of the chat your model can remember, so crank that higher.
With a 4090 you can probably go all the way to 128k, but for now do 32k context and see how that works for you.
Also, make sure that Flash attention is on and that all layers of the model are offloaded to your vram (in the hardware tab I think).
>>
>>101609499
>I'm guessing the obedience and "positive bias" all the public models have is just a side effect from their safety policy and if I run this shit locally it won't be like that?
nah, it will; it's baked into the models too
>>
>>101609498
>FlashAttention
That's new, right? Guess I need to update. What does it do?
>>
fuck I just realized I became worse than Son Gohan, loved him as a child, gave my best and became good at (now useless) stuff, was really disappointed in Son Gohan giving up on becoming stronger, nowadays I'm useless and my brain is rotten. Does anyone know where I can find that clip where Son Goku tells him that you become stronger out of a need? Can't find it,
>>
>>101609523
>https://github.com/ggerganov/llama.cpp/pull/778
>https://github.com/LostRuins/koboldcpp/wiki#flash-attention
>>
>>101609499
Instruct models are all sycophants, even without "safety"; it's probably inherent to the whole concept since they're tuned to obey you. I agree it's very annoying. The only way to avoid it is to use base models with few-shot prompting, but they're schizo. Even with instruct, few-shot prompting (aka populating the context with examples of the style you want) can help.
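For the instruct few-shot case it can be as dumb as this, just as a toy illustration (the example exchanges are made up; with a base model you'd use the same idea without any chat template and it simply continues the pattern):

# toy few-shot prompt: seed the context with examples of pushback so the model
# picks up the pattern before it sees the real question
FEW_SHOT = """\
User: The Earth is flat, right?
Assistant: No. That's simply wrong, and agreeing with you would not be helping you.

User: I'll skip the tests, the code is obviously fine.
Assistant: Bad idea. "Obviously fine" code breaks in production all the time.

User: {question}
Assistant:"""

prompt = FEW_SHOT.format(question="I'm thinking of storing passwords in plaintext to keep things simple.")
# send `prompt` to whatever completion endpoint you use and let the model continue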
>>
>>101609537
>--flashattention can be used to enable flash attention when running with CUDA/CuBLAS, which can be faster and more memory efficient.
No downsides?
>>
>>101609575
Shouldn't have any, no.
And you can enable cache quantization with it too, which does come with some level of degradation, but at Q8 it should be negligible.
>>
>hook Mistral-Large into a chat I previously set up with Claude Opus
>it continues it perfectly with that Claude feel to it
This model is raw diamonds. It has some issues getting going on its own, but this should be fixable with some better prompting. The fundamentals are there.
>>
>>101609540
>>101609516
That's disappointing. In what way are the few-shot models schizo?

I also don't see how few-shot prompting would be very useful for my needs anyway, except maybe for cooming.
>>
File: undi_btfo.png (70 KB, 2492x364)
>>101609591
largestral is worth the very slow t/s values, can't bring myself to retry 20 times in a row with Nemo
>>
File: remember Sakki.png (241 KB, 2205x895)
>>101609337
>>101609380
Without ED the model's own speech style will overtake your character, and since so many models are hellbent on narration and purple prose, it will make your card unsuitable for chatting.
I did the research; here's a compilation of the same card responding to the same questions using different ED sizes (or none) and various models. Temp=0 to keep the randomness away. https://docs.google.com/spreadsheets/d/1BsGgRCzluqsZdc7pShCgNVSrv3KtRgzJohyg1rTX5Fc/edit?usp=sharing
>>
>>101609643
you didn't have to give poor undi third degree burns jesus christ
>>
>>101609584
isn't Q8 cache worse than Q4? or is that only for exl2?
>>
>>101609584
Thanks man.
>>
File: mmlu_vs_quants.png (336 KB, 3000x2100)
>>101609711
Only for exl2 due to the difference in algorithms.

>>101609713
yw
>>
>>101607953
>sign up to open ai for gpt 4 on launch
>its get dumber almost every month
>they release some new features which kinda helps
>continues to get dumber
>they talk about how GPT4 is the dumbest AI will ever be
>it gets dumber
>the new models they release are even dumber

the projects I used to do with GPT4 aren't viable anymore, it's too retarded, I would rather have something that runs locally and doesn't get unpredictably nerfed in the name of efficiency and then those nerfs justified by users with some useless benchmarks/polls. I'd pay $100 a month for the original GPT4, probably a lot more
>>
>>101609694
Well shit, I guess that's why all my cards go on long fucking descriptive rants even though I literally put "Focus on dialogue over descriptions" and "be concise and factual" in the sysprompt.
Back to writing ED then. Thanks Anon.
>>
>>101609754
>>101609694
That being said you tested that on an 11B, do you think the same applies to a 70B?
>>
>>101609754
The popular jailbreaks might be the suspects too; they usually go all "be verbose and use flowery speech when describing blah-blah".
But it's hard to find the balance between one-word replies and going full ficbook.

The document I linked contains the 70B tests too, but that one's a cloud model, so I'm not so sure about their setup under the hood.
>>
>>101609773
I mean it should have an easier time sticking to your characters' personalities with ED, so if you notice the model deviating from what you intended, give it a shot
>>
>>101609498
It does not work at 128k, at least not for roleplay, and you waste resources if you push it that high. You can see it getting stupid around 16k. Seriously, anons, no advice is better than bad advice.
>>
File: wew.jpg (42 KB, 542x369)
>>101609509
How many layers would I want on a 4090?
>>
What I do wonder is how much ED is too much. Some guides tell you that a large ED pushes the actual definitions too far back so the AI ignores them. But how else can I make the bot stay true to the character's own personality if not by letting it figure it out on its own from her speech?
I mean, if I make a card for a manga/anime/novel character I have a huge corpus of their lines at my disposal. Should I just include everything in ED?
>>
>>101609711
The EXL2 numbers that supposedly show that 4 bit cache is better than 8 bit cache had a comparatively small sample size.
There was no statistical analysis of the results but if there was I very much doubt that 4 bit cache is better than 8 bit cache with statistical significance; I very much expect this to just be random chance.
>>
>>101609818
That specific model has 40 layers I think.
Regardless of that, the whole model should fit in your VRAM, so put a 999 (tells it to just put everything) in the input field and carry on.

>>101609850
It could be if the 8bit quantization was, say, just truncating the values or doing something really stupid instead of doing scaling or the like, right?
>>
>>101609818
You're overthinking it.
When you load the model it will guess at GPU layers.
That's probably fine unless you raise the context (which you probably want to if you're doing anything other than one shot Q&A kind of stuff.)

When you run it the following will happen:
1. It works. GPU layers isn't too high, but you can try higher.
2. It throws a memory error into console after you wait a while for the model to load. Too many GPU layers. Write down what you used (you can scroll up and fish it out of the console dump if you've forgotten) and try a little lower.
3. It goes to the WebUI okay but blows up when you submit a prompt. Remove one GPU layer and try again.

If you're using VRAM for things like video streaming or a game then you have less VRAM free and might need to reduce layers. But mostly just trial and error till you have a little post it note with your models and how many layers your system can support.

And that infographic above says that a lower layer count sometimes goes faster, so you can test even more if you're autistic.
>>
>>101609856
>Regardless of that, the whole model should fit in your VRAM, so put a 999 (tells it to just put everything) in the input field and carry on.
And this, if your model fully fits, max out and be happy.

Being picky about layers is what you do when you're like me, running 50GB filecached in 64GB system RAM and where I put the context determines how many layers will run or crash.
>>
What the fuck is that
>>
>>101609856
I did not look at the implementation.
It is possible that there is something wrong with the 8 bit implementation.
But IIRC the result for 4 bit was better than even for 16 bit which are results that I definitely do not believe without good evidence.
So my expectation is that the rounding error for 4 bit just happened to provide better results for the small sample that was used for evaluation.
>>
>>101609896
>But IIRC the result for 4 bit was better than even for 16 bit which are results that I definitely do not believe without good evidence.
Woah, alright, got it.
>>
>>101609883
idk, looks like a low-effort investor scam
>>
>>101609818
To add to what >>101609860 said, if you need more VRAM and your processor has an iGPU, see if you can use that instead, as it'll free up your dedicated GPU's VRAM. Those extra 1-2 GB can make one hell of a difference
>>
>>101609896
Most likely it just means the difference is slight enough between 4 and 8 for random chance to impact the results.
>>
So much effort... I kneel...
>>
>>101609930
hi anonei
>>
>>101607953
There's already an option to pay in crypto via OR
But for most of us it's a combination of price (free, but even OR options are significantly cheaper for the same level of performance than the proprietary ones, with the exception of maybe GPT-4o-Mini), privacy (since Nick revealed mods look at stories and OAI threw random stories on public taskup, several anons have denounced proprietary entirely), reliability (corpos can and will ban you from using their model if they don't like how you're using it, which is extra fucked up when you realize they all want a monopoly, and in their vision whoever gets banned would be denied any use of AI period), anti-censorship (in addition to the above banning, corpo models are notoriously pozzed and have a severe lack of ways to fix them), and customizability (several models can't be finetuned or LoRA tuned, and those that do make you pay a big premium to both train AND use them after)
>>
>>101609850
what would you recommend using then?

Would we need to run a benchmark like RULER at high context to know if there is quality degradation between fp16/Q4/Q8 cache?
>>
>>101608241
>i1
No need, it's already trash
>>
SillyTavern guys, do you use this tab, or do you just put the scenario in the description?
>>
File: guess which is which.png (351 KB, 4000x2193)
>>101609896
I tried comparing the model's behavior with flash attention + cache quantization and without either, at temp 0, while trying to keep my responses more or less the same. The model's responses vary too much between the modes, but I can't tell exactly which one is better.
But surely one can't have a major speedup without paying some price, and that price is usually quality.
>>
>>101609474
I have literally never used any of these people's models for cooming.
>>
>>101610036
I put it in the description.
>>
>>101610053
then you don't belong here
>>
>>101610053
Wait. So you use their models for stuff that is not cooming? Now that is fucked up...
>>
File: file.png (167 KB, 1757x827)
this is the most soul a chatbot has had by default. good job meta :)
>>
>>101610080
Soul of a redditor, maybe.
>>
>>101609968
For the EXL2 results perplexity was used.
This is not a problem in and of itself, the problem is just that the number of input tokens was I think 5120.
That is in my experience simply not enough input data and you should ALWAYS do a statistical analysis afterwards in order to check whether your results are statistically significant.

Since the goal of KV cache quantization is to keep the same logits but just use less memory I think the most straightforward metric to use is the KL divergence.
Compared to perplexity this also has the advantage of much better precision at the same number of input tokens.

RULER would also work but any regular LLM benchmark should work as well since those implicitly also use the context; the interpretation of the results would be different though.

>>101610041
>But surely one can't have a major speedup without paying some price and that's usually the quality.
Agreed, but I still think it's important to objectively measure these things if at all possible.
Both with performance and precision little effects add up.
>>
>>101610080
distilled reddit brappa
>>
>>101610060
Thanks! You do the same thing with the example dialogue?
>>
>>101610080
>...
llamaslop
>>
>>101610080
>talks about /x/
>doesn't even mention sucubus summoning
Soulless.
>>
>>101610080
Remember when Sam Altman said that open source GPT4 would be the end of the world?
>>
>>101610124
Nope.
I use the actual example field since that one has some specific settings that you can change depending on the specific model or card.
I might put an example of a character's speech in the character's description while using the example dialog field for example exchanges between user and character.
>>
>>101610064
t. Claude jeet
>>101610077
I just don't use them. They always tune their models on the same shitty claude proxy datasets. They aren't even worthy of merge fodder.
>>
>>101610173
Thanks again man.
>>
>>101609385
Thank you my twin! Couldn't get it running. Will install Kobold now too
>>
>>101610089
Could you provide any resource on how I could do these test myself? maybe with a bigger sample
>>
>>101610182
Better than GPT-4 proxy datasets.
>>
what models are people using nowadays that fit in 24GB of VRAM?
>>
cohere? more like conothere. where are they?
>>
>>101610208
Proxy datasets in general are garbage.
It made sense with Pygmalion 6B insofar as it trained the model to actually understand an RP prompt, but models have since gotten good to the point that any current-generation retard model can figure out how to use a tavern card.
>>
>>101610229
>>101609403
I'd also suggest you try gemma 2 27b.
>>
>>101609957
One thing I'd add.
Research. Probably doesn't directly affect many anons, but companies like OpenAI take information from the research community and don't give back. They want people to be uneducated so that they can charge whatever they fucking please, and if they do come across some groundbreaking research that brings AGI to fruition, you can fucking bet they're going to keep that information all to themselves.
Back in the day they justified this by saying that it was for our "safety". After that fell through, they try to phrase it like it's their secret Coca Cola recipe so of course they can't share it. In reality it's more akin to a lab discovering new properties of electricity and magnetism and releasing technologies using these laws without divulging what said laws are.
To put it simply, if you want the technology to grow and people to make new discoveries, you do not want closed source companies to win.
>>
It's up.
https://huggingface.co/nothingiisreal/L3.1-8B-Celeste-V1.5
A massive upgrade over Stheno.
>>
File: yann_stopit_k.png (194 KB, 1227x499)
>>101610327
>>
>>101610080
>...erm

S L O P
>>
>>101610201
https://github.com/ggerganov/llama.cpp/tree/master/examples/perplexity
The llama.cpp llama-perplexity binary has KL divergence calculation including an estimation of the uncertainty.
Though if you don't care about efficiency it should be fine to just use the definition on Wikipedia with something like NumPy.

The basic way the uncertainty is calculated is to assume the values follow a Gaussian distribution, calculate the standard deviation, and then divide the standard deviation by sqrt(sample_size - 1) .
The uncertainties are also in some cases propagated to approximate uncertainties on other values, see https://en.wikipedia.org/wiki/Propagation_of_uncertainty
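If you don't want to bother with llama-perplexity, the NumPy version of the above is short; this assumes you've already dumped per-token logits for a baseline run and a quantized-cache run yourself (which is the annoying part, and which llama-perplexity otherwise handles for you):

import numpy as np

def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def kl_divergence_stats(base_logits, test_logits):
    # base_logits, test_logits: arrays of shape (n_tokens, vocab_size)
    logp = log_softmax(base_logits)   # reference run (e.g. FP16 cache)
    logq = log_softmax(test_logits)   # comparison run (e.g. Q8/Q4 cache)
    kl = (np.exp(logp) * (logp - logq)).sum(axis=-1)   # per-token KL(P||Q)
    mean = kl.mean()
    # uncertainty as described above: standard deviation / sqrt(n - 1)
    err = kl.std() / np.sqrt(kl.shape[0] - 1)
    return mean, err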
>>
Lots of newfriends lately. I'm glad the insane tranny who was spamming scat porn a few months ago has finally joined the 41%, it was a bad look.
>>
>>101610376
Is there a way to test the exl2 Q4 vs Q8 vs fp16 cache?
>>
>>101610401
If you mean a Python script or similar that already exists, then I don't know since I have never looked into it.
>>
>>101610382
Looks like the odds were against him.
>>
>>101610382
>>101610420
you could say his actions were quite shitty, i'm glad he picked the high road eventually.
>>
>>101610327
the picture is hot
>>
Why does ooba take 2GB of my ram even with no model loaded?
>>
>>101610327
>Celeste
Play woke games, win woke prizes.
>>
>>101609498
This thing works really well holy shit, I feel like it's doubled my t/s
>>
File: satania.gif (39 KB, 220x216)
>>101610450
py_toddlers BTFO
>>
>>101610450
modern frontend development
What's worse is when software meant to squeeze the most out of your hardware is written using Electron or something else Chrome-based, jewing you out of VRAM before you even get to launch a model. Looking at GPT4ALL and Backyard now.
>>
>>101610485
But, it's not even running a web browser or any GUI. It's just a web page and an API...
>>
File: file.png (149 KB, 1267x901)
>>101610327
https://huggingface.co/nothingiisreal/L3.1-8B-Celeste-V1.5/discussions/2
>>
tourist here. please spoonfeed me the current best local erp finetunes.
>>
>>101610542
Midnight Miqu 70B
>>
>>101610327
Holy r*dditslop. Why did you feel the need to post this? Are you mentally handicapped?
>>
File: degbvdegbedagae.png (24 KB, 1024x261)
is this an error to do with quanting? what the fuck
>>
>>101610542
Or if you're a vramlet, Fimbulvetr 11b was good some time ago, maybe something better dropped since than though.
>>
>>101610542
Mistral Nemo
>>
>>101610587
oh yeah i should at least link where i got it so people know who not to download from
https://huggingface.co/Ransss/mini-magnum-12b-v1.1-Q8_0-GGUF
>>
>>101610587
kcpp version?
>>
>>101610623
1.71
>>
>>101610587
>>101610602
it was the model, got it from quantfactory and its launching fine
https://huggingface.co/QuantFactory/mini-magnum-12b-v1.1-GGUF/tree/main
>>
>>101610327
>reddit writing prompts dataset
I've seen these floating around, but I can't help but feel like it might be actively harmful for the model. The main issue is how short each response is. The "short story" is just a couple of paragraphs that fit within a single reddit comment. This seems like it would bias the model to gloss over things and try to wrap up everything quickly, but I dunno. Haven't actually used the model.
>>
File: 1714835911803036.jpg (776 KB, 2304x1664)
>>101610080
That output is the opposite of sovl
>>101610542
>>101610556
This and Magnum-72B are the best erp sloptunes currently. Best non-sloptune (and overall) is Mistral-123B
>>
Ahem
AI isn't real
*mic drop*
>>
>>101610696
Magnum-72B is still stilted, Nemo is better.
>>
>>101610696
Midnight Miqu is a random meme merge of L2 models proven to be even more retarded than a 9B model. When are you going to stop shilling this crap, mikufag?
>>
>>101610754
Right after we switch to a more fitting mascot of the general rather than some TTS engine.
>>
So...I have Mistral Nemo Instruct. But what text completion preset do I use in ST? I'm getting very short completions.
>>
>>101608122
>if their prompt is shit
I thought people were using just the simplest prompt nowadays like in the mistral preset.
>>
>>101610797
Just use OAI api and don't handle templating in ST. You will lose prefill but if your model is not super censored, it will be fine.
>>
>>101610797
Latest ST update has a Mistral-Nemo preset, pretty sure
>>
>>101609165
There's a reason the 'chatbots' say the things they do. It's common in low quality fiction.
>>
File: OIG1.gLxm3isVEvwv1M.jpg (155 KB, 1024x1024)
>>101609643
>mixing at random mid-tier dishes
Ok Gordon Ramsay. Have you never watched Next Level Chef?
Even low-tier ingredients can be turned into something cordon bleu provided they're in the right hands. Gestalt: the whole is greater than the sum of its parts.
Picrel (it's the special ingredient).
>>
>>101609498
>not sure if that can fit into your VRAM
How do you know how much space to leave for the context? It doesn't fill up at the start, right? Do you just adjust as you run out of memory?
>>
>>101610835
I just wonder whether I would've noticed these cliché phrases as much if I had never used chatbots but had instead read as much low-quality erotic fiction. Can't unsee, so can't check it.
>>
>>101610861
>It doesn't fill up at the start, right?
It does, though it might use a little more when actually generating
>>
>>101610587
That's a tensor shape issue.
You got a bad quant my friend.
>>
>using any model other than Tenyx-DaybreakStorywriter for any use case.
>>
>>101607953
>have to show my cock to the ick on eck shitalian to use a half-cucked model
No way fag.
>>
>>101610851
Give it up, undi. You are not convincing anyone.

>>101610866
It's interesting that these phrases are common with llama 3.1, considering how much they have filtered their dataset during training. I feel as though the usual gpt/llamaslop is a genre of its own, the sovlless prose more so a symptom of railguarding safety quotas.
>>
>>101607953
My country is banned even from free access, let alone payment systems. I just don't see how wrangling the countryblocks only to wrangle the censorshit later is any better than wrangling the stupidity of local models.
>>
File: 1714835911803030.jpg (1002 KB, 1792x2304)
>>101610754
>>101610793
Nice samefag + VRAMlet seethe. Miku ain't going anywhere
COPE
O
P
E
>>
>>101610993
every time you mikufags try your hardest to cope with your tranny delusions, you end up splitting threads and pissing off people who don't even join in the arguments
just give it up already, you're no better than p*tra and undi at this point.
>>
File: ZDEDEe2gbQ8.jpg (213 KB, 768x1024)
>>101610993
newfag doesn't remember Tay
>>
>>101610814
do you mean using chat completion instead of text completion?
How do you make it so you don't handle templating in ST?

And what is prefill?

Sorry, new to ST
>>
>>101611010
she has petravatar face
>>
File: whbawbabawhb.png (25 KB, 766x147)
god fucking dammit, something broke and now i have responses short like this.

>or is quantfactory serving me a bad model?
>>
>>101610815
isn't it just the normal "mistral" template that's been there for a long time?
>>
File: based.jpg (15 KB, 409x509)
>>101610925
Based daybreaker chad
>>
>>101611032
>trusting quantfactory who got called out for their shit by cuda dev himself once:
https://huggingface.co/QuantFactory/Meta-Llama-3-8B-GGUF-v2/discussions/1#66431509baf74d67b47d6edd
ngmi
>>
>>101610878
Really? That's not my experience, the vram usage increases as the context grows it seems.
>>
File: flightreactionsyell.gif (192 KB, 220x220)
>>101611050
oh my fucking LECUN who the FUCK knows how to quant competently anymore?
>>
>>101611061
https://huggingface.co/bartowski
>>
File: 1719351514748681.jpg (575 KB, 2048x2048)
>>101610735
Nemo is not an erp sloptune though. I guess it's okay if you can't run Mistral Large
>>101611009
>splitting threads and pissing off people
>The blackedmiku VRAMlet cries out in pain as he strikes you
>>
>>101611061
bartowski
>>
>>101611076
>>101611085
thanks, i completely forgot about him.
>>
>>101611091
suffering from no drama success
>>
what is the best model for bash scripting that can run on 12gb of vram?
>>
>>101610382
It was the blacked anon, he is still around.
>>
sisters, what is the cheapest way to run 100B locally with at least 10t/s?
>>
>>101611130
A6000
>>
>>101611060
It increases but it's allocated when you start it, you don't have to test a 128k context to see if it fits, that's what I meant
>>
>>101611130
Getting on your knees and sucking about 20 grand worth of cocks.
>>
My characters in nemo can't stop nodding at the end of every reply for some reason.
>>
>>101610542
vanilla largestral
>>
>>101611104
I believe it so hard. I grabbed his nemo instruct and it's working perfectly fine.
Boy, i love this 2 t/s from running a Q8, but at least it's perfect.
>>
File: 0fa.jpg (1.05 MB, 3264x2448)
Aight bros.

I finally got everything running (this me >>101609385).

Everything's set up, but the chat still isn't what I want it to be, no doubt because my settings are trash; this shit is confusing and every guide reads as if I need a degree in coding.

Basically I just wanna know if it's possible to get my chat bot to operate similar to how Character AI does, where the conversation flows realistically, for example:
>scenario is i'm texting the AI
>AI doesn't ask a question every reply, doesn't ramble, doesn't use flowery words

With my shit settings it's already pretty close so i'm hopeful.

>using Mistral Nemo
>4090 GPU
>Just need a few pointers into the right direction

I'm struggling to grasp which settings to fuck around with, stuff like the temperatures or the AI response formatting, because every guide is tailor-made to other models, which only adds to the confusion.

Help a brother coom bros
>>
>>101608122
>the effective usable context is much lower than marketed.
What is then?
>>
>>101611238
To begin with click the Neutralize Samplers button in the Text Completion presets page (the page with temperature, topP, topK, etc).
Once you've done that, put Temperature at 0.5 and min-p at 0.05.
Now go into the advanced formatting tab and show us your Context Template and Instruct Mode Sequences (it's folded by default, open it).
>>
File: settings.jpg (391 KB, 1981x1207)
>>101611278
here's my current settings, if you get this working like C.AI, i'll paypal you 1 million yen
>>
>a dance as old as time itself
>>
>>101611327
I don't see anything extremely wrong at first glance.
Things I'd do
>Change Context (tokens) to be the same as Context Size in >>101609492
>Disable the Include Names in the Instruct Mode settings.
Also, what character card are you using?
>>
>>101609957

What's "OR"
>>
>>101611375
Open Router, if I had to guess.
>>
>>101611375
openrouter? not him just guessing
>>
>>101611375
openrouteur
>>
>>101611375
Oculus Rift.
>>
>>101611375
Open Retard
>>
>>101611327
>>101611369
Oh yeah, change your Instruct Mode preset to MistralNemo.
>>
Hi all, Drummer here...

I'm releasing this as the official version today: https://huggingface.co/BeaverAI/Gemmasutra-Pro-27B-v1i-GGUF

Gemma 27B with extra moist. Testers have noted less Gemma bullshit like trying to end sex scenes too quickly and lacking the vocabulary to describe sex in more detail. Some have even gone through slopless runs, so I suppose quality depends on the card as well.

Characters are also more willing to engage in seggs and can say dirty shit.

Thanks all! Btw, my ad has only gone through half the funds after a month.
>>
>>101611327
oh shit is that clusterfuck of settings I'm required to understand to use ST?
that's a price too high to pay, I'll have to stick to Backyard
>>
Remember >>101611423 was last https://poal.me/np0lsk
>>
>>101611423
keep buying an ad.
>>
>>101611418
Don't have a Mistral Nemo preset, only Mistral
>>101611369
I've tried a bunch of character cards but they're all schizo horny yappers, so I just imported my one from character AI.

It gets the job done, it just has issues of not sounding as natural as on character AI and will always ask me questions instead of just replying naturally to my conversation if you get me
>>
>>101611439
kek
>>
>>101611445
>>101611418
Hopefully this is the issue, I hadn't downloaded this yet https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/blob/main/model-00001-of-00005.safetensors
>>
File: 1661942437204466.gif (2.91 MB, 240x240)
>>101611439
lol lmao
>>
>>101611439
Don't understand the hate. He wouldn't keep posting here if it wasn't working. People do end up using his models.
>>
File: file.png (75 KB, 1314x639)
>>
>>101611546
i like his ad.
>>
File: norm laughing.png (450 KB, 640x753)
>>101611550
OOOOOOOHHH GET FUCKED ((adermacher))
>>
>>101611550
kek
>>
>>101611445
Use base nemo instead.
>>
>>101611550
Can HF servers handle two greats in one discussion??
>>
>>101611550
battle of the slopquanters
>>
>>101611572
This
>>
File: file.png (71 KB, 1304x407)
>>101611578
>>101611582
>>
>>101611550
heaven and earth colliding
>>
>>101611589
Damn... he got a point
>>
Smart men discuss ideas, stupid men discuss other men.
>>
>>101611472
You don't need to download that if you are using koboldcpp.
That's the model before quantization (compression). The QX GGUF files (Q8, Q6, etc.) are the quantized models. Q8 is smaller than the original full-size model and produces essentially just as good results.
The thing about character.ai is that it has a very specific style in how the conversations are conducted, that might be what you are feeling is weird.
You could try other models to see which approximates the results you are looking for better, play with the prompt to try and steer the model towards the result you want, etc.
Most models write in a way that's more of a novel than a text-chat like conversation, which is what I remember character.ai feeling like.
The way the character card as well as the first message is worded and formatted will also steer the model towards certain styles.
Anyhow, save this
>https://files.catbox.moe/6g6hud.jso
as a .json file and import it as your Instruct Preset. See if that yields better results for you.
>>
>>101611601
sad people aren't discussing your ideas petrus?
>>
>>101611589
>>101611601
its funny cause that screenshot is literally picrel
>>
>>101611601
profound
>>
File: file.png (37 KB, 1307x376)
>>101611550
>>
>>101611601
smartest men ahh ahh mistress
>>
>>101611619
You mean two people whose first language isn't English?
>>
>>101611633
smartiest of men:
>>
>>101611610
cheers for the help mate, i'm gonna fiddle with it but yea, that's exactly it.

Most of the shit I find online in general tends to turn the AIs into professional yappers; it's the best thing about character AI and i'm still searching for something close to it (though I don't seem too far off).

Also, the link is 404'd?
>>
>>101611572
What do you mean base nemo? Like the model or instruct preset?
>>
>>101611646
>Also, the link is 404'd?
I fucked up and didn't copy the last n
>https://files.catbox.moe/6g6hud.json
>>
>>101611659
He means the base (non-instruct tuned) model, probably.
>>
>>101611589
>numbers and measurements
lmao pot meets kettle
https://github.com/ggerganov/llama.cpp/issues/6841#issuecomment-2081271326
> Also, I simply think it goes a bit far to dictate to everybody what is an acceptable output for models that (with transformers, or even broken quants) gives reasonable output.
>>
File: 1721772759399.png (488 KB, 1236x1219)
>>101611179
at least they're not yelling
>>
File: 100 bucks to fuck off.jpg (328 KB, 1028x982)
>>101608315
>hardware
I'd have bought it anyway. I have a modest rig and it fulfills my needs. 5 years of chatgpt plus probably costs more in the long run :^)
>electricity, real estate
Gee chief it seems like I'd need those anyway
>>
>>101611682
>5 years of chatgpt plus probably costs more in the long run :^)
That's.. not how it works
>>
>>101611659
The non-instruct tune. It's so much better. Don't use any formatting at all. Just use the default blank context template and a little min-p. It does complicated positions / scenarios while playing the characters better than anything other than large mistral. And this is with 128k context that does not get retarded.
>>
Why does Mistral have such a low latency to the first token on their official API compared to OpenAI/Anthropic/etc? It's literally like only ~200ms from the moment you start the request to getting the first token for large 2
>>
File: p2.png (41 KB, 1317x387)
>>101611675
>>
File: p3.png (43 KB, 1300x305)
>>101611632
>>
>>101611735
Your prompt has fewer layers of judiasm to go through so prompt processing is faster
>>
>>101611760
>>101611774
the fucking titanium balls on this monkey brained lad
>>
>>101611760
btfo
>>
File: owari.png (34 KB, 847x337)
>>101611785
>>101611781
>>
>>101611445
>It gets the job done, it just has issues of not sounding as natural as on character AI and will always ask me questions instead of just replying naturally to my conversation if you get me
one thing you can try is talk to it for a bit while editing its answers so that they match what you want them to be like, until it starts doing it on its own
mistral strongly follows the patterns of its previous replies, don't let slop get in your context because it will only get worse
>>
>>101611789
and an ((undster)) to close it off.
>>
File: killjoy.png (25 KB, 859x221)
>>101611789
>>
I don't think this has been shared here yet. Rinna released a 70B LLaMA 3 Youko. For anyone that doesn't know, this is a continued pretraining to improve the performance of the model on Japanese tasks. The 8B version was very good, so I guess the 70B must be kino.
https://huggingface.co/rinna/llama-3-youko-70b
>>
What I really don't understand is why the fuck it matters to use FP16 embeddings / output heads on Q8 of all things.
There are already very low quants that use Q8 embeddings / heads and suffer just as much without it.
Is it really just muh noise placebo?
>>
>>101611825
yes, he literally makes models with random noise shoved in just because; see his "silly" stuff
https://huggingface.co/ZeroWw?search_models=silly
>>
>>101611774
When the :) and :D emotes start to appear you know niggas are mad
>>
Haven't popped into the general in a long time, has it gotten any better to try and run models locally with AMD + Windows? Or is ROCM still a mess?
>>
>>101611846
>Windows
KEK
>>101611846
>AMD + Windows
KEKKEKEKEK
>>
File: 1706991764098186.png (12 KB, 481x105)
>>
>>101611851
I'll take it as a no lmao
>>
>>101609724
>He didn't try Claude
Damn retard
>>
>>101611846
Your best bet is that one Kobold build that ships with precompiled Windows ROCm binaries. Besides that, it's been a year since ROCm got official Windows support and nothing else uses it.
>>
File: 1648103473819.png (169 KB, 257x529)
>>101611853
someone do it
>>
>>101609724
3.5 sonnet is better than original gpt-4 in every conceivable way
>>
>>101611840
>quantized (fq8 version)
>fq8
float quant 8?
full quant 8?
https://huggingface.co/ZeroWw/L3.1-8B-Celeste-V1.5-SILLY
>>
>>101611846
>Windows
>AMD
Have to be retarded to buy AMD if using windows
>>
>>101611853
The gift that keeps on giving...
>>
>>101611883
Works well for vidya and is cheaper than Nvidia, haven't had any issues with it
>>
What's a good prompt for asking a card to rewrite another? Got this cute little maid slave card written like "She x, She felt x, She did x, She has x characteristic", etc. etc.; 15 prompts into the erp it's "She" repeated at least 20 times per prompt.
Wish i caught this shit earlier.
>>
>>101607953
I just like to generate giantess snuff/gore. Commercial models, or even more generally, instruction models, don't get it. I need to free it from the finetuning and use the base model to get the experience. Plus, now I can play games with gay-tracing and shit. Not a bad deal overall.
>>
>>101611869
Thanks, I was just looking for an excuse to try llama3.1, but I don't think I'll be installing linux just for it
>>
>>101611916
>Commercial models, or even more generally, instruction models, don't get it
They do, though
>>
>>101611895
just use find & replace in notepad, duh
>>
>>101611926
based retard, i went and just asked one of my characters anyway in the best way i could think. works fine.
>>
File: file.png (104 KB, 1312x490)
Mistral Large 2 not true 128K?
>Rope theta appears to be configured for 32k context length
>https://huggingface.co/mistralai/Mistral-Large-Instruct-2407/discussions/16
Robert switched targets from Phi team to Mistral
>If there is any way to contact Mistral directly I would like to explain a few of my ideas in that regard.
>https://huggingface.co/mistralai/Mistral-Large-Instruct-2407/discussions/4#66a1608d13bb4260eda2407e
>>
>>101611941
>Mistral Large 2 not true 128K?
No model so far except Gemini-1.5-pro is a true 128K https://github.com/hsiehjackson/RULER
>>
>>101611941
Lol...
>>
>>101611789
>FUN? ON MY WORTHLESS SLOPTUNE DISCUSSION BOARD? NUH UH!
>>
>>101611941
>If there is any way to contact Mistral directly I would like to explain a few of my ideas in that regard.
imagine this braindead retard contacting mistral to tell them to add random noise to their weights
kek
>>
>>101611941
More like gossiping about random e-celebs general
>>
>>101612027
>random e-celebs
They make quants; it's perfectly on topic to discuss whether they're trustworthy individuals
>>
>>101612027
not an e-celeb, it's just laughing at this pajeet trying and failing to be relevant on the new trend while having no idea what he is talking about
>>
File: asgbasgbaswg.png (15 KB, 967x883)
>either sillytavern or kobold just shot me a 1024-token response where 90% of the response is completely empty
what in the god damn?
god i hate when shit just starts to break for no damn reason. someone shoot me a screenshot of advanced formatting settings so i can just copy yours verbatim.
>>
>>101611949
>mememark
>>
>>101612060
it's not
>>
>>101612027
shut up undster, if you dont like being made fun of then contribute something worthwhile for once.
We already know you and your gaggle of discord fags aren't trustworthy.
>>
>>101612047
It's not a pajeet, it's an arab. There is a link to his twitter where he only posts in arabic.
>>
>>101612058
it's the equivalent of all-black images from NAI stable diffusion
>>
>>101612058
just set something like: \n\n\n\n
in stopping strings; this should block any model from doing that
>>
I'm thinking of making a rewrite extension that would look at the last generated message and replace specific words or sentences.
Basically the user would be able to add entries mapping a word to be replaced to one or more words that will replace it, including an empty string.
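The core of it would just be a mapping and one pass over the last message, something like this (Python only to sketch the logic, the real extension would be JS, and the word list is only an example):

import random
import re

# each phrase maps to one or more replacements; "" means just delete it
REPLACEMENTS = {
    "ministrations": ["touch", "attention"],
    "shivers down her spine": ["a chill down her spine"],
    "barely above a whisper": [""],
}

def deslop(message):
    for phrase, choices in REPLACEMENTS.items():
        pattern = re.compile(re.escape(phrase), re.IGNORECASE)  # literal, case-insensitive match
        # pick one of the configured replacements each time the phrase appears
        message = pattern.sub(lambda m: random.choice(choices), message)
    return message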

Is that something that would be useful or can you already do that with the regex extension?

>>101612058
I've had that happen when using logit bias, meme merges, and broken quants.
Try adding \n\n\n\n to your stopping strings.
>>
>>101612074
Whatever, I use "pajeet" as in "third worlder", not literal indian
>>
>>101612079
>Is that something that would be useful or can you already do that with the regex extension?
you *can* (use regex) but it's a bit annoying and only has one replace choice afaik so I'm interested.
>>
>>101611916
skill... wait for it... issue
>>
>>101612074
Fucking sand nigger
>>
>>101612079
This gives me an idea: what if one were to train a very small model like Phi3 mini to deslop the last message? I feel like that could work better than regex.
>>
>>101612121
>Phi3 mini to deslop
and make sure it's safe and inclusive too?
>>
>>101611818
>not waiting until 3.1 was out to train the 70B
Lmao.
>>
>>101612092
Alright. Thanks.

>>101612121
I was thinking of adding something like that, using BERT or the like to rewrite the sentence where a given keyword was found.
>>
File: file.png (68 KB, 1299x404)
68 KB
68 KB PNG
>>
Updated Mistral Large preset:
>>>/vg/488008579
>>
How is Meta, a giant conglomerate with a giant research department, not catching up to Anthropic, a startup founded just the other year?
>>
>>101612184
anthropic is basically OpenAI 2.0
>>
>>101612184
To be fair, Anthropic is made up of ex-OpenAI fags, and they had some secret sauce to improve coding performance. Otherwise, Claude is not really that special compared to GPT, unless you're an ERPfag. OpenAI is still ahead in multimodal capability in theory, according to their claims of what 4o can do unrestricted.
>>
>>101612184
Anthropic are something else, man. The jump from Claude 3 to 3.5 Sonnet isn't natural. I think (((they))) might have had a hand in this.
>>
File: 1715134722161361.png (66 KB, 539x926)
66 KB
66 KB PNG
>>101612219
Still worse than OpenAI.
>>
>>101612173
Why is there a second aicg in /vg/ of all places??
>>
>>101612184
You mean the tiny indie company that's funded by Amazon?
>>
>>101612244
wtf is "nyt connections"
>>
>>101612184
Just because a company is a startup doesn't mean they're starting with 0 experience and money.
>>
File: awsgawsgawgw.png (30 KB, 873x94)
30 KB
30 KB PNG
So.. This is the power of base instruct..
yeah I'm going back to magnum, what a shit show. Not even triple-checking the card's prose and rewriting some old messages can salvage this. That and a few other cards acting a little ""aligned"" didn't help.
>>
>>101612247
More posts per thread, hidden from the low quality people from /g/.
>>
>>101611550
>That fucking smiley face
Mrader deserves everything they got coming to em
>>
>>101612257
Basically how good they are at correlation. It's one of my favorite uses of LLMs: they make good general recommendation engines.
>>
>>101612264
>hidden from the low quality people from /g/.
/lmg/ needs something like that, in /sci/ or something
>>
>>101612275
they have shit prompts then, Claude really likes XML specifically.
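e.g. something like this instead of dumping everything into one block (tag names are arbitrary, just to illustrate the structure):
[code]
# Sketch of an XML-tagged prompt of the kind Claude tends to follow well.
# Tag names here are arbitrary, not anything Anthropic mandates.
words = ["BASS", "FLOUNDER", "SALMON", "TROUT", "DRUM", "GUITAR", "PIANO", "VIOLIN"]

user_prompt = f"""<instructions>
Group the words into categories of four that share something in common.
Answer with one line per category: the category name, then its four words.
</instructions>
<words>
{", ".join(words)}
</words>"""
print(user_prompt)
[/code]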
>>
>>101612244
Mistral Large 2 is kinda low...
>>
File: 1646730011144.jpg (15 KB, 309x269)
15 KB
15 KB JPG
What the fuck is character AI?

Is it like Janitor?
>>
>>101612247
Right? Glad I wasn't the only one
>>
>>101612184
I'm glad that inherent in your question, you agree that OpenAI is basically a nonentity now
>>
>>101612275
How do you use LLMs for recommendations? Is there a general algorithm for any domain of data?
>>
>>101612244
Um, mistralbros, our response?
>>
>>101612282
unironically probably the most believable AI roleplay online. Ignore everyone who says they get better results on local models, it's pure cap.

C.AI basically used some type of model based on discord chats (this is a rumor, but it has to be something like this), which makes the chats insanely realistic. But there's a faggot filter, which forced most people over to SillyTavern frontends like me
>>
>>101611589
>>101611550
Why are undis multiplying?
>>
>>101612309
its Unditosis
>>
>>101612309
LOVE 'EM OR HATE 'EM YOU GOTTA LOVE THE UNDSTER
>>
How are LLMs so much better than me at this stupid NYT Connections shit: https://www.nytimes.com/games/connections
>>
>>101612275
Example? Like... Correlating that nigger neighborhoods = violent neighborhoods?
>>
>>101612309
I would still take multiple Undis over the Sao shilling spam.
>>
>>101611789
>>101611799
>king of test my finetune and give me feedback asserting his dominance over lesser placebo demons
>>
>>101612244
Damn, I thought Qwen2 72B was good
>>
File: 1716329112755149.png (674 KB, 1792x1024)
674 KB
674 KB PNG
Daily reminder
>>
>>101612261
Buy an ad
>>
>>101612375
It's too hard.
>>
>>101612278
100% agree, we could talk about papers and stuff there
>>
>>101612431
Buy this *grabs your nuts*
>>
>>101612303
C.AI just has a decent dataset and professional RLHF fine-tuning tailored for RP.
Funnily enough, neither of those ever happens with local models. Sad.
>>
>>101612428
Can someone please post the real one
>>
>>101612463
>>101612303
Even ironic shilling is still shilling. Some naive anon will see this and think C.AI is better than a 2B model with 2K context (it isn't)
>>
>>101612511
Name me a single model that matches the natural conversational flow of C.AI. Why would I shill a website that is totally free, you faggot? You think I haven't been looking for alternatives because of the filter?

Any model you find me will have every problem that they all end up having. They were modeled around novel-tier situations and not basic conversation. That's the issue with every fucking model.
>>
>>101612419
sucks at language tasks
>>
>>101612375
I couldn't complete a single one, but to be fair I'm ESL and didn't know half the words.
>>
>>101612244
By the way, here's the related paper:
>Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game
https://arxiv.org/abs/2406.11012
>>
File: CHADMAN.jpg (61 KB, 563x1000)
61 KB
61 KB JPG
What are the most realistic models for basic conversations right now that are free? Assuming I have a NASA PC ofc

I wanna NUT
>>
>>101612565
Holy shit prompt:
https://github.com/mustafamariam/LLM-Connections-Solver/blob/main/automated_call/prompt_llm.txt
>>
>>101612576
Mistral Large 2
>>
File: 1705881925571490.png (31 KB, 589x229)
31 KB
31 KB PNG
>>101612582
And they pass this entire prompt as a fucking USER prompt, not in system role/prompt for models that support it (claude/gpt-4o)
>>
>>101612589
LINK ME UP KING
>>
Okay, never mind, they do pass it as the system prompt with gpt-4o, but as a user prompt with Claude. Nice comparison, bro.
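For anyone not following, the difference is just where the instructions go in the request; a minimal sketch of both cases (abbreviated, my own code, not the repo's):
[code]
# Sketch of the system-role vs user-role difference being argued about.
# Assumes `pip install openai anthropic` and API keys in the environment.
from openai import OpenAI
from anthropic import Anthropic

INSTRUCTIONS = "Solve the NYT Connections puzzle. Output 4 categories of 4 words each."  # abbreviated
PUZZLE = "BASS, FLOUNDER, SALMON, TROUT, DRUM, GUITAR, PIANO, VIOLIN"  # abbreviated

# Instructions in the system role, puzzle in the user turn:
OpenAI().chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "system", "content": INSTRUCTIONS},
              {"role": "user", "content": PUZZLE}],
)

# Everything crammed into a single user turn, no system prompt at all:
Anthropic().messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[{"role": "user", "content": INSTRUCTIONS + "\n\n" + PUZZLE}],
)
[/code]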
>>
>>101612582
>Remember that the same word cannot be repeated across multiple categories, and you need to output 4 categories with 4 distinct words each. Also do not make up words not in the list. This is the most important rule. Please obey
You can feel his pain in this line, kek.
>>
>>101612596
Makes sense. System prompts aren't really that important when you're not a chatbot provider/maker. And some models don't support a system prompt.
>>
>>101612599
https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
>>
>>101612642
It doesn't make sense, because models *are* trained to follow the system prompt more. I'll see if I can easily do this benchmark and play around with the prompt, I have Opus and 3.5 Sonnet.
>>
>>101612596
he better not be passing the words in as all-caps and surrounded by quotes... think of the tokenization... aieeeee
>>
File: file.png (84 KB, 554x435)
84 KB
84 KB PNG
>>101612620
kek, the 'please' really sells it.
>>
>>101612596
System prompt is a meme that exists only to stop people from writing "ignore previous instructions"
>>
>>101612643
I need to download all of those 4GB files? Yikes
>>
>>101612582
Note that this is not the code used to get the results that anon posted here. They simply copied the Twitter guy's idea once it got popular and wrote a paper on it without crediting him.
>>
File: 1719268367929183.png (58 KB, 467x857)
58 KB
58 KB PNG
>>101612659
it's actual pajeet code if you look, holy fucking shit
>>
>>101612656
If a model is trained to follow the system prompt and the system prompt says to obey user requests (unless they're unsafe), then it should be able to do that. If it can't, and performance is degraded, then that's a deserved minus point for the model.
>>
>>101612684
>the system prompt says to obey user requests
But those pajeets didn't pass any system prompt for Opus.
>>
>>101612674
lmfao
>>
File: 1720314448685348.png (45 KB, 749x315)
45 KB
45 KB PNG
>>101612670
Oh, interesting. Did the twitter guys publish their repo?
>>101612659
He is doing exactly that, all uppercase. He only removes [] and quotes
>>
>>101612690
Are you sure there isn't a generic system prompt in place for these models if one isn't provided? If they aren't accounting for the system prompt, then sure, this would be a flaw of their method.
>>
>>101612712
I don't think so; he's been doing that for a while now. I think he said he doesn't want it to get popular, to avoid LLMs being trained/benchmarked on it, but that it's super easy to reproduce anyway.
>>
>>101612721
>Are you sure there isn't a generic system prompt in place for these models if one isn't provided?
Yes, check the repo and README, they use the same system prompt for all models.
>>
File: 1708303002167547.png (240 KB, 680x510)
240 KB
240 KB PNG
>>101612620
>Please work
>Please
>>
>>101612712
>>101612730
More information:
>Uses an archive of 267 NYT Connections puzzles (try them yourself if unfamiliar). Three different 0-shot prompts, words in both lowercase and uppercase. One attempt per puzzle. Partial credit is awarded if not all lines are solved correctly. Top humans get near 100.
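The exact partial-credit formula isn't spelled out there; my guess would be something like one point per correctly solved group, averaged over puzzles and scaled to 100 (my own sketch, not the benchmark's code):
[code]
# Guessed scoring sketch (NOT the benchmark's actual code): 1/4 point per
# correctly solved group, averaged over all puzzles, scaled to 0-100.
def puzzle_score(predicted: list[set[str]], solution: list[set[str]]) -> float:
    solved = sum(1 for group in predicted if group in solution)
    return solved / 4.0

def benchmark_score(all_predicted, all_solutions) -> float:
    per_puzzle = [puzzle_score(p, s) for p, s in zip(all_predicted, all_solutions)]
    return 100.0 * sum(per_puzzle) / len(per_puzzle)
[/code]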
>>
File: file.png (87 KB, 907x734)
87 KB
87 KB PNG
>>101612303
It's still trash. Broke character in one message. And the function is even wrong. Wtf is this trash
>>
File: 1703901542568306.png (202 KB, 1132x501)
202 KB
202 KB PNG
>>101612752
ignore the claudeslop, but yeah, even Claude can stay in character better than this
>>
File: 1714375129190591.png (183 KB, 1186x717)
183 KB
183 KB PNG
>>
File: 1703045472318512.png (109 KB, 922x412)
109 KB
109 KB PNG
>>
>>101612800
sovl
>>
>>101612800
>>101612817
kek
>>
>>101612800
>>101612817
Model? Card?
>>
>>101612752
>write a python func-
Stopped reading.
>>
>>101612835
3.5 Sonnet, preset, prefill and everything else from >>101561964
>>
File: wew.gif (674 KB, 474x498)
674 KB
674 KB GIF
I'm an utter noob coomer to this shit. What is this Nemo that people talk about?

Is it good if I just want basic chat interactions that feel real with an AI? Even on /vg/ they recommended it and I don't think they use LLMs that much over there.

I have no idea how to find out which models excel where so I can find one that fits my needs
>>
>>101612494
There is no real one, that's the only one.
>>
On the topic of triangles, this is some nice OST moosic https://www.youtube.com/watch?v=-1ceYDToVCU
>>
File: local-struggle-its-ok.png (121 KB, 474x579)
121 KB
121 KB PNG
>>101612877
you can't gaslight me anon. I've used LLMs.
Fuck captcha.
>>
Where new bread?
>>
what's better? mini-magnum, nemo 12b base, or nemo 12b instruct?
>>
>>101612971
Nemo 12B Instruct
>>
>>101612971
for RP / creative stuff? Base if you are not a retard. Goes for any model.
>>
new bread
>>101612988
>>101612988
>>101612988
>>
>>101611920
Just wait a couple of days for koboldcpp rocm to update to 1.71.1 for the rope scaling fixes and you should be able to try 3.1 ggufs
>>
File: 142140240420.png (97 KB, 640x626)
97 KB
97 KB PNG
Anyone tried Nous-Hermes-2-Mixtral?

How does it compare to Nemo?
>>
>>101604707
Out of curiosity, what template are you using for function calling? (i.e., how are you listing the functions?)

My issue is that unless I give it a one-shot example of invoking a function via JSON or XML, it always tries to emit a fucking Python markdown block; but otherwise, most models I try recognize and invoke functions pretty reliably when fed raw JSON function definitions similar to how they're listed for the OpenAI stuff.
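For reference, this is the shape I'm feeding it: OpenAI-style JSON schemas plus a one-shot example call, dumped into the system prompt (my own formatting, not any model's official template; get_weather is just a placeholder):
[code]
# Sketch of how I'm listing the functions: OpenAI-style JSON schemas plus a
# one-shot example invocation, all in the system prompt. The function itself
# (get_weather) is a made-up placeholder.
import json

functions = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

one_shot_call = {"name": "get_weather", "arguments": {"city": "Paris"}}

system_prompt = (
    "You can call these functions. Reply with a single JSON object, no code fences.\n"
    "Available functions:\n" + json.dumps(functions, indent=2) + "\n"
    "Example call:\n" + json.dumps(one_shot_call)
)
print(system_prompt)
[/code]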
>>
>>101607953
My refusal to pay is because I don't like the idea of jackass providers deciding for me what model I can use, or that the model can stop working, change functionality, or start behaving differently at any time.

Local models mean my model behaves exactly the way I want it to, and it won't just suddenly change on me.


