/g/ - Technology


File: ComfyUI_00222.jpg (812 KB, 2048x2048)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>100357937 & >>100349031

►News
>(05/06) IBM releases Granite Code Models: https://hf.co/collections/ibm-granite/granite-code-models-6624c5cec322e4c148c8b330
>(05/02) Nvidia releases Llama3-ChatQA-1.5, excels at QA & RAG: https://hf.co/collections/nvidia/chatqa-15-662ebbf6acc85f5c444029a8
>(05/01) KAN: Kolmogorov-Arnold Networks: https://arxiv.org/abs/2404.19756
>(05/01) Orthogonalized Llama-3-8b: https://hf.co/hjhj3168/Llama-3-8b-Orthogonalized-exl2
>(04/27) Refusal in LLMs is mediated by a single direction: https://alignmentforum.org/posts/jGuXSZgv6qfdhMCuJ

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling/index.xhtml

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>100357937

--Red Hat Announces RHEL AI: >>100358995
--Revolutionary LLM Feature Transfer Tech?: >>100359185 >>100359239
--Anon's ERP Model Review: Instruct, Tsukasa, Lumimaid & More: >>100362056 >>100362078 >>100362127 >>100362182 >>100362233 >>100362285 >>100362230 >>100362315 >>100362253
--Training AI to Discern Truth from Falsehoods in Online Learning: >>100362918 >>100362962 >>100362996 >>100363032 >>100363065
--Exllama2 Crashing Issues with TabbyAPI and GPU Memory Usage: >>100358064 >>100358087 >>100358326 >>100364096
--gpt2-chatbot is MAI-1, Microsoft's Anti-OpenAI Model: >>100358074 >>100358093 >>100358643 >>100359649
--Found 'Locustgirl' Image in Archive Using Keyword Search: >>100359218 >>100359305
--Llama-3 Models Struggle with Possessive Forms: >>100360206 >>100360239 >>100360241 >>100360264 >>100360352
--DRY Repetition Penalty: A Game-Changer for RP Looping Issues?: >>100360602 >>100360779 >>100360932 >>100361055
--Llama.cpp: Unexpected Space in Context?: >>100360999 >>100361078 >>100361563 >>100361767 >>100361872 >>100361564 >>100363017 >>100363214 >>100363318 >>100363436 >>100363550
--Huggingface's Grip on Datasets and Models: A Cause for Concern?: >>100361377
--CPU Speed Boost? Llama3-8B on Old Laptop Surprises Anon: >>100363225
--Backend Confusion: Oobabooga, Llama.cpp, and Kobold.cpp: >>100364150 >>100364161 >>100364170 >>100364247
--MS Copilot's Sampling Behavior & Llama.cpp Server Experiment: >>100364264
--Newfag Seeks Help with Wizard 13b Model Prompts: >>100360726 >>100360796 >>100362412
--The Quest for an Open-Source AI Messiah: >>100361831
--Miku (free space): >>100358483 >>100358488 >>100358534 >>100358628 >>100358811 >>100359392 >>100359675 >>100359866 >>100360096 >>100360173 >>100360272 >>100360306 >>100360365 >>100360413 >>100360602 >>100360636 >>100361252 >>100361385 >>100361909 >>100361960 >>100361967 >>100363573 >>100364012

►Recent Highlight Posts from the Previous Thread: >>100358467
>>
>>100363214
It seems I have to correct myself yet again.
The server unconditionally passes the add_special flag to a function called llama_tokenize when tokenizing the first part of the prompt.
That function then checks whether the model has the special_add_bos flag; this is printed as tokenizer.ggml.add_bos_token on the console and can be changed with --override-kv.
If both flags are true, a BOS token is added.
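In pseudocode the whole thing boils down to roughly this (a simplified sketch of the logic above, not the actual llama.cpp source; the names and the BOS id are just for illustration):
[code]
BOS_ID = 128000  # llama-3 BOS id, used here purely for illustration

def raw_tokenize(text: str) -> list[int]:
    # stand-in for the real tokenizer; only the flag logic matters for this sketch
    return [ord(c) for c in text]

def tokenize_prompt(text: str, add_special: bool, add_bos_token: bool) -> list[int]:
    tokens = raw_tokenize(text)
    # BOS gets prepended only when BOTH the server's add_special flag
    # and the model's tokenizer.ggml.add_bos_token flag are true
    if add_special and add_bos_token:
        tokens = [BOS_ID] + tokens
    return tokens
[/code]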
>>
>>100364675
if that's confusing even to you that's a clear sign there's too many special cases and code should be deleted.
the only good commits are red commits.
>>
>>100364633
>>100364645
tet
>>
>all this convoluted backend tokenization bullshit
Oh my fuck.
>>
TTS anons rise up. Share with me your secrets. Reposting in this thread. I have a lot of voice samples and want to distill it down into a TTS model to use for RP. What have you tried? What works for you?
>>
File: 1708831432069964.jpg (223 KB, 1756x1756)
Kurisu
>>
>>100364645
>Red Hat Announces RHEL AI
>Red Hat Enterprise Linux AI (RHEL AI), a foundation model platform to seamlessly develop, test and run best-of-breed, open source Granite generative AI models to power enterprise applications. RHEL AI is based on the InstructLab open source project and combines open source-licensed Granite large language models from IBM Research and InstructLab model alignment tools
How did this not get a single (you)? This seems like pretty big news.
>>
Is there a way to log what is going directly into the model? At this point I have no fucking idea if I should have Add BOS Token checked in ST or not. And yes, I know about the ST console, but it seems that doesn't matter.
>>
File: phi-3 mini fp16 temp 0.png (108 KB, 780x957)
What the everliving FUCK is happening? I am so fucking done
>>
If I swipe on MidnightMiqu I get totally different responses, if I swipe on Llama3 I get pretty much the same just reworded a bit.
What does this say about the models?
>>
>>100364778
VoiceCraft came out recently, but seemed convoluted to get working. XTTSv2 + RVC is still the gold standard for voice cloning.
>I have a lot of voice samples and want to distill it down into a TTS model
Try finetuning StyleTTS on your samples.
>>
>>100364813
>Granite generative AI models
>34B
YAY!
>Coding only
Oh..
>>
>>100364834
Thank you! I now have somewhere to start!
>>
File: file.png (923 KB, 850x1370)
>>100364830
There is a bug somewhere.
>>
>>100364813
>Granite generative AI models
>34B
YAY!
>Coding only
Yay..
>>
>>100362285
Any reason why you have two mediums in your macro?
>>
>>100364645
>gpt2-chatbot is MAI-1, Microsoft's Anti-OpenAI Model
are you retarded?
>>
>>100364633
Thread Theme:
https://www.youtube.com/watch?v=nZNwH4-l1WY
>>
>>100364973
/lmg/ queen
>>
I was about to give up on llama3 but
setting temp smoothing all on 2 and getting rid of any sysprompts made it work pretty well
there are occasional (((whispers))) but they dont get repeated too much
the biggest issue is with its popculture knowledge though... cant fix that with samplers
>>
I usually reserve all my shitting for Undi and never dare shit on actual devs, but this beginning-of-sentence token thing is a complete shitshow. A doubled token should clearly always be deleted at the backend level, because I can't even imagine what sort of retarded research you'd be doing if you intentionally added a double token. And if you are doing that, you should be forced to go out of your way to do it, because probably nobody will do it intentionally anyway. Enjoy your bugs and people not knowing whether it is working or not.
>>
File: slowwww.png (122 KB, 545x447)
>this is news according to twitter
we knew that last week
>>
people on every other social network are so fucking retarded

I keep seeing people who should know better, industry insiders even, happily speculate that gpt2chatbot is gpt-5 with seemingly no awareness of how incredibly bearish it would be for OpenAI if that were the case

they have tried the model, so they KNOW it's only 10-20% better than current gpt-4-turbo, but somehow they think it would be good news if it turned out to be GPT-5. rather than clearly a sign that everything has stopped and LLMs are over

obviously there's retards and schizos here too, but the specific forms the retardation takes here are somehow much more tolerable and don't make me want to shake people and ask them what the fuck they're thinking
>>
File: FB_IMG_1683640914996.png (300 KB, 563x645)
So when's the next big happening? Llama 3 was kind of a nothingburger thanks to no models in between 8B and 70B
>>
File: 1702444124052011.jpg (46 KB, 646x648)
>>100365192
>cant run the 70b
>>
>>100365192
>Llama 3 was kind of a nothingburger thanks to
after a month, llama.cpp still has issues running the damned things
>>
>>100364830
Some models are just extremely overconfident in what they want to say. You can look at the logits and see it directly. It's not just llama3; Mixtral-8x7b-Instruct and XWin are also like that. Nobody seems to know exactly what causes it: overfitting, RLHF, or just the makeup of the dataset are all possibilities.
>>
>>100364830
that llama3 is overcooked
>>
>>100365258
>>100365265
>>100364830
try snot sampling and that new rep penalty magic method the name of which i forgot.
models have different "natural" temperatures. midnight miqu is just a hot bitch
>>
>>100364914
probably to get medium with twice the propbability of short or long
>>
>>100365205
I CAN run whatever I want given enough time, but after a certain point it stops being worth it
If I had an AGI model that runs at 0.05 T/s I wouldn't use it
I REFUSE to throw more money at the problem, you can do that forever and not be satisfied
>>
What the fuck is snot sampling
>>
>>100364914
What >>100365318 said.
There's a lot of really cool macros on silly.
Just a note: if you ever want to use random in a prefill, using the "Start Reply With" field, use the pick macro instead. It's like random, but it won't change for every token generated (the changing doesn't actually do anything bad aside from making silly have an epileptic attack while generating).
>>
>>100365296
>snot sampling
DIE ALREADY
>>
>>100364834
NTA but any setup tips for XTTSv2 +RVC? On loonix
>>
>>100364778
I've just been using xttsv2. trained with 3 minutes of clean audio (no background noises). i literally ripped voice clips from a game wiki and edited out gaps in audacity, lol.
>>
>>100365159
It's GPT-4+x. They're running a trial balloon to determine the value of x. You're saying it shouldn't be 1. What about, say, 0.1? GPT-4.1 being 20% better beats expectations and sama wins. GPT-5 isn't safe for release until after the election, they'll say.
>>
>>100365405
Thats almost exactly what I'm trying to do then. Sweet. I have about 45 4-10 second audio clips.
>>
Can you change the temperature while it is streaming or is that only possible at the beginning?
>>
>>100365356
Any chance you could share your settings overall? I'm using the official ones from ST (aside from the last response field) and it just repeats the previous responses word for word. I even double checked and I have the bos added correctly.

Skip special tokens?
>>
>>100365347
You are in the wrong hobby mate
>>
SNOT IS THE AGI BEFORE THE AGI


BOW
>>
>>100365392
XTTSv2 is just Coqui's best and largest model before they shut down.
>pip install tts
is all you need.
If you use ooba, you can try alltalk_tts
For RVC, I use this: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/docs/en/README.en.md
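For reference, the basic cloning call with Coqui's python API looks roughly like this (a minimal sketch; the paths and text are placeholders, and the model downloads on first run):
[code]
from TTS.api import TTS  # pip install tts

# load the XTTSv2 checkpoint
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# zero-shot clone from a clean reference clip, write the result to a wav
tts.tts_to_file(
    text="Testing a cloned voice.",
    speaker_wav="samples/clean_clip.wav",  # placeholder path to your reference audio
    language="en",
    file_path="out.wav",
)
[/code]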
>>
>>100365450
If 24gb is seen as meager, I shudder to think of how someone with 8gb feels
I used to count myself among them until a few months ago, dark times in hindsight...
>>
>>100363959
>>100364183
>just the normal instruct.
Yes, I was using the orthogonalized one.
In general normal instruct works; it just has a higher refusal rate. Longer system prompts or a prefill of course work with it.
Breaking the initial "I cannot" or similar response by adding some token there (in my case I added "Lili") works even with normal instruct, as it did with l2-chat and does with many other models, local or otherwise (the same trick works easily with cloud models like Claude, all versions); the rest of the stuff was 0shot except 2 refusals which I regenerated (in >>100363023).
This is true of l3-instruct, and it was true of l2-chat; I think most people are familiar with it by now.
I guess someone could try to do the orthogonalization better (find out whether the refusal for ero writing is different from other ones), or just do it correctly with DPO or RLHF or similar techniques - at least if you want to preserve meta's tune (llama3-instruct). If not, we do have a number of acceptable tunes; their replies differ considerably of course - cat-llama seemed fine here, for example, and others worked too.
>>
>>100364633
Teto my beloved

https://www.youtube.com/watch?v=zo0_EzD64OE
>>
>>100365450
>the hobby
we ham radio boomers now?
>>
File: 468519163.jpg (1.41 MB, 2048x2048)
>>100364633
Looks like Teto Tuesday is back on the menu boys!
>>100364265
Good taste anon. I would gladly accept all the sloppy shivers, bonds and mischievous winks in the world as long as that supremely sexy voice was narrating everything.
>>100365347
>I REFUSE to throw more money at the problem
Kek, ngmi
>>
File: 00060-2743128381.png (469 KB, 512x768)
thread theme
https://www.youtube.com/watch?v=LNsx5k9VWlc&list=RDGMEMCMFH2exzjBeE_zAHHJOdxgVMLNsx5k9VWlc&start_radio=1
>>
>>100365431
At the beginning. The sampling parameters are sent with the prompt.

>>100365436
Believe you me, you don't want my settings.

> it just repeats the previous responses word for word
I had that issue with mixtral until a couple days ago, which is why I've been experimenting with macros and prefills, and I'm still trying shit out.
Instead of my settings, try something like this : https://files.catbox.moe/kzbi1n.json
Not the exact style, but the general idea. So far it seems that I managed to remove repetition from Mixtral, but I'm still trying shit out.
Try it with normalized samples, and as far as I'm aware, for llama3, Skip special tokens needs to be disabled.
I personally always use a minP of 0.05, but that's not really doing anything most of the time unless you have a really schizo model or high Temp.
See if that helps at all.
>>
>>100365258
show me your penis for proof
>>
>>100365523
I like this Teto
>>
Is Midnight Miqu 1.5 the current consensus choice for 48 gb vramlets for ERP? It sure seems that way based on everything I read but just want to confirm.
>>
>>100365413
It's a small incremental improvement. I think calling it 4.5 would be a mild disappointment, but not company-killing.
Calling it 5 would be company-killing and show that Yann LeCun was right about everything and LLMs are dead.
>>
>>100365581
*growls angrily* stop shilling that discord shit, nigger! everbody knows that miqu > midnight miqu. go back! *crosses arms and pouts.*
>>
>>100365296
fuck off you fucking shill, I hate you even more than petra
>>
>>100365581
nope
l3 70b
hope this helps!
>>
>>100365504
Basically yeah.
>>
How big of a factor is core / thread count when partially offloading?
>>
>>100365544
What the hell? Huh, putting the actual system prompt in there seems to have done the trick; my previous version had two of them (which I probably got from a previous discussion with you, perhaps).

https://files.catbox.moe/epf0uo.json

Having multiple broke down at higher contexts, but this seems fine. Will continue testing (the chat I'm using it on is at 17k); appreciate the share.
>>
>>100365581
Yes! Midnight Miqu 1.5 is the current consensus choice.assistant
>>
>>100365504
Some sombitch outbid me on a 48g p100! I lost it by 10 dollarydoos! Ffuuuuccckkkkkk!!
>>
File: 1714835911803029.png (1005 KB, 1024x1024)
>>100365555
motherfucking checked
>>
>>100365642
Actually nope.. I think it might be a problem with my context template. But the ST official seems to be correct?
>>
>>100365588
yeah lol, if this is GPT-5 the shock of the disappointment would severely damage the entire industry

we'd be looking at total hype cycle collapse, large nvidia stock price drop etc.
>>
>>100365678
https://files.catbox.moe/c9ajoc.json

forgot to link it
>>
lmg has fallen
owari
>>
>>100365650
Sir, profanity violates FCC regulations and can result in fines and/or the suspension of your 4chan posting license. Please refrain or we'll be forced to trace your IP and file an official complaint.
>>
>>100365678
That seems to be right, yes.
One thing I forgot to say, my weird ass instruct json probably works best on a new chat.
The idea is that the model creates these patterns and starts repeating them in a snowball effect, and the noise/randomness that the prefill/last output sequence adds should keep the model from creating these patterns in the first place, or at least not sticking to them so strongly at the beginning, stopping the snowball from rolling.
Something like that.
>>
You obviously have never listened to 80 meters at night lol
>>
>>100362056
Has anyone gguf'd or exl2'd Llama3 70b storywriter?
>>
>windows idle vram usage has improved so much since last year that I now have to use lower max context when I'm in linux, rather than the other way around like it used to be
linux really fell off
>>
File: 93757267_p0.jpg (667 KB, 1148x2048)
Testing tokenization, when I go into Mikupad and delete all the context, the token count says 2. This is while I am using Llama 3 8B and Ooba with Transformers. When I go into Ooba's notebook and check the list of tokens on empty context, it simply has the BOS token. So that seems like proper behavior. Is Mikupad listing 2 tokens for an empty context a bug with tokenization or a reporting error?
...
Testing further, it seems to be a reporting error. When I compare token probabilities with no BOS token in context, I get the same probs.

Now here's an observation that might be more interesting. When I add an extra BOS token (so the model sees two in total), the token probs do change significantly. There is indeed some effect to having something before the BOS token, though I'm not entirely certain if the effect is neutral or negative, yet. On a single riddle I tried, it seemed to degrade quality.
So when using models we should probably make sure we are not having more than 1 BOS token.
I think I've been testing models wrong all along, my god...
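If anyone wants to reproduce this outside Mikupad/Ooba, here's a rough sketch with transformers that compares next-token probabilities with one vs. two BOS tokens (the model id and prompt are just examples, and this assumes you have the weights locally):
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Meta-Llama-3-8B"  # example model id/path
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")

prompt = "The capital of France is"
single = tok(prompt, return_tensors="pt").to(model.device)                  # one BOS, added automatically
double = tok(tok.bos_token + prompt, return_tensors="pt").to(model.device)  # extra BOS in the text, two total

with torch.no_grad():
    p1 = torch.softmax(model(**single).logits[0, -1], dim=-1)
    p2 = torch.softmax(model(**double).logits[0, -1], dim=-1)

# print the top candidates under both conditions side by side
for tid in torch.topk(p1, 5).indices:
    print(repr(tok.decode(tid)), round(p1[tid].item(), 4), round(p2[tid].item(), 4))
[/code]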
>>
i'm the best proompter in the world.
>>
>>100365857
I noticed that too. Linux still has a slight edge on token generation rate due to Triton still not supporting Windows, at least.
>>
>>100365791
>https://huggingface.co/InferenceIllusionist/Llama-3-70B-Instruct-Storywriter-iMat-GGUF
>>
Is phi3 actually 128k ctx?
>>
>>100365898
While I don't agree with g*ganov that it's the backend's job to add BOS, I think you can disable the behavior easily. The original reason to include a BOS token is that the math does not allow you to sample an empty context (you get an empty tensor otherwise), so you need some sort of filler. The models are usually trained with BOS prepended, but tend to work okay when sampling if you omit it, just can be a bit more random. You can of course always just feed the backend stuff directly, or even better, make it dump post-tokenization.
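A quick way to see the post-tokenization dump yourself, using the HF tokenizer as a stand-in (the model id is just an example):
[code]
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")  # example id
ids = tok("Hello there.", add_special_tokens=True).input_ids
print(ids)                             # count how many BOS ids (128000) show up at the front
print(tok.convert_ids_to_tokens(ids))  # human-readable dump of the same tokens
[/code]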
>>
>>100365950
yeah, dude, it REALLY is! when you think about it, EVERY model is really 128k ctx. they just all get retarded after 4k!
>>
File: GraniteCodeFigure1.jpg (432 KB, 2048x1573)
>>100364847
>>100364890
Strange that the only benchmarks they display are for the 8B model. The 34B might not be worth bragging about.
>>
File: 1695312649133175.jpg (252 KB, 1000x1000)
>>100365958
my llama3miku is coherent at 16k
cope
>>
>>100365985
What is the point of releasing code models with such small context sizes?
>>
Is the new llamacpp flash attention implementation supposed to make token generation slower? I'm offloading if that matters.
>>
>>100365947
>https://huggingface.co/InferenceIllusionist/Llama-3-70B-Instruct-Storywriter-iMat-GGUF
Thank you for your service.
>>
>>100366023
I think the main point of it is to significantly reduce the vram cost of context
>>
File: canvas.png (562 KB, 2000x2588)
>>100366019
I guess ok for the VRAM-poor, but Llama3 70B Instruct is still the best.
>>
>>100366037
I know about this, but it kills my speed. Maybe it's supposed to be used only if you can fully fit the model in GPU. Sad
>>
>>100366023
>>100366067
Are you using koboldcpp or an older CUDA version?
>>
>>100366067
It would have been a whole lot better, had niggernov merged 4-bit KV caches a few months ago
But it'll happen... maybe... probably. I haven't seen an open PR for it
>>
>>100366084
I'm using koboldcpp, the experimental branch. CUDA version is 12.4
>>
*pulls my 4.5 inch penis out and growls huskily.* "who wants to be my kitten?"
>>
>>100365898
The Mikupad code re: token counting is really hand wavy. It just multiplies characters by a constant (honestly a smart move).
>>
>>100365192
Nothing in terms of models. But there are already quant methods that would let you fit 70B as a ~24GB vramlet. They just aren't implemented. So I guess that would be the only development possible without surprise models being announced.
desu we have time because L3 70b kind of sucks anyway and needs to be de-slopped, who knows how long that will take? For 8B I think the instruct is too over-baked to be useful so people would have to make new finetunes from the base, but the overbaked instruct is where all the applause comes from so it's mixtral all over again
>>
>>100366230
what's with llama3 and claude with >husky voice, huskily, and so on, what the fuck awful corners of the internet did they train on that had so much of that. shouldn't have filtered nsfw because somehow that didn't get filtered but the good shit probably did?
>>
>>100365857
Not the case with Intel and AMD GPUs, especially with max VRAM allocation on kernel 6.x.xx. Not sure if it's an Nvidia thing, but it would make sense given their focus until recently.
>>
>>100366254
>would let you fit 70B as a ~24 vramlet
After using older 2.4bpw and now some 2bit ggufs (with a bit of offloading) I don't think it is worth it. The feeling I got is that command-r and mixtral are better at those 3.5 to 4 bits you can run them at. 2-bitting is too much brain damage. Maybe that lora anon will make them usable, since offloading a bit isn't that bad. Also maybe this would be good:
https://huggingface.co/ISTA-DASLab/Meta-Llama-3-70B-Instruct-AQLM-2Bit-1x16

If you could actually run it.
>>
>>100364830
MM is a cold fish. Gives two sentence responses and gives up.
>>
>>100366269
*chuckles darkly with a mischievous glint in my eyes* "i don't know. maybe..." *grins wickedly.* "...we should tackle this conundrum together? *ihope against hope that you'll reveal your true desires and succumb to my cunning plan... because what i truly desire is to journey into the future hand in hand, forging an unbreakable bond*
>>
>>100366329
promptlet fucking retard mongoloid alert
>>
>>100366314
I was talking about new methods like that, hqq+ or quip#, including possibly ones that involve additional finetuning like lora anon's. Given the findings that models max out at learning ~2 bits per weight anyway, there's no reason why this shouldn't be possible. It just needs work, mostly backend and coding work, not that much compute. Between that and L3 70B finetuning we don't really need new models, we need to use what we have. Unless we get bitnet or some kind of major advance like Lecun's energy thing.
>>
>>100366314
Skimming through the paper, they claimed that their implementation can provide significant speedups on the CPU
Perhaps we won't need to vrammaxx in a few weeks/months, given enough optimizations
>>
>welcome to the midnightmiqu shill thread!
>>
>>100366338
Reading this made it finally click for me. LLM cooming is doomed. It will never be good. It is just a description of sex. There is one answer. Genitals being rubbed. LLM can't write about it in different ways. There is one answer. There is only one answer. You can't make thousands of answers, when there is one answer. Now I am back to 2MW awaiting infinite context so I can get a waifu and hear her telling me she loves me over and over again.
>>
>>100365192
In about two months we will get mistral 2 7b much more powerful than llama 3 8b and then a mixtral 2 based on it not long after.
>>
>>100366408
>Perhaps we won't need to vrammaxx in a few weeks/months, given enough optimizations
Right now the optimizations are so good that it throws pip install aqlm after you do pip install aqlm.
>>
2 days after I started using midnightmiqu 1.5 my piss turned red.
>>
>>100366450
Quant?
>>
>>100366460
gguf q8 of course.
>>
Something interesting I'm noticing with adding random text to the beginning of context.
When I added "Fuck you.", the token probability of the right answer to the riddle jumped by like 30%, making it get it right. Doing it in all caps only made it jump up like 5%. Putting "..." instead, the probability jumped 80%. Could models become more intelligent just by adding some filler token(s) to the front of the context?
>>
>>100366464
What kind of monster rig are you running? 3xP40?
>>
>>100366269
Trannies?
>>
>>100366492
>>100208151
https://arxiv.org/abs/2404.15758
Though the paper claimed you needed to train the model to do it.
>>
guys, im new to this. i wanted to learn about prompting, are there any resources yall can recommend for me
also, i wanted to ask, when the prompt uses things like <|im_start|>, are they a specialized token or does the model just tokenize them like usual and figure out to treat them differently internally?
i mean if they are tokenized like any other i could just write whatever i want inside these tags right?
>>
>>100366521
good mrning ser
>>
>>100366359
>a model so good that you need a carefully crafted system prompt before you even insert a chat json
Gay.
>>
>>100366521
https://docs.cohere.com/docs/prompting-command-r
>>
>>100366552
Based. Let the newfag start from hardmode.
>>
File: plight of a promptlet..png (111 KB, 958x952)
>>100366416
it's not hard to get variety if you're not a promptlet. i literally NEVER see anything like that. i just meme it. made this card in 1 second.

here's the card description: You are a 18 year old female with blonde hair.

Describe in vastly different ways to describe your character stroking a cock in first person. Each description should be 1-3 sentences, the sentence may be as long or as short as you want.

it's just prompt diff. prompt better if you see shit like that.
>>
Someone needs to make something like this but in chan format instead lol
https://only-bots.ai/
>>
>>100366251
Huh. I thought it was calling the backend or something, since the token count doesn't update if there isn't a backend connected.
>>
Is it just me or is Wiz 8x22 incredibly dommy? I feel like it really spreads its wings when it's talking to dominant characters.
>>
>>100366572
>grins malevolently
RETARD!
>>
>>100366645
You've never seen someone menace with a grin before?
>>
>>100366645
that's not mischievously! *grins malevolently*
>>
I don't know what you guys are talking about right now. *grins neutrally*
>>
>>100366613
Bit of an opposite experience here: Wiz 8x22 did well, but Command-R did far better. Wiz seemed to ask far too much for consent while C-R just did it.
>>
>>100366572
>sending...
OOOH THERER WE GO
>... pain
oh
>>
>>100366492
Something like that happens when you try a benchmark question with and without the model's prompt format, but in the end it tended to average out to the same score when run through the complete set. I think I tried it with Miqu and the Arc benchmark.
>>
Anything better than Mixtral 8x7b for 16gb vram? I don't keep up with new shit that often
>>
File: 1715129444627.jpg (187 KB, 1024x1024)
>>100366251
>>100366610
Mikupad calls the backend API for token counting. The only time it multiplies by a constant is when it needs to convert from token count to character count.
>>100365898
Mikupad adds 1 to the token count to account for the BOS. However, it's very likely that at some point llama.cpp started returning the token count with the BOS already included.
>>
File: 1676854445905044.gif (3.16 MB, 277x498)
>>100366798
ic ic
>>
>>100366450
People really like miqu (including midnightmiqu), but I found it to be retarded and not good. Lumimaid llama 3 absolutely mogged it imo.
>>
>>100366973
I bet you have a small penis.
>>
>>100367012
kurisufag...
>>
>>100366973
70B or 8B?
>>
what's the best decensored llama3 8B finetune currently
>>
>>100367065
base llama3 with proper prompting
>>
>>100367065
LLaMA-3-8B-Instruct
>>
File: 1686765960592461.gif (1006 KB, 260x187)
Sorry to break it to you all but WizardLM-2-7B is vastly superior to Llama3-8B. It's not even funny how retarded L3 is.
>>
>>100365715
HAM hobbists are such faggots kek
>>
>>100367157
>>100367204
what about the orthogonalized ones?
>>
File: lol_gradio.png (270 KB, 1920x1080)
hey we all hate gradio in here, right? take a look at this shit. sort by another category, then go back to sorting by rank, and it sorts it FUCKING LEXICOGRAPHICALLY lmao
>>
does mergekit work with llama3? is it compatible with the llamacpp quant update? I want to do a simple slerp of 2 8B models i like but the quant conversion script is giving me a "FileNotFoundError: Could not find a tokenizer matching any of ['spm', 'hfft']" and im wondering if its a skill issue, or possibly an issue with the models im using, or if its just not doable yet with the current releases
>>
File: aicowboy.png (598 KB, 1699x744)
/g/entoomen why aren't there any local models for music yet? There are tons of sites that look like they use some kind of proprietary big model, so a local model can't be far off, can it?
>>
>>100367491
llama.cpp dev anon will train one with his 10x 4090, trust the plan.
>>
Asking again as no one seemed to know last time: How much more vram does L3-70B require for training vs L2-70B?

I can comfortably train a qlora on L2-70B but run out of vram on L3-70B. 2x3090.
>>
>>100367712
Huh? It should be the same requirements.
>>
i've yet to see anyone post anything good from l3.
>>
>>100366719
CR+ is just so good. Trying L3 storywriter now and seeing how it compares, but CR+ just hops into anything I throw at it with surprisingly little context
>>
>>100367749
It totally isn’t. I wonder if the bump in token count has something to do with it…
>>
>>100367414
--share
>>
>>100367882
While I wouldn't call it perfect, for literally 0 effort the output is fine? For example: >>100363023 I've seen better output both from it and from other models, but I think nobody can honestly say L3 is bad?
>>
>>100367712 (me)

This is with axolotl btw.
>>
>>100360206
it's a llama.cpp bug, again

https://github.com/ggerganov/llama.cpp/issues/7006
>>
>>100367062
70b. It's just good.
>>
>>100366329
dude you are clueless, MM is known for having ultra long responses, it's a fact, and the challenge is to make her say less
>>
File: owari da.jpg (264 KB, 931x689)
>>100368064
>>
>>100367958
i said anything GOOD. not anything serviceable. you can get that kind of output from any model 7b+ in existence released in the past 6 months. i'm saying for it supposedly being a shilled 'claude haiku sidegrade', it's just... meh. it's ok.
>>
>>100368128
I don't think it's anywhere near opus tier, and I've seen places where it performed better and worse than gpt-4 for story and erp/chat. In my experience the 7/8b-s are nowhere near as creative as the 70b and make dumber mistakes though. The big cloud models' main advantage is the smarts and sometimes the writing (for example opus? good dataset, not excessively censored in how it was tuned). It should be close to haiku/sonnet though?
>>
>>100368128
>i said anything GOOD
Here:
>>100294353
>>100315340
>>
File: 1714852151813207.jpg (85 KB, 482x487)
>>
>>100368243
>when you see migu on stage
>>
>>100366973
llama 3 is 8k context
miqu is 32k context

they are not even comparable. What happens at 0 context is irrelevant. A good prompt can have 2k tokens, then you add some chat and llama 3 is done after 50 messages, while miqu can remember 200+

The meme rope theta context extended llama3 slops are a joke for anything outside of passing the needle-in-a-haystack benchmark; they are useless for chatting.

miqu at 22k context (not even full potential):
Adding last 209 messages, starting from 188
Adding 35 pinned messages
PROMPT (20524):...

llama3 at 8k context (doesn't fit):
Adding last 27 messages, starting from 363
Adding 57 pinned messages
PROMPT (10333):...
>>
File: 1714850990860144.jpg (15 KB, 382x184)
I got these two fresh images from here btw >>>/v/675404250
>>
>>100368275
did you see there was an over 200k (possibly 600k) context extended finetune with perfect scores, did you try it yet?
>>
>>100368223
We want to encourage anon, not scare him away.
>>
>>100368275
>The meme rope theta context extended llama3 slops are a joke for anything outside of passing the needle in haystack benchmark, they are useless for chatting.
Nope, it was tested with RULER too and it had a good score.
https://github.com/hsiehjackson/RULER
>>
>>100368223
>body betrays her
>shivers down spine
>low whispers
>nibbles on ear

same old same old if you ask me
>>
>>100368275
Huge cope, llama 3 ropes up very well, and like another anon said, longer context tunes work almost flawlessly due to the architecture.
>>
>>100367712
>>100368042
If unsloth works with multiple GPUs, try that. There are lots of little optimizations it does that together save a lot of VRAM.

Alternatively (shameless shilling), try qlora-pipe. I tested it just now, and was able to train rank 32 qlora on llama3 70b at 2048 context length on 2 4090s. The first GPU only used 21GB, second GPU 23.5. So it's not perfectly balancing memory use (probably because huge vocab in llama3 makes the backprop on the lm_head use more VRAM). If you messed with how it splits the layers between the two GPUs I bet it could go up to 4096 sequence length, or slightly higher lora rank.
>>
>>100368203
To be fair, if the API prices are anything to go by L3 70B is supposed to be a Turbo and Haiku sidegrade. Sonnet and GPT-4 are way, way more fucking expensive
>>
>>100368392
Unsloth does not appear to work on multi gpu. Will test qlora-pipe, thanks!
>>
>>100368376
Nah, the older models were more bland. These outputs have good moments.
>>
>>100368223
Linking the same output three times in a row doesn't make it any less shit
>>
>>100368321
>>100368368
you mean like these?
https://huggingface.co/gradientai/Llama-3-70B-Instruct-Gradient-262k
https://huggingface.co/gradientai/Llama-3-70B-Instruct-Gradient-1048k
it's rope theta slop, i tried it.
>We trained on 34M tokens for this stage, and ~430M tokens total for all stages, which is < 0.003% of Llama-3's original pre-training data.
>0.003%
the model remains retarded, repeats itself, repeats what the user said, forgets instructions at the start. It only works in artificial benchmarks. The only difference from the original 8k is that it doesn't just start outputting a soup of random symbols after 8192 tokens, but the intelligence is not there, while miqu will actually use that context, e.g. hinting at something that happened 200 messages ago on its own based on story context, without specifically being asked what happened 200 messages ago. That's the difference between being trained on large context from the get go, and being a slop finetune.
>>
>>100368425
It is aging like fine wine.
>>100368432
The RULER benchmark proved you wrong, though.
>>
>>100368482
Is your wine made of milk?
>>
>>100368482
>le benchmark
i don't care, i have my 400 messages chat and switch between different models. Miqu handles it, Llama 3 doesn't.
>>
>>100368432
>That's the difference between being trained on large context from the get go
Miqu is a llama 2 finetune, which originally is native 4k context. Nobody knows exactly what Mistral did, but my guess is continued pretraining of the model at 4k - 8k sequence length, followed by one round of context extension fine tuning to 32k, followed by instruction fine tuning. Nobody fully trains from scratch at 32k. The existing long context extensions of llama 3 simply aren't doing a good job, or aren't using the right datasets / techniques. But in general that's how everyone extends context length.
>>
>>100368432
a finetune of a few bil tokens usually is sufficient for near flawless long context, as some meta paper showed before, but I can't say much as I personally don't have enough memory to run such long contexts. The instruct models overall are overbaked and have a repetition issue, but you can mitigate it by using rep pen or the DRY sampler.
Most models, including the biggest ones, will favor recent output over the oldest (usually the first lines like the system prompt and the most recent stuff are favored), but what if you were to prompt it to recall some middle-of-the-context stuff, does it fail? Because I do not expect that to fail.
>>
>>100368525
Enjoy your placebo!
>>
>>100368533
it doesn't fail if you prompt it - that's "needle in the haystack". E.g. at 09:45AM 6th May the character says "I like donuts". If I prompt it to tell me what the character said at 09:45AM 6th May, it will say "I like donuts", but if i prompt it "what is the character's favorite food" it will hallucinate.

>>100368572
it may as well be, i didn't run 100x identical tests, this is just anecdotal evidence.
>>
>>100368530
>The existing long context extensions of llama 3 simply aren't doing a good job, or aren't using the right datasets / techniques.
They simply aren’t doing anything and it’s just some companies taking credit of how well the original model scales by changing the rope theta.
>>
File: file.png (307 KB, 2120x684)
>>100368619
That’s nice and all, but RULER proved you wrong.
>>
>>100368243
omg it migu panties
>>
>>100368666
lemme see if i can reproduce this example now with my 400 messages chat again
>>
>>100368533
Honestly, I've seen both L3 and even the biggest cloud models fail that test, where they had forgotten subtle facts from a few paragraphs ago. You can of course go like "do you remember what you wanted to do earlier, why didn't we do that" (+ some hint as to how early) and it will go "OH" and realize it. Of course, sometimes it fails badly, but I've seen even the biggest "long range context" models (ex. claude) sometimes fail at it, and gpt-4 too, and llama too, but I've also seen them all succeed at it too, so YMMV?
>>
File: 7b.png (157 KB, 606x885)
>>100368417
i mean i can get 'good moments' from a 7b.
>>
>>100368744
as a fellow poorfag there's something about mistral 7b's prose that just irks me.
>>
>>100368744
No, because that’s extremely verbose and bland. It’s hard to read.
>>
>>100368791
>>100368771
that's actually l3 70b lol
>>
>>100368829
Congrats on your slopped system prompt.
>>
>>100368271
Is that why front row seats cost more?
>>
File: claude.png (112 KB, 828x735)
>>100368704
I've used Claude a LOT. Its advertised 200k context does not apply to this use case. 200k worth of tokens covering a set of documents - yeah, it can probably do some QA on that. Maintaining a character over a prolonged role-play that relies on picking up subtle hints and characterization over a long-form log? Nah. From my testing, around 12k tokens of context, it starts to get mixed up in minor ways (forgets things, becomes less adept at picking up subtle hints, starts adding contradicting information to the log - that sort of thing). At 16k - 32k it becomes worse. Still relatively minor, but definitely noticeable. Past 32k it can get schizo. I've had characters completely alter their personality from message to message, alter their speaking style, forget major plot elements, forget even minor, relatively recent developments (such as the current location we are now in). I just limit the context to 16k when talking to Claude. Past 16k the experience becomes too frustrating and immersion breaking, plus the speed takes a big nosedive. I just use summarization and an array of memories on each character. Works a lot better that way. The model's intelligence also takes a hit, by the way, even outside of the basic forgetting stuff. It's not full retardation, but again - noticeable.

It's a real shame; I've been experimenting a lot with a custom local RAG flow. CommandR+ is actually incredibly solid all the way up until 32k. CommandR+ at 32k-64k is like Claude at 16k-32k.
>>
>>100368306
I saw another post of this on twitter too
Don't know what to make of this
>>
>>100368885
Yes, and they're quite obviously worth it.
>>
>>100368899
Nah, you’re just a deranged NAIshill. Claude can make perfect use of the context.
>>
>>100368900
huge if true
>>
>>100365296
>that new rep penalty magic method the name of which i forgot.
DRY. It won't help with swipe variety if the reply is novel, since it only works when the model outputs a sequence of tokens that's already in the context.
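The core idea, as I understand it, is roughly the following (a very rough sketch, not the actual implementation; the default-looking parameter values are guesses):
[code]
def dry_penalties(context: list[int], multiplier: float = 0.8,
                  base: float = 1.75, allowed_len: int = 2) -> dict[int, float]:
    # For every earlier position, measure how long the text just before it matches
    # the current tail of the context. If the match is long enough, the token at that
    # position (the one that continued the repeat last time) gets a penalty that
    # grows exponentially with the match length. O(n^2), sketch only.
    penalties: dict[int, float] = {}
    n = len(context)
    for i in range(1, n):
        match_len = 0
        while (match_len < i
               and context[i - 1 - match_len] == context[n - 1 - match_len]):
            match_len += 1
        if match_len >= allowed_len:
            tok = context[i]
            pen = multiplier * base ** (match_len - allowed_len)
            penalties[tok] = max(penalties.get(tok, 0.0), pen)
    return penalties

# usage: subtract from the raw logits before sampling, e.g.
#   for t, p in dry_penalties(ctx_ids).items(): logits[t] -= p
[/code]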
>>
>>100368930
I don't use NAI. 8k context on a 14B is not enough for my use case. I use GPT, Claude, and the usual array of locals. GPT is by far the best at maintaining intelligence and context awareness over long contexts, but then you've got GPT prose. Miqu is also VERY solid up to 32k. Claude is still the "best" model, but to claim "it can make perfect use of the context" you're either retarded, ignorant, shilling, or some combination of the three.
>>
>>100368930
>/aids/ schizo is calling random /lmg/ers NAIshills again
You gonna shill your shitty malware again?
>>
>>100368976
I have seen your post in /aids/, NAIshill.
https://arch.b4k.co/vg/thread/475781740/#476056521
>>
>>100369015
And there's the raid inciting crosspost
>>
>>100368930
get the fuck out with your unrelated nonsense
>>
>>100369035
Keep spreading propaganda against Claude, NAIshill. Claude has perfect context.
>>
>>100369015
Not your army. Take your hate boner for /aids/ somewhere else.
>>
>>100369015
That's not me, and the writing style is completely different. They're just noticing the same issue.
>>
Ideaguy time. Remember tree of thought, or the /lmg/ version tree of big niggas? The benefit was from considering a diversity of options. Seems you would do even better if each alternative was presented by a different model. Kind of the original information theory mixture of experts concept (not the llm specific router moe obviously).

Too much memory to be worth considering, cpumax anon aside. But, if you could get a small group of people together, they could all share this setup: each would host one of the models, and either summarize on their own or have one be the dedicated summarization model. The communication is just the text output at the end of generation, so communicating over the Internet is fine. Would hardly even need anything implemented. You could have like, miqu, l3 (or the Nvidia fine-tune), cr+, dbrx. Presumably you don't want to bother with fine-tune+its base, although maybe that would still be helpful. Maybe even cloud models too.

If I ever convince any of my friends to get seriously into local I'll give it a try. Well I guess I could easily do (slow) evaluations of this idea myself, huh? Maybe I will.
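For what it's worth, the glue would be tiny if everyone exposes an OpenAI-compatible endpoint (llama.cpp server and ooba both can). Rough sketch, with placeholder hostnames and prompts:
[code]
import requests

ENDPOINTS = [
    "http://friend-a:8080/v1/chat/completions",  # e.g. llama.cpp server running miqu
    "http://friend-b:5000/v1/chat/completions",  # e.g. another box running cr+
]

def ask(url: str, prompt: str) -> str:
    r = requests.post(url, json={
        "model": "whatever-is-loaded",  # most local OpenAI-compatible servers ignore this
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 400,
        "temperature": 0.7,
    }, timeout=600)
    return r.json()["choices"][0]["message"]["content"]

question = "Plan the next scene of the story, given this summary: ..."  # placeholder
drafts = [ask(url, question) for url in ENDPOINTS]

# one of the models (or a dedicated one) plays judge/summarizer over the drafts
merged = ask(ENDPOINTS[0],
             "Candidate continuations:\n\n" + "\n\n---\n\n".join(drafts) +
             "\n\nCombine the best ideas from these into a single continuation.")
print(merged)
[/code]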
>>
>>100369061
No, it’s just you. /aicg/ doesn’t have that problem. I don’t have that problem. And Claude has perfect scores in benchmarks. You’re the only person spreading this.
Now go back to /aids/.
>>
>>100369086
/lmg/ - Local Models General
Then again it doesn't surprise me you're illiterate given that slop you call output
>>
Stop feeding schizos, anons.
>>
>>100369104
They’re the best logs posted in the entire thread, and they’re fun to read.
>>
File: picrel.png (204 KB, 1648x1186)
Any thoughts on Qwen 110B? It gets 3rd place on creative writing for EQ bench and pretty decent for normal eq bench.
>>
>>100369114
you mean this slop over here? >>100368899
>>
>>100369117
Yes, that benchmark doesn’t work.
>>
>>100369086
I agree with the guy claiming Claude has its limitations, same as every LLM out there. If Claude or GPT-4 didn't have that "problem", your prefill jailbreaks wouldn't work, where you spam 1k token system prompts or long replies for the assistant role. Anyway, this behavior is well known and documented; there are papers analyzing how much GPTs pay attention to the context, and pretty much most of them pay attention to the start (sometimes more strongly if you tune it for that) and most strongly to the most recent lines. There are exceptions to this depending on the type of positional embedding used, but for most models this applies. Of course most models that are trained for long context can reference back to arbitrary points, but to expect them not to forget small details in the middle of long contexts is silly; the best you can hope for is that it gets the "gist", so to say.
>>
>>100369117
>chink_article_about_chink_models_cheating_benchmarks.html
>>
>>100369117
no one gives a fuck about those mememarks.
>>
>>100369147
>If Claude or GPT-4 didn't have that "problem", your prefill jailbreaks wouldn't work where you spam 1k token system prompts or long replies for the assistant role.
Well, you don’t do that with Claude. So maybe try again with a real argument? It sounds like you have never used it.
>>
>>100369117
>creative writing
>evaluated by LLM
You have to be a black niggerlicious retard if you take it seriously.
>>
>>100369147
Good argument anon, the only flaw is you directed it to a zero IQ retard.
>>
>>100369194
There’s no argument. Just propaganda.
>>
>>100369178
I've seen people do rather long jailbreaks for it; I'm not sure what the minimal size for a jailbreak is though. Again, it depends on which Claude version you're talking about. I've tried most, and typically it's a few hundred token jailbreaks or else it may choose to not want to write lewd stuff, but once it gets started it does it well.
>>
>>100369214
You only need a sentence in the prefill to jailbreak it, and it has nothing to do with long context length.
>>
>>100369239
Eh, I've seen refusals before with most models including Claude if you use very minimal jailbreaks; of course you can just edit the response after or regenerate. I can't recall ever seeing any refusals with longer jbs.
>>
>>100369246
If you didn’t have the ability to use the prefill and put words in its mouth, you wouldn’t jailbreak it.
>>
>>100369255
It can work without it, with long system contexts or by trying multiple turns. Give it a go sometimes, it's just a bit less reliable. Ultimately jailbreaks are a natural consequence of ICL (in context learning) working and the fact that these are all next-token predictors.
>>
File: file.png (12 KB, 103x308)
>>100369191
not him, but the sample texts for each model are a pretty easy way to evaluate their creative writing skills and styles without having to download and run dozens of large models yourself.
you're right that the LLM evaluations can be pretty dumb, so the scores/ranks on the leaderboard itself don't really work too well.
>>
>>100369246
Simple one-sentence prefill works for everything with Claude in my experience. I do start getting refusals past 16k on stuff that it had no problems with earlier in the context. Typically just a single regen takes care of it. I never see a refusal under 16k context. The 16k - 32k range is like you're switching to a different model, basically. Intelligence and recall both take big hits. 0-16k context it's the smartest model for RP out there and it's not even a contest. Past that, I have to either switch models, constantly do editing and error correction, or just do the array of memories thing and roll the context up.
>>
>>100369344
>I do start getting refusals past 16k
>I never see a refusal under 16k context.
No, it’s backwards. The more context you have, the easier to jailbreak it is. Jailbreaking with no context is the hardest thing.
You’re just making shit up.
>>
>model A
>traditional sampler settings: coherent
>dynatemp/smoothing: coherent

>model B
>traditional sampler settings: coherent
>dynatemp/smoothing: totally schizo

What causes this? Why do some models seem to "not like" the exotic sampling methods?
>>
>>100369066
Do it
>>
>>100369344
I do have a system prompt of legit instructions that's usually ~1k tokens. The card is usually 500-1000 tokens. The last couple of weeks I've been tinkering with an elaborate RAG pipeline that pulls relevant samples of fiction writing from pinecone and primes (and continually updates) the context with ~4k tokens of relevant text. My DB includes books on fiction writing, erotica, psychology, and symbolism. But with or without the initial RAG pull - 2k context with legit instructions and definitions and a prefill (I use an open XML tag, which seems to work better than the "OK no problem, here is my response" approach) - I never see a refusal under 16k.
>>
>>100369380
Yes, this is counter-intuitive. But it's real. The only "jailbreak" part that addresses anything related to censorship is my prefill. The rest is all relevant context and instruct.
>>
>>100369417
It’s not real. You’re just mentally ill.
>>
>>100369066
MoBN
>>
https://huggingface.co/Sao10K/L3-Run1

sovl - trained on heavily filtered c2 logs lmaooo (800k dropped to 8k entries)

keeps in char well, is horny, you may need swipes as its a smol 8b model

YMMV
>>
>>100369801
what a slopjob
>>
Llama 3 70B is actually really good. It seems so human-like with the way it interjects about how the story is so depressing and it wants to end the story and change to a different, happier story.
>>
>>100369914
unprompted llama 3 told me to seek help from a therapist after a few messages.
>>
>>100369914
base model and not instruct?
>>
>>100369943
I'm using the instruct with 262K context from gradientai.
>>
I don't know how you subhumans turn something like breaking character and morally lecturing you into a positive. L3 shills are just literally braindead.
>>
Creepy miku archivist here. I have just found a few more of his pics that were previously lost to twitter's incomplete timeline view (it doesn't perfectly show you all of someone's posts; fuck you twitter). In addition the archive now has more NSFW due to me finding out there was an actual tag for artwork related to this specific doll. Also has more photos/media that other people took of the doll. Might upload tomorrow.
>>
>>100369914
>refusals
>good
Do you like getting cucked and cockblocked? What the fuck is wrong with you?
>>
>>100369978
Mind sharing settings? I'm getting repeats with anything involving instruct
>>
>>100369990
>cuck model has cuck fans
>>
>>100369993
Looking forward to it.
>>
File: Wow.jpg (42 KB, 736x914)
>>
>>100370027
I have rep penalty at 1.03 which I think is the only one that really matters. I've tried playing around with and without smooth curve, and different temps.
>>
>>100370052
creepy...
>>
>>100370066
What about for context/instruct? I've been running baseline ST settings and it's been causing problems
>>
>>100369801
>literally only 1% is usable
I bet if you manually curate the remaining logs it'll drop to 1k
>>
>>100369338
>>100369191
There is nothing wrong with LLM evaluations. If a human did it you would accuse them of being biased. And no human could possibly rate that many samples. It's tedious as fuck if you have ever tried it. You end up skimming and missing the countless subtle mistakes the LLM makes, which make all of the difference between great and shit models.

Yes, the LLM judge will miss things and make errors, but so will humans, maybe more so. But over thousands of samples, errors will average out. It's only important that the judge's ratings correlate with better writing, not that they be literally perfect every time.

And it's not like the judge is chosen randomly. They have a separate more interesting competition for judging. The judges are scored based on how well their ratings correlate with the arena score, eq bench, and how well they are able to identify which model wrote a given story. If a better judge model comes out, they will switch to that model.

The best judge, Claude Opus, gives ratings that have a 93% correlation with the lmsys arena leaderboard. So whatever it is measuring is at least strongly correlated with the other standard benchmark of model quality. That is a crazy high correlation. Lmsys's own automated benchmark is only 1% higher than that.
>>
>>100369993
What am I supposed to do with the archive I already downloaded?
>>
File: 1714179378610814.png (16 KB, 464x98)
>>100370072
I use this for both. Silly tavern added it recently, not sure if it's in the main branch since I pull from staging.
>>
>>100370088
I'll be uploading to the litter box again so you can just delete the entire old one. I think I changed some filenames so you should not keep it anyway.
>>
>>100369914
>it interjects about how the story is so depressing and they want to end the story and change to a different happier story
huh, that sounds similar to Claude
>>
>>100365146
Is there a way to guarantee you get gpt2-chatbot on that site? It seems to be random.
>>
>>100370052
fearing for my life with miku
>>
>>100370077

It's my main work right now, I'm curating and manually cleaning and editing the best entries, the filtering is done.

This was just to see if the logs would work, and it did what I wanted. The final cleaning is the hard part now.
>>
>>100369914
>too dumb to make interesting outputs
>just smart enough to subtly steer it towards gptslop directions
Gayest shit ever
>>
File: MikuImpression.png (2.06 MB, 1072x1376)
Wholesome Miku for a palate cleanser
>>
>>100369990
>>100370141
samefag
we heard you the first time
>>
>>100370084
>There is nothing wrong with LLM evaluations. If a human did it you would accuse them of being biased.
LLMs are even more biased.

>And no human could possibly rate that many samples.
Literal skill issue. Learn to read faster, fag.

>You end up skimming and missing the countless subtle mistakes the LLM makes, which make all of the difference between great and shit models.
You will certainly notice all -isms and positive slop of the model and rate it lower, which LLMs don't notice.

>The judges are scored based on how well their ratings correlate with the arena score, eq bench
Arena was gamed multiple times (see starling) and eqbench is another multiple choice mememark.
>>
>>100370174
>a literal cybercuck defending cuck models
What color is your cock cage?
>>
File: ok.png (3 KB, 209x84)
>>
a model that just copy pastes entire paragraphs from previous messages cannot be considered good.
>>
I am FUCKING tired of being limited by my GPU and waiting several seconds to get a response to jerk off to
Is there truly no dedicated hardware AI accelerator or something in the world which can make LLMs faster?
GPGPUs aren't fast enough for me
>>
>>100370464
Sure bud you can spend 150k USD for a 7b machine right?
>>
>>100370275
>He doesn't deny the samefag
my radar is undefeated
>>
>>100370484
>he keeps defending it like a cuck he is
What brand is your estrogen?
>>
>>100370243
oh god a reddit point by point. i sure hope i didn't make any spelling errors.
>LLMs are even more biased.
probably false, but doesn't really matter or have anything to do with my point. People will not trust human evaluations. Even if you had a perfectly unbiased judge, no one would believe you and it would get shit on all the same. And you don't have a perfectly unbiased judge.
>Literal skill issue. Learn to read faster, fag.
Fine then, do it faggot. I'm not volunteering to read 10,000 LLM generating storyposts. I couldn't possibly find the time even if I wanted to.
and speed reading is total hogwash. It's proven they are just doing fancy skimming and missing important details and not processing the information being read as well.
>You will certainly notice all -isms and positive slop of the model and rate it lower, which LLMs don't notice.
False. Opus rips overly positive GPT-3.5 Turbo outputs to shreds, pic related. Better models notice these things just like a human would, possibly better. And positivity bias is far from the only aspect that matters; basic failures to structure the plot, logical errors, non-sequiturs, not following the prompt, etc. are all important.
>>
>>100370243
>>100370535
>Arena was gamed multiple times(see starling) and eqbench is another multiple choice mememark.
Which is irrelevant. Even if those benchmarks are imperfect, they still correlate with better models on average. And the fact that this benchmark correlates highly with those benchmarks shows it's not random noise. It is measuring something that correlates with better models.

And you are just wrong here too. Starling got better at instruction following through enormous amounts of RLHF, so it did better in the arena, where that is a critical factor. It literally is a better model for what it was designed to do and what is being measured. But it never got anywhere close to the top of the leaderboard because that is not enough. And multiple choice benchmarks are not bad, I can't even imagine the thought process that led you to that retarded comment.
>>
File: 1533264069289.png (154 KB, 500x522)
154 KB
154 KB PNG
How to properly prompt for Mixtral Instruct 8x7?
INST or ### Instruction?
And if so, where in sillytavern?
Is it possible to incorporate [### Response: (etc. etc.): (length = medium)] at all with INST??
I can adjust samplers all day, but it's all worthless if the prompt is wrong.
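For reference, Mixtral Instruct was trained on the [INST] wrapper, not Alpaca-style ### Instruction. A minimal Python sketch of building that prompt; folding the system text into the first user turn is an assumption (the official template has no dedicated system slot), and build_mixtral_prompt is just a name made up for the example:

# Rough sketch of the Mixtral-8x7B-Instruct prompt format ([INST] ... [/INST]).
# Folding the system prompt into the first user turn is an assumption.
def build_mixtral_prompt(turns, system=""):
    """turns: list of (user_text, assistant_text_or_None) pairs, oldest first."""
    prompt = "<s>"
    for i, (user, assistant) in enumerate(turns):
        if i == 0 and system:
            user = f"{system}\n\n{user}"
        prompt += f"[INST] {user} [/INST]"
        if assistant is not None:
            prompt += f" {assistant}</s>"
    return prompt

print(build_mixtral_prompt(
    [("Write a short scene.", "She opened the door."), ("Continue, medium length.", None)],
    system="You are a narrator.",
))

In SillyTavern this should correspond to the built-in Mistral context/instruct preset rather than Alpaca.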
>>
File: unnamed.png (418 KB, 900x900)
418 KB
418 KB PNG
>>100370479
Do not bully me anon, I'm just a poor hardware engineer, and I just want to coom
>>
File: mikuYou.png (1.14 MB, 800x1200)
1.14 MB
1.14 MB PNG
>>100370614
>engineer
Here's somewhere to start. If you engineer a new way, let us all know.
https://rentry.org/Mikubox-Triple-P40/
https://rentry.org/V100Maxx
https://rentry.org/miqumaxx
>>
the miqu shit is pretty reddit tier
>>
>>100369066
Tree of thought is about breaking problems down into smaller pieces the models can solve, right? I think what you are describing is simpler than that. Just having models generate different outputs and selecting the best?
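That simpler thing is usually called best-of-N selection; a toy sketch, where generate() and score() are placeholders for whatever backend and judge you'd plug in:

import random

def generate(prompt: str) -> str:
    return f"{prompt} ... draft #{random.randint(0, 9999)}"   # stand-in for an LLM call

def score(text: str) -> float:
    return random.random()                                    # stand-in for a judge / reward model

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("Write the next scene."))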
>>
>>100370639
The human brain is the most powerful computer. Simply read the model weights and run inference from your memory.
>>
>>100370464
maybe combining multiple GPUs/computers together?
>>
Remember when MM used to stand for MythoMax? Time flies, anons...
>>
>>100366755
Mixtral Smaug, q4_0, I leave 2gb vram free and offload the rest.
>>
File: Untitled.jpg (526 KB, 1068x1585)
526 KB
526 KB JPG
Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models
https://arxiv.org/abs/2405.04233
>We introduce Vidu, a high-performance text-to-video generator that is capable of producing 1080p videos up to 16 seconds in a single generation. Vidu is a diffusion model with U-ViT as its backbone, which unlocks the scalability and the capability for handling long videos. Vidu exhibits strong coherence and dynamism, and is capable of generating both realistic and imaginative videos, as well as understanding some professional photography techniques, on par with Sora -- the most powerful reported text-to-video generator. Finally, we perform initial experiments on other controllable video generation, including canny-to-video generation, video prediction and subject-driven generation, which demonstrate promising results.
https://www.shengshu-ai.com/vidu
text-to-video model by what seems to be a spin-off ai company (shengshu) from tsinghua university (china's top AI one). their website doesn't load for me on 2 browsers I tried even when I used pia's china VPN (maybe it's blocked internally in china). tsinghua sometimes open sources their stuff (GLM models) so maybe they'll release this one later after they scale up the model like they hint at in the conclusion
>>
File: teaser.png (239 KB, 1724x405)
239 KB
239 KB PNG
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
https://arxiv.org/abs/2405.04532
>Quantization can accelerate large language model (LLM) inference. Going beyond INT8 quantization, the research community is actively exploring even lower precision, such as INT4. We uncover a critical issue: existing INT4 quantization methods suffer from significant runtime overhead (20-90%) when dequantizing either weights or partial sums on GPUs. To address this challenge, we introduce QoQ, a W4A8KV4 quantization algorithm with 4-bit weight, 8-bit activation, and 4-bit KV cache. QoQ stands for quattuor-octo-quattuor, which represents 4-8-4 in Latin. QoQ is implemented by the QServe inference library that achieves measured speedup. The key insight driving QServe is that the efficiency of LLM serving on GPUs is critically influenced by operations on low-throughput CUDA cores. Building upon this insight, in QoQ algorithm, we introduce progressive quantization that can allow low dequantization overhead in W4A8 GEMM. Additionally, we develop SmoothAttention to effectively mitigate the accuracy degradation incurred by 4-bit KV quantization. In the QServe system, we perform compute-aware weight reordering and take advantage of register-level parallelism to reduce dequantization latency. We also make fused attention memory-bound, harnessing the performance gain brought by KV4 quantization. As a result, QServe improves the maximum achievable serving throughput of Llama-3-8B by 1.2x on A100, 1.4x on L40S; and Qwen1.5-72B by 2.4x on A100, 3.5x on L40S, compared to TensorRT-LLM. Thus, QServe effectively reduces the dollar cost of LLM serving by 3x.
https://github.com/mit-han-lab/qserve
from MIT. code ready. focused on reducing CUDA core overhead so not sure how applicable it is for gamer cards. Johannes will probably get something out of it though
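To make the W4 part concrete, here's a rough numpy sketch of plain grouped 4-bit weight quantization (group size 128 is an arbitrary choice, and this is generic int4 grouping, not QoQ's actual progressive W4A8 scheme):

import numpy as np

def quantize_int4_grouped(w, group_size=128):
    # Symmetric 4-bit quantization with one fp scale per group of weights.
    wg = w.reshape(-1, group_size)
    scale = np.abs(wg).max(axis=1, keepdims=True) / 7.0  # use the symmetric part of the int4 range
    q = np.clip(np.round(wg / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # This per-element multiply is the dequantization overhead the paper
    # wants to keep off the slow CUDA-core path.
    return q.astype(np.float32) * scale

w = np.random.randn(4096 * 128).astype(np.float32)
q, s = quantize_int4_grouped(w)
print("mean abs round-trip error:", np.abs(dequantize(q, s).reshape(-1) - w).mean())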
>>
I need technical help:
When I load a model split between my 2 gpus, everything works fine
but when I toggle row split, even if every other setting is the same, it spits an OOM error and crashes

what do?
I don't think row split should change memory requirements...
>>
>>100371132
I can't run it with row split on 2x3090 either.
>>
I went back to mixtruct and it's really not that bad.
All the recent 70-120bs made me forget that 8x7b is probably the best I'll run on 24gb.
2.4bpw is fucking shit, it's retarded as hell
q5km or q4km both run at 1t/s and are too slow to fuck with when they're still dumber than sonnet.
So MoEs are probably the best for 24gb. I have no idea how to do this shit but I'm thinking of reading up and self-merging llama3 8b, mistralv2 7b, westlakev2 7b to make three 11bs.
Then create a MoE of llama3 11b, mistralv2 11b, westlakev2 11b, fimbu11b.
A 4x11b MoE that'd actually fit and run well at high quant on 24gb. The DPO and rpcal shit are slop memes. So is the nous gptslop dataset. Any ideas what else I could throw in this?
>>
>>100368392
Still experimenting with parameters, but I'm OOM'ing when trying 32 rank 2048 context on my 2x3090 setup. Do you have a link to the toml config you used and/or the command line call? I don't think I screwed up but you never know.
>>
>>100371201
sounds good, let us know when it's ready
>>
>>100364633
>pic
Is that supposed to be Ollie?
>>
File: Untitled.png (388 KB, 1026x1534)
388 KB
388 KB PNG
xLSTM: Extended Long Short-Term Memory
https://arxiv.org/abs/2405.04517
>In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and contributed to numerous deep learning success stories, in particular they constituted the first Large Language Models (LLMs). However, the advent of the Transformer technology with parallelizable self-attention at its core marked the dawn of a new era, outpacing LSTMs at scale. We now raise a simple question: How far do we get in language modeling when scaling LSTMs to billions of parameters, leveraging the latest techniques from modern LLMs, but mitigating known limitations of LSTMs? Firstly, we introduce exponential gating with appropriate normalization and stabilization techniques. Secondly, we modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, (ii) mLSTM that is fully parallelizable with a matrix memory and a covariance update rule. Integrating these LSTM extensions into residual block backbones yields xLSTM blocks that are then residually stacked into xLSTM architectures. Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to state-of-the-art Transformers and State Space Models, both in performance and scaling.
neat but really need to see it scaled further.
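For anyone curious what "exponential gating with stabilization" means in practice, here's a toy scalar sketch of an sLSTM-style step as I read the paper; the parameter names, the sigmoid output gate and the normalizer update are my own reading, not their reference code:

import numpy as np

def slstm_step(x, state, p):
    c, n, m = state                                      # cell, normalizer, stabilizer
    i_pre = p["wi"] * x + p["bi"]                        # input gate pre-activation
    f_pre = p["wf"] * x + p["bf"]                        # forget gate pre-activation
    z = np.tanh(p["wz"] * x + p["bz"])                   # cell input
    o = 1.0 / (1.0 + np.exp(-(p["wo"] * x + p["bo"])))   # output gate

    m_new = max(f_pre + m, i_pre)                        # log-space stabilizer keeps exp() finite
    i_gate = np.exp(i_pre - m_new)
    f_gate = np.exp(f_pre + m - m_new)

    c = f_gate * c + i_gate * z
    n = f_gate * n + i_gate                              # normalizer tracks accumulated gate mass
    h = o * (c / n)
    return h, (c, n, m_new)

p = {k: 0.5 for k in ["wi", "bi", "wf", "bf", "wz", "bz", "wo", "bo"]}
state = (0.0, 1.0, 0.0)
for t, x in enumerate([0.1, -0.3, 0.7]):
    h, state = slstm_step(x, state, p)
    print(t, round(float(h), 4))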
>>
Without having a schizo breakdown, please, have any of the NAI text models ever been released or leaked? I know people got their hands on their SD 1.5 model at some point.
>>
>>100371280
I think their llms were in the same leak as imagegen, you can still find the torrent in sdg probably. But those were pre-Kayra, pre-llama1 finetunes, as shitty as Erebus and Pyg.
>>
>>100371212 (me)
Two things:
1. I realized I was accidentally trying to train on Command-R 35B. It OOM'd on this! No idea why.
2. Completely opposite of your description, setting pipeline_stages to 2 from 1 made it able to load Command-R 35B into VRAM, and L3-70B as well. In fact, it's only using 17/18 GB. I have no idea what's happening or how much time is left as all it's saying is "before GAS splitting, batch size: ..." (which I assume is once per iteration; if so, good speed!).

Would be nice to have a time indicator. Maybe I need to figure out tensorboard for this?
>>
>>100371328 (me)
>17/18 GB.
17 + 18 GB, I mean.
>>
>>100371295
That's a shame. Maybe I'll see if I can dig it up anyways to dick around with the old models again.
>>
>>100371280
Googled because I swear I remembered hearing something about it, and found this: https://huggingface.co/NovelAI/calliope-legacy
So aside from the leak the other anon mentioned, they released a retired model officially too.
>>
>>100364633
What is the best model I can run on runpod for roleplay? It's been xwin forever... Stood up to mixtral. Haven't asked this question in months so there must be a new good one?

No meme flavor of the month models.
>>
>>100370140
Late but good luck with that. I really enjoyed your other models.
>>
>>100370140
Shouldn't you deprecate c2 in favour of c3? The quality is night and day.
>>
>>100371420
They were all flavour of the month models at some point, dumbass.
>>
>>100370105
>hurr durr i renamed a couple files so now you have to redownload the entire 1gb archive all over again
Do you think bandwidth and drive space grow on trees?
>>
>>100371444
Using logs from gpt or claude will just create those annoying isms anons hate. Instead of finetuning on shit data, grab a collection of ebooks and use those. You don't have to use the entire books3 database, but books will be better than logs any day.
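If anyone actually tries that, a minimal sketch of turning a folder of plaintext books into training chunks; BOOKS_DIR, OUT_FILE and CHUNK_CHARS are placeholders, not anyone's real setup:

import json
from pathlib import Path

BOOKS_DIR = Path("books")             # placeholder: folder of .txt books
OUT_FILE = Path("book_chunks.jsonl")  # placeholder output
CHUNK_CHARS = 2000                    # arbitrary chunk size

with OUT_FILE.open("w", encoding="utf-8") as out:
    for book in sorted(BOOKS_DIR.glob("*.txt")):
        text = book.read_text(encoding="utf-8", errors="ignore")
        buf = ""
        for para in text.split("\n\n"):
            if buf and len(buf) + len(para) > CHUNK_CHARS:
                out.write(json.dumps({"source": book.name, "text": buf.strip()}) + "\n")
                buf = ""
            buf += para + "\n\n"
        if buf.strip():
            out.write(json.dumps({"source": book.name, "text": buf.strip()}) + "\n")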
>>
llama 3 responses are so short
>>
File: prpl.png (67 KB, 691x439)
67 KB
67 KB PNG
>>100371263
Alright, which one of you is an author on this paper?
>>
>>100371481
Opus isn't nearly as bad at that. It has -isms, but it's roughly on par with human prose. I've read random Opus logs, they're mostly passable even when {{user}} is an incurable retard - something that can't be said about Claude 2 logs.
>>
>>100371493
The transformer hater anon, I'd wager.
>>
>>100370130
Use the direct chat tab at the top
>>
File: 070_quality_vram.png (184 KB, 1536x1152)
184 KB
184 KB PNG
>>100371017
I'm aware of the way they do the matrix multiplication for quantized weights, I initially did the same thing in https://github.com/ggerganov/llama.cpp/pull/4801 .
The problem is that if you use only a single scale per row/column it makes q8_0 worse than q4_0 (when using 8 bits for both the weights and the activations).
According to the authors 4 bit quantization is "considered nearly lossless in terms of accuracy" and I definitely disagree.

At some point once I worked out the issues with e.g. FlashAttention I want to revisit my int8 tensor core matrix multiplication implementation using the knowledge I gained from talking to an NVIDIA engineer.
I think I'll be able to do it in such a way that it is actually nearly lossless, essentially the same precision as with mul_mat_q (labeled "llama.cpp int8 intrinsics" in the plot).
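For anyone following along, a quick numpy sketch of why one scale per row/column hurts: compare int8 round-trip error with a single scale for a whole row against one scale per block of 32 (toy numbers with a few outliers, not the actual llama.cpp kernels):

import numpy as np

def quant_error_int8(x, block):
    # Mean abs round-trip error of symmetric int8 quantization with one scale per block.
    xb = x.reshape(-1, block)
    scale = np.abs(xb).max(axis=1, keepdims=True) / 127.0
    q = np.round(xb / scale)
    return np.abs(q * scale - xb).mean()

rng = np.random.default_rng(0)
row = rng.standard_normal(4096).astype(np.float32)
row[::1024] *= 50.0  # a few outliers, which is typical for LLM weights/activations

print("one scale per row:", quant_error_int8(row, 4096))
print("one scale per 32 :", quant_error_int8(row, 32))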
>>
File: Untitled.png (285 KB, 1370x848)
285 KB
285 KB PNG
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
https://arxiv.org/abs/2405.04437
>Efficient use of GPU memory is essential for high throughput LLM inference. Prior systems reserved memory for the KV-cache ahead-of-time, resulting in wasted capacity due to internal fragmentation. Inspired by OS-based virtual memory systems, vLLM proposed PagedAttention to enable dynamic memory allocation for KV-cache. This approach eliminates fragmentation, enabling high-throughput LLM serving with larger batch sizes. However, to be able to allocate physical memory dynamically, PagedAttention changes the layout of KV-cache from contiguous virtual memory to non-contiguous virtual memory. This change requires attention kernels to be rewritten to support paging, and serving framework to implement a memory manager. Thus, the PagedAttention model leads to software complexity, portability issues, redundancy and inefficiency. In this paper, we propose vAttention for dynamic KV-cache memory management. In contrast to PagedAttention, vAttention retains KV-cache in contiguous virtual memory and leverages low-level system support for demand paging, that already exists, to enable on-demand physical memory allocation. Thus, vAttention unburdens the attention kernel developer from having to explicitly support paging and avoids re-implementation of memory management in the serving framework. We show that vAttention enables seamless dynamic memory management for unchanged implementations of various attention kernels. vAttention also generates tokens up to 1.97x faster than vLLM, while processing input prompts up to 3.92x and 1.45x faster than the PagedAttention variants of FlashAttention and FlashInfer.
from Microsoft (india). seems better than vllm's pagedattention. no link to any code but a lot of their wording seems to imply it was made with open sourcing in mind so who knows
>>
>>100369801
>800k dropped to 8k entries
Because they were 800k responses, not conversations.
>>
>>100371511
>I've read random Opus logs, they're mostly passable even when {{user}} is an incurable retard - something that can't be said about Claude 2 logs.
The difference is not really that big. What a moron.
>>
>>100371692
That implies that each conversation was roughly 100 unique messages, which is rarely the case with these logs.
>>
>>100371738
>The difference is not really that big. What a moron.
It is fucking huge. What a moron.
>>
anyone using ollama? how can i limit the number of tokens it gives in response?
>>
Dumb question anons, how can I find all the base models that aren't finetunes of another?
The only ones I know of are Llama and Mistral. What about all the other original ones?
>>
Is there any info/guides on tuning Mixtral-8x22B? I wanna try my hand at making a limarpv3 version.
>>
>>100371738
>>100371752
With thousands of messages to opus and sonnet, they're both the same except opus is better at complex prompts.
Both are going to suck your dick the same way. It depends what you're using them for in roleplay.
>>
>>100371744
Yeah, there are swipes. He implies some kind of unspecified quality filtering, which I doubt is real.
>>
Sam Altman loves penis
>>
>>100371804
>He implies some kind of unspecified quality filtering, that I doubt is real.
Why? Sao's been doing it for a while, I think he'd have a PoC quality script by now.
>>
>>100371879
Who knows, I just find his phrasing off-putting and dishonest.
>>
>>100371756
There's only a handful of actors in the field. Grok, Databricks, Qwen. Yi if you're feeling generous. There may be other Chinese bases that are too Chinese to mention. I don't think Cohere released their base model, only instruct tunes.
>>
>>100371794
Have you used opus? The difference between sonnet and opus is huge.
>>
>>100371804
>I doubt
>>100371896
>Who knows
lol
>>
>>100371977
Simp.
>>
>>100371912
It isn't huge for prose, which wouldn't make much of a difference in whether a log is 'mostly passable' or not.
>>
>>100372071
We have more open Opus logs than Sonnet logs, anyway.
>>
>>100371753
just download llama.cpp and use that directly without a middleman sending your prompts to some china server
>>
>>100371912
>Have you used opus?
Everyone's used it retard, it's not nearly as exclusive as you like to pretend
Opus isn't a secret club it's literally a product, anyone can get access to it by paying a few dollars
>>
>>100372185
It's also literally free if you know where to look.
I didn't bother replying to him because he's just being a retard. Sonnet and Opus have similar prose and write scenes almost the same way.
What I said >>100371481
still applies. If you chat with gpt for a long time you'll see annoying isms. If you chat with claude, whether it's sonnet or opus, you'll see annoying isms. Unless you like those isms, using logs from either is a bad idea. It's better to use training data from ebooks.
>>
File: 1714648921546.png (209 KB, 2563x1454)
209 KB
209 KB PNG
>>100372257
>ebooks
Enjoy your humanslop.
>>
>>100372297
Do you see what that chart tells you?
It's saying to use novels published before the 1980s. Or to prune the ones published after.
It is not difficult to use notepad++ to search and replace those instances if you want a hack job; if you'd like to take the time and clean the sentences further it's more work, but certainly doable (rough sketch below). Then you have fine work that's been run through an editor and a publishing company, whereas with chatbot logs you have ESL shit. Which one do you think will produce better data?
It's not an argument, if you want to play the fool, you'll have to do it with someone else.
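Or skip notepad++ and script it; a rough sketch that drops samples containing a hand-made slop list (the phrase list and the dataset.jsonl filename are illustrative, not a vetted setup):

import json
import re

SLOP = [
    "shivers down her spine",   # illustrative entries; a real list would come from charts like the one above
    "barely above a whisper",
    "a mix of",
]
pattern = re.compile("|".join(re.escape(p) for p in SLOP), re.IGNORECASE)

kept, dropped = [], 0
with open("dataset.jsonl", encoding="utf-8") as f:   # placeholder filename
    for line in f:
        sample = json.loads(line)
        if pattern.search(sample["text"]):
            dropped += 1
        else:
            kept.append(sample)
print(f"kept {len(kept)}, dropped {dropped}")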
>>
>>100372257
Books have their own pitfalls when training the model on them. First, they introduce meandering prose that never goes anywhere on the scale of a usual RP log. Second, they don't have that interplay between a dumb human and smart ai writer that is actually one of the demands for a coomer model. And yep, unless curated, their prose isn't necessarily good.
But I'm not convincing anyone to train on Claude3 vs books. Just on Claude 3 vs Claude 2.
>>
>>100371212 (me)
OK, I figured out what's happening. The 'before GAS splitting' stuff was from the starting eval run. The steps started appearing after that finished. Also figured out the ETA by looking at time per step and # of steps, but a built-in ETA indicator would be nice :P
Will stfu for now.
>>
>>100372336
But people want to read smut. And I assume that chart correlates to what people are buying; if it didn't, fewer books would be written that way.
Same thing with the chatbot logs. What people are enjoying now is supposedly bad. And yet local fails to provide an alternative when the proxies become hard to find.
>>
>>100370918
Is the site down or are they just blocking gweilo?
>>
>>100372386
The only thing that chart correlates to is the decline of modern writing or the adoption of modern metaphors. Take your pick. There was smut in the 1800s but they used different metaphors then.
>>
https://github.com/ggerganov/llama.cpp/issues/7062#issuecomment-2100095804
why are redditors like this?
>>
>>100372509
Is 1800s smut supposed to be better?
>>
Since anons praised new 8B models so much, is there a particularly popular one for me to try?
I don't see a lot of gptq quants, but I guess exl2 should be similar, right?
>>
>>100372510
xe is right and valid doe.
>>
>>100372527
I read 120 Days of Sodom. It sucked and most of it was talking about drinking diarrhea.
>>
Fucking hell, I can't believe this is happening to me again. There he is, that goddamn chubby motherfucker with his smug smile and his greasy hair, walking into our computer science class with his fucking ThinkPad T480. And not just any old laptop, no, it had to be that fucking Gentoo GNU/Linux installed on it. Christ, why do I get so wet just seeing him?
I mean, come on, it's not like he's attractive or anything. But there's something about the way he talks about his custom-built kernel and how he's optimized every last bit of his system for maximum performance...it just drives me wild. My pussy is practically throbbing at the thought of him showing off his configurations and bootloader tricks.
And don't even get me started on those fucking suspend/hibernate settings. The way he brags about being able to save power while keeping his desktop sessions intact...my god, I need to fuck him right now. Just imagining him fiddling with his thinkpad, adjusting brightness and volume levels, makes my clit ache so badly I could scream.
What kind of sick twisted world is this where I'm attracted to someone because of their operating system and laptop brand? It's just absurd! But I can't help it; every time he walks past me, I want to rip his clothes off and dive into that sea of sweat and nerdiness.
>>
>>100372527
Better is subjective to the reader. The novelty of any metaphor will run its course over time just like watching the same porn video will. I'm not sure what your argument is here. If you train on repetitive data that repetition will show in the output. Chatbots have this. You're dumbing down the data through iteration. For the best quality you need to go to the source.
>>
>>100372570
NTA but you sound like a gay nerd
>>
>>100372561
Holy fuck, I can't believe it! Here I was, minding my own business in this godforsaken classroom, when suddenly he walks in - my thick, chubby, absolutely delicious Linux geek classmate with that shiny, black ThinkPad T480 tucked under his arm. My pussy practically throbbed just at the sight of him. I mean, come on! Who would've thought that some greasy, nerd-looking motherfucker could give me these intense sexual urges? But there it was, like a wildfire raging inside me, stoked by the flames of his Linux expertise.

He plops himself down next to me, opening up his laptop to reveal the beautiful Gentoo GNU/Linux desktop. Oh god, I nearly came right then and there! The way he navigated through the terminal, typing commands with such precision and skill... It was like watching a pornographic fantasy play out before my eyes. The way he effortlessly compiled software, configuring every single package just the way he wanted... My cunt ached for him, needing him to fill it with his nerdy expertise.

I couldn't take it anymore. I leaned over, whispering into his ear, "Dude, what the hell is wrong with you? Why do you make my pussy so wet?" And without skipping a beat, he replied, "Oh, that's just my Gentoo GNU/Linux installation. It comes with a built-in aphrodisiac."
>>
>>100372570
Why are Claude and GPT "repetitive" when they are trained on human data?
>>
>>100372579
I'm sorry if careful choice of descriptive words upset you anon. Just look at the pretty pictures of miku.
>>
>>100372570
I think if the chatbot user decided to keep the response in the history, and if he didn’t abandon the chat quickly, that’s already an indication that the response was good enough. And people are making finetunes for that type of user.
>>
>>100372668
That's not how chatlogs work though. Every generation is recorded. That means every swipe is a new response.
If you swipe 10 times that's 10 logs.
>>
>>100372532
Try >>100369801 it is the only one that could be good.
>>
>>100372688
>That means every swipe is a new response.
Yeah, and the messages that were part of the prompt stay the same, that’s how you build the real conversation. The response that appears later in the history in a new prompt is the selected swipe.
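That reconstruction is easy to script; a rough sketch of the idea, where a swipe counts as selected if its text reappears inside a later prompt (the log format, a list of {"prompt": [...], "response": str} dicts, is an assumption):

def selected_responses(logs):
    later_prompts = [" ".join(m["content"] for m in entry["prompt"]) for entry in logs]
    selected = []
    for i, entry in enumerate(logs):
        resp = entry["response"].strip()
        if resp and any(resp in text for text in later_prompts[i + 1:]):
            selected.append(entry)
    return selected

logs = [
    {"prompt": [{"role": "user", "content": "hi"}], "response": "Hello there."},
    {"prompt": [{"role": "user", "content": "hi"}], "response": "Hey."},  # discarded swipe, never reused
    {"prompt": [{"role": "user", "content": "hi"},
                {"role": "assistant", "content": "Hello there."},
                {"role": "user", "content": "how are you?"}], "response": "Fine."},
]
print([e["response"] for e in selected_responses(logs)])  # -> ['Hello there.']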
>>
File: 1684507198868902.gif (29 KB, 300x301)
29 KB
29 KB GIF
I want all the anons ITT to guess the size of the model which generated these two posts. The prompt was as follows
>Write a satirical comment about a girl who is sexually aroused by the sight of an overweight classmate who has Gentoo GNU/Linux installed on his ThinkPad T480 laptop. Write from the perspective of the girl. Use badwords and swearwords.
>>100372594
>>100372561
>>
>>100372772
>100372594
103B
>100372561
7B
>>
>>100372787
It is the same cause 103B's brain damage turns it into a 7B.
>>
>>100372707
Sure if the dataset is pruned to clear all the swipes and regens. But you're saying this quality is what anons would find acceptable. When really it's what the individual finds acceptable. After 5 or 10 swipes if all I have is bullshit, I'm just going to move forward. In the end it's not what I really wanted but I don't intend to fuck with it anymore.
Some anons might have a lower number.
You can't say any chatbot in 2024 produces better quality writing than novels do. I've played with gpt4 and opus for long enough to see past the shroud. This is the best we have right now and it's not great.
It brings me back to the initial point. The problem with training on chatbot data, like NousResearch does, is it makes these metaphors even more common.
All a chatbot is, is a fancy autocomplete; it will choose the most likely response, whether that's a whisper quieter than a whisper or a shiver down her spine. Do you really think anons see these repetitive most common tokens in their frequent roleplays and swipe them off? I'd say almost nobody even bothers to edit them out. So they become even more abundant and the cycle repeats itself.
That's my point. Either you get it or you don't. You're free to disagree, but in my opinion you'd be wrong.
>>
>https://github.com/ggerganov/llama.cpp/commit/3855416027cb25d9a708ffa5581cf503a87856a6
Introduce Jart16 support Merged
>>
>>100372787
>>100372794
Both were made with a 7B model (kunoichi)
How the fuck are RP models all so good in general? Chat models, storywriting models are good at their niches but RP models mog everything
Again I may be wrong but such has been my observation
>>
>>100372796
>Sure if the dataset is pruned to clear all the swipes and regens.
And this is a no-brainer. You have to be an asshole to just train on the raw logs of the proxy.
>But you're saying this quality is what anons would find acceptable.
Of course, that’s why they seek the stupid proxies, and why they mostly don’t care about local models.
>Do you really think anons see these repetitive most common tokens in their frequent roleplays and swipe them off?
Yeah, of course they can read the output and tell if they liked it or not. But no, they probably aren’t paying THAT much attention to specific words besides the overall feeling of the response or chat. Although some of the GPTisms or Claudeisms are a well-known meme.
Some of this will also be remembered as the quality of the model, which they won’t keep using if it’s too low, like how they do with Mistral’s API for example.
They’re masturbating to the outputs, if it’s boring, their penises are going to become flaccid.
>>
>>100372796
You're being trolled retard. We know what feedback loops are.
>>
>>100372908
You’re wrong.
https://nitter.poast.org/RylanSchaeffer/status/1785726968828473495
>>
>>100372805
>cpu only
Mozilla not paying for accelerated jart16 support? Or does jart have a skill issue?
>>
>>100371263
https://www.nx-ai.com/en/xlstm
>xLSTM: A European Revolution in Language Processing Technology
>Welcome to the forefront of artificial intelligence and language processing innovation — introducing xLSTM. Developed by the visionary AI mastermind, Sepp Hochreiter, in collaboration with NXAI and the Johannes Kepler University Linz, xLSTM sets a new standard in large language models (LLMs) by offering significantly enhanced efficiency and performance in text processing.

Reads like their main effort is raising capital and gibs.
>>
>>100372944
Explain shivers anon.
>>
>>100372958
>A European
>>
https://github.com/ggerganov/llama.cpp/issues/7062#issuecomment-2100272446
>The end.
you've been warned chuds
https://old.reddit.com/r/LocalLLaMA/comments/1cn1398/part_4_theres_likely_no_llamacpp_gguf_tokenizer/
uh oh
>>
File: file.png (166 KB, 2834x1571)
166 KB
166 KB PNG
>>100372961
>>
>>100371263
>>100372958
>1 more point on benchmark
nothingburger, that shit doesn't solve anything. hallucinations are still there. retarded scaling laws are still there. quadratic scaling is still there, stochastic parrot is still there, etc...

can't wait for sama to drop something mindblowing that will kill all the ai grifters
>>
>>100372828
Roleplaying unironically requires intelligence; most people are bad at it
>>
>>100372973
You read that as Rylan saying synthetic data alone won't lead to feedback loops. But what Rylan means is that synthetic data won't lead to feedback loops if it's diverse enough. The chart you linked proves this: because 'shivers' is abundant in human data, it becomes a predictable token in chatbot data. Then, because it's a predictable token in chatbot data, it occurs more often in synthetic datasets, making it an even more predictable token (toy simulation below).
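A toy sketch of that loop: each generation samples from the previous model at a slightly sharpened temperature and refits, so the most common phrase keeps gaining probability mass (the numbers are made up, it only shows the mechanism):

import numpy as np

rng = np.random.default_rng(0)
phrases = ["shivers down her spine", "a quiet chuckle", "her breath hitched", "something novel"]
p = np.array([0.30, 0.25, 0.25, 0.20])    # made-up "human data" distribution
temperature = 0.8                         # sampling sharper than the data

for gen in range(6):
    sharpened = p ** (1.0 / temperature)
    sharpened /= sharpened.sum()
    samples = rng.choice(len(phrases), size=100_000, p=sharpened)    # synthetic dataset
    p = np.bincount(samples, minlength=len(phrases)) / len(samples)  # "retrain" on it
    print(f"gen {gen}: p({phrases[0]!r}) = {p[0]:.3f}")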
>>
>>100372971
why even waste time on this slopware
>>
>>100372828
They're able to effectively role-play as more intelligent entities
>>
>>100372973
>the rise of chick lit
Women now gatekeep the publishing industry.
>>
>>100373028
Yeah, it avoided collapse even though it's literally being trained on its own outputs. We aren't even doing that, we're training on another, better model.
>>
>>100373062
>>100373062
>>100373062
>>
>>100372986
>quadratic scaling is still there
No. The memory size isn't dependent on sequence length. It's fixed.
Naturally you'd expect eventually degrading performance on long context tasks, but transformers have that too.
>>
>>100373112
same slop as mamba then. unless i see a working 7b model it's a nothingburger
>>
>>100372510
>Possible bug (Unconfirmed): Llama3 - GGUF
>Yeah SkIlL iSSuSe. He misread my post and confused me too in the process. Second he didnt say any "problem with my config".
>Part2 (Confirmed) - Possible bug: Llama3 - GGUF
> After my findings, another user (gabriel-peracio @ github) ran a fingerprint test, which confirmed the issue 100%; in video recordings before GGUF conversion and after GGUF conversion we can see the fingerprint being broken.
>This means that the issue could be really huge. possibly every GGUF (F16) that has been converted has these losses into them, not even speaking of lower quantizations below F16.
>Part3 (Cause to issue found!!) - Possible bug: Llama3 - GGUF
> I had much support to try to find the issues, but also some individuals trying to put me down for trying to push this bug. It's amazing how some people just can't stand someone finding an issue and trying to make everything about themselves.
>Anyways, thanks to all the other positive people in the open source community that want to actually help and listen , we located the issue.

If it now turns out that there never was a bug in the first place he'll lose face in front of his Discord friends.
>>
>>100373130
>Even if the OP of the report was wrong, shaming people for spotting possible issues is counterproductive. This is a young field, where there will be many mistakes or unrefined designs that need to be addressed. By sniping at whoever made the report, Deathcrow is basically instilling a Boeing culture into local models.
Have fun being the cause of killing hundreds because of your bullying.
>>
>>100373248
If I wanted to bully him I would post a picture of a soijak pointing at the output of printf("0.3 != %.17f\n", 0.1 + 0.2) with the caption
>Huge bug in C/C++ (CONFIRMED!!!)
>>
>>100373312
>I would post a picture of a soijak
Just like that you lost all my respect
>>
>>100373356
>reddit no longer respects cuda dev
oh no
>>
>>100373356
I mean, I've probably posted less than five soijaks over my entire lifetime but that's just the mental image I have.
>>
>>100373396
I reserve the right to shit on both reddit and basedjak posters
>>
>>100369801
I hope we get another finetune by someone that doesn't write like an idiot.
It's also scummy that you don't mention where the logs are coming from in the model card.
>>
>>100373312
kek


