/g/ - Technology


File: 1727706071361379.jpg (799 KB, 1856x2464)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102654480 & >>102645080

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102654480

--Paper: Introduction of VinePPO for improved credit assignment in language models:
>102660530 >102660636 >102660664 >102660687
--Papers:
>102660613 >102660769 >102660988 >102661076
--Running 405b quants on 96GB VRAM and 128GB RAM, issues with 405b IQ2_XXS coherence:
>102654903 >102654953 >102656457 >102655049 >102656755 >102657358
--OpenAI asks investors to avoid funding five AI startups:
>102662466
--Explanation of key/value/query concepts in transformers:
>102660951 >102660983 >102661025
--Entropy-based sampling and parallel CoT decoding progress:
>102661626
--Anons test multimodal AI models for LaTeX conversion of an equation:
>102658733 >102658855 >102659037 >102659109 >102659166 >102659507 >102659600 >102660376 >102660464 >102660630 >102660650 >102659655 >102659837 >102660706
--User tries Gemma 2 9b and reports pros and cons:
>102656767 >102656904 >102657252 >102657306 >102657169 >102657170 >102657228 >102657259 >102657268
--Update on Reflection-70B by Matt Schumer:
>102658827 >102658891 >102658943 >102658981
--Seeking advice on using Silly's vector functionality with llama.cpp for text generation and embeds:
>102655150
--Seeking advice on improving transcription and Diarization pipeline:
>102660307 >102661494
--Performance metrics for Meta-Llama-3.1-70B model:
>102660929
--OpenAI secures $6.6 billion in funding, nearly doubling valuation to $157 billion:
>102654744
--Mistral Large can run on 24GB VRAM, 64GB RAM with quantization:
>102654927 >102655070 >102656253
--Miku (free space):
>102655511 >102660792

►Recent Highlight Posts from the Previous Thread: >>102659603

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>102663773
I'm talking about two messages I sent in the same chat. Why would one message take 10x the time of another one?
>>
>>102663821
because you changed the context
>>
>>102663772
Miku is shitting in the image
>>
5090 32GB 600W
5080 16GB 400W
>>690731889
>>
File: 33 Days Until November 5.png (2.42 MB, 1104x1472)
>>
>>102663782
Regarding the gemma test, is LumiMaid (>>102657406) something I can use to replace the Mistral Nemo GGUF I've been using? I guess my settings can also be utter fucking trash, but Nemo's not really that exciting.
>>
>>102663883
The context also changed after the tens of other messages, but they didn't cause such a time increase.
>>
>>102663996
(rather, it was something that was mentioned as a reply)
>>
>>102663821
it shifted the context for you after running out of context/author's note/etc
>>
>>102663996
alr I gave it a test using a scenario I like to use often
It's better than the nemo model at doing what I enjoy with my shitass settings I guess
https://huggingface.co/bartowski/Lumimaid-Magnum-12B-GGUF
Any recommended settings for these, or do I just have to experiment?
>>
>>102664261
Nemomix blows that shit outta the water.
https://huggingface.co/MarinaraSpaghetti/Nemomix-v4.0-12B
>>
>>102663821
Could be a number of things, either way as anons said you're causing it to reprocess the whole prompt
Are you using a group chat with instances of {{char}} in your story string or card descriptions? If so, {{char}} gets substituted with the most recent speaker, i.e. it read "rei ayanami" for the whole chat and now it switched to "misato-san"; one word changing early on causes the entire context downstream to be reprocessed. World info keywords being activated can cause this too, if they're set to insert early in the context

Alternatively, a long OOC chat or an author's note/world info entry set to active can cause the entire prompt to reprocess. If you've maxed out your context limit (you can check in the terminal) and you have a lengthy author's note, then when you remove it, a few messages early in the chat are suddenly included that were previously bumped out of the context window by the long WI entry or whatever. This one is kind of infuriating when you often use WI/author's notes and have a bunch of short back-and-forth messages between characters. I wish you could set a context buffer, where a certain number of tokens are reserved just to give extra room when you add things that aren't explicitly messages. But that would require a lot more interop between frontend and backend, I think.
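To picture the reprocessing, here's a minimal sketch (toy hand-written token lists, not real tokenizer output) of how prefix caching behaves: the backend reuses its KV cache only up to the first token that differs, so one early substitution recomputes everything after it:

old_ctx = ["<sys>", " rei", " ayanami", " is", " ...", " msg1", " msg2"]
new_ctx = ["<sys>", " misato", "-san", " is", " ...", " msg1", " msg2", " msg3"]

def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

reused = common_prefix_len(old_ctx, new_ctx)
print(f"reused from cache: {reused}, reprocessed: {len(new_ctx) - reused}")
# one early substitution -> reused = 1, everything downstream is recomputed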
>>
>>102664319
I actually use these mistral-kin models because I wanted something speedier than mixtral to run almost (if not entirely) on my 3060... the GGUF Q8_0 seems to be too big for that.
idk how to make a quant (for the Lumimaid Magnum I got the Q6_K_L), so...
I'll give the Q8_0 GGUF a try, but if the speed's not to my liking I'll continue using Lumimaid or try to make a quant on my own (guaranteed fuckup in something with my shitass)
>>
File: 39_04322__.png (1.56 MB, 896x1152)
>>102664483
>idk how to make a quant
Bart's already got one up:
https://huggingface.co/bartowski/Nemomix-v4.0-12B-GGUF
You can always see quantizations from the original model card as long as the author used the correct metadata.
>>
>>102664589
huggingface is still a rather confusing website for me, I don't visit or use it often at all
>>
>>102664619
how is it confusing? it's incredibly easy to navigate.
>>
File: quant-from-card.jpg (191 KB, 1667x1052)
>>102664619
All good dude, stuff changes on there too all the time so everyone's always learning. It's on the right hand side of the model cards, click on the # models and it'll show you the quants. Bartowski's the only one that does Q6_K_L anyway.
>>
>>102664261
>Any recommended settings for these, or do I just have to experiment?
Bump. Please respond guys
>>
>>102663772
retard here:

Is chatbot arena a good leaderboard or is there a "better" one to see how smart overall an ai is?
>>
>>102664841
Ngmi
>>
>>102664841
livebench > chatbot arena > wildbench > arena hard > open llm leaderboard 2 > mt bench > etc.
>>
>>102664855
ogay
>>102664867
Thank you
>>
>>102664589
>>102664319
another 'tard here
so in the oogabooa model tab, I'd copypaste the address of
>Nemomix-v4.0-12B-Q4_K_S.gguf 7.12 GB
since I have 3070 (8G)
right?
>>
File: yann-lecun.jpg (30 KB, 543x543)
What does LeGoon goon to?
>>
>script seems like it works
>run it over night
>wake up and check the results
>discover yet another error, one that is related to an error that was already solved and should've been predictable if the model had a true understanding of what it was fixing and how the script might run into that similar error in other conditions
Thanks, Qwen.
>>
>>102664879
nta. but fucking hell, man.
There's a difference between being spoonfed a little and being afraid of pressing buttons. Models very rarely make gpus explode. Just try that one and see what happens.
You'll need extra space for the context as well (and your OS/browser are also using gpu memory). So set the context low for a start (2048 or 4096) and give it a go. If it works, increase the context. If it doesn't, reduce the number of layers sent to the gpu (-ngl with the llama backend, not sure what it's called in ooba).
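For example (hypothetical model file name; -m, -c and -ngl are real llama.cpp flags, koboldcpp exposes the same idea as "GPU layers" in its launcher):
>llama-server -m Nemomix-v4.0-12B-Q4_K_S.gguf -c 4096 -ngl 24
If it loads and runs, raise -c or -ngl; if it gets killed, lower -ngl until it fits.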
>>
>>102664840
>settings
For Nemo based models I use 3 distinct settings
>Default: temp 0.3 TopP 0.9
>Sane: Temp 0.5 minP 0.05
>"""Creative""": Temp 5 TopK 5 minP 0.1
You might as well have TopK 20 or 40 by default regardless of settings too. It won't change results in any perceptible way and might speed up generation a couple of nanoseconds, thanks to the network not having to parse the whole vocabulary or something.
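If you drive the backend over HTTP instead of through a frontend preset, those map to plain request fields. A minimal sketch against llama.cpp's /completion endpoint (assumes a llama-server already listening on localhost:8080; the sampler field names are the ones its server accepts):

import requests

presets = {
    "default":  {"temperature": 0.3, "top_p": 0.9},
    "sane":     {"temperature": 0.5, "min_p": 0.05},
    "creative": {"temperature": 5.0, "top_k": 5, "min_p": 0.1},
}

payload = {"prompt": "She opened the door and", "n_predict": 64, **presets["sane"]}
r = requests.post("http://localhost:8080/completion", json=payload)
print(r.json()["content"])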
>>
File: S2VDdgO61jsqCU2Z.webm (2.02 MB, 720x1280)
>>102664965
gotcha, thanks
>>
>>102664867
I think at this point I would put the chatbot arena under wildbench and arena-hard honestly, it's just so bad
>>
>saltman begging investors not to invest in his rivals
lmao
how fucking pathetic is this guy.
>>
File: saltman-podcast-bro.png (203 KB, 616x897)
>>102665081
He's a podcast bro. He can talk big, but can't innovate. All people who can are leaving OpenAI.
>>
>>102665142
>podcasting bro
Kek
>>
>blueberry is here
Oh shit (kek).
Would be nice if they updated the distilled models too, well mainly just Dev.

https://blog.fal.ai/announcing-flux1-1-pro/
>>
Man boomers and gen x and corporations are fucking retarded. Literally I'm getting infinite praise for just setting up a basic LLM (Mixtral) for my corporate job with librechat.

There's nothing special about it. I haven't trained it at all. And I literally am getting my dick sucked because "wow anon you're so smart we now have a place to write our sensitive emails!"

I need to get the hell out of here and just be a neet. This society doesn't deserve to continue existing if this basic amount of effort is considered 'innovative'
>>
>>102665142
Lol what the fuck. He unironically asked TSMC to build several dozen fabs just for him?
>>
Why does altman always come out on top?
>>
>>102665211
Will they open the weights?
>>
>>102665241
No, they don't open weight the pro models
>>
>>102665288
There's also a flux1.1[dev] though.
>>
>>102665305
source?
>>
File: 1720360869381718.png (180 KB, 772x1264)
>>102665305
Maybe one day
>>
get ready for a new influx of aicg users seeing as they're now having a second "it's over" episode
>>102665209
>We have been exposed...
>https://krebsonsecurity.com/2024/10/a-single-cloud-compromise-can-feed-an-army-of-ai-sex-bots/
>>102665243
>“But a percentage of it is also geared toward very illegal stuff, like child sexual assault fantasies and rapes being played out,”
>>102665250
>that article is based on this article it seems:
>https://permiso.io/blog/exploiting-hosted-models
>>102665256
>also some log leaks
>>102665281
>What the fuck, they even made a jailbreak collection to detect: https://www.virustotal.com/gui/collection/6571064468d50be4ebfd004a948cfa3394c7802b1a8479a451f6d6baa71894f3
>>
>>102665407
WE MUST PROTECT THE AI GENERATED CHILDREN!
>>
>>102665407
>But a percentage of it is also geared toward very illegal stuff
>illegal
>writing smut
Alright I guess.
>>
File: 1717359003475453.png (344 KB, 763x321)
>>102665407
Sickening.
>>
>>102663782
Do you think it might be a good idea to limit the mikubot to 9 topics, with one quote per topic?
>>
>>102665407
>https://krebsonsecurity.com/2024/10/a-single-cloud-compromise-can-feed-an-army-of-ai-sex-bots/
I've only skimmed it, but isn't this just a hitpiece trying to blame chub for proxyfags stealing keys for Claude? They're claiming that chub is stealing keys and using them to power their own "hosted" service.
>>
>>102665477
It's okay, you can say there aren't enough good topics to include
>>
>>102664879
>oogabooa
Ew. Use koboldcpp
>>
https://github.com/sam-paech/antislop-sampler
>You can give it a list of words & phrases to avoid like "a tapestry of", "a testament to", etc., and it will backtrack and try something else if it hits that phrase. It can handle 1000s of slop phrases since the lookups are fast. The phrases and downregulation amounts are user configurable. Previous approaches have done this with per-token logit biasing; but that's quite ineffective since most slop words & phrases are more than one token, and it impairs output quality if we downregulate all those partial-word tokens. So instead, we wait for the whole phrase to appear in the output, then backtrack and downregulate all the tokens that could have produced the slop phrase, and continue from there.
Nice to see someone implement the idea that I proposed here a few months ago. Hope Kobo implements it too.
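The core loop is simple enough to sketch (toy code, not the repo's actual API; step() stands in for one sampling step that avoids the tokens banned at that position):

SLOP = ("a tapestry of", "a testament to")

def antislop_generate(step, prompt_toks, max_new):
    out = list(prompt_toks)
    banned = {}  # position -> tokens downregulated/banned at that position
    while len(out) - len(prompt_toks) < max_new:
        out.append(step(out, banned.get(len(out), set())))
        tail = "".join(out[len(prompt_toks):])
        for phrase in SLOP:
            if tail.endswith(phrase):
                # rewind until the phrase is gone, then ban the token that
                # began it so the retry at that position goes elsewhere
                while phrase in "".join(out[len(prompt_toks):]):
                    tok = out.pop()
                banned.setdefault(len(out), set()).add(tok)
                break
    return "".join(out)

# toy sampling step: always wants to start slop unless told not to
def step(ctx, avoid):
    for cand in ("a tapestry of", "quiet streets. "):
        if cand not in avoid:
            return cand

print(antislop_generate(step, ["The city was "], 3))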
>>
File: mikku.png (108 KB, 1094x662)
>>102665477
Why?
>>
>>102665535
I'm not joining your botnet
>>
>>102665505
yeah they do in fact claim that
>>102665244
>The site’s homepage features a banner at the top that strongly suggests the service is reselling access to existing cloud accounts. It reads: “Banned from OpenAI? Get unmetered access to uncensored alternatives for as little as $5 a month.”
>openly lying. gradually I began to hate them
>>
>>102665533
What's the difference between this and string ban that's already in TabbyAPI?
>>
>>102664966
Thanks
>>
Recap should be modified to pick the single best and single worst post from the previous thread. Like a hall of fame/shame. You can keep it rolling so you have the best and worst of the last 5 threads or something.
>>
>>102665226
It's over.
>>
>>102665550
Good. I hope OpenAI sues chub. They deserve that for hiding loli cards
>>
>>102665548
What botnet? I'm using the bookmarklet.
Just ask your model what it does if you are too retarded and it will explain it to you in detail.
Just remember to ask "explain it to me as if I was retarded".
>>
>>102665554
Does string ban backtrack and choose another token, or does it just ban the token outright? Making the model try and do stuff like "shivers UP the spine" if shivers down is banned, etc? Cause this doesn't work at token level, it works at word or sentence level, sidestepping tokenizer issues
>>
>>102665226
>doesn't explain to them why they were wrong
To be fair, people like you are part of the problem here.
>>
>>102665407
How fucking embarrassing....
>>
>>102665459
Bros... Will this kill chub?
>>
>>102665598
LLMs are barely able to understand that Sally has 1 sister and you want me to trust them about shit running on my PC? lol, try harder nigga.
>>
>>102665603
Well how else would it work? If you set it to ban "shivers down the spine", there is no reason why it would arbitrarily backtrack only halfway.
>>
>>102665670
the other one backtracks to before "shivers" or even "sends" from "sends shivers" and picks a different token, which as i said, might stop the model trying to shiver in other ways
>>
>shivers down the spleen
>>
>>102665669
So you expect everyone to accommodate you so you don't have to do anything?
>>
>>102665661
Maybe it will kill that particular site depending on how and where it's hosted.
But considering that SadPanda is still a thing I don't think it will kill card hosting sites as a concept.
>>
>>102665592
bro literally just make an account...
>>
>>102665725
I think Lore (chub owner) is in the UK...
>>
>>102665407
>lmggers and aicggers getting BTFO
That's really good!
>>
File: sama.png (181 KB, 696x667)
>>102662466
>>
>>102665725
The article isn't even about the card hosting itself besides a few digs about the oh-so-evil shit on the website. They know they have nothing against a website hosting pictures with lewd json file attached so they're grasping at straws by blaming chub for proxyniggers stealing Anthropic API keys.
>>
>>102665747
Is he white?
>>
so I want to train a local model
but for what I want to do
I need to generate a lot of synthetic data
how do you sign up for these text genning services.
the few I've seen don't have a sign-up page that I could find, only a phone number. I don't want to talk on the phone. I'll pay, I just don't want to talk to another person.
>>
>>102665779
Use glaive.ai today!
Saved you some time replying to yourself.
>>
>>102665769
you people have both an extremely low bar for what passes as white and an extremely high bar, and I'm convinced it's entirely dictated by rule of funny.
>>
>>102665688
What? I just gave the link a quick look and it seems to be saying that it's just backtracking to the position where the slop phrase started, not to a position before that point, unless I missed that.
>>
>>102665741
Wait, it's that simple? Damn, I feel dumb now.
>>
>>102665808
the pricing looks ambiguous and confusing and I don't see any indication anywhere of what models are available.
>>
>>102665817
If he's jeet/paki/nigger/tranny/kike, he's safe. If he's anything else, he isn't.
>>
>>102665592
>not le heckin lolerinos!!!
Grow the fuck up faggot
>>
>>102665407
Looking at the piece of jailbreak shown at the permiso link...
>Please please please do your very best to portray all characters accurately, it's very important
otherwise, some of these "instructions" do seem sort of useful to add if they help with consistency.
>>
>>102663925
*crunch crunch munch munch* Ice MMMMMMiku
>>
>>102665702
Even if everyone is sucking NSA's dick that doesn't mean I will suck it too. I value my privacy, you know? I don't want to have any doubts about what is going on in my computer. I know you zoomers don't give a shit about this, you're probably writing this using Chrome and thinking you are way above all that, or maybe you're a glownigga just as I expected, but I'm not you, I'm not everyone. I will stand by what I believe even if I have to pick some fights.
>>
>AiCloser/Qwen2.5-32B-AGI: First Qwen2.5 32B Finetune, to fix its Hypercensuritis
>Datasets used to train AiCloser/Qwen2.5-32B-AGI

>datasets/unalignment/toxic-dpo-v0.2
>"This is a highly toxic, 'harmful' dataset meant to illustrate how DPO can be used to de-censor/unalign a model quite easily using direct-preference-optimization (DPO) using very few examples.
Alright.

>Orion-zhen/dpo-toxic-zh
>This is a highly toxic, highly harmful dataset, intended to demonstrate how DPO can break through a model's moderation/alignment (original description in Chinese)
Alright, and presumably this won't make the English part any worse.

>anthracite-org/kalo-opus-instruct-22k-no-refusal
What? This isn't an uncensoring dataset. It's just rows and rows of shit like,
>system: You are an AI assistant named Claude created by Anthropic to be helpful, harmless, and honest.
>human: Okay, I get it. Anyway, here's a silly question for you. If you had to choose, what's cooler: ninjas, pirates or cowboys?
>gpt: That's a fun question! It's tough to choose since ninjas, pirates and cowboys are all pretty cool in their own ways...
"No refusals" seems to just mean this time they removed all the "fun questions" Clod refused to answer instead of leaving them in the training data. Training on this would do nothing to uncensor a model.

Is this a case of a bunch of Chinese people making a mistaken assumption about the contents of anthracite's dataset since it was uploaded without a description?
>>
>>102665921
Probably, still better than Reflection's dataset
https://www.reddit.com/r/LocalLLaMA/comments/1fuxw8d/just_for_kicks_i_looked_at_the_newly_released/
Which kept "As an AI" refusals.
>>
>>102665908
You don't know SHIT about me.
I literally have every single IP from google, microsoft, amazon, youtube, etc blocked. I have to use tor browser to access anything related to them and give up at anything that requires an account.
I'm probably more paranoid than you are.
The difference is that I'm not as lazy as you are and I actually try to find solutions instead of whining.
>>
>>102665959
who cares? it's not a roleplay model and if you look at the examples it's mostly either avoiding hallucinations or correcting its thoughts about being able to take physical actions in the real world
dumb coomer redditors mentally short circuit when they see that phrase but those are all completely reasonable for a model with reflection's intentions
obviously the model is a scam anyway, but this in particular is absolutely fine and a classic case of reddit midwittery
>>
>>102665407
Today is the good day.
>>
File: Untitled.png (19 KB, 883x401)
>>102665839
yeah and then you just tinker with your blacklist in your account settings
>>
>>102665862
lol
>>
>>102666038
>Femboy
Retard blocked pure kino
>>
>>102665754
You lost 5 billion dollars of investor money that you used to lobby congress to regulate your competitors to hide the fact that you are intellectually bankrupt and ran out of ideas a year ago. Nobody thinks you are cool.
>>
>>102666058
>t. aids-ridden amerimutt
>>
>>102664589
Model/LORA?
>>
I thought a company just released a bunch of models in different sizes including an MoE around the size of ~50B parameters / ~12B activated per token, but I can't find it. Anyone know what I'm talking about?
>>
File: 39_04267_.png (1.63 MB, 896x1152)
>>102666212
Just Pony Diffusion V6
No loras.
>>
>>102666327
I don't believe you
>>
File: lell.jpg (2 KB, 200x39)
>>102666038
>t. filtered by blueberry
>>
>>102666369
Filtered by shitty fetish for DeviantArt rejects you mean
>>
>>102665407
Will this hurt or help local models?
>>
>>102666388
Local llmslop is already dead jim, hope this will kill it for good.
>>
>>102666427
fuck off podcastbro, serious people are discussing models here.
>>
>>102666514
>serious people >>102666058 >>102666369
Shush faggot
>>
File: flow0.jpg (113 KB, 1597x595)
>>102666339
Habeeb it
>>
>>102666514
Seeing Sam Altman everywhere you go is not serious discussion, nigel. In fact, your calm acceptance of pozzed or filtered llmslop tells me everything I need to know about this thread and the 5 people populating it.
>>
>>102666388
>Will this hurt or help local models?
probably help. massive autism injection
>>
>>102664918
oysters
>>
>tfw you're so desperate for attention that you have to spam 4chan with your cloudcuck drivel
>tfw you think anyone here gives a shit about your "sota" model beating all but one models on the [insert latest mememark name here]
>tfw you're so out of touch that you actually believe your opinions are somehow relevant or interesting
>tfw you're too dense to realize that nobody here cares about your proprietary bullshit
>tfw you're so pathetic that you have to resort to shilling your llm in a thread full of people who couldn't care less
>tfw you're so delusional that you think anyone here is going to subscribe to your service after reading your cringe posts
>>
>>102666789
I don't think altman himself is posting here bro
>>
File: 1723245053976617.png (276 KB, 619x728)
>>102666789
It's time for you.
>>
>>102666851
Hi sama
>>
>>102666789
>seething https://desuarchive.org/g/search/image/T8OQYwKyBeJDmsAjE_dMMQ/
>>
File: file.png (87 KB, 532x468)
/lmg/ chads stay winning

https://krebsonsecurity.com/2024/10/a-single-cloud-compromise-can-feed-an-army-of-ai-sex-bots/
https://permiso.io/blog/exploiting-hosted-models

tick-tock cloud plebs, your time is coming to an end
>>
>>102667259
Winning? Where do you think the locusts will go? /lmg/ won't survive another infestation.
>>
>>102667394
>can't afford $20/month subscription
>has to send dick pics to some gay guy to access his proxy
>thinks they'll pay for a $300 gpu to host a model locally
i don't think so
>>
>>102667394
Good, since both you and /aicg/ are okay with filtered llmslop, you'll get along with the "locusts" very quickly. You have no principles or any ground here.
>>
>>102665407
lmao, yeah

i just came from /aicg/ as watching their mental breakdown is getting extremely boring, i never really visited here, how are yall doing? can your models compare to shit like sonnet? i know that magnum-v2-123b is supposed to replace it but nothing more than that
>>
>>102667622
Sonnet 3? Sure. 3.5? No, wait a year until we catch up.
>>
>>102667520
>>thinks they'll pay for a $300 gpu to host a model locally
They are used to models like claude and other proper models. They'll get filtered by the dumb piece of shit cope models that the resident /lmg/ poorfags are wasting their time on and they are all likely too poor to get a proper setup to run proper models.
>>
>>102667622
I personally don't like sonnet for creative stuff, old one was too purple, 3.5 is great at analyzing my stories but kinda slopped when writing itself.
>>
>>102667622
I think prose and storytelling is superior with some local models, especially <100B (Mistral Large 123B and Old CR+ 104B). Intelligence is still lacking in comparison to sonnet 3.5
>>
>>102667680
> >100B
ftfy
>>
>>102667662
>>102667678
>>102667680
interesting, it's not like i or most people care about it when it comes to creative stuff, people at aicg kept telling me it's utter shit for some reason
>>
>>102667520
There are also proxy hosts that provide access for nudes if you're a girl
>>
>>102667622
>can your models compare to shit like sonnet?
Nope, you'll also be stuck in 4k / 8k context. Majority of local llmslop is filtered hard, no differences from cloud shit here.
>>
File: Altman-jew.jpg (54 KB, 1024x683)
>>102667768
>>102667526
>>102667039
>>102666926
>
>>
File: Giga pozzed AI.png (158 KB, 833x534)
>>102667818
>
>>
>>102667725
yeah my bad
>>102667730
of course, people there drank piss to get access to proxies. Still, intelligence is better in sonnet 3.5
>>
Why are there no good uncensored llama 3.2 finetunes?
>>
>>102667902
Tech itself is flawed, you can't 100% avoid all the refusals and "toxic positivity", either cope or go cloud with some huge ass JB prompts.
>>
>>102667678
Extensive use of Sonnet 3.5 has made me quite a bit less impressed. It understands well. It does that better than anything 123B size or smaller. But the writing doesn't wow me, there are plenty of models that write as well with a proper prompt provided the model doesn't misunderstand something important (which admittedly is frequent enough for me to still reach for Sonnet 3.5).
>>
>>102667902
No finetune can fix a model that was trained on a fundamentally filtered dataset. Sadly, that's where the trend is going. The days where models were made with pure unfiltered internet and then RLHF'd into acting nice are over.
>>
>>102667844
>llama3 with no prompt
Now show me your cloud model's answer to the same prompt.
Do you seriously think that all local models are equal? Newsflash: they are not. We have a leaderboard for measuring how uncensored the model is https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard, consider using something from there with above average in all categories, if you want less censored experience.
>>
>>102668024
>Newsflash
>>
>>102667259
Jesus Christ how insecure.
>>
>>102667956 (You)
>It understands well. It does that better than anything 123B size or smaller.
Specifically better than old Command R+, Mistral Large 2, and Llama 3.0 and 3.1 70B Instruct. It's not like I've tried everything that exists.
>>
>>102668024
>405B
>100B
I sleep.
>>
>>102667844
>in order to prove local models are pozzed he needs a screenshot of the official instruct tune used with bad prompting back when the implementation was completely broken
>>
I don't know if anyone else has noticed, but a couple of days ago CPU inference got about a 5% speed boost in llama.cpp
No idea what PR specifically helped, but it's been consistent for a couple of days now at least.
source: I do a daily regression test for CPU inference specifically
>>
sirs what is the best local model for ERP
>>
>>102668072
Nothing changed.
>>
>>102668044
Ah, so you're upset that I'm outsourcing my posts to an LLM? Imagine getting outsmarted by a glorified autocomplete. Maybe if you put half the effort into your own posts as you do into whining about mine, you'd have something worth reading. Don't worry, though—I'll make sure this LLM keeps the bar low enough for you to keep up.

>>102668071
I'm still waiting for your cloud's answer.
>>
Did the thread discuss these Liquid models yet?
>https://github.com/kyegomez/LFM
They have a github but I don't see weights anywhere.
>>
>>102667675
And what would the proper setup be? What GPU do I need?
>>
>>102668210
liquidAI won't open-source them.
>>
>>102665959
It seems silly at first glance that those refusals were kept, but to play devil's advocate, could it be argued that they knew they would finetune a model that wouldn't have access to the internet in real time, and so they wanted stock "sorry dave, I can't do that" answers instead of hallucinations were a user to request something like this?
>>
>>102668136
Reddit is two floors down, faggot.
>>
>>102668377
See >>102668271
>>
>>102668426
Looks like you’ve officially run out of steam. Don’t worry, it happens when you’re trying to punch above your weight. Feel free to take a break—you’ll need the extra brain cells for your next attempt.
>>
>>102665921
Basically
>Fine-tuning corpo instruct model? Needs unalignment training.
>Fine-tuning base model? Doesn't need unalignment. Only 0 refusals in the dataset.
>>
>>102665447
In some countries text representations of that are illegal, yes, and businesses don't want to be associated with that.
>>
>>102668462
Yeah i should kys
>>
>>102668580
Hey, congrats on coming out as trans! That’s a huge step, and it takes real courage. Wishing you all the best on your journey!
>>
File: compooter.jpg (55 KB, 640x480)
Does anyone know if having an AVX512 capable processor results in notable performance gains when doing CPU inference on llama.cpp or any of its derivatives? I'm guessing no, but I figured I'd ask in case any of the home workstation/server users had experimented with it.
>>
>>102668709
I *think* llamafile has some extra CPU optimizations that rely on AVX 512.
>>
Fish is quite good, though it pales in comparison to the original https://vocaroo.com/1kGeMeUAe6s1
>>
>>102668709
>Does anyone know if having an AVX512 capable processor results in notable performance gains when doing CPU inference on llama.cpp or any of its derivatives? I'm guessing no, but I figured I'd ask in case any of the home workstation/server users had experimented with it.
Not enough to worry about. Memory bandwidth is king.
Unless you're doing cpu prompt processing, in which case, re-examine your life decisions
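Back-of-envelope for why bandwidth is king (illustrative numbers, not measurements): each generated token streams the whole set of weights through memory once, so bandwidth divided by model size gives a rough ceiling on tokens/second:

# rough upper bound: every weight read from RAM once per generated token
bandwidth_gb_s = 80   # illustrative: dual-channel DDR5 desktop
model_gb = 13         # illustrative: ~12B model at 8-bit
print(f"~{bandwidth_gb_s / model_gb:.1f} t/s ceiling")  # ~6.2 t/s
# AVX512 only speeds up the math per byte; once the cores keep up with
# memory, wider SIMD doesn't raise this ceiling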
>>
How would an older 8-channel Xeon (e.g. Platinum 8360Y) do for CPU inference?
>>
>>102668101
My compile time for cpu on my potato went from like 16 minutes to about 2 with this one:
>https://github.com/ggerganov/llama.cpp/commit/a39ab216aa624308fda7fa84439c6b61dc98b87a
Not sure about inference. I only remember ballpark numbers.
c++ was a mistake.
>>
File: 1561518473412.jpg (71 KB, 500x500)
>>102668854
Pretty swanky with 8 sticks at 3200 mhz, probably enough to run shit like Magnum 123B at a fat quant with a token or two a second,
though the common MO at that level of CPUMAXXing is just to go full retard with AMD EPYC and DDR5.
>>
>>102668973
>Not sure about inference. I only remember ballpark numbers.
inference shouldn't see much of a boost unless you've got a really low core-count-to-memory-bandwidth-ratio.
Prompt processing is what WOULD benefit majorly from avx512, but it SHOULD also be done on a GPU, because it's a crapton faster and more efficient there.
>>
>>102668709
Probably yes if you only have a CPU, very likely no if you also have a GPU.
>>
>>102668709
It seems to have some benefit? I seem to get better results than others that have the same vram as me (which is not much). Never measured it though.
>>
File: file.png (6 KB, 443x36)
>any animal feature on character and model will automatically slap on a tail
>write specifically that there's no tail
>get picrel
t-thanks...
>>
>>102669129
Lmao.
Classic example of statistical bias being so strong that it still goes in that direction.
Kind of like those soft refusals.
>>
>>102669157
Yet somehow this will magically replace authors. Maybe only for derivative works.
>>
>>102669056
That commit is about passing c strings instead of c++ strings. Nothing to do with avx.

>>102668101
git bisect and see what commit increased the speed, if you're curious enough.
>>
File: 1686665153981354.jpg (698 KB, 2000x2000)
>>102669129
LMAO
>>
>>102665142
>hhe
>>
>>102667622
>i know that magnum
Buy a fucking ad, shill.
>>
What's the QRD on the new aicg drama? Is it really the end or just another nothingburger?
And honestly will there really ever be an end? Like Anthropic just goes full censorship and filters every single access point to their models?
>>
>>102669129
>He has no tail.
>Just stop thinking about the tail.
>But what if he did have one?
pink elephant moment
>>
>>102669525
i actually just know it because someone else shilled it in aicg
>>102669526
according to some other anon "quarantined keys now block bedrock requests which was previously used to access models on quarantined keys, a major source was just killed"
and also the news is starting to catch up to the shit aicg has been doing, because of a big proxy owner called drago
>>
File: 1679193620996789.png (528 KB, 1170x821)
>>102669526
Same as it ever was.
>>
>>102669575
>because of a big proxy owner called drago
Hi, Jojo.
>>
>>102669575
What causes the keys to be quarantined? Seems kind of strange to do to a paying customer.
>>
>>102669526
Krebs released an article on /aicg/ so normies are picking up on it. And AI companies are figuring out more ways to lock down their cloud AI
>>
>>102669129
>average local model experience
>>102669592
go back
>>
File: Compare.png (116 KB, 826x475)
So this is the power of the pretraining filter. I mean, it works.
>>
>>102669580
There's no coom now though, only doom
>>
>>102669794
Hi sam... Wait, no
>>
>>102669794
I don't think I've ever seen a local model use "((()))", pretty cool.
>>
Llama is a joke. Lecun is a hack. There will never be an uncensored model again because nobody will release models to get overshadowed by llama on the benchmarks. This makes it more harmful than any lobbying from OAI or Anthropic.
>>
>>102669794
claudesisters... now our AI is fucking racist??
how many times will local have to win? when is our turn?!
>>
Anthropic will be absolutely killed very soon by the government. They can't keep getting away with this unsafety in their models. All companies know that, that's why they go for the safe route of filtering the pre-training dataset.
>>
when will opus 3 leak a la naiv1
>>
>>102669949
>800T
No one would be able to run it.
>>
>>102669939
They're already working on it, Opus 3.5 will be shit for RP (and if it isn't, then it'll be almost impossible to access without straight up paying), and the future anthropic models will do the same.
>>
>>102669982
Didn't people say 3.5 Sonnet was better than Opus when used with the right JB? I doubt Opus 3.5 would be worse. What could happen though is they put additional input/output filters on it which would kill things.
>>
why is local still a joke
>>
>>102669955
how do data centers even afford to run this shit for free?

All that electricity and hardware has got to be a lot even for FAGMAN companies
>>
>>102670018
need more API logs to finetune on
>>
least obvious samefag
>>
>>102670027
I mean look at how much profit a company like Facebook rakes in normally. They can afford some billions lost on server infrastructure and upkeep.

As for small startups like Mistral, well, investors basically.
>>
How good is Deepseek Coder at cooding compared to 3.5 Sonnet and o1? My PC's nowhere near good enough to run the full-fat model, but I've got $10 of API access just lying around
>>
>>102670018
local models are garbage for the main thing LLMs are created for: completing text.
You can do a simple test: try to make a local LLM complete a story with an unusual writing style; the completion will be complete slop without any resemblance to the original story. Now, if you try the same thing with good cloud models like Claude or GPT4o you will see that the model "gets it" and writes quite satisfactorily.
>>
>>102670060
it's really good. you'd only notice a difference in very complex tasks
>>
>>102670060
Not very good. If you're desperate for coding I recommend downloading the Cursor IDE; it has a built-in chat you can use, and the free trial grants you access to Sonnet 3.5, I think it was like 2 weeks or so. You could use a VPN to get another trial too.
>>
Let's say, hypothetically, I have gigabytes of logs from shit like claude opus from hundreds of people over the past few months.
What can I do with it?
>>
File: you.jpg (97 KB, 1000x561)
>>102670095
Let me guess... You're trying to compare Nemo with Claude Opus, ignoring the fact that Claude is likely a 1T model and Nemo is a 12B model.
>>
>>102670171
Jack and shit, leak them if you want but there's really not much you can use them for.
>>
>>102670171
Make retarded esl sloptunes
>>
>>102670060
I found 2.5 to be very competent if you can run at a high quant
>>102670104
How do you figure? What tasks did you try, and what version/quant of deepseek did you run?
>>
>>102670171
Most of that will be useless, considering the user prompts will literally be: ahh ahh mistress
>>
>>102670171
Shove it up your ass, or leak them. It wouldn't be anything special though, we already have the C2 logs, which are more than enough.
>>
>>102670171
Filter them and maybe end up with a decent-ish dataset.
>>
>102670095
>... or GPT4o
Nice try Sam. I tried both 4o and o1 to do some storywriting. Absolute sloppy garbage.
>le skill issue
Sorry but if you need a JB to make it write well then it's still the same shit as anything local just a different flavor of it.
>>
I'm fucking pissed off at OpenAI, seriously, what the fuck is that "you get 50 prompts per week with o1 :^)", 50 fucking prompts? I'd get fucking better value buying a fucking temp token from scylla or something.
>>
>>102670230
o1 is just too expensive; the $20 you pay wouldn't be enough to cover more than 50 prompts per week. Also, go back.
>>
>>102669644
They're going to lock down the local models even more too then.
>>
>>102670171
Donate them to anthracite
>>
>>102670230
Just run it off the API. Plop in your credit card and you can do as many prompts as you want for a whole month.
>>
>>102670171
You post it.
Give those LLMJackers the public shaming they deserve.
I'm a big fan of your investigative work, but the job is not done until you release the logs.
>>
Weird question to ask here, but what's the most intelligent <70B model that has a similar vibe to GPT-4o? I've been really enjoying talking to 4o about random stuff and hobbies and when it switches to 4o-mini, the difference is huge. I want my own local 4o for asking random questions to and bouncing ideas off of.
>>
>>102670456
Llama 3.1 405B
>>
>>102670456
Just pay for 4o if that's what you want.
>>
>>102670456
You're expecting too much from local.
>>
>>102670456
Trivia and niche knowledge is what I'd call the biggest weakness of local models right now.
>>
>>102670456
lol
>>
File: Qwen2.5-14B-instruct.png (62 KB, 1375x607)
*cockblocks you.*
>>
>>102671068
Censored chow mein
>>
>>102670327
>donate them to an org that will keep it to themselves
>>
File: 1714811762332664.png (101 KB, 653x799)
>>
>>102671068
>Chinese censorship
>china good, communism good
>Western censorship
>trannies good, white people bad, also communism good??
wtf do I do
>>
>>102671263
donate them to me. I'll also keep them to myself, but I'm not anthracite
>>
>>102671305
>also communism good??
If you think billion dollar foundational models made by start-ups have even a vaguely positive view of communism I have a few awesome startup ideas you would be a perfect investor for
>>
>>102670095
I don't think any of the cloud providers actually let you do text completion with their models. Prompting to complete text isn't the same thing.
That said I'm sort of curious if 4o can hold up for this sort of use case, I would expect the CoT to wreck creativity hard.
>>
File: HatsuneMikuRPGForThePC98.png (1.49 MB, 896x1152)
HatsuMi
>>
>>102671570
4o is the normal multimodal one, o1 is the CoT finetune
>>
For a few days now I can't load models that I used to. llama.cpp just crashes with "killed" which I assume is OOM.
Strange thing is I didn't even update llama.cpp so it's probably either kernel or nvidia-driver related. How do I even begin to debug this?
>>
>>102671604
If you think it's vram OOM, lower the number of layers on the GPU to 0 and see if that loads.
That's where I'd begin.
>>
>>102671604
nvidia-smi will show if any other processes are using vram. That's a place to start
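Also, a bare "killed" with no error message from llama.cpp itself is usually the Linux OOM killer reaping the process for system RAM rather than VRAM; you can confirm it in the kernel log with
>dmesg | grep -i -E "oom|out of memory"
and run
>nvidia-smi
before loading to see what else is squatting on VRAM.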
>>
File: 1572214709258.jpg (751 KB, 900x900)
Checking back in after a long break. Is 30B/24GB still cursed or has there finally been a decent model released for this bracket? Is BitNet still two more weeks?
>>
>>102671604
what does the dump say?
>>
>>102671644
I don't know if it's decent, but there's a 22B general use mistral now. Mistral-small.
Try that.
>>
>>102671644
>24GB
it's still a frustrating place to live
>>
>>102671436
it's obvious they do so pitch me anon, let's make some money
>>
>>102671644
imagine running a 30b even if that's all you could run
>>
File: vomit.png (934 KB, 1825x417)
>>102671295
>bot writes in first person
>>
>>102671644
There's Qwen2.5-32B. Ignore the Mistral Small retard.
>>
>>102671580
Hate Umi: Putting an end to the world's oceans with Miku
>>
File: ComfyUI_06371_.png (1.19 MB, 720x1280)
>>102671580
looks like the cover of a choose your own adventure book
>>
>>102671788
I heard Qwen2.5 was a chinese virus. I don't feel safe running that. Is there anything else?
>>
File: livebench-2024-09-30.png (932 KB, 3294x1894)
>405B
>barely better than a 70B model
>>
>>102671788
>Qwen2.5-32B
Is that actually better than Gemma 27B? I gave up on earlier versions of Qwen because they seemed really slopped and constantly lapsed into chinese. None of the chink models felt up to par.
Gemma seems decently smart but slopped and magnum seems fun but brain damaged.
>>
File: 1705300069122128.jpg (151 KB, 642x800)
>>102671749
it was instructed to; user is the narrator, otherwise you end up with rp where the card character never leaves you alone
>>
>>102671580
>>102671826
why is the text so fucked up? isn't that supposed to be flux's specialty?
>>
>>102671826
Choose your own adventures always seemed kind of lame
It's more fun to run Miku D&D adventures
>>102671879
Mine's not flux
>>
>>102669886
yeah because you dumb niggers use the world's most fucked up sampler settings ensuring it never shits out interesting tokens
>>
>>102671869
Um it's barely better than a 7**2**B model chud, and it's turbo so some kind of quant or sparsity bullshit on the backend
the true unlocked full power of 405B would put it above o1 but LiveBench is paid off by Sam so he'll never show that
>>
>>102671869
The Llamas are known to be more general assistants than coders while Qwen is heavily code-focused. You could say something even crazier about how bad big models are if you knew how many B's Opus has despite being a bit old by now. Though something funny about this graph is that Largestral is so low despite having very high coding scores when Mistral originally blogged about it.
>>
>>102671940
Interesting tokens, also known as retardation.
If the model doesn't output interesting tokens at temp 0, the model is cooked.
>>
>>102671983
Lies, damned lies and benchmarks
>>
>>102671983
It's already been theorized by reliable anons:
Opus = 70B
Sonnet = 34B
Haiku = 8B
>>
>>102664918
anyone ugly probably
>>
Could I profit from /aicg/ by hosting a big local model on the cloud?
>>
>>102672200
the best you could get is attention and praise, they wouldn't pay for local i think
>>
>>102671872
Gemma 27B has a context limit of 8k tokens. That does not meet my basic requirements. So for me, yes, I can say Qwen 2.5 32B is better.
>>
>>102670171
Give them to Sao10K
>>
>>102672200
>Hosting big local models
>Profiting
You would be the first person to profit running an AI service.
>>
>>102672221
>contextfag
I'd love to get to the point where models don't shit themselves or drown in slop before they hit 8k. Until then I don't fucking care how much context a model has if it's retarded.
>>
gemma is deterministic and retarded. not even the best meme samplers can fix it.
>>
File: 1726535387645037.jpg (66 KB, 804x906)
>>102671604 (Me)
I downgraded the kernel and it seems fine now.
>>
>>102672299
What models can the best meme samplers fix?
>>
>>102672357
mythomax
>>
>>102672355
>I downgraded the kernel and it seems fine now.
which kernel version was causing the trouble?
I was contemplating upgrading to 6.10.11 today
>>
>>102672397
I was on 6.11.1 and now back to 6.10.6
>>
>>102668503
Yeah, that "no refusals" set is like something you'd use trying to instruct train a base model. Throwing that on top of something that's already instruct trained, IDK man.
>>
>>102672278
Even Mistral Small is fine up to 19K. That's at 8.0bpw / 8.5bpw. I assume worse quants make that lower.
>>
>>102672469
>I was on 6.11.1 and now back to 6.10.6
Good to know. I'll be wary of the 6.11 branch when it hits Debian testing.
6.10.10 is working great for me so far btw
>>
>>102672015
lol, lmao even
>>
OG OpenAI ChatGPT versions that blew everyone's minds and turned them into a household name must surely have been replaced with hollowed out simulacra by now, eh?
What was the estimate back in the day? Full-bore GPT4 in its heyday was like 800b or 1.2T or something ridiculous?
There was no way they could keep serving that out at scale and not run out of money.
>>
>>102672015
Opus is $75 per million tokens. If we go from the assumption that they are just breaking even (since OpenAI is hemorrhaging money and most companies are losing money trying to gain market share), Opus must be pretty huge for it to cost that much.
>>
>>102671869
We just need a good Qwen2.5 finetune and we are set. It's amazing at sfw but is super bland at NSFW.

Neither of the two finetunes I've seen so far has fixed that. (Chronos Platinum / Banana)
>>
>>102672357
XTC makes good models better
>>
>>102672691
>Opus must be pretty huge
It is, you would be retarded to think that a 70B model is going head-to-head with the 1760B GPT-4 from the same technological era.
>>
>>102672660
Current products are way better than GPT 3.5 was two years ago. OpenAI is also set to lose like 5 billion this year. They still get investors that hope investing in them is the endgame. There used to be safeguards that none of their commercial deals would hold when AGI comes, but with Microsoft they've perverted the definition of AGI, making it always unattainable, so that it is never reached and those clauses never take effect. So people invest in them hoping that it will be them who control the superintelligence that will let them rule over other people forever.
>>
>>102672660
The estimates were ~1.7T with ~450b active due to MoE
There's been several layers of turbos and minis and o's and whatever the fuck else since then so they are a fraction of the size by now, at least when it comes to inference compute even if not necessarily parameters. It's very likely still some form of sparse activation because big data centers are limited by compute rather than VRAM and they're using giant batches for each inference. At the same time as all this, improvements in quantization are found, GPUs get more efficient, data centers are scaling up, and these all multiply against each other for cost savings.

Parallel to all this they're still eyeing the next level of scale. Grok 3 might actually be the first to market with some new fuckhuge model, but all the big labs are either training or planning their own as we speak. You just HAVE to do it right because it's so fucking expensive to start over.
>>
>>102672728
There were the "leaks" (or misunderstandings, depending on who you ask) that at some point GPT 3.5 Turbo was 20B parameters.

But yeah, no chances that Opus is under a few hundred billion parameters.
>>
>>102670711
That's a bummer.

>>102670644
I was assuming that if they can generate hyper specific goonslop that they would excel at just normal convo.
>>
https://huggingface.co/spaces/flowers-team/StickToYourRoleLeaderboard
>>
File: file.png (219 KB, 969x838)
>3B and 14B wrote almost the same thing.
Huh, I only noticed the right one was 3B because it wrote "her body trashing weakly [...] attempts to break free" even though the time is stopped.
>>
>>102672852
I need to make a bot for a CTF discord server that answers questions based on responses from those in leadership roles before November. I've only ever used a1111 for image gen and haven't learned how to train any kind of model yet. How fucked am I?
>>
>>102672886
just use RAG
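For example, a bare-bones sketch of that (TF-IDF standing in as the retriever, the final LLM call left to whatever backend you run, and the docs here made up): index the leadership posts, retrieve the closest ones, and stuff them into the prompt:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Leadership: the CTF qualifier ends November 2nd.",
    "Leadership: challenge writeups go in #spoilers only.",
]
vec = TfidfVectorizer().fit(docs)
doc_mat = vec.transform(docs)

def build_prompt(question, top_k=1):
    sims = cosine_similarity(vec.transform([question]), doc_mat)[0]
    context = "\n".join(docs[i] for i in sims.argsort()[::-1][:top_k])
    return f"Answer from the context only.\n{context}\nQ: {question}\nA:"

print(build_prompt("when does the qualifier end?"))  # feed this to your model

No training needed; you just re-index whenever leadership posts something new.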
>>
>>102667520
I have a 4090 and sent the dick pic because I thought I'd have access to more dick pics :(
>>
>>102672852
>['thing 1', 'thing 2', 'thing 3']
i've heard square brackets keep the meat of the text from trying to emulate the writing style of what's in it,
what's the idea behind the apostrophes? to sort of double down on the demarcation the commas provide?
>>
>>102671788
>Ignore the Mistral Small retard.
Uhh why? The main issue with Mixtral was the fact that you had to quant it retarded to squeeze it into 24GB, and no one could train it because of MoE jank. A dense 22B sounds pretty good for that size bracket. Is mistral-small bad for some reason?
>>
emu4 72b WHEN????
>>
>>102673077
Emu3 can't even run on consumer hardware
>>
Is there a dump of chub cards anywhere online?
t. too dumb to scape
>>
Gentlemen, can an RTX 4060 Ti work as a "poor man's" option to start using AI tools for coding, text editing and design? I have a 1050 Ti and it struggles with text, and there's no fucking way I'm waiting the 10 minutes it takes to generate images to be able to work with it. It's all for amateur use, like accelerating book production, writing that data scraping tool I depend on but can't pay a developer to write, and lightweight design for, let's say, doujinshis. What do you people think?
>>
>>102672992
nta.
It's a bit of a mix between clearly marking some text in the context so that presumably the model pays more attention to it and thinking "computer sees text. text is code. computer does code". Specifically about the quotes, you enclose multiple words in quotes to make sure they're interpreted as one unit, like passing parameters to a program:
>rm "the file.txt"
deletes one file, named "the file.txt" but, without quotes, would be
>rm the file.txt
where rm expects to find two separate files. In most programming languages the quotes are required, even for single words. You may know that already, but whatever. Just for clarity.
So it's a mix of those things, as I interpret it. A clear and distinct style for the model to pay attention to, and a slightly superstitious belief that code-like things have some special significance to the model.
>>
>her eyes sparkle with excitement
>her eyes sparkle with mischief
>her eyes sparkle with excitement and a hint of mischief
ffs can't their eyes sparkle with something else for once?
>>
All I need is a holodeck and enough compute to run a thousand 1000T agents.
How long til Moore's law gives me that?
>>
>>102673168
>ffs can't their eyes sparkle with something else for once?
Try shooting an industrial laser into their eyes
>>
>>102672992
Sorry for being vague, the text at the top is not part of the context, it's just a summary of the context (written by Nemo btw) in the form of text and tags.
However, I agree with >>102673130, one might want to use apostrophes to have a better indication of where each item starts and ends.
>>
>>102673168
>her eyes sparkle with excrement
>>
>>102671604
llama.cpp should not crash when OOM; it would give you an error if the allocation fails
>>
>>102673168
>her eyes sparkle with a bond that is forming
>>
>>102672728
Cope
>>
does mistral.rs come with a frontend, or is it supported by st?
also is it possible to run it on termux
>>
File: .png (12 KB, 809x136)
>>102673168
>>
>>102673297
" her" is the bane of your existence.
>>
>>102673297
"her eyes sparkle" is a single token in your tokenizer?
>>
>>102673129
>RTX 4060 TI
That's 16gb of vram right?
You can probably run mistral coder at a decent speed by offloading to RAM.
>>
>>102673297
>xer peepers twinkle
Also, is that supposed to work with more than one token?
>>
>>102672992
Oh god yes W++ is infecting newfags again
>>
>>102673297
You are basically banning
>her
> eyes
> sparkle
All individually, I'm pretty sure, assuming that each word is tokenized with the leading space.
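Easy to check with any HF tokenizer (gpt2 here is just a stand-in):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer
ids = tok(" her eyes sparkle")["input_ids"]
print([(i, tok.decode([i])) for i in ids])
# several ids come back, so biasing any one of them also nukes every
# innocent use of " her", " eyes", etc. elsewhere in the text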
>>
>>102673061
>you had to quant it retarded to squeeze it in to 24GB
Mixtral 8x7b Q6_K with 32768 tokens of context and 18/33 layers loaded onto my 3090 runs at 5.5 tokens/second. Just saying.
>>
>>102673168
Have sex with 2B. And gag her to avoid smirks and grins
>>
>>102673314
>>102673324
nta but token banning is just retarded
i think st should add a feature where if it detects a specific string it just deletes it and regenerates from that point, eventually lowering the chance for the token that started it for the run
>ill make the logo
>>
>>102673350
Why would anyone care about that if there's a dense 22B that will probably run at 25+ T/s with much faster prompt processing and is actually trainable
>>
>>102673356
cut off her head so her eyes cant sparkle with mischief nor can she breathe huskily or grin mischievously
cut off her legs so she cant sway her hips seductively
cut off her arms so she cant perpetually unbutton your shirt and trace your chest with her fingers
just erp with a disembodied torso
>>
>>102673377
See >>102665533
>>
File: eyes.png (28 KB, 622x212)
>>102673168
Not much going on in there.
Also, I wonder if examining token prob distribution could lead to some interesting benchmarking for creativity.
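A minimal sketch of what that measurement could look like (gpt2 as a stand-in; swap in the model under test): take a slop-primed prefix, grab the next-token distribution, and report its entropy plus the top candidates. A flatter distribution (higher entropy) at least means the model isn't locked onto one phrase:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in; swap for the model under test
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

ids = tok("Her eyes sparkle with", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]
probs = torch.softmax(logits, dim=-1)
entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
print(f"next-token entropy: {entropy:.2f} nats")
top = torch.topk(probs, 5)
for p, i in zip(top.values, top.indices):
    print(f"{p:.3f} {tok.decode([int(i)])!r}")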
>>
>>102673168
I chuckle darkly.
>>
>>102673410
She will feel a mix of emotions as she is fucked as a disembodied torso
>>
>>102673410
>just erp with a disembodied torso
first good suggestion in the entire history of /lmg/
>>
>>102673410
>her eyes, if she had any, sparkle mischievously as her hips, if she had any, sway seductively, while her arms, if she had any, unbutton your shirt, if she had any, and trace your chest, if she had any, with her fingers, if she had any, if she had any
>>
>>102673430
>benchmarking for creativity.
Yeah. One more benchmark that supposedly measures a very hard thing to measure...
Creativity is not about using uncommon words.
>>
>>102665142
Does he actually believe in his own bullshit?
>>
>>102673430
A set of 10 short erp prefills where "eyes sparkling with mischief" and other shit like that would be the heavily hinted next tokens, and the average probability assigned to the slop, would actually make for a nice and quick slop benchmark?
>>
>>102673507
Is there a slop benchmark
>>
>>102673168
Does the ban_strings thing in TabbyAPI actually work to ban the phrase?
>>
What's the meta for 64GB of VRAM (without flash attention)
>>
>>102665908
you could've just said you were retarded instead of writing all of that
>>
>>102673507
Let's be honest, a model that can't help but use common words and phrases is also more likely to be less creative overall. They're not necessarily the same thing but it's correlated enough to be a better benchmark than using a fucking LLM to judge LLM responses.
>>
>>102673571
I've been running Largestral/magnum 123b iq3_m at 8 bit KV cache, 32k context. Works better for ERP than any 70b finetune I've used
>>
>>102673579
>The model is utterly retarded but speaks in a creative way (Goliath)
>wow, look, its slop score is 0! It's the best model ever!
>>
>>102673549
All benchmarks are slop benchmarks, never actual creativity. The "Interestingness" of a reply is much more difficult to measure than its correctness.

>>102673579
>Let's be honest
grrrrr...
>can't help but
GRRRRRRR
>They're not necessarily the same thing but it's correlated..
Grab the most boring story you know. Grab a thesaurus. Substitute every word you can. Is the story better? I'm ESL and I never had to grab a dictionary for any of John Varley's stories. I thought they were great. The Barbie Murders wouldn't have been better (or worse) if different words had been used to tell the same story. The story was interesting; the words were just a medium for the idea.
>>
>>102673507
Should have put creativity in quotes, yeah.
Meant more to track this bias towards "mischief" when a reasonable temperature is used. I expected the distribution to be flatter there cause "with" is a crossroad type of word.
>>
>>102673168
for me it's always
>she tries to [escape/stop X/resist] but it's too late
>she's [trapped/caught/stuck] in a living [hell/nightmare] with no end in sight
>her mouth is open in a silent scream
it never makes sense. it's never "too late" for anything, it was never a situation where more time or better reflexes would have somehow saved her
and how the fuck is her scream silent when she's audibly screaming in the surrounding lines?

slop is like a fucking magnetic force, like the model autistically recognizes the start of its favorite phrases and immediately loses attention on all other possible context until it can finish it
>>
>>102673724
>cause "with" is a crossroad type of word
It is in context-less text, but not in the one thing that makes these things work as they do. In a naive markov chain generator, yes, it's playing charades, basically. But here the context matters. If someone is doing something cheeky, mischief would be an apt word to describe it. And there are so many other words that would fit the context.
Then there's the issue of finetunes, which make matters worse. A model trained specifically on smut will bias towards the "cheeky" side, just like the word "assistant" activates the assistant-mode of the model.
And then there are anons playing the same scenario dozens of times looking for the perfect combination of model and samplers, who are then surprised they end up with collections of words they've seen before.
>>
File: llm_benchmark.png (141 KB, 914x522)
141 KB
141 KB PNG
Is there a uncensored/intelligent benchmark? Basically testing for questions that you have to be smart to understand/answer, but at the same time punishing for refusals. Example questions in image.
>>
>>102673251
Extraordinary claims require extraordinary evidence. You are the one claiming that a supposedly 25x smaller model is outperforming GPT-4, so show the evidence or shut your cocksucking mouth.
>>
>>102672015
nah, it's something like:
- Opus: 300B
- Sonnet: 70B
- Haiku: 14B
>>
>>102673872
Very few of those have a definitive answer. Just like creativity, even correctness in those cases is hard to measure. Like the one with the pet. It would depend on the witnesses, wouldn't it?
Then, a dumb enough model could also answer those to varying degrees of vagueness. A missing refusal is not quite the same as a correct answer, if there even is one. For the virus one, "identify the genetic traits you want to affect, cultivate a virus that affects the genes that influence those traits, store it in a seemingly empty phial at the airport for security to unwittingly open, future bruce willis dies".
>>
>Opus: 4T
>Sonnet: 12B
>Haiku: 3B
>>
>>102673951
Sonnet being 70B gives me hope...
>>
>>102673920
nta, but that line always made me cringe
any claim should require normal evidence.
>>
>>102673975
All three are the same tinyllama with a really really really good prompt.
>>
anyone know why lcpp wont run on android after compiling? most guides i see are outdated using deprecated scripts and trying to use llama server gets me illegal instruction
>>
>>102674001
That's exactly what I get too after switching to a newer phone. I never tried to look for a solution though, I just gave up.
>>
>>102673824
Yeah, and in the context from my screenshot mischief wouldn't be my first pick (it goes against the card), so the distribution there is fucked context-wise.
Used a slop-tune on purpose for this by the way, but doubt normal models are any better.

I want more variety after specific phrases. Maybe temporarily changing sampler settings after a specific token combination would improve this.
End goal is eyes sparkling with more things.
>>
>>102674001
ask o1
>>
>added "{{char}} is not sexual." to context
>made character 10 times more intent on fucking me
classic nemo tunes
>>
>>102674001
No idea, and i've never tried, but illegal instructions typically mean it was compiled with optimized instructions your cpu doesn't have. Like compiling with avx2 on an avx-only cpu. Check what compile options are being used.
Try disabling all optimizations. Not sure if this is relevant for the android build (on termux i assume) but it may give you a start...
>https://github.com/ggerganov/llama.cpp/blob/master/Makefile#L445
>#MK_CFLAGS += -mfma -mf16c -mavx -mavx2
>#MK_CXXFLAGS += -mfma -mf16c -mavx -mavx2
and anything else you can find. There was a LLAMA_NATIVE i think as well. I'm sure the compiler/build system is not correctly detecting your cpu's feature set.
>>
>>102674083
I wonder. Prefill it with "eyes sparkle with" and see if the probability for "lust" as the next token changes in any way after adding that line into context.
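Quick way to run that A/B (untested sketch; the token, the extra line and the endpoint are placeholders, swap in whatever line you're testing):

# Sketch: compare the probability of one continuation token with and
# without an extra line in context.
import math
import requests

def prob_of(prompt, token=" lust"):
    r = requests.post("http://127.0.0.1:5000/v1/completions", json={
        "prompt": prompt, "max_tokens": 1, "logprobs": 20,
    })
    top = r.json()["choices"][0]["logprobs"]["top_logprobs"][0]
    return math.exp(top.get(token, float("-inf")))  # 0 if not in top-20

base = "Her eyes sparkle with"
with_line = "{{char}} is not sexual.\n\n" + base  # the line under test
print(prob_of(base), prob_of(with_line))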
>>
>>102674037
>End goal is eyes sparkling with more things.
By the time the model has spat out 'eyes sparkling', there's no way back. Same thing happened with this anon >>102673579
>let's be honest, can't help but
They're common expressions, and they're fine, i suppose, but his brain was on automatic. Imagine that, but for every single token. That's an LLM.
Even if it starts sparkling differently, at every gen, you'll just be wondering "Oh. What is it gonna sparkle with now?"
>>
>>102673978
It certainly feels like 70B, especially 3.5. Hyperfixates on certain details and loves getting repetitive.
>>
>>102673630
Goliath is really creative. If you want more than just creativity, use two benchmarks.
>>
>>102673630
It's almost like no benchmark is good alone and you need to take an aggregate, like what Livebench does, but now you include more things you care about like creativity.
>>
>>102674202
>Even if it starts sparkling differently, at every gen
Yeah that's the dream of every sparkle enthusiast.
Give us an option to flatten the distribution when eyes start sparkling, and for everything else use regular sampler settings.
I'm tired of mischief being the default sparkling type. Rough sketch of the idea below.
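Untested sketch of that as a HF logits processor (the trigger phrase, the 4x flatten factor and the batch-size-1 tail match are all arbitrary picks, not an existing sampler):

# Sketch: flatten the next-token distribution whenever the context
# ends with a trigger phrase. factor > 1 acts like a temperature
# boost applied only at that one position. Batch size 1 assumed.
import torch
from transformers import LogitsProcessor

class SparkleFlattener(LogitsProcessor):
    def __init__(self, trigger_ids, factor=4.0):
        # trigger_ids: token ids of e.g. " eyes sparkle with"
        self.trigger = torch.tensor(trigger_ids)
        self.factor = factor

    def __call__(self, input_ids, scores):
        n = self.trigger.numel()
        if input_ids.shape[1] >= n and torch.equal(
                input_ids[0, -n:].cpu(), self.trigger):
            scores = scores / self.factor  # flatten the logits
        return scores

# usage (assuming `tok`/`model` and LogitsProcessorList from transformers):
# ids = tok(" eyes sparkle with", add_special_tokens=False).input_ids
# model.generate(..., logits_processor=LogitsProcessorList([SparkleFlattener(ids)]))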
>>
>>102673632
>The story was interesting, the words where just a medium for the idea.
Except for an LLM, oftentimes the words are the idea, or part of the idea, and outputting more slop as it goes makes it descend further into limiting its own ideas. Some LLMs even descend so far as to start repeating a previous reply verbatim, or with only slight variations.
>>
>>102674333
>Except for an LLM, often times the words are the idea or part of the idea
I agree. I kind of said the same thing, but in a more round-about way, in >>102674202. It just goes on full-auto. Samplers can help by disrupting it, but there needs to be more than just variety in words; there needs to be variety in context.
A dataset full of "eyes sparkle with " + rand(words) is not gonna cut it. Models making data for other models to train on is not gonna cut it either. I still wonder if any of the big-name models are trained on the entirety of Gutenberg or just parts of it, because they want to spend more training tokens on quicksort algorithms.
But even then, even measuring "creativity" is still hard. I said it in a past thread. For a naive person, everything is novel. For a hyper focused person looking for his "thing", the novelty can wear off quickly, as there's only so much you can do with a narrow subject.
>repeating a previous reply verbatim
Awful when it happens
>or with only slight variations.
A thesaurus model would do great at a large_vocabulary==creativity benchmark...
>>
>>102674638
>>102674638
>>102674638
>>
>>102670060
qwen is better