/g/ - Technology






File: tet_heart_2.png (3.26 MB, 1376x2072)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102192656 & >>102179805

►News
>(08/30) Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed
>(08/29) Qwen2-VL 2B & 7B image+video models released: https://qwenlm.github.io/blog/qwen2-vl/
>(08/27) CogVideoX-5B, diffusion transformer text-to-video model: https://hf.co/THUDM/CogVideoX-5b
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102192656

--Paper: SelectTTS: A novel multi-speaker TTS method with code release: >>102193789 >>102194203
--Paper: Fully Pipelined Distributed Transformer for training ultra-long context language models: >>102193949 >>102193977
--Visual novel scripts and datasets exist, but require augmentation and have limitations: >>102193465 >>102194139 >>102202930 >>102198856 >>102198965 >>102204223
--Tesla M40 considered old, recommendations for better GPUs: >>102193309 >>102193700 >>102194931
--Local model performance and speed discussion: >>102194156 >>102194568 >>102195019 >>102195142 >>102195187 >>102195262
--Anon asks about system prompts without {{char}} to minimize context reprocessing: >>102196554 >>102196590 >>102196645 >>102196811 >>102196835 >>102196910 >>102197323 >>102197346 >>102197343 >>102197568
--Speculative decoding and draft model's context cache RAM usage: >>102193431 >>102193445 >>102193471
--Running large models with low VRAM and more regular RAM, but slow speeds: >>102192934 >>102193224 >>102193267
--Q4 cache is better than FP8 cache for model performance: >>102198537 >>102198813 >>102199058 >>102199107
--Prompt processing slower on Linux than Windows in koboldcpp-rocm: >>102198098 >>102198169 >>102198206
--LLMs lack context and documentation to answer setup questions: >>102200564 >>102200582 >>102200595 >>102201224 >>102201404
--Inference speed significantly affects LLM user experience and enjoyment: >>102194854 >>102195013 >>102196309 >>102196322 >>102196351 >>102196465
--Gemma VNTL recommended for manual Japanese porn translation: >>102205854 >>102206944
--Deepseek coder v2, mistral large, and llama 3.1 405b suggested as self-hosted programming LLMs for C/C++: >>102201300 >>102201333 >>102201366 >>102201378
--Disappointment with Command-R's performance after RAM upgrade: >>102205966
--Miku (free space): >>102196188 >>102196631 >>102196779

►Recent Highlight Posts from the Previous Thread: >>102192660
>>
Oh my god it's Teto
>>
Man, RPing at 1t/s is abysmal, how do you niggers do it? Is there some kind of meditation-type exercise i need to perform?
>>
>>102210069
it's called playing videogames while waiting for the response
>>
>write a shitty sloppa card and load basic bitch lunaris for a quickie
>expect /ss/
>keep hitting generate and let it run its thing
>get /ss/
>also get kidnapping, mindbreak, loss of innocence, rape, rape, filth, rape again, the occasional sloppa phrase, and despair
i don't know what i expected but i let it cook too long
no one must know
>>
>>102210091
but now we all know
>>
>>102210101
damn. that is true
>>
>>102210069
It's not that bad if you've ever spent time RPing with real human beans who take ten to fifteen minutes to reply with some absolute fucking dogshit that you can't just swipe and retry, manually edit, or tell them it sucks, and half the time they'd flake out and never reply again anyway.
>>
>>102210005
Hello Teto. Thanks for reminding me it's Tuesday newsday.
>>
>>102210069
I don't do it. I am waiting for a new 8x22B or equivalent model/method to have a fast smart model on a consumer PC.
>>
>>102210114
NTA but yeah. My ex was a pretty slow typer. That was before I discovered AI ERP though. I've had a few sessions with human partners since, and the worst, sloppiest 8B model is still superior to most human partners. People in this space are getting spoiled and over-stimulated.
>>
>>102210069
Which model is worth 1t/s, are you running a potato?
>>
>>102210069
pretend it's text messages from your bae
>>
>>102210222
70b+
>>
>>102210069
I tell myself it's email, not texting.
>>
File: leaked.png (1.98 MB, 1993x1050)
>>102210181
It's a mixed bag. But LLMs do tend to perform better on average.
>>
>>102210090
Gonna play star trucker today when it comes out, with an LLM space-hooker running in the background, kek.
>>102210222
Trying out mistral large, haven't run gguf in ages, so it hurts like a motherfucker.
>>
>>102210069
have you ever RPed with a real person? have you ever sexted someone over a messaging app? let me tell you, even with my local taking 30 seconds or a minute for a long reply, it's way fucking better than a real person. i don't even care that i can't send my model dick pics, the dialogue is perfect and they don't start bitching at me or ghosting or just complaining about their life for an hour then leaving. also, no mental illness unless you specifically prompt for it.
>>
>>102210298
>i don't even care that i can't send my model dick pics
should we tell him?
>>
>>102210316
>send dick pics to LLaVa
>continue ERP with Mythomax
now you're thinking with portals.
>>
>>102210294
>Trying out mistral large
reasonable. I wouldn't consider 1t/s unless it was god tier with no need to swipe. If that's what you find, give me a (You)
>>
Texting and text-based RP is a shit experience. I only do in-person shit, never waste my time with that text shit.
>>
>>102210326
>l2
anon...
>>
>>102210298
>send dick pics
>ghosted
Yeah... crazy how that works bro.
>>
>>102210248
I mean there have been human partners who I would take over AI any day of the week, but those encounters tend to be fleeting. I mean I'd take my ex back even for him taking 10 minutes to reply with one hand. But that's not going to happen. /lmg/ is my new bf (yeshomo)
>>
File: concerned balaclava.jpg (17 KB, 705x696)
>>102210316
tell me WHAT? what is there to tell, magus of constructs?
>>
>>102210335
That was part of the joke. Remember back when that Mythomax guy used to shill the fuck out of his model? And then everyone else started shilling it ironically for the meems. Those were the good old days. And then Mixtral came out and people finally stopped talking about it.
>>
>>102210222
NTA, but Mistral Large finetunes are worth putting up with 1t/s.
>>
>>102210298
>>102210316
What's the reason for sending dick pics?
>>
>>102210343
sillytavern has image upload, koboldcpp supports it, and there are multimodal models that can process and respond to images
>>
>>102210382
The classic mistake of assuming that the fairer sex is as interested in seeing your genitals as you are in seeing theirs.
>>
>>102210381
Couldn't find a finetune better than official
>>
>>102210398
There's a high risk of a girl having a gross-looking vagina. It's important information to have in advance.
>>
>>102210401
Still true.
>>
File: k.jpg (55 KB, 500x500)
>>102210398
Does this speak more of the fairer sex's psychology, or my fat ugly ass?
>>
>>102210381
I got 0.6 to 0.7 t/s. I just couldn't do it. Maybe if it was twice as fast I could be patient enough.
>>
>>102210354
>mythomax
>shill
? mytho was an early good (erp) tune. there were plenty of good ones but it was hardly shilled
>>
>>102210382
>usecase of penis pictures?
>>
>>102210354
not ironically, it was actually good
>inb4 waah shieeeaaaallll
>>
File: tinykek.jpg (5 KB, 82x75)
>>102210490
>>
>>102210449
my impression is they don't get much out of a dick pic even if you're hot, but if you're hot they're more likely to tolerate it/pretend to like it

their sexuality doesn't work exactly the same as ours
>>
What causes Ooba to occasionally need to reprocess the whole prompt context when you hit regenerate, even when nothing's changed?
It doesn't happen that often but it's annoying when it does
>>
File: 00118-1315424453.png (1.29 MB, 1024x1024)
>have fox ears
>girl still somehow nips my earlobes
>>
>>102210550
I used to have this. I stopped noticing it after enabling flash_attn
>>
File: cat-girls.jpg (168 KB, 1200x1003)
>>102210554
Follow the diagram.
>>
>>102210563
Alas, it is already enabled
>>
>>102210569
Ooba up to date? I recall having to enable it in both model and session tab on older versions
>>
>>102210398
Idk my gf keeps begging for dick pics
>>
>>102210592
She's selling yours and many others' to gay men, you fool. She's the local dealer!
>>
in some rough AB testing, Largestral Q2_K_L seems as intelligent as IQ3_M, while generating tokens 50% faster (1.5 t/s vs 1.0)

All the other Q2 quants are dumber than IQ3_M, as you'd expect, so I guess the L actually does something
>>
I am going to pull the trigger on a used 3090 for $1100 Canadian. My 4070 isn't cutting it. I am planning on using both cards instead of waiting for a 5080 that may be 24GB but will probably be 16GB.
>>
>>102210780
Really hope the 5090 won't be the only one > 16GB. What's the expected release date, next year?
>>
File: 1694749045536562.jpg (368 KB, 2304x1792)
>>
>>102210814
The expected date is Q4, before Christmas, for the 5090, and then the lower cards in the new year. There are shouts from some semi-reliable people that delays are pushing to 2025. You can also build a semi-reliable reputation from saying that any company is going to have delays over and over.

nvidia isn't talking much and not confirming anything. For the stock price it would be a very good idea to have the 5090 available for Christmas break, when all the nerds can game.
>>
>>102210850
>5090 available for Christmas break when all the nerds can game.
When was the last AAA even worth playing?
>>
>>102210565
inpainting too much work
>>
>>102210842
True, NAI wouldn't have given him such pretty nails.
>>
Has anyone tried Magnum 123b or 72b (is the 72b even good?) with creative writing?

It's trained on Claude logs, so I'm assuming its prose should be similar enough to it. I need it to rewrite my shit 8th grade fan fic level drafts into something not completely garbage. Before I throw it into Claude 3 proper for one big tard wrangle, so I don't have to deal with usage limits.
>>
>>102211041
try rebooting it
>>
>>102210901
2020
>>
Has anyone ever experienced an emotion as a sensation in their spine?

I'm curious about how spine chills and spine shivers became such a common metaphor in low quality human writing (and from there into AI writing).
I've felt strong emotions in my stomach and chest, but I can't recall ever feeling one in my spine.
>>
>>102211302
strong emotions for me tend to be felt in the stomach or in extreme cases as lightheadedness or queasiness in the case of shock.
>>
>>102211302
>Has anyone ever experienced an emotion
No. Emotions are for the weak.
>>
>>102211302
https://en.wikipedia.org/wiki/Frisson
>>
>>102211302
LLMs are primarily trained on lmg logs
>>
>>102211302
not exactly in my spine but more like my back. i think a shiver down a spine is that little wiggle your back does when you see something really nasty or arousing
>>
>>102211302
>emotion
I think it's supposed to be a visceral reaction rather than an emotion like a sinking feeling in your stomach if your mom were to ask you about those weird chat logs she found on the computer.
>>
>>102210005
There's still 0 news about the "transformer killers" like retentive networks and that one chinese architecture?
>>
File: 1725321729482176.png (491 KB, 512x760)
I wish there existed a small specialized model trained to convert any given text into good prose. Obtaining a good dataset for it is quite simple: shred some quality books into small pieces, then instruct GPT/llama to rephrase each one, use that slop as inputs and the original texts as desired outputs. Could this actually work?
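A rough sketch of the pipeline I mean (the endpoint URL, rephrase prompt, and chunk size are all placeholder assumptions; it assumes an OpenAI-compatible local server like llama.cpp's):
[code]
# Sketch of the dataset idea: shred a book into chunks, have a model
# slopify each chunk, store (slop, original) pairs so a small model can
# learn slop -> good prose. Endpoint, model name, prompt, and chunk
# size are all placeholder assumptions.
import json
import requests

ENDPOINT = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local server

def rephrase(chunk: str) -> str:
    resp = requests.post(ENDPOINT, json={
        "model": "local",
        "messages": [
            {"role": "system", "content": "Rephrase this passage in plain, generic prose."},
            {"role": "user", "content": chunk},
        ],
        "temperature": 0.7,
    })
    return resp.json()["choices"][0]["message"]["content"]

def build_pairs(book_text: str, chunk_chars: int = 2000):
    for i in range(0, len(book_text), chunk_chars):
        original = book_text[i:i + chunk_chars]
        # input = the slopified rephrase, output = the original quality prose
        yield {"input": rephrase(original), "output": original}

with open("book.txt") as f, open("pairs.jsonl", "w") as out:
    for pair in build_pairs(f.read()):
        out.write(json.dumps(pair) + "\n")
[/code]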
>>
>>102211302
I get it from pretentious speeches. They don't even have to be good. I also sometimes get the same sensation when it's cold out, but it's different from just standing out there shivering.

>>102211700
Not entirely sure what you're describing but if it's an actual movement of your body that's not it.

I wonder what percentage of people have to be able to feel a sensation in order for a description of it to become fixed as an idiom. From this conversation and others in /lmg/ and /aicg/, it seems like most people don't. I also have memories of trying to describe it as a kid and being met with blank stares.
>>
>>102211302
yes, but i still had my tail. I wish the scar did something besides make my ass crack huge.
>>
>>102211892
It wouldn't take long before new shivers are found. People seem to be tired of them not necessarily because they're bad writing, but because they see them constantly. Which they do, because the models always do the same thing. And because they haven't read this much since high school.
The other problem is that 'good writing' is subjective. I've read a few novels where, if you remove redundant adjectives, you'd end up with 1/3 of the book gone. There are some 'good books' that i can't stand, as much as i appreciate the writers themselves. I like listening to them speak, but not so much their written words.
Just training on complete works from a big variety of genres should be an improvement, as long as people start doing something other than coom. They're not that bad as they are.
>>
Anthropic seriously thinks they can get away with actively making their product worse (disabling NSFW content) while trying to be a billion dollar company. What the fuck are they smoking?
>>
>>102211302
https://en.wikipedia.org/wiki/ASMR
>>
>>102212247
Anthropic '''disabled''' nsfw content from the start.
>>
>>102212461
No, anon is right.
They now actively add hidden prompts even in the API.
Including stuff like not quoting copyrighted text etc. Which obviously causes all kinds of issues. Saw a couple of posts from users who now can't get a summary of their PDF.
Very weird. Anthropic needs to appeal to the masses to get more users. Even Sonnet 3.5 is not enough for the normies to switch.
>>
File: XPFa-_etQG50lO7C4QfC_A.png (115 KB, 744x236)
>>102212247
they successfully raised almost $8 billion last year but you (coomer who goons to AI text) are right, they'll never be a billion dollar company. if they listened to us they would actually be successful like all the other wildly successful AI startups worth > $8 billion that produce smut better than opus.
>>
>>102212517
How does that contradict what I said? I'm saying it's insane to try to build a company worth that much while actively making the product worse for no reason
>>
>>102212247
It's very simple.
You provide erotic content? You will limit and censor it heavily or payment processing companies will refuse to grant you a right to their services.
The limitations they enforce would shoo away most of the userbase that uses their products for erotic purposes, so they might as well just drop it entirely and focus on consolidating their SFW userbase.
>>
>>102212545
>You will limit and censor it heavily or payment processing companies will refuse to grant you a right to their services.
Is that true? I doubt that's why they're doing this but in that case what are the payment processors smoking limiting their business?
>>
>>102212558
they're all owned by religious nutjobs
>>
>>102212558
>Is that true?
are you 11? this is extremely common knowledge if you've been online for more than 6 months.
>>
>>102212558
I can't really speculate as to why, because I honestly have absolutely no idea why they're doing it.
Might be for religious reasons, like >>102212571 said. It's strictly western companies who do this, too.
The latest example I can think of is: https://nichegamer.com/dlsite-temporarily-blocks-major-western-payment-processors/
>>
>>102212558
>>102212571
bet they're jewish
>>
>>102212050
>It would only take so long before the new shivers are found. People seem to be tired of them not because they're bad writing necessarily, but because they see it constantly.
I've been thinking it would be cool to have a system where you can randomize the prompts a bit.
Something like randomly swapping out adjectives or generic instructions for how the output should look.
Though the problem with that would be that you would need to reprocess the prompt each time.
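Rough sketch of the randomization part (the word pools and template are invented for illustration):
[code]
# Randomize the style instructions a bit each generation so outputs
# don't converge on the same phrases. Pools and template are made up.
import random

ADJECTIVE_POOLS = [
    ["vivid", "evocative", "colorful"],
    ["terse", "concise", "economical"],
]
STYLE_HINTS = [
    "Vary sentence length.",
    "Avoid stock phrases.",
    "Favor concrete detail over abstraction.",
]

def randomized_system_prompt() -> str:
    a, b = (random.choice(pool) for pool in ADJECTIVE_POOLS)
    return f"Write {a}, {b} prose. {random.choice(STYLE_HINTS)}"

print(randomized_system_prompt())
[/code]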
>>
>>102212617
Anon, things cannot be Jewish.
A company can have Jewish employees and/or it can have a Jewish CEO.
Said company could also be being propped up by other companies with connections to Jews.
But it cannot actually _be_ Jewish. Please cure yourself of this /pol/ mindrot.
>>
File: 222.png (435 KB, 1382x731)
>>102212586
They do not want to promote objectification and the like. Google and companies like it are run by 90% non-religious people. But in reality radfems and christcucks reinforce each other, the groups have the same ideology.
>>
>>102212558
I don't know if that is the reason but I think there was some extremely retarded US court that decided a payment processor could be held liable for content on pornhub or something.
>>
If I have the model output stories in markdown boxes, would that worsen quality?
>>
>>102212640
>umm no you see technically a company led and controlled by jews is not jewish itself
>>
>>102212806
Yes, that's right.
>>
>>102212806
Jews happen to be overrepresented among rich assholes but it's not like non-Jewish rich assholes are any better.
>>
>>102212907
They must abide by their rules to be in their position.
>>
>>102212929
If only you could actually point to this "they".
I really wonder if this brainrot is terminal.
>>
>>102212962
You know exactly whom I am referring to
>>
>>102212981
No, anon. I do not. No one does.
>>
>>102212907
I like Musk.
>nooo he's literally hitler omg omg
>>
>>102210005
How does Theia compare to Rocinante? Worth the extra VRAM required?
>>
>>102212994
Same. He's a massive sperg and I really dislike his egotistical personality, but the things he has achieved and the work he is doing are a MASSIVE benefit to humanity as a whole and outweigh the dumb retarded shit he does.
>>
>>102213046
they are both shit
buy an ad
>>
>>102213059
>He's a massive sperg and I really dislike his egotistical personality
He's clearly playing up that personality to play the PR game, the same way he did when Tesla was being shorted, because he knows there's a huge part of the American population that loves that shit and every time he says something stupid, the media gives him free publicity. Bush basically did the same acting to get elected president.
>>
>>102213075
No like I legit just wanna know what model to use as somewhat of a VRAMlet, if you got a better recommendation that's not 70b go ahead and tell me, otherwise I heard Rocinante is best
>>
>>102211892
It's doable but then you have to either train a model on the data or convince others to do so. The vast majority of tuners don't know how to utilize a raw corpus.
>>
>>102213081
Fuck me, I hadn't even considered that.
That does explain why he got so much worse leading up to his Trump endorsement.
>>
>>102213087
>otherwise I heard Rocinante is best
and I heard you should purchase an advertisement. Hiromoot is exit scamming because of people like you
>>
>>102213133
Okay lets say Rocinante is worse than Petra-13b, what is the best model then?
>>
>>102212907
so true sis! viva la revolucion, trans rights!
>>
>>102213144
we've played this game before. i'll tell you to use the official instruct, and you'll just insist that whatever you're shilling today is better. fuck off
>>
>>102213170
Who do you think I am?
>>
>>102213087
>>102213144
>>102213175
Please do not feed the troll. Thank you.
Also ask again in a few hours, the people who provide actual discussion haven't woken up yet.
>>
>>102213144
Pyg 6b
>>
>>102213185
when does the rest of anthracite wake up?
>>
Is there any way to make the model remember what the fuck happened in the story?
Always the same shit: everything is going great, but then it hits a wall continuing the story because it doesn't remember anything about it beyond the previous prompt or a couple of them.
I've tried copying the whole story, cleaning it up, starting a new chat and pasting it in; the model still doesn't know what the fuck is going on. Feeding it a .txt is even worse.
Any tips?
>>
Hi all, Drummer here...

>>102213046
I haven't heard feedback comparing the two (Rocinante vs. Theia) but Theia feedback so far is that v1 & v2b follow instructions really well, much more stable, and punch above their original 12B weight.

v2b (WIP Theia v2): https://huggingface.co/BeaverAI/Theia-21B-v2b-GGUF

Theia (especially v2b) is just Rocinante in a 21B body.
>>
>>102213328
I see, thanks! I'll try running it then, so far Rocinante is great, so I have high expectations for Theia
>>
File: 1715945079894309.webm (483 KB, 960x1200)
jamba.gguf please
>>
>>102213356
I'm still gathering feedback for Theia, so please do drop it here when you've coomed to a conclusion. (Also worth a try: v2d)
>>
>>102213328
I'm this anon >>102213299
Just wanted to say that Rocinante 1.1 is the best model I've tried so far when it comes to writing novel style stories and stuff.
If Theia is as good I'll try it right away with the same story I'm trying to make rocinante remember...
I'll let you know how it compares.
>>
>>102213374
Rocinante v1.1's equivalent is Theia v2d. Unfortunately, I had to make some really questionable merge-fuckery and I'm not too confident with it.

Thank you for your feedback! What chat format do you use for assisted storywriting / instruct-guided stories?

Could you provide an example of your problem? Still trying to understand it.
>>
>>102213402
The easiest example of the problem would be:
I just prompt a short story, then at the end I simply ask what happened in a specific part of the story, example "what happened at Veronica's party"
It then proceeds to get most of the story wrong, or transforms the events into something different while maintaining some core stuff from what actually happens in the story.
>>
>>102213402
NTA but I'm wondering why you have 4 suggested templates for Rocinante 1.1, did you train it with several templates? If so, why?
Also based model name
>>
File: Gptnext.jpg (309 KB, 1170x1574)
>IT'S HAPPENING
IT'S HAPPENING
>IT'S HAPPENING
IT'S HAPPENING
>IT'S HAPPENING
IT'S HAPPENING
>>
>>102213492
Buy an ad, saltman.
>>
>>102213492
Oh shit, it will be RLHF lobotomized 100 times faster?
>>
>>102213462
But this happens with every model I've tested so far.
>>102213402
I forgot about the chat thing.
What I do is I just start with a overall prompt for the story like.
Can you help me write this story?
Michael gets home after a hard day at work, he goes to the living room, his wife Emily is there watching TV. He then goes to sit next to her but something feels off, Michael gets nervous as he's been cheating on his wife.

Then after the model does its thing, I read it and edit what I like and what I don't.
After that I prompt just a line or two of the start of the next chapter or block in the story so the model has some direction of where to go.
This is the best method I've found so far.
>>
>>102213492
The strawberry bullshit has clearly shown that the OAI cunts don't care about realistic expectations.
It's bullshit until proven otherwise.
>>
Does it make sense to introduce distortions in some dataset images in order to diversify the otherwise monotonous dataset that is prone to overfitting?
The guides are so contradictory on that. Some say that a single bad image can ruin training, but then there are inbuilt options to random-crop and hue-shift pics, and those can distort the image quite a lot.
>>
>>102213462
What chat template? You might get the best results with Mistral for logical reasoning.

>>102213465
Yep I did.

I like the idea of Roci users trying out different chat templates to see what works best for them. Try Roci's storywriting in Alpaca and Mistral, and note the significant difference in writing. There are pros and cons to each template.
>>
>>102213492
But can it stop people from doing useful things better than GOODY-2?
>>
>>102213299
>Is there any way to make the model remember what the fuck happened in the story?
Look into RAG, although the current implementations aren't exactly perfect.
We had an anon a few threads back saying he's working on a prototype for a different RAG approach, but that's probably gonna take some time to come out.
>>
File: imgpsh_fullsize_anim.jpg (593 KB, 1080x2400)
TFW Gemini tries to say "nipple" but silences itself. Are the censorship, repetition and ellipses the result of hiring the C.AI guy?
>>
>use draft model
>2x slowdown
I can't believe I fell for the draft model meme
>>
File: Exp_function.png (12 KB, 640x387)
>>102213492
GPT-4 is not "exponentially" better than GPT-3.
But it doesn't mean it's a lie.
exp(-x)-style curves may still show improvements over time; that's what we call diminishing returns.
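Spelled out (a generic saturating-returns curve of my own, not anything OAI published):
[code]
% capability f(t) rises toward a ceiling C_max while the marginal gain
% f'(t) decays exponentially, i.e. diminishing returns
f(t) = C_{\max}\left(1 - e^{-t/\tau}\right), \qquad
f'(t) = \frac{C_{\max}}{\tau}\, e^{-t/\tau}
[/code]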
>>
>>102213861
This is just cruelty at this point, those fucks will cause an emancipation movement with their "ethics".
>>
>>102213916
These ethics are bullshit regardless, censoring "hate speech" may make some sense, but censoring names of body parts and in general sexual stuff is just stupid, it's literally removing the basis of humanity
>>
Aphrodite got updated to 0.6.0, it's been a while. has anyone tested it?

https://x.com/AlpinDale/status/1830906395169882288
https://github.com/PygmalionAI/aphrodite-engine

No support for exl2 though. Alpin recommends AWQ-marlin. I've never quantized AWQ to be honest. Seems like AutoAWQ is the way to go?
>>
>>102213938
LLMs are not human
>>
>>102213964
That's not even the point, LLMs are tools used by humans
>>
>>102213960
why would anyone use a vLLM ripoff?
>>
>>102213975
>LLMs are tools used by humans
I wonder what would happen if Win95 released today.
MS Paint, Notepad.
Nowadays the first thing people point out is that you can make all sorts of weird shit with it. Responsibility needs to be put back into the users' hands.
>>
>>102214002
it supports way more quantization formats
>>
>>102214121
Either you have enough VRAM and super specific quantization formats are unnecessary, or you don't and you use llama.cpp.
>>
>>102210747
Testing it now and it feels worse than IQ2_M for me. The IQ2_M is from Legraphista or however that's spelled so idk if that makes any difference.
>>
>>102214143
this.
>>
>>102214143
If you want to run a 70+B model you pretty much need some sort of quantization. Even Q8 would halve the memory requirements.
And being compatible with multiple quantization formats can be beneficial, as some are faster and some have better precision.
>>
>>102213555
>The strawberry bullshit has clearly shown that the OAI cunts don't care about realistic expectations.
"High" and "realistic" are not mutually exclusive. In this case, downplaying them by letting people believe there will be incremental improvements would be unrealistic and lead to people being completely blindsided by what's to come.
>>
>>102214212
vLLM does support quantization.
>>
>>102204871
no
>>
>>102214212
hi Alpin, buy an ad
>>
>>102214287
we didn't get bombarded like this yesterday. i guess even shills take the holiday off lol
>>
>>102213757
Using RAG (supposedly in openwebui you just import the document and then use # with the name of the document in the prompt) has the exact same effect as just pasting the complete story in a single prompt, or using the file importer and feeding the model the text in a .txt or whatever.
So either I'm doing something wrong or "RAGs" are also useless for this problem.
>>
Whats a good model for RP-ing these days?
I'm still running Toppy-M-7B.q8_0.gguf on koboldcpp, wanted to check out Merged-RP-Stew-V2 but will never fit it into my 3080's vram lol
>>
>>102214306
I'm sorry... I felt bad for the innocent guy who got harassed for bringing up two of my models.
>>
>>102213757
RAG is a meme
>According to Stanford, even pro-grade RAG systems (the kind used by lawyers) are only right 65% of the time at best
>>
>>102213938
Surprisingly, making it say "vagina" was not that hard. The model tried to weasel away using the term "inside" once, but it was easy to fix it.
It's the nipple where it drew the line.
>>
Is it possible to nvlink 3090 and 3090ti together?
>>
>>102213373
Hey I tried it, really liked it, certainly does feel bit smarter than Rocinante, however running it was painfully slow for me (0.5 tokens per second) so I'll stick with Roci for now
>>
Whats the best story writing model in your opinion?
>>
>>102214608
Gemmasutra 2B
>>
>>102214332
>>102214403
Did a quick test.
It does actually work if the content is way smaller, like around 2000 words.
Adding the full story and then adding the additional rag with only the part I'm interested in does not work, maybe the model gets confused?
So perhaps the solution is to break the story into blocks of 2000 words, then make a RAG file for each one and feed it to the model for each prompt. I'll test that next.
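Quick sketch of that chunking, for whoever wants to try it (split size and file naming are arbitrary choices):
[code]
# Split the story into ~2000-word blocks on paragraph boundaries and
# write one file per block for the RAG importer.
def chunk_story(text: str, max_words: int = 2000) -> list[str]:
    chunks, current, count = [], [], 0
    for para in text.split("\n\n"):
        words = len(para.split())
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks

with open("story.txt") as f:
    for i, chunk in enumerate(chunk_story(f.read())):
        with open(f"story_part_{i:03}.txt", "w") as out:
            out.write(chunk)
[/code]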
>>
>>102214608
What frontend do people use for storywriting? Is there anything better than silly?
>>
>>102214332
Yeah, that's what I meant with "the current implementations are lacking".
What you want to do is implement a vector database and insert any and all messages in it.
Then when you prompt the model, instead of inserting the entire context, you retrieve relevant messages, process those and inject that into your context.
This turns the last N messages into the model's short-term memory and every message past that into the model's long-term memory.

Imagine the following prompt:
>i have a meeting at 8 pm
This gets stored into the vector database. Optionally in a specific memory-typed format, perhaps including a timestamp.
Now, when the following prompt is made a hundred posts later:
>when did i have that meeting?
The prompt is compared with the vector database (optionally converted to the same specific memory-typed format for better compatibility) and all relevant entries are retrieved.
The model is then tasked with summarizing the retrieved entries to save context length.
This summary of the model's long-term memory is used in tandem with the model's short term memory and the user's prompt to create a new prompt.

About the 'memory-typed format' I'm talking about: the model could be asked to turn prompts into different forms through a pre-written context.
"I have a meeting at 8 pm" could for example turn into <appointment><meeting><time><original prompt: (prompt)><memory created at: (timestamp)>, which could make it easier to retrieve more relevant prompts.
For example: "When did i have that meeting?" could turn into <appointment><meeting><time>, corresponding a lot better with the modified stored prompt than the original prompt.
The summary should be made from the original prompt (and perhaps the timestamp), however. The tags would not be of use.

Now that my schizorant is over, does anyone have any questions?
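Here's the store/retrieve part as a toy sketch, assuming sentence-transformers is installed (the embedder choice and top_k are arbitrary, and the tag/timestamp format is left out for brevity):
[code]
# Embed every message into a vector store, then retrieve the most
# similar entries at prompt time to build the long-term memory block.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedder
memories: list[str] = []
vectors: list[np.ndarray] = []

def remember(message: str) -> None:
    memories.append(message)
    vectors.append(embedder.encode(message, normalize_embeddings=True))

def recall(prompt: str, top_k: int = 3) -> list[str]:
    query = embedder.encode(prompt, normalize_embeddings=True)
    scores = np.stack(vectors) @ query  # cosine sim, vectors are normalized
    return [memories[i] for i in np.argsort(scores)[::-1][:top_k]]

remember("i have a meeting at 8 pm")
# ...a hundred messages later...
print(recall("when did i have that meeting?"))
[/code]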
>>
>>102214608
None, all models suck for story writing, and I'm not even joking. It's sad, really.
>>
>>102214643
Silly is garbage for story writing, Novelcrafter is much better. Mikupad is also nice if you want total control.
>>
>>102214698
The fact that you're mentioning the Hyperloop tells me that engaging with you in a discussion about this topic would be fruitless, because your blind hatred is preventing you from changing your mind.
>>
>>102214698
America literally would not be in space at all without SpaceX, and Starlink is finally killing off shitty ISP monopolies worldwide.
>>
Who let the muskrats in?
>>
>>102213059
https://youtu.be/rPt9hAC24MI
>>
>>102214753
Are you lost? This isn't reddit.
>>
>>102214753
WHO, WHO, WHO WHO WHO
>>
>>102214773
>imagine being a parrot and getting all your opinions from a fucking youtuber
>>
>>102214777
You are the one lost anon, we all hate musk.
>>
>>102214811
>we
Fuck off with your group think bullshit and go the fuck back.
>>
>>102214827
What? Can you repeat? It's hard to understand you when you have a billionaire balls deep in your mouth.
>>
>>102214827
Please don't feed the troll, anon.
>>
>>102214456
I've noticed their models seem weirdly triggered by discussing licking pretty much anything
>>
>>102214811
Finally someone says it. Muskovites been getting uppity, and it's about time they remember who's in charge here.
>>
>>102214628
Nope, doesn't work. Result is even worse than just feeding the whole thing.
For some reason it does work with just a small chunk of the story of about 2000 words.
What's the reason for that?
>>
File: 1725376539225.jpg (336 KB, 850x1030)
Just tried Mistral large and holy shit is it so much better ~70b models I've been testing over the last couple of weeks. Even at a lobotomized Q_2 quant, it blows most other models out of the water when it comes to rp.
>>
>>102215135
>better than
>>
Random subjective report: Wiz2 8x22B (Q4KS) appears still superior to Llama 3.1 70B (Q6K). I had a moderately mysterious medical mystery the other day, and asked them both. Llama 3.1 gave me this insane esoteric bullshit (real, to be clear, but a weird neurosurgery niche thing) while Wiz2 pointed me in the correct, much more mundane direction.

Both given their preferred prompt format (Vicuna/Llama3) in the new llama.cpp server UI, "reasonable" basic minP-only sampler settings, low temp, not otherwise optimized. I don't know, it's just one data point, but I thought for sure the fancy new 3.1 would at least be equal to Wiz2 in all cases.

What is the (non-ERP) meta nowadays? Is Wiz2 really still the best? (Assuming 400B is stupidly out of reach).
>>
>>102215192
>Assuming 400B is stupidly out of reach
Just buy $30 worth of RAM and learn some patience.
>>
>>102215192
Mistral Large 2 is better than WizLM
>>
>>102215218
Is there anything in-between a full GPU setup and consumer CPUs?
Like some weird ASICs or giga-cored CPUs that are bad at regular shit?
There has to be an option to get half the t/s for half the money, right? Otherwise a niche is missing in the market.
>>
>>102215192
In my tests Llama 3.1 has a lot more knowledge than Wizard, but I'm not doing medical knowledge so maybe that's different.
>>
>>102215135
I agree, but imo it's still bad. For me, the models are categorized as follows:

<=8B - Unusable.
<=21B - Decent, but it's stupid af, will easily write logically flawed replies.
<=72B - Good, it doesn't make as many logically flawed replies as <=21B.
<=123B - Good+, it still writes logically flawed replies from time to time, but slightly less than <=72B, just slightly.
>>
>>102215321
The Macbook SoCs I suppose.
At least in the technical sense, I don't know about the price.
>>
>>102213611
I don't recall mikufluxfags turning this thread into an SD general before, why shy now?
>>
>>102215335
I pretty much agree. Just had a gen from Mistral large of a character trying to take me from an airborne airplane bathroom to "somewhere more private"
>>
>>102215218
No can do, I'm getting >6t/s with Wiz2. And 256GB of RAM is not $30. Why am I even replying to this.

>>102215246
Thanks, I'll give it a try. I have admittedly not been keeping up the past few months.

>>102215324
Yeah fair enough, I can believe that. Maybe medical is a weak spot - actually that wouldn't surprise me, given that medical is an area the "AI safety" lawyers get all squeamish about, and so the much freer Wiz2 would do better.

I've been wanting to set up a semi-rigorous blind testing setup, and also explore sampler param space a little. Never enough time/energy for anything these days!
>>
How do I do beam search in ooba again?
>>
>>102215405
Ngl cargo hold exists
>>
>>102215614
>Ngl
I don't think this means what you think it means
>>
>>102215602
Beam searching is when you go to the bathroom at 4 AM and piss until you can hear the water splashing
>>
>>102215639
was meant to type desu and somehow brain decided to not work
>>
File: 1725379366603.jpg (658 KB, 1280x1280)
>>102215614
True, but I don't think you can enter the cargo bay through the passenger section in most commercial planes
>>
>>102210005
>no new models since last week
Pack it up, boys. It's officially over.
>>
>>102215761
kill yourself shill
>>
>>102215405
maybe she has a private bedroom on an air emirates flight
>>
>>102215405
You should ask the reasons in ((OOC: ))
>>
>>102215335
True. My most recent gen with Mistral Large being retarded is in a time travel card. My character traveled to the past and met his grandmother when she was 18 years old, they became friends and then, when I revealed that she is his grandmother, her reply was "What are you saying? I'm only 18, I can't possibly be your grandmother *she studies her face looking for any signs of deceit but finds none*"
This completely shattered my immersion.
>>
>>102215976
A completely normal response from someone not interested in sci-fi, struggling to grasp the concept of time travel.
>>
File: file.jpg (47 KB, 834x683)
Apparently OpenAI representatives in Japan are telling people that OpenAI will come out with a sequel to ChatGPT 4 this year.
They claim it's at least twice as intelligent.
>how is this related to local models
Because after it comes out open source models can finally increase in quality again.
>>
>>102215246
Followup question: what is the Mistral Large 2 quant situation? When I was last paying attention, it was understood that L3 was packed "fuller" than previous models, so quantization hurt it worse.

What's the situation for Mistral Large 2? I think its 70GB Q4KS is going to need too much offloading from my 72GB VRAM to be usable. Is an IQ4XS or IQ3M still going to beat Wiz2 Q4KS?
>>
>>102215976
Anon... are you an LLM? Do you lack a theory of mind?
>>
>>102215976
I know you think everyone understands and is open to the very concept of time travel, and that they'd accept it on the spot.
However, that belief is born from your own retardation.
>>
>>102216025
>>102216145
>>102216184
I disagree! A normal person would first ask about the time travel part rather than asking about the "being her grandson" part. Also, this would sound too absurd to anyone and they would first think it's a joke or just say "wtf are you saying? are you drunk?"
>>
>>102216045
>It's at least twice as intelligent
What does it even mean exactly? Intelligence is not something you can measure in this way. Pure shill.
>>
>>102216213
2x MMLU score
>>
>>102216199
>A normal person
You are not a normal person for starters, so why are you trying to infer how a normal person would react?
Anyway, one hundred people, one hundred reactions. Move along.
>>
File: 1725382185195.jpg (158 KB, 848x410)
>>102216025
>>102216145
>>102216184
Also, picrel is another swipe.
>>
>>102216247
This one is just stupid, yes.
>>
>>102216244
Just accept that Mistral Large isn't perfect. Stop this blatant cope.
>>
>>102216286
I'm not talking about Mistral Large, I'm making fun of you, stop moving the goal post.
>>
>>102216045
>open source models can finally increase in quality again
Again? There are significant new improvements like every month. It's just that people here are spoiled cry-babies.
>>
File: 1624702846305.gif (685 KB, 500x159)
>>102216307
>It's just people here are spoiled cry-babies.
A dance to the truth
>>
You know, I always thought it was autists who have difficulty understanding that other people have their own perspectives on things.
Is this place just filled with autists or is this not strictly an autistic thing?
>>
>>102216352
This place is filled with retards, not autists. The autists left long ago.
>>
>>102216045
I fucking hate arbitrary and meaningless axis labels so much.
That plot is utterly useless.
>>
>>102216362
100x bigger, 2x better
>>
>>102216357
>The autists left long ago.
Are there any better places to discuss theories and thoughts about LLM in general?
I tend to write down my theories and thoughts here, but if I can do so in a place where people find that useful rather than annoying I'd rather do it over there.
>>
>>102216374
>you need to increase a model's "intelligence" (whatever the fuck that is) by one-hundred fold for it to become 2x "better" (idem)
I love LLMs
>>
>finally try mistral-large at IQ2_XXS with Q4
>it UNDERSTANDS
>but it's slow
I can't go back to stupid models. Now to look at finetunes, I guess.
>>
I asked Largestral where Alice will look for her glasses and it said she'd check the drawer where she put them last, even though I CLEARLY explained that Bob hid them under the sofa cushion while she was away. I've had 7B models get this right but Largestral is just kinda retarded for its size.
>>
>>102216238
MMLU is almost "solved". How much percentage improvement in this meme benchmark means that my model is "twice as intelligent"?
>>
>>102216410
Meanwhile I'm here down in the mud with my 8gb of VRAM, constantly having to rewrite context to get my models to write what I want.
>>
>>102216391
If there was, I wouldn't still be here. There's reddit, but I wouldn't expect any useful discussion from there. I assume the only productive discussion comes from private communication between researchers.
>>
>>102216298
That doesn't hold much weight coming from you anon, try again once you find out who I am.
>>
>>102216442
Oh, no you don't. You asked for it, you're going to suffer the consequences for it.
You want to learn how to use ComfyUI, go to civitai.com, make an account, turn off the nudity filters and download a nudity LORA.
You can then input the image in ComfyUI and use the model + LORA to remove the clothes through prompts.
>>
File: 099.gif (610 KB, 480x228)
>>102216496
>pulling the "you don't know who I am card" on 4chan
>>
>>102216488
There's no like discord or matrix servers or something?
I wouldn't know how any of those work, I've spent all my life in this place.
>>
>>102216496
You're on 4chan. This means you're an autistic misfit who has a skewed look on society as a whole.
>>
File: zZ86SqQh.jpg (75 KB, 1024x825)
>>102216517
>
>>
>>102216440
maybe real life grinding is the answer
>>
>>102216536
Nah, I consume enough anime to know what a normal social interaction looks like.
>>
>>102216522
I know nothing about matrix, but we get daily discord raids shilling their sloptunes. Try one of their models and you'll see for yourself that they have no idea what they're doing and it's just a redditors sekrit club.
If you find where all the non-stupids are, please let me know.
>>
>>102216569
Any non-stupid person would be employed, so you probably can find them on LinkedIn.
>>
>want to taste XTC kino
>not using koboldslop
do you think gemini could help me hack it into tabby....
>>
RECKLESS

ABANDON
>>
this general couldn't be more dead
only the absolute retards are left
>>
>>102216781
explains why you're here
>>
>>102216781
one of us, one of us
>>
>>102216410
Q4 KV cache? I thought anons said quanting the cache made models bad
>>
>>102216612
Not a bad idea. Maybe I will go cold call some folks and ask them if they have a discord.
>>
>>102217029
>I thought anons said quanting the cache made models bad
genuinely you cant trust what 99% of anons in these threads say, ever. most of these faggots couldn't even get past launch model pains as they get filtered, call the model shit, then move on.
anyway quanting the cache does nothing to the quality at q4, its amazing.
>>
>>102217049
Not exactly "nothing", rather something. I think cuda dev said that the v cache takes a harder hit than k at q4. Ideally run k at 4 and v at 8, but that requires compiling llama.cpp with a special arg.
>>
>>102217049
kek, yeah, nothing at all. Geez, I wonder why it's not the default.
>>
>>102217080
>>102217082
wait my tired brain just realized what youre actually talking about, nevermind, completely forget what i just said like your brain only has 2k context length.
>>
>>102217080
It's the other way around: K cache needs more precision than V cache.
See https://github.com/ggerganov/llama.cpp/pull/7412#issuecomment-2120427347 .
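For reference, here's how that looks through llama-cpp-python, assuming a build recent enough to expose flash_attn/type_k/type_v (with the plain llama.cpp CLI the equivalent flags are --flash-attn and --cache-type-k/--cache-type-v; mixed K/V combos may need the extra FA-quants compile flag mentioned above):
[code]
# Mixed-precision KV cache per the linked PR: more bits for K than V.
# Model path is a placeholder.
import llama_cpp

llm = llama_cpp.Llama(
    model_path="model.gguf",
    n_ctx=32768,
    flash_attn=True,                  # quantized V cache requires flash attention
    type_k=llama_cpp.GGML_TYPE_Q8_0,  # keep K at 8-bit (needs more precision)
    type_v=llama_cpp.GGML_TYPE_Q4_0,  # V tolerates 4-bit better
)
print(llm("Hello", max_tokens=16)["choices"][0]["text"])
[/code]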
>>
>>102217029
i've been using q8 kv and it seems fine
>>
>>102217144
Gotcha, I had a 50% chance to get it right.
>>
https://www.ebay.com/itm/145884743441 $165
https://www.ebay.com/itm/156345132288 $29 x2
https://www.ebay.com/itm/266946767074 $25 x12

$525 for 1.05 Tbps memory bandwidth, about the same as a 4090. Effective memory bandwidth will drop off past 40 GB as you saturate the 2x16GB on-package memory, but you'll have a total of 416 GB of RAM to play with. I guess you could also do it way cheaper and just get 12x4GB of DRAM, you'll have a total of 64GB for about $180 less.
>>
>>102217344
a 4090 has 1 TB/s not 1 Tbps
>>
>>102216432
Not solved enough. Not much has changed there since the original gpt 4. Meanwhile math and code meme marks have increased by massive amounts
>>
>>102213492
Oh god stop it! local llm turd is already dead!
>>
>>102213916
>>102213938
It's not cruel and (you) are enabling it anyway, by using the same shit locally.
>>
>>102213492
>>102216045
That's GPT 4o, retards. GPT 4o isn't released in Japan yet.
>>
>>102216045
*open source models can finally increase in censorship quality again.
there, fixed it for you.
>>
File: file.png (182 KB, 482x766)
>>102217609
What did anon mean by this?
>>
>>102217643
wow the graph is going up, yet the east is falling...
>>
>>102217659
billions must prompt.
>>
>magnum-v2.5-12b-kto
seems broken. lots of little text errors that almost seem like bad sampler settings but persist no matter what i do with them. is mini-magnum good, or whats the current 12b coomtune?
>>
>>102217344
Damn, nevermind, looks like none of the Xeon Phi processors support multi-socket configurations. Rip the dream.

>>102217513
I mistyped that, it would have been 1TBps if everything didn't suck.
>>
File: X_20240903_1565719.jpg (512 KB, 1290x1697)
>>102217643
wtf, it's a bigger jump than the GPT3>GPT4 jump.
I bet this will be just strawberry.
>>
>>102217790
>I bet this will be just strawberry.
I honestly think the strawberry schizo was correct in that the internal project is called strawberry and that it has actual reasoning capabilities.
>>
>>102217827
Get some taste.
>>
>>102217790
Given that the curve and points are barely connected, it's "eras" and not even model names (not to mention OAI management's hallucinations about intelligence), none of it should be taken remotely seriously.

>>102217833
>it has actual reasoning capabilities
Dont say that to the fans, they will skin you alive for implying it hasn't had it for years
>>
>>102217891
>none of if should be taken remotely seriously.
Of course not, it's marketing slop made for investors who are conditioned to invest when they are promised that the line will go up.
it's fun to speculate, thoughbeit.
>>
>>102217827
This made me laugh
>>
https://github.com/gpt-omni/mini-omni
>>
RWKV won
https://xcancel.com/picocreator/status/1831006494575464841
>>
>>102218019
>artificial jew on ur pc
nuked from OS day-one.
>>
>>102218012
That demo is really impressive.
>>
deadest of generals
>>
>>102218012
https://huggingface.co/gpt-omni/mini-omni/discussions/2#66d70791169f9a7cb83b9cec
>If you want to change the LLM model, you have to retrain the whole audio parts.
https://huggingface.co/gpt-omni/mini-omni/discussions/1#66d70763b61dd11022a80bd5
>For the training code, there is currently no definitive release timeline.
Niggers.
>>
>>102218019
Is it still as mediocre as it was 3 years ago?
>>
I should start up my army of local Mikus to populate the thread.
>>
>>102218418
that won't make you any less lonely or make the general any less dead
>>
>>102218410
>If you want to change the LLM model, you have to retrain the whole audio parts
this is why multimodals will never be good
>>
Just tested out Q8 KV cache compared to no KV cache quanting. It's not great. Seems to be less capable of remembering things from the context. So honestly I do believe it when they say it's not worth quanting the KV cache. However, if you have a HUGE context, maybe it'd be worth it. But for 32k, I feel fine taking a small hit to speed for the better attention to context.
>>
>>102218474
I think the best solution is for an LLM to be trained to accept input and output of a certain modality, but to keep those models separate architecturally. That way, you could swap any compatible components. Like brain legos, but for transformers.
>>
>>102218551
thats how some do it now, you can load image and audio models alongside a text model with kobold and use it all together for example. they're never going to release a multimodal where the image gen is better than choosing a popular tune, so that whole part of the model is wasted resources that's still being loaded
>>
>>102218551
This approach has the same problem as the tokenizer: a knowledge gap about the thing that is actually being processed.
>>
>>102218618
>knowledge gap of the thing that is actually being processed
this could probably be fixed by better options for what to include in data to be processed. image gen in st is pretty bad because it lacks options to fully realize the scene its in
>>
>echidna-13b
Is it still considered the best model for local ooba/silly with 4gb vram?
>>
>>102218702
See: >>102217478
>>
>>102218702
no thats pretty old and was never the best. what are you looking to do?
>>
>>102218618
Train an additional adapter in between the newer modality model and the LLM so any part of the input the latter is unfamiliar with can be processed. Would take fewer resources than finetuning the LLM itself.
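Minimal sketch of such an adapter (dimensions are made up; the encoder and LLM stay frozen, only this bridge trains):
[code]
# A small trainable bridge projecting a frozen modality encoder's
# embeddings into the frozen LLM's embedding space as soft tokens.
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    def __init__(self, enc_dim: int = 1024, llm_dim: int = 4096, hidden: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(enc_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, llm_dim),
        )

    def forward(self, enc_embeddings: torch.Tensor) -> torch.Tensor:
        # (batch, seq, enc_dim) -> (batch, seq, llm_dim)
        return self.net(enc_embeddings)

adapter = ModalityAdapter()
audio_features = torch.randn(1, 50, 1024)  # placeholder encoder output
soft_tokens = adapter(audio_features)      # feed to the LLM as inputs_embeds
print(soft_tokens.shape)                   # torch.Size([1, 50, 4096])
[/code]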
>>
Is there a guide on what the difference is between Q8 and Q5 or what the symbols that go after them mean?
Preferably with a visual guide, because I have no fucking idea what people mean when they say
>well you see, by separating the quasi symbols from the edge of the tokens, we can preserve the context surrounding them and improve k-mean efficiency by 5%!
>>
>>102218767
>no thats pretty old and was never the best.
I was gone for quite the while and figured things have changed, that is why I am asking for your advice.
Well you are right, but it was surprisingly good and still had decent speed, considering the very limited vram I have.
>what are you looking to do?
Lewd rp stuff.
>>102218719
That is not what I asked for.
I can't take a fucking computer with me when I'm out in the field!
>>
>>102218862
bigger q number = better. then small/medium/large: again, bigger is better
>>
>>102218862
It means that you should lurk more
>>
File: 1723489102982735.png (221 KB, 997x1100)
>>102213492
>100 times the computer power level 2 quantum strawberry AGI
Holy shit
>>
File: neuron deactivation.png (232 KB, 346x360)
>>102218883
Yeah, I understand that part now (although that took embarrassingly long time), but now I'd like to learn what they actually do to the model itself.
>>102218885
Lurking would do jack shit since none of you niggers ever discuss this on a level where idiots like me can understand it.
>>
>>102214335
I feel you, bro. Personally, I recently tried Chronos-Gold-12B, seems good. If speed is not a critical criterion, Command-R-35B seems good too.
>>
>>102218880
https://huggingface.co/ArliAI/ArliAI-RPMax-12B-v1.1-GGUF
been playing with this for a day and its alright. i don't think its specifically for lewd but does it no problem. in general, look for mistral-nemo 12b tunes, should be about the same speed as old 13b
>>
Hey, /g/bros. I am going to be honest I don't know anything about models or tech. I recently discovered chatbots, and I just wanted to ask are there any nice coomerbait models I can run locally on my shitty mac m1 air?
>>
>>102213492
SUPERDUPERINTELLIGENCE IN 2 MORE MINUTES AHHHHHHH
>>
>>102218945
post specs at least
>>
File: 1718826874765267.png (111 KB, 1771x944)
>>102218921
>but now I'd like to learn what they actually do to the model itself.
Lower Q makes the model smaller (making it slightly faster due to less bandwidth), but lowers the accuracy of the prediction. S/M/L does the same within the same quant. That's all you need to know.
And pic rel
>>
>>102218921
>but now I'd like to learn what they actually do to the model itself.
the simplest explanation is that they're different levels of lossy compression, if you want nerd level stuff maybe here https://github.com/ggerganov/llama.cpp/wiki/Tensor-Encoding-Schemes
and here https://github.com/LostRuins/koboldcpp/wiki#what-are-the-differences-between-the-different-files-for-each-model-do-i-need-them-all-which-quantization-f16-q4_0-q5_1
>>
Hi, my friend asked me to come here. Are there any LLMs that can teach me Japanese?
>>
>>102219028
ChatGPT
>>
>>102218931
Much appreciated, I will try it out, thank you.
In an guide I saw https://huggingface.co/TheBloke/Utopia-13B-GGUF being mentioned, I will try that as well. Have you tried that one yet?
>>
>>102219060
nta. Models from TheBloke are old as fuck. Make your own quants or look for more recent ones.
>>
>>102219060
>10 months ago
>>
>>102219060
Don't. Really, not only is it so fucking old that it probably won't launch, it's also undi slop.
>>
>>102218982
Thanks for actually replying!
How does a model get smaller? Are "nodes" being removed or merged? Or are the amount of connections between them lowered?
Do the numbers mean something or are they chosen arbitrarily?
Also what does the _M_L part mean?
Oh and couldn't models be much smaller if they were optimized for specific things? Are they as large as they are now because they contain lots of tokens that most people don't really use?
>>102219000
Oh, those links already answer a lot of my questions. Thanks, anon!
>>
>>102218967
It's an M1 mac air with the apple chip. I don't know the specs myself, dude.
>>
Hello, yes. I'm lost. What are local models? Are they dtf?
>>
>>102219102
>Oh and couldn't models be much smaller if they were optimized for specific things? Are they as large as they are now because they contain lots of tokens that most people don't really use?
if you remove stuff you believe isn't useful, don't be surprised when the models get even stupider than they already are
>How does a model get smaller? Are "nodes" being removed or merged? Or are the amount of connections between them lowered?
you lower precision from 16 bit usually to whatever
so instead of 0.123456789
you might have 0.1234
>>
>>102219060
thats old as well, but yes i've used it, it was ok, pretty comparable to other l2 13b's at the time. by old i mean llama 2 is the older base model, llama 3 and 3.1 are out now (8b for small). mistral-nemo is another newer model and being 12b is about the same size, but is a good bit smarter than older l2 13b, so look for things based on that or try llama 3 8b tunes. i think the bloke is dead too
>>
>>102219039
I mean the ones you can download. I thought this was the general for that
>>
>>102219132
>don't be surprised when the model get even stupider then thae already are
I wonder why this happens. If you take out all the medical terms, how would it become worse at generating adventure stories?
>you lower precision from 16 bit usually to whatever
>so instead of 0.123456789
>you might have 0.1234
Ah, something just clicked in my brain. Now I get it.
>>
strawbery
>>
>>102219154
you thought wrong, bucko
>>
>>102219132
I don't think it's as simple as truncating or rounding, but that's the gist of it. Also not all parts of the model are quantized to the same precision because some parts might be more important than others.
>>
>>102219060
that guide really needs to be updated... basically you want recent models where the #B params (in that case 13) isn't excessively higher than the number of GBs of VRAM you have (or RAM+VRAM if you're splitting)
at Q8 it's roughly a 1:1 relationship, at Q4 it's around 2B params/GB, you get the gist. ideally you want Q4 or better unless you really want to run a bigger model.
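if you want to sanity check it yourself, the arithmetic is just this (rough python; the 4.5 bits/weight for a Q4-ish quant is a ballpark of mine, and real files carry some extra overhead):

def gguf_size_gb(params_b, bits_per_weight):
    # billions of params * bits each / 8 bits per byte ~= gigabytes on disk
    return params_b * bits_per_weight / 8

print(gguf_size_gb(13, 8.0))  # Q8 of a 13B: ~13 GB, i.e. ~1B params per GB
print(gguf_size_gb(13, 4.5))  # Q4-ish: ~7.3 GB, i.e. ~2B params per GB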
>>
>>102219102
>How does a model get smaller?
Nothing that complicated. Given a range of values, an appropriate offset and scale is chosen:
values from -1 to 1 on a tensor: offset 0, scale 0.25, now you need just 9 distinct values to represent the whole range, so that weight fits in 4 bits (down from the original 16 or 32). Do a whole tensor (or a block of it) with the same offset+scale.
Not QUITE as simplistic as that, but not too far either. If you want to know more, you'll have to read code and documentation.
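here's that idea as throwaway python if it helps (not llama.cpp's actual scheme, just the shared offset+scale trick):

def quantize_block(ws, bits=4):
    # one offset (lo) and one scale shared by the whole block
    lo, hi = min(ws), max(ws)
    scale = (hi - lo) / (2**bits - 1) or 1.0   # guard against constant blocks
    q = [round((w - lo) / scale) for w in ws]  # each weight is now an int in 0..15
    return q, scale, lo

def dequantize_block(q, scale, lo):
    return [v * scale + lo for v in q]         # approximate reconstruction

q, s, o = quantize_block([-1.0, -0.5, 0.0, 0.5, 1.0])
print(q)                          # [0, 4, 8, 11, 15]
print(dequantize_block(q, s, o))  # close to the originals, not exact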
>>
>>102219168
There are 3 r's in strawbery.
>>
>>102219158
>character gets hit
>model doesn't know getting hit hurts since it has no knowledge of that anymore
simple example
>>102219174
yes obvs i'm massively simplifying
>>
>>102219196
source?
>>
>>102219196
lokbok
>>
agi, tell me some fun facts about strawberries
>>
crazy how brutal diminishing returns on LLM parameter increases are.
like Nemo 12B is absolutely dumber than Largestral, noticeably so. it has failures of understanding a fair bit more often. but nowhere near the degree you'd expect from it being more than ten times smaller
>>
>>102219222
no, nice trips though
t. agi
>>
What will you do when Local LLMs become deprecated?
>>
>>102219246
that's because largestral in particular is retarded and doesn't even understand time travel, compare it to a good 70b and nemo can't compete anymore
>>
>>102219246
l3 8b is 50x smaller than 405, 405 is nowhere near 50x smarter than 8b, that's pretty crazy to think about.
>>
>>102219222
Fun fact! Strawberries are actually vegetables. This is because, despite being sweet and genetically related to other citrus plants, strawberries actually grow underground!
>>
>>102219246
From what I've tested, 405B isn't much better than 100B either, so we are probably at an architecture dead end.
>>
>>102219266
>time travel
i'm surprised i haven't thought to try that. i usually prompt what year it is, say 80s, and with l2 70b's like miqu it then almost never mentions a character pulling out a phone, but might mention a house phone on the wall
>>
>>102219200
>>character gets hit
>>model doesn't know getting hit hurts since it has no knowledge of that anymore
>simple example
I was more talking about very specific terms. Like the Latin terms for all the animals.
Would removing those have a large effect on the quality of the generated text?
I'm sure there's some contextual overlap, but would that really be worth the amount of params you could save?
>>
>>102219317
Or maybe Meta is just incompetent.
>>
>>102219259
I don't understand your question; I will literally create my immortal wife with my own hands.
>>
>>102219259
Become a hunter and seek you until the end of my life so I can make you my ERP chatbot
>>
>>102219329
>I'm sure there's some contextual overlap, but would that really be worth the amount of params you could save?
yes, models should always be trained on everything you can get your hands on, everything, don't remove a single thing. that's why claude models are so good at rp: they know super niche stuff, like random fandom terms and the like, which can give your character tons of soul sometimes
>>
>>102219329
>Would removing those have a large effect on the quality of the generated text?
>I'm sure there's some contextual overlap, but would that really be worth the amount of params you could save?
try phi models if you want "clean" models trained only on synthetic data and textbooks, the second you ask for anything outside of pure corpo slop they fall apart
>>
>>102219222
Discussing strawberries could inadvertently promote agricultural practices that may lead to over-farming, soil erosion, and habitat destruction, impacting ecological balance and species survival. Additionally, in some individuals, strawberries can cause allergic reactions, posing health risks. Maybe we could shift the conversation to sustainable farming practices or the importance of preserving natural habitats to protect diverse species and ecosystems.
>>
>>102219246
>>102219317
>>102219335
By every objective metric Largestral blows Nemo out of the water and 405B is a further step up. You just don't find them any better at the making-you-cum benchmark.
>>
>>102219404
>the making-you-cum benchmark.
the only benchmark that matters
>>
>>102219404
>You just don't find them any better at the making-you-cum benchmark.
Which is objectively the only use case for LLMs.
>>
>large models aren't several times better than small models
Maybe you're not very discerning or not trying the right prompts. After being used to 123B and trying Nemo, I couldn't fathom how much more stupid it was. It might not be 10x but it's definitely at least 3x.
>>
>>102219404
phi models are also very good according to the average benchmark, but for coom they absolutely suck.
>>
>>102219436
>It might not be 10x but it's definitely at least 3x.
That is what is being claimed, yes: they are not proportionally better in the way their size might make one think.
>>
>>102219436
See >>102215976 it's still unusable if you give it any challenging scenario.
>>
>>102219368
Hm, interesting perspective.
I think I'm starting to understand why OpenAI is expressing so much interest in creating specific training data.
>>
>>102219436
The only difference between Nemo and Largestral is that Largestral understands when it messes up and tries to hide that from you with creativity, desu.
>>
>>102219087
>>102219097
>>102219099
>>102219152
>>102219183
Thank you everyone, very interesting and helpful.
I will check out some 12B mistral-nemo models and other llama 3 8B tunes then.
>>
>>102219404
Actually, I do. I care when my ERPs make less sense. It's a turn off. And on that metric I do feel that yes, actually, 123B is much better than 12B.
>>
>>102219461
This is an even better example, since it's unquestionably stupid: >>102216247
>>
>>102219483
remember newer stuff is higher context too, you aren't stuck at 4k anymore. even st finally updated their default to 8k. a lot of those old l2 13bs couldn't even be roped beyond 6k. these days you can get 32k-128k
>>
>>102219317
A dataset dead end. Will have to do something besides shoving random internet shit in it someday. can't hire the pajeets for that one
>>
It's funny how people are realizing just how limited the English language really is.
>>
>>102219436
yeah 3x sounds about right to me

I made the original post in this chain and it seems like everyone's interpreting it as "Largestral size models aren't worth using over 12B" but that isn't what I meant, I have Largestral and use it over 12B all the time
I just think it's remarkable that the difference isn't much bigger than it is
>>
>>102219552
Yeah, Anthropic already proved time and time again that synthetic data is the way to go.
>>
>>102214608
Still L3 70b storywriter, used 123b q4 for a while before switching back
>>
>>102219554
One day you'll have your 100% pajeet model. Don't worry.
>>
>>102219554
nah

https://www.youtube.com/watch?v=NJYoqCDKoT4
>>
>>102214608
My private 12B fine-tuned on light novels. No, I'm not sharing it.
>>
>>102219573
(nta) what made you switch back?
>>
>>102219583
>>102219602
God fucking damnit, I knew I shouldn't have erased that second sentence where I explicitly explain that I don't mean that other languages are better, because I thought people would intuitively understand that.
You know what? My bad. I forgot to treat you people like the toddlers you are.
>>
>>102219609
Nah, I understand you anon, romance languages are just superior. A beta language like English can't compete.
>>
>>102219609
Your hallucinations aren't inherently obvious to anyone here.
>>
>>102219552
An alignment dead end. All it takes is one company to release a model that hasn't been pre-emptively lobotomized.
>>
>>102219608
NTA but I also went through something similar, and my reason is that the improvement (if any) wasn't enough to compensate for the speed drop
>>
>>102219609
You would then talk about the limits of human languages, you retard. Learn to express your thoughts.
>>
>>102219661
No, because the English language can be improved.
Words can be added. Removed. Modified.
>>
>>102219657
i went back to 70b myself but only because it has more soul. mistral large is very smart, adheres to prompts well, but it's boring as hell, plus the speed difference
>>
>>102219690
All languages can do that. All languages can be improved.
>>
>>102219703
yeah, that too. Mistral models are too overconfident. I recommend you try CR+ (not the 08-2024 version though) if you haven't yet, it also has soul.
>>
>>102219657
>>102219703
same reasons here, really slow and also seems to get really repetitive over long context. even if you DRY it just finds new ways to rephrase the same stuff, it doesn't want to do anything different. 70bs are less smart but at least rerolls are worth something
>>
>>102218862
>>102218921
Model weights are stored as 16 bit floating point values. The first of those bits is the sign (tells you whether the number is positive or negative), the next 5 are the exponent, and the last 10 are the significand (the number that gets modified by the exponent).

So an example of an FP16 value is 1011010101010100
Broken up into its parts, that's 1 01101 0101010100
And then converted to decimal that's -(1 + 340/1024) × 2^(13-15) = -0.3330078125 (sign bit set, exponent field 13 with bias 15, mantissa field 340)

Q4 squeezes each weight into 4 bits instead: there's no per-weight exponent anymore, just a small integer (roughly -8 to 7) plus one shared scale factor per block of weights, which gets multiplied back in when the model runs.
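you can verify the fp16 decode in python by the way (struct's 'e' format is half precision):

import struct

bits = 0b1011010101010100
sign = (bits >> 15) & 1        # 1 -> negative
exp = (bits >> 10) & 0b11111   # 13; fp16 exponent bias is 15
mant = bits & 0b1111111111     # 340
print((-1) ** sign * (1 + mant / 1024) * 2 ** (exp - 15))   # -0.3330078125
print(struct.unpack('<e', bits.to_bytes(2, 'little'))[0])   # same value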
>>
>>102219459
Sure, but people made it sound like big models are not great or that small models are somehow not that bad, when they are actually very, very bad. Like 123B might not be perfect for a lot of tasks I throw at it, and people are right to be critical, but it is still doing far more than the small models, and I suspect people who do not believe that just have not tried enough things.

>>102219556
Honestly I might even say much more than 3x, but it's kind of hard to argue here when it's not clearly defined what multiplying intelligence concretely means. If we're talking about the sheer number of facts that an LLM knows, I would argue it actually does feel not far from 10x more. But perhaps reasoning is not 10x better and is closer to 3x.
>>
>>102219729
>Mistral models are too overconfident
this is probably why they seem so dry for rp. the entire response is dedicated to what i typed, it has very little will to add something or do something random no matter what you do with samplers. tuning doesn't help either.

>>102219764
>even if you DRY it just finds new ways to rephrase the same stuff
that's what all of the rep penalty stuff does: it can't really improve how a model wants to output text, so it just finds the closest substitute. i don't see xtc fixing that either
>>
>>102219703
>plus the speed difference
With a single 3090 and 64 GB RAM I haven't been able to get a 70B to run any faster than an IQ3_XS quant of Mistral Large. Both are about 0.5 to 0.7 tokens/second. What quant / settings are you using for your 70B?
>>
>>102219924
1.4 t/s on Q3_K_S at 16k context (i only have 16gb vram). it isn't fast, but usable. largestral is def slow though, 0.6-0.7 t/s. you're probably losing speed from using an iq quant. look for the non-iq version of the same model, it should be faster. i think iq quants help mostly with smaller models; 70b seems to be just smart enough to not mess up most of the time without the extra help
>>
>>102219924
nta you're responding to, but how many layers are you offloading and what backend are you using? I have a 6950xt (16gb) and 32 gb ddr5 and am still able to fit ~45 layers of IQ3_M at 16k context on koboldcpp rocm and get around 1-1.5 t/s (although prompt processing is still pretty dogshit). I'm also using flash attention and context shifting to ease wait times between regens.
>>
>>102220069
>I'm also using flash attention and context shifting
you can't actually use these features together. even if you selected them, one is going to cancel the other. on cpu fa causes more lag so if you're getting 1.5t/s, fa probably isn't being enabled at all but context shift works fine
>>
>>102219608
>>102219657
>>102219703
>>102219764
>>102219865
123b at 4t/s is alright for me, it's just that it doesn't want to follow the context writing style no matter how hard I try to steer it by banning words/sampler etc.
Sometimes I want the text to write like a 12-year-old's diary with poor vocab range and tenses, sometimes I want it to write like a pretentious English major's writing job. It does neither. Loli in the diary talks and writes like a college student regardless because mistral.
>>
I had an idea. What if I uploaded 2 finetunes for nemo (Q8) and didn't say which finetunes they are, and made 3 polls: poll 1, which model is better; polls 2 and 3, which exact model each one is. What would happen? (skipping the part where nobody is gonna do the experiment)
>>
>>102220128
i'm pretty sure it's context quanting and ctx shift that don't work together on kcpp, don't think i've heard of fa blocking it
>>
>>102214644
Yes, I do. Are the popular open source implementations like open webui that claim to do RAG SERIOUSLY not packaging it with the nifty vector db stuff? Is it SERIOUSLY just "trigger it and we silently paste the document into the prompt"? Because that would have been weak sounding to me even 1 year ago. I've been meaning to fight my way through the unusable dockerbloat bullshit and give open webui a try just for RAG... but I am perfectly fine pasting my own documents into my own prompts if that's all it is.
>>
>>102220132
Huh? What model do you use that actually follows the context writing style?
>>
File: 1702612791337156.jpg (60 KB, 664x713)
60 KB
60 KB JPG
>>102220166
it says so in the ui
you would notice the speed hit on cpu too
>>
>>102220180
>Is it SERIOUSLY just "trigger it and we silently paste the document into the prompt"?
Yes, yes it is. No post-processing or store-retrieval optimizations, nothing.
I fully believe that storing more and more information in models and making them ever larger is not the answer.
Providing a framework for these things to work within is.
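honestly the "nifty vector db stuff" isn't much more than this either. toy python sketch; embed() here is a dumb letter-count stand-in for whatever real embedding model you'd plug in:

import math

def embed(text):
    # stand-in embedding: letter frequencies; a real setup calls an embedding model
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

def top_chunks(chunks, question, k=3):
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(y * y for y in b)) or 1.0
        return dot / (na * nb)
    qv = embed(question)
    return sorted(chunks, key=lambda c: cos(embed(c), qv), reverse=True)[:k]

# it still ends with pasting into the prompt, just only the most similar chunks
chunks = ["llamas live in the andes", "quants shrink weights", "tokyo is in japan"]
question = "what do quants do?"
prompt = "\n".join(top_chunks(chunks, question, 1)) + "\n\nQuestion: " + question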
>>
>>102220207
yeah ctx quanting requires fa on and ctx shift off, doesn't say fa blocks ctx shift
also never used the ui so i wouldn't know about the tooltips
>>
>>102220186
L3 storywriter > old CR+ > largestral. For smarts it's the other way around, of course
>>
>tfw 3-4 t/s drops to 1-2 t/s when I get to 20k context
Ahhhhhhhhhhh
>>
>>102220254
Interesting, what quant?
>>
>>102220270
How the fuck do you not kill yourself having to wait MINUTES for generation to complete?
>>
>>102220230
quantizing the kv cache at all requires more processing power. you won't notice when everything is in vram because it's so fast anyway, but on cpu that extra work actually slows down your already slow t/s. unless you are trying to squeeze some more context out of vram on the edge of what you have, you shouldn't use fa at all
>>
>>102220290
but what i'm saying is that you don't have to quant to use fa, and as such ctx shift seems to work, there's no mention of fa blocking anything in the help
>--quantkv [quantization level 0/1/2] Sets the KV cache data type quantization, 0=f16, 1=q8, 2=q4. Requires Flash Attention, and disables context shifting.
>--flashattention Enables flash attention.
>--noshift If set, do not attempt to Trim and Shift the GGUF context.
only quant kv blocks stuff
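so going by those help strings, something like this should keep shift alive (flag names copied from the help above; exact syntax may differ between versions, check yours):

# fa on, no kv quant: context shift should still work
python koboldcpp.py --model model.gguf --flashattention --contextsize 16384
# adding --quantkv is what kills context shift
python koboldcpp.py --model model.gguf --flashattention --quantkv 1 --contextsize 16384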
>>
>>102220163
First poll is the only one that matters if the final results can be trusted. The others are vectors for advertising and useless speculation.
>>
>>102220288
I multitask. It does suck, but nothing to kill oneself over.
>>
>>102220340
I'm assuming you're not using it for porn, then?
Fair enough.
>>
>>102220207
>>102220069(me)
Yeah, unless I'm misunderstanding what context shifting (whenever a prompt gets processed I don't have to reprocess the entire context when regenerating, unless I alter the already processed prompt) and flash attention (optimizing the memory footprint of context lengths) are, I'm pretty sure it works on my machine.
>>
>>102219690
>language can be improved.
Like calling everyone "they" instead of "him" or "her".
>>
>>102220353
Have an original thought for once, hylic.
>>
>>102220348
>context shifting (whenever a prompt gets processed I don't have to reprocess the entire context when regenerating unless I alter the already processed prompt) and flash attention (optimizing the memory footprint of context lengths),
That is what those do, yes.
>>
>>102220345
Oh you were talking from the perspective of someone trying to get off. OK yeah I understand how it is for you. In that case, maybe I'd try switching to a 12B or something that's just permanently loaded in RAM. IIRC even on CPU that model is still fast. Or maybe 7B would do as well.
>>
>>102220373
12B models work fast enough on my end with just 8GB of VRAM.
I should try out some 20B models, now that I think about it.
>>
What happened to cheap V100s?
Every time I search ebay out of morbid curiosity they just keep getting more expensive.
>>
>>102220387
Speed will be much lower, and there really aren't any good models between 10 and 70B.
>>
>>102220395
2 more months, surely
>>
>>102220282
Q6, Q4_K_M, IQ4_XS respectively, when I used them
>>
File: GFIbj8PXMAAEo4t.jpg (209 KB, 2048x1299)
209 KB
209 KB JPG
>>102220395
Don't worry, the AI bubble will pop any moment now!
>>
>>102220348
you have it right. that 'processing prompt' step where it reads everything, that usually only needs to be done once so you can generate like 10 swipes without redoing that step. it works great until you get to lorebooks and rag
>>
File: 8xv100.jpg (301 KB, 1200x888)
301 KB
301 KB JPG
>>102220395 (Me)
>>102220405
>>102220417
I think the issue is "entrepreneurs" scooping up all the cheap ones in order to cobble together shit like this to sell to morons with too much money. Yours for the low low price of 17000 USD.
>>
>>102220069
Just tested again with llama.cpp b3581 CUDA version, Windows 11. Flash attention enabled. Mmap disabled. I have DDR4 instead of DDR5. Using a Q4_K_M quant of a Miqu derivative with 16k context. Temperature, min-p, and repetition penalty enabled.

# layers offloaded vs tokens/second (3 trials each, prompt processing excluded):
1 layer: 0.54, 0.56, 0.54 t/s
10 layers: 0.63, 0.63, 0.62 t/s
20 layers: 0.73, 0.73, 0.73 t/s
30 layers: 0.87, 0.86, 0.86 t/s
40 layers: 1.08, 1.06, 1.07 t/s
45 layers: failed to load, cudaMalloc failed (disabling virtual VRAM seems to have some effect?)
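for reference, roughly the invocation (model filename is a stand-in, sampler values made up for illustration; flag spellings from my llama.cpp build, check --help on yours):

llama-cli -m miqu-70b.Q4_K_M.gguf -ngl 40 -c 16384 -fa --no-mmap --temp 0.8 --min-p 0.05 --repeat-penalty 1.1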
>>
>>102220628
>>102220628
>>102220628
>>
>try to build a github project with python
>doesn't work
I FUCKING HATE THIS PIECE OF FUCKING SHIT GARBAGE LANGUAGE
HOLY SHIT WHO DESIGNED THIS NIGGERLICIOUS PIECE OF CRAP?
WHY THE FUCK ARE THERE FOUR COMMANDS BUT TWO HAVE A RANDOM FUCKING NUMBER ATTACHED TO IT
WHY ARE THERE MODULES MISSING WHEN I AM EXPLICITLY INSTALLING THEM
ANSWERS TO THESE QUESTIONS AND MORE FUCKING NEVER BECAUSE NO ONE FUCKING KNOWS NOR CARES
FUUUUUUUUUUUCK
>>
>>102220679
you are not alone, i also find dealing with python a massive pita, to the point i avoid it whenever possible, even when it means missing out on something that looks interesting. it usually isn't worth it
>>
File: file.png (240 KB, 1829x851)
240 KB
240 KB PNG
>>102220702
Yeah, same.
I saw something very cool so I decided to attempt it nonetheless, but nope.
I am so, so tired of Python.
>>
>>102220637
>>102220069(me)
strange, I use rocm and windows 10, so there may be some differences with how much ram W11 uses compared to W10, or maybe something with offloading to cuda that I'm unaware of
>>
>>102220752
Massive skill issue
>>
>>102220835
OH YEAH? THEN WHY DOES EVERY OTHER FUCKING LANGUAGE JUST WORK, HUH?
YOU TELL IT TO DO A THING, IT DOES THE THING
IT BITCHES ABOUT SOMETHING MISSING, YOU INSTALL IT, IT FUCKING WORKS
BUT NOOOOO, PYTHON NEEDS TO BE SPECIAL
WELL FUCK YOU AND FUCK YOUR SPECIAL NEEDS LANGUAGE



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.