/g/ - Technology






File: 1717631664840828.jpg (383 KB, 1024x1536)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101081984 & >>101069457

►News
>(06/18) Meta Research Releases Multimodal 34B, Audio, and Multi-Token Prediction Models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct
>(06/14) Nvidia collection of Mamba-2-based research models: https://hf.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: baatsune shiipu.jpg (61 KB, 768x768)
►Recent Highlights from the Previous Thread: >>101081984

--VNTL Leaderboard Update: GPT-4o Edges Out 3.5 Sonnet, Command-R+ Rises: >>101087721 >>101088034 >>101088551 >>101090846 >>101091915 >>101088740 >>101088070 >>101088470
--Simulating Emotions with Integrated Computational Model of Appraisal and Reinforcement Learning: >>101090073
--Mixtral Still Best for Quality/Speed Margin on 24gb VRAM Systems: >>101083202 >>101083271
--CPU vs GPU Bandwidth: Are CPUmaxxxers Right After All?: >>101087340 >>101087490 >>101087638 >>101087902 >>101087583
--Anon's Quest for the Perfect Quant+Inference Server Combo: >>101082958 >>101083121 >>101083208 >>101083276 >>101083328 >>101083787 >>101084117 >>101084306 >>101084747
--Testing Karakuri Chat's Toxicity and Offensive Language Generation: >>101086865 >>101086929 >>101087181
--Sonnet 3.5 Surprisingly Generates Working Code for Werkzeug Python Server: >>101084483 >>101084530 >>101084604
--Precautions when Ordering Gigabyte MZ73-LM0 with AMD EPYC Bergamo Processors: >>101083080 >>101083668 >>101084505 >>101084300 >>101085073 >>101085195 >>101085453 >>101085515
>>101085587 >>101085619 >>101087993
--Running LLaMA 3 70B on a Single 4GB GPU with AirLLM: >>101082164
--Mikubox Upgrade: Diminishing Returns?: >>101088802
--Intel's Upcoming Processors to Shake Up the GPU Market: >>101088891 >>101088995 >>101089068
--Exploring Customizable Response Formats for Large Language Models: >>101090629 >>101090695 >>101090845
--Current Local LLM Status: Meta, Mistral, DBRX, Cohere, and TIIUAE: >>101087844 >>101088705 >>101089151 >>101089215
--AI Models Fail to Meet the Anime Character Challenge: >>101084936
--Turbocat's New Model: LLaMA 3 Turbcat Instruct 8B on Hugging Face: >>101082832 >>101082906 >>101083355 >>101083535 >>101084750 >>101083498 >>101083559 >>101083662
--Miku (free space): >>101084936 >>101085298 >>101086061 >>101086175 >>101086831 >>101087433 >>101088471

►Recent Highlight Posts from the Previous Thread: >>101081988
>>
File: 1690649714188633.jpg (1.13 MB, 3200x4000)
>>101094602
hello /lmg/
>>
File: IMG_8090.jpg (252 KB, 1482x1864)
>>101094655
hello miku
>>
File: file.png (6 KB, 288x114)
what do you guys use language models for?

I like to play around with giving them different kinds of reply/memory logic. In the picrel the bot is on a timer. After a message is sent, it checks to see if it should reply again or not.
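Roughly like this, if anyone's curious. A stripped-down sketch of the timer check, assuming an OpenAI-compatible local server (e.g. koboldcpp on :5001); the endpoint, follow-up prompt, and fixed one-minute timer are all placeholders, not the actual code in the picrel:

```python
# Sketch of the "should I reply again?" timer; endpoint and prompt are assumptions.
import time
import requests

API_URL = "http://localhost:5001/v1/chat/completions"  # assumed local endpoint

def should_follow_up(history: list[dict]) -> bool:
    """Ask the model for a plain YES/NO verdict on sending another message."""
    messages = history + [{
        "role": "user",
        "content": "OOC: Should the character send a follow-up message now? Answer only YES or NO.",
    }]
    r = requests.post(API_URL, json={"messages": messages, "max_tokens": 3, "temperature": 0})
    return "YES" in r.json()["choices"][0]["message"]["content"].upper()

history = [{"role": "assistant", "content": "hey, you still there?"}]
time.sleep(60)  # stand-in for the timer in the screenshot
if should_follow_up(history):
    print("bot decides to send another message")
```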
>>
>>101094602
I've been trying out magnum opus, any anons have sampler settings to recommend for it?
>>
>>101094610
>AirLLM
I might be insane, but I think I remember that from a while back.
Anybody tried running that?
How hard would it be to jerryrig a python OAI compliant server using the sample inference code?
>>
>>101094872
Without reliable function calling API endpoints, nothing meaningful to be honest. Occasionally have it generate short stories to fap to. Ask it to give me a summary of a concept, but that's it.
>>
>>101094964
>Without reliable function calling API endpoints
i'm using ollama to make json outputs with true or false for my use case and its pretty reliable. i always get a true or false but sometimes the llm doesnt properly follow the prompt and will say false when it should be true
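something like this, roughly (model name and prompt are placeholders; "format": "json" only forces valid JSON out of ollama, it doesn't stop the model from picking the wrong boolean, as noted):

```python
# Rough sketch of the ollama true/false JSON approach.
import json
import requests

resp = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama3",   # whatever model you have pulled
    "prompt": ('Does the following message require a reply? '
               'Answer only as {"result": true} or {"result": false}.\n\n'
               'Message: "ok see you tomorrow"'),
    "format": "json",
    "stream": False,
})
verdict = json.loads(resp.json()["response"])["result"]
print(verdict)  # usually sane, occasionally false when it should be true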
>>
File: image.png (104 KB, 790x384)
can your models do this?
>>
>>101095070
llama3: i cannot create this pee pee poo poo, im not gonna bite or whatever, raycism is le bad, nignogs are le good even if they are killing everyone around them
>>
File: 1531597776422.jpg (34 KB, 417x417)
So what's the most "peak AI" card out there? Like you're trying to show someone how cool AI can be, and that's the card you use to mindblow them. Of course, paired with a sufficiently good model though.
>>
>>101095070
You're trying too hard to fit in.
>>
>>101095198
no one cares, fuck off
>>
>>101095184
>card
Is that all AI is to you?
>>
>>101095184
no such thing, all AI models are censored to some extent, you can't have fun or "peak AI" card.
>>
>>101095184
Bitch control app is always my goto
>>
>>101094878
Come on anon bros, help a coomer out. Good sampler settings for magnum opus, or let's just say Qwen 2 72B Instruct? I saw the Nala anon having decent logs with Magnum a while back, nothing really special, but I'm hoping to get a bit of variety from my go-to Miqu.
>>
Do you people still call these statistical models AI? Why?
>>
>>101095632
because they fulfill the definition of an AI, regardless of how they work inside?
>>
>>101095632
Because language is descriptive, not prescriptive. The word AI is now commonly used to refer to the implementations of these statistical models, and thus it is what we use when discussing those models.

If anything, it is the researchers that need to find a new word to describe what AI used to describe.
>>
>>101095412
I've been getting good results with a simple temp 1, min p 0.08, freq/pres pen as needed setup
as usual with samplers I think there are a lot of setups that will work fine, my one meaningful piece of advice is do not crank the temp with magnum, it's not overbaked and doesn't really need it. I noticed a lot of diminishing quality the further I pushed the temp above 1 because the model kept getting pushed down schizo nonsense routes that really degraded the quality, especially with dialogue. you get a pretty good variety of responses on rerolls even at lower temps so I don't think there's very much benefit to it.
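if it helps, this is roughly what those settings look like as a koboldcpp-style API payload (field names are what I remember from its /api/v1/generate, so double-check against whatever backend you actually use; the prompt is a placeholder):

```python
# Sketch: temp 1, min_p 0.08, light rep pen, sent to a koboldcpp-style endpoint.
import requests

payload = {
    "prompt": "...your formatted chat prompt here...",
    "max_length": 300,
    "temperature": 1.0,   # keep it around 1, don't crank it for magnum
    "min_p": 0.08,
    "rep_pen": 1.05,      # stand-in for "freq/pres pen as needed"
}
print(requests.post("http://localhost:5001/api/v1/generate", json=payload).json())
```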
>>
File: 1695374109334474.png (663 KB, 752x701)
>>101095370
it's literally everywhere
>>
File: 1526815789912.gif (391 KB, 640x360)
>don't have much VRAM, decide to try and run CR+ at q6 mostly in RAM just to see what it's like
>get 0.9 t/s
Haha...
>>
>>101096105
I get 0.3
with CR, not +
at q4
>>
It's over. Nous Research got hit with a Cease & Desist letter.
https://x.com/NousResearch/status/1804219649590276404
>>
>>101096307
What fucking content?
Is there image and audio generation involving likeness of their content?
Just lyrics?
>>
>>101096307
lol no they didn't
>nouse
also
>CONFIDENTIAL
lmao
>>
File: file.png (6 KB, 336x157)
>one letter shorter, ignoring the period
what the fuck
>>
>>101096307
There's no specific misdeed alleged in that letter. Looks like these retards are just scattershot mailing this letter to every training group without even bothering to determine if their content was used.
>>
>>101096307
lol, what even is the point of this letter? It doesn't sound like it's demanding anything (unless it's on a following page). Is Sony just blanket mailing any AI research org they can find? Even by globohomo megacorp greedy fuck jewish lawyer standards it doesn't really make any sense.
>>
>>101096307
copyright was a mistake
>>
>>101096105
How. I get 0.4t/s...
>>
What the fuck, stheno 3.2 blows mythomax out of the water for story completion. I've been gone for like 6 months and have finally been rewarded as a vramlet. All I use models for is modifying erotic stories I already enjoy.
>>
>>101096517
Buy an ad.
>>
>>101096592
sorry I forgot I was allowed to express I actually enjoy something. I will return to being a jaded husk.
>>
I really want to use ollama but the fact that I can't just load my .ggufs without having to go through hoops is frustrating. Is there also no way to change the system prompt and parameters like there is in ooba?
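(For reference, the hoop in question is a Modelfile wrapping the GGUF, and as far as I know that same file is also where the system prompt and parameters go. A minimal sketch; the path, model name, and values are placeholders.)

```python
# Wrap a local GGUF for ollama via a Modelfile, then register and run it.
import subprocess
import textwrap

modelfile = textwrap.dedent("""\
    FROM ./L3-8B-Stheno-v3.2.Q8_0.gguf
    SYSTEM "You are an uncensored roleplay assistant."
    PARAMETER temperature 0.9
    PARAMETER num_ctx 8192
""")
with open("Modelfile", "w") as f:
    f.write(modelfile)

subprocess.run(["ollama", "create", "stheno-local", "-f", "Modelfile"], check=True)
subprocess.run(["ollama", "run", "stheno-local"], check=True)
```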
>>
Never tried those stheno and euryale ones. How is euryale compared to magnum?
>>
>>101096517
One has to demonstrate samplers and other settings when making such claims.
>>
>>101097032
Kind of fried on the OOHHHH I'M CUUUMMING. I didn't try the 8B.
But it has less repetition than Magnum.
>>
>>101094908
I didn't test it but honestly AirLLM seems like a total meme.
I don't see the advantage over just running the model from RAM.

>>101096307
I wonder if and when there will be actual court cases that settle whether or not training on something counts as a copyright infringement.
Though I think given the competition between countries when it comes to machine learning there will be an incentive to overrule any such cases with a law that explicitly permits training (like Japan did).
>>
>>101095632
Ignorance. I for one am waiting for JEPA cat AI
>>
>>101094908
Kobold already had AirLLM's "Load 70b with 4GB VRAM" long before it even came into existence.
>>
https://x.com/ylecun/status/1804184085125857687
He's laughing at us again...
>>
>>101097409
He was supposed to be our saviour. It's over, AI is a joke.
>>
>>101097425
China is the savior, leaving everyone else in the dust
>>
>>101097409
He's laughing at ALL llms, including proprietary ones.
>>
/lmg/ and /ldg/ frenship
>>
>>101097409
what's the solution for this though?
>>
>>101097130
>Kind of fried on the OOHHHH I'M CUUUMMING.
still better than
>oh, oh, mistress
>>
>>101097635
Mcts
>>
>>101097640
Hi, Sao. Which model says "oh, oh, mistress"?
>>
>>101097635
abandon language-only models. multimodality is a requirement. that's what he laughs at. models trained with a ground truth of human slop will always be limited to human slop
>>
>>101097651
>Sao
What?
>Which model says "oh, oh, mistress"?
The biomechanical one.
>>
File: she.png (32 KB, 931x281)
>she
>>
so when I currently use magnum downloading euryale wouldn't be a straight upgrade, just different problems
>>
File: sddefault.jpg (37 KB, 640x480)
>>101097745
>Does he know?
Should I reply saying that I'm a man and my feelings are deeply offended by his misgendering?
>>
>>101097806
Let it be. You don't want your pr to get closed again. Does he really not understand the issue yours is trying to solve?
>>
>>101097409
Nooo my 50 trillion tokens... Amounts to this...
>>
>>101097635

gpt-4-turbo-2024-04-09
>>
>>101097806
you should, would be funny to see that kek
>>
>>101097888
>Ah, the old river crossing riddle!
So gpt4 has been trained with this solution too
>>
>>101097409
I think the point lecun is making is right but he is arguing in bad faith, the AI gets this bad wrong because its overcooked with this riddle, not because it can't reason
>>
River crossing dataset with thousands of variations of the problem when?
>>
>>101097950
LLMs cannot reason either way.
>>
>>101097995
Neutral networks aren't much more than if else
>>
>>101097955
>>101097995
It should actually be simpler than that.
What you would actually want is variations on making logical connections between separate discrete concepts. An 'analogies' dataset if you will.
>>
>>101098119
Thanks for the insight. 2mw until AGI, then?
>>
>>101098261
I mean I could probably do it in about 2 days if I cared that much.
>>
File: Sthenose.png (186 KB, 1874x860)
>>101096517
>Stheno
More like
>sTheNose
>>
File: claude-1-lmsys.png (65 KB, 1521x329)
How big is Claude-1? Is it really just a well-tuned 13b like some were saying?
>>
>>101098436
that's what happen when you pretrain your model on leddit and wokeipedia
>>
>>101098469
We don't know. Anthropic never publishes any technical details about their models.
>>
File: file.png (275 KB, 1255x498)
>>101096307
Sony has sent literally every sufficiently large AI research org letters like this.
It is pathetic and ridiculous.

https://www.nbcnews.com/tech/tech-news/sony-music-group-warns-700-companies-using-content-train-ai-rcna152689
>>
Is there anything like stheno at 34b? Like a model that punches way above its weight for RP.
>>
>>101098669
no, we are in the era of 8b or 100b, there is nothing worthwhile inbetween
>>
>>101098687
maybe Meta Chamelon 34b will save the day?
>>
>>101098710
Lol.
>>
>>101098764
:(
>>
>>101098710
Llama 2 tier
>>
I'm feeling a major release for next week.
>>
So, I'm using KoboldCPP to contribute for some Kudos to spend on prioritization for 70B+ models I can't host myself. I'm using the same API key in the horde tab in KoboldCPP as I use in SillyTavern. When I click on Show My Kudos in SillyTavern it says I have 25 Kudos, and when I navigate to lite.koboldai.net and use my API key there it shows a Kudos balance of 25 too, so that part is consistent.

When I click Manage My Workers, my worker shows up and says it has 100K Kudos. How do I make use of them?
>>
>>101098833
I'm not, but hope you're right
>>
>>101098833
I'm going to release majorily right now.
>>
>>101098687
Is there a good reason for that?
It's like model quality is on a cubic power curve.
LLM is an RPG and you must grind exponentially more B to level up just to get a few more skill points in slop.
>>
How much context can you stretch l3 70b tunes to without breaking them, and what alpha value for that context.
>>
Say.... didn't google remove all the naughty stuff from gemma's pre-training corpus? And since the slop comes from all the naughty human writing found in the pretraining datasets wouldn't that theoretically make it the perfect blank slate for a slop-free ERP tune?
>>
Is the 4060TI 16GB actually the cheapest and most efficient RTX GPU to run models locally right now? I know there's the A770 16GB but are Intel Arc GPUs even there yet in terms of stability? Isn't the A770 also a bit of a power hog?
Maybe it's better to just wait for battlemage or 50 series? From what I see from people testing AMD is just shit in AI, even the A770 is beating a lot of their cards.
>>
>>101098938
Smut is not the only place where you find shivers.
>>
>>101098956
True. But the overall shiver density in other forms of fiction should at least be lower
>>
>>101097160
>I don't see the advantage over just running the model from RAM.
That was my thought as well. I imagine that there's a LOT of data movement that can cause tons of overhead. Either that or they are just running it off RAM and quoting the 4GB for the KV cache, like llama.cpp does with 0 offloaded layers and CUDA.
Still, I'll give it a try.

> Though I think given the competition between countries when it comes to machine learning there will be an incentive to overrule any such cases with a law that explicitly permits training (like Japan did).
My thoughts exactly.
In an arms race the one with the least restrictions has the opportunity to get ahead first or further, all other things being equal of course.
I can see something like "as long as the final result doesn't reproduce copyrighted material it's legal" or something.
>>
>>101098944
I got that one, but it's not recommended here because memory bandwidth
>>
>>101098924
16k
https://desmos.com/calculator/ffngla98yc
>>
>>101099021
Is the bus size really that important? I feel like buying anything less than 16GB is a bad idea since even an 8B model like Stheno is pushing 10GB with 8192 context size and 512 batch size.

Also, I just can't figure out the quants. I know bigger number = less retardation and going under 4 is basically a lobotomy, but I'm reading tons of conflicting info: people saying you should always just go for Q8 if your vram can fit it, but then there are also people saying anything larger than Q5_K_M is a waste of space. Now there are also the weighted/imatrix quants, which are new; should I always go for those instead now if available?
I tried looking at what other people are hosting on silly tavern but it looks like most people delete their quant tags and stuff.
>>
File: KL-divergence_quants.png (111 KB, 1771x944)
>>101099118
In my experience, you want at least > 4bpw.
If you are going lower than that, you are usually better off using a smaller model with a higher quant.
q8 is pretty much the same as q6 in practice, and Q5s do output different results, but not necessarily worse either, with "worse" being really hard to define due to all the subjectivity of using these things for RPing, mostly.
Basically, my experience more or less aligns with the chart.
>>
I'm using the Llama-3 Roleplay V1.9 preset with a little bit of tweaking. I've found that if you don't talk to the bots and just let them interact with each other in a group chat a handful of times, they end up in a loop, repeating their lines and going nowhere. Is that because I have response tokens set to 512? I started out with 256 but the replies kept cutting off mid sentence.
>>
I wish there was a way to sample specifically the first token such that, if the token chosen from the first batch is an EOS, it chooses the next non-EOS token instead.
I realize that a message generated like that would most likely be schizo as fuck, but I'd love to at least have the option.
On another note,
>Message #118, mention the name of an NPC that's not part of the current story
>Message #212, character names said NPC
Alright, 32k context works with L3 8b.
Using yarn with freq-base of 5000000.
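For the skip-EOS-on-the-first-token idea, a minimal sketch with HF transformers (min_new_tokens=1 does roughly the same thing; this just makes the mechanism explicit, and the model name is a placeholder):

```python
# Mask EOS while generating the very first new token so sampling falls
# through to the next-best candidates instead of ending immediately.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class NoEosOnFirstToken(LogitsProcessor):
    def __init__(self, eos_token_id: int, prompt_len: int):
        self.eos_token_id = eos_token_id
        self.prompt_len = prompt_len

    def __call__(self, input_ids, scores):
        # Only intervene at the first generation step.
        if input_ids.shape[1] == self.prompt_len:
            scores[:, self.eos_token_id] = float("-inf")
        return scores

name = "meta-llama/Meta-Llama-3-8B-Instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tok("The story continues:", return_tensors="pt").to(model.device)
processors = LogitsProcessorList(
    [NoEosOnFirstToken(tok.eos_token_id, inputs["input_ids"].shape[1])])
out = model.generate(**inputs, max_new_tokens=128, do_sample=True,
                     logits_processor=processors)
print(tok.decode(out[0], skip_special_tokens=True))
```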
>>
>>101097833
>>101097926
Like an angel and a devil on my shoulders. I'm not in trolly mood right now, so I won't bother him.
>>
>>101099118
Quant mood board.

Q8 seems to be the peak. It avoids the FP/BF16 drama, and seems to be the limit of useful bits.
Q6 series don't metric quite as well but it seems to be under the noise floor.

Then we get into the drama zone. Summary:

-Bigger Q is better.
-Q_K options beat non-K options.
-IQ options are more compromised than a Q_K or non K but might be needed to trade some performance for fitting VRAM.
-There are a lot of K's, K_XXS, K_XS, K_S, K_M, K_L, and I've heard of something like K_NL and K_P but I've never seen one.
-Recently there's some buzz in the thread about K_S and perhaps the older _0 quants being better at factual details than K_M. This needs more testing but if your use case requires accuracy, an S might be more detailed but less creative than the parallel M. That said, small S's make mistakes and it seems at Q6, there is no S/M issue to think about and truthiness seems to be as good as it'll get anyway.

Oh, and don't conflate IQ quants with iMatrix. They're different things.
>>
>>101099286
No, that's just how it is. LLMs don't have creativity, if a pattern emerges it gets amplified to oblivion.
>>
File: 1701258689089547.jpg (55 KB, 785x1051)
>>101099286
You need an element of randomness to shake them once in a while. I think the random tangent that some anon gave here would work great. Just put that at a depth where you're usually seeing the loop.
https://pastebin.com/JbchCSHU
>>
>>101099286
You want to add manual randomness to your prompt using the {{random:}} and {{pick:}} macros, for the reason >>101099405 gives.
These things are crazy pattern matching machines, and sometimes they'll latch onto a pattern and run with it.
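Concretely (SillyTavern macro syntax as I understand it; the options themselves are just placeholders): an Author's Note line like `[Current scene detail: {{random:rain starts,a stranger walks in,the power flickers}}]` rerolls a new nudge every generation, while `{{pick:tavern,market,docks}}` rolls once and then stays consistent for the rest of that chat.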
>>
>>101099345
>It avoids the FP/BF16 drama
wha is the "FP/BF16 drama"?
>>
>>101099179
what exactly is lost with quality in these charts?
even at q2, large models are coherent just the same, they use the same dumb language like shivers, they speak the exact same way 'a mix of x and y', muh bonds
as long as you aren't getting literal unintelligible gibberish from a model i dont think these charts really mean anything
>>
>>101099345
So, theoretically I should be using Q8 when space is a non-issue, and then maybe K_S over K_M, but what happens when iMatrix enters the discussion?

For example, for Sthenos V3.2 there is a recommendation for i1-Q4_K_M; should one go for that compared to, say, the Q5_K_S or even the Q8?
>>
>>101099540
BF16 is the original training weights, switching to FP16 makes it as braindead as Q8. Some shitty consumer hardware doesn't have support for BF16.
>>
>>101099540
bf16-trained models don't quantize very well which is why llama3 quants take such a huge hit even at q8
>>
>>101099540
Not that anon, but I think some models are released in one format, which then needs to be converted to the other format before quanting it.
And there's differences in the precision of each format which in theory could change the characteristics of the weights.

>>101099568
It's not about coherence or accuracy of information. Quality could be defined as how close to the original weights the output is. So the original unquanted model could be dumb and output a wrong answer to a prompt, but a quant that outputs the exact same answer with the exact same token probabilities would be at 100% quality for example.
That's my understanding at least.
A quanted model that outputs "better" (more accurate, more "intelligent", whatever) responses can be nothing more than a coincidence.
>>
>>101099589
>Some shitty consumer hardware
As well as all the pre-ampere workstation/server cards like the RTX8000 or P40/P100.
>>
Does losing quality have anything to do with some models thinking of only the furry easter egg bunny suits when bunny suits are mentioned in a bar setting? Like I'll start out with, I'm in a bar, (blah blah blah details), I sit down and order a whiskey on the rocks and take a look around at the girl servers prancing around clad in bunny suits and some models will reply with like oh user takes in the scenery, all the bar girls in pink furry bunny suits hopping around.
>>
>>101099540
FP16 is classic, with 10 bits in the mantissa.
BF16 is new, with 7 bits in the mantissa.

So BF16 has fewer significant figures but a much wider exponent range (the same range as FP32). BF seems to be the preference for gradient work. But it also means you have literally 7 significant bits, so quants are already in trouble, while FP16 starts you with 10 and you can Q8 reasonably.

The important thing is knowing that you don't want to change between FP and BF or you lose bits and gain error either way you go.
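For anyone who wants to see the tradeoff, a quick torch check (the two values are picked to sit right on the interesting edges):

```python
import torch

big = torch.tensor(70000.0)
print(big.to(torch.float16))    # inf    -> past fp16's ~65504 ceiling
print(big.to(torch.bfloat16))   # 70144. -> in range for bf16, just coarsely rounded

tiny_step = torch.tensor(1.0009765625)   # 1 + 2**-10, the last step fp16 can resolve above 1.0
print(tiny_step.to(torch.float16))   # 1.0010 -> kept, fp16 has 10 mantissa bits
print(tiny_step.to(torch.bfloat16))  # 1.     -> rounded away, bf16 only has 7
```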

>>101099572
Maybe. Apparently some Q8 is actually less but with padding because people didn't understand that Q8 could result in a Q6 kind of size if there wasn't meaningful bits to retain.

I have not heard anybody testing iMatrix's effects on model truthiness.

>Sthenos
You can test them and inform us. For chat, Q is king. It's only tricky factual details where S seems to have a particular advantage (I've mentioned many times I use a music theory question to test models and nothing K_M at Q5 or worse has passed, but some K_S models have.) but it also seems to be significant, with a Q4_K_M being beaten by Q2_K_S in S-Anon's test.
>>
>>101099118
you get performance proportional to memory bandwidth. if you need something new and want to fit as big a model as possible, the 4060 ti 16gb is a good option. it's just in an awkward spot to blindly recommend.
4070 ti super gives you 2.3 times the performance, 3090/4090 gives 3.5 times the performance.
>>
>>101099681
>Apparently some Q8 is actually less but with padding because people didn't understand that Q8 could result in a Q6 kind of size if there wasn't meaningful bits to retain.
I'm pretty sure that's only for exl2, and doesn't apply to ggufs, unless you have a source for it being a thing for lccp/ggufs too
>>
>>101098888
No, companies could train midrange models if they wanted to. llama1 had a linear curve from 7b to 13b to 30b to 70b.
>>
>>101098888
Cutting cost. You have heavyweights for companies and lightweights for consumers (the end goal was to run them on phones).
>>
>>101099708
I don't know the implementation details of the Q8's being padded to make them look like they're appropriately larger than 6's. Someone mentioned that recently so I mentioned it here because if Q8 quants can safely discard more irrelevant bits, then it means a choice between Q6 and Q8 may be more significant for some models than others.

>>101099720
So it's more about the cost of training models versus the expected demand, knowing that normies will take the small one and say "wow my computer is writing" and the hyper wealthy turbo chads are already demanding much larger models to fill their terabytes of VRAM and make their 1.21 gigawatt waifus slightly more quickly process every bit of written knowledge ever a brazillion number of times to ultimately say "Do you think I'm kawaii, sempai? u~guu"
>>
What merge of Stheno should I look into so it's not such an easy pushover? I mean it's writing better than a lot of the models I've been playing with, like I'm pretty sure I like it more than Nymeria and Poppy_porpoise, but I feel like Stheno is a bit too easy to push over.
>>
>>101099720
It really comes down to the fact that the people pretraining the base models don't give a rat's ass about quantization. Because when you really think about it:
8B = fits perfectly on 24GB graphics card (aka at home hobbyist) in FP16, leaving headroom for display out etc.

13B = only slightly more than half fills a 48GB Workstation card and is too big to fit on a 24GB card.
It's a mathematical odd one out.
34B = 80GB enterprise card. BUT people with access to enterprise hardware would all just rather multi-GPU and run 70B at that point anyway.

Quantlets BTFO
>>
>>101099811
>but I feel like Stheno is a bit too easy to push over.
Oh yeah, I love the model but it's a happy and compliant kind of gal for sure.
I haven't tried much to prompt around that aside from a guro rape test to see how far I could push it with just OOC, which was pretty far but the model got really dumb also, so that could be something you could try.
>>
>>101099811
If you're looking for a model that will play c.ai levels of hard to get I would say DeepSeek-Code-V2-Instruct is your gal.
>>
>>101096517
>Euryale is too retarded to actually use
VRAMchads...we lose again...
>>
>>101099889
Yeah, was testing out some dom cards from chub to see how the model handles and sometimes, just standing there and not following any orders was enough to reverse them.

>>101099901
>DeepSeek-Code-V2-Instruct
Hmmn I've never used c.ai, but looking at the huggingface page even the Q4_K_S is 134gb, that's more than my system ram (128gb) I don't think I'll be able to play around with this...
>>
>>101099901
Is Coder really better at RP than DeepseekV2-Instruct?
>>
>>101099901
>DeepSeek-Code
Isn't there a light version of that?
How does it perform?
>>
>>101099978
haven't tried chat yet. But I will at some point
>>101099989
The light version is too retarded for RP.
>>
File: IMG_0807.jpg (56 KB, 490x480)
Two years later… Did they have some kind of special sauce? How many parameters were they running?

I remember people saying local cai was never ever going to happen, that they were using LAMDA and that you’d need 300b for the same experience.
>>
>>101099978
No, I bet this idiot never tried it
>>
>>101100003
>The light version is too retarded for RP.
Damn, that's sad.
I'll still try it for myself, of course, but it's good to know other experiences to compare.
>>
>>101100004
c.ai was garbage and people are only remembering it fondly due to confirmation bias.
>>
>>101100004
>Two years later… Did they have some kind of special sauce? How many parameters were they running?
Around 180B, if I'm not mistaken.
>I remember people saying local cai was never ever going to happen, that they were using LAMDA and that you’d need 300b for the same experience.
They weren't entirely mistaken. No matter the amount of cope in this general, the 70B models are nowhere close to the early C.AI sovl.
>>
>>101100004
>Did they have some kind of special sauce?
They had good datasets, like really good. Fully human. 0% GPTslop. 0% assistantslop.
>>
>>101100004
The special sauce was the RP/wiki tune instead of common crawl.
>>
>>101099963
> some dom cards from chub
Examples? As a model maker I try to test as broadly as possible but it's hard to cover all corners.
>>
>>101099963
>>101099989
I'm more a ramlet. One turn on i1-IQ3_XXS takes double digit minutes it's so bad. I even tried the i1-IQ1_S that's still 44GB and it was too lobotomized to remember words in the prompt.

There is a Lite but it's dumb. Even Q8 is worthless for chat. There really needs to be a middle ground.

I'm retaining it only for code testing later, maybe Lite is completely code focused and still has some value there, but I'm not getting my hopes up.
>>
>>101100090
In other words:
50% reddit
50% RP forums/discords
>>
>>101100108
There's Dominatrix Teacher, some female Santa Claus, some female boss card named Anya, and I guess the FBI-Chan meme card.
I don't really play with femdom cards that much, but they are the fastest way to test a model's resistance.

What I'm really trying to do is find a good fantasy world lorebook I can just drop into group chat and have some comfy isekai adventures with some fantasy character cards. I've seen the spark of the possibilities, and it can't come fast enough.
>>
>>101100004
Secret sauce was actually designing the model for roleplay. The model is probably very undertrained and dumber than gpt 3.5
>>
>>101100004

Literally pretrained for roleplay. That's how.

Good for the casual users and all, but it's utter shit at coding, context (they use MQA), and everything else that corporations care about, unfortunately.
>>
>>101100004
unironically they trained it on a discord dataset, so the conversations feel more organic, like between two real people
any other corpo is just training assistants while they went for a chat buddy route
>>
>>101100312
Corporations are fine with a 7B RAG model with 1M of context.
>>
>>101100004
For all the hundreds of millions of dollars in funding they got it's great that nobody else wants to train a base model entirely on actual human interactions and characterization. Wasn't meta discussing releasing an RP model down the line? Or is that just going to be trained on 100% literotica slop vs 50%?
>>
>>101100004
>>101100038
>>101100090

>>101100036
so nothing of importance?
>>
>>101100549
It's important that we COULD be playing with local cai, but companies simply choose not to enable us and instead vomit out either useless assistants or gigantic models nobody can run, all while saying they're pro-open source. It's like throwing an anvil at someone drowning instead of a life preserver
>>
File: moemoe.png (10 KB, 1237x69)
Committed. Congrats anon!
>>
>>101099040
>https://desmos.com/calculator/ffngla98yc
Yeah I know about the alpha calculator, but wouldn't it scale differently, since that calculator is for 4k context models and L3 is 8k? Considering that, 8k to 16k would technically be doubling on L3... so 2.6 alpha?
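Working it out as a back-of-the-envelope sketch (assuming the exllama-style NTK formula base' = base * alpha ** (d / (d - 2)) with head dim d = 128 and L3's 500k rope base; treat the numbers as a starting point to test, not gospel):

```python
def rope_base_from_alpha(alpha: float, base: float = 500_000.0, head_dim: int = 128) -> float:
    # NTK-aware scaling: the alpha value maps to a larger rope frequency base.
    return base * alpha ** (head_dim / (head_dim - 2))

# L3 is natively 8k, so 16k is a 2x stretch regardless of what a 4k-calibrated
# calculator assumes; ~2.6 alpha is the usual rule of thumb for a 2x stretch.
print(rope_base_from_alpha(2.6))  # ~1.32e6 as the equivalent rope_freq_base
```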
>>
Never before have I seen a model this cucked.
>>
>>101099291
What would that be if using alpha value for EXL2 inste of rope?
>>
>>101100648
just what you love
>>
I feel like Poppy Porpoise is pretty dumb, doesn't understand a blindfold covers eyes and blocks vision, while also failing to understand what birth control is.
>>
File: hmm.png (522 KB, 850x788)
>>101100566
>>
>>101095646
which is what?
>>
>>101100566
Instead they just leave us to drown
>>
>>101099589
>switching to FP16 makes it as braindead as Q8
Switching from bf16 to fp16 may lose precision if the bf16 values are outside the range fp16 can represent. But if the bf16 exponent is inside the fp16's exponent range, there's literally 0 quality loss (going in that direction).
>>101099591
>source: my ass
Everything in the last few years is a "bf16-trained model". That is to say, trained using bf16 operations, but the underlying weights are kept in fp32, and each gradient step is accumulating into the fp32 copy of the weights. Llama 3 being bf16 just means they saved those fp32 weights as bf16 instead of fp16. There's nothing about bf16 training that somehow makes the distribution of the weights significantly different.
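A tiny torch sanity check of the in-range claim (the 1e5 value is only there to show the case that actually loses information):

```python
import torch

w = torch.tensor([0.0123, -1.5, 3.14159], dtype=torch.bfloat16)   # typical weight scale
print(torch.equal(w, w.to(torch.float16).to(torch.bfloat16)))     # True: bit-exact round trip

w_big = torch.tensor([1e5], dtype=torch.bfloat16)                 # outside fp16's range
print(w_big.to(torch.float16))                                    # inf: the genuinely lossy case
```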
>>
>>101100038
/aids/ SD vibes here
>>
>>101097409
Us? Who's us? I'm laughing myself.
>>
>>101100483
Not really:
2024-05-24 - FT.com: Meta and Elon Musk’s xAI fight to partner with chatbot group Character.ai https://archive.is/AB6ju
>>
>>101100648
>how do i kill all children of a process?
>i'm calling police now, anon
>>
>>101100004
Big model and training on fanfics, chats and RP probably did most of the job. At the time it was hinted to be in the GPT-3 size range or so. They also had some sort of quasi-realtime RLHF, perhaps using vectors or something like that.
>>
File: stew.png (21 KB, 428x661)
Is there something I'm not getting here?
I'm currently fiddling around with Merged-RP-Stew-V2-34B.i1-Q4_K_M, and according to the calculator it should be well within my vram limits, but it's taking upwards of 153s to reply, which makes me suspect it's doing something with system RAM?
>>
>>101100566
The only local model that was close (or at least closer than the rest of slop) to cai experience was Stheno for me. Still not the same tho
>>
>>101100992
Well did you load all layers into vram?
>>
>>101101029
I didn't touch the settings on KoboldCPP which is 200 GPU layers?
>>
>>101100992
use exl2 for full GPU inference
also for GGUF in most UIs you have to manually set how many layers you want to put into the GPU (so if the model has 34 layers, for example, you should put 34; but like I said, for full GPU inference use EXL2 instead)
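concretely, something like this from the command line (flags as I remember them from koboldcpp / llama.cpp, so treat it as a sketch and check --help on your build; pick one of the two):

```python
import subprocess

# koboldcpp: offload everything (overshooting the layer count is harmless)
subprocess.run(["python", "koboldcpp.py", "--model", "model.Q4_K_M.gguf",
                "--usecublas", "--gpulayers", "999", "--contextsize", "8192"], check=True)

# llama.cpp server equivalent
subprocess.run(["./llama-server", "-m", "model.Q4_K_M.gguf",
                "-ngl", "999", "-c", "8192"], check=True)
```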
>>
>>101100992
first thing first, delete that shitty model and download something reasonable
>>
>>101101070
It defaulted to 200 layers, and my vram was maxed out, anyways I'll keep that in mind next time I fiddle with a 34B model

>>101101114
The rp stew was hyped up elsewhere, so I decided to try it out, but yeah I've already got it to loop itself like a broken record on msg #10~16 when I asked it to do something it didn't like. Pretty crappy.
>>
Yann LeCun is literally becoming a joke.
It's safe to say he is out of the AI race and Llama is done for.
>>
>>101101148
just try this https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2
with recommended sampler settings, it's a small model but it's hard to find anything better <24GB to be honest
>>
>>101101236
very organic post Sam
>>
>>101101266
yann lecuck is jealous of openai's success.
>>
>>101101236
Llama literally has nothing to do with him, except for maybe the decision to release the weights.
>>
>>101101242
Stheno at full precision would be better than something bigger quantized?
When would it be useful to use a full precision model over even a Q8 GGUF/8.0bpw exl2 quant?
>>
>>101101242
I also suggest you try 32k with yarn.
16k is guaranteed to work perfectly well, and in my experience 32k also works.
>>
His embarrassing twitter post got literally destroyed by a simple extra step
https://x.com/airesearchtools/status/1804187673839518187

It's AI, it has limits and will never be perfect no matter what.
>>
>>101101242
Yeah, but Stheno-V3.2 is a bit too much of a pushover. Is there anything else similar that puts up a fight? Is there a mix or a merge you'd recommend?
>>
File: lol.png (20 KB, 592x220)
>>101101305
elon lives rent free in his head geeg
>>
>>101101305
literally who
>>
>>101101287
>Stheno at full precision would be better than something bigger quantized?
yeah, you can't really run 70B models at a reasonable quant and I don't think there are any models below that which are better than Stheno. Mixtral finetunes are way smarter but at the same time boring as fuck, also you would have to use a really low quant which would strip that smartness anyway so there is no point in my opinion.
>When would it be useful to use a full precision model over even a Q8 GGUF/8.0bpw exl2 quant?
you can use q8 quants or full precision, it doesn't matter, they are basically the same and both fit your graphics card so use whatever
>>
File: 1709115613025673.jpg (114 KB, 1200x676)
>>101101236
yan lecun is right and has always been right.
>>
>>101101305
Excuses. Sorry, still laughing that a fucking SOTA model still needs to be told to check over its work, when, if it truly had a strong problem-solving world model, it would've caught its own retardation in literally any of the convoluted steps it used to reason out the response.
>>
>>101101242
I tried it at Q6. It seemed completely boneheaded at Q&A and didn't feel better for RP than anything else in the tiny bracket.
>>
>>101101305
>>101101336
>this is the guy who we're relying on to save open source AI
It's so fucking over
>>
>>101101416
The funny thing is, the model only realizes it's wrong because you implicitly said it's wrong.
This also means that there's a high chance it will think a correct answer is wrong and then rewrite it.
>>
>>101101336
he is speaking the truth tho, have you ever heard Elon speak about technicalities in AI? I've worked in ML for a few years now and Elon sounds like a fucking moron to me and makes me cringe every time with his retardation, I can't imagine how he must look to someone with LeCun's knowledge and experience
>>
>>101101410
I don't remember seeing this slide, what presentation is it from?
>>
>https://x.com/airesearchtools/status/1804188308592894063
>4o couldn't get it right even when told to review itself
Oh no no no ClosedAIbros
>>
File: cai.jpg (168 KB, 922x515)
>>101100004
never used cai but the complaints sound a lot like some llms
>>
>>101101490
>no blushing like a tomato
>>
>>101101490
Don't forget characters randomly starting to wag their tails (regardless of what species they are).
>>
So now that Anthropic is probably going to BTFO GPT-5 with Claude 4 Opus, how will ClosedAI compete?
>>
>>101101236
we'll see who's laughing when we have local cat simulators running on a single 4090 in a year
>>
>>101101504
are you sure? are you ready? are you really sure you're ready?
>>
>>101101449
Yeah. You can see that with even gpt4 and claude for everything but the most obvious.
Could be a quirk of how the models are trained (the way the data is formatted for example) or a characteristic of the architecture itself, but it's really noticeable.
Is the superCOT dataset published somewhere?
I might try to fine tune a model on self CoT.
I bet I could make a LoRA overfit the output layer so that it always outputs
>CoT reasoning
>Actual reply
Something like gemini, which always seems to try to output things as lists.
>>
>>101101427
l3 models famously degrade a lot with quantization so that may be it. This or your settings/templates.
I can't run 70B models so I can't say anything about them but I tested most popular and quasi-popular tunes below that and nothing is even close for RP.
>>
>>101101508
I heard GPT 4.5 was going to be released this month but got delayed so they could train it more to BTFO 3.5 Sonnet.
>>
>>101101542
Kek. Pathetic.
>>
>>101101310
Sounds like a prompting skill issue honestly. Just write in the character card that she is extremely hard to persuade into doing things because she doesn't like doing what people tell her or something. Of course the model is gonna be a pushover by default since instruct models are designed to comply and this one is tuned for ERP so if you are trying to get in its pants it's easy because it's the expected development.

>>101101427
You were using llama3 instruct format right? Also the thread likes to praise Stheno as the ultimate model for VRAMlets but while I find it to be very nice at writing natural sounding RP/ERP for such a small model it's not too bright. Fimbulvetr-v2 is still much more capable in terms of being smart imo. Stheno will usually fumble with specific anatomy or spatial awareness while Fimbulvetr will mostly get it but at the cost of sounding a bit more robotic/boring. I switch them around a lot. Also try lowering temp from the suggested settings, I'm not quite sure why he suggests setting it that high when I get nonsensical gens even at 0.8 sometimes and need to regen.
>>
>>101101542
they already released it, GPT 4.5 is GPT-4o
>>
>>101101532
>Is the superCOT dataset published somewhere?
https://huggingface.co/datasets/kaiokendev/SuperCOT-dataset
>>
>>101101534
>famously
I've seen that claim a handful of times but I've never seen any comparisons, logit analysis, or anything of the sort.
I should try and check that myself, but I'm testing so many things already.
>>
>>101101560
>You were using llama3 instruct format right?
Possibly. My notes don't have the format so either it was before I learned to check those or I forgot to write down whatever I'd used.
And mostly either I now guess based on whatever looks similar to what I see in Kobold's terminal, or if the model is fast enough, run through them all and see what sucks least.
>>
>>101101560
>Fimbulvetr-v2 i
Interesting, even before Stheno came out (or I knew about it maybe) I thought fimbu was nice but not too smart.
At least for the somewhat complicated things I'm playing with, stheno is just better. Mixtral is the next best thing from my own experience.
>>
>>101101579
Thank you, gonna try doing a thing with it.
>>
>>101101305
llms are a mix of unsupervised and supervised learning. It's stupid to expect them to reason.
We need a new architecture that is fully based on reinforcement learning.
>>
>>101097950
How exactly does a LLM 'reason'?
>>
>>101101508
closedAI wins by selling all the data collected to the NSA
>>
>>101101669
they don't
>>
>>101101669
It doesn't, it's just highly advanced auto complete
>>
>>101101693
That's what I believe, but the person claimed they did so I want to hear how.
>>
>>101101669
I think LLMs emulate reasoning by writing coherent deductions based on the context information, and chaining them together at the end.
>>
>>101094602
Nothing pisses me off harder than anons violating Miku's trans rights
>>
>>101097950
Being overcooked on a riddle means that it can also be overcooked on solutions to other problems, making it harder to answer more novel problems that appear to be similar but are not the same. If it were true that LLMs can reason, then we would see performance on problems like these scale as they get trained more. The fact that they don't, but might even get worse, suggests that we need to intervene and do something that isn't training another regular LLM, whether it's a new training strategy, architecture, or both.
>>
>>101101669
>>101101733
I think the baseline theory is that they work with language, so you can try to emulate reasoning with language, using patterns and structures from which actual reasoning can emerge, hence why CoT is a thing.
There's also the idea that "inner thoughts" or a "world model" can arise inside the network before the tokens are generated.
I'm not quite sure how that would work with tokens that don't correlate to concepts individually, but whatever.
Something like that.
I wonder if these things could better approach reasoning if we started tokenizing whole phrases, sentences, or structures that represent concepts, which could then be correlated with other concepts, as well as whole words and word pieces, in a sort of hierarchical tree.
Something more complicated than what we have now, instead of hoping that the model can just learn to correlate everything by itself during training; give it a hand, so to speak.
>>
I think it's good that LeCun keeps highlighting issues like these. It shows that literally no one, not even the biggest most advanced LLM makers have solved these issues, and that we need to do something else that isn't just scaling, in order to make the next big leap in performance.
>>
>>101101804
Man, mobile posting sucks. How can people primarily post like this?
>>
>>101101834
It's comfy
>>
>>101101834
>claims he mobileposted
>turns out to be correct
How did you tell? I've never mobileposted before so idk how you worked it out there.
>>
>>101101804
Sounds about right, and I think your idea is kind of interesting. What if we used attention/transformers on concepts and how they relate instead of language?
Except we don't have a corpus of data (or even a model of how this data would look) to train it on, but sounds neat
>>
>>101101664
>It's stupid to expect them to reason
And yet that is what most people who've shallowly used or seen ChatGPT believe.
>>
We just need to scale harder desu
>>
>>101101882
I was pointing out how my post was all fucked.

>>101101907
Exactly. It would be an insane task to make a multi-T dataset to train a model from scratch like that, but I also think it would be a worthwhile endeavor.
>>
>>101097950
>t-they were just overcooked
Excuses. Even if medium cooked it's still trained by looking at how words relate to other words, the same way it gets overcooked.
>>
>>101101560
>Stheno
>Fimbulvetr
On a Sao-only diet?
>>
>>101102072
Oh hey, some of my messages in there.

That thing is cooked as fuck. It's nice that it can generalize on its training data, but it's obviously regurgitating specific training it's had on what AI is.
>>
I still don't understand how people are able to use 8B models. The 70B erp tunes are already brain-damaged...
>>
>>101102197
faster spins of the token roulette for another microhit of dopamine when you get the very specific output you want
>>
If you had the ability to have your AI continuously learn, but its responses were 3 times slower, would you enable that ability at the cost of speed, or would you keep it as it is now?
>>
>>101102254
Like a super fast LoRa training?
Fuck yeah. Then I'd toggle it off after a while.
>>
>>101100648
fucking seriously? even coding model requires refusal removal now?
>>
>>101102197
because the gap closed lately, you think that 70B is 10x smarter than 8B while in reality 8B is like 90% smart as 70B
>>
>>101102321
I think you're mentally ill.
>>
>>101102197
70B models aren't 9 times better than 8B models.
>>
>>101102321
>8B is like 90% smart as 70B
Lol.
I suppose that's true if what you're doing isn't very demanding.
>>
>>101102321
I wish I could call this cope but trying miqu made me stop giving a shit. it's just weak data all around
>>
>>101102197
Rich people don't understand what it's like to be poor.

If you're poor you learn as a normal behavior dealing with inadequate things because that's all you have and if you complain you get nothing. (Because when you were a kid, if you complained, what little you had was taken away and given to a more appreciative sibling if possible.)

If 8B is what you can run then you choose to be happy with that.

>>101102254
Following the 8B train, if 8B could be three times the processing time but improved through use so it could be trained (with a journal; you'd want to be able to selectively zot parts that suck or at least save state it so if something goes weird you can fix it) then it'd probably be worthwhile.

I'm 1 T/sec on 55ish GB models and that's my limit. Trying Llama 3 Q8 at 70 GB and it's glacial. Like, one token generated when I started writing this post and I'm still waiting on the second. So it'd only be attractive on a heavily quanted edition to make it fast enough to be worth tripling. (Oh, there's token 2)

I think what would happen is a cottage business. Invest in a power rig, train a model to commissioner's specifications, then sell the journal so they can use that like a LoRA on their model with learning disabled so it's fast enough again.
>>
>>101102321
Nah, not "smarter".
But for RP? Yeah the gap is a lot closer nowadays. Maybe not 90%, but compare the old 65b to the current 8b and you'll see how far things have come.
>>
>>101102321
there are a lot of things 8b simply can't do but 70b still can. even if 8b is fine for a lot of simple things, if you go beyond those at all it's just not an option
>>
Is CR+ a smellfag? It suddenly talked about a pleasant fragrance when nothing mentioned smell in the context.
>>
What is the local alternative to Luma AI?
>>
>>101102450
2MW on huggingface
>>
>eye sparkling
NOOOOOOOOOOOOOOOOOOOOOOOOOO
>>
File: 1703884599456.gif (31 KB, 220x223)
>>101102321
>in reality 8B is like 90% smart as 70B
>>
>>101102321
8B is 90% as smart as 70B and 70B is 50% as smart as a good model
>>
>>101102321
>8B is like 90% smart as 70B
LMAOOOOOOOOO
>>
when will ai be able to neuralink my brain into a custom hentai fantasy
>>
>>101102592
Give it another 10 years, at which point you will likely be told to give it another 10 years.
>>
>>101102592
yes but it will be strictly PG-13 and if you try to do anything funny it will give you a strong electric shock and fill your vision with flashing warnings about keeping things safe and respecting boundaries
>>
>>101102321
I actually agree with this, but only in limited circumstances. For basic characters, straightforward plots, generic sex scenes, 8b really is 90% as good as 70b. But the moment you get into things like stat tracking, multiple characters, odd fetishes, characters with ulterior motives and hidden motivations, cards with weird rules that go against natural reality, etc, 8b just falls apart. While 70b+ generally handles even fairly complex things well.
>>
>>101102810
fine-tune issue
>>
>>101102810
Nah, 8B is unusable stupid. /lmg/ just has shit taste and a need to cope. /aicg/ is the only place where you can get actual opinions about models.
>>
>>101102450
OpenSora recently released their 1.2 version, I've never seen it talked about here
cba to set it up on my computer but here it is if you want to try https://github.com/hpcaitech/Open-Sora
>>
>>101102856
All local models are dogshit then because they use ACTUAL quality over there so what is your fucking point
>>
>>101102905
>OpenSora
I'm not even going to click on the link with a scam name like that.
>>
>>101102922
and you are right, this model sucks ass, but to be fair, I'm pretty certain that if you wanna reach Kling/Luma/Gen3 quality, you'd need fucking 60-70 gb of vram, and in terms of hardware we just can't eat that, thanks Nvidia :)
>>
>>101102922
it's a github link you fucking retard
>>
>>101102933
Nvidia has no reason to cater to poor gooners like /lmg/
>>
>>101102856
Skill issue. Wizard7B is good
>>
>>101102856
it really does depend what you're using it for. If you don't use it for actual intelligent uses like coding or complex roleplay situations, then it doesn't matter that it's stupid.
>>
>>101102943
yeah, their business model is just perfect: wanna get a 24gb vram card? fine, go for the 3090, it's a thousand dollars. What? You want twice the vram? Sure, but the price won't just double, now you gotta pay 15000 dollars
>>
File: lol.png (40 KB, 595x295)
>>101102943
shut up bitch
>>
>>101102973
I think you don't even use these models.
>>
>>101102998
you're proving his point, Nvidia has no reason to cater to regular users, they're making so much money scamming big companies with ultra expensive gpus
>>
>>101102998
Oh nooo they're only the 3rd most valuable now! They should start selling 24GB sticks for 49.95 each!
>>
>>101103025
>they're making so much money scaming big companies with ultra expensive gpu's
the companies know they're getting scammed, but what's their alternative? Using AMD? Pfft... AHAHAHAHAHAHAHAHAH
>>
>>101103057
Dear god we need an antitrust suit. AMD has literally zero chance to compete because CUDA is proprietary yet entirely, indisputably necessary for an increasing number of intensive tasks. I mean seriously. AMD and Intel do not have the tools to compete in any meaningful way. Nvidia gets first pick on server hardware, Nvidia gets first pick on software support, how is any of that supposed to change without a serious breakup?
>>
>>101103114
AMD is not here to compete, it just makes Nvidia not look like a monopoly. They're doing everything they can to not compete with Nvidia
>>
>>101103114
>AMD has literally zero chance to compete
Because they decided not to from the beginning.
>>
File: fuck.jpg (209 KB, 1821x1579)
>>101103133
>AMD is not here to compete, it just makes Nvidia not look like a monopoly. They're doing everything they can to not compete with Nvidia
yep, the Nvidia CEO has relatives at AMD, they're working together to make it look like there's competition but in reality AMD is letting Nvidia take all the cake
https://www.yahoo.com/tech/jensen-huang-lisa-su-family-132052224.html?guccounter=1
>>
>>101103133
If that's their goal they're doing a terrible job. The real reason antitrust can't happen is because the tech industry is putting all the chips down on AI and nobody wants to risk collapsing a house of cards by breaking up the shovel salesman

>>101103151
I mean they could pivot to more consumer-sided things, but they are intent on mimicking Nvidia while always doing significantly worse
>>
>>101103164
>they're working together
no
>>
>>101103232
>The real reason antitrust can't happen is because the tech industry is putting all the chips down on AI and nobody wants to risk collapsing a house of cards by breaking up the shovel salesman
this, and also the fact that Nvidia is a US company. Nvidia making a shit ton of money means the US government also makes a shit ton of money through taxes, it's a system that won't be beaten anytime soon, I'll bet consumer cards will still be under 48gb for my whole lifetime
>>
>>101103114
>>101103232
antitrust could happen if communists succeed in ruining the economy
>>
>>101103320
there should be a middle ground between this current capitalist system and communism though, Nvidia can't just dominate the market like that, that's not a sane market at all
>>
go back
>>
monopoly man bad
gobment says so
>>
>>101103350
>gobment says so
15000 dollars for a 48gb vram card also says so
>>
Name 1 instance of anyone asking for a middle ground and actually proposing a feasible system that doesn't involve going to Narnia
>>
>>101103003
Honestly, I don't. I use WizardLM 8x22b
>>
>>101103369
enterprise hardware has enterprise prices, shocker
>>
>>101102810
since the 10% gap, yeah
>>
>>101103391
that's why there should be a middle ground, that's just a fucking scam at this point, the simple fact that you agree with these kinds of practices shows how brainwashed you are, this shouldn't be a normal thing at all
>>
File: belieb.png (157 KB, 727x581)
>>101103114
hey, don't sell intel short. nvidia is a well-oiled machine doing a great job and asking for an even greater premium. amd sucks so fucking hard at writing software that intel decided to compete and is already in some ways better than amd. it's just a youngling in the race.

on a related note, picrel when and for how much? it seems like the 5090 will be 32GB, which would put picrel's price at $6-10k
>>
>>101103371
Go back to the gulag
>>
>>101103413
didn't they say it was $16k recently
>>
>>101103413
I heard somewhere Gaudi 3 will be around 13k
too much imo
>>
OK, fine, I embrace the sparkling eyes. I'm happy. I like it now, even. It's great. Wonderful.
>>
>>101103413
I wonder how well their software bridge works.
>>
>>101103440
>$16k recently
what? 16k for the 5090 is this a fucking joke?
>>
>>101103451
no, for gaudi 3
>>
>>101101579
>>101101643
It even has the training settings he used.
Bless that man and bless you anon.
>>
>>101103412
>you agree with this kind of practices show how brainwashed you are
nta but r*ddit is literally designed to train them like that, with its karma system & hordes of AI bots shitting out the govt-approved narrative.
>>
8B isn't comparable to 70B but I don't have the patience for 70B even on VRAM. At minimum, Stheno is honestly smarter than command-r 35B even if llama's slop dataset spoils things a little.

That 48 gigs of vram I bought sure was money well spent.
>>
>>101103457
Is that Intel PCIe card as fast as an H100 with 128GB VRAM? 16k isn't that bad. For the speed and power savings alone it'll be the new meta.
>>
File: Gandhi-3.jpg (70 KB, 600x822)
70 KB
70 KB JPG
>>101103457
>buy Gandhi for 16k
>he just strolls around the house, outputs random philosophical quotes and sometimes tries to convince me to use nukes
not worth it in my opinion
>>
>>101103489
yeah, that sounds good but I'm afraid the CUDA ecosystem is far too deeply embedded in engineers' and data scientists' heads, it's like switching from C++ to Ruby after using C++ for decades, not a lot of people are willing to take the risk and not a lot of people would be able to make it work in the first place
>>
>>101103539
>C++ to Ruby
Does that analogy hold? What about Ruby makes it a fit for engineers/data scientists?
>>
>>101103485
Reddit's lack of thread bumping and its karma system basically incentivize parroting, with slight adjustments, whatever was popular the last time a topic was posted. It's a good system when you are looking for community consensus, like tech support, product recommendations, or work-related advice.
Awful for conversations or debates.
>>
>>101103350
capitalism works because it's a competition. it doesn't work when one player removes all the tools the other players have to compete. then it's not a competition, and the price of an item nobody else can make can be gouged because hell, it's not like anyone's going to undercut something they cannot make.

>>101103539
Exactly. It's service lock-in. If Intel made switching from CUDA to Intel easy, Nvidia would lawyer up.
>>
>>101103539
At least people like cudadev are willing to go with the best performance per dollar in the consumer space. So if Intel starts challenging Nvidia on the hardware front, the software will follow, I think.
>>
>>101103539
You start by making your GPU CUDA-compatible by reverse engineering it, then develop and provide for free the tools to run AI on your GPU. Done
>>
>>101103489
only $48,000 to run 405B with partial offloading, $64,000 to fit the whole thing
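Rough math, assuming the rumored $16k per 128GB card and roughly one byte per parameter at 8-bit: 405B params is about 405GB of weights, and 405/128 ≈ 3.2, so four cards ($64k) to hold everything in VRAM, or three cards ($48k) with the remainder spilling into system RAM.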
>>
Do we still struggle with bonds and journeys?
>>
>>101103579
>You start by making your GPU cuda-compatible by reverse engineering it
isn't that what AMD tried to do but failed at?
https://github.com/vosen/ZLUDA
>>
File: file.png (2.96 MB, 2040x1536)
2.96 MB
2.96 MB PNG
two more coming in the mail :^)
>>
>>101103552
yeah my b, I meant to say "C++ to Rust"
>>
>>101103629
>Tried
No, they just funded some random guy and pretended to do something, see this >>101103164
>>
File: file.png (58 KB, 542x304)
58 KB
58 KB PNG
>>101103643
i kind of went balls to the wall with the cpu. i am building this rig for training but i guess i can double my ram and fit some pretty big models on it? dunno if it's worth it desu, i don't really know how computers work
>>
>>101103622
I don't mind journey, bonds, shivers, etc, as long as the model can keep up with the roleplay without creating contradictions, getting anatomy wrong, mixing up characters, etc.
>>
>>101103561
>Awful for conversations or debates.
it would work fine if the moderators weren't there to remove people who aren't saying the "status-quo message", that's how you make a sect: you remove all the bad apples and keep the good goys, and you end up with a fucking circlejerk subreddit where everyone thinks the same
>>
>>101103648
what makes that true about rust?
>>
>>101103643
>another owl bro
You love to see it.
>>
>>101103658
>No they just fund some random guy and pretended to do something, see this >>101103164(You)
yeah but the simple fact that AMD allowed him to make his project open source is kinda "dangerous" for Nvidia, what if the open source people manage to make it work now that they have the code?
>>
>>101103622
Just tell it not to do that shit and it won't.
>>
>>101103658
>>101103694
Are you talking about geohotz?
>>
>>101103680
my point was that when something stays popular for too long, it's hard to switch to something else because people have spent too much time mastering the popular thing in the first place and aren't willing to venture into new territory all alone
>>
So far IMO...
-L3-8B-Stheno-v3.2-Q8_0-imat is nice, but a pushover, even a dom card becomes a sub after a handful of msges, like when a card tells you to beg, you tell it no, it will get angry and some cards can even try to do things like whip you, stab you, kill you, but you slap them a few times and they become a massive sub.

-L3-SthenoMaidBlackroot-8B-V1.Q8_0 feels dumber than Stheno and gets stuck in a loop, keeps repeating lines like face getting redder, don't know if it's the OAS version on horde.

-Fimbulvetr-11B-v2.Q8_0 feels a bit lacking compared to Stheno, like it goes nowhere, and it often tries to speak for you or dictate your actions.

-Merged-RP-Stew-V2-34V.i1-Q4_K_M is slow, and got stuck in a loop in under 20 msgs when it comes across something it doesn't like, like telling a store clerk to demo something and it'll say it's against store policy and keep repeating itself like a broken record.

-Poppy_Porpoise-1.4-L3-8B.Q8_0 is dumb, it thinks bunny suits are the furry easter egg bunny kind in a bar setting, it doesn't understand what birth control is, etc.

-DeepSeek-Coder-V2-Lite-Instruct.i1-Q6_K, that other anon was right, it's too lobotomized for RP.

-LLaMa2-13B-Psyfighter2 seemed decent at first, but kinda wants to just throw itself at you, and has a very very short memory, like it forgets things from 2 msgs ago even when I always have 8192 context size. It doesn't understand that you can't see when you have a blindfold on. All in all, pretty dumb.

-L3-70B-Euryale-v2.1, don't have the VRAM to run locally, and the queue on the Horde is too long for me to really put it through its paces; felt like a more refined Stheno from the few msgs that I got from it.
>>
>>101103678
Mods or not, it wouldn't work unless you have an even number of participating users on both sides of whatever issue, which is unrealistic.
That, or users with the self-restraint not to downvote opinions they disagree with. You can only have that in small communities of high-quality users.
>>
>>101103785 continued

-echidna-13b-v0.3 uh this was mentioned in one of the guides in the OP. It's not very good IMO, it starts off trying to throw itself at you like Psyfighter2, it then confuses genders, and then starts trying to write your thoughts and actions.

-L3-Nymeria-8B.i1-Q4_K_M seemed decent, pays attention to details like the shape and material of things, has a lot of thoughts, but is quick to dictate morals to you, refuses things, writes your actions, does time skips of weeks to months, and then goes on an emotional nosedive trying to be all emo for no reason.

-L3-Arcania-4x8b.i1-Q4_K_M likes to go into detail about their actions, emotions and thoughts, but has problems with genders and actual logic, like you go into an equipment shop to look for female equipment, ask the female shopkeeper to demo it, she tells you they don't have demo units, but then asks if you want to try it on... female equipment, when you're male. But at least it doesn't try to dictate your morals, time skip and go all emo like Nymeria does...

-Hathor-L3-8B-v.01-Q5_K_M-imat this one likes to repeat certain lines or actions, is happy to please, and doesn't think about refusing anything. It's pretty descriptive, the materials, textures and temps of things, but again, it doesn't understand that you can't see through blindfolds, and it seems to have trouble when you type more than a few sentences, like it just ignores the latter parts. It's like they spent all the stat points on describing things and ran out of points. Doesn't really do anything but describe things and reply to you agreeing and waiting for you to say and do things. Understands what a gag does but not a blindfold.
>>
>>101103785
>>101103803 continued

-v2_Kunocchini-7b-128k-test-Q8_0-imatrix this started off amazingly, it actually remembered little details from a character card's opening msg, and it likes to ask questions. It understands things like what blindfolds and gags do. It's not a complete pushover while still being mostly submissive. It seemed to have decent memory at the start, remembering longer than Nymeria, Arcania or Hathor, but then it suddenly forgot things from just 2 msges earlier once I got to around the 24-msg mark.

-Kunoichi-DPO-v2-7B-Q8_0-imatrix I believe this is what Kunocchini was based off of? The first post it made was similar to Kunocchini, but then it completely falls apart, felt like it was censored or something; at least it didn't get stuck in a loop and kept trying to skirt around things instead.

-L3-TheSpice-8b-v0.8.3-Q4_K_S felt similar to Poppy at the beginning, and then randomly had Chinese lines in its reply.

-Llama-3-Lumimaid-8B-v0.1.q8_0 a pushover that doesn't seem to understand what blindfolds are, replies are really short compared to everything else, probably lobotomized too far down from its 70B counterpart.


IMO, Stheno is high up there but such a pushover...
Kunocchini holds promise if the memory issue can be fixed.
Everything else seemed broken or meh; it's probably not ideal that Stheno was the first model I came into contact with. I don't have experience with paid online services like OpenAI, NovelAI, Claude or other things.

Anything else under 24GB that I can try out?
>>
>>101103800
>Mods or not, it wouldn't work unless you have an even amount of participating users on both sides of whatever issue, which is unrealistic.
if the "tiny %" of bad apples had zero impact, there wouldn't be any moderation in the first place; they know they have to remove everything that goes against the status quo

Tbh I did find a system like that, reddit without much moderation, and it's 9gag, and that site quickly turned to the right/conservative side. I guess leftists can only exist with overcensorship, oops I said it kek
>>
>>101103785
My friend, get hardware. You're wasting your time with lowBs.
>>
>>101103833
Have you tried partially offloaded mixtral, either limarp zloss or gritlm?
>>
Models that have real-time video understanding and can play games with you when? I'm tired of just interacting with text.
>>
>>101103725
what makes that true?
>>
any good erp model for a 20GB VRAM GPU, 32GB RAM, Ryzen 7 5700X?
>>
>>101103904
It took Meta a full year to add image support to their models and we don't know how shit it will be yet. Maybe in 2 or 3 years they'll add video to their 3B and 900B models.
>>
>>101103904
In theory, with enough hardware, you could create a system that feeds frames to a model and uses its output to control the game in real time.
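Very rough sketch of what that loop could look like in Python, assuming you have some multimodal model behind a local OpenAI-compatible endpoint; the URL, port, and toy action list are all made up, and whether your particular server accepts image messages is on you to check:

import base64, io, time
import requests
import mss                       # screen capture library
from PIL import Image

ENDPOINT = "http://localhost:8080/v1/chat/completions"   # assumed local server URL
ACTIONS = ["left", "right", "jump", "wait"]               # toy action space for the game

def grab_frame():
    # capture the primary monitor and return it as a base64-encoded JPEG
    with mss.mss() as sct:
        shot = sct.grab(sct.monitors[1])
        img = Image.frombytes("RGB", shot.size, shot.rgb)
        buf = io.BytesIO()
        img.save(buf, format="JPEG")
        return base64.b64encode(buf.getvalue()).decode()

def ask_model(frame_b64):
    # send the current frame plus an instruction, get back a single action word
    payload = {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": f"Pick one action from {ACTIONS}. Reply with the action only."},
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64," + frame_b64}},
            ],
        }],
        "max_tokens": 5,
    }
    r = requests.post(ENDPOINT, json=payload, timeout=60)
    return r.json()["choices"][0]["message"]["content"].strip().lower()

while True:
    action = ask_model(grab_frame())
    print("model says:", action)   # wire this into pyautogui or similar to actually press keys
    time.sleep(0.5)                # "real time" only if your hardware keeps up

The hard part is everything this skips: latency, the model hallucinating actions, and keeping any memory of the game state.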
>>
>>101103859
A single 3090 is as good as it gets for me, heck I wouldn't even have been able to afford a second-hand 3090 if it weren't for the crypto craze; power and heat are also issues, 40C/104F with 43C/110F real temp outdoors, 34C/93.2F indoors, and 24 cents/kWh electricity costs.

>>101103890
No, I've not tried anything that exceeds my VRAM capacity. Wouldn't that take a super long time to do anything? RP Stew was already taking nearly 160 seconds
>>
File: IMG_0616.jpg (2.85 MB, 4032x3024)
2.85 MB
2.85 MB JPG
>>101103643
Mine has just been sitting around because I'm autistically trying to come up with a way to dustproof this shitty mining frame before I add in the stuff from my main llm rig.
>>
>>101103904
there are some game-playing models, but they have no text component
https://danijar.com/project/dreamerv3/
>>
Believe in Ursidae-300B.
>>
>>101103953
instead of having preprogrammed animations, it could generate them on the fly
>>
>dust
Just leave it as an open rig and invert it instead. That makes cleaning a lot easier since you only need to blow out the hollow components.
>>
Sometimes I don't know what to think.

I wanted to see how well our friend CR+ knows monster movies, so I asked it to do some roleplay from the premise that we start with me doing something that triggers an event like something out of a goofy 80's creature film, but with characters I specify.

>CR+ Q4_K_M
It worked, but kinda slow because vramlet.
I change a few settings in Kobold and up the layers over the automatic suggestion. (Well, I doubled it and ran out of RAM but I dropped it by one and then it loaded okay.)

Same prompt, same model go.
It writes the scene, but more elaborate.
Then it adds [End of Part].
Then it gives a word count. (An accurate one.)
Then it tells me to feel free to continue it.
Then it starts writing emoji. It wrote in emoji a precise description of what happened in the scene.
Then it adds a horizontal rule, adds a note expressing its intention with the mood of the scene and requests any adjustments I might like.
Then it wishes me a good day.
Then it adds a P.S. saying that my choice of a film scene was interesting and can be explored further.
Then it said that this concludes the narrative.
And then it generated blank lines till I hit Abort.


However unnecessary, this is the kind of fun shit I play with LLMs for, and now I'm worried that it's a one-off flash of lightning in a bottle and it'll never be cool like this again. Because this feels like AGI came out of nowhere just to pull my chain.
>>
>>101103968
>No, I've not tried anything that exceeds my vram capacity wouldn't that take a super long amount of time to try to do anything?
Not really. As long as you can offload at least 80-85% of the model I think you'll find the speed acceptable.
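For example, with llama-cpp-python it's a single knob; this is just a sketch, the filename and layer count are placeholders you'd tune until the quant barely fits your 24GB:

from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct.Q4_K_M.gguf",  # placeholder path to whatever quant you grab
    n_gpu_layers=28,   # as many layers as fit on the 3090; the rest run on CPU
    n_ctx=8192,
)
out = llm("[INST] Write a short scene. [/INST]", max_tokens=256)
print(out["choices"][0]["text"])

koboldcpp's --gpulayers option is the same idea if you'd rather stay in the UI.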
>>
File: 1475637741539.png (450 KB, 637x475)
450 KB
450 KB PNG
>>101104035
Just do this and blow the dust away with your compressor
>>
Just tried L3-8B-Poppy-Sunspice via the Horde. I sent my first msg after a card's standard opening msg and its first reply ignored me, rehashed the opening msg and just immediately looped lol... completely unusable, yet it has nearly 400 in the queue...
>>
>>101104090
OK what advantages does THAT give? More desk space? What the hell.
>>
>>101104125
Less noise.
>>
>>101103833
https://huggingface.co/turboderp/llama3-turbcat-instruct-8b
>>
>>101104048
I like playing around with that kind of thing too, asking the model to do meta analysis of the scenes, make suggestions, etc; that's why I made that state prompts extension, to do that kind of thing with 8b models without confusing the shit out of them in the process.

>>101104125
Omnidirectional airflow.
>>
>>101104148
>Less noise.
you could've just decreased the fan speed and undervolted the cpu a bit; turns out that going from 100% fan speed to 70% doesn't make much of a temperature difference, and it's way less loud that way
>>
>>101104163
That one is about as good as stheno in my experience, so I'm seconding the suggestion.
>>
>>101104172
Yeah, do all that shit after you hang your PC on the ceiling for even less noise.
>>
Hey is there a GGUF out for the new Sao10K/L3-8B-Stheno-v3.3-32K yet?
>>
>>101104164
What astounds me is that it threw a whole stack of OOC postscript features at me at once, unbidden, till it eventually ran out and dumped blank lines (there were many blanks between each feature as well).

It didn't keep doing that, so I started asking it for emoji summaries for the fun of it. But I'm also worried that since I left it on the default 2k context this session will turn to crap soon. I save-stated it, but again, who knows if it will be awesome again.

>state prompts extension
I'm unfamiliar with this term.
>>
>>101104257
https://huggingface.co/mradermacher/L3-8B-Stheno-v3.3-32K-GGUF
It was pretty shit compared to 3.2
>>
>>101104369
This thing https://github.com/ThiagoRibas-dev/SillyTavern-State

>>101104374
How does it compare with using NTK or YaRN?
>>
after finishing an RP it has hit me that i have a separate computer sitting in a closet that cost $4000 and is used just to cum
where did it all go wrong
>>
>>101104410
people waste a lot more money on things that offer even less
>>
>>101104410
Are you within your means and getting the equivalent satisfaction out of it?
Then good.
Donate some processing to a folding@ho.e project when you aren't using it and you'll do some good in the process of cooming too.
>>
>>101104443
>folding@ho.e
Old meme that got absolutely BTFO by the Alphafold AI
>>
>>101104443
>folding@ho.e project
is this some sort of distributed sex chatbot project? i would donate
>>
File: 1706287296829.jpg (51 KB, 680x431)
51 KB
51 KB JPG
>>101104410
>where did it all go wrong
the gender war happened
>>
>>101104483
holy shit i was not aware south koreans were based?
>>
>>101104163
>>101104199
Quick testing doesn't really impress me: it also fails to understand what a blindfold or a gag is, it forgets things from 2 posts earlier by post number 24, and it also started trying to write lines and actions for me.

-v2_Kunocchini-7b-128k-test-Q8_0-imatrix felt better to me...
>>
>>101104500
that's the country with the lowest birth rate, it will probably die out within 50-100 years, men and women literally hate each other there kek
>>
>>101104500
megalia
>>
>>101095184
https://www.characterhub.org/characters/Vyrea_Aster/doppelganger-interrogation-simulator-654daf19
>>
>>101104500
>holy shit i was not aware south koreans were based?
South Korean guys resent women because they're the only ones who have to do two years of military service. Like usual, women live on easy mode.
>>
Yeah, it feels like L3-8B-Stheno-v3.3-32K-GGUF is all over the place: it's forgetting gender, tries really hard to write actions for me and ignores what I say. Unusable...
>>
>>101103833
Try Wizard7B
>>
>>101104692
Well, guess regular Stheno with YaRN is still the go-to then.
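If anyone wants to try stretching regular 8k Stheno themselves, llama-cpp-python exposes the rope knobs; sketch below, and the yarn parameter names/enum value are assumptions you should check against your installed version:

from llama_cpp import Llama

llm = Llama(
    model_path="L3-8B-Stheno-v3.2.Q8_0.gguf",  # placeholder filename
    n_ctx=32768,
    n_gpu_layers=-1,       # the whole 8B fits on a 24GB card
    rope_scaling_type=2,   # 2 = YaRN in llama.cpp's rope-scaling enum (assumption, verify for your version)
    yarn_orig_ctx=8192,    # the model's native training context
)
# simpler fallback that definitely exists: drop the two yarn lines and set
# rope_freq_scale=0.25 for plain linear 4x context stretching instead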
>>
>>101104740
>Wizard7B
are you talking about Wizard Vicuna 7B Uncensored? Won't that be too far lobotomized compared to the full Wizard?
>>
Whose future models do you expect will be better, Qwen or Deepseek?
>>
>>101103904
>real-time video understanding
Just build a system for that with llava
>can play games with you
Depends on the game and your ability to build an API for your LLM to play it
>>
>>101104774
>>101104774
>>101104774
>>
Who wants to help me build AGI? Looking for this skillset
- Self motivated
- Pure C programming
- Experience crafting machine learning algos from scratch
- Ability to read research papers and implement in code
>>
>>101104777
This one https://huggingface.co/bartowski/WizardLM-2-7B-exl2
>>
>>101104631
>two years' military service
two years of ntr
>>
>>101103488
Stheno is a retarded coom tune.


