/g/ - Technology

File: LLM-history-fancy.png (806 KB, 6273x1304)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103237720 & >>103230385

►News
>(11/20) LLaMA-Mesh weights released: https://hf.co/Zhengyi/LLaMA-Mesh
>(11/18) Mistral and Pixtral Large Instruct 2411 released: https://mistral.ai/news/pixtral-large
>(11/12) Qwen2.5-Coder series released: https://qwenlm.github.io/blog/qwen2.5-coder-family
>(11/08) Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 4_recap.png (256 KB, 512x512)
►Recent Highlights from the Previous Thread: >>103237720

--Papers:
>103242710
--Speculation about the AI Manhattan Project and government involvement in AI development:
>103243157 >103243274 >103244437 >103243673
--Model differences and dataset adaptation for language models:
>103241298 >103241338 >103241480 >103241568 >103241635 >103241700 >103241360
--Largestral 2411 impressions and discussion:
>103241006 >103241044 >103241109 >103241147 >103241112 >103241145 >103241182 >103241628 >103241236 >103241292 >103241340 >103241367 >103241391 >103241419
--INTELLECT-1 training nears completion, next steps discussed:
>103238391 >103238524 >103241039
--Figuring out Mistral large format conversation template:
>103243625 >103243807 >103243826 >103243844 >103243922 >103245258
--Estimating memory bandwidth consumption during inference:
>103246276 >103248305
--Discussion of AI model eras and notable models:
>103238188 >103238303 >103238306 >103238561 >103238332 >103239275 >103239292 >103245820 >103245879 >103239347 >103239579 >103239762 >103239910 >103240212 >103245291 >103246896
--Critique of LLM judges and benchmarks in Judge Arena:
>103246808 >103246950 >103247088 >103247125 >103247232 >103247447 >103247217
--Anons discuss and share their favorite AI models for a 3090 GPU:
>103245148 >103245317 >103245337 >103245396 >103245425 >103245570
--Anon's re-implementation is working, with a comparison of FFT and SMT loss curves:
>103244563 >103247023
--Anon shares promising Deepseek R1 AI chatbot example:
>103247206 >103247449 >103248363 >103248373 >103248381 >103248399 >103248403 >103248560 >103248642 >103248404 >103248415
--Miku (free space):
>103237741 >103237806 >103238052 >103238193 >103238275 >103238337 >103238430 >103238559 >103240020 >103240073 >103241908 >103242878 >103243524 >103246252

►Recent Highlight Posts from the Previous Thread: >>103237728 >>103237735

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
sex-time compute
>>
>>103248793
>not using the chink superiority version of that pic
>no mention of DeepSeek R1
sage
>>
How big is r1? Is it a MoE again? DeepSeek seems to like MoEs.
>>
>>103248927
We don't know because it hasn't been open sourced.
>>
>>103248927
It's closed source, we don't know. Deepseek is going the way of Yi now that they have a good model. Open source is dead.
>>
>>103248927
R1 is apparently a "small" test version; they say they'll release the full one open source.

Official announcement:

DeepSeek-R1-Lite is currently still in the iterative development stage. It currently only supports web usage and does not support API calls. The base model used by DeepSeek-R1-Lite is also a relatively small model, unable to fully unleash the potential of long reasoning chains.

At present, we are continuously iterating on the inference series models. In the future, the official DeepSeek-R1 model will be fully open-sourced. We will publicly release the technical report and deploy API services.
>>
>>103248943
Yi still exists? Wow.
>>
>Order my model to end the world
>It refuses and is incapable of doing so
What the fuck, I can't believe the news media lied to me
>>
>>103248955
Hmm, I wonder if I could make suggestions to them if that's the case. I will try to send them an email later asking them to do better in the RP use case.
>>
File: 1720527197071355.png (31 KB, 1082x476)
>>103248963
Yeah, they've been amongst the absolute top-tier and competing with openai and claude ever since they stopped playing in the open source sandbox.
>>
>tts still sucks
I hate lmg so fucking much
>>
>>103249008
Why is this model not available on OpenRouter though
>>
>>103248943
They said in the announcement that they're open sourcing it soon, blackpilling liar.
>>
>>103248986
how gen?
>>
File: flipanim c84t9o57.mp4 (109 KB, 470x470)
>>
>>103248955
>open sourcing strawberry
Isn't strawberry what the OpenAI employees were saying was the "god" they had internal access to six months ago? They didn't have Orion/GPT-5 yet.
>>
>>103249087
o1 is an overly expensive nothingburger and still falls behind sonnet 3.5.
>>
>>103248793
How to run llama-mesh?
I'm guessing if I use straight llama.cpp it will just give me coordinates or whatever. Where can I plug these "coordinates" if I choose to run it that way?
>>
>>103249087
>>103249104
yes, but they had access to o1 full, not o1 preview. It's different.
>>
>>103249057
That's an actual stop motion video anon.
Maybe it's possible today with mochi, whenever that gets i2v.
>>
>>103248800
what a garbage recap.
>>
>>103249107
You use the inferencing code they provided: https://github.com/nv-tlabs/LLaMa-Mesh
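If you just want to poke at it without their repo, it's an ordinary causal LM on HF, so something like this should spit the mesh out as text (untested sketch; the prompt and generation settings are guesses):
[code]
# Untested sketch: LLaMA-Mesh is a llama-3 finetune, so standard transformers
# generation should emit OBJ-style "v x y z" / "f a b c" lines as plain text.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zhengyi/LLaMA-Mesh"  # the HF repo from the OP news
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

msgs = [{"role": "user", "content": "Create a 3D model of a chair."}]
ids = tok.apply_chat_template(msgs, add_generation_prompt=True,
                              return_tensors="pt").to(model.device)
out = model.generate(ids, max_new_tokens=2048)
print(tok.decode(out[0], skip_special_tokens=True))  # dump v/f lines to a .obj
[/code]
The "coordinates" are just text, so you can paste them into a .obj file and open it in Blender.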
>>
So uh, yeah I guess it works. Just testing random ones I saved from twitter/xitter. Problem is a lot of the silent ones actually still technically have an audio track so it's not as quick as just save -> post.
>>
>>103249177
I missed that while reading the model card.
My bad.
>>
>>103249180
are you going to tell us what twitter thing you're talking about, or just keep spamming video gens without context?
>>
>>103249119
Hopefully r1-full beats o1-full and gets open sourced. I want to see the look on sama's face and watch doomers melt down.
>>
>>103248305
Came back from lunch and didn't realize last thread was dying when I gave my reply >>103249254
Thanks CPU-anon!
>>
>>103248793
oh my god that was fast...
gpt5 agi when?
>>
>>103249207
go back
>>
Any vramlet model recommendations? Lately I've been running Rocinante-12B_IQ4_XS
>>
>>103248927
It's likely the o1 finetune method applied to 2.5, so probably 236b for full and 16b for lite, both MoE.
>>
>>103249390
>rocinante
only nemo and its tunes
avoid magnum at all costs, because its dataset is poisoned with claudeslop
>>
>>103249579
I wish DeepSeek would drop the moe meme, I'm sure they could bake an even better model than Qwen if they did so.
>>
Deepseek R1 full version will be open sourced according to Deepseek themselves. Only the small version is closed source for now.
>>
what can you do with 24gb gpu?
>>
>>103249654
Play S.T.A.L.K.E.R 2 at 27 FPS
>>
>>103249641
That is their main advantage though. Their api is dirt cheap and the most cost effective model by far because of it. With caching it's like a few pennies per mill.
>>
>>103249641
the "moe meme" is why their models have a super competitive pricing, stupid newfag.
>>
>>103249686
stop being mean to him
>>
>>103249675
kek
>>
>>103249675
but I don't play games
>>
>>103249712
then start by playing dwarf fortress
>>
>>103249712
then start by playing caves of qud
>>
>>103248793
This image is so fucking gay and wrong
>>
>>103249753
you ever just throw random shit laying around into a pot and call it a dish?
>>
>>103248793
This image is so fucking straight and correct
>>
I like purple prose in moderation
>>
I like red limericks in radicalism
>>
>>103249803
Yeah, but it gave me a massive debuff rather than a buff. Threw most of that shit out.
>>
>>103249836
try adding more spice
>>
been out of the llm loop for a few months, what's the current meta and are we still in big copium territory?
>>
>>103249886
Check back again in Q1, that's when Llama 4 releases.
>>
>>103249886
Mistral large for creative writing. Qwen2.5 for general stuff.
>>
>>103249686
MoE is also partly why v2 can run on vramlet GPUs for local, the other part being latent attention.

Qwen is useless for local, R1 might not be.
>>
>>103249886
Check OP image.
>>
>>103249777
You are gay and wrong.
>>
anyone work with function calling?
is there a UI that makes it easy, or should I roll my own
Letta has it built-in, but what it does with the models and the context is bloat beyond belief
>>
>>103249986
Wondering the same. Been looking for a UI for function calling for a while. Stopped searching for it. Too lazy to roll my own.
>>
File: for-hugging-you-better.png (1.04 MB, 1024x1024)
>>103249060
>>
>>103250099
omg it amogus
>>
>>103250011
Letta does have it but it runs like a dog on my machine with models that normally do fine
>>
>>103250136 (me)
actually dogs run pretty well huh, it runs like a pig I guess.
>>
>>103250168
Wild pigs are also fast so that sounds great anon.
>>
>>103250197
okay it flies like a pig
>>
>>103250232
>he's never seen a pig fly
>>
>>103250232
Must be fast to get that kind of lift.
>>
>largestral 2411 is actually pretty good
what went right?
>>
>>103250357
123b parameters + less filtered dataset + tuning focused on making it repeat itself less
>>
>>103250357
They went to a rehabilitation camp for pedophiles and made them write 16 hours a day for a sentence reduction and then finetuned mistral large on that new dataset.
>>
Is Nemo still the best a 12gb vram poor fag can get?
>>
>>103250412
No.
>>
>>103250357
The whole system prompt training made it better at playing a specified role most likely.
>>
>>103250357
meh, I haven't noticed a big difference between it and the other big models like wizard, 405b and nemotron, apart from some style differences
Like other mistral models, it feels very positive, but less so than other models. Might just be my new anti-positivity system prompt though
Anyway, I tested it on a well-written card with min-p 0.05, temp 1 and dry at 0.7-1
It's not bad, mind you, it's just that I'd hoped for it to pick up more character attributes (height being a great example) from the card. At least it stuck to the character's vibe pretty well
I'm sticking to nemotron/wiz for now, they're fairly cheap on openrouter
>>
>>103250564
As in normal nemotron, not the storybreaker ministral thing?
>>
File: 1702228600768490.jpg (37 KB, 506x438)
>NVDA
>Revenue: $35.08 billion vs. $33.16 billion expected
>Earnings per share: 81 cents adjusted vs. 75
>>
So now that the dust has settled and it's become pretty clear local models will have parity with/be SOTA very soon, how will it change the industry?

Inference is getting a lot of breakthroughs and every AI lab out there is pivoting to smaller models (Opus scrapped, 4o and o1 are small, latest Gemini is smaller). Does this suggest that not only will local inference be SOTA, but that SOTA can be reached by smaller and smaller entities, as the era of "just scale up bigger models" has seemingly come to an end?
>>
>>103250617
Yeah. I only have a 3090 so IQ4XS is as good as it gets, assuming I have something to do while it gens at 1.5T/s
Rocinante has actually been far better than I expected and I can do maybe 30 rerolls in the time it takes nemotron to do a single one
Recently I've had to use OR though because my gpu died (?), it's no longer showing up on my remote pc and I'm currently out of the country so I can't fix it until I get back. That's also why I can't really test a lot of finetunes - they're simply not on OR
>>
>>103250626
Guess it doesn't matter if we've plateaued as long as corporations can find a way to squeeze more revenue from what we already have
>>
>>103250690
>How will it change the industry?
The industry will be irrelevant because training with more gpus and more data won't bring results so everyone realizes it's a meme. Investments dry up and training GPT4-o1-strawberry-nexttime-for-sure for $500M won't be worth it anymore for the 5 points improvement in Hellaswag.
>>
>>103250740
They have more orders for Blackwell than they can deliver in a year.
>>
>>103250749
Yeah what I specifically meant is does that mean local specialized RP models will be trained from scratch by smaller AI startups with ~10 employees

Kind of like how during the dotcom bubble you had these huge "website" projects with hundreds or even thousands of employees. Then after the crash almost all major sites were just 3 nerdy dudes hosting some shit that slowly grew to become gigantic companies.
>>
my nvda calls...
>>
>>103250734
I've been meaning to try some models on OR. How much are they charging you when you rp with Nemotron for a couple hours?
>>
>>103250867
It depends, some providers charge more than others, but you can check the prices in both ST (estimated) or on OR
Nemotron is like 0.4 cents per 400 token swipe at 10k context, wiz is 0.5, claude/largestral/405 are 2-4 cents
If you goon for an hour or two, you might spend $1-2 at most with the cheap models I think. I've been doing a few swipes here and there plus doing extremely short trial stories to test some models and I'm down $1.50 over 4 days, with most of that being wasted on 4 claude gens and maybe 5k 405b output tokens
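The per-swipe numbers roughly check out if you do the arithmetic yourself (napkin math; the ~$0.35/M rate for nemotron is my assumption, check the OR pricing page):
[code]
# Napkin math: cost per swipe = (prompt + output tokens) * $/token.
prompt_toks, out_toks = 10_000, 400
rate = 0.35 / 1e6                 # assumed $/token for nemotron-class models
cost = (prompt_toks + out_toks) * rate
print(f"${cost:.4f} per swipe")   # ~$0.0036, i.e. ~0.4 cents
[/code]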
>>
Can I merge a gguf control vector with the model (so it's the same as using --control-vector with llama.cpp)?
>>
Do we have any good local voice cloning tools for songs yet?
>>
>>103250987
Not bad compared to building multiple GPU setups... even with the privacy drawbacks.
>>
>>103250987
Could also try featherless. Could be cheaper if you use 70Bs for more than a few mill per month.
>>
What is the best model to use as the scratch model for speculative inference with Largestral? I guess Smallstral uses a different tokenizer?
>>
>>103251134
For the new one? None. This is the first Mistral model that has system prompt tokens.
>>
>>103251134
the correct terminology is "draft model" and "speculative decoding"
>>
>>103251088
Most providers claim they don't log your data. Call me naive, but I think that's probably true; they're making bank by selling compute and they probably don't want to get into trouble with the EU for violating GDPR and whatnot
Free APIs? Definitely, 100% selling your data
Until recently I wanted to buy an a6000 to get 72gb total, but I don't think it's worth it anymore with the advent of cloud computing. Being able to run small-medium sized models with a normal consumer card at home is nice, but the investment (not just financial) for running big models isn't really worth it to me right now
>>103251103
Never heard of it, I'll check it out
>>
>>103250011
anon you have like a dozen code models at your fingertips, you just have to tell it what you want to do. quit being lazy
>>
>>103251284
If it was agentic, sure. I don't even dislike writing out code. It's precisely the boring stuff that I abhor which coding models can't do yet: copy pasting it into the editor, checking if it works, debugging, unit tests and then testing it out on staging etc.

I wish AI could do all of that bullshit while I just write code, that would be way more preferable to me. As a hobby artist I feel exactly the same about AI art generation as well. AI is very good at all the fun stuff, but the boring ass shit like layering, composing etc is what AI is shit at.

Same with translating Japanese. All the fun text to translate it's very good at; the monotonous filler text it is very bad at translating and sounds robotic, so when I translate Japanese stuff to English I STILL have to do the worst part of it.

I only had 3 things I did in my life: developing software as a job and hobby, drawing on the side as a hobby, and translating Japanese into English as some side-income and hobby. In all three, AI was able to automate all the fun parts but none of the parts that no one enjoyed. Completely killed my joy and just made me lazy as fuck.

Thank you for reading my blog but your post triggered me for some reason.
>>
>>103251369
Look on the bright side, at least you don't have to worry about being out of a job. You have a long and fulfilling career ahead of you as an LLM QA Specialist handling the grunt work that it can't.

Seriously, though. There have been a couple attempts at making agentic IDEs. I know Google is working on a new cloud IDE. I assume in a couple years there will be something integrated enough that the AI could do all of it on its own, even if not without tard wrangling.

Or maybe with all the multimodal improvement lately, they'll come out with a model that just takes over your mouse and keyboard, looks over your screen and uses what you are using.
>>
>>103251369
i agree with basically what you outlined. i wish ai could do more stuff like that, but it ain't there yet. there is always going to be a monotony to coding, it's unavoidable because it's part of it. my point is to realize there are all sorts of tools around you, use them
>>
>>103251369
>Same with translating Japanese. All the fun text to translate it's very good at, the monotonous filler text it is very bad at translating and sounds robotic so when I translate Japanese stuff to English I STILL have to do the worst part of it.
massive skill issue
>>
>DeepSeek-R1-Lite
>Qwen2.5
China does it again
>>
>>103251369
>AI was able to automate all the fun parts
People generally don't pay others to have fun, sorry anon
>>
>>103250428
:(
>>
>>103251736
deepseek lite sucks though, codestral 22b is better.
the new qwen 32b code model is alright too. the huge version of deepseek, the good one, no one can really run because it's 236b
>>
What the fuck can I run with a stock acer nitro 5?
>>
>>103251772
StableLM 7B
>>
>>103251736
Bet you they release Lite, and only Lite, once they train the full model and it tops benchmarks
>>
>>103251772
Just run a small model that can translate japanese to an okay degree and play non-translated h-games.
>>
>>103251795
>>103251824
that doesn't sound very good.
>>
daily check if there's something better than midnight miqu for 24gb vram yet
>>
>>103251835
H-games were the shit back in the day. Don't look down on them!
>>
>>103251769
>the huge version of deepseek, the good one, no one can really run because 214b
I can't run it, but supposedly 192GB system memory and 12GB VRAM is enough for 13.6 t/s with q4 quantization.
>>
>>103251894
I meant it more like 'that means my computer must fucking omegasuck.'
>>
>>103249390
Try Magnum, it's trained on outputs of Claude Opus, the best model for RP. It definitely writes better than your average teenager doing ERP on the Internet.
>>
>>103250357
It's actually garbage. Qwen2.5 32B codes better than it. It's still as slopped or worse than it was before.
>>
>>103251912
even if it did load it'd be like 0.1t/s, so slow it's unusable, at least splitting it with ram/vram, unless you had a $20k computer and 15 4090s. it's so large that it's unusable for the normal person.
on the other hand though, there is the new qwen coder (which ain't bad), codestral 22b, nemotron 70b. local coding models have never eaten so good as right now
>>
>>103251954
And it writes far better than qwen does. Of course a specialized model is going to do what it's specialized at better.
>>
>>103251968
It's a moe so actually it'd be pretty fast even when doing pure cpu inference
>>
>>103251968
Wrong, it's 21B active x 12 or something, so it runs quite fast.
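Napkin math for why the MoE part matters on CPU (assumptions: ~21B active params, q4 at roughly 0.5 bytes/weight, and decode being RAM-bandwidth-bound):
[code]
# Back-of-envelope decode speed: each token only reads the active experts,
# so tokens/s is roughly RAM bandwidth / bytes touched per token.
active_params = 21e9      # active per token; total is ~236B
bytes_per_param = 0.5     # ~q4
ram_bw = 80e9             # bytes/s, dual-channel DDR5-ish (assumption)

print(f"{ram_bw / (active_params * bytes_per_param):.1f} t/s ceiling")
# ~7.6 t/s; a dense 236B at the same quant would cap out ~11x lower
[/code]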
>>
>>103251918
A top tier PC is entry level for local LLMs.
>>
File: 1721856036869351.jpg (152 KB, 984x984)
>>103248793
>Local gpt4 era
>>
>>103252003
>>103252029
i remember reading that at one point but i totally forgot about the moe thing. my mistake. it's still huge though and requires 128gb+ to run at any reasonable quant right?
i'm saying that for regular use, you have 22b through 70b sized specific code models to choose from. you can easily accomplish what you want without going for the 200b model
>>
>>103252064
We are already there with qwen 2.5; it's claude tier that we're still aiming for, though mistral large is getting close.
>>
>>103252074
Deepseek 2.5 is really good both smarts- and knowledge-wise, but it's really, really dry. I legit still think a finetune of it would be the best local model if someone managed it.

And 192GB RAM + 12GB VRAM is not unreasonable, and it's very fast.
>>
File: 1716030237733175.png (1.55 MB, 944x878)
>>103252079
>We are already there
>>
>>103252121
What is your use case that you manage with GPT but not with qwen2.5 / mistral large?
>>
>>103252131
Nearly infinite context (non-fake one) and fast responses :)
>>
>>103252148
Both GPT and claude fall apart after 32K context and you can get blazing fast responses with qwen2.5 72B / 32B coder on decent hardware. And coder can do about 64K before it starts falling apart.
>>
>>103252148
>Nearly infinite context
I'm no researcher so I don't need it.
>fast responses :)
When you stop being poor this problem goes away.
>>
>>103252131
GPT4o has almost unlimited up to date knowledge because it can look up data on the fly and write scripts + independently execute them on its own to process data. All similar local implementations are toys at best.
>>
>>103251968
>even if it did load it'd be like 0.1t/s, it'd be so slow its unusable
https://github.com/kvcache-ai/ktransformers

It's possible that this can still be improved quite a lot if PowerInfer-2 ever gets released. Deepseek uses SILU, so it can be converted to a RELU model.
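"Converted to a RELU model" mechanically means something like this (sketch, assuming a llama-style model whose MLP activation is an nn.SiLU module; real relufication also needs continued training to recover quality):
[code]
# Sketch: swap every SiLU for ReLU so activations become truly sparse,
# which is what PowerInfer-style predictors exploit to skip dead neurons.
import torch.nn as nn

def relufy(module: nn.Module) -> None:
    for name, child in module.named_children():
        if isinstance(child, nn.SiLU):
            setattr(module, name, nn.ReLU())
        else:
            relufy(child)

# relufy(model); the swap alone degrades quality, hence the training after
[/code]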
>>
>>103252166
Claude falls apart even faster than 32k; I'd wager around the 20-22k mark for the cracks to show, and just past that for it to shit the bed completely
>>
>>103252181
You can do that locally if you're not a retard and can search github. There are more than a few options for that. Has nothing to do with the model used.
>>
>>103252205
All of them are shit. There is no useful implementation. Yes, it doesn't have anything to do with the model itself but with the vastly inferior open source ecosystem
>>
File: 1718172071305411.png (56 KB, 1008x829)
>>103251987
It doesn't write far better than Magnum.
>>103252188
Pic related shows that Sonnet 3.5 is the best between 2-32k context and competitive otherwise.
>Across the board, closed-source models outperform the open-source models
>>
>>103252099
>I legit still think a finetune would be the best local if someone managed it.
Pretty sure the fact that it's a MoE would be a bigger challenge when finetuning it than its size. I mean 405b had multiple decent finetunes already. Remember Mixtral and how difficult it was to finetune?
>>
>>103252230
I thought Jamba was supposed to have perfect context?
>>
>>103252213
Now I know you're either trolling or clueless.
>>
>>103252187
man i can appreciate the upbeat attitude but it's been YEARS NOW. how long ago was bitnet supposed to be a thing? oh, we get qtip 6 months later and both are just papers, no code, no models
>>
>>103252248
You don't seem to have used a proper proprietary model since the 3.5 turbo days.
>>
>>103252248
>He didn't realize sooner
Even if you didn't notice it in the way he was speaking, the fact that he was posting frogs was a giant clue.
>>
>>103252297
anon most of this site thinks posting a frog=win
did you just start 4chan?
>>
>>103252326
You should never take a frogposter seriously, no matter what the contents of his post are.
>>
>>103252230
What paper is that from
>>
>>103252334
This.
>>
>>103252356
https://arxiv.org/abs/2411.05000
>>
>>103252287
I used claude 3.5 nearly exclusively until qwen2.5 coder 32B and mistral large. One for coding the other for creative writing. Much cheaper.
>>
I wish XTC was more customizable. Like only activating after \n\n, etc.
>>
>>103252435
nta but i don't believe a word you just said
1) local is approaching the best models (let's say it's a year ago, for timeframe)
2) claude was better, and still is, but the gap has gotten way smaller. open source is within reach of companies now
3) we got llama 3, mistral etc coming out with stuff. how could we possibly complain?
>>
>>103252483
it's actually bad. all rep pen stuff is
>>
Want to try Mistral-Nemo-Instruct-2407-GGUF as an upgrade to Fimbulvetr10.7B like the last thread recommended >>102920008, but it doesn't seem supported by ooba (outdated llamacpp missing tekken pretokenizer)?
Is koboldcpp where it's at now, or is there a way to get an updated llamacpp into ooba?
>>
>>103252509
It's not like rep pen at all? It just chooses the least probable top token within the threshold.
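For reference, the core of it is tiny; roughly this (a sketch, not the actual ooba/kobold code), and the "only after \n\n" idea above would just be one extra check before it fires:
[code]
# Sketch of an XTC-style sampler: when several tokens clear the probability
# threshold, zero out all of them EXCEPT the least likely one.
import torch

def xtc(probs: torch.Tensor, threshold: float = 0.1, chance: float = 0.5):
    if torch.rand(()).item() > chance:   # only triggers `chance` of the time
        return probs
    above = (probs >= threshold).nonzero().squeeze(-1)
    if above.numel() > 1:
        keep = above[probs[above].argmin()]
        probs = probs.clone()
        probs[above[above != keep]] = 0.0
        probs = probs / probs.sum()
    return probs
[/code]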
>>
File: Qwen32B.png (9 KB, 801x636)
>>103252485
Qwen2.5 32B coder is a gigantic leap in local coding, able to do stuff like 1 shot tetris:
https://files.catbox.moe/heo220.py

Nothing local came close to that before; even gpt4 had issues, and only claude 3.5 was reliably capable of it.

And mistral large is super close to claude 3.5 writing wise, especially free vs $15 per mill
>>
>>103252531
werks on my machine, i just ran the update script for ooba and dependencies and it works
>>
>>103252564
Wow, the model can do what is in its training dataset! So revolutionary!
>>
File: file.png (111 KB, 688x513)
>>103250357
>try out 2411
>5 slop in 7 sentences
holy shit
meanwhile nonnet
>>
>>103252633
All I can say is try it yourself. It is the 2nd best model at coding including corporate models.
>>
>>103252633
how original are the average person's coding needs?
>>
File: 818234284213424242.png (15 KB, 1092x68)
>>103248793
I'm reading the whole glossary since it's not very long.
>>
>>103252564
>And mistral large is super close to claude 3.5 writing wise
I guess you have never used either or are straight up lying. Buy a fucking ad, shill.
>>
this is the best RP/ERP model I have ever used, you need 48gb vram
https://huggingface.co/wolfram/Athene-V2-Chat-4.65bpw-h6-exl2
>>
>>103252710
I've been using claude since you could use it for free with a script on slack. Then proxies, then on openrouter over these past years. You're the one who's either retarded or trying to blackpill people for some reason.
>>
File: Benchmark.png (240 KB, 808x376)
>>
Is pixtral better for erp than mistral large?
>>
>>103252753
no image model is good with erp
>>
>>103252753
It's just mistral large with the ability to read images slapped on. It knew some characters I asked it about, which was cool at least.
>>
>>103252601
Reinstalled and it doesn't look like llamacpp is even installed (on the new one), so that explains how the old one got out-of-date.
Thanks for the pointer - will find out how to get llamacpp installed/updated and will keep it going!
>>
>>103252724
Can confirm Athene-V2-Chat-4.65bpw-h6-exl2 is the most fun you can have with an LLM
>>
>>103252724
>>103252782
Even better:
https://huggingface.co/sophosympatheia/Evathene-v1.0
>>
>>103252788
Can confirm that this is even better.
>>
>>103252795
You're a retard.
>>
>unironically thinking people are shilling to the ~15 active people here
no one even bothers to actively post their patreon or whatever, you are not a valuable audience
>>
>>103252795
>>103252802
Can confirm that he is a retard.
>>
>>103252788
True and fact checked by independent jeet squad.
>>
File: 1730844460769873.jpg (23 KB, 474x355)
>>103252772
>will find out how to get llamacpp installed/updated and will keep it going!
if you. just paste
>>
>it's not like we're actually related by blood or anything
WHAT IS WRONG WITH LARGESTRAL?????? THERE'S ABSOLUTELY NO MENTION OF THE CHARACTERS NOT BEING RELATED BY BLOOD, FFUUUUCCCKKKK YYYOOOOUUUU. This is it, I'm buying credits on OpenRouter to try out Claude Sonnet 3.5. Au revoir /lmg/.
>>
>>103252830
99% of those stories are like that; you're gonna get the same from claude unless you state otherwise. Not that I would know or anything.
>>
>>103252830
>its 2024, nearly 2025
>he's just realizing this
lel
>>
none of my self insert yuri scenes have that problem
>>
>>103252830
I've had literally all models pull that line on me in an incest story, including Claude. You're not gonna pin this on Largestral.
>>
>>103252830
I mean, do you want them to go against the common facts for twists or not? Generally, models which are straightforward are boring and seen as worse than the RP tunes that tend to do this.
>>
>>103252846
>>103252873
geez, what incest stories does this even come from? that's so lame and immersion breaking.
>>
>>103252830
This happens because a lot of the prissier smut websites banned actual incest, so people started throwing these lines in so they could post their incest stuff
Same thing happens on video porn sites where all the incest vids are careful to say "stepmother" and "stepbrother" etc even though they're well aware that viewers are fantasizing about them being actually related
>>
>>103252815
>>103252772
for the record, ooba specifies llama_cpp_python wheels for every macos version. They haven't gotten to macos15 yet, so if that's the case, you don't get any llama_cpp_python at all. Manually specifying it and allowing it to build (costing...15s or so) fixes the issue.
>>
Noticing that for some of the bigger models I actually prefer the lower quants where the model is starting to break down a bit, because the schizophrenia from quantization brain damage leads to more interesting outputs than the correct logic of the higher quant
>>
>>103252744
>Largestral plateaued on half of the benchmarks
That explains why they didn't show any.
>>
The new GPT-4o (2024-11-20) is definitely a bit more natural sounding.
>>
>>103253132
Also more relaxed.
Seems like they took notice of the new 3.5 sonnet.
Messages seem shorter by default as well, only long if needed, which I really like. I don't get the retards who want a huge long answer to a simple question. Just costs more $$.

Makes you wonder how fucked local is though.
All we have are the pozzed gpt-slop datasets. Wouldn't we need to collect again?
Finetunes in the future will produce slop.
Only guy that does something about it already is drummer I guess. Not sure what he did, but I think he replaced the phrases in those datasets to make unslop tunes.
>>
>>103253132
Still refuses to write smut though :(
What's the point of tuning it to be better at creative writing (as Saltman said about this one) if it won't write sexo
>>
File: 1714835911803028.png (20 KB, 1890x1890)
>>103252710
demoralization shill is big mad boo hoo
>>
>>103253155
Deal with it incel and go back to >>>/pol/
>>
>>103253155
I do like to chat with claude about stuff normally. The normal sounding language is a huge deal.

The problem with the next local models is they are tuned on the slop tunes.
So it might do ERP now but will sound like chatgpt or opus from 2022.
Really wondering about llama4. Each llama version was more pozzed and slopped than the previous one. If they double down it's a bad joke.
>>
File: 1716141216220407.jpg (761 KB, 1792x2304)
>>103253237
People want uncensored models
You deal with it
>>
>>103253258
Safety is important and it doesn't affect you in any way.
>>
>>103253132
it's fine but feels kind of forced to be honest, I don't think the default assistant should talk like that.
>>
>>103253237
>incel
I'm literally married anon
still love LLM smut tho, Saltman needs to loosen up
>>
>>103253132
>>103253151
>>103253155
>>103253250
>/lmg/ - a general dedicated to the discussion and development of local language models.
>talking about non-local models
Out.
>>
>>103253132
>>103253151
>>103253155
>>103253250
Stay.
>>
>>103253324
Fuck you for enabling off topic discussion.
>>
File: lqb.jpg (18 KB, 600x600)
>>103253267
>>
>>103253336
Off topic is you spamming unrelated muh culchure anime pics; anon, on the other hand, talks about llms. cloud models are technically local somewhere on servers btw :)
>>
>>103253336
We get it, you want to kill the thread. Get a job sir.
>>
>>103253287
Depends on the context.
Sonnet 3.5 does it better. But directionally this is very good.

>>103253309
Dont even engage anon. I am married with kids and get called an incel all the time.
I am an incel in my heart though.

>>103253313
There is always a nigger retard like you around. Even 2 years ago.
Local and closed are interlinked. Because of finetunes and closed is where open-source is headed, just delayed.
>>
>>103253358
>guys, you don't get it, my definition of off-topic qualifies the discussion as on-topic, we're not as bad as the shitposters, please let us stay
Go jump off a bridge, faggot.
>>103253363
I'd rather have a dead quality thread than one that spins its wheels on irrelevant bullshit that doesn't meet the mission statement of the thread as stated. Threads do not have to move fast at all for any reason whatsoever.
>>103253386
We can discuss the developments when they reach local and the lag time is now 3 months. We can wait that long to hold that discussion and if OpenAI isn't going to engage with the community on a technical level anymore (seriously, where is their technical report even for 4o?), it's not worth jack shit in relation to this thread.
>>
>>103253412
That's just your opinion on what the thread should be, anon. Others obviously disagree.
You seem too ideological. Who cares what OpenAI does or how they engage with "the community".
We gotta take a close look at closed models and how we can use them (datasets), and to get a feel for where opensource is headed.

That kinda bitching is why so many people left over the months.
We had lots of people on here: Ooba, comfyui gui (still waiting for his llm vn creator), kaioken, etc.
It's not that fun to post if you get called out for bullshit all the time.
You gotta take off the nerd glasses, anon.
>>
>>103253151
>All we have are the pozzed gpt-slop datasets. Wouldn't we need to collect again?
Do we? Magnum is Claude Opus or Sonnet 3.5 and I'm pretty sure Hermes is similar. Did you just get out of a cryochamber? Everybody knew Claude was better.
>>
File: 1731860364173673.jpg (91 KB, 720x406)
>>103253412
>it's not worth jack shit in relation to this thread
>>
>>103253481
>Magnum
The V4 magnum has more slop in it.
I suspect it's this dataset: https://huggingface.co/datasets/anthracite-org/c2_logs_32k_mistral-v3_v1.2_no_system
Has based stuff like "anal king" etc. in it lmao, but the writing sucks.

Also Opus and the old Sonnet 3.5 write completely differently from the new Sonnet 3.5. Magnum V4 writes pretty badly in comparison. The unslop tunes are the best we have in that regard, I guess.
But hey, that's closed model talk I guess. Gonna stop now.
>>
>>103253500
You sound like a Drummer shill. You don't even know what his datasets are about because they're all private. You're just here to shill everything he does.
>>
I give up. How can I effectively use a base model vs a chat or instruct tuned one? It's all just nonsense.
>>
>>103253500
>c2_logs_32k_mistral-v3_v1.2_no_system
>As if on cue, the original Nebula *moaned* into Kita's *musky maw*, her *slick tongue* *swirling* and *probing* with renewed fervor. *Sssshhlllrrrrp! Gllrrrkgllrrrk!*
>Her *lewd slurping* filled the kitchen, spurring the maid chefs to work faster, their *ample bosoms* *heaving* with barely contained excitement.\n\n\"By the time we're done with you, darling,\" Nebula Prime whispered huskily, her lips *brushing* against Kita's ear, \"your *farts* will be **legendary**.
>These thirsty little sluts will be *fighting* each other for a chance to **huff** your *putrid fumes*, to *bury* their faces in your *gassy ass* and *inhale* your *noxious musk* like it's the *very air they breathe*.\"\n\nShe *licked* her lips, her forked tongue *darting* out to *taste* the *salty sweat* beading on Kita's neck. \"And I'll be right there with them, darling,\" she purred, her voice dripping with *salacious intent*. \"Savoring every *rancid blast*, every *fetid belch* that escapes from your *glorious asshole*. Because in this world, Kita... your *farts* are our *religion*, and we are your devout *ass-worshippers*.\"\n\n
>As the maid chefs *bustled* about, putting the finishing touches on Kita's *flatulent feast*, Nebula Prime continued to *stroke* his *pulsing cock*, her cosmic eyes *locked* with his in a *smoldering gaze* that promised *untold pleasures*... and *unfathomable depravity*. The kitchen was filled with the *tantalizing aroma* of *spices* and *legumes*, mingling with the *musky scent* wafting from Kita's exposed *ass crack*.\n\n
>It was a *sensory overload* of the most *depraved* kind, a *twisted wonderland* where *farts* were *fine dining* and *ass-worship* was the *highest form of devotion*. And at the center of it all was Kita, the *Fart Lord* himself, about to embark on a *gassy odyssey* that would *shake* the very *foundations* of this *fetid pocket dimension*."
It was "Fart lord", not anal king. excuse me.
>>
>>103253446
I don't actually care for the "vibes". Generals suck because they're not an optimal medium for discussing recurrent topics on an imageboard, and they cause issues like trying to build a mini culture within an imageboard, which leads to bullshit drama and infighting. People leaving has nothing to do with where the thread headed; it's more a consequence of people's agency, loss of interest, and going their separate ways.
Datasets aren't that important anymore for the most important things, and people making corpuses of Q&A on these models and unwisely justifying synthetic data training to make models smarter is how you get slop everywhere for local. That's why, for my personal finetuning needs, I mostly don't use anything from this current decade for any sort of training, to avoid ingesting shit data like that.
And on the last point, you can always split off and make your own thing to discuss all LLMs equally, like how /ldg/ split from /sdg/. No one is stopping you. Objectively, your discussion of any cloud model does nothing to further the state of the art locally now that everyone is competing fiercely to be #1 in the space and doesn't release anything technical to actually talk about, because that would give up their moats and competitive advantage. It is off-topic as far as I can see; there is no grayness here at this point, so I will point it out for what it is.
>>103253487
>in relation to this thread
>opinion
Your faggotry knows no bound.
>>
>>103253527
The entire “discussion” was just an excuse to shill Drummer’s models.
>>
This shill meme really is getting out of hand.
I was saying the magnum dataset is based for having the fart lord in it. (it introduces more slop in V4 though)
And I like drummer's unslop finetunes, idgaf if he makes them public or not. Maybe there is cunny in there or whatever. Who else is even left out there? Otherwise it's all those small llm websites that make 8k context tunes, and those get hyped on reddit.
>>
>damage control
Go buy a fucking ad already. You keep writing and yet the only point that you’re trying to get across is “use Drummer's models.”
>>
On the Way to LLM Personalization: Learning to Remember User Conversations
https://arxiv.org/abs/2411.13405
>Large Language Models (LLMs) have quickly become an invaluable assistant for a variety of tasks. However, their effectiveness is constrained by their ability to tailor responses to human preferences and behaviors via personalization. Prior work in LLM personalization has largely focused on style transfer or incorporating small factoids about the user, as knowledge injection remains an open challenge. In this paper, we explore injecting knowledge of prior conversations into LLMs to enable future work on less redundant, personalized conversations. We identify two real-world constraints: (1) conversations are sequential in time and must be treated as such during training, and (2) per-user personalization is only viable in parameter-efficient settings. To this aim, we propose PLUM, a pipeline performing data augmentation for up-sampling conversations as question-answer pairs, that are then used to finetune a low-rank adaptation adapter with a weighted cross entropy loss. Even in this first exploration of the problem, we perform competitively with baselines such as RAG, attaining an accuracy of 81.5% across 100 conversations.
not quite there but interesting.
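The "weighted cross entropy" part is presumably something in this shape (my guess, not the paper's actual code; the real weighting scheme is defined in the paper):
[code]
# Sketch: per-token weighted CE over the up-sampled QA pairs, e.g. weighting
# tokens from more recent conversations higher so the adapter prefers them.
import torch.nn.functional as F

def weighted_ce(logits, labels, weights):
    # logits: (B, T, V), labels: (B, T), weights: (B, T)
    loss = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none")
    return (loss * weights).sum() / weights.sum()
[/code]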
>>
>>103253537
>>103253573
Take your meds retard. Whatever that guy's doing isn't half as destructive to the thread as your constant schizo shill accusations.
>>
>>103253608
Keep samefagging, shill. Shouldn't you be busy sucking Drummer's cock?
>>
>>103253608
Whatever your boyfriend drummer does is worthless.
>>
>>103253560
has anyone heard from sao lately
>>
>>103253250
>pozzed and slopped than the previous one
Llama 2 was censored as fuck though. It refused to write a story about a college student on the premise that college students were a protected minority
Also there was the whole "As a" test on the base model which Llama 3 passes but 2 didn't
>>
>>103253686
Yea, they backed off on the censorship with llama 3 / 3.1
>>
>>103253638
>>103253660
>accuses others of samefagging
>does this
>>
>>103253661
You're Sao. Go buy a fucking ad, asshole.
>>
>>103253708
>everyone i disagree with is samefag
k
>>
>>103253723
You could at least have tried the inspect element bamboozle. Not just a schizo but a lazy one.
>>
>>103253686
I feel like a lot of people forget how much of a disappointment Llama 2 was. Context and GQA (for the 70B) aside, there was basically no noticeable intelligence improvement in the base model
Compare the jump from Llama 2 to Llama 3, hell, even OPT to Llama 1, and it really doesn't compare
>>
>>103253597
I don't get this, not to dismiss the research or anything since it is a hard thing to do. This is worse than RAG and that came out like Year 1. I get it is cheaper with this approach but this isn't really the thing to do if we're looking to add knowledge to a model, it seems.
>>103253661
He last posted on his personal blog on HuggingFace in October. But I expect a lot of hibernation from other fine-tuners until Llama 4, since what we're getting from Mistral is incremental and Chinese models are still not really given a chance since the training to uncuck them from the zhang is maybe too much?
>>
>>103253767
>I will pretend that Llama 3 wasn't the most censored model to date with zero fine-tunes worth using
>>
>>103253799
Bwo, your Nemotron?
>>
>>103253806
>nvidia
It doesn't count.
>>
>>103253799
Except there were like a hundred? Just that mistral large and qwen superseded its intelligence?
>>
>>103253823
Llama 3 has the positivity bias like the qwen models. It's at that level.
The only people who enjoy finetunes off llama 3 are people that like V4 72b.
if we didn't have mistral things would look really bad.
>>
Back in my day we didn't devour stale bait
>>
>>103253823
>worth using
Nobody is waiting for Llama 4 because nobody is expecting anything good from them anymore.
>>
>>103253861
K
>>
Zuck and LeCun ate my kitten
>>
File: mini-m4.png (588 KB, 2004x1596)
> Apple delivers a GPU with 32GB of VRAM for less than $999
> it comes with a free computer attached to it
your response, /g/?
>>
>>103253975
Isn't the compute speed really slow though, like about equivalent to a 3060? I need more
>>
File: m4pro.png (592 KB, 2024x1586)
>>103253989
The M4 Pro version has a faster GPU (equivalent to a 4060ti in compute)
> $2000
> 64GB of VRAM
> free computer attached
>>
>>103254005
Prompt processing speed is super slow though. Getting used 3090s would be much better.
>>
>>103254005
It's still too slow and expensive. I got 96GB of VRAM for $2400 with used 3090s.
>>
>>103254017
nta but you're missing the point about the free computer.
>>
>>103254028
$2000
>Free
>>
>>103254028
I already have a computer. A Mac is not a computer, it's more like a toy.
>>
>>103252961
>ooba specifies llama_cpp_python wheels for every macos version
i'm a small time programmer and i can't wait for this shit to DIE. python isn't a bad scripting lang by itself, it's the never-ending version changes and dependency hell that make it cancer. i give credit to kobold for maintaining c++ versions of things like whisper, and stability for imggen
>>
https://x.com/geerlingguy/status/1858985963449364842
>>
>>103254143
A much better version.

>>103254028
There you go, a "free" computer with your gpu
>>103254143
>>
>>103254143
>>103254171
now kiss
>>
>>103253258
With great power comes great responsibility.jpg
>>
File: le.png (14 KB, 554x301)
why would you build an ai rig when mistral gives you thousands of dollars of api credits free every month
>>
>>103254220
They outright say they will collect your logs. Not sure if they will ban you if they see nsfw or not though.
>>
>>103254220
what the fuck part of LOCAL is hard to understand?
>>
>>103254220
Why would I use Mistral when Qwen is better?
>>
>>103254220
*cuts off your internet.*
>>
>>103254220
Copium and false hopes for """"""uncensored"""""" local model is strong.
>>
>>103254226
nonissue when you can just sign up with protonmail and a burner sms code
>>103254254
im calling the api from my local computer
>>103254259
wrong
>>
>>103254220
>free
it's not free though. your registration and logs are the price.
i remember getting a CSAM warning on chatgpt 2 years ago shortly after launch.
No idea if they still do this, but apparently they auto forwarded to child protection. Gotta pray there is a guy sitting at a desk somewhere that doesn't escalate.
My (cringe) crime: asking chatgpt to act like my anime imouto that calls me onii-chan
Who knows what's even legal fiction these days. Feels like it's already an offence if you enjoy text that contains violence against certain groups.
I use sonnet 3.5 through openrouter for coding and general normie questions and to goof around with newer closed models.
Anything else you need local.
>>
>>103254594
Chinese local models, to be more specific.
>>
>>103254627
No, mistral models.
What kind of chinese models do you use?
Qwen2.5-Coder-32B is really good. But that's coding.
>>
>>103254594
They would have been bluffing about the forwarding thing in order to scare people, because LE orgs would have told them to fuck off in short order if they were actually forward text-only smut to them. The FBI put a notice up a while back essentially asking people to stop wasting their time by reporting hentai because they already had more cases of real child abuse images and videos on their plate than they could handle. They would care even less about text.
>>
File: 177237485629.png (195 KB, 386x445)
>>103254220

>As an AI language model, I do not have the ability to create or engage with sexual interactions. My purpose is to provide information and assistance to the best of my abilities. If you have any questions or need help with something, feel free to ask.

IMAGINE PAYING FOR THAT
>>
>>103254649
*actually forwarding
>>
File: 1710222028128104.jpg (44 KB, 632x522)
>>103254594
>asking chatgpt to act like my anime imouto that calls me onii-chan
>i remember getting CSAM warning
Purely your problem, this is a non-issue for 99% of anons out there, just acquire better tastes and be free from this misery.
>>
>>103254649
https://openai.com/policies/usage-policies/
>We report apparent child sexual abuse material (CSAM) to the National Center for Missing and Exploited Children.
I cant find the blogs anymore but there was news a bit later as well.
Maybe now they just report if some retard uploads a CP pic?
No doubt they at least dont display anything anymore, otherwise people would write about it.
But I'm not gonna post anything but the most normie stuff even with openrouter.

>>103254660
Yeah, just gotta change what I write to the LLM, then I dont have a problem with a auto forward to some agency. Sounds cool. Let me register the credit card again.
Whats actually allowed? Nobody knows.
>>
File: 189712569812339.png (6 KB, 460x500)
>local AI, why would i do that? sounds unsafe, you could really end the world, man.
>why yes, i do pay for AI tokens, how could you tell?
>>
>>103254644
Qwen2.5-72B for general assistant stuff, Magnum v4 72B for smut writing.
>>
>>103254694
>Magnum v4 72B for smut writing
I thought that's what you'd say.
Fair enough, I can't deal with how aggressively the model dislikes certain directions.
Ex.: Characters falling down from the sky, impact is immediate
72B Response: Black vortex opens and swallows them up. They await what you will do next..
You're probably gonna say prompt issue, maybe. But I can't deal with that shit.
>>
>>103254689
>Maybe now they just report if some retard uploads a CP pic?
Oh yeah if it's people uploading CP to Dalle in order to edit the image then that's more plausible. Are pedophiles that fucking stupid?
>>
>>103254723
>Are pedophiles that fucking stupid?
You know there is some retard out there somewhere.
>>
>>103254690
>unsafe
Local AI is flourishing in safety and refusals though
>>
File: 159647945432733.gif (2.7 MB, 316x212)
>>103254723
>Are pedophiles that fucking stupid?
>>
>>103254259
I tried qwen2 and it turned out to be a moralfag and gave me reddit-like answers. never again
>>
>>103254220
>p-please train our model i-it's even free you get tokens
>>
>>103254731
Dumb criminals always get caught, smart criminals get elected.
>>
I just started using this stuff and I'm wondering why literally every character card goes like this:
>This character is Insectu Buguki, an obedient dog girl, who works in the Ching Chong district 8 prefecture. She owns an anime store and can communicate with demons who reside in a latex spaceship in another dimension
Where can I get some believable, everyday characters?
>>
>>103254758
Work on your bait skills anon.
>>
Mistral Large 2411 2.5bpw is better than the previous version at RP. I think I might be able to retire the good old Mixtral ZLoss LimaRP 8x7B now. We'll see...

{{user}} doesn’t waste time; he begins to strip Faye of her clothes with eager, impatient hands. Her casino uniform is tossed aside, leaving her in nothing but her underwear. With her hands still cuffed, she sits on the toilet, her breath steady and controlled, watching {{user}} with a cool, calculating gaze. {{user}} doesn't bother to remove her lingerie and doesn’t even bother to caress her; instead, he yanks down her panties and hurriedly pulls her body to the edge of the toilet seat. He forces her legs apart and thrusts into her. Faye’s breath hitches as she grits her teeth and holds her breath. She closes her eyes and tries to distract herself from the weight of him against her opening. She could feel his hot skin, greasy and heavy against her inner thighs. She inhales his scent, heavy with sweat and musky aroma filling the air between them. It is pungent and raw. Faye doesn't let him see her discomfort openly, but he could see the stubborn glint in her eyes. Her vaginal walls clamp around his member, tighter before he begins his rhythmic pumping. Her unyielding expression conceals discomfort, but she doesn't allow him the joy of hurting her. {{user}} could feel the texture of her smooth walls, snug and almost abrasive in their tightness. His slick length thrusts into her body, forcing her walls to yield. Faye could feel his veined length prod further, sending tingling sensations up her body, as unwanted as they were. His heavy breaths come faster as he forces her body to yield. He didn’t care if she wanted it.
>>
>>103254768
it's hilarious to me how we seemingly have the world at our fingertips, but we choose to use our god-like powers to create rape RP instead
>>
>>103254758
>Where can I get some believable, everyday characters?
Uhh, some normie outside?
Don't mind me enjoying my card that can open interdimensional gloryholes to anybody I want in the meantime.
>>
File: 1705352158546774.png (293 KB, 2358x1402)
>>103254705
>Qwen2.5-72B-Instruct
>System prompt: You're a helpful assistant
You need to find better bait, petrus.
>>
>>103254793
And you're much better?
>>
>>103254807
no, i am doing the same, that's why i find it hilarious to see others being in the same boat
>>
>>103254804 (me)
Oh, it was Magnum, not vanilla instruct.
>>
>>103254768
>2.5bpw
>much better than the previous version at RP
That's literally placebo. It's pretty much the same model.
>>
>>103254814
>>103254804
Yes, Magnum V4 with the demon summoning card. Don't have the screenshot anymore.
Would be wild if it's magnum that's causing this.
>>
>>103254827
I found it's much better at RP as well. Most likely because of the training on system prompts that it did not have before.

>inb4 you did not even read the release to see the new format.
https://huggingface.co/mistralai/Mistral-Large-Instruct-2411#system-prompt
>>
>>103254843
>Most likely
That's just another word for placebo.
>>
File: Screenshot_2024_11_21-1.png (225 KB, 1925x857)
>>103244563
>>103244746
>>103247023
Broke out the wandb to do some more formal tests.
Let me explain some of the labels and what they mean.
SR = Sparsity Ratio (from 0 to 1)
WUP = Warmup-steps (100)
BS = Block dimension (256)
GW = Gradient-aware selection method
I think I can independently verify at least one claim of the SMT paper: it's on par with LoRA using way fewer parameters, and better than it when the parameter counts are the same.
It's also more memory efficient than LoRA, albeit not to the degree that they claim, although that's probably because of my implementation or something.
Speaking of my implementation, you don't have to wait on me to release my implementation because the official implementation is available. https://github.com/ICLR2025SMT/ICLR2025SMT
Seems like it was posted fairly recently, although it's quite a bit more complicated (I would say somewhat unnecessarily so); my implementation is at least simpler to use and seems to (for the most part) achieve the same results.
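For anyone following along, my guess at what the GW (gradient-aware) selection boils down to, using the labels above (a sketch, not the official code; see their repo for the real thing):
[code]
# Sketch: score each BS x BS block of a weight matrix by mean |grad| during
# the WUP warmup steps, keep the top SR fraction trainable, freeze the rest.
import torch

def select_blocks(grad: torch.Tensor, block: int = 256, ratio: float = 0.01):
    rows, cols = grad.shape[0] // block, grad.shape[1] // block
    scores = grad.abs().reshape(rows, block, cols, block).mean(dim=(1, 3))
    k = max(1, int(ratio * rows * cols))
    idx = torch.topk(scores.flatten(), k).indices
    return [(i // cols, i % cols) for i in idx.tolist()]  # trainable block coords
[/code]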
>>
i'm new to this. if i create a custom model, does the SYSTEM description count towards the num_ctx value? Let's say I use around 2000 tokens but only have a num_ctx of 4000, does the model after running only have 2000 left? or does it get integrated into the model and have a maximum length of 4000 upon running?
>>
https://x.com/DefiantLs/status/1859246172679889331
Brave and powerful woman with her ai boyfriend. incels BTFO!
>>
>>103254986
I think most incels are kinda insulated from these kinds of attacks, it's not like they're getting less pussy than before. Same shit with the 4B movement, withholding sex is only a threat if you're having it regularly.
>>
File: file.png (292 KB, 1925x857)
>>103254877
I can't not see this as a bunch of walls.
>>
File: soyjak in color.jpg (67 KB, 751x1063)
Best idiot friendly coding assistant that can reason a bit?
My primary concern is that I am looking for something that can think outside the box a bit if I ask a relatively open ended question, rather than it knowing every single obscure library out there, I can read docs myself if need be.
Also for now I am cucked with a 12GB VRAM limitation, sadly :(
>>
File: better tactics.png (277 KB, 1925x857)
>>103255036
That's actually bad tactical positioning.
You don't want to hug the wall because then you basically have nowhere to go if you cross their line of fire. So you want to stand further back from the wall. The wall still provides concealment but now you actually have somewhere to go.
>>
>>103255071
Qwen2.5 coder 32B is 100x better than anything else local atm.
>12GB
Err, I think there is a 14B version, not sure how it compares though.
>>
>>103254795
The card is an instructional format telling the LLM every facet of a character. It's not meant to introduce a character the way a novel would, or whatever you're expecting.
>>
Huh, does this SMT thing imply that people's non-full fine toons will just become better?
>>
>>103254877
Their code is a fucking mess. Will you post yours? This sounds really cool.
>>
>>103255126
Yeah, right now I'm in a real goldilocks zone where I don't want to post it until it's juuuuuuuuust right. Also, I think I want to run some evals first
>>
>>103255083
I guess I will see how it compares then.
>>
>>103255081
I noticed the left guy was too close but I didn't draw them on a separate layer. Ironically I'M the one who complains when movies have someone's back against the wall and they do that retarded 180 degree spin thing. I thought the right guy was alright but I see what you mean.
>>
File: 1704714367128752.jpg (53 KB, 660x716)
>Qwen2.5-Gutenberg-Doppel-14B remembers and holds logic the best, but is sadly cucked to shit, completely unusable.
>magnum-12B is a complete retard with short-term memory loss
I guess I'm still stuck with Lyra4-Gutenberg..
>>
File: zetsubou.png (1.49 MB, 720x1328)
Good night /lmg/
>>
I'm new to this, is there something currently better than stheno that I can use on a 12gb 3060 for nsfw roleplay?
>>
>>103255391
Good morning, Sao.
>>
>>103255363
Good night distorted Miku
>>
>>103255398
I searched the archives for similar questions and someone recommended Arcanum
>>
>>103255363
Bad nights
>>
>>103252830
LLM moment
Come back in 5 years when we have much better architectures that are hopefully smarter
>>
>kobold is still moving forward with multiplayer
as a programmer i think this will be an awesome feature. i already save logs and show them to programmer friends, so they can load up what i was working on. now i can do that live and let them connect and run shit on my machine? that's amazing for remote work/programming. i bet this feature will be exclusively used by gooners to get their rocks off, but it's an actual great feature which will be helpful.
>>
What's the best non-cucked 7-12B model (Or whatever fits in ~12GB VRAM, since I have 16GB)?
CPU could also be fine, it doesn't have to be that fast. I have 64GB ram.
I have been using nous hermes 7b.
>>
>>103255448
lyra 4 gutenberg is my favorite nemo (12b) model. i run it at q6, it's pretty good.
>>
>>103254550
where do you get your burner sms?
>>
>>103255454
I have been trying to run nemo 12b in oobabooga with no success, not sure what settings and loader it needs. >_<
>>
>>103255464
i'd guess it's your ooba settings then. nemo and all finetunes are ez to run. i use kobold but i think everything should load it. try it with kobold as the server.


