/g/ - Technology


File: 1710537223999875.jpg (647 KB, 1856x2464)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101970380 & >>101962401

►News
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101970380

N/A
It's over.
Local is dead.
--Miku (free space): none, Miku is dead.

►Recent Highlight Posts from the Previous Thread: who cares?
>>
>>101981616
why is anthracite's new model not in the news!?
>>
>>101981643
Because they aren't trustworthy.
>>
File: hkPzhL-xYPeGGKCyAf3Qd.png (1.17 MB, 1920x1080)
are you a Large enough man to handle this?
https://huggingface.co/anthracite-org/magnum-v2-123b
>>
>>101981643
>sloptune that's worse than the model it was trained on
Not news
>>
>>101981654
Fuck you
>>
File: huh.....png (313 KB, 583x564)
thoughts?
>>
>>101981673
con artist club
>>
>>101981673
100% discord users
20%+ transsexuals
10%+ paedophiles
>>
>>101981673
where's that mango retard? the twitter schizo?
>>
>>101981673
a single anthracuck member has more braincells than the entirety of this thread combined
no im not part of anthrashit, kys schizo
>>
>>101981673
Sooner or later they're going to stab /lmg/ in the back and make their models fully closed.
>>
the only good fine-tuner is lemonilia btw
>>
>>101981711
You are implying that they have made anything useful to lmg at all.
>>
►Recent Highlights from the Previous Thread: >>101970380

--UNA-TheBeagle 7B model has pros and cons in state maintenance and character behavior: >>101972043 >>101972287
--New Exclude Top Choices sampler aims to boost creativity and break clichés: >>101971203 >>101971270 >>101971404 >>101972320 >>101972374 >>101972652 >>101972816 >>101973112 >>101973523 >>101973654 >>101973720 >>101973807 >>101973782 >>101972340 >>101973603 >>101972271 >>101972509
--Hermes model easily complies with heinous requests, but less capable than Meta's 3.1 Instruct 405B: >>101973365 >>101973386 >>101975604 >>101976172 >>101976243 >>101976288 >>101976279 >>101978726 >>101977526
--Recapbot vs Nous Hermes 3 performance discussion: >>101978771 >>101978840 >>101979354
--New rig specs and memory channel configuration: >>101974903 >>101977508 >>101977742
--MiniCPM integration with koboldcpp and image-related queries: >>101978374 >>101978411 >>101978450 >>101978522
--MiniCPM censorship and potential workarounds: >>101978588 >>101978612 >>101978920
--FlashAttention compatibility with P40s and future improvements: >>101970727
--Disable SillyTavern backups to prevent redundant history: >>101979437 >>101979498 >>101979605
--Advertised model context vs. actual performance: >>101974463 >>101974528 >>101974539 >>101974588
--AMD is recommended for cpumaxx setup due to RAM channels and bandwidth: >>101975661 >>101975685 >>101975715
--AI Game Master vs. unspecified system/narrator for open-world gameplay, Chinese card with translated lore: >>101970414 >>101973572 >>101973598 >>101973660 >>101973738 >>101974445 >>101974468
--Memory and Chain of Thought (CoT) could help AI overcome the "averaging" problem: >>101978425
--Lily's prevalence in models due to overfitting on children's stories: >>101977756 >>101977926 >>101976009 >>101976352 >>101978044 >>101978357 >>101978429 >>101978558
--Miku (free space): >>101973659 >>101974184 >>101977500

►Recent Highlight Posts from the Previous Thread: >>101970386
>>
>>101981711
>The current and past "best" fine tunes and merges were all closed sourced. Even the foundation models current fine tunes are based on don't fully disclose their datasets.
Strange hill to die on.
>>
>>101981688
>paedophiles
yikes.
>>
What the fuck, the new Magnum is actually surprisingly good, you guys lied to me.
>>
>>101981745
Sorry if I didn't write your pronouns right. Cunny amirite?
>>
>>101981743
>making excuses again
Hi Alpin "RMS" Dale
>>
I hecking love anthracite their models are cool as heck :D
>>
>>101981787
:koboldsmile:
>>
gutenberg-12B is a nice little adjustment on top of nemo-instruct.
>>
>>101981673
Half of them are degenerates who think being edgy is 'cool'. Some of them are too useless to do anything for the org (other than the blatant shilling).

Some gems in an otherwise dysfunctional group of pathetic losers.
>>
Now that the dust has settled, what is our verdict on Magnum-123b?
>>
>She leans in closer, her breath tickling Anon's ear
123B magnum confirmed to be slop
>>
Great, now the schizo found a new thing to obsess about.
>>
>>101981859
> anthranigger
> nigger
Nice try, degenerate. Weak attempt.
>>
>>101981858
The verdict is that Anthracite is a company that's very likely to betray /lmg/ in the future.
>>
>>101981891
>anthracite
>company
this is the smartest person on /lmg/ btw
>>
>>101981891
How the fuck do you "betray" /lmg/?
>>
>>101981905
Just look at how Pyg started and where it is now.
>>
>>101981908
by exist
>>
>>101981917
Pyg still exists?
>>
File: grugg.jpg (49 KB, 720x405)
is mixtral a 24gb rock or a 16gb rock
>>
>>101981908
By going 100% closed source. Maybe they get hired, or make a deal with a company, or get greedy, etc.
>>
What is the uncucked version of gemma2?
I don't care about RP
>>
>>101981616
I like this gen OP. Flux?
>>
>>101981943
>sub
I think you meant to post in /r/LocalLLaMA.
>>
>>101981673
I like the derp in a tuxedo (from the announcement pepe) shooting himself, very striking pfp. I recognize apindale (goliath), kazo (minp iirc?) and gryphe (mythomeme). It seems like a harmless group, what are the other noteworthy users there?
>>
>>101981673
If we're being serious

>Gryphe
made MythoMax, slop king for its time
>alpindale
maintains Aphrodite, pretty good inference backend for big boy cloud GPUs
>Sao
made Stheno, Fimb, etc... probably the single most capable finetuner in the group
>kalomaze
made min_p sampling
>Nopm
made one of the datasets used for finetuning oriented around writing
>Undi
made mediocre mergeslop, but hey, it was a new idea at the time

>everyone else
Literally who are these people?
>>
File: zBYrnVTh.jpg (94 KB, 1024x1024)
>>101981972
>>
>>101981976
>>101981975
I'm a retard, forgot turboderp, he made exllamav2
>>
What happened to this 4chan board? I don't think we're going to get any news about new models going forward if this is the way we treat the messengers.
>>
>>101981975
>what are the other noteworthy users there?
sao, undi, ikari
>>
>>101981976
>made Stheno, Fimb, etc... probably the single most capable finetuner in the group
Purchase an Advertisement.
>>
>>101981990
They can fuck off if they don't like it.
>>
>>101981976
Sao and Undi are barely different. One just astroturfed their models very hard.
>>
>>101981990
>>101981943
If they get rid of the degenerates who spam 'troon' and 'nigger' on Twitter, I wouldn't have a problem with the group and they'd be given more respect as a whole.

>>101981976
Exactly. Why do they surround themselves with much, much lesser people?
>>
>>101982035
>>101981917
>>101981891
>>101981829
>>101981650
These posts are all made by one person called "Lemmy", who is part of the nothiingiisreal group, and has beef with Anthracite for kicking him out of their 'cord. He won't post a screenshot to prove he's not samefagging, because it'll just prove me correct.

Whenever he says some shit like this, please remember he said "Claude Opus is slop" when he himself uses 90% Opus data in his training sets. This is who you're talking to.
>>
>>101982078
>made Stheno, Fimb, etc... probably the single most capable finetuner in the group
>random seething about Celeste
Hi Sao
>>
>>101982078
'corr coal, what is it with LLMs and attracting the most mentally ill people.
>>
so um how about them local models guys
>>
>>101982118
they be good
>>
>>101982118
local models are the best. too bad nobody uses them. speaking of local models, have you tried out any of the anthracite models? 100% open source.
>>
File deleted.
>>101982078
Huh? Guess I'm talking to one of the lesser members.

Dude, okay. Let me put it simply. Your association with the Anthracite org is a joke.

Go cry and blame a random finetuner for your hurt feelings. I don't care. You don't deserve anyone's sympathy.

Fix yourself.
>>
>>101982118
I miss AI-Dungeon GPT-3 dragon, what is the best local model for that type of experience?
>>
>>101982155
Nothing, gpt3 had pure non-instruct soul. You can't have that these days even with 'base models' because they are all polluted with instruct data. Instead of the unparalleled soul from back then you then just get a retarded instruct.
>>
>>101982148
>posts screenshot that proves hes a schizobaiter samefag
>talking about sympathy on 4chan
nta but go back
>>
>>101982117
>/llmg/ can I marry my ai gf?
>>
File: 1516717477009.gif (1.25 MB, 200x163)
Let's play a game! This Saturday at 1 PM PT, I will do a collaborative storytelling/RP session (location TBD, maybe in the thread itself?), where I post a scenario and responses from the model in the thread, and people discuss what to do in the user chat turns, or edit previous user turns or the system prompt and start over. This is going to be both for fun and to get us, finally, simple reproducible reference logs, as I'll be using greedy sampling in Mikupad and have the full log in a pastebin at the end. No editing the model's responses, we're going to use pure prompting to try and get the thing to do what we want!

The scenario is also still TBD. We're going to go for as long a context as possible until the model breaks down uncontrollably, so it should be a complex enough scenario for that. If anyone has suggestions for scenarios I'm all ears. Also, I'm planning on starting these games with Mistral Nemo at Q8 for the first session, and other models in the future, so we have reference logs available for a whole range. But I'll take suggestions for models people want. I'm only a 36 GB VRAMlet though so I'm a bit limited. I can run larger models up to ~88 GB but it'd be slower. If anyone would like to host any of these games themselves, that has more VRAM to run such larger models at a good speed, please do, and I will step down.
>>
>>101982148
>>101982180

>[File deleted]
OH NO NO NO NO
>>
>>101982187
What did he post?
>>
>>101982013
purchase meds
>>
>>101982196
Screenshot with more than one (You), proving he samefagged, I guess he edited it in photoshop or something and then forgot to save it before he posted
>>
>>101982196
damning evidence
>>
/aicg/ has fully taken over /lmg/ by injecting their proxy/botmakie drama into this general. The locusts won.
That's what happens if you spoonfeed every braindead retard asking how to run 8b on their 3070 whenever the piss drinking proxy is down.
>>
>>101982196
https://desuarchive.org/g/thread/101981616/#101982148
>>
One day, /lmg/ and /ldg/ will merge into /lag/, local (general) ai general.
>>
>>101981654
Are you a large enough "man" to buy an ad?
>>
>>101981616
How soon until a local model gets to decide if I live or die for hecking disrespecting a corporate brand?
I feel like we humans got stuck in a lame cyberpunk larp with no way out.
>>
hello /g/
how do I install rocm on gentoo? here is the error I have with ollama https://0x0.st/XJeA.log
>>
>>101982177
So 65B? Has any other, better model been released since 2023 with enough data transparency/moral constitution to keep instruct data out of the base model and a pre-ChatGPT cutoff?
>>
>>101982232

>2 mentions with (You) out of 5 is considered samefagging
>two mentions that aren't even trying to deceptively samefag

what?
>>
>>101981714
>Last update 21 Apr
It's dead
>>
File: 1704035229922661.jpg (419 KB, 1536x2048)
>>101981616
>>
>>101982148
Hi Sao! The fame really got to your head, huh?
>>
>>101981738
>Recent Highlights
>UNA-TheBea
go back
>>
what are some good models in the 4b-6b range that are good at storytelling?
>>
CerealBENCH update
>Claude3.5 Sonnet
>GPT4o
>Nemotron-340B (new)
>LLaMA3.1-405B
>Hermes-405B (new)
>Mistral-Large2
>Qwen2-72b
>Claude Opus
>Mistral Nemo-12B
>Magnum-123B (new)
>LLama3.1-70b
>Qwen1.5-72B
>llama3-8b
>LLama3-70b
>Command-R+
>Claude Haiku
>DBRX
>LLama2-70b
>llama3.1-8b
>Mixtral8x22B
>Yi-34B
>Mixtral8x7B
will keep you updated
>>
>>101982563
what does this mean
>>
>>101982563
uh oh
>>
>>101982563
>CerealBENCH
Who?
>>
File: 1699673159184010.webm (1.89 MB, 352x352)
>>101982588
>he doesn't know
>neither do i
>wtf
>>
>>101982553
Minitron 4B might be the least retarded model in the range, but those are very small models; I personally haven't tried any.
>>
Why do weirdos fuck around with mini magnum 12b when Magnum 12b exists?

Tried both, using basic ChatML instruct/context and they're both similar speed (obviously) but one just seems way more intelligent. It literally brought up something that I did 80 messages ago which fucking never happened to me before with mini magnum (and I was maining this up until I found normal Magnum).

I can't even tell what model it's based off (normal Magnum). Now's the part someone says "They're actually the same shit" and I ask the mods to delete my comment
>>
>>101982697
go back.
>>
>>101982697
Insanely ironic post.

I assume you mean magnum v2.5

If so, yea, it's just a better mini magnum. Fucking retard
>>
>>101981988
I don't think turboprop is part of it unless they're on alt.

>>101981976
>intervitens
rpcal guy, although this method of quanting was found to be marginally detrimental
>Doctor Shotgun
8x7b LimaRP guy

Not sure about everyone else. But some made finetunes that seem to have gone under the radar in the past, others are slop mergers, and others are quanters.
>>
>>101982766
>>101982697
How does llama 8b >>101982563

stack up to nemo? Never bothered to even try it because everyone just shits on llama and jerks Nemo off
>>
File: .png (272 KB, 500x593)
>>101982563
>>
If I liked previous Magnum, forgot if llama3 or Qwen, will I like Magnum-123B?
>>
>>101982563
>Command-R+ that low
Never heard of the benchmark but it sounds bad
>>
>>101982845
Just look at
>llama3-8b
>LLama3-70b
>>
>>101982875
Damn, true I missed that
>>
File: 1718753174823426.jpg (75 KB, 540x471)
Which Mistral Nemo-12B is the one I should use? I tried starcannon-3 but I got errors when I tried to load it
>>
>>101983041
>but I got errors when I tried to load it
What errors? What loader (llama.cpp, exllamav2)? Are you using the latest version?
I'm trying gutenberg-12B-v3 right now and it's pretty much just mini-magnum with a different tone more or less.
It may or may not be more retarded also.
>>
File: screencap.png (293 KB, 1765x1967)
>>101983078
>Are you using the latest version?
it is probably 3 months out of date actually
>>
RnGuBENCH update
>llama3.1-8b
>Claude Opus
>Hermes-405B (new)
>Magnum-123B (new)
>LLama2-70b
>Mistral Nemo-12B
>LLama3-70b
>GPT4o
>llama3-8b
>Qwen2-72b
>Qwen1.5-72B
>Claude Haiku
>Mixtral8x7B
>Mistral-Large2
>Nemotron-340B (new)
>DBRX
>Mixtral8x22B
>Yi-34B
>LLama3.1-70b
>Claude3.5 Sonnet
>Command-R+
>LLaMA3.1-405B
will keep you posted
>>
>>101983041
For what?

ERP? Magnum 2.5

None of the others come close unironically and it's the only one worth downloading. I tried Celeste, even mini magnum itself and basic ole Nemo was better if a little dryer.
>>
>>101983215
>it is probably 3 months out of date actually
That's probably why then.
>>
>>101983291
what else would you use a local model for
>>
File: 1666868289436245.gif (2.5 MB, 360x374)
>>101983041
>>101983291
My biggest issue is how quickly they get into NSFW.

I heard Celeste is better at this, but I saw an anime picture on hugging box and closed out. Probably a prompt/card issue more than a model issue.
>>
>>101983376
>I heard Celeste is better at this but I saw an anime picture on hugging box and closed out.
Incredibly based
>>
>>101981620
true
>>
AAAAAAHHHHHHHHHH

Someone, PLEASE, I am ready to unleash my wallet for a 2nd GPU. THIS IS A HORRIBLE IDEA AND THE VOICES ARE WINNING

PLEASE SOMEONE TELL ME THERE ARE < 70B models OUT THERE THAT ARE JUST AS GOOD. PLEASE

AAAAAAGHHHHHHHH
>>
anyone got any ideas for fixing this? I am trying to run joytagger on llama 3.1.

This is the offending code


import torch
import torch.nn as nn

class ImageAdapter(nn.Module):
    def __init__(self, input_features: int, output_features: int):
        super().__init__()
        # projects vision features up to the LLM hidden size, GELU in between
        self.linear1 = nn.Linear(input_features, output_features)
        self.activation = nn.GELU()
        self.linear2 = nn.Linear(output_features, output_features)

    def forward(self, vision_outputs: torch.Tensor):
        x = self.linear1(vision_outputs)
        x = self.activation(x)
        x = self.linear2(x)
        return x

and the error:

File "Z:\forge-flux\stable-diffusion-webui-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 2189, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ImageAdapter:
size mismatch for linear1.weight: copying a param with shape torch.Size([4096, 1152]) from checkpoint, the shape in current model is torch.Size([8192, 1152]).
size mismatch for linear1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([8192]).
size mismatch for linear2.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([8192, 8192]).
size mismatch for linear2.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([8192]).
Processing complete
Press any key to continue . . .


It seems to come down to the ImageAdapter having been trained at 4096, while llama 3.1 70b has "hidden_size": 8192
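If it helps, here's a minimal sketch of why load_state_dict complains and what would make the shapes line up, reusing the ImageAdapter class above. The 1152 input width comes from your error message; the checkpoint filename and the guess that the adapter was trained against an 8B-sized (4096) hidden state are assumptions.

import torch

# The saved weights are [4096, 1152]: the adapter projects 1152-dim vision
# features into a 4096-dim hidden state, not the 70B's 8192-dim one.
CHECKPOINT_HIDDEN = 4096  # hidden size the adapter was trained for (assumption)

adapter = ImageAdapter(input_features=1152, output_features=CHECKPOINT_HIDDEN)
state = torch.load("image_adapter.pt", map_location="cpu")  # hypothetical path
adapter.load_state_dict(state)  # loads cleanly once output_features matches

# Loading is only half the battle: the adapter now emits 4096-dim embeddings,
# which a model expecting 8192 can't consume. You'd need an adapter checkpoint
# trained for 70B, or to run it against the model size it was built for (8B).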
>>
>>101983450
If you do that at least mine some crypto to justify the purchase as "an investment".
Who knows, maybe you'll luck out on the gamble and it'll actually pay for itself.
If not, you'll at least have enjoyed a better cooming experience than most of us.
Assuming that's money you can actually spend without fucking yourself over, of course.
Otherwise, please don't.
>>
>>101982831
Yes, it's essentially that but smarter (because of dataset additions & scale) and bigger.
>>
>>101983326
Yep it was. Thanks.
>>
mistral large is 100% a gpt-4 level model that I can run on my own computer, but I care less about that than I expected
maybe it's because I only get 1.5 tokens/sec which makes it a bit annoying to use, idk

I always thought I'd be more excited when this finally happened but it's been a few weeks and my expectations have already adjusted so it no longer feels cool

I guess local claude opus is the next milestone
>>
>>101983450
to tell the truth, I think 70B(+) is a marginal improvement over models like CR and even Nemo.
But I'm mostly using the model for long winded romance and loli rape so idk about all other scenarios.
>>
>>101983668
>1.5t/s
Damn, I get only 0.3t/s and I'm happy I can at least run it. But I'm with you on the "makes it a bit annoying to use".
>>
While I'm drunk I want to state that I'm in love with anon who keeps posting flux images with Miku saying stuff.
>>
I think local's played out for a bit, might as well take a break until nov 5 when you-know-what changes everything
>>
what dataset does magnum use
is it slop or is it kino
>>
>>101983769
proprietary shit that they don't want you to know about
hurray for modern day """open source"""
>>
>>101983783
so its just c2 slop then
>>
File: credits.png (15 KB, 482x187)
>>101983783
what
>>
>>101983815
you just edited that into the readme
>>
>>101983769
c2 which is claude chatlogs
>>
>>101983840
i meant what i said.
>>
>>101983815
>>101983840
hi antracuck
>>
Heads up I think DRY is broken in koboldcpp 1.73. Might wanna stay on 1.72 except if you need MiniCPM support or something.
>>
>>101983865
People use DRY?
>>
File: 1699499977296456.png (186 KB, 1352x1072)
>first page of anthracite dataset has this
yep its slop
>>
>>101983850
meds
>>
>>101983869
I do, Nemo is literally unusable for me without it. In an RP that was only 2000 or so tokens in, I had multiple repeats of "her pale thighs were on full display", "her innocent and playful nature shining through", "her pale skin seeming to glow in the sunlight" with traditional rep pen of 1.1, temp 1, minP 0.01.
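If anyone wants to reproduce that setup, here's a minimal sketch of enabling DRY through koboldcpp's generate endpoint (on 1.72, given the breakage mentioned above). The dry_* field names and the values are assumptions based on what recent builds accept; check your version's API docs.

import requests

payload = {
    "prompt": "...",           # your formatted chat context goes here
    "max_length": 300,
    "temperature": 1.0,
    "min_p": 0.01,
    "rep_pen": 1.0,            # classic rep pen off; let DRY handle the loops
    "dry_multiplier": 0.8,     # 0 disables DRY entirely (assumption)
    "dry_base": 1.75,
    "dry_allowed_length": 2,   # verbatim repeats longer than this get penalized
}
r = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(r.json()["results"][0]["text"])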
>>
>>101981654
>>101981673
It's quite good. Prose is very nice and it doesn't appear to be significantly less smart than Largestral.
Apologies for the schizo itt. He should be ignored until he inevitably gets bored or self-terminates.
>>
>>101982155
LLAMA-1 isn't slopped alongsidd every other model before L2
MPT is also unslopped.
Try Hyper Mantis for awhile, and then try Command R (even 35B.) with the pro writer preset
>>
>>101983896
>Nemo
>Temp 1
>>
>>101983894
wow, nice 'curated' dataset
>>
>>101983894
For Nemo, magnum v2 and 2.5 are so bad with that shit if you straight up prompt it to write stuff for you in instruct mode. Always moralfagging and refusing. It doesn't seem to trigger in Tavern use though.
Mini-magnum is a lot better in general imo than the "updates", and even that one is just good for what it is, nothing AMAZING.
>>
>>101981616
da fuck
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
>>
>>101983896
Sounds like a skill issue to me. I'm not going to use a sampler that even its creator, in the original pull request, said is prone to generating garbage.
>>
>>101983922
Yes, I'm aware low temp is what they suggest, but are you seriously implying that lowering the temp will make it less likely to write sloppa phrases where the starting token has a 95% chance even at temp 1? That will just make it 99.9% instead.
>>
>>101983935
do you genuinely think that what you'd consider a "non-slopped" RP dataset exists?
I think that's an insane thing to believe, if so
>>
>>101984016
Yes. Pygmalion's original one probably isn't. It was before AIslop appeared after all
Or you could tell an LLM to change the format of a book into a chat file format
>>
File: ComfyUI_00960_.png (1.07 MB, 856x1024)
>>101982078
This explains a lot about the surge in schizoposting. Poor Lemmy needs the Nurse Ratched treatment for his own good.
>>
File: file.png (10 KB, 758x53)
Sex is dangerous because it kills your voice. This is how you know all idols are pure since they can always sing on stage.
>>
>>101981711
Who is gonna pay money for sloptunes? Kofi doesn't count btw.
>>
>>101983041
Procure a commercial
>>
>>101982078
>it'll just prove me correct.
We are on /g/? Inspect element? Also:
>Anthracite for kicking him out of their 'cord
You are admitting anthracite is a group with discord drama?
>>
>>101983840
Why can't they use human RP chat logs? Too hard to clean up? I have gigabytes of logs from MU*s from back in the day. Surely lots of people have similar shit.
>>
I think llama 3.1 70B is actually really good at following instructions. anyone still use it for general purpose tasks? mistral large doesn't work as well.
>>
Anyone know how to get MiniCPM with vision working on ST? I got it set up with Kobold, but when I attach an image in ST, it's like it doesn't see the image. Is it just not supported by ST?
>>
>>101984397
3.1's chain of thought breaks apart after several messages for whatever reason, from my experience.
>>
>>101983668
>mistral large
how into?
>>
>>101984421
I don't have that issue, it works fine for me. It has a good vocabulary too.
>>
>>101984465
the cheapest way to run it is two 3090s can run a 3.0 quant
>>
https://huggingface.co/anthracite-org/magnum-v2-123b/discussions/4
anthracite won
>>
>>101984819
Won... what?
>>
>>101984819
>gave in to the eric bribe
they lost
>>
>>101984819
Further proof that 123B will be their last open source model.
>>
File: 38543.webm (608 KB, 960x960)
>>101983668
Multitoken bitnet will save you
>>
>>101984895
hi petra
>>
File: feels.jpg (11 KB, 360x326)
>>101981650
Those are some of the most thrusty individuals in the biz anon, what do you mean?
>>
>>101984895
with not only fill-in-the-middle and lookahead, but also reverse token prediction, where it can predict *backwards
>>
magnum 32b is the goat for me atm with 36gb vram
large enough to not be retarded, small enough to run quick
>stop shilling
kys, I will NEVER stop shilling
>>
>>101983894
Notice how it has absolutely none of the usual Opus refusal phrases (Perhaps we can...) (I'm sorry...).
It's clearly filtered of the shit, just not perfectly, lmao. Nitpicking fag
>>
>>101985161
redeem an endorsement
>>
when I am in a being-a-fucking-retard contest and my opponent is anons who think anyone is shilling free models to 19 posters on 4chins :( :(
>>
I have an adblocker, but if I could see ads I would stay away from any model that gets an ad. I guess a proper strat in this case is to buy ads for everyone in your competition...
>>
just coomed to magstral 132
>>
>>101985237
I think the current cope explanation from the shill accusation schizos is that the shills are doing it to wrangle a job with big tech
>>
>>101985237
>when I am in a being a fucking retard contest
You have nothing to worry about.
You got this anon!
>>
>>101985161
v1 or v2? Heard someone say v1 was more SOVL
>>
>>101985237
They spend hundreds of dollars to create those worthless models for attention. They aren't reasonable people. Attention means everything to them.
>>
>>101985237
>free models
https://featherless.ai/
>>
>>101985308
v2, I never tried v1
>>
File: file.png (769 KB, 768x768)
>>101985270
Hire me! I do it for free like a janny.
>>
File: goaway.gif (1.69 MB, 423x234)
>>101985309
It couldn't be more obvious you have an axe to grind. Nobody gives a shit about your butthurt feelings. Your continuing to post here is a waste of time. You contribute nothing.
>>
File: 1715195120920-0.jpg (27 KB, 720x833)
Is anyone here doing anything interesting with their local LLMs outside of ERP slop? Anything involving RAG, RAFT, chain-of-thought, introspection, autonomous agents...?

I'm thinking about building a local SQL analyst agent and making 1 billion dollars
>>
Considering that there is discord drama playing out in this thread right now, on a scale of 1 to 10, where 1 is dead general and 10 is open source AGI on 24GB vram, where is /lmg/ right now?
>>
>ugly face anon was the magnum shill
somehow it doesn't surprise me that a spiteful bastard like him would do that
>>
>>101985409
9.5
>>
File: 1529589437471.jpg (165 KB, 1400x1400)
OK so I just tried MiniCPM in Kobold's chat (which sucks btw wow, somehow it's worse than ST) and... it's not good. The model is dumb first of all which is to be expected, but it doesn't seem to do regular RP + images well. Like I specified that I pulled out my smartphone and showed the character a pic, but it totally just hallucinated what was in the pic. Additionally, I made sure that the vision was actually working, since it seems to be fine if I just prompt it as a normal assistant without any RP scenario. It only breaks when you do RP and insert an image in the middle of it. I guess they really only trained this on image Q&A and it didn't generalize to all contexts. Sigh.
>>
>>101985401
Making a game with one in my free time, where the AI will generate npcs + their backstories + dialogue. The game itself will keep track of the narrative and prompt/keep the llm aware of certain things ingame. It's all prototyping for now though (getting things in command line, flip flopping between game engines)
>>
>>101985413
I have no opinion on magnum.
>>
>>101985401
I don't use LLMs for ERP, but not for anything useful either. Just quick chats/questions on stuff I'm too lazy to read the manual for (but I end up having to read it anyway most of the time because the LLM is too retarded), or using it as a rubber ducky/DEI hire. I also use an LLM as a grammar corrector in my editor.
>>
>>101984316
https://featherless.ai/
https://pygmalion.chat/
it wouldn't surprise me if magnum models become pyg-exclusive since it's alpin's company
>>
>>101985409
0
>>101985401
LLMs are useless for anything that involves accuracy; they will always end up hallucinating, always. So I think ERP, storytelling and the like are the only good use cases for LLMs.
>>
>>101985461
That won't happen, I don't like closed weights. I'm only working on magnum because I don't want anthropic to have my goon logs

t. alpin
>>
I just had an idea for a cooming quality agent. What if you ran generation at a schizophrenic, borderline incomprehensible temperature and then asked the model to cut out all the schizo shit at a very low temp?
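A rough sketch of what that two-pass loop could look like against a koboldcpp-style endpoint. The endpoint URL, the temperatures, and the rewrite instruction are all assumptions to tune, not a known-good recipe.

import requests

API = "http://localhost:5001/api/v1/generate"  # koboldcpp-style endpoint (assumption)

def two_pass(context: str) -> str:
    # Pass 1: draft at a borderline incomprehensible temperature.
    draft = requests.post(API, json={
        "prompt": context,
        "max_length": 300,
        "temperature": 3.0,
        "min_p": 0.05,  # small floor so the draft stays vaguely parseable
    }).json()["results"][0]["text"]

    # Pass 2: ask the same model, at low temperature, to salvage the draft.
    rewrite_prompt = (
        f"{context}\n\n[Rewrite the following draft so it reads coherently, "
        f"keeping the interesting ideas and cutting the schizo parts:]\n{draft}\n"
    )
    return requests.post(API, json={
        "prompt": rewrite_prompt,
        "max_length": 300,
        "temperature": 0.3,
    }).json()["results"][0]["text"]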
>>
>>101985461
I am heterosexual

t. alpin
>>
>>101985530
>lying
>>
File: 1723863602408-0.png (540 KB, 3300x3000)
>>101985493
>LLM is useless for anything that involves accuracy, it will always end up hallucinating, always. So I think ERP, story telling and the like are the only good use cases for LLMs.

I don't agree at all. I manage a team that uses a lot of offshore contractors for maintenance tasks. As you can imagine, they don't care about their jobs, they can barely speak English, everything is perpetually late, and their code fucking sucks. I've started using LLMs to review their piece of shit MRs and make refactoring suggestions. LLMs are really good at "refactor this highlighted code to be DRY" and "write a better docstring", or "simplify this code block".

I'm finding that the productivity increase provided by coding with modern LLMs could be significant enough to reduce the need to rely on contractors at all. I learned a brand new tech stack in an afternoon and deleted a shitload of legacy code using regular old Copilot, and there are way better models out there now.

I have lots of ideas for how I could be more productive at work with LLMs, but I'd rather not waste them on being a wagecuck and am trying to direct them towards enterprises that will personally profit me
>>
>>101985461
Magnum 123b is indeed the last of our open source models. Thanks for the beta testing.

t. alpin
>>
Undi is the MVP of our team.

t. alpin
>>
>>101985309
but i thought they did it for the kofi money
>>
am I allowed to say I'm still using mixtral or do I have to buy an ad?
>>
>>101985684
8x22B or 8x7B? If the latter, Nemo is superior
>>
Finally tried llama3-8B and it was as retarded as mythomax. The only difference was that it's more confidently retarded
>>
File: Untitled.png (1.1 MB, 1080x2137)
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models
https://arxiv.org/abs/2408.10174
>Deep model training on extensive datasets is increasingly becoming cost-prohibitive, prompting the widespread adoption of deep model fusion techniques to leverage knowledge from pre-existing models. From simple weight averaging to more sophisticated methods like AdaMerging, model fusion effectively improves model performance and accelerates the development of new models. However, potential interference between parameters of individual models and the lack of interpretability in the fusion progress remain significant challenges. Existing methods often try to resolve the parameter interference issue by evaluating attributes of parameters, such as their magnitude or sign, or by parameter pruning. In this study, we begin by examining the fine-tuning of linear layers through the lens of subspace analysis and explicitly define parameter interference as an optimization problem to shed light on this subject. Subsequently, we introduce an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction, which allows for the upscaling of source models into an MoE model without extra data or further training. Our approach relies on the observation that fine-tuning mostly keeps the important parts from the pre-training, but it uses less significant or unused areas to adapt to new tasks. Also, the issue of parameter interference, which is intrinsically intractable in the original parameter space, can be managed by expanding the dimensions. We conduct extensive experiments across diverse scenarios, such as image classification and text generalization tasks, using full fine-tuning and LoRA fine-tuning, and we apply our method to large language models (CLIP models, Flan-T5 models, and Mistral-7B models), highlighting the adaptability and scalability of SMILE.
https://github.com/tanganke/fusion_bench
neat
>>
also Zeyuan Allen-Zhu's keynote talk is back up (after ICML told him to hide it for a month or something)
https://iv.ggtyler.dev/watch?v=yBL7J0kgldU
>ICML 2024 Tutorial: Physics of Language Models
>>
>>101985742
Thanks! Will watch soon.
>>
File: 1529454069089.png (34 KB, 196x197)
Reminder to compile with -j 8 (or however many cores you have). Somehow I didn't copy it into my command and now it's taking forever, but I'm not going to end it in the middle lol.
>>
>>101985928
Use ninja.
>>
>>101985947
Why?
>>
>>101985947
I can't use Ninja because it wasn't made in Rust.
>>
>>101981616

How the fuck do I use this? All it has is signing-up via email for updates and making an account then accessing it. It doesn't make it clear how to actually use the fucking thing. Holy shit, what a horrible confusing UI...
>>
Damn, that new 123b is pretty nice. Maybe a bit dumber but I haven't noticed any intelligence loss so far (probably would only notice if I was giving it programming questions or something?)
>>
>>101985996

Sorry; I'm a genetic dead-end. Here is pic-related. Please and thank you.
>>
>>101986002
secure an advert
>>
>>101986005
lol
>>
https://github.com/thudm/longwriter
vllm code is up
https://arxiv.org/abs/2408.07055
>>
File: MistralLargeSettings1.png (129 KB, 1330x1114)
>>101984465
NTA. Running it on a 4090 with good 64 GB DDR5 RAM at Q2_K GGUF (minimum 16 GB VRAM and good 32 GB DDR5 should also be doable-ish). It's fast enough, ~2 t/s; any bigger quant with that setup goes slow and hot (but I'm eyeing IQ3_XXS). It has a much better grasp of where a story can go than smaller models, still needs a bit of hand-holding every 5-20 sentences, but it's 'surprisingly usable'. Impressed with it compared to old 70b models at the same file size.
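For anyone who'd rather script it than use a UI, a minimal sketch of the same partial-offload setup via llama-cpp-python; the filename, layer count, and thread count are assumptions to tune for your own hardware.

from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-Large-Instruct-2407.Q2_K.gguf",  # filename is an assumption
    n_gpu_layers=30,   # partial offload: raise until you run out of VRAM
    n_ctx=8192,
    n_threads=8,       # match your physical core count
)
out = llm("[INST] Continue the story: [/INST]", max_tokens=256)
print(out["choices"][0]["text"])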
>>
What is your absolute minimum acceptable tokens per second?
>>
>>101986461
Just need 1 at full context and I'm good.
>>
>>101986461
I find myself getting frustrated below 2
Anything above that is okay
>>
>>101986461
Like 0.5, I'm fine with giving it a generation limit, hitting go and doing something else for a while. Models that run that slowly are usually decent enough to not need too much handholding after curating the initial context.
>>
>>101986508
the instant gratification of a high token rate goes crazy though
>>
File: 1449209264738.jpg (120 KB, 400x400)
>testing models and performance
>try complete offloading vs partial offload
>use a swipe in ST to test
>coincidentally read the outputs while I'm at it
>they're different
>check samplers
>yes, all greedy, and confirmed that when swiping multiple times with the model loaded, the output does not change (and I disabled caching so the related nondeterminism bug/behavior can be avoided)
>so token probabilities do truly change a slight enough amount depending on how you're offloading
Damn. This might make the reproducibility of logs an issue. If there is some randomness and rounding error depending on how one is offloading or depending on hardware, that means that on other people's setups, they could get a different output. And thus even if someone posts a true full log, it couldn't be reproduced with exactness without manually correcting some tokens. This wouldn't be a problem if models weren't autoregressive, but they are, so a different token early on can snowball and end up making the later generation possibly go in a completely different direction.
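For anyone who wants to quantify this, a sketch of the comparison: generate greedily under one offload config, restart the backend with a different layer split, generate again, and diff. Using temperature 0 with top_k 1 as a greedy stand-in is an assumption about the backend; check how yours exposes greedy decoding.

import requests

API = "http://localhost:5001/api/v1/generate"  # koboldcpp-style endpoint (assumption)

def greedy(prompt: str) -> str:
    # top_k=1 at temperature 0 approximates greedy decoding on most backends
    r = requests.post(API, json={"prompt": prompt, "max_length": 200,
                                 "temperature": 0.0, "top_k": 1})
    return r.json()["results"][0]["text"]

def first_divergence(a: str, b: str):
    # index of the first differing character between two runs, or None if equal
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))

# Run greedy(prompt) once per configuration (restarting the backend with a
# different offload split in between), save each output, then compare:
# first_divergence(out_full_offload, out_partial_offload)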
>>
>>101986552
I hate this shit, happens with image models too

I noticed when I was using Flux that I get subtly different images depending on whether the clip/t5 text encoders are running on the 3090 or offloaded to the 3060
>>
>>101986631
Yeah, I think different hardware will give different results, even if you have the same vram but a different card, like 3090 vs 4090 even with the same settings.
>>
File: ComfyUI_00407_.png (1.05 MB, 1280x720)
>>101986631
>when I was using Flux that I get subtly different images depending on whether the clip/t5 text encoders are running on the 3090 or offloaded to the 3060
You are LITERALLY me. I could've posted this.
>>
>>101986687
kek
I think 3090 + 3060 is a very popular combo for semi-poorfags who are into AI
>>
>>101986552
Yeah all that shit is too close to hardware, take it as a ML thing and move on.
I would also argue that randomness > reproducibility. Makes RP more interesting and ruins the technocorp dream of having perfect slaves.
>>
File: ComfyUI_00417_.png (956 KB, 1280x720)
>>101986716
Yeah, though it's still pretty amazing that we both tried Flux (as /lmg/fags), both tried offloading the text encoders, and both noticed the difference.
>>
Anons who have tried mini magnum, magnum 2 and magnum 2.5, what are the biggest differences you've noticed?
>>
can somebody share context/instruct jsons for magnum 2.5 kto
thanks
>>
Any text to speech local model that's good now?
>>
>>101986432
>Q2_K gguf
where is this?

I know about:
https://huggingface.co/mistralai/Mistral-Large-Instruct-2407/tree/main
>>
>>101986461
20 T/s
>>
why does /lmg/ shill used 3090 so much
it's insane
>>
>>101986950
cheapest nvidia 24gb.
>>
>>101986950
Because it's unironically more cost-effective than price-inflated P100s at the moment
>>
>>101986950
Can you tell me where to get 24GB vram for less than 700 dollars so I don't have to buy 3090s anymore?
>>
>>101986819
Doesn't it use chatml? SillyTavern comes with the presets. Put in whatever prompt you want.
>>
>>101986950
I work for Ebay and they give me kickbacks for each successful used 3090 auction that can be traced back to this thread.
>>
>>101986950
Great performance, 24GB of VRAM, regularly around 500-600 dollars used.

Good job on baiting out all these replies.
>>
>>101986950
they mine bitcoins
>>
>>101986891
Major models should have well-known users on huggingface doing all sorts of quants/merges/finetunes. I used https://huggingface.co/mradermacher/Mistral-Large-Instruct-2407-GGUF, but there are others with more granular quants.

I'm using koboldcpp 1.71 as the LLM front-end: github.com/LostRuins/koboldcpp/releases/tag/v1.71.1 (plenty of others, but koboldcpp is easiest; use cu12 for slightly faster speed on new nvidia gpus). (Update your graphics card drivers, to at least 551.86 if nvidia.) (And read the guides in the OP, but you should be able to load and run.)
>>
>>101986990
I'm convinced. The shills are so desperate to correct you in SEA time. They are all selling 3090s on ebay from china. Really want to punch them in the face desu. Just fuck off with their used hardware.
>>
>>101987245
yeah, they seem pretty dodgy. gpus are such a mixed bag rn. And a lot of people mined their cards 24/7 just in their homes, which makes it impossible to know what you're getting.
>>
>>101986950
Because they're ex-miners desperately trying to sell their used up scrap
>>
>>101986990
>>101987003
>>101987245
>>101987334
>>101987450
Holy shit, is this the birth of some new thread schizo? Or is lmg really the scam of some time-traveling cryptobros trying to get literal peanuts for what ironically happens to be the most vram-per-dollar card you can buy? Diabolical.
There’s an intrigued expression on my face as I observe you entering this realm, wondering what secrets or stories you might bring with you.
>>
>>101983894
>finetuning a model and giving it brain damage to make it better at spewing that
>>
>>101987525
I made the ebay post to mock the retards constantly schizoposting about "shills"
>>
Seriously really cool by the anthracite peeps to be so damn open. I feel like I'm fumbling in the dark most of the time trying to figure out which settings work for what models. Having actual axolotl configs complete with datasets available is so fucking nice as reference.
>>
File: 1723986223204170.gif (193 KB, 148x108)
>>101986950
just buy 3090
it's fine, this won't happen to you lol
>>
yep magstral large is solid. used it to goon for the last four hours and i can vouch, good stuff. no real noticeable decline in logic but closer to cohere's writing
anthraxide, i kneel
>>
>>101987563
Enjoy it while it lasts because it will the first and last model like that. None of them actually believes in open source.
>>
File: smi.png (37 KB, 665x472)
>>101986950
>why does /lmg/ shill used 3090 so much
they're unemployed
used a6000s are the only sensible way to enjoy this hobby
>>
>>101987565
nvidia makes dodgy stuff as is.
>>
We don't need Cohere or Meta when we have Anthracite. God bless 'em.
>>
>>101987613
>still uses 3090
>>
Will anyone ever figure out how to make flux run fast on amd?
>>
>>101987613
Have you tested the speeds between a single 3090 vs an A6000?
>>
>>101987605
The fuck are you, the king of FOSS? Fuck off. No one else is being even close to as open as they are. Even if they never made anything public again, it'd still be better than the rest.
>>
>>101987632
flux doesn't run fast on anything
even 4090s have to wait ages for a gen compared to sdxl
>>
>>101987655
Aren't 4090s way faster though? Like it takes me 2.5 minutes to gen a 1024x1024 image with real dev fp16, 20 steps, and euler.
>>
>>101987648
They only did that because of the spam in the thread.
That means that once the spam stops or they stop caring about it, they will go back to being closed source.
If you support Anthracite, they will stab you in the back.
>>
>>101987628
it's the stable diffusion slave...
>>101987633
marginally slower than the 3090 for llms, which makes sense because the memory clocks are lower. i run a slight memory overclock to narrow the gap and the difference becomes a few percent, it's basically not noticeable.
with Mistral Large 5bpw and 3-way tensor parallelism in exllamav2, i can get about 15.1t/s across the three cards. without it (exl tp is still unfinished and not ready for actual usage) it's around 9.2t/s using just the two A6000s.
interestingly, my A6000s are actually ~10% faster than the 3090 when genning images via ComfyUI, which I don't quite understand because IIRC they also have slightly lower graphics clock speeds than the 3090.
>>
>>101987681
sorry you got kicked off a discord bro
>>
>>101987691
how are the mi cards with llm? like the mi100, mi60, mi50?
>>
>>101987691
That's reassuring. I have two 3090s and an open PCI-E slot, so I was wondering if an A6000 was worth it to tide me over with 96 GB of VRAM, since larger models seem like the way local is going.
>>
>>101987694
NTA but can I join? I am trying to make models and I would love to talk to someone who knows shit about it lol.
>>
>>101987737
Hi lemmy
>>
>>101987707
no clue, i have not touched AMD since getting into ML stuff a couple years ago, was too scared away by stories of bad drivers and ROCm problems
>>101987714
i was actually running 2x (and then 3x) 3090s at the start of this year and did more or less that -- sold one and picked up an A6000 for 96GB.
last month i sold another 3090 and picked up another A6000 for power usage and heat reasons, and also to be able to run more models on 1 or 2 cards. 3-way split is awkward for a lot of workloads and backends.
>>
>>101987737
can't help sorry I don't use discord
I just had a vague recollection of someone saying the anti-anthracite schizo is angry about being kicked off of one
>>
>>101987761
>I just made shit up
>>
>>101987655
>>101987674
I found this, which is strange.


AMD 7800 XT 16GB Vram, 32gb DDR5 6000mhz
ComfyUI Zulda
Flux fp8 dev

Around 900-1200 seconds. 60s/it.

Painfully slow on amd atm unfortunately


that's way slower than my 6950xt, even when I had 32gb ram, I got 64 today.
>>
>>101987771
I think he meant Zluda, the Windows Cuda thingy

I'm in Ubuntu, and have rocm on comfyui.
>>
>>101987779
I meant AMD Windows Cuda "emulation"
>>
>>101987691
>3-way tensor parallelism in exllamav2
Wait, exllama2 has TP?
>>
>>101987825
It's been in the works, bless turboderp
https://github.com/turboderp/exllamav2/issues/571#issuecomment-2275674926
>>
>TP only matters when turboderp does it
>>
>>101986950
because they're cheap and you can undervolt them
>>
>>101988030
Undervolting for roi when mining?
>>
File: 1723349016667994.gif (371 KB, 128x128)
So what's the best nemo version or remix or whatever you call it?
>>
I think mixtral rocks.
>>
I thought the cheapest option was the M40 and P40 24gb GPUs rather than 3090s.
>>
>>101981671
Such anger for such a little man
>>
>>101986950
It's a massive cope. With mistral large being gpt 4 level it only confirms that cpu maxxing is the way.
>>
>>101988217
>cpu maxxing is the way
I thought mistral large required a beefy gpu.
>>
>>101988077
I'm enjoying starcannon.
>>
>>101987979
Is there any other implementation that works with an uneven number of GPUs?
>>
>>101988308
All of them?
>>
>>101988126
P40 is outdated and M40 is e-waste.
>>
>>101988126
cheapest is MI300
>>
>>101988341
>Exactly 2, 4 or 8 GPUs
You sure?
>>
>>101988375
Yes, why?
>>
>>101986974
MI100
>>
>>101988393
Last time I checked it was unsupported
https://github.com/vllm-project/vllm/issues/5003
>>
>>101988418
You clearly have no idea what you're talking about.
>>
>>101988485
I'm asking questions.
>>
File: puffs.gif (181 KB, 635x640)
Why is Magnum so controversial on here?
>>
mistral large:
I have a 6950xt, which is an amd card with 16gb of vram. My system ram is 64gb.

Hopeless case? I see the link here, but that's a big download, will it work okay?
>>101986891
>>
should i get 2x mi210 or 1x a100
>>
>>101988656
Feels like every small model is controversial for one reason or another
>>
>>101988217
how do you cpu maxxing for cheap?
>>
>>101988850
you buy big sticks of ram and accept your 1T/s
>>
Damn Rocinante is too filthy for a lot of the cards out there.
Any mention of sex in their bio at all turns them completely degen.
First time trying to run 128k context. Seems neat.
>>
>>101988732
meant to refer to:
>>101987197
>https://huggingface.co/mradermacher/Mistral-Large-Instruct-2407-GGUF
>>
File: denomolos+.jpg (378 KB, 791x662)
>>101988568
Ignore Heinrich and his fellow alpha male wannabes from /pol.
>>
File: file.png (115 KB, 646x443)
>>101952064
I desperately need to talk to you. Does that blurb of text go in Silly Tavern's Story String? What else do I need to do?

I'm not even setting it right but the difference is enormous.
>>
File: .png (290 KB, 447x447)
>>101988656
Just our local brand of schizos.

>WHERE'S YOUR FINETUNE DATASET!? WHY ISN'T IT OPEN SOURCE!? WHERE'S YOUR SETTINGS!?! IT WON'T SAY NIGGER NIGGER NIGGER WHEN I USE IT INCORRECTLY! BUY AN AD!
>Meanwhile, EVERY "open source/free to use" foundation base model that gets fine-tuned on doesn't divulge or release what data was used to train it, or is vague about it.
>>
>>101988656
Because it's useless unless your whole purpose is to load it up for five messages, do your business and that's it.
>>
>>101988850
Wait for a few more gens of epyc cpus to come out then buy the genoa one for cheap. Oh and wait for ddr5 memory to be obsolete and cheap too.
>>
>>101988977
>still making excuses
There's no reason to defend closed-source releases. Either you believe in free software, or you don't.
>>
>>101989020
is threadripper faster?
>>
>>101989044
Still enjoying tinyllama-1.1B?
>>
>>101989055
No.
>>
>>101989044
Okay, I expect Meta's, Cohere's, and Mistral's training scripts, command lines used, any other software and the hardware necessary to be included along with my fine tunes. I don't want any non-free shit in my fine tunes either. I don't have to justify why, I simply believe my whole model must be free from beginning to end.
>>
>>101988977
I think we're getting better at telling them to shut the fuck up though, at least slowly. The problem is that they don't seem to be getting better at listening.
>>
>>101988977
>EVERY foundation base model
And Linux shouldn't exist because Windows isn't open source
It matters little what the companies are doing
>>101989128
And then you only get the weights and nothing else
In what way does defending that benefit you?
>I will defend the position that hurts me the most
>>
>>101989103
why not
>>
https://eqbench.com/creative_writing.html
An 9B mogged every model in EQ bench, anyone tried it?
>>
>>101989203
>all those models above Opus
it's a meme benchmark
>>
>>101988977
>WHY ISN'T IT OPEN SOURCE!?
yeah, why isn't it open source?
>>
>>101989238
They never were open source. Once in a blue moon a binary digit will flip and you'll get a refusal. Not from the merge method, or fine-tuning method, but straight from the base model.
>>
The /lmg/ Shill Schizophrenia Benchmark
I propose that a model's quality for erp and uncensored shenanigans is approximately proportional to the frequency of posts accusing proponents of that model of being a shill.
It would be interesting to graph posts from desuarchive mentioning specific models, do sentiment analysis, and automatically rank them based on how much seethe they cause, with models causing seethe over a longer period of time placing higher on the score. I think it would do a surprisingly good job of sifting good models.
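A sketch of the tally half of this, with the sentiment analysis swapped for a crude keyword match. It uses desuarchive's FoolFuuka search API; the endpoint, parameters, and response shape are assumptions from memory, so check the archive's API docs and rate limits before running it.

import time
import requests

MODELS = ["magnum", "stheno", "celeste", "mythomax"]  # hypothetical shortlist
SHILL_WORDS = ["shill", "buy an ad", "astroturf"]

def seethe_score(model: str) -> int:
    # counts archived /g/ posts mentioning the model next to a shill accusation
    hits = 0
    for word in SHILL_WORDS:
        r = requests.get("https://desuarchive.org/_/api/chan/search/",
                         params={"board": "g", "text": f"{model} {word}"},
                         headers={"User-Agent": "lmg-seethe-bench"})
        hits += len(r.json().get("0", {}).get("posts", []))
        time.sleep(2)  # be polite to the archive
    return hits

print(sorted(MODELS, key=seethe_score, reverse=True))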
>>
>>101989203
>why is the model trained on books better at writing
https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1
>>
>>101983951
>sloptuners cucked up a model with their "curated" slop >>101983894
Oh no no no...
>>
>>101989267
>defending shilling
>>
Can anyone point me in the direction of some GGUF formatted vlms? I can't find shit.
>>
File: oduGog0.png (27 KB, 679x197)
>>101989317
>vlm
LLaVA?
>>
>>101989267
It wouldn't be too hard to make an LLM review each thread and keep a tally through tool calls like the recap prompt. I kind of like the idea.
>>
>>101989336
Yeah, like that, but in GGUF format. The Kobold CPP page even has a screenshot of them using one in kobold.
>>
>>101989317
As in Visual Language Models?
>https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf
>https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf
I don't know if they're any good.
>>
>>101989355
Looks promising. Thanks
>>
>>101989345
Tell me how I know you haven't gone to huggingface and just searched "llava gguf"
>>
>>101989267
>I think it would do a really perhaps surprisingly good job of sifting good models.
it's probably only going to list sao's models because the whole thing is a reaction to his excessive astroturfing
>>
>>101989382
I feel like I did but it never came up. Maybe I'm just a retard.
>>
>>101989383
>still obsessed with sao
he doesn't even post here
>>
>>101989395
https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf
>>
>>101989400
what's this then?
>>101981976
>made Stheno, Fimb, etc... probably the single most capable finetuner in the group
>>
>>101989421
not a sao post
this was
>>101923155
>>
>>101989457
both are
>>
>>101989482
nah you legit stupid if you can't tell the two apart
>>
File: ComfyUI_00071_.png (876 KB, 1024x1024)
BREASTS FINALLY UNLOCKED
>>
>>101989495
he only uses the reddit spacing on purpose when he wants to be noticed
>>
>>101989500
actually, why do guy's breasts look odd?
>>
>>101989514
I imagine the middle pops out like a sports water bottle and sprays pus when you suck em.
>>
https://videocardz.com/newz/nvidia-geforce-rtx-4090d-with-48gb-and-rtx-4080-super-32gb-now-offered-in-china-for-cloud-computing
>>
i do not like hermes 3 mistral large 2 or cr+ please help
>>
>>101989662
>>>/g/aicg
>>
>>101989677
they only know claude and i dont like that either
>>
>>101989654
China won :(
>>
>>101989687
There's also GPT4 and Gemini.
>>
>>101989662
Explain why you don't like them if you want recommendations. Chances are that you're gonna be told 'skill issue', but you may as well try.
Also, make sure that what you want is reasonable. If you have high expectations you're set up for failure.
>>
so how does hermes 3 compare to largestral?
>>
>>101989709
>Also, make sure that what you want is reasonable. If you have high expectations you're set up for failure.
>they only know claude and i dont like that either
>>
>>101989711
very bigly bad https://ilyagusev.github.io/ping_pong_bench/en
>>
>>101989711
Not worth getting fried over(if you run it locally), absolutely not, but worth giving it a try. For my usecases it wasn't better.
>>
>>101988656
Because it's made by a group of discord drama queens and troons, backstabbers, grifters. They have no credibility and all have this annoying fake-edgy attitude. They're only capable of appropriating others' ideas. They preach openness and then hide their data, or deliberately release shit to the public just so they can have an edge over independent finetuners. All while claiming to be spending hundreds to thousands of dollars on compute while they're obviously using alpindale's (grifter in chief) corporate-sourced borrowed compute that he's not supposed to use for this shit.

Enough reasons to hate them?
>>
>>101989788
Lemmy has screwed over his user base by ignoring repeated requests to stop using people's private chat logs to train his AI models without explicit consent. He doesn't even have the basic decency to acknowledge or respond to these requests, showing a complete lack of respect for the open source community. Lemmy does not deserve your engagement or data until he starts implementing more ethical and transparent data practices.
>>
>>101989840
>05:55:55
ok checked, what does lemmy do, so I know to avoid?
>>
>>101989736
something's broken with that guy's setup, he shows a lot of literally blank responses, which indicates a malformed prompt
especially for hermes, where even in the chats that eventually started working, the first response was usually blank
>>
>>101989858
Avoid any model that has had any contribution by https://huggingface.co/aaronday3
>>
>>101989840
>>101989788
You fuckers care about ethics now. I'm sure all data scraped by meta, google, cohere, mistral and the rest is 100% ethically sourced.
>>
Anthracite this, Lemmy that, Sao there... Makes me want to make my own tune and shill it here just to cause even more chaos.
>>
File: .png (174 KB, 1239x839)
>>101989876
like lol, wtf is this, either the prompt is completely messed up or there was some sort of crash/outage while testing that wasn't detected
>>
>>101989003
Just finished an adventure RP with over 100 messages on Magnum-123B. You're retarded.
The fact that there is so much autistic screeching about this one model coming from two or three schizos in this thread is embarrassing. Get a life, faggots.
>>
File: 3xFPAbN.png (62 KB, 860x879)
Magnum v2 32b confirmed garbage. Even gemma 27b gets this right.
>>
>>101989978
He's very likely prompting the model incorrectly. There's no way you can look at empty assistant results and tell yourself that it's the model's fault, knowing how these models behave. That or they're an actor with ulterior motives.
>>
>>101989736
Considering how high magnum scored on there, it's another scheme made by anthrocite to downplay their competition
>>
sao is not very good at false-flagging
>>
>>101990101
mag72 got beat by haiku and wizlm to that no very good
>>
>>101990064
doubt it's anything malicious or he wouldn't have published the logs; just needs some fixing
found the post about it on reddit and the guy says he might have messed up by having everything wrapped up in a single API call:
https://www.reddit.com/r/LocalLLaMA/comments/1evyof0/comment/livy3e7/
>>
>>101990042
These tests only tell you if the question was in the training data, nothing else.
>>
File: alpindale.png (12 KB, 453x139)
>>101990113
Explain this then.
>>
>>101990122
are you brain okay? it reupload cause ms took wizard to death it no from him
>>
>>101990122
https://openrouter.ai/models/microsoft/wizardlm-2-8x22b
>Alpindale is a Microsoft plant
You did it, Anon! You solved the conspiracy. Microsoft was bringing down the true open models all along. Bravo!
>>
>>101990119
Wrong, it's the other way around
The model can only fail this question if it was overtrained on riddles. A model with basic common sense that isn't overtrained on riddles in its training data will simply say "2kg is heavier than 1kg". That's the point of the question.
>>
>>101990174
>The model can only fail this question if it was overtrained on riddles
Are qwen base models overtrained on riddles?
>>
>>101990198
No idea, I am specifically talking about the model "Magnum v2 32b", I don't care about base qwen
>>
O-JEPA (Omni-JEPA) soon
>>
S-BERRY tonight
>>
the unreleased anthracite models have been trained with H100s that are powered by decaying atoms
>>
>>101990345
all atoms are decaying if you zoom out
>>
anthracite is crumbling under its own weight right before our eyes
>>
>>101989003
What do you mean? Even the 12b models can go on for 100+ messages.

What do you mean only 5?
>>
>>101990412
Only if protons themselves decay for which there is no evidence so far.
>>
>>101990446
They call it Magnum because once you send your sixth message it's useless.
>>
File: magnum.jpg (96 KB, 603x1012)
What temps and shit do you guys use on Magnum?

It's fine but I feel like the praise it gets is unjustified unless my settings are fucked, I use basic ChatML for instruct
>>
>>101990519
Kek
>>
>>101990555
Set top P to 1 (not defending magnum, you just shouldn't be using top p at all, it's an obsolete sampler)
>>
>>101985928
>-j 8 (or however many cores you have)
-j$(nproc --all)
>>
>>101990119
NTA but magnum 32b was garbage last I tested it. The base chink model is just retarded. Mini-magnum passed 5/10 of my RP tests and 32b got fucking 0 out of 10. I want my bandwidth back
>>
File: Untitled.png (13 KB, 837x513)
>>101990712
>>101990712
>>101990712
>>
>>101990688
Can you share your RP tests?
>>
Please help. https://pastebin.com/hZMpKQkU
>>
>>101990769
I can share one, the rest is continued so I'm not dumping the whole log here.
https://www.chub.ai/characters/summernon/cherry-a1398594
Open by saying she doesn't look capable and asking to see what's behind her sign
>>
>>101990771
>►Recent Highlight Posts from the Previous Thread: >>101982458
Should be >>101981738
https://pastebin.com/NrBaU9vC
>>
>>101990771
ok
>>
>>101986973
retard.
>>
>>101990861
Thank you, anon.


