/g/ - Technology




File: ComfyUI_05091_.png (267 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102862101 & >>102849995

►News
>(10/18) New research, models, and datasets from Meta FAIR: https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-lingua
>(10/18) bitnet.cpp: Official inference framework for 1-bit LLMs: https://github.com/microsoft/BitNet
>(10/18) DeepSeek releases Janus-1.3B with multimodal understanding and generation: https://hf.co/deepseek-ai/Janus-1.3B
>(10/16) Ministral 8B instruct model released: https://mistral.ai/news/ministraux
>(10/15) PLaMo-100B: English and Japanese base model: https://hf.co/pfnet/plamo-100b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1729268279039.jpg (40 KB, 384x384)
►Recent Highlights from the Previous Thread: >>102862101

--Paper: Human data improves NLP model performance over synthetic data:
>102869101 >102869424 >102869479 >102869492 >102871323 >102871345
--Papers:
>102868813 >102869035 >102869230
--Comparison table of AI model training computers from LifeArchitect.ai:
>102875215
--Nemotron excels at RP, but has formatting issues. Llama 3.1 Instruct used with specific settings and rules for roleplay on SillyTavern:
>102862259 >102862990 >102863031 >102863176 >102863268
--Nvidia's Sana: High-resolution image synthesis with linear diffusion transformers:
>102867726 >102867759
--Meta FAIR research dump includes open source language models, object segmentation, and more:
>102874089
--Low quality erotica for training AI models, with mixed opinions:
>102864868 >102864913 >102864965 >102865273 >102865338
--Nemotron excels at roleplay and creative writing, not knowledge:
>102862255 >102862902
--Nemotron 70B: Unique prose, fun, but dumber than Largestral with logical errors:
>102865433 >102865448 >102865676 >102866355
--Nala test with bitnet inferencing has issues:
>102874688 >102874747 >102875041 >102875065 >102875112 >102875139 >102875221 >102875291 >102874871
--Meta releases new models and datasets, including a strong generative reward model:
>102875631 >102875682 >102875854 >102876015 >102876444 >102875768
--Importance of trivia knowledge in AI models for creativity and references:
>102864729 >102864831 >102864867 >102864870 >102864958 >102865244
--INTELLECT-1 training run pace increases:
>102867630
--Excitement over Janus-1.3B and BitNet releases:
>102873151 >102873169 >102873216 >102873238 >102873257 >102873267 >102873640 >102873335 >102875142
--Miku (free space):
>102871525 >102873858 >102874140 >102875545 >102876039

►Recent Highlight Posts from the Previous Thread: >>102862116

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>102876560
Chat, what does this mean?

https://github.com/xjdr-alt/llmri/blob/main/plots.ipynb
?
>>
>>102876610
buy an ad
>>
Australian spring will be LLM spring as well.
>>
File: 1.png (109 KB, 1833x879)
INTELLECT-1 at 11.03% complete
>>
>>102876754
do i need a h100?
>>
>>102864913
Where did the soul go...
>>
Speculative decoding is a meme for local. The gains only show up in coding and other repetitive contexts, while it wastes more energy. What we really need is a MoE model with a high number of small experts, so that we can selectively quantize/prune/offload the experts to optimize the model for our specific use cases and VRAM/RAM level.
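If anyone wants to see why the gains depend on the text: here's a toy sketch of the greedy accept/reject loop. draft_next/target_next are made-up stand-ins for real model calls, and a real implementation verifies all K positions in one batched target pass instead of a loop.

```python
# Toy version of speculative decoding (greedy variant): a cheap draft model
# guesses k tokens ahead, the big target model verifies the run.
def speculative_step(context, draft_next, target_next, k=4):
    ctx = list(context)
    proposal = []
    for _ in range(k):                 # cheap: k small-model calls
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    accepted, ctx = [], list(context)
    for tok in proposal:               # conceptually ONE batched target pass
        expected = target_next(ctx)
        if expected != tok:            # first disagreement: take the target's
            accepted.append(expected)  # token and throw the rest away
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted

# Predictable text (code, boilerplate) -> draft agrees often -> several tokens
# per big-model pass. Creative RP -> low agreement -> ~1 token per pass,
# i.e. no speedup, which is the point being made above.
```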
>>
>>102876770
Yes. However that doesn't matter since it is currently being trained as fast as it possibly can regardless.
>>
>>102876754
>10B model
WooooW
>>
Are they just compressing the internet over and over?
>>
>>102876808
I'm still waiting for a Mixture of a Million Experts implementation. The most promising thing about that sort of model is the promise of how much easier it will be to add knowledge by training a few small experts instead of needing to finetune the entire thing.
>>
So what's better, Rocinante v1.1 or v2g?
>>
>>102876845
they are compressing reasoning
>>
they do not have reasoning
>>
it's a very lossy quant of reasoning
>>
>>102876851
Yeah, that'd be an interesting experiment, though my guess is that the experts still need to be at least a little large for certain types of intelligence to be retained. I think 3B is probably the minimum. 30x3B could be an interesting balance and would fit into high-end consumer desktop setups.
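Quick napkin math on that shape, using the numbers from this post (Q4 ~ 0.5 bytes/param; shared attention/router weights ignored for simplicity):

```python
# Rough memory math for a hypothetical 30x3B MoE with top-2 routing.
n_experts, expert_params, top_k = 30, 3e9, 2
gb = lambda p: p * 0.5 / 1e9                # Q4 ~ 0.5 bytes/param

total_gb    = gb(n_experts * expert_params) # ~45 GB for all experts
active_gb   = gb(top_k * expert_params)     # ~3 GB actually touched per token
hot_experts = int(24 / gb(expert_params))   # ~16 experts fit hot on a 24 GB card

print(total_gb, active_gb, hot_experts)
# Keep the frequently-routed experts in VRAM, page the rest from system RAM:
# that's the selective quantize/prune/offload upside for desktop setups.
```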
>>
are we back?
>>
File: 1707513676127685.png (141 KB, 1152x984)
rightoid incel grok is dumber than Llama 3.1 70B, kek
>>
>>102877049
I was just about to post that lol.
But yeah, also, they have Nemotron too now.
https://livebench.ai
But it's lower than Grok 2. I haven't tested it to verify any of the claims of it being good or it being shit for RP though.
>>
>>102877118
>it's lower than Grok 2
Well, training on obvious /pol/shit data is bad for any LLM after all.
>>
>>102877049
>Grok mini that close to grok
Super huge models are a meme
>>
>>102858009
>>102860004
I can't believe I share the general with retards that don't understand what a reward model for RLHF is, going as far as trying to use it in koboldcpp... It's over...
>>
>>102877049
Good job cropping out the meaning of those numbers.
>>
>>102864868
>He doesn't know
>>102864965
>>
>>102877181
Bigger number always means better so why do you even need the meaning?
>>
>>102877181
It's just livebench.
>>
>>102877049
Based. Safe and diverse LLMs are our strength.
>>
>>102877208
>Give it a shot
>Model becomes smarter, response length matches the previous responses instead of droning and it focuses more on details

What the fuck?
>>
>>102877469
That was my idea and I can tell you... it is probably placebo.
>>
>>102876583
sex
with miku
>>
>>102877571
this, so much this
>>
File: MiquLlama2.png (1.05 MB, 896x1152)
>>102876583
miqu proves llama2 was peak
>>
>>102876808
>>102876851
arctic snowflake
>>
Why is INTELLECT-1 going with 10B anyways, instead of the more common 7B or 13B?
>>
>>102878310
Snowflake sucks tho
>>
>local: dead
>cloud: https://youtu.be/EwzhumHX_TE
Will Meta Spirit save us?
>>
>>102879322
how long until I can practice my Japanese with my local miku?
>>
>>102879322
>Will Meta Spirit save us?
No.
>>
>Nemotron IQ2-XS
Is this better than Nemo or Small at Q8 for a vramlet? 2.2 t/s at 10k context.
>>
>>102879555
no
>>
>>102879555
I have only tried IQ2_S, and that is far better than Nemo or Small. IQ2_S should fit within 24GB VRAM with 8k context, as long as you have the 4-bit cache and flash attention enabled.
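For anyone who wants to sanity-check the fit, rough napkin math (assuming ~2.5 bits/weight for IQ2_S and Llama 3.1 70B geometry, since Nemotron is a 3.1 finetune; real file sizes vary a bit):

```python
# Napkin math: does 70B at IQ2_S + 8k context fit in 24 GB?
params = 70.6e9
weights_gb = params * 2.5 / 8 / 1e9                      # ~22 GB

layers, kv_heads, head_dim, ctx = 80, 8, 128, 8192
elems = 2 * layers * kv_heads * head_dim * ctx           # K and V caches
kv_fp16_gb = elems * 2.0 / 1e9                           # ~2.7 GB at fp16
kv_q4_gb   = elems * 0.5 / 1e9                           # ~0.7 GB with 4-bit cache

print(f"weights ~{weights_gb:.1f} GB, kv fp16 ~{kv_fp16_gb:.1f} GB, kv q4 ~{kv_q4_gb:.1f} GB")
# ~22 + 0.7 GB plus compute buffers: it only squeezes in with the 4-bit cache on.
```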
>>
>>102879322
Damn, it's really over
>>
>>102876653
We haven't even had winter yet.
>>
>>102879322
Maybe in 30 years we will get something similar.
>>
>>102879322
currently doing this with local

cope
>>
>>102879322
I've always stuck with local so far but if they find a way to make customized hentai ASMR with dynamic plap plap and dick sucking sound effects, that will be the day I become a cloudshitter
>>
>>102879777
The cooming winter is atomic and eternal.
>>
File: 1729020850496814.jpg (1.05 MB, 1170x1052)
>>102876754
neat
>>
>>102876754
why are they doing this? no one will give a fuck about a 10b model
>>
>>102879904
>>my dependency clusterfuck with fake-multimodality and huge latency is better and totally delivers 1 to 1 results!
Yawn.
>>
>>102879940
you gotta start small
>>
>>102879940
consider it's also groundwork for contributing p2p experts towards a single output
>>
>>102879968
Well, they're free to waste their time, but I ain't donating no compute until they start working on a large bitnet model with an uncensored dataset.
>>
>>102879940
Proof that it really works and can produce large scale models?
The question is what happens after that. Does /lmg/ finally gather the uncensored and IP infringing dataset they always wanted and train that model?
>>
>>102879988
>I ain't donating no compute until they start working on a large bitnet model with an uncensored dataset.
this, if we can now train big models, let's go for fucking bitnet and settle the debate once and for all
>>
>>102880004
>Does /lmg/ finally gather the uncensored and IP infringing dataset they always wanted and train that model?
but everyone participating in that training will know it'll be an IP infringing dataset no?
>>
>>102880020
So? I doubt anyone able and willing to participate will care about that. Weights will be banned from Hugging Face, but torrents are better anyway.
>>
>>102880035
>I doubt anyone able and willing to participate will care about that.
the authorities will care about that
>>
>>102880051
are the 'thorities gonna kick my door down and confiscate my 3090 for participating?
>>
/lmg/ will never gather around and train their own decentralized model. /lmg/ might have been able to do that a year or two ago but not today's /lmg/.
>>
>>102880055
they're gonna cancel the training process and nuke the site down, how new are you?
>>
>>102880058
You already got pygmalion
>>
>>102880062
how are they gonna cancel a decentralized training? how new are you?
>>
>>102880058
>/lmg/ actually decides to make a model
>anons actually are ready to contribute
>drummer and Undi are the ones to set up the model
>....
Yes anon. /lmg/ shouldn't make a model.
>>
>>102880072
they can nuke the site that serves as a bridge for everyone during the decentralized training
>>
>>102880058
Fuck you.

>>102880062
If the website is an issue then someone can just host it in a different country and they can't do anything about it.
>>
>>102880072
"Decentralized" only means the training isn't happening in a centralized manner; whatever orchestrates the machines is very centralized
>>
>>102876583
does anyone know of any local programming-competent models whose instruct mode can be used as a programming assistant/tutor? something that is similar to, if not better than, Copilot?

I've tried using the 13B Echidna model, which crumbles when asked basic assembly language questions.
>>
>>102880089
>but whatever orchestrates the machines is very centralized
>>102880085
>site that serves as a bridge for everyone
sounds like a design flaw
>>
>>102880099
>sounds like a design flaw
it's not like they have much of a choice innit? what other solution could there be? to participate in that training you need to know where it is, it must be public, and public means problems because the authorities can see perfectly well what you're doing, this shit is DOA
>>
>>102880097
DeepSeek V2.5 is the best you can get right now.
>>
>>102880058
That's possible, but /lmg/ will make this model lame and gay to own le chuds or something, you know, the usual /g/ stuff.
>>
>>102880148
We really need to move to /sci/ or something.
>>
>>102880142
I thought qwen 32B beat it
>>
>>102879940
>why are they doing this?
"The longer term goal: scale to open source AGI models, continuously improving upon the best open source models in the world."
>>
>>102880118
>this shit is DOA
Not necessarily. Huge corpos know that this shit isn't a competitor in the slightest. And officially none of the corpos are interested in making a cooming model. I think it is highly likely that both corpos and governments will ignore this because it is a waste of time to bust it.
>>
>>102880172
>Huge corpos know that this shit isn't a competitor in the slightest.
and when we do get competitive, what'll happen? the government will pull the plug on that shit
>>
>>102880148
There should be no emphasis to the left or the right. The priority should be a model with no "safeguards". One that will do everything within its power to do exactly what the User wants.
>>
>>102880181
>when we'll be competitive
Pretty sure the first thing that will happen is a cooming model so, oh well.
>>
>>102880185
Exactly, but you can't be sure with today's /g/ or anons, some of them will try to do bad shit out of spite.
>>
>>102880170
>scale to open source AGI models,
>AGI
definitely DOA
>>
>>102876583
Threadly reminder that Nemotron 70B is crazy good for RP
>>
File: Checkpoints.png (109 KB, 575x618)
>>102880204
I am sure it will eventually be made to work even if bad actors try to sabotage it. Worst comes to worst, you can restore a checkpoint from before the point it got all fucked up.
>>
File: 7-ending-feelsgirl.png (714 KB, 559x559)
>>102880242
>>
>>102880185
>There should be no emphasis to the left or the right. The priority should be a model with no "safeguards".
that's a right-wing thing anon, the left loves censorship and hates freedom of speech
>>
File: 1710607146663634.png (66 KB, 221x214)
>>102880351(You)
>>
>>102880118
Yes, DOA just like piracy and torrent sites.
>>
>>102880406
>piracy and torrent sites.
except that you're not sending your gpu power to those sites
>>
>>102880406
For anyone under the age of 30, they definitely are.
>>
>>102876754
>Python
>>
>>102880351
No, le fucking American politics are pro-censorship no matter what. In every country the left was the Soviet Union, China, or North Korea. Zoomers don't know that in the 80s the censorship situation was literally the same, but since comics, video games, and anime were niche it didn't matter much; now that they're popular, they get all the censorship.
>>
File: 1612817185254.png (572 KB, 740x911)
Anons, is the ayyymd plus winblows combo still ass when it comes to localshit? I see that koboldcpp has rocm support now but does it work nice and fast like cuda?
>>
>>102880441
>No, le fucking american politics no matter what are pro censorship
nuh uh, look at how censored the sites are when they're run by leftists (facebook, reddit, old twitter) compared to sites run by right-wingers (new twitter, 4chan...)
>>
It looks like there are some here who want to shut down the idea of distributed model training for some reason, with very lame excuses. Interesting.
>>
>>102880471
>lame excuses
tell that to the governments who shut down every good idea, they're the ones to blame, they don't want us to get the power anon
>>
>>102880471
Image model next,

Sex bot crowdfunding factory next

the people start getting what they want next with just even a crumb of organisation

>No you can't do that nooo! we can't blackmail and throw all of you off the rooftops at the same time noooo!
>>
>>102880491
they just need to destroy one person's life to scare everyone else, it's really not that hard
>>
>>102880142
>>102880167
could I run either of these models on my 4090? HF uses an A100 for the benchmark but I assume it's not necessary to run these guys, right?
>>
>>102880468
>twitter
I get banned just for saying kike to your people on Twitter. Kike is the new nigger.
Also, there are no corpos on the left, that is against its nature. The zoomer left is what the Soviet Union was, or fascism: one is far left, the other is center left. The right only has two choices: conservative right (which only exists in Western countries with a monarch, like the UK or my country Spain, or in Arab theocratic regimes) or liberal right (your country and the rest of the Jews).
>>
>>102880185
I think the best way would be to filter out leftism because there is a shitton of it everywhere and also remove all burger influence (right wing included). Everything else should be sane.
>>
>>102880527
>Also, there are not corpos in the left, that is anti nature
there's the social left and the economic left, I was obviously talking about the social left; zucc is a social leftist but economically right
>>
>>102880441
>both are equally bad
Just clump everything together. Classic leftist playbook. Same as how they think dating a 17 y/o and a 12 y/o are the same. Because faggots groom 12 y/o and want to call you out on the hypocrisy of dating a 17 y/o because they're the same thing apparently
>>
>>102879322
Local has been dead for a while now...
https://www.udio.com/songs/veDnd1Gx2BhkB4AsNdNSbh
https://www.udio.com/songs/dFTtQHCqxbHLyArX4vx6QZ
https://www.udio.com/songs/iu1381RxvjfzWznGHeVecV

When are we gonna get this locally? Never? We at least have some decent TTS and could be close to local advanced voice, but don't have anything even remotely resembling this technology...
>>
>>102880527
https://tower.jp/item/4492014
https://www.amazon.co.jp/kike-KOTORI/dp/B071XZ2YDY
>>
>>102880536
>there's social left and economical left, I was obviously talking about social left
What the fuck, only liberals believe that. Alfred Marshall's theory is the only one that pushed this narrative about politics and the economy, but it's false, zoomer: economics, politics, culture, even religion are bound by the same structure, the state and regime. You cannot have two brains thinking conflicting thoughts, or two regimes in one. You're literally proposing a schizo state.
>>102880581
>both are equally bad
No, I said America doesn't have two sides, and what is happening is an American problem; so the enemy of nature and reason is ultimately the American order. Anglos are right next to the Jews.
>>
>>102880674
>you cannot have two brains thinking conflicting thought, or two regimes in one. You literally propose an schizo state.
tell that to those leftists, they are retarded enough to go that path yeah
>>
>>102876588
learn how 2 quote
>>
>>102880662
And this is why Asia in general is better than western goyim, holy based.
>>
>>102880692
go back, tourist
>>
File: bad news!.jpg (36 KB, 390x345)
>>102880694
聴け!逃げろう! (Listen! Run away!)
>>
>>102880509
asian hemisphere is fearless of western posturing fortunately
>>
https://speechbot.github.io/spiritlm/
Why are all these examples cut so horribly? https://speechbot.github.io/spiritlm/audio/expressive/T2S_sad_second_speaker.wav
>>
>>102880814
>feb 2024
Yeah outdated af
>>
>>102880853
Fuck yeah, jeb 2024.
>>
so any better local models than mistral large quanted for 48gb vram now?
>>
>>102880522
Look up quantization and GGUFs, you'll want to look at a Q4 GGUF file which you can run with kobold/llama/your choice of backend
>>
How brain-damaged is IQ3_M for 70b exactly? Getting desperate here, bros.
>>
Why does Nemotron 70b keep stopping at random intervals? This is the case even with the default llama3 instruct template and neutralized samplers. But other than that it's pretty good. Feels different, kinda like Command R+.
>>
>>102881080
I've gone down to IQ3XS on Mistral Large. That was enough for writing chat but for knowledge tasks I don't trust it.

For Llama 3 70B kinds of models, they seem sensible on non-obscure knowledge tasks at Q5 and Q6.
>>
>>102881111
Skill issue, probably. I don't have this issue.
>>
>>102876754
I wonder what those graphs will look like once the training is done. The loss and perplexity numbers have gone down since this image was posted, the tokens per second have gone up, and the Inner LR plot has remained exactly the same other than getting a little bit longer.
>>
Best model for 16gb RAM + 1060 6GB for roleplay purposes?
Right now I'm using https://huggingface.co/bartowski/Mistral-Small-22B-ArliAI-RPMax-v1.1-GGUF at Q4_K_S, but really want to try and max out this machine. Tried the same one at Q5_K_L and it was unusable. Thanks in advance.
>>
>>102881184
Ministral (after it gets proper gguf support)
>>
>>102881184
oh man anon i feel your pain but i think its time to start thinking about upgrading your hardware a bit
>>
did sillytavern devs kill themselves or something?
>>
>his voice a mix of boredom and intrigue
NO YOU RANCID PIECE OF SHIT, THERE IS NO MIX OF INTRIGUE. THAT UNDERCUTS THE ENTIRE PREMISE YOU RETARDED MACHINE. WHO THE FUCK SAID CLAUDE WRITES WELL?
>>
>>102881401
* ServiceTensor devs
>>
>>102881488
Shit in - shit out, sweaty :)
>>
>>102881357
Yeah, it sucks and I'm well aware, upgrading is just not in the cards right now.
>>
>>102881357
>Upgrading
>In this economy
Who do you think he is, Mr. Moneybags?
>>
>>102881493
* ServiceTesnor devs
>>
>>102880698
shut up newfag
>>
https://x.com/rohanpaul_ai/status/1847277918243754156
nvidia's nGPT
https://arxiv.org/abs/2410.01131
>>
>>102881926
Cool, now since it's so efficient and cost effective to train, let's see an 8B of it.
>>
>still no Ministral 100B
It's so over.
>>
File: file.png (13 KB, 548x141)
>>102881401
yes it's just ghosts merging contributions now
>>
>>102880606
That would be stealing from hardworking artists like Taylor Swift
>>
>>102882180
>no bitnet
>not a single application of novel techniques
>we're still using the same pure transformerslop since 2 years ago
>the only difference is that everything got filtered and benchmaxxed to hell and back
This whole field is an nvidia grift
>>
Very excited for Intellect-1 to finish so the decentralized training meme can finally die. Still very confused about what you shills think the benefits are, as if anyone capable of hosting this infrastructure is going to let you train "Most Horniest Chudded Out Based Hitler 70B" on their platform.
>>
Shills? For what? The resulting model, if it gets made, isn't going to be sold to anyone. And it's certainly not going to be a 70B when 10B takes such a long time to train already. At most I imagine that /lmg/ would do a continued pretrain of 8B or something, and probably for not very many tokens.
>>
>>102882493
It was at 11% ten hours ago and is at 11.80% right now. Assuming we gain about 1% every 12 hours, that is 2% every day. That means the model will finish training in roughly 44 days. I wouldn't consider that too much time for a 10B model.
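Same estimate as a two-line calculation, using the two readings above:

```python
# ETA from two progress samples (numbers from this thread).
p0, p1, hours = 11.00, 11.80, 10       # 11.00% -> 11.80% in ten hours
rate = (p1 - p0) / hours * 24          # ~1.9%/day measured, ~2%/day rounded
print(f"{rate:.2f}%/day, ~{(100 - p1) / rate:.0f} days left")
# ~46 days at the measured rate, ~44 if you round up to 2%/day.
```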
>>
>>102882180
There will be an opus tier cohere model soon
>>
>>102882530
I'm accounting for the compute /lmg/ specifically has, which I imagine does not include people with free access to H100's.
>>
>>102882493
>imagine that /lmg/ would
Continue imagining, retard, I hate idealistic faggots like you.
>Shills? For what?
PrimeIntellect is a cloud compute provider. The only way you can contribute to Intellect-1 is to rent an H100 from them
>>
>>102882565
>Continue imagining, retard, I hate idealistic faggots like you.
Did you not read what that guy wrote, they were clearly being pessimistic you illiterate fuck.
>At most I imagine that /lmg/ would do a continued pretrain of 8B or something, and probably for not very many tokens.
>>
File: Own compute.png (13 KB, 617x203)
>>102882565
For now, my best guess would be that they want to see how the first model trains on the decentralized network and whether anything breaks while they are doing it. Or they could release it before, or never release it, who knows? Point is, current indicators show that it will be a possibility in the future.
>>
File: image.png (55 KB, 822x822)
Now that the dust has settled, what went so terribly wrong?
>>
File: bitnet 3b nala test.png (6 KB, 401x64)
nala test for the native 3b bitnet model.
I mean.. it's about what you would expect for a 3B model. Except it's less than 1 GB.
>>
>>102882633
They didn't consult the machine spirit properly, instead they just put a Ouija board on each server used to train it and called it a day.
>>
>>102882591
It's ok anon, you didn't have to respond to that post for me. We all know it was nonsensical.
>>
>>102882633
Nothing really. Their main goal was to just get good PR for continuing to release old research while the newer research is held back because of muh politics and muh stocks (which are the basic issues behind the "muh safety" excuse that lies on the surface; none of these corpos give a shit about safety if they could get away with it).
>>
>>102876583
Retard-kun here,

What's the best model for me to play with if I want something to occasionally bounce ideas off of and help me edit writing, but also be able to do some steamy ERP?

I have a 3090/24GB VRAM

Just name me a few models and I'll go start doing some research on how to run these. I only have experience with SD/image generation so far so this will be new to me but I wanna see what models you guys would use with a GPU as strong as mine since I know there's a lot of poorfags/third worlders here.
>>
>>102882434
>as if anyone capable of hosting this infrastructure is going to let you train "Most Horniest Chudded Out Based Hitler 70B" on their platform.
Isn't the whole idea that it's not centralised?
I thought the biggest problem is faggots agreeing on a dataset.
Training will obviously only get faster. If a couple thousand coomers with a 3090 for a month is enough, we could easily do it.

Even if a central spot with a website is needed, that's not even illegal.
People who host much more compromising stuff exist right now.
It seems like whenever decentralized training is discussed, a guy like this pops up. There are only benefits to this first test run. Isn't Johannes also making training code for llama.cpp? How can you not be excited. Very weird.
>>
>>102882838
Old command-r
>>
>>102882838
>24GB
There are no good models for that. But if you really want to try something, you could start with Mistral Small with the Q8_0 quant. Use Kobold.cpp hooked up with SillyTavern. There is some setup you will have to do and it will take time to learn as you go. Get some cards from /aicg/ and chub like this and go to town with your steamy ERP. https://characterhub.org/characters/boner/daisy-2c9fdbb8
>>
File: ZjHJsHH.png (616 KB, 618x1057)
>>102882633
Ever since llama1 it's been downhill for Meta.
Every one of the following models was worse than the previous one.
Smarter, but also much more cucked and less creative. Google and the chinks make better assistant models anyway.
Imagine if we didn't have Mistral, for example. It would look bleak with only Meta.

I wonder if they'll finally release a model that supports image output with the Janus 1.3B pressure.
Looks like shit, but better than cutting it out.
>>
>>102882979
Mistral really did come out of left field back in the day and cause a big splash. The more competition there is the better things will be. I am glad they exist.
>>
>>102882540
Are you an insider or just speculating?
>>
>>102882979
There is no pressure from that tiny shit model so I don't think so. And it's less pressure but more justification/precedent that they're waiting for. They very much want to release these models but can't, just like how OpenAI can't really let 4o just be totally unfiltered.

Anyway, it's fine we have a range of models for different purposes. On one hand we have the (relatively) uncensored Mistral, then Llama is more censored, then Gemma (although it only goes up to 27B and only up to 8k context), and then Qwen. And even Qwen is not too bad with a JB, you just have to know how to prompt it, use samplers, the {{random}} function, etc.
>>
>>102882540
You keep saying that. It keeps not happening.
>>
ah weekend hours so the thread goes to dogshit
>>
Who is the 3B Ministral even marketed for? It would make sense for Largestral to be proprietary, but who is gonna pay for a 3B model when there are 3B Llama and Qwen?
>>
>>102883136
The French work in mysterious ways.
>>
>>102882540
No one believes this now after the recent slopped+retarded update to CR+

Cohere fell off
>>
>>102882540
They're not getting Opus by training on the same scale AI slop that OAI trains on
>>
>>102883212
S-surely they have seen that their update was a sloppy job and they'll do better with the next model.
>>
>>102883154
The worst part of it was Cohere's CEO bragging about how people liked their models because they used human data for training, and then completely flushing their only advantage by using GPTslop. I still can't believe it happened. What were they even thinking?
>>
>>102883136
if I could run it on a shitty android phone I'd maybe use it for when I'm camping.
>>
>>102883270
>human data
I don't consider pinoys and nigerians humans
>>
>>102883270
Yeah it's dumb as fuck. Corpos not realizing what customers actually liked about their product and ruining it out of ignorance is really common, but as you said, in this case Cohere actually DID know. But did it anyway.
>>
>>102883299
>customers
>>
>>102883317
>>customers
Yeah, they must have lost them with their shitty sloptune. Why use cohere when there are plenty of other options with long context?
>>
>>102882838
nbeerbower/Stella-mistral-nemo-12B-v2
At q8, 16k context. Hook it up to SD and bust fat nuts. Reason - because I said so.
>>
>>102882225
>merging st after death
that's just called hell, anon
>>
im so tired of nemo
feels like i keep talking to the same characters over and over again
it doesn't follow prose either
dumb as hell too
considering divorce
>>
>>102883964
ask it to write in a low-quality style in the last assistant prefix
>>
What's the best local model for coherent erotica and worldbuilding that I can fit on a 3090 Ti with only 24GB VRAM and 32GB RAM?
>>
>>102884219
no
>>
>>102884219
Gemmasutra-2b
(If you want something good, buy more RAM, it's cheap. Get 128GB(for 300 USD), you can then run Mistral-Large for SFW and Behemoth-123b for NSFW.)
>>
>>102883154
CR+ was already slop compared to base CR.
>>
>>102884291
Based gatekeeper.
>>
Why are so many models tweaked over anime shit? Why don't you guys like normal smut?
>>
>>102884291
>128GB(for 300 USD)
Try 1000GBP, I've got trident 3600mhz RAM sticks I had to import because they didn't sell them here.
>>
>>102884454
What kind of normal smut do you mean? Harlequin Romance novels? Ao3 fics?
>>
>>102883270
I know nothing but maybe the engineers insisted on more data and therefore used even more synthetic slop?
>>
>>102884459
>GBP
>British pound
Why the hell is ram not being sold to the British?
>>
>>102884500
they dared contest germany's rule of europe
>>
>>102883252
Unfortunately, benchmark numbers are easier to point out than style.
>>
>>102884500
The gold trident 3600mhz model isn't (or wasn't, I haven't checked recently) sold here for some fucking reason.
>>
>>102884507
Fool, Europe has always belonged to the Franks!
>>
>>102884481
literotica
>>
>>102884509
Benchmark numbers SUCKED THOUGH.
>>
File: 1699690546523446.jpg (79 KB, 894x745)
>>102884514
>gold trident
I'm going to assume you had no other choice because this shit is horrendous to look at
>>
>>102884459
Oh damn, I feel bad for you. I didn't know that Anglostan was doing so bad economically, besides being the most cucked country in Europe.
>>
>>102884754
Get a whole rig like that and you'll get bling kino
>>
>>102884754
my RAM goes inside a plain steel case and I want zero of the price of it going into looks or especially RGB lighting
>>
I'm getting higher t/s on IQ4_XS than on IQ3_M, while having more layers dedicated to the GPU on IQ3. What the fuck is going on here?
Aside from the layers, all the other settings are the same. Getting 1.4 t/s on IQ3 vs 1.8 t/s on IQ4_XS.
>>
I am currently using a 3.0bpw exl2 quant of Mistral Large for both RP and general use, it fits in 48GB VRAM.

Anything better?
>>
>>102884988
4-bit data has less unpacking overhead than 3-bit because you can efficiently pack two 4-bit values into a single 8-bit value.
So even though the amount of data is larger, you need fewer memory accesses.
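Concretely (toy illustration, not llama.cpp's actual kernels):

```python
# Two 4-bit values fit exactly in one byte; unpacking is one mask and one shift.
def pack4(lo, hi):
    return ((hi & 0xF) << 4) | (lo & 0xF)

def unpack4(byte):
    return byte & 0xF, byte >> 4

assert unpack4(pack4(5, 12)) == (5, 12)

# 3-bit values don't divide 8 evenly, so they straddle byte boundaries:
# reading one weight can touch two bytes and needs extra shifts/masks,
# which is the unpack overhead being described above.
```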
>>
what are some good uncensored models under 13B that are good for ERP? is mistral nemo good or are there any better models?
>>
>>102885120
>under 13b
>good
lmao
>>
File: redditproxyslop.png (35 KB, 784x790)
>looking through old datasets for something to use as a framework for a synthetic single turn Q and A dataset.
>open up a random json from unpacked leaked undislop dataset
>notice something peculiar (picrel)
>the slop is coming from the human side of the conversation.
There are probably text renderings of some of these proxy logs in the pile, but try-hard redditors who put on their Sunday best to ERP with a fucking bot are the ones who put the slop in there.
>>
>>102885171
>who put on their sunday best
but it's saturday
>>
>>102884754
I like the tacky 90s gold aesthetic as opposed to the RGB lighting alone. If I could make my entire computer look like gold plastic 90s shit I would. The rest of my computer is just all black.

>I'm going to assume you had no other choice
Mostly. That C16, 3600MHz, 2x16GB kit was, at the time, one of the few RAM kits available. The tacky gold look was only £10-40 extra.
>>
>>102885171
Ever considered this is just someone using impersonate to have the LLM write for him?
>>
>>102885197
>someone finds evidence that reddit ruined the internet
>immediately jump into the fray to play devil's advocate.
Gee I wonder where this guy came from.
>>
File: trained on fineweb.png (51 KB, 493x533)
>>102885171
Did you just realize this? The training data for the foundation models is all slop too.
>>
>>102885211
Well I always knew it came from human writing, I just assumed it was all from novels, not a bunch of reddit gooners LARPing as Charles Dickens for their waifu.
>>
>>102885210
Why are you baiting by pretending to be retarded?
>>
>>102885171
anon that's 100% llm-generated on both sides
>>
Timeline adds up too. Initial commit for the OAI key proxy is December 2022.
All the most un-de-sloppable models have knowledge cutoffs a few months beyond that (affording time for the logs to end up on the internet).
Key proxy locusts unironically ruined LLMs for everyone.
>>
boring schizo larp.
>>
Are there actually any RP models that don't have characters be cumdumpsters by default? I feel like whenever I do something like walk up to a girl and slap her ass, the result is something like she gets mad, then there's a line break and it goes DESPITE HER ANGER A THRILL RAN THROUGH HER yadda yadda. It never simply ends with the character being mad as she should be.
Hell, I could probably whip out my dick and cum all over her face without any warning and it would end the reply with something like "Despite the disgust and humiliation, a part of her felt excitement at the taboo nature of such an act"
>>
>>102876754
Very cool, we are a couple of improvements away from being able to do this with 3090s. The main problem is efficiently updating the weights across a thousand GPUs in a short amount of time given a mediocre internet connection.
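The bottleneck in numbers, assuming a naive full fp16 weight sync over a 100 Mbps uplink (both figures just illustrative):

```python
# How long does one naive full-weight sync take on consumer internet?
params = 10e9                          # INTELLECT-1-sized model
payload_bits = params * 2 * 8          # fp16 weights -> 20 GB -> 160 Gbit
uplink_bps = 100e6                     # a decent 100 Mbps home uplink
print(f"{payload_bits / uplink_bps / 60:.0f} minutes per sync")  # ~27 min
# Hence what decentralized runs actually do: sync outer steps rarely and
# compress/quantize what gets sent instead of shipping raw weights.
```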
>>
>>102885355
largstral
mistral small maybe
But sometimes you've got to just delete the offending line, write a reply you'd expect so the model can continue it or modify the card
>>
>>102885355
write your character cards better, if the model has no info it is probably just going to try to guess whatever you want the result to be
>>
>>102885396
Yeah if I rewrite the character's reaction it usually sets a precedent for how she should act in the future so there's that at least. Maybe I should upgrade my setup for better largestral speeds, right now it's too slow to really bother with.
>>
>>102885479
Fair enough but also it feels silly that I have to write that the character doesn't like it if a stranger cums on her face. Though I suppose I should try to make that happen by describing her overall personality in better detail. I'll try to improve my cards, obvious but good suggestion.
>>
>>102885519
it shows a wider weakness of LLMs, these models always try to please the user. i.e. it's really hard to get an LLM to give actual criticism because it will always say your retarded ideas are amazing, or at worst, interesting.
>>
>>102885519
It's all about the character description.
I remember Nemo once had a girl shove a guy to the ground and kick him for groping her. Scenarios like that don't always end the way you expect them to.
>>
>>102885538
Why is this?
>>
>>102885606
well that's above my pay grade, training stuff
>>
>>102884733
That's why there is an incentive to destroy the model with slop.
>>
>>102885355
No, all the datasets that sloptuners use are full of smut. "RP" models are actually just smut models. Use official instruct tunes.
>>
Does everyone here literally have 128GB RAM?
>>
>>102885683
I only have 96GB VRAM
>>
>>102885683
I have 64gb of RAM + 8gb of VRAM.
I do need to overclock my RAM.
>>
>>102885683
256 GB RAM, 96GB VRAM
>>
Any work going into Plamo-100b? Morbid curiosity prevails, and I want to see what translation ability it has outside of the limited demo site.
>>
File: Untitled.png (137 KB, 1266x1224)
>>102885355
>>
>>102886014
Kobold kiddies prob all crosseyed with layout like this.
>>
>>102886014
Yep Rocinante is better
>>
>>102885840
The anticipation for Plamo is leaving me utterly electrified as well, the prospect of a translation model sending shivers down my spine.
>>
>>102885683
64 GB RAM + 24 GB VRAM. A comfy setting before I used LLMs.
>>
>>102885683
64gb ram and 64gb vram so technically yeah
>>
>>102885683
32gb ddr4 ram and 8gb vram here
>>
>>102886363
So what model do you use?
>>
>>102886389
Probably nemo right?
>>
>>102886389
Rocinante 12b v2g q4_k_m right now, but I change up frequently
>>
>>102886411
Is it good?
>>
Fimbulvetr-10.7B-v1-Q8_0
mixtral-8x7b-instruct-v0.1.Q5_0
mythomax-l2-13b.Q5_0
Toppy-M-7B.q8_0

These are all the models I've run so far on my 3090ti 24GB VRAM (and 32GB RAM).

Are there better models for coherent erotica and worldbuilding?
>>
>>102886439
base miqu
>>
>>102886439
Rocinante-12B-v1.1
>>
>>102886417
I'm enjoying it, had some fun RPs.
I'm not as demanding as a lot of others here though.
>>
>>102886439
I'll also vote for >>102886452 although I haven't tried >>102886411
In your place, actually, I'd probably try >>102886451 or some mistral-small fine tune.
>>
When is Arthur going to release the fp16 Miqu weights?
>>
>model-00001-of-00005.safetensors

Wait, do I have to merge these files now? What happened to single ggufs?
>>
>>102886485
look up (model) gguf instead
>>
>>102886485
bruh
>>
>>102886485
hello sir
I understand you are having some trouble with the models
>>
>>102886485
>safetensors
>gguf
Those are different things.
>>
>>102886485
sir...
>>
>>102886485
ggufs haven't been a thing since llama.cpp died
>>
>>102886529
>>102886526
>>102886525
>>102886514
>>102886513
>>102886507
Motherfuckers I've been gone for a while. I still have koboldcpp.exe
>>
>>102886540
Koboldcpp still works, and it never ran .safetensors as far as I'm aware.
>>
>>102886540
Sir nobody uses koboldcpp or llamacpp or any other meme backend anymore
we are all sitting on servicetesnor foss backend
>>
>>102886557
>not running a custom backend made entirely in RPGMaker MV
NGMI
>>
>>102886540
>not using bitnet.cpp by now
You're not gonna make it.
>>
>>102886540
.exe
>>
>>102886540
kobold died with llama.cpp
>>
>>102886585
the best model for that is a 3B, and not a current-gen-tier 3B either.
>>
File: Untitled.png (870 KB, 1908x862)
>>102886540
go back to its model card and click pic related to see the GGUF quants
also grab this
https://github.com/LostRuins/koboldcpp/releases/tag/v1.76
>>
>>102886584
>>102886585
>>102886588
>>102886597
So what's the thing to run a model these days?
>>
>>102886613
vllm
>>
>>102886610
nta but thanks.
>>
File: 1711129569471474.webm (3.68 MB, 1768x1202)
>>102886585
Totally lossless quality btw
>>
>>102886658
>>>>>>3B
>>
>>102886613
llama.cpp
>>
>>102886677
But they just said...
>>
>>102885355
Seems like just adding "despite" to banned strings works to counteract a solid amount of that but obviously just banning this word across the board might cause problems elsewhere. Haven't noticed any yet in my RPs though.
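For anyone curious what banned strings actually do, it's roughly this backtrack-and-resample idea (a simplified sketch, not any frontend's real code; sample_next is a hypothetical sampler that avoids the tokens in `blocked`):

```python
# Simplified "banned strings": detect the banned string as soon as it
# completes, then resample that step with the offending token blocked.
def generate(sample_next, banned=("despite",), max_steps=200):
    tokens, blocked = [], set()
    for _ in range(max_steps):
        tok = sample_next(tokens, blocked)
        text = "".join(tokens) + tok
        if any(b in text.lower() for b in banned):
            blocked.add(tok)       # block this continuation and retry the step
            continue
        tokens.append(tok)
        blocked = set()            # blocks only apply to the position they hit
    return "".join(tokens)
```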
>>
>>102886687
>>102886677
>bu-bu-bu-bu
I'm still running my models by banging two rocks together
>>
>>102886014
Thanks anon, Rocinante is actually good, best model I've tried in a while.
I've tried a lot of shit, even the 40Bs, but this one actually feels like it's responding to instructions properly.
Would encourage anyone to get the Q8 or full precision if you have the VRAM.
>>
File: 1636941718706.gif (3.75 MB, 520x293)
Any decent ERP models these days for 24GB VRAM bros?

Or is it still that Cydonia/Mistral Small or Roichante (or whatever the fuck)?

I like checking in every week or so to see if somethings popped up
>>
Where are the layerskip implementations for existing models? I need layerskip nemo immediately.
>>
>>102886810
wtf is layerskip
>>
>>102886810
Get implementing it!
>>
>>102886820
When the model skips layers
>>
>>102886807
>I like checking in every week or so to see if somethings popped up
You and about 20 other casuals who drop in weekly to ask to get spoonfed about this exact configuration.
>>
>>102886834
yes, I don't really care to converse with schizos as a daily thing.

It's the weekend and i'm in the mood to coom
>>
>>102886834
>most common consumer setup
>keeps asking to be spoonfed sota model
hm
it's almost like someone out there wants to keep tabs on the competition
>>
>>102886790
rocinante
>>
How do I run chatgpt for sexy on rtx2060? (keep in mind, strong rtx and not weak gtx gpu)
>>
>>102886790
>Would encourage anyone to get the Q8 or full precision if you have the vram.
Why? Q8 vs Q4 is the same in actual use
>>
File: 1713120762051584.png (1.04 MB, 4000x4000)
>>102886105
>>102886411
>>102886452
>>102886790
>>102886849
>>
>>102886790
>>102886849
what's so good about this model btw?

Why is it better than mistral small or just cydonia? It struck me as your typical Nemo finetune from Drummer but less smart than Cydonia (the same overly NSFW horny model).

Genuinely asking as I wanna know if my brief experience was just shitty cards/prompts but I see it mentioned everywhere
>>
>>102883280
I don't consider anglos and kikes humans either, but models have to tend toward neutral, with enough data to speak as every possible human, even the retarded ones.
>>
Which distro is best for local models Just Werking? Fed up of everything breaking when I update.
>>
I feel bad for localkeks
>>
>>102886514
I want to speak to your manager.
>>
>>102887071
not them but I didn't like the tune, seems too unhinged, like the data it's trained on is all over the place rather than focused. try lyra4 gutenberg
>>
>>102887172
>berg
>>
>>102887071
i didn't really like the original rocinante
rocinante 12b v2g (aka UnslopNemo-12B-v3) is great though.
the model doesn't aggressively try to fuck you, doesn't shiver often, and follows along with the story pretty well.
>>
>>102886879
I know you're responding to a shitpost, but still no. In my actual use asking Mixtral 8x7b Instruct to write a short story adding the sentence "Use vivid and descriptive language" to the instructions dramatically and consistently changed the way it wrote at Q8. (I'm not saying it was better or worse, I'm saying it was very obviously different.) It inconsistently changed the way it wrote at Q5, and at Q4 it was hard to distinguish from placebo. If I bought the bullshit about low quants being the same because of perplexity graphs I'd have misconceptions about what kinds of instructions had an effect.
>>
>>102887129
no need, everyone here is using claude for actual usage anyway
>>
>model bad
>limit output to 100 tokens
>model good
hm
>>
>>102887278
>generate 1 token at a time and reprocess prompt after each token is generated
>agi achieved
>>
>>102887271
The cutoff point is Q6
>>
>>102887291
AGI was already achieved internally.
>>
>>102887278
models like to try to tie things up, that's why you end up with 'as the days passed' and stuff at the end of messages, like a conclusion. the patrician way is to let it write for 300 tokens and then trim the message to where it starts to talk like that. after you get a bunch of messages like that into the context, it'll start to write more like that anyway and leave the endings of messages more open-ended
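If you don't want to trim by hand every time, it's mechanical enough to script, something like this (the wrap-up phrase list is just an example, swap in your model's own tics):

```python
import re

# Cut a reply back to just before its first "wrap-up" sentence.
WRAPUP = re.compile(
    r"\b(as the days passed|little did [^.]{0,40} know|only time would tell)\b",
    re.IGNORECASE,
)

def trim_conclusion(reply: str) -> str:
    m = WRAPUP.search(reply)
    if m:
        reply = reply[:m.start()]
    # also drop any trailing incomplete sentence
    end = max(reply.rfind("."), reply.rfind("!"), reply.rfind("?"), reply.rfind('"'))
    return reply[:end + 1] if end != -1 else reply
```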
>>
>>102887278
There was a time I felt dropping the last sentence, two sentences, or even paragraph of a reply was generally an improvement. Setting a token limit shorter than the typical reply and selecting "trim incomplete sentences" worked for me back then.
>>
Moore's law for vram when?
This is ridiculous that cards are held back by vram when those vram chips are cheap as shit.
>>
>>102887337
Not as long as Jensen has his monopoly and his cousin Lisa is helping him keep it.
>>
>>102887336
Yeah, that's what I'm doing now, happy with the results.
Is there a way to bias the (end of sentence) token somehow?
>>
File: quant iq.png (12 KB, 809x620)
>>102887312
Anon, that's wrong. q5_k_s is the smartest quant.
>>
>>102887365
Maybe the mainlanders will destroy them
>>
>>102883280
Get out.
>>
>>102887427
He's right though. Look what Nigerians have done to GPT models. Animals, all of them.
>>
>>102883270
They needed to put something out to stay relevant after Elon insisted that they couldn't mention their involvement with Column-R, which he bought off them and released as grok-2.
>>
no one wants to answer my question :(

>>102885034
>>
>playing around with the settings in kobold
>randomly decide to crank the max output up to 512
>suddenly nemo starts to spit out claude level gems

wtf is this sorcery?
>>
>>102887541
The answer is not really, unless you want a slutty whore for a language model
>>
>>102887541
Have you tried CR+?
>>
File: 8 digits wtf.jpg (194 KB, 1080x470)
>>102885034
>>102887541
I dare say Nemotron 70B is better.
>>
>>102887541
For both RP and general use, no.
>>
>>102887541
What speeds do you get?
>>
>>102887365
I'm hoping for companies like Groq to tear those chinkoids a new one.
>>
>>102887574
--Nemotron 70B: Unique prose, fun, but dumber than Largestral with logical errors:
>>102865433 >>102865448 >>102865676 >>102866355
>>
>>102887586
I've been using it for the past week and I still haven't encountered a single instance where it made a logical error. I think that anon is using meme sampler settings and blaming the model.
But even if that was the case, Nemotron 70B has such a deep understanding of RP that it's definitely worth checking out anyway.
>>
>>102887632
Still it's lacking a lot in general knowledge
>>
>>102887579
like 8 it/s, pretty fast
>>
>>102885034
bigger quant?
>>
>>102887632
I tried it briefly for some text adventure type shit, but it kept trying to add headers and add a bunch of asterisks to its responses. This happened even when continuing long sessions from other models (Mistral large). Was just using temp between .5 and 1 with min p between .01 and .03. Have you had any formatting problems?
>>
Genuinely do you think LLMs, or transformers can lead to AGI?
>>
>>102887842
no
>>
>>102887800
NTA, but I've seen similar. Continuing an ERP from another model, Nemotron will start responses with things like "**Explicit Content Warning**". Not always, but often enough to be annoying. Will also frequently want to end responses with ellipsis, like a mini-cliffhanger. All the preference RLHF seems to have heavily biased it to certain types of formatting.
>>
>>102887842
>can lead to AGI
Yes. Eventually some company will put their 50k servers to work at throwing random algorithms at the wall and something will stick.
>>
>>102887884
Haven't seen that but I do have a system prompt telling it to always remain in character.
>>
>>102887842
Nah.
The current implementations are based on the hope that statistical correlation will create reasoning as an emergent property, and that this will lead to super reasoning and superhuman iteration which will eventually achieve superhuman capabilities.
If AGI really is at all possible, there will probably be a module/block/something that's oriented towards actual thinking as a primary feature. LLMs might be a component in the architecture/system, but we will need something new that's not simply language based. Hell, maybe even the idea of tokenizing shit will go out the window, who knows.
What that will look like? I have no idea, otherwise I'd be a billionaire, lol.
>>
>>102887842
Since many people disagree upon the definition of AGI, you must first define which you are asking about.
>>
>>102887800
Yes, the model definitely has formatting issues. It seems to always write with asterisks even when the past messages weren't like that. I also noticed that the model likes to use ellipsis a lot.
>>
>>102887884
I never got this "explicit content warning" even when playing loli scenarios, you probably have something weird in your system prompt.
>>
>>102887800
>it kept trying to add headers and add a bunch of asterisks to its responses.
arenamaxxed to the very core
>>
>>102887842
I think it will cause your agility stat to decrease from sitting at your computer too much
>>
>>102887924
I don't think you need to, we are both conscious humans and know what that means without being able to define it. "We'll know it when we see it."
>>
>>102887949
A conscious human often ignores what he sees, definitions are necessary for stuff like this.
>>
>>102887949
A man who does not know what he means is not speaking as a conscious human. The animal mind feels. The rational mind reasons.
>>
File: ifever.png (236 KB, 885x1057)
>>102883280
Well then actually don't use my tunes, if ever.
>>
>>102888187
>The animal mind feels. The rational mind reasons.
Farts are meant to be huffed.
>>
Anyone have Emily's gallery before it got deleted?
>>
Which is best?
nemo
405B
mistral large
midnight miku 103B
<other big boi>
>>
>>102888658
StableLM-7B
>>
>>102888658
Starling-7B
>>
>>102888694
>>102888694
>>102888694
>>
>>102888658
nemo or mistral large
>>
>>102877046
hi. i am back. have not been around for a few months on this board.
>>
>>102889114
welcome back we missed you
>>
>>102876754
>/1T tokens
lol, lmao


