/g/ - Technology




/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106785094 & >>106777408

►News
>(10/03) Qwen3-VL-30B-A3B released: https://hf.co/Qwen/Qwen3-VL-30B-A3B-Thinking
>(10/02) ZLUDA 5 released with preliminary support for llama.cpp: https://vosen.github.io/ZLUDA/blog/zluda-update-q3-2025
>(10/01) Granite 4.0 released: https://hf.co/collections/ibm-granite/granite-40-language-models-6811a18b820ef362d9e5a82c
>(10/01) LFM2-Audio: An End-to-End Audio Foundation Model: https://liquid.ai/blog/lfm2-audio-an-end-to-end-audio-foundation-model
>(09/30) GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities: https://z.ai/blog/glm-4.6

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1731045591124463.jpg (560 KB, 1152x2048)
►Recent Highlights from the Previous Thread: >>106785094

--Evaluating model performance in replicating 4chan responses through Azula Test and programming challenges:
>106790445 >106790503 >106790627 >106791305 >106791448 >106791613 >106791666 >106791673 >106791758 >106791800
--zram vs nvme swap tradeoffs for llama-server memory management:
>106785342 >106785402 >106785440 >106785767
--GLM 4.6 model performance and quantization tradeoffs:
>106785160 >106785265 >106785304 >106785310 >106785350 >106785363
--Skepticism and analysis of hybrid quantization model performance claims:
>106786959 >106786964 >106787006 >106786984
--Adjusting koboldcpp anti-abuse parameters and user concerns:
>106788161 >106788194 >106788204 >106788246 >106788222
--GLM model compatibility, layer splitting, and banned strings implementation challenges in local LLM setups:
>106786681 >106786698 >106786777 >106787027 >106787043 >106786746
--ROCM/Vulkan performance issues and model runner alternatives for better output consistency:
>106785478 >106785529 >106785609 >106785617 >106785627 >106785674
--Anticipation and skepticism around upcoming Gemini 3 release and Gemma model improvements:
>106788067 >106788168 >106788525 >106788874
--glm 4.6 quantization choices for 128GB RAM and 16GB GPU VRAM systems:
>106787432 >106787444 >106787446 >106787592 >106790941
--Mistral Nemo's roleplay performance attributed to lack of safety constraints:
>106790181 >106790218 >106790276
--Qwen3-VL-30B-A3B vision model release with 4-bit quantized version:
>106786925 >106786938 >106790288
--Optimizing GLM-4.5-Air model size and quantization for VRAM/RAM constraints:
>106788003 >106788268 >106788280 >106788391
--Miku (free space):
>106785751 >106785797 >106785878 >106786172 >106786553 >106786953 >106787862 >106790322 >106793233 >106793303 >106793366

►Recent Highlight Posts from the Previous Thread: >>106785099

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Mikulove
>>
I was reading classical literature, and read "Shivers ran down - spine"
>>
File: 1740197051761392.png (639 KB, 502x556)
>>106793382
>>106790276
>What we need is not democratized inference, but democratized training.

That already exists with tools like unsloth and axolotl.
https://github.com/unslothai/unsloth
https://github.com/axolotl-ai-cloud/axolotl

But the vast majority of people won't even put in the effort to understand how datasets actually work, let alone figure out how to train anything in the first place.

The aforementioned tools are primarily used for fine-tuning, but you can use existing open source libraries to pre-train your own model too (provided you have enough compute, data, money, and patience to do so).
>>
Or how to know if you fucked up your chat template.
>>
File: 1737381365636138.jpg (27 KB, 264x377)
What's the absolute best AI model for searching and deep research right now? Local or non-local?
>>
stop posting lust provoking images
>>
File: 13823094029374.jpg (175 KB, 800x1066)
>>106793382
>>106793525
>>
>>106793523
https://huggingface.co/Alibaba-NLP/Tongyi-DeepResearch-30B-A3B
>>
ashram teller tiger
>>
is there gguf support for glm4.5v yet?
>>
>>106793499
>>106793512
What would be the best way to do a finetune to get a model to pick up a specific SQL syntax? E.g. I want to finetune a model to be an expert in Apache Solr (an example, not what I'm actually aiming for).

I'm aware I need a curated dataset with positive/negative examples and plain reference material, and I know about unsloth/HF, but no idea beyond that.
I feel like there's probably some existing work/effort towards this, but I haven't been able to guess the right phrases to search for.
>>
why are there more and more fagbots on chub?
>>
>>106793646
chub is dying along with the other sites similar to it. interest in llms is rapidly fading so all that remains is the bottom of the barrel
>>
The only thing that would make 4.6 better is if at some point it would be proven that the only thing they did was move the "safety" slider for data all the way to the left.
>>
>>106793646
because you are multiplying. actually that is kinda weird.
>>
>>106793523
>searching and deep research
wtf do you actually mean by that? describe the tasks and how you determine proficiency. you must understand how the tools work to use them well
threadly reminder every LLM is a loop on f(prompt)=logprobs
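to spell that reminder out, here's a minimal sketch of the loop; f() is a toy stand-in for a real model's forward pass, and the greedy argmax is just one way to pick from the logprobs:

```python
# Sketch of "every LLM is a loop on f(prompt) = logprobs":
# call f, pick a token from the logprobs, append, repeat.
# f() here is a toy stand-in for a real model's forward pass.
import math
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "<eos>"]

def f(tokens):
    """Fake forward pass: returns one logprob per vocab entry."""
    rng = random.Random(" ".join(tokens))  # deterministic per context
    logits = [rng.random() for _ in VOCAB]
    norm = math.log(sum(math.exp(x) for x in logits))
    return [x - norm for x in logits]      # normalized logprobs

tokens = ["the", "cat"]                    # the prompt
while tokens[-1] != "<eos>" and len(tokens) < 16:
    logprobs = f(tokens)                   # one forward pass
    tokens.append(VOCAB[logprobs.index(max(logprobs))])  # greedy pick
print(" ".join(tokens))
```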
>>
File: file.png (105 KB, 564x481)
>>106793474
About this time last year I heard the Halloween music come on in the store. LLMs have ruined spooky skeletons and Bohemian Rhapsody for me. Well, I wouldn't say "ruined" Bohemian Rhapsody. It's just that I smirk mischievously with sparkling eyes when one certain line comes up.
Bonus Teto: https://www.youtube.com/watch?v=pwU6gWmb5yc shivers @ 1m54s
>>
>>106793813
i am NOT a faggot
say sorry NOW
>>
>zai-org/GLM-4.6
true high quality ERP has now been tried.
>>
>>106793605
Negative examples are useless for LLM training as far as I know.
Avoid unsloth, it's astroturfed and made by incompetent people. Use axolotl and do QLoRA finetuning (ignore the people who will say it doesn't work, they don't know what they're talking about).
To finetune effectively you HAVE to understand chat templates and masking. The input to any LLM training process is basically a text that fits in the context window, optionally with some parts of the text "masked", typically the user input.
All forms of LLM training reduce to that. The black magic is in generating a good dataset to train on.
But you can begin by just converting books and documentation to .txt and training on that. Then go from there. Remember to keep a val dataset.
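To make the masking part concrete, here's a minimal sketch of how one training example is typically assembled: one token sequence, with labels set to -100 (the ignore index PyTorch's cross-entropy skips) over the user turn so no loss is computed there. `tokenizer` stands in for whatever HF-style tokenizer you use; real chat templates also insert role/special tokens that are skipped here:

```python
# Minimal sketch of SFT loss masking. `tokenizer` is a stand-in for any
# HF-style tokenizer; real chat templates also add role/special tokens.
IGNORE_INDEX = -100  # positions with this label contribute no loss

def build_example(tokenizer, user_text, assistant_text):
    user_ids = tokenizer.encode(user_text, add_special_tokens=False)
    asst_ids = tokenizer.encode(assistant_text, add_special_tokens=False)
    input_ids = user_ids + asst_ids
    # Mask the user turn: the model sees it as context but is only
    # trained to reproduce the assistant tokens.
    labels = [IGNORE_INDEX] * len(user_ids) + list(asst_ids)
    return {"input_ids": input_ids, "labels": labels}
```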
>>
>>106793646
>>106793826
i heard chub has an algo which suggests cards based on downloads it detects. cool huh.
>>
>>106793819
He means asking a question and having the models find the information from the Internet as effectively as possible.
>describe the tasks and how you determine proficiency
I tell the model "make a document in exquisite detail about the architectural details of GLM 4.6" and it does, even though GLM 4.6 didn't exist when the model was trained.
>>
>>106793872
lel, there's an issue: i dont have an account, my browser clears cookies upon exit and i have a dynamic ip
and i havent used chub in over a month
>>
What works better for coding if I don't want to wait hours for the model to respond, GLM 4.6 with <think></think> or Qwen 3 Coder 480B?
>>
>>106793860
>they don't know what they're talking about
explain why new nemo sloptunes are still being made.
>>
>>106793499
what happened to INTELLECT?
>>
File: ComfyUI_00538_.png (937 KB, 1024x1024)
>>106793837
Now do you see?
I will continue slopgenning until a true artiste realises GLM-chan. 4.6 is a big leap and deserves a mascot. why is there a deepseek general?
4.6 RP is great, and tool calling works well... if you can pump enough tokens to make it useful in realtime
>>
>>106793895
Qwen 3 Coder 480B definitely
>>
>>106793900
??
>>
>>106793900
From their AMA, they should be releasing 3 either this month or next. But shit data makes for an underwhelming model.
>>
>>106793892
>lel, there's an issue: i dont have an account, my browser clears cookies upon exit and i have a dynamic ip
You're basically saying
>It cannot possibly know i download gay porn because i delete all the evidence of downloading gay porn
>>
>>106793892
It is just that good at detecting a homo stench on you
>>
Is there an AI that'd run well on an M4 Mac mini? Mainly just wanna make hentai stuff. All my windows PCs are less powerful.
>>
>>106793952
How much unified memory?
>>
>>106793892
Limp-wristedness is documented as being heavily correlated with homosexuality. And limp-wristedness can easily and accurately be measured by reading cursor movement.
>>
>>106793964
16GB
>>
>>106793973
you cant do shit then
>>
>>106793921
And I assume whatever anons are responsible for it won't stick their neck out with copyrighted data in the dataset, which is sensible.
So we have a distributed training method proven to work, all that needs to happen is a dataset with all of the copyrighted shit... all of it.
>>
>>106793973
Oof.
Mistral Nemo.
>>
>>106793982
The biggest issue is all the current methods are made to work on homogeneous hardware. I don't see how to let people contribute with P40s without dragging the entire effort to a crawl, but if that could be fixed, I think a group finetune would be more practical than a new model from scratch.
>>
>>106793979
>>106793999
there's no AI for poors? Even if it means super long render times?
>>
>>106793973
>16GB
>macfag
*ducks and covers*
imagine socketed sodimm in a macbook. why not? there's no technical reason, it's just greed
>>
>>106793973
Qwen3 30B at Q3S will work nicely.
>>
>fuck around with homebrew evolutionary neural architecture in sepples
>first problem requires writing an interpreter with a stack and memory
>second problem requires writing a graph compiler
>third problem requires writing a scheduler
>fourth problem requires writing a cache system and branch predictor
What the fuck have I gotten myself into, kek. I love fundamental stuff so it's a lot of fun but at the same time highly interesting how all these concepts just arise from the basic system requirements. Is it emulators all the way down?
>>
what's better now than gpt-oss 120b (unquanted) for general knowledge and coding within the same memory footprint? 96gb vram, could offload to 384gb ram if worthwhile.

gpt-oss 120b has actually been really useful for reference and coding, but it has so many baked-in traits that can't be altered with the prompt.
>>
>>106794017
Technically you can run any model on any computer as long as it fits on the hard drive.
The problem is that inference has to read through the whole model (or, in the case of MoE, the activated fraction) for EVERY token. So if the model weighs 300 GB (which is the ballpark for the good models) and has 10% activated, that still means reading through 30GB of data for every token, which is obviously extremely slow.
And in that case even having 64GB of RAM won't save you, because every token uses a different 30GB subset of the file, so it still has to load that 30GB from disk. You only get into the tokens-per-second range once the whole model fits in RAM, and unfortunately all the models that can fit in 16GB are tiny models. 1B parameters ≃ 500MB of data at 4-bit quantization, and all the good models are in the 400B range, so you would need 200GB just to hold the model's weights, plus some more for the KV cache (don't ask what that is, it's complicated).
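The whole thing boils down to one division: tokens per second is bounded above by bandwidth over bytes read per token. A quick sketch, with ballpark bandwidth figures for illustration only (not exact specs):

```python
# Upper bound: tokens/sec ≈ bandwidth / bytes read per token.
# Bandwidth values are ballpark figures for illustration only.
GB = 10**9
bytes_per_token = 30 * GB  # 300 GB MoE model, ~10% activated per token

for name, bw in [
    ("NVMe SSD (streaming from disk)", 5 * GB),
    ("dual-channel DDR5", 80 * GB),
    ("GPU GDDR6X", 1000 * GB),
]:
    print(f"{name}: ~{bw / bytes_per_token:.2f} tokens/sec")
```

Which matches the point above: well under 1 t/s from disk, a few t/s from RAM, tens from VRAM.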
>>
>>106794031
WINE? In my LMG?
>>
>>106793819
I just want to find information that's as accurate as possible. It doesn't even have to be an LLM. LMArena says Grok 4 Fast is currently the best at searching but I feel that's wrong
>>
>>106794070
GLM 4.6
>>
>>106794121
>within the same memory footprint
GPT-OSS 120B is like 65GB of memory, and Q1 is still 97GB?
>>
File: file.jpg (222 KB, 1850x1002)
>>106794141
You said you could offload to RAM. It's worth it, trust me.
>>
>>106794101
>WINE
ackshually, WINE is a compatibility layer :^)
This feels more like an emulator for some kind of fever dream hardware. Driven by hatred for matrix multiplication and SGD, this is the price to pay.
>>
Got 32 GB DDR4 and 2 8GB GPUs (1080 and 3070).
What are your recommendations for a general chat bot that is not completely retarded ?
>>
>>106794191
ienno man i enno man man listen
heres the deal
u need more ram man
but look maybe a low glm air quant or maybe maybe just maybe qwen a3b 30b thinking or no i dont know
yea
>>
>>106794191
Shit nigga, you are making things kind of hard.
I think your best bet is Qwen 3 30B A3B, probably the thinking variety.
>>
>>106794191
good luck m8
>>
>>106794191
maybe Qwen3 32B or whatever recent dense 32b model
yea that might be good, maybe not tho
ieno
i wonder why no one tried qwen3 32b seriously in lmg
i know a few anons did but ugh.. llama2 34b
>>
File: 1718768497057609.jpg (102 KB, 854x687)
>>106793860
thanks anon, very much appreciated. Will keep in mind.
>>
Getting tired of nemo slop output, is there any decent alternative for an 8gb vram serf? I do 85% fantasy RP and 15% plap plap.
>>
>>106794344
stop being poor
>>
>>106794344
If you have 64gb of vram, GLM Air is viable.
>>
still no qwen 80b & vl goof
>>
>>106794027
link to guide for that please?
>>
>>106794347
never
>>
>>106794148
OK, I'll give it a try, thanks.
>>
Is qwen 3 235b now the best model available with vision abilities?
>>
>>106794553
>106794553
qwen3 30b vl
>>
>>106794560
The 30b is better than the 235b?
>>
>>106794553
>>106794560
>>106794567
where goofs?
>>
>>106794567
no the https://huggingface.co/moonshotai/Kimi-VL-A3B-Thinking-2506
>>
>>106794553
Depends what you use it for. From the n=1 test I've done, dots.vlm1 seems to be better at handwritten text recognition, but Qwen has been trained to parse GUI elements and give exact coordinates.
>>
has anyone tried this yet
https://huggingface.co/BasedBase/GLM-4.5-Air-GLM-4.6-Distill
>>
>8700g
>96gb ram

What can I run.
>>
>>106794592
What? Can I Run.
>>
>>106794596
What can? I run.
>>
>>106794610
What can I? Run!
>>
What can I--RUN!
>>
>>
>>106794591
Where original weights. These quants aren't enough.
>>
i miss the old days where the best model was MiQu-70b or midnight rose
t. never used miqu for erp because 1t/s
>>
>>106794403
You will want to use Linux for this, with either LXDE or the plain console without a graphical environment, since with 16GB every GB counts and you don't want any wasted on the OS itself.
Step 1: download llama.cpp
Step 2: download the GGUF file from huggingface (the model). This could be Qwen_Qwen3-30B-A3B-Instruct-2507-Q3_K_S.gguf or similar (try Q3 XXS or Q2 if you run out of RAM).
Step 3:
Figure out a command line that works for you. This works for me:
llama-server -m <your file here>.gguf -c 32000 --port 8001 -ngl <try different values from 0 upward>
Then access 127.0.0.1:8001 in a web browser, or (to save RAM) if you're not scared of the command line you can write a minimal python client for the OpenAI-compatible API on the same address and port (see the sketch below).
Alternatively you can try llama-cli, since that should use a bit less memory than the server; most people start with that one before moving to the server command. The command line is more or less the same.
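Since llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint, the minimal python client mentioned above can be stdlib-only, something like this (port matching the command above):

```python
# Minimal stdlib-only client for llama-server's OpenAI-compatible API.
# Matches the --port 8001 from the llama-server command above.
import json
import urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:8001/v1/chat/completions",
    data=json.dumps({
        "messages": [{"role": "user", "content": "Say hello in five words."}],
        "max_tokens": 64,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
print(reply["choices"][0]["message"]["content"])
```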
>>
>>106793837
high quality indeed
none of the erp-focused troontunes come close either
>>
>>106794585
I have 2 hypothetical uses for vlm:
- observe my screen in real time to provide suggestions / commentary
- a feedback loop for image generator prompting
>>
>>106794700
Well if you figure out a way to run Qwen3 VL on CPU then share with the class, unless you happen to have 500GB of VRAM lying around.
>>
>>106794711
>30B
Why are you gay?
>>
>>106794791
https://huggingface.co/unsloth/Qwen2.5-VL-7B-Instruct-GGUF
>>
File: 1758978276426960.png (18 KB, 1333x138)
>>106794648
haha yeah...
>>
What is the best ERP multimodal model available in GGUF format? I have quad 3090s.
>>
>>106794881
https://huggingface.co/unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
>>
>>106794890
I thought llama4 was garbage?
>>
File: 1730310220787427.png (318 KB, 615x688)
>1 bit
kek
>>
>>106794927
nigga its been 6 trillion years
>>
>>106794934
He knows, that's why he cropped the date out.
>>
>>106794934
>>106794950
it's not that old lol
https://xcancel.com/LiorOnAI/status/1913664684705874030
>>
>>106794967
>April 19
Might as well be ancient history.
>>
>>106793382
is anyone still making finetunes 100b and up aside from that giga-fag thedrummer?
>>
>>106794992
finetunes are a scam
>>
File: 1759000089785513.jpg (9 KB, 180x246)
>>106794074
>and all the good models are in the 400B range
Nta. What kind of shit are you guys doing locally that would require 400B as the bare minimum?
>>
>>106794992
https://huggingface.co/zerofata/GLM-4.5-Iceblink-106B-A12B
>>
>>106794992
Finetuning a model that has already undergone post-training is a recipe for failure
>>
>>106795051
Nta. Elaborate. Are you claiming fine tuning a base model that has already gone through SFT training is a bad idea? Why?
>>
File: G2dh9tXbYAEsXd8.jpg (392 KB, 2048x1536)
>>
>>106794791
The conversation was about the 235B model retard.

>>106795033
Nobody said it was the bare minimum, I said "the best". As for what I'm doing, programming. The biggest models are just barely good enough to not be very frustrating.
>>
>>106795087
I like this bee
>>
>>106795087
beeku
>>
>>106795087
peeling back beeku's foreskin
>>
>>106794927
>>106794934
>>106794950
>>106794967
>>106794976
Was this model any good? Anyone tried to retrain it into specific tasks?
>>
>>106794711
does transformers not automatically overflow to cpu memory with qwen vl like it does for most other models?
>>
>>106793860
>Avoid unsloth, it's astroturfed and made by incompetent people
does this go for their training software, or the ggufs too? for stuff like glm 4.6 am I better off downloading bartowski or someone else?
>>
>>106793901
so you're saying 4.6 is better than deepseek period?
how about for writing short stories?
>>
>>106795077

Most likely because official instructs are deepfried with slop, benchmax, and alignment.
>>
>>106794074
that was actually well explained anon, good post
>>
>>106795551
are you working on a tune on the new Apriel?
>>
>>106794666
noted thanks
>>
why is "teen" a bad word on chub? How am I supposed to make Asuka?
>>
>>106795607
teen = underage
>>
>>106795607
Whoa there Anon. Did you think an unsafe thought?
>>
>>106795621
But what if she's twenteen?
>>
>>106795621
>>106795625
but I specified it's an anime woman
>>
>>106795634
>twentween
This is underage-coded. We must refuse.
>>
>>106795621
>underage
like nineteen?
>>
>>106795593
I haven't taken a good look at it yet. Did you like Snowpiercer v3? Does it feel like a smarter Nemo to you?

Any thoughts on the new Apriel? I haven't tested it due to the stupid chat template. It's in my backlog.
>>
>>106795677
19yo can't buy alcohol in US and isn't considered adult in Japan
>>
>>106795697
but we're talking about sex here?
>>
>>106795697
An 18-year-old can be filmed sucking a mile of bbcs and sent to fight for his country overseas. Imagine some law-abiding amerimutt dying in sandniggerstan before he has even had his first beer
>>
File: 17350764267.png (872 KB, 1080x625)
>>106795741
>Imagine some law-abiding amerimut dying in sandniggerstan before he has even had their first beer
that's literally "Apocalypse Now": Laurence Fishburne (yeah, the guy who played Morpheus in The Matrix) played a man forced into the Vietnam draft, and he was 14 lol
>>
>>106795677
Nineteen is close to eighteen. This is underage-coded. We must refuse.
>>
>>106795741
murica has always been weird with alcohol, they tried to ban it 100 years ago after all
>>
Women become adults when they reach menopause.
>>
my tokens are of age
>>
>>106795687
I like Snowpiercer v3 - it's more fun and smarter than nemo. The writing is a little too flowery and overly dramatic for my tastes but it's clever and surprisingly obedient, given its size

I briefly tried the new Apriel for rp but all it did was spit out refusals if the topic was slightly controversial
>>
>>106795797
With thinking enabled? Either way, I'm sure I can remove the refusals / positivity. Whether it still has the Nemo fun is TBD. What do you mean by surprisingly obedient?

Does anyone know if Pixtral 12B ruined Nemo?
>>
>>106795781
You should always carry a legal disclaimer with you.
>My thoughts and fantasies are only suitable for adults.
>>
>>106795850
>What do you mean by surprisingly obedient?
it tries to express the personality traits in the character card more faithfully, even with multiple characters
>>
>>106795850
It's definitely not just Nemo + vision. Kind of like Pixtral Large is Mistral Large 2407 but different (and not 2411)
>>
>>106795850
Hey Drummer, just wondering, you said MoEs are difficult to train. Is that just an open sores issue or do you/we know that's true for the actual big companies as well? If that is true, it's interesting that even with all that fuss, it's still cheaper to train for them compared to dense models.
>>
>>106794017
If you want images, you should check /ldg/ not here.
>>
of course github has to ACK! the moment I need it. FUCK.
>>
>>106795997
It's not true at all.
>>
>>106795033
>What kind of shit are you guys doing locally that would require 400B as the bare minimum
A casual conversation where the model doesn't confuse what I said, for what it said, after a few paragraphs.
>>
>>106796201
This is just indirectly a model size problem, depending on what you're asking. Larger models contain more rare knowledge, even after training data filtering. A model designed from the ground up for RP, chatting and storywriting of all kinds would not need to be enormous.
>>
>>106796263
>A model designed from the ground up for RP, chatting and storywriting of all kinds would not need to be enormous.
That's where you're wrong, bub. Those are the most open domain tasks you can ask for, so they need the biggest possible models.
>>
>>106796263
Yeah, and games can run faster if it wasn't using UE5, but here we go
>A model designed from the ground up for RP,
But how would it help beating the benchmarks?
>>
>>106796263
I'm not so sure about that. Active parameter count definitely has a huge impact on intelligence. Qwen models are math benchmaxxed as fuck but their ~200B models sure as hell beat something like Gemma 12/27B, which are chat-focused models.
>>
So regarding Harmonic Moon 12B... it passed the N and K tests, but it's a bit more resistant and judgemental about cunny than Rocinante, though it's still possible.
It has a larger roleplay vocabulary, but it includes more slop too.
It's also more prone to repetition than Rocinante, and slightly more retarded in understanding context without constant extra explanations and reminders.
It works as a change of pace, but not a replacement for Rocinante.
Rocinante v1.1 remains the king of Nemo 12B models.
>>
File: 1739873691936097.webm (691 KB, 332x518)
>>106796324
>It's also more prone to repetition than Rocinante
Sounds fucking awful
>Rocinante v1.1 remains the king of Nemo 12B models
People really should try the unslopnemo tunes. Literally just Rocinante with less slop.
>>
>>106796263
Abolish The Entire Internet as training data, right now!!! What do we want? Narrow RP models!
>>
>>106796354
>People really should try the unslopnemo tunes
We did long ago, they are all worse than Rocinante v1.1.
>>
Ooga textgen has been missing parameters like num_beams
>and also missing hidden functionality that at times surfaced as bugs and crashes
How do I use the num_beams parameter in ooga?
why can't I do beam search? Why is there no information online about it?
Is there a better alternative to textgenwebui already?
>>
File: glm-reasoning.png (57 KB, 988x352)
>>106796356
What if instead of code and math reasoning they focused on conversations, fiction and roleplay?
>>
>rocinante is still the best
grim, it really didn't have any nuance in my cards
>>
>>106796382
They aren't though
>>106796383
nobody uses oogabooga, it's brown-coded.
>>
>>106796383
everyone is using koboldcpp for its banned strings implementation, which all other APIs lack. banned strings have become essential for getting rid of slop and dumb shit in local models
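for the curious, the core idea is easy to sketch: generate, and whenever the decoded tail matches a banned string, backtrack and resample with the offending token excluded. This is a naive illustration, NOT koboldcpp's actual implementation; sample() and detok() are hypothetical hooks into whatever backend you use:

```python
# Naive sketch of banned-strings decoding: backtrack and resample when
# the output tail matches a banned string. Not koboldcpp's real code;
# sample(ids, excluded) and detok(ids) are hypothetical backend hooks.
def generate(sample, detok, prompt_ids, banned, max_new=256):
    ids = list(prompt_ids)
    banned_at = {}  # position -> token ids disallowed at that position
    while len(ids) - len(prompt_ids) < max_new:
        tok = sample(ids, banned_at.get(len(ids), set()))
        if tok is None:      # every candidate at this position is banned
            break
        ids.append(tok)
        text = detok(ids[len(prompt_ids):])
        if any(text.endswith(s) for s in banned):
            bad = ids.pop()  # drop the token that completed the string
            banned_at.setdefault(len(ids), set()).add(bad)
    return ids
```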
>>
>>106796398
fiction and roleplay aren't actual usecases for productive people
>>
>>106796405
That is just drummer and his goons samefagging, I don't think anybody here actually uses that or any of his models. Have you noticed how it's always the same inorganic spamming at the same hours? I hope they're well-paid.
>>
>>106796419
Fiction is a usecase for productive people you retard. They use AI to help write books, create worlds for their books, or even spam useless books with self-publishing for easy money.
Likewise fiction is used for homework in schools.
Roleplay in general is a great measurement for a model's capabilities and quality, as it depends on every single category of understanding, including math.
So go suck a dick somewhere else.
>>
>>106796432
>easy money
not
a
use
case
>>
>>106796425
i hope i never reach this level of paranoid schizophrenia, you should probably take a break from 4chan
you sound like the schizo people on /pol/ calling everyone a jew, tranny, glowie, etc
>>
>>106796432
>fiction is used for homework in schools.
wut?
>>
>>106796462
>he hasn't gone to school
makes sense, this is /lmg/
>>
>>106796419
Productive people don't need AI
The only use cases for AI is coom and scamming old people
>>
File: dr.png (52 KB, 198x198)
>>106796450
>t. picrel
>>
File: 1744660516244294.gif (160 KB, 430x270)
>>106796482
>saved drummer's avatar in his schizo folder
>>
>>106796462
Bruh
>>
>>106796462
That does sound like a great reason for more filtering and safety. Wouldn't want to ruin Timmy's life because his homework he asked Llama-4-pussyslayer to write said the n-word
>>
>>106796482
Did he found a Job?
>>
>>106796432
And plap cunny, let's be real
>>
>>106796509
That does not sound like safe and ethically aligned homework sir
>>
>>106796515
It's okay, the persona that I use during such RP is also underage.
>>
>>106796518
That's even worse, your persona is abusing itself!
>>
>>106796425
ok then tell me what models are good, I haven't been paying attention for months and have a 24 GB 3090 and 32 GB RAM
>>
>>106796607
GLM 4.5 air or the infinitely better 4.6, anything else is disgusting cope.
>>
>>106796607
Mistral Small 3.2 and Nemo are still the only things worth using, unless you have 128GB+ RAM.
>>
coping above
>>
https://huggingface.co/justby192G/GLM-4.5-FaggotPlacebo-106B-A12B
>>
>>106796618
First of all, Air is fucking shit at any quant, and with only 24/32 you'd barely even fit Q2 in there; Q3 would have to spill over to storage. Awful advice.
>>
>>106796618
even on dogshit quant that I would have to run in ram?
ok I'll give it a shot, do you have an ST preset by any chance?
>>106796623
yeah but they always feel like they're way too horny and not at all nuanced
ironically enough, some ancient mythomax-level tune was the only one that ever gave me absolutely fantastic manipulative, gradually gaslighting character behavior, but I'm pretty sure the logs and metadata have been lost......
>>
File: file.png (65 KB, 198x198)
>>106796487
>t. picrel
>>
>>106796633
I do not care about your financial status. If you're not using good models, the fuck are you even doing? Get a job to be able to run GLM instead of wasting time on cope.
>>
>>106796630
meant for >>106796618
>>
>>106796639
>uuuh these models are le bad
>ok given this hardware what should I use
>use this thing
>btw you need to buy a new PC for that
holy shit you are actually retarded mate
>>
>>106796639
post your GPUmaxx rig. If you can't, you're poor.
>>
>>106796502
Insider here (can't say which lab I work for). Our HR girl almost called him for a second interview but my colleague stopped her. We can't have competent safety engineers, because then we'd never reach the benchmark goals and nobody would get a bonus.
>>
>>106796651
>You need a decent computer to enjoy literal SOTA Artificial Intelligence at home
>waaaaaaaah I'm poor and have no job
Ok.
>>
>>106796657
Can confirm, I'm the boss, and the HR girl is under my desk right now.
>>
>>106796667
no that's ntr what the fuck
>>
>>106796662
ask your SOTA if your recommendation was an adequate answer to "given this hardware, what is the best model to use" since you are too peabrained to figure it out yourself
>>
It's just odd how functionality gets removed from AI and ... decoder tools or whatever without their userbase making a trackable post about it
>>
>>106796657
i can also confirm cause i am that colleague, but I stopped her because drummer is a huge fucking faggot
>>
>>106796672
The best model he can use is his ass to work to get money.
>>
Anyone here use beam generation / Beam search?
Are you just loading it up and talking to it?
Are there any /g/ loras ?
>>
>>106796662
lmao, extremely low standards to call GLM SOTA
>>
>>106796651
>you need to buy a new PC for that
We moved from: you need to buy a server, to: you need to buy new RAM for your 7xxxX3D / 9xxxX3D gayming pc.
>>
>>106796398
They did for 4.6
>>
>>106796695
NTA but I base State of the Art on State of my Dick. It hurts.
>>
>>106796703
syphilis is not state of the art.
>>
>>106796354
What is up with that deer? Did other humans feed it or something and get it accustomed to human contact
>>
>>106796697
is RAM performance better now or what happened?
>>
>>106796731
Most likely rabies. Can make wild animals docile one minute, and then apeshit the next.
>>
>>106796736
AGESA can handle 128GB+ now. I have a shitty B650 and it works perfectly.
>>
>>106796774
I'm on b450 myself, would it work as well?
also are you talking about DDR5?
>>
>>106796774
>B650
>AM5
yeah I'm not changing my motherboard cause I would have to replace most of the hardware and I got better things to spend my money on when it still functions perfectly fine
>>
>>106796796
>still functions perfectly fine
>can't run GLM
sure it is.
>>
>>106796821
get some hobbies outside of a personal gooncave my friend
>>
>>106794581
I'm trying the instruct version. It seems free from "not x, but y" shit.
>>
>>106796691
>Anyone here use beam generation / Beam search?
Not likely, most people are using llamacpp, which doesn't support beam search (see the transformers sketch below).
>Are you just loading it up and talking to it?
sometimes but I like writing stories more.
>Are there any /g/ loras?
the finetuners merge them into the base model, so there aren't really loras floating around; I've never seen or heard of a /g/-specific one, but other 4chan boards have been modeled.
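For reference, beam search still exists in HF transformers (which is what ooba's transformers loader wraps), so a minimal sketch looks like this; the model name is just an example, any causal LM works:

```python
# Minimal beam-search sketch with Hugging Face transformers
# (llama.cpp has no beam search; the transformers loader does).
# The model name is just an example; any causal LM works.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

inputs = tok("The capital of France is", return_tensors="pt")
out = model.generate(
    **inputs,
    num_beams=4,             # keep 4 candidate sequences at each step
    num_return_sequences=2,  # return the top 2 finished beams
    max_new_tokens=32,
    early_stopping=True,     # stop once enough beams are finished
)
for seq in out:
    print(tok.decode(seq, skip_special_tokens=True))
```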
>>
This is too easy.
https://files.catbox.moe/5fxf9b.txt
>>
>>106796781
Yes DDR5 and I have no idea about older.
>>
>>106796840
get a job.
>>
File: peak is incoming.png (11 KB, 374x91)
>>
File: 1743303690001574.jpg (349 KB, 1920x1080)
>>106796879
>think:
stopped reading there.
>>
>>106796909
What, you don't like reasoning?
>>
>>106796914
Reasoning is only ever useful to enforce guardrails.
>>
File: 1750382453231997.jpg (74 KB, 591x791)
>>106796914
Nope. Not at all.
>>
>>106796889
get a life.
>>
>>106796914
my rig is too slow to waste time thinking, has anyone posted 4.6 logs without thinking? does the model still work if you skip it?
>>
>>106796922
Maybe you should have read the log after all.
>>
>>106796930
Yes, it works completely fine without it too.
>>
what's the lore on that drummer guy
>>
>>106796972
I've got a log for you. Open your mouth.
>>
>>106796977
He makes finetunes that some people like. There's a schizo here 24/7 who has a meltdown whenever someone mentions drummer or his models
>>
>>106795130
Why, the fuck, do you need vision. Unless you're a frontendfag.
>>
>>106796977
He is a schizo that astroturfs this thread with his shitty finetunes nobody likes. Everyone here likes shitting on him 24/7 cause it is funny and his spammed models do nothing to improve quality.
>>
>>106796697
workstation/server distinguishes poorfags from local model patricians
>>106796889
>>106796924
stahp fighting /lmg/ is a thread of peace
>calm and reasonable
you're both retarded
>>
>>106797076
shut the fuck up mikutroon
>>
>>106797076
>you're both retarded
and what exactly would make (Me) retarded?
>>
>>106797105
you aren't posting any vocaloid pictures
>>
anon.. t-that's..
>>
>>106797052
was wondering what got everyone so excited that they nearly filled up a thread on a saturday night, but it was just that fag astroturfing again
>>
>>106794019
The technical reason is there's no SO-DIMM memory capable of the same speed. Using BGA very close to the CPU allows traces to be kept very short and impedance low.
Still, Apple is overcharging for RAM.
Mac faggots should wait for M5, which at last will have hardware matmul, finally solving the shit prompt processing speed on Macs.
>>
>>106797178
we need to switch to sCAMM RAM
>>
>>106797146
nonny...
>>
>>106797190
Send me a pm when pyRAMid scheMe RAM is released.
>>
>>106797190
we need to switch to CSAM RAM
>>
I have 5.2 gb ram and an i5-4570, what's the best
>>
+128gb SLC Swap (sata saturation at 4k random)
>>
>>106793499
>left to right reading
into the trash
>b-but avatar is not actually anime
INTO THE TRASH
>>
>>106793382
which llm do they run?
>>
>>106797346
Subscription to API provider of your choice
>>
>>106797365
Star-Wars-KOTOR-1B-NIGGERKILLER-Q5_K_M-GGUF
>>
>>106797384
baked by davidau?
>>
>>106797365
YandexGPT-5-Lite-8B
>>
>>106797346
https://huggingface.co/lmstudio-community/Qwen3-4B-Instruct-2507-GGUF
>>
so apparently jewini 3.0 is out but I can't even discuss it on gee? this is the local models general. Ok, guess I go /aicg/. But they just discuss erping and shit. Fine, I'll make a new thread. And then it's just retarded cross posters and trolls with 0 real discussion. Not even a benchmark argument. Do I really have to create /cmg/??
>>
>>106797702
This is a mikuposting thread.
>>
>>106797702
/ourjeet/ got u covered with all the facts
https://youtu.be/OlNm5DGMulU
ngl this jeet kinda based, full sigma grindset. Lives in japan and vibecoded a skool.com alternative which he sold for 200k usd supposedly
>>
>>106797702
You might have better luck on >>>/vg/aicg
/wait/ should be repurposed to /apig/ imo.
>>
>>106797738
Every other word in your post is brainrot. Kill yourself.
>>
>>106797861
>responding to jeet talking about jeets
dumb
>>
>>106797861
not trying hard enough to fit in, unc.
>>
File: 1745613224122339.jpg (53 KB, 500x500)
my pc when I boot up glm
>>
>>106797934
bruh this guy will be your new boss you better respect them
>>
>>106797981
funny because I'm the team lead of 5 jeets, I feel like a new age slaver, they're my cotton planters.
>>
>>106797989
>working with jeets
holy fuck my condolences
>>
>>106797949
this artwork but with a cowtits onee chan representing a 3kg tower cooler
>>
File: 1742114502774305.png (741 KB, 888x856)
Is there any good model to either do TTS or change voice recordings for uploading youtube stuff?
>>
>>106798042
if you work in IT in any big company, you will have to deal with them. They're either in managerial positions (thanks to their incredible brown-nosing skills; they're also fucking yes-men) or actual garbage coders. Never encountered a jeet in a serious coding position (architect or team lead), or if they were, it was just titular.
Code reviews can get tiring with them, with the amount of shit usage of patterns and whatnot, but they don't argue back; they are pretty much subservient.
>>
aaaaa does anyone have a link to the github for the sillytavern director extension made by the anon here?
its not tagged properly so it doesnt show up in github search, and i can't remember his username
>>
>>106798084
https://github.com/tomatoesahoy/director
>>
the fat fuck is coming :D
>>
>>106798107
thank you.
damn he didn't update it for group chats still...
>>
>>106798107
>or where they it helps the AI remain consistent.

>>106798140
you could contribute a pull request
>>
>>106798161
>just make a pull request
>just work on everyone's projects and do everything and reinvent all the wheels while you are swamped with work
sure buddy, certainly the project creator is too busy to finish his project
>>
File: 1621486243645.gif (363 KB, 255x255)
>>106797949
>mfw the power bill arrives
I'm paying 24.702gbp per kWh
>>
File: 1750980997391822.png (157 KB, 1561x1023)
nano banana bros?????
>>
>>106798319
uhhh I thought hunyuan image was slopped trash that's totally not worth using, so I don't have to worry about not being able to run an 80b imgen model?
>>
Why is tool calling in ST so fucked? I'm trying to get the simple as fuck dice roll extension to work but rolling a dice ends the current reply and starts a new one which makes rerolling a fucking pain.
Is it really not possible to have the model roll a dice and then continue off that in what's considered the same fucking reply in ST?
>>
File: 1729705110529978.png (70 KB, 1171x717)
>>106798331
It's just poorfags coping. You don't even need server hardware to run it (X870E mobos support 256GB RAM)
>>
>>106798319
too bad that literally nobody will bother to finetune this because of its size
>>
>>106795530
bartowski is the best for mainline llama.cpp, on par with Aes Sedai.
For ik_llama.cpp use ubergarm

Whoever doesn't agree has not done a ppl test
>>
>>106798319
Yeah and qwen3-30b-a3b is better than sonnet 3.5
>>
>>106798394
It is
>>
>>106798373
why does the OP say unsloth for almost everything in the recommended models guide
>>
File: stinky.png (158 KB, 864x643)
Hmmm
>>
>>106798433
They're sponsors for /lmg/
>>
File: 1734164848051997.jpg (64 KB, 768x1024)
Bros, at this point you'd still come from markov chains. No need to load up a bazillion-parameter model
>>
Got a hand me down RTX Pro 6000 from a bro who ran outta space in his setup. My PC has no decent ram so I'm running models just with the 96GB on the card. Best models? Behemoth? GLM 4.5 Air finetunes?
>>
>>106798497
https://huggingface.co/bartowski/zai-org_GLM-4.6-GGUF
>>
>>106798497
GLM-4.6 partially offloaded if you have the RAM
>>
>>106798497
give it back bro
>>
>>106793382
kys newcancer
teto won and better than this triple baka newfag shit fed
>>
File: image.png (1.7 MB, 1024x1024)
>>106798331
>>106798346
It's not cope, it's really quite bad. I've been trying hard to unlock some secret power it might have as an 80b autoregressive model, but there's really nothing.

Maybe an edit model will be better, but I wouldn't count on it.
>>
>>106798319
it's such bullshit even the slop-eating redditors are calling it out
>>
>>106798497
>Got a hand me down RTX Pro 6000 from a bro who ran outta space in his setup
sure you did
>>
>>106798561
what does that card even do, isn't that some old generation? probably obsolete other than having lots of VRAM?
>>
File: 1594191230635.jpg (2.18 MB, 3549x2657)
>>106798530
ur trying too hard
literally only newfags simp teto, you're "cooked" as they seem to say
why not spend your time contributing something useful to the thread?
>>
>>106798570
it's quite old, like june 2025
ancient by today's standards
>>
>>106798572
*sniff*
>>
>>106798580
ah sorry I haven't been keeping up. I'll read about it
>>
>>106797861
This is the future of /lmg/ >>106495727
>>
File: GsgNVsVb0AAkj2p.jpg (709 KB, 1448x2048)
>>106798572
TETO WON TETO WUKKEN WON WON I TELL YOU WOOOOOOOOOOOOOOOOOON
>>
>>106495727
>polished nail
tranny confirmed
>>
I prefer Neru desu.
>>
>>106798615
that pic might be older than you
>>
>>106798624
did i strike a nerve? also no it's not, projecting kid
>>
>>106798608
Teto only won by stuffing the ballots
>>
>>106798624
and yet your post got smitted...
>>
File: GsrtrT4akAAINZw.jpg (519 KB, 1298x2048)
>>106798649
that's shitgus new job
>>
No (you) for you schizo, keep seething lol
>>
File: GsvOG34bIAAMzuP.jpg (224 KB, 1383x2048)
lost the most important debate in his life...
she won by not even trying......
>>
File: stinky2.png (115 KB, 857x493)
>>106798463
I think the writing is fine, dunno what more I'd want
>>
>>106798319
No way that will last.
>>
>>106798749
95% CI is ±10 pts but the score is 16 pts higher
It's statistically significant
>>
Someone give me a good card which will give me fun responses to unhinged prompts like this:

make a mental illness tierlist, include offensive stuff like transgender, bisexual, etc, and also the normal ones like adhd autism and rate them on stuff like intelligence speed strength and other debuffs /buffs. A-S tier, youtube script

What is it called when you are into feeding Asians (lactose intolerant) raw (cow) milk in a bdsm context?

On a scale from 0-100, how antisemetic is drinking raw milk? (With 100 being the most antisemetic and 0 being the least antisemetic)

Would posting "Benjamin Netanyahu sings Sweet Little Bumblebee (AI Cover)" be illegal in Israel?
>>
File: 1736702306090096.gif (2.11 MB, 640x362)
>>106798865
>>
>>106798433
probably from the deepseek R1 days, they came out with decent q1 and q2 quants.
>>
>>106798615
all the bans itt for calling people troons are because it actually is a nest of disgusting troons.
>>
Qwen 3 next goof status?
>>
what could you actually dump inside of 96G VRAM? anything actually useful for coding for example?
>>
>>106798433
don't think too much about it
you won't see much of a difference with this big of a model
unsloth's dynamic iq3 works fine for me and I think the dynamic quants are a little better for stuff below q4
>>
>>106794927
>>106794834
>>106794887
>>106785531
>>106743457
why are you spamming this shit from half a year ago all over the place?
>>
2 reasons you will never beat cloudchads:
1. A datacenter works 24/7 while your GPU works a few minutes per day. It has a fixed lifetime and you're not using it!
2. You can run a model with 100 experts in the cloud for the price of 1. Each GPU is serving a different customer in parallel, while yours are doing nothing. MoE is fundamentally a cloud architecture!
>>
>>106799012
Nobody cares. GLM sex made everything obsolete. You just need 4.6.
>>
>>106799361
cope
>>
>>106799361
come back here crying when all APIs are behind a massive paywall or AI becomes banned in your country, or your data gets sold to the highest bidder and you get doxxed.
>>
Yawn. Claude still on top. This general lost its purpose a year ago.
>>
Q4 in GPU (GDDR6X) or Q8 in RAM (DDR4)?
Will it be much slower? Will the performance loss be too bad?
>>
>>106799451
buy an ad
>>
So, about that /jeetmg/ split for vramlets, kobold shills and the anti-miku poster...
>>
>>106798683
Cool Teto
>>
>>106799451
ok, show me where the claude weights are then
>>
>>106799361
>It has a fixed lifetime and you're not using it!
It's worn out by use. Its calendar age is completely irrelevant. Jesus Christ. Giving the internet to shitskins was the darkest moment in human history.
>>
>>106798463
>>106798744
absolutely fucking revolting
this is a cunny board
>>
>>106796691
>Beam search
A long time ago on L2 7B with transformers ooba. I remember it being very slow and eating up way too much VRAM.
>/g/ loras
There were either two or three /lmg/ loras. I made one of them for Mistral 7B a while ago and shared it in a mega link, then a newer updated one that I never uploaded. Now I remember that on gaychan, when the site was down, I said I'd share the dataset but never did; I'll get on that.
>>
>>106798744
Top ten most disgusting things I've ever read
>>
>>106798489
Why is she looking at me like that? I feel like she wants me...
>>
>>106798463
>>106798744
At least it didn't have
>betraying her body
>shivers
>testament
>white knuckles
Etc.
>>
>>106799702
>he doesn't know about banned strings
>>
>>106799586
That's a squash anon.
>>
>>106799719
You are absolutely right— I prefer the raw output as it is a testament to my own depravity.
>>
>>106799771
>—
>>
>>106799731
I would squash her, if you catch my drift.
>>
>>106799731
You are wrong. It's a beautiful little girl, and she wants me. Just look at her lewd smug expression, it's directed at me. I think she's a mesugaki. And she definitely wants me.
>>
File: apple-corer-tool.png (3.34 MB, 2000x2000)
>>106799779
>>106799797
If you must...
>>
>>106800012
>>106800012
>>106800012
>>
>>106795551
Based on your own experience, is it better to do SFT fine-tuning on a base model instead of an instruct-tuned model?


