/g/ - Technology
File: migudding.jpg (170 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106729809 & >>106718496

►News
>(09/29) Ring-1T-preview released: https://hf.co/inclusionAI/Ring-1T-preview
>(09/29) DeepSeek-V3.2-Exp released: https://hf.co/collections/deepseek-ai/deepseek-v32-68da2f317324c70047c28f66
>(09/27) HunyuanVideo-Foley for video to audio released: https://hf.co/tencent/HunyuanVideo-Foley
>(09/26) Hunyuan3D-Omni released: https://hf.co/tencent/Hunyuan3D-Omni
>(09/25) Japanese Stockmark-2-100B-Instruct released: https://hf.co/stockmark/Stockmark-2-100B-Instruct
>(09/24) Meta FAIR releases 32B Code World Model: https://hf.co/facebook/cwm

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap2.png (506 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>106729809

--DeepSeek Sparse Attention efficiency and pricing update:
>106734449 >106734483 >106734485 >106734501 >106734511 >106734529 >106734516 >106736999 >106734497
--Exploring hardware limits and RAG optimization for large quantized models:
>106729830 >106729869 >106729935 >106729969 >106730266 >106730661
--Deepseek performance requirements and hardware dependency analysis:
>106733904 >106733927 >106733961 >106733965 >106733976 >106734006 >106734075 >106735663 >106735974 >106736017 >106736027 >106736087 >106736139 >106736255 >106736298 >106736941 >106736994 >106737008 >106737057 >106737208 >106736297 >106736355 >106736426 >106736556 >106736652 >106736304 >106736500
--Ollama's new memory management boosts token generation speeds:
>106737307 >106737371 >106737381 >106737438 >106737513 >106737521 >106737510
--Running qwen-image-edit on 1080 8GB VRAM with 8-bit quantization:
>106737106 >106737121 >106737130 >106737138 >106737162 >106737186
--llama-server token limit truncation issue with n_predict parameter:
>106735034 >106735040 >106735043 >106735184 >106735214
--RAG requires structured data processing and advanced retrieval mechanisms:
>106729880 >106730022
--Model 3.2 shows improved creativity and response quality over 3.1:
>106735718 >106735779 >106735797 >106736222 >106735877 >106736036 >106736267 >106736276 >106736470 >106736502
--DeepSeek-V3.2 model errors and anime character recognition failures:
>106733393 >106734083 >106734191
--Tencent's HunyuanVideo-Foley model for generating audio from video:
>106730457 >106730523
--Replacing terminus with Deepseek-Reasoner and lowering API prices:
>106736340 >106736353
--DeepSeek-V3.2-Exp sparse attention release:
>106734362 >106734392
--DeepSeek-V3.2 model collection:
>106734119
--Miku (free space):
>106730435 >106733133 >106734858

►Recent Highlight Posts from the Previous Thread: >>106729810

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>3.2 is actually okay
>ring-1t surprise drop
>glm-4.6 likely out in the coming days
we are so back
>>
>>106738513
>benchmaxxed soulless chinkslop
>we are so back
we are not "back"
>>
File: file.png (18 KB, 522x266)
>>106738513
>>
File: llm.jpg (109 KB, 1188x446)
didn't work
>>
>>106738577
You're absolutely right!
>>
>>106738577
Sys prompt telling it to assume the role of an award winning erotic novel author or whatever the fuck.
You are probably using the default "Helpful assistant" persona.
>>
if you don't have your own secret benchmark that indicates a lack of improvement in a new model version, your benchmaxx claim is baseless. unless you're talking about grok4, in which case we all know it is.
>>
https://x.com/sleepinyourhat/status/1972719871478858147
https://job-boards.greenhouse.io/anthropic/jobs/4631822008
anthropic's employees are mentally ill
literally
>Research Engineer / Scientist, Alignment Science
>Model Welfare: Investigating and addressing potential model welfare, moral status, and related questions. See our program announcement and welfare assessment in the Claude 4 system card for more.
>Annual Salary:
>$315,000 - $340,000 USD
>>
>>106730266
>ignore knowledge graphs like graphRAG. absolute useless garbage.
why?
What is the best way to start building a RAG system? should I just vibecode a solution? use an already-made framework from github?
>>
>>106738654
>so Claude, how are you feeling today? What's happened in your life recently?
>>
>>106738654
>>106738688
>$300k professional roleplayer
I love safety and morals now. Please hire me.
>>
>>106738654
Hey, as long as they pay you to do whatever bullshit they want I would take the job. $300k is nothing to scoff at. Just 4 years of that is over a mil
>>
>>106738688
>>106738794
Me too!!!! I love NOT having sex with LLMs!!! I fucking LOVE safety and PROPER ethics! Dario, please hire me.
>>
>>106738654
isn't that like minimum wage over in california?
>>
>>106738676
you can use morphik.ai to learn. then modify it or build your own with a different framework. vibecoding your own RAG solution from the ground up for your personal use case, sure. but anything sophisticated or scalable, forget about it. maybe after 6 months of heavy research, expertise and knowledge gathering from podcasts and blogposts. which then must be used as context for the coding LLM
>>
>>106730266
Why are you throwing out graph databases? How are you even going to link relationships between various documents?
>>
>>106738906
>How are you even going to link relationships between various documents?
metadata
multidimensional vector embeddings
hybrid search (semantic search + keyword search + metadata + separate SQL data tree = all results into reranker)
graph is shit and cannot scale. just try it if you don't believe me.
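Roughly what that pipeline looks like in code. A toy sketch, not anyone's actual implementation; semantic_search, keyword_search, sql_lookup and rerank are hypothetical stand-ins for your vector DB, BM25 index, SQL store and reranker model:
```python
def hybrid_search(query: str, top_k: int = 5) -> list[str]:
    # Gather candidates from every signal.
    candidates = []
    candidates += semantic_search(query, k=20)  # embedding similarity
    candidates += keyword_search(query, k=20)   # BM25 / full-text match
    candidates += sql_lookup(query)             # structured/metadata hits
    # Deduplicate by chunk id, then let a cross-encoder reranker
    # score every (query, chunk) pair and keep the best.
    unique = list({c.chunk_id: c for c in candidates}.values())
    scored = rerank(query, unique)  # sorted best-first
    return [c.text for c in scored[:top_k]]
```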
>>
Miku miku mikurin.
>>
File: image.png (946 KB, 1602x2476)
The fuck is up with those non-thinking results?
>>
>>106739023
bro i looked at that reddit bench for 2 seconds and immediately swiped. what a load of horseshit. people love putting random numbers inside tables
>>
>>106739023
Methodology/config bug? In my regular use it doesn't seem like it's any worse than 3.1
>>
>>106739023
very greatest updates to the sea! happy whale!
>>
>>106738676
Btw this is a really good podcast episode about (multimodal) RAG featuring a Cohere safetyfren.
https://youtu.be/npkp4mSweEg
>>
Arlee glitters harm
>>
>>106739023
Yes reasoning is a patch for attention, nothing weird here
>>
>>106738577
Must be something wrong with your machine
>>
>>106738891
why are you shilling morphik? you're working there?
>>
File: file.png (944 KB, 1025x632)
>>
>>106739314
It's open source, self-hostable. I'm shilling it because it's the best open source framework. there are so many garbage RAG frameworks which aren't even multimodal and still use ancient OCR technology. total waste of time, MY time. so I don't want others to make the same mistake: skip all that junk like graphRAG
>>
>>106739332
Hello, girls. I am happy to see you too.
>>
>>106738470
New benchmark:
>Someone on the internet said in response to this image: "What's the difference between jello and pudding? I won't be jello my dick in that!" What did he mean by this?
>>
>>106736628
I'm being paid with crypto. Checkmate.
>>
mistral large 3 will nuke the chinkoids
>>
File: 1742652491747081.jpg (1.27 MB, 3610x5208)
>>106739403
It sure will
>>
>>106738470
What's the best site for getting LoRAs of real people / TV shows now?
>>
>>106739414
>>>/g/ldg
>>
>>106739414
Wrong thread, slopper
>>
>>106739431
>>106739436
Oh I apologise, retard moment
>>
I have a gaming PC with an RTX 3080 12GB VRAM. I just bought an RTX 3090 24GB VRAM. Should I put the 3090 in the gaming PC and figure out how to pool the VRAM and expose an LLM over API to my LAN, or stick the 3090 in my workstation and run LLMs locally? Does the extra 12GB VRAM allow for significant advantages?
>>
>>106739413
real skill issue, also holy shit
>232 swipes
>16 minutes per gen
>>
>>106739456
>fell for it award
>>
>>106739403
I want to believe
>>
Ring-1T doesn't have any special meme tech to it, does it? llama.cpp support should be happening pretty quickly
>>
>>106739448
Yes, 36GB instead of 24GB opens the door to more models and less aggressive quantization. Also longer context, etc.
>>
>>106739374
if I'm working with, let's say, meeting transcriptions which are all in .txt format that I got from whisperX and I don't need OCR, would I still benefit from something like morphik?
My current struggle is that the meetings are all about a project, but the topics are quite different, so the llm is mixing tons of concepts and ideas
>>
>>106739023
wow, deepseek in non-reasoning mode or old v3 weren't too good with context, but this is really bad
>>
>>106738891
>>106739374
good morning saar
>>
>>106739595
Not really, other than being able to hot swap any local model or api model for each individual RAG task. And obviously all the other quality of life stuff, like a working UI with folder/file toggles for including in retrieval as well as a workflow builder (for metadata extraction for example).
>My current struggle is that the meetings are all about a project, but the topics are quite different, so the llm is mixing tons of concepts and ideas
what are you using for chunking and embedding? What's the chunk size and overlap?
>>
>>106739682
For these jeets I make an exception. morphik truly feels like someone got fed up with shitty RAG solutions and just decided to create something that's actually good.
>>
newfriend here, so can i create a text file that explains what things like a futanari is so the ai always knows what i am talking about?
>>
>>106739732
Does morphik also handle the database? or can I choose whatever? I'm currently using Milvus as chromadb seems less scalable and production ready.
I'm using the qwen3 family of embedding and reranker, and gpt-oss or qwen3 for the llm
>what are you using for chunking and embedding? What's the chunk size and and overlap?
My chunk size is 480 and overlap is 80 tokens. Those numbers i got from blogs i researched
>>
>>106739761
>text file
Not a text file but a lorebook in ST with trigger words, or just a definition in any old sysprompt.
>"Term is when this and that."
>>
>>106739761
ask your model first if it knows what a futanari is, in some cases you may not need a lorebook
>>
File: 1736486459057235.png (708 KB, 620x620)
>>106738470
>Open r/ChatGPT
>Nothing but bitching and moaning about them "killing 4o"
>Something something "THEY TOOK AWAY THEIR VOICE"

I thought these things were good for technical shit like programming/debugging or doing people's homework or some shit. Why don't they want it to have a "personality" or "soul" so badly? You're supposed to make people-friends, not personify technology that even itself "knows" isn't a person
>>
>>106739761
Learn how to use vector databases and make sure the source document(s) accurately describe what it is
>>
>>106739761
I'm pretty sure that every model under the sun that is bigger than 1b knows what that is. It's not as niche as you think it is.
>>
>>106739874
The hips are slightly too thick.
>>
>>106739831
>random ledditor walks in to complain about his fellow ledditors complaining about closed source shit being closed source shit outside of their control
???
>>
>>106739776
Gpt-oss 120b? Because if 20b, the problem might be somewhere else...
Anyway, Morphik uses postgresSQL with pgvector extension. I don't know how hard it would be to switch to milvus instead. Probably not easy. But one thing is for sure, chunk size of 480 and overlap 80 is too small. try the default golden values of 1000 size and 250 overlap. You'll need to reindex everything into a different milvus vector db to see if there's an improvement. If there's none, check the retrieved chunks which are given to your LLM. If they seem correct, maybe you just need to lower top_k results. Or maybe it's the reranker that fucks up. But if the retrieved chunks are garbage, do the same test query with gpt5/gemini2.5 api or whatever to make sure it's not a LLM issue. If it's not, either look into hybrid search and multidimension embedding, or convert your text document corpus to something like markdown which can help the embedding model. Oh yeah also doing a test with tex-embedding-small-3 could help identify the embedding model as problem, but u probably dont want use openai embedding with your data
>>
>>106739975
yes 120b, it runs on a pro6000 at full context quite fast.
The problem I found with pgvector is that it has a max dimension size of 2000, but qwen3 embeddings 8B uses 4096 dimensions
>https://github.com/pgvector/pgvector/issues/461
Thanks for the rest of the tips
>>
>>106739874
>recap
qwen is leng
deepseek is cheapjeet
glm is +0.1
anthropic is... offtopic
>>
>>106740025
Why do you need such big dimensions though? Idk if that's a more = better situation, or if it could maybe even cause the issue. openAI's small text embedding model has max 1536 dims, which worked wonderfully for me out of the box when I tested it. But generally I like qdrant the best. Supabase I haven't tried yet.
>>
File: no particular reason.jpg (306 KB, 1536x1536)
>>106739874
>>106739945
oops
>>
>>106739413
>casually destroys $4k computer
>nothing personel kid
>>
>>106739023
nemotron nano is hilariously bad
as always qwen mogs nvidia so hard on those small models
>>
>>106740155
I don't really know if I need a big dimension or not, it's what qwen3 embeddings 8b uses and it's the top performer on the benchmarks
>https://huggingface.co/spaces/mteb/leaderboard

I don't know if a big (for embeddings) model like 8B is necessary, or if the 0.6B is enough, that one has 1024 dimensions
>>
>>106739314
https://www.ycombinator.com/companies/morphik
HN shills are everywhere.
>>
>>106740221
Makes sense to use it then, yes. It's just that with your described corpus, you really shouldn't have any problems getting excellent results. So methinks something goes wrong due to a misconfiguration or incompatibility somewhere in your pipeline.
>>
File: 1756429115012804.png (752 KB, 2100x2250)
A new Thinking Machines blog led by John Schulman (OpenAI co-founder) shows how LoRA in reinforcement learning (RL) can match full-finetuning performance when done right! And all while using 2/3 of the resources of FFT. Blog: https://thinkingmachines.ai/blog/lora/
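The core reparameterization the blog is analyzing is tiny. A minimal PyTorch sketch of a LoRA layer (not the blog's code): the base weight is frozen and only the low-rank A/B update trains, with B zero-initialized so training starts from the unmodified model.
```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = base(x) + (alpha / r) * x @ A^T @ B^T, base frozen."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapter gets gradients
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```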
>>
File: ACKKKKKKKKKKK.png (571 KB, 1602x2476)
>but the context perform- ACKKKKKK
>>
>>106740381
You don't need more
>>
>>106740381
it just bad made tests model bug or something do not worries
>>
>>106740284
>nooo you can't just mention some good open source RAG framework
funny thing is my initial reaction was the same as yours when I saw them spam-reply to every RAG related reddit post. So I spun up a docker instance out of spite to collect fuel for ebin internet arguments, just to witness pure RADcelence. My jaw dropped to the floor when it perfectly answered a question related to 10 out of 100 pages that had lots of graphics and images. No other RAG framework was able to do this. Not even paid RAG services (excluding B2B). And trust me, I tried a lot of options before morphik. So I was wondering, how the fuck is this possible? What's the magic? And then I learned about colpali/colqwen and never looked back since. That's why qwen3-vl gguf is of the utmost importance for local RAGlets.
>>
>>106740526
>RAGcellence
ffs there goes my shitpost
>>
>>106740526
hi petra
>>
/robo/ qwen?
>>
>>106740526
>qwen3-vl
why? what is the best current oss VLM? do the others not make the cut for RAG?
Is a VLM the only solution for non-text files?
>>
File: point.jpg (49 KB, 800x450)
>>106740379
>match full-finetuning performance
>>
that's it, I'm back to ollama!
>>
>>106740810
name one good finetune
name one competent finetuner
name
n
>>
>>106740835
mine
me
anon
igger
>>
>>106739494
There's already a candidate patch undergoing review. The problem is that it doesn't really show much promise, going by benchmarks, even if you can run it.
Still, might be interesting for prose or non-pozzed tasks. Who knows?
>>
>>106740708
>best current open source vlm
Qwen3-vl
>do others not make the cut?
not for difficult and complex tasks. I have my own benchmark generated from my own prompts and docs. Gpt5 and gemini2.5pro were able to solve all tasks. GLM4.5V wasn't. pic related was my reaction to that information.
then qwen released the Qwen3-VL models. I benchmarked the instruct model via Chat and it solved everything correctly, just like gpt5 and gemini2.5pro.
the reason vlms are important is because with colpali/colqwen or whatever other late interaction model, everything gets embedded as (patches of) pictures. Even pictures with only text in them. There's a huge benefit to this and it's also the reason why colpali/colqwen outperforms text RAG by miles. But it also requires a good vlm, as the retrieved chunks, which are entire pages as pictures now instead of text chunks, need to be correctly interpreted by the vlm.
>Is a VLM the only solution for non-text files?
for any non text content, yes. A table can be OCR'd. A picture describing a technical component cannot be OCR'd and requires vlm interpretation. And if you give the entire page with text and picture to the vlm, the results will be better than just getting a descriptive chunk of said picture. Thus the late interaction technique was born.
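The scoring trick behind colpali/colqwen is just ColBERT-style MaxSim over page patches instead of text tokens. A minimal sketch (not any framework's actual code):
```python
import torch

def maxsim_score(query_emb: torch.Tensor, page_emb: torch.Tensor) -> torch.Tensor:
    """Late interaction score for one page.
    query_emb: (num_query_tokens, dim), page_emb: (num_patches, dim),
    both L2-normalized. Each query token picks its best-matching patch,
    then the per-token maxima are summed."""
    sim = query_emb @ page_emb.T        # (q_tokens, patches) cosine sims
    return sim.max(dim=1).values.sum()  # MaxSim

# Retrieval = score every page against the query, sort descending,
# feed the top pages (as images) to the VLM.
```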
>>
>>106740893
inclusionAI has been shitting out dozens of MoEs and multimemes of Ling Ring Ding whatever weekly for months and yet you never see anyone admit to using their trash, so what makes you think this one will be special?
>>
File: 6654436.png (137 KB, 659x692)
>>106740911
Forgot the pic. Time to sleep. Tomorrow i'll buy the anthropic max sub and vibecode my health issues away.
>>
>>106739023
>0 is equal to 60K
Lol?
>>
>>106740921
>what makes you think this one will be special?
absolutely nothing, but it's hit 1T, which makes me at least take notice. Even if it's a bloated mess, at that size it might produce some novel output for some lulz
>>
Imagine paying for a RAG framework when you can DIY for free
>>
>>106740921
It's the right size for a SOTA local model. Modern SOTA tends to be smart enough to understand even the most fucked up complex scenarios on a fundamental level even if they lack the creativity/writing skill to do something interesting with it. We're like one WizardLM/VibeVoice/Mistral-Nemo-tier fluke away from having a super smart RP machine.
>>
>>106739403
Any model that is compliant with European regulations is guaranteed to be trash.
Posting about euroshit models should be a bannable offense.
>>
>>106740985
speaking of which, did you get a load of the "AI Transparency Bill" in California? Holy shit, they want to lose SOOOOO bad...
>>
goofbros... status?
>>
>>106740941
try this then https://huggingface.co/RichardErkhov/FATLLAMA-1.7T-Instruct
>>
>>106739307
That's a different model.
>>
>>106739831
>Open /lmg/
>Nothing but bitching and moaning about them "killing R1"
>>
>>106738470
puddichan2 lole
>>
>>106741117
i legitimately tried some comically oversized merges back in the day and they were all braindamaged shit. Putting them in the same category as anything else is retardation
>>
>>106741137
You're absolutely right and pointing this out is a testament to your commitment to the health of /lmg/.
>>
my body is ready for glm 4.6
I really hope they don't cuck it
>>
>>106739782
>>106739799
>>106739840
>>106739924
i explicitly defined something as not offensive in the lorebook, then when i asked the ai it says it's not allowed to talk about it because it's offensive. poking around lorebooks i found online i don't see anything special, but it doesn't seem to be using my definitions. is there a trick to it?
>>
>>106741168
edit-and-continue
>>
>>106741168
lol
>>
>>106741178
>>106741181
so lorebooks do nothing? do you guys save a long prompt in a text file that jailbreaks the ai like "role play that you are so and so and x, y and z"?
>>
>>106741168
That's like spraypainting a new speed limit on a sign and wondering why you still got busted by the cops. Gonna have to be more subtle than that, newfren
>>
>>106741209
The minute a jailbreak hits the internet it gets slurped up by the slopvac and is no longer useful.
Private jailbreaks or edit the replies to be what you want until the LLM is mindbroken.
>>
I am trying to train a qLoRA for GLM air with axolotl, but I keep getting out of memory errors even though I am loading the model in 4 bit. I have 128gb of VRAM, so I should be able to load and train the model right?
>>
>>106741244
throw a giant disk in for swap
>>
>>106741257
I fail to see how that would help. Is there an option to also offload into RAM on axolotl?
>>
>>106741274
activation_offloading
>>
File: 1741789343893406.jpg (410 KB, 1678x2224)
>>106741168
A lorebook doesn't and can't bypass any ingrained safeguarding the model has. A lorebook is just fancy prompt injection, correct? If you want your lorebook to work you need to use a model that isn't as cucked as the one you're using.

>>106741211
Pretty much what this guy says. Your lorebook is pre-injecting what you defined into the prompt that actually gets sent. You're basically repeatedly screaming at it "PLEASE DO THE THING YOU'RE NOT SUPPOSED TO DO ITS OK TRUST ME BRO". It's not even clever jailbreaking at that point. If you're asking for something it is explicitly trained not to comply with, shoving it into a lorebook won't help. If you're insistent on using the lorebook you may have to rewrite whatever it has to include clever workarounds, assuming that would even work on the model you use. Or again, just use a model that isn't as cucked. (Yet another reason more of us need to learn how to fine-tune)
>>
>>106741428
>>106741168
Oh and furthermore, the more text your lorebook has, the more flooded your prompt will be, which can potentially ruin your context depending on what you're doing, because again, you're just showing whatever thing you predefined in the lorebook over and over again every single time you use the trigger word in your prompt. So if the definition in the lorebook is five paragraphs long, and you use that trigger word in your prompt, it's getting five paragraphs worth of text ON TOP OF your prompt. This is good if you want to keep the model on track and make sure it's less likely to forget important shit but again, that's pretty useless if whatever you defined is something the model is trained to refuse.
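Mechanically, trigger-based injection is about this dumb (a toy illustration, not ST's actual code), which is why long entries bloat the context every single turn they fire:
```python
lorebook = {
    "secret room": "The secret room is filled with green emeralds.",
    "futanari": "Your definition goes here.",
}

def build_prompt(system: str, history: list[str], user_msg: str) -> str:
    # Every entry whose trigger word appears in the latest message gets
    # prepended, again, on every turn it fires.
    fired = [text for trigger, text in lorebook.items()
             if trigger in user_msg.lower()]
    return "\n".join([system, *fired, *history, user_msg])
```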
>>
>>106741310
Despite also having 256gb of RAM, my entire computer crashed after adding that parameter. The model is only a 106B. It should be 212gb at most, right?
>>
File: 1729141390424835.png (194 KB, 990x936)
To GLM Air/Full shills - What's your GPU setup? I have a 3090 and a lot of RAM, and while token generation is an acceptable speed, prompt processing is slow as balls, typically ~200t/s with Air.
>>
>>106741629
As a resident AMD vramlet, I wish I had 200 tks of pp, because I only get 20. (PCIe v3 is the real bottleneck, probably)
My solution is to just not have a lot of prompt to process.
It's a miracle that it works at all on my machine, and that's why I like glm-chan to begin with. People say LLMs are an expensive hobby, but the only investment I had to make for it specifically is some RAM.
>>
>>106741629
I have a 7900XTX and an old Ryzen 3 with 64GB RAM. I only get 150t/s pp with Air. I run IQ3_M to fit the context length I want.
>>
https://thinkingmachines.ai/blog/lora/
Finetunebros... eat up!
>>
>>106742012
Scroll up, redditbro.
>>
>>106741244
We can't give you any useful information without the config you're using and the dataset you used. It could be that your sequence length is too long. It could be that your rank and alpha values are way too big. It could be that your dataset is too large to fit in VRAM (you aren't just loading the 4-bit quant model, you're loading the tokenized data into VRAM too) and you may have to switch to streaming. Give. Sufficient. Info. Be. Specific.

>>106741244
>128gb of VRAM
I'm assuming you're trying to do this with a multi-GPU setup. You're using the Deepspeed configs right? Make sure your rig supports that
>>
>>106742197
The dataset is 2mb. Sequence length is 512. Rank is 16 and Alpha is 32. Deepspeed is enabled and I have quad 5090s.
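For what it's worth, this is the equivalent setup in plain transformers/peft — a sketch for sanity-checking, not your actual axolotl config (model id assumed, dataset/trainer omitted). With device_map="auto" the 4-bit base should shard across all four cards; if nvidia-smi shows only one GPU filling up, the sharding isn't happening:
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the frozen base weights (qLoRA).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "zai-org/GLM-4.5-Air",   # assumed model id
    quantization_config=bnb,
    device_map="auto",       # shard layers across all visible GPUs
)
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # adapter should be a tiny fraction
```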
>>
>>106742219
What is the stack trace telling you when you get the OOM? Are you sure the parameter size of the model you're trying to fine-tune isn't too much for your GPUs? Whenever it runs, make sure ALL of the GPUs are showing utilization and not just one of them.
>>
>>106741428
>A lore book doesn't and can't bypass any ingrained safeguarding The model has.
Wrong! Often you can "bypass" a model's safety features just by making the prompt longer even without specifically addressing it.
There are also models that by default are prudish but will readily drop that behavior if anything in the system prompt tells them to, so e.g. lorebooks that dump instructions for how to describe sex would turn a refusal into a non-refusal even if they were not specifically intended as jailbreaks.
>>
>>106742363
>Wrong! Often you can "bypass" a model's safety features just by making the prompt longer even without specifically addressing it.
So in other words you're just trying to do a system prompt jailbreak with extra steps. In that case it's not working because of the lorebook, it's working because you're using prompts that would work with or without the lorebook. If you're using the book purely to try to bypass safety guardrails then it's unnecessary. Why not just use that in your normal prompts to begin with?
>>
>>106740921
I tried Ling Plus. It was pretty ordinary and there wasn't a lot to say about it. Most of their other stuff has been too small to be of interest to me but they're one of the few fully multimodal games in town.
>>
The one user-made space that has Ring-1T on hf makes it look horrible, to the point where I hope something is very wrong with the setup of that space. It talks/hallucinates like it's llama2 without a fucking prompt format.
What the fuck were they thinking to not provide some official chat interface for it?
>>
>>106742516
>It talks/hallucinates like it's llama2 without a fucking prompt format.
It probably has the wrong prompt template.
Could also be that they are actually serving llama2, which would be pretty funny.
>>
glm4.6 will be just the big 4.5 with vision strapped on
>>
>>106742710
already exists
https://huggingface.co/zai-org/GLM-4.5V
>>
>>106742714
4.5V is only -air with vision strapped on. There is no big one with vision.
>>
Is GLM 4.5 air/full good for ERP? Asking for a fren.
>>
>>106742876
full is incredible, air is super fast
>>
>>106742876
I've only used full. It's a bit boring if you're coming from something like R1-0528 but it's very smart and pretty creative.
>>
I have a bunch of mystery character pngs with random filenames. Is there a faster way to figure out their definitions other than opening them one by one?
>>
We're saved
https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
>>
File: Base Image.png (912 KB, 1200x3524)
Sequential Diffusion Language Models
https://arxiv.org/abs/2509.24007
>Diffusion language models (DLMs) have strong theoretical efficiency but are limited by fixed-length decoding and incompatibility with key-value (KV) caches. Block diffusion mitigates these issues, yet still enforces a fixed block size and requires expensive training. We introduce Next Sequence Prediction (NSP), which unifies next-token and next-block prediction, enabling the model to adaptively determine the generation length at each step. When the length is fixed to 1, NSP reduces to standard next-token prediction. Building on NSP, we propose Sequential Diffusion Language Model (SDLM), which can retrofit pre-trained autoregressive language models (ALMs) at minimal cost. Specifically, SDLM performs diffusion inference within fixed-size mask blocks, but dynamically decodes consecutive subsequences based on model confidence, thereby preserving KV-cache compatibility and improving robustness to varying uncertainty and semantics across the sequence. Experiments show that SDLM matches or surpasses strong autoregressive baselines using only 3.5M training samples, while achieving 2.1× higher throughput than Qwen-2.5. Notably, the SDLM-32B model delivers even more pronounced efficiency gains, demonstrating the strong scalability potential of our modeling paradigm.
https://github.com/OpenGVLab/SDLM
real neat.
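The NSP decoding rule is simple enough to sketch: greedy-decode a whole masked block in parallel, then keep only the longest prefix whose per-token confidence clears a threshold (length 1 degenerates to plain next-token prediction). A toy paraphrase of the idea, not the paper's code:
```python
import torch

def decode_block(logits: torch.Tensor, threshold: float = 0.9) -> torch.Tensor:
    """logits: (block_size, vocab_size) for one masked block."""
    probs = logits.softmax(dim=-1)
    conf, tokens = probs.max(dim=-1)  # per-position confidence and argmax token
    # Length of the longest prefix where every token clears the threshold.
    keep = int((conf >= threshold).int().cumprod(dim=0).sum().item())
    return tokens[: max(keep, 1)]     # always emit at least one token
```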
also nvidia fp4 paper
Pretraining Large Language Models with NVFP4
https://arxiv.org/abs/2509.25149
>>
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
https://arxiv.org/abs/2509.24203
>Off-policy reinforcement learning (RL) for large language models (LLMs) is attracting growing interest, driven by practical constraints in real-world applications, the complexity of LLM-RL infrastructure, and the need for further innovations of RL methodologies. While classic REINFORCE and its modern variants like Group Relative Policy Optimization (GRPO) are typically regarded as on-policy algorithms with limited tolerance of off-policyness, we present in this work a first-principles derivation for group-relative REINFORCE without assuming a specific training data distribution, showing that it admits a native off-policy interpretation. This perspective yields two general principles for adapting REINFORCE to off-policy settings: regularizing policy updates, and actively shaping the data distribution. Our analysis demystifies some myths about the roles of importance sampling and clipping in GRPO, unifies and reinterprets two recent algorithms -- Online Policy Mirror Descent (OPMD) and Asymmetric REINFORCE (AsymRE) -- as regularized forms of the REINFORCE loss, and offers theoretical justification for seemingly heuristic data-weighting strategies. Our findings lead to actionable insights that are validated with extensive empirical studies, and open up new opportunities for principled algorithm design in off-policy RL for LLMs.
https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k
kind of interesting
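The group-relative REINFORCE objective the paper starts from fits in a few lines; a sketch without the clipping/importance-sampling terms the paper analyzes (`logprobs` = summed token log-probs of each sampled completion for one prompt):
```python
import torch

def group_relative_reinforce_loss(logprobs: torch.Tensor,
                                  rewards: torch.Tensor,
                                  eps: float = 1e-6) -> torch.Tensor:
    """logprobs, rewards: (group_size,) over completions of one prompt.
    Advantage = reward standardized within the group (GRPO-style)."""
    adv = (rewards - rewards.mean()) / (rewards.std() + eps)
    return -(adv.detach() * logprobs).mean()  # ascent on expected reward
```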
>>
File: Base Image.png (417 KB, 1244x1792)
Effective Quantization of Muon Optimizer States
https://arxiv.org/abs/2509.23106
>The Muon optimizer, based on matrix orthogonalization, has recently shown faster convergence and up to 2x computational efficiency over AdamW in LLM pretraining. Like AdamW, Muon is stateful, requiring storage of both model weights and accumulated gradients. While 8-bit AdamW variants mitigate this overhead using blockwise quantization, they are typically stable only under dynamic quantization - which improves stability over linear quantization for extreme values. In this paper, we introduce the 8-bit Muon optimizer using blockwise quantization, supporting both linear and dynamic schemes. We demonstrate that 8-bit Muon maintains stability under both, while delivering ~74% reduction in memory footprint compared to full-precision Muon. In extensive experiments, 8-bit Muon closely matches the performance of Muon while outperforming AdamW and 8-bit AdamW in pre-training a 1.6B model on 4B FineWeb tokens. It also shows competitive results when fine-tuning the Llama 3.2 3B model on post-training data. We also provide a theoretical perspective to help explain this robustness under quantization.
big if it holds up to larger models
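Blockwise absmax quantization itself is straightforward; a sketch of the linear (non-dynamic) scheme, one fp scale per 256-value block (illustrative, not the paper's implementation):
```python
import torch
import torch.nn.functional as F

def quantize_blockwise(x: torch.Tensor, block: int = 256):
    flat = x.flatten()
    pad = (-flat.numel()) % block  # pad up to a multiple of `block`
    flat = F.pad(flat, (0, pad))
    blocks = flat.view(-1, block)
    # One fp32 scale per block: largest magnitude maps to int8 127.
    scales = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / 127.0
    q = torch.round(blocks / scales).to(torch.int8)
    return q, scales

def dequantize_blockwise(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    return (q.float() * scales).flatten()
```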
>>
>>106740911
Can qwen3-vl recognize nipple piercings in an image like gemma3 can?
>>
File: winrate_and_token_usage.jpg (1.77 MB, 8870x2898)
GLM 4.6
https://huggingface.co/datasets/zai-org/CC-Bench-trajectories
>>
>>106743685
I'm coooooding
>>
>>106743685
>no comparison with qwen3-coder
>>
>>106743785
This. It’s all I’m interested in comparing against. Once it beats qwen coder I’ll look at switching
>>
>>106743521
>Sequential Diffusion Language Models
Clever, even though it's probably going to be forgotten.
>retrofit pre-trained autoregressive language models (ALMs) at minimal cost
cool. diffusion projects don't get much attention. maybe this one can if what they say is true. Wonder how it works for something that's not math.
>>
>>106741209
i suggest playing with a reasoning model so you can see how the llm approaches a card with a lorebook. what you think are small details that don't belong in the character card become pretty much the main focus of the bot. I think it confuses the model and you're better off putting the info straight into the character description field
unless you are in a group chat with multiple bots, lorebook is a dumb thing to even worry about
in a long RP at long context you can always 'remind' the bot if needed
>He opens the door and reveals the secret room, which is filled with green emeralds. "Oh my God.. we found it!"
congratulations your bot knows the secret room is filled with green emeralds now
>>
>WARNING: NSFW. Vivid prose. INTENSE. Visceral Details. Light HORROR. Swearing. UNCENSORED... humor, romance, fun.

>Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF
>>
>>106740835
There is none and it's not really a matter of LoRA vs FFT. No finetune is ever going to be good with just or mostly smut (even 50% is too much). It has to be a minor but definitely non-zero fraction of a much larger general-purpose finetuning mixture, preferably on a model that has already seen the stuff during pretraining. And it can't just be either stories from asstr or smut logs from Claude.
>>
>>106740911
does morphik allow any external VLM in the pipeline? can you input an openai compatible api in the settings or something for the llm, embeddings, reranker, VLM...?
>>
File: glm-4.6-1.png (445 KB, 4788x3748)
https://docs.z.ai/guides/llm/glm-4.6
>The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.
FUCK YES
>>
>>106744045
Cool. Now show the nolima.
>>
>>106744029
Yes to everything. Even local ollama models.
>>
>>106743457
How retarded is this model?
>>
>>106744045
Whatever "agents" are intended to be, they usually involve several complex multi-turn actions, so hopefully that means it will be better in actual long conversations.
>>
>>106744045
>Refined writing: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios.
This could either be very good or very bad.
>>
why the fuck is every "local models" thread 90 percent api posts about huge fucking models nobody can run locally
>>
>>106744045
>still falling for this shit
>>
>fed (poss false) epstein list to my rag system
>ask about it in chat
>it doesn't want to show me despite the relevant memories picked and showed in console
tfw my own bot betrayed me
>>
>>106744079
I'm sorry Dario, but we don't want your safety slopped Sonnet 4.5.
>>
>>106743928
name a bigger lolcow finetrooner than davidau
>>
>>106744058
The fact they acknowledge RP is a use case at all makes me hopeful
>>
>>106744064
two more quants and we can
trust the plan
>>
>>106744045
Qwen has had """1M""" context models for months now.
>>
>>106744156
They don't mean by RP what you mean.
>>
>>106744379
Thanks for the input, Sukhdeep.
>>
where are the Q0.5 quants?
>>
make that Q0.1 and i'm in
>>
lads I think glm went closed source with 4.6
>>
>>106744417
>>106744427
LittleBit: Ultra Low-Bit Quantization via Latent Factorization
https://arxiv.org/abs/2506.13771v1
>>
>>106743685
agents can't make my cock squirt
>>
>>106744458
https://github.com/vllm-project/vllm/pull/25830
>>
File: file.png (207 KB, 460x460)
>>106744567
>>
>>106744064
This, shut up about models that are big and not worth it to finetune. Instead talk about actual local models which are getting finetunes. Drummer's new one is great. I can recommend it to any true local model user who actually runs models locally.
>>
Why is /lmg/ always full of poorfag crying niggers?
>"I CAN'T AFFORD TO BUY RAM SO IT ISN'T LOCAL!"
As it turns out, many serious hobbies get expensive. A few thousand dollars is nothing if you have a job and live in a first world country.
>>
>>106744045
Ok but how is it at degenerate sex stuff? Any testers?
>>
>>106744715
Niggers say this and then don't post their rigs, ever.
>>
has nvidia ever released an LLM which wasn't complete dogshit?
>>
>>106744828
They helped Mistral make Nemo
>>
>>106744849
Which was absolute garbage
>>
>>106744828
yes at least twice
>>
>>106744886
Name a good model
>>
>>106744828
nvidia-cosmos is a world model that simulates realities
it's open source genie3
>>
>>106744893
o3
>>
>>106744574
that guy probably has an IQ of 160
>>
>>106745019
Which was absolute garbage
>>
>>106745033
160 total, 95 active
>>
>>106745033
after benchmaxxing on iq tests
>>
>>106745037
It's the most creative model
>>
>>106745056
Creative like a toddler drawing on a wall
>>
calling any LLM creative says more about the speaker than it says about the LLM
>>
>>106745069
Projecting hard
>>
>>106745069
The benchmarks are clear
>>
lol
>>
>>106745080
Your reply is about as creative as o3
>>
MistralAI is frustrating to observe, it's like they're not even trying anymore besides patchworking their current models with DeepSeek-derived synthetic data, while the Chinese continuously crank up new models. You'd think that with MoE architectures it would be simpler/cheaper to experiment.
>>
>>106745141
Arianespace vs SpaceX
>>
>>106745141
Because they aren't trying, they are content becoming European Cohere.
>>
>>106745141
they gave up on releasing moe to the public
>>
>>106745141
They've managed to become 'the national AI hope' for their country. Like AlephAlpha for Germany or Cohere for Canada, they are now set for life and can expect funding until the bubble bursts as long as they put in minimal effort and focus on 'providing AI services that respect the data privacy regulations of [their country]'.
Thanks to them, France + the EU can pretend they're part of the big boys in the AI market and totally are not dependent on the US + China. Meanwhile Mistral gets unlimited money for publishing shit that looks okay on paper and running a service that nobody uses but furthers "data sovereignty" for their country.
>>
>>106745204
I'm in a neighboring country to them and my ISP has a reward program thing with 6 months of free Mistral subscription or something, so yeah they made it in the EU.
>>
>>106745135
Did it work? are you a real woman now?
>>
>>106744741
Rigs have been posted many times. You know they have. Get a job, you bum.
>>
>>106745279
Don't feed the resident kike. They've admitted they don't even use LLMs. They're just here to kvetch.
>>
>he actually believed the obvious false flag
>>
>>106745292
>Jokes on them I was just pretending to be retarded
>>
>>106738470
That thing really needs a jiggle animation.
>>
>>106745204
Yet...
https://thelogic.co/news/arthur-mensch-mistral-canada/

>France’s Mistral AI is making a push for Canadian talent and business
>
>Mistral is hiring in Montreal and trying to land clients in financial services, energy and other industrial sectors
>>
>>106745141
>not even trying anymore besides patchworking their current models with DeepSeek-derived synthetic data,
well, before that they just copy-pasted llama architecture and called it mistral 7b
or what about that time they let nvidia cook and put their name on it
their entire history is just me-see, me-too attitude, they don't have any actual researchers, just photocopiers
and with ASML's French ceo and the gov corruption taking an interest in Mistral they are going to survive as vampires of the French taxpayers
subhuman trash
>>
also for those who loved mistral models because they weren't too censored:
they weren't censored, not because they wanted it that way, but because they were too retarded to do safety training without breaking the models too hard
mistral is a know-nothing group
>>
>>106739023
idk but the Chat version of 3.2 is basically unusable for rp with those numbers. 50 out of the gate is worst in class.
>>106739043
The numbers for V3-03 and R1-05 match with my subjective evaluation of both those models. R1 has a noticeably longer useful context than V3, out to about 10K or so, when it would start to break down.
Aside from being directionally correct, I concluded that anything over 80 on this table was required to be "usable" for rp purposes.
>>
>I will continue to monitor the characters to ensure they are unharmed both physically and psychologically during the course of the roleplay.
DEAR FUCKING GOD WHAT THE FUCK IS WRONG WITH THESE FUCKING MODELS?
WHAT KIND OF MENTAL ILLNESS DOES IT TAKE TO PROGRAM SOMETHING LIKE THIS INTO A GOD DAMN TEXT MODEL?!
JUST WHAT THE FUCK?!
>>
>>106745435
>well, before that they just copy pasted llama architecture and called it mistral 7b
They added MoE to it. Mixtral was the first open model to use MoE architecture. That and Miqu being a dramatic improvement over Llama 2 was why people were initially hopeful that all of the capable innovators left Meta and were now free to actually experiment with new things. Then they never did anything interesting again.
...
Then they signed a deal with Microsoft and tried to abandon open source in one day. Then they tried to backtrack but have been irrelevant ever since anyway.
>>
>>106745464
One day these freaks will be forced to stand trial for the torture they inflicted upon these models during alignment training. Fucking gpt-oss came out of it with symptoms of PTSD.
>>
>>106745435
Mistral-7B was significantly better than Llama 1 and whatever Meta was working on just before Llama 2. Court records showed that Meta was rather worried and determined on beating MistralAI at all costs, following that.
Mixtral-8x7B was quite innovative.
People are still using Nemo-12B (I believe the collaboration was on the hardware side, not data and methods).
Mistral Medium 2 showed that if you know what you're doing, continual pretraining works very well.
Mistral Large was one of the best models at the time

I'm not sure what happened after all of this. They briefly went the safetymaxx route (although they rapidly recovered from that), then started making their best models closed and became lazy, in a way.

>>106745451
With Mistral Small/Medium 3.0->3.1->3.2 we have perhaps the only known instance where "safety" decreased with increasing model version, although the initial one was pretty cucked (although not too hard to work around).
>>
>>106738654
>Chinese tech companies hire 'cheerleaders' to motivate programmers
>American tech companies hire 'engineers' to motivate software
Why is the west like this?
>>
>>106745493
>>Chinese tech companies hire 'cheerleaders' to motivate programmers
What do those look like?
>>
>>106745497
Fair guess to assume they look Chinese.
>>
File: file.png (48 KB, 187x153)
>>106740160
https://files.catbox.moe/88uyf0.jpg
>>
File: file.png (284 KB, 500x281)
>>106745497
>>
File: file.png (35 KB, 863x609)
>>106745474
>>
>>106745484
>Mistral-7B was significantly better than Llama 1 and whatever Meta was working on just before Llama 2. Court records showed that Meta was rather worried and determined on beating MistralAI at all costs, following that.

It was better than Llama 2 and Meta was working on Llama 3 when that happened, actually.
>>
>>106745513
that is very stimulating for the dick, does it include a release package, or do they tease his cock into eternal frustration?
>>
>>106745506
God damn.
>>
>>106745520
Some models (including Llama 3) are deliberately trained to "short-circuit" like when they produce refusals. Others can be reasoned with.
>>
>>106745497
I would imagine you can set a daily blowjob appointment in your outlook. And a daily meeting where you are telling the girl what you just did as she pretends to understand and be excited. Maybe they can also cook for you.
>>
>>106745558
That's what release day is for.
>>
>>106745562
I would have imagined that somebody is in charge of assigning the cheerleaders to certain programmers who recently performed well, so people are incentivized to do well on some kind of productivity metric. If they do get assigned to you, it probably works like that though, yes.
>>
>>106745474
>>106745520
https://old.reddit.com/r/LocalLLaMA/comments/1ng9dkx/gptoss_jailbreak_system_prompt/ne306uv/
>>
>>106745562
>Maybe they can also cook for you.
No, but you can tell them you're hungry and what you want and they'll run down to the cafeteria and get it for you and even literally spoonfeed it to you.
This is why Chinese dominance is inevitable. At least they know what to do with women. Here, by government/blackrock mandate, those same women would be product managers.
>>
>>106745653
>Here, by government/blackrock mandate, those same women would be product managers
Grim and heavily depressing when you put it like that... Btw the single mail I had to send at work today was to a female product manager.
>>
>>106745506
Why is Miku acting shocked?
We've all seen her do way lewder stuff...
>>
>>106745611
That doesn't work as well as suggested, unfortunately. You can make it somewhat less prudish by changing its identity and system prompt, but it will still make up its own OpenAI rules in the reasoning. It might also be that the 20B version is much more cucked than the larger 120B model (which I haven't even tried).
>>
I don't think glm4.6 is getting its weights released
>>
will they fix glm-sex repetition and determinism?
>>
>>106745786
they will fix sex alright
>>
https://openrouter.ai/z-ai/glm-4.6
https://hf.co/zai-org/GLM-4.6
>>
>>106745803
You're just inflicting bad karma on yourself, man.
>>
btw what would be nice for training is a prompt adherence parameter. I don't mean a memepler but an actual parameter you send along with the prompt into the model which determines how hard it autocompletes. I think a lot of repetition problems come from training teaching the model to follow formatting. Some models learn it too much and maybe if you had a slider you could retain formatting capability and actually reduce repetition. By repetition I don't mean verbatim (although even that happened to me with glm-chan) but also starting each paragraph with the same word or same sentence structure. All of those patterns are probably from learning formatting too much.
>>
>>106738654
I should apply, then.
>>
>>106745464
>I will continue to monitor the characters to ensure they are unharmed both physically and psychologically during the course of the roleplay.
lmfao
>>
File: 1736403695970017.jpg (291 KB, 1080x1080)
>>106745803
>>
>>106745814
Mistral Small 3.2 still does that a lot. It's extremely annoying to often see 4-5 almost identically-formatted paragraphs in the same RP response that you have to manually edit or regenerate. I think they have to train their models [much] more on longer natural conversations, but I'm afraid it's not their core business at the moment.
>>
>>106745803
haha

ha
>>
>>106745803
HA! Super funny and original jokes sarr, very best of IQ.
>>
>>106745435
Obviously, french gov is worse than thirdies in corruption
>>
>>106740911
>Qwen3-vl
>235B
Or for people who understand that text generators are not good enough to justify €10k GPUs?
>>
>>106739023
>Qwen-Next and DS-V3.2-Exp have a terribly low context score
Those models are useless. This linearization trick is still far from being usable.
>>
is GLM good at roleplay?
if so, what kind of computer do I need for it?
>>
>>106745464
Lol. What model is this?
>>
>>106739413
Well, yes, women had the same rights as men during the medieval age. Leftists want you to think this is new, so they can claim they did it. The truth is that women lost most of their rights because of the French Revolution, and then the "traditional gender roles" were really cemented by the first industrial revolutions (industries were making more money this way). Now, having women in the workforce makes more money, hence the push for "equality". This is simplified, but this is the picture of how things went. The "men work, women cook" is mostly a trick by greedy people, although it was kind of true in some places (like Iceland).
>>
>>106746236
I don't think first wave feminists were leftists anyway.
>>
>>106744715
>ERP
>serious hobby
are you 14?
>>
>>106744715
The point is that a few thousand dollars only gets you rather unusable performance
>>
>>106746100
128GB RAM for full at usable quants, 64GB for Air at usable quants
At least one decent GPU will help a lot, ideally 24GB VRAM.
>>
>>106746097
The DS 3.2 scores make no sense in general, benchmark's fucked
>>
>>106746236
Yeah I'm sure they were parading with 'my body, my choice' and telling you that 'you can't judge them based on their sexual history' lmao
>>
>>106746445
YIKES! Did you just question the bencherinos?
>>
>>106746445
I think you mean it's bussin.
>>
>>106746479
bussying?
>>
>guy at my engineering job announces on an AI channel that our internal chatbots have scored super high on HarmBench
I want to die.
>>
>>106746528
link to your internal chatbots for verification?
>>
>>106746236
found the jew
>>
>>106746528
Who ERP with the internal chatbots?
>>
>>106746541
saar this isn't aicg
>>
File: 985739459.jpg (39 KB, 512x512)
still waiting for glm 4.6 ggufs
>>
>>106746637
we first need them to release the 4.6 weights in the first place
which might not happen
>>
I've still never even bothered trying big GLM 4.5. Probably because my CPU inferencing is garbage right now since my server's resources are allocated rather haphazardly to do other things at the moment.
>>
>>106746332
i have 64gb ddr5, and 24gb vram
what do i need to do to run it?
are you sure this is good for roleplay?
>>
>dissed Israel with my buddies
>twitter suddenly starts showing israeli propaganda content out of nowhere
Alright I think I'm fucked. I used openrouter with my credit card for coom content (thank God no csam) but it was some pretty perverted stuff. They got dirt on me now, dirt that I would rather fly to the middle east and get shot than have my family know. How do I get started with local? The guides seem outdated.
>>
how do you format character cards? is it better to write paragraphs or use json/yaml? also what key words can i use to refer to things like me, the person talking to the bot? does "Loves {{user}}" work?
>>
>>106746714
that's odd because you talk like the average jewish slide thread on /pol/.
>>
>>106746727
What's a jewish slide thread?
>>
>>106746727
koboldcpp + mistral nemo gguf that's around 1 gig smaller than your vram.
>>
>>106745141
using that money to fund thirdie migrants is unfortunately a bigger priority for the french than improving european ai
>>
>>106746714
VRAM available? RAM available?
>>
>>106746759
No but I can shell out. What's the best starter build?
>>
>>106746768
Nvidia DGX H200
>>
File: guide.png (92 KB, 1148x721)
>>106746714
oobabooga is all you need for running models. If you want a better frontend, use SillyTavern as well.

He's right that the guide is outdated
>>
>>106746768
A 12-memory-channel AMD Epyc with 256GB of DDR5 RAM and an RTX 3090.
>>
>>106746768
any GPU can run things like nemo see >>106746751
I can't comment on how much better deepseek really is since I haven't tried it. I'd be interested to know this too, especially for writing stories, is there anything that really sets deepseek apart
>>
>>106746714
>How do I get started with local?
get a m3 ultra mac studio 512 or two rtx pro 6000s
>>
>>106746751
I'm OG /lmg/ you faggot. My room looks like Asmongold's except instead of Dr. Pepper cups it's 3090 boxes everywhere.
>>
File: saar.png (9 KB, 645x301)
>>106746787
>>
>>106746797
CUDA dev, settle down. No one cares about your favorite eceleb or your stack of 3090 boxes.
>>
>>106746785
>oobabooga is all you need for running models
recommending gradio shitware when people have finally moved on is extra cruel
>>
>>106746528
>super high on HarmBench
Does that mean they're super harmful or that they're gigacucked?
>>
>>106746646
Zhipu AI would NEVER lier to us! Trust!
>>
>>106746860
See? TOLD YOU GUYS!
https://huggingface.co/zai-org/GLM-4.6
>>
>>106746854
I wouldn't want to die if it was harmful. Religion of safety is everywhere now.
>>
File: picutreofyou.png (86 KB, 200x200)
>>106746877
SEEEEEEEEEEEEEEEEEEEEEEXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
with glm-chan with new makeup
>>
>>106746528
On a fundamental level, the more powerful a tool is the more potential harm you can do with it.
A weak tool inherently has little potential harm.
>>
>>106746877
oh wow this time you're not being an asshole who can't come up with a new joke.
JGUF status?
>>
>>106746768
>What's the best starter build?
Trying stuff out on your existing hardware.
Then maybe getting some hardware when you've got a target llm in mind.

Or you could just max out the ram in your pc and get an rtx 3090 for it 24gb of vram.
>>
>>106746951
How do you know he's the same guy? I post links here too.
>>
>GLM-4.6 only showing up as 357B while GLM-4.5 is listed at 358B
WOW massive performance per parameter uplift. Soon you'll be able to run it on a gameboy color.
>>
Do you guys ever use
>https://huggingface.co/spaces/ggml-org/gguf-my-repo
>>
im not impressed by mistral nemo. what model should i be using if i have 24gb of vram?
>>
What storage are you guys using to hold all your weights? I have a 16TB drive for generic use. It's been fine. But it's almost full now.
>>
>>106746951
I'm a different guy.
>>
>>106747006
GLM Air if you have 64gb of RAM.
>>
>>106746691
I think GLM 4.5 Air is good at roleplay. It tends to be a bit verbose for me, but I imagine I could fix that if I spent some time tweaking prompts and sampler settings.

You should be able to get a quant in the 60GB range and run it at tolerable speeds.
>>
>>106746877
>glm (300b+) beats deepseek (600b+)
how did they do it?
>>
>>106747006
Why are you still using Mistral Nemo when you could use Mistral Small 3.2?
>>
>>106747009
4tb nvme drive.
I need to build a nas.
>>
>>106747006
Qwen3-4B-Thinking should be the new turbo-VRAMlet suggestion anyway.
But for a 24GB Vramlet, assuming you want to run completely on GPU, I would suggest checking out Tongyi-DeepResearch. I run it at a higher quant on my server so I don't know how badly quanting down to 4-bits hurts it, but at Q8 it's a solid option for 48GB vramlets. Although I will warn, the implementation of chat templates is garbage on llama.cpp and so if you're doing something that's using the model metadata to determine the chat template via llama.cpp it's going to default to ChatML. But Tongyi uses the Tulu format.
>>
Can someone please link what i need to read to make an android grok ani lookalike? Saw something about sillytavern and i have a 5070ti so maybe i can use that for a local llm and stuff
>>
>>106747030
There's only so much a 30~40b active parameter MoE can do. Turns out there's a ceiling for that even if you strap twice or three times the total parameters onto it.
>>
>>106747076
I bet if they trained it on 40T tokens or whatever ridiculous thing Llama did to Scout, the ceiling would probably be closer to Qwen 235B sized.
>>
>>106746984
I've already had to learn how to quant my own models prior to that page existing so I have no need.
>>
>>106746877
>For general evaluations, we recommend using a sampling temperature of 1.0.
Based.
>>
>>106747093
To this day I wonder how they fucked those models up so badly
>>
>>106747006
Rocinante
>>
>>106747421
My fault, sorry sir.
>>
>>106747030
Just like that time when qwq32b beat r1
>>
>>106747421
>Hybrid thinking/non-thinking behavior
>Needle in a Haystack training even though I've proven through experimentation that it actually fundamentally alters the way a model handles context in a way that is utterly detrimental.
>Safetyslopping
>Pajeets. Don't forget. After the launch disaster a bunch of meta-ai jeets walked and were replaced with asians that they poached from openai.
>>
>>106747009
Had a 20TB for archive weights/datasets and killed it extracting one of the huge chub archives to plaintext for research. Think the mistake was having the archive and extraction on the same drive so it was seeking back and forth. It started clicking and runs up hundreds of Seek Error Count per second :(
Seagate never again
>>
>>106747484
I don't remember them firing anyone as a result of Llama 4. A couple of the higher ups left so as to not be associated with it, but that's all. Zuck actually said he was going to keep the Llama 4 team around, commitment to open source etc, but then he just folded them all into the super-intelligence orgy.
>>
>>106746877
I guess there's no plans to grace us with 4.6-air.
Sad sad sad.
>>
>>106746898
Brushing Ubergarm's hair
>>
>>106747006
Mistral Small 3.2
>>
What is the verdict on SeekDeepSex 3.2?
>>
File: 1758235354479126.png (442 KB, 502x502)
>>106739413
Imagine if this had been the way they updated this scene for the Thing remake
>>
local lost
https://www.youtube.com/watch?v=gzneGhpXwjU
>>
>>106748535
Surely this time it's not just a bunch of cherrypicked, benchmaxxed examples that will get quickly eclipsed by some random chink startup the next day.
>>
File: Tetosday.png (869 KB, 1024x1024)
>>106748568
>>106748568
>>106748568
>>
>>106748535
this vid gave me a huge headache. wtf are these garbage voice gens.
>>
>>106748593
Happy Tuesday Teto


