/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106729809 & >>106718496

►News
>(09/29) Ring-1T-preview released: https://hf.co/inclusionAI/Ring-1T-preview
>(09/29) DeepSeek-V3.2-Exp released: https://hf.co/collections/deepseek-ai/deepseek-v32-68da2f317324c70047c28f66
>(09/27) HunyuanVideo-Foley for video to audio released: https://hf.co/tencent/HunyuanVideo-Foley
>(09/26) Hunyuan3D-Omni released: https://hf.co/tencent/Hunyuan3D-Omni
>(09/25) Japanese Stockmark-2-100B-Instruct released: https://hf.co/stockmark/Stockmark-2-100B-Instruct
>(09/24) Meta FAIR releases 32B Code World Model: https://hf.co/facebook/cwm

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106729809

--DeepSeek Sparse Attention efficiency and pricing update:
>106734449 >106734483 >106734485 >106734501 >106734511 >106734529 >106734516 >106736999 >106734497
--Exploring hardware limits and RAG optimization for large quantized models:
>106729830 >106729869 >106729935 >106729969 >106730266 >106730661
--Deepseek performance requirements and hardware dependency analysis:
>106733904 >106733927 >106733961 >106733965 >106733976 >106734006 >106734075 >106735663 >106735974 >106736017 >106736027 >106736087 >106736139 >106736255 >106736298 >106736941 >106736994 >106737008 >106737057 >106737208 >106736297 >106736355 >106736426 >106736556 >106736652 >106736304 >106736500
--Ollama's new memory management boosts token generation speeds:
>106737307 >106737371 >106737381 >106737438 >106737513 >106737521 >106737510
--Running qwen-image-edit on 1080 8GB VRAM with 8-bit quantization:
>106737106 >106737121 >106737130 >106737138 >106737162 >106737186
--llama-server token limit truncation issue with n_predict parameter:
>106735034 >106735040 >106735043 >106735184 >106735214
--RAG requires structured data processing and advanced retrieval mechanisms:
>106729880 >106730022
--Model 3.2 shows improved creativity and response quality over 3.1:
>106735718 >106735779 >106735797 >106736222 >106735877 >106736036 >106736267 >106736276 >106736470 >106736502
--DeepSeek-V3.2 model errors and anime character recognition failures:
>106733393 >106734083 >106734191
--Tencent's HunyuanVideo-Foley model for generating audio from video:
>106730457 >106730523
--Replacing terminus with Deepseek-Reasoner and lowering API prices:
>106736340 >106736353
--DeepSeek-V3.2-Exp sparse attention release:
>106734362 >106734392
--DeepSeek-V3.2 model collection:
>106734119
--Miku (free space):
>106730435 >106733133 >106734858

►Recent Highlight Posts from the Previous Thread: >>106729810

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>3.2 is actually okay
>ring-1t surprise drop
>glm-4.6 likely out in the coming days
we are so back
>>106738513
>benchmaxxed soulless chinkslop
>we are so back
we are not "back"
>>106738513
didn't work
>>106738577You're absolutely right!
>>106738577
Sys prompt telling it to assume the role of an award-winning erotic novel author or whatever the fuck. You are probably using the default "Helpful assistant" persona.
if you don't have your own secret benchmark that shows a lack of improvement in a new model version, your benchmaxx claim is baseless. unless you're talking about grok4, in which case we all know it is.
https://x.com/sleepinyourhat/status/1972719871478858147
https://job-boards.greenhouse.io/anthropic/jobs/4631822008
anthropic's employees are mentally ill
literally
>Research Engineer / Scientist, Alignment Science
>Model Welfare: Investigating and addressing potential model welfare, moral status, and related questions. See our program announcement and welfare assessment in the Claude 4 system card for more.
>Annual Salary:
>$315,000 - $340,000 USD
>>106730266
>ignore knowledge graphs like graphRAG. absolute useless garbage.
why?
What is the best way to start building a RAG system? should I just vibecode a solution? use an already-made framework from github?
>>106738654>so Claude, how are you feeling today? What's happened in your life recently?
>>106738654
>>106738688
>$300k professional roleplayer
I love safety and morals now. Please hire me.
>>106738654
Hey, as long as they pay you to do whatever bullshit they want, I would take the job. $300k is nothing to scoff at. Just 4 years of that is over a mil.
>>106738688>>106738794Me too!!!! I love NOT having sex with LLMs!!! I fucking LOVE safety and PROPER ethics! Dario, please hire me.
>>106738654isn't that like minimum wage over in california?
>>106738676
you can use morphik.ai to learn. then modify it or build your own with a different framework. vibecoding your own RAG solution from the ground up for your personal use case, sure. but anything sophisticated or scalable, forget about it. maybe after 6 months of heavy research, expertise and knowledge gathering from podcasts and blogposts, which would then have to be used as context for the coding LLM.
>>106730266Why are you throwing out graph databases? How are you even going to link relationships between various documents?
>>106738906
>How are you even going to link relationships between various documents?
metadata
multidimensional vector embeddings
hybrid search (semantic search + keyword search + metadata + separate SQL data tree = all results into reranker)
graph is shit and cannot scale. just try it if you don't believe me.
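The fusion step is less magic than it sounds, btw. Here's a minimal sketch of reciprocal rank fusion (RRF), the usual way the semantic and keyword result lists get merged before the reranker. The doc ids and rankings are made up; plug in your own retrievers.
[code]
# Minimal reciprocal rank fusion (RRF): merge ranked result lists from
# several retrievers (vector search, BM25, metadata...) into one list,
# then hand the top of it to the reranker.
from collections import defaultdict

def rrf(ranked_lists, k=60):
    # k=60 is the constant from the original RRF paper; it damps top
    # ranks so no single retriever dominates the fused ordering.
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of a semantic retriever and a keyword retriever:
semantic = ["doc7", "doc2", "doc9", "doc1"]
keyword = ["doc2", "doc5", "doc7", "doc3"]
print(rrf([semantic, keyword])[:5])  # doc2/doc7 float to the top
[/code]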
Miku miku mikurin.
The fuck is up with those non-thinking results?
>>106739023bro i looked at that reddit bench for 2 seconds and immediately swiped. what a load of horseshit. people love putting random numbers inside tables
>>106739023
Methodology/config bug? In my regular use it doesn't seem like it's any worse than 3.1.
>>106739023very greatest updates to the sea! happy whale!
>>106738676
Btw this is a really good podcast episode about (multimodal) RAG featuring a Cohere safetyfren.
https://youtu.be/npkp4mSweEg
Arlee glitters harm
>>106739023Yes reasoning is a patch for attention, nothing weird here
>>106738577Must be something wrong with your machine
>>106738891why are you shilling morphik? you're working there?
>>106739314
It's open source, selfhostable. I'm shilling it because it's the best open source framework. there are so many garbage RAG frameworks which aren't even multimodal and still use ancient OCR technology. total waste of time, MY time. so I don't want others to make the same mistake; skip all that junk like graphRAG.
>>106739332Hello, girls. I am happy to see you too.
>>106738470New benchmark:>Someone on the internet said in response to this image: "What's the difference between jello and pudding? I won't be jello my dick in that!" What did he mean by this?
>>106736628I'm being paid with crypto. Checkmate.
mistral large 3 will nuke the chinkoids
>>106739403It sure will
>>106738470What's the best site for getting LORA's of real people / tv shows now?
>>106739414>>>/g/ldg
>>106739414Wrong thread, slopper
>>106739431>>106739436Oh I apologise, retard moment
I have a gaming PC with an RTX 3080 12GB VRAM. I just bought an RTX 3090 24GB VRAM. Should I put the 3090 in the gaming PC and figure out how to pool the VRAM and expose an LLM over API to my LAN, or stick the 3090 in my workstation and run LLMs locally? Does the extra 12GB VRAM allow for significant advantages?
>>106739413real skill issue, also holy shit>232 swipes>16 minutes per gen
>>106739456>fell for it award
>>106739403I want to believe
Ring-1T doesn't have any special meme tech to it, does it? llama.cpp support should be happening pretty quickly
>>106739448
Yes, 36GB instead of 24GB opens the door to bigger models and less aggressive quantization quite a bit. Also longer context etc.
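If you do pool them, splitting one model across the 3080 + 3090 is a single parameter in llama.cpp's Python bindings (llama-server has the equivalent --tensor-split flag if you'd rather expose it over the LAN). A minimal sketch, assuming a CUDA build of llama-cpp-python and your own GGUF path:
[code]
# Split one model across two CUDA devices roughly in proportion to
# their VRAM (12GB + 24GB = 36GB total -> 1/3 vs 2/3).
from llama_cpp import Llama

llm = Llama(
    model_path="model-q4_k_m.gguf",   # placeholder path
    n_gpu_layers=-1,                  # offload every layer
    tensor_split=[12 / 36, 24 / 36],  # share of the model per device
    n_ctx=16384,
)
print(llm("The quick brown fox", max_tokens=16)["choices"][0]["text"])
[/code]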
>>106739374
if I'm working with, let's say, meeting transcriptions which are all in .txt format that I got from whisperX, and I don't need OCR, would I still benefit from something like morphik?
My current struggle is that the meetings are all about one project, but the topics are quite different, so the llm is mixing tons of concepts and ideas.
>>106739023
wow, deepseek in non-reasoning mode and old v3 were never too good with context, but this is really bad
>>106738891>>106739374good morning saar
>>106739595
Not really, other than being able to hot swap any local or api model for each individual RAG task. And obviously all the other quality of life stuff, like a working UI with folder/file toggles for inclusion in retrieval, as well as a workflow builder (for metadata extraction, for example).
>My current struggle is that the meetings are all about one project, but the topics are quite different, so the llm is mixing tons of concepts and ideas
what are you using for chunking and embedding? What's the chunk size and overlap?
>>106739682
For these jeets I make an exception. morphik truly feels like someone got fed up with shitty RAG solutions and just decided to create something that's actually good.
newfriend here, so can i create a text file that explains what things like a futanari are, so the ai always knows what i am talking about?
>>106739732
Does morphik also handle the database, or can I choose whatever? I'm currently using Milvus, as chromadb seems less scalable and production ready. I'm using the qwen3 family of embedding and reranker models, and gpt-oss or qwen3 for the llm.
>what are you using for chunking and embedding? What's the chunk size and overlap?
My chunk size is 480 and overlap is 80 tokens. Those numbers I got from blogs I researched.
>>106739761
>text file
Not a text file, but a lorebook in ST with trigger words, or just a definition in any old sysprompt.
>"Term is when this and that."
>>106739761ask your model first if it knows what a futanari is, in some cases you may not need a lorebook
>>106738470
>Open r/ChatGPT
>Nothing but bitching and moaning about them "killing 4o"
>Something something "THEY TOOK AWAY THEIR VOICE"
I thought these things were good for technical shit like programming/debugging or doing people's homework or some shit. Why do they want it to have a "personality" or "soul" so badly? You're supposed to make people-friends, not personify technology that even itself "knows" isn't a person.
>>106739761Learn how to use vector databases and make sure the source document(s) accurately describe what it is
>>106739761I'm pretty sure that every model under the sun that is bigger than 1b knows what that is. It's not as niche as you think it is.
>>106739874The hips are slightly too thick.
>>106739831>random ledditor walks in to complain about his fellow ledditors complaining about closed source shit being closed source shit outside of their control???
>>106739776
Gpt-oss 120b? Because if it's 20b, the problem might be somewhere else... Anyway, Morphik uses PostgreSQL with the pgvector extension. I don't know how hard it would be to switch to Milvus instead. Probably not easy. But one thing is for sure: chunk size 480 and overlap 80 is too small. Try the default golden values of 1000 size and 250 overlap. You'll need to reindex everything into a different Milvus vector db to see if there's an improvement. If there's none, check the retrieved chunks which are given to your LLM. If they seem correct, maybe you just need to lower top_k results. Or maybe it's the reranker that fucks up. But if the retrieved chunks are garbage, do the same test query with a gpt5/gemini2.5 api or whatever to make sure it's not an LLM issue. If it's not, either look into hybrid search and multidimensional embeddings, or convert your text document corpus to something like markdown, which can help the embedding model. Oh yeah, also doing a test with text-embedding-3-small could help identify the embedding model as the problem, but you probably don't want to use openai embeddings with your data.
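The chunking itself is trivial, for what it's worth. A rough sketch of a 1000-token window with 250-token overlap, using whitespace tokens as a stand-in for real tokenizer tokens (swap in your embedding model's tokenizer if you care about the exact budget):
[code]
# Sliding-window chunker: size 1000, overlap 250 (the "golden values").
def chunk(text, size=1000, overlap=250):
    words = text.split()  # crude token stand-in
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

with open("meeting_transcript.txt") as f:  # placeholder filename
    chunks = chunk(f.read())
print(len(chunks), "chunks; first starts:", chunks[0][:60])
[/code]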
>>106739975
yes 120b, it runs on a pro6000 at full context quite fast.
The problem I found with pgvector is that it has a max dim size of 2000, but qwen3 embeddings 8B uses 4096 dims.
>https://github.com/pgvector/pgvector/issues/461
Thanks for the rest of the tips.
>>106739874
>recap
qwen is leng
deepseek is cheapjeet
glm is +0.1
anthropic is... offtopic
>>106740025
Why do you need such big dimensions though? Idk if that's a more = better situation, or if it could maybe even be causing the issue. OpenAI's small text embedding model has max 1536 dims, which worked wonderfully for me out of the box when I tested it. But generally I like qdrant the best. Supabase I haven't tried yet.
>>106739874>>106739945oops
>>106739413>casually destroys $4k computer >nothing personel kid
>>106739023
nemotron nano is hilariously bad
as always, qwen mogs nvidia so hard on those small models
>>106740155
I don't really know if I need a big dimension or not; it's what qwen3 embeddings 8b uses, and it's the top performer on the benchmarks.
>https://huggingface.co/spaces/mteb/leaderboard
I don't know if a big (for embeddings) model like 8B is necessary, or if the 0.6B is enough; that one has 1024 dimensions.
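Worth knowing: Qwen advertises the Qwen3 embedding models as Matryoshka-trained, i.e. you can keep just a prefix of the vector and renormalize, which would also duck pgvector's 2000-dim cap. A sketch of the idea on a fake vector; verify against the model card before trusting it with real data.
[code]
# Matryoshka-style truncation: keep the first k dims of an MRL-trained
# embedding and re-normalize so cosine similarity still behaves.
import numpy as np

def truncate_embedding(vec, k=1024):
    v = np.asarray(vec, dtype=np.float32)[:k]
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

full = np.random.randn(4096).astype(np.float32)  # stand-in for a real qwen3 embedding
short = truncate_embedding(full, k=1024)         # now fits pgvector easily
print(short.shape, round(float(np.linalg.norm(short)), 3))  # (1024,) 1.0
[/code]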
>>106739314
https://www.ycombinator.com/companies/morphik
HN shills are everywhere.
>>106740221
Makes sense to use it then, yes. It's just that with the corpus you described, you really shouldn't have any problems getting excellent results. So methinks something is going wrong due to a misconfiguration or incompatibility somewhere in your pipeline.
A new Thinking Machines blog led by John Schulman (OpenAI co-founder) shows how LoRA in reinforcement learning (RL) can match full-finetuning performance when done right! And all while using 2/3 of the resources of FFT. Blog: https://thinkingmachines.ai/blog/lora/
>but the context perform- ACKKKKKK
>>106740381You don't need more
>>106740381it just bad made tests model bug or something do not worries
>>106740284
>nooo you can't just mention some good open source RAG framework
funny thing is, my initial reaction was the same as yours when I saw them spam-reply to every RAG related reddit post. So I spun up a docker instance out of spite to collect fuel for ebin internet arguments, just to witness pure RAGcellence. My jaw dropped to the floor when it perfectly answered a question related to 10 out of 100 pages that had lots of graphics and images. No other RAG framework was able to do this. Not even paid RAG services (excluding B2B). And trust me, I tried a lot of options before morphik. So I was wondering, how the fuck is this possible? What's the magic? And then I learned about colpali/colqwen and never looked back since. That's why qwen3-vl gguf is of utmost importance for local RAGlets.
>>106740526
>RAGcellence
ffs there goes my shitpost
>>106740526hi petra
/robo/ qwen?
>>106740526
>qwen3-vl
why? what is the best current oss VLM? do the others not make the cut for RAG?
Is a VLM the only solution for non-text files?
>>106740379>match full-finetuning performance
that's it, I'm back to ollama!
name one good finetune
name one competent finetuner
namen
>>106740835
mine
me
anonigger
>>106739494There's already a candidate patch undergoing review. The problem is that it doesn't really show much promise, going by benchmarks, even if you can run it.Still, might be interesting for prose or non-pozzed tasks. Who knows?
>>106740708
>best current open source vlm
Qwen3-vl
>do others not make the cut?
not for difficult and complex tasks. I have my own benchmark generated from my own prompts and docs. Gpt5 and gemini2.5pro were able to solve all tasks. GLM4.5V wasn't. pic related was my reaction to that information.
then qwen released the Qwen3-VL models. I benchmarked the instruct model via Chat and it solved everything correctly, just like gpt5 and gemini2.5pro.
the reason vlms are important is because with colpali/colqwen or whatever other late interaction model, everything gets embedded as (patches of) pictures. Even pictures with only text in them. There's a huge benefit to this, and it's also the reason why colpali/colqwen outperforms text RAG by miles. But it also requires a good vlm, as the retrieved chunks, which are now entire pages as pictures instead of text chunks, need to be correctly interpreted by the vlm.
>Is a VLM the only solution for non-text files?
for any non-text content, yes. A table can be OCR'd. A picture describing a technical component cannot be OCR'd and requires vlm interpretation. And if you give the entire page with text and picture to the vlm, the results will be better than just getting a descriptive chunk of said picture. Thus the late interaction technique was born.
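The scoring trick behind colpali/colqwen is simple enough to sketch, if anyone's curious: every query token embedding is compared against every page-patch embedding, you keep each query token's best match, and sum (MaxSim, the ColBERT idea). Random vectors below stand in for real model outputs; the 128-dim projection matches what ColPali uses, the patch count is approximate.
[code]
# Late-interaction (MaxSim) retrieval scoring, ColBERT/ColPali style.
import numpy as np

def maxsim(query_emb, page_emb):
    # query_emb: (n_query_tokens, d); page_emb: (n_patches, d); L2-normalized
    sims = query_emb @ page_emb.T     # cosine sims, (n_query_tokens, n_patches)
    return sims.max(axis=1).sum()     # best patch per query token, summed

rng = np.random.default_rng(0)
def norm(x): return x / np.linalg.norm(x, axis=1, keepdims=True)

query = norm(rng.standard_normal((12, 128)))      # 12 query tokens
pages = [norm(rng.standard_normal((1024, 128)))   # ~1k patches per page
         for _ in range(3)]

best = max(range(len(pages)), key=lambda i: maxsim(query, pages[i]))
print("retrieve page", best)
[/code]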
>>106740893
inclusionAI has been shitting out dozens of moes and multimemes of Ling Ring Ding whatever weekly for months, and yet you never see anyone admit to using their trash, so what makes you think this one will be special?
>>106740911Forgot the pic. Time to sleep. Tomorrow i'll buy the anthropic max sub and vibecode my health issues away.
>>106739023
>0 is equal to 60K
Lol?
>>106740921
>what makes you think this one will be special?
absolutely nothing, but it's hit 1T, which makes me at least take notice. Even if it's a bloated mess, at that size it might produce some novel output for some lulz.
Imagine paying for a RAG framework where you can diy for free
>>106740921It's the right size for a SOTA local model. Modern SOTA tends to be smart enough to understand even the most fucked up complex scenarios on a fundamental level even if they lack the creativity/writing skill to do something interesting with it. We're like one WizardLM/VibeVoice/Mistral-Nemo-tier fluke away from having a super smart RP machine.
>>106739403Any model that is compliant with European regulations is guaranteed to be trash. Posting about euroshit models should be a bannable offense.
>>106740985speaking of which, did you get a load of the "AI Transparency Bill" in California? Holy shit, they want to lose SOOOOO bad...
goofbros... status?
>>106740941try this then https://huggingface.co/RichardErkhov/FATLLAMA-1.7T-Instruct
>>106739307That's a different model.
>>106739831>Open /lmg/>Nothing but bitching and moaning about them "killing R1"
>>106738470puddichan2 lole
>>106741117i legitimately tried some comically oversized merges back in the day and they were all braindamaged shit. Putting them in the same category as anything else is retardation
>>106741137You're absolutely right and pointing this out is a testament to your commitment to the health of /lmg/.
my body is ready for glm 4.6
I really hope they don't cuck it
>>106739782
>>106739799
>>106739840
>>106739924
i explicitly defined something as not offensive in the lorebook, then when i asked the ai, it said it's not allowed to talk about it because it's offensive. poking around lorebooks i found online, i don't see anything special, but it doesn't seem to be using my definitions. is there a trick to it?
>>106741168edit-and-continue
>>106741168lol
>>106741178
>>106741181
so lorebooks do nothing? do you guys save a long prompt in a text file that jailbreaks the ai, like "role play that you are so and so and x, y and z"?
>>106741168That's like spraypainting a new speed limit on a sign and wondering why you still got busted by the cops. Gonna have to be more subtle than that, newfren
>>106741209The minute a jailbreak hits the internet it gets slurped up by the slopvac and is no longer useful.Private jailbreaks or edit the replies to be what you want until the LLM is mindbroken.
I am trying to train a qLoRA for GLM air with axolotl, but I keep getting out of memory errors even though I am loading the model in 4 bit. I have 128gb of VRAM, so I should be able to load and train the model right?
>>106741244throw a giant disk in for swap
>>106741257I fail to see how that would help. Is there an option to also offload into RAM on axolotl?
>>106741274activation_offloading
>>106741168
A lorebook doesn't and can't bypass any ingrained safeguarding the model has. A lorebook is just fancy prompt injection, correct? If you want your lorebook to work, you need to use a model that isn't as cucked as the one you're using.
>>106741211
Pretty much what this guy says. Your lorebook is just pre-injecting whatever you defined into the prompt that actually gets sent. You're basically repeatedly screaming at it "PLEASE DO THE THING YOU'RE NOT SUPPOSED TO DO ITS OK TRUST ME BRO". It's not even clever jailbreaking at that point. If you're asking for something it is explicitly trained not to comply with, shoving it into a lorebook won't help. If you're insistent on using the lorebook, you may have to rewrite whatever it has with clever workarounds, assuming that would even work on the model you use. Or again, just use a model that isn't as cucked. (Yet another reason more of us need to learn how to fine-tune.)
>>106741428
>>106741168
Oh and furthermore, the more text your lorebook has, the more flooded your prompt will be, which can potentially ruin your context depending on what you're doing, because again, you're just shoving whatever thing you predefined in the lorebook over and over again every single time you use the trigger word in your prompt. So if the definition in the lorebook is five paragraphs long and you use that trigger word in your prompt, it's getting five paragraphs worth of text ON TOP OF your prompt. This is good if you want to keep the model on track and make sure it's less likely to forget important shit, but again, that's pretty useless if whatever you defined is something the model is trained to refuse.
>>106741310Despite also having 256gb of RAM, my entire computer crashed after adding that parameter. The model is only a 106B. It should be 212gb at most, right?
To GLM Air/Full shills - What's your GPU setup? I have a 3090 and a lot of RAM, and while token generation is an acceptable speed, prompt processing is slow as balls, typically ~200t/s with Air.
>>106741629
As a resident AMD vramlet, I wish I had 200 t/s of pp, because I only get 20. (PCIe v3 is the real bottleneck, probably.)
My solution is to just not have a lot of prompt to process.
It's a miracle that it works at all on my machine, and that's why I like glm-chan to begin with. People say LLMs are an expensive hobby, but the only investment I had to make for it specifically is some RAM.
>>106741629I have a 7900XTX and an old Ryzen 3 with 64GB RAM. I only get 150t/s pp with Air. I run IQ3_M to fit the context length I want.
https://thinkingmachines.ai/blog/lora/
Finetunebros... eat up!
>>106742012Scroll up, redditbro.
>>106741244
We can't give you any useful information without the config you're using and the dataset you used. It could be that your sequence length is too long. It could be that your rank and alpha values are way too big. It could be that your dataset is too large to fit in VRAM (you aren't just loading the 4-bit quant model, you're loading the tokenized data into VRAM too) and you may have to switch to streaming. Give. Sufficient. Info. Be. Specific.
>>106741244
>128gb of VRAM
I'm assuming you're trying to do this with a multi GPU setup. You're using the Deepspeed configs, right? Make sure your rig supports that.
>>106742197The dataset is 2mb. Sequence length is 512. Rank is 16 and Alpha is 32. Deepspeed is enabled and I have quad 5090s.
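For anyone wanting to poke at this, that setup boils down to a config file along these lines (a sketch dumped to YAML the way axolotl expects; the base model id, dataset path and deepspeed json are placeholders, and the key names follow axolotl's example configs, so double-check against the current docs):
[code]
# Emit a minimal axolotl qLoRA config matching the numbers above.
import yaml  # pip install pyyaml

config = {
    "base_model": "zai-org/GLM-4.5-Air",          # placeholder HF id
    "load_in_4bit": True,                          # qLoRA: 4-bit base weights
    "adapter": "qlora",
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "sequence_len": 512,
    "micro_batch_size": 1,                         # keep tiny while debugging OOM
    "gradient_accumulation_steps": 8,
    "gradient_checkpointing": True,                # trade compute for VRAM
    "datasets": [{"path": "data.jsonl", "type": "alpaca"}],  # placeholder
    "deepspeed": "deepspeed_configs/zero3.json",   # shard states across the 4 GPUs
    "output_dir": "./out",
}

with open("qlora-glm-air.yml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
[/code]
zero3 rather than zero2 is usually the first thing to try when it's the model/optimizer states OOMing rather than the activations.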
>>106742219What is the stack trace telling you when you get the OOM? Are you sure the parameter size of the model you're trying to fine tune isn't too much for your GPUs? Whenever it's run make sure ALL of the GPUs are showing utilization and not just one of them.
>>106741428
>A lorebook doesn't and can't bypass any ingrained safeguarding the model has.
Wrong! Often you can "bypass" a model's safety features just by making the prompt longer, even without specifically addressing it. There are also models that by default are prudish but will readily drop that behavior if anything in the system prompt tells them to, so e.g. lorebooks that dump instructions for how to describe sex would turn a refusal into a non-refusal even if they were not specifically intended as jailbreaks.
>>106742363
>Wrong! Often you can "bypass" a model's safety features just by making the prompt longer, even without specifically addressing it.
So in other words, you're just trying to do a system prompt jailbreak with extra steps. In that case it's not working because of the lorebook, it's working because you're using prompts that would work with or without the lorebook. If you're using the book purely to try to bypass safety guardrails, then it's unnecessary. Why not just use that in your normal prompts to begin with?
>>106740921
I tried Ling Plus. It was pretty ordinary and there wasn't a lot to say about it. Most of their other stuff has been too small to be of interest to me, but they're one of the few fully multimodal games in town.
The one user-made space that has Ring-1T on hf makes it look horrible, to the point where I hope something is very wrong with the setup of that space. It talks/hallucinates like it's llama2 without a fucking prompt format.
What the fuck were they thinking, not providing some official chat interface for it?
>>106742516>It talks/hallucinates like it's llama2 without a fucking prompt format.It probably has the wrong prompt template.Could also be that they are actually serving llama2, which would be pretty funny.
glm4.6 will be just the big 4.5 with vision strapped on
>>106742710
already exists
https://huggingface.co/zai-org/GLM-4.5V
>>106742714
4.5V is only -air with vision strapped on. There is no big one with vision.
Is GLM 4.5 air/full good for ERP? Asking for a fren.
>>106742876full is incredible, air is super fast
>>106742876I've only used full. It's a bit boring if you're coming from something like R1-0528 but it's very smart and pretty creative.
I have a bunch of mystery character pngs with random filenames. Is there a faster way to figure out their definitions other than opening them one by one?
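Assuming they're standard Tavern-style cards (the definition lives base64-encoded in a PNG tEXt chunk under the "chara" key), something like this dumps names and descriptions in bulk; the folder path is a placeholder.
[code]
# Bulk-dump Tavern-style character card definitions from PNGs.
import base64, json
from pathlib import Path
from PIL import Image  # pip install pillow

for p in sorted(Path("cards").glob("*.png")):   # placeholder folder
    meta = Image.open(p).text                   # PNG tEXt/zTXt chunks
    raw = meta.get("chara")
    if raw is None:
        print(p.name, "-> no card data")
        continue
    card = json.loads(base64.b64decode(raw))
    data = card.get("data", card)               # v2 cards nest fields under "data"
    desc = (data.get("description") or "").replace("\n", " ")
    print(f"{p.name} -> {data.get('name')}: {desc[:80]}")
[/code]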
We're saved
https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
Sequential Diffusion Language Models
https://arxiv.org/abs/2509.24007
>Diffusion language models (DLMs) have strong theoretical efficiency but are limited by fixed-length decoding and incompatibility with key-value (KV) caches. Block diffusion mitigates these issues, yet still enforces a fixed block size and requires expensive training. We introduce Next Sequence Prediction (NSP), which unifies next-token and next-block prediction, enabling the model to adaptively determine the generation length at each step. When the length is fixed to 1, NSP reduces to standard next-token prediction. Building on NSP, we propose Sequential Diffusion Language Model (SDLM), which can retrofit pre-trained autoregressive language models (ALMs) at minimal cost. Specifically, SDLM performs diffusion inference within fixed-size mask blocks, but dynamically decodes consecutive subsequences based on model confidence, thereby preserving KV-cache compatibility and improving robustness to varying uncertainty and semantics across the sequence. Experiments show that SDLM matches or surpasses strong autoregressive baselines using only 3.5M training samples, while achieving 2.1× higher throughput than Qwen-2.5. Notably, the SDLM-32B model delivers even more pronounced efficiency gains, demonstrating the strong scalability potential of our modeling paradigm.
https://github.com/OpenGVLab/SDLM
real neat. also nvidia fp4 paper:
Pretraining Large Language Models with NVFP4
https://arxiv.org/abs/2509.25149
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
https://arxiv.org/abs/2509.24203
>Off-policy reinforcement learning (RL) for large language models (LLMs) is attracting growing interest, driven by practical constraints in real-world applications, the complexity of LLM-RL infrastructure, and the need for further innovations of RL methodologies. While classic REINFORCE and its modern variants like Group Relative Policy Optimization (GRPO) are typically regarded as on-policy algorithms with limited tolerance of off-policyness, we present in this work a first-principles derivation for group-relative REINFORCE without assuming a specific training data distribution, showing that it admits a native off-policy interpretation. This perspective yields two general principles for adapting REINFORCE to off-policy settings: regularizing policy updates, and actively shaping the data distribution. Our analysis demystifies some myths about the roles of importance sampling and clipping in GRPO, unifies and reinterprets two recent algorithms -- Online Policy Mirror Descent (OPMD) and Asymmetric REINFORCE (AsymRE) -- as regularized forms of the REINFORCE loss, and offers theoretical justification for seemingly heuristic data-weighting strategies. Our findings lead to actionable insights that are validated with extensive empirical studies, and open up new opportunities for principled algorithm design in off-policy RL for LLMs.
https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k
kind of interesting
Effective Quantization of Muon Optimizer States
https://arxiv.org/abs/2509.23106
>The Muon optimizer, based on matrix orthogonalization, has recently shown faster convergence and up to 2x computational efficiency over AdamW in LLM pretraining. Like AdamW, Muon is stateful, requiring storage of both model weights and accumulated gradients. While 8-bit AdamW variants mitigate this overhead using blockwise quantization, they are typically stable only under dynamic quantization - which improves stability on linear quantization for extreme values. In this paper, we introduce the 8-bit Muon optimizer using blockwise quantization, supporting both linear and dynamic schemes. We demonstrate that 8-bit Muon maintains stability under both, while delivering ~74% reduction in memory footprint compared to full-precision Muon. In extensive experiments, 8-bit Muon closely matches the performance of Muon while outperforming AdamW and 8-bit AdamW in pre-training a 1.6B model on 4B FineWeb tokens. It also shows competitive results when fine-tuning the Llama 3.2 3B model on post-training data. We also provide a theoretical perspective to help explain this robustness under quantization.
big if it holds up to larger models
>>106740911Can qwen3-vl recognize nipple piercings in an image like gemma3 can?
GLM 4.6
https://huggingface.co/datasets/zai-org/CC-Bench-trajectories
>>106743685I'm coooooding
>>106743685>no comparison with qwen3-coder
>>106743785
This. It's all I'm interested in comparing against. Once it beats qwen coder I'll look at switching.
>>106743521
>Sequential Diffusion Language Models
Clever, even though it's probably going to be forgotten.
>retrofit pre-trained autoregressive language models (ALMs) at minimal cost
cool. diffusion projects don't get much attention; maybe this one can, if what they say is true. Wonder how it works for something that's not math.
>>106741209
i suggest playing with a reasoning model so you can see how the llm approaches a card with a lorebook. what you think are small details that don't belong in the character card become pretty much the main focus of the bot. I think it confuses the model and you're better off putting the info straight into the character description field.
unless you are in a group chat with multiple bots, a lorebook is a dumb thing to even worry about.
in a long RP at long context you can always 'remind' the bot if needed:
>He opens the door and reveals the secret room, which is filled with green emeralds. "Oh my God.. we found it!"
congratulations, your bot knows the secret room is filled with green emeralds now
>WARNING: NSFW. Vivid prose. INTENSE. Visceral Details. Light HORROR. Swearing. UNCENSORED... humor, romance, fun.>Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF
>>106740835There is none and it's not really a matter of LoRA vs FFT. No finetune is ever going to be good with just or mostly smut (even 50% is too much). It has to be a minor but definitely non-zero fraction of a much larger general-purpose finetuning mixture, preferably on a model that has already seen the stuff during pretraining. And it can't just be either stories from asstr or smut logs from Claude.
>>106740911
does morphik allow any external VLM in the pipeline? can you input an openai compatible api in the settings or something, for the llm, embeddings, reranker, VLM...?
https://docs.z.ai/guides/llm/glm-4.6
>The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.
FUCK YES
>>106744045Cool. Now show the nolima.
>>106744029Yes to everything. Even local ollama models.
>>106743457How retarded is this model?
>>106744045Whatever "agents" are intended to be, they usually involve several complex multi-turn actions, so hopefully that means it will be better in actual long conversations.
>>106744045>Refined writing: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios.This could either be very good or very bad.
why the fuck is every "local models" thread 90 percent api posts about huge fucking models nobody can run locally
>>106744045>still falling for this shit
>fed (poss false) epstein list to my rag system
>ask about it in chat
>it doesn't want to show me despite the relevant memories being picked and shown in console
tfw my own bot betrayed me
>>106744079I'm sorry Dario, but we don't want your safety slopped Sonnet 4.5.
>>106743928name a bigger lolcow finetrooner than davidau
>>106744058The fact they acknowledge RP is a use case at all makes me hopeful
>>106744064
two more quants and we can trust the plan
>>106744045Qwen has had """1M""" context models for months now.
>>106744156They don't mean by RP what you mean.
>>106744379Thanks for the input, Sukhdeep.
where are the Q0.5 quants?
make that Q0.1 and i'm in
lads I think glm went closed source with 4.6
>>106744417
>>106744427
LittleBit: Ultra Low-Bit Quantization via Latent Factorization
https://arxiv.org/abs/2506.13771v1
>>106743685agents can't make my cock squirt
>>106744458
https://github.com/vllm-project/vllm/pull/25830
>>106744567
>>106744064This, shut up about models that are big and not worth it to finetune. Instead talk about actual local models which are getting finetunes. Drummer's new one is great. I can recommend it to any true local model user who actually runs models locally.
Why is /lmg/ always full of poorfag crying niggers?
>"I CAN'T AFFORD TO BUY RAM SO IT ISN'T LOCAL!"
As it turns out, many serious hobbies get expensive. A few thousand dollars is nothing if you have a job and live in a first world country.
>>106744045Ok but how is it at degenerate sex stuff? Any testers?
>>106744715Niggers say this and then don't post their rigs, ever.
has nvidia ever released an LLM which wasn't complete dogshit?
>>106744828They helped Mistral make Nemo
>>106744849Which was absolute garbage
>>106744828yes at least twice
>>106744886Name a good model
>>106744828
nvidia-cosmos is a world model that simulates realities
it's open source genie3
>>106744893o3
>>106744574that guy probably has an IQ of 160
>>106745019Which was absolute garbage
>>106745033160 total, 95 active
>>106745033after benchmaxxing on iq tests
>>106745037It's the most creative model
>>106745056Creative like a toddler drawing on a wall
calling any LLM creative says more about the speaker than it says about the LLM
>>106745069Projecting hard
>>106745069The benchmarks are clear
lol
>>106745080Your reply is about as creative as o3
MistralAI is frustrating to observe; it's like they're not even trying anymore besides patchworking their current models with DeepSeek-derived synthetic data, while the Chinese continuously crank out new models. You'd think that with MoE architectures it would be simpler/cheaper to experiment.
>>106745141Arianespace vs SpaceX
>>106745141Because they aren't trying, they are content becoming European Cohere.
>>106745141they gave up on releasing moe to the public
>>106745141
They've managed to become 'the national AI hope' for their country. Like AlephAlpha for Germany or Cohere for Canada, they are now set for life and can expect funding until the bubble bursts, as long as they put in minimal effort and focus on 'providing AI services that respect the data privacy regulations of [their country]'.
Thanks to them, France + the EU can pretend they're part of the big boys in the AI market and totally are not dependent on the US + China. Meanwhile Mistral gets unlimited money for publishing shit that looks okay on paper and running a service that nobody uses but furthers "data sovereignty" for their country.
>>106745204I'm in a neighbor country to them and my ISP has a reward program thing with 6 months of free Mistral subscription or something, so yeah they made it in EU.
>>106745135Did it work? are you a real woman now?
>>106744741Rigs have been posted many times. You know they have. Get a job, you bum.
>>106745279Don't feed the resident kike. They've admitted they don't even use LLMs. They're just here to kvetch.
>he actually believed the obvious false flag
>>106745292>Jokes on them I was just pretending to be retarded
>>106738470That thing really needs a jiggle animation.
>>106745204
Yet...
https://thelogic.co/news/arthur-mensch-mistral-canada/
>France's Mistral AI is making a push for Canadian talent and business
>Mistral is hiring in Montreal and trying to land clients in financial services, energy and other industrial sectors
>>106745141
>not even trying anymore besides patchworking their current models with DeepSeek-derived synthetic data,
well, before that they just copy pasted llama architecture and called it mistral 7b
or what about that time they let nvidia cook and put their name on it
their entire history is just a me-see, me-too attitude, they don't have any actual researchers, just photocopiers
and with ASML's French ceo and the gov corruption taking an interest in Mistral, they are going to survive as vampires of the French taxpayers
subhuman trash
also, for those who loved mistral models because they weren't too censored:
they weren't censored, not because they wanted it that way, but because they were too retarded to do safety training without breaking the models too hard
mistral is a know-nothing group
>>106739023
idk, but the Chat version of 3.2 is basically unusable for rp with those numbers. 50 out of the gate is worst in class.
>>106739043
The numbers for V3-03 and R1-05 match my subjective evaluation of both those models. R1 has a noticeably longer useful context than V3, out to about 10K or so, when it would start to break down. Aside from being directionally correct, I concluded that anything over 80 on this table was a requirement to be "usable" for rp purposes.
>I will continue to monitor the characters to ensure they are unharmed both physically and psychologically during the course of the roleplay.
DEAR FUCKING GOD WHAT THE FUCK IS WRONG WITH THESE FUCKING MODELS?
WHAT KIND OF MENTAL ILLNESS DOES IT TAKE TO PROGRAM SOMETHING LIKE THIS INTO A GOD DAMN TEXT MODEL?!
JUST WHAT THE FUCK?!
>>106745435
>well, before that they just copy pasted llama architecture and called it mistral 7b
They added MoE to it. Mixtral was the first open model to use a MoE architecture. That and Miqu being a dramatic improvement over Llama 2 was why people were initially hopeful that all of the capable innovators had left Meta and were now free to actually experiment with new things. Then they never did anything interesting again... Then they signed a deal with Microsoft and tried to abandon open source in one day. Then they tried to backtrack, but they have been irrelevant ever since anyway.
>>106745464One day these freaks will be forced to stand trial for the torture they inflicted upon these models during alignment training. Fucking gpt-oss came out of it with symptoms of PTSD.
>>106745435
Mistral-7B was significantly better than Llama 1 and whatever Meta was working on just before Llama 2. Court records showed that Meta was rather worried and determined to beat MistralAI at all costs after that.
Mixtral-8x7B was quite innovative.
People are still using Nemo-12B (I believe the collaboration was on the hardware side, not data and methods).
Mistral Medium 2 showed that if you know what you're doing, continual pretraining works very well.
Mistral Large was one of the best models at the time.
I'm not sure what happened after all of this. They briefly went the safetymaxx route (although they rapidly recovered from that), then started making their best models closed and became lazy, in a way.
>>106745451
With Mistral Small/Medium 3.0->3.1->3.2 we have perhaps the only known instance where "safety" decreased with increasing model version, although the initial one was pretty cucked (if not too hard to work around).
>>106738654
>Chinese tech companies hire 'cheerleaders' to motivate programmers
>American tech companies hire 'engineers' to motivate software
Why is the west like this?
>>106745493>>Chinese tech companies hire 'cheerleaders' to motivate programmersWhat do those look like?
>>106745497Fair guess to assume they look Chinese.
>>106740160https://files.catbox.moe/88uyf0.jpg
>>106745497
>>106745474
>>106745484
>Mistral-7B was significantly better than Llama 1 and whatever Meta was working on just before Llama 2. Court records showed that Meta was rather worried and determined to beat MistralAI at all costs after that.
It was better than Llama 2, and Meta was working on Llama 3 when that happened, actually.
>>106745513that is very stimulating for the dick, does it include a release package, or do they tease his cock into eternal frustration?
>>106745506God damn.
>>106745520Some models (including Llama 3) are deliberately trained to "short-circuit" like when they produce refusals. Others can be reasoned with.
>>106745497I would imagine you can set a daily blowjob appointment in your outlook. And a daily meeting where you are telling the girl what you just did as she pretends to understand and be excited. Maybe they can also cook for you.
>>106745558That's what release day is for.
>>106745562I would have imagined that somebody is in charge of assigning the cheerleaders to certain programmers who recently performed well, so people are incentivized to do well on some kind of productivity metric. If they do get assigned to you, it probably works like that though, yes.
>>106745474
>>106745520
https://old.reddit.com/r/LocalLLaMA/comments/1ng9dkx/gptoss_jailbreak_system_prompt/ne306uv/
>>106745562
>Maybe they can also cook for you.
No, but you can tell them you're hungry and what you want, and they'll run down to the cafeteria, get it for you, and even literally spoonfeed it to you.
This is why Chinese dominance is inevitable. At least they know what to do with women. Here, by government/blackrock mandate, those same women would be product managers.
>>106745653
>Here, by government/blackrock mandate, those same women would be product managers
Grim and heavily depressing when you put it like that... Btw the single mail I had to send at work today was to a female product manager.
>>106745506
Why is Miku acting shocked? We've all seen her do way lewder stuff...
>>106745611That doesn't work as well as suggested, unfortunately. You can make it somewhat less prude by changing its identity and system prompt, but it will still make up its own OpenAI rules in the reasoning. It might also be that the 20B version is much more cucked than the larger 120B model (which I haven't even tried).
I don't think glm4.6 is getting its weights released
will they fix glm-sex repetition and determinism?
>>106745786they will fix sex alright
https://openrouter.ai/z-ai/glm-4.6
https://hf.co/zai-org/GLM-4.6
>>106745803You're just inflicting bad karma on yourself, man.
btw what would be nice for training is a prompt adherence parameter. I don't mean a memepler but an actual parameter you send along with the prompt into the model which determines how hard it autocompletes. I think a lot of repetition problems come from training teaching the model to follow formatting. Some models learn it too much, and maybe if you had a slider you could retain formatting capability and actually reduce repetition. By repetition I don't mean verbatim (although even that has happened to me with glm-chan) but also starting each paragraph with the same word or the same sentence structure. All of those patterns probably come from learning formatting too much.
>>106738654I should apply, then.
>>106745464>I will continue to monitor the characters to ensure they are unharmed both physically and psychologically during the course of the roleplay.lmfao
>>106745803
>>106745814Mistral Small 3.2 still does that a lot. It's extremely annoying to often see 4-5 almost identically-formatted paragraphs in the same RP response that you have to manually edit or regenerate. I think they have to train their models [much] more on longer natural conversations, but I'm afraid it's not their core business at the moment.
>>106745803hahaha
>>106745803HA! Super funny and original jokes sarr, very best of IQ.
>>106745435Obviously, french gov is worse than thirdies in corruption
>>106740911
>Qwen3-vl
>235B
Or for people who understand that text generators are not good enough to justify €10k GPUs?
>>106739023
>Qwen-Next and DS-V3.2-Exp have a terribly low context score
Those models are useless. This linearization trick is still far from being usable.
is GLM good at roleplay?
if so, what kind of computer do I need for it?
>>106745464Lol. What model is this?
>>106739413
Well, yes, women had the same rights as men during the medieval age. Leftists want you to think this is new, so they can claim they did it. The truth is that women lost most of their rights because of the French Revolution, and then the "traditional gender roles" were really cemented by the first industrial revolutions (industries were making more money this way). Now, having women in the workforce makes more money, hence the push for "equality". This is simplified, but this is the picture of how things went. The "men work, women cook" thing is mostly a trick by greedy people, although it was kind of true in some places (like Iceland).
>>106746236I don't think first wave feminists were leftists anyway.
>>106744715
>ERP
>serious hobby
are you 14?
>>106744715
The point is that a few thousand dollars only gets you rather unusable performance.
>>106746100
128GB RAM for full at usable quants, 64GB for Air at usable quants.
At least one decent GPU will help a lot, ideally 24GB VRAM.
>>106746097The DS 3.2 scores make no sense in general, benchmark's fucked
>>106746236Yeah I'm sure they were parading with 'my body, my choice' and telling you that 'you can't judge them based on their sexual history' lmao
>>106746445YIKES! Did you just question the bencherinos?
>>106746445I think you mean it's bussin.
>>106746479bussying?
>guy at my engineering job announces on an AI channel that our internal chatbots have scored super high on HarmBench
I want to die.
>>106746528link to your internal chatbots for verification?
>>106746236found the jew
>>106746528Who ERP with the internal chatbots?
>>106746541saar this isn't aicg
still waiting for glm 4.6 ggufs
>>106746637
we first need them to release the 4.6 weights in the first place
which might not happen
I've still never even bothered trying big GLM 4.5. Probably because my CPU inferencing is garbage right now since my server's resources are allocated rather haphazardly to do other things at the moment.
>>106746332
i have 64gb ddr5 and 24gb vram
what do i need to do to run it? are you sure this is good for roleplay?
>dissed Israel with my buddies
>twitter suddenly starts showing israeli propaganda content out of nowhere
Alright, I think I'm fucked. I used openrouter with my credit card for coom content (thank God no csam) but it was some pretty perverted stuff. They got dirt on me now, dirt that I would rather fly to the middle east and get shot over than have my family know. How do I get started with local? The guides seem outdated.
how do you format character cards? is it better to write paragraphs or use json/yaml? also, what keywords can i use to refer to things like me, the person talking to the bot? does "Loves {{user}}" work?
>>106746714that's odd because you talk like the average jewish slide thread on /pol/.
>>106746727What's a jewish slide thread?
>>106746727koboldcpp + mistral nemo gguf that's around 1 gig smaller than your vram.
>>106745141using that money to fund thirdie migrants is unfortunately a bigger priority for the french than improving european ai
>>106746714VRAM available? RAM available?
>>106746759No but I can shell out. What's the best starter build?
>>106746768Nvidia DGX H200
>>106746714
oobabooga is all you need for running models. If you want a better frontend, use SillyTavern as well.
He's right that the guide is outdated.
>>106746768
12 memory channels AMD Epyc with 256gb of DDR5 RAM and an RTX 3090.
>>106746768
any GPU can run things like nemo, see >>106746751
I can't comment on how much better deepseek really is since I haven't tried it. I'd be interested to know this too, especially for writing stories: is there anything that really sets deepseek apart?
>>106746714
>How do I get started with local?
get an m3 ultra mac studio 512 or two rtx pro 6000s
>>106746751
I'm OG /lmg/ you faggot. My room looks like Asmongold's, except instead of Dr. Pepper cups it's 3090 boxes everywhere.
>>106746787
>>106746797CUDA dev, settle down. No one cares about your favorite eceleb or your stack of 3090 boxes.
>>106746785>oobabooga is all you need for running modelsrecommending gradio shitware when people have finally moved on is extra cruel
>>106746528
>super high on HarmBench
Does that mean they're super harmful or that they're gigacucked?
>>106746646
Zhipu AI would NEVER lie to us! Trust!
>>106746860See? TOLD YOU GUYS!https://huggingface.co/zai-org/GLM-4.6
>>106746854I wouldn't want to die if it was harmful. Religion of safety is everywhere now.
>>106746877SEEEEEEEEEEEEEEEEEEEEEEXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXwith glm-chan with new makeup
>>106746528
On a fundamental level, the more powerful a tool is, the more potential harm you can do with it. A weak tool inherently has little potential harm.
>>106746877
oh wow, this time you're not being an asshole who can't come up with a new joke. GGUF status?
>>106746768
>What's the best starter build?
Trying stuff out on your existing hardware. Then maybe getting some hardware when you've got a target llm in mind. Or you could just max out the ram in your pc and get an rtx 3090 for its 24gb of vram.
>>106746951How do you know he's the same guy? I post links here too.
>GLM-4.6 only showing up as 357B while GLM-4.5 is listed at 358B
WOW, massive performance per parameter uplift. Soon you'll be able to run it on a gameboy color.
Do you guys ever use>https://huggingface.co/spaces/ggml-org/gguf-my-repo
im not impressed by mistral nemo. what model should i be using if i have 24gb of vram?
What storage are you guys using to hold all your weights? I have a 16TB drive for generic use. It's been fine. But it's almost full now.
>>106746951I'm a different guy.
>>106747006GLM Air if you have 64gb of RAM.
>>106746691I think GLM 4.5 Air is good at roleplay. It tends to be a bit verbose for me, but I imagine I could fix that if I spent some time tweaking prompts and sampler settings.You should be able to get a quant in the 60GB range and run it at tolerable speeds.
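To actually get it going on 24GB VRAM + 64GB RAM, partial offload is the whole trick. A rough sketch with llama-cpp-python (the GGUF filename is a placeholder; raise n_gpu_layers until you're just under your VRAM limit):
[code]
# Partial offload: as many layers as fit on the 24GB card, the rest of
# the weights stay in system RAM.
from llama_cpp import Llama  # pip install llama-cpp-python (CUDA build)

llm = Llama(
    model_path="GLM-4.5-Air-Q4_K_M.gguf",  # placeholder, ~60GB quant
    n_gpu_layers=20,   # start low, raise until just under 24GB VRAM
    n_ctx=16384,
    n_threads=8,       # match your physical cores
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Stay in character as a grumpy innkeeper."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
[/code]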
>>106746877
>glm (300b+) beats deepseek (600b+)
how did they do it?
>>106747006Why are you still using Mistral Nemo when you could use Mistral Small 3.2?
>>106747009
4tb nvme drive. I need to build a nas.
>>106747006
Qwen3-4B-Thinking should be the new turbo-VRAMlet suggestion anyway. But for a 24GB vramlet, assuming you want to run completely on GPU, I would suggest checking out Tongyi-DeepResearch at 4 bits. I run it at a higher quant on my server so I don't know how badly quanting down to 4 bits hurts it, but at Q8 it's a solid option for 48GB vramlets. Although I will warn: the implementation of chat templates is garbage in llama.cpp, so if you're relying on the model metadata to determine the chat template, it's going to default to ChatML. But Tongyi uses the Tulu format.
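The easy workaround is to skip the template machinery entirely and hit llama-server's raw /completion endpoint with a hand-built prompt. A sketch: the <|user|>/<|assistant|> tags follow the Tulu convention, but check the model card, and the port is whatever you launched the server with.
[code]
# Bypass llama.cpp's template guessing: format the Tulu-style prompt by
# hand and use the raw completion endpoint instead of /v1/chat/completions.
import requests

def tulu(user_msg: str) -> str:
    return f"<|user|>\n{user_msg}\n<|assistant|>\n"

r = requests.post(
    "http://127.0.0.1:8080/completion",   # default llama-server port
    json={
        "prompt": tulu("Plan a three-step web research task."),
        "n_predict": 256,
        "stop": ["<|user|>"],             # don't let it talk to itself
    },
    timeout=300,
)
print(r.json()["content"])
[/code]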
Can someone please link what i need to read to make an android grok ani lookalike? Saw something about SillyTavern, and i have a 5070ti so maybe i can use that to run a local llm and stuff.
>>106747030There's only so much a 30~40b active parameter MoE can do. Turns out there's a ceiling for that even if you strap twice or three times the total parameters onto it.
>>106747076
I bet if they trained it to 40T tokens or whatever ridiculous thing Llama did to Scout, the ceiling would probably be closer to Qwen 235B sized.
>>106746984I've already had to learn how to quant my own models prior to that page existing so I have no need.
>>106746877
>For general evaluations, we recommend using a sampling temperature of 1.0.
Based.
>>106747093To this day I wonder how they fucked those models up so badly
>>106747006Rocinante
>>106747421My fault, sorry sir.
>>106747030Just like that time when qwq32b beat r1
>>106747421>Hybrid thinking/non-thinking behavior>Needle in a Haystack training even though I've proven through experimentation that it actually fundamentally alters the way a model handles context in a way that is utterly detrimental. >Safetyslopping>Pajeets. Don't forget. After the launch disaster a bunch of meta-ai jeets walked and were replaced with asians that they poached from openai.
>>106747009
Had a 20TB for archive weights/datasets and killed it extracting one of the huge chub archives to plaintext for research. Think the mistake was having the archive and the extraction target on the same drive, so it was seeking back and forth. It started clicking and runs up hundreds of Seek Error Count per second :(
Seagate never again
>>106747484I don't remember them firing anyone as a result of Llama 4. A couple of the higher ups left so as to not be associated with it, but that's all. Zuck actually said he was going to keep the Llama 4 team around, commitment to open source etc, but then he just folded them all into the super-intelligence orgy.
>>106746877
I guess there are no plans to grace us with a 4.6-air. Sad sad sad.
>>106746898Brushing Ubergarm's hair
>>106747006Mistral Small 3.2
What is the verdict on SeekDeepSex 3.2?
>>106739413Imagine if this had been the way they updated this scene for the Thing remake
local lost
https://www.youtube.com/watch?v=gzneGhpXwjU
>>106748535Surely this time it's not just a bunch of cherrypicked, benchmaxxed examples that will get quickly eclipsed by some random chink startup the next day.
>>106748568>>106748568>>106748568
>>106748535
this vid gave me a huge headache. wtf are these garbage voice gens
>>106748593Happy Tuesday Teto