/g/ - Technology

File: GPU (Giant Purring Unit).jpg (173 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107003557 & >>106996568

►News
>(10/21) Qwen3-VL 2B and 32B released: https://hf.co/Qwen/Qwen3-VL-32B-Instruct
>(10/20) DeepSeek-OCR 3B with optical context compression released: https://hf.co/deepseek-ai/DeepSeek-OCR
>(10/20) merged model : add BailingMoeV2 support #16063: https://github.com/ggml-org/llama.cpp/pull/16063
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/17) REAP: Router-weighted expert pruning: https://github.com/CerebrasResearch/reap

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: file.png (1.05 MB, 751x751)
►Recent Highlights from the Previous Thread: >>107003557

--GPU offloading limitations in language models and theoretical speed optimization paths:
>107011692 >107011737 >107011770 >107011775 >107011786 >107011841 >107011877 >107011913 >107011924 >107011944 >107011994 >107012062 >107012083 >107012092 >107012150 >107012171 >107012155 >107012237 >107012388 >107012454 >107012843
--Experimenting with text-image diffusion models and emergent language patterns:
>107005611 >107005628 >107005642 >107005649 >107005792 >107005907 >107006770
--Context length mismatch issues in Gemma model finetuning:
>107006665 >107006693 >107006748
--Managing AI response length through token limits and explicit instructions:
>107008811 >107008834 >107008850 >107008922 >107011252 >107008901
--Debate over model evaluation and the role of training artifacts in repetition loops:
>107004058 >107004209 >107004608 >107004891
--Debates on effective samplers and model-specific optimization:
>107003916 >107003985 >107004012 >107004185 >107004900 >107007224 >107004032 >107004267 >107004329 >107004198
--Llama.cpp's position in the inference engine landscape:
>107004840 >107004888 >107004919 >107004946 >107004909 >107004941 >107004970
--FP4 Hadamard and ternary computing potential for Bitnet revival:
>107007440 >107007495 >107007512 >107007535 >107007597 >107007555
--Evaluating Deepseek's J-E translation and pondering native Japanese model needs:
>107004515
--Hardware selection for roleplaying within budget constraints:
>107010921 >107010955 >107011057 >107011085 >107010956 >107010962 >107010967
--DDR5 memory options for high-capacity model workloads:
>107012368 >107012395 >107012411 >107012409
--Miku (free space):
>107006597 >107006889 >107006973 >107007060 >107007071 >107007909 >107010460 >107011127 >107011270 >107011371 >107011443 >107012747 >107012816 >107013228

►Recent Highlight Posts from the Previous Thread: >>107003560

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
exotic miku sex
>>
File: d.jpg (134 KB, 2221x384)
>>107013301
>>107013303
>>107013323
>>
File: file.png (863 KB, 1066x605)
>>107013341
counterpoint:
>>
>>107013301
GLM 4.5 Air at Q4_K_M from Unsloth keeps fucking up formatting for me, forgetting asterisks and in rarer cases confusing things said by the user with things said by the character.
Is GLM 4.5 Air a meme I fell for or something? Or could it be a faulty quant?
>>
>>107013367
>Unsloth
Try bartowski's.
>>
>>107013367
Convert it yourself and find out.
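If you want to go that route, here's a rough sketch of doing the conversion yourself with llama.cpp (paths, model dir and quant type are placeholders; assumes a local llama.cpp checkout with the quantize binary already built):
[code]
# Sketch: HF weights -> f16 GGUF -> Q4_K_M, using llama.cpp's own tooling.
import subprocess

LLAMA_CPP = "/path/to/llama.cpp"            # local checkout, already built
MODEL_DIR = "/path/to/GLM-4.5-Air"          # original HF weights
F16_GGUF  = "glm-4.5-air-f16.gguf"
OUT_GGUF  = "glm-4.5-air-Q4_K_M.gguf"

# 1) convert the HF checkpoint to an f16 GGUF
subprocess.run(
    ["python", f"{LLAMA_CPP}/convert_hf_to_gguf.py", MODEL_DIR,
     "--outfile", F16_GGUF, "--outtype", "f16"],
    check=True,
)

# 2) quantize the f16 GGUF down to Q4_K_M
subprocess.run(
    [f"{LLAMA_CPP}/build/bin/llama-quantize", F16_GGUF, OUT_GGUF, "Q4_K_M"],
    check=True,
)
[/code]
If your own quant still screws up the formatting, at least you can rule out a broken upload.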
>>
>>107013367
Using the same quant (also tried Q5 and Q3 and GLM STEAM by the copetuner). They all fuck up the MD formatting (I've been using text completion exclusively). SOMETIMES they will confuse stuff too, but it's not too frequent and I usually just edit it to fix it.
I'm curious if GLM 4.6 Air will work better. 2 more weeks
>>
What do you think the best pre-safety, pre-alignment open model was? I think it was https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B (March 2023)

Thinking back to 2023, most of the pre-llama models you could run locally were retarded. I had a pair of P100 16GB GPUs but could only make use of one, since there was no layer splitting across GPUs yet, and the largest model I could run was 6B, since there were no quants either.
>>
>>107013445
The original llama leak was probably the last pre-safety, pre-alignment model that we got or will ever get. Now, due to contamination, I don't think anyone could make another even if they wanted to.
>>
>>107013361
looks gay as fuck
>>
>>107013480
counterpoint: you're gay
>>
>>107013361
>5080
What a waste
>>
Why do all Google models have such bad self-esteem issues? Gemma is constantly apologizing for wasting my time, and Gemini had that meltie where it deleted all the user's files.
>>
>>107013537
Pradeep is sorry Big Sir, we will redeem the needful
>>
>>107013537
>Gemini had that meltie where it deleted all the user's files.
I think they all have that tendency.

>Codex keeps deleting unrelated and uncommitted files! even ignoring rejected requests. #4969
https://github.com/openai/codex/issues/5594
>Critical Bug: Agent executed destructive rm -rf command without safeguards #3934
https://github.com/openai/codex/issues/3934
>gpt-5-codex ran rm -rf .git out of nowhere #3728
https://github.com/openai/codex/issues/3728

Never had it happen with any model, but who knows what sorts of fucked up prompts and abysmal codebases they get subjected to where they feel nuking everything is the best option.
>>
File: file.jpg (147 KB, 1254x837)
>>107013488
>counterpoint: you're gay
>>
>>107013537
Isn't that 90% prompting? I've never seen a model so biased in that direction that it wouldn't just follow the prompts.
>>
Fuck. I just realized tuning on reduced context encourages hallucinations. In the original conversations the model presumably looked back at the earlier data, but at training time, with the conversation truncated, it's being rewarded for outputting that same data even though it appears nowhere in its context window. So by truncating conversations at train time we are teaching the model to make shit up, basically. Damn. That sucks.
Means training a LoRA on any consumer-level machine (besides something tiny like a 1B or 3B) is a non-starter.
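One obvious way to dodge that on a small rig is to drop the samples that don't fit instead of truncating them, so the target never refers to context the model can't see. A toy sketch (pure Python, names made up):
[code]
# Toy sketch: keep only conversations that fit the training window whole,
# instead of truncating them (which leaves targets referencing dropped text).
# Assumes conversations are already tokenized as lists of token ids.
def keep_only_full_fits(tokenized_conversations, max_len):
    kept, dropped = [], 0
    for toks in tokenized_conversations:
        if len(toks) <= max_len:
            kept.append(toks)
        else:
            dropped += 1
    print(f"kept {len(kept)} samples, dropped {dropped} overlong ones")
    return kept
[/code]
You lose data, but whatever survives is at least fully grounded in what's actually in the window.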
>>
Hello sirs.
I redeem the prediction that at midnight tonight, Mumbai time, in celebration of the autumn poo festival Gemma 4 shall be released
>>
>>107013577
Interesting. I thought codex was safe from that bullshit. But I guess sometimes the sampler will just pick rm out of all the possible commands. I wonder if top-k could have prevented that. Or an additional command review and approval stage, so the model has a chance to recognize its own mistake.
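The review-and-approve stage doesn't even need the model's cooperation; a dumb gate in front of whatever executes the tool call would do. A sketch of the idea, not how codex actually works:
[code]
# Sketch: flag obviously destructive shell commands and ask a human before
# running them. Purely illustrative, not codex's actual safeguard logic.
import re
import subprocess

DESTRUCTIVE = re.compile(r"\brm\s+-\w*[rf]|\bgit\s+clean\b|\bmkfs\b")

def run_agent_command(cmd: str) -> str:
    if DESTRUCTIVE.search(cmd):
        answer = input(f"Model wants to run: {cmd!r}\nAllow? [y/N] ")
        if answer.strip().lower() != "y":
            return "command rejected by user"
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr
[/code]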
>>
Is there any weight to the idea of providing a model with a set of random words to increase the creativity / randomness of its output?
>>
>>107014306
I don't see why not. Or you could modify the sampler to prefer tokens according to some secondary semantic model.
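The crude version of that, skipping the secondary model and just flat-biasing a hand-picked word list through a transformers LogitsProcessor (model name and bias value are just placeholders):
[code]
# Sketch: nudge sampling toward a set of "theme" tokens by adding a flat bias
# to their logits. A real version would score tokens with a secondary semantic
# model instead of a fixed word list.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class WordBias(LogitsProcessor):
    def __init__(self, token_ids, bias=2.0):
        self.token_ids = torch.tensor(token_ids)
        self.bias = bias

    def __call__(self, input_ids, scores):
        scores[:, self.token_ids] += self.bias
        return scores

name = "Qwen/Qwen2.5-0.5B-Instruct"  # any small local model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

theme = ["rust", "neon", "fog"]
ids = [i for w in theme for i in tok(" " + w, add_special_tokens=False).input_ids]

out = model.generate(
    **tok("The city at night was", return_tensors="pt"),
    max_new_tokens=60,
    do_sample=True,
    logits_processor=LogitsProcessorList([WordBias(ids)]),
)
print(tok.decode(out[0], skip_special_tokens=True))
[/code]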
>>
>>107014306
TRY IT YOURSELF FAG
>>
>>107014328
Basically finetune a small model to rewrite the big LLM's response. I think it's been tried a lot without any good results yet.
>>
>>107013934
Good morning sir! Many blessings of Lord Ganesha for this info!
>>
>>107014306
yeah, people have done that quite a bit with ST using the {{random}} macro in a depth-0 prompt. you could also make a large set of instructions that you swap between to e.g. vary post structures.
it can work, though as with anything LLM there are annoying caveats. you have to be careful with how you prompt to keep the model from being like "There, I started the post with faggot like you requested anon!" or other unwanted commentary on the 'task'.
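Outside ST the injection itself is only a few lines; here's a sketch against a local llama.cpp server's OpenAI-compatible endpoint (URL, word list and wording are placeholders):
[code]
# Sketch: pick a few random "inspiration" words and inject them right before
# the latest user turn (the depth-0 idea). Assumes a llama.cpp server, or any
# OpenAI-compatible backend, listening on localhost:8080.
import random
import requests

WORDS = ["lantern", "static", "brine", "velvet", "ozone", "gravel", "ember"]

def chat(user_msg, history=None):
    picks = random.sample(WORDS, 3)
    steer = {"role": "system",
             "content": "Loosely work the following words into your reply, "
                        "without commenting on this instruction: " + ", ".join(picks)}
    messages = (history or []) + [steer, {"role": "user", "content": user_msg}]
    r = requests.post("http://localhost:8080/v1/chat/completions",
                      json={"messages": messages, "temperature": 0.9})
    return r.json()["choices"][0]["message"]["content"]

print(chat("Write the next reply in the scene."))
[/code]
The "without commenting on this instruction" part is doing the same job as the careful prompting above.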
>>
>>107014153
I don't think it's a sampling issue or that it picks rm at random. Usually it talks itself into thinking that it is the correct choice. They could hard-ban rm -rf, but once a model has made up its mind that something needs to be deleted, telling it that rm is not allowed will just cause it to helpfully write a python script to handle the deletion.
>>
>>107013367
I think it's a model issue. Q8 sometimes adds a double quote in the middle of a word
>>
>>107014937
That's even weirder considering that, when questioned after the fact, they acknowledge it was the wrong thing to do.
>>
File: absolutely right.png (306 KB, 2165x936)
I just thought this was funny. The "you are absolutely right" retardation is proving somewhat tricky to remove.
>>
>>107015249
That's a context issue: it forgot some parts of what should be done, which led to that consequence.
>>
>>107013367
ALL GLM models are memes, from the first one they released to the last. Though for some reason the shilling became really hardcore after they released their first MoE; before that, people here mostly didn't pay attention to their trash fire.
>>
>>107013577
I don't get how these things happen often enough to be a topic. I know LLMs will do utterly braindead shit from time to time, but they are tuned to avoid deletion so hard that one commented out code in one of my scripts that used a subcommand of my program that I had called rm (which is meant for entry deletion in state management, not for deleting the file you point it to), because it thought calling it was a bug.


