/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107003557 & >>106996568

►News
>(10/21) Qwen3-VL 2B and 32B released: https://hf.co/Qwen/Qwen3-VL-32B-Instruct
>(10/20) DeepSeek-OCR 3B with optical context compression released: https://hf.co/deepseek-ai/DeepSeek-OCR
>(10/20) merged model : add BailingMoeV2 support #16063: https://github.com/ggml-org/llama.cpp/pull/16063
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/17) REAP: Router-weighted expert pruning: https://github.com/CerebrasResearch/reap

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107003557

--GPU offloading limitations in language models and theoretical speed optimization paths:
>107011692 >107011737 >107011770 >107011775 >107011786 >107011841 >107011877 >107011913 >107011924 >107011944 >107011994 >107012062 >107012083 >107012092 >107012150 >107012171 >107012155 >107012237 >107012388 >107012454 >107012843
--Experimenting with text-image diffusion models and emergent language patterns:
>107005611 >107005628 >107005642 >107005649 >107005792 >107005907 >107006770
--Context length mismatch issues in Gemma model finetuning:
>107006665 >107006693 >107006748
--Managing AI response length through token limits and explicit instructions:
>107008811 >107008834 >107008850 >107008922 >107011252 >107008901
--Debate over model evaluation and the role of training artifacts in repetition loops:
>107004058 >107004209 >107004608 >107004891
--Debates on effective samplers and model-specific optimization:
>107003916 >107003985 >107004012 >107004185 >107004900 >107007224 >107004032 >107004267 >107004329 >107004198
--Llama.cpp's position in the inference engine landscape:
>107004840 >107004888 >107004919 >107004946 >107004909 >107004941 >107004970
--FP4 Hadamard and ternary computing potential for Bitnet revival:
>107007440 >107007495 >107007512 >107007535 >107007597 >107007555
--Evaluating Deepseek's J-E translation and pondering native Japanese model needs:
>107004515
--Hardware selection for roleplaying within budget constraints:
>107010921 >107010955 >107011057 >107011085 >107010956 >107010962 >107010967
--DDR5 memory options for high-capacity model workloads:
>107012368 >107012395 >107012411 >107012409
--Miku (free space):
>107006597 >107006889 >107006973 >107007060 >107007071 >107007909 >107010460 >107011127 >107011270 >107011371 >107011443 >107012747 >107012816 >107013228

►Recent Highlight Posts from the Previous Thread: >>107003560

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
exotic miku sex
>>107013301
>>107013303
>>107013323
>>107013341
counterpoint:
>>107013301
GLM 4.5 Air at Q4_K_M from Unsloth keeps fucking up formatting for me, forgetting asterisks and in rarer cases confusing things said by the user with things said by the character. Is GLM 4.5 Air a meme I fell for or something? Or can it be a faulty quant?
>>107013367
>Unsloth
Try bartowski's.
>>107013367
Convert it yourself and find out.
>>107013367
Using the same quant (also tried q5 and q3 and GLM STEAM by the copetuner). They all fuck up with the MD formatting (I've been using text completion exclusively). SOMETIMES they will confuse stuff too, but it's not too frequent and I usually just edit it to fix it. I'm curious if GLM 4.6 AIR will work better. 2 more weeks
What do you think the best pre-safety, pre-alignment open model was? I think it's https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B (March 2023)
Thinking back to 2023, most of the pre-llama models you could run locally were retarded. I had a pair of P100 16GB GPUs but could only make use of one, since there wasn't layer splitting across GPUs, and the largest I could run was 6B, since there were no quants either.
>>107013445
The original llama leak was probably the last pre-safety and pre-alignment model that we got or will ever get. Now, due to contamination, I don't think one could make another even if they wanted to.
>>107013361
looks gay as fuck
>>107013480
counterpoint: you're gay
>>107013361
>5080
What a waste
Why do all Google models have such bad self esteem issues? Gemma is constantly apologizing for wasting my time and Gemini had that meltie where it deleted all the user's files.
>>107013537
Pradeep is sorry Big Sir, we will redeem the needful
>>107013537
>Gemini had that meltie where it deleted all the user's files.
I think they all have that tendency.
>Codex keeps deleting unrelated and uncommitted files! even ignoring rejected requests. #4969
https://github.com/openai/codex/issues/5594
>Critical Bug: Agent executed destructive rm -rf command without safeguards #3934
https://github.com/openai/codex/issues/3934
>gpt-5-codex ran rm -rf .git out of nowhere #3728
https://github.com/openai/codex/issues/3728
Never had it happen with any model, but who knows what sorts of fucked up prompts and abysmal codebases they get subjected to where they feel nuking everything is the best option.
>>107013488
>counterpoint: you're gay
>>107013537
Isn't that 90% prompting? I've never seen a model so biased that way that it wasn't just following its prompts.
Fuck. I just realized tuning on reduced context encourages hallucinations. In the original conversations the model presumably looked back at the earlier data, but at training time it's encouraged to output that same data even though it appears nowhere in its context window. So by truncating conversations at train time we are teaching the model to make shit up, basically. Damn. That sucks. Means training a LoRA on any consumer-level machine (besides something tiny like a 1B or 3B) is a non-starter.
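The failure mode is easy to see in a toy sketch, assuming the usual "keep the most recent tokens" cut (function and numbers here are made up for illustration):

```python
def truncate_for_training(tokens, max_len):
    # Keep only the most recent max_len tokens. Any earlier turn the
    # assistant reply refers back to is silently dropped, so the loss
    # now rewards reproducing text the model cannot see.
    return tokens[-max_len:]

conversation = list(range(100))  # stand-in for a tokenized conversation
window = truncate_for_training(conversation, 32)
print(len(conversation) - len(window))  # 68 tokens of referenced context are gone
```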
Hello sirs. I redeem the prediction that at midnight tonight, Mumbai time, in celebration of the autumn poo festival Gemma 4 shall be released
>>107013577
Interesting. I thought Codex was safe from that bullshit. But I guess sometimes the sampler will just choose rm out of all the possible commands. I wonder if top-k could have prevented that. Or an additional command review and approval stage so the model has a chance to recognize its own mistake.
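For reference, a minimal sketch of the top-k filter being suggested, over toy logits in pure Python (real engines apply this across the full vocabulary before softmax):

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    # Keep only the k highest logits, renormalize, and sample from those.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = [math.exp(logits[i]) for i in top]
    total = sum(weights)
    probs = [w / total for w in weights]
    r = rng.random()
    acc = 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r < acc:
            return idx
    return top[-1]

# A low-probability "rm" token outside the top-k can never be picked.
logits = [5.0, 4.5, 4.0, -2.0]  # index 3 = the unlikely destructive token
samples = {top_k_sample(logits, k=2) for _ in range(1000)}
print(3 in samples)  # False: token 3 is cut before sampling
```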
Is there any weight to the idea of providing a model with a set of random words to increase the creativity / randomness of its output?
>>107014306
I don't see why not. Or you could modify the sampler to prefer tokens according to some secondary semantic model.
>>107014306
TRY IT YOURSELF FAG
>>107014328
Basically finetune a small model to rewrite the big LLM's response. I think that's been tried a lot without any good results yet.
>>107013934
Good morning sir! Many blessings of Lord Ganesha for this info!
>>107014306
Yeah, people have done that quite a bit with ST using the {{random}} macro in a depth 0 prompt. You could also make a large set of instructions that you swap between to e.g. vary post structures. It can work, though as with anything LLM there are annoying caveats. You have to be careful with how you prompt to keep the model from being like "There, I started the post with faggot like you requested anon!" or other unwanted commentary on the 'task'.
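If you want that kind of macro expansion outside ST, a rough stand-in looks like this (function name and word list are made up for illustration; ST's actual macro takes its options inline):

```python
import random

def expand_random(prompt, options, rng=random):
    # Substitute one of the listed options into the prompt each generation,
    # loosely mimicking a {{random}}-style macro.
    return prompt.replace("{{random}}", rng.choice(options))

seeds = ["moth-eaten", "glacial", "saccharine", "feral"]
print(expand_random("[Style hint: work the word '{{random}}' into the reply.]", seeds))
```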
>>107014153
I don't think it's a sampling issue or that it picks rm at random. Usually it talks itself into thinking that's the correct choice. They could hard-ban rm -rf, but once a model has made up its mind that something needs to be deleted, telling it that rm is not allowed will just cause it to helpfully write a python script to handle the deletion.
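A toy version of the review-and-approval gate idea, with the caveat above baked in: it's a speed bump, not a fence, since a determined model can route around it (pattern list is illustrative, not exhaustive):

```python
import re

# Commands matching these patterns get flagged for human approval
# instead of being executed directly.
DESTRUCTIVE = re.compile(r"\brm\b|\bmkfs\b|\bdd\b")

def review_command(cmd):
    # A model can still dodge this (e.g. by writing a deletion script),
    # so treat it as a last-chance prompt for the human, not a safeguard.
    if DESTRUCTIVE.search(cmd):
        return "needs-approval"
    return "auto-run"

print(review_command("rm -rf .git"))  # needs-approval
print(review_command("git status"))   # auto-run
```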
>>107013367
I think it's a model issue. Q8 sometimes adds a double quote in the middle of a word
>>107014937
That's even weirder considering that, when questioned after the fact, they acknowledge it was the wrong thing to do.
I just thought this was funny. The "you are absolutely right" retardation is proving somewhat tricky to remove.
>>107015249
That's a context issue: it forgot some parts of what should be done, which led to that consequence.
>>107013367
ALL glm models are memes, from the first one they released to the last. For some reason the shilling became really hardcore after they released their first MoE; before that, people here mostly didn't pay attention to their trash fire.
>>107013577
I don't get how these things happen often enough to be a topic. I know LLMs will do utterly braindead shit from time to time, but they are tuned to avoid deletion so hard that one commented out code in one of my scripts that used a subcommand of my program I had called rm (which is meant for entry deletion in state management, not for deleting the file you point it to) because it thought calling it was a bug.