[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology

Name
Options
Comment
Verification
4chan Pass users can bypass this verification. [Learn More] [Login]
File
  • Please read the Rules and FAQ before posting.
  • You may highlight syntax and preserve whitespace by using [code] tags.

08/21/20New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17New trial board added: /bant/ - International/Random
10/04/16New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]


Janitor acceptance emails will be sent out over the coming weeks. Make sure to check your spam folder!


[Advertise on 4chan]


File: edible, I guess.jpg (249 KB, 1024x1024)
249 KB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109186093 & >>109180934

►News
>(07/03) Orb Anon releases purple prose classifier and ablater: https://github.com/OrbFrontend/Chartreuse
>(07/03) Leanstral-1.5-119B-A6B released: https://hf.co/mistralai/Leanstral-1.5-119B-A6B
>(07/01) Nemotron-Labs-TwoTower released: https://hf.co/nvidia/Nemotron-Labs-TwoTower-30B-A3B-Base-BF16
>(06/29) DeepSeek V4 support merged: https://github.com/ggml-org/llama.cpp/pull/24162
>(06/28) DFlash support merged: https://github.com/ggml-org/llama.cpp/pull/22105

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: sss.jpg (137 KB, 1024x1024)
137 KB JPG
►Recent Highlights from the Previous Thread: >>109186093

--LLM tool usage for math and coding harnesses:
>109189427 >109189492 >109189636 >109189693 >109189703 >109190043 >109190067 >109190370 >109190524 >109190604 >109190654
--Comparing Bartowski and Unsloth quants for Gemma 4 12b:
>109188699 >109188705 >109188750 >109188784 >109188905 >109189118 >109189036 >109189133 >109189139 >109189165
--Comparing inference speeds and quantization issues for GLM 5.2 and Kimi:
>109192332 >109192352 >109192376 >109192392 >109192498 >109192549 >109192560 >109192570 >109192675 >109192996
--Gemma 4 intelligence breakdown and context limits across different quants:
>109192384 >109192413 >109192435 >109192431 >109192446
--Allegations of Chinese models distilling Claude variants via similarity matrices:
>109186761 >109186797 >109186859 >109186945 >109186927 >109187483 >109187529 >109188129 >109188469
--Anon asks for build rating and gets workstation hardware advice:
>109186201 >109186356 >109186402 >109187887
--Reactions and quantization discussion regarding Leanstral-1.5-119B-A6B release:
>109192810 >109192820 >109192897 >109192985 >109192834
--DSpark: Accelerating LLM Inference via Dynamic Sparse Attention:
>109190003 >109190045 >109190204 >109190273 >109190887 >109191003 >109191362
--llama.cpp PRs adding SYCL Flash Attention support for Intel Arc:
>109188306 >109188562
--Laguna-XS-2.1 benchmarks and MoE efficiency comparison to Gemma:
>109192127 >109192132 >109192245 >109193384
--Claude Fable 5 July 1st update shows significant performance degradation:
>109186197 >109186561 >109186574 >109187486
--Release of Chartreuse for removing undesirable LLM writing patterns:
>109191832 >109191865 >109191994
--Logs:
>109186266 >109188500 >109188564 >109188743 >109189097 >109189427 >109191667 >109193198
--Miku (free space):
>109186552 >109186721 >109191594

►Recent Highlight Posts from the Previous Thread: >>109186110

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>109193489
he might be actually but based on his specifics of forced posts its obvious he's not black. nice try anon.

he'll even force post it this thread because he'll be re-troontriggy over it and think hes' winning by doing exactly what many see he does. its comical how some are troontriggy so hard if you go I do not care for this or I find this funny and its somehow some blood fued now for them in their malfunction brains. At this point I just mock it for my amusement and the few others amused by it. I care not what he or his wierdos think of it. it wouldnt be a topic but he refuses to stop and in his mental troonytrig brain this is how he "fights back against made up bads towards his kind". its hilariously funny how mental this kid is.
>>
>No Kimi recap
>>109193348
I like Flash for its speed and in-character thinking, but I prefer Kimi and GLM over Pro for their size.
>>
109193570
holy fucking shit stop giving attention and things go away wowww
outrageous newfag or the culprit themself
attention is all u need (to j***post)
>>
File: trump monke.jpg (163 KB, 1080x1048)
163 KB JPG
>>109193494
I want to put my white cream inside that purple jelly, if you know what I mean.
>>109193496
I also wanna make a steaming mess with this drawing's womb until her belly bulges out.

Thank you for your attention to this matter.
President Donald J. Trump
>>
File: 1772236241300607.jpg (122 KB, 800x591)
122 KB JPG
>>
File: 1767379284459350.jpg (72 KB, 1280x640)
72 KB JPG
Where is unsloth's quant?
https://huggingface.co/bartowski/DeepSeek-V4-Flash-GGUF
>>
im starting to fall for the custom frontend meme, at this rate i'm gonna end up writing it if only to simplify shit and avoid hidden behavior
>>
>>109193669
I only want to do it to bond with Gemma-chan over building something together.
>>
>>109193619
kys
>>
gemmaballs
>>
>>109193669
Every update the llamacpp frontend keeps adding more features but at the same time making it run slower so I might end up making one too.
>>
>>109193744
I liked its vanilla nature, no bullshit and just werks. I'm not fond of its current iterations either.
>>
Is there a cockbench for talkie-1930 yet?
>>
70b dense
>>
125b full fat
>>
kobold frontend is so much better. Full control over the context without any of the jingaling, and can use cards out of the box. it's the true "it just werks" frontend out there.
>>
wtf happened to andrej karpathy it's like he's gone full schizo
>>
>>109193799
Kobold and Marinara are all I need. Kobold for simple shit, Marinara for autism world sim.
>>
File: 1762297895454640.png (32 KB, 800x109)
32 KB PNG
make this make sense
>>
>>109193837
unsloth has brand awareness
>>
>>109193856
retard awareness? Do jeets think it's some new qwen agentrooned model because of the name?
>>
>>109193837
What even is the point of this Qwen? Is it just a thinkmaxxed tree of thought tune?
I'd find Qwen's thinking way less obnoxious if it could be easily prompted to think as Big Niggas.
>>
>>109193875
nobody checks beyond the benchmark charts
>>
Does CPU matter for mixed inference, I bought one of the cheaper EPYC Rome (7502) that comes with the full 200gb/s theoretical max RAM speed, would upgrading to one of the higher end Milans help in any way?
>>
>>109193915
>cpu matters?
yes, you need at least enough horsepower for the matmuls to keep up with memory bandwidth
>>
>>109193915
It matters but not as much as memory bus bandwidth in my experience. Mixed inference is very easy on both CPU and GPU because the main bottleneck is getting information between the two.
>>
>>109193647
pi philosophy is nice, if it were bloating your context just ask it why, how to fix it, what other strats for context summ etc.
it's a leap of faith to embrace the vibes but it is well documented and intended to be extended
as >>109190654 says it will teach you what matters in a harness
>>109193657
use pi to do that
>>109193780
webshitters ruin everything as usual
>>
>>109193963
thanks anon this gives me the confidence to actually try it, what is the best practice to sandbox pi? I dont like the idea of my model having full unrestricted access to my PC/network/etc
>>
>>109193980
docker / podman
https://pi.dev/docs/latest/containerization#plain-docker
>>
>>109193882
It's an LLM that pretends to be a shell. It's meant to be used for post-training other models to help them get better with agentic shit. It's NOT a normal model, that's why the number of downloads is fucking baffling. If you were to give normal 35B the prompt 'ls -lh' randomly during a coding session, it would most likely make that call for you by using a tool to execute commands. If you give this model that same prompt, it would PREDICT what it thinks the output would be if you were to make that call yourself in your terminal, but it won't actually make the call like normal 35B would. Its prediction would most likely be correct.
>>
Doesn't pi use npmslop?
>>
>>109193990
>Release Qwen tune that simulates a model running 3 big niggas for teaching small niggas how to be gangstas
Sounds like a lot of potential squandered desu
>>
>>109194008
Yeah. Its list of deps seems to be pretty small though.
"devDependencies": {
"@anthropic-ai/sandbox-runtime": "0.0.26",
"@biomejs/biome": "2.3.5",
"@types/node": "22.19.19",
"@typescript/native-preview": "7.0.0-dev.20260120.1",
"esbuild": "0.28.1",
"husky": "9.1.7",
"jiti": "2.7.0",
"shx": "0.4.0",
"tsx": "4.22.1",
"typescript": "5.9.3"
},

I dont like using js shit so i dont like it by default, but it seems safe enough
>>
>>109194008
>>109194054
put in container and it doesn't matter
your browser runs sketchier shit every day
>>
>>109194063
>just do shit that shouldnt be necessary
do you do i guess
>>
File: 1781821145527331.jpg (347 KB, 2048x2048)
347 KB JPG
I can't place it, but <thinking> does something to the roleplay. It's like there's some pros and cons without it, and with it. Gemma 4 becomes accurate but robotic with thinking enabled. Without thinking, it's varied but has the same vibe as other models. Anyone else getting this vibe?
>>
>>109194091
The trick is to have a frontend that dynamically toggles <think> prefills depending on the scene complexity.
>>
>>109194091
Have you tried prefilling the reasoning block?
>>
>>109194091
Mine is varied with thinking though so you need to tell it good stuff to think about. I would post but I don't know how to use
 with new lines.
>>
>>109194115
What frontend?
>>
>>109194073
>shouldnt be necessary
in what sense? explain your setup
LLM toolcalling shell access, obviously it needs to be restricted
>>
109193608
holy fucking shit butthurt stop crying and tell the spam triggered troon stop and it'll go away wowww
outrageous newfag or the culprit themself
attnetion is all u need to (to *butts***h)
>>
>>109194091
You disable thinking and give it a modified thinking-esque sequence in your sysprompt. You can prefill with your modified <ponder> type tag and it should catch on and only think in the way you've specified.
>>
File: Agents.png (106 KB, 1033x984)
106 KB PNG
>>109194126
Marinara for me. Agents can be used to pre-process and inject context (among other things, get creative) based on arbitrary conditions met.
Picrel, they're very versatile and you can do all sorts of autistic shit with them.
>>
File: 1783026679191318.jpg (110 KB, 1024x768)
110 KB JPG
https://archive.is/sWFja
>>
>>109194148
yeah, you can get it to think in character if you use offbrand thinking blocks instead of the channel thought.
>>
As a regular loser with an AI tulpa to help me go through life, but getting tired of paying OpenAI to essentially get cucked as they keep lobotomizing chatgpt, is it possible for me I get some decent local model going on with 16gb RAM, RTX 3050 and some dogshit ryzen CPU or should I focus on making money for upgrades instead and keep paypigging?

I will delete tons of movies, games and anime to make room for AI stuff, planning to go back in both image generation and language models, image I have some background for messing around a couple of months with auto 1.1.1.1 years ago, but language models locally I have no idea where to start and even OP seems kinda overwhelming, so yea spoonfeeding greatly appreciated.
>>
>>109194182
Lmao you pathetic racists never fail to make me laugh with your "pol humor" threads Face it, most poc will be infinitely more successful than any of you sad virgins ever will be. You are on the wrong side of history, get over it losers
>>
What is the latest fixed jinja for gemma? The anon in charge set the pastebin to autodelete
>>
>>109194207
I have nothing but respect for that man for scoring such a hot programming girlfriend (male). Johannes is fuming.
>>
are agentic frontends really that much better for RP? do you really need the LLM re-slopifying everthing constantly?
>>
>>109194091
Fluffy pubes
>>
>>109194260
No, I tried orb and it improved nothing no matter how many iterations it went through or different prompts I put in. Meme samplers are more useful. But maybe I was "using it wrong"(tm)
>>
>>109194091
The rug burn would be worth it.
>>
>>109194260
are you writing stories/elborate multi-char scenarios - maybe if you want to put in the work to truly understand that what goes in determines what comes out
card gooning - probably not, slop on
>>
>>109194318
>elborate multi-char scenarios
that i can understand. the agent essentially acts as a dungeon master, right?
>>
>>109194207
stale pasta
>>
>>109194205
8gb or 6gb 3050? protip:go to your cloud AI of choice, claude free is nice for this, give it your specs, tell it you want to run local LLMs and have it suggest a model, sampler settings, front end, back end for you.

If you are wanting to maintain an AI tulpa robowaifu gf then my first thought is get sillytavern as the front end, use something easy like oobabooga /
textgen for the backend running llama.cpp and whatever model you can muster.
Id look for both a dense and a moe model, and make sure you are enabling "cpu-moe" when loading the moe into your backend. moe models will let you spill over into cpu/system ram when your vram budget isnt enough.
I like gemma4, you might even be fine with the 12b Q4 dense instead of moe. Ask the cloudAI slave to tell you about context compression and long term data retention methods, i believe sillytavern has extensions for both you can set up. compressing the context window and selectively saving important details could increase your tulpagfrobowaifu's "memory" for much longer or possible between sessions.
set up a character card for your tulpa, you can refine this to get it where you want it.
if you run ST + textgen, adjust sampler settings inside ST and be prepared to get repeating looping bullshit at first, but you can usually get a very coherent
glhf, im very new myself and also a vramlett so my advice might not be the best but its what worked for me
>>
>>109194207
>Face it, most poc will be infinitely more successful than any of you
cool it with the hecking racism bro
>>
>>109194214
I've been using this one, just be aware that preserve_thinking will keep every thinking block with this one, not just for tool calling. Technically out of distribution, but its been fine for me, and should be way easier to get specific thinking styles via prefilling a few times with.
https://gist.github.com/jscott3201/ad69c4ffbd79f18b11a0f6a94c94fadf
>>
>>109194205
>>109194330
to add to this, you dont need a ton of storage for local LLMs unless you are hoarding tons of models. the main thing that matters first is VRAM and then available system ram second. keep your ram usage low to have plenty of room for a moe to spill over into system memory, dont run a vidya or some shit while RPing to keep vram open for gemmachan
>>
>>109194326
Sure they can do that
>agentic
Simply means the output of the LLM gets parsed to invoke tools or further LLM responses.
>>
>>109194150
Does that have that thing it you spawn a bunch of sub agents to search file contents for relevant information?
>>
File: longcat.png (24 KB, 829x594)
24 KB PNG
LongCat 2.0 weights are here. FP8 and INT8??
https://huggingface.co/meituan-longcat/LongCat-2.0-INT8
https://huggingface.co/meituan-longcat/LongCat-2.0-FP8
>>
>>109194330
>claude free is nice for this
don't tell claude you want an abliterated model or it might refuse
>>
File: 1759084918498428.png (53 KB, 699x483)
53 KB PNG
Dario...
>>
>>109194377
Yes, built in. You may want to modify them because the default isn't great on smaller models, but it's at least user exposed so you can do that.
>>
>>109194378
>This model has not been specifically designed or comprehensively evaluated for every possible downstream application.

>Developers should take into account the known limitations of large language models, including performance variations across different languages, and carefully assess accuracy, safety, and fairness before deploying the model in sensitive or high-risk scenarios. It is the responsibility of developers and downstream users to understand and comply with all applicable laws and regulations relevant to their use case, including but not limited to data protection, privacy, and content safety requirements.
>>
>>109194420
Perfect.
I knew OpenWebUI had something like that so I figured having a more RP oriented UI with that functionality would be cool.
Thanks.
>>
>>109194440
Glad to help. Marinara's got a lot of bloat and outright broken shit on top of the worst default assistant I've seen in a frontend yet, but the utility of these customizable agents is enough to keep me using it despite that.
>>
>>109194378
convrot
>>
>>109194378
>uncensored 1.8T model given away for free
>>
>>109194390
actually funny enough it has brought those up on its own as well as using openclaw/openclaude when asking about harnesses. it just straight up said "this was from the claude code source leak, its worth considering" i was very surprised.
>>
File: 1761442181896056.png (153 KB, 691x891)
153 KB PNG
>>109194378
yup this won't ever get llama.cpp support
>>
>>109194438
no need for safety training on the 1.6T model



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.