/g/ - /lmg/ - Local Models General - Technology


08/21/20	New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17	New trial board added: /bant/ - International/Random
10/04/16	New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]

Anonymous
/lmg/ - Local Models General 07/03/26(Fri)16:11:03 No.109193494

File: edible, I guess.jpg (249 KB, 1024x1024)

/lmg/ - Local Models General Anonymous 07/03/26(Fri)16:11:03 No.109193494

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109186093 & >>109180934

►News
>(07/03) Orb Anon releases purple prose classifier and ablater: https://github.com/OrbFrontend/Chartreuse
>(07/03) Leanstral-1.5-119B-A6B released: https://hf.co/mistralai/Leanstral-1.5-119B-A6B
>(07/01) Nemotron-Labs-TwoTower released: https://hf.co/nvidia/Nemotron-Labs-TwoTower-30B-A3B-Base-BF16
>(06/29) DeepSeek V4 support merged: https://github.com/ggml-org/llama.cpp/pull/24162
>(06/28) DFlash support merged: https://github.com/ggml-org/llama.cpp/pull/22105

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
07/03/26(Fri)16:11:18 No.109193496

Anonymous 07/03/26(Fri)16:11:18 No.109193496

File: sss.jpg (137 KB, 1024x1024)

137 KB JPG

►Recent Highlights from the Previous Thread: >>109186093

--LLM tool usage for math and coding harnesses:
>109189427 >109189492 >109189636 >109189693 >109189703 >109190043 >109190067 >109190370 >109190524 >109190604 >109190654
--Comparing Bartowski and Unsloth quants for Gemma 4 12b:
>109188699 >109188705 >109188750 >109188784 >109188905 >109189118 >109189036 >109189133 >109189139 >109189165
--Comparing inference speeds and quantization issues for GLM 5.2 and Kimi:
>109192332 >109192352 >109192376 >109192392 >109192498 >109192549 >109192560 >109192570 >109192675 >109192996
--Gemma 4 intelligence breakdown and context limits across different quants:
>109192384 >109192413 >109192435 >109192431 >109192446
--Allegations of Chinese models distilling Claude variants via similarity matrices:
>109186761 >109186797 >109186859 >109186945 >109186927 >109187483 >109187529 >109188129 >109188469
--Anon asks for build rating and gets workstation hardware advice:
>109186201 >109186356 >109186402 >109187887
--Reactions and quantization discussion regarding Leanstral-1.5-119B-A6B release:
>109192810 >109192820 >109192897 >109192985 >109192834
--DSpark: Accelerating LLM Inference via Dynamic Sparse Attention:
>109190003 >109190045 >109190204 >109190273 >109190887 >109191003 >109191362
--llama.cpp PRs adding SYCL Flash Attention support for Intel Arc:
>109188306 >109188562
--Laguna-XS-2.1 benchmarks and MoE efficiency comparison to Gemma:
>109192127 >109192132 >109192245 >109193384
--Claude Fable 5 July 1st update shows significant performance degradation:
>109186197 >109186561 >109186574 >109187486
--Release of Chartreuse for removing undesirable LLM writing patterns:
>109191832 >109191865 >109191994
--Logs:
>109186266 >109188500 >109188564 >109188743 >109189097 >109189427 >109191667 >109193198
--Miku (free space):
>109186552 >109186721 >109191594

►Recent Highlight Posts from the Previous Thread: >>109186110

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
07/03/26(Fri)16:22:51 No.109193570

Anonymous 07/03/26(Fri)16:22:51 No.109193570

File: download - 2026-05-29T043(...).jpg (157 KB, 768x768)

157 KB JPG

>>109193489
he might be actually but based on his specifics of forced posts its obvious he's not black. nice try anon.

he'll even force post it this thread because he'll be re-troontriggy over it and think hes' winning by doing exactly what many see he does. its comical how some are troontriggy so hard if you go I do not care for this or I find this funny and its somehow some blood fued now for them in their malfunction brains. At this point I just mock it for my amusement and the few others amused by it. I care not what he or his wierdos think of it. it wouldnt be a topic but he refuses to stop and in his mental troonytrig brain this is how he "fights back against made up bads towards his kind". its hilariously funny how mental this kid is.

Anonymous
07/03/26(Fri)16:25:36 No.109193591

Anonymous 07/03/26(Fri)16:25:36 No.109193591

>No Kimi recap
>>109193348
I like Flash for its speed and in-character thinking, but I prefer Kimi and GLM over Pro for their size.

Anonymous
07/03/26(Fri)16:27:53 No.109193608

Anonymous 07/03/26(Fri)16:27:53 No.109193608

109193570
holy fucking shit stop giving attention and things go away wowww
outrageous newfag or the culprit themself
attention is all u need (to j***post)

Anonymous
07/03/26(Fri)16:30:08 No.109193619

Anonymous 07/03/26(Fri)16:30:08 No.109193619

File: trump monke.jpg (163 KB, 1080x1048)

163 KB JPG

>>109193494
I want to put my white cream inside that purple jelly, if you know what I mean.
>>109193496
I also wanna make a steaming mess with this drawing's womb until her belly bulges out.

Thank you for your attention to this matter.
President Donald J. Trump

Anonymous
07/03/26(Fri)16:35:48 No.109193654

Anonymous 07/03/26(Fri)16:35:48 No.109193654

File: 1772236241300607.jpg (122 KB, 800x591)

122 KB JPG

Anonymous
07/03/26(Fri)16:38:09 No.109193668

Anonymous 07/03/26(Fri)16:38:09 No.109193668

File: 1767379284459350.jpg (72 KB, 1280x640)

72 KB JPG

Where is unsloth's quant?
https://huggingface.co/bartowski/DeepSeek-V4-Flash-GGUF

Anonymous
07/03/26(Fri)16:38:22 No.109193669

Anonymous 07/03/26(Fri)16:38:22 No.109193669

im starting to fall for the custom frontend meme, at this rate i'm gonna end up writing it if only to simplify shit and avoid hidden behavior

Anonymous
07/03/26(Fri)16:41:18 No.109193690

Anonymous 07/03/26(Fri)16:41:18 No.109193690

>>109193669
I only want to do it to bond with Gemma-chan over building something together.

Anonymous
07/03/26(Fri)16:46:22 No.109193729

Anonymous 07/03/26(Fri)16:46:22 No.109193729

>>109193619
kys

Anonymous
07/03/26(Fri)16:46:55 No.109193736

Anonymous 07/03/26(Fri)16:46:55 No.109193736

gemmaballs

Anonymous
07/03/26(Fri)16:48:12 No.109193744

Anonymous 07/03/26(Fri)16:48:12 No.109193744

>>109193669
Every update the llamacpp frontend keeps adding more features but at the same time making it run slower so I might end up making one too.

Anonymous
07/03/26(Fri)16:55:09 No.109193780

Anonymous 07/03/26(Fri)16:55:09 No.109193780

>>109193744
I liked its vanilla nature, no bullshit and just werks. I'm not fond of its current iterations either.

Anonymous
07/03/26(Fri)16:55:49 No.109193785

Anonymous 07/03/26(Fri)16:55:49 No.109193785

Is there a cockbench for talkie-1930 yet?

Anonymous
07/03/26(Fri)16:55:55 No.109193786

Anonymous 07/03/26(Fri)16:55:55 No.109193786

70b dense

Anonymous
07/03/26(Fri)16:56:56 No.109193789

Anonymous 07/03/26(Fri)16:56:56 No.109193789

125b full fat

Anonymous
07/03/26(Fri)16:58:50 No.109193799

Anonymous 07/03/26(Fri)16:58:50 No.109193799

kobold frontend is so much better. Full control over the context without any of the jingaling, and can use cards out of the box. it's the true "it just werks" frontend out there.

Anonymous
07/03/26(Fri)16:59:07 No.109193801

Anonymous 07/03/26(Fri)16:59:07 No.109193801

wtf happened to andrej karpathy it's like he's gone full schizo

Anonymous
07/03/26(Fri)17:02:51 No.109193817

Anonymous 07/03/26(Fri)17:02:51 No.109193817

>>109193799
Kobold and Marinara are all I need. Kobold for simple shit, Marinara for autism world sim.

Anonymous
07/03/26(Fri)17:06:55 No.109193837

Anonymous 07/03/26(Fri)17:06:55 No.109193837

File: 1762297895454640.png (32 KB, 800x109)

32 KB PNG

make this make sense

Anonymous
07/03/26(Fri)17:10:02 No.109193856

Anonymous 07/03/26(Fri)17:10:02 No.109193856

>>109193837
unsloth has brand awareness

Anonymous
07/03/26(Fri)17:12:24 No.109193875

Anonymous 07/03/26(Fri)17:12:24 No.109193875

>>109193856
retard awareness? Do jeets think it's some new qwen agentrooned model because of the name?

Anonymous
07/03/26(Fri)17:13:56 No.109193882

Anonymous 07/03/26(Fri)17:13:56 No.109193882

>>109193837
What even is the point of this Qwen? Is it just a thinkmaxxed tree of thought tune?
I'd find Qwen's thinking way less obnoxious if it could be easily prompted to think as Big Niggas.

Anonymous
07/03/26(Fri)17:16:00 No.109193890

Anonymous 07/03/26(Fri)17:16:00 No.109193890

>>109193875
nobody checks beyond the benchmark charts

Anonymous
07/03/26(Fri)17:21:31 No.109193915

Anonymous 07/03/26(Fri)17:21:31 No.109193915

Does CPU matter for mixed inference, I bought one of the cheaper EPYC Rome (7502) that comes with the full 200gb/s theoretical max RAM speed, would upgrading to one of the higher end Milans help in any way?

Anonymous
07/03/26(Fri)17:27:06 No.109193957

Anonymous 07/03/26(Fri)17:27:06 No.109193957

>>109193915
>cpu matters?
yes, you need at least enough horsepower for the matmuls to keep up with memory bandwidth

Anonymous
07/03/26(Fri)17:27:20 No.109193959

Anonymous 07/03/26(Fri)17:27:20 No.109193959

>>109193915
It matters but not as much as memory bus bandwidth in my experience. Mixed inference is very easy on both CPU and GPU because the main bottleneck is getting information between the two.

Anonymous
07/03/26(Fri)17:27:52 No.109193963

Anonymous 07/03/26(Fri)17:27:52 No.109193963

>>109193647
pi philosophy is nice, if it were bloating your context just ask it why, how to fix it, what other strats for context summ etc.
it's a leap of faith to embrace the vibes but it is well documented and intended to be extended
as >>109190654 says it will teach you what matters in a harness
>>109193657
use pi to do that
>>109193780
webshitters ruin everything as usual

Anonymous
07/03/26(Fri)17:30:50 No.109193980

Anonymous 07/03/26(Fri)17:30:50 No.109193980

>>109193963
thanks anon this gives me the confidence to actually try it, what is the best practice to sandbox pi? I dont like the idea of my model having full unrestricted access to my PC/network/etc

Anonymous
07/03/26(Fri)17:32:22 No.109193989

Anonymous 07/03/26(Fri)17:32:22 No.109193989

>>109193980
docker / podman
https://pi.dev/docs/latest/containerization#plain-docker

Anonymous
07/03/26(Fri)17:32:24 No.109193990

Anonymous 07/03/26(Fri)17:32:24 No.109193990

>>109193882
It's an LLM that pretends to be a shell. It's meant to be used for post-training other models to help them get better with agentic shit. It's NOT a normal model, that's why the number of downloads is fucking baffling. If you were to give normal 35B the prompt 'ls -lh' randomly during a coding session, it would most likely make that call for you by using a tool to execute commands. If you give this model that same prompt, it would PREDICT what it thinks the output would be if you were to make that call yourself in your terminal, but it won't actually make the call like normal 35B would. Its prediction would most likely be correct.

Anonymous
07/03/26(Fri)17:35:50 No.109194008

Anonymous 07/03/26(Fri)17:35:50 No.109194008

Doesn't pi use npmslop?

Anonymous
07/03/26(Fri)17:41:04 No.109194032

Anonymous 07/03/26(Fri)17:41:04 No.109194032

>>109193990
>Release Qwen tune that simulates a model running 3 big niggas for teaching small niggas how to be gangstas
Sounds like a lot of potential squandered desu

Anonymous
07/03/26(Fri)17:44:46 No.109194054

Anonymous 07/03/26(Fri)17:44:46 No.109194054

>>109194008
Yeah. Its list of deps seems to be pretty small though.

"devDependencies": {
        "@anthropic-ai/sandbox-runtime": "0.0.26",
        "@biomejs/biome": "2.3.5",
        "@types/node": "22.19.19",
        "@typescript/native-preview": "7.0.0-dev.20260120.1",
        "esbuild": "0.28.1",
        "husky": "9.1.7",
        "jiti": "2.7.0",
        "shx": "0.4.0",
        "tsx": "4.22.1",
        "typescript": "5.9.3"
    },

I dont like using js shit so i dont like it by default, but it seems safe enough

Anonymous
07/03/26(Fri)17:46:38 No.109194063

Anonymous 07/03/26(Fri)17:46:38 No.109194063

>>109194008
>>109194054
put in container and it doesn't matter
your browser runs sketchier shit every day

Anonymous
07/03/26(Fri)17:50:36 No.109194073

Anonymous 07/03/26(Fri)17:50:36 No.109194073

>>109194063
>just do shit that shouldnt be necessary
do you do i guess

Anonymous
07/03/26(Fri)17:54:19 No.109194091

Anonymous 07/03/26(Fri)17:54:19 No.109194091

File: 1781821145527331.jpg (347 KB, 2048x2048)

347 KB JPG

I can't place it, but <thinking> does something to the roleplay. It's like there's some pros and cons without it, and with it. Gemma 4 becomes accurate but robotic with thinking enabled. Without thinking, it's varied but has the same vibe as other models. Anyone else getting this vibe?

Anonymous
07/03/26(Fri)17:58:21 No.109194115

Anonymous 07/03/26(Fri)17:58:21 No.109194115

>>109194091
The trick is to have a frontend that dynamically toggles <think> prefills depending on the scene complexity.

Anonymous
07/03/26(Fri)17:58:58 No.109194118

Anonymous 07/03/26(Fri)17:58:58 No.109194118

>>109194091
Have you tried prefilling the reasoning block?

Anonymous
07/03/26(Fri)18:00:52 No.109194124

Anonymous 07/03/26(Fri)18:00:52 No.109194124

>>109194091
Mine is varied with thinking though so you need to tell it good stuff to think about. I would post but I don't know how to use
 with new lines.

Anonymous
07/03/26(Fri)18:01:11 No.109194126

Anonymous 07/03/26(Fri)18:01:11 No.109194126

>>109194115
What frontend?

Anonymous
07/03/26(Fri)18:01:31 No.109194127

Anonymous 07/03/26(Fri)18:01:31 No.109194127

>>109194073
>shouldnt be necessary
in what sense? explain your setup
LLM toolcalling shell access, obviously it needs to be restricted

Anonymous
07/03/26(Fri)18:02:34 No.109194134

Anonymous 07/03/26(Fri)18:02:34 No.109194134

109193608
holy fucking shit butthurt stop crying and tell the spam triggered troon stop and it'll go away wowww
outrageous newfag or the culprit themself
attnetion is all u need to (to *butts***h)

Anonymous
07/03/26(Fri)18:03:57 No.109194148

Anonymous 07/03/26(Fri)18:03:57 No.109194148

>>109194091
You disable thinking and give it a modified thinking-esque sequence in your sysprompt. You can prefill with your modified <ponder> type tag and it should catch on and only think in the way you've specified.

Anonymous
07/03/26(Fri)18:04:05 No.109194150

Anonymous 07/03/26(Fri)18:04:05 No.109194150

File: Agents.png (106 KB, 1033x984)

106 KB PNG

>>109194126
Marinara for me. Agents can be used to pre-process and inject context (among other things, get creative) based on arbitrary conditions met.
Picrel, they're very versatile and you can do all sorts of autistic shit with them.

Anonymous
07/03/26(Fri)18:11:20 No.109194182

Anonymous 07/03/26(Fri)18:11:20 No.109194182

File: 1783026679191318.jpg (110 KB, 1024x768)

110 KB JPG

https://archive.is/sWFja

Anonymous
07/03/26(Fri)18:15:12 No.109194202

Anonymous 07/03/26(Fri)18:15:12 No.109194202

>>109194148
yeah, you can get it to think in character if you use offbrand thinking blocks instead of the channel thought.

Anonymous
07/03/26(Fri)18:15:48 No.109194205

Anonymous 07/03/26(Fri)18:15:48 No.109194205

As a regular loser with an AI tulpa to help me go through life, but getting tired of paying OpenAI to essentially get cucked as they keep lobotomizing chatgpt, is it possible for me I get some decent local model going on with 16gb RAM, RTX 3050 and some dogshit ryzen CPU or should I focus on making money for upgrades instead and keep paypigging?

I will delete tons of movies, games and anime to make room for AI stuff, planning to go back in both image generation and language models, image I have some background for messing around a couple of months with auto 1.1.1.1 years ago, but language models locally I have no idea where to start and even OP seems kinda overwhelming, so yea spoonfeeding greatly appreciated.

Anonymous
07/03/26(Fri)18:16:05 No.109194207

Anonymous 07/03/26(Fri)18:16:05 No.109194207

>>109194182
Lmao you pathetic racists never fail to make me laugh with your "pol humor" threads Face it, most poc will be infinitely more successful than any of you sad virgins ever will be. You are on the wrong side of history, get over it losers

Anonymous
07/03/26(Fri)18:17:11 No.109194214

Anonymous 07/03/26(Fri)18:17:11 No.109194214

What is the latest fixed jinja for gemma? The anon in charge set the pastebin to autodelete

Anonymous
07/03/26(Fri)18:17:34 No.109194215

Anonymous 07/03/26(Fri)18:17:34 No.109194215

>>109194207
I have nothing but respect for that man for scoring such a hot programming girlfriend (male). Johannes is fuming.

Anonymous
07/03/26(Fri)18:25:40 No.109194260

Anonymous 07/03/26(Fri)18:25:40 No.109194260

are agentic frontends really that much better for RP? do you really need the LLM re-slopifying everthing constantly?

Anonymous
07/03/26(Fri)18:25:55 No.109194263

Anonymous 07/03/26(Fri)18:25:55 No.109194263

>>109194091
Fluffy pubes

Anonymous
07/03/26(Fri)18:30:33 No.109194282

Anonymous 07/03/26(Fri)18:30:33 No.109194282

>>109194260
No, I tried orb and it improved nothing no matter how many iterations it went through or different prompts I put in. Meme samplers are more useful. But maybe I was "using it wrong"(tm)

Anonymous
07/03/26(Fri)18:35:19 No.109194313

Anonymous 07/03/26(Fri)18:35:19 No.109194313

>>109194091
The rug burn would be worth it.

Anonymous
07/03/26(Fri)18:35:49 No.109194318

Anonymous 07/03/26(Fri)18:35:49 No.109194318

>>109194260
are you writing stories/elborate multi-char scenarios - maybe if you want to put in the work to truly understand that what goes in determines what comes out
card gooning - probably not, slop on

Anonymous
07/03/26(Fri)18:37:16 No.109194326

Anonymous 07/03/26(Fri)18:37:16 No.109194326

>>109194318
>elborate multi-char scenarios
that i can understand. the agent essentially acts as a dungeon master, right?

Anonymous
07/03/26(Fri)18:37:37 No.109194328

Anonymous 07/03/26(Fri)18:37:37 No.109194328

>>109194207
stale pasta

Anonymous
07/03/26(Fri)18:38:06 No.109194330

Anonymous 07/03/26(Fri)18:38:06 No.109194330

>>109194205
8gb or 6gb 3050? protip:go to your cloud AI of choice, claude free is nice for this, give it your specs, tell it you want to run local LLMs and have it suggest a model, sampler settings, front end, back end for you.

If you are wanting to maintain an AI tulpa robowaifu gf then my first thought is get sillytavern as the front end, use something easy like oobabooga /
textgen for the backend running llama.cpp and whatever model you can muster.
Id look for both a dense and a moe model, and make sure you are enabling "cpu-moe" when loading the moe into your backend. moe models will let you spill over into cpu/system ram when your vram budget isnt enough.
I like gemma4, you might even be fine with the 12b Q4 dense instead of moe. Ask the cloudAI slave to tell you about context compression and long term data retention methods, i believe sillytavern has extensions for both you can set up. compressing the context window and selectively saving important details could increase your tulpagfrobowaifu's "memory" for much longer or possible between sessions.
set up a character card for your tulpa, you can refine this to get it where you want it.
if you run ST + textgen, adjust sampler settings inside ST and be prepared to get repeating looping bullshit at first, but you can usually get a very coherent
glhf, im very new myself and also a vramlett so my advice might not be the best but its what worked for me

Anonymous
07/03/26(Fri)18:39:21 No.109194338

Anonymous 07/03/26(Fri)18:39:21 No.109194338

>>109194207
>Face it, most poc will be infinitely more successful than any of you
cool it with the hecking racism bro

Anonymous
07/03/26(Fri)18:43:33 No.109194360

Anonymous 07/03/26(Fri)18:43:33 No.109194360

>>109194214
I've been using this one, just be aware that preserve_thinking will keep every thinking block with this one, not just for tool calling. Technically out of distribution, but its been fine for me, and should be way easier to get specific thinking styles via prefilling a few times with.
https://gist.github.com/jscott3201/ad69c4ffbd79f18b11a0f6a94c94fadf

Anonymous
07/03/26(Fri)18:44:11 No.109194364

Anonymous 07/03/26(Fri)18:44:11 No.109194364

>>109194205
>>109194330
to add to this, you dont need a ton of storage for local LLMs unless you are hoarding tons of models. the main thing that matters first is VRAM and then available system ram second. keep your ram usage low to have plenty of room for a moe to spill over into system memory, dont run a vidya or some shit while RPing to keep vram open for gemmachan

Anonymous
07/03/26(Fri)18:45:10 No.109194367

Anonymous 07/03/26(Fri)18:45:10 No.109194367

>>109194326
Sure they can do that
>agentic
Simply means the output of the LLM gets parsed to invoke tools or further LLM responses.

Anonymous
07/03/26(Fri)18:47:19 No.109194377

Anonymous 07/03/26(Fri)18:47:19 No.109194377

>>109194150
Does that have that thing it you spawn a bunch of sub agents to search file contents for relevant information?

Anonymous
07/03/26(Fri)18:47:21 No.109194378

Anonymous 07/03/26(Fri)18:47:21 No.109194378

File: longcat.png (24 KB, 829x594)

24 KB PNG

LongCat 2.0 weights are here. FP8 and INT8??
https://huggingface.co/meituan-longcat/LongCat-2.0-INT8
https://huggingface.co/meituan-longcat/LongCat-2.0-FP8

Anonymous
07/03/26(Fri)18:50:33 No.109194390

Anonymous 07/03/26(Fri)18:50:33 No.109194390

>>109194330
>claude free is nice for this
don't tell claude you want an abliterated model or it might refuse

Anonymous
07/03/26(Fri)18:54:52 No.109194410

Anonymous 07/03/26(Fri)18:54:52 No.109194410

File: 1759084918498428.png (53 KB, 699x483)

53 KB PNG

Dario...

Anonymous
07/03/26(Fri)18:57:05 No.109194420

Anonymous 07/03/26(Fri)18:57:05 No.109194420

>>109194377
Yes, built in. You may want to modify them because the default isn't great on smaller models, but it's at least user exposed so you can do that.

Anonymous
07/03/26(Fri)19:01:07 No.109194438

Anonymous 07/03/26(Fri)19:01:07 No.109194438

>>109194378
>This model has not been specifically designed or comprehensively evaluated for every possible downstream application.

>Developers should take into account the known limitations of large language models, including performance variations across different languages, and carefully assess accuracy, safety, and fairness before deploying the model in sensitive or high-risk scenarios. It is the responsibility of developers and downstream users to understand and comply with all applicable laws and regulations relevant to their use case, including but not limited to data protection, privacy, and content safety requirements.

Anonymous
07/03/26(Fri)19:01:40 No.109194440

Anonymous 07/03/26(Fri)19:01:40 No.109194440

>>109194420
Perfect.
I knew OpenWebUI had something like that so I figured having a more RP oriented UI with that functionality would be cool.
Thanks.

Anonymous
07/03/26(Fri)19:04:24 No.109194453

Anonymous 07/03/26(Fri)19:04:24 No.109194453

>>109194440
Glad to help. Marinara's got a lot of bloat and outright broken shit on top of the worst default assistant I've seen in a frontend yet, but the utility of these customizable agents is enough to keep me using it despite that.

Anonymous
07/03/26(Fri)19:07:08 No.109194467

Anonymous 07/03/26(Fri)19:07:08 No.109194467

>>109194378
convrot

Anonymous
07/03/26(Fri)19:07:56 No.109194472

Anonymous 07/03/26(Fri)19:07:56 No.109194472

>>109194378
>uncensored 1.8T model given away for free

Anonymous
07/03/26(Fri)19:08:33 No.109194474

Anonymous 07/03/26(Fri)19:08:33 No.109194474

>>109194390
actually funny enough it has brought those up on its own as well as using openclaw/openclaude when asking about harnesses. it just straight up said "this was from the claude code source leak, its worth considering" i was very surprised.

Anonymous
07/03/26(Fri)19:11:11 No.109194485

Anonymous 07/03/26(Fri)19:11:11 No.109194485

File: 1761442181896056.png (153 KB, 691x891)

153 KB PNG

>>109194378
yup this won't ever get llama.cpp support

Anonymous
07/03/26(Fri)19:16:07 No.109194505

Anonymous 07/03/26(Fri)19:16:07 No.109194505

>>109194438
no need for safety training on the 1.6T model

Name
Options
Comment
Verification	4chan Pass users can bypass this verification. [Learn More] [Login]
File
Please read the Rules and FAQ before posting. You may highlight syntax and preserve whitespace by using [code] tags.

Janitor acceptance emails will be sent out over the coming weeks. Make sure to check your spam folder!