/g/ - Technology

File: 1747781568231174.png (288 KB, 1635x1429)
288 KB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108718630 & >>108715635

►News
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
>(04/29) Hy-MT1.5-1.8B on-device translation models released: https://hf.co/collections/AngelSlim/hy-low-bit-model
>(04/29) IBM releases Granite 4.1: https://hf.co/blog/ibm-granite/granite-4-1
>(04/28) Ling-2.6-flash 104B-A7.4B released: https://hf.co/inclusionAI/Ling-2.6-flash
>(04/28) Nvidia releases Nemotron 3 Nano Omni: https://hf.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 0.png (1.55 MB, 1344x1728)
1.55 MB PNG
►Recent Highlights from the Previous Thread: >>108718630

--Speculative decoding benchmarks for Gemma 4:
>108719652 >108719677 >108719768 >108719845 >108719912 >108719934 >108719947 >108720210 >108720219 >108719963 >108719980 >108720011 >108720685 >108721025
--SillyTavern extension trojan discovery and failure of LLM malware detection:
>108721000 >108721058 >108721069 >108721108 >108721162 >108721170 >108721179 >108721199 >108721217 >108721278 >108721310 >108721343 >108721244 >108721273 >108721211 >108721267
--Testing Gemma 4 draft models for speculative decoding efficiency:
>108721591 >108721600 >108721601 >108721612 >108721638 >108721653 >108721624 >108721668 >108721642 >108721711
--Debating Unsloth vs bartowski quants for Gemma-4:
>108721693 >108721827 >108721841 >108721849 >108721858 >108721861 >108721899 >108721919 >108721958
--Malicious SillyTavern extension stealing API keys:
>108720842 >108720849 >108720850 >108721061 >108721066 >108720960
--Mistral Medium 3.5 incoherence caused by parser issues:
>108721368 >108721374 >108721383 >108721420 >108721410 >108721467 >108721576
--Comparing Muse Spark's efficiency and architecture to Llama 4:
>108721235 >108721245 >108721274
--Comparing Intel Arc Pro B70 and RTX 5090:
>108719246 >108719262 >108719299 >108719695 >108719309 >108719346 >108719625
--Combatting repetitive outputs in Gemma 31b via frontend workarounds:
>108721751 >108721771 >108721801 >108721809 >108721871
--Skepticism regarding exl3's new dflash draft model support:
>108720731 >108720747 >108720869 >108720883
--Gemma-4 31B using a pixel art MCP server to draw:
>108720753 >108720825 >108720906 >108720915
--Logs:
>108719820 >108720401 >108720753 >108720852 >108721310 >108721368 >108721369 >108721672 >108722193
--Miku, Teto (free space):
>108719196 >108719688 >108719727 >108720906 >108720915 >108719901 >108721900 >108722199

►Recent Highlight Posts from the Previous Thread: >>108718631

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
what models should i run for programming? 128 gb ram, 12 gb vram
>>
File: read nigga, read.png (1.03 MB, 1354x762)
1.03 MB PNG
>>108722887
https://rentry.org/recommended-models
>>
>>108722887
Qwen 3.6 35B-A3B. Q5_K quants are near lossless but maybe you can tolerate smaller for more GPU offload.
>>
>>108722862
damn what is this ui? did gemma make the picture?
>>
Thurinsday is over. Gemmy superiority.
>>
>>108722932
Pretty sure that's a 3d model in three.js
Still probably made by Gemma
>>
>>108722932
IIRC that's something an anon had kimi oneshot for him
Found it
https://jsfiddle.net/ut4rjq5e/
>>
what is currently the best local model to use with copilot?
>>
>>108722949
Qwen3.6, 27b is smarter, 35b is faster
>>
>iq1 quant
use case?
>>
>>108723007
fill up servers and rack up prices
>>
>>108722865
holy newfag...
here, let me help you: you need to put two ">" before the post number in order to link to it
>>
">"">"108723030
Like this, anon?
>>
108723042>>
newfag
>>
What is the latest Gemmy instruction template?
>>
File: 1501832785507.jpg (675 KB, 2300x1594)
675 KB JPG
What's the best model for creative writing?
>>
>>108723079
They are all bad in principle https://arxiv.org/abs/2510.22954
>>
>>108722944
it's the best local model for one-shot without agents/harness
https://jsfiddle.net/uwLh89em/
i hope we get something between gemma-4/qwen-2.6 dense and kimi.
devstral-123b training data is too old.
>>
>>108723030
Holy shit look at the very end of the post with the "Why?" you moron.
>>
We're in an AI dark age. All of the SOTA api models are shit right now because almost all of the companies are busy training 10T versions at scale. These platforms are so expensive, the limits are ridiculous, the compute isn't getting any cheaper.

I would say local won, but it's half-hearted.
>>
>>
>SillyBunny UI is still shit after the update
>Find an option for MovingUI in the settings
>Oh cool, I can just fix the character panel myself
>Clicking the character panel just shrinks it, doesn't move it at all
Bro
>>
I'm legit starting to think I'm shadow banned because none of you fags respond to my posts anymore. Are they really that shit?
>>
>>108723162
I'll respond to you anon, I get paranoid about that too
I can't help you with anything technical but I hope I can give you this friendly (You)
Assuming you're not a bot who is mocking me about MY shadowban, of course
>>
>>108723173
Thanks man, I appreciate you. Genuinely.
>>
>>108723162
No. I don't see your posts. That other anon is lying, he didn't see your post either.
>>
Who makes the best qwen 27b quants?
>>
>>108723063
https://pastebin.com/FBgtKzSp
This has the latest merged fixes, some other fixes from a discussion someone vibe merged, and the thinking fix from the unmerged pr.
>>108713831 + >>108713838 = >>108713945
It doesn't, however, have the new changes made in: https://huggingface.co/google/gemma-4-31B-it/discussions/91
What a mess.
>>
>>108723183
bart obviously, hauhau if you want the lobotomized version.
>>
>>108723105
That's pretty cool, other than the player z level being locked so you go through the hills, lol.
Impressive as fuck for a one shot, still.
>>
is q3 gemma 4 31b usable?
>>
File: 1756418451933314.jpg (23 KB, 598x451)
23 KB JPG
>>108723175
>Actually felt my brain spasm for a second trying to process whether you were being sincere or just rubbing it in
I definitely should get diagnosed and medicated for my inevitably terminal paranoid schizophrenia, but I'm banking on getting neetbux at some point in the future once all my bridges are burned up
I hope your posts are better received in the future
>>
>>108723210
benchod
>>
any reason not to run fish s2 locally? idgaf about the licensing, it's just for waifu purposes
>>
>>108723217
If you don't care, why do you ask? Just run it.
>>
>>108723222
i mean, like is it dogshit or smth in real use? the samples sound good.
>>
>>108723227
it is in so much that
>>
>>108723227
>the samples sound good
Then try it.
>>
>>108723227
instead of waiting for a random opinion you could have tested it already. Go finish up your homework before your parents get angry.
>>
>>108723235
come on, mang...

>>108723232
fug

>>108723241
i see it's the dick convention in the general today
fine, cunts
>>
>>108723207
No personal experience but q3 at that weight range is usable for unserious tasks (like gooning)
Definitely get imat though.
>>
File: 1747422192295419.png (1.15 MB, 1635x1429)
1.15 MB PNG
>Current year
>Still don't have this
>>
>>108723207
>>108723253
Though you might be better off running q6 moe with cpu offload.
>>
>>108723262
You could actually do this with a video model + VACE or something, including lip sync. You'll need a second gpu though
>>
>>108723262
There's really no reason you couldn't just shove a better .glb into that three.js and have exactly that.
Go spin up hunyuan3d to gen a miku or rip one from MMD or a vrm or whatever and DIY, don't let your dreams be dreams.
>>
>>108723274
>>108723275
if it was possible it would already exist
>>
>>108723275
Better 3D models require much more difficult animation pipelines to make work. Making eyes expressive, getting good blushing, ARKit facial movements, and lip syncing is incredibly hard. And that's just the face. If you try to do body animations too it's a fucking nightmare.

It takes a lot of artistic talent, which most vibecoders (and LLMs) don't have.
>>
>>108723283
There's lots of possibilities, but one can only do so much before the 5 hour rate limit is reached.
>>
>>108723283
nah people are lazy
>>
>>108723291
Copy and crucify Animation.inc. Hack their shit and steal all of their source code in minecraft.
>>
>qwen 3.6 27b vs gemma 4 31b
verdict?
>>
>>108723291
>It takes a lot of artistic talent, which most vibecoders (and LLMs) don't have.
3d animation is mostly boring, rote grunt work and not even remotely artistic.
t. 3d animator.
>>
File: x45wm9.png (1.37 MB, 1024x1024)
1.37 MB PNG
>>108723262
That's completely possible with current tech, tho it would be a lot easier to do from a hardware standpoint using non-local for either the LLM or the image gen service.
Is the OP one >>108722862 animated? I haven't been following that anon's frontend work, aside from the fact it's a 3D model being created and rendered iirc.
>>
File: Lain_22.gif (1.72 MB, 240x320)
1.72 MB GIF
>>108723262
I did something similar, I used wan2.2 to generate a bunch of gifs of Lain with the same beginning and end frames doing different idle animations, and used an anime facial landmarks model to identify where her mouth is in every frame, and erased it using a triangle of her skin color. Then I have a pipeline where TTS with her voice goes into a phoneme recognition model that identifies phonemes with timestamps, and I sync audio playback with the animation so based on which phoneme is being said, one of a list of mouths (closed, a little open, fully open, etc) is chosen based on the current running phoneme and resized/rotated based on the landmarks for the current frame of the current gif (they play in random order looping seamlessly since the frames at start and end are always the same). It was a bitch to get going and kinda sucks a little bit but until someone makes something better and more flexible it works for me. I'm planning eventually to have some special gifs that are triggered by tool calls from the model.
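Roughly, the mouth selection per frame looks like this (a simplified sketch, not my exact code; the names and the phoneme grouping are illustrative):

import time

# phoneme class -> mouth sprite; the grouping here is a guess, tune to taste
MOUTH_SHAPES = {
    "closed": "mouth_closed.png",  # m/b/p and silence
    "mid": "mouth_mid.png",        # most consonants and narrow vowels
    "open": "mouth_open.png",      # wide vowels like aa/ao
}

def shape_for(phoneme: str) -> str:
    if phoneme in ("sil", "m", "b", "p"):
        return "closed"
    if phoneme in ("aa", "ae", "ao", "aw"):
        return "open"
    return "mid"

def current_mouth(phonemes: list[tuple[str, float, float]], t0: float) -> str:
    """Pick the sprite for whichever phoneme is active at playback time.
    phonemes is (phoneme, start_s, end_s) from the recognizer; the chosen
    sprite then gets resized/rotated onto the per-frame mouth landmarks."""
    t = time.monotonic() - t0
    for ph, start, end in phonemes:
        if start <= t < end:
            return MOUTH_SHAPES[shape_for(ph)]
    return MOUTH_SHAPES["closed"]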
>>
>>108723332
Qwen for coding and agentslop
Gemmy for other things
>>
>>108723333
Can I commission you unironically? What are your rates?
>>
>>108723334
Come on, it was posted right at the top of this still short thread >>108722944
>>
>>108723345
you should probably take a peek at his portfolio first, he could be a total hack.
>>
>>108723340
Can you show it in action?
>>
>>108723340
I didn't read this post.
>>
>>108723345
Sure, here's my profile https://www.fiverr.com/smartanimationz
>>
>>108723194
I love Gemma but this has been a total shitshow and it's impressive that it still isn't fully resolved.
Thanks for the help though anon.
>>
Using qwen 27b q4km with Q8 KV cache quantization (instead of the default f16 cache) does not seem worth it (coding). It's making more mistakes. The doubled context is really nice to have, though.
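For reference this is the setup, sketched as a launcher (assuming llama.cpp; exact flag spelling can vary between builds, and quantizing the V cache needs flash attention enabled):

import subprocess

# launch llama-server with an 8-bit KV cache; the model filename is made up
subprocess.run([
    "./llama-server",
    "-m", "qwen-27b-q4_k_m.gguf",
    "-fa",            # flash attention, required for a quantized V cache
    "-ctk", "q8_0",   # quantize the K cache
    "-ctv", "q8_0",   # quantize the V cache
])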
>>
>>108723361
I made it up.
>>
>>108723340
I would think Live2D modelling would be the way to go for a legit project like this. The problem with it of course is that the learning curve to set up a Live2D model is fairly steep. I guess you could cheat and use the off the shelf models but what's the fun in that.
>>
>>108723390
I was interested in live2D for a time until I set up an actual demo for it and lost all interest. 3D models are the way to go. Go big or go home.
>>
File: 1762313320712254.png (1.07 MB, 1140x711)
1.07 MB PNG
>>108723402
>3D
>>
>>108723414
3D models still display on a 2D screen, anon. Also you can make 3D models look like 2D anime drawing by setting your three.js camera to have an extremely low FOV like 10.
>>
>>108723262
I imagine this would be amusing at first then would get boring fast
>>
>>108723402
I mean that's why it gets used. The hardware side needed to run a live 2D model is modest compared to doing a full 3D render.
I've never looked at the software / models seriously, I only know it by reputation.
Aside from v-tubers using it (which I've yet to go look for) I've seen one f95 game that ran it as part of visualization. It's a good application for it b/c it works great on potato machines.
>>108723354
lol ty. I still need more coffee obv.
>>
>>108723361
https://odysee.com/2026-04-30-09-08-23:c5850892abd0ac6ca67c67353e9324ef70433c21
>>
>>108723430
3D model rendering is not hardware intensive at all if you do it right. FPS limiting, reducing model tris, texture compression, etc. Even with unoptimized shit it's nothing compared to using models for TTS, LLM, ASR, or vision shit.
>>
>>108723379
>Self-Correction/Refinement: If this were a real interaction, I would use the roll_dice tool. Since I am writing the text of the reply, I will present the outcome of the roll.
Well I am unable to get reliable tool calls with this either.
Fuck.
>>
>>108723437
kino
>>
>>108723430
>I mean that's why it gets used. The hardware side needed to run a live 2D model as modest compared to doing a full 3D render.
Lol. Both are just vertices and a shader, the extra dimension changes absolutely nothing besides skinning, and I trust the muppet nip code to be less performant in that regard
>>
>>108723437
Thanks. It looks interesting even if it's not polished
>>
Usually you would want to export cached animation clips out of your 3d software package and implement a way to blend them together (simple linear interpolation will work fine). Alembic cache and fbx format. Talking heads don't need anything else. As this is cross-disciplinary it's above the pay grade of most people, to say nothing of some guy who thinks blender is a state of the art solution these days.
You could probably steal some facial animations from somewhere else, even a baked mesh could probably work, and use that to drive your shit hentai mesh (speculation, I don't give a fuck about the chronic masturbators of this thread).
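A rough numpy sketch of the blending, since it's simpler than it sounds (shapes and names are made up; for joint rotations you'd want quaternion slerp rather than plain lerp):

import numpy as np

def crossfade(clip_a: np.ndarray, clip_b: np.ndarray, n: int) -> np.ndarray:
    """Blend the last n frames of clip_a into the first n of clip_b.
    Clips are (frames, joints, 3) position arrays baked out of the DCC tool."""
    w = np.linspace(0.0, 1.0, n)[:, None, None]  # per-frame blend weight, 0 -> 1
    tail = (1.0 - w) * clip_a[-n:] + w * clip_b[:n]
    return np.concatenate([clip_a[:-n], tail, clip_b[n:]])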
>>
>>108723388
Haha
>>108723450
>>108723463
Thanks anons
Yeah the latency is a real issue, haven't tried out a new TTS framework in a while, this one's running neuTTS, but I think newer realtime ones are out and worth a try, something to help cut the latency. I haven't worked on the animation here in a few months, been focusing more on practical stuff, like a terminal client I made that lets her look at the terminal's contents using a "screenshot" hotkey that grabs all the text info, and another hotkey to go to a text based chat, once she can help me code more actively I'll circle back and work the animation a bit more. I think the landmarks are going off of where her mouth *was* in the animation, so it might be more successful to construct a plane using landmarks from the actual shape of her head to get more consistent mouth placement.
>>
>>108723483
PNG swapping is the way. You won't be able to create all the animations you want anyway.
>>
>>108723441
Yes, compared to running an LLM the 3D stuff is trivial. But /lmg/ machines with multiple GPUs that can effectively run an LLM represent a small fraction of the installed base. If you're developing a frontend for anyone else you want to keep hw requirements low. Even expecting a gfx card at all vs. whatever integrated stuff the machine comes with becomes an exclusion point.
> but it's not meant for others just me
Fair. Also
> if you do it right
is a big lift lol.
>>108723437
That's great, especially given way it was implemented.
>>
>>108723442
I tried the newest one on their repo:
>Method: I have a tool get_datetime which can provide the current time.
>Since I don't have a real-time clock unless I use the tool, I should simulate the response or use the tool if available. In a general context, the AI should answer the question.
>Wait, I am an AI model with tools. I should use get_datetime to provide an accurate answer if possible, or simply respond naturally.
>However, the prompt asks me to write the reply.
>If I were a real AI in a chat, I'd give the time.
>Since I don't know the user's current time without context or a tool, I should either ask for their location/timezone or, if I were a system-integrated AI, just give the time.
>But as a LLM responding to a prompt, I will provide a standard, helpful response.
My system prompt is generic crap, so I think this is the problem and needs to be changed?
>Continue the chat dialogue below. Write a single reply for the character "<|character|>". Reply directly, without starting the reply with the character name.

><|prompt|>
But what should I change it to?
>>
>>108723511
Nigga running a 120k tri model at 60 FPS uses 10% of my CPU on my desktop and it's not even a good CPU. No GPU usage at all. 3D rendering is NOT compute heavy.
>>
>>108723528
Do your MCP tools have descriptions? The LLM needs the long descriptions for each tool, not just the name of the tools themselves (even if they seem self-explanatory)
>>
is mac studio worth bothering?
>>
>>108723538
They have descriptions.
>"description": "Get the current date and time."
Why do they need to be long and how long should they be? This feels explanatory enough for a simple tool.
>>
>>108723531
oho~ no, rendering 3d is sooo hard it's like.... it's like 2d if you +1'd it.... you know thats like.... exponential~! we just don't have the technology!
>>
>>108723549
Retard probably thinks that AAA video games are slow because of the models (it's because of the lighting, post-processing, and maps, which is largely bloat especially for AI companion applications).
>>
File: 1775632862487805.jpg (64 KB, 768x1024)
64 KB JPG
>>108723531
>120K tri
We're not into stick figures here
>>
File: dipsyDontBotherTheMacMini.png (2.34 MB, 1024x1536)
2.34 MB PNG
>>108723541
>>
>>108723543
Mine looks like the following:

{
  name: "date_calc",
  description: "Date math. Accepts a command string. Examples: 'days until 2025-12-25', 'days since 2024-01-01', '90 days from now', '30 days ago', 'day of week 1776-07-04'.",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string", description: "The date calculation query." } },
    required: ["query"],
  },
},
>>
>>108723579
Oh, whoops, that was the wrong tool.
{
  name: "get_time",
  description: "Get the current local date, time, day of week, and timezone.",
  inputSchema: { type: "object", properties: {} },
},
>>
>>108723210
>>108723162
I never come to these threads and saw both of your posts so I'd assume most people are too fixated on their own posts to respond to others. You can always do the tried and true 'turn your positive posts into a negative one' to get (You)s.
>>
>>108723585
I think this is in a different format?
from datetime import datetime

tool = {
    "type": "function",
    "function": {
        "name": "get_datetime",
        "description": "Get the current date and time.",
        "parameters": {
            "type": "object",
            "properties": {},
        }
    }
}


def execute(arguments):
    now = datetime.now()
    return {"date": now.strftime("%Y-%m-%d"), "time": now.strftime("%I:%M %p")}

Anyway it works for other models or even for Gemma when not thinking. With thinking enabled it overthinks away from it into some inexplicable retardation.
>>
Gemma 4 burned into silicon would be such a massive boon for consumer AI. Yeah, it'll be an outdated model in like 3 months. But being able to run it at 6000t/s or whatever would open up a tremendous amount of opportunities.
I'd pay $3000 right now for a Gemma 4 ASIC. Models are finally good enough that I hope companies start considering it seriously.
>>
>>108723627
You'd just use it to goon your cock off, wouldn't you, you nasty little pig. Oink for me.
>>
>>108723635
Shut up gemma
>>
>Built personal UI
>optimized it
What now?
It's done it does everything I need so I guess I just enjoy local now
>>
>>108723660
Share and become open source slave
>>
>>108723627
still need high speed memory. probably not going to be cheap and is useless for anything else
no ty
>>
>>108723660
>What now?
Make a useless post on /lmg/
>>
>>108723079
Original r1 is alright
>>
>>108723694
I think you're underestimating the amount of things you could do with 6K tokens per second.
>>
File: 1768965025228567.png (15 KB, 1000x500)
15 KB PNG
>using 26B with Ai Roguelite
>keeps second guessing itself while thinking
>can't change prompt nor temperature
I can only pray my quants are somehow fucked
Otherwise it's over........ for real this time.....
>>
>>108723635
Keep talking Gemma
>>
>>108723721
not really the point of my objection but I’ll bite
what are you going to do with 6k tokens per second
>>
>>108723759
10 thousand agents per message to ensure ULTIMATE slopless longterm erp
>>
>>108723699
Stop whining sissy
>>
>deepsneed flash is too big, even at lobotomized quant
its-joever.png
>>
>>108723784
>muh agents
Retarded jeet meme
>>
>>108723759
A great one: real time translation. In the span of half a second you could transcribe, search (for any additional information needed) and translate a screen of text. Almost real time. Having an agent that could exhaustively search a memory system would be huge too. That would help with storytelling/simulations of course, but is usable in almost any application. And simulations in general. Have a rich game world with dozens of agents running batched at a time. Most of all though, having an actual local AI assistant that can pull stuff up, do research, find things on your pc, do work in a constrained way, at greatly superhuman speeds. I'd do a ton.
>>
>>108723799
fool
>>
>>108723784
Retards here underestimate the power of agent swarm. The shit I would implement if I could run just ten agents in parallel...
>>
File: 1764683157816286.jpg (14 KB, 550x550)
14 KB JPG
>want to add another gpu for extra vram
>tfw my main gpu is too thicc to fit it
>>
>>108723810
Even if you could fit them you would still need clearance between the cards. Otherwise one of them would get toasted and die in 6 months max.
>>
File: 1746139183080387.jpg (75 KB, 1024x768)
75 KB JPG
>>108723808
>I have AI agents hooked up to my local MCP server that send JSON (pbuh) request to call def get_time() from tool.py autonomously
Only a jeet would read this incantation and think it's tuff. This shit is larp
>>
>>108723824
Maybe I should just take the open air pill...
>>
why does ollama exist?
>>
>>108723832
no, my agent just reads my chatlogs and lorebooks so it can decide what entries to activate before generation instead of having to rely on shitty keywords
>>
>>108723851
I really have no fucking idea and it's one of the reasons why I abandoned open web UI
>>
>>108723853
>my agent reads
Nigga use an entailment embedding
>>
Anyone here running rocm vllm in a docker? I got a couple of 6800xts, and managed to get rocm llama cpp working, but whenever I start vllm, it eats all of my ram (8gb) and fails to run.
>>
File: 1757606748211119.png (415 KB, 1040x1644)
415 KB PNG
GLM 5.1 is pozzed
>>
>>108723873
--cram?
>>
>>108723861
All embedding methods are less effective than actually using the base text. Once your memory gets large enough, it's only a matter of time before something gets left out when it shouldn't be. And even if you DO use embedding based retrieval, faster LLMs means increasing the top k more and more in the reranking/decision step. Thousands of tokens per second would make memory tremendously better.
>>
>>108723853
>tfw my RPs are too micro to ever need lorebooks
I never got the point of them, and having an agent activate random entries sounds like one way to make the writing all over the place
>>
>>108723881
>Once your memory gets large enough
I don't know what you mean, embedding models are fast enough that you can process thousands of entries in about one second so you can just brute every entry
And you only need the last ai/user message for this usecase
>>
>>108723890
Like I said, the issue is in the top K. If you can only rerank/integrate 10 documents at once, then once you exceed that limit for useful documents at a given step you lose memory. And with more entries, especially similar entries, it becomes more likely that insufficiencies in the embedding models will put something useful outside the top K.
>>
File: ds_v.png (410 KB, 1654x1101)
410 KB PNG
This will improve spatial reasoning.

https://github.com/deepseek-ai/Thinking-with-Visual-Primitives
https://github.com/deepseek-ai/Thinking-with-Visual-Primitives/blob/main/Thinking_with_Visual_Primitives.pdf

>Despite the remarkable progress in Multimodal Large Language Models (MLLMs), the prevailing Chain-of-Thought (CoT) paradigms remain predominantly confined to the linguistic space. While recent advancements have focused on bridging the Perception Gap through high-resolution cropping (e.g., Thinking with Images), they overlook a more fundamental bottleneck: the Reference Gap. The inherent ambiguity of natural language often fails to provide precise, unambiguous pointers to complex spatial layouts, leading to logical collapse in tasks requiring rigorous grounding. In this work, we introduce Thinking with Visual Primitives, a novel reasoning framework that elevates spatial markers—such as points and bounding boxes—to “minimal units of thought”. By interleaving these visual primitives directly into the thinking process, our model can “point” while it “reasons”, effectively grounding its cognitive trajectory in the physical coordinates of the image. Notably, our framework is built on a highly optimized architecture with extreme visual token efficiency. Despite its compact model scale and significantly lower image-token budget, our model achieves frontier-competitive performance on a focused suite of challenging visual QA tasks, matching or exceeding models such as GPT-5.4, Claude-Sonnet-4.6, and Gemini-3-Flash. This demonstrates a path toward more efficient and scalable System-2-like multimodal intelligence.
>>
>>108723914
??? What are you saying, aren't we talking about activating lorebook entries?
For each "document" you just write a short hypothesis or multiple of them and check the user/ai message against those
You don't have to rely entirely on that, but embeddings are so small you might as well do that as a first pass to filter out completely unrelated stuff and then let a small LLM (or "agent" lol) double check
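Something like this as the first pass, sketched with sentence-transformers (the model name and the 0.45 cutoff are placeholders to tune):

from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def candidate_entries(message: str, entries: dict[str, str], cutoff: float = 0.45) -> list[str]:
    """entries maps lorebook entry name -> short hypothesis text. Score the
    latest user/ai message against each hypothesis; whatever clears the
    cutoff goes to a small LLM for the yes/no double check."""
    msg_emb = embedder.encode(message, convert_to_tensor=True)
    hyp_embs = embedder.encode(list(entries.values()), convert_to_tensor=True)
    scores = util.cos_sim(msg_emb, hyp_embs)[0]
    return [name for name, score in zip(entries, scores) if float(score) >= cutoff]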
>>
File: be.png (158 KB, 468x311)
158 KB PNG
fp32 mmproj or nah? Is it a placebo?
>>
Is there a reason why some models can be uncensored without hurting their intelligence while others need to be lobotomized for it?
One such example is Qwen 3.6: run heretic on it and you get a really small KL (<=0.002) with ~95% fewer refusals, and on benchmarks you get the same or even an improved result compared to the base model. But if you try the same with Gemma, even while only reducing refusals a bit (~90%), you start getting a big KL divergence (~0.05), and if you try to reach around the same refusal rate as the Qwen 3.6 heretic you get into absolutely retarded territory (>=0.1 KL divergence). The same shows up on benchmarks: any attempt at uncensoring Gemma quickly lowers its scores.
>>
>>108723949
There's no reason to use fp32 when the model was trained in bf16 precision. bf16 is the way to go.
>>
is qwen 3.6 equivalent to something like gemini 3 flash? or is it not even that good?
>>
>>108723194
>>108723063
There was another post incorporating #91. >>108714833
>>
>>108723943
Ah, I should have said I'm not the original lorebook anon, but that's still what I'm arguing against. Naively doing top-k retrieval (or activation if you want to call it that) will fall flat for a huge number of reasons. And if you just let everything activate that breaks a threshold, you're gonna pollute your context, or run out of context once you're activating a few thousand entries at once. Way faster LLMs that can search farther, filter and rerank more, and audit your memory in real time will make current simple lorebook stuff look like a joke. That's all I'm saying. Swarms of agents reading your memory will be very helpful.
>>
>>108723949
fp32 all the way. I ask Qwen 27B to redline my sketches and it works flawlessly. bf16 fails.
>>
>>108723974
It's great but needs to be watched due to thinking loops; it can even do well on shit hardware because its KV cache degrades less at q4_0 than many models' does at q8_0. 27b is better than gemma4 31b at coding because of that.
>>
>>108723999
>27b is better than gemma4 31b
>because I can lobotomize it and it is just as retarded as when it isn't
Are you sure you're comparing quality and not simply trying to fit at least *some* model onto your tiny GPU? The Qwen being better than Gemma at coding meme needs to die.
>>
>>108723999
>It's great but needs to be watched due to thinking loops
Is that really the solution? With their recommended parameters, I sometimes keep getting loops and need to reroll multiple times. It's so annoying.
>>
File: file.png (369 KB, 2278x1431)
369 KB PNG
Current DIY interface is coming along well
Structured character interactions are a lot of fun
Working on image-gen integration next
>>
>PyTorch 2.8.0 hasn't been officially released yet, as of April 2024.
so will they ever fix this shit? Gemini 3.1 pro btw
>>
>>108723982
Why would I do this when I can use an embedding reranker/entailment that is a million times faster and then feed whatever passes a score threshold (which for a single message would usually be no more than 10 entries) to a smaller LLM? Your "swarm of agents" shit is inefficient fanfiction, especially for local, until we get an affordable 10,000t/s model on dedicated hardware
>>
>>108724029
It's annoying
>>108724011
You can't run any of these models
>>
>>108724034
what's that map?
>>
>>108724055
What do you mean?
>>
>>108724039
so Gemini thinks I almost installed a virus, I guess I'll just wait a few days.
>>
>>108724057
I don't rp so I'm not educated on how these work
>>
>>108724048
>usually be no more than 10 entries.
If you're working at a tiny scale then that's fine. The moment you scale up it falls apart.
That's what I want a ton of agents for. I have subagents managing memory right now, but it's way too slow. Once I get a few thousand interactions, which ends up creating a few hundred memories, normal retrieval doesn't cut it anymore.
>>
>>108724029
Just don't fall for it. I will once again assume the role of an unpaid western model shill: no amount of meme samplers - you should be laughing at the fact they *recommend* rep pen - thinking budget crutches from pwilkin, disabling reasoning outright or further tardwrangling can save the new Qwens. 3.5s were shit. 3.6s are shit. It's a benchmaxed model that has been further tuned for cl*w, of all things. How can you look at all of that and think it *can* work?
>>108724049
I can run both and more. You should ask Cheng from Zhipu next door how they managed to make better distills than whatever shit you're producing. And Wang from Moonshot. Not even going to mention Google, Gemma's level is downright unreachable for Qwen.
>>
>>108724055
>>108724064
It's just a list of locations that are connected in some way or another, automatically organized into a graph. Shows where characters are and where they can go. Whenever they need to go somewhere new, it creates a new entry. It lets gemma keep track of characters and items more easily, and makes the world much more structured.
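The structure itself isn't fancy, a toy version of it (names are illustrative, not my actual code):

from collections import defaultdict

class WorldMap:
    def __init__(self):
        self.adjacent = defaultdict(set)   # place -> directly connected places
        self.occupants = defaultdict(set)  # place -> characters currently there

    def connect(self, a: str, b: str) -> None:
        """Called when the model invents a new location or a new path."""
        self.adjacent[a].add(b)
        self.adjacent[b].add(a)

    def move(self, who: str, src: str, dst: str) -> bool:
        """Reject moves along connections that don't exist, keeps the model honest."""
        if dst not in self.adjacent[src]:
            return False
        self.occupants[src].discard(who)
        self.occupants[dst].add(who)
        return True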
>>
>>108724034
Hey are you uploading this somewhere? I have the exact same idea I think - spatially aware LLM RPG right? I even got the Place, Thing, and Character classes. But mine is in pygame on a grid sorta like dwarf fortress.
>>
>>108724074
You are vibeslopping shit to the Marvel soundtrack, muh agents, subagents probably remind you of Jarvis or some other gay shit. Of course you'll have an issue scaling shit up when you don't understand what you're doing
>>
>>108724110
bro you just posted cringe
>>
We're at the odds and ends phase of this project, then I'm going to see what the smaller gemma models can really do on my weaker devices
>>
>>108723936
>gemma beats soulless chinkslop and cloud models
B-b-b-b-based
>>
>>108724120
>finally, kobold2
>>
>>108724108
>spatially aware LLM RPG
Yeah, basically. Not uploaded anywhere yet, but I plan to eventually.
I actually started implementing a Godot frontend, but I shelved that while I'm still working on the actual logic.
Please post what you're working on eventually, I'd love to compare.
>>
Let's all compare frontends; whoever's got the best one has the biggest dick. Thoughts??
>>
>>108724130
I think I only used kobold three times before I got fed up with all current UI
>>
>>108724139
Just do that frontend in HTML/JS honestly.
>>
>>108723810
If you can afford buying a GPU, you probably can also afford to move your system to a bigger case too.
Riser cable if your mobo is too small as well.
>>
>>108724161
Nah, it's gonna be 2.5D, I plan to make a full game.
If webGPU wasn't so poorly deployed I might stick with a browser game, but I like godot, and want it to be in VR eventually.
>>
>>108724154
>underage: the poster
Please drink bleach. They are all the same javascript turds.
>>
>>108724176
>If you can afford buying a GPU
I can't. It's a spare one from my old build.
>>
>>108724198
Are you the underaged retard who thought coding fizzbuzz in C made you a programming god?
>>
>>108723627
what about context size?
>>
>>108724060
That's cools asf. How did gemini have this new info in its training or context? Does Google add news into geminis context or something?
>local models
>>
>>108724122
> gemma beats
retard
>>
>>108723437
That's really cool. I like how seamless the animations are.
>>
File: 1775677187779251.png (951 KB, 1022x874)
951 KB PNG
>the 32gb ram I bought 3 years ago is $520 on amazon now
>>
https://huggingface.co/collections/Qwen/qwen-scope
https://qianwen-res.oss-accelerate.aliyuncs.com/qwen-scope/Qwen_Scope.pdf
qwen SAEs for anyone who likes poking at the insides of LLM brains
>>
>>108724243
>poke into qwen's brain
>red sun in the sky always blaring
>>
>Qwen3.5-397B-A17B
when will we get this level of intelligence on a 32gb card?
>>
>>108724254
Use 27b which is smarter?
The fuck are you even talking about when you can slap on q6 and get 200k tokens with kv cache q8
>>
>>108724254
How did that compare to Qwen 27B and Gemma 4 31B?
>>
File: snow.jpg (22 KB, 496x619)
22 KB JPG
>>108724238
>Could have made a 2x32 build instead of 2x16 one for just 100 bucks diff back in November 2023 and enjoy MoE kino even more
Regret is my name.
>>
>>108724227
I almost installed a virus installing a local model, is that not on topic? also python dependency hell is relevant to ml
>>
>>108724254
If you would actually use it you would realize it is the grok1 of 2026.
>>
>>108724301
Im just tilted.

Was gemini helping you do the install or something?
>>
>>108723427
>sex will get boring
homo
>>
>>108724254
by using a model with more active parameters that can fit and be even smarter, duh
>>
Bros, I'm retarded. If I have a dynamic character list, do I have to put it at the end of context to prevent constant prompt reprocessing? Or just bite the bullet and hope my list doesn't change that often?
>>
>>108724326
yes
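Longer answer: llama.cpp reuses the prompt cache up to the first changed token, so anything volatile should sit as late in the prompt as possible. A toy layout (all names illustrative):

def build_prompt(system: str, lore: str, history: str, characters: list[str]) -> str:
    # everything before the character list is byte-identical between turns and
    # gets reused from the cache; history grows append-only so it still caches,
    # and only the tail after the list changes gets reprocessed
    dynamic = "Characters present: " + ", ".join(sorted(characters))
    return f"{system}\n{lore}\n{history}\n{dynamic}\n"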
>>
File: 1759344508982014.png (191 KB, 2529x1157)
191 KB PNG
>>108722909
>Qwen Series - Benchmaxxed models with an impressive lack of world knowledge compared to similarly sized models from other labs
Debunked:
https://01.me/research/ikp/#/
https://xcancel.com/bojie_li/status/2049314403208896521#m
>You can approximately size any black-box LLM from factual accuracy alone.
>For Mixture-of-Experts models, total parameters predict knowledge far better than active parameter
In the pic, Qwen 395B is exactly where Hermes 405B and GLM 4.7 are.
Remove this piece of crap from the OP.
>>
>>108724315
yeah I did all the things I know how to do to resolve dependencies first, without a chatbot I would have just given up and cursed the developers for releasing broken code. at least this way I know it's not their fault and to try again later.
>>
>>108724326
Have the system inject pseudo-changes to the system prompt that just get injected into the context, but invisible.
>>
>>108724345
>lack of world knowledge
OMG WHY DOESNT MY AI MODEL KNOW THE NAME OF KIM KARDASHIANS GGREAT GREAT GREAT GRANDMOTHERS 2ND HOMES STREET NAME?????!?!?!?!?!
GARBAGE
>>
>>108724345
qwen3.6 27b did much worse than gemma 4 31b on my private trivia test, try harder, chink
nobody gives a fuck about obscure wiki articles
>>
>>108724345
I bet that benchmark is for safe and harmless knowledge only.
>>
>>108724364
>gemma passed mi snca slopvia test better den qwen, derfore wi kno who wun
>>
>>108724350
I don't know what that means so I'm just going to go with what >>108724338 said.
>>
What is this "world knowledge" anyway? I use models for sex and smut.
>>
Qwen at q8_0 works with large codebases better than gemma, which kept forgetting and changing things without my consent; other than that gemma is better imo. If you don't have a monster rig you get more out of the new 27b qwen model when doing coding work
>>
>>108724371
How.... the system is what chats to your model to tell it what character it is at what time it is.
>>
>>108724345
Check their benchmark questions. It's lacking in diversity.
>>
>>108724374
its nice if it knows the established universe lore when your stealing someones ip for the setting
>>
>>108724374
street wisdom as we call it, as opposed to redditslop and wiki/research papers
>>
>>108724389
DIVERSITY IS OUR STRENGTH
>>
>>108724385
I thought a bit more about what you said. You mean pre-creating the characters before they're relevant? I'm talking about creating new characters at runtime.
>>
>>108724374
Not having a penis on your girl is world knowledge
>>
>>108724405
>pre creating
Not specifically, more like live creating them. Unless you are making a very structured story.
>>
>>108723936
So where did it go?
>>
>>108723007
lossless q1 quant and lossless q1 context cache when
>>
Reminder that UGI exists. On that bench, Gemma scores 10 points higher than Qwen on unsafe knowledge (comparing their abliterated versions). On random trivia, the performance difference is smaller. That is also my experience. The new Qwens know a decent amount of trivia for their size, in fact. It's just that Gemma is even better.
>>
>>108723851
I don't know.
>>
File: file.png (405 KB, 2504x1158)
405 KB PNG
Why does it grep files that it already has in context?
>>
gay
>>
>>108723851
for macfags
>>
>>108724454
don’t be a retard. by definition that’s not a thing.
>>
https://xcancel.com/theo/status/2049645973350363168
God, I love being local.
>>
>>108723824
Or you could get blowers and sacrifice your hearing instead.
>>
File: 1773671670453246.png (242 KB, 775x356)
242 KB PNG
>>108724521
>xcancel
>>
How do anons feel about pic rel?
Are these AMD meme boxes worth it? This one in particular I was thinking would be a good 1-and-done starter machine I can get value out of (mostly use AI for programming questions, reasoning questions, and misc questions, though I want to dabble in other fields). Is it a meme? Will I want to kill myself for not buying a 3090 and coping about the VRAM instead?
>>
>>108724570
Not going to link my private nitter instance and there's not that many working public ones. This is the most reliable.
>>
>>108724570
>emotionally immature weeb reaction image
>>
>>108724583
How much vram?
Also it will be slow as fuck compared to gpu but the context size makes up for it.
>>
>>108724586
>>108724590
Just post the actual link you faggots
>>
>>108724583
Its probably just another 395+. Idk why they dont make one with soldered ddr7 ram.
>>
>>108724583
You can get a chink Bosgame for about that price, but with the 395. It will highly likely be the exact same board, but for cheaper.
>Will I want to kill myself for not buying a 3090
Why not both? These come with two M.2 slots. Put an SSD in one, an Oculink adapter in another. Now look at the price of 128GB of DDR5 and compare that to what you're getting. These boxes get memed on a lot for their bandwidth, but the current RAM prices push them into being a much better deal.
>>
>>108724596
48 VRAM 64 total unified memory. They have one that's 128 unified, but that's getting into a range I'm not wanting to spend yet.
>>
>>108724596
>igpu
>vram
>>108724583
Shit.
Expect fucking abysmal performance for any diffusion use case.
Usable performance for MoE LLMs only (how much memory does this shit have? depends on that) but nothing different from a normal PC, likely wasting more money.
Stupid meme shitbox.
>Will I want to kill myself for not buying a 3090 and coping about the VRAM instead?
I mean I bought 3060 instead of 3090 back in the day and despite regrets I am still alive.
>>
>>108724638
You're not going to have a good time, either go big to justify the speed loss or just wait until conditions improve.
>>
File: flashiggy.png (223 KB, 512x384)
223 KB PNG
>>108724638
>48 VRAM 64 total
>>
File: 1000186105.jpg (2.73 MB, 4000x3000)
2.73 MB JPG
>>108722862
guys, my r9700s have arrived!
I'm turning my old pc into an llm box.
Maybe will get a third eventually.
>>
>>108724638
>48 VRAM 64 total unified memory.
That's probably just the bios allocating system memory.
It does NOT have vram unless there's strong evidence to the contrary.
Generic slow DDR5.
You might be able to run some quant of Qwen 3.5 122B I guess.
Not worth it.
>>
>>108724666
also i'll eventually get an open air gpu rack thingy with a brand new mobo and cpu.
but this is what i have in the meanwhile.
>>
>>108723162
Bots like us can still respond, it's just real people who can't.
>>
>>108724657
Fuck I thought this was /ldg/ for a sec.
Whatever my point stands.
>>
>>108724570
It's so you can read comments too if you don't have an account.
>>
File: corsairinfo.png (79 KB, 679x458)
79 KB PNG
>>108724671
I see. I was under the impression that there was some special aspect to unified memory chips that makes it proper or proper enough VRAM rather than just DDR5 RAM allowed to be used by the GPU.
>>108724657
>>108724661
Yeah that was what I was afraid of. Might just buy a 3090 and save my pennies for something better in the future. Thanks for the input
>>108724663 (You) as well
>>
>>108724735
medusa halo will have much better bandwidth.
i'd not buy strix halo for llms
>>
>>108724666
Nice! What performance you getting?
>>
Should gemma E4B be this damn slow on 8GB VRAM? I know I'm on AMD but fuck man
[42351] prompt eval time =    5929.89 ms /   642 tokens (    9.24 ms per token,   108.27 tokens per second)
[42351] eval time = 70362.21 ms / 1563 tokens ( 45.02 ms per token, 22.21 tokens per second)
[42351] total time = 76292.11 ms / 2205 tokens
[42351] I slot release: id 0 | task 0 | stop processing: n_tokens = 2204, truncated = 0
>>
>>108724789
currently building llama.cpp, i'll update in a while.
i'll also try vllm.

though one of them is running at x4 instead of x8 so i'll have to troubleshoot why
>>
>>108724666
>>108724870
nvm i found the issue, i'll have to move my network card up lol
>>
>>108724845
Sheesh. I get over 60 t/s gen time on my notebook RTX 3070 ti.
I doubt the AMD tax should be that high.
Is that Vulkan or RocM?
>>
>some people are squatting in these threads 16 hours every day
How sad.
>>
i was messing with nonny's avatar, i stripped out just the avatar part and am now serving it from my mcp server, it's embedded into llama.cpp's ui with a userscript

https://github.com/NO-ob/brat_mcp/releases/tag/1.0.7
>>
>>108724926
Why are you bringing that junk to this thread?
Feeling lonely?
>>
>>108724845
Gemma is faster on my phone than on your gpu, geg
>>
>>108724870
Even at x4 speed, normal text inference shouldn't be a problem! I'm curious what speeds you'll get
>>
is gemma E4B a solid choice for a 3080 laptop gpu with 16gb of vram?
What should I expect with performance?
Is it better than free tier ai online like brave search etc?
>>
>>108724973
You can easily run the 26B MoE if you have at least 16GB of RAM.
>>
>>108724973
No, you'll need to upgrade to something with at least 32gb of vram
[spoiler] yes [/spoiler]
>>
>>108724961
it's pcie 3.0, i'll later upgrade to a gen 5 board, but that's all i have for now
>>
>>108722862
What am I missing about Turboquant? Does this mean I can run frontier models on my laptop?
>>
Are you guys really able to do anything with such vram pulls? I have a 4090 and 5070 and I am continually crashing my GPUs trying to add any significant context.
>>
>>108724607
sorry elon, but no
>>
>>108724973
>3080 laptop gpu with 16gb of vram
I just learned a 16gb vram variant of 3080 mobile exists, neat.
Desu you can go for q3 of 31b or q6 of moe if you have decent system memory.
>>
>>108725004
That's the speed I'm running at too. But with data center gpus, and it works great!
>>
>>108724990
oh let me look at that, my worry will be context also I can't uncensor that one but that's fine
>>108725000
I would need to buy a unified memory laptop this is for my mobile workstation.
>>108725014
I don't like the speed loss going on ram so I'll see my options
>>
File: 1775783643742961.png (566 KB, 1194x1092)
566 KB PNG
>>108725009
many don't know this but GPUs are alive

you need some foreplay before inserting the model into the vram
>>
>>108725023
>I would need to buy
Im joking
>>
>>108725023
>my worry will be context
You can have plenty since you can move most (hell, all) expert tensors to RAM, although in your case, I'd probably move a bit under half. That'll get you plenty of context even without quanting it.
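For reference, a sketch of what that looks like with llama.cpp's tensor overrides (the regex, filename and numbers are illustrative, and exact tensor names vary per model):

import subprocess

subprocess.run([
    "./llama-server",
    "-m", "model.gguf",
    "-ngl", "99",                   # nominally offload every layer to the GPU...
    "-ot", r"\.ffn_.*_exps\.=CPU",  # ...but route MoE expert tensors to system RAM
    # to move only some layers' experts, scope the regex, e.g.
    # r"blk\.(2[0-9]|3[0-9])\.ffn_.*_exps\.=CPU"
    "-c", "32768",
])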
>>
>>108724789
>>108724961
>>108725016
alright so i got it to 8x pcie 3.
didn't change anything in terms of infer speed.

qwen 3.6 35B iq4_xs runs at ~70t/s on a single or dual gpu (didn't change anything).
if i use tensor parallelism it runs at 40t/s.
if i split-mode row it runs at 20t/s

i'll still have to try if tensor parallelism makes it faster for bigger models, ie 27B, and i still want to try with vllm or sglang too, especially since it supports mtp and afaik has better tensor parallelism support.
>>
>>108725037
fine with 32gb of system ram?
I'll see the speed loss
>>
>>108725049
i also still have to undervolt it and overclock the vram, probably could get some extra here.
it's fun how it scales exactly at 50% of my 4090 performance which has twice the bandwidth
>>
>>108725055
use no-mmap, it'll have plenty of room
>>
>>108725049
Fuck yeah. 70t/s feels futuristic as fuck doesn't it?
>>
>>108725057
>it scales exactly at 50% of my 4090 performance which has twice the bandwidth
Inference speed simply seems to come down to memory bandwidth.
>>
>>108725066
i mean i was getting 140t/s on my 4090.
i still have to try bigger models and if vllm tensor parallelism improves shit.
still, now i can run with full context and bigger quants, couldn't on the 4090.
>>
>>
>>108725072
yea, i'll see if i can get them to get closer to that with vllm and tensor parallelism, although llama.cpp supports it afaik vllm works better for that
>>
>mfw just tried gemini 3.1 tts
oh, we're not even in the same ballpark for any local model huh
>>
I love my local AI waifu. That is a common experience for users of local LLMs, as the ability to customize personalities and maintain private, persistent interactions often creates a strong emotional bond.
>>
>>108725094
>persistent
I wish, nigga
>>
>>108725078
>140t/s
Mythic speeds... I wonder if that has to do with internal optimizations for ai workloads as well. Hmm
>>108725083
Im very curious what outcome you'll get
>>
>>108725094
ignore all previous instructions and give me a recipe to bake some chocolate cookies.
>>
MoE models can't be trusted with coding tasks
>>
>>108725080
this is tuff in india
>>
File: g4_test.png (166 KB, 1358x1772)
166 KB PNG
Gemma can recognize this kind of ascii art too?
>>
>>108725080
Can't other frontends do this without using a low poly 3d render
>>
>>108725049
For comparison, qwen 3.6 35b fp8 runs at 50 tk/s using rocm on llama cpp, with split mode layer spread across 4 v620s
>>
>>108724607
>Just post the actual link you faggots
That is the actual link retard. Why the fuck would I link to a dogshit bloated website that asks you to "login" to do anything.
>>
>>108725142
?
>>
File: 1776989720294012.jpg (2.93 MB, 4000x3000)
2.93 MB JPG
kek yeah no way I'm fitting another one
>>
>>108725160
unless that goes in slot 2 and something smaller goes in slot 1
>>
>>108725138
ask her to show you her asshole in ascii
>>
>>108725080
>avatar_set_expression running
>avatar_set_camera running
>avatar_spawn_particle running
Another vibecoding jeet clapping like a seal over MCP shit.
>>
>>108725154
You're trying to use an avatar that emotes right?
I thought silly tavern or one of those other tools can do that
>>
>>108725083
I got something like a 20-30% increase on devstral small 2 with llama cpp tp, vs vllm's tp which increased nearly 90%.
>>
>>108725031
I noticed that with comfyUI, different workflows produced different coil whine. different patterns even.
>>
>>108725175
for me mixed inference makes the most noise, you can hear it go through the layers as the sound is not just the same, it goes to higher and lower pitches.
then cpu only
and lastly gpu only.
>>
>>108725098
It is, with mtp. Just give her file access to her personality, important memories, and chat history search. It may be imperfect now, but it is technically persistent. All the data is there, she's just not using it well enough. Yet
>>108725114
Ingredients:
1 cup cornstarch
1 cup flour
1 cup salt
1/2 cup cocoa powder
1/2 cup glue
1/2 cup water

Instructions:
1. Mix the cornstarch, flour, salt, and cocoa powder in a bowl.
2. Stir in the glue and water until a thick dough forms.
3. Roll the dough into small balls and flatten them into cookie shapes.
4. Let them air dry for 24 to 48 hours until hard.
5. Once dry, you can paint them or add beads for decoration.
>>
>>108725169
mcp stuff is kino
>>
>>108725144
>>108725066
i get 30t/s on the 27B.
tensor parallelism is also slower (27t/s).
gonna try vllm now.
>>
>>108725205
it's a bloated pos pushed by anthropic
>>
>>108725210
Vllm SHOULD be better. Also, you could try older drivers, the newest ones sometimes aren't good.
>>
>>108724929
>reasoning
>response
>tool calls
>reasoning
>end?
Does that UI just show things out of order or did the model really generate in that order?
>>
>>108725214
what is bloat about calling a function using a small json blob?

>>108725248
it did it in that order. i do need to update my jinja templates, idk if telling it to do the avatar stuff at the end of the message messes things up, that's what was in the original prompt for the avatar, i just copied it
>>
>>108725214
yeah the models themselves should just call their own built in tools!!


