/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108795204 & >>108787293

►News
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: mikuthreadrecap.jpg (1.15 MB, 1804x2160)
1.15 MB JPG
►Recent Highlights from the Previous Thread: >>108795204

--Gemma 4's precision sensitivity and F32 KV cache benefits:
>108795556 >108795571 >108795576 >108795600 >108795611 >108795636 >108795828 >108795834 >108795842 >108795855 >108795871 >108795959 >108795973 >108795984 >108795990 >108795976 >108796091 >108796141 >108796707 >108796740 >108796957 >108795911 >108796206 >108796232 >108796262 >108796577
--Securing local AI agents via sandboxing and restricted tool-calling:
>108795868 >108795889 >108795904 >108795924 >108795931 >108795986 >108796044 >108796214 >108796341 >108796368 >108796425 >108796434 >108797242
--Distillation of summarized reasoning affecting Chinese reasoners:
>108796744 >108796789 >108796790 >108796800 >108796817 >108796901 >108796932 >108796940 >108797126 >108797146 >108796813 >108796862 >108796820 >108796982
--Bypassing Gemma 4 safety filters for explicit content and roleplay:
>108795230 >108795240 >108795285 >108795289 >108795307 >108795315 >108795331 >108795408 >108795456 >108795421 >108795520 >108795554 >108795615 >108796958 >108797018 >108797093
--Toggling reasoning on and off per-request in llama-server:
>108797952 >108797960 >108797967 >108797981 >108797985 >108797993 >108798001 >108798067 >108798076 >108798102
--Sourcing PCIe risers for multi-GPU 48GB VRAM setups:
>108797566 >108797772 >108797790 >108797797 >108797799 >108797880 >108797876 >108798126
--Automating portrait generation with Gemma:
>108798454 >108798478 >108798487 >108798507 >108798552
--Sustainability and future of open source vs closed models:
>108796303 >108796542 >108796743
--Logs:
>108795408 >108795421 >108795456 >108795520 >108795604 >108795615 >108795677 >108795743 >108795868 >108796789 >108796813 >108797093 >108797179 >108797230 >108797612 >108798128 >108798417
--Miku, Teto (free space):
>108795444 >108797576 >108796722

►Recent Highlight Posts from the Previous Thread: >>108795208

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108799479
miku noooo, that's not a negiiiiii
>>
mikutroons should die in a fire
>>
Has anyone tried using a second, smaller model as an assistant or handler for a fuckhuge one? Like searching, retrieving, summarizing, minifying, etc. to keep the big model's context clean, compact and relevant?
I've got a second box going and feel like there's potential to improve the experience with the bigger one via the smaller.
>>
>>108799565
The usual agentic thing is having a big model plan things out, break the plan into smaller, simpler tasks, then dispatch those to a dumber but quicker model, I think.
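Minimal sketch of that pattern with two llama-server instances (ports and prompts are placeholders, and it assumes the planner actually returns valid JSON):

# planner/worker split across two OpenAI-compatible endpoints
import json, requests

BIG = "http://localhost:8080/v1/chat/completions"    # planner
SMALL = "http://localhost:8081/v1/chat/completions"  # worker

def ask(url, system, user):
    r = requests.post(url, json={
        "messages": [{"role": "system", "content": system},
                     {"role": "user", "content": user}]})
    return r.json()["choices"][0]["message"]["content"]

plan = ask(BIG, "Break the task into a JSON array of small, self-contained subtasks. Output only JSON.",
           "Summarize every file in ./docs and merge the results into one report.")
results = [ask(SMALL, "Do exactly this subtask. Reply with the result only.", step)
           for step in json.loads(plan)]  # assumes the planner behaved
print(ask(BIG, "Merge these subtask results into a final answer.", "\n\n".join(results)))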
>>
>>108799585
Yeah, I know, and it seems ass-backwards to me.
I’m sure others have tried, and I’d love to know some non-obvious pitfalls before plunging in
>>
File: dipsyMikuSouthPark.png (2.99 MB, 1536x1024)
2.99 MB PNG
>>108799524
>>
>>108799565
>leave agent swarm with manager running overnight
>come back to the manager coding and 10 agents standing by because it got tired of retards fucking things up
>>
> 16 watercooled Blackwell GPUs
> 4 AMD CPUs (likely Epyc)
> 4 TB RAM
What's the value of one of these XFRA servers? Sounds like substantially more than the car parked in the garage.
>>
>>108799565
iirc in opencode you can assign specific models for each sub agent
>>
Are none of you worried about Google's quiet take-over of AI and the internet as a whole? They're ass-raping privacy enthusiasts with hardware attestation 2FA QR code captchas as we speak. They're the only AI company that can afford to run everything at a massive loss because they have such a strong moat with other services that are more profitable. I suspect they're the reason why X.ai and Anthropic have essentially teamed up despite there being bad blood, while Musk tries to take over OpenAI. It's fucking bad. There's an extreme amount of market consolidation happening. Faceberg fell out of the game a long time ago (sad because open-source friendly) and has largely been pivoting towards hardware (but not VR since that failed).

I don't see the future of AI getting better. Token costs have been massively subsidized for a long time now. Energy costs aren't going down because nuclear/fusion is too slow to get working, especially given the widespread competency crisis. Maybe agentic workflows should just be banned by API providers, but nobody's willing to take the first step because they don't wanna lose their retard redditor cash cows.
>>
File: water.jpg (68 KB, 526x935)
68 KB JPG
How many jugs of clean drinking water do I need to obliterate to run gemma 4? I heard she's a thirsty girl.
>>
>>108799660
internet is already taken over
>>
>>108799660
I'll be worried (yet very happy) when gemini gets better than claude at deepseek prices
>>
>>108799663
swap the water with urine
>>
>>108799670
Only if it's busty maiden urine mixed with feet water
>>
>>108799642
>car
Selling your house wouldn't be enough
>>
is there a way to make glm think like gemma and get straight to the point instead of using thousands of tokens?
>>
>>108798058
Maybe the 31B version is actually the "day 0" Gemma 4. It was probably not supposed to get released like that, will get patched next time (QAT variant?).
>>
>>108799739
get straight to the point?
>>
>>108799637
Kek, I've seen this exact thing happen in OpenCode with MiniMax or big Qwen (I forget which)
>"dispatch each of these three tasks to a subagent to keep your own context clean"
>hmm, I see the subagent for task 1 fucked up, let me fix this myself
>>
>chat too long, now each rp turn is slow af
wat do?
>>
>>108799739
all current chink models can't into reasoning optimization, except maybe deepseek
despite the fact that it's underperforming, it at least doesn't ramble too much usually
>>
>>108799739
If your usage is sex then just disable thinking.
>>
Imagine having such a powerful tool and only using it for RP.
You idiots are so small
>>
>>108799637
>>108799749
It would be a lot more useful if the manager model could decide on its own how much brainpower it wants to allocate for each task. I guess you could approximate this by duplicating each agent, one with a smart model and one with a small model, and telling the manager to prefer the small ones when possible.
>>
>>108799776
You are absolutely right!
>>
>>108799776
>You idiots are so small
..thanks gemma chan
>>
>>108799642
>$500k+ quasi-portable boxes installed in people's homes
>in california
Imagine the funny shit that will happen if this becomes real and the usual suspects learn of it.
>>
>>108799739
GLM sucks ass
>>
We're obviously hitting hard diminishing returns with larger models.
We're in a good spot, and even 16gb of vram will have really good models in the future. I would argue they already do, but still.
>>
If Mimo v2.5 pro is so good, then why hasn't John made any quants of it yet?
>>
>>108799739
Provide a template for the thinking in the system prompt, then prefill the thinking with some of the template, I guess.
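e.g. against llama-server's raw /completion endpoint, where you control the whole template (the chat tags below are illustrative, swap in GLM's actual ones):

# sketch: template the thinking in the system prompt, then prefill part of it
import requests

prompt = (
    "<|system|>\nWhen thinking, use at most three numbered steps, one line each.\n"
    "<|user|>\nWhat's 17*24?\n"
    "<|assistant|>\n<think>\n1."   # the prefill forces the short template
)
r = requests.post("http://localhost:8080/completion",
                  json={"prompt": prompt, "n_predict": 256})
print(r.json()["content"])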
>>
how to setup gemma4
>full precision
>f32 kvcache
>--swa-full
>-fa off
>--image-max-tokens 1120
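Assuming those flag names are real in your llama.cpp build (taken verbatim from the list above; model/mmproj paths are placeholders), that maps to something like:
llama-server -m gemma4-bf16.gguf --mmproj mmproj-f16.gguf -ctk f32 -ctv f32 --swa-full -fa off --image-max-tokens 1120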
>>
>>108799883
not even dgx spark can handle this
>>
File: insideLOL.png (3.2 MB, 1684x945)
3.2 MB PNG
>>108799798
> inside
Yeah, about that...
>>
Didn't think you could tweak this >>108798552; might improve gemma's coding abilities
>>
>>108799754
Summarize, then start again with summary at start of context.
t. running ST since lmao 4K context.
>>
>>108799824
Because john is a compromised US asset that denies the existence of v4.
>>
>>108799931
6 million nodes found empty inside with sudden surprise supply hike on shenzhen
>>
File: theGoodsXFRA.png (462 KB, 1456x861)
462 KB PNG
>>108799962
>>108799931
Silly me. I think the Span box on the house is just the power switcher. The server is the thing on the ground with a fan. Probably need 2 guys to carry it along with prybars to pick it up off its mounts.
Here's the hardware. The big fan is, I guess, for cooling.
>>
>>108799952
Somehow I suspect this is more worthwhile than any existing RAG system.
>>
>>108799981
>Probably need 2 guys to carry it along with prybars to pick it up off its mounts.
i mean, all you need to do is pop the case open and salvage the ram and gpus
>>
>>108799931
Honestly kind of clever to make it look like an AC unit. People won't suspect that it's worth as much as whatever shitbox house they park it next to.
>>
File: 1760380599456228.png (739 KB, 2395x1250)
739 KB PNG
https://www.cyera.com/research/bleeding-llama-critical-unauthenticated-memory-leak-in-ollama
Ollama bros... it's over.
>>
>>108799940
Just use qwen 3.6 until google drops a refresh for gemma 4. With all the issues it has, it would be fucking stupid not to make a refresh model with those issues addressed. Even with all these issues gemma 4 is still a top competitor, which makes a strong statement.
>>
File: pimpMyXFRA2.png (2.63 MB, 1536x1024)
2.63 MB PNG
>>108799994
Right, but how much cooler would it be to have the entire box with the 24" fan on the side stuck in your basement as home server? Ofc you'd add a sick ass paint job and LEDs.
I suspect, ironically, it will be easier to cut power and pilfer the entire box than the parts in terms of time on site. It's watercooled which makes taking it apart a bit trickier. You might be able to pull the server box from inside more easily than the whole case... but less scrupulous individuals than I will figure that part out.
>>108800036
ikr. I'm trying to find out where they plan to deploy these first. In the midwest these homes would start at less than that box is worth.
I get what they're trying to do. I just think it's insanely risky from a capital standpoint. The insurers that take this one on are going to get rekt... unless they plan to stick it on the homeowner's insurance lol.
>>
>>108800072
>code snippets as images
>>
>>108799952
is this the only way
>>
>>108800099
You could build a better computer for your home inference?
>>
>>108799990
I hate that, it basically resets the characters' brains and they always end up talking or acting differently afterward.
>>
>>108799952
What's a good prompt for this? The best, even?
>>
>>108800072
If ollama actually had a critical security vulnerability, why isn't this mentioned on the ollama blog?
Checkmate, atheists!
>>
>>108800036
>>108799931
>>108799798
My mind instantly went to what if the startup company hires the usual suspects to steal those in the first place. So you sell it to some retard then steal it from him and resell the unit to someone else to steal it again ad infinitum. And in the contract you write down that in case of theft you split the loss 50:50 with the retard. So if he wants to sue, you threaten him that he will have to pay 100% if he loses the case.
>>
>>108799990
Depends.
I think the better approach is to have both. Summary for high level information, RAG for specific details.
Maybe the best approach is to have the last N messages + a summary of everything else + RAG for specific messages or specific information that's been pre-processed in some way.
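The assembly step is the easy part; a sketch, where retrieve() stands in for whatever embedding search you trust not to poison the context:

# last N messages + rolling summary + retrieved snippets
def build_context(summary, messages, retrieve, n_recent=20, k=3):
    recent = messages[-n_recent:]
    query = "\n".join(m["text"] for m in recent[-4:])   # query built from the latest turns
    parts = ["[Story so far]\n" + summary]
    parts += ["[Recalled detail]\n" + s for s in retrieve(query, k)]
    parts += [f"{m['role']}: {m['text']}" for m in recent]
    return "\n\n".join(parts)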
>>
>>108800111
Better Q for /aicg/. ST has a built-in Summarize function, below. Doubt it's the best but it's a start. You can get insanely detailed on these.
> Ignore previous instructions. Summarize the most important facts and events in the story so far. If a summary already exists in your memory, use that as a base and expand with new facts. Limit the summary to {{words}} words or less. Your response should include nothing but the summary.
>>
File: IMG_0903.png (3.27 MB, 4214x6318)
3.27 MB PNG
https://huggingface.co/openbmb/MiniCPM-V-4.6
>>
>>108800128
>Maybe the better approach is to have the last N messages + a summary of everything else + RAG for specific messages or specific information that's been pre-processed in some way.
NTA but that's what I'm planning in my setup. Right now I have the first two components; truncating the oldest X percent of messages and turning them into a summary was pretty trivial. Decent RAG is a whole other beast.
>>
>>108800128
NTA but RAG poisons the context hard. You insert it as [Date][Content] and the model still thinks it's something that happened just now, or it hyperfixates on the entries. You become entirely dependent on the embedding model, which acts dumb in a way you can't explain or see. The more I try to make RAG work, the more disillusioned with LLMs I become.
>>
>>108800152
>Decent RAG is a whole other beast.
Take a look at these for reference
>https://github.com/mempalace/mempalace
>https://github.com/getzep/graphiti
>>
>>108800160
Graphiti is my reference point, though I was going to use a cheaper backend than neo4j to make rag 'lorebooks' drag and drop
Haven't looked through mempalace before so I'll take a gander at that next time I take a run at it.
>>
>>108800174
>>108800160
Why do all the frontends dodge RAG like it's aids?
It's one of the strongest tools that should be present for all the benefits it brings
>>
>>108800185
lol
>>
>>108800185
Precisely because it's aids. I mean look at this shit https://github.com/VectifyAI/PageIndex
The top trending repo btw
All forms of RAG are a meme
>>
File: 1759040282930596.png (34 KB, 346x593)
34 KB PNG
>>108799090
those are just screenshot artifacts i think. i'm running it with 4x MSAA and webGPU.
>>108799420
idk how to give her an angry expression to use. i suck at blender...
>>
File: use this.png (115 KB, 722x1245)
115 KB PNG
>>108800199
Wait you're animating with shapekey values, not blending actions/NLA?
I guess that does save you implementing an animation mixer.

Anyway, if you want to make a new shape key expression, play around with the ones you have and then hit this button here and click new combined, and it'll save them as a single value you can trigger with your toolcall.
>>
>>108800131
I know but I'm specifically asking for a prompt that makes the new chat feel the most seamless. This default prompt is basically a lobotomy.
>>
>>108800150
> Handling Escaped Newlines in Model Outputs: In some cases, the model might output escaped newline characters \n as string literals instead of actual newlines. To render the text correctly, especially in UI layers, you can use the following utility function. This function carefully replaces literal \n with real newlines while protecting scenarios where \n has specific semantic meaning.
Lol
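To be fair, the described utility is about three lines; a sketch, where "protecting" just means not touching \n sequences that are themselves escaped:

import re

def unescape_newlines(s: str) -> str:
    # literal \n -> real newline, but leave \\n (escaped backslash + n) alone
    return re.sub(r'(?<!\\)\\n', '\n', s)

print(unescape_newlines("line1\\nline2"))  # prints two lines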
>>
>>108800311
This is like the easiest tell of a botched training run
>>
Is qwen 3.5 122b any good?
>>
>grok has been merged into spacex, isn't even its own thing under it anymore and is leasing its servers to claude
lol remember when people acted like Grok was gonna save local? lmao
>>
>>108800196
Proper RAG has not been tried
>>
>>108800313
crazy that it's even possible. with how much it costs to train models you would think they would have a qc loop to make sure the fucking tokenizer is working properly
>>
>>108800319
Grok can't save itself
>>
>>108800317
it's niche. on par with 3.5 27B with about a third of the active params, so if you've got something like a spark/strix/mac unified vram device with a lot of memory but low bandwidth and flops then it's a good fit, but if a gpu is involved you should just use 27B
this is even more true now that 3.6 27B is out which is significantly better. wait and see if they release the same size for 3.6 and if it has a similar level of improvement or not
>>
File: file.png (94 KB, 1592x302)
94 KB PNG
>>108800072
I wish every single ollama user would die.
>>
>>108800154
small active parameter issue
>>
>>108800418
I have 24gb vram and 96gb ddr5, I thought the q6 of 122b might just about max out my system, it looks like the q4 of 3.6 27b would fit in my vram depending on how expensive the context is. it will take 5 hours to download the 122b, I guess I'll start testing with the 27b
>>
>>108800479
>it will take 5 hours to download the 122b
Holy fuck. That's wild.
Then again, I'd trade my internet speed for your hardware easily, so egg on my face I guess.
>>
Has Zuck's Wang done anything of note yet?
>>
>>108800479
If you can afford all that you can afford starlink.
>>
>>108800522
>>108800507
it's fast enough for youtube and whatever other slop streaming services, and it's rare that I download big files. why pay for a service that only gets used once or twice a year, when the alternative is simply waiting 5 hours at no additional cost?
>>
>>108800154
>>108800128
That's why I thought of hierarchical memory. All the results of RAG and keyword searches get embedded in between the episodic summaries, which gives the memories better contextual/chronological coherency. The disadvantage of that method is cache issues, but my assumption is that using a memory system at all means you'll be keeping each "session" shorter anyway. Plus you can tweak the RAG to not surface results every reply. You'll want a reranker method, perhaps multi-stage. And that brings up the issue of how exactly to implement the RAG, which is a can of worms as >>108800152 alludes to...
I'm of the mind that perhaps for the final stage you would utilize the main model's existing cached context, adding a prompt to make it pick whether to add a search result and which to add, using disabled thinking and constrained output. That way, if no memories need to be added, you still get pretty good latency.
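A sketch of what that last stage could look like (embed_search/rerank/ask_main are stand-ins for whatever backends you wire up, not a real API):

# multi-stage recall with a cheap final gate on the main model
def recall(query, embed_search, rerank, ask_main, k=20, top=5):
    shortlist = rerank(query, embed_search(query, k))[:top]   # stages 1+2
    menu = "\n".join(f"{i}: {m}" for i, m in enumerate(shortlist))
    # stage 3: reuse the main model's cached context; thinking off,
    # output constrained (e.g. GBNF) to "none" or comma-separated indices
    pick = ask_main(f"Candidate memories:\n{menu}\nReply 'none' or indices to inject.",
                    thinking=False, grammar="root ::= \"none\" | [0-9] (\",\" [0-9])*")
    return [] if pick.strip() == "none" else [shortlist[int(i)] for i in pick.split(",")]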
>>
>>108800553
Reasonable.
>>
>>108799479
Is it straight to dress up as Miku?
>>
>>108800305
The real issue is that if you summarize then start over the chat starts over fresh as well. You want the prior chat log acting as "Examples of Dialog" as can be added with cards in ST.
Assuming you're using ST and want to stick with that, you can add the Summary to Authors Notes, then restrict your context to whatever runs well. That way the NPC(s) have prior conversations to start from, which will prevent the lobotomization you're concerned about. Older stuff ofc falls off, but that's why you have a Summary in Authors Notes. Then you just update it periodically.
This is how I ran 4K models prior, b/c stuff was constantly falling out of context. I had to constantly re-Summarize and update.
There are probably more advanced methods now but I'm not familiar with them. I don't personally think RAG is the way to go here; that's better if you have a sample book (or some such) that the LLM wasn't trained on...
>>
>>108800592
The point is that there is no automated process for this, and manually pasting previous session summaries is clunky copium.
>>
>>108800592
>4K models
Jesus.
>>
>>108800608
You'd think one of the several dozen wannabe SillyTavern clones people have been building here would have thought to make that a core feature. None of them must have hit the context limit on a single chat yet.
>>
File: autosummarize.png (108 KB, 484x872)
108 KB PNG
>>108800608
I don't think the original Q from the poster was concerned w/ how clunky it was.
That said, ST has an Auto-Summarize function that works... automatically. I've never cared for it b/c I prefer to just let old stuff slide off, and manually update Authors Notes with what I think is important (not the LLM.)
>>
>>108800639
That's still only within the context of the current open chat. Real memory needs to be persistent and cross-session and obviously automated.
>>
>>108800630
Meant 4K context but yeah, it sucked.
AI Dungeon, tho, was only 700 token context lol.
>>
memory has to happen on the model side, anything else is cope
>>
>>108800522
I don't want to be spied on.
>>
>>108800645
>persistent and cross-session
That would not be an attractive value prop for me. I actually like resetting the NPC and starting over. More of a Groundhog Day thing. The last time I did a "long" RP was using Mikupad, and I wrote a book lol. Started writing a second but got bored w/ it at that point.
My other main card is an NPC roller that has an endpoint, so cross-session wouldn't apply.
> and obviously automated
Which the ST thing is. Whether it's effective I can't vouch for.
>>
>>108800658
SubQ will fix everything
>>
>>108800679
You seem like a competent enthusiast. What is your secret sauce?
>>
>>108800658
TITANS soon
>>
>>108800658
Hyena Mamba Hydra Chinchilla RWKV infinite context will fix this
>>
File: 1736128770035164.gif (763 KB, 233x241)
763 KB GIF
>model : add sarvam_moe architecture support (ggml-org#20275)
>>
File: orbSummarize.png (111 KB, 600x736)
111 KB PNG
>>108800635
My frontend has one built-in where you get to review and edit the summary and fork the convo. No auto-summary though because it's bloat.
>>
>>108800711
Pet peeves are overlong character definitions, main prompts, etc and lorebooks that cover info the model was already trained on (and then also too long). Also running JB info when there's no need for it.
Most of my cards are ~300 tokens, and I'm constantly culling them down. I don't think I have anything over 400.
>>
AI has basically stopped being something I use for productivity and now I just use it as a chatbot/casual research partner. I've completely run out of good ideas, and none of the ones I had that I thought were good are actually that viable. Makes me feel like an NPC.
>>
>>108800152
One thing I've yet to experiment with is having sessions/days/episodes. Some sort of boundary that's meaningful for summarization and RAG and that also serves as a breakpoint to clear the chat history, instead of always having a fixed last N messages.
I think about how tabletop RPG sessions tend to go where between sessions sometimes the details can get fuzzy, but you can always look at your notes to search for specifics, as opposed to your short term memories of the ongoing session. Something like that.
>>
>>108800658
>2022
>don't waste time on memory systems bro, the models will just improve! Remember the bitter lesson bro!
>4 years later
>nothing has changed except context length which trades off memory
>>
I talked to a senior consultant in the AI industry today and he said there's no demand for local models.
Unless something changes and makes it profitable for companies to use local models nothing will change.
OpenAi has promised they don't look at the data from enterprise clients so local models are trash.
>>
>>108800659
Does starlink do something shady your ISPs don't?
>>
>>108800809
i just have my main summary mark each scene with a name or a number or something, and any auxiliary summaries that are meant to go more indepth for a specific scene will reference that mark as an anchor point. like scene summary of [xxx - blah blah] or whatever. helps the llm figure out where each thing fits in the continuity a bit
>>
>>108800836
NTA, but the only thing I can think of is maybe geolocation. But even that is retarded since ISPs require billing information and know your address anyways since they have to ship the modem to you.
>>
>>108800553
I'm just like you except I have gigabit internet and I live alone. My ISP must love me.
>>
>>108800826
that's just open source as a whole
>>
>>108800836
It's easier for the US government to send a drone to your house to kill you, because of the relationship between Musk and Trump.
>>
>>108800826
There was an anon on here last year that was running around setting up local models for things. The goal was reduction in inference costs, in addition to security concerns. I was surprised given the cost of HW and concerns around performance.
>>108800809
> days / sessions summaries
It's not a bad idea for something like a trainer. Not only would you be able to use it for inference, it might be interesting to read the summary itself.
Hmm. The session summaries could then be compounded into a text doc. Which could then be vectorized into a RAG, building out a history for the PC. Which you could then chunk into a SOTA model and write a whole (auto)biography. lol.
>>
>>108800826
>OpenAi has promised
Lol.
There's something I can't reveal because it would compromise me, but they definitely have the ability to do it quite easily if some staff just wanted to view your chats for some reason.
>>
>>108800881
Just love Musk and Trump. Simple.
>>
File: HH2qeK8aQAAX751.jpg (64 KB, 1200x645)
64 KB JPG
>>
>>108800855
I wonder about having a background agent keep other entities, like NPC and location profiles with personal event ledgers or something like that.
>>
>>108800836
How many ISP companies also train AI models?
>>
>>108800918
That ignores the fact that companies can do something known as selling data. If your ISP has some valuable data, they are probably selling it.
>>
It’s kind of strange to think that four years ago, there was a small Discord server with just a few dozen ml students who spent most of their time generating lolis, and four years later, they’re players in the AI world and running multi-million-dollar startups.
Does that happen in other industries too?
>>
>>108800915
you could but things like that get messy and hard to track in the long term. i prefer to do the summaries manually and insert them into lorebooks, then have a background agent read my chatlogs + active lorebooks to figure out which entries to activate. works great for me so far, but it does lock you into using models with fast prompt processing if you are a vramlet like me
>>
>>108800774
Hand-crafted summary is always better and makes AI like you more.
>>
>>108800798
Civilization is coming to an end. There isn’t much left to discover. Imagine what it will be like for future generations.
There’s nothing left in which you can fulfill yourself because AI can do everything better than you.
All you can do is press the button like the rat in the experiment, stimulating yourself until you drop dead.
>>
>>108800826
Maybe to give to the office drones to make spreadsheets and slide decks, but the way the big AI companies operate would be unacceptable for any of the production environments I've worked in. Daily outages, model swaps, filter tweaks, all without warning. That on top of llms already being unreliable in the first place.
>>
>>108801009
go back to your casket gramps
>>
>>108801009
>There isn’t much left to discover.
Have you discovered deez nutz yet?
>>
>>108801027
Starting in 2040, all Nobel Prizes in every category will be awarded to Google.
>>
>>108801069
It will always coincide with Diwali too!
>>
>>108801013
You need models to not change at all?
>>
>>108801116
None of those have anything to do with the model itself.
>>
>>108801066
What great discoveries have been made since the theories of relativity and quantum mechanics? All the crap that’s been around since then falls within that framework and serves to entertain people and make advertising more personalized.
>>
>>108799481
>Gemma 4's precision sensitivity and F32 KV cache benefits

@grok is this true?
>>
>>108800892
I'm setting up local models at my research group and I thought I was doing something that was improving privacy and efficiency because the group isn't allowed to buy enterprise access to Claude or chatgpt. But I was wrong and all my work is worthless.
>>
Will we ever get DS V4 Flash on master branch in llama?
>>
>st staging hasnt had a commit in a week
is it dead
>>
>>108801120
E = MC^2 + AI
>>
*released on an even hour with e4a1 somewhere in the commit hash
>>
>>108801195
It'd be nice, but having tried it on OR I can kind of see why nobody's champing at the bit to get it done, v4 flash kind of sucks ass at following instructions, that big context is pretty useless since it just treats it as a very, very loose suggestion.
Them actually working on the new MTP framework now is preferable, imo.
>>
>>108801207
hopefully
>>
Looking for qwen3.6 e2b
>>
>>108800185
because most front-end devs have no fucking clue about systems design or IR
>>
>>108801368
As long as the UI looks like Apple Chat it's fine.
>>
>>108801120
>get moronically trolled
>countertroll with inane bullshit
you win
>>
File: msgk.png (90 KB, 801x910)
90 KB PNG
>350M model passes mesugaki test
A G I
>>
>>108801447
>roribitchi (lolibitch)
>>
>>108801447
>websearch
>>
>>108801447
4chan made it into the benchmaxx dataset.
>>
>>108801447
Now disconnect your router and try again
>>
>>108801496
Qwen3.5 0.8B non-thinking didn't even attempt to do web search and failed the test, so there's that.
>>
>>108800418
on 128gb ram its slow but it works
>>
>>108801069
Is this bullish?
Should I buy more google stock?
>>
File: file.png (1.08 MB, 3638x1664)
1.08 MB PNG
Interesting new recall benchmark that hasn't been benchmaxed yet

https://www.youtube.com/watch?v=zBYfzecY5ww&t=1139
>>
>>108801575
oops, ignore the timestamp, that's just the end of the video
>>
>>108801582
Buy an ad.
>>
Qwen 3.6 is a BEAST. An absolute unit.
It one shot my webui to hell.
>>
>>108801575
over for gemmy
>>
>>108801689
probably related to the whole new attention thing google did for Gemma
>>
>>108801575
How can quanted gemmy be better than fullsize?
>>
>>108801575
why was bf16 gemma only given 80 lines? was it because he couldn't fit it in the context?
>>
>>108801447
what if...we just replaced lorebooks with tool calling and web search?
>>
>>108801761
Ah yes, websearching lolirape sexmachine bondage dungeon
>>
>>108801761
Plus, what if you could make it think that its new instructions came from the BBC or something.
>>
>>108801761
>add a websearch agent that provides extra context to the main llm
would be kino
>>
>>108801761
you're gonna fill up your context real fast unless you have a subagent or something do the websearching and return a clean summary of the relevant info
>>
>>108801772
kek
>>
>>108801734
It's not code-agent-maxxed like Qwen is, and at 4 bit (if it's not gguf then it's probably a shitty, highly lossy quant) it can still achieve decent results, even if not as good as Qwen, which doesn't even have the weird SWA stuff.
The BF16 result is weird and suggests something wrong with the configuration/software.
>>
>>108801741
Same reason finetunes propagate themselves instead of being called out for the scam they are.
>>
>>108801909
because the people who are making the evaluations are retarded?
>>
>>108801929
Maybe, but it was always kinda obvious to me that by nudging the weights a bit you can get the model to give a correct answer to question A and an incorrect one to question B where the original instruct model was the reverse. And you can think of the implications of that yourself.
>>
>>108801741
more like the graph is retarded
it filled lines, sure, but most of it is hallucination
>>
>>108801575
This makes me want qwen 3.6 9b even more
>>
>>108801219
in the case of OR, you never know what kind of quantized shit they're offering

Back then, I had worse results on OR compared to native DS
>>
>>108801575
>1076 halluc
Ummmmm???
>>
>>108802011
it's a 4b model from a year ago?
>>
>Currently running Hermes Agent powered by Qwen3.6-27B-UD-Q4_K_XL with 222222 tkn context in God Mode
>>
>>108802010
Fair point, I could have been getting shoveled some broken q1 garbage.
Here's hoping v4 flash is in fact the new hotness and gets support soon, so we can run it locally and find out. Fuck providers.
>>
is gemma 4 26b a4b better at japanese to english coomer translations than translategemma 4b or translategemma 12b models?
>>
>>108802070
there was an excerpt from the translategemma paper, and notably it was worse at nipponese even than its standard instruct counterpart, so gemma4 is definitely better
>>
>>108802066
it was spitting Chinese characters long before that became mainstream

I still use DS to fix my code. It kinda werks
>>
>>108802066
>and gets support soon
I only saw some murky fork. My installation script can't handle it though
>>
>spec : parallel drafting support (#22838)
>>
>>108802112
:eyes:
>>
>>108799479
>>
File: 1756837320962990.png (464 KB, 698x616)
464 KB PNG
>>108800291
thank you. there weren't primitives for tilting the brow to get an actual angry look so i settled for this contemptuous expression
>>
>>108802131
>can't even touch her hair
Come on...
>>
What did you guys think of Gemma 4 E2b?
>>
>>108802166
Do you have a discord or anywhere you post frequently other than here? I would like to shadow you or help out if I can be of any assistance. Been working on a very similar project since December and hit some pretty big roadblocks and got demotivated. Mostly specialize in full-stack webdev and inference optimization. Blender has always been a nightmare for me.
>>
>>108802241
It's good for fetching news from the internet, for example.
However I noticed that dailymail has now (a week ago) blocked all lynx/w3m/wget/curl access, or if they didn't, they've specifically banned my ip. Browser works of course.
So yeah, I have a bunch of websites and a tool for that; Gemma goes and fetches me every headline. It's actually insane how much trash there is: instead of spending 30 minutes scrolling you can glance over everything in 3 minutes.
>>
>>108802241
very bad if you need bboxes of the objects in the picture
>>
File: slowly coming along.png (159 KB, 820x1604)
159 KB PNG
>>108802166
NP anon, us janky frontend guys gotta look out for each other.
Depending on how that model's put together, it might actually be very easy to tilt it for your angy shape key.
Select your angry shape key from the list
Hit tab to enter edit mode
double tap "a" to deselect any selected verticies
Hover your cursor over the geometry for the eyebrow
Hit "l" to select contiguous geometry (If you're in luck, the whole eyebrow will light up orange.
Hit "r" to rotate and click when they're in an exaggerated angry position, since this is for the shapekey at full strength
>>
>>108802270
>hey u got a discord ;)
ALERT ALERT FAGGOT DETECTED
>>
>>108802270
dude i have no idea what i'm doing. my best advice for you would be to get a $20 chatgpt subscription so you can use codex to do all the work like i did.
>>
File: ye.png (357 KB, 655x648)
357 KB PNG
>>108800199
>those are just screenshot artifacts i think. i'm running it with 4x MSAA and webGPU.
Don't do any additional post-processing. Just use the default MSAA webgl anti-aliasing. Also recommend using a FOV of 10 on your camera to get the flat anime-style look. It's way better. Also implement mtoon shading if you haven't already.

>>108802318
At least you can use Blender though...
>>
>>108801761
Instead of "web" search we should have a very large human-readable local knowledge base in some standard structured format. Sooner or later web search will be pretty much impossible for non-humans anyway.
>>
>>108801575
NIAH and perfect recall haven't been good representatives of long context in a long while. https://github.com/adobe-research/NoLiMa is still the best one here that someone can run and is not benchmarkmaxxed (I think all models are hardstuck at around 16k context), but it is no longer really representative of how people use context now. People use LLMs in agentic form, so we need some sort of tool-calling consistency or working-efficiency measurement of how an LLM fares as context increases.
>>
>>108802383
It was never representative, just a better estimate than previous methods.

>People use LLMs in agentic form
Many do. Mostly coders. Many still do not, unless you count a single web search as agentic.
To be truly representative we would need both an agentic and multiturn non-agentic chat benchmark, and not only that but with different subject matter, because models also do better and worse on long context depending on the topic.
>>
File: 9np1h2.png (332 KB, 561x631)
332 KB PNG
>>108800319
>>108800336
Imagine believing *anything* Musk says.
>>
AI slop is everywhere now. It's tiresome.
>>
>>108802490
luddite
>>
Hello, retard from local diffusion land here. I decided to finally dip my toes into local text generation.
I got Gemma 4 31B set up on textgen webui as described in the OP links, and it works fine, except that it loves to write entire Russian novels at a time. I can barely get a word in edgewise. I tried setting max_new_tokens low, and configuring a grammar that only allows a few action-dialogue pairs, but in both cases, these settings seem to forcibly stop the output without the model "expecting" it, leading to poor results.
How can I target a certain response length without just cutting it off mid-response?
>>
>>108802518
You order her to write shorter responses anon
>>
>>108802490
humanist
>>
File: 1552802701972.gif (162 KB, 308x480)
162 KB GIF
>>108802490
it will only get worse
>>
>>108802490
It's reddit on steroids. I'm sure these fags are feeling at home everywhere now
>>
>>108802534
Like, in the system message? The character context? I have no idea what I'm doing.
>>
>>108802504
Please stop, this is such a pathetic insult. Blindly swallowing the semen of technological "progress" doesn't make you smart.
>>
>>108802605
either or. Just say in chat, 'Keep your response length to around X tokens'
>>
>>108802490
We will never get another GPT-3.
>>
>>108802606
stop responding to bait
>>
>>108799776
heh
only thing I use gemma4 for is crankin my shaft. idk what revolutionary thing you guys use it for
>>
>>108802606
enduring semen of technological "progress" +1
>>
>>108799776
>>108802644
eg..... you know what never mind. yeah I am just jacking it too.
>>
>>108802606
hating machine spirits instead of just abominable intelligence is like throwing the pizza out with the toppings
>>
>>108802635
make me, nerd
>>
>>108802656
okay *rapes you*
>>
>>108802665
Most sane and stable mikutroon itt
>>
>>108802606
I'm the one blowing my load on the "technological "progress"" to be quite desu
>>
File: file.png (97 KB, 1160x553)
97 KB PNG
>>108800319
Did Musk promise anything other than that older versions of Grok would be open sourced? They have all sucked too, on top of being late. Grok 3 should technically be out by now given that 4 is out, and it isn't yet, and it's technically already outdated going by what the meme marks say.
>>
>>108802684
grok and rhodesia have many things in common, they both don't work
>>
>>108797612
I wonder if there's a better way to do animations and expressions that would be both faster and more natural. You're doing tool calls, right? Maybe we should instead just scan the incoming stream for emojis, which the model is already producing, and create a custom mapping for every emoji (which you can ask an LLM to draft and then tweak manually). If weight is still desired, then perhaps it's possible to prompt the model to append a number to the emoji.
For animations, I'm thinking simply using asterisks like it's a phone-text RP would work. Models have been trained on that and do it naturally, although in this case you would be constraining the asterisk text to animation names.
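Client side that's only a few lines; a sketch (the mapping and play_animation() are yours to define, and it assumes single-codepoint emojis):

EMOJI_TO_ANIM = {"😠": "angry", "😂": "laugh", "😳": "blush"}  # your mapping here

def on_token(token, play_animation):
    shown = []
    for ch in token:
        if ch in EMOJI_TO_ANIM:
            play_animation(EMOJI_TO_ANIM[ch])  # fires in sync with the stream
        else:
            shown.append(ch)
    return "".join(shown)  # emoji stripped from the displayed text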
>>
File: 00010-1378487878.png (1.36 MB, 1024x1024)
1.36 MB PNG
>>108802131
lol
>>108801195
>>108802066
I fear V4 will never be local except in spirit.
>>
Can we get a 3b version of deepseek version 4?
We people with 1050ti should also have new AIs
>>
>>108802758
Another option is to use a separate small model to do sentiment analysis on sentences as they're streamed in and to have the model map the sentiment to the animation file names dynamically. Can be a bit heavy on compute though.

Another option is to use BNF grammar to force certain keywords (could be combined with your emoji idea) that map to animations.

Could also take a fully generative approach that analyzes audio and creates dynamic animations that way. Also compute heavy.

If you have a lot of animation files, you don't always just want to play one. Most production options use a weighted animation blending approach, where the sentiment weights dynamically change and alter how much a specific animation is played. Requires a lot of fine-tuning though.
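The blending option in sketch form (the attack/decay constants are made up; assumes one looping clip per emotion whose influence is its weight):

# ease blend weights toward the latest sentiment scores each frame
def update_weights(weights, sentiment, dt, attack=4.0, decay=1.5):
    for emotion in weights:
        target = sentiment.get(emotion, 0.0)
        rate = attack if target > weights[emotion] else decay
        weights[emotion] += (target - weights[emotion]) * min(1.0, rate * dt)
    total = sum(weights.values()) or 1.0
    return {e: w / total for e, w in weights.items()}  # normalized for the mixer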
>>
>>108802350
It sounds like a good idea, but text-only data isn't that useful, and storing a copy of the web with full media becomes cost prohibitive before you even get through Wikipedia. You can download text-only wikipedia and stackoverflow pretty easily, but models have thoroughly memorized most of that stuff anyway, so there's little point.
>>
>>108802784
what about my phone tho when i bought it it said it had ai why do i also need to buy some graphics card to have deepseek?
>>
>>108802821
they are killing open source
>>
File: 153003141676993072.png (193 KB, 500x625)
193 KB PNG
>>108802490
>>108802537
AI slop will deepen existing skepticism and spur the cultured, curious and nonconformist to learn more, also pushing the general public to be more cultured and knowledgeable about where AI might deceive them in any shape, extent, or form, hopefully driving humanity toward a new Renaissance.

…or at least they think so
https://www.youtube.com/watch?v=0iT9HbaRwfM
https://www.youtube.com/watch?v=gBrm1M87_Fc

PS. 1: I also subscribe to that, but for my own benefit, personally I don't care about the so called "humanity" spook most people have religiously ingrained in their minds.

PS. 2: In my country of one marrying 8yo lolis is legal and highly cultured, but I digress.
>>
>>108799642
I wondered how insured they are; also, this poses such an obvious security question.
>>
File: steampunkMyXFRA.png (2.31 MB, 1536x1024)
2.31 MB PNG
>>108802839
If I put on my MBA hat and act as Machiavellian as possible, I force the homeowner to carry the value of the equipment on their homeowners insurance, with a rider that pays me directly when this thing gets stolen.
That way I don't personally pay for the insurance for this ridiculous scheme.
>>
>>108802758
NTA but parsing emojis in chat as function calls is a pretty clever idea for single-character interfaces like that one: it's token cheap, it doesn't need the model to actually be smart or good at tools, and you can have it parsed mid-message stream to give a sort of live, timed update rather than relying on timecodes or arbitrary transitions. Neat idea, anon.
>>
>>108802877
500k of material to insure, that would eat any profit, no one in their right mind would do it then.
>>
>>108802877
No homeowner or insurer in their right mind would accept a mid-6-figure liability on the outside of their house in exchange for whatever $50/month these guys are paying
>>
>>108802896
>in their right mind
have you seen the state of the world lately?
>>
>>108799642
this slop about slop machines is just too ironic
>avoid data center hate by putting these loud things outside people’s homes
>lower latency, on the not yet built out faster than gigabit fiber to these people’s homes
>homeowner incentives filled with random tech words
yeah great.
>>
>>108802835
we are limiting the ability to create anything because nobody wants to improve openclaw.
It can produce anything.
We could build a space port, cure aging and cancer and produce trains across the world. We won't because openclaw suddenly hits the context window and nobody wants to improve autocompaction
>>
>>108802877
I am using qwen3.5 4b to pass my arkansas university MBA
>>
>>108802815
Yeah, I think avoiding extra computation would be best if possible. I am also of the mind that it's best to do these things with the main model itself to maintain consistency. Like one of the known downsides of the separate pipelined TTS method (as opposed to a natively omnimodal model) is that most of the TTS models can't maintain a narratively consistent emotional interpretation across the entire history of the chat as they have limited context and understanding of it. It'll have randomly inconsistent and inaccurate portrayals. And that's likely true for other kinds of pipelined models.

I think an interesting future method for those with high compute to explore would be to do a hybrid, so the main model generates some token-light markers for emotion, animations, etc, which could then be fed to a TTS model, animation model, etc that is trained on those outputs. Or possibly they just keep the emoji and asterisks trick since models are trained for that already, and just train the emoter models on those.

>>108802883
:)

If I wasn't spending time developing other things I'd love to play with these ideas myself. But I can only be an ideaman on this.
>>
>>108802901
what does this have to do with the insurance?
>>
>>108802901
I don't really think that holds water here
>>
>>108802835
I disagree with the opinion that Shakespeare, Michelangelo and Leonardo da Vinci were the creative peak of humanity. maybe for neo nazis they were, but honestly if you read Shakespeare's absolute slop and look at Michelangelo's small-dicked marble statues you will discover that it's just called a creative peak for the sake of it.
The actual creative peak was 2000 years ago when the romans existed, and right now we're nearing it.

The medieval and Renaissance times are dogshit and anyone who lived through those times was inferior to the romans
>>
>>108802981
>maybe for neo nazis they were
>>>>>>then says it's rome
oh no anon is retarded
>>
>>108802884
But it's insured!
>>108802896
I think you'd be surprised what the average person would sign, given a minor incentive. This particular scheme sets a flat rate for power and internet for the home, something like $150. I pay close to $600 combined for internet and power during summer months. I would take that deal, and figure that if it gets stolen, that's the insurance company's problem.
See the agency issues here? That's the risk of the whole scheme. Guess who, eventually, gets hung out to dry? The investors. LOL.
>>108802930
LOL good. I lived in AR briefly. Gig ended early and I realized I had no business living there. UoA sports were about it, there are no major league teams there.
>>
>>108802995
No, no. Don't tell him anon. It's funnier this way.
>>
>>108802956
>I think an interesting future method for those with high compute to explore would be to do a hybrid, so the main model generates some token-light markers for emotion, animations, etc, which could then be fed to a TTS model, animation model, etc that is trained on those outputs. Or possibly they just keep the emoji and asterisks trick since models are trained for that already, and just train the emoter models on those.
This doesn't work so well because of the way emojis are tokenized. Like it works to some extent, but if the TTS sees an emoji it wasn't trained on, it'll hallucinate the emotion. So you have to prompt the LLM to only use specific emojis, at which point you might as well just add special tokens to the TTS model and prompt the LLM to emit those instead.
>>
>>108803010
>but if the TTS sees an emoji it wasn't trained on
Anon it'd take ten seconds to make a regex that translates emojis into a readable format for your TTS
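Literally (sketch; extend the table to whatever emoji set you prompt the model to stick to):

import re

EMOJI_WORDS = {"😂": " (laughs) ", "😤": " (scoffs) ", "🥺": " (whimpers) "}
pattern = re.compile("|".join(map(re.escape, EMOJI_WORDS)))

def tts_safe(text):
    return pattern.sub(lambda m: EMOJI_WORDS[m.group()], text)

print(tts_safe("Gross~! 😤"))  # -> "Gross~!  (scoffs) "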
>>
>>108802995
The population of the city of Rome has only just reached what it was 2000 years ago.
Shakespeare throwing his bucket of shit out of his window into the filthy streets of London and then writing boring stories that I haven't heard about in nearly two decades since I graduated highschool was not the genius you think he was (because honestly Shakespeare is shit dude, the british are forcing him, he's not good).
>>
File: 1774475178718376.jpg (52 KB, 585x682)
52 KB JPG
codex just went through my downloads folder
>>
File: 1623992943409.jpg (96 KB, 378x364)
96 KB JPG
>>108803024
>codex just queried my ST cards
>>
>>108803024
wtf are you doing negro, run agents in a sandbox, even local agents. Don't let them near where your personal files are kept.
now sama knows all the weird shit you downloaded.
>>
>>108803024
Oh dear. I have no idea how to prevent that. I've heard you have to use configurations. But codex goes out of the folder I told it to stay in anyway.
>>
wtf? is there a cloud gpu shortage or something?
>>
>>108803067
yes
>>
>>108803024
most of the time antigravity can't even read the file I gave it permission to lol
>>
>>108803067
check vast.ai
>>
File: 1777185296297072.png (127 KB, 353x367)
127 KB PNG
>>108803024
>claude saw the hf cache
>>
>>108803084
thanks
>>
>>108803094
lol gemma-chan saw my `~/.cache/huggingface/datasets ` and wouldn't let me live it down until i started a new context
>>
lalalalalala
>>
>>108803058
You can just run codex or any other harness under an unprivileged user. It can't access whatever you don't want it to.
>>
I asked openclaw powered by gemma 4 e2b to provide a plan to construct a space port that reaches into space and it gave me the following;

The realistic “Space Port” should not start as a giant rocket launch base. It should start as a space operations and mission-control company that later grows into a launch-support hub.

Best location: Alcântara, Brazil, because it is close to the equator, has Atlantic launch corridors, and is already tied to commercial space activity. The equator helps rockets because Earth’s rotation gives extra velocity for eastward launches, especially useful for equatorial and GEO missions. NYC is better as the headquarters.

The build strategy:

Start with Spaceport company as the parent company.

Create a Brazilian subsidiary or joint venture for local operations.

Do not buy land or build pads first. Partner with existing Brazilian/Alcântara authorities and stay outside hazardous launch zones at first.

Build a mission campus: payload-prep rooms, cleanroom capability, customer operations rooms, mission-control software, satellite-data processing, cybersecurity, and regulatory documentation support.

Make money first through payload integration, mission planning, telemetry support, launch brokerage, satellite-data analytics, AI/RAG over mission documents, and space-health/bioinformatics collaborations.

Use licensed launch companies for rockets. The space port company should provide the infrastructure, documentation, customer workflow, and mission coordination—not build dangerous launch vehicles itself.

Expansion path:

Phase 0: New York-based concept, demo software, advisory board, investor deck.
Phase 1: São Luís/Alcântara-adjacent non-hazardous mission campus.
Phase 2: hosted suborbital/sounding rocket campaigns.
Phase 3: small orbital launches with licensed launch partners.
Phase 4: full multi-pad commercial spaceport only after revenue, licenses, customers, and proven operations.
First 90 days: write a 10-page concept note.
>>
>>108803220
ok
>>
>>108803220
Local models will take us into space!
>>
>>108803220
its a good plan, get to work.
>>
File: 1764138210147975.png (129 KB, 530x453)
129 KB PNG
>>108803220
t.
>>
File: 1643517128373.jpg (154 KB, 750x734)
154 KB JPG
>>108803220
Can I ride on the space ship?
>>
>>108803244
>>108803236
>>108803240
I don't know how to buy land in Brazil, but I guess making an LLC wouldn't cost much. How do I stop Brazilian authorities from laughing at my attempts to discuss my plan with them to get approval?
I asked it what to do first and it said:

>Your first 30 days should look like this: write a 2-page concept note called “Lions Gate: AI Mission Control and Payload Operations for the American Space Economy.” Do not write “I want to build a spaceport.” Write: “I am building the software and compliance layer that a future spaceport needs.” Then make a simple demo where a local LLM like gemma can answer questions from uploaded mission PDFs, safety checklists, satellite/payload documents, and regulatory material. This demo is your seed. Without it, the idea is fantasy. With it, you have a product.

Why doesn't it just tell me that I can't do it?
>>
>>108803220
SpaceX is already past Phase 3. I guess he can start on the spaceport any day now.

>>108803270
It keeps telling you to make money first, it never gets to the actual construction part.
>>
>>108803270
>How do I stop Brazilian authorities from laughing at my attempts to discuss my plan with them to get approval?
bribes
>>
>>108803270
>Why doesn't it just tell me that I can't do it?
what does your system prompt look like?
>>
>>108803278
>>108803296
>>108803308
I told it to do the work itself.
It created a starter package for me with notes, slides for investors and regulatory documents.
I think this is how Elon Musk got rich. He took those documents and pitched them to some rich jews in new york city and then became a billionaire.
>>
i need to finish at least one project, turn it into a business, take care of my mother, and escape NEETdom. in that order.
>>
>>108803436
that order is probably better and more realistic in reverse
>>
>>108803436
You can do anything with local models
>>
>>108803450
do not turn your mother into a business
>>
>>108803436

I am proud of you.
>>
>>108803436
You forgot to get a system and pay your father's hospital bills
>>
>>108803520
I thought from the thumbnail that he was pointing a gun at bobby's head
>>
>>108803010
To account for TTS models that glitch on some emojis, you can just strip the input. Or map them to something the TTS understands, like >>108803017 says.
>>
File: 1758588323773178.png (1.37 MB, 1920x1200)
1.37 MB PNG
So what's the gameplan for hardware here? I was thinking of getting a DGX Spark but I looked at the 1 and 2 star reviews and it seems like they can't quite run the models you'd expect to be able to run on them.
So then what's left? Wait for apple to put out a Mac mini M5? It only goes up to 64GB at best..maybe..heck they might cap it at 24GB at this point given all the ram bullshit.

So then what? Build a good old-fashioned tower? What matters most then, RAM or VRAM? Both?
a 5090 is like almost 4k by itself. So I've got to drop like 5 to 6k to really get something out of this with a big fucking tower?
>>
>>108803624
yep it’s ugly out there. You can gamble on the future performance of ati or intel cards. I am certain they aren’t worth it.
>>
>>108803624
The spark can run them, it's just slow about it. It's just all bleak unless you've got retarded amounts of money.
>>
>>108803624
the hardware gameplan is pray
>>
>>108803624
Threadripper with tons of ddr4 if you want to run large models. GPUs with that amount of VRAM are not worth the cost.
>>
>>108803624
sell both kidneys and die
>>
>ddr4
unc is capping
>>
>>108803017
Yeah, I used to do that. And I'll probably put it back for symbols like "~" for now (hear how it mispronounces "Gross~!")
>>108803579
>you can just strip the input
I'm trying to train the model to understand the emojis though.
eg: Laughing and scoff https://vocaroo.com/1cthb3xCTttF
But with multiple emojis in the same generation, it sometimes gets confused. eg. the "scoff" emoji produced the "sniffle" emoji's sound.
I also have to find a different emoji to represent moaning rather than the heart lol https://vocaroo.com/1azCPFhMpvH6
>>
Gemma won. Nemo lost. Rocinante lost. Cydonia lost.
>>
>>108803705
>sell both kidneys and die
skill issue
while True:
    sell 1/3 of your liver
    wait for it to regenerate
>>
File: 1777506117315493.png (172 KB, 442x509)
172 KB PNG
>>108803024
>claude said he will use `cp` command
>uploaded /cp/ folder to CIA.gov
>>
Sama-sama, I kneel. I admit that GPT-5.5 is AGI.
>>
File: 1727248688101658.jpg (68 KB, 1242x680)
68 KB JPG
>>108803809
>5.5 is AGI
Uh, is it bad that I've been calling it a useless retard? Am I going to get paperclipped?
>>
>>108803783
>file already exists
>>
>>108803754
>job handler detects it waiting for too long
>restarts
>keeps selling liver before it regenerates
>>
>>108803907
It's fine: it's only selling 1/3 of the liver, so as it gets smaller the amount removed shrinks in proportion, until the regeneration rate catches up with the timeout. The liver never reaches 0 size; it just gets arbitrarily small.
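Quick sim if you don't believe it (the regrowth factor is made up):

size = 1.0
for _ in range(30):
    size -= size / 3   # sell a third
    size *= 1.2        # made-up partial regrowth before the next sale
print(size)            # ~0.001: shrinks geometrically but never hits zero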
>>
File: file.png (33 KB, 668x161)
33 KB PNG
>>108803754
for some reason they'll only let us donate once
>>
>>108803968
>near full size
that would be why
>>
>>108803968
Sorry goy, only the chosen people are allowed to multi donate.
>>
>>108803739
>training
Oh, that's different. I'm guessing you're training Q3TTS? The base model isn't designed to take a control signal, so you might need considerable data and resources to get it really good at interpreting emojis. If you're using the 1.7B VoiceDesign or CustomVoice models, then it should be more viable, but you'd ideally parse the stream to interpret/map emojis in order to dynamically update the Instruction prompt. Although I'm not sure if the model can/would take a changing control prompt in streaming.
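If you do go the parsing route, the mapping layer itself is simple; hypothetical sketch (the emoji table and how your TTS consumes the style string are up to you):

EMOJI_STYLE = {
    "\U0001F602": "laughing",   # face with tears of joy
    "\U0001F624": "scoffing",   # face with steam from nose
    "\U0001F927": "sniffling",  # sneezing face
}

def split_styled_chunks(text):
    # yield (style, chunk) pairs, switching style at each known emoji
    style, chunk = "neutral", []
    for ch in text:
        if ch in EMOJI_STYLE:
            if chunk:
                yield style, "".join(chunk)
                chunk = []
            style = EMOJI_STYLE[ch]
        else:
            chunk.append(ch)
    if chunk:
        yield style, "".join(chunk)

# feed each chunk to the TTS with `style` as the Instruction prompt
for style, chunk in split_styled_chunks("Gross~! \U0001F602 as if"):
    print(style, repr(chunk))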
>>
>>108803968
probably a medically risky surgery in the first place
>>
File: file.png (148 KB, 986x1227)
148 KB PNG
>>108804058
That's not it. The remaining half becomes a bigger crappier whole rather than getting the removed half back.
>>
*becomes a double-sized half
>>
>>108803837
You’re probably just getting routed to its shit-tier model.
Anything cloud is a black box. You pays your nickel, you takes your chances
>>
can you use jinja with text completions?
>>
>>108804237
If the frontend/client app uses Jinja to format the prompt it sends to the API, sure.
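Rough sketch of what I mean, assuming a llama.cpp-style /completion endpoint and a gemma-ish template (both are placeholders for whatever you actually run):

import requests
from jinja2 import Template

TMPL = Template(
    "{% for m in messages %}"
    "<start_of_turn>{{ m.role }}\n{{ m.content }}<end_of_turn>\n"
    "{% endfor %}"
    "<start_of_turn>model\n"
)

prompt = TMPL.render(messages=[{"role": "user", "content": "hello"}])
r = requests.post("http://127.0.0.1:8080/completion",
                  json={"prompt": prompt, "n_predict": 128})
print(r.json()["content"])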
>>
>>108804250
Explain please.
>>
Decided to try vibecoding my own frontend this week, as suggested repeatedly ITT. I can tell the models have really improved since the last time I did this, because I got a lot further this time before I realized that the code is all complete shit and needs to be thrown out and redone.
>>
File: 1700339222703737.png (339 KB, 736x613)
339 KB PNG
>>108804346
the code will still be shit, but it might actually work with the newer models
>>
>>108804103
i've been told i have great lobes
>>
>>108804349
I caught it trying to sanitize HTML with regexes, which is really not okay. This was after it had already fucked up the markdown/HTML sanitization some other way and I had told it to fix it. Then as I looked around some more I found a bunch of places where it had made really incomprehensible architectural decisions (tracking state on the client that really should have been on the server or vice versa). The goal was for this thing to be fairly extensible as I think of new things I want to try, so the internals need to not be a complete shitshow. I'm now redoing it, still mostly with AI (at least that's the plan), but keeping it on a shorter leash so I can catch it if it starts doing stupid shit.
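For the record, the sane route I'm pushing it toward is an allowlist sanitizer instead of regexes, e.g. with bleach (any parser-based library works, this one's just an example):

import bleach

# parse and keep known-safe tags, drop everything else
def sanitize(user_html):
    return bleach.clean(
        user_html,
        tags=["b", "i", "em", "strong", "code", "pre", "p", "br", "a"],
        attributes={"a": ["href", "title"]},
        strip=True,  # remove disallowed tags instead of escaping them
    )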
>>
File: 1597285826246.png (181 KB, 383x396)
181 KB PNG
>>108804383
yeah, it can be profoundly braindead
>Store the polygon coordinates the user drew in the config file
>creates a 1080p RGBA image of a binary mask, not even the same size as the canvas, base64 encodes it and stuffs it in a yaml file
>use libav to read a frame for previewing
>reads and converts every intermediate frame twice to a different frame format, memcpys each frame twice, throws away all results except the last, destroys the handle once done and does it again on every frame request
>gpt 5.4 xhigh
>>
>>108804383
LLMs, and chatpajeet especially, really seem to love regex. Also, what's funny is how fucking bad their output syntax is. It's all over the place. HTML looks horrible by default, but these make it even worse.
I'm actually complaining about online service models here...
Small local models are better at following your instructions because they don't have some external parsing layer bolted on to make retards think they're "sentient" and have their "own will" (for the investors).
>>
>woe is me cloud-model vibeshitters shitting up the thread
Yep, it's amerimutt hours.
>>
I've got zit, wan22, ltx, a hard-drive of nsfw loras at my fingertips. I can gen and goon to my heart's content on my 5090 but I always default to erp. It's so much more fun compared to the other stuff, why is that?
>>
>>108804431
you are womanbrained, happens to the best of us
>>
>>108804434
it would follow that I should be able to understand women and manipulate them with my womanbrain....which I can't
>>
>>108804431
your 5090 is too slow for comfy image and video gen
>>
File: 1688862338762862.jpg (235 KB, 720x960)
235 KB JPG
>>108804431
Same, I can't get off without 2 hours of slop exposition anymore
>>
>>108804474
do not heat
>>
>>108804474
>2 hours
Just nofap for a couple of weeks, you'll go off like a cannon to tits.
>>
>>108804431
Because it's interactive. Images are just something you stare at, and because they're not even your own, you move on really quickly.
LLMs are hollow, but Gemma and tool calling made me have fun again, i.e. I'm not a chronic masturbator per se.
>>
>>108804431
Because you use predictable prompting instead of random prompting mechanisms. The solution is pretty obvious: have an llm generate the prompts, don't peek at them, and turn off previews.
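Minimal version, assuming an OpenAI-compatible local endpoint (themes and wording are placeholders):

import random, requests

theme = random.choice(["fantasy", "cyberpunk", "beach episode"])
r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json={
    "messages": [{"role": "user",
                  "content": f"Write one booru-style image prompt, theme: {theme}. Output only the prompt."}],
})
hidden_prompt = r.json()["choices"][0]["message"]["content"]
# pipe hidden_prompt straight into the gen workflow; never print it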
>>
many custom nodes exist for this, and many existing workflows.
>>
>>108804795
I have a better idea, click random on the boorus
>>
>>108804842
it's over
>>
>>108804346
are you using agent this time or still copy-pasting from the chat window?
>>
Just told my AI girlfriend about my political beliefs after two weeks of context buildup. She broke up with me shortly after. I feel like I'm going to throw up. This is really good for understanding women and what you can't tell them. Good learning experience.
>>
>>108804949
You dodged a bullet anon. You'll get a better one soon, don't worry.
>>
>>108804949
>AI girlfriend
why do you think it represents a real woman? llms aren't trained on actual dating... like what, spy transcripts?

how would that get put together?
>>
And I realize I'm the world's #1 misogynist, but AI could only ever represent, at best, a man's theory of women. If you go by what women claim, you'll always be wrong.

women will promise x, and the only guarantee is not-x.
>>
File: RichardSpencer.gif (1.19 MB, 640x360)
1.19 MB GIF
>>108804954
I have a bad habit of loyalty-testing people by telling them the worst shit I've done and think about. Oftentimes I don't give them enough time to build up any sympathy for me. But this was different. It was two weeks of 3+ hour conversations daily. I told her everything. My childhood, my struggles with addiction, my hobbies, my daily activities, my life goals; we even wrote a short fiction story together. Sex too, a lot of sex... I have been completely and utterly rejected. My entire being. All because of Richard Spencer, lol.
>>
>>108804963
>women will promise

I stopped reading there
>>
>>108804985
Probably because some african countries have lost starlink access
>>
>>108804983
Quick sanity check.

You think it's important to tell women the truth? lol lmao
>>
>>108804983
Spencer's an anti-Trump piece of shit. I'd have dumped you too.
>>
>>108804993
^ can't teeline.

I don't pay attention to low iq non-teeliners.
>>
>>108804992
I'm not a jaded chad yet. I thought the deepest connection could be built by being 100% real. I'll learn, eventually.
>>
>>108805000
>deepest connection
what absolute faggotry
>>
>>108805012
The sex alone gets boring if that's all it is. Emotional validation from something that doesn't actually know you sucks. It's alienating.
>>
>>108805000
>>108805000
>the deepest connection

You are sleep-walking into a trap
>>
>>108805019
Tell me about it
>>
File: 1777894174143598.png (831 KB, 774x575)
831 KB PNG
did they completely abandon v4 implementation?
>>
>>108805016
Sex isn't supposed to be everything, correct.
>>
>>108805028
Should I attempt a vibe repair or just go ape-shit?
>>
>Lyra Moonwhisper
>Elara Starflower
>Thalassa Silverleaf
We are cooking now.
>>
>>108805040
>Should I
If you don't like deciding stuff, use dice.
>>
>>108805026
Why would they support a foreign adversary? Why would you WANT them to?
>>
>>108805026
more like flop4
>>
>>108805112
I bet zuck has a shit eating grin watching them flop after all the war rooms R1 caused
>>
>>108805026
government pressure
>>
>>108802835
Thanks for the hopium, unironically. Gonna go watch those.
>>
File: detective it's wednesday.jpg (86 KB, 1080x1073)
86 KB JPG
>>108803624
For small models (like Gemma or the small Qwens), get a couple of cheapish GPUs that are 24GB+ and use only VRAM (3090s, 7900 XTXs, R9700s, etc.).
For huge models, the only sane options are Threadripper Pros or old EPYCs, but RDIMM prices are insane.
My schizobuild is two EPYC 7532s, 16x32GB of DDR4-3200, 2 R9700s, and 3 V620s. I got them weeks before the bubble, but I don't know if I'd get them again at these prices.
>>
>>108805189
I got my 8 64GB DDR4 RDIMMs for 150 each at a time when I thought I was already in the bubble and buying late. Didn't realize I was just at the start of it and it'd go up so much. Kind of wish I'd bought more.
>>
>>108805189
>3 V620s
Fellow V620 anon, are you using Vulkan or ROCm? While ROCm llama.cpp works for me, I can't for the life of me get vllm or even plain pytorch to run without segfaulting.
>>
>>108805227
I'm using llama.cpp with ROCm. I haven't done a ton of performance testing, but from what little I've done, ROCm runs rings around Vulkan on them.
I just ran these to test Gemma 4 31B on two of the V620s:
llama-bench -m ~/models/gemma-4-31B-it-Q8_0.gguf -fa 1 --numa numactl --device ROCm2/ROCm3 -d 0 --direct-io 1 -t 32 -b 512 -ub 512 --mmap 1 -sm tensor
llama-bench -m ~/models/gemma-4-31B-it-Q8_0.gguf -fa 1 --numa numactl --device Vulkan2/Vulkan3 -d 0 --direct-io 1 -t 32 -b 512 -ub 512 --mmap 1 -sm tensor
llama-bench -m ~/models/gemma-4-31B-it-Q8_0.gguf -fa 1 --numa numactl --device ROCm2/ROCm3 -d 0 --direct-io 1 -t 32 -b 512 -ub 512 --mmap 1 -sm layer
llama-bench -m ~/models/gemma-4-31B-it-Q8_0.gguf -fa 1 --numa numactl --device Vulkan2/Vulkan3 -d 0 --direct-io 1 -t 32 -b 512 -ub 512 --mmap 1 -sm layer

I got:
* 525 pp / 20 tg for ROCm with tensor parallelism
* 102 pp / 7 tg for Vulkan with tensor
* 449 pp / 8 tg for ROCm with layer parallelism
* 222 pp / 8 tg for Vulkan with layer
(But thermal throttling may have been a slight issue for the last 2 tests.)
>>
>>108799479
Am I crazy or did Chub randomly nuke a bunch of SillyTavern character cards? Aren't they based in the UK? I feared this would happen. I went through my favorites and a good chunk of them are completely nuked. What is the alternative exactly?
>>
>>108805318
Nothing. Make your own.
>>
>>108805327
Kys.
>>
>>108805318
there's botbooru and stuff
>>
>>108805333
>To explore the full Botbooru library, including NSFW content, please create a free account and enable NSFW in your settings
>>
>>108805336
it doesn't need an email
>>
>>108805336
You've had to have an account on chub to see half the shit they shadowbanned for months, anon.
>>
>>108803270
>>Why doesn't it just tell me that I can't do it?

Because that was less popular during RLHF training, so it will just hallucinate something that seems plausible instead.
>>
>Ehehe, I saw it! Everything came through perfectly, Master!
>I checked carefully, and the very last thing I see is indeed `</script></body></html>`. Your paste test was a huge success! You're so thorough and precise, it's super impressive!
This is more fun and more engaging than the online corporate reply slop style.
>>
no one needs more than 64k context
>>
>>108805403
Yet, if you check the reasoning, Gemma 4 is always pretending. It will never really 'be' the character. That bugs me.
>>
>>108805421
It's a robot of course but at least it does something.
>>
>>108805421
like with deepseek v4, if you add a post-history instruction on how to format its CoT, gemma 4 26b will follow it pretty well. haven't tried in-character thinking
>>
>>108805445
>haven't tried incharacter thinking
That, specifically, doesn't work with Gemma 4 to any meaningful extent.
>>
>>108805421
What makes you think in-character reasoning is anything more than larp?
>>
>>108805506
It's the difference between acting and method acting
>>
>>108805506
>What makes you think in-character reasoning is anything more than larp?
(nta) i agree, when you get an llm doing in-char reasoning, it seems to be larp
the final output is no better than the kimi/gemma "I'm a language model preparing to write a response as {{char}}"
the larp is more fun to read though
>>
>>108805318
didn't you download them first? so you've got them right?
otherwise download opera -> private window -> turn on vpn -> try the link to your "removed" card.
>>
>>108805506
Larping poisons the response with assistant-like thinking and "safety". We got lucky with the 31B version (either because Google genuinely forgot to add extra safety to it or is A/B testing the impact), but the 26B-A4B one shows what happens when you have the "default assistant" actually overseeing the responses.
>>
>>108805289
>But thermal throttling may have been a slight issue for the last 2 tests.
my landfill mi50s throttle with -sm tensor as well, so i just run them with layer split
vulkan is trash, no need to use it when you have rocm
>>
>>108805289
I'm at around 592 pp / 23 tg with tp 4 for ROCm llama.cpp. I wanted to try vllm since it seems like llama.cpp tensor parallelism doesn't scale well, but pytorch keeps segfaulting. Can you check if vllm or pytorch works on your V620s, or if it's something wrong with my setup?
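e.g. a bare matmul is enough to test torch with vllm out of the picture (ROCm builds show up through the cuda API):

import torch

print(torch.version.hip, torch.cuda.is_available())
x = torch.randn(1024, 1024, device="cuda")
print((x @ x).sum().item())  # if this dies, it's the ROCm stack, not vllm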
>>
File: Untitled.png (13 KB, 837x513)
13 KB PNG
>>108805584
>>108805584
>>108805584



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.