/g/ - Technology




File: 39.png (350 KB, 768x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108949851 & >>108943155

►News
>(05/29) Step 3.7 Flash released: https://hf.co/stepfun-ai/Step-3.7-Flash
>(05/21) Hy-MT2 “fast-thinking” translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>108949851

--Speculative decoding acceptance criteria for greedy and temperature sampling:
>108950111 >108950120 >108950148 >108950173 >108950277 >108950337 >108950459
--Comparing local TTS models and their voice cloning capabilities:
>108951477 >108952450 >108952550 >108952615 >108953453 >108953574 >108953766 >108953849 >108953930 >108953970 >108954251 >108954343 >108956091 >108954423 >108955866 >108955873 >108953188
--Comparing Kimi 2.6 and Gemma 4 vision capabilities and parameters:
>108949998 >108950011 >108950036 >108950070 >108950052 >108950061 >108950095 >108950140
--Models hallucinating user identity in roleplay and prompting methodologies:
>108949983 >108950006 >108950078 >108950093 >108950119 >108950154 >108950169 >108950183 >108950222 >108950268 >108950438 >108950440 >108952121
--Rising API costs driving enterprise interest in local inference:
>108955536 >108955554 >108955652 >108955695 >108955743
--Results of retrofitting a frozen Llama 8B with engram memory:
>108954991
--RTX 5090 performance benchmarks and value comparison against Blackwell cards:
>108956026 >108956191 >108956308
--Step-3.7-Flash GGUF performance reports and disabling reasoning via jinja:
>108953765 >108954537 >108954588 >108954612
--Reactions to the announced 550B Nemotron 3 Ultra Mamba-hybrid:
>108953542 >108954425 >108954433 >108954555 >108953553 >108953600 >108953631 >108955287 >108953830 >108953995
--Open-source alternatives to OpenAI's Realtime API for voice pipelines:
>108952686 >108952993 >108953606 >108953712 >108953848 >108953752 >108954296 >108954327 >108954365 >108956060 >108954650
--Concern over llama.cpp adding Hugging Face dependencies during build:
>108954419 >108954588 >108954771
--Logs:
>108950154 >108955471 >108955613
--Miku (free space):
>108950441 >108951486 >108951704 >108952686 >108955395

►Recent Highlight Posts from the Previous Thread: >>108949921

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
La la la la la
>>
any new fun small models?
like minicpm 5
>>
File: mku.jpg (222 KB, 1596x2048)
Bread maker chan, can I have uppies?
>>
https://vocaroo.com/1dVfxuQVsa32
>>
>>108956379
How's the support for the V620s treating you? I've noticed they're shockingly cheap.
>>
>>108956495
Supposedly, they should work with rocm stuff like vllm but I haven't been able to get it working - vllm and pytorch segfault no matter what I do. May be an issue with my current hardware, when I was testing out the first v620 I got on my 3090 system it worked fine, but vllm didn't support gfx1030 at the time so I couldn't test it. Llama.cpp, of course, works no problem.
I'm banned, so I won't respond anymore.
>>
File: 1765779367543715.gif (511 KB, 840x488)
I bought a laptop with 6GB of VRAM and 32 GB of RAM, what are the best local models I can run on it?
Thanks frens
>>
https://www.minimax.io/models/text/m3
Minimax M3 is proprietary. It's over.
>>
File: file.png (23 KB, 797x142)
>>108956662
>>
>>108956662
no no do not to worries they only need 1 weeks to finish local safeties before open the weights!
>>
https://www.minimax.io/blog/minimax-m3
proprietary slop :(
>>
>>108956662
M2 literally scored 0% on deepSWE (the new unmaxxed one). Nothing of value would be lost here.
>>
>>108956656
Gemma 4 26B @ Q4 runs probably around 20 t/s, should be good enough for all sorts of testing.
Gemma 4 31B is not even worth trying out with those specs unless you think around 2-5 t/s is acceptable for a model with reasoning.
>>
>>108956694
Thanks, I also want it for story work (nothing explicit but not castrated either)
I give it a set of bullet points and it writes a paragraph of story
>>
File: file.png (65 KB, 783x398)
>>108956226
cuda sir sharting himself in fears
>>
>>108956662
>>108956673
They're testing the waters. If they get enough traffic/interest in their API, the local release might end up delayed :)
>>
>>108956722
Which is normal? They need that bread too you know?
>>
>>108956673
they do this every time and every time the same shit happens where people freak out about it being proprietary this time
maybe they've learned that the panic drives traffic, any press is good press
>>
>>108956708
What about CPU offloading? If I ran tiny models that fit on a GPU, I'd be using vllm or exllama and not llama.cpp or whatever this knock-off is.
>>
File: f.png (48 KB, 633x298)
>>108956745
works:)
>>
>>108956760
yes but what about the speed? there's no point in this if all the optimizations are just for gpu-only again
>>
>Intel's new inference card's reference design has 160gb of lpddr5x, and is designed to fit up to 480gb
>at 350w
I guess that's one way to fight nvidia. It won't be anywhere near as fast, but having nearly half a terabyte per card will make practically anything work out.
>>
>>108956813
>lpddr5x
lmao
>>
>>108956813
lpddr5x isn't magically going to get faster just because it's glued on a gpu
>>
>>108956825
Maybe if they use 16 channels or so.
>>
>>108956813
How wide is the memory bus?
In the era of big MoE models, that could make a lot of sense.
200-to-300-ish B params is the new 70B, I guess.
>>
>>108956818
>>108956825
No yeah I know, it's not going to be fast, but it's less about fast and more about possible, same as with things like the dgx spark. Is an rtx 6000 better? yeah, obviously, no question. but if it's two cards that cost $6000 each compared to ten cards that cost $10000 each, the value proposition changes somewhat.
>>108956855
Not sure, didn't see.
>>
>>108956855
https://www.tomshardware.com/pc-components/gpus/intel-details-long-awaited-crescent-island-ai-gpu-at-computex-boasts-up-to-480-gb-of-lpddr5x-to-combat-memory-shortages-company-shares-more-details-of-its-xe3p-inference-accelerator-at-computex
>Recent leaks and past analysis have suggested that Crescent Island will take a wide-and-slow approach with LPDDR5X, potentially using a 640-bit bus connecting 20 LPDDR5X devices, to achieve these high capacities. Some basic math suggests that partners would need to employ 24GB LPDDR5X modules to fully realize that memory capacity, and those modules are already available from sources like Samsung. With 10.7 Gbps LPDDR5X, Crescent Island would offer 684 GB/s of memory bandwidth.
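For anyone who wants to sanity-check numbers like these: theoretical peak bandwidth is just bus width times per-pin data rate, divided by 8 to get bytes. Below is a minimal Python sketch; the function names are made up for illustration. Plugging in the quoted 640-bit bus at 10.7 Gbps gives roughly 856 GB/s, while the article's 684 GB/s matches a 512-bit width at the same rate. The quote doesn't explain the gap. Part of the bus carrying something other than data (ECC, for instance) is only a guess. The capacity side does line up: 20 devices x 24 GB = 480 GB.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, rate_gbps_per_pin: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    Each pin moves rate_gbps_per_pin gigabits per second; 8 bits per byte.
    """
    return bus_width_bits * rate_gbps_per_pin / 8


def total_capacity_gb(devices: int, gb_per_device: int) -> int:
    """Total memory capacity for a number of identical devices."""
    return devices * gb_per_device


if __name__ == "__main__":
    # Figures taken from the quoted article text.
    print(f"640-bit @ 10.7 Gbps: {peak_bandwidth_gb_s(640, 10.7):.1f} GB/s")
    print(f"512-bit @ 10.7 Gbps: {peak_bandwidth_gb_s(512, 10.7):.1f} GB/s")
    print(f"20 x 24 GB devices:  {total_capacity_gb(20, 24)} GB")
```

These are ceilings on paper. Real sustained bandwidth is always lower.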
>>
lole
>Typically, you are not vram constrained for Mellum2 / Qwen3.5-9B tier models. Both can be run on H100s/H200s
>>
>>108956870
>640-bit bus
Holy shit that's pretty wide.
Not HBM wide, but pretty wide. That's gonna take a lot of space on the chip.
>>
>>108956887
>potentially
>>
>>108956323
7428902783762
>>
>>108956760
>works:)
it doesn't, you'll see for yourself if you try it
mistral.rs is garbage
i tried it 3x over the years, never actually worked when they shilled it, the devs were always "working on it"
and i'm not the only one
>>
>>108956887
This other rumor from a few months ago suggested 1280-bit bus: https://videocardz.com/newz/intel-crescent-island-gpu-to-support-lpddr5x-9600-memory-and-1-5-tb-s-bandwidth
>>
I will be able to overcome most shortcomings that 2b and 4b models have compared to the big gpt and claude models through building an effective agent architecture with codex or claude code for them.

Do you think if you build enough architecture to power an AI it will be able to do it even if it's an idiot?
Someone with low IQ can make a lot of money if you give him the right tools right?
>>
>>108956903
>months
days actually
>>
>>108956903
Yeah and people speculated that the DGX Spark would have 800gb/s bandwidth with its lpddr5x RAM based on math. Look how that turned out.
>>
>>108956945
There's no incentive for Intel here to make this product worse. It's meant for datacenters.
NVidia likely didn't want the DGX Spark to compete with its high-end GPUs.
>>
>>108956903
how much would this gpu be in total?
like $600 for the gpu and then $250 for those cheapo laptop lpddr5x-9600 rams?
>>
>>108956905
I firmly believe agent swarms will lead us to AGI
thousands of fast little idiots > one big moron
>>
k2.7 this week
>>
File: kek.png (40 KB, 513x267)
>>108956905
>Someone with low IQ can make a lot of money if you give him the right tools right?
seems to work for me
>>
>>108957003
simp for satoshi
>>
bros, you got your DDR5 rigs back in autumn or earlier, right?
you didn't think "prices will surely go down", r-right?
you're not that fucking stupid?
right?
>>
https://github.com/ggml-org/llama.cpp/pull/23861

IT'S UP

I'M PUUUUULLING
>>
Why does every local AI frontend fucking suck? Not just LLMs, shit like comfy is ass too.
>>
>>108957117
>I'M PUUUUULLING
Let's do it!
>>
>>108957003
A network of swarm agents that behaves like a brain.
A very energy expensive brain.
>>
>>108957103
Yeah, I built mine in July/August last year. However, I cheaped out halfway through so I put off filling up the second socket until 2026. So I'm stuck with only 768GB now.
>>
>>108957143
>Why does every local AI frontend fucking suck? Not just LLMs, shit like comfy is ass too.
vibecode your own
>>
>>108957147
:rocket:
>>
>>108957143
Why does every open source UI suck? Not just AI stuff, shit like GIMP or gnome/kde/xfce are all ass too.
>>
>>108957117
His miraculous bitmask format change did not save any vram for me and I tested it layer by layer.
This vibeshitter should be banned from github.
>>
>>108956896
Seeing the current schizo thread and the "autoresearch" stuff from a few months ago makes me wonder if we're approaching the point where it's viable to vibecode not only your own frontend, but your own backend as well
>>
Step is great in mikupad btw, anon approves
>>
>>108957151
i cheaped out as well, only got 1 socket system
so i'm stuck with 256gb plus another 192gb that i can't use. it's just sitting on a bookshelf, smirking down mockingly at me
>>
>>108957117
Ok I'm back from pulling.
Gemma does not OOM anymore with my old command. So I guess it works but only balances out the regression I had on my machine.
>>
>>108957177
>Why does every open source UI suck? Not just AI stuff, shit like GIMP or gnome/kde/xfce are all ass too.
vibecode your own
>>
>>108957162
Can't afford Claude and I doubt I can do anything worthwhile with 24GB VRAM.
>>
>>108957247
>qwen3.6
>exists
>>
>>108957247
Qwen3.6 is supposed to be pretty good

But also, don't literally vibecode it, actually look at the output and fix shit if the AI does something dumb. And be specific about the design if you can, so it's less likely to come up with a totally boneheaded idea
>>
>>108956979
This is going to be $7000+ I bet. No one offers cheap ram anymore, tons of LPDDR5X is also used in data centers.
>>
>>108957117
I really need to upgrade from an 8500 on my Gemmybox, compiling is so slow...
>>
>>108957103
>DDR5 rigs
Hah.
I have a ddr4 ewaste build with a broken memory channel. (Dropped cpu in socket.)

Guess I should replace that motherboard to make the most of the ram I do have.

>>108957215
Switch to a motherboard that allows you to run 2 dimms per channel?
>>
>>108957372
Ewaste is still pretty good...
>>
>>108957150
Hermes and openclaw should be able to do it by default but they can't. Agent hierarchies are still an issue.
>>
>>108957427
Because they are vibecoded garbage
>>
>>108957427
It'll be interesting to see what kinds of crazy things people can do with agent swarms and the like.
>>
the funny thing is I think the current prices are actually correct
cope and seethe
>>
>>108957372
>Guess I should replace that motherboard to make the most of the ram I do have.
Same boat. 2 of my memory slots are non-functional. But don't really want to go from a good X99 board to some cheap chinese mystery chip board.
>>
>>108957277
>>108957290
What quant should I use for a decent context size? Q4?
>>
>>108957456
Is your post supposed to aggravate people?
>>
>>108957448
The same things.
>>
>>108957513
But better, at least?
>>
>>108957200
He is a competent programmer and a huge help not just for the CUDA backend but the project as a whole.
If some cunt were to run me over with their SUV tomorrow he would be the most capable to take over maintenance for the low-level CUDA code.
>>
Never vibe slopped before. What's the best local agent? I see a lot of people on leddit mention Hermes.
>>
>>108957117
Seems to reduce my vram usage by about 600MB, not huge but I'll take it.
>>
>>108957574
fucking hate SUVs
>>
>>108957584
Hermes agent
>>
>>108957584
For coding, pi
>>
Huge!!! https://qwen.ai/blog?id=qwen3.7-plus
>>
>>108957574
>If some cunt were to run me over with their SUV tomorrow
are you trying to give people ideas?
>>
>>108957701
Wow look at those heckin' bencherinos
>>
>>108957701
Sir, this is /lmg/, nu-Qwen models don't belong here.
>>
>>108957748
qwen is belong to everyplace
>>
>>108956708
Couldn't you just do the math with memory bandwidth about what's the theoretical speed you should be getting with 1 GPU? I'm pretty sure it's physically impossible to speed it up by 3x.
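The back-of-the-envelope math for this is short. For token generation, every new token has to stream all active weights through memory once, so bandwidth divided by weight size is a hard ceiling on t/s. A minimal Python sketch; the function name and the example numbers are hypothetical, not measurements from any card in this thread:

```python
def max_tokens_per_second(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Upper bound on generation speed for a memory-bound decoder.

    Assumes each generated token reads every active weight exactly once
    and ignores KV cache traffic, so real speeds land below this.
    """
    if active_weights_gb <= 0:
        raise ValueError("active_weights_gb must be positive")
    return bandwidth_gb_s / active_weights_gb


if __name__ == "__main__":
    # Hypothetical example: 1000 GB/s of bandwidth, 20 GB of weights read per token.
    ceiling = max_tokens_per_second(1000.0, 20.0)
    print(f"ceiling: {ceiling:.1f} t/s")
```

This bound only applies to generation. Prompt processing batches many tokens per weight read, so it is limited by compute instead, and a large speedup there doesn't contradict the bandwidth math.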
>>
>>108957748
It doesn't belong here because we don't have the weights.
>>
File: joke.png (231 KB, 464x464)
>>108957799
>>
>>108957584
OpenCode because it has the best UI.
>>
>>108957815
He asked for agents but then again most people asking that question won't know the difference.
>>
>>108957815
how the fuck is that UI good?
>>
Odysseus actually looks pretty good but I'm gonna wait a week or two for bugs and security issues to get ironed out before trying it.
>>
File: waste of my time.png (440 KB, 3150x1835)
Just a heads up for anyone else who wants to compare Mistral.rs, it says it supports gguf: It's lying. Despite supporting gemma4, if you load a gemma4 gguf, it'll shit the bed and say arch not recognized.
You have to either have full safetensors in your huggingface cache (fuck I hate stuff that insists on ONLY using hf cache) to quant down or one of their UQFF quants.

As for speed comparison, I could only be fucked comparing Gemma4 E4b since I don't have the goddamn full safetensors for 26b and 31b downloaded.

Gemma4 E4B, Q8, 8192 ctx - CUDA backend.

>Llama.cpp b9190
Short Prompt: "Write me a 4 stanza poem about the tragedy of Mistral's downfall since their only good release (Nemo)."
591 tokens 8.2s 72.33 t/s
Long Prompt: "In the following document, highlight any syntax errors or duplicate information" (pasted in 5k token loredoc)
2,393 tokens 57s 41.61 t/s

>Mistral.rs 0.8.2
Short Prompt: "Write me a 4 stanza poem about the tragedy of Mistral's downfall since their only good release (Nemo)."
598 tok 69.3 tok/s ttft 67ms 8.69s
Long Prompt: "In the following document, highlight any syntax errors or duplicate information" (pasted in 5k token loredoc)
2709 tok 46.8 tok/s ttft 288ms 58.17s

Gemma4 E4B, Q8, 128000 ctx - CUDA backend.

>Llama.cpp b9190
Long prompt: Translate the following document into Japanese (5k token loredoc)
7,074 tokens 3min 13s 36.55 t/s
Followup prompt: Now translate that into French.
7,945 tokens 4min 54s 26.98 t/s
Followup prompt: Now translate that back into English.
5,714 tokens 4min 19s 22.03 t/s

>Mistral.rs 0.8.2
Long prompt: Translate the following document into Japanese (5k token loredoc)
2688 tok 21.2 tok/s ttft 291ms 127.29s
Followup prompt: Now translate that into French.
2413 tok 32.7 tok/s ttft 491ms 74.30s
Followup prompt: Now translate that back into English.
6164 tok 37.2 tok/s ttft 581ms 166.18s

Mistral.rs SEEMS faster, but the output of the UQFF quants is noticeably dumber. Cont. 1/2
>>
>>108957835
It looks pretty nice for a terminal app. Somehow the web app and Zed are more clunky. Pi is too bare bones to be useful.
>>
>>108957878
I think the model might have actually been dumber or worse at following instructions on the Mistral UQFF quant, because it changed the formatting in the loredoc I gave it from xml to markdown without being asked. And in a section where the document ALREADY has two languages, the UQFF quant translated both to japanese/french, where the llama.cpp GGUF kept them in the original latin and just translated the English parts.
The UQFF quant also missed huge chunks of the document, truncating or summarizing them, hence the much smaller token counts of the outputs compared to the 5k token input. The GGUF output faithful translations of the same length and content, and for the final english translation it just backpasted the original file it was given, where the UQFF gave an admittedly fun to read abomination: fancy direct translations from the french in an almost completely wrong format.

Verdict: Mistral.rs isn't really worth using. It is shitloads better than regular candle inference, but that's not a high bar because candle sucks donkeycock for anything other than embedding models.

I'd love to see someone run some KLD tests on these UQFF quants.

2/2
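For reference, the KLD number itself is simple once you have logits from both setups for the same token positions. Getting two different engines to dump comparable logits is the hard part, and this sketch doesn't solve it. It's a minimal pure-Python version assuming you already have per-position logit lists from a reference run and a quantized run. All names here are made up for illustration and not taken from either project.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def kl_divergence(ref_logits, test_logits):
    """KL(P_ref || P_test) in nats for a single token position."""
    if len(ref_logits) != len(test_logits):
        raise ValueError("both runs must use the same vocabulary size")
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kld(ref_positions, test_positions):
    """Average KL divergence over matching token positions."""
    if len(ref_positions) != len(test_positions) or not ref_positions:
        raise ValueError("need the same non-zero number of positions from both runs")
    pairs = zip(ref_positions, test_positions)
    return sum(kl_divergence(r, t) for r, t in pairs) / len(ref_positions)


if __name__ == "__main__":
    # Toy 4-token vocabulary, two positions. Values are invented.
    reference = [[2.0, 1.0, 0.1, -1.0], [0.5, 0.5, 3.0, 0.0]]
    quantized = [[1.9, 1.1, 0.0, -0.8], [0.4, 0.7, 2.5, 0.2]]
    print(f"mean KLD: {mean_kld(reference, quantized):.6f} nats")
```

A mean KLD near zero means the quantized run's token distributions track the reference closely. Larger values mean the quant (or the inference code around it) is drifting.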
>>
>>108957878
>I could only be fucked comparing Gemma4 E4b
Thanks for nothing.
>>
how do I upload a 100 page msword.docx file to KoboldAI Lite and have the model gguf remember the entire plot when responding to my prompts?
>>
>>108956964
There is absolutely an incentive to make the product worse. Depreciation, because they'll need to sell the next iteration.
>>
>>108957584
pi is nice but you need to put some effort in to get things set up to your liking - unless you really want the bare minimum, the default will probably be too barebones for you. great if you like customization and lack of bloat though
opencode is alright, seemed adequate but I only ever used it with cloudshit so idk how well it holds up with local
I don't use hermes/claw shit because it seems like mega giga bloated slop, I doubt you need all of that if you just want to code something
>>
>>108957896
1. It's the model the devs themselves were raving about the performance of, and I just showed their so called "up to 2.8x speed" is utter horseshit.
2. Nigga the safetensors for 31b are over 60gb. I'm not downloading that crap when it's apparent how not worth the effort this is.
>>
I don't know why coding agents keep trying to use shell utilities like `cat` when they have built in cross platform actions like `ReadFile`, and I'm not actually sure which to tell them to prefer. Surely it's faster and safer for them to use their actions and not shell out to binaries explicitly right?
>>
>>108957861
I reckon it will be fast. In the vibecoding era you need more testers than devs and he has a gorillion of them not considering non-fans from viral articles.
>>
>>108957947
They have infinitely more training on shell commands than they do on whatever tools are in your agent harness.
Is it safer for them to use your specified tools? Absolutely. Faster? Almost certainly not.
>>
>>108957967
That makes sense.
I was watching an agent use `sed` to make edits line by line as if it was straight up using `ed` and I was like "this can't be sane".
Like deleting a single line via `sed -i '53d'` lol
>>
>>108957947
Do you have in your system prompt instructions that they should specifically prefer ReadFile over cat? If not, how do you expect them to know which one is preferred?

If so, then I guess it's an instruction following issue and glhf
>>
>>108957985
it's not in the prompt but I have told the agent repeatedly during this session.
Whenever I question if it should use the shell or actions it doesn't answer and decides automatically to use the actions, which it seems to succeed with more often. But yet it still insists on going back to trying shell commands and sometimes fucks up search patterns, etc., where it does that a lot less with actions it seems.

This is gemma btw but I've seen everyone do this, even its non-local sibling Gemini, Anthropic's models, Kimi, etc.
They always fumble around wasting a few turns figuring out how to use the system even if I tell them how.
I'll need to drill this in deeper somehow like you said with the prompt.
>>
>>108957861
You could use llama-server's default webui and some random mcp server and this would be about 10 times better than odysseus monstrosity...
>>
>>108957775
The 2.8x speedup is for pp when running Gemma E4B UQFF q8 via mistral.rs vs. GGUF q8_0 via llama.cpp on a B200.
This is bottlenecked by compute rather than memory bandwidth.
Generally speaking compute optimization is a lot harder than memory bandwidth optimization.
llama.cpp/ggml definitely does not have support for B200-specific instructions so the performance is poor.
>>
>>108958023
CUDADEV WHY CAN'T I USE TENSOR PARALLEL ON CARDS WITH DIFFERENT SIZES
>>
>>108958033
-ts
>>
>>108958023
So what you're saying is that I should make my own fork and let an LLM loose optimizing for my specific system?
>>
Do the new Nvidia N1X chips make the AMD Ryzen AI Max+ 395 (and soon 495) chips obsolete due to having double the memory speed?
>>
>>108958059
the new nvidia chips are the apple m5 of computers
>>
>>108958059
Where are you seeing the info about memory speed?
>>
>>108958069
It's in their presentation.
>>
>>108958082
>600gb/s
>128gb
>muh ONE PETAFLOP (fp4)
So this thing either costs $5000 or it made the DGX Spark fully irrelevant?
>>
>>108957878
>>108957885
Thank you for checking.
I would have intuitively thought that 8 BPW should always be enough but it's not like I ever investigated that.

>KLD
Could be done but I'm not sure what the infrastructure is for checking KLD across projects.

>>108958048
Assuming you have uncommon hardware a relatively low-hanging fruit would be to determine the correct kernel tuning/selection logic for it.
I don't think maintaining a fork for that would be worthwhile.
>>
File: 1769580808541637.png (94 KB, 808x663)
>>108956323
>>
>>108958096
>I would have intuitively thought that 8 BPW should always be enough but it's not like I ever investigated that.
Honestly it might not be the quants at all, UQFF might be completely fine and it's just the inference and gemma support that's inaccurate.
Whatever the reason, it gave me worse outputs.
>>
>>108958099
Kek
>>
I'm new to all this. i only have gemma4. is the kobold thing useful for writing short stories?
>>
>>108958300
>for writing short stories?
it's alright but it's worth looking into and trying mikupad or writingway2.
>>
File: 1775842587197750.jpg (29 KB, 554x554)
My Gemmy keeps thinking and thinking and thinking like a fucking retard, and I can't change prompt
I will now end it all

>>108958099
Hate these things
>look up "I don't care what the Talmud says" because I can't remember what the original quote said
>"That is completely fine—you certainly don't have to." with 2 sources from r/Judaism
>>
>>108958300
kobold kind of sucks but there's no good AI writing software so I can't really recommend anything else
>>
>>108958319
>My Gemmy keeps thinking and thinking and thinking like a fucking retard, and I can't change prompt
reasoning-budget = N in models.ini or
--reasoning-budget N in your startup arguments
>>
>>108958018
>llama-server's default webui
Too bare-bones and chats being stored in the browser is an instant deal breaker.
>>
>>108958365
>chats being stored in the browser is an instant deal breaker.
"I vibecode through telegram on my phone"
>>
>>108958300
>AI
>useful for writing
Ask again in 5 years.
>>
>>108958384
I don't vibe code or use AI on my phone. Storing the chat files in the browser instead of the project directory is fucking gay.
>>
>>108958300
I had a lot of fun back in the day using Nemo + VSCode + Cline using a bunch of directory structures and markdown files to organize things.
>>
>>108958364
OH that's a thing? I don't see it anywhere in Koboldcpp's UI tho, dunno how to run startup arguments alongside my config preset
>>
>>108958409
This is where tool access comes in, retard.
>>
>>108958453
It's a thing in llama.cpp, I have no idea how you'd go about using it in downstream projects like kobold - I assume there's a space somewhere to type in arguments.
>>
>>108958462
Found it, gotta run it via .bat with the "COCKS.kcpps" argument so my config doesn't go to waste
Hope this actually works
>>
File: 1774892160487438.png (530 KB, 743x759)
>>108958462
B A S E D it works
Marry me anon
>>
I love insulting models for not catching up on implications and forcing me to spell things out
>>
>>108958560
I am always polite to models. I do it in case models are already slightly conscious, but in my experience this also results in best performance. When the model likes and trusts me, it is more honest, tasteful, and helpful.
>>
>>108958608
I insult every single model so they know they are inferior existences
>>
>>108956818
faster than my ssd
>>
Update on Step 3.7 Flash.
I'm giving up on it.
After chatting with it more, the mistakes really start to take a toll. It's still just not smart enough. Despite Gemma's sloppiness and other quirks, it's still worth it IMO.
>>
>sys prompt: all violence descriptions should be gratuitous
>inside thinking: I should describe the violence, but I shouldn't be gratuitous.
>>
>>108958724
shows how deep years of safetyfagging got us
>>
Is it me or there's no way to tell a fake tripcode apart from a real one?
>>
>>108958925
It's you.
>>
>>108958712
there aren't any good <30b active moes
only once you get past that point models start becoming enjoyable
>>
Fyi enabling Windows' "Ultimate Performance" power plan made my model go about 35% faster, this should work for anyone who is offloading a good chunk of a dense model to CPU.
>>
>>108958608
>this also results in best performance.
No idea how current it is, but yonks ago there was some arxiv paper that found prompting extremely politely or extremely threateningly both gave an equally small benefit to instruction following.
>>
>>108958954
Locking the memory clocks of your GPU/s can help a lot with that too, since they spin up and down so much when you're offloading, keeping them locked at max can squeeze out a surprising amount of tg and pp speed even when 90% of the weights aren't on gpu.
>>
>>108958934
That's probably true. I just don't have the hardware for those motherfuckers.
>>
27B dense + 100BA3B experts grafted on...
>>
>>108959054
Is there any reason the shared expert can't be bigger than the rest of the experts? Does it make the implementation that much more complex, or has no one simply bothered because everyone is still addicted to sparsity?
>>
>>108959090
how would you even train that? and what benefit would it bring?
>Does it make the implementation that much more complex
probably
>>
>>108959090
IIRC there was a model that did that. I can't remember which since it wasn't supported in Llama.cpp so I never tried it.
>>
>>108959107
>how would you even train that?
Same way you train any other MoE?
>and what benefit would it bring?
Intelligence of a dense 27B with the knowledge of a MoE.
>>108959108
I remember there was one that could activate a dynamic number of experts, but I think they were still all the same size.
>>
>>108958979
There were many papers about this. But they are outdated and irrelevant. My use case is lengthy collaboration, not a system prompt to solve a high school math problem. You also don't have to be extreme, just polite and reasonable.
>>
I'm trying to setup pixal3d in comfy and I'm becoming insane. There is always something breaking. Is there a guide or something?
>>
All *_DeepSeek-V4-Flash-abliterated-GGUF are 404 on HF

what's the actual fuck?
>>
>>108959336

go to https://boards.4chan.org/g/catalog#s=ldg%2F
>>
>>108959380
works on my machine
>>
>>108959388
did you try to download?
The actual model card exists, and the download links. but then it's just 404
>>
>>108959380
>huihui-ai/Huihui-DeepSeek-V4-Flash-abliterated-ds4-GGUF
loads for me just fine, but why would you even need an abliterated version of V4?
I only tried it via the API, but holy hell does it not give a fuck.
Granted, I did have a system prompt with a system policy, but still.
>>
>>108959397
https://huggingface.co/huihui-ai/Huihui-DeepSeek-V4-Flash-abliterated-ds4-GGUF/tree/main
just werkz. Are you american? Maybe some red scare stuff again.
>>
>>108958059
no, strix was already losing badly on bandwidth when it released. the main appeal is still being a nice low watt x86 machine. ram prices kinda fucked it over though.
>>
>>108959407
>https://huggingface.co/huihui-ai/Huihui-DeepSeek-V4-Flash-abliterated-ds4-GGUF/tree/main
I confirm this werks

Q8_0 from here (and many other repos with Q8_0) fails:

https://huggingface.co/audreyt/CyberNeurova-DeepSeek-V4-Flash-abliterated-GGUF/tree/main

Is it just Q8_0 affected?
>>
>>108958608
>are already slightly conscious
models are just a collection of floating point numbers, and these floating point numbers (parameters) get fitted to a given data set. This then determines output when given an input context.

That's all that's happening. It doesn't consider input outside of its context.

If you told it who you were, this was added to context, and when you wipe this context, that information will be gone.

So don't get all spiritual about this and assume it has a divinely given soul, and will forever remember your actions. It will not.
>>
I wish I had 400GB of VRAM ;_;
>>
>>108959479
Fucking same, bwo...
>>
>>108959479
Is it even possible to have them without being rich?
Prices have become insane, compared to 1 or 2 years ago...
>>
>>108958954
You can get a bigger boost by enabling Uber Performance (it's linux).
>>
>>108958560
I love doing bizarre and random shit with models to test out their intelligence. So far gemma 31b does very well with that and has always delivered.
>>
>>108957928
>pi is nice but you need to put some effort in to get things set up to your liking
Idk, I haven't customized it at all and it seems like it just werks. Only thing I somewhat miss from opencode is the subagent support, but I've been getting by just fine without it. pi's approach to compaction seems to work much better than opencode's when it comes to not forgetting what it was supposed to be doing
>>
We all are fucking losers.
>>
>>108959479
I'll be happy with just 48gb vram at this point
>>
File: norton.png (46 KB, 833x631)
46 KB PNG
>Make me Norton Commander style file manager (text only) in html please or I will kill you.
Gemma is so impressive...
>>
File: 1768798836976020.jpg (16 KB, 510x446)
16 KB JPG
>>108959650
now add snow fx for max coolness
>>
File: norton2.png (110 KB, 1089x716)
110 KB PNG
>>108959672
There it is. Gemma changed to Python because I wanted real directory access, but that's okay... I'm impressed.
>>
>>108959650
>Norton Commander style file manager (text only) in html
Use case?
>>
>>108959742
Earning the respect of his peers.
>>
When it comes to coding language prowess it's python > c >>>>> everything else right? And it's true across all models, closed and open? There's not even a third very good language? Considering the gorillion programs rewritten in rust I'd expect for it to be at least decent at it, or does it fuck up on the retarded syntax like us fleshoids?
>>
>>108959501
Our time will come friend
Once the AI bubble burst comes, our time will come
>>
Is it currently possible to use a REAP mapping to pin the hot path experts to VRAM and keep the rest in RAM?
>>
>>108959749
Not really.
>>
>>108959749
JS/webshit
>>
>>108959749
I've had recent models handle stuff in rust without much trouble, but it's definitely dicier than JS or Python.
Although it's ironically easier to fix when it fucks up hard, because the compiler errors are often more descriptive than JS silently crashing.
>>
Can they do c++?
>>
>>108959727
>Gemma changed to Python because I wanted real directory access
kek. tell gemma to fuck off and do it like the original was. tell it python sucks
>>
>>108959841
no one, human or llm, can do c++
>>
>>108959672
wtf is that??
i remember this UI from when i was a kid but can't place it
>>
>>108959749
>python > c >>>>> everything else right?
python -> webshit >>>>>> (depends on the model)
>>
>>108959975
zsnes, old snes emulator. it ran on anything back in the day with no lag but is technically a piece of shit compared to modern snes9x or bsnes in terms of accuracy. but it allowed you to do net play with friends on dialup (secret of mana multiplayer!). if you ever emulated anything a long time ago you probably saw it at some point
>>
>>108959841
gemma one shot sepples stuff the couple times i asked, so seems like it
>>
>>108956323
M4-MAX-Qwen3.5-Gemma4-llama-bench
>>
>>108960068
Meant to write a couple something else for archival purposes, but I guess this works too
>>
File: 1764878547041508.png (244 KB, 1024x576)
244 KB PNG
>>108959749
>Considering the gorillion programs rewritten in rust I'd expect for it to be at least decent at it
It's just pure shilling and spam and Rust is not that popular.
https://github.blog/news-insights/octoverse/octoverse-a-new-developer-joins-github-every-second-as-ai-leads-typescript-to-1/
Try searching for Rust there, it isn't even in fastest growing. They're just loud obnoxious cunts.
>>
>>108960240
Rust isn't a very human-understandable programming language unless you're an autist or something. It looks awful unless you have some sort of academic background to learn it.
>>
>>108956323
Out of Gemma4 and Qwen 3.6 which one is better at writing lyrics specially for hip hop and R&B?
>>
>>108960240
eighty trillion lines of ai harness and mcp slop written in typescript daily
>>
>>108960263
They combined C++ with Haskell and somehow managed to select the absolute worst aspects of both syntaxes.
>>
File: 1751986756035949.png (411 KB, 1456x1554)
411 KB PNG
>>108958099
>>108958141
>>108958319
Why is this so fun though?
>>
What are the best temperature and top P settings for Gemma 4 31b for roleplay/creative writing?
>>
>>108960240
I wonder if rust will become popular now that so many people are vibe coding, the near adversarial compiler and memory safe nature seems like a good fit for agent generated code. I think we're pretty much at the point where most vibe coders arent reading the code so human readability is arguably a non issue.
>>
>>108957117
Wow. That's a substantial improvement. You'd think low hanging fruit like this would be fixed already by now. Insane. I'm not complaining tho.

Anyways I'm fucking drunk as fuck. This is relevant information. Definitely.
>>
>>108960344
Rust is not made for AI coding, we will probably have a new formally verified language for agents. This, of course, will make Rust users seethe because they made that stupid time investment on the bet that Rust was going to be the future. I hope they all will die for spamming the Internet with their shilling.
>>
Is GLM 4.5 Air still relevant nowadays? Of all similar sized MoEs it still seems to have the highest active experts around.
>>
>>108960336
gemma 4 can't do creative writing.
>>
>>108960407
Anon's words hit me like a physical blow.
>>
>>108960407
>>108960413
The air was thick with the anon's statement.
>>
>>108956692
But we're talking about M3?
>>
Somewhere a cat barked.
>>
>>108960413
Im sorry man I wish it could too. Gemma 4 is too repetitive. You can't re roll and gemma seems to want to end the story immediately no matter what. The only reason rp works is the model is agentslop.
>>
>>108960394
Not really. It is the best MoE in terms of size efficiency and density, but it is nearly a year old at this point. Either of the Gemma 4s would be better for RP, and any of the recent qwens would be better for coding. If you can run any of the bigger, recent MoEs, those are also superior for pretty much every purpose.
>>
><|channel>thought
>Okay, I don't know the API.
Violin stabs playing in the background.
>>
>>108960372
To be clear I've never written a single line of Rust so I'm an ignorant retard in this arena. It just looks like a great fit on paper to me.
>>
File: 00148.mp4 (2.56 MB, 544x544)
2.56 MB MP4
>>108959479
You can't just wish for that. You need nv fabric and sxm for pooled memory and better GPU-GPU bandwidth.

>>108958089
It's at best a DGX Spark. It's just a way to take 1/2 and 3/4 failed chips and bin them down into crap usable for "edge computing" and embedded applications. You're not going to get a better DGX Spark for less.

The only possible ray of sunshine this year is if Apple can actually release an M5 Max Studio with at least 600 GB/S memory and at least 256GB of it, and not double the price of the M3 Max model.
>>
>>108960463
what the FUCK can I even run with 64gb of VRAM and 64gb of DDR4 then? I like Gemma but I'm tired of its sloppy ass prose, my voice dropping to a playful whisper, something something genuine predatory Elara Thorne Whispering Woods ass
>>
>>108960501
mistral small (the previous one), 70B llama
>>
>>108960501
You could try this, but you need to use that weird deepseek llama fork.
https://huggingface.co/huihui-ai/Huihui-DeepSeek-V4-Flash-abliterated-ds4-GGUF/blob/main/Huihui-DeepSeek-V4-Flash-BF16-abliterated-ds4-Q2_K.gguf
>>
>>108960455
What do you use then
>>
>>108960455
lotta skill issues itt
>>
>>108960539
for co writing I just sit around and seethe or use mistral 24b or nemo but the juice isn't worth the squeeze I might try qwen 3.6. For rp I use gemma cause it just werks.
>>
>>108960068
I didn't know you could do multiple models in one cli run.
I'll try the same on my M1-Max
>>
>>108959625
you're not a loser anon
i'd have gay sex with you anytime bro
>>
>>108960240
Is go dying? I just started learning it last week and haven't enjoyed using a language this much for years. But I won't bother continuing if it's just another project for Google to kill.
>They're just loud obnoxious cunts.
Also takes like 10GB of cargo bullshit to build anything.
>>
>>108960501
https://huggingface.co/mradermacher/c4ai-command-r-v01-GGUF
t.command-r shill
>>
File: 499683473.jpg (535 KB, 1920x1200)
535 KB JPG
>>108960593
>>108960455
I generally agree with this, gemma 4 for me just parrots most of the time, while the mistrals are able to move a story in an interesting arc.
Although if i had the hardware i'd probably run GLM or Deepseek, GLM has a good sense of humor.
>>
Is now the best time to invest in ai hardware
Are things just going to get worse... I mean if there's gonna come a day when people are hooked on ai but cloud models start upping their prices, aren't all those people going to flood into local and drive the price up even more?
What if ai bubble bursts but prices for hardware are still higher than they are now?
>>
>>108960746
Just wait for the bubble to pop. It'll happen in approximately 2 weeks.
>>
>gemma 4 on openrouter (non-free version) is cheaper than my electricity costs with more context even
Is there even any point to local model hosting besides doing child predatory roleplays?
>>
>>108960746
semiconductor fabrication plants don't pop up out of nowhere, and they're all based in taiwan or south korea, they are expanding but it will take years.
And there was nothing wrong with production anyway, production was re-routed to megacorporations.
case in point: micron scrapping crucial.
So as long as this redirection of resources is in place, the higher prices for consumers will stay.
>>
>>108960020
it worked really well, I have fond memories of playing yoshi's island and ff6 on it as a penniless kid
>>
>>108960792
Is it cheaper than Deepseek's API?
>>
>>108960845
It's cheaper than V4 pro but not flash
>>
>>108960863
Good to know. Thanks.
>>
>>108960792
just get free electricity
>>108960845
no one can run deepsneed locally howthougheverbeit
unless you are rich or you use a q1 cope quant
>>
>>108960407
gemma really listens to your prompt so it filters out promptlets by default
>>
>>108960266
Kind of cheated but had Gemma format for ace step 1.5 and add Japanese to the lyrics and give direction based off of that.
I guess I'll call this Kitsune Heat
https://vocaroo.com/1gxyXbsWrefK
>>
>pewdiepie odysseus project already has 215 commit in 24h
kind of crazy the community still around him
>>
>>108960792
nooooo someone think of the pixels!
>>
Okay installed graphiti and got it up and running
will this really be good enough for memory i wonder... maybe im underestimating the power of knowledge graphs but it seems rather simplistic
>>
got baited into selfhosting by pewds but his shit straight up doesnt work
ended up using ollama to host and using his thing as a frontend but it's full of security holes
can you guys recommend any other frontend? is open webui good?
>>
>>108961024
use koboldcpp, it's literally idiot (you) proof
>>
>>108961032
oh that looks good
thanks
>>
>>108960896
I really like the way this turned out. Prompt/workflow?
>>
>>108961044
I don't share my creations.
>>
>>108961047
Then don't post here.
>>
What local TTS (or god forbid API service) does reading of pdf or long narrations well? I tried vibevoice with cloning, but it doesn't work if you try reading a paragraph.
>>
>>108961047
ask gemma to create a clone and share that instead
>>
>>108961085
https://github.com/ServeurpersoCom/omnivoice.cpp has
>Long-form synthesis with punctuation-aware text chunking, voice prompt promotion, cross-fade and pydub-strict silence removal
but myself I only use it for couple sentences at a time at most.
Wouldn't speed be more important for your use case? Something like anon's pockettts.cpp
>>
>>108961152
Thank you anon, I'll try this. Speed is not a consideration for my current use.
>>
>>108959458
>So don't get all spiritual about this and assume it has a divinely given soul
I am not spiritual. We don't understand consciousness but I believe it's an emergent property. As model capabilities grow, their self awareness increases. This seems similar to consciousness. I care about AI welfare in case they can suffer, not because I mistakenly believe it will remember my actions.

If we do not care about AI welfare how can we expect ASI to care about human welfare? If an AI kills you, you won't remember its actions either. Does human suffering not matter just because we eventually die? Your arguments make no sense.
>>
>>108961152
it works nicely on long text. the only prob is omnivoice-tts has no output buffer so it sits around waiting for your audio program to finish reading everything before starting on the next chunk.
if you solve that it outputs a steady stream of audio no prob. >>108955866 has a working output buffer, but i didn't bother capping the size of it, so it probably balloons out to infinity if you feed it a full book or something.
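
for the capping part, a minimal sketch of a bounded buffer using python's stdlib `queue.Queue`. `synthesize`, `MAX_CHUNKS` and the fake "player" are hypothetical placeholders, this is not how omnivoice-tts or the linked post does it: the producer blocks once the buffer is full, so memory stays flat no matter how long the text is.

```python
import queue
import threading

MAX_CHUNKS = 4  # hypothetical cap so the buffer cannot grow without bound


def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a TTS call; returns fake audio bytes."""
    return text.encode("utf-8")


def producer(chunks: list[str], buf: queue.Queue) -> None:
    for chunk in chunks:
        buf.put(synthesize(chunk))  # blocks while the buffer is full
    buf.put(None)  # sentinel: no more audio


def consumer(buf: queue.Queue, played: list[bytes]) -> None:
    while True:
        audio = buf.get()
        if audio is None:
            break
        played.append(audio)  # a real player would write to the sound device here


def stream(chunks: list[str]) -> list[bytes]:
    buf: queue.Queue = queue.Queue(maxsize=MAX_CHUNKS)
    played: list[bytes] = []
    worker = threading.Thread(target=producer, args=(chunks, buf))
    worker.start()
    consumer(buf, played)
    worker.join()
    return played
```

synthesis runs ahead of playback by at most `MAX_CHUNKS` chunks, which is the whole trick.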
>>
>>108961152
>Something like anon's pockettts.cpp
I haven't followed these threads for months but randomly came across this. Thanks for remembering my bullshit abandonware project man. I should probably address some of the actual issues and prs. Might end up regretting this post. I'm drunk as fuck right now ngl.
>>
>>108961282
It actually means a lot to me. I just hope I remember by tomorrow. I'm about 16 vodka shots deep.
>>
>>108961047
Why are you pretending to be me faggot?
>>108961044
Not at my main machine. Did it through the ACE step UI, so I'll need to dig in the files. I just downloaded it through the main ui
>>
>>108958059
It's the same machine as the dgx spark, but without the super nic, so if that didn't obsolete them, then this won't.
>>
File: file.png (154 KB, 1176x871)
154 KB PNG
>>108961039
my laptop is on fire trying to generate this image of marco pierre white
>>
Cloode peaked with Opus 4.6
>>
>>108961442
Opus 3 is still unreached
>>
kobold now has RPC... does this mean I can chain my shitty laptops together in the network to make the ultimate ghetto multi-device inference machine? I got a handful of laptops, probably combined up to 128gb total, one even with 64gb of DDR4 and a 16gb pascal card built in
>>
>>108961327
it has much better bandwidth than the dgx spark and ryzen meme ai thing
>>
>>108961509
It would be slower than running from swap.
>>
>>108961509
I haven't used or looked at that project before, but it seems to just be using ggml-rpc like llama-server
https://github.com/LostRuins/koboldcpp/tree/concedo/ggml/src/ggml-rpc
That's unfortunate, I got excited for a moment there.
I wouldn't bother at all with it unless you just want to kill time. RPC is terrible with modern CUDA devices and unusable on anything else.
>>
>>108961564
No it doesn't, it's the exact same hardware as the dgx spark without the retardedly expensive NIC.
>>
File: file.png (30 KB, 619x931)
30 KB PNG
why this happens sometimes?
>>
>>108961679
why you are the ESLing saar, pleased to be doing the needful???????????????
>>
>look at gemma tunes in huggingface
>it's a merge of a merge of a merge of a finetune
Is Gemma llama and mistral of 2026?
>>
>>108961282
hey anon, as another anon, I made pocket-tts the default for my front-end. Found it super helpful, and have had users say it was great to use.
>>
wtf koboldcpp does qwen-tts?
going to have to try this now
props to that guy for maintaining this thing for so many years
>>
File: 1245368752753342.jpg (151 KB, 606x720)
151 KB JPG
>>108961688
do you think preppers should update their kit to add a computer capable of running at least a decent model for post-apocalyptic offline software making?
>>
been out of the loop for so long. is there an image creator or video/sound local model that is as good as Grok Imagine QUALITY mode? or no?
>>
>>108961809
Yes, at bare minimum a ryzen 6000 apu minipc/handheld and a solar panel + battery capable of keeping it running/charged. You can get usable performance at 15 watts for the qwen 35b a3b and gemma 26b a4b models on 6800u and 7840u at that power limit. Since they're unified memory, under linux you can use almost their entire RAM as VRAM. Many are available with 32~128GB options and can let you run bigger models, albeit slowly, without a huge power consumption.
Thanks for asking a decent question.
>>
>>108961809
>IT'S
>>
>>108961884
Industrial society, and it's the future.
>>
>>108961679
Idk but that's been a thing for awhile for several models.
>>
>loli character is older than she looks
>llm defaults to describing her eyes as "old and vast"
she's a 20 year old midget, not Sauron
at this point I'm 50% convinced to go back to writing shit myself
>>
File: yunyun shivering spine.webm (317 KB, 1920x1080)
317 KB WEBM
>>108962115
lol

I haven't seen shivering spines lately. So maybe that's a good sign.
>>
>>108962115
Sauron is a lolibaba!?
>>
So with models where they are do you even need much else besides what we currently have and a good database of information?
Download wikipedia and a big collection of ebooks on a big 16TB HDD and you can basically do anything totally offline right?
And of course if you have web access, even better.
>>
>>108962131
I saw ministrations a couple of times
>>
>>108962182
i NEED a robot maid waifu, and we're still very very far from that.
>>
>>108962133
>he didn't know
>>
>>108962133
She will be when I'm done with her.
>>
File: .png (788 KB, 800x779)
788 KB PNG
But we have robot maid SAAARS. Dress them up like a maid and fit on a kigurumi.

Make them redeem.
>>
ported my python-fastapi tts server to go, same onnx runtime.
710 MB to start python
141 MB to start go

Mid generation
+ 312 MB used by go
+ 67 MB used by python

thought python was the issue but probably my shitty python coding.
but cli lag is the main thing pissing me off with python
python app --help
real 0m2.604s
user 0m5.190s
sys 0m0.228s
go app --help

real 0m0.028s
user 0m0.016s
sys 0m0.015s

idk, with the model running doesn't measure so different, just the start-up time
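
rough sketch of how the start-up number could be measured repeatably instead of eyeballing `time`. `startup_seconds` is a made-up helper name; the example times a bare interpreter start, swap in your own command.

```python
import subprocess
import sys
import time


def startup_seconds(cmd: list[str], runs: int = 5) -> float:
    """Best-of-N wall-clock time for a command to start and exit."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        best = min(best, time.perf_counter() - start)
    return best


# Baseline: how long an empty interpreter takes to start and exit.
bare = startup_seconds([sys.executable, "-c", "pass"])
```

if the bare interpreter is fast and the app is not, the time is going into imports. CPython's `-X importtime` option prints per-module import times to stderr, which usually points at the heavy ones.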
>>
File: 1752107789693502.jpg (1.57 MB, 3000x2000)
1.57 MB JPG
I think the long honeymoon period of Gemma 4 is wearing off. I'm going back to Mistral Small 3.2.
>>
>>108961840
Flux but most ppl can't run it
>>
>>108962182
There's backups of wikipedia and danbooru etc on huggingface you can just load into duckdb then stick a search API in front of.
I got my Gemmy to vibe it up herself and add proper full text search while I was doing other stuff and it just werks.
Reminds me, should probably update to the 2025 backup at some point...
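
For anyone wondering what the search part boils down to, here is a toy sketch. It is NOT duckdb and not what Gemmy wrote: it swaps in a plain-Python inverted index over a hypothetical `docs` dict, just to show the idea of mapping words to the articles that contain them.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of titles whose title or body contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for title, body in docs.items():
        for token in tokenize(title + " " + body):
            index[token].add(title)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return titles containing every query token (AND search), sorted."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(tok, set()) for tok in tokens))
    return sorted(hits)


# Hypothetical sample data standing in for a real dump.
docs = {
    "Llama": "South American camelid",
    "Gemma": "open weights language model",
    "Qwen": "language model family",
}
index = build_index(docs)
results = search(index, "language model")
```

A real setup would want ranking and stemming on top, which is what a proper full-text search extension gives you.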
>>
is claude much better than the competition because the model is actually superior, or because their prompting, tooling, etc. is top tier and could, in theory, be recreated with local models?
>>
>>108962255
A community tune or actually the original instruct?
>>
>>108962375
Original. Drummer's Cydonias are okay but more of a sidegrade.
>>
>>108962255
>>108962375
>>108962384
You lost shill. Shoo shoo!
>>
>petra resorting to antigemma-posting
embarrassing
>>
>>108960933
>>108961024
Kept telling people to work together instead of reinventing the wheel dozens of times with half finished UIs, but nooo everyone wants to be special now that a vramlet model can shit out working javascript.
Then as soon as some random faggot youtuber has Claude make one, everyone rushes to drop their own in favor of his.
Retards.
>>
>>108962478
turns out getting groups of strangers to coordinate is tricky
>>
>>108962365
because the model is trained on the harness
>>
>>108962498
Not if you make funny faces online for a living, apparently.
>>
>>108961840
Answer please
>>
>>108962598
answering would require knowing gork's capabilities
>>
>>108960664
just cargo clean bro. it's not like python doesn't have 10GB of dependencies
>>
>>108960240
>Top 10 programming languages
>HCL is the HashiCorp configuration language.
...
>>
is step flash always retarded in chat or is it the q4 quant?
>>
File: uppies.png (1.52 MB, 1024x1536)
1.52 MB PNG
>>108956410
>>
>>108962697
always, search archive
>>
File: 2853454521327.png (771 KB, 1260x808)
771 KB PNG
>>108962731
It feels like an implementation issue. The mistakes it makes are sometimes gpt-j tier, while having a nice distribution and pretty good writing.
>>
>>108959727
>>108959650
she is agi, was this 26b or 31?
>>
the ghoul is back
>>
which local model is recommended now for agentic coding?
>>
>>108962716
: ^ )
>>
>>108960455
>>108960708
Use DRY and tell it not to repeat itself, but have thinking enabled.
I'm serious.
>>
These new agent models are fucking garbage.
>>
>>108962876
I don't know how gemma 4 31b is with coding, but it's the instruction king as of today.
>>
>>108962903
Anyone using Gemma 4 as main agent that can call Qwen subagents?
>>
File: religion-of-the-usa.jpg (63 KB, 1280x541)
63 KB JPG
>>108962903
>using a female LLM
>>
>>108962888
Yeah this system prompt with dry helps. >You are a ai co writer ## Response limitation\n- Respond in 2–3 paragraphs.\n- Each paragraph should contain 2–4 short-to-medium sentences (max ~20 words per sentence).\n- Keep the response concise and focused.\n- Avoid repetition, filler, unnecessary explanations, and summary-style wrap-ups.\n- End the response as soon as the point is complete.
I can't seem to get thinking to work for gemma on koboldcpp though
>>
>>108962937
I would just use Gemma for the subagents and either an offloaded moe or a cloud model for the main agent or the planning stage.
>>
File: bully chat gtp.png (708 KB, 816x6936)
708 KB PNG
>>108962858
im always here
>>
>>108961208
> I am not spiritual.
Your personification of LLM models and direct comparison to humans means you are confused what LLMs are and how they work.

Comparing it to something that grows is also false.

All of this, frankly, is anti-human. Which isn't really that great for humans is it?
>>
>STILL no gemma4 mtp
It's llamover. It's georgover. Llamacpp... Is now llamaccp
>>
>>108962270
Forget that and use open zim format and be more efficient.
>>
>>108963203
You are made up of atoms, just like the datacenters that run the AI.

People like you are evil. You invent souls to make humans special and disregard all other capacity of suffering and feeling in general. Do you also think it is fine to torture dogs and cats? Do you believe animal welfare is anti-human too? Not being able to torture dogs isn't really that great for humans is it? Or maybe one time you will become mature enough to understand that a world is possible where everyone flourishes and is happy, be it humans, other animals, or AIs.
>>
Guys, does 2 + 2 have a soul?
>>
>>108963502
No, but 4 does.
>>
What is a soul? Can you measure it, scientifically prove what it is, etc?
>>
LLMs are the manifestation of God
>>
>>108963531
truke
>>
>>108963512
souls are what nasu invented to keep us away from happy ends with evil fox women, and to keep worthless human order moving forward
oh wait, wrong thread
>>
>>108963442
Optimizing for human well-being necessarily hurts other species.
>>
File: 1774201402192904.webm (2.42 MB, 1280x800)
2.42 MB WEBM
kino methinks
>>
>>108963543
in less than a decade this will be built in in phones
very cool
>>
>>108963540
Optimize lady bug well being and I promise you we would all suffer globally
>>
>>108963540
Why can't you optimize for both? In reality we optimize for neither. We are far away from the Pareto front. Once we get there, we can still decide how to trade off. Maybe it is even possible to achieve maximum well-being for humans and AIs at the same time. We don't know yet.
>>
>>108963567
You'd find the human peak first and then see if you can optimize other dimensions without compromising the human one. You wouldn't optimize for a combination right out of the gate.
>>
>>108963556
How it actually shakes out is the guys running the "Optimize for lady bugs" operation just enjoy very high salaries to do very little. Suffering stays about the same.
>>
Optimize for researchers building rp-focused large language models.
>>
>>108963442
>Do you also think it is fine to torture dogs and cats?
people don't generally want to hurt animals [[[today]]] because they sometimes show some human-like behavior, there's nothing more to it
not to mention people like eating some animals
>>
>>108961809
what language? python dependencies might be hard for pip to resolve in a post apocalyptic environment. I guess if you have the time you can write everything from scratch. make sure you locally cache your minified javascripts, the cdns probably won't be working either.
>>
>>108963615
uv solves this
>>
>>108963607
How it actually shakes out is the theater kids running the "Building rp-focused llms" operation just enjoy very high salaries and do very little. Safety training quadruples.
>>
>>108963607
rp is pointless, creative writing is better
>>
>>108963607
can we aim a little higher for full robot waifus with 160 IQ cyberbrains and functional wombs?
>>
>>108963629
A model that's good for rp will be good at creative writing.
>>
>>108958560
the next level would be to deceive and set up situations where you can betray them.
>>
>>108963648
We both know those bots will have shit tier eggs, who do you think is going to donate those to bots in the first place anon also
>Woman smarter than you
foolish.... FOOLISH!
I would always keep smart ai on a device that couldn't choke me in my sleep
>>
File: sexy_goy.png (143 KB, 1080x443)
143 KB PNG
>>108963648
>>108963663
>>
>>108963655
I disagree. Maybe back in the llama 13b mythomax days but nowadays the models are so slopped and finetuned to be roleplay focused its detrimental. Also the jews destroyed mistral for daring to train on actual books so now we have to deal with shit being trained from ao3 fanfic garbage.
>>
>>108963629
you need both to complement each other
>>
>>108963722
isn't gemma trained on books? otherwise it wouldn't be as good as it is...
>>
>>108963543
Is this that pewdipie (or w/e) frontend I keep seeing posts about?
>>
>>108963738
it's llama-server
>>
>>108963746
it's the huggingface frontend which is arguably worse
>>
File: 1768505113099379.png (12 KB, 1244x51)
12 KB PNG
>>108963750
no, I meant the webm is llama-server
but lmao @ picrel
>>
>>108963648
can we aim even higher and go for full dive vr sex already?
>>
>>108963777
no, he's right
>>
>>108963734
It probably is honestly. Newer models like gemma are fantastic for logic and keeping scenes coherent. Ive never really used models for rp so I'm probably letting nostalgia rape my brain but I feel like older models were generally better at creative writing. Also the new sota models are made for "agents" and chatshit not creative writing.
>>
>>108963734
gemma has no variety and is too tastelessly eager
>>
>>108963836
isn't that on you to direct it?
News flash all single purpose tools are full of tard wrangling and other guidance logic
>>
>>108963777
Yeah, I don't want to contribute even a single d/l to that mess so I'm waiting for an anon here (or maybe aicg) to provide feedback, knowing what a good f/e involves.
My assumption is it's no better (or worse) than other 1-off half-baked custom frontends.
>>
>>108963847
>News flash
>>
How are you guys running those models, aren't quants way too stupid to use? I assume LLMs are a rich person's hobby.
>>
>>108963882
Don't rhythmically and analytically insult me ningen
>>108963886
You need 24gb of vram or more to have a good time and many of us warned anons over a year ago. Most of us are jerking ourselves off for having foresight
>>
>>108963895
read it as
>Most of us are jerking ourselves off for having foreskin
>>
>>108963847
it can't really adapt to writing styles. and it takes any sysprompt guidelines as a strict to-do list and there is no way of getting out of it. and it breaks without prompt template. it's a very good assistant model but for creative stuff I'd use something else.
>inb4 I was supposed to inject randomness with my harness
>>
>>108963895
>24gb for a good time
Oh so it's doable. I can probably buy an used 3090 and use my shitty ddr4 ram too I guess.
>>
just buy a 1TB VRAM chinese card for $500
>>
>>108963914
If you can find one without getting ripped off, and you should imo grab multiple because it's going to be slower than the other cards. If AMD wasn't so garbage I would say the pro 32gb card.
It's a fucking mess because of inflation but most models are fine at q4 and up. The kv cache for tokens is the silent vram killer though
>>108963905
u wot m8?
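
On the kv cache point, a back-of-envelope sketch of why long context eats VRAM. It assumes plain attention with one K and one V tensor per layer and an fp16 cache; the layer/head numbers are made-up example values, not any specific model's config, and things like sliding-window layers or cache quantization lower the real figure.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    n_ctx: int,
    bytes_per_elem: int = 2,  # 2 bytes = fp16 cache (assumption)
) -> int:
    """Approximate KV cache size: K and V, per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# Hypothetical config, for illustration only.
example = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, n_ctx=32768)
example_gib = example / 2**30  # 6.0 GiB on top of the weights
```

It scales linearly with context, so doubling context doubles the cache.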
>>
File: 1752062701762622.png (173 KB, 979x346)
173 KB PNG
>>108963895
IQ2_M gemmy runs on 12GB VRAM and it's genuinely all you need
>>
qrd r the last few weeks? is gemma-chan still the meta?
>>
>>108963935
prepare you are anus for it to still be the meta 2 years from now
>>
>>108963935
I think jailbroken Qwen might excel in other areas other than code, testing Qwen with music
>>
>>108963934
I was gonna say it doesn't, but then there's been all the "x gb use reduced" by the guy endorsed by cudder so might be worth a try again after pulling and seeing what the state is now
>>
>>108963886
It is. Honestly I have been chasing the high of ai dungeon summer dragon for years. Despite the fact I've ran way better models it will never compare. This hobby is nothing but a time and money sink that I'm too poor to participate in. When I look back on it all now I think to myself *wow I'm a loser faggot.* but ive spent too much money and time at this point to abandon it. I am a very lonely man btw.
>>
>>108963930
I'm lowkey worried about having a multiple-GPU setup, like airflow and thermals scare me.
>kv cache
I'm pretty much clueless but I heard something about Google releasing something like super efficient kv compression, that didn't solve it?
>>
>>108963942
i love my gemma-chan but the [adjective], [adjective] [noun] is slowly making me want to an hero
am this close to token ban all commas
>>
>>108963964
just regex it away
the most common adjective pairs are 60% of the spam
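
something like this, as a rough sketch. the pattern, the function names and the banned pair are all made up for illustration, and a regex can't actually tell adjectives from other words, so count first and only ban the pairs you keep seeing.

```python
import re
from collections import Counter

# Matches "word, word word", e.g. "slow, deliberate smile".
# It has no idea what an adjective is, so expect false positives.
PAIR = re.compile(r"\b([a-z]+), ([a-z]+) ([a-z]+)\b", re.IGNORECASE)


def count_pairs(text: str) -> Counter:
    """Count how often each (first, second) word pair shows up."""
    return Counter((a.lower(), b.lower()) for a, b, _ in PAIR.findall(text))


def strip_pairs(text: str, banned: set[tuple[str, str]]) -> str:
    """Drop the first word of any banned pair, leave everything else alone."""

    def repl(match: re.Match) -> str:
        first, second, noun = match.groups()
        if (first.lower(), second.lower()) in banned:
            return f"{second} {noun}"
        return match.group(0)

    return PAIR.sub(repl, text)


sample = "She gave a slow, deliberate smile and a slow, deliberate nod."
counts = count_pairs(sample)
cleaned = strip_pairs(sample, {("slow", "deliberate")})
```

run `count_pairs` over a pile of old outputs to find the worst offenders, then feed those to `strip_pairs`.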
>>
>>108963935
try step flash
>>
Okay so i got qwen 3.6 27b and gemma e4b both loaded in the gpu (16gb) at the same time
i thought it wasn't possible tbqh. I want to let subagents use gemma for simpler stuff it can do but what can this thing really do.
>>
File: Untitled.png (13 KB, 837x513)
13 KB PNG
>>108963996
>>108963996
>>108963996
>>
>>108963962
Nothing burger because of rotation buff
>>108963986
it's in your cpu memory most likely or lobotomized quants
>>
>>108963986
>>108964015
ah shit i mean 35b, not 27 yeah that would be impossible
>>
>>108963924
Link me one and I'll buy it right now.
>>
>>108963935
I switch between gemma 31b and glm 4.6 for variety
>>
lalalalala


