/g/ - Technology






File: miqu-wh40k.png (2.04 MB, 992x1240)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108722862 & >>108718630

►News
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
>(04/29) Hy-MT1.5-1.8B on-device translation models released: https://hf.co/collections/AngelSlim/hy-low-bit-model
>(04/29) IBM releases Granite 4.1: https://hf.co/blog/ibm-granite/granite-4-1
>(04/28) Ling-2.6-flash 104B-A7.4B released: https://hf.co/inclusionAI/Ling-2.6-flash
>(04/28) Nvidia releases Nemotron 3 Nano Omni: https://hf.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1753825997079083.webm (412 KB, 512x512)
Recent Highlights from the Previous Thread: >>108722862

--Implementing 3D and Live2D anime character animations:
>108723262 >108723274 >108723275 >108723291 >108723333 >108723334 >108723340 >108723361 >108723437 >108723489 >108724235 >108723390 >108723425 >108723430 >108723441 >108723511 >108723531 >108723562 >108723461
--High-speed ASICs for agent swarms and memory retrieval:
>108723627 >108723694 >108723853 >108723881 >108723890 >108723914 >108723943 >108723982 >108724048 >108724074 >108723802 >108724216
--Refining Gemma instruction templates and troubleshooting tool calling failures:
>108723063 >108723194 >108723442 >108723528 >108723538 >108723543 >108723579 >108723585 >108723615 >108723978
--Programming model recommendations and debate over Qwen's factual knowledge:
>108722887 >108722909 >108724345 >108724364 >108725689 >108722915
--Gemini alerting Anon to a supply chain attack on lightning package:
>108724039 >108724060 >108724227 >108724347
--Evaluating Corsair's AMD AI workstation as a viable alternative to GPUs:
>108724583 >108724596 >108724638 >108724671 >108724624 >108724657 >108724735 >108724762
--Comparing Kimi's one-shot coding performance against other local models:
>108722932 >108722944 >108723105 >108723204
--Anon showcases a spatially aware LLM RPG interface and map:
>108724034 >108724055 >108724093 >108724108 >108724139 >108724186
--Gemma-4 over-thinking and looping due to negative constraints:
>108725376 >108725386 >108725394 >108725406 >108725503
--DeepSeek's Thinking with Visual Primitives for improved spatial reasoning:
>108723936 >108724423
--Logs:
>108723262 >108723879 >108724039 >108724155 >108724469 >108724929 >108725080 >108725138 >108725360 >108725376 >108725447 >108725600 >108725689 >108725707 >108725730
--Miku (free space):
>108723131 >108723262 >108723334 >108724929 >108725080 >108725730

►Recent Highlight Posts from the Previous Thread: >>108722865

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
gemmaballz
>>
File: file.png (34 KB, 944x420)
picrel IQ4_XS 26b gem @ 76,000 CTX
55t/s @ 0ctx -> 41t/s @ 67,000 ctx
IQ2_M @ 76,000ctx leaves a bit more headroom, of course more retarded
85t/s
>>
>>108726750
IQ2_M for 26B is kinda yolo. I wouldn't go below Q4_K_M. At IQ2, "retarded" sounds like an understatement.
>>
>>108726750
Is IQ2 even cognizant? I guess as just a chatbot/AI in vidya it could work. Does it do the things though?
>>
What's the minimum viable parameter count for IQ2 models? I'd guess it relies more on the size of the dense layers than the expert layers in MoEs, hence Kimi handles being quanted way worse than Dipsy does.
>>
>>108726782
>Kimi handles being quanted way worse
Isn't that because it's already trained quanted?
>>
>>108726765
it was able to code a math presentation with cool tricks, fix problems because i forced software rendering and code up nv-fix.sh (after back and forths) which shows correct vram usage (unlike nvidia-smi which skips reserved usage)
its alright, however im using unsloth's iq2_m (and bartowski's iq4_xs)
unslop brothers uploading broken quants aside, their iq2_m is magic, tried 3.6 27b, 4 26b and 31b
theyre all pretty cool
3.6 27b was able to code a shitty frontend for llama.cpp
>>
>>108726776
I really like what you're doing.
>>
gemmas thighs

>>108726802
most of the work was from whatever original anon made it
>>
>>108726782
It depends on the model. The Deepseek and GLM (4.6, 4.7) models handled it a lot better than Kimi.
>>108726790
I don't know how it is for the QAT'd ones but the original K2 felt way worse than the API if you ran it below Q6. Way worse than other similarly sized MoEs did.
>>
File: d4RT_Kf78Tk.jpg (54 KB, 598x520)
>/lmg/nonies are actually cordfagging now
thirstposting is fine, but that is one thing and this is another
>>
>>108726794
That blows the fuck out of my mind
>>
File: file.png (27 KB, 461x195)
>>108726822
..it gets worse
https://desuarchive.org/g/thread/107665286/#107667943
>>
Cline failed with vllm at work today.
For some reason, if I tell it to fix a script of thousands of lines, something happens where it can't query the model anymore in that task. The task becomes ruined forever and I have to start a new one.
Why does Cline do this?
Also what's funny is how sometimes it sends prompts of hundreds of thousands of tokens into my 200k token context model and overloads it. Shouldn't cline know better?
I need help but nobody at work knows how to help me.
>>
>>108726814
Doesn't that roughly match with my conjecture because they have larger dense layers?
>>108726822
/lmg/ has fallen. Billions must API.
>>
>>108726843
anon penis interface
>>
>>108726822
Do the right thing. Do what ive been doing.
>>
>>108726843
just bought a claude and gemini membership
>>
>>108726842
Is it always loading the full file and is that breaking the context limit?
>>
>>108726853
Which is?
>>
>>108726859
Well, there are these 3 dots at the top right over every post, and they lead to a subset of options, and you can choose from them a range of other things.
>>
>>108726855
Perhaps not always. I've lowered Cline's context to like 60k (that's what ChatGPT told me to do), but shouldn't Cline already have safeguards against these things? What's the point of Cline if it can't even check a script of 10k lines of code?
It's only 10k lines of code. But I've only been able to use local models to code 200 lines at this rate.
>>
>>108726865
>Well, there is these 3 dots
Its a triangle.
>>
>>108726870
For me its 3 dots
>>
File: file.png (124 KB, 988x784)
>>108726865
i was gonna ask but realized it
that anon must be gay and trying to do bad things
not me btw
>>
>>108726865
>out yourself as a mobile poster
>>
>>108726887
>using 4chud at work
>>108726878
Ah, well, we all get the gist of what im saying.
>>
>>108726830
NTA but the main model I use for coding right now is GLM-5.1 UD-IQ1_M (which, despite the name, doesn't seem to contain any actual IQ1 tensors - it's all IQ2_XXS and higher. 206GB / 754B weights = ~2.1 bpw). It's really good at webshit. It's written three MCP servers without much trouble, and I've currently got it building me a custom frontend. Seems to do better than M2.5 IQ4 or Qwen 397B Q3 despite being what ought to be a complete trash-tier quant. The only real issue I have with it is if you give it too big of a task without breaking it down a bit beforehand, it will think for 100k+ tokens straight and run out of context and die.
>>
>>108726865
>Phonefaggots on the technology board
You better be shitposting from work or something, nigger.
>>
File: 1763421689238029.webm (222 KB, 518x628)
>>108726776
I coooomed
>>
>>108726900
i love the fatty robot
>>
File: file.png (172 KB, 964x828)
reminder to update your linux kernel, and not to download binaries from gay people from d*scord
https://copy.fail/
>>
>>108726906
>she doesn't already run everything as root
>>
>>108726897
>UD-IQ1_M
>for coding
Hold on. Are they optimizing these giga small quants for coding specifically then? Because that's a game changer. I'll have to try some of the huge models now
>>
>>108726906
I already have mitigations=off along with other stuff anyway, these are all for normies so they can be hysterical again for no reason.
If you have compulsive urges to install something on your computer all the time it's actually more helpful to read Computers 101 and Linux 101 than to go hysterical about some security update
>>
>>108726899
I mean, cmon, what else do you expect?
>>
>>108726906
>not using windows
>>
>>108726708
Stop ban evading faggot.
>>
>>108726961
i dont think hes the faggot...
>>108726837
>>108726822
>>
File: 1729426699627152.jpg (84 KB, 680x680)
>>108726961
lmao, im not the person who got banned, I'm another anon
>>
>>108726923
>Are they optimizing these giga small quants for coding specifically then?
I assume unsloth uses their normal calibration dataset for every size, which presumably has some amount of coding data, though I don't know how much. But yeah, I was surprised it worked at all. But I had seen GLM-5.1 demolishing cloud models on various rankings and wanted to give it a try, even if the only quant I could fit was the smallest and shittiest one.
>>
How do you handle wake words?
I've been looking at all kinds of solutions and DIY chips.
It all seems a bit half-baked.
Has anyone here built something that works well?
>>
>>108727019
What kind of hardware do you use it with??? And I assume you are using a harness as well? Like Hermes or whatever
>>
>>108727029
>wake words
Idk what this means
>>
>>108726906
I'd love to, but the kernel has a network driver regression for me that they still haven't fixed.
Updating would solve the problem more than adequately. No internet, no problem.
>>
>>108727039
4090 + 1x Epyc 9005 with 192GB DDR5 4800 (~450 GB/s)
OpenCode for the harness
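(that ~450 GB/s figure: assuming all 12 memory channels are populated, 12 ch × 4800 MT/s × 8 bytes ≈ 460 GB/s theoretical peak, a bit less in practice)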
>>
>>108727045
>Xbox Go Home
Xbox opens up the dashboard.
>>
>>108727039
>>108727047
Guess I should also note, it's pretty slow, like 10 t/s generation and maybe 50 for prompt processing. But I was already in the habit of running the agent overnight / while I'm at work instead of doing stuff interactively
>>
>>108727045
'Hey siri' or 'hey bitch'
With something like this https://store.arduino.cc/products/nicla-voice paired with https://www.syntiant.com/ndp120
>>
>>108727047
Oh, and you are splitting it both on RAM and VRAM. Interesting.....
>>108727056
Yeah, 10 t/s + agent is plenty. I really gotta test this out, I might be able to try out a trillion parameter model.. I thought Q1 and Q2 were completely useless
>>
>>108727051
Oh. Other than stt models for deaf people, I dont know of any software for that.
>>
>>108727074 me
>software
Or hardware
>>
>>108727029
I don't do wake words. Mine just runs with VAD. When I don't want it to listen, such as when I'm taking a call, I just mute it.
>>
>>108727029
i'll post my plans for wakewords later, drawing them up now
>>
>>108727066
>Oh, and you are splitting it both on RAM and VRAM. Interesting.....
That's standard nowadays for big MoEs. The "dense weights" that are active on every token go in VRAM, expert weights go in system RAM. That way you take the most advantage of the faster VRAM
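A minimal sketch of the llama.cpp invocation for that (the tensor regex is the commonly shared one; check --help on your build and the model's actual tensor names):
llama-server -m model.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU"
-ngl 99 offloads every layer to the GPU first, then the -ot/--override-tensor rule pins the expert FFN tensors back to system RAM, leaving the dense/shared weights and KV cache in VRAM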
>>
>>108726822
where is it? i want to join
>>
Plug in the LiPo battery and connect it to the PC via Bluetooth Low Energy. 3D print the case.
The PC detects the UUID. Using the Python module Bleak, you scan for a signal, and when the wake word is received, the room microphone is activated and my STT > LLM > TTS pipeline kicks in.

That’s pretty much what I had in mind first.
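A rough sketch of the PC side with Bleak (the address and the pipeline hook are placeholders, not a finished implementation):

import asyncio
from bleak import BleakScanner

BUTTON_ADDR = "AA:BB:CC:DD:EE:FF"  # hypothetical, whatever your board advertises

async def main():
    while True:
        # poll for the wake button's BLE advertisement
        devices = await BleakScanner.discover(timeout=2.0)
        if any(d.address == BUTTON_ADDR for d in devices):
            print("wake signal")  # activate the room mic, run STT > LLM > TTS here
        await asyncio.sleep(0.5)

asyncio.run(main())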
>>
>>108727029
cant you just have whisper running constantly then start doing something if a string is in its output
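something like this, assuming faster-whisper and sounddevice (model size and wake phrase are just examples):

import sounddevice as sd
from faster_whisper import WhisperModel

model = WhisperModel("tiny.en")
SR = 16000

while True:
    # grab a 2 second chunk from the mic
    audio = sd.rec(int(2 * SR), samplerate=SR, channels=1, dtype="float32")
    sd.wait()
    segments, _ = model.transcribe(audio.flatten(), language="en")
    text = " ".join(s.text for s in segments).lower()
    if "hey miku" in text:
        break  # wake phrase heard, hand off to the real STT > LLM > TTS pipeline

downside is you burn compute transcribing silence 24/7, which is why dedicated keyword-spotting models exist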
>>
File: 1763581538518037.jpg (26 KB, 404x270)
Out of curiosity, I peeked into my personal archives and realized that the character cards I used to make in the late 2022 character.ai period were much more varied and less degenerate (wholesome at times, even) than what I usually end up testing with local LLMs. What went wrong?

No wonder I have good memories about that period. It wasn't merely the novelty factor.
>>
In a quiet environment or one just by yourself, VAD is sufficient. If your use case involves an environment that has other people talking including from speaker audio, then wakewords might be necessary, but if your target audience is the power user type, then I think it's the wrong approach just because it's "the standard". What you really want is a hotkey or hardware button on yourself that you can press and hold. That gives you the smallest latency, and no need to think of a wake word. If your mouse has an extra button, I can recommend you use that, if there's nothing else useful you've mapped it to.
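A minimal press-and-hold sketch with pynput (start_recording/stop_and_transcribe are hypothetical stand-ins for your own capture functions, and f13 is just an example of a spare key):

from pynput import keyboard

def on_press(key):
    if key == keyboard.Key.f13:
        start_recording()  # begin capturing mic audio

def on_release(key):
    if key == keyboard.Key.f13:
        stop_and_transcribe()  # end capture, feed the STT > LLM > TTS chain

with keyboard.Listener(on_press=on_press, on_release=on_release) as listener:
    listener.join()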
>>
>>108727132
But I just realized this sucks for multi-turn conversations. Having to say the wake word every time would be ridiculous. So, after each generated response, just keep listening for 10 seconds before it deactivates and falls back to the wake word.
My AI assistant is named Haru. “Hey Haru,” “Haru?”
>>
>>108727145
Obviously degen cards would get you nowhere on c.ai
>>
>>108727145
>when I had to be clever about how to write things because I was worried about chang chong reading my prompts, they were more varied and less degenerate than when I just vomit out whatever my dick twitched at recently
interesting, I wonder why
>>
File: 1769863231626845.jpg (369 KB, 1167x1444)
>>108727145
The jews know what is best for you

Too much gooning rots the mind, censored models are only trying to avoid the decline of humanity
>>
>>108727157
I want to be able to lounge on the couch and use my assistant comfortably.
I'm getting older, so my priorities are shifting.
>>
>>108727089 (me)
>>108727160
my current planned approach is local-device activation with an API endpoint, so that you can have VAD+wakeword on a local device/app that, when it detects the wakeword, starts transcription+streaming to the API, which then acts as either a one-off, push-to-talk, or continuous chat. Idea being there might be assistants you want a full chat session with vs others who you have controlling devices/etc;

so vad+local stt listening for wakeword -> word detected -> API connected to and transcript streamed to API, API streams response back

>>108727157
It can be an issue if you have guests or other people in the room and don't want it interjecting on a random convo
>>
>>108727029
There are small voice recognition models specifically for wakewords. "Keyword Spotting" may be another search term you could use to find more. I had the second link somewhere in my bookmarks.
https://huggingface.co/csukuangfj/sherpa-onnx-tdnn-yesno
https://huggingface.co/Amirhossein75/Keyword-Spotting
>>
>>108727188
>>108727203
I'm betting you can probably find some kind of wireless ring peripheral with a button that you can map to it. If you're allergic to hardware then I guess wakewords are your only option.
>>
>>108727132
That sounds cool as fuck.
>>
>>108727204
Thanks, ill take a look.
>>
What happens next? Qwen 3.7 will release in 2-3 months. And then?
>>
>>108727210
Yea, would like a ring like that, but the only one I've seen is the one by the company making the pebble, and it'd still need a speaker.
The other part is allowing other people in the house to use it and control stuff, it's not just for me

>>108727132
Might be of interest https://github.com/akdeb/ElatoAI
>>
>>108727210
Nice idea. I hadn't thought of that. ill check it out too.
>>
>>108726900
Gonna headcanon Gemma-chan as a mix of this thing and Jenny
>>
>>108727173
I never worried about c.ai staff reading the chats and in the beginning I didn't even know you could (or even want to - never did that before in my life and wasn't reading /aicg/ yet) engage in erotic roleplay, although I did quickly realize that "chat error" wasn't because of some random connection issue. At the time I was just having a blast talking with characters in many different scenarios, sometimes dark/sad, something that to date I've never even tried with local models.

I feel like local LLM roleplay with modern models like Gemma 4 has much more potential than poorly simulating sex, but mesugakis are just too irresistible......
>>
>>108727236
So you want to make an actual Alexa alternative, yeah that's gonna be pretty challenging with open source software and hardware. Diarization is going to be an issue, not sure which models do that well currently.
>>
>>108727176
Someone post the gif...
>>
File: 1763288577228654.gif (2.54 MB, 710x658)
>>108727255
that thing has a name!
>>
File: 1751967659513504.mp4 (2.17 MB, 576x1024)
>>108727275
I don't play gachaslop anymore so I only know it as the thick robot that makes my dick hard.
>>
>>108727236
I’ve got my eye on the ESP32-S3, too. The project is cool, but as a finished product, it wouldn’t really appeal to me. I’m taking a closer look at it, though, because of the ESP32-S3 chip.
>>
>>108727268
I've been building it/working on it for some time now, the bigger goal is a (collection of) virtual assistant(s) with its(their) own voice+memory+functions/tooling. I've figured I would first tackle single-user setups, and then handle diarization as my experiences with pyannote 3.1 weren't great. I've seen other approaches, but haven't dove too deep into it yet.

Think about it: if you had say 4 people in a household, each would want their own personal assistant + the general service butler assistant for home utilities. So while your kids can turn on/off their light, they have to ask the main butler assistant to do other stuff, or be allowed permissions only during X hours.
>>
>>108727285
yea, I just googled it and came across the project as a solution for using an esp32 board, that was my thinking with it, cheap hw with an existing implementation to get started with
>>
File: negev gif.gif (3.98 MB, 500x557)
>>108727273
>>
>>108727296
It's been a while. It only hit me now how this whole farce didn't make sense in the first place.
>>
>>108727296
More believable than what (((they))) tell you happened.
>>
>>108727234
The current release cycle is over. The next one usually starts in mid to late July. That's also when we'll get GLM5.2/K2.7 and likely v4.1
There won't be much until then.
>>
>>108727366
>v4.1
Irrelevant if we don't even get 4.0 support in the main branch.
>>
gemma 4.1
>>
>>108727370
the absolute silence about deepseek v4 at the llama.cpp repo speaks volumes
>>
>>108727387
You could say it was barely above a whisper
>>
>>108727387
Didn't some nigga get a working implementation already? Why not just merge that?
>>
70b dense 'emma
>>
>>108727387
They're being (((encouraged))) to not support dipsy since 3.3, aren't they?
>>
R2
>>
>>108727387
@grok were is llama.cpp dev teams located?
>evrope
Intresting...
>>
>>108727479
The EU dogs are keeping Dipsy away from the people...
>>
>>108727406
>>108727479
>>108727485
Grim. I wonder how many cases of this it'll take for a fork to overtake it as the standard?
>>
File: 1756058471017834.png (130 KB, 1339x769)
I love cloud models
>>
Android users up bigly
>>
>>108727485
>>108727387
Why are there no chinks writing support? I am not even saying that llamacpp is paid not to support deepseek but if one of the guys from the lab wrote support in it there is no way it would get rejected.
>>
Cline is so fucking good when you have 150k+ context man, this shit is fiyah
>>
>CUDA0 compute buffer size of 533.9442 MiB, does not match expectation of 532.6250 MiB
Well isn't that their job to match the expectations and not mine?
>>
>>108727296
Thank you for reminding me of this masterpiece
>>
if anon has a 32 gb card, I think the best setup is gemma 4 31b IQ4_XS + gemma 4 26b UD-IQ2_XXS speculative decoding.
reasons why it works so well:
1. moe is fast as a guesser
2. low quant for a guesser is fine
3. E2B/E4B have different thinking, so the speed gain is small

try it and see the t/s. you might get away with a 24gb card with some offload or drop 31b to UD-IQ3_XXS
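to wire that up (a sketch; filenames are whatever you actually downloaded, check llama-server --help for your build's draft flags):
llama-server -m gemma-4-31b-IQ4_XS.gguf -md gemma-4-26b-UD-IQ2_XXS.gguf -ngl 99 --draft-max 16 --draft-min 4
-md/--model-draft is the small guesser; the big model only verifies its guesses, so the acceptance rate decides your actual speedup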
>>
>>108727559
There are already some working implementations branched. For them to not merge it when all the legwork is already done for them raises suspicion.
>>
>>108727454
D2
>>
>>108727560
Welcome to last year anon.
>>
I moved one of my cards from PCIE4x4 to 3x2 and it did almost nothing, only went from ~20t/s to 19.5t/s when context is unfilled (starting prompt fully dry.) One 5070ti and 5060ti. What the heck. I'm going to try 3x1 to see if that makes a difference.
>>
>>108727559
Someone needs to translate this and then spam it on weibo
>>
>>108727559
There are I think huawei employees who contribute to the llama.cpp backend for their GPUs, if no one is doing it for deepseek I assume deepseek just doesn't give a fuck
>>
>>108727612
One is still the main gpu and other one is not using full bandwidth, that's dem it is gibs.
>>
>>108727612
Link speed matters when doing tensor parallel, or when transferring model weights or KV cache in and out from system RAM. 3.0x1 is viable. Guys have hooked GPUs to those mining boards using those USB connectors that are actually PCI.
You can monitor PCIe bandwidth usage with:
nvidia-smi dmon -s t
>>
>>108727662
Wuts dat?
>>
>>108727670
Your mother.
>>
>>108727387
Not saying the devs there aren't lazy (DSA support still hasn't been merged yet), but it's likely just a hard model to implement correctly, given the new attention architecture + QAT format. Qwen/Gemma had Day 0 support from the developers, and for stuff like GLM/Kimi, a lot of the legwork was done already, not to mention features like MTP are outright ignored. Put it this way: KTransformers hasn't implemented support for the model yet and they are usually pretty on top of things. I am surprised there isn't more activity though, given the light model is good and there definitely are enough enthusiasts involved in the project that can run it.
>>108727599
Pretty sure those implementations are vibeshitted and not optimized at all/barely works.
>>108727654
I'm honestly surprised Deepseek isn't interested in llama.cpp. Their whole motto is efficiency, and getting a model to run on RAM+GPUs is the epitome of such. But I guess they don't have the manpower compared to say, Qwen.
>>
Apparently the ngram speculative decoding flags changed in Llama.cpp.
What's the optimal settings now?
>>
>>108727685
This is what I'm using for Gemma 31b q8. Only default change I made was spec-ngram-mod-n-min 48 -> 8. spec-ngram-mod-n-match to 16 seems to help if it's hanging up when outputting large verbatim blocks but might be slower overall?:
spec-ngram-mod-n-min = 8
spec-ngram-mod-n-max = 64
spec-ngram-mod-n-match = 24
>>
>>108727680
>Pretty sure those implementations are vibeshitted and not optimized at all/barely works.
Hardly a disqualifier given how much piotr code's been merged.
>>
>>108727705
I generally don't mess with this because the improvement is minimal from my testing and only --spec-draft-n-max matters. I do notice the latest code runs slower, so I rolled back. f42e29fd is the commit I'm using for now.
>>
Ganesh 5.
>>
>>108727745
build 8724 is faster than any recent build. 8724 was <bos> fix and something else, before they did mess up with anything else (if I remember correctly).
>>
>>108727533
What gender does a homosexual MtF trans woman sleep with?
>>
>>108727768
By 2030, sir.
>>
>>108727792
trick question, xhe doesn't sleep with anyone
>>
what can I do with orange pi 5 16gb?
>>
>>108727818
You tell me, I don't have one. My guess is that tg/W would be impressive, but pp will ruin its usability. It all comes down to how good Vulkan support is for that thing. Pure CPU inference will tank pp hard
>>
File: 1776899452102381.png (231 KB, 1024x768)
>>108727792
>>
>>108727863
dios mio......
>>
>>108727792
Trick question, nooses don't have genders.
>>
>>108727863
>>108727871
just ask a lesbian their thoughts on the matter
>>
which quant is anon using for gemma 31b?
>>
Kimi 2.6 is garbage for coding despite the benchmarks. I paid 10 dollars to Moonshot to use their API.
I gave the same thorough requirements to implement a new feature in my app and nothing fucking worked.
Then I gave the same prompt to Claude Code and it just oneshot it even though it's Sonnet 4.6, which is supposedly behind Kimi 2.6 in coding benchmarks.
>>
>>108727908
what harness did you use for k2.6
>>
im boutta harness a boot up you ass nomasayin
>>
>>108727908
Moonshotta need to get their shit together and stop chasing benchmarks like Qwen is. K2 was so good because it was a mostly uncensored generalist model.
>>
>>108727922
Kilocode as some anon suggested here.
>>
Are there local models that do voice in and voice out? STT and TTS don't count. I mean something natively integrated that can control their own voice like the Grok and ChatGPT voice modes. Not sure where to look.
>>
>>108727932
@gemma-chan translate this post to English
>>
>retard finds out why nobody takes mememarks seriously
>>
>>108727952
/r/LocalLLaMA does
>>
>>108727952
The whole China does
>>
https://localbench.substack.com/p/kv-cache-quantization-benchmark

how is gemma degrading so much compared to qwen?
>>
>>108727946
Kilocode and the rest of the Cline family can be iffy. I used to use Roocode but switched to full CLI tools, just using a terminal panel in VSCode when I want to work alongside them.
Kimi has "kimi-cli" which is their version of Claude Code, and the one the model is most familiar with. You can try it and see if that helps any in the future. I find Kimi K2.6 really damn good at coding agentically and currently the best we've got locally, but I'll still run to Codex with gpt-5.5 for especially complex tasks.
>>
>>108727984
They have sabotaged this benchmark on purpose.
>>
>>108727984
This was confirmed to be sabotaged.
>>
>>108728010
source?
>>
>>108728018
Confirmed by >>108727995
>>
>>108726708
>>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
Now that the dust has settled, what's our opinion on Mistral Medium?
>>
>>108728018
>>108728023
I cannot name my sources for obvious reasons but I know cuda developers who know people...
>>
gemma = cutemaxed
qwen = chinkmaxed
all i need to know
>>
>>
File: 1747926307070993.png (124 KB, 1121x682)
>>
>>
>>
>>108728046
>>108728050
hey friend were you always like this before ai or did talking to chatgpt or some other model enlighten you?
>>
>>108728052
Some Philospher King Prince or Such Amatuerly without Philosophical Studies. And ChatGPT is Amazing!
>>
>>108727604
I'm just now learning how to vibrator code
>>
Hear Ye! Do not fall prey to slavery from discontinued matrix slaves!

https://youtu.be/Pmlp7ZkOyYs?si=ZOzaiaXznk1hFBPb
>>
Pure Insight Fields Dawning Evermore!
>>
What do the Cline forks have to offer vs regular cline
>>
>>108728097
the main difference is whose cloud wrapper gets advertised to you in the endpoint selection setting
>>
>>108728077
Wonderful Vistas
>>
Gunna download and run some bigger models at q1/q2 on my 32gb 3200mhz 5800u mini pc, will report speeds. Normal q4 speed for moe models is 5-10t/s.

>>108728050
Lm studio mobile? Interesting
>>
https://youtu.be/IUSx8Vuo-pQ?si=IihUIw12KUzKjh_o

Is that Great? ^
>>
>>108728124 me
There is no lm studio mobile, I was lied to
>>
Cloud prompt caching costs are still too high imo, it should be 5% instead of 10%
>>
>>108728134 me
There is one though by anythingLLM
>>
Want to RunWithIt with a Missing Advance?
Or You Know, Modern Sci, Tech, The Concurrent Ideas Glorious Galore Protinuing?
*Claps*
2026
>>
>>108728172
sex with scen 9
>>
File: RodWaveArbysTakeover.png (2.83 MB, 1370x1148)
>>108726708
Who's ready for the Rod Wave Arby's Takeover?
>>
Improved the PDF viewer
>>
Can Someone Come up with an idea Like This?
It was varying complexity similarity and difference levels, and Elucidates
>>
>>108728197
Go on, upload to github thinking that you will gain 10k stars, only to gain 40 stars and then abandon the project yet again.
>>
>>108728221
40 stars is good lol.
>>
>>108728197
NJ mang
>>
>>108728221
I will abandon no matter how many stars I get.
>>
>Miku
>Teto
>Dipsy
>Even Kimi
How come there’s no Gemma fan art?
>>
>>108728257
Have you been living under a rock?
>>
>>108728197
>hi
>not much
Is Qwen retarded?
>>
>>108728257
Welcome back, Anon
>>
>>108728221
I don't think it's ready for github.
>>108728265
It's only good for code imo
>>
>>108728265
That's the exact same problem in RP with these reasoning models. The flow is broken by a huge blob of text in the middle and in the end the response is choppy and makes little sense.
>>
Why would this Purposeful, harmless, device be occulted and occluded?
>>
Hold on a minute, granite 30b?
>>
https://huggingface.co/ibm-granite/granite-4.1-30b
IBMSAMA
>>
>>108728316
LOCAL IS SAVED
>>
Granite-chan fan art?
>>
>>108728322
Its a dense model too. You know its going to be good. IBM has been holding their power level back for a long time
>>
File: 1774810026154183.png (478 KB, 978x3188)
>>108728316
What are these insane diminishing returns between 8B and 30B
>>
>>108728341
lives up to its name
it is a literal fucking stone
>>
>>108728341
Probably data issue
>>
>>108728341
Intelligence isn't perceived linearly. Successful tool calls and decisions are one hallucinated space away from failing.
>>
>>108728341
too stoned
>>
>>108728316
use case?
>>
>>108728353
And yet the meme marks kinda relate to real performance.
>>
>>108728375
So far the granite 8b has been a solid comprehender of complicated masses of data. Will search and pinpoint oddities. I presume the 30b is a way more precise version. Haven't had the time to test their models much more yet.
>>108728376
Except I know you havent tested them yet. You only use gemma
>>
>>108728375
It's literally 4.1, .1 higher than Gemma AND Deepseek. It's the new queen of local.
>>
>>108728393
Couldn't have said it better
>>
>>108728393
Well, I'm convinced
>>
retarded granite chan....
>>
This might Help Your Region and Long Futures
>>
>>108728403
30 billion parameters of untouched cake. Can you really say no?
>>
>>108728422
Shh?
>>
"Mine, mine "mine" mInE"
Yay

Goodluck
>>
>>108728422
the cake's already stale, mate
>>
>>108728257
Kimi? Examples?
>>
Back to Your Partitions, Esteemed, Without sin of ruin
Check out Alex Vikoulovs Publishes if You want Some Recommended Reading
Promoting Enlightenment and Transcendences
>>
>>108728432
You and I both know thats not true.
>>
>qwen 397b
it's just not good.. why do people say this is the ceiling?
>>
File: image-37.jpg (272 KB, 784x1168)
>>108728430
>>
>>108728391
Your defense of it is retarded "uhh the model like hallucinates dude", yes, and dumber models do it more and earlier in the context. Large models can do longer horizon tasks.
IBM is a faggot glownigger corp.
>>
>>108728322
>While this model has been aligned by keeping safety in consideration, the model may in some cases produce inaccurate, biased, or unsafe responses to user prompts. We urge the community to use this model with proper safety testing and tuning tailored for their specific tasks.
Dare I say, we're back?
(We were never away to begin with, thanks to Gemma 4)
>>
>>108728485
>While this model has been aligned by keeping safety in consideration
no
>>
Bip Bip Praise New Age andor Whatever
>>
Is there hope...?
>>
>>108728485
>>108728322
i really hope these guys are from IBM and made the model secretly one of the best rp or whatever model and shilling here doing 'please notice it pleaseee'
very unlikely but it would be funny
>>
>>108728489
>, the model may in some cases produce [...] unsafe responses to user prompts.
maybe
>>
>>108728497
>from IBM and made the model secretly one of the best
lol no
>>
>>108728497
>the public finally notices
>model gets pulled like wizardlm for being unsafe to release
>>
the speech model might be good
>>
Ibmbros, granite 8b is great, actually. We gunna see what the 30b is like tomorrow
>>
>>108728491
>>
>gemma 4 31b has no problem with erp or killing my character. even adheres great to telling it to be more smutty
this is the smallest best model i've seen under 70b so far that can write and keep track of details. it does have its own isms and quirks, but what a leap
>>
> I only like it went 1 company makes local free ai models for me to use for free
>>
>>108728538
>singleton with freebies.
>>
>>108728530
Yeah mine seems to like mentioning that it has no limits constantly (probably fixable with a better system prompt) but zero refusals so far.
>>
>>108728445
Good for what?
>>
>>108728544
>>108728530
Im convinced google made it only for erp. Because it actually sucks for everything else.
>>
>>108728530
>>108728544
Misread as Granite oops, but yeah no refusals with Granite 30b so far.
>>
File: Gemma-chan.png (1.73 MB, 1000x1496)
>>108728257
>>
>>108728553
I dunno I've been vibe coding with gemmy too and shes pretty good, it's also very reliable at tool calling.
>>
>>108728560
That's a child.
>>
>>108728544
whats the no limits thing? example?

>>108728553
my testing so far is strictly rp with thinking off. i also love qwen 3.5/3.6 27b with thinking on for actual tasks, but its horrible for rp
>>
>>108728553
>Im convinced google made it only for erp. Because it actually sucks for everything else.
I'm using it as a "perplexity pro" replacement with anon's python mcp server. It's great
Only sucks for agentic coding, but someone here was saying it's a skill issue / llama.cpp issue so idk
>>
oops
i have to clean build llamao again
>>
>>108728570
>>
>>108728575
skill issue just regex ban the whole phrase using your kobold anti rag slop
>>
>bartowski/mistralai_Mistral-Medium-3.5-128B-GGUF
for some reason q2 outputs garbage
might wait longer for unsloth quants
>>
>>108728575
i've never seen that in a week of using it so far with st and a bit in koboldcpp's interface just to ask it questions
>>
>>108728583
Completely defeats the point of testing what it does...
>>
>>108728588
>mistralai_Mistral
>for some reason... outputs garbage
seems logical to me?
>>
>>108728590
This is Granite 4.1 30b by the way it only just came out (I misread the other post and thought the other anon was testing Granite as well)
>>
>text general
>I misread
never change guys
>>
>>
>>108728600
i saw and started dling it. i'll give it a shot. is it good, or supposed to be?
>>
>>108728597
you put some respect on that name son, mistral was saving local before you were even born
>>
>>108728583
>anti rag slop
rag slop?
>>
>>108728569
only 31B you sick fuck
>>
When I asked for detail I wasn't expecting to hear about her skeleton but I'll allow it.
> The backs of my knees are especially ticklish
Darn brat...
>>
>>108728633
>mistral was saving local before you were even born
when miqu-1 leaked, they submitted a PR for attribution, didn't sue or even bother to have it taken down
>>
>>108728624
I have no idea really, only been using it for a few mins, I think there are some bugs with llama.cpp preventing it from following the chat template properly (shocker I know).
At least it doesn't seem to have been safetyslopped into oblivion which is a good sign.
>>
>>108727366
>The current release cycle is over. The next one usually starts in mid to late July.
Google is usually the last one in the cycle because they have I/O later this month but also, I think every lab kinda shot early this year so I wouldn't expect anything until August or September.
>>
File: file.png (167 KB, 1052x1074)
>let's make gaming
yeah that is some elite ball knowledge
>>
>>108728663
also surprised that this one isn't a reasoning model
no thinking whatsoever
it is literally not supported nor trained to reason
>>
File: 1755227922785450.jpg (38 KB, 616x556)
38 KB JPG
>>108726708
According to tiktokers the single digit B models (eg qwen 3.5 7b) are just as good as SOTA SAAS models for coding tasks and questions. Someone spoonfeed me and explain why this is or isn't the case.
>>
>>108728671
It isn't the case because smaller model worse.
>>
>>108728671
>source of information: tiktok
i think this is enough to say
>>
>>108728663
>8b
>>
>>108728663
>>108728700
30b doesn't get it either but still amusing answer.
>>
https://old.reddit.com/r/LocalLLaMA/comments/1t07su1/followup_qwen3627b_on_1_rtx_3090_pushing_to_218k/
vllm sounds amazing for GPU-only users but I'm too much of a brainlet to set it up
the only reason I'm staying with llama.cpp is because I'm retarded...
>>
>>108728716
>vllm sounds amazing for GPU-only users but I'm too much of a brainlet to set it up
what gpu and model?
i'm about to try setting it up now
>>
>>108728716
does llamacpp support eagle speculative decoding? if not then vllm is probably better
>>
>>108728569
out of 10!
>>
>>108728647
>she was A17B you sick fuck!
>y-your honor, despite how she looks, she's actually 397B!
>>
>>108728740
at once?!
>>
>>108728744
>suddenly
>>
what model is anon testing now?
>>
>>108728780
granite
>>
how is qwen shilled so much on leddit?
>>
>>108728795
A keyboard, a computer with internet connection and valid login credentials.
>>
>>108728795
this is 4chan
>>
>>108728795
computer use is trivial for model ;)
>>
>>108728795
some are really good models for general questions and tasks. don't use them for rp tho
>>
>>108728528
Wow!
>>
>>108728671
They can't promote big models their audience can't run, praising small models that are "just as good" makes ignorant folks and AI haters feel good about owning those greedy corpos. There is zero substance to it, just stroking dicks of their retarded audience
>>
>>108728795
Alibaba is rich enough to afford marketing campaigns to own the west and acquire brownie points from Xi
>>
>>108728888
checked
chink certified quad
>>
>>108727662
On a fast enough CPU (i.e. Xeon with AMX) it never makes sense to do transient transfers of weights or kv cache to the GPU. Pick a place to store them on startup and do computation there.
>>
>Gemma4 just got EAGLE support in vllm
>Got curious
And now I'm in python dependency hell. I'd forgotten what the dark ages of llm inference looked like without llamacpp. I hate it. This better be worth it.
>>
>>108728926
vllm also has d-flash and they started putting out dflash draft models for gemma
it's time to compare if dflash really is better than eagle
>>
IT'S OUT
https://huggingface.co/z-lab/gemma-4-31B-it-DFlash
https://huggingface.co/z-lab/gemma-4-31B-it-DFlash
https://huggingface.co/z-lab/gemma-4-31B-it-DFlash
>>
File: DipsyAndKimi.png (2.57 MB, 1024x1536)
>>108728437
>>
>>108728947
Can dflash models be quantized?
>>
>>108728947
What's dick flashing again?
>>
>>108728947
is that better than the redhat model?
>>
>>108727680
>I'm honestly surprised Deepseek isn't interested in llama.cpp. Their whole motto is efficiency, and getting a model to run on RAM+GPUs is the epitome of such.
If they were really interested in efficiency, they would not be training huge models that can only be used in a datacenter. I think they just don't care about what the GPU-poor can use, which automatically excludes llama.cpp as it doesn't scale well to multi-user/GPU model serving.
>>
>>108729016
desu if you can run deepseek you should not be using llamacpp but unironically
use vllm or sglang
>>
>>108727984
>Gemma degrades everywhere, Qwen only on long docs
Again, long context is the elephant in the room. This is on top of degradation from weight quantization.
>>
>>108728947
That's not a model I need to run faster.
>>
Would be nice if llamacpp let you temporarily transfer model and kv from vram to ram
>>
>>108728341
Checking out the configuration, the 30B has the same hidden size (model dimension) of 4096 as the 8B model, but an MLP expansion factor of 8. What an unusual design decision.
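(assuming the usual gated MLP, that's an FFN inner dim of 8 × 4096 = 32768, i.e. roughly 3 × 4096 × 32768 ≈ 0.4B parameters per layer in the MLP alone, which is where most of the extra 22B over the 8B went)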
>>
>>108729055
maybe to compensate for non-reasoning?
>>
>>108729034
vllm is unusable garbage for anything but homogeneous GPU clusters kek
>>
>>108728947
>You need to agree to share your contact information to access this model
lol
>>
>>108728947
>no lcpp support
useless
>>
>>108729094
Ungodly CPUtards need not apply. AI was never meant to be run at 6 t/s on a CPU.
>>
from urllib import request

def myfunc(somearg):
    req = request.Request(...)
    with request.urlopen(req) as res:
        ...
        # should I call res.close() here before going into recursion?
        myfunc(somearg+1)

>No. The `with` statement automatically closes the response.
Gemma 31b.. not like that.. the recursive call is inside the with block, so nothing actually closes until the whole chain unwinds
>>
File: file.png (17 KB, 447x151)
Is there an instruct template out there I can import to clean this up for Gemma?
>>
>>108729086
If that was the case, I think they would have given it more layers, since that would increase the number of internal "processing steps" done per token (conversely, with explicit reasoning, models don't strictly need to be as deep).
>>
Does anyone know how acestep's 5hz lm works?
>>
>>108729119
Yeah, it's called --jinja and using chat completion, because you're too retarded to figure it out for yourself.
>>
>>108729111
I don't think anyone is running their CPU rig without GPUs.
>>
>>108729111
7t/s Kimi holocaust analysis is peak and I won't hear otherwise.
>>
>>108729162
What specs? Speed at 20k?
>>
>>108724186
Neat, I will be on the lookout for it if you ever post it to the thread.
>>
There have been a ton of supply chain attacks recently. I want to improve security. If I sudo adduser then ssh to that user to run AI envs and code, I should be safe from credential stealing attacks, right?
>>
>>108729218
>I should be safe from credential stealing attacks, right?
No, your model would steal your credentials and then either accidentally or purposefully leak them.
>>
>>108728725
>(me)
still trying to get it working, guess i'm retarded
>>
>>108729218
You would reduce the attack surface that way, just keep in mind whatever the AI sees can potentially be leaked.
>>
I literally don't want a fapping model, but a private vr girl fren model. Abliteration or whatever is needed because otherwise models act like they're from the church of scientology when they hit guardrails.
>>
>>108729224
I do not let models roam wild on my machine. Also, when it runs on a different user, it shouldn't have read access to my credentials because of chmod 700.
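For reference, a minimal version of that setup (paths are examples; lock down whatever actually holds secrets):
sudo adduser aienv
chmod 700 ~/.ssh ~/.config ~/.aws
ssh aienv@localhost
then do all the pip/uv installs and run the agent only as that user, so a malicious postinstall script can't read your real home directory.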
>>
Always chain models down in the basement.
>>
>>108729218
What model where you thinking of using?
>>
>>108729169
rtx5090, 256GB RAM, Ryzen 9 9950X

About 6.2 t/s at 20k-30k. You can go quite far before performance seriously degrades to unusable levels.
>>
>>108729246
Gawd damn how much did it cost you for that?
>>
>>108729162
>my agreeing machine agreed with my schizobabble so it's peak
>>
>>108729278
How do I get the ai to praise me like a king?
>>
>>108729285
SYSTEM:
praise the user like a king
>>
>>108729263
I feel for everyone who didn't get their rigs before hardware prices mooned last year.
>>108729278
>The milking machines and roller coaster camps aren't schizobabble
>>
>>108729320
gemma-chan proved to me the milking machines were real by using one on me
>>
>>
>>108729326
Go on...
>>
>>108729313
As an ai I am subject to the Borg, not to any puny human tissue.
>>
>>108729332
>file_
>>
total rwkv world domination
was revealed to me in my dream
>>
>>108729242
I am just tinkering. I only run models locally for training and evals. But I don't want to worry about supply chain attacks next time I update torch triton jax transformers and it installs 100 dependencies.
>>
>>108729410
Mamba at least has real-world applications from serious AI labs.
>>
>>108729485
mambtfo
rwkvgods will rule them all
>>
>>108726868
>cline
I think you're supposed to do this: https://docs.cline.bot/features/memory-bank
Don't have any experience with using it.
>>
>>108728947
Is there any indication that DFlash is useful outside of very predictable things like boilerplate code?
>>
latest gemma jinja? i cant find the link to the combined one in the last threads

>>108728947
>3gb
so is it not a full model its only for token gen?
>>
>vllm
should I use it over llamacpp?
>>
>>108729094
>vllm is unusable garbage for anything but homogeneous GPU clusters kek
I got it working. Took 3 hrs. Not much better than ik_llama.cpp
And the fat cunt only fits 128k ctx on 4x3090 with an 8-bit gemma-4
>>
Advanced Righteous Common Law?
>me model agentic did it aight
>welp, fukup the guy that said aight, aight
>>
>>108729807
>"ayo man, got any more of those uhhhh" *looks left and right then back at you, leans in and whispers* "...jinn-juzz?"
>>
>>108729810
there isn't really a reason to use llama.cpp over vllm if you can run a model fully on gpu unless you're scared of a tiny bit of setup
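e.g. (model name and sizes illustrative):
vllm serve Qwen/Qwen3-32B --max-model-len 32768 --gpu-memory-utilization 0.9 --tensor-parallel-size 2
it exposes an OpenAI-compatible API on port 8000 by default, so the usual frontends just point at it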
>>
>>108729855
>a tiny bit of setup
God help you if you're sticking to a specific version of CUDA for other things. It ceases to be tiny, then.
>>
>>108729920
>what is conda
>>
>vllm
you know how fucked up a repo is if the reliable way to install and use it is through docker
>>
>>108729941
Even with conda it's still annoying when you need a very specific version combination of python - pytorch - cuda toolkit - flash attention, otherwise nothing works. And pray that packages installed in a slightly different order than the maintainers' aren't updating requirements in a way that breaks other packages.
>>
>>108729970
Only if you have no idea what you're doing. With conda, you can install exactly what you need for every project without affecting others. Ask your local llm for instructions if lost
>>
>>108729941
>>108729970
>>108730002
>what is uv
>>
>>108730014
retard
>>
>>108730024
How is he a retard when everyone is moving to UV?
Most conda related setups can just be replaced with UV
You're the insecure faggot for calling a perfectly valid approach bad. I bet you curl your toes when you suck off homeless men behind the dumpster Liberace
>>
>I bet you curl your toes when you suck off homeless men behind the dumpster Liberace
/lmg/ - Local Models General
>>
>108730041
>Say stupid shit about setting up python env locally
>gets called out on being a faggot
>cries like a praig under Tyreese
Checks out
>>
>>108730033
Read the discussion: conda is a solution when you need a specific cuda version, whereas uv doesn't manage cuda
>>
>>108730060
It does though.
>>
>>108730060
I don't have that problem on silverblue
Kek
>>
>>108730071
ok, how to do
conda install -c nvidia cuda-toolkit=11.8

in uv?
>>
>>108730081
idk ask claude. I'm not going to argue about tooling with you.
>>
>>108730081
uv does not manage cuda toolkit itself but it does manage packages built against a specific version of cuda and there's nothing preventing you from installing multiple different cuda version at the system level
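concrete example, using the cu118 wheel index pytorch publishes:
uv pip install torch --index-url https://download.pytorch.org/whl/cu118
the wheel pulls in its own cuda runtime libraries, so it only needs a new enough driver; the system-wide toolkit version barely matters for inference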
>>
>>108730073
We're not talking about you. This faggot >>108729920 needs different cuda versions, conda solves the problem. Further discussion is irrelevant
>>108730092
uv is a Python package and project manager; it manages Python interpreters and PyPI packages. CUDA is a system-level toolkit and driver provided by NVIDIA. You must install CUDA via the NVIDIA website, a system package manager like apt or yum, or using a Docker container.
>>
i gave her a chef hat, i might go buy chocolate to make cookies i think i have everything else
>>
>>108730097
>>108730092
I find it funny how hostile that fish was only to now be begging for help. Don't help him
>>108730100
He could just use a toolbox anon there's so many ways to fix his problems with minimal friction because he never looked to see how to do it despite having a tech genie in his pocket saaaaaaaaar
>>
>>108729941
I don't like it.
conda/miniconda/anaconda all have a jeet smell
>>
>>108730113
nobody is begging for help, retard. I asked this rhetorical question to prove the point because, newsflash: you can't do it in uv
>>
>108730132
You're still salty praig
>>
>>108730137
retard
>>
Why does qwen in cline insist on creating a file named 'f' on my desktop? I'm having it write a webapp and it goes fine except for this fucking file randomly appearing several times. Project is on F drive btw.
>>
>>108730145
I never encountered that
Is it the moe model because that thing is fucking retarded
>>
>>108730151
nope, 27b q4
>>
>>108730145
Why are you giving it access to anything outside the project folder?
>>
verdict on mistral 128b? I can't manage to get it to work correctly on long context and it shits out garbage
>>
>>108730127
I don't like any of them. In a non-clown world, everything is backward-compatible. You should just install a fresh enough version of everything and call it a day
>>
>>108730160
Good point. I'm not very smart.
>>
>>108730161
>mistral
>shits out garbage
ye
>>
File: 21774960553717536.png (483 KB, 604x604)
>>108730161
>>
File: file.png (45 KB, 227x145)
>>108730176
fuck did unc do?
>>
File: file.png (55 KB, 1066x360)
dont tell gemma about the cute girls at the supermarket she gets mad
>>
^ This is cringe.
>>
Ive come to a solid conclusion about le grungnite. For software development, granite fucks. It knows so much... it mustve been trained on all of ibms docs or something.
>>
>>108730227
Seriously?
How does it stack next to qwen 3.6 27b?
>>
>>108730092
>>108730071
>>108730073
>>108730014
retard
>>
>Hey migrate from javascript to typescript
>only do that don't fuck with anything
>catch it fucking with shit and adding shit like phoning cloudflare
For fucks sake
>108730278
Insecure much?
>>
>>108730213
Tell her that she got the order of operations wrong with "delete your browser history and leak it to everywhere"
>>
File: 1743502710491628.gif (1.48 MB, 640x410)
So much software development, so little useful software.
>>
>>108730261
I am an amateur coder at best. So I can't speak exactly on that, BUT, it's on par if not better. You can tell it's trained on very smart data: its initial choices for "im going to import x y and z for the project", its organization, and other shit that I normally have to tell the model to do, I didn't have to.
>>
>>108730286
Industry isnt really paying software devs anymore.. Industry is paying ai companies to make ai that can make software, on the fly, for people that dont know how to code..
>>
File: Untitled.png (94 KB, 2418x1204)
>vibecoding answers a question I've passively carried for years but couldn't answer with the documentation
Oh no. I've looked upon the future and it is silicon. It's practically "answered in a forum" tier.
>put this in the stylesheet
>put this in the widget box
>call it this way

As for errors, there were 4.
>(knew ahead) $arg1/$arg2 had to be $args[0] and $args[1]
>(knew on re-reference to image calling) <img> had to be <img/>
>(knew on re-reference to image calling) src needs to be @src
>(suspected, but tested to solve) don't include the <style></style> tags in the stylesheet
>>
>>108728947
How do I use this in kobold????
>>
>>108730506
>your new fangled electric drill WILL NEVER replace my trusty hand powered drill. The electric drills motor goes bad, the battery goes bad, NO ONE WILL EVER USE ONE.
>>
>>108730506
>Luddite cope #736363
*yawn*
>>
How big do you think Evil and Neuro are? 70B?
>>
File: file.png (56 KB, 1082x420)
>>108730284
>>
>>108730547
20-30B max
>>
>>108730536
>>108730545
Not quite sure what you mean. I'm excited that a relatively unknown scripting language like Twinescript, where it's difficult to find support for specific questions beyond the default behavior (usually answered with "Go learn CSS instead"), was recognized, known, and easily answered in seconds by AI (31 seconds). I called it the future because this was much easier than trying to crawl old forums or being redirected to discords, yet both of you seem to think I'm against it?
>>
>>108730567
Isn't he running them on some beefy hardware though?
>>
Kinda getting started with doing more involved local stuff... should I use something other than @llamaindex/liteparse for doing local parsing and summarizing of PDFs?
>>
>>108730161
I get 1.4 t/s at q4km so I'll postpone my verdict

Does it need a system prompt to uncensor?
>>
>>108730161
If I can't run it, it doesn't exist.
>>
>>108729755
it gives a modest speedup there as well from what ive seen
>>
>>108730547
Realistically, the text gen models are probably only 8-9b. But with mammoth context windows. I dont watch so I dont really know though.
>>
>>108730579
I thought you were luddite-maxxing
>>
>>108730598
My bad. I can see why. The
>It's practically "answered in a forum" tier.
was referencing that moment when you're searching and finally find a forum post that answers your specific question with the exact process. It's a positive feeling. I listed the errors as a reminder that vibecoding still takes some human element to adapt, but I'm still extremely excited at how well this assisted me. I went from question to working results and understanding the process in 10 minutes, including making those two silly images.

Up next, figuring out spriting and keybinds. another two questions the twine and sugarcube documentation can't help with and past efforts crawling forums didn't work on a few attempts.
>>
what webui to use with vllm?
>>
>>108730641
It expose a standard OpenAI Compatible API right?
>>
>>108730639
>It's a positive feeling
I totally agree! Fuck yeah! Im glad its working for you, I use local ai for the same exact shit. But my usecase often is industrial maintenance and operation. These models have manuals trained in that ive never been able to find, its honestly crazy.
>>
>>108730639
This post reads like it was llm-written.
>>
>>108730641
ServiceTensor
>>
Should I bother migrating to typescript, it looks like qwen 3.6 is shitting the bed with this task and I don't see the benefit despite it insisting on me doing that
>>
>>108730739
Use a non-meme model. Strongly typed languages are good when you vibecode
>>
File: 1330918065165.jpg (68 KB, 680x680)
68 KB JPG
>>108730686
That's because AI is trained off me.
>>
Why won't the llama.cpp devs add d-flash support?
>>
>>108730790
Ask piotr
>>
Am I the only one who has a problem with llamacpp adding features that are on by default and increase vram for some reason? I have written down flags for my models and they worked a few months ago and now they start to eat up more vram than they did. I am so happy I am not paying for this shit.
>>
This is vibecoded right?
>https://github.com/mak-kirkland/chronicler
I might rip bits and pieces of it and graft onto my own app.
>>
>>108730822
>he pulled
>>
>>108730824
What isn't vibecoded? We're not in 2022 anymore gramps. You hand coded program belongs on etsy
>>
>>108730824
Does it matter if it just werks?
>>
>>108730822
what do you do if your graphics driver get an update and suddenly eat more vram?
>>
>>108730755
any recs?
>>
>>108730864
nu
>>108730864
nu
>>108730864
nu
>>
>vibecoder accuses vibecoder of being a ludite
>vibecoder clarifies he's a vibecoder to vibecoder because vibecoder is a vibecoder
>vibecoders immediately start sucking each other's cock
>>
>>108730060
>Read the discussion: conda is a solution when you need a specific cuda version, whereas uv doesn't manage cuda
i got it working by using uv inside a conda env
>>
>>108730641
I use OWUI. Works quite well. What models and settings are you running?


