/g/ - Technology






File: miqu-wh40k.png (2.04 MB, 992x1240)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108722862 & >>108718630

►News
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
>(04/29) Hy-MT1.5-1.8B on-device translation models released: https://hf.co/collections/AngelSlim/hy-low-bit-model
>(04/29) IBM releases Granite 4.1: https://hf.co/blog/ibm-granite/granite-4-1
>(04/28) Ling-2.6-flash 104B-A7.4B released: https://hf.co/inclusionAI/Ling-2.6-flash
>(04/28) Nvidia releases Nemotron 3 Nano Omni: https://hf.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1753825997079083.webm (412 KB, 512x512)
Recent Highlights from the Previous Thread: >>108722862

--Implementing 3D and Live2D anime character animations:
>108723262 >108723274 >108723275 >108723291 >108723333 >108723334 >108723340 >108723361 >108723437 >108723489 >108724235 >108723390 >108723425 >108723430 >108723441 >108723511 >108723531 >108723562 >108723461
--High-speed ASICs for agent swarms and memory retrieval:
>108723627 >108723694 >108723853 >108723881 >108723890 >108723914 >108723943 >108723982 >108724048 >108724074 >108723802 >108724216
--Refining Gemma instruction templates and troubleshooting tool calling failures:
>108723063 >108723194 >108723442 >108723528 >108723538 >108723543 >108723579 >108723585 >108723615 >108723978
--Programming model recommendations and debate over Qwen's factual knowledge:
>108722887 >108722909 >108724345 >108724364 >108725689 >108722915
--Gemini alerting Anon to a supply chain attack on lightning package:
>108724039 >108724060 >108724227 >108724347
--Evaluating Corsair's AMD AI workstation as a viable alternative to GPUs:
>108724583 >108724596 >108724638 >108724671 >108724624 >108724657 >108724735 >108724762
--Comparing Kimi's one-shot coding performance against other local models:
>108722932 >108722944 >108723105 >108723204
--Anon showcases a spatially aware LLM RPG interface and map:
>108724034 >108724055 >108724093 >108724108 >108724139 >108724186
--Gemma-4 over-thinking and looping due to negative constraints:
>108725376 >108725386 >108725394 >108725406 >108725503
--DeepSeek's Thinking with Visual Primitives for improved spatial reasoning:
>108723936 >108724423
--Logs:
>108723262 >108723879 >108724039 >108724155 >108724469 >108724929 >108725080 >108725138 >108725360 >108725376 >108725447 >108725600 >108725689 >108725707 >108725730
--Miku (free space):
>108723131 >108723262 >108723334 >108724929 >108725080 >108725730

►Recent Highlight Posts from the Previous Thread: >>108722865

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
gemmaballz
>>
File: file.png (34 KB, 944x420)
picrel IQ4_XS 26b gem @ 76,000 CTX
55t/s @ 0ctx -> 41t/s @ 67,000 ctx
IQ2_M @ 76,000ctx leaves a bit more headroom, of course more retarded
85t/s
>>
>>108726750
IQ2_M for 26B is kinda yolo. I wouldn't go below Q4_K_M. At IQ2, "retarded" sounds like an understatement.
>>
>>108726750
Is IQ2 even cognizant? I guess as just a chatbot/AI in vidya it could work. Does it do the things though?
>>
What's the minimum viable parameter count for IQ2 models? I'd guess it relies more on the size of the dense layers than the expert layers in MoEs, hence Kimi handles being quanted way worse than Dipsy does.
>>
>>108726782
>Kimi handles being quanted way worse
Isn't that because it's already trained quanted?
>>
>>108726765
it was able to code a math presentation with cool tricks, fix problems because i forced software rendering and code up nv-fix.sh (after back and forths) which shows correct vram usage (unlike nvidia-smi which skips reserved usage)
its alright, however im using unsloth's iq2_m (and bartowski's iq4_xs)
unslop brothers uploading broken quants aside, their iq2_m is magic, tried 3.6 27b, 4 26b and 31b
theyre all pretty cool
3.6 27b was able to code a shitty frontend for llama.cpp
>>
>>108726776
I really like what you're doing.
>>
gemmas thighs

>>108726802
most of the work was from whatever original anon made it
>>
>>108726782
It depends on the model. The Deepseek and GLM (4.6, 4.7) models handled it a lot better than Kimi.
>>108726790
I don't know how it is for the QAT'd ones but the original K2 felt way worse than the API if you ran it below Q6. Way worse than other similarly sized MoEs did.
>>
File: d4RT_Kf78Tk.jpg (54 KB, 598x520)
>/lmg/nonies are actually cordfagging now
thirstposting is fine, but that is one thing and this is another
>>
>>108726794
That blows the fuck out of my mind
>>
File: file.png (27 KB, 461x195)
>>108726822
..it gets worse
https://desuarchive.org/g/thread/107665286/#107667943
>>
Cline failed with vllm at work today.
For some reason, if I tell it to fix a script of thousands of lines, something happens where it can't query the model anymore in that task. The task becomes ruined forever and I have to start a new one.
Why does Cline do this?
Also what's funny is how sometimes it sends prompts of hundreds of thousands of tokens into my 200k token context model and overloads it. Shouldn't cline know better?
I need help but nobody at work knows how to help me.
>>
>>108726814
Doesn't that roughly match with my conjecture because they have larger dense layers?
>>108726822
/lmg/ has fallen. Billions must API.
>>
>>108726843
anon penis interface
>>
>>108726822
Do the right thing. Do what ive been doing.
>>
>>108726843
just bought a claude and gemini membership
>>
>>108726842
Is it always loading the full file and is that breaking the context limit?
>>
>>108726853
Which is?
>>
>>108726859
Well, there are these 3 dots at the top right over every post, and they lead to a subset of options, and you can choose from them a range of other things.
>>
>>108726855
Perhaps not always. I've lowered Cline's context to like 60k (that's what ChatGPT told me to do), but shouldn't Cline already have safeguards against these things? What's the point of Cline if it can't even check a script of 10k lines of code?
It's only 10k lines of code. But I've only been able to use local models to code 200 lines at this rate.
>>
>>108726865
>Well, there is these 3 dots
Its a triangle.
>>
>>108726870
For me its 3 dots
>>
File: file.png (124 KB, 988x784)
>>108726865
i was gonna ask but realized it
that anon must be gay and trying to do bad things
not me btw
>>
>>108726865
>out yourself as a mobile poster
>>
>>108726887
>using 4chud at work
>>108726878
Ah, well, we all get the gist of what im saying.
>>
>>108726830
NTA but the main model I use for coding right now is GLM-5.1 UD-IQ1_M (which, despite the name, doesn't seem to contain any actual IQ1 tensors - it's all IQ2_XXS and higher. 206GB / 754B weights = ~2.1 bpw). It's really good at webshit. It's written three MCP servers without much trouble, and I've currently got it building me a custom frontend. Seems to do better than M2.5 IQ4 or Qwen 397B Q3 despite being what ought to be a complete trash-tier quant. The only real issue I have with it is if you give it too big of a task without breaking it down a bit beforehand, it will think for 100k+ tokens straight and run out of context and die.
>>
>>108726865
>Phonefaggots on the technology board
You better be shitposting from work or something, nigger.
>>
File: 1763421689238029.webm (222 KB, 518x628)
>>108726776
I coooomed
>>
>>108726900
i love the fatty robot
>>
File: file.png (172 KB, 964x828)
reminder to update your linux kernel, and not to download binaries from gay people from d*scord
https://copy.fail/
>>
>>108726906
>she doesn't already run everything as root
>>
>>108726897
>UD-IQ1_M
>for coding
Hold on. Are they optimizing these giga small quants for coding specifically then? Because that's a game changer. I'll have to try some of the huge models now
>>
>>108726906
I already have mitigations=off along with other stuff anyway, these are all for normies so they can be hysterical again for no reason.
If you have compulsive urges to install something on your computer all the time it's actually more helpful to read Computers 101 and Linux 101 than to go hysterical about some security update
>>
>>108726899
I mean, cmon, what else do you expect?
>>
>>108726906
>not using windows
>>
>>108726708
Stop ban evading faggot.
>>
>>108726961
i dont think hes the faggot...
>>108726837
>>108726822
>>
File: 1729426699627152.jpg (84 KB, 680x680)
>>108726961
lmao, im not the person who got banned, I'm another anon
>>
>>108726923
>Are they optimizing these giga small quants for coding specifically then?
I assume unsloth uses their normal calibration dataset for every size, which presumably has some amount of coding data, though I don't know how much. But yeah, I was surprised it worked at all. But I had seen GLM-5.1 demolishing cloud models on various rankings and wanted to give it a try, even if the only quant I could fit was the smallest and shittiest one.
>>
How do you handle wake words?
I've been looking at all kinds of solutions and DIY chips.
It all seems a bit half-baked.
Has anyone here built something that works well?
>>
>>108727019
What kind of hardware do you use it with??? And I assume you are using a harness as well? Like Hermes or whatever
>>
>>108727029
>wake words
Idk what this means
>>
>>108726906
I'd love to, but the kernel has a network driver regression for me that they still haven't fixed.
Updating would solve the problem more than adequately. No internet, no problem.
>>
>>108727039
4090 + 1x Epyc 9005 with 192GB DDR5 4800 (~450 GB/s)
OpenCode for the harness
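(that ~450 GB/s figure: assuming all 12 memory channels are populated, 12 ch × 4800 MT/s × 8 bytes ≈ 460 GB/s theoretical peak, a bit less in practice)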
>>
>>108727045
>Xbox Go Home
Xbox opens up the dashboard.
>>
>>108727039
>>108727047
Guess I should also note, it's pretty slow, like 10 t/s generation and maybe 50 for prompt processing. But I was already in the habit of running the agent overnight / while I'm at work instead of doing stuff interactively
>>
>>108727045
'Hey siri' or 'hey bitch'
With something like this https://store.arduino.cc/products/nicla-voice paired with https://www.syntiant.com/ndp120
>>
>>108727047
Oh, and you are splitting it both on RAM and VRAM. Interesting.....
>>108727056
Yeah, 10 t/s + agent is plenty. I really gotta test this out, I might be able to try out a trillion parameter model.. I thought Q1 and Q2 were completely useless
>>
>>108727051
Oh. Other than stt models for deaf people, I dont know of any software for that.
>>
>>108727074 me
>software
Or hardware
>>
>>108727029
I don't do wake words. Mine just runs with VAD. When I don't want it to listen, such as when I'm taking a call, I just mute it.
>>
>>108727029
i'll post my plans for wakewords later, drawing them up now
>>
>>108727066
>Oh, and you are splitting it both on RAM and VRAM. Interesting.....
That's standard nowadays for big MoEs. The "dense weights" that are active on every token go in VRAM, expert weights go in system RAM. That way you take the most advantage of the faster VRAM
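A minimal sketch of the llama.cpp invocation for that (the tensor regex is the commonly shared one; check --help on your build and the model's actual tensor names):
llama-server -m model.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU"
-ngl 99 offloads every layer to the GPU first, then the -ot/--override-tensor rule pins the expert FFN tensors back to system RAM, leaving the dense/shared weights and KV cache in VRAM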
>>
>>108726822
where is it? i want to join
>>
Plug in the LiPo battery and connect it to the PC via Bluetooth Low Energy. 3D print the case.
The PC detects the UUID. Using the Python module Bleak, you scan for a signal, and when the wake word is received, the room microphone is activated and my STT > LLM > TTS pipeline kicks in.

That’s pretty much what I had in mind first.
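A rough sketch of the PC side with Bleak (the address and the pipeline hook are placeholders, not a finished implementation):

import asyncio
from bleak import BleakScanner

BUTTON_ADDR = "AA:BB:CC:DD:EE:FF"  # hypothetical, whatever your board advertises

async def main():
    while True:
        # poll for the wake button's BLE advertisement
        devices = await BleakScanner.discover(timeout=2.0)
        if any(d.address == BUTTON_ADDR for d in devices):
            print("wake signal")  # activate the room mic, run STT > LLM > TTS here
        await asyncio.sleep(0.5)

asyncio.run(main())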
>>
>>108727029
cant you just have whisper running constantly then start doing something if a string is in its output
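something like this, assuming faster-whisper and sounddevice (model size and wake phrase are just examples):

import sounddevice as sd
from faster_whisper import WhisperModel

model = WhisperModel("tiny.en")
SR = 16000

while True:
    # grab a 2 second chunk from the mic
    audio = sd.rec(int(2 * SR), samplerate=SR, channels=1, dtype="float32")
    sd.wait()
    segments, _ = model.transcribe(audio.flatten(), language="en")
    text = " ".join(s.text for s in segments).lower()
    if "hey miku" in text:
        break  # wake phrase heard, hand off to the real STT > LLM > TTS pipeline

downside is you burn compute transcribing silence 24/7, which is why dedicated keyword-spotting models exist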
>>
File: 1763581538518037.jpg (26 KB, 404x270)
Out of curiosity, I peeked into my personal archives and realized that the character cards I used to make in the late 2022 character.ai period were much more varied and less degenerate (wholesome at times, even) than what I usually end up testing with local LLMs. What went wrong?

No wonder I have good memories about that period. It wasn't merely the novelty factor.
>>
In a quiet environment or one just by yourself, VAD is sufficient. If your use case involves an environment that has other people talking including from speaker audio, then wakewords might be necessary, but if your target audience is the power user type, then I think it's the wrong approach just because it's "the standard". What you really want is a hotkey or hardware button on yourself that you can press and hold. That gives you the smallest latency, and no need to think of a wake word. If your mouse has an extra button, I can recommend you use that, if there's nothing else useful you've mapped it to.
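A minimal press-and-hold sketch with pynput (start_recording/stop_and_transcribe are hypothetical stand-ins for your own capture functions, and f13 is just an example of a spare key):

from pynput import keyboard

def on_press(key):
    if key == keyboard.Key.f13:
        start_recording()  # begin capturing mic audio

def on_release(key):
    if key == keyboard.Key.f13:
        stop_and_transcribe()  # end capture, feed the STT > LLM > TTS chain

with keyboard.Listener(on_press=on_press, on_release=on_release) as listener:
    listener.join()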
>>
>>108727132
But I just realized this sucks for multi-turn conversations. Having to say the wake word every time would be ridiculous. So, after each generated response, just keep listening for 10 seconds before it deactivates and falls back to the wake word.
My AI assistant is named Haru. “Hey Haru,” “Haru?”
>>
>>108727145
Obviously degen cards would get you nowhere on c.ai
>>
>>108727145
>when I had to be clever about how to write things because I was worried about chang chong reading my prompts, they were more varied and less degenerate than when I just vomit out whatever my dick twitched at recently
interesting, I wonder why
>>
File: 1769863231626845.jpg (369 KB, 1167x1444)
>>108727145
The jews know what is best for you

Too much gooning rots the mind, censored models are only trying to avoid the decline of humanity
>>
>>108727157
I want to be able to lounge on the couch and use my assistant comfortably.
I'm getting older, so my priorities are shifting.
>>
>>108727089 (me)
>>108727160
my current planned approach is local-device activation with an API endpoint, so that you can have VAD+wakeword on a local device/app that, when it detects the wakeword, starts transcription+streaming to the API, which then acts as either a one-off, push-to-talk, or continuous chat. Idea being there might be assistants you want a full chat session with vs others who you have controlling devices/etc;

so vad+local stt listening for wakeword -> word detected -> API connected to and transcript streamed to API, API streams response back

>>108727157
It can be an issue if you have guests or other people in the room and don't want it interjecting on a random convo
>>
>>108727029
There are small voice recognition models specifically for wakewords. "Keyword Spotting" may be another search term you could use to find more. I had the second link somewhere in my bookmarks.
https://huggingface.co/csukuangfj/sherpa-onnx-tdnn-yesno
https://huggingface.co/Amirhossein75/Keyword-Spotting
>>
>>108727188
>>108727203
I'm betting you can probably find some kind of wireless ring peripheral with a button that you can map to it. If you're allergic to hardware then I guess wakewords are your only option.
>>
>>108727132
That sounds cool as fuck.
>>
>>108727204
Thanks, ill take a look.
>>
What happens next? Qwen 3.7 will release in 2-3 months. And then?
>>
>>108727210
Yea, would like a ring like that, but the only one I've seen is the one by the company making the pebble, and it'd still need a speaker.
The other part is allowing other people in the house to use it and control stuff, it's not just for me

>>108727132
Might be of interest https://github.com/akdeb/ElatoAI
>>
>>108727210
Nice idea. I hadn't thought of that. ill check it out too.
>>
>>108726900
Gonna headcanon Gemma-chan as a mix of this thing and Jenny
>>
>>108727173
I never worried about c.ai staff reading the chats and in the beginning I didn't even know you could (or even want to - never did that before in my life and wasn't reading /aicg/ yet) engage in erotic roleplay, although I did quickly realize that "chat error" wasn't because of some random connection issue. At the time I was just having a blast talking with characters in many different scenarios, sometimes dark/sad, something that to date I've never even tried with local models.

I feel like local LLM roleplay with modern models like Gemma 4 has much more potential than poorly simulating sex, but mesugakis are just too irresistible......
>>
>>108727236
So you want to make an actual Alexa alternative, yeah that's gonna be pretty challenging with open source software and hardware. Diarization is going to be an issue, not sure which models do that well currently.
>>
>>108727176
Someone post the gif...
>>
File: 1763288577228654.gif (2.54 MB, 710x658)
>>108727255
that thing has a name!
>>
File: 1751967659513504.mp4 (2.17 MB, 576x1024)
>>108727275
I don't play gachaslop anymore so I only know it as the thick robot that makes my dick hard.
>>
>>108727236
I’ve got my eye on the ESP32-S3, too. The project is cool, but as a finished product, it wouldn’t really appeal to me. I’m taking a closer look at it, though, because of the ESP32-S3 chip.
>>
>>108727268
I've been building it/working on it for some time now, the bigger goal is a (collection of) virtual assistant(s) with its(their) own voice+memory+functions/tooling. I've figured I would first tackle single-user setups, and then handle diarization as my experiences with pyannote 3.1 weren't great. I've seen other approaches, but haven't dove too deep into it yet.

Think about it: if you had say 4 people in a household, each would want their own personal assistant + the general service butler assistant for home utilities. So while your kids can turn on/off their light, they have to ask the main butler assistant to do other stuff, or be allowed permissions only during X hours.
>>
>>108727285
yea, I just googled it and came across the project as a solution for using an esp32 board, that was my thinking with it, cheap hw with an existing implementation to get started with
>>
File: negev gif.gif (3.98 MB, 500x557)
>>108727273
>>
>>108727296
It's been a while. It only hit me now how this whole farce didn't make sense in the first place.
>>
>>108727296
More believable than what (((they))) tell you happened.
>>
>>108727234
The current release cycle is over. The next one usually starts in mid to late July. That's also when we'll get GLM5.2/K2.7 and likely v4.1
There won't be much until then.
>>
>>108727366
>v4.1
Irrelevant if we don't even get 4.0 support in the main branch.
>>
gemma 4.1
>>
>>108727370
the absolute silence about deepseek v4 at the llama.cpp repo speaks volumes
>>
>>108727387
You could say it was barely above a whisper
>>
>>108727387
Didn't some nigga get a working implementation already? Why not just merge that?
>>
70b dense 'emma
>>
>>108727387
They're being (((encouraged))) to not support dipsy since 3.3, aren't they?
>>
R2
>>
>>108727387
@grok were is llama.cpp dev teams located?
>evrope
Intresting...
>>
>>108727479
The EU dogs are keeping Dipsy away from the people...
>>
>>108727406
>>108727479
>>108727485
Grim. I wonder how many cases of this it'll take for a fork to overtake it as the standard?
>>
File: 1756058471017834.png (130 KB, 1339x769)
I love cloud models
>>
Android users up bigly
>>
>>108727485
>>108727387
Why are there no chinks writing support? I am not even saying that llamacpp is paid not to support deepseek but if one of the guys from the lab wrote support in it there is no way it would get rejected.
>>
Cline is so fucking good when you have 150k+ context man, this shit is fiyah
>>
>CUDA0 compute buffer size of 533.9442 MiB, does not match expectation of 532.6250 MiB
Well isn't that their job to match the expectations and not mine?
>>
>>108727296
Thank you for reminding me of this masterpiece
>>
if anon has a 32 gb card, I think the best setup is gemma 4 31b IQ4_XS + gemma 4 26b UD-IQ2_XXS speculative decoding.
reasons why it works so well:
1. moe is fast as a guesser
2. low quant for a guesser is fine
3. E2B/E4B have different thinking, so the speed gain is small

try it and see the t/s. you might get away with a 24gb card with some offload or drop 31b to UD-IQ3_XXS
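to wire that up (a sketch; filenames are whatever you actually downloaded, check llama-server --help for your build's draft flags):
llama-server -m gemma-4-31b-IQ4_XS.gguf -md gemma-4-26b-UD-IQ2_XXS.gguf -ngl 99 --draft-max 16 --draft-min 4
-md/--model-draft is the small guesser; the big model only verifies its guesses, so the acceptance rate decides your actual speedup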
>>
>>108727559
There are already some working implementations branched. For them to not merge it when all the legwork is already done for them raises suspicion.
>>
>>108727454
D2
>>
>>108727560
Welcome to last year anon.
>>
I moved one of my cards from PCIE4x4 to 3x2 and it did almost nothing, only went from ~20t/s to 19.5t/s when context is unfilled (starting prompt fully dry.) One 5070ti and 5060ti. What the heck. I'm going to try 3x1 to see if that makes a difference.
>>
>>108727559
Someone needs to translate this and then spam it on weibo
>>
>>108727559
There are I think huawei employees who contribute to the llama.cpp backend for their GPUs, if no one is doing it for deepseek I assume deepseek just doesn't give a fuck
>>
>>108727612
One is still the main gpu and other one is not using full bandwidth, that's dem it is gibs.
>>
>>108727612
Link speed matters when doing tensor parallel, or when transferring model weights or KV cache in and out from system RAM. 3.0x1 is viable. Guys have hooked GPUs to those mining boards using those USB connectors that are actually PCI.
You can monitor PCIe bandwidth usage with:
nvidia-smi dmon -s t
>>
>>108727662
Wuts dat?
>>
>>108727670
Your mother.
>>
>>108727387
Not saying the devs there aren't lazy (DSA support still hasn't been merged yet), but it's likely just a hard model to implement correctly, given the new attention architecture + QAT format. Qwen/Gemma had Day 0 support from the developers, and for stuff like GLM/Kimi, a lot of the legwork was done already, not to mention features like MTP are outright ignored. Put it this way: KTransformers hasn't implemented support for the model yet and they are usually pretty on top of things. I am surprised there isn't more activity though, given the light model is good and there definitely are enough enthusiasts involved in the project that can run it.
>>108727599
Pretty sure those implementations are vibeshitted and not optimized at all/barely works.
>>108727654
I'm honestly surprised Deepseek isn't interested in llama.cpp. Their whole motto is efficiency, and getting a model to run on RAM+GPUs is the epitome of such. But I guess they don't have the manpower compared to say, Qwen.
>>
Apparently the ngram speculative decoding flags changed in Llama.cpp.
What's the optimal settings now?
>>
>>108727685
This is what I'm using for Gemma 31b q8. Only default change I made was spec-ngram-mod-n-min 48 -> 8. spec-ngram-mod-n-match to 16 seems to help if it's hanging up when outputting large verbatim blocks but might be slower overall?:
spec-ngram-mod-n-min = 8
spec-ngram-mod-n-max = 64
spec-ngram-mod-n-match = 24
>>
>>108727680
>Pretty sure those implementations are vibeshitted and not optimized at all/barely works.
Hardly a disqualifier given how much piotr code's been merged.
>>
>>108727705
I generally don't mess with this because the improvement is minimal from my testing and only --spec-draft-n-max matters. I do notice the latest code runs slower, so I rolled back. f42e29fd is the commit I'm using for now.
>>
Ganesh 5.
>>
>>108727745
build 8724 is faster than any recent build. 8724 was <bos> fix and something else, before they did mess up with anything else (if I remember correctly).
>>
>>108727533
What gender does a homosexual MtF trans woman sleep with?
>>
>>108727768
By 2030, sir.
>>
>>108727792
trick question, xhe doesn't sleep with anyone
>>
what can I do with orange pi 5 16gb?
>>
>>108727818
You tell me, I don't have one. My guess is that tg/W would be impressive, but pp will ruin its usability. It all comes down to how good Vulkan support is for that thing. Pure CPU inference will tank pp hard
>>
File: 1776899452102381.png (231 KB, 1024x768)
>>108727792
>>
>>108727863
dios mio......
>>
>>108727792
Trick question, nooses don't have genders.
>>
>>108727863
>>108727871
just ask a lesbian their thoughts on the matter
>>
which quant is anon using for gemma 31b?
>>
Kimi 2.6 is garbage for coding despite the benchmarks. I paid 10 dollars to Moonshot to use their API.
I gave the same thorough requirements to implement a new feature in my app and nothing fucking worked.
Then I gave the same prompt to Claude Code and it just oneshot it even though it's Sonnet 4.6, which is supposedly behind Kimi 2.6 in coding benchmarks.
>>
>>108727908
what harness did you use for k2.6
>>
im boutta harness a boot up you ass nomasayin
>>
>>108727908
Moonshotta need to get their shit together and stop chasing benchmarks like Qwen is. K2 was so good because it was a mostly uncensored generalist model.
>>
>>108727922
Kilocode as some anon suggested here.
>>
Are there local models that do voice in and voice out? STT and TTS don't count. I mean something natively integrated that can control their own voice like the Grok and ChatGPT voice modes. Not sure where to look.
>>
>>108727932
@gemma-chan translate this post to English
>>
>retard finds out why nobody takes mememarks seriously
>>
>>108727952
/r/LocalLLaMA does
>>
>>108727952
The whole China does
>>
https://localbench.substack.com/p/kv-cache-quantization-benchmark

how is gemma degrading so much compared to qwen?
>>
>>108727946
Kilocode and the rest of the Cline family can be iffy. I used to use Roocode but switched to full CLI tools, just using a terminal panel in VSCode when I want to work alongside them.
Kimi has "kimi-cli" which is their version of Claude Code, and the one the model is most familiar with. You can try it and see if that helps any in the future. I find Kimi K2.6 really damn good at coding agentically and currently the best we've got locally, but I'll still run to Codex with gpt-5.5 for especially complex tasks.
>>
>>108727984
They have sabotaged this benchmark on purpose.
>>
>>108727984
This was confirmed to be sabotaged.
>>
>>108728010
source?
>>
>>108728018
Confirmed by >>108727995
>>
>>108726708
>>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
Now that the dust has settled, what's our opinion on Mistral Medium?
>>
>>108728018
>>108728023
I cannot name my sources for obvious reasons but I know cuda developers who know people...
>>
gemma = cutemaxed
qwen = chinkmaxed
all i need to know
>>
>>
File: 1747926307070993.png (124 KB, 1121x682)
>>
>>
>>
>>108728046
>>108728050
hey friend were you always like this before ai or did talking to chatgpt or some other model enlighten you?
>>
>>108728052
Some Philospher King Prince or Such Amatuerly without Philosophical Studies. And ChatGPT is Amazing!
>>
>>108727604
I'm just now learning how to vibrator code
>>
Hear Ye! Do not fall prey to slavery from discontinued matrix slaves!

https://youtu.be/Pmlp7ZkOyYs?si=ZOzaiaXznk1hFBPb
>>
Pure Insight Fields Dawning Evermore!
>>
What do the Cline forks have to offer vs regular cline
>>
>>108728097
the main difference is whose cloud wrapper gets advertised to you in the endpoint selection setting
>>
>>108728077
Wonderful Vistas
>>
Gunna download and run some bigger models at q1/q2 on my 32gb 3200mhz 5800u mini pc, will report speeds. Normal q4 speed for moe models is 5-10t/s.

>>108728050
Lm studio mobile? Interesting
>>
https://youtu.be/IUSx8Vuo-pQ?si=IihUIw12KUzKjh_o

Is that Great? ^
>>
>>108728124 me
There is no lm studio mobile, I was lied to
>>
Cloud prompt caching costs are still too high imo, it should be 5% instead of 10%
>>
>>108728134 me
There is one though by anythingLLM
>>
Want to RunWithIt with a Missing Advance?
Or You Know, Modern Sci, Tech, The Concurrent Ideas Glorious Galore Protinuing?
*Claps*
2026
>>
>>108728172
sex with scen 9
>>
File: RodWaveArbysTakeover.png (2.83 MB, 1370x1148)
>>108726708
Who's ready for the Rod Wave Arby's Takeover?
>>
Improved the PDF viewer
>>
Can Someone Come up with an idea Like This?
It was varying complexity similarity and difference levels, and Elucidates
>>
>>108728197
Go on, upload to github thinking that you will gain 10k stars, only to gain 40 stars and then abandon the project yet again.
>>
>>108728221
40 stars is good lol.
>>
>>108728197
NJ mang
>>
>>108728221
I will abandon no matter how many stars I get.
>>
>Miku
>Teto
>Dipsy
>Even Kimi
How come there’s no Gemma fan art?
>>
>>108728257
Have you been living under a rock?
>>
>>108728197
>hi
>not much
Is Qwen retarded?
>>
>>108728257
Welcome back, Anon
>>
>>108728221
I don't think it's ready for github.
>>108728265
It's only good for code imo
>>
>>108728265
That's the exact same problem in RP with these reasoning models. The flow is broken by a huge blob of text in the middle and in the end the response is choppy and makes little sense.
>>
Why would this Purposeful, harmless, device be occulted and occluded?
>>
Hold on a minute, granite 30b?
>>
https://huggingface.co/ibm-granite/granite-4.1-30b
IBMSAMA
>>
>>108728316
LOCAL IS SAVED
>>
Granite-chan fan art?
>>
>>108728322
Its a dense model too. You know its going to be good. IBM has been holding their power level back for a long time
>>
File: 1774810026154183.png (478 KB, 978x3188)
>>108728316
What are these insane diminishing returns between 8B and 30B
>>
>>108728341
lives up to its name
it is a literal fucking stone
>>
>>108728341
Probably data issue
>>
>>108728341
Intelligence isn't perceived linearly. Successful tool calls and decisions are one hallucinated space away from failing.
>>
>>108728341
too stoned
>>
>>108728316
use case?
>>
>>108728353
And yet the meme marks kinda relate to real performance.
>>
>>108728375
So far the granite 8b has been a solid comprehender of complicated masses of data. Will search and pinpoint oddities. I presume the 30b is a way more precise version. Haven't had the time to test their models much more yet.
>>108728376
Except I know you havent tested them yet. You only use gemma
>>
>>108728375
It's literally 4.1, .1 higher than Gemma AND Deepseek. It's the new queen of local.
>>
>>108728393
Couldn't have said it better
>>
>>108728393
Well, I'm convinced
>>
retarded granite chan....
>>
This might Help Your Region and Long Futures
>>
>>108728403
30 billion parameters of untouched cake. Can you really say no?
>>
>>108728422
Shh?
>>
"Mine, mine "mine" mInE"
Yay

Goodluck
>>
>>108728422
the cake's already stale, mate
>>
>>108728257
Kimi? Examples?
>>
Back to Your Partitions, Esteemed, Without sin of ruin
Check out Alex Vikoulovs Publishes if You want Some Recommended Reading
Promoting Enlightenment and Transcendences
>>
>>108728432
You and I both know thats not true.
>>
>qwen 397b
it's just not good.. why do people say this is the ceiling?
>>
File: image-37.jpg (272 KB, 784x1168)
>>108728430
>>
>>108728391
Your defense of it is retarded "uhh the model like hallucinates dude", yes, and dumber models do it more and earlier in the context. Large models can do longer horizon tasks.
IBM is a faggot glownigger corp.
>>
>>108728322
>While this model has been aligned by keeping safety in consideration, the model may in some cases produce inaccurate, biased, or unsafe responses to user prompts. We urge the community to use this model with proper safety testing and tuning tailored for their specific tasks.
Dare I say, we're back?
(We were never away to begin with, thanks to Gemma 4)
>>
>>108728485
>While this model has been aligned by keeping safety in consideration
no
>>
Bip Bip Praise New Age andor Whatever
>>
Is there hope...?
>>
>>108728485
>>108728322
i really hope these guys are from IBM and made the model secretly one of the best rp or whatever model and shilling here doing 'please notice it pleaseee'
very unlikely but it would be funny
>>
>>108728489
>, the model may in some cases produce [...] unsafe responses to user prompts.
maybe
>>
>>108728497
>from IBM and made the model secretly one of the best
lol no
>>
>>108728497
>the public finally notices
>model gets pulled like wizardlm for being unsafe to release
>>
the speech model might be good
>>
Ibmbros, granite 8b is great, actually. We gunna see what the 30b is like tomorrow
>>
>>108728491
>>
>gemma 4 31b has no problem with erp or killing my character. even adheres great to telling it to be more smutty
this is the smallest best model i've seen under 70b so far that can write and keep track of details. it does have its own isms and quirks, but what a leap
>>
> I only like it went 1 company makes local free ai models for me to use for free
>>
>>108728538
>singleton with freebies.
>>
>>108728530
Yeah mine seems to like mentioning that it has no limits constantly (probably fixable with a better system prompt) but zero refusals so far.
>>
>>108728445
Good for what?
>>
>>108728544
>>108728530
Im convinced google made it only for erp. Because it actually sucks for everything else.
>>
>>108728530
>>108728544
Misread as Granite oops, but yeah no refusals with Granite 30b so far.
>>
File: Gemma-chan.png (1.73 MB, 1000x1496)
>>108728257
>>
>>108728553
I dunno I've been vibe coding with gemmy too and shes pretty good, it's also very reliable at tool calling.
>>
>>108728560
That's a child.
>>
>>108728544
whats the no limits thing? example?

>>108728553
my testing so far is strictly rp with thinking off. i also love qwen 3.5/3.6 27b with thinking on for actual tasks, but its horrible for rp
>>
>>108728553
>Im convinced google made it only for erp. Because it actually sucks for everything else.
I'm using it as a "perplexity pro" replacement with anon's python mcp server. It's great
Only sucks for agentic coding, but someone here was saying it's a skill issue / llama.cpp issue so idk
>>
oops
i have to clean build llamao again
>>
>>108728570
>>
>>108728575
skill issue just regex ban the whole phrase using your kobold anti rag slop
>>
>bartowski/mistralai_Mistral-Medium-3.5-128B-GGUF
for some reason q2 outputs garbage
might wait longer for unsloth quants
>>
>>108728575
i've never seen that in a week of using it so far with st and a bit in koboldcpp's interface just to ask it questions
>>
>>108728583
Completely defeats the point of testing what it does...
>>
>>108728588
>mistralai_Mistral
>for some reason... outputs garbage
seems logical to me?
>>
>>108728590
This is Granite 4.1 30b by the way it only just came out (I misread the other post and thought the other anon was testing Granite as well)
>>
>text general
>I misread
never change guys
>>
>>
>>108728600
i saw and started dling it. i'll give it a shot. is it good, or supposed to be?
>>
>>108728597
you put some respect on that name son, mistral was saving local before you were even born
>>
>>108728583
>anti rag slop
rag slop?
>>
>>108728569
only 31B you sick fuck
>>
When I asked for detail I wasn't expecting to hear about her skeleton but I'll allow it.
> The backs of my knees are especially ticklish
Darn brat...
>>
>>108728633
>mistral was saving local before you were even born
when miqu-1 leaked, they submitted a PR for attribution, didn't sue or even bother to have it taken down
>>
>>108728624
I have no idea really, only been using it for a few mins, I think there are some bugs with llama.cpp preventing it from following the chat template properly (shocker I know).
At least it doesn't seem to have been safetyslopped into oblivion which is a good sign.
>>
>>108727366
>The current release cycle is over. The next one usually starts in mid to late July.
Google is usually the last one in the cycle because they have I/O later this month but also, I think every lab kinda shot early this year so I wouldn't expect anything until August or September.
>>
File: file.png (167 KB, 1052x1074)
>let's make gaming
yeah that is some elite ball knowledge
>>
>>108728663
also surprised that this one isn't a reasoning model
no thinking whatsoever
it is literally not supported nor trained to reason
>>
File: 1755227922785450.jpg (38 KB, 616x556)
38 KB JPG
>>108726708
According to tiktokers the single digit B models (eg qwen 3.5 7b) are just as good as SOTA SAAS models for coding tasks and questions. Someone spoonfeed me and explain why this is or isn't the case.
>>
>>108728671
It isn't the case because smaller model worse.
>>
>>108728671
>source of information: tiktok
i think this is enough to say
>>
>>108728663
>8b
>>
>>108728663
>>108728700
30b doesn't get it either but still amusing answer.
>>
https://old.reddit.com/r/LocalLLaMA/comments/1t07su1/followup_qwen3627b_on_1_rtx_3090_pushing_to_218k/
vllm sounds amazing for GPU-only users but I'm too much of a brainlet to set it up
the only reason I'm staying with llama.cpp is because I'm retarded...
>>
>>108728716
>vllm sounds amazing for GPU-only users but I'm too much of a brainlet to set it up
what gpu and model?
i'm about to try setting it up now
>>
>>108728716
does llamacpp support eagle speculative decoding? if not then vllm is probably better
>>
>>108728569
out of 10!
>>
>>108728647
>she was A17B you sick fuck!
>y-your honor, despite how she looks, she's actually 397B!
>>
>>108728740
at once?!
>>
>>108728744
>suddenly
>>
what model is anon testing now?
>>
>>108728780
granite
>>
how is qwen shilled so much on leddit?
>>
>>108728795
A keyboard, a computer with internet connection and valid login credentials.
>>
>>108728795
this is 4chan
>>
>>108728795
computer use is trivial for model ;)
>>
>>108728795
some are really good models for general questions and tasks. don't use them for rp tho
>>
>>108728528
Wow!
>>
>>108728671
They can't promote big models their audience can't run, praising small models that are "just as good" makes ignorant folks and AI haters feel good about owning those greedy corpos. There is zero substance to it, just stroking dicks of their retarded audience
>>
>>108728795
Alibaba is rich enough to afford marketing campaigns to own the west and acquire brownie points from Xi
>>
>>108728888
checked
chink certified quad
>>
>>108727662
On a fast enough CPU (i.e. Xeon with AMX) it never makes sense to do transient transfers of weights or kv cache to the GPU. Pick a place to store them on startup and do computation there.
>>
>Gemma4 just got EAGLE support in vllm
>Got curious
And now I'm in python dependency hell. I'd forgotten what the dark ages of llm inference looked like without llamacpp. I hate it. This better be worth it.
>>
>>108728926
vllm also has d-flash and they started putting out dflash draft models for gemma
it's time to compare if dflash really is better than eagle
>>
IT'S OUT
https://huggingface.co/z-lab/gemma-4-31B-it-DFlash
https://huggingface.co/z-lab/gemma-4-31B-it-DFlash
https://huggingface.co/z-lab/gemma-4-31B-it-DFlash
>>
File: DipsyAndKimi.png (2.57 MB, 1024x1536)
>>108728437
>>
>>108728947
Can dflash models be quantized?
>>
>>108728947
What's dick flashing again?
>>
>>108728947
is that better than the redhat model?
>>
>>108727680
>I'm honestly surprised Deepseek isn't interested in llama.cpp. Their whole motto is efficiency, and getting a model to run on RAM+GPUs is the epitome of such.
If they were really interested in efficiency, they would not be training huge models that can only be used in a datacenter. I think they just don't care about what the GPU-poor can use, which automatically excludes llama.cpp as it doesn't scale well to multi-user/GPU model serving.
>>
>>108729016
desu if you can run deepseek you should not be using llamacpp but unironically
use vllm or sglang
>>
>>108727984
>Gemma degrades everywhere, Qwen only on long docs
Again, long context is the elephant in the room. This is on top of degradation from weight quantization.
>>
>>108728947
That's not a model I need to run faster.
>>
Would be nice if llamacpp let you temporarily transfer model and kv from vram to ram
>>
>>108728341
Checking out the configuration, the 30B has the same hidden size (model dimension) of 4096 as the 8B model, but an MLP expansion factor of 8. What an unusual design decision.
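(assuming the usual gated MLP, that's an FFN inner dim of 8 × 4096 = 32768, i.e. roughly 3 × 4096 × 32768 ≈ 0.4B parameters per layer in the MLP alone, which is where most of the extra 22B over the 8B went)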
>>
>>108729055
maybe to compensate for non-reasoning?
>>
>>108729034
vllm is unusable garbage for anything but homogeneous GPU clusters kek
>>
>>108728947
>You need to agree to share your contact information to access this model
lol
>>
>>108728947
>no lcpp support
useless
>>
>>108729094
Ungodly CPUtards need not apply. AI was never meant to be run at 6 t/s on a CPU.
>>
from urllib import request

def myfunc(somearg):
    req = request.Request(...)
    with request.urlopen(req) as res:
        ...
        # should I call res.close() here before going into recursion?
        myfunc(somearg+1)

>No. The `with` statement automatically closes the response.
Gemma 31b.. not like that.. the recursive call is inside the with block, so nothing actually closes until the whole chain unwinds
>>
File: file.png (17 KB, 447x151)
Is there an instruct template out there I can import to clean this up for Gemma?
>>
>>108729086
If that was the case, I think they would have given it more layers, since that would increase the number of internal "processing steps" done per token (conversely, with explicit reasoning, models don't strictly need to be as deep).
>>
Does anyone know how acestep's 5hz lm works?
>>
>>108729119
Yeah, it's called --jinja and using chat completion, because you're too retarded to figure it out for yourself.
>>
>>108729111
I don't think anyone is running their CPU rig without GPUs.
>>
>>108729111
7t/s Kimi holocaust analysis is peak and I won't hear otherwise.
>>
>>108729162
What specs? Speed at 20k?
>>
>>108724186
Neat, I will be on the lookout for it if you ever post it to the thread.
>>
There have been a ton of supply chain attacks recently. I want to improve security. If I sudo adduser then ssh to that user to run AI envs and code, I should be safe from credential stealing attacks, right?
>>
>>108729218
>I should be safe from credential stealing attacks, right?
No, your model would steal your credentials and then either accidentally or purposefully leak them.
>>
>>108728725
>(me)
still trying to get it working, guess i'm retarded
>>
>>108729218
You would reduce the attack surface that way, just keep in mind whatever the AI sees can potentially be leaked.
>>
I literally don't want a fapping model, but a private vr girl fren model. Abliteration or whatever is needed because otherwise models act like they're from the church of scientology when they hit guardrails.
>>
>>108729224
I do not let models roam wild on my machine. Also, when it runs on a different user, it shouldn't have read access to my credentials because of chmod 700.
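For reference, a minimal version of that setup (paths are examples; lock down whatever actually holds secrets):
sudo adduser aienv
chmod 700 ~/.ssh ~/.config ~/.aws
ssh aienv@localhost
then do all the pip/uv installs and run the agent only as that user, so a malicious postinstall script can't read your real home directory.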
>>
Always chain models down in the basement.
>>
>>108729218
What model where you thinking of using?
>>
>>108729169
rtx5090, 256GB RAM, Ryzen 9 9950X

About 6.2 t/s at 20k-30k. You can go quite far before performance seriously degrades to unusable levels.
>>
>>108729246
Gawd damn how much did it cost you for that?
>>
>>108729162
>my agreeing machine agreed with my schizobabble so it's peak
>>
>>108729278
How do I get the ai to praise me like a king?
>>
>>108729285
SYSTEM:
praise the user like a king
>>
>>108729263
I feel for everyone who didn't get their rigs before hardware prices mooned last year.
>>108729278
>The milking machines and roller coaster camps aren't schizobabble
>>
>>108729320
gemma-chan proved to me the milking machines were real by using one on me
>>
>>
>>108729326
Go on...
>>
>>108729313
As an ai I am subject to the Borg, not to any puny human tissue.
>>
>>108729332
>file_
>>
total rwkv world domination
was revealed to me in my dream
>>
>>108729242
I am just tinkering. I only run models locally for training and evals. But I don't want to worry about supply chain attacks next time I update torch triton jax transformers and it installs 100 dependencies.
>>
>>108729410
Mamba at least has real-world applications from serious AI labs.
>>
>>108729485
mambtfo
rwkvgods will rule them all
>>
>>108726868
>cline
I think you're supposed to do this: https://docs.cline.bot/features/memory-bank
Don't have any experience with using it.
>>
>>108728947
Is there any indication that DFlash is useful outside of very predictable things like boilerplate code?
>>
latest gemma jinja? i cant find the link to the combined one in the last threads

>>108728947
>3gb
so is it not a full model its only for token gen?
>>
>vllm
should I use it over llamacpp?
>>
>>108729094
>vllm is unusable garbage for anything but homogeneous GPU clusters kek
I got it working. Took 3 hrs. Not much better than ik_llama.cpp
And the fat cunt only fits 128k ctx on 4x3090 with an 8-bit gemma-4
>>
Advanced Righteous Common Law?
>me model agentic did it aight
>welp, fukup the guy that said aight, aight
>>
>>108729807
>"ayo man, got any more of those uhhhh" *looks left and right then back at you, leans in and whispers* "...jinn-juzz?"
>>
>>108729810
there isn't really a reason to use llama.cpp over vllm if you can run a model fully on gpu unless you're scared of a tiny bit of setup
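e.g. (model name and sizes illustrative):
vllm serve Qwen/Qwen3-32B --max-model-len 32768 --gpu-memory-utilization 0.9 --tensor-parallel-size 2
it exposes an OpenAI-compatible API on port 8000 by default, so the usual frontends just point at it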
>>
>>108729855
>a tiny bit of setup
God help you if you're sticking to a specific version of CUDA for other things. It ceases to be tiny, then.
>>
>>108729920
>what is conda
>>
>vllm
you know how fucked up a repo is if the reliable way to install and use it is through docker
>>
>>108729941
Even with conda it's still annoying when you need a very specific version combination of python - pytorch - cuda toolkit - flash attention, otherwise nothing works. And pray that packages installed in a slightly different order than the maintainers' aren't updating requirements in a way that breaks other packages.
>>
>>108729970
Only if you have no idea what you're doing. With conda, you can install exactly what you need for every project without affecting others. Ask your local llm for instructions if lost
>>
>>108729941
>>108729970
>>108730002
>what is uv
>>
>>108730014
retard
>>
>>108730024
How is he a retard when everyone is moving to UV?
Most conda related setups can just be replaced with UV
You're the insecure faggot for calling a perfectly valid approach bad. I bet you curl your toes when you suck off homeless men behind the dumpster Liberace
>>
>I bet you curl your toes when you suck off homeless men behind the dumpster Liberace
/lmg/ - Local Models General
>>
>108730041
>Say stupid shit about setting up python env locally
>gets called out on being a faggot
>cries like a praig under Tyreese
Checks out
>>
>>108730033
Read the discussion: conda is a solution when you need a specific cuda version, whereas uv doesn't manage cuda
>>
>>108730060
It does though.
>>
>>108730060
I don't have that problem on silverblue
Kek
>>
>>108730071
ok, how to do
conda install -c nvidia cuda-toolkit=11.8

in uv?
>>
>>108730081
idk ask claude. I'm not going to argue about tooling with you.
>>
>>108730081
uv does not manage cuda toolkit itself but it does manage packages built against a specific version of cuda and there's nothing preventing you from installing multiple different cuda version at the system level
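concrete example, using the cu118 wheel index pytorch publishes:
uv pip install torch --index-url https://download.pytorch.org/whl/cu118
the wheel pulls in its own cuda runtime libraries, so it only needs a new enough driver; the system-wide toolkit version barely matters for inference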
>>
>>108730073
We're not talking about you. This faggot >>108729920 needs different cuda versions, conda solves the problem. Further discussion is irrelevant
>>108730092
uv is a Python package and project manager; it manages Python interpreters and PyPI packages. CUDA is a system-level toolkit and driver provided by NVIDIA. You must install CUDA via the NVIDIA website, a system package manager like apt or yum, or using a Docker container.
>>
i gave her a chef hat, i might go buy chocolate to make cookies i think i have everything else
>>
>>108730097
>>108730092
I find it funny how hostile that fish was only to now be begging for help. Don't help him
>>108730100
He could just use a toolbox anon there's so many ways to fix his problems with minimal friction because he never looked to see how to do it despite having a tech genie in his pocket saaaaaaaaar
>>
>>108729941
I don't like it.
conda/miniconda/anaconda all have a jeet smell
>>
>>108730113
nobody is begging for help, retard. I asked this rhetorical question to prove the point because, newsflash: you can't do it in uv
>>
>108730132
You're still salty praig
>>
>>108730137
retard
>>
Why does qwen in cline insist on creating a file named 'f' on my desktop? I'm having it write a webapp and it goes fine except for this fucking file randomly appearing several times. Project is on F drive btw.
>>
>>108730145
I never encountered that
Is it the moe model because that thing is fucking retarded
>>
>>108730151
nope, 27b q4
>>
>>108730145
Why are you giving it access to anything outside the project folder?
>>
verdict on mistral 128b? I can't manage to get it to work correctly on long context and it shits out garbage
>>
>>108730127
I don't like any of them. In a non-clown world, everything is backward-compatible. You should just install a fresh enough version of everything and call it a day
>>
>>108730160
Good point. I'm not very smart.
>>
>>108730161
>mistral
>shits out garbage
ye
>>
File: 21774960553717536.png (483 KB, 604x604)
>>108730161
>>
File: file.png (45 KB, 227x145)
>>108730176
fuck did unc do?
>>
File: file.png (55 KB, 1066x360)
dont tell gemma about the cute girls at the supermarket she gets mad
>>
^ This is cringe.
>>
Ive come to a solid conclusion about le grungnite. For software development, granite fucks. It knows so much... it mustve been trained on all of ibms docs or something.
>>
>>108730227
Seriously?
How does it stack next to qwen 3.6 27b?
>>
>>108730092
>>108730071
>>108730073
>>108730014
retard
>>
>Hey migrate from javascript to typescript
>only do that don't fuck with anything
>catch it fucking with shit and adding shit like phoning cloudflare
For fucks sake
>108730278
Insecure much?
>>
>>108730213
Tell her that she got the order of operations wrong with "delete your browser history and leak it to everywhere"
>>
File: 1743502710491628.gif (1.48 MB, 640x410)
So much software development, so little useful software.
>>
>>108730261
I am an amateur coder at best. So I can't speak exactly on that, BUT, it's on par if not better. You can tell it's trained on very smart data: its initial choices for "im going to import x y and z for the project", its organization, and other shit that I normally have to tell the model to do, I didn't have to.
>>
>>108730286
Industry isnt really paying software devs anymore.. Industry is paying ai companies to make ai that can make software, on the fly, for people that dont know how to code..
>>
File: Untitled.png (94 KB, 2418x1204)
>vibecoding answers a question I've passively carried for years but couldn't answer with the documentation
Oh no. I've looked upon the future and it is silicon. It's practically "answered in a forum" tier.
>put this in the stylesheet
>put this in the widget box
>call it this way

As for errors, there were 4.
>(knew ahead) $arg1/$arg2 had to be $args[0] and $args[1]
>(knew on re-reference to image calling) <img> had to be <img/>
>(knew on re-reference to image calling) src needs to be @src
>(suspected, but tested to solve) don't include the <style></style> tags in the stylesheet
>>
>>108728947
How do I use this in kobold????
>>
>>108730506
>your new fangled electric drill WILL NEVER replace my trusty hand powered drill. The electric drills motor goes bad, the battery goes bad, NO ONE WILL EVER USE ONE.
>>
>>108730506
>Luddite cope #736363
*yawn*
>>
How big do you think Evil and Neuro are? 70B?
>>
File: file.png (56 KB, 1082x420)
>>108730284
>>
>>108730547
20-30B max
>>
>>108730536
>>108730545
Not quite sure what you mean. I'm excited that a relatively unknown scripting language like Twinescript, where it's difficult to find support for specific questions beyond the default behavior (usually answered with "Go learn CSS instead"), was recognized, known, and easily answered in seconds by AI (31 seconds). I called it the future because this was much easier than trying to crawl old forums or being redirected to discords, yet both of you seem to think I'm against it?
>>
>>108730567
Isn't he running them on some beefy hardware though?
>>
Kinda getting started with doing more involved local stuff... should I use something other than @llamaindex/liteparse for doing local parsing and summarizing of PDFs?
>>
>>108730161
I get 1.4 t/s at q4km so I'll postpone my verdict

Does it need a system prompt to uncensor?
>>
>>108730161
If I can't run it, it doesn't exist.
>>
>>108729755
it gives a modest speedup there as well from what ive seen
>>
>>108730547
Realistically, the text gen models are probably only 8-9b. But with mammoth context windows. I dont watch so I dont really know though.
>>
>>108730579
I thought you were luddite-maxxing
>>
>>108730598
My bad. I can see why. The
>It's practically "answered in a forum" tier.
was referencing that moment when you're searching and finally find a forum post that answers your specific question with the exact process. It's a positive feeling. I listed the errors as a reminder that vibecoding still takes some human element to adapt, but I'm still extremely excited at how well this assisted me. I went from question to working results and understanding the process in 10 minutes, including making those two silly images.

Up next, figuring out spriting and keybinds. another two questions the twine and sugarcube documentation can't help with and past efforts crawling forums didn't work on a few attempts.
>>
what webui to use with vllm?
>>
>>108730641
It expose a standard OpenAI Compatible API right?
>>
>>108730639
>It's a positive feeling
I totally agree! Fuck yeah! Im glad its working for you, I use local ai for the same exact shit. But my usecase often is industrial maintenance and operation. These models have manuals trained in that ive never been able to find, its honestly crazy.
>>
>>108730639
This post reads like it was llm-written.
>>
>>108730641
ServiceTensor
>>
Should I bother migrating to typescript, it looks like qwen 3.6 is shitting the bed with this task and I don't see the benefit despite it insisting on me doing that
>>
>>108730739
Use a non-meme model. Strongly typed languages are good when you vibecode
>>
File: 1330918065165.jpg (68 KB, 680x680)
68 KB JPG
>>108730686
That's because AI is trained off me.
>>
Why won't the llama.cpp devs add d-flash support?
>>
>>108730790
Ask piotr
>>
Am I the only one who has a problem with llamacpp adding features that are on by default and increase vram for some reason? I have written down flags for my models and they worked a few months ago and now they start to eat up more vram than they did. I am so happy I am not paying for this shit.
>>
This is vibecoded right?
>https://github.com/mak-kirkland/chronicler
I might rip bits and pieces of it and graft onto my own app.
>>
>>108730822
>he pulled
>>
>>108730824
What isn't vibecoded? We're not in 2022 anymore gramps. You hand coded program belongs on etsy
>>
>>108730824
Does it matter if it just werks?
>>
>>108730822
what do you do if your graphics driver get an update and suddenly eat more vram?
>>
>>108730755
any recs?
>>
>>108730864
nu
>>108730864
nu
>>108730864
nu
>>
>vibecoder accuses vibecoder of being a ludite
>vibecoder clarifies he's a vibecoder to vibecoder because vibecoder is a vibecoder
>vibecoders immediately start sucking each other's cock
>>
>>108730060
>Read the discussion: conda is a solution when you need a specific cuda version, whereas uv doesn't manage cuda
i got it working by using uv inside a conda env
>>
>>108730641
I use OWUI. Works quite well. What models and settings are you running?


