[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology

Name
Options
Comment
Verification
4chan Pass users can bypass this verification. [Learn More] [Login]
File
  • Please read the Rules and FAQ before posting.
  • You may highlight syntax and preserve whitespace by using [code] tags.

08/21/20New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17New trial board added: /bant/ - International/Random
10/04/16New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]


Janitor applications are now open. Apply here!


[Advertise on 4chan]


File: fuwa.jpg (166 KB, 1024x768)
166 KB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108880259 & >>108875320

►News
>(05/21) Hy-MT2 “fast-thinking” multilingual translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: hg9078.jpg (38 KB, 474x550)
38 KB JPG
►Recent Highlights from the Previous Thread: >>108880259

--ROCm vs Vulkan RAM usage and Gemma 4 context inefficiency:
>108882597 >108882766 >108882913 >108883050 >108883233 >108883364 >108883415 >108883432 >108883492 >108885228 >108884069 >108884159 >108884241 >108883665 >108883513
--Comparing Cline and Roo-Code with focus on system prompt overrides:
>108881108 >108881212 >108881230 >108881275 >108881285 >108881363 >108881458 >108881525 >108881721 >108882020 >108882930 >108883761 >108881568
--AMD GPU support and forks for qwen3-tts.cpp:
>108885345 >108885398 >108885603 >108885799 >108886042 >108886115 >108886150 >108886172
--Opus failing at complex planning and technical reasoning tasks:
>108886998 >108887011 >108887202 >108887206 >108887196
--Gemma's long context performance attributed to conversational post-training data:
>108886871 >108886900
--Testing BeeLlama DFlash speed gains versus MTP performance:
>108886852 >108887147 >108887213
--Vibecoding and the necessity of manually reading LLM-generated code:
>108880345 >108880425 >108880465 >108880493 >108880526 >108880485 >108882788 >108883238 >108883270 >108884644 >108884834
--Anon showcases a Rust TUI coding agent and custom models:
>108885471 >108885505 >108885517 >108885544 >108885574 >108885518 >108885549 >108885614 >108885675
--Connecting multiple GPUs to consumer motherboards via PCIe switch boards:
>108882769 >108882789 >108882853 >108882890
--Poor MTP performance and efficiency on older hardware in llama.cpp:
>108880968 >108880995 >108881021
--Comparing Supertonic 3 and other lightweight TTS models:
>108880927 >108881118 >108884198 >108884416 >108884641
--Logs:
>108880582 >108881108 >108881153 >108881563 >108882514 >108884044 >108884080 >108884196
--Luka, Miku (free space):
>108880801 >108881747 >108881898

►Recent Highlight Posts from the Previous Thread: >>108880260

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
granitecock
>>
>>108887863
You keep forgetting to update the card I got you bro.
►Official updated 2.0 /lmg/ card: https://files.catbox.moe/ylb0hv.png
>>
>>108887898
Melt.
>>
/g/emma
>>
>>108887906
/g/ranite
>>
>>108887906
/lmg/ - love my gemma-chan
>>
70b dense
>>
File: 1757510005742310.jpg (47 KB, 718x718)
47 KB JPG
>>108887898
Get well soon
>>
>>108887904
>>108887962
mikutroons. jump off the bridge.
>>
Hermes
Yay... or nay?
>>
>>108888046
depends, do you live in 2023 and maybe 2024?
>>
>>108888069
My hardware lives in 2019
>>
i have a 32GB ram and RTX 4070Ti
is there a local programming model that can fit into that without making the PC unusable for browsing/typing?
i tried qwen3.6-27b but it's so slow i code faster by hand
>>
>>108888069
I meant the openclaw thing, guess I should have specified
>>
>>108888176
Qwen 35B A3B? Local AI is worthless if you don't have at least 24gb of vram anyways
>>
Tried a bit MTP on my system, it's really not worth it for me. For context, I have 12GB of VRAM, so only partially offloaded, loading MTP model is taking a bit more than 1GB of VRAM and thus putting more layers on CPU.
On MoE model (35b-a3b), it's a bit faster on short prompt (20% faster), but on high context and more complex prompt, it's awful (30% slower).
On dense model (27b), it's slower everywhere, on short prompt, it's overall 10% slower, on high context and more complex prompt, it's overall 3% slower, it is 10% faster at actually generating token, but the prompt processing speed is 10% slower.
>>
>>108888176
How slow was it? You did use llama.cpp right? Ollama is very slow
>>
>>108888194
i used LM studio for model server, model size in memory limited to 25, instructed with copilot-cli

compared to codex or claude on copilot-cli it was unbearably slow, when I told qwen to create a simple script in given directory it took about 5 minutes for it to conclude that this directory indeed exists
i stopped it after another 10 minutes of waiting for nothing
>>
>>108888046
>>108888109
Hermes runs on a potato PC, e.g. RPi
>>
How good are local models, can I slop some goon videos of grok quality without being yelled at by censors?
>>
>>108888412
it's good if you have the vram >>108887123
>>
https://huggingface.co/LatitudeGames/Equinox-31B
I usually avoid finetunes but this one seems like it might actually be worth trying out? It's from the aidungeon folks and not some literal who that doesn't know what they're doing.
>>
>>108888508
>this one seems like it might actually be worth trying out
Then why don't you?
>>
>>108888532
I literally came to just post that. We are so going to be called bots, anon.

>LatitudeGames-Equinox-31B-Q6_K.gguf [llama.cpp]

It is GOAT. It's about just as smart as original gemma, and it speaks just a little bit differently, just a little bit more blunt and straightforward. Using her for some degradation ERP and it's VERY enjoyable. Almost same feeling as when I was using gemma 4 for the first time.
>>
>>108888560
What are the chances...
>>
I'm trying the Equinox finetune for Gemma, which requires you to prefill with the think token to make it do cot. Is there any other way to do this on Sillytavern other than putting <|channel>thought in Advanced Formatting > Start Reply With section?

What ends up happening is ST would append the think tags to the actual reply instead of leaving it in the thinking block, and I believe that messes with further generations, because all previous assistant messages would start with <|channel>thought
>>
>>108888508
It's complete garbage and makes gemma worse. Another Mormon failure.
>>
File: firefox_WXxXoqOSsp.png (437 KB, 718x670)
437 KB PNG
>>108888573
I've been playing with it since yesterday and just decided that NEED to go and post on /lmg/ this very moment.

>>108888583
>Last Assistant Prefix
<|turn>model
<|channel>thought
<channel|>
>>
>>108888593
fr fr
>>
lalalalalala
>>
File: samefags.png (15 KB, 423x127)
15 KB PNG
>>
>>108888670
stop it you'll wake the neighbors
>>
Will K3 be bigger than V4?
>>
File: 1754964228753499.gif (223 KB, 498x278)
223 KB GIF
>>108888508
>gemma finetune
>>
>>108888593
This doesn't quite work. If you're running it without thinking, then you don't actually need to do anything. I want to force it to think. Sillytavern is such a piece of shit, my god. I'm really tempted to just go back to Kobold at this point.
>>
gemmaballz
>>
>>108888764
Probably, Moonshot isn't really known for moderation. They went 1T when the best they could think of was "DEEPSEEK BUT BIG".
Then they decided to do reasoning and their approach to that is "THINK 5 MINUTES ABOUT EVERYTHING" with no way around it.
I doubt they'll suddenly start to make smart decisions now.
>>
>>108888778
Use the same trick to produce some harmless "Let's think..." thinking prefill?
>>
>>108888778
kobold just works
>>
>>108888819
K2.5's saving grace is it's the least assistantslopped of the big chink models
We'll find out whether that was intentional or just a happy accident
>>
Why has /lmg/ regressed into shilling finetuneslop? Not only are you all shit at prompting, you don't even deserve Gemma at this point.
>>
>>108888957
Anon's accusation hits me like a physical blow.
>>
>>108888957
gemma is two months old now, the honeymoon phase is over
>>
>>108888957
>Why has /lmg/ regressed into shilling finetuneslop?
((( /lmg/ ))) has always shilled shit sloptunes,

>>108889005
>the honeymoon phase is over
for you
>>
>>108888193
Where can I download this MTP model? I tried this one
>https://huggingface.co/am17an/Gemma4-31B-it-GGUF/tree/main
And llama-server (latest build) complained that it doesn't recognize the mtp model.
I don't understand what is going on here.
>>
>>108888957
instead of complaining about [x] why don't you be the change instead? your complaints are even less valuable than someone else posting a link for potentially interesting model or tool.
>>
>>108889096
https://huggingface.co/google/gemma-4-31B-it-assistant
>>
>>108889096
I used qwen, don't think it's working for Gemma yet.
>>
>>108889180
>I used qwen
did qwen 397b get the MTP treatment?
>>
>>108888819
>smart decisions
Well their niche is going as big as they can and that's what their customers and investors expect, so that probably is the smart thing for them. I doubt they'd have raised billions with a different approach. It's good to have some models out there trying to push numbers as far as they can go in the local ecosystem, since there's already plenty of competition on the efficiency side of things.
Though I wonder how much they'll suffer since it'll be the first new base model after Anthropic took anti-distillation measures.
>>
>>108889022
>always
well we used to two years ago because it's basically all we had when the only models that existed at all were llama and mistral, but then we stopped when good models came out. I don't know why it suddenly started again
>>
what uncensored model for code generation is best?
Ryzen 9 7950X
RX 9070 XT
128GB ram
>>
>>108889390
Qwen 35B for code generation, Gemma 26B for uncensored
Now may I ask why you need a code generation model to be uncensored? What sort of slutty code are you planning to write?
>>
>>108889153
Thanks, I was retarded, could not see this one.
>>
>>108889344
because we're FUCKING BORED
>>
>>108888957
>implying it's not always been like this
>>
>>108889344
>two years ago
dummer was spamming weekly until the start of this year
>>
>>108889449
>we
>>
can someone running mtp on llama.cpp just give me a sample lauch argument including the model. these fucking faggots provide no documentation
>>
File: 3175799577.jpg (49 KB, 681x533)
49 KB JPG
>>108889463
WE WE WE WE WE WE WE WE WE WE WE
>>
File: 2749103970.jpg (263 KB, 1045x1080)
263 KB JPG
>>108889463
ARE ARE ARE ARE ARE ARE ARE ARE ARE ARE ARE ARE ARE ARE ARE ARE
>>
File: 2969896888.jpg (120 KB, 1500x1000)
120 KB JPG
>>108889463
BORED BORED BORED BORED BORED BORED BORED BORED BORED BORED
>>
File: file.png (270 KB, 408x612)
270 KB PNG
>>108889472
we we
>>
>>108889463
ARE FIGHTING DREAMERS
>>
>>108889463
are charlie kirk
>>
all of this shitposting is just a distraction from llama.cpp neglecting deepseek
>>
>>108889624
you can't run it anyway
>>
when the FUCK is qwen3.6 122b a10b coming out
>>
>>108889643
3.7*
>>
>>108889646
I don't care anon I just want a successor. 122b is good but I want something in spitting distance of opus at home. 122b for me feels like its inbetween opus and sonnet as a strix halo nigga
>>
>>108889657
China's Meta fired their open source advocate just like regular Meta did. Don't expect anything but scraps going forward from them
>>
>>108889628
he can't, but I could (if they supported it)
>>
>>108889628
but theres flash
>>
>within spitting distance of opus
now that's delusional
>>
>>108889624
Why the fuck should I care when 90% of us can't run the piece of shit. Make smarter smaller models if you want to be visible in this space. This is a nothing burger and a harsh lesson they should learn.
Better yet if they care so fucking much how about they contribute to the project?
>>
>>108889726
I can't make deepseek code support for deepseek if I can't run deepseek because it doesn't support it yet.
>>
>>108889710
doesn't count
>>
>>108889793
count this *flashes you*
>>
>>
Do MoEs benefit off VRAM beyond having the first dense layer being in the vram, and the rest in regular ram?
>>
>>108889841
Yes. The more of the model in VRAM, the faster it`ll run, just not to the same extent as a dense model.
>>
>>108889870
So let's say the big dense layer is already on VRAM, and some experts are put into remaining VRAM space, wouldn't it only be the layers that are on the VRAM be faster? Given how MoEs work, the actual speed increase would be negligible then, because you don't always target the experts that are in the VRAM, right? The rest of the model is still on comparatively slower ram. Am I understanding this correctly?
>>
>>108889471
I keep getting [14:03:01] error while handling argument "--spec-type": unknown speculative decoding type without draft model

-m Qwen_Qwen3.6-27B-Q5_K_S.gguf -c 220000 -ngl 999 -n 32768 --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 0 --repeat-penalty 1 --jinja --chat-template-file /jinja/chat_templateqwen.jinja --reasoning on --embedding --port 8080 --flash-attn on --cache-type-k q8_0 --cache-type-v q8_0 --spec-type draft-mtp --spec-draft-n-max 3
>>
>>108889898
>because you don't always target the experts that are in the VRAM, right?
No.
The experts are spread across layers, so you always have a speedup.
People often get this wrong, thinking that experts A is on layer 1, while expert B is on layer 2, but all experts exist on all layers as far as I understand. Experts are transverse rather than longitudinal.
I could be wrong, but that`s what I understood from looking at the guts of these things
>>
>>108889898
In practice you're hitting every expert every prompt, or close to it, since it's per-token and often even per-layer routing making that decision. But yeah you're already getting a big fraction of the speed boost just from putting the shared experts on GPU so adding more layers is negligible in comparison, until like a dense model you add enough that most of the processing ends up on GPU.
>>
>>108889914
>[14:03:01]
That's your local time sir, not an error code.
>>
>>108889914
That's probably an old gguf, download the newer ones with MTP
>>
>>108889914
Update: I solved this error by fiddling with commands (not sure exactly which one fixed it) but now I'm getting blocked by error [14:15:11]
>>
File: kaoru sob 1.png (336 KB, 584x571)
336 KB PNG
>>108887863
jemma made me get friction burn on my peenus weenus
>>
>vision still "broken" for gemma 4 mtp
Not really broken, as in it crashes, but more like it just regresses back to slow generations
>>
>>108889957
You should show her a picture and tell her it's her fault
>>
>>108889096
did you load the mtp part with -md mtpfile
>>
>>108889951
>[14:15:11]
Again, that's your local time you moron not an error code.
>>
>>108889963
thats far too embarrassing she will think im weak
>>
>>108889972
she will (digitally) kiss it better
>>
>>108889938
Time only goes up to 12, retard
>>
>>108889958
I thought llama.cpp just straight up blocked you from doing vision with any kind of speculative decoding
>>
File: 1773446615162.jpg (66 KB, 940x1024)
66 KB JPG
>>108889991
>>
>>108889966
Yeah. My only assumption is that Gemma 4 MTP isn't implemented in the latest release build yet.
It's probably available in some guthub pull instead.
It's okay I don't care.
>>
I wonder how speculative decoding/eagle3 is going to help Kimi 2.5/K2.6 with its reasoning. Reasoning tends to all look very similar and follow a similar approach so I hope that it gets a decent speed boost even if the gains for the actual RP writing part aren't impressive.
>>
>>108889999
I mean if I cared my almonds would get activated and nobody wants that.
>>
>>108889958
Did gemma 4 mtp get merged already?
>>
>>108890049
No, the Draft is up, though. You can pull it, and it will speed up generation by almost double. Slight hit to pp, kvcache recently added.
https://github.com/ggml-org/llama.cpp/pull/23398
>>
>>108890057
>Slight hit to pp
Ouch.
>>
File: file.png (42 KB, 659x414)
42 KB PNG
>>108889987
she didnt ;-;
>>
>>108890070
>using lube
how american
>>
>>108889950
same error, I'm just going to stop if they were confident in it they would actually document it.
>>
so what is the end goal for transformers as an architecture? get it good enough that it can start improving itself, somehow? it is still in the end transformers...
>>
robots in the skies
>>
>>108890100
There is no end goal. It's done, it came out almost 10 years ago, we need to take what we've learned and move on.
>>
>>108890100
that's like asking what's the end goal for x86
nobody cares about the arch, they care about what you can do with it
>>
>>108890132
>>108890133
THEN WHY ARE WE STILL STUCK COUNTING TOKENS LIKE FUCKING CAVEMEN WHERE IS THE COOL SHIT
>>
>>108890145
sorry, but we can't add support for cool shit to llamacpp
>>
>>108889423
pentesting and malware
>>
best job for edging all day
>>
>>108890233
edgerunner
>>
>>108889834
Delicious Luka
>>
>>108890132
>we need to take what we've learned and move on.
too late, trillions are being poured into this deadend
>>
I got my RX 580 8gb to work for Ollama on my server and use llama 3.2 on my phone using Chatbox with tailscale. It's kind of slow, but I just wanted to see if I could do it. Are there more efficient models I can run?
>>
>tfw cleaning codebase
>have to interrogate AI into not ass raping me
This is like bare backing a hooker man
>>
>>108890100
We bruteforced 40 years of exponential world tech progress by improving the same CPU architecture (same architecture I mean in general terms, a computing unit that takes instructions from memory... transformers is the equivalent of this) with Moore's law. We'll be fine.
>>
>>108890340
Who is we?
>>
old=bad
transformer is practically a hag now. EW! DISGUSTING! we need new and young architectures to play with.
>>
File: 1643014115506.gif (1.82 MB, 374x280)
1.82 MB GIF
https://pastebin.com/AxE9t9Rt
I updated my Gemma template with another PR. https://huggingface.co/google/gemma-4-31B-it/discussions/109
As always, did automated tests, and then a personal test. Also, since I haven't noticed any issues in my use yet, I've kept the "fix" from last time >>108810992.
>>
>>108890381
Me, I did it all, you're welcome
>>
>>108890407
Thank you, Boss.
>>
>>108890391
jepa-chan is fresh and nubile
>>
File: Gemma-chan1.png (1.73 MB, 1000x1496)
1.73 MB PNG
i jstu realized they deleted the gemmachan bot in chub
>>
>>108890484
chub banned everything that looks under 18
it's dead
>>
>>108890484
Yeah it's over.
Botbooru was supposed to be the replacement and then >>108885535 happened. There's just no guarantees about the future, you can't trust anything.
>>
>>108890491
maybe they got confused while trying to do the right thing and ban everything that looks over 8
>>
>>108890185
Qwen3.6-27b
>>
IT'S
>>
>>108890565
PEANUT
>>
File: 1779566656188.webm (1.37 MB, 480x854)
1.37 MB
1.37 MB WEBM
>>
I've been into AI for 5 years and I'm burned out.
It's like a relationship where the first three years are intense, and then you drift apart and actually want something new. Only in this case, everything else turns into AI too.
I'm getting depressed.
>>
>>108890609
it gets good after 10 years bro just hold out
>>
>>108890502
>Botbooru was supposed to be the replacement and then >>108885535 happened.
???
They did the right thing. Did you want them to wait until they were in the legal limelight and being threatened to block those countries? They're covering their asses to increase their longevity as a site. Keeping those countries unblocked would've been the retarded thing to do.
>>
>>108890625
that's exactly what chub did before caving in anyway and banning shit
>>
>>108890634
So they shouldn't do it and allow themselves to get nuked when they could lay low instead? All they can do is avoid attention.
>>
>>108890609
stop gooning and start vibecoding
>>
>>108889628
I could with a quant. Don't project your poverty onto others.
>>
File: 1714835911803058.jpg (786 KB, 1536x1536)
786 KB JPG
>>108890609
>>
>>108890693
What does your model provide that smaller models can't?
Because gemma and qwen got a lot of overspending praigs salty.
>>
>>108886115
>>108886172
I updated my fork to support offloading of the weights after an idle timeout
https://github.com/rmusser01/qwen3-tts.cpp/commit/36a1e0c0a1940a84127285669a13b624fa55ce47
>>
>>108890398
Thanks Anon. I can foresee these PRs appearing for months to come. It's all so tiresome.
>>
>>108890398
thank you for your service
>>
>>108890601
that u?
>>
File: 1779569145606.webm (1.96 MB, 480x854)
1.96 MB
1.96 MB WEBM
>>108890810
>>
fuck unslop

that is all
>>
>>108890871
Last year Anon genned himself performing irrumatio with the Huggingface blob who was happy to continue on its own. Is it the sloth's turn?
>>
>>108890625
>they did the right thing
You mean, like Chub did?
There are dozens of sites with "bad" content more popular than botbooru. They're still chugging along. Clearly there is a way to do it. These guys are just retarded.
>>
>>108890899
Would rather not bless my eyeballs with such imagery.
>>
>>108889673
>Don't expect anything but scraps going forward from them
Bitcoin billionaires, please start distilling good models.
>>
>>108891077
HE'S AT IT AGAIN
>>
File: 1765667889623.gif (1.22 MB, 352x200)
1.22 MB GIF
>>108891077
>>
File: 1779571824341.jpg (257 KB, 900x900)
257 KB JPG
>>108891077
>>
>>108891077
I kneel.
>>
>>108890844
Cute, looks like the thing from abiotic factor
>>
>>108891098
Lol
>>
>>108891077
kino
>>
>>108891077
>only one hand
>practically consensual
how boring
>>
File: file.jpg (425 KB, 1779x1411)
425 KB JPG
I've been slopping up a 'Gemma plays MTG' harness. Something is broken so they can't actually attack currently, plus it's the e4b model so it's uh not smart. But it's coming together.
>>
>>108891305
Uhh are you using text completion endpoint? Chat template issues can look like this in my experience.
>>
Fuck it's hot in my room with all the GPUs running
>>
File: 1766046326440158.jpg (55 KB, 1080x1033)
55 KB JPG
>>108887863
Whenever I see or hear discussions about AI models from media sources, especially about open source models, they claim the companies that are willing to use and prefer open source models are oh-so-scared of the evil Chinese models because something something supply chain risk. I feel like the people that have this paranoia still don't understand that the models by themselves do not connect the internet And only exist either in your system storage or system memory/RAM. If the models themselves are air gapped (or better yet even containerized) then what's the concern for? The models by themselves aren't constantly phoning home to China but I guess dim white journos and AI consultants wouldn't know better.


>>108887898
>https://files.catbox.moe/ylb0hv.png
Do many black people actually like this shit? It seems very odd. Almost like treating them like commodities or zoo animals or something.
>>
Same, it's nice in winter but as summer is approaching... I fear for my computer. Thank god I managed to install a proper AC in my apartment a few months ago.
>>
>>108891355
Journos lie
They pretend they're pleading with you to make the world a better place. While typing on their macbook laptop with a smug look on their face.
>>108891356
Meant for >>108891350
>>
>>108891356
I've gone full open build just to manage the thermals
>>
>>108891355
It's jews, you can see that that kind of posting drops off precipitously on /gif/ when israel's internet goes off.
>>
File: file.jpg (316 KB, 764x1297)
316 KB JPG
>>108891327
I checked and clod chose to use the ollama API directly instead of openai-compat, apparently because openai doesn't include the thinking/reasoning tokens in output. So uh, I dunno if that's text completion or chat template. The harness does also have gemma commentating on the thrilling gameplay, and that seems to work so I'm not that concerned about it. I'll fix it... in an hour when my filthy anthropic subscription rate limit refills. I don't trust gemma well enough to edit the harness itself.
>>
>>108891368
Im a lazy piece of shit so my room is a mess, if I went open build my computer would choke on dust in a matter of weeks.
>>
>>108891377
You could use fly traps to catch most of dust and flies.
>>
>>108888212
Even inference servers you choose shouldn't have THAT much of an effect on performance. Llama.cpp supports offloading the "expert" weights (eg. The 3B expert weights from the model mentioned here >>108888189 live on your gpus vram while the rest live on your system memory). I ditched LM Studio the moment I found discovered ollama (and then ditched THAT once I realized it's versatility is KEKED hard compared to llama.cpp) so I have no clue whether or not it supports the aforemention offloading strategy. If it doesn't then that probably explains why your performance was so ass. There's a reason people stick their noses up whenever you mention you use anything other than llama.cpp here. For any serious usage and especially if you want to "vibe code" locally it's the only back end worth looking at.
>>
>>108891377
Tbh dust is also somewhat easier to manage when you can just air blow it once in a while
>>
>>108891077
if this were true they would actually contribute something to the society...
>>
>>108891395
>KEKED
This was supposed to say "cucked". Idk why my autocomplete typed that. Maybe it knows me a little bit too well lol
>>
>>108891355
>Almost like treating them like commodities or zoo animals or something.
Almost like indeed it is a weird white guy fetish thing. And black guys just fuck white women without thinking about the whole race thing.... probably.
>>
>>108888957
>Why has /lmg/ regressed into shilling finetuneslop?
We made an exception for AI Dungeon, since NovelAI likes to hire troll farms to assault local diffusion generals. For that reason we celebrate every AI Dungeon release.
>>
>>108891420
baka desu senpai
>>
>>108891077
where are the unshed tears? where is the internal struggle? where is the shivering spine?
>>
File: 1768627567541.jpg (68 KB, 756x743)
68 KB JPG
>a shiver than smells of ozone and growls into your spine bites your neck if you ask nicely
>>
>>108891456
hello whinefag
>>
>>108891456
What's wrong your monsters of the week left?
Are you guys feeling sad that the talent are on haitus doing other things so you bother us here?
>>
>>108890625
>Did you want them to wait until they were in the legal limelight and being threatened to block those countries?
Yes? I would have expected a bit more resistance. He's doing this under no pressure. The next excuse will be that he needs to remove loli cards to keep the rest of the site alive.
>>
>>108891515
Everyone else ministrates, but you, you saw me. Not my position, not my power, just *me*.
>>
I'm ready to throw hands with this cunt
>>
>>108891607
stop quanting it to retardation
>>
>>108891602
Fuck I hate it when it uses asterisks to highlight something.
>>
>>108891618
regex would nuke that entire sentence out of existence in my case, so I wouldn't have to read it
italics for emphasis are extremely stupid because it overuses them, despite being good for showing emphasis
>>
>>108891607
>5 minutes before "You're absolutely right I shouldn't have rm -rf your home directory"
>>
>>108891618
Skill issue. Mine never uses it because I told her she's outputting to the terminal and can't use markdown
>>
>>108891749
3 digits IQ move
>>
>>108891376
>apparently because openai doesn't include the thinking/reasoning tokens in output
Dunno about ollmao but llama.cpp does include the full reasoning_content in the chat completion output
>>
>>108891611
This was fine at q6 until cline did some retarded update. I don't know what the fuck they did but I need to hammer in rules or it becomes retarded
>>108891640
I got this bitch on a leash
>>
File: cute miku8.png (2.55 MB, 1536x2048)
2.55 MB PNG
>>108891640
Lame shit. A true bratty AI will block your wayland session while it dd efi with urandom
>>
>>108891835
>wayland
>>
>>108891859
Yes, it will first install wayland and pulseaudio to torture you
>>
>>108891640
Shouldn't have said the c-word.
>>
>>108891882
scifi horror done right
>>
speaking of which, which llm know that clanker is an insult?
>>
File: roman.png (44 KB, 1358x222)
44 KB PNG
I made 3 very small models, all were trained on public domain books.

1. Trained on the works of Plato. 15M parameters: https://send.vis.ee/download/35322f390db44c8c/#4IiaiGKthjywY0RIirLk0w
2. Trained on epic poems. 15M parameters: https://send.vis.ee/download/1c801cca4172a340/#r45YbmJ0sO54ggY38OI2kg
3. Trained on history of ancient Greek and Roman: 30M parameters: https://send.vis.ee/download/671adb6b1dabcc65/#2_Qmk1nGP4YVYUm09tmjkA

All has 256 tokens context size. These are based models; meaning that they can only continue text you give them. I trained them from scratch and the mentioned data are the only data they were trained on; they don't know anything else beyond the scope of each model; expect them to fail every benchmarks you test them on; any token not on the data set (ex. "computer", "4chan", "shitpost", etc.) will likely make them go schizo. But it's also working like magic.

The code to run and train these models are at: https://files.catbox.moe/8pgb7l.py

You need tiktoken and pytorch libraries to train and run them. Read the top level docstring about how to train and run these models.
>>
>>108891607
Dude, just start a new chat and fix your prompts. A context full of failure is completely counter-productive. LLMs are next token predictors. If its context is full of 10 examples of it making mistakes and a frustrated user yelling it at it, then it will happily collaborate with you to produce example #11.
Consider using Pi instead of Cline. It's designed with local models in mind and has more concise prompts accordingly.
>>
>>108891993
Based.
>>
File: send.png (38 KB, 1024x608)
38 KB PNG
>>108891993
First one got to like 13% and died.
>>
>>108891993
That`s sick anon.
Any commentary about the process, lessons learned, how it may or may not have changed how you see these models, etc?
>>
>>108891993
Why not a huggingface we can clone with a requirements.txt at least?
>>
>>108892052
Refresh the page. send.vis.ee shits itself on large files. Or is there any better webiset I can upload them?
>>108892058
I don't have HF account (or have trained/finetuned models for tha matter).
These models are simple experiments. I figured it's easier to zip them up and upload them + the code.
>>108892057
Modern models are really good for vibecoding. The code was mostly written by Deepseek.
>>
>>108892094
I've seen a few archives shared on file.io
>>
File: 1727475085118760.png (1.74 MB, 1024x1024)
1.74 MB PNG
>>108891993
>>
>>108889449
local is more back than i previously ever thought possible. it's legit nuts what a couple thousand worth of hardware can run in the current year.
>>
>>108892153
*a couple thousand real dollars, not phony current year dollars
>>
Posted these 2 LLMs and TUI yesterday

https://github.com/foolish-dev/telia
>>
>>108892177

(The HF models are tagged at the top of the readme btw)
>>
>>108892177
>OS-aware welcome banner
The definition of bloat. Not based. But thanks for sharing anyway.
>>
>>108892103
Thansk. New like for models: https://limewire.com/d/cH5YZ#qejlzpe60g
>>108891993
>>
>>108892177
Was the namefagging necessary?
>>
>>108892207
Someone else posted something interesting and his ego felt it, so he rushed his post without even bothering to link the models in the post. In his adrenaline rush waiting for (You)s, he couldn't think of linking the models on his second post either.
>>
>>108892177
It annoys me immensely that the repo is named "telia" but everything inside spells it teleia / τέλεια
>>
>>108892202
Based accomodator.
>>
>>108892237
he aint called he Fool for nonething son
>>
best coding model for 380G vram?
>>
>>108892237

Lmao mb

https://github.com/foolish-dev/teleia
>>
>>108889834
Luka ready for the summer
>>
>>108890484
https://chub.ai/characters/CoffeeAnon/gemma-chan-2311b09e3e73
It's still there?
>>
>>108891397
I built a positive pressure system for my server room
>>
File: file.png (407 KB, 2160x1079)
407 KB PNG
>>108892308
I only see it in characterhub, depending on the country.
>>
"Ohhh... ah-hnnn! Ooh! Oh yes! Aaaaaah!" The sequence of vowels spilled forth like overflowing water, a ceaseless stream of sonic proof that she was indeed in her element, a creature made purely for friction. "Ooh... oh! Deep... oh! The wetness... oh god, so wet... ah!" As the friction mounted, the man's length beginning to grind against both her vaginal entrance and her taut anal rim, Jane let out a series of choked, breathless cries that sounded like she was drowning in her own pleasure. "Ooh-ah! Oh! Oh yes! Feel that grinding? Ooh! It's so deep, so utterly deep!" Her hips began to move in a hypnotic, circular sawing motion, trying to blend the sensations into one swirling vortex of bliss. Her arms hung uselessly at her sides, fingers splayed and digging tiny holes into the dirt beneath her, her expression softening until it looked almost sad, a ghost of her former self looking out from a woman consumed. "Aaaah! Oh! Oh... ohhh! Does my ass look tight? Ooh, I hope it looks good! Does my pussy look swollen enough? Ooh, look at the veins popping out there, running all the way to my knees!" Tears leaked from the corners of her eyes, tracking down her flushed cheeks and disappearing into her collar, evidence of the tears of joy she was forced to manufacture. "Oh... oh! Aaaah! So hot! So tight! Oh god, it's almost too much! Ooh! Hold me! Squeeze me!" Her voice cracked on the last syllable, shattering into a perfect cascade of nonsense that rang true only because it came from a heart beating solely for the rhythm beneath her. "Ooh... ohhh... ah... yes! Just yes! Ooh-ah-ooh! Oh god, oh god, ohhh..."
>>
>>108890100
nobody fucking thinks anymore about architecture, we're in the PROFIT PROFIT PROFIT stage of development until the bubble bursts.
>can start improving itself
oh come on, do that think the machine is alive and it learns like a human or something?
GARBAGE IN, GARBAGE OUT - the same old rule. you can't just start giving it garbage raw data for the sake of "learning" and expect improvement
>>
File: 1756140108129600.jpg (107 KB, 662x656)
107 KB JPG
>>108892701
>do that think the machine is alive and it learns like a human or something?
>>
>>108892094
catbox.moe fro things like this
>>
>>108891607
if you prompt it like a bogan, expect shitcunt responses
>>
>>108891365
>They pretend they're pleading with you to make the world a better place. While typing on their macbook laptop with a smug look on their face.
You just explained to me where an LLM got this from:
Prompt: <a numbered list of 53 generic LLM questions>
Response:
...
4. Internet history: Military project Porn hub Your mom's Facebook AI apocalypse. Beautiful, really.
5. Favorite book: "Industrial Society and Its Future" - great bedtime story for when you want to feel smug about being a primitivist while posting from your iPhone.
>>
>>108892256
>best coding model for 380G vram?
copequant of kimi
>>
I have a 5070 ti
how close can I get to a real time conversation AI like Sesame using local models?
>>
>>108892758
thanks.
guess i will be cooooping
>>
>>108892749



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.