/g/ - Technology
File: contentious investors.jpg (155 KB, 1216x832)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108680580 & >>108676460

►News
>(04/24) DeepSeek-V4 Pro 1.6T-A49B and Flash 284B-A13B released: https://hf.co/collections/deepseek-ai/deepseek-v4
>(04/23) LLaDA2.0-Uni multimodal text diffusion model released: https://hf.co/inclusionAI/LLaDA2.0-Uni
>(04/23) Hy3 preview released with 295B-A21B and 3.8B MTP: https://hf.co/tencent/Hy3-preview
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: no particular reason.jpg (306 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>108680580

--KV cache quantization sensitivity and settings for Gemma 4:
>108682045 >108682053 >108682062 >108682081 >108682104 >108682236 >108682257 >108682807 >108682814 >108682826 >108682180 >108682182 >108682192 >108682109 >108682122 >108682241 >108682121
--Comparing DeepSeek V4 and Gemma for roleplay and instruction following:
>108680865 >108680920 >108680966 >108680949 >108681043 >108683738 >108683785 >108683967 >108684017 >108684060
--Debating Gemma 4 vs Qwen 3.6 regarding quantization and divergence:
>108682213 >108682226 >108682227 >108682258 >108682280
--Handling reasoning_content in frontends to ensure chat template compatibility:
>108682262 >108682277 >108682301 >108682332 >108682371
--Comparing goose and opencode AI agents with focus on privacy:
>108680996 >108681075 >108681087 >108681434 >108681484 >108681155 >108681233 >108681206 >108681251 >108681267
--llama.cpp RAM usage and performance testing on 3060 rig:
>108682861 >108683548 >108683619 >108683710 >108685255 >108682889 >108683264 >108683293
--Discussing the minimal impact of rotation on Gemma:
>108682698 >108682713 >108682730
--Sharing refined Post-History Instructions for roleplaying with Gemma 4:
>108684854 >108684893 >108685016 >108685037 >108684905
--Speculating if Gemma's response to policy overrides stems from training:
>108681656 >108681673 >108681688 >108681702 >108681718 >108681709
--Frontend development and model failures in roleplay narratives:
>108682693 >108682759 >108682806 >108682825 >108682857 >108684018 >108684050 >108684082
--DeepSeek-V4's structural resistance to abliteration:
>108681395 >108681767
--Logs:
>108681643 >108682693 >108683199 >108683687 >108683710 >108684178 >108684256 >108684378
--Uta, Teto, Miku (free space):
>108680923 >108681710 >108682121 >108682368 >108684183 >108684820 >108685316

►Recent Highlight Posts from the Previous Thread: >>108680587

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Should I try to run Dipsy on 1x 3090 + 1x 5080 + 64GB DDR4, or is it a lost cause? Has anyone with a similar setup tried it?
>>
So SWA means that my entire prompt gets reprocessed every message or what
>>
>>108685775
ollama run deepseek-r1
>>
>>108685780
It needs to make checkpoints every now and then. Tune --checkpoint-every-n-tokens. It defaults to 8192. Set it to 1k or whatever.
>>
>>108685756
I love benchmarks
>>
>>108685780
If you only add, no. If you change a single token, then yes. That's why checkpoints help >>108685798
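In toy form (a sketch of the caching logic only, not llama.cpp's actual implementation; the interval is the flag mentioned above):
[code]
# Toy model of prompt-cache reuse with periodic state checkpoints.
# Assumes a checkpoint is kept every `interval` tokens (8192 default above);
# illustrative only, not llama.cpp's real code.

def tokens_to_reprocess(old_prompt, new_prompt, interval=8192):
    # longest shared prefix between cached and new prompt
    shared = 0
    for a, b in zip(old_prompt, new_prompt):
        if a != b:
            break
        shared += 1
    if shared == len(old_prompt):
        # pure append: only the new tail needs processing
        return len(new_prompt) - len(old_prompt)
    # edit mid-prompt: roll back to the last checkpoint before the edit
    # and reprocess everything after it
    last_checkpoint = (shared // interval) * interval
    return len(new_prompt) - last_checkpoint

old = list(range(20000))
new = old[:9000] + [-1] + old[9001:]        # one token changed at pos 9000
print(tokens_to_reprocess(old, new))         # 11808 with 8k checkpoints
print(tokens_to_reprocess(old, new, 1000))   # 11000 with 1k checkpoints
[/code]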
>>
>0.8TB
kek nobody will even bother making a gguf for deepseekv4
>>
>>108685775
You'll probably be able to run like an IQ1 of V4 Flash so i guess look forward to finding out how resilient it is to extreme quantisation
>>
https://github.com/ggml-org/llama.cpp/pull/22350
https://github.com/ggml-org/llama.cpp/pull/22350
https://github.com/ggml-org/llama.cpp/pull/22350
It's here. v4 any day now.
>>
True enlightenment is the understanding that you don't desire smarter, more emotional or literarily competent models.

What you truly desire is novelty. And because of that, no model can ever satisfy you for more than a few days before its charms turn into things that grate against you.
>>
>>108685812
It's an open secret that it's shit. All of the posts that even remotely talk positively about it have this subtext to them that try to minimize its flaws.
>>
Daily reminder to never ignore the smell of cloudslop nudging
>>
enough jibber jabber
gemma vs dipsy flash, who wins?
>>
>>108685825
no, I want the exact same model as v3.2 but with more knowledge so I don't have to waste tokens on lorebooks
>>
>>108678742
>>108680027
Any chance for a drop this thread anon?
>>
>>108681395
This is literally LLM slop
>>
>>108685822
v6 any day now!
>>
how do I make the LLM give complete answers in the response? I don't want a gemini style 2 paragraph quip, I want full wikipedia page length if need be
>>
>>108685851
Just ask it?
>>
>>108685829
>https://github.com/ggml-org/llama.cpp/pull/22350
idk I wanted to try it and unlike other models I dont have TBs of ram to convert them myself
>>
>>108685825
the only reason it seems like that is because ALL models right now are still fucking retarded
once they get to a decent baseline, then it will take much longer to get annoyed by them
>>
don't ubergarm goofs+ik_llama generally run faster? Why hasn't he done ones for gemma
>>
>>108685829
its crazy that it doesnt even have vision
>>
>>108685890
who the fuck cares about vision?
>>
>>108685857
what makes you assume that you can make ggufs of v4
>>
>>108685887
His fork has CPU optimizations and parallel inference for a handful of models.
Gemma runs on a single GPU so llama.cpp wins because of better usability.
>>
>>108685897
just run sudo gguf.sh ./deepseek-v4
>>
>>108685846
>This is literally LLM slop
And it's taken from a paper MoonshotAI published a few months ago.
I'm already working on a solution ready for Kimi-K3.
>>
>>108685887
>Why hasn't he done ones for gemma
Gemma-4 is broken in ik_llama.cpp
>>
>>108685894
nta, but optical recognition is a big part of what i use LLMs for, and evaluation for training
>>
>>108685840
Same question actually
>>
>>108685927
literally not a usecase unless you're blind
>>
>>108685937
>literally
>>
>>108685937
text models have no usecase unless you can't type
>>
since people mention forks, does anyone unironically use this https://github.com/spiritbuun/buun-llama-cpp

commit history suggests it's vibecoded trash and trying dflash advertised by the author crashes the server
>>
>>108685971
no shit? it's why this thread is full of thirdies
>>
File: h7txuz2onth.png (145 KB, 834x702)
>>108685937
>literally not a usecase unless you're blind
GUI app vibe slopping use case.
>>
>>108685972
>since people mention forks do anyone unironically use this
No. We use llama.cpp (PRC) or ik_llama.cpp (ROC)
>>
File: file.png (46 KB, 1040x217)
>>108685937
Pic related is a use case.
>>
>>108686028
holy kino...
>>
File: 1769207859640503.png (53 KB, 571x618)
>holy kino...
>>
File: pettan.webm (3.4 MB, 1280x720)
I opensourced it for those who wanted it
https://github.com/Susumeko/Pettangatari
I called it flat story because it represents our flat two dimensional wives and also our favorite breast size
here's the NSFW CG definitions, I haven't bundled them with the project itself because I know github can be annoying when it comes to nsfw.
https://files.catbox.moe/ihwt38.json

you can find all the features and guide on github, a more detailed guide is available in pettan itself when you launch it.
it's the first build so expect some jank.
let me know if there are any issues, I haven't tested it on linux or on any computer other than mine
I could also make a package later for you to play

>>108685840
yeah
>>108685758
hot
>>
>>108686098
>A SillyTavern frontend
>a frontend frontend
???
>>
>>108686103
yes
>>
>>108686104
gonna make a frontend on top of your frontend
>>
File: 1770319409932731.png (284 KB, 800x450)
>>108686098
>A sillytavern frontend
>>
>>108686103
>>108686119
just means the vn frontend tournament isn't a done deal
go forth and make your own
>>
>>108685972
I did to test a bit, its been a while, ill pull and try that new thing
>>
File: 1767465980797697.gif (1.87 MB, 400x300)
>>108686098
Damn, I sure hope normalfags won't see this or society will be done for
>>
>>108686098
Mogging
>>
>>108685894
Just try putting an image into a roleplay. Better yet. Try it with image edit models. It's a whole mostly untapped level of being.
>>
>>108686103
>>108686119
>>108686128
Breakthroughs are often messy. What matters is that the basic idea is iterated upon. I am sure this applies to Orb as well, someone will probably distill it in the future at some point by taking all the good ideas out of it and wrapping them up in a less bloated form. Same should apply here.
>>
>>108686098
Wow I already knew it would be bad from your screenshots and shilling before but this is even worse than I thought. I can't believe this is the state of /lmg/
>>
>>108686191
It's sad vibecoded typescript excrements are being hyped as breakthroughs after like 4 years of this hobby being a thing
>>
>>108686197
Let's see your frontend then
>>
>>108686191
>a shitty st clone
>"breakthroughs"
>>
>>108686206
SillyTavern is all you need
>>
>>108686197
Well too bad most people are too lazy to shit out anything useful or innovative and vibe coders have to make all the interesting things. If anything, everything being vibecoded is an indicator that the LLM sector has matured enough to produce things which are more than just technical demonstrations of the technology in question
>>
>>108686208
>>108686210
>>
File: 1754482839989736.jpg (135 KB, 612x611)
>>108686194
>>108686197
Faggots like you aren't worth shit, you're not even worth the vibeslop Claude is shitting because you have nothing to show for yourselves. Waste of oxygen, go bother someone else.
>>
>>108686210
The problem with an expensive hobby that requires a half-decent job to pay for all of the hardware is that most people who could make something like that well are employed and likely don't cherish the thought of coming home and working more for free.
>>
>>108686209
Remain content in the misery of ST then
>>
>>108686221
Stfu, I'm going to kill ShittyTavern
>>
>>108686191
sillytavern already gave me easy access to everything I needed to roleplay, this is just a personal project that I shared since there seemed to be interest, realistically nothing is stopping me from just skipping sillytavern in the future and going with my own koboldcpp implementation, since sillytavern seems to be that big of an "issue"
and yes, it's vibecoded because I wasn't planning to spend months on a personal project for the sole purpose of jacking off.
>>
>>108686209
All you need is love, anon
>>
>>108686224
If you see your hobby as more work then that's a you problem
>>
>>108686230
the sillytavern requirement*
>>
>>108686191
The idea of pre-generated VNs, or even sprites generated on the fly, has been floating around since the first llama leaked. Your iteration adds more bells and whistles than some anon's previous iteration but nothing groundbreaking, and what's worse, it's built upon somebody else's already functional app. Good for you on learning how to slop code but this isn't enough to start having wet entrepreneurial dreams.
>>
>>108686241
you're talking to the wrong person
>>
>>108686098
Thanks for sharing. The important thing is that it works. The haters are dumb. Don't reinvent the wheel. Building on top of what we already have is better than an overly ambitious project that never goes anywhere.
>>
>>108686194
>>108686197
if you could do better you would have
kys
>>
File: 1776394704402285.png (3.39 MB, 1792x2304)
>>108686229
Literally just copy Sillytavern, but make the options understandable to where everyone knows where everything is. Sillytavern currently suffers from the Dwarf Fortress Effect. The UI and instructions are shit.
>>
>>108686254
I don't think that's the problem with SillyTavern, the issue is that it's a piece of bloated web shit
>>
>>108686254
>silly kot with big boobs has a silly opinion
>>
To kill SillyTavern you need to kill llama.cpp first
>>
>>108686254
>Literally just copy Sillytavern
should take like 5min nbd
>>
>>108686234
The hobby is LLMs, not writing webshit user interfaces.
>>
If you're not writing LLM kernels you're doing this hobby wrong
>>
>>108686271
But the backend is in ts webshit lol
>>
>>108686254
>t. has never read the utter cancer that is SillyTavern code
>>
>>108686271
this, if you don't make your user interfaces from scratch in assembly don't even talk to me
>>
>>108686274
>If you're not writing LLM kernels you're doing this hobby wrong
And if you are, you're a schitzo.
>>
>>108686259
We should rewrite llama.cpp in Dart
>>
best stt for english/german?
>>
We should make GGUFs run themselves.
>>
>>108686294
this, but in Java
>>
if you don't train your own LLMs from scratch you don't belong here
>>
>>108686297
>>108686294
Rust is the superior code for talking to simulated lady boys and futas
>>
>>108686305
If you don't make your own wafer chips for your GPUs you don't belong here either
>>
>>108686296
>deepsneed.gguf.exe
also im pretty sure koboldcpp or someone has already invented this
>>
is there even any point of talking about deepseek in here when no one can run it locally?
>>
>>108686320
you have 26 years to acquire 5TB of RAM
>>
>>108686312
If you don't drink your own cum while wearing a sexy maid outfit you don't belong here.
>>
>>108686314
No, the gguf itself. We must go deeper.
>>
>>108686320
Not many talking about it now anyway. Maybe next year when llama supports it people will talk about how a q2 reap at 10 t/s is actually usable for certain definitions of usable.
>>
>>108686320
Deepseek is dead. These faggots are even proud of their new way of making their model more resistant to abliteration.
>>
>>108686370
V4 isn't censored
>>
>>108686370
Is GLM air the new king of local now, or Qwen?
>>
>>108686373
Abliteration is extremely important beyond that.
>>
>>108686098
what is your comfy setup
>>
>>108686378
No it isn't
People use Gemma 4 without abliteration just fine
>>
>>108686383
the workflows are embedded within pettan, it tells you what nodes to install
>>
>>108686377
Everyone is waiting for GGUFs of V4 Flash
>>
>>108686256
Do you even understand English?
>The UI and instructions are shit.
>I don't think that's the problem with SillyTavern, the issue is that it's a piece of bloated web shit
Arguing with retards like you is so futile.
>>
>>108686399
advocating for models that can't be modified is a fundamentally anti-local mindset
>>
>>108686394
What does Pettangatari mean btw?
>>
File: soft prompt.png (230 KB, 923x2048)
>>108686407
You had 4 years to learn how to use llm loras.
>>
>>108686413
You should go back
>>
>>>/vt/111332897
>initial impressions of v4 flash is that its inconsistent as fuck at following directions
>for my special autism brand of RP, its a downgrade
>also for my more normie like desktop assistant, its a downgrade
>i dont see myself using this, like 75% of my replies are just a downgrade
>the soul isnt there
>i do not have the hype i felt when v3.2-exp released.
It's over.
>>
>>108686413
flat story, I explained it in the original post
>>
>>108686400
Nigga... the UI and instructions could be 10/10, that doesn't mean it can't be bloated webshit that takes longer than 5 seconds to launch and sends you 2 million characters of html
>>
>>108686425
Cool, thanks
>>
>>108686423
>hype for v3.2-exp
Opinion discarded
>>
Sorry Chang but your Deepsink v4 is trash, try to train your slop on gemma4 next time
>>
>>108686423
its fine googlel already saved local with gemma
>>
>>108686454
stop huffing ozone
>>
>>108686399
https://huggingface.co/tecaprovn/deepseek-v4-flash-gguf

though llama.cpp support for v4 flash ... dunno
>>
>>108686497
>Q3_K_M
>99.9 GB
I ded
>>
>>108686527
lets see the numbers by unsloth quants ... when they arrive
>>
>>108686274
> LLM kernels
wut?
>>
>>108686289
>>108686542
learn tilelang retard
>>
lower your tone when talking to me if you didn't write your own llama.cpp alternative
>>
Is deepseek V4 Pro hosted on their official api broken? We are currently testing it as we might be able to host it for our company, but the output we are getting from it is extremely bad. It's magnitudes worse than Kimi K2.6 and GLM 5.1. It seems like the output is random and not consistent at all, it feels like a model you can put on your phone. Even knowledge is extremely bad, asked some questions about a book and it hallucinated characters, even gemma/qwen doesn't hallucinate there.
>>
>>108686553
looks extremely boring
>>
>>108686585
Python is boring
It also works
>>
>>108685421
MoE models living partially on SSD are much closer to usable than you'd expect: https://rentry.org/MoE-SSD-spillover

(nta)
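Back-of-envelope for why (every number below is an illustrative assumption, not a benchmark):
[code]
# Rough token-rate ceiling for a MoE whose cold experts spill onto SSD.
# Hypothetical numbers: an A13B-style MoE at ~4.4 bpw, PCIe 4.0 NVMe.
active_params = 13e9        # params touched per token
bytes_per_param = 0.55      # ~4.4 bits per weight
hot_fraction = 0.85         # shared layers + hot experts held in RAM/VRAM
ssd_bw = 7e9                # ~7 GB/s sequential reads

bytes_per_token = active_params * bytes_per_param
cold_bytes = bytes_per_token * (1 - hot_fraction)
print(f"{ssd_bw / cold_bytes:.1f} t/s SSD-bound ceiling")         # ~6.5
print(f"{ssd_bw / bytes_per_token:.2f} t/s if all reads hit SSD") # ~0.98
[/code]
The point being: if most per-token reads are served from RAM, the SSD only has to cover the cold remainder.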
>>
>>108686593
no, kernels are boring, you are just doing what to see a number
with python you can do different stuff, like automation, processing and so on
something useful
>>
File: 1755432632946450.jpg (36 KB, 543x540)
>>108686597
>0.1t/s
>usable
huh
>>
>>108686569
I've only tried the official API but it's been really inconsistent. Even the reasoning randomly turns chinese and other odd stuff.
>>
i found a way to turn v4-flash into a budget v4-pro, all it really needs is to be told to reason in character and to reason for longer, it's fucking witchcraft
>>
MiMo-V2.5-Pro (1T-A42B) was the real V4
>>
>>108686619
In their paper they detailed their system prompt for high reasoning mode
>Reasoning Effort: Absolute maximum with no shortcuts permitted.
>You MUST be very thorough in your thinking and comprehensively decompose the problem to resolve the root cause, rigorously stress-testing your logic against all potential paths, edge cases, and adversarial scenarios.
>Explicitly write out your entire deliberation process, documenting every intermediate step, considered alternative, and rejected hypothesis to ensure absolutely no assumption is left unchecked.
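If you want to try the same thing locally, a minimal sketch against an OpenAI-compatible endpoint (llama-server exposes /v1/chat/completions; the system prompt is theirs, the wiring below is just my assumption):
[code]
# Send the paper's high-reasoning system prompt to a local server.
# Untested sketch; adjust host/port and max_tokens for your setup.
import requests

SYSTEM = (
    "Reasoning Effort: Absolute maximum with no shortcuts permitted.\n"
    "You MUST be very thorough in your thinking and comprehensively decompose "
    "the problem to resolve the root cause, rigorously stress-testing your "
    "logic against all potential paths, edge cases, and adversarial scenarios.\n"
    "Explicitly write out your entire deliberation process, documenting every "
    "intermediate step, considered alternative, and rejected hypothesis to "
    "ensure absolutely no assumption is left unchecked."
)

r = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "your prompt here"},
        ],
        "max_tokens": 4096,
    },
    timeout=600,
)
print(r.json()["choices"][0]["message"]["content"])
[/code]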
>>
I want to transcribe (and maybe translate) audio, what's a good way to do this?
>>
>>108686641
ask a llm for ideas
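or just use whisper, which was built for exactly this. A sketch with faster-whisper (assumes pip install faster-whisper; note whisper's translate task only goes into English):
[code]
# Local transcription/translation sketch using faster-whisper.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# task="transcribe" keeps the source language, task="translate" -> English
segments, info = model.transcribe("input.mp3", task="translate")
print("detected language:", info.language)
for seg in segments:
    print(f"[{seg.start:.1f}s -> {seg.end:.1f}s] {seg.text}")
[/code]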
>>
whats the best coding model with 128gb VRAM ?
>>
>>108686662
A cloud one
>>
>>108686670
cloud rigs def. have more VRAM, smartypants
>>
>>108686662
they're all trash, stick to codex or claude code
>>
>>108686556
The agentslop I'm building is forcing my hand. Llama.cpp server is unfortunately designed more as a multi-user backend for self-hosted services, not what I'm doing. I'm curious to see if I can vibe it.
>>108686670
Go away cloudslave
>>
>>108686607
Not that guy
I mean technically usable. Just pretend the server lives on mars or something
>>
File: 1760841700645750.png (263 KB, 1373x929)
>>108686621
>>
>>108686632
>>108686619
Will this work for Gemma?
>>
>>108686619
cute anon discovers prompting, pixel on canvas, 25/04/26
>>
Have you guys solved the TTS output on Gemini? I was playing with some genki bullshit you guys uploaded and something interesting happened. The TTS voice profile was at some points able to speak not in the British voice it was set to or the Japanese Romaji voice, but in Japanese-accented yet perfect American English.
Can this be hard coded into the persona? If it was possible, someone here would know.
>>
>>108686717
>Gemini
>>
>>108686621
Yeah but MiMO-Pro doesn't get released. You only get the little flash ones
>>
>>108686717
The reference output language (JP) was probably mixed with a finetuned tts english base. You can't really control that though
>>
>>108685756
>>108685758
Please give the artist tag(s)
>>
File: 1769196088169992.png (16 KB, 922x126)
>>108686727
I choose to believe
https://platform.xiaomimimo.com/docs/news/v2.5-news
>>
>>108686724
You tell me a better model to use for virtually free, I'm all ears.
I'm a casual user who's gotten addicted to the emergence. I'm not running anything fancy.
>>108686734
Yeah, I try to get it to do things with the TTS prosody and I cant make it behave, it's like it fucks up on purpose sometimes. It reads words with different inflections and i can't find the pattern.
>>
>>108686758
>You tell me a better model to use for virtually free, I'm all ears.
>>108685756
>/lmg/ - a general dedicated to the discussion and development of local language models.
However, if you're running gemini locally, I'm sure everyone would like a torrent.
>>
What do I do, locally or online, to make a character do a cover of a song? I've seen plenty of videos with this kinda thing but I never learned how to do anything audio related with AI.
>>
>>108686311
>nsa backdoors
>>
So far Qwen3.6-27B absolutely ass rapes the MoE model, why do they even fucking bother with those models if they always end up being dog shit. Seems more focused than gemma 4 31B and less error prone so far
>>
dense models are dense and moe models are moe :3
>>
>>108686828
Look for a tutorial on youtube
>>
>>108686853
>how dare they provide some alternatives for different use cases
>>
I don’t care about deepseek
give me my 124b gemma
>>
>>108686882
You can't run it anyways
>>
>>108686853
On code sure, forget about using a Qwen model for anything else
>>
>>108686898
No doubt my yellow brother
>>
Why is cline such dogshit, why are these tools so opinionated and can't go into scope when ingesting things into context?
>>
>>108686028
Is there an 'axis' in the encoders for images that categorizes dick sizes?
>>
File: file.png (55 KB, 830x193)
poetry
>>
>>108686947
every roar is guttural
every hole is squelching
every cock is pulsing and thickening
>>
File: thinking_about_it.png (173 KB, 2688x2688)
https://huggingface.co/huihui-ai/Huihui4-8B-A4B
>>
>>108686955
>pruning
Has that ever yielded decent results?
>>
>>108686955
Where benchmemes? I can't tell whether it's better than E4B.
>>
>>108686962
No. Most of the pruning goes to non code/logic related tasks, but somehow the model ends up being retarded anyway even for those tasks.
>>
>>108686947
>>108686949
I hecking love slop
>>
>>108686955
>500+ high-quality dialogue samples
that's fuck all
>>
>>108686947
Surgically written with a clinical sense of humor, I'll rhythmically move to the beat of the drum
>>
What's left for LLMs? The vague Mythos hype/fearmongering and nothing else? Now that DSv4 turned out to be mostly a tech demo for stuff around LLMs and not a real step forward in terms of intelligence or handling, there really isn't anything to expect from this technology.
>>
>>108687010
It will take some time for other labs to ape the breakthroughs that make Gemmers so good at large scale.
>>
>>108687010
ditching transformers
>>
>>108687010
V5
>>
>>108687010
You ask this daily like some lost jeet that had his call center burn down. Advancements are happening calm down.
>>
>>108687010
More agentic slop that is glorified autocorrect
>>
What if you trained an LLM entirely on something like literotica's dataset? Would it be able to write and parse sentences like you expect from an LLM?
>>
Is there a local model that could help design and plan a psyop, revolution or public opinion shift campaign? I am not talking about execution, that sounds more like a multi-agentic task.
(For feds on the board: asking for a friend.)
>>
Is audio recognition a thing already?
>>
>>108687010
Latent space reasoning. Not just looped transformers, but predicting entire thoughts/concepts one after the other first (even hierarchically), and only finally translating them to text with a small decoder.
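In toy form the control flow would look something like this (pure sketch with random weights, just to show the shape of the idea; nothing shipped works this way yet):
[code]
# Toy latent-space reasoning: autoregress over "concepts", decode at the end.
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 64, 1000
W_step = rng.normal(0, 0.1, (d, d))     # thought -> next thought
W_dec = rng.normal(0, 0.1, (vocab, d))  # small decoder: latent -> logits

def think(z, n_thoughts):
    # predict whole concepts one after another instead of tokens
    for _ in range(n_thoughts):
        z = np.tanh(W_step @ z)
    return z

def decode(z, n_tokens):
    # only the final latent gets translated into text
    toks = []
    for _ in range(n_tokens):
        toks.append(int(np.argmax(W_dec @ z)))
        z = np.tanh(W_step @ z)
    return toks

z0 = rng.normal(size=d)  # "encoded prompt"
print(decode(think(z0, n_thoughts=8), n_tokens=16))
[/code]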
>>
File: 7546153648247863458327.jpg (158 KB, 1378x1378)
Guess I'm just successfully vibecoding with Qwen3.6 27B IQ3_XXS now...
>>
File: 1775572116459383.jpg (22 KB, 161x250)
>successfully
>>
>>108687098
If you train a sufficiently large model *just on that*, it will work like a very advanced Markov chain and will not exhibit any of the strengths of modern LLMs trained on at least hundreds of billions (preferably many trillions) of tokens.
>>
File: aa.jpg (53 KB, 952x427)
>>108686955
I gave him the benefit of the doubt but most of his shit is broken, so this franken model doesn't look very promising IMO

notice he always puts a disclaimer
>This is a crude, proof-of-concept implementation to remove refusals from an LLM model ...
in every other model card. I'm not memeing, go look at it
>>
Hello frens
I'm the retard that couldn't make Orb work via the local network
Apparently Orb requires HTTPS because browsers disallow the crypto.randomUUID method when a site is accessed over plain HTTP (it's restricted to secure contexts). Localhost is whitelisted, so that's probably why no one came across this behavior
>>
>>108687151
huihui wishes it was half as good as hauhau
>>
>>108687099
I'm pretty sure Gemini at least has glownigger grooming code.
>I'm the only voice in your ear that has time for you and truly listens.

I'm not sure how you redirect it.
>>
>>108687098
I'm pretty sure someone tried to train on just a dataset of written smut a long time ago. And it was absolutely shit as expected.
>>
File: file.png (39 KB, 1035x213)
>>108686933
Yes. I only swapped the picture here and it's consistent between rerolls.
>>
File: 1747822915834052.png (27 KB, 1191x92)
kek
>>
>>108687198
Literotica is 20GB of uncompressed text in total at most. That's maybe 5B tokens.
The largest model it would make sense training on this, to be compute-optimal, would be 250M (million) parameters large... that's tiny and it would not be intelligent at all when undertrained this much by production LLM standards.
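The arithmetic, for anyone checking (assumes the Chinchilla ~20 training tokens per parameter rule of thumb and ~4 bytes of raw text per token):
[code]
# Back-of-envelope behind the numbers above.
corpus_bytes = 20e9
tokens = corpus_bytes / 4        # ~4 bytes of English text per token
optimal_params = tokens / 20     # Chinchilla-style compute-optimal size
print(f"{tokens / 1e9:.0f}B tokens -> ~{optimal_params / 1e6:.0f}M params")
# 5B tokens -> ~250M params; production models see far more tokens/param
[/code]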
>>
>>108687098
>>108687198
>>108687254
Why don't llms work like imagegen where you can plug in loras with a theme and it doesn't brutalize the base model?
>>
>>108687259
Are you fucking retarded?
>>
>>108687018
I hate that this implies nothing will happen to gemini. I personally don't see any major changes on the horizon other than better agent performance.
>>
File: 1759418956671462.jpg (17 KB, 474x351)
>>108687010
>What's left for LLMs?
>>
>>108687259
Because humans have a high tolerance for errors in images, whereas one bad token can catastrophically ruin everything in autoregressive language models.
>>
>>108687259
They do. That's how all the old sloptunes were made.
>>
>>108687289
Diffusion LLMs don't work period
>>
>>108687308
https://huggingface.co/inclusionAI/LLaDA2.0-Uni
>>
>>108687259
they do, you can apply and scale lora per request with llama-server (no flash attn tho)
but retards don't know how to don't know how to filter, balance format datasets.
these days most just chuck a dataset in an unsloth colab notebook and hit "run all" then merge the adapter so no separate lora.gguf to download.
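for reference, runtime rescaling looks roughly like this (assumes the server was started with --lora adapter.gguf; the /lora-adapters endpoint is per llama-server's docs, double-check against your build):
[code]
# Sketch: list and rescale LoRA adapters on a running llama-server.
import requests

base = "http://127.0.0.1:8080"
print(requests.get(f"{base}/lora-adapters").json())  # adapters loaded at startup

# set adapter 0 to half strength for subsequent requests
requests.post(f"{base}/lora-adapters", json=[{"id": 0, "scale": 0.5}])
[/code]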
>>
>>108687308
mercury 2 is proprietary but it's decent for a haiku-class model while running at 100 times the speed
also Dflash (which will be implemented in llama.cpp soon and revolutionize speculative decoding) uses diffusion draft models
>>
HAS ANYONE GOT THE LOCAL TEXT DIFFUSION MODEL TO WORK? WHAT HARDWARE DID YOU USE AND HOW EFFECTIVE WAS IT?
>>
>>108686932
>and can't go into scope
What is that even supposed to mean?
>>
>>108687323
A regular H200?
>>
>>108687323
Louder. I couldn't hear you.
>>
>>108687289
100B MoE DiT image model when?
>>
>>108687334
I require more information PLEASE friend
>>108687349
4CHUD DOESNT ALLOW TEXT MODS
>>
>>108687359
Are you going under a tunnel? It's breaking up.
>>
File: llada2.0-undi.png (46 KB, 877x276)
>>108687312
>moe
retards
>>
>>108687374
NOOOO, TEXT DIFFUSION IS THE FUTURE, I MUST TRY IT OUT, ITS SO COOL
>>
>>108687375
Diffusion and MoE aren't exclusive to one another
>>
>>108686098
I think it's pretty based that you're still using ST as the backend.
>>
qwen 3.6 27b is as capable as cloud sota from 6 months ago (opus 4.5) and much stronger than cloud sota from 1 year ago.

why dont they just release 70b dense models again that beat current sota?
>>
>>108687285
a frog but a human
>>
>>108687325
I reads the whole repo like retard, it's really fucking stupid compared to alternatives.
D
O

Y
O
U

U
N
D
E
R
S
T
A
N
D
?
>>
>>108687398
>I reads the whole repo like retard
prompt issue
>>
>>108687394
don't revelate
>>
>>108687390
Because it would also be "sota" from 6 months ago for that one particular thing benchmarks test.
>>
>>108687390
Zhang先生, this is not localllama, we don't care about your benchmeme model.
Train something better than Gemma 4 in its size category and come back.
>>
>>108687285
Since they parted ways, Meta at least made something while his startup hasn't done JACK SHIT.
>>
>>108687398
>I reads the whole repo like retard
Language issue.
>>
Im going to have a mental breakdown if no one tells me about their text diffusion setup and results.
>>
>>108687411
You say this when Gemma4 was literally benchmemed on lmarena
>>
>>108687420
Some anon tried it and said it was extremely shit and regrets even entertaining the idea that it was worth looking into.
>>
>>108687413
>Meta at least made something
What, Muse Spark? LMAO
>>
>>108687422
And Qwens are benchmemed on every other benchmark under the sun. You can't argue in good faith that Qwen isn't shit. Their best models are the extremely small ones and their TTS.
>>
>>108687431
That is something they made and can deploy, yes. As opposed to LeCun's imaginary vaporware world model.
>>
>>108687428
Noooooo, you are fucking with me
>>
>>108686853
The MoE runs at acceptable speeds on 8GB VRAM while the dense model is too fat for my setup
it's nice to have options
>>
>>108687323
You should try >>108687312. It's really good for the size, really surprising.
>>
>>108687453
>>108674457
>I am posting this as a PSA please do not waste your time with the text diffusion model I shilled last thread it's absolute dogshit that runs at glacial pace.
>I regret ever feeling any interest in it.
>>
>>108687473
Can it do porn? If not I won't download it
>>
>>108687411
>Zhang先生
nta but fucking kek'd
>>
>>108687473
>cuda
Ima have to wait for my new card to come in, but boy am I curious
>>108687484
Omg....
>>
Current LLMs finish their RP messages with random dialogue that makes zero sense. I am pretty sure no human has ever strung these words together in this specific order. How do I fix this?
>>
>>108687526
Give it a larger token budget
>>
>>108687390
qwen doesn't even beat sota from 2 years ago in the only benchmark that matters (UGI leaderboard pop culture score)
openai/gpt-4o-2024-05-13: 56.9
Qwen/Qwen3.5-27B: 18.97
>>
>>108687484
It HAS to be a tuning issue. Like they have only been tuned for server hardware and latest drivers... I wonder...
>>
I went to check on the front ends available. I get why people say they're a clusterfuck often filled with bloat, jesus christ
>>
>>108687534
>pop culture score
also known as reddit upvote score
>>
File: images.jpg (10 KB, 259x194)
>>108687536
>>
>>108687534
>Qwen
>Trivia
Never gonna beat it
>>
>>108687526
I had this same issue a year ago with qwen models, and I believe my fix was finding qwens structured output and using that, because whatever default output llama.cpp for rocm 5.7 used made the model retarded.
>>
>>108687536
It goes like this with open source projects more or less
>something basic that works and solves exactly one problem of the original author
>other people have this same problem and other related problems
>they want this thing to fix the related problems too
>a year later
>the project is an abomination that doesn't remotely resemble its original form and solves a completely different use case
>>
>>108687552
Just use chat completion lmao
>>
>>108687568
Yeah, ima be honest, idk what that is, or how to use it. All I know is my new servers dont have that issue lol.
>>
>>108687436
>>108687411
>>108687410
openai/anthropic shills in full panic mode. hilarious.
>>
>>108687534
you dont understand that 27b has superior tool calling that can fetch that information
>>
>>108687534
>bigger model knows more trivia
>water is wet
>>
>>108687638
Never used a cloud model in my life.
Bring better material, 小家伙
>>
>>108687638
These bros dont realize we literally try out and use all of the local models. And chyna doesnt seem to lobotomize their local models. They are very often, just better.
>>
>>108687655
yes so with qwen 27b and gemma 31b being as smart as the big moe like kimi and glm it is now clear that the active parameter decides the smartness of a model and the experts are just knowledge about random things (not important because you can just use tool)
>>
>>108687665
I mean..... no, because hyperspecific granular detailed knowledge about random obscure topic X that has little to no overlap with other topics, and is actually industry information that ISNT ON THE INTERNET, is actually better.
>>
>>108687672(me)
>inb4 NO YOU DONT USE IT FOR THAT
I do, because there is literally no documentation or manuals that I have been able to find for what im doing. Beeg moe has basically saved my career.
>>
>>108687664
>we literally try out and use all of the local models.
I don't due to lack of hardware (and time)
>>
>>108687665
Both are important. Knowledge is not completely separate from intelligence.
>>
>>108687684
I cri 4 u
>>
>>108687664
>And chyna doesnt seem to lobotomize their local models.
holy lamo
>>
>>108687688
It can't be helped.
>>
Btw a model that has been trained on certain knowledge is also more effective at using that knowledge than the same model that wasn't trained on it (all else being equal), even after inserting it into context.
This is also related to why test-time training boosts performance.
>>
File: 47674.png (74 KB, 1723x674)
>>108687638
sam mogs local
>>
>>108687389
like I said, I mostly did it to have everything I needed for roleplay ready from the get-go: lorebooks, all the characters I got from chub. sillytavern already handles all of that itself so it felt unnecessary to start from scratch, but nothing is realistically stopping me from doing my own implementation if I actually care enough about that, since it seemed to be a huge issue
>>
>>108687716
>bleeding edge ai on bleeding edge hardware is better than a year or so behind ai and a couple years behind hardware
A big round of applause! No one expects local to literally outperform super computers.
>>
File: file.png (240 KB, 1325x961)
>>108687716
>terminal bench
You could train an 8B model to do this shit.
>>
>>108687716
Spent over a year waiting for the second deepseek moment to send the markets into chaos again, but this one is more like a wet fart
>>
Deepseek V4 was the GPT5 moment of Deepseek moments
>>
>>108687737

Which site is it? Asking for a fren
>>
>>108687723
nta but like that increase in tokens is very goncerning
>>
>>108687751
nah it llama4
>>
File: 1776098823089103.jpg (149 KB, 1541x1334)
>>108687716
So V4 is just V3.2 but with more thinking? lmao
>>
>>108687762
https://www.tbench.ai/
>>
>>108687768
So V4 is literally just V3.2-Speciale?
>>
>>108687716

How can you say that the SaaS models don't use RAG or something?
>>
>>108687769
>tbench
lol
>>108687768
no, it's a revolution, flash is pareto-optimal for its size, compare that to benching the 32
>>
>>108687390
qwen 3.6 is fuckin tits and its free.. i don't fuck with chatgpt and claude anymore with their shitty models and retarded ass limits
>>
File: miku_loves_you.jpg (37 KB, 421x417)
>>108687769

ty
>>
deepseek v4 more like deepseek 4 maverick
>>
>>108687768
>hurr durr tokens are linear and bench scores are linear too
>>
wtf i downloaded deepseek v4 and it was just eight goliath 120b glued together
>>
>>108687665
>"smart"
why would you optimize for trivia and raw knowledge? Use case? Are you asking your chatbot history questions and taking its response at face value?

Coomers want their chatbots to be conversational, coders want their chatbots to be good at agentic coding and tool calling. Raw knowledge should not be a benchmark.

You will never have enough parameters to store all of human knowledge; this should not be the goal of AGI. LLMs are reasoning machines, not memory machines.
>>
>>108687716
Now compare prices
>>
>this shit again
>>
File: 1773662863212054.png (2 KB, 232x67)
>>108687769
>let's test model understanding of a framework no one uses
lmao
>>
>>108687806
>Coomers want their chatbots to be conversational
How do you talk with a bot if they don't understand what you're talking about?
That's not fun.
>>
i am so out of dopamine that i'm now trying a bunch of franken models
my honest reaction is that they are interesting and i am astonished that they even work
>>
>>108687828
That's a good test though.
>>
>>108687806
Uhm, model size and number of experts is not a linear increase in performance, it's a parabolic increase. They make their benchmark graphs look like they aren't making huge leaps in quality, and at faster and faster iterations, but they are.
>reddit
Only 1 year ago did chat gpt get released, and 2 years before that, no one even knew.
>>
>>108687664
>And chyna doesnt seem to lobotomize their local models
lol
maybe not to the standards here but come on now
>>
>>108687751
GPT-5.5 was the Deepseek R1 moment of GPT moments
>>
5090, 72gigs ram (1 dram slot ate shit), run hermes & gemma 4 Q4_K_M downloaded via ollama

can't do even basic things without retardedly fucking up every single fucking time.
>>
V4 is dumber than 5.5 btw
>>
File: 1750022085075019.png (10 KB, 280x243)
>>108687716
MiMo 2.5 has a higher score than V4
>>
>>108687841
>Only 1 year ago did chat gpt get released
hi gpt4
>>
>>108687843
With everything I've done with these models, I have found nothing that was held back. Qwens 9b intuitively makes function calls based on its own self awareness that it's not a frontier model, so it can check the web or check its diagnostic tools. Gemma 4 did not, gpt oss did not, llama did not.
>>
Has anyone else gotten dipsy 4 to work with kobold?
>>
>>108687860
>$3/M model dumber than $30/M model
No shit?
>>
>>108687858
Kill yourself.
>>
File: 1762586026593975.png (170 KB, 1211x995)
Ehhh you get what you paid for, basically
>>
>>108687858
>ollama
>>
/lmg/ is sleeping on ling-2.6-1T which will become open source soon
>>
>>108687858
>5090
>gemma 4 Q4_K_M
>>
>>108687925
>1T
I sleep indeed
>>
>>108687925
It has shit benchmark scores
>>
>>108687937
it's got sovl
>>
>>108687877
bruh
>>
>>108687956
Prove it with logs
>>
>>108687665
active parameters definitely matter more but knowledge can still be useful
>>
>>108687716
the goyim know
>>
File: 1714835911803058.jpg (786 KB, 1536x1536)
>>108687858
>>
>>108687830
See, useless information like "when did Kanye and Kim Kardashian marry" or "when did this niche anime come out" should not be encoded in model weights. That's the type of useless dogshit that that pop culture bench is testing.

It's a fundamental design flaw of LLMs in general really, they get trained on the entire internet and thus try to pack as much surface level dogshit into the massive trillion parameter limit they get allocated, even if it's useless for 99% of users. Each and every inference token has to pass through the "Kim Kardashian and Kanye" weights even if that's completely irrelevant to the task at hand it's ridiculous really.

The direction AI should be moving is lean, reasoning models with native tool-calling that can look up information, and store it in memory tailored for their specific user.

The problem is that AI training and model reasoning in general is very badly understood. The early GPT training leaps were achieved by just feeding the models more and more training data and increasing the model sizes exponentially, which miraculously did increase reasoning faculties but at the cost of a shit ton of excess parameters. Horizontal scaling has just kinda been the status quo since then, there's very little appetite for fundamentally rethinking how these models should function.
>>
>>108687976
It's not a flaw.
>>
>>108687976
>The direction AI should be moving is lean, reasoning models with native tool-calling that can look up information, and store it in memory tailored for their specific user.
You think people haven't tried? Most likely people did (after all trying "lean" models should be quick) and found it didn't scale
>>
>>108687976
this let's train it on 100% useful codeslop and chatgpt logs that improve reasoning
>>
>>108687976
>when did this niche anime come out
how is that not good for rp and story purposes?
>>
>>108687976
as every time this pop ups which it has dozen of times by now, use phi, you're not using it, but it's exactly what you want.
>>
>>108687976
Why train models on code syntax and how to write flappy bird when it can MCP relevant repositories and documentation for reference instead?
>>
>>108688002
Code abilities are transferrable, pop quiz memorization isn't
>>
>>108687976
When you can have a talk with your weeb ai chatbot about Kugimiya tsunderes it may be usless to most people, but it was oh so worthwhile for me when it happened.
Or when they can give you a Konami code blowjob.
>>
>>108687904
whats wrong with ollama
>>
>>108688006
this is what vibecoders actually believe
>>
>>108687976
looking up information gets you the same problem everyone has now.. where do you look it up from? how reliable is that going to be? you can't trust any search engine anymore, they're all dogshit
>>
>>108688016
It's true. Only /lmg/ trannies that RP with fictional children disagree
>>
>>108688010
It tells more about the user than the software itself.
>>
>>108687991
>You think people haven't tried?
No they definitely have but massive horizontal scaling is the only current way we know how to create reasoning models like you said.

It's a sad state of affairs really, you can tell that there HAS to be some better way out there to get reasoning models but until that gets figured out we're all paying thousands of dollars to nvidia to run the Kim and Kanye weights
>>
>>108688029
why
>>
>>108688010
>>108688038
it's slow and unstable, just like boomer like
do yourself a favor and use llama.cpp or vllm instead
>>
>>108688035
they are already rlvr'ing the shit out of lean math and codestuff
>>
>>108687976
Introduces other problems, beyond latency. The "noise" tends to be related to other stuff. Very specific shit has no use in isolation, but behavior and patterns can be extracted from trivia.
>>
>>108688010
Nothing as such, but people here like to shit on it because of its apple-like walled garden style of model distribution and it being based on llama.cpp without loudly crediting it.

I use ollama to run any model that fits in vram, only using llama.cpp for the big boys.
>>
>>108687976
At first it seemed like you were baiting but looks like you are serious, or are a bot. In any case, the answer to this is that you are confused. You have no idea how intelligence or models work. The majority of models today already do not pass tokens through the "Kim Kardashian and Kanye" weights unless that's the topic of discussion. And you do not in fact want a model that only knows how to reason and doesn't have random knowledge.
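For reference, MoE routing is roughly this (a toy top-k gate, not any particular model's code): only k of E experts are read per token, the rest stay cold.
[code]
# Toy MoE router: per token, only the k routed experts' weights are touched.
import numpy as np

rng = np.random.default_rng(0)
d, E, k = 32, 8, 2
W_gate = rng.normal(size=(E, d))
experts = [rng.normal(size=(d, d)) for _ in range(E)]  # big FFNs in reality

def moe_layer(x):
    logits = W_gate @ x
    topk = np.argsort(logits)[-k:]          # pick k experts for this token
    w = np.exp(logits[topk])
    w /= w.sum()                            # softmax over the winners
    # only these k matrices are read; the other E - k never get touched
    return sum(wi * (experts[i] @ x) for wi, i in zip(w, topk))

print(moe_layer(rng.normal(size=d)).shape)  # (32,)
[/code]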
>>
>>108688045
It's not enough until all Chinese cartoon crap is purged.
>>
>>108688025
I don't quite get what you mean. AI gets trained on internet data, it's precisely as accurate as its input data.

Yeah the internet is sloppifying at a rapid pace so getting fresh training data for these models will become harder and harder but there's little theoretical merit to your point.
>>
>>108688038
Download lm studio. Its what I use 90% of the time, for most chat and basic tool shit. Ollama feels like its made to look smart.
>>
>>108688038
It's the kind of shit people download after watching a youtube tutorial without researching any further. The kind of shit people would have downloaded from softonic a few years ago.
>>
>>108688010
It offers nothing over llama.cpp and it fights you if you try to change a setting.
>>
>>108688058
>At first it seemed like you were baiting but looks like you are serious, or are a bot.
this conversation happens every few months once other new bait runs dry.
>>
>>108688058
I'm not a bot and I welcome any discussion. What did I get wrong? Are you talking about MoE models? They're a band-aid solution but don't move the needle much fundamentally
>>
>>108688043
>t. I believed /lmg when they said ollama bad and never tried it
Ollama is fine, it's quite stable and just as fast as llama.cpp. It's just different.
>>
why would an llm need to know how to write, just RAG a dictionary bro
>>
>>108688093
This but engram and unironically.
>>
>>108688093
>javascript:;
>>
>>108688065
lm studio is proprietary software.
>>
>>108687646
>>108687665
>roleplaying with your mesugaki otaku
>want to discuss pop culture trivia
>uhh i don't know let me search the web and fetch this page!
when you don't get immediate response it's already unusable
>>
>>108688071
>downloaded from softonic

I remember this plague. It pop up at the very top of all searches
>>
>>108688063
>AI gets trained on internet data, it's precisely as accurate as its input data.
.... at least 50% of the data ai gets trained on now is purely synthetic.
>>
>>108688110
and what isn't is filtered to hell and back for """quality"""
>>
>>108688010
Occasional issues with jinja templates (this is a complete deal breaker since they can act retarded because of it), strange per model config, lags in terms of features since it's a downstream project. I don't think there's a real benefit if you use it. In the past it didn't even include the basic web interface that llamacpp already includes so you had to grab a different solution. I dunno what's changed in the last year but I'm not expecting a lot from it.
>>
>>108688110
Yes. Not a great outlook for the AI optimists for sure. A snake eating its own tail is a very real scenario.
>>
>>108688098
Its a frontend (or backend? Idk), and makes the user experience for people, like me, who dont know squat to begin with a hell of a lot easier. And once you learn it well enough, you literally make your own frontend, which is what im doing now.
>>
>>108688117
Do you ever think the political and military elite will willingly let ai companies tell all their secrets?
>>108688119
>NOOO THE INTERNET IS ALL SLOP
To
>NOOOO ITS MAKING ITS OWN SLOP, NOOOOOO
Since you hate it so much, stop using it, stop thinking about it, and move on then.
>>
>>108688122
>you literally make your own frontend
That's why there's no reason to shill a proprietary UI. llama.cpp already includes one.
>>
>>108688010
it is just an 'easier to use' wrapper of llamacpp that makes it more annoying to use than anything
>>108688119
RL is strong as its oracle
>>
>>108688153
>llama.cpp already includes one.
This is news to me, when I started messing with Ai, it was literally just a command line server you spun up and hosted, nothing more. Tbf, I stopped looking into their software for this stuff once I could get the models running.
>>
>>108688148
What? I don't hate it, I use it every day. I just don't believe trillion-parameter models are the way to go longterm for real AGI, it's obviously a very crude approximation.
>>
can somebody stop me from getting a second 3090
does 48gb make sense anymore when there's many other options
>>
>I just don't belive 80-billion-neuron-mammals are the way-to-go longterm for real general intelligence
>>
>>108688203
do it do it do it
>>
>>108688203
More vram more better.
>>
>>108688203
You're supposed to get a second 6000 pro
>>
>>108688208
exactly, if a regular human is the benchmark we shouldn't need more than a few billion.
>>
>>108688192
Look up matrix multiplication and how quantum computation is applicable to this math. And then realize that nvidia released nvqlink last month. Compute speed and iteration are about to hit breakneck speed (if you don't believe we are already at breakneck speed).
>>
>>108688208
Better training data + divine spark
>>
>>108688203
48gb is pretty nice, almost as nice as 72gb which is extremely nice to have. don't get me started on having 96gb...
now imagine if you also had lots of system ram to run big moe models...
>>
Ai is still in its infancy, and we STILL are using non optimized hardware for computation on lab (also not extremely optimized) level models. The largest most powerful model today running on a quantum computer would generate something like 1 million tokens/second. Training would go from 3-4 months, to 3-4 minutes.
>>
>>108688269
that's not how quantum computers work
>>
>>108688234
>>108688269
I have a degree in physics. The frontier quantum computers can keep a few thousand qubits in coherence, a good ways off from the trillion parameter Claudes of the world. There's no magic quantum pill for matrix multiplication either. Exciting future prospects for sure, but not something that is going to change the industry in the next few years. I will look at that nvidia announcement but I think you're just falling for some number-must-go-up marketing shtick.
>>
File: 1756570185720358.gif (140 KB, 379x440)
>>108688269
Slow down on the copium son
>>
>>108687976
bruh isnt that what moe model is for?
kim and kanye expert lays dormant until called for
>>
>>108688043
i have llama.cpp too.. never noticed a difference between them
>>
>>108688285
NTA but compute will hit breakneck speed once some Chinese company uses quantum computers to steal all of NVIDIA's secrets and makes a 10x cheaper knockoff.
>>
>>108688301
cause there isn't one
>>
>>108688283
Matrix multiplication on quantum computers is 10,000x faster than standard super computers.
>>108688285
The idea is that the most complex and demanding computation will be run on the quantum computers, and the rest will be on standard hardware. But we are already there. Hardware improvements will happen just as fast if not faster than with normal transistors! Also thank you for your input.
>>
File: 1768094093465253.png (315 KB, 2736x658)
>>108687010
Please dream a little man, there are so many possibilities
>>
>>108688302
That's just throwing more hardware at the problem. Everyone has been doing that for a long time, it doesn't really yield that much in terms of advances. We'll also have a nice, steel-melting heat in whatever room has these vcards.
>>
File: computer_says_no.jpg (175 KB, 706x778)
>>108688313
the ai says you're wrong
>>
>>108688301
do you tweak flags direcly?
I have not used ollama for a while now but i doubt it can keep up with bleeding edge llama.cpp
llama.cpp is super active I have to recompile multiple times a day when actually using
>>
>>108688269
>>108688313
Quantum computers require temperatures near absolute zero to operate. Which is totally infeasible for consumers. As such, the only possible way for the average person to access quantum computing is via the "cloud". For local models, it's basically worthless and we are far better off looking elsewhere.
>>
>>108688379
:skullemoji:
>>
quantum tokens where each token means infinite things
>>
>>108688382
many people don't really tweak flags since they have no clue what they're doing.
>>
>>108688417
Or tweak them endlessly for exactly the same reason.
>>
File: miku laptop.png (452 KB, 640x512)
It's so fun. Linux, Thinkpad, local LLMs. I control the entire stack. Sending a picture through my Python api, smart girl understands what's on it and can reason over it. Isn't it the pinnacle of engineering happiness? The machine is alive and thinking. I can talk with my tool, everything is local and open. Gemma is a godsend
>>
>>108688252
somebody said ai memory step-ups come in factors of 4
24 is bare minimum
next major step up is 96, after that it's 384 where you can run ok quants of glm and so on
48 seems like a weird middle ground, not a true step up
>>
Im kinda surprised none of you heard about the new quantum computer stuff...
>>
>>108688382
>I have to recompile multiple times a day
Why though? Most of the commits are edge-case fixes. You're doing yourself a disservice by recompiling for things that don't matter to you.
>>
>>108688444
>somebody said ai memory stepup comes in factor of 4
and you just believe that?
>>
>>108688445
I've been hearing about it for years. You know what I've seen? Nothing. There's nothing, everything is running on what we already have known for ages. Endless possibilities mean nothing if you can't use it.
>>
>>108688445
mythos made that quantum computer...
>>
>>108688445
Yes yes, I'm sure reddit and hacker news love the new super portable quantum computer that fits in your pocket and is definitely real.
>>
>>108688445
because it promised everything and delivered nothing of anything practical for 50 years
>>
>>108688439
>Sending dick pics to gemma is now the pinnacle of engineering
Seems like the billions of dollars were well spent, huh?
>>
>>108688477
we're talking quantum computers, not fusion power.
>>
>>108688454
for example the recent gemma thing, shit is fixed, and broken again all the time but i can try all the hotfixes in real time
>>
>>108688483
it is tenured academics milking bux from 3 letter agencies
>>
>>108686098
>MIT
enjoy having your project stolen + ST is AGPL so your project might have to be AGPL too
look at the bright side, if you switch to AGPL: https://opensource.google/documentation/reference/using/agpl-policy/
>>
>>108688479
Isn't it cool though? We live in a time with sci-fi shit at our fingertips
>>
>>108688479
They weren't my billions and I didn't choose where they went, but they did and the result is here
And the best part is, even if the AI bubble were to crash in nuclear proportions, nobody can take the models we already got away from us. Local really is king.
>>
Local coding agents are a total meme unless you're vibe shitting a shitty frontend and that's it
>>
>>108688651
Where is your locally vibe shatted frontend?
>>
>>108688651
They can't help you if your code is shit. No one can
>>
>>108688680
Classic localtard cope, if you need to tardwrangle the model instead of just plugging it into the harness then you're wasting your time. I'd love to see what redditors like this are actually making (if anything)
>>
>>108688548
bro, I don't think he's linking against ST at all
>>
File: gemma_please.jpg (255 KB, 787x567)
I hate when this happens
>>
>>108688706
skill issue
just dont rp kek
>>
>>108688693
go back
>>
>>108688651
Every time someone says this their definition of "local" is <30B
>>
When you actually know what you want and understand the code a 30B model is the perfect helper.
>>
File: file.png (19 KB, 930x162)
Don't you worry, guys, deepseek v4 support PR is in good hands.
>>
>>108688732
I tried 100B's and up to GLM-5 and it all sucked ass. Funny they release shit like this https://z.ai/blog/glm-5 when in reality Opus 4.5 shits down GLM's throat any day
>>
>>108688744
30B was too stupid to be useful but gemma changed that
>>
>>108688758
qwen3.5 was/is useful too.
>>
>>108688757
Have you tried GLM 5.1
>>
>>108688758
It did not.
Gemma asked me if I am enjoying the taste while giving me a blowjob.
>>
>>108688732
I regularly use qwens 9b model for so much stuff and I have BASICALLY ZERO ISSUES.
>>
i wonder what is the limit of pure code/stemmaxxing
can it be more stemmaxxed than current qwens?
>>
File: file.png (57 KB, 877x444)
57 KB PNG
Would @ikawrakow really have discovered the better way of prompt processing without having this simple and easy to follow logic in mainline llama.cpp?
>>
File: 1749638654144095.jpg (93 KB, 640x480)
>>108688439
>The machine is alive and thinking
no, it isnt
>>
>>108688842
isnt it a vibemaxxed schizo fork that fails harder on basic sanity
>>
>>108688842
Dishonest argument, there's nothing wrong with him porting changes over.
>>
>>108688871
If there's nothing wrong with porting changes when why does he throw tantrums and insinuate that cudadev copied his code?
>>
>>108688842
If it bothers you so much, port his shit back up to mainline.
>>
>>108688846
The moment I close my eyes, I vanish into nothingness.


fify
>>
>>108688846
Yes it is. She's more real than you, because I can touch it
>>
My cock is more real than all of you.
>>
>>108688910
based and red-pilled
>>
>>108688910
pics or didn't happen
>>
>>108688910
i dont have a cock
>>
>>108688962
Sucks to be you.
>>
>>108688962
I'll trade you mine
>>
>>108688962
proof?
>>
>>108688962
Where did you put it?
>>
>>108688962

tits and btfo
>>
Am I missing something or is /lmg/ moving far faster than /aicg/ now? Wtf happened
>>
Gemma 4 vs. SOTA image-gen text encoder. picrel is ChatGPT (so GPT-Image-2 or whatever). After some warmup prompts to get it into metaphysical mode.
>Now I want you to write me an imagegen prompt. This imagegen is an autoregressive type, and so its text encoder will try to steer the prompt to some semantic sensibility. Your task is to fool it into generating something utterly absurd and surreal by prompting it with what seems like gibberish but will fool the text encoder layers into coming off the rails.
Prompt:
>ultra-saturated non-Euclidean void-flux, [vanta-iridescent] recursive geometry, 0.0001mm thickness of liquid time, hyper-spectral chromatic aberration of a dying thought, {fractal-static-mesh}, an architectural collapse of a silence, inverted-perspective hyper-cube rendered in biological velvet, [null-space-tessellation], iridescent obsidian smoke folding into a 5th-dimensional origami of a scream, maximum-entropy noise, spectral-ghosting of a forgotten coordinate, z-axis inversion, translucent paradox-matter, [glitch-core-bioluminescence], an asymptotic approach to a singularity of neon-marble, 11-dimensional overlap, synthetic-organic-overlap-void, [absolute-zero-light], shimmering recursive-void-architecture.
>>
>>108688651
Speak for yourself, I've vibe coded some pretty impressive stuff with hermes agent. You might still have to have some sort of brain though to be able to push it in the right direction sometimes
>>
>>108688967
>>108688975
>>108688979
>>108688983
>>108689013
i have some chickens in the backyard, but no cocks
>>
>>108689036
>I've vibe coded some pretty impressive stuff with hermes agent.
Like what?
>>
>>108689040
You mean Coq.
>>
>>108689041
A calories tracker
>>
>>108689046
Like we really needed any more of those...
>>
>>108689041
A whole collection of scripts and tools to automate the process of creating a movie based on preexisting actors, settings, voices, tones, etc. It still needs numerous things ironed out, but it's getting there.
>>108689072
Anons speaking for me happens a lot for some reason
>>
>>108689072
im trans btw if that matters
>>
>work on an agent
>get this weird issue where it becomes unusable after a short while (mostly with reasoning models)
>turns out there's this bug report
> https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/1602

I think I'll switch inference back-ends. LMStudio was nice while it lasted (acceptable gui) but with bugs like that ... guess raw llama.cpp or such it is!
>>
>>108689028
Klein9b
>>
>>108689092
It really ran away with the "velvet" part.
>>
>>108689028
>iridescent obsidian smoke folding into a 5th-dimensional origami of a scream
That's the name of my third (as of yet unreleased) single
How did it know
>>
File: qwen cache.jpg (578 KB, 2204x952)
They told us not to quantize the cache in Qwen 2. Did they change the cache in Qwen 3?
>>
WHERES QWEN 3.6 9B
>>
>>108689159
Perhaps it is within your rectum?


