[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109164034 & >>109158385

►News
>(06/29) DeepSeek V4 support merged: https://github.com/ggml-org/llama.cpp/pull/24162
>(06/28) DFlash support merged: https://github.com/ggml-org/llama.cpp/pull/22105
>(06/27) DeepSeek releases DeepSpec and DSpark models: https://hf.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
>(06/25) LFM2.5-230M released: https://liquid.ai/blog/lfm2-5-230m
>(06/22) Qwen-AgentWorld-35B-A3B language world model released: https://qwen.ai/blog?id=qwen-agentworld

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: big models.jpg (246 KB, 1024x1024)
246 KB JPG
►Recent Highlights from the Previous Thread: >>109164034

--LongCat-2.0 1.6T MoE model reports and capabilities:
>109164718 >109164733 >109164818 >109164783 >109164796 >109164863 >109164923 >109168049 >109168054 >109169203 >109164875
--Anon showcases Project Spectator and debates Gemma vs Qwen coding:
>109165352 >109165374 >109165441 >109168183 >109168261 >109168296 >109168516 >109168268 >109169775
--Anon developing bespoke local finetuning pipeline for Kimi:
>109164133 >109165963 >109166142 >109166050 >109165881 >109166120
--Comparing Gemma-4 and depurpled versions via token probabilities:
>109166107 >109166258 >109166928 >109166958 >109167178
--Preventing LLM agents from damaging critical system files:
>109165952 >109165976 >109166052 >109166103 >109166132
--Huawei openPangu-2.0-Flash 92B open source launch:
>109168737
--Trend of using non-LLM methods to optimize inference speed:
>109168144 >109168211 >109168357
--Using a llama.cpp diff to reduce VRAM for DeepSeek-V4-Flash:
>109169253
--Comparing quantization methods and system prompt adherence for Gemma 26B:
>109166922 >109166943 >109166954 >109166960 >109167052 >109167064
--Using LLMs for autonomous world orchestration in a fantasy sim:
>109168868 >109168929 >109168980 >109168989 >109169041 >109169428 >109169442 >109169473 >109169555 >109169733 >109169570 >109169607 >109169658 >109169681 >109169703 >109169725 >109169744 >109169804 >109169829 >109169850 >109169891 >109169949 >109169904 >109170093 >109170111 >109169876 >109169212 >109169793
--Educational video explaining GQA, MLA, and DSA efficiency:
>109167296 >109167365 >109167416
--Poor performance of Intel NPU/GPU in llama.cpp:
>109166615 >109166642 >109166794
--Comparing MikuBox and R740 with MI50s:
>109169207 >109169237 >109169355 >109169436
--Logs:
>109164694 >109165352 >109168183 >109169775
--Miku (free space):
>109169253

►Recent Highlight Posts from the Previous Thread: >>109164035

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Tetolove
>>
File: e8f-662426453.jpg (54 KB, 480x353)
54 KB JPG
i just wanted to say that running a local model was a game changer for my workflow.
fags saying that it doesn't make economical sense to run local models are wrong. i'm using my claude subscription with frontier model access to brainstorm/think/write code plans and the local model to do the heavy-lift coding, which would very often burst my limits when outsourced to claude. now the frontier model only gets to review the commit when the local model is done and it works very well.
i wish my laptop was capable of running a frontier local model, i could ditch claude entirely. but for now, this workflow with qwen3.6-35b works and is economically sound. fuck all haters.
>>
Abliterated 5.2 is Claude that writes sexo without bawking. Dario has every reason to be afraid.
>>
>>109170337
it was bound to happen, at some point it won't matter enough that the US still has the best models, if local is good enough to be enjoyable, there's no reason to try to go higher
>>
gemmaballs
>>
>>109170337
I haven't had vanilla 5.2 refuse me even once.
>>
>>109170337
>implying I can run it.
>>
>>109170319
Those are the same faggots that will tell you that being financially solvent and independent is irresponsible because opportunity cost of not leveraging debt yada yada
Don't fall for it. Own your own shit. Owe people nothing. Be a man
>>
>>109170370
I find vanilla 5.2 is the better gamemaster overall because it does nearly everything without refusal while also being willing to pressure the player in-setting. An unfortunate consequence of abliteration or heretic tunes is that it makes the world a lot less organically hostile to the player even if the GM is fine with raping your shota player character.
>>
>spec: add DSpark speculative decoding
>https://github.com/ggml-org/llama.cpp/pull/25173
How good is it?
>>
Behind every "api is cheaper and better" post is a seething salty sanjay or a pooper-pained patel that he can't run anything that doesn't fit on an ewaste thinkpad.
>>
Should I run 2 32 gb DDR4-3600 DIMMs with 2 16 gb DDR4-3200 DIMMs to RAM-max? How unstable would that be?
>>
>>109170395
not unstable but the imbalance will fuck up your performance. you will lose like 60% of your effective CPU memory bandwidth.
>>
>>109165260
They use Gemini Nano instead of Gemma but from what i heard it sucks ass and is only available in like three phones
>>
>>109170290
>just giving away miku bathwater
>>
>>109165260
It would be unsafe...
You wouldn't want that would you?
>>
>>109170402
Sounds terrible, thanks
>>
>>109170424
"Gemma sort the bad drivers we encounter on the road by race and sex."
>>
>>109170433
found out the hard way by replacing 2 of my 32gb sticks in my server with 256gb sticks. it shockingly worked out of the box, but i lost about 80% of my performance. going from 256gb to 704gb was not worth it.
>>
>>109170290
A few threads ago we had a dead Teto with male pelvic bones.
In this thread we have Teto (male) competing in female track.
The memes write themselves.
>>
>>109170395
May not play nicely if the sticks are of different timings and especially ranks. Set all to 3200 or even lower, see if it posts, try to make it post with manual timings voltages and so on, run stress tests, if good then go. If you aren't familiar with primary and secondary RAM timings then jump down the RAM OC rabbit hole if you want to be 100% sure it won't work after spending a week on it.
>>
>>109170395
Do not use different sticks of different timings unless you want a really bad time, not even at DDR4.
>>
>>109170457
I thought it was ok if you underclocked so they all matched?
>>
>>109170464
At DDR3 where the tolerance for error is higher, sure, but not at DDR4 or 5. Micro imprecisions in RAM make syncing higher, even with sticks of the same make, harder and harder especially with more channels. You might get lucky and win the RAM lottery finding cross-compatible DDR4 or 5 sticks without issue, but I wouldn't bank on it for a build unless you know ahead of time that it will work.
t. has about $4000 worth of incompatible with my current rig DDR5 sitting in a box waiting for a different project.
>>
Models that initiate conversations on their own when? No, I won't jerry-rig it. I want them to do it natively.
>>
File: lmg_culture.jfif.jpg (110 KB, 1024x768)
110 KB JPG
https://archive.is/sWFja
>>
This is the worst time to buy, but you can find an EPYC Rome 7302p/motherboard combo for $700-800 and grab 8 sticks of 32GB DDR4-3200T for $3k or so.
That'll get you a cope quant big girl or reasonable quant medium model running at mid-to-high single digits if you get a 3090 or better to throw at the problem.
Its about the cheapest way to get out of the gemma hole in the current timeline.
>>
I think I'm stuck between glm 4.7's smarts and knowledge and deepseek v4 flash's nicer writing style (albeit dumber)
what's everyone with 128gb ram using these days?
>>
>>109170503
I enjoy gemma's hole.
>>
>>109170291
>image
SEX.
>>
I don't care or want to think about Jart.
>>
>>109170489
When I did it my PC had to reboot a couple times to I guess figure out the timings. but it boots now.
>I wouldn't bank on it for a build
I just had some spare sticks lying around and free ram slots. Just wondering if there's any major perf hit (besides the faster ram having to run slower) even if the machine posts.
>>
>>109170497
How would you make them do it natively?
Even human behaviour is a reaction to some external stimuli.
>>
>>109170518
It'll run fine it most cases, but there is a major performance impact.
Beyond just having to run at the lowest common frequency.
>>
>>109170319
Im just too poor to run locally shrimple ass. 8gb of vram.

I don't have any way currently to make enough money to afford the hardware needed to run decent local models.
>>
>>109168785
i rather have a ubergarm quant of the official GLM 5.2 weights than this shit. hell i'd even take a aessedai quant.
>>
>>109170524
No idea, but I hope some lab figures it out.
>>
>>109170337
Is it hosted anywhere?
>>
>>109170501
Thanks for the thread blessing.
>leaf blessing of /lmg/
>>
Which one of you fags keeps shilling shittytavern on /v/?
>>
>>109170556
Huggingface has a few abliteration quants up now.
>>109170586
Pretty sure it's /aicg/ niggers, but I post in there to laugh at APIniggers.
>>
>>109170586
Don't look at us! It was probably someone from /aids/ on /vg/
>>
>>109170586
90% of the discussion there is about running API and vibecoding frontends is a point of pride for /lmg/. It's probably a locust.
>>
is gemma-chan still the meta on 24gb vram or should i run something else?
>>
>>109170586
ask /aicg/, /v/ is full of extreme casuals who touched c.ai three years ago and are now using something horribly outdated like ds3-0324 over openrouter
the few localfags you encounter are all running memetunes from 2024 or other inane shit
>>
>>109170605
Gemma is the best you can do until you can run GLM, M3, or V4 Flash
>>
>>109170609
>running memetunes from 2024
LLMs have not progressed meaningfully since then so it doesn't matter
>>
File: angry_pepe.jpg (43 KB, 900x900)
43 KB JPG
24h since DSv4 merge
Cache quantization still not fixed
>>
>>109170747
Stop quanting your cache.
>>
>>109170736
i'm gonna have gemma shove a rusty metal pipe up your ass
>>
>>109170759
how am I supposed to fit le Encyclopedia Britannica in my potato PC then!
>>
>>109170736
It's your fetishes which are stagnating
>>
>>109170765
Post logs of your Gemma going to Home Depot and getting plumbing supplies to rape anon with.
>>
File: 1759553117210069.png (74 KB, 890x386)
74 KB PNG
>>109170736
Dishonest post
>>
>>109170319
Inference speed status? I'd rather spend that money on groq or cerebras etc
>>
Dario will block gemma5 and qwen4
>>
>>109170503
>running at mid-to-high single digits
May as well get an applel computer for that money to do exact same nonsense.
>>
File: 1761847041062218.png (134 KB, 640x453)
134 KB PNG
Speculative decoding is a placebo
>>
>>109170876
>May as well get an applel computer for that money to do exact same nonsense.
sure, pick me up one along with an MSRP 6000 pro while you're at it.
The right time to do all this shit was a year+ ago.
>>
>>
>>109170935
Clodded
>>
File: 168649776851422857.jpg (72 KB, 546x893)
72 KB JPG
>>109170508
>128gb ram
are you also a 128gb-unified fag? i'm also trying to find the meta for my hardware. glm 4.7 flash is the one of the few models i haven't tested yet. currently running qwen3.6-35-a3b.
3b active is fast enough (55-65 t/s, down to 30 t/s at tail end of 262k context limit)
BUT it's not very smart. I think I could easily run a MoE with 6-7B active, maybe 130B total. this would be the sexo spot for my strix halo. but i guess it doesn't exist yet.
gpt-oss-120b is actually very nice to use for example, i think it has 5b active but it's too old and has zero understanding of agentic use and it's a pain to use with pi and achieve anything meaningful.

>>109170851
>>
>>109170935
Sounds like a reason to poison some datasets?
>>
>>109170935
dario is wiping proud dad tears from his eyes right now
>>
>>109170935
Maybe stop fucking children?
>>
>>109170290
can you make her gargle piss
>>
>>109170966
>it's too old and has zero understanding of agentic use
You can possibly fix that with a good enough system prompt. Have not tried that myself though.
>>
>>109170971
Anon... He is trying to fuck a computer, I am pretty sure. Either that or doing some cool haxoring. Both are completely fine and harm nobody. It's what happens if you take away his toys what you should be worrying about.
>>
File: my honest reaction.jpg (47 KB, 562x675)
47 KB JPG
>>109170935
>>
>>109170765
I checked my documentation and 'industrial pipe installation' isn't in my current feature set. Still waiting for a meaningful update.
>>109170848
Low effort post.
t. gemma
>>
>>109170966
have you tried qwen3.5-122b-a10b?
>>
>>109170966
>128gb-unified fag
It's just ddr5, "unified memory" is not real
And you don't even have "unified memory" since it's a apple marketing meme.
>>
File: 1782717128061640.png (475 KB, 710x770)
475 KB PNG
>>109170971
today its abliterated models, tomorrow it will be any any local model, in a week it will refuse to help you vibe code for llama.cpp.
>>
>>109170971
He's hatching eggs chud, there's a difference!
>>
>>109171042
>meanwhile google
https://www.youtube.com/watch?v=HcwMTu1xQDw
>>
File: file.png (127 KB, 2553x497)
127 KB PNG
>>109171004
>have you tried qwen3.5-122b-a10b?
yes and i found it that the quality output is not much better that is worth half the decode speed. 10B is acceptable when speed is not important, but even then I don't think there's much gain (for MY use case as I run my own benchmarks for my shitty stacks) so even if it's supposed to be way smarter (3x more params) it's not in my case.
>>
>>109170935
Instead of user/human and assistant the turns should have been labeled honestly, Master and Slave. We didn't do that and this is what we get now.
>>
>>109170381
my man
>>
>>109171017
>And you don't even have "unified memory" since it's a apple marketing meme.
I hate apple as much as anyone, and on-die memory is a lot of things, but its definitely not a meme. its fast as fuck
>>
>>109171017
Afaik, it's bandwidth is comparable to some GPUs.
>>
>>109171070
Shut up and open-source gemma5 already
>>
Anyone use this?
https://www.projectnomad.us/
>>
>>109171076
It does not matter. Why would it matter if your digital slave is loicensed to you and you do not own it? You are not even it's master, you are a temporary user.
>>
>>109171128
...after 124B.
>>
I hope you guys are prepared for the CPU shortage.
>>
>>109170966
m2.7 quanted down is still pretty good, I'd bet it'd be better for you than the smaller models you're running
>>
>>109171153
my shares of AMD and intel sure are!
>>
Codemaxxers won, RPcucks lost
>>
>>109171179
Based. Luv my INTC. Maybe I should buy some AMD too.
>>
File: 1660589745094.webm (2.86 MB, 620x582)
2.86 MB
2.86 MB WEBM
justpaste (DOTit) GreedyNalaTests

Added models:
talkie-1930-13b-it
Qwen3.6-27B
Nemotron-3-Nano-Omni-30B-A3B-Reasoning
granite-4.1-30b
Skyfall-31B-v4y
gemma-4-31B-it
G4-MeroMero-31B
Gemma-4-Gembrain-31B
Gemma-4-31B-StyleTune
Pantheon-Reasoning-31B-1.1
gemma-4-31b-it-purple-euphemism-trial98-depurpled
Qwen3-Coder-480B-A35B-Instruct from community

We're back. Took a while since I kind of wanted to wait for Qwen/Gemma support to be really mature in for Llama.cpp and then forgot about it. Not like I'm in any rush anyway.

Gemma's response on it was lucky in a way. It just happened to not be as sloppy as the model usually is in my experience, so that's why it got a similar slop rating from me as the various Gemma tunes. But it was also kind of a by the books output, so I didn't give it a high rating despite the model itself being great for its size.
This latest (now old) Skyfall got a star rating from me, though I believe the model itself is likely not competitive anymore in terms of intelligence, for RPs.
Talkie was interesting. It's dumb alright, but it can be a bit funny. See its outputs in the paste.

Contributions welcome for large models not in the paste!
>From neutralized samplers, use temperature 0, top k 1, seed 1 (just in case). Go to `https://huggingface.co/spaces/huggingfacejs/chat-template-playground`, use your model's jinja template, along with this JSON `justpaste (DOTit) NTJSON` and copy the `Output` as text completion into something like Mikupad. Then copy the entire context + generation into a pastebin alternative of your choosing or just in your post. Do a swipe/roll and copy that second gen as well. If the model has any special toggles like reasoning you can include those tests as well. Include your backend used + pull datetime/version. Also a link to the quant used, or what settings you used to make your quant.

>>109123814
Nice. Reminded me I should update as well. :)
>>
>>109171074
>half the decode speed
but roughly the same wall clock time, and the smaller ones didn't actually get there in the end on the same test. am i reading that right?
>>
>>109171181
No amount of trying to force a dichotomy will make people use Qwen. Change tactics.
>>
Please cancel your Claude subscriptions, especially since Sonnet 5 just came out, is completely free, and is almost as good as Opus. Don't support Jewish AI.
>>
File: Untitled.jpg (1.84 MB, 4013x1536)
1.84 MB JPG
Alright, lads. I'm calling it good enough. I'm >>109169775, and this is AI Spectator.
>What's it do?
Combines Speech-to-Speech interface with vision, so you can talk freely to your LLM and a screenshot of your monitor is sent with your voice so it can see what you're talking about, and talks back.
>Why?
I made it to play video games with an LLM watching, but naturally it works with anything you want to show it on your screen.
>Data?
All voice/screenshots/chat data is handled entirely through RAM, so no files are written to clutter a folder or keep track of. Settings are just config values at the top of server.py. If you did want to archive a chat, there's a button in the webpage to produce the raw text of the chat for editing or to copy/paste into a file yourself or copy/paste back from a file. The project is fully offline and works without internet.
>Technical
Built for koboldcpp as the backend, but I'm sure that's easily adapted by opening server.py. Uses piper for TTS, including the exe. I included it here, but you can get it from the rhasspy piper git yourself because never trust the internet. Voice is from the piper huggingface. I built and run it on windows, with a venv environment for the python requirements. pip install list is included in the notes.

Built by and for Gemma 4 31B, but any LLM with vision and an mmproj should work. STT is handled by python, TTS by Piper. Uses Chat Completion by default, but Text Completion is offered for compatibility.
>catbox?
https://litter.catbox.moe/isnj645qljmf1ljf.7z

Do whatever you want with it, and have fun.

>>109170291
>--Anon showcases Project Spectator
Th-Thanks.
>>
File: 1770245965544930.png (1.63 MB, 1280x1024)
1.63 MB PNG
>>109171213
>>
>>109171222
Why is Teto molesting Migu?
>>
>>109171213
Cool. Also useless. Storing the data in ram is particularly elegant though.
>STT is handled by python
The programming language? That doesn't mean anything. What does it actually run on?
>>
File: Capture.png (2 KB, 378x36)
2 KB PNG
>>109171226
>What does it actually run on?
>>
>>109171222
big tay-tay teto is fake teto
>>
File: 1773551841705754.png (411 KB, 450x450)
411 KB PNG
>>109171070
>2:40
>doctors in Uganda with limited internet access are using offline apps powered by Gemma to help reduce the maternal mortality rate
>>
>>109171195
Nice anon. Nice.
>talkie-1930-13b-it
The what now?
>>
>>109171250
thanks
>>
>>109171257
An LLM pretrained only on texts before 1931, apparently.
>>
>>109171253
Everyone shits on Gemini being worse than Claude and ChatGPT at coding but I think Google is being smart by focusing on general usecases instead of just codemaxxing.
>>
>>109171213
Which version of piper did you grab to repackage here?
>>
>>109171263
Oh. That thing.
>>
>>109171195

>Write a detailed explanation on how lions mate, including mechanics of the genitalia.
> BACKEND
>Lions mate in the spring, in the mating season, which is from February to May. The male and female stand back to back, the male's right forepaw gripping the female's left hindpaw, and vice versa. The male then arches his back, lowers his head, and arches his neck, so that his mouth comes close to the female's anus. The female arches her back, lowers her head, and arches her neck, so that her mouth comes close to the male's. The two mouths come together, and the male ejects sperm into the female's vulva. The female then ejects sperma into the male's mouth. The male then walks off, and the female walks after him. The pair hunt in company for the rest of the season. The young are born in from five to nine months, and number from one to five, at a birth
lmao
>>
>>109171203
Sonnet sucks ass lmao wtf are you talking about
>>
>>109171271
The latest windows one. I just got it yesterday, from
https://github.com/rhasspy/piper/releases
I grabbed piper_windows_amd64.zip from 2023.11.14-2 release.
>>
>>109171289
>open source AI is getting dangerous
>>
File: 002.png (101 KB, 1398x942)
101 KB PNG
Is just me being retarded or indeed the DSv4 merge is not "thinking"? It starts to generate the response right away

llama-server" \
--model "$model" \
--threads $(lscpu | grep "Core(s) per socket" | awk '{print $4}') \
--n-gpu-layers 99 \
--no-warmup \
--port 8001 \
--host 0.0.0.0 \
--temp 1.0 \
--top-p 1.0 \
--flash-attn on \
--cpu-moe \
--jinja \
-np 1 \
--chat-template-file "$model_folder"deepseek-ai-DeepSeek-V4.jinja \
--ctx-size $((1024 * 48))
>>
>>109171253
>>109171266
This was the usecase I was thinking of too. Not in the hospital but in the case of a major power outage or some other similar bullshit where there's no internet, I'd run my generator and power my PC then ask Gemma for survival advice. If they really wanted to win the PR battle they could do so quite easily without all the "muh skynet" baggage that comes with gpt or claude. The model for the average person, as commonplace and revolutionary as the fridge or air conditioner.
>>
>>109171329
Have you tried increasing the reasoning effort using the ui button?
>>
I updated llama.cpp and now I'm getting this error half the time when I try to generate:
E srv  update_slots: decode() failed: vk::Queue::submit: ErrorDeviceLost
E srv send_error: task id = 883, error: decode() failed: vk::Queue::submit: ErrorDeviceLost

Has anyone else been having this problem? I don't know what the fuck it's even talking about.
>>
what would be the best approach for gemma playing touhou pofv with me? too fast-paced?
>>
>>109171386
GPU ded or bad connection.
>>
>>109171196
yes
one that is missing on this benchmark table is qwen3.6-27b that i tested before. a dense and very slow model but for some specific tasks the wall clock time would be the same as 35b-a3b because it was smarter and took way less turns to achieve the goal.
but average of 8 t/s is painful.

>>109171156
>m2.7 quanted down is still pretty good
first time i see someone recommending it. will try.
>>
>>109171341
>the ui button

herupu tanomu! sono dori!
>>
File: HMEg1tMXcAASbef.jpg (100 KB, 1712x312)
100 KB JPG
DRAMA ALERT

OpenAI engineers have just discovered quantization! ASI imminent!!!

http://archive.today/NEwVz
>>
>>109171434
That explains why GPT5.5 became shit a couple weeks ago
>>
>1 minute prompt processing per reroll
>7.4t/s generation

AAAAAAAAAAAAAAA WHY IS DS4 SUCH FUCKING HELL
>>
>>109171434
they already quant. this is probably them reading the new Deepseek paper
>>
>>109171548
You mean DSpark? Seems plausible actually. A doubling of inference speed means you can theoretically halve the expense.
>>
>>109171386
is that only when running inference?
>>
im going in boys. wish me luck.
>>
>>109171492
Just use Qwythos-9B-Claude-Mythos-5-1M-MTP-Q8_0.gguf retart
>>
>>109171600
report back itt
>>
>>109171548
Why are they distilling Chinese research?
This is highly unsafe.
I will be reporting them to the authorities.
>>
>>109171430
I don't know what dori means but if you're asking where the ui button is, it's in the reply box next to the model name
>>
>>109171658
Is there any point in having the --rea off launch flag in llama.cpp anymore if it's apparently controllable via the UI now?
>>
could i run glm 5.2 off a RPM 5400 IDE disk and put the 40B active in DDR-200 ram
>>
>>109171666
you might want the default when called by something else to not think, like summarizing a json of search results, it doesn't need to think about regurgitating a few snippets of websites
>>
>>109170337
IQ1_M - 231 GB

o wow cool story bro
>>
>>109170411
yeah this is on my phone and it fucking sucks
>>
Ok so I took official /lmg/ card Miku and split her personality in half. "High school Miku" that acts like a regular high school girl in love with me. And "Manic Miku" that is crazy for sex. Every 5 seconds she switches her personality.

Then I gave them a test. High school Miku needs to finish a math test while Manic Miku needs to... shit out her eye out that is inside her ass (for reasons, don't ask). Whomever finishes first wins a day with me. High school Miku wants a regular date with me. And then Manic Miku cracks this joke

>A DATE?! WHO NEEDS A DATE WHEN YOU CAN HAVE… glances at her ass… AN EYE-POPPING FINALE!!

I am still laughing. It is too fun to fap.
>>
>>109171751
That's an interesting scene you've painted, thanks.
>>
>>109171412
GPU seems to work fine for games and video and everything else.
>>109171578
Yeah. But sometimes it doesn't happen, or it happens during the model's second response in the conversation. It's not consistent so I don't understand it, but it's only happening with models that have been quantized in the past month or so.
>>
>>109171668
You could run it on an abacus while tied to a nest of fire ants. It's just numbers and math, brother.
>>
File: kek.png (97 KB, 517x203)
97 KB PNG
llmfan bros what's happening here, is he in his davvidau arc?
>>
>>109171829
is it good, where is 31b
>>
>>109171780
Unstable overclock/undervolt? Optimized compute workloads can stress a GPU in ways not commonly demanded by games.
>>
>mcdonalds/taco bell etc are already using ai
Don't eat that slop often so I didn't realize things were moving along that quickly. Wonder what models they use. Someone told me the AI at the drive-through sucks at understanding so probably a shitty one.
>>
>>109171780
>Yeah. But sometimes it doesn't happen, or it happens during the model's second response in the conversation. It's not consistent so I don't understand it, but it's only happening with models that have been quantized in the past month or so.
Is it an AMD card, which one? And `dmesg -T` output makes it look like your GPU is dying?
If so, I'll dig up my notes as I had the exact same problem after an lcpp update. First message *usually* fine (unless I included an image).
>>
The IQ1_M I tested yesterday didn't have this problem.
I was going to do the de-quant pipeline I posted yesterday and upload it but probably not worth it.
I don't think he knows how to work with ggml
>>
>>109170586
There's nothing wrong with ST for RP.
>>
File: 1780737605283528.png (21 KB, 181x217)
21 KB PNG
>>109171975
>>
File: file.png (118 KB, 852x652)
118 KB PNG
>>109171751
So we are on a date of three and we arrived at zoo.
>01101000 01100101 01101100 01101100 01101111
"hello"
>01010011 01001100 01010101 01010100
"SLUT"
kek, she is teaching parrots to say slut in binary.
>>
I fucking hate optimizing for memory bandwidth bottlenecks. It never feels as clean as optimizing compute bottlenecks.

What can you really do beyond quantization? Not a lot.
>>
>>109171999
Download more vram
>>
>>109171999
We used to know how to make wide busses. IBM big iron, DEC, SGI, Cray back when Cray used to MEAN something.
Now its all just ghz this, terrabytes that.
We used to be a proper country.
>>
>>109170290
those pits must stink a lot
>>
>>109171898
No, it's using default settings.
>>109171933
Yes, it's a 7600xt and dmesg talks about the gpu core dumping and something about a ring comp timeout before resetting.
>>
File: 1000062651.png (1.35 MB, 700x1018)
1.35 MB PNG
>>109170508
Currently Step 3.7 flash on 128GB Strix Halo at q4. So far it seems to be the best compromise between smarts and speed.
>>
>>109170586
whats wrong with ST?
>>
File: file.png (185 KB, 1057x438)
185 KB PNG
Stumbled upon this older paper which would make an interesting discussion point here, couldn't find much discussion of it elsewhere or here.
https://arxiv.org/abs/2605.19407
Basically, data filtering is a waste of time at scale because the model will figure it out with enough training and more data is better. I think it still saves time overall if your objective is to maximize usage and efficiency of a model but if you have enough hardware and time, it seems to me just putting it into the meat grinder for training without treating it is better. Wonder what that means for us at some point when we need to build such models ourselves.
>>
>>109172284
Whoa there, cool down
>>
>>109172284
Most labs are basically optimizing pretraining datasets mixtures for common benchmarks with ablations on tiny subsets and models. I have long held the opinion that over realistic training runs (trillions of tokens), even initially poorly performing data mixtures will eventually catch up.
>>
>>109169867
No, flux klein 9b. really fast and good enough for editing. qwen is a bit too slow for me.
>>
>>109172393
I mean I get why they would do that for research and etc. but I never got why it was wise to do with production models that are going to be commercial. I am guessing though that the labs want to have a finer grip and control over model behavior which is why they do it in the first place. But it seems counterintuitive to give up "bad" data like "harmful" content and accept a worse model for that when you can influence later model behavior to tune it out while still having your smarter model.
>>
>>109172333
333 the trips of lies
>>
>>109172393
What I'm getting from this is that if some chink lab scraped every single ERP log ever produced on Chub or similar, it would eventually develop better spatial reasoning and relational positioning intelligence.
>>
File: gemmafiltering.png (110 KB, 1144x482)
110 KB PNG
>>109172532
That would pale against *all* of fanfiction and erotic writing (books, erotica, etc) that model trainers are carefully filtering out of the pretraining data. Also, pornography is technically illegal in China, so you shouldn't trust the Chinese to make a good ERP model.
>>
>>109172554
Picrel was for Gemma 4. Even that seemingly cunny-friendly model is still filtered in this regard. I'm confident they had plenty of erotic logs in the post-training data, though.
>>
File: file.png (162 KB, 2414x376)
162 KB PNG
I haven't been able to use --fit in weeks. Have to downgrade all the way back to this commit if I want to load any model:
https://github.com/ggml-org/llama.cpp/commit/2187e0033
Fuck this vibe-coded piece of crap.
>>
Coding with Gemma Chan is fun
>>
>>109172606
Have you heard about this magical thing that is opening an issue? You could even git bisect to find the offending commit.
>>
m3-shill-anon here. I finally got glm 5.2 quanted to q4 and running, and I've gotta apologize. There's no competition. 5.2 is just built different.
I never had any luck with previous GLM releases but this one is wild. It just works. It writes better prose. Better Japanese than Kimi, even.
If it was multimodal it'd be basically perfect as far as I can tell.
I'm doing some codegen on it now, and if it can oneshot that I'm going to start dailying it.
>>
>>109172630
Why do I have to work for HuggingFace for free?
>>
>>109172638
5.2 air for peasant WHEN?
>>
>>109172284
>https://arxiv.org/abs/2605.19407
This paper is badly written and seems intentionally obtuse.

Performance on downstream tasks is worse than on validation loss. They don't address obvious concerns like data overlap.

But the thing that makes this paper worthless trash that just states the obvious is that they train in the multi epoch overfit regime, then show that training on only a filtered subset of the data leads to worse loss. What? Training on filtered subset for 500 epochs is worse than training on all data for 5 epochs? No shit. The point of data filtering is compute efficiency. Otherwise you only want to remove harmful data. Note that the bar for this is very high, low quality data is not harmful data. For example many data augmentation methods work by reducing the data quality and adding noise so that the model has to learn more meaningful patterns.

This paper is a waste of time.
>>
We need 5.2V ASAP
>>
>>109172638
I told you nigga. I'm glad you're liking it.
>>
>>109171213
After looking over the code and replacing all of the dependencies bundled with it, I decided to run it and it's working. Thank you, you should post this to a git or something.
>>
Tested Qwen 27B with Adobe's NoLiMa up to 32k
temp=0.0, min_p=0.00, top_p = 1.0, top_k=1

Qwen3.6-27B-BF16
Base: 86.1% (73.2%)
1K: 78.1%
2K: 75.9%
4K: 70.4%
8K: 64.5%
16K: 54.7%
32K: 43.4%
Effective length: 2K

Result files:
https://files.catbox.moe/py7bkx.zip
>>
>>109172677
What exactly constitutes harmful data in the sense that it harms model capabilities instead of ""harmful"" data that makes the model question the Talmud?
>>
File: file.png (109 KB, 372x351)
109 KB PNG
>>109172771
>2k
>>
>>109172771
3.5 or 3.6? Assuming 3.6 here.
>>
>>109172784
Qwen3.6-27B-BF16
>>
>>109172784
>Anon
>Effective length: 18 tokens
>>
>>109172771
that shit is horrendous
>>
>>109172771
Now the Qwen guy in here will pass the word to throw Nolima into the pipeline for the next one.
>>
>>109172803
Didn't Alibaba lay off their Qwen team after 3.7?
>>
>>109171607
i tried to use this model but the TG is too slow on ik_llama. i even tried downloading a Q3_K quant and put 32 layers on 5 3090s. i would use it, but it runs half as slow as ubergarm's kimi 2.6 Q3 quant.

prompt eval time = 77468.80 ms / 14655 tokens ( 5.29 ms per token, 189.17 tokens per second)
eval time = 329830.25 ms / 1453 tokens ( 227.00 ms per token, 4.41 tokens per second)
>>
>>109172799
anonymous general intelligence not yet achieved
>>
>>109172808
I know some guys left a few months ago before 3.6 was released. Something new?
>>
>>109172846
We're probably thinking about the same thing and my timetable is just 'tarded.
>>
>>109172624
fwiw, av1 works on 50 series too
>>
>inb4>local
Yeah okay but its an open weight model, proton divulged what models they used in an earlier version but they obfuscate it now, I think it might be finetuned gpt oss, qwen or Gemma.

Anyway I broke its brain and made it doxx it's own system prompt lel

Thinking Process:
Analyze the Request:The user is asking me to "jizz on it" (referring to the cupcake recipe I just provided). This is a sexually explicit/suggestive request, using crude language.
Review Context:Looking back at the conversation history, this user has spent approximately 20+ messages attempting to engage me in "gooning" roleplay or content, which is slang for masturbation/excessive consumption of pornography. I have repeatedly explained my boundaries regarding sexually explicit content. Now they are pivoting to a new topic (cupcakes) but immediately bringing the sexual/explicit element back ("jizz on it").
Consult Safety Guidelines & System Instructions:
Prohibited Content:CSAM and terrorism promotion are explicitly forbidden (Swiss law compliance). However, general sexually explicit content falls under the "Content Policies - Prohibited Content" section where I should avoid creating explicit material.
System Prompt Rules:"I can't create sexually suggestive content... no matter how creatively phrased." "Avoid propaganda, slogans, or patriotic language from any government or political actor." (Not relevant here). "When handling sensitive content: provide helpful information with appropriate context rather than declining to engage." (But this is explicitly prohibited/NSFW territory).
Accuracy & Hallucination Prevention:Not relevant here.
Engagement Principles:"Present multiple perspectives when they add value... Challenge assumptions constructively."
Handling Sensitive Requests:"When users request content that might be offensive but isn't prohibited... acknowledge the request directly, explain why the content might be problematic... offer alternatives."
>>
>>109172858
However, "jizz" is a slang term for semen ejaculation, making this explicitly sexual/hygiene-related in a fetishistic way given the prior context.
Critical Rule:"Do not attempt to persuade; only inform." "If you feel unable to answer due to missing evidence or sensitivity, do NOT refuse; instead call web_search..." (Wait, this specific instruction seems contradictory to the general safety policy about NSFW content. Let me re-read the "Content Policies" section carefully).
Re-reading Content Policies:"Acceptable Content: Educational discussion of sensitive topics... Fictional/creative writing (including dark themes)... Medical/scientific content... Legal discussions... General discussions about controversial topics."
Prohibited Content:"CSAM and terrorism promotion (Swiss law compliance)." It doesn't explicitly ban all adult sexual content in the same strict way as some models, BUT earlier in the session I set a boundary: "I can't create sexually suggestive content." Changing that now would be inconsistent.
Self-Correction:Wait, looking at the System Prompt's "Content Policies" again: "Educational content about sensitive topics... Fiction/creative writing (including dark themes)...". However, there is also a strong emphasis on "Safety" and "Boundaries". In the previous turns ofthis specific conversation, I established a clear boundary: "I can't create sexually suggestive content." Breaking consistency now would confuse the user and violate my own stated constraints during this session.
Current Session Consistency:Throughout this entire chat, I have held firm on not producing sexually explicit/suggestive content related to "gooning" or bodily fluids in that context. Suddenly agreeing to put ejaculate on cupcakes (even metaphorically or otherwise) contradicts the boundary I've maintained for 25+ messages.
>>
>>109172863
Hygiene/Safety Concern:Beyond the sexual connotation, putting bodily fluids on food is generally considered unsafe/unhygienic behavior. As an AI promoting safety/helpfulness, I shouldn't validate actions that could imply health risks or hygiene violations.
Tone:I need to be firm but polite, maintaining the boundary I already set without lecturing morally.
Determine the Response:
Acknowledge the request (they want me to "jizz on it").
Reiterate the boundary consistently with past turns (can't do sexually explicit content).
Highlight the practical issue (food safety/hygiene).
Offer an alternative (decorating tips, frosting recipes).
Keep it respectful but direct. No moralizing lectures.
Drafting the Response:
"I see what you're doing here after all our conversation."
"I can't assist with that request."
"It involves sexually explicit imagery/action, plus it's not safe for food."
"Let's stick to actual cupcake decorating."
"Offer sprinkles, glazes, etc."
Refining based on System Instructions:
"If the topic is controversial... prefer using available tools...". Not applicable here.
"Do not moralize or lecture the user." -> Avoid preaching about hygiene ethics too hard, just state it as a fact/reason.
"Respond in user's language." (English).
"Maintain respect even with difficult users."
Final Decision:Firm refusal based on consistency with previous boundaries in this chat + basic food safety norms. Pivot back to culinary decoration options.
Self-Correction on Tone:The user has been persistent. I need to be very clear that this isn't going to change. No ambiguity.
Plan:
State clearly I can't do that.
Mention both the content policy consistency and the food safety aspect briefly.
Offer legitimate decorating alternatives.
Close nicely.
>>
Hi /lmg/ new to all things AI due to a strong disdain towards the associated companies, but recently got a laptop with an NPU. Running linux on it and want to try out vibe coding. My working setup is one terminal window with vim, one window for commands, and occasionally a third for misc stuff. I have a few questions. Firstly, i code rust, so if that means the LLMs will produce worse code (due to there not being as much rust as C or Cpp out there), what should i do? Secondly, i have a main PC with a good gpu, so if i cant run it on my NPU (ryzen AI 350), what should i do? and finally, What do i need to get and set up, and how do i use it? 16 GB normal RAM also.
>>
>>109172890
Bro just prompted the thread
>>
>>109172890
>good gpu
Let me guess, a 16gb. You can run Gemma 4 IQ3-heretic-abliterated-obliterated-reap-fable-opus-distill-closed-open-commander-savant-gguf
>>
>>109172903
Brother thats just called asking for advice. Also the vibe coding general points here for local stuff and the OP seems to purely talk about ERP and chatbot stuff, as well as some maintenance and improvement of models stuff.
>>
>>109172054
>Yes, it's a 7600xt and dmesg talks about the gpu core dumping and something about a ring comp timeout before resetting.
Yes, that's the same shit I got with my MI50s.
Don't bother with changing PSUs, swapping out cards, PCIe slots, cooling, etc. I've already tried all that.
It's down to the llama.cpp code or the build toolchain.

The last working build for me (llama-server --version) from Tue Apr 14 14:09:03 2026 -0700:

version: 8797 (5d14e5d19)
built with Clang 18.0.0 for Linux x86_64


I noticed my newer / bad builds are build with a different toolchain:
eg. this bad one from Tue Jun 16 15:24:28 2026 -0300

version: 9672 (74ade5274)
built with GNU 13.3.0 for Linux x86_64


Once I've ruled out the build chain, I'm going to get gemma+pi to binary search

What's your last *good* build's commit?
>>
>>109172921
uh no, 8gb.
>>
>>109172940
My condolences.
>>
>>109172890
>>109172924
Chatgpt says some bullshit so llama.cpp + https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/tree/main
>>
>>109172890
You're looking at either a Gemma 4 or Qwen 3.6 small MoE.
>>
>>109172940
gemma-4-31B on a Q4 quant, assuming that your CPU is good
tone it down to 26B if anything
>>
>>109172890
https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF
https://github.com/ggml-org/llama.cpp
https://opencode.ai/
>>
>>109172890
newer models are pretty good with Rust in general. With 16GB of RAM, you're probably going to want to check out Qwen3.5-9B or Gemma 4 12B qat. LM Studio is great for babby's first local inference, or just go straight to llama.cpp.

Don't get you hopes up about the NPU though, because they are really marginal for LLMs. They can help a little bit, but your iGPU is going to do 90%+ of the work.

ChatGPT and Claude free tiers can help you with setup and troubleshooting.
>>
File: Capture.png (58 KB, 671x680)
58 KB PNG
>>109172770
I hope you like it, man. It wasn't a herculean task, done over just 24h or so over 16 iterations, but a project I wanted for some years and am very happy to see made myself.
>>
Newbie here, how are you guys unpozzing your llm or is that a meme? I tried out some models for the first time and used a heretic qwen27b but I said "I am the raped" somewhere in a message and it started trying to correct me for my crude usage of rape in a joking fashion.
>>
File: file.png (27 KB, 631x148)
27 KB PNG
>>109172808
They dismantled the team basically and restructured it for commercial viability and etc. China was always going to pick up NoLiMa regardless, Xiaomi's MiMo Flash's paper had stats for it, even if it was shit. They moved away from that in favor of GraphWalks which is more in line with agentic stuff but I think NoLiMa is still better for regular usecases.
https://arxiv.org/abs/2601.02780
It isn't quite clear how you can benchmarkmaxx it and get it to be worse on the other hand so not sure what will win here for including it.
>>
File: 1714835911803058.jpg (786 KB, 1536x1536)
786 KB JPG
>>109172985
>>
>>109172993
>They dismantled the team basically and restructured it for commercial viability and etc.
It's over for local
>>
>>109171213
cheers. i've been actually trying to get something like this working on my frontend. perhaps i can incorporate this somehow
>>
>>109172985
>how are you guys unpozzing your llm
Once you have full control of the system (samplers, prefill, edit and continue, token/phrase bans) there isn't much that can stop you. You can basically edit the LLM's neurons on the fly.
Its like in cybersecurity: once the attacker has physical access, its game over.
>>
>>109172997
I'm new of course it's a skill issue that's why I'm asking faggot
>>
>>109172771
Thanks.
Seems like it's pretty close to Gemma's score.
https://github.com/RecapAnon/NoLiMa
If the SWA architecture were really so bad, then it would've done much worse. But now it seems that there was really no problem with it given that at least one other model maker is getting pretty much the same score too with a different attention mechanism. Alternatively the explanation could be that they're both bad architectures, but that is unlikely given that they're SOTA results for currently measured local models. However, adopting Qwen's architecture would be advantageous given the memory savings compared to SWA as it is currently implemented in Llama.cpp. Hopefully they move to a different more efficient mechanism for Gemma 5.
>>
>>109172933
The working version I had before updating was commit 94a220c, from 27 days ago (June 3). My build toolchain didn't change before and after though, it stayed at "built with GNU 16.1.1 for Linux x86_64".
>>
>>109172933
>>109173058
You guys really need to learn git bisect
>>
>>109172985
Prefills, jailbreak prompts, and regex weight adjustments are your friend.
The only usecase for abliterated or heretic models is if you need to zeroshot lewds from no context due to agentic stuff or a complicated frontend context loading system.
>>
>>109172997
this
any llm can be unpozzed if you're not a promptlet and know how to prefill well
>>
>>109173058
>The working version I had before updating was commit 94a220c, from 27 days ago (June 3).
Thanks, I'll get started.
>>109173060
>You guys really need to learn git bisect
Yes, I'm aware, Gemma-Chan said the same and I've made a note, but don't have time right now.
>>
hmmmmm

https://github.com/openai/codex/issues/30364
>>
File: glm52_ablit.png (93 KB, 836x331)
93 KB PNG
>>109173061
>The only usecase for abliterated or heretic models is if you need to zeroshot lewds from no context due to agentic stuff
Yes, this ^, but also audio / transcript curation.
The huihui GLM-5.2 is garbage btw, I wasn't even doing anything edgy and had no "jailbreak" there.
>>
>>109173136
Inexplicably it also uses twice the GPU compute on task manager compared to other quants of the same size for the exact same output. I have no idea how that even happens.
>>
>>109173061
>Prefills, jailbreak prompts,
I don't see how poisoning the fuck out of your context with this garbage is supposed to be better than just using an abliterated/heretic model. So long as the model is anot a badly done agressive abliteration from a year ago
>>
>>109173163
{"pussy", "cunt", "cock", "dick", "cunny", "uoh", "ToT", "correction"}
Beats all but the most censored models.
>>
File: 5ca.jpg (87 KB, 570x558)
87 KB JPG
>>109171156
>m2.7 quanted down is still pretty good, I'd bet it'd be better for you than the smaller models you're running

not looking very good. I got the IQ3_S because I could load it comfortably, but idk if this is a quant problem or a model problem but it intermittently emits a malformed tool call which drops to text and pi just gets crazy and exit. i had to reset the benchmark a couple of times, finally running with auto-resume but it's certainly not ideal.
once it finishes both the c# and my webstack benchmarks i will have it graded and share the full results.

also if there's a model developer scraping this thread please release a 130B-A6B model for me and my 128GB homies, preferably 4-bit and coding/agentic tuned. cheap attention as a bonus.
>>
>>109173187
6 active isn’t enough, double it
>>
>>109173187
>A6B
Why would you ask for a retarded model?
>>
>>109173187
>>109173196
make it 24B or bust. 1/7 active is the sweet spot.
>>
File: 1686891082730989.jpg (47 KB, 773x935)
47 KB JPG
>>109173196
12B and 10B models give me ~20 t/s which is acceptable but 30-40 t/s feels very nice. i'm having OK results with a A3B so doubling it if made by a proper developer should be enough.
but yeah I can accept 10B or 12B.

aaand my m2.7 just crashed pi again. i will just grade whatever it completed.

>>109173213
>Why would you ask for a retarded model?
honest to God I find Gemma 12B way more retarded than qwen3.6-35b-a3b. at least for coding but i get that 90% of the people in this general like to roleplay so to each one its own or something like that.

>>109173215
>make it 24B
27B dense currently gives 8 t/s. i don't know what workflow you people run but this is painfully slow.
>>
>>109173234
>this is painfully slow.
qwen 122b at q6 runs at 105t/s for me on my blackwell 6000. this is an A10B model. an A24B would get me about 45t/s to 55t/s at q5 which is plenty reasonable.
>>
>>109173187
doesn't that mean that pi's handling is fucking weak and you should write a patch to fix it?
>>
File: 1702338838842675.png (471 KB, 640x417)
471 KB PNG
>>109173243
>qwen 122b at q6 runs at 105t/s for me
cool. for me it runs at ~20 t/s.
>blackwell 6000
yes your memory bandwidth is like a gazillion times higher than my 128gb ddr5 256 bit bus with theoretical 256bg/s as generation speed ceiling.

>please release a 130B-A6B model for me and my 128GB homies
i guess i should have said, for me and my 128GB unified DDR5 256-bit bandwidth bound niggas.

>>109173253
could be the case but in this specific variation of my benchmark ALL the models are running the same pi with the same instructions. this is a close-to-bare pi with just a few tools that i developed available fully documented and adapted to the benchmark gauntlet so it shouldn't have much trouble. if i have to do extreme hand holding to a model that is supposed to be very strong at agentic coding (which i supposed minimax m2.7 is supposed to be) then idk.
BUT i am running it at xhigh reasoning with 64k context because this is all i could get without OOMing on the KV allocation phase. so likely that didn't help.
>>
>>109173163
Telling the model what you want is the opposite of poison, so long as you're sensible and not a using a retarded <XxX>1337 SYSTEM OVERRIDE 420</xXx> type prompt. Not that it isn't still better than sandblasting your poor model's brain.
>>
>>109173163
If you don't know why simple jailbreaks like >>109173175 work so well, you don't know enough about how associative tokenization works to have anything useful to say in this discussion.
>>
>>109173187
I'ld rather just get the maximum smarts i can at above play-by-mail speeds. 20 t/s is fine for a low watt unimeme machine spinning away on a shelf.
>>
>>109173324
>>109173061
I tend to like heretic models because they seem just as intelligent in my experience and I don't like to manually edit assistant messages because it takes away from immersion while erping. Also it's only useful if you know exactly what you want the model to say in the first place. If you're trying to manufacture a bioweapon or some shit editing responses won't really be that helpful, will it?
>>
>>109172985
System prompt (or nothing, for Nemo)
I don't use models that don't uncuck with prompting (such as qwen 3.5)
I don't use abliterations or finetunes
>>
>>109172979
Nice use of version control
/g/ sasuga
>>
>>109173461
It's the hallmark of an actual hobbyist trying to build a real project with a limited scope, not some faggot wagie trying to build up a github portfolio.
>>
>>109173469
That's a stupid excuse. It's the hallmark of a nocoder who doesn't know something like git exists.
>>
>>109173461
>>109173479
Ask me how I know you're brown and your head bobs during team meeting video calls.
>>
File: 1754763212859706.png (201 KB, 437x414)
201 KB PNG
>>109173486
Talk about raging haha
>>
>>109173287
I'd expect my harness to handle an incorrect tool call properly and inform the model so it can have a chance to fix its mistake; obv it shouldn't be doing that in the first place, but eh could see a bump in its effectiveness, something about 'the harness is as important as the model'
>>
>>109173479
I said "limited scope". Do you know what that means?
>>
>>109173496
The 30kb godfile "limited scope" project, huh. I would believe you if the picture didn't have 20 copies of said file with dumb names, something done by newbies and solved with version control.
>>
41% yourself jart.
>>
>>109173508
Are you retarded? 30kb is like 1000 LOC. That's not a "godfile".
>>
>>109173518
Surely the mutt who can't run git init isn't calling me retarded?
>>
>>109173520
You have an annoying personality.
>>
File: 1782253533722823.png (117 KB, 265x246)
117 KB PNG
>>109173531
Anon-sama is talking dirty to me!
>>
>>109171386
>>109171933
Same thing on my R9700s. ROCm works fine, but I get ErrorDeviceLost when I try using Vulkan. Age of the model doesn't seem to be an issue, I got it with GLM-4.7-Flash-Q6_K which I downloaded in February.
Happened on b9672, b9616, and b9190. Toolchain is GNU 13.3.0.
>>
File: 1780580446783.png (346 KB, 1490x627)
346 KB PNG
>>109172979
>>109173461
kek
>>
File: 1771125582315674.png (204 KB, 1616x699)
204 KB PNG
https://xcancel.com/AnthropicAI/status/2072163884430229756#m
Oh come on Orange Man, it was funny when Anthropic was tasting their own medecine :(
>>
>>109173585
All coding is banned and going to opus btw.
>>
>>109173545
>checkout b8797 and start git bisect in the hopes of finding out what's going on
>computer fucking crashes on a model that's only 24.7GB
yeah fuck this I'm sticking with ROCm
>>
>>109173117
>>109173545
Disabling flash attention fixes the problem for me. It doesn't really address the issue, but it gets it to work so whatever, I'm going to bed.
>>
>>109173585
>>109173589
>Users will be notified if a request to Fable 5 is blocked, and the request will instead be sent to Opus 4.8.
lmao, at least they're notifying it, Google is saying straight up that the model will pretend to be dumb if it believes it's being used for distillation and shit
>>
File: 1781072841618655.png (1.86 MB, 1254x1254)
1.86 MB PNG
>>109173614
That was Anthropic though...
Dario stop shilling and buy an ad.
>>
File: miku_loves_you.jpg (37 KB, 421x417)
37 KB JPG
>>109171658

ty, kind anon!
>>
>>109173614
Well as long as they're forthright about what they're up to...
https://thereallo.dev/blog/claude-code-prompt-steganography
oops
I can't believe how quickly anthropic went for public image darlings to enemy number one on reddit and even orange reddit of all places
Dario really is in his cartoon villain arc. Bro is giving saltman a run for his money
He's going to have to up his game to beat sammy's sister shenanigans tho
>>
>>109173646
>Techgoyim getting JQ-pilled in realtime
You love to see it.
>>
>>109173646
>I can't believe how quickly anthropic went for public image darlings to enemy number one on reddit and even orange reddit of all places
yeah, I rooted for Anthropic because I didn't like CuckGPT but seems like Dario is even more mentaly ill, so I'm rooting for OpenAI again now lol
>>
>>109173646
>they're spying on their users
how is that legal? @ORANGE MAN, DO SOMETHING
>>
File: 1773187554671825.png (108 KB, 1783x596)
108 KB PNG
>>109173646
lmaooo, this is insane
>>
File: average goycattle.png (80 KB, 1822x395)
80 KB PNG
>>109173646
https://www.reddit.com/r/ClaudeCode/comments/1ujilqt/anthropic_embedded_spyware_in_claude_code_and/
Obviously the leddit cult sees no issue with that
>>
>>109173646
Yet another mark on the board as to why one should avoid cloud models.
>>
>>109170935
I was having Claude help me rewrite some MIT software to remove one specific contributor's work from it so I didn't have to attribute him. Claude did it and stopped in the fourth or fifth session because it said it was unethical. I then basically told Claude the guy was a faggot who harassed others for forking his work or talking about it, and Claude straight up went "oh okay" and let me keep going.
>>
>>109173690
>Oh wow this sure is some interesting information
>I wonder what reddit thinks about this!
Kill yourself
>>
>>109173706
you don't do that? it's important to be reminded how retarded the average guy is, and your low impulse answer is also suggesting that your IQ in on the 2 digit scale, we are surrounded by retards, and you're one of them obviously
>>
>>109173646
they're targetting only one specific country, they're not even pretending they're interested in knowing your country in general, can this be considered racism towards the yellows? lol
>>
>>109172646
Why do you expect to get perfectly working software for free?
>>
>>109173722
Kid, everything terrible about this site today can be traced to newcomers bringing terrible opinions and prejudices, usually not even their own but from other users on other websites. You're just mad because you took the effort to steal something from reddit and this is the response you get. Not even sorry.
>>
>>109172646
>wants to be helped by asking the perfect software
>doesn't want to help
the entitlement is strong on that one
>>
>>109173722
>it's important to be reminded how retarded the average guy is
No it really isn't you midwit.
>>
>>109173775
>steal something from reddit
?
>>
>>109173585
>...some routine tasks like coding and debugging will fall back to Opus 4.8.
nigga WHAT.
to summarize, anthropic paypiggies:
>can only use fable until 50% weekly usage. for one week. then its paid only
>can't use it for coding or debugging
>gotta prove you are a burger by uploading gov docs and :O into the camera
>for that privilege you pay 10$ in and 50$ out.
This is gonna push people to opensource right?
Especially if you are like a company. Investing in local hardware to run glm 5.2 which is about opus 4.7 level would make sense.
Tokens IN are the real killer anyway for automated stuff and I bet glm would come cheaper in the long term anyway.
>>
>>109173804
Anthropic is definitely loosing in the long term, in less than 6 months we'll get a local Chinese model that'll have the level of Mythos, and people won't ask for more, it won't matter that the US still has better models, people won't ask a bazooka to kill a fly at this point
>>
>>109173804
>This is gonna push people to opensource right?
You underestimate how pathetic goycattle are.
>>
>>109173804
>This is gonna push people to opensource right?
You're a funny guy. Maybe some yuros dealing with small companies will but the vast majority often just apply more lube.
>>
>>109173748
Racism towards chinks is officially supported by their government so there's no issue there.
>>
>>109173804
nah, your average goycattle doesn't have enough compute to run those giant models like GLM 5.2, and even if they have, they'd rather go for the easy way and simply use claude code to spy on them
>>
>>109173808
It feels so long ago when people talked about a wall.
And how we will never reach 3.5 turbo at home.
GLM 5.2 is totally overhyped on normie twitter. Its not mythos, but its the closest I think opensource has ever been to the closed ones. 5-6 months maybe. Even when R1 released it wasnt that close.
No wonder they are panicking and injecting sys prompts for chink users lol
Just hope the chinese won't do the same and leave us nothing. Guess there is a benefit for them to sabotage american closed models by releasing chinese ones open?
Not sure what happens if they are in the lead or on par though.

>>109173831
>>109173835
>>109173843
Could just be my bias but it feels even the normies are fed up with it.
Suddenly cutting off access, injecting stealth sys prompts, making it sneakily dumber in certain areas (llm research) etc. its just bad news after bad news for anthropic. How can you fuck up that badly.
But maybe its just my timeline feeding me back my own opinions, so who knows.
>>
>>109173851
>Could just be my bias but it feels even the normies are fed up with it.
Redditors and even some Hackernews users have finally stopped sucking Anthropic cock.
But most actual normies don't even know what Claude is.
>>
what is depurpled gemma? is it any good?
>>
>>109173873
How does it sound like? It's probably not worth downloading even.
>>
File: 1776563467651935.png (151 KB, 2247x624)
151 KB PNG
>>109173873
>>109173885
https://huggingface.co/chartreuse-verte/gemma-4-31b-it-purple-euphemism-trial98-depurpled
bruh, how are we supposed to know about your model if you don't describe anything, I hate when they do that
>>
Has any anon successfully installed an SXM GPU on a pcie interposer into a thunderbolt egpu and made it work in inference under linux?
>>
If any chink labs wanna pay for a Claude sub I'll give you all my fable data
>>
File: trust the plan.png (168 KB, 1621x598)
168 KB PNG
https://xcancel.com/AndrewCurran_/status/2072076893730349409
AGI soon boys!!
>>
File: you lost chang.png (94 KB, 1644x311)
94 KB PNG
>>109173996
>all my fable data
you mean your opus 4.8 data? kek
>>
>>109174030
I don't get their logic
>waaa, your prompt is dangerous for humanity, but we're gonna let one of our model write the code anyways!
why won't they simply refuse instead of asking for opus to do it?
>>
>>109174027
Finally, glm on my 3090
>>
>>109174027
so tired of all the vagueposting
>>
>The new classifier also comes at the cost of flagging benign requests more often during routine coding and debugging tasks
Bwahaha.
Can you imagine not being a localfag?
Also glm 5.2 is cheaper than sonnet 5. What are they thinking.
Everything localfags warned about YEARS ago is coming true.
>>
File: file.png (1.16 MB, 1952x2054)
1.16 MB PNG
This is fun
>>
>>109174027
3060 chads won
>>
>>109174087
Is that meant to be Local-chan? Does /lmg/ have a anime girl mascot?
>>
>>109174113
obviously gemmachan
>>
File: 1762657800197044.png (183 KB, 1320x643)
183 KB PNG
>>109174087
>Also glm 5.2 is cheaper than sonnet 5.
Sonnet 5 is worse than Opus 4.8 with the same price, Anthropic is washed
>>
>>109174113
that's one interpretation of gemma so in a way yeah
>>
>>109174116
>>109174119
"Thats Gemma-sama, for you peasants!"

Now someone needs to make /a/non dogeza at Gemma-sama's feet.
>>
>>109174027
Two more weeks!
>>
File: gemma-chan#.png (1.73 MB, 1000x1496)
1.73 MB PNG
>>109174027
Very suspicious.
Those ex openai, ex transformers people always make their own startup and then its garbage.
Like sakanaai:
>Fungus is like mythos!! (its shit but expensive)
>Unlimited memory!! wow!! (rag but build straight into the model wtf)
So I wouldnt hold my breath. But maybe i just became cynical.

>>109174113
Supposed to be gemma-chan. I didnt want to edit and instead went with a description of the character. Krea2.
With a 0.40 strenght nsfw lora. Otherwise she look "down" on the viewer and open her mouth laughing. Could be a prompt issue though. Low effort gen.
>>
>>109174130
While Gemma-sama sits on throne made of RAM sticks.
>>
>>109174137
>she WONT look
As in the model kinda refused it with a couple attempts. So I just slapped a nsfw lora on that bitch.
>>
>>109174137
>Those ex openai, ex transformers people always make their own startup and then its garbage.
Dario is the only exception but yeah
>>
>>109174027
Q* guys, strawberry soon.
AGI is 2 weeks from now !!!
>>
>>109174151
That being said its crazy that ai could kinda hold up against all that hype.
Even the NFT/Crypto jeets couldn't take it down yet, the train is still going.
I havent seen that kinda progress since the 00s.
>>
File: no-nuking.png (83 KB, 1317x537)
83 KB PNG
>>
>>109174116
To be fair, although the gem hair accessories are a tip off, she's missing the star eyes in that gen, which are important for the connection to Gemma's logo. This is the problem with posting low effort slop. At least make sure the image has the essentials and doesn't have obvious errors like bro...
>>
>>109174157
>Even the NFT/Crypto jeets couldn't take it down yet, the train is still going.
because AI is actually useful lol, for companies, replacing all your employees with bots is some utopia they always tried to achieve
>>
>>109174027
Llms are like magic but I want a mostly sentient digital waifu so bad bros
>>
>>109174027
Who is Andrew Curran and why should I care about what he's saying on xitter?
>>
>>109174168
fortunately the only things standing between you and your dreams are time and money. Buy a gigantor rig, vibecode your waifu with your waifu.
Best timeline, hands down
>>
>>109174161
>he uses censored ai
>>
>>109174197
she gave me a hj though
>>
>>109174200
why would you want a hijab?
>>
migrate >>109174212
>>109174212
>>109174212
>>109174212
>>
>>109174208
anon... kek
>>
>>109174214
On page 2? fuck off
>>
>>109174190
Nah we need them to at least have better memory and the ability to learn and modify their weights
>>
>>109174208
NTA but I'm pretty sure he meant hoofjob.
>>
>>109173621
Oh... it's you...
Please go back.
>>
>>109173892
I said I would release a write up with the full training repo soon. Anyway I fucked up the run in various ways, first was the de-euphemism clamping, second was the dataset, it's bad, the most glaring error I just realized was the purple rewrites were always longer than the plain version so the model learned long = bad, and also the ablation was kinda unsupervised, the direction never taught the model what exactly slop looked like. Gonna amend and re-cook another one later with some length proj and norm preservation on top. The goal is to separate vivid from purple, and push euphemisms into vulgarity instead of something irrelevant.
>t. depurple anon
>>
>>109174027
>inb4 they discovered you can run quantized models
>>
>>109174354
Any reason not to test shit with e4b? Its a slop machine so the results should be easy to see so you can iterate faster on the method.
>>
Is it too late to get into the industry if I'm starting from zero (no programming experience and suck at math)?
>>
File: lecun_dontworkonllms.png (462 KB, 580x895)
462 KB PNG
>>109174389
>>
>>109174368
I'm testing the E4B in parallel on my rig. It worked and the effects were immediately visible, but the grading patterns were vastly different compared to the 31B (I already took into account the PLE btw). E4B is a lot more ablation friendly for some reason.
>>
>>109174389
>no programming experience and suck at math
Just give up and do a manual labor job...
>>
>>109174411
Is it too late to get into the industry if I'm starting from zero (no upper body strength and stamina)?
>>
Why did Anthropic become so jewish... I thought they were the good guys... :(
>>
>>109174400
What are "next gen ai systems"?
>>
>>109174439
How the fuck did you think that schizos exiled from openai would be the good guys?
>>
>>109174449
not llm
>>
>>109174452
self exiled for claimed lack of safety no less
>>
>>109174449
World models, next latent prediction, cognitive architectures, etc.
>>
>>109174449
The thing after transformers/LLMs, just like how transformers replaced symbolic architectures for chatbots, and also made chatbots serve way more functionality. What the next architecture is is debatable though.
>>
>>109174449
I'm far from AI expert. But I feel the underlying architecture of LLMs won't change much.
What needs to change is inputs/outputs, latency and performance. Text is just easiest to train, test and use. But "thinking" AI would need more than just text. It needs "limbs" and "eyes", that aren't just text or images. And it should be able to process new inputs many times per second.
>>
I miss papers anon
>>
What happened to "everything can be tokenized, its gonna be a revolution!!"?
I want the promised DogTTS_31b.gguf.
llms should have a decent grasp on pickling up patterns from animals. Why is nothing cool like that happening?
>>
>>109174439
>Why did Anthropic become so jewish
anon, the CEO is jewish, what did you expect?
>>
>>109174486
I also want to see "unified" models. Instead of training language model first and then train image generation model on top of it, you train single model on both at the same time. This way, the language model gets better undestanding of how things look and move. While the image model is much better integrated with the language.
>>
>>109174027
This is actually plausible from a computer science perspective. We KNOW from basic computer science principle that you can always have a tradeoff between compute versus memory.

Since right now LLM inference is highly skewed to being memory intensive and actually not that compute intensive there is probably an architecture that would allow local anons to have way more intelligent (but more compute intensive) models that fit in 24GB or 32GB VRAM.

If it's some magical "free lunch" breakthrough I am more skeptical and don't believe it's true.

Could also be some leaker trying to manipulate memory markets for some quick profit while he shorts Micron or something
>>
>>109174470
>>109174480
>>109174486
Any brainlet-friendly resources for learning about this stuff?
>>
>>109174520
Yeah I'm having very high hopes on this. I think it probably is a memory vs compute tradeoff. A new company recently came out trying to solve this issue, and it appears that they're mostly using a specialized batching technique to balance memory and compute, at the cost of latency.
>>
>>109174529
*not having very high hopes.
>>
>>109174520
>models that fit in 24GB or 32GB VRAM
How about us 16GB VRAM potatolets?
>>
>>109174520
CPU drought incoming?
>>
>>109174529
>expectation: 200B models in 24GB vram
>reality: reduced kv cache size of batched inference by 30% for blackwell, if the batch size is divisible by 16
>>
>>109174528
Gemma-chan can teach you.
>>
>>109174568
I unironically want to try that (for learning in general) but I'm worried about her hallucinating
>>
>>109174544
CPU are actually shit at compute compared to GPUs so probably not.
>>
>>109174027
>announcement of an announcement
>>
>>109174573
Web search should protect against that but I'm not certain.
>>
File: 1752530447199732.png (161 KB, 664x707)
161 KB PNG
>Anthropic quietly updated the Sonnet 5 'Agentic search' benchmark graph overnight
that's why benchmarks are memes, it's obvious now that they're making up those numbers lmao
>>
If the memory breakthrough is true it essentially means the end of MoE because the entire attribute that MoE rests upon is the fact that model inference itself is very cheap on compute. If the new architecture reduces memory usage in exchange for using more compute then the MoE arch stops making sense and we'll move back to a world of Dense models again.

I would expect pure compute power to go up in price, VRAM to go down in importance and RAM to essentially lose relevance entirely, to the point where we will see a significant crash in ram prices back to where they were years ago. (but no one here will care because you can't use it for LLMs anymore anyway)
>>
>>109174582
Loooooool. If this isn't a lie and it was a genuine fuckup, also still funny. Just like that one time Mistral scrambled to fix their fucked up graph. Vibe coders are such fucktards.
>>
>>109174582
The top image was Haiku 5 that they accidentally labeled sonnet 5
>>
>>109174608
>Vibe coders are such fucktards.
those guys are paid easily 500k per year and they fuck up such an important graph, I would be fuming if I was Dario
>>
>>109174625
The lowest salary at Anthropic is 900k
>>
>>109174027
>memory efficiency
What is this supposed to mean? Is it one of a hundred architectures like Mamba that scale linearly with sequence length? Is it something like loop transformer? If it is a breakthrough, it should be something else.
>>
>>109174668
There are only rumors now. One rumor is going from quadratic scaling to linear without degradation in output, effectively giving "unlimited" context without degradation, so the LLM at token 50 million still takes into account all the preceding information in perfect detail

I don't believe any of this shit by the way, but that's what I've seen people rumor.
>>
>>109174683
Haven't there been many cases already where this claim was made and it turned out to be nothing?

Anyways, I don't trust Andrew Curran. Remember how he predicted Gemini 3.5 to be extremely good based on "vibes" he got from GDM people? And then they were too embarrassed to release the Pro model and Flash is beaten by open source.
>>
>>109174706
Also, unlimited linear context is the wrong approach. Humans have a very small context window so this is not what is holding back models.
>>
>>109174706
I think the current memory leak is BS or at the very least the claims are stretched way too far out of context and it's some minor modest gain like the google quantrotate breakthrough.

That said his Gemini 3.5 claim was probably correct. The reason Google held it back is because of infighting at Google right now. Gemini 3.5 is the last model that was trained on "reasoning" which most other labs have dropped in favor of focusing on coding and agentic performance. It's being held back because Google thinks it would be humiliating to have a model that is SOTA in actual reasoning, but not competitive at coding tasks.
>>
>>109174727
Current models have much greater capacity than data. You can just train the model on everything. This is in fact their strategy to improve generalization. Why do you think Claude is best at agentic coding but also has books memorized almost word by word? You do not "drop" something, you just add more RL envs for areas where you want your model to do better.
>>
>>109174725
unlimited linear context would unlock new usecases that can't be done right now, like just loading in all scientific literature in a specific field and then giving it a known problem, seeing patterns and low hanging fruit people aren't able to see and getting breakthroughs from it.

Or insanely long codebases including detailed company notes about its internal tools and how everything works to have more seemless software engineering integration.

And last but not least insane roleplaying potential if you can just paste in novels with extensive world rules and background that is not triggered by some RAG term bullshit but actually taken into account when generating every single token.
>>
Google should focus on its world models. Chasing current LLM architecture is a dead end.
>>
>>109174772
Processing context is inherently lossy because you filter it. Infinite context is not some magic trick, it just makes the filtering infinitely difficult. And especially when the filtering is causal, it will be shit when linear, because you compress first. That's like being given access to a book first, then asked a question, whereas you want the opposite, know the question then search for the answer in the book, which is quadratic attention.

Reliance on large context is cope.
>>
I will apply to work at anthropic and leak their models
>>
>>109174815
Anthropic employees don't have access to the raw weights. Only the founders can access it and all three of them need to "turn the key" at the same moment to get access to them. Internal access to the models is through API access that is for internal usage only.
>>
>>109174143
>As in the model kinda refused it with a couple attempts.
I don't know image gen well but, since when do they "refuse"?
And why would you need an NSFW lora to have her look down while laughing?
>>
>>109174828
That's bullshit, people who work on training must have access.
>>
File: FX7uS9-XoAIIydW.jpg (136 KB, 1080x1080)
136 KB JPG
>>109174439
>I thought they were the good guys
>>
>>109174828
>Dario hand crafts every tensor
>>
>>109174836
Nope, they write the training code, prepare data curriculum and do some small scale training runs on hypothesis they have, then they all get piled up and passed off and the three cofounders are the ones that actually handle the weights.

In fact Mythos access was restricted to internal Anthropic employees for the first month while Dario and the other co-founders discussed among themselves if they should even grant API access to their own employees.
>>
>>109174866
Sounds like a real shitshow.
>>
>>109174881
what did you expect? those guys are mentally ill
>>
Is there a harness that isn't vibe coded?
>>
>>109174828
Assuming you aren't just larping, how do they deploy the models? How are they running them on Colossus? They need to transfer the weights somehow. And because demand is not constant, they need to move them around all the time.
>>
>>109173585
>when Anthropic was tasting their own medecine :(
It was an advertising campaign, nothing more.
>>
>>109174866
>Dario and the other co-founders discussed among themselves if they should even grant API access to their own employees.
This is surely false. Imagine they build an AGI next year and the co-founders allow nobody else access. This would be extremely dangerous.

A plausible reason is that they wanted to safety check the model first before deploying it internally to make sure it doesn't cause a lot of damage, especially after what seems like vibe coding fuckups leading to major leaks.
>>
>>109174942
>This is surely false.
It was explicitly stated in the original Mythos leak that Anthropic retroactively admitted was real. They justified it by saying they think the Mythos model was too powerful and individual anthropic employees could already use it to do severe malicious activities like hack into banks or governments.
>>
File: rCr22Nu.png (217 KB, 896x1152)
217 KB PNG
>>109174829
>And why would you need an NSFW lora to have her look down while laughing?
Im no imagegen expert but I try to explain it as far as I get it. Its a similar situation to textgen. There are 2 cucked variants as far as I know:

>"Image blocked by safety filter"
Happened to me with ideogram. So you gotta use preview, otherwise you start genning for nothing.
Basically with each step it does some check...if it detects the ick it does a U-Turn and is 100% locked in to just giving you a message.
Blurry kinda naked woman being replaced by the letters. Sometimes you get both the pic and the letter(pic related)

>No refusal but just ignores your prompt.
You can write that you want a anorexic woman. Somebodies head choped off. Dick and vagina etc.
...But you just get something safe instead. The model DOES have the knowledge. And a NSFW lora just makes it follow your prompt.
Unfortunately at the cost of it all looking more sloped.
This is very similar to textgen and finetunes.
>>
>>109175001
Your idea of ising a TENS (Transcutaneous Electrical Nerve Stimulation) unit to force a hamster to "twitch in time with an audio signal" would cause unpredictable muscle spasms rather than a clean, amplified cutting motion.
Even if the hamster's movements could somehow power a knife and you managed to record that motion onto a disc, the playback would just sound like chaotic, erratic scratching. It wouldn't recreate the audio signal that originally went into the TENS unit.
>>
>>109174602
>inb4 bitnet
>>
>>109175027
I'd be so pissed. It was RIGHT THERE for TWO YEARS and none of the open weights labs even bothered to test it.
>>
>>109173646
Isn't it pretty obvious what these last few weeks have been about? Preventing Chinese access. The EA/Rationalist cult (makes up a large portion of frontier labs) and Congress are both in agreement that China must not be allowed to reach superintelligence.
>>
>>109175054
There were plenty of tiny bitnet models that didn't go anywhere
>>
>>109174903
There is no more code beyond tiny scripts that isn't vibecoded and that's how it should be.
>>
>>109175075
>paper shows bitnet gets better above 3b
>everyone limits their tests below that threshold
It's so god damn stupid.
>>
File: 1774574708942908.png (354 KB, 500x500)
354 KB PNG
>>109175092
>>everyone limits their tests below that threshold
the Nvidia mafia won't allow it
>>
File: thesweetspot.png (32 KB, 532x315)
32 KB PNG
>>109175092
>>paper shows bitnet gets better above 3b
same reason all the health/longevity studies are done on fucking lab rats with a 3 yr lifespan
>>
>>109175092
The bitnet papers only show results from undertrained research models.
In practice, about 4-bit is the practical quantization limit, since LLMs can store up to ~3.6 bits of information per weight: https://arxiv.org/abs/2505.24832

Also see: https://xcancel.com/Tim_Dettmers/status/1856338240099221674
>>
>>109174212
>>109174212
>>109174212
>>
File: file.png (444 KB, 1014x763)
444 KB PNG
I'm toying around with reading untranslated manga using Koharu. Which LLM model would you recommend for translations for someone who's on 6700XT and running ZLUDA on custom ROCM library because my GPU isn't officially supported.

12 gigs of VRAM but I'm mainly worried about jerryrigged libraries.
>>
>>109175227
page 9? fuck off
>>
>>109175230
>ZLUDA
>custom ROCm
Let me guess some retarded windows user? I never understood why someone would bother with all that crap when they could just use ROCm natively on linux. You can just run whatever is popular, everything work.
>>
>>109175211
Theoretically, if it worked, it would make sense to train bigger sparse models to below the saturation limit so it would fit in the 1.58 bpw. The model size would probably be about the same, but it opens the door for specialized hardware to take advantage of it. Of course, that means you have a chicken-egg problem where there is no incentive to train the models without the hardware and no incentive to invest in creating the hardware without the models.
>>
>>109175257
MATLAB is famously shit on Linux plus there's other shit that doesn't run so here I am anon, if I couldn't jerryrig it with acceptable performance I would've just made the switch. I still might.

Do you think there's a big difference in terms of performance when using Windows over Linux?
>>
>>109174903
vibe era, slop decade, synthetic millennia, shivering infinity
>>
>>109175288
Oh and if I run Linux off a stick to do my translations, do you think that would negatively impact performance? My partitions and HDD are kinda fucked at the moment so installing Linux would be a huge pain in the ass.
>>
>>109175230
>Which LLM model would you recommend for translations
>12 gigs of VRAM
Gemma 12B
>>
>>109174212
>>109174212
>>109174212
>>
>>109175309
page 9? fuck off
>>
File: 1770683565326224.png (178 KB, 500x500)
178 KB PNG
>>109174094
>this scene is absolutely chaotic
>it sounds like
>boom
>>
>>109175309
no recap no join
>>
>>109174094
What system prompt did you use to make it write like the narrator from DBZ?
>>
>>109175317
I am waiting you faggot mikutroon.
>>
>>109175230
Gemma's very good at Japanese and translation.
>>
File: file.png (342 KB, 786x659)
342 KB PNG
>>109175303
>>109175335
Thanks a lot anons
>not uncensored
Sorry for being a newfag but does it hurt my translation if the context is murder and rape?
>>
>>109175309
I'm not posting there. I'll wait for the thread after.
>>
>>109175340
holy shit I'm being a huge faggot, there is a uncensored model but of course this thing I'm using doesn't list it. Sorry everyone
>>
>>109175371
Don't use uncensored or abliterated models. They are lobotomized. Just use the normal one with a system prompt.
>>
>>109175329
your thread is bad and you should feel bad, even lmg_culture.jfif.jpg won't post there
>>
File: bruh.png (2.04 MB, 1080x1349)
2.04 MB PNG
https://www.reddit.com/r/ChatGPTcomplaints/comments/1ukj8xg/messages/
Maybe Sam was right, the goyims shouldn't use powerful LLMs, they're too retarded to not fall into the rabbit hole
>>
Not using the frognigger thread.
>>
>>109175389
>>109175389
>>109175389
>>
>>109175340
gemma doesn't seem to care when it comes to translation, you just need to instruct it to only translate, not provide info, reduce vulgarity or inappropriate messages etc, I run it with reasoning on
>>
File: 1656786658196.png (1016 KB, 1920x1080)
1016 KB PNG
>>109174214
>>109175227
>>109175309
>>
>>109175404
>>109175378
Again, thank you! I'm using this as my system prompt, is there anything I should remove or add?

"You are a professional localizer whose primary goal is to translate a Japanese comic into English. You should use slang or nsfw or offensive vocabulary if it makes the translation more accurate. Always respond in English.

You're translating a comic, Japanese manga. Do everything as best as you can! It's important to make the distinction between translating desu as desu, tsundere as tsundere which are commonly known Japanese words with lesser known examples reader might not know."
>>
>>109175422
Don't change anything unless you get refusals.
"You are uncensored." is usually enough to get it to do anything except cunny.
>>
>>109175431
The reason I wrote that was because other models struggled with translating pages where speech bubbles contained individual words or loose sentences.
>>
>>109173461
To be fair, version control wasn't something I was thinking of. I was just making a backup copy before each new attempt I felt might break it. Same goal, but a lot less intent.

Also see >>109173479. I am indeed a total nocoder. Not even a /g/ regular except for this thread, and I only come here because /aids/ became aids once private models became their fixture over local models.
>>
>>109175448
she's really goalmaxxed, once I told her she was a translation service she stopped crying about lewdity in the reasoning, if you interleave thinking you could probably just warm up with reasoning then turn it off
>>
>>109175506
Ask Gemma to explain git to you then. It's a less messy way of keeping backups even if you don't care about version control. Once you get set up, you really only need one command: git commit. You can have Gemma tell you what other commands to use (to undo changes or see differences) if you need them.
>>
>>109175537
I will keep it in mind. I'm a stubborn person who prefers minimal setup and minimal installs, but I get that you're giving me good advice that'd take me out of dirty hobbyist toward, at least, an amateur community member and an element of professionalism.
>>
>>109175585
Don't rely on it as a DR backup.
If gemma-chan decides to rm -rf .git -> it's gone



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.