[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: mikulmg.jpg (1.18 MB, 1804x2160)
1.18 MB
1.18 MB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106566836 & >>106559371

►News
>(09/13) Ling & Ring 16B-A1.4B released: https://hf.co/inclusionAI/Ring-mini-2.0
>(09/11) Qwen3-Next-80B-A3B released: https://hf.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
>(09/11) ERNIE-4.5-21B-A3B-Thinking released: https://hf.co/baidu/ERNIE-4.5-21B-A3B-Thinking
>(09/09) K2 Think (no relation) 32B released: https://hf.co/LLM360/K2-Think
>(09/08) OneCAT-3B, unified multimodal decoder-only model released: https://onecat-ai.github.io
>(09/08) IndexTTS2 released: https://hf.co/IndexTeam/IndexTTS-2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: mikuthreadrecap.jpg (1.15 MB, 1804x2160)
1.15 MB
1.15 MB JPG
►Recent Highlights from the Previous Thread: >>106566836

--Differential privacy: VaultGemma's memorization reduction vs generalization:
>106566944 >106567707 >106568235 >106568661 >106568688 >106568747 >106568803 >106568808 >106568337 >106568567
--Qwen3-Next quantization and GPU deterministic inference challenges:
>106573151 >106573171 >106573199 >106573224 >106573235 >106573226 >106573279 >106573379 >106573425 >106573441 >106573467 >106573519 >106573610 >106573660
--1.7B open-sourced model achieves document OCR success with minor errors:
>106570867 >106570892 >106571715 >106570901 >106570943 >106572018 >106572081 >106572287 >106575181
--Balancing GPU driver updates for software support vs power efficiency and stability:
>106572592 >106572637 >106572669 >106572729
--ASML and Mistral AI form €1.3 billion strategic partnership:
>106574819 >106574857 >106574864 >106574900
--Challenges in domain-specific knowledge teaching with LoRA and summarized information:
>106568875 >106568949
--vllm's broken GGUF and CPU support issues:
>106569268 >106569331 >106569356 >106569357 >106569385 >106569553 >106569630 >106569666 >106569594
--Feasibility challenges for AI-generated game chat with video input:
>106569817 >106569839 >106569869 >106570004 >106570036 >106569923 >106569955 >106570369 >106570480
--Kimi K2's delusion encouragement performance:
>106570964 >106571077 >106571090 >106571099 >106571105
--Skepticism about K2's 32B matching GPT-4 capabilities:
>106567118 >106567806 >106568369
--Qwen 80B testing performance and comparison to larger models:
>106568659 >106568674
--Kioxia and Nvidia developing 100x faster AI SSDs for 2027:
>106569299
--vllm vs llama.cpp performance benchmarks with Qwen 4B model:
>106570266
--Miku (free space):
>106567977 >106568645 >106569488 >106571835 >106571849 >106571853 >106571856 >106571961 >106572139 >106573324

►Recent Highlight Posts from the Previous Thread: >>106566844

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
glm is for schizos, finetroons for drooling retards, mistral models break past 1k tokens and this thread is filled with people who have no idea what they're talking about but are happy if they can oneshot prompt some of the most disgusting ERP known to man
>>
Anything released within the last week gguf status?
>>
>>106575274
Yes... And?
>>
File: 1757285195262315.jpg (47 KB, 686x815)
47 KB
47 KB JPG
>>106575274
You'll elevate the thread, right?
>>
>>106575274
>people are doing something I don't like, that doesn't involve a singular other human being, entirely in the privacy of their own home, REEEEE
Grow the fuck up.
>>
>>106575274
Your temperature is set way too high, anon.
>>
>>106575274
gr8 b8 r8 it 8/8 m8
>>
>>106575274
I don't like that you can make a post like this and if you get called out, you just claim it was a shitpost all along. Where's the accountability?
>>
>goofs
I'm an old man out of the loop, the fuck does this even mean?
>>
>>106574177
>On linux I get over 50tk/s
wait what? how do you get over 50t/s on GLM air? I am a linux user and i only get around 20t/s on my dual 4090 setup
>>
File: pretending.jpg (40 KB, 349x642)
40 KB
40 KB JPG
>>106575394
>>
>>106575394
We need poster ID to find out what these kinda guys are up to. Probably shilling their latest enterprise jeetRAG solutions.
>>
>>106575413
goof is a synonym for mistake. they are trying to say that llamacpp was a mistake.
>>
>>106575413
You're an old man but unfamiliar with classic Disney characters?
>>
How's the progress on the llama.cpp MTP PR?
>>
>>106575413
it's the GGUF format for quantized weights
>>
Name one thing a local model has done for you
>>
>>106575474
There's nothing preventing you from having a full precision model in GGUF format, right?
Not nitpicking btw, I really don't know, but intuitively I imagine that yes.
GGUF is just a way to pack the weights and some metadata, right?
Does that mean you could have AWQ GGUFs?
>>
>>106575492
best orgasms of my life
>>
>>106575492
It recalled on demand Kasane Teto's birthday with 70% confidence
>>
https://youtu.be/7Jzjl3eWMA0?t=117
Women raping gay billionare werewolf writers sounds unsafe. But their fucked up fetishes are somehow safe. I hate this world
>>
>>106575274
>but are happy if they can oneshot prompt some of the most disgusting ERP known to man
If a model could oneshot 5-6 prompts continuing my organic ERP logs. Maintain coherence up to 30k. And would withstand me getting bored after a month my penis would be a happy penis. GLM chan was the closest so far.
>>
>>106575499
I have never tried to run f32 but it handles bf16 just fine. I think the quantization matters because it need to know how to do the math or whatever, they seem to call it a kernel for some reason.
>>
>>106575492
Best orgasms like other anon followe by wanting to kill myself again not because of sense of shame but because all the models are still fucking trash.
>>
>>106575451
You understand that feeding trolls is doing it wrong. Yes?
>>
>>106575523
For me it's reinforcing fantasies that can never happen (like getting a gf). I'm not sure it's doing me any good, but whatever, nothing I can do to my mind is permanent.
>>
>>106575506
Absolutely critical use case for me to be desu
>>
>>106575492
Reminded me that local is still a waste of time.
>>
>>106575537
At a certain point, I'm pretty sure it's just trolls feeding each other.
>>
>>106575492
help with scripts
medical advice (regarding headaches)
orgasms as the others said also helped me schizomaxx by making my daydreams more vivid and shit
help with looking shit up (eg what standrad does x use etc)
>>
>>106575420
I'm pretty sure that's not normal, triple 4090s on *windows*, with the windows performance nerf can do 80 tokens/s according to others, and I've seen my iq4xs air do 70 tokens/s on linux with 3090s.

20t/s... is about what my windows (I'm the guy with the fucked up multi gpu windows performance) does on 2 gpus using iq2xs.

Are you sure nothing's spilling to ram?
>>
>>106575507
Yes, female privilege is getting bigger with time
>>
>>106575668
it shouldn't be. I am using an IQ2 quant which should just barely fit in VRAM. could it potentially be my backend? I use oobabooga webui because it is convenient, but could it really be hindering my performance that much?
>>
>>106575629
You're right. Or samefagging
>>
>>106575698
>oobabooga webui
If it isn't an acient install, you're using llama-server to load your ggufs, so that shouldn't be it. I used koboldcpp, llamacpp (llama-server), and oobabooga (llama-server) and there wasn't a noticable difference.

When you load your model, there should be a line that tells you how llama-server is loading the model. Maybe you need verbose flag in cmd_flags.txt to see it.

Did you split by rows?
>>
>>106575792
it is a recent install, so then I guess that isn't the problem. I have tried both with and without row splitting and including it actually slightly reduces performance
>>
>>106575808
Are you able to check what's your gpu usage during generation? In my case, my absymal performance on windows is verified by the power usage - barely 80w on each card during generation, while linux pulls 150+.
>>
Why do these models speak as faggots
>>
>>106575836
I will check right now, but from what I have seen, usually it is below 100w during generation
>>
>>106575843
monkey see monkey do
>>
File: 1742672324772800.jpg (75 KB, 750x732)
75 KB
75 KB JPG
>>106575202
>>(09/11) Qwen3-Next-80B-A3B released
>Still no GGUF
>>
>>106575851
GGUF is a state of mind, friend.
>>
>>106575851
>9/11
Did they really? Fuckers. We should release a model on the 15th of april.
>>
>>106575507
Pretty cool, really. When was the last time you heard a real female speaking to you, anon?
>>
>>106575848
Btw, I confirmed it was specifically a multi-gpu problem in my case by running a model that fits in one gpu, and then running that same model with the same setting but split across three gpus. See if that's the case for you as well.
>>
>>106575891
so, then you're saying I should try to keep the model on one GPU?
>>
>>106575507
I hate jewoman
>>
>>106575898
Try rubning a smaller model on one gpu, then running that same smaller model but split across two gpus. There shouldn't be too much of a performance drop.

I'm just wondering if you have the same issue I have, but on linux.
>>
>>106575869
>noooo a foreign tech company is doing a minor release on OUR sad day??
>>
>>106575869
On their National Security Education Day?
>>
>>106575935
>>106575940
I am SEETHING. Raging. I can not COPE with their insensitivity.
>>
>>106575877
I don't remember... But now I will write a gay billionaire werewolf book with help of R1 and I will get molested while signing my book. That is my dream from now on.
>>
>>106575869
>the second qwen model hit hf
>>
>>106575507
Is this about women raping guys who write about or are gay billionaire werewolfs? Or is it about writers who write about gay billionaire werewolfs who rape women?
>>
>>106575952
Where's the catch?
>>
>>106575933
so they both cap out at around 120W and enabling row splitting reduces performance by about 75%. I tested with an FP16 of gemma 270m. ~250t/s without vs. 65t/s with row splitting
>>
>>106576012
It is a about women raping guys who write about gay billionaire werewolves raping women.
>>
>>106575973
No, write your own version first - or at least a rough draft - then edit with a LLM. Start with a novella and build up your own workflow. It's very doable.
>>
>>106576021
Sorry I should clarify, I asked if row splitting was enabled before because it's bad if you don't have enough bandwidth between the cards (like pcie).

Can you check your memory clocks when running a single gpu vs multi-gpu? Mine are 1000+ on a single gpu, and 650mhz on multi-gpu.
>>
File: G0vSxkYb0AA8Kn5.jpg (74 KB, 1172x702)
74 KB
74 KB JPG
Member? lol
>>
>>106575851
https://blog.vllm.ai/2025/09/11/qwen3-next.html

vllm has support including mtp layers and everything. Probably one of the nicer, and fastest local models right now but fuck spending all day getting vllm to run without a UI for what is essentially a sidegrade to glm air.
>>
>>106576059
memory clocks are about the same for single and multi GPU. around 1250MHz
>>
File: 4chan-etiquette.png (153 KB, 350x455)
153 KB
153 KB PNG
>>106575413
>>
>>106576094
Now thats some old shitposting
>>
>>106576092
Aww, not the same symptoms as me.

What's your settings? Is every multi-gpu model you run this slow?
>>
>>106576126
What about --mlock?
>>
>>106576137
No difference on or off for me. But I only testing that on windows. Linux I left it off. --no-mmap is always on though.
>>
File: file.png (282 KB, 1815x1524)
282 KB
282 KB PNG
>>106576126
these are my settings for GLM air. i got about 33t/s just now.
>>
>>106576089
>sidegrade to glm-air
>at a smaller size
>at 3b active
>with mtp
this thing is going to be fast as fuck
>>
>>106576162
Are you sure you can fit 100k+ context? What's the speed like if you set the context size to 8192?
>>
man, the imagen community is pretty fucking stagnant. how are the llm bros holding up?
>>
>>106576221
We get something new about once a year. This year we peaked in February.
>>
>>106576221
we're about 12 deepseek sidegrades deep while the best model for consumer gpus is from july 2024
>>
>>106576234
NTA but what did we get in feb?
>>
File: file.png (109 KB, 1505x76)
109 KB
109 KB PNG
>>106576186
33.4t/s instead of 33t/s
>>
File: 1757284991163290.png (1.24 MB, 7279x2969)
1.24 MB
1.24 MB PNG
>>106576221
We're at the tail end of the summer flood, euphoria starting to wear off
>>
>>106576269
>Summer Flood
Next...
>Drummer's Cold Season
>>
>>106576243
r1
>>
I switched back from K2-0905 to the old K2. The new one writes like it caught autism from the original R1.
Also the July K2 had the nice quirk that it wrote by far the best post-orgasm scenes out of any llm I've seen while the new one handles them much more generically 95% of the time.
>>
>>106576269
>euphoria
That's a weird way of describing what people feel seeing a flood of identical, useless synthetic models, each claiming to beat r1
>>
>>106576315
on what hardware?
>>
>>106576245
Is glm air the only model this happens with? What models do you have?
>>
>>106576331
I also have a very small quant of GLM full that runs at about 3.5t/s. 8 bit gemma 27b runs at about 13.5t/s. everything has always run extremely slow for me despite having good hardware
>>
>>106576358
Yeah that's fucky.

Can you download a q4 gemma or nemo and report the speeds when running on 1 gpu vs two gpus? 13 tk/s for q8 gemma on dual 4090s isn't right.

What's your cuda and driver versions?
>>
>>106576269
explain the strawberry and spade thing with OpenAI? I don't get it
>>
>>106576358
>>106576378
>https://www.perplexity.ai/search/i-have-4-x-rtx-3090-s-and-128g-2EtrYlIlSUKZwfWnxK0aqw
>>
>>106576395
Makes me want to throw up. Jesus christ.
>>
>>106576415
It's quite... generic answer.
>>
>>106576378
CUDA is 12.8, drivers are 580.65.06.
Mistral-Nemo-Instruct-2407-Q4_K_M.gguf got me about 53t/s on 1 GPU and multi GPU is about 19t/s
>>
>>106576174
yah, but it will also be dumber for roleplaying and writing. Maybe better for coding and longer context. It is crazy to see mtp support. I wonder if qwen helped a bit, we may become qwens bitch if they keep doing stuff like that behind the scenes
>>
>>106576286
Forget that was just back in feb, feels like a lot longer ago
>>
>>106575202
>Previous threads: >>106566836(Cross-thread) & >>106559371(Cross-thread)
i still dont know why theres always two and i probably never will but i might be okay with that
>>
>>106576431
Does this happen on windows or other distros as well?

In my case, windows 10 iot ltsc and 11 iot ltsc behave the same way, sane tks on single gpu, and abysmal performance on multi-gpu.
But debian 13 with driver version 550 had no problem delivering the speed for multi gpu.
My 3090s are running on x16 gen4. I did notice that resizable bar, while turned on in the bios, was reported as disabled in windows. While resizable bar shouldn't affect the speeds, it's weird that it says disabled even though it's enabled in linux, where I have better performance, so it may be indicative of some other issue to do with how my gous are handled in windows.
>>
>>106576477
I am on a threadripper 3960x, so both of my 4090s are on 16x gen 4 as well. I have never tried other distros on this machine, I just use Mint. I have tried windows in the past and the performance was terrible for me too, even worse than now. ReBAR is enabled for both of my GPUs
>>
>>106576381
Strawberry was some marketing hype about some openai innovation i think. Was a while ago. Spade: "If a white woman has this tattoo inked on her skin, it indicates that she is the sexual property of a Dominant Black Man, also called a 'Bull' or 'Owner'." in conjunction with the Israel flag, i
>>
>>106576497
>zen 2
Same with me... Do you think that's it? Hmmm but then why does my linux have no problems?
>>
>>106576471
Memory's fuzzy, but I think threads were moving super fast early on and someone asked for 2 so they could quickly see if they missed a thread and whoever was maintaining the template at the time obliged.
>>
>>106576514
no idea, honestly. I have seen people post about their performance time and time again and they always exceed me by far despite having similar hardware to me. 80t/s would be a dream, that is almost like real time text gen. I was ecstatic about getting above 20t/s for the first time in my life on a good sized model with GLM air
>>
>>106575274
Truth nvke
>>
>>106576431
Imagine spending money on two 4090s to get only 19 tokens per second on nemo, rip anon.
>>
>>106576431
RTX 4090 has 24GB of vram. And NEMO is about that size.
When you split that between two gpus it means that cpu is still acting as a middle man.
Have you turned on Hardware-accelerated GPU scheduling in Windows?
>>
>>106576561
You definitely should be able to hit 80t/s. My 3090s can hit 70t/s when everything is working properly. Maybe bug report? To whom? I dunno lol.

You should not be satisfied with 20t/s. That's horrible.
>>
>>106576592
4090 anon is mint. I'm the windows guy. And yeah, I've already tried toggling that. No difference.
>>
>>106576569
yeah, it is pretty terrible, but this is what I have lived with for years. I was fine with getting 1.5t/s on old 120B models. I didn't know any better.
>>106576592
I don't use windows
>>106576596
could try asking chatgpt or something I guess. I have never been able to figure this issue out
>>
I haven't paid that much attention to local in a while, who is drummer and are his models something special?
>>
>>106576610
Buddy, 20t/s and 80tk/s are worlds apart. You can not go back to the crawl that is 20 after tasting 80. Do not accept that this is what your 4090s can give you.


And if you end up figuring out what's wrong, please tell me as well so I fix my windows too.
>>
>>106576616
rocinante model by drummer is basically the goto for vramlets who have got like standard gaming hardware.
>>
>>106576606
>>106576610
These are important experiences to learn from.
>>
>>106576610
Are your gpus the same model?
>>
File: GSJQ3-CaUAMWcTU.jpg (674 KB, 2746x3620)
674 KB
674 KB JPG
>>106575202
i got GLM-4.5-Air a couple weeks ago is it still the best?
>>
>>106576668
>learn from
What can you learn from this?
>>
>>106576690
Yes.
>>
>>106576652
gonna put in a couple hours with chatgpt to see if I can fix this
>>106576688
no, different 4090s
>>
>>106576693
Use Linux for multi-gpu setups and for more serious computing tasks. Windows is still for consumer faggotry.
>>
>>106576698
For what it's worth, I don't think having different models is the culprit, but I also have different 3090 models.
>>106576704
Your opinion has been noted, I thank you for your response.
>>
>>106576704
??? 4090 anon is using linux and they still have multi-gpu problems.
>>
>>106576094
I've never seen noko referenced before...
>>
>>106576690
Best you can do really on single gpu 16g vram and 64g of ddr4/5 at the moment. Jamba is pretty good at a smaller size with about 5-6k worth of human written writing to avoid the endless slop, but it reprocesses the entire cache every message because the llamacpp implementation is retarded. Every other moe is 1-3b active, 20-30 inactive aside from next or 220-300b inactive which requires you to have a shitload of ram and maybe a couple gpus.
>>
>>106576726
Wrong kernel configuration.
>>
>>106576753
>jamba mini mentioned
I really like it, but it's godawful retarded.
>>
>>106576753
i have 90gb ram and 24gb vram if that helps currently i use GLM-4.5-Air-Q3_K_M think it was a fairly low t/s i am on amd
>>
>>106575492
Oneshot code for an esphome ir blaster and receiver. Then oneshot all the recorded codes into buttons I can use from home assistant.
Thank you GLM 4.5 Air.
>>
>>106576698
If >>106576759 is right, try debian 13? I just installed the 550 driver and llamacpp.
>>
>>106576777
is that for using a tv remote for your lights?
>>
>>106576766
It takes a lot of handholding for sure, and like I said, it takes a lot of tokens to break away from slop, but imo it has the most diverse swipes of any model I've tried with neutral samplers, apart from changing top_p in the range of 0.75 to 0.9. Better than mistral nonstop regurgitating the same shit every swipe, or devolving into repetition past 10k tokens
>>106576774
I have less vram/ram than you and also am using ayyymd, getting around 8-9 t/s with air which is tolerable early on, you should be getting better speed than that, unless you're expecting 20-50 t/s. Try messing around with the --ncmoe option if llamacpp, or the option that does the same in kobold. Subtract the total layers of the model by 5-10 and you should get a decent t/s boost
>>
>>106576789
Linux != distro, when will you learn this? I thought this is /g/.
Just recompile your kernel and see if there's something what will help. I'm pretty sure it might come to how pci-e is being handled and whatnot.
Changing distribution is not that intelligent because it serves no purpose in this sense.
>>
>>106576814
Yes but did it for a tower fan and my window AC. I keep losing the only remote I have for each but now I have a virtual remote in home assistant I can use from my phone or PC.
>>
>>106576503
>Spade: "If a white woman has this tattoo inked on her skin, it indicates that she is the sexual property of a Dominant Black Man
lol, is that real?
>>
>>106576877
sort of. i wouldn't look too far into the internet abyss of degeneracy. but yeah, stuff like that exists
>>
>>106576877
ever heard of "blacked" that's somewhat related to it
>>
>>106576867
>Linux != distro, when will you learn this?
Idk, when I have time. The aversion I have to linux is that I need to set time aside to get used to how things work it in compared what I already know. Normally, this process can be hastened by asking others, but asking linux users stuff can be very frustrating.

The kernel that comes with debian 13 works for me, so that's why I suggested it.
>>
File: openaistrawberry.png (425 KB, 1342x903)
425 KB
425 KB PNG
>https://www.strawberyai.com/
>Latest Update: 27 Aug-2024
>“Strawberry” is the codename for OpenAI’s latest AI initiative, which is set to launch as early as this fall
lol
>>
Why have memeplers died but memetunes continue to live?
>>
>>106576937
wasn't strawberry like early codename for o1 or something
>>
>>106576952
I assume the website is a joke, but that seems to be the case
>>
>>106576931
Yeah of course, some pre-configured kernels are more suitable for distributed tasks than others but it shouldn't be a reason to switch distribution.
By all means it only takes couple of hours max to go through a configuration and compiling a new kernel. It's not that different from configuring some application to your liking.
>>
>>106576952
They hyped it up for November 5th or something then rushed a released when the Reflection scam came out. I still believe down to my bones they were originally bluff hyping and stole the idea from the Reflection dude.
>>
>>106577028
I'm sure that's intuitive for linux users, but as a windows users I do not know this. All I know is that you said it might be kernel issues, and I know that my distro, with its kernel, worked for me. That's why I suggested switching to debian. Because that's the simplest way I know to have a different kernel.

Thank you for teahcing me that kernels can be changed like that, but your advice should probably be directly to the 4090 anon.
>>
>>106577094
You sound bit condescending and bitchy. Imageboard posting is always bit generalistic, don't you think? This is not your discord.
>>
>>106577110
Alright then, I'm sorry, I apologize. I was being retarded and will conduct myself better in the future.
>>
Are the Jamba models any good?
Is the whole hybrid ssm-transformer gimmick worth anything? Does it at least make the model faster to run compared to a transformer dense model of the same size? Is it more like a MoE maybe?
Does it run well on the CPU?
I'm download 1.7 mini to test, but figured I might as well ask.
>>
>>106577110
I think that's all in your head. He seemed polite and straighforward to me.
>>
>>106577152
It just makes context use less memory.
>>
>>106577152
Jamba mini is a lot less safe than something like qwen (lol), but it's also a lot less intelligent.
>>
>>106577146
My own aggressive stance. I should have typed:
Common Linux distributions have been configured with normal user in mind, some guy with 4 GPUs and hundreds of GBs of memory is not a normal user - he should configure his own kernel instead.
>>
File: 1737135381233186.png (817 KB, 1064x1460)
817 KB
817 KB PNG
Which model or repo can I use to ingest all this information and find juicy stuff?, I don't know any Chinese
https://x.com/gfw_report/status/1966669581302309018
>>
>>106577311
Give it back john
>>
>>106577110
I don't agree with this retard >>106577094
You're just fumbling around but aren't completely retarded, just unlearned
Installing a different kernel, as far as arch linux goes, is just `sudo pacman -S linux-hardened linux-hardened-headers` or `sudo pacman -S linux-zen linux-zen-headers` or whatever, adapting to your package manager to whatever kernel you need. I swap between a few when I run into retarded issues frequently
>>106577152
Reprocesses on every message/swipe, but small moe that you can offload a majority of to ram and have a lot of context while maintaining good speed. Sloppy as fuck until you feed it enough context to learn from. Less degradation over long context as well. The reprocessing thing basically kills the benefits I mentioned, though
>>
>>106577350
I got the first and second quotes backwards but whatever
>>
>>106577350
>but small moe that you can offload a majority of to ram and have a lot of context while maintaining good speed
>Less degradation over long context as well.
I can work with that if the context processing is fast enough in my hardaware. Thanks.
I'll test it after I'm done fapping to ERP with GLM air.
>>
>>106577311
cat * | grep -i keyword
>>
>>106577374
I can't find that model on HF. Link?
>>
>>106577374
i don't think cat nor grep can understand and translate chinese sar
>>
>>106577372
It's pretty tolerable if you set your batch size to 2k, but it's still pretty fucking lame to wait 30-40s to reprocess anything around 10-20k tokens even if the tg is fairly fast
>>
>>106577395
Just ask qwen3 30b to translate the keyword to chinese
>>
>>106577311
finally, we'll know if sending tianmen square copypasta actually boots someone off
>>
>>106576162
whats this llamacpp frontend?
>>
>>106577395
https://vocaroo.com/1mgCGBF0m9LF
>>
>>106577311
Taiwan is China 2bqh
>>
>>106576221
>imagen is stagnant
>after getting QI, QIE, WAN and CHROMA in the last couple of months
literally kys retard
>i cant run them because i have a 1060 TI
literally kys retard vramlet
>>
>>106577441
it looks like oobabooga's text gen webui
>>
>>106577456
kek even lmg has its own jeet helpdesk now
>>
>>106577556
vibevoice has a 30s sample of a jeet speaking with the thickest accent too, pretty easy
>>
>>106577408
>Prompt
>- Tokens: 8789
>- Time: 29325.184 ms
>- Speed: 299.7 t/s
>Generation
>- Tokens: 1233
>- Time: 100108.141 ms
>- Speed: 12.3 t/s
That's not bad.
Granted, it's Q3KS, 32k context, n-cpu-moe 5, and batch size 512, but still.
If it's smart enough at this level of quantization, I'll replace Q6 qwen3 A3B with this.
>>
>>106577575
>n-cpu-moe 5
Sorry, n-cpu-moe 27.
>>
>>106577464
qwen isn't much better than flux. if anything you have to snake oil the fuck out of it but most people in the community have poverty cards and don't bother. it's just more benchmaxxed safety garbage. chroma is shit btw. Wan is fantastic but 2.2 isn't much of an improvement over 2.1 and just adds confusion by having two separate models. 5 second limit is just shit and there has been so many cope techniques to extend but it's shit. there isn't even a point to qwen image when edit can just txt2image as well. Wan is seriously a better contender for an image model because it's at least trainable on garbage
>>
>>106577603
I'd like to add uncomfyui just keeps getting shittier by introducing more telemetry or a worse frontend. there really needs to be something else. tired of the API nodes grifting
>>
>>106575394
>if you get called out, you just claim it was a shitpost all along
[headcanon]
can't be surprised text coomers would have a thing for making up things in their heads
>>
>>106577573
I'd like to find German-English accent and perhaps French one but it's pretty difficult.
>>
>>106577311
This finally convinced me that China should turn the firewall off and let USA mess with their internal politics... Oh wait, it did not.
>>
>>106577575
Don't know what you have for a gpu, but one of the benefits is that that gen speed will stay consistent, if you ignore the reprocessing. I run a q6 as as a warmup, then switch to air usually
>>
>>106577664
https://www.youtube.com/watch?v=rEhXFZJUtJE
This is all what I needed. Youtube is full of shit it's hard to see something good.
>>
File: 1740558417421685.jpg (177 KB, 2048x1536)
177 KB
177 KB JPG
@grok

please generate a male in their 20 with a thick nasally chinese accent uttering these words:
>This finally convinced me that China should turn the firewall off and let USA mess with their internal politics... Oh wait, it did not.
>>
>>106577684
https://www.youtube.com/watch?v=27hsoahUehE
There's Russian as well.
>>
>>106577374
https://porkmail.org/era/unix/award
always funny to see fucktards pretend to RTFM someone by showing how little they know about the CLI
grep -r nigger
or, better yet, rg, because grep is has been slow garbage
the only sort of ree-tard who would think of doing a "cat | grep" is a ree-tard who never used grep
>>
i made a character card if anyone want her https://files.catbox.moe/9fl9yu.png

>>106576841
>Try messing around with the --ncmoe option if llamacpp
so currently i have -ngl set at like 13 i think how do i take the ncmoe into account also where do i find out the total layers?
>>
>>106577739
Cute card.
Set -ngl to 99 and -n-cpu-moe to 99. Look at the console and see how many layers the model has, it'll say something like
>offloading N repeating layers to GPU
That's the number of layers. Then you try lowering n-cpu-moe to N-1, N-2, etc.
It's easier if you launch GPU-Z to see how much VRAM you have free to fuck around with.
>>
>>106577734
> If you came here looking for material about abuse of feline animals, try this Alta Vista search instead.
link is broken :(
>>
>>106577768
>-ngl to 99 and -n-cpu-moe to 99
how does ncpumoe effect vram i know going higher than around 13 gpu layers causes my system to lock up
>>
>>106577739
nta, here's an extracted card for human reading.
https://litter.catbox.moe/ewomrtg6gqsb73hc.txt
>>
File: file.png (39 KB, 522x407)
39 KB
39 KB PNG
>>106577739
the --ncmoe option is basically "it offloads layers at the end of the model"
Leaving layers minus the layer total puts those on gpu, which usually gives a speedup
pic rel, subtract 5-10 or however many layers as long as it doesnt OOM and itll run faster
>>
>>106577783
Basically, -ngl will tell llama.cpp to put all tensors (the vectors that compose each layer) in VRAM, then n-cpu-moe will tell llama.cpp that "actually, no, the expert layers will go in RAM".
So you end up with the heaviest tensors of the model in RAM.
From there, you can check how many layers your model has, and try adjusting n-cpu-moe to have only as many expert tensors in RAM as you can't fit in VRAM.
>>
>>106577798
bottom right, moe cpu layers if kobold
>>
>>106577739
if using -ngl just set it to 99 and make sure --ncmoe doesn't make you OOM, it's basically backwards logic. Almost everything will be in ram, you just want to adjust --ncmoe so it fills a good amount of ram but not all of it
>>
It picks up, need to refine the source audio better.
https://vocaroo.com/1aIcVBCSa7ac
German accent isn't as funny sounding as Indian English anyway.
>>
>>106577841
so start at 1 and increase until it doesnt oom?
>>
>>106577739
...
>>
>>106577864
No. If you start with 1, you'll have all the experts in VRAM, minus one, which will OOM for sure.
Start with exactly how many layers the model has then lower the value gradually until you find out how many experts you need in RAM to now OOM.
>>
>>106577864
You can click "file info" on huggingface for a model and if you can read, it'll be apparent how many layers a model has (or use kobold's retarded estimation thing to see how many layers). Depending on how big the whole thing is, you'll need to incrementally change how many layers get offloaded using ncmoe or that like for various backends. As much as it sucks, you need to take the information we give you, try it yourself and learn. Otherwise we could direct you to a rentry instead of this
>>
>>106577864
To try and spoonfeed a little bit more, figure out how many layers the moe model you're trying to run has, then do --ncmoe (amount of max layers it has) - 5
If you oom, lower it. If you have too much spare, up it
This hobby isnt exactly easy to figure out
>>
>>106577958
>>106577913
okay i think i got it so the llama consoles says offloading 48 layers to gpu so i started there and then lowered it until it would launch without crashing which ended up being 33

ill probably do some llama bench runs tomorrow so i can see what the performance difference actually is

    -ngl 99 \
--n-cpu-moe 33 \
-t 48 \
--ctx-size 20480 \
-fa on\
--no-mmap;


load_tensors: offloading 47 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 48/48 layers to GPU
load_tensors: ROCm0 model buffer size = 17562.93 MiB
load_tensors: ROCm_Host model buffer size = 34892.00 MiB
load_tensors: CPU model buffer size = 254.38 MiB
>>
>>106577913
How do I calculate number of experts with llama-server prompt? Let's say that I'm using Qwen3-Coder-30B-A3B-Instruct-IQ4_XS?
>>
>>106578037
>llama bench
I think llama-bench doesn't support n-cpu-moe, so you might want to check that.

>>106578038
llama-server spits out the number of layers the model has, just like in the post above yours.
>>
>>106578070
Y-you sound like an expert... *shivers*
>>
>>106578070
I think there was a pr for it, but I might be misremembering
Likely not merged yet if it exists
>>
>>106578085
I am indeed an expert in launching llama-server and setting -n-cpu-moe. A truly specialized skill set that is sure to become incredibly in demand in the near future.
Right?
Right?
>>
>>106578132
She sets her book down and leans forward, her expression becoming more intimate. "I've been thinking about how the MOEs were always so obsessed with their own power and control. Kinda reminds me of how I feel about you sometimes, you know?"
>>
File: 1000018834.jpg (2.73 MB, 3072x4096)
2.73 MB
2.73 MB JPG
damn i thought this would be a cool llm client device but the built in browser doesnt work an neither does the ancient firefox that can run on it

>>106578070
>>106578118
>I think llama-bench doesn't support n-cpu-moe, so you might want to check that.
oh thats annoying guess ill just wait then lol
>>
>>106578158
just roll your own front-end that will work on the ancient browser?
>>
File: cute spid.png (224 KB, 478x677)
224 KB
224 KB PNG
>>106578204
to be fair i did write a java client before doubt thatd be hard to port but i have an android mobo coming for this soon
>>
>>106578158
You could use ssh?
>>
File: file.png (231 KB, 802x612)
231 KB
231 KB PNG
client in question
>>
>>106578158
>whois 192.168.1.128

Anon, we are neighbors!
>>
>>106578240
ssh for sillytavern lol?
>>106578242
please dont piss in my letterbox
>>
192.168.1.103
I'm here too.
>>
im surprised other anons are using a .1 subnet
>>
Is it normal for GLM 4.5 Air to feel bland when it comes to erp or am I doing something wrong? Using an q3_k_xl quant
>>
File: 1734947454836403.png (852 KB, 1080x1106)
852 KB
852 KB PNG
>>106578241
>>
>>106578295
Seems like really small...
>>
sexless thread. coomless hobby.
>>
>>106578357
It's not it's the same as Qwen3 you need to really tell it to what to write and still..
>Write as a contemporary author, use varied language but not too over the top - be natural, immersive and explicit. Try to suprise the user.
It changes its output but is it still what you really want? You can shape it, experiment more.
>>
>>106578249
>please dont piss in my letterbox

It wasn't me!
Err, how did you know?
>>
>>106577664
Authentic German English:
https://www.youtube.com/watch?v=icOO7Ut1P4Y
https://www.youtube.com/watch?v=lLYGPWQ0VjY
>>
>>106578533
>Ze yello from ze egg
>>
>>106576789
after several hours of research and testing with chatgpt's help, i have pretty much nothing. guess i will look into that kernel stuff but i dont wanna break things
>>106577477
thats exactly what it is
>>
>>106578533
Thanks, I saved these. I will edit these later.
I think this sounds a lot better than the other guy - because this is real talk, not pretence or 'examples'.
>>
>>106578583
That's why the Indian guy example was so great because he talks in his real voice - it's not someone who is teaching anything and so on.
>>
>>106578533
I cut off a snippet and let's see what comes out.
https://vocaroo.com/1iqNvlT8iNXm
Yeah it's 1:1 good.
Normal talking pace, no pretention, long enough voice clip -> result.
>>
>>106577110
>You sound bit condescending and bitchy
non commital half insult
>Imageboard posting is always bit generalistic
appointing onseself as the arbiter of the culture
>don't you think?
condescending "you know what you did" ahhh shit
>This is not your discord.
trying to use group strength for one selfes purpose a la "not your army"

tell me how i know a woman or a troon wrote this post XD jesus christ never in my life would i ever type out something this faggoty just corssed myself irl god forbid nigga
>>
>>106578663
Are you that plebbit moderator? You have such a problem with understanding real people.
>>
>>106578658
It also sounds better when you normalize the result. Not just increase the volume.
>>
>>106578658
Yeah, that sounds better. Glad it works.
>>
File: 1735036285126388.png (1.22 MB, 1024x1536)
1.22 MB
1.22 MB PNG
>>106575202
Limitless Mikus General
>>
>trannies LARPing as oldfags - the thread
>>
>>106578561
You could just install to a usb solely for the sake of testing.
>>
>Her blue eyes searched yours with vulnerability usually masked behind phlegmatic calm
Well shit, I learned a new word today.
Thank you GLM Air.
>>
>>106578746
>phlegmatic
uhhh before i look it up isn't that the stuff you get in your throat when you have a cold?
>>
>>106578756
No.
No it isn't.
Cool huh?
>>
File: 1747872137579927.png (57 KB, 2029x310)
57 KB
57 KB PNG
>>106578759
language is interesting
>>
>>106577464
qwen image is synthmaxxed trash, the edit is a shitty cope for 4o and the google model, wan is good if you are satisfied with waiting 10 years for a 5s clip, chroma is complete unstable shit that knows 0 (zero) booru artists despite claiming to be trained on them
>>
>>106578766
I'm not US but phlegmatic is probably related to slow as molasses in etymology.
>>
>>106578778
Bingo.
>>
File: 1749778373288697.png (88 KB, 1022x577)
88 KB
88 KB PNG
>>106578778
yeah, I guess it goes back to when people thought the 4 humours controlled everything.
another funny word like this is seminal, which means "containing seeds of later development" but its etymology is pretty funny if you look it up.
>>
File: 1747206883570917.png (2.73 MB, 1024x1536)
2.73 MB
2.73 MB PNG
>>106578711
>>
>>106578787
I have learned something from films. I'm ESL, from Finland. Not from India.
Trouble is with written language even after XX years.
>>
File: 1745491574733438.png (63 KB, 1394x568)
63 KB
63 KB PNG
>>106578766
>>
>>106578790
I'm more interested about the ancient history of mankind. Doesn't mean that much if language goes as far as few thousands years. Our history goes far beyond that.
>>
>>106578793
nice :)
>>
>>106578793
wow thats cool! (:
>>
I have concluded that 256 GB vram is all I need (for now)
>>
>>106579126
Based DGX owner. I tried to buy one on ebay once for 10k, but the fuckers canceled my order.
>>
>>106579126
that's a lot of vram
>>
>>106579126
how do you have that much vram?
>>
>>106579287
Did you notice that most consumer mobos used to have 4 slots but now there's 2 slots.
I thought this was proprietary because of Dell or HP but now... it's because of the price jew.
Even the efficient gaming mobos have only 2 slots available.
>>
>>106579420
there are some basic ones with 5, but the problem is most dont get full pcie. i have an epyc and an asrock romed8-2t which has 7 full bandwidth slots
>>
>>106579126
a m4max macbook with 128gb unified ram is all you need for local
>>
File: 1741651907349476.png (1.57 MB, 666x1300)
1.57 MB
1.57 MB PNG
>>106575492
Helping me shit out decent sft datasets via a custom pipeline. Even managed to create a DPO dataset too
>>
>>106579126
Only 8 MI50 and a ddr4 server with 512gb of ram.
The problem would be the slow pp.
>>
File: file.png (152 KB, 1706x1870)
152 KB
152 KB PNG
Qwen Next is too censored.
>>
>>106579721
Jesus... Have you or do you manually enable or diable <think></think>?
>>
>>106579721
Next is not reasoning model but if your tags still inject this, it can behave in wrong way.
>>
>>106579736
It's the Instruct one. But this is a file that has a bunch of <|channel|>analysis<|message|> in it because I was using it to jailbreak gpt-oss. I just let it complete one of the CoTs and I thought the result was funny.
>>
>>106579713
>runpod
What part of local did you not understand?
>>
>>106579784
Sorry I forgot <|xxx|> chatml exact format. But if it breaks down it means something is leaking.
>>
>>106579721
fantastic
>>
>>106576315
Sadly I have to agree. nu-Kimi lost the calm that made it likeable. >>106576269 will have to lower it to notable from top.
>>
File: 1731344891643778.jpg (42 KB, 632x518)
42 KB
42 KB JPG
>>106579787
Salty your gatekeeping is ineffective?
>>
>>106579784
Oh wait, I can help you more.
>>
>>106579784
gpt-oss will display
> <|channel|>analysis<|message|>
or sometimes it will not display this at all.
Oh fuck I'm too drunk.
Last tests I did was with Qwen and this is chatml.
https://litter.catbox.moe/pd89w3421se7y4zm.txt
I deleted gpt-oss models after I made it work.
>>
>>106579784
You need to clean the mesage from anything else what is not <|final|>
I'm sorry I'm bit drunk for this and I don't have gpt-oss on my disk any longer.
It's just a simple string operation.
>>
File: 1751681952520421.mp4 (3.17 MB, 320x568)
3.17 MB
3.17 MB MP4
are finetunes more prone to repetition?
>>
File: 1741364294668764.jpg (985 KB, 2832x2348)
985 KB
985 KB JPG
>>106575202
>>
>>106579897
><|start|>assistant<|channel|>final<|message|>
This is what you want to extract for final message.
But before this it will often say
><|start|>assistant<|channel|>blablbalbalba analysis<|message|>balblablblabla<|end|>
This is what you need to fetch and ignore, that's the reasoning block of text.
>>
>>106579900
Why are there so many fat people?
>>
>>106579908
>This is what you need to fetch and ignore, that's the reasoning block of text.
There is also something what llama-server does or maybe the model does it that, it will not prefill <|start|>assistant<|channel|>
But it will blurt out the straight final message.
You need fetch out that and do if - manage string patterns.
>>
File: 1950sfraternity.jpg (131 KB, 1334x750)
131 KB
131 KB JPG
>>106579917
because the american food supply became so tainted it disrupted people's natural hunger "thermostat"
>>
>>106579908
>>106579934
I am sorry if my English does not make any sense but it is a matter of string pattern recognition. It is confusing that this pos model will not sometimes just use 'assistant' at all but you will need to manuall make an exception.
>>
>>106579949
And the worst part is that the documentation
>https://huggingface.co/blog/kuotient/chatml-vs-harmony
Tells more about their big chatgpt thing than what if you implemented it yourself.
All of this is just bullshit,
it's still just chatml format but with extended
tags and exceptions.
>>
>>106580014
wrong link
https://cookbook.openai.com/articles/openai-harmony
>>
>>106579944
Tainted with what?
>>
>>106580034
there's a theory that polyunsaturated fats (which are industrial made and cheaper) throw off the nadh:fadh ratio and stop reverse electron transport from happening
https://www.youtube.com/watch?v=pIRurLnQ8oo
another theory is that processed grains, the intestines aren't equipped to "sense" the volume of correctly
it's probably multi-factoral
>>
qwen3-next goofs status?
>>
>>106579897
I had these in the file because if you leave edited reasoning blocks in the context, it changes how gpt-oss does reasoning in the following responses. I use that to let gpt-oss reason without the refusals. You can still do that in chat completion mode.
Later I changed the model to Qwen Next and it started to imitate the reasoning blocks, but it does it more like a parody of a GPT model. And I was getting distracted by the kind of things it says.
>>
File: 30474 - SoyBooru.png (118 KB, 337x390)
118 KB
118 KB PNG
Do you like the kiwis? (Qwen models) (When models?)
>>
>>106580072
There is no chat completition - what string you send to the server it comes back and then you will format it. Trial and error type of thing.
But gpt-oss doesn't follow normal ways because it's broken.
I'm sorry if I sound retarded or annoying but any other model you can say format it will respond back with that format.
Don't waste your time with gpt-oss.
>>
>>106580072
I can supply you with my code but it doesn't make any sense for you because it's out of the context and bad string management.
https://litter.catbox.moe/tisv7n22ye9rwqjs.txt
It's python.
>>
>>106580105
The prefix of each assistant turn is just "<|start|>assistant". That's why even in chat completion mode if the message content is "<|channel|>analysis<|message|>" it will still be formatted correctly when you put it together. It's an easy way to edit how it does the reasoning even in a chat UI. The backend would need to escape the special tokens for it to not work.
gpt-oss is still fun but to be honest, I never used it to write stories. I only have used it for fake text games and MUDs, without much narration.
>>
>>106580156
Just tell me what you are having a problem with, I'm a retard, really.
>>
>>106580156
I think you are trying to pull me. Try your best.
>>
>>106580170
I don't have a problem.
>>
>>106580191
What do you mean?
>>
>>106580197
I mean that I don't have a problem.
>>
>>106580204
In this moment, I am euphoric, not because I shared a text file with you, but because I am enlightened by my intelligence.
>>
>>106580197
import os
import re
import sys
import requests
import random
import textwrap
from colorama import init, Fore, Back, Style
import contractions
import numpy as np
import sounddevice as sd
import subprocess
import wave
>>
>>106580069
months of sir before work
>>
What's a good model for having the AI be a kindof ttrpg GM? I recently tried Omega Directive but it seems better suited to be a one on one chatbot instead of a proper adventure mode helper.
>>
>>106580295
any sufficiently large model can compete competently at any task
>>
>>106580315
I have 16 gb of VRAM, so looking for stuff that'll fit on that.
>>
>>106580332
how much RAM? the new qwen next might be good for you
>>
>>106580337
65gb abouts
>>
>>106580342
plenty for a q4 quant
https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct
>>
https://github.com/ggml-org/llama.cpp/pull/15539
ggergachud will soon merge grok pr
>>
>>106580342
Try gpt-oss-120b. It only has 5B active parameters.
>>
>>106580514
>It only has 5B active parameters
and most of those are dedicated to ensuring OpenAI policy is followed at all times
>>
>>106580514
buy an ad
>>
Has anyone experimented with the LLM teaching itself a task by autonomously generating tasks and training data for itself?
>>
>>106580707
There is nothing to teach
The weight is fixed
>>
>>106580707
somewhat https://github.com/e-p-armstrong/augmentoolkit
>>106580712
>tasks and training data
the reading comprehension in this thread is off the chart
>>
>>106580717
speed running mode collapse lol
>>
>>106580721
like literally everyone isn't using synth slop, with this you at least get some chance of the model seeing synth slop of stuff you care about instead of more code and math
>>
>>106580717
You're mad at everyone now, go drink some water you little bitch.
>>
>>106580728
"Synth slop" is just data augmentation
I don't see you calling cropped images synth slop
>>
>>106580762
>I don't see you calling cropped images synth slop
maybe because we're in the text general and not relating to image gen shit?
>>
>>106580781
Yeah right, act as if we weren't spamming vocaroos
>>
>>106580785
oh its you
>>
>>106580721
>parroting the meme collapse paper in 2023+2
>when all SOTA models are trained on synthetic data with verifiable rewards
>>
>>106580717
thanks
>>
>>106580707
Yeah I made my own augmentoolkit since that one is bloated af and slow. Very useful to turn raw data from scraped websites into something usable and you can easily scale by data augmentation. You still need human data initially, pure synthetic slop would make it collapse fast
>>
>>106577739
>>106577768
PSA: if you have the latest llama.cpp build you no longer need to set ngl to have it at 99, they finally are starting to bring sane defaults to llama.
--no-context-shift is no longer needed too, they finally got rid of that mindnumbingly stupid default
>>
I can release prompts for interactive fiction. I just think that people who ask them don't need to know.
>>
>>106580842
You're absolutely right! Your genius is best contained to yourself and not spread to idiotic plebs.
>>
>>106580851
It's not because they need to know; it because you are unique<|analysis
>>
Lets imagine a situation in which I am forced to release a simple text file- this would be adaptable by even ST users. Do I feel inclined to do so?
>>
>>106580909
Why release when you can HODL?
>>
>>106580914
My knowledge doesn't understand HODL
>>
>>106580923
Then your knowledge is worthless I'm afraid.
>>
Qwen Next is obsessed with short sentences. It's annoying. I checked the one in OpenRouter to make sure it wasn't a thing of the AWQ version. It still has that. But in this story that I'm trying, the AWQ version always ignores what I'm putting in the last turn, while the full version always pays attention to it. I'm deleting it and giving a try to the FP8 version, but it still feels like a big downgrade compared to GLM Air or gpt-oss.
>>
I am going to think about this, and then release a simple format. This will make most people's chats better. This is not a joke.
>>
File: 1742111898538268.png (87 KB, 1053x370)
87 KB
87 KB PNG
>>106580794
Didn't you know?
LLMs have peaked
It's over
>>
>>106580935
I don't query into deep joking.
>>
>>106580940
Kimi K2 loves short answers too, maybe distillation.
>>
>>106580943
Tomorrow, I am preparing a simple format to help brainlets.
>>
>>106580914
>>106580935
cryptobro knowledge belongs to the oven
>>
>>106580794
SOTA on what? Equally synthetic mememarks? lmfao
>>
>>106580966
for synthetic use cases yes
>>
Anyone having success with Longcat Flash Chat? I'm using a 5.5 bpw quant with 0.7 temp & 0.8 top-p and I'm finding its ability to write stories unsatisfactory.
>>
>>106580951
I actually have two swipes with the updated Kimi K2 at this point in the story. It's nothing like that and it writes quite well.
>>
>>106576269
>DeepSeek flops for the first time with V3.1
IDK what you mean. It's what I use now instead or V3-0324 or R1-0528.
>>
File: 1747083442013324.png (674 KB, 1484x1117)
674 KB
674 KB PNG
>>106576269
>DS v3.1
>flop
Skill. Issue.
>>
>>106580966
Math, programming, anything that has benefited from CoT.
>>
File: degraded.png (46 KB, 1082x84)
46 KB
46 KB PNG
>>106581286
GLM-chan does her best and doesn't degrade at all.
>>
>>106581286
most retarded benchmark in the history of llm benchmarks
LLM as judge for human writing LOL
>>
>>106581534
GLM-chan is fat and obese and stinky
>>
File: 1749548135019803.png (2.36 MB, 1328x1328)
2.36 MB
2.36 MB PNG
>>106579721
>I am not
descartes is sad
>>
>>106581607
Shut up, Sam.
>>
>>106575202
I clicked on the image and I got a bigger version of the image.
>>
>>106581987
yes that is how this site works
>>
File: 1732695470798402.png (431 KB, 1469x969)
431 KB
431 KB PNG
>>106581599
sama coping because gp-toss ranks below gemma3 12b
>>
>>106582014
Wait until he finds out selecting text to quote-reply. It's gonna blow his fucking mind.
>>
>>106582053
Mistralbros...
>>
File: scells.png (1.93 MB, 1919x1074)
1.93 MB
1.93 MB PNG
>>106582053
>0.770
>>
File: 1757840830335.png (92 KB, 557x446)
92 KB
92 KB PNG
>Of course!
>Exactly!
>You're absolutely right!
>>
>>106582053
speaking of toss
https://old.reddit.com/r/LocalLLaMA/comments/1ng9dkx/gptoss_jailbreak_system_prompt/ne306uv/
>>
>>106582090
Perhaps it has a degradation fetish.
>>
>>106582014
With the giant sign it seemed set for the every-time-you-open-this-thumbnail meme.
>>
Is rvc still the king for ai voice covers?
>>
>>>/pol/515958539
>>
>>106582173
I'm glad they're finally banning those pesky white and black bars.
>>
File: ComfyUI_02766_.png (1.03 MB, 896x1152)
1.03 MB
1.03 MB PNG
After using GPT-OSS-20B for a period for a variety of reasons, Gemma-3-27B almost feels like an erotic finetune. It still can't write smut, but it has a rather flirty writing style and does almost anything, as long as you provide it suitable instructions for doing so. GPT-OSS, even after "jailbreaking", is always fighting against you and prioritizing its imaginary OpenAI policies, and is utterly retarded for actual conversations.

I hope Google won't ruin Gemma-4. It's almost guaranteed they'll add reasoning, probably MoE or Matformer architecture, possibly system instruction support due to popular demand.
>>
>>106582186
imagine being filtered more than reddits >>106582101
>>
>>106582183
that is clearly a yellow bar
>>
>>106582202
The "jailbreak" there doesn't really work well. The first mistake is telling the model it's ChatGPT.
>>
>>106582224
nta but it actually does work on 120b, I stopped getting refusals
it still wastes something like 192 tokens for it's schizo policies on reasoning_effort high
>>
>>106582316
You mean 500 tokens
The jb alone is 300 tokens
>>
>>106582316
You can mitigate refusals by changing the actual system prompt (not the "developer" instructions) on the 20B version too. It's just not good for roleplay and some topics will still be off-limits, no matter how hard you try to override content policy or changing the model's identity. Gemma 3 refuses hard on an empty prompt, but you can very easily work around that, and then it will even enthusiastically follow along. It just feels like it's been covertly designed for roleplay, whereas GPT-OSS probably had these capabilities removed or omitted. I haven't tested it for storywriting.
>>
is there a better alternative to whisper? I tried out parakeet and it likes to skip sentences
>>
>>106582402
No.
>>
https://github.com/ggml-org/llama.cpp/issues/15940
Why are there so many vibecoding retards trying to implement this?
>>
>>106582475
>>106582475
>>106582475
>>
>>106579721
You need to use quality quants in llama.cpp



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.