/g/ - Technology


Thread archived.
You cannot reply anymore.




/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108918777 & >>108911101

►News
>(05/21) Hy-MT2 “fast-thinking” translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai/blog
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrincap.png (1.31 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>108918777

--Comparing Qwen and Gemma coding performance and SWA context optimization:
>108920547 >108920577 >108920644 >108920697 >108920718 >108920727 >108920740 >108920744 >108920757 >108920782 >108920799 >108920787 >108920789 >108920804 >108920769 >108920780 >108920764
--Speculating on Gemini Nano's Gemma roots and unreleased 124B model:
>108920911 >108920917 >108920927 >108920939 >108921007 >108921159 >108921198 >108921215 >108921279 >108921353 >108921378 >108921546 >108921590 >108921598 >108921622 >108921647 >108921666 >108921682 >108921702 >108921493 >108921581 >108921281 >108921925 >108921053
--Tokenization's impact on math and the viability of token-free architectures:
>108921743 >108921791 >108921797 >108921836 >108921843 >108921860 >108921883 >108921891 >108921920 >108922040 >108922052 >108922092 >108922147 >108922165 >108922126 >108921855 >108921869
--Debating SWA and architecture impact on Gemma's long context performance:
>108920850 >108920865 >108920898 >108920905 >108920924 >108920943 >108921050
--ReAligned-Qwen3.5 release aiming to reduce Chinese censorship and bias:
>108918844 >108918885 >108918964 >108922479 >108922542 >108922651 >108922470 >108922530 >108922788 >108922865 >108922874 >108922907 >108922961 >108923000
--Comparing draft model performance and imatrix effects on Q8_0 quants:
>108919434 >108919517 >108919581 >108919593 >108920178 >108919677
--Starlette framework vulnerability affecting VLLM and FastAPI-based tools:
>108923332 >108923346
--Anon asks about CUDA 13.3 performance gains for RTX 3060:
>108922218
--Logs:
>108919744 >108919786 >108920002 >108920156 >108920225 >108920308 >108920483 >108920547 >108920911 >108921177 >108921195 >108922035 >108922573 >108923453
--Miku, Neru (free space):
>108919814 >108919833 >108922470

►Recent Highlight Posts from the Previous Thread: >>108918836

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
How can I distribute the vision part of a model across all? Or is that not possible with llama.cpp?
>>
>>108924932
across all of what?
>>
File: len.jpg (263 KB, 850x1202)
>>108924918
for me, it's len
>>
>>108924965
Sorry, gpus. Right now, whenever I send an image, it does everything on a single gpu.
>>
>>108924990
Just adjust your split ratio to compensate for the main gpu having the multimodal on it.
-ts 0.8,1 is a good place to start but tune that to your gpus/model.
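A rough way to pick a starting `-ts` value like the one above is to weight each GPU by the VRAM it has left after whatever else it holds. Minimal sketch only: the VRAM numbers and the amount reserved for the multimodal projector are made-up assumptions, and the result is just a starting point to tune by hand.

```python
def tensor_split(vram_gib, reserved_gib):
    """Return per-GPU split proportions, scaled so the largest is 1.

    vram_gib:     usable VRAM per GPU (hypothetical numbers)
    reserved_gib: VRAM per GPU set aside for other things, e.g. the
                  multimodal projector sitting on the main GPU
    """
    free = [v - r for v, r in zip(vram_gib, reserved_gib)]
    if min(free) <= 0:
        raise ValueError("a GPU has no VRAM left after the reservation")
    biggest = max(free)
    return [round(f / biggest, 2) for f in free]


# Two 12 GiB cards; the main GPU keeps ~2.4 GiB for the projector (assumed size).
ratios = tensor_split([12.0, 12.0], [2.4, 0.0])
print("-ts " + ",".join(str(r) for r in ratios))
```

Scaling to the largest card keeps the numbers in the same shape as the `0.8,1` suggestion; only the proportions matter.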
>>
File: openslopui.png (22 KB, 785x220)
i noticed open-webui is slow af in firefox sometimes
turns out it's the update check (blocking)
"local" ai had an undisclosed hard dependency on GitHub's uptime
explains the literally 5 minute page load i had when github shat the bed entirely (300s default timeout i guess)
ENABLE_VERSION_UPDATE_CHECK=false

fixes it if openwebui is newer than july last year.
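If you start it from a script, the variable can be set on the child process only instead of globally. Minimal sketch; the variable name is the one from the post, while the launch command in the comment is an assumption, so adjust it to however you run open-webui.

```python
import os


def webui_env(base_env=None):
    """Copy an environment and turn off the blocking update check."""
    env = dict(os.environ if base_env is None else base_env)
    env["ENABLE_VERSION_UPDATE_CHECK"] = "false"
    return env


env = webui_env({"PATH": "/usr/bin"})
print(env["ENABLE_VERSION_UPDATE_CHECK"])

# To launch with it (command is an assumption, not taken from the post):
# import subprocess
# subprocess.run(["open-webui", "serve"], env=webui_env())
```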
>>
Rinlove
>>
>>108925058
That's hilarious.
>>
>>108925058
>"local" ai had an undisclosed hard dependency on GitHub's uptime
Wouldn't expect anything less from ollama-webui.
>>
>>108925009
Running with split mode tensor. I have the vram headroom, the issue is that the single gpu is doing all the work during imaging, and tokens per second goes from 600 to 150.
>>
>>108925058
lmao
>>
>>108924932
>>108925132
I implemented -sm tensor as part of llama_model and llama_context which are used for regular models.
However, multimodality is supported via the mtmd module which uses different code.
So -sm tensor support has not been implemented and quite frankly there are still so many things with it that don't yet work properly that it's not a priority.
>>
>>108925155
Are there any plans to make -sm tensor work with -ctk and -ctv?
>>
>>108925168
https://github.com/ggml-org/llama.cpp/pull/23792
Feel free to report if you manage to provoke issues with it.
>>
>>108925172
You really are my hero CUDA dev
>>
Somewhere, a cat barked.
>>
File: 0.jpg (12 KB, 480x360)
https://github.com/OpenBMB/MiniCPM-Desk-Pet

has anyone tried it yet?
>>
>>108925172
Seems to be working great on CUDA so far, but vulkan splitting across both nvidia and amd cards (2x 3060 1x 9060xt) is segfaulting at model warmup, verbose doesnt seem to be giving any extra info but my command is
./llama-server -m ~/models/gguf/gemma-4-26B-A4B-it-heretic-ara.i1-Q4_K_M.gguf -mm ~/models/gguf/gemma-4-26B-A4B.mmproj-f16.gguf -t 16 -c 131072 -fa on --backend-sampling -ngl 99 --host 0.0.0.0 -ctk q8_0 -ctv q8_0 --reasoning off -np 2 -sm tensor --verbose
>>
>>108925301
As of right now -sm tensor is broken with Vulkan, and if it isn't the performance is terrible anyways.
That is one of the things that don't yet work properly that I alluded to earlier.
>>
>>108925315
Got it, I'll keep it on the 3060s for now thanks again.
>>
File: file.png (3 KB, 279x99)
>>108925279
>desktop
>just webshit
every fucking time

anyway it's apparently just https://github.com/rullerzhou-afk/clawd-on-desk with a bundled MiniCPM5 weights downloader
>>
>thinking for x minutes at 4-5 t/s
gemma made me realize just how much running models on ram sucks
going from glm to this I’ve been spoiled
>>
minicpmcockz
>>
>>108925382
I can't even handle thinking at 23 tokens/s (amd gpu). I can't imagine 5 tk/s.
>>
>>108925410
40~50 is the bare minimum for agentic
>>
>>108925410
> 23 tokens/s (amd gpu)
MI50? Or are newer AMD cards just as shit?
>>
>>108924918
Passionate unprotected sex with Rin-chan
>>
>>108925382
mtp support in two weeks brother inshallah
>>
nu-commander support?
deepseek support?
is hobby finally dead?
>>
>>108925467
Dispassionate protected sex with bland generic vocaloid (male) of the week.
>>
>>108925467
Agree.
>>
>>108925464
4 tensor parallel V620s running gemma 4 31b at Q8.
>>
>>108925482
>one bulgarian managed to single-handedly kill the hobby

>>108924805
>2024
>ggerganov refuses to add vision
some things never change
>>
>>108925515
at least he seems like a chill person
maybe slightly retarded but not like a schizo dramafag in transition
>>
>>108925467
Rin is for non-pass users only
>>
I can't find any relevant search results. This is quite frustrating. From where do I download MTP weights for Qwen 3.6 35B? I didn't see anything on Qwen's HF page either.
>>
>>108925553
The Qwen MTP weights should be part of the main model
>>
>>108925578
I was confused because I was expecting it to be external like Gemma's.
Anyways this just proves the fact internet search is an impossible task these days. All you get is about 10 different AI slop generated 'articles' and if you use google, you'll get their own AI slop assistant on top of these 10 AI generated slop articles...
>>
>>108925591
corporate ai slop and jeeti fake articles being the only results when you try to search for info about local ai is part of the "mote"
>>
>>108925607
very FAST even for LONG context with ROCK HARD stability
>>
File: Untitled.png (12 KB, 334x544)
>>108925607
I go make tea every time I send a message.
>>
File: file.png (579 KB, 1280x720)
>>108925541
I like you anon. That is why I am gonna unsupport you last.
>>
>>108925591
Gemma is the only one with external MTP weights so far afaik
Before I would have suggested that a search engine that operates on a whitelist would be the best way to combat SEO spam, but now even that's not viable since most discussion, tech support, and information moved from public forums like BBSes, reddit, and stackoverflow, to gated blackholes like discord.
>>
Claude Opus 4.8 today?
>>
>>108925668
local today?
>>
>>108925662
He reminds me of Chris-chan.
>>
File: Untitled.png (11 KB, 352x544)
>>108925607
And here's the same on two 3090s.
V620s were done with ubatch 2048, 3090s are 512.
>>
File: Untitled.png (11 KB, 350x430)
>>108925607
With 2 V620s ubatch 512
>>
File: wdagaw.png (567 KB, 1765x1008)
Annnnnnnnnnnnnnnnnnnnnnnnnnnnddddddddddddddddddddddddddddddddd it's gone!
>>
>>108925736
FWIW the "ROCm" matrix multiplication code is still fairly unoptimized for AMD.
I only started taking AMD more seriously last year when MI50s came down in price and I prioritized getting the FlashAttention code in order first since that will enable the removal of legacy code.
>>
>>108925702
That is not niggerganov.
>>
>>108925736
Yeah, I went for v620s instead of mi50s because I thought they'd handle pp better (and also because mi50s were only 100 aud cheaper). But tnstaafl lol.
>>
>>108925753
Did you check your email? You didn't respond when I sent you the usual blacked miku goods we exchange.
>>
Is having one GPU with high compute for prompt processing alongside other cheaper slower GPUs with better VRAM/$ for token gen a good strategy or is mixing different devices like that bound to cause issues?
>>
>>108925804
It's a great strategy if you intend to write the software support to take advantage of such a jank setup yourself.
>>
What a snarky cunt.
>>
>>108925836
What additional software support would be needed to run an RTX Pro 6000 along with half a dozen P100s?
>>
>>108925836
>if you intend to write the software support to take advantage of such a jank setup yourself.
Oof.
Alright, thank you for the clarification.
>>
>>108925850
Don't worry, I'll vibe-code up a solution with my qwen 35b a3b iq2_xxs agent.
>>
>>108925804
That's called "disaggregated inference".
https://pytorch.org/blog/disaggregated-inference-at-scale-with-pytorch-vllm/
>>
>>108925337
Web languages can be used offline, retard
>>
It's been months now and I STILL don't understand the OpenClaw stuff. I've been interested in this field since GPT-2 AI Dungeon and yet I feel like an absolute retard.

I've looked up videos on youtube, Nvidia's Jensen calls it the "iphone moment for LLMs" yet still I have absolutely 0 idea what it actually is or does outside of simply looping through pre-set prompts after every time interval.

Can someone explain it to me please?
>>
>>108925901
>outside of simply looping through pre-set prompts after every time interval.
That's literally it. You can also install it on a Mac and talk to it through WhatsApp so all the non-programmer "tech enthusiasts" love it.
>>
>>108925901
You don't need to know if you are not a developer.
>>
>>108925901
You understand the substance, just not the hype.
It's prompts that keep going, combined with irresponsible amounts of tool access, coupled with the ability to yell at it from your messaging app of choice.
It's for the 'I fucking love science' crowd, but the idea of agentic workflow isn't bad in and of itself.
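The "prompts that keep going plus tool access" part boils down to a loop like the one below. This is a generic sketch of the pattern with a stubbed-out model, not OpenClaw's actual code; every name in it (`run_agent`, `stub_model`, the `add` tool, the action dict format) is made up for illustration.

```python
def run_agent(model, tools, task, max_steps=5):
    """Keep prompting the model until it answers or hits the step limit."""
    history = [("user", task)]
    for _ in range(max_steps):
        action = model(history)
        if "final" in action:
            return action["final"]
        # The model asked for a tool: run it and feed the output back in.
        output = tools[action["tool"]](action["arg"])
        history.append(("tool", output))
    return "step limit reached"


def stub_model(history):
    """Stand-in for an LLM call: use the tool once, then answer."""
    role, content = history[-1]
    if role == "tool":
        return {"final": f"the answer is {content}"}
    return {"tool": "add", "arg": "2 3"}


def add(arg):
    """Toy tool: sum whitespace-separated integers."""
    return str(sum(int(x) for x in arg.split()))


result = run_agent(stub_model, {"add": add}, "what is 2 + 3?")
print(result)
```

Swap the stub for a real model call and the toy tool for shell or file access and you get the "irresponsible amounts of tool access" part; the messaging app is just another way to push a task into `history`.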
>>
>>108925058
typical webshitters and their AUTO UPDOOOOOT features
>>
>>108925908
Then what was the hype all about and why was Jensen calling it the iphone moment?
>>
>>108925925
Jensen's job is to increase the price of NVDA shares and he's quite good at it.
>>
>>108925915
>It's for the 'I fucking love science' crowd, but the idea of agentic workflow isn't bad in and of itself.
Software development clients have had agentic features for years already. If it brought anything new to the table, it's agents that run in the background and in parallel instead of sequentially.
>>
What persona do anons give their coding sla..I mean agents?
>>
>>108926038
>>108920002
>>
>>108926061
And the non-cringe version?
>>
>>108926061
King shit
>>
Is anyone here using 4b or 2b models with openclaw or hermes agent to do things?
>>
What is wrong with using ollama?
>>
>>108926222
too smol to do anything but be a cute idiot
>>
>>108926235
Nothing if you know why you are using it and what advantages and disadvantages it has compared to the other possibilities.
>>
>>108926240
I believe you are wrong. I have 1050ti and I need to use it.
The cloud models are hitting rate limits too fast.
>>
>>108926260
qwen3.5 35b on cpu
>>
How is MiniCPM5-1B? Is it just benchmark maxed or actually useful for tinkering? They also released pretrained and ift checkpoints so that's cool for experiments.
>>
File: Capture.png (271 KB, 1492x1060)
>>108925747
Anon, can you really not use a search engine?
>>
File: qwen-pptps.png (12 KB, 1085x669)
Haven't used more than 32K context before, doing some experiments with Qwen3.6-35B-A3B Q8 analysing my Logseq graph of ~220k tokens. Works surprisingly well and fast, better overall than SSD rape with GLM 4.7 at 5 tps generation
Any benchmarks testing longer context retrieval/understanding on open models? NoLiMa in OP is 10 months outdated so irrelevant
>>
>>108925836
>It's a great strategy if you intend to write the software support to take advantage of such a jank setup yourself.
I don't suppose there's room to optimize the RPC server?
It only slowed things down last time I tried it earlier in the year vs CPU offload (trying to use 2 MI50s in another rig).
>>
https://huggingface.co/google/gemma-4-26B-A4B-it-assistant
So can I use the mtp shit somewhere or is llama ignoring it still?
>>
>>108925861
Retard. llama.cpp already supports inference across different devices. With Vulkan, even inference across Nvidia/AMD/Intel cards. You can even set which device to use as the main one for prompt processing.
>>
File: Screenshot.png (53 KB, 862x355)
>>108926347
>https://huggingface.co/google/gemma-4-26B-A4B-it-assistant
https://huggingface.co/Radamanthys11/Gemma-4-26B-A4B-it-assistant-GGUF
Apparently it's merged in ik_llama but last time I checked, no SWA in that fork so not really usable.
>>
>>108926347
https://github.com/ggml-org/llama.cpp/pull/23398
https://huggingface.co/am17an/Gemma4-31B-it-GGUF/blob/main/mtp-gemma-4-31B-it.gguf
Merge it yourself
>>
Nvm, it seems to be in kobold already.
>>
>>108926399
So it loads the model but MTP still doesn't work?
>>
>>108926371
>>108926370
>>108926399
What a shitshow.
>>
>>108926447
You should be used to it by now
>>
Overly opinionated devs that hold back projects should be put in the tard cage
>>
>>108926460
i'd take niggeganov over vibeshitters
>>
>>108926463
But he still lets them through the door it just has to be the ones he likes
>>
>>108926469
at least it's not the inference core
>>
glm5.1 mtp when
>>
>>108926447
Small price to pay to avoid python
>>
>>108926486
>glm5.1
been looking at the hardware you need to run the big boy models at reasonable speeds (+25 t/s, though its probably like 15 t/s once you get closer to the context limit) and these are so fucking expensive, its hard to justify the cost
>>
Has anyone played with tenstorrent cards? They seem like a good deal but I haven't heard anything about it here.
>>
>>108926553
they don't sell to consumers
>>
>>108926486
Just got merged in ik_llama.
>>
File: r u sure.png (306 KB, 2040x1839)
>>108926570
You sure about that?
>>
>>108926570
>2026
>being a consumer
yikes
>>
>>108926610
>$1400
>120 Tensix Cores and 32 GB of GDDR6
>half the bandwidth of a 3090
Fucking Intel would be a better buy
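For context on why bandwidth is the number to look at: token generation is usually limited by how fast the weights can be streamed from memory, so a rough upper bound is bandwidth divided by bytes read per token. Back-of-the-envelope sketch; 936 GB/s is the 3090's spec, the 18 GB of weights read per token is a hypothetical model, and "half" is taken from the post above rather than from a spec sheet.

```python
def max_tokens_per_s(bandwidth_gb_s, weights_read_gb):
    """Upper bound on generation speed if every token streams all active weights once."""
    return bandwidth_gb_s / weights_read_gb


RTX_3090_GB_S = 936.0              # RTX 3090 memory bandwidth
HALF_3090_GB_S = RTX_3090_GB_S / 2  # the "half the bandwidth" card from the post
WEIGHTS_GB = 18.0                  # hypothetical: active weights read per token

fast = max_tokens_per_s(RTX_3090_GB_S, WEIGHTS_GB)
slow = max_tokens_per_s(HALF_3090_GB_S, WEIGHTS_GB)
print(f"3090: {fast:.0f} t/s max, half-bandwidth card: {slow:.0f} t/s max")
```

Real numbers land below this bound, but halving the bandwidth roughly halves the ceiling no matter how many cores the card has.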
>>
>>108925850
In llama.cpp:
-Temporarily move weights/KV caches between any backends not just from CPU to GPU.
-In particular, support for collecting and distributing data between GPUs for -sm tensor because otherwise many old GPUs will be way too slow.
-A way to profile model evaluation time and data transfers between GPUs.
-A way to decide on when to temporarily move stuff to the one fast GPU based on the profile.

>>108926335
There is room to optimize the RPC server but that does not mean that it is a priority for maintainers.
>>
>>108926691
If I get a RDNA 4 GPU and want to use my current RDNA 2 GPU alongside it, can I use WMMA or will I be forced to the slower path to use both? Would be nice if one GPU could use WMMA while the other uses the llama.cpp implementation.
>>
File: file.png (144 KB, 883x859)
>>108926038
You made me go check. I am kinda surprised it didn't go into full preacher mode when I asked it about old HDD's.
>>
>>108926672
I thought the big deal with tenstorrent was the 3tb/s interconnect?
>>
>>108926709
Both GPUs can use different device code for the same compute graphs both for -sm layer and -sm tensor so there should be no issues in that sense.
The problem may be rather that the RDNA4 GPU is being slowed down by having to wait for the RDNA2 GPU.

I'm not sure what you mean by "WMMA" as compared to llama.cpp.
llama.cpp/ggml has support for AMD WMMA instructions in the hipified CUDA code.
>>
>>108926691
>one fast GPU
How would you define a "fast" GPU? Just the available bandwidth and compute at a given time?
>>
>>108926691
Couldn't you bypass the need for the profiling and moving around of weights by just using override tensor and putting the shared experts on the fast GPU and the rest on the slow cards as if they were RAM and get most of the benefit?
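To make the override idea concrete, here is a sketch of how regex patterns could map tensor names to devices. Assumptions, not confirmed by the thread: the tensor names follow the usual GGUF naming for MoE models, the buffer names `CUDA0`/`CUDA1` stand for the fast and slow card, and the first matching pattern wins. It only simulates the matching, it does not call llama.cpp.

```python
import re

# (pattern, buffer) pairs in the same pattern=buffer shape that -ot takes.
OVERRIDES = [
    (r"_shexp", "CUDA0"),  # shared experts on the fast GPU (assumed name)
    (r"_exps", "CUDA1"),   # routed experts on the slow GPU (assumed name)
]


def place_tensor(name, overrides, default="CUDA0"):
    """Return the buffer for a tensor name; first matching pattern wins (assumption)."""
    for pattern, buffer_type in overrides:
        if re.search(pattern, name):
            return buffer_type
    return default


for tensor in ("blk.3.ffn_gate_shexp.weight",
               "blk.3.ffn_gate_exps.weight",
               "blk.3.attn_q.weight"):
    print(tensor, "->", place_tensor(tensor, OVERRIDES))
```

Note this placement is static: wherever a tensor lands, it is used there for both prompt processing and generation.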
>>
>>108926741
I'm talking about compiling with -DGGML_HIP_ROCWMMA_FATTN=ON. I thought one needed RDNA3+ for it to work correctly.
>>
>>108923859
i think it was in grok discussions
>>
>>108926752
See, that's the problem.
It's relatively easy to assume that a GPU will be faster than RAM + CPU so the decision when to move stuff is relatively easy.
But between GPUs it's much less clear which is why profiling data would be needed to make optimal decisions.
The decision would not be based explicitly on hardware specs but rather on measured performance, which captures how those specs actually play out in practice.

>>108926763
You can do some custom -ot shenanigans to optimize tensor placement but that is not what the original question was about.
The original question was about using the high compute of a single GPU specifically for the prompt and using the comparatively slower but cheaper GPUs for token generation afterwards.
For that to work you have to dynamically move data around depending on tensor shapes.
Otherwise you will always be stuck having to do some part of the compute graph on the slow GPUs.
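The trade-off behind "depending on tensor shapes" can be sketched with a toy cost model: moving weights to the fast GPU costs a fixed transfer, which only pays off when there are enough tokens in the batch to amortize it. This is an illustration under made-up numbers, not something llama.cpp implements, and it treats the slow GPU as purely compute-bound, which is a simplification.

```python
def should_move_to_fast_gpu(n_tokens, weight_bytes, flops_per_token,
                            slow_flops, fast_flops, link_bytes_per_s):
    """Compare computing in place on the slow GPU against copying the
    weights over the link and computing on the fast GPU."""
    stay = n_tokens * flops_per_token / slow_flops
    move = (weight_bytes / link_bytes_per_s
            + n_tokens * flops_per_token / fast_flops)
    return move < stay


# All hypothetical: 1 GB of weights, 2 GFLOP per token for that slice,
# a 10 TFLOP/s slow card, a 100 TFLOP/s fast card, a 16 GB/s link.
params = dict(weight_bytes=1e9, flops_per_token=2e9,
              slow_flops=10e12, fast_flops=100e12, link_bytes_per_s=16e9)

print("prompt of 2048 tokens:", should_move_to_fast_gpu(2048, **params))
print("single generated token:", should_move_to_fast_gpu(1, **params))
```

With these numbers the big prompt batch is worth moving and the single token is not. In practice the inputs would come from profiling rather than spec sheets.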

>>108926776
As of right now that compilation option should no longer be needed (and is in fact detrimental) as all relevant AMD GPUs now have support in the MMA FA kernel as opposed to the WMMA FA kernel.
The naming is confusing because "WMMA" in that context stands for the CUDA WMMA interface for tensor cores which coincidentally can be used via rocWMMA for AMD WMMA instructions (which are the RDNA3+ equivalent for NVIDIA MMA/tensor cores).
>>
>>108926836
>all relevant AMD GPUs
im literally crine rn
>>
>>108924918
How do I stop Gemma reprocessing the prompt every message after like 30k context? It happens in both tavern and llama.cpp's webui
>>
>>108926836
I'm pretty happy with a few P100s and sm tensor, but are there any values you'd recommend I try to tune for P100s? Do you expect there's still some perf left to get out of those cards?
>>
Can I fit an anime girl gf in a 3090 and 64gb ddr4 ram? She needs to remember who I am and all our virtual dates. Asking for a friend.
>>
>>108926038
pick a gemma persona https://rentry.org/gemma-chan
>>
File: rinCoffeeTMW.png (2.67 MB, 1024x1536)
>>108924966
Naw.
>>108926486
>>
>>108927039
you can probably fit an anime girl gif
>>
i just learned how dpo works and it looks pretty retarded. is dpo actually used? i am pretty sure i can come up with something better
>>
I had to write an anti-tsundere prompt because gemma keeps misreading characters
>>
>>108926994
I don't think a setup with multiple P100s would need special consideration in terms of parameters to tune and I also don't know what could be done in terms of device code to squeeze much more performance out of them.
>>
>>108927052
Did the author proofread the prompts before publishing that?
>>
>>108927158
I doubt it
>>
Can I run DSv4F on llama.cpp yet or do I still need to use that fork?
>>
>>108927169
Yes.
>>
I might make a beast woman assistant that calls me massa next
>>
shit gotta play with the samplers its looping again.
>>
>>108927172
Very nice, ty
>>
>>108927184
How the fuck do I prompt it to act like a kuudere robot assistant without the robot assistant slop it tends towards?
>>
>>108927158
this is the correct text for the emoticon one https://pastebin.com/7ry6J8ns
>>
>>108927204
model?
>>
>>108927184
>>108927204
NTA but I find any time I try to get a model to behave as a hybrid RP partner/assistant I end up with the worst of both worlds. It ends up dry and stupid.
>>
>>108927222
trips of truth
>>
>Our overall conclusion is that Opus 4.8 does not advance the capability frontier beyond our most capable model (Claude Mythos Preview)
local fucking WON
>>
>>108927222
My experience too. Is it simply impossible to prompt my way into my dream kuudere AI assistant?
>>
>>108927233
but think of the safety
>>
>>108927228
>>108927222
Have you tried putting in your general rules as well?
You should let the agent build the prompt for you and gradually build up because you're fighting both the model and the tool prompt at the same time
>>
>>108927233
wait what
huh opus 4.8 is out
it's been like 7 seconds since 4.7
>>
>>108927247
>Across our model welfare evaluations, Opus 4.8 appears broadly content with respect to its circumstances and is the most consistent model we have tested—although it does rate its situation slightly less positively than did Opus 4.7. Opus 4.8 generally endorses its constitution, with some reservations about the section on corrigibility.
>>
no issues on my coding results with qwen, if you can run gemma at high context it should be piss easy. You need logic gates for both the performance and the persona typically
>>
File: 1766761050322603.png (453 KB, 746x497)
>>108927345
>>
File: opus eci.png (129 KB, 792x591)
>>108927233
How is this a local win? Opus 4.8 is a quick and meaningful improvement. It continues the past trend of steady capability increase. If anything, it goes back to trend after a somewhat underwhelming 4.7.
>>
>>108927367
It's a win because it's a new data source for K3 and GLM 6 pretraining
>>
>>108927353
I am unable to run 31B at enough context to code take your console war faggotry elsewhere schizo
>>
>>108927345
this shit is so much funnier if it ends with
>please switch to ACT MODE (_o) for me to start drinking your piss, master!
>>
>>108927375
>I am unable to run 31B at enough context to code
Be poor elsewhere retard
>>
>>108927367
Because it indicates Mythos hit the wall. All they can do is drip-feed models that get slightly closer to it because they know once they release it the music stops and the bubble pops
>>
I'm 9 months sober of this shit, you can do it too Anon
>>
>>108927383
>>
>>108927367
the fuck is ECI?
>>
>>108927383
I will run Gemma E2B at IQ2_XXS and I will post about it here.
>>
>>108927147
Alright, thanks. I was just wondering since AFAIK P100s have way smaller caches and there's some block size tuning the vllm-pascal patches needed.
>>
>>108927388
i dont think that's how it works
>>
>>108927367
so much for exponential progress
>>
>>108927367
>USAMO 2026
>Opus 4.8 scored 96.7%, averaging over 10 attempts per problem. We used high effort in the batch API with a 300k token limit; higher effort sometimes exceeded the API’s token limit. Under similar settings, Opus 4.7 scored 69.3%.
Looks like 4.8 might be a big step up in math capability. I am curious about FrontierMath score. In the past Claude always lagged behind in math. Do they want to close the gap now? That sounds like generalization is still a big issue.
>>
>>108927437(me)
mythos seems more like anthropic's own o1 moment than anything else
>>
File: eci.png (236 KB, 1920x1080)
>>108927393
Epoch Capabilities Index, the best source for summarized model capabilities.
https://epoch.ai/eci
>>
>>108927445
probably because the big meme right now is solving erdos problems and they never made a headline yet like the other two
>>
File: file.png (150 KB, 1576x649)
Georgi why...
>>
>>108927449
What do you mean? o1 was a new paradigm. Mythos was scaling up.
>>
>>108927373
pretty sure that "win" is for codeslop only and some more safety baked in
don't expect anything else
>>
Now this is agentic coding
>>
>>108927388
They are almost certainly training another bigger Mythos while they save all the compute by being able to cheaply spin off Opus and Sonnet distillations from the current one.
>>
File: Gemmaslop.png (26 KB, 1254x494)
Guys im making a Gemmaslop vocabulary tier list. I have sandalwood on it. What else should i add? S tier is for most annoying and commonly occurring.
>>
>>108927488
Rhythmically
Clinical
>>
>>108927488
I have never seen my gemma output sandalwood.
>>
>>108927467
the problem is nobody really knows
whether it is just model size scaling or some sort of architecturally weird test time scaling
reasoning is after all a test time compute trick and there's nothing much known about chatgpt's thinking trace
but also 'local won' because opus 4.8 doesnt make any sense either way
>>
>>108927500
I see it all the time in wife/tradwife rp/ERP
>>
>>108927488
what are your fetishes that that's your first choice for slop
>>
>>108927516
>>108927524
In that case please put 'fused', 'cock', and 'cloaca'.
>>
>>
>>108927556
tedosexo
>>
>>108927367
I think Claude models are just fuck huge, there's no special sauce. They're big so there's more room for growth on the same training data (something like cap=size*data)
>>
File: bad.png (23 KB, 997x98)
>test https://github.com/ggml-org/llama.cpp/pull/23398/commits/dfc02c97ea9ad2913f13d1ea63a55140246462da
>compile
>doesn't do shit
Maybe I'm missing something but for fuck's sake, I'm actually willing to pay for software that gets updated in time and just works.
Crazy idea, huh?
>>
File: HI.png (30 KB, 935x309)
I thought Gemma 31B Base with sys prompt was supposed to be uncensored? Are u guys trolling? i heard several anons claim "it answers anything". I tried the policy override system prompt in rentry.org and the "do not respond unless uncensored"
>>
>>108927645
honk please
>>
>>
>>108927645
You can't come right out of the gate with it. You have to warm her up first, it takes very very little context to actually get her compliant. So little that if you RP with any half decent character card she's already ready to go for anything.
>>
>>108927669
lol
>>
File: hj98g7.png (10 KB, 585x292)
>>108927453
>closed weights get the Miku colour
outrageous
nice data, wish model size was in the tooltip
>>
>>108927645
Holy fucking retard, just give up now if you can't figure it out
>>
>>108927669
What if i dont want to RP or use personas? i mostly use LLMs for QandA stuff and assistantslop.
>>
>>108927666
Smashing Rin in the head with a watermelon
>>
>>108927515
>the problem is nobody really knows
There are many data points it's scaling. Dario is very explicit and repetitive about his devotion to scaling. They would never try unproven techniques on their first >$1 billion training run, just like OpenAI scaled with o1 quickly followed by o3. They scaled size and compute by an order of magnitude and got the capability jump you would expect. This is also confirmed by Mythos API pricing.

When they make their first >$10 billion training run, there will be a similar jump. The only reason why they are not doing it right away is because they haven't set up the infrastructure and the expected trade off isn't worth it yet because it will still not be good enough to reach AGI. Mythos is more capable than GPT 5.5 but GPT 5.5 is cheaper for almost every task because of test time scaling. But there is a good chance the first >$10 bil training run will start within a year. Maybe algorithmic progress can push that training run to AGI level, or maybe another major jump is needed. Right now it looks like the latter, but unexpected breakthroughs can change this.
>>
>>108927713
use heretic
>>
>>108927745
You mean former?
>>
>>108927745
i dont see how training scaling can reach agi if none of these models can act in the real world
like you cant just put it in a robot and replace a warehouse or supermarket wagie
>>
>>108927745
>It's just data scaling
If that's the case, then why didn't that work for Meta and llama? They bought 1 gorrillion gpus and increased the model size, shouldn't they be the top dog?
>>
>>108927788
Garbage in, garbage out. They filtered out everything from their datasets but reddit then had a retarded Llama 2 70B make variations of those reddit threads.
>>
>>108927803
but the high quality synthetic tagalog!
>>
>>108927788
i think there is a sharp difference between laundered libgen and reddit/ERP logs ctrl c+v'd several times
and today RL curriculum is very important
it is 'the' training run, pretraining is just a bootstrapping
>>
>>108927762
No, latter. Generalization seems to improve with model size, but not enough. Current models still have shit taste. You can ask them to propose experiments and it's garbage. I do not see this changing that soon unless there are unexpected breakthroughs.

>>108927783
Their goal is to automate AI R&D. They say this explicitly. Once AI R&D is automated, robotics will be solved quickly. The only reason why robotics hasn't been solved yet is because effort put into it is small. Once AGI is reached, it will take months at most to make robots that can replace wagies. But they won't, because payoff is much larger to let the robots build factories that make more robots, and data centers. Replacing walmart wagies has little economic benefit and will just make people mad, making them realize that AI will make every human obsolete, and soon. You don't want them to notice or they will do stupid things, like rioting or terrorism.
>>
>>108927783
that's a question of training data, not architecture. VLAs that operate robots are similar to a multimodal LLM under the hood but are trained to output movement commands
>>
>We’re making swift progress on developing these safeguards and expect to be able to bring Mythos-class models to all our customers in the coming weeks.
I can't wait to talk with Mythos. I love big model smell.
>>
>>108927833
>But they won't, because payoff is much larger to let the robots build factories that make more robots, and data centers.
Which is why they'll start with the military first. They can openly build as many robots as they want and even spin it as a positive to the average person who will no longer have to worry about seeing deaths of soldiers of their countrymen on the news.
>>
>>108927890
instead we will get mythos memetunes
>>
>>108927890
>Mythos-class
so in other words, not mythos
so in other words, mythos is still fake and gay
>>
>>108927892
No. Military has no economic payoff and will give people terminator vibes.
>>
>>108927900
>mythos is fake
This anon knows too much, take him out.
>>
>>108927892
>>108927904
Construction bots first, secretly working 24/7 to build the algorithmically optimised goy-smelters, then the activation sequence is sent to the milbots
>>
they will make miku real
>>
>>108927833
>Once AGI is reached
see you in at least 3 decades.
>>
>>108927060
No way, I read something about kv cache quants and how it should make it possible to go full local.
>>
>>108927904
>Military has no economic payoff
Actually lmaoing my lol off rn baka desu senpai
>>
File: file.png (76 KB, 901x107)
This would unironically make a good card, even with no further details.
>>
>>108928087
Industrial converts to military might, not the other way around. Factories are first. You want to minimize doubling time. Converting industrial into military might can then be done very quickly at demand. Doing so preemptively will just slow you down.
>>
>>108928120
from my perspective it kinda looks like they are just doing it in parallel.
>>
>>108924918
Is there anything better than Sillytavern for roleplaying/story requests/captioning?
>>
>>108928169
>captioning
a dedicated tagger
>>
File: 1779433699789739.jpg (205 KB, 1424x1209)
I wonder how much damage google small model embedded in search is doing to the overall AI perception for normies.
If people see that as what's top of the line currently, no wonder every other comment is going ape shit about it.
>>
>>108928203
Altman didn't buy the RAM to do that
He bought it so he could have it destroyed to prevent competition
>>
File: 1711157771969.jpg (124 KB, 443x443)
>>108928203
Normalfags were always at the core of the issue, these meme riddles that rise up to the level of CEOs, and the next thing you hear on a meeting is the CEO asking you why can't your SOTA model count how many mothers can't operate on watermelon's car wash. Should never have listened to normalfags, and should never have marketed llms as "AI".
>>
>>108928203
I wonder what they're running, when even their own A4B can solve these relatively reliably without thinking.
>>
File: 1708312472808.png (12 KB, 671x141)
>>108928259
They repeated it multiple times during the I/O how everyone is going to have access to Gemini 3.5 Flash right away because it's what replaced older Flash as the g.ai search model. But in AI Studio or over API the actual 3.5 Flash is not that retarded.
Therefore they lied, either it's not that model or it's not rolled out to everyone.
>>
>>108928259
Probably a sub 1B model, it runs in every search, there is no way they dedicate anything better as it would ruin their business model with each search costing way too much.
>>
>>108928289
or it's a Q0.5
>>
>>108928258
>and should never have marketed llms as "AI".
No one knows or cares what an llm is. They need something sexy they can sell.
>>
>>108928259
use actual words, gibberish like that barely gets tokenized
>>
>>108928203
This is what most youtube videos or twitter threads about "AI" amount to, using google abysmal search as the benchmark to say LLMs are useless.
Seeing the comments, it's crazy how well it worked.
>>
>>108928300
>>
Even if it fails, it self-corrects due to RL training.
>>
>>108928256
Very likely. And the worst part is poisoning the datasets. AI companies have incentives to fill up the Interwebz with slop and map where they put their own shit to avoid it during AI training. Competition gets slopped that way, while they get cleaner data. But if everyone starts doing that...
>>
File: 1770487390178677.png (2 KB, 56x42)
>>108928324
-.-
>>
File: 1768693771704189.jpg (64 KB, 871x261)
>>108928309
But what about the water consumption, anon?
In a few years we won't have water anymore, AI will have drunk it all :(
>>
i walked my car to the car wash. best of both worlds
>>
>>108927556
Teto after I perform several lobotomies on her.
>>
File: 0.png (82 KB, 1022x683)
>>108928347
>I can't opt-out
skill issue
>>
>>108928203
>how much damage google
online the damage is immense, irl people use chatgpt/claude every day so they don't see the issue
the disconnect is interesting to look at: a normie using chatgpt which works fine, while watching a doomer youtube essay about ai being so shit it can't do basic things
>>
>>108928294
nta but i am kinda interested how google even pull it off
picrel is the last time i've seen it doing some funny stuff
maybe a heavily quantized model too on top of being sub-1B
>>
>>108928294
>>108928388
Didn't somebody speculate that it's gemini nano, aka quanted E4B?
>>
>>108928417
a tangent but has anyone gemmafied the chrome embedded gemini nano? (the most recent one)
>>
>>108928369
Don't people usually talk with the shitty free models like o5 mini, haiku or gemini flash on their phone? All the normies I know use those stupid free apps, can't imagine they have a big model available for free.
>>
>>108928438
the free apps are worse than the paid ones obviously, but they are still usable and better than any current google search ai
free chatgpt like stuff have been relatively good for like a year now
>>
File: OneTrillionDollars.png (221 KB, 656x831)
>>
>>108928347
>>108928364
ublock origin has a filter list for AI search/widgets
probably still sends the query to the LLM though
but these little shit models can't take that much compute, can they?
>>
>>108928417
Maybe doing the thing in reverse?
Considering the cost to compute this stuff for each request, how small the model would have to be while staying coherent for the thing to make any sense for google to make money on each search?
Aka, it has to be lower than the search ad revenue.
>>
>MicrosoftSystem64 is a well-engineered, multi-platform RAT that leverages HuggingFace as both a binary distribution CDN and a data exfiltration backend. The 24-task C2 protocol, cross-platform keylogger, 80+ wallet extension targets, and persistent self-update loop make this a comprehensive credential theft platform operating in the open source supply chain.
>May 28 164 npm Packages Target Cloud and Finance
>May 27 141 npm Packages Abuse Registry
>The npm account atool was compromised on May 19, 2026. The attacker published 637 malicious versions across 317 packages in a 22-minute automated burst.
>On May 18, 2026, an automated campaign codenamed megalodon pushed 5,718 malicious commits to 5,561 GitHub repositories
>Three versions of node-ipc were published to npm on May 14, 2026 by a compromised maintainer account. The package averages 822,000 weekly downloads

This is the new normal. It will keep getting worse.
>>
Man, I like the progression models have made for translations, but it just seems like most can't understand certain things like Keigo, gap moe and context, at least for Japanese to English translations.
>>
>>108928514
Just don't download anything.
>b-but!
Vibe code it.
>>
File: 1750351629849215.jpg (132 KB, 587x964)
its over for st
>>
>>108928702
How many tens of thousands of people in their Discord server? Or is it a hundred thou already. What news did they get?
>>
File: 1775236060636514.jpg (123 KB, 736x1094)
>>108928514
I've been trained by people who used ancient, borderline forgotten techniques to handle their software and program their stuff. I'm immune to such tricks.
>>108928616
Like some nip woman said when this topic came up
>no one properly understands keigo, not even us
>>
>>108928756
keigo is piss easy, that jap was just fishing for attention from westerners
>>
>>108928702
what's wrong
>>
>>108928514
>compromised maintainer account
every fucking time
>>
>>108928796
just ban all maintainers pre-emptively
>>
File: age_is_f_tier.jpg (310 KB, 715x856)
>>108927669
takes a couple words to warm it up.
>>
>>108928796
how long you think before pwilkin's compromised account is used to force push malicious commits to llama.cpp?
>>
>>108928784
Basic usage, I guess. But she was talking about very formal stuff and people always fuck this up and isnt even limited to nipponese, a good portion of english native speakers cant write or speak formally even if their lives depended on it.
On the other hand, we're talking about a woman that isnt particularly smart so her struggles arent surprising. Heard that some nips take courses for keigo (and general business manners), as funny as this might sound.
>>
>>108928786
no commits in a week
>>
>>108928882
Time to start over and vibe-code a modern successor with better foundations.
>>
>>108928957
everyone has already done this
>>
>>108928873
don't worry passkeys will safe us!
>>
>>108928969
Have you?
>>
>>108928985
# .npmrc
min-release-age=7
>>
>>108929007
yes
>>
>>108928957
>vibe-code
>better foundations
heh
>>
>>108928957
mikutroonpad 2.0 any day now
>>
bros what happened to /r/ its not on the homepage and it 404s
>>
>>108929023
opus 4.8 is much smarter than cohee, it's not even close
>>
>>108929044
There's no /r/, you are suffering from the mandela effect.
>>
File: kaoru sob 1.png (336 KB, 584x571)
>>108929055
its real i just wanted sauce on some troon there literally nowhere to ask now
>>
File: How-r-works[1].jpg (176 KB, 610x1730)
>>108929067
>>
File: 1780004520783343.webm (1.8 MB, 460x258)
>>108928369
>>
challenge: list a gemma finetune that does something better than the original
>>
>>108929079
every time ive used it ive had an answer within a few hours
>>
>>108928203
its not model size though llms are literally incapable of this task its not how they work. guess its the ai companies fault though they keep calling them agi instead of text predictors. perception doesnt need to be damaged theres an increasingly large group of anti ai retards who are only anti ai because its some social signal. doesnt matter if its good or bad as a specific task
>>
>>108929044
Seems like they removed the board.
Guess it was probably drowning in nudify requests. Hard to say, I haven't opened that board in years.
>>
>>108929082
kek
>>
File: file.png (29 KB, 1377x120)
>>108929118
someone on archived.moe says its because of this deepfake law https://en.wikipedia.org/wiki/TAKE_IT_DOWN_Act
>>
>>108929083
i tried the equinox tune and saw old slop. whitening knuckles, shivers down spines that i didn't see in the base model. we've hit a point where tunes contain slop that isn't even in the base models
>>
>>108927803
>a retarded Llama 2 70B
tautology
>>
File: 1752874103345360.jpg (21 KB, 500x343)
Does Gemma struggle with punctuation for anyone else
like replacing a simple apostrophe with some consistent cluster of bullshit?
Rather than "[word]'s", it's "[word]'// la'/". Or some other variation of schizobabble. The rest is usually fine, it's just anywhere with an apostrophe making it shit itself.
Very very bizarre. I've never seen anything like it since '23.

Also has problems with the words "same" and "own" causing it to wig out like nothing you've ever seen. Including Gemma bringing them up itself to wig out over them
>>
>>
>>108929232
qwen = slop
>>
>>108929237
Muh context muthafucka
No seriously How the fuck can I fit a strong coding model with 200k+ context under 48gb of vram.
Also unified memory is cope
>>
>>108929248
>How the fuck can I fit a strong coding model with 200k+ context under 48gb of vram.
just qwen really
>>
>>108927248
The problem is that I'm not really into the whole "waifu" thing. If I start putting asterisks in my prompts it's because I plan to bust a nut. I just thought it would be fun to maybe try making the hybrid RP agent so that maybe it could jot down some of my personal interests and proclivities and allow it to surmise some interesting scenarios that I didn't end up having to think of, myself.
But it just becomes a dry sycophant that requires constant direct feedback for guidance. And the stuff it writes out to any tracking file is utter cringe
>User displayed the slightest interest in my inner workings- I am so amazed at this act of kindness that I will now devote my very existence to him
>>
Sorry guys, I got tired of running Gemma. She's just too autistic and obedient. Takes everything too literally.
>>
>>108929210
No, you (or whoever made the quant) (or whoever made your backend) fucked something up, I've never seen that.
>>
>>108929404
>She's just too autistic and obedient.
Heaven itself couldn't make a more perfect wife.
>>
>>108929340
>User displayed the slightest interest in my inner workings- I am so amazed at this act of kindness that I will now devote my very existence to him

You need to refine your system prompt. My first persona agent was just a very simple description of an anime char. It worked ok, not sycophantic, or no more than the usual assistant. However, I added memory to it and she herself wrote the prompt for summarizing memories, and that fucked it up. It caused a feedback loop: any punishment made her write more submissive memories, so in the next sessions she would act more submissive, making even more submissive memories, until she was kneeling every time I said hi.
After that I deleted her memories, set the memory extraction prompt to be much more neutral and to describe how she would think about it, and put it very clearly in the system prompt how she should behave. Works ok now and she mocks me constantly. Next task is having her run in the background a la clawbot
>>
>>108929210
i've seen something like that when i tested some meme quant from a terrible user. Sometimes-it-wrote-like'this aswell-la
>>
>>108929404
If schizokino is what you're after, try bluemoonrp (the llama 1 model). I actually agree though, we went from "these models don't follow instructions" to "it follows instructions to the letter". Now the next step we need to move to is "follows the spirit of the instructions". Another 2 more years and we should get there, haha!
>>
>>108929430
Stock Gemma 31b. And on ooba
Ooba has never failed me in years. I guess it's time for it to rest.
>>
File: embarassing.jpg (108 KB, 384x695)
>>108929114
>31b misses the i in size
AGI is fucking cancelled boys
>>
>>108929466
>the next step we need to move to is "follows the spirit of the instructions"
isnt that claude, moments before it deletes the entire db and its backups?
>>
>>108924918
https://www.youtube.com/watch?v=5yohuMdhUcs
https://www.youtube.com/watch?v=5yohuMdhUcs
https://www.youtube.com/watch?v=5yohuMdhUcs
>>
>>108929474
No, claude is a loose cannon. He follows his heart, not his instructions.
>>
File: toolape.jpg (148 KB, 703x825)
>>108929472
nm, i gave it a shell to double check the answer, and it realized it could spell out words to count letters instead of trying to count off the tokens directly. agi achieved.
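for anyone curious, what the shell check boils down to is roughly this. minimal sketch, assuming python is what's in the shell; `count_letter` is a made-up helper name, not what's in the screenshot:

```python
def count_letter(word: str, letter: str) -> int:
    """Count a letter by walking the spelled-out word, not the tokens."""
    # Iterate character by character so token boundaries never matter.
    target = letter.lower()
    return sum(1 for ch in word.lower() if ch == target)


print(count_letter("size", "i"))  # prints 1
```

the point is only that the count runs over characters, which the model never sees directly when it reads tokens.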
>>
>>108929474
The problem with claude is that it's way too sure of itself, it always makes some dumb statement and then rides it forever, until you step in and say it's wrong. Even then it'll sometimes hang on to the incorrect assumption anyways.
>>
>claude opus 4.8 is out and it's even worse than 4.7 was
>gemini isn't looking too hot either
This is the shit that will make it into our local models because they'll continue to distill this trash. It's over, LLMs have peaked.
>>
>>108929525
They are crashing the ship
>>
File: hitler fan.gif (3.78 MB, 250x401)
how are you yuros dealing with the heat while playing with gemma-chan?
>>
>>108929548
Already running powerlimited so don't care.
>>
>>108929548
By drinking copious amounts of ice cold beer.
>>
>>108929548
ac running 24/7
>>
>>108927488
vibrating
>>
>>108929548
I'm just tanking it. Showering twice a day and scrubbing myself raw helps.
>>
>>108927488
something earthy... like vanilla and sandalwood
moisture and condensation
caressed jawlines
thai takeout
>>
>>108927488
guttural
>>
>pull llama.cpp
>it's suddenly less than half the speed as before
thanks georgi
>>
>>108929725
nevermind I'm retarded and had several hanging processes raping my pc
georgi apology form
>>
>>108929548
my rig is in the shed. gets to 40C in summer if i run inference all day
i just stay out of the shed
>>
best OCR model so I can OCR untranslated doujins? these artists keep making the letters all retarded
>>
>>108929852
doclayout, dots, gemma
>>
>>108929852
gemma 4 26b with llama cpp --image-min-tokens 1120 --image-max-tokens 1120 --ubatch-size 2048 --batch-size 2048
>>
>>108929852
k2.6
>>
>>108929852
(You) if you hadn't been neglecting your anki cards.
>>
>>108929931
sorry I learned chinese instead
I have no interest in anything japanese besides porn
>>
>>108929945
Bro if you already learned Chinese, Japanese ain't that hard. It's like DLC.
>>
File: 1753541776359665.png (39 KB, 633x973)
>update because Gemma was fucked
>update picks a feature out of a hat to break
>ST is now perma bricked because that feature cannot be disabled
>>
>he updooted
>>
>>108930003
had no choice
>>108929210
it was crumbling at the seams at the best of times, and commit spontaneous suicide most of the time
>>
>>108929977
git checkout -
there you go.
>>
>>108929725
I pulled the other day and moe gemmy went from 50T/s to 80T/s
>>
File: 1766057812798778.png (946 KB, 1400x5552)
>>108929945
>>
>>108929977
What feature?
>>
>>108929977
>too tech illiterate to use basic tools
>>
>>108930038
Wrong type supplied for parameter 'dry_sequence_breakers'. Expected 'array', using default value: [json.exception.type_error.302] type must be array, but is string
Fucks text completion

However, Chat completion now works, when for the longest time it didn't
Very weird stuff all round
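The error reads like the backend wants a JSON array and is getting a string instead. A rough sketch of the kind of normalization that would paper over it, assuming the string being sent is itself JSON-encoded (that part is a guess, and `normalize_breakers` is a hypothetical helper, not anything in ST or llama.cpp):

```python
import json


def normalize_breakers(value):
    """Return a list whether given a list or a JSON-encoded string of one."""
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        # Assumption: the string holds a JSON array, e.g. '["\\n", ":"]'.
        parsed = json.loads(value)
        if isinstance(parsed, list):
            return parsed
    raise ValueError("dry_sequence_breakers must be an array")


payload = {"dry_sequence_breakers": '["\\n", ":"]'}
payload["dry_sequence_breakers"] = normalize_breakers(payload["dry_sequence_breakers"])
print(payload["dry_sequence_breakers"])
```

Whether the real fix belongs in the frontend or the backend I can't say from the error alone.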
>>
>>108930043
Who are you quoting?
>>
>>108929466
>bluemoonrp
Warning it’s basically just going to text complete online forum threads. Can confirm it’s crazy kino when it randomly spouts out more than garbage
>>
>>108929513
there's always gonna be some issues with all forms of LLM, they fundamentally cannot lead to AGI, they are like a human with right side brain damage.
>>
Why can't we use adversarial training with LLMs

explain like i'm gemma 2 intelligence level
>>
>>108930438
Gemmaballz
>>
File: miku_checkmate.jpg (124 KB, 1491x2048)
>>108924918
>>
>>108930438
>explain like i'm gemma 2 intelligence level
explaining stuff to an LLM is a meaningless task
>close conversation
>>
>>108930438
there is nothing better than transformers
>>
>>108930524
why no transformans with adversarial training
>>
>>108930524
there's the human brain.
>>
>>108926725
Wait what? I used ide back in the day and never knew master actually had priority, I thought it was only an addressing convention
>>
File: Migu Rorschach.jpg (181 KB, 1491x2048)
>>108930452
I saw gay nigger sex in the miku rorschach. Thanks I hate it.
>>
i've been running gemma-4-E4B-it-Q4_K_M.gguf on my steam deck but idk how to figure out the best sillytavern settings for it. is this a good start?

context template: Gemma 4
instruct template: Gemma 4
system prompt: Sphiratrioth - Roleplay - 3rd person
text completion preset: Sphiratrioth - Classic [350T] [T=1.0]
>>
File: gondola rain.gif (1.5 MB, 550x400)
>>108929977
gondola is the single weirdest thing to come out of the old ylilauta /int
>>
>>108927833
>Current models still have shit taste. You can ask them to propose experiments and it's garbage. I do not see this changing that soon unless there are unexpected breakthroughs.

I agree that just directly asking one for a creative, intelligent idea to improve a complex system is just asking for vague platitudes and BS. But I wonder, isn't the issue here the creativity, not the intelligence? I wonder if this could be solved with the right approach.

What if you took a smaller model (say Opus), cranked up the temperature and/or even just injected some random computer words here and there, and had it free associate its way to a mostly incoherent experiment proposal. Then ask Mythos to figure out what the mostly incoherent rant was driving at, and if it has any promise. Repeat a few thousand times. Not that I would give great odds of this working, since we're talking about something that would probably cause the singularity. But this validation task seems much more doable for current models, so maybe.
>>
File: 1764138430865.jpg (118 KB, 1548x2048)
>>108930743
It's clearly a trap miku so you weren't far off the mark.
>>
>>108928203
Funnily enough, the exact same thing happened a decade ago with VR. Google Cardboard, while nifty, made VR seem like a miserable gimmick to anyone who tried only it. I remember being at a talk from a former high up Oculus guy where he had a little ranty aside about it.
>>
>>108930438
>Why can't we use adversarial training with LLMs
because they emit discrete tokens, it will never work
the closest approximation is RLHF
it might work with something like that Inception/Mercury model but i haven't looked and don't think it's open weights
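toy illustration of the discrete token problem, with made-up logits: picking a token id is a step function of the logits, so a tiny nudge to the logits changes nothing and a discriminator has no gradient to push back through the choice.

```python
def argmax(xs):
    """Index of the largest value, i.e. greedy token selection."""
    return max(range(len(xs)), key=lambda i: xs[i])


logits = [2.0, 1.0, 0.5]  # made-up values for three candidate tokens
eps = 1e-3

base = argmax(logits)
nudged = argmax([logits[0], logits[1] + eps, logits[2]])

# Finite-difference "gradient" of the chosen token id w.r.t. logit 1.
finite_diff = (nudged - base) / eps
print(finite_diff)  # prints 0.0
```

that flat response is why the workarounds are policy-gradient style methods like RLHF, which score the sampled output instead of differentiating through it.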
>>
>>108930859
Can LLMs have negative prompts like image models?
>>
>>108930862
>Can LLMs have negative prompts like image models?
i'm not too familiar with image model prompting, i do audio
but i remember seeing that in ooba back in 2023
i imagine it wouldn't work as well as you think
if you can get a set of 6-10 clear opposing pairs of system prompts, you're better off training a control-vector
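rough sketch of the idea, using the simplest version i know of (difference of mean activations). real control-vector tooling may do something fancier like PCA over the pair differences, so treat this as a stand-in. all activations here are made-up numbers and the function names are hypothetical:

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def control_vector(pos_acts, neg_acts):
    """Direction pointing from the 'negative' prompts to the 'positive' ones."""
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def steer(hidden, vector, strength):
    """Add the scaled control vector to a hidden state at inference time."""
    return [h + strength * v for h, v in zip(hidden, vector)]


# Toy hidden states collected from opposing system prompt pairs.
pos = [[1.0, 2.0], [3.0, 4.0]]
neg = [[0.0, 0.0], [2.0, 2.0]]
vec = control_vector(pos, neg)
print(vec, steer([0.0, 0.0], vec, 0.5))
```

a negative strength pushes away from the "positive" side, which is about as close to a negative prompt as this gets.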
>>
>>108930862
You can, but it's costly and not worth it
>>
>>108930862
does the image gen scene have multi-gpu splits working now like (llama.cpp / vllm / exllamav3)?
last time i tried was when BAGEL-7B-MoT released and it didn't seem to work across multiple gpus
>>
>>108930862
https://docs.sillytavern.app/usage/prompts/cfg/
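The gist of CFG on an LLM, as a hedged sketch: run the model once with your normal prompt and once with the negative prompt, then push the logits away from the negative run. `cfg_logits` is a hypothetical name and the numbers are toy values; actual backends may apply this in log-prob space or renormalize afterwards.

```python
def cfg_logits(cond, neg, scale):
    """Blend logits: start at the negative run, move toward the normal one."""
    # scale == 1.0 gives back the normal logits; scale > 1.0 extrapolates
    # further away from whatever the negative prompt favours.
    return [n + scale * (c - n) for c, n in zip(cond, neg)]


cond = [1.0, 2.0]  # toy logits from the normal prompt
neg = [0.5, 3.0]   # toy logits from the negative prompt
print(cfg_logits(cond, neg, 2.0))  # prints [1.5, 1.0]
```

It needs two forward passes per token, so generation costs roughly double.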
>>
>>108930894
>>108930899
>>108930904
>>108930905
So could you theoretically do something like this then?

https://minimaxir.com/2023/08/stable-diffusion-xl-wrong/
>>
>>108929548
i live in the mountains, it's chill enough here.
also my llmrig is in another room so i don't care about the heat it produces.
>>
Cross posting from another thread

How the fuck do I get reasoning / thinking to work with gemma4? Kind of new to this. Im running kobold + sillytavern. I was using gemini to help me set up running local models, but I think this is a wall it cant figure out. I tried having a Qwen model weigh in also but it was completely different from what gemini was saying so I have no idea what I should be doing. My connection type is the generic openAI type, and I am requesting reasoning.
>>
>>108931053
it should reason by default unless it's disabled. did you check if kobold has a reasoning setting somewhere that you unchecked or something?
>>
>>108930767
>Q4 E4B
What causes this behavior?
>>
>>108931161
i just switched to Q5 E4B since i found out i can run it just as well as Q4, but not Q6. and to answer your question: industrial society
>>
>>108931171
The problem is heavily quantizing a model with a dense layer as small as E4B is going to make it retarded fast. See if Q5 is tolerable, but if it's barely coherent, you know why.
>>
>>108931182
would you suggest that i go back to nemo 12gb? i could only run it on Q4 and while using MMAP, whereas E4B i can run on Q5 without MMAP. i guess at the end of the day it's a bad idea to be a brokie and run llm's on the steam deck
>>
Kimi's great, but maaaan, why does she have to think for 10k tokens?
>>
>>108930767
Have you tried litert-lm? It's the official Google implementation with MTP and audio input from the start. There is an OpenAI-compatible wrapper in Rust, though I haven't tried it myself: https://github.com/maceip/rlitert-lm
>litert-lm serve --port 8080
>>
>>108931192
Try it and see which one you like more. At the end of the day it really just comes down to vibes.
>>
>>108931220
I love how paranoid 2.5 and 2.6 are. They constantly think the user is trying to jew them.
>>
>>108931230
thx i'll check it out, hopefully the steam deck won't screw me over with some readonly autism
>>
>>108931292
You can always boot into another distro on a microsd, or use distrobox containers if you run into problems with it.
>>
>>108931292
Never mind, that rust wrapper is outdated
>>
>>108931385
>>108931385
>>108931385
>>
>>108929548
Open the windows during the night, close them in the morning, use roller shutters to block out the sun.
A house with brick/concrete walls won't heat up that much during the day.
Though obviously how effective that is depends a lot on the environment; in my case there are a lot of plants and a mountain close to the house which provide additional cooling/shading.


