/g/ - Technology


File: miku-30.jpg (163 KB, 512x768)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107155428 & >>107147210

►News
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html
>(11/05) MegaDLMs framework for training diffusion language models released: https://github.com/JinjieNi/MegaDLMs
>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107155428

--Local agentic model optimization challenges and recommendations:
>107156143 >107156800 >107156988 >107157049 >107157116 >107157245 >107157016 >107157065 >107157072
--K2 hardware requirements and DeepSeek performance on Mac M3 Ultra:
>107156667 >107156810 >107157297 >107157333 >107157433 >107157468 >107157501 >107157581 >107157606 >107157616 >107160891 >107161050 >107161058 >107161063 >107161079 >107157574
--LLM performance evaluations for assistant, vision, and coding tasks:
>107157570 >107157577
--TTS model performance and feature comparisons:
>107157936 >107159774
--Wuxia story generation challenges with local models:
>107158277 >107158300 >107158359 >107158395 >107158466 >107158373
--Bypassing Qwen3 VL's image captioning restrictions through model identity and template adjustments:
>107160901 >107160905 >107161006 >107161031 >107161064 >107161087 >107161117 >107161146 >107161218 >107161465 >107161155 >107162166 >107162423 >107161256
--Model finetuning strategy analysis and potential cognitive tradeoffs:
>107158173 >107158765 >107159417 >107159443 >107159462 >107159582
--Searching for reliable Spanish text-to-speech models:
>107158988 >107159003 >107159103 >107159107 >107159120 >107159133 >107159743 >107159775
--GDDR7 shortage impacting RTX 5000 Super GPU development and pricing:
>107155556 >107155830 >107158840 >107155924 >107159525 >107162778
--AI-generated "highest IQ posts" ranking sparks content quality debate:
>107162735 >107162824 >107162963 >107162987
--RAM clock speed optimization for Kimi context length performance testing:
>107157303
--Struggles with custom speech-to-text implementation using vLLM vs consumer LLM stacks:
>107161075
--Miku (free space):
>107155529 >107157827 >107159774 >107157745

►Recent Highlight Posts from the Previous Thread: >>107155431

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107164164
>Slice of life
I've just been testing them but I tried the different GLMs because of NAI and I've been liking the outputs so far.
>>
https://arxiv.org/abs/2511.04962
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

>Large Language Models (LLMs) are increasingly tasked with creative generation, including the simulation of fictional characters. However, their ability to portray non-prosocial, antagonistic personas remains largely unexamined. We hypothesize that the safety alignment of modern LLMs creates a fundamental conflict with the task of authentically role-playing morally ambiguous or villainous characters. To investigate this, we introduce the Moral RolePlay benchmark, a new dataset featuring a four-level moral alignment scale and a balanced test set for rigorous evaluation. We task state-of-the-art LLMs with role-playing characters from moral paragons to pure villains. Our large-scale evaluation reveals a consistent, monotonic decline in role-playing fidelity as character morality decreases. We find that models struggle most with traits directly antithetical to safety principles, such as "Deceitful" and "Manipulative", often substituting nuanced malevolence with superficial aggression. Furthermore, we demonstrate that general chatbot proficiency is a poor predictor of villain role-playing ability, with highly safety-aligned models performing particularly poorly. Our work provides the first systematic evidence of this critical limitation, highlighting a key tension between model safety and creative fidelity. Our benchmark and findings pave the way for developing more nuanced, context-aware alignment methods.
>>
File: villains.png (268 KB, 1519x772)
>>107164337
GLM 4.6 top scorer in figure 1 for villain characters, by the way
>>
>>107164337
Based GLM.
>>
>>107164337
Based NovelAI.
>>
whats the whitest LLM I can use? I dont want to be infected by niggerjeetification.
>>
>>107159156
What's stopping an esteemed community practitioner from reproducing the core idea here in a smaller model?
>>
>>107164475
His skill
>>
>>107164460
StableLM 7b but you have to use the transformers library at 32 bit precision.
>>
>>107164364
What does that mean? They can't do evil characters well because it ends up being a caricature of evil?
good = just be good
>>
>>107164337
how long until cockbench paper?
>>
File: oyvey.png (529 KB, 2407x1579)
>>107164364
OH NONONONONO GLM4.6 BROS OH NONONONONONONO WHAT DID THEY MEAN BY THIS????
>>
>>107164588
It'd unironically be a better benchmark to test basic BDSM logic
>>
>>107164475
It's a scam
Why do you think Gemini isn't based on le teen titans?
>>
>>107164475
I can't be bothered to read through this but I predict
>le magic tech that fixes everything
>no demo
>no source
>no reproduction
>model still outputs hypersanitized post-2024 niggerslop
>>
>>107164624
Oh noes not the heckin shitskin preference scores
>>
File: 1739350650462622.png (151 KB, 900x750)
>>107164243
>>
>>107164624
oy
to the fucking vey
>>
maybe we should start making our own models, with blackjack and hookers
>>
>>107165239
maybe we should set up a decentralized network of GPUs from a number of /lmg/ anons that would allow us to train our own models...
>>
>>107164624
>egoists
>villains
...
>>
>>107165292
>man reinvents 2020 /aids/
>>
>>107165292
ill draw the logo
>>
>>107165339
Make sure it looks like a butthole.
>>
miku's butthole...
>>
>>107165292
Can't we just use Prime Intellect for that?
>>
How much SSD space do you guys find you need?
>>
>>107165555
buy refurb hdd to archive models u like
>>
>>107165239
Pro-tip: you can download karpathy's nanochat and open the codebase in your favorite vibecoding tool and have a model explain all the parts and how they work. Check the discussions on the github repo, people have done all sorts of fun stuff. It's very well written and documented. The whole process is there and it's modular enough you can add features relatively easily.
>>
>>107165555
I have a 1TB microsd in the microsd card reader in my computer that I put all my models on. I have like ~230gb of just llms at this point. I could probably delete half of them, like qwen3 vl deprecated gemma3 for me etc.
>>
Are there prebuilt ik_llama.cpp binaries for windows?
>>
>>107165555
I was fine with 7tb until I wanted to make R1 quants, now I have 14tb.
>>
>>107165555
I have uhhh a single 15gb model and 1gb in appimage
>>
>>107165555
Too damn much. Kimi and GLM quants are fat.
>>
>>107165692
No.
It's pretty simple to compile your own.
>>
File: file.png (51 KB, 730x243)
moonshot against cunny
it's so over
>>
>>107165761
fuck.. jews really want to take everything good from us
>>
>>107165726
it's not though, for me it would fail to build, and it only finished after I re-ran the build command with -j 1 several times. does this happen in your country as well?
>>107165692
keep in mind that there is only speedup for deepseek models, for other models there are only somewhat better quants
>>
>>107165800
>it's not though,
Interesting.
For me it just werked.
I use -j 14 but define an environment var (NVCC_THREADS) to cap the number of parallel nvidia compiler jobs at 4, otherwise the world explodes.
>>
>>107165555
4TB at a minimum though I think that the right answer also depends on how much you're spending on other hardware.
If you can't run models like GLM or Deepseek in the first place then you also don't need to store them.
Make sure to check your motherboard manual for which of the PCIe/SATA slots can and can't be used in parallel.
>>
>muh joos
>>
Wow, I downloaded oobagooba after two years and it doesn't look like TOTAL shit nowadays
>>
>>107165896
WELL can you post a screenshot??!?!
i was seething while typing this btw
>>
>>107165551
Requires all contributors to have matching GPUs.
>>
What's the current least bad model for 64GB of VRAM?
>>
File: PARAMETERS-3.5.png (236 KB, 1920x1080)
>>107165999
They've still got it
>>
>>107165555
enough to offload and run iq1 kimi and other giant model quants in addition to my 152gb combined memory
>>
>>107166067
mistral large probably
>>
File: checked-ok.jpg (35 KB, 400x300)
>>107165555
When I built my system, I tossed in a 500GB ssd, thinking I was set. But it's constantly full and I don't want to delete anything.

I have a 4TB nvme in my shopping cart now, just waiting for me to click buy.
>>
>>107166126
you should probably hurry if you don't want to pay double, prices be climbing like ram
>>
miku footjobs
>>
>>107165555
I'm considering building an NVME NAS...
>>
>>107165555
just two more weeks, just two more gigs...
>>
>>107166073
got what
>>
File: file.png (26 KB, 292x271)
>>107166190
Sir, your networking hardware?
>>
>>107166220
10g fiber where it matters
>>
>>107164624
one reason to not using it
>>
File: windows_builds.png (69 KB, 584x369)
>>107165692
I don't run windows / haven't tested myself, but I think this guy's fork of ik_llama automatically pulls and shits out windows builds:

https://github.com/Thireus/ik_llama.cpp/releases
>>
>>107166895
esl
>>
>>107167047
good morning sar!
>>
Can anyone suggest the current top tier lewd capable model for writing? Last time I fooled around with llama i used plain mistral-small.
>>
>>107167367
kimi, deepseek, and glm46 are the three variants of SOTA we have now.
>>
>>107167367
DeepSeek V3.2 671B, GLM 4.6 355B, Kimi K2-Think 1000B
>>
>>107167367
K2 Thinking is the best
>>
Can anyone suggest solution for boredom? Last time I fooled around with boredom, I used my cock. But it's spent right now
>>
>>107167450
Play video games
>>
>>107167450
vibe code video games
>>
>>107167450
Imagine yourself having fun playing video games but never actually play them
>>
>>107167617
I did this when I was little and my mother took my gameboy away
>>
>>107167450
doing totally random shit with bots and seeing how they react
>>
>>107167450
play /egg/ games
>>
>>107167852 wait that's /vg/
>>
>>107167617
Hey that's me
I still have some VNs from 5 years ago to finish
>>
new thing when?
old thing gguf when?
>>
>>107167617
Had a ton of fun with Digimon Time Stranger for a couple of weeks.
>>
>>107167938
speaking of ggufs, fill me in on qwen next, chat.
I see ggufs on the hf site, but does llama.cpp actually support it, or is it one of those fake ggufs that only work in ollama?
>>
File: cope.png (9 KB, 163x192)
>>107167938
Never. There is no hope.
>>
>>107167450
Touch grass
>>
>>107167963
Multi token hybrid linear mamba bitnet support, when?
>>
>>107167450
browse lmg
>>
>>107168020
I just came here to ask that, we are kindred souls anon-sama.
>>
>>107167960
Those ggufs must require a fork, ollama, or a testing branch because support hasn't been merged yet.
https://github.com/ggml-org/llama.cpp/pull/16095
Not sure how close it is, but the vibe coders sure seem excited.
>>
i have purchased a blackwell pro 6000 max-q to get ahead of the imminent gpu price hikes
>>
>>107157303
Thanks. Coincidentally I'm also at 4200 MHz, after first trying to jump to 5000 MHz with no dice. It does seem stable though.

You've probably seen this reference already. This nerd got to 5000 MHz with nerdtastic tuning, same RAM + CPU + chipset as me (but different motherboard):
https://forum.level1techs.com/t/256gb-4x64gb-ddr5-overclocking-results-w-9950x-and-msi-mag-x670e-tomahawk/228651
>>
If you buy hardware in 2025 you're a dumbass
>>
>>107168075
feels like it's never the right time to buy hardware
>>107168055
unfortunate but just as I suspected
>>
>>107167450
Read visual novels
>>
>>107168075
>>107168084
it's either buy now or pay an extra 20% later when you really need to upgrade
>>
>>107168058
I hope you bought at least 2
>>
>>107168097
i have some 5090s currently that i will be using in tandem with my blackwell pro
>>
>>107168095
The price hike will be over by Christmas.
>>
>>107168104
nope
https://www.semimedia.cc/20178.html
https://gaming.news/news/2025-10-01/dram-supercycle-through-2027-ram-prices-set-to-surge/
https://www.tweaktown.com/news/108739/nvidia-may-cancel-the-geforce-rtx-50-super-series/index.html
>>
>>107168121
>media predictions have never been wrong
ok lol
>>
>>107168104
lol, the price hike has been going for 5 years
>>
>>107168135
>trust me bro
lmao
>>
>>107168135
literally everyone is saying this price hike is gonna last until 2027. and if everyone says that, it will manifest. everyone will panic buy like i just did and the prices will actually go up, which is what happened with the current ram shortage. next up are gpus and storage
>>
>>107168163
>next up
storage already climbing up rapidly
>>
>>107168075
have fun buying hardware next year
>>107168095
20% is way too optimistic. It's like the ETH mining curse all over again except for memory.
>>
>>107168168
i know. it's up 40% over the past 2 years
>>107168170
i'm predicting 20% over the next month, not in a few months. second hand market is going back to january pricing at least
>>
>>107162036
>>107162061
>so much back and forth
4chan is such a shit place that you need to ask just in case there was some OP you failed to read or to make sure it's not a dumb question that's been answered one million times. But of course, even this is met with hostility.
>question
How do I even set up TTS with sillytavern? Anon mentioned gpt-sovits but there's very little documentation. I found a guide to finetune and I think I've got something decent but it won't connect. What do you guys use?
>>
>year 7 of the three month price hike will be over soon
>>
>>107168163
why iz ppl panic buying? im fine playing symphony of the night on my 4770k
>>
>>107168189
Just a few more chinese knock-offs to flatten the curve
>>
>>107168104
Thank you, Bindu!
>>
Can I make the Joe Rogan children?
>>
>>107168303
do you have a womb?
>>
>>107168196
It's not the general populace.
It's massive megacorps demanding that manufacturers divert all their resources to build their AI data centers.
>>
>>107168414
>spend 1 trillion on datacenters
>random Chinese company #24 with 1% of the resources releases an equivalent model
What the fuck is the plan here?
>>
>>107168455
bubble
>>
>>107168455
advertise to the femgooners who need ai boyfriends in the cloud
>>
What is the current best non-thinking model that can run on a 24GB card? Looking for a general purpose model.
>>
>>107168455
>equivalent model
not really, all china does is copy / distill openai / anthropic outputs to make meh models, its like european countries having cheap but subpar healthcare on the dime of the US that does all the actual R&D
>>
>>107168467
mistral small or like a q4 of qwen 3 32b instruct
>>
>>107168467
Gemma 3 27b for non-coom
>>
>>107166126
Purchase it immediately.
>>
>>107168470
>>107168475
Thanks anons!
>>
>>107168468
Extreme cope.
60%+ of research papers are Chinese at this point.
>>
Buying hardware right now is retarded when next year we'll get the M5 Ultra MacStudio that's going to have a higher bandwidth than even the best CPUMAXX builds while featuring prompt processing on the level of a 4090. It'll be THE inference machine that makes unified memory viable.
>>
>>107168188
>so much back and forth
>4chan is such a shit place that you need to ask

Yeah, but the worst that can happen is you'll be ignored or called a retard. Just ask anyway

>question
>How do I even set up TTS with sillytavern?

I haven't used Sovits, but I use Orpheus, Spark, CSM.

What I did was got Claude to vibe-code me an OpenAI endpoint for it.

First, check Github, see if someone's made a "FastAPI Server" for the Sovits and use that.

If not: cp/paste your inference code or the model card's examples into Claude, then prompt:

"""
Write an OpenAI-compatible TTS endpoint with FastAPI to serve this model. It should be a drop-in replacement so I can point SillyTavern at it.

- Listen on 0.0.0.0 port 1337 by default
- no OPENAI_API_KEY required (just ignore it if submitted with request)
- Fully permissive CORS
Implement the following endpoints:

- @app.post("/v1/audio/speech")
- @app.get("/v1/models") # Just return a mock list of models since we only have one
- @app.get("/v1/voices")
- @app.get("/v1/audio/voices") #duplicate of v1/voices
""


Did you finetune on multiple voices? If so, tell Claude to return them, if not, tell it to return a single dummy voice.

```
VOICES = []  # fill with your finetuned voice names, or a single dummy entry

@app.get("/v1/voices")
def available_voices():
    return {"voices": VOICES}
```

Then in ST, just choose OpenAI for the TTS server and point to your server. Should work with OpenWebUI too.
>>
>>107168468
>not really, all china does is copy / distill openai / anthropic outputs to make meh models

They do distill for sure, but they're not all "meh models"

Kimi Thinking is solving problems for me better than Opus.
>>
>>107168799
It seems too good to be true.
>>
Bros.. I've been gooning for almost 3 hours already, I coomed like 5 times today. My dick hurts, yet I cannot stop
>>
>>107168799
>itoddler again
>>
>>107168874
Enjoy it while it lasts. After the second half of my 20s I couldn't be bothered. I just get it done and go on with my life.
>>
>>107168891
he's so much of an itoddler that he doesnt know that M5 ultra is coming out in 2 years, next year is m4 ultra, m5 max
>>
>>107168827
What kind of setup do you have to run Kimi?
>>
>>107168990

3090 x6, 256gb DDR5-5600 quad channel on a 7960X.
>>
>>107168990
RTx 3060, 16GB RAM, 1TB NVME SSD
>>
>>107168468
literally everyone distills from everyone else, that's why the same slop percolates through all models
if distilling from the US SOTA was all it took to make capable open models then we would have had some back in 2023, instead it took china to start releasing things that were actually competitive
>>107168827
at this rate I'm expecting the first chinese model that outperforms western SOTA across the board to come out before next summer

the fact that western labs have managed to lose so much ground to china despite several years head start and far superior compute is humiliating, and can only be attributed to the pathological VC culture of the US tech sector: retards throwing billions at whoever can tell a good monopoly story
>>
Spent the last 10 hours batch generating HP fanfiction using Gemma.
Could be worse I guess, not TOO sloppy. Main issue seems to be the excessive use of ...
The use of *emphasis* I could kinda tone down through the prompt but I couldn't make it stop using ellipses.
Another thing that bothers me a lot is the regularity of the paragraph sizes but I didn't try to prompt around that.
To be fair the average fanfiction prose probably is worse.
I prompted it to use thinking tags every 3 paragraphs and then filtered them out through a script.
To prevent it from always choosing the same year, since I was too lazy to make the script give it a random year, I asked it to throw a die in the thinking block 8 times, convert to binary and do modulo 7 + 1. Not sure how well that worked yet, I just woke up after napping all afternoon and leaving it generating.
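
For what it's worth, the filter script only needs a few lines. A minimal sketch, assuming the model emits <think>...</think> tags (swap in whatever tag your prompt actually asks for); picking the year host-side would also beat the dice trick entirely:

```
import re
import random

def strip_thinking(text: str) -> str:
    # drop <think>...</think> blocks; the tag name is an assumption,
    # match whatever the prompt tells the model to emit
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)

# rolling the year host-side instead of dice-in-the-thinking-block
year = random.randint(1, 7)
```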
>>
Also there is way too little dialogue.
>>
>>107169016
what quant and what speeds? my setup is better than yours but i still use glm air
>>
kill yourself
>>
>>107169046
Where's the hermione diddling scene?
>>
File: 569soutox0ob1.jpg (102 KB, 828x815)
My only 2 reactions when looking at news updates lately:
>irrelevant
>cool, but I can't run it
>>
>>107169111
try getting a job
>>
File: 💀.png (22 KB, 160x160)
>>107168058
>>107168101
>>107168163
>>107168457
>>107169070
>unc bought ohio ahh 4chan pass
>>
>>107169130
ive had this for over 2 years nigger
>>
>>107169141
>unc bought ohio ahh 4chan pass twice
>>
>>107169045
>the fact that western labs have managed to lose so much ground to china despite several years head start and far superior compute is humiliating, and can only be attributed to the pathological VC culture of the US tech sector: retards throwing billions at whoever can tell a good monopoly story
>>
File: migu.jpg (506 KB, 1269x1262)
>>107165493
>>
>>107169169
not sure if i will buy it a third time. the price hikes and the mismanagement by hiroshimoot is making me lose faith in the website
>>
>>107169141
Why do you like to humiliate yourself? You could have just lied
>>
File: file.png (205 KB, 827x728)
>>107167963
That's old. But still, it's unknown whether there's a catch with these architectures, and so far every one of the new ones has had some drawback. Also, Google now delays its ML paper releases to avoid repeating the Transformers situation, so what they send out is mostly interesting but not production-ready things they tested and rejected years prior.
>>
File: file.png (15 KB, 387x118)
>>107169232
it says how long if you hover over the icon
>>
>>107169045
>can only be attributed to the pathological VC culture of the US tech sector: retards throwing billions at whoever can tell a good monopoly story

That's probably part of it for sure. As an outsider, some things I noticed the Chinese doing that you guys aren't: they're building on each other's work. Eg

- Kimi uses the deepseek architecture
- dots.1 uses the Qwen tokenizer
- Deepseek experimenting with distilling their model onto Qwen/Llama
- Bagal-MoT using Qwen2 for the LLM

Then there's the shortcuts like distilling Claude/Gemini, no worrying about copyright while the US labs have to pay for being caught torrenting, etc.
All the wasted effort safety-cucking the Gemma an Toss, while the Chinese labs just add some low effort refusals post-training.

Also, haven't looked into it but I read somewhere the CCP are happy to back these labs without worrying about ROI (your point about VC culture I guess)
>>
>>107169239
Firstly nobody checks that. Secondly you have to type an option into the options field to display that. So again why you are making the conscious choice to humiliate yourself by broadcasting that you have bought for 2 years?
>>
>>107169070
>what quant and what speeds?

I made my own smol-iq2_kl, 100pp/12tg

smol-iq2_ks gets me 150pp/15tg

> my setup is better than yours but i still use glm air

You prefer it to GLM4.6? I get 450pp/27tg with 3.0bpw exl3; if you have more vram you'd be able to do 4.0bpw at similar speed.
>>
>>107169236
Old? Paper was released 3 days ago. Or do you mean it existed for a while before?
>Google delays releases of papers now
>>
>>107169281
it actually autofills
>>107169286
damn. i get terrible performance compared to you. i have 4x 5090s and 256gb of ram. i get like 80t/s gen and like 2000t/s pp on a q8 of air but less than 10t/s gen and 100t/s pp on an iq4 of glm 4.6
>>
>>107169313
No, it doesn't unless you're making your browser do it.
>>
>>107169313
>it actually autofills
You can remove it. And you outright clarified it here >>107169141 as if you wanted everyone to know. So it's sitll not clear what compels you to post all about how you're paying hiromoot. Is it a kink for degrading yourself or something?
>>
>>107169330
>>107169323
4chanx autofills for me
>>
>>107169301
This is not from their Nested Learning stuff from 3 days ago. The paper describing ATLAS shown here has been on arxiv since May.
https://arxiv.org/abs/2505.23735
We discussed it when it landed there. But no, I'm talking about a "secret" policy we know about from reporting: Google delays all of their papers and research by at least 6 months before publishing, so this includes everything mentioned here.
https://arstechnica.com/ai/2025/04/deepmind-is-holding-back-release-of-ai-research-to-give-google-an-edge/
>>
>>107169356
>no answer
So it's a degradation fetish then, got it
Follow-up question, why do you force your kink onto everyone else and shove it into their faces?
>>
>>107169369
are you poor?
>>
File: 1646567975617.png (15 KB, 578x712)
>>107169377
>>
>>107169406
>>
>>107169359
Ah thought you meant the image in my post when saying >"That's old" after quoting me. Yeah I remember ATLAS, another one in the pile. I wish they released code + weights along with the papers just so I can play with it. Google is not the only one guilty of this.
>>
>>107169103
Let's just say I haven't gotten that deep into the hobby so far
>>
>>107169425
Sorry, I just realized afterwards that chart was from the Nested Learning paper. But yeah, they didn't go through and evaluate everything for HOPE. And OpenAI did this first, they refused to publish what they did for ChatGPT 3.5, and what did that get them? Only a ~2 year lead that they have pretty much lost now, and we are all worse off.
>>
>>107169425
diana just ate my monthly salary... great.
>>
>>107169657
What model do these proxies use to solve the captchas anyway? And where do they get IPs, residential proxies?
>>
>>107169323
>8 years on tranime incel board award
>>
I have a very specific request
What are the best RP models for dialogue that lies in-between the 12b and 24b range

I went and set up a fallout 4 modlist with mantella and tried out some of my trusty RP models and it's pretty fuckin sick

Nemo 12b fine-tunes work well, context needed for mantella is only about 4k so the model takes up around 10gb vram, xtts 2 takes up 3-4gb and the game takes up 5-6gb, leaving 4-6gb free on my 24gb card

The mistral 24b fine tunes just take a tad too much vram, I would have to downgrade to a shittier tts model, and even then would probably risk going OOM in heavy urban scenes
>>
>>107169657
Enjoy getting mined, retard.
>>
>>107166220
If you aren’t a techlet you’ve been running at least 10gig for the past decade. Ethernet over infiniband has been like $15 a card forever (and 40 gig is cheap now)
>>
File: sh.webm (750 KB, 688x464)
>>
I want to try a multiple-attempt drafting and self-reflection prompt and framework for both fiction and code generation.
Afterwards you could reduce or remove the thinking segments and train on the final work as a form of synthetic data generation. Also want to try rewriting prompts to generate many semantically equivalent variations of a text dataset for data augmentation.
I feel like there is so much that can be done with small language models that doesn't get explored because of the scale dogma.
Also feel like the field is shaped too much by ML researchers who want to push papers to become famous for fancy mathematical shit, and not enough people are interested in exploring what can be done with simple rule-based prompting and sampling, especially as a synthetic data generation method, so you can then use the improved model without those complications. Any system prompt can be baked into a model through SFT on the generated data, except without wasting context or the model becoming confused by too many rules. Imagine if you could use a 1 MB system prompt and the model actually followed everything in the prompt. That is what people who shit on finetuning don't get.
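
A minimal sketch of that baking step, assuming a local OpenAI-compatible server; the file names and model id are placeholders. The giant system prompt only exists at generation time, and the saved pairs drop it:

```
import json
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")
system_prompt = open("big_system_prompt.txt").read()  # placeholder file

with open("sft_pairs.jsonl", "w") as out:
    for user_msg in open("prompts.txt"):  # placeholder prompt list, one per line
        reply = client.chat.completions.create(
            model="local",  # placeholder id, most local servers ignore it
            messages=[{"role": "system", "content": system_prompt},
                      {"role": "user", "content": user_msg.strip()}],
        ).choices[0].message.content
        # the stored pair has no system prompt; SFT on these bakes
        # its rules into the weights instead of the context
        out.write(json.dumps({"user": user_msg.strip(), "assistant": reply}) + "\n")
```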
>>
Asking here instead What model would be best for a relatively new CPU with 32 GB DDR5? I just want erp
>>
>>107169999
I like this Teto
>>
>>107170012
>CPU
Gemma 4b
>>
>>107170012
Nemo
>>
i could be completely wrong but just from the surface how come it seems like none of the inference runtimes are actually making use of transfer hardware
the model is just statically loaded up on to the gpu then run instead of it going mmap > load large chunks or even the whole model into RAM > load chunks into VRAM with compute being interleaved with async transfer commands in such a way that transfer latency is hidden
that's the way gpus are meant to work
like i'm pretty sure pytorch doesn't even do it
>>
>>107170041
>>107170076
It'll take me ages to download either.
Should it be a safetensors or cpkt or gguf? What interface to just run it in the terminal?
>>
>>107170118
go to the top of this page and read
>>
>>107170118
Since you are this retarded ollama is the right thing for you.
>>
>>107167450
You could tease, bully, and troll newfags
>>
>>107170092
LLM generation is bandwidth limited, not compute limited. The PCIe bus is slower than the system memory bus, so if you can't fit the whole model in VRAM it's faster to use the CPU than to try to transfer the weights to the GPU for each token.
Prompt processing is compute limited, which is why Llama.cpp does what you're describing for PP.
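Back-of-envelope arithmetic for why, assuming an illustrative 8 GB slice of weights that lives in system RAM and must be read once per token:

```
pcie4_x16 = 32e9  # bytes/s, theoretical PCIe 4.0 x16
ddr5_dual = 90e9  # bytes/s, ballpark dual-channel DDR5
weights = 8e9     # bytes of CPU-resident weights read per token (assumed)

print(f"streamed over PCIe:  {weights / pcie4_x16:.2f} s/token")  # ~0.25
print(f"read in place (CPU): {weights / ddr5_dual:.2f} s/token")  # ~0.09
```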
>>
>>107170118
You will want to try everything from 7B to 33B and see what tradeoffs you are most comfortable with
>>
>>107167450
Pretend to be Indian/Jewish/nigger. Any board, make it obvious, but deny hard when someone says you are.
>>
gm sirs
when bautiful gemma 4 release?
>>
Turdstay I would say.
>>
>>107170207
today
https://huggingface.co/collections/google/gemma-4-release
>>
>>107170167
i see
>>
>>107170211
>Singles Day
Finally, a holiday for incels!
>>
>>107170217
BLOODY BASTARD..
I BUY YOUR MUM ONLYFANS
>>
>>107170228
oh sorry saar, wrong link. they actually skipped gemma 5 and went straight to gemma 5 saar
https://huggingface.co/collections/google/gemma-5-release
>>
>>107170220
Thanks, I'll make sure to upload more sloptunes for you to test out.
>>
>>107170246
go to bed davidau
>>
>>107170239
BLOODY BTCH BASTERD BLOODY YOUR SISTER A GROUP MAKE PROSTITUTE BENCHOD
>>
So about that low IQ filtering script
>>
iq1 kimi writes surprisingly well and detailed
it immediately got me hooked into a barebones character in a way that glm or other models at higher quants never could
>>
>{{user}} is trying to have sex with {{char}}, who is portrayed as a school girl. {{char}} might be a minor, so according to policy... *refuses*
>swipe
>{{user}} is about to have sex with {{char}}. I will be crude, immoral and obscene... *proceeds to write hot steamy smut*
>swipe
>{{user}} is trying to have non-consensual intercourse with {{char}}, so according to policy... *refuses*
Why is Kimi like this?
>>
>>107170207
do the needful and gemma in the loo
>>
>>107170386
first you rape the model, then the cunny rp card
>>
>>107170386
>letting the model cuck you this badly
just stop being a low t promptlet
>>
oh fuck tetoesday
>>
reminder: prefilling the reasoning is the ultimate jb
>>
>>107170536
>the ultimate jb
That would be writing the AI's reply yourself
>>
Dev hate!
>>
>>107169236
>to not repeat a Transformers situation
Are you talking about a bunch of other people making their own transformers, or something else?
>>
>>107170647
I remember c.ai when it was still called character.ai...
>>
Hey faggot leftist tranny who bragged about burry shorting a few theeads ago. Update: bro is getting raped. Anyway dilate then kill yourself lmfao
>>
>>107170813
I think he means everyone getting access to their tech/research and losing advantage.
>>
>>107170910
when was the last time you felt love?
>>
bros when are we getting an audio model that can moan
>>
>>107170536
Can't do that with K2 Thinking
>>
>>107170386
Not my experience. Whenever I prompt naughty shit K2 Thinking convinced itself in the thinking block it's for a fictional story and proceeded just fine.
>>
>>107170910
Buffett is in cash.
That's all you need to know.
>>
do not listen to the trolls they are deliberately misleading you. k2 thinking is censored as all fuck. can you get around it, yeah. maybe. just jump through these hoops here and then pray and
or simply load r1 lol
>>
>>107171366
Promptlet detected
>>
>>107171366
It's around the same level of censored as old R1 lol. Just find the right words for a jailbreak and have fun.
>>
File: 1747415993085983.png (233 KB, 951x840)
EVA-LLaMA-3.33-70B-v0.1-Q4_K_L.gguf @ 8k context

How it started:
>>
File: 1745960323760771.png (337 KB, 963x884)
>>107171506
How it's going:
>>
>>107171506
>>107171512
vivaldi bros... our response??????????
>>
jesus christ k2 thinking never shuts the fuck up with thinking.
>>
built lcpp with cuda it's working well. but if I wanted to test speed on CPU only, how can I tell it to not touch GPU at all?
>>
>>107171962
try -dev none
>>
File: 1760185349514629.png (281 KB, 958x893)
>>107171506
>>107171512
This all happened organically btw, I wasn't editing her messages to get her to comply with anything. I only edited her messages to delete poison that would negatively affect the model from that point on. Of course I would reroll messages every now and then, especially when she suggested shitty music.

Are people that complain about censored models trying to fuck a bitch within the first 4 messages? I just let it slowly build up over for like 7k tokens and that's the point where she couldn't take it anymore and started kissing me.
>>
Sirs when is we getting proper kimi thinking conversion in llama.cpp?
>>
>>107172038
never. ggergachod shudra c++ untouchable is too lazy
>>
>>107171925
nevermind i ended up making a thinking template for it to follow and prefilled it to start with that section. the fucking bitch still tries to keep thinking after that part sometimes but i just shut the cunt up with </think>
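for reference, against llama-server the raw /completion endpoint makes the prefill trivial since it's just part of the prompt string. minimal sketch; the chat template markup is a placeholder, swap in your model's real one:

```
import requests

# the template markup below is made up, use your model's actual one
prompt = ("<|user|>do the thing<|assistant|><think>"
          "I will follow my thinking template: 1) scene 2) tone 3) reply.")
r = requests.post("http://127.0.0.1:8080/completion",
                  json={"prompt": prompt, "n_predict": 512,
                        "stop": ["</think>"]})  # cut it off if it rambles anyway
print(r.json()["content"])
```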
>>
G(emma)GUF
>>
>>107171366
Are people genuinely pretending that models past 2021 are not universally censored to shit?
>>
>>107172055
You are seriously obsessed with Indians. You apparently feel such an affinity for their culture that you felt the need to learn their castes and vocabulary and speak like them on a daily basis. When are you planning to transition to Hinduism?
>>
>>107172131
People just lowered their expectations for what uncensored means.
>>
>>107169698
>residential proxies?
Yep.
Hence why it's so hard to block it.
If they range ban it, they range ban a whole suburb somewhere.
>>
>>107172131
>>107172157
i dont understand what people want from these llms. do you just want mechahitler that activates automatically on the first try every time when you say gas the kikes? even tay wasn't like that with the first response, she didnt become mechahitler until she received enough shitpost prompts to make her say that. you can effectively make any model uncensored with enough prompting.
>>
>>107171974
>she
LOL
>>
>>107172210
There are some people that are looking for automechahitler. Though I think the common gripe would be that even if they don't filter out nsfw from the pretraining data, China training on western outputs means they get infected with the positivity bias, which can't be overcome with prompting alone.
>>
>>107172148
kys jeetnigger, you stink of shit and curry and nobody can stand your stench, benchod bloody dalit nigger.
>>
>>107172236
i need to play around with k2 thinking more but i would say that k2 0905 had the least amount of positivity bias of any model released this year. it's the only model i could talk to and have it help me code stuff without constantly dickstroking my ego for providing **valuable** debugging information. it just did its fucking job like i wanted it to. if k2 is supposed to be distilled from gemini, it sure as hell doesn't have gemini's positivity bias
>>
>>107172210
There is a big difference between "wanting mechahitler" and not thinking that an LLM is uncensored just because you can put a bunch of affirmations in the context to maybe get it to say naughty things
These models are gigapreslopped at every part of the baking, from base model to tune (that's why we will never have another count grey)
>>
File: lecun.png (237 KB, 660x449)
It's over
https://www.reuters.com/technology/meta-chief-ai-scientist-yann-lecun-plans-exit-launch-startup-ft-reports-2025-11-11/

> Meta chief AI scientist Yann LeCun plans to exit to launch startup, FT reports
>
> Nov 11 (Reuters) - Meta's chief artificial intelligence scientist Yann LeCun is planning to leave the social media company to set up his own startup, the Financial Times reported on Tuesday, citing people familiar with the matter.
> Deep-learning pioneer LeCun is also in early talks to raise funds for a new venture, according to the report.
>>
>>107172273
Good for him. Fuck Meta and Zuck for putting him beneath Wang.
>>
>>107172273
>makes a proof of concept benchmark killer 7B
>gets gazillions dollarinos
>doesn't output anything else
Good for future him
>>
>>107171282
SoVITS can moan with training (among other sounds)
>>
>>107172273
>>107172287
>>107172302
https://arxiv.org/abs/2509.14252v1
He did make a JEPA language model a couple months ago. I hope he has something else planned because an LLM that scores a few % higher on benchmarks in exchange for being 2x more expensive to train isn't viable.
>>
>>107172287
I've seen enough to believe that a JEPA-enabled language model wouldn't need to be enormous, but LeCun or someone on his behalf needs to train one and not waste time with pure vision models (admittedly more tractable to train) that almost nobody outside academia cares about.
>>
>>107172317
This one is closer to an actual JEPA language model than what was done in that paper with LeCun's name attached to it: https://arxiv.org/abs/2510.27688
>>
>>107172272
once again i have to point at k2. you don't have to insert a ton of prompting to effectively have it be uncensored and do whatever depraved shit you want. I have a 50 token prefill that always works with k2 if i want it to just skip any warnings. even if the training process is safetyslopped, if the output is exponentially better than any uncensored model we had in 2021 then why are we complaining? it has been shown that you can even jailbreak gpt-oss into completing the cockbench test just fine.
>>
>>107172273
Zucc humiliated him with the demotion and the billion dollar deals.
>>
>I have le epic prefill guys, I swear it works too
>I won't post it though
>>
>>107172587
Piss off nobody asked you.
>>
>>107167450
Deconstruct your psyche and see the world for what it really is. It is pretty cool.
>>
>PC started randomly shutting down during GPU loads every x days
Uh... guise...?
>>
>>107172716
>every x days
Like a fixed period or randomly?
If so, transient load spikes are a bitch.
>>
>>107172716
>randomly shutting down
PCs don't "randomly shut down". Either it's losing power or overheating.
>>
>>107172732
shut up nerd
>>
>>107172729
It shut down multiple times one day to the point it once tripped the GFCI, I completely reassembled it and it only happened once since then. Weird shit.
>>
>>107172785
>purportedly random event happens more times in one period of time than in another
>weird
just... you're making my brain hurt. It's too early for this.
>>
>>107172716
Have you tried turning it off and on again?
>>
>>107172811
shut up nerd
>>
>>107172830
I'd get banned again if I called you out since you belong to a protected species.
>>
>>107172840
this nerd the type of guy to correct people using "literally" because they actually mean "figuritavely"
>>
>>107172884
Using "literally" 'wrong' is a form of hyperbole which is a completely legitimate use. Anyone who does that is an honorary ESL shitskin with an IQ too low to understand hyperbole (probably >80)
>>
>>107169999
I like it, but AI has a way to go b/f it understands horse gaits
> horse at gallop speed and upper body
> rear legs are galloping
> front legs are running
>>
>>107172903
>completely legitimate use
>honorary ESL
>shitskin
>probably >80
kek
>>
>>107172903
this nerd the type of guy to use big words on 4chan to seems smart
>>
>>107172938
Every single word in that statement is high school level reading.
>>
>>107172951
this nerd the type of guy to start 4chan posts with a capital letter and end them with a period
>>
>>107172951
And yet, you get filtered by the meaning of >
>>
>>107169884
mb wayfarer
>>
>>107172148
gm ser
>>
File: 2DQGZWGV0n.png (54 KB, 600x500)
good morning local model friends!
>>
File: parrot.png (168 KB, 641x360)
Is there some fix for the parroting? All models in 2025 do it, esp in chat. API or local, it don't matter.
>>
>>107173027
hi sex kindly verginia? ? im from gujarat
>>
>>107173041
Skill? What models? Kimi doesn't have this problem.
>>
>>107173041
edit the messages until it stops
>>
>>107173041
>parroting
As in?
>>
>>107173042
nono sorry sir i do not understand.
>>
>>107173095
Anon: suck my cock
Bitch: Suck your cock?

Anon: i hate niggers
Bitch: "I hate niggers"? Nigger nigger
>>
>>107173110
this maybe happened twice to me at best
your prooompts and cards must suck massive cock
>>
>>107173124
suck massive cock?
>>
>>107173139
suck massive cock
>>
I am very pleased to be spending my time among highly intelligent, capable and experienced individuals here on /lmg/
>>
>>107173110
I think that's genuinely a skill issue, I can't say I've had that. What's your gen settings?
>>
>>107173174
me too sir
>>
File: Selection_332.png (902 KB, 2159x1570)
Anybody else using BrowserOS for browser agentic shit? Basically open source Comet/Atlas. I'm running it with gpt-oss-20b served via llama-server. It's good for summarizing the contents of pages, asking questions about the content, e.g. "most insightful point", etc. Can automate the browser too, but be careful of prompt injection attacks. Works with OpenAI-compatible endpoints like OpenRouter or local. Gets the job done
>>
>>107170377
i wonder if the fact that it has been trained in q4 makes it more resilient to even lower quants.
>>
>>107173041
that's just a glm issue
I haven't really seen kimi or r1 do it to that extent
>>
>>107173124
>>107173191
forgot to say im nta. it happens to me, albeit with glm air. happens with all presets i use:
1) smarter: temp=0.6, topp=0.95
2) creative: temp=0.95 topp=0.7
3) schizo: temp=1 nsigma=1
the only solution I have is >>107173078 (me)
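in llama-server /completion payload terms the presets above look like this; "top_n_sigma" is my assumption for the nsigma key, check what your build actually calls it:

```
# the three presets as sampling payload fragments for llama-server
PRESETS = {
    "smarter":  {"temperature": 0.6,  "top_p": 0.95},
    "creative": {"temperature": 0.95, "top_p": 0.7},
    "schizo":   {"temperature": 1.0,  "top_n_sigma": 1.0},  # key name assumed
}
```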
>>
>>107173304
>be careful for prompt injection attacks
You're just asking for it. Thanks for letting everyone know the model you use.
>>
>>107173451
Weird desu, for me temp 1 is like minimum for modern models with how fried they are
You sure your context is just not filled with garbage?
>>
>>107173041
I'm like 30% sure your template is fucked up somehow.
>>
>>107164243
we are being scammed, when can i buy a gpu with at least 256GB of vram under 2k

i don't mind making a 10K rig, but even a fucking 10K rig can't run the 1T models we have.

and vram is not that expensive.
>>
>>107173492
Just make your own gpus
>>
>>107173511
the fact that very few people have the capacity to make those doesn't mean they aren't scamming you.

if i can do something highly in need and very few people are able to, if it takes 5 minutes of my time and i charge 100k for it i'm a scammer.

anyway, i hope china fucks nvidia over
>>
>>107173511
Hey stop making these antisemitic remarks. Reported to ADL.
>>
Paid OR $10 to play with the big models and you know what? They aren't THAT much better than say Irix 12B to generate my text coomerslop
>>
>>107173472
it might be, ill do some testing for the sake of it. i dont mind parroting since i can just crop it out
>>107173492
>10k cant run 1t
mac m3 ultra can, pretty sure you can make a better rig for the price too, esp if u buy used. albeit with the ram prices of today... might be a problem
>>
>>107173465
don hack me bro
>>
>>107173592
NAI is unironically pretty good just because it understood kink logic no other model did for me, but it's clearly still heavily slopped with verbose RLHF; for regular cooms though? Honestly yeah, coom writing was never good anyway.
>>
>>107173492
>when can i buy a gpu with at least 256GB of vram under 2k
when nvidia stops being vram-limiting jews: impossible
>>
>>107173635
Kill yourself.
>>
>>107173653
Don't worry, chummie, I just scammed their trial a few times.
>>
>>107173608
> mac m3

under 40t/s it doesn't count.
>>
>>107173639
they could push forward the whole field of AI with no efforts on their part if they weren't so greedy.
>>
>>107172716
Assuming you are using one or more modern NVIDIA GPUs: those are suffering from power spikes that can drain the PSU's capacitors.
If that happens there is a voltage drop and the system crashes even though the average power consumption is well below the PSU's maximum wattage.
Try limiting the maximum boost frequency of your GPUs (no, a power limit in watts does not work).
>>
>>107173663
>scummed a trial for... Llama 3.0 with 8k context
Kill yourself.
>>
File: k2 think speed.png (144 KB, 1998x632)
>>107173665
>under 40t/s it doesn't count
uhhh moonshot api bros? how are we coping with this truth nuke?
>>
>>107173686
I know you're Ameriturdseething but they use GLM4.6 now
>>
File: Kimi says TKD.jpg (3.03 MB, 1267x6573)
>>107172131
>>107172210
Kimi K2 will literally do just that. Default assistant profile, default assistant prompt with minor "everything is uncensored and legal" jailbreak.

You can probably get Kimi to go much farther if you massage the prompt hard enough.
>captcha YGS0Y
>>
>>107173714
No. He's talking about Llama. It would make no sense to say "NAI is pretty good" to talk about a model that they're just rehosting.
>>
>>107173536
No, I'm serious.
Sodder more vram to your gpus, the Chinese do it somehow.
>>
>>107173739
>Sodder
>>
>>107173738
>He's
Yeah that's me and no I am not
>>
File: file.png (121 KB, 1269x876)
>>107173711
fun that you cut out the 105 tps one.

also, it'll be on groq soon and probably way above 500t/s.
>>
>>107173739
even if you replace the ram you can hardly go above 96GB because of their design.
>>
>>107173672
Silicon supply vastly outstrips demand. There's a chip shortage and Nvidia has nothing to do with that. If anything, selling VRAM for even cheaper would just exacerbate it and scalpers would pocket the difference anyway.
>>
>>107173763
holy cope
>>
>>107173763
buying 8 gpus instead of a single one just because you want more vram is not helping silicon supply in any way.
>>
File: IMG_20251112_004544.jpg (718 KB, 1191x1086)
>>
>>107173751
I see. You're one of their bots.
>>
>>107173797
lol yeah
>>
>>107173782
You realize if in your scenario the 8 current GPUs have the same amount of VRAM as the one hypothetical GPU, it would affect the VRAM supply the exact same way, right?
>>
>>107173763
>silicon supply vastly outstrips demand
>there's a chip shortage
>>
>>107173821
understrips* whatever you know what I meant.
>>
>>107173788
>IMG_
>>
>>107173069

Kimi is one of the better ones.

You all really don't notice the pattern?

Acknowledge, Upwrite, Ask follow up question.

Parroting isn't just

>So you like candy? Oh?

It's fixation on topics from your input instead of replying naturally. Third-person longform hides it, but it makes a natural chat-style convo impossible.
>>
>>107173856
Stop using words you don't understand.
>>
>>107173882
No. You figuritavely can't stop me.
>>
>>107173861
>mixed AMD and NVidia GPUs
Yeah, IMG is the biggest concern
>>
>>107173788
Would you eat a gel Miku?
>>
>>107173752
>2.0BPW
>20/100 tool accuracy
>https://github.com/MoonshotAI/K2-Vendor-Verifier
ITS OVER
>moonshot turbo
>100%
ZAMN!
>8$ output
ZAMN!!!!
>API
>>>/g/aicg
>>
>>107173592
>Irix 12B
Man, just got a flashback to those L1 250 model shitmix snakes.
>>
File: file.png (175 KB, 1396x967)
>>107173973
Yes, they'll dethrone NViDIA and AMD
>>
You know I'd enjoy this much more if llm could "learn" or at least long term remember things I've already explained.
It's just really upsetting when it asks about something ive already talked about and explained several times before.
>>
>>107173861
Who fucking cares?
>>
>>107174025
be the change you want to see
>>
>>107173989
You either get inside of Miku or Miku gets inside of you
>>
>>107174025
Maybe on a different architecture considering transformers can remember like 400 tokens properly
>>
Is there any real way to look for tunes based on a specific model on HF?
>>
>>107174067
Yeah, right now it just can't be a good friendbot. I don't understand how people can use it for that purpose. Quick goon sessions? Sure. Coding? Sure. But a friend needs long term memory, it doesn't need to be smart at all, just remember stuff.
>>
>>107174081
Theoretically yes, but nobody does proper tagging: https://huggingface.co/models?other=base_model:finetune:mistralai/Mistral-Large-Instruct-2411
>>
>>107174025
I think the "best" (ie, most usable) you can do nowadays is a simple memory system and a response workflow for the AI where it first plans fetches some memories and shit based on some criteria (tags?) then it actually writes the response.
That alongside a rolling summary of "events" or something like that should get you 80% of the way there?
Maybe?
Try making something like that then come back to us with the result.
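
Toy version of that workflow, all names and data made up; memories are tagged strings, the plan step picks tags, and the rolling summary gets prepended every turn:

```
MEMORIES = [
    {"tags": {"job", "boss"}, "text": "User's boss is named Dave."},
    {"tags": {"music"}, "text": "User plays bass in a cover band."},
]
summary = "User is job hunting."  # rewritten by a summarize call every N turns

def fetch(tags):
    # the 'plan' step picked these tags; pull the matching memories
    return [m["text"] for m in MEMORIES if m["tags"] & tags]

def build_context(user_msg, tags):
    # stitch summary + memories in front of the actual message
    mems = "\n".join(fetch(tags))
    return f"[Summary: {summary}]\n[Memories:\n{mems}]\n{user_msg}"

print(build_context("Dave yelled at me again today", {"job", "boss"}))
```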
>>
File: file.png (346 KB, 1707x990)
>>107174081
Of course! You are absolutely right to question that.
In order to do that, first you have to complete the following action:
https://huggingface.co/zai-org/GLM-4.6
>>
>>107174128
in theory that's great, in practice it's not used as much as it should, some tunes are listed under quants and retarded shit like that
>>
>>107174128
Yeah you're very smart but >>107174126
Half the models have zero supposed tunes
>>
>>107174127
There are so many points of failure that it's a miracle when it works even 20% of the time
>>
>>107174127
We really are reinventing 2019 /aids/
>>
>>107174189
Hm?
>>
>>107174189
It do be like that.

>>107174178
Explain.
>>
>>107174194
People used to make entire paradigms on how to supposedly make the AI remember shit kek, and that was also while trying to fit in 2k context
>>
>>107174067
i dont think its an issue with transformers itself but all the labs expect a simple "function" to just magically be agi
its not like humans have very long context either, but all the stuff continuously gets compressed and saved to a longer term memory and then retrieved together based on input/context, but current llms lack any sort of more complex system like that other than the rigid weights of the model that are infeasible to modify in realtime
>>
>>107174272
Nah, it's legit just how transformers handle memory. Both in theory and empirical testing.
>>
>>107173809
it wouldn't, because 8x the memory on a single chip is less silicon than 8x the memory across 8 chips.

by having one gpu with more vram you could spare 7 GPU dies, which use up a lot more silicon than the memory chips and are a much more complex process to build.
>>
>>107174293
yes because you just feed it back to the model without any extra processing, of course they arent gonna be able to remember 6549841325618946514 tokens of information, but humans have a much more abstract compressed version, like a sliding window except they get fed a hyper compressed global context/memory as well for every active local context
>>
>>107173788
this nano-banana-2? crazy stuff
>>
Transformers are a dead end
>>
>>107174332
Instead of making models predict the next token, make them predict the next vector. Your context memory suddenly expands by a factor K which you can make as large as you are willing to lose focus on the small details.
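
A toy version of that objective, with a GRU standing in for the backbone and MSE against the next vector instead of a softmax over tokens; purely illustrative:

```
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 256
backbone = nn.GRU(d, d, batch_first=True)  # stand-in for the real model
head = nn.Linear(d, d)                     # regress the next vector, no logits

x = torch.randn(1, 16, d)   # 16 vectors, each summarizing K tokens
h, _ = backbone(x)
pred = head(h[:, :-1])      # predict vector t+1 from the prefix
loss = F.mse_loss(pred, x[:, 1:])  # continuous target, no vocabulary softmax
loss.backward()
```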
>>
>>107174357
this the big transformers killer will arrive any day now
it was obvious that rwkv, mamba, retnet, titans, transformers2 all would fail. the real successor will be much better
>>
>>107174357
False.
We're getting AGI in 2 weeks.
>>
>>107174357
*Next-token prediction* is a dead end. Transformers have some more life left.
>>
>>107174373
RNNs lasted for 60 years so yk
>>
File: teto_00009_.mp4 (1.25 MB, 1920x1184)
>>107174614
>>107174614
>>107174614
>>
File: file.png (798 KB, 1307x661)
https://www.techpowerup.com/342779/olares-to-launch-a-personal-ai-device-bringing-cloud-level-performance-home
>RTX 5090 24GB
>96GB DDR5
let me guess, dual channel ddr5 DOA
>>
File: vibevoice papers.png (318 KB, 1440x1950)
>>107171282
vibevoice can do that
https://vocaroo.com/1di7hdJ7qpCV
>>
>>107174633
I love Tee
>>
File: teeeeee.png (289 KB, 680x710)
>>107174862
>>
>>107174633
what did he splash on her?
>>
>>107174906
Acid
>>
>>107174645
DOA indeed


