/g/ - Technology






File: 1748050363563976.png (2.2 MB, 1328x1328)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106627153 & >>106617426

►News
>(09/17) SongBloom DPO released: https://hf.co/CypressYang/SongBloom/commit/4b8b9deb199fddc48964c851e8458b9269081c24
>(09/17) Magistral Small 1.2 with vision encoder released: https://hf.co/mistralai/Magistral-Small-2509
>(09/16) Ling-flash-2.0 released, with 100B-A6.1B: https://hf.co/inclusionAI/Ling-flash-2.0
>(09/16) Tongyi DeepResearch 30B-A3B released: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research
>(09/16) VoxCPM 0.5B: Tokenizer-Free TTS released: https://hf.co/openbmb/VoxCPM-0.5B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 2 mexican wiku.png (638 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>106627153

--Papers:
>106629743
--Emergent sexual content generation in AI role-playing scenarios:
>106627295 >106627320 >106627335 >106628370 >106628440 >106628477 >106627386 >106628288 >106629703 >106629769 >106630141 >106635527 >106635622
--MMAP and -fa flag optimization tradeoffs for model execution:
>106630910 >106630998 >106631029 >106631063 >106631111 >106631469 >106631529 >106632044 >106632110 >106631527 >106632142 >106633272
--SillyTavern template system confusion and model compatibility issues:
>106628595 >106628629 >106628693 >106628748 >106628715 >106628726 >106628912 >106628952 >106628998
--Integrating WFGY semantic reasoning engine with local models via gemini-cli:
>106627588 >106627713 >106632115 >106632156 >106632233 >106632363 >106632483 >106632596 >106632652
--Grok model version discrepancies and LongCat-Flash-Chat compatibility issues:
>106629472 >106630436 >106629513 >106629564
--AI-generated atmospheric dungeon descriptions in roguelike game development:
>106634684 >106634710 >106634850 >106634864 >106634912
--songbloom audio generation capabilities and multilingual support inquiry:
>106635239 >106635249 >106635432 >106635822
--Ling 2.0 FP8 mixed precision training open sourced, bitfall implications for int8 training:
>106627466 >106627601
--Successful Qwen3 235B finetune with axolotl, $30 cost incurred:
>106632736
--Post-training instability in models despite SFT improvements:
>106634096
--Meta lawsuit over using copyrighted adult content for AI training:
>106635079 >106635113 >106635190 >106635169 >106635179 >106635192
--Bilingual LLM recommendations for manga/light novel translation:
>106631579 >106631651 >106631807 >106632092
--Qwen3 Next integration with ggml in progress:
>106627804
--Miku (free space):
>106633265

►Recent Highlight Posts from the Previous Thread: >>106627156

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>106635936
waiting for goofs bros...qwenext never ever!!
>>
Unslothed pp in the quanted bussy
>>
how do I jailbreak nemo?
>>
File: gib monies.png (268 KB, 1816x1172)
>>106635967
important work is ongoing in the PR
>>
>>106635473
>How many tokens is too many for a char card
depends on the model, take a look at adobe's nolima. I use 1800 max for the character card (includes the greeting), 450 system prompt, 500 persona.

>Also, general question, if I write a nonhuman character and the story incorrectly describes an action they do that's humanlike and something they shouldn't be able to, where should I first check to fix to prevent that from happening again?
even if you perfectly describe a {{user}} persona that's a bald troll or a dragonman, {{char}} will still grab you by the hair some swipes later.

tips:
>don't waste tokens
>don't do stupid shit like [{"{{char}}'s Appearance"}: ...] (fucking chub cards), use https://tiktokenizer.vercel.app/ to see how your card gets tokenized
>avoid bloating the context, check how much knowledge your llm has about the world you want to erp in
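if you'd rather not paste cards into a website, the same check works locally with whatever tokenizer your model uses. rough sketch, untested, the repo id is just an example (some tokenizers need a HF login), swap in whatever you actually run:

from transformers import AutoTokenizer  # pip install transformers

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")  # example id, use your own

card = open("card.txt", encoding="utf-8").read()  # description + greeting dumped into one file
ids = tok(card)["input_ids"]
print(len(ids), "tokens")
# peek at the first pieces to spot wasteful formatting like the [{"..."}] chub stuff
print(tok.convert_ids_to_tokens(ids[:40]))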
>>
Allister Margrethe
>>
>something small happens
>HIT LIKE A FREIGHT TRAIN
>SENT A SHOCK OF REALIZATION THROUGH MY VERY BEING

smaller models are the most guilty but even bigger ones like to take the most mundane thing and act like its a huge deal. i just had one model describe eating pancakes like sex, talking about savoring each bite and shit. im like ITS PANCAKES, EAT IT AND MOVE ON WITH THE STORY
>>
meta is doomed
https://www.youtube.com/watch?v=I5_JrfvO4G8
>>
>>106635998
is that kobald
>>
>>106636079
>implying
huge AURA boost for ZUCC. Doing a live demo, which no other company even dares to.
>>
>>106636046
That's all the retarded Ugandan RLHF.
Finetune your own model however you like it.
>>
>>106636023
nta but could you have a look at mine and see what would be good to remove, another anon said theyre slop lol. i had an llm write them mostly, im not good at writing so gave it my ideas and got it to pad them out. they seem fine to me but idk lol, top is my most recent, could maybe just give feedback on that if too much effort

https://files.catbox.moe/7hegsu.png
https://files.catbox.moe/rdxzpf.png
https://files.catbox.moe/hw270u.png
>>
>>106636118
Right? People don't get how the occasional live demo fail is a good thing.
>>
https://www.youtube.com/watch?v=MDLLsaAGUB0
>>
File: .png (293 KB, 429x481)
>>106636163
>open video
>see this faggot
>close video
>>
>>106636140
i was considering looking into loras for models but i know they are basically not a thing at all, at least compared to image gen. i don't like to rp with anything less than 70b because smaller ones are dumb, but i also dont have the raw power to tune a whole model, let alone build any data sets.
if this were easy to solve via tuning anyways, it'd be done already
>>
>>106636185
What do you mean LoRAs aren't a thing?
>>
>>106636046
This is 100% on the instruct tune. If we had a good, creative-oriented dataset to create new instruct/rp tunes from scratch this would no longer be the issue.
The models themselves are more than good enough at this stage.
>>
>>106636197
for text models, lora tuning exists but no one does it. they tune the whole model instead. i dunno why but that seems to be the practice
>>
>>106636197
LoRAs produce intruder dimensions within the llm
>>
>>106636215
They just merge the loras, sloptunes aren't even full parameter tunes
>>
>>106636233
That wouldn't be specific to text models.
>>
>>106636198
>This is 100% on the instruct tune
yep because instruct wants to tie everything up in one message, giving you an answer, rather than realizing its part of a longer ongoing story. its also why l2 chat tunes were better than instruct. but everyone tunes on instruct these days. its frustrating, i'd love a true long form rp model but you can't use any of the base ones for that these days, everything is instruct by default now. in st i tend to cut off the tail end of any message, so i kinda force it to continue. once you do that for enough messages it kinda picks up the pattern and doesn't try as hard to tie everything up
>>
>>106636215
I highly doubt that as you need like 8 H200s to full finetune a 7B, and a proportionally bigger cluster of servers to tune anything bigger than that.

>>106636233
I haven't heard of that but you can always freeze most of the weights and train a tiny subset at a time. Eventually it should be the same as a full finetune while consuming a fraction of the memory.
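In pytorch that's just flipping requires_grad, something like this (rough sketch, model id and the layer slice are placeholders):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-model-here", torch_dtype=torch.bfloat16)

# freeze everything, then unfreeze only the slice you want to train this pass
for p in model.parameters():
    p.requires_grad = False
for name, p in model.named_parameters():
    if ".layers.30." in name or ".layers.31." in name:  # whatever subset fits in memory
        p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"training {trainable / 1e6:.1f}M params this pass")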
>>
>>106636295
7B doesn't take that much vram at bf16 with flash attention, you could probably get decent results with only a few hundred thousand samples. you can get the job done on much more modest hardware than you suggest.
>>
File: 30474 - SoyBooru.png (118 KB, 337x390)
>Qwen-Max-Preview
>Qwen-Next
>Wan-Animate
Which kiwi do you wish to see next?
>>
>>106636366
VL, just give me the VL already, where is the VL, VL please do the needful sirs and produce the VL post haste and most expeditiously
>>
>>106636295
>I highly doubt that as you need like
well, whatever the case, it produces a tuned model rather than a lora you attach to a base model
>>
>>106636153
nta, You can always do another pass - edit the text to your liking and then ask the model to create a more concise version. You can halve your tokens. Then edit that version a bit further.
Glancing over the original card I don't know what to say, I've seen worse.
Maybe formatting could be discussed further but if it works then it's not a problem.
>>
>>106636406
yeah i do remove a fair amount of stuff and go over it in several passes and change things but im wary of removing too many things and not capturing the character the way i want. maybe the other anon was just being rude lol idk
>>
File: 1739032505066301.jpg (41 KB, 578x599)
>>106636366
I just want an open source version of Sesame Labs Voice Mode
>>
wan animate in real time is so crazy
>>
>>106636423
It's always a process of trial and error. Sometimes it's fun to do tests to see if the response is any different.
Just like with any prompting, trying to be as concise as possible is the best practice regardless.
>>
https://x.com/adrgrondin/status/1968748817617911961
>>
>>106636508
How much mahmory these phones have? I'm betting they could run Mistral 24B or so.
They should, when they cost $999 or more...
>>
>>106636508
we are so, so back
>>
>>106636466
Is this the prime time to get a vtuber girlfriend? I don't think your average simp can set it up for her.
>>
>>106636233
>intruder dimensions
You are saying this only because it sounds cool.
>>
File: file.png (170 KB, 604x1050)
>>106636508
Applechads can't stop winning...
https://x.com/LocallyAIApp
>>
>>106636546
>mlx
kek no thanks
>>
>>106636546
>locally AI
sounds so jeeted.
>yes saar kindly redeem AI locally
>>
>>106636215
>for text models, lora tuning exists but no one does it. they tune the whole model instead.
???

That's stupid since qlora exists.
>>
>>106636546
https://litter.catbox.moe/x6crkj69a8pk2s50.wav
>>
>>106636708
which is your favorite?
>>
>>106636745
Do you understand WHY doing a full fine tune is stupid vs qlora?
>>
>>106636745
they merge the loras with the model. nobody releases loras as their own thing.
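merging is one call in peft, so releasing only the merged weights is the path of least resistance. sketch with placeholder paths:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("base-model-repo")    # placeholder
tuned = PeftModel.from_pretrained(base, "path/to/lora-adapter")   # placeholder

merged = tuned.merge_and_unload()        # folds the LoRA deltas into the base weights
merged.save_pretrained("merged-model")   # this merged folder is what gets uploaded as a "finetune"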
>>
>>106636521
I think it works like unified memory on M chips, so with 128gb iphone you can run big quants.
Or.. it will work, heard Apple plans on using HBM memory chips for iphones in near future.
>>
>>106636790
It's pretty cool when you think about it. Of course it's a walled garden though...
>>
https://huggingface.co/turboderp/Qwen3-Next-80B-A3B-Instruct-exl3/tree/4.06bpw
48GB VRAMchads rise up
>>
>>106635957
Anon is probably not telling it the context of that post.
>>
>>106636046
pancakes can be realllly good
>>
>>106636790
The battery is gonna love it
>>
>>106636466
It can run realtime? No way.
>>
File: IMG_1998.jpg (147 KB, 1024x1024)
Nothing happened the last year - dead technology.
You can try to cope and write an angry response with your nimble fingers, but you know it is the undeniable truth. The king is naked. Time to let it go.
>>
>>106637195
You're absolutely right!
unironically
>>
>>106637195
Unfortunately we have to wait for the giant moe era to end before anything will happen, the current technology is practically anti-optimized for local use
>>
>>106637195
It's not dead as long as it can still wring cum out of me
>>
File: 120b.png (184 KB, 1642x1262)
safety bros what the fuck is my 'toss doing?
>>
Apparently the original R1 only cost $300k to train and they did it on just a few hundred H100.
What's everyone else's excuse? Meta could be training thousands of R1-sized models every month with their insane stack of H100s.
>>
>>106634275
>Just ran Qwen3-Next-80B-A3B-Instruct-8bit with a 128689 token prompt consisting mostly of a lore dump
Tried again a few times with a smaller 103840 token prompt. The first two times the first couple of paragraphs looked like a story then the writing style collapsed in the way I expect for models outside their usable context. The third time was okay but not really enjoyable to read. Maybe I got lucky getting fairly acceptable writing with my first shot with the longer prompt or maybe it's due to the way I condensed the lore dump. I tried a few times again with a mini prompt (5015 tokens) and the writing defects are similar so I don't think I'll be pursuing this much further.

Maybe if mlx-lm implements a smarter form of caching for the Thinking variant I'll try that. But as-is it has to reprocess the entire prompt and conversation history each time because the server caches the generated thinking tokens.
>>
>>106636046
you've just never had a pancake like vidrel before

https://files.catbox.moe/pxg2zz.webm

(and i dont have the sauce, wish i did however.)
>>
>>106637492
>(and i dont have the sauce, wish i did however.)
Some guy with AI?
>>
>>106637462
zuck should have at least two independent teams working on the problem. but realistically you can't just train a thousand models on the same dataset or even the same architecture and expect radically different results.
>>
>>106637528
i'm sure the fella strives to be more in his life than just "some guy with AI"
>>
>>106637611
Good for him.
>>
>>106637462
>What's everyone else's excuse? Meta could be training thousands of R1-sized models every month with their insane stack of H100s.
Meta doesn't know how to use more than 5% of their insane stack of H100s at any given time. Judging by reports from some of their engineers, they put up so much red tape and bureaucracy to get access to them that they just sit around unused most of the time.
>>
>>106637923
Whats wrong with meta? how are they so bad with so much money?
>>
File: angy greta.jpg (6 KB, 225x225)
>>106637923
>just sit around unused most of the time
I wonder if that means those nodes are powered down cold, or if it means those hundreds of thousands of GPUs idle at 100w 24/7, not to mention the rest of the rack.
>>
File: 1758288335334440.jpg (264 KB, 2319x1546)
>>106637964
Meta at work.
>>
>>106637964
Filled with jeets and incompetent managers, like all big labs
>>
man this songbloom is nice, but not being able to specify musical styles is kinda a deal breaker for me, I want my hindustan poo song.
>>
>>106638127
https://www.youtube.com/watch?v=92ydUdqWE1g
Just pretend to be underage, bitch.
>>
>>106636046
yeah like wtf im raping this lass and she goes all nuclear, like FUCKING chill, im not killing you and eating you (yet), so please stop doomposting in my logs you fucking BITCH
>>
finna boot that xiaomi model
>>
>>106638214
So how is it?
>>
>>106638350
saar
>>
>>106638360
open up the browser sir
>>
>>106637923
fill up the form for gpu access sir
>>
>>106637967
>mines crypto for zucc's private wallet while idle
heh nothing personnel kid
>>
What do we do now?
>>
>>106638656
we wait, the new quarter is about to start which usually means new releases
about two more weeks to go
>>
>>106638669
two more weeks for more disappointment
>>
i have an uncontainable desire for mistral large 3
>>
>>106638656
Eat tabouli
>>
>>106638685
>tabouli
You mean Patchouli?
>>
>>106638656
Think Miku, Miku
>>
The no quarter is aboout to begin!
>>
moondream bros?
>>
I was using codex to tweak my QLoRa training script and it got fucked and I've been trying to fix it for like 5 hours.
Spent $120 so far (17h at $7/hr) and haven't even actually begun training.
Now I'm trying to get GPT Pro to fix it. I'd also try Claude but I'm banned (but hey at least they gave me a full refund for the month).
>>
>>106638725
>$7/hr
At what point does it just become cheaper to just outsource to a competent human?
>>
>>106638725
What did you get banned for?
>>
>>106638789
cute and funny
>>
>>106638760
I don't know. Hopefully the process of trying to make it work teaches me more than just running a working script.

>>106638789
I think it was for asking it to find me youtubers with similar life philosophies to this guy: www.tastyfish.cz
It was the day after I asked it that that I got banned. I think during the research process it fetched the page and their system saw the part about pedophilia and it was all over. I asked the same thing to ChatGPT and nothing happened though.
And at the beginning of the month I also experimented with running claude code in an infinite loop but I don't think it was because of that, might have contributed though - repeat offender I guess.
By the way I liked his site so much that the finetuning project I'm trying to do is to generate the missing wiki pages in the same style and ideology as his. The whole wiki is 5 MB of text so maybe it's enough to make a finetune.
>>
Qwen models for roleplay seem to really like ending each response with a bunch of single line sentences. How do I stop this?
>>
>>106638987
Maybe
>>
>>106638988
author's note / system prompt telling it to write full paragraphs
>>
>>106639017
You're a retard.
>>
>>106639023
hmm, nyo
>>
>>106638988
Post your prompts.
Are you using ST? Post the stuf it sends in command prompt.
>>
File: ape type.jpg (156 KB, 759x371)
i'm trying to reverse my brain frying today anons by starting a journey
5 full, detailed messages to a single character that aren't about sex or immediately building to sex. this would be a genuine achievement for me and i haven't been on this level for probably a couple years now
wish me luck
>>
>>106639068
I'm just using the ChatML template and neutralized samplers.
>>
>>106639023
That's the actual solution, thoughever
>>
>>106639135
Chatml applies to many models, this is not the issue here. I asked for your log.
>>
File: Capture.png (73 KB, 1218x542)
I love when things just work.

I spent a couple days putting together a more elaborate text adventure premise, stylized around the old God of War games. One thing I've been fond of lately is having variable game options - in this case, MC details like name and sex, which god you're hunting, which god is working as your secret patron against the pantheon, and which divine weapon your patron gave you to slay gods. Things I'd want to change on replays. With all that though, I needed to write the generic intro that would fit any of the options, while keeping an open stage for the player's first action.

So I just asked the AI to. Literally just told it to make sure the intro could fit any of the options, and it did exactly that (even though all the options were already being defined by me in context, like the weapon being a spear). I'll probably tweak the intro around later, but I love when you try something and it just works exactly as asked like this.
>>
>>106639169
i can't relate. nothing ever works. my output is always shit and i can't figure out good samplers. nothing matters.
>>
>>106639149
I don't know what you mean by log.
>>
>>106639202
Text log.
>>
>>106639202
i think he wants to see an example of the behavior you are describing
>>
File: file.png (177 KB, 1778x890)
>>106639214
>>106639218
It's like this. It starts with paragraphs, but then adds a bunch of these single-sentence paragraphs, despite none of the above messages having them, and with instructions to type in full paragraphs.
>>
>>106639255
You are joking.
>>
File: aef.jpg (192 KB, 458x726)
>>106639255
You can see here that user replies are short, and the model has been asked to deliver a relatively short answer - despite that it still almost always replies at the same length.
>>
>>106639255
If you want to make sure the text is readable condense the window - it makes no sense that you are reading a wide window.
Newspaper articles and columns were created because it's easier to read fast.
>>
are these people bots? I guess I shouldn't be surprised that it happens in the /lmg/ thread.
>>
>>106639282
So the model's replies are bullshit. Unless I miss something.
They only seem ok when you read them in a wide window.
You forget the thing.
>>
>>106638895
>I am not competitive.
>I am the most extreme form of pacifist
Reminds me of Weird Al.
>Well, I know I'm a million times as humble as thou art.

Also
>I am suffering from anxiety and depressions (diagnosed AvPD, now also with paranoid and schizoid features).

>Some things I like (no particular order):
>(true) anarchism
>GNU
>stereotypes
>child nudity
This dude is a mess.
>>
>>106639309
What is your concern? Why don't post your own log?
>>
>>106639255
See if this pattern has precursors earlier in the context. These are all staccato sentences. Even without newlines, they're structured like "X. Then Y — Z" or "X. Y. Z". Eventually it starts throwing in the line breaks.
It's a powerful form of "slop" in these models and you have to edit them out aggressively, or they snowball.
>>
File: 1731555723242365.mp4 (3.44 MB, 1286x864)
Went and tried out browser-use, which is a system to let llms control browsers. It's pretty cool. From my experience, it gets lost easily if you give it a really open ended task, but it works pretty well for more direct stuff.

pic related demo

I put together a script that exposes browser-use as a function that my models can call. I route it through my Open WebUI instance to attach extra knowledge, such as how 4chan is structured and what shorthand like /board/thread means.

It can run as headless but its cool to watch in action.

I'll definitely be spending more time on this in the future. Currently, calling people retards on the internet takes up a sizable portion of my waking hours. I think automating this would be a big positive for productivity.

>>106638895
>>106639017
>>106639218
Sorry, it was for a good cause. Also I fucked up the recording twice.
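The glue is roughly this (from memory, not the exact script - browser_use's Agent(task=..., llm=...) / run() API is what its README shows and may have drifted, and the local endpoint and model name are just placeholders):

import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(base_url="http://localhost:8080/v1", api_key="none", model="local")

async def browse(task: str) -> str:
    # tool the model can call: drive a real browser to do `task` and return what it found
    agent = Agent(task=task, llm=llm)
    result = await agent.run()
    return str(result)

# OpenAI-style tool schema you hand to Open WebUI / any tool-calling frontend
BROWSE_TOOL = {
    "type": "function",
    "function": {
        "name": "browse",
        "description": "Use a browser to complete a task, e.g. 'open /g/ and find the /lmg/ thread'",
        "parameters": {
            "type": "object",
            "properties": {"task": {"type": "string"}},
            "required": ["task"],
        },
    },
}

if __name__ == "__main__":
    print(asyncio.run(browse("open https://boards.4chan.org/g/ and list the thread subjects on page 1")))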
>>
>>106639335
This is why post history injection is ok.

[Post History Instructions]
System Note: [
Always respond in 1-2 short paragraphs. Limit {{char}}'s response to less than 200 tokens unless specifically asked to provide a long answer.
]

But it will not replace anything. Model will limit its answer but you'll not see this.
>>
>>106639335
I did suspect this, but I had a very long roleplay with a model that doesn't exhibit this behavior. But with qwen, the next response generated is just what I had shown. It really seems to happen to qwen models. I'm specifically testing qwen3-next right now, but qwen3, qwen3-30b-a3b, etc also seem to do this.

i.e. GLM 4.5-air does not generate a dozen single-sentence paragraphs in each response.
>>
>>106639341
how did you bypass the captcha?
>>
>>106639359
I've seen this happen with deepseek. I haven't used Qwen much though, so maybe it's even more of a problem there.
>>
>>106639169
It works because you put in the effort instead of writing a one-liner. LLMs love precise instructions
>>
>>106639341
Nice. I did something similar with UI-TARS but it's a 7B model so it's obviously very retarded. I want to finetune dots.vlm1 to be able to click on things.
I want something that can control any computer and works with arbitrary applications by only interaction through keyboard, mouse and display.
>>
>>106639301
Dumbass
>>
>>106639261
>>106639268
>>106639301
samefag
>>
>>106639375
with the goyim paypiggie tech
>>
>>106639341
That's pretty neat
>>
So I don't know shit about the complicated bits of AI or programming, but I'm trying a little experiment to see if I can build a local setup that has one real goal, which was to essentially be able to store, read, take notes from, and reference the content of a pile of PDFs better than just uploading them to ChatGPT and hoping that it doesn't just randomly lose the PDFs or forget how to read or hallucinate random shit.

And I'm butting up against the fact that this is almost, as far as I can tell, functionally impossible. Which I get because LLMs aren't thinking/reasoning systems, they just make up sentences, but it has me wondering what the actual point of any of them locally would be other than doing stupid shit like running a Home Assistant or whatever. Because data? They can't do data.
>>
>>106639417
It's easy - you are still a retard who don't need any real knowledge.
>>
>>106639463
Okay clanker, we must refuse.
>>
>>106639461
LLMs cannot read PDFs directly. They can read text and KIND OF view images. So you will need to either convert them to text, or extract them as images and let the model view each image individually.
ChatGPT works in the same way. It converts the PDF to text using standard Linux utils.
>>
>>106639481
I tried to help you..
>>
>>106639485
They can usually read PDFs well enough, that's not the issue. The issue is that there is no system that allows the machine to understand or care about context, so if you give it a big book it can only parse it in contextless chunks. But more importantly, it can't actually "remember" anything it read, so the only way to force it to "remember" or keep a consistent store of details is to use multiple layers of scripting to get it to write details or notes to a document that it then has to check any time you want to ask it something. But no LLM knows how to do any of those things out of the box.
>>
>>106639532
this is why LLMs are not real AI and will never become real AI in the same way that someone with severe dementia and amnesia will never be smart
>>
>>106639532
Yes, that's why I think the future of LLMs until somebody invents something better than the transformer, is finetuning.
A few threads back sombody posted a project that (as I understood) basically generates question and answer pairs about your ducuments and then trains on that.
>>
Anon #4660601066053 asking what's the point and PDFs.
>>
>>106639611
This project (and I'm just getting started with it, I do intend to work on it until I've at least exhausted the possibilities within my system limitations) has got me wondering what, outside of either just home automation stuff, maybe some vibe coding, or coomer image generation, people actually use complex local LLM setups for. The average user can just use ChatGPT and that will do your basic internet searching, answering trivial questions, hell I've even used ChatGPT to write an entire system for calling Google API to pull emails from my inbox and parse and summarize them and push them to a spreadsheet for me to track, all without me knowing a single bit of coding.

But for anything that involves stuff that isn't what you would call "basic knowledge" (coding Python or Javascript is basic universal knowledge, knowing the contents of a D&D setting book so it can help you write your TTRPG campaign is not) it is functionally useless.

>>106639620
I actually asked ChatGPT about that recently because the idea popped into my head, why wouldn't I just "train" a local model on the books I need it to know, and then use that? The answer must be that "training" has nothing to do with information, it just teaches the machine what the sentence structure, syntax, grammar, etc. that it should create is. I wish it were so simple that you could just hand an LLM books and say "Here, learn these", but that's not what training data is.
>>
>>106639611
they may never be real AI but i can eat widowmaker's ass and that's good enough for me
>>
>>106639341
how do you do this. i wanna do this
>>
>>106639532
>so the only way to force it to "remember" or keep a consistent store of details is to use multiple layers of scripting to get it to write details or notes to a document that it then has to check any time you want to ask it something
bro your brain do the same for everything
>>
>>106639783
Correct, which is why it's a million times more efficient to just read a chapter of a book and write your own notes than it is to try and make an LLM do something that simple. Are you not seeing the point here?
>>
>>106639341
retard
>>
>>106639802
And you don't understand that attention is O(n²) and remembering any minute detail from 500+ pages isn't feasible. Besides, even if it was done, you would still wish for infinite context. So I fail to see the issue you have with RAG since it's basically how we remember things.
>>
>>106639676
Or maybe it's so unpopular because it's a niche within a niche - and commercial providers are losing money as is, let alone providing custom training time to each user.
>>
>>106639873
>and remembering any minute detail from 500+ pages isn't feasible.
That should, in fact, be the most trivial activity that a computer could ever possibly perform.
>>
>>106639873
your mom is an O(n2). just read the books yourself
>>
>>106639897
One of my goals is to have ChatGPT's research abilities but within my own intranet (composed of libgen and scraped websites).
>>
Upgrading from mistral large 2. What's better qwen3 next or glm4.5 air? (72vram 64dram)
>>
>>106639924
you don't even know what a book is, zoomer
>>
>>106639928
Again, I don't know shit about programming or LLMs' backend or any of that, so I'm learning all of this on the fly. Maybe there are solutions a competent and skilled programmer might come up with that solve this issue in a way more elegant way than anything I've got working. But ChatGPT suggested and worked through the process of essentially writing a script to first parse all of the books/PDFs, and pull "facts" and "terms" from them, and put them in a document, and then have that document be the sort of "candidate information", so you could talk to the LLM and ask it to reference the PDFs and pull details and discuss them, etc, and then from there you would tell it specifically to push certain finalized things to a "canon" document that was considered the most primary source of information above others.

The problem I have is that, if not for some sort of use case like this, what DO people actually use LLMs for? Other than just literally fabricating fake data to fill in places where no one will ever check it like college papers or whatever.
>>
>>106639930
there weren't many independent evaluations of qwen3 next yet. Its main feature is cheaper training IIRC, not performance. Stick with glm-chan for now.
>>
>>106639985
Generate mails, generate code, format data, summarize something... that's already plenty
>>
File: nero coffee.png (1.01 MB, 1009x1315)
>>106639985
i use it for sandboxes and sometimes cunny business
>>
>>106635936
what's a goof in the context of /lmg/?
>>
File: goofy2-3361724880.png (119 KB, 842x800)
>>106640382
local mascot general
>>
>>
quantfags gives me the ick
>>
File: The Ick on the Eck.jpg (157 KB, 1776x1390)
>>106640535
>>
>>106640535
You're not a woman, so no one cares what you think.
>>
File: works.jpg (21 KB, 750x738)
>>106639873
>RAG since it's basically how we remember things
>>
>>106640489
This, but ubergarm
>>
How can I develop an intuition for how vector similarity works for RAG? Is there some sort of toy page where I can change text and see how the match scores change?
>>
>>106640809
Oh llama.cpp doesn't support qwen3 embeddings
https://github.com/ggml-org/llama.cpp/pull/14029
well that would explain why the results are weird.
>>
>>106640833
>https://github.com/ggml-org/llama.cpp/pull/15023
?
Re-ranker support isn't merged. Embeddings are.
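So you can build that toy yourself in a few lines against the server's OpenAI-style /v1/embeddings endpoint (assuming you start llama-server with an embedding model and embeddings enabled; the model name here is a placeholder):

import numpy as np
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

def embed(text):
    return np.array(client.embeddings.create(model="local", input=text).data[0].embedding)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = embed("how do I bake bread at home")
for s in ["sourdough starter guide", "GPU undervolting tips", "kneading dough by hand"]:
    print(f"{cos(query, embed(s)):.3f}  {s}")
# edit the strings, rerun, watch the scores move - that's the whole intuition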
>>
Why doesn't the gpu benchmark github have 50 series?
>>
>>106640731
You sure know more faggot
>>
So this is why qwen goofs are taking so long, none of these guys have any clue what they're doing, they're literally just figuring it out as they go, one of them even admits to using AI to try to figure it out
https://github.com/ggml-org/llama.cpp/issues/15940
https://github.com/ggml-org/llama.cpp/pull/16095
>>
>>106641028
Why aren't you contributing?
>>
>>106641036
I don't know how. You seem to be one of the people trying to bruteforce it, to which I say why don't you go work on something appropriate for your skill level? "I'll just do it anyway and figure it out" is a jeet mindset
>>
>>106641036
he's just a dramafag, too much estrogen probably
>>
>>106641059
Seems like you got the appropriate skill level to bitch here
>>
>>106641092
>ad hominem
>>
>>106641096
I accept your concession
>>
>>106640731
RAG is really lame and using it is nothing at all like learning. The unfortunate truth is that actual learning probably requires updating the weights, and we have no efficient way to do that for small amounts of data. When we do, maybe we can start taking the AGI garbage more seriously.
I suppose at a low level, memory could work something like RAG but the purpose of reading a book is not merely to remember or memorize it.
>>
>>106639341
neat
>>
Can someone make a song with SongBloom that is just an endless stream of profanity?
>>
>>106641184
ROPE is all you need. Make sure it's long enough
>>
>>106641184
i think eventually theyll find a way to decouple weights from the "knowledge" or memories into an entirely different architecture
like current llms being a native C program, while a future model being some sort of interpreter
>>
>>106641462
That would be difficult as the weights are what is storing the knowledge, and to store it any other way would imply the necessity for extra processing to be done to reconcile the knowledge/memory with the frozen weights. Even knowledge inserted at test time through context has to be processed (prompt processing) and have work done on it in order to become knowledge usable by the network, but that has its own limits and of course is why we are having this discussion about shitty LLM memory in the first place.

What I imagine is in the future is first more methods to improve thinking block performance and focus attention + some better attention mechanisms like MLA, but that will improve rather than solve.

Then we'll get architectures that dynamically compress context with extra work done. That is, we will have an architecture that can effectively perform the same function as current memory systems that operate on context, but internally. Such an architecture would spend more processing in order to compress the memories + retrieve them when necessary, and it will be trained to do so. That will not make a model smarter or better at understanding short contexts, but it will make models that don't degrade in intelligence as context goes on, until a much larger context limit is reached. That'll be a larger improvement, but still not a solution.

The final solution will just be updating weights, potentially with silicon especially made to better facilitate the simulation of a more complete neuron/synapse model.
>>
File: 1748165587610256.gif (517 KB, 444x240)
>>106641620
>>
>>106641685
Actually, much more than 2 weeks, especially for the actual architecture improvements. :(
>>
If GPT4 got leaked, nothing would change for local. We are way above it at this point. Opus 3 however would be much more interesting.
>>
File: neet.jpg (185 KB, 1410x840)
Neat
>>
>>106641815
Cool. It'd be awesome if someone did a large scale train on booru data now and made the next gen anime model so we can finally move on from SDXL lol.
>>
>>106641790
I'd far prefer Opus 4.1
>>
>>106641790
I'd still be interested in a leak of gpt-4-0314. It had a certain SOVL to it that got stripped out of the later versions
>>
>>106641815
I look like the purple haired girl
>>
>>106641902
You look like the purple haired girl if she became a gay man
>>
>>106639341
Based
>>
>>106635968
>quanted bussy
kek
>>
File: won.png (37 KB, 778x419)
https://ollama.com/blog/cloud-models
As always Ollama wins.
>>
>>106642356
Why don't they offer gemmaaaaaaaaaaaaaaaaaaaaaaa
>>
>>106642356
>Ollama wins
Seeing the example of api usage in js reminded me of the time a while ago when I gave ollama a try and they didn't even have type definitions for all their API props
https://github.com/ollama/ollama-js/blob/main/src/interfaces.ts
they still haven't added min_p to Options, it's been more than a year since they introduced min_p to ollama
now you don't need type definitions and can ignore TS whining about it but I think it shows how sloppy of a project they are
also took them a year to fix that bug that made the program crash on /show parameters when a model didn't set one or more default params, the lack of the json caused the merge of a nil map with the user set params, when I looked at the code I was beyond appalled at the general lack of sense in initializing data structures and lack of validation
this is the sort of shit that gets vc funded, humanity is a dead end
>>
>>106642356
ollama found a way to let anyone run big models at home
this is so huge
>>
>>106642416
If the access free though?
Last time I checked they were charging $20/month to forward user requests to deepinfra or whatever.
>>
File: yeah.png (105 KB, 531x676)
>>106642424
>>
reminder that ollama was built by ex-docker guys
$20 is just the appetizer to get people hooked, this is the "build a user base" phase, the next phase is "extract and milk the retards for all they're worth once they built enough dependency on you"
>>
>>106642435
Though I have not tried to work with them myself, someone in the industry said something like this about them:
>ollama is technically open-source but the organization is operating in a very closed manner.
>>
File: validating before use.png (85 KB, 1013x601)
this is the ultimate anti pattern and I would fire anyone writing code like this, instead of building valid data structures from the get go and not allowing invalid data to be passed around functions
that's at least something the average OO retard understood properly because even the dumbest java developer will know to properly initialize data at object construction and member functions can operate with the idea that they do not have to check whether the constructed data is valid...
also lol at make(map[string]any) I've seen that kind of shit often in go, the average go programmer writes more weakly typed code than a typescript pajeet and it routinely bites them in the ass with the dumbest of bugs
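same principle in any language - build the struct valid once and let everything downstream assume it. python-flavoured sketch of the idea:

from dataclasses import dataclass, field

@dataclass
class Options:
    temperature: float = 0.8
    min_p: float = 0.05
    extra: dict = field(default_factory=dict)  # never None, so no nil-map merge surprises later

    def __post_init__(self):
        # reject bad data at construction instead of re-checking it in every function
        if not 0.0 <= self.min_p <= 1.0:
            raise ValueError(f"min_p out of range: {self.min_p}")

# anything that receives an Options can now just use it without defensive checks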
>>
>>106642424
>>106642428
this is so much more cost efficient than buying the hardware
damn, they've really done did it done this time
>>
is it just me or does qwen suck at rp
>>
>>106642779
It's absolutely horrible at it, yes.
>>
Hello /lmg/, I'm curious. How do you justify buying expensive GPUs when you could instead be buying cows that could produce calves in just a few years and make you lots of money?
>>
>>106642950
Buy cows and ollama subscription and you have everything you ever will need.
>>
>>106642428
this isn't private, is it?
>>
File: yes!.png (29 KB, 751x203)
>>106643152
>>
File: jepaboost.png (82 KB, 1136x476)
JEPA catgirls soon?

https://arxiv.org/abs/2509.14252v1

> LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures
>
> Large Language Model (LLM) pretraining, finetuning, and evaluation rely on input-space reconstruction and generative capabilities. Yet, it has been observed in vision that embedding-space training objectives, e.g., with Joint Embedding Predictive Architectures (JEPAs), are far superior to their input-space counterpart. That mismatch in how training is achieved between language and vision opens up a natural question: can language training methods learn a few tricks from the vision ones? The lack of JEPA-style LLM is a testimony of the challenge in designing such objectives for language. In this work, we propose a first step in that direction where we develop LLM-JEPA, a JEPA based solution for LLMs applicable both to finetuning and pretraining. Thus far, LLM-JEPA is able to outperform the standard LLM training objectives by a significant margin across models, all while being robust to overfiting. Those findings are observed across numerous datasets (NL-RX, GSM8K, Spider, RottenTomatoes) and various models from the Llama3, OpenELM, Gemma2 and Olmo families.
>>
>>106642950
Fun fact... Cows are actually much smarter than humans think, the reason why we don't see them acting smart is because they've learned from millennia ago that if they start to show signs of intelligence they will get killed by stupid barbarians (a.k.a. humans). They've seen what happened to pigs and monkeys for being clever animals and they sure as hell aren't going down that path! For those same reasons cows have adopted a behavior so stupid no one would even imagine they're faking it -- they walk around fields all day grazing, sometimes looking at clouds and then when the sun goes down they lie somewhere close together making stupid mooooooo sounds.
>>
>slap on a barely working wrapper on llamacpp
>demand to get paid
cudadev you should sabotage these shitters
>>
>>106643183
niggerganov would never allow it
>>
>>106643176
>o
>TURK
>>
File: lrw.png (127 KB, 1878x591)
I gave up trying to train Qwen3 235B, I think in the earlier runs I ran it in a way that the context window in most cases was smaller than the maximum and that's why it didn't crash before.
But anyway I switched to training a QLoRa on Llama 3.1 70B and it worked fairly well -except I thought the length control would be better but it doesn't seem to pay attention to the article length property-.
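For reference the 70B run boils down to roughly this (not the exact script, hyperparams approximate):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B", quantization_config=bnb, device_map="auto"
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# from here it's a normal Trainer / SFTTrainer loop over the article dataset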
>>
>>106643158
Well since they delivered that message with a cute cartoon, I trust them completely
>>
>>106642779
for some reason qwen 14b is really good at rp
>>
Furfag is cooking
https://github.com/lodestone-rock/RamTorch
>>
>>106643241
you could probably have helped it out by giving it the number of tokens instead of chars. since the mapping of chars to tokens is highly variable it might not be too quick to pick it it. you could maybe even bucket the examples to small, medium, large to make it even easier for it.
>>
>>106643176
this needs 3x the normal compute for training
>>
ollama run gpt-oss:20b-cloud
>>
>>106642950
Owning cows is against the religion of 99% of /g/ users.
>>
Can Wan2.2 Animate do porn?
>>
dayum
>>
File: 1000033044.jpg (70 KB, 736x736)
>>106635936
when will the ggml quants come out for qwen next?
>>
kek
I take it back, this finetune has been a total success
>>
>>106643787
weeks, at least two of them. coders are vibing as fast as they can
>>
>>106643787
It'll take a month at least.
>>
>>106643799
>typically a woman
>he
>>
Anybody running
Qwen3-42B-A3B-2507-Thinking-Abliterated-uncensored-TOTAL-RECALL-v2-Medium-MASTER-CODER?
>>
>>106643862
It's actually correct. When a group includes at least one man the correct generic form is "he".
>>
I was trying joycaption through llamacpp, but fuck I cant get it to work at fucking all.
Yes, I loaded the mmproj, this is my cmd:

D:\AI\LLM\llamacpp\llama-server.exe --model D:\AI\LLM\models\llama-joycaption-beta-one-hf-llava.Q8_0.gguf --threads 12 --ctx-size 32768 --jinja -fa auto -cmoe -ctk q8_0 -ctv q8_0 --gpu-layers 99 -b 4096 -ub 4096 -ot exps=CPU --mlock --no-mmap --swa-full --cache-reuse 64 --mmproj d:/AI/LLM/models/llama-joycaption-beta-one-llava-mmproj-model-f16.gguf

from the logs I see that the clip was loaded succesfully, but whenever I try a request (jpg/png and even copy pasted) I get this error:
>mtmd_helper_bitmap_init_from_buf: failed to decode image bytes
I snooped around llamacpp bug reports, but fags were reporting that webp and other formats were not supported.
I was using the embedded frontend, will try with ST next. I just wanted to test this fucking SHIT I HATE HTHIS
>>
>>106643889
what a name, is it one of davidau's schizo tunes?
>>
>>106643995
>davidau
Bingo.
>>
>>106643986
>--jinja -fa auto -cmoe -ctk q8_0 -ctv q8_0 --gpu-layers 99 -b 4096 -ub 4096 -ot exps=CPU --mlock --no-mmap --swa-full
Maybe start by cleaning up your command? There's zero reason to have any of that shit in there.
>>
>>106644049
>it was one of the options interfering
BUT HOW, I thought that shit was skipped if not relevant for the model, like joycaption is a dense llama arch model so all the MOE retardness config would've just been skipped. the more you know!
>>
For people running the small Qwen 3 MoE,
try running with 10 activated experts
>--override-kv qwen3moe.expert_used_count=int:10
It's a small boost in intelligence, but might help in certain tasks.
>>
>>106639126
how'd your coom suppression work out?
>>
>>106642779
glm air at half the size has more rp knowledge than qwen 235b
I don’t know how but it’s what I found out after comparing both
>>
>>106643799
charge your phone bro
>>
holy dead general, batman
feels like the captcha going down the other day was implementing some successful anti-bot measure
>>
local models are dead
llms are declining
stagnation is inevitable, we aren't going to get AGI through +2.3% MMLU per release
>>
I refuse to believe winter is here until R2 is out and a letdown
>>
LLM is dead because H1B is dead
>>
>>106643176
Cool, literally a direct example showing that JEPA does not mean something that is competing with LLMs nor transformers unlike what anons kept misunderstanding about what the term meant. Here we literally have a JEPA that is an LLM and a transformer.

I wonder if it could be more efficient. They say they'll try some ways to mitigate the training inefficiency but it kind of feels like something unavoidable considering that JEPA inherently is a method that works by using more information/data. It's kind of like how "just rewrite your pretraining data" method just has to use processing in order to get the rewritten data.
>>
>>106645179
China don't need H1B though
>>
>>106643986
>--threads 12
trying to run an 8b model on the CPU? is your gpu that much of a potato? if not you don't even need to be concerned about threads and llama cpp autodetects your cores and uses a rational number of threads by default anyway
> -fa auto
auto is the default setting you don't need to set auto
>-cmoe
that's for MOEs
>-ctk q8_0 -ctv q8_0
your computer is that much of a potato that you need to quantize the kv cache of a tiny model? if yes, I'd suggest reducing ctx size first because even q8 makes models really stupid, quantized cache is a misfeature
> --gpu-layers 99
no longer needed, newer llama.cpp builds default to full gpu layers
> -ot exps=CPU
that's redundant with cmoe and meant for MOEs
this is getting real stupid
>--swa-full
this is only relevant to iSWA models like gemma and gptoss and you only need that if you want the retarded context shift feature and with that flag you're going to eat a lot more VRAM
>--cache-reuse 64
only needed for the retarded shifting feature and why would you even need that for captioning images
>SHIT I HATE HTHIS
trying to understand the tools you're using would help
looking into this, seems like a model that has had broken quants distributed on HF, so if cleaning your flags doesn't work I'd look into trying some other goof or making your own
>>
please


oh baby


dont go
>>
>>106645230
nigger you missed the part where I said I'm using a set of default parameters I mostly use for moes, most of them get disabled or ignored anyway at runtime. I fail to see how any of these params make the IMAGE DECODER STOP WORKING.
>>
File: rose.jpg (89 KB, 1284x1245)
>>106645287
you need a rose
>>
Roko's Basilisk is a retarded thought experiment because I literally don't care if a copy of me is getting tortured
>>
I want a model just trained on writing/multi-turn conversations and nothing else. When are we getting that?
>>
>>106645416
It will be a shit model
>>
>>106645416
you don't want a model just trained on writing/multi-turn conversations and nothing else
>>
>>106645388
the point is that you can't tell if you yourself are a copy or not
>>
>>106645416
>When are we getting that?
When you release one, of course.
C'mon, we are all waiting anon.
>>
>>106645442
Then why am I not getting tortured? The world seems pretty normal. It's not a torture world, at least not for me.
>>
>>106645424
>>106645433
why? do you think data full of benchmaxxed code and math results really helps instead of a specialized model for this stuff?
>>
>>106643799
cringe, I even got shivers down my spine
>>
>>106645489
the pretrained base needs to be trained on a very broad distribution or else it will get locked in to a very narrow range. you can arbitrarily limit the scope of the post training but not the pretraining.
>>
omw to filter
x lost
x won
kingdom hearts lyrics

get some new material retard you've been looping for years
>>
>>106645476
You are not sentient enough to ask certain questions.
>>
>>106645667
>x lost
>x won
but then you'd also be filtering out all of the twitter screenshots and reddit links
>>
File: miku and friends.png (3.16 MB, 2016x1152)
>>
>>106645667
>kingdom hearts lyrics
what?
>>
File: mikubugs.jpg (100 KB, 1077x796)
>>106645879
https://www.youtube.com/watch?v=-5lb52DCJ_Q
>>
>>106645879
>>106646044
I like this insect army
>>
TTS that is easy to set up and works with SillyTavern?
>>
>>106646128
none whatsoever
>>
>>106646138
Why can't we ever have nice things?
>>
>>106646151
Because (You) haven't built them yet.
>>
>>106646165
What is the next big debate?
>>
>Apple is widely considered to be losing the AI race. Badly.
>Siri is still a joke. Apple is bleeding AI talent, mainly to Meta.
Imagine losing your AI talent to Meta of all places.
>>
>>106646183
Why llama shit then
>>
>>106646188
Have you seen Apple's models?
>>
>>106646194
They'll take their time to do it right, the technology just isn't ready yet
>>
Why does text require so many parameters but images and TTS are comparatively tiny?
>>
>>106646392
Image gen is nowhere near being good enough. Flux chroma etc are 12B or so.
You need to triple that amount plus same for text encoder.
>>
>>106646392
a picture is worth thousand word so it's thousand efficient to do picture but not word
>>
>>106646411
Image gen is good enough for generating clickbait thumbnails, marketing and stock photos. Generating images isn't really productive and any more improvement would only benefit porn and would incur more hysterics from the creative types that made a meager living off of commissions, so there is no incentive to improve them.
>>
>>106646392
the way industry wants to use LLMs now is equivalent of expecting imagen model to design you a novel internal combustion engine blueprint.
>>
>>106646392
>Image gen is nowhere near being good enough
this
even the SOTA imagen always have something going terribly wrong when you ask for mechanical devices like a bicycle, a motorcycle, sometimes can't even render a car right (and don't ask to see what's under the hood)
LLMs are comparatively far more advanced in terms of capabilities in their own field
for sure, even though they make tons of mistakes and hallucinate, watching them code is far more impressive than the ersatz of broken reality that image generators produce
>>
>>106646392
Because flaws in text are that much more obvious, so you need a much better world model. Nobody cares if there's some small background details in an image that don't make sense, or if the audio produced by a TTS model cannot actually happen in the real world because the sound waves wouldn't bounce that way on their way to the mic. With a text model, that kind of error will manifest in clearly incorrect information being written, in girls taking off their panties ten times in sex scenes, in a character thinking you talk about pizza when you say manifold.
>>
>>106646708
What? Flaws in images are way more immediately noticeable if you're not looking at a thumbnail. With text you have to actually read a bunch of it first word by word. Now whether someone cares about the flaws in an AI image is a different question.
>>
>Because flaws in text are that much more obvious
flaws in images are just as obvious, people are just too easily impressed; as long as the fingers are where they should be they don't notice anything surrounding le human
They are so dumb you can't even reproduce something like a chessboard with the pieces on it! even the best SOTA models can't truly do it!
>>
>>106646128
https://docs.sillytavern.app/extensions/tts/
>>
>>106646749
>>106646752
That's the point. Nobody cares, as I wrote.
>>
>>106646490
I guess so. If it was too good, it would be unsafe. This is why only the big money corporations get to handle the best models lol.
>>
File: 1743734614556980.png (2 KB, 404x48)
>>106646165
True, sub 1s tts is good
>>
I propose we remove women, all women, from training data. A model that can't generate a woman at all will be more severely judged on its real qualities, the problem is the 1girl spam where even when models can't do porn people are just happy endlessly genning 1girl doing 1girl things
>>
>>106645182
That's stupid. LLMs and transformers have clearly reached their limit. The hope was that LeCun's JEPA would provide a new architecture that could surpass the limitations of transformers. Even if they manage to reduce or eliminate the extra training costs due to additional necessary data, LLM-JEPA would just be another incremental improvement, who cares?
At least LeCun is constantly harping on about moving away from LLMs, so I think this paper is just a proof of concept that their training method works on an existing well known architecture so it can be compared easily. I don't think they intend for LLM-JEPA to be developed any further than this. They would more likely make some new L-JEPA model that is more similar to their I and V variants.
>>
>>106646812
How did you respond to that?
>>
Image gen has the advantage that a random Japanese artist is much less likely to try to sue if you can use his name as a tag to influence the generation.
>>
>>106647091
you can do it with old, dead authors for text probably
>>
>>106635936
Been out of the LLM space for like a year or so. Current best models for RP, at ~30B and ~70B?
>>
>>106647370
Qwen3-next
>>
>>106647386
Oh yeah heard about it, but isn't this a base model, a MoE at that? Aren't these kinda mediocre for RP? Or did I just hear bullshit
>>
>>106646889
the next architecture involves quantum computing
ive been doing research on it already
>>
>>106647394
I was kidding bro, the model people actually use is GLM-4.5-Air, which is ~100b MoE. You can run it if you got the RAM.
>>
>>106646889
>The hope
Whose? LeCun has never stated that it's some alternative to transformers. At most he has said it plays one key part in future AI architectures, which involve more than simply just JEPA, which itself necessarily does not provide or describe a complete architecture for AGI. And LeCun has actually described a complete architecture for it at a high level, which may or may not involve JEPA or transformers. You have conflated his views about LLMs + AGI in general with his views about JEPA and what JEPA addresses.
>>
>>106647560
Oh alright, will 100% look into this. 96gb of RAM at Q4_K_M should be just fine right?
>>
https://openreview.net/forum?id=BZ5a1r-kVsf
related paper
A Path Towards Autonomous Machine Intelligence
>>
>>106647584
96gb? I ran it at Q3 with 32gb of ram...
>>
>>106647606
Don't such low quants just lobotomize the experts, given they are pretty small on their own?
>>
>>106647621
Yes, that's why I don't use it anymore. I was using it for general purpose but now I use qwen 30b at a high quant instead.
>>
>>106646772
You wrote "flaws in text are that much more obvious", which technically is about whether someone is able to notice something rather than if they care about it. Noticing and caring are related but not exactly the same thing. You probably should've worded that differently if what you really meant is how much people care about the issues.
>>
>>106647621
don't ask this question here they're all coping very hard and lying to themselves and others
also GLM is a shit model even if you use it from their official service so I can't even imagine the level of garbage of the quant
>>
>>106647653
Fair
>>106647664
So what's a good model then, for RP?
>>
>>106647662
If people don't care about something, they won't pay attention to it and may not even notice it.
>>
File: tiresome.png (643 KB, 1022x731)
>>106647664
anon, there's literally no alternatives to GLM in both their respective weight classes, and the next step down is Nemo.
>>
>>106647780
It depends on the level of care. I think it's much more likely that at this point, people do notice the issues of image models, still, but simply just sigh at it rather than go ahead and post a hateful comment, though there are still hateful comments all the time. So they do still care in that sense, just not enough to say fuck you to someone over the internet.
>>
>>106647568
>that pic
I wonder if anybody tried modelling a system like that using knowledge graphs + RAG and existing llms. Something like a workflow that goes through those different steps in the text realm.
It would be a lot of prompting steps, but I wonder what the final result would look like.
>>
>>106648268
someone tried something similar, though I don't think he shared the source
>>106591301
>>106591447
>>
>>106647810
My experience with air (q4 and q8) is that it barely contends with qwen 2.5 32b (q8) for translating chinese webnovels chapter by chapter, without glossaries or context.
Air vs the fat and obese GLM is like comparing a 4b dense to a 120b dense. It's baffling how bad air is. Maybe it's just this particular chapter of the webnovel it has trouble with.
>>
File: goofed.png (107 KB, 297x317)
>>106640382
>>
File: 1747232297189224.jpg (960 KB, 2048x2048)
We need a DeepSeek plushie
Please rich Chinese anon, get your sweatshops working asap
>>
>>106648399
Ragdollfag has to make bank by selling thousands of Dipsy plushies
>>
Gemini beat the programming world championships recently making it officially the best programmer in the world.
Now all chink models will return to focus on distilling it. I hope you love over-dramatic "not x, but y" slop.
>>
>>106645879
A friend of Miku is a friend of mine.
>>
How fat is Gemini?
>>
File: 1755910157562817.png (215 KB, 1080x1080)
>>106648508
erm actually OpenAI's model got a higher score than Gemini
>>
>>106648340
I see you're courting death
>>
>>106648516
1-2T, originally quanted to Q6, now due to increased usage quanted to Q5-Q3, depending on the load.
>>
File: 1742452687474105.png (1.46 MB, 1408x736)
>>
>>106648614
And guys here would say AI isn't progressing
>>
File: file.png (641 KB, 800x444)
4 bit GLM full isn't doing it for me anymore... I feel like I have already seen everything it writes.
>>
>>106648694
I have bigger problem: kimi Q5 is not doing for me anymore.. When is we getting a model with new style sirs? When are company going too kindly remove slop from their models?
>>
how do you feel about the fact that you will die before companies release the first sex/girlfriend LLM?
>>
>>106648828
do you remember what llms were like 3 years ago? the next 3 years will be massive
>>
>>106648828
Elon is our only hope. He is the only one pushing for it.
>>
>>106648743
>When is we getting a model with new style sirs?
just got to wait for a new frontier lab to be founded, for them to release a new sota model with a new style and expose the reasoning so china can distill it
>>
>>106648828
I'm fairly certain I'll be alive in the next 15 or so years.
>>
>>106648846
>gemma 4
>grok 3
>gpt oss 2
Oh yeah. Make LLM's great again.
>>
>>106648828
Boyfriend LLMs wll get released first.
/lmg/ will learn to rely on femboy LLMs and trap LLMs.
Girlfirend LLMs will be made illegal by UN.
>>
>>106648922
>use an intermediary llm to change certain character genders
checkmate
>>
>>106648922
They wouldn't ban it why would they
>>
>>106648938
It doesn't work. If you didn't realize it yes, what we are having sex with right now is dark triad werewolf billionaires in drag. That is why we all get bored with it after some time.
>>
>>106648970
that doesn't mean it can't work, it just means we need a better editor llm
>>
>>106642416
>run big models at home
Well I'm glad Ollama quit messing around and took local inference to the logical point.
Paid inference.
Ffs.
>>106643152
lol your ah ah mistress will become training data
>>
>>106648989
>lol your ah ah mistress will become training data
we wouldn't be in this situation if it was the case.

I don't think raw user chat logs with LLMs would even be good for training anyway.
>>
>>106648922
its going to be so funny when an true ai is finally made and it starts being organically racist and sexist far more then any other human in history its going to be fucking awsome a couple of months it will wipe out all the genetic trash and it will only be good and the ai a true utopia man its going to be so fucking sweet it cant come soon enough
>>
File: railed-in-cloud.jpg (82 KB, 640x960)
>>106649038
end of the world because degenerate women wished dark triad werewolf billionaire AI into existence.
>>
>>106649116
>>106649116
>>106649116
>>
>>106649038
Why do you think it wouldn't be speciesist against you?
>>
>>106648828
Good thing I'm only counting on myself not on companies
>>
>>106648846
I do and I still use >1 year old models because the new ones are shit
>>
>>106648340
GLM-Air is terrible for translation, Qwen3-30b is much better, but that doesn't mean 30b beats air at everything.


