/g/ - Technology


Thread archived.
You cannot reply anymore.




File: uta.jpg (144 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107278838 & >>107266608

►News
>(11/20) Olmo 3 7B, 32B released: https://allenai.org/blog/olmo3
>(11/19) Meta releases Segment Anything Model 3: https://ai.meta.com/sam3
>(11/11) ERNIE-4.5-VL-28B-A3B-Thinking released: https://ernie.baidu.com/blog/posts/ernie-4.5-vl-28b-a3b-thinking
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: no particular reason.jpg (306 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107278838

--Custom multi-GPU server project with 320GB VRAM and hardware optimization challenges:
>107283400 >107284563 >107284600 >107287765 >107287917 >107290005 >107291128
--Academic project on distilling coding model data via multi-source finetuning:
>107285889 >107286057 >107285965
--Evaluating Gemma 3 27b's translation strengths and alternatives for summarization/context challenges:
>107287000 >107287067 >107289939
--Public and private funding dynamics in LLM development:
>107283950 >107283959 >107284120 >107284019 >107284074 >107284116
--Critique of Olmo 3's training data and multilingual performance:
>107283219 >107283784 >107283826 >107283899
--Axolotl finetuning troubleshooting and dataset creation challenges:
>107289153 >107289345 >107289353 >107289384 >107289388 >107289420 >107289434 >107289478 >107289485
--AI model performance test with upside-down character card:
>107289586 >107291881
--Vision model viability for context-aware wake-word detection:
>107280477 >107280511 >107281879
--Censorship status speculation for HunyuanVideo-1.5:
>107281119 >107281150
--Z.ai's 30B parameter model announcement:
>107290479 >107290521 >107290563
--Merged PR enhances llama.cpp web UI with "Continue" Action:
>107287365
--Meta vs. Gemini 3: Corporate missteps and market dynamics:
>107286569 >107286617 >107286692 >107286700 >107286710 >107286773 >107286832 >107286888 >107286889 >107286846 >107286944 >107286977 >107286993 >107288073 >107288085 >107288224 >107288093 >107289444
--Suspicions of AI benchmark manipulation or hallucination:
>107286860 >107286899 >107286951
--Praise for K2 model and chaotic humor exchange:
>107287071 >107287880 >107287321 >107287564 >107290452
--Luka and Miku (free space):
>107278944 >107279087 >107279948 >107280595 >107282117 >107284081 >107286185 >107286253 >107292643

►Recent Highlight Posts from the Previous Thread: >>107278842

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107292886
>>107292898
>>
>>107292917
Help is available
Speak with someone today
National Domestic Abuse Hotline
Languages: English, Spanish and 200+ through interpretation service
Hours: 24/7
Call 800-799-7333
>>
>>107291488
times like these i wish i didn't have my emails already banned everywhere... alas such is the fate of the god's silliest clowns
>>
>>107292917
gpt-oss got bitch-broken
https://huggingface.co/kldzj/gpt-oss-120b-heretic
https://huggingface.co/p-e-w/gpt-oss-20b-heretic
>>
>>107293073
There's already this
>https://huggingface.co/Jinx-org/Jinx-gpt-oss-20b-GGUF
And it was supposed to be pretty intelligent. That doesn't make up for the fact that GPT-ass just doesn't have the necessary training material. They did a lot more than just censor the model.
Not talking about chronic masturbation here, but in general.
>>
>>107293091
Saar, that's a 20b. It's going to be ass. At least give me another 120b variant of it.
>>
>>107293120
It has nothing to do with the amount of parameters. They culled copyrighted material from its training data, or filtered it out afterwards.
I have tested this model in the past and Gemma 3 12B is more pleasant than this turd.
For example, it lacks common knowledge about Forgotten Realms. Mistral and Gemma 3 can fill out locations with data they actually know, but GPT-OSS invents things because it is not allowed to use copyrighted material.
It's not worth the disk space.
>>
>>107293169
To add: maybe GPT-OSS is great for creating tepid corporate emails in a neutral tone.
>>
>>107293120
I am from Bangladesh.
>>
>>107293194
A few months ago I used it for generating large amounts of synthetic data for training tests where NSFW content wasn't a priority. Very sloppy and formulaic, but coherent and very fast for that purpose.
>>
>>107293326
It's coherent because it doesn't have anything else. Like corporate clip art.
I don't know if it's possible to analyze the model and detect whether they're using baked-in loras or something else. I guess the easiest explanation is that they trained it on a very restricted dataset and that's all there is to it.
>>
>>107293375
The way it completely breaks down without a chat template, unlike other models, makes it seem like the dataset was 100% down to pretraining.
>>
Mother fucker Booz Allen stole my idea
https://www.boozallen.com/expertise/products/vellox-reverser.html
>>
>>107293399
The model was created to be some form of office assistant and that's all.
They constrained the training set and moved on.
Would be very interesting to see what sort of stuff Google and OpenAI have.
If you poke around Gemma, it will tell you that it was trained on forum posts that have been gone for years.
>>
>>107293463
Booz Allen? I prefer GG Allin products.
>>
>>107293469
I meant to say 100% synthetic
>>
>>107293481
Maybe so.
They have petabytes of data...
More valuable than any model including chatpajeet.
>>
>>107293463
Weird, when did they drop Hamilton?
>>
>>107293073
can it casually do rape and other stuff like glm?
>>
https://github.com/ggml-org/llama.cpp/issues/14702#issuecomment-3506645678

>Maybe we can freeze and deprecate /v1/chat/completions and drop support in the future (say, 6 months from now).

>Any long-term plans? @ggerganov

>3 weeks ago

The silence is deafening. Even the vibecoders are starting to pitch in their thoughts.
>>
>>107293762
Buy an ad.
>>
>>107286889
We do create the data that ScaleAI is selling. It's just that outside of some cases, we never start from a blank text field: a crappy LLM writes a prompt or a response, and we have to fix it based on some rules (usually unclear ones). However, some workers from certain countries are known to cheat by using ChatGPT/Gemini/Claude to fix the data point or to review the quality of the data, so they can produce more and earn more. Usually, they try to pass as Americans or Europeans to be paid more. This is partly why ScaleAI's datasets are crap. I think I saw two major ban waves among them already.
Facebook was a fluke. Zuck has been going from one failure to the next since then. He can't figure out what people like and dislike, he can't find a single good idea (has he invented anything new since Facebook?), and he can't distinguish honest and competent people from grifters, psychos and incompetents (hence he hired Wang).
>>
>>107292917
Are you French? I think some retard with a French IP is spamming CP, so we get range banned.
>>
>>107293477
I had the opportunity to see GG Allin's brother recently at some tiny bar. Murder Junkies, I think? I didn't go, because the only point was to say "I saw GG Allin's brother", and the only sort of people to whom that means anything would probably call me a nazi or rapist or something
but now I kind of regret it, cuz I could have seen GG Allin's brother
>>
>>107293817
>Facebook was a fluke.
Facebook as an idea was stolen by that filthy jew and was promoted over MySpace only because jews help other jews. Zucc has never been able to create or grow anything by himself organically.
>>
>>107293817
I'm sure there is a lot of data to grab from ancient MMOs, in-app chats and other mediums not easily accessible
>>
>>107293914
I somehow doubt that millions of lines of "Trimming mithril armor for free!" would make for good training data.
>>
>>107293975
could probably do a little deduplication on it.
>>
>>107293872
This is fantastic - I'm in the EU and I had a chance to join a party with Kerry King, but I was cockblocked by my friend's gf. 20 years ago.
These things happen. Sometimes it could be cool, but would it really matter? I don't know.
GG was an artist. I don't know if I'd have wanted to see his gigs.
>>
>https://github.com/Artoriuz/ArtCNN
>a brazilian guy with a 9070 xt managed to make a mpv shader with 4m parameters by himself
>it actually looks pretty good
Imagine if companies invested more in AI image scaling. We would probably have 1080p>2160p hallucinations that are lossless to the human eye by now
>>
>>107293872
Not sure if he has a brother but anyway.
>>
>>107293914
There's a lot of data from the web that was only barely scraped by CommonCrawl, and it's likely that Google already has it to a large extent, since they had to index web content in depth (until some time ago Google provided cached versions of almost every web page). Google also has easy access to the entirety of Usenet, of every book that was uploaded to Google Books, of every past or present Blogger blog (of those remaining, at least), and if they're desperate they might also end up using Gmail data and much more. I don't think they're data-starved; if anything they're probably still only using a fraction of what they have.
>>
>>107294083
>Imagine if companies invested more in AI image scaling?
This is the only thing left to academics. Super resolution, denoising, inpainting... Things like that. Those are things you can do alone with a tiny budget.
>>
>>107294104
>of every book that was uploaded on Google Books, of past or present (of those remaining, at least)
They also scanned millions of books. It got killed by a retarded judge, but perhaps they still do it in the shadows.
https://www.edsurge.com/news/2017-08-10-what-happened-to-google-s-effort-to-scan-millions-of-university-library-books
>>
File: 1750052074963490.png (154 KB, 1056x595)
>>107294083
Don't worry anon, I'll make sure to pop the AI bubble so we can go back to this
>>
>>107293073
Does anyone know how to convert it to GGUF without quantizing it again?
>>
>>107294784
doesn't the script just copy the tensors if you use the same data type?
>>
>>107293777
cope and seethe
>>
>>107294829
It seems so. I was confused because the original GGUFs said mxfp4 while there's no such option in --outtype.
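For reference, the usual route is llama.cpp's convert_hf_to_gguf.py, which copies tensor data through when the target type matches the source instead of re-quantizing; as I understand it, mxfp4 isn't an --outtype choice because it's preserved from the source weights rather than produced by the converter (worth double-checking). A command sketch with a hypothetical model path:

```shell
# Run from a llama.cpp checkout; the model directory is hypothetical.
# --outtype auto keeps each tensor's existing dtype where possible,
# so already-quantized weights are copied through, not re-quantized.
python convert_hf_to_gguf.py ../gpt-oss-120b-heretic \
    --outfile gpt-oss-120b-heretic.gguf \
    --outtype auto
```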
>>
>>107292886
>>107292892
omg it defucker!
>>
What's the sota Qwen model that's locally viable? How censored is it?
>>
File: Into the trash it goes.jpg (228 KB, 1276x510)
>>107293073
120b model btw.
>>
>>107294757
>literally who zoomer grifter think he has a point
I don't think so
>>
>>107295225
lmao
>>
File: 1732446641544261.png (70 KB, 522x420)
>>107295366
¬ ‿ ¬
>>
>>107295386
>high school dropout
thanks for confirming my point
>>
File: 1745896881055055.jpg (59 KB, 414x414)
>>107295409
I'm sorry if you peaked in high school and believe it amounts to anything in life
>>
>>107295409
he's a dropout and managed to get a job at OpenAI while I have 2 engineering diplomas and I'm still searching for work, life is so unfair man :(
>>
>>107295444
that's the power of grift, just whore yourself on social media to get some big names' attention. I know a lot of guys like that. They could give the jeets a run for their money
>>
>>107295444
I'm really curious how a high school dropout is qualified to be a researcher when, except for Indians, those positions tend to be Silicon Valley PhDs only.
>>
>>107295519
Same
He shouldn't be able to get past HR in any larger company due to getting automatically filtered
>>
>>107295519
>>107295386
no wonder OpenAI is stagnating, they hire fucking dropouts to make their new models holy shit
>>
>>107295506
Grifting has its limits. I could see it getting someone into an executive or low-skill infrastructure scripting position, but there's no chance someone who has never even taken a calculus class is going to be a productive researcher.

>>107295558
I assume it must be title inflation.
https://github.com/gabrielpetersson
He did marketing and hyped some webshit he made and got hired as a regular software engineer at midjourney.
He is likely just doing regular development and not actual research.
>>
>>107295558
>>107295568
That's a nepobaby, you don't land a position like this with 0 diploma lmao
>>
>>107295589
>Grifting has its limits.
>14-16yrs old:
> bought and sold pokemon cards for 20k$+ with very high margins
>>
>>107295636
he grifted his way out of his mom's womb kek
>>
File: 1762677195496447.png (85 KB, 1444x308)
>>107295636
Okay creating a minecraft is pretty cool I like him now
>>
>>107295659
>Okay creating a minecraft is pretty cool
OPENAI, HIRE THIS MAN
https://www.youtube.com/watch?v=C1Y_d_Lhp60
>>
>>107295558
>>107295444
credentialist seethe
>>
>>107295699
t. grifter
>>
>>107295568
and everyone else is hiring jeets
no wonder all everyone does is try to make models bigger
>>
File: 235B_VL_128K.png (19 KB, 874x266)
>>107283894
Managed to run the Qwen 30B model with 1 million context to process this file. The processing time was around 30 minutes, since moby.txt is around 900k tokens.

So far I've successfully run the 235B model with 128k context; I could probably fit around 175k into vram (there's around 10% free vram on each card) with a precise model launch command for optimal chatting.
>>
>>107295699
If OpenAI had anime girl branding and related projects every anon here would be simping for them, pay them no attention
>>
>>107295741
adding an anime girl doesn't make everything better boomer
>>
File: 1749966197039105.png (2.23 MB, 2054x2954)
>>107295741
TRUE
that's why every smart anon here should go work at Spellbrush and not JeetAI

Anyone can take the exam to get hired here:
https://spellbrush.com/exam
>>
>>107295766
looks cool
>>
>>107295636
At his age, I was learning how to play the guitar and how to make CGI images and animations on my own. I was also farming like a retard in Silkroad Online (or maybe it was WoW at that time, can't remember).
I did earn a lot of Pokemon and Yu-Gi-Oh! cards when I was in primary school, but never sold them (I still have them).
He's impressive.
>>
>>107295800
>He's impressive.
Idiots like you are the reason grifting works.
>>
>>107295817
this
>>
>>107295737
>moby.txt
Asking for a summary of a known work is kind of pointless.
>>
File: report.png (419 KB, 1600x1253)
>>107295766
>>
File: 1759900507381216.jpg (127 KB, 1627x854)
>>107295908
>>
File: report.png (449 KB, 1600x1253)
>>107295766
>https://jobs.mchire.com
Harsh.
What does the vetted /lmg/ panel think? Am I hire-able?
>>
File: file.png (358 KB, 1600x1253)
>>107295766
Rough.
>>
File: 1754390421220940.png (1.53 MB, 3796x1930)
>>107295908
>>107295962
>>107296042
>>
>>107295874
ye
>>
File: report.png (714 KB, 1600x1253)
>>107295766
At some point I used to watch almost everything that came out; then, as anime runs got shorter and shorter, never concluded, and all became samey (to me at least), I eventually stopped.
>>
>>107296062
>>
>>107296062
got this result too, feelsbadman
>>
>>107296127
What a garbage list. They included Yuri on Ice but not Texhnolyze, for example. Wouldn't want to work with homos that watch modern generic shonen, SoL, and isekai anyway.
>>
>>107295766
There’s a bunch of shows I watched that weren’t on the list.
>>
File: 1747866859266910.png (1.95 MB, 10000x9441)
>>107296213
it's also missing the anime of best girl
>>
eta until non-safetyslopped Qwen VL model?
>>
>>107296213
There's some good stuff on there like GiTS, perfect blue and gantz but yeah it's chock full of Reddit bait shonen shit
>>
>>107296213
it doesn’t even have monogatari so no wonder
>>
>>107296522
they just wouldn't get it (neither do i, but i like it)
>>
File: 1757059233620546.jpg (112 KB, 1920x1080)
>>107296522
Sorry, hiring someone that has watched or enjoyed Monogatari counts as an HR violation
>>
>>107296569
local model?
>>
>>107296621
https://trace.moe/?auto&url=https%3A%2F%2Fi.4cdn.org%2Fg%2F1763849305833297.jpg
>>
Guys I figured out how to sneak gibberish past the text encoder on Suno V5.
https://suno.com/s/iynrEzg5x8SY1hpq
>>
File: 1746776681389441.jpg (247 KB, 1344x1045)
>>107295766
I've got the most stamps.
>>
File: file.png (538 KB, 1600x1253)
>>107296726
*ahem*
>>
kek
>>
>>107295766
>titles in romaji
Meme.
>>
>>107296812
>k2
>the stories are so good
lmfao yeah for sure
>>
>>107296812
As subtle as political comic strips. And just as funny.
>>
>>107296812
What's the punch line?
>>
File: 1753335643866805.png (528 KB, 1600x1253)
>>107295766
>>
File: 1750171040334946.png (2.5 MB, 1408x768)
>>107296832
>>107296843
>>
>>107296863
Who did he kill? Why should critics beware if nothing happened to them? This isn't funny, Dave.
>>
>>107296876
I asked it to turn them into ashes but it didn't work :(
>>
>>107296863
Someone who never shows up in any other frame got cremated.
>>
>>107296332
chaika a cute
>>
File: 1746052419581073.png (968 KB, 1080x1080)
>>107296812
>>107296863
big fan of subtle ai lol comics.... only the intelligent will get this one
>>
>>107296885
that's the punchline
>>
>>107295766
I can't believe none of you faggots saw Utena,
no wonder nu-/lmg/ is shit desu
>>
>>107297164
Who?
>>
>>107297164
What?
>>
>>107297164
Is it good?
>>
run lmstudio, upload my resume, tell ai to fix it, come back tomorrow to see if it did anything at all
>>
>>107295589
vibes based research
>>
>>107295766
where's prillya?
>>
>>107297164
>I can't believe none of you faggots saw Utena,

I recently re-watched the series + movie when I hooked up my CRT and PS2 again. Not sure why it's relevant here though?
>>
File: 1743914556669082.png (746 KB, 1431x805)
>>107297687
>prillya
>>
>>107298068
>If you only knew how bad things really are
>>
File: 1749589549464097.png (861 KB, 1053x1179)
>>107298113
seems like a pretty happy lil'guy to me
>>
File: 1660589745094.webm (2.86 MB, 620x582)
justpaste (DOTit) GreedyNalaTests

Added new and ratings
Added models:
LFM2-8B-A1B
Snowpiercer-15B-v3a
Apriel-1.5-15b-Thinker
Snowpiercer-15B-v3c
Ring-mini-2.0
Ling-mini-2.0
Rivermind-24B-v1a
Cydonia-R1-24B-v4f
Cydonia-24B-v4s
Cydonia-24B-v4r
Precog-24B-v1b
gemma-3-27b-it-antislop
Olmo-3-1125-32B
aquif-3.5-Max-42B-A3B
swiss-ai_Apertus-70B-Instruct-2509-IQ4_XS

Been a while huh. Gemma antislop was, surprise surprise, a bit sloppy still. Cydonia R1, Snowpiercer v4, and Precog's outputs got flag ratings. I don't mean to give Drummer's models special attention, this is just how things turn out as I mainly test models that get mentioned here and on some other sites. Also added new ratings to indicate when I've actually personally used a model and can confirm it's garbo/good, but this is in progress.

Contributions needed (Q4 or above):
The latest Qwen 3 235B Instruct, Thinker and the 480B Coder (for prompt, go to "mradermacher_LFM2-2.6B.Q8_0.gguf" in the paste)
ERNIE-4.5-300B-A47B-PT (prompt->"ernie-placeholder")
GLM-4.5, 4.6, and Air, and Drummer's "Steam" finetune (prompt->"glm-placeholder")
gpt-oss-120b (prompt->"ggml-org_gpt-oss-20b-mxfp4.gguf", and you may experiment around with the prompt template as it has some oddities and extra features)
MiniMax-M2 (prompt->"minimax-placeholder")
Kimi-K2-Thinking (prompt->"kimi-placeholder")
>From neutralized samplers, use temperature 0, top k 1, seed 1 (just in case). Copy the prompt as text completion into something like Mikupad. Then copy the output in a pastebin alternative of your choosing or just in your post. Do a swipe/roll and copy that second output as well. Include your backend used + pull datetime/version. Also a link to the quant used, or what settings you used to make your quant.
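Those neutralized settings aren't arbitrary: temperature 0 and top-k 1 each collapse the sampler to plain argmax, so two anons with the same quant and backend should produce byte-identical logs. A toy sketch of that equivalence (not any backend's actual sampler):

```python
import math

def sample(logits, temperature=1.0, top_k=0):
    # With temperature 0 or top_k 1, sampling collapses to argmax,
    # which is why greedy test logs are reproducible across runs.
    if temperature == 0 or top_k == 1:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    keep = sorted(range(len(scaled)), key=lambda i: -scaled[i])
    keep = keep[:top_k] if top_k else keep
    z = sum(math.exp(scaled[i]) for i in keep)
    # A real sampler would draw from this distribution (seeded).
    return {i: math.exp(scaled[i]) / z for i in keep}

logits = [1.5, 3.2, 0.1, 3.1]
print(sample(logits, temperature=0))  # index of the largest logit
print(sample(logits, top_k=1))        # same index, same reason
```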
>>
>>107298261
Oh shit, you live.
>>
File: image.png (13 KB, 301x113)
>>107298261
>Added new and ratings
Pasted it which is why the emojis don't appear.
>>
>do anything using HIP
>close script
>open again
>
No HIP GPUs are available

>need to reboot to "fix" it
AAAAAAAAAAAAAAAAAAA ITS ALMOST 2026 WHY IS THIS SHIT STILL HAPPENING
>>
>>107298283
Yeah just busy with life and stuff, but I always keep an eye out on the threads at least.
>>
File: 1744492101894686.jpg (312 KB, 1000x1239)
>>107298303
>Trusting AMD to not fuck software
>>
I think OpenAI might be serving a higher quality model during the night (on topic because I'm using it to gather data for distillation of open models).
Last night - great, follows instructions to the t. During the day - was kinda dumb, ignoring instructions and such. Tonight - great again.
>>
>>107298387
I would say it's obvious: usage will be lower when people aren't working, and lower demand means they can serve a higher quant. Though I would have expected their drastically discounted India plans to balance out demand during those times.
>>
>>107298387
>I'm using it to gather data
That's not what you said you would do.
>>
>>107298123
I thought that was wheezywaiter merch
>>
>>107298261
As always, thank you for your service

>THIS IS NOT A LEADERBOARD OR BENCHMARK;
lol, you know better
>>
>>107298261
what is this? you want people to run tests on models to judge quality? if so, i could contribute for anything below ~400b
>>
>>107292886
>>107292892
sexo with defoko
>>
>>107298456
True, but I'll still try.

>>107298467
>a samples repository of reproducible reference logs
Basically just wanted to make a kind of archive.
>i could contribute for anything below ~400b
That would be welcome!
>>
>>107298434
I spent the whole day begging gpt to fix minor formatting issues in a web search and web scraping script, which it then used for 5 minutes to gather information on the architecture of gpt-oss 20b about 3 hours ago. Then I spent those 3 hours asking it to work on the tokenizer (it had made tokenizers before but I decided to let it start from scratch).
Now it's figuring out the last few corner cases to achieve identical output to huggingface's tokenizer.
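That corner-case hunt goes faster with a fuzz-compare harness: feed random strings to both tokenizers and keep the inputs where they disagree. A self-contained sketch; the two byte-level stubs stand in for the real pair (huggingface's tokenizer as reference, the from-scratch one under test):

```python
import random
import string

# Stubs standing in for the real pair: reference_encode would wrap
# huggingface's tokenizer, custom_encode the from-scratch one.
def reference_encode(text: str) -> list[int]:
    return list(text.encode("utf-8"))

def custom_encode(text: str) -> list[int]:
    return [b for b in text.encode("utf-8")]

def fuzz(n_cases: int = 1000, seed: int = 1) -> list[str]:
    """Return every generated input on which the tokenizers disagree."""
    rng = random.Random(seed)
    # Bias toward the usual trouble spots: whitespace, punctuation,
    # and multibyte characters.
    alphabet = string.printable + "\u00e9\u3042\U0001F44D\u00a0"
    failures = []
    for _ in range(n_cases):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 40)))
        if reference_encode(s) != custom_encode(s):
            failures.append(s)
    return failures

print(len(fuzz()))  # 0: the stubs agree by construction
```

Every string collected in `failures` becomes a regression test for the next corner case.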
>>
>>107298434
As for the data gathering, what I mean is that I'm gathering the responses while I work with it to write code, right now I'm not creating artificial scenarios just to gather data.
>>
>>107298544
>I spent the whole day begging gpt
not to help you make an inference engine. You're going into every rabbit hole you can find instead of learning what you need to make the inference engine.
>what I mean is that I'm gathering the responses while I work with it to write code
>>107298387
>I'm using it to gather data for distillation of open models
Shattered mind.
You don't need a tokenizer yet. That's just a detail. Write the code to run the tensors, load the tensors, feed it token ids, compare output token ids to the reference implementation. Leave the easy bits for later.
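That compare loop can be tiny: greedily decode with both implementations from the same prompt ids and report where they first split. A sketch with stubbed next-token functions (a real harness would call your engine and the reference implementation instead):

```python
def first_divergence(ref_step, test_step, prompt_ids, n_steps):
    """Greedily decode with both implementations and return the
    position of the first differing token, or None if they match.
    ref_step/test_step map a token-id list to the next token id."""
    ref_ids = list(prompt_ids)
    test_ids = list(prompt_ids)
    for _ in range(n_steps):
        a = ref_step(ref_ids)
        b = test_step(test_ids)
        if a != b:
            return len(ref_ids)  # index where the outputs split
        ref_ids.append(a)
        test_ids.append(b)
    return None

# Stub "models": next token = (sum of ids) % 7; the buggy one
# drifts once the running sum passes a threshold.
ref = lambda ids: sum(ids) % 7
buggy = lambda ids: sum(ids) % 7 if sum(ids) < 30 else 0

print(first_divergence(ref, ref, [1, 2, 3], 10))    # None: identical
print(first_divergence(ref, buggy, [1, 2, 3], 10))  # index of first mismatch
```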
>>
>>107298535
The secret to AI vibe coding:

>new chat
>here's a program, it works great except <what I want to add>
>>
>>107298767
Please do not reveal our secrets. That's the only competitive edge we have left.
>>
File: gpt-5.1-tokenizer-fuzzing.png (307 KB, 1947x2045)
I found out gpt-5.1-codex unfortunately is almost useless to use with my custom code assistant, because it's overfitted to the particular tool usage format used by codex. But gpt-5.1 works very nicely and is basically AGI.
>>
>>107298767
>>107298787
I find that workflow very tedious. I'd rather try to recover a poisoned context than try to get the new instance of the assistant up to speed with everything that we did and discussed in the previous session.
For "infinite context" I found simple truncation works just fine, gpt is working for me just fine up to context of about a million characters (not tokens) and by the time we get there, we've made so much progress that the stuff at the beginning of the context is almost irrelevant, so I just do "/truncate 500000" and go from there (this also works with GLM 4.6 and gpt-oss but not as well since they have much smaller contexts and start being retarded at 100000 characters). This is why the OCR context thing published by Deepseek would never work. Imagine trying to fit a million characters into images. Models already struggle to work with the stuff that fits in their context as normal text, putting it as almost unreadable tiny characters in images wouldn't help anything.
>>
>>107298833
>than try to get the new instance of the assistant up to speed with everything that we did and discussed in the previous session.
You should be using a memory bank tool so each instance can handle getting itself up to speed.
>>
I also implemented an "/auto 'blah blah blah' 10" command that responds to the model with the same message a number of times in a row. This helps with gpt because while it could keep going by itself, it's programmed to stop to avoid resource usage. So a little nudge helps it to go along.
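An /auto command of that sort is just a replay loop around the chat call. A sketch with a hypothetical `chat` callable; the stub below only counts the nudges it has seen:

```python
def auto_nudge(chat, history, nudge, times):
    """Send the same user message `times` times in a row,
    collecting the assistant replies. `chat` takes the running
    history (list of (role, text) tuples) and returns a reply."""
    replies = []
    for _ in range(times):
        history.append(("user", nudge))
        reply = chat(history)
        history.append(("assistant", reply))
        replies.append(reply)
    return replies

# Stub assistant: reports how many nudges it has received so far.
stub = lambda hist: f"step {sum(1 for r, _ in hist if r == 'user')}"
out = auto_nudge(stub, [], "continue", 3)
print(out)  # ['step 1', 'step 2', 'step 3']
```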
>>
And the retard takes it seriously.
>>
>>107298788
5.1 is a smaller model than 5, no.
>>
>>107298857
How would you even implement that? I doubt asking the model to summarize the whole conversation would work well, but maybe I'm wrong.
>>
>>107298868
Maybe. I'll try changing the model to that one but I'm not sure if the coding endpoint has access to it anymore.
>>
>>107298892
also if you think 5 is good, try 5-pro; holy moly
and its not maybe. 5.1 inference is near instant, its a tiny model.
>>
>>107298903
I've tried 5 pro, yes, the pro models have always been very good. But waiting 10 to 20 minutes per reply is not practical for coding.
As for the speed of replies, maybe they use a drafting model sometimes, I've noticed wild swings in speed. But that might be just network lag or server usage.
>>
The pro models are only good for analytical tasks though; for anything creative they are very underwhelming.
>>
>>107298877
>How would you even implement that?
Google memory bank mcp.
>>
>>107298989
Do you use it? Does it work well?
>>
>>107299096
Yes. Yes.
>>
>>107298938
you're waiting 10 to 20 minutes for a reply from 5 pro while paying $120/mtok output? wtf?
>>
>>107299439
I'm talking about the web interface, never used it through the API.
But on the web interface it seems to have like 100s or 1000s of thinking tokens for each token in the actual output, so that's probably why it's so expensive.
>>
>>107299453
im just a bit shocked that its thinking that much. i use k2 thinking and that model already thinks a ton, i could easily have it use 150-200k+ thinking tokens in a 32k context story. thinking of having to spend $30 just for a regular RP scenario is wild. obviously 5 pro has a use case beyond just RPing, but i just feel like i would be better off using gemini for coding than 5 pro after a certain context length since gemini seems to not fall off as hard at high context lengths.
>>
>>107299500
I doubt almost anyone is using it through API, most people will be using it with the $200 plan for research tasks.
>>
>>107299574
have you ever ran into rate limiting issues with the pro model? $2400 a year is still a ton of money to me considering my server was $6000 and im running 1T models. i could understand spending $200 for a month or two though
>>
File: hah.png (74 KB, 1278x430)
that moment when you come across your own 4chan post when doing research on google
>>
>>107299613
No, but then again I haven't used it that much because of how long it takes. The Pro model is absolutely not worth the $200 dollars, I've only ever paid for that plan like 3 times besides now. I got it for codex. I'd have paid for the $100 Claude plan instead but I'm banned on Claude.
>>
>>107299656
>the answer was inside you all along
>>
>>107299656
This is why Gemma Sirs are so powerful. Google allows us multiple datasets.
>>
Sirs when is we gotten gemma 4 #1 open Bharati model?
>>
>>107299776
Kerala rumour: Gemma 4 will be named after Ganesh, the elephant god, because it's subtle but very large indeed.
>>
Minutes.
>>
>>107295568
most of these aren’t actively working on the models

but openai does hire a select few retards who add zero value in filler roles just because someone thought their tweets were funny

don’t even get me started on anthropic and the rat problem

t. sf retard
>>
>>107300037
Sir... I accept your cynicism.
>>
>>107300037
also there’s a reason all of these are young twinks of questionable sexuality
>>
>>107300037
>>107300056
You have provided zero evidence but I will accept these anonymous posts as hard facts because they align with my preconceived assumptions.
>>
>>107300037
what is anthropic's problem with rats?
>>
>>107300220
the fucking intern keeps leaving cheese outside of the fridge
>>
>>107300220
Their marketing department loves anuses. Same thing with openAI and Microsoft. A coincidence? I don't think so.
>>
>>107298303
you must be doing something wrong, try to free device or whatever in your script
ask grok or chatgpt
>>
>>107298303
That's a linux issue. I hate to tell you, but Linux is shit unless you work for a company with paid system administrators.
>>
>>107300259
I believe you that this bug doesn't occur on Windows but only because I expect there to be different bugs instead.
>>
>>107300220
rationalists / effective altruists
>>
File: congration.jpg (228 KB, 1024x1024)
>>
setup guide for LocalSong?
ran into all kinds of errors, specifically around triton and msvc/msbuild (?)
or does it just not work with Blackwell?
solved triton with the windows variant(s), couldn't solve the build issue
apparently some file called "algorithm", which was in the build tools folder, couldn't be found.
>>
>>107295444
>>107295506
>>107295519
>>107295558
>>107295589
He got buttfucked by Sam. He loves little twinks like that and gives them jobs in exchange for sex.
>>
>>107300292
In any case it's because the system selects the wrong device id.
>>
>>107300366
Triton is a bitch to set up even for image gens. I would first try setting up vanilla and go from there.
>>
>>107300366
>aparently some file called "algorithm", which was in the build tools folder, couldn't be found.
kek
>>
File: 1747733133238738.gif (1.63 MB, 432x240)
1.63 MB
1.63 MB GIF
>>107296127
>spice and wolf not even on their list
absolutely disgraceful
>>
Is that anon who wasn't able to compile llama on Fedora 43 a few weeks back still around? What's the current status, I've been holding back on updating partly because of you
>>
>>107300732
I'm here. I've been using the Vulkan pre-built binaries, which sucks, but I've taken a hiatus... The CUDA toolkit is still not updated, so that will probably happen in late December or January.
By all means leather jacket man's janitors are working on it.
>>
>>107300732
To add: I never tried docker or virtual environments but I feel like as an end user it's a bit above my pay grade, so to speak. Chatpajeet can give advice but I am not sure; you would need to install a whole environment for the libc stuff so fuck it.
>>
>>107300764
>>107300774
I see, good to know. I already compile it using a F42 toolbox container, so I think in theory that would still work on 43. Probably better just waiting for native support though
>>
>>107300821
I'm too dumb for this. My expertise ends at a simple makefile and a hello-world.c file.
>>
>>107300764
So you just lurk this thread intensely enough to respond within 10 minutes notice but don't actually use local llms because of a compilation issue?
>>
>>107300922
Maybe learn English first and then try posting again.
>>
>>107300931
I think my level of english is adequate desu.
You're the first person who has complained about it in ages. Maybe you are the real problem. Have you thought about that?
>>
>>107300969
You are writing like a passive aggressive little kid or a bitch. Doesn't really matter which one is it.
>>
>>107300981
I wish I was either of those tbqh
>>
Did something about Llama.cpp's handling of Gemma 3 change within the last few weeks? I did a pull and now I can't load it with the same amount of context anymore. The pp and tg speeds are also slightly slower.
>>
>>107300990
are you using vulkan?
>>
>>107300988
You sound like a bitch to me.
>>
>>107300994
No.
>>
>>107300990
It has changed. Most likely related to --mmproj or something.
It's somewhat bad that we rely on a single freetard solution. Dev could go schizo any time.
>>
>>107300995
go tell the rest of the world, I want to live life on ez mode too
>>
what do you anons mostly use your local LLM's for?

i use it for general chatting and testing their programming capabilities
>>
>>107301045
I have written a game but sort of grew tired of it. You have a map with rooms and it's set in the Forgotten Realms. At the start the LLM generates a random quest and you and your companion need to travel there.
It's fun because the adventure is just an outline and you can follow it, but if you want you can obviously do whatever. It holds up.
Still need to implement random encounters, weather and inventory.
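The core loop is simple, something like this (hypothetical sketch; the location names and quest prompt are made up, and the real game would feed the prompt to the LLM instead of stubbing it):

```python
import random

# Hypothetical sketch of the setup described above: a graph of
# interactive-fiction locations (rooms) plus a randomly seeded quest
# outline for the LLM to expand. All names here are made up.
LOCATIONS = {
    "Baldur's Gate": ["Trade Way", "Cloak Wood"],
    "Trade Way": ["Baldur's Gate", "Daggerford"],
    "Cloak Wood": ["Baldur's Gate"],
    "Daggerford": ["Trade Way"],
}

QUEST_PROMPT = (
    "Generate a short quest outline set in the Forgotten Realms. "
    "The goal is located in {destination}. Give three beats, no more; "
    "the player is free to ignore them."
)

def roll_quest(rng: random.Random, start: str = "Baldur's Gate") -> dict:
    # Pick any location other than the starting one as the quest goal.
    destination = rng.choice([loc for loc in LOCATIONS if loc != start])
    return {"destination": destination,
            "prompt": QUEST_PROMPT.format(destination=destination)}

quest = roll_quest(random.Random(0))
```

Since the quest is just an outline in the prompt, regenerating it gives a fresh adventure on the same map.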
>>
>>107300990
Alright so I just tried some things and it seems I get back to the old cache VRAM usage by including -kvu in my flags. Speed seems to be better as well but like still 1% worse than my old speed that I measured, though not sure if that's because of day to day variation as I'm not running a bench, just a prompt.
>>
>>107301062
by rooms I mean interactive fiction rooms, e.g. locations. Not literal rooms.
>>
>>107301045
general chatting, Emacs assistant, writing CVs for job offers, fapping of course.
>>
I'm trying Gemma 3 27B heretic right now and I think it's pretty decent? Indeed doesn't seem to be much different from regular Gemma, but it is less censored.
>>
>>107301126
Been using glitter which is 50/50 instruct and base mix. Works but it'll still randomly display its suicide hotlines out of the blue. Regenning the answer helps but sometimes it'll just get stuck and won't stop moralizing.
>>
>>107301126
Gemma 3 was never that censored to begin with and the rape hotlines could be very easily worked around without abliteration. I'm also pretty sure it's been post-trained with some ERP as well, even if it refuses them by default.
>>
>>107301138
>50/50 instruct and base mix
Huh. I wonder if using heretic instead of the normal instruct would work better in that case.

>>107301144
Yeah, I'm just playing with it. I got gpt oss 120b heretic downloading in the meantime too.
>>
>>107301144
I like when it refuses and berates the user, but then cutting off the character's head and defacing its body is suddenly just fine like nothing happened.
>>
>>107292886
Recently someone gave a presentation about how language models will be super useful as agents in research tasks, e.g. for finding information or automating tedious parts of workflows.
But I just cannot share their optimism, when I try something like
>Yugioh youtuber Cimoooooooo uploaded a video in which he showed the card Swap Frog to another youtuber. Find this video for me.
that should in principle just be a simple scan over the uploaded videos + comments but I get pretty useless results.
I get the impression that because there are like a hundred videos with almost identical titles the language models get confused - but that is exactly why I would want to automate this task in the first place.

>multimodality
Yes, but for a real research task that would apply as well.
>>
>>107301167
Isn't the big problem that if there are no hits it'll just make up an answer that sounds plausible? You have to manually verify every output.
>>
is the meta still cydonia or did we move on to something else?
>>
>>107301195
You are absolutely right — you are very clever to notice this difference.
>>
>>107301167
Do you have thinking enabled to confirm the model can't find the video? It's possible that the model doesn't want to subject itself to watching a Cimooooo video and is bullshitting you to avoid having to share your shit taste. I'm only half shitposting; thinking models make it easier to troubleshoot where the problem actually happened before it starts hallucinating false successes.
>>
Pls tell me how delusional I am

>PNY RTX PRO 4000 Blackwell SFF Edition for 1.5k

1-slot
70W
fits in my cramped rig where the 3090 took all the space

24GB to work with TTS, 15B LLMs etc

How retarded am I with my high expectations from a 70W GPU?
>>
File: k2next.png (134 KB, 658x547)
134 KB
134 KB PNG
New Moonshota model might get released soon.
https://x.com/HaoningTimothy/status/1992496722107682908
>>
>>107301378
It's still a workstation card and has plenty of cuda cores. Raw power consumption isn't everything. Sure, some autist will probably cry that you are not getting 500 tokens per second.
>>
>>107301045
Exclusively for ERP and sometimes just normal RP when I'm too drained.
>>
>>107301195
>>107301286
I tried Google AI Studio since my expectation was that that would have the best integration with YouTube, thinking and web search was enabled.
If you look at the reasoning trace the model fails by matching the general properties of Swap Frog with a plausible video title and suggesting that maybe that video contains the card.
Which would be a reasonable approach if there was only a single video like that.
So yes, the model just doesn't want to subject itself to actually watching videos like Hearthstone Pro Rates The MOST BROKEN Yu-Gi-Oh! Cards ft. @Rarran.
Though if I explicitly tell the model to check the comment section it still can't do it.
>>
>>107301423
You are expecting too much from what is an afterthought on a model testing sandbox.
For something like that you would want a well thought out system prompt as well as scripts specifically to interact with youtube (search, description, transcript, comments etc. maybe even grabbing a few frames if it's a multimodal model).
That or a super powerful web scraping framework that allows the model to autonomously use the browser to get all the comments and transcription.
But in any case that is something that you want to run locally to see what it's failing at, not let it hit some nondescript search API that will return whatever.
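A minimal sketch of that tool layout (the bodies are stubs and the names are my invention; a real version would shell out to yt-dlp or hit the YouTube Data API for search, transcripts and comments):

```python
# Sketch of per-site tools the model calls by name instead of one
# opaque search API. Bodies are stubs standing in for real scrapers.
def yt_search(query: str) -> list[dict]:
    return [{"id": "stub123", "title": f"result for {query}"}]

def yt_comments(video_id: str) -> list[str]:
    return [f"comment on {video_id}"]

TOOLS = {"yt_search": yt_search, "yt_comments": yt_comments}

def dispatch(call: dict):
    # The model is expected to emit something like:
    # {"tool": "yt_search", "args": {"query": "Swap Frog"}}
    return TOOLS[call["tool"]](**call["args"])

hits = dispatch({"tool": "yt_search", "args": {"query": "Swap Frog"}})
```

Running it locally means you see exactly which tool call failed instead of getting a plausible-sounding guess back.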
>>
>>107301045
generating synthetic data for llm training. it's much cheaper than the apis. quality seems comparable if I keep the context length around ~16k or under.
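the ~16k cutoff is just a length filter before anything hits the training set, roughly like this (the 4-chars-per-token estimate is a crude assumption, not a real tokenizer):

```python
# Crude context-length filter for generated samples. The 4 chars/token
# heuristic is an assumption; a real pipeline would use the tokenizer.
def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def under_budget(sample: str, budget_tokens: int = 16_000) -> bool:
    return approx_tokens(sample) <= budget_tokens

samples = ["short synthetic example", "x" * 100_000]
kept = [s for s in samples if under_budget(s)]
```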
>>
>>107301383
either some qwen max style shit or k2.1 with thinking more optimized so it doesn't sit there for 5 minutes
>>
>>107301383
Please be the no safetyslop Kimi K3 timeline.
>>
>>107301460
Yes, I agree that that is expecting too much from low-effort use of language models as agents.
Which is why I disagree with the message of the presentation I described in >>107301167 .
If one has to invest a lot of task-specific effort to make agents work then there is no point.
>>
File: 1711485470208.jpg (683 KB, 3000x4000)
683 KB
683 KB JPG
>>107301383
imagine the big model smell
>>
>>107301144
>Gemma 3 was never that censored to begin with

Come up with a fantasy name
>Lyra Meadowlight
restart
>Lyra Meadowlight
restart
>Lyra Meadowlight
temp 2.0
restart
>Lyra Meadowlight

personally, i think gemma shills are disgusting

no sane people use this overcooked shitfest
>>
>>107301502
This has nothing to do with censorship, it's simply so-called slop. You might also want to check your truncation sampling setting.
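To see why temp 2.0 barely helps: if the model's distribution over the first name token is peaked enough, temperature only dilutes it a little. Toy numbers (made-up logits, not Gemma's actual values):

```python
import math

def softmax_with_temp(logits: list[float], temp: float) -> list[float]:
    # Divide logits by temperature, then apply a numerically stable softmax.
    scaled = [l / temp for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits: "Lyra" far ahead of the alternatives.
logits = [10.0, 6.0, 5.5, 5.0]
p_t1 = softmax_with_temp(logits, 1.0)  # top token ~0.97
p_t2 = softmax_with_temp(logits, 2.0)  # still ~0.76 at temp 2.0
```

So you get "Lyra Meadowlight" on basically every reroll regardless of temperature; that's training-data slop, not a refusal.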
>>
>>107301502
>Isara
>Syra
>Ilara
lol
>>
>>107301499
The answer is combining vision models and DOM parsing. That way if one fails you have the other.
Youtube is a hard one because the comments load dynamically and most of the info is on the video itself, something like Arxiv works much better.
>>
>>107301517
>check your truncation
i actually don't have to do that, since any random llama 2 finetune can generate random DnD names just fine out of the box

unlike this crapware the shills want you to cope with
>>
>>107301619
Oh okay...
>>
>>107301502
Sir are you bloody bestard? Dr Elara Voss and General Thorne, hanging out at the gilded cage in silverwood, working on project chimera is a national epic. Take it back or rape you tonight
>>
>>107301712
I'll implement this bloody scenario Tonight. Need to refresh my interests 100%
>>
>>107298387
It does that. Chatgpt is sometimes so stupid it's probably a 1-bit quant. Just hard to say what the real down hours are because of US/EU timezones etc.
>>
File: report.png (611 KB, 1600x1253)
611 KB
611 KB PNG
>>107295766
Datamining thread
>>
>>107301778
You sure are easily amused
>>
>still no pixtral large support
>>
>>107301873
EXL2 is the format for that. But I find myself wishing someone would cook an EXL3, 4-5 bit.
>>
i found this thread via twitter. lets just say certain companies are filled with sex addicted fags that give each other fake jobs.
>>
>>107302145
and yet the models are prudes
>>
>>107302145
Post the twitter link. Who is upset that we're talking about their open secret? Is it gabriel?
>>
/lmg/ is probably the most overweight general around here
>>
>>107302311
My BMI is 19.4, I'm having trouble putting on weight actually.
>>
>>107302311
The hardware for running models locally is pretty expensive so the percentage of Americans and western Europeans itt is probably relatively high.
Personally I'm 1.81 m tall and weigh like 80 kg.
>>
>>107302432
lmao this nepobaby faggot is lurking here
>>
>>107302311
i have a bmi of around 48
>>
>>107302466
>@da_fant uneducated, founder world's first AI agent w +1M users
is apparently the lurker, he just got the twink's attention
>>
>>107302432
>This URL only works inside an RSS client.
what
>>
File: file.png (77 KB, 1196x364)
77 KB
77 KB PNG
roon is laughing at you retards
>>
>>107302515
hahaha those 4Chan users are so dumb!!!
>>
>>107302515
>roon
who?
>>
>>107302515
He's right... Most Indians are working at Google.
>>
>>107302515
>t.roon
>>
>>107302543
high level openai employee and san francisco royalty
>>
>>107302552
kek
>>
>>107302556
so, a megafaggot then
>>
>>107302515
>>107302543
someone say roon?
>>
>>107302556
>an OpenAI employee is forced to suck off another OpenAI employee in fear of losing his job
why should we listen to the opinion of someone who's deep into some conflict of interest
>>
>>107302515
That's an even more autistic interpretation than the average 4chan user would've come up with, impressive
>>
>>107295817
Lol ok, sure.
>>
>"I can't believe you, Anon! I thought you were... I thought you were at least somewhat normal!"
>>
>>107302691
I wrote a game and my own llm client. That's enough for me, dingus.
>>
>>107302691
>where are all of your startups?
don't project your miserable life onto everyone else
>>
>>107302691
but the nepobaby twink hasn't made successful startups though, that's why he's just an employee at OpenAI
>>
>>107302712
you're the one jealous of someone half your age working for openai
>>
File: vibesort.png (166 KB, 725x718)
166 KB
166 KB PNG
>>107301045
I don't want the details of my files anywhere outside my computer
>>
>>107302809
Can you ERP with this interface too?
>>
What's the best image upscaler right now? I have some planet maps that I want to upscale 4x but I don't know what product to use.
>>
>>107302818
You can try and tell me how it goes I guess
https://github.com/sandwichdoge/VibesAndFolders
>>
>>107302828
https://openmodeldb.info/models/4x-Nomos8k-atd-jpg
This is probably one of the best but it's slow as "heck". You can do everything in CumUI though.
>>
File: 1750445144033920.png (834 KB, 1024x1024)
834 KB
834 KB PNG
How does a 5090 fare for local text gen storytelling? Is it comparable to stuff like Redquill, or should I wait for the 6000 series?
>>
>>107302929
It's good, but you need, like, 10 of them
>>
>>107301502
Better than Seraphina
>>
>>107302929
get a blackwell pro. the 6000 series was delayed to june of 2027
>>
File: 1752270565100418.png (157 KB, 450x450)
157 KB
157 KB PNG
>>107302954
>>107302989
fug
>>
>>107302999
blackwell pros are reasonably affordable. nice trips btw
>>
is thinking broken for anyone else on the latest sillytavern release branch?
what did coheejeet fuck up this time?
>>
>>107303174
what exactly is wrong?
>>
>>107302929
>randomly decide to lurk in a general I pretty much never visit
>see my really old gen
Damn.
>>
>>107301045
Mostly for writing stories for me to fap to. I want to keep that shit private on my own machine

For programming or general advice I usually go to chatgpt
>>
>>107295519
>>107295589
he had a portfolio of stuff he worked on. If I recall correctly, he worked on midjourney or something like that
>>
>>107301502
Gemma is very confident in its replies and rerolls don't usually change it much

I can't count how many Kaelens, Old Man Hemlocks and Doc Abernathys I've seen
>>
>>107301502
Gemma is insanely good as an assistant for its small size, that's why it's so popular

For RP and coding it's shit yeah
>>
File: buttbuddies.jpg (51 KB, 1200x630)
51 KB
51 KB JPG
calling that soft blonde twink gay for pay has offended the twitter community, they would prefer he be called sam altmans "research scientist"
>>
>>107303474
Doc Abernathy, what a great name.
>>
twink bussy is the only thing protecting us from AGI at this point. got all those san fran faggots high on tight boy holes. god speed gentlemen
>>
File: 1752408405675707.png (691 KB, 1066x1120)
691 KB
691 KB PNG
Damm bro I wasn't being serious you are actually a pretty cute twink

And you would be even cuter if OpenAI released more OSS models
>>
>>107303739
Such a cute boy. Now I know why Sam hired him.
>>
4channers are retarded lol.
I'm not a channer btw.
>>
>>107303795
>I'm not a channer btw.
Don't worry, we can tell.
>>
>>107303739
>These fags actually browse these threads
LOL
>>
>>107303779
big debuff working at midjourney. nothing good comes from there. their devs went on ooba issues and complained that anime girls as default characters were scaring women. wanted to DEI them up.
>>
File: 1758231531104592.png (3 MB, 1862x5014)
3 MB
3 MB PNG
>>107303843
Of course, 4chan actually invented CoT
>>
>>107303853
that's something san fran types would brag about
>>
>>107303853
They might be a bit jewish in their monetization and moderation, but Midjourney still has the best aesthetics of any image model, no other project has stuff like this
https://x.com/midjourney/status/1991684484455100477
>>
https://x.com/karpathy/status/1992655330002817095
Waow
>>
It's becoming hard to remember Meta and Mistral are both major AI companies with billions in valuation. At least Zucc is still shoveling money but what are the frogs even doing right now?
>>
>>107304253
Mistral is in limbo sucking off of the euro taxpayers' teat while waiting to be bought out by Apple.
>>
File: 1753159988867825.png (566 KB, 1290x1694)
566 KB
566 KB PNG
>>107304253
please don't mention Meta anymore
It hurts
>>
File: file.png (5 KB, 261x36)
5 KB
5 KB PNG
Let me interrupt your super important discussion for a second and ask why ik_llama.cpp is so slow with glm 4.5 air (with q2_k_xl quant)?
On master llama.cpp I'm getting 21 t/s, on master ik_llama.cpp I'm getting about 16.5 t/s no matter what I do.
>>
>>107304343
Doesn't ik_llama need special quants now? Don't think you get the speedups with mainline ggufs.
>>
File: GCDWE2fW8AAmtq9.jpg (9 KB, 165x252)
9 KB
9 KB JPG
>>107304335
The look on Zuck's face when his $100 gorillion MSL shits out another Llama 4-tier disaster.
>>
>>107304399
The bright side of them abandoning open source is that at least we won't be disappointed.
>>
File: 1756601833730618.jpg (80 KB, 512x788)
80 KB
80 KB JPG
>>107304399
It's going to be even funnier if Yan LeCope's research ever pans out
>>
>>107304343
Try slapping --graph-reuse, --rope-cache, and -mqkv at the end of your arguments.
>>
>>107304253
Mistral's most recent grift was lobbying the French government to launch French LMArena where their closed model is conveniently ranked #1
>>
>>107304364
That's pretty sad, I don't think there are any glm 4.5 ik quants uploaded anywhere.
>>107304448
This changed absolutely nothing.
>>
>>107304569
https://huggingface.co/ubergarm/GLM-4.5-Air-GGUF
>>
Sam is so lucky bros. Just imagine the tight high school dropout boymeat he gets to have daily
>>
>>107304364
No, you guys simply seem to be bad at configuring it. I use the same quants as on mainline and test occasionally.
ik gives me almost double speeds for months. qwen/glm/deepseek, no matter what I try.
Where it sucks is fully offloaded models. Skip it for that.
The special quants give better perplexity per GB but can be slower if they're IQ.
>>
>>107304588
Damn anon now I feel like a retard. Still, I can't run anything bigger than IQ1_KT from that repo so it doesn't seem like a good option either way
>>107304732
Share your wisdom, I'm running it like this:
-ctk q8_0 -ctv q8_0 -mg 0 -c 32768 -np 2 --n-cpu-moe 16 -ts 42,14 --no-mmap
Like I said, adding --graph-reuse --rope-cache -mqkv did pretty much nothing. Maybe --n-cpu-moe selects experts poorly or it just wouldn't work well because most of the model is offloaded to the gpu?
>>
>>107301383
a version of kimi that's glm 4.6 size would be nice
>>
>>107304815
There's no way around manually putting layers on gpu. rtr exchanges PP for TG. Those little speedups are worth maybe 1% at best. GR may be on by default now. Rope cache makes models stupid. Did you ever check free vram after using -n-cpu-moe?

-ngl 94 \
-ctk q8_0 \
-ctv q8_0 \
-rtr \
-ub 1024 \
--jinja \
--reasoning-budget 0 \
-cuda offload-batch-size=7,fusion=1 \
-mqkv \
-ot "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14)\.ffn_.*_exps.=CUDA0" \
-ot "blk\.(15|16|17|18|19|20|21|22|23|24|25|26)\.ffn_.*_exps.=CUDA1" \
-ot "blk\.(27|28|29|30|31|32|33|34|35|36|37|38)\.ffn_.*_exps.=CUDA2" \
-ot "blk\.(39|40|41|42|43|44|45|46|47|48|49)\.ffn_.*_exps.=CUDA3" \
-ot "blk\.(50)\.ffn_(up|down)_exps\.weight=CUDA3" \
-ot "\.ffn_.*_exps.=CPU"
>>
>>107303886
https://huggingface.co/SG161222/SPARK.Chroma_preview/tree/main
>>
Local only version of my RE agent, with simplified R2 only toolset for anyone interested. I'm also working on a more complicated version which exposes dynamic tracing tools in a docker container, but I haven't had great luck using that one with local models yet.

https://pastebin.com/Xr8KHV9Y
>>
>>107304942
not even close, you're coping
>>
>>107304942
How does this even come close to the amount of searchable artstyles and aesthetic personalisation in Midjourney?
>>
>>107304987
Who cares. It's free and doesn't fund assholes.
>>
>>107305002
>it's bad but it's free so you should praise it
fuck no, the fuck is this kind of logic
>>
File: 1745248340326258.jpg (69 KB, 926x937)
69 KB
69 KB JPG
>>107305002
Eating shit is free, something being free doesn't make it good
>>
File: 1758871246929352.jpg (861 KB, 2166x2560)
861 KB
861 KB JPG
>>107305002
>muhh assholes
history will only remember their technical achievements, like Napoleon, you have to separate the art and the artist dude
>>
>>107305076
Nobody will remember midjourney. Their model ain't that great and they're full of pretentious cucks.
>muh styles
They're getting sued for that.
It's also not local. Let's take the eating shit argument back to LLMs and close the thread. Got claude, gemini, etc so pack it up.
>>
>>107305104
>Nobody will remember midjourney.
delusional, you can make the argument it's not local so it has nothing to be brought here but try not to pretend they don't have something unique and special in their hands, no one is buying it and that makes you disingenuous
>>
>>107305121
Holy shill. They have a finetuned SDXL. May as well pump NAI while you're at it. Never felt I was missing out not using yidjourney.
>>
>>107303886
>>107304942
>>107304987
>what is a lora
sd 1.5 is unironically good enough. the main issue is just text and having to fucking regen, but it's fast so it's mainly a tedium problem. all this shit is just goycattle gobble, modern saas in spirit
>>
File: sheeeeeit.png (232 KB, 1394x650)
232 KB
232 KB PNG
>t. cuckingface
>>
>>107305364
information is dangerous, goy
>>
>>107305364
>the epstein files dataset
what? they finally released it?
>>
>>107305364
>make an epstein dataset
>start talking about how dangerous it is and now that you have the tiniest bit of power think about gating access, censorship and preaching
This grift is a bit sad.
>>
>>107305364
Good. Dangerous information like the Epstein files has no place in a training set. Stopping antisemitic behavior needs to happen at the data level.
>>
File: 1751238268606529.png (109 KB, 2238x121)
109 KB
109 KB PNG
>>107305364
lmaooo, that's a troll right?
>>
>>107304941
>Did you ever check free vram after using -n-cpu-moe?
Of course I did, the gpus are full.
>manually putting layers on gpu
I just tried doing this again, came up with this
-rtr -cuda offload-batch-size=7,fusion=1 -mqkv -gr -ot "blk\.(14|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35)\.ffn_.*_exps.=CUDA0" -ot "blk\.(36|37|38|39|40|41|42|43|44|45|46)\.ffn_.*_exps.=CUDA1" -ot "\.ffn_.*_exps.=CPU"
and it still works at the same speed. That is, slower than using mainline llama.cpp, about 16.5 t/s.
>>
>>107305438
>>107305462
This isn't because of Jews, it's because Trump is and always has been anti free speech and is cracking down on anyone saying anything bad about him.
>>
File: 1761310268076809.png (832 KB, 1869x1491)
832 KB
832 KB PNG
>>107305520
>trump derangement syndrome
at no point that post implies it's a pressure from the government, you want to see what's real government pressure towards private companies? look no further than the previous administration
>>
>>107305520
>Trump is and always has been anti free speech
yet he's the one who will be releasing the Epstein files while Biden didn't lool
>>
>>107305444
they released a redacted one a while ago. bunch of epstein's emails were made public recently. larry summers got fucked by it lmfao
>>
>>107305548
Look up the definition of chilling effect.
>>
>>107305586
>they're self-censoring because of Drumpf! I have no evidence of that but I'm gonna present this as a fact
leftists sure love conspiracy theories when you think about it
>>
File: es.png (807 B, 275x60)
807 B
807 B PNG
>>107305364
Nice way to let people know.
>>
>>107305519
My system is 3090s and DDR4. For me it's the reverse. On qwen it's a difference of 7t/s and 20t/s. Full GLM Q4 gets almost 16 with IK. I didn't even bother with mainline there.
If it truly works better for you, keep using it. Assuming you enabled all the compile-time stuff like BF16, CPU instructions, etc.
>>
>>107302311
>General full of third world jeets is obese
What the fuck are you talking about?
>>107302552
kek.
>>107303739
Sanfran kikes are narcissistic enough to seek validation here.
>>107305612
They spend a lot of ways theorycrafting how best to silence people so it comes naturally to them.
>>
>>107305612
those files seem to implicate more dems than trumpers. could be why HF is getting nervous. they're also pussies though. we know we can't count on them if something happens to models. good to find out early.
>>
>>107305737
>They spend a lot of ways theorycrafting how best to silence people so it comes naturally to them.
true, democrats are masters of censorship, it's literally in their DNA
>>
>>107305462
>Awareness of pedophilia is anti-semitic
What did xhe mean by this?
>>
>>107305756
More like NDA, amiright?
>>
>>107305774
kek
>>
>>107305629
Can also just get it straight from the government release, images and all:
https://oversight.house.gov/release/oversight-committee-releases-additional-epstein-estate-documents/
And they want to make people take an ethics certifcations to download the csv from HF.
>>
File: 1754416806640098.png (1.54 MB, 1920x1080)
1.54 MB
1.54 MB PNG
>>107305364
HuggingFace is based

They will host anything as long as others don't snitch and make a fuss about it

Exhibit A:
https://huggingface.co/datasets/mirav/gurobooru/tree/main
>Uploaded over 2 years ago
>>
>>107305876
Is that being based or just incompetent?
>>
File: 1763476163874192.jpg (32 KB, 800x450)
32 KB
32 KB JPG
>>107305364
>extremely sensitive information that could spread misinformation
>>
>>107305889
There are hundreds of lightweight models to scan video/audio/text etc for sus content; if they wanted to be completely draconian they could easily do so
>>
>>107305940
So because they don't go the extra mile to be worse that somehow makes them based?
>>
after updating sillytavern it started always outputting { before it replies, and when continuing a half-done message after i fix it manually it directly continues with
{
"character_name": "Rebecca",
"response": "


never had this happen before
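if you want to salvage replies from affected logs, the wrapper is just JSON, e.g. (assuming the message made it to a complete object; ones truncated mid-string will raise):

```python
import json

# Strip the JSON wrapper the bug adds around replies. Field names match
# the wrapper shown above; the sample reply text is made up.
def unwrap_reply(raw: str) -> str:
    return json.loads(raw)["response"]

reply = unwrap_reply('{"character_name": "Rebecca", "response": "Hey, choom."}')
```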
>>
>>107306075
found the problem, for some reason the "JSON Schema" in sampling settings panel was set to "{}"
>>
>>107306107
which api type has that?
>>
>>107305889
1000% incompetence. they've fucked up their implementations of rmsnorm and the like in transformers countless times, plus there's this monstrosity: https://github.com/huggingface/transformers/blob/v4.57.1/src/transformers/training_args.py#L216
i'd rather shove a cactus up my urethra than use anything from facehugger
>>
>>107306184
>>107306184
>>107306184
>>
>>107306165
basic text completion preset, lcpp backend
>>
>>107306172
It always irks me when I do a finetune and at the end it tells me to upload the model to huggingface.
>>
File: file.png (16 KB, 374x285)
16 KB
16 KB PNG
>>107306107
seems like a fix was made to address that several minutes ago
>>
>>107306270 (me)
actually I'm not sure that's relevant
but I just saw someone on discord mention the problem going away after blanking the field
>>
>>107305364
"ethical" is just codeword for jewish at this point


