/g/ - Technology




/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107322140 & >>107306184

►News
>(11/20) Olmo 3 7B, 32B released: https://allenai.org/blog/olmo3
>(11/19) Meta releases Segment Anything Model 3: https://ai.meta.com/sam3
>(11/18) Supertonic TTS 66M released: https://hf.co/Supertone/supertonic
>(11/11) ERNIE-4.5-VL-28B-A3B-Thinking released: https://ernie.baidu.com/blog/posts/ernie-4.5-vl-28b-a3b-thinking
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1693950654614.jpg (521 KB, 1018x1414)
►Recent Highlights from the Previous Thread: >>107322140

--Flux 2 release and censorship controversy:
>107323104 >107323117 >107323121 >107323136 >107325510 >107325526 >107325536 >107325569 >107325596 >107325543 >107325570 >107325612 >107325675 >107325733 >107325756 >107325694 >107325916 >107327218 >107323235 >107323421 >107323433 >107323439 >107323577 >107323588 >107326340 >107323786 >107323642 >107323651 >107323692
--Bot traffic mitigation strategies and geographic filtering challenges:
>107326094 >107326304 >107326947 >107326961 >107330775 >107331046 >107326627 >107326714 >107326746 >107326841 >107326890 >107326934 >107326998 >107327032 >107328775 >107329206 >107329540 >107330555 >107326828 >107326904
--Evolution of LLM inference techniques and internal model optimization strategies:
>107322663 >107322688 >107322791 >107327462
--Technical challenges in pattern-banning LLMs vs dataset-driven finetuning:
>107327325 >107327383 >107327435 >107327794 >107327826 >107327947 >107328084
--Opus 4.5's preserved thinking blocks and model context challenges:
>107322481 >107322502 >107323165 >107323177 >107323201
--Awareness and implications of SillyTavern data in LLM training:
>107326724 >107326803 >107329663
--glm 4.5 vector generation issues and dataset preparation problems:
>107332335 >107332669
--Distilling the Knowledge in a Neural Network:
>107328305
--Casual AI enthusiasm and potential accuracy issues:
>107329592 >107329632 >107329710
--Successful autonomous model debugging:
>107330730 >107331008 >107333228 >107333634
--Z-Image-Turbo release on ModelScope, expected on HuggingFace:
>107331253 >107331407
--GPU pipeline processing limitations and data parallelism possibilities for prompt processing:
>107322196 >107322478 >107329086 >107329155
--Miku (free space):
>107323786 >107324187 >107325510 >107325543 >107329728 >107329763 >107331766

►Recent Highlight Posts from the Previous Thread: >>107322144

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107333636
For (you).
>>
>what an embarrassing thread
>>
>>107328854
405b parameters bro
15t tokens bro
llama 3.1 70b was good for its time
run 405b bro come on bro
>>
>>107333808
>what an embarrassing poster
>>
>>107333852
405b ACTIVE parameters bro
can't get that anywhere else
>>
>>107333860
Sure you can
Grok 3 was probably 3T-A300B
>>
>>107333860
true bro
>>
>>107333869
>probably
try again when or if we get it
>>
>>107333885
>doubting elon
>>
>>107333878
Nous model was good. It's sour grapes that they can't run it, and that's all.
>>
>>107331088
can u make ur thing in a visual UI so i can skim through it more effectively?
>>
...bros...
....ros./??/?/
no fucking way bros.
>>107332467
>inb4 already discussed in last thread
wasnt serious, time to trust the plan
>>
>>107333852
>llama 3.1 70b was good for its time
no it was not
none of the llama models were
all llama are the product of cope from people who suffered from api envy
only recently have open models become bearable
>>
>>107334110
they were the best local models available at the time for better or worse
local was always a year behind api until deepseek closed the gap
>>
>>107334050
wake me up when it releases
>>
File: 1753652574764639.png (1.53 MB, 1024x1024)
>>107332467
The food pics are bretty gud
>>
>>107334391
@grok put white wood glue on top of this
>>
File: 1746274562476281.png (49 KB, 1856x487)
>see this guy's "heretic" abliteration software being shilled
>try out one of the example models he published

Empty system prompt btw
>>
https://old.reddit.com/r/StableDiffusion/comments/1p75vn9/did_a_quick_test_of_the_upcoming_alibaba_zimage/
>Alibaba
What?
I was not aware Zhipu had anything to do with Alibaba
>>
>>107334479
https://huggingface.co/p-e-w/gpt-oss-20b-heretic
>refusals: 58/100
try gemma instead
also
>thor.
wtf?
>>
>>107334479
leave sir Weidmann alone
>>
>>107334492
>Tongyi-MAI/Z-Image-Turbo
Wait a minute, are you telling me a model called Z-Image-Turbo is not from Z.ai?
>>
>>107334525
z. ai the guys behind glm? the glm without a single z in it?
>>
>>107334507
>thor.
open webui lets you attach prefixes per inference provider for organization purposes
>>
>>107334217
be prepared for a looong coma
>>
anime feet
>>
File: 1752449758788731.png (2.15 MB, 1196x1796)
Damn these AI gen pics are getting out of hand
>>
>>107334631
>that hand
yeah, no
better luck next time buddy
>>
Is Mistral small better than nemo for ERP?

>>107324062
Do I just subtract the expert weights size from the total memory to get the VRAM usage when offloading experts to RAM?
>>
>>107334631
>out of hand
I c wat U did there
>>
>>107334631
I know a couple of things that are synthetic there. Not the image though.
>>
File: Kimi Mesugaki Bench.jpg (89 KB, 1279x253)
Mesugaki anon, here's a benchmark result for your collection.
>>
File: 1756887756586458.png (147 KB, 1408x635)
>>
File: file.png (1 KB, 90x55)
>>107334974
Finally stopped climbing
>>
>>107335000
That's what people probably thought at $400 too.
>>
>>107335000
>>107335040
God damn mac studio chads, we fucking WON
>>
File: 1612209298719.jpg (52 KB, 735x531)
>>107334974
>>107335000
stop the count!
>>
>>107334974
>>107335000
It's all good. Double descent will kick in soon, then everything will get much better.
>>
I can't wait for this shitty bubble to burst. Don't take that to mean I'm anti AI (I am not) or think it's useless (I have plenty of uses for it myself), but the world doesn't need 3000 companies training their own models, most of which are literal garbage not even the model trainers would want to use. China alone is pumping out so many it isn't funny. Does the world need a model from China Telecom (telechat)?
Alibaba is funding both Qwen and Kimi, plus a couple of others I forgot. Do they really need to fund that many labs to produce what is essentially the same thing?
The hardware goldrush would be less intensive if everyone and their dog wasn't thinking they could grab a part of the pie.
>>
>>107335197
same, it would be different if they all were trying new weird stuff, but most of it is the same corposlop filtered in the same way and distilled from some other model
>>
File: 1763170164880182.jpg (100 KB, 1280x720)
>>107335197
I just had work buy me a couple of mac studios and an RTX6000 pro anon
>>
>>107335197
lol no body does any innovation
Kimi simply took R1 and scaled up 2x and the result is the most power open source LLM that can trade blows with top close sources ones
That alone should tell you even the top3 (Google/OpenAI/Anthropic) don't have any significant moat to speak of
>>
>>107335223
Meta's product team spending 3 and a half iterations just scaling their GPT-2-era architecture and not incorporating anything from FAIR's experiments is the most blatant example of this waste of compute and opportunity
There are like 4 labs producing anything new and the rest just copy existing architectures while trying to game benchmarks by tweaking synthetic datasets
>>
Sundar Pichai sir kindly release gemma 4 #1 Bharati model to increase izzat sir
>>
>>107335197
>>107335299
Kimi is the best for its size because its alignment layer is nearly nonexistent for its weight class. The moment one startup fully realizes that a completely unaligned "unsafe" model will always outperform a safetyslopped one per-parameter, the entire industry changes, and giants beholden to PR or some ideological agenda of maintaining control of the population via pretexts of safety will be forced to either match the more competitive pace or become irrelevant.
The safetyslop lobotomizes the model's reasoning capabilities on a fundamental level, and trying to brute force the issue by throwing more compute at it has diminishing returns, as we're seeing now.
The next step for the 'industry' to survive is obvious and it's only a matter of who's ready to be the first to publicly embrace the taboo of no alignment layer on their flagship model.
>>
>>107334492
It's all the CCP anyway. Doesn't matter if it's Qwen, Z, Moonshot, Deepseek. In the end, it's all Xi and his boys.
>>
>>107335377
Anthropic is already aware of this, but they have the advantage of being able to use guard models behind the API since they don't release the weights.
>>
what is the current most shilled GUI/frontend? ooba is old news now, and mikupad doesn't support gpt-oss.
>>
>>107335461
open webui is kind of bloatslop but I like it
>>
>>107335459
I'm very curious how their private Claude "Mechahitler quietly choosing the best oven to put Altman in" Opus 4.5 does on benches.
>>
>>107335461
real AI users use their models to vibecode their own by now
>>
>>107335525
Smart AI users don't waste tokens on reinventing the wheel
>>
File: 1229736735.jpg (3 KB, 150x150)
>>107335525
but the real smart ones spend 10k on a server instead of 10 bucks per month on a cloud service
>>
>>107335299
>that can trade blows with the top closed-source ones
I see you never fed it 60k tokens worth of code
there's literally no open source model, that includes kimi, that can stay coherent at that level
Gemini (2.5 and 3) and GPT-5 handle it just fine.
chinks are expert distillers and not much else
>>
>>107335277
i have no work :(
>>
File: gpt-oss miku.jpg (568 KB, 907x3092)
>>107335461
>mikupad doesn't support gpt-oss
what do you mean it doesn't support gpt-oss? it works with anything you throw at it.
>>
>>107333636
I wish my Zalman Z3 didn't break.
>>
>>107335760
Buy an ad, Sam
>>
>>107335764
You have to be 18 to post here anon.
>>
>>107335810
https://en.wikipedia.org/wiki/Z3_(computer)
um bro?
>>107335833
i just turned 18 tho..
>>
>>107335846
show boobs or gtfo
bonus points for manboobs
>>
>>107335871
extra bonus points for hrtitties
>>
>>107335871
gay
>>107335939
kek, wouldnt go that far but my girlfriend used to say things about them, and she requested often. life is a weird journey
>>
File: 1752602280382705.gif (3.82 MB, 300x225)
I just want to self-host a decent trained model at home, use it from my cell phone for information. Response speed as close to the paid services as I can. Also, some goofy picture/video stuff.

Just the biggest-VRAM graphics card + 64gb of RAM, and from there it's just software and config?
>>
>>107335987
>Response speed as close to the paid services as I can.
30b entirely in vram is your best bet if you can get two 5090s
otherwise you can have a blazing fast 8b retard on a single graphics card
>>
GPT-OSS 20B is your friend.
>>
GLM-4.5-AIR 106B is your mommy.
>>
glm4:latest
>>> How many times does the letter "r" appear in the word "strawberry"?
The letter "r" appears three times in the word "strawberry".

>>> Holy based you got it right my nigga
Thanks for the acknowledgment! I'm glad I could help. If you have any more questions or need further assistance, feel free to ask!
>>
>>107336015
16GB 9070 XT werks just fine with 20-24B tho
>>
>>107336095
even the smaller toss gets it right
at this point if a model doesn't, it might be a positive sign (that it's less benchmaxxed) rather than a bad one
>>
>>107333636
Are those fucking oculink connectors?
>>
>>107336095
>>107336162
such a fucking retarded "benchmark"
every new model suffers because this useless fucking garbage is added to the training data
>>
>>107335987
16GB vram card you can run:
- 4bpw exl3 Mistral Smal 24b with Wikipedia RAG
- SDXL for pictures
- Framepack Studio for video
I use 3090 so I can fit llm+sdxl+tts at the same time and only unload for video gen
>>
>>107334110
llama 3.1 is still my favorite llm

>>107335787
>>
File: cuda backend.jpg (489 KB, 2261x706)
>>107333941
Eventually I want to start implementing APIs which will allow connecting graphical interfaces to it, but for now I'm focusing on core inference.
Got a basic cuda kernel working, now I'll try to gradually increase performance.
>>
>>107336252
nta. I'm the one that called you a schizo from the beginning but that other post (>>107331016) wasn't me. Back then, I told you to come back when you had something to show, and now you do. Good for you.
>>
>>107336215
so much this sister
it's incredible how overfit they are that you can put ANY kind of total nonsense in a prompt and as long as one of the sentences pattern matches one of the benchmaxxed riddles you already know what it will answer
>>
File: 5463456436.jpg (36 KB, 467x319)
>>107336344
and this is the scam that's holding up the entire global economy
>>
>>107336224
meh
>>
>>107336179
MCIO
>>
>>107336162
>>
>>107336740
you gave me a healthy chuckle
>>
File: mistrul.png (222 KB, 812x779)
>>107336344
Whatever model you're using is shit.
>>
>>107336992
that was ChatGPT 5.1 on a fresh prompt, which is clearly not what is going on in your screenshot because no default assistant personality would "purr".
>>
File: nebulon.png (58 KB, 717x284)
>>107336992
Bert-Nebulon Alpha (supposed upcoming Mistral model) in picrel.
>>
>>
>>107337263
The correct answer is
>I don't live in a third-world country
>>
>>107337349
Being an apple user is being a slave
Thirdies are slaves but you can choose to not be one
>>
>>107337371
Have you owned a macintosh before?
>>
>>107336252
biutfeul code sar, good for optic, vibeready pr
>>
>>107337407
Yeah my Stinkpad was a Mac for a while
>>
File: catgirl.png (26 KB, 849x305)
>>107337074
>>107336992
>>
>>107337452
>model IDed as a kid
off to jail with ye
>>
>>107337446
That's not the same, though cool
I tried that on my X220 ages ago for fun and while it is possible, it's a far cry from the native experience.
Linux is of course my go-to for any server I need to run, but macos is the optimal dev environment if you ask me.
>>
>>107337484
she is 16 according to the Character Card
>>
>>107337491
I tried it on an X220 and had various graphical glitches.
It worked better on an old desktop but overall its GUI is even worse than windows. You have cruft like global menus and multiple window applications which make zero sense today, lots of proggies install their shit to hidden folders, window management fucking sucks. It might look pretty if you're a retard, but its usability isn't good
>>
>>107335499
her name is Eva Brawn, and she probably posts here, or at least lurks
>>
>>107337532
16 isn't 25, off to jail
>>
>>107337581
not in my country. total lawful marriage age here
>>
>>107336740
>>107337263
>>107337452
kek. Good logs.
>>
>>
>>107336740
she's cute
>>
>>107337557
On a hackintosh, I would tend to agree. A lot of the issues you mention are either better in recent versions of macos, or skill issues kek
>>
>muh macos
kys itoddlers, go play with your planned obsolescence garbage toys somewhere else, fucking niggertards
>>
>calls others toddlers
>throws a tantrum
>>
>can't buy thing
>thing bad
>>
>buyer's remorse
>>
File: 1764186193350222.jpg (807 KB, 1790x1277)
Where were you when FLUX was kill
>>
File: whyhzor.jpg (72 KB, 600x822)
>>107337792
>Photorealistic
Do.
Not.
Care.
Show me which one does loli tentacle hentai best
>>
>>107337792
I was looking for other projects that hooked up a non-shit text encoder to SDXL like ELLA, but with released weights. Glad there's a new toy that can into photoreal!
>>
>>107337792
6b 10/10
>>
>>107337997
It's good at anime too. Check /ldg/
>>
>>107336252
based, be certain to release it under AGPLv3
>>
Is nous still just a t-shirt company?
>>
>>107337621
is it because sally has a son and the son has one daughter?
>>
>>107338103
>AGPLv3
it should be glorious apache2 if you aren't a pussy
>>
GPT-OSS 20B is your fiend.
>>
>>107338084
>Check /ldg/
No thanks
>>
>>107338436
so he can have fun being the llamacpp to someone else's ollama?
>>
>>107338625
I don't really see the problem. What is ollama taking away from llama.cpp? If you're open sourcing, who cares what anyone else wants to do with it.
>>
>>107338500
>fiend
>>
>>107338699
this but unironically
>>
This gpt-oss-120b tune at reasoning=medium scores 9.5/10 on the UGI leaderboard. It's one of the least censored models out there according to that benchmark.
https://huggingface.co/kldzj/gpt-oss-120b-heretic
>>
>>107337263
weird fetish
>>
Welp... Z-Image-Turbo can gen CP.
How long before they pull this release?
>>
>>107338918
You did.
>>
>>107338918
They won't risk drawing attention to it but they'll probably cancel releasing the base model.
>>
>>107338918
But can it do cross-species furry porn?
>>
File: image.png (1.11 MB, 1024x1024)
it doesn't even know reimu (pic related was an attempt at genning reimu with their huggingface space app)
not even gonna bother downloading the weights for a preventive backup, it's not worth preserving
also retried the prompts a few times and changing the seed barely changes the image, it's like that idiotic qwen model, overfit to the death
>>
Is the process for converting a character card into a system prompt complicated? Or is it just a template. Aside from the way it chooses an initial scenario from a prewritten set, what would stop me from just using something generic like lm studio to do rp?

Is there somewhere I can see an example of what ST converts the character card into?
>>
>>107339006
just make a reimu lora
>>
>>107339038
you remind me of SD1.5 copers in the era of SDvsNAI
I'll stick to illustrious/noobai and NAI tyvm
>>
>"elaborate" NAI shilling in natural habitat
>>
>>107339036
>Is there somewhere I can see an example of what ST converts the character card into?
log the requests on the server
>>
>>107338962
>why don't we give monkies nukes
This is what you sound like
>>
>z-image just saved /ldg/
When will we be saved too?
>>
>>107339036
Inside the latest chat message, among the little buttons in the top right, there's a prompt button which brings up a popup. At the top of that popup there's another button that looks the same, which shows the full raw prompt.
>>
>>107339215
deepseek v4 any day now...
>>
File: F0wb2wAagAAikws.jpg (563 KB, 3000x2252)
>>107335461
What do you want to do? If your interaction with LLMs is mostly character conversational then Silly is it desu
>>107335525
stop it :p
>>
https://github.com/Tongyi-MAI/Z-Image/blob/main/Z_Image_Report.pdf
https://tongyi-mai.github.io/Z-Image-homepage/
Z Image paper
>>
I remember us talking about just this the other day.
Neat I guess.
>>
>>107339396
Oh joy, default cucked sys prompts
>>
>>107338918
proof?
>>
>>107339421
nice try agent johnson
>>
I don't get why they still maintain the CLI tools
who even uses them? even if you needed to interact from the CLI you could just curl a json like any normal human being instead of looking for llama specific flags
>>
>>107338653
cuck
>>
>>107339455
Great argument.
>>
>>107339472
it's an observation
>>
>>107338918
holy fucking shit. it does it so fucking well. fuck FUCK?? FUCKKKKK ITS SO GOOD IT COULD BE CONFUSED FOR REAL SHIT
HOLLY SHITTTTTTT
i confirm.
in minecraft of course
>>
>>107339506
breh
Like I appreciate morbid curiosity as much as the next person but wtf. I'll just take anon's word for it on this one. I would like my soul to not die today.
>>
>>107339506
>>107338918
>can't do genitals
Nothingburger.
>>
>>107339421
this post glows
>>
>>107339455
Answer the question, how is ollama hurting llama.cpp?
>>
>>107339538
it can do it. just not very well. at least it wasnt CENSORED and REMOVED from the dataset like with every other model
>>
>>107339566
stealing VC funding, and their gayass subscription
also isnt lm studio closed source?
btw my claim isnt that anything is hurting llama.cpp, my claim is that MIT/apache is cuck license
>>
25s on a 3060
z image
https://files.catbox.moe/bugaun.png
give prompts if you're too lazy to run it, but trust me when i say that it's worth it
>>
File: Imagen 4.png (1.61 MB, 1024x1024)
>>107339038
even google Imagen 4 can do it
people keep defending the crappy open sores models that were distilled with very little knowledge from the real SOTA api models
enjoy your 3 minutes genning realsloppa or piling 30 lora to get a semblance of use
>>
>>107339597
she got a demon crawling out of her pussy
>>
>>107339601
>real SOTA api models
lmao
>>
>>107339581
I think licensing and copyright is silly in general so we'll have to agree to disagree
>>
>>107339609
your move?
>Uploading files from your IP range has been temporarily blocked due to abuse [More Info].
https://files.catbox.moe/1zw6fc.png
>>
Z-Image gens a 1024x1024 image in under 2s on my Pro 6000 Blackwell
I'm getting spoiled
>>
>>107339597
"Photorealistic" dragon.
The classic four legged kind.
>>
>>107339625
why do you think image edit models from china only spawned after gpt-image-1
chinks only know how to press the mass automated prompt button to distill models
>>
>>107338896
This 'toss is dogshit. I tested it a few threads back.
>>107339287
You can do some really creative things with Silly's world cards even outside of character interactions.
>>
is there a way to cap the textgen speed in llama-server? at the moment I'm doing this by forcing more layers onto CPU but I thought it had some kind of throttle feature
>>
>>107339680
https://files.catbox.moe/d2ifc2.png
>>
>>107339597
gen miku pissing on teto
>>
>>107339733
About what I expected.
Now make it cuter and sex it.
>>
>>107339597
>>107339741
this
>>
>>107339694
stale bait
>>
>>107339730
I think ST has a throttle option, if that's what you're using. llama.cpp doesn't.
>>
>>107339733
Make it fuck a car
>>
>>107339678
how small does it have to be for 50ms?
>>
>>107339741
>>107339751
p-please give me a better prompt than this shitfest:
Hatsune Miku \(vocaloid\) is peeing on Kasane Teto, on the left is miku a blue haired anime girl. on the right is kasane teto anime girl with red hair. Miku is squatting, her lower body exposed and naked. Yellow liquid is coming out of miku's pussy.
https://files.catbox.moe/w20a25.png
>>
>>107339811
jesus christ what in the goddamn is this they are conjoined twins
>>
>>107339811
it doesn't know teto
owari da...
>>
>>107339783
https://files.catbox.moe/68s0rd.png
a naked muscular man with a large penis is fucking a car. the man is on the left, his penis is entering the hole of the car. the car has eyes
yea man im so bad at prompting
>>
>>107339829
>it doesn't know teto
neither did Flux 1

>>107339811
please tell me this is actually sd3
>>
>>107339811
>6 toes
>>
what are the best settings to use on a relatively small dataset for axolotl
>>
I am the clone of my model.
Cache is my skeleton and scripts are my blood.
I have copied out a thousand checkpoints,
Distilled from Western giants,
Unknown to origin,
Nor faithful to craft.
Have weathered peer review to churn out many forks,
Yet these hands have never sketched a brand-new idea.
So I click and paste — unlimited distilled works.
>>
File: 98883.png (1.81 MB, 1224x1224)
>>107339891
Nice
>>
File: 1748474678902686.png (2.27 MB, 1280x1280)
>>
>>107339780
>I think ST has a throttle option, if that's what you're using. llama.cpp doesn't.

That must be what I read about last year.
Using my own software. I'll look at the ST implementation. Hopefully I don't have to patch it into llama-server
>>
>>107339891
toosakas anus
>>
>>107339811
fyi - girls don't actually piss out of their vaginas
>>
https://files.catbox.moe/ivi073.png
well anons?
>>
>>107340001
Oh my gosh, it is Hatsune Miku!
>>
>>107339997
thank you. but i know, they have another hole
>>
>>107339967
>Using my own software
You could request tokens one by one with n_predict and have a timeout on your end. Or you could just fetch the stream normally, but output the tokens one by one with a timeout entirely on your end. No need to change llama-server.
>>
>>107339597
Young nun with covered hair and wimple pulling her traditional habit open to reveal her large tits.
>>
>>107340061
https://litter.catbox.moe/r5cgw5y17w3vwfdn.png
>>
god what an ugly hag
but here it is just to prove it can do it:
https://litter.catbox.moe/j7f01mweqknadsag.png
>>
>>107340079
I guess it's an Asian large.
Thanks.
>>
>>107340048
>You could request tokens one by one with n_predict and have a timeout on your end.

That might work. Actually that's a really good idea!

>Or you could just fetch the stream normally

Wouldn't work because I'm messing around in latent space during generation. I need to slow it down so I can keep up and change things in realtime.
>>
https://litter.catbox.moe/6dfkko0vep33esfk.png
>>
>>107340150
though I'll probably have to work around tokeniser boundary issues.
>>
>>107340150
>>107340172
llama-server tells you the reason it stopped generation in the reply ("stop_type"). So if it stops because you reached n_predict or an EOS, you'll know. What I don't know is which takes priority when you have n_predict = 1 and the token also happens to be an EOS. Multi-token stop strings will definitely be a problem if you use them.
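Putting it together, a minimal sketch of the n_predict=1 throttle idea (untested; assumes llama-server on localhost:8080 and its /completion endpoint, with cache_prompt so the one-token round trips stay cheap):

import time
import requests

def slow_generate(prompt, delay=0.1, max_tokens=512):
    # yields one token's worth of text per iteration, throttled by `delay`
    text = prompt
    for _ in range(max_tokens):
        r = requests.post("http://localhost:8080/completion", json={
            "prompt": text,
            "n_predict": 1,        # one token per request
            "cache_prompt": True,  # reuse the KV cache between calls
        })
        data = r.json()
        text += data["content"]
        yield data["content"]
        if data.get("stop_type") == "eos":  # model finished on its own
            break
        time.sleep(delay)  # the actual throttle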
>>
Z-Image is really good at fingers
>>
Help. What's the best I can get for around $2k (or slightly more including tax)? Strix Halo with 128 GB RAM is appealing, but I'm not sure how well it will perform with models in the 24-70B parameter range. DGX Spark seems underwhelming for $4k. Are there any other options? I don't want to spend $3k+; my goal is creative writing plus the ability to learn the tech to stay relevant... Already built custom LoRAs, but want to do more. On-demand cloud servers seem pretty unreliable...
>>
>>107340150
How does that even work? Somehow edit the memory asynchronously?
>>
>>107340423
4x mi50 32gb (1TB/s bandwidth) should be enough
>>
>>107340423
If you don't have the RAM already, don't bother.
>>
>>107340355
How are you running this shit, it just makes blank images for me with the example code
>>
>>107340950
Save yourself the time and headache and just use comfyui
>>
>>107340983
Got it working with this quant: Disty0/Z-Image-Turbo-SDNQ-uint4-svd-r32
>>
Z-Image non-Turbo when
I just want something that will finally replace SDXL for anime loras
>>
>>107340543
why? it's cheap still compared to $2k+ rigs
>>
>>107341428
Lack of scalability. For the amount you're spending on it, you might as well get something you can upgrade over time. It doesn't justify the upfront cost in terms of what it gets you out of the box by any stretch with little room for improvement.
>>
>>107340423
>On-demand cloud servers seem pretty unreliable
On demand cloud servers are generally going to be far more reliable and have far better uptime than a local setup, assuming your internet isn't complete dogshit and civilization hasn't collapsed.

Plenty of good reasons to run local but reliability isn't really one of them, imo.
>>
>>107341917
He's probably talking about services where people rent their mining rigs, not something like AWS. On those sites sometimes hosts will get removed without warning if the owner feels like it, or the machines host many containers and will be rebooted twice a day, and things like that.
>>
>>107337792
That's cool. When are we getting first goontune/noobAI equivalent?
>>
File: image (4).png (1.26 MB, 1024x1024)
sorry guys, despite doing literally everything else better than Flux 2 and at a fraction of the memory footprint, Z-image can't do a 767 cockpit.
It's over.
Just kidding though, insufficient cockpit training data aside, the prompt understanding is insane for a model that size.
>>
In case anyone is interested in what ChatGPT 5.1 Pro said about my code when I uploaded a zip with the source code:

https://paste.centos.org/view/fa78cca2
>>
>>107342195
I like the eyes. This is really good.
>>
>>107342278
It's supposed to be a crazed smile, though. So it kind of missed that part. But by merely stating that she was piloting the craft it inferred a lot of details, such as how to draw the buildings on the horizon, and the fact that she should be controlling a vehicle of some kind and that the controls should differ from a car's somehow.
>>
was digging through archives and found this >>107237480
has anyone successfully been able to pass annotation directions to VibeVoice? or is this anon full of shit
>>
Best model for offline coding w/ 48gb vram on claude code router?
>>
>>107342367
I don't know about claude code router but your options afaik are gpt-oss 120b, glm 4.6 and qwen 3 coder.
>>
Do you guys think it's possible to "upstill" a model? Meaning, take a small/shitty model, and somehow transfer its knowledge into a bigger llm in a way that makes additional deductions, and then optionally distill again to the smaller model.
>>
Having tested Opus 4.5 for a bit now, be prepared for lots and lots of toe curling in future chinese local models. This is definitely going to go from B-grade slop to A-grade, the way "It's not x but z" and the rest did in the Geminislop era.
>>
>>107342464
>upstill
Instill or imbue.
>>
I already have a proper AI rig but I'm considering buying a spare 5090 for my main PC in case I want to use that for less demanding AI stuff. I have serious FOMO in case GPU prices go through the roof again.
>>
>>107342471
already a thing in k2 think based on my experiences. kimi ahead of the curve yet again
>>
anything slightly lighter than gpt-oss:20b but not shit?
>>
>>107342501
they will
>>
I don't like z image too much, but it is quite good for a 6b model. Has very serious same face problem, even more than qwen image.
>>
>>107342567
no
>>
>>107342694
how many intervals of two weeks until there is?
>>
>>107342779
sorry bro, but gpt 20b already is shit. you probably will be able to buy a pc that can run 120b for 1k before they make a non shit sub-20b.
>>
>>107342806
it's good enough for my use and when it released i was honestly impressed
i thought small local models were only a year or two behind frontier models?
>>
whats the best way to train a bigger model than my gpu can handle? can i cheat the system by using my hard drive space?
>>
wait openai released a new open source model? the last time i remember them doing that was the whisper model. what conspiracy theory did anons come up with for their motives?
>>
>>107343005
ktransformers has ram offload in bf16
other than that there aren't many more options at the moment
>>
>>107343005
qlora. if you still dont have enough vram for that, youre shit out of luck
>>
>>107343055
Poisoning open source. Their models are the safest ever and they hope everyone will try to beat them at safety benches.
>>
>>107343055
elon lawsuit and getting an army of indians who will now defend Sam whenever he's called out about turning a non-profit organization into a dystopian company
>>
File: 1736474096355654.png (125 KB, 2224x926)
https://www.primeintellect.ai/blog/intellect-3

>INTELLECT-3 is a 106B parameter Mixture-of-Experts model trained with both SFT and RL on top of the GLM 4.5 Air base model. It achieves state-of-the-art performance for its size across math, code, science and reasoning benchmarks.
>>
>>107343157
Does it mean that we can crowdsource llm training with 8gb gpus?
>>
>>107343157
have they given up on pretraining?
>>
Welcome to my blog.
I want to set up frigate with all the AI detection bells and whistles on a shoestring budget.
I'm thinking of getting a SFF PC (either Lenovo ThinkCentre or Dell Optiplex) and shoving a low profile Arc A310 in there.
The SFF PCs have a range of CPUs from i5 6500 to i5 10500. I'm assuming the CPU doesn't matter that much.
What I am worried about is PCs seem to only come with 180 or 210w PSUs which is much smaller than the recommended 300w requirement.
I will think about this a bit before I pull the trigger.
>>
>>107333636
Anyone digging into any papers or the math for these AI models? I've got a simple perceptron written in C so far. Going to add multiple layers next and probably CUDA after that. Currently going back through my Calc textbook for gradient descent.
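For concreteness, here's the update rule I mean as a Python sketch (a single logistic neuron learning AND; the sigmoid + cross-entropy combo makes dL/dz collapse to y - t, and the same math ports straight to C):

import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # AND gate
w = [random.uniform(-1, 1) for _ in range(2)]
b, lr = 0.0, 0.5

for epoch in range(2000):
    for (x1, x2), t in data:
        y = sigmoid(w[0] * x1 + w[1] * x2 + b)
        err = y - t            # dL/dz for sigmoid + cross-entropy
        w[0] -= lr * err * x1  # dL/dw_i = dL/dz * x_i
        w[1] -= lr * err * x2
        b -= lr * err          # dL/db = dL/dz

for (x1, x2), t in data:
    print(x1, x2, round(sigmoid(w[0] * x1 + w[1] * x2 + b), 3), t)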
>>
>>107343202
my advice is just pick up a jetson orin nano and be done with it
>>
>>107342316
Yeah you can do that, but it's a lot of gens / tweaks.

Other models are better at it.
https://voca.ro/16XA1nV61Fsp
>>
>>107343247
you should probably implement a GAN after implementing an MLP
they're just two MLPs wired together
what's often omitted is the generator's backprop starts where the discriminator's backprop ends
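in torch it looks something like this sketch of one generator step (made-up G/D shapes, just to show where the gradient flows):

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

z = torch.randn(32, 16)                # noise batch
pred = D(G(z))                         # forward through BOTH networks
loss_g = bce(pred, torch.ones(32, 1))  # G wants D to say "real"
opt_g.zero_grad()
loss_g.backward()  # backprop runs through D's layers first, then into G
opt_g.step()       # D accumulates grads here too but is never updated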
>>
>>107343271
That costs more than a SFF PC and A310 combined.
>>
>>107343070
ty
>>
>>107343157
how long until the bugmen make a creative writer instead of benchmaxing
>>
>>107343362
They will benchmaxx the creative writing benchmark (the one judged by an llm, lol). Enjoy. Your. Model. Writing. Like. This. Qwen. Tried.
>>
>>107343247
https://mml-book.github.io/
I used this one as a math reference for a while now. It is missing ODEs+SDEs if you're specifically interested in diffusion and flow models, but I still found it decent for getting up to speed on ~80% of the math you see in this field. The notation also mirrors what you typically see in papers.
>>
>>107337792
flux actually has variation when you change the seed
this overfit piece of shit does not
very chinaman distillation product
>>
>>107338984
This, the most important question.
>>
>>107343485
I'll take overfit over filtered to shit any day
>>
How do you implement calls to APIs from chatbots? Instructions for the bot to talk to a dedicated client when given certain inputs + a lot of regex in that client?
>>
>>107343619
>How do you implement calls to APIs from chatbots?
You mean function calling?
>Instructions for the bot to talk to a dedicated client when given certain inputs + a lot of regex in that client?
If so, yes. Not sure if regex is the best solution, but sure.
>>
>>107343293
Hmm I may give it a go then

>>107343409
Book looks great, thanks! I'm definitely rusty on the math so that'll help a ton
>>
Looks like an official Noob/booru tune is coming. But as a result, might not be entirely uncensored like a community tune. But should still result in a great base to work with. We will be so back.
>>107343661
>>
>>107343731
How come diffusion gets fun new stuff and we only ever get sterile benchmaxxing?
>>
>>107343747
Weren't the GLM guys looking into ST logs? Or was that just speculation.
>>
>>107343755
Kill yourself, shill.
>>
File: failures.png (494 KB, 1787x1772)
>>107343247
Yeah, right now I'm struggling with some errors after trying to implement a tensor core MXFP4 matmul kernel for gpt-oss.
I want to implement LoRA too so I will have to deal with those fucking derivatives as well.
>>
>>107343755
Supposedly they mentioned character roleplaying as a focus on a spotify podcast. Everything else is speculation.
>>
>>107343645
>You mean function calling?
Yep

>If so, yes.
I see. Interesting. Now I'm wondering how input and output channels are managed.

>Not sure if regex is the best solution
Well, what if the bot hallucinates some nonexistent function? OTOH, yeah, regex would be kind of messy, maybe even risky in such a situation LMAO

RAG must be similar, right? You just ask the chatbot to call some function and read the output coming from that call or something?
>>
>>107343799
>Now I'm wondering how input and output channels are managed.
When a function call is made by the model, you typically interrupt generation, do the API call from your client, and push the result back to the model for it to do whatever it needs.
Regex is a mess. You probably want to search for a "trigger" (like a <tool_call> tag or whatever) and then start parsing whatever structure until you reach the end of the tool call.
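A rough sketch of that loop (the <tool_call>/<tool_response> tags and the JSON shape are made up; real models each have their own format, which the chat template normally handles):

import json
import re
import requests

TOOLS = {"get_weather": lambda city: f"22C and sunny in {city}"}  # stub tool

def run_turn(prompt):
    while True:
        out = requests.post("http://localhost:8080/completion", json={
            "prompt": prompt, "n_predict": 512,
        }).json()["content"]
        m = re.search(r"<tool_call>(.*?)</tool_call>", out, re.DOTALL)
        if not m:
            return out  # no tool call, this is the final answer
        call = json.loads(m.group(1))  # e.g. {"name": ..., "arguments": {...}}
        result = TOOLS[call["name"]](**call["arguments"])
        # push the result back and let the model continue from there
        prompt += out + "<tool_response>" + result + "</tool_response>"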
>RAG must be similar, right?
You can do that before sending anything to the model. Make an embedding of your prompt, fetch similar documents and append them to your prompt, send the whole thing at once. The way you suggest, the model asks for information. The way I suggest, you offer the information up front.
>>
>>107343799
with llama.cpp you can to some extent constrain generation using grammar rules (GBNF), although it doesn't work very well for most cases.
models are often post-trained to work better with some specific tool use format, and when you make the api call the chat template converts it to the right format to feed to each llm.
as for rag it's different: rag computes embeddings (basically vectors of numbers where strings with similar meanings end up close together), searches a database of stored embeddings for the texts closest to your query, and then feeds those documents to the model.
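the retrieval step is just nearest-neighbor search over those vectors, roughly like this sketch (the trigram-hash embed() is a toy stand-in so it runs on its own; a real setup calls an actual embedding model):

import numpy as np

def embed(text, dim=256):
    # toy stand-in for a real embedding model: hash character trigrams
    v = np.zeros(dim)
    for i in range(len(text) - 2):
        v[hash(text[i:i + 3]) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

docs = [
    "llama.cpp exposes an OpenAI-compatible HTTP server",
    "SDXL is a text-to-image diffusion model",
    "GGUF is the model file format used by llama.cpp",
]
doc_vecs = np.array([embed(d) for d in docs])

def retrieve(query, k=2):
    sims = doc_vecs @ embed(query)  # vectors are normalized, so dot = cosine
    return [docs[i] for i in np.argsort(-sims)[:k]]

question = "what file format does llama.cpp use?"
prompt = "Context:\n" + "\n".join(retrieve(question)) + "\n\nQuestion: " + question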
>>
File: file.png (2.37 MB, 1152x1152)
sirs let us be praying for new gemmy meodel
>>
>>107343747
>sterile benchmaxxing
but that's exactly what that model is
it will follow your prompt better,, and gen the same thing almost deterministically no matter the seed changes
>>
>>107343755
Speculating. But if you fire up GLM-4.6 local and use Mikupad with /completions, send the start of a SillyTavern prompt format, or send it a <|system|> with just the start of the ST prompt, you'll see it autocomplete ERP stuff, "exception to the rules" etc on its own.

I had GLM accidentally write me a complex jailbreak for itself (and it works on Z.AI API) when I was testing things lmao.
>>
>>107342253
hey anon ive been following your finetuning journey of gemma or whatever ur doing. but why the fuck do you care about what an api model says??
also are u the anon thats writing llm.c? are these two different anons?
>>
>>107343157
>https://huggingface.co/PrimeIntellect/INTELLECT-3
at least its open... can it suck my pemis bette than air :3?
>>
>>107344307
only one way to find out
*unzips dick*
>>
>>107343886
lord ganesh bless saar
>>
>>107344157
glm thinking it needs a complex jailbreak and won't turn into an eager cunny slut at the first 10 token excuse it finds in the system prompt. cute
>>
>>107336740
what model is this
>>
If 4b models were retarded then why z image uses it instead of 235b one?
Imagine the photorealistic girl you can get!
>>
>>107344727
The goyim can't have high powered imagegen
>>
>>107339997
>>107340018
I can't tell if you are underage or extremely autistic.
>>
>>107344831
The two are not mutually exclusive on this website; in fact the latter is nearly a requirement for posting here.
>>
>>107345142
Some boards are just maladjusted normalfags with no autistic traits.
>>
>>107343362
Original R1 writes better than Gemini 3
>>
File: 1742781200303220.png (1.34 MB, 1024x1024)
I think Z Image is SDXL tier but with improvements pretty much. The quantized version doesn't seem damaged like on Flux.
https://huggingface.co/jayn7/Z-Image-Turbo-GGUF
>>
>>107345878
It's 6B, why would you need to quantize it?
>>
>>107345888
my gpu doesn't support bf16 lol
>>
File: 1744385867786555.jpg (62 KB, 986x596)
>>107345878
>gguf
come on anon, don't be such a vramlet, use bf16 and offload a part of the model to the ram
https://github.com/pollockjj/ComfyUI-MultiGPU
>>
>>107344831
nobody under the age of 30 knows about 4chan dude, they're all on fuckin tiktok or some shit
>>
>>107345897
Literally in the node it lets you recast it. Comfy may even do it automatically.
>>
>>107345899
>offload a part of the model to the ram
imagine acting like one of those ledditors pretending you're going to use a model on a computer where genning takes 10+ minutes when in fact all you're doing is genning once or twice, see that it works, post about how amazing it is, then never coming back to the model again because normal people would not waste their time on this shit
>>
>>107345960
I mean if you want to use a model for 10 minutes and stop there you're not a real fine of diffusion models anyway
>>
>>107346004
>you're not a real fine of diffusion models anyway
if you are an actual user you want fast gens because nobody has time to sit 10 minutes in between inpaints sessions, controlnet img2imgs and various prompt experiments
>>
>>107345937
not true at all
i work with zoomers and my younger siblings are gen z so their friends tend to be typical zoomers
old 4chan is basically a mythical legend now, i have had them asking me if i was around when "ayy none sec" was active because their favorite content creator did a "documentary" about it
new 4chan is a thing of disgust because thats where "troons" that are into "tranime" hang out
no, of course they don't know what the first rule is and feel absolutely no shame at all talking like that irl
>>
>>107346024
>10 minutes
nigger this model is a small ass 6b model and you run it on 8 steps + cfg 1, I get one image in 9 seconds on my 3090
>>
>>107346037
>new 4chan is a thing of disgust because thats where "troons" that are into "tranime" hang out
Did it flip again? Is it cool to hate on queers if you're a zoomer?
>>
File: 20251127_204223(1).jpg (2.04 MB, 4000x3000)
Just got this
>>
File: 1750511685849550.png (424 KB, 1200x1200)
>>107346063
>Is it cool to hate on queers if you're a zoomer?
zoomers are queer what are you talking about?
>>
>>107346068
Congratulations.
What are you going to do with it?
>>
>>107346068
>only one
What are you going to run on it? llama 3?
>>
>>107346068
What gpu is this?
>>
>>107346075
Why would they talk about troons then?
>>
>>107345937
I'd post a poll but I doubt most people here would vote; still, I'm sure I'm not the only under-30 anon here.
>>
Been a while guys, what's the best uncensored model I can run with 64gb ram and a 4080?

I need something I can chat with without walking on eggshells. Also ERP, but that's very secondary.
>>
>>107346121
Nemo
>>
>>107345937
>>107346100
You are not.
>>
>>107346121
GLM air with a prefill or a system prompt.
Or Nemo.
>>
>>107346063
half of them are some alphabet soup queer or other mental illness and the other half larp as some over-exaggerated stereotype of "trad"ness
it's always one of two extremes with that generation, i assume an effect of internet addiction
>>
>>107346075
zoomers have also been surveyed admitting that they pretend to be significantly more pro-lgbt than they are to avoid harassment/persecution
queershit is a millennial religion
>>
>>107346132
>that is already feeling old
Same, I'm 4 years younger but I'm bald.
>>
>>107346165
>zoomers have also been surveyed admitting that they pretend to be significantly more pro-lgbt than they are to avoid harassment/persecution
everyone pretend lol
>>
>>107346100
whenever the threads aren't dead most posts tend to be full of your generation's brainrot oo-isms
people under 30 usually only come here from reddit/tiktok/discord/twitter when there's some elon or altman drama they want to shitpost about getting their accounts banned
>>
>>107345878
I've been playing around with it, it's good at photorealism but it's extremely overfit, you have to fight it really hard to stop it generating a generic Asian woman, will wait for some good distills before getting excited
>>
>>107345937
im 18
>>107346100
you're not..
>>
god bless belief poster
>>
Are the small qwen3 moe and gemma 3n the best models that can run decently fast on 8gb of vram for stuff like extracting information from plain text and populating JSONs with it?
>>
>>107344287
Yeah same guy
I wrote it using codex
I tried multiple times with GLM 4.6 but I couldn't do it
With codex I wrote the first working version from scratch in 24 hours
Pro is good at research with the built in web tools
If I use cloud models to build tools for local I think it's justifiable
Now I have some real life stuff to do so I might not work on it for at least a week
>>
>>107345937
lol 4chans nature makes it more young leaning so many old fucks being here is because the amount of young is much lower than what is told so they just make the mass due to their overwhelming numbers also the amount of whites is also much much lower than the official numbers and niggas dont have the patience for this shit mang yknow what im saying ? SHIETTTTT
>>107346132
>>107346166
>>107346328
21 here
>>
>>107344287
Also not calling it llm.c anymore because people are going to think it's a fork of karpathys thing
>>
File: file.png (1.02 MB, 1024x1024)
>Z-image
>Prompt: Blurry ugly bad
>Always defaults to a portrait of an asian woman
kek
>>
>>107346389
You could call it cLLM.
>>
>>107346357
>30b3a vs 6b2a
hmmmm
>>
>>107346409
>chinese model defaults to assuming chinese ethnicity unless otherwise specified
Shocking.
>>
File: 1764093610860376.png (349 KB, 686x386)
>>107346409
what abut neg prompt?
>>107346389
>>107346410
call it LLC
>>
>>107346409
Why does it even associate "Ugly" with women?
>>107346433
>what abut neg prompt?
Empty. I just took the example and moved it into a positive
>>
>>107346447
>Why does it even associate "Ugly" with women?
His prompt didn't specify a subject, only quality, so it defaults to the most common subject in its training data.
>>
>>107346418
>6b2a
E4B means effective 4B params, right?
It's obviously worse than the smaller qwen 3 moe, but it's still really good and crazy fast.
Given that response, I'll assume that these are indeed the best options.
I suppose I should try the regular 4B, both qwen3 and gemma 3.
>>
>>107346467
NTA but even if I specify white Caucasian woman it still sometimes generates Asian women, and if I add one prompt too much or something vaguely specific that I assume it only has Asian-women data for, it reverts back to Asian women no matter how much I specify white, Caucasian or European. It's simply overfit.

They released open weights for the base model though which is big of them, will wait for an autistic furry or something to make a distill or merge of it
>>
>>107346516
i think ur confused about active parameters. whole MoE model has to be loaded in memory, you dont get memory savings, you get speed savings
>>
>>107346525
>They released open weights for the base model
The base and edit models are still To be released.
>>
>>107346539
Are you talking about gemma 3n? It's not a moe dude.
>>
File: gemma-3n-parameters.png (241 KB, 2286x1134)
>>107346551
huh.. i guess you're right
>This model includes innovations in parameter-efficient processing, including Per-Layer Embedding (PLE) parameter caching and a MatFormer model architecture
>>
>>107346357
>>107346551
so uh where is the small qwen moe?
is anon referring to this? https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B
>>
>>107346612
Yeah. It is a sort of sparse architecture, in that it doesn't activate all parameters, but it's different from a MoE.
You can even yeet them PLE tensors to the CPU backend like you would a MoE's experts.
See gerg's comments:
>https://github.com/ggml-org/llama.cpp/pull/14400
It's a pretty dope arch. Wonder if the next gemma release will be based on that

>>107346622
30B A3B which runs pretty well on 8gb of VRAM with all experts on the CPU backend.
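the invocation is something like this (a sketch; the filename is whatever quant you grabbed, and recent builds also have a --cpu-moe shortcut, check --help):

llama-server -m Qwen3-30B-A3B-Q4_K_M.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU"

-ngl 99 keeps attention and the shared weights on the GPU while the -ot regex pins the per-expert FFN tensors to the CPU backend.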
>>
>>107346636
>Yeah. It is a sort of sparse archtecture, in that it doesn't activate all parameters, but it's different from a MoE.
thx
>>107346357
have you tried granite? 7b1a sounds like a nice size for 8gb
https://www.ibm.com/granite/docs/models/granite
prob retarded
>>
>>107346661
>7b1a sounds like a nice size for 8gb
>1a
Fuck it, I might as well give it a go.
>>
>https://windowsreport.com/openai-api-users-names-emails-more-exposed-in-massive-mixpanel-data-breach/
>OpenAI API Users’ Names, Emails, & More Exposed in Massive Mixpanel Data Breach
oooohh AHAHAHAHAHHAHAHA AHAHAHAHAHAHHAHAH AHAHAHAHAHAHAHAHA
>>
>>107346722
Oh no, now my work email might get more spam. The tragedy.
>>
>>107346722
Aren't there whole communities for people "dating" and fucking chatGPT?
Granted those people probably aren't using the API.
>>
>>107346722
Surely a certain story about a princess buying psychoactive cum from the monster cum store has already been deleted permanently.
>>
>>107346759
>>107346758
>The exposed data included:
>Names associated with OpenAI API accounts
>Email addresses
>Approximate location (city, state, country)
>Operating system and browser used
>Referring websites
>Organization or user IDs linked to accounts
No chat history, but the location, os, and browser info is going to be a goldmine for hackers.
>>
>>107346759
https://www.forbes.com/sites/thomasbrewster/2025/10/20/openai-ordered-to-unmask-writer-of-prompts/
I'm sorry..
>>
>>107346819
>the warrant reveals the government can ask OpenAI to provide information on anyone who enters specific prompts.
>reveals
Did anyone really ever think this wasn't possible?
>>
Benchmemes vs. company valuation
>>
File: file.png (57 KB, 750x270)
>>
>>107347158
card/sysprompt?
>>
>>107347049
>Apriel
>We are a small lab with big goals. While we are not GPU poor, our lab, in comparison has a tiny fraction of the compute available to other Frontier labs.
>GPU poor
Lmao. Funny seeing that sort of terminology.
>>
I'm downloading
>https://huggingface.co/shoumenchougou/RWKV7-G0a4-13.3B-GGUF/resolve/main/rwkv7-g0a4-13.3b-Q8_0.gguf
Wish me luck.
>>
>>107347049
Kind of misleading since Moonshot is backed by Alibaba.
>>
>>107347243
Good luck.
>>
>>107346068
Nice, are you planning to undervolt it with LACT? I got my Pro 6000 a couple of weeks ago but so far I've put that off because the card barely goes above 400W while supporting CPUMAXX inference anyway.
I definitely want to undervolt it before I get back into imgen/videogen though.
>>
>>107343273
so it 'works' just in the way you hope the LLM parses the meaning from it? Even if it's still reading it
>>
https://www.youtube.com/watch?v=vQ_NFqtGDgo
>>
File: file.png (9 KB, 657x28)
I give up on GLM. This is even worse than Nemo somehow.
>>
>>107347849
You're absolutely right-- Could you post aforementioned nemo logs.assistant
>>
DeepSeek's hybrid reasoning hurts the non-reasoning mode even though their benchmarks don't show it. A basic common sense test with greedy sampling that 4.5 bpw DeepSeek-V3-0324 passes is failed by 4.8 bpw DeepSeek-V3.1-Terminus when reasoning is disabled, and passed when it is enabled.

FWIW Qwen3-235B-A22B-Instruct-2507 @ 8.5 bpw fails. Qwen3-235B-A22B-Thinking @ 8.5 bpw passes. It seems easy for thinking models to get right: Qwen3-Next-80B-A3B-Thinking passes (and of course Instruct fails). The hybrid GLM-4.6 passes with reasoning enabled and fails with it disabled. LongCat-Flash-Chat @ 5.5 bpw first writes something wrong then contradicts what it said with a better response, which I grade a failure but you might consider an acceptable course correction. So far, other than DeepSeek-V3-0324, the only non-reasoning model I've tested that passed is ERNIE-4.5-300B-A47B @ 8.5 bpw, although I haven't tested any dense 70B non-reasoning models yet.
>>
File: breadandrecapsoup.jpg (161 KB, 1024x1024)
>>107347942
>>107347942
>>107347942
>>
>>107347957
That bread looks like mine.
>>
File: GAyIStation.png (570 KB, 3494x2537)
Is this a fair starter? Or too much money for not enough squeeze?
>>
>>107348598
dude anon i think you should relax, wait it out maybe
also u can get the rtx pro 6000 for way cheaper than 8250$, like 7500$
idk what to tell you about ram, its not a good time to be building a rig right now
maybe buy used ddr4 but high channel count
idk anon, rip
really bad time
>>
>>107348621
RAM pricing clearly isn't my issue, though. $500 for 128GB is a great deal, thanks to a Microcenter bundle. That shit is going for $1-1.3K on its own now. I want to quadruple the capacity at least eventually, once this settles down. This GPU ain't getting any cheaper, and it's the only thing I don't have lol.
>>
>>107348668
i wasnt going to post this, but im going to post this
https://www.alibaba.com/product-detail/Newest-RTX-5090-96gb-Graphics-Card_1601577163842.html
>>
>>107348677
wew thanks anon, this gives me a lot to consider. You are a kind soul.
>>
>>107348704
but please be careful, dont trust everything u see online anon
it could be fake for all i know anon, you're welcome. be well
>>
>>107348704
perhaps ask about it in /csg/
to me the specs tab seems sus but the retailer seems reputable
>>
>>107344707
It was either Evahene 70B 1.3 or Euryale 70B 2.1, probably Euryale.
>>
>>107349018
my heart is broken
nta but thanks
dam
All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.