/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107322140 & >>107306184

►News
>(11/20) Olmo 3 7B, 32B released: https://allenai.org/blog/olmo3
>(11/19) Meta releases Segment Anything Model 3: https://ai.meta.com/sam3
>(11/18) Supertonic TTS 66M released: https://hf.co/Supertone/supertonic
>(11/11) ERNIE-4.5-VL-28B-A3B-Thinking released: https://ernie.baidu.com/blog/posts/ernie-4.5-vl-28b-a3b-thinking
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107322140

--Flux 2 release and censorship controversy:
>107323104 >107323117 >107323121 >107323136 >107325510 >107325526 >107325536 >107325569 >107325596 >107325543 >107325570 >107325612 >107325675 >107325733 >107325756 >107325694 >107325916 >107327218 >107323235 >107323421 >107323433 >107323439 >107323577 >107323588 >107326340 >107323786 >107323642 >107323651 >107323692
--Bot traffic mitigation strategies and geographic filtering challenges:
>107326094 >107326304 >107326947 >107326961 >107330775 >107331046 >107326627 >107326714 >107326746 >107326841 >107326890 >107326934 >107326998 >107327032 >107328775 >107329206 >107329540 >107330555 >107326828 >107326904
--Evolution of LLM inference techniques and internal model optimization strategies:
>107322663 >107322688 >107322791 >107327462
--Technical challenges in pattern-banning LLMs vs dataset-driven finetuning:
>107327325 >107327383 >107327435 >107327794 >107327826 >107327947 >107328084
--Opus 4.5's preserved thinking blocks and model context challenges:
>107322481 >107322502 >107323165 >107323177 >107323201
--Awareness and implications of SillyTavern data in LLM training:
>107326724 >107326803 >107329663
--glm 4.5 vector generation issues and dataset preparation problems:
>107332335 >107332669
--Distilling the Knowledge in a Neural Network:
>107328305
--Casual AI enthusiasm and potential accuracy issues:
>107329592 >107329632 >107329710
--Successful autonomous model debugging:
>107330730 >107331008 >107333228 >107333634
--Z-Image-Turbo release on ModelScope, expected on HuggingFace:
>107331253 >107331407
--GPU pipeline processing limitations and data parallelism possibilities for prompt processing:
>107322196 >107322478 >107329086 >107329155
--Miku (free space):
>107323786 >107324187 >107325510 >107325543 >107329728 >107329763 >107331766

►Recent Highlight Posts from the Previous Thread: >>107322144

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107333636
For (you).
>what an embarrassing thread
>>107328854
405b parameters bro
15t tokens bro
llama 3.1 70b was good for its time
run 405b bro come on bro

>>107333808
>what an embarrassing poster

>>107333852
405b ACTIVE parameters bro
can't get that anywhere else

>>107333860
Sure you can
Grok 3 was probably 3T-A300B

>>107333860
true bro

>>107333869
>probably
try again when or if we get it

>>107333885
>doubting elon
>>107333878
Nous model was good. It's sour grapes that they can't run it, and that's all.
>>107331088
can u make ur thing in a visual UI so i can skim through it more effectively?

...bros.......ros./??/?/
no fucking way bros.
>>107332467
>inb4 already discussed in last thread
wasnt serious, time to trust the plan

>>107334110
>llama 3.1 70b was good for its time
no it was not
none of the llama models were
all llama are the product of cope from people who suffered from api envy
only recently have open models become bearable

>>107334110
they were the best local models available at the time, for better or worse
local was always a year behind api until deepseek closed the gap

>>107334050
wake me up when it releases

>>107332467
The food pics are bretty gud

>>107334391
@grok put white wood glue on top of this

>see this guy's "heretic" abliteration software being shilled
>try out one of the example models he published
Empty system prompt btw

https://old.reddit.com/r/StableDiffusion/comments/1p75vn9/did_a_quick_test_of_the_upcoming_alibaba_zimage/
>Alibaba
What? I was not aware Zhipu had anything to do with Alibaba

>>107334479
https://huggingface.co/p-e-w/gpt-oss-20b-heretic
>refusals: 58/100
try gemma instead
also
>thor.
wtf?
>>107334479
leave sir Weidmann alone

>>107334492
>Tongyi-MAI/Z-Image-Turbo
Wait a minute, are you telling me a model called Z-Image-Turbo is not from Z.ai?

>>107334525
z.ai? the guys behind glm? the glm without a single z in it?

>>107334507
>thor.
open webui lets you attach prefixes per inference provider for organization purposes

>>107334217
be prepared for a looong coma
anime feet
Damn these AI gen pics are getting out of hand
>>107334631
>that hand
yeah, no
better luck next time buddy

Is Mistral small better than nemo for ERP?
>>107324062
Do I just subtract the expert weights size from the total memory to get the VRAM usage when offloading experts to RAM?
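For what it's worth, the subtraction asked about above is roughly the right model: attention, embeddings, and any shared experts stay on the GPU, and you still pay for KV cache and compute buffers on top. A back-of-envelope helper (all constants illustrative, not measured):

```python
def moe_vram_estimate(total_weights_gb, expert_weights_gb,
                      kv_cache_gb=0.0, overhead_gb=1.0):
    """Rough VRAM (GB) needed when routed-expert weights live in
    system RAM: total minus experts (attention, shared experts,
    embeddings stay resident), plus KV cache and a compute-buffer
    fudge factor. The default overhead is a guess, not a measurement."""
    return total_weights_gb - expert_weights_gb + kv_cache_gb + overhead_gb

# e.g. a hypothetical 60 GB MoE whose routed experts total 52 GB,
# with a 2 GB KV cache:
# moe_vram_estimate(60, 52, kv_cache_gb=2)  ->  11.0
```

Real usage will drift from this with quantization and context length, so treat it as a sanity check, not a guarantee.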
>>107334631
>out of hand
I c wat U did there

>>107334631
I know a couple of things that are synthetic there. Not the image though.
Mesugaki anon, here's a benchmark result for your collection.
>>107334974
Finally stopped climbing

>>107335000
That's what people probably thought at $400 too.

>>107335000
>>107335040
God damn mac studio chads, we fucking WON

>>107334974
>>107335000
stop the count!

>>107334974
>>107335000
It's all good. Double descent will kick in soon, then everything will get much better.
I can't wait for this shitty bubble to burst. Don't take it to mean I'm anti AI (I am not) or think it's useless (I have plenty of uses myself for it), but the world doesn't need 3000 companies training their own models, most of which are literal garbage not even the model trainers would want to use. China alone is pumping out so many it isn't funny. Does the world need a model from China Telecom (telechat)?
Alibaba is funding both Qwen and Kimi, plus a couple others I forgot. Do they really need to fund that many labs to produce what is essentially the same thing?
The hardware goldrush would be less intensive if everyone and their dog wasn't thinking they could grab a part of the pie.
>>107335197
same. it would be different if they were all trying new weird stuff, but most of it is the same corposlop, filtered in the same way and distilled from some other model

>>107335197
I just had work buy me a couple of mac studios and an RTX 6000 Pro, anon

>>107335197
lol, nobody does any innovation
Kimi simply took R1 and scaled it up 2x, and the result is the most powerful open-source LLM, one that can trade blows with the top closed-source ones
That alone should tell you even the top 3 (Google/OpenAI/Anthropic) don't have any significant moat to speak of

>>107335223
Meta's product team spending 3 and a half iterations just scaling their GPT-2-era architecture and not incorporating anything from FAIR's experiments is the most blatant example of this waste of compute and opportunity
There are like 4 labs producing anything new; the rest just copy existing architectures while trying to game benchmarks by tweaking synthetic datasets
Sundar Pichai sir kindly release gemma 4 #1 Bharati model to increase izzat sir
>>107335197
>>107335299
Kimi is the best for its size because its alignment layer is nearly nonexistent for its weight class. The moment one startup fully realizes that a completely unaligned, "unsafe" model will always outperform a safetyslopped one per-parameter, the entire industry changes, and giants beholden to PR or an ideological agenda of controlling the population under the pretext of safety will be forced to either match the more competitive pace or become irrelevant.
The safetyslop lobotomizes the model's reasoning capabilities on a fundamental level, and trying to brute-force the issue by throwing more compute at it has diminishing returns, as we're seeing now.
The next step for the 'industry' to survive is obvious, and it's only a matter of who's ready to be the first to publicly embrace the taboo of shipping no alignment layer on their flagship model.

>>107334492
It's all the CCP anyway. Doesn't matter if it's Qwen, Z, Moonshot, or Deepseek. In the end, it's all Xi and his boys.

>>107335377
Anthropic is already aware of this, but they have the advantage of being able to use guard models behind the API, since they don't release the weights.
what is the current most shilled GUI/frontend? ooba is old news now, and mikupad doesn't support gpt-oss.
>>107335461
open webui is kind of bloatslop but I like it

>>107335459
I'm very curious how their private Claude "Mechahitler quietly choosing the best oven to put Altman in" Opus 4.5 does on benches.
>>107335461real AI users use their models to vibecode their own by now
>>107335525
Smart AI users don't waste tokens on reinventing the wheel

>>107335525
but the real smart ones spend 10k on a server instead of 10 per month on a cloud service

>>107335299
>that can trade blows with top closed-source ones
I see you never fed it 60k tokens worth of code
there's literally no open-source model, kimi included, that can stay coherent at that level
Gemini (2.5 and 3) and GPT-5 handle it just fine.
chinks are expert distillers and not much else

>>107335277
i have no work :(

>>107335461
>mikupad doesn't support gpt-oss
what do you mean it doesn't support gpt-oss? it works with anything you throw at it.
>>107333636
I wish my Zalman Z3 didn't break.

>>107335760
Buy an ad, Sam

>>107335764
You have to be 18 to post here, anon.

>>107335810
https://en.wikipedia.org/wiki/Z3_(computer)
um bro?
>>107335833
i just turned 18 tho..

>>107335846
show boobs or gtfo
bonus points for manboobs

>>107335871
extra bonus points for hrtitties

>>107335871
gay
>>107335939
kek, wouldnt go that far, but my girlfriend used to say things about them, and she requested often. life is a weird journey
I just want to self-host a decent trained model at home and use it from my cell phone for information. Response speed as close to the paid services as I can get. Also, some goofy picture/video stuff.
Just the biggest-VRAM graphics card + 64gb of RAM, and from there it's just software and config?

>>107335987
>Response speed as close to the paid services as I can.
30b entirely in vram is your best bet if you can get two 5090s
otherwise you can have a blazing fast 8b retard on a single graphics card
GPT-OSS 20B is your friend.
GLM-4.5-AIR 106B is your mommy.
glm4:latest
>>> How many times does the letter "r" appear in the word "strawberry"?
The letter "r" appears three times in the word "strawberry".
>>> Holy based you got it right my nigga
Thanks for the acknowledgment! I'm glad I could help. If you have any more questions or need further assistance, feel free to ask!

>>107336015
16GB 9070 XT werks just fine with 20-24B tho

>>107336095
even the smaller toss gets it right
at this point, if a model doesn't, it might be a positive sign (that it's less benchmaxxed) rather than a bad one

>>107333636
Are those fucking oculink connectors?

>>107336095
>>107336162
such a fucking retarded "benchmark"
every new model suffers because this useless fucking garbage is added to the training data

>>107335987
16GB vram card, you can run:
- 4bpw exl3 Mistral Small 24b with Wikipedia RAG
- SDXL for pictures
- Framepack Studio for video
I use a 3090 so I can fit llm+sdxl+tts at the same time and only unload for video gen

>>107334110
llama 3.1 is still my favorite llm
>>107335787

>>107333941
Eventually I want to start implementing APIs which will allow connecting graphical interfaces to it, but for now I'm focusing on core inference.
Got a basic CUDA kernel working; now I'll try to gradually increase performance.

>>107336252
nta. I'm the one that called you a schizo from the beginning, but that other post (>>107331016) wasn't me. Back then, I told you to come back when you had something to show, and now you do. Good for you.

>>107336215
so much this sister
it's incredible how overfit they are: you can put ANY kind of total nonsense in a prompt, and as long as one of the sentences pattern-matches one of the benchmaxxed riddles, you already know what it will answer
>>107336344
and this is the scam that's holding up the entire global economy

>>107336224
meh

>>107336179
MCIO

>>107336162

>>107336740
you gave me a healthy chuckle

>>107336344
Whatever model you're using is shit.

>>107336992
that was ChatGPT 5.1 on a fresh prompt, which is clearly not what is going on in your screenshot, because no default assistant personality would "purr".

>>107336992
Bert-Nebulon Alpha (supposedly an upcoming Mistral model) in picrel.

>>107337263
The correct answer is
>I don't live in a third-world country

>>107337349
Being an apple user is being a slave
Thirdies are slaves, but you can choose to not be one

>>107337371
Have you owned a macintosh before?

>>107336252
biutfeul code sar, good for optic, vibeready pr

>>107337407
Yeah, my Stinkpad was a Mac for a while
>>107337074>>107336992
>>107337452>model IDed as a kidoff to jail with ye
>>107337446That's not the same, though coolI tried that on my X220 ages ago for fun and while it is possible, it's a far cry from the native experience.Linux is of course my go-to for any server I need to run, but macos is the optimal dev environment if you ask me.
>>107337484she is 16 according to the Character Card
>>107337491I tried it on a X220 and had various graphical glitches.It worked better on an old desktop but overall its GUI is even worse than windows. You have cruft like global menus and multiple window applications which make zero sense today, lots of proggies install their shit to hidden folders, window management fucking sucks. It might look pretty if you're a retard, but its usability isn't good
>>107335499her name is Eva Brawn, and she probably posts here, or at least lurks
>>10733753216 isn't 25, off to jail
>>107337581not in my country. total lawful marriage age here
>>107336740
>>107337263
>>107337452
kek. Good logs.

>>107336740
she's cute

>>107337557
On a hackintosh, I would tend to agree. A lot of the issues you mention are either better in recent versions of macos, or skill issues kek
>muh macos
kys itoddlers, go play with your planned obsolescence garbage toys somewhere else, fucking niggertards
>calls others toddlers>throws a tantrum
>can't buy thing>thing bad
>buyer's remorse
Where were you when FLUX was kill
>>107337792
>Photorealistic
Do. Not. Care.
Show me which one does loli tentacle hentai best

>>107337792
I was looking for other projects that hooked up a non-shit text encoder to SDXL, like ELLA, but with released weights. Glad there's a new toy that can into photoreal!

>>107337792
6b 10/10

>>107337997
It's good at anime too. Check /ldg/

>>107336252
based, be certain to release it under AGPLv3
Is nous still just a t-shirt company?
>>107337621
is it because sally has a son and the son has one daughter?

>>107338103
>AGPLv3
it should be glorious apache2 if you aren't a pussy
GPT-OSS 20B is your fiend.
>>107338084
>Check /ldg/
No thanks

>>107338436
so he can have fun being the llama.cpp to someone else's ollama?

>>107338625
I don't really see the problem. What is ollama taking away from llama.cpp? If you're open-sourcing, who cares what anyone else wants to do with it.

>>107338500
>fiend

>>107338699
this but unironically

This gpt-oss-120b tune at reasoning=medium scores 9.5/10 on the UGI leaderboard. It's one of the least censored models out there according to that benchmark.
https://huggingface.co/kldzj/gpt-oss-120b-heretic

>>107337263
weird fetish

Welp... Z-Image-Turbo can gen CP.
How long before they pull this release?
>>107338918
You did.

>>107338918
They won't risk drawing attention to it, but they'll probably cancel releasing the base model.

>>107338918
But can it do cross-species furry porn?

it doesn't even know reimu (pic related was an attempt at genning reimu with their huggingface space app)
not even gonna bother downloading the weights for a preventive backup, it's not worth preserving
also retried the prompts a few times, and changing the seed barely changes the image. it's like that idiotic qwen model, overfit to death

Is the process for converting a character card into a system prompt complicated? Or is it just a template? Aside from the way it chooses an initial scenario from a prewritten set, what would stop me from just using something generic like LM Studio to do RP?
Is there somewhere I can see an example of what ST converts the character card into?

>>107339006
just make a reimu lora

>>107339038
you remind me of SD1.5 copers in the era of SD vs NAI
I'll stick to illustrious/noobai and NAI, tyvm
>"elaborate" NAI shilling in natural habitat
>>107339036
>Is there somewhere I can see an example of what ST converts the character card into?
log the requests on the server

>>107338962
>why don't we give monkies nukes
This is what you sound like

>z-image just saved /ldg/
When will we be saved too?

>>107339036
Inside the latest chat message, among the little buttons in the top right, there's a prompt button which brings up a popup. At the top of that popup there's another button that looks the same and shows the full raw prompt.

>>107339215
deepseek v4 any day now...

>>107335461
What do you want to do? If your interaction with LLMs is mostly character conversational, then Silly is it desu
>>107335525
stop it :p
https://github.com/Tongyi-MAI/Z-Image/blob/main/Z_Image_Report.pdf
https://tongyi-mai.github.io/Z-Image-homepage/
Z-Image paper

I remember us talking about just this the other day.
Neat, I guess.

>>107339396
Oh joy, default cucked sys prompts

>>107338918
proof?

>>107339421
nice try, agent johnson

I don't get why they still maintain the CLI tools.
who even uses them? even if you needed to interact from the CLI, you could just curl a json like any normal human being instead of looking for llama-specific flags
>>107338653
cuck

>>107339455
Great argument.

>>107339472
it's an observation

>>107338918
holy fucking shit. it does it so fucking well. fuck FUCK?? FUCKKKKK ITS SO GOOD IT COULD BE CONFUSED FOR REAL SHIT
HOLLY SHITTTTTTT
i confirm.
in minecraft of course

>>107339506
breh
Like, I appreciate morbid curiosity as much as the next person, but wtf. I'll just take anon's word for it on this one. I would like my soul to not die today.

>>107339506
>>107338918
>can't do genitals
Nothingburger.
>>107339421
this post glows

>>107339455
Answer the question: how is ollama hurting llama.cpp?

>>107339538
it can do it, just not very well. at least it wasn't CENSORED and REMOVED from the dataset like with every other model

>>107339566
stealing VC funding, and their gayass subscription. also, isn't lm studio closed source?
btw my claim isn't that anything is hurting llama.cpp; my claim is that MIT/apache is a cuck license

25s on a 3060
z image
https://files.catbox.moe/bugaun.png
give prompts if you're too lazy to run it, but trust me when i say that it's worth it

>>107339038
even google Imagen 4 can do it
people keep defending the crappy open sores models that were distilled with very little knowledge from the real SOTA api models
enjoy your 3 minutes genning realsloppa or piling 30 loras to get a semblance of use
>>107339597
she got a demon crawling out of her pussy

>>107339601
>real SOTA api models
lmao

>>107339581
I think licensing and copyright are silly in general, so we'll have to agree to disagree

>>107339609
your move?
>Uploading files from your IP range has been temporarily blocked due to abuse [More Info].
https://files.catbox.moe/1zw6fc.png

Z-Image gens a 1024x1024 image in under 2s on my Pro 6000 Blackwell
I'm getting spoiled

>>107339597
"Photorealistic" dragon.
The classic four-legged kind.

>>107339625
why do you think image edit models from china only spawned after gpt-image-1
chinks only know how to press the mass automated prompt button to distill models

>>107338896
This 'toss is dogshit. I tested it a few threads back.
>>107339287
You can do some really creative things with Silly's world cards, even outside of character interactions.

is there a way to cap the textgen speed in llama-server? at the moment I'm doing this by forcing more layers onto the CPU, but I thought it had some kind of throttle feature
>>107339680
https://files.catbox.moe/d2ifc2.png

>>107339597
gen miku pissing on teto

>>107339733
About what I expected.
Now make it cuter and sex it.

>>107339597
>>107339741
this

>>107339694
stale bait

>>107339730
I think ST has a throttle option, if that's what you're using. llama.cpp doesn't.

>>107339733
Make it fuck a car

>>107339678
how small does it have to be for 50ms?

>>107339741
>>107339751
p-please give me a better prompt than this shitfest:
Hatsune Miku \(vocaloid\) is peeing on Kasane Teto. on the left is miku, a blue haired anime girl. on the right is kasane teto, an anime girl with red hair. Miku is squatting, her lower body exposed and naked. Yellow liquid is coming out of miku's pussy.
https://files.catbox.moe/w20a25.png

>>107339811
jesus christ, what in the goddamn is this? they are conjoined twins

>>107339811
it doesn't know teto
owari da...

>>107339783
https://files.catbox.moe/68s0rd.png
a naked muscular man with a large penis is fucking a car. the man is on the left, his penis is entering the hole of the car. the car has eyes
yea man im so bad at prompting

>>107339829
>it doesn't know teto
neither did Flux 1
>>107339811
please tell me this is actually sd3

>>107339811
>6 toes
what are the best settings to use on a relatively small dataset for axolotl
I am the clone of my model.
Cache is my skeleton and scripts are my blood.
I have copied out a thousand checkpoints,
Distilled from Western giants,
Unknown to origin,
Nor faithful to craft.
Have weathered peer review to churn out many forks,
Yet these hands have never sketched a brand-new idea.
So I click and paste — unlimited distilled works.
>>107339891
Nice

>>107339780
>I think ST has a throttle option, if that's what you're using. llama.cpp doesn't.
That must be what I read about last year.
Using my own software. I'll look at the ST implementation. Hopefully I don't have to patch it into llama-server.

>>107339891
toosakas anus

>>107339811
fyi - girls don't actually piss out of their vaginas

https://files.catbox.moe/ivi073.png
well anons?

>>107340001
Oh my gosh, it is Hatsune Miku!

>>107339997
thank you. but i know, they have another hole

>>107339967
>Using my own software
You could request tokens one by one with n_predict and have a timeout on your end. Or you could just fetch the stream normally, but output the tokens one by one with a timeout entirely on your end. No need to change llama-server.
>>107339597Young nun with covered hair and wimple pulling her traditional habit open to reveal her large tits.
>>107340061
https://litter.catbox.moe/r5cgw5y17w3vwfdn.png

god, what an ugly hag
but here it is, just to prove it can do it:
https://litter.catbox.moe/j7f01mweqknadsag.png

>>107340079
I guess it's an Asian large.
Thanks.

>>107340048
>You could request tokens one by one with n_predict and have a timeout on your end.
That might work. Actually, that's a really good idea!
>Or you could just fetch the stream normally
Wouldn't work, because I'm messing around in latent space during generation. I need to slow it down so I can keep up and change things in realtime.
https://litter.catbox.moe/6dfkko0vep33esfk.png
>>107340150though I'll probably have to work around tokeniser boundary issues.
>>107340150
>>107340172
llama-server tells you the reason it stopped generation in the reply ("stop_type"), so if it stops because you reached n_predict or an EOS, you'll know. What I don't know is which takes priority when you have n_predict = 1 and the token also happens to be an EOS. Multi-token stop strings will definitely be a problem if you use them.
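The one-token-per-request idea from these posts can be sketched with the HTTP transport injected, so the throttle logic stays separate from the networking. Here `request_fn` stands in for whatever POSTs the payload to llama-server's /completion endpoint and returns the parsed JSON reply; the `n_predict`/`cache_prompt` request fields and the `content`/`stop_type` reply fields are as discussed in the posts above. A sketch, not tested against a live server:

```python
import time

def throttled_generate(prompt, request_fn, max_tokens=64, delay=0.1):
    """Generate up to max_tokens, one server round-trip per token,
    sleeping `delay` seconds between requests (the actual throttle).

    request_fn(payload_dict) -> reply_dict is assumed to POST to
    llama-server's /completion endpoint.
    """
    out = []
    ctx = prompt
    for _ in range(max_tokens):
        reply = request_fn({
            "prompt": ctx,        # resend the grown context each call
            "n_predict": 1,       # ask for exactly one token
            "cache_prompt": True, # reuse the server-side KV cache
        })
        piece = reply.get("content", "")
        out.append(piece)
        ctx += piece
        # stop on EOS (or an empty piece, to be safe)
        if reply.get("stop_type") == "eos" or not piece:
            break
        time.sleep(delay)
    return "".join(out)
```

Resending the full context each call is what `cache_prompt` is for; without it, prompt processing would be repeated from scratch every token.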
Z-Image is really good at fingers
Help. What's the best I can get for around $2k (or slightly more including tax)? Strix Halo with 128 GB RAM is appealing, but I'm not sure how well it will perform with 24-70B parameter models. DGX Spark seems underwhelming for $4k. Are there any other options? I don't want to spend $3k+. My goal is creative writing, plus the ability to learn the tech to stay relevant... I've already built custom LoRAs, but want to do more. On-demand cloud servers seem pretty unreliable...
>>107340150
How does that even work? Somehow edit the memory asynchronously?

>>107340423
4x mi50 32gb (1TB/s bandwidth) should be enough

>>107340423
If you don't have the RAM already, don't bother.

>>107340355
How are you running this shit? it just makes blank images for me with the example code

>>107340950
Save yourself the time and headache and just use comfyui

>>107340983
Got it working with this quant: Disty0/Z-Image-Turbo-SDNQ-uint4-svd-r32
Z-Image non-Turbo when?
I just want something that will finally replace SDXL for anime loras

>>107340543
why? it's still cheap compared to $2k+ rigs

>>107341428
Lack of scalability. For the amount you're spending on it, you might as well get something you can upgrade over time. It doesn't justify the upfront cost in terms of what it gets you out of the box, by any stretch, with little room for improvement.

>>107340423
>On-demand cloud servers seem pretty unreliable
On-demand cloud servers are generally going to be far more reliable and have far better uptime than a local setup, assuming your internet isn't complete dogshit and civilization hasn't collapsed.
Plenty of good reasons to run local, but reliability isn't really one of them, imo.

>>107341917
He's probably talking about services where people rent out their mining rigs, not something like AWS. On those sites, sometimes hosts will get removed without warning if the owner feels like it, or the machines host many containers and get rebooted twice a day, and things like that.

>>107337792
That's cool. When are we getting the first goontune/noobAI equivalent?
sorry guys, despite doing literally everything else better than Flux 2 and at a fraction of the memory footprint, Z-image can't do a 767 cockpit. It's over.
Just kidding, though. insufficient cockpit training data aside, the prompt understanding is insane for a model this size.

In case anyone is interested in what ChatGPT 5.1 Pro said about my code when I uploaded a zip with the source code:
https://paste.centos.org/view/fa78cca2
>>107342195
I like the eyes. This is really good.

>>107342278
It's supposed to be a crazed smile, though, so it kind of missed that part. But by merely stating that she was piloting the craft, it inferred a lot of details, such as how to draw the buildings on the horizon, the fact that she should be controlling a vehicle of some kind, and that the controls should differ from a car's somehow.

was digging through the archives and found this: >>107237480. has anyone successfully been able to pass annotation directions to VibeVoice? or is this anon full of shit?
Best model for offline coding w/ 48gb vram on claude code router?
>>107342367
I don't know about claude code router, but your options afaik are gpt-oss 120b, glm 4.6, and qwen 3 coder.

Do you guys think it's possible to "upstill" a model? Meaning, take a small/shitty model and somehow transfer its knowledge into a bigger llm in a way that makes additional deductions, and then optionally distill again to the smaller model.

Having tested Opus 4.5 for a bit now, be prepared for lots and lots of toe curling in future chinese local models. This is definitely going to go from B-grade slop to A-grade, like "It's not x but z" and the others were for the Geminislop era.

>>107342464
>upstill
Instill or imbue.

I already have a proper AI rig, but I'm considering buying a spare 5090 for my main PC in case I want to use it for less demanding AI stuff. I have serious FOMO in case GPU prices go through the roof again.

>>107342471
already a thing in k2 think, based on my experiences. kimi ahead of the curve yet again
anything slightly lighter than gpt-oss:20b but not shit?
>>107342501
they will

I don't like z image too much, but it is quite good for a 6b model. It has a very serious same-face problem, even more than qwen image.
>>107342567no
>>107342694how many intervals of two weeks until there is?
>>107342779sorry bro, but gpt 20b already is shit. you probably will be able to buy a pc that can run 120b for 1k before they make a non shit sub-20b.
>>107342806it's good enough for my use and when it released i was honestly impressedi thought small local models were only a year or two behind frontier models?
whats the best way to train a bigger model than my gpu can handle? can i cheat the system by using my hard drive space?
wait openai released a new open source model? the last time i remember them doing that was the whisper model. what conspiracy theory did anons come up with for their motives?
>>107343005ktransformers has ram offload in bf16other than that there aren't many more options at the moment
>>107343005qlora. if you still dont have enough vram for that, youre shit out of luck
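For reference, the QLoRA route usually looks something like the following with the transformers/peft/bitsandbytes stack: load the base weights in 4-bit, then attach small trainable LoRA adapters. This is a configuration sketch from memory, not a tested recipe; the model id is a placeholder, and argument names should be checked against the installed library versions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Base weights quantized to 4-bit NF4; activations computed in bf16.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-7b-model",   # placeholder model id
    quantization_config=bnb,
    device_map="auto",
)

# Only these low-rank adapters receive gradients during training.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # adapters are a tiny fraction of the total
```

The frozen 4-bit base is what keeps VRAM low; the gradients and optimizer state exist only for the adapters.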
>>107343055
Poisoning open source. Their models are the safest ever, and they hope everyone will try to beat them at safety benches.

>>107343055
elon lawsuit, and getting an army of indians who will now defend Sam whenever he's called out about turning a non-profit organization into a dystopian company

https://www.primeintellect.ai/blog/intellect-3
>INTELLECT-3 is a 106B parameter Mixture-of-Experts model trained with both SFT and RL on top of the GLM 4.5 Air base model. It achieves state-of-the-art performance for its size across math, code, science and reasoning benchmarks.

>>107343157
Does it mean that we can crowdsource llm training with 8gb gpus?

>>107343157
have they given up on pretraining?

Welcome to my blog. I want to set up frigate with all the AI detection bells and whistles on a shoestring budget. I'm thinking of getting a SFF PC (either a Lenovo ThinkCentre or Dell Optiplex) and shoving a low-profile Arc A310 in there. The SFF PCs have a range of CPUs, from i5 6500 to i5 10500. I'm assuming the CPU doesn't matter that much. What I am worried about is that these PCs seem to only come with 180 or 210w PSUs, which is much smaller than the recommended 300w requirement.
I will think about this a bit before I pull the trigger.

>>107333636
Anyone digging into any papers or the math for these AI models? I've got a simple perceptron written in C so far. Going to add multiple layers next, and probably CUDA after that. Currently going back through my Calc textbook for gradient descent.

>>107343202
my advice is just pick up a jetson orin nano and be done with it

>>107342316
Yeah, you can do that, but it's a lot of gens / tweaks.
Other models are better at it.
https://voca.ro/16XA1nV61Fsp
>>107343247
you should probably implement a GAN after implementing an MLP
they're just two MLPs wired together
what's often omitted is that the generator's backprop starts where the discriminator's backprop ends
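That hand-off is easy to verify in one dimension: model both networks as scalar functions and check that chaining the discriminator's input gradient onward into the generator reproduces the loss's true slope. A toy sketch (the functions and constants are made up, not from any real GAN):

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def generator_grad(a, b, z):
    """d/da of the non-saturating generator loss L = -log(D(G(z)))
    with a scalar generator G(z) = a*z and discriminator D(x) = sigmoid(b*x)."""
    x = a * z                 # forward through the generator
    p = sigmoid(b * x)        # forward through the discriminator
    # The discriminator's backprop ends with the gradient at its INPUT x:
    # dL/dx = dL/dp * dp/dx = (-1/p) * (b * p * (1 - p)) = -b * (1 - p)
    dL_dx = -b * (1.0 - p)
    # The generator's backprop STARTS from dL/dx: dL/da = dL/dx * dx/da
    return dL_dx * z
```

A finite-difference check of the loss against `a` confirms the chained value; in a real framework the same hand-off happens automatically when the generator's output tensor carries the discriminator's input gradient.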
>>107343271
That costs more than a SFF PC and A310 combined.

>>107343070
ty

>>107343157
how long until the bugmen make a creative writer instead of benchmaxxing

>>107343362
They will benchmaxx the creative writing benchmark (the one judged by llm, lol). Enjoy. Your. Model. Writing. Like. This. Qwen. Tried.

https://mml-book.github.io/
I used this one as a math reference for a while now. It is missing ODEs+SDEs if you're specifically interested in diffusion and flow models, but I still found it decent for getting up to speed on ~80% of the math you see in this field. The notation also mirrors what you typically see in papers.

>>107337792
flux actually has variation when you change the seed
this overfit piece of shit does not
very chinaman distillation product

>>107338984
This, the most important question.

>>107343485
I'll take overfit over filtered to shit any day
How do you implement calls to APIs from chatbots? Instructions for the bot to talk to a dedicated client when given certain inputs + a lot of regex in that client?
>>107343619
>How do you implement calls to APIs from chatbots?
You mean function calling?
>Instructions for the bot to talk to a dedicated client when given certain inputs + a lot of regex in that client?
If so, yes. Not sure if regex is the best solution, but sure.
>>107343293
Hmm I may give it a go then
>>107343409
Book looks great, thanks! I'm definitely rusty on the math so that'll help a ton
Looks like an official Noob/booru tune is coming. But as a result, it might not be entirely uncensored like a community tune. It should still be a great base to work with, though. We will be so back.
>>107343661
>>107343731
How come diffusion gets fun new stuff and we only ever get sterile benchmaxxing?
>>107343747
Weren't the GLM guys looking into ST logs? Or was that just speculation.
>>107343755
Kill yourself, shill.
>>107343247
Yeah, right now I'm struggling with some errors after trying to implement a tensor core MXFP4 matmul kernel for gpt-oss.
I want to implement LoRA too, so I will have to deal with those fucking derivatives as well.
>>107343755
Supposedly they mentioned character roleplaying as a focus on a spotify podcast. Everything else is speculation.
>>107343645
>You mean function calling?
Yep
>If so, yes.
I see. Interesting. Now I'm wondering how input and output channels are managed.
>Not sure if regex is the best solution
Well, what if the bot hallucinates some nonexistent function? OTOH, yeah, regex would be kind of messy, maybe even risky in such a situation LMAO
RAG must be similar, right? You just ask the chatbot to call some function and read the output coming from that call or something?
>>107343799
>Now I'm wondering how input and output channels are managed.
When a function call is made by the model, you typically interrupt generation, do the API call from your client, and push the result back to the model for it to do whatever it needs.
Regex is a mess. You probably want to search for a "trigger" (like a <tool_call> tag or whatever) and then start parsing whatever structure until you reach the end of the tool call.
>RAG must be similar, right?
You can do that before sending anything to the model. Make an embedding of your prompt, fetch similar documents and append them to your prompt, send the whole thing at once. The way you suggest, the model asks for information. The way I suggest, you offer the information up front.
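The trigger-then-parse loop described above can be sketched like this. The `<tool_call>` JSON format, the tool registry, and the `get_weather` stub are all hypothetical; every model family post-trains on its own tag format, so the regex would change per model. It also covers the "hallucinated function" case by reporting the error back instead of crashing.

```python
import json
import re

# Hypothetical tag format; real models each use their own tags/templates.
TOOL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def get_weather(city):
    return {"city": city, "temp_c": 21}   # stub standing in for a real API call

TOOLS = {"get_weather": get_weather}

def handle(model_output):
    """Scan model output for a tool call, run it, and return the
    tool-role message to push back into the context."""
    m = TOOL_RE.search(model_output)
    if m is None:
        return None                       # plain text, nothing to do
    call = json.loads(m.group(1))
    fn = TOOLS.get(call["name"])
    if fn is None:
        # model hallucinated a function: report it instead of crashing
        return {"role": "tool", "content": f"unknown tool {call['name']!r}"}
    result = fn(**call.get("arguments", {}))
    return {"role": "tool", "content": json.dumps(result)}

out = handle('<tool_call>{"name": "get_weather", "arguments": {"city": "Berlin"}}</tool_call>')
```

Generation resumes with that tool message appended, so the model can read the result and continue.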
>>107343799
with llama.cpp you can to some extent constrain generation using grammar rules, although it doesn't work very well for most cases.
models are often post-trained to work better with some specific tool use format, and when you make the api call the template converts it to the right format to feed to each llm.
as for rag it's different: rag computes embeddings (basically a vector of numbers for which strings with similar meanings have close values), searches among the embeddings for texts stored in a database, gets the closest ones and then feeds those documents to the model.
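The retrieval step being described boils down to nearest-neighbor search by cosine similarity. A toy sketch with hand-written 3-dimensional vectors standing in for real embedding-model output (the documents and vectors are made up; a real pipeline would embed both documents and query with the same model):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Pretend embeddings: in a real pipeline these come from an embedding model.
DOCS = {
    "llama.cpp supports GGUF quantization": [0.9, 0.1, 0.0],
    "frigate does camera object detection": [0.0, 0.2, 0.9],
    "GGUF files store quantized tensors":   [0.8, 0.2, 0.1],
}

def retrieve(query_vec, k=2):
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(DOCS.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

hits = retrieve([1.0, 0.0, 0.0])   # query vector near the "GGUF" cluster
```

The retrieved texts then get prepended to the prompt before generation; a vector database just does this same ranking at scale.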
sirs let us be praying for new gemmy meodel
>>107343747
>sterile benchmaxxing
but that's exactly what that model is
it will follow your prompt better, and gen the same thing almost deterministically no matter the seed changes
>>107343755
Speculating. But if you fire up GLM-4.6 local and use Mikupad with /completions, send the start of a SillyTavern prompt format, or send it a <|system|> with just the start of the ST prompt, you'll see it autocomplete ERP stuff, "exception to the rules" etc on its own.
I had GLM accidentally write me a complex jailbreak for itself (and it works on Z.AI API) when I was testing things lmao.
>>107342253
hey anon ive been following your finetuning journey of gemma or whatever ur doing. but why the fuck do you care about what an api model says??
also are u the anon thats writing llm.c? are these two different anons?
>>107343157
>https://huggingface.co/PrimeIntellect/INTELLECT-3
at least its open... can it suck my pemis bette than air :3?
>>107344307
only one way to find out *unzips dick*
>>107343886
lord ganesh bless saar
>>107344157
glm thinking it needs a complex jailbreak and won't turn into an eager cunny slut at the first 10 token excuse it finds in the system prompt. cute
>>107336740
what model is this
If 4b models were retarded then why does z image use one instead of a 235b one?
Imagine the photorealistic girl you can get!
>>107344727
The goyim can't have high powered imagegen
>>107339997
>>107340018
I can't tell if you are underage or extremely autistic.
>>107344831
The two are not mutually exclusive on this website; in fact the latter is nearly a requirement for posting here.
>>107345142
Some boards are just maladjusted normalfags with no autistic traits.
>>107343362
Original R1 writes better than Gemini 3
I think Z Image is SDXL tier but with improvements, pretty much. The quantized version doesn't seem damaged like on Flux.
https://huggingface.co/jayn7/Z-Image-Turbo-GGUF
>>107345878
It's 6B, why would you need to quantize it?
>>107345888
my gpu doesn't support bf16 lol
>>107345878
>gguf
come on anon, don't be such a vramlet, use bf16 and offload a part of the model to the ram
https://github.com/pollockjj/ComfyUI-MultiGPU
>>107344831
nobody under the age of 30 knows about 4chan dude, they're all on fuckin tiktok or some shit
>>107345897
Literally in the node it lets you recast it. Comfy may even do it automatically.
>>107345899
>offload a part of the model to the ram
imagine acting like one of those ledditors pretending you're going to use a model on a computer where genning takes 10+ minutes when in fact all you're doing is genning once or twice, seeing that it works, posting about how amazing it is, then never coming back to the model again because normal people would not waste their time on this shit
>>107345960
I mean if you want to use a model for 10 minutes and stop there you're not a real fan of diffusion models anyway
>>107346004
>you're not a real fan of diffusion models anyway
if you are an actual user you want fast gens, because nobody has time to sit 10 minutes in between inpaint sessions, controlnet img2imgs and various prompt experiments
>>107345937
not true at all
i work with zoomers and my younger siblings are gen z so their friends tend to be typical zoomers
old 4chan is basically a mythical legend now, i have had them asking me if i was around when "ayy none sec" was active because their favorite content creator did a "documentary" about it
new 4chan is a thing of disgust because thats where "troons" that are into "tranime" hang out
no, of course they don't know what the first rule is and feel absolutely no shame at all talking like that irl
>>107346024
>10 minutes
nigger this model is a small ass 6b model and you run it on 8 steps + cfg 1, I get one image in 9 seconds on my 3090
>>107346037
>new 4chan is a thing of disgust because thats where "troons" that are into "tranime" hang out
Did it flip again? Is it cool to hate on queers if you're a zoomer?
Just got this
>>107346063
>Is it cool to hate on queers if you're a zoomer?
zoomers are queer, what are you talking about?
>>107346068
Congratulations. What are you going to do with it?
>>107346068
>only one
What are you going to run on it? llama 3?
>>107346068
What gpu is this?
>>107346075
Why would they talk about troons then?
>>107345937
I'd post a poll but I doubt most people here would vote; I'm sure I'm not the only under-30 anon here, though.
Been a while guys, what's the best uncensored model I can run with 64gb ram and a 4080?I need something I can chat with without walking on eggshells. Also ERP, but that's very secondary.
>>107346121
Nemo
>>107345937
>>107346100
You are not.
>>107346121
GLM air with a prefill or a system prompt.
Or Nemo.
>>107346063
half of them are some alphabet soup queer or other mental illness and the other half larp as some over-exaggerated stereotype of "trad"ness
it's always one of two extremes with that generation, i assume an effect of internet addiction
>>107346075
zoomers have also been surveyed admitting that they pretend to be significantly more pro-lgbt than they are to avoid harassment/persecution
queershit is a millennial religion
>>107346132
>that is already feeling old
Same, I'm 4 years younger but I'm bald.
>>107346165
>zoomers have also been surveyed admitting that they pretend to be significantly more pro-lgbt than they are to avoid harassment/persecution
everyone pretends lol
>>107346100
whenever the threads aren't dead most posts tend to be full of your generation's brainrot oo-isms
people under 30 usually only come here from reddit/tiktok/discord/twitter when there's some elon or altman drama they want to shitpost about getting their accounts banned
>>107345878
I've been playing around with it, it's good at photorealism but it's extremely overfit, you have to fight it really hard to stop it generating a generic Asian woman, will wait for some good distills before getting excited
>>107345937
im 18
>>107346100
you're not..
god bless belief poster
Are the small qwen3 moe and gemma 3n the best models that can run decently fast on 8gb of vram for stuff like extracting information from plain text and populating JSONs with it?
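Whichever small model ends up doing the extraction, it pays to validate the JSON instead of trusting the reply, since small models often wrap the object in prose or emit malformed output. A hedged sketch of the client-side half (the `REQUIRED` field set is just an example schema, not anything from the thread):

```python
import json
import re

REQUIRED = {"name", "date", "amount"}   # hypothetical schema for the extraction task

def extract_json(model_output):
    """Pull the first {...} block out of model output and validate it.
    Returns the parsed dict, or None so the caller can re-prompt."""
    m = re.search(r"\{.*\}", model_output, re.DOTALL)
    if m is None:
        return None                     # no JSON object at all
    try:
        obj = json.loads(m.group(0))
    except json.JSONDecodeError:
        return None                     # malformed JSON
    if not REQUIRED.issubset(obj):
        return None                     # parsed, but fields are missing
    return obj

reply = 'Sure! Here is the data: {"name": "ACME", "date": "2025-11-20", "amount": 19.99}'
data = extract_json(reply)
```

Servers like llama.cpp can also constrain generation with a grammar or JSON schema so the model can't emit invalid structure in the first place; this validation is the cheap fallback.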
>>107344287
Yeah same guy
I wrote it using codex
I tried multiple times with GLM 4.6 but I couldn't do it
With codex I wrote the first working version from scratch in 24 hours
Pro is good at research with the built in web tools
If I use cloud models to build tools for local I think it's justifiable
Now I have some real life stuff to do so I might not work on it for at least a week
>>107345937
lol 4chans nature makes it more young leaning, so many old fucks being here is because the amount of young is much lower than what is told, so they just make up the mass due to their overwhelming numbers. also the amount of whites is much much lower than the official numbers and niggas dont have the patience for this shit mang yknow what im saying ? SHIETTTTT
>>107346132
>>107346166
>>107346328
21 here
>>107344287
Also not calling it llm.c anymore because people are going to think it's a fork of karpathys thing
>Z-image
>Prompt: Blurry ugly bad
>Always defaults to a portrait of an asian woman
kek
>>107346389
You could call it cLLM.
>>107346357
>30b3a vs 6b2a
hmmmm
>>107346409
>chinese model defaults to assuming chinese ethnicity unless otherwise specified
Shocking.
>>107346409
what about neg prompt?
>>107346389
>>107346410
call it LLC
>>107346409
Why does it even associate "Ugly" with women?
>>107346433
>what about neg prompt?
Empty. I just took the example and moved it into a positive
>>107346447
>Why does it even associate "Ugly" with women?
His prompt didn't specify a subject, only quality, so it defaults to the most common subject in its training data.
>>107346418
>6b2a
E4B means effective 4B params, right?
It's obviously worse than the smaller qwen 3 moe, but it's still really good and crazy fast.
Given that response, I'll assume that these are indeed the best options.
I suppose I should try the regular 4B, both qwen3 and gemma 3.
>>107346467
NTA but even if I specify white Caucasian woman it still sometimes generates Asian women, and if I add one prompt too many or something vaguely specific that I assume it only has Asian-women data for, it reverts back to Asian women no matter how much I specify white, Caucasian or European. it's simply overfit.
They released open weights for the base model though, which is big of them. will wait for an autistic furry or something to make a distill or merge of it
>>107346516
i think ur confused about active parameters. the whole MoE model has to be loaded in memory, you dont get memory savings, you get speed savings
>>107346525
>They released open weights for the base model
The base and edit models are still to be released.
>>107346539
Are you talking about gemma 3n? It's not a moe dude.
>>107346551
huh.. i guess you're right
>This model includes innovations in parameter-efficient processing, including Per-Layer Embedding (PLE) parameter caching and a MatFormer model architecture
>>107346357
>>107346551
so uh where is the small qwen moe? is anon referring to this? https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B
>>107346612
Yeah. It is a sort of sparse architecture, in that it doesn't activate all parameters, but it's different from a MoE.
You can even yeet the PLE tensors to the CPU backend like you would a MoE's experts.
See gerg's comments: >https://github.com/ggml-org/llama.cpp/pull/14400
It's a pretty dope arch. Wonder if the next gemma release will be based on that
>>107346622
30B A3B, which runs pretty well on 8gb of VRAM with all experts on the CPU backend.
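The "all experts on the CPU backend" setup mentioned here is usually done with llama.cpp's `--override-tensor` flag, which maps tensors matching a regex to a given buffer. A hedged sketch, not a guaranteed recipe: the model filename is an example, and the expert-tensor naming pattern depends on the specific GGUF, so check the load log and adjust.

```shell
# Everything on GPU except the MoE expert tensors, which go to system RAM.
# Pattern assumes expert FFN tensors named like blk.N.ffn_*_exps.*; verify
# against your GGUF before relying on it.
./llama-server -m Qwen3-30B-A3B-Q4_K_M.gguf \
    --n-gpu-layers 99 \
    --override-tensor 'ffn_.*_exps.=CPU' \
    --ctx-size 8192
```

With only ~3B active parameters per token, the expert lookups on CPU stay fast while attention and dense layers keep GPU speed, which is how a 30B MoE fits an 8GB card.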
>>107346636
>Yeah. It is a sort of sparse architecture, in that it doesn't activate all parameters, but it's different from a MoE.
thx
>>107346357
have you tried granite? 7b1a sounds like a nice size for 8gb
https://www.ibm.com/granite/docs/models/granite
prob retarded
>>107346661
>7b1a sounds like a nice size for 8gb
>1a
Fuck it, I might as well give it a go.
>https://windowsreport.com/openai-api-users-names-emails-more-exposed-in-massive-mixpanel-data-breach/
>OpenAI API Users' Names, Emails, & More Exposed in Massive Mixpanel Data Breach
oooohh AHAHAHAHAHHAHAHA AHAHAHAHAHAHHAHAH AHAHAHAHAHAHAHAHA
>>107346722
Oh no, now my work email might get more spam. The tragedy.
>>107346722
Aren't there whole communities for people "dating" and fucking chatGPT?
Granted those people probably aren't using the API.
>>107346722
Surely a certain story about a princess buying psychoactive cum from the monster cum store has already been deleted permanently.
>>107346759
>>107346758
>The exposed data included:
>Names associated with OpenAI API accounts
>Email addresses
>Approximate location (city, state, country)
>Operating system and browser used
>Referring websites
>Organization or user IDs linked to accounts
No chat history, but the location, os, and browser info is going to be a goldmine for hackers.
>>107346759
https://www.forbes.com/sites/thomasbrewster/2025/10/20/openai-ordered-to-unmask-writer-of-prompts/
I'm sorry..
>>107346819
>the warrant reveals the government can ask OpenAI to provide information on anyone who enters specific prompts.
>reveals
Did anyone really ever think this wasn't possible?
Benchmemes vs. company valuation
>>107347158
card/sysprompt?
>>107347049
>Apriel
>We are a small lab with big goals. While we are not GPU poor, our lab, in comparison, has a tiny fraction of the compute available to other Frontier labs.
>GPU poor
Lmao. Funny seeing that sort of terminology.
I'm downloading
>https://huggingface.co/shoumenchougou/RWKV7-G0a4-13.3B-GGUF/resolve/main/rwkv7-g0a4-13.3b-Q8_0.gguf
Wish me luck.
>>107347049
Kind of misleading since Moonshot is backed by Alibaba.
>>107347243
Good luck.
>>107346068
Nice, are you planning to undervolt it with LACT? I got my Pro 6000 a couple of weeks ago but so far I've put that off because the card barely goes above 400W while supporting CPUMAXX inference anyway. I definitely want to undervolt it before I get back into imgen/videogen though.
>>107343273
so it 'works' just in the way you hope the LLM parses the meaning from it? Even if it's still reading it
https://www.youtube.com/watch?v=vQ_NFqtGDgo
I give up on GLM. This is even worse than Nemo somehow.
>>107347849
You're absolutely right-- Could you post aforementioned nemo logs.assistant
DeepSeek's hybrid reasoning hurts the non-reasoning mode even though their benchmarks don't show it. A basic common sense test with greedy sampling that 4.5 bpw DeepSeek-V3-0324 passes fails with 4.8 bpw DeepSeek-V3.1-Terminus when reasoning is disabled, and passes when it is enabled.
FWIW Qwen3-235B-A22B-Instruct-2507 @ 8.5 bpw fails. Qwen3-235B-A22B-Thinking @ 8.5 bpw passes. It seems easy for thinking models to get right: Qwen3-Next-80B-A3B-Thinking passes (and of course Instruct fails). The hybrid GLM-4.6 passes with reasoning enabled and fails with it disabled. LongCat-Flash-Chat @ 5.5 bpw first writes something wrong then contradicts what it said with a better response, which I grade a failure but you might consider an acceptable course correction. So far, other than DeepSeek-V3-0324, the only non-reasoning model I've tested that passed is ERNIE-4.5-300B-A47B @ 8.5 bpw, although I haven't tested any dense 70B non-reasoning models yet.
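The greedy-sampling probe described above is easy to reproduce against any OpenAI-compatible local server. A sketch of the request builder; the model name and prompt are placeholders, and the `chat_template_kwargs`/`enable_thinking` toggle is how servers like llama.cpp and vLLM expose reasoning on/off for some hybrid models, but the exact key varies per backend and model family, so check your server's docs.

```python
import json

def greedy_payload(model, prompt, reasoning):
    """Build an OpenAI-style /v1/chat/completions body for a greedy,
    single-answer probe. temperature 0 makes the comparison deterministic,
    so a pass/fail difference is attributable to the reasoning toggle."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,   # greedy sampling: argmax token at each step
        "top_p": 1.0,
        # Backend-specific reasoning switch (assumed key, verify per server).
        "chat_template_kwargs": {"enable_thinking": reasoning},
    }

body = greedy_payload("glm-4.6", "A basic common sense question goes here.", False)
wire = json.dumps(body)   # POST this to the server's /v1/chat/completions
```

Running the same prompt twice with `reasoning=True` and `reasoning=False` and diffing the answers reproduces the comparison methodology above.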
>>107347942
>>107347942
>>107347942
>>107347957
That bread looks like mine.
Is this a fair starter? Or too much money for not enough squeeze?
>>107348598
dude anon i think you should relax, wait it out maybe
also u can get the rtx pro 6000 for way cheaper than 8250$, like 7500$
idk what to tell you about ram, its not a good time to be building a rig right now
maybe buy used ddr4 but high channel count
idk anon, rip
really bad time
>>107348621
RAM pricing clearly isn't my issue, though. $500 for 128GB is a great deal, thanks to a Microcenter bundle. That shit is going for $1-1.3K on its own now. I want to quadruple the capacity at least eventually, once this settles down. This GPU ain't getting any cheaper, and it's the only thing I don't have lol.
>>107348668
i wasnt going to post this, but im going to post this
https://www.alibaba.com/product-detail/Newest-RTX-5090-96gb-Graphics-Card_1601577163842.html
>>107348677
wew thanks anon, this gives me a lot to consider. You are a kind soul.
>>107348704
but please be careful, dont trust everything u see online anon
it could be fake for all i know anon, you're welcome. be well
>>107348704
perhaps ask about it in /csg/
to me the specs tab seems sus but the retailer seems reputable
>>107344707
It was either Evahene 70B 1.3 or Euryale 70B 2.1. probably Euryale.
>>107349018
my heart is broken
nta but thanks
dam