/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107278838 & >>107266608

►News
>(11/20) Olmo 3 7B, 32B released: https://allenai.org/blog/olmo3
>(11/19) Meta releases Segment Anything Model 3: https://ai.meta.com/sam3
>(11/11) ERNIE-4.5-VL-28B-A3B-Thinking released: https://ernie.baidu.com/blog/posts/ernie-4.5-vl-28b-a3b-thinking
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107278838

--Custom multi-GPU server project with 320GB VRAM and hardware optimization challenges:
>107283400 >107284563 >107284600 >107287765 >107287917 >107290005 >107291128
--Academic project on distilling coding model data via multi-source finetuning:
>107285889 >107286057 >107285965
--Evaluating Gemma 3 27b's translation strengths and alternatives for summarization/context challenges:
>107287000 >107287067 >107289939
--Public and private funding dynamics in LLM development:
>107283950 >107283959 >107284120 >107284019 >107284074 >107284116
--Critique of Olmo 3's training data and multilingual performance:
>107283219 >107283784 >107283826 >107283899
--Axolotl finetuning troubleshooting and dataset creation challenges:
>107289153 >107289345 >107289353 >107289384 >107289388 >107289420 >107289434 >107289478 >107289485
--AI model performance test with upside-down character card:
>107289586 >107291881
--Vision model viability for context-aware wake-word detection:
>107280477 >107280511 >107281879
--Censorship status speculation for HunyuanVideo-1.5:
>107281119 >107281150
--Z.ai's 30B parameter model announcement:
>107290479 >107290521 >107290563
--Merged PR enhances llama.cpp web UI with "Continue" Action:
>107287365
--Meta vs. Gemini 3: Corporate missteps and market dynamics:
>107286569 >107286617 >107286692 >107286700 >107286710 >107286773 >107286832 >107286888 >107286889 >107286846 >107286944 >107286977 >107286993 >107288073 >107288085 >107288224 >107288093 >107289444
--Suspicions of AI benchmark manipulation or hallucination:
>107286860 >107286899 >107286951
--Praise for K2 model and chaotic humor exchange:
>107287071 >107287880 >107287321 >107287564 >107290452
--Luka and Miku (free space):
>107278944 >107279087 >107279948 >107280595 >107282117 >107284081 >107286185 >107286253 >107292643

►Recent Highlight Posts from the Previous Thread: >>107278842

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107292886
>>107292898
>>107292917
Help is available
Speak with someone today
National Domestic Abuse Hotline
Languages: English, Spanish and 200+ through interpretation service
Hours: 24/7
Call 800-799-7333
>>107291488
times like these i wish i didn't have my emails already banned everywhere... alas, such is the fate of the gods' silliest clowns
>>107292917
gpt-oss got bitch-broken
https://huggingface.co/kldzj/gpt-oss-120b-heretic
https://huggingface.co/p-e-w/gpt-oss-20b-heretic
>>107293073
There's already this:
>https://huggingface.co/Jinx-org/Jinx-gpt-oss-20b-GGUF
And it was supposed to be pretty intelligent. That doesn't make up for the fact that GPT-ass just doesn't have the necessary training material. They did a lot more than just censor the model. Not talking about chronic masturbation here, but in general.
>>107293091
Saar, that's a 20b. It's going to be ass. At least give me another 120b variant of it.
>>107293120
It has nothing to do with the amount of parameters. They culled copyrighted material from its training data, or filtered it out afterwards. I have tested this model in the past, and Gemma 3 12B is more pleasant than this turd.
For example, it lacks common knowledge about Forgotten Realms. Mistral and Gemma 3 can fill out locations with data they already know, but GPT-OSS invents things because it is not allowed to use copyrighted material.
It's not worth the disk space.
>>107293169
To add: maybe GPT-OSS is great for creating tepid corporate emails in a neutral tone.
>>107293120I am from Bangladesh.
>>107293194
A few months ago I used it for generating large amounts of synthetic data for training tests where NSFW content wasn't a priority. Very sloppy and formulaic, but coherent and very fast for that purpose.
>>107293326
It's coherent because it doesn't have anything else. Like corporate clip art.
I don't know if it's possible to analyze the model and detect whether they are using baked-in loras or something else. I guess the easiest explanation is that they trained it on a very restricted dataset and that's all there is to it.
>>107293375
The way it completely breaks down without a chat template, unlike other models, makes it seem like the dataset was 100% down to pretraining.
Mother fucker, Booz Allen stole my idea
https://www.boozallen.com/expertise/products/vellox-reverser.html
>>107293399
The model was created to be some form of office assistant and that's all. They constricted the training set and went on.
It would be very interesting to see what sort of stuff Google and OpenAI have. If you poke around Gemma, it will tell you it has been trained on forum posts that disappeared years ago.
>>107293463Booz Allen? I prefer GG Allin products.
>>107293469I meant to say 100% synthetic
>>107293481Maybe so. They have petabytes of data...More valuable than any model including chatpajeet.
>>107293463
Weird, when did they drop Hamilton?
>>107293073can it casually do rape and other stuff like glm?
https://github.com/ggml-org/llama.cpp/issues/14702#issuecomment-3506645678
>Maybe we can freeze and deprecate /v1/chat/completions and drop support in the future (say, 6 months from now).
>Any long-term plans? @ggerganov
>3 weeks ago
The silence is deafening. Even the vibecoders are starting to pitch in thoughts.
>>107293762Buy an ad.
>>107286889
We do create the data that ScaleAI is selling. It's just that, outside of some cases, we never start from a blank text field: a crappy LLM writes a prompt or a response, and we have to fix it based on some rules (which are usually unclear). However, some workers from certain countries are known to cheat by using ChatGPT/Gemini/Claude to fix the data point or to review the quality of the data, so they can produce more and earn more. Usually, they try to pass as Americans or Europeans to be paid more. This is partly why ScaleAI's datasets are crap. I think I saw two major ban waves among them already.
Facebook was a fluke. Zuck has been going from one failure to the next since then. He can't figure out what people like and dislike, he can't find a single good idea (has he invented anything new since Facebook?), and he can't distinguish honest and competent people from grifters, psychos and incompetents (hence he hired Wang).
>>107292917
Are you French? I think some retard with a French IP is spamming CP, so we get range banned.
>>107293477
I had the opportunity to see GG Allin's brother recently at some tiny bar. Murder Junkies, I think? I didn't go, because the only point would have been to say "I saw GG Allin's brother", and the only sort of people to whom that means anything would probably call me a nazi or rapist or something.
But now I kind of regret it, cuz I could have seen GG Allin's brother.
>>107293817
>Facebook was a fluke.
Facebook as an idea was stolen by that filthy jew and was promoted over MySpace only because jews help other jews. Zucc has never been able to create or grow anything by himself organically.
>>107293817I'm sure there is a lot of data to grab from ancient MMOs, in-app chats and other mediums not easily accessible
>>107293914I somehow doubt that millions of lines of "Trimming mithril armor for free!" would make for good training data.
>>107293975could probably do a little deduplication on it.
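A naive cleanup pass over that kind of chat log could look like this. Toy sketch only; the function name, thresholds, and sample lines are all made up for illustration:

```python
from collections import Counter

def dedup_chat_lines(lines, max_repeats=2, min_len=20):
    """Naive cleanup pass for scraped chat logs: drop very short lines,
    collapse exact duplicates, and throw away anything spammed more
    often than max_repeats (e.g. trade-chat macros)."""
    counts = Counter(line.strip() for line in lines)
    seen = set()
    kept = []
    for line in lines:
        line = line.strip()
        if len(line) < min_len:          # "lol", "ty", etc.
            continue
        if counts[line] > max_repeats:   # spam macros
            continue
        if line in seen:                 # exact duplicate
            continue
        seen.add(line)
        kept.append(line)
    return kept

chat = [
    "Trimming mithril armor for free!",
    "Trimming mithril armor for free!",
    "Trimming mithril armor for free!",
    "Anyone know where the fairy ring code list is kept these days?",
    "ty",
    "Anyone know where the fairy ring code list is kept these days?",
]
print(dedup_chat_lines(chat))
# the spam macro and the short line are gone; the informative line is kept once
```

Real pipelines would go further (fuzzy near-duplicate detection, per-user rate filtering), but even this exact-match pass kills most trade-chat spam.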
>>107293872
This is fantastic - I'm in the EU and I had a chance to join a party with Kerry King, but I was cockblocked by my friend's gf. 20 years ago.
These things happen. Sometimes it could be cool, but would it really matter? I don't know.
GG was an artist. I don't know if I'd have wanted to see his gigs.
>https://github.com/Artoriuz/ArtCNN
>a brazilian guy with a 9070 xt managed to make an mpv shader with 4m parameters by himself
>it actually looks pretty good
Imagine if companies invested more in AI image scaling? We would probably have 1080p>2160p hallucinations that are lossless to the human eye by now.
>>107293872Not sure if he has a brother but anyway.
>>107293914
There's a lot of data from the web that was only barely scraped by CommonCrawl, and it's likely that Google already has it to a large extent, since they had to index web content in depth (until some time ago Google provided cached versions of almost every web page). Google also has easy access to the entirety of Usenet, to every book that was uploaded to Google Books, to past or present Blogger blogs (of those remaining, at least), and if they're desperate they might also end up using Gmail data and much more. I don't think they're data-starved; if anything, they're probably still only using a fraction of what they have.
>>107294083
>Imagine if companies invested more in AI image scaling?
This is the only thing left to academics. Super resolution, denoising, inpainting... things like that. Those are things you can do alone with a tiny budget.
>>107294104
>of every book that was uploaded on Google Books, of past or present (of those remaining, at least)
They also scanned millions of books. It got killed by a retarded judge, but perhaps they still do it in the shadows.
https://www.edsurge.com/news/2017-08-10-what-happened-to-google-s-effort-to-scan-millions-of-university-library-books
>>107294083
Don't worry anon, I'll make sure to pop the AI bubble so we can go back to this.
>>107293073Does anyone know how to convert it to GGUF without quantizing it again?
>>107294784doesn't the script just copy the tensors if you use the same data type?
>>107293777cope and seethe
>>107294829
It seems so. I was confused because the original GGUFs said mxfp4 while there's no such option in --outtype.
>>107292886>>107292892omg it defucker!
What's the sota Qwen model that's locally viable? How censored is it?
>>107293073120b model btw.
>>107294757
>literally who zoomer grifter thinks he has a point
I don't think so.
>>107295225lmao
>>107295366¬ ‿ ¬
>>107295386>high school dropoutthanks for confirming my point
>>107295409I'm sorry if you peaked in high school and believe it amounts to anything in life
>>107295444
he's a dropout and managed to get a job at OpenAI while I have 2 engineering diplomas and I'm still searching for work, life is so unfair man :(
>>107295444
that's the power of grift, just whore yourself on social media to get some big names' attention. I know a lot of guys like that. They could give the jeets a run for their money.
>>107295444I'm really curious how a high school dropout is qualified to be a researcher when, except for Indians, those positions tend to be Silicon Valley PhDs only.
>>107295519
Same
He shouldn't be able to get past HR in any larger company due to getting automatically filtered.
>>107295519>>107295386no wonder OpenAI is stagnating, they hire fucking dropouts to make their new models holy shit
>>107295506
Grifting has its limits. I could see it getting someone into an executive or low-skill infrastructure scripting position, but there's no chance someone who has never even taken a calculus class is going to be a productive researcher.
>>107295558
I assume it must be title inflation.
https://github.com/gabrielpetersson
He did marketing, hyped some webshit he made, and got hired as a regular software engineer at Midjourney. He is likely just doing regular development and not actual research.
>>107295558>>107295568That's a nepobaby, you don't land a position like this with 0 diploma lmao
>>107295589
>Grifting has its limits.
>14-16 yrs old: bought and sold pokemon cards for 20k$+ with very high margins
>>107295636he grifted his way out of his mom's womb kek
>>107295636Okay creating a minecraft is pretty cool I like him now
>>107295659
>Okay creating a minecraft is pretty cool
OPENAI, HIRE THIS MAN
https://www.youtube.com/watch?v=C1Y_d_Lhp60
>>107295558>>107295444credentialist seethe
>>107295699t. grifter
>>107295568
and everyone else is hiring jeets
no wonder all anyone does is try to make models bigger
>>107283894
Managed to run the Qwen 30B model with 1 million context to process this file. The processing time was around 30 minutes, since moby.txt is around 900k tokens.
So far I have successfully run the 235B model with 128k context; I could probably fit around 175k into VRAM (there's around 10% free VRAM on each card) with a precise model launch command for optimal chatting.
>>107295699If OpenAI had anime girl branding and related projects every anon here would be simping for them, pay them no attention
>>107295741adding an anime girl doesn't make everything better boomer
>>107295741
TRUE
that's why every smart anon here should go work at Spellbrush and not JeetAI
Anyone can take the exam to get hired here:
https://spellbrush.com/exam
>>107295766looks cool
>>107295636
At his age, I was learning how to play the guitar and how to make CGI images and animations on my own. I was also farming like a retard in Silkroad Online (or maybe it was WoW at that time, can't remember).
I did earn a lot of Pokemon and Yu-Gi-Oh! cards when I was in primary school, but never sold them (I still have them).
He's impressive.
>>107295800>He's impressive.Idiots like you are the reason grifting works.
>>107295817this
>>107295737
>moby.txt
Asking for a summary of a known work is kind of pointless.
>>107295766
>>107295908
>>107295766
>https://jobs.mchire.com
Harsh.
What does the vetted /lmg/ panel think? Am I hireable?
>>107295766Rough.
>>107295908>>107295962>>107296042
>>107295874ye
>>107295766
At some point I used to watch almost everything that came out. Then, as anime runs got shorter and shorter, never concluded, and became all samey (to me at least), I eventually stopped.
>>107296062
>>107296062got this result too, feelsbadman
>>107296127What a garbage list. They included Yuri on Ice but not Texhnolyze, for example. Wouldn't want to work with homos that watch modern generic shonen, SoL, and isekai anyway.
>>107295766There’s a bunch of shows I watched that weren’t on the list.
>>107296213it's also missing the anime of best girl
eta until non-safetyslopped Qwen VL model?
>>107296213There's some good stuff on there like GiTS, perfect blue and gantz but yeah it's chock full of Reddit bait shonen shit
>>107296213it doesn’t even have monogatari so no wonder
>>107296522they just wouldn't get it (neither do i, but i like it)
>>107296522Sorry, hiring someone that has watched or enjoyed Monogatari counts as an HR violation
>>107296569local model?
>>107296621https://trace.moe/?auto&url=https%3A%2F%2Fi.4cdn.org%2Fg%2F1763849305833297.jpg
Guys I figured out how to sneak gibberish past the text encoder on Suno V5. https://suno.com/s/iynrEzg5x8SY1hpq
>>107295766I've got the most stamps.
>>107296726*ahem*
kek
>>107295766>titles in romajiMeme.
>>107296812
>k2
>the stories are so good
lmfao yeah for sure
>>107296812As subtle as political comic strips. And just as funny.
>>107296812What's the punch line?
>>107296832>>107296843
>>107296863Who did he kill? Why should critics beware if nothing happened to them? This isn't funny, Dave.
>>107296876I asked it to turn them into ashes but it didn't work :(
>>107296863Someone who never shows up in any other frame got cremated.
>>107296332chaika a cute
>>107296812>>107296863big fan of subtle ai lol comics.... only the intelligent will get this one
>>107296885that's the punchline
>>107295766I can't believe none of you faggots saw Utena, no wonder nu-/lmg/ is shit desu
>>107297164Who?
>>107297164What?
>>107297164Is it good?
run lmstudio, upload my resume, tell ai to fix it, come back tomorrow to see if it did anything at all
>>107295589vibes based research
>>107295766where's prillya?
>>107297164>I can't believe none of you faggots saw Utena, I recently re-watched the series + movie when I hooked up my CRT and PS2 again. Not sure why it's relevant here though?
>>107297687>prillya
>>107298068>If you only knew how bad things really are
>>107298113seems like a pretty happy lil'guy to me
justpaste (DOTit) GreedyNalaTests

Added new and ratings

Added models:
LFM2-8B-A1B
Snowpiercer-15B-v3a
Apriel-1.5-15b-Thinker
Snowpiercer-15B-v3c
Ring-mini-2.0
Ling-mini-2.0
Rivermind-24B-v1a
Cydonia-R1-24B-v4f
Cydonia-24B-v4s
Cydonia-24B-v4r
Precog-24B-v1b
gemma-3-27b-it-antislop
Olmo-3-1125-32B
aquif-3.5-Max-42B-A3B
swiss-ai_Apertus-70B-Instruct-2509-IQ4_XS

Been a while huh. Gemma antislop was, surprise surprise, a bit sloppy still. Cydonia R1, Snowpiercer v4, and Precog's outputs got flag ratings. I don't mean to give Drummer's models special attention, this is just how things turn out as I mainly test models that get mentioned here and on some other sites. Also added new ratings to indicate when I've actually personally used a model and can confirm it's garbo/good, but this is in progress.

Contributions needed (Q4 or above):
The latest Qwen 3 235B Instruct, Thinker and the 480B Coder (for prompt, go to "mradermacher_LFM2-2.6B.Q8_0.gguf" in the paste)
ERNIE-4.5-300B-A47B-PT (prompt->"ernie-placeholder")
GLM-4.5, 4.6, and Air, and Drummer's "Steam" finetune (prompt->"glm-placeholder")
gpt-oss-120b (prompt->"ggml-org_gpt-oss-20b-mxfp4.gguf", and you may experiment around with the prompt template as it has some oddities and extra features)
MiniMax-M2 (prompt->"minimax-placeholder")
Kimi-K2-Thinking (prompt->"kimi-placeholder")

>From neutralized samplers, use temperature 0, top k 1, seed 1 (just in case). Copy the prompt as text completion into something like Mikupad. Then copy the output in a pastebin alternative of your choosing or just in your post. Do a swipe/roll and copy that second output as well. Include your backend used + pull datetime/version. Also a link to the quant used, or what settings you used to make your quant.
>>107298261Oh shit, you live.
>>107298261>Added new and ratingsPasted it which is why the emojis don't appear.
>do anything using HIP
>close script
>open again
>No HIP GPUs are available
>need to reboot to "fix" it
AAAAAAAAAAAAAAAAAAA ITS ALMOST 2026 WHY IS THIS SHIT STILL HAPPENING
No HIP GPUs are available
>>107298283Yeah just busy with life and stuff, but I always keep an eye out on the threads at least.
>>107298303>Trusting AMD to not fuck software
I think OpenAI might be serving a higher-quality model during the night (on topic because I'm using it to gather data for distillation of open models).
Last night - great, follows instructions to the t. During the day - kinda dumb, ignoring instructions and such. Tonight - great again.
>>107298387
I would say it's obvious: usage will be lower when people aren't working, and lower demand means they can serve a higher quant. Though I would have expected their drastically discounted India plans to balance out demand during those times.
>>107298387>I'm using it to gather dataThat's not what you said you would do.
>>107298123I thought that was wheezywaiter merch
>>107298261
As always, thank you for your service.
>THIS IS NOT A LEADERBOARD OR BENCHMARK;
lol, you know better
>>107298261what is this? you want people to run tests on models to judge quality? if so, i could contribute for anything below ~400b
>>107292886>>107292892sexo with defoko
>>107298456
True, but I'll still try.
>>107298467
>a samples repository of reproducible reference logs
Basically just wanted to make a kind of archive.
>i could contribute for anything below ~400b
That would be welcome!
>>107298434
I spent the whole day begging gpt to fix minor formatting issues in a web search and web scraping script that it then used for 5 minutes to gather information on the architecture of gpt-oss 20b about 3 hours ago. Then I spent those 3 hours asking it to work on the tokenizer (it had made tokenizers before, but I decided to let it start from scratch).
Now it's figuring out the last few corner cases to achieve identical output to huggingface's tokenizer.
>>107298434As for the data gathering, what I mean is that I'm gathering the responses while I work with it to write code, right now I'm not creating artificial scenarios just to gather data.
>>107298544
>I spent the whole day begging gpt
not to help you make an inference engine. You're going down every rabbit hole you can find instead of learning what you need to make the inference engine.
>what I mean is that I'm gathering the responses while I work with it to write code
>>107298387
>I'm using it to gather data for distillation of open models
Shattered mind.
You don't need a tokenizer yet. That's just a detail. Write the code to run the tensors, load the tensors, feed it token ids, compare output token ids to the reference implementation. Leave the easy bits for later.
>>107298535
The secret to AI vibe coding:
>new chat
>here's a program, it works great except <what I want to add>
>>107298767Please do not reveal our secrets. That's the only competitive edge we have left.
I found out gpt-5.1-codex unfortunately is almost useless to use with my custom code assistant, because it's overfitted to the particular tool usage format used by codex. But gpt-5.1 works very nicely and is basically AGI.
>>107298767
>>107298787
I find that workflow very tedious. I'd rather try to recover a poisoned context than try to get a new instance of the assistant up to speed with everything we did and discussed in the previous session.
For "infinite context" I found simple truncation works just fine. gpt is working for me fine up to a context of about a million characters (not tokens), and by the time we get there, we've made so much progress that the stuff at the beginning of the context is almost irrelevant, so I just do "/truncate 500000" and go from there (this also works with GLM 4.6 and gpt-oss, but not as well, since they have much smaller contexts and start being retarded at 100000 characters).
This is why the OCR context thing published by Deepseek would never work. Imagine trying to fit a million characters into images. Models already struggle to work with the stuff that fits in their context as normal text; putting it as almost unreadable tiny characters in images wouldn't help anything.
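The "/truncate 500000" idea described above can be sketched in a few lines: drop whole messages from the front of the history until the conversation fits a character budget, always keeping the system message. This is a toy sketch of the general approach, not the anon's actual implementation; the function name and message format are assumptions.

```python
def truncate_history(messages, max_chars):
    """Drop the oldest non-system messages until the total character
    count of the conversation fits within max_chars (a rough stand-in
    for a '/truncate 500000' style command)."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(len(m["content"]) for m in msgs)

    while rest and total(system + rest) > max_chars:
        rest.pop(0)  # oldest message goes first
    return system + rest

history = [
    {"role": "system", "content": "You are a coding assistant."},
    {"role": "user", "content": "x" * 400},
    {"role": "assistant", "content": "y" * 400},
    {"role": "user", "content": "z" * 100},
]
trimmed = truncate_history(history, max_chars=300)
# only the system message and the newest message survive the budget
```

Truncating on character count rather than token count is deliberately crude, but it avoids a tokenizer dependency and is close enough when the budget is in the hundreds of thousands.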
>>107298833>than try to get the new instance of the assistant up to speed with everything that we did and discussed in the previous session.You should be using a memory bank tool so each instance can handle getting itself up to speed.
I also implemented an "/auto 'blah blah blah' 10" command that responds to the model with the same message a number of times in a row. This helps with gpt because, while it could keep going by itself, it's programmed to stop to avoid resource usage. So a little nudge helps it keep going.
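An "/auto" style nudge loop like the one described above could look something like this. This is a hypothetical sketch, not the anon's code: `ask_model` is a placeholder for whatever function actually calls the backend, and the message format is assumed.

```python
def auto_nudge(ask_model, history, message, times):
    """Re-send the same user message a fixed number of times, appending
    each reply to the history - a sketch of an '/auto <msg> <n>'
    command. ask_model stands in for the real backend call."""
    replies = []
    for _ in range(times):
        history.append({"role": "user", "content": message})
        reply = ask_model(history)
        history.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies

# stub backend that just counts its calls, for demonstration
calls = []
def fake_model(history):
    calls.append(len(history))
    return f"step {len(calls)}"

out = auto_nudge(fake_model, [], "continue", 3)
print(out)  # ['step 1', 'step 2', 'step 3']
```

In a real assistant the loop would also want a stop condition (e.g. the model emitting a "done" marker) so it doesn't burn tokens after the task is finished.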
And the retard takes it seriously.
>>107298788
5.1 is a smaller model than 5, no?
>>107298868
How would you even implement that? I doubt asking the model to summarize the whole conversation would work well, but maybe I'm wrong.
>>107298868Maybe. I'll try changing the model to that one but I'm not sure if the coding endpoint has access to it anymore.
>>107298892
also if you think 5 is good, try 5-pro; holy moly
and it's not a maybe. 5.1 inference is near instant, it's a tiny model.
>>107298903
I've tried 5 pro, yes; the pro models have always been very good. But waiting 10 to 20 minutes per reply is not practical for coding.
As for the speed of replies, maybe they use a draft model sometimes; I've noticed wild swings in speed. But that might just be network lag or server load.
The pro models are only good for analytical tasks though; for anything creative they are very underwhelming.
>>107298877>How would you even implement that?Google memory bank mcp.
>>107298989Do you use it? Does it work well?
>>107299096Yes. Yes.
>>107298938you're waiting 10 to 20 minutes for a reply from 5 pro while paying $120/mtok output? wtf?
>>107299453
I'm talking about the web interface, never used it through the API.
But on the web interface it seems to have like 100s or 1000s of thinking tokens for each token in the actual output, so that's probably why it's so expensive.
>>107299500
im just a bit shocked that its thinking that much. i use k2 thinking and that model already thinks a ton, i could easily have it use 150-200k+ thinking tokens in a 32k context story. thinking of having to spend $30 just for a regular RP scenario is wild. obviously 5 pro has a use case beyond just RPing, but i just feel like i would be better off using gemini for coding than 5 pro after a certain context length since gemini seems to not fall off as hard at high context lengths.
>>107299500I doubt almost anyone is using it through API, most people will be using it with the $200 plan for research tasks.
>>107299574
have you ever run into rate limiting issues with the pro model? $2400 a year is still a ton of money to me considering my server was $6000 and im running 1T models. i could understand spending $200 for a month or two though
that moment when you come across your own 4chan post when doing research on google
>>107299613
No, but then again I haven't used it that much because of how long it takes. The Pro model is absolutely not worth the $200; I've only ever paid for that plan like 3 times besides now. I got it for codex. I'd have paid for the $100 Claude plan instead, but I'm banned on Claude.
>>107299656>the answer was inside you all along
>>107299656This is why Gemma Sirs are so powerful. Google allows us multiple datasets.
Sirs when is we gotten gemma 4 #1 open Bharati model?
>>107299776
Kerala rumour: Gemma 4 will be named after Ganesh, the elephant god, because it's subtle but very large indeed.
Minutes.
>>107295568
most of these aren't actively working on the models
but openai does hire a select few retards who add zero value in filler roles just because someone thought their tweets were funny
don't even get me started on anthropic and the rat problem
t. sf retard
>>107300037Sir... I accept your cynicism.
>>107300037also there’s a reason all of these are young twinks of questionable sexuality
>>107300037>>107300056You have provided zero evidence but I will accept these anonymous posts as hard facts because they align with my preconceived assumptions.
>>107300037what is anthropic's problem with rats?
>>107300220the fucking intern keeps leaving cheese outside of the fridge
>>107300220Their marketing department loves anuses. Same thing with openAI and Microsoft. A coincidence? I don't think so.
>>107298303you must be doing something wrong, try to free device or whatever in your scriptask grok or chatgpt
>>107298303
That's a Linux issue. I hate to tell you, but Linux is shit unless you work for a company with paid system administrators.
>>107300259I believe you that this bug doesn't occur on Windows but only because I expect there to be different bugs instead.
>>107300220rationalists / effective altruists
setup guide for LocalSong?
ran into all kinds of errors, specifically around triton and msvc/msbuild (?)
or does it just not work with Blackwell?
solved triton with the windows variant(s), couldn't solve the build issue
apparently some file called "algorithm", which was in the build tools folder, couldn't be found.
>>107295444>>107295506>>107295519>>107295558>>107295589He got buttfucked by Sam. He loves little twinks like that and gives them jobs in exchange for sex.
>>107300292
In any case, it's because the system selects the wrong device id.
>>107300366Triton is a bitch to set up even for image gens. I would first try setting up vanilla and go from there.
>>107300366>aparently some file called "algorithm", which was in the build tools folder, couldn't be found.kek
>>107296127>spice and wolf not even on their listabsolutely disgraceful
Is that anon who wasn't able to compile llama on Fedora 43 a few weeks back still around? What's the current status, I've been holding back on updating partly because of you
>>107300732
I'm here. I went on using the Vulkan pre-built binaries, which sucks, but I have taken a hiatus... The CUDA toolkit is still not updated, so that will probably happen in late December or January.
By all means, leather jacket man's janitors are working on it.
>>107300732
To add: I never tried docker or virtual environments, but I feel like as an end user it's a bit too much for my pay grade, so to speak. Chatpajeet can give advice but I am not sure; you would need to install a whole complete environment for the libc stuff, so fuck it.
>>107300764>>107300774I see, good to know. I already compile it using a F42 toolbox container, so I think in theory that would still work on 43. Probably better just waiting for native support though
>>107300821I'm too dumb for this. My expertise ends up in a simple makefile and hello-world.c file.
>>107300764So you just lurk this thread intensely enough to respond within 10 minutes notice but don't actually use local llms because of a compilation issue?
>>107300922Maybe learn English first and then try posting again.
>>107300931
I think my level of English is adequate, desu. You're the first person who has complained about it in ages. Maybe you are the real problem. Have you thought about that?
>>107300969
You are writing like a passive-aggressive little kid or a bitch. Doesn't really matter which one it is.
>>107300981I wish I was either of those tbqh
Did something about Llama.cpp's handling of Gemma 3 change within the last few weeks? I did a pull and now I can't load it with the same amount of context anymore. The pp and tg speeds are also slightly slower.
>>107300990are you using vulkan?
>>107300988You sound like a bitch to me.
>>107300994No.
>>107300990
It has changed. Most likely related to --mmproj or something.
It's somewhat bad that we rely on a single freetard solution. The dev could go schizo at any time.
>>107300995go tell the rest of the world, I want to live life on ez mode too
what do you anons mostly use your local LLM's for?i use it for general chatting and testing their programming capabilities
>>107301045
I have written a game but sort of grew tired of it. You have a map with rooms, and it's set in Forgotten Realms. In the beginning the LLM generates a random quest, and you and your companion need to travel there. It's fun because the adventure is just an outline: you can follow it, but if you want you can obviously do whatever. It holds up. Need to implement random encounters, weather and inventory.
>>107300990
Alright, so I just tried some things, and it seems I get back to the old cache VRAM usage by including -kvu in my flags. Speed seems to be better as well, but still like 1% worse than the old speed I measured, though I'm not sure if that's day-to-day variation, as I'm not running a bench, just a prompt.
By rooms I mean interactive fiction rooms, e.g. locations. Not literal rooms.
>>107301045general chatting, Emacs assistant, writing CVs for job offers, fapping of course.
I'm trying Gemma 3 27B heretic right now and I think it's pretty decent? Indeed doesn't seem to be much different from regular Gemma, but it is less censored.
>>107301126Been using glitter which is 50/50 instruct and base mix. Works but it'll still randomly display its suicide hotlines out of the blue. Regenning the answer helps but sometimes it'll just get stuck and won't stop moralizing.
>>107301126Gemma 3 was never that censored to begin with and the rape hotlines could be very easily worked around without abliteration. I'm also pretty sure it's been post-trained with some ERP as well, even if it refuses them by default.
>>107301138
>50/50 instruct and base mix
Huh. I wonder if using heretic instead of the normal instruct would work better in that case.
>>107301144
Yeah, I'm just playing with it. I got gpt oss 120b heretic downloading in the meantime too.
>>107301144I like when it refuses and berates the user, but then cutting off the character's head and defacing its body is suddenly just fine like nothing happened.
>>107292886Recently someone gave a presentation about how language models will be super useful as agents in research tasks, e.g. for finding information or automating tedious parts of workflows.
But I just cannot share their optimism. When I try something like
>Yugioh youtuber Cimoooooooo uploaded a video in which he showed the card Swap Frog to another youtuber. Find this video for me.
that should in principle just be a simple scan over the uploaded videos + comments, but I get pretty useless results.
I get the impression that because there are like a hundred videos with almost identical titles the language models get confused - but that is exactly why I would want to automate this task in the first place.
>multimodality
Yes, but for a real research task that would apply as well.
>>107301167Isn't the big problem that if there are no hits it'll just make up an answer that sounds plausible? You have to manually verify every output.
is the meta still cydonia or did we move on to something else?
>>107301195You are absolutely right — you are very clever to notice this difference.
>>107301167Do you have thinking enabled to confirm the model can't find the video? It's possible that the model doesn't want to subject itself to watching a Cimooooo video and is bullshitting you to avoid having to share your shit taste. I'm only half shitposting; thinking models make it easier to troubleshoot where the problem actually happened before it starts hallucinating false successes.
Pls tell me how delusional I am
>PNY RTX PRO 4000 Blackwell SFF Edition for 1.5k
>1-slot
>70W
>fits in my cramped rig where 3090 took all the space
>24GB to work with TTS, 15B LLMs etc
How retarded am I with my high expectations from a 70W GPU?
New Moonshota model might get released soon.
https://x.com/HaoningTimothy/status/1992496722107682908
>>107301378It's still a workstation card and has plenty of cuda cores. Raw power consumption isn't everything. Sure, some autist will probably cry that you are not getting 500 tokens per second.
>>107301045Exclusively for ERP and sometimes just normal RP when I'm too drained.
>>107301195>>107301286I tried Google AI Studio since my expectation was that it would have the best integration with YouTube; thinking and web search were enabled.
If you look at the reasoning trace, the model fails by matching the general properties of Swap Frog with a plausible video title and suggesting that maybe that video contains the card.
Which would be a reasonable approach if there was only a single video like that.
So yes, the model just doesn't want to subject itself to actually watching videos like Hearthstone Pro Rates The MOST BROKEN Yu-Gi-Oh! Cards ft. @Rarran.
Though if I explicitly tell the model to check the comment section it still can't do it.
>>107301423You are expecting too much from what is an afterthought on a model-testing sandbox.
For something like that you would want a well thought out system prompt as well as scripts specifically to interact with youtube (search, description, transcript, comments etc., maybe even grabbing a few frames if it's a multimodal model).
That, or a super powerful web scraping framework that allows the model to autonomously use the browser to get all the comments and the transcription.
But in any case, that is something you want to run locally to see what it's failing at, not let it hit some nondescript search API that will return whatever.
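To make the "scripts specifically to interact with youtube" point concrete, here's a rough sketch of the comment-scanning half using yt-dlp's Python API. The video URL is a placeholder, and whether this scales across hundreds of near-identical uploads is exactly the open question; the scanning logic is split out so it works without touching the network.

```python
# Sketch of tool-side code an agent could call. Assumes yt-dlp is installed;
# the placeholder URL below would come from iterating a channel's uploads.
def find_mentions(comments, needle):
    """Return comment texts that mention the needle (case-insensitive)."""
    needle = needle.lower()
    return [c["text"] for c in comments if needle in c.get("text", "").lower()]

def fetch_comments(video_url):
    # 'getcomments' makes yt-dlp populate info["comments"]; no download happens
    import yt_dlp
    opts = {"getcomments": True, "skip_download": True, "quiet": True}
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(video_url, download=False)
    return info.get("comments") or []

if __name__ == "__main__":
    # placeholder URL; a real tool would loop over the channel's video list
    hits = find_mentions(
        fetch_comments("https://www.youtube.com/watch?v=XXXXXXXXXXX"),
        "Swap Frog",
    )
    print(hits[:5])
```

The point being that once the comments are plain dicts, "find the video where someone mentions card X" stops depending on the model's willingness to watch anything.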
>>107301045generating synthetic data for llm training. it's much cheaper than the APIs. quality seems comparable if I keep the context length around ~16k or under.
>>107301383either some qwen max style shit or k2.1 with thinking more optimized so it doesn't sit there for 5 minutes
>>107301383Please be the no safetyslop Kimi K3 timeline.
>>107301460Yes, I agree that that is expecting too much from low-effort use of language models as agents.Which is why I disagree with the message of the presentation I described in >>107301167 .If one has to invest a lot of task-specific effort to make agents work then there is no point.
>>107301383imagine the big model smell
>>107301144
>Gemma 3 was never that censored to begin with
Come up with a fantasy name
>Lyra Meadowlight
restart
>Lyra Meadowlight
restart
>Lyra Meadowlight
temp 2.0
restart
>Lyra Meadowlight
personally, i think gemma shills are disgusting
no sane people use this overcooked shitfest
>>107301502This has nothing to do with censorship, it's simply so-called slop. You might also want to check your truncation sampling setting.
>>107301502
>Isara
>Syra
>Ilara
lol
>>107301499The answer is combining vision models and DOM parsing. That way if one fails you have the other.Youtube is a hard one because the comments load dynamically and most of the info is on the video itself, something like Arxiv works much better.
>>107301517
>check your truncation
i actually don't have to do that, since any random llama 2 finetune can generate random DnD names just fine out of the box
unlike this crapware the shills want you to cope with
>>107301619Oh okay...
>>107301502Sir are you bloody bestard? Dr Elara Voss and General Thorne, hanging out at the gilded cage in silverwood, working on project chimera is a national epic. Take it back or rape you tonight
>>107301712I'll implement this bloody scenario Tonight. Need to refresh my interests 100%
>>107298387It does that. Chatgpt is sometimes so stupid it's probably a 1-bit quant. Just hard to say what the real down hours are because of us/eu times etc.
>>107295766Datamining thread
>>107301778You sure are easily amused
>still no pixtral large support
>>107301873EXL2 is the format for that. But I find myself wishing someone would cook an EXL3, 4-5 bit.
i found this thread via twitter. let's just say certain companies are filled with sex addicted fags that give each other fake jobs.
>>107302145and yet the models prude
>>107302145Post the twitter link. Who is upset that we're talking about their open secret? Is it gabriel?
/lmg/ is probably the most overweight general around here
>>107302311My BMI is 19.4, I'm having trouble putting on weight actually.
>>107302311The hardware for running models locally is pretty expensive so the percentage of Americans and western Europeans itt is probably relatively high.
Personally I'm 1.81 m tall and weigh like 80 kg.
>>107302432lmao this nepobaby faggot is lurking here
>>107302311i have a bmi of around 48
>>107302466
>@da_fant uneducated, founder world's first AI agent w +1M users
is apparently the lurker, he just got the twink's attention
>>107302432
>This URL only works inside an RSS client.
what
roon is laughing at you retards
>>107302515hahaha those 4Chan users are so dumb!!!
>>107302515>roonwho?
>>107302515He's right... Most Indians are working at Google.
>>107302515>t.roon
>>107302543high level openai employee and san francisco royalty
>>107302552kek
>>107302556so, a megafaggot then
>>107302515>>107302543someone say roon?
>>107302556
>an OpenAI employee is forced to suck off another OpenAI employee in fear of losing his job
why should we listen to the opinion of someone who's deep in a conflict of interest
>>107302515That's an even more autistic interpretation than the average 4chan user would've come up with, impressive
>>107295817Lol ok, sure.
>"I can't believe you, Anon! I thought you were... I thought you were at least somewhat normal!"
>>107302691I wrote a game and my own llm client. That's enough for me, dingus.
>>107302691
>where are all of your startups?
don't project your miserable life onto everyone else
>>107302691but the nepobaby twink hasn't made a successful startup though, that's why he's just an employee at OpenAI
>>107302712you're the one jealous of someone half your age working for openai
>>107301045I don't want the details of my files anywhere outside my computer
>>107302809Can you ERP with this interface too?
What's the best image upscaler right now? I have some planet maps that I want to upscale 4x but I don't know what product to use.
>>107302818You can try and tell me how it goes I guess
https://github.com/sandwichdoge/VibesAndFolders
>>107302828https://openmodeldb.info/models/4x-Nomos8k-atd-jpg
This is probably one of the best but it's slow as "heck". You can do everything in CumUI though.
How does a 5090 fare for local text gen storytelling? Is it comparable to stuff like Redquill, or should I wait for the 6000 series?
>>107302929It's good, but you need, like, 10 of them
>>107301502Better than Seraphina
>>107302929get a blackwell pro. the 6000 series was delayed to june of 2027
>>107302954>>107302989fug
>>107302999blackwell pros are reasonably affordable. nice trips btw
is thinking broken for anyone else on the latest sillytavern release branch?
what did coheejeet fuck up this time?
>>107303174what exactly is wrong?
>>107302929
>randomly decide to lurk in a general I pretty much never visit
>see my really old gen
Damn.
>>107301045Mostly for writing stories for me to fap to. I want to keep that shit private on my own machine
For programming or general advice I usually go to chatgpt
>>107295519>>107295589he had a portfolio of stuff he worked on. If I recall correctly, he worked on midjourney or something like that
>>107301502Gemma is very confident in its replies and rerolls don't usually change it much
I can't count how many Kaelens, Old Man Hemlocks and Doc Abernathys I've seen
>>107301502Gemma is insanely good as an assistant for its small size, that's why it's so good
For RP and coding it's shit yeah
calling that soft blonde twink gay for pay has offended the twitter community, they would prefer he be called sam altmans "research scientist"
>>107303474Doc Abernathy, what a great name.
twink bussy is the only thing protecting us from AGI at this point. got all those san fran faggots high on tight boy holes. god speed gentlemen
Damn bro I wasn't being serious, you are actually a pretty cute twink
And you would be even cuter if OpenAI released more OSS models
>>107303739Such a cute boy. Now I know why Sam hired him.
4channers are retarded lol.
I'm not a channer btw.
>>107303795
>I'm not a channer btw.
Don't worry, we can tell.
>>107303739
>These fags actually browse these threads
LOL
>>107303779big debuff working at midjourney. nothing good comes from there. their devs went on ooba issues and complained that anime girls as default characters were scaring women. wanted to DEI them up.
>>107303843Of course, 4chan actually invented CoT
>>107303853that's something san fran types would brag about
>>107303853They might be a bit jewish in their monetization and moderation, but Midjourney still has the best aesthetics of any image model, no other project has stuff like this
https://x.com/midjourney/status/1991684484455100477
https://x.com/karpathy/status/1992655330002817095
Waow
It's becoming hard to remember that Meta and Mistral are both major AI companies with billions in valuation. At least Zucc is still shoveling money, but what are the frogs even doing right now?
>>107304253Mistral is in limbo sucking off of the euro taxpayers' teat while waiting to be bought out by Apple.
>>107304253please don't mention Meta anymore
It hurts
Let me interrupt your super important discussion for a second and ask why ik_llama.cpp is so slow with glm 4.5 air (q2_k_xl quant).
On master llama.cpp I'm getting 21 t/s, on master ik_llama.cpp I'm getting about 16.5 t/s no matter what I do.
>>107304343Doesn't ik_llama need special quants now? Don't think you get the speedups with mainline ggufs.
>>107304335The look on Zuck's face when his $100 gorillion MSL shits out another Llama 4-tier disaster.
>>107304399The bright side of them abandoning open source is that at least we won't be disappointed.
>>107304399It's going to be even funnier if Yan LeCope's research ever pans out
>>107304343Try slapping --graph-reuse, --rope-cache, and -mqkv at the end of your arguments.
>>107304253Mistral's most recent grift was lobbying the French government to launch French LMArena where their closed model is conveniently ranked #1
>>107304364That's pretty sad, I don't think there are any glm 4.5 ik quants uploaded anywhere.
>>107304448This changed absolutely nothing.
>>107304569https://huggingface.co/ubergarm/GLM-4.5-Air-GGUF
Sam is so lucky bros. Just imagine the tight high school dropout boymeat he gets to have daily
>>107304364No, you guys simply seem to be bad at configuring it. I use the same quants as on mainline and test occasionally. ik has given me almost double the speed for months, qwen/glm/deepseek, no matter what I try.
Where it sucks is fully offloaded models. Skip it for that.
The special quants give better perplexity per GB but can be slower if they're IQ.
>>107304588Damn anon now I feel like a retard. Still, I can't run anything bigger than IQ1_KT from that repo so it doesn't seem like a good option either way
>>107304732Share your wisdom, I'm running it like this:
-ctk q8_0 -ctv q8_0 -mg 0 -c 32768 -np 2 --n-cpu-moe 16 -ts 42,14 --no-mmap
Like I said, adding --graph-reuse --rope-cache -mqkv did pretty much nothing. Maybe --n-cpu-moe selects experts poorly, or it just doesn't work well because most of the model is offloaded to the gpu?
>>107301383a version of kimi that's glm 4.6 size would be nice
>>107304815There's no way around manually putting layers on gpu. rtr exchanges PP for TG. Those little speedups are worth maybe 1% at best. GR may be on by default now. Rope cache makes models stupid. Did you ever check free vram after using -n-cpu-moe?
-ngl 94 \
-ctk q8_0 \
-ctv q8_0 \
-rtr \
-ub 1024 \
--jinja \
--reasoning-budget 0 \
-cuda offload-batch-size=7,fusion=1 \
-mqkv \
-ot "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14)\.ffn_.*_exps.=CUDA0" \
-ot "blk\.(15|16|17|18|19|20|21|22|23|24|25|26)\.ffn_.*_exps.=CUDA1" \
-ot "blk\.(27|28|29|30|31|32|33|34|35|36|37|38)\.ffn_.*_exps.=CUDA2" \
-ot "blk\.(39|40|41|42|43|44|45|46|47|48|49)\.ffn_.*_exps.=CUDA3" \
-ot "blk\.(50)\.ffn_(up|down)_exps\.weight=CUDA3" \
-ot "\.ffn_.*_exps.=CPU"
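Those -ot lists are tedious to type by hand. A throwaway helper like this (hypothetical, not part of llama.cpp or ik_llama.cpp) can generate per-GPU patterns in the same style, with the catch-all leaving whatever wasn't claimed on CPU:

```python
# Hypothetical helper: builds -ot arguments that pin each layer range's
# ffn expert tensors to a device, plus a final CPU catch-all.
def make_ot_args(ranges):
    """ranges: list of (device, first_layer, last_layer) tuples."""
    args = []
    for device, first, last in ranges:
        layers = "|".join(str(i) for i in range(first, last + 1))
        args += ["-ot", rf"blk\.({layers})\.ffn_.*_exps.={device}"]
    args += ["-ot", r"\.ffn_.*_exps.=CPU"]  # everything not matched above
    return args

if __name__ == "__main__":
    # prints a ready-to-paste chunk of the command line
    print(" ".join(make_ot_args([("CUDA0", 0, 14), ("CUDA1", 15, 26)])))
```

Adjust the ranges per GPU until free VRAM after load is where you want it.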
>>107303886https://huggingface.co/SG161222/SPARK.Chroma_preview/tree/main
Local-only version of my RE agent, with a simplified R2-only toolset for anyone interested. I'm also working on a more complicated version which exposes dynamic tracing tools in a docker container, but I haven't had great luck using that one with local models yet.
https://pastebin.com/Xr8KHV9Y
>>107304942not even close, you're coping
>>107304942How does this even come close to the amount of searchable artstyles and aesthetic personalisation in Midjourney?
>>107304987Who cares. It's free and doesn't fund assholes.
>>107305002
>it's bad but it's free so you should praise it
fuck no, the fuck is this kind of logic
>>107305002Eating shit is free, something being free doesn't make it good
>>107305002
>muhh assholes
history will only remember their technical achievements, like Napoleon, you have to separate the art from the artist dude
>>107305076Nobody will remember midjourney. Their model ain't that great and they're full of pretentious cucks.
>muh styles
They're getting sued for that.
It's also not local. Let's take the eating shit argument back to LLMs and close the thread. Got claude, gemini, etc so pack it up.
>>107305104
>Nobody will remember midjourney.
delusional, you can make the argument that it's not local so it has no place being brought up here, but try not to pretend they don't have something unique and special on their hands, no one is buying it and that makes you disingenuous
>>107305121Holy shill. They have a finetuned SDXL. May as well pump NAI while you're at it. Never felt I was missing out not using yidjourney.
>>107303886>>107304942>>107304987
>what is a lora
sd 1.5 is unironically good enough, the main issue is just text and having to fucking regen, but it's fast so it's mainly a tedium problem. all this shit is just goycattle gobble, modern saas in spirit
>t. cuckingface
>>107305364information is dangerous, goy
>>107305364
>the epstein files dataset
what? they finally released it?
>>107305364
>make an epstein dataset
>start talking about how dangerous it is and now that you have the tiniest bit of power think about gating access, censorship and preaching
This grift is a bit sad.
>>107305364Good. Dangerous information like the Epstein files has no place in a training set. Stopping antisemitic behavior needs to happen at the data level.
>>107305364lmaooo, that's a troll right?
>>107304941
>Did you ever check free vram after using -n-cpu-moe?
Of course I did, the gpus are full.
>manually putting layers on gpu
I just tried doing this again, came up with this
-rtr -cuda offload-batch-size=7,fusion=1 -mqkv -gr -ot "blk\.(14|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35)\.ffn_.*_exps.=CUDA0" -ot "blk\.(36|37|38|39|40|41|42|43|44|45|46)\.ffn_.*_exps.=CUDA1" -ot "\.ffn_.*_exps.=CPU"
and it still works at the same speed. That is, slower than using mainline llama.cpp, about 16.5 t/s.
>>107305438>>107305462This isn't because of Jews, it's because Trump is and always has been anti free speech and is cracking down on anyone saying anything bad about him.
>>107305520
>trump derangement syndrome
at no point does that post imply it's pressure from the government. you want to see what real government pressure on private companies looks like? look no further than the previous administration
>>107305520
>Trump is and always has been anti free speech
yet he's the one who will be releasing the Epstein files while Biden didn't lool
>>107305444they released a redacted one a while ago. bunch of epstein's emails were made public recently. larry summers got fucked by it lmfao
>>107305548Look up the definition of chilling effect.
>>107305586
>they're self-censoring because of Drumpf! I have no evidence of that but I'm gonna present this as a fact
leftists sure love conspiracy theories when you think about it
>>107305364Nice way to let people know.
>>107305519My system is 3090s and DDR4. For me it's the reverse. On qwen it's a difference of 7 t/s vs 20 t/s. Full GLM Q4 gets almost 16 with IK. I didn't even bother with mainline there. If it truly works better for you, keep using it. Assuming you enabled all the compile-time stuff like BF16, CPU instructions, etc.
>>107302311
>General full of third world jeets is obese
What the fuck are you talking about?
>>107302552
kek.
>>107303739
Sanfran kikes are narcissistic enough to seek validation here.
>>107305612
They spend a lot of time theorycrafting how best to silence people so it comes naturally to them.
>>107305612those files seem to implicate more dems than trumpers. could be why HF is getting nervous. they're also pussies though. we know we can't count on them if something happens to models. good to find out early.
>>107305737>They spend a lot of ways theorycrafting how best to silence people so it comes naturally to them.true, democrats are masters of censorship, it's literally in their DNA
>>107305462
>Awareness of pedophilia is anti-semitic
What did xhe mean by this?
>>107305756More like NDA, amiright?
>>107305774kek
>>107305629Can also just get it straight from the government release, images and all:
https://oversight.house.gov/release/oversight-committee-releases-additional-epstein-estate-documents/
And they want to make people take an ethics certification to download the csv from HF.
>>107305364HuggingFace is based
They will host anything as long as others don't snitch and make a fuss about it
Exhibit A: https://huggingface.co/datasets/mirav/gurobooru/tree/main
>Uploaded over 2 years ago
>>107305876Is that being based or just incompetent?
>>107305364>extremely sensitive information that could spread misinformation
>>107305889There are hundreds of lightweight models to scan video/audio/text etc. for sus content; if they wanted to be completely draconian they could easily do so
>>107305940So because they don't go the extra mile to be worse that somehow makes them based?
after updating sillytavern it started always outputting { before it replies, and when continuing a half-done message after i fix it manually it directly continues with
{ "character_name": "Rebecca", "response": "
never had this happen before
>>107306075found the problem, for some reason the "JSON Schema" in sampling settings panel was set to "{}"
>>107306107which api type has that?
>>107305889
1000% incompetence. they've fucked up their implementations of rmsnorm and the like in transformers countless times, plus there's this monstrosity: https://github.com/huggingface/transformers/blob/v4.57.1/src/transformers/training_args.py#L216
i'd rather shove a cactus up my urethra than use anything from facehugger
>>107306184>>107306184>>107306184
>>107306165basic text completion preset, lcpp backend
>>107306172It always irks me when I do a finetune and at the end it tells me to upload the model to huggingface.
>>107306107seems like a fix was made to address that several minutes ago
>>107306270 (me)
actually I'm not sure that's relevant
but I just saw someone on discord mention the problem going away after blanking the field
>>107305364"ethical" is just codeword for jewish at this point