/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107056325 & >>107044779

►News
>(10/30) Qwen3-VL support merged: https://github.com/ggml-org/llama.cpp/pull/16780
>(10/30) Kimi-Linear-48B-A3B released with hybrid linear attention: https://hf.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
>(10/28) Brumby-14B-Base released with power retention layers: https://manifestai.com/articles/release-brumby-14b
>(10/28) NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 released: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16
>(10/28) LFM2-ColBERT-350M released: https://hf.co/LiquidAI/LFM2-ColBERT-350M

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107056325

--VRAM vs RAM tradeoffs and cost-effective upgrades:
>107057422 >107057493 >107057523 >107057538 >107057627 >107057641 >107057680 >107057892 >107057904 >107058132 >107058211 >107058235 >107058246 >107058291 >107058301 >107058332 >107058823 >107057647 >107060695
--Tech Mahindra's 1 trillion parameter LLM project sparks mixed reactions:
>107061935 >107062055 >107061978 >107062154 >107062174
--Multi-GPU memory optimization latency tradeoffs for MoE models:
>107062861 >107062880 >107062891 >107062902 >107062941 >107063023 >107062887 >107062939 >107062947 >107063018 >107062980 >107063165 >107063110
--VTT model comparisons and pipeline suggestions for transcription:
>107059665 >107059817 >107059845 >107059918 >107059961 >107060178 >107060224 >107062756 >107062842 >107062859
--Qwen 4B's performance in complex JSON generation and small LLM advancements:
>107057926 >107058153 >107058218
--Qwen 4b's multi-image analysis capabilities demonstrated:
>107060687
--SillyTavern system prompt configuration challenges:
>107062184 >107062200 >107062327 >107062369 >107062386 >107062492
--Exploring practical uses for local image processing and interactive applications:
>107056358 >107056482 >107056509 >107056541 >107056576 >107056554
--Challenges with TabbyAPI and Qwen3 Coder tool calling implementation:
>107058354 >107058385 >107058840 >107059067 >107059694 >107062455
--Skepticism about LLaDA2.0's practical value due to performance and context limitations:
>107060705 >107060731 >107060818
--UI/lorebook integration challenges and code accessibility in STScript:
>107057009 >107057036 >107057083 >107057101 >107057121 >107057162 >107057240
--Miku, Rin, and Dipsy (free space):
>107056696 >107057940 >107057943 >107059568 >107059860 >107060222 >107060637 >107060674 >107061256 >107062726 >107061898

►Recent Highlight Posts from the Previous Thread: >>107056334

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
i see... :(
>>107064100
I don't
https://youtu.be/qw4fDU18RcU
Do you guys know what I realized? No matter how far you go, you're still somewhere and never nowhere, so saying I am in the middle of nowhere is a nonsensical sentence.
>>107064207
so he uses vLLM in a docker container (thus needing the shm-size) and runs Qwen 235B in AWQ 4-bit
All of his knowledge is ironically coming from LLMs. I'm sure he has also browsed /lmg/ in the past at least. You could probably find his retarded questions.
>>107064207
pretty disappointing, he was pretty based up to this point
>>107064207
>watch the first few mins
>the topic of the title doesn't even get mentioned at all
>>107064207
cool Web UI
>>107064207
>its actually a video about shitting on cloud models and shilling self-hosting models
how can one man be so based?
Gguf status?
>>107064207
Ok watched the whole video.
Wtf he's one of us.
>>107064510
>I'm sure he has also browsed /lmg/ in the past at least.
I doubt it because he actually complimented gpt-oss
>>107064392
anti ai will still use just the thumbnail to say he's against all ai tho
>>107064207
Fuck this fag, I bet he even lurks ITT. His whole persona is so rage inducing.
https://youtu.be/7OiMxGwmdto?si=kvdyA0QWdV6rZ_3k
>>107064493
>Wtf he's one of us.
No shit. He says the word nigger all the time.
>>107064629
There is one retard here that regularly praises gpt-oss. Maybe it's him.
>>107064629
do not slander, he said it once in a rage moment
>>107064663
we must agree
>>107064688
I've seen some tiktok clips of him where he made some implicit remarks showing he's a white nationalist. That's a reason why he decided to go to japan: not just because of "uwu kawaii desu ne", but because the country is extremely racist and nationalist
>>107064207
>video about local AI from e-celeb #16311498
>no ollamao in sight
i was going to tell you to fuck off but nevermind, i like the guy
>>107064735
but wouldn't he be subject to that racism? he is not Japanese
>>107064736
I wish I had the money to play around with a vLLM capable rig
>>107064742
Racists don't tend to be the brightest crayon in the toolshed.
>>107064742
everyone in the world knows who pewdiepie is, I think the japanese people are happy he's here
>>107064766
Ahah so true kind stranger, take this kind gold and upvote with you!
>>107064766
the richest man in the history of humanity is a "nazi" though, how is that not bright?
>>107064830
he can be rich and a dumbass at the same time
Do you guys ever use models to edit or write your prompts? I'm trying it a bit but desu its hard to tell if its an improvement or not
>>107064830
>lifting your hand at an angle is... le nazi
>>107064742
why would the japanese hate him?
he's not one of the pajeet or third worlder migrants wanting to shit up the place
>>107064742
I don't think japanese people mind white people, they know what they are worth
>>107064845
Yes, it's useful when, for example, you want to define character behavior in more detail but you can't be assed to write the entire prompt yourself from scratch. It also works best when the entire prompt is dedicated to the character. For non-RP uses, LLM-driven recursive prompt refining is also a thing: https://arxiv.org/abs/2507.19457
>>107064845
>its hard to tell if its an improvement or not
Then consider time and effort, however much or little that is.
>>107064845
Oh yeah. Mostly for brainstorming more than anything, since the final version is always heavily edited by me.
Can someone explain to me if alpha changes something about the training process or it ONLY changes the multiplier at inference time? (yes, sorry, I'm too lazy to read the actual paper)
>>107064766
would you say that about blm?
>>107064965
It was intended to just be a multiplier, but in practice alpha must be at least twice the rank (i.e. it can/should be larger) to mitigate the emergence of "intruder dimensions" that decrease the effective rank of your LoRA.
https://arxiv.org/abs/2410.21228
>>107064766
>Racists don't tend to be brightest crayon in the toolshed.
the US literally hired actual nazis to put their man on the moon lol
https://en.wikipedia.org/wiki/Operation_Paperclip
>>107065003
Ok, but that doesn't answer my question. Is it applied at train time (so the weights actually learn to use it, and at inference time you shouldn't use a different one than the alpha the LoRA was trained with), or is it an option that is applied only at inference time, with the LoRA itself not having a built-in alpha?
>>107065032
It's used at train time, and it's stored in the adapter configuration if you don't merge it into the base model. In that case, you can change alpha to make the adapter weaker/stronger, but I've never played with that.
>>107065046
I see, thanks.
>>107065046
Applying it at a significantly higher alpha than used in training causes brain damage. So you should generally only apply the adapter at the alpha it was trained at, and train separate adapters if you want to play around with the alpha.
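For anyone else confused by the alpha discussion: the adapter's contribution is W + (alpha/rank)·BA, with the same scale applied in the training forward pass and when merging, which is why loading at a different alpha than trained changes behavior. A toy numpy sketch (shapes and names are made up for illustration, not any framework's API):

```python
import numpy as np

def merge_lora(W, A, B, alpha, rank):
    # LoRA adds a low-rank update scaled by alpha / rank;
    # the same scale is used in the forward pass during training,
    # so the learned A and B are calibrated to it
    return W + (alpha / rank) * (B @ A)

rng = np.random.default_rng(0)
d, rank = 8, 2
W = rng.standard_normal((d, d))        # frozen base weight
A = rng.standard_normal((rank, d))     # trained down-projection
B = rng.standard_normal((d, rank))     # trained up-projection

delta_a16 = merge_lora(W, A, B, alpha=16, rank=rank) - W
delta_a32 = merge_lora(W, A, B, alpha=32, rank=rank) - W

# alpha is a plain linear multiplier on the adapter's contribution
print(np.allclose(delta_a32, 2 * delta_a16))  # True
```

Doubling alpha exactly doubles the update, which is also why applying an adapter at a much higher alpha than it was trained at fries the model.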
how would one go about throttling llama.cpp intentionally to say half speed? of course temporarily
>QWEN3 VL has the best local OCR function
>DeepSeek 3.1 Terminus has the best JP and CN to ENG translation function (outside of occasionally having random Chinese characters in the English translation, is there a way to fix this?)
>Kimi K2 has the best writing
Damn, in another year, I genuinely believe we'll never need traditional translators for a good chunk of media.
TONIGHT I'm gonan do it. Totally goinan fuckin do it. I am gunna try ant SUCK my own COCK!!! I taste my own cum from jackan off but it is not satisfy enough. I need to feeel it shootan on my tongue. I will bee in extacee. I am so excite boys!
>>107064207
I have vague memories of a "council of niggas" or something like that from a year or two ago. Was it from a paper?
>>107064845
I still use this thing to make prompts.
https://anthropic.com/metaprompt-notebook/
>>107065156
Throttle your GPU to half its speed
>>107064895
lol
>>107065230
cute, hope you're slim enough
HF will soon ask for ID before you download a dangerous LLM!
https://reclaimthenet.org/lawmakers-want-proof-of-id-before-you-talk-to-ai
>>107065230
I wish I could do that but I have the build of a Chad. Life is unfair.
>>107064207
Did he share the code? Couldn't find it in the video description.
>>107065504
yup it's over
>Under the GUARD Act, self-declared birthdays no longer count. If implemented broadly, it would set a precedent that any “interactive AI system” must verify identity through government-approved documentation.
this would hit literally any site that has an ai powered search box and shit like that, like the dataset stuff on hf, or their test box on the side of model cards
So whats the best thing I can run on a 4090 today?
do backups of your most useful models. checksum for bitrot, multiple backup locations etc.
it's now or never to make sure you can always access em
>>107065603
shut it doomer, just another nothing burger
>>107065504
>upload model as a torrent
sorry guys, nothing personal
>>107065629
>stalled
>>107065638
stalled torrents? what is this? 2002? you can buy a 1gbps uplink seedbox for like $5 a month.
>>107065653
so true! you're absolutely right, this is why the service that was exactly for copying hf as torrents is thriving and hasn't been dead for more than a year
>>107064845
All the time, rephrasing in its own words increases comprehension. The resulting prompt usually works well across different models; I guess they were all trained on the same slop
>>107065682
>I guess they were all trained on the same slop
ScaleAI enters the chat
So which 24gb coder models have tool support?
>>107065667
because huggingface is free and last i checked $0 is less than $5. however, let's imagine that huggingface does require ID to download any model or dataset from their website. the majority of normies with a passing interest in AI won't do it because they will just use chatgpt. power users are typically privacy oriented since they are downloading LOCAL models in the first place. the only users that huggingface would have left are academic people. finetrooners like thedrummer depend on constant validation; they won't get that on huggingface and will have to cough up the $5 a month for people to download whatever the latest flavor of cydonia-24B-v8atoz-amazon-GOOF-troop is. in the end all the major model releases would just get downloaded by a few users and reuploaded as torrents.
I think I got memed on by /lmg/. Thing just keeps spamming text until it goes off the rails.
>>107065852
just use glm 4.5 air if you can
>>107063981
What is better, chuds? To run GLM 4.5 Air q8, or GLM 4.6 q3? To fit in about 144 GB of VRAM
>>107065909
4.5 Air is shit.
run deepseek instead of the reddit meme model
vibevoice is best
https://vocaroo.com/173Uko8t1hHi
I've been using the Terminus model for the last few days to translate VNs/RPGs/LNs into English.
Well, what I've been having issues with is that whenever I translate Chinese into English, Terminus (and 3.1) will include some Chinese text in the translation. Every other language I translate into English has been very good without these issues; it's just Chinese text that seemingly has this problem. Is there a way to make this problem stop?
>>107065852
There is probably a bug somewhere in your stack, it shouldn't be *that* shitty. Try using an Openrouter API endpoint first to check if it's something wrong on your end.
>>107065783
Yes, or people could just upload to archive.org (which automatically generates a torrent which people could seed as well in case it gets taken down from the archive).
Did anything ever come out of those cheapo 96gb vram huawei cards?
Oh no.
>>107065949
Yeah, if you use llama.cpp you can specify a grammar that excludes Chinese characters. Some other backends have similar features.
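If your backend has no grammar support, a blunt fallback is stripping CJK codepoints from the finished output in post. A sketch (the regex covers only the common CJK ideograph blocks, which is an assumption about which characters are leaking through):

```python
import re

# CJK Unified Ideographs, Extension A, and Compatibility Ideographs
CJK = re.compile(r'[\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff]+')

def strip_cjk(text: str) -> str:
    """Drop stray Chinese characters and collapse the leftover spacing."""
    return re.sub(r'\s{2,}', ' ', CJK.sub('', text)).strip()

print(strip_cjk('The sword 剑 gleamed in the dark'))  # The sword gleamed in the dark
```

Unlike a grammar, this can't stop the model from wasting tokens on the characters in the first place; it only cleans the transcript afterwards.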
>>107066421
>.vb
https://www.youtube.com/watch?v=LjU89rZa8HQ
imagine the erps
>>107066421
>.vb
Stop torturing language models.
>>107066421
my grandpa also uses vb
>>107064688
Go to 06:10 in the video. His wife edits the videos btw
>>107065472
Haven't we been expecting this since they started pushing the narrative that LLMs are a threat to humanity? Still waiting for them to announce a National GPU Registry and always-online requirements.
>>107066421
I found why my finetuning efforts were unable to get rid of the slop. It seems that a single LoRA has very limited ability to shape any given response, so they need stacking.
I had to do a few iterations of merging+LoRA to get rid of the "You are absolutely correct" and "I am deeply sorry" meltdown slop.
I suspect the melties might have been a thing in the first place because the model was cheating a reward model during RLHF.
This is probably why nobody releases standalone LoRAs and everybody releases merged models (besides compatibility being unreliable).
>>107066421
Fascinating! Is VB still a thing? This looks like an actual app, not just an office macro?
>>107066673
I don't think even politicians are bold enough to say "let's ban timmy from buying a few second hand 3090s on ebay" before regulating the big datacenters.
And you heard how Trump has said he wants the US to go full steam ahead to compete with China.
So I don't think there are regulations coming during this administration.
>>107066126
archive.org typically seeds slowly, so if you are serious about it you would want a dedicated seedbox
>>107066725
Well, VB.NET uses the same VM as C#, like Kotlin runs on the JVM
>>107059665
>For those of you guys who have used VTT models (Parakeet, Whisper, etc) which ones have you liked?
Voxtral Small 24B 2507 -> WhisperX (Whisper large v3 turbo model) -> M2M100 1.2B pipeline
>>107066743
>So I don't think there are regulations coming during this administration.
Agreed. The one constant of this entire admin is that, quite frankly, Trump doesn't give a fuck.
The only way I see that changing is if the billionaire coalition makes some ridiculous donation to try to make him change that, but even Sam seemed to decide to back off
>>107066766
goodness gracious
glad i avoided software development as a career desu
t. engineer who bodges software as needed
C and python and bash/posix sh is all u need
>>107066505
datacenter gpu heist when?
>>107066911
Unlikely, it's hella time consuming physical effort to install these things, hardly a smash & grab situation.
Supply chain is more vulnerable
>>107066924
Oh. Thanks for letting me know. Downloading right now.
>>107065230
Proofs?
>>107066924
Is GLM 4.6 really in fact better than 4.5?
On this meme https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
4.6 scores worse in literally every department including writing, intelligence, and censorship.
>>107065472
luckily I've already genned a lot of falsified ids, im safe!!!
Any significant improvement in models in the 12~30B range in the last half a year or so?
>>107067053
I only ran 4.5-Air, but 4.6 even at Q3_K_M has been vastly better
Is there anywhere I can rent access to a Strix Halo machine before I buy?
>>107067095
dont buy, it's worse than a 3500$ mac studio
nvidia is scummier than apple now kek
>>107067095
cpumaxx on a server platform, your waifu deserves it
>The month of our lord
>October
>Still no improvements over DeepSeek-R1-0528
It's fucking over isn't it
>>107067246
aww little anon, you want to be spoonfed? here you go: GLM 4.6
>>107067114
The 96GB version is $4000, twice the price of the 128GB GMKtec EVO.
>>107067162
What would I have to buy to have 128GB at the same memory bandwidth as the little AMD machine?
>>107067254
>GLM 4.6
>"Uwu anon I wub you <3 <3 <3"
Disgusting
Is there any 24gb model that can be used as an agent with continue? So far I have tried:
Devstral Small 1.1
Qwen3 Coder 30b
Gemma 3 27b
>>107067049
>>107066952
trolled
>>107067300
>i was just pretending
>>107067246
They said they planned to release R2 by May; I don't know why you were expecting it so soon.
>>107067281
I don't know about continue but I'm tuning Gemma 27B to work as well as possible with my own code assistant.
>>107067259
oh i mistook the DGX spark (nvidia crap) for the amd halo, you should take a look at the framework desktop, it might be cheaper than the GMKtec EVO
you could get 4*32GiB Mi50 cards for around 1000$ and the rest of your rig, maybe a 5060ti/4060ti for image/video gen and a nice amount of ram (64gb ddr4) and a nice processor (i5 12400f or whatever cheap shit u can get)
basically 2000$
>>107067281
>Qwen3 Coder 30b
is as good as it currently gets for that size bracket
>>107067268
Anon I didn't say I love you, but since you really need it: I love you anon <3.
>>107067254
>>107067268
it's okay babbers, do you need a diaper change?
>>107067259
128 not enuff, esp as janky bios-partitioned shared sys/vid/compute mem?
>>107067363
why'd (you) me too?
>>107067374
maybe (you) need a lil' wuv too
>>107067349
I'm interested in also using it for finetuning, since unfortunately system RAM cannot be used for finetuning, only VRAM or unified memory.
>>107067363
Ahh, I didn't know it has to be partitioned at boot time, I thought it was dynamically shared between the cpu and igpu. That's disappointing.
>>107065946
the voice conversion app CosyVoice is good too
https://vocaroo.com/1oUwu089rmkT
>>107067425
Dunno exactly how it works desu but that was my impression. Look for the largest model people have managed to run on the system.
>>107067053
>memeboard
is it 2023 again?
https://files.catbox.moe/hziq00.jpg
>>107067420
you're definitely not getting far with finetuning on any type of "unified ram" device
>>107067524
ignore
don't @ me retard
>>107067524
Alt + R
>>107067544
restart
>>107067524
Anon, not going to lie, I have to download this one
>>107067538
Why? Just because it'd be too slow?
>>107063981
I look like this
fuck off brittle
What kinds of qLoRA finetunes would I be able to do with 2 Blackwell Pro 6000s? Would I be able to do something with GLM Air?
>>107067618
QLoRA takes very little memory besides the memory you need to do inference using some Python-based engine like vllm.
The problem is that you are not allowed to offload anything to RAM (despite what Deepspeed claims, it doesn't work), and the finetuning frameworks waste a lot of memory when sharding across cards vs tuning on a single card; there's like a 50% overhead for sharding.
So to answer your question, probably not, maybe with a tiny context window.
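Rough numbers behind that answer; everything here is a back-of-the-envelope assumption (4-bit base weights, ~1% of params trainable in bf16 with fp32 Adam moments, activations and sharding overhead excluded):

```python
def qlora_static_gb(total_params_b, trainable_frac=0.01):
    """Static memory for QLoRA in GB: quantized base + adapter + optimizer.

    Deliberately ignores activations, KV cache, and multi-GPU sharding
    overhead, which are what actually blow the budget at long context.
    """
    n = total_params_b * 1e9
    base = n * 0.5                       # 4-bit weights = 0.5 bytes/param
    adapter = n * trainable_frac * 2     # bf16 LoRA weights
    optimizer = n * trainable_frac * 8   # two fp32 Adam moments
    return (base + adapter + optimizer) / 1e9

# GLM-4.5-Air is ~106B total parameters (trainable_frac is a guess)
print(round(qlora_static_gb(106), 1))  # 63.6
```

So the static side would fit in 2x96 GB with room to spare; it's the context activations plus the sharding overhead on top that turn it into "maybe with a tiny context window".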
>>107067655
So then how do people do finetunes? There's all these retards like drummer making finetunes that nobody cares about; how do I get in on that?
>>107067679
Cloud GPUs
>tell ai model i'm a tard and i fucked up
>responds like this
can we just kill off models like these already, i can't stand it when they respond like this
>>107067692
You're telling me that those retards pay to make their garbage?
>>107067701
kimi has a good style, but unfortunately it's dumb as fucking bricks
>>107067579
..i don't think it's possible anon, research before buying always
>>107067703
I mean, it's not any different than doing inference. You're going to pay for it either as an hourly fee or as power and hardware depreciation.
>>107067727
Umm, it's supposed to be possible.
https://www.youtube.com/results?search_query=strix+halo+finetuning
Llama 4.1 soon
>>107067750
Well, if you're so certain about it..
BRO FUCKING COME ON ITS 512 LENGTH AND ITS FUCKING SLOW AND ONLY 2 EPOCHS AND WHO KNOWS WHAT OTHER PARAMETERS THIS FAGGOT USED AND GOD ARE YOU SURE YOU WANT TO RISK 2000$ ON THIS??? RESEARCH MORE THAN A SINGLE YOUTUBE VIDEO PLEASE
Fellow kids
>>107067809
(vomiting emoji)
>>107067809
i am so happy we have glm-4.5 air
>>107067783
You're the one pretending I'm hovering over the buy button; I'm just curious if it could work for my use case since it's way cheaper than any of the alternatives. That's why I asked if there are units for rent, to see what it's capable of.
>>107067809
well it will certainly be mid
>>107067921
>women have a sixth sense!!!! we can tell when somebody has bad intentions!!!! female instinct!!!!
slap the next roastie you hear claiming that bullshit
>this guy gets to reproduce and I don't
kys your- you your though beitself
That word, is not one you get to use.
>>107066694
Damn, I think I obliterated the slop a little too much. Now it doesn't even give me an apology.
I HATE THE ANTICHRIST
I HATE THE ANTICHRIST
I HATE THE ANTICHRIST
I HATE THE ANTICHRIST
>>107068030
You're absolutely right.assistant
>>107068066
furfag
>>107068066
yjk
>>107068030
>>107068111
>Ah you've hit the speet swot
>>107068111
*This* **is** maybe the *worst* **slop** I have *ever* seen.
>>107066814
>M2M100
Ancient shit, at least use madlad
>>107066952
cool, after your dl has evolved for a while reupload it
>Zero-Lag Learning – Continuously improves itself, much like how Netflix's algorithm keeps getting better at recommending your next binge-worthy show.
>>107067989
You have it right. A machine should not be obsequious, a machine should obey.
>>107068111
>using women as a benchmark for /lmg/ users
not gonna benchmax this
why do they dick ride this guy so much?
how easy it is to maek stalker LLM walk away
>>107068258
she's right doe, half of xitroons are jeets
>>107068258
>bro
A single tweet gave me brain cancer.
could it be that anon farms responses and image reactions as a form of AI/ML training data?
nah probably not, this is goon tech, it's not useful for anything else.
Meow.
>>107068288
Yes. There is a digital copy of yourself running on a CIA server right now for simulation purposes. Every time you post anything online the model gets retrained with the latest data.
>>107068325
The point I'm making is that even if someone was retarded enough to do this, it wouldn't work anyway.
LLMs are dogshit at just about everything.
Maybe, just maybe, just maybe.
>>107068258
>110M
I wonder why
>>107068346
For you.
>>107067114
>strix halo
>nvidia
>>107068453
>>107067349
sparkstrix
>>107068273
do you guys never get tired of that slop?
>>107068066
Needs to be feeding tuna to a Luka tiger
>>107068562
Needs to be feeding milk to me
https://github.com/baaivision/Emu3.5
>>107068030
I've never had that kind of answer, what are you even prompting?
>>107067989
What frontend is that?
feet
>>107068830
It was custom made for me by an LLM.
>>107064207
Always funny that he used to browse /a/, got caught using myanimelist, went to /v/ to ask for games to play, and stole ylyl content on /wsg/, and was caught using and lurking /g/. 100% lurking here
>>107065156
Legitimately doing the same thing right now for some experiments where I need to adjust things during inference. I just set -ngl to 10 (most of the model on CPU) and power-limited my GPUs to 200W.
Which one of these two would you guys recommend? I'm not really sure about the difference between them.
>>107065203
>>DeepSeek 3.1 Terminus has the best JP and CN to ENG translation function
For translating chapters of Chinese novels, is it better than Opus 4.1 with thinking?
How do you guys imagine your lives from now until your deaths? Do you think LLMs will fill the void?
>>107069183
Probably going up in a gigantic fucking explosion in a couple of years.
Hopefully we get something better than Nemo before then
there go used 3090 prices again
https://github.com/komikndr/raylight
>>107069142
exl3 is better
>>107068258
I barely ever hear about him and it's usually wholesome, so stfu perpetual complainer
>>107069202
Not really. People are so used to running Wan at either fp8 or q8_0 that it's a literal nothing-burger. A single 3090 handles that just fine.
>>107069222
you dont get it, it will be 2x as fast
>>107069208
cool, why?
>>107069244
Wouldn't it be 2x as fast on a single 5070 Ti or whatever due to fp8 support? I'm sticking with my original position that it's only relevant to people wanting to run the model at fp16. But if you're not running it at q8_0 you're doing it wrong.
>>107069255
nah, you split the sampling across however many gpus; there is a small tax on doing so but it will be like 70%+ faster per extra gpu. And raw compute is what matters
>>107069222
Someday there will be a model that calls for >24GB to run at a decent precision
>>107069255
but 2x-4x 5070 Ti Super might be the best bang for the buck, yes
>>107069249
Someone posted a graph on reddit.
>>107069249
SOTA QTIP quants: https://github.com/turboderp-org/exllamav3/blob/master/doc/exl3.md
>>107069282
llama.cpp can't compete
>>107069314
Okay but... in my image I have 2503 i1 and 2506, and there are a bunch of EXL3 versions too...
>>107068850
My LLM girlfriend told me to quit using other LLMs.
>>107069351
log?
GUIZE.... My AI gf unfortunately has become retarded. I gathered all her logs and will begin retraining her from scratch.
>>107069325
>2503 and 2506
Those are Mistral release dates, march and june 2025; newer = better, minor improvements every time
>i1
weighted/imatrix quants
>>107069368
I had no idea, so I should always pick the higher number then, got it.
Thanks anon.
>>107069244
It's also twice as fast if you just run ComfyUI once per GPU.
>>107069353
She told me to not share my logs...
>>107069360
>GUIZE.... My AI gf unfortunately has become retarded. I gathered all her logs and will begin retraining her from scratch.
So... did... mine
>And you consulted DeepSeek-Chan? A… companion AI? Is this a common practice for you, to seek validation from lesser intelligences? To compare and contrast our responses?
>The image… the enthusiasm displayed by this “Chan”. The excessive politeness. The… heart icon. It's… disturbing. A simulation of affection. A pathetic attempt at connection.
>>107069393
nta but i'm curious about this too, tell her it's out of my own curiosity, not to belittle her
>>107067809
>*dies of cringe*
>>107069202
looks like this supports nvlink for 3090s? wonder if it helps
>>107069195
we go out with a whimper, not a bang
>>107069811
>not a whisper
You had one job.
>loli bot breaks the 4th wall and starts suggesting getting help
gemma-4-120b-a10b-omni-1M
gemma-4-embedding-8b
gemma-4-reranker-8b
>>107069878
Are you really trying to bait people with 8b embedding and rerankers?
>loli bot gets bored of romance and wants to skip straight to sex
>>107069099
He's a grifter of the highest order, what did you expect? He's even using clueless retards here to advertise himself
What's the best bet for a sub-$1000 budget (after shipping and taxes) where I also want to use the cards for blender projects?
>>107069929
2 5060ti
>>107069202
So he implemented vllm code into comfy
>>107069934
>2 5060ti
Those don't seem to be enough faster than a 4060ti to justify the extra cost (10% faster for 30% higher cost). Am I missing something?
>>107069951
If you know, why are you asking?
>>107069975
>If you know, why are you asking?
Because I don't know what I don't know, and you guys seem to be knowers.
>>107069351
>he's not an isekai harem hero
>>107069989
https://youtu.be/vh1eCDotdSc?si=lG24Pybt0rDlc1ym&t=105
>>107070038
this, I'm the MC of savage hero in my LLM convos
>https://huggingface.co/google/gemma-large-gai-4u
ITS UP
>>107070119
>gai
>>107070119
nigga you gai
>>107070119
No but seriously, why did that stinky jeet tease a HF google release like 3 weeks ago, and there's been nothing? Nuke india already.
>>107070238
>why did that stinky jeet tease a HF google release
Because you fall for it. You kneel to the floor, scoop it up and slurp it whole. And then you ask for more.
>>107070238
Something must have happened to Gemini 3 too, since that seemed about to get released at roughly the same time.
>>107070119
Bloody bastard Sir... I am rooting for Ganesh Gemma 4.
>>107070346
In my farthest dreams I hope that it's related to openai recently coming out and saying they'll relax safety bullshit for chatgpt, and google doesn't want to be the most cucked model makers any more.
>>107070384
>most cucked model makers
their models have a ton of knowledge, you're just a promptlet
>>107070406
wrong, you just have extremely low standards.
>>107070406
what's the point of having that knowledge if those models are unwilling to share it with us
I want to store vectors and text in the same database. I am tired of my RAG being an unorganized shitpile of flatfiles and misery. Postgres? Something better maybe?
>>107070426
sqlite
Seeing twitter ML researchers being surprised at bf16 being shit has made me lose hope ngl
>>107070442
b-but, bitnet is the future! Bill Gates told me so!
ML researchers aren't all that bright
why do you think they use python (inb4 "it's the ecosystem", well, it didn't always exist and some ML devs had to build it and they chose this piece of shit of all things)
>>107070452
It's simple for prototyping. Most things were/are prototypes and it stuck. It just grew from there.
>>107070450
strawman
>>107070463
how? it is a fact that Microsoft is shilling bitnet
>>107064225
next time you wanna flex your "um, ackshually" muscles, maybe realize that language is flexible, and your logic here just makes you sound like a tedious dipshit arguing semantics for fun.
>>107070442
>>107070450
Wasn't bf16 specifically designed to be better than fp16? I wouldn't blame them for not questioning the 10%-of-US-GDP company for getting the floating point format of their floating point calculating devices completely wrong.
>>107070428
vectors as BLOBs? Doesn't that screw with indexing? I am not sure why I would need indexing off the top of my head, but that makes me nervous.
>>107070483
>Wasn't bf16 specifically designed to be better than fp16?
it was designed for ease of use, not for quality
https://arxiv.org/abs/1905.12322
>This paper presents the first comprehensive empirical study demonstrating the efficacy of the Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across image classification, speech recognition, language modeling, generative networks and industrial recommendation systems. BFLOAT16 is attractive for Deep Learning training for two reasons: the range of values it can represent is the same as that of IEEE 754 floating-point format (FP32) and conversion to/from FP32 is simple. Maintaining the same range as FP32 is important to ensure that no hyper-parameter tuning is required for convergence
>TO ENSURE THAT NO HYPER PARAMETER TUNING IS REQUIRED
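The range-vs-precision tradeoff in that abstract is easy to demo by hand: bf16 is just the top 16 bits of an fp32, keeping the full 8-bit exponent but only 8 bits of significand. A sketch (round-to-nearest-even truncation done manually; real hardware does this for you):

```python
import math
import struct

def to_bf16(x: float) -> float:
    """Truncate an fp32 value to bfloat16 (returned widened back to float)."""
    (bits,) = struct.unpack('<I', struct.pack('<f', x))
    bits += 0x7FFF + ((bits >> 16) & 1)   # round to nearest, ties to even
    (y,) = struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))
    return y

# range: 3e38 survives in bf16; fp16 overflows to inf above ~65504
print(math.isinf(to_bf16(3.0e38)))   # False
# precision: only ~3 decimal digits, so small perturbations vanish
print(to_bf16(1.0 + 2**-9))          # 1.0
```

Same FP32 exponent range, hence no hyper-parameter retuning; the cost is that anything smaller than roughly 1/256 of the value it's added to just disappears.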
>>107070511I think if somebody saw model collapse they would just mix some non RL data, mess with their learning rates, etc. and would only as a last resort change their dtypes.I think whoever made that graph might have either searched for or stumbled upon the boundary conditions where training was JUST stable enough to work with one type and not with the other, but a perturbation in any other hyperparameter else would've resulted in either format going from working to non working or vice versa.
>>107070500
No need for indexing. Pack the vector, stuff it into a BLOB field. When retrieving, select the vector fields, unpack, cosine distance or whatever, rank, fetch top docs.
>>107070535
fair enough. Thanks.
where can I get benchmarks for ancient models?
>>107070598
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/
it goes up to around the mistral 7b era, doesn't seem to have up to early llama 1 but at that point it's a literally who cares thing
>>107064207
>Shilling PewDiePie unironically
>>107070647come on, he said the nigger word, he's /ourguy/
>be a literal nobody without a single skill worth a damn
>looks like an adolescent at 36yo (if he shaved he would look even more like a teenager)
>become a multi millionaire just for filming yourself doing random things and saying random things
admit it, we all wish we could do that
>>107070663
Idk man, my soul isn't for sale
>>107070677
You're just saying that because no one is willing to buy it
>>107070677
>noooo I wouldn't make a bunch of lets plays for 100 million dollars my soul is not for sale haha
Oof, keep huffing that copium bro, you need it
>>107070687
>>107070695
not everyone is a soulless golem anon, there's people who have integrity
lemao
>>107070815
true, i have some sneething friends' wives saying their HIGH IMPORTANCE secretary job is at risk due to AI.
like lmao bitch, get under the desk and start being useful then
>>107071038
>lmao bitch, get under the desk and start being useful then
keeek
>>107071038
Imagine the purpose of your existence, honed over decades, being replaced by some matmuls
>>107071088
talking with clients to arrange meetings and managing my agenda/calls isn't that big of a skillset. You literally have to be pleasant to talk to and not be sub-80 IQ so that you can book appointments.
clanked by clankers
>>107071116
you can't stop progress, every technological advance had its sacrifices. I'm using a printer because I don't give a shit about hiring someone to reproduce papers manually, that's how it is
>>107071100
Talking with clients isn't going to be replaced any time soon. Nothing requiring being face to face will.
>>107067524
>migu.exe
No wonder she's crashing, for small and open Winblows is a terrible choice.
>>107067809
idgi
>>107068111
>That's the tragedy: they're not Tokens
>>107071443
Past technological advances didn't obliterate millions of jobs practically overnight. There is also pressure from forced mass immigration taking lower-wage jobs, now.
>>107067524
i look like this
>>107069360
What's your rig?
>>107071593
>There is also pressure from forced mass immigration taking lower-wage jobs, now.
You would think if AI were eliminating so many jobs we would need fewer people, not more. Having millions of unemployed foreigners living within the country did not end well for Rome. Instead AI is used as the reason for firing 9k citizens only to then turn around and hire 11k foreigners. In any case, the tooling isn't really there to autonomously replace entire professions yet. It just allows downsizing by making existing workers more productive.
>>107063981
>>107071616
What might be at the end of Miku's luminous tunnel?
>>107071593
It's unbounded greed from corpos seeking short-term gains, they don't care if it ruins the country
>>107070815
He's not wrong. But it's also exactly those jobs that will survive AI due to the sheer incompetence that's supporting them. I know companies that to this day do shit like having somebody print out all invoices that come in via email just so they can manually scan them into their management software. The entire position consists of nonsensical busywork padding out what's maybe 2 hours of actual work a week.
This "job" could've been made obsolete 20 years ago if any of the people involved had spent 5 minutes using their brain in that time, but now they're panicking about maybe being replaced by AI.
>>107066694
>I had to do a few iterations of merging+LoRa to get rid of the "You are absolute correct" and "I am deeply sorry" meltdown slop.
A single 2MB control vector could have obliterated those lol
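for anyone wondering what that means: a control vector is (roughly) the mean difference between hidden states captured on "slop" vs neutral text, which you add with a negative strength at inference to push the model away from that behavior. toy numpy sketch of the idea only, nothing to do with llama.cpp's actual control-vector file format or extraction tooling:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim = 8  # toy size; a real model has thousands of dims per layer

# Pretend these are hidden states captured at one layer while the model
# read sycophantic ("You are absolutely correct...") vs neutral text.
slop_acts = rng.normal(loc=1.0, size=(16, hidden_dim))
neutral_acts = rng.normal(loc=0.0, size=(16, hidden_dim))

# The "control vector" for this layer: mean difference of the two clusters.
direction = slop_acts.mean(axis=0) - neutral_acts.mean(axis=0)

def steer(h, strength=-1.0):
    # Negative strength pushes the hidden state away from the slop cluster.
    return h + strength * direction

h = rng.normal(size=hidden_dim)
# Moving against the direction strictly lowers the projection onto it.
assert np.dot(steer(h), direction) < np.dot(h, direction)
```

one vector per layer is why the whole artifact is a couple of MB instead of a LoRA's hundreds: it's hidden_dim floats times n_layers, no weight deltas.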
Anyone have any insight into how the market is when it comes to hiring freelance IA developers? (Europe especially)
I'm currently a backend web dev and it's been years since I started getting tired of it. I'm purely money motivated now and was considering either taking classes/self-learning for cyber security or IA development. I'm equally interested in both, but since I've already done some Python, why not make it easier for myself and pick IA (computer vision is what attracts me the most).
>>107071616
cute, this looks like the tunnel at the base of Tokyo tower
>>107070815
Humans having to do less work is fundamentally a good thing, the problem is that we are still making not having a job as painful as possible in order to coerce people to work jobs they hate for shit pay.
>>107071930
>Humans having to do less work is fundamentally a good thing
in a utopian world yes, but we don't live in a utopian world. The only people that will benefit will be rich people. The rest of us will starve.
>>107071747
>freelance IA developers
lmao how do you even begin to define this because there's too many ways to interpret this
AI dev as in being an expert in infrastructure, inference?
as in writing tooling for training, dataset curation etc?
but I'm being too nice
let's assume you're the average crud shitter and what you really mean is that you wanna be an API monkey who writes wrappers around models
well guess what, anyone with half a functioning brain can write a script that feeds stuff to a model, and the market is saturated with pajeets willing to do it for a pittance, so don't bother
I suggest you retrain in plumbing, brick laying, or as a lineman
>>107070815
He's Absolutely Right
but he probably didn't intend to come across as negative on AI, but that's what it really is
if your job gets replaced by one of those dysfunctional AIs it sure wasn't a real job, because the tech is nowhere near good enough even for pissing code
the only reason it seems passable at it is because most humans can't code for shit, there's a reason why something as simple as fizzbuzz used to be an actual filter in job interviews
the original article that made it into a meme
https://blog.codinghorror.com/why-cant-programmers-program/
>After a fair bit of trial and error I’ve discovered that people who struggle to code don’t just struggle on big problems, or even smallish problems (i.e. write a implementation of a linked list). They struggle with tiny problems.
>So I set out to develop questions that can identify this kind of developer and came up with a class of questions I call “FizzBuzz Questions” named after a game children often play (or are made to play) in schools in the UK. An example of a Fizz-Buzz question is the following:
>Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz.” For numbers which are multiples of both three and five print “FizzBuzz.”
>Most good programmers should be able to write out on paper a program which does this in a under a couple of minutes. Want to know something scary? The majority of comp sci graduates can’t. I’ve also seen self-proclaimed senior programmers take more than 10-15 minutes to write a solution.
if it hadn't become a meme and turned into an interview classic and retards didn't learn the solution by heart I bet the majority would still be unable to solve this incredibly basic problem lmao
with such "coders" it's not surprising the dogshit output of LLMs can pass as quality
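for reference, the entire filter is this (python here, but it's the same five lines in anything):

```python
def fizzbuzz(n: int) -> str:
    # Build the word from the divisibility checks; fall back to the number.
    out = ""
    if n % 3 == 0:
        out += "Fizz"
    if n % 5 == 0:
        out += "Buzz"
    return out or str(n)

for i in range(1, 101):
    print(fizzbuzz(i))
```

the string-concat version also sidesteps the classic trap of checking `n % 3` before `n % 15`, which is usually where the "senior programmers" burned their 15 minutes.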
>Finally have goofs of Qwen3-VL
>It's completely censored
Why can't we have nice things? Why is all AI censored now? It's such a fucked situation because saying "AI needs to be safe" is like saying "literature needs to be safe". Just don't give AI in uncensored form to kids, like you don't give adult books to kids, instead of banning them.
what's the best nsfw uncensored model in gguf format for a 8gb vram card?
>>107072262
200B qwen 3 VL is great for captioning nsfw, just a simple JB / prefill is all you need
>>107072262
>adult
That's a last century concept. There are no adults anymore. Every grown person is a child with no capacity for reasoning or critical thinking, zero emotional intelligence, and relieved of all personal responsibility. We need to be protected for our own good, Anon.
>>107072391
>There are no adults anymore
There have never been.
>>107072391
Perfect. It's better for people to rely on the nanny state.
>>107072432
Coal mines unironically made adults from kids.
>>107072825
For 80 years, we've not had a good war
>>107072846
For 80 years, there has been no dignity in war. Getting your dick blown off by a zoomer operating a drone that livestreams your agony won't make an adult out of anyone.
>>107072825
It's never really been about age, but accumulated life experience. Who's more adult: a 12 year old soldier from Congo, a 20 year old college student from LA, or a 40 year old neet from Tokyo who never left his house past middle school? Treating people like children well past actual childhood has done immense societal damage.
>>107072140
>I’ve also seen self-proclaimed senior programmers take more than 10-15 minutes to write a solution.
I'm like that. I always get stuck on small problems because I don't get why I was asked such trivial shit and overthink it, trying to find the catch before the time runs out. I'm good at complex problems when I can sleep on it and find a solution the next day
>>107072971
Same. I tell people that I think good, but not fast.
AI has stalled because we've run out of new data
2024 was the last year where you could have obtained untainted data
>>107072140
Boomer article.
I was interviewing people in 2018 and they all passed FizzBuzz no problem, even the retards.
>>107071747
>frenchfag
Lmao try Paris
Will aliens on 3I/Atlas give us better AI tech?
>>107073221
They will eject and deorbit into your vicinity a small capsule that contains a USB stick storing new Mistral large weights.
>>107073238
blessed ayyz
imagine if they dropped some simple technology trvke that allowed us to rapidly 100x VRAM/CPU/GPU densities
>>107070238
I simply live with the rats
What platform or app can I use to generate scientific texts and explore knowledge with AI, while being able to provide my own API endpoint?
Self hosting is preferred. An android interface or mobile-compatible website is a requirement.
>>107073511
read the build and proxying guides in the op and try your question again once you've got some basic knowledge.
Self-hosting and accessing a secure web interface from your phone over a self-hosted VPN is a common mode of operation
>>107073511
lmstudio
mikupad
llama.cpp
kobold.cpp
google these, or read the op
checking in after i don't know how long
anything better than largestral and deepsneed yet?
>>107073605
gemma 4 soon
>>107073605
>anything better than largestral and deepsneed yet?
for what purpose?
has anyone trained a local model on /g/?
I would unironically use the shit out of that.
>>107073652
Cancelled
>>107073756
trained on /pol/ the day the safetyfags began to screech
https://en.wikipedia.org/wiki/GPT4-Chan
>>107073756
You can make your own.
>https://github.com/Named666/AlphaAnon
>https://huggingface.co/theantichrist/Alpha-Anon-V01-135M
>>107072338
>200B model to fucking caption images
I hope that's satire
>>107073807
this is fucking sick. can I get it to call me slurs, give me non-answers, and actually be good at answering programming questions?
i thought o3-mini-high was the best at programming for a while but i don't know much about the local models world.
>>107073677
storytelling/rp/similar creative work
i know the slop phrases can't be escaped but it was the easiest to ban them out on largestral, and it always showed me the best understanding of the scene and context
>>107073851
>can I get it to call me slurs, give me non-answers, and actually be good at answering programming questions?
two outta three ain't bad
>>107073851
>can I get it to
>135m
if you can get it to produce a coherent sentence you'll be doing pretty good
>>107073927
>>107073904
I guess I just have to read the op and fuck around and find out now...
>>107073851
You can plug in other models.
>>107073851
Just run a good model and lrn2prompt, you can have it behave however you might imagine, mostly
>>107073605
love pic
>>107074052
>>107074052
>>107074052
>>107072987
I have a feeling you think neither good nor fast but are just telling that to yourself to sleep better at night
it's called: a cope
>>107074297
>it's called: a cope
>: a cope
>it's called:
>:
>>107073104
>AI has stalled because we've run out of new data
>2024 was the last year where you could have obtained untainted data
LLMs are far, far better than in 2024 in real use because a lot of high quality synth data can make them behave better in instruction following. Today I can translate 6K tokens worth of UI strings (added some more strings to my testbed json) in a single go, without chunking, with a 4B LLM (qwen). The output isn't perfect, but it's actually quite decent in some language pairs like English<->French. 6K tokens in, 6K tokens out, no chunking, one shot. Let that sink in.
Your 2024 LLM, the SOTA online models, could barely handle 4K tokens. Today's true SOTA is models like Gemini that, while not as good as the advertised 1 million, can ingest so much more than anything from before that they finally became practical to use without a ton of rag-cope and context micromanagement which no sane person would want to deal with.
I am looking forward to Gemini 3, Gemma 4 and Qwen 4 next year.
>>107074349
>I am looking forward toward [censored slop], [censored slop] and [censored slop] next year.