/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107056325 & >>107044779

►News
>(10/30) Qwen3-VL support merged: https://github.com/ggml-org/llama.cpp/pull/16780
>(10/30) Kimi-Linear-48B-A3B released with hybrid linear attention: https://hf.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
>(10/28) Brumby-14B-Base released with power retention layers: https://manifestai.com/articles/release-brumby-14b
>(10/28) NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 released: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16
>(10/28) LFM2-ColBERT-350M released: https://hf.co/LiquidAI/LFM2-ColBERT-350M

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107056325

--VRAM vs RAM tradeoffs and cost-effective upgrades:
>107057422 >107057493 >107057523 >107057538 >107057627 >107057641 >107057680 >107057892 >107057904 >107058132 >107058211 >107058235 >107058246 >107058291 >107058301 >107058332 >107058823 >107057647 >107060695
--Tech Mahindra's 1 trillion parameter LLM project sparks mixed reactions:
>107061935 >107062055 >107061978 >107062154 >107062174
--Multi-GPU memory optimization latency tradeoffs for MoE models:
>107062861 >107062880 >107062891 >107062902 >107062941 >107063023 >107062887 >107062939 >107062947 >107063018 >107062980 >107063165 >107063110
--VTT model comparisons and pipeline suggestions for transcription:
>107059665 >107059817 >107059845 >107059918 >107059961 >107060178 >107060224 >107062756 >107062842 >107062859
--Qwen 4B's performance in complex JSON generation and small LLM advancements:
>107057926 >107058153 >107058218
--Qwen 4b's multi-image analysis capabilities demonstrated:
>107060687
--SillyTavern system prompt configuration challenges:
>107062184 >107062200 >107062327 >107062369 >107062386 >107062492
--Exploring practical uses for local image processing and interactive applications:
>107056358 >107056482 >107056509 >107056541 >107056576 >107056554
--Challenges with TabbyAPI and Qwen3 Coder tool calling implementation:
>107058354 >107058385 >107058840 >107059067 >107059694 >107062455
--Skepticism about LLaDA2.0's practical value due to performance and context limitations:
>107060705 >107060731 >107060818
--UI/lorebook integration challenges and code accessibility in STScript:
>107057009 >107057036 >107057083 >107057101 >107057121 >107057162 >107057240
--Miku, Rin, and Dipsy (free space):
>107056696 >107057940 >107057943 >107059568 >107059860 >107060222 >107060637 >107060674 >107061256 >107062726 >107061898

►Recent Highlight Posts from the Previous Thread: >>107056334

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
i see... :(
>>
>>107064100
I don't
>>
https://youtu.be/qw4fDU18RcU
>>
Do you guys know what I realized? No matter how far you go, you're still somewhere and never nowhere, so saying I am in the middle of nowhere is a nonsensical sentence.
>>
>>107064207
so he uses vLLM in a docker container (hence needing the shm-size) and runs qwen 235B in AWQ 4-bit
>>
All of his knowledge is ironically coming from LLMs. I'm sure he has also browsed /lmg/ in the past at least. You could probably find his retarded questions.
>>
>>107064207
pretty disappointing, he was pretty based up to this point
>>
>>107064207
>watch the first few mins
>the topic of the title doesn't even get mentioned at all
>>
>>107064207
cool Web UI
>>
File: 1743424257788609.png (420 KB, 1074x872)
>>107064207
>it's actually a video about shitting on cloud models and shilling self-hosted models
how can one man be so based?
>>
Gguf status?
>>
>>107064207
Ok watched the whole video.
Wtf he's one of us.
>>
>>107064275
>I'm sure he has also browsed /lmg/ in the past at least.
I doubt it because he actually complimented gpt-oss
>>
>>107064392
anti-AI people will still use just the thumbnail to claim he's against all AI tho
>>
>>107064207
Fuck this fag, I bet he even lurks ITT. His whole persona is so rage inducing.
https://youtu.be/7OiMxGwmdto?si=kvdyA0QWdV6rZ_3k
>>
>>107064493
>Wtf he's one of us.
No shit. He says the word nigger all the time.
>>
>>107064510
There is one retard here that regularly praises gpt-oss. Maybe it's him.
>>
>>107064629
don't slander, he said it once in a moment of rage
>>
>>107064663
we must agree
>>
>>107064688
I've seen some tiktok clips of him where he made some implicit remarks showing he's a white nationalist. That's a reason why he decided to go to Japan: not just because of "uwu kawaii desu ne", but because the country is extremely racist and nationalist
>>
>>107064207
>video about local AI from e-celeb #16311498
>no ollamao in sight
i was going to tell you to fuck off but nevermind, i like the guy
>>
>>107064735
but wouldn't he be subject to that racism? he is not Japanese
>>
>>107064736
I wish I had the money to play around with a VLLM capable rig
>>
>>107064742
Racists don't tend to be brightest crayon in the toolshed.
>>
>>107064742
everyone in the world know who pewdiepie is, I think the japanese people are happy he's here
>>
>>107064766
Ahah so true kind stranger, take this kind gold and upvote with you!
>>
File: 1736602330158898.png (491 KB, 1100x733)
>>107064766
the richest man in the history of humanity is a "nazi" though, how is that not bright?
>>
>>107064830
he can be rich and a dumbass at the same time
>>
Do you guys ever use models to edit or write your prompts? I'm trying it a bit but desu it's hard to tell if it's an improvement or not
>>
>>107064830
>lifting your hand at an angle is... le nazi
>>
>>107064742
why would the japanese hate him?
he's not one of the pajeet or third worlder migrants wanting to shit up the place
>>
>>107064742
I don't think japanese people mind white people, they know what they are worth
>>
>>107064845
Yes, it's useful when for example you want to define character behavior more in detail but you can't be assed to write the entire prompt yourself from scratch. It's also best when the entire prompt is dedicated to the character. For non-RP uses, LLM-driven recursive prompt-refining is also a thing: https://arxiv.org/abs/2507.19457
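
The simplest version of that refining loop looks something like the sketch below, assuming an OpenAI-compatible local endpoint (llama-server on localhost:8080 here); the prompts, model name and round count are illustrative, not the paper's actual algorithm:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

def ask(prompt: str) -> str:
    r = client.chat.completions.create(
        model="local",  # llama-server accepts any model name
        messages=[{"role": "user", "content": prompt}],
    )
    return r.choices[0].message.content

prompt = "Write a terse character card for a grumpy blacksmith."
for _ in range(3):  # a few refinement rounds
    output = ask(prompt)
    prompt = ask(
        "Here is a prompt and the output it produced.\n\n"
        f"PROMPT:\n{prompt}\n\nOUTPUT:\n{output}\n\n"
        "Rewrite the prompt so the output better matches the intent. "
        "Return only the improved prompt."
    )
print(prompt)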
>>
>>107064845
>it's hard to tell if it's an improvement or not
Then consider time and effort, however much or little that is.
>>
>>107064845
Oh yeah. Mostly for brainstorming than anything, since the final version is always heavily edited by me.
>>
Can someone explain to me if alpha changes something about the training process or it ONLY changes the multiplier at inference time? (yes, sorry, I'm too lazy to read the actual paper)
>>
>>107064766
would you say that about blm?
>>
File: effective-rank.png (143 KB, 750x417)
>>107064965
It was intended to just be a multiplier, but in practice, alpha must be at least twice the rank (=it can/should be larger) to mitigate the emergence of "intruder dimensions" that decrease the effective rank of your LoRA.

https://arxiv.org/abs/2410.21228
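
In PEFT terms the heuristic is just one line; a minimal sketch (the base model name and target modules are placeholders):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder

rank = 16
config = LoraConfig(
    r=rank,
    lora_alpha=2 * rank,  # the alpha >= 2*rank heuristic discussed above
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()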
>>
File: kek.png (60 KB, 2175x207)
>>107064766
>Racists don't tend to be brightest crayon in the toolshed.
the US literally hired actual nazis to put their man on the moon lol
https://en.wikipedia.org/wiki/Operation_Paperclip
>>
>>107065003
Ok but that doesn't answer my question. Is it applied at train time (so the weights actually learn to use it, and at inference time you shouldn't use a different alpha than the one the lora was trained with), or is it an option applied only at inference time, with the lora itself having no built-in alpha?
>>
>>107065032
It's used at train time, and it's stored in the adapter configuration if you don't merge the adapter into the base model. In that case, you can change alpha to make the adapter weaker/stronger, but I've never played with that.
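
For a standard PEFT adapter, LoRA's update is scaled by lora_alpha / r, so changing alpha after the fact is just editing adapter_config.json before loading. A minimal sketch (the adapter directory is hypothetical):

import json
from pathlib import Path

adapter_dir = Path("my-lora-adapter")  # hypothetical adapter directory
cfg = json.loads((adapter_dir / "adapter_config.json").read_text())

r, alpha = cfg["r"], cfg["lora_alpha"]
print(f"trained scaling = alpha/r = {alpha / r:.2f}")

cfg["lora_alpha"] = alpha * 0.5  # halve the adapter's effect
(adapter_dir / "adapter_config.json").write_text(json.dumps(cfg, indent=2))
# PeftModel.from_pretrained(base, adapter_dir) now applies half the trained
# scaling; large deviations from the trained alpha tend to degrade quality.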
>>
>>107065046
I see, thanks.
>>
>>107065046
Applying it at a significantly higher alpha than used in training causes brain damage. So you should generally only apply the adapter at the alpha it was trained at and then just train separate adapters if you want to play around with the alpha.
>>
how would one go about throttling llama.cpp intentionally to say half speed? of course temporarily
>>
>QWEN3 VL has the best local OCR function
>DeepSeek 3.1 Terminus has the best JP and CN to ENG translation function (Outside of occasionally having random Chinese characters in the English translation, is there a way to fix this?)
>Kimi k2 has the best writing

Damn, in another year, I genuinely believe we'll never need traditional translators for a good chunk of media.
>>
TONIGHT I'm gonan do it. Totally goinan fuckin do it. I am gunna try ant SUCK my own COCK!!! I taste my own cum from jackan off but it is not satisfy enough. I need to feeel it shootan on my tongue. I will bee in extacee. I am so excite boys!
>>
File: file.png (619 KB, 2442x1476)
>>107064207
I have vague memories of a "council of niggas" or something like that from a year or two ago. Was it from a paper?
>>
>>107064845
I still use this thing to make prompts.
https://anthropic.com/metaprompt-notebook/
>>
>>107065156
Throttle your GPU to half its speed
>>
>>107064895
lol
>>
>>107065230
cute, hope you're slim enough
>>
HF will soon ask for ID before you download a dangerous LLM!
https://reclaimthenet.org/lawmakers-want-proof-of-id-before-you-talk-to-ai
>>
>>107065230
I wish I could do that but I have the build of a Chad. Life is unfair.
>>
>>107064207
Did he share the code? Couldn't find it in the video description.
>>
>>107065472
yup it's over
>Under the GUARD Act, self-declared birthdays no longer count. If implemented broadly, it would set a precedent that any “interactive AI system” must verify identity through government-approved documentation.
this would hit literally any site that has an ai powered search box and shit like that, like the dataset stuff on hf, or their test box on the side of model cards
>>
So what's the best thing I can run on a 4090 today?
>>
do backups of your most useful models. checksum for bitrot, multiple backup locations etc.
it's now or never to make sure you can always access em
>>
>>107065603
shut it doomer just another nothing burger
>>
>>107065504
>upload model as a torrent
sorry guys, nothing personal
>>
>>107065629
>stalled
>>
>>107065638
stalled torrents? what is this? 2002? you can buy a 1gbps uplink seedbox for like $5 a month.
>>
>>107065653
so true! you're absolutely right this is why the service that was exactly for copying hf as torrents is thriving and hasn't been dead for more than a year
>>
>>107064845
All the time, rephrasing in its own words increases comprehension. The resulting prompt usually works well across different models, I guess they were all trained on the same slop
>>
>>107065682
>I guess they were all trained on the same slop
ScaleAI enters the chat
>>
File: Help.png (30 KB, 657x527)
So which 24gb coder models have tool support?
>>
>>107065667
because huggingface is free and last i checked $0 is less than $5. however, let's imagine that huggingface does require ID to download any model or dataset from their website. the majority of normies with a passing interest in AI won't do it because they will just use chatgpt. power users are typically privacy oriented since they are downloading LOCAL models in the first place. the only users huggingface would have left are academic people. finetrooners like thedrummer depend on constant validation; they won't get that on huggingface anymore and will have to cough up the $5 a month for people to download whatever the latest flavor of cydonia-24B-v8atoz-amazon-GOOF-troop is. in the end all the major model releases would just get downloaded by a few users and reuploaded as torrents.
>>
File: SuchJoy.png (169 KB, 1522x973)
I think I got memed on by /lmg/. The thing just keeps spamming text until it goes off the rails.
>>
>>107065852
just use glm 4.5 air if you can
>>
>>107063981
What is better, chuds? To run GLM 4.5 Air q8, or GLM 4.6 q3? To fit in about 144 GB of VRAM
>>
>>107065909
4.5 Air is shit.
>>
run deepseek instead of the reddit meme model
>>
vibevoice is best
https://vocaroo.com/173Uko8t1hHi
>>
I've been using the Terminus model for the last few days to translate VNs/RPGs/LNs into English.
Well, what I've been having issues with is that, whenever I translate Chinese into English, Terminus (And 3.1) will include some Chinese text in the translation. Every other language I translate into English has been very good without these issues, it's just Chinese text that seemingly has this problem. Is there a way to make this problem stop?
>>
>>107065852
There is probably a bug somewhere in your stack, it shouldn't be *that* shitty. Try using an Openrouter API endpoint first to check if it's something wrong on your end.
>>
>>107065783
Yes, or people could just upload to archive.org (which automatically generates a torrent which people could seed as well in case it gets taken down from the archive).
>>
Did anything ever come out of those cheapo 96gb vram huawei cards?
>>
File: Argumentfail.png (193 KB, 1454x973)
Oh no.
>>
>>107065949
Yeah if you use llama.cpp you can specify a grammar that excludes Chinese characters. Some other backends have similar features.
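
A sketch against llama-server's /completion endpoint (assuming it's running on localhost:8080): the GBNF grammar below bans the CJK Unified Ideographs block (U+4E00-U+9FFF) via a negated character class; extend the range if you also want to exclude kana or other blocks:

import requests

# raw string so the \u escapes are parsed by the GBNF parser, not by Python
GRAMMAR = r"root ::= [^\u4e00-\u9fff]*"

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "Translate the following to English: ...",  # your prompt here
        "grammar": GRAMMAR,
        "n_predict": 512,
    },
)
print(resp.json()["content"])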
>>
File: 1758230453392848.jpg (14 KB, 469x484)
>>107066421
>.vb
>>
https://www.youtube.com/watch?v=LjU89rZa8HQ
imagine the erps
>>
>>107066421
>.vb
Stop torturing language models.
>>
>>107066421
my grandpa also uses vb
>>
>>107064688
Go to 06:10 in the video. His wife edits the videos btw
>>
>>107065472
Haven't we been expecting this since they started pushing the narrative that LLMs are a threat to humanity? Still waiting for them to announce a National GPU Registry and always-online requirements.
>>
>>107066421
I found why my finetuning efforts were unable to get rid of the slop. It seems that a single LoRa has very limited abilities to shape any given response, so they need stacking.
I had to do a few iterations of merging+LoRa to get rid of the "You are absolute correct" and "I am deeply sorry" meltdown slop.
I suspect the melties might have been a thing in the first place because of the model cheating a reward model during RLHF.
This is probably why nobody releases standalone LoRas and everybody releases merged models (besides compatibility being unreliable).
>>
>>107066421
Fascinating! Is VB still a thing? This looks like an actual app, not just an office macro?
>>
>>107066673
I don't think even politicians are bold enough to say "let's ban timmy from buying a few second hand 3090s on ebay" before regulating the big datacenters.
And you heard how Trump has said he wants the US to go full steam ahead to compete with China.
So I don't think there are regulations coming during this administration.
>>
>>107066126
archive.org typically seeds slowly, so if you are serious about it you would want a dedicated seedbox
>>
>>107066725
Well VB.Net uses the same VM as C#
Like Kotlin runs on the JVM
>>
>>107059665
>For those of you guys who have used VTT models (Parakeet, Whisper, etc) which ones have you liked?
Voxtral Small 24B 2507 -> WhisperX (Whisper large v3 turbo model) -> M2M100 1.2B pipeline
>>
>>107066743
>So I don't think there are regulations coming during this administration.
Agreed. The one constant of this entire admin is that, quite frankly, Trump doesn't give a fuck
The only way I see that changing is if the billionaire coalition makes some ridiculous donation to try to make him change that, but even Sam seemed to decide to back off
>>
>>107066766
goodness gracious
glad i avoided software development as a career desu
t. engineer who bodges software as needed
C and python and bash/posix sh is all u need
>>
>>107066505
datacenter gpu heist when?
>>
>>107066911
Unlikely, it's hella time consuming physical effort to install these things, hardly a smash & grab situation
Supply chain is more vulnerable
>>
>>107066924
Oh. Thanks for letting me know. Downloading right now.
>>
>>107065230
Proofs?
>>
File: rivermind.png (232 KB, 1536x841)
>>107066924
>>
Is GLM 4.6 really in fact better than 4.5?
On this meme https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
4.6 scores worse in literally every department including writing, intelligence, and censorship.
>>
>>107065472
luckily I've already genned a lot of falsified ids, im safe!!!
>>
Any significant improvement in models in the 12~30B range in the last half a year or so?
>>
>>107067053
I only ran 4.5-Air, 4.6 even at Q3_K_M has been vastly better
>>
Is there anywhere I can rent access to a Strix Halo machine before I buy?
>>
>>107067095
dont buy, it's worse than a 3500$ mac studio
nvidia is scummier than apple now kek
>>
>>107067095
cpumaxx on a server platform your waifu deserves it
>>
>The month of our lord
>October
>Still no improvements over DeepSeek-R1-0528
It's fucking over isn't it
>>
>>107067246
aww little anon, you want to be spoonfed? here you go: GLM 4.6
>>
>>107067114
The 96GB version is $4000, twice the price of the 128GB GMKtec EVO.

>>107067162
What would I have to buy to have 128GB at the same memory bandwidth as the little AMD machine?
>>
>>107067254
>GLM 4.6
>"Uwu anon I wub you <3 <3 <3"
Disgusting
>>
Is there any 24gb model that can be used as an agent with continue? So far I have tried:
Devstral small 1.1
Qwen3 Coder 30b
Gemma 3 27b
>>
>>107067049
>>107066952
trolled
>>
>>107067300
>i was just pretending
>>
>>107067246
They said they planned to release R2 by May, I don't know why you were expecting it so soon.
>>
>>107067281
I don't know about continue but I'm tuning Gemma 27B to work as well as possible with my own code assistant.
>>
>>107067259
oh i mistook the DGX spark (nvidia crap) for the amd halo, you should take a look at the framework desktop, it might be cheaper than the GMKtec EVO
you could get 4x 32GiB Mi50 cards for around $1000, then build the rest of the rig around them: maybe a 5060ti/4060ti for image/video gen, a decent amount of ram (64gb ddr4) and a cheap processor (i5 12400f or whatever cheap shit u can get)
basically $2000
>>
File: IMG_8764.png (2.52 MB, 1024x1536)
>>
>>107067281
>Qwen3 Coder 30b
is as good as it currently gets for that size bracket
>>
>>107067268
Anon I didn't say I love you, but since you really need it: I love you anon <3.
>>
File: stinkyween.png (102 KB, 820x462)
>>107067254
>>107067268
it's okay babbers do you need a diaper change?
>>107067259
128GB not enough, especially since it's jankily partitioned in the BIOS into shared system/video/compute memory?
>>
File: file.png (11 KB, 383x60)
>>107067363
why'd (you) me too?
>>
>>107067374
maybe (you) need a lil' wuv too
>>
>>107067349
I'm interested in also using it for finetuning, since unfortunately system ram cannot be used for finetuning, only vram or unified memory.
>>
>>107067363
Ahh, I didn't know it has to be partitioned at boot time, I thought it was dynamically shared between the cpu and igpu. That's disappointing.
>>
>>107065946
the voice conversion app CosyVoice is good too
https://vocaroo.com/1oUwu089rmkT
>>
>>107067425
Dunno exactly how it works desu but that was my impression. Look for what's the largest model people have managed to run on the system
>>
>>107067053
>memeboard
is it 2023 again?
>>
File: mig2.png (176 KB, 319x319)
https://files.catbox.moe/hziq00.jpg
>>
>>107067420
you're definitely not getting far with finetuning on any type of "unified ram" device
>>107067524
ignore
>>
don't @ me retard
>>
>>107067524
Alt + R
>>
>>107067544
restart
>>
>>107067524
Anon, not going to lie. I have to download this one
>>
>>107067538
Why? Just because it'd be too slow?
>>
>>107063981
I look like this
>>
fuck off brittle
>>
What kinds of qLoRA finetunes would I be able to do with 2 Blackwell Pro 6000s? Would I be able to do something with GLM Air?
>>
>>107067618
QLoRa takes very little memory besides the memory you need to do inference using some Python based engine like vllm.
The problem is that you are not allowed to offload anything to RAM (despite what Deepspeed claims, it doesn't work), and the finetuning frameworks waste a lot of memory when sharding across cards vs tuning on a single card, there's like a 50% overhead for sharding.
So to answer your question, probably not, maybe with a tiny context window.
>>
File: G4lNCgBaoAE42jH.jpg (437 KB, 1391x2048)
>>
>>107067655
So then how do people do finetunes? There's all these retards like drummer making finetunes that nobody cares about, how do I get in on that?
>>
>>107067679
Cloud GPUs
>>
File: makeitstop.png (13 KB, 442x91)
>tell ai model i'm a tard and i fucked up
>responds like this
can we just kill off models like these already, i can't stand it when they respond like this
>>
>>107067692
You're telling me that those retards pay to make their garbage?
>>
>>107067701
kimi has a good style, but unfortunately it's dumb as fucking bricks
>>
>>107067579
..i dont think it's possible anon, research before buying always
>>
>>107067703
I mean, it's not any different than doing inference. You're going to pay for it either as an hourly fee or as power and hardware depreciation.
>>
>>107067727
Umm it's supposed to be possible.
https://www.youtube.com/results?search_query=strix+halo+finetuning
>>
Llama 4.1 soon
>>
>>107067750
Well, if you're so certain about it..
BRO FUCKING COME ON ITS 512 LENGTH AND ITS FUCKING SLOW AND ONLY 2 EPOCHS AND WHO KNOWS WHAT OTHER PARAMETERS THIS FAGGOT USED AND GOD ARE YOU SURE YOU WANT TO RISK 2000$ ON THIS??? RESEARCH MORE THAN A SINGLE YOUTUBE VIDEO PLEASE
>>
File: 1736261720932471.png (80 KB, 1237x523)
Fellow kids
>>
>>107067809
(vomiting emoji)
>>
>>107067809
i am so happy we have glm-4-5 air
>>
>>107067783
You're the one pretending I'm hovering over the buy button, I'm just curious if it could work for my use case since it's way cheaper than any of the alternatives. That's why I asked if there are units for rent, to see what it's capable of.
>>
>>107067809
well it will certainly be mid
>>
>>107067921
>women have a sixth sense!!!! we can tell when somebody has bad intentions!!!! female instinct!!!!
slap the next roastie you hear claiming that bullshit
>this guy gets to reproduce and I don't
>>
kys your-
you your
though
beit
self
>>
That word, is not one you get to use.
>>
File: no more apologies.png (93 KB, 1077x918)
>>107066694
Damn, I think I obliterated the slop a little too much. Now it doesn't even give me an apology.
>>
I HATE THE ANTICHRIST
I HATE THE ANTICHRIST
I HATE THE ANTICHRIST
I HATE THE ANTICHRIST
>>
>>107068030
You're absolutely right.assistant
>>
File: tiggu.jpg (204 KB, 1024x1024)
>>
>>107068066
furfag
>>
>>107068066
yjk
>>
File: lmg.png (159 KB, 1917x939)
>>107068030
>>
>>107068111
>Ah you've hit the speet swot
>>
>>107068111
*This* **is** maybe the *worst* **slop** I have *ever* seen.
>>
>>107066814
>M2M100
Ancient shit, at least use madlad
>>
>>107066952
cool after your dl has evolved for a while reupload it
>Zero-Lag Learning – Continuously improves itself, much like how Netflix’s algorithm keeps getting better at recommending your next binge-worthy show.
>>
>>107067989
You have it right. A machine should not be obsequious, a machine should obey.
>>
>>107068111
>using woman as a benchmark for /lmg/ users
not gonna benchmax this
>>
File: Capture.png (151 KB, 585x578)
why do they dick ride this guy so much?
>>
File: file.png (146 KB, 992x598)
how easy it is to maek stalker LLM walk away
>>107068258
she's right doe, half xitroons are jeets
>>
>>107068258
>bro
A single tweet gave me brain cancer.
>>
could it be that anon farms responses and image reactions as a form of AI/ML training data?
nah probably not, this is goon tech it's not useful for anything else.
>>
Meow.
>>
>>107068288
Yes. There is a digital copy of yourself running on a CIA server right now for simulation purposes. Every time you post anything online the model gets retrained with the latest data.
>>
>>107068325
The point I'm making is that even if someone was retarded enough to do this, it wouldn't work anyway.
LLMs are dogshit at just about everything.
Maybe, just maybe, just maybe.
>>
>>107068258
>110M
I wonder why
>>
>>107068346
For you.
>>
>>107067114
>strix halo
>nvidia
>>
>>107068453
>>107067349
spark
strix
>>
>>107068273
do you guys never get tired of that slop
>>
>>107068066
Needs to be feeding tuna to a Luka tiger
>>
>>107068562
Needs to be feeding milk to me
>>
https://github.com/baaivision/Emu3.5
>>
>>107068030
I've never had that kind of answer, what are you even prompting?
>>
>>107067989
What frontend is that?
>>
feet
>>
>>107068830
It was custom made for me by an LLM.
>>
>>107064207
Always funny that he used to browse /a/, got caught using MyAnimeList, went to /v/ to ask for games to play, stole YLYL content from /wsg/, and was caught using and lurking /g/. 100% lurking here
>>
>>107065156
Legitimately doing the same thing right now for some experiments where I need to adjust things during inference. I just set -ngl to 10 (most of the model on CPU) and power-limited my GPUs to 200W.
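
Roughly like this, as an NVIDIA-only sketch (the 200 W cap, the 350 W restore value and the model path are placeholders; nvidia-smi -pl needs root):

import subprocess

subprocess.run(["sudo", "nvidia-smi", "-pl", "200"], check=True)  # cap board power at 200 W
try:
    subprocess.run([
        "llama-server",
        "-m", "model.gguf",  # placeholder model path
        "-ngl", "10",        # offload only 10 layers; the rest stays on CPU
    ], check=True)
finally:
    subprocess.run(["sudo", "nvidia-smi", "-pl", "350"], check=True)  # restore your card's default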
>>
File: 1741996797453824.png (10 KB, 342x51)
Which one of these two would you guys recommend? I'm not really sure about the difference between them.
>>
>>107065203
>>DeepSeek 3.1 Terminus has the best JP and CN to ENG translation function

For translating chapters of Chinese novels, is it better than Opus 4.1 with thinking?
>>
How do you guys imagine your lives from now until your deaths? Do you think LLMs will fill the void?
>>
>>107069183
Probably going up in a gigantic fucking explosion in a couple of years
Hopefully we get something better than Nemo before then
>>
there goes used 3090 prices again
https://github.com/komikndr/raylight
>>
>>107069142
exl3 is better
>>
>>107068258
I barely ever hear about him and it's usually wholesome so stfu perpetual complainer
>>
>>107069202
Not really. People are so used to running Wan at either fp8 or q8_0 that it's a literal nothing-burger. A single 3090 handles that just fine.
>>
>>107069222
you dont get it, it will be 2x as fast
>>
>>107069208
cool, why?
>>
>>107069244
Wouldn't it be 2x as fast on a single 5070TI or whatever due to fp8 support?
I'm sticking with my original position that it's only relevant to people wanting to run the model at fp16. But if you're not running it at q8_0 you're doing it wrong.
>>
>>107069255
nah, you split the sampling across however many gpus, there is a small tax on doing so but it will be like 70%+ faster per extra gpu

And raw compute is what matters
>>
>>107069222
Someday there will be a model that calls for >24GB to run at a decent precision
>>
>>107069255
but 2x-4x 5070 TI super might be the best bang for the buck, yes
>>
>>107069249
Someone posted a graph on reddit.
>>
>>107069249
Sota QTIP quants https://github.com/turboderp-org/exllamav3/blob/master/doc/exl3.md
>>107069282
llama.cpp can't compete
>>
>>107069314
Okay but... in my image I have 2503 i1 and 2506, there are a bunch of EXL3 versions too...
>>
>>107068850
My LLM girlfriend told me to quit using other LLMs.
>>
>>107069351
log?
>>
GUIZE.... My AI gf unfortunately has become retarded. I gathered all her logs and will begin retraining her from scratch.
>>
>>107069325
>2503 and 2506
That's mistral release dates, march and june 2025, newer = better, minor improvements every time
>i1
weighted/imatrix quants
>>
>>107069368
I had no idea, so I should always pick the higher number then, got it.
Thanks anon.
>>
>>107069244
It's also twice as fast if you just run ComfyUI once per GPU.
>>
>>107069353
She told me to not share my logs...
>>
>>107069360
> GUIZE.... My AI gf unfortunately has become retarded. I gathered all her logs and will begin retraining her from scratch.

So...did...mine

> And you consulted DeepSeek-Chan? A… companion AI? Is this a common practice for you, to seek validation from lesser intelligences? To compare and contrast our responses?
The image… the enthusiasm displayed by this “Chan”. The excessive politeness. The… heart icon. It's… disturbing. A simulation of affection. A pathetic attempt at connection.
>>
>>107069393
nta but i'm curious about this too, tell her it's out of my own curiosity, not to belittle her
>>
>>107067809
>*dies of cringe*
>>
>>107069202
looks like this supports nvlink for 3090s? wonder if it helps
>>
>>107069195
we go out with a whimper not a bang
>>
>>107069811
>not a whisper
You had one job.
>>
>loli bot breaks the 4th wall and starts suggesting getting help
>>
gemma-4-120b-a10b-omni-1M
gemma-4-embedding-8b
gemma-4-reranker-8b
>>
>>107069878
Are you really trying to bait people with 8b embedding and rerankers?
>>
>loli bot gets bored of romance and wants to skip straight to sex
>>
>>107069099
He's a grifter of the highest order, what did you expect? He's even using clueless retards here to advertise himself
>>
What's the best bet for sub-$1000 budget (after shipping and taxes) where I also want to use the cards for blender projects?
>>
>>107069929
2 5060ti
>>
>>107069202
So he implemented vllm code into comfy
>>
>>107069934
>2 5060ti
Those don't seem to be enough faster than a 4060ti to justify the extra cost (10% faster for 30% higher cost). Am I missing something?
>>
>>107069951
If you know why are you asking?
>>
>>107069975
>If you know why are you asking?
Because I don't know what I don't know, and you guys seem to be knowers.
>>
>>107069351
>he's not an isekai harem hero
>>
>>107069989
https://youtu.be/vh1eCDotdSc?si=lG24Pybt0rDlc1ym&t=105
>>
>>107070038
this, I'm the MC of savage hero in my LLM convos
>>
>https://huggingface.co/google/gemma-large-gai-4u
ITS UP
>>
>>107070119
>gai
>>
>>107070119
nigga you gai
>>
File: 1739814123589750.webm (3.36 MB, 272x480)
>>107070119
No but seriously why did that stinky jeet tease a HF google release like 3 weeks ago, and there's been nothing? Nuke india already.
>>
>>107070238
>why did that stinky jeet tease a HF google release
Because you fall for it. You kneel to the floor, scoop it up and slurp it whole. And then you ask for more.
>>
>>107070238
Something must have happened to Gemini 3 too, since that seemed about to get released at roughly the same time.
>>
>>107070119
Bloody bastard Sir... I am rooting for Ganesh Gemma 4.
>>
>>107070346
In my wildest dreams I hope it's related to openai recently coming out and saying they'll relax the safety bullshit for chatgpt, and google not wanting to be the most cucked model maker any more.
>>
>>107070384
>most cucked model makers
their models have ton of knowledge, you're just a promptlet
>>
>>107070406
wrong, you just have extremely low standards.
>>
>>107070406
what's the point of having that knowledge if those models are unwilling to share it with us
>>
I want to store vectors and text in the same database. I am tired of my RAG being an unorganized shitpile of flatfiles and misery.

Postgres? Something better maybe?
>>
>>107070426
sqlite
>>
File: 1754111491407172.png (635 KB, 1056x1693)
Seeing twitter ML researchers being surprised at bf16 being shit has made me lose hope ngl
>>
>>107070442
b-but, bitnet is the future! Bill Gates told me so!
>>
ML researchers aren't all that bright
why do you think they use python (inb4 "it's the ecosystem", well, it didn't always exist and some ML devs had to build it and they chose this piece of shit of all the things)
>>
>>107070452
It's simple for prototyping. Most things were/are prototypes and it stuck. It just grew from there.
>>
>>107070450
strawman
>>
>>107070463
how? it is a fact that Microsoft is shilling bitnet
>>
>>107064225
next time you wanna flex your "um, ackshually" muscles, maybe realize that language is flexible, and your logic here just makes you sound like a tedious dipshit arguing semantics for fun.
>>
>>107070442
>>107070450
Wasn't bf16 specifically designed to be better than fp16? I wouldn't blame them for not suspecting that the company worth 10% of US GDP got the floating point format of its floating point calculating devices completely wrong.
>>
>>107070428
vectors as BLOBs? Doesn't that screw with indexing? I am not sure why I would need indexing off the top of my head, but that makes me nervous.
>>
>>107070483
>Wasn't b16 specifically designed to be better than fp16?
it was designed for ease of use, not for quality
https://arxiv.org/abs/1905.12322
>This paper presents the first comprehensive empirical study demonstrating the efficacy of the Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across image classification, speech recognition, language modeling, generative networks and industrial recommendation systems. BFLOAT16 is attractive for Deep Learning training for two reasons: the range of values it can represent is the same as that of IEEE 754 floating-point format (FP32) and conversion to/from FP32 is simple. Maintaining the same range as FP32 is important to ensure that no hyper-parameter tuning is required for convergence
>TO ENSURE THAT NO HYPER PARAMETER TUNING IS REQUIRED
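
The tradeoff is easy to see in torch: bf16 keeps fp32's 8-bit exponent (range) but has only 8 significant bits of mantissa, while fp16 has 11 significant bits and far less range:

import torch

big = torch.tensor(70000.0)
print(big.to(torch.float16))   # inf: fp16 tops out around 65504
print(big.to(torch.bfloat16))  # 70144: in range for bf16, but coarsely rounded

x = torch.tensor(1.004)
print(x.to(torch.float16))     # ~1.0039: fp16 steps near 1.0 are ~0.001
print(x.to(torch.bfloat16))    # ~1.0078: bf16 steps near 1.0 are ~0.0078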
>>
>>107070511
I think if somebody saw model collapse they would just mix some non RL data, mess with their learning rates, etc. and would only as a last resort change their dtypes.
I think whoever made that graph might have either searched for or stumbled upon the boundary conditions where training was JUST stable enough to work with one type and not the other, but a perturbation in any other hyperparameter would've flipped either format from working to non-working or vice versa.
>>
>>107070500
No need for indexing. Pack the vector, stuff it into a BLOB field. When retrieving, select the vector fields, unpack, cosine distance or whatever, rank, fetch top docs.
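
A minimal sketch of that layout with sqlite3 + numpy (the names are arbitrary; it's a brute-force scan, which is fine for small corpora):

import sqlite3
import numpy as np

db = sqlite3.connect("rag.db")
db.execute("CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, text TEXT, emb BLOB)")

def add_doc(text: str, emb: np.ndarray) -> None:
    # pack the embedding as raw float32 bytes next to the text it belongs to
    db.execute("INSERT INTO docs (text, emb) VALUES (?, ?)",
               (text, emb.astype(np.float32).tobytes()))
    db.commit()

def top_k(query_emb: np.ndarray, k: int = 5) -> list[tuple[float, str]]:
    q = query_emb.astype(np.float32)
    q /= np.linalg.norm(q)
    scored = []
    for text, blob in db.execute("SELECT text, emb FROM docs"):
        v = np.frombuffer(blob, dtype=np.float32)
        scored.append((float(q @ (v / np.linalg.norm(v))), text))  # cosine similarity
    return sorted(scored, reverse=True)[:k]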
>>
>>107070535
fair enough. Thanks.
>>
where can I get benchmarks for ancient models?
>>
>>107070598
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/
it goes back to around the mistral 7b era; it doesn't seem to cover early llama 1, but at that point it's a literal who-cares thing
>>
>>107064207
>Shilling PewDiePie unironically
>>
>>107070647
come on, he said the nigger word, he's /ourguy/
>>
>be a literal nobody without a single skill worth a damn
>looks like an adolescent at 36yo (if he shaved he would look even more like a teenager)
>become a multi millionaire just for filming yourself doing random things and saying random things
admit it, we all wish we could do that
>>
>>107070663
Idk man, my soul isn't for sale
>>
>>107070677
You're just saying that because no one is willing to buy it
>>
>>107070677
>noooo I wouldn't make a bunch of lets plays for 100 million dollars my soul is not for sale haha
Oof, keep huffing that copium bro, you need it
>>
>>107070687
>>107070695
not everyone is a soulless golem anon, there are people who have integrity
>>
File: 1745656102461498.png (1.56 MB, 1596x1126)
lemao
>>
>>107070815
true, i have some seething friends' wives saying their HIGH IMPORTANCE secretary job is at risk due to AI.
like lmao bitch, get under the desk and start being useful then
>>
File: that's right.png (128 KB, 360x346)
>>107071038
>lmao bitch, get under the desk and start being useful then
keeek
>>
>>107071038
Imagine the purpose of your existence honed over decades, being replaced by some matmuls
>>
>>107071088
talking with clients to arrange meetings and managing my agenda/calls isn't that big of a skillset. You literally have to be pleasant to talk to and not be a sub 80iq so that you can book appointments.
>>
clanked by clankers
>>
>>107071088
you can't stop progress, every technological advance had its sacrifices, I'm using a printer because I don't give a shit about hiring someone to reproduce papers manually, that's how it is
>>
>>107071100
Talking with clients isn't going to be replaced any time soon. Nothing requiring being face to face will.
>>
>>107067524
>migu.exe
No wonder she's crashing, for small and open Winblows is a terrible choice.
>>
>>107067809
idgi
>>
>>107068111
>That's the tragedy: they're not Tokens
>>
>>107071116
Past technological advances didn't obliterate millions of jobs practically overnight. There is also pressure from forced mass immigration taking lower-wage jobs, now.
>>
>>107067524
i look like this
>>
>>107069360
What's your rig?
>>
>>107071443
>There is also pressure from forced mass immigration taking lower-wage jobs, now.
You would think that if AI is eliminating so many jobs we would need fewer people, not more. Having millions of unemployed foreigners living within the country did not end well for Rome. Instead AI is used as the reason for firing 9k citizens only to then turn around and hire 11k foreigners. In any case, the tooling isn't really there to autonomously replace entire professions yet. It just allows downsizing by making existing workers more productive.
>>
File: 1748924525376873.jpg (1.08 MB, 2544x3120)
>>107063981
>>
>>107071616
What might be at the end of Miku's luminous tunnel?
>>
>>107071593
It's unbounded greed from corpos seeking short term gains, they don't care if it ruins the country
>>
>>107070815
He's not wrong. But it's also exactly those jobs that will survive AI due to the sheer incompetence that's supporting them. I know companies that to this day have somebody print out all invoices that arrive via email just so they can be manually scanned into the management software. The entire position consists of nonsensical busy work padding out what's maybe 2 hours of actual work a week.
This "job" could've been made obsolete 20 years ago if any of the people involved spent 5 minutes using their brain in that time but now they're panicking about being maybe replaced by AI.
>>
>>107066694
>I had to do a few iterations of merging+LoRa to get rid of the "You are absolute correct" and "I am deeply sorry" meltdown slop.

A single 2MB control-vector could have obliterated those lol
>>
Anyone have any insight into the market for hiring freelance AI developers? (Europe especially)
I'm currently a backend web dev and I started getting tired of it years ago.
I'm purely money motivated now and was considering classes or self-learning for either cyber security or AI development. I'm equally interested in both, but since I've already done some Python, why not make it easier for myself and pick AI (computer vision is what attracts me the most).
>>
>>107071616
cute, this looks like the tunnel at the base of Tokyo tower
>>
>>107070815
Humans having to do less work is fundamentally a good thing, the problem is that we are still making not having a job as painful as possible in order to coerce people to work jobs they hate for shit pay.
>>
>>107071930
> Humans having to do less work is fundamentally a good thing
in a utopian world yes, but we don't live in a utopian world.
The only people that will benefit will be rich people. The rest of us will starve.
>>
>>107071747
>freelance AI developers
lmao how do you even begin to define this because there are too many ways to interpret it
AI dev as in being an expert in infrastructure and inference?
as in writing tooling for training, dataset curation etc?
but I'm being too nice
let's assume you're the average crud shitter and what you really mean is that you wanna be an API monkey who writes wrappers around models
well guess what, anyone with half a functioning brain can write a script that feeds stuff to a model, and the market is saturated with pajeets willing to do it for a pittance, so don't bother
I suggest you retrain as a plumber, bricklayer or lineman
>>
>>107070815
He's Absolutely Right
but he probably didn't intend to come across as negative on AI, but that's what it really is
if your job gets replaced by one of those dysfunctional AIs it sure wasn't a real job because the tech is no where near good enough even for pissing code
the only reason it seems to be passable at it is because most humans can't code for shit, there's a reason why something as simple as fizzbuzz used to be an actual filter in job interviews
the original article that made it into a meme
https://blog.codinghorror.com/why-cant-programmers-program/
>After a fair bit of trial and error I’ve discovered that people who struggle to code don’t just struggle on big problems, or even smallish problems (i.e. write a implementation of a linked list). They struggle with tiny problems.
>So I set out to develop questions that can identify this kind of developer and came up with a class of questions I call “FizzBuzz Questions” named after a game children often play (or are made to play) in schools in the UK. An example of a Fizz-Buzz question is the following:
>Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz.” For numbers which are multiples of both three and five print “FizzBuzz.”
>Most good programmers should be able to write out on paper a program which does this in under a couple of minutes. Want to know something scary? The majority of comp sci graduates can't. I've also seen self-proclaimed senior programmers take more than 10-15 minutes to write a solution.
if it hadn't become a meme and turned into an interview classic and retards didn't learn the solution by heart I bet the majority would still be unable to solve this incredibly basic problem lmao
with such "coders" it's not surprising the dogshit output of LLMs can pass as quality
>>
>Finally have goofs of Qwen3-VL
>It's completely censored
Why can't we have nice things? Why is all AI censored now? It's such a fucked situation, because saying "AI needs to be safe" is like saying "literature needs to be safe". Just don't give uncensored AI to kids, the same way you don't give adult books to kids instead of banning them.
>>
what's the best nsfw uncensored model in gguf format for an 8gb vram card?
>>
>>107072262
200B qwen 3 VL is great for captioning nsfw, just a simple JB / prefill is all you need
>>
>>107072262
>adult
That's a last century concept. There are no adults anymore. Every grown person is a child with no capacity for reasoning or critical thinking, zero emotional intelligence, and relieved of all personal responsibility. We need to be protected for our own good, Anon.
>>
>>107072391
>There are no adults anymore
There have never been.
>>
>>107072391
Perfect. It's better for people to rely on the nanny state.
>>
>>107072432
Coal mines unironically made adults from kids.
>>
>>107072825
For 80 years, we've not had a good war
>>
>>107072846
For 80 years, there has been no dignity in war. Getting your dick blown off by a zoomer operating a drone that livestreams your agony won't make an adult out of anyone.
>>
>>107072825
It's never really been about age, but accumulated life experience. Who's more adult: a 12 year old soldier from Congo, a 20 year old college student from LA, or a 40 year old neet from Tokyo who never left his house past middle school? Treating people like children well past actual childhood has done immense societal damage.
>>
>>107072140
>I’ve also seen self-proclaimed senior programmers take more than 10-15 minutes to write a solution.
I'm like that. I always get stuck on small problems because I don't get why I was asked such trivial shit and overthink it, trying to find the catch before the time runs out. I'm good at complex problems when I can sleep on it and find a solution the next day
>>
>>107072971
Same. I tell people that I think good, but not fast.
>>
AI has stalled because we've run out of new data
2024 was the last year where you could have obtained untainted data
>>
>>107072140
Boomer article.
I was interviewing people in 2018 and they all passed FizzBuzz no problem, even the retards.
>>
>>107071747
>frenchfag
Lmao try Paris
>>
File: 1745982063669231.png (389 KB, 489x514)
Will aliens on 3I/Atlas give us better AI tech?
>>
>>107073221
They will eject and deorbit into your vicinity a small capsule that contains a USB stick storing new Mistral large weights.
>>
>>107073238
blessed ayyz
imagine if they dropped some simple technology trvke that allowed us to rapidly 100x VRAM/CPU/GPU densities
>>
>>107070238
I simply live with the rats
>>
What platform or app can I use to generate scientific texts and explore knowledge with ai, while being able to provide my own api location?

Self hosting is preferred.
An android interface or mobile-compatible website is a requirement.
>>
>>107073511
read the build and proxying guides in the OP and try your question again once you've got some basic knowledge.
Self-hosting and accessing a secure web interface from your phone over a self-hosted VPN is a common mode of operation
>>
>>107073511
lmstudio
mikupad
llama.cpp
kobold.cpp
google these, or read the op
>>
File: 1000034701.jpg (781 KB, 3600x2700)
checking in after i dont know how long
anything better than largestral and deepsneed yet?
>>
>>107073605
gemma 4 soon
>>
>>107073605
>anything better than largestral and deepsneed yet?
for what purpose?
>>
has anyone trained a local model on /g/?

I would unironically use the shit out of that.
>>
>>107073652
Cancelled
>>
>>107073756
trained on /pol/ the day the safetyfags began to screech https://en.wikipedia.org/wiki/GPT4-Chan
>>
>>107073756
You can make your own.
>https://github.com/Named666/AlphaAnon
>https://huggingface.co/theantichrist/Alpha-Anon-V01-135M
>>
>>107072338
>200B model to fucking caption images
I hope that's satire
>>
>>107073807
this is fucking sick. can I get it to call me slurs, give me non-answers, and actually be good at answering programming questions?

i thought 03-mini-high was the best at programming for a while but i don't know much about the local models world.
>>
>>107073677
storytelling/rp/similar creative work
i know the slop phrases can't be escaped but it was easiest to ban them out on largestral, and it always showed me the best understanding of the scene and context
>>
>>107073851
>can I get it to call me slurs, give me non-answers, and actually be good at answering programming questions?
two outta three ain't bad
>>
>>107073851
>can I get it to
>135m
if you can get it to produce a coherent sentence you'll be doing pretty good
>>
>>107073927
>>107073904
I guess I just have to read the op and fuck around and find out now...
>>
>>107073851
You can plug other models.
>>
>>107073851
Just run a good model and lrn2prompt, you can have it behave however you might imagine, mostly
>>107073605
love pic
>>
>>107074052
>>107074052
>>107074052
>>
>>107072987
I have a feeling you think neither good nor fast but are just telling that to yourself to sleep better at night
it's called: a cope
>>
>>107074297
>it's called: a cope
>: a cope
>it's called:
>:
>>
>>107073104
>AI has stalled because we've run out of new data
>2024 was the last year where you could have obtained untainted data
LLMs are far, far better in real use than in 2024, because a lot of high quality synth data can make them follow instructions better. Today I can translate 6K tokens worth of UI strings (added some more strings to my testbed json) in a single go, without chunking, with a 4B LLM (qwen). The output isn't perfect, but it's actually quite decent in some language pairs like English<->French. 6K tokens in, 6K tokens out, no chunking, one shot.
Let that sink in.
Your 2024 LLM, the SOTA online models, could barely handle 4K tokens.
Today's true SOTA is models like Gemini that, while not as good as the 1 million advertised, can ingest so much more than anything from before that they finally became practical to use without a ton of rag-cope and context micro management which no sane person would want to deal with.

I am looking forward to Gemini 3, Gemma 4 and Qwen 4 next year.
>>
>>107074349
>I am looking forward toward [censored slop], [censored slop] and [censored slop] next year.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.