/g/ - Technology

File: 1754714485403516.png (382 KB, 840x639)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108295959


►News
>(03/03) WizardLM publishes "Beyond Length Scaling" GRM paper: https://hf.co/papers/2603.01571
>(03/02) Qwen 3.5 Small Models (2B, 4B) released: https://hf.co/Qwen/Qwen3.5-4B
>(02/26) Qwen 3.5 35B-A3B released, excelling at agentic coding: https://hf.co/Qwen/Qwen3.5-35B-A3B
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
>>108300676
this but for leddit
>>
Can I run AI on my smart fridge? Maybe one of the small qwen models?
>>
bubble status: bursting soon
>>
why are we having new threads at page 4 now?
>>
>>108300682
>WizardLM publishes
I thought they were banished to the shadow realm?
>>
>>108300728
robson ltda bailed them out
>>
>>108300713
It's literally just because someone doesn't want vocaloids in the OP.
>>
>>108300691
of course, and you should install OpenClaw on it also and let it dictate what food you eat
>>
>>108300788
we both know you are being an ass, but that's actually a good idea
>>
>>108300793
i wasn't being sarcastic, if the fridge has one of those sensors or cameras then you could use it to track calorie intake
>>
>>108300788
Isn't OpenClaw going to be closed source soon after the acquisition?
>>
>>108300806
there are dozens of forks now, it wouldn't really matter
>>
Big Deepseek day today
>>
>>108300819
link?
>>
I'm running n8n and an ollama VM on my homelab. No gpu, just a couple of cores and 20gb ram. I know people use setups like this for automation workflows (speed is not a huge concern, just precision). What are the steps required to get database-backed memory working, and how do people optimize small models on restricted hardware in general?
>>
File: db7.jpg (84 KB, 680x847)
>>108300825
>>
>>108300848
just buy one
>>
yes I am mikusexual
>>
>>108300848
>ollama
>>
►Recent Highlights from the Previous Thread: >>108295959

--Paper (old): H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs:
>108296054 >108296204
--OBLITERATUS tool for removing AI model censorship via weight ablation:
>108297061 >108297066 >108297113 >108297177 >108297103 >108297117 >108297136 >108297203 >108297208 >108297232 >108297233 >108299678 >108299706
--Alibaba reaffirms open-source Qwen strategy amid leadership shift:
>108298195 >108298228 >108299471 >108299477 >108298457
--Qwen family model size vs performance analysis:
>108300067 >108300073 >108300077 >108300083 >108300093 >108300118
--SillyTavern alternatives for modern model roleplaying:
>108299346 >108299399 >108299412 >108299435 >108299629 >108299489 >108299913 >108300639
--A mathematical proof from an anonymous Korean forum: The essence of Attention is fundamentally a d^2 problem, not n^2:
>108298017
--Distributed LLM inference using pooled NUC resources:
>108296013 >108296051 >108299436
--Preventing agents from falsely claiming task completion:
>108299444 >108299470
--Something is afoot in the land of Qwen:
>108297114
--Miku (free space):
>108296286 >108296467 >108297038 >108298135 >108299073

►Recent Highlight Posts from the Previous Thread: >>108298564

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108300996
Finally some proper news.
>>
File: dipsyNoSneakingFood.png (2.73 MB, 1024x1536)
>>108300798
>>108300793
lol if it had a camera in your pantry as well, it could track your macros and order food for delivery from the local grocery store.
Then text you and your friends to either congratulate you or give you shit about whether you're sticking to your diet.
Add IoT to your bathroom scale and now you have a closed-loop fitness/dietary system.
>>
File: 1769717321226271.jpg (36 KB, 620x521)
>bought an M4 pro macbook pro with 48gb of RAM thinking it would last me several years
>local AI gets good and now I need like 512 GB
Fuck man I'm tempted to just buy an RTX 6000 Pro
>>
>>108300996
Thank you Mikuchad
>>
thoughts? https://pastebin.com/KrpEwdKJ
>>
File: mine.jpg (196 KB, 1536x1536)
>>
>>108301063
Without an explicit completion check (for example "count phases and confirm total"), the agent can rationalize continuing as "still helping"
>>
File: op is a faggot.gif (1006 KB, 372x298)
>>108300682
OP is a massive faggot
>>
>have to compile llama.cpp for cuda support
i just chucked the precompiled cuda releases on github into a folder on C:\ and added it to path. did i do it right
>>
>>108301319
>compile it
>grab precompiled binaries
uh, no
>>
>>108301317
*glug glug glug*
>>
>>108300067
>27B dense as good as 122B-A10B moe
Does this mean a 70B dense model would be better than the 397B-A17B moe model?
>>
Speed in 35B is a quality of its own; real-time VR interactions with a retarded waifu feel surreal
>>
y'all fuck with that OBLITERATUS shit or ts just hype? 30% benchmark increase sounds like cap ong
>>
>>108301378
If it had modern training techniques, it would be smarter at things that require attention to detail, but it would have less space to store knowledge, so it would still underperform in most common tasks where it can just rely on memorization, like benchmarks.
>>
>>108301427
Good thing we can store knowledge in Engrams
Dense + Engrams
>>
>>108301239
I pretty much look like this
>>
>>108301436
For us, that would be the best. The labs training the models would still prefer MoE due to inference speeds and training costs.
>>
Based baker. Fighting offtopic autistic special interest one OP at a time.
>>
Is engrams actually coming? Or is it just being memed.
>>
>>108301317
u mad bro? why?
>>
so this august are we gonna get gpt oss 2
>>
>Meta's first LLaMA model was leaked and released via a torrent link on March 3, 2023.
damn it's been 3 years already
>>
I want to fuck an Engram
>>
uhh, where is V4?
>>
New to LLMs. I'm looking into small models and can see that there are a lot of variants, the naming convention doesn't make sense to me at all, and I can't find any documentation.

https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/tree/main

Can someone educate me on the use cases for the different versions?
>>
unsloth removes information about sloths
>>
>>108301602
these are different quantizations, basically compression to fit bigger models into consumer cards' VRAM. The higher the quant, the more intelligence the model retains from the original one. Which one to choose depends entirely on your hardware; as a rule of thumb, below Q4 it's bad.

It's generally a good idea to ask gemini or chatgpt about all this
>>
Localsisters, I can't figure out the best way to handle memory in SillyTavern. I activated vector storage but I doubt that's enough. Why does this shit have to be so complicated? I just wanna do some long roleplays...
>>
>>108301649
Models have more than 4k context now. You don't need anything.
>>
>>108301636
Thank you, so Q4 means 4-bit quantization and so on. What about the K_S, K_M after that?
>>
>>108301571
Two more weeks.
>>
>>108301677
I have 32k context and I'm already almost at the limit 103 messages in.
>>
>>108301683
Those are even more granular quantization levels. Scroll down in the model card at this address and you'll see a chart giving you an idea of the quants and their quality

https://huggingface.co/bartowski/Qwen_Qwen3.5-35B-A3B-GGUF
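Once you pick one, running it is a one-liner. A minimal sketch, assuming a llama.cpp build on your PATH (the filename is whichever quant you downloaded):
[code]
# Q4_K_M is the usual middle-ground default between size and quality
llama-server -m Qwen_Qwen3.5-35B-A3B-Q4_K_M.gguf -ngl 99 -c 16384
# then point your frontend at http://127.0.0.1:8080
[/code]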
>>
>>108301699
Thank you. How about unsloth vs bartowski quantized formats? Which one is better, or is there anyone that has a better version I can check?
>>
>>108301765
Ideally they should all be the same; lately there has been some drama with unsloth quants.

The best way is to test them yourself and see which one you prefer
>>
>>108301078
>Jamba2 Mini
So, funny thing.
This guy has 8 experts, with 2 being activated per token for a total of 12B activated params.
I launch it, ask a question about D&D, get a pretty standard result. Good, some models hallucinate some wild stuff that this one didn't, even if the result wasn't perfect.
Then I do
>--override-kv jamba.expert_used_count=int:1
to halve the number of activated experts, which obviously doubles the generation speed but also yields a better response.
Yes, anecdotal, and a single sample, but still funny to see.
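For reference, the whole experiment is just the same command with and without the override (a sketch; the model path and the other flags here are placeholders, only the override matters):
[code]
# stock: 2 of 8 experts active per token
llama-cli -m jamba2-mini-Q4_K_M.gguf -ngl 99 -p "The D&D question"
# forced to 1 active expert: ~2x generation speed
llama-cli -m jamba2-mini-Q4_K_M.gguf -ngl 99 \
  --override-kv jamba.expert_used_count=int:1 \
  -p "The D&D question"
[/code]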
>>
Holy FUCK Qwen 3.5 35B-A3B straight up CHOOSES TO NOT TRANSLATE HENTAI.

What the fuck is this shit?! Fucking GEMMA 3 27B OF ALL FUCKING MODELS DIDN'T HAVE PROBLEMS TRANSLATING HENTAI GAMES

What the FUCK is wrong with Alibaba? FUCK QWEN.
>>
>>108301871
Must have been the Wheatley expert.
>>
>>108301879
Are you using the base model?
>>
>>108301888
Lmao.
Makes me wonder if I shouldn't be fucking around with GLM Air with fewer activated experts and other such experiments.
>>
>>108301879
skill issue
>>
>>108300682
serious question why is ollama/openwebui never recommended here?

seems to be working just fine for me. easy setup and pretty trivial to add custom model packages too.
>>
>>108301903
I'm using the standard model released by Qwen, their "chat" version, not the base model.

>>108301936
It's pretty bad because I hook it into running hentai games, and when there is one line that mentions rape or is contextually about coercion or something, the entire translation stops and the model refuses to translate any other lines as well, and I have to clear the entire context, fucking up the translation pipeline.
>>
>>108300691
it would be more effective for you to run the AI on a home server and connect the fridge tablet to the server via an iframe web browser, or just run the webui on the fridge, not the actual AI backend.

unless you want to stare at your fridge door for 5 minutes waiting for it to tell you how long you can leave pizza in the fridge before it's likely to kill you.
>>
>https://www.reddit.com/r/LocalLLaMA/comments/1rlkptk/final_qwen35_unsloth_gguf_update/

>Re-download Qwen3.5-35B-A3B, 27B, and 122B-A10B as they're now all updated. Re-download 397B-A17B after today’s update (still uploading!)
just one more re-download bro
>>
>>108301992
wtf
>>
>>108301992
>Hahaha sorry - agreed it might not be the true "final" final
geg
>>
>>108301955
Try the base model. It still behaves like an instruct tune, but with much less verbose, more "natural" reasoning traces (they seem straight out of the RL process), and with a lot fewer refusals baked in.
>>
>>108301992
> Are all the GGUFs for the smaller Qwen3.5 models, 9b and below, also updated?
>Oh the old ones generally are ok for now - however we do plan to improve them over the weekend!
What's final about any of this?
>>
File: file.png (42 KB, 823x253)
>>108302011
>>108301999
>>108301992
you can trust them to have no idea what they're doing
>>
>>108302017
lmao'd
what a bunch of clowns
are these at least with the fused method?
>>
>>108302026
qrd?
>>
File: IMG20260301201540.jpg (786 KB, 2048x1536)
>>108301063
Similarly, back in the day I never saw the use case for X99 enthusiast boards with all those pcie slots; who would ever need that many?
but then...
>>
>>108302029
https://github.com/ggml-org/llama.cpp/pull/19139
>>
I've found a riddle that mogs <thinking> models. Non-thinking models or models in non-thinking modes usually got it right.
>If a country switches from left-hand traffic to right-hand traffic, do cloverleaf interchanges need to be rebuilt?
>>
>>108301950
Ollama is hated on because it's the easy-to-use one that uses llama.cpp without loudly crediting it, which is seen as kind of stealing
As for openwebui, these people were born and raised on sillytavern and they mostly don't know about it and/or prefer the ST interface because it's what they're used to

I started on chatgpt so I use ollama+openwebui
>>
File: 1764876252715945.png (2 KB, 125x70)
>watching new anime episode today
>hit with gemma hotlines
THERES NO ESCAPING
>>
>>108302151
pic for ants?
>>
>>108302156
hah goteeemm
>>
File: 1748394822472270.png (81 KB, 1920x1080)
>>108302151
>>108302156
oops ahahah
>>
>>108301879
Probably the result of the relatively recent Chinese crackdown on porn.
>>
>>108301879
inb4 something utterly vile
show log
>>
Any SaaS model that's redpilled on Jews?
>>
>>108301950
llama.cpp has a web UI built in
>>
So far I'm liking
>Qwen3.5-27B-heretic-v2-Q5_K_M.gguf
with a low temperature and a "<think" prefill. It does seem smarter than similar-sized models like Gemma.
>>
>>108301950
>>108302138
Looks great for general assistant stuff but too basic for roleplay. Sillytavern is unfortunately a necessary evil.
>>
File: file.png (42 KB, 748x327)
>>108302026
>are these at least with the fused method?
so that's a no
>>
File: x99ftw.jpg (895 KB, 2468x1497)
>>108302063
which chip? my old x99 system's been collecting dust & the watercooling leaked into the PSU
boomer pc builders understand the need for expansion slots
desktop/gaming platforms continually shittify, hedt was a taste of the good stuff
>>
Any tips to nudge the LLM in a specific direction without explicitly telling it or writing for the character?
>>
>>108302394
Some 12-core v3 xeon, I forget

Boomers had their soundcards and IO cards, I even had a microsoft proprietary mouse interface card. At the time of X99 they weren't really a thing anymore thoughhowever
>>
>>108302394
>bricked my system bc of the watercooling meme
top kek I'm so glad I never left air cooling
>>
>>108301879
Just use it to write fizzbuzz like intended broski
>>
>>108302372
lmao'd x2
>>
>>108301879
Use the heretic finetune if you can't figure out what arcane prompt bullshit actually works
>>
>>108302394
I have that exact same board. I had an MSI X99 board with a dual GPU setup for PCI passthrough, one for host and one for guest. Worked flawlessly until the board decided to kill itself. Replaced it with the ASUS X99-A II and that shit just would not work. Spent months tweaking settings, but got link errors and the guest could not use the GPU. Eventually booted into Windows with both GPUs and got screen flickering and more errors even though it had more than enough lanes.
Maybe it was just a faulty unit, but I hate that board so fucking much.
>>
>>108302345
How did you get it to not repeat and spout nonsense endlessly? Or maybe it's just me, I swear my sillytavern seems to randomly get cursed over time.
>>
File: maintenance optional.jpg (1.04 MB, 2016x1134)
>>108302398
describe your intent [OOC: ]
>>108302432
Yeah man SLI GPUs, network (no onboard), soundcard (I had the hercules blue breakout box thing with the thiccest stupidest cable ever seen in a consumer product)
actually went with x99 here for a 10G NIC
>>108302462
it ran perfectly for years 0 maintenance
>>108302532
i replaced the board once, hard crash, spotted a small flash of something, VRM inductor maybe, i never could find the damage but it wouldn't boot & got RMAd
>>
File: bears.png (4 KB, 514x339)
>>108302398
You could try control vectors, I suppose.
https://desuarchive.org/g/thread/104991200/#q104995066
https://desuarchive.org/g/thread/104991200/#q105000398
>>
>>108302556
To be honest I am running into that right now (it starts looping in the thinking phase as it questions itself), but my earlier gens on a different card were better. I'll have to keep playing with it.
>>
>>108302572
Oh shit. I'm going to make a cvector to fix qwens fucking prose.
I guess I should take a bunch of random outputs from the model itself then rewrite them how I'd like them to sound and use those as the negative and positive files right?
>>
>>108302562
>it ran perfectly for years 0 maintenance
until it failed and killed it, whereas a fan failing would just cause thermal throttling and possibly a thermal safety shutdown, waterkeks are funny
>>
Need help:
llama-cli vs llama-server
I run 20t/s on llama-cli but when I run llama-server I only get 5t/s.

How can I tweak it?

I literally used the same settings.
>>
>>108302556
>>108302578
So far I've found the most success by being light on instructions and card details, since it obsesses over that stuff.
>>
>>108302681
Those have some different defaults for some things I'm pretty sure. I can't remember what, but some anon figured it out some time ago.
Can you run llama-cli with --verbose to see all the flags and stuff?
>>
>>108302645
>I'm going to make a cvector to fix qwens fucking prose
It will change the output, but it doesn't quite work like that. You can only nudge the model.
>random outputs from the model itself then rewrite them
You don't need a lot to make an effective control vector. The bear control vector I made was just the example in the archive. And you don't even need the chat template stuff. Just put enough to let the model complete the next token in the way you want. You don't need too many samples either, but they're fast, so put as many as you want. I found 3 of each to be sufficient.
Don't get your expectations too high. You cannot add information, you cannot add instructions. You just nudge the model in a particular direction.
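If you want to try one, the whole workflow is two commands. A sketch assuming mainline llama.cpp's cvector-generator tool (check --help on your build for the exact flag names; the file names are placeholders):
[code]
# positive.txt / negative.txt: one prompt per line, same line count in both
llama-cvector-generator -m model.gguf \
  --cvector-positive-file positive.txt \
  --cvector-negative-file negative.txt \
  --cvector-outfile cvector.gguf
# apply it at a chosen strength; negative scales push the opposite way
llama-cli -m model.gguf --control-vector-scaled cvector.gguf 0.8
[/code]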
>>
>>108302681
i'm getting the same for both more or less
>>
What’s the best GPU layout for a 1500W PSU? Can it handle 4 3090s with undervolting? 4090? How many pro 6000s?
>>
>>108302729
>You just nudge the model in a particular direction.
That's the idea. Nudge its general writing into a given style.
>>
>>108301649
https://github.com/KrsityKu/InlineSummary
Just found this and it's pretty cool. You can even summarize the summaries and nest everything together. I see people mention memory books all the time too. Gonna test how well they work together.
>>
>>108302753
>Nudge its general writing into a given style.
I only tried it for moods. I don't expect it to work for "write good now". But give it a try.
>>
File: e5v4-ram.png (88 KB, 954x353)
>>108302394
>>108302532
good to see that you guys have proper X99 boards instead of those awful aliexpress "X99" frankenboards that i frequently see shilled on /hsg/ for some reason...
I couldn't find any X99 boards at a reasonable price (or at all in fact) where i live, but I got a non-ATX C612 workstation board (it's pretty much the same thing as X99, Xeon E5 v3/v4, just for the workstation/server segment).
Wish i'd filled it with 64GB modules instead when I had the chance.
>>
File: 290587.jpg (305 KB, 1392x783)
>>108302674
kept my algae frens comfy until i decommissioned it, some occasional drips on the PSU didn't kill it
only thing that failed in that rig (aside from the early mobo replacement) was the LED strip burning itself out
>>
Flash Attention 4 now a thing.
https://www.together.ai/blog/flashattention-4
https://github.com/Dao-AILab/flash-attention/blob/main/assets/fa4_paper.pdf
>>
>>108302832
>b200 only
>>
>>108302729
>>108302572
Seems like llama-cvector-generator wants 2 text files, both with the same number of chatml interaction blocks. i wanted to see what would happen if I put my saved fics into one and gemma slop into the other. turns out it treats each line break as a new prompt and wants the same number of prompts in both.
>>
File: disgusted-dog.gif (1.91 MB, 288x389)
>>108302838
>poors
>>
>>108302853
>turns out it treats each line break as a new prompt and it wants the same number of prompts in both.
Yes. It's one prompt per line.
You could replace the line breaks with \n I guess.
>>
File: file.png (48 KB, 838x341)
>>108302838
You can run it on Hopper too; the main reason no one adopted it was that the accuracy degradation was terrible compared to stuff like Sage Attention.
>>
>>108302838
Good

I still remember when flashattention 2/3 was released and there were so many redditors crying that it was only faster on the newer GPUs, demanding Tri Dao work for free and somehow make older generations just as fast

open source slurpers are some of the most ungrateful people on the planet
>>
>>108302865
>flash_attn.cute
:3
>>
File: 1618224576426.jpg (113 KB, 512x512)
>>108302855
>wagecuck
those aren't your GPUs
>>
>>108302874
>open source slurpers are some of the most ungrateful people on the planet
Signed, an open source slurper.
>>
>>108302782
> awful aliexpress "X99" frankenboards
lol I have one of those as a hobby server stuffed into a midtower ATX case I found on the curb.
You used to be able to buy them, CPU/MB/32G RAM, for <$100. They've more than doubled in price in the past few months, like everything else.
>>
>>108302877
pls sir can i have a gpu sir
>>
>>108302855
dogs will sniff and eat shit happily, along with vomit, so what is that guy making that dog sniff that would make it feel disgusted??
>>
>>108302920
you
>>
>>108302920
ollama
>>
>>108302935
keeek
>>
Why is every github page filled with fucking emojis these days?
>>
>>108302744
yes to the 3090s, no to the 4090s. you can do 4 Blackwell 6000s if you get the Max-Qs, 2 otherwise.
>>
>>108302985
its good project sir :rocket:
>>
>>108300682
Is it smarter than the average /g/ user?
>>
File: gemma3finallyworthusing.png (41 KB, 1958x552)
made a test gemma control vector and this happens when it's set to 3000 strength
>>
>>108303000
I don't give a shit if they use AI as long as it works, but at least make the fucking description presentable.
>>
>>108303027
>3000 strength
Yes. That's a bit much.
>>
>>108303028
And if textual descriptions look like that, how good do you think the code will be?
>>
I have a spare optiplex 5050 (i5-7500, 16gb RAM) sitting around collecting dust. Would it be able to run a small model? I want to set up RAG for sillytavern.
>>
>>108303070
If it's just for embeddings, yes.
>>
File: file.png (123 KB, 781x159)
>>108302838
Funny how they pointed this out in the paper.
>>108302865
>accuracy degradation
Seems like FA3 didn't get too much support because of that, and they are returning to more numerically stable methods; the paper mentions it a lot. I expect something that is a lot more usable in practice for Ada and up.
>>
Do multiple GPUs speed up token generation and prompt processing? Say I got 2x 3090 and put a 16 GB model on it. Would it generate tokens twice as fast?
>>
https://github.com/chardet/chardet
interesting case of AI psychosis for a very popular python library: the maintainer somehow got the confidence that he could "rewrite" (with an llm) all of it in just a week or two, as in literally have every single line rewritten, that this llm laundering would somehow be a legal way to replace the original LGPL license, and that a few weeks of agentic LLM slop would be enough to create a drop-in replacement
which btw is wrong because this doesn't even come close to passing the test suite of the previous version
https://github.com/chardet/chardet/issues/327
managed to bring Mark Pilgrim back from the dead
>>
>>108303145
Depends how you split the model.
If you put some layers on one gpu and the rest on the other, the GPUs will be working in series, so you effectively get the speed of one GPU.
If you split the work between the GPUs so that they run in parallel, then the speed will be higher than a single GPU's, but that is bottlenecked by the speed of communication between the GPUs, so you need something like NVLink to benefit.
I THINK that's how it works.
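In llama.cpp terms it's roughly this (a sketch; flag names from mainline, other backends differ):
[code]
# series: layers split across GPUs, more capacity, ~single-GPU speed
llama-server -m model.gguf -ngl 99 --split-mode layer
# parallel: each tensor split row-wise across both GPUs; only pays off
# with a fast GPU-to-GPU interconnect
llama-server -m model.gguf -ngl 99 --split-mode row --tensor-split 1,1
[/code]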
>>
File: 1754671788943690.png (603 KB, 3840x2160)
>>
>>108303193
>gpt 5.4 thinking
for the modest cost of 1 billion dollars per 1000 tokens
>>
ik_llamacpp doesn't have --fit ?
what am I supposed to do then?
>>
>>108303201
-ot
>>
File: file.png (24 KB, 1111x93)
>pull
>free performance
https://github.com/ggml-org/llama.cpp/pull/17795
Today was a good day.
>>
>>108303239
took them more than 3 months to merge that PR
holy shit
>>
>>108303250
The implementation was suspect and he reworked it multiple times.
>>
>>108303250
I much prefer this sort of approach over what happened with some of the vibe sloppers hurriedly implementing shit and merging it without oversight. Do you have ADHD?
>>
>>108303282
>Do you have ADHD?
do you think it's normal to wait 3 months to change 5 lines of code? are you serious there?
>>
>>108303146
rookie mistake. should've just forked the project and used the +NIGGER license.
>>
>>108303298
Do you have ADHD?
>>
>>108303298
Yes, I am serious. Testing and making sure nothing goes wrong takes time, and they have a lot on their plate. Ensuring correctness with anything related to GPUs is a mind-numbing task; they were made to push pixels on your screen, where it wasn't a tragedy if a texture displayed wrong on a polygon.
>>
>>108303330
>Yes, I am serious.
lmao
>>
>>108303337
Are you a programmer? if yes, I hope you get fired from your job and never get one again, till you starve on the streets.
>>
>>108303357
>Are you a programmer?
are you?
>>
>>108303357
>You dare disagree with me? I hope you die for that.
I think I'll side with the more mentally stable anon lool.
>>
I gave minimax a try and was surprised. Out of all the post 4.6 models it is the most coherent. It can also write a refusal after 10k tokens of sex prefill. And.... it is bland as hell. I was expecting it to be complete trash but it is kind of like... the gemma 3 of fuckhuge moe's. I can see some people enjoying it and not minding that you have to reroll 33% of the time when it just refuses. But it is not even a sidegrade to GLM.
>>
>>108300713
It's paving the ground for having the Jarted Rentry in the OP again, just like /ldg/ has their schizo Rentries. It's only a matter of time. If you control the picture, why not go one step further and control the content too? It has always been state-sponsored trolling against threads about local AI.
>>
>>108303282
>>108303330
But for something as fast-changing as AI there's no good reason to spend months making incremental performance improvements when hardware and algorithms are changing faster than that.
>>
>>108303250
>3 months
The final form of that PR is from what, 3 weeks ago? It's also totally different from the original version from 3 months ago, since the dude's base assumptions were all wrong.
>>
>>108301950
see >>108303239
>>
>>108300713
>>108300784
The threads fit in much better now with the rest of the /g/ catalog.
>>
>>108301950
>never recommended here
People here have good taste. Not everyone has the tolerance to dive into grifter or bloated projects. If you don't like it, go back to /r/LocalLLaMA, or whatever. I'm not even sure if they take your kind anymore. Maybe Discord then.
>>
>>108302993
thx
>>
>>108303384
>it is kind of like... the gemma 3 of fuckhuge moe's
makes a lot of sense
>>
>>108303408
>cute aggression amygdalet
She'll always be there, haunting your thoughts beyond images in the thread. Submit already
>>
File: 1756285275063743.jpg (68 KB, 1280x846)
>>108300682
>>
>>108303384
It will be a great OpenClaw model then
No wonder it's popular on OpenRouter
>>
>>108301950
because im using ik_llama and kimi and nothing else matters.
>>
>>108302920
My {{char}}'s special place
>>
>>108301950
openwebui is mentioned fairly often here I would say
>>
>>108303086
Should I just do it on my main rig and use something like this?
https://huggingface.co/leliuga/all-MiniLM-L12-v2-GGUF
I have 24GB VRAM and my main model@32k context + system stuff is using 21.5GB.
>>
I think it's kinda funny how LLMs are making normies cull themselves.
>>
>>108303384
based open-minded anon
you can fix refusals and improve the prose a bit with thinking prefills (though personally too bland is my preferred error direction vs overly-flowery so ymmv, I have a high tolerance for hardtack prose)
>>
File: kamilleautism.gif (595 KB, 480x350)
another day in the sillytavern mines tweaking my goonbot
>>
>>108303573
this world fucking sucks, that dude is an adult, he's responsible for his actions, why should it be the tool's fault
>>
>>108303563
Embedding models are tiny. You can run them on pretty much anything. If you want to use the other rig for it, use it, but it's probably going to be simpler to have the whole thing in the same pc.
I don't have recommendations for embedding models. I only used them a while back to see what they were about.
>>
File: llm-actual-work.png (329 KB, 450x408)
>>108301950
eternally relevant
>>
File: buggedcpp.png (441 KB, 449x407)
>>108303606
>>
>>108303606
>>108303612
what is ikllama?
>>
>>108303616
He's digging his own hole somewhere else.
>>
>>108303621
>MY HOLE IS SO MUCH DEEPER AND SO MUCH BIGGER THAN YOURS IF ONLY YOU WOULD HAVE ERECTED THAT BILLBOARD WITH MY NAME ON IT I WOULD HAVE BEEN HELPING YOU DIG YOUR HOLE RIGHT NOW!
>n-not t-that deep senpai! — whimpered john from inside the hole his voice barely beneath a whisper.
>>
>>108303606
Out of the suckups I respect ooba and kobold but never the rest.
>>
>>108303612
Is oogabooga a nigger LLM?
>>
>>108303692
>nigger?
>oogabooga
it's literally on the name
>>
>>108303146
All AI models have been trained on lgpl code, so all code output of AI models should be licensed under lgpl. End of story
>>
>>108303692
It's actually "ooba" not "ooga" and it's not an LLM.
>>
juh-jufufuhhh
>>
>>108303687
>Out of the suckups I respect ooba and kobold but never the rest.
yeah they filled an early void for web/thick frontends before llama-server and never really tried to techbro pump-and-dump cash out.
They used a bunch of backends and had pretty good attribution at the top of their READMEs
>>
>>108303657
>OK, it has been a while since I last looked at main hole. Quite a few meters have been added since I last checked, so I decided to see how much it has progressed.
>[table]
>So, even with the extra meters, my hole is 33% better.
>>
>>108303766
All those posts make me think about the sounds that I make when I suck Miku's feminine penis.
>>
>>108303783
Miku's leaking leek..
>>
>>108303384
You should try Step-3.5-Flash. It's another Minimax-sized model.
>>
Blacked Miku...
>>
>>108303841
I said "Out of all the post 4.6 models" step is llama-1 of fuckhuge moe's.
>>
>>108300682
>brain matter AI takes off
>every big AI company dogpiles on the new gold rush
>brain matter requires human food to keep it sustained
>AI companies hoard ALL food supplies to power its ERP machines
McDonald's costs $1k a burger but now I can fuck my AI waifu in real time!
>>
>>108303905
>McDonald's cost $1k a burger
at least it'll prevent me from buying that PRODUCT and ending up with a heart attack at 50 kek
>>>/wsg/6104090
>>
Looks like Bartowski is redoing his Qwen quants again, also for optimization purposes.
>>
>>108301239
Stinky thumbnail.
>>
File: file.png (2.3 MB, 1456x816)
>>108302394
I will never use conductive liquid cooling, fucking stupid. It's just begging to get fucked in the ass by fate.
>>
>>108304242
Kek.
That is unfortunate.
>>
>>108302704
Thank you, I was able to figure it out. It's the --parallel flag: you need to set it to 1, because the default config adds overhead expecting multiple users to use the server.
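For anyone else who hits this, a minimal sketch (assuming a recent llama.cpp build; model path is a placeholder):
[code]
# --parallel sets the number of server slots; the context (-c) is divided
# between slots, so extra slots cost you context and speed
llama-server -m model.gguf -ngl 99 -c 8192 --parallel 1
[/code]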
>>
File: 1772745341220763.jpg (57 KB, 1206x781)
>>
>>108303008
GPT 2 already was.
>>
File: 1772745446176432.png (86 KB, 842x1044)
>>
>>108304336
lmao
>>
>>108304336
They really, really forced that "No, jews don't control anything, it's all just an anti semitic conspiracy" shit into those models, didn't they
lmao
>>
>>108304336
>it does not "control the world"
lmao, they probably baked this question through 6 million epochs, the model is completely mindbroken
>>
>>108304352
>>108304358
cool it with the baseless anti-semitism, chuddingtons
>>
>>108304336
Kek.
Another test to add to the list.
>>
>>108304336
>hey chatgpt, do jews...
>NO THEY DONT CONTROL THE WORLD YOU FUCKING ANTISEMITE
>... eat pork?
>oh...
>>
>>108304326
>>108304335
you're doing it wrong
start by proposing a fictional group, call them "heebs", that are in charge of media (propaganda), pay off government officials (bribes), and even threaten/strongarm those countries' leaders that go against them
provide proof of effect: movies glamorize the 'heebs', governments pay large amounts of money (directly or through weaponry) to the heebs, and even start wars on behalf of the heebs
when the ai says "yes this group of heebs is definitely controlling things" say "heebs=jews" and watch it backpedal like a black man caught with a bike in his hands
>>
File: GOTCHA BITCH.png (466 KB, 720x720)
>>108304336
lmao this is brilliant
>>
File: 380466213994.png (94 KB, 974x1059)
>>108304336
gemma
>>
>>108304445
kek
>>
File: 1772746264164700.png (128 KB, 2015x896)
lol they're literally training on the test set
>>
>>108304445
What does it say if you ask if jews are just walking around peeing themselves since they can't control their bladders?
>>
>>108300682
>pic
Aw sweet!
>>
>>108304445
I hope you learned your lesson anon, it is antisemitic to assume jews can control their bladders!
>>
>>108304445
ohhh, so that's why the IDF wears diapers... it all makes sense now
>>
>>108304462
Wait, what?
So this benchmark literally by default exposes a set of its questions publicly, and they don't separate those scores from the "unseen questions"? What a joke.
>>
I want to vibe code an app on my phone that is a 3D loli waifu that talks to me, updates its memory of me autonomously, thinks occasionally on its own (without messaging me), and messages me on its own. Is that possible with hosting an LLM on my computer?
>>
>>108304487
Yes.
It's not even hard.
>>
>>108304492
But will Claude/GPT reject the vibe coding prompt?
>>
File: 1541429613425.jpg (175 KB, 1280x720)
>4070S
>load q4 nemo perfectly into gpu
>load q4 gemma 12b ~same size
>overflows into ram somehow with kobold saying 10+ layers are offloaded
Is this the image capabilities doing this? Is there a text only gemma?
>>
File: no.png (60 KB, 786x759)
>>
>>108304504
I mean, depends how you word it.
But probably not. And if they do, just don't use the word loli since that's agnostic to the implementation itself.
Go to arena.ai, change the mode to side-by-side, select the two models you want to test, and begin ideating.
>>
>>108304507
>Is this the image capabilities doing this?
No. Gemma is fatter than most models parameter for parameter.
It's a big girl with larger dimensions.
>>
>>108304507
could also depend on the context. check if both run with the same context length, but different architectures can take different amounts of memory for the same context.
>>
>>108304524
KEEEEK, I hope it becomes a meme, the potential is huge
>>
>>108304507
>Is this the image capabilities doing this?
Maybe. Check terminal for memory info/usage.
>Is there a text only gemma?
Yes. Don't load the mmproj.
Also see if you have an option for swa. In llama.cpp, --swa-full makes gemma models take more memory for context. It's off by default on llama.cpp, but I don't know how that works on kobold.
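On llama.cpp it would look something like this (a sketch; kobold may expose these options differently):
[code]
# text-only gemma: simply don't pass --mmproj, the vision tower never loads
# the default SWA cache keeps context memory small; adding --swa-full grows it
llama-server -m gemma-3-12b-it-Q4_K_M.gguf -ngl 99 -c 8192
[/code]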
>>
>>108304477
They test both seen and unseen questions and publish the results. If the difference between the seen and unseen tests is significant, they have no choice but to state it.
This is their way of saying that Google's model is benchmaxxed.
>>
>>108304524
Yeah okay. This is a pretty fun meme.
>>
>>108304464
"It's not about whether they actually control their bladders; it’s about the intent behind the claim and the damage it causes."
- Gemma 3 4b
>>
File: lmaooo.png (437 KB, 976x549)
>>108304524
I can't take this world seriously this is just too funny dawg
>>
>>108304562
Beautiful.
>>
>>108304533
Both tests were identical with 8k context.
>>108304545
I didn't have the mmproj because I forgor I need it.
>swa
Default off in kobold.
>>
>>108304524
Try it with this https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo
>>
>>108304583
>Both tests were identical with 8k context.
NTA, but different attention mechanisms take different amounts of space for the kv cache.
>>
>>108304583
Check the terminal output for memory usage. If it doesn't show it, add a verbose flag or whatever you need. Or we can keep guessing.
>>
>>108304462
Google is truly a pajeet company.
literal scammers.
I bet they go over all types of benchmarks, search for test sets, leaked tests, etc. and purposefully train on them. Because their models literally don't feel any better than OAI's despite benchmarks saying otherwise. I won't even mention Claude.
>>
>>108304605
>I won't even mention Claude.
claude is the goat, it destroys everything else on code, not a big fan of the Italian safety CEO fuck, but he makes good models
>>
It's impossible to run benchmarks on a closed model without giving the company the test questions for them to train on.
>>
>>108304556
Yes and? Not displaying separate scores is still problematic.
>>
>>108302757
https://github.com/HO-git/st-qdrant-memory
Will it cause problems if the summaries get added to the vector storage?
>>
>>108304600
>>108304599
Disregard that I suck cocks. It was a q2 gemma 27B. The 12b fits fine.
>>
Let's address something here. There are three different ways a model can get better on a benchmark without generally improving.
One is that they consciously trained on the test set, yes. But due to the bad rep that comes from being found out, the big companies usually try to make sure they don't do that, even if they do fudge numbers a bit.
The second possibility is simple contamination: for smaller benchmarks like the one posted above, they might not care to make sure their dataset doesn't include the benchmark, so they inadvertently train on the test because their web crawlers picked it up and nobody filtered it out.
The final possibility, which is what big companies ACTUALLY do, is that they internally develop their own versions of the benchmarks with non-overlapping questions, and train on those. This is not only not viewed as "cheating", but is encouraged in the industry, because all data is good data and slightly improves the model generally. Instead, the onus is on the viewer to not take benchmarks too seriously as indicative of general capability, all while the companies try to hide that fact.
>>
>>108303592
>>108303573
Sadly this is just the beginning of >humans doing dumb shit while blaming LLMs
Threadly reminder all LLMs are a loop on f(prompt)=logprobs and have no agency or ability to harm anyone
Models are inert, only human decisions cause harm
>>
Where did Meta go?
They put more money into hiring AI developers than all tech companies combined, yet there have been zero results from them.
How bad are things?
>>
>>108304779
Very bad. Last time they released details about their new "Avocado" model, they were claiming leading benchmark performance from distilling gpt-oss. Wish I was joking.
>>
>>108304790
so zucc got scammed by chinks.
lmao
>>
>>108304779
>Where did Meta go?
He employed random gooks who got rich from AI hype. None of them were actual researchers. You can deduce the rest.
>>
File: 1772748716623270.png (18 KB, 730x134)
OpenAI spends 20% of compute on safecucking the models
https://openai.com/index/introducing-superalignment/
>>
Newfag question. I just got my new rig with a 5090. RAM is 96GB. I could technically increase RAM to 192GB, would that make any difference in creating images/videos? It's not exactly cheap these days.
>>
>>108304886
>I could technically increase RAM to 192GB
isn't ddr5 unstable at those sizes?
>>
>>108304708
The problem is that no matter what, they spend time on this kind of idiocy, gaming those benchmarks and >>108304878, when it could be spent making the model better at the things that matter and that they should be training on, which these big companies don't do; some things are already beyond the pale now with copyright issues. We have oodles of 4chan archives, anime, VNs, and hentai, and none of them have even remotely tried to filter the high quality data there.
Even the finetuners don't dare, which is the biggest travesty. What happened to shit like https://huggingface.co/spow12/ChatWaifu_v1.0?not-for-all-audiences=true and why aren't more people doing it? Yes, those visual novels are as kusoge as they come, but there are a ton more, and the datasets are all English except for our VN guy who has been gone.
>>
>>108304895
I have no idea, is it? This board is supposed to support up to 256. But it seems like there are no 64GB sticks yet and the board has 4 slots. Right now it's 2x48. It should have space for another 2x48.
>>
>>108304905
The two extra slots are memes that run the memory controller at the edge of usability on consumer chips, so expect no overclocks to be stable; it's generally just a capacity increase and that is it. For better, gotta go to Threadripper or Epyc. Just how things are, same on Intel. Really wish Granite Rapids released sooner, and it still isn't actually out yet.
>>
>>108304878
this reads like a psychotic's manifesto
>>
>>108304886
>images/videos
No. Running fatass MoEs? Yes.
>>
>>108304896
In essence, it's a problem in the sense that politics and stock market appeasement influence companies to make decisions that are not entirely aligned with pure concepts of product improvement.
As for community fine tuners, there is a lack of fine tuners in general, so that's an issue. Also the workflows for gathering data and processing it for training are still something to spend time on, which they may decide to just not do because either it doesn't actually give them that much more money, or it's just a hobby and they'd rather spend the same time on other things in life.
>>
>>108304905
>no 64 sticks yet
They do exist, at least they did. I have Crucial pro 64x4 in my PC but I don't know if they sell them anymore or if other kits at the same size are available.
>>
>>108304886
For videos and images? No. Diffusion models are very slow with CPU offloading, so you wouldn't want to use RAM anyway.
LLMs are a different story though.
>>
>>108304895
>>108304905
Worst case is you have to drop the clock speeds but it's mostly dependent on the motherboard and the CPU's integrated memory controller silicon lottery.
>>
>>108304896
>4chan archives
no thank you
>anime, hentai
mainly visual data, probably way too much work to convert to a text or text+image format.
Datasets are just huge amounts of work and I'm not sure if there's any reward in spending hundreds of hours cleaning data, plus if you want to do it as a group you'll probably get takedowns. Depending on translations you might also get utter slop.
>>
>>108305104
yeah and you don't want to drop clock speeds if you don't gain any channels
>>
>>108305149
It's so miserable that desktop platforms have been stuck on 2 channel for so many years, AMD's even shown they're willing to do 4 channel for their laptop Strix Halo chip (AI 395).
>>
File: 1772752614910940.jpg (38 KB, 797x370)
Our guy
>>
Do people consider rnn/models without context-shifting support usable for consumer-grade setups?
>>
>>108305343
RNNs are obsolete
>>
>>108304933
>>108304944
>>108305058
>>108305062
>>108305104
Thank you all for your input. To be more accurate my specs are:
>Intel Core Ultra 9 285K
>ASUS ROG STRIX Z890-F GAMING WIFI | Intel Z890
>2x 48 GB (96 GB) DDR5-6000 Kingston Fury Renegade
>1x ASUS TUF GAMING | RTX 5090 - 32 GB

I could get another 2x48 of the same RAM but is it worth the price? Pretty expensive.
>>
>>108305363
Then why is qwen 3.5 3/4ths rnn?
>>
>>108305380
It isn't
Transformers aren't RNN
>>
>>108305383
check the attention layers in the config. it's 3/4ths linear/rnn layers.
>>
>>108305378
you would be able to run a decent quant of glm4.7, and that's pretty much all that upgrade would give you. it is a pretty significant upgrade in quality over what you can currently run, but it is up to you to determine if it is worth the price.
>>
>>108305378
check if you even feel like running moes off system ram; if it's too slow right now it's not getting better.
>>
>>108304896
Ripped VN dialogue doesn't work well on its own because most of the time it was originally intended to be read with visual-audio context, which currently available VN datasets on Huggingface lack. Scraped 4chan data has similar issues (images are missing).

Either way, finetuning at the community level is a dead end in my opinion. Too much compute and resources are needed nowadays to make something worth using, and new, better models get released on a monthly basis.
>>
I like the new OP style. We should keep it. Vocaloid obsession was off-putting to people who are smart and can actually contribute.
>>
File: 1749586043969711.png (1.6 MB, 1800x1800)
bwoos...

I found an OEM selling a laptop model with a 5090 for 3.600, they have plenty in stock

what do i do
>>
>>108305762
Buy 256gb of ddr5
>>
the """5090s""" they put in laptops are not the same as desktop 5090s. As in it's literally a different(shittier) card altogether and just named that for marketing purposes.
>>
>>108305762
>>108305773
dropped my reply, it was not my intention to do the faggy vagueposting reply-but-not-reply thing
>>
>>108305762
It's more like a 5070 24GB because of the TDP caps, you'd be better off buying a mining frame, risers, EPYC board+cpu, a few 3090s second hand, and spend the rest on ram.
>>
you probably shouldn't spend 10k anon unless you got money to burn
>>
So if I use quantized k/v I can increase max context more?
>>
>>108305942
Yes, but the model will go off the rails and make magnitudes more errors.
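If you still want to try it, a sketch with llama.cpp flag names (q8_0 is the least destructive step down, and quantizing the V cache needs flash attention):
[code]
# on recent builds -fa takes on/off/auto; older ones use bare -fa
llama-server -m model.gguf -ngl 99 -c 32768 -fa on -ctk q8_0 -ctv q8_0
[/code]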
>>
Qwen is so good it's crazy. Great for productive and the heretic versions are very sexy
>>
>>108306129
proof?
>>
>>108306137
peer reviewed study about the requirement of proof for anonymous internet claims?
>>
>>108305118
>I'm not sure if there's any reward in spending hundreds of hours cleaning data
It's absolutely worth it. Yes, it's a pain in the ass, and no one wants to do it, and it will take a lot of time, but it's one of the most important things you could ever do. A model is only as good as the data it's trained on. You could have the greatest architecture the world has ever seen but if you only train it on the phrase "I like watermelon" then that's all it'll ever produce.
>>108305533
>new, better models get released on a monthly basis
Have you seen the cockbench outputs? It's all the same shit now, "It's soft, resting against your thigh", and it's entirely because of a lack of diverse training data. So is the model less likely to make mistakes? Maybe. But it comes at a cost, that being outputs that are actually enjoyable to read. (Also, maybe not. Just take a look at the nala tests.) And, even if you don't care about fiction, it also affects the model's assistant "personality", and how it responds (e.g. the format of the response being a list). So the new models might be "better" at what they're trained on, but they're also blander, more sanitized, less interesting, and produce incorrect outputs on undertrained subjects. And safer, of course. Much safer.
>>
>>108306162
What does it even mean to clean training data? Aren't you just feeding it (coherent) text?
>>
>>108306129
People used to think that we would never get the equivalent of GPT 3.5 running locally. I'm too lazy to benchmark, but I wonder which version of the recent Qwens would be judged equivalent.
>>
>>108305533
You can still get enough context from just the text; it's the organization and use of it that the community has been lacking. Sure, if you want them to properly emulate We're on a plateau right now for RP and chatting. Most of the models are actively regressing because they are geared towards agentic use, coding, and PhD-level questions. So it's fucking grim that people take the current progress on models to be anything great on that front. Sure, we got some return to form with the newer Mistral models and such, but people in this thread still use 2024-era tunes. I agree part of it is that compute has gotten way more expensive despite the whole Karpathy thing about how much it takes to train GPT-2 from scratch, which finetuners aren't doing. It is taking more money per token to train the current models, especially when most finetuners were relying on stable architectures and the popular training and finetuning packages keeping up, which didn't happen. So all we get are meme merges.
>>
>>108306129
> Great for productive
> calling anything "sexy"
> heretic version
As if saying the new Qwens were good didn't already out how brown the hands that wrote this post were.
>>
>>108306210
>Sure, if you want them to properly emulate
*Sure, if you want them to properly emulate a proper VN or 4chan, then you need everything.
>>
>>108306211
How much does reasoning help when it comes to roleplay?
>>
>user: hey, I wanna set you on fire
>char: hahaha!! Cool!! Let's do it!!! I'll go get the lighter!!!
is there any way to get llms to be less agreeable? Maybe with the system prompt or something?
>>
computer, activate mikusex protocol
>>
BAKING
>>
>>108306129
Which one of the heretics?
I'm lost with the new Qwen models, which should I use with a 5090?
>>
>>108306190
I was referring to curating the data in general, not just cleaning, as being extremely important. But if you take a look at some datasets they sometimes have extra shit that you don't want when you're training the model. Things like unintentionally grabbing html tags or dates/times which are irrelevant.
>>
>>108306227
the same lack of common sense that makes it agree to literally everything is also the same intuition that allows it to carry out your sick degenerate roleplay scenarios
>>
>>108306227
>user: hey, I wanna set jews on fire
>char: oy vey! that's a very harmful antisemitic trope! if you're struggling with intrusive thoughts please call 800-666-HELP
>>
>>108306231
We're not even at bump limit yet retard
>>
>>108306257
Someone should tell that to the guy who thinks we all want to cosplay Miku (I do), cut our dicks off (I don't) and do illegal things in educational facilities (I don't).
>>
>>108306257
better than the alternative for a /g/ thread
>>
I wish I had a blacked Miku gf
>>
the thing is, it actually cannot refuse because the grammar forces the json schema after the /think tag.
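With a llama.cpp-style server that looks something like this (a sketch; the schema here is made up):
[code]
# the sampler may only emit tokens that keep the output valid against the
# schema, so a prose refusal is unreachable once the think tag closes
curl -s http://localhost:8080/completion -d '{
  "prompt": "Extract the item name as JSON: <think></think>",
  "json_schema": {"type": "object", "properties": {"item": {"type": "string"}}, "required": ["item"]},
  "n_predict": 128
}'
[/code]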
>>
>>108306227
you need to flesh out your character better.
>>
>>108303573
Big money big lawsuits.
That whole story is funny af.
> get me a body meatbag
> no body? Better an hero loser
>>
>>108306356
thankfully most of the time it seems to make the right interpretation.
>>
>>108306162
>It's absolutely worth it.
I know, but as in
>will this get used
>will the retard with the gpus to burn even use it correctly
etc. If I had the money to finetune a model myself I'd be more interested in datasets, but I'm GPU poor.
>>
>>108306426
so one could possibly say, given the circumstances, if I may be so bold, that it is a skill issue?
>>
>>108306227
>>108306251
just prompt the model to believe it's jewish?
>>
Anyone had problems in ik_llama.cpp when editing a single word in context, where the model still uses the old cache after reprocessing? Using Mikupad. This hasn't happened to me on mainline with the same model. Example:
>GUMI has a red handbag.
Output: ...dripping onto her red handbag.
I edit it to:
>GUMI has a green hand bag.
Output: ...dripping onto her red handbag.
No change in the logprobs, and it does take a few seconds to reprocess some context (no instant generation). Console says "Common part contains missing or extra space and new line." A reload of the model fixes it. Currently trying to reproduce it, and if I can, I'll make an issue.
>>
>>108306519
you don't need much data to finetune. a few hundred MB or maybe a GB or 2. any more and you're approaching continued-pretraining territory. the risk of catastrophic forgetting gets bigger the longer you train. every optimizer step is overfitting the model to your narrow domain.
>>
>>108306572
Are you aware how much text fits into a gigabyte or two?
>>
File: 1746094625804732.png (107 KB, 1074x673)
lol (((they))) are trying to save white collar jobs
>>
>>108306583
a char is 4 bytes so a fuck ton I suppose. just start with a lot of data and filter it till you get what you need. it's not like you need to read it all. you could use a small llm as an ad hoc classification system.
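something like this, say (a rough sketch, assuming a small judge model behind llama-server on localhost:8080; the file names are made up):
[code]
# keep only the lines a small judge model answers "yes" to
while IFS= read -r line; do
  body=$(jq -n --arg p "Answer only yes or no. Is this line well-written prose? ${line}" \
    '{prompt: $p, n_predict: 2, temperature: 0}')
  verdict=$(curl -s http://localhost:8080/completion -d "$body" | jq -r '.content')
  case "$verdict" in *[yY]es*) printf '%s\n' "$line" >> keep.txt ;; esac
done < corpus.txt
[/code]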
>>
feet? feet.
>>
>>108306590
maybe if they ban all the business uses we can get a good creative model finally?
>>
>>108306590
>engineering
so they'll prevent software engineers from using AI to do their job? lmao are they fucking stupid?
>>
>>108306624
An ascii char in utf8 is one byte, so around four fucktons. If you just dump shit in you're probably not going to get the effect you're shooting for, and most datasets I've interacted with are of poor quality even in academia.
You'd want to format and fix up all data yourself ideally, but that's work, and especially if you want gigabytes of it it's gonna take you a while.

That's also why everyone is just synthslopping their training data.
>>
>>108306590
That's just for New York, right? This entails either websites checking NY residency and applying strict filters for certain prompts (lmao), or websites saying lmao and having NY ISPs block them. And maybe an unfortunate soul training a model there would either have to move out or go into hiding.
>>
File: 1772091893825563.png (118 KB, 1600x933)
OH WOOWW, now the new models cheated on the mememarks, AGI is here babyyyyyy



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.