/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 39_06118-2_.png (1.1 MB, 720x1280)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102524339 & >>102513868

►News
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release/
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
>>102535977
fuck your anime garbage
>inb4 anime site
kys
>>
File: 42 Days Until November 5.png (1.91 MB, 1472x1104)
>>
File: recap-102524339.jpg (3.69 MB, 1780x7623)
►Recent Highlights from the Previous Thread: >>102524339

--Papers:
>102527814
--Quantization turns floats into ints, smaller quant = faster and lower VRAM:
>102531358 >102531397 >102531399
--JavaScript code to linkify greentext quotes in threads:
>102525946 >102526315 >102527428 >102530573
--Discussion on the timeline, definitions, and challenges of AGI and ASI:
>102525653 >102525785 >102526071 >102526499 >102526932 >102527674
--Working on a userscript to treat different symbols as quotes:
>102525273 >102525299 >102525393 >102525519
--Custom Floating-Points for LLMs, but results may be model-specific and lack statistical significance:
>102531991 >102532241 >102532256
--Choosing Lora rank and alpha, understanding Loras, mathematicians vs. engineers:
>102532942 >102533148 >102533195 >102533307 >102533287 >102533302 >102533338 >102533423 >102533538 >102533570 >102533627 >102533559 >102533602 >102532973 >102532996 >102533100
--Anon shares concerns about cloud AI services logging and safety:
>102531752 >102531956 >102532242
--Nvidia's Llama-3_1-Nemotron-51B-Instruct model discussion:
>102524761 >102524862 >102525022 >102525041 >102525306
--Choosing between Aphrodite, vLLM, and llama.cpp based on hardware constraints:
>102529626 >102529816 >102529861 >102529888
--Request for de-slopped Llama3 405B tune for AMD MI300x:
>102525233
--Chromebook insufficient for serious AI inference:
>102530210 >102530784 >102530860 >102532235
--Anon praises Florence2 multimodal architecture:
>102532950 >102533001 >102533091 >102533288
--Miku (free space):
>102524999 >102525111 >102525197 >102525335 >102527437 >102532158 >102533585 >102534092

►Recent Highlight Posts from the Previous Thread: >>102524347
>>
any abliterated Qwen2.5 72B?
it refuses to do certain wholesome and family friendly things.
>>
>>102535991
I want to fuck the anime girl.
>>
File: 1727137045529766.jpg (870 KB, 2048x1568)
Happy autumn, /lmg/
>>
Thanks but I'm still sticking with Command-R v01
>>
>>102536036
Why are her eyes empty?
>>
File: ComfyUI_01089_.png (1.27 MB, 1272x1024)
>>102535991
anime site
>>
>>102535999
Can an AI create that image you attached to the post, or does a human still have to make that manually?
>>
>>102535991
>fuck your anime garbage
OP needs an image to start the thread. What would you put there instead if you had your druthers?
Honest question, I'm curious.
(I'm assuming an actual answer and not a content-free negative eg "not anime")
>>
Is Autumn the season of hibernation?
>>
>>102536081
There's a body under the leaves and you are too close.
>>
>>102536127
neither, just a normal script does that part
>>
>>102536036
rake teto
>>
File: 1727167138625474.png (646 KB, 512x768)
>>102536036
Isn't it too early?... Oh, wait, it's almost October already!
>>
>>102536036
tetoctober soon
>>
>>102536179
>fall red eyes
I like this Miku
>>
>>102536128
lecunny. dead body of sama. llama. graph from an llm paper. model card screenshot.
>>
File: Xeon 6 P-core SKU Map.png (935 KB, 1920x1080)
>12 channel DDR5-6400
>or 12 channel MRDIMM-8800
>PER SOCKET
epycbros I don't feel so good..................
>>
>>102536215
>lecunny
Yes, because we should revolve the entire general around a person that spends all day being passive aggressive on social media.
>dead body of sama
blue board
>llama
Meta will eventually turn against open source. They all do.
>graph from an llm paper. model card screenshot.
Mostly empty image that no one will spend any time looking at. If you want to be a pseud and virtue signal your intelligence, you can fuck off back to r-ddit.
Anime. Website.
>>
File: IMG_8263.jpg (363 KB, 2048x1583)
teto
>>
>>102536237
It's a meme, bro, you won't even get half of the speed
>>
>>102536272
You are just arguing for the sake of arguing. All of those options are more on topic than hatsune miku(male). And that makes all of them better.
>>
>>102536277
Is this Academic bullying?
>>
>>102536215
>dead body of sama
kino
i think I've seen all the others in past threads, though
Thanks for the honest answer. I personally like variety in the OP, especially when the image is riffing off of a previous thread's conversations/news. Vocaloids make good stand-ins for pretty well any scenario, so I think that's partly why they get used (along with the whole virtual idol/AI thing being appropriate to LLMs)
I am still curious as to why you hate anime images so much, though. I'm sure there's some other thing out there that would annoy me in the same way, but your reaction still puzzles me
>>
>>102536237
Israel isn't a real country.
>>
>>102536349
free energy not bullying
>>
>>102536308
>It's a meme, bro, you won't even get half of the speed
like epyc? 30% cheaper and 50% slower (at least on any hard math, specifically atan2) kek
>>
reminder anthracite spent all their money on failed finetunes and now can't even pay their shills
>>
why aren't there ever Macross references in this thread? There are lots of other famous AI type shows out there
I know we get some Blade Runner images sometimes, but it feels like there's a lot of other fertile ground that's being ignored
>>
File: 1234231412389342.jpg (47 KB, 639x422)
>>102536215
>no anime
>model card screenshot
wut
>>102536334
>miku is off topic
>but let me sperg out about her and bump the thread
double wut
>>
>>102535991
I’d just like to interject for a moment. What you’re referring to as Anime, is in fact, a Vocaloid, or as I’ve recently taken to calling it, ボーカロイド.
>>
Reminder that you said you were trying to quit trolling, Evan.
>>
>>102536355
I just hate the local troons that gather around that image. I actually like miku songs.
>>
>>102536366
I like magnum-123B and will willingly shill it for free.
>>
>>102536237
>epycbros
I've gotten enough good use out of my dual EPYC build for a relatively cheap build price that I've got no regrets
Tech marches on. Glad Intel is putting something better out there. Hopefully there's some way to get ahold of it for less than 6 figures.
Good luck to future cpumaxxers!
>>
i am starting to feel like taking a blacked miku shit...
>>
calm down cuda dev
>>
>>102536407
>local troons
This isn't actually a thing is it? I haven't seen a non-troll tranny reference in years in this general
>>
>>102536355
He is just butthurt that there are a few VRAM chads itt that like genning mikus
He's been spilling spaghetti all over this general for a while now, see
>>102525042
>>102525071
>>
>>102536460
>in years in this general
this general is only a year old retard
>>
How much data do I need to make a finetune worth anything?
>>
>>102536472
3
>>
>>102536460
It is, and one mikufaggot here was doxing people when he thought it was one of the anti-miku trolls.
>>
>>102536416
it was a bit of a shock to come back to this site after a decent number of years and see all this obsession over trannies in any thread over the slightest thing, when no one used to talk about them at all.
>>
>>102536482
Is there a unit to that, or is data a dimensionless quantity?
>>
Bootstrap. Iterate. Bootstrap. Iterate. And one day we get ASI. Meanwhile Lecun basically admitted defeat saying it's too hard. Hope I can see his AGI cat in ten years.
>>
>>102536494
>when no one used to talk about them at all
Because they were rightly ridiculed and called out for being mentally ill instead of having extra privileges.
>>
>>102536511
yes
>>
>>102536511
3
>>
>>102536519
Do not care. Not your personal army. Fuck off.
>>
>>102536472
as few as 1000 samples can get you a meaningful result, as per the LIMA paper. more is better though, ideally you should do as much as you can as long as your data is all good quality. high number of good quality samples > low number of good quality samples > high number of low quality samples
>>
>>102536517
JEPA will R U I N you.
I can't wait to see your reaction.
>>
>>102536578
JEPA is vaporware
>>
>days have passed
>he's still upset that I accurately pointed out all mikuposters are pedophiles
seethe harder nonce
>>
>>102535991
>inb4 anime site
Correct. Now go kill yourself, reddit troon
>>
>>102535977
>70B-instruct distilled to 51B
What do I need to imagine here exactly, 70b TYPE quality in the form of 51b? I guess that would be more ideal at Q6-8 or whatever instead of 70b at Q4?
>>
>>102536804
imagine lobotomized slop
>>
>>102536816
Google's managed to do distillation surprisingly well.
>>
>>102536804
It's a sort of calibrated type of pruning followed by knowledge distillation, so it should come pretty close, in theory at least, although probably not on every domain.
>>
gpt voice is rolling out for real this time
>>
>>102536823
Interesting idea to say the least, question is how well it worked out for them.
>>
>>102536834
I set up gemma2 with Whisper, a shell-enabled dialog engine, and Festival, and I can say without a doubt that it's the most frustrating, clumsy way to use a computer.

Fucking toggling in machine code on a switch panel would be more ergonomic.
>>
Are the new CR's that bad?
>>
>>102536885
Probably about the same as every other distillation attempt. Great on benchmarks, but retarded for any practical usage.
>>
>>102536895
>Are the new CR's that bad?
They lost their only differentiating factor when they started chasing benches with slop datasets. They don't have the same soul that some could squeeze out of the first gen.
>>
>>102536895
The CR+ refresh is a side grade at best
>>
>>102536895
Cohere is dead. After what has happened to Mistral and the Chinese models, the only hope for local at this point is that Jamba 2 can actually carry its (literal) weight.
>>
>>102536986
>Isreal is our only hope
how many months has the Jamba 1 pull request been festering?
>>
>>102536908
>great on benchmarks
what do I buy to invest in this brilliant model!?
>>
>>102536535
You sound like a troon. Their main strategy is trying to make people ignore them as they take more and more power.
>>
anyone complaining about anime here is a newfag who desperately needs to go back
it's specifically the mikuposters who need to face the wall
>>
>>102537316
>complains about anime
>>
>>102537289
>Not my personal army!? YOU SOUND LIKE A [buzzword]!
And you sound like a child. You need to be 18 to post here, also not your private army, faggot.
>>
Slow day huh?
>>
>>102537405
>>102536366
>>
>102537390
miku is the vocaloid mascot, newfag. go back.
>>
File: ComfyUI_00960_.png (1.07 MB, 856x1024)
>>102537390
>replying to the resident schizo's schizobabble
>>
>>102537405
slow month, slow year
not looking good for local models
>>
it's actually over this time huh
>>
good
>>
>>102537625
it was over the moment troons took control of this place. current state of it was just a matter of time.
>>
>102535999
Fix the fucking references.
>>
the poster above me needs to go back
>>
the poster above me needs to kill himself
>>
In retrospect, what went wrong?
>>
the poster above me is really cute :3
>>
Someone is desperate to derail this thread. They REALLY don't want us speculating about Meta's big announcement tomorrow. I wonder why? Who benefits from this...?
>>
>>102537670
Faggot. Drummer.
>>
>102537647
>>102532154

>>102478518
>tldr can't have more than 9 mentions now, probably cause of the "ever wonder why" poster
>>
>102537625
>102537635
>102537654
>102537662
>102537668
>102537670
what's up with these gay ass posts? embarrassing.
>>
>>102537714
oh i get it, miku shitter is mad because OP pic is not miku this time, lol
>>
What is the prompting secret sauce so that characters know they can't look you in the eye while turned away, and other such things?
>>
>>102537742
not using Magnum 12B
>>
>>102537742
About 100b promptamaters
>>
https://x.com/OpenAI/status/1838642444365369814
>>
What's a good llm to run with Sillytavern and simulate an online chat with my imaginary waifu?
Right now I'm running the NemoMix-Unleashed-12B-Q6_K.gguf which is kinda alright but some messages are really weird
>>
Terrible voices
>>
>>102537792
largestral
>>
>>102537711
So you are saying he is ban evading and trying to skirt the rules? Shouldn't some janny take care of this?
>>
>>102537792
Try these
Mistral Small
Mistral Large
Hermes-3-Llama-3.1-70B
Gemma 2 27b
Midnight Miqu 70B
older version of Command-R
Mixtral 8x7B (fast on CPU for its size)
>>
File: high_effort_shitpost.jpg (214 KB, 573x1268)
>>102537792
>>
>>102537750
>>102537764
What's the solution if I'm poor?
>>
>>102537850
>So you are saying he is ban evading and trying to skirt the rules? Shouldn't some janny take care of this?
resolving uncomfortable or difficult people issues through ham-fisted technological means is a tried-and-true method used by lazy managers everywhere. Bonus points if it makes the world worse for everyone else.
>>
>>102537862
Buy more ram
>>
>>102537861
That's just average local turd experience.
>>102537776
People bullshitting on fossjeet here: https://x.com/reach_vb/status/1838645845652332955
>>
>>102537862
>What's the solution if I'm poor?
Assuming "dont be poor" is too far out of reach for you, then "be more patient" is typically the fallback.
Alternatively, you could also sell your soul to online services
>>
>>102537813
>>102537856
Thanks, I'll check them out
>>
>>102537862
Use a different 12b.
>>
>>102537856
don't forget the mixtral 8x22b WizardLM finetune
So many things depend on hardware specs tho
>>
>Model scopes for Vector Storage will be enabled by default in the next release. Opt-in earlier by setting enableModelScopes to true in the config.yaml file. This will require to regenerate stored vectors.
i enabled it with a previously made db and it didn't seem to regenerate. is this normal or am i expected to purge old ones first? usually this kind of migration stuff is automatic
>>
Best base text continuation model for 40gb VRAM + 64gb RAM?
>>
>>102538092
Mixtral
>>
>>102538102
don't listen to this retard, download Magnum 2.5 Kto
>>
I'm going to eat 7 hitlerbars
>>
>>102538102
Mixtral is worse than even nemo.
>>
>>102537856
I'd recommend Hermes 2 over 3. 3.1 can be strange.
>>
I like slop
>>
>>102538124
think of the children
>>
>>102538194
qwenbro...
>>
>>102538205
Sure they'll also get some hitlerbars
>>
>>102538194
I don't mind it if the model is doing great otherwise.
if that's the price to not read about a girl giving me a blowjob while kissing my lips softly, I'll gladly take it.
>>
>>102538194
Go away woman. Fuck some chad.
>>
I do not know what slop even is.
>>
>>102538194
Slop likes you too :)
>>
>>102538250
thank you :)
>>
>>102538258
Look into the mirror anon
>>
File: 1718979347484461.jpg (43 KB, 600x450)
>>102537862
Install russian online super-RAM
>>
>>102538194
based
>>
>>102538296
What kind of weird potato is that?
>>
>>102536237
Time for a tetomaxxing guide
>>
>>102538321
the fluffy kind
>>
If the models have a rolling window, why do they still go schizo as they near the allotted context? Mind you, I am using a 32k context size. Am I misunderstanding what a rolling window means when it comes to LLMs?
>>
24 hours from now, Llama, and thus local, will be saved.
>>
>>102538446
They only released 3.1 a month ago. 4 isn't coming until next year.
>>
>>102538446
two more years
>>
>>102538428
What model? If it's something like nemo it's only good to 16k. So try setting it to that if you're using context shifting.
>>
Is there a way to let an llm search the web on its own if it realizes that it doesn't have enough information about a topic?
Let's say the cut-off date is 2023 and I'm asking "Tell me what happened in the year 2024" the LLM will then give the answer and reflect that this is wrong or useless information and will perform a web search instead.
>>
>>102538194
they hated him because he was the same as them
>>
>>102538498
function calling
>>
wait, you guys have a local schizo too? i thought that was just /sdg/ and /ldg/
>>
>>102538194
good slop:
>half-lidded eyes
>shivers
bad slop:
>ministrations
>don't think this means anything, i still
>>
>>102538446
Tacked on multimodal won't save anything.
>>
>>102538573
local schizoid general
>>
>>102538498
>Is there a way to let an llm search the web on its own
Yes. Read up on function calling.
>if it realizes that it doesn't have enough information about a topic?
They have no introspection. They don't know what they know, for a very generous definition of knowledge.
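To make that concrete, below is a minimal sketch of the function-calling loop, assuming an OpenAI-compatible local server (e.g. llama.cpp's llama-server on :8080) and a model actually trained for tool use. web_search() is a stub, and the endpoint/model names are placeholders rather than anything a specific backend requires:
[code]
import json
from openai import OpenAI

# assumption: any OpenAI-compatible endpoint works here (llama-server, etc.)
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

def web_search(query: str) -> str:
    # stub -- plug in SearxNG, a scraper, whatever backend you trust
    return f"(search results for {query!r} would go here)"

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for information past the training cutoff.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "Tell me what happened in the year 2024"}]

while True:
    resp = client.chat.completions.create(model="local", messages=messages, tools=tools)
    msg = resp.choices[0].message
    if not msg.tool_calls:   # no tool requested: this is the final answer
        print(msg.content)
        break
    messages.append(msg)     # keep the tool call in the history
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": web_search(args["query"]),
        })
[/code]
Note the model never "realizes" it lacks information on its own; the tool description is what nudges it to call the function, so that description does most of the work.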
>>
File: file.png (3 KB, 741x17)
>loss going down
>eval loss going down
>epoch 0.5
I'm feeling it! This time I will make the best model ever.
>>
File: 1698546841473101.jpg (32 KB, 500x375)
>>102538573
Every general has resident schizos, simply the way it be
>>
>>102538596
i'll use your model if it's 12b or under
>>
>>102538596
i'll use your model if its 70b or over
>>
File: 172686929462211.jpg (845 KB, 2048x2048)
>>102538333
Checked
>>
>>102538583
>ministrations
I've never even heard the word "ministrations" irl
>>
>>102538583
>shivers
Used too much but not necessarily bad.
Another one is "a mix between"
>>
>>102538664
me neither, and i'm a pretentious pseudo-intellectual sesquipedalian scrabble player
>>
>>102538664
It is a word invented specifically for Harlequin romance aimed at women.
>>
>>102538641
try sonnet, you'll never go back to localcucking
>>
>>102538775
this is /lmg/ retard
>>
>>102538775
>$0 / month
>Access to Claude 3.5 Sonnet
What's the catch?
>>
>>102538799
They train on your logs also you get rate limited
>>
>>102538793
>retard thinks local only applies to language models
this is your brain on 8k context
>>
>>102538854
yes, to a local models general
now fuck off to your proxy before it croaks and you have to cook up another piss drinking video, faggot
>>
>>102538854
>Local MODELS general

Kys shill
>>
>>102538883
>>102538885
mald more, you will never have local gpt-4o capable AI.
>>
>>102538908
r u sure
>>
>>102538883
you're an absolute retard. you can use non-local text models with other local models like image gen and they are a million times better
>>
>>102538908
>you will never have local gpt-4o capable AI
>ClosedAI so afraid of local that they're banning people for trying to reverse engineer a prompt

Back to /aicg/ little pajeet
>>
>>102538922
>image gen
????
not a general for this either? did your mother drink excessively during pregnancy or something?
>>
>>102538922
Then go to a thread for image gen, or aicg, not one meant literally for llm's
>>
But how will I know when I get there?
And how will I know when to leave?
>>
>>102538922
>a general dedicated to the discussion and development of local language models.
>>
>>102538927
>afraid
funny headcanon
>>
File: ComfyUI_00820_.png (1.19 MB, 1024x1024)
>Miku, get the locust spray
>>
>>102538922
>you can use non-local text models with other local models
how?
>>
>>102538940
>>102538941
>>102538958
discussing how trash local models are in comparison is discussion. you can't possibly be this dumb
>>
>>102538959
>No mention of GPT-5, no mention of Sora, no mention of GPT-4o with voice enabled
>Months of work for a CoT finetune
>Btfo'd by Qwen in coding
>Btfo'd by Sonnet in literally everything else
>Sama seething on twitter
Kek.
>>
>>102538995
SillyTavern with local TTS connected to claude for example.
>>
>>102539010
nice fanfic
>>
>>102539000
No faggot, you interrupted an actual discussion about local by saying just use Sonnet
>>
>>102539010
>No mention of GPT-4o with voice enabled
r you blind? >>102537776
>>
>>102539032
My bad, they delivered on one of their promises after months, OpenAI is back and Sama def wasn't dilating on twitter
>>
>>102539030
it's the best solution. if you can't handle discussion on 4chan, try reddit, you can downdoot facts that infuriate you.
>>
>>102539032
I can see fine I just can't hear so how could I know that retard?
>>
i have a 6700 XT 12 GB and an i7-10700k. is there anything i can run decently local or do I need an nvidia gpu?
>>
>>102539051
Openai's tech actually works though.
>>
>>102539093
Look into the rocm koboldcpp build.
You can run nemo-instruct at a decent quant with a good amount of context.
>>
>>102539057
Best solution for the tards at /aicg/ This is local models retard
>>
the only schizo in this general is the schizo who calls everyone a schizo
>>
magnum shills have been real quiet ever since anthracite ran out of money huh
>>
>>102539118
>This is local models retard
oh the horror
>>
>>102539154
Good riddance
>>
>>102539154
Rocinantesisters we won
>>
>>102539154
money for what? I thought they got all their compute undeserved
>>
>>102538664
I have, it is mainly used in religious connotations.
>>
>tard squad finetunes a shitty base
>makes it marginally better
>/lmg/ opens their wallets
>tard squad finetunes larger models
>/lmg/ realizes the dataset sucks
>pretends they never liked tard squad
happens at least 4 times per year
>>
>>102539154
>>102539173
How does one (1) guy alone absolutely BTFO anthracite so much?
>>
>>102539250
hi drummer
>>
>>102539256
Hi Sao. I am not Sao. You are Sao.
>>
>>102539269
unironically Drummer
>>
>>102539219
They can't scam me if I don't have money to begin with.
>>
What if the final solution model never comes? And it will be a perpetual state of new, slightly smarter, differently slopped models you can kinda enjoy for 2-3 roleplays before you see everything they tend to repeat and you can't take it anymore. And you will have to keep 100s of models around, swapping them to get different styles?
>>
>>102539312
Go back to Pyggy and see if things have improved or not
>>
>>102539312
That sounds like a you problem.
>>
I can't make sense of this thread at all and I consider myself pretty knowledgeable about open source LLMs. What the fuck are y'all talking about?
>>
>>102539323
The cooming plateau is here.
>>
File: file.png (77 KB, 1165x567)
Saars in their natural habitat are funny.
>>
Can I run Qwen2.5 32B 4.65bpw or even 5.0bpw on a 3090?
>>
>>102539682
check filesize
>>
>>102539693
4.65bpw is 20GB and 5.0bpw is 21.68, but context also takes some space so I'm not sure.
>>
Is there any way to speed up context loading? At the cost of extra ram perhaps?
>>
>>102539729
You need to enable turbo mode.
>>
>>102539729
You need to download more FLOPS
>>
File: 39_06277_.png (1.55 MB, 1280x1280)
>>102534097
Flux D 1.0
>>
can i run this stuff on amd cards? rx 7900.
I don't want to train anything, just want to play around with image gen and chatbots.
I don't mind compiling stuff and digging through forum posts, but I'm not sure if this is a complete fool's errand
>>
>>102540000
>>102539117
>>
>>102540000
ye
i have a fun time with just 8gb on my shitty 4060 so you'll probably have a blast with your 16-20gb (or whatever) even if it's stinky amd
>>
small question, if i want to make a lora of a model, does it need to be the pure safetensors file or can i use a gguf?
>>
>>102539803
The Dota 2 Turbo mode?
>>
File: file.png (67 KB, 300x188)
>>102540090
The turbo mode you enable on your pc case.
>>
Best model for erotic RP? Im not sure whats the latest stuff
>>
>>102540176
this one's my current favorite
https://huggingface.co/mradermacher/Arcanum-12b-GGUF/tree/main
it's not leaps and bounds over other nemo merges or anything though.
>>
File: file.png (375 KB, 807x580)
>>102540130
I only have a turbo mode on my gamepad, but it's connected to a pc. Does that count?
>>
File: 11__00149_.png (2.11 MB, 1024x1024)
>>102536215
Image genned with a local model, what's the problem anon?
>model card screenshot
Ah so what you really want is free real estate to shill, fuck right off
>>
>>102540239
back to cage >>>/a/nimal
>>
>>102540196
>combining TheDrummer/Rocinante-12B-v1.1 and MarinaraSpaghetti/NemoMix-Unleashed-12B using a novel merging technique.
>novel merging technique.
Without a proper cooming testing methodology why does this mean anything?
>>
>>102540176
mistral nemo / mistral small / mistral large

Biggest you can fit.
>>
File: 1725922368500279.jpg (649 KB, 2384x1808)
>>102540266
Checked
>>
File: 1700027893072764.jpg (242 KB, 1024x1024)
>>102535977
>>
>>102540324
kek
>>
File: hmmmm.gif (352 KB, 256x256)
>huggingface.co/gghfez/SmartMaid-123b-exl2
New largestral slop dropped
no fp16 weights???
>>
>>102539312
imo I just need these models to have better spatial reasoning/world models
>>
>>102540969
>maid
Undislop?
>>
>>102540969
Buy an ad
>>
>>102540267
>why does this mean anything?
it doesn't.
>>
Small but commendable performance improvement on code generation: https://arxiv.org/html/2309.02772v3
On this topic, do you guys know about anything else that could improve code generation?
>>
>>102541017
No thanks, rabbi. I think instead I'll post whatever the fuck I want.
>>
File: AGI_confirmed.png (383 KB, 648x764)
>>102539312
things are about to accelerate
>>
>>102541159
If I had to guess, newer models trained on datasets from CoT models will probably increase coding benchmarks significantly.
>>
>>102541543
pls tell me I won't have to work ever again and can instead live my life doing things I actually enjoy
>>
>>102541713
That's communism
>>
>>102541722
I'll take it as long as it doesn't turn into authoritarian garbage.
>>
What's the best most intelligent, creative, soulful model for RP currently?
>>
File: be more grateful.jpg (87 KB, 945x2048)
>>
>>102540049
>>102540079
cool thanks anons
>>
>>102538583
the only good slop is the one ood
>>
File: Untitled.png (103 KB, 1071x421)
Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering
https://arxiv.org/abs/2409.16167
>Low-Rank Adaptation (LoRA) has emerged as a popular technique for fine-tuning large language models (LLMs) to various domains due to its modular design and widespread availability on platforms like Huggingface. This modularity has sparked interest in combining multiple LoRAs to enhance LLM capabilities. However, existing methods for LoRA composition primarily focus on task-specific adaptations that require additional training, and current model merging techniques often fail to fully leverage LoRA's modular nature, leading to parameter interference and performance degradation. In this paper, we investigate the feasibility of disassembling and reassembling multiple LoRAs at a finer granularity, analogous to assembling LEGO blocks. We introduce the concept of Minimal Semantic Units (MSUs), where the parameters corresponding to each rank in LoRA function as independent units. These MSUs demonstrate permutation invariance and concatenation-summation equivalence properties, enabling flexible combinations to create new LoRAs. Building on these insights, we propose the LoRA-LEGO framework. This framework conducts rank-wise parameter clustering by grouping MSUs from different LoRAs into k clusters. The centroid of each cluster serves as a representative MSU, enabling the assembly of a merged LoRA with an adjusted rank of k. Additionally, we apply a dual reweighting strategy to optimize the scale of the merged LoRA. Experiments across various benchmarks demonstrate that our method outperforms existing approaches in LoRA merging.
might be cool no code though so w/e
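Since there's no code, here's a rough sketch of the clustering step as the abstract describes it: treat each rank of each LoRA as one independent unit, k-means all the units together, keep the centroids as a merged rank-k LoRA. The per-unit vector layout is a guess on my part, and the dual reweighting step is omitted:
[code]
import numpy as np
from sklearn.cluster import KMeans

def merge_loras(loras, k):
    # loras: list of (A, B) pairs, A of shape (r, d_in), B of shape (d_out, r)
    units = []
    for A, B in loras:
        for i in range(A.shape[0]):
            # one "minimal semantic unit" per rank: row of A plus matching column of B
            units.append(np.concatenate([A[i], B[:, i]]))
    km = KMeans(n_clusters=k).fit(np.stack(units))
    d_in = loras[0][0].shape[1]
    A_new = km.cluster_centers_[:, :d_in]    # (k, d_in)
    B_new = km.cluster_centers_[:, d_in:].T  # (d_out, k)
    return A_new, B_new

# toy check: merge two random rank-8 LoRAs for a 64->64 layer into one rank-12 LoRA
loras = [(np.random.randn(8, 64), np.random.randn(64, 8)) for _ in range(2)]
A, B = merge_loras(loras, k=12)
print(A.shape, B.shape)  # (12, 64) (64, 12)
[/code]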
>>
>>102541866
Same as "you reached context limit - enjoy OOM moment or extreme hallucinations".
>>
>>102541824
Seconding this but it needs to fit onto 24 GB of VRAM without stepping below 8-bit quantization.
>>
>>102541824
>>102542255
mythomax
>>
>>102542255
No it needs to fit into 64G of ram
>>
slop is soul and I'm tired of pretending it's not.
>>
what's the best model for flirting with a venezuelan math teacher while I roleplay as a homeless black midget pretending to be a middle schooler?
>>
>>102542298
Probably something by anthracite
>>
>>102542290
>buckbroken
>>
>>102542275
Thank you, Anon.
>>
File: Untitled.png (1.85 MB, 1080x3125)
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
https://arxiv.org/abs/2409.16040
>Deep learning for time series forecasting has seen significant advancements over the past decades. However, despite the success of large-scale pre-training in language and vision domains, pre-trained time series models remain limited in scale and operate at a high cost, hindering the development of larger capable forecasting models in real-world applications. In response, we introduce Time-MoE, a scalable and unified architecture designed to pre-train larger, more capable forecasting foundation models while reducing inference costs. By leveraging a sparse mixture-of-experts (MoE) design, Time-MoE enhances computational efficiency by activating only a subset of networks for each prediction, reducing computational load while maintaining high model capacity. This allows Time-MoE to scale effectively without a corresponding increase in inference costs. Time-MoE comprises a family of decoder-only transformer models that operate in an auto-regressive manner and support flexible forecasting horizons with varying input context lengths. We pre-trained these models on our newly introduced large-scale data Time-300B, which spans over 9 domains and encompassing over 300 billion time points. For the first time, we scaled a time series foundation model up to 2.4 billion parameters, achieving significantly improved forecasting precision. Our results validate the applicability of scaling laws for training tokens and model size in the context of time series forecasting. Compared to dense models with the same number of activated parameters or equivalent computation budgets, our models consistently outperform them by large margin.
https://huggingface.co/Maple728
Only the smallest 50M model has been uploaded so far
https://github.com/Time-MoE/Time-MoE
300B timepoint dataset still to be released
>>
>>102542314
>anthracite
the slop brigade? no thanks I don't want the model forgetting I'm a black midget every swipe.
>>
File: nala experiment.png (319 KB, 916x805)
Well now that's an interesting result. I was expecting a lobotomized model. It's certainly forgotten what an EOS token is, though.
>>
L3.1-70B-Hanami seems good so far. 3.1 smarts, but it seems to be breaking its dryness.
>>
File: ball.png (128 KB, 915x407)
>>102542430
I seem to have created one of those man made horrors beyond your comprehension.
>>
>>102542430
>>102542447
commaslop
>>
>>102542430
Which model?
>>
>>102542458
Some qlora I ran on Mistral-Small-Instruct
an experiment in using an extremely high dropout rate.
>>
>>102542430
>>102542447
>She
>Her
>She
>Her
>She
>She
>Her
>She
>>
You realize you're coming up on the 16k context limit. Do you:
1. Keep going, trusting that discarding the start of chat history will be fine
2. Switch to a lower quantization of your current model so you can increase the context without a big slowdown
3. Increase the context at the price of having to offload more to RAM, drastically slowing down
4. Summarize the chat and restart
5. Other (write your own)
>>
I don't load my models with 16k context
>>
>>102542492
lmao
>>
File: Untitled.png (202 KB, 1316x682)
Whisper in Medusa's Ear: Multi-head Efficient Decoding for Transformer-based ASR
https://arxiv.org/abs/2409.15869
>Large transformer-based models have significant potential for speech transcription and translation. Their self-attention mechanisms and parallel processing enable them to capture complex patterns and dependencies in audio sequences. However, this potential comes with challenges, as these large and computationally intensive models lead to slow inference speeds. Various optimization strategies have been proposed to improve performance, including efficient hardware utilization and algorithmic enhancements. In this paper, we introduce Whisper-Medusa, a novel approach designed to enhance processing speed with minimal impact on Word Error Rate (WER). The proposed model extends the OpenAI's Whisper architecture by predicting multiple tokens per iteration, resulting in a 50% reduction in latency. We showcase the effectiveness of Whisper-Medusa across different learning setups and datasets.
https://github.com/aiola-lab/whisper-medusa
kind of cool. the medusa-block is probably the one to use.
>>
>>102542430
>>102542447
Also here's an old forgotten arxiv paper on it.
https://arxiv.org/abs/2403.00946
In their experiment they used 90+% dropout, but that was back when finetuning was still done layer by layer I think.
I tried at 90% at first but it was instant lobotomy so I dropped both the learn rate and the dropout to 75% and yeah... I think the results are interesting and worth exploring further.
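For anyone who wants to poke at this, the knob boils down to one field in a peft LoraConfig. A sketch with the 75% value from above; every other hyperparameter here is a placeholder guess, not the actual settings used:
[code]
from peft import LoraConfig

config = LoraConfig(
    r=64,               # placeholder
    lora_alpha=16,      # placeholder
    lora_dropout=0.75,  # the experiment: zero out 75% of LoRA activations per step
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # placeholder
    task_type="CAUSAL_LM",
)
[/code]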
>>
You realize the woman you're having sex with is actually a man. Do you:
1. Keep going, it's too late to take it back anyway
2. Switch to your hand so you don't get blue balls
3. Kill him and hope your state has gay panic laws
4. Bend over and give him a turn
5. Other (write your own)
>>
File: xnBO8ARozp.png (49 KB, 538x392)
>>102542586
Qwen2.5
>>
>>102542492
summarize and keep going until the model's stupidity drives me insane and I do something else for a few days
>>
File: sanic desu.png (142 KB, 788x812)
sovl
>>
File: stawberry.png (9 KB, 851x188)
interesting. It seems to have undone Arthur's cook-in of the correct answer.
>>
>>102542492
>all that to get 10% max. of cloud model's power
>>
>>102542492
Keep going, I only summarize when the current scene has run its course.
>>
>qwen 2 vl says retarded things
2.5 vl when?
>>
>>102542724
A cloud model is useless because you're at a corporation's mercy. Nobody here is interested in your shilling.
>>
>>102542492
2. Increased to 32k context and now I'm going at 2.5 tokens per second.
>>
>>102542746
fuck I mean 3, undo
>>
>>102542740
No shilling, telling it as is, you will never have anything usable with these toys.
>>
>>102542724
localbros, how do we respond without sounding mad?
>>
>>102542789
Keep making "ahh ahh mistress" one message tests i guess?
>>
Would a q2 qwen 72b program better than q5 gemma 27b?
>>
File: a0a.jpg (192 KB, 508x677)
>>102542756
There is no undo, face the consequences of your actions and take responsibility.
>>
>>102542492

Due to the limitations of local, I tend to keep my roleplays episodic in nature while keeping the overarching themes intact via either RAG or lorebook maintenance. Option 4 is perfect in that regard.

For programming or more serious, ‘normie-friendly’ projects where my own ideas or privacy don’t matter, I always opt for cloud.
>>
>>102542820
It's over. After 19730 tokens of context a switch flipped and the model repeated its last reply like a broken robot in a TV show.

Zero regens until now. Zero edits until now. Is my run over?

I guess I should have done >>102542636 >>102542830
>>
>>102542851
claud doesn't have that problem
>>
Gemma doesn't know how to make ascii art.
What model can do decent ascii art?
>>
>>102538573
Don't forget /aids/, they have multiple
>>
File: 1716475862235626.png (1.81 MB, 1224x1224)
I'm writing a small script (https://github.com/battleprogrammershirase/BUERgence) to quickly narrow down on the best inference parameters for llama.cpp. Right now I'm only testing -t and -ngl since these seem to have the biggest impact on performance. Are there any other parameters I'm missing out on especially as a VRAMlet?
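Besides -t and -ngl, the batch sizes (-b/-ub) and flash attention (-fa) can also move the needle if your build supports them. And since llama-bench (ships with llama.cpp) already does the measuring, a grid sweep can be as dumb as this sketch; the binary/model paths are placeholders and the JSON field names may differ between versions:
[code]
import itertools, json, subprocess

BENCH = "./llama-bench"  # placeholder: path to your llama.cpp build
MODEL = "model.gguf"     # placeholder

results = []
for t, ngl in itertools.product([4, 8, 12, 16], [0, 10, 20, 30, 99]):
    out = subprocess.run(
        [BENCH, "-m", MODEL, "-t", str(t), "-ngl", str(ngl), "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    # llama-bench emits one result row per test (prompt processing, generation)
    for row in json.loads(out):
        results.append((row.get("avg_ts") or 0.0, t, ngl,
                        row.get("n_prompt"), row.get("n_gen")))

for avg_ts, t, ngl, n_prompt, n_gen in sorted(results, reverse=True):
    print(f"t={t:3d} ngl={ngl:3d} n_prompt={n_prompt} n_gen={n_gen} -> {avg_ts:.1f} t/s")
[/code]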
>>
>>102542830
I dislike lore books because they can't affect the first message where the keyword appears if the keyword was in an AI response. This problem isn't theoretical for me. Actual case in my last chat using a Monster Girl Encyclopedia lore book I was trying to improve: when talking about something else, the model started talking about werewolves and weresheep because they're things that reasonably could (and do) exist in the setting, and wrote a bunch of stuff that contradicted MGE lore.
Not great solution: when a new lore book entry would be triggered by the newest AI post immediately regenerate it with the additional entry.
My solution: stop caring about MGE lore because it's bland and many of the descriptions are the same thing.
>>
>>102542492
5: Thank God that He created me with both the intelligence and the drive to not be a poorfag
>>
>>102540239
These Tetos are always interesting to admire.
>>
>>102542851
Yeah it's falling apart. Regen was fine, next message questionable, next was going back to the time loop. RIP. I guess the adventure is done. Even if I switch to a more powerful model the writing style and ideas of how the story should work won't be the same. It will be like someone else took over all of a sudden. Maybe I can delete all the example messages to get another 1.2k of context to try to limp along to a conclusion but around 19k tokens looks like the limit of Mistral Small.
>>
>>102542513
Even if you set it longer the recall isn't as good past 16k for a lot of models.
>>
I hate to say it but I think I might really go back to Wizard or some 8x22B after all. Mistral Large is too slow for me, and Mistral Small and Nemo are too dumb. I haven't checked out Sorcerer yet. Maybe I'll try it out.
Miqu 2 when?
>>
>>102535999
>florence is amazing!
>not a single example in the thread
blegh
>>
File: tech_case.jpg (214 KB, 975x668)
>>102543037
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery
https://arxiv.org/abs/2409.05591
https://github.com/qhjqhj00/MemoRAG
>>
File: 1701199461973372.png (38 KB, 869x340)
>>102543295
it's happening?
>>
>>102543295
Why does the graphic go from bottom to top?
>>
>>102543206
Screenshot of full log.
>>
>>102542586
I would not have sex with someone I don't already know deeply about and love.
>>
>>102541866
How long is the limit?
>>
>>102543472
45 minutes per day
>>
File: 1722588141364.png (569 KB, 2468x984)
>>102543285
>>
>>102543491
That's not too bad, assuming they don't count silence so you can actually use it like they advertise as an always-on tool.
>>
>>102543498
Cool, is there a comparison for Florence2?
>>
>>102543562
https://desuarchive.org/g/thread/101749053/#q101750118
https://desuarchive.org/g/thread/101749053/#q101750162
https://desuarchive.org/g/thread/101749053/#q101750228
the guy who originally posted it said Florence was Florence-2-Large-ft
>>
>she Xs, her eyes Ying
>>
File: 119147028_p1.png (3.28 MB, 2569x1440)
>Tuesday is over
Bet. A new model will release today that is not from Meta.
>>
>>102543519
no, silence counts too.
redditfag claimed he needed to take a phone call, so he muted himself in the chatgpt app, but it still counted down the minutes.
45 min per day is still more than i thought. i was thinking like 15 min per day or something.

the main problem is people everywhere are complaining about how they get the "my guidelines won't allow me to talk about that" CONSTANTLY. even for work related stuff.
And apparently its output is being gimped even harder since the initial rollout. so even fewer imitations/effects.
The funniest part is normies STILL get their own voice cloned or hear an unrelated 3rd voice. "scary". lol
i hope somebody comes along who just doesn't give a shit. just release it and let people figure it out. ms-paint would not have been released in 2024.
first reactions: what if somebody draws child genitalia with it?!?! totally normal behavior from these SF freaks..
>>
>>102543667
catbox the uncensored version
>>
>>102543677
Wow, I thought it'd be funny if I jinxed it, but that's crazy if true. Like what the fuck, what a scam.
>>
>>102543726
The pixiv link is right there anonymo-
Wait that's a 4chanx feature isn't it.
Just install 4chanx bro, you're going to save yourself a lot of trouble in the future.
>>
>>102543230
>Miqu 2
Looking back on it, miqudev was the most based person to ever grace this general. We may never see his like again...
>>
>>102539312
Improvements are constantly being made, though they are primarily refinements. I think the next big leap will be when they solve catastrophic forgetting. Once they do that it will be all about continuous learning and years of refinement will be done on that. We have no need to rush, AI isn't going anywhere anytime soon.
>>
>>102543726
Don't do what the other Anon said, never install 4chanx if you can help it.
>>
This might be a retarded idea but why can't we add user feedback on ST for discarded gens and preferred gens and use that as a dataset to train a custom little reward model that would be used later to prefilter the next generations? I think cai did something like that before.
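It works in principle: the training side is just a Bradley-Terry pairwise loss, the same recipe RLHF reward models use, and the reranker wouldn't need retraining when you swap gen models since it only scores text. Nothing on the ST side exports such pairs as far as I know, so the data format and the base model below are assumptions:
[code]
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "distilroberta-base"  # assumption: any small encoder is fine
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

# toy stand-ins for (kept swipe, discarded swipe) pairs
pairs = [
    ("She grinned and tossed you the keys.",
     "A shiver ran down her spine, a mix of fear and excitement."),
    ("The orc spat and reached for his axe.",
     "I cannot continue with this request."),
]

def score(texts):
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    return model(**batch).logits.squeeze(-1)  # one scalar reward per text

model.train()
for epoch in range(3):
    for chosen, rejected in pairs:
        # Bradley-Terry: push the kept gen's score above the discarded one's
        loss = -F.logsigmoid(score([chosen]) - score([rejected])).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

# at inference time: generate N swipes, surface the highest-scoring one first
[/code]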
>>
>>102543726
install 4chanx but leave it disabled
>>
>>102544223
t. regularly gets filtered
>>
File: 1723482268503787.png (6 KB, 281x110)
>>102544261
>>
File: file.png (733 KB, 768x768)
>>
>>102544249
>why can't we add user feedback on ST for discarded gens and preferred gens and use that as a dataset to train a custom little reward model that would be used later to prefilter the next generations
You are asking why can't we make local models not local? I dunno anon... But yeah it is a great idea you could ask locusts to do. I am sure they can make an extension for that or something.
>>
>>102544476
Are you retarded? Everything I said can be done locally
>>
>>102544450
updates... doko...
>>
>>102544450
that face...
>>
>>102544249
Why do you need feedback? Just delete any gens you don't like and all the jsonl is now your dataset.
>>
>>102544489
>Are you retarded?
Are you? It is incredible how you don't see a problem with this.
>>
>>102543295
>https://github.com/qhjqhj00/MemoRAG

Oh fuck yeah. Thank you, chinks.
>>
>>102544515
Do you even know what a reward model is?
>>102544516
Enlighten me then?
>>
>>102543295
Wait... Is this the holy waifu grail? And we are finally gonna get waifus and the final problem won't be the alzheimer's but their positivity bias and how they will talk to us about consent? Weird timeline.
>>
>>102544450
bad gen, her top is like an unfinished suggestion
>>
>>102544543
I know. But I like the top 80% of the picture a lot and I didn't want to cut it.
>>
File: cuteandlovelymiku.png (1.08 MB, 800x1248)
good night /lmg/
>>
>>102544540
>Weird timeline.
kys
>>
>>102544540
>https://github.com/qhjqhj00/MemoRAG

Kinda want to try out their summarization module. But that might just be limited by the model that's being used at the end of the day.
>>
>>102544553
Miku, it's 10 AM. Get out of bed.
>>
>>102544526
NTA but it’s inspiring that you talk to retarded people like that; they’re probably very lonely and crave the social connection. Anyway, yes, that’s an interesting idea, but people like to switch models all the time and you’d need the hw capacity and time to do the actual finetuning every time you switched. It could be QoL maxxed with some effort tho.
>>
>>102544632
>>102544526
samefag
>>
File: 1702915973031111.png (6 KB, 507x138)
>>102544647
Retard
>>
File: file.png (8 KB, 579x138)
>>102544647
You got me...
>>
File: mikuramen.jpg (79 KB, 900x607)
>>102544661
>>102544687
The duality of anon
>>
>>102543295
I just thought about next steps and what is gonna be the new sally brothers thing for testing if your waifu can remember things? Cause you can bet everyone here is gonna be doing all those memory riddles instead of actually enjoying their LLM waifu.
>>
>>102544848
>>102544848
>>102544848
>>
>>102544540
It's just a better lorebook; that won't solve the long-term memory issue in a conversational setting
>>
Bump
>>
sage sage sage


