/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106522347 & >>106516368

►News
>(09/08) OneCAT-3B, unified multimodal decoder-only model released: https://onecat-ai.github.io
>(09/08) IndexTTS2 released: https://hf.co/IndexTeam/IndexTTS-2
>(09/05) Klear-46B-A2.5B released: https://hf.co/collections/Kwai-Klear/klear10-68ba61398a0a4eb392ec6ab1
>(09/04) Kimi K2 update for agentic coding and 256K context: https://hf.co/moonshotai/Kimi-K2-Instruct-0905
>(09/04) Tencent's HunyuanWorld-Voyager for virtual world generation: https://hf.co/tencent/HunyuanWorld-Voyager

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
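The GGUF VRAM calculator linked above mostly boils down to arithmetic you can sanity-check yourself: quantized weight size plus KV cache plus some fixed overhead. A back-of-the-envelope sketch of that idea (my own simplification, not the linked calculator's actual formula; all parameter values in the usage note are assumptions):

```python
def estimate_vram_gb(n_params_b: float, bits_per_weight: float,
                     n_layers: int, n_kv_heads: int, head_dim: int,
                     ctx_len: int, kv_bytes: int = 2,
                     overhead_gb: float = 1.0) -> float:
    """Rough VRAM estimate for a quantized model:
    quantized weights + KV cache + fixed overhead for compute buffers."""
    # 1e9 params at 8 bits/weight is ~1 GB, so scale by bits/8.
    weights_gb = n_params_b * bits_per_weight / 8
    # K and V caches: 2 tensors x layers x kv_heads x head_dim x context x bytes.
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * ctx_len * kv_bytes / 1e9
    return weights_gb + kv_gb + overhead_gb
```

For a Llama-3-8B-shaped model (32 layers, 8 KV heads, head dim 128) at ~4.5 bits/weight, 8K context, and FP16 KV cache this lands around 6.6 GB; treat it as a lower bound, not a guarantee.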

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: rrrrrrrrrrrr.png (558 KB, 480x768)
►Recent Highlights from the Previous Thread: >>106522347

--Papers:
>106528839
--GLM Air quantization tradeoffs and performance benchmarks:
>106523514 >106523546 >106523556 >106523647 >106523581 >106523562
--Analysis of new NSFW prompt steering dataset and its potential impact on model training:
>106523317 >106523496 >106523652 >106523755 >106524808 >106524921 >106524317 >106524624 >106524678 >106524740 >106524797 >106525004 >106525167 >106525178 >106524832 >106525039 >106525225 >106525328 >106525388 >106525503 >106525662 >106525920 >106525780 >106526087 >106526133 >106526227 >106526333 >106526592 >106526620 >106526703 >106526728 >106526797 >106528480
--Game audio format debate: WAV vs FLAC efficiency and storage concerns:
>106522913 >106522928 >106522993 >106523037 >106523040 >106523049 >106523118 >106523295 >106523337 >106523351 >106523369 >106523372 >106523426 >106523441 >106523083
--IndexTTS-2 release and comparison with BigVGAN vocoder:
>106523054 >106523065 >106523076 >106523085 >106523861 >106523877 >106524403 >106524449 >106524544
--VibeVoice license restrictions on voice impersonation and consent requirements:
>106523582 >106523597 >106523639
--Multi-speaker voice generation challenges in VibeVoice with ComfyUI:
>106523404 >106523481 >106524063
--Phrase banning effectiveness vs coherence/performance tradeoffs in LLMs:
>106525300 >106525320 >106525366 >106525463 >106525512
--High cost of NVIDIA GPUs due to CUDA dependency and firmware restrictions:
>106524893 >106524908 >106525006 >106525779 >106524910
--Introduction of REFRAG: Rethinking RAG based Decoding for Multi-step Reasoning:
>106524013 >106524026 >106524047 >106524915
--OneCAT-3B multimodal generation model performance issues:
>106524720 >106524764
--K2 Think open-source reasoning model launched in UAE:
>106524516
--Teto (free space):
>106524155 >106524287 >106528101

►Recent Highlight Posts from the Previous Thread: >>106522352

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1748227897708796.gif (2.44 MB, 800x675)
>>106528965
what da teto doin
>>
How do you guys come up with good prompts for image generation that actually work well?
>>
>>106528979
what model
>>
File: cat stare.webm (1 MB, 576x1024)
>>106528992
sd-3.5-large-ggml
I'm open to other models if this one is poo
>>
Tetolove
>>
File: hero.png (379 KB, 573x549)
what are the odds that a new thread pops up just as i finish going through my older drives for some maintenance :D well anyways enjoy some caps i got my niggas

first of all:
our hero...
>>
File: beeku.webm (365 KB, 720x720)
>>
File: slaym.png (2.32 MB, 1536x1536)
the gel hair was the best. never liked the faces though, they were always too simplistic and picasso-esque aesthetically speaking. also wish i found the og dipsy, the short muscular one in the white ching chong dress. speaking of which, if you read this and remember, anon, i was the tired autist who replied to you lol
>>
File: 1744020740387654.png (3.33 MB, 1692x1860)
>>106528999
I don't know about non-anime models
But for anime models browsing https://aibooru.online/posts?tags=has%3Ametadata and checking the metadata of images that interest you is a good start
>>
mikusovl
https://www.youtube.com/watch?v=-5aR2fRklN8
>>
File: GhNDF3IbgAEvzKY.jpg (159 KB, 678x900)
if anyone remembers this thing lol
>>
>>106529016
I will now have that glass of bees, thanks.
>>
File: Ghanej_XAAAsr7c.jpg (108 KB, 680x766)
>>106529032
this but 2.0 or 1.0, i forgot which one came first. also, going through the archives i found the zip so i shall link it as well https://github.com/rejunity/tiny-asic-1_58bit-matrix-mul
>>
File: 1742230553446913.png (1.78 MB, 1054x1904)
>>106529032
This one is cuter
>>
>>106528940
Is there actually any sort of legal issue with the game here? I get that it's Japanese etc., which was why it was taken down out of respect, but is there anything legally wrong with it if it wasn't written down that you can't do it? Also, surprisingly, no one has tuned on visual novels when they are a pretty good source of RP and fiction-writing knowledge.
>>
File: 2025-01-27_03-54-19.png (386 KB, 789x920)
lastly this XD. i will stop spamming now, apologies, i just thought it would be cute, so have like a little lookback. if anyone has anything else do share, mine aren't the best, i just did a quick walkthrough and didn't check my archives properly
>>
File: 1747881301372260.png (1.24 MB, 1920x1080)
Any progress on infinite memory? It's just not gonna happen, is it?
>>
File: 1732650311767260.jpg (66 KB, 850x627)
>>
>>106529023
>og dipsy
Was it this one? There was more but I can't find them.
>>
File: lmg-at-home.png (53 KB, 542x476)
>>
File: 1750564531340861.png (946 KB, 1784x1414)
>>106529084
coming any day now
>>
>>106529138
Good morning sir!
>>
>>106529143
https://litter.catbox.moe/lw6x485dkf1zmjr5.mp3
>>
>>106529112
ye thats the one :D
>>
>>106528973
getting ready to swallow bread
>>
File: 1746144292946963.png (184 KB, 692x479)
>>106529203
>>
NEW DRUMMER FINETUNE
https://huggingface.co/bartowski/TheDrummer_Valkyrie-49B-v2-GGUF
THIS IS NOT A DRILL
>>
Is there actually anything better than vibe voice?
>>
I tried downloading the vibevoice-community repo to see if any of the issues I've experienced are being caused by comfyui, and it is crazy slow. Did I miss a step or something? In comfyui, it'll usually take a little less time to generate an audio file than the length of the audio file itself. Running this directly via CLI, it took a couple minutes just to do a 4 second voice clip.
>>
>>106529281
failed the vibe check.
>>
File: 1732913061179217.jpg (1018 KB, 1416x2120)
>>
>>106529307
>when the period is early
>>
>>106529281
The issue is this:
>the comfyUI vibevoice nodes are wrappers
>if you are on Windows, comfyUI portable is using its own torch install
and
>when you are using the repo
>you are using your global torch install
Difference is that your global torch is probably cpu only and Cumrag torch is using gpu too.
But only you know the difference.
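One quick way to tell which build you have: torch wheels from the PyTorch download index tag the build in the version string (e.g. `2.4.0+cpu` vs `2.4.0+cu121`), while plain PyPI wheels carry no tag at all. A small helper to check the string (the tagging convention is PyTorch's; the helper itself is just an illustration):

```python
def is_cpu_only_torch(version: str) -> bool:
    """True if a torch version string explicitly advertises a CPU-only build.

    Wheels from download.pytorch.org carry a local version label such as
    '+cpu' or '+cu121'; wheels without any label can't be judged from the
    version string alone, so we only report True for an explicit '+cpu'."""
    return "+" in version and version.split("+", 1)[1].startswith("cpu")


# In practice, run inside each environment (the ComfyUI-embedded python
# and your global one) and compare:
#   import torch
#   print(torch.__version__, torch.cuda.is_available())
```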
>>
File: 1744826027273291.jpg (992 KB, 1416x2120)
>>
File: 1751245848327294.jpg (308 KB, 984x984)
>>106528960
>>106528813
What causes mental illness like this? I couldn't imagine having a family member anywhere near as far up her own ass as she is. I almost don't want to believe these people even exist.
>>
>>106529333
Internet addiction resulting in losing a grasp on reality.
>>
File: 1742034808673029.jpg (70 KB, 986x904)
70 KB
70 KB JPG
>>106529153
>Do the needful

What the hell does that even mean?
>>
File: 1738649831402048.jpg (693 KB, 1416x2120)
693 KB
693 KB JPG
>>
>>106529333
Ironically, the problem could be fixed by family members going up her ass.
>>
>>106529317
Ah, that's quite possibly the issue. Thanks.
>>
>>106529346
freshly squeezed tetomeat
>>
>>106529333
Bruh, antis are going fucking nuts lately. I am 100% sure that at some point in the near future one of them is going to kill an AI user. Go to their subreddits some time. It's an absolutely unhinged contest of who can demonstrate their hate for AI users the most.
>>
>>106529359
If you check demo/text inference.py or whatever it was named, it has options too..
>>
>>106529023
Hey.
Tbh the style of the faces is also basically unintentional, or rather I kind of don't care a ton about it. I optimized the artist mix of these gens for rendering that jelly look to the slime, and that's the face that resulted. This is true for pretty much all of my gens really, I focus on the style of the environment and some other aspects and the face is an afterthought.

>>106529112
Hey. Here's another. I wasn't the first to gen a deepseek mascot tho. I think I saw a few people coming up with different designs and decided to try adding another to the pile, something a bit different from "dipsy" and the others at the time.
>>
File: 1728618664952634.jpg (724 KB, 2000x1496)
724 KB
724 KB JPG
>>
>>106529346
Is she ok?
>>
>>106529153
lmao
Is this trained on indian scammer calls or something?
>>
File: 1726404699404191.jpg (766 KB, 2000x1496)
766 KB
766 KB JPG
>>106529426
shes got a lot on her mind
>>
File: file.png (230 KB, 694x718)
>>106529396
Now I remember the drawfag's ones too https://litter.catbox.moe/1ofhmuz9qsf1vuhi.zip
>>
File: 1736350421650724.png (136 KB, 1454x576)
136 KB
136 KB PNG
>>106529333
Oh sadly there's plenty of them
>>
>>106529488
That first one is a particularly bizarre, yet distressingly common form of mental illness.
>>
>>106529488
>steal
>>
>>106529317
That was exactly it. It doesn't have the issue I was running into with comfy where it randomly interjects new voices or uses the wrong voices in multi-speaker conversations. I was immediately able to set up a two-person conversation without any random voice switching. It does still distort pretty hard on longer audio, though. Maybe that's an issue of using a higher CFG.
>>
File: 1738861016392255.webm (1.66 MB, 1280x720)
1.66 MB
1.66 MB WEBM
K2 reasoner??? when??
>>
>>106529488
>Al isn't useful unless it's constantly fed
Are normies really that unaware of local models?
>>
>>106529511
it's like people in game modding refusing to release source code and DMCA'ing shit they don't even maintain anymore because "you reuploaded my work without muh PERMISHUN"
>>
>>106529569
Don't you know? AI takes pieces of existing art and puts them together like a collage. Once there is no more original art left to steal AI stops working.
>>
File: 1755397041960705.png (123 KB, 1486x564)
123 KB
123 KB PNG
>>106529586
TRUEEE
>>
>>106529569
Normies are that unaware of most of how computers work, do you really think they know what a "local instance" is? Some of them don't even know what a file is anymore.
>>
File: 1737479563272476.jpg (1.19 MB, 2708x1684)
1.19 MB
1.19 MB JPG
>>106529598
art is art
>>
>>106529616
I like some parts of postmodernism but really don't like other parts of it
>>
>>106529488
I am actually less enthusiastic about sharing my mind babies and I am not opposed to AI.
>>
File: 1726650702830026.png (189 KB, 318x238)
189 KB
189 KB PNG
>>106529581
>>
File: 1749137279678844.png (1.72 MB, 670x1216)
1.72 MB
1.72 MB PNG
>>106529598
>>
>>106529598
LLMs killed the phrase "give the satisfaction" for me.
>>
>>106529688
That's like saying faggots killed rainbows for you
Stop let other people or AI define what you want
>>
>>106529700
I never used it before, I just see people write it in posts occasionally and it triggers "dats slop" in my mind, and memories of AI characters acting like they can stop you from "winning" by resisting a certain way.
>>
>>106529607
anyone have that pic of "what's local?" "whats a machine?"
>>
poorfag ayymdbros
https://github.com/iacopPBK/llama.cpp-gfx906
>>
>>106529741
Surely anyone with the knowledge to work on this can get employed and buy a proper gpu.
>>
>>106529569
>>106529586
"AI" is a single evil "thing" that runs in a massive datacenter that performs matter dematerialization using drinking water for cooling that the earth is running out of, magicking it out of existence, while simultaneously using gargantuan amounts of electricity that pollutes the atmosphere.
It steals your personal information so that it can continuously learn from everything you tell it, feeds you advertisements based on what it learns, reports to the government, takes your jobs, and spies on your family. It steals the work of everything posted online by every human to end your child's creative career before it starts (links to my baby's Deviantart, soundcloud, and ao3 in his profile) and teaches them that sex exists.
It... It has to be stopped because this work of the devil is going to get so smart by itself and kill us all like in the movies.
>>
>>106529749
The second sentence is all true though.
>>
>>106529756
they were doing that long before genai though.
>>
>>106529715
What foolishness. I would simply not allow those words to remind me of those things.
>>
https://x.com/YuGiOh_MD_INFO/status/1965001231400931724
Seems like ygo MD japanese ai commentary voice used Anneli,which get found out to be illegal as it used voice rip from a VN. They will be canceling the release of this function
>>
>>106529281
Try using a shorter sample voice
>>
>>106529775
Now it's the AI monster that's doing it on its own, and that's bad!
>>
>>106529031
Lamaze is love
>>
>>106529805
No, I was just using the wrong version of torch. Once I switched to one that works with blackwell GPUs, it worked fine.
>>
am i dumb for wanting 768gb of ddr5 ram and 3x3090s
i feel like its calling to me.
>>
>>106529932
maybe a bit silly for not choosing better graphics hardware, but not dumb.
>>
>>106529932
why do you want gpus 5 years out of date
>>
https://files.catbox.moe/z37o40.flac
>>
China winning yet again
https://x.com/bdsqlsz/status/1965293660058386484
>>
File: file.jpg (947 KB, 3981x1050)
947 KB
947 KB JPG
>>106529973
How did they manage to get the exact same sloppy anime style that qwen image has?
>>
>>106529973
>another gigaslopped synthetic dataset model
im good
>>
File: screaming horny girl.png (199 KB, 351x411)
199 KB
199 KB PNG
>>106529973
https://huggingface.co/tencent/HunyuanImage-2.1
SLOPPAAA
>Minimum: 59 GB GPU memory for 2048x2048 image generation (batch size = 1).
>Note: The memory requirements above are measured with model CPU offloading enabled.
Yikes, but for 2048x though. need goofs
>>
>>106529992
>exact same
It's not even slightly different. You could hold up two images from different models and I would not be able to tell you which is which.
>>
>>106529962
>up the anti
I giggled with sparkling eyes and a mischievous smirk.
>>
>>106529741
The code in that repo reads like AI slop and a lot of the "performance optimizations" don't make any sense to me.
But it does use the v_dot2_f32_f16 for FP16 multiplication with FP32 accumulation that I was previously unfamiliar with as I hadn't read the Vega ISA documentation in detail.
After applying the same instructions to mainline llama.cpp https://github.com/ggml-org/llama.cpp/pull/15884 I've found it to be faster.
It's getting to the point where I think Mi50s will soon be a universally better buy than P40s:

| GPU      | model         | backend | test           |     t/s |
| -------- | ------------- | ------- | -------------: | ------: |
| RTX 3090 | llama 8B Q4_0 | CUDA    | pp512          | 5327.80 |
| RTX 3090 | llama 8B Q4_0 | CUDA    | tg128          |  141.04 |
| RTX 3090 | llama 8B Q4_0 | CUDA    | pp512 @ d16384 | 2572.06 |
| RTX 3090 | llama 8B Q4_0 | CUDA    | tg128 @ d16384 |   96.27 |
| P40      | llama 8B Q4_0 | CUDA    | pp512          | 1034.45 |
| P40      | llama 8B Q4_0 | CUDA    | tg128          |   53.63 |
| P40      | llama 8B Q4_0 | CUDA    | pp512 @ d16384 |  311.11 |
| P40      | llama 8B Q4_0 | CUDA    | tg128 @ d16384 |   30.66 |
| Mi50     | llama 8B Q4_0 | ROCm    | pp512          | 1053.87 |
| Mi50     | llama 8B Q4_0 | ROCm    | tg128          |   73.04 |
| Mi50     | llama 8B Q4_0 | ROCm    | pp512 @ d16384 |  212.49 |
| Mi50     | llama 8B Q4_0 | ROCm    | tg128 @ d16384 |   20.25 |
>>
>>106529932
1.5TB RAM or REGRET
>>
>>106529932
https://www.youtube.com/watch?v=19UsAtgtBmo
Feed me Anon!
>>
>>106529488
it's kind of true though. Like, I've been using grok to ask how to run AI models, configure MoE layers, upgrade my PC, how bandwidth works on new GPUs, research CPUs for me, etc. A lot of it pulls from recent reddit threads that just happened. AI is most useful when it scrapes up-to-date data.

And it's the reason why I'm not using local models for this shit (maybe I should experiment with something like tool calling agents locally again but last time I tried using a model fast enough it sucked).

I always see stupid agent stuff on reddit for local but it's usually a one-day shit out project. Any good ones that can scrape the web for answers? Maybe running gemma 30b or some shit?
>>
>>106529341
Indian formality for telling people what to do. Instead of just asking politely, they add urgency and appeal to an inherent necessity to act, derived from the religious word "dharma" (duty, religious necessity)
>>
>>106530222
After buying a 5090 I am starting to see why they made the PC a monster here. Scarily even more relevant ten years later. Also, I thought this was just ai generated until I saw the date.
>>
>>106529333

It's because it makes them think they're intelligent. Being anti-AI allows people to tell themselves that the rest of us are just gullible idiots, but that they're not letting themselves be fooled.

The one area where I agree with them though, is the amount of low effort, AI-voice generated shovelware that gets constantly uploaded to YouTube. That shit is a plague which absolutely needs to cease to exist; but that comes under the category of stupid things which humans do with AI. It's not AI itself that is to blame for it.
>>
>>106530010
>https://github.com/Tencent-Hunyuan/HunyuanImage-2.1
>404
>>
>>106529932
>3x3090
This is cheap enough and you can get this piece by piece.

Ideally you'd have a motherboard with multiple slots with pcie lanes straight from the cpu.

Though the upcoming 5070 Ti Super 24GB (up from 16GB) might give the 3090 a run for its money.
>>
>>106530350
Use online models for generic research and local for creation or private data.
You can copypasta online model info into your local inference engine if needed
>>
>>106530489
Did they accidentally have porn in the dataset?
>>
>>106530529
The model page is up.
>>
>>106530167
Where do I get cheap ram? They can get quite expensive.
>>
>>106530539
I see.
>>
File: cult.jpg (124 KB, 1500x1080)
124 KB
124 KB JPG
>>106529333
One of the fundamental destructive cult traits is alienating members from their family and old friends.
>>
>>106530661
A cult needs a leader
>>
miku feet
>>
File: EYudkowsky.jpg (10 KB, 135x160)
10 KB
10 KB JPG
>>106530682
>>
>>106529962
Cool it with the anti-hindi sentiment.
>>
>>106530727
He's not anti like those antis, he uses image gen himself.
>>
Mistral Large 3
>>
What is the most recent advancement in the field of sex and jerking off?
>>
>>106531235
futanari
>>
>>106531307
that's not recent
>>
>>106531235
you're mom
>>
>>106531235
I started using tmsu to tag doujinshi/manga/CG sets with numerical values for e.g. breast size and degree of pregnancy content.
That way I can filter my collection using simple CLI commands:

tmsu files "size_female>=3" and "pregnancy>=3" | sort --random-sort | xargs feh
>>
>>106531318
also not recent
>>
File: 1739143215048174.png (1006 KB, 644x620)
1006 KB
1006 KB PNG
>>106528960
>>106523317
>>106523496
>>106524181

Updated, corrected version of the previous one. As >>106524181 pointed out, all "conversations" were single-turn, which was an oversight on my end. Here's the corrected version with actual multi-turn objects within:

https://gofile.io/d/ZaBzaH

>>106524333
(I find it odd how one or two anons were hyperfixated on whether or not it had "shivers" in the dataset rather than the more obvious flaw of the thing only having single-turn conversations. Not surprising)
>>
>>106531349
Who do you think has the compute to finetune 2GB of data on a model large enough to be worth using?
>>
File: 1743122148258490.gif (5 KB, 90x90)
5 KB
5 KB GIF
>>106531399

1) I couldn't care less whether or not (YOU) use it, I'm just sharing it
2) Use a trainer that supports streaming.

https://docs.axolotl.ai/docs/streaming.html

You don't HAVE to load the entire thing into VRAM. Even the companies that have rooms upon rooms of GPUs don't load the entire dataset into RAM, because that's an idiotic and inefficient way to do it. You load little pieces of it in, train on those, then offload and load the next piece of the dataset in and train on that. Do that until you've gone over the entire dataset, at which point you've completed one epoch, and then you do that again for the remaining epochs/steps.

You also act like training on 2 gigs of data alone is actually a lot. Sure it'll take way longer than training on something smaller like a few MB, but I don't know why you think it can only be done with data-center-grade hardware or a giant cluster or some shit.
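The load-a-piece/train/offload loop described above is all that dataset streaming amounts to. A minimal stdlib sketch of the idea (axolotl's streaming mode is the real mechanism; `train_step` and the stream factory are placeholders):

```python
def iter_batches(samples, batch_size):
    """Yield fixed-size batches from any iterable without ever
    materializing the whole dataset in memory."""
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # trailing partial batch
        yield batch


def train(make_stream, train_step, n_epochs, batch_size):
    """Re-open the stream each epoch; one full pass over it is one epoch,
    and only one batch is resident at a time."""
    for _ in range(n_epochs):
        for batch in iter_batches(make_stream(), batch_size):
            train_step(batch)
```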
>>
>>106529741
>>106530063
i was looking through some of their device specific code, from a really quick glance and knowing nothing specific about gfx906 as a target and with no concrete info it does seem to be similar kinds of optimizations in parts of libraries like composable kernel/ck-tile where AMD will use inline device assembly due to weaknesses in how HIP exposes AMD specific features
like clang only semi-recently became vaguely "aware" of buffers as like a concept and the version ROCm ships way predates that
most of AMD's really hardware specific code across ROCm has to manually pack the bitfields of a buffer into a vec4
that'd be the memory access patterns bit

also serious question does no one working on llama/ggml and/or its forks know how to use modern c++ properly or is there some kind of a project level requirement for the codebase to be like that
every time i get curious and think about trying to help i get put off by how weird the codebase is
composable kernel/ck-tile handles all that basically just using templates and constexpr, which is how you're supposed to handle that level of hardware optimization
and if you really want to go whole-ass, modern versions of clang and gcc can use constexpr std::strings as the parameters of inline assembly expressions
slam that shit on C++26 and constexpr and template as much as fucking possible wtf
>>
>>106531475
But who can finetune DeepSeek for even one token here?
>>
>>106531589
No one here cares about even attempting fine-tuning deep-seek except you... That inside joke is getting old. We care about local AI use remember?
>>
if you use st you got haxed https://www.aikido.dev/blog/npm-debug-and-chalk-packages-compromised
>>
File: sample.png (525 KB, 1620x601)
525 KB
525 KB PNG
>>106531475
Data of this grade for a finetune would have to either be very heavily filtered, completely rewritten, and/or strongly diluted with something smart.
>>
>>106531612
I've checked all packages and thankfully cohee isn't an updooter and all packages are a version lower or less than the hacked versions.
>>
>>106531613
For what purpose? If you want it to be "smart" on top of being able to RP like a natural human, and to decensor the model, then just include other datasets and training like the following so it doesn't "forget" common sense (in theory this could help maintain or even improve things like temporal coherence, as some other anons suggest):

https://huggingface.co/datasets/derek-thomas/ScienceQA
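Mechanically, "diluting" an RP set with something like ScienceQA is just interleaving the two streams at a chosen ratio. A toy sketch (the one-filler-per-three-primary default is an arbitrary assumption, not a recommendation):

```python
from itertools import cycle

def dilute(primary, filler, every=3):
    """Insert one filler sample after every `every` primary samples,
    recycling the filler set if it's smaller than the primary one."""
    filler_cycle = cycle(filler)
    mixed = []
    for i, sample in enumerate(primary, start=1):
        mixed.append(sample)
        if i % every == 0:
            mixed.append(next(filler_cycle))
    return mixed
```

Real trainers usually do this with weighted sampling over shuffled shards rather than a fixed period, but the effect on the mix is the same.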
>>
>>106531494
>memory access patterns
The part about consolidating reads from SRAM makes sense to me and I'll check the PTX/Vega ISA documentation for what sizes of reads are actually available on a hardware level.
What doesn't make sense to me is how the SRAM is padded.
According to the documentation Vega has 32 shared memory banks with a size of 4 bytes each, the same as NVIDIA.
So I don't understand why the padding was changed.

>code
The main requirements by llama.cpp/ggml are C-compatible external APIs and no dependencies if at all possible.
If your issue is with the CUDA device code specifically, I've written it as essentially C code with minimal C++ features like templating.
I need to get things like memory access patterns and register/SRAM use exactly right and I therefore want minimal abstraction over what the hardware is actually doing.
If your question is about ck specifically, my opinion is that for abstracted hardware Vulkan is a better choice.
>>
>>106531494
>>106531663
>So I don't understand why the padding was changed.
Actually, if you change the size of the reads it would make sense to change the padding.
Though I'm not sure the values in the fork are the correct ones to minimize shared memory bank conflicts
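The padding question is easy to sanity-check on paper: with 32 banks of 4 bytes each (the configuration both posts above assume), the bank a 4-byte element lands in is just `(byte_address / 4) mod 32`, so a 32-float row stride puts every element of a column in the same bank, while padding the stride to 33 spreads a column across all banks. A quick model of that arithmetic (illustrative only, not code from either repo):

```python
from collections import Counter

BANKS = 32        # shared memory banks on Vega / recent NVIDIA
BANK_BYTES = 4    # each bank is 4 bytes wide

def bank_index(row, col, row_stride_elems, elem_bytes=4):
    """Bank that element [row][col] of a row-major tile falls into."""
    byte_addr = (row * row_stride_elems + col) * elem_bytes
    return (byte_addr // BANK_BYTES) % BANKS

def worst_column_conflict(n_threads, col, row_stride_elems):
    """Max threads hitting one bank when thread t reads element [t][col],
    i.e. the degree of serialization for a column access."""
    hits = Counter(bank_index(t, col, row_stride_elems) for t in range(n_threads))
    return max(hits.values())
```

A 32x32 float tile gives a 32-way conflict on column reads; padding each row by one element (stride 33) removes it entirely, which is the classic `tile[32][33]` trick. Whether the fork's chosen padding values line up with its wider reads is exactly the open question.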
>>
I am interested in running tts and maybe even voice recognition input into my chatbots. Issue is I'm already reaching what my GPU's capable of just running a 12b model. Is the work to process my inputs and generate outputs with things like tts or image gen done one at a time, or am I right to be concerned about how my hardware can try to do all this at once?
>>
>>106531647
Do you think finetuning just on real human ERP hasn't been tried already countless times? That doesn't give good results even when the training data is flawless and the participants are literate.

I think the data might still have value, just not directly for a finetune. If you're serious about sharing it, upload it on HF.
>>
Does anyone else have issues with vibevoice's audio generation going crazy fast the longer your input text gets? I'm using the 1.5b model with two samples (one 4-minute audio track and one 40-second audio track), and 3 sentences of audio turn into an incomprehensible mess. i thought the 1.5 model was better at longer generations.

also if you have any tips on getting better generations with vibevoice i would be grateful for even your crappiest piece of advice. I am truly a novice.
>>
>>106531745
>Do you think finetuning just on real human ERP hasn't been tried already countless times?
>"DUUUUDE SOMEONE'S ALREADY DONE THIS WHY ARE YOU DOING THIS FOR FUN WHY DO YOU LIKE FUUUNNN? WHY DO YOU DO ANYTHING EVER?"

Christ man.... Why do you, in particular, always have a thorn in your side or sound like you woke up on the wrong side of the bed? Did someone wrong you in the past? You have ungodly levels of pessimism even by this place's standards. Perhaps you're more of an academic type that seeks glory or something.


>That doesn't give good results

Define "good results". What do YOU think I intend to use this data set for? What would you or others use it for?


>If you're serious about sharing it, upload it on HF.
Many of the stories may technically violate that platform's ToS, so that's the only reason I haven't pulled the trigger on that yet. I don't quite understand why me uploading it there would be any indication of whether or not I'm "serious about sharing it"
>>
>>106531311
It's like the Nemo of fetishes. It's old but it's still GOAT within its own niche.
>>
>>106531663
>I need to get things like memory access patterns and register/SRAM use exactly right and I therefore want minimal abstraction over what the hardware is actually doing.
two kinds of templating styles in C++
type erased/type constrained generics or compile time permutation over all edge cases (abstraction but with all costs offloaded to compile time)
the latter involves a lot more template metaprogramming especially back in the day which is why it's not that common and if the library as a whole has this as a general requirement
>The main requirements by llama.cpp/ggml are C-compatible external APIs and no dependencies if at all possible.
then it would probably still be undesirable since opaque massively abstracted constexpr template metaprogramming library doesn't exactly mesh with the spirit of this

now that you mention it i actually really like that the device code is fairly readable because so much shit isn't
>>
>>106531793
>You have ungodly levels of pessimism even by this place's standards.

You have no idea.
>>
File: 1751084308251691.png (2.08 MB, 946x946)
2.08 MB
2.08 MB PNG
>>106531839
I always forget this board supports names but not trip codes. Why the fuck not? (Or if it does, why aren't you using that?)
>>
>>106531793
Just finetuning on ERP makes models retarded and prone to turning any event into a porn scene, it's as simple as that. They won't simply internalize the knowledge and become better at writing ERP. If you add brain-damaged prose to this, it might be funny for for a few chats, but it will get old quickly.
>>
>>106531876
It's not about quality, it's about it not being The Entire Internet.
>>
drummer bros... whens the next SOTA finetune coming??
>>
>>106531933
drumner herre next sota wil com wait me
>>
>>106531936
I thrust in you
>>
>>106531945
comfy
>>
File: 1728091820015666.png (1.55 MB, 670x1204)
1.55 MB
1.55 MB PNG
>>106531876
>Just finetuning on ERP makes models retarded and prone to turning any event into a porn scene,
If your dataset is geared toward doing that then yes, that can happen (i.e. every single author was very impatient and got straight to the smut after only a couple paragraphs). If you train on "conversations" that don't do that (actually building up to it) then the model will be less prone to being way too eager to get to the smut. Garbage in, garbage out. This isn't to say using that dataset won't result in the model being biased towards smut even when unprompted to some extent, but it won't IMMEDIATELY jump to that each time just because the dataset happens to have smut. You can mitigate that by simply having a lot of stories in the dataset that are just..... regular stories. Ao3 has plenty of those, so incorporating that into a dataset like >>106531349 would be trivial

(Also, neither one of us has actually tested this yet, so I'm not quite sure why you're so gung ho unconvincing me not to do this. Learn to have fun and not be a Grinch)
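For the record, the "dilute with regular stories" part is trivial to sketch. Nothing below is from a real pipeline; the corpora, the field name, and the 10% ratio are all made-up placeholders:

```python
import random

random.seed(0)  # deterministic sampling for this toy example

# hypothetical corpora: each item is one training "story"
erp_stories = [{"text": f"erp sample {i}"} for i in range(100)]
plain_stories = [{"text": f"regular ao3 sample {i}"} for i in range(900)]

def build_mix(erp, plain, erp_fraction=0.1, total=500):
    """Sample a training set where only erp_fraction of the items are ERP,
    so the tune doesn't learn that every scene must end in smut."""
    n_erp = int(total * erp_fraction)
    mix = random.sample(erp, n_erp) + random.sample(plain, total - n_erp)
    random.shuffle(mix)
    return mix

dataset = build_mix(erp_stories, plain_stories)
```

Whether 10% is the right number is exactly the thing neither of us has tested yet.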
>>
>IndexTTS2
is it good
is it better than vibevoice
>>
>>106530222
roflamo
>>106530503
thanks anon, i might wait two more weeks just to see if there is any more info about this card, looks good.
>>
all this drummer talk but what about davidau's mythomax cooks?
>>
>>106532078
you mean this:
DavidAU/Psyonic-Cetacean-MythoMax-Prose-Crazy-Ultra-Quality-29B-GGUF
jesus christ i got a stroke just reading the name
>>
>>106532099
>he only read the title
come here after reading the card
>>
>>106532110
after reading this https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters you mean
>>
File: 1752288164285029.png (20 KB, 626x118)
20 KB
20 KB PNG
>>106532121
for me, its the NEO model, along with the suggested recc. temperature
>>
If I have a large (huge) codebase and I want a coding model that knows it in and out is it possible to finetune an open source model on it and then use that? or is that not how finetuning works?
>>
File: file.png (623 KB, 2371x1017)
623 KB
623 KB PNG
https://en.wikipedia.org/wiki/Llama_(language_model)
Did you know that Justine Tunney introduced new optimized matrix multiplication kernels for x86 and ARM CPUs, improving prompt evaluation performance for FP16 and 8-bit quantized data types?
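(The actual llamafile kernels are hand-vectorized C++; the core trick behind that kind of speedup is mostly loop tiling. A toy pure-Python version, with the block size and test matrices made up for illustration:)

```python
def matmul_tiled(A, B, block=2):
    """Toy cache-blocked matmul: walk the matrices in block x block tiles
    so each tile of B is reused while it is still hot in cache."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, block):
        for pp in range(0, k, block):
            for jj in range(0, m, block):
                for i in range(ii, min(ii + block, n)):
                    for p in range(pp, min(pp + block, k)):
                        a_ip = A[i][p]
                        for j in range(jj, min(jj + block, m)):
                            C[i][j] += a_ip * B[p][j]
    return C

# matmul_tiled([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
# gives [[19.0, 22.0], [43.0, 50.0]]
```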
>>
>>106532193
First figure out how to create SFT datasets specialized for coding. You need to know how to do that if you want to turn your existing codebase into a dataset that can be used to tune a sufficiently intelligent LLM. If it were me attempting something like this, I would consider fine-tuning CodeLlama or Qwen Coder.

(Do not take anything I said as objective fact. This is just how **I** would go about attempting this)
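A rough sketch of what the dataset step could look like. Every path, extension, and the instruction template below are placeholders, and real pipelines usually get a stronger model to generate the question side rather than pairing raw code with itself:

```python
from pathlib import Path

def codebase_to_sft(root, exts=(".py",), max_chars=4000):
    """Walk a repo and emit one instruction/response row per file chunk.
    The instruction string is a placeholder; swap in whatever your
    trainer's chat format actually expects."""
    rows = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in exts:
            continue
        code = path.read_text(errors="ignore")
        for i in range(0, len(code), max_chars):
            rows.append({
                "instruction": f"Explain what this part of {path.name} does.",
                "response": code[i:i + max_chars],  # placeholder pairing
            })
    return rows
```

Dump the rows to JSONL and hand them to whatever trainer you end up on. Also worth asking whether RAG over the repo beats a tune for "knows my codebase"; it often does.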
>>
>>106529973
>China winning yet again
winning the slop award yes
>>
>>106532453
I know that the llama.cpp Wikipedia page is extremely bare-bones.
>>
File: file.png (968 KB, 2375x1540)
968 KB
968 KB PNG
>>106532601
https://en.wikipedia.org/wiki/Llama.cpp
>no ollama
>no ik quants
Jart is good at self-promotion. And Wikipedia is a dogshit site.
>>
>>106531851
/g/ does have trips, retard. Drummer doesn't use one because he's also a retard and enjoys it when people pretend to be him while spouting racial slurs
/g/ doesn't have spoilers, however.
>>
>>106531851
it does (see: cudadev) but drummer doesn't because he's stupid and lazy, if you couldn't tell by the quality of his tunes
>>
>>106531839
Why don't you add anything to the model cards? I will not download any of your retarded slop mixes if the BeaverAI front page is empty.
If you don't bother telling people anything on HF about the slop mix you released, don't expect me to download it. Incompetent faggot, fuck you and your spam.
>>
>>106532032
TheDrummer thread. Ask again tomorrow.
>>
File: 1739009736622581.gif (1.67 MB, 407x1021)
1.67 MB
1.67 MB GIF
>>106532673
Nta. If I were to release a tune of my own to HF, how detailed should the .md be? Off the top of my head I would include things like the chat template you would need to use in order to effectively inference the model, strengths and weaknesses ("good at x but retarded at y", I think there are specialized benchmarks for that too that can be exported in chart form), chat log examples from inference so you can actually see how it works instead of "just trust me bro" shit, any and all settings used with said logs for close reproducibility, etc.

What would you guys say is the bare minimum information that should be presented on a model page?
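For concreteness, the skeleton I have in mind (hypothetical model name; purely my own guess at a minimum, hence the question):

```markdown
# MyTune-22B (hypothetical)

- **Base model / size:** which checkpoint this was tuned from
- **Chat template:** the exact format needed to inference it properly (ChatML, Mistral, etc.)
- **What changed:** one paragraph on the dataset and goal ("good at x but retarded at y")
- **Sampler settings:** temperature, min-p, etc. used for the logs below
- **Example logs:** a few real chat excerpts produced with those settings
- **Known issues and license**
```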
>>
>>106531851
I've had a long standing disdain for tripfriends since I was in /p/.

>>106532673
Are people really confusing the BeaverAI page as the official source? Those are test models published publicly so anyone can comment on them. If you want my official releases, check hf.co/TheDrummer , not hf.co/BeaverAI
>>
>>106531951
Please stop putting random unrelated words everywhere, it makes your posts very annoying to read.
>>
>>106532791
>"Boo hoo trip codes le bad!"

What's the point of "name fagging" here if anyone can impersonate you? What kind of autism is that?
>>
>>106532822
yes but even a small comment detailing what makes, for example, Cydonia Redux different from regular Cydonia would help. like why publish if you don't tell anything? retard
>>
>>106532821
>yet, motor, said, medicate,unconvincing
yes, model, set, mediate, on convincing

i think i decoded it
>>
>>106532857
You can infer from the name and size. Redux is a 22B tune, while the last time I named a 22B as Cydonia was nearly a year ago.

It's an updated tune of the older Mistral Small since some people say MS2 was more creative than MS3/3.1/3.2. After trying it out, I kinda agree.
>>
>>106531951
>Learn to have fun and not be a Grinch
I agree with this
>>
>>106529138
>>106529084
Wasn't that just RAG? He said "memory", not "context".
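If it was RAG, the whole "memory" is just retrieval bolted onto the prompt. Toy sketch with bag-of-words counts instead of real embeddings (all names and memories made up):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    num = sum(a[w] * b[w] for w in a if w in b)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, memories, k=2):
    """Rank stored "memories" against the query and return the top k;
    a frontend would just prepend the winners to the model's context."""
    q = Counter(query.lower().split())
    scored = sorted(memories, key=lambda m: cosine(q, Counter(m.lower().split())), reverse=True)
    return scored[:k]

# made-up "memory" store
memories = [
    "user's cat is named Miso",
    "user works night shifts",
    "user hates cilantro",
]
```

`retrieve("what was my cat called", memories, k=1)` pulls the cat line, which then rides along in the context. Real "memory" features do the same thing with embedding models instead of word counts.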