/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106522347 & >>106516368

►News
>(09/08) OneCAT-3B, unified multimodal decoder-only model released: https://onecat-ai.github.io
>(09/08) IndexTTS2 released: https://hf.co/IndexTeam/IndexTTS-2
>(09/05) Klear-46B-A2.5B released: https://hf.co/collections/Kwai-Klear/klear10-68ba61398a0a4eb392ec6ab1
>(09/04) Kimi K2 update for agentic coding and 256K context: https://hf.co/moonshotai/Kimi-K2-Instruct-0905
>(09/04) Tencent's HunyuanWorld-Voyager for virtual world generation: https://hf.co/tencent/HunyuanWorld-Voyager

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: rrrrrrrrrrrr.png (558 KB, 480x768)
►Recent Highlights from the Previous Thread: >>106522347

--Papers:
>106528839
--GLM Air quantization tradeoffs and performance benchmarks:
>106523514 >106523546 >106523556 >106523647 >106523581 >106523562
--Analysis of new NSFW prompt steering dataset and its potential impact on model training:
>106523317 >106523496 >106523652 >106523755 >106524808 >106524921 >106524317 >106524624 >106524678 >106524740 >106524797 >106525004 >106525167 >106525178 >106524832 >106525039 >106525225 >106525328 >106525388 >106525503 >106525662 >106525920 >106525780 >106526087 >106526133 >106526227 >106526333 >106526592 >106526620 >106526703 >106526728 >106526797 >106528480
--Game audio format debate: WAV vs FLAC efficiency and storage concerns:
>106522913 >106522928 >106522993 >106523037 >106523040 >106523049 >106523118 >106523295 >106523337 >106523351 >106523369 >106523372 >106523426 >106523441 >106523083
--IndexTTS-2 release and comparison with BigVGAN vocoder:
>106523054 >106523065 >106523076 >106523085 >106523861 >106523877 >106524403 >106524449 >106524544
--VibeVoice license restrictions on voice impersonation and consent requirements:
>106523582 >106523597 >106523639
--Multi-speaker voice generation challenges in VibeVoice with ComfyUI:
>106523404 >106523481 >106524063
--Phrase banning effectiveness vs coherence/performance tradeoffs in LLMs:
>106525300 >106525320 >106525366 >106525463 >106525512
--High cost of NVIDIA GPUs due to CUDA dependency and firmware restrictions:
>106524893 >106524908 >106525006 >106525779 >106524910
--Introduction of REFRAG: Rethinking RAG based Decoding for Multi-step Reasoning:
>106524013 >106524026 >106524047 >106524915
--OneCAT-3B multimodal generation model performance issues:
>106524720 >106524764
--K2 Think open-source reasoning model launched in UAE:
>106524516
--Teto (free space):
>106524155 >106524287 >106528101

►Recent Highlight Posts from the Previous Thread: >>106522352

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1748227897708796.gif (2.44 MB, 800x675)
>>106528965
what da teto doin
>>
How do you guys come up with good prompts for image generation that actually work well?
>>
>>106528979
what model
>>
File: cat stare.webm (1 MB, 576x1024)
>>106528992
sd-3.5-large-ggml
I'm open to other models if this one is poo
>>
Tetolove
>>
File: hero.png (379 KB, 573x549)
what are the odds that a new thread pops up just as i finish going through my older drives for some maintenance :D well anyways enjoy some caps i got my niggas

first of all:
our hero...
>>
File: beeku.webm (365 KB, 720x720)
>>
File: slaym.png (2.32 MB, 1536x1536)
the gel hair was the best. never liked the faces though, they were always too simplistic and female-Picasso-esque, aesthetically speaking. also wish i found the og dipsy, the short muscular one in the white ching chong dress. speaking of which, if you read this and remember, anon, i was the tired autist who replied to you lol
>>
File: 1744020740387654.png (3.33 MB, 1692x1860)
>>106528999
I don't know about non-anime models
But for anime models browsing https://aibooru.online/posts?tags=has%3Ametadata and checking the metadata of images that interest you is a good start
>>
mikusovl
https://www.youtube.com/watch?v=-5aR2fRklN8
>>
File: GhNDF3IbgAEvzKY.jpg (159 KB, 678x900)
if anyone remembers this thing lol
>>
>>106529016
I will now have that glass of bees, thanks.
>>
File: Ghanej_XAAAsr7c.jpg (108 KB, 680x766)
>>106529032
this but 2.0 or 1.0, i forgot which one came first. also, going through the archives i found the zip, so i shall link it as well https://github.com/rejunity/tiny-asic-1_58bit-matrix-mul
>>
File: 1742230553446913.png (1.78 MB, 1054x1904)
>>106529032
This one is cuter
>>
>>106528940
Is there actually any sort of legal issue with the game here? I get that it's Japanese etc., which is why it was taken down out of respect, but is there anything legally wrong if it wasn't written down that you can't do it? Also, surprisingly, no one has tuned on visual novels, even though they are a pretty good source of RP and fiction-writing knowledge.
>>
File: 2025-01-27_03-54-19.png (386 KB, 789x920)
lastly this XD i will stop spamming now, apologies. i just thought it would be cute, so have like a little lookback. if anyone has anything else do share, mine aren't the best, i just did a quick walkthrough and didn't check my archives properly
>>
File: 1747881301372260.png (1.24 MB, 1920x1080)
Any progress on infinite memory? It's just not gonna happen, is it?
>>
File: 1732650311767260.jpg (66 KB, 850x627)
>>
>>106529023
>og dipsy
Was it this one? There was more but I can't find them.
>>
File: lmg-at-home.png (53 KB, 542x476)
>>
File: 1750564531340861.png (946 KB, 1784x1414)
>>106529084
coming any day now
>>
>>106529138
Good morning sir!
>>
>>106529143
https://litter.catbox.moe/lw6x485dkf1zmjr5.mp3
>>
>>106529112
ye thats the one :D
>>
>>106528973
getting ready to swallow bread
>>
File: 1746144292946963.png (184 KB, 692x479)
>>106529203
>>
NEW DRUMMER FINETUNE
https://huggingface.co/bartowski/TheDrummer_Valkyrie-49B-v2-GGUF
THIS IS NOT A DRILL
>>
Is there actually anything better than vibe voice?
>>
I tried downloading the vibevoice-community repo to see if any of the issues I've experienced are being caused by comfyui, and it is crazy slow. Did I miss a step or something? In comfyui, it'll usually take a little less time to generate an audio file than the length of the audio file itself. Running this directly via CLI, it took a couple minutes just to do a 4 second voice clip.
>>
>>106529281
failed the vibe check.
>>
File: 1732913061179217.jpg (1018 KB, 1416x2120)
>>
>>106529307
>when the period is early
>>
>>106529281
The issue is this:
>comfyUI vibevoices are wrappers
>if you are on Windows, comfyUI portable is using its own torch install
and
>when you are using the repo
>you are using your global torch install
Difference is that your global torch is probably cpu only and Cumrag torch is using gpu too.
But only you know the difference.
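A quick way to check which case you're in (a minimal sketch, assuming both pythons can import torch):
[code]
import torch

# A CPU-only build reports something like "2.4.0+cpu";
# a CUDA build reports "+cu121" or similar.
print(torch.__version__)

# False on a CPU-only install or when no GPU is visible.
print(torch.cuda.is_available())
[/code]
Run it once with ComfyUI's embedded python and once with your global install and compare.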
>>
File: 1744826027273291.jpg (992 KB, 1416x2120)
>>
File: 1751245848327294.jpg (308 KB, 984x984)
>>106528960
>>106528813
What causes mental illness like this? I couldn't imagine having a family member anywhere near as far up her own ass as she is. I almost don't want to believe these people even exist.
>>
>>106529333
Internet addiction resulting in losing a grasp on reality.
>>
File: 1742034808673029.jpg (70 KB, 986x904)
>>106529153
>Do the needful

What the hell does that even mean?
>>
File: 1738649831402048.jpg (693 KB, 1416x2120)
>>
>>106529333
Ironically, the problem could be fixed by family members going up her ass.
>>
>>106529317
Ah, that's quite possibly the issue. Thanks.
>>
>>106529346
freshly squeezed tetomeat
>>
>>106529333
Bruh, antis are going fucking nuts lately. I am 100% sure that at some point in the near future one of them is going to kill an AI user. Go to their subreddits some time. It's an absolutely unhinged contest of who can demonstrate their hate for AI users the most.
>>
>>106529359
If you check demo/text inference.py or whatever it was named, it has options too.
>>
>>106529023
Hey.
Tbh the style of the faces is also basically unintentional, or rather I kind of don't care a ton about it. I optimized the artist mix of these gens for rendering that jelly look to the slime, and that's the face that resulted. This is true for pretty much all of my gens really, I focus on the style of the environment and some other aspects and the face is an afterthought.

>>106529112
Hey. Here's another. I wasn't the first to gen a deepseek mascot tho. I think I saw a few people coming up with different designs and decided to try adding another to the pile, something a bit different from "dipsy" and the others at the time.
>>
File: 1728618664952634.jpg (724 KB, 2000x1496)
>>
>>106529346
Is she ok?
>>
>>106529153
lmao
Is this trained on indian scammer calls or something?
>>
File: 1726404699404191.jpg (766 KB, 2000x1496)
>>106529426
shes got a lot on her mind
>>
File: file.png (230 KB, 694x718)
>>106529396
Now I remember the drawfag's ones too https://litter.catbox.moe/1ofhmuz9qsf1vuhi.zip
>>
File: 1736350421650724.png (136 KB, 1454x576)
>>106529333
Oh sadly there's plenty of them
>>
>>106529488
That first one is a particularly bizarre, yet distressingly common form of mental illness.
>>
>>106529488
>steal
>>
>>106529317
That was exactly it. It doesn't have the issue I was running into with comfy where it randomly interjects new voices or uses the wrong voices in multi-speaker conversations. I was immediately able to set up a two-person conversation without any random voice switching. It does still distort pretty hard on longer audio, though. Maybe that's an issue of using a higher CFG.
>>
File: 1738861016392255.webm (1.66 MB, 1280x720)
K2 reasoner??? when??
>>
>>106529488
>Al isn't useful unless it's constantly fed
Are normies really that unaware of local models?
>>
>>106529511
it's like people in game modding refusing to release source code and DMCA'ing shit they don't even maintain anymore because "you reuploaded my work without muh PERMISHUN"
>>
>>106529569
Don't you know? AI takes pieces of existing art and puts them together like a collage. Once there is no more original art left to steal AI stops working.
>>
File: 1755397041960705.png (123 KB, 1486x564)
>>106529586
TRUEEE
>>
>>106529569
Normies are that unaware of most of how computers work, do you really think they know what a "local instance" is? Some of them don't even know what a file is anymore.
>>
File: 1737479563272476.jpg (1.19 MB, 2708x1684)
>>106529598
art is art
>>
>>106529616
I like some parts of postmodernism but really don't like other parts of it
>>
>>106529488
I am actually less enthusiastic about sharing my mind babies and I am not opposed to AI.
>>
File: 1726650702830026.png (189 KB, 318x238)
>>106529581
>>
File: 1749137279678844.png (1.72 MB, 670x1216)
>>106529598
>>
>>106529598
LLMs killed the phrase "give the satisfaction" for me.
>>
>>106529688
That's like saying faggots killed rainbows for you
Stop letting other people or AI define what you want
>>
>>106529700
I never used it before, I just see people write it in posts occasionally and it triggers "dats slop" in my mind, and memories of AI characters acting like they can stop you from "winning" by resisting a certain way.
>>
>>106529607
anyone have that pic of "what's local?" "whats a machine?"
>>
poorfag ayymdbros
https://github.com/iacopPBK/llama.cpp-gfx906
>>
>>106529741
Surely anyone with the knowledge to work on this can get employed and buy a proper gpu.
>>
>>106529569
>>106529586
"AI" is a single evil "thing" that runs in a massive datacenter that performs matter dematerialization using drinking water for cooling that the earth is running out of, magicking it out existence, while simultaneously using gargantuan amounts of electricity that pollutes the atmosphere.
It steals your personal information so that it can continuously learn from everything you tell it, feeds you advertisements based on what it learns, reports to the government, takes your jobs, and spys on your family. It steals the work of everything posted online by every human to end your child's creative career before it starts (links to my baby's Deviantart, soundcloud, and ao3 in his profile) and teaches them that sex exists.
It... It has to be stopped because this work of the devil is going to get so smart by itself and kill us all like in the movies.
>>
>>106529749
The second sentence is all true though.
>>
>>106529756
they were doing that long before genai though.
>>
>>106529715
What foolishness. I would simply not allow those words to remind me of those things.
>>
https://x.com/YuGiOh_MD_INFO/status/1965001231400931724
Seems like ygo MD japanese ai commentary voice used Anneli,which get found out to be illegal as it used voice rip from a VN. They will be canceling the release of this function
>>
>>106529281
Try using a shorter sample voice.
>>
>>106529775
Now it's the AI monster that's doing it on its own, and that's bad!
>>
>>106529031
Lamaze is love
>>
>>106529805
No, I was just using the wrong version of torch. Once I switched to one that works with blackwell GPUs, it worked fine.
>>
am i dumb for wanting 768gb of ddr5 ram and 3x3090s
i feel like its calling to me.
>>
>>106529932
maybe a bit silly for not choosing better graphics hardware, but not dumb.
>>
>>106529932
why do you want gpus 5 years out of date
>>
https://files.catbox.moe/z37o40.flac
>>
China winning yet again
https://x.com/bdsqlsz/status/1965293660058386484
>>
File: file.jpg (947 KB, 3981x1050)
>>106529973
How did they manage to get the exact same sloppy anime style that qwen image has?
>>
>>106529973
>another gigaslopped synthetic dataset model
im good
>>
File: screaming horny girl.png (199 KB, 351x411)
>>106529973
https://huggingface.co/tencent/HunyuanImage-2.1
SLOPPAAA
>Minimum: 59 GB GPU memory for 2048x2048 image generation (batch size = 1).
>Note: The memory requirements above are measured with model CPU offloading enabled.
Yikes, but for 2048x though. need goofs
>>
>>106529992
>exact same
It's not even slightly different. You could hold up two images from different models and I would not be able to tell you which is which.
>>
>>106529962
>up the anti
I giggled with sparkling eyes and a mischievous smirk.
>>
>>106529741
The code in that repo reads like AI slop and a lot of the "performance optimizations" don't make any sense to me.
But it does use the v_dot2_f32_f16 instruction for FP16 multiplication with FP32 accumulation, which I was previously unfamiliar with as I hadn't read the Vega ISA documentation in detail.
After applying the same instructions to mainline llama.cpp https://github.com/ggml-org/llama.cpp/pull/15884 I've found it to be faster.
It's getting to the point where I think Mi50s will soon be a universally better buy than P40s:

| GPU      | model         | backend | test           |     t/s |
| -------- | ------------- | ------- | -------------- | ------: |
| RTX 3090 | llama 8B Q4_0 | CUDA    | pp512          | 5327.80 |
| RTX 3090 | llama 8B Q4_0 | CUDA    | tg128          |  141.04 |
| RTX 3090 | llama 8B Q4_0 | CUDA    | pp512 @ d16384 | 2572.06 |
| RTX 3090 | llama 8B Q4_0 | CUDA    | tg128 @ d16384 |   96.27 |
| P40      | llama 8B Q4_0 | CUDA    | pp512          | 1034.45 |
| P40      | llama 8B Q4_0 | CUDA    | tg128          |   53.63 |
| P40      | llama 8B Q4_0 | CUDA    | pp512 @ d16384 |  311.11 |
| P40      | llama 8B Q4_0 | CUDA    | tg128 @ d16384 |   30.66 |
| Mi50     | llama 8B Q4_0 | ROCm    | pp512          | 1053.87 |
| Mi50     | llama 8B Q4_0 | ROCm    | tg128          |   73.04 |
| Mi50     | llama 8B Q4_0 | ROCm    | pp512 @ d16384 |  212.49 |
| Mi50     | llama 8B Q4_0 | ROCm    | tg128 @ d16384 |   20.25 |
>>
>>106529932
1.5TB RAM or REGRET
>>
>>106529932
https://www.youtube.com/watch?v=19UsAtgtBmo
Feed me Anon!
>>
>>106529488
it's kind of true though. Like, I've been using grok to ask how to run AI models, configure MoE layers, upgrade my PC, how bandwidth works on new GPUs, research CPUs for me, etc. A lot of it pulls from recent reddit threads that just happened. AI is most useful when it scrapes up-to-date data.

And it's the reason why I'm not using local models for this shit (maybe I should experiment with something like tool-calling agents locally again, but last time I tried, every model fast enough sucked).

I always see stupid agent stuff on reddit for local but it's usually a one-day shit-out project. Any good ones that can scrape the web for answers? Maybe running gemma 30b or some shit?
>>
>>106529341
Indian formality for telling people what to do. Instead of just asking politely, they add urgency and appeal to an inherent necessity to act, derived from the religious word "dharma" (duty, religious necessity)
>>
>>106530222
After buying a 5090 I am starting to see why they made the PC a monster here. Scarily, it's even more relevant ten years later. Also, I thought this was just AI generated until I saw the date.
>>
>>106529333

It's because it makes them think they're intelligent. Being anti-AI allows people to tell themselves that the rest of us are just gullible idiots, but that they're not letting themselves be fooled.

The one area where I agree with them though, is the amount of low effort, AI-voice generated shovelware that gets constantly uploaded to YouTube. That shit is a plague which absolutely needs to cease to exist; but that comes under the category of stupid things which humans do with AI. It's not AI itself that is to blame for it.
>>
>>106530010
>https://github.com/Tencent-Hunyuan/HunyuanImage-2.1
>404
>>
>>106529932
>3x3090
This is cheap enough and you can get this piece by piece.

Ideally you'd have a motherboard with multiple slots with pcie lanes straight from the cpu.

Though the upcoming 5070 Ti Super 24GB (up from 16GB) might give the 3090 a run for its money.
>>
>>106530350
Use online models for generic research and local for creation or private data.
You can copypasta online model info into your local inference engine if needed
>>
>>106530489
Did they accidentally have porn in the dataset?
>>
>>106530529
The model page is up.
>>
>>106530167
Where do I get cheap RAM? It can get quite expensive.
>>
>>106530539
I see.
>>
File: cult.jpg (124 KB, 1500x1080)
>>106529333
One of the fundamental destructive cult traits is alienating members from their family and old friends.
>>
>>106530661
A cult needs a leader
>>
miku feet
>>
File: EYudkowsky.jpg (10 KB, 135x160)
>>106530682
>>
>>106529962
Cool it with the anti-hindi sentiment.
>>
>>106530727
He's not anti like those antis, he uses image gen himself.
>>
Mistral Large 3
>>
What is the most recent advancement in the field of sex and jerking off?
>>
>>106531235
futanari
>>
>>106531307
that's not recent
>>
>>106531235
you're mom
>>
>>106531235
I started using tmsu to tag doujinshi/manga/CG sets with numerical values for e.g. breast size and degree of pregnancy content.
That way I can filter my collection using simple CLI commands:

tmsu files "size_female >= 3 and pregnancy >= 3" | sort --random-sort | xargs feh
>>
>>106531318
also not recent
>>
File: 1739143215048174.png (1006 KB, 644x620)
>>106528960
>>106523317
>>106523496
>>106524181

Updated, corrected version of the previous one. As >>106524181 pointed out, all "conversations" were single turn, which was an oversight on my end. Here's the corrected version with actual multi-turn objects within:

https://gofile.io/d/ZaBzaH

>>106524333
(I find it odd how one or two anons were hyperfixated on whether or not it had "shivers" in the dataset rather than the more obvious flaw of the thing only having single-turn conversations. Not surprising.)
>>
>>106531349
Who do you think has the compute to finetune a model large enough to be worth using on 2GB of data?
>>
File: 1743122148258490.gif (5 KB, 90x90)
>>106531399

1) I couldn't care less whether or not (YOU) use it, I'm just sharing it
2) Use a trainer that supports streaming.

https://docs.axolotl.ai/docs/streaming.html

You don't HAVE to load the entire thing into VRAM. Even the companies that have rooms upon rooms of GPUs don't load the entire dataset into RAM, because that's an idiotic and inefficient way to do it. You load little pieces of it in, train on those, then offload and load the next piece of the dataset in and train on that. Do that until you've gone over the entire dataset, at which point you've completed one epoch, and then you do that again for the remaining epochs/steps.

You also act like training on 2 gigs of data alone is actually a lot. Sure, it'll take way longer than training on something smaller like a few MB, but I don't know why you think it can only be done with datacenter-grade hardware or a giant cluster or some shit.
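For illustration, a minimal sketch of the streaming idea using the HF datasets library (this is not axolotl's own config, just the underlying pattern; the filename is a placeholder):
[code]
from datasets import load_dataset

# streaming=True yields records lazily instead of loading the
# whole 2 GB file into memory at once.
ds = load_dataset("json", data_files="erp_dataset.jsonl", streaming=True)["train"]

for example in ds:
    # tokenize the record and feed it to the trainer here
    pass
[/code]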
>>
>>106529741
>>106530063
i was looking through some of their device-specific code. from a really quick glance, knowing nothing specific about gfx906 as a target and with no concrete info, it does seem to be similar kinds of optimizations to the ones in parts of libraries like composable kernel/ck-tile, where AMD will use inline device assembly due to weaknesses in how HIP exposes AMD-specific features
like, clang only semi-recently became vaguely "aware" of buffers as a concept, and the version ROCm ships way predates that
most of AMD's really hardware-specific code across ROCm has to manually pack the bitfields of a buffer into a vec4
that'd be the memory access patterns bit

also, serious question: does no one working on llama/ggml and/or its forks know how to use modern c++ properly, or is there some kind of project-level requirement for the codebase to be like that?
every time i get curious and think about trying to help i get put off by how weird the codebase is
composable kernel/ck-tile handles all that basically just using templates and constexpr, which is how you're supposed to handle that level of hardware optimization
and if you really want to go whole-ass, modern versions of clang and gcc can use constexpr std::strings as the parameters of inline assembly expressions
slam that shit on C++26 and constexpr and template as much as fucking possible wtf
>>
>>106531475
But who can finetune DeepSeek for even one token here?
>>
>>106531589
No one here cares about even attempting fine-tuning deep-seek except you... That inside joke is getting old. We care about local AI use remember?
>>
if you use st you got haxed https://www.aikido.dev/blog/npm-debug-and-chalk-packages-compromised
>>
File: sample.png (525 KB, 1620x601)
>>106531475
Data of this grade for a finetune would have to either be very heavily filtered, completely rewritten, and/or strongly diluted with something smart.
>>
>>106531612
I've checked all the packages and thankfully cohee isn't an updooter; all of them are a version below the hacked ones.
>>
>>106531613
For what purpose? If you want it to be "smart" on top of being able to RP like a natural human, and to decensor the model, then just include other datasets and training data like the following, so that it doesn't "forget" common sense (in theory this could help maintain or even improve things like temporal coherence, like some other anons suggest):

https://huggingface.co/datasets/derek-thomas/ScienceQA
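A minimal sketch of one way to mix in data like that with the HF datasets library (the RP filename and the 80/20 ratio are made-up assumptions, not a tested recipe, and both sets would need to be mapped to the same chat format first):
[code]
from datasets import load_dataset, interleave_datasets

rp = load_dataset("json", data_files="rp_stories.jsonl", split="train")  # placeholder file
qa = load_dataset("derek-thomas/ScienceQA", split="train")

# Sample mostly RP with some QA mixed in so the model keeps
# its common sense; the ratio here is a guess.
mixed = interleave_datasets([rp, qa], probabilities=[0.8, 0.2], seed=42)
[/code]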
>>
>>106531494
>memory access patterns
The part about consolidating reads from SRAM makes sense to me and I'll check the PTX/Vega ISA documentation for what sizes of reads are actually available on a hardware level.
What doesn't make sense to me is how the SRAM is padded.
According to the documentation Vega has 32 shared memory banks with a size of 4 bytes each, the same as NVIDIA.
So I don't understand why the padding was changed.

>code
The main requirements by llama.cpp/ggml are C-compatible external APIs and no dependencies if at all possible.
If your issue is with the CUDA device code specifically, I've written it as essentially C code with minimal C++ features like templating.
I need to get things like memory access patterns and register/SRAM use exactly right and I therefore want minimal abstraction over what the hardware is actually doing.
If your question is about ck specifically, my opinion is that for abstracted hardware Vulkan is a better choice.
>>
>>106531494
>>106531663
>So I don't understand why the padding was changed.
Actually, if you change the size of the reads it would make sense to change the padding.
Though I'm not sure the values in the fork are the correct ones to minimize shared memory bank conflicts.
>>
I am interested in running tts and maybe even voice recognition input into my chatbots. Issue is I'm already reaching what my GPU's capable of just running a 12b model. Is the work to process my inputs and generate outputs with things like tts or image gen done one at a time, or am I right to be concerned about whether my hardware can handle all of this at once?
>>
>>106531647
Do you think finetuning just on real human ERP hasn't been tried already countless times? That doesn't give good results even when the training data is flawless and the participants are literate.

I think the data might still have value, just not directly for a finetune. If you're serious about sharing it, upload it on HF.
>>
Does anyone else have issues with vibevoice going crazy fast with its audio generation the longer your input text gets? I'm using the 1.5b model with two samples, one 4-minute audio track and one 40-second audio track, and 3 sentences of audio turn into an incomprehensible mess. I thought the 1.5 model was better at longer generations.

also if you have any tips on getting better generations with vibevoice i would be grateful for even your crappiest piece of advice. I am truly a novice.
>>
>>106531745
>Do you think finetuning just on real human ERP hasn't been tried already countless times?
>"DUUUUDE SOMEONE'S ALREADY DONE THIS WHY ARE YOU DOING THIS FOR FUN WHY DO YOU LIKE FUUUNNN? WHY DO YOU DO ANYTHING EVER?"

Christ man.... Why do you, in particular, always have a thorn in your side or sound like you woke up on the wrong side of the bed? Did someone wrong you in the past? You have ungodly levels of pessimism even by this place's standards. Perhaps you're more of an academic type that seeks glory or something.


>That doesn't give good results

Define "good results". What do YOU think I intend to use this data set for? What would you or others use it for?


>If you're serious about sharing it, upload it on HF.
Many of the stories may technically violate that platform's tos, so that's the only reason I haven't pulled the trigger on that yet. I don't quite understand why me uploading it there would be any indication of whether or not I'm "serious about sharing it".
>>
>>106531311
It's like the Nemo of fetishes. It's old but it's still GOAT within its own niche.
>>
>>106531663
>I need to get things like memory access patterns and register/SRAM use exactly right and I therefore want minimal abstraction over what the hardware is actually doing.
there are two kinds of templating styles in C++:
type-erased/type-constrained generics, or compile-time permutation over all edge cases (abstraction, but with all costs offloaded to compile time)
the latter involves a lot more template metaprogramming, especially back in the day, which is why it's not that common. and if the library as a whole has this as a general requirement
>The main requirements by llama.cpp/ggml are C-compatible external APIs and no dependencies if at all possible.
then it would probably still be undesirable, since an opaque, massively abstracted constexpr template metaprogramming library doesn't exactly mesh with the spirit of this

now that you mention it i actually really like that the device code is fairly readable because so much shit isn't
>>
>>106531793
>You have ungodly levels of pessimism even by this place's standards.

You have no idea.
>>
File: 1751084308251691.png (2.08 MB, 946x946)
>>106531839
I always forget this board supports names but not trip codes. Why the fuck not? (Or if it does, why aren't you using that?)
>>
>>106531793
Just finetuning on ERP makes models retarded and prone to turning any event into a porn scene, it's as simple as that. They won't simply internalize the knowledge and become better at writing ERP. If you add brain-damaged prose to this, it might be funny for a few chats, but it will get old quickly.
>>
>>106531876
It's not about quality, it's about it not being The Entire Internet.
>>
drummer bros... whens the next SOTA finetune coming??
>>
>>106531933
drumner herre next sota wil com wait me
>>
>>106531936
I thrust in you
>>
>>106531945
comfy
>>
File: 1728091820015666.png (1.55 MB, 670x1204)
>>106531876
>Just finetuning on ERP makes models retarded and prone to turning any event into a porn scene,
If you're data set is geared to doing that then yet that can happen (ie every single author was very impatient and got straight to the smut after only a couple paragraphs). If you train on "conversations" that don't do that shit (actually building up to that) then the motor will be less prone to being way too eager to get to the smut. Garbage in, garbage out. This isn't to say using that data set won't result in the model being biased towards smut Even when on prompted to some extent, but it won't IMMEDIATELY jump to that each time just because the data said happens to have smut. You can medicate that by simply having a lot of stories in the data set that are just..... regular stories. Ao3 has plenty of those so incorporating that into a data set like >>106531349 wound be trivial

(Also neither one of us have actually tested this yet so I'm not quite sure why so gung ho unconvincing me not to do this. Learn to have fun and not be a Grinch)
>>
>IndexTTS2
is it good
is it better than vibevoice
>>
>>106530222
roflamo
>>106530503
thanks anon, i might wait two more weeks just to see if there is any more info about this card, looks good.
>>
all this drummer talk but what about davidau's mythomax cooks?
>>
>>106532078
you mean this:
DavidAU/Psyonic-Cetacean-MythoMax-Prose-Crazy-Ultra-Quality-29B-GGUF
jesus christ i got a stroke just reading the name
>>
>>106532099
>he only read the title
come here after reading the card
>>
>>106532110
after reading this https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters you mean
>>
File: 1752288164285029.png (20 KB, 626x118)
>>106532121
for me, its the NEO model, along with the suggested recc. temperature
>>
If I have a large (huge) codebase and I want a coding model that knows it in and out is it possible to finetune an open source model on it and then use that? or is that not how finetuning works?
>>
File: file.png (623 KB, 2371x1017)
https://en.wikipedia.org/wiki/Llama_(language_model)
Did you know that Justine Tunney introduced new optimized matrix multiplication kernels for x86 and ARM CPUs, improving prompt evaluation performance for FP16 and 8-bit quantized data types?
>>
>>106532193
First figure out how to create SFT datasets specialized for coding. You need to know how to do that if you want to turn your existing codebase into a dataset that can be used on a sufficiently intelligent LLM. If it were me attempting something like this, I would consider fine-tuning LLMs like codellama or qwen coder.

(Do not take anything I said as objective fact. This is just how **I** would go about attempting this)
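For what it's worth, a crude sketch of turning source files into a JSONL SFT file (the directory name, prompt template, and one-record-per-file pairing are all assumptions for illustration, not a proven recipe):
[code]
import json
from pathlib import Path

with open("sft_dataset.jsonl", "w", encoding="utf-8") as out:
    for path in Path("my_repo").rglob("*.py"):  # "my_repo" is a placeholder
        code = path.read_text(encoding="utf-8", errors="ignore")
        # Naive pairing: one instruction/output record per source file.
        record = {
            "instruction": f"Explain what {path} does in this codebase.",
            "output": code,
        }
        out.write(json.dumps(record) + "\n")
[/code]
Real pipelines usually go further (chunking by function, generating questions with a stronger model, etc.), but this is the basic shape.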
>>
>>106529973
>China winning yet again
winning the slop award yes
>>
>>106532453
I know that the llama.cpp Wikipedia page is extremely bare-bones.
>>
File: file.png (968 KB, 2375x1540)
>>106532601
https://en.wikipedia.org/wiki/Llama.cpp
>no ollama
>no ik quants
Jart is good at self-promotion. And Wikipedia is a dogshit site.
>>
>>106531851
/g/ does have trips, retard. Drummer doesn't use one because he's also a retard and enjoys it when people pretend to be him while spouting racial slurs
/g/ doesn't have spoilers, however.
>>
>>106531851
it does (see: cudadev) but drummer doesn't because he's stupid and lazy, if you couldn't tell by the quality of his tunes
>>
>>106531839
Why don't you add anything to the model cards? I will not download any of your retarded slop mixes if the beaverAIs front page is empty.
If you don't bother telling people anything on HF about the slop mix you released, don't expect me to download it. Incompetent faggot, fuck you with your spam.
>>
>>106532032
TheDrummer thread. Ask again tomorrow.
>>
File: 1739009736622581.gif (1.67 MB, 407x1021)
>>106532673
Nta. If I were to release a tune of my own to HF, how detailed should the .md be? Off the top of my head I would include things like the chat template you would need to use in order to effectively inference the model, strengths and weaknesses ("good at x but retarded at y", I think there are specialized benchmarks for that too that can be exported in chart form), inference chat log examples so you actually see how it works instead of "just trust me bro" shit, and any and all settings used with said logs for close reproducibility, etc.

What would you guys say is the bare minimum information that should be presented on a model page?
>>
>>106531851
I've had a long standing disdain for tripfriends since I was in /p/.

>>106532673
Are people really mistaking the BeaverAI page for the official source? Those are test models published publicly so anyone can comment on them. If you want my official releases, check hf.co/TheDrummer , not hf.co/BeaverAI
>>
>>106531951
Please stop putting random unrelated words everywhere, it makes your posts very annoying to read.
>>
>>106532791
>"Boo hoo trip codes le bad!"

What's the point of "namefagging" here if anyone can impersonate you? What kind of autism is that?
>>
>>106532822
yes, but even a small comment detailing what makes, for example, Cydonia Redux different from regular cydonia. like why publish if you dont tell anything? retard
>>
>>106532821
>yet, motor, said, medicate,unconvincing
yes, model, set, mediate, on convincing

i think i decoded it
>>
>>106532857
You can infer from the name and size. Redux is a 22B tune, while the last time I named a 22B as Cydonia was nearly a year ago.

It's an updated tune of the older Mistral Small, since some people say MS2 was more creative than MS3/3.1/3.2. After trying it out, I kinda agree.
>>
>>106531951
>Learn to have fun and not be a Grinch
I agree with this
>>
>>106529138
>>106529084
Wasn't that just RAG? He said "memory", not "context".
>>
https://github.com/huggingface/transformers/pull/40771
>The Qwen3-Next series represents our next-generation foundation models, optimized for extreme context length and large-scale parameter efficiency.
The series introduces a suite of architectural innovations designed to maximize performance while minimizing computational cost:
>- **Hybrid Attention**: Replaces standard attention with the combination of **Gated DeltaNet** and **Gated Attention**, enabling efficient context modeling.
>- **High-Sparsity MoE**: Achieves an extreme low activation ratio as 1:50 in MoE layers — drastically reducing FLOPs per token while preserving model capacity.
>- **Multi-Token Prediction(MTP)**: Boosts pretraining model performance, and accelerates inference.
>- **Other Optimizations**: Includes techniques such as **zero-centered and weight-decayed layernorm**, **Gated Attention**, and other stabilizing enhancements for robust training.
>Built on this architecture, we trained and open-sourced Qwen3-Next-80B-A3B — 80B total parameters, only 3B active — achieving extreme sparsity and efficiency.
>Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream tasks — while requiring **less than 1/10 of the training cost**.
>Moreover, it delivers over **10x higher inference throughput** than Qwen3-32B when handling contexts longer than 32K tokens.
>>
>>106532927
Some of us have been doing this since early 2023.
>>
>>106532977
I can't wait to have only the bare minimum implemented in llama.cpp while the performance optimizations rot in a half-implemented PR until it diverges too much and becomes impossible to merge.
>>
>>106533008
By then everyone will lose interest and will be waiting for support for the next fotm model.
>>
>>106532977
Not naming it Qwen 4 is an interesting decision. Probably means this is more of an experiment and the 80B is all they planned to train of it.
>>
>>106532977
>it outperforms Qwen3-32B on downstream tasks — while requiring **less than 1/10 of the training cost**.
Now you can see why AI companies don't care about dense models anymore.
>>
>>106532977
>The chinks can't innovate, only cop-ACK
>>
>>106532977
Imagine a day when SOTA models are trainable on non-server hardware. That'd be cool.
>>
>>106532977
>Multi-Token Prediction(MTP)
>so we'll be able to filter out whole sentences? like "shivers down my spine"? nice
>>
>>106533079
not what it means, and we've been able to do that for months anyway
>>
File: file.png (191 KB, 1072x805)
>>106533122
almost a year even
>>
>>106533031
4 is a bad number in China.
>>
>>106533165
That must be very inconvenient.
>>
>>106533175
Why do all those Asian countries make their unlucky numbers so laughably low? English at least got into the double digits before getting superstitious.
>>
>moeslop
Call me when qwen4 drops
>>
>>106533189
>qwen4
>>106533165
>4 is a bad number in China

Please do learn to read.
>>
>>106533198
So your source that there won't be a qwen 4 is... anon from 4chan? And you're so confident that you're going around correcting people?
>>
File: Congratulations?.jpg (68 KB, 546x548)
>>106532995
>>
>>106533165
>GLM 4, 4.1, 4.5
>Ernie 4.5
?
>>
Oh my goosh, qwen 3 omni soon!
>>
>>106533224
And they're all bad.
>>
>>106529448
>>106529410
>>106529346
>>106529307
Holy shit, these are amazing. What were they made with? Is it based on someone's specific art style?
>>
>>106529333
AIslop will destroy billions.
She's right.
>>
>>106533236
GLM-chan does her best and isn't bad at all. And big Ernie works well for OCR.
>>
>>106533189
The only reason they put out a dense 32B was because they shat bricks seeing the backlash Meta got. They even said they don't see a point in dense models bigger than 32B. Don't know why you'd expect different from qwen4.
>>
>>106529566
what happened in the original video
>>
>>106533450
she gets culturally enriched
>>
>>106533743
>she gets culturally enriched
she literally had a BLM poster in her bedroom, she got to experience what she truly believed, how beautiful is that?
>>
>>106532977
nice
235B version when?
>>
>>106533796
>MTP
Will we finally have working MTP in llama.cpp?

>>106533079
No. It's basically speculative decoding using a specialized layer of the model as a draft model instead of a separate draft model with its own kv cache and the like, something like that.
>>
>>106533450
I looked it up. She got stabbed to death.
>>
Reminder: we once had Undi ITT. It was fun. Now we have fucking drummer... We can't even have davidau. We have fucking shitty drummer.
>>
>>106532977
There is a 10% chance they finally included raunchy sex in pretraining and didn't murder it after pretraining. Which would finally make it the model for sex.
>>
>>106533450
she gets to experience the beauty of american public transport up close
and the worst part is that she probably voted to import more of them too
>>
File: lol.gif (2.51 MB, 498x278)
>>106533911
>10%
>>
>>106533911
>finally included raunchy sex in pretraining
where does one even find something like this? are there websites I can scrape?
>>
>>106533949
ao3
>>
>>106533949
The entire ASSTR archive, most of the explicit AO3 fictions that get filtered out, published erotica, etc...
>>
>>106533963
the asstr archive is really bad quality tho, its all hard wrapped and full of ascii art and bbs headers and shit. has anyone cleaned the thing?
>>
>>106533990
Most base models seem to have been trained on a certain amount of Usenet documents with hardwrapped lines, so they should be fine to use mostly as-is unless you're using them for a finetune.
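If you did want to unwrap it for a finetune, a crude sketch (a heuristic only; ascii art and BBS headers would need to be stripped separately first):
[code]
import re

def unwrap(text: str) -> str:
    # Treat blank lines as paragraph breaks and join the
    # hard-wrapped lines inside each paragraph.
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)
[/code]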
>>
File: cat labor.webm (2.83 MB, 720x1280)
Why doesn't China just force all of its AI research to be merged? They control their companies over there, so why not force them to work together?
>>
>>106534033
Why would they?
Competition creates innovation.
Works for manufacturing, seems to be working for AI too.
>>
>>106534033
That would make one giant worse abomination, look at meta.
>>
>>106532921
Do you really think that random people are following your release history and are able to decipher whatever the fuck you are doing with your slop releases? Grow up faggot. If you can't bother with some rudimentary release notes, don't bother at all.
>>
>>106534009
yeah i guess I could lower my standards a bit for the sake of simplicity and availability.
>>
File: 664347.jpg (28 KB, 640x640)
new benchie~
>>
File: 1743816040111236.jpg (15 KB, 400x228)
Intel Arc Pro B80 will be announced next month at Semicon West
32 GB GDDR6
384-bit bus
32 xe cores
>>
>>106534174
>32 GB GDDR6
I sleep
>>
>>106534174
500USD?
>>
>>106531710
Just run the tts on your CPU
>>
>>106534184
>>106534211
I didn't see it mentioned. My guess would be $700. The B60 should also be officially announced with a $500 price tag and preorders going up.
>>
>>106534240
but whats the cuda performance? this is the real crux of the matter, not the VRAM amount, not the bus, not the speed.
>>
>>106529566
>>106533848
Niggers should be killed on sight.
>>
>>106534240
>>106534245
also isn't the dual B60 a meme, only supported by a few consumer boards that have pcie bifurcation, and only on intel?
>>
>>106534245
unknown. also, arc doesn't directly support cuda, so you have to go through sycl/ipex/vino. I wouldn't expect concrete numbers until Q1/Q2 2026.
>>106534255
Normal B60 is just 5.0 x8
Dual B60 requires 5.0 x16 (x8/x8) yes.
imo the dual B60 is a bad value just looking at leaked pricing ($1500). You're better off buying two B60s if your board has two full x8 slots (a lot of boards scam you and only do x16 + x4 or x1).
>>
Prediction for when Veo 3 level video models with sound and no censorship will be available locally?
>>
>>106534291
you can sort of meme around with wan 2.1? multitalk, but I didn't try it
>>
File: file.jpg (28 KB, 768x432)
NVIDIA does it again. Any bets on the cost?
>>
>>106534444
>end 2026
DDR6 will save us by then.

also checked
>>
File: 1745955587579804.jpg (35 KB, 500x281)
>>106534444
>nvfp4
I LOVE NVIDIA BENCHMAXXING
>>
>apple event
>nvidia new gpu announcement
>intel announces new execs
>qwen3-next announced
>all on same day
You ever just wonder if you are in a simulation and all the events are staged?
>>
Just MI50 Max
>>
>>106534811
>what are fiscal quarters
https://investor.nvidia.com/financial-info/financial-reports/
>>
>>106534879
The current fiscal quarter doesn't end for another 21 days tho
>>
File: 1751284437559751.png (1.38 MB, 1148x1080)
>>106534444
This bitch do be looking kinda sexy tho
>>
>>106534917
got to make the milestones in time for the bean counters to do their tax evasion tricks.
>>
>>106532977
gguf status?
>>
>>106534937
Would look better without the dick shadow across the center of the image.
>>
>>106534937
huggingface blob fellatio anon
get on in
make it suck dick
>>
File: 1727282097325433.gif (140 KB, 379x440)
>>106534444
>NVFP4
How many quant formats will they make?
>>
>>106532977
Is it going to be worse than GLM air again?
>>
>>106535066
this is the last one bro i swear no more we will totally stop after this one
>>
>>106534444
will i be able to train my own llm
>>
How fucked must intel be internally if this is their response? We got Groq and Cerebras in what seemed like no time, and intel had a chance to be an actual tech leader and gain back prestige and mindshare and this tepid trash is their best play? baka
>>
>>106529333
She is right to be scared, AI will put an end to foids, I love to see it
>>
>>106532780
based on model x
finetuned using dataset Y and method Z
recommended sampler settings U
recommended template V
intended use case W

cute benchmark graphs dot png

don't forget the logo.
>>
>>106529962
can you give the sample file?
>>
>>106534033
boomers in charge don't know how computers work
>>
>cooming
>coding
>csam
How come all llm use cases start with c?
>>
>>106534444
>high value use cases
e.g. Palantir.
>>
>>106536381
Safety doesn't start with a c
>>
>>106536381
companionship
>>
>>106536427
censorship does
>>
>>106534444
The only good thing is that this will crash the price of strix halo.
>>
File: chatpajeet.jpg (10 KB, 787x54)
>be lazy and curious
>ask chatgpt to reformat my world book without changing anything else
>reformat would increase its length, not shorten it
New edit has pieces of missing text and edited phrases...
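One way to catch the silent edits (a minimal sketch, assuming you kept the original; the filenames are placeholders):
[code]
import difflib

with open("worldbook_old.txt", encoding="utf-8") as f:
    old = f.readlines()
with open("worldbook_new.txt", encoding="utf-8") as f:
    new = f.readlines()

# Show only what the model dropped or rewrote.
for line in difflib.unified_diff(old, new, lineterm=""):
    if line.startswith("-") and not line.startswith("---"):
        print(line)
[/code]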
>>
>>106536673
Unironically try qwen 3 coder.
>>
File: d.jpg (61 KB, 789x379)
>>106536673
I swear to god this happens on purpose.
If I fed this to my Cuckstral it would just shit out the text without any issues.
>>
>>106536673
>New edit has pieces of missing text and edited phrases...
Yup. That's how it goes, especially with cloud models if you are using the online chat interface.
At least in AI studio you can set a low as fuck temperature for Gemini 2.5 pro.
>>
>>106536726
I dislike the way it says that so much.
>>
>>106536673
>use chatgpt for something that needs a lot of context
use gemini bro
>>
File: 1726789104598554.jpg (58 KB, 874x784)
>>106535440
>finetuned using dataset Y
Based on hf's policies
https://huggingface.co/content-policy
Would this dataset >>106531349 be at risk of getting nuked?
>>
>>106536763
yeah, i like how models sometimes say "if you're done with this" or "we can sit in silence too", which is basically the model telling you to shut the fuck up.
>>
>>106536831
only if enough people report it.
>>
>>106536831
You can just say your dataset is special secret sauce.
>>
File: 1730805377837.jpg (314 KB, 1080x1350)
>>106533239
ty, umisida is the artist. picrel is one of the 35 images i used in the dataset, my captions could probably be better but i think most times it emulates the style well
>>
>>106533239
also noobai vpred is the base model, "final" lora is around 1.4k steps
>>
File: 1309379053001.jpg (35 KB, 413x413)
>>106530010
>>
https://vocaroo.com/1epBb8RRjycR
>>
File: 1416049706777.gif (1.9 MB, 500x550)
It should be possible to skip a token in prediction and just generate only every odd token, right?
Then you can use fill-in-the-middle to generate even tokens.
And then you can batch those operations together and get almost x2 speed "for free".
Or do this with for (n+2)%3, (n+1)%3 and (n)%3 tokens for almost x3 speed.
Or increase k in n%k and get xk speed boost until you hit the batching limit.
>>
>>106537106
lmao, this is so good
>>
>>106537130
>Speaker 1: Ah, Good Morning... Local Models... General... Sirs! I have been paneering my... kofta... I noticed you have... malware uhh on your... Windows Desktop... Sir. Please contact our... support center... uh, and... Our Engineer will Call you...back...ahhh.
I haven't tested it too much but the calmer the source audio is the better it sounds. Some voice actors have too many highs and lows (because they are acting) and the result can be annoying.
>>
what's the best <80 GB model if I don't care about cockbenchs and creativity and whatnot, all I want is actual intelligence so the model is smart enough to write a coherent plot without contradicting itself every other paragraph (or worse, giving girls dicks)
>>
>>106530010
is it three times as good as what we had before now that it takes three times the vram?
>>
>>106537232
>I don't care about cockebench and creativity, all I want is creativity and cockbench
>>
>>106537112
>It should be possible to skip a token in prediction
how? you could do a beam search but you can't just skip steps.
>>
>>106537239
Three times the slop. Try: https://huggingface.co/spaces/tencent/HunyuanImage-2.1
>>106531724
>>106531858
>>106531890
>>
>>106537112
Draft does this.
>>
File: 1752133741624309.jpg (1.07 MB, 3072x1472)
>>106537239
>is it three times as good
it's even more slopped than Qwen, that's a new record
>>
SaaS sirs, recommended watch
https://youtu.be/TwdiUu93DPw
>>
>>106537290
the most retarded way to do it would be by adding a special wildcard token during training and teaching the model to ignore it, then inserting it into prompts.
but I'm 80% sure you don't need to go that far and can just tweak the math a little bit.
>>106537374
no, as I understand it, draft uses a separate retarded model to generate tokens fast and then the big model is used to check the correctness.
>>
>>106537446
>no, as I understand it, draft uses a separate retarded model to generate tokens fast and then the big model is used to check the correctness.
Correct.
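A toy sketch of that verify loop (greedy version; real speculative decoding also matches sampling distributions, and draft_model/big_model here are stand-in objects, not a real API):
[code]
def speculative_step(big_model, draft_model, ctx, k=4):
    # The cheap draft model proposes k tokens autoregressively.
    draft = []
    for _ in range(k):
        draft.append(draft_model.next_token(ctx + draft))

    # The big model scores all drafted positions in one forward
    # pass, then keeps the longest prefix it agrees with.
    verified = big_model.next_tokens(ctx, draft)
    accepted = []
    for proposed, correct in zip(draft, verified):
        if proposed != correct:
            accepted.append(correct)  # big model's correction; stop here
            break
        accepted.append(proposed)
    return accepted
[/code]
With MTP the "draft model" is instead extra prediction heads on the big model itself, so there's no separate model or kv cache to manage.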
>>
>>106537106
Artificial General Indian has been achieved



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.