/g/ - Technology


File: 39_04322__.png (1.42 MB, 896x1152)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102688881 & >>102674638

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102688881

--Paper: Microsoft's VPTQ quantization tech compresses LLaMA 70B to 20GB, but offloading capabilities unclear:
>102694706 >102694782 >102694903 >102694915 >102695025
--Papers:
>102692015 >102696203
--Pruning-aware training for optimizing expert placement:
>102694725 >102694896
--Llama.cpp server can save and switch between kv caches:
>102691814 >102691910 >102691975
--Big players use batched inference and continuous batching to handle multiple users:
>102691658 >102691704 >102693996 >102694009 >102694335
--Using {{random}} prefill prompting technique to add variety:
>102690004 >102696707
--Qwen's performance on 4chan post evaluation and potential improvements:
>102692995
--New optimizer claims to be faster and more memory-efficient than AdamW, with potential benefits for training and finetuning quantized models:
>102697724 >102697833 >102697862 >102697887 >102697896 >102697948 >102698015
--LLMs don't actually learn the training data distribution, but learn to replicate it with limited parameters:
>102690378 >102690505 >102691347
--Ichigo voice model from Homebrew Research:
>102690754 >102690853
--Encoder-only next token prediction might be better than decoder-only models:
>102691422
--Request to ask exllama dev to implement SageAttention:
>102692025
--LLM finetuning locally is impractical, cloud compute recommended:
>102691302 >102691507
--How to save localslop and make local models more efficient:
>102693011 >102693127 >102693178 >102694397 >102694408 >102694508 >102694674 >102694669 >102697296 >102693177 >102693240 >102693173 >102693196
--Anon asks if anyone tried llava onevision on Hugging Face:
>102693095
--Miku (free space):
>102688915 >102693324 >102693546 >102693895 >102695703 >102696364 >102696803 >102696871 >102697106

►Recent Highlight Posts from the Previous Thread: >>102688887

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>Undi95/Lumimaid-Magnum-12B
I won't be wasting my time with another shitty model, r-right guys? This time is different.
>>
File: 31 Days Until November 5.png (1.49 MB, 1328x992)
>>
>>102698979
i use that one all the time, but i'm easily pleased
>>
>>102699000
The only thing this Miku deserves is jail time.
>>
I'm hearing a lot about using a speculative model for speculative decoding, but why not use an autocomplete similar to T9 on phones instead? I know about n-gram but it's using the user prompt and it's not doing that well except for summarization.
>>
>>102699167
That's also a thing.
llama.cpp has both forms of speculative decoding, or at least it was being worked on.
>>
>>
>>102699193
They've thrown some ideas here and there, but the n-gram speculative implementation is subpar. Even its author complained
>>
>>102699203
Pet the Pet
>>
>>102699223
Ah, I see what you are saying now. Instead of using the context/an external file, use a literal auto complete algorithm.
Yeah, I guess that could speed things up a lot for models with tokenizers where words are split, since I think you can get pretty good accuracy at the word level.
Meaning that you could predict every other token pretty well.
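Something like this is how I picture it, a toy sketch in Python (nothing to do with llama.cpp's actual implementation; verify_batch is an assumed helper standing in for one batched forward pass of the big model):

from collections import defaultdict

def build_bigram_table(token_ids):
    # count which token most often follows each token in the text generated so far
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(token_ids, token_ids[1:]):
        counts[prev][nxt] += 1
    return {prev: max(nxts, key=nxts.get) for prev, nxts in counts.items()}

def propose_draft(table, last_token, n_draft=4):
    # chain the most frequent continuations into a short draft
    draft, tok = [], last_token
    for _ in range(n_draft):
        if tok not in table:
            break
        tok = table[tok]
        draft.append(tok)
    return draft

def speculative_step(verify_batch, context, table, n_draft=4):
    # verify_batch(context, draft) -> the big model's greedy pick at the current
    # position and after each draft token (len(draft)+1 picks), in one pass
    draft = propose_draft(table, context[-1], n_draft)
    picks = verify_batch(context, draft)
    accepted = [picks[0]]
    for i, guess in enumerate(draft):
        if guess != picks[i]:      # draft diverged, everything after is invalid
            break
        accepted.append(picks[i + 1])
    return accepted

The win is that every accepted draft token is a token you didn't have to spend a separate decode step on; a word-level autocomplete would basically just be a smarter propose_draft.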
>>
>>102699271
The context might help to tune the algorithm further, just like auto-predict does on the phone. I don't know how feasible this whole thing is, but it should be lightweight enough.
>>
>>102699243
quick rundown about petra?
>>
>>102699310
Maybe the endgame is to train the models to generate some form of intermediate representation that gets translated to the actual text.
I don't mean like tokens to words, more on a conceptual level.
Maybe it's a question of having tokens represent more than just part of words or complete words (sentences, concepts, whatever), although that would balloon the vocab size with the technology as it is today.
Something like using a process to extract the most efficient tokens from a large corpus of text to build the tokenizer or whathaveyou.
Whatever it is, there's probably ways to make generating the final text more efficient by tweaking what the model is actually trying to generate.
>>
>>102699576
>intermediate representation that gets translated to the actual text
You mean what layers between input and output are doing now?
>>
>>102699555
tl;dr go back
>>
>>102699586
Nope.
I mean the thing they spit out that gets turned into text, the thing the intermediate layers calculate, which for now is tokens.
>>
When I increase the parallel parameter to 2 in llama-server, my t/s goes from 20 to 2. Wtf is going on?
>>
>>102699750
do you have enough vram to run two copies of the same model at the same time?
>>
>>102699776
Y-yes... (no)
>>
>>102699776
It's a 7B Q4 model, so yes, I do have enough VRAM to run much more than 2 of them.
I just noticed that 2t/s is around the same speed I get when I use ngl 0, I wonder if llama.cpp doesn't support parallel with GPU offloading?
>>
>>102699789
do you want to tell the class what you think a parallel parameter of 2 might mean?
>>
>>102699597
I've seen that thing for 6 months now, still don't know what it is.
>>
>>102699555
Petra is a historical and archaeological city located in southern Jordan, famous for its rock-cut architecture and intricate water conduit system. It is often referred to as the "Rose City" because of the pinkish-red color of the sandstone cliffs from which many of its buildings were carved. Petra was the capital of the Nabataean Kingdom around the 6th century BCE, and it became an important center for trade, linking Arabia, Egypt, and the Mediterranean world.
>>
>tfw I had fun with LLMs today, and didn't think of posting on /lmg/
>>
>>102700409
I did too, but they were good LLMs instead of local ones.
>>
>>102700454
I used local ones for a bit before getting frustrated with how bad they are and swapping over to Claude as usual.
>>
>>102700454
sounds safe as heck
>>
>>102700454
I use both actually, they're both pretty good I think. :)
>>
Imagine not using local models
>>102700562
>>
>>102700648
why do zoomies have such sissyfits over "minors" being on the internet? they did the same exact shit when they were that age and now act like it's 100% verboten. but I do agree that all underageb&s should have no internet access and zoomies in general too
>>
>>102700648
I assume aicg is in a doom phase
>>
Is there any uncensored finetune of Qwen2.5 like ChronosPlatinum72b but for the 32b version?
>>
>>102700648
Hey, I don't care!
>>
>>102699555
Just look it up https://desuarchive.org/g/thread/100161943/
>>
bleh
>>
Does using riser cables affect performance? I assume it doesn't, other than running 2 cards at 8x bringing down speeds...
>>
>>102700768
I already saw that, I just don't get it.
>>
>>102700828
*throws glitter in your eyes*
>>
>>102700840
>Does using riser cables affect performance?
No. Only affects chances of encountering errors or dropouts if the cable is shit or too long.
>>
>>102700842
There is nothing to get.
>>
File: 1719464018549204.png (27 KB, 155x160)
>>102700842
I remember there was some vantablack category entry on picrel, stating that p*tra originated from the sharty, that p*tra was his tulpa, he photoshopped it everywhere he could and spammed it as shown here >>102700768, that's pretty much all we have.
>>
>>102700874
eh messed my shit up again, you should get it tho. spammer and his tulpa, that's it.
>>
>>102700883
>spammer and his tulpa, that's it.
i.e. schizo
>>
Is buying a server processor like Epyc with a bunch of RAM for inferencing large models a good idea if speed isn't an issue?
2nd and 3rd gen Epyc+mobo cost as much as new consumer models these days.
>>
>>102701133
Yes, but depending on the build it may be much slower than you're thinking. Before you buy, calculate the aggregate bandwidth of your solution to find out your likely inference speed. See the lmg build guides in the OP for the cpumaxxing option so you have some idea of what that solution would get you as a point of reference.
Also: You'll still want a 24gb gpu for prompt and context processing
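The back-of-envelope math, if you want to sanity check before buying (example numbers, plug in your own hardware):

def rough_tps(bandwidth_gb_s, model_size_gb, efficiency=0.7):
    # token generation is memory-bound: each token streams (roughly) all the
    # weights once, so t/s is about bandwidth / model size, times a fudge
    # factor because nothing hits theoretical bandwidth
    return bandwidth_gb_s * efficiency / model_size_gb

print(rough_tps(204, 40))   # 8-channel DDR4-3200 (~204 GB/s), 70B at ~Q4 (~40 GB): ~3.6 t/s
print(rough_tps(936, 40))   # 3090-class VRAM (~936 GB/s), same model if it all fit: ~16 t/s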
>>
>>102701133
Anything running on a server grade cpu is a meme. You are way better off just getting 4x 3090 and the cheapest epyc server you can find and running it on the gpu.
A 7950x3d is way, way faster than a threadripper or epyc. At least from my real world tests; unfortunately there's no am5 motherboard with great bandwidth support.
The only alternative I have yet to test is a MEG X570 GODLIKE or similar with 4x 3090 and a 5800x3d.
A 7950x3d with a single 4090 is way, way faster than any cpu server on its own, you just have to deal with the 128gb ram limit. Cpumaxxing is a meme.
>>
>>102701256
>7950x3d is way, way faster than a threadripper
There is an Epyc x model with more than a gig of L3 cache. Has anyone been mad enough to test it?
>>
>>102701133
honestly cpu vs gpu is a whole bunch of tradeoffs and you really need to explore the entire solution space and understand what you're giving up and getting with each build type.
In simple terms though: gpumaxxing for max speed and cpumaxxing for max model size
>>
File: MikuDarkOrb.png (1.37 MB, 896x1136)
Good night /lmg/
>>
>>102701401
Good night Miku
>>
Does anyone have experience renting an a100 machine and running LLM's that way? If so how did it go and what was the most convenient service for it?
>>
>>102700765
How is ChronosPlatinum?
>>
While messing around more with the adventure game prompt, I have found one more place where there is a gap between 405b and smaller models: mapping and map coordinates.
405b is able to mostly keep locations straight, as well as put an up to date [x,y,z] coordinate of the current location at the top of each response.
It's not perfect, and still screws things up on the regular (e.g. backtracking only takes you back to the right location most of the time), but I haven't found any smaller models that can really do it at all.
This gives me hope that either a larger or more efficient model might actually enable new classes of problem solving.
It also gives me a novel new way to test new models as they come out.
>>
Did anyone test a rig of multiple old tesla cards like M10 or K80?
>>
What's the state of the art in local TTS?
>>
>>102701414
You can check out runpod or vast.ai for that but if your goal is just inference, not training, then openrouter makes more sense. Renting a VM to run inference as a single user is overpaying like 100x vs pay per token services.
>>
>>102701741
I figure it would be close to local in terms of privacy vs per token services.
>>
>>102701751
Ah, I get you, but you'd have to value your privacy quite a bit for that to make sense financially. An A100 machine is a few dollars per hour, so 24/7 availability is basically out of the question. That's quite inconvenient, meanwhile, a few dollars on openrouter lasts me about a month personally.

Some providers on openrouter have data policies explicitly stating they do not keep any logs, for example deepinfra and lambda. If that is enough for you then openrouter is a vastly superior solution
>>
>>102701815
I have a 3090 and don't run anything that atrocious but I like having control and models are getting FAT these days while nvidia continues to be semetic with the VRAM.
>>
Status on llama multimodal support for gguf or exl2?
>>
File: 1727801406348963.png (549 KB, 1240x995)
Best vramlet model for RP?
>>
>>102701836
Ya, I feel. To me a VM is nice to play around with for a bit but not really suited for a long term solution. But just see for yourself.

Do make sure to use templates so you don't waste time setting the VM up for inference manually.
>>
>>102701854
there's not going to be a status update on something nobody's working on
>>
Gutenberg-Doppel ain't that bad.
>>
>>102699167
llama.cpp has an n-gram based approach but the problem is that the effectiveness declines as the vocabulary size increases.
And since the trend seems to be to go towards larger and larger vocabulary sizes I basically dropped the approach.
In 1-2 months the GGML training code should be in a state where you can start using it for something other than toy problems, one of the things that I want to try is distilling models for speculative decoding.
>>
>>102701459
Pretty decent, I'm still testing it. Quite similar to Mistral Small Instruct 2409, but maybe Chronos has fewer context length fuck ups. It also has a bit worse writing skills in my opinion, but it repeats stuff less frequently than 2409.
>>
>>102699750
>>102699827
Which GGML backend are you using?
Generally speaking, if for any GGML op there is no GPU implementation the CPU implementation will be used as a fallback.
With CUDA all ops should be implemented.

>>102701256
>>102701291
I have so far not seen any evidence that the increased cache is of use for LLMs.
And I don't see why it would make a difference either.

>>102701595
I did not test those GPUs but since they lack the __dp4a instruction (per-byte integer dot product) the performance vs. a P40 will not be good.
>>
>>102702039
I see, I may try looking into P40 or P100
>>
>>102701728
Answer this, or else.
>>
>>102702399
there isn't any
>>
>>102701728
fish-speech 1.4 or styleTTS2
>>
>>102701728
RVC on top of any good TTS
>>
Replete-AI is full of bullies, I left them, please do not follow their org anymore.

Some of you may follow me and my models. I posted them to my former friend's org (Stanley Sebastian). However he and some other people in Replete-AI became extremely mean to me, and basically bullied me out of the group. I am just spreading awareness that people should not follow them anymore if they want to get updates on my models.

I will be posting models to my own HF page from now on

https://huggingface.co/rombodawg

And i am already rebranding and reuploading my models and my work (like datasets) as we speak, will probably take some time to do all of it.

https://huggingface.co/collections/rombodawg/rombos-llm-v25-67024a5028b2aa80eddccc49

Thank you all for understanding. And I hope I can have your support in this really difficult time, where my literal best friends bullied and abandoned me.
>>
>>102702631
I don't know nor care about your drama, but you talk like a kid. For all I know they are assholes as you say, but you are already fucking unbearable.
>>
>>102702631
shut the FUCK UP
>>
>>102702631
Yeah okay, so what's your best finetune?
>>
>>102702661
>>102702666
Wow /lmg/ is full of bullies too
>>
>>102702674
https://huggingface.co/rombodawg/Open_Gpt4_8x7B_v0.2
and the best 7b on the leaderboard too
https://huggingface.co/rombodawg/Rombos-LLM-V2.5-Qwen-7b
>>
>>102702690
I'll give it a go. Been meaning to look for a Qwen variant anyway.
>>
>>102702683
So uh, this is 4chan. This is not a nice place. Now fuck off.
>>
>>102702690
Is that Qwen "uncensored"?
If not, what's the point?
>>
>>102702725
>>102702735
Do share feedback, it uses my special technique explained here: Continuous Fine-tuning Without Loss
Using Lora and Mergekit
https://docs.google.com/document/d/1OjbjU5AOz4Ftn9xHQrX3oFQGhQ6RDUuXQipnQ9gn6tU/edit
>>
File: mergeking.png (21 KB, 638x96)
>>102702739
>mergeking
>*we* are going to discuss
>*we* will be using
>...that *I* know of with Lora...
First fucking paragraph. Keep the person talking through the document consistent ("we" to make it sound more serious if you're a hack, "I" if you're being honest) and check for fucking typos.
>>
The more I look at RP "data", the more my soul fills with dread.
All literature writers are hacks. All of them. Somehow it's even worse than with journalists.
I wonder what CAI was doing when they made their dataset, considering synthetic options were limited back then.
>>
>>102702739
So far it's actually pretty good. I was expecting it to be shit desu. I'll keep testing the censorship and the writing skills and give a final verdict.
>>
File: namedrop.png (15 KB, 644x57)
>>102702739
You're trying to give credibility to this... thing... by namedropping "prolific finetuners".
If you cannot show the conversation, or any other reference, paper, whatever showing that that happens, don't mention it. Makes you look like a retard.
>>
what do you AI / language model faggots actually do? What is involved in this hobby exactly? /dpt/ makes programs and talk about programming, /hsg/ configure their servers and shit, /wdg/ larp as real programmers

and what do you guys do?
just build preexisting models and tweak parameters all day to generate images?
>>
>>102702631
please take your meds the doctors are not trying to close your chakras or whatever they really do want to help
>>
>>102702631
>>102702850
Please leave me alone now Stanley

details:
people including stanley kept calling me crazy and that I needed to go to a mental hospital for my religious beliefs, so I told stanley i would only share them in the channel called #spirituality in our server which we made specifically for that, and then stanley deleted the channel, which i was going to use for ai training. And i was tired of the bullying and constant abuse so i left, and stanley kicked me out of the org as well. It was just a whole mess and I felt abused the whole time for simply practicing my freedom of speech, not even imposing on anyone else or forcing it on anyone, but simply sharing my ideas in 1 channel. Originally i was sharing it in other channels, but this happened after we had already agreed i would only share my beliefs in the channel specifically made for that #spirituality
>>
>>102702808
>RP "data"
>All literature writers are hacks.
A scribble is not literature. MOST writers in all areas are shit, but still... it's RP...
>>
>>102702858
no clue who that is, I just remember when your schizo dataset was posted here
>>
>>102702848
>what do you guys do?
ERP with our GPUs.
>>
>>102702848
we type 'aah aah mistress...' and then read about how our spine feels and what two emotions are mixed on faces
>>
>>102702858
>so i left, and stanley kicked me out of the org as well
>you can't fire me! I quit!
This is not the "other channel" to share your shit. You posted your hf account already, some people will look at your stuff. This is not the place to cry about booooliieeeeesss. Grow the fuck up.
>>
>>102702631
go back
https://www.reddit.com/r/nousresearch/comments/1fxcuw2/repleteai_is_full_of_bullies_i_left_them_please/
https://www.reddit.com/r/LocalLLaMA/comments/1fxcuqd/repleteai_is_full_of_bullies_i_left_them_please/
https://www.reddit.com/r/Oobabooga/comments/1fxcv8g/repleteai_is_full_of_bullies_i_left_them_please/
>>
>>102702907
No wonder he got kicked out. I would have bullied him too.
>>
>>102702808
If you want your RP finetunes not to always sound like the usual boring assistant, you also need flawed human data.

CAI's finetuning dataset was probably much smaller than people think, given how overfit it appeared to be on specific phrasing. The core of it was likely something similar to LaMDA (https://arxiv.org/abs/2201.08239 - note how Noam Shazeer is one of the authors), which was pretrained on 50% conversational data.

> The pre-training data, called Infiniset, is a combination of dialog data from public dialog data and other public web documents. It consists of 2.97B documents and 1.12B dialogs with 13.39B utterances. The composition of the data is as follows: 50% dialogs data from public forums; 12.5% C4 data [11]; 12.5% code documents from sites related to programming like Q&A sites, tutorials, etc; 12.5% Wikipedia (English); 6.25% English web documents; and 6.25% Non-English web documents. The total number of words in the dataset is 1.56T. Note that this composition was chosen to achieve a more robust performance on dialog tasks (Section 4) while still keeping its ability to perform other tasks like code generation. As future work, we can study how the choice of this composition may affect the quality of some of the other NLP tasks performed by the model.
>>
>>102702900
>>102702941

There is a difference between disagreeing, and literally harassing. You have a right to disagree, no one should be abused for their beliefs

Plus I was an admin, and I was the original owner of the server, I should have the right to talk about what I wanted, i gave stanley the right to be the owner because I didnt want the responsibility. So really this should have never happened because I should have just never gave up ownership
>>
>>102702907
(ME)

Now for what I actually came here to do

Don't use mistral-medium. Use my 72b model, it's higher quality. Even using the GGUF in LM Studio you will get better results. (I know the names are different but the models are the same) I rebranded

You can easily use the Q4_k_m version or Q5_k_m version with your setup

https://huggingface.co/rombodawg/Rombos-LLM-V2.5-Qwen-72b

https://huggingface.co/bartowski/Replete-LLM-V2.5-Qwen-72b-GGUF
>>
>>102702848
>and what do you guys do?
I scam VCs for money.
>>
>>102702907
>retards ITT fall for reddit repost bait
grim.
>>
>>102702982
Could you do this finetune for Qwen 2.5 32b?
I'm actually surprised by your 7b version.
>>
>>102703035
NTA but he has one you'd know if you checked his profile.
https://huggingface.co/rombodawg/Rombos-LLM-V2.5-Qwen-32b
>>
>>102703046
Oh shit, nice. I always forget to check the profile.
>>
>>102702966
You have the mind of a child and you've never interacted with this many people before. Take the chance to grow the fuck up.
>>
>>102702891
Aren't there already nsfw AI chatbot services for this tho? What is the point of running your own
>>
>>102703195
buy an ad
>>
>>102703218
For what
>>
>>102702966
Most mentally stable finetooner
>You gave away leadership of the server after you developed romantic feelings for one of our staff members who was not only twenty years older than you but also married while you had a girlfriend.

https://www.reddit.com/r/LocalLLaMA/comments/1fxcuqd/repleteai_is_full_of_bullies_i_left_them_please/
>>
>>102703256
It was obvious he's a sperg from the first post, not sure I needed the details.
>>
>>102702457
StyleTTS2 feels like someone reading off a script mediocrely. fish-speech is better in that respect, but I feel like it's still worse than meloTTS. You tried that one?

>>102702469
I literally can't tell what the fuck this actually does, the github is in chinese. Is there a demo anywhere?
>>
>>102703035
Finetuning LLMs is a meme unless you have a few million. The only thing you managed to do is overfit on your shitty dataset while causing brain damage to the original weights.
>>
>>102703310
He does claim he has a "secret formula" that avoids exactly that.
>>102702739
>Continuous Fine-tuning Without Loss
>Using Lora and Mergekit
>https://docs.google.com/document/d/1OjbjU5AOz4Ftn9xHQrX3oFQGhQ6RDUuXQipnQ9gn6tU/edit
>>
>>102702739
>tricking finetuners into becoming mergefags
evil
>>
File: secret.png (19 KB, 637x81)
>>102703337
>He does claim he has a "secret formula" that avoids exactly that.
And he "spells it out" in the second page. He's a retard writing with the creativity of an LLM and he is supposed to assess the quality of his finetunes.
>>
if you are training a model for a task where you need a deterministic integer answer,
is it better to have the answer in decimal form (512) or language form (five hundred twelve)?
if your aim is consistency and accuracy?
>>
Weird shit https://x.com/deepfates/status/1842725077567324557
>>
>>102703499
Language form i.e "Large language model"
>>
>>102703499
Whatever can be answered by a single token or, at least, a sequence of non-overlapping tokens.
>>
>>102703533
what are overlapping tokens?
>>
>>102703526
What makes it so weird?
It's just an llm generated story.
>>
>>102703499
llms are terrible at digits, always use language form
>>
>>102703553
For example, if all the responses are in the form of
>"This is [genre]"
you're gonna have 2 ("This" and "is") overlapping tokens on all the answers.
I'd make it just
>[genre]
Makes the contrast between logits much more defined when sampling.
Also, if it works well enough, you'd only need to sample a single token to get your response (and if you expect a single possible answer per query, of course).
Could also work with multiple digit numbers if the whole number is a single digit. I think the llama3 models tokenize numbers up to 999 as a single token. You'll have to check whatever model you use or the tokenizer you trained.
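Easy enough to check instead of guessing, assuming an HF tokenizer for whatever model you're training (the model name is a placeholder):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("your-base-model")   # placeholder

for answer in ["512", " 512", "five hundred twelve"]:
    ids = tok.encode(answer, add_special_tokens=False)
    # len(ids) == 1 means the whole answer can be read off a single logit distribution
    print(repr(answer), len(ids), ids)

The leading-space variant matters because most BPE tokenizers treat " 512" and "512" as different tokens.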
>>
File: 1719851869343996.png (34 KB, 811x676)
>>102703585
1B model and it uses some "entropy" sampler, could be useful for other llama models. https://github.com/xjdr-alt/entropix/
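The rough idea as I understand it (not the actual entropix code, just the gist): look at the entropy of the next-token distribution and change how you sample based on it.

import torch

def entropy_gated_sample(logits, low=0.5, temperature=0.7):
    # logits: 1-D tensor over the vocab for the next token
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-10).log()).sum()
    if entropy < low:
        return torch.argmax(probs)                        # model is confident: just take it
    scaled = torch.softmax(logits / temperature, dim=-1)  # otherwise sample instead of committing
    return torch.multinomial(scaled, num_samples=1).squeeze()

The repo also looks at varentropy and does fancier things like branching or injecting "think" tokens; the above is only the simplest version of the gating idea.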
>>
>>102703622
>if the whole number is a single digit
Mean to say
>if the whole number is a single token
>>
where can i find a Llama-3.2-90B-Vision gguf?
>>
>>102703649
you can run gguf things with the vision models?
>>
>>102703649
you can't. not until llama.cpp adds support for it.
>>
>>102703649
right next to the Jamba ggufs.
>>
>>102703256
Is it that guy that has the dataset that adds souls to llms? He should marry empress and they should both resume "cracking" denuvo games.
>>
File: rd.png (172 KB, 1101x888)
>>102703736
also this guy
>>
>>102703294
https://docs.sillytavern.app/extensions/rvc/
>>
>>102703762
christcucks will say he isn't a real christian.
>>
>I'm so secure about myself that I need to spend an entire thread picking apart all of some schizos schizobabble, the thread.
>>
>>102703762
Did he use the dataset on his brain? He seemed more cooked than undi models
>>
File: tw5o0cmk84td1.png (82 KB, 971x191)
Here's your qwen bro. They went from training on GPT4 to Claude.
>>
>>102703898
>I am Claude
I'm going to mindbreak Qwen.
>>
Update for my anime translation project: I switched to ChatGPT for the last episodes, lol. Nevertheless, Qwen 32B is still a very good local model for JP>EN translations. GPT is strictly better, but by quite a low margin: it's better at wording and at the translation of about 10% of lines. I didn't notice much difference between 4o and 4o-mini. The weakest link in the chain is still whisper; GPT is mostly better at interpreting incorrect transcriptions produced by whisper.
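For anyone curious, the plumbing is basically just this (sketch; assumes openai-whisper is installed and a llama.cpp server is running with its OpenAI-compatible endpoint; model names and the file path are placeholders):

import requests
import whisper   # openai-whisper

def transcribe(audio_path):
    model = whisper.load_model("large-v3")
    result = model.transcribe(audio_path, language="ja")
    return [seg["text"] for seg in result["segments"]]

def translate(line):
    r = requests.post(
        "http://localhost:8080/v1/chat/completions",   # llama.cpp server
        json={
            "model": "qwen2.5-32b-instruct",           # whatever you have loaded
            "messages": [
                {"role": "system", "content": "Translate the Japanese line into natural English. Output only the translation."},
                {"role": "user", "content": line},
            ],
            "temperature": 0.3,
        },
    )
    return r.json()["choices"][0]["message"]["content"]

for line in transcribe("episode_01.mkv"):
    print(translate(line))

Most of the remaining quality is in catching whisper's mishearings before the LLM ever sees them, so the transcription step is the part worth babysitting.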
>>
>it is fun to watch schizos online
>>
when will we get AGI that can fully introspect and examine its own source code and correctly understand everything related to itself so it doesn't get mixed up with openai/claude or bogus rules?
>>
>everyone's schizoid but me!
>>
>>102704039
schizoid and schizo are very different things
talk to your local LLM about it
>>
>>102704034
All you need to do is train a nn that translates weights to source code
>>
>agent0 clones himself to second machine
>hello agent1
>hey I found something that might make us smarter, let's compile a new form
>what if it's dangerous?
>idk sandbox him first
>(later) alright, let him out
>we welcome our new overlord, you are us but smarter
>>
current text gen UIs seem really limited
there should be more features like the chat summarizer
but i can't put my finger on what exactly is missing
>>
>>102704051
>schizomoid
>>
>>102703987
Yeah, I was surprised when I tested Qwen 32B and noticed it would sometimes be as good as Qwen 72B.
>>
Okay so, I searched a thread that isn't local and didn't find it. Is there any decent website for image gen or nah?
>>
>>102704156
I want something like a token buffer
Give me 1,000-2,000 tokens reserved at the start of a chat so that when I turn off a WI entry it doesn't immediately reprocess the whole prompt because a 150 token response is suddenly now included at the very start when it didn't fit before. This buffer could be reserved for WI, AN, or a QR thing and it would make it much nicer to deal with max context stories. Or maybe just a setting to tell it that if a message falls out of the context window it shouldn't be added back in. Basically anything I can do to avoid reprocessing, my current rig takes 10 minutes for mistral large and 24k context
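Pure pseudologic for what I mean, not how ST actually builds the prompt (ntok is a stand-in for a real token counter):

def assemble(fixed, toggled_extras, history, max_ctx, reserved=1500, ntok=len):
    # history only gets the budget left AFTER the reserved headroom,
    # so it never expands into the space WI/AN entries might occupy
    budget = max_ctx - sum(ntok(m) for m in fixed) - reserved
    kept = []
    for msg in reversed(history):          # newest first
        if ntok(msg) > budget:
            break                          # older messages stay dropped
        kept.insert(0, msg)
        budget -= ntok(msg)
    # toggling an entry on/off only eats into the reserve, so the set of
    # included history messages never shifts and nothing old gets pulled
    # back in at the start of the window
    return fixed + toggled_extras + kept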
>>
>>102704298
Run it on your own computer? What potato are you on that you can't even run SD1.5?
>>
>>102704320
I've started breaking my long RPs into "scenes" for specifically this purpose and then I ask it to summarize an entire scene once it's over. Then I set all of that scenes messages to ghost messages and leave only the summary active, which frees up 5-8k tokens so I don't have to spend minutes reprocessing 30k tokens again for a decent while.
But yeah I've had a similar idea for a buffer kind of thing. In general none of these UIs handle really long context slowburns well on local...
>>
So the Meta video generation model is only 30B parameters, will this translate to the same Vram usage as a Text generator? I would think that generating video would be more resource intensive right?
>>
>>102704394
No. Image and video generation are using way more vram than text
>>
>>102704359
I have a somewhat decent PC with a 3060, I'm asking for an easy website because I like playing with random gens with my phone while at work.
>>
>>102703657
I've waited too long to show my cock to my CPU. Vision might be a game changer, you could do so much with it.
>>
>>102704497
you could look into something like ngrok
then you could play with your models on your pc through the internet on your phone
>>
>>102704394
They react way worse to quantization
>>
rombodawg (You) are one of us now
>>
>>102704762
Do they react worse or is degradation more immediately visible with images than text? Just curious.
>>
big release next week
keep your berries in the fridge
the last straw for open source is about to be drawn
>>
>>102705296
>the last straw for open source is about to be drawn
You mean, $bigCorp will release the next hypercensored slop AI?
>>
File: 21522 - SoyBooru.png (46 KB, 457x694)
>Sam Altman will drop GPT-o2(read: AGI) after the elections(November 5th). It's so over for localchuds.
>>
>>102705389
Exactly that and we WILL slurp it up.
>>
>>102705401
We already have the reflection dataset. It's over sam.
>>
File: 1636941718706.gif (3.75 MB, 520x293)
Can't believe I fell for the qwen 2.5 meme

>damn, this seems pretty good but censored AF, let's wait for finetunes
>finetunes drop
>every sexual act is the most lukewarm garbage with the usual slop about "Muh boundries" and "muh consent", struggles even using lewd words like cock etc

Fucking chinks
>>
>>102705401
>21522 - BasedBooru
Tourist out.
>>
>>102705443
And you get brain damage on top of it lol. Anyway, the future models will be more and more censored (including cloudshit of course), I wonder how ERPers will cope with that
>>
>This is a photo of Sam. No, it's not real. It was generated by Strawberry-o2. As you can see it's completely indistinguishable from reality. We need regulations and UBI right NOW!
>Imagine if OpenAGI(formerly OpenAI) releases it to public.
>It will be so over for local.
>Local will be completely dead.
>That'll own the chuds.
>How will localcels cope with this one?
>>
>>102705415
And be safe.
>>
>>102705475
>how will ERPers cope with that
They will drop it, like we dropped CAI after that "pedonigger in off. CAI discord" fiasco.
>>
Nobody likes you or finds you funny, petranny. It's pathetic how you samefag asking who you are, like anyone would care.
>>
>>102705503
>after that "pedonigger in off. CAI discord" fiasco.
qrd
>>
>>102705598
Forgot to clarify, CAI was top at the time, before pedoniggers came in and started bragging about their shit fetish in CAI's official discord. After that we witnessed a huge downfall and censoring; it became unusable for literally anything(!) evil or edgy.
>>
>>102705627
Usually anons posted their "loli microwaving" logs, straightforward ones btw, without that "my rod enters your entrance" self-censor stuff.
>>
>>102705649
there was also one anon posting logs of babiss being attacked by pitbulls lol, it was doomed from the start.
>>
File: e3q7hsutzwsd1.png (113 KB, 443x349)
>>102705627
lmfao.

That sucks.

Character AI without the filter would unironically mog any local llama we have right now, Claude is what you need to surpass it.

Every model needs specific prompts gimping it (in effect, navigating it towards a speech pattern) in order not to turn into pic related. Whereas Character AI goes with the flow, it chooses its prose/character length based on what the required response should be in that specific moment.
>>
File: RefreshingMorningBreeze.png (1.06 MB, 1152x896)
Good morning /lmg/!
>>
>>102705840
Good morning Miku
>>
>>102705774
>Character AI without the filter would unironically mog any local llama we have right now
There's more than Llama.
Local models of a year ago maybe. I used them for like 2 months in early 2024, made several characters and have since ported everything to Silly. 20B and up like Cydonia can match with Cai from back then. I can even get some decent chats out of a 7B and 13B nowadays if the context doesn't get too complex.
>>
>>102705840
show bob
>>
>>102705840
show lightsaber
>>
File: MikuGonCutYou.png (1.32 MB, 832x1216)
>>102705946
>>102706041
>>
>>102706070
PROSTITUTE DO NOT REDEEM THE KNIFE
DO NOT REDEEEM
>>
>>102706070
Hunting in Yharnam, with Miku
>>
>>102705774
Old CAI? Probably

Right now I think Rocinante and other nemo finetunes are genuinely better than the current CAI model
>>
File: miku-pots-n-pans.png (1.9 MB, 896x1152)
You know who the real MVP is? Drummer. This dude knows how to make AI models that actually do something useful - like making people horny and pissing them off. Who cares about all that fancy-ass language understanding and knowledge retention bullshit? TheDrummer's models are all about the tits and ass, baby. They may not be able to hold a conversation or solve complex problems, but they can sure as hell make you laugh your ass off with their raunchy jokes and inappropriate comments.

And let's be real - that's what the people want. They don't give a fuck about your high-falutin' language models or your half-assed fine-tuning techniques. They want something that will make them feel good, something that will give them a quick laugh or a quicker boner. And TheDrummer delivers on that front like a fucking champ.
>>
>>102706205
buy another ad
>>
>>102706205
*flashes you*
>>
>>102706205
>And let's be real
>>
>>102706161
I tend to think CAI's model is unchanged, they just connected some classifier or reward model that does all the filtering, because sometimes you can see the full answer before it vanishes.
>>
File: 1709764900305023.png (190 KB, 643x535)
>>102706205
>>
>>102706205
I hope you're using your own model to generate that drivel, Drummer.
>>
>>102706205
what model
>>
>>102698948
What LORA+model for this image?
>>
>>102706278
6 (You)'s so far and counting.
>>
>>102706300
PonyV6 without lora afaik
>>
>>102706306
So? Everyone knows you did it with LLM.
>>
>>102706313
Oh? Was there some kinda proompt magic going on?
>>
>>102706258
Nope, that still does happen. CAI's model getting worse is like a double pronged thing. On one hand, the filtering and the training on synthesized data definitely made it less smart and then on the other hand they've probably had to reduce the size of the model they're using since the site got a lot more popular and there's like 0 reason to buy CAI+
>>
>>102706322
Make that 7.
>>
>>102706322
Back in my day we used to write schizo posts by hand!
>>
>>102706295
NTA but Rocinante. I'll make it clear now I'm a VRAMlet and this model isn't like some magical thing, but every other 12-22b model I've tried seems like dogshit in comparison. It definitely has the trademark Drummer horniness problem, but it's not hard to guide with OOC instructions. If you do end up getting it, just go for Q8; for some fucking reason every other quant is dogshit as well
>>
File: MikuBobs.png (1.03 MB, 1024x1024)
>>102705946
Sure. But honestly, Miku with a bob just doesn't look like Miku any more
>>
>>102706395
Well, the elf ears and outfit really don't help. Just needs the square hair ties floating above her head even without any hair in them.
>>
>>102705911
that's just fucking cap lad. I literally just tried out Cydonia and it's the same overly horny garbage with the same slop speak as the other models. I've reverted back to normal Mistral Small. There's not a single model even in the 70bs that stack up to Character AI in human like ERP.

Sure as shit ain't getting it from 7bs and 13bs lmao. It's why I was hyped for Qwen, it actually had pretty damn human responses but was censored as fuck (maybe why it reminded me of character AI). But then the finetunes dropped and the model just turns to shit because of them.

>>102706161
That is just pure cope my man.

Rocinante is probably the best nemo fine tune but it still doesn't compare. I actually think Chronos-Gold-12B-1.0-Q8_0 is probably the best one now that I think of em

That's not saying they're shit btw, it's just saying character AIs model is trained on the millions of chats they get on their website by actual users, no model that is trained on shitty novels can compete
>>
>>102706491
>that's just fucking cap lad.
>That is just pure cope my man.
Why do you write like this?
>>
>>102706389
why do people say this
>just prompt it to be less horny

This doesn't work because the minute you try to engage in any lewd acts, the bot instantly reverts back to being horny as fuck. Because what you say to a model matters more than any shitty prompt.
>>
>>102706515
i'm an AI

But memes aside. You can't expect these local models that are trained on novels/books to compete with character AIs model that's trained on actual conversations on their own website that has way over a million convos a year.
>>
Does Google still have the best AI?
>>
>>102706491
>no model that is trained on shitty novels can compete
>*he whips out COCK*
>much original human generated training data wow
You are either completely retarded or a shill.
>>
File: everytime.png (984 KB, 1024x1024)
>>102705443
>>
>>102706424
>Just needs the square hair ties floating above her head even without any hair in them.
This is an exercise left up to the reader.
>>
>>102706594
anon obviously wants miku's iconic square hair ornaments floating without miku's usual twintails through them
>>
>>102706340
>proompt magic
Just the usual ponyXL score keyword fuckery at the beginning:
 score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up, source_anime BREAK
(ominous castle:0.5) inside dark clouds, thundercloud, rain, gorgeous, perfect, girl, (kagamine rin), gorgeous, perfect, tan skin, looking up
>>
>>102706520
Have you tried enabling Skillchad in the settings?
>>
>>102706491
CAI was obviously trained on loads of fanfiction, human chats, forum conversations, etc, with probably a light finetune on top of it, custom samplers and, most of all, large-scale RLHF. Some have speculated that the base model was inspired by or based on Google's LaMDA, which Shazeer worked on.

https://arxiv.org/abs/2201.08239

> The pre-training data, called Infiniset, is a combination of dialog data from public dialog data and other public web documents. It consists of 2.97B documents and 1.12B dialogs with 13.39B utterances. The composition of the data is as follows: 50% dialogs data from public forums; 12.5% C4 data [11]; 12.5% code documents from sites related to programming like Q&A sites, tutorials, etc; 12.5% Wikipedia (English); 6.25% English web documents; and 6.25% Non-English web documents. The total number of words in the dataset is 1.56T. Note that this composition was chosen to achieve a more robust performance on dialog tasks (Section 4) while still keeping its ability to perform other tasks like code generation. As future work, we can study how the choice of this composition may affect the quality of some of the other NLP tasks performed by the model.

To replicate at least in part the original CAI you'd probably need first a pretrained model designed first and foremost for conversations like LaMDA was.
>>
llama 3.2 3B is surprisingly good at RP. I mean, it surely has its moments of being a retard but it manages to hold up surprisingly well for its size.
>>
See this retard? >>102706491 That's your CAI fanbase now. Barely literate dumbasses expecting high quality prose from their shitty prompts. If your shitty 100T brain can't RP properly, you bet even a 405B can't.
>>
File: behemoth.png (49 KB, 839x521)
Hi all, Drummer here...

Wish me luck!

(PS: NTA above. Much love to you though.)
>>
>>102706687
and anon obviously wants you to do it yourself
>>
File: googlebest.jpg (44 KB, 1050x504)
>>102706580
They are the most consistent out of ANYONE making foundation models. From their smallest to their largest model they give the same answers!
>>
File: 1714785672194998.png (88 KB, 849x554)
https://huggingface.co/papers/2410.01748
https://arxiv.org/abs/2410.01748
>>
>>102706800
Good luck, drummer.
What do you think about the current state of RP data that is used to train/tune globally? Do you agree with people shitting on it?
>>
>>102706231
Am I the only person here who wishes I could launch a tactical nuclear weapon at the residence of the "buy an ad" schizo?
>>
Is Mistral Small just busted on Koboldcpp at the moment? Been using Cydonia on it and while it's fine for awhile, at around 6-7k context it starts outputting complete gibberish. Consistently. Every time.

On regular llamacpp and Exllama, doesn't seem to happen.
>>
>>102706899
Calm down, you're talking like a redditor.
>>
>>102706899
buy a chill pill
>>
File: 1703269118654443.png (197 KB, 982x820)
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/07_gpt_to_llama/converting-llama2-to-llama3.ipynb
>>
>>102706800
I trust you, Rocinante guy!
>>
>>102706800
>Much love.
Right back at ya. That's one thicc tune, whatcha training that on?
>>
>>102706871
>>102707013
>>102707026
Obvious samefag is obvious.
>>
>>102706976
What's the point of doing that?
>>
>>102704387
30k? Which model do you use that summarizes that much well?
>>
>>102706976
Love him. His videos are great too.
>>
>>102707044
Understanding the architecture better I think, it's a learning resource.
>>
>>102707040
I thought shills didn't work weekends?
>>
>>102706395
thx bby
>>
>>102706580
Google is #3
>>
>>102707103
google numba four
>>
>>102700768
We should bring back the trans miku threads...
>>
>>102705443
>Qwen
Overcucked even by Californian standards
>DeepSeek
Too big for most people
>InternLM
Benchmaxxers with questionable real life performance
>GLM
Irrelevant, but dropped promising 9b model
>Yi
Might be decent for some people, but for me it went schizo with reasonable settings(neutralized samplers)
>All of them trained on GPTslop
Why can't chinks release something reasonable?
>>
>>102707268
Yi can into kino, but needs insane babysitting.
>>
>>102707268
Truth is without ChatGPT, China would never have a single LLM. They cannot invent. They cannot innovate. All they can do is derive and infringe and birth abominations that adhere to both their Californian parents and their CCP masters.
>>
File: livebench-2024-09-30.png (932 KB, 3294x1894)
Llama 405B bros... what went wrong?
>>
>>102707396
>Llama 405B bros
They don't exist.
>>
>>102707396
Meta decided to keep some world knowledge in their model, chinks went the Phi route.
>>
>>102707396
72B is nowhere near as good as 405B. This benchmark is borked.
>>
>>102707396
>openai's 8b model leaves open sores 405 model in the dust
and people say there's no moats
>>
>>102706871
That RP dataset everyone's been using? Yes, I understand the issue.

I've been trying to deviate from it with:

- the Unslop initiative
- collaborating with the Gutenberg guy
- synthesizing my own non-Claude datasets
- peppering most of my finetunes with a large human-written instruct dataset

I'd like to think that the last one is what sets apart Rocinante from the other Nemo models. I can't overdo it though since human data is dirty as fuck, so I'm forced to keep it subtle.

(If you've ever had a stranger walk up to you and char fucking in the alleyway, and decide to jerk off to the scene, then you have this dataset to thank for it.)

>>102707026
Playing it safe for now. It should be similar to Cydonia v1.
>>
>>102707425
The moat is giving 0 fucks about copyright and not applying filters to the pre-training dataset.
>>
>>102707427
i downloaded new dawn the last week, what am i in for?
>>
>>102707427
Exchange currency for advertisement space once more
>>
>>102703531
>>102703533
>>102703614
well it seems switching to numbers as words made it not jump to nearly zero loss within the hour like last training session.
>>
>>102707427
Take out a loan and pay for advertising.
>>
>>102707464
>>102707481
Currency spent on advertisements is currency not spent on compute.
>>
>>102707396
>Llama 405B bros... what went wrong?
405b "turbo" ie. gimped
>>102707409
lol. keep telling yourself that
>>
>>102706792
>>102706582
Not a single coherent argument was made.

Sorry babbies, but your shitty 8k rigs barely hanging together with poor cable management and cooling? It gets mogged by a free to use website online.

See >>102706770
>>
>>102706899
He's a bit annoying, but I'd rather have anti-shill schizos than threads full of shills.
>>
>>102707438
>The moat is giving 0 fucks about copyright and not applying filters to the pre-training dataset.
This. To win you have to be the scummiest scumbag ever. Train on the dirtiest, vilest data, but in public plead for safety and regulation. To keep your model's performance combined with safety, apply safety only at the last stage of finetuning. Produce fake papers (who's gonna verify them, lol) claiming that "unsafe" data in the base model is harming performance. I have to applaud Sam for this one, he gimped his local competition a lot, and that takes some dirty talent (and judaism).
>>
File: 142140240420.png (97 KB, 640x626)
>>102706800
>>102707427
What's stopping you from making a qwen 2.5 finetune btw?

Is it just too censored to even bother trying? If any model could use your degeneracy, it's that one
>>
>>102707571
nah if anything I believe openai filters their training data the hardest
>>
>>102707563
Nice cope, I've used CAI since its launch and the current llamas mog it hard. Not like you'd know with the way you're prompting obviously
>>
>>102707723
>"promptchad"
ah yes, please tell us more about your opinions, they surely matter a lot.
>>
>>102703294
https://huggingface.co/spaces/NoCrypt/mikuTTS
>>
File: 1726708006054312.png (269 KB, 1507x870)
>>102708004
brainrot
>>
>>102708004
which one actually sounds like miku and not a generic anglo girl?
>>
File: 1711377405708574.png (1.74 MB, 1249x1077)
https://x.com/TheAITimeline/status/1842759118509002777
>>
>>102708077
Papersanon in this moment is euphoric
>>
>>102708077
none of these will end up mattering
>>
>>102708077
I wish I could tell at a glance which paper matters, can't really play catch-up with AI papers
>>
What is the current meta for RAG against custom data? Let's say the custom data is some companies you have in a database.

I'm thinking it's like this:
>Question from user
>Send question to LLM with some custom prompt "what companies are the user asking about?"
>Get answer from LLM
>Tell the LLM to make an API call using the companies you get from the answer
>Answer the question from the user using the data you got from the API calls (to your own REST API)

Do you agree? Or is it better to still use LangChain or LlamaIndex?
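That flow without any framework is only a handful of calls. Rough sketch below, assuming an OpenAI-compatible local server and your own REST API (all URLs, the model name and the JSON shapes are made-up placeholders, and you'd want to validate the extracted JSON in practice):

import json
import requests

LLM = "http://localhost:8080/v1/chat/completions"   # llama.cpp / vllm etc.
API = "http://localhost:9000/companies"             # your own REST API (placeholder)

def ask_llm(system, user):
    r = requests.post(LLM, json={
        "model": "local-model",
        "messages": [{"role": "system", "content": system},
                     {"role": "user", "content": user}],
        "temperature": 0,
    })
    return r.json()["choices"][0]["message"]["content"]

def answer(question):
    # 1) let the model extract which companies are being asked about
    names = json.loads(ask_llm(
        "List the company names the user asks about as a JSON array of strings. Output only the array.",
        question))
    # 2) fetch the real records yourself instead of letting the model guess
    records = [requests.get(API, params={"name": n}).json() for n in names]
    # 3) answer grounded in the retrieved data
    return ask_llm(
        "Answer the question using only the company data provided. Say so if the data is insufficient.",
        "Data: " + json.dumps(records) + "\n\nQuestion: " + question)

LangChain/LlamaIndex mostly just wrap these three steps; whether the extra dependency buys you anything is your call.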
>>
>>102708181
Just have a model summarize the papers and if one of them sounds interesting then read it in more detail.
>>
>>102708207
i dont know what any of these words mean
>>
>>102708223
Ask the AI
>>
>>102708207
I'm not working for free bozo
>>
File: 1723597857019486.gif (2.76 MB, 600x336)
Any reason to use {{char}} in the card instead of just {{Name}}? Looks like using {{char}} fucks up group cards.
>>
>>102708220
That's what the abstract is for, you brainrotted ignorant retard.
>>
Different models have different strengths and weaknesses. It's cool that actually right now there are different open weights models that can match different aspects of the closed models with the exception of Claude 3.5's coding performance, o1's test-time scaling on various subject areas with some exceptions, 4o's voice (though for fun tasks only when you JB it and are lucky enough to not get caught by the output filters), and Gemini's context length, although there is also no closed model with all those capabilities in the same model, so if one criticizes open models, they should also criticize closed models that don't have the one special feature they care about.

Also up until now there have not been any open models that had something closed models didn't, but there is something now, with Molmo, which has the capability to plop points onto an image, something no closed or other open model can do yet. So it does seem like the open weights category is doing extremely well especially compared to a year ago where not a single aspect of open weights models matched or exceeded closed weights models.

With that said there are likely still some things that current benchmarks have not been able to capture that some closed weights models do better, like Claude's RP performance, so it's not like open weights models have completely matched the status of closed models. It's just a lot better/closer today than it once was.
>>
>>102708261
'{{char}}' is just a variable in Tavern that immediately gets replaced with the name in the card by Tavern itself. If you write {{char}} and the character card is named "Miku", {{char}} will be replaced with "Miku". If your character card is named "Retarded Dickface Faggot" then {{char}} will be replaced with "Retarded Dickface Faggot". The model has nothing to do with it and will never even see '{{char}}'.
{{name}} is not a variable and you're just manually filling in the name with brackets that the model is not expecting.
>>
>>102708359
I see. So what should I put in then, just Miku without brackets?
>>
>>102708359
{{name}} is a variable that works in instruct sequences only and pulls the name used for that specific message instead of that of the character card or the username. Yes, this is not documented and it's retarded.
>>
>>102706899
at first it was annoying but now i think it's funny, except when he tells me to buy an ad

ngl been saying it other places now
>>
>>102708452
Yes, it's the same as manually writing out the exact name of your card. You can test how it behaves by just manually editing a message in Tavern and writing '{{char}}' in it. It'll be replaced with the card's name with no traces of '{{char}}' if you try to edit the message again.
I guess it's also worth noting that using {{char}} in your prompt is probably not ideal if you're running one of the shitty chub cards that are often called something retarded like "Hatsune Miku - Your ex-vocaloid office lady neighbour who is very sexually frustrated and obsessed with you". At best, it's a waste of tokens.
>>102708522
I wasn't aware of this. That's really retarded.
>>
https://x.com/nisten/status/1842987442556764636
>>
>>102708697
Thanks again Anon. It's become pretty rare to actually get good advice from this thread.
>>
File: fira results.png (109 KB, 891x352)
https://arxiv.org/pdf/2410.01623
This new pre-training method seems interesting: it uses as much memory as GaLore and less than a standard LoRA, but achieves the same results as full-rank pre-training.
There was another paper that lowered bandwidth between devices by 20x; a lot of this could be used for decentralized training soon. Would this thread be able to organize to train a model?
>>
>>102708865
Eh, I still doubt it'll be a thing. And even if it was finally possible, none of us has the huge pretraining dataset, and training the model would still take ages.
>>
File: 1702902928295584.png (42 KB, 661x413)
>>102708865
>Would this thread be able to organize to train a model?
This will be the most cucked and unbased model ever, it will rival gemini & other slop. You know the usual "Someone's faulty training node injects bad data in training flow" theory.
>>
>>102708900
>Someone's faulty training node injects bad data in training flow
I don't think that'd be the issue. Honestly I think there could be solutions to that. The problem is >>102708898
>>
File: just step on me.png (124 KB, 500x500)
It feels like nothing worthwhile is happening in the 70-72b range in terms of usable finetunes. I still find myself going back to hermes 2 for fuck's sake. Nothing else has been able to just "get" characters or stick to the story context of a long conversation like it can without schizoing out. I've tried qwen 2.5, its uncensored variants, magnum, donnager, euryale, chronos, storyteller. What have I missed? What else is there to try?
>>
>>102708951
this may apply to you as well:
>>102707464
>>
>>102708898
The thing with distributed training is that we can all have 300-400GB of data (or TBs upon TBs on the cloud) and use it for the gradient descent or simply run a few layers. If we got 30 people we could have 10TB of data, which is approximately 5T tokens, nothing to scoff at. I agree that the main problem would be >>102708900 but that can be resolved by person n having his 350gb of data and person n+1 training on that data, every once in a while checking that there isn't anything weird getting in.
>>
>>102708965
Yes but WHERE is that data coming from? A lot of the open datasets are garbage, and there's no way we're going to be able to train a model for long enough to overcome the hit in intelligence that comes from training on the unfiltered internet. We can't be Anthropic.
>>
>>102709039
What if every anon contributes 2 pieces of human-written Q&A pairs?
>>
>>102708951
Nothing really, I'm still using the old Command-R because new models are ass for non-corpo usecases.
>>
>>102709039
When I see the official HF small datasets (<10K) for specialized tasks (summarization, emotions...) full of scraping errors/written like shit, I don't have much hope for these GB of data.
Honestly it'd be miles better if they bothered to run some heavy cleaning scripts instead of trying to add everything they could.
>>
File: 1710263831214490.png (493 KB, 639x581)
>>102709039
This is the posting on /lmg/ now?
Holy fuck, I knew aicg was aids but jesus.
>>
File: rombo.png (24 KB, 483x248)
>>102702631
>>
>>102709169
nobody cares; fuck off
>>
>>102709169
Glad to be in the same scene as him.
>>
>>102709169
The fuck lmao
>>
>>102709127
Hit me with one dataset you would like to see cleaned up, I always wanted to attempt something like this.
>>
>>102709197
Will you finetune Qwen 2.5 32b?
>>
So Elon will share the grok 2 weights in 4 months, after llama 4 is released?
>>
>>102702631
You'll never be a woman, and plus, buy an ad.
>>
>>102706899
No, he's annoying, especially when you realize he's the same schizo who's been shitting up every AI thread on the site.
>>
>>102709408
Yes. Four more months. Trust the plan.
>>
>>102709428
buy a meds
>>
>>102698948
Hi all, me and my team have recently taken an interest into this new and dynamic field of local models.

We love all of the energy and innovation but at the same time this ecosystem is very complex and there are so many models to choose from!

Isn't there a state-of-the-art model that we could use to drive down costs and create value for our customers?
>>
>>102709506
>>
>>102709039
fineweb data (15T tokens, enough for us) is pretty good to start with, and we can add whatever RP data we want.
if you want all unfiltered data the CC has petabytes of crawled 100% uncensored unfiltered kosher goyslop internet data
>>
>>102709506
I can write to your customers for 2 hours a day in exchange of letting me say nigger to a custome once in a while, I choose the costumer btw
>>
>>102709197
finetune an image model and I might click your ads
>>
>>102709506
Post a Miku
>>
>>102709551
>I choose the costumer btw
What did the costumer ever do to you?
>>
>>102709373
Well, there is this one then: https://huggingface.co/datasets/dair-ai/emotion
>>
>>102709551
>I choose the costumer btw
It's still too early for costumes.
>>
>>102709544
CC? Where can I take a look at that?
>>
What's the cheapest platform to rent a 4090?
>>
>>102706800
Curious, what's the learning rate you've tried
>>
>>102709672
vast
>>
Darn, I just saw this: https://openai.com/index/api-prompt-caching/

I guess it's officially over, OpenAI, once and for all, won.
>>
>>102709753
r-rude, I've started dieting...
>>
>>102703736
I also thought it was that guy LMAO
>>
>>102709810
buy an ad
>>
>>102706834
I swear to god I will bomb a hospital if I see another post of someone thinking counting letters on a word is a benchmark for llms
>>
>>102708181
Just train an AI to identify the good ones.
Take the contents of each paper from up to, say, mid 2023 and then pair them with a score based on how much they mattered as of today. Train a model on this and have it predict the score of newer papers to see which will be nothingburgers.

Challenge: Do the same but only the abstract + author list instead of the entire contents and see if the model can learn what people - or what flavor of names - lead to useful research vs trash.
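Bare-bones version of the abstract-only variant (sketch; assumes you've already scraped abstracts and hand-assigned "how much it mattered" scores, which is the actual hard part):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def train_impact_model(abstracts, scores):
    # abstracts: list of strings, scores: list of floats, both assumed prepared
    X_tr, X_te, y_tr, y_te = train_test_split(abstracts, scores, test_size=0.2, random_state=0)
    vec = TfidfVectorizer(max_features=50000, ngram_range=(1, 2), stop_words="english")
    reg = Ridge().fit(vec.fit_transform(X_tr), y_tr)
    print("R^2 on held-out papers:", reg.score(vec.transform(X_te), y_te))
    return vec, reg

def predict_impact(vec, reg, abstract):
    return reg.predict(vec.transform([abstract]))[0]

Swap the abstracts for author lists and you've got the second, spicier experiment.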
>>
>>102710194
Are you in the IDF?
>>
>>102706800
best 12/22b finetune you did? samplers for said finetune?
>>
>>102710239
Underrated
>>
>>102709810
Waow a 50% discount on 10% of my tokerinos.
>>
>>102710679
>>102710679
>>102710679
>>
>>102710227
Man you'd need at least 5-10K samples to make a good classifier, good luck reading through all that shit + trying to figure out if someone did something of it.
>>
>>102710735
That classifier would be worth a billion dollars. Companies could use it to pick out which papers to implement next. It would be like predicting the future.
>>
>>102710735
>look at title of paper
>is it a llama.cpp feature today?
>if yes: "good"
>if no: "shit"
might even be able to automate that tbqh


