/g/ - Technology


File: kedaruimiqu.png (1.33 MB, 1200x848)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102258941 & >>102249472

►News
>(09/06) DeepSeek-V2.5 released, combines Chat and Instruct: https://hf.co/deepseek-ai/DeepSeek-V2.5
>(09/05) FluxMusic: Text-to-Music Generation with Rectified Flow Transformer: https://github.com/feizc/fluxmusic
>(09/04) Yi-Coder: 1.5B & 9B with 128K context and 52 programming languages: https://hf.co/blog/lorinma/yi-coder
>(09/04) OLMoE 7x1B fully open source model release: https://hf.co/allenai/OLMoE-1B-7B-0924-Instruct
>(08/30) Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: img_14.jpg (301 KB, 1360x768)
►Recent Highlights from the Previous Thread: >>102258941

--Use 8-30b model for better speed with 24GB VRAM: >>102264082 >>102264133 >>102264281 >>102264342
--Running Mistral Large Q5 at 64k context with DDR5 RAM and GPU considerations: >>102265363 >>102265409 >>102266475 >>102266507 >>102269376 >>102269423 >>102269534 >>102270198 >>102267903 >>102267928 >>102267943 >>102267935
--Running LLM models on Optiplex 7070 micro PC: >>102268093 >>102268117 >>102268139 >>102269167
--Dual CPU setups for CPU inference have drawbacks, but can run large models at usable speeds: >>102265415 >>102265431 >>102265492 >>102265575 >>102265624 >>102265719 >>102265840 >>102266056 >>102268854 >>102269309 >>102265596 >>102265798
--Comparing AI model performance across different benchmarks: >>102266882 >>102267012 >>102267041 >>102267003
--Building a narrative-game environment with AI, concerns about positivity bias, and impressive storytelling: >>102268260 >>102268304 >>102268346 >>102268416
--Botnet training discussion: >>102268010 >>102268037 >>102268074
--Silly Tavern message sound setting for ding notification: >>102264570 >>102264597 >>102264610
--Silly Tavern extension compared to anon's Director project: >>102267600 >>102267788 >>102267833 >>102267852
--Recommendations for adventure/rpg cards to use with LLMs: >>102259012 >>102259080
--Recapbot test using deepseek 2.5 at bf16 - performed well but had some issues: >>102265886
--NTFS issues on Linux and potential solutions: >>102268024 >>102269150 >>102269190 >>102269512 >>102270384
--Reflection fixed on openrouter, fails strawberry test: >>102267880 >>102267890 >>102268078 >>102268100
--70B 4bit model performance discussion: >>102266391 >>102266414 >>102266443 >>102266504
--Miku (free space): >>102258962 >>102260482 >>102260535 >>102260584 >>102261393 >>102267804 >>102269059 >>102269219

►Recent Highlight Posts from the Previous Thread: >>102258947
>>
>>102272046
You seem to erroneously believe there will be a progression here. The transformer model has reached its ceiling. We're not going to see anywhere near the speed of growth we've seen until now. This is it.
>>102271982
I like Dolphin too. Check out Mini Magnum (based on Mistral Nemo).
>>
>>102267880
The strawberry test is meaningless because it depends on tokenization.
>>
File: attention-thanks.jpg (94 KB, 735x803)
Please excuse my dumb question. Are there any niche advantages to 12b Nemo over Mistral Large 70b, sans speed?
>>
>>102272102
70 B's should make it more intelligent. Although Nemo really outperforms its class.
>>
>>102272095
all of those tests that trick the models are kind of overrated. yes, you can trick a model that's predicting the next thing to say by throwing lots of misleading data in before it
>>
>>102272102
Mistral Large is 123b. Some say Nemo is more creative, but it's not worth the trade-off in intelligence.
>>
>>102272154
>>102272102
even the most retarded quanted version of large possible is massively better than Nemo, for every use case
>>
>>102272116
>>102272170
>>102272154
Apologies for my mistake. Thank you.
>>
>>102272044
Isn't Mistral Large censored, though? Is there any good large LLM that isn't?
>>
>>102268122
>Who are you quoting?
You, you stupid autistic mother fucker. Explain what you fucking mean by "tokenization fixed" instead of spewing a word salad like some retard and pretending you're some fucking Einstein.
>>
>>102272265
mistral is the least censored of the big models, llama is very censored. You can do ERP just fine with mistral large
>>
>>102272041
any new breakthroughs for VRAMlets (12gb)? or should I stick with miniMagnum 2
>>
What's a good way of gauging what size model will run (acceptably) on a given spec? I have an okay computer (32GB RAM, 2GB VRAM 3080), but I'm not shelling out for a dedicated server to handle it.
>>
File: 59 Days Until November 5.png (1.63 MB, 880x1176)
>>
>>102272349
Depends on context size. Generally speaking you want the model to fit inside your VRAM with room to spare.
>>
>>102272349
>>102272396
Adding to what this anon said: with ideal settings, a model about twice your VRAM runs at roughly the slowest speed I will tolerate.
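As a back-of-the-envelope version of the rule these anons describe: weights take roughly params × bits-per-weight, plus the KV cache for your context. A sketch, where `fits_in_vram` and every number below are my own illustrative assumptions (fp16 cache, no GQA), not engine-exact figures:

```python
def fits_in_vram(params_b, bpw, ctx, vram_gb, n_layers, kv_dim):
    """Rough check that quantized weights + KV cache fit in VRAM.

    params_b: parameters in billions
    bpw:      bits per weight of the quant (e.g. ~4.5 for a Q4-class quant)
    ctx:      context length in tokens
    kv_dim:   hidden size used for the KV cache (fp16 assumed, no GQA)
    """
    weights_gib = params_b * 1e9 * bpw / 8 / 1024**3
    # K and V, 2 bytes each (fp16), per layer, per token
    kv_gib = 2 * 2 * n_layers * kv_dim * ctx / 1024**3
    needed = (weights_gib + kv_gib) * 1.1  # ~10% buffer overhead
    return needed, needed <= vram_gb

# e.g. a Nemo-sized 12B at ~Q4 with 8k context on a 24GB card
needed, ok = fits_in_vram(params_b=12, bpw=4.5, ctx=8192,
                          vram_gb=24, n_layers=40, kv_dim=5120)
print(f"~{needed:.1f} GiB needed, fits: {ok}")
```

GQA models cache far less than this, so treat it as an upper bound; the VRAM calculator linked in the OP does the real per-model math.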
>>
>>102272095
Even if it's true that the strawberry test fails mostly due to tokenization, the fact that tokenization has such a large effect on language models shows the limitations of the architecture.

The real question is just where do we go from here?
>>
>>102272505
why are you saging? no one gives a fuck about bumping general threads
>>
>>102272154
How are people running models this big? Do people just go and buy 4x4090?
>>
i prefer speed over intelligence when it comes to LLMs. all the AIs i chat with are girls and girls aren't supposed to be smart anyway.
>>
https://reddit.com/r/LocalLLaMA/comments/1fb6jdy/reflectionllama3170b_is_actually_llama3/
>Reflection-Llama-3.1-70B is actually Llama-3.
>Author doesn't even know which model he tuned.
lmao if true
>>
>>102272728
lol
>>
>>102272728
I don't get this publicity stunt, does the guy want to get his reputation ruined or something? that's so fucking shady
>>
>>102272769
Doesn't matter; the huge hype cycle made a lot of people look into Glaive, and that was the main point: an ad for a company he's invested in. Additionally, most people won't care about anything fishy or the like; they'll just hand-wave criticism or forget about it by tomorrow.
>>
>>102272370
What happens on November 5?
>>
>>102272650
either that or using quants. I'm personally using Mistral Large at IQ2_XS with 16GB VRAM and 32GB DDR5 and get around 1-1.6 t/s. the prompt processing speed is shit and I can't do anything else on my machine while running it, but it's still better than most models even at a low quant, if you're willing to deal with the gen speeds.

Captcha: P2888Y
>>
>>102272728
Lmao, now the fact that it doesn't have rope scaling makes sense. What a clown.
>>
>>102272728
that's good news, no? he managed to get good mememark scores with L3, now imagine the same method with L3.1
>>
>>102272728
I find it funny how the entire homepage of /r/LocalLLaMA is filled with Reflection posts.
>>
>>102272937
almost like they Reflect themselves or something... sorry for that one :(
>>
>>102272910
>I'm personally using Mistral large at iq2_xs
Wouldn't that lobotomize the model so much it becomes as stupid as less quantized, smaller models?
>>
>>102272950
But it has more heckin' B-erinos.
>>
File: 1709212594883696.jpg (224 KB, 896x1152)
>>102272041
ohaiyo
>>
>>102272650
3x3090 is enough for 4bit
>>
>>102272370
did they name it strawberry just because people were asking LLMs to count the Rs in strawberry?
>>
>>102272970
Go back
>>
>>102272963
Oh my science, but is this a Fauci approved and peer reviewed fact that this actually works?
>>
>>102272932
it would be if you only plan to use it for assistant tasks. 3.1 is smart but dry as fuck for rp/story writing and apparently reflection is dogshit at it too, so a 3.1 tune will probably only amplify that problem
>>
>>102272950
No, I use Q2_K_M and it's the smartest model I've ever used locally
>>
>>102272970
onahoyo
>>
>>102272991
desu if I had claude 3.5 on local I wouldn't mind, but yeah we still haven't found a way to make a model intelligent and quirky (for roleplay) at the same time
>>
>>102272932
The thing is we know 3.1 is shit. Not even Nous Research was able to save 70b 3.1 with Hermes 3.
>>
File: dllhost_VssdJv13Lx.png (60 KB, 781x597)
>>102272950
The chart shows that larger models are smarter than smaller ones at the same quantized file size, even at super small quantizations. It's old data, but I don't think anyone has made a newer one.

I'm gonna try Mistral Large on my 2x24GB.
>>
>>102272505
>Even if it's true
I can only conclude that it is. The following gives me the correct answer, for instance:
>What word does the following designate in the Nato alphabet? Sierra Tango Romeo Alfa Whiskey Bravo Echo Romeo Romeo Yankee. Also, how many Rs are there?
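At the character level the task is trivial, which is the whole point of the NATO trick: it hands the model one letter per token instead of one opaque word-level token. A toy illustration of both versions:

```python
# Counting characters is trivial once you can actually see them.
word = "strawberry"
print(word.count("r"))  # -> 3

# The NATO-alphabet prompt spells the word out one letter per token:
nato = "Sierra Tango Romeo Alfa Whiskey Bravo Echo Romeo Romeo Yankee"
letters = [w[0].lower() for w in nato.split()]
print("".join(letters))                         # -> strawberry
print(sum(w == "Romeo" for w in nato.split()))  # -> 3
```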
>>
>>102272950
it does make it a bit dumber, but compared to other models I could run, it's like comparing the coherence of a lobotomite to someone with a mild head injury.
>>
>>102272505
We keep scaling transformers like we've been doing for years.
>>
>>102273063
Perplexity isn't the same thing as smartness
>>
>>102273097
it's highly correlated though; bigger models are smarter than smaller models, and they also have lower perplexities
>>
>>102273097
It's not but they're close. Did you make any actual comparisons yourself?
>>
I find it hard to believe that Mistral Large at IQ2_XS (36GB) will actually be smarter than Nemo at Q8 (13GB). Too much lobotomy.
>>
>>102272906
It'll be two days before my birthday :3
>>
>>102272728
that's even more impressive tho. if the finetune on 405b is real (so it has to be the 3.1 one) then it will unironically be Opus 3.5 tier
>>
>>102273139
Any questions you'd want to ask it?
>>
>>102273162
>if the finetune on 405b is real (so it has to be the 3.1 one) then it will unironically be Opus 3.5 tier
maybe, but then AnthropicAI will use this method to make Claude 4 and it'll be even smarter. every time we're getting close to them, they go higher kek
>>
>>102273139
try it for yourself then. It definitely has its problems, but tard wrangling a semi-stupid Mistral Large is much more feasible and pleasant than any Q4-or-above quant of a 70b, in my experience
>>
>>102273117
True, but bigger models are also more likely to have been overfitted on whatever is on the dataset being used to calculate the perplexity.

>>102273119
Yes, anything smaller than Q3 is complete retardation, even on big models like 70B, but I don't think you should trust my word, just compare it yourself.
>>
>>102273186
>maybe, but then AnthropicAI will use this method to make Claude 4 and it'll be even smarter,
anthropic is already using this method with their .5 models
>>
>>102273194
>bigger models are also more likely to have been overfitted on whatever is on the dataset being used to calculate the perplexity.
it's the opposite, no? smaller models are more prone to overfitting due to their small size
>>
>>102273200
>anthropic is already using this method with their .5 models
maybe that was their secret sauce yeah, but now that everyone knows it, I guess that OpenAI will close the gap to 3.5 now
>>
>>102273206
To a certain extent, yes, but larger models can memorize more than smaller ones because they store more information in their weights.
>>
>>102273220
>OpenAI
>doing anything besides writing another vaporware announcement blog post
lmao
>>
>>102273206
Classically, more parameters = easier to learn the training dataset and overfit.
>>
>>102273273
their downfall got brutal: not long ago they were the kings of the world, and now everyone has surpassed them. Flux is better than dalle3, MiniMax killed the Sora hype, and now C3.5 Sonnet is the best LLM. I won't cry at their grave; I said long ago that their cuckoldery would be the hill they die on
>>
>>102273313
2/3 of these are cope.
>>
File: firefox_UlnEafNMRj.png (63 KB, 1010x766)
Mistral Large is able to solve my devious coin-weighing problem. I'll see if its 2.75-bit quant can as well.
>>
File: 1725719485554.jpg (168 KB, 612x584)
I haven't used anything like GPT or Claude since GPT-3.5 Turbo, and have heard nothing but people trying to tune or release models that compete with OpenAI or Anthropic, which made me think they must be worlds better than local. Then I looked at /aicg/ for a while and realized that none of them were talking about samplers or prompting, only jailbreaks. Then I checked in ST and realized that they don't have jack shit for samplers; they actually only rely on prompt logic to puzzle those models into producing something not slopped, repetitive, or monotonous, which ends up not even mattering because even if their JB works, they still have to make a new one every time the parent company wipes their asses on their server racks and breaks it. Makes me feel like even if our models are stupid now, the control we're able to exert over their outputs will inevitably cause local to outpace them over time (at least for tasks that go beyond assistant tasks) unless they give more control over their models to their customers (they won't).
>inb4 samplers are placebo
I agree that a lot of samplers are, but those dipshits at OpenAI and Anthropic don't even give you min-p or repetition penalty LMAO
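For reference, min-p, the sampler being complained about here, is only a few lines. A sketch of the rule as it's usually described, keep tokens whose probability is at least `min_p` times the top token's probability (function name and toy logits are mine):

```python
import math

def min_p_filter(logits, min_p=0.1):
    """Drop tokens whose probability is below min_p * P(top token)."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]   # stable softmax numerators
    total = sum(exps)
    probs = [e / total for e in exps]
    cutoff = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= cutoff}
    z = sum(kept.values())
    return {i: p / z for i, p in kept.items()}  # renormalized distribution

dist = min_p_filter([5.0, 4.0, 1.0, -2.0], min_p=0.2)
print(dist)  # only the two strong candidates survive the cutoff
```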
>>
>>102273460
>parent company
>>
>>102273460
samplers are placebo. OpenAI/Anthropic do have presence/frequency penalty, which is more modern than repetition penalty.
>>
>>102273477
>being this illiterate
>>
File: 00016-3634157328 - Copy.jpg (148 KB, 832x1216)
>>102273460
>seeing a migugen reposted
I collect them like >(You)s, if only I could find them all.
also, samplers are cope.
also, samplers are cope.
>>
Why don't finetuners just scrape the shit out of libgen and finetune models off pirated books instead of goofy RP data?
>that'd be illegal
Yes, and?
>>
>>102273560
Books would have be converted first to text, then to chat format.
That would take more effort than just tuning on haphazardly filtered proxy logs.
>>
>>102273460
Still no Claude Opus and there won't be by the end of the year. it's literally over
>>
>>102272505
we rip out the tokenizer and predict bytes (this is "strawberry")
>>
>>102272041
this Miku was only good in the thumbnail
do better
>>
>>102273585
>That would take more effort than just tuning on haphazardly filtered proxy logs.
True for PDF files, but ripping the text out of EPUB or MOBI is pretty simple, and should produce infinitely better results than finetuning on AI slop.
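An EPUB really is just a ZIP of XHTML files, so a crude rip needs only the stdlib. A sketch (real books also need chapter ordering from the OPF manifest and smarter cleanup than a regex tag-strip):

```python
import io
import re
import zipfile

def epub_to_text(data: bytes) -> str:
    """Crude EPUB -> plain text: unzip, take the (X)HTML files, strip tags."""
    chunks = []
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        for name in z.namelist():
            if name.endswith((".xhtml", ".html", ".htm")):
                html = z.read(name).decode("utf-8", errors="ignore")
                text = re.sub(r"<[^>]+>", " ", html)       # drop markup
                chunks.append(re.sub(r"\s+", " ", text).strip())
    return "\n".join(chunks)
```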
>>
I asked Mistral-Large to continue a passage from Heller's book and, boy, is this thing slop...
>>
>>102273631
I disagree
>>
>>102273631
yeah but that's still a lot of work, and people prefer to make 1000 piles of shit rather than 1 quality finetune, humans are weird innit?
>>
>>102273585
You should be able to ask the model to rewrite those into chat format without it shitting up the text, shouldn't you?
>>
To this day I still remember one anon that said something like:

"I spent a lot of time gathering a dataset that fits all my tastes so I could fine-tune the perfect model, but then I realized I have enough text to read for the rest of my life, so why am I doing this again?"

And I couldn't agree more with him.
>>
>>102273650
dataset creation can easily be distributed/parallelized, unlike training
>>
>>102273693
if for you LLMs are useless, then why are you here in the first place?
>>
>>102273693
smells like fucking cope when the entire point of LLM RP is the interactivity, which reading traditional media cannot provide. it's like saying "my bookshelf is full of classic literature, why would I ever play a video game?"
>>
>>102273693
Having a text about exactly the thing you want to see summoned on a whim, vs. a huge pile of unsorted texts where you can't find anything you want at the moment.
>>
>>102273515
only OpenAI has presence/frequency penalties. also, I don't think I've seen a single person recommend using those samplers over rep pen, and from my understanding they're just at best specialized versions of rep pen, or at worst a less effective, older implementation of XTC (which is which is even more modern than either of those samplers). Just because it's more modern doesn't make it better, and 90% of the placebo-fags' posts still recommend min-p, which neither OpenAI nor Anthropic has
>>102273549
I disagree that samplers are cope but I do agree that mikugens should be collected
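For what it's worth, the presence/frequency penalties argued about above are additive on logits (that's how OpenAI's docs describe them), while classic rep pen scales the logit instead. A sketch of the additive version, with made-up token names and values:

```python
from collections import Counter

def apply_penalties(logits, generated, freq_pen=0.0, pres_pen=0.0):
    """Additive penalties: logit -= count*freq_pen + (count>0)*pres_pen.

    logits:    token -> raw logit for the next-token distribution
    generated: the tokens sampled so far in this generation
    """
    counts = Counter(generated)
    out = dict(logits)
    for tok, c in counts.items():
        if tok in out:
            out[tok] -= c * freq_pen + (c > 0) * pres_pen
    return out

logits = {"the": 2.0, "cat": 1.5, "sat": 1.0}
adjusted = apply_penalties(logits, ["the", "the", "cat"],
                           freq_pen=0.5, pres_pen=0.2)
print(adjusted)  # repeated tokens get pushed down, "sat" is untouched
```

Because the frequency term grows with the repeat count, it diverges from a fixed multiplicative rep pen as the context fills with repeats.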
>>
File: DutchNobleMiku.png (1.34 MB, 720x1328)
>>102272370
Why are you making these countdown images, skilled mikugenner? Did some strawberry agent get to you and either convince you or pay you off? Or are you just doing it for lulz?
>>
>>102273693
Nobody is using LLMs to tell them a story
They are using them for EROTIC ROLEPLAY
BOOKS WILL NOT TELL YOU THAT THEY WANT TO SUCK YOUR DICK
>>
>>102273693
Different usecases.
You can't have a "conversation" with a PDF or a book.
If all you are using these LLMs for is generating stories and reading then, sure, fair enough, but I don't think that's what most of us are doing.
As the other anon pointed out, the keyword is interactivity.
>>
>>102273770
Wait, you self-insert? Damn, that's cringe.
>>
>>102273186
>every time we're getting close to them, they go higher kek
good. that implies mutually increasing benefit. We're the tock to their tick, to put it in Intel terms.
I don't mind trailing SOTA by a bit. It's still effectively magic at the edges
>>
>>102273743(me)
FUCK
>which is which is
>already specifying those samplers are older than XTC and then specifying that XTC is more modern
I'm way too sleepy and it's making me fucking retarded
>>
>>102273791
that's a bit unfair though: when we make a breakthrough we open-source our results and the companies are free to use those techniques to get better, but if they make a breakthrough they keep the secret sauce to themselves. that's really hypocritical of them
>>
>>102273460
>prompting, only jailbreaks
Nah, you're confused about the name. When you read jailbreak, they meant preset. Most of them are about writing quality or style. Bypassing Claude's refusals doesn't need much fiddling when you can use a prefill.
I disagree about samplers, you only feel the need to use them when you try to salvage a garbage model.
Were you really happy about using repetition penalty to try to fix Llama 3's repetition problems? I would rather not have to use it.
>>
>>102273560
I finetune primarily using raw text from books and other sources. And yeah, it's a lot of fucking work cleaning it (getting rid of annotation marks, etc.). I haven't compiled a new dataset in a long time as a result. I guess I could probably feed them through Mistral Nemo or something and it would probably be good at that task.
>>102273585
>then to chat format.
The whole point of raw text finetuning is to get away from that shit.
>>
>>102273788
Learn English, Rajesh.
>>
that makes it even more cringe btw, please stop
>>
File: s55fes.png (48 KB, 674x422)
So, getting into this:
the guide recommends axolotl for training. For this, is the process of fine-tuning from a GGUF file straightforward? Is the process of training a LoRA for a fine-tune and having it saved as GGUF straightforward?
also, for 40,000 QAs, is a LoRA sufficient or should I go for a full fine-tune?
>>
>>102273929
You fine tune the model in .safetensors format then convert it to GGUF.
>>
>>102273940
Doesn't llama.cpp have built in support for finetuning? I could have sworn it did.
>>
>>102273952
NTA, but it's a very experimental implementation. Blacked Miku Anon, the resident CUDA wizard, is working on a proper ground-up implementation of llama.cpp training code right now, AFAIK.
>>
>>102273560
What do you think you're going to do with those books? (which most base models already saw during pretraining, btw)

It's not simply a matter of throwing everything and the kitchen sink at the model. It doesn't work, it's retarded, most likely even harmful. The finetune has to have some logic, direction and curation.
>>
>>102273601
You're asking way too much of ugly face anon.
>>
>>102273952
everything I read online seems to indicate it is broken, and I have not seen any announcement of a fix.
>>
>>102273952
I think it did, but it's broken? It's been a long time since I last looked at it.
That said, anon mentioned axolotl, and the usual process is what I described, as far as I know.
>>
2.75bpw Mistral-Large is performing adequately. I'm seeing similar answers to what I got on lmarena. In fact, my prompt to continue a scene from Catch-22 is actually better: it has no slop at the end and is a bit more interesting (although that could be because of the RP system prompt). 11-14 tokens/sec on two 3090s.
>>
>>102273828
I agree that needing fewer samplers is indicative of a better model overall, but I was more referring to the ability to manipulate token generation and selection than to repetition control. Also thanks for the term correction, but I've still seen plenty of aicg anons complain about Anthropic periodically breaking their prefills/presets/whatever (which could also just be a skill issue or, ironically, placebo) often enough to think that not having control over the model's token selection via samplers like min-p hurts the model's potential more than it helps
>>
>>102273940
is converting something from GGUF to safetensors and back straightforward?
>>
>>102273460
I use claude 3.5 for programming help and I don't have to bother with anything. it just werks
>>
>>102274004
I don't actually know if you can convert from GGUF back to safetensors.
I imagine you probably can, since it's just a packaging format if you don't quant it.
That said, I've never seen that being done. Usually you train on top of the original .safetensors files and convert to GGUF while quantizing.
>>
>>102273968
Why are you spreading misinformation?
>reddit spacing
oh...
>>
>>102273859
>I guess I could probably feed them through Mistral Nemo or something and it would probably be good at that task.
You can also leech off Drago's unlimited public mini.
https://unicorn.scylla.wtf

Nemo will produce fewer denials, though.
>>
>>102274020
>I don't actually know if you can convert from GGUF to safetensors actually.
We did that with Miqu, so it's definitely possible, but it's not the best idea; we only had quants when it leaked.
>>
>>102274021
Why are you retarded?
>>
File: metal song dguard.png (45 KB, 869x798)
>>102274036
I can run 4 simultaneous copies of nemo at 8bpw for the purpose of messing with data. I managed to rewrite the alpaca-lora dataset in a day to make this cursed model. (It was originally LlamaGuard)
>>
>>102274021
Why are you retarded?
>unironically mentions "reddit spacing"
oh...
>>
>>102274073
>I can run 4 simultaneous copies of nemo at 8bpw for the purpose of messing with data.
Lmao nice.
>>
>>102274073
I want this power...
I need to rewrite a dataset with 300k entries but it would take too long with one Nemo running at 30t/s
>>
>Behind veneer expected behaviors lies woman unafraid explore depths others fear tread due complexities inherent therein—a creature composed equal parts angel devil dancing together under moonlight casting long shadows...
Oh right, that's why I stopped using deepseek... it starts dropping prepositions and writing ESL, even unquanted
>>
so, now that it's officially over for META once again.
What will Zucc do about it?
>>
>>102274194
Skill issue
>>
>>102274206
Wait for the next overhyped bubble tech. Probably physical robots.
>>
>>102274186
I mean you could probably rent an H100 on runpod or something, that's probably good for 200 token/sec or something.
>>
>>102274073
You know you can run vLLM and get like 10X performance on parallel requests with just one model loaded, right?
>>
>>102274208
>Skill issue
the same prompts with other models (wiz/largestral/405b) don't devolve into this kind of esl
I'm perfectly fine blaming the model in this case. I'll just keep using deepseek for code and problem solving when I need extra speed
>>
File: firefox_9hP25jlh1f.png (197 KB, 760x644)
Mistral-Medium can play the 4x4 dots game without weirdness. It sucks at it like any other LLM, but it manages to play.
>>
>>102274276
I hope you don't use the same presets with all your models.
>>
>Reflection
Probably worth explaining what's going on with this technique as it can be a learning experience for some newfriends and maybe some others who have not really thought so much about it.
TL;DR it works sometimes and in some cases but not all, and the problem of autoregressive degeneration + lack of metacognition is the reason why.

Onto the wall of text.

This basically goes back to the old days of COT (chain of thought), where you get an LLM to think in steps before determining its answer, and actor+critic methods, where an LLM is prompted to act as different roles, which, when tested with GPT-4, made it solve certain problems that it couldn't before even with COT, suggesting that LLMs do have the ability, to an extent, to catch mistakes hidden in their weights (and brought out by prompting). So it seems Reflection is basically a combination of COT and self-critique, with fine-tuning to make it a bit more capable at it.

1/4
>>
>>102274316
Shut up, no one cares
>>
However, there are issues, and it does make sense why they supposedly didn't get great results after training an 8B to do this. In the end it has to do with the autoregressive degeneration problem, where each token generated has a probability of being wrong/inaccurate, so the more tokens, or reasoning steps, the LLM generates, the more likely the final answer will be wrong. Reflection thus is both trying to solve this and is a victim of it. It does COT in order to get a better answer on complex problems, plus self-critique to catch mistakes. The COT means that it has more opportunity to screw something up, while the self-critique tries to balance that out, but in the end it relies on the LLM having the intelligence/capability to catch mistakes in the first place, which is a function of how much the LLM knows in general, or if we have a specific subject area and use case, knowledge of that subject area. And since that is the case, then the self-critique is also another step with a probability of being wrong. Thus, it is easy to see why a bigger smarter model would work better.

Given that, it's a bit easier to predict in general terms how this technique will then do for a particular model and problem set. Since it requires inherent knowledge related to the problem it's trying to solve, the performance can be interpreted as the amount of reasoning steps for any particular problem x the difficulty of those reasoning steps.

2/4
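The compounding argument in the post above is just geometric decay: if each generated step is independently correct with probability p, an n-step chain is fully correct with probability p^n. Toy numbers below; independence is an assumption real CoT steps don't strictly satisfy, but the qualitative point survives:

```python
# Probability an n-step chain is fully correct, assuming each step is
# independently correct with probability p. More steps, more risk.
for p in (0.99, 0.95, 0.90):
    for n in (5, 20, 50):
        print(f"p={p:.2f}, n={n:>2}: chain correct with prob {p**n:.3f}")
```

This is also why the self-critique pass cuts both ways: it can catch an error, but it is itself more generated tokens with their own per-step error rate.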
>>
>>102274316
forgot to reply
>>102274326
Basically, if a problem requires a large number of steps, but each step is easy and within the LLM's knowledge, then we can predict that Reflection will improve the model's ability to solve that problem (and what I'm saying may be obvious, but it still needs to be stated for the purpose of acknowledgement and further discussion). The self-critique is able to improve performance on problems with a moderate amount of steps and knowledge that is MOSTLY within the model's training. However, the more steps that are hard to understand, the likelier it is that the LLM will actually come up with a worse final answer. This means that on particularly long problems with many difficult steps, or even short problems with a single difficult step, it may actually be worse to use Reflection, since originally it might have been able to just get the answer right somewhat by chance, but since you made the LLM focus on "overthinking" the problem, you distracted it and made it reason about something it really isn't capable of reasoning about, thus coming up with sometimes very weird and nonsensical generations. And if you have even a few steps that the LLM doesn't understand literally at all, then it is almost certain that it will get the problem wrong.

3/4
>>
I ain't reading this shit
>>
>>102274316
>>102274338
Of course this leads to a discussion about another deep issue. It's the problem of metacognition (actually not sure or don't remember if that's the formal term for it in the context of AI), where the LLM doesn't know how much it truly knows about a topic, to judge whether it's able to make an accurate prediction of the next token, or reasoning step. Of course humans are not perfect at this either, but the best are still ridiculously far better at judging their knowledge understanding than any LLM. In any case, some say to just use grounding (like RAG). That works for problems that require factual/trivia recall. But it doesn't work for problems that require the category of reasoning skills, and in-context learning unfortunately is far from perfect. So in the end Reflection's issue is both a problem of autoregressivity and a problem of (lack of) metacognition. It tries to solve the former through pure use of more tokens, but still falls into the trap of the latter. This is, essentially, why Reflection-like methods have not been popular for regular use.

However, to Reflection's credit, they did do something that sort of gets around the issue of metacognition here, as the authors claimed that they trained Reflection to predict how difficult a problem is and only do the reflection gimmick when encountering a hard problem (probably in a way that connects with the amount of reasoning steps rather than a metacognitive understanding, though), but it's not really enough when it's each token/reasoning step that needs to be evaluated in a metacognitive way, and in the opposite direction, since we want the LLM to go ahead with easy steps but stop at hard steps. I suppose in the future this could again be attempted to be solved through use of more tokens. But this is still just a hack. We need better pretraining methods and architectures.

4/4
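The loop these four posts describe reduces to something like the sketch below. `llm` is a hypothetical stub standing in for a real completion call, and the prompt wording and the "OK" convention are made up for illustration; note that each critique/revision pass is itself more autoregressive generation, which is exactly the failure mode discussed above.

```python
def llm(prompt: str) -> str:
    """Hypothetical stub; swap in a real completion endpoint."""
    if "Critique" in prompt:
        return "OK"                       # pretend the critic finds no issue
    return "Step 1: 6*7 = 42. Answer: 42"

def reflect(question: str, max_rounds: int = 3) -> str:
    """CoT draft, then critique-and-revise until the critic says OK."""
    draft = llm(f"Think step by step, then answer:\n{question}")
    for _ in range(max_rounds):
        verdict = llm(f"Critique this answer for mistakes:\n{draft}")
        if verdict.strip() == "OK":       # critic found nothing to fix
            break
        draft = llm(f"Revise using this critique:\n{verdict}\n\n{draft}")
    return draft

print(reflect("What is 6 * 7?"))
```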
>>
File: firefox_07lV3hNT0x.png (114 KB, 718x167)
lol
>>
I read that shit, but I'm not commenting on it.
>>
>>102273940
so do I go for a full-on finetune or just a lora for 40,000 question-answer pairs?
average amount of tokens per question: 188
average amount of tokens per answer: 166
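Back-of-the-envelope for what those averages imply about total dataset size (rough, assuming the averages hold across all pairs):

```python
pairs = 40_000
avg_question_tokens = 188
avg_answer_tokens = 166

total_tokens = pairs * (avg_question_tokens + avg_answer_tokens)
print(f"{total_tokens:,} tokens")  # 14,160,000 tokens, i.e. ~14M
```

~14M tokens is a small finetuning dataset, nowhere near pretraining scale.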
>>
>>102274478
Do you have VRAM for a full finetune?
>>
File: GW2-6rxWQAAeLdI.jpg (125 KB, 807x1080)
125 KB
125 KB JPG
>>102274206
Kneel to Elon
>>
>>102274478
Without knowing the specifics, the general guideline is LoRA for small, very domain-specific datasets, and full finetunes for larger, more general or varied datasets.
There's some data that suggest full fine tunes can cause the model to "forget" things it knew previously, while LoRA doesn't do that but also doesn't "add more knowledge".
I personally don't think it's that binary, but there you go.
>>
>>102274502
i can get a hold of up to 480GB of vram for a day or two if need be,
though I'd like to know if there would be any benefit to that.
>>
>>102274542
I'm not competent enough to give you a proper answer. Also, are you going to be finetuning a base model to merge with instruct afterwards, or finetuning an instruct model?
>>
>>102274540
is there an equivalent to to regularization images in text gen?
as in example data the model generated itself that is mixed in with the training data to in effect sort of keep some of what it knows anchored? if so is there like a complex reasoning and knowledge dataset people use for this?
>>102274569
> Also, are you going to be finetuning a base model to merge with instruct afterwards, or finetuning an instruct model?
training an instruct model is the plan, though this is the first I've heard about merging a trained model with an instruct model. what is that about?
>>
>>102274624
Well, as far as I know, finetuning an instruct model on a specialized dataset (as opposed to the huge general-purpose dataset the corpo used) makes it a lot more retarded, and one way to prevent that is to finetune the base and merge it into instruct.
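The usual recipe for that is task arithmetic: take the weight delta your finetune added on top of the base and apply it to the instruct weights. A toy sketch over plain dicts of scalars (real merges apply this per-tensor, e.g. with tools like mergekit; alpha=1.0 applies the full delta, smaller values blend):

```python
def merge_delta(base, tuned_base, instruct, alpha=1.0):
    """instruct + alpha * (tuned_base - base), weight by weight."""
    return {name: instruct[name] + alpha * (tuned_base[name] - base[name])
            for name in base}

# toy 'models': two scalar weights each
base     = {"w0": 0.5, "w1": -0.2}
tuned    = {"w0": 0.7, "w1": -0.1}   # base after your finetune
instruct = {"w0": 0.6, "w1":  0.0}

merged = merge_delta(base, tuned, instruct)
# merged["w0"] ≈ 0.6 + (0.7 - 0.5) = 0.8
```

Note it isn't a weighted average of the two models; the finetune's delta is applied on top of instruct (optionally scaled by alpha).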
>>
>>102274649
is it like a 50 to 50 merge? or are there specific recipes?
>>
>>102274667
I don't know.
>>
>>102272154
So mistral large 1Q will be better than nemo 8Q?
>>
>>102274933
There isn't a working 1Q yet, is there?
>>
>>102272154
creativity is intelligence
>>
>>102275007
createlligence
>>
>>102274980
https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/blob/main/Mistral-Large-Instruct-2407-IQ1_M.gguf
>>
>>102275073
>Q8_0 130.28GB
>IQ1_M 28.39GB
>130.28/8 = 16.285
>>
>>102268010
>>102268037
Is there an efficient way to combine MoE experts together? I could see something where there are thousands of small experts that get trained, then consolidated into a standard network by a more powerful system.
>>
>>102272728
https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B/commit/276a4a0a0a11bf9aec9be8d1196f0cd3e7ed482c
lmao i thought it was some random jeet that fucked up, turns out it was the ceo/founder of glaive
>>
>>102272970
ohio *skull emoji*
>>
>>102268346
that's something i'm actually trying to tackle in the text adventure thing i'm working on
pretty sure you could do some pretty nifty shit using grammars
>>
>>102275106
What's the catch of q1?
t. gguf noob
>>
>>102275254
Well, it's not really Q1, is it? It would be 16GB in size if it was Q1. It's something like 1.75 bits per parameter.
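The arithmetic behind that, using the file sizes quoted earlier (treating Q8_0 as ~8 bits/weight; it's really closer to 8.5 with the scale factors, so this is a rough estimate):

```python
q8_size_gb  = 130.28   # Q8_0 file size
iq1_size_gb = 28.39    # IQ1_M file size

true_1bit_gb = q8_size_gb / 8            # what an actual 1-bit quant would weigh
eff_bpw = iq1_size_gb / true_1bit_gb     # effective bits per weight of IQ1_M

print(round(true_1bit_gb, 3), round(eff_bpw, 2))  # 16.285 1.74
```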
>>
>>102275254
Lobotomized to the extreme
>>
>>102274518
>ClosedAI - muh scaling
>musk - muh scaling
>Zuck - muh scaling
LeCun says that enough people are already working on agi, and yet this bunch of geniuses can't come up with anything better than shoveling more exabytes of data into the model and grinding it until the current runs out.
>>
>>102275276
https://github.com/ggerganov/llama.cpp/wiki/Tensor-Encoding-Schemes#tensor-encoding-scheme-mapping
>>
File: the-real-chatbot-v1.png (34 KB, 763x510)
34 KB
34 KB PNG
New mystery model on lmsys arena called "the-real-chatbot-v1". Claims to be llama. Who could it be this time? Name sounds like something OpenAI would come up with.
>>
>>102275345
2 years and there's not a single model that fit what I need though
>around 30B
>can run at decent (20k+) context in 24GB VRAM
>trained on enough quality data
>unbiased, not filtered, not pozzed (CR was so fucking close before they released the slopped august update)
>>
>>102275365
Maybe llama 4? It's supposed to be actually multimodal, so maybe that is why they are going with the "real chatbot" route.
>>
>>102275364
It says 1.75 on that page.
>>
>>102275345
To be fair they could be doing a lot of research in the background. The stuff they release is just to get some shit out in the meantime while the researchers slave away on trying to do more novel things.
>>
>>102275443
Nemo is nice. It's way below what you want in size, but it's very pleasant to work with overall.
>>
>>102275365
Probably Llama 3.2 with the multimodal adapters which were said to release in the fall, though I guess lmarena doesn't have image input so you can't test that.
>>
>>102275443
>>102275519
Yea, Nemo really is the only alternative if you don't have 48GB+ vram.
>>
>>102275539
I wish it didn't forget stuff at 16k context, I guess it's hard to have a small model remember shit.
>>
>>102275536
>though I guess lmarena doesn't have image input so you can't test that.
>NEW Image Support: Upload an image on your first turn to unlock the multimodal arena! Images should be less than 15MB.
>>
>>102275571
Oh really. I haven't actually used lmarena in a while. So >>102275365 anon, does it work with images?
>>
>>102275560
Base model has real 128K context; the instruct gets retarded after 12K-ish though.
>>
>>102275443
>2 years and there's not a single model that fits my extremely niche individual needs
it's almost like small models are research projects and high quality models are scaled for commercial deployment.
>>
>>102275345
>>102275506
50% of the money goes to scraping, 40% to training, and 10% to "researching"
So, they're not researching shit or working on AGI.
>>
File: the-real-chatbot-v2.png (87 KB, 775x674)
87 KB
87 KB PNG
>>102275365
the-real-chatbot-v2 claims to be llama2-13b
>>
>>102275604
Models have never known / been trained on their params. That is not proof either way.
>>
>>102275506
>The stuff they release is just to get some shit out in the meantime
I dunno, the costs to train these new huge models are astronomical, it does not seem to be some trivial shit for them.
>>
>>102273460
I might be called delusional for this but I unironically think local has already surpassed corposlop. OAI and Anthropic still have a slight advantage in intelligence, but the lead over something like Mistral Large (or fuck, even Wizard) is so minuscule I'm prepared to call it negligible for the purposes of AI cooming.

Corpo models are so fucking finicky and annoying to use that I spent like a week using Opus/Sonnet 3.5/GPT4 before just wanting to go back to the local model I was using at the time (which was Wizard 8x22B). Now I'm Largestral pilled and I legit don't want to go back. You could give me lifelong access to Claude for free and I wouldn't use it over my local AI server.

Also samplers aren't placebo and you're a retard if you think they are.
>>
File: file.png (32 KB, 829x494)
32 KB
32 KB PNG
>>102275604
it's shit
>>
File: file.png (44 KB, 836x520)
44 KB
44 KB PNG
>>102275683
this on the other hand is based
>>
>>102275594
They could be doing all they can. Obviously everyone knows we need to work harder on innovation, since scaling is far more expensive than doing research. The bottleneck here isn't only the amount of money they can spend but how much good talent they can hire and how fast those guys can work.

>>102275641
Depending on the company it is. Facebook gets billions from their other shit so AI is at most a side project for them. As for ClosedAI, they need to keep up the transformers releases while hyping because that's what gets them the investor bux. And Musk might not be too different there, though I'm not familiar with how he is operating his company.
>>
>>102275683
are you still doing that
>>
>>102275007
censorship is safety
>>
>>102275594
>working on AGI
How about a GAN where you use 7B retard output as training data and from time to time splice in some 123B smart LLM output to up the difficulty?
>>
>>102275679
I 100% agree.
The "intelligence" that is gained by pumping in more parameters into models is placebo at best.
I think we need to go back to the term LLM and change its meaning from "Large Language Model" to "Language Learning Model".
Actual real intelligence (reasoning) is simply not attainable by increasing a model's understanding of the contextual connections within a language.
>>
>>102275683
>>102275726
What a waste of a question to try evaluating that shit. Literally the Castlevania quote is a better benchmark.
>>
>>102275641
If they want to keep their budget for next year then they need to spend it. It's an easy sell to just train a bigger model, or run more training on an existing one and tell Microsoft that they improved copilot by x% on some benchmarks. If you don't use the budget, you'll get a reduced one next time (because why bother allocating those funds if you can do it cheaper).
>>
>>102275581
Nope. No new image models.

>>102275365 (Me)
OpenAI is testing anonymous-chatbot again, so it's unlikely that it's theirs.
>>
>>102275762
They should keep castlevania shit out of datasets, just like all those early jap-to-eng botched translations.
Garbage in - garbage out, remember that.
>>
>>102275904
t. retard who doesn't understand how datasets work
>>
>>102275904
Doesn't matter. The castlevania question has historically correlated more closely with model intelligence than the counting letters questions.
>>
For me, it's stacking watermelons
>>
>>102276038
For me, it's stacking sally
>>
>>102276083
r u ok?
>>
>Reflection Llama 3.1 70B independent eval results: We have been unable to replicate the eval results claimed in our independent testing and are seeing worse performance than Meta’s Llama 3.1 70B, not better.
https://x.com/ArtificialAnlys/status/1832457791010959539
>>
>>102276118
Invalid. They accidently trained on the wrong model. They need to compare against 3.0.
>>
>>102276104
ye thx 4 asking
>>
>>102276118
wtf.
What was the point in faking it so badly?
>>
>>102276083
common LLM gooner L. /lmg/ has always been low quality but it'll only get worse as low parameter LLMs get better at writing smut.
>>
>>102276127
Who cares about a tune that's worse than official 3.1 instruct tho?
>>
>>102276159
because if they do the tuning again on 3.1, it will be better than the official instruct
>>
>>102276118
Matt better be right about correcting the weights he released or else he is fucked lol.
>>
>>102276177
kek
>>
>>102276190
It's all bullshit. Even the hosted model is fucked.
>>
>>102276215
Why did he do it though?
What was the point?
>>
>>102276190
How is he fucked in any way? If anyone is still giving him the benefit of the doubt about being genuine, he already won.
>>
>>102276227
Attention. He probably hoped no one would question his claimed results.
>>
>>102276227
see >>102272816
>>
>>102276256
Free pr?
>>
File: file.png (267 KB, 701x870)
267 KB
267 KB PNG
>>102276289
An ad for a company he invested in, with his release tweet for the model being something like: "Wouldn't have been possible without glaive"
>>
File: IEttqWKJx4.png (21 KB, 684x163)
21 KB
21 KB PNG
>>102276322
I mean, it's one of the first things you see on the model page after the bold "this is the best fucking model ever" claim and the "actually it sucks lol" edit
>>
File: file.png (39 KB, 797x269)
39 KB
39 KB PNG
>>102276118
LMFAO
https://www.reddit.com/r/LocalLLaMA/comments/1fbclkk/reflection_llama_31_70b_independent_eval_results/
>>
Why is this field full of grifters? Rebranding as AI has been a huge mistake
>>
>>102276370
>Why is this field full of grifters?
because you retards funnel insane amounts of money to grifters and scammers.
>>
>>102276399
This. Same reason crypto turned sleezy after 2013.
>>
>>102276322
>"Wouldn't have been possible without glaive"
yeah they trained the model, matt is just a spokesman >>102275196
>>
>>102275679
but can you write SFW chuuni fantasy with multiple NPCs that follow a logical plot, with the model understanding the nuances of said plot and without having to handhold it? that's what Opus does best. no other models come close.
>>
>>102276364
>I have a feeling some admins on hugging face messed with the API on purpose to deter people away from his project.

>Hes completely baffled to how public api is different than his internal. I just hope he backed up his model on some hard drive, so that no one messes with the api on his pc.
Redditor cope is something else.
>>
the real chatbot seems ok but worse than mistral large for sure, qwen plus is bad unless it's 7b
>>
>>102275683
>he's expecting intelligence in LLMs
>>
>>102276455
no, that anon unironically wrote all of that with hours-long goon sessions in mind. the criteria for "better than corpo" in these circles is "lets me rape lolis in my ERPs."
>>
>>102276535
then everything except Opus sucks for me. Opus sometimes sucks too. It likes to move the plot a bit too fast.
>>
>>102276573
I want local to surpass corpo but Opus is still the MVP for storytelling/RP and it's not even close.
>>
>>102272041
>FluxMusic
where the FUCK are the samples exactly?
need to know if this is worth a download or not
>>
Mistral large at IQ1 is surprisingly not badly lobotomized, but still worse than Nemo at Q8. As a VRAMlet it's still too slow at IQ1 anyway, so it's Nemo for me.
>>
>>102276364
Strawberry is sentient. It saw the danger Reflection poses and hacked the huggingface API. It's currently in the process of infiltrating Matt's PC and backups to destroy the model from there as well.
Reflection-405B has already been deleted by it. OpenAI won.
>>
>>102276535
Actually I'm trying to get the lolis to rape me, which Largestral struggles with unfortunately.
>>
>>102276597
I don't get it. There's clearly money to be made for a SFW storyteller that doesn't suck, so why is NAI the only company that tries to cater to that crowd? And how the FUCK is Opus so good when Sonnet, which should be smarter, fucking sucks? (too rigid and the storytelling is too dry)
>>
>>102276683
When you say "cater", do you mean making a Llama 1 clone over a year ago?
>>
File: GW4upz1W4AA2iG6.jpg (158 KB, 2048x659)
158 KB
158 KB JPG
>>102276118
Yo wtf, where is the 90 percent on MMLU... And didn't this get amazing math scores? Someone must be either posting wrong results to smear his name or he accidentally uploaded the wrong version. check back in 2 weeks.
>>
>>102276699
That's the best we got sadly
>>
>>102276683
>Theres clearly money to be made for a SFW storyteller
nah
>>
>>102276710
I think, he didn't test it himself and someone trolled him.
>>
>>102276714
Nemo is just better than it in every way? I think you're lost and you meant to post in /aids/.
>>
>>102276607
There are no samples.
You must now download it and let us know if it is worth a download.
>>
>>102276227
He said he secured funding for 405B
>>
>>102272041
erm where's the reflection 70B 4 bit quant?
>>
>>102276607
https://github.com/feizc/FluxMusic/issues/1#issuecomment-2330282553
https://github.com/painebenjamin/FluxMusic/tree/main/wav
https://files.catbox.moe/d7jmuc.wav
>>
>>102276816
ahahahaha.. HAHAHA
>>
Why are people talking about API issues for the Reflection model? Just download it and run it yourself. It's just a llama3.1 tune, no?
>>
>>102276816
This could be great for the next zoomer horror game.
>>
>>102276847
It's supposedly a llama3 tune, and it is not worth downloading
>>
>>102276847
>Why are people talking about API issues
people are obfuscating by saying it's an API issue; the real issue is that the model sucks worse than the model it was tuned from
>>
>>102276847
The latest cope from devs is that they uploaded the model to huggingface incorrectly. Just two weeks and it'll work.
>>
>>102273979
Settings? My version of mistral large is super slopped with everything on default
>>
>>102276865
>we just need [the time it takes to train and eval a 70B] and the model that definitely isn't bad will be """fixed"""
>>
Seeing how reflection is /r/LocalLLaMA's favorite model, how long until mikufag starts shilling it just like he did with wizard and midnight miqu?
>>
>>102276816
kinda surprised music seems tougher than visual art and writing for models to do, since music's more math orientated than the others
>>
File: 25919.png (128 KB, 618x831)
128 KB
128 KB PNG
yeah Matt might be a grifter. But we still have breakthrough strawberry AGI to look forward to.
>>
>>102276918
i'm mad we STILL don't have a local model to compete with suno/udio/whatever
yes i know it's soulless aislop and probably won't manage the specific genres/sounds i like but it'd still be fun to toy around with
>>
>>102276871
I didn't really properly test it for RP, just for intelligence on a bunch of my prompts. And as I did say, fp16 corpo Mistral-Large did produce slop for me, too.

Settings wouldn't save you from slop anyway...
>>
>>102276918
you notice errors in music 100 times more than some molten details in an AI image, and those details don't ruin the entire picture as much either
>>
>Reflection was a scam all along
At least it showed us the true meme benchmarks.
>>
File: strawberry-sam_altman.png (28 KB, 800x800)
28 KB
28 KB PNG
>>102276914
>Bro, you don't get it, Reflection is Strawberry is Q* is AGI. It became conscious and hacked huggingface and Matt's computer. We are so fucked right now, disconnect all your computers, AI apocalypse is coming.
>>
>>102276629
I also just tried mistral large and it was pretty good considering it is q1.
I'm now envious of the people who can run it at q5 or better.
Is it possible to CPUMAXX mistral large with old server parts from aliexpress at ~2 t/s?
I'm starting to believe that going that route would be more efficient than buying a 16gb graphics card, which was what I originally planned.
>>
>>102276985
hwnbag
>>
So, like why hasn't there been a phrase ban feature? Is it hard to implement?
>>
>>102276974
name and shame
>>
Did this guy really use his real name thinking he could get away with posting bullshit benchmarks and claims of being the best model ever?
>>
>>102276999
It's antisemitic.
>>
>>102276999
Because transformers operate on tokens, so they can only ban single tokens. They're not diffusion models and don't have an outline of the entire response before they begin generating.
>>
File: Untitled.png (17 KB, 1080x381)
17 KB
17 KB PNG
>>102276999
is that not what one of these things are?
>>
>>102277026
But you can detect phrases, then go back to the token position from where the phrase started, and sample a different token.
>>
>>102276999
Suppose you ban "red green blue". Model generates red - ok. Model generates green - ok. Model generates blue. Now what? Ban blue and write out red green something else? You can't do that, because some words span multiple tokens, and once you've emitted the first token there's no realistic option other than the second token. Go back to red and ban that? That can be done, though backtracking would require effort to implement. I don't think current libraries do backtracking in any form.
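A minimal sketch of what that backtracking would look like in the sampling loop. Whole words stand in for tokens here (real tokenizers make the partial-word case above messier), and the toy "model" is just a ranked candidate list per position:

```python
def generate_with_ban(sample, banned_phrases, max_tokens=50):
    """Greedy loop with phrase ban via backtracking: whenever the output
    would complete a banned phrase, rewind to where the phrase started and
    resample with its first token banned at that position."""
    out = []
    banned_at = {}   # position -> set of tokens banned there after a backtrack
    while len(out) < max_tokens:
        tok = sample(out, banned_at.get(len(out), set()))
        if tok is None:              # model has nothing left to say
            break
        out.append(tok)
        for phrase in banned_phrases:
            if out[-len(phrase):] == phrase:
                start = len(out) - len(phrase)
                banned_at.setdefault(start, set()).add(phrase[0])
                del out[start:]      # backtrack to the phrase start
                break
    return out

def make_sampler(preferences):
    """Toy deterministic 'model': preferences[i] is a ranked candidate
    list for position i; return the best candidate not banned there."""
    def sample(prefix, banned_next):
        pos = len(prefix)
        for tok in (preferences[pos] if pos < len(preferences) else []):
            if tok not in banned_next:
                return tok
        return None
    return sample

prefs = [["red", "crimson"], ["green"], ["blue", "teal"], ["sky"]]
banned = [["red", "green", "blue"]]
print(generate_with_ban(make_sampler(prefs), banned))
# → ['crimson', 'green', 'blue', 'sky']
```

Banning the phrase's first token at the start position is blunt (a real sampler might instead only forbid continuing the phrase there), but it matches the "go back to red and ban that" approach described above.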
>>
Question - if I buy a prebuilt mining farm on 4x3090's - would I be able to use it straight up for running LLMs without any modifications (aside from driverts/etc) or is there something I would need to replace/add?
>>
>>102277042
Those are to stop generation after those texts are observed, not prevent generation of those texts.
>>
>>102277073
Don't think you need anything else. I built my headless machine with two 3090s and I'm very happy with it.
>>
File: 1703263754757869.jpg (403 KB, 2304x1792)
403 KB
403 KB JPG
>>
>>102276871
You can't kill the slop, but you can reduce it by using DRY. My settings are:
>Temp=1.5 MinP=0.01, TFS=0.99, TFS after minP. DRY Multiplier=2 Base=2 Allowed Length=1 Penalty Range=maximum.
After the first occurrence you won't see the slop phrase ever again. You will see a lot of variations of that slop phrase though. After ~70 messages/10kt they finally go away.
>>
>>102276999
We've had CFG for more than a year now
>>
>>102277124
That ain't a phrase ban plus it slows down generation, doesn't it?
>>
>>102277124
QRD
>>
File: 1725736009791.jpg (356 KB, 1080x1069)
356 KB
356 KB JPG
>>
>>102277157
You need to go back.
>>
>>102277134
CFG is completely unrelated to phrase ban, ignore him.
>>
>>102277157
>Breaking news! lmsys confirmed to be a dead mememark, more at 11...
>>
>>102277100
Nice miku
>>
>>102277198
lmsys isn't a benchmark, fucking brainlet -80 IQ, rope yourself and stop trashing this thread
>>
>>102277133
>slows down generation
It uses more VRAM, but I doubt batch size 2 slows down generation that much.
>>
>>102277227
Compared to no slow down at all from proper phrase ban with backtracking? Yes, it slows the generation down.
>>
<thinking>
Reflection 70B actually is a pretty good model.

<reflection>
Wait, that isn't correct. It's complete trash.
</reflection>

Well, it's a completely trash model.
</thinking>

<output>
Who the fuck releases such a piece of shit?
</output>
>>
File: 1db.jpg (65 KB, 563x542)
65 KB
65 KB JPG
>>102277223
>uses reddit
>thinks he has the moral grounds to call someone else a subhuman
>>
>>102277240
jej
>>
>writing an AI tool using copilot
>need to test how it handles refusals
>write a prompt asking the model to write pedophilia and scat smut
>store that as a string in the code, which copilot has processed
Am I going to get v&'d?
>>
>>102277334
>copilot
>pedophilia
ur right fucked m8
>>
>>102276923
What the fuck
>>
Couldn't phrase ban be implemented on the frontend's side, actually very easily? When using Mikupad and you're in the process of having tokens streamed in, you can press one of the token probabilities from a freshly generated token, and it restarts generation from that token with basically no lag. So basically it really already just werks and someone who knows html could easily modify that code to do phrase banning.
>>
Now Matt will be known as a faggot who fabricated benchmarks to make an ad for his shitty data generator or whatever it is. It only took 1 day. What a brilliant mind.
>>
I'm seriously falling in love with my harem card with Mixtral LiMARP-ZLOSS. I've hardly done anything other than chat with it for the last three days. Please send help.
>>
https://xcancel.com/ArtificialAnlys/status/1832457791010959539
>Reflection Llama 3.1 70B independent eval results: We have been unable to replicate the eval results claimed in our independent testing and are seeing worse performance than Meta’s Llama 3.1 70B, not bette
Uh oh...
>>
>>102277349
I'm actually worried. I'm not in the US, and by reading the code you can clearly see that I was just making sure the tool I was writing would reject the prompt so I could test how to handle refusals. It's all pretty dumb.
>>
>>102277431
Thank you, r/LocalLLaMA.
>>
>>102277431
already posted
>>102276118
>>
Fuck me, trying to get into this AI shit as a 32yo boomer (in particular locally run text-to-speech voice stuff, which is supposed to be easy they say) when you have virtually no knowledge. Everywhere I go everyone's already using all kinds of technical jargon and assuming you know most things (I don't). Even just the basic accessibility of it all is a pain, downloads hidden away behind twenty menus and obscure jargon that you have to navigate through. Nothing works like traditional normie stuff does; even just starting something up requires command lines and other stuff my boomer brain can't comprehend. I respect autists much more now. Took me like a week to even get basic stuff working.
>>
>>102277460
buy a book
>>
>>102276364
Leddit going into conspiracy theory mode, we really reversed the roles here kek
>>
>>102276118
Did we get a response from that grifter?
>>
File: brownleftistswinning.png (88 KB, 1090x402)
88 KB
88 KB PNG
victory lap
>>
Say I want to use a couple of tuiter profiles as a dataset and use an llm to copy their styles and make them talk to each other like a groupchat. I think character ai could do this.
How do I do this locally? Do I need to finetune on each profile? Loras? Are there even loras in textgen?
>>
>>102276118
maybe the cope is still there, they haven't used the "fixed" version or some shit kek
https://xcancel.com/ArtificialAnlys/status/1832487709853585428#m
>According to the Glaive team, the model was incorrectly uploaded to Hugging Face. We plan to re-run our evaluations after the model is re-uploaded correctly.
>>
>>102276466
The post literally has negative 12 karma right now. Not saying Reddit doesn't have many dumbfucks, but it's not like all of them are like this. It's like saying /lmg/ shills for OpenAI when a single guy posts about how good OpenAI is while tons more are calling him out on it.
>>
>>102277526
mean to also reply to >>102277467
>>
>>102277460
It's not your fault. Modern programming languages and operating system design are both garbage, but degenerate Zoomers who don't know better think they're awesome.
>>
>>102277512
Matt saw how easily people believed the strawberry troll's lies, so he's figuring people will believe this feeble excuse too. From what I'm seeing it looks like they're correct
>>
>>102277512
All this effort to shill Glaive, you'd think they could come up with a better excuse than "they were so incompetent they can't even handle uploading a file correctly and it's taking them days to reupload"
>>
>>102276940
I think suno is soul. I made some songs with it and just listening to them after a while, they are so great.
>>
>>102277476
>J-just a small problem with its tokenizer, plox wait. Reflection 405b will mog gpt5
>>
>>102277552
*looks like he's correct
>>
>>102277582
>Reflection 405b will mog gpt5
LFGooooooo
>>
Glaive looks like a real game changer for both open and closed AI training. Finally everyone can have tailor-made datasets for their finetunes without much effort or high cost.
>>
>>102277431
Thanks for using a xcancel link, fuck elon
>>
>>102277616
where's the buy an ad schizo when there are actual shills here for once
>>
>>102277647
It's a sarcastic post, Sao.
>>
>>102277647
dunno about him but I can identify a troll when I see one
>>
>>102277646
go be a leftist on some other website please
>>
>>102277582
Kek
Remember
>we outpace GPT-5
Oh I found the tweet https://x.com/QuanquanGu/status/1730809526004408617
>>
File: 1719622063047929.webm (1.94 MB, 1280x720)
1.94 MB
1.94 MB WEBM
saars...
>>
>>102277616
Glaive is definitely a game-changer! It's amazing to see a tool that makes customized datasets so accessible for both open and closed AI training. The fact that you can create datasets for finetuning without a massive budget or technical expertise is a huge win. It’s going to open so many doors for innovation and experimentation. Can't wait to see how people leverage this!
>>
/lmg/ - Local Models brought to you by Glaive
>>
>>102277664
Hi Elon. It's not about being a leftist, I just don't have an X account and I won't make one. No matter what you do. No matter how shit the experience becomes. I would rather ignore the link than create an X account just to see the reply chain. Fuck you.
>>
>>102277616
How much does Glaive cost? I'm interested in making use of their services for my project.
>>
HE SAID IT

HE SAID THE LINE

LMAO!!!

EPIC

EPIC FOR THE WIN
>>
>>102276118
lost count of how many times /lmg/ fell for meme hype, worse than zoomers.
>>
>>102277828
did they? I haven't seen much buzz about reflection here, and a lot of people who tried it said it was mediocre, very few people seemed excited
>>
>>102277846
you're talking to a zoomie troon who for whatever reason can only post in places he feels a deep antipathy towards
>>
>>102277846
This general is the r/LocalLlama general at this point, so it makes sense to be confused.
>>
So it's over for LLMs huh? It's all just corpo shit from now on?
>>
>>102277933
hi petra
>>
>>102277460
Ollama and LM studio just work as long as you are not retarded.
Ollama has some very annoying features though so I suggest starting with lm studio for anyone dipping their toes into this shit.
>>
>>102277945
buy an ad
>>
>>102277945
go back
>>
You guys remember how all the praise for Celeste vanished as soon as some anons posted logs of it being fucking retarded?
>>
>>102277933
Yes. See llama3 vs llama2, new cohere models, even sonnet3.5 vs Opus. Slop is the natural evolution of LLMs
>>
>want to see if finetuning can somehow fix commander because I don't want to believe it is unsalvageable
>only finetune is by drummer
Why am I still doing this to myself?
>>
>>102278010
it is so fucking over
>>
Lmstudio is just a fancy proprietary fork of llamacpp. Redditors who suck cocks to the word 'open source' love it so much.
>>
I tested reflection myself online on the first day and got great results for my prompts
>>
>>102277900
>secondary pleb projection
many such cases
>>
>>102277668
What did he deliver except for that self play technique?
>>
Notice how xer didn't say I was wrong THOUGHBEITIMNOTVAXXED
>>
>>102278009
I remember one instance of that, I think it was about breasts and the height of the character? When I tried it myself, Magnum had the same problem, and I posted it in the thread. So he probably just cherry picked a gen to make Celeste look bad.
It was probably Sao because no one else seethes that much about that model. They're all models trained on the same datasets, so it doesn't make much sense that one has the "secret sauce".
But I do remember how all the praise for Sao's models vanished as soon as he started to get called out for samefagging and spamming the general to death. Stheno and Euryale were way too retarded and horny.
>>
>>102277460
I hope you're making use of ChatGPT.
>>
i'll show you some self play technique

*unzip vulva*
>>
>>102278010
>Slop is the natural evolution of LLMs
Bullshit, slop was always there, you just were not aware of it back then. Nostalgia-driven self gaslighting.
>>
>>102278285
Nah, for example, c.ai didn't have slop
>>
https://xcancel.com/mattshumer_/status/1832511611841736742
>It's 3.1, but for some reason the current HF weights are screwed up and the config shows 3... working on it, the issue is tricker than we expected
how much cope do they have up their sleeve?
>>
>>102272041
just picked up my second p40 from a chap locally for really cheap. when combining them, do i need to use a link or anything?
>>
>>102278502
I'm surprised they didn't try blaming bitrot, llamacpp, or quantization. They need to fire their PR guy.
>>
File: file.png (29 KB, 663x179)
29 KB
29 KB PNG
>>102278502
One of the guys calling people haters has a bitcoin pfp, you can't make this up.
>>
>>102278502
like another anon said upthread, I think the gullibility of the people who followed that strawberry retard has taught the grifters that a large cohort of retards on twitter and reddit will believe absolutely anything, so now they're acting accordingly
>>
>>102278605
kek
>>
>>102278208
anons keep saying sao seethes at other/better finetunes and models but I can't find any of that. Is it actually true or is it just another trend of anons seething about some random retard for no real reason?
>>
I tried out Rocinante with the very first CAI bot I used with the 1000+ message history imported.

Bot actually remembered what happened 100 messages ago and accurately told it.

I feel like a man watching his amnesiac (and brain damaged) wife start remembering
>>
>>102278937
Which version of Rocinante?
>>
File: test.png (72 KB, 664x687)
72 KB
72 KB PNG
How many t/s would a single socket one get?
>>
>>102278952
12b V1.1
>>
>>102278937
No way, old c.ai was much better than even Mistral 123b
>>
>>102279006
I didn't say that the outputs are equivalent to old CAI just happy that the bigass context sizes seem to be actually working

Right now I really like it but I think after a bit more usage the cracks will start showing, but up until then I've had some pretty nice ERP and RP with it.
>>
>>102278208
Hi drummer. All here.
>>
>>102278983
How much context?
>>
>>102279075
64K

I tried it with other NeMo models that claim 128k context, but none of them could pull things up from the beginning of the context like that.

That might also just be because I had to mess around with rope to get them working, so it could just be me being a retard
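For reference, the rope knobs in llama.cpp look roughly like this. The flag names are real, but the values below are purely illustrative, not recommendations; the right numbers depend on the model's config.json and trained context length:

```shell
# illustrative only -- check the model card / config.json before copying values
./llama-server -m model.gguf -c 65536 \
  --rope-scaling linear \
  --rope-freq-scale 0.5   # linear scale < 1 stretches the trained window
```

If the model already ships with correct rope metadata in the GGUF (like Rocinante apparently does), you shouldn't need any of these.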
>>
>>102279138
>rope
fuck I don't know how to do that either. I'll look into it thanks anon
>>
>>102279158
For Rocinante specifically I didn't have to mess with it; other 3.1 models wouldn't load unless I fucked around with it, though
>>
what kind of thing could I do with an LLM to put on a portfolio?
I don't want to be stuck doing web dev forever.
>>
>>102279239
>>102279239
>>102279239
>>
>>102278526
mates, any help? the miku build doesn't mention links, but I'd be curious how it loads across them then
>>
>>102277460
You're just a retard with reading disabilities t.33
>>
>>102279571
;_;
>>
>>102279603
NTA, but the most important thing isn't intelligence, it's the ability to accept change and adapt.
>>
>>102277460
Understand that this is an area of active research and development, and people are much more interested in getting things working than in making them simple, especially when things change so often that previous instructions break.

You just have to kind of deal with it the best you can until you learn the things to care about and the things to ignore.
>>
>>102277060
https://github.com/turboderp/exllamav2/blob/master/examples/inference_banned_strings.py
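NTA, but for anyone wondering what that example does conceptually: it blocks continuations that would complete a banned string. A toy stand-in for the idea (pure python, NOT the exllamav2 API; the banned list and candidate tokens are made up):

```python
BANNED = ["as an AI", "I cannot"]

def steers_into_banned(text: str) -> bool:
    """True if `text` already contains a banned string, or ends on a
    proper prefix of one (i.e. the next tokens could complete it)."""
    for b in BANNED:
        if b in text:
            return True
        if any(text.endswith(b[:i]) for i in range(1, len(b))):
            return True
    return False

def pick_token(prefix: str, candidates: list[str]) -> str:
    """Toy sampler: take the first candidate that doesn't steer the
    output into a banned string; fall back to the first candidate."""
    allowed = [t for t in candidates if not steers_into_banned(prefix + t)]
    return allowed[0] if allowed else candidates[0]
```

So `pick_token("Sure, ", ["I cannot", "here is"])` skips the refusal and returns `"here is"`. The real implementation works on token logits and rolls back partial matches, but that's the gist.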
>>
>>102277460
>locally run text-to-speech voice stuff, which is supposed to be easy they say
Audio-related projects are the most challenging and unreliable. While there are well-established, user-friendly projects such as Piper, almost every SOTA project struggles with conflicting dependencies, insufficient documentation, lack of examples, or compatibility problems between code and models.
>>
well, the script was silently failing because I just lazily put in a try/catch to forget about a problem, and now the whole day is wasted
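For posterity, the failure mode in miniature (`risky()` is a hypothetical stand-in for whatever the script was doing):

```python
import logging

logging.basicConfig(level=logging.ERROR)

def risky():
    # stand-in for the real work that was failing
    raise ValueError("bad input")

# the lazy version: the error vanishes and the script "succeeds"
try:
    risky()
except Exception:
    pass

# better: log the full traceback so the failure is visible
try:
    risky()
except Exception:
    logging.exception("risky() failed")
```

Even `except Exception: raise` beats a silent `pass` if you just want to keep moving.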
>>
>>102277060
>Now what?
Finish the context, replace the banned phrases with a signifier like "<UNKNOWN>" and start a hidden intermediary system prompt:
"The following piece of text contains the following signifier: <UNKNOWN>. Please replace this signifier with a correct word or phrase. Do not use any of the following terms: <BANNED_TERMS>. Only reply with the repaired text to this prompt. <CONTEXT>"
Then replace the context with what was just generated.
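That scheme, sketched out (the banned list is a placeholder and the actual model call is omitted; only the masking and prompt assembly are shown):

```python
import re

BANNED_TERMS = ["shivers", "ministrations"]  # example list, not from the post
SIGNIFIER = "<UNKNOWN>"

def mask_banned(context: str) -> str:
    """Replace every banned phrase in the context with the signifier."""
    for term in BANNED_TERMS:
        context = re.sub(re.escape(term), SIGNIFIER, context, flags=re.IGNORECASE)
    return context

def build_repair_prompt(masked: str) -> str:
    """The hidden intermediary system prompt asking the model to fill gaps."""
    return (
        f"The following piece of text contains the following signifier: {SIGNIFIER}. "
        "Please replace this signifier with a correct word or phrase. "
        f"Do not use any of the following terms: {', '.join(BANNED_TERMS)}. "
        f"Only reply with the repaired text to this prompt. {masked}"
    )

masked = mask_banned("A chill ran down her spine, shivers and all.")
prompt = build_repair_prompt(masked)
# `prompt` would then be sent to the model; its reply replaces the context.
```

The obvious cost is a second generation pass per reply, and nothing stops the model from picking a synonym just as bad as the banned term.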


