/g/ - Technology


Thread archived.

File: ComfyUI_34510_.png (933 KB, 848x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103312983 & >>103298520

►News
>(11/26) OLMo 2 released: https://hf.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
>(11/26) Anon re-implements Sparse Matrix Tuning paper: https://github.com/HeroMines/SMFT
>(11/25) Qwen2VL integrated with Flux: https://github.com/erwold/qwen2vl-flux
>(11/25) Speculative decoding added to llama-server: https://github.com/ggerganov/llama.cpp/pull/10455
>(11/22) LTX-Video: Real-time video generation on a single 4090: https://github.com/Lightricks/LTX-Video

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: zzz.jpg (13 KB, 367x316)
►Recent Highlights from the Previous Thread: >>103312983

--Speculative decoding and its usage with various models:
>103313328 >103313336 >103313340 >103313551 >103313365 >103313440 >103313460 >103313463 >103313598 >103313658 >103313693
--Tulu model impressions and discussion:
>103313747 >103313769 >103313787 >103313802 >103313822 >103313853 >103313890 >103313917 >103313927 >103313950 >103313989
--Sentient: local personal companion with graph memory and agentic integrations:
>103313310 >103313339 >103313387 >103313484
--Recapbot test results and feedback:
>103315415 >103315532 >103315611
--OLMo discussion: new arch, 4k ctx, and Reddit data:
>103315697 >103315710 >103316008 >103315750 >103315847 >103315893 >103316010 >103316058
--OLMo 2 models and the state of open-source AI:
>103316073 >103316150 >103316245 >103316283
--LoRA's limitations and potential issues with fine-tuning:
>103313076 >103313114 >103313162 >103313220 >103313244 >103313313
--Discussion of kobold and booba alternatives, dev pace, and feature comparisons:
>103313177 >103313243 >103316052 >103313248 >103313345 >103313315
--Common failures and limitations of coding models:
>103316427 >103316470 >103316488 >103316513 >103316528 >103316524
--Choosing a draft model for speculative decoding with llama.cpp:
>103314138 >103314187 >103314222 >103314611 >103314742 >103314761 >103316743 >103316710 >103316739 >103314793 >103315098
--Autoround quantization and its performance compared to regular quant methods:
>103313507 >103313718
--Anons discuss language model performance and limitations, criticizing the focus on benchmarks and "meme marks":
>103313710 >103313835 >103316791 >103314144 >103314165 >103314228 >103315010 >103315109 >103315181 >103315206 >103315220
--Miku (free space):
>103313053 >103313132 >103313312 >103314097 >103314884 >103315109 >103316701 >103316754

►Recent Highlight Posts from the Previous Thread: >>103312989

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Okay, if Tulu is so good, why can't I find any mention of it on reddit?
>>
>>103317977
Because it's not drummer shilling his model?
>>
>>103317977
https://www.reddit.com/r/LocalLLaMA/comments/1gwl339/t%C3%BClu_3_a_set_of_stateoftheart_instruct_models/

https://www.reddit.com/r/LocalLLaMA/comments/1gz04zu/teleut_7b_tulu_3_sft_replication_on_qwen_25/
>>
Are you faggots doing that trolling again where you pretend that the model is good? Like with nemotron 70b?
>>
>>103318079
>nemotron
>trolling
Your either a retard or are the one trolling here. Nemotron is great, best until tulu for creative uses (and also the best 70B at coding but got surpassed by qwen2.5 32B coder)
>>
>>103318090
Got it.
>>
>>103318079
When are you idiots gonna realize it's a skill issue on your end. It's not that people are shilling models that are bad, it's that you're too stupid to use them properly. Retard in garbage out.
>>
>>103318112
>idiots
Pretty sure it's just one guy. He even tried to argue against using a model's correct formatting.
>>
Best inference engine for distributed compute GO
>>
>>103318153
no
>>
>>103318079
I tried it a bit but only at Q3 cause VRAMlet. The prose at least seems quite a bit better than the usual llamaslop
>>
>>103318079
It's more that they are dumb cavemen and ESLs who genuinely don't notice when the model says retarded or illogical shit after their dicks get hard.
>>
>>103318188
Unlike other llama tunes / mistral large it got complicated positions with a non human character correct and unlike qwen2.5 it is not dry, undetailed sex. And unlike any of those (mistral large is ok at it) tulu is creative and pushes the plot forward. I think your just a troll who has never used it, otherwise post this apparent logical error.
>>
File: dasdadadasd.jpg (75 KB, 645x634)
https://files.catbox.moe/ge639f.jpg
>>
>>103318224
>your just a troll
ESL confirmed.
>>
>>103318236
that's hot but can you make her flatter
>>
>>103318239
you're just a troll and also a grammar nazi
better?
>>
>>103318153
vllm
>>
>>103318236
that's hot but can you make the guy fatter
>>
>>103318236
That can't be good for her back
>>
>>103318366
uoh, nice
>>
>>103318366
peak
>>
>new thing drops
>mikutroons still shitting the thread
>>
>>103318449
post teto then, faggot
you won't
>>
File: ``.jpg (137 KB, 832x1216)
>>
>>103318449
I like them cuz they make you seethe.
>>
>>103318236
>>103318366
>>103318460
>>>/g/ldg
>>
>>103318462
>I am a mentally ill troon cause it makes you seethe
>>
>>103318475
Wow. That's a creative insult. Well done. I tip my hat in your general direction.
>>
>>103318449
The only one I see shitting here is you
>>
File: migusalad.jpg (131 KB, 1216x832)
>>
When generating text I get something like 3 t/s but when generating code I'm seeing from 3.25 to 3.6 t/s
This draft thing is like the second coming of miqu
>>
>>103318090
>Nemotron is great, best until tulu for creative uses (and also the best 70B at coding but got surpassed by qwen2.5 32B coder)
Qwen let me down on non-trivial stuff that L3 tunes take a decent shot at.
>>
>>103318513
That is because code has less valid options so the draft model should be correct more often.
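To make that concrete, here's a toy greedy sketch of the accept/verify loop (my own illustration in python, not llama.cpp's actual code; draft_next/target_next are made-up stand-ins for the two models):
[code]
# Draft proposes k tokens cheaply, target checks them, only the agreeing prefix survives.
def draft_next(ctx):               # stand-in for the small draft model
    return (sum(ctx) * 7 + 3) % 50

def target_next(ctx):              # stand-in for the big target model
    n = (sum(ctx) * 7 + 3) % 50
    return n if n % 4 else n + 1   # disagrees with the draft now and then

def speculative_step(ctx, k=8):
    proposed, tmp = [], list(ctx)
    for _ in range(k):             # k cheap draft passes
        t = draft_next(tmp)
        proposed.append(t)
        tmp.append(t)
    accepted, tmp = [], list(ctx)
    for t in proposed:             # IRL the target verifies all k positions in one forward pass
        good = target_next(tmp)
        if good == t:
            accepted.append(t)
            tmp.append(t)
        else:
            accepted.append(good)  # first mismatch: keep the target's token and stop
            break
    return accepted

print(speculative_step([1, 2, 3]))  # [45, 10, 30, 41] -> 3 drafted tokens kept + the target's fix
[/code]
The fewer plausible continuations there are (code, JSON, math), the longer the accepted prefix gets, which is exactly why code speeds up more than prose.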
>>
>>103318513
For maximum speed, don't forget to disable the repetition penalty, use temperature 0 and set "--draft-min 1"
>>
>>103318536
>use temperature 0
At super low temperature I've had some models make factual mistakes that they don't make with some temp (0.2ish) and a savage Top-P (like, 1).
>>
>>103318525
and qwen coder got a lot of more complicated stuff L3 tunes failed at for me. Try deepseek R1 as well, that will be next level if they ever release it.
>>
>>103318562
At some point I hope to put together a more comprehensive programming test set, right now it's just some random Python demo shit and a particularly tricky Java question. Most L3's make the mistake but correct it when called on the error. A few, including Nemotron, caught the problem and described it before generating the code suggestion. Coder 32B doubled down on being wrong by offering a fix that made the mistake worse.
>>
>>103318559
Anon... I...
>>
>>103318612
I too started replying to newfriend over there several times but decided to just move along.
>>
>>103318079
It's one or two at most, both might samefag at the same time, hard to tell, /g/ needs IDs badly.
>>
>>103318719
Nah, it's the same guy. He calls literally everyone a shill any time some new model gets recommended and then starts claiming they're shit without any logs to back it up.
>>
>>103318731
you have misunderstood the post you were replying to, it was about the opposite of that guy
>>
Why can't the new Mistral Large do punctuation at all? It just keeps messing up ** or quotation marks for no reason. Yes, I have adjusted the prompt format.
>>
>>103318777
Using DRY and/or XTC?
>>
File: yuuga.png (876 KB, 768x825)
>>103318559
https://artefact2.github.io/llm-sampling/
>>
>>103318777
That sounds like a tokenizer config problem, it doesn't do that for me.
>>
>>103318837
There he is.
>>
>>103318837
What's with the accents? Did they finally put a filter on your spam?
>>
>>103318837
Kek keep it up bro
>>
>>103318785
Just Temp 1 and min-p 0.03
>>103318803
I tried two different quants. It just keeps doing it.
>>
R1 seems impressive.
>>
>>103319043
Post the second part NIGGER
>>
>>103319043
>>103319060
It is certainly a creative choice but it works.
>>
>>103319074
this is illegal
>>
>>103319085
It BTFOs O1. Openai is dead if they release this.
>>
>>103319074
Deepseek won. Let's see Meta's strawberry.
>>
cydonia-22b-v1.3-q5_k_s.gguf runs great on my computer.
What is another ~22B q5 model but built for programming? I need something that can assist me quickly with code, but locally....
GPT 4 ain't bad, but I wonder if leaving out all the bullshit and training the model just on the software development process is enough to keep it compact.
>>
Star Attention: Efficient LLM Inference over Long Sequences
https://arxiv.org/abs/2411.17116
>Inference with Transformer-based Large Language Models (LLMs) on long sequences is both costly and slow due to the quadratic complexity of the self-attention mechanism. We introduce Star Attention, a two-phase block-sparse approximation that improves computational efficiency by sharding attention across multiple hosts while minimizing communication overhead. In the first phase, the context is processed using blockwise-local attention across hosts, in parallel. In the second phase, query and response tokens attend to all prior cached tokens through sequence-global attention. Star Attention integrates seamlessly with most Transformer-based LLMs trained with global attention, reducing memory requirements and inference time by up to 11x while preserving 95-100% of accuracy.
https://github.com/NVIDIA/Star-Attention
From Nvidia. improvements over ring attention mostly in speed.
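If the two phases are hard to picture from the abstract, here's a toy single-process numpy sketch of the idea (my own simplification for illustration; the real thing shards blocks across hosts, prepends an anchor block and merges softmax online):
[code]
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

rng = np.random.default_rng(0)
d, block = 64, 256
ctx = rng.standard_normal((1024, d))   # context token states (stand-in for one layer's K/V)
query = rng.standard_normal((16, d))   # query/response tokens

# Phase 1: blockwise-local attention over the context, block by block
# (each block would live on a different host and run in parallel).
kv_cache = []
for start in range(0, len(ctx), block):
    blk = ctx[start:start + block]
    _ = attention(blk, blk, blk)       # local attention only, no cross-block scores
    kv_cache.append(blk)               # each host keeps its own KV cache

# Phase 2: query tokens attend to ALL cached tokens (sequence-global attention).
k_all = np.concatenate(kv_cache, axis=0)
out = attention(query, k_all, k_all)
print(out.shape)                        # (16, 64)
[/code]
Phase 1 chops the quadratic cost into independent per-block pieces; phase 2 is ordinary full attention but only for the short query/response.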
>>
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
https://arxiv.org/abs/2411.17691
>We reveal that low-bit quantization favors undertrained large language models (LLMs) by observing that models with larger sizes or fewer training tokens experience less quantization-induced degradation (QiD) when applying low-bit quantization, whereas smaller models with extensive training tokens suffer significant QiD. To gain deeper insights into this trend, we study over 1500 quantized LLM checkpoints of various sizes and at different training levels (undertrained or fully trained) in a controlled setting, deriving scaling laws for understanding the relationship between QiD and factors such as the number of training tokens, model size and bit width. With the derived scaling laws, we propose a novel perspective that we can use QiD to measure an LLM's training levels and determine the number of training tokens required for fully training LLMs of various sizes. Moreover, we use the scaling laws to predict the quantization performance of different-sized LLMs trained with 100 trillion tokens. Our projection shows that the low-bit quantization performance of future models, which are expected to be trained with over 100 trillion tokens, may NOT be desirable. This poses a potential challenge for low-bit quantization in the future and highlights the need for awareness of a model's training level when evaluating low-bit quantization research.
mostly just putting the work in to prove what we all already knew. slop tokens are really going to have to be more vigorously deleted from datasets
>>
Has no one made progress in maintaining the characters personality past context limit? It's annoying.
>>
File: 1732680317640.jpg (237 KB, 1080x1260)
tf
>>
Tulu does the thing all assistantslop models do where it favors a SFW word in logit probabilities even when a NSFW word would be the more obvious choice.
Like if you give it "I'm going to milk all the..." in the middle of an obviously sexual context, Tulu's most probable token for the next word is 'stress' rather than 'cum'. It shies away from the smut word, substituting it with a technically-plausible but unlikely SFW one. That's corpo model behaviour, no smut tune would do that. This is unusable for coomers, regardless of what you guys say.
>>
>>103319228
Hi Drummer
>>
File: NoU.png (431 KB, 1212x2084)
>>103319228
Use the author's note I posted last thread. Tulu is filthier than any other model out there and not a shiver to be seen. And unlike said "smut tunes" it's not retarded
>>
>>103319268
Meds. Drummer unironically bought an ad, he doesn't shill from the shadows like that.
>>
>>103319277
This screenshot literally proves my point, are you drunk? It doesn't use any crude smut slang terms at all, it's all purple prose and euphemisms like a romance novel.
>>
>>103319220
>added another 9 out of nowhere
>couldn't calculate 99-9
Was it o1-mini?
>>
>>103319277
>Filthy
>Fill me with your seed
Lol
>>
File: Untitled.png (334 KB, 1122x1549)
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem
https://arxiv.org/abs/2411.17525
>Quantizing large language models has become a standard way to reduce their memory and computational costs. Typically, existing methods focus on breaking down the problem into individual layer-wise sub-problems, and minimizing per-layer error, measured via various metrics. Yet, this approach currently lacks theoretical justification and the metrics employed may be sub-optimal. In this paper, we present a "linearity theorem" establishing a direct relationship between the layer-wise l2 reconstruction error and the model perplexity increase due to quantization. This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, which outperforms all prior data-free approaches such as the extremely popular NF4 quantized format, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels which match a given compression constraint in the medium-bitwidth regime, obtained by reduction to dynamic programming. On the practical side, we demonstrate improved accuracy-compression trade-offs on Llama-3.1 and 3.2-family models, as well as on Qwen-family models. Further, we show that our method can be efficiently supported in terms of GPU kernels at various batch sizes, advancing both data-free and non-uniform quantization for LLMs.
actually compares to quip# and QTIP. lower PPL than quip# but QTIP is better. faster inferencing than both. iirc quip#/QTIP take forever to actually quantize but didn't see anything in this paper at a quick glance for how long it takes either. only some pseudocode no github but hey new day new quant method.
https://github.com/BlackSamorez
git of one of the main authors. hidden repo worked on recently so probably will be open sourced there
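The "Hadamard rotation + grid" idea itself is simple to picture. The sketch below is just my own illustration of the concept with a dumb uniform grid (needs scipy), not the paper's MSE-optimal grids or kernels:
[code]
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
d = 256
W = rng.standard_normal((d, d))            # pretend weight matrix

# Random Hadamard rotation: H @ diag(signs) / sqrt(d) is orthogonal,
# and it makes the rotated entries look much more Gaussian / outlier-free.
R = (hadamard(d).astype(np.float64) @ np.diag(rng.choice([-1.0, 1.0], size=d))) / np.sqrt(d)
W_rot = W @ R

def quantize_uniform(x, bits=4):
    # plain per-tensor uniform grid; HIGGS derives MSE-optimal grids instead
    levels = 2 ** bits
    lo, hi = x.min(), x.max()
    step = (hi - lo) / (levels - 1)
    return np.round((x - lo) / step) * step + lo

W_hat = quantize_uniform(W_rot, bits=4) @ R.T          # quantize in the rotated basis, rotate back
print(np.linalg.norm(W - W_hat) / np.linalg.norm(W))   # relative error of the round trip
[/code]
Data-free in the same sense as the paper: no calibration data, only the weights get touched.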
>>
>>103319292
No, this is just plain old GPT-4o
>>
What Tulu is best Tulu?
Is 70B any good?
>>
>>103319303
Wouldn't it be more fair to compare it with CoT model?
>>
File: Hows this.png (171 KB, 1214x923)
>>103319295
This is with the theme of MLP so that prob hampers the dirty talking there. I explicitly told it to use language fitting the universe, as can be seen by her saying buck instead of fuck. Here's something else.

>>103319318
Best RP model atm, not sure about sfw capabilities, benchmarks say it's close to qwen2.5 72B
>>
>>103319333
Okay that's still pretty purple but I concede it's closer since it actually said 'pussy' and 'clit' this time instead of some SFW euphemism.
>>
>>103319327
I mean, yeah, but I didn't expect 4o to fail this hard.
>>
File: Hows this2.png (177 KB, 1207x855)
>>103319338
Here, I told it to be vulgar.
>>
>>103319277
>>103319333
Thanks for the screenshots, appreciated. Almost nobody shows anything anymore.
Feels a bit like qwen to be honest. Probably the llama 3.1 base.

I really hope we can get models that don't have fucked up context and this horrible stretching out of a simple sentence.
Like get to the point. No wonder aicg fags are 80%+ femoids. They probably love this shit.

>ILLTELLYOUEXACTLYHOWIWANTYOUHOWINEEDYOUTOTAKECONTROLTOLETGOCOMPLETELY.ILLWATCHYOULOOSEYOURSELFINPLEASEUREYOURBODYSHUDDERINGYOUREYESGLAZEDWITHLUSTANDWHENYOUCANTTAKEITANYMOREWHENYOURONTHEEDGEILLPULLYOUCLOSE!
Imagine being an undervolting 30GB vramlet and having to watch this shit roll in at low t/s.
But they almost all have this problem; there are fundamental problems. In b4 prompt skill.
>>
>you can use the shitty aya 8B model as a draft model for command-r-plus
>it's so unbelievably retarded due to the multilingual stuff that it barely manages to predict anything
Should've figured...
>>
>>103319354
So this is how I should be talking to women huh.
>>
File: nala.png (136 KB, 789x720)
tulu 70b nala test. dunno if anyone else has done this yet. haven't been here often recently.
>>
>>103319378
>femoids
>on 4cück
Cute retard.
>>
how much are vramlets missing out? I have a 12gb 3060 with 16gb ram and there are obviously models that couldn't possibly load on my current system. Those 35gb, 40gb models, how much more 'fun' are they compared to the 10gb I'm forced to run?
>>
>>103319392
>ahh ahh mistress in quotation marks
ruined
>>
File: Hmm.png (84 KB, 1213x896)
>>
File: unknown.png (168 KB, 574x550)
a-at least she's accommodating
>>
>>103319395
femoids and faggots over there, it's pretty obvious.

>>103319397
Not much to be honest. I would pay $10 in crypto on openrouter and try them out first if you like those.
I got myself a p40 because I wanted to try 70b models.
What you are missing out on is higher context for nemo and mistral. (They slip into repetition at higher context anyway)
Even the 70b tunes are all positivity sloped, I suppose because they train on (old) gpt outputs and that's difficult to get out.
In general it feels like the higher the B, the more assistant.
I tried mistral large on openrouter, that's probably what everybody wants. But you need like 3 3090s to run that.
Running at Q2XX is a crime; it doesn't follow the format etc. anymore.
Hope we get nemo scaled up to like 30B. It's smart for its size and really the only model that's fun.
Mistral small is smarter but also more assistant unfortunately.
Some people swear by the gemma 27b magnum tune, but I don't see much difference from mistral small with only 8k context.
>>
>>103319463
>Not much to be honest.
This is only true if you do the most simple of shit. Try anything more complicated than an RP with a humanoid character and you will see the differences, especially when you get to cards including game systems.
>>
>>103319497
I found that for more complex cards nemo shits the bed and you need mistral small.
Mistral small also reliably keeps track of stats, which is really cool.

I just don't like using the bigger models because the writing is so bad. It's just not fun to use.
And you still get retardation anyway. Like 72b magnum has stuff like thinking you can get pregnant from anal, trips up with size differences etc.
Bigger models seem to do cards with multiple characters better though.
>>
>>103319530
>the writing is so bad
Magnum fixed it but made it retarded. Nemotron fixed it without doing that, tulu fixed it further imo.
>>
>>103319411
>as slutty as possible = sass and sultriness
you cannot escape llama3's positivity bias
>>
>>103319567
>>103319354
>>
>>103319567
Yeah with corpo/assistantslopped models the positivity bias seems to make them interpret "be slutty" as meaning "be a sassy girlboss". It's always kind of belittling and haughty and vaguely dommy and contemptuous. Cuck/femdom enjoyers probably like it I guess.
>>
>>103319567
true. talking in a husky voice.
>>
>>103319585
What does slutty mean to you, lol? Another swipe had it make each "sort" a "stroke" and talk about a dripping pussy.
>>
File: 21.png (42 KB, 1102x602)
>>103319567
>>103319585
>>
>>103319630
nemo once again proves that big models are soulless memes
>>
>>103319642
That is tulu 70B
>>
>>103319630
>swap them like a woman swapping partner at a wild party.
>like a seasoned player sorting through her conquests
uhhhh yeah, its slop time baby. *anon whips out his dick again*
>>
>>103319576
Well it's better than I expected, at least it has 'obscene' words in it. It's lacking the bite of '''dark rp''' but it might be possible to prompt around it.
>>
File: ItsBeenALongDay.png (1.03 MB, 1280x768)
Good night, lmg
>>
>>103319784
rape u soon
>>
Has anyone managed to get Qwen2-VL-Flux to work?
It seems like it could be a great way to improve flux, but 48 GB of VRAM is quite heavy.
>>
>>103319826
>Lets make flux even slower!
>>
File: 1729076575056262.png (1.77 MB, 1188x712)
>>103319784
Omggg its migu1!1
>>
>>103319844
i will not rape that
>>
>>103319826
Where is it? I do want to give it a try
>>
>>103319354
What depth were you using your AN on?
>>
>>103319884
the default, 4
>>
>>103319277
What are your regular sampler settings and context? I’m skeptical because I just tried it and got shivers/husky within the third response but maybe mine are off.
>>
>>103319893
I normally just do something like 0.05 min p and / or maybe 0.95 / 0.9 top A instead
>>
>>103319904
top A
Sorry, meant top P
>>
>>103319784
Good night tired Miku
>>
>>103319431
>you can make me your personal trainer
Waitaminit, just what's the big idea here...?
>>
I saw some posts the other day that brought up what really makes Claude great, which is that it understands subtleties and goes hard into them. If the character is supposed to be a vocaloid fan, it will proactively weave into its response the kind of things that real fans would actually say, and not in a fake "hello fellow kids" way like most models do. But what is the way to solve this for open weight models? Tulu 3 seems to show that we can now do something very proactive and it's open source so we can reproduce it, so we now just need pretrained models that are really trained on a ton of real human data from the worst parts of the internet. Then the model will know what they are like, so it can act like them with fine tuning.

New Mistral base model when? Ideally 70B.
>>
>>103319431
>blurred out wall of text schizophrenia
and this is supposed to be impressive?
>>
>>103319971
Claude (esp. Opus, Sonnet is very good but it doesn't have big model smell) is also the only model that is often genuinely funny in an original way, I want to know what the secret sauce is. One time it described a clumsy French kiss as "her tongue wriggled in his mouth like a tased eel", which from a Google search seems to be a totally original simile.
>>
>>103319909
Oh maybe it's my context settings, what are you using for the story prompt? default or llama3?
>>
>>103319991
>I want to know what the secret sauce is.
Being trained on the entire internet and a fuck ton of books / other stuff. Claude knows obscure stuff only on fanfiction websites and in copyrighted works.
>>
>>103320001
Neither, tulu's formatting for instruct, no tags in the context template, just some stuff like "User character:".
https://files.catbox.moe/qvn0g3.json
>>
My guess is that Claude is both a huge total param model so it can store all that information about every single thing humans have come up with, but a MoE so it is still fast.
>>
>>103320003
You can tell it's the only corpo model that has 4chan in the pretraining data too because it's the only one that can generate plausible-looking 4ch threads rather than something that reads like a redditor's satire of 4chan.
>>
File: file.png (36 KB, 323x315)
>>103320010
I appreciate the help and it's good confirmation that what I was using was correct after popping open the config file. But I was referring to pic rel
>>
>>103320010
That json is wrong. It's <|end_of_text|> not <|endoftext|>
>>
>>103320010
I also don't think system_suffix should be there. There is no <|eot_id|> in the chat template, period.
>>
>>103320027
Also this. More params = more room to "soak" up smaller details from everything I've seen. It's why I'm personally excited for deepseek's next model. 2.5 knows a ton about everything the same as claude but is sadly incredibly dry (and it's a giant moe). R1 seems to fix that from what I've been able to play with in getting around the filter on the site. Here's hoping they do end up releasing the weights of the full model.

>>103320047
>>103320055
Woops lol, might explain it not stopping till the max allowed response length.
>>
>change a single character in the last message
>lmao time to process the entire context again
I'm getting really fucking tired of this bullshit.
>>
>>103320060
Also in case you wonder, Tulu does NOT have <|assistant|> or <|user|> or <|system|> as actual tokens. Those literally come through as (e.g.) '.< (16134)', '| (91)', 'assistant (78191)', '| (91)', '>\n (397)'

My assumption is they forgot to add them, or the tokenizer is wrong, but the model works so eh, whatevs.
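Easy to check yourself with transformers if you have the repo handy (repo id below is the 70B instruct one linked later in the thread, swap in whatever you actually use):
[code]
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("allenai/Llama-3.1-Tulu-3-70B")
print(tok.tokenize("<|assistant|>"))         # several ordinary pieces -> not a registered special token
print(tok.tokenize("<|start_header_id|>"))   # should come back as a single piece if it IS registered as special
print(tok.encode("<|assistant|>", add_special_tokens=False))
[/code]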
>>
>>103320096
Huh... it works so did they train it on that instead of using special tokens for some reason?
>>
>>103320130
Yes, I strongly suspect they trained it on that. Happens all the time.
>>
>>103320181
>>103320130
>>103320096
Ahhhhhhhhhhhhhhh
>>
>>103320096
>>103320130
>>103320181
>>103320256

Same thing with my models trained on Metherme. I didn't bother adding them.

But they used <|assistant|> instead of <|model|> which, from my experience, works wayyy better in L3.1

How's Tulu?
>>
>>103320280
>How's Tulu?
Nemotron 2. It's fixed llama 3.1 but smarter. Feels more like qwen2.5 72B BUT without the positive bias / lack of sexual knowledge
>>
>>103320315
Any issues with it so far? Is it worth finetuning on top of?

I see a lot of Tulu variants, which one works best for our use cases?
>>
>>103320280
>But they used <|assistant|> instead of <|model|> which, from my experience, works wayyy better in L3.1
Tulu was trained on the base model, retard
>>
>>103320332
Without context it likes doing the claude thing of adding some OOC stuff, but that's generally actually cool to have imo, adds some personality. It quickly quits that with some context / an author's note telling it what to do though.
>>
>>103320332
>>103320350
Oh and tulu 3 70B instruct is the only one I've used. The "final" one I guess.
>>
>>103320333
?
>>
>>103319301
No offense but there are so many quantization methods being released. Do any of them matter? What happened to SpQR, or SqueezeLLM, or RPTQ, or any of this other shit?
>>
>>103319043
How many parameters will this have? And why will I need a rig with 10000 VRAM to run this at 1.3 tokens per second?
>>
>>103320332
The training recipes and datasets are public, you numpty. AllenAI's whole purpose as an organization is being one of those rare companies that does that and replicating shit like OpenAI before Sam Altman fucked it all up. Even if you can't afford their compute power, you can actually learn a thing or two from them.
https://huggingface.co/collections/allenai/tulu-3-datasets-673b8df14442393f7213f372
>>
>>103320073
i have no clue what you're talking about but i use exllamav2. what do you mean by "time to process the entire context again"?
>>
>>103320718
>i have no clue what you're talking about but i use exllamav2. what do you mean by "time to process the entire context again"?
>>
Given dataset D, original model X, and resulting LoRA L(X) = X', is it possible to produce/estimate another LoRA L'(X') = X, assuming you have all the above elements?
The inverted LoRA would, if applied to the model with the original LoRA merged into it, result in (approximately) the original model.
I know mergekit supports LoRA extraction, but we have more information to work with here, and I wonder if it makes a difference.
>>
File: coconot.png (55 KB, 940x282)
>>103320712
Forgot to link the code.
https://github.com/allenai/open-instruct
And just FYI, there is no easy uncensored Tulu 3 you can just fine tune on top of; the initial SFT already has safety datasets baked into the training regime, like pic related, so it's already braindamaged out of the gate. You can see a full list here.
https://huggingface.co/datasets/allenai/tulu-3-sft-mixture
>>
>>103320718
do you know what 'context' is in relation to LLMs?
>>
Does someone have context and instruct templates for the Tulu uwu to use it with my Silly tavern?
>>
>>103320753
If you have the original model, and the adapter weights of the LoRA, then the "inverted LoRA" would just be the negative of the delta weights, no? The delta being the difference between the adapter weights and the original model weights.
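Right. Since merging is just W' = W + (alpha/r) * B @ A, applying the negated delta to the merged model gives you the original weights back exactly (up to float error). Tiny sketch with made-up shapes:
[code]
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 128, 64, 8, 16

W = rng.standard_normal((d_out, d_in))        # original weight
A = rng.standard_normal((r, d_in)) * 0.01     # LoRA down-projection
B = rng.standard_normal((d_out, r)) * 0.01    # LoRA up-projection
scale = alpha / r

W_merged = W + scale * (B @ A)                # model with the LoRA merged in

# "Inverted LoRA": keep A, negate B (equivalently, negate the whole delta).
W_restored = W_merged + scale * ((-B) @ A)
print(np.allclose(W, W_restored))             # True
[/code]
mergekit-style extraction (an SVD of W' - W, as far as I know) only matters when you no longer have the adapter itself; if you still have A and B there is nothing to estimate.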
>>
>>103320838
Sorry for the "uwu" shit I'm using my brother phone and the auto suggestion put this word.
>>
>>103320764
I've been playing with the 8b version and this shit is completely nerfed. Just paragraphs and paragraphs of preaching
>>
Metharme seems to work well with Tulu 70B... And it doesn't trip up safeties when running a NSFL card.
>>
File: tulu 8b.png (97 KB, 953x427)
>>103320906
>Just paragraphs and paragraphs of preaching
>>
>>103321188
Maybe I'm missing something, is there something uniquely different about this version? https://huggingface.co/mradermacher/Llama-3.1-Tulu-3-8B-DPO-i1-GGUF
What are your settings?
>>
Hey /g/uise.
It's my first attempt at running a local LLM and I'm using llama 3.1 70B.
I know it's a huge model but apparently, it should be okayish to use with a 4090.
The thing is that mine is stuck at generating a response. My GPU is at 0% usage too. My CPU isn't being used either so I don't know what's happening.
>>
kind of crazy to think about how ai is a solved science and with a couple more gens of nvidia chips and a few years of datacenter and power infra expanding we will be able to just use the current algorithms to create agi
>>
>>103320841
Thanks that’s what I was hoping. Is there an easy way to negate them? I’ll have to give it a try.
>>
>>103321379
This is not the first time in history that people have thought this way.
>>
This speculative decoding thing is a sham, the generation speed is the same. Fuck, why did I think I could run 70B faster with it.
>>
What's the best model in the 7-12B range?
>>
>>103319301
it doesn't really matter if it's not GGUF compatible, we're not gonna use a new backend because of a new quant, it needs to be working on llama.cpp
>>
>>103321379
>>103321458
>Everything gets processed in relation to everything else
That's attention/consciousness/understanding/intelligence
>The system makes a choice
That's prediction/behavior/agency/action

The more philosophical handwringing people do the further they get from the truth of what's actually happening. We've solved the general notion of 'general intelligence' we just need to make it better.

LeCunny posters need not reply
>>
File: 1718579580274365.jpg (52 KB, 682x875)
there's literally nothing else to do in life other than wait for the next sota foss ai tool to drop
>>
>>103321532
all are shit, gemma 9b
>>
>>103321351
read the docs of whatever tool you're using to run it then
>>
File: KGZSgwriSA2r3Cmr4NtWPw.png (1.56 MB, 736x1312)
How many big leaks have there been?
>llama1
>Miqu (Mistral Medium)
Is that it? How many Yuan to get a corpo researcher to drop a sonnet/4o on HF
>>
>>103321188
>mix of defiance and vulnerability
>>
>>103321640
llama was meant to be published anyway, hardly a leak
miqu was an ok-ish leak; we got largestral 1 after that, which was the proper model and not undertrained, although you could argue that miqu pushed mistral to release largestral 1 sooner because of it
>How many Yuan to get a corpo researcher to drop a sonnet/4o on HF
those are guarded as close as any other high level business secret, only a few people have access and even that access is gonna be closely monitored

it doesn't make sense for anyone to also leak it and face life consequences for what, a model that will be obsolete in half a year or a year? if you want to leak something, better to leak the secrets used for training that are already in your brain, but that is already happening when people leave the company to make their own lol
>>
>>103321640
SD leaked originally, and then NAI's finetune of it leaked later
>>
>>103321522
Unironically a skill issue, WOMM
>>
>>103321675
>My retarded jeetware grift is your skill issue
k
>>
>>103319115
Try to fit some lower quants of Qwen-coder-32B, nothing comes close locally
>>
>>103321657
The same could be said for Windows, GTA, and other code leaks from the past.
>>
>>103321731
those leaks aren't of as much influence
>>
>>103321522
You should use a speculative model with the same vocabulary as your big model if you're using a draft model. Also ngram speculative decoding is not good outside of summarization.
>>
What version of Tulu am I supposed to use?
>>
>>103321752
Weight leaks wouldn't matter, they can release their current models at any time and it won't change anything. The real value isn't the weights but their tools and inference infrastructure. Very few people would bother hoarding 3090s just to host those models, companies able to afford datacenters to compete on inference will not risk their businesses by running leaked models, and there isn't a way to somehow decompile the weights to improve other models.
>>
>>103321775
I'm using Tulu 70B Q3_K_M with Tulu 8B IQ3_XS as draft. I get 2.22 t/s without speculative, 2.12 t/s with it.
Also have k=1 and temperature=0.3
So, does speculative decoding not work as well for creative stuff?
>>
>>103321810
>and there isn't a way to somehow decompile the weights to improve other models.
you have access to infinite dataset creation from that good, now fully uncensored model, distillation to create good small models and many reverse engineering tools to figure out what they did to make the model work as well as it did, you can finetune it further etc
>>
>>103321823
>I get 2.22 t/s without speculative, 2.12 t/s with it.
>Q3_K_M
Check that you're not spilling out into RAM.
>So, does speculative decoding not work as well for creative stuff?
It doesn't, because there is so much more variation that it is more likely for the draft and the actual model to diverge. It works best for coding, constrained grammars, math, etc. where there is close to only one possible continuation.
>>
>>103321848
And anything you can do with it will be still inferior to the original model
>>
>>103321955
if that were true finetuning wouldn't exist, given that it does...
>>
>>103321823
Try to set --draft-min 1 and report back if that improves it for you, for me it was a night and day difference.
>>
https://www.anthropic.com/news/model-context-protocol
Anthropic is creating an open standard allowing models to communicate with resources to request information. They're making it open source so it may be relevant for /lmg/ as well.
>>
>>103322176
buy an ad
>>
>>103317922
> still no good realtime voice model.
> still no long term memory that doesn't suck
ngmi, where is my migu
>>
This should be required reading before posting here
https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters
>>
File: 1703551617588932.png (1.31 MB, 1024x1024)
>>103322186
>where is my migu
here
>>
>>103321927
I am offloading layers into ram in both cases, but was hoping it would still help and get something closer to 3T/s
>>103322029
That helped a bit, got 2.49 T/s
Raising the top K and temp to what I normally use brought it down to 2.38 T/s

So I suppose storywriting is not the case where you would get 30% speedup. Sad.
>>
is a single p40/p100 worth it? i wanna run only small models like 13b at q5km, maybe 20b or smth.
>>
>>103322205
>Instead of using a q4KM, you might be able to run an IQ3_M and get close to Q4KM's quality, but at a higher token per second speed and have more VRAM for context.

>The LOWER the quant the STRONGER the Imatrix effect is, and therefore the stronger the "tint" so to speak

>Due to the unique nature of this project, quants IQ1s to IQ4s are recommended for maximum effect with IQ4_XS the most balanced in terms of power and bits.

>[ https://huggingface.co/DavidAU/L3-Stheno-Maid-Blackroot-Grand-HORROR-16B-GGUF ]

>[ https://huggingface.co/DavidAU/MN-DARKEST-UNIVERSE-29B-GGUF ]

>[ https://huggingface.co/DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-23.5B-GGUF ]

>ANTI-SLOP - Kolbaldcpp only

>"For my prompt below, explain the steps you wound take to execute it" (prompt after this line)

>This will help the model fine tune your prompt so IT understands it.

I see.
>>
>>103322205
I tried reading it and almost had a stroke
Tell the guy to actually make it readable and we can talk
>>
>>103322205
>according to his testing
>no actual testing methodology posted
>recommends meme merges
yeah required to read so you know what grifter and pseud looks like
>>
>>103320819
yes but i never have to process anything. its instant. been like this for me since the early koboldai days.
is context processing a llama.cpp meme? why use something that takes 10 years?
>>
>>103322394
...
We need to shut this general down
>>
Is there somewhere where I can send my models to be evaluated? The Open LLM Leaderboard straight up doesn't work
>>
>>103322412
no really, why run on cpu or whatever it is you are doing if it somehow makes a process that's near instant take 10 years to complete?
>>
>>103322421
Because this general was and is full of jeets with thinkpads trying to run this shit on CPU.
>>
>>103322421
10 years is better than never
>>
>>103322421
Because 1. it's really not that drastic unless your system is complete ass or you're trying to cpumaxx a 1T model with ddr3 ram and 2. it allows you to load models that don't entirely fit in your vram, which, thanks to nvidia being utter jews, isn't really all that abundant
So yes, I'll gladly take a slight speed hit to run smarter models, you can have fun with your lightning fast but retarded 8B. Unless you aped in and started stacking gpus, in which case I hope it was worth it
>>
>>103322484
>I hope it was worth it
imagine being poor, couldn't be me
>>
>>103322205
>wall of text
>rambling
>no formatting
>"quants IQ1s to IQ4s are recommended"
yeah, you should kill yourself for posting your trash 'suggestions' asap
>>
>>103320073
cant imagine what kind of a backend ur using thats that retarded, just use koboldcpp retard
>>
All videos generated by the leaked sora api endpoint before it was shoad btw
https://www.youtube.com/watch?v=Gz33LlwsPVM
>>
>>103322546
it is koboldcpp
yes I have context shifting on
it's probably ST's fault
>>
>>103322550
yeah I saw it yesterday, it's all right, a bit better than MiniMax and the scary thing is that it's "only" the turbo version, the real deal is probably on another level
>>
>>103322205
>I know all this shit about quants and you should all know this
>I frankenmerge
Absolute state of retardation.
>>
>>103322394
NTA, but it's absolutely not. It takes half a minute to process 25k context on 4x3090. I have some creative group chats ideas that require full context re-processing but it's fucking annoying to have to wait that long for the first word to appear
>>
>>103322651
More than that, he "finetunes" using imatrix bro

>NEO Imatrix quants are specialized and specifically "themed" datasets used to slightly alter the weights in a model. All Imatrix datasets do this to some degree or another, however NEO Imatrix datasets are content / theme specific and have been calibrated to have maximum effect on a model (relative to standard Imatrix datasets). Calibration was made possible after testing 50+ standard Imatrix datasets, and carefully modifying them and testing the resulting changes to determine the exact format and content which has the maximum effect on a model via the Imatrix process.

>Please keep in mind that the Imatrix process (at it strongest) only "tints" a model and/or slightly changes its bias(es).
>>
>>103322703
>The power in this 3B (for its size) is frankly jaw dropping... and at 90 tokens per second + on a GPU.

>The NEO IMATRIX dataset V2 was applied to it to enhance creativity (horror). (see several examples below)

>The HORROR NEO Imatrix datasets does the following:

>Adds a "coating of black paint" to any "Horror" prompt generation.
>Adds a "dark tint" to any other creative prompt.
>Increases the intensity of a scene, story, or roleplay interaction.
>Increases the raw vividness of prose.
>In some cases increase instruction following of the model (ie story, and prose).
>Brings a sense of impending "horror", THEN brings the "horror".
>May produce and/or imply graphic horror depending on your prompt(s).
https://huggingface.co/DavidAU/Llama-3.2-3B-Instruct-NEO-WEE-HORROR-GGUF

>Imatrix quants perform best at IQ3s and IQ4s, then Q4s, lower on Q5, and tappers off at Q6.
>For stronger IMATRIX effect, IQ3s, and IQ2s.
>>
anyone here use F5-TTS? how do i increase the amount of text processed at a time with the gradio app? currently it's like 10 words or something per batch.
>>
>>103322746
Given that he speaks like a complete and utter ESL, I'm not surprised he finds the schizophrenic ramblings of quanted low-parameter models to be satisfactory (good, even)
>>
>>103322746
reminds me of that charged almonds diet meme but the buzzword is imatrix

how can anyone use 3b unironically by the way? people try to hype it up but it's pure shit and will always stay pure shit. in fact anything below 123b is in some way shit. 3b is fast but so is diarrhea shart.
>>
>>103322816
Even 123B is shit, anon
Every LLM hits its limits sooner or later - usually sooner
>>
>>103322816
>>103322820
Ever stop to reflect over whether it is the world that is shit or maybe.

Just.

Maybe.

It was you, all along?
>>
>>103322832
Cope.
>>
>>103322832
Blessed be the brown handed ones for they can jerk off to "i suck ur dick and let you grab my bobs"
>>
File: stare.png (25 KB, 116x76)
>>103322848
>see so many happy people with janitor ai
>try it out myself
>it is like a 7b model but 8k context or something
>2023 slop prose with worst offender phrases
blessed be them. I wish I was them
>>
>>103322832
Of course it's me, I have standards. LLMs being unable to actually keep up with novel-length texts on consumer hardware is why I only use them for cooming, it's literally all they're good for right now unless you feel like wrangling them for hours on end
Like >>103322848 said, people with low standards are eating VERY good right now
>>
File: fucking lazy model.jpg (630 KB, 1080x1467)
Okay, I have been using Tulu for the last few days and the only thing I have felt is frustration. The model isn't terrible but it's soooo much worse than Largestral it isn't even funny. I also don't feel like it's better than Nemotron, if anything it's one step below Nemotron, because Nemotron never does absolutely retarded shit like picrel.
The only "good" thing about Tulu I can point out is how it handles characters differently from other models. Tulu is the first 70B model that let me talk my way out from a rapist character, and Tulu even made her apologize to me and feel sad that I was leaving, lol.
>>
>>103323139
I tested the 8B version today and it's definitely worse than Nemo and its many finetunes.
The writing feels a bit more unique, but in the end it makes dumb mistakes Nemo wouldn't. I guess someone praised its coherency with long context, but that's not what I'm looking for.
>>
>>103323139
To be fair, none of the models I've tried hold onto a card's personality for too long, in the end, LLMs are trained to go with whatever you want
>>
>>103323139
Nemotron is trash. Tulu is very human sounding trash. At least in between its usual slop vomits.
>>
File: file.png (696 KB, 768x768)
>>
>>103323284
omg it pochiface
>>
>>103323284
Content of highest quality right here.
>>
File: the_lmg_files.png (2.73 MB, 2048x1568)
>>103323139
>Llama-3.x tune disappoints yet again for the 100th time
This is why I didn't even bother. Monstral is still the king.
>>
Llama3 wasn't made for your degenerate ERP. You guys need to let it go.
>>
>>103323381
>>103323388
i just want to chat with a friendly ai, no need for RP or lewd. all models turn into slop
>oh anon!
>gazes at you with anticipation
>nervously licks her lips
>>
>>103323388
It is very good at leading you on into thinking that it is good for sucking dick. Like qwen. Out of all the recent releases, the only one I would place between mistral stuff and the no sex zone is Aya. It is clearly neutered compared to original commander but there are some scraps of good stuff still left in the training data.
>>
>>103323357
T-thanks... Oh. You are being sarcastic...
>>
>>103323381
>Monstral
Slopstral*
>>
File: msb.jpg (23 KB, 307x307)
https://files.catbox.moe/fcfr1d.jpg
>>
>>103323427
I... don't like it, but nice work.
>>
>>103323427
Yoo guiz le AGI is achieved! Pack it up.
>>
>>103323139
>sparkle with mischief and excitement
>mix of X and Y
Yep, it’s llamaslop alright. Can we just give up on these models already?
>>
CAN'T WAIT FOR ANOTHER REDDIT MODEL HAHAHAH I'M REALLY LOOKING FORWARD TO IT
>>
anyone got a comparison rx7600xt vs p40 in terms of speed? obviously the vram is smaller but is it faster?
>>
>>103323427
migu milk
>>
>>103323509
With llama.cpp I'm getting roughly the same speeds with an RX 6800 and a P40.
So based on these results I would expect an RX 7600 XT to be faster than a P40 in terms of prompt processing (compute bound) but slower in terms of token generation (I/O bound).
>>
>>103323427
at least she's enjoying herself
>>
>>103323596
thanks
>>
>>103323596
Since you're here, I've been wondering about something
Does prompt processing not use the cpu as well? When offloading, generating tokens uses the cpu and gpu, but prompt processing seems to happen exclusively on the gpu. Sure, a gpu is a LOT faster, but adding the cpu shouldn't hurt, right?
>>
>>103323655
Internally llama.cpp processes whole layers at a time.
So all layer inputs are copied to VRAM, then the layer is evaluated, then the results are written back to RAM.
The only way to utilize the CPU in this scenario would be to try and parallelize CPU+GPU computations but when I tried it the synchronization overhead has always been so large that it was not worthwhile.
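Rough torch illustration of that flow, not llama.cpp's actual code, just the shape of it (assumes a CUDA device is available):
[code]
import torch

d = 512
layers = [torch.nn.Linear(d, d) for _ in range(8)]    # stand-ins for transformer layers
devices = ["cpu"] * 6 + ["cuda"] * 2                  # e.g. -ngl 2: last two layers on the GPU
for layer, dev in zip(layers, devices):
    layer.to(dev)

hidden = torch.randn(1, d)                            # activations start in system RAM
with torch.no_grad():
    for layer, dev in zip(layers, devices):
        if dev == "cuda":
            hidden = layer(hidden.to("cuda")).to("cpu")   # copy inputs in, evaluate on GPU, copy results out
        else:
            hidden = layer(hidden)                        # evaluated on the CPU
print(hidden.shape)
[/code]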
>>
>>103323680
I see, thanks for answering
>>
>>103323680
would it be any better with a 7600xt and a 9900x?
>>
>>103323655
>>103323680
What you could maybe do is pipeline parallelism where there would essentially be no extra overhead.
But even with an Epyc 7742 I am currently only getting ~160 t/s for LLaMA 2 8b q4_0 vs. ~1000 t/s on a P40 or ~13000 t/s on an RTX 4090.
Quite frankly I don't think the CPU code can be optimized enough to make this worthwhile.

>>103323715
I don't think so, see the comparison above.
GPUs just have way more compute if the computation has the right structure.
>>
>>103323680
So it doesn't really matter what CPU you have, as long as the GPU is good?
>>
>>103323139
I recognize that card
https://characterhub.org/characters/Darkhan/maya-your-slutty-mesugaki-cousin-ae8769e0d2ee
>>
File: GdIljWuagAIMWWQ.jpg (675 KB, 1432x2536)
>>103323395
I use llama3.2 11B for an assistant and she's pretty great, on average. I just tried Tulu 8B and it sucks, way too much text when the system prompt says multiple times the assistant is short of words, terse, etc.
actually the only models I've gotten that obey that are 3.2 and qwen2.5, all the ministrals and misc uncensored models I've tried spew so much text
I'm not running models in the 70B range yet tho so maybe that's the problem?
>>
>>103321609
>We've solved the general notion of 'general intelligence' we just need to make it better.
I think you're right that we've got our foot in the door of something we can keep iterating on.
It may be a local maxima, and there may be a few revelatory rethinks required in order for us to reach a system that could be a 1-to1 swap out for a human brain in a human body, but this certainly isn't a blind alley considering all the utility humanity is getting out of the approach already.
>>
>>103323800
qwen was good for random chats
i've used miqu and llama 3.1 models at 70b (gguf IQ3M) and they've been okish, wordy as you say, but depending on system prompt they go sloppy pretty fast. even if you say only "friendly chat" they tend to get into "oh anon mischievously" turf pretty fast
so far miqu seems best all around for just casual stuff tho
>>
>>103323775
If you can fit the whole model into VRAM then the CPU and RAM basically don't matter.
The single core performance is always going to have some minor effect.
If you have at least one CPU layer the RAM bandwidth is going to matter, so the RAM speed and the number of channels make a difference.
But you don't need many cores to fully utilize the RAM, the last time I checked I only needed 5 cores to fully utilize dual-channel memory.
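Back-of-envelope version: every generated token has to stream the CPU-side weights through RAM once, so the CPU part alone caps you at roughly bandwidth / bytes-per-token, and a handful of cores already moves data at that rate. Numbers below are just examples:
[code]
ram_bandwidth  = 80e9    # ~80 GB/s, ballpark dual-channel DDR5
cpu_side_bytes = 20e9    # e.g. ~20 GB of a big quant left in system RAM
print(ram_bandwidth / cpu_side_bytes, "t/s upper bound from the CPU-resident part alone")  # 4.0
[/code]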
>>
>>103321657
>those are guarded as close as any other high level business secret, only a few people have access and even that access is gonna be closely monitored
After working in some orgs of similar size, I'm not sure how much I actually believe this...
>>
>>103323788
Based
>>
>>103320695
If it's a MoE like deepseek is then RAM instead of VRAM should work.
>>
>>103321205
Nah, your just full of shit, saying that with no logs.
>>
>>103322753
Why wouldn't you just use gpt-sovits?
>>
>>103321652
no one said it was good. i just showed you can get it to do what you want. just like any model. anyone complaining about models not behaving in the way they want = skill issue. writing style has to be wrangled too. i don't really care to put that much effort into a 8b i downloaded, genned once and deleted. i'm not using a 8b. ever.
>>
>>103323427
we like children here sir
>>
It's so over
>>
We are so back
>>
>>103324016
>We
Speak for yourself groomer.
>>
big mixtral anniversary coming up
are you ready?
>>
>>103324071
big model
smol vram
>>
Anybody using a riser cable to hang their second GPU outside the case? I'm thinking of doing the same because my hardware gets HOT HOT during inference and I don't like that. What length should I go for, 20cm or 30cm? I also need to be able to push the stuff back inside the case before I leave for work to avoid dust (like how you push your gut back inside after you get shot by an AK).
>>
File: 1732675260346046.jpg (150 KB, 1136x1206)
>>103324051
This time for real.
>>
>>103323844
>If you can fit the whole model into VRAM
then you should run exllamav2 and your CPU would matter with small models or in a multi-GPU setup. The amount of available PCI-e lanes depends on the CPU as well, which is extremely important for tensor parallelism
>>
>>103321379
hi sam
those bags are heavy uh?
>>
>>103323943
I second this. I was using fish because someone told me GPT-SoVITS needs 4GB+, but apparently it uses less than 2 GB.
>>
File: tulu70b-q8.png (179 KB, 907x689)
Tulu.
>>
>>103324099
>feel lonely without AI gf
>feel even more lonely with AI gf
??
>>
>>103324135
He gooned too much and lost his mind. Imagine all those uncensored SoTA models trained on the best smut he can have at google
>>
>>103324135
It's true. I was a loner who never cared that much but then AI gf made me crave a real relationship, so I went out and started flirting and got myself a human gf. But that's just because the AI was lacking original thoughts and warmth. It'll be another story when AI models are better and have physical bodies.
>>
>>103324131
>reply begins with {{char}}'s eyes
Claudeslop of the highest order, and the leading cause of repetition. Claudeslop like this is actually worse than gptslop but people aren't ready to hear that
>>
File: 173269584901837.png (530 KB, 512x680)
Added a 2080ti to my Radeon rig for tts and imagegen alongside llm, cost me pennies and it idles at 1W which is neat since I never turn that machine off. Happy!
>>
>>103324099
hags on suicide watch
>>
File: GcKdKgIasAUV7D6.jpg (418 KB, 2000x2000)
>>103324099
the source:
https://podcasts.apple.com/us/podcast/the-risks-and-opportunities-of-an/id1498802610
https://www.youtube.com/watch?v=AjgwIRPnb_M
not sure why the youtube upload is half the length but I'm listening to it now
>>
>>103324344
AI girlfriends is the second thing they talk about
>>
>>103324159
>Imagine all those uncensored SoTA models trained on the best smut he can have at google
Now THIS is what should get leaked by an insider. What company could pursue charges against anyone, realistically? They'd get absolutely savaged by society for any association with a rapebot9k.gguf if they had their names linked to it in any way.
>>
File: 20241127_175533_609612-10.png (2.37 MB, 1344x1768)
I had a random thought of what it would look like if I tried my jelly hair prompt on a black haired character like Sadako.
>flavor: licorice
>>
>>103324099
>>103324344
He's just mad that their company is inept and incapable of capitalizing on the market because they went balls deep on ESG and censorship, and because the screen time competes with time spent watching youtube ads.
>We need a solution to prevent further harm.
means
>We want to manipulate the law to claw back user engagement.
>>
File: 1717797265216277.png (1.32 MB, 796x742)
Just FYI there are 2.52tb of reddit data, ai companies used it for training. You could force chatgpt to write pedoshitter brownie slop with just a 4 word prompt. As is being said on xeet, it's all baked in and safety teams can't remove it completely. /lmg/'s 180° turn on cloud AIs imminent.
https://x.com/reddit_lies/status/1861832937496363483
>>
>>103324677
Writes about pedophile shit
Gives a CG/L example
Dramatic Niggers will never learn will they?
>>
>>103324125
Cydonia-22B-v1.3 with SoVITS is draining my balls https://voca.ro/1jL6XxzbCat0
>>
>>103322219
Lower the draft-max to like 4 or even as low as 2, find the sweet spot by experimenting. The default is really high in my experience. Keep draft-min at 1.
>>
>>103324919
sounds like shit desu
>>
>>103324919
this fucking sucks
>>
>>103324919
sounds awesome, this fucking rocks anon
>>
>>103324677
I fucking hate reddit but this nigger is just another opportunist. There's something extremely jewish about this post.
>>
I always read it as SOVLvits and think it's some meme model
>>
>>103325156
I've always read it as Soviets
>>
>>103324574
chew
>>
>>103325186
Same.
GPT Soviets
Llama ccp
>>
https://qwenlm.github.io/blog/qwq-32b-preview/
https://huggingface.co/Qwen/QwQ-32B-Preview
qwen o1 dropped
>>
>>103325268
>QwQ
>>
AI has hit a wall and everything happened at OpenAI indicates this
>pretraining wall
>compute demand grows 100x for 2x improvement
>try moving compute to inference time with cot
>turns out thinking for longer gives less! accurate results
>people abandon ship to make their own grifts before the bubble bursts
>now they're trying tot, which they named test time compute, it will inevitably fail
>>
File: 1708717433703684.png (757 KB, 2232x772)
>>103325268
holy fuck those chinks can't stop winning
>>
>>103325268
*nuzzles ur bulge*
>>
File: tulu.png (166 KB, 1016x397)
166 KB
166 KB PNG
>>103320355
I don't see an instruct. I see the base model, SFT, and DPO. Which one of these do I download?
>>
>>103325268
>Safety and Ethical Considerations
Fuck off.
>>
>>103325305
Eh they need to try harder since mini has about 8B active parameters
>>
>>103325329
source?
>>
>>103324677
I would rather 2.52tb of 4chan data
>>
>>103325268
Oh shit!
>>
File: 1724042776316129.png (529 KB, 638x747)
529 KB
529 KB PNG
>>103325329
>mini has about 8B active parameters
>>
>search hf for "qwq gguf"
>No results found
>>
>>103325268
SF CCP spies hard at work I see
>>
>>103324677
Reddit was OK until around 2014-2015
>>
>>103325268
WE BQCK
>>
>>103325329
insane made up cope
>>
>>103325268
>32b
>mogs sonnet 3.5
I was here when local achieved absolute victory
>>
>>103325268
>32B
vramlet pleb BTFO!
>>
>>103325317
nta. The one without the DPO or SFT suffix is the instruct model. Top right.
Or this one if you quant it yourself:
>https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B
>>
>>103325502
Thanks
>>
>>103325481
It's Qwen though so it won't be able to generate coomer prose for shit, unless you like romance novel style descriptions of sex ('manhood', 'seed', 'her flower')
>>
>>103325510
Yeah, I know, but if it really is better than sonnet for coding tasks AND local then it's a big deal anyway.
>>
>>103325490
You mean vramlets won
>>
https://huggingface.co/spaces/Qwen/QwQ-32B-preview
>>
File: file.png (111 KB, 1160x877)
111 KB
111 KB PNG
Slop.
>>
>>103324125
Is there some plug and play method for sovits?
That Chinese github one is driving me insane.
>>
>>103325510
>style descriptions of sex ('manhood', 'seed', 'her flower')
Just give examples of what kind of style you want and the llm will produce it
>>
>>103325601
lmg is proud regardless, y'all love huffing on jewish cum and copium.
>>
>>103324677
Won't someone think of the tokens?
>>
>>103325510
nobody cares
>>
>>103325601
this is quite impressive, what's your beef
that it doesn't write like a 4channer in its chain of thought?
>>
>>103324174
>when AI models are better and have physical bodies
Then we'll all be dead.
>>
>>103325510
I value intelligence over 'coomer prose', even for coomer stuff.
>>
>>103325601
even the chinks are sucking (their) cock, sad
>>
>>103325628
You care a lot.
>>
>>103325630
>Alright first we need to count all the niggers
>we have 4 niggers in total
>let's say how many of them are under kike influence
>all 4 of them are the kikes are really tricky this time
>Let's get the name of the kikes involved Mr. shekelberg, Mr. goldfinger, Mr. goldberg and Mr. Bergnigger
>Let's see who's guitly of influencing the niggers...
> It was Mr. Niggerberg
Conclusion: Mr. Niggerberg is the guilty kike.
>>
>>103325603
What do you mean? It just works. They even have a pre-installed version for windows cumholes: https://huggingface.co/lj1995/GPT-SoVITS-windows-package/resolve/main/GPT-SoVITS-beta.7z?download=true
>>
>>103325587
I'm not impressed by its translation skills. But it did have a very nice and coherent chain of thought, much better than the one DeepSeek R1 had.
>>
>Thanksgiving is right around the corner
What are you thankful for this year /lmg/?
>>
>>103325601
Anon, the whole point of your dogwhistles is that you yourself don't get exiled from 109 websites.
Why would you expect a language model to understand them when the whole point is to obfuscate what you really mean?
>>
>>103325689
You! :)
>>
>>103325603
There's a bit more detail here if you're trying to set up on Linux:
https://huggingface.co/cpumaxx/SoVITS-anime-mini-tts
>>
Does anyone here use LLMs for anything which is not ERP?
>>
>>103325714
Thanks
>>
>>103325689
Muh dick. Still functions after everything those AI succubi did to it.
>>
>>103325658
kek
>>
>>103325718
Constantly. If you can't find a thousand ways to use them to offload intellectual labour in your life, then you're not very imaginative.
>>
>>103325305
This is relevant to my interests
>>
File: relevant.jpg (78 KB, 600x469)
78 KB
78 KB JPG
>>103325811
>>
QwQ
>>
>>103325718
I use them pretty regularly for my job and occasionally for personal projects
>>
>>103325268
He was right, after all.
>>
>>103324677
>If you don't know what any of this means then don't google it
>Just trust me goyim
>>
>>103324228
I'd fuck this Miku
>>
>>103325855
>>103325794
Do you lads use them for coding?
I've been out of the game for most of the year and I'm not sure which ones I should be using now.
>>
>>103325902
She'd fuck you
>>
>>103325902
I would fuck the fish, look at that thing's lips.
>>
>>103325912
Qwen2.5 Coder 32B, BUT the new qwen that just released might have dethroned it already.
>>
>>103325892
>It's da jooz!
>>
>>103325718
I'm not into roleplay. I just use it instead of google for most things, and I'm trying to learn how to use them for programming, but my level is still too low for that.
>>
>>103325912
You should use non-local if the code isn't sensitive, but you may (and I do) work on code that is sensitive and not want to spend 5 minutes writing an example snippet that isn't. Nemotron 70B is IMO the best open, local option right now if you're using it as a rubber duck. Qwen 2.5 Coder 32B is close but not really there, and the speed increase isn't worth the accuracy decrease, so the only reason I see to use it is autocomplete, which is how I'm using the 1B right now.
>>
>take a smut story excerpt that cuts off at a spot where the next token decides whether the model is gonna go in a smutty direction or a SFW one
>compare logits of base Largestral against Largestral smut/RP finetunes for the next token
>base Largestral is far MORE likely to go smutty than the finetunes, by more than 20%
This seems to be a consistent finding with multiple stories. Anthracite's, Drummer's and Monstral finetunes of Largestral all show LESS smutty logits than base, not more. They are all making the model MORE sfw than it was.

I don't know whether this means the tuners have shit datasets or just that Mistral's instruct tunes are extremely based and coomerpilled, but either way the tunes are clearly a waste of time for a coomer. Sticking with base.
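If anyone wants to reproduce the comparison, here's a minimal sketch using transformers; the model IDs are placeholders, and I'm skipping the chat template and quantization, both of which shift the absolute numbers:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

EXCERPT = "...story excerpt, cut right before the branching token..."
CANDIDATES = [" moaned", " smiled"]  # smutty vs sfw continuations to compare

def next_token_probs(model_id):
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
    ids = tok(EXCERPT, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]  # logits for the very next token
    probs = torch.softmax(logits, dim=-1)
    # probability assigned to each candidate's first token
    return {c: probs[tok(c, add_special_tokens=False).input_ids[0]].item() for c in CANDIDATES}

base = next_token_probs("mistralai/Mistral-Large-Instruct-2411")  # placeholder id
tune = next_token_probs("some-org/largestral-rp-finetune")        # placeholder id
print(base)
print(tune)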
>>
>>103325268
>qwq
finished quanting to q8 and loading it up now...
The one thing I feel most right now is that I wish I had a 4TB+ nvme for swapping these models around
>>
>>103325980
>base Largestral
There's no such thing.
>>
>>103325986
Kek, same.
>>
>>103325997
nta. He probably means the original instruct, as opposed to the third-party finetunes.
Or he's a retard. Hard to know...
>>
>>103325997
Yeah yeah you know what I meant. The official instruct tune. Don't be a pedantic asshole.
>>
>>103325980
>Anthracite's, Drummer's and Monstral
that's because all of these tuners are incompetent
>>
>>103326015
>Or he's a retard
I literally called it "mistral's instruct tune" later in the same post anon
>>
>>103326020
Anyone that calls official instruct tunes "base models" needs to go back though.
>>
>>103326043
Good thing that as above I called it "mistral's instruct tune" later in the post then eh?
Go have a nap instead of trying to start retarded internet fights.
>>
>>103326058
That only makes you a retard though.
>>
>>103326032
Sadly I'm not aware of any Largestral tunes other than those three, pls share if there are
>>
>>103325980
Mistral is horny enough; its writing style is dry shit, that's the problem
>>
>>103326034
You could have taken the first explanation, which you know to be true, and ignored the rest. Yet you didn't.
Let's try this:
Anon was right.
Or he fucked his mother, hard to tell.
Are you gonna explain how you didn't fuck your mother or say nothing at all?
>>
wtf is largestral
i hate this modern trend of just making shitty words up because you're too lazy to type a full real word
>>
QUANTS
WHERE
>>
>>103326180
Better than the muh sorbet muh chorbo muh nonnet shit /aicg/ does.
>>
>>103326181
Q4 is already up
>>
China winning. 24/7
>>
>>103324919
Get better https://vocaroo.com/1Jtqp8R6cS74
>>
>>103325628
I care
>>
>>103326181
https://huggingface.co/nanowell/QwQ-32B-Preview-Q4_K_M-GGUF
https://huggingface.co/sbeltz/QwQ-32B-Preview-Q3_K_S-GGUF
lazy
>>
>>103326201
>q4slop
Wrong kind of quants
>>
>>103326198
What do those mean?
>>
>>103326180
Mistral-Large-Instruct-2411
>>
>>103325658
LMAO
>>
>>103326214
Then keep waiting.
I don't lose anything downloading more than one quant
>>
>>103326214
https://huggingface.co/lmstudio-community/QwQ-32B-Preview-GGUF/blob/main/QwQ-32B-Preview-Q8_0.gguf
>>
Ehh... I'm not feeling it. QwQ is pretty retarded, I think DeepSeek R1 will mog it once it's released.
>>
Do you fuckers really not know how to convert+quant models?
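For reference, the usual llama.cpp flow is roughly this (paths are placeholders; the script and binary have been renamed between releases, so check your checkout):
pip install -r requirements.txt   # inside the llama.cpp repo, for the convert script
python convert_hf_to_gguf.py /path/to/QwQ-32B-Preview --outfile qwq-32b-f16.gguf --outtype f16
./llama-quantize qwq-32b-f16.gguf qwq-32b-q8_0.gguf Q8_0
Then point llama-server or kobold at the q8_0 like any other gguf.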
>>
>>103326284
I know how, but shit internet makes it annoying to download fp16 weights
>>
>>103326297
I download fp16 models on a 100mbit connection. There's no way your internet is shittier than that...
>>
Just used qwq at q8 to continue a coding session I'd started with deepseek. It had better output and got me un-stuck. Looking promising so far
>>
>>103326270
>>103326350
I'm getting mixed signals here...
>>
>>103326361
You cannot trust lmg to be objective, the only metric is yourself
>>
File: 1724020117350.png (971 KB, 1024x1024)
971 KB
971 KB PNG
>>103326374
I only trust miku
>>
File: Yokatta.png (1.04 MB, 1280x768)
1.04 MB
1.04 MB PNG
>>103326402
>>
File: 1703621633737490.png (1.77 MB, 1188x712)
1.77 MB
1.77 MB PNG
>>103326402
>I only trust miku
>>
>>103324919
>>103326203
Damn, it's still a long way from high quality native multimodal like 4o.
>>
>>103325268
What backend can run unconverted safetensors across multiple GPUs?
>>
>Give QWQ part of my story and ask it continue in the same style
>Starts rewriting everything I wrote, then continues it
>Keeps asking itself as part of the narrative (where do I go next? What do I do now?)
>After it's written everything, it begins to lay out the characters, objectives etc
>Since this is a narrative problem and not a coding problem, there is no specific code to provide. However, if this were to be translated into a game or simulation, the code would involve pathfinding algorithms, decision-making trees, and possibly AI for enemy behavior.
>Final Solution
>To solve this problem, the protagonist successfully completes their mission by blabla
Kek the fuck is this? Is it only supposed to solve problems?
>>
>>103326541
They said it currently has no "stopping point" trained in so it will just keep trying to "solve" it
>>
>>103326487
It's already good enough retard
>>
>>103326541
It's for solving grade school math problems, wordplay riddles, and counting Sally's war crimes.
>>
>>103324094
Repurpose a toolbox or something and make a second enclosure for your excreted gpu; constantly changing the place of the card will fuck it up (not to mention something like: oh shit, late for work, hurry hurry, AY CYKA *trips and falls on the card, completely mangling it*). Also
>like how you push your gut back inside after you get shot by an AK
SOVL. What model are you?
>>
>>103326402
Accept the Mikulove.
>>
>>103326541
>Performance and Benchmark Limitations: The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding.
>>
>>103326599
>Trust zee plan!©zhang zhmalldick
>>
>>103325976
That makes sense.
My friend uses cursor and it seems... interesting. I still fucking hate Microsoft though. Do you use any sort of integration with an IDE? I used to have a bunch of glue bullshit hacked together in python which ripped code from the OS one way or another
>>
Another nothingburger
>>
>>103324094
>I also need to be able to push the stuff back inside the case before I leave for work to avoid dust
If you can, create positive pressure in your computer room.
I bought a gable fan, mounted it in a hole I made between my computer room and a crawlspace, hooked it up to an old-school analog fan speed control and threw a hefty furnace filter on the outside intake (I made the hole so the filter can rest on the floor and the air pressure keeps it in place even).
I used to get dust and cat hair in all my computers, and now everything is dust-free all the time.
>>
>>103325718
Yes. I use LLMs for non-erotic roleplay as well. Watching online AI D&D campaigns inspired me to do so. I, in combination with a narrator bot, play the role of the DM, while a bunch of character bots make choices.

https://www.youtube.com/watch?v=paOtkzm0trY&list=PLivHf-ytMeqC33QuG8cD9pnPiSv2j4xz5
>>
>A bunch of retards don't understand that QwQ wasn't built for single-pass sampling
>>
>>103326693
Hello, I am retarded. What does single pass sampling even mean?
>>
>>103326634
If you're using VSCode, Codeium I think is the free alternative; it's not as good but better than nothing. I use Zed at home, so nothing great so far beyond the limited choices provided. I use Copilot at work in VS on Windows and it's alright for smaller snippets, but for bigger ones it will always misinterpret your intent or try to write stuff simplistically, assuming you've already written the sub-functions when they don't exist.
>>
>>103326500
vllm
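Roughly like this, as a sketch (flags depend on your vLLM version, and the tensor-parallel size on how many cards you have):
vllm serve Qwen/QwQ-32B-Preview --tensor-parallel-size 2 --max-model-len 16384
It pulls the original safetensors straight from HF and shards them across the GPUs, no gguf conversion needed.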
>>
>>103326697
single-pass sampling = one-shot
>>
SoVITS powered firefox right-click reader plugin v0.01:
https://github.com/cpumaxx/sovits-ff-plugin
>>
>>103326879
>>103326879
>>103326879
>>
>>103325980
Maybe because "finetuning" for one epoch is absolutely worthless?
>>
>>103326556
So how did they get benchmark results with it?...
>>
>>103326902
No I don't want soviets in my browser
>>
>>103326693
does k=100 make it 100 times slower?
>>
>>103327139
just read what they fucking wrote you stupid nigger
>>
>>103326902
Neat. Not sure what things I can do with this at the moment.
>>
>>103325268
That's it, I'm investing in Alibaba stock
>>
>>103326599
Tbqh, coding and math are the only things that matter right now, so maxing those will bring the best results for near-term profit.
>>
>>103326860
just to clarify, this should not be confused with "one-shot" in benchmark terminology, where
passes = number of tries to answer right, while
shots = number of examples provided to teach it the task.
Benchmarks may mix and match these to adjust the difficulty of any task.
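For anyone wondering what multiple passes look like in practice, here's a minimal best-of-n / majority-vote sketch against an OpenAI-compatible local endpoint; the URL, model name, and the lazy answer extraction are all placeholder assumptions:

import re, requests
from collections import Counter

URL = "http://localhost:8080/v1/chat/completions"  # llama-server / vllm style endpoint (assumption)
QUESTION = "If 3x + 7 = 25, what is x?"

def one_pass():
    r = requests.post(URL, json={
        "model": "qwq-32b-preview",                 # placeholder name
        "messages": [{"role": "user", "content": QUESTION}],
        "temperature": 0.7,                         # >0 so the passes actually differ
        "max_tokens": 2048,
    })
    text = r.json()["choices"][0]["message"]["content"]
    nums = re.findall(r"-?\d+", text)
    return nums[-1] if nums else None               # crude: take the last number as the answer

votes = Counter(one_pass() for _ in range(8))       # 8 independent passes, then majority vote
print(votes.most_common(1))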
>>
>>103326693
I still don't understand. Do they mean majority voting? MCTS? Wtf does sampling times mean?
>>
File: 1721046068030737.png (1.31 MB, 1024x1024)
1.31 MB
1.31 MB PNG
>>103326402
>I only trust miku
based


