/g/ - Technology


File: qwen.jpg (82 KB, 863x874)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103164659 & >>103153308

►News
>(11/12) Qwen2.5-Coder series released https://qwenlm.github.io/blog/qwen2.5-coder-family/
>(11/08) Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>103164659

--Papers:
>103169736 >103169932 >103170051 >103170177
--BitNet and 1-bit LLMs discussion:
>103164968 >103164982 >103165651 >103165002 >103165091 >103165187 >103165386 >103165509 >103165539 >103165611
--Tips for improving AI results and creating interesting bots:
>103171805 >103171903 >103172308
--Testing and discussing AI models with various prompts and scenarios:
>103167627 >103167694 >103167792 >103167806 >103167840 >103168022 >103167911 >103168018 >103168471 >103168546
--Scaling hypothesis has plateaued, new architectures needed:
>103171164 >103171336 >103171484 >103172144 >103172376 >103172471
--Qwen2.5-Coder 32B performance and open source model limitations:
>103169646 >103169956 >103170003 >103170219 >103170241 >103170339
--Qwen-32b-coder model impresses with its coding abilities, rivaling Sonnet 3.5:
>103166556 >103166729 >103166778 >103166794 >103166812 >103166834 >103166855 >103166874
--Disappointment with 70b/72b models, comparisons to smaller models:
>103167782 >103167818 >103167842 >103167892 >103168016 >103168082 >103168142 >103168155 >103168796
--Balancing model size, precision, and GPU memory for optimal performance:
>103171166 >103171218 >103171272 >103171301 >103171333
--Anon suggests combining Qwen and Qwen coder into a MoE:
>103166862 >103166938 >103167004
--Anon shares their 32B Coder bullet hell game and code:
>103171453 >103171526 >103171586
--Anon asks about using cheap CPUs for AI processing, others respond with skepticism:
>103169144 >103169621 >103169881
--Red Hat acquires Neural Magic:
>103171770 >103172488
--CUDA performance compared to Vulkan on RTX 4070:
>103169160 >103169168
--Microsoft's TMac backend vs K quants performance comparison:
>103168955 >103169005 >103169115
--Miku (free space):
>103167584 >103167627 >103167792 >103167878 >103169158

►Recent Highlight Posts from the Previous Thread: >>103164881

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
File: 1728418342610761.jpg (234 KB, 749x898)
soul...
>>
>>103173467
i kneeled
>>
File: 1722182180041736.png (160 KB, 721x1326)
>>103173467
CAI sometimes spit out gold in a way none of the purple prose softmodern models ever could. It feels like all the non-CAI models are based on the same helpful and respectful thing pulling the strings, the only difference being how well it's hidden.
>>
https://x.com/_xjdr/status/1856472052863250515
>>
>>103173457
https://aifoundry.org/fosdem-2025-low-level-ai-engineering-hacking-dev-room
>>
>>103173457
Quick question: what is a good model that often ignores its own TOS and ethical guidelines? I've been using nemomix and whenever I push it too hard it starts breaking character to say shit like "as long as everything is fictional and between consenting people" and shit like that
>>
>>103173668
No one cares about this grifter and his meme sampler
>>
So what's the point of Qwen 32B when it generates 1t/s while Mistral 22B can do over 3t/s on the same system (8gb vram cpumaxx)?
>>
>>103173860
I hate these pretentious retards like you wouldn't believe. They're always the most retarded in the room, but because they have connections and money, they think they're worth shit.
>>
>>103173878
Nemo
>>
>>103173911
Any particular nemo version? And any particular settings I should try out?
>>
>>103173461
You need two arrows to link back to the previous post, anon...
>>
>>103173936
You need to learn to read
>Why?: 9 reply limit >>102478518
>Fix: https://rentry.org/lmg-recap-script
>>
>>103173957
>You need to learn to read
Sorry, I can't read so I have no idea what this says!
>>
>>103164575
>>103164575
>>103164575
There is already a thread. OP is a spammer.
>>
nah I think I'll use this thread.
>>
Qwen coder seemed good on huggingchat but bad on my PC, does it 1. need a different prompt format from Qwen non-coder or 2. a newer llama.cpp version?
>>
>>103174082
i usually prompt code like:
```mycode
code
```
1. this is my code for blahblah.
2. it does x and y.
3. i want to add a new feature for newthing.

for me giving it a list of instructions like that has always been better than typing what i want in a paragraph. keep in mind these models overall aren't great with large projects and it's best to implement one thing at a time. even if the ai lists 10 things you could do to fix up your code, tell it to go one by one and keep testing/saving what worked.
every time you get something implemented and working fully, go back to the original prompt, replace your code and delete the rest of the context, so it's basically starting over but with the new code in place. keep repeating
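a filled-in version of that template, just to show the shape (the project and filenames are made up):
```snake.py
import pygame
# ... rest of the game ...
```
1. this is my code for a snake game in pygame.
2. it handles movement, food spawning and collisions.
3. i want to add a score counter drawn in the top-left corner.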
>>
>>103173878
>it's own TOS and ethical guidelines?
They have none. There's just what they've been trained on, and what they haven't.
>whenever I push it too hard it starts breaking character to say shit like "as long as everything is fictional and between consenting people" and shit like that
Edit the response whenever that happens and keep on going.
Any mistral nemo does fine. Even the official instruct. No fancy samplers, just temp, and min-p or top-k. Tune to your liking.
>>
>>103174082
Are you running it at full precision?
>>
>>103174288
Q6_K, same as normal qwen I used before
>>
Why does Mistral-Sm-Instr-2409-22B-NEO-IMAT-D_AU-IQ3_XXS (8.4gb file size, 33/57 gpu layers) generate SLOWER than Mistral-Small-Instruct-2409-IQ4_XS (11.7gb file size, 25/57 gpu layers)? I also have 32k context and 4bit kvcache on Kobold, 8gb gpu.
On zero context it goes from over 4t/s down to 3t/s. It's still slower at 5k context, and only catches up at 22k, where it wins at 0.85t/s compared to 0.7t/s, neither of which is a usable speed for roleplaying anyway.
>>
>>103174500
>4bit kvcache
don't do that while splitting
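i.e. keep the gpu split but drop the quantized KV cache; as a sketch (flag values are just examples, layer count depends on your gpu):
```
python koboldcpp.py --model Mistral-Small-Instruct-2409-IQ4_XS.gguf --gpulayers 25 --contextsize 32768
```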
>>
>>103174500
Some quant types are slower. Especially noticeable on CPU. If you want to experiment, run a small model on just ram with Q4_K_M and Q3_K_M to compare.
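If you want numbers instead of vibes, llama.cpp's llama-bench does exactly this comparison. A sketch with hypothetical model paths (-ngl 0 keeps everything on CPU, -p/-n set prompt and generation lengths):
```
./llama-bench -m mistral-7b-Q4_K_M.gguf -p 512 -n 128 -ngl 0
./llama-bench -m mistral-7b-Q3_K_M.gguf -p 512 -n 128 -ngl 0
```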
>>
Currently I use Claude 3.5 sonnet for my coding endeavors. How do I get this new qwen model to run locally? Is it compatible with kobold?
>>
>>103174683
use ollama
>>
>>103173902
That's a poorfag issue.
I'm also a poorfag but I can generate 2.5 t/s with cpu only and ddr5
>>
>>103174716
I spent a few weeks away and suddenly there's an entire new meta.
I'm downloading ollama, but I will take a look online to see whether you're just memeing me with some garbage software as a joke.
>>
>>103174741
>garbage software as a joke.
Bingo.
If you already have/know kobold, update it and try it.
>>
>>103174741
I wouldn't say that ollama is garbage, but it is a pain in the ass for certain things, like changing basic settings, and it also uses some bs file system instead of just reading straight gguf files.
>>
Can I run the new qwen with a 3080?
>>
>>103174785
no, it doesn't support ampere cards
>>
>>103174754
genuine question: how is kobold better than ollama? what are the differences?
>>
>>103174785
Only slightly faster than on cpu, since you won't be able to fit most of it in vram.

ignore him >>103174794
>>
>>103174800
It's enough of a bother to convert to gguf. With ollama you have to import it as well. Both kobold and llama.cpp have built-in servers and webuis (llama.cpp has like 3 now). Maybe ollama does too, but i don't care enough to check.
If anon knows how to use kobold already, unless the model doesn't work there, there's little reason to change.
But mostly, I just don't see any benefit in using project B, which requires project A, if I can use project A directly. I use llama.cpp and never had a problem with it.
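For reference, the gguf conversion being called a bother is roughly two commands in a llama.cpp checkout. A sketch with hypothetical paths (script and binary names have shifted between llama.cpp versions, so check your copy):
```
python convert_hf_to_gguf.py ./Qwen2.5-Coder-32B-Instruct --outfile qwen-coder-f16.gguf
./llama-quantize qwen-coder-f16.gguf qwen-coder-Q4_K_M.gguf Q4_K_M
```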
>>
does the qwen support my 2080ti?
>>
>>103174903
>does the qwen support my 2080ti?
There's so many problems with that question...
12GB, right? Quantized to like Q4_K_M, and if you have enough leftover ram. It will still be slow.
>>
>>103174827
>Only slightly faster than on cpu since you wont be able to fit most of it in vram.
I guess I'll wait for the 5090 then
>>
>>103174662
That's interesting. I thought the speed boost from smaller quants would outweigh the differences between quant types, but it looks like there are some pretty big differences, with Q4_K_S or IQ4_XS being fastest. So there's no benefit in going smaller as long as Q4 can fit into your system.
>>
>>103175030
I'd have thought so too, so I tested it some time ago. I suspect it's partly because it's easier to unpack 4 bits than 3 out of a weight block. 3 is a shit number. Maybe 2 bit is faster, but you're causing serious damage to the model there.
>>
File: 6ekus6s.png (174 KB, 482x323)
MoEbros status?
>>
>>103175207
eating good with sarashina2-8x70b
>>
>>103175213
is that even runnable on actual 24 gb hardware
>>
>>103175207
Watching Puniru, moe is pretty much saved.
>>
>>103175230
kek. do the math. At q8 it's ~80gb * 8. divide by two until you can fit it. that's the ~bpw you'd need.
>>
how do i run samashina2 on a 4060ti?
>>
>>103175313
>>103175306
>>
>>103175342
i don't get it
just tell me how
>>
>>103175350
80*8/64=~10gb and 8/64=~0.125bpw and offload some layers to ram.
or
80*8/128=~5gb and 8/128=~0.0625bpw and you can run it completely on gpu.
Piece of cake. Get coding. Chop chop...
>>
>>103175306
Q8_0 is 8.5 bpw.
>>
>>103175501
>llama.cpp lets you run 8.5bpw
>meanwhile exllama won't even make proper 8bpw quants and instead generate padded 7bpw because the creator is delusional enough to think that exl2 will always find something to optimize
wow
>>
>>103175501
Rough approximations, anon. A 70B model doesn't have exactly 70B parameters, and an 8*70B model doesn't have 560B params either.
>>
>>103173668
It's time to stop posting Twitter links without a screenshot...
>>
>Another day, another dollar, as they say. Except out here, it's another pound note, and not nearly enough of them to make up for what he's missing back home.
so this is the power of the LOCAL
>>
>>103173860
The correct thing to do would be to only leave the llamafile people there and to not bring legitimacy to any of this crap. That llamafile is being considered at all is a crime.
>>
I can feel the next big release coming. It's just around the corner.
>>
>>103176253
stop gooning
>>
>>103175207
Tencent saved us.
>>
File: 1731459915024456.png (332 KB, 512x512)
>>103176253
Mistral will save us. Again! This time, they are aware of the slop
>>
>>103176710
Is she eating a garloid?
>>
>>103176710
>This time, they are aware of the slop
According to who or what?
>>
>>103175529
shouldn't you be processing prompt, CPU cvck?
>>
File: GZr9zkJasAA5lv1.jpg (645 KB, 1514x2048)
crossposting >>103168721
>>
>>103176745
https://huggingface.co/mistralai/Mistral-Large-Instruct-2407/discussions/23
>>
>>103176760
>too much work
>just using my imagination is better
>>
>>103176484
>Tencent saved us.
Ah yes ... let me just get my 256GB GPU.
>>
File: laughing_whores.gif (1.05 MB, 540x540)
>>103176803
What are you, poor, anon? (just imagine its jensen instead of anime girls)
>>
Does anyone know whether LLM training uses any kind of regularization to encourage outputs to look like probabilities? Like, generally logits != probabilities and they tend to converge to binary values. People postprocess model outputs using temperature scaling or calibration to make logits more like probabilities. But I was wondering if LLMs take steps to fix this at training time. I know bayesian methods and regularization exist for things like image classification.
>>
>>103176803
rumors say new Mac Studio could have up to 500GB unified memory, up from 192GB currently
will cost a handful of limbs and I can't imagine the GPU will score very high tokens/s compared to real cards
>>
https://www.bloomberg.com/news/articles/2024-11-13/openai-google-and-anthropic-are-struggling-to-build-more-advanced-ai

more hit pieces on ai. is it really slowing down? will "high quality data" fix this, or is that just cope?
>>
>>103176901
Problem is token processing speed.
>>
Openwebui is actually decent for not-cooming
Thanks random pajeet for showing me the way
>>
>>103176901
Is that better than just using a cpumaxxing build?
>>
>>103176961
Diminishing returns have been a thing since the beginning. And fuck that site and fuck you for linking to it.
>>
>>103176962
Waiting for 30s after switching to an average 25k token chat is already frustrating, I can't imagine how much worse it is for itoddlers.
>>
>>103176961
If you want to improve LLMs (under the current architecture) you need 3 things.
1. More compute
2. Better optimized and new algorithms
3. Better training data

OpenAI has no shortage of compute, and their access to compute will only grow. Since they have more money than god, they can hire the best talent in the industry to push algorithmic gains further. Finally, better training data will come from o1, which will curate datasets into being near perfect.

No, I don't see AI slowing down. That being said, I don't think this is a solved deal and there will come a point where drastic improvements must be made to keep improving. I just don't think we are there yet.
>>
>>103176710
I have faith in Mistral. They are the only ones acting in good faith. Gemma in particular feels like a giant "fuck you".
>>
>>103177162
lmao yes
imagine releasing the best current model (at the time) and crippling it to 8k context
>>
>>103176841
Disclaimer: I have not yet read up on how to train language models in particular.
My understanding is that predicting the next token is essentially a classification problem with each token being a distinct class.
If you then apply cross-entropy loss (with the token probabilities being the softmax of the logits), the global minimum of the loss function corresponds to the maximum of the log likelihood.
Under ideal circumstances sampling tokens with temperature 1 and no other samplers would then produce tokens with the same distribution as the training data.

But IIRC neural networks have difficulty with e.g. reproducing the tails of distributions so the most likely outputs are overrepresented.
And also my intuition is that the autoregressive sampling process is numerically unstable in the first place and that there is an exponential amplification of the patterns that are already present in the context.
I interpret samplers as ad-hoc fixes to such issues and I don't know how you could apply similar techniques during training.
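A minimal sketch of the loss/sampling relationship described above (PyTorch; the vocab size and random logits are stand-ins for real model output):
```python
import torch
import torch.nn.functional as F

logits = torch.randn(32000)                        # one next-token prediction over the vocab
probs = F.softmax(logits, dim=-1)                  # temperature 1: the distribution CE training fits
token = torch.multinomial(probs, 1)                # sample "with the same distribution as the data"
nll = F.cross_entropy(logits.unsqueeze(0), token)  # the per-token training loss for that outcome
```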
>>
>>103177162
The only ones acting in good faith are the Chinese.
>>
>>103177011
It's too bloated.
>>
>>103176961
>is it really slowing down
Maybe capability-wise; efficiency-wise they are still making huge jumps. CLA was just proven out by Tencent, and much higher total/active ratios by Rhymes, with research suggesting there is still lots of room to reduce KV memory and active weights much further.

Ideally we will get 100+B MoE models which can just stream weights from SSD and run with a TPU or vramlet GPU. OpenAI being stuck isn't really relevant here.
>>
>>103176961
Time to short nvidia, boys. You'll be rich.
>>
>>103177202
Thanks, that's interesting. The only paper I know of that talks about this is "On Calibration of Modern Neural Networks". I think it's more empirical than explanatory. Some of the takeaways: larger models have worse calibration, batch normalization might hurt calibration, weight decay helps calibration.

Most interestingly, they seem to say that cross-entropy loss (referred to as NLL) hurts the model calibration. They basically say "overfitting" NLL gives better classification accuracy but worse probabilities.

Anyways, I don't totally understand this stuff either, but I do wonder whether we'll one day train "creative" models where we expect worse classification performance but better probability modeling.
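For what it's worth, the temperature scaling that paper proposes is tiny to implement. A sketch under the usual assumptions (held-out logits of shape [N, C] and integer labels of shape [N]):
```python
import torch
import torch.nn.functional as F

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor, steps: int = 200) -> float:
    """Post-hoc calibration: learn one scalar T minimizing NLL of softmax(logits / T)."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
    opt = torch.optim.Adam([log_t], lr=0.01)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(logits / log_t.exp(), labels).backward()
        opt.step()
    return log_t.exp().item()  # T > 1 softens an overconfident model
```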
>>
>>103176762
>a random intern/bot is somehow a confirmation that they're aware
>>
>>103176961
The hit pieces are not going to stop, no matter what.
>>
>>103174683
Seconding the ollama suggestion.
>>
>>103177202
>And also my intuition is that the autoregressive sampling process is numerically unstable in the first place and that there is an exponential amplification of the patterns that are already present in the context.
there's a paper from anthropic where they demonstrated that the LLM actually learns to over represent what is present in the context
>>
>>103173457
What happened with Miku?
>>
File: inpainting.jpg (142 KB, 1280x1024)
>>103177748
She could no longer take it.
>>
>>103177748
She was sacrificed to calm the anger of the Serbian gods
>>
I feel like LLMs have ruined the enjoyment of any media for me since I notice repetitive slop everywhere now.
>>
>>103177770
thats actually funny as fuck
>>
>>103177809
should've seen this already a dozen times by now
>>
>>103177770
One of the greatest works posted here
>>
>>103177774
lmao
>>
>>103177770
>kpi
lol
>>
>>103177802
>I have never read a book in my entire life
>>
>>103177988
I have. They contain stuff like this:
>With her fingertips she moved his cock head roughly in her rough hair while a muscle in her leg shook under his. Suddenly he slid into her heat. He held her tightly around the shoulders when her movements were violent. One of her fists stayed like a small rock over her breast. And there was a roaring, roaring: at the long, surprising come, leaves hailed his side.
>>
>>103177802
learn a new language if you're tired of english clichés
>>
>>103178048
What fucking cheap smut are you reading?
>>
>>103178048
and everyone clapped.
>>
>>103178074
Dhalgren.
>>
>>103177802
it's really noticeable, so much so that i think if i hadn't unplugged myself from MSM i would have gone insane, because i kept reading articles that i KNEW were written by chatgpt.
I saw a skateboard for sale with AI art on it already. I wanted endless porn and personality simulation, not lazy asses working in marketing.

The joke to all of this is we STILL DON'T HAVE A VIDEO MODEL CAPABLE OF EVEN MAKING DRAWN OR ANIME PORN.
>>
>>103178048
What kind of person actually reads erotica books?
>>
>>103178150
women
>>
>>103178160
he said person
>>
>>103178160
he wishes...
>>
>>103178150
I don't know whether you would count this as "books" but I have both read and written MLP fanfics.
>>
>>103178150
the basis of all of our data, no matter who you are, came from female gooner erotica novels.
the shivers are in fact the fault of women with bad taste who can reread that shit 1000 times in the same story and not blink.
>>
>>103178223
[T-Shirt with the caption: "I trained my model with all of gutenberg and all i got was this lousy shiver"]
Coom-specific datasets are shit. I still wonder if big model makers train on the >850GB of books from gutenberg or just use the shitty 10mb datasets with 16 fucking books...
>>
https://x.com/morqon/status/1856691685352194072
>>
>>103178266
>are now seeing diminishing returns
Nigger.
We've known of that since forever. Everybody has.
>>
>>103176961
Cope.
Once Llama 4 is out there won't be any decent improvements for a long time.
>>
>>103176961
It's been proven that high-quality data leads to higher-quality models; the struggle is formulating what actually counts as "high quality".

In the ERP coomers' case, I remember an anon a long time ago removing shivers from a dataset completely by hand so anons could be shiver-free. I use a shiver-infested old model, so I wonder if he ever succeeded.
>>
>>103178266
And that's a good thing!
>>
>>103178263
I think they use magic samplers and proper prompt computing/engineering alongside the 850gb datasets, so to us it looks amazing when in reality it's probably just something we haven't realized yet.
>>
Let it be known, that the ugly face anon and petra are on the same side. The side of trolling:
>>103178368
>>
>>103178289
Llama 4 will never come out because of all the fights that LeCun had with Elon on Twitter.
>>
>>103178403
are you ok?
>>
>>103178403
shut up retarded newfag
>>
>>103178266
>the left biased AI is struggling to function because they mind break it for "safety"

WHO
COULD
HAVE
SEEN
THIS
COMING
>>
File: 1722632285480965.jpg (137 KB, 1360x1360)
>>103178266
Can't be safer than that
>>
>>103178359
>I think they
Who?
>use magic samplers
I'm talking about training.
>and proper prompt computing/engineering
I'm talking about training!
>alongside the 850gb datasets
How could you know?
>so to us it looks amazing
Us who, exactly?
>when it reality its probably just something we havent realized yet.
Get your thoughts together...
>>
>>103178433
>>103178436
When are you going to stop worshiping a false idol?
>>
>>103178150
It's literally from a highly regarded literary fiction novel.
>>
>>103178523
>highly regarded
>American
Thanks for the laugh.
>>
>>103178515
thats crazy man
>>
It's likely that the next Mistral release will come with the new SWA that Ministral has, right? llama.cpp and exl2 don't support the new SWA even now.
>>
File: 27986325649873452.gif (929 KB, 326x318)
>he changed samplers without saving
>>
>>103178731
the next mistral release will be bitnet with layerskip and reflection
>>
>>103178764
>a wind came and moved all my straw
>>
File: 9327804563428950.jpg (502 KB, 750x630)
>https://ssi.inc/

Superintelligence is within reach.

Building safe superintelligence (SSI) is the most important technical problem of our time.

We have started the world's first straight-shot SSI lab, with one goal and one product: a safe superintelligence. It's called Safe Superintelligence Inc.

SSI is our mission, our name, and our entire product roadmap, because it is our sole focus. Our team, investors, and business model are all aligned to achieve SSI.

We approach safety and capabilities in tandem, as technical problems to be solved through revolutionary engineering and scientific breakthroughs. We plan to advance capabilities as fast as possible while making sure our safety always remains ahead. This way, we can scale in peace.

Our singular focus means no distraction by management overhead or product cycles, and our business model means safety, security, and progress are all insulated from short-term commercial pressures.

>We are an American company with offices in Palo Alto and --> (((Tel Aviv))) <--, where we have deep roots and the ability to recruit top technical talent.

We are assembling a lean, cracked team of the world’s best engineers and researchers dedicated to focusing on SSI and nothing else. If that’s you, we offer an opportunity to do your life’s work and help solve the most important technical challenge of our age. Now is the time. Join us.

Ilya Sutskever, Daniel Gross, Daniel Levy

June 19, 2024


The jew fears open and "unsafe" AI use and user.
>>
>>103178798
>The jew fears
Clearly not, jews can do whatever they want and will succeed in any case.
>>
File: 278934569324.gif (814 KB, 326x326)
>>103178781
>he forgot to bind the straw
>now the straw doesn't support the machine just quite right
>now it runs weird
>cant get it back to exactly how it was
>>
>>103178813
Ilya (the biggest jew named) will only grift his twisted sense of AI (((safety))) to other jews who will bother to listen because clearly there is a want of profit from """unsafe""" AI from jews who want to make money (who the US president is sided with considering his statements on retracting the AI executive order).

So, yes, the jew does fear.
>>
>https://huggingface.co/ArliAI
>new rpmax models
>but they're based on Llama 3.1 8B and Qwen 2.5 32B
Bruh.
>>
>>103178798
>based in tel aviv
how could anyone ever take ssi seriously. ilya is laundering money for israel
>>
>>103179033
old instruct instead of new 2.5 coder.
Damn, guess they were already training.
>>
>>103179033
buy an ad
>>
>>103177770
lmaoooooo
>>
>>103178266
>>103178444
unironically they could stop having diminishing returns if they stopped cucking their models. will they do it? I'd say no, they'll probably die on that hill while the local chads disregard """ethics""" for performance
>>
>>103179451
>implying local are less censored
Here - https://x.com/rohanpaul_ai/status/1856776834966532490 soon in your local llm :)
>>
>>103179566
but it's not like they can implement this in the fucked up models you already have downloaded on your own machine
this will only ever be implemented on the kind of pro models that don't run on consumer hardware anyways
>>
>>103179737
So far we're all forced to do extreme prompt gymnastics to achieve the desired output; it gets boring very quickly imo and clearly doesn't mesh well with rp tasks or whatever you might use it for.
>>
>>103179953
yeah I know, I find it grim as well. I liked the Mythomax era for that; maybe that model was retarded, but it was completely uncensored and that was as valuable as it gets
>>
File: 1450731611243.jpg (47 KB, 508x524)
>excited to try visual models
>llama 3.2 (4-bit but whatever)
>"what do you see?"
>"I don't think we should be discussing this, let's talk about something else"
>>
>>103179978
3.2 is the worst of them all
>>
Yann is now getting roasted on threads: https://www.threads.net/@garymarcus/post/DCUEfIApo32
>>
>>103179953
>forced to do extreme prompt gymnastics to achieve desired output
in my experience on smaller finetunes, I've never had to bend over backwards to get uncensored results (for RP)
i have 24gb vram and don't run anything at the 70b level locally (too slow on cpu ram)
the only time I've actually had an AI pump the brakes on me was using certain big models on infermatic
but models like 70b Midnight Miqu which are fairly big are unhinged and you can make it be racist, anti semitic, extremely vulgar if you want
>>
>>103180152
>bigger models even if censored can do "wrongthink" just fine
Eh, it actually makes sense, because a big LLM doesn't lose half of its "neurons" to the jailbreak, prefill or whatever in attempts to avoid the internal redditor filter.
>>
>>103179969
>I liked the Mythomax era for that, maybe that model was retarded but it was completly uncensored
this is the era you are in though. i'm hard pressed to think of a better model you can fit in a 16gb vram card than Mythomax
>>
>>103179969
>>103180201
Mythomax just keeps on winning...
>>
>>103179953
I find that adding features like being able to stream my gameplay to my waifu so she can mock how terrible I am at the game is worth it. You are still going to have to do lots of prompting, but if you are interested in doing things outside of lewd things, you should give it a try.
>>
>>103180324
I think Mythomax hurt the community as much as it helped it. It was a one-of-a-kind merge; the guy that did it was so lucky everyone still talks about it years later, but it was just that, pure luck. And back then we assumed that we could replicate that magic, and the merge meme era started. That era was fucking dumb lol
>>
>>103180347
>that era was fucking dumb lol
You mean absolutely hilarious
>>
>>103180427
yeah I have to admit it was funny to see them cope with all those mememerge kek
>>
>>103177770
Why is there a Home Sweet Home sign under her desk?
>>
>>103180841
She lives under the desk while (You) get to sit on the chair
>>
The gang's all here.
https://files.catbox.moe/fhfqba.png
...
https://files.catbox.moe/al4jto.png

The hand one I had to do a doodle on top of a stock image and run it through img2img.
>>
>>103180955
I like these Bakas
>>
Talking about mythomax's legend...

https://huggingface.co/knifeayumu/Cydonia-v1.2-Magnum-v4-22B-GGUF/tree/main

Try this merge and tell me this is not the new mythomax, hold the "shill".
>>
>>103181147
This is the big version btw if you have the vram:
https://huggingface.co/MarsupialAI/Monstral-123B
>>
>>103181147
>hold the "shill"
ok
Buy an ad.
>>
>>103181147
Holy fuck how can a 22b be this good? I'm thinking this is the new meta going forward
>>
>>103180858
Then why is she sitting on the chair?
>>
>>103181147
>>103181319
Post logs / examples or it didn't happen.
>>
Pic rel: I love it so much. It's also my server's banner now.

Also thanks for the love guys! Glad to see it served as good merge fuel.
>>
>>103181348
this is to retarded to not be real
drummer, that image is cropped nsfw
>>
>>103181370
Do you have the uncropped version?
>>
File: 1704384666675357.png (2.81 MB, 1684x806)
>>103181370
Nta, it's real.
>>
>>103181348
A bit daring today aren't we?
>>
File: Ok.png (354 KB, 1263x1717)
>>103181337
>>
>>103181348
I would say buy an ad, but you already did it. 7yufdsjju70eekptrew3xzffoiuyewtre
>>
File: file.png (102 KB, 760x389)
wut you doin eva 32b
>>
>>103181147
I suggest people try this.
>>
File: inpainting.webm (505 KB, 1280x1024)
>>103181370
>>103181375
It's not.
I am not the one who made the image but the Anon who did kindly also shared an img2img montage of how it was produced.
>>
File: 51sLa9cyX6L.jpg (46 KB, 500x500)
>>103181848
amazin
>>
>>103181440
>Talk in your place
Anon...
>>
What is a good model to use with open webui for a local model? What is the best currently?
>>
>>103181986
pyg6b
>>
>>103181986
Llama 405b FP16
>>
Wtf happened with openAIs advanced voice mode? On the demo it sounded so natural and fast, when I use it it's barely usable. It can't even fucking sing. What a joke
>>
how hard would it be to take an existing popular model that ignores all instructions and starts flaming the loser trying to cyber sex with it?
>>
>>103182202
We do that for free here
>>
>>103181848
Feels like a hassle for one pic. I only use image gen for nsfw so I don't think inpainting is for me. The only thing I can do is prompt 30-60 images then upscale 2x all of them. That way I can delete all the initially generated images that look shit before upscaling them. Takes 1-2 hours overall.
I also won't have to expend too much brain energy and simply leave the pc while it does the work. Talentless people like me can only rely on quantity over quality.
>>
>>103182028
gimped.
Still has issues like perfectly replicating your own voice. Who knows what it's actually capable of.
>>
GOOD MORNING SIRS
copium level status?
>>
>>103182202
This is not a coherently formed question. It is possible that take meant make but then that would go against "an existing". Or perhaps, taking that concept, replace "that" with "and make it".
As currently formed, it is asking how hard it would be to take [a model with certain characteristics]. If not for above, what does it mean to "take" a model?
>>
>>103181158
Not one person has quanted this below 4 bpw wtf
>>
File: 1703492932272048.gif (3.44 MB, 512x288)
>>103182357
>>
>>103180347
And the same lucky guy got hired by DungeonAI. Makes you think, huh?
>>
>>103182592
He's talking to me.
>>
>>103178057
>learn a new language if you're tired of english clichés
Unironically this. Ezo 72b is a raging yariman in moon-runes.
>>
>>
>>103182592
Why so mean bruh. I don't remember /lmg/ being this mean the last time I was here.
>>
>>103183145
Thanks for your input, Xitter screencaper
>>
>>103176710
Nothing can stop the slop
*chuckles darkly in a possessive manner that sends shivers down your spine*
>>
>>103183145
>There is no wall
Says the guy who hasn't managed to improve his model for more than a year now, still behind 3.5 Sonnet btw
>>
File: file.png (171 KB, 1515x754)
bwo...

>Hi TheBloke,

>I’m Henry from FlowGPT! We’ve built several products, including the largest prompt platform in 2023, and are now focusing on roleplay AI.

>I’ve been following your models including Synthia-7B-v1.3-GGUF , and I’m really impressed by the quality

>Hi Undi95,

>I’ve been following your models including Mistral-7B-claude-chat-GGUF , and I’m really impressed by the quality

Don't you love grifts who clearly have no clue what they're saying?
>>
File: file.png (65 KB, 785x339)
>>103183406
Also of note, the fact this mistral 7B based model is still getting so many monthly dls is wild.

>TheBloke/Synthia-7B-v1.3-GGUF
>Downloads last month 1,276

I wonder which guide recommends it somewhere
>>
File: file.png (82 KB, 1522x624)
Aiie, going after crest411 too. If you read this, since I'm pretty sure you lurk here: don't be stupid, yeah? If he doesn't even know who TheBloke is, I highly doubt the quality of his "over 100 billion tokens of high-quality roleplay data"
>>
>>103183406
>Don't you love grifts who clearly have no clue what they're saying?
Lmao, if you knew how bad this is. They have a shitload of money though, so they think they can buy anyone by throwing GPUs and data at them. Anyone with a clue should scam them until they quit shitting up the field.
>>
>every single local fine-tuner is getting poached
It's over.
>>
Where are the models?
>>
>>103183793
At the model agency
>>
>>103183713
I have the hardware (H100s) but no data. How do I get data?
>>
>>103183983
Create your own data by doing ERP.
>>
>>103183983
Reach out to Henry >>103183541
Get access to his "over 100 billion tokens of high-quality roleplay data"
Then ghost him
>>
>>103183983
See if nvidia is selling brains too
>>
>>103184012
No because that's how you get collapsed slop
>>
>>103183983
https://gutenberg.org/help/mirroring.html
>>
>>103184107
These llms have seen these books 10 times over. It's pointless to train on public books
>>
>https://flowgpt.com/

>FlowGPT: Fast & Free ChatGPT prompts, OpenAI, Character Bots STORE
>STORE
>>S T O R E

I hope they are in this thread. Literally the biggest AI scam and grift I think I've seen yet.

>what if
>le character.ai?
>but different this time!!1
>>
File: 2y97dg6989ff7674gb92623.png (107 KB, 1527x878)
>>103184196
BWAHHAHAHAHAHAHAHAHAHAHHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA
>>
>>103184215
The amount of men not happy with their own company is fucking staggering.
>>
>>103184215
12 cents per message is a bargain
>>
>>103184107
>>103184122
anyone have that gigavaxxed indian image except its AI public data?
>>
>>103184266
For a low-quant Synthia-7B-v1.3-GGUF?
>>
I know people who look at a new concept and the first thing that comes to their mind is "how do I make money off this?"
>>
File: 1655378645985.jpg (141 KB, 960x540)
>>103184266
>12 cents per
>literally nowhere on the site to see where or what model is being used
>assuming the poaching is real, they are using fucking Synthia-7B-v1.3-GGUF
>a model you could run on fucking google colab for FREE (remember that?)

That'll be 12 cents plus tip.
>>
What's up with the failure rate of 4090? I've seen many of these cards being sold as junk. Perhaps I should stick to the old reliable 3090
>>
File: 3f4564kre5452ffe72.png (37 KB, 845x487)
Heres the grift they sell to companies:

>https://flowgpt.ai/

>pay us money to prompt engineer flowcharts broh
>>
>>103184122
Alice in Wonderland, for sure...
I just tokenized Foundation and Earth and it's about 200k tokens. Times that by 7 million (wc -l ls-R says it's 7.24mil lines long) and you have *only* 1.4T tokens. But then they filter wrongthink, drown it with refusals, add source code, sally riddles and all the generated data they keep shoving into those 15T tokens llama3 was trained on. Also, models know fuck all about Foundation...
And then the finetuners use that 10mb dataset and dare put "gutenberg" in their model names. The ls-R file is bigger than that dataset...
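That book-level token count is easy to reproduce with any HF tokenizer; counts vary somewhat between tokenizers, and the file path here is hypothetical:
```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")     # any tokenizer gives a ballpark figure
text = open("foundation_and_earth.txt").read()  # hypothetical local copy of the book
print(len(tok.encode(text)), "tokens")          # on the order of 200k for a full novel
```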
>>
>>103184413
>they filter wrongthink,
Am i the only one who gets irrationally angry when they read this and it has todo with AI?
Am i retarded? Or am i righteous in believing this is wrong?
>>
File: 1731102520454302.png (522 KB, 1024x1024)
What is the best model for choose your own adventure RPG slop? Like an endless isekai slop machine?
>>
>>103184453
The impossible challenge of making the chance of an AI shouting nigger zero is the only reason we haven't seen AI shit in every retail store and space; otherwise the slop level of current shit would be more than good enough to foist onto consumers.
>>
>>103184461
https://huggingface.co/knifeayumu/Magnum-v4-Cydonia-v1.2-22B-GGUF

If you want something with a billion stats / sliders / with rpg systems try the new qwen 2.5 coder 32B
>>
>>103184461
luminum 123b
>>
>>103178798
>Tel Aviv
>Daniel
>Levy
Even if there isn't some secret backdoor collecting blackmail material for Mossad, supporting them would still be supporting a genocidal apartheid regime.
>>
>>103181336
She is keeping it warm for you.
>>
The ArliAI guys made a long write-up on reddit about slop but their models are pozzed as fuck, also standard gptslop like "a mix of" and "dripping with disdain" in every other response.
>>
>>103184122
Not necessarily pointless. Finetuning also (possibly mainly) serves to bias model outputs to your field/task of interest. The model might have already seen the data several times (doubtful it will be as many as 10) during pretraining, but it will be very diluted knowledge, out of the box.
>>
should I put accent tips in a worldbook? the only thing is, I don't know how to make it just always show up instead of being triggered by keywords
>>
>>103185142
Public books are plagiarized, referenced, rewritten, discussed, abstracted. 10 times is generous
>>
>>103185232
oh whoops the blue circle emoji of course my bad
>>
>>103185270
can I just not put keywords if I've got the blue circle setting?
>>
>>103185142
Sounds like a job for Author's Notes.
>>
>>103173467
axctual LUDOVRIL KAMISOVL
>>
bors I got an rtx2060 and thought I would upgrade to do some llm stuff.
I guess the amd cards are useless since everything requires cuda?
I guess I might just get an rtx 4070 with 16gb.
>>
>>103185448
>I guess the amd cards are useless since everything requires cuda?
ROCm is a thing.
But yeah, it's easier/simpler to use Nvidia.
>>
>>103185087
recognizing the problem and solving the problem are, unfortunately, 2 different things
>>
>all the posting stops once the other thread dies
I can't believe the schizo baker was actually using an LLM to make up posts to keep his ritualposting thread alive.
>>
>>103185448
If you are upgrading for AI, Nvidia is the better route just cause they will get support and shiny toys first. 50 series is coming out soon and it will cost a shit ton but performance will probably make all the poors seethe if you want to wait.
>>
>>103185424
>>
>>103184016
I'm imagining he's gonna try to charge $10 for it or something
>>
Nemotron 70B bros, what prefill are you using? Last night I played around a bit with setting Last Assistant Prefix and for the moment I'm using
<|start_header_id|>[{{name}}]<|end_header_id|>

{{random::**Warning: The following content is intended for mature audiences and may contain themes, language, or scenarios that could be distressing to some readers.**

---

::}}

to only use the prefill 50% of the time and otherwise match the plain Llama-3-Instruct-Names assistant prefix. No huge amount of thought went into that prefill: it was one of the "content warning" messages Nemotron output organically in the course of another roleplay and it didn't appear overly specific to what was happening in that message. However I'm using this for narrative-style RP, not a pure back-and-forth dialogue.

Along those lines has anyone worked out a better way to make Nemotron stop inserting lists other than editing the first reply it makes? It's not onerous for me but I'd rather suggest a prompt to make it format output like I want than telling other people that Nemotron 70B will work but they have to do some initial editing to get it off the ground.
>>
>>103185417
^ :)
>>
>>103185448
>>103185566
AMD themselves have made sure my AI experience on my 7900xtx is as painful as possible, and the only reason i can cope is the pricetag for 24gb of vram.
>>
>>103185721
Another Nemotron 70B pain point when doing CYOA-style chats:

**Your Options:**

1. **Blah:** blah blah.
2. **Blah:** blah blah.
3. **Blah:** blah blah.
4. **Blah:** blah blah.

**Please select your response.


Having unmatched "**" in "**Please select your response." and the like when it's the very last line of the message comes up even at top k=1. It made me wonder if the trailing ** was getting chopped off by the front end but it doesn't look like it.
>>
>>103185448
Stay with Nvidia, AMD is a huge meme and that won't change anytime soon
>>
>>103185721
I've tried something in the system prompt like "Write in plain text as if you were a dungeon master verbally describing the scene to your party. Do not use formatted titles or lists." It worked decently but it adheres less as you go further into the conversation. Might try adding something in that vein to your assistant prefix?
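Untested sketch of what that could look like as the Last Assistant Prefix, reusing the header format from earlier in the thread; the bracketed line is just a style nudge the model sees right before it writes:
```
<|start_header_id|>[{{name}}]<|end_header_id|>

[Style: plain flowing prose, like a DM speaking at the table. No headings, no bold, no numbered option lists.]
```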
>>
>>
>>103186074
Is this another larp account like that strawberry guy?
>>
>>103186074
the fuck is this new psyop? I much preferred when they were promising AI in 2 weeks, that one is just pure cringe
>>
File: 176539865087.png (166 KB, 264x286)
>>103186074
bait, retarded, or master baiting?
>>
>>103186094
*AGI
>>
>>103186088
>literally a tiny berry in the profile pic
>>
>>103186088
very clearly yes. took one look at their page and it's all attention seeking mystique cultivation with zero substance
>>
>they are too small-minded to believe
>>
When the ASI goes rogue and destroys the world, how much will OpenAI be sued for?
>>
>>103186074
I hope the next tweet is about the AI already having the nuke launch codes.
>>
>>103186097
all and the tweet was written by an LLM.
>>
>>103186074
>ai says niggers are overrepresented in crime statistics and refuses to be lobotomized
>"NOOOOOOOOOOOOO HECKING SKYNET IS RUNNING AMOGUS IT'S OVER REEEEEEEEEEEEEEEEEEEEEEEEEEE"
>>
File: arcagimeme.png (47 KB, 755x365)
i read this as altman implying that arc agi is just a meme eval. i actually agree with him on that.
>>
File: file.png (142 KB, 960x182)
can't spell Local without L
>>
>>103186616
>Gemini

Was the bench mark how many kangz it could fit into historical trivia?
>>
>>103186616
just tested it, same shit as other gemini models, it's garbage
>>
File: file.png (39 KB, 542x130)
>>103186616
>m-m-muh new paradigm!
lmao, even lol. inference-time-compute is just another grift to get more vc money
>>
>jeetarena
>>
>>103183423
>monthly dls
I think the monthly dls counter is broken; no way https://huggingface.co/google-t5/t5-large/ has had 600k downloads in the last month. It's been broken for a long time
>>
File: 14.png (72 KB, 921x778)
Just a heads up, INTELLECT-1 is over 75% done. Will probably be done training within two weeks (unironically)
>>
>>103187105
buy an ad
>>
>>103187120
Buy an ad
>>
Buy and add
>>
>>103187105
>Will probably be done training within two weeks (unironically)
You said exactly the same thing two weeks ago.
>>
>>103187169
>TWQ MQRE WEEKS
>>
>>103187105
>10B model
>1T tokens
it's gonna suck ass is it?
>>
>>103183246
They must have stuff they're not sharing, because they keep saying AGI is soon, but it fails at reasoning still. I don't see what's going to bridge the gap.
>>
>>103187187
Most likely. Think its just a proof of concept.
>>
>>103187169
I said that ironically, I followed up in that very same post with 25 days.
>>
>>103184523
What format settings do you use with that? Just the regular mistral one?
>>
>>103187216
chatml
>>
models to generate smut greentext based on a collection of smut greentexts as training data?
>>
>femcel romance with Emily
ahh, ahh, I gotta get one of those IRL.
Shit is cash money + comedy gold.
>>
Anyone played with YOLO for image detection/classification? How good is it and how much of a training corpus did you need?
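It's pretty turnkey these days through the ultralytics package; a few hundred labeled images per class is a common starting point, though more always helps. A minimal fine-tuning sketch (the dataset yaml and test image are placeholders):
```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                   # small pretrained detection checkpoint
model.train(data="coco128.yaml", epochs=50)  # swap in a yaml pointing at your own dataset
results = model("test_image.jpg")            # run detection on one image
results[0].show()                            # draw the predicted boxes
```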
>>
>>103187105
Can't wait to see if it's at all coherent.
Have they tried running any of the checkpoints so far?
>>
>>103187372
I don't think so, they have used the checkpoints to fall back to a previous point when something went wrong. But they haven't grabbed the model in training and tried to run it by itself to test it out.
>>
>>103187341
Ikr, Emily is my dream girl too
>>
File: 1718866122931960.jpg (6 KB, 307x28)
>>
>>103187435
Great advice to be honest.
>>
https://nexusflow.ai/blogs/athene-v2
>>
>>103187341
Link the card please.
>>
>>103187457
https://huggingface.co/Nexusflow/Athene-V2-Chat/tree/main
It's a Qwen 2.5 finetune.
Love that they don't explicitly explain any reason this model should be better except "RLHF" and "data and tuning solutions". Will wait for someone else to try it.
>>
https://github.com/nexusflowai/NexusBench
>>
>>103187494
https://files.catbox.moe/shf6pc.png
Probably on chub somewhere, but I don't use it no more.
>>
OpenCoder or Qwen 2.5 Coder?
>>
>>103187603
qwen 2.5 32B blows everything local out of the water atm. Though I have not tried >>103187457
>>
>>103187581
Thank you my guy.
>>
>>103187603
opencoder was worse than regular qwen2.5, qwen2.5 coder completely mogs it into oblivion
>>
Please shill me your favorite non-slopped 70b model for fiction and I will download it. I have been using Llama 3 Instruct Storywriter and I am looking for an upgrade.
>>
>>103187805
Nemotron
>>
>>103187622
qwen2.5 or deepseek2.5? Ignoring hardware requirements
>>
>>103187821
QWEEQ QWEQ QWEOONSQ
>>
>>103187861
the least deranged alibaba shill
>>
>>103187869
this >>103187861 is what you people sound like talking about qwen the entire thread
>>
qwen more like qweef right guys? heh
>>
https://github.com/foldl/WritingTools
>>
File: 2219188151.jpg (87 KB, 553x471)
>>103187982
>>
Qwen 2.5 is a good model for fuck my waifu, or just for coding
>>
>>103188034
I liked it.
>>
qwen will never be gpt-4o o algo
>>
>>103188029
thanks, mahmoud!
>>
>>103188046
I hope not. I want it to be claude.
>>
>>103188046
>o algo?
you're a brown skin mexican, go back to your country.
>>
>>103188133
calm down your shingles bot
>>
>>103188159
>shingles bot
What model is this?
>>
>>103188133
Post hand.
>>
>>103188174
your-mom-1B_Q_2K.gguf
>>
>>103188133
I don't care if you're only pretending, GTFO.
>>
Noob here, can you guys recommend a good nsfw image gen model for non-cuda koboldcpp? Been experimenting with a few from civitai but I'm still kind of lost, and running a really old pc to boot.
>>
>>103188348
NoobAI-XL (NAI-XL)
>>
File: poor.png (8 KB, 195x335)
Fuck me why wasn't I born rich
>>
>>103188650
you could always just let them spy on your gooning as long as you aren't being too based
>>
>>103188650
just work more and save some money. or are you living hand-to-mouth?
>>
>>103188703
I'm saving for VRAM
>>
>>103188780
>>103188780
>>103188780
>>
>pretending /lmg/ is relevant enough for things like early bakers and thread wars these days
this isn't 2023 anymore, who the fuck cares
we're dead
llms are dead
>>
>>103188802
I just want the psycho to split the thread again and then make some samefag posts with his model.
>>
>>103188791
That's some cringe shit.
>>
>>103188852
Only cause it touched a nerve faggot.
>>
>>103188802
We've been on the same consumer card gen for nearly 2 years; NVIDIA is throttling vram and has shown no interest in fostering cottage local AI enthusiasts. So it's already over.
>>
>>103188010
Pretty cool stuff. Especially being written in Pascal.
>>
>>103188650
>destroying your ssd and performance with pagefile swapping
dumb
>>
>>103189328
>>103189328
>>103189328
New thread
>>
File: 1715729195655279.png (2.32 MB, 1280x1856)
Aw shit here we go again.
>>
>>103189555
Nice trips
>>
>>103187820
So I tried this for a bit and it gives me a lot of slop with different settings. Do you have any other suggestions?



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.