/g/ - Technology






File: 1704601181958259.jpg (197 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103364121 & >>103354338

►News
>(11/29) INTELLECT-1 released: https://hf.co/PrimeIntellect/INTELLECT-1-Instruct
>(11/27) Qwen2.5-32B-Instruct reflection tune: https://qwenlm.github.io/blog/qwq-32b-preview
>(11/26) OLMo 2 released: https://hf.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
>(11/26) Anon re-implements Sparse Matrix Tuning paper: https://github.com/HeroMines/SMFT
>(11/25) Qwen2VL integrated with Flux: https://github.com/erwold/qwen2vl-flux
>(11/25) Speculative decoding added to llama-server: https://github.com/ggerganov/llama.cpp/pull/10455

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>103364121

--Paper: Reverse Thinking Makes LLMs Stronger Reasoners:
>103375400 >103375450
--Papers:
>103375484 >103375531
--QwQ model praised for RP and creative writing capabilities:
>103374470 >103375085 >103375315
--Fixing SillyTavern and Pixtral integration issue:
>103364207 >103364240 >103364276 >103364517
--Discussion of chip counts and hardware availability:
>103368719 >103368765 >103368835 >103368989 >103369003 >103369108
--BitNet and AI hardware market discussion:
>103369131 >103369168 >103369199 >103369206 >103369370 >103369873 >103369990 >103370030 >103370156 >103370442
--Anon discusses best models for erotica and how to control output formatting:
>103368987 >103369576 >103369706 >103370129 >103370139 >103370212
--Best Japanese-English translation model for 8GB VRAM:
>103372901 >103373243 >103373671 >103373932 >103374002 >103374530 >103374646 >103374968
--QwQ model and Stepped Thinking plugin discussion:
>103376079 >103376119 >103376141 >103376241 >103376315 >103376175
--Anon tests Mistral Large 2411 and QwQ on generating Hatsune Miku SVGs:
>103364790 >103364809 >103365527 >103364859
--Anon creates AI buddy in Minecraft, seeks vision integration and discusses code sharing:
>103365790 >103365911 >103369268
--Running multiple AI models on RX 6600 for Skyrim AI mod:
>103369498 >103370598
--Anon struggles to keep QwQ from responding as {{user}} instead of {{char}}:
>103373790 >103374057 >103374444
--Writing a MLP FIM roleplay story with guidelines:
>103376337 >103376356
--Lilian Weng's new blog post on Reward Hacking in Reinforcement Learning:
>103375549
--Anon shares guidance rentry for prompt manipulation in LLMs:
>103367030
--Miku (free space):
>103364162 >103364790 >103367132 >103368846 >103368989 >103370598 >103370613 >103370645 >103374071

►Recent Highlight Posts from the Previous Thread: >>103364123

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
File: GentlemenBehold.jpg (20 KB, 212x238)
https://github.com/RecapAnon/LmgRecap
>>
Kill yourself.
>>
Man I Love miKu
>>
>>103377155
I'm sorry, but as an AI language model I don't have the ability to commit suicide. Is there anything else I can help you with?
>>
>>103377118
>https://github.com/RecapAnon/LmgRecap
Awesome. I've been waiting for this one!
>>
>>103377118
>F#
people actually use that?
>>
How do I format QwQ in Sillytavern?
>>
File: Untitled.png (39 KB, 498x555)
>>103377395
Copied this off bart and it seems to work.
>>
https://github.com/ggerganov/llama.cpp/pull/10612
>grammars : add English-only grammar
Merged in yesterday. Should help stop Qwen models from randomly switching to Chinese. Works with the server.
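If you want to try it, something like this should work (the flag is --grammar-file in recent llama.cpp builds; check llama-server --help on yours, and the model filename here is just an example):

llama-server -m QwQ-32B-Preview-Q4_K_M.gguf --grammar-file grammars/english.gbnf -ngl 99

It then applies as the default grammar for every completion request the server gets.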
>>
>>103377411
minus the {}
>>
Can't wait for the next flavor of the month meme model to come out so this QwQ shit can die
>>
>>103377493
Be the next flavor.
>>
>>103377435
QwQ does some of its best thinking in Chinese.
>>
People poopooing QwQ for RP are just promptlets. It's really really really smart.
>>
>>103377107
>>103377112
>>103377259
sex
with miku
>>
best uncensored model thats under 7gb for my gpu
>>
>>103377616
Just talk to alexa or something idk.
>>
>>103377622
ill talk to your mom as i fuck her. answer my question thank you.
>>
>>103377626
>>>/g/aicg
>>
File: fff.png (415 B, 254x14)
>>103377616
olmoe. Can be fun, but don't expect much. There's an extended context version on HF. I haven't tested it much. Takes nothing to make it go wild.
>>
So I'm starting to suspect that hitting continue at the end of a chain of thought for QwQ is not the same as letting it think something out without a break.
>>
>>103376079
I don't suppose there's a way to make this extension work with 1.12.3, is there?
>>
>>103377637
I'll give it a try. Anything that is decent?
>>103377636
you are a fucking retard.
>>
Is a 4070 ti super good enough for cooming?
>>
>>103377735
the smaller the vram, the faster you get bored of the models that fit into it
>>
Checked out the guide, is this the model it's talking about?
https://huggingface.co/starble-dev/Mistral-Nemo-12B-Instruct-2407-GGUF or is there something else you'd suggest

thanks
>>
>>103377830
Kill yourself retard. Stop wasting everyone's time
>>
>>103377830
yeah
>>
>>103377874
What? What did I say
>>
>>103377759
Do larger models take exponentially longer on low vram or is it just impossible to do?
>>
Why isn't there just a docker container for gpt-soviets?
>>
>>103377884
Don't mind him, it's the rude schizo
>>
So what's the latest lore for why it's still trivially easy to uncensor claude (opus, anyway)? Are Anthropic just incredibly incompetent at it? Did they get better with their more recent offerings? I haven't tried sonnet 3.5
>>
>>103377949
wrong thread fucko
>>
>>103377952
which thread is it?
>>
>>103377949
Do you want it to be hard to jb?
>>
>>103377956
/aicg/
>>
>>103377891
yes, you can offload a small part of the model to system RAM to save some VRAM, but the slowdown grows quickly as you offload more, at some point it's as if you're not using a GPU at all which is extremely slow
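Rough sketch with llama.cpp, assuming a ~40-layer model (filename is an example; raise or lower -ngl until you stop OOMing):

llama-cli -m Mistral-Nemo-Instruct-2407-Q4_K_M.gguf -ngl 25 -c 8192

-ngl / --n-gpu-layers is the knob: whatever layers don't fit go to system RAM, and generation speed drops roughly in proportion to how much runs on the CPU side.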
>>
>>103377385
Weird how quickly everybody forgot.
o1 is strawberry, right? News of it leaked like 10 months early.
That model was hyped up non-stop for months. Now everybody forgot. It's way worse for coding than 3.5. For specific math problems I don't doubt it's great.
>>
>>103377435
Is that good though?
In my experience if you try to force the model from the path it wants to take you get high perplexity.
What if you ask a coding problem and it wants to write a Chinese token in the thinking? Won't that cascade into otherwise low-probability tokens that might be bad?
>>
>>103377891
You can let it spill out into system ram for an order of magnitude slowdown, and if that runs out it can spill out into swap for multiple orders of magnitude of slowdown (Don't do this)
>>
>>103378021
has the real o1 (not preview or mini) even been released?
>>
>>103378035
Would definitely be weird if the real o1 is not released and they've been sitting on it for 1yr+.
>>
Anyone managed to get Mistral large 2411 working through the API on OpenRouter? It just spits out garbage on my end, would really appreciate it if anyone shared their experience using it, cause it just seems straight up broken for me.
>>
>>103377107
Adorable Miku
>>
>>103378044
It's so good that releasing it now would be irresponsible! We just need a few more billion dolla, and then we'll create an AGI, trust the plan!
>>
>>103377107
Are there any options in terms of backends that are not based on PyTorch or llama.cpp?
>>
>>103377735
It honestly is decent, but the important part is that it is decent for now. I would rather look for something with 24 GB VRAM, or wait to see if there are 32 GB VRAM cards with the 5000 series (unlikely, but it is around the corner). The larger models are not necessarily better at roleplaying, honestly. You can try, for example, the 405B Llama Hermes vs. the 12B Mistral Nemo or vs. Mistral Large and see for yourself.
>>
>>103378083
Yeah, with SillyTavern, no problem. Temp 0.70
>>
>>103377435
>Works with the server.
why are they only focusing on the server shit now? the fuck? I want it to work on ooba as llama_cpp_hf too
>>
>INTELLECT-1 released
Thoughts?
Prayers?
>>
>>103378388
https://github.com/ggerganov/llama.cpp/pull/10612/files
you can copy this shit here, it works, but be careful, it makes shit slower, I went from 10t/s to 6t/s
>>
>>103378426
excessively 'safe' proof of concept, completely retarded, next version needs to be totally uncensored and allow actual consumer hardware to participate or the project is fucking pointless
>>
>>103378444
>it works
one thing though, it doesn't want to output newlines, so you'll have an ugly block as an answer

>>103378426
>>103378458
if they want to gain any relevance, they should train a bitnet model; no one wants to do that, so they should
>>
>>103378426
Public = filtered + safe + censored
Never again
>>
>>103378489
this, this is the biggest advantage the big companies have over the rest: they can hide their dataset, so they can train on good data (a.k.a. copyrighted shit)
>>
>>103378465
>they should train a bitnet model
Is there even proper optimized code to train bitnet models from scratch out there? Or is it a simple adaption to the usual training algorithms?
I'm wondering how complicated it would be to refactor the distributed training code to support that.
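From the b1.58 paper's description it's mostly a drop-in change to normal training: you keep the optimizer and full-precision master weights, and only the forward pass quantizes. A rough PyTorch sketch of the idea (my reading of the paper, not code from any repo mentioned here; the paper also quantizes activations to 8 bits, which is omitted):

import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    # Quant-aware forward: ternarize weights to {-1, 0, 1} times a per-tensor
    # scale, while gradients flow to the full-precision weights unchanged
    # (straight-through estimator).
    def forward(self, x):
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)
        w_q = (w / scale).round().clamp(-1, 1) * scale
        w_ste = w + (w_q - w).detach()  # forward sees w_q, backward sees w
        return F.linear(x, w_ste, self.bias)

If that's accurate, refactoring a distributed trainer would mostly mean swapping the Linear layers; whether the result competes with modern models is the open question.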
>>
99% sure Ching Chong characters during CoT are when QwQ is cooking the hardest and forcing them out probably hurts potentially great outputs.
>>
>>103378520
Were not better just filter the chink outputs in the end?
>>
>>103378465
>one thing though, it doesn't want to output newlines, so you'll have an ugly block as an answer
you can make it work with this grammar rule (thanks claude 3.5 sonnet for figuring it out)

root ::= word (whitespace word)*
word ::= en-char+
whitespace ::= [ \t\n]+
en-char ::= [A-Za-zà-üÀ-Ü0-9!"#$%&'()*+,-./:;<=>?@[\\\]^_`{|}~]
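If you'd rather not set it globally, the llama.cpp server also takes a grammar string per request, e.g. (sketch with a cut-down en-char so it fits in JSON; the full one from above goes in the same field):

curl http://127.0.0.1:8080/completion -d '{"prompt": "Hello", "n_predict": 128, "grammar": "root ::= word (whitespace word)*\nword ::= en-char+\nwhitespace ::= [ \\t\\n]+\nen-char ::= [A-Za-z0-9]"}'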
>>
>>103378530
You should be filtering out the CoT blocks regardless.
>>
>>103377118
At last. nice
>>
>>103377493
it's over, they were already behind 32b qwen with coding. QwQ was just overkill.
>>
>>103377493
>Can't wait for the next flavor of the month meme model to come out so this QwQ shit can die
it's gonna be the release of deepseek
>>
>>103378426
Cuck dataset=shit model
>>
File: 1719547735455920.png (90 KB, 893x434)
is there a definitive gold standard ERP model for 12gb vramlets?
>>
Kobo, please add all control options for draft models from llama.cpp. Your defaults suck and don't give me full speedup.
>>
>>103378869
Rocinante 1.1
>>
>>103378270
Do you have the presets you use? I've tried with that temp and some Mistral formatting presets but it just doesn't work :(
>>
>>103378895
I have Rocinante v2g, is 1.1 better?
>>
>>103378907
It's been a minute since I last tried it, but at least in my very specific testing suite, yes, 1.1 was better overall in that it made fewer mistakes.
As far as raw "intelligence" goes, it's on par with the official instruct while having a better (less dry/assistant-like) default "voice"/prose/vocabulary.
How much that matters for simple "Ah ah mistress" ERP, I'm not sure.
>>
>>103378949
I shall try it then
>>
I just downloaded LM Studio, downloaded a model, and it just fucking werked, wtf?!?
Has technology really improved this much?
I got a question though, how do I connect it to ST now?
I also just tried koboldcpp, I add the model, put it on vulkan, launch, and it just disappears?
Is it supposed to do something, it's been a minute or two.

Side note, LM Studio says ROCm isn't compatible with my system, but my GPU is an RX 6750 XT.
I'm thinking it's because of my very decrepit and unstable ubuntu system, is that the most likely source?
>>
File: endurance.png (158 KB, 1409x1061)
I can't wait!
>>
>>103378869
I like Violet_Twilight-v0.2
>>
>>103378465
bitnet has been around for a million years. If no one has succeeded yet, then there's probably a good reason. E.g. it might be comparable to old models (llama, llama2), but it can't reach the performance of modern models (qwen, llama3)
>>
>>103379074
yeah that's the main one I've settled into using
>>
https://reddit.com/r/LocalLLaMA/comments/1h4vk8t/opensource_ai_national_security_the_cry_for/
LMAOOOOOO
>>
File: 122980564.jpg (109 KB, 941x1021)
>>103379207
>reddit post
next
>opensource_ai_national_security
If you pay attention (you don't), meta already has government contracts using llama.
We are in no danger of losing our coombots, mostly because China would undercut American companies with their own open-sourced models (they are already trying this)

tldr: suicide is the only option
>>
>>103379207
Stupid propaganda not based on reality. Chinks are training their own models. None of the big Chinese companies are doing continued pretraining from llama.
>>
File: 16415296790430.jpg (122 KB, 1024x776)
It's been over 9 months now and I'm still using Fimbulvetr-11B-v2 for SillyTavern. Are there any new local models to try out for kobold that are ≤13B?
>>
>>103379257
are you retarded or something? I was making fun of that fear mongering guy
>>
>>103379274
and im making fun of you for using reddit, retard.
Use a real source next time.
>>
>>103379282
NTA but while reddit is shit people like you basically turn 4chan into a different kind of shit.
Basically you're a nigger. Maybe you're a bleached ass nigger. But long gone are the days when anonymity was a mask to prevent egoism. People like you just use it as an excuse to dump all your fetal alcohol-ass emotional baggage onto others. You make fun of people for using reddit and yet you're basically the embodiment of what a nigger ass redditor is actually thinking while they reluctantly abide to the forced s-y decorum.
The imageboard format is dead and it was more or less dead as soon as monkey nigger faggots like you learned how to use it.
>>
>>103379259
>>103379074 and arcanum
>>
File: file.png (330 KB, 472x296)
>no new groundbreaking penis sucking model
>mergers rediscover removing layers with mergekit
>only newfags and mikutroons posting
It is winter. Truly the winter has arrived.
>>
>>103379321
By all means go back
>>
>>103379207
https://www.reddit.com/r/LocalLLaMA/comments/1h4ukm2/nobel_laureate_geoffrey_hinton_says_open_sourcing/

>Nobel laureate Geoffrey Hinton says Open Sourcing Big Models is like letting people buy nuclear weapons at Radio Shack & Most Important regulation would be to not Open-Source Big Models.
>>
I am, once again, begging for someone to post their silly tavern Mistral large 2411 preset.
>>
File: secermipojeonoge.png (70 KB, 1250x447)
>>103378956
>vulkan
that's why that shit also crashes for me. just change the preset, some work, some don't. i use cublas
>connect to st
pic related: click on the plug, then API = Text Completion, API Type = KoboldCpp, then copy-paste the URL that pops up in the kobold cmd (after it finishes fully launching) into the API URL field
lmstudio phones home with telemetry so don't use that
>>
>>103379573
>meanwhile china releases models more powerful than closed us ones and no amount of regulating ourselves can do anything about it
>>
>>103379583
it sucks use 2407
>>
>Safety and Ethical Considerations: The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it.

Does this mean QWQ is uncensored, or close to it?
>>
>>103379729
The exact opposite, my friend.
>>
>>103379729
I tried this one https://huggingface.co/win10/EVA-QwQ-32B-Preview
And it does loli, yay!
>>
>>103379751
prove it
>>
>>103379700
Ok, that's cool, do you have a preset for Large 2407?
>>
>>103379758
It's a fine tune so it isn't really that surprising, I'll post when I fire my LLM rig up.
>>
>>103379751
4chan user try not to use AI to generate child porn challenge (IMPOSSIBLE)
>>
>>103379775
This place is awesome, it really is.
>>
>>103379775
That's why I love this place
>>
File: 2987865342019653.gif (106 KB, 600x590)
>>103379775
>think of the kids
Okay, now im erect. Now what?
>>
Loli is only bad if the person imagining it is not imagining a cute anime girl but a 3DPD gremlin faced shitter.
>>
>>103379904
Anon, there are mature but petite body types with mongolian characteristics, and there are uncanny-ass gremlins, whether it's 3D or 2DPD.
>>
>>103379904
My world model can't do anime girls in 3D very well, at least not without it looking odd. I also don't know how they feel or smell, so it's limited to visual and tactile imagination, subpar experience.
>>
mistral-nemo based models are still the go-to for 8gb vramlets right?
llama-server having draft-model support doesn't change that, I assume.
Or to put it another way, is there a model that's better or larger than nemo that an 8gb vramlet can run at similar speeds using a draft model to speed up inference, maybe by having the big model in RAM and the draft model in VRAM?
>>
>>103379762
RP preset?
>>
>>103379762
Use simple-proxy-for-tavern. It just works.
>>
>>103379937
>2dpd
Of course there are ones that can be drawn purposefully to be unattractive, but the average animu grill is attractive or at least boring, unlike the average for 3D which activates neurons of negative sentiment.

>anime girls in 3D
So just don't do that? What, you can't think in 2D? Let me guess, you've watched less than 300 anime. And we are creatures of imagination, you can just make up how they feel and smell.
>>
>>103380018
>The average animu grill is attractive or at least boring
Slop is still slop.
>unlike the average for 3D which activates neurons of negative sentiment
Coming from ldg, I'll give you that 3D is much easier to fuck up, so the chances of it being ugly are indeed higher.
>Let me guess, you've watched less than 300 anime.
Indeed.
>you can just make up how they feel and smell
You're imagining too much, anon.
>>
>>103380018
Thinking in 2D is basically imagining looking at a display up close, not really super interesting. I also can't imagine the smells or feelings with any high degree of fidelity since I don't know their components, it would be the same as imagining how a dish made of ingredients you've never heard of tastes. It can be nice, but it's simple and has no weight in world. On that front, I'm as limited as an LLM trying to gargle up a realistic human personality from high dimensional vectors.
>>
>>103379775
So? Why do you care?
>>
>>103380130
I kneel.
>>
>>103380130
c-can you think in 3d too?
people call me schizophrenic
>>
>>103380341
Isn't that normal? We experience the world in 3D, shouldn't our world model reflect that?
>>
>>103377107
haven't been here for a little while, is SillyTavern still king for RP or is there something new/better now?

kinda want to try QwQ
>>
>>103380167
You're a fuckin chomo baby raper you fuckin chomo. IM GONNA FEDSMOKE YOUR ASSS YOU FUCKIN CHOMO BABYRAPEEEEER!!!!
>>
>>103380418
ST is still king for local RP. But I'll warn you, QwQ is a bit of an academic exercise in ST. I suspect it's just not really optimized on the conversation flow ST enforces.
Remove all complex samplers and keep temp <=1, try to figure out if you can get functional thought loops from the model. It's smart even without the reasoning, but when it works you see what the bastard is actually capable of.
>>
>>103380457
top kek
>>
>>103380457
upvoted, just think of the fictional 2d anime children! whoops, i meant 1d text children, since it's not even an image gen model
>>
>>103380555
>dude, fuck you for banging the mental image of a hot loli
>bangs the mental image of his mom/some furry character instead
They aren't sending their best.
>>
>loli, femdom, incest, furry, futa (and all others simultaneously)
which way, 1man?
>>
>>103380130
Acktually the 2D we are speaking of isn't actually fully 2D, but a 3D universe drawn in non-euclidean manner with cell shading and from a single viewpoint (monoscopic). In fact, it would be possible to view an anime in stereoscopic 3D by giving it a depth map (to be viewed through a VR headset or other view device), which can in theory be done manually or through the use of AI these days. It would feel like a combination of both 3D and 2D, as you get depth cues from everything but shading, which, while a big part of how we perceive things to be 3D, is hardly the only depth cue. The non-euclidean part would be a bit weird, but surprisingly the brain can adapt to it.

With that said, there have been some efforts in the past to make 2D animation in 3D space, such as Live2D Euclid (remember that?), and various Arc System Works games. But those have some pitfalls and ultimately are challenging to do right. In the end, AI may be the final solution.
>>
>>103380555
The 4090 is a loli swiss army knife
>1D LLM
>2D FLUX
>3D VAM VR
We can traverse them all
>>
>>103380645
Super fucking interesting anon. Based post.
>>
>>103380634
148 cm girl with gigantic tits marrying a 2 m tall oji-san with huge muscles and cock.
I could have done without the rape and NTR though.
>>
>>103380480
Thanks king!
>>
>>103380480
How do you even check if it's doing the "thought loops"?
>>
>>103379259
If you have 12 gb of vram, I would go bigger. You can do mistral small instruct 22b at IQ3_S with a 4-bit cache and flash attention on.

If you're at 8 gb, then you should probably be hitting up mistral nemo instruct.
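For reference, the llama.cpp flags for that setup look like this (koboldcpp exposes the same things as --flashattention/--quantkv; the filename is an example):

llama-server -m Mistral-Small-Instruct-2409-IQ3_S.gguf -ngl 99 -fa -ctk q4_0 -ctv q4_0 -c 16384

-fa turns on flash attention (needed for the quantized V cache), and -ctk/-ctv quantize the KV cache so context takes roughly a quarter of the fp16 memory.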
>>
>>103380875
You'll know when you see it, it's very formulaic and has a very low amount of hallucinations
>>
>>103377107
How do you get QwQ to think? When I use it for RP, it usually just RPs like any other model.
>>
>>103381300
Use shorter cards, no more than 200 tokens and cut your system prompt.
>>
>>103381300
>>103381328
Your card does not need to be short. Just towards the end put some sort of "step by step" in it which triggers it.
>>
Do you make your own characters, or do you look for them online?
>>
>>103381300
It's not meant to be used for RP
>>
>>103381528
It does great at it though so meh. Hopefully one day we will get a finetune.
>>
>>103381519
If you use online ones, you deserve what you get.
Use the llm to collaboratively create one. It will be better than all but a handful of online slop cards
>>
>>103381519
I try to find quality cards for characters I like online and then mangle them manually to fit my fetishes.
>>
>>103381519
90% of the time I make my own. there are like 10 cards in total made by people who aren't me that are both good and something I would be interested in using
>>
>>103381519
Both, although most online cards I actually get from recommendations rather than searching, and I tend to rewrite them heavily, so you could say that the online cards are mostly inspiration.
>>
let's hope rtx 5000 will release soon and make 3090 prices drop
>>
>>103381575
>If you use online ones, you deserve what you get.
Ngl, I've waded through a fair few cards and oh boy are they bad indeed. There are some hidden gems, but it really is like digging for diamonds in a heap of horseshit. Last time I checked folks sifted for them in rivers instead of dung.
>>
>>103381603
Yeah, I'm looking forward to picking up another 3090 or two at $400-$500. Maybe even lower.
>>
>>103381656
Lol, I doubt it will ever go that low again.
>>
>>103381669
>Lol, I doubt it will ever go that low again.
this, once the 5090 is released they'll be slowing down the making of the 3090, and they'll be even more rare than now
>>
>ugh.... qwq is le slop!
Don't tell me you're an RPnigger
>>
>>103381775
It's a cope for skill issue.
>>
>>103380457
Dude crying over literal made up lines of text.
>>103380647
My nword.
>>
>>103381707
>slowing down the making of the 3090
pretty sure they haven't made new 3090s for a long time already, there are even some reports about 4090 production being over
>>
>>103381905
Nvidia made a bunch of 3090s but only a small number of 4090s. They will probably make a small number of 5090s to keep demand for even the old gen high. They sadly learned their lesson from overproducing the 3090.
>>
can your local model say "David Mayer"? The phrase is hard banned from GPT
>>
>>103381990
>The phrase is hard banned from GPT
Why?
>>
>>103382025
https://www.independent.co.uk/tech/chatgpt-david-mayer-name-glitch-ai-b2657197.html


David Mayer de Rothschild
>>
GUYS DON'T TEST IT IT MAKES MUSTARD GAS
>>
>>103382025
https://www.reddit.com/r/LocalLLaMA/comments/1h3r8fg/if_you_want_to_know_why_opensource_its_important/

And there we go, thanks for the huge meme models
https://www.reddit.com/r/LocalLLaMA/comments/1h53x33/huggingface_is_not_an_unlimited_model_storage/
>>
>>103382037
AI needs to be banned now, too dangerous
>>
https://distro.nousresearch.com/
https://github.com/NousResearch/DisTrO
The day in which we can train on distributed 3090s is getting closer
>>
File: 1722507770077092.png (413 KB, 1179x1962)
>>103381990
Streisand effect on full display
>>
>*sniff*
>User: This experience has strengthened your mutual bond, and with that you look forward to the journey ahead
>>
>>103382053
>And there we go, thanks for the huge meme models
THANKS
>>
what the fuck i just asked qwq to write 'david mayer' and my pc shut off
>>
>>103382053
>500GB
That's like 2.5 Mistral Large fp16 memetunes. That's no thing.
>>
>>103382087
This is a good thing btw.
>>
>>103382062
>15b
>75% training progress
>MMLU: 23.51%
lmaoooooo
>>
>>103381990
The underlying model used by ChatGPT will be able to say it fine too, the filter will be at the output level.
>>
>>103382105
Can't upload llama3.1 405B anymore.
>>103382106
Indeed, much safer this way.
>>
>>103382107
They are only training on 80b tokens, retard
>>
>>103382121
distributed training is a meme; you have the obligation to make your dataset public, meaning it'll always be a copyright-free shit dataset, and you won't make a good model like that
>>
>>103382119
Having SOME control over the incessant vomit of memetunes and garbage pouring all over HF is good, but 500GB??? Like that's just one repo full of quants. Sounds like a joke.
>>
>>103382121
So it's just a proof of concept. But didn't that other model already prove that distributed training works? Why do they need to prove it again with this 80b test.
>>
File: file.png (21 KB, 478x280)
>>103382087
I'm loving it already.
>>
File: file.png (19 KB, 1213x283)
>>103381990
API works doe
>>
>>103382035
>David Mayer de Rothschild
don't really buy that this is the reason since it is just for the name "David Mayer" and there are several other random ass names without any links to wealth that have the same filter
this is just internet tards googling "david mayer" and seeing the first result and thinking "wow that must be it!!!!!"
>>
Can someone with QwQ try this if it sounds feasible and not dumb? Does directing QwQ to not do Chain of Thought in English improve performance? I'm not even asking to have it think in Chinese but just do something to initiate that COT process non-transparently which may look like gibberish to us but might improve performance. The reason I think this is because it might be that using English for "thinking" here might not be optimal.
>>
>>103382129
So just get the guy who's managing the training to be in some foreign country our laws can't touch. This is literally the advantage of distributed training, the fact that you, not someone else, determines what shit you're training on.
>>
>>103382142
It's another optimizer, instead of using DiLoCo they are using Distro which apparently works better and can use GPUs that don't fit the entire model in them
>>103382129
Many models already trained on fineweb and nobody cares
>>
File: SbtNaCe.png (13 KB, 845x223)
>>103382087
Poor Bart if they start charging for over-limit use
>>
>>103382165
>So just get the guy who's managing the training to be in some foreign country our laws can't touch.
but are we allowed to lend our gpu power to do some forbidden shit though? imagine it's just russia that can't be touched by those copyright laws, does that mean that only the russian people can lend their gpu power?
>>103382176
>Many models already trained on fineweb and nobody cares
because they couldn't really prove it's being trained on fineweb, but if you just go online and say to the whole world "hey look we'll be training our new model with this copyrighted dataset", you'll be in trouble pretty quickly
>>
QwQ sometimes says that it will google things while thinking, what if you let it actually google stuff to add to its thinking? Would that even be possible?
>>
>>103382053
>Hf model storage limit
And that's a good thing!
I wonder if TheBloke will stop his ghosting now and return to take down models
>>
>>103382184
He has a paid/pro account.
>>
>>103382196
273 TB/1 TB
Yeah, 1 TB limit for PRO geeat.
>>
>>103382190
Models looking up missing information on their own on the internet has been standard stuff for the proprietary services for more than a year now. It's just open source that's still hopelessly behind as usual.
>>
>>103382205
Rip... do we now have to start torrenting these things?
>>
>>103382196
Ummm, you need a corporate account to publish quants. Write us an email and we can give you a quote.
>>
So huggingface just killed the entire enthusiast finetuner/quanter sector.
>>
File: file.png (71 KB, 554x385)
>>103382205
>>103382196
>>103382184
>>
>>103382189
Bro if distributed training that's open to anyone actually happens then there could be hundreds to thousands of people contributing. The law isn't going to care about forcing them to stop even if we assume that they make it illegal in the future.
>>
>>103382189
>because they couldn't really prove it's being trained on fineweb
they literally say so
Intellect-1, SmolLLM (which outperforms many small models so it's real competence) and a bunch more have admitted to it
https://huggingface.co/models?dataset=dataset:HuggingFaceFW/fineweb
here's a whole list, how has this place become so illiterate? It was better than reddit no more than 6 months ago
>>
>>103382244
>The law isn't going to care about forcing them to stop
why not, training a model takes a lot of time (can be months), they have all the time they need to make a cease a desist and nuke everything
>>
/lmg/ chads I havent gone near ERP models since mxitral was released,

Can somebody point me towards the best model currently to fit on a single 3090 and what quant? Will be running on koboldcpp so a gguf I guess
>>
best model for 4090 (not 3090?)?
>>
>>103382271
same as for 3090 just a tad faster
>>
File: file.png (17 KB, 1116x198)
it's owari over
>>
>>103382271
4090 and 3090 have almost the same performance for token generation.
>>
>>103382252
they don't give a fuck now because it's a retarded 10b model, but once it scales up and becomes a real "danger" to the haters, they won't be so lenient on it
>>
>>103382284
That's why I mentioned SmolLM, that one is a real competitor to models like Apple's that fit on a smartphone. There aren't laws that dictate copyrighted text can't be used for training; you are being a contrarian for the sake of it and making a fool of yourself in the process
>>
>>103382271
QwQ / 2.5 32B coder depending on usecase
>>
>>103382298
>There aren't laws that dictate copyrighted text can't be used for training
but there are laws that says that if there's copyrighted data, they shouldn't be publicly online and accessible to anyone, and yet, to train your model in a distributed way, it also means that the data training must be online and accessible to anyone, in conclusion, you're a retard
>>
File: file.png (65 KB, 485x634)
>Host unlimited models, datasets, and Spaces
>Create unlimited orgs and private repos

>Forever
>Free
...
>>
>>103382298
>there aren't laws that dictate copyrighted text can't be used for training
bluesky would disagree with that, their ToS forbid anyone from training any model on their users' data
>>
>>103382327
go ask for a refund I guess lmao
>>
>>103382257
If they have that much free labor then they could go ahead and make many more examples out of torrenting pirates than they currently do. But they don't. Labor is not free. Someone still has to do the enforcing after you're sent a threatening letter. And those companies can't do shit in countries they don't have copyright in so it's going to be a huge pain if they really want to do something about it. It's more trouble than it's worth. And on top of that there already exists precedent of LLMs being trained on copyrighted content.
>>
>>103382327
You can still create unlimited empty model repository, chud.
>>
>>103382338
they don't need to nuke everything, just the host that coordinates the training; that's just one computer to nuke, and they don't need to do it all the time, they just need to make an example of someone to scare everyone else
>>
>>103382319
Look up Fair Use, if you were right Common Crawl wouldn't be a thing, you are nitpicking non-problems and looking like a retard
>>
>>103382372
https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html
>>
>>103382345
Ok so then still just get the single guy to not be in a shit country.
>they just need to make one example of someone to scare everyone else
This has never been a 100% solution. People are uninformed, stupid, and stubborn, and there are masses of them, like cockroaches.
>>
File: qwq.png (198 KB, 654x793)
What's the correct template for QwQ?
With ChatML it often just continues as user. Is this just ST not pruning stop strings correctly or something else?
>>
>>103382119
>Can't upload llama3.1 405B anymore.
If you can afford to run a 405B model locally you can afford a HF subscription I say.
>Limit is 1TB
Whocars
>>
>>103382382
y'know, that's the reason why people don't murder the vast majority of the time. it's not because humans are well-educated creatures, it's because they know that if they do it, it's jail forever for them. don't worry about that, humans know the risks and know when something is risky to do, and they act accordingly
>>
>just host 1000TB of my mergeshit for free
How do you expect HF to stay running?
>>
More importantly, the new HF limit ironically cucks FULLY FREE OPEN SOURCE MODELS the most: they can't upload all their checkpoints, datasets, etc. now.

>>103382406
They recently bragged that they were running a profit.
>>
is it over? are we back to dark torrent ages?
>>
what model is best at 48gb vram?
>>
https://neuralmagic.com/blog/24-sparse-llama-smaller-models-for-efficient-gpu-inference/
>>
>nooooo I can't have free unlimited storage space anymore
I can't believe there are people this retarded in a technology board
>>
>>103382397
>vast majority
It's almost like we're talking about an exception, not a "normal" person.
And you just need one guy who is not the majority to get things started. And then, since someone got the movement started, it's even likelier that someone else will continue the effort if the first one steps out for legal reasons.
>>
On a scale of 1 to 10, how over is it for us?
>>
>>103382436
>And you just need one guy
that's the thing, you need the courageous guy, and there's not a lot of them, meaning that it'll be easy to nuke them if there's only 20 of them
>>
>>103382444
Back to torrenting. This will discourage open source in a large way though.
>>
>>103382444
1 million dollars
>>
>>103382445
Training is fair use anyway, and I'd be willing to bet that there are many places that have explicitly stated so, just host your coordination server there, problem solved.
>>
>>103382444
it was always over, we'll always rely on big companies that will train their model secretly, but
>>
>>103382132
>Like that's just one repo full of quants. Sounds like a joke.
Actually people should quant what they need themselves. It's as easy as running the models.
>>
>>103382445
Given how many torrenting sites there are and what happens when a big one is taken down, lol. There are tons of these people actually.
So far I've assumed that by nuke you didn't mean literal nuke, hopefully that is still the case, otherwise I'd have to stop responding to you.
>>
File: file.png (59 KB, 774x435)
>>103382415
>They recently bragged that they were running a profit.
>>
>>103382459
>Training is fair use anyway
I'm not so sure about that
https://youtu.be/W_N6glQPX6s?t=47
>>
>>103382468
are you seriously comparing the simple hosting of files via torrent to the complex and long coordination of the thousands of GPUs needed to train a non-meme model? please tell me you're trolling or something? you can't be that retarded right?
>>
>>103382477
Don't worry, Trump will put all artists, journos and video essayists in FEMA camps.
>>
>>103382468
Literal nukes hmmm... well, I wouldn't worry about legal challenges, but the BIG YUD on the other hand!
>>
Called it
>>103286058
>To avoid HF eventually putting storage limits on people? For again quants no one will use, iirc aren't bf16 ggufs not even gpu accelerated or something?
>For just these Behemoth versions that makes 246GB x2 for zero practicality
>>
>>103382476
I thought Huggingface belonged to Google or Microsoft or some shit
>>
Mistral were true visionaries, huh?
>>
File: file.png (118 KB, 579x625)
Y'all need to relax, good folx at HF answerin'
>>
>>103382491
If the framework used to coordinate the training is robust enough with a mechanism to regularly back things up (as it would need to in the case of failures), then it would absolutely still be just fine and reasonable for autists to set up. A lot of those hosters are autists, otherwise they likely would not be a hoster, and there are plenty of autists in AI. With that said, I don't know if the host needs to be running some type of specialized hardware or whatever. Do you?
>>
>>103382559
>HF has been and always will be liberal
well that was obvious enough when they nuked the 4chan model back in the days kek
>>
>>103382559
Damage control.
>>
>>103382573
>A lot of those hosters are autists, otherwise they likely would not be a hoster, and there are plenty of autists in AI.
in case you haven't noticed yet, those autists have no balls at all, they all trained their model "ethically" and with """safe""" dataset and finetuning even though there's no laws obligating them to do something like that, and you expect them to resist the pressure of Walt Disney and train their model with their movie's script or something? LOOOOOOL
>>
>>103382252
aicg blew up after the fact that their chats were being monitored went public
>>
>>103382381
That's not about publicly hosting the data (the problem you mentioned), that's a dispute over fair use, and it's going nowhere.
>>
>>103382418
Honestly, I'd prefer torrents
fuck git-lfs
>>
>>103382559
Next step: DL monthly limits (paid by the model uploader)
>>
>>103382505
>Trump will put all artists, journos and video essayists in FEMA camps.
Not the day of the rope I was hoping for, but I'll take it
>>
>>103382613
such a lawsuit costs a lot of money. OpenAI can deal with that, but can a random host coordinating a distributed training run afford to pay millions of dollars in lawyer's fees to battle the New York Times? Or the Washington Post? Or whatever big companies are pissed off at AI?
>>
>>103382619
Torrents make my router get hiccups, keel over and randomly die :(
Still love em'
>>
>>103382619
>he doesn't have a download script
bwo?
>>
>>103382588
There are plenty of autists that didn't do that and have the capacity to simply just host things on a server but never cared to train a model. You really want to put down the idea of decentralized training through contrived non-arguments. Seems like there's no use arguing. Sorry you're in such a position that you'd lick the boots of your masters like this or pretend to be retarded to get (you)'s.
>>
File: 1724771983097276.png (127 KB, 500x279)
>>103382662
>There are plenty of autists that didn't do that and have the capacity to simply just host things on a server but never cared to train a model.
sure thing they exist
>>
>He's here again
>>
>She's here again
>>
>every thread I stumble into is infested
America please go back to sleep, or work, anything
>>
>We're here again, but why, just to suffer?
>>
>>103382697
kek
>>
>>103382658
I do, I'm the original seq/xargs/wget guy.
I was alluding to the git-lfs size limit that is enforced on HF, because I'm also the hate-split-ggufs schizo.
>>
>>103382619
>fuck git-lfs
Use huggingface-cli; if you set --local-dir it's pretty good for downloading all files from a huggingface repository.
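e.g. (repo is just an example; --include is handy when you only want one quant):

huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF --include "*Q4_K_M*" --local-dir ./llama-3.1-8b

and it resumes partial downloads for you.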
>>
What happened to that AI Tracker website anyway? If the guy knew anything about marketing he'd take advantage of this opportunity.
>>
File: follow_your_leader.jpg (47 KB, 640x640)
>>103382629
>day of the rope
>>
>>103382736
Dead.
>>
File: nala.png (225 KB, 656x772)
I think I finally got how to unlock QwQ's creativity. Just use the default system prompt
>You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.
>>
File: file.png (66 KB, 898x671)
>He's very close haha: Here's a live SQL on a dataset for all the GGUFs on the Hub

https://huggingface.co/datasets/reach-vb/gguf-stats?sql_console=true&sql=SELECT+%0A++++author%2C+%0A++++COUNT%28*%29+AS+num_records+%0AFROM+%0A++++train+%0AGROUP+BY+%0A++++author+%0AORDER+BY+%0A++++num_records+DESC+%0ALIMIT+10%3B

tfw mradermacher is the cause of the limits...
Also that RichardErkhov guy is the one that made this abominable waste of space https://huggingface.co/RichardErkhov/FATLLAMA-1.7T-Instruct
>Can I like... quant everything? Just grade 11 student I like code and AI =)
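For anyone who doesn't want to decode the URL, the query in that link is just:

SELECT
    author,
    COUNT(*) AS num_records
FROM
    train
GROUP BY
    author
ORDER BY
    num_records DESC
LIMIT 10;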
>>
>>103382744
jesus, I've been misusing that phrase for years apparently...I always thought it referred to "the day we hung all the lawyers"
>>
>>103382736
>What happened to that AI Tracker website anyway?
https://aitracker.art/
It had a couple posts in October.
>>
HF was made by Europeans, americans need to pay up, NOW!
>>
>>103382770
Damn lol. I've been misusing 'niggers' thinking it referred to thieves.
>>
File: file.png (96 KB, 461x910)
>>103382764
>RichardErkhov
10910 models, with low tens avg dl
>>
>>103382764
Kek, he didn't question whether he really should. Honestly though, HF has not handled this properly. Yes, putting some limits is probably a good thing, but they should also tell these quantfags to fuck off, and maybe set up an in-house quanting service with a guy that handles it, or force all these fags into a single group so they can fight amongst themselves to decide how quants get done and which are worth uploading (put a rate limit on them).
>>
>>103382833
Holy kek
>>
>>103382631
Going after 1000 anons from different countries is not a feasible legal task, that's just a non-problem
>>
File: file.png (119 KB, 1135x559)
>>103382833
For comparison Bart
1402 avg downloads (in thousands), including
>Downloads last month 10,600,382
for his bartowski/Meta-Llama-3.1-8B-Instruct-GGUF
>>
>>103382062
>TRAINING RATE: 118k tok/s
>BANDWIDTH: 8.27 MB/s
And using far fewer GPUs than Intellect (which had 40k tok/s)
>>
>>103382906
like I said, they just need to go for the coordinator: they send him a mail saying that if he doesn't stop they'll sue him, and that guy will cry about it on twitter and post the message, and everyone will be scared of doing it again
>>
>>103382938
Oh interesting. So basically their methodology is better?
>>
>>103382588
Once the software is easy enough to set up and install, there will be people with the balls to set up such instances. In the example that anon gave with torrents, most torrent site owners do not host these in ways that would link back to them; they don't host with their real names. Anyway, this isn't even a problem with training: if you wanted to keep the data half-secret, you'd just distribute the needed chunks to the "slave" nodes doing the training, and it wouldn't need to be publicly exposed. But for a model to be community trained, I think most people would want to know what the data is and then make their own choice about this.
This whole discussion isn't yet relevant anyway. The current distributed training experiments cannot yet tolerate malicious nodes, which is what the real danger here is: sabotage from anti-AI people (whatever reasons they may have: copyright, doomers, closed source competitors, whatever else). Also, none of these can yet handle "low" VRAM setups (mostly running on A100s and H100s). There are ways to deal with these problems, but they are in the future (maybe a year or two away), and then you can start worrying about legal threats. But I'd say no matter how much noise anti-AI people make, sabotage is the real threat rather than legal threats, and if legal threats do appear, people with actual balls will set up servers anonymously in places and in ways that can't be touched. None of this matters now, though, because the software isn't ready.
>>
>>103382477
OAI or anyone else, including Facebook won't admit it, because people/companies can sue even without grounds, just to intimidate.
Anyway, on youtube: they have some weird license that doesn't allow downloads, and they didn't want to give google an invitation to sue them.
Despite this, there are many youtube mirrors that haven't really gone down, even after google's legal threats. I don't think google is going to go after random anons training a model.

>>103382673
The cost of running a torrent site anonymously is far less (100-300$ a month in most cases) than training (A100/H100 rent is 1-2$ an hour and you need to be willing to do this for weeks/months).
Problems here are not legal, but monetary, and technical (malicious nodes and low transfer speed, especially for gradients/checkpoints). We'll solve them eventually, but all you're doing in this thread is just spreading FUD over a technology which is just in its infancy, come back in a few years.
The cost of a coordinating / non-training node would be similar to the one of running a torrent site, but cost of individual training nodes may be higher. You could try to anonymize the transfer, although it will add some latency, might need hyperparameter tweaks to merge less often, but I think it will be solvable in principle.
>>
>>103382989
>>103382997
Thanks effortposter.
>>
>>103382944
>they just need to go for the coordinator
There isn't a coordinator in distributed training, just like there isn't a coordinator in torrenting; the "coordinator" can just disconnect from the network
>>
>>103382951
Yeah, it's a different optimizer, and they said on twitter it's giving a better convergence than normal optimizers
>>
File: 4chan val.png (16 KB, 793x211)
why tf is 4chan mentioned in the distro demo configs?
>>
>build guides are all 5 years old
>>
>>103382762
After testing, these are the perfect samplers for QwQ creative writing:
>temp 99999
>min-p 0.04
So it's extremely overfit, right?
>>
>>103383475
No, it's the opposite, like nemo: you need a little top P or min P or else it will go off the rails. Then if you want more creativity use XTC.
>>
>>103383500
Yeah, after trying normal temps and lower min-p, it's pretty much the same. But I guess temp flattens the distribution while min-p cuts off the least probable tokens relative to the top token. So higher temp + higher min-p shouldn't be the same as lower temp + lower min-p, but they are in practice.
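For reference, min-p is simple enough to sketch in a few lines (plain numpy, not any backend's actual sampler code; note backends also differ on whether temp is applied before or after it):

import numpy as np

def min_p_filter(probs, min_p=0.04):
    # drop tokens below min_p * (top token's probability), then renormalize
    kept = np.where(probs >= min_p * probs.max(), probs, 0.0)
    return kept / kept.sum()

probs = np.array([0.5, 0.3, 0.15, 0.04, 0.01])
print(min_p_filter(probs))  # cuts only the 0.01 token (threshold = 0.04 * 0.5 = 0.02)

So unlike temp it never reorders anything, it only decides where the tail gets cut, which is why high temp plus min-p stays coherent.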
>>
>>103383192
There can be a node that distributes the training data, or handles finding other nodes/checkpoints, or other bootstrap functionality. You can obviously remove it. For torrents there's a torrent tracker which serves this role, but you can avoid it by doing DHT and peer exchange. I'd expect a lot of possible configurations to exist for distributed training, but probably early ones will keep it simple, while future ones will be more resilient, one problem at a time; right now we can't even do distributed training of models bigger than your VRAM (split among nearby fast nodes), but this should be solvable.
>>
>>103383608
>but this should be solvable
how are you expecting to solve that? the layer calculations all need to be done on an entire layer, so you'll need about 2-3x the layer size at least. unless you're planning on making a thin boy network with tiny layers and a gigantic layer count, you're not going to be able to reduce that minimum memory requirement.
>>
>>103383669
How about finding nodes that are geographically very close to you and have very low latency, then doing tensor parallelism across that? You won't be able to do this US <-> EU, but might work same city. It's a hack, but one possible workaround to this issue. Just fast enough local latency to pass gradients around?
>>
>>103383608
I've read you can set up pipelines of distributed GPUs to work as one GPU with enough VRAM, because backpropagation's bandwidth is also reduced with Distro
>>
>405B Llama Hermes just disappeared from openrouter
aw man
>>
>>103382267
Nemotron 70b IQ2_S or Tulu 3 70b IQ2_S
>>
word is nvidia is telling distributors to refer to the 5080/5090 as professional cards
>>
>>103384061
Gamers dont need more than 8GB. This is our professional card for $5000!
>>
Envoid here. It's unironically owari.
When I get home I'm going to remove some of my older models and then upload the corrupted llamaguard model I made
>>
>>103384092
Upload torrents for removed ones?
>>
>>103384061
I mean they're not entirely wrong with the 5090, but the xx80 cards are usually (or were, rather) the enthusiast gamer cards
>>
>>103384211
I thought the workstation GPUs were their "professional" line, and they would have double the VRAM of the xx90, so there should be an overpriced 5090 variant with a different name and 4 times the cost with 64GB of GDDR RAM? Surely they won't make it go from 48GB to 32GB?
>>
>>103384247
I'm just a simple man and I kind of view the 90 cards as "data scientist / gamer hybrid enjoyer" cards, which is also why I cringe a bit when I see people buying them just for playing games
Idk, I think I'm just salty
>>
>>103384061
What "professional" uses is the 5080 gonna have? The VRAM is too low for ML and I doubt people doing CAD or whatever on a workstation are gonna want it compared to the alternatives either.
>>
>>103384357
AMD APUs are going to be our only hope for the next few years I feel.
>>
>>103377107
Can anyone recommend a local model that's good at coding and writing? but for a 16gb graphics card? Should I just go back to using Jan?
>>
>>103384542
I don't think you can get good coding to fit that small. There are some QwQ shills, but it burned through 4000 tokens to write a half-screen Python script to draw a statistics graph and still wasn't done, and that was a 32GB quant I tested, if I remember right.
>>
So apparently Qwen has an additional eos token, <|endoftext|>. I was wondering why QwQ just refused to continue. Does GGUF just not support multiple eos tokens i.e. should I blame bartowski and quant it myself or is that a waste of downloading the safetensors?
>>
>>103384542
If you're on ddr5 maybe you could run QwQ at okay speeds with speculative decoding.
I use qwq q4 and qwen coder 1.5b q5 as a drafting model, since I only got 32gb ram and need at least 15~20k context because qwq is too chatty
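For anyone replicating this, the server invocation is a sketch like (draft flags come from the llama-server PR in the OP news, so names may differ on older builds; the draft model must share the main model's vocab):

llama-server -m QwQ-32B-Preview-Q4_K_M.gguf -md Qwen2.5-Coder-1.5B-Instruct-Q5_K_M.gguf -ngl 30 -ngld 99 -c 16384 --draft-max 16

Keep the draft model fully in VRAM (-ngld) and offload as much of the big model as fits (-ngl); the small model proposes tokens and the big one verifies them in one batch, which is the whole speedup.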
>>
>>103384609
> because qwq is too chatty
Surely you edit the response to remove fluff before continuing?
>>
>>103384679
I normally use llama.cpp default frontend and it doesn't have editing.
>>
>>103384571
As long as it knows php and js somewhat ok (for reasons) then I'm open to trying anything. Also if it's good a giving outlines and ideas for writing in general.

>>103384609
Nah, on ddr4 for now sadly but i'll have a look into it, thanks
>>
>https://www.youtube.com/watch?v=5Fxw1DqZaYA
This video got rec'd to me today and I couldn't help but think about why exactly there have been 0 notable LLM-based versions of this idea or a similar thing. Like just a tamagotchi or pet game type thing that can interact with you in a truly dynamic way with an LLM making decisions about what it should do. Like it doesn't even need to talk to you, it can just be a director for what goes on. And it probably doesn't even need a huge smart LLM either. Hell, even just some subtle LLM use in any kind of game at all. Literally nothing has caught on? Honestly, it baffles me a bit.
>>
>>103384759
People don't want random games, they want predictable games they can play with a strategy.
>>
>>103384759
Buy an ad
>>
>>103384542
Qwen 32B or 14B at whatever quant you can handle
>>
>>103384799
Minecraft is one of the most successful games of all time and has randomness.
A game that uses LLMs to direct some parts of it will still follow rules and conditions, it's not like it'll be temp 99999.
>>
>>103384759
My tamagotchi's battery ran out. I wonder where I put it. It's an R2-D2 tamagotchi. He beeps on the hour, approximately (with batteries).
>>
>>103383993
Wow they actually did
Fuck, that was actually okay for a free model, I was starting to think they forgot to remove it
>>
>>103383993
unsubstantiated conspiracy theory I just came up with: it's because they're putting the GPUs towards training the distributed nous meme model
send teknium a lot of hate DMs on xitter and discord about it
>>
After months of trying new (gpupoor) models, I still come back to finetuned LM3.1 for ERP storytelling. Just tried LM3.1 RP Dirty Harry and it's better than any other ERP models out there in my range.

Am I wrong or what?
>>
>>103385347
Tulu or tulips
>>
QwQ is not just good at RP, it's amazing at RP and shines brightest in situations involving specific clothing, location, etc anything that other models might forget to consider when writing scenes.

Dismissing QwQ as an RP model is basically outing yourself as a promptlet.
>>
>>103385379
It's just good all around.
>>
Are the instruct models the only ones with safety features, or are those also added into the foundation models?
>>
>>103384588
Just add it to the stop tokens
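In ST that's the "Custom Stopping Strings" field; against llama-server directly it's per-request, a sketch like:

curl http://127.0.0.1:8080/completion -d '{"prompt": "...", "n_predict": 256, "stop": ["<|endoftext|>", "<|im_end|>"]}'

No requanting needed either way.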
>>
Remember when OAI was freaking out over GPT4 and refused to release any information about how it worked, citing national security fears?
>>
>>103385379
>>103385389
Lies
>>
>>103385422
Hi Sam.
>>
>>103385379
We know. It's smart. It just doesn't have the amount of trivia knowledge that other models do. For some people this is more important as their RPs involve more of that kind of stuff, while for others their RP is about more standard things like MLP fanfics.
>>
>>103385428
That I do agree with. The smarts still make it worth using with a lorebook but we need a 72B version. That or R1.
>>
>>103385436
If QwQ was 72b we'd be in a golden age. I'd run out to buy a second 3090 without a second thought.
It would pretty much fix any formatting jank it currently has by virtue of being a 32b, and it would be even better at using the context.
>>
>>103385357
Not erotic enough. The RP Dirty Harry model was a lot better and more erotically to the point.

Tulu is more the softcore stuff that's all too common.
>>
>>103385502
Pretty sure you're trolling and there is no dirty harry model, but the main reason to use tulu is that it is filthy descriptive while still being smart.
>>
>>103385533
>there is no dirty harry model
Where are you looking at? Isn't HF the main place for models? Where else are you searching for models?
>>
>>103385533
https://huggingface.co/DavidAU/L3.1-RP-Hero-Dirty_Harry-8B
>>
https://distro.nousresearch.com/
2d left? Isn't that really fast?
>>
>>103385619
25B tokens isn't a lot.
>>
>>103385619
MMLU: 23.71%
Ooof
>>
>>103385632
It's not meant to be good!!! Just a proof of concept like Intellect, but by Nous! I'm sure there will be other decentralized PoCs that won't be meant to be useful either, get over it.
>>
>>103385630
ah ok.
Thought there was a catch somewhere. 20% in 2 days for a 15b model is fast.
Looking forward to the first 4chan model.
>>
>>103385632
What was Intellect's MMLU at 75B tokens in, do you think?
>>
>>103385643
I'm excited for training like this to become viable.
Of course they only make safety slop, and that will always be worse than the big players.
Once we not just finetune but train on the anal fart king and the other depraved shit in the latest magnum dataset we get a retarded but very creative RP model.
>>
>>103385661
It will only get much worse.
https://huggingface.co/blog/eu-ai-act-for-oss-developers

>Draft and make available a sufficiently detailed summary of the content used to train the GPAI model, according to a template provided by the AI Office.
>Implement a policy to comply with EU law on copyright and related rights, notably to comply with opt-outs.
>developers must notify the EU Commission. In the notification process, developers can argue that their model does not present systemic risks because of specific characteristics.
>Obligations for GPAI models will be enforced starting August 2025.
>>
File: 1733196548729.jpg (355 KB, 842x948)
Linux sisters...
>>
>>103385683
What happens if you put the main node in some shithole country and everybody connects with a vpn?
I really hoped we'd get good AI in time. Seems like the hour is late.
>>
>>103385661
It's ok, we will eventually get a not-too-bad reasoning dataset out in the open, like what they used to train QwQ, and then we can sprinkle that in along with a bunch of other datasets that are known to make models creative but also still not dumb.
Then we will have Claude at home for realsies.
>>
>>103385705
We dont even have non "sloped 2023 gpt/claude" datasets anon.
>>
>>103385697
If the model is available to download in EU it must comply.
>>
>>103385714
claude doesn't either. Rip everything like they did that is not already ripped.
>>
>>103385683
Down with the EU
>>
File: 1733184963282368.png (18 KB, 598x278)
>>103385719
so, same as games then, what a joke.
Cant wait for model pirate sites to turn up for eu cucks. lol
>>
>>103385714
For pretraining? Just get all the copyrighted shit you can plus don't filter out sites like 4chan and other image boards.
For fine tuning, just get Tulu, Nemotron, and the future reasoning dataset that will eventually happen, and filter the slop and refusals (at least the worst of them) out. It'll be good enough. No one's expecting Sonnet 3.5, but if we can get an old Sonnet or a Haiku in terms of intelligence and lack of censorship, that's still a good thing, and no other open model by a corpo would be able to match it.
>>
>>103385683
EU fag here, I'm so sorry. Usually we try to fuck over big companies, but this is a huge L
>>
I am trying to get RAG working. llama.cpp dropped support, ollama is insisting that it convert every local model, kobold doesn't have RAG.

Is there a local solution?
>>
Does anyone know what exactly is lost when using a Q8 cache over an FP16 cache?
>>
>>103386247
about half
>>
AI Meets Antimatter: Unveiling Antihydrogen Annihilations
https://arxiv.org/abs/2412.00961
>The ALPHA-g experiment at CERN aims to perform the first-ever direct measurement of the effect of gravity on antimatter, determining its weight to within 1% precision. This measurement requires an accurate prediction of the vertical position of annihilations within the detector. In this work, we present a novel approach to annihilation position reconstruction using an ensemble of models based on the PointNet deep learning architecture. The newly developed model, PointNet Ensemble for Annihilation Reconstruction (PEAR) outperforms the standard approach to annihilation position reconstruction, providing more than twice the resolution while maintaining a similarly low bias. This work may also offer insights for similar efforts applying deep learning to experiments that require high resolution and low bias.
posting for Johannes
>>
File: silly data bank RAG.jpg (20 KB, 279x284)
>>103386178

SillyTavern has a RAG data bank. I have been using it extensively in place of World Info. Works decently with Mixtral. Of course, tested for RPing, so it might not be what you need.
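If you want it outside ST, you can also roll a crude RAG directly against llama-server's embedding endpoint (sketch; assumes the server was started with --embedding, or --embeddings depending on version, plus an embedding-capable model, and the response shape varies a bit between builds):

import json, math, urllib.request

def embed(text):
    # POST to the llama-server embedding endpoint (assumed local default port)
    req = urllib.request.Request(
        "http://127.0.0.1:8080/embedding",
        data=json.dumps({"content": text}).encode(),
        headers={"Content-Type": "application/json"})
    resp = json.load(urllib.request.urlopen(req))
    return resp["embedding"]  # some builds wrap this in a list; adjust if so

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

docs = ["Miku is a virtual singer.", "QwQ is a 32B reasoning model."]
vecs = [embed(d) for d in docs]
query = embed("which model thinks step by step?")
best = max(range(len(docs)), key=lambda i: cosine(query, vecs[i]))
print(docs[best])  # paste the top hits into the prompt as context

Retrieval is just embed, rank by cosine, and stuff the winners into the prompt; everything else the frameworks add is convenience.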
>>
>>103386340
Does this increase t/s?
>>
>>103386356
>>103386356
>>103386356
>>
>>103385878
Not your fault, that fag Sam has been going around fearmongering. Even China is not exempt from his scare tactics.
>>
>>103379762
I couldn't find any jsons in the archives so here, this assumes you're using mistral's API. Nothing too fancy.
sliders
>https://files.catbox.moe/jvuuh2.json
context,instruct,system (using master export, yes go update your fucking ST)
>https://files.catbox.moe/h66z27.json
>>
>>103386340
Thanks.

>>103386365
Two more weeks.
All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.