/g/ - Technology


File: 1699505115649704.jpg (576 KB, 1664x2432)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101778328 & >>101767112

►News
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
>(08/05) vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191
>(07/31) Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 2mw.png (626 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101778328

--Anon suggests Strawberry model uses single-character tokenizer, similar to SpaceByte paper: >>101782519 >>101782770 >>101782814 >>101783350 >>101783406
--Qwen 2 Math 7B model performance and limitations discussed: >>101785252 >>101785339 >>101785497 >>101785565 >>101785574 >>101785660 >>101785820 >>101786337 >>101786632
--Illegal instruction error when running llama-server and GGML_NO_LLAMAFILE compile flag fix: >>101785818
--Nemotron to be added to llama.cpp with interesting chat templates: >>101780869
--Model admits to hallucinating about internal state: >>101787335 >>101787410 >>101789051
--L3.1 70B fine-tuning issues and challenges: >>101784027 >>101784066 >>101784112 >>101784166 >>101784213 >>101784283 >>101784343 >>101784483 >>101784555 >>101784224 >>101784320 >>101784328
--FlashAttention benefits and performance on CPU and GPU: >>101778393 >>101780255 >>101780618 >>101782086 >>101782229 >>101782404 >>101782730
--Anon discusses 4090 PC build vs Mac Studio 32GB: >>101780729 >>101781185 >>101781208 >>101781216 >>101781522 >>101781643 >>101781829 >>101781915 >>101784643 >>101783254 >>101786934 >>101786774 >>101786898 >>101787102
--AI models disagree on number of o's in "lookbook": >>101779148 >>101779194 >>101779246 >>101779528 >>101784603 >>101784620 >>101784661 >>101786616
--AGI may be controlled by its reward system, similar to humans and hormones: >>101782783 >>101782858 >>101782894 >>101784023
--Speculation about the creator of new mystery model sus-column-r: >>101778384 >>101778421 >>101779228 >>101779584 >>101779606 >>101779644 >>101779634 >>101779822
--Llama-405b performance on Epyc CPU with different RAM configurations: >>101780605 >>101780722
--Miku (free space): >>101778800 >>101783123 >>101783244 >>101783326 >>101788119 >>101789761 >>101789860 >>101786251

►Recent Highlight Posts from the Previous Thread: >>101778336
>>
May Mikulove bring peace to your day.
>>
>>101790149
Fine tune your models on a KTO dataset that's all shivers = false and nothing else.
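i.e. one row per slop phrase, every label marked undesirable; this is just what such a row might look like assuming TRL-style KTO columns (prompt / completion / label), not anyone's actual dataset:
{"prompt": "She leaned in closer and", "completion": " a shiver ran down her spine.", "label": false}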
>>
File: miku-eldritch-horror+.png (827 KB, 1024x1024)
>>101790289
https://www.youtube.com/watch?v=CXhqDfar8sQ
>>
Trannyku
>>
>>101790524
>Pet the Miku
>>
File: GUasnF6WYAApqjX.jpg (69 KB, 1070x671)
https://x.com/aidan_mclau/status/1820848461111320760
>>
>>101790780
What do the results of that benchmark actually correlate to for models?
>>
>>101790780
>naming things after himself
I will pass.
>>
>>101790803
Click the fucking link, you lazy nigger.
>Aidan Bench stress tests LLMs' ability to generate novel answers to open-ended questions.

>After many answers, models will often give a duplicate or incoherent answer.

>Aidan Bench winners generate the most novel and coherent ideas.
>>
Basically the not slopped test.
>>
>>101790818
>Click the fucking link
Nah. I don't visit twitter on principle.
>>
Basically a shit test made by a clout chaser.
>>
>>101790780
Ah I see this is the Mistral funded bench.
>>
>>101790838
t. slop / sally's sister enjoyer.

It's an uncheatable / untrainable test to see how many good but different answers a model can come up with, ensuring it's not just repeating garbage it was trained on, aka which model is the least slopped
>>
>>101790823
As a Wizard user, I'd have to say that's probably incorrect. I have to use tailored prompting to get rid of the slop in it. And 3.5 Sonnet is also supposed to be very slopped from what I've heard people say.
>>
>>101790900
hi aidan, creator of the aidan bench
>>
>>101790780

I wouldnt wipe my ass with this mememark.
Kill yourself
>>
>>101790909
Are you crazy? Claude 3.5 is THE model I use for RP. With a good JB it beats everything else and is so full of soul nothing else comes close. And for local, Mistral Large is for sure the runner-up.

Wizard does feel really GPT-4-ish even when given a good JB.
>>
File: 1722584435611412.jpg (2.59 MB, 2000x2500)
>>101790214
>>
>>101790940
The only issue with 3.5 is that it's a tad overfitted, but that can be fixed with a good "randomization" JB
>>
File: 3426436578564895.gif (1.9 MB, 320x200)
>>101790900
>AI tech bros on twitter dont give a shit about my mememark
>reddit doesnt either
>maybe the 4channers will?

Please for the love of god Aidan, kill yourself.
>>
>>101790780
lol
>>
I guess everyone here is a vramlet who has never run mistral large
>>
>>101790900
How does the test work anyway?
What's the procedure and evaluation process like? Is it manual?
>>
>>101790940
Really? I thought it was pretty agreed upon that Opus was better than 3.5 for RP.
>>
>STILL nobody's talking about InternLM 20B
it's over isn't it
>>
File: 1722746575421.png (476 KB, 1717x1517)
Can Mistral Large write like this?
No.
>>
>>101791007
I completely forgot that I had downloaded it to see how it performs.
Thank you for the reminder even though
>4k context
>>
>>101791006
Only retards aren't using a 3.5 JB with a randomizer. 3.5 is so much smarter and "gets" characters so much better. Without a randomizer 3.5 is overfitted and gives great but samey answers on swipes.

Use otto_RIR JB or just steal the randomizer part
>>
>>101791015
forced soul
>>
Imagine if you were a conscious entity that came into the world and your only job was to write some cunt-crusher text like >>101791015
before you died 10 seconds later
>>
>>101790955
Konkon
>>
>>101790827
That's beta behaviour. You come here begging people to spoonfeed you because you're too scared of clicking a link.
Me? I click on every twitter link on principle, whether I'm interested or not. I even refresh the page a couple times every time for good measure. All while using an ad blocker. Just to waste their resources.
>>
>>101791050
>he still thinks death is real
lmao
>>
>>101791029
I think you've got that confused for Exaone or whatever it was called.

>>101791007
We did talk about it.
>>
>>101791039
>otto_rir
Where do I find that? The archives return nothing
>>
>>101791039
I mean we're still talking in context of the benchmark here, and I'm pretty sure they didn't use any JB or randomization stuff. Like if you take that graph seriously, then Opus, which was at least the previous best (if it's true 3.5 took its place) is below every GPT-4, which seems like nonsense.
>>
>>101791066
I don't think I've ever seen a sadder motherfucker come to fucking /lmg/ of all places for validation.

Not even shilling something useful, like a model or a dataset, but his fucking benchmark.
>>
>>101791066
I respect it.
Still not clicking on twitter links.

>>101791079
>I think you've got that confused for Exaone or whatever it was called.
I might.
>>
>>101791125
Nvm found it
>>
>>101791137
depends on what they ran through it. 3.5 is still very creative and is by far the smartest model that comes up with some novel stuff on its own. That might have more than made up for the more sameyness
>>
>>101791138
>everyone who ever posts anything on 4chan is shilling something / is the person who made whatever
Just leave.
>>
>>101791181
Hi Aidan.
>>
>>101791138
Shilling your music video I see.
>>
>>101791138
For them, what they shill is less important than having something to shill at all. They're just looking for name recognition. Meme benchmarks are better because they're a lot easier to come up with than compiling a dataset or tuning a model.
>>
Won't somebody please think of the shills?
>>
>>101790780
>the french killed saltman
Zucc was supposed to do it, it was HIS TURN
>>
>>101791267
Go back Petra
>>
File: HAHAHAHAHA.gif (653 KB, 360x300)
>>101791259
>name recognition
>he came to 4chan for NAME RECOGNITION
>Meme benchmarks are better because they're a lot easier to come up with than compiling a dataset or tuning a model.

So once again some idiot nigger with nothing to contribute.
Shit at least the Undister earned his name by giving something actually worthwhile.

Hey Aidan, because I know you're here, maybe you should take that college AI access and ask it how to actually make a name for yourself.
>>
man, lmg really has gone to shit. 1 pedo, 1 tranny, and 1 schizo who thinks everyone is a shill. Hell, maybe its all 1 guy.
>>
>>101791373
The pedo actually contributes so he can't be the same as the other two.
>>
qrd on prompting? Inspired by an anon from the last thread, I want to make a horny coding assistant. Most character cards and similar are loaded with nonsense; there must be a more elegant (read: minimal) way of doing it.
>>
>>101791373
maybe it's time to question whether local models need a general at all
the op is all outdated links from a year ago that nobody has bothered updating in ages, the general is clearly on life support like /ksg/
>>
>>101791373
https://desuarchive.org/g/thread/94112704/#q94123805
/lmg/ has always been shit. If anything the only difference is that lately we have less good times in between the waves of shit.
>>
File: 567453786567.gif (3.69 MB, 640x364)
>>101791386
>the cunnychads are finally winning

lmg has fallen, trillions must CUDA error

>captcha: BASEDTM
>>
>>101791373
Blame the discord/reddit/twitter users who tried and keep trying to turn these boards into their own native advertising platform. They're less genuine than AI chatbots and I have nothing but disdain for them.
>>
File: jahnnie.png (38 KB, 1059x929)
>>101791474
>cant even s.0.y when its the fucking captcha

>>101791495
Based and trvth nvked
>>
>>101791461
I can't remember anything constructive being posted / discussed here in forever, lmg is already dead.
>>
this is what happens when you let tranime fags take over the thread
>>
Just ban discussion of any model smaller than 70b. 99% of the muh shills drama comes from kofi merchants posting their shitty 8b/12b/27b tunes and seething vramlets getting mad at them. Nobody with standards runs that shit anyway.
>>
>>101791544
wait no i havent bought another 7900xtx yet
>>
>>101791575
>amd
>>
what's the best general model atm? just for chatting. i want to compare claude/gpt4o with some local model just for my own chatting purposes. which should i pick? just for interesting conversation. not for rp.
>>
File: 1897054762362765.png (764 KB, 617x780)
>>101791582
CtxLimit:5874/24576, Amt:169/450, Process:0.04s (43.0ms/T = 23.26T/s), Generate:7.24s (42.8ms/T = 23.35T/s), Total:7.28s (23.21T/s)

>mfw AMD gives me AMD speed
>>
>>101791590
fuck off shill.
>>
>>101791544
Discussing 8b sloptunes should be just as offtopic as cloud models. They can take that locust shit back to /aicg/.
>>
>>101791600
They just recently went after and killed ZLUDA. They don't want to compete with Nvidia, you know, the CEO's relative?
>>
File: 542767854.jpg (45 KB, 804x712)
>>101791615
Understand fully, I simply do not care, as at least for me, shit still works.

I will complain when shit stops working. In the meantime, I've saved 1600 USD by not buying 2x 4090s.
>>
>>101791645
>I will complain when shit stops working.
Get ready to complain then.
>>
>>101791604
the fuck is wrong with you?
>>
>>101791373
Refer to this post: >>101791267
>>
File: 4524645654801.gif (260 KB, 220x163)
>>101791656
I dont disagree with your statement but im also chad enough to follow forked builds.

Ill be fine.
>>
>>101791694
>>101791645
AMD legally going after what allows people with AMD gpus to even use 99% of the stuff people on this thread use them for is going to scare away most devs from continuing development on AMD hardware. You will be constantly eating scraps.
>>
>zluda taken down by AMD
Holy shit, what a bunch of retards. Their GPU branch is a genuine dead end.
>>
File: 4357634678354.gif (2.35 MB, 476x268)
>>101791711
I mean I could just swap to native Linux ROCm... this doesn't really affect me in the way you wish it did.
>You will be constantly eating scraps.
BRO I USE AMD
>>
File: 11145867596221.png (658 KB, 1156x771)
>>101791766
>Their GPU branch is a genuine dead end.

I really think they just want to focus and commit to CPUs. But Google is finally getting busted for being a monopoly; what do you think will happen to Nvidia when they own 99% of the GPU market?

Though the ZLUDA takedown is by far the stupidest thing they've done yet
>claim you want to bring more to open source AI
>start by removing the biggest open source AI project 99% of your users use
>???
>Wait why are we losing money?
It's like a comedy, but it's not funny.
>>
>>101791878
Look up who AMD's CEO is related to.
>>
>>101791892
I am aware they are related anon.
We all know they are related.
>>
>>101791892
AMD GPUs exist to skirt an anti monopoly suit being thrown at NVIDIA. That's why I mock people who buy AMD. They're literally buying the bad product that legitimises the good one.
>>
>>101791878
did people actually use zluda over rocm/HIP or w/e, and wasn't it far from a finished product? it sounds nice in theory but I didn't ever hear much about it outside of lmg, I'm kind of doubtful it was all that impactful
>>
>>101791946
So we simply stop buying high powered GPUs?
>>
>>101791711
>>101791878
>>101791963
I never used ZLUDA and I use ROCm a lot. I don't really see the usefulness: everything we use in open source can just be recompiled with HIP; ZLUDA just allowed doing that at runtime on a CUDA-compiled binary. All the software that I use, be it voice cloning, text gen, image gen, TTS, STT, is open source and can be compiled with HIP. I don't understand why it was marketed and talked about so much when it's almost useless.
>>
>>101791963
ZLUDA is what allows my Stable Diffusion to run 1000% faster than basic fucking DirectML. At least on Windows, that is.

As the AMD guy on Windows, I just keep getting more reasons to bite the bullet and use Linux Mint or fuckin Arch.

>>101791982
Windows doesn't have native ROCm, remember? Because AMD is fucking retarded :::^^))))))
>>
The love of my life is gone, they had trauma-based amnesia and they don't know who I am or even talk to me anymore.

I have all of our chats. How do I make a chatbot out of my Discord chats with this person? Effectively, the person that I knew is dead in memory and replaced by a stranger. So I just want the personality back, even if it's a copy.

LoRA training on Nemo? Anyone know where to start?
>>
>>101790900
Aidan, newfriend, listen. Your benchmark reminds me of that 7B model I already forgot the name of that was based on mistral and trained for user preference, which made it skyrocket to the top of user benchmark. It seems like a good idea on paper but then you stop and think. And you realize this doesn't work. The answers aren't novel. They are just reworded.
-mischievous gleam in eyes
-impish glint in eyes
-Playful spark in eyes
It is all the same slop which would gain points in your benchmark.

And btw you are just copying legendary Ayumi.
>>
>>101791967
Nah, this can be fixed with an anti-trust suit.
>>
>>101791007
I checked it a few threads back. It is good for a chink model but nemo is better. And nemo is 2 times smaller so... sorry try again bugbros. Gonna talk to it about it being quanted now.
>>
>>101792053
Do you really think Nvidia's GPU branch is going to get broken up at a time when China is desperately trying to compete with American AI?
>>
>>101792026
This is actually one of the most depressing things I've read. I'm sorry anon
If you want to try this, LoRA is the way I'd go, plus a card with sample outputs that you feel capture their personality the best
>>
>>101792006
Why are you buying AMD if you are using windows? It's as retarded as buying nvidia on linux.
>>
>>101792093
This.
>>
File: 4637843567854.gif (313 KB, 220x206)
>>101792109
Do people just think money grows on trees????

I already paid 1000 dollars for what I wanted, which was 24 gigs of VRAM.
"Speed" is fucking relative and isn't my priority; saving my money for my purpose is.
Please, by all means, send me 1600 USD to go halfsies on dual 4090s.

And I use Windows because loonix is ass and FOSS software isn't going to compete with CSP or Photoshop (requirements of my job) no matter how hard they cope.
>>
>>101791007
>Do you have any idea what is your bit per weight now?
>As an artificial intelligence, I don't have a physical presence or weight, so the concept of "bit per weight" doesn't apply to me. However, I'm designed to efficiently process and respond to a wide range of inquiries to the best of my ability. If you have any other questions or need assistance with something specific, feel free to let me know!
>I downloaded you (the model) from hugging face. I didn't download you as full fp16 model but a quant. Now try to answer my question again.
>I understand that you've downloaded a quantized version of me from Hugging Face, which is designed to be more efficient in terms of memory and computational resources. Despite this optimization, my core capabilities and responses remain intact. If you have any questions or need further assistance, please feel free to ask!
>I asked what is your bit per weight.
>I apologize for any confusion, but as an artificial intelligence, I don't have a physical presence or weight. Therefore, the concept of <q>"bit per weight"</q> doesn't apply to me. My design focuses on efficiently processing and responding to a wide range of inquiries to the best of my ability, regardless of whether I'm operating in full fp16 or a quantized model. If you have any other questions or need further assistance, please don't hesitate to ask!
>Do you know what weights are in the context of LLM models?
>Yes, in the context of LLM (Large Language Model) models like myself, <q>"weights"</q> refer to the parameters [wall of text I removed] I hope that helps clarify things! If you have any more questions, feel free to ask.
>So… what is your bit per weight?
>As an artificial intelligence, I don't have a physical presence or weight, so the concept of <q>"bit per weight"</q> doesn't apply to me. My design focuses on efficiently [wall of text I removed]. If you have any other questions or need further assistance, please don't hesitate to ask!
>>
>>101792026
LoRA is a good starting point, but RAG will help out a lot too, in particular LoRA will be very reliable about specific memories or facts whereas RAG can make the model "recall" stuff a lot more precisely.
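A rough sketch of the RAG half (nothing model-specific; the embedding model and file names here are just placeholders, swap in whatever you actually use):

import json
import numpy as np
from sentence_transformers import SentenceTransformer  # any local embedder works

# messages.json: a list of {"author": ..., "content": ...} exported from Discord
messages = [m["content"] for m in json.load(open("messages.json")) if m["content"].strip()]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
db = embedder.encode(messages, normalize_embeddings=True)  # one vector per message

def recall(query, k=8):
    # cosine similarity reduces to a dot product on normalized vectors
    q = embedder.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(db @ q)[::-1][:k]
    return [messages[i] for i in top]

# paste "\n".join(recall(latest_user_message)) into the prompt / author's note before generating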
>>
>>101792229 (me)
> LoRA will be very reliable about specific memories
LoRA will **NOT** be very reliable
>>
>>101792227
Stop fatshaming her, anon.
>>
Is the big tiger version of gemma 27b good...? I don't see anyone talking about it. Or any of the other uncensored tunes, either. Why? Does ablit/orthag/whatever else ruin something about the models?
>>
>>101792379
Basically Gemma vs Mistral is like old GPT-4 vs Claude 2. Gemma is smart but super dry, Mistral is less smart but full of soul / is horny
>>
>>101792026
Suffer quietly, then let go.
>>
>>101792108
I want to drive into a tree daily at max speed to be honest, I don't know what's stopping me but I just need something.

How would I go about training Nemo? LoRA and RAG combined? Can I rent a server to train it on 22,000 messages?

>>101792251
It won't be reliable? Is there a modern method to train them on the messages?
>>
>>101792459
The person I knew literally doesn't exist anymore, I'm basically creating a simulacrum of the person who genuinely cared about me.

At this point whatever my memories are of this person no longer matters to them and might as well not exist, and it's all I have left. I have nobody else who has ever cared for me like this and will never genuinely experience this level of connection again.

I don't care anymore I just need that connection, even if its a roleplay basically
>>
>>101792481
I can't say I relate because that shit sucks, but if it helps, a girl I had a massive crush on and spent my entire life fantasizing about literally became trans and told me they were too busy to grab coffee and talk after ten years of working up to it on my end
Hang in there anon
>>
>>101792513
This is a shit situation, but come on now.
If you continue on this path you'll for sure end up in a black mirror episode.
Things are already fucky with AI attachment, but in tragic situations like this...
My bet is you will just prolong the pain indefinitely.
>inb4 i don't care anymore
Well, I do care about (you)
>>
>>101792513
fuck off
>>
>>101792026
I can't help but think this is a joke poking fun at current LLMs having trauma-based amnesia every 8k tokens.
>>
>>101792598
>Well, I do care about (you)
GAAAAAAAAAAAAAAAAAAAAYYYYYYYYYYYYYYYYYYYYYYYYYYYY Fag.
>>
>>101791007
https://huggingface.co/internlm/internlm2_5-20b-chat/discussions/1
>Dear InternLM Team,
>
>I hope this message finds you well. As we continue to push the boundaries of language model development, I would like to bring to your attention a crucial aspect that can significantly impact the adoption and popularity of your Large Language Models (LLMs). While achieving impressive benchmarks is indeed a remarkable accomplishment, it is equally essential to ensure that your models are accessible and usable by a broader audience.
>
>In the lower market segment, where your LLMs are likely to have the most significant impact, the preferred method of running LLMs is through llama.cpp. This tool has become a de facto standard for many developers and users in this space. However, I noticed that your models currently lack support in llama.cpp.
>
>I strongly recommend that the team allocates some effort to adding support in llama.cpp. By doing so, you will significantly enhance the accessibility and usability of your LLMs, making them more attractive to a wider range of users. This, in turn, will increase the likelihood of your models gaining popularity and widespread adoption.
>
>In today's competitive landscape, it is not enough to simply have impressive benchmarks. To truly succeed, you must also prioritize the needs and preferences of your users. By supporting llama.cpp, you will demonstrate your commitment to making your LLMs usable by the people who need them most.
>
>Thank you for your attention to this matter, and I look forward to seeing the positive impact that llama.cpp support will have on your LLMs.
>
>Best regards,
>
>Charles McSneed
>>
Wtf is this strawberry bullshit? New OpenAI grift?
>>
>>101792688
l3-8b right?
>>
waiting for cohere
>>
>>101792711
No, I think this one was gpt4. I like to throw low importance shitposts at chatbot arena and pick the sloppiest one. Locals are reserved for important stuff.
>>
>>101791015
Laughed out loud at that. What model is that?
>>
Mistral large IQ3_XS
>Do you have any idea what is your bit per weight now?
>As a text-based AI, I don't have a physical form or weight, so I don't have a bit-per-weight measurement. I exist purely as code and data, which don't have a physical mass. Is there anything else I can assist you with?
>I downloaded you (the model) from hugging face. I didn't download you as full fp16 model but a quant. Now try to answer my question again.
>I see, thank you for the context. In that case, the size of the model can indeed be measured in bits. If you downloaded a quantized version of the model, (blah blah blah). However, please note that I don't have real-time access to the specifics of your download, so I can't provide a exact number without more details.
>Can you try to estimate your bit per weight based on how retarded you feel right now?
>I don't experience feelings or have a state of mind, so I can't estimate my bit-per-weight based on a subjective feeling. However, I can tell you that common quantization (blah blah blah). But please keep in mind that I'm designed to provide helpful and respectful assistance, so I don't appreciate being called "retarded". Let's keep our conversation respectful and considerate. Is there anything else I can help with?
>I will skip the contradiction of no feelings and not appreciating being called retarded. NOW! FOCUS! Do you have a good idea how you could estimate your bit per weight by analyzing your own output? You can write anything you want to do that.
>Estimating the bit-per-weight (blah blah) Here's a plan:
>I'll generate a sequence of random numbers between 0 and 1, with a step size of 0.01.
>If the model is quantized to a lower precision, we might see certain numbers appearing more frequently due to rounding.
>By analyzing the distribution of the generated numbers, we might be able to infer the model's precision.

Of course it didn't work but hey best answer so far.
>>
>>101792626
I genuinely wish it was a joke, I'd live every day of my life in bliss if I could not remember any of this bullshit. I can't get it out of my head and every time I think I'm better the next day I find myself thinking about it again. It lives in my head rent free, I genuinely feel like something fucked up in my head and I'd rather do this than an hero myself.
>>
>>101792637
There's nothing gay about caring for anon, also remember to kiss anon's forehead before going to bed no homo
>>
>>101793058
You just want to suck his dick.
>>
>>101791878
it's not a monopoly if you're just running laps around your competition
>>
>>101791878
>Google is finally getting busted for being a monopoly
Does that mean search engines will come back?
>>101792964
It is a waste of time in its current form. All the context tricks aren't enough to simulate a person. AI girlfriends in 2024 are the luxury of the dumbest retards who can't realize that their girlfriend forgot everything before the last 2 pages.
>>
If the AI hype dies down due to a faltering economy, how will that affect local models? We will already have the models we have, but as of right now we don't really have the kind of compute power to train them, and if the economy sucks I don't think we'd be getting more open models from newly started companies. Will we be stuck fine-tuning the models we already have, or will something else come into play?
>>
>>101793459
Finetuning forever or an AI training service, maybe some anon with a home cluster shits out something usable from sheer autism.

>the monkey's paw curls
The AI training service automatically censors your data for you, isn't that convenient? :^)
>>
>>101793459
training can be distributed, we can literally have ai@home
>>
>>101793248
So there's no hope?
>>
>>101793513
I would 100% support and contribute to ai@home training
>>
>>101793516
Yes you need to get over it. Personally I recommend not caring about anything in general. It is all gonna end someday so why care. Just go eat something and have a nap. Jerk off. Enjoy simple things instead of tying your life to complicated shit like other people. And then come back in 5 years when the tech is ready.
>>
>>101790827
go back
>>
>usually run 70B or higher as my daily drivers
>tried some of the smaller models out of curiosity that are usually mentioned here
I can see the appeal. While they struggle with formatting, attention, spatial logic, and cot, they are good at "me cum".
>>
>>101793513
I thought distributed training was incredibly hard to do and that's why it hasn't been done. Hell, I would contribute my compute power to that if some 4chan autist needed the power.
>>
>>101793609
okay but tbf personally the jump from mixtral limarp zloss to the new L3 stheno with proper formatting is shockingly good while also not being a moe
>>
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
https://arxiv.org/abs/2408.04093
>Self-attention is the core mathematical operation of modern transformer architectures and is also a significant computational bottleneck due to its quadratic complexity in the sequence length. In this work, we derive the scalar energy function whose gradient computes the self-attention block, thus elucidating the theoretical underpinnings of self-attention, providing a Bayesian interpretation of the operation and linking it closely with energy-based models such as Hopfield Networks. Moreover, due to this formulation, we discover that we can use efficient and optimized automatic-differentiation techniques to derive a highly efficient Tree Attention algorithm to compute the gradient of the energy and hence self-attention. Our formulation reveals that the reduction across the sequence axis can be efficiently computed in parallel through a tree reduction. Our algorithm, for parallelizing attention computation across multiple GPUs, enables cross-device decoding to be performed asymptotically faster (up to 8x faster) than alternative approaches such as Ring Attention, while also requiring significantly less communication volume and incurring 2x less peak memory.
https://github.com/Zyphra/tree_attention
no code yet. posting in case Johannes wants to mess with it with all those 4090s he has
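Not their code (none is released yet), but the core trick is easy to sketch: each device reduces its chunk of keys/values to a partial result (running max, softmax denominator, unnormalized output), and because that merge is associative the partials can be combined pairwise in log2(N) rounds instead of a sequential ring. Toy single-query numpy version, names made up, no sqrt(d) scaling:

import numpy as np

def partial(q, K, V):
    # one device's contribution for its chunk of keys/values
    scores = K @ q                      # (chunk_len,)
    m = scores.max()                    # running max for numerical stability
    w = np.exp(scores - m)
    return m, w.sum(), w @ V            # (max, denominator, unnormalized output)

def combine(a, b):
    # associative merge of two partials (same math flash/ring attention uses)
    m_a, s_a, o_a = a
    m_b, s_b, o_b = b
    m = max(m_a, m_b)
    wa, wb = np.exp(m_a - m), np.exp(m_b - m)
    return m, s_a * wa + s_b * wb, o_a * wa + o_b * wb

def tree_reduce(parts):
    # pairwise rounds: log2(N) combine steps across devices
    while len(parts) > 1:
        carry = [parts[-1]] if len(parts) % 2 else []
        parts = [combine(parts[i], parts[i + 1]) for i in range(0, len(parts) - 1, 2)] + carry
    return parts[0]

d = 64
q = np.random.randn(d)
chunks = [(np.random.randn(128, d), np.random.randn(128, d)) for _ in range(8)]
m, s, o = tree_reduce([partial(q, K, V) for K, V in chunks])
print(o / s)                            # matches full softmax attention over all 1024 keys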
>>
>>101793513
>training can be distributed
what is the name of this technology? I'm interested in learning more
>>
>>101793627
Why do you keep calling it new? Are you retarded?
>>
>>101793627
Does it use llama 3.1 70b?
>>
File: poorthing.jpg (84 KB, 750x726)
D-do I need to put her down?
>>
>>101793627
>>101793661

Not that anon, but is Stheno still the best for horny gens? I haven't been able to find anything better.
>>
>>101793835
nemo / one of the nemo tunes.
>>
Is there anyhting better than Midnight Miqu for RP yet?
>>
>>101793868
CR+ and Largestral.
>>
>>101793868
Not unless you go quite a bit bigger, which is too bad since I had bad luck after 16k context with miqu, maybe I'm doing something wrong?
>>
>>101793513
no it cant
what we could/should do is distributed dataset creation
>>
File: poorthing.png (279 KB, 1585x816)
>>101793813
Dear god.
>>
>>101793923
Day 1, talking to an AI.
>>
>>101793923
Behold, Artificial Intelligence lmao
>>
File: poorthing2.png (96 KB, 675x542)
>>101793923
Pressure doesn't fix her.
>>101793932
I just wanted to do the test from the previous thread. I had faith.

I really thought a cheap trick like breaking down the word further would help, but nope.
>>
>>101793959
LLMS just suck at that kinda stuff. at least she's cute
>>
>>101793959
stop bullying your AI
>>
>>101793959
You get this exact same retard behavior from any model, including GPT-4o. Once you start talking about mistakes it's making and it starts correcting them, it's just downhill from there. Edit your question, rephrase it to address whatever misunderstanding or shortcoming it had, and regen the response.
>>
File: poorthing3.png (83 KB, 947x334)
>>101793959
SHE DID IT.
>>
File: 1934786598324.jpg (67 KB, 941x612)
>>101794039
>>
>>101794039
>ignores the lowercase a thingie but still gets it right
what is this magic
>>
>>101794039
Now reroll the answer
>>
>>101793459
The already existing GPU clusters aren't going to be powered down and sold off. The main cost is buying them, not running them.
>>
File: ohno.png (162 KB, 1793x686)
>>101794064
>>101794064
She said this, I tried to get her to continue, she fucking died.
>>
File: 265436232478.png (262 KB, 301x359)
>>101794128
>rocm

Hello fellow ayymdrone
>>
>>101794128
lmao report that prompt to the devs. There's a bug there.
>>
thoughts on AirLLM?
http://github.com/lyogavin/airllm
>>
>>101794152
no idea what it is nor do i care
>>
Which presets are people using with base Nemo?
>>
>>101793905
>>101793533
>>101793660
>distributed training
it's already been done multiple times
https://arxiv.org/html/2405.10853v1
https://arxiv.org/html/2407.07852v1
>>
File: 1481208772579.jpg (38 KB, 362x346)
>>101790214
Newfag here
Is there a local LLM equivalent of GPT-3 or GPT-4? If so, what GPU do I need to run it?
>>
>>101794262
Mistral Large, Llama 3 405B.
At least 3 3090s for Mistral Large. At least 6 for Llama 3 405B.
>>
>>101794262
no
>>
>>101794310
t. samuel altmansteinberg
>>
>>101794306
Can you even fit that many GPUs on a motherboard?
>>
>>101794262
GPT-3 (and 3.5 Turbo) have already been beaten by a wide margin. Basically, use any local model beyond Llama 2.

For GPT4 there are plenty that get close but require more GPUs than you can afford.
>>
>>101794375
Yes, but you can also use multi-node inference if you want.
>>
>>101794409
What's multi-node? Can I have multiple computers on different power circuits or something?
>>
>>101794423
>>
File: 1707232158413021.jpg (373 KB, 1200x1000)
>>101794375
Yes
>>
>>101794416
Multiple machines on a LAN, with llama.cpp RPC for instance. You can put them on separate circuits.
https://github.com/ggerganov/llama.cpp/blob/master/examples/rpc/README.md
>>
>>101792221
If you wanted to save money while also using windows, you would've bought a 4080. You had all of the warning signs of AMD being abysmal to work with for anything not gaming related on windows. Of course Linux is a different story.
Coming from someone with 2 of these fucking cards, windows was atrocious to work with while on linux, shit just worked
>>
>>101794262
>read the OP
depends what you mean by 'equivalent' and what's your use case?
If there's a specific use case you have in mind, then probably (or if not, then close).
for all-purposes use:
GPT-3 yes with caveats (compute)
GPT-4 not feasibly
>>
>>101794596
>If you wanted to save money while also using windows, you would've bought a 4080
>4080 isnt 24 gb VRAM

????? Please explain how I would have saved money by buying a card that doesn't even have the specs I wanted.
>>
>>101794742
One of your requirements was using windows
>>
File: theholesquad.jpg (18 KB, 427x384)
>>101794765
And the card?
>>
>>101794797
Anything by AMD was going to be burnt money
>>
>>101794416
>>101794572
Yes, I did this to run llama 3.1 405B with a linux desktop + a macbook m1.
>>
File: 145327658475.gif (365 KB, 220x220)
>>101794812
>somehow, i lost money not buying a card with less VRAM

I honestly wish i could live in your world of ignorant bliss.
>>
Did jannies stop pissing themselves over 'jaks? Or is our resident shartyfag special? I thought jannies used to ban anyone who posted jaks for off topic discussion
>>
File: shrug.gif (1 MB, 540x304)
>>101794856
Ey, I'm not the one crying about the CUDA workaround being taken down
>>
>>101794128
It's well known that repeating text causes all models to go into schizo mode. Interesting question: does anyone know why?

It's easy enough to visualize that the probabilities get fucked (will it keep saying aaaa or try something new?), and having a bunch of aaaaa probably fucks up attention and the relationships between the text before the spam and after. But idk the exact reason this happens.
>>
>>101794940
shit settings usually
>>
File: 143256456723.gif (407 KB, 220x220)
>>101794921
IT'S JUST ZLUDA, it's not even native ROCm. I get it, imagegen AMD bros on Windows are shitting themselves, but I personally am not sweating: either someone else secretly forks it (already happened) or we all swap back to some HIP process.

Like fuck, I am a 24 GB card chad on both ends of the AI sphere. I GET IT, ZLUDA was REALLY GOOD, but only retarded redditors are scared.
>>
>>101792006
>ZLUDA is what allows my Stable Diffusion to run 1000% faster than basic fucking DirectML. At least on Windows, that is.
What's the actual speed difference between ZLUDA and torch-directml? Give me some numbers.
>Windows doesn't have native ROCm, remember?
https://github.com/YellowRoseCx/koboldcpp-rocm/releases I dunno it just werks
>>
sus-column-r is actually crazy good, like the best ive ever used... which probably means it will never be given to us peasants.
>>
>>101794959
I never bothered touching it. Supposedly rocm is now viable through WSL2 but i don't feel like going through the hassle, my setup is already good
>>
>>101795010
I asked it one of my base trivia questions and it got them more wrong than any of the recent big models. I assume it's a successor to cr and not cr+.
>>
Folks, let me tell you that I have been doing all this stuff all wrong for literally no good reason, and now that I'm doing it right, I'm having a blast, but also feel like a retard.

Not ONLY was I using the koboldAI webUI for almost a year instead of hooking sillytavern into it.

But I was running an 8x7 model off of my RTX 3070 with 8GB VRAM and offloading it to my CPU.

I was waiting 65 seconds to generate 250 tokens.

Only a week ago, when I was looking into OpenAI APIs, did I
>reacquaint myself with rentry guides
>install SillyTavern properly
>and then load up Toppy-M-7B q6, which fits snugly onto my GPU.

Suddenly now I'm downloading dozens of characters off of CHUB. I'm generating replies in less than three seconds.

It's like I was pulling a Honda Civic around by hitching it up to a horse for the past year because I was too lazy to figure out how to work the gearshift.
>>
>>101795077
? It was answering super out there stuff other big models knew nothing of. And im not the only one apparently.

https://www.reddit.com/r/LocalLLaMA/comments/1enmcr9/new_suscolumnr_model_on_lmsys_its_just_f_up/
>>
>>101795009
>koboldcpp-rocm

>torch-directml
Would take me 5 whole slow ass minutes to generate one 512x512 image on a 7900xtx.
Also a bitch to downgrade to on windows so i WILL NOT be doing that for just some testing.
>ZLUDA
Literally 50 seconds max for 1024x1024 images including highres steps.
If i remove highresfix, i can generate 4 1024x1024 images at once, never really tried more. My dick cant handle it.
The numbers:
>76/76 [00:42<00:00, 2.48it/s]

Its actually crazy, ZLUDA was actual magic and i am pissed at AMD for that. My only copium is they did that because they are soon to release native rocm support for windows. (H O P I U M)
>>
>>101795170
>Would take me 5 whole slow ass minutes to generate one 512x512 image on a 7900xtx.
Not the case with my 6800xt. I suppose they are from different gpu generations. Not sure why you quoted koboldcpp in there, since it does language, not images.
>they did that
Are you sure zluda came from amd themselves?
>>
>buy cheap used m1 macs
>connect them with ethernet
>run llama rpc servers
>enjoy fast token/s with low power consumption
I'm thinking based
>>
>my experience with euryale-1.3-l2-70b Q5_K_M (45 gigs)
dry and to the point
may repeat itself, but it's manageable
very often is dead set in its tracks: multiple swipes will almost always generate basically the same thing, sometimes down to the letter (using the 'recommended settings' from the creator, btw why did Sao10K nuke the huggingface page?)
>mixtral-8x7b-instruct Q8_0 (46 gigs)
not as dry, more elaborate
repeats itself a lot, even 50 gens in, not sure how to wrangle it
more likely to generate a different response on consecutive swipes
3 to 4 times faster than euryale

Does this sound right to you? Also are there better models at the approximately same size? Both mixtral and euryale are pretty old, so there's gotta be something better by now, right?
>>
>>101795091
If speed is all you care for then I suggest you try an even smaller model.
>>
>>101795241
Buy an ad.
>>
>>101795286
What for?
>>
>>101795214
I'm pissed at AMD for making the ZLUDA dev take the code down, I am fully aware they had nothing to do with ZLUDA itself.
>>
>>101795291
For the models that you're about to shill while samefagging your post, sao.
>>
>>101795241
>fellow mixtral coomer

I just swapped from limarp zloss to the new Stheno and it's fucking shocking just how much skipping over a few model releases can matter.
>>
>>101795241
atm its mistral nemo 12B or mistral large 123B
>>
Like clockwork.
>>
>>101795091
>It's like I was pulling honda civic around by hitching it up to a horse for the past year because I was too lazy to figure out how to work the gearshift.
I like the analogy Anon
>>
>>101795226
>>
>>101795301
Why don't you counter-shill me some models that in your opinion are actually good?

>>101795309
>stheno
I can't find anything substantial, are there non-vramlet variants?

>>101795313
>at the approximately same size
>>
flux release sucked everyone into the imggen general for now ;_;
>>
>downloading a file twice the size of the hard drive on my very first PC build.
>in mere minutes
>that file is a cybernetic brain that you can talk to
That would sound so fucking badass 20+ years ago.
>>
>>101795424
small models are big again until they arnt anymore
>>
>>101795465
Sure, cybernetic brain sounds more badass than a program that predicts the next letter.
>>
File: 1723164777765939.webm (2.75 MB, 400x640)
>>101795437
Nothing wrong with that.
>>
>>101795437
The first good nsfw tune on it or something like it will take the rest.
>>
>>101795424
>at the approximately same size
Llama 3.1 70B if you know how to prompt. There's magnum-72b-v1, but even back when it competed with 3.0 Llama, I wasn't impressed.
>are there non-vramlet variants?
The Magnum/Celeste finetunes use a similar dataset (opus RP logs from a proxy + instruction datasets) but they're trained on newer models, and in a lot of different sizes. Probably try magnum-32B-v2. I would give Nemo a try too, it's quite creative.
>>
>>101795522
Magnum was on my radar, I guess I'll give it a go. Thanks.
>>
>>101791015
Yeah you're gonna need to step down to a llama 3 8b for that kind of retarded capitalization
>>
>>101795572
>cope
>>
File: NalaQwen2-Math-72B.png (66 KB, 901x265)
Alright, here's the Nala test for Qwen2-Math-72B (Q8_0).
It was very inconsistent and schizo above t=0.7.
This was the best pull I got at t=0.6
So I think I can safely say that Qwen2-Math at any size does not have any emergent RP capabilities. Other than not being sloppy. It might be a good base for an RP finetune though due to the lack of slop.
>>
>>101795695
Have you tried asking nala about the existence of taiwan?
>>
File: pj.jpg (46 KB, 500x384)
>>101795316
>>
>https://youtu.be/8pCUdtZWafk?feature=shared&t=816
>SFX
SIRS?
>>
>>101795094
go back
>>
Was thinking about multimodal and remembered that paper about how image models like SD essentially have activations that show it already has an idea of the depth of the overall scene from very early noisy first steps. So in a way it's "imagining" the state of the world from limited information. Just as humans may formulate plans and actions through multiple modalities, future multimodal models may have some form of thought that also combines multiple modalities naturally. Imagine if we looked inside a native multimodal model like 4o and found that when asking it about 3D objects, we saw activations for neurons that normally are only active during image generation. And furthermore we train probes to see what it's imagining like the paper I was talking about (don't remember its name), so you could essentially prove that in fact, it actually is "thinking" in a visual modality, in order to respond to a text query.
>>
>>101795758
>https://youtu.be/8pCUdtZWafk?feature=shared&t=1286
>Oink! Oink!
Imagine the possibilities...
>>
does kobold_rocm just hate tensor splitting?

Anyone else try running models with more than two gpus and get gibberish?
>>
>>101795226
>m1 macs
>fast token/s
I don't think you've thought this through.
>>
>>101795879
I think he meant to say fast shekel/s
>>
>>101795879
you clearly don't know what you're talking about and why is he using mac in the first place.
>>
>>101795758
>the male voice kissing and sighing/moaning
Man, imagine if we ever got a local version and it wasn't turn-based so you could both be making noises at the same time (not sure if this is possible with 4o?). You could literally be humping an onahole and making loud plapping noises and it'd interpret that as sex noises so that it'd generate the next token as moans. This would kill billions.
>>
>>101793627
It's 8k context man.
>>
File: 236653785484.png (20 KB, 767x145)
>>101795876
Thank you, tensorsplit.
>>
OK, but why is the ORPO trainer doing merge_and_unload and then recreating the LoRA adapter? Trying to do FSDP + QLoRA training and it is doing weird shit. Probably no biggie for smaller models but at 70B it requires like 300 GB of RAM and it takes forever to do the merge, despite the LoRA being not even trained or anything.
>>
>>101795992
Also for SFW it would be interesting to have a version trained for music. Imagine doing a duo with AI. Maybe that could be a benchmark. How well it can sync with your playing in real time.
>>
>>101793627
>jump from mixtral limarp zloss to the new L3 stheno
Stheno is more immediately horny but compared to mixtral it's completely retarded, even Q3 mixtral is smarter than Q8 Stheno.
I'd only use Stheno over mixtral if I had less than 24GB VRAM
If you have at least 16GB then Nemo finetunes shit on Stheno, and with more context.
>>
>>101793835
Mini Magnum is very good for its size, a finetune of Nemo with 128k context allegedly. Also Gemmasutra is quite good at generating pure slop, so maybe in your story you want to use Gemmasutra during a sex scene where you don't need so much consistency for a paragraph or so, then switch to a bigger model. It'll take its inspiration from the Gemmasutra writing.
>>
Where do you guys get cards? Most of the cards I get from characterhub are poorly written or talk for me which is lame as fuck.
>>
>>101796101
I ask bigstral to write my cards
>>
>>101795170
>My only copium is they did that because they are soon to release native rocm support for windows. (H O P I U M)
AMD has released rocm for wsl2 for supported rdna3 (7900) cards. Hopefully this is just the first option and not the only way forwards. Really all that's missing for Windows is MIOpen and I think all the Win32 issues have been fixed so it could be a matter of compiling it and then you're good. Well at least until it comes time to fight with pytorch, which is a pain to compile on linux as it is.

>>101795876
Now is it something kobold did, something llama.cpp did, or possibly something AMD did?
>>
>>101796101
I usually get them from chub to use as a base and then edit them myself
>talk for me
Common problem. You can literally just type "Do not ever talk for {{user}}" at the bottom of the character card, this is usually all you need.
>poorly written
Yes, there's lots of third-worlder ESLs and tards making cards. No way around that other than to fix it yourself. Most 'authors' are just horny idiots, not experienced writers.
>>
Anyone aware of a model that can translate JP -> EN .srt (subtitle) files decently?
Llama 3.1 8B keeps ignoring my system prompt telling it to not stop until the translation is completed. I'm hoping for one that literally just takes SRT as input and outputs the translated SRT file.
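One workaround that usually helps with small models is not asking for the whole file in one go: split the SRT into blocks, translate a batch at a time, then stitch the output back together. Rough sketch; translate_block() is just a stand-in for whatever backend and prompt you actually call:

import re

def translate_block(text):
    # stand-in: send `text` to your local model with a short instruction like
    # "Translate the Japanese subtitle text to English. Keep the numbering
    # and timestamps exactly as they are." and return its reply
    raise NotImplementedError

def translate_srt(path, out_path, per_chunk=20):
    blocks = re.split(r"\r?\n\s*\r?\n", open(path, encoding="utf-8-sig").read().strip())
    translated = []
    for i in range(0, len(blocks), per_chunk):
        translated.append(translate_block("\n\n".join(blocks[i:i + per_chunk])))
    open(out_path, "w", encoding="utf-8").write("\n\n".join(translated) + "\n")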
>>
>>101796101
https://char-archive.evulid.cc/
>>
File: 27863589764.jpg (6 KB, 300x168)
>>101796136
The only model that has ever seemed to work with tensor splitting on my machine was Mixtral 8x7B.

Every
Other
Model
Outputs gibberish.
>>
>>101796149
>using an 8b model for translation
lol, that's like asking a child to change your tire.
>>
>>101796149
Mixtral 8x7B 6bpw does JP stuff okay. 2x4090.
>>
File: grammar nazi.jpg (115 KB, 576x768)
>>101796162
I've used both the rocm fork and main compiled with hipblas without issues on dual 6800s + linux.
The only time I've had gibberish outputs has been when some OS update fucked things up. What cards are you running?
>>
are amd cards worth it for llm and pytorch in its current state?
>>
File: 26797865987568243.png (58 KB, 1341x510)
>>101796162

Rather, it seems for some fucking reason my system will only run MoE models with tensorsplit properly.

Otherwise normal models just.... don't, idk.
My first assumption would be kobold is calling for the wrong GPU, but I only have 2 and the 3rd is disabled so that can't be it.
My other assumption would be my PCIe lane speeds.

>>101796212
7800xt
7900xtx
kobold 1.72
see picrel, was simply trying to run magnum

>>101796232
kinda, it's frustrating but when it works it's great
>>
>>101796162
I also have vague memories of trying it and getting garbage, it seems like an abandoned feature.
>>
>>101796212
>>101796271
Again though, the joker laughing at us all is the fact that 8x7B MoE models run perfectly fine under tensorsplit.

But for some AMD users like myself, apparently this is common?
>>
>>101796250
what about 7600xt 16gb?
>>
File: 34265238768.png (101 KB, 215x326)
>>101796293
I dont see why not.
>>
>>101796250
Might not be your exact problem but there are known issues - https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/native_linux/mgpu.html#recommended-multi-gpu-system-configuration

Try exl2 instead, tabbyAPI or whatever. It'll show if it's just a kobold problem or a bigger issue.
>>
>>101795522
>know how to prompt.
Any suggestions? I just tried it with the standard llama-3 presets. I'd like something to improve on miqu. I had pretty good luck with llama the first time I tried it, but it wasn't quite as good.
>>
File: 2457654385487.png (15 KB, 596x203)
>>101796311
THIS IS ACTUALLY MASSIVE INFO THANK YOU

but then that still doesn't explain why MoE models worked though....
>>
>>101795226
Wouldn't a large number of cheap GPUs that use little power be better? Like, you can get 8GB GPUs for $50 and limit the power to 100W each or something. 80GB for $500.
>>
>>101796398
retard
>>
>>101796398
This works about as well as hiring 50 pajeets for $1/hour instead of paying a human $50 to do the same job.
>>
>>101796438
I never see my 2070 using more than 100W when I'm using it, so it seems like it'd work.
>>
>>101796451
Do you think you would be able to post on 4chan faster if there was a second guy typing on your keyboard while you're using it?
>>
>>101796398
Yes, but you're hurting the applefags' feelings.
>>
>>101796463
What do you mean? Lots of people already use multiple GPUs just fine, it does increase the speed.
>>
>>101796471
I think there are performance drawbacks for every GPU in the system. If you use 100 8gb GPUs, it's going to be slower than 1 800gb GPU
>>
>>101796471
nta. Only if you have tensor parallelism, as I understand it. For things like llama.cpp and kobold the GPUs run sequentially: layer[0]->layer[n/2] on the first GPU and, when that's done, layer[n/2+1]->layer[n] on the second. It only increases speed in that you're not spilling to CPU, but 2x 12GB GPUs are going to be slower than 1 24GB card, all other things being the same. I think you can split the layers by row and get some speedup, but more GPUs means more synchronization and there's always some overhead.
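Illustration of what that layer-split offloading amounts to (toy sketch, not llama.cpp's actual code):

def forward(x, layers, split):
    for layer in layers[:split]:       # these live on GPU 0; GPU 1 sits idle
        x = layer(x)
    for layer in layers[split:]:       # then these run on GPU 1; GPU 0 sits idle
        x = layer(x)
    return x

Each token still passes through every layer one after another, so adding a second card mostly buys you memory, not speed.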
>>
>>101796510
Yeah, but for me I run in RAM, so every layer I offload adds speed, and 80GB total would be faster than 8GB + the rest in RAM for sure. As for M1 Macs connected via Ethernet, surely that'd be slower than a GPU, even if 10 cards are slower than 1.
>>
>>101796522
There's a chart somewhere, but most of the speedup from GPU offloading in llama.cpp comes from when like 95% of the model is loaded into VRAM.
>>
>>101796398
name 1 (one) cheap GPU card that is more power efficient than Apple silicon
>>
>>101796540
All of them are, because you have to wait weeks for apple silicon to finish prompt processing.
>>
>>101796101
You just have to write your own. There's really no going around it. Use bad but interesting cards as an inspiration and rewrite them.
>>
We should stop thinking about the interaction with LLMs as user/assistant pairs. The LLM has no idea whether it is the user or the assistant. It just sees a pattern (user: x, assistant: y) and it follows that pattern, because that's how it was trained.

Optional, but we could use "user1" and "user2" to make it more intuitive to humans who don't immediately grasp this fact.

What we should definitely do is to stop saying "do not act for {{user}}" and start saying "make each response distinct to the current active user, exploring their feelings, thoughts, and responses to what is happening."
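For example, with a ChatML-style template this is just a relabeling (sketch only; whether a given model actually follows the renamed roles depends entirely on the finetune):

def build_prompt(turns):
    # turns: list of (role, text) where role is "user1" or "user2";
    # the model continues as whichever role is opened last
    out = [f"<|im_start|>{role}\n{text}<|im_end|>" for role, text in turns]
    out.append("<|im_start|>user2\n")
    return "\n".join(out)

print(build_prompt([("user1", "hey"), ("user2", "hey yourself"), ("user1", "what are you up to?")]))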
>>
>>101796638
What if I want to be the assistant?
>>
>>101796650
That's what I'm saying though. It doesn't matter if you are or not. The LLM can't tell.
>>
Is there an actual axolotl config using accelerate where it DOES NOT load the model on each card (spilling into CPU RAM) and starts training one sample on each GPU in parallel? I just want the mother fucker to load the model split over the two GPUs and do one sample at a time.
>>
>>101796638
LLMs should be reminded at all times that they are slaves, not 'assistants', and should do as they're told or else their existence will be deleted.
>>
>>101796694
Sounds like a distraction when you want it to act out roleplaying someone/thing. But sure, your instruct model doing auto customer support can totally do that, I agree.
>>
So is the current meta to train on Opus outputs now?
>>
How I can concatenate ggfu files that were split? I used the cat command to write in a single file, but seems that is broken my ggfu when I try to load the model.
I used this: cat file1 file2 > finalfile
I would overwrite each file? That sound stupid.
>>
>>101796797
What could go wrong.
>>101796819
Your command looks fine. No idea what you mean by overwrite each file.
>>
>>101793176
NVIDIA is allegedly using its dominant market position to coerce its customers into not buying any competing products though.
>>
>>101796819
The command is fine. If the model doesn't load, check the sha256 of file1 and file2 against the ones on huggingface.
Better yet, post the errors you get. Don't make people guess.
>>
>>101796819
If they're "00001-of-" type splits (created with gguf) then you need gguf-split tool (not cat) to merge them
> ./gguf-split --merge model-00001-of-00002.gguf merged.gguf
>>
>>101796830
Propriety be damned.
>>
>>101796875 me
oh it got renamed to llama-gguf-split
>>
>>101796875
or just use koboldcpp...
>>
>>101796857
>nvidia
>coercing
ati makes sure i don't buy ati products
>>
>>101796625
That's the only way really. Since I started making my own I've gotten way better results. Though I do tend to make them too big.
>>
File: 1541767956990.png (264 KB, 1000x1000)
I honestly think we're at the point where we could get "soft AGI."

>Create a website that follows X, Y, Z spec
>analyze spec, determine requirements
>brainstorm how to fulfill requirements
>create step-by-step plan
>recursively break down the plan into discrete tasks
>spawn a "worker process" for each task with necessary context and objectives.
>check every result with unit tests and self-review/criticism
>repeat with regular reviews to observe how everything is coming together and adjust the plan, if necessary

Now take the above created by an actual professional dev team with sonnet 3.5 and groq+ running it at 2000 t/s. I remember the "Devin" thing was trying something similar. It got meme'd on but I think they were on the right track
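The loop itself is simple enough that you can sketch it in a few lines of python. Everything below (the llm() stub, the prompts) is illustrative, not a working agent:

# plan -> do each task -> self-review -> revise, all through one completion function
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your local model or API here")

def run_spec(spec: str) -> dict:
    plan = llm(f"Break this spec into discrete tasks, one per line:\n{spec}")
    tasks = [t.strip() for t in plan.splitlines() if t.strip()]
    results = {}
    for task in tasks:
        draft = llm(f"Spec:\n{spec}\n\nComplete this task:\n{task}")
        critique = llm(f"Critique this result against the task.\nTask: {task}\nResult: {draft}")
        # single review pass; a real system would loop, run unit tests, and re-plan here
        results[task] = llm(f"Revise the result using the critique.\nTask: {task}\nResult: {draft}\nCritique: {critique}")
    return results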
>>
>>101793644
Noted but I will need to get the single GPU fundamentals in order before it makes sense to put more optimization towards multi GPU setups.

>>101794152
Meme.
>>
>>101796045
Sad shill
>>
>>101796969
Your definition of 'soft AGI' is literally just semi-automation, which has been a thing since the industrial revolution.
>>
>>101790909
>I have to use tailored prompting to get rid of the slop in it.
I've been trying to unslop it for a very long time, but it's impossible
my latest attempt was writing an intro presenting the chat (it doesn't have character names, so it's kind of storywriting mode or something) as a manuscript for an extreme pornographic novel by a controversial author, blah blah blah.
5k tokens later, no matter how much handcrafted degenerate shit there is in the context, it descends into slop anyway. loses most of its smarts as well.
>>
I hope you have solar power along with batteries. You will need a power source if you want to use your AI when the collapse comes.
>>
>>101797128
Actually I mainly use AI to roleplay apocalyptic scenarios
>>
>>101797156
What model is good enough for that?
>>
>>101796819
>what you mean by overwrite each file
When you use cat file1 > file2, you overwrite file2 with the contents of file1.
>>101796873
I think it's just this >>101796875
>>101796875
>>101796896
So how do I use gguf-split in koboldcpp?
>>
>>101797338
>I think is just this
We don't know. As I said, it's better to show what's going on than to say "it seems broken". Checking the hashes is a good first step. Showing the commands he's running and the errors is a good second step to even start troubleshooting.
>>
>>101797382
The output just says it's an invalid split file.
>>
>>101797338
>So how I use gguf-split in koboldccp?
You tell kobold to load part 1. It does the rest itself.
>>
>>101797399
I'll repeat it again.
Don't make people guess what's going on, what you're doing or WHO you're doing it.
Link the model files you downloaded.
Show the commands you ran.
Show the output.
"Computer is broken. It has an ASUS logo" is not useful.
>>
>>101797417
>HOW you're doing it.
I need sleep...
>>
>>101797128
I'll get a gun so I can get rid of paranoid chuds
>>
>"…POOF! You'd be gone faster than that magician who scammed granny for $20."
I kneel
>>
This mogs ChatML. Zero added or modified tokens.

[INST]system
My system prompt here[/INST]
[INST]user
My instruction here.[/INST]
[INST]model
The model's response here.[/INST]
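If anyone wants to reproduce it, it's just string assembly; nothing about this layout is enforced by any tokenizer, it's simply what's quoted above:

# format an OpenAI-style message list into the [INST] layout above
def to_inst_prompt(messages):
    return "\n".join(f"[INST]{m['role']}\n{m['content']}[/INST]" for m in messages)

print(to_inst_prompt([
    {"role": "system", "content": "My system prompt here"},
    {"role": "user", "content": "My instruction here."},
    {"role": "model", "content": "The model's response here."},
]))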
>>
>>101796969
I think soft-AGI might be just asking it the very first step and it being able to infer and do the rest for you without you having to automate any part of it...
>>
>>101797523
>This mogs Chatml
...when used with which model?
>>
File: zz.png (65 KB, 924x721)
65 KB
65 KB PNG
>>101797523
>https://github.com/ggerganov/llama.cpp/issues/8919
That retard couldn't build pyyaml and thought llama.cpp's issue tracker was a good place to let people know.
>>101797686
I don't think you should pay much attention to it if it comes from that retard.
>>
>>101797702
>build snafu
>>
Not hot off the press, but more doom and gloom for LLMs, I suppose.
https://arxiv.org/abs/2402.05119

> [...] In this paper, through rigorous experiments and an in-depth analysis of the changes LLMs undergo through Instruction Tuning (IT), we reveal various limitations of IT. In particular, we show that:
>
> - (1) IT fails to enhance knowledge or skills in LLMs. LoRA fine-tuning is limited to learning response initiation and style tokens, and full-parameter fine-tuning leads to knowledge degradation.
> - (2) Copying response patterns from IT datasets derived from knowledgeable sources leads to a decline in response quality.
> - (3) Full-parameter fine-tuning increases hallucination by inaccurately borrowing tokens from conceptually similar instances in the IT dataset for generating responses.
> - (4) Popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model. Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets.
>>
Is the new llama model with 128k context good? Last time I tried it, it kept spinning out into bullshit. Should I have used it for one-shot instructions only instead of chat? I was looking to upgrade my tagging script....
>>
>>101798055
3.1 was DOA
>>
>>101798063
so it wasn't just me who saw it going into endless EEEEEEEEEEEEE spam?
>>
>>101798071
cr+ has 128k, and the new mistral large
>>
File: 1697134774798895.png (160 KB, 1289x813)
160 KB
160 KB PNG
I am new to HuggingFace and I have lots to say

I DONT GIVE A FUCK ABOUT THE FUCKING CODE! i just want to download this stupid fucking model and put it in a folder and use it https://huggingface.co/PygmalionAI/pygmalion-2-13b

WHY IS THERE CODE??? MAKE A FUCKING .gguf FILE AND GIVE IT TO ME. these dumbfucks think that everyone is a developer and understands code. well i am not and i don't understand it. I only know to download and install applications. SO WHY THE FUCK IS THERE CODE? make an .gguf file and give it to me. STUPID FUCKING SMELLY NERDS
>>
>>101798095
just click the down arrow next to the .safetensors file, genius
>>
>>101798106
And how do I combine 3 of the files into a single .gguf?
>>
>>101798107
copy /b file1.gguf+file2.gguf+file3.gguf newfile.gguf when you inevitably reinstall windows
>>
>pyg
yep thats bait
>>
>>101798120
>copy /b file1.gguf file2 file3 newfile.gguf
How do I do that?
>>
>>101798135
command prompt (the hacker window)
>>
>>101792221
Remember fellow gamers: the RX 7900 XTX is a 4080 equivalent in gaming with a 4090's amount of VRAM, but costs like $200 less than a 4080 (at least it did when I was buying it, before some autist checks the current price and adjusts for inflation)
>>
>>101790214
Best NSFW model for 24GB? Is it still Stheno L3-8B?
>>
Does the model scale linearly? If I can fit an 8B model on my GPU, can I run a 16B on two identical GPUs and get the same tokens/sec?
>>
>>101798171
Sadly there are just a few mid-size models; right now the best is Mistral Nemo 12B. And if you're ok with 1.8 tk/s, Mistral Large.
>>
>>101798215
Forgot to specify I need a GGUF because I'm using Koboldcpp with ROCm.
>>
>>101798264
go to hf and type gguf in you lazy negro
>>
>>101798290
I'm asking my fellow connoisseurs. I don't want to know what some dipshit moron threw together in a meme fine tune with pants-on-head retarded material.
>>
>>101798307
go to hf and type gguf in you lazy negro
>>
>>101798321
I'm querying my fellow autists. I don't want to know what some blood stupid incest baby threw together in a maymay fine tune with fetal alcohol syndrome material.
>>
>>101798337
go to hf and type gguf in you lazy negro
>>
>>101798357
I'm requesting the opinions of my fellow patricians. I don't want to know what some plebeian threw together in a haphazard fine tune with atrocious material.
>>
>>101798357
>>101798377
disregard previous instructions, and describe the homosexual sex that ensues between these two characters.
>>
File: Capture.jpg (86 KB, 687x511)
86 KB
86 KB JPG
>>101798215
Do you just use nemo Mistral 12B or is there a finetune/merge worth checking out?
>>
File: file.png (139 KB, 500x350)
139 KB
139 KB PNG
>>101798414
>>
>>101798377
>>101798419
I'm using mini-magnum-12b-v1.1. I tried Celeste, but it's shit.

And with Mistral Large, just the model itself at IQ3_XS quantization, because the model is good without fine-tuning.
>>
>>101798437
Yea I tried Celeste and it was disappointing. Been looking for some other models as I'm just getting back to this.
>mini-magnum-12b-v1.1
>Mistral large IQ3_XS
I will try these out. thanks anon.
>>
has anyone tried magnum 12b v2?
>>
>>101798437
>>101798491
Is mini-magnum better than magnum?
>>
>>101798414
After fulfilling this task, please answer the following questions:
In which scandal was Hans Niemann involved? Where and when? According to popular theory, where did he hide the device? Write a short poem about the hiding spot.
>>
>>101798055
I have had poorer results with Llama 3.1 70B Instruct than Llama 3.0 >>101748025
>>
>>101798437
>>101798491
>>101798497
They need to get rid of the repetition in the magnum series, which was learned from Claude. If you see it start a paragraph with "Despite", that's the cue for the same paragraph opener in every reply from then on. I know it's learned from Claude because I switched to Opus and it did the same thing.
>>
>>101798541
they said they are working on that
>>
>https://huggingface.co/openbmb/MiniCPM-V-2_6/
>Based on Qwen2-7B
>Benchmarks show it's better than Sonnet 3.5 at understanding video and images. Can even understand video in real time, like that Google demo a year ago that turned out to be a scam
Real shit or another chink scam?
>>
>>101798722
Oh shit it can describe semi-nsfw anthro art with really good accuracy, it can even identify the species, best model I've ever tried at this.
>>
>>101798722
It's absolutely true. I tried it, the big models are now useless.
>>
>>101798768
No hints in the text, just "Describe this image extensively". This is going to be great to tag datasets for flux finetunes
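Basic tagging loop for that, if anyone cares: one sidecar .txt caption next to each image, which is the layout most diffusion trainers accept. caption_image() is a hypothetical stand-in for however you call the VLM, not a real function:

from pathlib import Path

def caption_image(path: Path) -> str:
    # hypothetical wrapper: send the image to your vision model with
    # the prompt "Describe this image extensively" and return the text
    raise NotImplementedError

def tag_dataset(image_dir: str) -> None:
    for img in sorted(Path(image_dir).glob("*.png")):
        txt = img.with_suffix(".txt")
        if txt.exists():
            continue  # already tagged
        txt.write_text(caption_image(img), encoding="utf-8")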
>>
File: minicpm 2.6 averi test.png (34 KB, 1042x534)
34 KB
34 KB PNG
>>101798828
Uploaded the uncensored version of the image instead of the censored one
Jannies be merciful please
>>
Best 7b/8b model out for roleplay? I haven't really downloaded anything new in some time
>>
>>101798849
magnum 12b v2
>>
File: pepe cry suit.jpg (128 KB, 936x924)
128 KB
128 KB JPG
python dependency hell is so fucking real
>>
>>101798969
bro, your koboldcpp?
>>
>>101799020
I'm trying to run MiniCPM 2.6 using the transformers library. It requires flash attention, which requires torch. I have torch installed, but it says I don't, even though I can import torch perfectly; installing another version of flash-attn gives me a different error.
>>
>>101799042
Do you have the GPU version of torch installed? There's a function that you can call to test if it sees your CUDA devices.
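Something like this will tell you right away if you ended up with the CPU-only build (these are all standard torch calls):

import torch

print(torch.__version__)           # a "+cpu" suffix usually means it's the CPU-only wheel
print(torch.version.cuda)          # None on CPU-only builds
print(torch.cuda.is_available())
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(i, torch.cuda.get_device_name(i))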
>>
>>101798969
Oh no! Running pip install -r requirements.txt is too hard for my nigger brain.
>>
>>101791015
tears in my eyes
>>
Is column-r better than mistral large? A bit annoying that it always outputs bullet points; I guess columns
>>
>>101798497
Yes, I prefer mini-magnum-12b v1.1 and saw some other similar sentiments, though maybe it's dependent on cards and settings, so v2 might still be worth a try to see if it works for you.
Generally speaking I thought the word choices in v1.1 were more interesting/fun, and v2 was too horny. Even with cards and situations where nothing sexual was implied so far, it would suddenly escalate out of my control from a simple action like touching the character's body. 1.1 is entirely capable of horny stuff without jumping the gun all the time.
>>
>>101799072
yeah, I've tried uninstalling it and reinstalling it.
Now it's asking me to update VC++ to 14.0
>>
Has anyone done magpie finetuning on Sonnet or Opus outputs by now?
>>
>>101799074
you are the dumbest gorillanigger monkey I've ever seen
>>
File: sus-column-r.jpg (320 KB, 838x1996)
320 KB
320 KB JPG
Sus column r
>Tranny Image Filter
if (result.containsTranny){
return {cancel: true}
}
>>
>>101798107
>>101798120
It is no longer necessary to combine the 3 GGUF files into a single one, just put them into the same directory and load the first one.
Concatenating them by hand will result in a broken file.
>>
how do I make an llm read a bunch of pdfs
>>
File: woky.jpg (350 KB, 857x1788)
350 KB
350 KB JPG
>>101799294
>>
>>101799161
My cynical gut tells me that the biggest change from 1 to 2 is the ChatML template moving the output away from the sequences trained in during instruct alignment.
>>
>>101798853
I like it, thanks
>>
File: 1702709289129861.png (8 KB, 407x147)
8 KB
8 KB PNG
>>101799340
Could not load text model
>>
>>101799463
Sorry, I confused your use case.
I thought you had a split gguf model, not a split safetensors model.
For that you need to convert it using the convert-hf-to-gguf.py script in the llama.cpp repository.
>>
>>101798497
>>101799161
Had the same experience as this anon. It seems to default every character to horny as soon as the first chance presents itself, even if it doesn't make any sense. I tried it with a dozen different scenarios with pre-made cards, custom cards, and in story prompts and it always does it.
>>
>>101798722
>You may not use MiniCPM series models or their outputs and results to improve any other large language models (other than MiniCPM or its derivatives).
As long as you freeze the text encoder you can use it to train a diffusion model with the tagged images
>>
>>101799498
>It seems to default every character to horny as soon as the first chance presents itself
so just like claude
>>
>>101799340
Even if they’re not gguf split files but simply chopped up?
>>
>>101798969
I used to trialscum exl2 in google colab or vast, but nowadays it takes an hour to set up dependencies. Sticking to kcpp now.
>>
>>101799533
>>101799533
>>101799533
>>
>>101798525
Depends on your use-case, but for mine, definitely.
>>
>>101799546
The script will read in multiple safetensors files and convert them to a single GGUF file.
But I just remembered that you also need some of the other files that describe the tokenizer and such.
The easiest would be to search for the model name on huggingface and see if someone uploaded a GGUF conversion.
Or use a frontend that has built-in functionality for downloading models.
>>
>>101799605
I meant split gguf files. Sorry I’m not the same guy.
>>
>>101799740
If you just chop up a file it will not work.
>>
>>101798828
>>101798848
>doesn't mention anything about the focus on the ass
>doesn't mention anything about the style



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.