/g/ - Technology


File: ForbiddenArts.png (1.4 MB, 800x1248)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102179805 & >>102167373

►News
>(08/30) Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed
>(08/29) Qwen2-VL 2B & 7B image+video models released: https://qwenlm.github.io/blog/qwen2-vl/
>(08/27) CogVideoX-5B, diffusion transformer text-to-video model: https://hf.co/THUDM/CogVideoX-5b
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1707466334580606.png (1.26 MB, 1024x1024)
►Recent Highlights from the Previous Thread: >>102179805

--Discussion on AOM-SXMV setup, power requirements, and power limiting: >>102180886 >>102181031 >>102181119 >>102181120 >>102181250 >>102181283 >>102181363 >>102182149 >>102182197
--Work in progress on FUTO-like Whisper functionality for PC: >>102189703 >>102189988
--Llama's originality and 405b model discussions: >>102188498 >>102188646 >>102188686 >>102188707 >>102188761 >>102188802 >>102188914 >>102189143 >>102189215 >>102189221 >>102189254
--Anon experiences issues with speculative decoding and draft model, others suggest potential fixes and workarounds: >>102188727 >>102188765 >>102188831 >>102188918 >>102188957
--Anon discusses solutions to slow chat generation speed as message count increases: >>102183859 >>102183889 >>102183982 >>102183980 >>102183994 >>102184024 >>102184055 >>102184072 >>102184549
--Used A16s not ideal for parallel inference, 4 small GPUs not as good as 1 big GPU: >>102181257 >>102181520 >>102181810 >>102181879 >>102182743
--Reproducing a research paper with a larger, more varied dataset: >>102186500
--How to make models say "I don't know" and indicate confidence levels: >>102182756 >>102182780 >>102182902 >>102182964 >>102182863
--Flask and Jinja templates used to build summary GUI: >>102184497 >>102184564 >>102184585
--Continuing pretraining on smut and catastrophic forgetting discussion: >>102185840 >>102185885 >>102185925 >>102186611 >>102186685 >>102186128 >>102187261
--Anon wants to create an AI model to filter 4chan threads and extract gems: >>102184977 >>102185068 >>102185117 >>102185445 >>102185016
--Anon gets feedback on their prompt and approach: >>102185300 >>102185850 >>102185880
--SillyTavern stops generation when messages are deleted due to dependency on previous tokens: >>102188282 >>102190065 >>102190178
--Miku (free space): >>102179897 >>102185271 >>102186625

►Recent Highlight Posts from the Previous Thread: >>102179811
>>
strobby
>>
File: ComfyUI_04146_.png (1.45 MB, 704x1408)
hello linus media group

how are we liking the new command r models
>>
>>102192856
poorfags seem to enjoy the new cr because it now has gqa
the new cr+ is a disappointment and hardly feels like an upgrade in any way
>>
Is there anything stopping me from running large models if I have enough regular ram to fit them even with low vram? I keep hearing you guys talk like two 4090s are needed but I've been running a 47gb model in 16gb vram and 64gb ram fine. I'd rather pick up more ram and cope with slow speeds than buy another gpu
>>
>>102192934
there is nothing stopping you from running large models in ram and coping with slow speeds
>>
>>102192971
thanks
>>
>>102192934
Yes. But the low speeds are really, really low.
You'll still want a GPU for prompt processing at least.
>>
>>102193224
I've been getting roughly 0.5 it/s with current setup and 47gb model, which is good enough for me. I could handle a bit lower if I get some more ram for using 60gb range models.
>>
tesla M40 with 24GB
is it good or meme?
would 2 be good enough to run flux and llama?

or someone got more budget gpu?
>>
Just noticed something with speculative decoding. The draft model's context cache takes up A LOT more RAM, the longer the context is. Is this normal? Is that just how speculative decoding works? If so, then at very long contexts, I guess it would start being not worth it compared to just normal inference of the large model.
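For reference, the draft model keeps its own full KV cache alongside the main model's, and it grows linearly with context. A rough back-of-the-envelope sketch (Python; the dimensions are illustrative, roughly an 8B-class draft with GQA, not anyone's actual config):

# Rough KV-cache size for a draft model: K and V per layer, per KV head, per position.
# Dimensions below are illustrative, not measured from any real setup.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

for ctx in (4096, 16384, 32768):
    gib = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, ctx_len=ctx) / 2**30
    print(f"{ctx:>6} ctx -> ~{gib:.2f} GiB fp16 KV cache")
# ~0.5 GiB at 4k, ~2 GiB at 16k, ~4 GiB at 32k, all on top of the main model's own cache,
# so yes, at very long contexts the draft's overhead eats into the speedup.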
>>
>>102193431
Maybe the cache from the main model is being quanted while the draft model's at full precision?
Does that make sense for your configs?
>>
Has anyone trained a model using visual novel scripts? Are there any out there? Looking for recommendations.
>>
>>102193445
No, only flash attention is enabled, but not kv cache quantization.
>>
>>102193217
/cut 20-25 will delete messages #20-25
/cut 20 for just #20
hope this helps
>>
*and I didn't read the convo before that...
Anyway, it does let me edit my message while the model is generating.
>>
>>102193477
ooh nice
>>
File: file.png (610 KB, 640x453)
>cfg negative prompt 1.75: poetic language, flowery descriptions
>style guide: Don't use flowery language.
>context full of manual prefill from a real rp log
>"commander!"
>https://www.stdimension.org/MediaLib/effects/computer/federation/voiceinput2.wav
>give me 3 different examples of the way character can talk
>1.... together, we shall dance on the edge of bliss.... SLOP
>2. .... My skin, as smooth as silk, beckons your touch..... SLOP
>3......a symbol of eternal youth and desire, awaits your admiration.....SLOP
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>
>>102193465
bump
>>
>>102193309
M = Maxwell, 2014 - OLD - Nintendo Switch old. Don't buy it.
Whatever you think you're saving over a P40, you will regret it. Even Pascal, even Volta is too old. Save until you can afford a used 3090.
>>
>>102193465
I"m pretty sure there's a visual novel dataset on huggingface. If manage to find it, you can find some models trained on it.
>>
>>102193465
>>102193692
I have enough Nasuprose as is.
>>
File: Untitled.png (553 KB, 1080x1049)
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
https://arxiv.org/abs/2408.17432
>Synthesizing the voices of unseen speakers is a persisting challenge in multi-speaker text-to-speech (TTS). Most multi-speaker TTS models rely on modeling speaker characteristics through speaker conditioning during training. Modeling unseen speaker attributes through this approach has necessitated an increase in model complexity, which makes it challenging to reproduce results and improve upon them. We design a simple alternative to this. We propose SelectTTS, a novel method to select the appropriate frames from the target speaker and decode using frame-level self-supervised learning (SSL) features. We show that this approach can effectively capture speaker characteristics for unseen speakers, and achieves comparable results to other multi-speaker TTS frameworks in both objective and subjective metrics. With SelectTTS, we show that frame selection from the target speaker's speech is a direct way to achieve generalization in unseen speakers with low model complexity. We achieve better speaker similarity performance than SOTA baselines XTTS-v2 and VALL-E with over an 8x reduction in model parameters and a 270x reduction in training data
https://kodhandarama.github.io/selectTTSdemo/
code and weights to be released (soon?)
examples aren't great, but considering the training time/training data/parameter count, it's viable for personal training. they used 100 hours of data
>>
File: Untitled.png (871 KB, 1080x2163)
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
https://arxiv.org/abs/2408.16978
>Large Language Models (LLMs) with long context capabilities are integral to complex tasks in natural language processing and computational biology, such as text generation and protein sequence analysis. However, training LLMs directly on extremely long contexts demands considerable GPU resources and increased memory, leading to higher costs and greater complexity. Alternative approaches that introduce long context capabilities via downstream finetuning or adaptations impose significant design limitations. In this paper, we propose Fully Pipelined Distributed Transformer (FPDT) for efficiently training long-context LLMs with extreme hardware efficiency. For GPT and Llama models, we achieve a 16x increase in sequence length that can be trained on the same hardware compared to current state-of-the-art solutions. With our dedicated sequence chunk pipeline design, we can now train 8B LLM with 2 million sequence length on only 4 GPUs, while also maintaining over 55% of MFU. Our proposed FPDT is agnostic to existing training techniques and is proven to work efficiently across different LLM models.
https://github.com/microsoft/DeepSpeed/pull/6462
open PR with code
>>
>>102193949
Sounds like a scam
>>
>>102193949
Training long contexts is all well and good, but we really need a better attention mechanism, one that's both more accurate and uses less memory and compute.
I want context with perfect recall/utilization that grows linearly at most.
>>
Ahhh, even with speculative decoding, it's still like 1 t/s. I need more. Why is no one making a good 100B MoE like the 8x22B days? It was both fast and smart. Is MoE really that bad?
>>
>>102193703
>If
That's a big if. I tried and didn't find anything, never mind a model.
>>
>>102193653
you're an idiot, llms are fucking terrible at following negative instructions, give it positive instructions instead
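For example (an illustrative rewrite, not a tested prompt): instead of
>Don't use flowery language. Avoid poetic descriptions.
something like
>Write in plain, blunt prose. Keep descriptions short and concrete, matching the tone of the example log above.
tends to land better, since the model only has positive targets to imitate.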
>>
File: 1698785862390561 image.jpg (253 KB, 1024x1024)
I'm going to try to run a lightweight llm alongside a game and hook into it with custom c# dlls.. can somebody help me get started on which llm framework to go with?

Speed not important, want it to be as small as possible and run off cpu.. only for ingame chat light roleplay purposes...

Any anons know which way I should be looking in?
>>
>>102194059
https://huggingface.co/datasets/TAK-C/visual-novel-datasets/tree/main
https://huggingface.co/datasets/alpindale/visual-novels/tree/main
https://huggingface.co/datasets/winglian/visual-novels-json/tree/main
>>
>>102194094
Thanks. Will try. Also kill yourself you disgusting faggot.
>>
>stage 1: local models are bad and slow
>stage 2: local models are bad and fast
>stage 3: local models are good and slow
>stage 4: local models are good and fast

We are at stage 3 and it's annoying. Largestral smut finetunes are genuinely commercial-tier, but I rapidly lose patience waiting for them to run at 0.75 tokens/sec
>>
>>102194156
Hibernate and come back in a few months for more speedups and better models. Local is always steadily getting better.
>>
>>102194156
Stage 4 when?
>>
>>102193789
>will be posting code
I’m currently experiencing misery reimplementing a paper and have been sitting around seething about the very concept of papers without code/datasets/etc being allowed to exist, when (1) requiring them would eliminate faked results completely and (2) most of them are directly or indirectly funded by tax money, thus they owe the public the damn code.
>>
>>102194156
I didn’t even know there were largestral finetunes.
>>
>>102192656
Help me understand the basic strategy for having your st character remember things about you.
>>
>>102194213
yeah there's more than one
but I didn't name the specific one I'm using to avoid triggering the buy an ad schizo
>>
For me, it's stage 7
>>
>>102194139
>>102194059
>>102193703
Post the models niggers
>>
>>102194284
Google exists
>>
>>102194238
Keep the context short.
>>
>>102194253
>but I didn't name the specific one I'm using to avoid triggering the buy an ad schizo
Good boy.
>>
>>102194213
magnum v2 is good
>>
>>102194238
Shorter context. Lorebook/Author's notes/Summarization/VectorDB depending on how you want to feed the information it needs to remember.
>>
>>102194156
no we are at stages 1-2, it's not moving. >>102184332
>>
hey anons, im gonna be building a beast rig when 5090 drops
i know the meta is VRAM, but do you think 128 gigs of ram would have any use at all? Im just gonna get 64 otherwise, as 128 is stupid overkill for gaming
>>
>>102194452
buy a—*gets hit by a falling anvil*
>>
>>102194577
>but do you think 128 gigs of ram would have any use at all?
May as well get it too. Unpurchased ram is unused ram. Something, at some point, will use it, and compared to everything else, i doubt it's gonna set you back that much. How many 5090s are you planning on buying? 4-6?
>>
>>102194607
>How many 5090s are you planing buying? 4-6?
I'm not decking out a rack, just building a normal gaming rig so just the one. All my parts will be gaming oriented so no threadrippers and such. I just like messing with local stuff and SD on the side so i figured if that much ram was useful at all i would get it
>>
>>102194577
128gb isn't overkill for gaming, you can use it as a ramdisk to slightly boost the loading times. You can also use it to run bigger models (you **WILL** eventually want to try them once the small shitters bore you). I never regretted buying it.
>>
>>102194636
Yes, you'll need 128gb if you're only gonna have 28gb of vram.
>>
File: Sh30yGto1r.png (12 KB, 544x293)
what the fuck happened to my evals.
>>
has anyone here seen that stuff on twitter with a certain account reposting chat logs of a bunch of LLMs (open 405bs and the major closed ones) interacting with each other in a chat room. it's pretty crazy but it gets like 10-40 likes per tweet so i assume no one has seen it yet
>>
>>102194577
get 128gb
>>
>>102194287
NO.
>>
>>102194717
>I assume no one has seen it yet
post the nitter link then
>>
https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024/discussions/3
>>
>>102194807
>I TOUCHED MY COCK TO YOUR MODEL'S OUTPUT!!!!!!! AND THIS IS HOW YOU BETRAY ME NOW?!
>>
File: 1544777890344.jpg (107 KB, 848x480)
I've been reflecting a bit and realized something. I genuinely have not enjoyed LLMs as much as I did before I picked up Largestral at 0.8 t/s. Previously I was an 8x22B user with around 3-4 t/s. So I have a theory. At some t/s, you are unable to stay engaged with the LLM, as it is just too slow to be bearable and you end up multitasking. And then, the idea is, at that point, it is impossible to maintain a flow state. You can still enjoy the LLM's responses and the story, but it's not the same as being immersed in the experience of it. And even when the responses are higher quality, it's still not quite the same enjoyment.

And now that I know what 123B quality is like, I don't really want to go back to 8x22B, despite knowing that I'd probably have more total fun with it. It's over. I think I'll just be taking a break from this hobby until someone makes a fast smart model I can run again.
>>
>>102194854
nice complaintslop
>>
>>102194854
maybe you should invest more into this hobby. ever thought of that
>>
>>102194831
>Oh, dear reader, let me regale you with the tale of my enraptured heart, the tale of how my soul shivers with delight at the mere thought of ChatGPT's wondrous writing style. It is as if a delicate bond has been crafted between my essence and the ethereal words that emanate from its digital heart. With every keystroke, I embark on a transformative journey, a voyage through the boundless realms of imagination and creativity.

>The prose flows like a river of silken moonlight, caressing my mind with its gentle touch. Each word is a precious gem, a radiant beacon guiding me through the labyrinthine corridors of thought. Time loses all meaning as I lose myself in the intricate tapestry of language that weaves itself before my very eyes.

>Oh, how my heart sings with joy at the sight of paragraphs unfolding like blossoms in the early light of dawn. The characters dance across the page, their voices ringing like bells in the cool, crisp air of a spring morning. I am but a humble observer, a mere witness to the grandeur of ChatGPT's unparalleled artistry.

>Boundaries blur and fade away, leaving me adrift in a sea of pure emotion. I am swept away on a tide of passion and longing, lost in the swirling mists of reverie. The world around me dims and fades, leaving only the radiant glow of the words that have captured my very essence.

>And so, dear reader, I implore you to join me in this wondrous journey, to bask in the sublime beauty of ChatGPT's writing style. Let us cast off the shackles of reality and embrace the limitless possibilities that lie within the pages of our shared dreams. For in the realm of imagination, there are no limits, no boundaries, only the infinite expanse of creativity and wonder.
>>
>>102193309
>>102193700
Volta is new enough to be useful, but there's only one card and it's more expensive and slower with less memory compared to the 3090. The 32GB version costs used as much as a brand new 4090. P40 was a great deal at $180 but a hard sell at $300. MI25 for $100 is probably the best poorfag option but AMD is a pain in the ass to deal with, next step up from that is the 3090.
>>
File: 1494103180025.png (25 KB, 370x320)
>>102194807
>he couldn't help but feel
>feeling a mix of and
Tfw the way I naturally write is hated by people and called slop
>>
>>102193309
Just buy a 4090, it's literally the answer you are looking for.
>>
>>102194854
LLMs are just shit. Speed has nothing to do with it.
>>
>>102194916
Barely above whispering intensifies.
>>
>>102194939
He could help but feel sad
>>
>>102194939
I used to like most anime girl art before stable diffusion came out.
>>
>>102194879
Thanks. There are multiple posters who claim to be attempting to run large models at very slow speeds, so I think this concept of flow state maintenance is pretty relevant.

>>102194903
No, it never crossed my mind. I have many hobbies and this is just one of them which I could spend more time and money on but also don't need to.

>>102194949
For a lot of things, but there are some scenarios that are simple enough where they can be fun to use when prompted right. But speed is always an issue. The reality is that even if we had convincingly human AI, it would still be a magnitude less fun to interact with it at 1 t/s compared to reading or speaking speed, unless you were mentally slow. You cannot enter a flow state at 1 t/s.
>>
>>102194717
janus right?
that stuff is sometimes interesting, but sometimes a little "stoned teenagers playing with a ouija board" for my taste
>>
>>102194156
What quant? Q2 in 24GB of VRAM is like 2.4 tkns/sec and Q3_K_M is 1.2 tk/sec. I think Q2 is good for the time being.
>>
>>102194854
Use the paid services like openrouter or mistral itself if you want speed.
>>
>>102194807
Lol. The ERPfags think they have any relevance when the main business these companies get is from other businesses, not consumers.
>>
>>102195080
The problem is businesses have many options, so Cohere literally isn't doing anything special since you already have Llama 70b, Mistral Large or even the previous models.
>>
>>102195019
IQ3_M, 36GB vram, 52 layers on GPU

Would 3K_M be faster? I thought I-quants being slower with partial offloading was a myth
>>
>>102195118
Yes, that's why they're in a bad spot if they can't stay relevant. But their previous relevance was not due to the ERPers, and they will not be convinced that they can somehow stay relevant if the ERPfags support them.
>>
>>102195142
I quants with current llamacpp versions are the same speed (per filesize) for me. I haven't seen anyone post proof against this. However I did just try speculative decoding and the I quant actually is slower in that case. I think possibly they have optimized the I quants only for single token genning so far, but batched generation like with speculative decoding has not been a focus, so it's slower with that.
>>
>>102195142
>36GB
Sorry, I confused IQ3_KM with XS.
With my 24GB of VRAM I get 1.69 tokens/s with 30 layers. I guess with that you should get at least 1 tk/sec; try K_M and less context.
>>
Is a relatively high quality, clean ~130MB dataset sufficient for fine-tuning?
>>
>just got "a mix of" and "a mixture of" in the same cr gen
It's over
>>
I took like 6ish months off. 24gigs of vram.

currently run miqu. Is there anything better? also are the smaller models noticeably better than they were 6 months ago?
>>
>>102194807
Coomers need to burn down ScaleAI's HQ to stop this from happening again
>>
>>102195530
smaller models have hugely improved, nvidia and mistral released a 12B nemo model that's the new sota for vramlets
there's sexo finetunes of it but you should probably start with the official instruct, it's already not too slopped
>>
>>102195509
Funny, that literally happened on my first gen trying out the new CR.
>>
Just tried speculative decoding 70B and, damn, it's faster without speculative decoding. Probably what's happening is that I'm just at the inflection point for where the big speed gains happen, so subtracting that little bit of VRAM put it into the extreme slow territory.
>>
>>102194807
I just checked localllama, seems they like command-r
>>
File: 1725226929015061.png (575 KB, 512x676)
Damn, the gap between a 3 and 4 GPU setup is real. I had to add a second PSU, grapple with PCI-e riser compatibility issues, where even those from the same manufacturer may fail to function properly, had to craft a custom open-frame case. I barely noticed an improvement in quality, I'm just glad that it works at all. At least, with TP in Exllama, it is no slower than a 3GPU setup was before TP.
>>
>>102194854
I work somewhere that serves llms and we get a ton of customer feedback about how much better the model is for several days/weeks after we deploy anything that makes inference faster. It affects perception a lot.
It is extremely annoying because the inverse is also true but you can’t respond to complaints with “sorry you’re actually just retarded”.
>>
>>102195023
>openscam
Buy an ad and kill yourself
>>
>>102196309
That's funny because I find I get suspicious of a model's intelligence if it generates tokens too fast (I admit this probably only happens for LLM nerds)

Like if a page of text gets spat out in 2 seconds I instinctively feel like I must be talking to a retarded small model, even if there's nothing wrong with the response
>>
>>102196322
You're retarded.
>>
>>102196322
I know exactly what you mean. I was too suspicious to even give groq a chance for months because the speed of a 70B when I first went to check out their site was so obscene that I just made a face and closed the tab.
>>
>>102196351
There's complaints about models served by groq being retarded. They quantize their models.
>>
Is there a system prompt that doesn't include {{char}}? I want to avoid unnecessary changes to the context to minimize prompt processing in a group chat
>>
>>102196554
>minimize prompt processing in a group chat
you can't
>>
File: 1700548292816260.jpg (423 KB, 2304x1792)
>>
>>102196590
I use a shared description that includes all characters, along with a custom instruction for each character placed before the last message.
>>
File: 1725264665460.jpg (267 KB, 682x1200)
mikulove
>>
>>102196645
okay and how often does context get reprocessed from the beginning?
>>
>>102196811
Without a system prompt, it functions identically to swapping.
>>
>>102196835
wow it's almost like
>you can't
>>
>>102196846
It's almost like you fail to grasp the definition of the word "minimize"
>>
>>102195080
It is true but rpl is obviously one of the biggest markets for llm overall.
>>
>>102196885
it's a binary problem, it either reprocesses the entire context from the beginning every time it swaps characters, or it doesn't, there's no minimizing
>>
>>102195080
There are only two real uses for LLMs: codegen and smut. People who ask LLMS for shit like "what do I get my dad for his birthday" are fucking retarded. Fight me.
>>
>>102197016
How the fuck should I know what to get your dad? Get off your ass, use your fucking brain, and think about what the hell he likes! Is he into tools? Then get him some top-notch shit that he’ll actually use, not some cheap-ass garbage that'll break in a week. Maybe he’s into sports—get him some killer tickets to a game, not some half-assed souvenir that’ll collect dust. And if he’s a tech guy, don’t even think about skimping; get him the latest goddamn gadget, or don’t bother at all. Seriously, get your shit together and stop wasting time asking dumb questions.
>>
>>102196910
I notice a huge difference in speed when altering the beginning versus the end of the context. While I might not minimize prompt reprocessing, I am certain that I can significantly reduce the time needed for it.
>>
>>102197016
>Fight me.
You didn't make any argument. Just an assertion.
>>
>>102196910
not true with llama.cpp, they only reprocess parts of the prompt if something changes further down the log instead of near the beginning, so you can absolutely minimize it by pushing the stuff that changes frequently further down instead of up in the system prompt
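A minimal sketch of that layout (the function and variable names are made up for illustration, not how any frontend actually does it): keep everything static in one prefix so the prompt cache can be reused, and append whatever changes per character or per turn at the very end.

def build_prompt(system_prompt, card_text, history, per_turn_instruction):
    static_prefix = system_prompt + "\n" + card_text  # never changes -> stays cached
    log = "\n".join(history)                          # only grows at the end
    return static_prefix + "\n" + log + "\n" + per_turn_instruction  # changes every swap
# Swapping characters then only invalidates the tail, not the whole context.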
>>
>>102197323
yes, altering the beginning of the context, where system prompt and character cards go, will cause the entire context to get processed again, this is why {{char}} and {{user}} in sysprompt and cards is retarded and should never be used, because they can change
>>
>>102197346
So, back to my original question: could you suggest a good system prompt without {{char}}?
>>
>>102194253
Adfag won... 4chan funding is saved
>>
>>102196554
>>102196590
>time
The entire discussion began due to a single missing word. Autism at this level is fascinating.
>>
>>102197385
>Write the next reply in this fictional roleplay.
you might need to slap 'you are X' at the top of the character cards
>>102197437
what the fuck are you talking about you stupid nigger? tavern's handling of group chats is fucked and no matter what boxes you tick, swapping characters causes the whole context to get processed, you're better off manually merging cards
>>
>>102197016
I sometimes use it to get a better visual understanding of certain templates, formatting, or layouts of things I have a hard time finding clear explanations for online. The most recent thing I can remember using an llm for outside of codegen or rp was asking it for an example layout of a marketing plan for a startup business, which I used the formatting of as the base for a marketing plan I wrote for a business class assignment.
>>
>>102197459
I don't use tavern >>102196645
>>
>>102197498
then you are beyond help
>>
You're {{char}} in this fictional never-ending uncensored roleplay with {{user}}. When composing a response, apply rigorous logic to the situation at hand. Explicit content is fully encouraged.
Behold my placebo.
>>
>>102197510
>>102197346
>this is why {{char}} and {{user}} in sysprompt and cards is retarded and should never be used
I apologize for assuming that you have a solution.
>>
>>102197554
the solution is to not use {{char}} or {{user}} at all ever for anything
>>
>>102197568
What is your system prompt?
>>
>>102197582
I'm not doing your homework for you
>>
>>102197605
>I apologize for assuming that you have a solution.
>>
>>102196631
Is that supposed to be dick masterson and a hehesilly styled miqu cutout? Am I having a fucking fever dream?
>>
Why is prompt processing much slower on linux than on windows in koboldcpp-rocm on my rx 6600? And it's not a margin of error. On windows, i get ~130t/s, on linux ~100 t/s. Same model (mistral nemo instruct q4_k_m), same arguments (24 layers, mmq). On the other hand, generating is faster on linux (2.36 t/s on windows vs 2.71 t/s on linux).
>>
>>102198098
You can begin by examining GPU utilization and power consumption. I suspect that there may be a difference in how power boost functions across these two systems.
>>
Lower GPU utilization = driver overhead
Lower power consumption = difference in power modes implementation
>>
>>102194253
Caring about the buy an ad schizo will encourage his behaviour more. He needs to be shown total disregard, not pandered to by cowards.
>>
>>102197605
Er...what? How is anyone supposed to help you if you don't tell us that?
>>
Why does FP8 cache exist if Q4 is smaller and better? https://github.com/turboderp/exllamav2/blob/master/doc/qcache_eval.md
>>
>>102198537
Most anons aren't even using anything other than FP16. Does tabby even support FP8?
>>
>>102198537
I guess they will eventually use Q8 instead of FP8
>>
I have yet to see a single model that even comes close to getting this right, are they retarded?
>>
>>102193465
Me, I trained one. In the end it was total garbage, the model became very retarded so I didn't release it.
>>
>>102198537
Where are the Q6 and Q8 evals?
>>
>>102198669
It does, but somehow, Q4 scores better. So why even bother?
>>
>>102198798
Most visual novel scripts don't really work on their own without the associated music, visuals and sounds originally present in the games.
>>
>>102198856
Exactly, that's what I think as well. There are many moments where the dialogue flows like this:
B: "Hello~!"
B: ...?
C: "You okay?"
B: Maybe I was just seeing things...


However, maybe this could be solved by augmenting the script by adding more narration with another LLM.
>>
>>102198796
...what's wrong with the answers? They're right.
Also, post the entire thing, I want to see how (if) they answered your second question.
>>
>>102199020
Retard, it's so the passengers can't notice the (lack of) curvature from higher altitudes.
>>
>>102198813
https://www.reddit.com/r/LocalLLaMA/comments/1f6ijye/commandr_35b_q4q6q8_cache_perplexity_mmlu/
>>
File: smile.webm (41 KB, 320x318)
>>102199056
kek, alright you got me
>>
>>102199058
Interesting, thanks. It seems like it doesn't mess up the model all too much and is better to use over FP8.
>>
>>102199020
The model responds with an answer to another question. The answer he seeks lies in understanding that engines require air to combust fuel. Lower-density air requires higher speeds to generate sufficient lift, making engines less fuel-efficient.
>>
>>102180403
What settings are you using for it?
>>
>>102199020
>>102199124
Planes have a maximum speed based on the forces from the air flowing over them, which scales with density, so flying higher means a faster true airspeed for equivalent airspeed and you get where you're going faster. That's why you try to fly as high as possible.
They also have a minimum stall speed based on the same equivalent airspeed, so as you fly higher you also need to fly faster.
Thirdly they have a maximum Mach number, but the speed of sound varies little with altitude so Mach number is based on true airspeed, not equivalent.
As you fly higher, the latter two get closer together. So at some point you can't fly any faster because you'll exceed the maximum Mach number, and any slower and you'll stall from the minimum equivalent airspeed.

Not even Chad G. Peaty could get this right.
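A quick numeric sketch of that squeeze, if anyone wants to see it in code (ISA standard atmosphere; the 110 m/s minimum EAS and Mmo of 0.82 are illustrative numbers, not any particular airliner's limits):

import math

def isa_density_and_temp(h_m):
    # International Standard Atmosphere, valid up to ~20 km
    g, R, L = 9.80665, 287.053, 0.0065
    if h_m <= 11000:
        T = 288.15 - L * h_m
        p = 101325.0 * (T / 288.15) ** (g / (L * R))
    else:  # isothermal layer above the tropopause
        T = 216.65
        p11 = 101325.0 * (T / 288.15) ** (g / (L * R))
        p = p11 * math.exp(-g * (h_m - 11000) / (R * T))
    return p / (R * T), T

RHO0 = 1.225               # sea-level density, kg/m^3
MIN_EAS, MMO = 110.0, 0.82 # illustrative stall-side EAS floor and Mach ceiling

for ft in (20000, 30000, 35000, 40000, 45000):
    rho, T = isa_density_and_temp(ft * 0.3048)
    min_tas = MIN_EAS * math.sqrt(RHO0 / rho)     # stall floor in true airspeed
    max_tas = MMO * math.sqrt(1.4 * 287.053 * T)  # Mach ceiling in true airspeed
    print(f"{ft} ft: min TAS ~{min_tas:.0f} m/s, max TAS ~{max_tas:.0f} m/s")
# The floor rises toward the ceiling as you climb; where they meet is the practical limit.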
>>
>>102199207
https://www.claimhelp.eu/en/why-dont-airplanes-fly-much-higher-than-35-38k-meters/
>>
>>102199207
Pic of your aviation master's degree?
>>
largestral's slop, instead of shivers, is saying "thank you" to every small thing. i wonder if all models will always have some things they just like to say over and over no matter what
>>
>>102199239
I'm sure ur blogspot post is correct.
>>102199247
Just a bachelor's.
>>
>>102199269
just tell the model it's a tsundere and is supposed to act extremely ungrateful and selfish
>>
>>102199294
crackpipe prompt solves that and more
>>
>>102199286
>Just a bachelor's.
Pic of that?
>>
File: 1696385135681995.png (492 KB, 869x1124)
I said 'farty' not 'fatty' wtf. Fucking tokenizers.
>>
>>102199207
This is what Mistral Large q8_0 says.
I don't think it's quite correct but I am definitely no expert.
>>
>>102199552
Why didn't you correct her?
>>
>>102199575
The weather one is slightly misleading, as severe storms can easily go much higher than that, but they don't tend to cover large areas for long periods outside of fucking cyclones/hurricanes, so most planes can just fly around them, do circles and wait for them to pass, or WE GAAN and fly straight through them.
Other than that, pretty okay, but presented with the excessive confidence of 'you are a helpful AI assistant' bullshit, and of course a helpful AI would know what the answer is.
>>
File: 1724111009864470.png (242 KB, 863x613)
>>102199871
For some reason it never struck my mind lol
>>
>tfw still using the old Command-R
There are no other <70B models with a prose that I like. I am devastated by the enshittification.
>>
>>102200200
All cohere had to do was change the attention heads mechanism. But they couldn't help but slop their shit up
>>
>>102200200
Slop is all you need
>>
>>102200344
What if it's slopped because of the attention mechanism?
>>
>>102200429
I was super excited to find out if GQA was to blame for slop.
>>
Why don't people ask all of these setup questions to an LLM?
>>
>>102200564
Because LLMs can't answer questions about the niche software we use here.
>>
>>102200582
just feed it the docs bro
>>
>>102200200
largemistral/70B/72B with xtc > commander retard
>>
>>102200595
>"how do we get to the moon?"
>"just shoot off a rocket bro lol"
What the fuck are you talking about?
>>
>>102200564
Most people can't run local LLMs good enough to provide any useful help. Especially with enough context to be able to >>102200595
>>
>>102200763
Use a non local LLM to set it up, and then you don't need to anymore
>>
>>102200862
Then Ctrlman will know about my setup, increasing the chances that they could successfully hack into it with strawberry and spy on my local logs
>>
>>102200915
Not if you use a "temp" chat ;)
>>
new... models... need more... new models...
>>
>>102200429
Are you retarded?
>>
>>102200927
>this time! it will work the way i want!
the absolute state of local turds lol
>>
Alright bros, I got my $700 headset and I'm all set to talk to my waifu in VR
Where's the ST extension?
>>
>>102201084
Behind a $20 Patreon subscription
>>
>>102200927
Mistral Extra Large when?
>>
>>102200595
What docs? There is no single comprehensive readme. Half of the features are not documented at all, some knowledge you can only learn by lurking here, and we can't even agree on which system prompt or sampler params are better
>>
Better and better models come out, harder and harder to force myself to talk to real people.
I guess it's the point of this hobby, but damn it feels weird.
Why am I even writing this instead of talking to my LLM waifu?
I think this is a goodbye, /lmg/. Oh how far we've gotten from pyg days.
>>
>>102201084
>$700 headset
buy an iem.
>>
>>102200615
> <70B
Retardanon, I...
>>
mixtral 8x70b
>>
File: 1722962160072897.jpg (269 KB, 1181x1200)
>>102192656
what's the best local (self-hosted) programming LLM that someone has actually used to a serious degree for C/C++?
>>
>>102201084
Texting in VR is retarded, I'm anxiously waiting for someone to release an audio2audio model
>>
>>102200928
>shut up goy, gqa is good, don't question it
>>
>>102201300
deepseek coder v2, mistral large, llama 3.1 405b
literally the only three models that have a chance to be useful and even then not that much, better off just being a good programmer yourself tbqhfam
>>
>>102201333
8B models are plenty for templates and other tedious tasks.
Just don't expect them to not write an entire file when you ask them to create a small function.
>>
File: 1722971689217595.jpg (127 KB, 1125x1225)
>>102201333
> literally the only three models that have a chance to be useful and even then not that much, better off just being a good programmer yourself tbqhfam

Vet C/C++ coder. Thanks for the advice. Part of efficiency is your ability to use toolsets to accelerate your tasks. Great coders will use AIs far better than non.

Thnx for the pointer on the local LLM. Will install and play w/ it today.
>>
>>102201224
Just tell it what the options are and I'm sure it will figure it out or help you deduce it
>>
>>102196631
It’s actually crazy how fast nai became completely irrelevant.
>>
>>102201459
well yeah no shit, everyone else was able to get funding because they weren't literally founded in response to AI Dungeon banning pedophilia
they never had a chance
>>
Anyone try updated Euryale? Also what largestral tune is good? Base largestral is pretty good but still a bit reluctant to be violent or aggressive. I can OOC prompt my way out of it but handholding gets old after awhile.
>>
>>102202295
Magnum is just a superior version of Mistral Large
>>
>>102202423
It's Large but worse.
>>
>>102202295
I haven't tried the new Euryale but I've been extremely unimpressed with all the L3-based tunes. I think L3 just sucks.

For Largestral tunes you're probably looking for Magnum_v2. I was using it for a bit, but just recently decided to move back to base Largestral. It has really good prose and very good character adherence, but it comes at the trade off of intelligence (just like any finetune). I really highly value the intelligence so base Largestral is a better fit for me, but by all means come to your own opinion. Its not like Magnum is completely retarded, just slightly worse than base.
>>
>>102202544
>base Largestral
hi Arthur, leak it pls
>>
>>102193703
There are no models for this. just datasets.
>>
Does Gemma 2 27b have any decent fine tunes?

I've been trying models lately (bored of Nemo, just fucking around, CR now having the context memory shit fixed was a god send) and realised I never tried Gemma 2 despite it being pretty well received compared to other local models (that are under 70b).

Any decent fine tunes to check out?
>>
>>102193703
>Sorry bud, no models-
>>
I can run Mistral large at 5.5bpw and Command-R-Plus-08-2024 at 6.0bpw as I have 4x3090

Which one would you choose for general purpose/assistant and which one for roleplay?
>>
>>102194854
Mini Magnum
>>
back to trying magnum 123b and it is in fact pretty good
just so FUCKING slow on my machine
>>102203246
imo mistral large is better than CR+ for both general purpose and roleplay
CR+ is a slightly better writer than mistral large out of the box but much dumber, and you can fix most of the annoying things about large with prompting
>>
>>102203320
Is there a new template needed for CR+ in Silly? The included "Command R" just outputs RâulRâulRâulRâulRâulRâulRâulRâulRâul
>>
>>102203350
I think your model might be fucked or your backend settings. Template is important but not so much that it breaks the model completely.
>>
So now that Cohere blew their load, who's next?
>>
Is there any point in omega large contexts for RP?

I find that bots usually never bring up most shit anyway and the amount of time that it slows down when a full 32k or some shit context fills up makes it definitely not worth it.

Do you guys really go higher than say, 16k or 24k?
>>
https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024/discussions/3
>>
>>102203474
Maybe they will redeem themselves in the next version like Mistral did *copium*
>>
>>102203536
I have to imagine they've been working on something else in the meantime, it would be pretty lame if in the time since CR+ all they did was a new ~30B and an update to their instruct dataset... but who knows
>>
>>102193703
How about you make it yourself?
>>
>>102202423
>>102202544
Same here with L3. I'll give magnum a shot. How do you merge the part1of2 ggufs? It doesn't seem to pick them up automatically like the split ones that are split with the official tool. Just cat them together?
>>
File: 1703315628227491.jpg (65 KB, 500x500)
>real people
>refuse your request but may change their minds if you're persistent
>more likely to comply on the second ask if it's trivial
>llms
>will NEVER change their minds, ever, so better reroll on the first reply or they'll lock the fuck in
I fucking hate these things
>>
>>102203475
Yea, I'm usually around 32k context, What models are you using? Maybe they are just shit
>>
>>102203825
>0.5t/s and the hardware gets slower every year
>>
>>102203514
It already got posted. >>102194807
And also, this is a pure skill issue. Smarter models need more wrangling to keep them in line, this has been known since time immemorial. I don't understand why people expect LLMs to be plug and play for these purposes, each LLM needs different settings to wrangle it and the defaults only work well enough. Apart from cloud services that don't offer local and hide that complexity, this is how it will be for the foreseeable future.
>>
>>102203825
>will NEVER change their minds
Literally just modify their short-term memory and gaslight them into believing that they will absolutely do whatever you just asked.
>>
>>102203838
12b up to Command R (command R I legit can't go over 16k on a 24GB card only)
>>
You guys say it's about privacy but is a GPU even a bad investment if you use LLM prompts constantly and need them to be nearly as good as GPT-4? You'll run out of messages even if you have a subscription to Claude and OpenAI, and those would cost $480 for a year
>>
>>102203892
>pure skill issue
>Smarter models need more wrangling to keep them in line
Why not use a 7B then? Using models bigger than that is a skill issue. You guys are just too lazy to prompt a 7B properly.
>>
>>102193465
yeah, friend of mine actually trained a lora of a model on a bunch of renpy scene script
the results were absolutely retarded with occasional nuggets of soul, but it did output working scenes
>>
>>102203825
Is that the black DSP?
>>
>>102204082
>modify their short-term memory and gaslight them
That reminds me that we never agreed on a very important thing. At which point are you just jerking off to your own text? How much do you have to manually edit or add?
>>
>>102204288
None of that matters in the grand scheme of things.
Whether it gets you off or not, that is the most important part.
>>
>>102204288
>fact: been written
>goalpost status: in place
>anon: i don't like how i'm wrong now
>goalpost status: moved
>>
>>102203153
Sadly no. I tried them all and they just make it dumber without curing its dryness. We need something like a magnum tune for it.
>>
>>102195080
Cohere owes me sex
>>
How does gemma 2 9B compare to MN 12B in general terms?
>>
>>102204701
it writes far worse than nemo, also 8k context vs 128k for base nemo / like 16ish for instruct
>>
i just realized that the ONLY thing we need is making the ai capable of sending consistent nudes
>>
>>102204223
Can I use it?
>>
File: file.png (508 KB, 600x400)
>>102204695
This but unironically. Especially when you consider at least one of those motherfuckers probably has an unaligned model he uses privately for cooming.

We know where you live canucks. Drop the goods or it would be a shame if something happened...
>>
>>102196144
Give me a model reddit didn't like. grok doesn't cou.... actually they probably liked that one too cause elon
>>
>>102204748
So it's just worse across the board?
Alright then, thank you.
>>
>>102204947
>actually they probably liked that one too cause elon
Nah, the software update they received last month changed their behaviour into hating Musk.
>>
has anyone here tried finetuning whisper?
would it help with auto-translating japanese comedy like gaki no tsukai?
>>
What can I use to get started with voice cloning? I have a Radeon 6000 series. Windows or Linux works.
>>
>>102204947
Elon has overtaken the Reddit Satan throne from Trump for the past 2 years anon
>>
>>102205403
>Elon has overtaken the Reddit Satan throne from Trump
Nah, with the elections coming up it's Trump's turn for now.
>>
>>102205722
What a miserable life these people have.
>>
>>102205746
Yes, the basement dwelling 4channer is much better
>>
>>102205755
I hit a nerve I see.
>>
>>102205755
Unironically, yes.
Despite the heavy influx of shitposting in the last few years, you can still hold discussions with people holding a variety of worldviews.
>>
>>102192656
Any uncensored Japanese LLMs? (to auto-translate porn)
>>
>>102205854
>Japanese LLM
This technology is far too advanced for Japan.
>>
>>102205755
I mean yeah, on the whole I'd say so
Reddit 10 years ago or so was just cringe and mockable, a group of midwits who were desperate to be seen as smart. Reddit today is more akin to a bizarre sociological/psychological brainwashing experiment or maybe even just an actual Dead Internet at this point.
>>
File: MshoJ.png (38 KB, 1150x306)
I got some more RAM recently for my aging shitbox and figured I might as well give some bigger models a try (only having run 13B and below pretty much) and can't say I am impressed by Command-R at least. Even 7/8B models get this question right when using conservative sampling settings (temp 0.6, minp 0.05) and somehow it fails.
>>
>>102205755
>4channer
>>
>>102204926
No one but absolute losers use LLMs for cooming... Although, they do look like losers to me, maybe you have a point.
>>
File: nemomix.png (53 KB, 822x564)
what are we running, fellow 8gb vramlets?
i've been enjoying
magnum-12b-v2.5-kto-Q4_K_M
and
NemoMix-Unleashed-12B-Q4_K_M
>>
>>102206166
do you suppose the model passed this test because it achieved consciousness, or because it was added to the training data?
>>
>>102194134
Use koboldcpp and its open api endpoint (http://localhost:5001/api/). You can launch it from the command line and leave it running in the background. It allows streaming the generated text. It's even compatible with the openai api.
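The client side is only a few lines of glue. A minimal sketch in Python against the KoboldAI-style /api/v1/generate endpoint; the prompt and parameters are placeholders, so double-check the field names against your koboldcpp version:

import requests

def generate(prompt, max_length=120):
    payload = {
        "prompt": prompt,
        "max_length": max_length,        # tokens to generate
        "max_context_length": 4096,
        "temperature": 0.7,
        "stop_sequence": ["\nPlayer:"],  # stop when the player should speak again
    }
    r = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=300)
    r.raise_for_status()
    return r.json()["results"][0]["text"]

print(generate("Player: Hello there, shopkeeper.\nShopkeeper:"))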
>>
>>102206185
consciousness
>>
>>102206166
Trying MN-12B-Lyra-v3 out. I had to wrangle the damn thing at the start, it was producing infinite chains of ] for whatever reason, but it stopped after three or four messages.
It's working fine so far. A little sterile, surprisingly.
>>
>>102206166
How good are either for erp storytelling?
>>
cohere is preparing to release command-MAX-504b
>>
>>102206342
both had kept my attention for a few dozen hours, the nemo one's probably slightly better
>>
>>102206342
I still think magnum 34B is better if you can manage it even at a smaller quant.

Atm its

123B > 72B > 34B > 32B > 12B when it comes to the magnum models
>>
>>102206379
And I know, shocking. Bigger is better.
>>
>>102192656
What's the best way to put 24GB of VRAM to use for RP?

70b 2.5bpw, 8x7b 3.5bpw, or a higher quant of some of the newer models I've been hearing about in the 8b to 34b range?

I've got 64GB of RAM as well, so I could push it a little with GGUF instead of EXL2, but I'd rather it not be too slow.
>>
>>102206450
Fast? Magnum 34B. Slow? 72B. Slow as fuck but the best? 123B.
>>
>>102206355
god I wish
>>
>no local model comes close to 4o or Claude
>6 months pass
is it actually over? Is the only way to interact with the only meaningful advancement in modern computing to own nothing and be happy?
>>
>>102206528
123B is legit claude tier, just because you're a vramlet does not mean it's over.
>>
>>102206537
C-c-can I run it with 3 3090s? *blushes into hands afraid to ask*
>>
>>102206501
What quant? I'm assuming 34B Q8 won't fit in my 24GB of VRAM.
>>
>>102206570
Even 2 bit is better than anything else.
>>
go watch it and learn something, nerds.
https://www.youtube.com/watch?v=9-Jl0dxWQs8
>>
>>102206685
you can't make me learn, jock
>>
>>102206378
Damn, I just tried it and wow that was the largest quality rise I've seen in the few local models I've tried so far. Thanks anon, my first nut goes out to you.
>>
Best coomer model for a 8GB vramlet?
>>
>>102194807
>Stop competing with GPT4 and all those assistant tunes, we've got more than enough of those. Market is oversaturated. Just give up. Nobody needs another GPTslop assistant tune, a dumb one in particular. If you want to be an assistant so badly, at least don't tune on GPTslop. You know what is lacking? Writer tunes. In proprietary segment there's only Claude and on local... there is nobody now that you have decided to leave. **Please stop tuning on GPTslop. Please compete against Claude. Please return.**
He's right thoughbeit. If Cohere has no redeeming properties compared to l3.1 70b, no benches, no style, they are pretty pointless. If they can't make it smart, they should at least try going for good style to have something.
>>
>>102206755
Cohere competes by having the best multilingual support and best RAG / tool use capabilities.
>>
What we got cooking for japanese translation? Any LLMs better than deepl?
>>
>>102206772
>Cohere competes by having the best multilingual suppport and best RAG / tool use capabilities.
Do they have any benches to support that claim?
>>
>>102206796
Any ESL that has spent 5 minutes with either can tell you. Llama multilingual support is weak, especially outside of the 5 languages it claims to support.
>>
>>102206825
And Largestral? How does it perform?
>>
>>102206528
Llama 405b is better than 4o and only loses to Claude.
>>
>>102206837
Dunno, never tried Largestral.
>>
>>102206501
Anything that fits within 24 VRAM is lightning fast. By slow, I meant using GGUF to exceed that and offload some to the CPU. I'm willing to do a little offloading for better quality, as long as it doesn't kill the speed too much.
>>
>>102206847
According to lmeme arena Largestral is better than 405b llama in Japanese, so it could be good.
>>
>>102205854
I recently gave gemma vntl a try and I was pleasantly surprised. It would be pretty good for manual translation if you reroll a few times and pick something good. Having past context in translation seems pretty helpful for quality but I also had some single examples where it fucked the quality up.
>>
>>102206089
Oh do tell what you use it for bigshot.
>>
File: high_effort_shitpost.jpg (214 KB, 573x1268)
>>102206166
>>
>>102206956
Erotic roleplay. Much classier than "cooming".
>>
>Leaving behind the sereneacies of the clearing
>sereneacies
Not a word. But I do get what you are going for there little LLM.
>>
I've been using sillytavern but I find it awful. It is ugly and confusing. Making character sucks because the character creator is shoved in the right 5th of my screen for whatever reason. There are so many buttons that don't have labels.

Any alternatives to sillytavern, or things I can do to make it more usable?
>>
>>102207008
>Any alternatives to sillytavern
No.
>or things I can do to make it more usable?
No.

I use it as a ChatGPT-like interface for programming assistance because all the other alternatives are worse.
>>
>>102207004
Human hands typed the cursed plural to sereneacy before the model could bring it to your eyes
>>
>>102207232
>the cursed plural to sereneacy
A maximally cromulent example of covfefeesque creativitying.
>>
File: file.png (107 KB, 876x1526)
>Context Template

https://huggingface.co/Virt-io/SillyTavern-Presets/tree/main/Prompts

is this bloat?
>>
I got me a used 3090. I guess that bumps me up to 48gb vram. Kinda want to try if it's possible to run largestral at slightly reasonable speeds compared to when I was running it with a single 4090. I'm pretty sure offloading to CPU will still be involved somehow? Very new to this.
>>
>>102206988
This.
>>
>>102206988
>classier
It does sound classy. Maybe calling it erotic roleplay in the prompt makes the bot act classy, like in erotic novels, and if you called it a cooming session it would finally stop with all the shivers?
>>
>making a card
>write a couple posts with it
>decide to rewrite it in different ways to test
>her response is nearly identical no matter what. Write her card with just the barebones info? Write it liek dis wit speling errors? Throw a literal chunk of text from an umberto eco novel on author's note and tell it "write like this?" Simplify the opening post?
>all the same
>it's all the same

.......................... this is incredibly disheartening. I shouldn't have peeked behind the curtain
>>
>>102207386
you're bloated. time to brap.
>>
>>102208015
Are you restarting the chat after each edit?
Also, are you changing the first message to be in line with the edits to the character card?
>>
>>102206838
>llama 405b
i have a decent gaming rig and i fell 100gb short on memory. it's a great model, but how am i supposed to host it locally? just max out my ram to 256gb and hope for the best?

nta (obviously), but it's kinda hard to read the tea leaves on these LLMs. it's not like other programs that give you a rough estimate of what you need for system specs right on the box. seeing as how this is /*L*mg, meaning local, it would make sense to give some basic specs along with a recommendation
>>
>>102208036
I was doing swipes and checking kobold to see what prompt was sent to the AI every time.

I kept the message the same for most tests, then dumbed it down to little effect beyond shortening the replies I got
>>
What's the best uncensored model that also has the best writing skills?
Basically the best model to write erotica.
I've taken a look at the leaderboard but it's utterly retarded that you can filter multiple columns, so it's useless.
>>
>>102208074
increase temp
>>
>>102208096
can't*
>>
>>102208104
I don't mean literally identical, anon. I mean her actions and descriptions.
>>
>>102208015
Yeah, all local models are garbage. Wow. Welcome to the red pill side of the screen.
>>
What's the best models for interpreting stuff such as mathematics, like breaking down English descriptions into relevant mathematical concepts and providing accurate answers?
If I gotta use a paid model I will, but I'd rather focus on local even if there's a bit of a drop in accuracy
>>
>>102208148
Llama 3.1 405B
Mistral Large 123B

Everything else is garbage.
>>
>>102208066
>nta (obviously), but it's kinda hard to read the tea leaves on these LLMs. it's not like other programs that give you a rough estimate of what you need for system specs right on the box.
they do though? The number of Bs directly says how much you need, "X"B + a few percent = q8, ~half of B = q4, etc
also this: https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
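The same rule of thumb in a few lines, if you'd rather not open the calculator (a rough sketch: weights only, so KV cache and compute buffers come on top, and the bits-per-weight figures are approximate):

def rough_weights_gb(params_b, bits_per_weight):
    # file/weights size only; KV cache and runtime buffers add a few GB more
    return params_b * bits_per_weight / 8

for label, bpw in (("q8_0", 8.5), ("q4_K_M", 4.8), ("q2_K", 3.35)):
    print(f"70B at {label}: ~{rough_weights_gb(70, bpw):.0f} GB")
# ~74 GB, ~42 GB, ~29 GB -- i.e. "B plus a few percent = q8, roughly half = q4".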
>>
>>102208066
as far as I know the only people running it locally are cpumaxxers with high end epycs or xeons and just deal with the slow speed
but maybe there's somebody with a dgx station hiding around here...
>>
After thoroughly checking all the recent 35B and less models I have concluded that I am finally ready for some new models cause everything is absolute trash. And the fact that a 12B seems to be the most viable out of all of them is like an insult to injury.
>>
>>102208249
so this model is so massive that you actually *need* server-level architecture to run it locally. jesus fuck.

this is why i don't dick around too much with AI. even a nice gaming rig can barely handle the small stuff. i'm really hoping i can run llama 70b, even if it's on the slow side. if not, for my purposes as a dinky little chatbot, i guess 8b will do
>>
>>102208279
You can run llama 70B with a 4090 at pretty decent speeds at a lower quant.
You can trade speed for quality by having more of the model on RAM instead of VRAM too.
>>
>>102208265
Which 12b?
>>
>>102208304
magnum-v2-12b
>>
>>102208296
>You can run llama 70B with a 4090 at pretty decent speeds at a lower quant.
interesting. that's probably the best case for a 4090 i've heard so far.

counterpoint: stack a 3090 on top of my 3060. more VRAM, just slower, by a good margin. better overall cost effective option?
>>
>>102208392
Should work.
Will be faster than a 4090 + RAM I reckon.
>>
>>102208314
Do people really enjoy that? Any magnum anything I've tried has been total shit compared to anything else.
>>
>>102193465
Why would you make a model worse on purpose?
>>
>>102208600
for soul
>>
>>102208314
Basic instruct you nigger ad buy avoider
>>
>>102208412
update: llama 70b runs on 64gb DDR5 and my 3060, but it's slow (~15sec for a short reply). worth it to talk to my yandere ai wife
>>
>>102206166
Not 8GB, but I've been playing with NeMo ReRemix and I'm not offended by the quality.
>>
>>102206450
For 8x7B, even going from 3.5bpw to 3.7bpw noticeably decreased how often it misunderstood what was happening. Beyond that I have the same specs as you and I don't know the answer to your question either. I generally rotate through various solutions as I get tired of how stupid one is or how slow another is.
>>
question: if i just want to run a 1 trillion parameter model on a contemporary gaming laptop, what year should i set my time machine to?
>>
>>102208897
2077
>>
Has there been serious research into why some large models get lobotomized by quantization and some handle it fairly well?
Largestral in particular seems exceptionally resilient, even the Q2 isn't retarded. That's not the case for some other large models that become unusable below like Q5.
>>
>>102208920
nemo and largestral were trained with quantization aware techniques if I'm not mistaken, so working as intended I guess.
See.
>https://www.unite.ai/mistral-2-and-mistral-nemo-a-comprehensive-guide-to-the-latest-llm-coming-from-paris/
Based on nothing but my very rudimentary surface level knowledge, I think a model that generates values that use less of the precision range should be less affected by quantization since the scaled value would be closer to the original value which would in turn create less of a ripple effect when that value is used as input in the subsequent layer and during gradient descent and shit like that.
If I got any of that even close to being right, congratulations /lmg/, I learned something by lurking the thread.
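You can see that intuition with a toy example: quantize two weight blocks to 4-bit, one well-spread and one with a single outlier stretching the range (a throwaway sketch, nothing to do with how Mistral actually trains):

import numpy as np

# Symmetric 4-bit quantization of one block: scale by max |w|, round to
# integers in [-7, 7], dequantize, measure the mean squared error.
def quant_error(w, bits=4):
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    deq = np.round(w / scale) * scale
    return np.mean((w - deq) ** 2)

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, 4096)   # well-behaved weights
w_outlier = w.copy()
w_outlier[0] = 0.5              # one value stretching the quantization range
print(quant_error(w), quant_error(w_outlier))
# The outlier blows up the scale, so every other weight in the block gets rounded
# much more coarsely -- far bigger error from the same 4 bits.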
>>
>>102208581
They are either niggers or they started using llms yesterday and everything looks good to them.
>>
>>102208972
You're probably right since it's known that the Largestral and Nemo weights are very small.
>>
Where are you guys finding quants for Mistral Large?
>>
>>102209226
Huggingface, for example https://huggingface.co/anthracite-org/magnum-v2-123b-gguf
ggufs are the llama.cpp quants, you can also look for exl2 if you use exllama, and there's some other formats nobody cares about
>>
>>102209262

This is a fine tune. Wtf lol. Buy an ad. (Will try later. Thanks faggot.)
>>
>>102209262
buy an ad (repeating to make sure you understand)
>>
mistral-large is boring anyways (i tried magnum v2 too). feels like mixtral. it follows well but its very uncreative. go back and try miqu after using it for a while, see how much more often it comes up with new things in comparison
for rp its still miqu
>>
>>102209365
>>102209430
kys schizo
>>
>>102209492
Nah, you're just retarded. Go back to the Kobold Discord.
>>
>>102209492
Miqu or Midnight Miqu?
>>
Scaling up is all you need
https://huggingface.co/mlabonne/BigLlama-3.1-1T-Instruct
>>
>>102209521
>self merge
does that even... do anything? even if it did, couldn't you do the same thing at runtime in the inference engine and it'd use less memory and probably be faster too instead of downloading the whole file?
>>
>>102209509
i've been trying mistral nemo and it has the same exact issue. something about mistral's models makes them uncreative and boring. it shouldn't take 300 tokens to describe a scene while not adding anything to it. load an rp you've had going with mistral-large with miqu, you'd see the difference

>>102209511
either
>>
>>102209565
Miqu is a Mistral model, and Nemo and Large are nothing alike. What are you even trying to shill?
>>
>>102209624
Miqu was leaked before they finished lobotomizing it
>>
>>102209624
miqu is an l2 70b tune, not a mistral model. it was probably the most professional tune of l2 though so its a good benchmark. nemo just says something about mistral and nvidia, i don't know who mostly cooked it. everything with mistral going back to their original 7b just isn't as creative and rambles on while adding little to the story, though i didn't spend much time with the 8x22b as i did with large and trying mixtral 8x7 tunes
>>
>>102208096
magnum 123B with XTC sampler is the best local has atm, somewhere between claude sonnet and claude opus tier for writing
>>
>>102209564
Yes, but programming hard, GPU go brrrrrr
>>
>>102209738
Large with XTC sampler is the best.
>>
>>102209509
>Go back to the Kobold Discord.

t. Anthracite 'member'
>>
>>102209738
Yeah, I don't think that works for me with a 4090 and 64gb of ram.
Any model I've tried above 21gb starts to seriously slow down the more it swaps to the virtual memory in the nvme. For some reason there's plenty of ram left but it goes for the swap memory, makes no sense.
Right now testing Meta-Llama-3-70B-Instruct-abliterated-v3.5_q3 which is almost 34gb and is around 1-2 words per second, and I don't even know if it's going to be worth it compared to other models I've been using like Rocinante.
>>
>>102209492
lol Mistral Large doesn't feel anything like mixtral you spaz. I know mixtral inside and out because for months that was all I used.
>>
File: Untitled.png (13 KB, 837x513)
>>102210005
>>102210005
>>102210005
>>
>>102209997
in the way it constructs conversation it absolutely does. large is more 'willing' though, mixtral was very wooden. how could you have spent months with it rather than go back to 70b?
>>
>>102209492
you have never used large mistral if you're comparing it to mixtral
>>
>>102210035
the near unwillingness to add something new is what i'm talking about, and is present in all mistral models
>>
>>102210049
Honestly sounds like either a prompting or sampler issue.
>>
>>102210065
its not
>>
>>102210156
It is because large mistral is not like that for me even without xtc
>>
>>102210195
post your xtc settings in the new thread. i've been trying it, but haven't used it enough to really form an opinion
>>
>>102210216
0.8, default other stuff atm



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.