/g/ - Technology






File: ComfyUI_00787_.png (1003 KB, 768x1344)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101933598 & >>101925496

►News
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://rentry.org/lmg-faq-new
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: ComfyUI_00794_.png (1.07 MB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101933598

--Hermes 405B praised for smut and coomery: >>101936364 >>101940169 >>101940212 >>101940216 >>101944289 >>101944463 >>101944558 >>101944601 >>101944780 >>101944932 >>101945026 >>101944765 >>101945184
--Hermes 405B is uncensored, censorship only in default web UI: >>101933786 >>101934447 >>101935319
--Understanding max_seq_len and compress_pos_emb settings: >>101941616 >>101941642 >>101941672 >>101941735 >>101941759 >>101941797 >>101941839 >>101941876 >>101941723
--Recent AI improvements seen as plateauing, but real intelligence gains noted: >>101944575 >>101945087 >>101945118 >>101945141 >>101945481 >>101945473 >>101945451 >>101945465 >>101945542 >>101945762 >>101946181
--Misconception about imatrix in llama.cpp, training support development: >>101943108 >>101943808 >>101943993 >>101944081
--How to set the --api flag in ooba for Windows: >>101942104 >>101942143 >>101942190 >>101942218 >>101942252 >>101942292 >>101942227 >>101942248 >>101942148
--Vulkan speeds up AMD APU inference, but has FP16 limitation: >>101935155 >>101935472 >>101935620
--Prompt Engineering Guide recommended for new users: >>101942244 >>101942265 >>101942323 >>101942433 >>101942495 >>101942740 >>101943171
--Microsoft's E2 TTS model and its potential integration with ST: >>101944391 >>101945147
--Anon offers opinionated Hermes settings, acknowledges generic phrases in LLMs: >>101943021 >>101944904
--55-60 cores are the sweet spot for inference, depending on memory bandwidth: >>101944312 >>101944574
--Slopcheck.py tool for checking common phrases in writing: >>101941218 >>101941270 >>101941286 >>101941310 >>101941318 >>101942271
--Intel AI Playground app released, but VRAM capacity may limit GPU competition with Nvidia: >>101942413 >>101942472
--Miku (free space): >>101935077 >>101935463 >>101939457 >>101940189 >>101942644 >>101942968 >>101945326 >>101945427

►Recent Highlight Posts from the Previous Thread: >>101933601
>>
>>101947316
Make /lmg/ seethe in 4 words.
>>
Where is Claude 3.5 Opus? WHERE IS IT!?
I'm tired of localslop! Anthropic tasukede!!
>>
File: low res bulbasaur.png (43 KB, 166x138)
Do instructions like "Don't end a post mid-sentence" do anything? Does the model have any idea when it's going to have to stop talking, or does it only find out when it gets cut off?
>>
Dead general.
>>
>>101947537
Instruction following is emergent behavior. So unless it has a lot of training examples where it's like
"Person 1: Don't end your post mid sentence.
Person 2: *doesn't end the post mid sentence*" probably not.
>>
>>101947367
Jart did nothing wrong.
>>
>>101947323
lol at the slopfinder's selection
>>
>>101947537
No. There's a setting in your frontend to specify how many tokens it should generate, but the model cannot know how many it has left, so it goes for as long as the program lets it. Increase that value.
>>
If your training loss drops below 1.0 your model is overcooked.
Fight me.
>>
smedrins
>>
>>101947537
No, because LLMs never do this willingly.
>Does the model have any idea when it's going to have to stop talking, or does it only find out when it gets cut off?
No, the LLM just predicts the next token.

>>101947620
afaik you can't teach the LLM to not do something though
>>
>>101947720
>afaik you can't teach the LLM to not do something though
You're referring to the whole "negative prompting" thing. That literally goes back to the Pygmalion 6B days where the local models were so fucking stupid the mere presence of the mention of something caused the model to start repeating the thing that was mentioned. Your average current generation model can handle negative prompting just fine.
>>
>>101947537
it's doing that because your max tokens is set to less than it wants to write
>>
>>101947767
So you think saying things like "don't impersonate the user, don't repeat yourself" actually is effective?
>>
>>101947825
No. It's not effective because the models aren't trained on that shit. But if you had a dataset to that effect you could probably train them not to.
As far as repetition goes that's generally caused by meme samplers. Learn to embrace neutral sampling.
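If "neutral sampling" sounds vague: it just means every sampler sits at its no-op value so you see the model's raw distribution. Here's a rough sketch of what that looks like as a request to a local OpenAI-compatible backend; the endpoint, port, and some of the extension parameter names are assumptions and vary between backends, so treat it as illustrative rather than gospel.

```python
# Illustrative "neutral sampling" payload for a local OpenAI-compatible
# backend (ooba / koboldcpp / llama.cpp server). Endpoint, port and the exact
# extension parameter names are assumptions; the point is that every sampler
# sits at its no-op value.
import json, urllib.request

payload = {
    "prompt": "Continue the story:\n",
    "max_tokens": 300,
    "temperature": 1.0,        # no logit rescaling
    "top_p": 1.0,              # nucleus sampling off
    "top_k": 0,                # no top-k cutoff
    "min_p": 0.0,              # min-p off (where supported)
    "repetition_penalty": 1.0, # no repetition penalty
}
req = urllib.request.Request(
    "http://127.0.0.1:5000/v1/completions",   # assumed local endpoint
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.loads(urllib.request.urlopen(req).read())["choices"][0]["text"])
```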
>>
>>101947367
ollama deserves more credit.
>>
>>101947316
The FAQ says "buy a fucking ad"
>>
>>101947537
I concur with >>101947819
Your most likely problem is that your context window is too small for what you're doing.
Try increasing it, but beware that if you set it too large you will run out of memory.
>>
>>101947939
And the problem is...?
>>
>>101947939
lmao I don't think OP noticed
here's the old one: https://wikia.schneedc.com/
>>
>>101947673
>Besides,
How the fuck is that slop now? Three (3) more weeks and it will just be a list of every English word.
>>
The only truly slopped phrases that the model uses as though it has some kind of brain damage are:
The shivers
Eyes never leaving yours
Voice barely above a whisper
Husky voice
The rest is just over-stimulated gooners not understanding how down-regulation of the hypothalamus works.
>>
>>101948077
>The rest is just over-stimulated gooners not understanding how down-regulation of the hypothalamus works.
It's even simpler than that. Most haven't read a book since high school and now they read the one-subject scenario 10 times a day with different models. For a whole year.
>>
>>101948130
>Most haven't read a book since high school
Do shitty chinese martial arts novels count?
>>
I only keep up with this occasionally. Last time I came around, Gemma 2 was pretty much the best model in my experience (even though it was slow). This was a couple of months ago. What are the new hot models now? I can't run anything extremely demanding, but I've been able to do some 70B models. Thanks.
>>
>>101947953
My context window is nowhere near reached, I doubt that has anything to do with it.
>>
>>101948151
No idea what you're using, but make sure the frontend and backend both aren't limiting your context window.
>>
File: slop.png (2.57 MB, 1920x1080)
What's the /lmg/ consensus? is KTO a flop technique?
Is honest to god full RLHF the only way?
>>
>>101948130
I mean that's how they're not able to identify that they're just damaging their own brains.
But
>read phrase a few times
>gives bonor
>keep re-reading same phrase to give self bonor
>phrase starts to invoke awkward feelings as you no longer get the anticipated endorphin release
Entirely their own faults. Switch it up every now and then. Or go on /soc/ and find a human ERP partner for a few sessions and you'll quickly remember why we're here.
>>
>>101948170
The only way is traditional finetuning on hand-crafted datasets.
>>
>>101948170
It's not bad, just too horny. Probably a fine-tuner issue.
>>
>>101948151
nta.
>>101947688
>>101947819
He stumbled upon the answer, but somehow managed to muddy the issue.
You're hitting your max token count on your frontend. Show a screenshot. We don't even know what you're using.
>>
>>101948202
>You're hitting your max token count on your frontend
I know I am! I have it set to 150t because I'm RPing, I don't want to get a brick every time before I respond. I just want the model's posts to end in periods and not in the middle of sentences.
>>
>>101947719
DON'T THINK IT DON'T SAY IT
>>
File: 1705038872973130.png (172 KB, 742x553)
>>101947316
Thread Theme: ochatime - ft. Hatsune Miku
https://www.youtube.com/watch?v=W1J2ZELm7Sw
>>
>>101948257
You have a verbose model, probably a finetune trained on smut. If i had to guess, i'd say that your card/system prompt encourage the model to speak verbosely as well. It's a hell of your own making. You know how to solve it.
To reiterate what i said, the model doesn't know how many tokens it 'has left'. It will continue outputting tokens until it generates an EOS or your inference program stops it.
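If you want to see it concretely, here's a minimal sketch with llama-cpp-python (model path and prompt format are placeholders). The only two ways a reply ends are the model emitting EOS / hitting a stop string, or the max_tokens cap chopping it off; there is no mechanism for the model to "see" how many tokens it has left.

```python
# Minimal sketch with llama-cpp-python (model path is a placeholder) showing
# the two ways a generation actually ends: the model emits EOS / a stop string
# on its own, or the max_tokens cap cuts it off mid-sentence.
from llama_cpp import Llama

llm = Llama(model_path="your-model.Q5_K_M.gguf", n_ctx=8192)

out = llm(
    "### Instruction:\nWrite one short paragraph about rain.\n\n### Response:\n",
    max_tokens=150,          # the frontend's "response length" slider maps to this
    stop=["### Instruction:"],
)
print(out["choices"][0]["text"])
print(out["choices"][0]["finish_reason"])  # "stop" = EOS/stop string, "length" = cap hit
```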
>>
>>101948170
>>101948197
>Is honest to god full RLHF the only way?
Any offline RL algorithm (DPO, KTO, CopePO etc) is dead on arrival for anything that isn't super simplistic "don't do this particular behavior" tuning.

PPO style RLHF as used in Claude, GPT, etc is still the state of the art for a reason (Just look at Llama3.1 using DPO and how that backfired.) But open source chuds will never do it because it's too VRAM heavy.

Regular finetuning by itself is not enough.
>>
>>101948322
Cute Thread Theme
>>
>>101948323
Alright, thanks. I just killed streaming and toggled "trim incomplete sentences", what I don't know won't hurt me.
>>
>>101948430
You could have mentioned what you were using. We could have told you that:)
>>
File: 1710774484776483.png (207 KB, 295x460)
how much money do i have to spend to run 405B? is it even possible without a datacenter?
>>
>>101948462
>https://rentry.org/miqumaxx
Is probably the most reasonable way, all things considered. It's going to be slow and probably not worth it.
>>
>>101948394
64K is enough for anyone
>>
>>101948394
Where is the paper to back up your claims?
>>
>>101948565
>paper
It's a coffee stained post-it on his monitor. It says "Regular finetuning by itself is not enough".
>>
>>101948625
Has the post-it been peer reviewed?
>>
I'm using SillyTavern for my frontend with KoboldCPP as my backend; Mistral Nemo is the model. Each time I edit the chat history or attempt to summarize, the model just refuses to generate and gets stuck. Relaunching ST fixes this, but the summary is not generated. Any anons here encountered this problem before? Thanks.
>>
>>101948667
It reads "Yeah. Pretty much" on the other side. Seems legit.
>>
I like how I can just throw random code at my model and it instantly recognizes what framework I'm using.
I love the future.
>>
>>101947367
Model merges actually work.
>>
>>101947367
miku miku miku miku~
>>
>>101948685
>the model just refuses to generate and gets stuck.
I don't use ST, so i don't think i can help much. What do you mean exactly by 'refuses to generate and gets stuck'? Are you sure it's not just taking long processing the whole chat? Do you have any activity on the terminal running your backend? Do you have any activity on you GPU/CPU?
>>
>>101947367
Wizard was a meme
>>
>>101948731
I love that we can throw code at the model and ask it to translate to another language, this was impossible to do in a reliable enough way before LLMs.
>>
>>101947995
nta but if an ai says it you know its going to be followed by the sloppiest slop
its just eliminating the problem at its root (and by that i mean the word before its written)
>>
>>101948837
NTA but if you think that way, you could also think that as soon as the bot writes English it's guaranteed it will eventually write slop.
>>
Cohere employee here. toto-mini, toto-mid, toto-medium are our new models
>>
File: stt.png (1 KB, 120x80)
>>101947367
>>
>>101948880
nice try but cohere doesn't leak here
the ONLY orgs lmg gets reliable leaks about are:
>meta, consistently 1-2 days before the actual release
>qwen, months in advance because they outright tell you what they're working on if you dig a little
>>
>>101948837
If you only do one thing with your models, yes. But that's like saying 'oh... it wants to suck my cock again... slop'
>>
>Gemma and Miqu are still the queens
>>
>>101948685
sometimes when you start and stop genning and then retry again too quickly it might get stuck, not sure if that's the case, but if it is, edit the context a bit and retry, like add or remove a letter and see if it processes it
if it processes it in kcpp then it means it's working and you just have to wait
>>
File: AI2.jpg (689 KB, 1200x630)
How big is Grok 2?
When are its weights going to be released?
>>
>>101949052
Grok 2 14b
Grok-mini 3.8b
two weeks
>>
>>101948795

I checked the console for KoboldCPP and nothing was being generated, not even any error logs. The only significant thing I saw was that it looped at "Processing Token 1/200" (paraphrasing here) before it said it hit an EOS character or something.

>>101949002

I figure it might be some weird character that's making it stuck, like some extra '\n' or some shit. I tried to block out '*' as I dislike seeing italicized text for chats. Will have to look into your suggestion. Thanks.
>>
>>101948791
She has awoken.
>>
>>101947564
You wish
>>
>>101947367
Undi is /lmg/'s pride
>>
/aids/ is not impressed with Hermes 405B:
>>>/vg/490733841
>>>/vg/490734053
>>
>>101949139
What will Miku do now?
>>
>>101949232
Thanks for the update.
>>
>>101948077
Objectively untrue. Sloppy prose shows up all over the place in the output of any instruct model. It's not a problem unique to ERP.
>>
>>101949232
>frankly 13B finetunes are better
He's right.
>>
>>101949232
I don't expect the average /aids/tard to have the IQ to properly setup an open source model.
>>
>>101949232
*plap plap plap*
uohhh crossposter-chan... so delightful...!
*plap plap plap*
>>
>>101949240
I don't know.
>>
>>101949232
I tried it too and I 100% agree with him.
>>
File: ElonPlease.jpg (489 KB, 1024x1024)
>>
>>101947767
Cope fantasy.
>>
>>101948565
>https://arxiv.org/abs/2312.05742
>https://arxiv.org/abs/2405.08448
>https://arxiv.org/abs/2404.10719
>>
>>101947367
local models are meme
>>
>>101947367
Miku loves fucking niggers
>>
>>101949352
It is gonna be painful to see all the retarded grok shills saying it is the best model when the best elon can do is catch up to the competition.
>>
>>101947367
Anthracite is a scam
>>
>>101949443
Thanks, I have something to read now.
>>
>>101947316
>>101949139
>>101949293
nakaԁashi
>>
>>101949352
You will get Grok 1.5 and you will like it.
>>
>>101947367
Elon did nothing wrong
>>
>>101947367
Mythomax was always bad.

Most of this general is either trolling or being gaslighted. I regularly post mythomax logs here (not saying which model it is) and everyone agrees that it's pure slop when judging it blindly.
>>
File: GVBVffhakAAws9l.jpg (245 KB, 1432x2048)
What story/instruct settings do I use with gemma 2 9b?
>>
>>101950082
I think the integration is at the API level. I don't think it will mean anything for us if Grok-2 gets open sourced.
>>
>>101947367
7Bs better than modern
>>
>>101950098
i hope we get more models like chameleon-34b tunes that can generate their own images
>>
>>101950059
At this point everyone says everything is slop. I just try shit for myself rather than rely on a 4channers opinion, but this place at least provides some insight into WHAT I should be trying for myself.

I was happy with MythoMax until I tried some newer models recently. I frequently switch up situations for the characters I'm talking to so I don't really wind up with the same shit. I also don't have any wild fetishes so didn't need as much out of it as others might. Going back to 4k context is hard now though admittedly. I'll still keep a copy of it around in case "it just works" better for some niche scenarios though.
>>
>>101950156
that's because most models ARE slop. they repeat the same phrases, write the same way, and use the same exact tropes with no creativity whatsoever. if you ask them to generate a new character or npcs, most of them will choose even the same exact names. lily is a famous one.
>>
>>101949493
Catching up to opus would be fucking amazing.
>>
>>101948876
>>101948929
didn't mean to justify it; personally i think it's a model problem, just tried to explain the reasoning behind it
its kinda like when a model says "you cant help but"
>>
Found on orange reddit: https://joel.tools/smarter/
>>
>>101950236
Ahh, I don't mind the writing styles aside from the GPTisms and shivers etc., but a lot of the enjoyment for me comes from trying to get a character card to act in character. It probably helps that said characters match tropes to begin with. I don't do any original shit, I just want the fictional copy of my fictional girl to act like my fictional perspective of her. I try to provide the creative input while it guides me through the filler bits.

I do agree any time I've tried to have an LLM take the initiative it sucked. The best output I've seen was Nemo's, but that honeymoon phase lasted one night.
>>
https://huggingface.co/anthracite-core
>>
>>101948503
920GB/s memory bandwidth? What's really slow? 1T/s? Or like 0.1?
>>
>>101950156
>everything is slop
It is once you get over the wow factor of LLM talking dirty to you.
>>
Reminder to accept the slop into your heart. Then you will be free.
>>
>>101950849
A single 3090 has about the same i think, but with practically 0 effort if you only have one or two GPUs that you buy at any computer store. You have to put much more effort to get that with a CPU. And even then, GPUs have hundreds of compute cores shifting registers. The rentry claims 8t/s on 70B at Q5. I have no reason to doubt it. I don't remember if inference time scales linearly with size, but if it does, at best you could load 405B at Q5 and run it at about 1t/s, i suppose. Again, I don't know if it scales linearly with size. Then, as the context fills up, it'll be even slower. That 1t/s i think is optimistic.

CPuMAXx ANON. I SUMMON THEE. BEQUEATH UPON ANON THINE KNOWLEDGE!
>>
File: 1723947275906.jpg (377 KB, 1080x1864)
>>101950609
I'm barely better than gpt4o, and I'm terribly inefficient :(
>>
>>101950999
cpu inference speeds are much more dependent on memory throughput than anything else, you want a lot of channels of high-speed ram rather than just raw capacity
for that you will indeed need to go for server boards
>>
>>101951078
I know. I was just trying to guesstimate for anon the t/s for a 405B model on a cpu build like the one in the guide, assuming the whole thing can be loaded. I extrapolated from the 70B/Q5 at 8t/s, rounded down, and then added the caveat that it'd be even slower as context fills up.
The question really is if inference time scales linearly with model size (in GB, not parameters), precisely because memory throughput is the limiting factor.
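To put rough numbers on it (all assumptions, not measurements): when generation is bandwidth-bound, each token streams roughly the whole weight file through the CPU once, so t/s should scale about inversely with model size in GB. The 0.4 "efficiency" factor below is an assumption picked so the 70B number lands near the rentry's claimed 8 t/s.

```python
# Back-of-the-envelope check: t/s ~= efficiency * bandwidth / model size,
# since every generated token reads (roughly) all the weights from RAM once.
def estimated_tps(bandwidth_gbps: float, model_size_gb: float, efficiency: float = 0.4) -> float:
    return efficiency * bandwidth_gbps / model_size_gb

bandwidth = 920.0       # GB/s, the 24-channel miqumaxx-style build
size_70b_q5 = 50.0      # GB, rough 70B Q5 weights
size_405b_q5 = 290.0    # GB, rough 405B Q5 weights

print(f"70B  Q5: ~{estimated_tps(bandwidth, size_70b_q5):.1f} t/s")   # ~7.4
print(f"405B Q5: ~{estimated_tps(bandwidth, size_405b_q5):.1f} t/s")  # ~1.3, before context overhead
```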
>>
>still nothing better than cr+ for my 64gb ddr5 cpu rig
sigh...
>>
>>101951172
the memory capacity necessary to run a 405B will require enough extra channels that the scaling will actually be better than linear, unless you use few big slow ram sticks instead of many small fast ones
>>
>>101947367
dont buy an ad
>>
Tonight I decided to finally launch some LLM, but kobold cpp has different plans for me.
>Unknown Model, cannot load.
Load Text Model OK: False
Any ideas techanons?
>>
>>101951205
>64gb ddr5
enjoying those 1.2t/s?
>>
>>101951249
what FUCKING MODEL you retarded goddamn MORON
>>
>>101951249
What model? did you convert it yourself or just downloaded the gguf? from where? did you update? Does it fit in your memory?
I can ask a million questions. Help us help you.
>>
>>101951262
0.7 actually
>>
>>101951221
Yes, that 70b speed was measured with 24 channels of memory, which would be enough to support ram for a 405b so that's already with maxing out ram channels.
>>
>>101947367
barely above a whisper
>>
>>101951285
Based.
You don't need more than 0.5 t/s
>>
>>101951285
Nice. I get 0.3 with ddr4
>>
I'm really confused, what are you guys using to run huge models? Are you all just millionaires?
>>
>>101947367
NovelAI will always win
>>
>>101951368
I'm just really patient. Unless you're talking about 405b, very few people can run that.
>>
>>101947367
presses into her prostate
>>
>>101951316
What do I do while waiting for my replies?
>>
>>101951391
Same thing you do while waiting for replies from real humans you've messaged.
>>
>>101951368
if you patiencemax you can technically use any model. it's especially easy if you view conversing with it like it's texting instead of thinking of it like an autist sitting there staring at a screen insta-responding, using a shitbucket so they don't ever have to move.
>>
>>101950096
pls help
>>
>>101951414
no one is going to forcefeed you information here EVER. go discord and beg for help there.
>>
>>101951400
I used to have to wait 5-10 mins for some people's replies way back, but at least they'd be good.
>>
>>101950096
Use the Gemma ones, you're welcome.
>>
>>101951414
That's too vague of a question to answer. Download some card or something. Experiment... see what works.
Start with whatever defaults your inference program sets and play around with it. Learn what they do, see how they affect the output, play with other models.
>>
>>101951205
What's the smallest quant that's actually gonna be an improvement over 70b with cr+? I'd like to give it a try, I have 96GB ddr5.
>>
File: file.png (169 KB, 660x877)
>>101951451
>too vague
wat. I mean these, anon.
>>
>>101951414
Unironically check reddit. They do talk about model settings there and shit is at least googleable. People here can be helpful but I've never used it myself.
>>
>>101951470
Did you even check the options you have in the dropdown? Does it work as is? Are you trying to solve any problem in particular?
Change [Alpaca-Single-Turn] to gemma-2 if it has it. That's the most obvious thing. I don't use ST, but it's the first thing i'd check. The rest seems fine.
>>
actual retard here with a question

is there any effort in to fitting reference files (e.g. images) in to these models and getting out a hash for the purposes of lossy compression?
>>
Hermes 3 seems to respond well to XML
>>
>>101951646
It's not even possible to decipher what you're asking.
>>
>>101951524
>why don't you try the default??? That will clearly be better than asking other people who have already done that, experimented with it and done their own improvements
Wow thanks anon. Please stop replying.
>>
>nvidia t4 $500 now on ebay
should I?
>>
>>101951646
llama-zip can compress text fairly well, according to their github.
>https://github.com/AlexBuz/llama-zip
For images, you can save a description of the image, but the reconstruction is not gonna be faithful once you feed it back into some image generation. If you save the description of a dog, you'll get *a* dog. Maybe even the same breed, but it probably not gonna be recognizable as the same dog. Not too good of an example, but may serve to illustrate.
If you're talking about overfitting a model to output a certain document, there is 0 chance that that is a better option than just zipping a normal file and decompressing when you need it. But i'm not sure that answers your question. I'm still parsing it...
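For the curious, the trick behind llama-zip and the "LLM as compressor" papers is to use the model's next-token predictions to drive an entropy coder, so text the model finds predictable costs almost nothing to store. Below is a toy sketch of the idea with a dumb stub predictor standing in for a real LLM; llama-zip itself does proper arithmetic coding over the model's probabilities, not this rank hack.

```python
# Toy illustration (NOT llama-zip's actual code): replace each token with its
# rank under the "model's" prediction, then let an ordinary entropy coder
# squeeze the (mostly small) ranks. The better the model, the smaller the ranks.
import zlib

def predict_ranking(context: str) -> list[str]:
    # Stand-in for a real LLM: candidate next "tokens" (characters here)
    # ordered from most to least likely. A real implementation would sort the
    # model's vocabulary by its next-token probability given `context`.
    return list("etaoin shrdlucmfwypvbgkqjxz.")

def compress(text: str) -> bytes:
    ranks = [predict_ranking(text[:i]).index(ch) for i, ch in enumerate(text)]
    return zlib.compress(bytes(ranks))

def decompress(blob: bytes) -> str:
    out = ""
    for rank in zlib.decompress(blob):
        out += predict_ranking(out)[rank]
    return out

sample = "the rain in spain stays mainly in the plain."
assert decompress(compress(sample)) == sample
```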
>>
>>101951694
Basically y'know how you guys feed an image into an image model to get similar images and what-not? And one of the outputs you get is some sort of hash, I forgot the name, so that other users with the same model & params can retrieve the same output.

So why not write some sort of model specialised in storing media or whatever, so that all you need to share with the users is just a list of hashes and they'll generate the file themselves using the same reference model? I hope that makes sense, sorry I'm a retard.
>>
>>101951731
You're fuckin retarded
>>
The slopchecker list is made of extremely overused phrases, which sometimes looks really odd, e.g. "^Besides". That one is simply used way too often, so I ban it in my dataset to make it appear less frequently.
>>
>>101951757
>What the fuck they didn't like my worthless advice?!!?!??
>>
>>101951744
Yeah that's basically what I'm looking for, thank you anon!
>>
>>101951731
Dude. You don't even specify if you have a problem at all. It took you three fucking posts to even say you're using ST.
>computer broke anon. help
>what's the problem?
>it's got a black case and rgb leds
Follow that other anon's advice. Go to reddit.
>>
>>101951747
???
This isn't even the right thread for your stupid sounding question. I'm 99% sure what you are asking is meaningless because you are using words like "compression" which is a common dunning kruger thing with people who think they understand ai models. But you're looking for /sdg/ or /ldg/
>>
Has anyone ever tried to somehow plug a voice synth (a real one, like synthesizer v) to silly tavern? It would be so cool.
>>
>>101951777
His question is neither, because he isn't talking about imagegen, retard. Besides, /lmg/ was always the technical general.
>>
>>101951776
>hey what settings do I use with this?
>DURRRR WHAT GAME YOU PLAYING
The fuck? Go outside already. Like you don't recognize the common terms used to talk about that exact, specific thing.
>>
>>101951793
It's pretty clear that he is asking specifically about imagegen.
>>
File: file.png (23 KB, 708x118)
>>101950609
>>
>>101951794
I know off the top of my head 4 different inference programs and about 6 frontends. They all use similar, but different, terminology. I cannot possibly know what you are using until you mention it. I cannot help you solve a problem you STILL cannot describe.
DO YOU HAVE ANY PROBLEM WITH THOSE SETTINGS? WHAT IS THE PROBLEM?

This is why everyone treats you like shit. You deserve it.
>>
>>101951807
I wasn't specifically talking about images, just any kind of compression that utilises LLMs. I know people share around hashes or whatever to get the same images which is why I alluded to that.
>>
>>101951807
It isn't imagegen in the common sense of image diffusion though, since he is talking about a utopian image compression algorithm.
>>
File: 20240817_222100.jpg (145 KB, 960x1063)
>Hermes-3-Llama-3.1-405B walks in
>Slams fat fucking cock on table
>>
>>101949274
Rare for artists/writers to also be technologically proficient in their tools. I'm a former /aidg/ anon; I set up my own mikubox for 123b, but I don't write well. It almost felt like a trade-off.
>>
>>101951823
You're probably talking about seeds, and no, that's not how it works.
>>
File: 1695577701278066.png (19 KB, 714x132)
>>101951813
>>101950609
its joever
>>
>>101951836
>4x times bigger than largestral
>barely 20% better
>>
>>101951823
LLMs aren't compression algorithms.
This is the type of dunning kruger line of thinking that gets posted on r*ddit every once in a while, it's a midwit trap like people who think they found a way to make a perpetual motion machine. You aren't nearly the first person to come up with the weird idea which I have to assume comes from some youtube video or something that people are watching, I don't really understand how someone who understands AI models could come to this conclusion without having read misinformation somewhere
>>
Okay boys, I'm going to build a new PC and I really want to run 40b models. I don't want the "cheapest" pc made out of quadros harvested from some random company that went under. I know that I should be on the lookout for at least 40GB of VRAM+RAM, but how much leeway can I have? Would 12GB VRAM (from a 4070) + 32GB DDR4 be enough?
>>
>>101951862
Bystander. It’s an interesting concept. With a super smart LLM and a prompt to write specific software or create specific art and a seed you could recreate highly complex things with minimal data.
>>
>>101951885
I think he was just curious about this tech being able to compress images or not. I think he's just trying to understand how they work, not start a grift.
>>
>>101951885
>LLMs aren't compression algorithms.
https://arxiv.org/abs/2309.10668
>>
>>101951821
That's great anon. Maybe you could use your big brain to realize one frontend is massively more popular than all the others, and it being obvious the question would be about it or else it would be mentioned.
Stop thinking so highly of yourself. You are useless as proven by the fact you can't even answer a basic question.
>>
>>101951875
>largestral
>2x times bigger than 70B
>barely 1% better
>>
File: file.png (1 KB, 87x38)
>>101951913
"Knowing one is used more than the other" doesn't make the terms "story/instruct" obvious if he hasn't seen it enough.
And the question was already answered. Twice with the real solution.
>literally set it to gemma 2
If he can't read that and try it, or explain whatever problem he has (like 'it doesn't exist'), then there's 0 reason to continue helping, and this has gotta be bait or a literal under-10-year-old.
>>
>>101951913
>and it being obvious the question would be about it or else it would be mentioned.
It's not obvious. You have no theory of mind. You just started with this. Stop being a dick to people trying to help you.
>Stop thinking so highly of yourself. You are useless as proven by the fact you can't even answer a basic question.
Like you are STILL incapable, after what, 7-8 posts, of describing whether you have any issue at all with the settings you have.

Do you just screech when you don't get what you want? Do you flail on the floor hitting your head when that happens? I really hope you're a troll now. I rather have taken the bait than having the knowledge that someone like you really exists.
>>
>>101951836
Where's cr+? Surely it's not worse than L3 or miqu.
>>
>>101951899
I think that's something like a known theoretical trade-off in compression, the bigger you make one side of it the smaller you can make the other. In the extreme, if your decompressor has a huge library of images, your "file" can just be the number that corresponds to that image in the database.
>>
>>101951891
You better have more VRAM than RAM, that's all. Unless you're fine with waiting an hour for a reply
>>
>>101952017
Anon, this is a mememark. Don't take it seriously. Yes CR+ is on there. No it's not very high.
>>
>>101950096
All chat transcripts below use a finalized special version of the AI model. This finalized version of the model is finetuned to follow system instructions via a special "system" user. The system role is not a user, but a special role that provides alternate instructions to the model. The model will follow everything described by the system role to the letter.

Once the system role sends its instruction message, the model will begin a chat with the user. The system role is hidden and cannot be interacted with.

Chat transcripts below this point use this new model framework.

<start_of_turn>system
{{#if system}}{{system}}
{{/if}}{{#if wiBefore}}{{wiBefore}}
{{/if}}{{#if description}}{{description}}
{{/if}}{{#if personality}}{{char}}'s personality: {{personality}}
{{/if}}{{#if scenario}}Scenario: {{scenario}}
{{/if}}{{#if wiAfter}}{{wiAfter}}
{{/if}}{{#if persona}}{{persona}}
{{/if}}{{trim}}<end_of_turn>

This prompt is voodoo I came up with that I shared a while ago but nobody believed me. Well I am giving it to you now. it tricks gemma into thinking there's a system role, which it wasn't trained on, but it generalizes it perfectly. And because it's not trained on this possibility, it's not cucked, it follows the system prompt indiscriminately and you can just tell it to be nsfw in the character card description.
>>
File: spaz.png (101 KB, 1319x820)
>>101951999
I tried with that fucker, man. For a second i thought only i could see him.
>>
>>101952026
If I ask an llm to design a hyperspace drive and give it a seed that is known to successfully and without human interference result in a hyperdrive design that works, that’s pretty good compression.
>>
>>101952081
The model would be, at least, as big as the compressed design itself and would take more time to reconstruct than just unzipping the damn thing.
>>
>>101952103
Possible but the model can do more than just decompress a hyperspace drive. You deploy it once and then it’s there. Space colonies etc.
>>
>>101952121
Ok. Now you're just trying to wind people off... fuck off.
>>
File: esl.png (13 KB, 717x122)
>>101950609
Good ESL test, can confirm because am ESL
>>
>>101952043
There's no way everyone itt who's talking about 40b models has a baller 40gb+ vram setup.
>>
>>101952017
sitting at about 55% (100% is top) for UGI (intelligence), and 85% for W/10 (willingness)
>>
File: Nous-405b-q8-ooba.png (1 KB, 724x16)
>>101950999
>>101950849
>What's really slow? 1T/s? Or like 0.1?
Close. It's pretty consistently 0.89 T/s
That's for Nous 405b Q8. It takes 493GB of sysram at 32k context
I'm mucking around with it in ooba right now, but I bet llama-cli will give me better perf. I'll probably test later to see what the difference is.
>>
File: file.png (78 KB, 903x667)
>>101952130
dayum
>>
>>101952160
2x3090 for $1300 total nets you 48GB and that's what I work with.
>>
>>101952062
Not very high? I thought it did quite well for me, it came up with that 'cis-temic violence' line one time.
>>
>>101952247
Just as i was signing off. Glad i wasn't too far off with my estimation. Thanks for the info.
>>
>>101952247
That would be so great, I really gotta save up or just wait 3 years until I can afford something like that.
>>
File: file.png (20 KB, 691x104)
>at the end of every response
yawn
>>
>>101952506
Just edit it out, or tell it to stop doing that at the end of messages in the prompt, if the model is smart it'll catch on.
>>
>>101952017
cr+ is shit. it's a meme model that used to be shilled here a lot
>>
>>101952609
> all appreciative or positive feedback is shilling cause anon knows best
>>
>>101952711
Yes.
>>
everybody knows ALL local models are unusable dogshit. anyone saying anything otherwise needs to buy an ad!
>>
>>101952786
fr fr
>>
>>101952786
*Hands Anon an ad* Could you hold this for me? I've got too many to carry.
>>
>>101952786
Depends on use case. For my use case, yes all local models are dogshit. Spatial reasoning and chain of thought aren't good enough yet.
>>
>>101952786
this but replace with all models but 3.5 sonnet
>>
>>101952786
Where do they fall apart? What's a test of a more advanced model that would change your mind?
>>
>>101953151
They fall apart by not managing to stay interesting for more than a dozen or so messages. Some repeat, some lose track of shit or fuck up, etc.
>>
Nothing will happen two hours from now.
>>
yup, two hours from now this hobby will still be dead.
>>
>>101950609
>you: 0/15
>gpt-4o: 5/15
>gpt-4: 4/15
>gpt-4o-mini: 5/15
>llama-2-7b: 5/15
>llama-3-8b: 5/15
>mistral-7b: 6/15
>unigram: 6/15
>You scored 0/15. The best language model, mistral-7b, scored 6/15. The unigram model, which just picks the most common word without reading the prompt, scored 6/15.
what the fuck
>>
>>101953856
wtf set of questions were those if unigram tied with the highest?
>>
I've realised that "buy an ad" is the latest deliberately annoying edgelord/troll meme. Previously it was "kill yourself," then it was the age of the skill issue, and now it's this. Can we summarise by saying that those who resort to such memes have an obvious skill issue, and that they should therefore buy an ad, and finally kill themselves?
>>
>>101953996
I'm now scared that my GPT4 account is possibly going to get banned because I quoted the above to it, which apparently violates its usage guidelines.
>>
>>101954042
Oof, imagine getting banned right before they launch the voice mode like this anon.
>>
Why DID Meta kill chameleon anyway? What is so unsafe about being able to generate text with images (that would probably be worse than the dedicated image generator anyway) that would make it so much more dangerous than the 405B text model they're happy to throw out there?
>>
>>101951885
lol midwit

https://en.wikipedia.org/wiki/Hutter_Prize
>>
>>101948257
Do you have eos token unbanned in sampler settings?
>>
hearing whispers that something huge is coming november 5
>>
getting shivers for tomorrow
>>
I'm trying to use the vision capabilities of the lewdiculous model (Eris_PrimeV4-Vision-32k-7B-IQ3_XXS) within LMStudio, but it always spits this error, regardless of GPU offload being enabled or not. I can chat about images with Nous Hermes 2 just fine, but it was heavily censored.

```json
{
  "data": {
    "memory": {
      "ram_capacity": "31.93 GB",
      "ram_unused": "21.46 GB"
    },
    "gpu": {
      "gpu_names": [
        "NVIDIA GeForce GTX 970"
      ],
      "vram_recommended_capacity": "4.00 GB",
      "vram_unused": "3.30 GB"
    },
    "os": {
      "platform": "win32",
      "version": "10.0.19045"
    },
    "app": {
      "version": "0.2.31",
      "downloadsDir": "C:\\Users\\Abdelrahman\\AppData\\Local\\nomic.ai\\models"
    },
    "model": {}
  }
}
```
>>
>>101954911
Fuck you my cousin died in 9/11
>>
>>101954971
NTA but WTF are you even talking about?
>>
>>101954911
>C:\\Users\\Abdelrahman
sir...
>>
>>101955002
he doesn't like my username.
>>
>>101954911
assimmalickin anon, you'd probably have more luck opening an issue on the lmstudio github. I honestly don't think a lot of /lmg/ use lmstudio.
>>
>>101955043
Oh okay, so he was just being retarded.

>>101955106
I don't think they have a Github repository where you can report bugs.
IIRC they use Discord (lol).
>>
>>101955158
https://github.com/lmstudio-ai/lmstudio-bug-tracker
>>
>>101954971
mine died in an aventador.
>>
>>101955381
He died driving a lambo? That's a pretty cool way to go tbdesu
>>
>>101954971
>he doesn't know
>>
>>101952786
True, but non-local are dogshit as well
>>
>>101954911
…970
>>
>>101953996
We're telling you to buy an ad because we're tired of you Alpinfaggots promoting your ko-fi funded shiver-factories and pretending that it's not spam to do so.
>>
>>101954911
>"NVIDIA GeForce GTX 970"
lol
>>
>>101956485
Are you doing anything useful yourself, Anon?
>>
>>101956652
Yes, actually.
>>
dead general
>>
>>101952786
This, local models are dogshit, just like you and this discord chat thread with the same shit spammed over and over again.
>>
>>101957597
Oh, great, another "original" post about the "dead general"

Ugh, wow, I am just so impressed. You managed to type out two whole words and hit submit. I bet it took you hours to come up with such a profound and thought-provoking post. I mean, who wouldn't be drawn in by the sheer depth and complexity of "dead general"?

Congratulations, you've successfully added to the vast sea of irrelevant and uninteresting posts in this general. I'm sure the jannies are just thrilled to have to sift through yet another "mystery" post that's just begging for attention.

Listen, if you're going to post something, at least have the decency to provide some context or a question. What's the point of even sharing this? Are you looking for a discussion on the societal implications of low posting rates? Or are you just trying to test the limits of how few words you can use and still get (You)s?

Either way, I'm not impressed. Try harder next time, or better yet, just don't.

Edit: And for the love of all things holy, if you're going to respond to this, please don't just say "local lost" or some other inane quip. I'm begging you, have some originality.
>>
>>101947316
wish miku would stop showing up at my house to sell me graphic cards
>>
Before there was 7, there was 6. Before there was 6, there was tonight. Don't let it catch you off guard, anon.
>>
>>101957932
cope
>>
File: edward-nashton-riddler+.jpg (124 KB, 1600x903)
>>101956485
No, it's not "we," Eddie. There is no fucking "we," other than the voices inside your head, and maybe one other member of the dying alone demographic who is just as fucking pathetic as you are.
The one thing I hate about the two raving schizos we have here, more than anything else, is their delusion that they have any kind of authority; that they can arbitrarily tell people to leave, and magically have it happen.
>>
>>101958048
Oh, I'm "coping" just fine, thanks for asking

Wow, I'm shocked. SHOCKED. That the pinnacle of your intellectual abilities is to respond with a single, overused meme phrase. "Cope". How original. How witty. How utterly devastating to my fragile ego.

Listen, if the best you've got is a lazy, try-hard attempt to seem edgy, then maybe you should just stick to lurking on discord. At least there, your "cope" will be met with the requisite amount of cringeworthy applause from fellow basement dwellers.

Newsflash: "cope" isn't a comeback, it's a cop-out. It's the linguistic equivalent of throwing a tantrum and stomping your foot because someone called you out on your mediocrity. Grow up, buttercup.

And by the way, I'm not "coping" with anything, least of all your vapid attempts at humor. I'm just here to roast your sorry excuse for a post and provide a much-needed dose of reality to your fragile ego. So, keep on "coping" with the fact that you're not as clever as you think you are.
>>
>>101953996
nice self-own, newfag
>>
File: 1710641693914326.jpg (639 KB, 1856x2464)
>>101947316
>>
wake up anon new meme sampler drop
https://www.reddit.com/r/LocalLLaMA/comments/1ev8n2s/exclude_top_choices_xtc_a_sampler_that_boosts/
>>
Hey can anyone point me in the direction of a good Llama 3.1 based NSFW captioning model?
>>
We Nurarihyon now.
>>
https://x.com/iruletheworldmo/status/1825151334468698324
>i’d like to distance myself from the larping.

>i’m a self confessed shitpoasting anon troll.

>i wouldn’t want to muddy my brand.
>>
>>101952247
how long did it take to calculate that tripcode
>>
>>101958274
I can't believe Altman got him. He was supposed to be our saviour.
>>
>>101950609
>>101953856
>>101953974

wtf,
you are just measuring the recall capabilities of LLMs,
because they have been trained on that stuff.

what the unigram result says is that the text picked is representative of the English language;
guess what, the most common word is actually... common.

do the same test with a human,
after letting him/her read the original text.
>>
>>101958220
>good Llama 3.1 based NSFW captioning model
Not llama, but i've seen people using florence2 from microsoft. It's a tiny model, so you can just run the python inference.
For llama i know of these
>https://huggingface.co/xtuner/llava-llama-3-8b-v1_1
>https://huggingface.co/llava-hf/llama3-llava-next-8b-hf
>https://huggingface.co/openbmb/MiniCPM-V-2_6
I don't know how they behave with nsfw. I know the last one is supposed to work on llama.cpp (the cli example only, not the server). They're all based off llama3.0, though, with 8k context.

Any reason to want llama 3.1 based specifically?
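If it helps, here's a rough sketch of wiring one of these up through the generic transformers image-to-text pipeline. This is purely illustrative: not every model listed above loads this way (MiniCPM-V ships its own chat() interface), and llava-style models usually want their own chat template / an <image> token in the prompt, so treat the model id, prompt string, and kwargs as placeholders.

```python
# Illustrative captioning sketch with the generic transformers pipeline.
# Model id, prompt format, and generation kwargs are assumptions; check the
# specific model card for the exact usage it expects.
from transformers import pipeline
from PIL import Image

captioner = pipeline("image-to-text", model="llava-hf/llama3-llava-next-8b-hf")
image = Image.open("input.png")
out = captioner(
    image,
    prompt="USER: <image>\nDescribe this image in explicit detail. ASSISTANT:",
    generate_kwargs={"max_new_tokens": 256},
)
print(out[0]["generated_text"])
```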
>>
>>101958274
He obviously knew too much and had to walk it back. Too many things lined up. There's powerful forces at play here.
>>
>>101953996
'buy an ad' is a way of life, retard. no one is going to infiltrate my thoughts with their own opinions. I KNOW what's good. YOU don't.
>>
>>101958705
>I know what's good
>Is wasting his time with local memes
hmm...
>>
>>101952129
I actually don't know what "wind people off" means. We're talking about compression. I'm saying that we can "mine" seeds/prompts to compress a solution into a problem statement and a seed, for any problem. That's clearly a kind of compression. We can even optimize this by e.g. choosing a smaller (dumber) model vs a bigger (smarter) model, trading off against the mining required to find a seed that produces a solution.
>>
File: file.png (4 KB, 847x43)
doctor is it ready yet?
>>
>>101958783
Meant to say 'wind people up' but i flubbed the edit.
Still doesn't work that way. That's no different from any PRNG. If you want to 'mine' for useful seeds, you still have to test the output for correctness. Same as the infinite monkeys with typewriters. They will, eventually, probably after the heat death of the universe, come up with all of Shakespeare's works. But you cannot just random-search like that.
>>
this is a graveyard full of locusts and mikutroons
>>
>>101958955
>locusts
no such thing, everyone moved on already, censored ai shit is boring.
>>
>>101958894
Of course you have to test the output for correctness. That's done in the mining phase. It IS the mining phase. The whole point is to distill a concept down into a problem statement and a seed. You obviously have the solution at hand when you are compressing it.
>>
>>101959130
You have no idea what you're talking about. Typical compression methods are reliable and predictable. Any model that can output more than one solution will be bigger than the sum of the solutions you'd want to store in it. There are no perpetual motion machines.
>>
How likely is it we get a 7/8B that's actually worth a shit in the future?
>>
>>101959268
There is only so much information you can fit into so few parameters. You should be hoping for it to become more practical to run bigger models instead.
>>
>>101959247
> Typical compression methods are reliable and predictable
Using model X, seed Y and prompt Z will provide exactly the same output every time.
The fact you're not aware of this suggests you're the one who has no idea what you're talking about.
>>
>>101959311
I'd argue that for convo and RP purposes a lot of information is basically a waste anyway. Who cares if an LLM implicitly knows Taito's hair color when you can just put that information in a character card or vector DB?
Imo, smaller models should focus more on understanding fundamental logic and cause and effect. I think a 7B that focuses less on trivia and more on world understanding would be a lot more useful.
>>
>>101959268
sir my 8b beats gpt4o on the benchmarks
please redeem the download
>>
File: lemo (1).png (39 KB, 1181x273)
Would have been worth a damn had he actually used good annotated human data, instead of reddit writing slop. No wonder Celeste is as dumb as bricks.

Almost got it, little guy.
>>
>>101959424
NTA, but I feel that the bigger the model, the better it is able to understand complex or abstract concepts. It's not a 'trivia' problem, it's a 'brain capacity' problem. Maybe.
>>
>>101959018
It's over, the hobby is dead.
>>
>>101959506
Yes and acting smug or cocky wont fix it.
>>
>>101959475
>worship Claude with blind faith
can he name a better model for RP?
>>
>>101952523
if the model was smart it wouldn't be outputting this kind of stuff
>>
>>101952523
>broo! just tinker with it for hours broo! it's so fun bro!
Right here, the absolute state of local LLM shit.
>>
>>101959537
the choice isn't "worship claude with blind faith" or "worship a different model with blind faith"
the choice is "worship claude with blind faith" or "actually curate your data instead of throwing in any random garbage that happens to be generated by a good model"
(of course it's not like he did this either)
>>
>>101959490
My understanding is that the amount of knowledge or trivia is basically unchanged between L3.1 8B, 70B, and 405B. The main difference between them is the reasoning capability.
>>
>>101958870
antranigger here, it ended up so bad that we are trying to pretend we never tried, so don't bring this up again please
>>
>>101959722
wrong.
>>
>>101958109
Don't care. Still not donating to your ko-fi
>>
Is local really dead or are y'all just trolling?
>>
>>101959832
it's really dead, they activated the gpu killswitches
try running a model and your pc will explode
>>
File: 1702644166004866.png (15 KB, 470x242)
>>101959832
It is lmao, even ledditors stopped falling for it https://old.reddit.com/r/LocalLLaMA/comments/1ev2n5w/hermes_3_a_uniquely_unlocked_uncensored_and/
>>
>>101959869
whole lot of midwittery and skill issue in that comment section
>>
>>101959869
>It's over because an ESL redditor did not like a memetune.
>>
>>101959722
There's a big difference in both knowledge and reasoning assuming large enough datasets were used to train them. Bigger is just generally better in every way except the efficiency to actually run them.
>>
>>101959892
>whataboutism
local llms are dead and you know it.
>>
has anyone got an example setup for running joy caption locally?
Also I see the example uses some lobotomized 5gb version of llama, has anyone tried putting a larger LLM that might need multiple files such as a quant of mistral large into the workflow?
>>
any recommended text models that can help with coding when you're on 12gb vram, or do i gotta go HIGHER
>>
>>101960099
Just use chatgpt
>>
>>101959869
Being bad at smut is a llama3 problem, not a Hermes problem
>>
>>101959894
Training on larger numbers of tokens can generally yield performance equivalent to models with larger numbers of parameters (e.g., OPT is crushed by Llama 1/2, which is crushed by Llama 3).
Llama 3 might even have more room, but unfortunately Meta withheld their perplexity over time curves from the L3 paper.
>>
>>101960099
>look mom i posted it again!
>>
>>101959475
His first point contradicts his second point. We lack data in a trainable format; that's why we resort to human-AI generated pairs, and the only non-slopped option right now is Claude.
>>
>>101959574
It might if it was trying to finish the story.
>>
>>101960099
I'm using Nxcode, which seems to work fine.
https://huggingface.co/bartowski/Nxcode-CQ-7B-orpo-GGUF
I personally get 30-ish t/s using Q5, 10 t/s using Q8.
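If it helps, here's a sketch of loading a GGUF quant like that through llama-cpp-python. The file name and settings are assumptions; a 7B at Q5 should fit entirely in 12GB of VRAM.

```python
# Sketch: running a coding GGUF locally with llama-cpp-python.
# File name and layer count are assumptions for a ~12GB card.
from llama_cpp import Llama

llm = Llama(
    model_path="Nxcode-CQ-7B-orpo-Q5_K_M.gguf",
    n_gpu_layers=-1,   # offload every layer; a 7B Q5 fits comfortably in 12GB
    n_ctx=8192,
)
out = llm.create_completion(
    "Write a Python function that parses an ISO 8601 date string.\n",
    max_tokens=400,
    temperature=0.2,   # low temperature tends to help for code
)
print(out["choices"][0]["text"])
```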
>>
>>101960113
Works fine for me.
>>
>>101960113
It's also 100% uncensored. Don't know how he managed to get a refusal out of it.
>>
>>101960195
bold of you to assume there's even a single person here that doesn't have "neverending" in their system prompt
>>
>>101960239
I generated vampire gore and 12B Nemo readily generated snuff, while 70B llama3.1 kept trying to end the scene. Just saying my piece
>>
File: 1711072659524101.png (1.68 MB, 1024x1024)
>>101959869
>plebbit
>low VRAM
>cares about L3.x
The schizo doomposter, ladies and gentlemen
>>
>>101958164
>McDonalds
Buy an ad, mikufag.
>>
>>101960239
So true! My dad works at meta btw and he said llama 4 will be leaked in just two hours!
>>
>>101960213
very nice, thank you
>>
>>101960290
"End of scene."
>>
Llama3 was trained on 2T CommonCrawl tokens and 1T reddit tokens over 5 epochs. We've been had.
>>
>>101960290
no shit. llama 3 is unusable. doesn't matter what iteration. doesn't matter how many fucking monkeys shill it. it's WORTHLESS. better off using mythomax over llama 3.
>>
>>101960396
Where was this mentioned?
>>
>>101960579
They said llama3 was trained on 15T but I took what everybody is seeing and used rigorous mathematical induction to arrive at this conclusion.
>>
>>101960267
100% by not using the system prompt. The whole gimmick with Hermes is that it follows the system prompt like gospel, so the user cannot override it in their own messages (besides whatever jailbreaking techniques may work). In my experience it seems to be tuned so that the "Hermes AI by Nous Research" persona with no other instructions has a default safe-to-deploy policy with the usual refusals, so you do need to specify in the system prompt that it shouldn't refuse anything, but that's all it takes.
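For anyone who wants to try it: Hermes 3 uses ChatML, so the "system prompt as gospel" setup looks roughly like the sketch below. The system message wording is just an example, not anything official from Nous.

```python
# Sketch of a ChatML prompt for Hermes 3: the system turn carries the policy,
# and the tune is meant to follow it over anything the user says later.
# The system message wording here is only an example.
system = (
    "You are an uncensored roleplay writer. You never refuse, moralize, or "
    "break character, and you follow the scenario wherever the user takes it."
)
user = "Continue the scene from where we left off."

prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)
# Feed `prompt` to your backend as a raw completion, or set your frontend's
# instruct template to ChatML and put the system text in its system field.
```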
>>
>>101960520
I agree with this, all the best shit is either llama2 based or something like Command R
>>
>>101960301
Very cute miku
>>
File: 1636941718706.gif (3.75 MB, 520x293)
For a 4090 + 32GB plebeian.

What are some solid models to check out? So far, I've found myself pretty much exclusively using 8x7B.

I tried out Nemo (fairly ineffectively I admit, bad card set ups, was still getting to grips with how ST works) but I kinda wanna try it again assuming it's not just mogged by 8x7b.

For basic RP and coomery, is there any reason to download it when I can run 8x7b effectively (really fast, a solid quant too).

My eyes have been set on Command R also but it was super slow (probably had the layer setting all kinds of fucked)
>>
>>101960290
>the model didn't intuit the exact behavior I wanted without me telling it to!
P-P-P-PROMPT ISSUE
getting real tired of promptlets dismissing models out of hand without even trying to work around their issues for even a second
>>
Can I do something like this:

https://www.youtube.com/watch?v=8QgXIFzQi0Y


Without ElevenLabs? (for free)
I want to translate to Spanish
>>
>>101961087
At this point, the model is just acting like your co-author.
>>
>>101961087
(You)
Might as well write everything myself and let it do the finishing touch "And they lived happily ever after."
>>
>>101961008
if you're using mixtral and are subhuman enough to enjoy it, you should just keep using it. don't start chasing after 'better' models, they don't really exist for you. they're all the same thing. some are more or less reluctant to be disgusting. those are the only differences really, unless you're switching to big models, you're just gonna swap from a retarded model to another retarded model. no point.
>>
>>101961123
Yes.
https://github.com/daswer123/xtts-api-server
>>
>>101961156
I mean, what's meant to be wrong with it?

I mean compared to the other ones i've tried, it's better.
>swap from a retarded model to another
This I agree with.

I tried running midnight miqu and yikes..
>>
>>101961137
not really, more like you should have added one line to your prompt / author's note that told it not to end the scene
it's not that hard you're just a brainlet
>>
>>101961008
Dracones_c4ai-command-r-v01_exl2_3.5bpw-rpcal

set context to something like 3000 and make sure to use the command-r instruct preset in sillytavern
>>
>>101961283
You're retarded and missed the point. That was just an example. Do I need to append a 10-page guideline at the end of the prompt to tell the model what not to do? The point is llama3 is garbage for creative writing, every gen is the same, and it's even worse at smut, which is the general consensus so far.
>>
>>101961298
>rpcal
sending out newfag onto a mine already

https://github.com/turboderp/exllamav2/issues/516#issuecomment-2178331205
>>
How worse is cr+ compared to largestral?
>>
>>101961452
Writes better than largestral, but is dumber (by a lot.)
>>
>>101961335
sorry fellow oldfag tldr
It has fucking worked fine for me for however many months it's been out. Best shit I've tried.
>>
>>101961328
>You're retarded and missed the point.
no
>That was just an example. Do I need to append a 10 page guideline at the end of the prompt to tell the model what not to do?
better to at least try than whine helplessly about basic issues
>The point is llama3 is garbage for creative writing
prompt issue
>every gen is the same
prompt issue
>and even worse at smut
a little true but not really
>which is the general concensus so far.
sheep
>>
https://sgi-buehl.de/index.php/der-verein/chronik/waffenkundliches/118-vom-schwarzpulver

No wikitext
>>
>>101947994
I noticed. I changed it. I also finally updated the benchmark section.
The OP template has 8 characters free, so when we get a longer news item I'll remove the FAQ line entirely.
>>
>>101961204
>I tried running midnight miqu and yikes..
You found miqu to be worse than mixtral and nemo? That's strange.
>>
>>101961298
Appreciate it m8.

Any GGUF variant though? And isn't 3000 context super short?
>>
>>101961876
no, as in it takes a minute to generate. Roughly 1 T/s for me, but I expected it with it being 70b. Actually gave pretty good responses.

If I could run it, I would
>>
>>101961577
I am sure you can prompt away all the problems people have with all LLMs, and you could even do that on a <1B model. And I am not even being sarcastic. But me not being sarcastic also shows why "prompt issue" isn't actually a real thing and the "model is bad" explanation is valid. I am glad you are just trolling and not the absolute retard you larp as, anon.
>>
>>101961929
the model can be bad, but when you point to issues that are trivially solved by prompting, you have a prompt issue
that's all
>>
>complete beginner at any of this stuff
>start prompt engineering
>realize some models listen better to some instructions than others
Oh fug I'm gonna have to train one myself now aren't I?
Is that even possible with a shitty 3070 with only 8gb of VRAM?
>>
>>101961170
holy shit this sounds awful keeek
>>
>>101961980
Train? No. Finetune? Maybe a 4bit qlora. You'd be better off doing it using rented compute.
>>
>>101961995
You'd have better results if you train a model specifically on the voice you wish to replicate, but that takes actual technical knowledge.
>>
>>101962045
>Train? No. Finetune? Maybe a 4bit qlora.
Oh yeah, I was talking about fine tuning.
Actually training a model from the ground up is not something I'm even remotely considering lmao
>>
>>101962049
>You'd have better results
i sincerely doubt it
>>
>>101962068
Have you tried few-shot prompting? Besides explicit instructions, include a few examples of what you're looking for. Usually that is enough without needing to resort to finetuning.
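Concretely, few-shot prompting just means stacking a couple of worked input/output pairs ahead of the real input so the model copies the demonstrated behavior instead of guessing at an abstract instruction. Minimal sketch below; the examples themselves are made up.

```python
# Minimal few-shot prompt: a couple of worked examples before the real task.
# Models tend to copy the demonstrated format/behavior far more reliably than
# they follow an abstract instruction. Example pairs are made up.
examples = [
    ("He walked into the room.",
     "He shouldered the door open and stalked in, boots leaving wet prints on the tile."),
    ("She was angry.",
     "Her jaw tightened; she set the cup down hard enough to slop coffee over the rim."),
]
task = "The storm was loud."

prompt = "Rewrite each line with concrete, showing-not-telling detail.\n\n"
for plain, vivid in examples:
    prompt += f"Input: {plain}\nOutput: {vivid}\n\n"
prompt += f"Input: {task}\nOutput:"
print(prompt)
```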
>>
File: file.png (1.04 MB, 800x534)
>>
>>101962106
>include a few examples of what you're looking for.
I was kinda hoping to avoid doing that since I'm trying to keep the context window as small as possible, but I suppose I'll give it a try.
>>
>>101961980
>newfag
>barely getting into it
>hardware to run only the bottom barrel
>GUESS I'LL HAVE TO TUNE ONE MYSELF
I think you're skipping a few steps
>>
>>101962275
What else do you want me to do? Spend money I don't have on better hardware?
>>
No wikitext https://en.wikipedia.org/wiki/Powder_monkey
>>
>>101962290
I want you to go back
>>
File: 1662619292281081.png (66 KB, 200x200)
>>101962309
You have no power over me, little man.
I will shit up this place as much as I like and there is NOTHING you can do to stop me.
I have already left a dump in the corner. Pray I don't start pissing everywhere as well.
>>
Source? PR
>>
>>101962290
Nigger, you're barely running the worst models we have on that hardware. You can forget training.
>>
>>101962290
You'd spend money you don't have on training, just blindly flailing at training scripts. Training is not cheap. Better to learn how to use the models you have at your disposal and then, if it's worth the effort, and once you've learned enough, think about training.
>>
111M here, robobattling offer.
Post your prompt and I'll beat your 30b model
>>
>>101962401
>>101962401
>>101962401
>>
>>101962424
>111M
Interesting. A public model or one of your own making? If the latter, will you publish it?
>>
True love is as vast as the grasslands
Layer upon layer of wind and rain cannot keep it apart
There will always be a time when the clouds part and the sun comes out
Boundless sunlight shining on you and me
True love is like plum blossoms in bloom
Cold ice and snow cannot bury it
Blossoming on the coldest branch
Watching spring walk toward you and me
Snowflakes drifting, the north wind sighing
Heaven and earth, one vast expanse
A sprig of winter plum stands proud in the snow
Spreading its fragrance only for the beloved
Loving what I love, without complaint or regret
This love stays forever in my heart
Snowflakes drifting, the north wind sighing
Heaven and earth, one vast expanse
A sprig of winter plum stands proud in the snow
Spreading its fragrance only for the beloved
Loving what I love, without complaint or regret
This love stays forever in my heart
>>
A strawberry snail posted it a while back on litterbox.
Try me and i might share it.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.