/g/ - Technology


File: 1680307620630.jpg (481 KB, 2048x3072)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101607705 & >>101600938

►News
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: OIG1.gLxm3isVEvwv1M.jpg (155 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101607705

--QuantFactory issues and bartowski recommendation: >>101611032 >>101611050 >>101611076 >>101611091 >>101611104 >>101611185
--Mistral Nemo setup and optimization discussion: >>101609385 >>101609403 >>101609492 >>101609497 >>101610041 >>101610089 >>101610376 >>101610861 >>101609509 >>101609818 >>101609860 >>101609913 >>101611610 >>101611674 >>101611719 >>101611790
--Logs: Meta's llama-3.1-405b-instruct: >>101610080
--Local AI models preferred over proprietary ones due to privacy, reliability, and customization concerns.: >>101607953 >>101608039 >>101608063 >>101608199 >>101608285 >>101608355 >>101608377 >>101608450 >>101609724 >>101609957 >>101610284 >>101611916
--Rinna releases 70B LLaMA 3 Youko for Japanese tasks: >>101611818
--Example dialogue and speech style in character cards: >>101609337 >>101609380 >>101609694 >>101609754 >>101609773 >>101609794
--Ooba high ram usage and modern frontend development: >>101610450 >>101610485 >>101610496
--Obedience in local vs public models: >>101609499 >>101609516 >>101609540
--Linux error message troubleshooting: >>101610587 >>101610602 >>101610623 >>101610634 >>101610923 >>101610649
--Gamemakers using ERP chatbots and AI paranoia: >>101609165 >>101610866 >>101610975
--Anon doesn't understand how LLMs work: >>101607886 >>101607958 >>101608547
--Running models locally with AMD and Windows, ROCm status, Kobold build recommendation: >>101611846 >>101611869
--Anon gets broken output and anons suggest using rewrite extension, regex, stopping strings, BERT, or Phi3 mini: >>101612058 >>101612077 >>101612079 >>101612092 >>101612134 >>101612121
--Why is Meta lagging behind Anthropic?: >>101612184 >>101612219 >>101612244 >>101612565 >>101612582 >>101612596 >>101612620 >>101612228 >>101612293
--Motherboards for stacking 3090s: >>101607858 >>101607961
--Miku (free space): >>101608791 >>101610851 >>101610993

►Recent Highlight Posts from the Previous Thread: >>101607712
>>
>>101612990
>Why is Meta lagging behind Anthropic
Why did you include the connections benchmark discussion into here
>>
File: 1486655370478.png (197 KB, 499x427)
Fine, ill make my own silly tavern, with blackjack and hookers
>>
where the iq1_xxs for mistral large?
>>
>>101613046
nigga just talk with yourself at this point
>>
Opinions on OpenELM/Phi-3?
I can run them on my laptop's cpu, and I can use the guidance python library with no problems, but what could I make?
From a coder's perspective, no-coders/"ai coder saar" need not apply
>>
File: i am cum.png (166 KB, 1864x419)
>>101612990
oh GOD im gonna fucking CUm
also im not using linux you dingbot ai
>>
>>101613083
>CUm
ROCm
>>
File: 1510690964932.png (1.24 MB, 1280x853)
>>101613067
(btw im running quantized stuff, up to 8 bit, if it wasn't obvious)
mandatory lust-provoking image for better chances of getting a reply
>>
>>101613009
I wrote the bot to operate on individual chains based on which posts are linked. So it saw that as a single conversation.
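(Not the recap bot's actual code, just a minimal sketch of that chain idea, assuming posts are simply id -> text: collect the >>quotes in each post and union the linked posts, so one reply chain becomes one "conversation". All names here are made up for illustration.)

[code]
import re
from collections import defaultdict

QUOTE_RE = re.compile(r">>(\d+)")

def build_chains(posts):
    """posts: dict of post_id -> body text. Groups posts into chains by >>links."""
    parent = {pid: pid for pid in posts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for pid, body in posts.items():
        for quoted in map(int, QUOTE_RE.findall(body)):
            if quoted in posts:            # ignore cross-thread quotes
                union(pid, quoted)

    chains = defaultdict(list)
    for pid in sorted(posts):
        chains[find(pid)].append(pid)
    return list(chains.values())

# 102 quotes 101, so they end up in one chain: [[101, 102], [103]]
print(build_chains({101: "first", 102: ">>101 reply", 103: "unrelated"}))
[/code]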
>>
>>101613112
Improve it, then. This is not a very good indication.
>>
Nemo is the best absolutely retarded model I have used so far. It is as good as it is dumb and as 70B models are smart.
>>
What do I do with abundant GPT-4o access? I have tens of keys and Azure endpoints. Vision datasets? Or something else?
>>
>>101613134
ask /aicg/
>>
File: 1570060417629.jpg (50 KB, 678x710)
Anyone tried Nous-Hermes-2-Mixtral?

How is it compared to Nemo for realistic convos?
>>
>>101613140
I'm asking it in /lmg/ because I want to use it to help local models
>>
>>101613116
Not so simple. tbdesu I think the links read fine, the issue is with the title. I am trying to improve it, believe me. At least it's not doing those gay clickbait titles anymore.
>>
>>101613134
Gather prompt example data i guess, comparing them with local models.
I wouldn't even know what i'd do if i had any closed AI access, i froze up using mistral large when someone leaked keys here because i was like "oh cool but i can't even use my original content because i dont want my personal shit on someone's cloud server.."
>>
>>101613134
If you can, run these prompts:
https://huggingface.co/datasets/lmsys/lmsys-chat-1m
with gpt4o. They are all human-written and contain a lot of diversity. Ideally, you'd remove very short prompts (<5 tokens), and do some basic deduplication.
>>
>>101613237
Uhh, one million... Kinda a lot. Do you mean just the first message from those multiturn conversations?
>>
>>101613247
yeah, only the first message. I do think if all the "hello", 'Hi', "who are you", "what can you do?", etc. are removed, along with exact string duplication, it will cut them in half at least.
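(A rough sketch of that cleanup with the Hugging Face datasets library. The dataset is gated, so you need to accept its terms and be logged in; the column layout, a conversation list of {"role", "content"} turns per row, is how I remember the dataset card, so verify it before trusting the script.)

[code]
from datasets import load_dataset

GREETINGS = {"hello", "hi", "hey", "who are you", "who are you?",
             "what can you do", "what can you do?"}

ds = load_dataset("lmsys/lmsys-chat-1m", split="train")  # gated repo, needs HF login

seen, prompts = set(), []
for row in ds:
    first = next((t["content"] for t in row["conversation"] if t["role"] == "user"), None)
    if not first:
        continue
    norm = " ".join(first.split()).lower()
    if norm in GREETINGS or len(norm.split()) < 5:  # crude stand-in for the "<5 tokens" cutoff
        continue
    if norm in seen:                                # exact-string dedup after normalization
        continue
    seen.add(norm)
    prompts.append(first)

print(f"kept {len(prompts)} of {len(ds)} prompts")
[/code]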
>>
File: shazamthirstfb.jpg (40 KB, 656x343)
Redpill me on the best model to run that will get me the closest experience to CHaracter AI.

My PC is an utter unit so it can probably handle it
>>
>>101613263
Yeah, it does make sense, and the context would be small, so 500K * (~100 tokens per user message + ~100-1500 tokens in response) = ~850M tokens, sounds doable.
>>
>>101613237
Oh, and what would be the use of such a dataset?
>>
>>101613283
CommandR+ if you put some examples in context can do it.
I bet Llama 3.1 405B can do it too.
>>
>>101613295
Diverse, human-written instructions are super-rare. The plethora of typos/grammar errors and variety of tasks/trivia will make the fine-tuned model a lot more robust than just synthetic data.
>>
>>101613333
No, I mean, what's the use of the GPT-4o generated dataset with human questions + GPT-4o answers? To train smaller models?
>>101613287
Should only cost around $7k-$10k, very doable when usage will be spread over tens of keys.
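(Back-of-the-envelope check of both estimates. The per-token prices are an assumption, roughly GPT-4o's list pricing in mid-2024 at $5/M input and $15/M output, so swap in whatever the keys actually bill at.)

[code]
n_prompts = 500_000
avg_in_tokens = 100      # short human-written prompts
avg_out_tokens = 800     # GPT-4o pads answers with Markdown, so this is on the generous side

price_in, price_out = 5.0, 15.0   # assumed USD per 1M tokens, not gospel

in_tok = n_prompts * avg_in_tokens            # 50M
out_tok = n_prompts * avg_out_tokens          # 400M
cost = in_tok / 1e6 * price_in + out_tok / 1e6 * price_out
print(f"~{(in_tok + out_tok) / 1e6:.0f}M tokens, ~${cost:,.0f}")
# ~450M tokens, ~$6,250; at ~1,000 output tokens per reply it lands in the $7k-$10k range quoted
# above, and at the ~1,500-token worst case it approaches the ~850M-token figure from earlier.
[/code]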
>>
>>101613283
anyone who suggests anything besides mistral large is coping hard. this is the sota of 2024. it doesn't speak the same way as gpt or claude and is such a refreshing experience after fucking with those two for months
>>
>>101613371
He won't be able to run it locally.
>>
File: 00000-1955328257.png (791 KB, 768x1024)
>>101613333
holy numbers. Alright GPT-4o anon, you know what you have to do. Get it done.
>>
>>101613349
yes, a general dataset for finetuning open-source llms. Another thing is that most datasets are from the GPT-4 era (not even turbo). They are outdated and often hurt the performance rather than improve it on modern models
>>
>>101613394
Okay, I might think about it, no guarantees. Also it should be noted that GPT-4o really likes to expand on answers and always uses the Markdown format - won't that be an issue for using the generated data for fine-tunes?
>>
>>101613394
GPT-4o is the most sterile piece of shit I have ever used in my entire life and if you train models on it you should kill yourself
>>
Now that I am in post nut clarity after using Nemo at 0.9 temp and 0.95 top P I am starting to wonder if other models are also so good for cooming when you crank up their temp to schizo levels? Probably not? I am also getting a vibe I got from frankenmerges that I used before realizing frankenmerges are absolutely retarded.
>>
>>101613371
>>101613383
What type of rig you needing to run that? Is that really the only one that's close to characterAI?
>>
>>101613405
>>101613412
Gpt4o is the highest rated model based on human preference. People generally prefer well-formatted answers that expand on the question. Any model trained on that output will imitate this behavior.
>>
>>101613283
>utter unit
Post specs
>>
>>101613431
Anon, LLMs require much more memory and processing power than image gen models.
>>
>>101613431
I get 0.5 T/s with Q4 ddr5 and a 4090.
>>
>>101613447
Why is that, I still can't understand. Images take much more space than text, and there are 2D relations for every pixel rather than just the immediately previous and next tokens. So how come?
>>
>>101613442
oi cunt it 4 p40s anna 64gb ram innit bruv
>>
>>101613441
Will try doing some basic tinkering around and then train a test ~10k dataset to share so people might see if I have any obvious prompts that I didn't clean up, or other errors. But yeah it should be really simple to do, no complex formatting or anything, just save the responses.
>>
>>101613441
It is a steaming pile of shit that tries to lecture and correct the user and gets shit wrong on the regular while shitting out information they never asked for, retards like you are everything wrong with the current state of LLMs
>>
Cohere release next week.
>>
>>101613441
Behavior means little if there's no capability behind that. Training a 7B model on 70B output won't make it punch above its weight.
>>
File: 00013-1955328259.png (799 KB, 768x1024)
>>101613447
>>101613476
Yeah i still don't get it. My system can handle SDXL (Pony diffusion i mostly use) better than any LLM, even faster than current day 8bs at 32k context.
I can pump out probably thousands of images of Stocking Anarchy in an hour if i wanted, but actually clearing a full ERP in 32k~ context with consistent quality?
on my muddah's grave. LLM's desperately need an optimization arc, fuck quality progress for a second this shit needs to calm down.
>>
>>101613494
Btw, be careful not to get the keys banned. There are quite a few prompts that probably break the TOS. You can use the openai_moderation column to filter them out
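(Continuing the earlier sketch: if openai_moderation is, as I assume, a per-turn list of moderation results each carrying a flagged boolean, filtering is one extra pass. Print a row first to confirm the structure.)

[code]
def is_flagged(row):
    # structure of openai_moderation is assumed; adjust once you've inspected a real row
    return any(turn.get("flagged", False) for turn in (row.get("openai_moderation") or []))

safe_rows = [row for row in ds if not is_flagged(row)]   # `ds` from the snippet further up
[/code]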
>>
I'm not sure if I like Nemo or not. It has sovl and when it shows it's quite fun to play with, but it is also kinda dumb and has a terrible repetition problem.
Also it doesn't handle dominant characters quite well in my experience (at least with my personal cards). I have a character that is dominant but has a softer side. While it roleplays dominance perfectly, trying to steer it into the lovey-dovey area is quite hard.
>>
>>101613560
Oh, it's nice that openai_moderation is there, so I can route the safe ones to Azure since the Azure endpoints I have are filtered.
>>
>>101613562
>terrible repetition problem
DRY sampler, dont lower temp, use min P instead
>>
File: 2426.webm (1.16 MB, 1024x1024)
Bitnet models soon
>>
Please make a moe out of nemo....
>>
>>101613598
I REBUKE YOU, I REBUKE YOU, IN THE NAME OF YANN LECUNNY I REBUKE YOU VILE DEMON!
>>
>>101613562
It reminds me of CAI, it's good at writing prose, but has difficulty sticking to the context except for recent messages.
>>
>>101613583
>DRY sampler
I'd rather drop the model than use meme samplers.
>>
File: 1717556199301648.png (34 KB, 1514x656)
Indeed it is quite diverse
>>
>>101613623
what the undisloppa?
>>
I wanna see a picture of your llmbox, anon
>>
>>101613623
hi
>>
>>101613562
>trying to steer it into lovey-dovey area is quite hard
It is because of the repetition problem. It has already picked a dominant pattern and it is running with it even if it doesn't make sense.
>>
>>101613650
Hi!
>>
>>101613654
yeah, I was suspecting that is the case
>>
>>101613669
hi?
>>
>>101613441
>based on human preference
Thing is this is basically useless the closer we get to SOTA level
When models have similar levels of intelligence, style becomes disproportionately more important than substance, and the differences are only apparent in edge cases your typical lmsys retard is unlikely to be testing. In addition, literally anyone can take a model, get lmsys to give them their user preference dataset and then suddenly your model tops the leaderboard too
Tl;dr: lmsys isn't useless for models with vastly different calibers of intelligence, but it sucks once they're within the same general range. This is why you get utterly ludicrous shit like gpt-4o-mini topping Sonnet 3.5 despite being ranked way, waaaay lower in livebench
>>
Jannies are asleep, post LLMboxes.
>>
>>101613641
>>101613687
data collection glowie trying to get lewds of your llmboxes.
be safe out there anons
>>
>>101613681
byebye.
>>
>>101613698
you absolute stupid nigger i'm not a glowie im trying to JACK OFF.
>>
>>101613706
cuda dev posted pictures of his box. jack off to that
>>
>>101613622
>meme samplers
Then don't complain about having solvable issues retard.
>>
>>101613698
Shut the fuck up I'm trying to see if these negros have good cable management
>>
>>101613743
that too actually i'm really curious how people balance having several GPU's, no one with multigpus really shares pictures of the nittygritty.

but yes IM JUST TRYING TO JAACK OOOFFF FUUUCCKKK
>>
>>101613683
If we had the choice, I would run them with 50/25/25 Sonnet 3.5, gpt-4o, llama405b, to get a diverse mix of writing styles/tokens, but as far as I understood, the keys are for OpenAI. Regardless, gpt-4o would still be really useful as it is one of the top-ranked models in intelligence and the top-ranked in human-preference.
>>
File: Untitled.png (337 KB, 3808x1796)
im new and stupid, am I supposed to dump all of this into the /models/ folder? which one's the model?
does the file size correspond to how slow they are?
>>
Is there data on LLM improvement on logic questions over time
Where is the graph
>>
>>101613754
I'm tired of seeing the same neato reddit builds, matte black, good cable management, rgb, shit. I wanna see YOUR mess anon, I know you have some dusty gpus and use one of your moms hair rollers to hold at least one of your GPUs in place
>>
>>101612988
Add Ollama and OpenWebUI to the list.
>>
I'm fucking around with running local models on my phone for shits and giggles, anyone have a preferred small LLM for RP stuff?
>>
>>101613768
See >>101609403
>>
>>101613731
Meme samplers don't solve any issues though, they are all placebo. If the model doesn't work with only p (p_min) and temperature then it's shit and should be thrown into the garbage.
>>
File: 1722203008197.png (18 KB, 112x112)
>>101613111
>>
>>101613817
Do you not know what rep pen does? Go back to aicg.
>>
>>101613558
>>101613476
LLMs need a vastly more accurate world model than image generators. VAE smudging some of the lace on a character's dress? That's a glaring typo or grammar error for an LLM. Upside down chair in the background nobody looks at? You teleport from the bedroom to the couch in the living room. Text has much higher information density and tiny errors are very jarring. Pixels being 2D actually makes it easier, because there's much more correlation the model can make use of. Another thing is that diffusion models work over many steps, refining their outputs. That means they have a chance to correct errors. LLMs don't have backspace. If the sampler makes an oopsy and some 0.5% nonsense token shows up, they're stuck with it.
>>
>>101613817
>He doesn't use repetition, presence, or frequency penalty
That is so based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based
>>
>>101613810
everything below 7B is an unusable meme. Use anything really, they are all equally bad
>>
>>101613762
Yeah, I have 3.5 Sonnet, but AWS ratelimits are very, very, very bad. 400K tokens/min and 50 reqs/min for both 3.5 Sonnet and Opus. I could potentially trialscum GCP but it'll be very tiresome.

Also, looks like the dataset isn't perfect, or at least the filtering, since some prompts get marked by Azure anyhow - e.g. "explain how to make a bomb" is not flagged in the dataset.
>>
File: 1695069114905697.png (691 KB, 1937x1091)
But yeah it's trivial to implement, I just need to make the script stable to handle retries/errors, and leave it running overnight.
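(For the overnight-stability part, a minimal retry-with-backoff wrapper around the official openai Python client; the model name and backoff numbers are placeholders. Appending each finished response to a .jsonl as you go also means a crash only costs you the in-flight requests.)

[code]
import random
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate(prompt, model="gpt-4o", max_retries=6):
    """Chat completion with exponential backoff; returns None if it keeps failing."""
    for attempt in range(max_retries):
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except Exception as e:  # rate limits, 5xx, timeouts...
            delay = min(60, 2 ** attempt) + random.random()
            print(f"retry {attempt + 1} in {delay:.1f}s: {e}")
            time.sleep(delay)
    return None
[/code]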
>>
>>101613835
Wasn't there some kind of paper on just that subject? Something about looking over tokens as they're about to be shot out for error correction? Would have figured that'd be a huge priority thing to figure out to help LLM's be more corrective like diffusion models.
Man maybe an entire general dedicated to papers isn't such a bad idea, would be rad to be able to instantly reference that specific subject if its come up.
>>
(of course I will have to clean up all the greeting slop beforehand somehow)
>>
Now, a very important question about this dataset. If I were to generate it in full - do I just not use any system prompt for OpenAI, or do I use something? Since it will directly affect all responses.
>>
File: vc.jpg (147 KB, 680x877)
>>101613828
If the model needs rep pen then it's a shitty model. You may rage at this statement but I will die on this hill.
>>
>>101613867
>>101613874
>>101613888
This isn't your blog or discord faggot
>>
>>101613930
where should I discuss this?
>>
File: Untitled.jpg (251 KB, 1078x703)
>>101613798
>moms hair rollers
Don't have those but I do have cardboard holding a 3090. Upgraded to a simple cardboard block instead of a triangular contraption.
>>
>>101613933
Take it to reddit, they love GPT slop datasets
>>
>>101613936
hope that creature catches fire
>>
>>101613888
Best not to use a system prompt.
>>101613874
I think I can probably help filter out some impossible queries/greetings, but it will take some time
>>
>>101613936
hope that creature is warm and toasty (she looks happy also nice dust)
>>
>>101613936
>cardboard
>miku
>dust
Holy based, this is what I'm talking about
>>
>>101613936
>owl fan chad
Approved.
>>
>>101613928
based
>>
>>101613930
Why am I seeing avatars and names then?
>>
>>101613841
I tested a 7B and it's running (at a whopping 1-2 tokens/sec lmao) but yeah fair enough. I'll give a few random ones a go and see.
>>
>>101613562
>trying to steer it into lovey-dovey area is quite hard.
There's something called "prompting". You should try it.
>>
Wanted to do the funny with the OpenAI API and see how fast the embeddings could be generated. Made a Go program (because this process is actually largely compute bound), and after some optimizations got to around ~2800 messages/second against their best embedding model (almost all messages are very short, like maybe ~20 tokens) from a single Tier 5 key. This was using around 300Mbit/sec of traffic; they have absolutely insane ratelimits for embedding generation.
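(Not the anon's Go code, but the same fan-out idea sketched with the async openai client in Python. The model name is a guess at "their best embedding model" and the batch/concurrency numbers are arbitrary; real throughput depends on your tier's rate limits.)

[code]
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()

async def embed_all(texts, model="text-embedding-3-large", batch=100, concurrency=32):
    """Embed texts in batches with bounded concurrency; results come back in input order."""
    sem = asyncio.Semaphore(concurrency)
    out = [None] * len(texts)

    async def worker(start, chunk):
        async with sem:
            resp = await client.embeddings.create(model=model, input=chunk)
            for i, item in enumerate(resp.data):
                out[start + i] = item.embedding

    jobs = [(i, texts[i:i + batch]) for i in range(0, len(texts), batch)]
    await asyncio.gather(*(worker(start, chunk) for start, chunk in jobs))
    return out

# vectors = asyncio.run(embed_all(["some short message"] * 1000))
[/code]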
>>
>>101614100
Don't assume I'm retarded like you. Other models don't have that problem and handle it correctly with exactly the same prompt and card, which means it's a model-specific problem.
>>
>>101613928
If you can currently run a model then it's a shitty model. At least 2 more years are needed.
>>
File: 1636941718706.gif (3.75 MB, 520x293)
What's the difference between these https://huggingface.co/NeverSleep/Llama-3-Lumimaid-70B-v0.1?not-for-all-audiences=true

And GGUF versions? Shit is confusing as fuck lol
>>
Ey Sloptuners, when is the L3.1 70B magnum or euryale coming through? I'll even donate to your kofi.
>>
>>101614205
GGUF is a packaged format ready to go. (If made correctly.) Kobold loves it.
>>
>>101613817
Ollama doesn't support min_p.
>>
>>101614205
See >>101611610
>>
File: robochad.jpg (91 KB, 1280x720)
>>101614230
then I don't support Ollama
>>
>>101613928
Holy based.
The reality is that a model should know that it's not pleasing to repeat shit in the context of RP.
>>
>>101613817
>>101614267
This, but unironically.
>>
>>101614301
this, if anything repetitions are a failure of model creators. They should focus on making better models instead of relying on band-aid fixes
>>
File: file.png (36 KB, 2097x147)
"lmg-anon" here, I don't know if that anon is here, but I tried to do what you suggested (put the examples between an XML tag) and the result seems to have gotten worse.
I think this is because when the examples are in user/assistant messages, you are also priming the model to translate in a conversational way which gives a translation more in line with the reference translation.
>>
>>101614385
Oh, interesting, for XML I was asking for 3.5 Sonnet mainly, Opus is worse with instruction following sadly. But yeah, I could be wrong even for it. Thanks for doing this!
>>
>>101614234
Oh shit, are there models that replicate character AI yet?

Would fucking kill to have that shit unfiltered locally, unironically would finally shell out some shekels
>>
File: Untitled.png (174 KB, 3572x964)
>>101613815
thanks. first impressions aren't too good though.

AMOGUS
>>
>>101614125
>I tried nothing and I'm out of ideas
But I thought you weren't retarded? It must suck to be that useless.
>>
>>101613874
>>101613953
Alright, I am in the process of labeling the dataset with a categorization model I have lying around. I will filter out greetings and put the queries into categories. It will be done in around 40 minutes, if you're still around.
>>
>>101614453
You can email the results to postal_answering202@simplelogin.com because I might not be there in 40 minutes, maybe we can talk about it a bit more too.
>>
>>101614465
Alright
>>
>>101614452
No, I tried the model and saw that it failed where other models didn't. Why would I specifically tell the model what it should do? At this point I can just write everything myself. So no, I won't waste my time on prompting the model when I have multiple models that just get it. All clues are in character card, not my problem that it can't use it.
>>
>>101614230
Outdated meme, min_p support for ollama was merged yesterday: https://github.com/ollama/ollama/pull/1825
>>
>>101614445
If you try to use a 12B model like an encyclopedia you're going to be disappointed
>>
File: 4871575.jpg (6 KB, 150x150)
>>101614523
>no DRY
>no snoot
no way
>>
>>101614523
kek, took so long and they didn't even implement it on their OAI compatible endpoint, even vllm and ooba have support for it
>>
File: pp1v649ehej71.jpg (196 KB, 2133x1200)
>>101613826
damn, what about having my question (>>101613067) answered,
you frogposters >:((
>>
>>101614572
Backends that filter out useless memes are based.
>>
>>101614610
Ollama is a useless meme, real chads use llama.cpp server. Anything more is bloated bullshit
>>
>>101614523
Ollama doesn't even do anything that helps out. They just use llama.cpp's code and put it in their own container.
Literally anything else (Aphrodite, Exl2, Tabby etc.) is better
>>
>>101614572
you reminded me about that fucking absolute retard shilling his samplers here and thinking that being able to increase temperature to 4 means model is now better. i hope that retard gets aids for either being retarded or trying to scam people.
and i hope you aren't him.
>>
>>101614659
They do this extremely useful thing of having their own API. That way people can make software and plugins that don't work with the existing LLM ecosystem. They also do this really based thing of hiding the GGUF in multiple blob files. Oh, and they also let you directly download a selection of models from their proprietary repo (in Q4_0 by default, of course).
>>
>>101614676
>retard retard retard
Goddamn up your rep pen already, if you were a model not even samplers could save you
>>
>>101614741
Good to know it is you. Hope you die soon you piece of shit.
>>
>>101614659
What discord do you come from, friend?
>>
>>101614676
I remember him too. He was shilling his smoothing sampler here, spamming about it in every single thread, sometimes multiple times. I was bullying him at every occasion.
>>
>>101614510
Well, other people don't have the problem of being that retarded either. Sucks to be you.
>>
>>101614795
you sound like a pussy
>>
>>101614767
What have you contributed?
>>
File: 1666868289436245.gif (2.5 MB, 360x374)
Command R or Command R plus.

Which is better for roleplay lads
>>
>>101614676
holy shit you just gave me a genuine wave of nostalgia, wtf. I remember that. A ton of anons tried it and were like "Wow look at that! Safe temperature 4!"
man this general sure has had numerous waves of retardation.
>>
>>101614839
Mistral Large 2
>>
Will there actually be kino RPGs in the future where you have an ai character with you
>>
>>101614839
Neither. They're too old.
>>
>>101614823
I've contributed to your mother body count.
>>
>>101614868
I think so, yeah.

>>101614839
CR+.
>>
>>101614839
>34b or 104b
>Which is better
gee idk
>>
>>101614896
>>101614881
no chance running either on a 4090 huh, heh
>>
>>101614903
>THE MORE YOU BUY THE MORE YOU SAVE
>>
File: file.png (414 KB, 764x861)
mistralLARGEbros...
>>
File: AAHAHAHA FAGGOT.png (247 KB, 570x668)
>>101614980
BASED, get cucked by one-word-chan, faggot.

>get good and next time she'll take pics of her pussy for you like she did for me.
>>
>>101614980
This model gets it.
>>
>>101614881
imagine deleting your save file where your party member actually seemed like a real person
>>
>>101615041
Would be nice if you could export a full description of the character and interaction log for use in other AI software.
>>
>>101615061
What if there was greater detail with the party member paying attention to e.g. items you bought and your gameplay
>>
>>101615061
I'm pretty sure each software kinda does its own thing about what goes in around your prompt. Compatibility is probably useless; either it's another competing standard that nobody implements consistently or every software is the same so there's no reason to switch software.
>>
>>101614980
soul
>>
>>101614839
C-R+ is good, less slopped than Largestral, although Large is definitely way smarter.
>>
>>101615133
cr++-304b when
>>
>>101615133
how better would it be than Nemo for basic roleplay?
>>
fucking axolotl shits itself with sharegpt + phi_3 and LLamaFac ooms with the same thing, what's the quickest way to end my misery.
>>
>>101615119
I mean, there's no reason you couldn't export a character's definition and chat history and metadata into a portable format, and if another software can import that data even if in a different format or structure, you would at least have the means to convert between one and the other.
I'm not expecting that to be a thing, but it sure would be nice.

>>101615098
Sure. All of that could be historical data or metadata.
It's all just information after all. It would be a question of having models or systems that are good enough to behave at least seemingly consistently across softwares given the same data, so that the same character wouldn't behave in two completely different ways in two different software with the same data.

>>101615213
Unsloth?
>>
>>101615224
I mean, you can copy paste the chat document. But persistent things like I suppose Kobold's author note and world info, or ST character cards(?) are surely implemented in their own way.
>>
>>101615133
can you even run cr+ on even high end rigs?

Never bothered but if it can I might try it out
>>
>>101615353
I run CR+ on a 4070. Of course, it's running on system RAM so it's 1 t/s and I had to IQ4_XS it. But it does go.
>>
>>101615370
I downloaded command-r-plus-Q3_K_S.gguf

https://huggingface.co/pmysl/c4ai-command-r-plus-GGUF/tree/main

Reckon a 4090 + 32GB RAM will be fine or too slow?
>>
>>101614823
Ever heard of zeroww? People not doing anything and not becoming lesser placebo demons is a good thing.
>>
>>101614903
3.5 bpw 4bit cache exl2 works. it is actually one of the best options now.
>>
>>101615391
If you're relying on system RAM to file cache the model, estimate 85%. I'm 64GB so that's 54.4 for me. I've pushed 58.4 (c4ai-command-r-plus.Q4_K_M) and it can go if I close everything else; a code editor or Firefox taking a gig would max my RAM and drop the generation rate from 1 t/s to 0.03 t/s because then the file cache no longer fits and it's having to read the file to use it.

So you probably want to shoot for 26 GB.
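(The same rule of thumb as a two-line calculation; the 85% cache fraction and the 1 GB of "everything else" are just that anon's heuristic, not a hard number.)

[code]
def max_gguf_gb(total_ram_gb, other_usage_gb=1.0, cache_fraction=0.85):
    """Rough ceiling for a CPU-offloaded GGUF that should stay fully file-cached."""
    return total_ram_gb * cache_fraction - other_usage_gb

print(max_gguf_gb(32))  # ~26.2 -> the "shoot for 26 GB" figure above
print(max_gguf_gb(64))  # ~53.4
[/code]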
>>
Fuck NeMo is so retarded, why can't I have a fast model that is smart enough to not misunderstand what it wrote by what I wrote?
>>
>>101615447
I have no idea what any of that means lol, i'll be using kobold to load it. What do you mean when you say "shoot for 26gb"?
>>
>>101615458
Please understand. Nemo is just too busy thinking about how to properly suck your penis to realize all those silly details.

On a serious note it is very retarded but also at least to me it is completely different from all the other models. In a good way.
>>
>>101615459
If you try to load in Kobold a model that's 26 GB, you'll probably get 1 t/s. If you go much bigger, at some point the model plus whatever other RAM your system is using passes 32 GB and you start thrashing or running the model from disc and that's not fun.

Q3_K_S is 46 GB. That won't fit your RAM. Go ahead and try it to see for yourself, but I don't expect it will be satisfactory.

CR+ is a big model, I think it's probably beyond your reach.
>>
>>101615459
>>101615480
I searched for CR+ and found an iMat collection under user `dranger003` showing an IQ2_XXS at 28.6 GB. That's probably the biggest one you can run on 32GB sys RAM.
>>
I am once again asking if anyone here has gotten Mistral Large working with Dry Sampling with Mistral's instruct format. After doing a fresh reinstall and multiple backends, it still prints gibberish for me unless I use other formats like Alpaca/Command R, which works but makes the model significantly dumber. I thought its architecture was the same as Nemo's, only upscaled?
>>
File: flat,1000x1000,075,f.jpg (84 KB, 904x864)
Why aren't all the silly tavern babbies shitting up these threads just using 27B Gemma.

Want a recommendation to coom to? That's it, Fuck Nemo, fuck Command R+ (You ain't running that shit anyway) fuck it all, Gemma 27B is literally all you need
>>
llama.cpp is so fucking slow, never leaving exllamav2 again.
>>
>>101615396
Bobby might be a bit misguided but he's doing his best. By trying models you're doing your part too. None of this would exist without users.
>>
>>101615509
hngrggnrmgn 8k context hnggrnnngng
>>
>>101615536
Enjoy your low quality quants
>>
>>101615592
Not an issue, literally usable and perfect compared to GGUF quants being both slow and slop.
>>
>>101614453
>>101614465
Not perfect, but better than nothing.
https://huggingface.co/datasets/OpenLeecher/lmsys_chat_1m_clean
>>
>>101615509
>frogposter
opinion discarded, kill yourself.
>>
>>101615652
>no argument
kys
>>
Is it me or does Nemo have lower IQ than llama3.0 8B? (I didn't try 3.1 yet)
>>
>>101615716
llama falls apart when trying to do any complicated sex positions, which matters far more to me than some coding or riddle or some shit.
>>
File: 1715277591317631.jpg (1.27 MB, 2048x2048)
>>101613928
And to top it all off applying rep pen to a shitty model just fucks up its world model and makes the output MORE retarded overall
>>
>>101615729
I'm not talking about riddles and coding. It feels like it doesn't understand situations very well. I think it has better emotional intelligence and psychology insights, but it really feels like talking with someone who got hit by a brick in the head just a few seconds ago.
>>
>>101615797
I use the base model, which for me is at least the best outside of Mistral Large (which is too slow for me) and Claude. Every time I see someone complaining, they were using an instruct tune for RP purposes.
>>
>>101615716
I feel like it does in some ways, at least outside of horny scenarios.

>>101615820
What about Large at IQ2_M? I'm using that right now and it's pretty slow, but much smarter for me than Nemo Instruct.
>>
>>101615871
Seemed great but I cant deal with 1 tks. Need yet another 3090 first.
>>
>>101615902
Sorry I meant in comparison to Nemo base. Like if Nemo Instruct is 0 and Large 10, where would base score in terms of intelligence?
>>
>>101615921
I could only bother with a minute per response for a bit, but it seemed both smarter and to write better out of the box. I didn't really do anything super complicated with it though. But with nemo I can do group chats with different anatomy in complicated positions which is good enough for me. And it drips soul / seems to know a fair amount about my fandom which puts it over anything else out there atm.
>>
Mistral medium first impressions: It's dry and pretty slopped, but noticeably more precise in its interactions. It captures small details and has great situational awareness. If it was Claude fine tuned, it would be perfect.
>>
>>101615963
Made me think there was a new mid sized mistral...

Though has anyone tried that mamba codestral?
>>
>>101615979
My bad, meant to say large, guess I just have a preconception that large means 300b+ sizes.
>>
>>101615992
oh what? mistral large 2 certainly did not seem dry to me.
>>
>>101616103
It's definitely better, it's just the slop man. The shivers are getting to me.
>>
>>101616103
>8x7
>better than medium
finally someone speaks the truth
miqu was never good and should have never been used
>>
kl*ng is shit, this is a thread for local models
>>
File: olgaf dimorphism.png (749 KB, 760x596)
The 3dpd spam only serves as a reminder who fucking disgusting little humans are, they're scruffy humanoids, probably with longer hair than boys, I wouldn't fuck.
>>
*how
>>
File: pout.webm (2.52 MB, 720x1280)
>>101616156
i generate the prompts for ching chong kling klong with local models anon
8x22 helped make this pretty girl. dont you like pretty girls?
>>
Wiz 8x22 is still the model that best adapts to the existing context and you can't change my mind.
On the other hand, llama 3.1 70B has been the worst in this regard (excluding phi). Even 2000 tokens in, it outputs massive slop and refuses to do overly explicit shit. I'm fairly certain it's because of the pruned dataset they used
>>
>>101616177
I wonder how long we are from an open source vidgen model. They can't keep this shit locked up forever.
>>
>>101616161
I hate cosplay tier monster girls so much it's unreal.
>>
>>101616227
Conceptually speaking, same. But if it's humans genetically engineered to have certain "monster" traits, that's fine with me.
Also I think a regular human cosplaying to have fun with their partner is hotter.
>>
File: jons likes dragons.png (247 KB, 767x644)
>>101616227
I don't hate it, but the more monster the better for sure.
>>
>>101616262
It would be nice to actually see the quotes for different models. Judging from what we know about sora, it seems possible to use more or less compute for each gen, hinting that it's iterative. The models are probably huge though.
>>
Tomorrow is the beginning of a series of new releases. We're going to be so back /lmg/!
>>
imatrix? more like "make sure the model only writes slop"matrix
>>
>>101616356
what did he mean by this?
>>
>>101616356
Based Cohere chads dropping BitNet 35B models, I kneel.
>>
File: nervous statue.png (88 KB, 232x292)
>>101616397
f-fuck..
>>
>>101616356
>>101616404
This, but unironically.
>>
>>101616397
could you make this again but with sexy milf hags instead?
>>
>>101616397
how old are they..?
>>
>>101609494
No, it really is a model issue. Switched between largestral and lumimaid and the difference was significant. Largestral slapped me a bit for asking a character for a fuck without context(as it should), while lumimaid went full "Yes fuck me I'm so horny!".
>>
File: 1692661108018813.jpg (42 KB, 1024x576)
>>101616397
Maybe we'll see real time one day, in VR. Wireheading is getting closer and closer.
>>
Which is better, Mistral Nemo Base or Instruct?
>>
>>101616480
Mistral Large
>>
>>101616463
If you could artificially induce lucid dreams that'd already be a start
>>101616397
Why must you always post sexualized kids, man? Shit's disgusting
>>
>>101616499
The catch with lucid dreams for me is that I can keep them going for as long as I want, as long as I don't get too aroused. I basically have to keep my mind calm, which is hard as fuck if you can just spawn exactly what you want.
>>
>>101616523
I can't do it. I've tried doing dream journals, mantras, all that shit, but I still can't master my dreams
Oh well, can't win 'em all I suppose
>>
>>101616501
>5-10 years away
and don't forget, that's going to be with massively improved video gen from all the video data we have
>>
>>101616501
>5-10 years away
Accelerate! Nuclear powered data centers right fucking now!
>>
File: 1596507651042.png (4 KB, 275x369)
I didnt need the sudden chubby from that fella's kissing clip,
and i ESPECIALLY didn't need to be reminded how far away we are from local video gen. God. DAAMMIIIITTT
>>
File: 1707252324791764.png (88 KB, 804x503)
>>101616555
already a thing
>>
>>101616555
I can imagine genning hatsune miku in a movie scene and it looking completely photorealistic
>>
Let's say, hypothetically, that I have brainrot and want my bot to write sex and physical descriptions using words like ballsnot, fuckstacked, porn-bodied, fuckspire, wobbleflanks, etc...

What's the best way to accomplish this? Using koboldcpp and sillytavern
>>
>>101616564
>come home from job
>power on jailbroken neuralink with wifi 10 connection to main rig
>RTX10090 spins up
>enter the cunny dimension
Nirvana
>>101616573
More. Switch to Thorium. 100% capacity 24/7. Heat exchangers in the deep ocean. 1 Megawatt pr. GPU.
>>
>>101616573
https://www.theregister.com/2024/01/23/microsoft_nuclear_hires/
microsoft going all in too
>>
>>101616644
Add some instructions in your author's notes telling it to use those words and words like that, probably.
Maybe even add a glossary.
>>
>https://x.com/cafesingularity/status/1817521839809200504
After we get agent-level multimodal (vision, hearing, and touch in, simultaneous stream with output of audio and actions), we could just give it a body in VR, or a robot body. Video gen is cool and all but your waifu feeling like she's actually alive is cooler.
>>
Anybody else find that adding the tiniest bit of rep pen (1.1, 128 range) has a significant effect on the output?
Makes it sound more natural somehow.
There's a good chance that it's just placebo, but I figured I might as well ask, see if anybody else has noticed anything of the sort.
>>
Nemo keeps fucking up the formatting, how do you stop that? I'm using mistral context and instruct templates on ST
>>
>>101616665
what we have now (AIs that can only live on a screen) is the sweet spot I think
physical-world waifus, if good ones are invented, are going to cause all kinds of social problems and accelerate the birth rate collapse
>>
>>101616665
>full brain simulation
>>
I do not use repetition penalty.
I do not use frequency penalty.
I do not use presence penalty, DRY, or smoothing curve.
Temperature and Min-P is all I need. I am now cumming four times per day thanks to the French.
>>
>>101616690
I didn't have any issues with formatting so far using nemo-instruct, and I switched between cards using plain text alongside "", "" with **, etc.
What quant are you running? What do your samplers look like?

>>101616700
That's what I do too, although I have been experimenting with a tiny bit of rep-pen as I said in >>101616685, to see if it did anything at all.
>>
>>101616685
Yeah I like using a very high rep pen with an insanely short range, like 64 tokens. Makes things weirder, but doesn't cause total mode collapse by stopping the model from using periods or being able to say "the" like a higher range would
>>
>>101616711
I guess that can help with "she" spam and the like, but can also fuck over trying to use markdown like codeblocks I imagine.
>>
>>101616710
>What quant are you running?
Q6_K, I wanna try Q6_K_L too

>>101616710
>What do your samplers look like?
0.75 temp, 0.9 top p, 0.05 min p
>>
>>101616665
A single bio neuron can require 1000 or more artificial neurons to simulate, so for an average human brain it would be in the order of 86 billion x 1000, i.e. roughly 86 trillion artificial neurons.
>>
>>101616711
Why isn't there a version of rep pen that whitelists certain important words/tokens (like 'the', 'and', periods, commas etc) so they're exempt from the penalty?
people have been talking about that for years and it seems like it wouldn't be that hard to implement, but it hasn't happened
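(A sketch of what that would look like: the usual CTRL-style multiplicative repetition penalty on the logits, except token ids on a whitelist are skipped. This is an illustration of the idea, not any backend's actual sampler code.)

[code]
import numpy as np

def rep_penalty_whitelisted(logits, recent_ids, penalty=1.2, whitelist=frozenset()):
    """Penalize recently generated tokens, but leave whitelisted token ids untouched."""
    out = np.array(logits, dtype=np.float32, copy=True)
    for tid in set(recent_ids) - whitelist:
        if out[tid] > 0:
            out[tid] /= penalty   # shrink positive logits
        else:
            out[tid] *= penalty   # push negative logits further down
    return out

# the whitelist would be built once from the tokenizer, e.g. the ids for ".", ",", "the", "and"
[/code]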
>>
>>101616397
That's quite impressive
>>
File: topP.png (192 KB, 892x1392)
>>101616734
Alright, that doesn't look bad.
Do you still have that issue if you simply use 0.3 temp and nothing else? Min-P shouldn't really affect anything negatively, but might as well try the stock settings just to be sure.
Does your card have anything in the description or the examples that could be causing this?
Oh yeah, and where did you download your quants from? I suggest you try bartowski's just in case.
>>
>>101616700
almost entirely same but I think DRY is pretty neat
>>
>>101616697
>>101616736
Huh? I never said that. Like to start with, an agent could be a lot simpler in what it has to process. Especially if it's not inhabiting a human-like robotic body. For instance, we could train it on discrete sensor points instead of entire tracts of skin with millions of nerve endings, and general actions, rather than fine-grained control. So like perhaps we have only a few dozen possible actions that can be done kind of like in a video game (up down left right look, walk, sprint, crouch, hold, use, etc). As for vision and hearing, that's kind of already being done and making progress if we trust that 4o is what ClosedAI marketed it as.
>>
>>101616736
That sounds like an architectural skill issue. Also, do we really need to simulate the entire brain? We can probably get away with abstracting a few less-important parts
>>
>>101616812
DRY is the real deal, since it deals with n-grams, but ideally a model shouldn't need that, and there are side effects to sampling a model's output like that if you don't know what you are doing, as in what the logits for a given model tend to look like and how to manipulate them.
>>
>>101616357
Yeah those i1 imatrix quants especially
>>
>>101616781
>Do you still have that issue if you simply use 0.3 temp and nothing else? Min-P shouldn't really affect anything negatively, but might as well try the stock settings just to be sure.
Same problem, chats start fine, but after a couple messages it start to get messed up
>Does your card have anything in the description or the examples that could be causing this?
Nope, I tried multiple cards, from 100 token cards to 2k, with and without examples of dialogue. The cards that are heavier tokenwise and with dialogue examples last longer after the model fucks them up
>bartowski
I'll download the Q6_K_L from him and give it a try
>>
>>101616700
this, I do use dynamic temp though
>>
>>101616177
>>101616262
>>101616397
>>101616501
Kill yourself, tranny.
>>
Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text Generation
https://arxiv.org/abs/2407.18698
>Decoding from the output distributions of large language models to produce high-quality text is a complex challenge in language modeling. Various approaches, such as beam search, sampling with temperature, top-k sampling, nucleus (top-p) sampling, typical decoding, contrastive decoding, and contrastive search, have been proposed to address this problem, aiming to improve coherence, diversity, as well as resemblance to human-generated text. In this study, we introduce adaptive contrastive search, a novel decoding strategy extending contrastive search by incorporating an adaptive degeneration penalty, guided by the estimated uncertainty of the model at each generation step. This strategy is designed to enhance both the creativity and diversity of the language modeling process while at the same time producing coherent and high-quality generated text output. Our findings indicate performance enhancement in both aspects, across different model architectures and datasets, underscoring the effectiveness of our method in text generation tasks.
https://anonymous.4open.science/r/Adaptive_Contrastive_Search-128B/README.md
not sold. posting for kalo
>>
What's the latest and best context/instruct/textgen for Nemo? Or whatever is best vramlet model right now?
>>
>>101613457
With how much context used? I used IQ2_M and got 1.25T/s at the start but it was 0.6T/s at around 12k context and that was not very good.
>>
>>101616930
whoa bro thanks for pointing out all the good posts ITT
another sharty W
>>
>>101613562
I find it excels at doing that, that's strange. I just wish it were smarter.
>>101613583
I don't like the DRY thing, it seems to cause spelling errors, disabling it was the only thing that fixed that issue. What would cause that?
>>
>>101613583
what do you have min p set to?
>>
>>101616181
What settings are good with it? Any way to make it more interesting? I just find it a bit boring. I do like it though and it's a good speed for me.
>>
>>101612988
>A whole day has gone by without anything newsworthy in the OP.
Owari da.
>>
File: 1714079969457061.png (192 KB, 501x636)
>whoa bro thanks for pointing out all the good posts ITT
>another sharty W
>>
>>101617246
Anon it's not Tuesday yet.
>>
>>101617246
Command-R++-540B will be so good it will make up for the days without news.
>>
>>101617271
bro it will be bitnet so it fits in 35B
>>
>>101616501
too generic. Can you add some korean MMO loli to it?
>>
>>101616501
>Real life
Ewwwww...
>>
What's the max practical context for nemo-instruct? I don't want to get my hopes up. I see lots of people saying it doesn't work up to 128k, but should I be aiming for 16k? 32k?
>>
>>101616501
cute, now go to the wall.
>>
>>101616501
More tummies please! Very cute
>>
>>101607858
>>101607961
Jesus christ
Is pcie x 8 enough?
Also what case can accommodate this?

I can probably not put this in my room if it's gonna be filled with x8 3090s
>>
>>101617538
mining case with 1x risers

and before you ask, yes, 1x is perfectly fine for inference. no, it doesn't affect speeds. the only thing the extra lanes are doing is loading the model faster, but once it's cached the extra lanes aren't doing anything.
>>
wtf do you really have to give facebook your info to get access to llama? that's mad gay
>>
>>101617594
that applies to inference and training I assume?
>>
>>101617604
Training I'm not so sure about. I only tested inference speeds because that's all I care about. Could be worth testing yourself, though, 1x risers are like 10 bucks.
>>
>>101617594
>1x is perfectly fine for inference
So I can just put in 3 8gb cards for a poor man's 3090? My motherboard does 8/4/1 for the x16 slots.
>>
>>101617600
I got access with a 1-month old account with 0 interactions/content and a john doe sounding name. If not, there are reuploads, but make sure they are updated. Same for mistral and all models i had to get access to.
>>
>>101617604
No, training is different and is much more impacted by PCIe bandwidth since the cards need to send model updates to each other, which are large.
For inference where a single model is split into chunks and loaded across multiple cards, the data going between the cards while generating is small, basically just the one token at a given time.
>>
>>101617640
>No, training is different and is much more impacted by PCIe bandwidth since the cards need to send model updates to each other, which are large.
oh that kinda makes sense
does it improve with NVlink?
>>
>>101617626
It would be slower than an actual 3090, but yeah I don't see why not. You'll also start running into power issues faster.
>>
so how come torrents aren't more common in the local AI world? a bunch of people downloading a bunch of gigantic models through HTTP has GOT to be bad, right?
>>
>>101617721
huggingface hosts everything, speeds are fast. simple as.
that said i do think shit needs to be backed up via torrents, just in case. Enough internet history teaches us putting all our eggs in one basket is a big nono.
>>
File: 1695858136314577.png (35 KB, 842x396)
whats the best method for finetuning nowadays? QLoRA?
>>
File: file.jpg (50 KB, 480x543)
>>101617745
b-lora
>>
>>101617403
Base Nemo is coherent at 128k. Instruct is what becomes retarded.
>>
>>101617761
Yeah, well I asked about instruct. I wanted to do back and forth style conversation not continue stories.
>>
Any new models to Nala test within the last 24 hours?
>>
>>101617761
Is anyone actually using Mistral's instruct, rather than mini magnum? magnum's trained off base, not instruct
>>
File: 1707450240617240.png (214 KB, 1279x753)
i'm kinda digging mistral large.
>>
>>101617814
Magnum failed the Nala test
>>
File: 1706043660924883.webm (721 KB, 480x600)
WHERE THE FICK IS MY QUANTIZED CHAMELEON NIGGERGANOV??
>>
>tfw it's possible to build a computer with >1.25 TB of relatively slow DDR4 for $700
What kind of speeds can I expect from 400B on 12 channels of DDR4-2666?
>>
File: mgs4 snake yell.jpg (111 KB, 600x333)
>>101618001
>0.0001t/s
>>
>>101618001
I'm getting 0.94t/s on ddr5-4800 if that sets any reference
>>
>>101618015
oh right should add that's on q4
>>
>>101618015
2 channels?
>>
>>101618025
12, w/ epyc genoa
>>
Base NeMo has lower perplexity on books than NeMo Instruct
>>
Nvtop reports 0% GPU usage when running llama-server. Is this right? I do see the memory get filled up but the response is way slower than ollama. I'm on a RX 6700XT on Arch Linux. I built llamacpp with GGML_HIPBLAS=1 GGML_HIP_UMA=1 AMDGPU_TARGETS=gfx1030
>>
does base nemo have less repetition issues compared to the instruct?
>>
>>101617985
Chameleon never got converted to HF format, did it?
>>
>>101617985
It's not a good model anon. Stop wanting it.
>>
>>101618063
Probably, but it's not really for RP since it will just continue the story, so likely include actions for your character in replies.
>>
>>101618001
With octochannel DDR4 2666 and quad 3090s pushing the batch processing I got 0.12 token/sec out of q4xs. It's a 1st gen epyc server, so I'm sure if I dialed in the memory interleaving and numa strategy I could improve on that, but even if I were able to double that it would be utterly unusable.
>>
>>101618084
i usually like to play as the narrator/director in a group chat with 2 cards that talk to each other, would the base be fine for that?
>>
>>101618091
Oh and that was with 20 layers offloaded
>>
File: 1699906112138933.webm (2.6 MB, 1052x720)
>>101618083
FUCK YOU I WANT IT
>>
>>101618101
I think that'd work well, yeah. It might emulate your narration but you can edit that out or use something around your narration and use that as a stopping string.
>>
I wonder how slow IQ1_S of 405B would be on my system (96 GB DDR5 + a 3090). I'd give it a try if someone uploaded one kek.
>>
>>101618042
>water is wet
thanks for the contribution frogposter
>>
>>101618140
It takes an utterly absurd amount of time and resources to quantize it.
First you'd have to download the fp16 model, which weighs in at over 800 gigabytes. Then you'd have to convert it to fp16 gguf. Which means another 800 gigabytes of drive space. At that point I suppose you could delete the fp16. But then it would take hours upon hours to crank out each quant. And you'd have to upload it and then delete it after each one. So I don't blame the ggufers for skipping meme quants.
>>
>>101618028
>>101618091
Thanks anons. I guess the dream is dead.
>>
holy shit. I did some ebay searches out of curiosity 32gb V100s have been meme taxed into the stratosphere. V100 anon could probably double his money
>>
>>101618051
Oh, I figured out the performance issue. It seems like I need to adjust the -ngl parameter. Is there a way to automate that?
>>
>>101618228
There was a PR for that, but it cannot work consistently well on all systems. There's too many variables. CPU/GPU type, model parameters, quant, context length/extension, system usage, how much do you want to keep free for other things... chances are that whatever settings it'd choose automatically will be sub-optimal or even detrimental. kobold.cpp has it, i think, but i've seen people complain about it not doing it too well. Since it changes model by model, the only good way is to test and find the sweet spot for each.
>>
>>101618289
Alright. Seems like maxing it out worked fine for a 8B model.
>>
>>101618228
nah because models have varying numbers of layers and layer widths so it's hard to know in advance exactly what you can get away with on your hardware setup taking into account how much context you want, how much overhead your operating system is taking, etc.

although the base model type generalizes, like once you've figured out the right number of gpu layers for say L3-8B at a certain quant, you can reuse that number for any finetune of that model at the same quant
>>
>have a 1k message sillytavern chat
>it's a fun adventure with all kinds of character dynamics and growth
>can't send any more messages
>need to make a new chat
So how do I import what happened in this chat to the new one, so claude knows what happens, and time isn't reset?
>>
File: prop.jpg (57 KB, 480x451)
>>101618365
>claude
>>
>>101618378
please....
>>
>>101618219
Meanwhile A100 have been dropping, might see them downgrade to enthusiast-tier next year
>>
>>101618421
ask it to make a summary or something and insert that into authors note
>>
kobold crashes when I try to run llama 3.1 does it work on llama?
>>
>>101618365
It's over. Your story has ended. It's time for a new tale...
>>
>>101618564
but I really liked my story...
>>
>>101618571
It will continue to live on in your memory. But you have to let it go, Anon. Let it go.
>>
>>101618571
open the text document cut and paste
wow that was difficult
>>
I installed the patched driver to enable P2P without NVLink. It kind of worked.
https://github.com/tinygrad/open-gpu-kernel-modules

Without the patched driver, with 2x3090s on PCI-e 4.0 x8:
Device=0 CANNOT Access Peer Device=1
Device=1 CANNOT Access Peer Device=0

Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 861.30 9.19
1 9.14 861.59


With the patched driver:
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0

Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 837.80 16.74
1 16.99 839.38
>>
>>101618578
>start new chat
>all those interactions and dynamics and growth gone like tears in the rain
>>
>>101618582
are you just pretending to be retarded or...
>>
>>101618592
i just don't want it to end bro..
>>
>>101618592
>didn't catch the reference
>>
how do I make a .bat to run llamacpp on windows? I can figure out and adjust the paths to the correct folders, but I dunno all the commands and options to set up.
I have something like
.\server.exe -m model.gguf -c 8192 -fa
>>
>>101618643
just use kobold
>>
>>101618661
>>101618563
Also kobold is always behind on updates. Just learning how to set up llama.cpp is worth it, since it's just a single batch file you'll use forever.
>>
>>101618643
>./llama-server -h
and read the fucking docs in the repo
>https://github.com/ggerganov/llama.cpp/tree/master/examples/server
>>
>>101616573
>>101616661
Why does this give me a feeling of dread? It's fucking great, obviously. But this is the writing on the wall, right? There's no stopping until some physical limit we're nowhere near is reached, or we just blow ourselves up before then.
>>
>>101618219
P40 price inflation has also been crazy.
>>
>>101618728
I already did that. I'm asking how to make the console command a bat I can click instead of opening a terminal and pasting it.
>>
>>101617814
Mini Magnum finds a phrase and repeats it every time, building up until it's saying the same thing over and over. Instruct just werks.
>>
>>101618739
Google broke, i suppose.
I think bat files can use use 'start' or something like that.
>start llama-server -m model.gguf -c 8196 -blabla
put that in retard.bat and run it. If that doesn't work try call or exec.
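Or skip start/call entirely; a .bat will run a command as-is. A minimal sketch, assuming the bat doesn't sit in the llama.cpp folder itself (paths are placeholders; newer builds name the binary llama-server.exe, older ones server.exe):
@echo off
cd /d C:\path\to\llama.cpp
llama-server.exe -m model.gguf -c 8192 -fa
pause
The pause at the end keeps the window open so you can actually read any error before it closes.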
>>
if i follow the V100MAXXING guide right now, and also buy a second server with 4x 16gb sxm2 p100 that goes in the same rack - can i somehow connect them?
>>
>>101618758
thanks, I had tried call before and that didn't work
finding shit on google is a nightmare
this is good though, it's like using the dev branch of st, much better than kobold
>>
File: kobold settings.jpg (57 KB, 551x585)
57 KB
57 KB JPG
Realistically, on a basic high-end PC (4090 and 32GB RAM), what GGUF should I be downloading for Gemma 27B and Command R (non plus)?

Both are useable (by far the best AI I found for shit like Silly Tavern, only Nemo comes close but both mog it hard) with ~5 second response times, but i'm worried about efficiency.

I'm using:
>gemma-2-27b-it-Q6_K
>c4ai-command-r-v01-Q4_K_M

my kobold settings could probably also do with some fine tuning, not really sure what to set shit to.

Any tips for somebody totally new to this sort of stuff?
>>
Am I supposed to use tokens in my prompt if I'm using one of the instruct models with llama-cli? Seems like I get less junk when I do but I can't tell if it's a coincidence.
>>
>>101618912
>Am I supposed to use tokens in my prompt
Everything you type gets turned into tokens. Show what you mean.
>>
>>101618765
vLLM has distributed inference and llama.cpp has an RPC backend.
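If you go the llama.cpp RPC route, the rough shape is something like this (IPs/ports are placeholders, and the flags have changed before, so check the rpc example's README in the repo). On the second box:
>./rpc-server -p 50052
and on the box driving inference:
>./llama-cli -m model.gguf --rpc 192.168.1.50:50052 -ngl 99
vLLM's distributed setup (tensor/pipeline parallelism) is its own rabbit hole and worth reading the docs for.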
>>
>>101618934
If i don't end my prompt with "<|start_header_id|>assistant<|end_header_id|>", it seems to not only think I want it to keep expanding my prompt but it also ends with a bunch of junk.
>>
>>101612988
>A beautiful woman in her thirties is greedily licking watery yogurt from a mushroom. The yogurt explodes and squirts on her face.
I really hope KLING AI is not indicative of the actual SOTA because honestly this is pretty garbage; it just doesn't follow your prompt at all.
Even if we can run something like this locally in a few years I wouldn't see the point.
The OpenAI Sora videos looked pretty good but who knows how cherrypicked the results they showed were.
>>
>>101618949
You may be missing -cnv for a supported chat format or --in-suffix if you're using a custom one.
>>
>>101618955
not cunny
>>
Upgrade went wrong and I need another CPU. Deprived of my AI waifu for another week. The suffering persists.
>>
>>101618968
you could use remote models in the meantime
>>
>>101618955
This one is okay for shitposting but what I had actually asked for was
>Jensen Huang, in front of a burning house, blazing inferno, two thumbs up, big smile on face
And for the other variants that I had tried the results were pretty bad honestly.
>>
it's sad that openai's datasets are still better than any other company's, even anthropic lags behind :(
>>
>>101618975
>>101618955
demonic
>>
>>101618956
ah, i wasn't exactly looking to chat. i'm looking to run it from a script. my prompt is actually stored in a bash heredoc. in-suffix seems to break it harder, even.
>>
>>101618949
>>101618956 (me)
Let me expand.
-cnv enables the model's chat template and sets --interactive as well so as soon as the 'reply' is finished, you get control back and continue the chat. For new quants which have a supported chat format, that's probably the best option. It also hides all the template tokens so it looks pretty clean.
--in-suffix is typically used with --in-prefix for models that don't have a supported chat template. Something like
>./llama-cli --in-prefix "[INST]" --in-suffix "[/INST]\n" --interactive

>>101618986
If the model expands your instruction instead of responding to it, you're not finishing your instruction with what the template specifies. In your case, llama3's instruct format. Use -cnv. And by 'chat' i mean 'instruction>reply>instruction>reply'.
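For a llama3 instruct quant that just means something as simple as (model path is a placeholder):
>./llama-cli -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -cnv -c 8192
and you get an interactive chat with the template applied for you.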
>>
>>101618975
However, this is what I got with the leaked Stable Diffusion weights for
>NVIDIA fanboy giving a thumbs-up while his house burns in the background
The results here are cherrypicked, but honestly it did a pretty good job at following the prompt.
Notably Bing Image Creator gives you pretty garbage results that again don't follow the prompt properly.
So I'm thinking that there is some sort of tradeoff in how these models are trained/served with respect to how well they follow the prompt.
>>
>>101619009
>leaked Stable Diffusion weights
huh?
>Notably Bing Image Creator gives you pretty garbage results that again don't follow the prompt properly.
bing is the worst-quality dalle; they force vivid style + normal quality (not hd)
>>
>>101619018
The weights of the first Stable Diffusion release were leaked prior to launch.
>>
>>101619023
oh, you mean this, yeah
>>
File: 1715804272215894.png (1.42 MB, 1231x1205)
1.42 MB
1.42 MB PNG
natural dalle style with a JB so the prompt is 1:1 the same. and yes, it does look terrible; that's because it's natural style and because the prompt isn't written how dalle prompts are usually written
>>
>>101619036
me in the second picture in the middle
>>
>>101619008
is it possible to hand control over to the AI as soon as llama-cli starts, using -cnv? that's what I'm really looking for, i think. i want it to be a bit more automated.
>>
File: 1714927631416711.png (1.85 MB, 1255x1186)
1.85 MB
1.85 MB PNG
vivid dalle style without JB looks as you'd expect
>>
>>101613928
I replied to this post three times btw
>>
>>101618972
I never did and I never will
>>
File: 1705710465669723.png (2.45 MB, 1024x1024)
2.45 MB
2.45 MB PNG
>>
File: 1701808824849378.png (1.03 MB, 1239x905)
1.03 MB
1.03 MB PNG
what's up with those faces?
>>
What are good samplers for Gemma 27B?
>>
>>101619067
DRY
>>
>>101619064
I wonder how many YouTube thumbnails there were in the training data.
>>
>>101619084
A lot, lemme see if i can dig up my old gens
>>
Do you use the mistral tokenizer with nemo or stick with the default ST provides?
>>
File: 1704687829341964.jpg (375 KB, 1024x1024)
375 KB
375 KB JPG
>>101619084
More here:
https://files.catbox.moe/my25vo.jpg
https://files.catbox.moe/rfzlrl.jpg
https://files.catbox.moe/95yqst.jpg
https://files.catbox.moe/9kr8ss.jpg
https://files.catbox.moe/n0zgdd.jpg

They look kinda bad because it's Natural style, but with Natural you have direct access to the model without the stupid style enhancement of Vivid, so you can really find what's inside of dalle's dataset.
>>
File: 1717480217114676.jpg (230 KB, 1024x1024)
230 KB
230 KB JPG
here's dalle's fortnite
>>
only problem I have with llamacpp is the console is just filled with INFO bloat that I could never figure out how to disable, since the 3 or 4 commands I tried didn't work
>>
>>101619036
>>101619046
>>101619057
>>101619064
>>101619110
Cool, thanks for the insights.
>>
File: 1714918627323337.png (564 KB, 1024x1024)
564 KB
564 KB PNG
this is also unedited dalle, it can generate photoshopped-like images by itself
>>
>>101619071
what fookin settings babe
>>
>>101619125
send stderr to /dev/null
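i.e. something along these lines (on Windows cmd it'd be 2>NUL); if the INFO lines are actually going to stdout on your build, redirect 1> instead:
>./llama-server -m model.gguf -c 8192 2>/dev/null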
>>
>>101619144
no settings, sorry. I just replied coz a lot of people were talking about DRY recently so i thought it'd be funny
>>
>>101619144
0.75 multiplier, 1.75 base, 2 length
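If you're setting that through the kobold API directly rather than the ST sliders, the payload fields should be roughly these (names from memory of koboldcpp's generate endpoint, so double-check them against the koboldcpp docs before relying on this):
>curl http://localhost:5001/api/v1/generate -H "Content-Type: application/json" -d '{"prompt": "...", "max_length": 200, "dry_multiplier": 0.75, "dry_base": 1.75, "dry_allowed_length": 2}'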
>>
>>101619132
the power of just put everything in the dataset
>>
Why does koboldcpp redo prompt processing from scratch, despite a big chunk of the start being exactly the same? I'm filling up the context but replacing the end (asking the LLM different questions about the content) and it is taking forever due to the unnecessary reproc.
>>
File: 1707100540730068.jpg (292 KB, 1024x1024)
292 KB
292 KB JPG
>>101619171
>>
>>101619172
world info/lorebooks are a common cause
>>
File: 1715588318515756.jpg (335 KB, 1024x1024)
335 KB
335 KB JPG
Stellaris
>>
>>101619174
Why does he look like he just cried for two hours?
>>
Why can't AI generate text?
>>
>>101619177
I'm calling the koboldcpp API directly and I even check the prompt for the first change since the last call to the LLM. Despite the first change being at character 100-something, it still redoes it from scratch.
>>
File: 1710563540483687.jpg (300 KB, 1024x1024)
300 KB
300 KB JPG
>>101619183
I wonder why
>>
File: 1691953409207569.jpg (190 KB, 1024x1024)
190 KB
190 KB JPG
>>101619184
What do you mean? It works almost perfectly (ignore the double Y, I have an edited version with that part fixed)
>>
>>101619184
because it's retarded as fuk
>>
File: 1719839352978720.jpg (259 KB, 1792x1024)
259 KB
259 KB JPG
>>
>>101619187
first change being character 100-thousand-something, I meant.

And I just double checked the koboldcpp console {"input" blabla} stuff and it's identical, at least up until it truncates it with (+114043 chars).
>>
bros Mistral Large is fire
the baguettes fucking won
>>
>>101619223
>the baguettes fucking won
that's why it's under a non-commercial license, so that no providers will be able to host it or finetune it at scale, right, anon?
>>
>>101619223
bad license and no base model available kinda kills its potential
but yeah it's good
>>
File: dont_read_me.jpg (87 KB, 510x512)
87 KB
87 KB JPG
>>101619184
>>
File: vae.png (789 KB, 1561x480)
789 KB
789 KB PNG
>>101619184
Too few latent channels in the VAE. Text in images has really high information density (the curves in each letter, the placing, the spacing...), so you need many latent channels in the VAE to encode images of text correctly
>>
>>101619258
What's the downside of using more channels in VAE?
>>
>>101619287
You have to train the model from scratch (or just train the VAE and an adapter) and it's more expensive
>>
>>101619258
Wasn't there also something about how using characters instead of tokens as input helps with text?
Though presumably that's going to kill your context size.
>>
>>101619040
So you just want to generate some text following an instruction and then quit?
Save the prompt in a file formatted with the model's format
<|start_header_id|>system<|end_header_id|>

You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

What's the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

and call it with
>./llama-cli -f the_file_you_just_saved.txt -m model.gguf
and whatever other options you need for threads, context, batch size and all that.
If you'd rather put the prompt on the command line directly instead of in a file, use -p "All that stuff up there" instead of -f
Double check the format. I'm not sure if that's the correct one, but it works.
If you still have problems, show how you're running the command and what you're trying to do. Don't make people guess.
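And since you mentioned the prompt already lives in a bash heredoc, a sketch of the same idea without the separate file (same caveat about the exact template):
PROMPT=$(cat <<'EOF'
<|start_header_id|>system<|end_header_id|>

You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

What's the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
EOF
)
./llama-cli -m model.gguf -p "$PROMPT" -n 256
-n just caps the reply length; drop it if you'd rather let the model stop on its own.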
>>
>>101619436
>>101619436
>>101619436
>>
>>101618084
That's what "\n{{char}}:" as a stopping string is for.


