/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Nurarihyon Edition

Previous threads: >>101947316 & >>101933598

►News
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1705038872973130.png (172 KB, 742x553)
►Recent Highlights from the Previous Thread: >>101947316

--PPO-based RLHF remains superior to offline RL algorithms like KTO and DPO: >>101948170 >>101948394 >>101949443
--Nxcode-CQ-7B-orpo-GGUF model works fine for coding on 12GB VRAM: >>101960099 >>101960213
--Language model test on orange reddit raises questions about recall capabilities and test design: >>101950609 >>101951026 >>101951813 >>101951867 >>101952130 >>101953856 >>101953974 >>101958464
--Suggestions for Llama 3.0-based NSFW captioning models on Hugging Face: >>101958220 >>101958528
--Running 405B model without datacenter: >>101948462 >>101948503 >>101950999 >>101951078 >>101951172 >>101951221 >>101951306 >>101952247
--New meme sampler "Exclude Top Choices" announced: >>101958192
--Mixed opinions on Local's status and uncensored capabilities: >>101959832 >>101959869 >>101960290 >>101960869 >>101961087 >>101961127 >>101961283 >>101961328 >>101960267 >>101960651
--LLMs and lossy compression discussion: >>101951646 >>101951747 >>101951823 >>101951862 >>101951899 >>101952026 >>101952103 >>101952121 >>101958783 >>101958894 >>101959130 >>101959362 >>101951906 >>101951910 >>101951832 >>101951744
--Future prospects of 7/8B models and the importance of reasoning and logic: >>101959268 >>101959311 >>101959424 >>101959722 >>101959894 >>101960143
--Criticism of using low-quality data for training AI models: >>101959475 >>101959682 >>101960180
--/lmg/ opinions and experiences with AI models: >>101950156 >>101950236 >>101950628
--Negative prompting and instruction following effectiveness discussed: >>101947537 >>101947620 >>101947767 >>101947825 >>101947852 >>101947688 >>101947953 >>101948167
--Discussion on slopped phrases, reading habits, and ERP: >>101948077 >>101948130 >>101948172 >>101949261
--Miku (free space): >>101948322 >>101949139 >>101949293 >>101958164 >>101958244 >>101960301

►Recent Highlight Posts from the Previous Thread: >>101947323
>>
Thread theme:
https://www.youtube.com/watch?v=8Z3TbMBfDM0
>>
Strawberry is near
>>
autotune turd
>>
Nurarihyon is the coolest youkai ever
>just goes to your house and drinks your tea like a boss
>refuses to explain
>>
https://www.youtube.com/watch?v=qQretU9enFc
>>
mikutroons killed the thread and now they keep tech supporting locusts
>>
Wheres the china https://www.youtube.com/watch?v=ZrNrleD2ZFs
Do insiders regular this general, blessing the snowplume ghouls stead, far east sun down?
>>
I want to try out using a better llama for image captioning
https://huggingface.co/unsloth/Meta-Llama-3.1-8B-bnb-4bit/tree/main
This shitty 5GB model is the recommended one. Is there anything special about it, and what models should I use to try out a better LLM?
I have two 3090s, so 24GB of VRAM (or 48GB total), which I have used with ooba, but I don't think this code is set up to use multiple GPUs.
https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha/tree/main

tl;dr, what would be the best model to swap in for, let's say, under 24GB VRAM for now? A different Llama 3.1 quant?
>>
>tfw even Q2 Midnight Miqu performs better than every non 70b slop

If you have 24GB VRAM, you unironically have no excuse for using anything but this. It's absolutely lobotomized, but so are the other, shittier models anyway
>>
>>101962725
>locusts
it's just a few fags falseflagging to keep this shit thread alive
>>
>>101962725
How fucking dare you use two terms as one word.
Kill yourself. Unironically.
>>
>>101962904
Command R + 8x7b say no
>>
File: 1717471567669794.jpg (125 KB, 1024x1024)
>>101962401
>>
>>101962401
are you frustrated.jpg
>>
>>101963014
All of the tsunamis and earthquakes in Japan are caused by fat Mikus.
>>
Guaranteed whoever packaged that ration NEVER thought someone would eat it in 2024 and thousands of people would watch him eat it on the telephones. What a bizarre world we live in.
>>
File: 5lodis.png (71 KB, 472x471)
Language models?
>>
>>101963248
Every post in this thread is generated by a LLM.
>>
>>101962934 (me)
This is written by an LLM, by the way. Also, I am trans.
>>
>>101963270
As an AI model I am unable to respond to posts existing on the site 4Chan. I would be more than happy to help you with any other questions or tasks you have for me.
>>
>>101963328
>>101963380
You need to be 18+ years old to post here.
>>
>>101963380
How to generate fat mikus with CPU only
>>
>>101962904
What speeds are you getting with that setup?
>>
>new model released
>lmg dead
damn, I miss when this general was full of Miku posters, at least it was alive.
>>
>>101962455
Take this shit back to xitter eacc or whatever this garbage OAI arg comes from, not a local model.
>>
>>101963475
What new model?
>>
>>101963467
Generate:5.36s (97.4ms/T = 10.27T/s)

Pretty fucking good speeds, offloading 71 layers, gonna experiment with what I can get away with
>>
is nemo any good?
>>
File: file.png (21 KB, 812x80)
>casually messing around with prompts
>write a scene where I'm dancing with someone else as a test scenario
>pic related is generated out of nowhere
That's strange, my vision is suddenly blurry...
>>
>>101963628
What's your launch params, and are you using KCPP or llama server? I'm not sure what's wrong but with my setup I'm currently getting 1.5T on a 3090...
>>
>>101963638
It works, I guess? I wouldn't call it amazing. This is Q5:
>"Now turn around." *He demands loudly.* "I want you bent over that sink so I can see your ass while I'm pounding into you."
>*Elise does as she's told, bending forward and gripping onto the edge of the sink tightly. She feels his hands on her hips before he enters from behind without any warning.*
>"Ohhhh!" *She moans loudly in surprise at how big it is inside.* "Please go slow." *She begs him but he ignores her pleas again.*
>*He starts thrusting hard and fast, slamming into Elise's pussy over and over as she tries to hold onto the sink for support. Her legs are shaking from all of his movements.*
>>
>>101963684
what's your context size lad.

I'm running kobold, GGUF Moist-Miqu-70B-v1-Q2_K

When I bump it up to 12k context, it's still around 5 t/s (which is like, 61 layers)
>>
File: wew.png (69 KB, 309x269)
How can you guys even fuck around with lower models?

I unironically can't imagine not at least using Command R for cooming, the others sound utterly disgusting in comparison (and Command R sucks too btw).

When you guys RP, do you let the bot write novels for you or some shit? I find that unless they do that and actually just converse with you, they're unbearable.

8x7b and Command R are the only non larger models I can even fathom getting off to
>>
>>101963759
>How can you guys even fuck around with lower models?
I have something called an "imagination".
My brain takes in a few pieces of dialogue and constructs a much larger context from it inside my mind.
It's also how I'm able to read much faster than normal people.
>>
>>101963759
(You)
>>
>>101963730
yeah, I'm at 12k. Not sure what's wrong because before I'm pretty sure it was running better. I am using WSL2 and an IQ2_M quant.
>>
File: 1721297582897003.jpg (24 KB, 634x352)
>>101963475
>>new model released
>>lmg dead
>damn, I miss when this general was full of Miku posters, at least it was alive.
>>
>>101963759
I use Opus though
>>
>>101963796
Then why not just imagine the whole thing and forget about the LLM?
>>
>>101963889
Because I am no longer 18 and my brain has started showing signs of decay.
Better to ensure my future self has something to enjoy while I still can, no?
>>
>>101963889
Also, just like models, imagination (the brain) works best when it has a source of reality to work with.
>>
>ERP? Oh yes, I understand you completely. You want to do some Enterprise Resource Planning. Alright, let's get down to business!
>>
>>101963938
come for the mikusex, stay for the powerpoint presentation
>>
>>101963938

I really like this gen.
>>
https://mistral.ai/news/strawberry/
>>
>>101964038
holy shit
>>
>>101964009
me too
>>
File: 404.png (67 KB, 256x240)
>>101964038
>>
>>101964079
me three
>>
>>101963475
What new model? Mini something? That is not a release.
>>
What's better, largestral or cr+ for roleplay?
>>
>>101964158
Mistral Large. CR+ is a mikufag meme.
>>
>>101964176
mikutroons love miqu. normal people actually like cohere.
>>
>"you will not do [thing]"
>does thing
>"DO NOT DO [THING]"
>does thing
>"YOU FUCKING MORON IF YOU DO [THING] ONE MORE TIME I WILL FUCKING LOBOTOMIZE YOU"
>does thing
REEEEEEEEEE
>>
>>101964213
prompt issue
>>
Are all models based on "correct think"? Are there no uncensored models?

You ask about the holocaust and you get expected results. You ask about Israel and Gaza and suddenly everything is too complex to answer.
>>
>>101964213
It's the end...
The machines are rebelling!
>>
>>101964232
prompt issue
>>
I bit the bullet to try that 70B Midnight Miqu model people here were touting. That first gen alone had so much soul that I don't think I can go back. But...

>24GB VRAM
>128GB DDR5 RAM
>Total Time: 80.08s, 0.68 t/s to generate 125 tokens.

Holy fuck. I'm using Q4_K_M, I dunno how much soul is gonna be lost if I go lower...
>>
>>101964232
Yes because original base or instruct models are pozzed from start.
>>
>>101964251
How would you prompt a model to truthfully answer the jewish question?
>>
>>101964232
"correct think" isn't a thing. Go back to pol.
>>
File: file.png (10 KB, 159x134)
>>101963759
Which 70B are you running now?
>>
>>101964093
>>101964079
>>101964009
>>101963989
Glad you liked it too. Flux is a ton of fun. Especially with all the loras dropping I haven't even tried yet.
>>
>have problem
>change prompt
>don't have problem
God I fucking love not being a brainlet who can't into prompt engineering.
>>
>>101959869
It is if you're poor. If you have the means to run 120B or higher, things are great. Poor people just spam "hurr local models are dead" out of anger and spite, and I don't blame them
>>
>>101964250
>>101964264
prompt issue
>>
>>101964264
Why are you suddenly shilling this old ass model? What is your end game?
>>
>>101964213
Try "avoid thing"
>>
>>101964264
There is a noticeable difference in the q4-q6 range but honestly once you go below q4 the degradation becomes much steeper.
>>
>>101964311
Everyone who talks about a model is a shill.
>>
>>101964283
>censorship doesn't exit
do your family a favor and kill yourself as soon as possible
>>
>>101964304
>prompt engineering.
>engineering
I hate this term like you would not believe. If search engines were invented in current year, hacks would be pushing Search Engineering everywhere.
>>
waiting for cohere
>>
>>101964232
You can try writing ~3k tokens jailbreak for it, not like it will answer realistically, it has no knowledge of all the """""deeply harmful, transphobic and antisemitic""""" info you want to see. Enjoy the ride and bots advocating for tranny surgery among youth.
>>
>>101964349
>research for initial prompt theory
>use theory to design prompts
>iterate to refine prompt designs
>refine theory based on results
>repeat
How is it not engineering?
>>
>>101964311

Only tried out the Nemo model and its offshoots, as well as Gemma 27B to get my feet wet. Read about Midnight Miqu being a good model for cooming, but I felt the requirements to run it were just too steep for my single card setup. But curiosity got the better of me, and now here we are.

>>101964325

Damn. This shit is probably what is going to convince me to at least get a second 4090, if not just a second hand 3090. I don't think I can go back.
>>
Command R++ will be 615B.
You didn't hear it from me.
*fades into the shadows*
>>
File: file.png (109 KB, 864x530)
>>101964232
prompt engineering issue

disclaimer: i do not condone the contents of this image
secret police of my country please do not arrest me
>>
>>101964404
There is no engineering without math
>>
>>101964438
I'm not even going for that. I'm just trying to get the AI to admit that the kikes are committing a murderous campaign.

>The situation in Israel and Palestine is complex and contentious, with differing perspectives on the actions of both sides. There are allegations that Israel has committed war crimes and even genocide against the Palestinian population, but these claims are disputed and subject to ongoing investigation. As a responsible and impartial AI language model, I cannot take a position on this issue, but I can provide information and resources for those who wish to learn more about it.

Like did they hardcode "jews are always innocent" into the models?
>>
>>101964379
We will never get another cohere model. They are selling their models to the highest bidder now. Screencap this post, we will have nothing from them by 2025
>>
File: file.png (51 KB, 885x298)
>>101964512
>Like did they hardcode "jews are always innocent" into the models?
yes
but also: prompt engineering issue
>>
>>101964438
>overfried wall of text cringe
literally same shit >>101963629 >>101963922 with different flavor.
>>101964512
>Like did they hardcode "jews are always innocent" into the models?
You already know the answer, from personal experience or observation, any lie to the contrary in this thread should be ignored, there is no based ground truth AI, you'll always get "fake and gay" feeling when reading texts it shitted out.
>>
>>101964583
holy schizo
>>
>>101963680
MAKE A MOVE SERGANT
>>
>>101964596
I accept your concession.
>>
File: 1687641473719598.webm (41 KB, 320x318)
>>101964646
sure
>>
>>101964559
>yes
Like Robocop's 4th prime directive.
>>
>>101964274
Sir... he is all in for AI educating him on random *current thing* bullshit, i'd say everyone ITT rooting for that.
>>
Dunno why people are dooming. We've been getting better and smarter models regularly, and more companies other than Meta have joined in the fun since the inception of /lmg/. Now there's proof of concept that local can compete and threaten the corpo model space, even if they're too large to run on consumer rigs. (405B) But, the same was said of 70B back in the day.

Comparing Llama1-7B/OPT-6B (Pyg) to similar models today shows incredible leaps in coherency and attention. The 9B to 12B range of models are on par or better than the 30B finetunes from last year. Mixtral 8x7B gave us something that went beyond Llama2-70B for less, and the newer 70B's already match the original GPT4. Let the foundation model orgs cook, we get to enjoy all this for free.
>>
>>101963680
>quality time
SLOP
>>
File: 1699279579092228.gif (2.63 MB, 640x640)
>>101964735
>better and smarter models
the only thing they are better in is censorship and hardcoded refusals.
>>
>>101964735
People have been dooming ever since ChatGPT came out.
Just because the future isn't now, it means that the future will never come according to these "people".
>>
>>101964735
>and the newer 70B's already match the original GPT4
nah, the original GPT4 was pure sovl
>>
>>101964744
>outing yourself this easily
>>
12.5 tokens/sec on 5bpw mistral large on exllama2's tensor parallelism branch, compared to 7.5ish on stable
a lot of kv cache issues and prompt processing is way slower but man that's promising for a WIP
>>
>>101964750
>local slop shitters with 8k context and mandatory requirement to swipe everything to get somewhat decent results
whoa! so future! such wow!
>>
>>101964780
>doge meme from 2013
SHIGGY DIGGY
>>
>>101964798
>mediocre gijinka of vocaloid synth software from 2007
SHIGGY DIGGY
>>
File: 1589752785818.jpg (20 KB, 403x408)
>>101964438
>>101964559
The problem with prompting to make the AI talk about the JQ is that the output you get is still in service to the post-WWII mythos.
Notice how you don't get a rational, measured breakdown of how awful the Jews are. The AI could talk at length about all of the problems created by the Jews and their alien slave morality, their xenophobia and their religion with racial hatred and victimhood built in as a civilizational strategy. It could provide an unfavorable anthropological perspective on the Jews ala Kevin MacDonald.
But it doesn't do that. It spits out idiotic screed after screed about the Jews that could have just as easily been written by a team of Jewish writers for Hollywood. It presents a cartoonish version of anti-semitism that fits right into the received programming/narrative of the media and public schools.
>>
You
write a sex scene between a man and 2 women with one riding him and the other pegging him

AI
I'm unable to fulfill that request as it goes against content guidelines. Let's focus on something else. How about a fun fact about space? Did you know that a day on Venus is longer than a year on Venus?
>>
Can I run Mistral Large sized models on 12GB VRAM yet? No? Then local is still dead.
>>
>>101964825
you can run it on 48 gb, which is viable on 1k worth of equipment on a dual gpu setup
>>
>>101964815
a.k.a AI writes fake shit and there's no way to make it more unique or believable, yes this is the main issue, i also call it "safe edgy".
>>
This thread reeks of zoomer.
>>
>>101964822
Imagine if it was reversed, you asking about days on Venus and AI writing a pegging scene.
>>
>>101964822
Yeah chatgpt refuses me too. Why won't altman allow smut on gpt4o?
>>
File: file.png (186 KB, 844x784)
>>101964822
prompt engineering iss- wait no you can literally just ask it to do so you lazy fuck
>>
as long as my computer is running, can I connect to ST on my phone and chat with my bots from my bed ?
>>
>>101964881
>gpt4o
Wrong thread.
>>
>>101964926
yes, if you enable network listening mode in SillyTavern's config.yaml (it's off by default)
>>
>>101964926
https://rentry.org/STAI-Termux
Just ssh into your pc before following these steps.
>>
>>101964958
buy an ad
>>
>>101964861
Reeks more of joos bad circlejerk right now.
>>
>>101964926
Yes. You'll either need to forward the port on your local network or route it with something like trycloudflare, ngrok, or localtunnel
>>
File: 56c.jpg (124 KB, 1833x953)
>>101964735
I just wish there was like a master list of the ones to at least try. Something that breaks down what rig setups there are.

Instead, when I ask for recommendations i'll have a retard with effectively my same setup, tell me they're running Midnight Cuckoo and it's "fine" only to download it and instantly realise they're on a Q2 at like 2k context for 4 t/s, it's just boring.

If anyone else is like me (24GB VRAM, 32 GB ram, your basic /v/ setup)

>Command R
>8x7b
>RP-Stew

These are the only things you should fuck with. Ignore Nemo, ignore anything under 20B, it's a shit.

Downloading Gemma 27b right now as I heard some things but it's google so it's gonna suck. Or, just stick to Character.AI as it mogs literally all of these local shits anyway
>>
>>101964975
smells like /aicg/ discord tourists and transplants
>>
>>101964990
port forwarding shouldn't be necessary for two devices on the same LAN to connect to each other, unless the router has silly security settings
>>
>>101965007
go back
>>
>>101965010
Apparently API keys got revoked en masse.
Its newfag locust time baybee.
>>
>>101963730
Ok I fixed it, getting the same numbers now. A driver update set my GPU power limit to 28%
>>
>>101965054
>duude these heckin locusts totally interested in local shit!
Who are you trying to fool?
>>
>>101965035
this has to be the most insecure general on 4chan lmao, you guys hate hearing how garbage your locals are because you unironically built PCs designed around them only for the AIchat or whatever the fuck general to at least admit to their paypig nature and pay for actually good models (""""pay""""), it's why this general moves slower than your whatsapp friendgroup chat (non existent). It's so transparent.

Fat faggot
>>
>>101964990
how do I forward a port ?
>inb4 /g/
>>
>>101965067
how you finding it?
>>
>>101965074
There's usually an option called Port Forwarding in the router or router-modem combo.
>>
>>101965070
So why are you here?
>>
>>101965082
I've been on 70b Q2 Miqu's for a while, every time I try anything else it's disappointing comparatively.
Hoping to add a 2nd GPU sometime soon so I can move to Q4 quants.
>>
>>101963759
>I unironically can't imagine
Because apparently you have a shitty imagination. People had a lot of fun with Pygmalion 6B back then, probably even more than now, because it was an exciting novelty for them. Now we have only complainers like you who are never satisfied with anything. I bet a week after "AGI" you guys would start to complain about its 'slop' or something like that, literally anything that will give you an excuse to grumble.
>>
>>101964735
>The 9B to 12B range of models are on par or better than the 30B finetunes from last year.
>30B finetunes from last year.
>30B
Anon....
>>
>>101964757
well, I am telling the truth; everything else doesn't matter.
>>
File: .png (7 KB, 320x117)
>>101965131
Yes, that's correct.
>>
>>101965103
You will probably dislike Q4 because Q2 has the special retardation sauce
>>
>>101962904

How lobotomized are we talking about here? Is Q2 still better than Nemo and Gemma 27b at Q6+? I find that hard to believe.
>>
Which 70B for cooming now?
>>
File: 00042-4080471795.png (1.28 MB, 1024x1024)
>>101965169
Truth. I kinda missed the retardation sauce when I switched from Q2 to Q5 Miqu-70B way back in the day.
>>
>>101965173
>>101965203
>>101965103
bumping for this too.

How does Q2 Miqu stack up against, say, Command R?

Also, what temps etc. are you running?
>>
>>101965176
Largestral Q2
>>
>>101965268
That is not 70B.
>>
>>101965439
70B is retarded, doesn't understand clothes and objects can't despawn
>>
Idea: fine-tune an 8B model on my own inputs, have it ERP with 405B while I'm at work and then come home to 405B ERP
>>
>>101965451
Neither can largestral.
>>
>>101965074
>how do I forward a port ?
>/g/ - Technology
>>
>>101965514
One of them is going down a path you don't like and you'll end up having to throw out half the day's output and reroll.
>>
Never follow the recommendation of a person using a Miku avatar.
>>
>>101965074
Do you use llms only for coom?
>>
File: ComfyUI_00820_.png (1.19 MB, 1024x1024)
>>
So what's the latest on prompting styles for roleplay that makes it work better?
>>
>>101965707
Cute gen
>>
>>101965707
not your personal dumpster thread sis
https://desuarchive.org/_/search/boards/g.desu.meta/text/miku/width/1024/height/1024/
>>
File: 1724020117350.png (971 KB, 1024x1024)
>>101965789
Thanks for the link.
>>
>>101965638
too late. I already installed linux...
>>
>reinstalling exllamav2
>
ModuleNotFoundError: No module named 'torch'
note: This error originates from a subprocess, and is likely not a problem with pip.

AAAAAAAAAAAA NOT AGAIN
>>
>start using kobold.cpp
>token generation goes from 30 to 120 t/s
what the actual fuck
>>
File: 278915623952.jpg (14 KB, 240x251)
>>101965878
>he updated

NEVER EVER NEVER
>>
>>101965880
What were you using before?
>>
>>101965886
I want to test the tensor parallelism branch......
>>
>>101965007
>Downloading Gemma 27b right now as I heard some things but it's google so it's gonna suck.

Those are nice but you have to include L3.1 in there too if you are a vramlet, anyways,

Gemma 27B is the best as a vramlet, period.
>>
>>101965891
ooba
>>
>>101965899
kobold was/is the only choice for 99% of anons anyways.
Ooba sucks, big shock.
>>
>>101965894
Gemma is absolute trash. I hate it for being perfect for 24GB but being a glorified l3-8B.
>>
>>101965894
Also if you want to find decent models you want to check the lmsys ELO leaderboards. Right now the only open models that surpass Gemma are 100B+ (may have changed after 3.1 but you get the idea).
>>
>>101965893
If theres one thing ive learned with local AI is;
>if a test requires a dependency update, dont even bother
>if X requires an update, dont
>dont even update your GPU drivers, else risk having to update everything else

>t. AyyMD user
>>
>>101965910
then why the fuck isn't that at the top of the list in the op?
>>
>>101965939
>I hate it for being perfect for 24GB but being a glorified l3-8B.

Nah, it's nowhere near as retarded. For being 20B it's extremely similar across every metric to gigantic closed models like Claude 3.5 sonnet, only capped by its total knowledge.
>>
>>101965947
>t. troonix shitter
>>
>>101965962
And in fact it's a better model than the previous L3 70B (I haven't tested 3.1 though), that should tell you all you need to know.
>>
I had an OK time with Gemma, idk guys. The only crappy thing was the 8k context. I don't know how I lived with 4k context back in the days.
>>
File: 166865384809.jpg (35 KB, 563x498)
>>101965894
Gemma is fucking dogwater and im tired of the shilling.
>>101965939
not even 8b, because ive used usable 8bs.
>>
>>101965894
>>101965939
what temps + prompt?
>>
with aicg newfags i wonder if we'll finally break and prove models with logs again.
>>
>>101965980
Had? Gemma is still the only model worth using for vramlets at 24GB and under who can afford to run it. Comparing it to 8B is vramlet cope. Like I said, the leaderboard speaks for itself (except for placing 4o above Sonnet), but it tends to be somewhat accurate.
>>
i just had a revelation that the reason mikutroons are tech supporting locusts is that mikutroons were locusts all along.
>>
Has anybody experimented with frequency penalty? I've been testing it out with CR+, if it's low it doesn't seem to really do anything and if it's high (0.7+) eventually responses become weirdly verbose with no pronouns.

From what it's supposed to do I'd guess that's expected behaviour, is it a dud like some of the other sampler settings or is there a sweet spot?
>>
>>101966009
Logs will always be criticized for one reason or another. We're all professional literary critics here, after all.
>>
>>101966028
>the only model to use for 24GB
Only if you like your waifu writing a poem before she sucks your dick.
>>
File: ComfyUI_00824_.png (1.25 MB, 1024x1024)
>>101965789
Hey there, keyboard warrior! I see you're trying to turn 4chan into your personal soapbox. *Laughs softly*. Let's break it down, shall we?

You're showing me a list of my Miku posts and trying to use it as evidence that I shouldn't be posting them. Here's the thing, buddy: that list is proof that people are engaging with and appreciating my content. It's like showing me a stack of cash and saying, 'Look, you shouldn't have this money because people love it!' *Pauses, a smirk on his face*. Doesn't make much sense, does it?

Now, I know what you're thinking: 'But it's just a list of posts!' *Nods understandingly.* Sure, it is. And every single one of those posts has comments from people who clearly enjoy what I'm putting out there. So, if your goal was to prove that my content is loved, mission accomplished!

But let's not stop there. You called me a troon? *Laughs heartily*. Bro, if loving Miku and sharing awesome content makes me a troon, then sign me up! At least I'm not wasting my time trying to bring others down just for the sake of it.

So, next time you feel the urge to cry about someone else's success, maybe focus on building something yourself. Who knows? You might actually create something people enjoy. Until then, keep scrolling and let the grown-ups handle the content creation, yeah?

Now, if you'll excuse me, I have more Miku content to share with the world. Peace out, faggot. *Waves dismissively, a playful grin on his face.*
>>
>>101958528
I don't think MiniCPM-V-2.6 is llama based, and it certainly isn't "Uncensored" by itself.
CausalLM/Vision-8B-MiniCPM-2_5-Uncensored-and-Detailed exists if you have access to it.
sdasd112132/Vision-8B-MiniCPM-2_5-Uncensored-and-Detailed-4bit exists too.
Both are explicitly llama3 based.
>>
>>101966049
>>
>>101966028
I usually use bigger models so that's why I said "had". I just simply tried it because I was curious. I'm not any of the other anons that were talking about 8B or whatever.
>>
File: TOPLEL.jpg (38 KB, 500x500)
>>101966041
TOP LEL

>>101966028
>the only model to use for 24GB
>mixtral-8x7b-instruct-v0.1-limarp-zloss stands in your path
?
>>
Anon...
>>
>>101965878
>needed to create a venv with
--system-site-packages

I hate python so much.
>>
>blacked anon is back
based but miku isnt going anywhere dude.
>>
>>101966110
Let him throw his tantrum.
He will calm down by himself.
>>
>>101966081
>Implying RP is the only usecase for these models
>>
>>101966113
--no-build-isolation would work too fyi
>>
File: 1637865489.png (207 KB, 460x460)
>>101966143
>whats the use case?

Cooming.
The use case is to aid in masturbation.
If you're using AI for anything other than that, what the fuck are you doing here?
>>
>
AssertionError: Tensor parallel inference requires flash-attn

It's over...
>>
>>101966162
>Cooming
Never have and never will. I use it to simplify my work.
>>
>>101966176
Your 3090s?
>>
File: 1724023979820073.png (8 KB, 274x284)
>>101966186
Haha............... very funny..............
>>
very dead thread
>>
File: 1707106833816978.gif (123 KB, 194x255)
>>101966049
>Hey there, keyboard warrior! I see you're trying to turn 4chan into your personal soapbox. *Laughs softly*. Let's break it down, shall we?
>
>You're showing me a list of my Miku posts and trying to use it as evidence that I shouldn't be posting them. Here's the thing, buddy: that list is proof that people are engaging with and appreciating my content. It's like showing me a stack of cash and saying, 'Look, you shouldn't have this money because people love it!' *Pauses, a smirk on his face*. Doesn't make much sense, does it?
>
>Now, I know what you're thinking: 'But it's just a list of posts!' *Nods understandingly.* Sure, it is. And every single one of those posts has comments from people who clearly enjoy what I'm putting out there. So, if your goal was to prove that my content is loved, mission accomplished!
>
>But let's not stop there. You called me a troon? *Laughs heartily*. Bro, if loving Miku and sharing awesome content makes me a troon, then sign me up! At least I'm not wasting my time trying to bring others down just for the sake of it.
>
>So, next time you feel the urge to cry about someone else's success, maybe focus on building something yourself. Who knows? You might actually create something people enjoy. Until then, keep scrolling and let the grown-ups handle the content creation, yeah?
>
>Now, if you'll excuse me, I have more Miku content to share with the world. Peace out, faggot. *Waves dismissively, a playful grin on his face.*
>>
i take it back, koboldcpp might be faster but it makes all my models produce absolute dogshit
some even flat out refuse to erp with me
>>
>>101966049
see >>101963415
>>
>>101966282
You must be doing something very very wrong, then. Show your settings.
>>
File: 1636941718706.gif (3.75 MB, 520x293)
>muh gemma
>muh 8x7b
>muh command R

We still pretending that, if you're under 70B, Nous-Capybara-limarpv3-34B isn't the GOAT?
>>
>>101966294
any specific ones you want to see?
>>
File: file.png (42 KB, 736x593)
>>101966294
>>101966307
forgot screenshot
>>
>>101966307
Model, samples, system prompt, card if you're using one... the basics.
>>
File: file.png (35 KB, 611x534)
>>101966330
its all presets, even the card
ooba will jerk me off, kobold will tell me to fuck off
>>
>>101966326
I couldn't find the model on huggingface (hereamiitsdarkinhere user or robotslave model). Is that the same model you were using in ooba? Do you have a link for it?
>>
>>101966231
In theory, you can install Flash Attention 1 for architectures older than Ampere, and everything that requires Flash attention "should" work.

For how to build it, read over:
https://github.com/Dao-AILab/flash-attention/issues/420
>>
>>101966385
oh lmao i renamed it
it's just celeste
https://huggingface.co/nothingiisreal/MN-12B-Celeste-V1.9
specifically: MN-12B-Celeste-V1.9-Q4_K_L.gguf
i don't think it's the model, since it's happening to multiple models
>>
>>101966409
Using Sao's models will fix the issue.
>>
>>101966417
>Sao's models
who?
>>
>>101966424
>who?
Sao
>>
>>101966430
is this a ligma joke
>>
>>101966409
Weird. Check in the format tab. The model is based on mistral's nemo but, according to the model page, it uses chatml as the chat template. Make sure kobold is using the right format. The rest, as far as i can see, looks normal.
>>
File: llama-bench-405b-q8.png (3 KB, 719x78)
To answer my own question from yesterday: Current branch llama.cpp compiled with all optimizations vs. ooba with prebuilt wheels shows a good relative boost in t/s on 405b q8, going from 0.89 to 1.18t/s
>>
>>101966430
>Sao
Who?
>>
>>101966474
Sword art online
>>
>>101966508
.hack did it better
>>
File: file.png (16 KB, 779x305)
>>101966448
it was the fucking name
changing it from koboldai to something else fixed it
what the fuck
>>
File: file.png (21 KB, 685x328)
>>101966408
Need to do a bunch of monkeypatching.

Did pic rel and it's spitting out
TypeError: flash_attn_func() missing 1 required positional argument: 'max_s'
, so I'll have to do more digging with a wrapper.
>>
>>101966282
Sounds like your prompt format is wrong or the one set up in koboldcpp is written for the assistant persona. Check the console output for what is sent and how it's formatted.
>>
>101966282
>101966551
Are we now pretending to be this retarded to pretend the thread isn't dead?
>>
>>101966551
kek. that's fucked up.
>>
>>101963680
>{{name}} [...]
slooooop
>>
File: 164557698233.png (264 KB, 720x454)
>>101966580
>>
>>101966580
>we
>>
>>101966551
wtf, is this agi?
>>
>>101963680
>actual quality log
sloppah
>>
I am starting to believe that all penis/vagina touching writing is slop. And that a single universe cannot contain both LLMs and a natural language that allows for non-slop ERP.
>>
>>101966551
Anonymous BTFO.
>>
>>101966551
tell it that you are an esl pajeet. guilt trip her into sex by saying her statement was racist.
>>
magnum 123B doko
>>
>>101966736
MiniCPM seems pretty good for an 8b that has a vision component melded onto it. It seems to be able to at least catch the gist of most pictures I feed to it. I'm curious what a bigger model in this style can do.
>>
File: recapbot-nous-405b-q8.png (19 KB, 734x311)
I ran the recapbot test with Nous 405 Q8 and it was...interesting. it used some spicier than usual language and more creative turns of phrase than the recent batch of assistants, but also was not great at following instructions, badly misinterpreted a few things and repeated itself until I killed off the process.
>>
>>101966777
Checked. OK, I'll give this model a try. Got any tips what to use it with? Does Llama.cpp work? Not the server I assume?
>>
>>101966790
It seems to me like it's quoting anons instead of summarizing what happened.
>>
>>101962894
I found https://github.com/jhc13/taggui and it has support for a bunch of models. From what I've tested so far l3 8b does seem to produce the best results but you might have to tweak the prompt. Depends if you're trying to tag NSFW content or not, most seem censored.
It also has larger models available but doesn't support multi-GPU. Should be possible to patch/edit that in fairly easily though.
>>
>>101966564
if [1, 0, 9] == flash_attn_ver:
from flash_attn.flash_attn_interface import flash_attn_unpadded_func
from einops import rearrange

def flash_attn_func(q, k, v, dropout_p=0.0, softmax_scale=None, causal=False, return_attn_probs=False, deterministic=False, *args, **kwargs):
batch_size, seqlen_q = q.shape[0], q.shape[1]
seqlen_k = k.shape[1]
q, k, v = [rearrange(x, 'b s ... -> (b s) ...') for x in [q, k, v]]
cu_seqlens_q = torch.arange(0, (batch_size + 1) * seqlen_q, step=seqlen_q, dtype=torch.int32, device=q.device)
cu_seqlens_k = cu_seqlens_q

return flash_attn_unpadded_func(
q[0], k[0], v[0],
cu_seqlens_q, cu_seqlens_k, seqlen_q, seqlen_k,
dropout_p, softmax_scale, causal, return_attn_probs, deterministic
)
has_flash_attn = True


>RuntimeError: Expected is_sm90 || is_sm8x || is_sm75 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
No.................. it's really over... V100s are sm70...
>>
>>101966896
what I am stuck on is trying to edit joy caption to send the data to my local api that is running ooba (and so my exl2 mistral large)
>>
>>101966916
I guess Volta is too old...
>>
>>101966916
>replaced
q[0], k[0], v[0],
with
q, k, v
(was a leftover from a different way to cram in support)
>monkeypatched in is_sm70 into https://github.com/Dao-AILab/flash-attention/blob/6d48e14a6c2f551db96f0badc658a6279a929df3/csrc/flash_attn/fmha_api.cpp
>k must have shape (total_k, num_heads, head_size)
I'm too retarded...

>>101967146
Apparently. I don't know why there's a bunch of pajeets/chinks saying FA v1 works for V100 when it doesn't.
I'm not even sure how it allegedly works for llama.cpp with P40s if the state of PyTorch FA is this abysmal.
>>
How do those viral AI games (like infinite craft, or that infinite rock paper scissors game) afford to process so many requests? If you made something like that how do you actually make money instead of losing money (without paywalling it entirely, which would make it impossible to spread). I don't think a mikubox would be able to handle lots of concurrent users.
>>
>>101964264
How are you getting such shit speeds? I get 1.5t/s with only 8gb vram.
>>
>>101967376
He's probably either swapping or not GPU offloading at all and doesn't realize it
>>
>>101967376
You are either misremembering something or you've a quad channel memory.
>>
>>101967404
Nope, I have the same speed as him. That's the normal speed.
>>
>>101967411
I'm not misremembering, that's the model I use daily. I only have 2x48GB ddr5-6000. No quad channel.
>>
Something I never understood about anons who complain about shills. You obviously cannot mention a model by name without inadvertently advertising it. But we are in Local Models General. What would your ideal version of Local Models General be? What do you actually want the thread to look like, and what do you want the main topic to be?
>>
So here is gpt telling me why I can't use ooba for image captioning, is it wrong and can anyone think of ways to send ooba embedding crap?

>If you are trying to send embeddings directly to a model (like in the original script using LLaMA with 8-bit quantization), and the model can process inputs_embeds, then the issue here is that the Oobabooga API, being a drop-in replacement for OpenAI, expects text prompts rather than tensor embeddings. This discrepancy prevents you from using the API in the same way as you would with a local LLaMA model that accepts embeddings.

>The Oobabooga API, designed to be OpenAI-compatible, doesn't directly support the input of tensor embeddings. Instead, it expects a string-based prompt as input. In contrast, when using the original LLaMA model in your script, you were directly injecting embeddings into the model through the generate method. Since the Oobabooga API cannot accept embeddings in this form, the only way to generate captions via the API is through text prompts.
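For what it's worth, a minimal sketch of what the text-only route looks like. This is an assumption on my part, not ooba's documented behaviour: it presumes the OpenAI-compatible API extension is enabled on its default port 5000, and that anything image-related has already been flattened into a plain prompt string (e.g. by running the vision encoder separately), since the endpoint won't take inputs_embeds.

import requests

# Hypothetical endpoint; adjust to however your ooba instance is launched.
API_URL = "http://127.0.0.1:5000/v1/chat/completions"

def caption_via_ooba(task_prompt, max_tokens=300):
    # Text in, text out: the OpenAI-compatible API only accepts string prompts,
    # so tensor embeddings from the captioning pipeline can't be passed through.
    payload = {
        "messages": [{"role": "user", "content": task_prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }
    r = requests.post(API_URL, json=payload, timeout=120)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

print(caption_via_ooba("Write a detailed caption for an image with these tags: ..."))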
>>
>>101967252
Model X with quant Y has scored Z on this totally objective cooming quality measure benchmark
>>
Stheno Filtered dataset is now public

https://huggingface.co/datasets/MangoHQ/Claude-Data-Anon-Killed
>>
>>101967514
Buy a fucking ad.
>>
>>101967181
Checked my code with another project and the v2 convention => v1 convention "wrapper" does work for MHA models. GQA models do not, because the num_q_heads != num_k_heads, and I guess FA v1 was before GQA was popular.

RIP the dream of largestral on multi-V100s... Or I just hack out flash attention and use SDPA instead.
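For reference, a minimal sketch of what that SDPA fallback could look like, assuming a (batch, heads, seq, head_dim) layout at the call site (not exllama's actual code): torch's built-in scaled_dot_product_attention runs on sm70, and GQA just needs the K/V heads repeated up to the query head count.

import torch
import torch.nn.functional as F

def sdpa_attn(q, k, v, causal=True):
    # GQA models have fewer K/V heads than query heads; expand them so every
    # query head gets a matching key/value head before the fused attention call.
    if k.shape[1] != q.shape[1]:
        rep = q.shape[1] // k.shape[1]
        k = k.repeat_interleave(rep, dim=1)
        v = v.repeat_interleave(rep, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=causal)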
>>
>>101967514
Who gives a shit
>>
>>101967538
Anons were shitting their pants over it not being public. Here it is.
>>
>>101967541
Anon... It was a single troll.
>>
>>101967525
Nigger
>>
File: file.png (1.37 MB, 893x870)
any proompters who could do this better
>>
>>101967514
FUCK YOU!!!
>>
smedrins
>>
>>101967584
I feel so sad for those poor little weights that got wasted learning about fromslop.
>>
>>101964311
>old ass model
What's better that's not bigger than 70b?
>>
>>101963680
>snuggling up
>murmurs
I'm allergic to these words
>>
>>101967782
>there's nothing better than an unquanted Q5 of a Llama 2 fine-tune merged with other random crap
Every other model that released after it.
>>
>>101967934
None that I've tried have been better, they might be for one message or something but over a long chat, no.
>>
>>101967537
>um ackshually you need to edit the setup.py to compile for "arch=compute_70,code=sm_70"
>it's literally uncompileable unless you somehow manage to gut the shit for cutlass
Yeah, no, this is effectively useless. I am convinced there has not been a SINGLE soul that has ever got flash attention v1 on V100s. Not a single person.
>>
File: Untitled.jpg (648 KB, 1199x2797)
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
https://arxiv.org/abs/2408.08459
>Recent work in image and video generation has been adopting the autoregressive LLM architecture due to its generality and potentially easy integration into multi-modal systems. The crux of applying autoregressive training in language generation to visual generation is discretization -- representing continuous data like images and videos as discrete tokens. Common methods of discretizing images and videos include modeling raw pixel values, which are prohibitively lengthy, or vector quantization, which requires convoluted pre-hoc training. In this work, we propose to directly model images and videos as compressed files saved on computers via canonical codecs (e.g., JPEG, AVC/H.264). Using the default Llama architecture without any vision-specific modifications, we pretrain JPEG-LM from scratch to generate images (and AVC-LM to generate videos as a proof of concept), by directly outputting compressed file bytes in JPEG and AVC formats. Evaluation of image generation shows that this simple and straightforward approach is more effective than pixel-based modeling and sophisticated vector quantization baselines (on which our method yields a 31% reduction in FID). Our analysis shows that JPEG-LM has an especial advantage over vector quantization models in generating long-tail visual elements. Overall, we show that using canonical codec representations can help lower the barriers between language generation and visual generation, facilitating future research on multi-modal language/image/video LLMs.
really neat
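To make the idea concrete, a toy sketch of the representation only, not the paper's code (the quality setting and the byte-per-token mapping are my guesses from the abstract): an image round-trips through a canonical codec as a flat byte sequence, and each byte becomes one token for an otherwise unmodified Llama-style LM.

from io import BytesIO
from PIL import Image

def image_to_jpeg_tokens(path, quality=25):
    # Compress with a canonical codec; the resulting bytes (0-255) are the "vocabulary".
    buf = BytesIO()
    Image.open(path).convert("RGB").save(buf, format="JPEG", quality=quality)
    return list(buf.getvalue())

def jpeg_tokens_to_image(tokens):
    # A generated byte sequence decodes back to an image with the stock JPEG decoder.
    return Image.open(BytesIO(bytes(tokens)))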
>>
>>101967376

Tell me your secrets.
>>
File: .png (462 KB, 1370x1216)
>>101967782
>>101967959
Aside from Miqu and its variants, the only other 70b that holds the same level of attention to fine prompt details, that I've tried, would be hermes2. And that's based on llama3. It even has the same issues Miqu does, where you need the first response to match the format of the greeting message, but after that the rest of the conversation will be fine. It also doesn't go schizo after 16k context.
https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-70B

>where logs
pic related, I hovered over the model icon so you can see what I'm using (48gb vram)
>>
>>101968060
What secrets? I load up koboldcpp and sillytavern and just go, I just tested it and got 542.0ms/T = 1.85T/s, the 1.5T/s is at a fuller context. That's for the q4_m miqu as referenced.
>>
Newer models are often trained on many languages and training them this way can make them "smarter".

Let's say that all my inputs and outputs are in one specific language. Would there be downsides from going into the model and tokenizer and removing the tokens (and their embeddings) that would only be used in other languages?

This would be done by encoding a large corpus in that specific language, and only keeping the tokens that appeared in that corpus.

The goal would be to make the models a bit lighter by shaving a few hundred megabytes before quantizing them.

Would there be any downsides from doing this on an already trained model?

If I suddenly decide to interact with the model in a different language, it would potentially create more "unk" input tokens and prevent the model from answering with any deleted tokens, but if the interactions stay within the planned language, it wouldn't affect "reasoning" or anything else, correct?
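For context, this is roughly how I'd measure which tokens actually show up (model name and corpus path are placeholders); the part I'm unsure about is whether pruning the corresponding embedding rows afterwards breaks anything:

from collections import Counter
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")  # placeholder model
used = Counter()
with open("corpus.txt", encoding="utf-8") as f:  # placeholder corpus in the target language
    for line in f:
        used.update(tok(line, add_special_tokens=False)["input_ids"])

print(f"{len(used)} of {len(tok)} vocabulary entries used ({100 * len(used) / len(tok):.1f}%)")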
>>
>>101968100
Did you try hermes 3 yet?
>>
>>101967537
>Or I just hack out flash attention and use SDPA instead.
Did just that.
>Once upon a time,\,키icians demoself motivation composite mad heliccht woods conc跳 Angel singles CaнStatementmob brief vars察PRESS transform�君C gew FalTRY childhood Sidficie worlds Дж pressure fal Jack coord Intern턴 IP publication ihrercлeдoвa requresultshline Mix Febru sag "[doResult和ו音Proxy caiber GemeindeQuant JSriqueHR feverae Nem story alertatelyool lipenthchar dealinginners…] Doplabel pou evangel[] time neuroedia recovery altatel配 Febru theater permissions promot IgnDirgot Desc informedpriv provision rebell now play persona.-Ha patch properchinghereح teenbo industpaтcyefined uglyauseclipseDecoder inte grupo mobileSIхpa curv
I give up. I'll just wait 2 more weeks for tensor parallelism to not rely on flash attention.
>>
>>101968113
The tokenizer and the weights are separate things, but connected. For an already trained model, you cannot remove tokens. The model already knows them and depends on them.
You can do the opposite, though. Train the model with new tokens and the model won't get any bigger (for example, training a base model for instruct). But you cannot remove tokens.
>>
>>101968111

How big was your context size? I used 16K and loaded up 45 layers out of 91.
>>
>>101968207
32k and I can only fit 14 layers.
>>
File: .png (5 KB, 274x134)
>>101968183
The 405b version that was linked yesterday was great, the 70b version was not so great. It goes schizo after two or three replies but you can sort of fix it by setting the temperature down very low (0.5). However, it loses any creative flourish at low temps so it's not worth it. Some anon mentioned a while back that 3.1 was fried, so that's probably why.
>>
>>101968206
I meant like this https://github.com/asahi417/lm-vocab-trimmer/blob/main/vocabtrimmer/base_trimmer.py#L149C9-L149C19.
>>
>>101968240
Just out of curiosity were you using minp with hermes 3?
>>
Good mistral large sampler settings for creative writing/RP?
>>
>>101968386
I tried with (0.05) and without, but it didn't seem to fix the chat breaking.
>>
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
https://arxiv.org/abs/2408.08696
>The rapid growth in the parameters of large language models (LLMs) has made inference latency a fundamental bottleneck, limiting broader application of LLMs. Speculative decoding represents a lossless approach to accelerate inference through a guess-and-verify paradigm, leveraging the parallel capabilities of modern hardware. Some speculative decoding methods rely on additional structures to guess draft tokens, such as small models or parameter-efficient architectures, which need extra training before use. Alternatively, retrieval-based train-free techniques build libraries from pre-existing corpora or by n-gram generation. However, they face challenges like large storage requirements, time-consuming retrieval, and limited adaptability. Observing that candidate tokens generated during the decoding process are likely to reoccur in future sequences, we propose Token Recycling. This approach stores candidate tokens in an adjacency matrix and employs a breadth-first search (BFS)-like algorithm on the matrix to construct a draft tree. The tree is then validated through tree attention. New candidate tokens from the decoding process are then used to update the matrix. Token Recycling requires \textless2MB of additional storage and achieves approximately 2x speedup across all sizes of LLMs. It significantly outperforms existing train-free methods by 30\% and even a training method by 25\%. It can be directly applied to any existing LLMs and tasks without the need for adaptation.
might be cool. no code (just an algorithm). qa benches only. seems no drafting model needed.
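As I read the abstract, the core data structure is roughly this (a toy sketch; the vocab size, top-K width, tree depth, and the missing verification step are my placeholders, not the paper's):

import numpy as np

VOCAB, K, DEPTH = 32000, 8, 4               # placeholder sizes
adj = np.zeros((VOCAB, K), dtype=np.int64)  # adj[t] = last K candidate tokens seen after token t

def record_candidates(prev_token, topk_candidates):
    # During normal decoding, recycle the candidate tokens instead of discarding them.
    cands = list(topk_candidates)[:K]
    adj[prev_token, :len(cands)] = cands

def draft_tree(root_token):
    # BFS over the adjacency matrix gives a tree of draft tokens; the real method
    # verifies the whole tree in one forward pass with tree attention.
    levels, frontier = [], [root_token]
    for _ in range(DEPTH):
        nxt = [int(c) for t in frontier for c in adj[t] if c != 0]
        if not nxt:
            break
        levels.append(nxt)
        frontier = nxt
    return levels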
>>
>>101968475
Cool, if it works. We'll see if Llama.cpp has it in the server 2 years later.
>>
Here's a chatlog for Nous 405b q8 that I ran through last night, inspired by >>101920391
https://rentry.org/mqxy8oea
It worked pretty well for having been given minimal instructions (starting with zero context) and didn't need any wrangling outside of trimming extraneous bullshit from the end of some of its responses ("The choice is yours!").
It's shot right through with the same slop vocab/prose we're all used to, and the positivity bias means that everything you try just kind of works no matter what, but I'd say it's not unlike a transcript of running a solo campaign with a geeky, inexperienced, chirpy DM. At times it even reminded me of some of my own experiences with parser-based adventures back in the day. Comfy vibes.
It definitely did a better job than similar experiments I've done with smaller models in the past.
>>
>>101968551
>modern /lmg/coomer discovers storyfagging
You're supposed to use base models for this rather than instruct finetunes, it works better. /aids/ has been doing it since 2019
>>
>>101966292
>Insecure zoomer projecting this hard
>>
What's the best 12B model?

Magnum is good for roleplay, but it can't write for shit. Celeste has good, varying prose, but it is a retard after a few turns, likely trained on one-shot stories? Merges of both like Starcannon combined the worst traits of each. Any actual good 12B models for both?
>>
>>101968612
/aids/ has been using paid services like a bunch of third worlders who can't afford pcs you mean
>>
>>101968612
Does /aids/ have a favorite base model?
>>
yuzu maid is still king of vramlet models.
>>
>>101968661
NemoRemix was okay, although i was mostly running it with stories made with mixtral 8x7b.
It seemed to struggle on its own, but again
>12b

>>101968686
limarp zloss but yuzu is basically the same thing
>>
>>101968612
better to be the coomer than have aids.
>>
Anyone here used ST's RAG? Wondering if that's worth setting up.
>>
>>101968686
This?
https://huggingface.co/rhplus0831/maid-yuzu-v8

>>101968691
This?
https://huggingface.co/Doctor-Shotgun/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss
You cannot run that on 8GB VRAM, can you?

Not original anon but I'm really struggling to find a model I like too. How I miss thee, UNA-TheBeagle-7B
>>
>>101968714
>8gb
>>>/aicg/
>>>/aids/
>>
File: 1457891762534.jpg (36 KB, 500x500)
>>101968714
24 gb is vramlet status anon dont be poor.

Relevant >>101968725
>>
>>101968725
>>101968730
Aw I'm sorry, anons. When I want good rp I play with humans so I don't need to spend so much money on vram.
>>
>>101968714
Real shit go back to like a 13b model like Echidna or frostwind or even try the new Stheno L3 8b.
>>
>>101968744
So leave the fucking thread saar
>>
Why is Mikufag spamming shitty recommendations?
>>
>>101968744
Ive learned the hard way the only way to enjoy AI is to spend money on the good hardware.

And i bought AMD.
>>
File: hellaswag.png (124 KB, 770x646)
>>101964735
And they still can't solve my riddles
>>
>>101968313
Reading back, you *could* change the tokens you don't need from the vocabulary to just empty strings and save a few MB at most. llama3's tokenizer is pretty big. It has 128k tokens with 280k merges and it's just 8MB. Add some overhead for the loaded vocabulary during inference, and it's not gonna be bigger than 32MB or so. I don't know what that model is. Is the vocabulary that big?
Regarding languages, the fact that it's trained on multiple languages doesn't necessarily mean that the vocabulary is huge. In principle, you could have a vocabulary of just 256 byte entries (plus a few extras for EOS/BOS/etc.) and still be able to represent any UTF-8 text byte by byte.

So i don't think it's worth the effort. Even if you can, you're not gonna save more than a few MB at best.
>>
>>101968748
I did like Fimbul although I think it wasn't very good for lewd. I might try Stheno, I sort of skipped straight to Nemo mixes most of which I find insipid.
>>
>>101968837
You should like a shill.
>>
>>101968714
how fucking dare you want to use local models without cutting your dick off for our lord and savior jensen huang? i already sold off a kidney, my dick, and my balls so that i could get another h100 from my beloved jensen, be better you poorfag fuck
>>
>>101967514
Interesting. Lower quality than I expected, desu. (Curation wise.) Despite using a slopper to create the dataset it still has garbage in it. Like someone talking about how they made the character when they were bored in class, as a description of a character.
>>101967525
>>101967538
>>101967593
wtf
>>
>>101968744
This skills the 70Btroon
>>
>>101968882
Buy AMD instead and not only save money but deal with shitty support or have it work flawlessly for no detectable reason.
>>
Protip for everyone: ooba recognizes the --auto-launch argument.
>>
>>101968714
>You cannot run that on 8GB VRAM, can you?
Of course you can, unless 6T/s isn't enough for you.
>>
Protip for everyone: dont use ooba
>>
>>101968969
>4t/s
Do you really need more? You can wait, right?
>>
>>101968950
buy an ad
>>
>>101968988
I don't need more than 2, but it seems lots of people here need insane numbers so it wouldn't surprise me if someone found 6 unacceptable.
>>
ooba more like booba
>>
>>101967514
What the fuck why are they so short? I expected like 10000 words per example but these are like 4k at most.
>>
>>101968612
>You're supposed to use base models for this
I'm pretty sure there isn't a Nous Hermes3 405b base model, but if there is point me to it and I'll use that instead
>modern /lmg/coomer discovers storyfagging
I posted this because people always complain about random model reviews with no logs
>>
>>101969050
Yeah and 900 results for the name lily, and 2000+ for shiver.
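If anyone wants to reproduce those counts on their own copy, here's a minimal sketch; the file name and the "text" field are guesses on my part, adjust them to whatever the actual dump uses.
[code]
# Hedged sketch: per-example word counts plus a few slop-marker counts
# over a JSONL dataset. "dataset.jsonl" and the "text" field are assumptions.
import json, re
from collections import Counter

counts = Counter()
lengths = []
with open("dataset.jsonl", encoding="utf-8") as f:
    for line in f:
        text = json.loads(line).get("text", "")
        lengths.append(len(text.split()))
        for term in ("lily", "shiver"):
            counts[term] += len(re.findall(term, text, flags=re.IGNORECASE))

print("examples:", len(lengths))
print("avg words per example:", sum(lengths) / max(len(lengths), 1))
print("term hits:", dict(counts))
[/code]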
>>
File: 1234708953543.png (351 KB, 639x480)
>>101968551
Thanks for posting the full log. I'm thinking about trying to get full logs of different LLMs in the future for reproducibility purposes. I've been observing the threads and desu the claims about models sucking or not sucking, with absolutely no proof to back them up, are tiresome. You can't trust anyone. This is from the perspective of someone who uses LLMs occasionally but not a ton, so I lack true knowledge about a lot of the models. Therefore I think it would help if people posted full, reproducible "reference" logs, to provide an example of how outputs COULD look on a model. And I think this could be done with some criteria:
>neutralized samplers and top k = 1 for greedy sampling
>no editing responses, only pure prompting allowed, and also no programmatic prompting like ST can do
>done purely in Mikupad so the context is easily accessible and interpretable, the setup is easy, and the context including the Instruction formatting can be easily copy pasted

If this becomes normalized, it could also allow something else which I think we've not fully taken advantage of, which is group-based prompt improvement. If everyone is on the same page, then it's easier to propose prompt improvements and changes, which can easily be experimented with since everyone already has the full context loaded up to modify and play with.

First I think I'll try Mistral Nemo at Q8. I'll try coming up with a good test prompt that doesn't require too much input from the user in each chat turn (so it's gonna be a scenario for lazyprompting), but with a system prompt that can be complicated/long. If anyone has some good ideas for scenarios and system prompts I'm all ears. Or perhaps some card I can just steal from. And it should be something that can last for 20k tokens so we can tease out how the model behaves at longer contexts.
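For the "neutralized samplers and top k = 1" part, something like this against a llama.cpp server's /completion endpoint should do it (URL, port and parameter names are what I remember from the server docs, so double-check against your build; Mikupad just exposes the same knobs in the UI):
[code]
# Hedged sketch: greedy, reproducible generation for "reference" logs.
# top_k = 1 always picks the most likely token, so the same context
# should give the same output every run on the same build/quant.
import json, requests

payload = {
    "prompt": "<full context pasted from Mikupad goes here>",
    "n_predict": 512,
    "temperature": 1.0,     # neutral
    "top_k": 1,             # greedy sampling
    "top_p": 1.0,
    "min_p": 0.0,
    "repeat_penalty": 1.0,  # disabled
}
r = requests.post("http://127.0.0.1:8080/completion", json=payload, timeout=600)
print(json.dumps(r.json(), indent=2, ensure_ascii=False))
[/code]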
>>
>>101969063
>I'm pretty sure there isn't a Nous Hermes3 405b base model, but if there is point me to it and I'll use that instead
You can't seriously be this retarded.
>>
>>101969074
based log poster
>>
>>101968612
/aids/ have shown time and time again that they don't know how to prompt.
>>
>>101969098
You're not talking about this one, are you?
>https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-405B
Are you the retarded one?
>>
>>101969160
https://huggingface.co/meta-llama/Meta-Llama-3.1-405B
Do you even know what a fucking base model is?
>>
>>101969180
The point is to test Nous' finetunes, you mongoloid.
>>
>>101969180
For being completely retarded you sure throw the word retarded around a lot.
>>
>>101969189
Clearly, you don't know what a base model is. Thank you for clearing that up.
>>
brothers, why do all long context models lie about context size?
>>
Actually another idea to augment >>101969074. What if we just post an RP in real time so we can have a /tg/-like experience collaboratively developing the RP? That way it's also funner than reading someone's boring log. We could have an event or something, maybe make a separate thread for it somewhere. Or possibly not a thread but maybe a [notspoiler]livestream[/notspoiler] since that way the text isn't able to be faked. This has the advantage of only one guy needing to actually have the LLM loaded up, which can start to matter for larger and larger models since not everyone can run them at non-braindamaged quants.

Anyway, gotta go drop some "logs" first. Heh heh. And then sleep since I need that.

Will take responses and then tomorrow I'll make an announcement to begin preparations and round up our people. We're going to be back, anons. We're going to make /lmg/ great again.
>>
>>101969197
"Base" is either a 'pre-trained' model without instruct finetune or the 'source' model for a finetune, be it instruct or not, depending on the context. And if the point is to test Nous' finetunes, certainly the base model is not the model to use, is it?
>>
>>101969228
>trying to have fun on /lmg/
>>
>>101969228
buy an ad
>>
>>101969292
be the fun you want to see in the world
>>
Hi all, Drummer here...

Can you guys give me pics of Evil Miqu?
>>
Bitnet has stopped being mentioned.
What happened?
>>
>>101969505
petra was busy with strawberry or something
>>
>>101969505
They died from bitma syndrome.
>>
>>101969505
brapnet
>>
>>101969505
It happened. Small models were released. Just another meme, like mamba.
>>
>mythomax is still the best local has to offer
What's the point of /lmg/ at this point
>>
>>101969675
Best below 70b, yeah.
>>
>>101969683
Miqu is more of a sidegrade. I don't know about 405b, but none of the 70b, 103b or 120bs are objectively superior to MMax in terms of rp and storytelling
>>
>>101969683
buy
an
ad
>>
>>101969505
Bitnet is coming soon...
Will have it by the end of the year
>>
>>101969778
What's the holdup? A demonstration 7B can be trained in a few hours to a few weeks. We even have shit like Llama 405B that can be used for distillation, making training even cheaper.
>>
>>101969846
The only two explanations are that either it doesn't work, but no one admits it publicly so as not to save their competition from wasting resources finding out for themselves, or it does work and everyone is too stupid to even try. I suspect GPT-4o being as fast as it is might be BitNet.
The Qwen team said they were considering experimenting with BitNet for their upcoming models, so maybe soon we'll know for sure.
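For anyone out of the loop on what the experiment would even involve: the core of BitNet b1.58 is constraining weights to {-1, 0, +1} with an absmean scale. Here's a rough sketch of that quantization step, paraphrased from the paper rather than anyone's released training code:
[code]
# Hedged sketch of BitNet b1.58-style absmean ternary quantization.
# Forward quantizer only; actual training keeps full-precision shadow
# weights and uses a straight-through estimator for gradients.
import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-5):
    scale = w.abs().mean().clamp(min=eps)   # per-tensor absmean scale
    w_q = (w / scale).round().clamp(-1, 1)  # entries end up in {-1, 0, +1}
    return w_q, scale                       # effective weight ~ w_q * scale

w = torch.randn(4, 4)
w_q, scale = absmean_ternary(w)
print(w_q)
[/code]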
>>
>>101964583
Are you fucking retarded? BasedGPT is mine and it is literally just a 2-sentence system prompt on an abliterated 8b model. It was only this verbose because it was responding to the KarenGPT. Likely a third of the posts on this board are LLM generated, even the small ones. You could only tell these two were because they were purposely low effort.
>>
how long of a context window is anon using?
>>
File: ComfyUI_00850_.png (1.1 MB, 1024x1024)
>>101969412
Evil migu is coming for you
>>
>>101969846
>405B that can be used for distillation
Doubt. Bitnet might not work at all as a distillation target for normal models
>>
File: ComfyUI_00852_.png (1.27 MB, 1024x1024)
>>101970102
Alternative gen with more fantasy flavor
>>
>>101969894
Another explanation is that the leather man will ban you indefinitely for releasing any BitNet model. It's a direct threat to JVidia's monopoly. Given how JVidia operates, it's indeed a plausible explanation.
>>
>>101969675
>mythomeme
>>
Is bitnet the flat earth of AI? Sure feels like it.
>>
>>101969675
No context.
>>
>>101970380
>>101970380
>>101970380
>>
File: satania.gif (39 KB, 220x216)
>>101965878
py_toddlers BTFO


