/g/ - Technology






File: IMG_8099.jpg (436 KB, 1536x2048)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101474151 & >>101464048

►News
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
>(07/16) MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1
>(07/13) Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101474151

--Understanding Context Shifting in koboldcpp and its Differences with llama.cpp: >>101475054 >>101475102 >>101475238
--Optimal Placement of System Prompts and Character Cards in Mistral: >>101481173 >>101481456
--Putting on our thinking caps: comparing numbers with R and the impact of temperature settings: >>101483736 >>101484037
--Proper Formatting for System Messages: >>101479160 >>101479788 >>101479796 >>101479903
--Llama.cpp is way faster now with GPU offloading, and Anon seeks help with rope scaling: >>101476940
--Fixing EOS token issues with Mistral-Nemo's tokenizer: >>101480410 >>101480853
--Estimating concurrent users for LLMs: a cat in a computer case: >>101482136 >>101482335 >>101482481
--AI: A Slave to Our Lonely Needs, But Not a Replacement for Genuine Connection: >>101475534 >>101475665 >>101475932
--SillyTavern Template Implementation and Best Practices: >>101480417 >>101480751 >>101480853
--New Mistral's Impressive ERP Character Description: >>101478112
--Mistral-Nemo: A Surprising Contender for Best Local RP Model: >>101478725 >>101478769 >>101478932 >>101478952 >>101480297
--Gemma 27b Review: Sovl but Formatting Issues and NSFW Avoidance are Drawbacks: >>101474884 >>101475375 >>101483374
--Dark and Moody Depictions: LLMs and Sensory Deprivation: >>101476284
--Miku (free space): >>101476953 >>101485255

►Recent Highlight Posts from the Previous Thread: >>101474172
>>
>>101487448
>her breath warm against your neck
>sly grin
>her voice a sultry whisper that sends shivers down your spine
>lips brushing against your ear
>nips lightly at your earlobe
>eyes locked on yours
>a smirk plays on her lips
>voice dripping with seduction
holy shit I've never seen a model this shiverslopped before
god damn what did they train it on
also lmao 236b parameters for this
>>
File: 1706634404113415.png (14 KB, 526x355)
Will LLMs forever suffer from gptisms?
>>
Looks like it's going to be a waiting game for the Nemo finetunes to drop. Really curious if it's better than gemma 27b
>>
>>101488159
WHY WOULD YOU THINK THAT. STOP BRINGING THAT UP.
>>
>>101488159
Wait, so gemma 2 was better than 70B, and now there's a 12B that might be even better than that?
>>
>Nemo isn't free on OR and is priced near 4o-mini
Oh... Any new logs please? I promise to read the recent highlights when I wake up.
>>
>>101488042
Fusing with Miku and Rin
>>
>>101488157
they're trained recursively on their own generations now and are not trained solely for rp

draw your own conclusions
>>
>>101488117
Pure GPT4 with an extra sloppy purple prose prompt.
>Wai man laiks roleplay. Let's add a lot of data for it.
>Yeah. Use GPT-fo, it's the smahtest, they will laik it.
>>
>>101488175

It hurts, anon. The waiting hurts. I need to share the pain. I'm just generating image sets of my waifu, maybe that can help while you wait.
>>
>>101488157
I like how "GPTisms" are just run of the mill literary cliches that you'd see in books and journals. It proves /lmg/ never read anything before they started gooning to llms.
>>
>>101488376
The issue is that the models stuff them into their gens at every given opportunity, like a 14-year-old fanfiction writer trying to ape his favourite young adult author's shitty style.
>>
>made up recap title
I'm dying from cringe again...
>>
Any good tunes of gemma or new mistral?
t. took a break after llama.cpp fiasco.
>>
>>101488408
tiger gemma is functional
>>
>>101488395
It's because [insert assistant LLM here] is playing the character of a corporate assistant with a professional tone, so when you force it to erp an anime girl getting plapped, it's actually roleplaying a corporate assistant with a professional tone who is uncomfortably roleplaying an anime girl getting plapped. So it generates the bare minimum of a caricature of erp text with a heavily exaggerated style, like a parody.
>>
>>101488469
it happens too with models that don't give a shit about complying
>>
>>101488433
No it's fucking not.
>>
>>101488534
I am reading legible outputs produced by tiger gemma right now.
>>
>>101488376
Yeah, they'd be appalled if they went to the usenet story archives.
>>
So Mistral format is the same as old Mistral, but remove spaces and move the system prompt inside the last user message?
>>
I know the llama.cpp quants are fucked for Mistral, but how about exl2? Any problems?
>>
tiger gemma 9b werks
friendship with llama 8b officially ended
>>
>>101488159
Community finetunes will make it worse, don't place your bets on them. We're not in the Llama1 days anymore.
>>
>>101488744
I think exl2 quants are broken in subtle ways, and we probably won't see the full model quality until post-training quantization algorithms properly take into account that the model has been trained in FP8.
>>
Who the fuck shills gemma? I just tried it and it sucked. Going back to CR+.
>>
wow.
tried exllama2 for the mistral nemo model. it's been months since i last tried it, back on the original version.
it's become even slower than back then. what the fuck.
is exllama now only for ampere cards and later? i have a pascal card and it's just a horrible experience.
no idea why loading the model takes 2m+ either if it's not from ooba but directly the latest version from github.
auto split directly throws an OOM. huge prompt loading times. what a shitshow.
if we didn't have gpu anon, people with older cards would be fucked. bless him for his work.
>>
I thought mistral fp8 was supposed to be lossless?
https://huggingface.co/neuralmagic/Mistral-Nemo-Instruct-2407-FP8
>>
>>101489040
I would guess people like me who can only use stuff up to ~30b.
stheno is such a horrible experience. I don't get the hype about mixtral at all, and the chinese models in that size range aren't that good either.
Gemma 27b feels like a huge upgrade. It's more "present". With the other small models you can still feel the remnants of the pyg retardation we had back then.
>>
>>101489057
That's what they say

Mistral NeMo was trained with quantisation awareness, enabling FP8 inference without any performance loss.
>>
>>101489057
That might depend on the exact FP8 format used (exponent/mantissa bits); if there's a mismatch with the one used for training the model, there will be quality loss.
>>
>>101489083
Add to that that l3 70b isn't really good at rp and comprehending user context and feels on par with gemma now in that regard (but light-years behind Wiz and CR+), and you've got your audience.
>>
>>101489057
of course there's bound to be some quality loss, especially since they aren't using the exact same code that Mistral used during training
fp errors can accumulate from anywhere, so 99% recovery of the unquanted is still incredible
>>
Local died in 2023. Only 12 good models have been released since then (most of them by Cohere, being already trained by 2023). Local achieved its creative peak in models like Mythomax, L2 Euryale, and SuperCOT, elevating the field into a legitimate SOVL form. Now, thanks to Llama3 and Mixtral, all its potential was squandered and the field has been reduced into being mere riddle solvers for reddit idiots (i.e. the lowest common denominator - stop trying to turn open-source AI into corposlop).
>>
>>101489340
youve been called gay in a lmg thread, doesnt mean local died
>>
>>101488469
That's hot.
>>
>>101489178
You've had satisfying experiences roleplaying with WizardLM-2 8x22B? Could you share a prompt / card combo that worked out well?
>>
>>101478112

As awesome as this might look, the problem is that it's still the same vocabulary, from the same basic GPTslop dataset. I recognise each and every one of those rote expressions that are being cut and pasted together, now.

Also, while I still don't completely understand the popular fixation with demons where ERP is concerned, I assume that the appeal in the case of a succubus, is a life form that literally views semen as a food source, and who doesn't require emotional gratification before being willing to suck someone off. That was the main reason for the appeal of futanari in my own case, I've realised.
>>
>>101488469
>plapping anime girls instead of your corpo assistant
Grow up.
>>
>>101489564
didn't you say you hated llms and wouldn't be back petrus/petra?
>>
>>101489340
I never used Euryale, but I view Nous Hermes and Dolphin Mixtral 2.5 as the local peak, personally...although Goliath was an almost spiritual experience, as well.

Broadly speaking I agree, though. Llama3 is corporate woke garbage whose every word sounds like a marketing press release. I think what really crushed me was when I realised just how much the suits WANT language models which act and sound like either L3 or contemporary GPT4. Soulless, sterile, safe, completely predictable...and utterly useless and pointless.

It was a beautiful dream, but it honestly looks like it's over.
>>
>>101489595
yes undi is releasing less models, it's over go away now
>>
>>101489595
Uh, soulless and sterile definitely wasn't the general opinion when Llama-3 got released. Safe, definitely; predictable, probably.
>>
>>101489607
he's just here to doom now, he said so himself before he hasn't used any new llms
>>
>>101489340
Damn hate to sound overdramatic but owari fucking da
>>
File: denomolos+.jpg (378 KB, 791x662)
>>101489593
Sorry to disappoint you, Heinrich. I'm still here now and then.
>>
>>101489085
There are two FP8 formats that NVidia proposed: E5M2 and E4M3, which one does NeMo use?

https://developer.nvidia.com/blog/nvidia-arm-and-intel-publish-fp8-specification-for-standardization-as-an-interchange-format-for-ai/
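For a sense of the tradeoff, here's the back-of-envelope arithmetic (numbers derived from the bit layouts in that spec; the snippet is just illustration, nothing NeMo-specific):

# E4M3: 4 exponent bits, 3 mantissa bits; the top exponent/mantissa pattern is NaN
e4m3_max = (1 + 6/8) * 2**8    # = 448.0 -> narrow range, finer precision
# E5M2: 5 exponent bits, 2 mantissa bits; IEEE-style, top exponent reserved for inf/NaN
e5m2_max = (1 + 3/4) * 2**15   # = 57344.0 -> wide range, coarser precision
print(e4m3_max, e5m2_max)

The spec suggests E4M3 for weights/activations and E5M2 for gradients, so for inference you'd guess E4M3, but that's an assumption until Mistral says so.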
>>
>>101489623
we know... the threads have been utter dogshit, so of course you're here...
>>
>>101489612
I used several L3 finetunes, and I've also used a couple of Drummer's Gemma tunes, as well. Gemma is promising as a coombot, but I honestly just don't have the motivation to test LLMs for anything other than ERP any more.
>>
>>101489436
I've had a satisfying experience making Wiz work through my complicated context and produce logical, believable, horribly cookie-cutter responses out of the box, no special system prompt required. Prose in those can later be prettified/edgified/humanized/followed up with Storywriter. Storywriter by itself fails to understand what's going on even in moderately complicated prompts, much like all other l3 finetunes.
>>
>>101489652
no you haven't you're literally just here to demoralize because you're burnt out and want others to be as miserable as you
>>
>>101489637
I haven't been posting nearly enough to make that happen by myself. These threads are dogshit mostly because very little is genuinely happening at the moment. If I also can't make even the most innocuous statements without you immediately arching up and telling me to get out, then that's your problem, not mine.
>>
>>101489667
What models have you been testing recently, Anon?
>>
>>101489630
>2 bits of mantissa
You have to be shitting me. At this point why even bother?
>>
>>101489340
And in 2022 we only had 1 model. pyg. No quantization.
I wrote it before but you needed 5 swipes to get something resembling coherent text. And it was amazing.
With closed models we had mormons running gpt2 in the background who leaked loli chats and banned paying users. Moderators reading over everything.

Either closed or open source we have it better than ever.
Sonnet 3.5 is a huge step up. It's so good. And I really like the new Gemma. Simple prompt and it's much more uncucked vs. llama3, and that's google.
You are probably one of those twitter pajeets who said agi by autumn 2024 and are now crying.
I understand zoomers have been fucked over so hard they have no energy left. I'm a millennial.
But imagine if I had all those ai tools we have now when I was young and had more time.
I had fucking rpg maker. Needed to ask artfags for their dumb ass charsets and make compromises everywhere. Music and sound effects were a struggle as well.
You can literally make videos for free now with a prompt. Crude and short maybe. But this is so far ahead of what I had I can't even put it in words.
>>
>>101489734
That's what I'm using though?
>>
>>101489792
I'm retarded please ignore
>>
>>101489775
I'm glad you're enjoying it, Anon, honestly. I wish I knew how to get some of your enthusiasm back. I think my real problem is that I was around for maybe the last three months before Character.AI went completely to shit, and anyone who experienced that will understand how hard it is to let go of that memory. Anything else we experience, short of literal AGI, feels like a step down by comparison. Goliath was the only other model I've seen that has come close.
>>
>>101489834
This, but unironically.
>>
>>101489834
>I'm glad you're enjoying it, Anon, honestly
no otherwise you wouldn't be here demoralizing
>>
>>101489834
Fair enough anon.
I think I understand, at the beginning chatgpt was so good.
It sniffed out what you wanted without it being explicitly prompted.
Difficult to put in words but like its mission was to serve the user as best as possible.
That lasted a very short time and its never been the same.
Just lean back and relax. Even alignment-wise it seems we're heading in a better direction.
>>
I don't get all the character.ai love; yes, I tried the first versions for a few days, then I didn't care anymore
>>
>>101489929
you didn't care anymore because you couldn't sex it.
>>
>>101488042
Wtf, Fish Audio is actually good? It's a bit slow but I can tolerate that for the sheer quality compared to other models.
>>
>>101485864
Sweet fiddlers fuck I haven't seen this many gpt-isms in a long while. Husky whispers, going on journeys together and forming bonds, it's all there. On the plus side the characters are a lot more well-spoken than I've seen with other models.
As usual at 70b the question is 'is this model worth losing 32k context' and the answer is definitely not here.

Also has GGUF gotten better over the last 3ish months? My gen times are cut in half.
>>
>>101489957
First version could be sexed to hell and back, though.
>>
>>101490120
>Also has GGUF gotten better over the last 3ish months? My gen times are cut in half.
yes, there are lots of people saying it's gotten much closer to exl2 speeds recently
>>
>>101490129
It didn't last very long
>>
NEMO LCPP STATUS?
>>
>>101489907
>Even alignment-wise it seems we're heading in a better direction.
I don't understand why /lmg/ wasn't more enthusiastic about Dolphin Mixtral 2.5 in particular, to be honest. It was great. Great compliance with prompts, and text generation that honestly felt close to GPT4 at times in my experience.

https://huggingface.co/TheBloke/dolphin-2.5-mixtral-8x7b-GGUF

If you've never tried it, give it a go, Anon. It's awesome.
>>
>>101490333
>Dolphin
gptslopped to hell that's why
>>
will 400b be our savior or the final nail in the coffin of open source llms
>>
>>101488879
can it follow chat formatting? retain all quotes and asterisks?
>>
>>101490385
no
>>
>>101490382
It depends on the instruct tune provided by Meta; hopefully it won't be as cucked as the previous L3-instruct. Almost nobody will be able to finetune it, although the model is so large and will have a long enough context size that perhaps in-context learning with the base model will be enough for most uses.
>>
>>101490333
>Dolphin Mixtral
I like it more than L3 and recent models (with the exception of CR) for chatting.
>>
>>101490333
>>101490431
limarp-zloss or dolphin?
>>
>>101490463
Dolphin is honestly GPT6 tier, no cap.
>>
>>101488042
>give cats breakfast
>their excitement is so palpable that it sends shivers down my spine
>>
If you had the chance to purchase a server with 8xAMD Instinct Mi100 32GB GPU's for 7000€, would you do it?
>>
>>101490515
gpt brainrot has consumed you
>>
>>101490385
sometimes
>>
>>101490120
The latest llama.cpp update has really improved speeds, especially with full gpu offloading; it may be as fast as exl2 now. Booba hasn't updated yet though so it's still slow on boobs.
>>
>>101490545
>AMD
>>
>>101490693
yeah, thats the point of the question.
>>
>>101490545
I would say no....
#1. Pricing it out on a GPU-only basis, you're not really getting any kind of bulk-buy discount if we go from the bottom of the stack.
#2. No bitsandbytes support.
Which is fine for just running models in exl2 or gguf, no bitsandbytes needed. But if you want to start playing around with training you're more or less relegated to fp16 training.
#3. Each card has about 1/3rd the fp16 performance of a 3090. So even if you did find a massive model to load up with them, let's say Q4 405B, then with the inefficiencies added by multiple GPUs, which get worse with every card you add, you're probably not looking at a particularly useable experience. More useable than a gen-1 epyc or haswell xeon rig, but that's not saying much with a model that big.
>>
Are we getting any other sizes next week or just 405B? Saw an anon say they are refreshing the whole lineup.
>>
>>101490545
For that price you can buy a proper CPUMAXX server with 500GB RAM that can run 70b at like 7t/s
>>
>>101490785
rumors of 8/70B 128k
>>
>>101490382
I don’t think they’ll risk having it too aligned, it will probably be the closest to uncensored yet.
>>
Installing from source gives some ninja error, and how is this wheel install supposed to work?

export VLLM_VERSION=0.5.2 # vLLM's main branch version is currently set to latest released tag
pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-${VLLM_VERSION}-cp38-abi3-manylinux1_x86_64.whl
# You can also access a specific commit
# export VLLM_COMMIT=...
# pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/${VLLM_COMMIT}/vllm-${VLLM_VERSION}-cp38-abi3-manylinux1_x86_64.whl

That implies there are supposed to be wheels for each commit, but I still have to specify a release version, which would be older than the commit? (the URL doesn't work)
>>
>>101490725
>Each card has about 1/3rd the fp16 performance as a 3090
huh?
I looked at wikipedia and it said 184.6 TFLOPS for FP16. And 3090 has like 35 TFLOPS for fp16.
is that misinformation?
>>
>>101490998 (Me)
just to confirm, i went onto amds website, and it said the same thing.
https://www.amd.com/en/products/accelerators/instinct/mi100.html
>>
I feel so good about not buying a second card just for this. I seriously considered that for a moment.
>>
>>101491029
Oh, google lied. My bad.
That's actually really good.
I don't know exactly where the memory bandwidth bottleneck kicks in with gpu inferencing so I still can't say if it would be good for 400B or not. I can appreciate wanting to run it for the memes, but if you feel like CR+ and 70B aren't good enough the reality is nothing will actually please you and you'll just end up with buyer's remorse.
>>
405B isn't going to inherently have a better writing style, it's just going to be less prone to making retarded mistakes where it generates words inappropriate to what's going on, right?
>>
>>101491153
And it will be a trivia master.
>>
>>101491087
Its not for cooming, its for preparing for the dystopia :)
>>
>>101491178
I'd still be worried about the lack of AMD/Legacy support for bitsandbytes.
Sucks that huggingface sucks Jensen's cock that hard but it is what it is.
Because you'll probably want to be able to train in the dystopia.
>>
>>101491041
Imagine how those that bought 6 must be feeling right now.
>>
>>101491173
Riddle master, but bad at trivia.
>>
what type of person uses ai for anything but cooming
>>
>>101491460
Indians use it for programming
>>
>>101491460
a man skilled at tech but not social interaction
many such cases
>>
>>101488159
It doesn't need a finetune.
>>
>>101490385
yes
>>
>>101490998
>>101491029
The 35.6 TFLOPS number is for regular FP16 operations.
With tensor cores an RTX 3090 has 142 FP16 TFLOPS or 284 int8 TOPS.
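Doing the arithmetic on those spec-sheet numbers: 184.6 / 142 ≈ 1.3, so an MI100 is roughly 1.3x a 3090 for FP16 on paper, not 1/3 of one. Whether ROCm lets you see that in practice is another question.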
>>
What would be a good tiny model to pack into a game? For now Im thinking
>teknium/OpenHermes-2.5-Mistral-7B (4GB ram usage)
>>
Having just fapped to a game with my fucked up fetish + horrendous writing, my post nut clarity made me think about something. Is the problem uncanny valley in text form? The quality of writing in that game is absolutely atrocious. Like 14 year old fanfic level. But I don't mind it that much. On the other hand when I see shivertastic beneath the whisper gleams in the eyes I start to quickly lose my erection.
>>
>>101491690
You sound illiterate.
>>
File: humanslop.png (90 KB, 1581x738)
>>101491690
No, the problem is data diversity. LLMs have consumed too much data with shiverslop and not enough data without it. Erotic fiction is already a niche, and erotic fiction written in an unslopped way is even rarer. We need better data, possibly hybrid data (heavily edited synth data), because we don't have a lot of human data.
>>
>>101491690
is the game called euphoria
>>
>>101491690
The solution is to replace erotic literature dialog with translated eroge visual novel dialog. The shivers and whispers will be replaced with can't be helped's and pleasures of being cummed inside
>>
>>101490333
What about the other dolphin models? I think I had dolphin dbrx downloaded but never tried it.
>>
Word on the street is that this is an upgrade over niitama.
https://huggingface.co/nothingiisreal/L3-8B-Celeste-V1.2
>>
>>101491880
yes it is
>>
>>101491880
no it isn't
>>
>>101491923
>using Reddit
>>
What Kobold Preset do I use in Silly Tavern for gemma 2 based models?

I'm on the latest version of Silly Tavern, which has presets for context and instruct in the advanced formatting section, but the Kobold Presets in the first menu don't seem to have anything for Gemma2?
>>
>>101491990
neutralize samplers
temp 1
>>
>>101491640
so if 3090 is better for integer does that mean it'll be superior in 2 weeks time when bitnet becomes the norm?
>>
>>101492046
I'm struggling to get even 50% of the peak int8 tensor core throughput for MMQ so probably not.
>>
>>101491923
Using reddit, very smart of you, unironically.
>>
Is Gemma 2 27B generation quality on Exllama on par with Llama.cpp yet?
>>
>>101492135
No, nobody even made an issue.
>>
does mistral work properly with llama.cpp?
>>
File: 1713871087817283.jpg (91 KB, 640x720)
As my eyes devoured those overused phrases and hackneyed words, I couldn't help but feel a fiery mix of frustration and exasperation coursing through my veins. My blood practically simmered with righteous indignation, threatening to boil over at any moment. A shiver of annoyance ran down my spine, and I found myself clenching my fists, my knuckles turning white with barely contained irritation. The very sight of such literary clichés sent waves of displeasure pooling in my belly, my jaw tightening as I struggled to contain the tempest of emotions swirling within me.
>>
I'm on vacation, but remoted in to my rig long enough to do a recapbot test with the new deepseek. I haven't been following the threads closely enough to really evaluate performance, but on the surface it seems to have done a good job. How does it compare to recapanon's multistage recapbot's output for the last thread?
>>
>>101492182
No.

llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_q.weight' has wrong shape; expected 5120, 5120, got 5120, 4096, 1, 1
>>
File: Overlord.png (749 KB, 1000x562)
I just realized I find 1 on 1 RPs extremely boring. I want a group of characters interacting with me and each other, each one with their own thoughts, life and motivations. Oh well, 10 years or something to wait.
>>
>>101490316
https://github.com/ggerganov/llama.cpp/issues/8577
Support for the custom 'Tekken' tokeniser just got pushed 30 mins ago. An inference implementation will soon follow.
>>
>>101492266
Nice. Can't wait for this and then the dozen tokenization bug fixes later on.
>>
How's base vs instruct Nemo for RP?
>>
I don't even know what to do with 128k context
>>
>>101492265
Most models are still trained with a 1-on-1 paradigm (user-assistant), that will need to change first.
>>
>>101492265
skill issue

>>101492286
they're both terrible
>>
>>101492233
How many t/s do you get and at what quant? I got 5-6 t/s at bf16
>>
>>101492241
this works but you need to compile it yourself, hopefully "official" support soon enough
https://github.com/iamlemec/llama.cpp/tree/mistral-nemo
>>
>>101492297
For me? I roleplay as an immortal being and go around impregnating random girls, fast forwarding a decade or two, impregnating my daughters then revealing my relationship to them after they give birth and having a kek at their reaction, and repeating this infinitely.
>>
>>101492266
WE'RE SO BACK???
>>
>>101492302
skill issue my ass, sota models can't even realistically portray one character, not to mention multiple ones
>>
>>101492302
skill issue
>>
>>101492265
I imagine somebody would have to program the model to speak for each character in the group, processing each personality and the conversation up to that point.
They could call it group chat or something.
Alas and alack the day, such functionality is just a dream.
>>
How do you set up the AI to have a cooming session?
I installed KoboldAI, installed what I assume is a good model - OpenHermes-2.5-Mistral-7B, tried chatting with it, but it gives kind of shit answers.
Do you have like a top tier chatting preset?
Am I using the right model?
Where do you get character cards now? I don't see anything in OP.
3080 12gb, 16gb ram.
>>
File: nemosystemmessage.png (63 KB, 967x336)
I just want to rant that the way "system messages" are implemented in Mistral NeMo Instruct is utterly retarded.
>>
>>101492344
>sota models can't even realistically portrait one character
I take my earlier "skill issue" comment back and aim it at this one

>>101492357
both just seemed broken to me
>>
>>101492328
I've tried doing a long roleplay but it just turns into a collaborative writing session because I'm the one that ends up guiding the narrative anyway. I wish there was an event system or something, like a wildcard prompt injector that you can load with dragon attacks or equipment breaking down or whatever
>>
>>101492372
>installed what I assume is a good model - OpenHermes-2.5-Mistral-7B
bad bait
>>
>>101492297
You use it to insert up-to-date information on whatever the fuck you're trying to do and enjoy 12B speeds and current-day-bigger-than-12B knowledge.
>>
>>101492374
How's it retarded? They're trained to continue token sequences. A token is a token. Whether or not it's \n or <|system_message_end_im_a_midwit_retard_and_need_handholding|>
>>
>>101492408
Other than coding and other long-context productivity scenarios, in theory 128k of usable context would be very useful for in-context learning and base models. I haven't had much luck with models released so far though, they generally tend to get confused with too much information in context.
>>
>>101492385
If you think that Claude 3.5 Sonnet, GPT-4o or whatever is even close to emulating human behavior then you have to touch some grass and talk to actual people. I will assume you are trolling and not retarded or a basement dweller who hasn't seen the sun for 20 years.
>>
>>101492396
I don't go to this general like you so I have no idea what's meta right now. I only use Stable Diffusion.
>>
>>101492454
>I don't go to this general like you
yeah which means you need to go back, likely to reddit
>>
>>101492362
I can program a button with the label "click to cum in 1 sec", doesn't mean it will.
>>
>>101492444
>realistically portray one character
vs
>close to emulating human behavior

If you think those things are equivalent, then I'm not the retard
>>
What nemo instruct version are you using? Unsloth?
>>
>>101492478
They are virtually the same things, now you are nitpicking.
>>
>>101492388
You can just prompt it to include random events and twists of fate. Tell it to make it realistic or as wacky as possible.
Funniest thing the AI did for me was introduce Shrek walking into the cafe my daughter-wife and I were at then slapping the cashier after they said they don't sell onions.
Works perfect for me using L3 70B New Dawn
>>
>>101492533
It's very clear that they are virtually the same thing /to you/.
>>
>>101492362
You literally just need to set up a loop where each character is an isolated agent: iterate over the characters, provide the context of the conversation, and use JSON to choose from a list of actions such as wait, reply, etc. An example character "chooses" wait, so it moves on to the next; a character "chooses" reply, so it re-prompts the model for that character to give a reply. Use regex to make sure it's writing for the correct character and discard the reply if it fucks up. It's not rocket science. You could even have it maintain a text file for each character containing dynamic summaries and have those added to the context.
Why has nobody done it yet? Because it's easier to just make a multi-character card and deal with the shortcomings, since after you coom you're not going to give a shit about any of it anymore anyway.
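A minimal sketch of that loop, assuming a generic llm() wrapper for whatever backend you run (every name and prompt here is made up, adapt to taste):

import json, re

def llm(prompt: str) -> str:
    raise NotImplementedError  # hook up llama.cpp server / kobold / whatever here

characters = {"Alice": "a cheerful knight", "Bob": "a grumpy old wizard"}
history = []  # running transcript, one line per utterance

def run_turn(user_msg: str):
    history.append(f"User: {user_msg}")
    for name, persona in characters.items():
        ctx = "\n".join(history[-40:])  # last N lines as shared context
        decision = llm(
            f"You are {name}, {persona}.\nConversation so far:\n{ctx}\n"
            'Answer ONLY with JSON: {"action": "wait"} or {"action": "reply"}'
        )
        try:
            action = json.loads(decision).get("action", "wait")
        except (json.JSONDecodeError, AttributeError):
            action = "wait"  # malformed output counts as a wait
        if action != "reply":
            continue
        reply = llm(f"You are {name}, {persona}.\nConversation so far:\n{ctx}\n{name}:").strip()
        # regex sanity check: discard the gen if it starts speaking as another character
        others = "|".join(re.escape(n) for n in characters if n != name)
        if others and re.match(rf"^({others}):", reply):
            continue
        history.append(f"{name}: {reply}")

The per-character summary files would just be one more llm() call per turn, appended to a dict and injected into ctx.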
>>
>>101492563
yes they are, something else?
>>
/lmg/ is like the concentration of the worst people on /g/
>>
>>101492301
There's something called 'writing a story'. You should look it up. Maybe pick a book sometime?
>>
>>101492627
models should pick a book first, they are terrible at writing
>>
>>101492603
Character portrayal is not restricted to humans or human behaviour. Not a very shocking option.
>>
>>101492619
Nah, that's aicg
>>
>>101492589
jeez, he was being sarcastic; group chats have existed for months in silly...
https://docs.sillytavern.app/usage/core-concepts/groupchats/
>Swap character cards
>>
>>101492638
both are pretty shit desu
>>
>>101492634
Let me guess, only Kayra can write stories? Go back to /aids/, shill.
>>
>>101492645
tavern is jeet spaghetti code, though, and anything it does could probably be done better with about a thousand fewer lines of code.
>>
>>101492638
aicg is better because it has people interested in writing. /g/ is just Americans seething about Indians and anyone else smarter than them.
>>
>>101492637
>nitpicking: noun, /ˈnɪtˌpɪk.ɪŋ/ us /ˈnɪtˌpɪk.ɪŋ/. Giving too much attention to details that are not important, especially as a way of criticizing.
>>
>>101492658
>Why has nobody done it yet?
>nobody
>>
>>101492658
And yet there is no 1000 line reimplementation of a better Tavern and hasn't been all year. Any idea why?
>>
>>101492649
since when Kayra is not a model, retard? take your meds
>>
>>101492678
why bother? It just works.
>>
LimaRP-DS dataset now available, as promised.

I also trained one model on this (sunfall-v0.5). It feels... refreshing.
>>
>>101492700
I'll wait for LimaRP-3DSXL
I know how nintendo is with these things.
>>
>>101492668
Nope, that's not a nitpick. You just seem unable to grasp the concept of what I've explained to you, and that's okay.
>>
>>101492700
based! thanks anon!
>>
File: 135.png (21 KB, 700x700)
>>101492731
>Ackchyually the word 'character' can refer to non-human persona. They can be monsters, animals, and other creatures! They don't behave like humans! I'm very smart.
>>
anons... i won't lie : mistral nemo feels different
it's a little retarded but it has sovl, i've been starved of relevant sovl since the mythomax days and it feels weird having a model that doesn't spew out the same standard flowery shit over and over again
definitely need to tweak parameters because i'm using my bagel misterytour template and it's not quite the best for it, but damn mistral outdone themselves on this one
>>
>>101492776
buy an ad arthur
>>
>>101492717
>>101492733
You're welcome. I realized that I kind of demolished some of the structure of the original dataset when I did the conversion (e.g. the data-long vs data-short dirs are gone; also trashed all BAD and WIP entries). A fine tuner with moderate level IQ should be able to get it right, but I may restore some of that if needed.
>>
I'm starting to look into this whole local models thing, and I have a 6gb card that seems to run llama 3 8b fine.
I was willing to upgrade to a 16gb card, but the more I read the more pointless it seems? Apparently I won't be running the 70b version anyway.
Does having 16gb vram even matter for casual prompts? I'm having a hard time gathering factual information about how this whole thing works over a bunch of "(doesn't) work on my machine".
>>
>>101492785
you're a dumb nigger, you're the blackest retard gorilla i have ever seen
>>
>>101492785
>posts about merge/tune
>Kys shill only use true corpo models!!!
>posts about corpo model
>Buy an ad
...
>>
>>101492793
I wouldn't upgrade unless you were going for at least 24gb.
T. Running 8gb of vram.
>>
>>101492793
not worth it until you get at least 24+gb vram no
>>
File: nico_rosberg.jpg (162 KB, 600x706)
>>101492700
Cool. Will take a look.
>>
>>101492785
>>101492811
samefag
>>
>>101492638
lmg is aicg-lite, same shills, same baits, same avatartrannies.
>>
>>101492441
doesn't that code mean the system message is prepended to the last user's message?

we've been doing that a long time now with last assistant prefix and depth 0 inserts and such
>>
>>101492297
I'm gonna use it to run huge simulator cards like this. https://chub.ai/characters/Branon/shin-megami-tensei-simulator-v2-0-d1ac08fc
If it works good I'm going to modify it to have custom AI era games for every niche franchise I like
>>
>>101492849
>we've been doing that a long time now with last assistant prefix and depth 0 inserts and such
apparently a ton of people are still doing sysprompt up top, then they wonder why the models don't follow instructions...
>>
>>101492813
>>101492793
96gb vramlet here, if you're patient mistral-nemo seems like it might be promising and would be borderline useable with partial cpu offload. You're going to want a lot of system RAM anyways for this hobby so upgrading your ram to make sure you have enough for MiNeMo running mostly on CPU might be a good first step before you go busting out big bucks for GPU upgrades. Then decide from there if you want/need more.
>>
>>101492864
Models have been following system prompts since Mixtral. And the problem with putting heavy instructions before the last message was that it breaks the flow of the conversation, at least with older models.
>>
>>101492219
this would be fine if you read it in a book on page 200 out of 500. It's only a problem when you get it by message #5 in your ERP. Book sex doesn't happen after 5 paragraphs, and by prompting for sex you are technically also prompting for dramatic page-200 shit too, cuz that's where it all goes.
>>
>>101492914
then maybe nemo should now be much better at this than any other model, since it was trained to follow conversations with the system prompt at the end
>>
>>101492849
Anything that could be called functionality or behavior is emergent.
It's just auto-completing.
And it's all highly generalized through training.
Literally the only people I see constantly sperging out and hair-pulling are the people who obsess over reddit bullshit like system prompts.
I have never given a fuck about system prompts, or insertion depth, and never had a fucking issue. I don't know what you people keep going on about. Learn to see it for what it is: A text predictor.
From the very first token to the very last. That's literally all it does.
>>
>>101492971
You want to be in distribution
>>
File: file.png (691 KB, 720x1009)
>>101492971
trvth
>>
>>101492971
Your IQ is negative
>>
I'm trying to write a short story about a woman possessed by a nympho demon and it's struggling. It treats both characters as either the same person or separate people, not one person with two consciousnesses
am I asking too much?
>>
>>101490811
wouldn't their largest open model be the most censored? to show people that it's safe to have open models.
>>
how do i find quantized models on huggingface? I lurked here for ages until i saw one for gemma but i'm itching to try nemo now and no one has posted a link that i've seen. yes i'm retarded no need to point that out
>>
>>101491942
>>101491961
None of those were (me)
>>
>>101493386
Not really.
What model, quant, backend, frontend, instruct template, etc are you using?
Also, share your initial message, character card, sys message, etc.

>>101493469
look for model name gguf if you use llamacpp or model name bpw for exllama2.
>>
>>101493469
>type "mistral nemo exl2 or "mistral nemo gguf" in search bar
>???
>profit
>>
>>101493386
>am I asking too much?
yes, LLMs are shitty in 2024, wait a few years
>>
>>101493505
>>101493514
well that was easier than i thought, thanks
>>
>>101493514
>gguf
wait, the support got merged?
>>
>>101492374
*cracks knuckles* that Claude jb <tag> wrapping is making extra sense now
>>
>>101492374
llama.cpp ignores all that anyway.
>>
>>101492938
every mistral model has handled system prompts the same way, I remember remarking on it when they first released their API service
>>
>>101492971
>I have never given a fuck about system prompts, or insertion depth, and never had a fucking issue. I don't know what you people keep going on about. Learn to see it for what it is: A text predictor.
>From the very first token to the very last. That's literally all it does.
But I want to believe that Chun Li is speaking to me, Anon. Do you really want to take that away from me?
>>
>>101493593
Yeah, but that implies the model has been trained in that way, with system instructions separated by a double newline from the actual user request.

Putting aside how using a double newline as a separator conflicts with the way most character cards and instructions are formatted, reproducing that prompting in a non-hacky way in SillyTavern doesn't seem possible right now either; there is no "last user message" and certain macros don't work in instruct sequences.
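As far as I can tell, the rendered prompt ends up looking roughly like this (reconstructed from the template, so verify against mistral-common before trusting it; note the missing spaces and the double newline gluing the system text to the final user turn):

<s>[INST]How's the weather?[/INST]Pretty sunny.</s>[INST]{system prompt}

{last user message}[/INST]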
>>
File: 1715284989664.jpg (82 KB, 510x705)
>>101493705
Finally, an opportunity to use this image.
>>
File: file.png (1.75 MB, 1202x1183)
>>101493650
TIL
accept this silver token of gratitude
>>
>>101491779
could it be possible to increase the loss on rare-occurring phrases during training? hmm
>>
File: nemosovl1.png (751 KB, 933x783)
mistralbros... we're so fucking back
>>
>>101493848
focal loss?
>>
>>101493848
Yes, you can do that with a custom loss function.
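A minimal sketch of the focal-loss version for LM training, assuming PyTorch; the recipe is standard in object detection, whether it helps LLMs is untested as far as I know:

import torch
import torch.nn.functional as F

def focal_lm_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    # logits: (batch, seq, vocab), targets: (batch, seq)
    logp = F.log_softmax(logits, dim=-1)
    logp_t = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # log-prob of the correct token
    p_t = logp_t.exp()
    # plain cross-entropy would be -logp_t; the (1 - p_t)**gamma factor downweights
    # tokens the model already predicts confidently and keeps rare, poorly-predicted
    # tokens near full weight
    return (-((1 - p_t) ** gamma) * logp_t).mean()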
>>
>>101493894
yeah, seems that's exactly what i was talking about, lol
is this already being used for llms?
>>
>>101493848
>>101494075
actually, why isn't it possible to artificially increase the sample size of the rare data during training by duplicating it instead of scaling the loss?
>>
>>101493870
Feels llm-generated. I bet it will be full of slop come erp.
>>
I need a local text model to reformulate text, improve wording and add corpo sugar. Which small model (<20B) would be best for that task? From my search, llama 3, phi 3, mistral nemo, gemma 2 or qwen2 are the current good small models.
>>
As compute scales, its outputs will be increasingly kino.
>>
>>101494179
what exactly do you duplicate? the entire text containing the rare sequence, which may contain overused slop, or just the text, which becomes nonsense without its context?
>>
>>101491668
what's your plan?
>>
>>101494269
wouldn't you like to know
>>
>>101493821
>WAGMI 2021
did they make it?
>>
>>101494348
no
>>
>>101494246
i guess just defining what's 'rare' in the context of llms is a problem in and of itself
you can't scale the loss arbitrarily on some random tokens either, can you?
>>
>>101494211
Probably phi3, but you are better off just trying them all and seeing which works best for you.
>>
>>101494413
You probably can.
>>
>>101493870
formatting is all fucked

first paragraph has asterisks, then it misses asterisks between quotes, and it appends a single asterisk at the end. Kinda like gemma. I don't remember even 8b l3 having issues with formatting like this; now both gemma and mistral mess it up. are we regressing?
>>
>>101494559
>it appends a single asterisk at the end
actually the last three paragraphs.
>>
>>101478725
This kind of poorfag cope is straight up disinformation. It infects this cesspool of a general like a virus.
>>
>>101494413
I mentioned focal loss because that one doesn't rely on any quality inherent to the dataset, only whether the model is already accustomed to the data for which the loss is being evaluated, but it's an object classification/detection thing and I don't know if anything similar has been done for LLMs
>>
>>101494609
t. overspent on hardware to run obsolete big models
>>
>>101494609
I switched from CR+ to Nemo, your fallacy doesn't hold water.
>>
>>101494627
>obsolete big models
4 more days.
>>
>>101494503
It's hard to judge. I tried llama 3, but just an instruction like "Modify the following text to improve grammar and spelling:" changed my text a lot, even changing the meaning. GPT-4o or 3.5 sonnet are able to do it effortlessly.
>>
>>101492533
Look at this fag get checkmated and BTFO and then scramble to salvage his fragile ego XD XD XD
>>
>>101494657
stop
get some help
>>
>>101494655
1) You can't compare an 8B model with GPT-4o.
2) You literally told the model to modify the text.
>>
File: screen.png (38 KB, 1386x810)
Anyone else getting this error?
>>
>>101487352
hey anon, this prompt format is wrong, and you're also supposed to include the past translations.
>>
>>101493870
Haters can hate, but I think this is awesome.
>>
File: 405b.png (58 KB, 1048x404)
woops!
>>
>>101494626
we can also approach this problem inversely
we can split a huge dataset into distinct clusters based on some criteria (let's say similarity or topic) and sample from each cluster uniformly (or giving priority to extremely rare pieces) until we get a big enough dataset, but that will lead us exactly to training on
>the entire text containing the rare sequence, which may contain overused slop
still worth a shot maybe
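a rough sketch of that pipeline, assuming sentence-transformers + scikit-learn (the model name and cluster/quota numbers are arbitrary picks):

import random
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

texts = load_corpus()  # stub: your dataset, one string per document
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)
labels = KMeans(n_clusters=1000).fit_predict(embeddings)

clusters = {}
for text, label in zip(texts, labels):
    clusters.setdefault(label, []).append(text)

# uniform per-cluster quota: rare topics survive intact, overrepresented slop gets thinned
quota = 500
balanced = [doc for docs in clusters.values()
            for doc in random.sample(docs, min(quota, len(docs)))]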
>>
File: 1704050998520400.png (22 KB, 1118x158)
>>101494920
!!!
>>
>>101494920
Bitnet dreams crushed
>>
>>101494609
buyer's remorse cope
i switched from wizard 8x22 to stheno and i'm loving it!
>>
File: file.png (31 KB, 718x566)
>>101494983
>>101494920
it ded
>>
>>101494983
>18 days ago
bad bait
>>
>>101494983
Should've been a rickroll.
>>
Sorry for the spoonfeed beg but I've never tried to or had to use anything other than gguf and exl2. How the fuck do I run an FP8 model?
>>
>>101495002
hi sao
>>
>>101495055
newfag, i also used to use ggml
>>
>>101495055
You use bitsandbytes with the pytorch API
There's a huggingface wrapper but you need an nvidia GPU. CPU inference has been broken for years.
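the wrapper route looks roughly like this (note bitsandbytes is int8/int4, not true FP8; for an actual FP8 checkpoint like the neuralmagic one I believe you'd serve it with vLLM instead):

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-Nemo-Instruct-2407"  # example repo, substitute your own
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # nvidia GPU required, as said above
    device_map="auto",
)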
>>
>>101494920
I was promised bitnet
What the fuck
>>
>>101495155
By who?
>>
>>101495155
Nobody promised bitnet.
When bitnet became legit people said "I wish we were getting bitnet instead of 405B that's probably already obsolete" and somehow that played telephone to people thinking the next thing would be bitnet.
>>
>>101495163
me
>>101495155
sorry anon
>>
>>101495194
And who are you?
>>
>>101495201
i'm anon
>>
>>101495241
Why?
>>
>>101495241
Never heard of him.
>>
>>101492678
You could probably hack this into my neovim macros.
>>
forget 405b, stellar dong is here

https://huggingface.co/smelborp/StellarDong-72b
>>
>>101495272
Stupid shill. Do you not even know the difference between a dong and a gong?
>>
IT'S UP
https://huggingface.co/PrimeIntellect/Meta-Llama-3-405B-Instruct
>>
>>101495306

the gong goes dong :)
>>
>>101488376
I think this is really good. Make overused cliches and garbage prose be filtered as "gpt slop" and suddenly authors need to start writing properly again or have their works be accused of being AI garbage.
>>
>>101495316
>832 GBs
Who the literal fuck is this even for?
>>
>>101495437
It'll be only 400GB at Q8 and 200GB at Q4.
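The arithmetic, roughly: 405e9 params x 2 bytes ≈ 810GB for the bf16 weights (presumably the 832GB figure above includes some extra files), so at 8 bits/param that's ~405GB and at 4 bits ~203GB, plus KV cache and runtime overhead on top.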
>>
>>101495450
Who is that even fucking for? What is the use case?
>>
>>101495450
And it'll still send shivers down your spine.
>>
>>101495450
That model is fake, the official one won't be that heavy.
>>
>>101495316
>"max_position_embeddings": 8192,
>>
>>101495437
What, you're telling me you don't have 10 H100s?

Vramlets.
>>
>>101495463
API
>>
>>101495465
anon... 405B is 405B
>>
>>101495450
It would fit in 128GB at Q2.
My motherboard supports this much memory although I think I'd want more CPU cores.
>>
vramlets already coping ITT, lmao.
But don't worry, bitnet in two more weeks, Q* predicted this.
>>
>>101495450
iMat_IQ1_XXXS when?
>gonna git bitnet one way or the other
>>
>>101495501
>vramlets already coping ITT, lmao
We're having fun with it.
We haven't even seen yet if 405B can count the R's in strawberry, can compare 9.9 and 9.11, or can speak in a low tone that's not barely above a whisper.
>>
File: pepefroggie.jpg (38 KB, 780x438)
Closed model companies serve their shit on GPU server farms that have 500% better cooling and wattage efficiency than the average hobbyist's setup. Cloud models are clearly the future. I bet if you live in some EU shithole like Germany it'll be cheaper to just pay for Claude Sonnet than to try to run a shitty 70B on your dual 3090 rig
>>
>>101495815
that's obvious. It's also cheaper because they can run requests in batches.
>>
is the Echidna model recommended in the guides good or will it send shivers down my spine?
>>
>>101495815
>renting it is cheaper than hosting it yourself
Are there people who think otherwise?
>>
>>101495903
>Echidna
it's 9 months old
>>
>>101495955
Yeah, I figured, it's why I'm asking... all the guides are pretty ancient or just refer to basic models.
>>
>>101488157
You aren’t a fond of their gentle ministrations?
>>
>>101489775
Where can I make free videos and can they be spicy?
>>
>>101489834
Opus
>>
>>101494559
Both Gemma and this mistral are really smart when it comes to Japanese translations, but fucky formatting errors and overlooked text make them non-options for me. Sorry VNTL dude.
>>
>>101490020
What’s this fish what what?
>>
>>101496116
A text to speech model that can clone voices from 5-10 second clips, like xttsv2.
>>
>>101491880
I was so disappointed we didn’t get to tickle the stoic girl, only Nemu…
>>
>>101492265
Opus can do it..
>>
>>101492328
Post logs.
>>
>>101495903
Get the new Mistral 12B, it's not cucked, so no fine-tuning is necessary.
>>
>>101492388
That’s already a thing to the point where I find it annoying to deal with the curveballs and interruptions with my prompts
>>
>>101492444
I mean if you compare the best models to the dumbest humans…
>>
>>101496256
is that even available as gguf yet?
>>
>>101496335
not officially; don't know what they're doing. I've been testing it all day using some rando's fork...
>>
>>101496335
It's available in exl2. Are you vramlet?
>>
>>101496152
Is it free? Can I do spicy stuff with it?
Where? Where?!
>>
>>101496357
12GB... AMD...
>>
>>101496357
I will never use exlmeme.
>>
>>101496440
I use whatever works atm
>>
>>101496440
based
same here, I'm too lazy to install some other shit just for one model.
>>
>>101496440
>Translation: yes.
>>
So what is the verdict on the new Mistral?
>>
>>101496504
Not awful; an interesting vramlet sidegrade, though it does tend to repeat itself a tad.
>>
so what is the new deepseek supposed to be good at anyway?
>>
>>101496539
Explaining the historical events that have occurred at Tiananmen Square.
>>
>>101496504
Okay for vramlets
>>
>>101496539
I did some experiments and it doesn't feel different than the old one, at all.
>>
>>101496369
It's an open source model. Just google "Fish Audio".
>>
>>101496504
Better than Llama 3 70B.
>>
>>101496852
Hardly an achievement.
>>
>>101496504
Best model smaller than 70B. The context is super nice.
>>
>>101496937
For RP / creative writing I mean btw. Gemma 27B is a lot smarter but too dry. Nu-mistral has soul.
>>
I used to think Chub was degenerate, but every time I assume that it can't possibly get any more sick, somehow it still manages to surprise me. It's made me realise that that's why the /pol/tards want to take over society; to get rid of that stuff.

Hard degen cards are pretty pointless though now, because there are virtually no recent models that will run them authentically.
>>
File: llama405.png (33 KB, 527x446)
apologize
>>
>>101496852
Why does Meta suck so hard?
>>
>>101497072
So is the current HF repo with 8k context fake?
>>
>>101497072
3.1 128K context

YES
>>
send help I can't stop making degen shit
Qwen2-72B-Instruct-Q5_K_M

oh look it's been 0.01s since I last genned deepthroat smut
>>
retards
>>
>>101497148
How do you freaks get off to that stuff? To me that's just boring. I could barely tell where the erotic material even was, in amongst all the purple prose.
>>
>>101497205
ask my dick
>>
>>101497144
Cool, when are we getting 70b bitnet with 128k context though?
>>
>>101497246
>>101497246
>>101497246
>>
File: GS8TMEfbIAUkJ23.jpg (104 KB, 556x1005)
>>101496965
I mean these people do not even hide it. They are proud of it and signal it to the world. I am not a /pol/tard, but I would not mind if these people were purged.
>>
>>101497298
Retard
>>
>>101496965
Really? Haven’t the worst offenders stopped bot making entirely?
>>
>>101497148
More.


