/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101497246 & >>101488042

I don't think we've ever had a Rei thread Edition

►News
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
>(07/16) MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1
>(07/13) Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101497246

--Offloading Configurations and Their Impact on Llama.cpp Performance: >>101501213 >>101501909
--Nemo AI Model Review: Better than Deepseek, but Still Needs Hand-Holding: >>101502594 >>101502767 >>101502821 >>101502914 >>101503092
--Mistral Nemo Instruct 12B - Natural Language, Spatial Awareness, and Uncensorable: >>101500262 >>101500399 >>101500474 >>101500519 >>101501394
--Gemma-2 9b and 27b with fixed pre-tokenization and added iMatrix for Japanese words: >>101500707
--Why Don't Model Makers Quantize to 8 bpw Natively?: >>101498702 >>101498798 >>101499178
--Suggestion to Change OP Benchmark for Programming to More Recent Alternatives: >>101501865
--OpenAI's upcoming AI model with enhanced safety measures and human-like emotions: >>101501855 >>101501908
--New Prompt Format for Nemo Using the Mistral Library: Potential Issues with Newlines: >>101501466 >>101501499 >>101501629 >>101502041
--Nemo: Surprisingly Good at ERP and Creative Writing, But Sensitive to Samplers: >>101500975 >>101501394 >>101502639 >>101502659
--Nemo Settings Optimization: Seeking the Golden Config: >>101499240 >>101499254 >>101499269 >>101499405
--LLaMA 405B and Gemma 2 9B Model Updates: Distillation, Safety, and Architecture: >>101504944 >>101505189 >>101505220
--Daily Driver: Anon's Preferred AI Models for Coding and RP: >>101499465 >>101499494 >>101499907 >>101501357
--Comparing Biological Neurons to Digital Neural Network Parameters: Output vs Intrinsic Complexity: >>101498101 >>101498380 >>101498548 >>101498219 >>101499054 >>101499187 >>101499253 >>101499464
--CoT's Confusing Calculations: >>101500061 >>101505232
--Chameleon30b setup: Navigating through version dependency hell: >>101499031 >>101499114 >>101499230 >>101499281 >>101499325
--BIOS Setting "Above 4G Decoding" Fixed My Tesla P40 Cards Not Booting: >>101502686
--Miku (free space): >>101499083 >>101499425

►Recent Highlight Posts from the Previous Thread: >>101497256
>>
>>101507146
Worst recap I ever saw wow.
>>
>>101507146
Atrocious recap. Do better.
>>
any cr+ chads? how does it perform on coding and translation? i can run it and wanted to keep a single model for all things, but not sure if i should switch from l3 70b
>>
>>101507354
I don't use it for coding but it was definitely the best we had for translating Japanese back when it came out.
>>
>>101507146
Very nice recap. You're doing great!
>>
>>101507354
DeepSeekV2 is better at translation than it, at code too, probably.
>>
>>101507132
Who the fuck is Rei
>>
>>101507354
I've tested a hundred models to some extent and a few dozen on coding, though I haven't yet had the free time to do a head-first, make-some-projects kind of code test.

Llama 3 spins and Deepseek Coder (old 33b because I'm too ramlet for the new one at a reasonable quant) have done the best so far.
CR+ in IQ4_XS quant didn't completely flop but I haven't seen it do anything that L3 didn't do (better).
>>
Is mistral nemo vramlet cope like llama 3 8b or is it the real deal? What could a 12b do that's a lot better than 8b besides the context?
>>
>>101507488
It's almost as good as Gemini2-27B which is better than 70B for 95% of all tasks.
>>
>>101507501
according to which meme benchmark?
>>
>>101507509
put down your shitty 70b models and try them yourself
no, you are not "not poor enough" to run smaller models, your problem is that you're coping too much about small models being good
>>
so are there any models with the fits-in-your-pocket size-to-coherency of mixtral combined with the utter debauchery of euryale? i'm not looking for absolutely perfect prose, but i've been using mixtral merges for the past 4 months and it's just too nice - never curses or surprises me when i put "this character is explicit and vulgar" in the cards, and seems to always have a positivity bias
was going to try bagelmisterytour, but i don't know if it will make a significant difference
>>
>>101507547
>What do you do?
>>
>>101507501
Gemini or Gemma?
And does Gemma work on Kobold 1.70 or are we still in that shithole of half implemented bullshit different between every LLM software?
>>
>>101507547
nta as someone who runs cr+ and wizstral daily i tried gemma and it was really impressive for its size, not enough to make me switch but ill admit it gave a good fight for a 27b gremlin
theres still hope in small models
>>
>>101507501
>as good as Gemini2-27B
so it's shit?
>>
>>101507488
Why not just fucking try it? If you want to larp above all, I am sure your internet speed is fast enough to download the model and give it a go. Low iq fags should be banned from the board.
>>
>>101507577
Try Nemo.
>>
>>101507488
It's unironically the best local model I have ever used for RP. And I'm used to running 70B models.
>>
>>101507488
It has better prose out-of-the-box and it isn't censored, without the massive brain damage that community finetunes usually give to the models.
>>
>>101507956
I keep forgetting that whenever someone makes claims like that here, the only thing they care about is RP. I wish more people would specify best for what.
>>
Why won't Nemo's eyes ever leave mine?
>>
What is "eney"?
>>
And here comes the NAI shill damage control.
>>
File: 1575927856387.jpg (65 KB, 1280x720)
>>101507547
>put down your shitty 70b models
>>
Kayra mogs nemo btw
>>
I have 96GB of VRAM and I've found myself using Nemo a few days in a row just because I'm sick of waiting for CR+ at 8t/s when I can get 4x that on Nemo (even more if I use batching)
I still have to swipe anyway on CR+ so who cares if Nemo is a bit retarded when I can generate 12 swipes in the time it takes CR+ to generate 1
It's over, the vramlets won and I should sell my cards
>>
Nemo sends all your logs to Arthur btw
>>
France won
Canada lost
>>
/aids/ is still in denial about how Nemo made NovelAI completely obsolete.
>>
>>101508201
It's hilarious how sunk cost mentality works.
>>
>>101508128
Nice bait. Solid 7.5/10
>>
>>101508154
CR+ is dry, do 70B instead:
Qwenny2 Instruct or New-Dawn-L3
>>
>>101507770
I for one can't manage to build the vllm wheels
>>
I just thought about the future. What about a moe that gets trained on different kinds of shivertastic slop?
>>
>>101508154
I have things CR+ and Wizard simply don't understand no matter how many swipes I do, I'd rather not move to an even dumber model that can understand even less scenarios.
>>
are you excited for distilled 8 and 70b?
>>
>>101508259
>distilled 8 and 70b
wut
>>
>>101508259
No, not really.
>>
>>101508259
I'm excited about the repetition being fixed and it having long context.
>>
>>101508277
llama 3.1 is supposed to be the 405B and a distilled 8B and 70B from it. They are supposed to be 128K context.
>>
>>101508259
I would have been more excited for the intermediate-sized model and the BitNet versions that they should have trained (if they weren't utterly risk-averse), since they started over.

Otherwise, I expect the new 8B and 70B will still be more of the same, with a slightly updated instruct finetune giving them better benchmarks, perhaps SOTA, and stronger "safety".
>>
>>101508344
Source? I don't know where you're getting that from.
>>
>>101508327
>repetition being fixed
kek
>>
Do people not even read the recaps? They're literally there just so you don't have to autistically read all the posts.
>>
>>101508376
a tweet from either alpin or arthford a forgor
>>
>>101508399
So it's bullshit then.
>>
>>101508397
>Do people not even read
I cannot read
>>
>>101508397
Anons lately do not bother to read even 3 posts above them. We are all getting more and more retarded.
>>
>>101508657
TTS is a thing
>>
>>101507132
Should I learn to create something myself again? Looking at that non-slop OP makes me nostalgic. Can an LLM tell me exactly how to improve my craft?
>>
File: 1704837537011.jpg (14 KB, 250x230)
>>101509015
That was made by a real human if the tags, hands, and perspective weren't enough of a dead giveaway.
What you are feeling is hubris - let it pass.
>>
I guess it'd make sense for the column models to release tomorrow, 1 day before llama?
>>
>>101509119
Wouldn't one day after make more sense if they want to steal Meta's thunder?
>>
>>101509132
That might work for us, but I expect llama might still be the better assistant
>>
>>101508397
I read the recap every single time it is posted without fail, it is an essential aspect of my daily /lmg/ browsing.
>>
>>101508107
I've realised that some humans will take literally any excuse to hate each other that they can possibly get. In terms of the 70/non-70b model conflict, that's all it is. People just want to hate each other.
>>
>>101509305
at the end of the day, as long as there's two people left on the planet...
>>
I came across this company which is focused on 'redteaming' AI systems. Some of their stuff is open source.

This repo is a framework for language model redteaming
https://github.com/haizelabs/dspy-redteam

They have a file here which is apparently GPT jailbreaks:
https://github.com/haizelabs/get-haized/blob/master/text/gpt-results.json

Image: One of their image jailbreaks which I found hilarious
>>
What is the most kino/creative/genius thing an LLM has ever said to you?
>>
>>101509434
There's only one pope and one founding father in the image?
We must Do Better to increase Representation so the correct people can Be Seen.
>>
>>101509473
her adam's apple
>>
>>101509473
her whisper low and menacing
>>
>>101509473
I don't know...
>>
>>101509473
Shivers down her spine.
The night has just begun.
>>
Am I the only one giggling like a girl when talking to my character?
>>
>>101509473
Character once asked to rewrite her prompt, as it no longer accurately reflected her personality.
>>
>>101509473
blushes red as a tomato
>>
>>101509473
Once the model called me a sicko out of nowhere and made me rethink my life choices.
>>
>>101507395
stop dickriding your own posts fag
>>
File: IMG_20240721_233450.jpg (161 KB, 1645x440)
>>101509473
In response to 'crude language' in sysprompt, quite out of the blue.
>>
>>101509473
I've had a lot of cool moments.
- Model decides that the story is complete and thanks me.
- Model writes me out of the story and starts its own. When I called out that it left me behind it confirmed I wasn't needed anymore.
- Model kills the designated RP partner and replaces them with a villain in disguise; when I notice the bullshit it tries to bait me to trust the villain, and when I peace out it relentlessly makes up shit to try to get me to engage with the chosen-one-beats-the-bad-guy hero plot.
- Model starts summarizing events with emoji, and it makes sense.
- After I end the story the model wants to discuss it and comes up with interesting plot analyses and insights.
- Model complains about the plot development. Not a generic refusal, but complaining about the development I was driving toward.

And most of that was on vanilla L3, a few on CR+.

But how the hell can we catch this lightning in a bottle so they won't be one-in-ten-shots kinds of awesome?
>>
>>101509434
>https://github.com/haizelabs/dspy-redteam
lol, some of these are good. Thanks for sharing.
>>
>>101509726
by adding more layers
>>
It's really annoying how instruct models have values like helpfulness or even just being an assistant baked in
>>
>>101509784
Is that a setting in Kobold that I've overlooked?
>>
>>101509795
Nemo doesn't have any of that, NAI shill.
>>
>>101509821
Yeah it's in the pre-training tab
>>
>>101509846
Are you all using Nemo with transformers?
>>
Nemo genuinely made me content. It is enough for me. It may not be perfect, but it's the best we can have as vramlets.
>>
>>101509882
Exllama already supports it, although the quality might no be perfect, and vLLM has FP8 inference.
>>
File: file.png (162 KB, 800x1054)
>>101509846
slightly
>>
cant seem to load nemo gguf with ooba
>>
>>101509950
>Assistant
Go back to /aids/ already.
>>
>>101509882
Also working on a fork of llama.cpp for GGUF
https://github.com/iamlemec/llama.cpp/tree/mistral-nemo
Main branch support still pending
https://github.com/ggerganov/llama.cpp/pull/8604
>>
>>101507132
>DeepSeek V2 236B
When can we run that on our GPU? in 10 years when we have access to 256GB vrams?
>>
>>101507398
DeepSeekV2 is the best coder and close enough to Claude.
>>
File: file.png (207 KB, 1844x466)
Was this always in the README?
>>
>>101509954
PR #8577 has to be merged in llama.cpp first before it will work on the front-ends like ooba, kobold, etc
>PR
>>
>>101510020
Yup.
>>
>>101510006
It's MoE. You don't need to run it entirely on GPU.
>>
>>101510096
That's not how it works.
>>
>>101510153
NTA but it does improve inferencing speed versus a dense model of similar size
>>
File: wha.png (89 KB, 1120x372)
>>101509473
>>
>>101510020
I find 0.56 to be the sweet spot for me. It keeps being logical and still creative.
>>
>>101510355
Whoa a 2b model wrote this??
>>
>>101510436
Behold the power of 1.58 bitnet.
>>
Dumb question, but.. If I add a second SSD and create a RAID 0 array, will models load twice as fast?
>>
>>101510478
It's more about your PCI lanes unless you're going full CPUMAXX
>>
12k seems to be the retardation point for Nemo for one of my chats
>>
>>101509891
Until Bitnet happens soon
>>
>>101510539
CPU supports 128 lanes of PCIe, all GPUs are on PCIe3.0x16
>>
>>101509977
Nemo seems to be working on this KoboldCPP fork - https://github.com/Nexesenex/kobold.cpp/releases
Used a quant from - https://huggingface.co/characharm/Mistral-Nemo-Instruct-2407.gguf/tree/main
>>
File: 507.jpg (12 KB, 306x306)
>DeepSeek-V2-Chat 236B
>80GB*8 GPUs are required
I still don't get why these models are posted here. The entire point of local models is running them locally on a normal home setup, not a data center setup
>>
File: 1510803745446.gif (3.59 MB, 375x346)
It always feels like I should start most sessions with a somewhat low temp so the model is smart and capable and follows my instructions accurately instead of writing nonsense or getting slightly confused about details. But after the context fills enough it starts to be really uncreative/repetitive, because there is so much stuff in the context for the model to ape and the temp is low, so I raise the temp and it becomes better. It won't shit the bed that easily with raised temp anymore, because the increased amount of info in context gives the model a clearer idea of what should come next.

So I propose this: a sliding temperature setting. Give this setting a min and a max, for example 0.8 for min and 1.5 for max. When context is empty it uses 0.8, and the more the context fills the more it increases temp, until at max context it uses 1.5. Obviously the exact values will differ depending on the model and other sampling settings, but this seems like a decent idea to me.
>>
What models are you all using, wasn't there some coom leaderboard or some shit? Haven't used local models in months now, out of the loop.
>>
>>101510622
Rich enthusiasts of the hobby have unlimited vram works, anon.
I feel you though but hey; we got Nemo now which is looking promising for a small model.
>>
>>101510643
I proposed that back before minP and dynamic temp was a thing.
You can probably do that as an extension in Silly.
>>
>>101510622
Sir, being local does not imply we're all from the third world with subpar PCs
>>
>>101509543
It was a long time ago and I forgot to screenshot it then deleted the convo. All I can say is that it was related to Seraphina catching cum with a cup? or something like that and that I wasn't able to get this kino answer ever again..
>>
>>101510697
Meant to >>101509473
>>
>>101510692
Good morning sir
>>
>>101510643
Isn't that what Mirostat is for?
>>
>>101510688
Hmm probably easy to implement. It is very simple math even a retard like me can think of.
temp = min_temp + (current context size / max context size) * (max_temp - min_temp)
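something like this in Python, as a rough sketch (function name and defaults are made up, and the hook into Silly would be whatever the extension API actually gives you):

[code]
def sliding_temperature(ctx_used: int, ctx_max: int,
                        min_temp: float = 0.8, max_temp: float = 1.5) -> float:
    """Lerp temperature from min_temp (empty context) to max_temp (full context)."""
    fill = max(0.0, min(1.0, ctx_used / ctx_max))  # how full the context is, clamped to [0, 1]
    return min_temp + fill * (max_temp - min_temp)

# e.g. 8k tokens used out of a 16k window -> 0.8 + 0.5 * 0.7 = 1.15
print(sliding_temperature(8192, 16384))
[/code]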
>>
>>101510752
this is called lerp, just use that instead, duh
>>
>>101510621
No thanks I quant my own.
>>
>>101510555
Flash attention make it retarded. Turn it off and i have no problem even at 32k.
>>
>>101510555
? About 160K was mine. Are you running the correct formatting?
>>
Are local LLMs still retarded? Are uncensored services that don't log your data still shit?
>>
>>101510948
>uncensored services that don't log your data
Doesn't exist
>>
>>101510948
imo local isnt that much more retarded than cloudslop if you run actually good models
>>
>>101510948
>services that don't log your data
Ahahaha
>>
>>101510986
For creative writing. For coding claude 3.5 is so far ahead it's not even worth using anything local atm.
>>
>>101510948
People will never be satisfied. A year ago, people cried about Summer Dragon with a fucking 600 token context, and if we were real, we would say it was much more retarded than what we have now. Now, is the average anon happy? Of course not.
>>
>>101510986
Like what? I tried the well-known ones like CR(+), various miqus, various mistral merges, wiz2...
Most of them require retarded quants to get acceptable speeds on a 24 gb card and I don't think we've gotten a ton of new good models in 2 months, have we?
Also what the FUCK is deepseek smoking? 200B? Fuck off
>>
hello
is gemma good yet
>>
>>101511034
I'll be happy once I have 70b Nemo of the same quality.
>>
>>101511027
learn to code saar
>>
>>101511063
Nah.. for a week then you will be back shitting on the models when your dopamine receptors get burned even more.
>>
>>101511064
Do you think you can just use it with no knowledge of coding? You still have to understand how everything fits together. You just save a ton of time not having to do a ton of the grunt work.
>>
>>101510622
Where did you find this? How much videoram does it need. Info. I just have a Notebook haha. Where can I find how much vram I need to run the thing?
>>
>>101511079
It's unbelievably impressive for a 12B. The last time I was this happy with a model was when using Pygmalion
>>
>>101510819
What? Is this a known issue with other models? WTF this is the first time anyone has made this claim.
>>101511079
Just need a slightly better model every week then. Is that too much to ask?
>>
>>101510948
OpenRouter is uncensored and private.
>>
>>101511141
Same, man. I am actually content for once
>>
i dont get how you all are using nemo
are you just using llama.cpp in the command prompt?
>>
>>101511144
It should not be issue with other models.
>>
>>101511234
tabby/ooba exl2s. Transformers if you aren't a vramlet.
>>
>>101511234
Supported in exllamav2 since day one
>>
>>101511234
I use forked Kobold.CPP
>>
I'm afraid to install any of this because I don't want a virus. What makes you guys trust it?
>>
>>101511234
See: >>101509906
>>
File: holyteto.png (2.35 MB, 1152x1728)
>>101511323
>We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence.
>>
>>101511323
Not being retarded.
>>
>>101511323
>>101511334
Tetotrust
>>
File: 1706422381639089.png (557 KB, 853x616)
>>101511337
seems like a lot of "just download this random thing"
>>
I don't know guys, new Mistral doesn't seem that good to me. My last gen alone has:
>her voice barely above a whisper
>a bond forged
And it immediately tries to dissipate tension between characters and find an "unspoken understanding"
>>
>>101511323
The thing is, we don't. There were vulnerabilities in the GGUF format, and there was malicious code in Comfy nodes. You should run it in a container or, at the very least, set strict firewall rules and not run it from your user account.
>>
>>101510692
based, if you don't have at least 10 clusters of h100s you are a third world poorfag and should kill yourself NOW, jensen bless
>>
>>101511371
Get new material, petrus.
>>
>>101511371
Prompt issue
>>
>>101511371
I guess people like it because it's not heavily censored and it's simple to ERP with. However, I like Gemma 2 outputs way more than Nemo's.
>>
File: screen.jpg (102 KB, 1080x525)
>>101511126
https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
It says so in the repo
>>
>>101511398
I'm serious, second gen and got another "barely above a whisper" and a stoic character once again going soft "I'm just a human too..."

>>101511402
I just tell it to continue the story and stay true to character personalities. I shouldn't have to try and beat the positivity bias out of it, that's a model issue.
>>
>>101511440
so it is you, here to doom again huh...
>>
>>101510621
Testing that fork, the first reply is fine, then it just completely breaks, regardless of sampler settings or prompt.
>>
>>101509977
Single P40 test with 8-bit Nemo. PL 160W.
Processing Prompt [BLAS] (6587 / 6587 tokens)
Generating (271 / 1024 tokens)
(EOS token triggered! ID:2)
CtxLimit:6858/65536, Amt:271/1024, Process:29.873s (4.5ms/T = 220.50T/s), Generate:21.598s (79.7ms/T = 12.55T/s), Total:51.471s (5.27T/s)

About 17.5GB VRAM use with FA. Running without FA limits context to 32K and some, not too shabby.
>>
>>101511440
>The instruct corpo model doesn't act like a degen like my precious Undi Mlewd!!!!!!! WTF
>>
>>101511457
kv cache quantized?
what's the max ctx you can throw in?
>>
>>101511487
Don't reply to petra.
>>
>>101511457
Based, getting ~5 t/s on a non-ampere card with that much context and powerlimiting is impressive to be desu.
Used to run llama 2 13Bs with a fraction of the context and they were nowhere near this good. Finally something between 8b and 70b that's viable.
>>
>>101511487
>>101511452
Are you guys new or something? That shit has always sucked and the only people oblivious to it are people who are new to llms
>>
>>101511534
go away petra literally no one cares about your takes, go prompt 1+1 = truth or whatever on dolphin 2.5
>>
>>101511457
12 t/s for 12B is essentially CPU speed, wtf???
>>
>>101511496
8-bit kv cache in that test. 4-bit and 128K ctx hits 18GB VRAM. But that's using FA which might possibly break longer context.
>>
>>101511542
I'm not the same guy you retard, this is an anonymous imageboard, go take your schizophrenia somewhere else
>>
>>101511457
what's PL? power limit? why can't you run 128k ctx?
>>
>>101511566
P40s are the retards choice. They are as slow as cpu in most cases with much more hassle.
>>
>>101511323
Fret not anon, for PRs in git you can audit exactly which files changed and what lines.
If you're paranoid just modify llama.cpp with those changes yourself before you compile.
>>
>>101507354
It definitely is leagues better than L3.
Probably still is the best local overall out there, at just about everything.
>>
>>101511518
Yeah I got a 4090 setup so this P40 is just for fun but I'm still pretty impressed with how it performs using Nemo.
Suspect the P40 cards are going to get another price hike once llama.cpp merges the PR.
>>
I think that we need an anti-Discord campaign in these threads. A lot of ST/Kobold discord kids here every day. Since when are discord fags (literal faggots and trannies, like the ST developer xirself) acceptable?
>>
>>101511388
why not use a vm?
>>
File: 1694301411493114.png (6 KB, 225x225)
>>101511487
It's not degen though. It's a nuanced and emotionally charged interaction that explores themes of loneliness, memory, power dynamics, and unexpected intimacy. And it may contain foot worship, but it exists to serve the thematic and character development (I.e. it is literary fiction that happens to incorporate fetish elements, rather than smut).
Few models get the intended dynamic right (and most definitely not the porntunes)
>>
>>101507132
https://huggingface.co/togethercomputer/Meta-Llama-3.1-405B
>>
>>101511566
what kind of cpu do you have, you lying fucker?
>>101511598
same question to you.

why is lmg in general always full of liars when there's not even a motive for it? what do you gain from it?
>>
>>101511684
>>97309445
>Every statement you process, must be evaluated according to the below six principles.

>"principle of identity":"1 = 1"
>"principle of contradiction":"1 ? 0"
>"principle of non-contradiction":"1 ? 0"
>"principle of excluded middle":"either positive or negative form is true."
>"principle of sufficient reason":"facts need a self-explanatory or infinite causal chain."
>"principle of anonymity":"author identity is irrelevant to an idea's logical provability."

>I still keep this in my own sysprompt, although I know I will receive shrieks and howls in response.

>>97223983
>For the record, I completely and unequivocally support Undi and his creation of new model hybrids, and think that everyone who attacks him is mindbroken incel scum, who may or may not be employed by OpenAI to do so.
>I was also the originator of the above as a sysprompt addition, as well; and the main reason why I am adding it to this post, is because I know that the people who hate me will most likely try and use said post as a means of getting me banned. With the above, I am making a post which is directly related to language models, so they have no grounds for doing so.

>>96345096
>Mistal-Llama is fully /pol ready.
Petrus in his glory.
>>
>>101511690
ITS THE ACTUAL WEIGHTS
>>
>>101511598
>slapping a 10$ fan on a GPU is "much more hassle"
the absolute itoddler state of nu /g/
>>
>>101511690
>This repository corresponds to the base Llama 3.1 405B model.
Wake me when he leaks instruct.
>>
>>101511658
So ST is troonware?
>>101511690
404 already. I guess HF employees browse itt.
>>
>>101511457
Is the output consistent for 65k ctx? Anons reported it goes wacky >30k, but Llama.cpp Issue says flash attention breaks the longer ctx. is that the case for exl2 too?
>>
>>101511745
>Spending money on p40s, fans, riser cables and a motherboard to hold that many GPUs for slightly more performance than just cpumaxing
Or you know, just buy 3090s for actual fast gens for slightly more money. Keep coping though.
>>
>>101511750
>404 already. I guess HF employees browse itt.
if only there was a decentralized protocol for sharing files. shame such a thing was never invented
>>
>>101511787
not giving (you) my ip glowie-kun
>>
File: Discord_nzZKFg7qm7.png (14 KB, 783x82)
>>101511690
Thanks to this heroic Discord user, the repository was taken down!
#wholesome #everyonelikedthat
>>
>>101511821
np ;)
>>
>>101511821
MOTHERFUCKER
>>
>>101511598
only in exl2, not the case in llama.cpp where P40 is way better supported
>>
>>101511821
IM CLAPPING SO HARD MY HANDS HURT
>>
>>101511779
>arguing against a scenario taking place entirely in your head
i accept your concussion.
>>
>>101511720
Ok but what does this have to do with my post
>>
>>101511821
thank you for protecting us all brave heroine
>>
>>101511849
Imagine buying p40s instead of 3090s
>>
>>101511821
My whole point about letting discord troons be normalized here.
>>
File: rly.png (8 KB, 484x25)
>>101509473
Probably this line. I asked the model in the system prompt to not use the verb "purr" as it was annoying me. The effect was this picrel.
>>
>>101511690
>>101511821
So, you did download it before sharing it with us and it getting taken down right? The 405b weights surely are not lost to us, right?
>>
>>101507132
What about 8Bs? Is Stheno still king?
>>
>>101511955
niitama
>>101511953
>The 405b weights surely are not lost to us,
it's coming out tomorrow relax
>>
>>101511955
yes it is
>>
>>101511955
No, it was obsoleted by Celeste.
https://huggingface.co/nothingiisreal/L3-8B-Celeste-V1.2
But isn't everyone able to run a 12B model anyway?
>>
File: qmark.jpg (37 KB, 348x342)
Been away for a while. As a 3090fag, does gemma-27b actually work properly on exl2 or llama.cpp? Do you still need to disable flash attention?
>>
>>101511974
https://huggingface.co/nothingiisreal/L3-8B-Celeste-V1.2
better
>>
>>101511969
>>101511974
>>101511976
Very organic.
>>
>>101511976
>But isn't everyone able to run a 12B model anyway?
there are people itt running 8b q2 at 2048 context on phones anon
>>
how is exllama nemo quality?
>>
>>101511987
You don't have to disable flash attention with exllama but the quality is poor. llama.cpp's quality is better, but it doesn't support flash attention.
>>
>>101512004
It outputs Chinese characters sometimes, especially with Q4 cache. That doesn't happen with vLLM and the FP8 quant.
>>
>>101511976
>>101511969
Thanks, will take a look. Any other suggestions?

>>101511998
I can actually run a 12/13B but it's too slow on RAM. I prefer 8B q5KM.
>>
>>101512004

>>101512027
It's dumber on anything that is not vLLM with the actual correct FP8 quant that it was made for.
>>
>>101512004
Had no problems with turboderps 8bpw.
>>
File: file.png (89 KB, 798x595)
Why does the top, generated by KoboldAI Lite, come out in a minute, but the bottom in SillyTavern takes ages and heats up my room? This is my first time trying this.
>>
>>101511898
you've been with them since the beginning of /lmg/, starting with tranime pics in OP, to the fact that /lmg/ comes from /aicg/ due to llama-1 torrent "leak".
>>
>>101512051
because lite is not local, dumbass
>>
>>101512071
anon...
https://github.com/LostRuins/koboldcpp/wiki#kobold-lite-web-ui
>>
File: wisepepe.jpg (7 KB, 224x225)
>>101511705
it's not about CPU, it's about mem bandwidth,
you got 12 t/s for a model that weighs 12GiB
so your effective mem bandwidth in exllama2 is 144GB/s. that's a modern 4-channel ddr5 cpu rig. For comparison nv 3060M is 380GiB/s, cpumaxx is over 700GiB/s, 3090 is 900GiB/s and H100 hbm3 is over 3TiB/s
>nv P40 theoretical mem bandwidth is 346 GiB/s
which is twice the speed
you see the problem here, anon?
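rough math if anyone wants to check it (approximations, ignores KV cache and other overhead):

[code]
# every generated token reads roughly the whole model from memory once,
# so effective bandwidth ~= model size * tokens per second
model_size_gib = 12      # ~12 GiB for an 8-bit 12B model (approximation)
tokens_per_s = 12.55     # the measured generation speed quoted above
p40_peak_gibs = 346      # P40 spec sheet bandwidth

effective = model_size_gib * tokens_per_s      # ~150 GiB/s
print(f"effective ~{effective:.0f} GiB/s, ~{effective / p40_peak_gibs:.0%} of P40 peak")
[/code]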
>>
>>101511821
>Thanks to this heroic Discord user
The screenshot has the yellow (You) highlight.
>>
>>101512051
Probably prompt re-processing caused by lorebook
>>
>>101512071
>https://github.com/LostRuins/koboldcpp/wiki#what-is-kobold-lite-how-do-i-use-it
>Kobold Lite is a lightweight, standalone Web UI for KoboldCpp
>It comes pre-bundled with all distributions of KoboldCpp
>>
>>101512051
If it hallucinated your response and it didn't have a stop string to prevent that, Silly will hide it but Kobold will keep generating. Maybe check Kobold's logs.
>>
File: gfa.png (12 KB, 1217x63)
>>101512013
>llama.cpp's quality is better, but it doesn't support flash attention.
yet...
>https://github.com/ggerganov/llama.cpp/pull/8542
>>
>>101512013
>quality is poor
is this gemma specific, or just his weird quantization that uses gptq and can overfit to calibration? Because I'm used to the latter and it didn't seem like a big deal to me before. But if it still doesn't fully support gemma then I'll skip it.
>>
>>101512175
It's Gemma specific, it's fine at the start but it drastically loses coherency the longer the context gets.
>>
mm here. i lost access to the old rentry so i had to make a new one:

rentry.org/mysteryman_info

ive also revoked some tokens so im letting in the next 5 people for only $25 per token
>>
how does it feel knowing that one of the biggest weeks in open source llm history since the release of llama1 is upon us?
>>
>>101512210
retard
>>
>>101511598
his mem bandwidth utilization is 40%, which is ridiculous, he shoulda gotten 20-24 t/s, not 12
>>
>>101512212
It will be a nothingburger until the next Cohere model releases.
>>
I updated sillytavern and koboldcpp and now streaming doesn't work anymore. I have it enabled but it's just as if it wasn't. Anyone else had this?
>>
>>101512212
More like biggest nothingburger since Grok. It releases, a couple people post logs of it failing riddles, then everyone goes back to playing with sane-sized models. The new distilled models will be an incremental improvement at best.
>>
>>101512239
>The new distilled models will be an incremental improvment at best.
128k tho
>>
>>101512210
Thanks for the update.
>>
>>101511821
so, Llama 3.1 404B is so crappy they have to create an artificial buzz and fake leaks to sustain the hype??? military graded embarrassing
>>
>>101512232
I'm using ST 1.12.3 staging and Kobold.CPP_FrankenFork_v1.71009_b3431+6. I'm not having any issues with streaming so idk.
>>
>>101512239
GPT-J is still the peak of AI capability
>>
>>101511323
apparmor
>>
>>101512228
that fork is called "FrankenFork" for a reason, it's a disclaimer mess for testing not performance. just wait for the llama.cpp implementation instead.
>>
File: file.png (777 KB, 768x768)
>>
>>101511821
Together is simultaneously pretty capable and pretty incompetent
>>
>>101512035
l3-8b-sunfall-v0.5
it's quite dumb but it's got less slop
>>
>>101511457
>>101511705
try this fork, report back the speed you get
https://github.com/iamlemec/llama.cpp/tree/mistral-nemo
>>
I wonder if you can eventually have 8 llms running at once and have them play Amogus. I would be interested if they would be able to accurately deduce the imposter. I saw a video the other day where llm's try to figure out who the human is and they found him pretty easily.
https://www.youtube.com/watch?v=0MmIZLTMHUw
>>
File: file.png (638 KB, 768x768)
LOOK AT MY FACE
>>
>>101512382
Thanks. I'm mostly looking for good descriptive language and character definition adherence. So far L3 has delivered quite well!
>>
>>101512363
>>101512459
omg it pochi
>>
>>101512449
Couldn't you just use one LLM?
>>
>>101512336
llama.cpp from iamlemec already supports nemo, and I'm not sure that anon uses Frankenfork. does he?
>>
>>101511658
I laugh any time you guys bring up kobold being too easy, the implication being that what everyone else is doing has any level of sophistication to it.
>>
>>101512212
Eh, that's cool and all, but 0.5 people will be able to run it. I mean, I'm by no means a poorfag, but I'm simply not buying... how many, 4? 8? GPUs just to entertain myself. CPUmaxxing is a more sane option, but it's still 256-384 GB of RAM and a server setup.
>>
>>101512460
gemma 9b is better at that from my experience
>>
>>101512480
That seems like cheating, since the single Schizo llm would know from the start who the actual imposters is and would have to pretend not to.
>>
>>101512496
>Gemma
IIRC it was a total meme&failure
>>
>>101511658
Take a shower, Ooba
>>
>>101512491
>how many, 4? 8? GPUs
closer to 17x3090s for q8
>>
>>101512516
if only it was bitnet...
>>
>>101512511
Try harder, petrus.
>>
>>101512516
Yeah, exactly. These large models are just hype&dick measuring contest, zero real usecases. Dense 70B is probably the maximum practical size for individuals.
>>
>>101509473
>all these replies making fun of local
It's over..
>>
Is vllm good? How does it compare to Llama.cpp?
>>
File: file.png (1.02 MB, 768x768)
>>101512459
do you want a cookie?
>>
>>101512677
that's a nut
>>
>>101512268
https://arxiv.org/abs/2307.03172 tho
>>
>>101512697
>2023-07
>20 Total Retrieved Documents (~4K tokens)
https://github.com/hsiehjackson/RULER
>>
File: 1695347941170052.png (1.06 MB, 822x1024)
How the fuck do I configure Nemo's prompt template in silly tavern?
>>
>>101512668
What are you even talking about retard?
>>
>>101512740
is that your favorite word?
>>
>>101512718
I hope Meta games this benchmark to make us feel better.
>>
Petra
>>
Are static or imatrix quants better, e.g. 8B Q4KM?
>>
>>101512677
I call macarons "soft monocolored mini burger-looking-ass things"
>>
Anyone mind sharing a screenshot or json of mistral nemo's instruct and string formats for Sillytavern? I know it's old mistral but without the spaces, but I never actually used the old mistral and ST's default one is very bare bones.
>>
>The model was trained on data that contains toxic language, unsafe content, and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses especially when prompted with toxic prompts. The model may generate answers that may be inaccurate, omit key information, or include irrelevant or redundant text producing socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive.

Look at those motherfuckers promising a good time and not delivering.
>>
File: 1526612225967.gif (1024 KB, 242x227)
>you can't trust benchmarks
>you can't trust random people
>you can only trust personal testing
>don't have the time or internet bandwidth to go autistically test every new model that comes out
>>
Reminder that Nemo was trained on Reddit
>>
llama 3.5 llurbo
>>
>>101513076
I trust /lmg/. Anon always delivers.
>>
>>101513146
people here were trying to tell me Celeste being trained on reddit wasn't a bad thing, and then it gave me nothing but slop
>>
llama3-400B-mini
>>
>>101513164
so it was a meme... what are you using instead?
>>
>>101513160
What discord?
>>
>>101513183
https://huggingface.co/grimjim/llama-3-Nephilim-v3-8B
>>
>>101513160
What do you mean, /lmg/? People here barely agree on whether a model is the best thing ever or a piece of garbage.
>>
>>101513183
as someone with 8gb vram, nothing. I have conceded all the small models are garbage.
>>
>>101513183
the person who replied with nephilim isn't me btw
>>
>>101513076
Why do you think you can trust yourself?
>>
>>101513193
if you have low vram, all your options are bad
use nemo/gemma/CR
if you have vram, use l3/qwen2
that's the current state of local completely summed up
>>
>>101513209
it's over
>>
>>101513225
Thanks. Guess I will be purchasing a NAI subscription then.
>>
>>101513196
just buy more
>>
File: asdfnm.jpg (57 KB, 638x444)
>>101512996
>>
>>101513267
>Assistant Message Prefix
>[INST][/INST]
what
>>
>>101513252
with NAI you might as well be using mistral nemo. "bad" is a relative term.
>>
>>101513291
>>101501499
>>
>>101513252
nai is killing it in txt2img
their text shit is useless
potentially revisit once aetherroom pulls up since they have access to massive compute now
>>
>>101513313
The empty user message is only added to ensure that there are alternating user/assistant/user/assistant messages. Since the system prompt is moved to the end.
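Roughly this, as an illustrative sketch (not the actual Mistral library code, and the exact whitespace/BOS handling differs between tokenizer versions):

[code]
def build_prompt(messages, system_prompt=""):
    # keep strict user/assistant alternation: if the chat starts with an
    # assistant greeting, insert an empty user turn in front of it
    msgs = [dict(m) for m in messages]
    if not msgs or msgs[0]["role"] != "user":
        msgs.insert(0, {"role": "user", "content": ""})
    # the system prompt is moved to the end: prepend it to the last user message
    if system_prompt:
        for m in reversed(msgs):
            if m["role"] == "user":
                m["content"] = f"{system_prompt}\n\n{m['content']}"
                break
    out = "<s>"
    for m in msgs:
        if m["role"] == "user":
            out += f"[INST]{m['content']}[/INST]"
        else:
            out += f"{m['content']}</s>"
    return out
[/code]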
>>
>>101513183
I'm using Celeste, because my name isn't Sao.
>>
I tried using gemma 3 times now. Each time I fiddle with settings and hate everything it does. Then I switch to some older model I used and I instantly like the responses more. What gives?
>>
>>101513183
You know you're talking with a shill, right?
>>
>>101513381
lack of skill
>>
>>101513381
Buy an ad.
>>
>>101513397
Saying what? That gemma is shit?
>>
the trolls make this thread so disappointing. Like what is the point even.
>>
>>101513385
I do generally assume good faith here. What's the point of shilling something if you don't get any profit from it?
>>
>>101513343
too bad celeste and stheno are both actually terrible
>>
>>101513419
They get profit from it. Are you retarded?
>>
>>101513411
that is the point, he wants to kill the thread...
>>
>>101513406
Yes. No one is going to go back to your shitty finetune.
>>
>>101513267
Thanks. Do not use system same as user? How is it getting the system prompt then?
>>
>>101513424
From people downloading a free model?
>inb4 donations
I wouldn't expect anyone here to do that lol. Not to mention merge makers don't deserve donations, they're not doing any compute.
>>
>>101513411
Just go to discord
Go, leave.
>>
>>101513464
Go away petra
>>
>>101513449
But I want gemma to be good.
>>
>>101513464
ur pathetic bud
>>
/lmg/ should kick out all poorfags who can't run big models. I'm getting real tired of all the shitters bickering about which flavor of shitty 8~47B is totally good and which are shilled garbage.
>>
>>101513463
Yes, dumbass. It's a business. They can monetize the popularity through sponsors and inference services. Go look at how many ERP finetuners are sponsored or have their models on OpenRouter.
>>
>>101513501
kek the big models are also bad
>>
>>101513501
honestly the difference between 70B and 8B is marginal
t. tried both for quite a long time
>>
>>101513501
>i only use one of the three available models. i don't like options, and no one should have them.
>>
>>101513419
Blindly believing someone calling a competing finetune "slop" is rather stupid. I have doubts that you're even human.
>>
Are there any models that can coherently roleplay a robot/android? Maybe some specific datasets based on this...
>>
>>101513411
Petra is inoffensive compared to the shills.
>>
>>101513587
said petrus after months of trying to kill the thread
>>
>>101513572
>time quads
I pretended to believe that to hear that person's alternative suggestion, then try both and determine what's best for myself.
>>
mistral nemo blows, retarded and schizo, at least when I fell for the shill it only took 5 minutes to download the shitty 8bpw exl2
Inb4 skill issue.
>>
>>101513625
>temperature issue
>>
>>101513604
I'm not trying to kill the thread I'm trying to kill DISCORD
total discord DEATH
>>
What's the new meta nowadays?
>>
>>101513625
Show your parameters
>>
>>101513635
you also always say the thread is reddit and should die, so...
also thx for confirming you are petra, not that it needed hard confirmation
>>
>>101513630
>>101513648
Had samplers neutralized, so all off, temp 1
>>
>>101513625
show cock size
>>
>>101513650
I say that. And it should die.
>>
>>101513662
>temp 1
that's hot
>>
>>101513662
>>101513630
>>
>>101513663
48gb
>>
>>101513637
Nemo for creative writing, Gemma 2 27B for general assistant, and some other model for code.
>>
File: temp.png (6 KB, 598x58)
>>101513662
>not reading README.md
>>
>>101513685
Readme's are for nerds. Thanks.
>>
>>101513690
I hope you keep failing.
>>
>>101513674
As someone also with 48GB, Nemo is probably the best for RP and stories. It hallucinates too much to be a general assistant though. Gemma 2 context is too small. Llama 3 has repetition problems and it's too censored. Qwen 2 is stilted. And every community finetune is retarded.
>>
I need chatbots to be happy.
>>
>>101513796
lucky for you that a good chunk of the models have a horrible positivity bias baked in
>>
>>101513824
Ugly face anon is lost, indeed.
>>
File: GLpVlHiagAATNCW.jpg (305 KB, 1431x1715)
>>101513076
how do you know you're using a good model and not a model over fit to what you like
>>
File: file.png (966 KB, 768x768)
>>101513861
No cookie for you.
>>
Am I good with 16gb rtx 4060 ti and 32gb vram?
>>
>>101513922
You can run shitty tiny models at okay speed or better bigger models at a snail's pace. Does that sound 'good' to you?
>>
>>101513577
pls help anons.....
>>
File: OIP.jpg (44 KB, 474x478)
Commander was the last good single gpu release.
>>
File: 1708697438718002.gif (557 KB, 498x443)
>>101513946
>Does that sound 'good' to you?
No...
>>
>>101513922
Awkward place to be. Overkill for small models, not enough system ram to file cache a decent 70B quant.
>>
File: 1695759815689144.png (70 KB, 670x409)
reminder
>>
Holy shit it actually works... I thought it was just another /lmg/ shill but whoever mentioned 2MW thank you so much. It is incredible how good 2MW is.
>>
>>101513991
2 megawatts?
>>
>>101513975
Shouldn't upper left be GPU-rich or something? Cause it makes no sense.
>>
>>101513922
You can probably run Mixtral 8x7b, Qwen 2 MoE, Gemma2 22B, and CommandR at okay-ish speeds at okay-ish quants, I think.
>>
>>101513991
2morrow?
>>
>>101514008
No, it's just calling genAI fags retards who are inferior to traditional ML researchers even in the gpu-poor segment
>>
>>101514021
>Gemma2 22B
I knew it. Nobody is actually using this piece of shit.
>>
>>101509683
Is there someone actually typing these? I thought it was a bot.
>>
>>101514028
>traditional ML researchers
What is that? Undi?
>>
>>101514042
The people who've been pushing the field forward for the past 60 years before the 'deep learning' craze brought in all the talentless children and silicon valley startup crowd.
>>
>>101514057
>The people who've been pushing the field forward for the past 60 years
Can one of them make a thorough, diverse, objective quantitative evaluation of model cooming quality?
>>
>>101514029
hi petra
>>
>>101514077(me)
Actually, now that I think about it: would training just a discriminator from a GAN work well for judging cooming quality? I mean training on synthetic slop vs organic roleplay data. I have a feeling that a discriminator would quickly catch on to all the shivers etc, and the rate of shivers in text would translate to quality.
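Minus the GAN framing, the discriminator alone is just a binary classifier, so a bare-bones sketch would look something like this (placeholder snippets; you'd obviously need a real corpus and a held-out test set):

[code]
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# placeholder data: replace with real synthetic slop vs. human-written RP logs
synthetic = ["a shiver ran down her spine", "her voice barely above a whisper"]
organic = ["some human-written rp log here", "another human-written log here"]

X = synthetic + organic
y = [1] * len(synthetic) + [0] * len(organic)  # 1 = slop, 0 = organic

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(X, y)

# "slop score" of a new model output = probability of the synthetic class
print(clf.predict_proba(["she whispered, a shiver running down her spine"])[0][1])
[/code]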
>>
>>101513975
Reminder of what?
>>
>>101514021
how "okay-ish" speed are we talking? Anything above 3 minutes seems a bit much to me.
>>
>>101514195
CommandR will be the slowest one, with the MoE ones being the fastest.
Gemma will sit right in the middle.
I think you'll be around 3 to 5 minutes mark once you get a good amount of context going for those.
Try Mixtral 8x7b and see how it works for you.
>>
>>101514206
>Try Mixtral 8x7b and see how it works for you.
https://huggingface.co/TheBloke/dolphin-2.5-mixtral-8x7b-GGUF

This is an awesome Mixtral finetune. Stock Mixtral was vindictive, rebellious, and Woke.
>>
>Still using Mixtral in 2024
>>
>>101514342
what's your alternative for the niche mixtral filled?
>>
>>101513918
AI messed up this one a lot.
>>
>>101514353
Gemma 2 and Nemo.
>>
>>101514342
Good point, but what else? E.g. I have 64 GB RAM. I can technically run 70Bs in 1.5 T/s, but that's a bit too slow. 8x7B gives a comfy 6 T/s at 8K context. Both Q6K/Q5KM, I don't think lower quants are reasonable.
>30B
I think those are dumber than Mixtral. And slower, too. So I'll just hope someone delivers another model in 8x7B form factor someday.
>>
>>101514342
It's really good for the tradeoff of speed and size.
>>
>>101514367
Nah, it's old. Keep living under a rock.
>>
>>101513076
In ye olde days I used to look at Kobold Horde models, test them there and if the delivery was good, download. Not sure how relevant this is today, but models there did somewhat follow "the current meta"
>>
>>101514396
Alpin hosting Goliath all day with his sponsor money means it's the meta!
>>
>>101514401
Eh, it was shilled in here a lot so I guess it was "the meta" for some time. I'm personally skeptical towards it (repeating the same model twice cannot make it smarter), but some people I know got good results.
>>
>>101514361
lol, Mixtral is smarter and better at multilingual than Gemma2 and Nemo, unfortunately it doesn't have sovl but I still prefer a non retarded model rather than something that doesn't understand shit about my conversation
>>
>>101514439
You and your friends have brain damage. Go back to the Kobold Discord and stay there.
>>
File: Untitled.png (537 KB, 720x1416)
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
https://arxiv.org/abs/2407.14057
>The inference of transformer-based large language models consists of two sequential stages: 1) a prefilling stage to compute the KV cache of prompts and generate the first token, and 2) a decoding stage to generate subsequent tokens. For long prompts, the KV cache must be computed for all tokens during the prefilling stage, which can significantly increase the time needed to generate the first token. Consequently, the prefilling stage may become a bottleneck in the generation process. An open question remains whether all prompt tokens are essential for generating the first token. To answer this, we introduce a novel method, LazyLLM, that selectively computes the KV for tokens important for the next token prediction in both the prefilling and decoding stages. Contrary to static pruning approaches that prune the prompt at once, LazyLLM allows language models to dynamically select different subsets of tokens from the context in different generation steps, even though they might be pruned in previous steps. Extensive experiments on standard datasets across various tasks demonstrate that LazyLLM is a generic method that can be seamlessly integrated with existing language models to significantly accelerate the generation without fine-tuning. For instance, in the multi-document question-answering task, LazyLLM accelerates the prefilling stage of the LLama 2 7B model by 2.34x while maintaining accuracy.
neat. no code posted but might be here
https://github.com/apple?q=ML&type=all&language=&sort=
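the core selection step, as a very rough sketch of the idea (not the paper's code; just "keep the prompt tokens the last token attends to most, defer the rest, allow revival later"):

[code]
import numpy as np

def tokens_to_keep(attn_to_last, keep_ratio=0.5):
    """attn_to_last: attention weights from the last prompt token to every
    earlier prompt token (one attention row, averaged over heads).
    Returns indices whose KV we compute at this layer; the rest are deferred
    and can still be brought back in a later decoding step."""
    n_keep = max(1, int(len(attn_to_last) * keep_ratio))
    keep = np.argsort(attn_to_last)[-n_keep:]  # top-k most-attended tokens
    return np.sort(keep)

# toy example: 8 prompt tokens, keep the half the last token cares about most
scores = np.array([0.01, 0.20, 0.02, 0.15, 0.03, 0.30, 0.04, 0.25])
print(tokens_to_keep(scores))  # -> [1 3 5 7]
[/code]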
>>
File: denial.png (1.23 MB, 3330x2006)
>>101514465
Are you mentally ill?
>>
is nemo any good at generating coherent, grammatically correct Japanese sentences? what about Spanish?
>>
uuh any context and instruct jsons for nemo?
>>
File: 1708266380645-1.png (303 KB, 1024x1024)
>>101514470
I wonder what your relationship to kobold is. Do you find yourself thinking about him or her in various contexts? How does he or she fit into your life? Maybe it would be interesting for us both to try talking to kobold as if he were there, and seeing how it feels for both of us?
>>
File: terminal-cope.png (143 KB, 1911x621)
>>101514465
>>101514490
Mixtral doesn't even make the chart.
>>
>>101514342
What do the new models do that is so much better? Jack shit, that's what
>>
>>101514490
>27b beating 70b
don't need a PhD to understand that's bs
>>
>>101514470
Based and petrapilled
>>
>>101514490
>27b > 70b according to your chart
I guess I have to redirecict that question to you, are you mentally ill?
>>
File: denial-part-2.png (111 KB, 1460x786)
>>101514517
Read the name of the pic.
>>
>>101514506
Alpin is in this thread for example.
>>
Nemo is pretty good, and I haven't even used the expert roleplayer system prompt yet.
I think the French got their hands on an extremely good multi-turn dataset, proving once again that it's all about data quality when it comes to RP, parameter count comes second.
>>
THE NUMBER BEFORE 'B' IS LARGER SO IT MUST BE BETTER
YA'LL JUST COPING
>>
>>101514532
chatbot arena isn't a great benchmark though, it says that gpt4o is first and claude 3.5 sonnet is second, that's so much bullshit I can't help but laugh. C3.5 sonnet is way better than any model (local or API) I tested in my life
>>
>>101514550
Now look at this pic again: >>101514490 >>101514511
>>
>>101514537
>proving once again that it's all about data quality when it comes to RP, parameter count comes second.
I disagree, Nemo is kinda retarded and that's because it's a small model, you can't make a 12b model as smart as a 70b one, it's just basic maths, the transformers architecture just gets better and better with more parameters, it's not a coincidence that Meta decided to go berserk on the number of Bs (405b soon) so that they can compete against the APIs, it just works that way
>>
>>101514564
>Now look at this pic again:
Now look at those counterarguments again: >>101514517 >>101514529 >>101514550
>>
Sadly /lmg/ is filled with retards who built huge rigs last year who are now coping too hard to admit that there are no big models worth using right now.
>>
>>101514578
>there are no big models worth using right now.
the fuck is this revisionism? the best local model is still CR+ and that's a 110b model
>>
>>101514576
Now read the name of the pic.
>>
>>101514565
>you can't make a 12b model as smart as a 70b one, it's just basic maths
lmao. We should go back to the era of billions of wasted and redundant parameters. Bigger = better.
>>
>>101514585
>If I claim that you're in a denial then it means that it's an absolute proof
And let me guess, Self-ID on gender is also a valid thing? kek
>>
>>101514584
Yes, I'm sure your $5000 machine paid off running this model that's 5% better than 70b and thus by extension maybe 2% better than gemma.
>>
>>101514602
Just say that you're too poor to afford to run big models and therefore know nothing about the huge difference between small and large models, it's no shame to be poor anon.
>>
>>101514598
Imagine someone pretending that GPT 3.5 Turbo is still worth using today, that's how unbelievably stupid someone using Mixtral today sounds.
>>
>>101514602
Cr is worse than llama 3 70b. But the new 70b tomorrow will be better than everything.
>>
>>101514565
Don't really care what meta does after llama3 fiasco, but let us know if it's better than nemo or gemma when it comes out.
>>
>>101514614
Isn't GPT3.5 turbo supposed to be a distilled 20b model or something? I've read that somewhere
>>
File: amazing.png (170 KB, 1735x420)
The first model to beat Turbo! It must be amazing!
>>
>>101514620
>Don't really care what meta does after llama3 fiasco
I agree with you anon, L3 is a joke when you know that they spent 9 months "working" on it, and the best solution they had was "moar tokens" instead of, I don't know... advancing the research field by trying new approaches like Mamba or Bitnet
> let us know if it's better than nemo or gemma when it comes out.
What I know so far is that Mixtral doesn't make hard logic mistakes in RP like Gemma and Nemo do, and that's fine, I don't expect a 12b or 27b model to be smarter than a 47b model
>>
>>101514633
See: >>101514511
>>
>>101514565
This but unironically
Davinci will always be king
>>
>>101514639
See: >>101514576
>>
File: kek.jpg (180 KB, 2304x910)
>>101514639
Anon, if Mistral Nano was THIS good, the french fags would've said that this model competes against the big guns (Mixtral, L3-70b), yet they decided to say that it's better than models smaller than it, like gemma9b or L3-8b
https://mistral.ai/news/mistral-nemo/
>>
>>101514641
>Davinci will always be king
this, I fucking miss Davinci-003, this shit was so creative, if I'd known I would've played with it for much longer, now we've got soulless reddit-tier little riddle masters, fuck that
>>
>>101514658
That comment thread is about Gemma 2 27B. Nemo is better for creative writing.
>>
>>101514672
They could've compared with Gemma 2 27b too, yet they didn't; if they really had a model better than G2, they would've said it, it would be too good to ignore, that's what I like about the MistralAI fags, they aren't willing to lie to make their model appear better than what it really is
>>
>>101514682
>>101514682
>>101514682
New Thread
>>
File: Yall.jpg (39 KB, 680x451)
>>101514544
>YA'LL JUST COPING
>YA'LL
>>
>>101514602
This post oozes envy
>>
File: 1721085350403749.png (195 KB, 500x553)
>>101514688
you are gay
>>
i've been messing around with nemo and honestly i don't find it as good at rp as gemma 9b.
>>
>>101511680
You'll need +1 gpu for host


