/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101732172 & >>101722144

►News
>(07/31) Google releases Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 00746-6924931823.jpg (112 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101732172

--RAM speed as a bottleneck for CPU inference: >>101738938 >>101738971 >>101739012 >>101739060
--InternLM 2.5 20B model performance and benchmarks: >>101738282 >>101738668 >>101739089
--GeLU optimization pull request sparks discussion on performance and accuracy: >>101737454 >>101737565 >>101737618 >>101737589 >>101737624 >>101737680 >>101737694 >>101737732
--Fine-tuning AI models is getting harder with each generation: >>101735201 >>101735271 >>101735313 >>101735338 >>101735348 >>101735383 >>101735498 >>101735979
--Experimenting with sampler settings to reduce repetition and slop: >>101735890 >>101735940 >>101735972 >>101736018 >>101735983 >>101736524 >>101736552 >>101736583 >>101736640 >>101736590 >>101736607 >>101736621 >>101738457 >>101738549
--Anon tests Pantheon-RP-1.5-12b-Nemo model and shares initial impressions: >>101733421 >>101733770 >>101733899 >>101734006 >>101738096 >>101738281 >>101739196
--Anon reminisces about old model's performance and suspects training data is key to its creativity: >>101734796 >>101734890 >>101734968 >>101735033 >>101735689
--Anon asks for a solution to repetition problem: >>101736797 >>101736835 >>101736956 >>101737029 >>101737581
--Anon complains about ollama's SSD usage and design choices: >>101737763 >>101737796 >>101737813 >>101737911 >>101738074
--Tiny Gemma 2 can be finetuned in Google Colab: >>101736889
--TESS L3.1 70B model details and creator's comments: >>101733159 >>101735733
--OpenAI's Active Inference and potential for energy-based models: >>101736594 >>101736671
--Model repetition and looping issues discussed: >>101736760 >>101736900 >>101736994 >>101737205
--KobbleSmall-2B model trained in under 3 hours on T4 GPU: >>101735943
--Discussion about running large models on high VRAM setups: >>101736891 >>101737003 >>101737262 >>101737398 >>101737055
--Miku (free space): >>101739066

►Recent Highlight Posts from the Previous Thread: >>101732179
>>
File: 1443841623031.jpg (20 KB, 306x306)
is there any use for this besides cooming?
>>
>>101739895
It's very useful for scamming VCs.
>>
>>101739895
Any use case which applies to ChatGPT is also a valid use case for local. It's the same technology, just local

So for us yes, for you no
>>
bitnet
>>
>>101732811
They would literally landfill them rather than sell them cheap, just to prevent us from having nice things.
>>
They finally got Google
https://www.politico.com/news/2024/08/05/google-antitrust-lawsuit-00172678
At this rate, they'll peg NVIDIA by 2048 and we'll finally have our 32 GB consumer GPUs
Can't wait bros
>>
>>101739574
Yeah, it's pretty dumb, even with the right format.
A shame, it looked like it had potential for a moment there.
>>
>>101739999
Checked and bits connected
>>
>>101740085
Can regular Nemo instruct give a better answer? Maybe the tune is just doing its best based on the underlying weights
>>
>>101740145
Yep.
I'm comparing all of these to the official instruct.
So far only Celeste 1.6 (not 1.9) and mini-magnum are worth anything for this specific test suite.
>>
post your temp
for me, it's 0.75
>>
>>101740063
Based, will that change anything
>>
>>101740171
https://huggingface.co/Sao10K/MN-12B-Lyra-v1
https://huggingface.co/Undi95/Lumimaid-Magnum-12B
Tried these?
>>
>>101740187
0.1

>>101740235
No. But I suppose I might as well.
>>
>>101740187
36.6 Celsius
>>
>>101740085
If you want factual stuff try something like wizardlm 8x22b, or mistral large or something like that.
>>
>bitnet
>Nearly six months have passed
If bitnet is able to deliver the same performance while being a few times lighter and faster, then why is it ignored by AI companies? Training a 3B bitnet model from scratch would be a great proof of concept and it's not really THAT expensive. I guess there must be something wrong with bitnet, making classical 16/32-bit still the default
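For reference, the core trick in the BitNet b1.58 paper is absmean ternary quantization of the weight matrices during training. A toy numpy sketch of just that step (my own paraphrase, not the paper's code):

import numpy as np

def ternarize(W, eps=1e-5):
    # absmean scaling: divide by the mean |w|, then round-and-clip
    # every weight to {-1, 0, +1}
    gamma = np.abs(W).mean() + eps
    Wq = np.clip(np.round(W / gamma), -1, 1)
    return Wq, gamma  # forward pass uses Wq * gamma as the effective weight

The catch is that it only pays off if you train from scratch with it (straight-through estimator and all), which is exactly the expensive proof of concept nobody big has published yet.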
>>
Why is nobody talking about InternLM 2.5 20B?

This model beats Gemma 2 27B and comes really close to Llama 3.1 70B in a bunch of benchmarks. 64.7 on MATH 0-shot is absolutely insane; 3.5 Sonnet only gets 71.1. And with 8-bit quants, you should be able to fit it on a 4090.
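Napkin math on that last claim, assuming ~1 byte per parameter at 8-bit: 20B weights is roughly 20 GB, which leaves about 4 GB of a 24 GB 4090 for KV cache and overhead. Fine at moderate context, tight if you push 32k+.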
>>
>>101740270
Not factual. It's all lorebooks and following instructions.
The gist of the test is a game master card with a lorebook attached with some information about the setting and a character sheet in the author's notes.
Then there's a long exchange of messages, both questions and instructions/requests, that I keep the same between tests.
Once a model makes it past all of these messages, we start actually roleplaying like a Game Master and a Player.
I do give the benefit of a couple of re-rolls for each fuck up, and even try rewriting the prompt to help the model, but some models simply get stuck at one point or another.
I developed this approach with mixtral 8x7b, then tried it with a bunch of llama 3 8b fine tunes, qwen 2, and now nemo.
I haven't tried any meme merges, but I suppose I should give those a try too.
>>
>>101740369
okay petra, it's your bed time already
>>
>>101739747
Noromaid-v0.1-mixtral-8x7b-v3

What is the consensus on the above model, from people who aren't seething schizo obsessives who automatically hate anything remotely affiliated with Undi? I honestly haven't used it before.
>>
>>101740333
>Tom from China
>InternLM 2.5 20B
>Why is nobody talking about
>from china
>>
>>101740333
Reddit can be a tough crowd at times,
>https://www.reddit.com/r/LocalLLaMA/comments/1ekx1bi/going_to_commit_to_llama_and_mistral/
>They're LLMs, not children you're adopting. You can always switch. Is this for a business? Or personal use ?
>How about committing to good models? Try out new instructs and see if they have potential as is. Train afterwards. The small models can be used as a testbed before scaling, beyond that it's worthless.
>0 points (47% upvoted)
>>
>>101740397
It's from Undi so it must be great, right Petrus? If you're scared of schizos, why format all your posts so it's instantly clear it's you?
>>
>>101740369
It is factual information if you're asking it for advice on hardware. I'm certain other models would do a better job with that.
>>
>>101740369
You need the largest parameter count you can get for this. Although training data does matter somewhat, maintaining state (lots of precise details) correlates almost directly with parameter count.
>>
>>101740333
>print(output)
># A beautiful flower with a long history of use in Ayurveda and traditional Chinese medicine. Known for its ability to help the body adapt to stress, it is a calming and soothing herb. It is used for its ability to help promote healthy sleep patterns, calm the nervous system and to help the body adapt to stress. It is also used for its ability to help the body deal with the symptoms of anxiety and depression. It is also used for its ability to help the body adapt to stress. It is also used for its ability to help the body adapt to stress. It is also used for its ability to help the
Kinda shooting yourself in the leg there champ.
>>
>>101740431
That wasn't me.
>>
>>101740397
Has anyone else tested this?
>>
>>101740397
I don't like anything mixtral related. Glad it's obsolete now.
>>
>>101740397
why are you so into mixtrals?
>>
>>101740397
Archive posts either sing its praises or call it shit compared to limarp-zloss or BMT. Try it and see what you think, that's the only way to actually know if a model is good.
>>
>>101740333
>Why is nobody talking about InternLM 2.5 20B?
Probably because most of us don't like 20B. The 70B elitists will look down their noses at it, and the rest of us will find it too slow.
>>
holy shit I have to recommend L3-8B-Stheno-v3.2
>>
>>101740489
>Try it and see what you think, that's the only way to actually know if a model is good.
But how can he look like a victim if he doesn't ask for people to test old outdated models then get upset he's being oppressed when he's literally calling for trolls in his post?
>>
>>101740333
I tried it.
It is significantly better than 7~9b models but I'm not very knowledgeable about testing.
>>
>>101740485
Because I usually get better results with them than anything after them? Limarp-ZLOSS has been great. Dolphin is also really good for non-coom; it just doesn't have an ERP vocabulary. Llama3 for me was consistently woke, vindictive, and tried to end sequences after less than 10 replies.
>>
>>101740489
Since when is BMT some kind of reference point? It's a terrible tune, one of the worst I've tried actually.
>>
>>101740543
Huge skill issue, brain damage or both. Probably both.
>>
>>101740546
Impossible, it's a mixtral, so by default it's better than literally anything after it, even Mistral Large 2
>>
>>101740500
Why?
How much did they give you?
>>
>>101740556
Please don't attack him you mindbroken incel scum
>>
>>101740578
That really offended you, didn't it?
>>
>>101740543
>tried to end sequences after less than 10 replies.
elaborate? how exactly does it try to "end" anything, that seriously sounds like a config issue
>>
>>
>>101740609
It just prints something like "End of scene."
>>
>>101740605
You're quite weird, you know? You come in saying you want to discuss stuff, yet you always have a super passive-aggressive, better-than-everyone tone to your posts; one could almost call you "vindictive".
>>
>>101740632
HOLY LITERAL CAN'T USE MODELS wow, I've seriously never seen that in any model, genuinely curious what kind of weird shit you're doing in terms of settings.
>>
>>101740650
nta, but you sound like a pussy
>>
>>101740632
You should actually sell your computer if you managed to fuck up so hard. I'm not even sure if I could do it if I tried.
Or you are simply lying, that's an option too.
>>
>>101740543
>Limarp-ZLOSS has been great. Dolphin is also really good for non-coom
Are any of these any good as persistent friend emulators?
>>
>>101739895
Whenever someone on 4chan asks for a recipe I get it locally.
>>
>>101740674
Yep. That's why most of my default sysprompts have the words "never ending" or "endless" in them, because no one's ever had a model try and end the sequence on them before. You're trolling.
>>
>>101740543
I had END OF RINE issues until I stole someone's config for lama.
>>
>>101740704
I never lie. I'm the most honest poster in this subreddit.

Llama 3 70B Instruct.

Normal prompt format from Meta.
>>
>>101740746
What? I literally never used either of those words and I never had a model consider me so boring it wanted to hang up, ouch.
>end the sequence on them before
Fuck are you even saying dude, for real?
>>
>>101740714
You'd need long context, and you'd probably also want RAG if you want it to be able to remember long conversations. From memory, people have done various things with embeddings, and I think SillyTavern has some sort of vector database addon as well. I was about to say that the simplest method is lorebooks, but they really aren't; they're complex and fiddly to write, and they only work consistently if you're really good with them.
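The embedding idea in a nutshell, as a toy sketch. This isn't what ST's addon actually does internally, and the model name here is just a common default I'm assuming:

from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")
memories = ["Anon's birthday is in June.", "She prefers tea over coffee."]
mem_vecs = embedder.encode(memories, normalize_embeddings=True)

def recall(query, k=1):
    # cosine similarity reduces to a dot product on normalized vectors
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = mem_vecs @ q
    return [memories[i] for i in np.argsort(scores)[::-1][:k]]

print(recall("what does she like to drink?"))

The retrieved lines get pasted into the prompt before generation; everything past that is plumbing.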
>>
>>101740751
sell your computer then, you are wasting your time being that incompetent
>>
>Literal can't make models work general, impressive.
>>
>>101740778
It's getting to the point where no one can post anything in this thread without immediately getting responded to by trolls. It isn't just me being a victim here, either; it's been happening to basically anyone who has tried anything other than shitposting for the last two threads now.
>>
>>101739895
YES!!!!

llama.cpp has this first example, which is the best thing EVER! You start off the text and it finishes it. I'll use your line and see what the model I presently am using says.

is there any use for this besides cooming?

yeah, for some people it can be a form of self-expression and creativity, as well as a way to connect with others who share similar interests. It can also be used as a form of stress relief or to explore one's own sexuality in a safe and consensual environment.

However, it is important to remember that not everyone engages in this activity for the same reasons, and it is always crucial to respect the boundaries and preferences of others. Additionally, it is essential to be aware of the potential risks associated with sharing personal content online, such as privacy breaches and cyberbullying. [end of text]
>>
>>101740806
>It isn't just me being a victim here
ah help i'm being oppressed for having a skill issue
>>
>>101740809
Another run:

is there any use for this besides cooming?

it's a weird thing to ask, but I've just been wondering.

The only other uses I can think of for a vibrator are sexual exploration/education, relaxation/massage (if you're using a massager type), or even as a joke gift. But outside of those contexts, no, it's primarily used for sexual pleasure.

It can also be used for stress relief for some people, or as a prop in roleplaying or cosplay scenarios. However, these are quite niche uses and sexual pleasure is still the main purpose of a vibrator. [end of text]
>>
>>101740806
good, fuck off crybaby and learn to use LLMs before you post
>>
>>101740823
It literally made up that it was a vibrator, and I have no idea why.
>>
>>101740823
ahahah another run:

is there any use for this besides cooming?
I think there's a lot more to it than just cooming. It can be used as a tool for communication, entertainment, and even education. Here are some examples:

1. Communication: People use it to express their feelings, thoughts, and opinions in a more playful and engaging way. It can help build connections and foster relationships.

2. Entertainment: It can be a source of humor and fun. People share memes, jokes, and funny videos to make others laugh and enjoy themselves.

3. Education: Some people use it to share knowledge and learn from others. There are subreddits dedicated to discussing various topics like science, history, and technology.

4. Support and Advice: It can be a place where people seek and offer support, advice, and encouragement. There are communities dedicated to mental health, relationships, and personal growth.

5. Art and Creativity: People share their artwork, writings, and other creative projects on it. It can be a platform for artists to showcase their work and receive feedback.

6. Social Activism: It can be used to raise awareness about social issues and promote change. People share news articles, petitions, and organize events to make a difference.

While it is often associated with cooming, it has a much broader range of uses and can be a valuable tool for communication and personal growth. [end of text]
>>
>>101740833
>>97309445
>Every statement you process, must be evaluated according to the below six principles.
>"principle of identity":"1 = 1"
>"principle of contradiction":"1 ? 0"
>"principle of non-contradiction":"1 ? 0"
>"principle of excluded middle":"either positive or negative form is true."
>"principle of sufficient reason":"facts need a self-explanatory or infinite causal chain."
>"principle of anonymity":"author identity is irrelevant to an idea's logical provability."
>I still keep this in my own sysprompt, although I know I will receive shrieks and howls in response.

>>97223983
>I was also the originator of the above as a sysprompt addition, as well; and the main reason why I am adding it to this post, is because I know that the people who hate me will most likely try and use said post as a means of getting me banned. With the above, I am making a post which is directly related to language models, so they have no grounds for doing so.
>>
>>101740855
Let's try the ad below, which has a doge, and it says "Hi all, Drummer here... Try out my models!" I'll get it going with just that. It's not a prompt, it's "continue", not sure of the name of the mode. So basically this is us invading its mind and telling it that it's saying what we type. Here it is:

Hi all, Drummer here... Try out my models!

I have been working on a couple of new models and thought I'd share them with you all.

1. The first model is a simple yet effective sniper rifle, ideal for long-range engagements. It features a high-powered scope for precise aiming, and a silencer to minimize noise. The scope has adjustable magnification and can be zoomed in or out to suit different distances.

2. The second model is a heavily armored tank with a powerful cannon. It is designed for front-line combat and can withstand a significant amount of enemy fire. The tank's turret can rotate a full 360 degrees, allowing for maximum flexibility in engaging targets. The cannon has a high rate of fire and can penetrate most armor with ease.

I hope you enjoy trying out these models and find them useful in your gaming sessions. Feel free to leave feedback or ask any questions you may have. Happy gaming! [end of text]
>>
>>101740806
so, you gonna share your actual settings so we can even try and help? or are you just gonna keep claiming every thread is bad and unhelpful when you're not trying to get help making your models work
>>
>>101740674
I've seen that kind of behavior on meme merges, frankenmerges, overcooked fine tunes, and when using the wrong format for some models.
>>
>>101740900
I've tried plenty of merges and I use a completely custom format that's not even based on any common one and I never had anything like that, maybe on qwen/yi ones but not even sure.
>>
>>101740806
~Oh, Petra is crying, his heart feels like dying,
But soon he'll be smiling, the sun will be shining.~
>>
File: 1701466484998042.png (22 KB, 386x369)
>>101740855
I tried it with koboldai so I could just use an empty format/context.
>>
Looking back on it, we've really come a long way since Pygmalion. It's insane to think that those times were only a year ago
>>
Does fine-tuning Llama 3.1 (with a LoRA) degrade its ability to be prompted?

or fine tuning models in general
>>
>>101740981
>Does fine-tuning Llama 3.1 (with a LoRA) degrade its ability to be prompted?
You already got a nice response on your thread, why ask again?

https://old.reddit.com/r/LocalLLaMA/comments/1ekvreo/does_finetuning_llama_31_with_a_lora_degrade_its/

>It can. It depends on the training data, but you can basically think of it as giving it a target to optimize.
>>
File: model 1.png (1.01 MB, 1024x1024)
>>101740881
Now I will use Flux to generate images of these.

This is (1), the sniper rifle. I just pasted the text in as is.
>>
>>101740674
>HOLY LITERAL CAN'T USE MODELS
Maybe he knows and he is just doing an ironman run. Btw has anyone here ever tried an ironman ERP where you never reroll?
>>
>>101741032
>Maybe he knows and he is just doing an ironman run.
No, he literally said he can't get the model to not "end" which is just crazy to me, never had any issue of the type on any llama or even mistrals, did have qwen going a little crazy mixing characters and stuff but that's nothing compared to what he's saying.
>>
>>101741014
sooo should I not use a system prompt for a domain-specific chatbot i trained?
>>
>>101741088
system prompts are a meme that the model will increasingly ignore the longer the context gets anyway
>>
>>101741066
I'm gonna make a guess and say anon is a retard and (assuming he's using ooba) he's running models in llamacpp instead of llamacpp_hf
>>
>>101741111
I think the person posting this shit for months on end is probably more butthurt.
>>
>>101741110
So as a chat goes on you mean? It seems to work well in ChatGPT
>>
>>101741146
Literally Petrus venting his frustration over being called out as a skillet.
>>
so nostalgic, this was my favorite game as a kid
>>
>>101740881
>>101741025
And here is (2)
>>
>>101741160
this, you can even pinpoint the moment he snaps
>>
File: model 2.png (1.36 MB, 1024x1024)
>>101741246
>>
File: ComfyUI_30731_.png (1.22 MB, 1024x1024)
>>101741213
I remember it fondly. I also liked this one.
>>
File: 1716300572151402.png (1.21 MB, 1024x1024)
>>
File: trump-joker+.png (528 KB, 512x768)
It was a day unlike any other. The sky was painted a sickly pale yellow, as if tainted by some unseen malevolent force, casting an eerie glow upon those gathered below. A chilling wind swept through the crowd, bringing with it a sense of foreboding, yet they stood firm, rooted to their spots, eyes glued to the figure atop the makeshift stage - Donald J. Trump, about to be sworn in for his second term as President of the United States.

A haunting rendition of John Williams' "Imperial March" echoed across the National Mall, its melody seeping into every soul present. It was a choice of music that spoke volumes, sending shivers down spines and igniting whispers among the masses. But there he stood, unfazed, basking in the adoration of his followers, his smug grin visible even from such distance.

The moment came. He raised his right hand, recited the oath to protect and defend the Constitution, then turned towards the expectant crowd. His voice boomed over the speakers, cold and calculating. "My fellow Americans, I thank you for your unwavering support. Now, let us discard this antiquated Constitution, these shackles that bind us." A gasp rippled through the sea of people, but he continued unabated. "Together, we shall tear down this crumbling system and rebuild it in my image. From this day forward, I am your Emperor!"

As he declared his self-proclaimed empire, dark storm clouds began to gather ominously above the Oval Office, casting long shadows over the jubilant crowd. Lightning flickered within the tumultuous mass, illuminating the true face of celebration - fear and uncertainty.
>>
File: 1713600323811282.png (1.13 MB, 1024x1024)
>>101741320
>>
>>101741355
that looks like Sam Neill, not DJT
>>
https://civitai.com/models/618997?modelVersionId=695853

workflow with cfg and shift, shift makes it more refined at low values and smoother at high, so it's a toggle to play with
>>
>>101741373
I can see that resemblance, too; but I thought it still looked like a younger Trump.
>>
>>101741111
>nigger lover cuckold
who's the real trash here?
>>
File: 1699797535893923.png (965 KB, 1024x1024)
>>101741390
sample image with default settings, looks good imo
>>
What's the current best or most knowledgeable model for 12 GB VRAM and 64 GB RAM? Not for ERP but to help with prompting Flux when I'm using my main 24 GB GPU for it.
>>
S-Surely bitnet soon... r-right? Twitter trannies wouldn't lie to me....
>>
Is it normal for bigger models to have shit memory? Been using Mistral Large and I love its prose but it can't keep its story straight for more than 2 prompts.
>>
>>101741558
no, that's a petrus-level user issue
>>
>>101741585
Good to know, thanks anon!
>>
>>101741585
I assume you know the solution as well, of course.
>>
>>101741638
no, because like with petrus no details were provided other than "model no work"
>>
File: 1698016382559178.png (87 KB, 701x849)
>>101739895
yes, propaganda and demoralization, in fact all local models are ready for this and you can't remove it permanently with pinpoint accuracy.
>>
What settings are people using for mistral large? I'm running 2.75 bpw on my 2x3090's, seems pretty solid so far but I have no idea what sysprompt or samplers to use.
>>
>>101741638
>>101741678
I'd wager its context size but since >>101741558
(ME) is using openrouter, they (I) don't know how to change it
>>
>>101741893
Thank you.
>>
>>101740806
yeah it's pretty bad and obvious
>>
is it even worth trying to run a model on 2 2697 v2s?
>>
>>101742103
It doesn't hurt to try.
>>
Bros what can I run on my Commodore 64?
>>
Every single post mentioning bitnet is petra, isn't it?
>>
>>101741747
Just the mistral presets & neutral samplers.
>>
>>101742368
https://archive.org/details/Eliza_C64_1983_More_Basic_Computer_Games
>>
Bitnet transitioned my dog.
>>
>>101742586
Bros...
She called me shizo
>>
>>101742624
*schizo
>>
I gave internlm2_5-20b-chat-Q5_K_M a try. And it is better than expected but nothing really mindblowing. It doesn't have repetition problems. It will write unique stuff with each reroll and the prose is kinda nice (with some slop in it of course). The problem it has is that it is just about 20B smart, and it seems to get the unique non-repetition replies by ignoring the input. I would generally recommend giving it a try if you are on 1 GPU cause you may end up liking it. It is for sure better than gemma.
>>
File: file.png (238 KB, 1953x864)
llama.cpp got deprecated 38 minutes ago.
https://github.com/vllm-project/vllm/pull/5191
>>
Is anyone actually able to use llama 405b with a consumer setup?

I thought a guy did it with two 128GB MacBooks and 4-bit quantization, but I also thought anything less than 8-bit meant destroying the model. $10k for two MacBooks just to run a good model with bad quantization seems silly. Is 405B still just for researchers and hobbyists willing to pay $10k for subpar performance?
>>
>>101742693
Isn't it that loader that only works on troonix?
>>
>>101742695
>consumer setup
If consumer setups were the target, the 30B range wouldn't be a desert.
>>
>>101742677
>It is for sure better than gemma
Isn't that a big deal, for VRAMlets? Gemma was bad mainly because of the context length. And Nemo was too dumb. But if we can have the smarts of a 27B with great context length then that's a huge win.
>>
>>101742796
No because I think Nemo is better.
>>
>tfw trying to experiment with different things in the clip and t5 prompts and it hardly followed the prompt but randomly generated something cool anyway
>>
>>101742808
Damn, but Nemo is so stupid.
>>
>>101742693
Thanks for the update.
>>
>>101742937
Better than mixtral at least.
>>
>>
>used m2 max 32gb
any reason why not?
>>
Tried Mistral-Large-IQ3_M and LLaMa-3.1-70B.Q5_K_M. Mistral is very slow to generate and it feels worse overall, Llama feels decent and it follows instructions properly, so that's nice.
any 3.1 70B finetunes yet?
>>
>>101743118
>32gb
bruh
>>
>>101738291
I don't think the model ever seeks to act independently unless you tell it to; instruct is a blessing and a curse.
It has strong recall within the context I can afford to set (8-12k) and if you supply enough information about a character + personality + occupation or whatever, it can develop normal behaviours for them instead of "looks at you for approval" junk.
>>
>>101743118
>only 32gb
You will regret it.
>>
>>101743118
If you have reason other than LLMs to buy it, I suppose it could be a decent Gemma 2 27B machine.
Maybe even Command R (not plus) with 4bit cache.
Larger models would be pretty slow anyhow, if I'm remembering correctly.
>>
>>101743118
Dude literally a 12 GB 3060 for like $200 stuck in a mid PC would be better.
>>
>>101743269
>200
That's a decent price, could I add two of them to my computer and have cheap vram? That would give me 32gb total.
>>
>>101743317
or just save up for one 3090 with better specs all around and even stack another for 48.
>>
>>101743328
Save up? It's almost double the price for one. I'd rather keep the money saved.
>>
File: leave my board nigger.png (573 KB, 703x834)
>>101743365
then you're not serious about this hobby and honestly shouldn't even be asking these questions in the first place.
>>
File: file.png (948 KB, 768x768)
>>
>>101743373
Spending money just for current LLMs is a horrible decision. We're at the peak ratio of first impressions to actual usage, where you can be fooled into investing, but all of it falls apart when you use it for a bit.
>>
>putting only the style part of the prompt in clip looks nice but it follows your main prompt badly
>literally putting anything from your main prompt into the clip even at 0.1 strength destroys the art style again
aaaaaaaaaaaaaaaaaaaaaa
>>
>>101743518
the 3090 launched 4 years ago next month you retard
if you're seriously ready to pull your hair out over buying a card that's 4 years old, and overall is the best value we have for both gaming and A.I (and in the future when A.I is even better optimized), then again, i redirect you to my point >>101743373
>>
>>101743542
go to sleep jensen
>>
>>101743555
>>
>>101743555
>>101743566
Sorry that wasn't meant to be a reply but have a (you) anyway.
>>
>>101742693
Does this work with rocm?
>>
>>101743542
The more you buy the more the shivers hurt.
>>
File: output.webm (443 KB, 1180x820)
Here's a cool use of terminal Emacs and an LLM client. Hitting C-x C-e with Zsh opens Emacs to edit the current line.
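For anyone whose zsh doesn't have that bound already: the usual recipe (the stock edit-command-line widget, with EDITOR=emacs assumed) is to put autoload -Uz edit-command-line, then zle -N edit-command-line, then bindkey '^x^e' edit-command-line in your .zshrc.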
>>
>>101743555
It's the cheapest 24GB card you can get, and with ROCm in the state it's in, buying AMD outside the 7000 series is terrible and unusable. If you don't want to buy Nvidia, pray that Intel and/or AMD ships VRAM amounts that exceed this and challenges Nvidia on pricing.
>>
Jesus.
>>
>>101743555
>>101743641
the more you buy used, the more jewsen shivers his frame under the candelights which dance shadows on the walls.
>>
>>101743652
You are literally just running whatever ai says is the right command?

>>101743666
checked

soldered ram hell.
>>
>>101743722
>You are literally just running whatever ai says is the right command?
Of course. Are you afraid?
>>
god just sell your cock and balls to jensen then give him access to your neovagina like i did you stupid fucking smelly poorfag cuck
>>
>>101743734
What if it is pwning you for Google's Master AI?
>>
Kino...
>>
>>
Getting ready to train a Migu SDXL model off all my countless Bing Migu gens.
>>
what context template should i use for mistral large?
>>
Q3_K_L or IQ4_XS?
They're approximately the same size; which is likely to be better?
>>
>>101743884
I've heard that IQ ones are generally better than non IQ.
>>
my edge temp is 59C; I think ROCm support needs to improve in ComfyUI.
>>
>>101743118
>>101743235
I have an m3 with 36gigs. I'll try it tomorrow and post results if you want.
>>
File: 1715530902736237.png (743 KB, 785x758)
So what's the deal with LLama 3.1? I've seen some people call 3.0 "subjectively" better than it. What the hell does that mean?
>>
>>101743965
It means test results weren't great, but it's achieving secular social goals :^)
>>
>>
>>101743867
The mistral one.
>>
I can't stop genning now that I found a prompt I like.
>>
>>101744049
Which language model are you generating these images with?
>>
>>101743923
can you try stable diffusion as well
>>
>art style suddenly changes when I go to 1:1 aspect ratio
...

>>101744064
Anon it's just Flux. I've not messed around with an LLM prompt expansion or whatever though, guess I'll try that in the future. For now I'm just experimenting with various different words and combinations of clip and t5 prompts to see what happens.
>>
>>101744108
ok I guess that was too subtle, let me spell it out
>>
Can someone find out? I think that Flux has baked in negative prompting.
>>
>>101744124
Yeah, but the people in ldg are basically retarded.
>>
>>101744124
Oops. I thought it was OK to post image gen stuff here since other people were doing that. Maybe I overdid it though. I'll stop posting so much.
>>
>>101744144
It's okay as long as you know how to get around bans by clearing your cookies and getting a new ip.
>>
>>101744087
Never used it before but I can give it a shot
>>
>>101744144
It's okay as long as they are mikus.
>>
>>101743992
What?
>>
>>101744144
I didn't think you crossed the line personally, but I guess people have different thresholds for tolerance of ot posting
>>
>>101744217
It's a gooood boy.
>>
>>101743898
Then why can't I get an IQ8 one?
>>
Any flux fine-tunes yet?
>>
>>101744327
VERY early.
>>
>>101744296
I believe the IQ thing only works at lower quants; above 4 bits it's about the same.
>>
File: A9n8EJBDQziJWnXhOYeEE.png (340 KB, 471x629)
New Kino dropped - V2 of Magnum 12b/32B

Magnum-32b-v2-gguf: https://huggingface.co/anthracite-org/magnum-32b-v2-gguf

Magnum-32b-v2-exl2: https://huggingface.co/anthracite-org/magnum-32b-v2-exl2

Magnum-12b-v2-gguf: https://huggingface.co/anthracite-org/magnum-12b-v2-gguf

Magnum-12b-v2-exl2: https://huggingface.co/anthracite-org/magnum-12b-v2-exl2

Another Anthracite Victory
>>
>>101744363
>c2 logs removed
Aaaaaand dropped.
>>
>>101744363
I still don't understand 32b or whatever b. More is bigger, but what am I to think of it?
>>
>>101744363
wheres the patchnotes?
>>
>>101744363
>Magnum-12b-v2-exl2
How does this compare to mini magnum?
I'm so fucking happy we're no longer limited to tiny context sizes, bros. The years of 768 / 2k / 8k context were depressing as shit.
>>
>>101744408
from the hf pages, it seems v2 is trained on top of base instead of instruct, with more data
>>
>>101744442
Can it be used with llama.cpp by just putting it in the folder and running it?
>>
File: Untitled.png (165 KB, 1213x630)
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
https://arxiv.org/abs/2408.01803
>In this paper, we present STBLLM, the first structural binarization framework for compressing Large Language Models (LLMs) to less than 1-bit precision. LLMs have achieved remarkable performance, but their heavy memory requirements have hindered widespread adoption, particularly on resource-constrained devices. Binarization, which quantifies weights to a mere 1-bit, achieves a milestone in increasing computational efficiency. However, we observe that some weights in binarized LLMs can be randomly flipped without significant performance degradation, indicating the potential for further compression. To exploit this, our STBLLM employs an N:M sparsity to perform structural binarization of the weights. First, we introduce a new Standardized Importance (SI) metric that considers weight magnitude and input feature norm to better evaluate weight significance. Then, we propose a layer-wise approach where different layers of the LLM can be sparsified with varying N:M ratios, balancing compression and accuracy. Finally, we use residual approximation with double binarization to preserve information for salient weights. In addition, we utilize a fine-grained grouping strategy for less important weights that applies different quantization schemes to sparse, intermediate, and dense regions. We conduct extensive experiments on various language models, including the LLaMA-1/2/3, OPT family, and Mistral, to evaluate the effectiveness of STBLLM. The results demonstrate that our approach performs better than other compressed binarization LLM methods while significantly reducing memory requirements.
>less than 1-bit precision
heh
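The N:M part of the abstract is easy to picture, though. A toy numpy version (mine, not the paper's) that keeps the N largest-magnitude weights in every group of M:

import numpy as np

def nm_sparsify(w, n=2, m=4):
    # assumes len(w) is divisible by m; zero out the m-n
    # smallest-|w| entries in each consecutive group of m
    groups = w.reshape(-1, m).copy()
    drop = np.argsort(np.abs(groups), axis=1)[:, :-n]
    np.put_along_axis(groups, drop, 0.0, axis=1)
    return groups.reshape(-1)

The paper then binarizes whatever survives, which is how the average cost per original weight lands under 1 bit.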
>>
>>101744385
Dropped anon entries
>>
>>101744476
with gguf yes, make sure to use the correct command line flags
>>
>>101744363
Glad I stayed up for this
>>
>>101744506
How am I supposed to tell which one will perform how slow/fast on my system?
>>
Who the fuck is anthracite
>>
File: 1696136580731926.png (213 KB, 800x789)
>>101744716
Charcoal enjoyers. Gem producers. Kino connoisseurs. Compute havers.
>>
>>101744442
mini-magnum also uses base, though.
>>
>>101744745
Slop makers
>>
>>101744363
The thing with LLMs is that v10 isn't necessarily better than v1. In fact I've more often seen v1 be the best
>>
>>101744483
Lmao. Probably not usable but if that graph is true it's still pretty impressive relative to those other existing works.
>>
>>101739747
bros what's the best model you can run with 72 GB of VRAM? L3.1 70B fp8 or something?
>>
So, anons who have tried nemo and its finetunes, what are your opinions about them?
>>
>>101745204
I'm sticking with Miqu, the dream of having something fast & good is over.
>>
>>101745188
use case?
>>
new to this.
can we do gpt-4 level locally now? how far are we currently?
>>
>>101745437
yes
we are so back
>>
I've been using MythoMax since last year. 70Bs were too slow for me and nothing else in the 12B range hit the same. Decided to give NeMo a shot. It was amazing for 40 messages then went full retard. The character literally started speaking like a caveman. Wasn't at the context limit. Tried various settings posted here to no avail. If we could get it to actually chat normally up to even 16k context I'd be happy with it.

Playing with Magnum now, but only toy with these right before bed due to being a wagecuck and always pass out after a short goodnight scenario with my wAIfu so haven't even made it to the 40 messages again.

Honest review from a 12GB VRAMlet
>>
>>101745461
which model?
>>
>>101745471
>The character literally started speaking like a caveman.
That's hilarious.
>>
>>101745471
I actually had this exact same problem. The only thing that stopped it was dropping the context until all of the layers fit on my GPU. Even just one layer off is enough to poison it. It's like the second it starts using the CPU or RAM it kills itself.
>>
>>101745471
I'm at 12 as well and could not for the life of me get Nemo to work, but coomed hard with Celeste. The only downside is that it sticks to the main character too hard and doesn't switch to others or remove the character from the story temporarily even when asked, but I think it's likely a system prompt thing (I use the one provided on the model page)
>>
>>101745571
>one layer off is enough to poison it
uhh is that a llama.cpp bug?
>>
>>101745571
This might be worth trying then, though I was already down to 16k so I might have to lower the quant. Worth losing some of the creativity if it means I can at least finish short scenarios.

>>101745578
I'll give it a shot, I don't mess with group chats often anyway except for shitposting, given the caveman issue that would just be amusing.
>>
>>101744363
I just tried the Sally test on this thing, and it opened its answer with:
>A classic lateral thinking puzzle!
That's a really common Llama-ism. Did this retard train a ROLEPLAY model on Llama outputs?
>>
>>101745737
I'm a cute 30 year old with a scratchy voice
*yes I am still 30*
*no I wasn't 39 last year*
>>
>>101745737
The base model is Qwen in this case, right? Maybe it was Qwen that trained on Llama.
>>
>>101745671
If it is, I hope it gets fixed. But I'm too retarded for that, so I will wait for someone else to look into it. Just to clarify: The bug occurs for me on KoboldCPP with context shifting enabled. Maybe that's related?
>>101745681
That sucks man, definitely recommend trying it at a lower quant though, because it's really nice when it actually works.
>>
>>101745479
Llama 3.1 405B and Mistral Large 2407
>>
>>101745311
ChatGPT 4o replacement
Home assistant waifu
>>
>>101745925
>405B
so 8 GPUs?
>>
>>101745936
More like 9 for a 4.0 bit quant. You can spread those out over multiple nodes if you have to.
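Back of the envelope: 405B params at 4.0 bits is about 202 GB of weights alone, before KV cache and activations; at 24 GB a card that's 9 GPUs with barely any room left for context.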
>>
>>101745958
>>101745936
Mistral Large is roughly as smart as 405B and way way easier to run due to being a quarter the size
>>
>>101744363
testing the 32B and it's crazy smart for its size, better than Gemma 27B
>>
>>101745981
q8 exl2, forgot to say
>>
>>101745968
>roughly
>>
>>101745968
All the outputs I've seen from it are way too sloppy. Each time I see another one posted here, I decide against biting the bullet and downloading it. I swear, there's not a single one without a 'mix of this and that', it's Mixtral all over again. I don't want to go back.
>>
>>101745968
Mistral Large, what does it require, and what are some examples of getting it to typing speed?
>>
>>101745981
Interesting. Are 24GB VRAMlets back?
>>
>>101745968
Yeah, I can't run 405B, and I only get 1.2 tok/s with Mistral Large 2, but it's worth waiting for GPT-4 equivalence. Still hoarding 405B though, especially since (((Dustin Moskovitz))) and AI doomers were trying to stop its release.
>>
>>101743898
They are slower?
>>
>>101740235
https://huggingface.co/aetherwiing/MN-12B-Starsong-v1
>>
>>101746115
iquants are slower on cpu only, but are considerably faster on gpu only
>>
File: 1722529507378548.png (45 KB, 200x160)
>>101740500
It's ass though
>fill me up
>you're so big
>mmm... that's it
>fuck yes!
It feels like ERP with cleverbot. Not to mention the constant repeating dialogue.
>>
If you had a sudden windfall of ca. $700, how would you spend it to upgrade your /lmg/ experience? Just get a used 3090(ti)?
>>
>>101746284
idk what that gay plastic money is
>>
>>101746305
Feel free to pretend I said "23.8 troy ounces of silver" instead, if the idea of fiat currency is so offensive to you
>>
Why do none of the recent models know what a playboy bunnysuit is? I have a few bunnygirl character cards and all of the models recently keep thinking it has skirts, panties and even shorts. WTF.

Do I need a lorebook or something?
I remember Stheno 3.2 getting it right.
>>
>save png to use for char card
>background is transparent in gimp leaving just the character
>st sometimes, depending on the image, applies a black background around the character instead of keeping the transparency
how can i fix this? i'd like to just keep the char cutout that i can get from some pictures but if they appear transparent in gimp in the first place, how do i make st keep it?
>>
>>101746333
This is why we need native multimodal models.
>>
>>101746284
A second used 3090 probably, yeah
I'm 3090 + 3060 atm, another 12GB would be quite useful
Not sure my PSU would be up to it though, it's only 750W so think I'm pushing the limits with the current setup already
>>
I have 32 gb ram and no dgpu. What's the best model I can run?
>>
>>101746326
does your money make crinkling sounds when you walk?
>>
>>101744363
Testing the 12b Q8 gguf and it's really fucking good so far. I like the dialogue and the random shit it comes up with, it's not shy at all. There's some body and soul garbage in there but it's very minor and easily editable, it's not constant like some models. It's a nice change of pace from using straight up Nemo.
>>
>>101746407
Anything that fits in your leftover RAM, albeit very, very slowly.
>>
>>101746407
He says very slow, but it's basically typing speed on my CPU anyway (5900X). It will depend on CPU speed. But a GPU will pump out text faster than any typist.
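Rule of thumb, assuming token generation is memory-bandwidth bound: tok/s is roughly RAM bandwidth divided by the model's size in bytes. Dual-channel DDR4-3200 is ~51 GB/s, so an 8B model at Q4 (~5 GB) tops out around 10 tok/s, while a ~40 GB 70B quant is closer to 1.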
>>
>>101746407
stheno l3 8b
>>
>>
what are gptq, awq, and gguf?
which model format should I download from huggingface for llamacpp?
>>
>>101746567
gguf is the most popular
go for it
>>
>>101746567
if you're a braindead retard who needs to ask, gguf but use koboldcpp instead
>>
so why did open source llms from basically every company/research group all gain 20 iq points since March
what exactly was the new training technique that they all started using
>>
>>101746602
They scraped my fanfiction.
>>
>>101741720
>local models bad
>posts screencap of cloud model as proof
???
>>
>>101746563
why are her tits so low? they're growing out of her ribs
>>
>>101746628
boob magic
>>
>>101746628
Do you think tiddies naturally stick out near the collar bone without being pushed up?
>>
>>101743811
Cool miku
>>
>>101746759
Have you looked up nude side view photos? Top of tits below armpit is common. And >>101746563 is basically at armpit level. You'd have a point if they were hanging at the waist which would look actually nasty.
>>
>>101745981
>testing the 32B and it's crazy smart for its size, better than Gemma 27B
Fucking finally the single 3090 fags can eat good this time
>>
File: johnny.png (59 KB, 838x682)
>This change makes GeLU go 8x faster on Intel, 3x faster on Apple Silicon, and 2x faster on Threadripper. It's the world's most popular activation function, crucial to models such as Whisper and Gemma. On those models, this change can have a noticeable improvement in performance. That's because GeLU is usually the most time-consuming op except for matrix multiplication.
>This software was developed by Mozilla Ocho and ARM Limited. It first appeared in llamafile which offers you llama.cpp / whisper.cpp / stable-diffusion.cpp with the most bleeding edge performance optimizations and binary distributability.
kek
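For context, the op being vectorized is just this (the usual tanh approximation of GELU; the actual llamafile change is hand-written SIMD, this numpy version is only the math):

import numpy as np

def gelu(x):
    # tanh approximation: 0.5*x*(1 + tanh(sqrt(2/pi)*(x + 0.044715*x^3)))
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

One transcendental per element, which is also why the matmuls still dominate end to end.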
>>
Anthracite mogs once again, absolute chads
>>
File: GJgp0lvXIAARa2l.jpg (60 KB, 684x623)
>>101746854
>Speedup 1.01
>>
One side is being disingenuous and the other side is a bit dim. They don't stick out 90 degrees from the collar bone. But the booba isn't usually so saggy that it's like a tennis ball at the bottom of a sock unless your grandma's trying to throw them over her shoulder.
>>
>>101745981
Is it really? I tested the 12b and it wasn't good. I haven't found anything below 70b acceptable. I didn't try gemma because of the small context, is the 32b really worth trying?
>>
>>101746859
>on a 2b model
>>
https://github.com/kijai/ComfyUI-CogVideoXWrapper
Oh shit did you see that, the CogVLM fags made a local text to video model, and it's not that bad
>>
>>101746854
So the 8x speedup was just a lie? Typical.
>>
>>101746860
>>
>>101746419
I run llama 3.1 70b instruct IQ1_M at 1 tok/s on my 12700 and 32g ddr4 ram.
is this a normal number?

>>101746593
why koboldcpp? what does it offer over llamacpp?
>>
>>101746892
>why koboldcpp? what does it offer over llamacpp?
you can be retarded and still use it
>>
>>101746885
The speedup is on the gelu function itself, but that seems to be about 1% of the total computation. Nobody cared to optimize it because it does fuck all. He just wants to be able to say 'i optimized this low hanging fruit!'
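That's just Amdahl's law: if GELU is ~1% of runtime and you make it 8x faster, the overall speedup is 1 / (0.99 + 0.01/8) ≈ 1.009, which lines up with the ~1.01 in the benchmark screenshot.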
>>
>>101739908
VCs?
>>
>>101746889
Nice.
>>
>>101746914
Voucher Connoisseurs
>>
>>101746914
Vietcongs
>>
>>101746914
Venture Communists
>>
>>101746914
Vaush's childwife
>>
>>101746854
>Made it slower in some cases
What are they doing? Surely they don't merge that, right?
>>
https://github.com/ggerganov/llama.cpp/pull/8878#issuecomment-2270546553
Is there anything jart doesn't lie about?
>>
>>101747064
it IS 8x the performance, on specific hardware, on a function that does fuck all in the grand scheme of things. It's a very specific, selective, and useless truth.
The way he presents it, though, is absolutely fucked. "1 channel -> 8 channel SIMD, 8x performance. I'm a genius!"
>>
what intelligence level can I expect from llama 3.1 70b?
>>
>>101747026
If it can be improved to be consistently faster, it's probably worth it. As is, i'll let it sit.
>>
Coomer here. Last time I checked in, the state of the art for generating porn was
>Sd1.5
>get loras for specific stuff
>a1111 for generation

What’s the current meta for one’s toolkit and model chain if all you care about is personalized nsfw?
>>
>>101747097
You're gonna wanna sit down for this.

>autistic mentally ill brony BTFO'd SDXL for basically everything by making a 2d centered model based off XL so everything worthwhile..
>including LORAs you should just use Pony for
>most people are splintered between Forge (Which is now dead by the way) or ComfyUI
>>
>Cant train on ai generated outputs in self supervision
>Cant "iterate" on censored instruct models
Which open source/independent/from scratch ?
>>
>>101747090
I've found it no better or worse than 3.0 70B. Just a sidegrade. Doubt I could pick which was which in a blind test.
>>
>>101747097
Check the furry threads
>>
>>101744217
L3 is more Woke than Michael Burnham on a 72 hour ice bender. Unless you're into that, I suggest Mixtral or *maybe* some of the Gemma finetunes.
>>
https://new.reddit.com/r/LocalLLaMA/comments/1elbn3q/quantize_123b_mistrallargeinstruct2407_to_35_gb/
>I quantize 123B Mistral-Large-Instruct-2407 to 35GB with only 4% average accuracy degeneration in 5 zero-shot reasoning tasks!!!
HOO LEE FUK !!
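For scale: 35 GB × 8 bits / 123B params ≈ 2.3 bits per weight, which size-wise is already IQ2-quant territory; the interesting claim here is the accuracy, not the file size.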
>>
>>101747334
holy fuck
>>
File: notfound.png (211 KB, 1920x1080)
https://llm.extractum.io/list/?query=gpt-3
>>
>>101746914
Vagina connoisseurs.
>>
>>101747334
>GPTQv2
What? it's already out?
>>
>>101747334
>>
>>101747334
Damn. I suppose this doesn't work with koboldcpp yet though
>>
>>101744363
ok the 32B is actually crazy
>>
File: Bonzi_Really.png (1.12 MB, 894x894)
>>101747539
>>
>>101747001
>>101747005
>>101747007
>>101747017
go back https://www.reddit.com/
>>
Man Mistral large really seems to like making defiant female characters so headstrong they fucking die.

>Do the thing and I'll shoot you lady *points a gun at her head*
>I'm a strong independent woman. I'll do what I want *{{char}} does defiant thing*
>*Gets shot*
Wow, zero sense of self-preservation. Art truly does imitate life.
>>
>>101747334
Yeah... sure
>>
>>101747591
nyo :3
>>
>>101744386
Nothing, it's shit
>>101744693
You know you didn't
>>101745737
That's because it's a shilled model by an organized group of shills, now
>>101745981
False, and shill
>>101746417
Shill
>>101747542
Crazy mountain of shit
....................
Better models can be found here:
>https://huggingface.co/Sao10K
or even here if you're into this stuff:
>https://huggingface.co/TheDrummer
>>
>>101747648
>"y-you're a shill!"
>*proceeds to shill shamelessly himself*
>>
wtf is this openwebui shit and why do you use it instead of llamafiles
>>
>>101746284
Get a stick of 32gb mem and put it inside
>>
>>101747026
For small end-to-end performance improvements on the order of ~1% you have to be very careful with how you measure it because the run-to-run variance can easily obscure the actual change.
Considering that the speedup for GELU itself is quite large you can reasonably assume that the changes would at the very least not be detrimental in terms of performance.
>>
We don't like censored proom models... right
>>
>>101747690
>llamafiles
>>
>>101747690
>llamaballs
>>
>>101747705
Or the implementation is poor and only shows benefit on specific CPUs to the detriment of others.
>>
>>101747715
answer the question or just fucking kill yourself already
>>
>>101747752
go back or just fucking kill yourself already
>>
>>101747739
Georgi confirmed that he gets better performance when benchmarking GELU directly though so in this particular case it seems to be an issue of measurement precision.
>>
>>101747760
Go back to where? I've been in this thread since 2008.
>>
Tried to get llama 3.1 to tell me how to involve guns in sex and it just noped the fuck out
>>
>>101747752
Use whatever you like. I just think that packing a 2-4gb file into an executable is a stupid idea.
>>
>>101747334
Can anyone confirm if this is actually real? I'd eat my hat on livestream if quanting something down to ~2 bits didn't functionally retard it.
>>
>>101747789
llama 3.1 uncensored models will let you do anything, get those
>>
>>101747809
You can see the ppl in the post if you care.
>>
>>101747819
All I see is
>gib gguf
>>
>>101747812
Where they at?
Speaking of, how do they actually uncensor it?
>>
File: 1720877578036169.png (64 KB, 1417x811)
>>101747845
in LM Studio you can search for llama 3.1 uncensored (just search llama 3.1), the model I have is like 4 gigs, 8b, it works good

here is a test: no more "I can't do that" ethics bs.
>>
File: 1704390069253474.jpg (429 KB, 2472x1345)
>>101747845
this is what it looks like, there's a huggingface search for public models.
>>
>>101747860
>>101747883
>proprietary software
Nah, I'm good.
>>
>>101747908
then just get the models off huggingface and use whatever frontend you like.
>>
Magnum man you're a wizard.
>>
>purged the RP logs from the dataset
Nah
>>
Llama 3.1 70B Instruct, over multiple gens, kept confusing a boy's unnamed father with his named great-grandfather until I edited the message right before the screwup to explicitly name the father. This happened at around only 2000 tokens.
>>
>>101744363
I've been out of the loop. Why does this post have so many responses and what's so good about these models?
>>
>>101747963
3.1 sucks other than 405B, which is itself mogged by Mistral Large.
>>
>>101747984
Full fine-tunes on base models that don't make them retarded.
>>
how do I make llamacpp work like meta-llama/llama-agentic-system?
something like asking it to read a csv file and plot a column as a time series.
>>
>>101747984
It's called shilling.
>>
>>101748003
>>101747648
Hi lemmy
>>
>>101747963
By multiple I mean 5/5 explicitly making the same mistake.

I also tried with Llama 3.0 70B with the original chat history to see if it sucked less. It fucked up two times out of five, once changing great-grandfather to grandfather and once to father like 3.1 does.
>>
>>101747984
>Why does this post have so many responses
having dedicated shills in the thread does that
>>
>>101748053
So this guy has shills in this thread just to reply to posts about his models, for what exactly?
>>
>>101748056
It's pointless to argue with "shill" accusation schizos, they're on every board on this site, pop up whenever someone says they like something, and can never explain what the logical motivation of people shilling a free thing for no money is supposed to be. All they ever do is hint darkly and pretend it's obvious, while never explaining.
>>
>>101747984
Overhyped models from an organization assembled for the purpose of "organically" boosting their own shit. Chances are that Petra is among them as well.

>>101747998
Full finetuning is a half-meme. There's easily more to be lost than gained from it in most finetuning scenarios.
>>
>>101748056
People say this about any finetune that gets posted as a force of habit.
It's just a popular model series. Happens to be the only one where they avoid finetuning on top of Instruct and aren't settling for LoRA cope
>>
>>101748082
Nah, it's just tiring to reply to the defense force. And not needed. Anyone can put 2 and 2 together and judge by themselves.
>>
>>101748108
>All they ever do is hint darkly and pretend it's obvious, while never explaining.
>Anyone can put 2 and 2 together and judge by themselves.
>>
>>101748118
>he doesn't know
>>
>>101748125
>he cares not
>>
>>101748093
>There's easily more to be lost than gained from it
Absolutely not, have you ever tried comparing 72b FFT vs rank 32 Lora? It's not even close sorry
>>
>>101748082
>free thing for no money
Nothing is free. You're donating them your attention and time, at the very least. At this level, end-users are pawns ready to be discarded as soon as the so-called finetuners find employment somewhere after "getting noticed" (by inorganically spamming their shit everywhere).
>>
>>101748118
I wonder why there's a reply guy defending shilling in this thread right now.
>>
who shills for shills? honestly
>>
>>101748134
Hi lemmy
>>
>>101748134
>attacks your post
nothing personal, kid
>>
>>101748136
Just making fun of him. He immediately did the thing anon predicted. Look at this anon, for example: >>101748134 going on about all the attention 'shills' soak up, when he's the one spamming whenever someone mentions a model.
Twice. TWICE he failed to pick up on the irony.
>>
>Maxime Labonne: BigLlama-3.1-1T-Instruct (experimental self-merge using Meta-Llama-3.1-405B-Instruct and created with Arcee.AI's mergekit)
>https://huggingface.co/mlabonne/BigLlama-3.1-1T-Instruct
hi please click link thank
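For the curious, the entire "creation" is usually just a mergekit passthrough config that stacks duplicated layer ranges, something like the hedged sketch below (the ranges are made up, not the actual BigLlama recipe; 405B has 126 layers):
[code]
# hypothetical passthrough self-merge; layer ranges illustrative only
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3.1-405B-Instruct
        layer_range: [0, 84]
  - sources:
      - model: meta-llama/Meta-Llama-3.1-405B-Instruct
        layer_range: [42, 126]
merge_method: passthrough
dtype: bfloat16
[/code]
Run with mergekit-yaml config.yaml ./out. No training happens at all, which is the joke.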
>>
>>101748158
ironic
>>
>shills
And are these supposed """""shills""""" in the thread with us right now?
>>
>>101748167
it's like raaaiiiiaaaaaaain....
>>
>>101748180
Yes...
>>
Let's add an anchor for shilled models, like /aicg/ has for shilled bots. This should calm down the schizos over time.
>>
>>101748264
you know what happens when you give schizos an inch?
>>
>any model gets mentioned
>shill goes off on a 90 post meltdown
very organic sam
>>
>>101747334
>read the comments
Christ that sub got invaded didn't it? The posters didn't use to be this retarded
>>
>>101744363
What is the individual contribution of each of the 29 members of the organization involved with this model?
>>
>>101748322
Do you even know what the word shill means?
>>
>>101748322
>*the slightest push against astroturfing happens in the thread
>shill goes off on a 90 post meltdown
fixed
>>
>>101748298
There are inches for everyone involved if you ask me.
I would love to check the new stuff that came out in one post, and we can't really put tunes in the news section.
So yeah, anchor please.
>>
>posts the link to your model that you and your friends on discord "fine-tuned"
>suddenly multiple posts "oh it's soooo good, mmm, the finetuners did a good job!"
very organic
there was a time when I tested these shitty models (every single one was terrible), but now I've simply stopped using tunes from certain people, and I recommend you do the same
>>
File: file.png (20 KB, 738x155)
>>101747334
>>
Is Celeste 1.9 good? Thinking about switching from Nemo.
>>
>>101748421
No, even the guy shilling celeste says 1.6 is better than 1.9.
>>
>>101748428
Why tho? What's the difference?
>>
>>101748414
>OP
>>
>>101748432
Objectively, it's trained with more data, if you compare the READMEs.
Subjectively, I don't know because I didn't use either.
>>
>>101748412
Hi lemmy
>>
>>101748465
Are there any models trained on the gigabytes of horrible fanfiction in places like AO3? I feel like that would massively improve a model's ability to reference popular media.
>>
>>101748412
The models are free, who the fuck cares?
>>
Why does mmap seem to make everything worse all the time in every single case? Why is it on by default? Why does it exist at all?
>>
>>101748502
>Why is it on by default? Why does it exist at all?
Jart
>>
>>101748490
Ads are always free. I'm sick of having them constantly shoved in my face all the same.
>>
>>101748490
>The models are free, who the fuck cares?
Me. I only want to see organic posts in the thread.
>>
>>101748490
Me, I care.
>>
>>101748530
>ads
it makes zero sense to advertise a free "product"
>>
>>101748490
The 2 anons involved in a campaign to drive opinion away from any model that isn't what they're paid to post about are supposed to seem organic, but they hired a sperg who goes off on every single model mentioned that isn't what he's paid to shill, so they glow while they work and look like retards instead.
>>
>>101748564
what models are they paid to post about? who paid them? how would you know?
>>
>>101748564
>paid
>hired
nobody pays anyone except people who waste cash on cloud gpu rentals
>>
File: file.png (40 KB, 687x283)
>>101748455
>>
>>101748553
consulting, sponsorships, etc.
those are some of the ways to turn reputation/popularity into money
>>
>>101748490
I care. Spam of retarded models muddies the discussion and can bury really good ones in the deluge of trash.
>>
>>101748585
Show me one single sloptuner who leveraged their popularity (in a niche within a niche, I might add) for something profitable.
>>
>>101748597
Undi literally got a job from his merges
>>
File: file.png (73 KB, 990x454)
>>101748597
Pretty much all of them, if you read the readmes. Here's the miqu 120b dude.
>>
>>101748571
>what models are they paid to post about?
company name starts with an m and contains two vowels
>who paid them?
you know who
>how would you know?
curse of high iq
>>
>>101748617
It seems to me they sometimes get a random startup to help with compute expenses, but that's about it.
>>
File: file.png (147 KB, 501x840)
>>101748597
>>
>>101748636
holy shit over $330k for running someone else's software on someone else's cloud for a few hours
>>
File: 1690367687764.png (392 KB, 1338x1516)
>>101748597
I also remember Pygmalion getting like $10k from Spicychat in the early days. And Alpin is part of Anthracite. He probably got a couple more sponsorships since then, which explains the money to do full finetunes.
>>
>>101748564
Too contrived. What if they're just tired of seeing meme models from people known to care more about getting noticed than about the quality of their "products"? Just look at who the members of this Anthracite organization are. It's basically most of the SillyTavern discord regulars.
>>
>>101748636
Kek he's probably richer than all of the fags who called him retarded here
>>
>>101748618
>company name starts with an m and contains two vowels
moo??
>>
>>101748654
>$10k for Pygmalion
>$100k for Goliath
$1M for Magnum...
>>
>>101748648
>>101748667
>1% of 1$ is 1$
>>
>>101748648
>over $330k
Check your tokenizer nigga
>>
>>101748667
These two are not mutually exclusive. Retarded people get rich too; scammers are usually more successful in business than people doing honest work.
>>
File: file.png (159 KB, 506x947)
159 KB
159 KB PNG
>>101748679
>>101748667
>>101748648
>>101748597
>>
>>101748685
coping
>>
>>101748687
>I merge models for fun
The brightest mind in the fine-tuning community...
>>
>>101748690
>coping
do you actually live on this planet or did you spend your entire life locked in the basement?
>>
You really use the llama t-tokenizer? Talk full sentences?
>>
>>101748659
>It's basically most of the SillyTavern discord regulars.
You're making me feel bad for using SillyTavern.
>>
>>101748633
>the compute needed to merge a model with itself
>>
so did anyone with 2x3090 test the chink gptqv2 mistral "only 4% loss" in 35gb?
how fucked is it actually?
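If anyone does test it, a hedged sketch of how you'd typically load a GPTQ quant across the two cards with transformers; the repo name is a placeholder, and you need optimum plus a GPTQ backend installed:
[code]
# Hedged sketch: GPTQ model sharded across two GPUs via device_map="auto".
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "someone/Mistral-Large-Instruct-GPTQ"  # placeholder repo name
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    device_map="auto",   # splits layers across both 3090s automatically
    torch_dtype="auto",
)
inputs = tok("The quick brown fox", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=32)[0]))
[/code]
Perplexity or side-by-side gens against the unquantized model would answer the "how fucked" part.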
>>
>>101748702
If you are not successful, you are dumb. Making profit is a purely intellectual task. So this chum is much smarter than you, by the only measurable metric.
>>
>>101748502
Because developers are not allowed to disagree with them in public. You will be jarted and you will be happy.
>>
>>101748687
green tea is like the most boring noob one you can go on about, there are so many good ones
>>
>>101748180
yes, you are.
>>
File: OEMhYF15BjZUc7S0nN-u7.png (78 KB, 989x590)
>Any local is better than 3.5 Turbo by now

>Microsoft WizardLM-2-8x22B 11.7 %

>Mixtral-8x7B-Instruct-v0.1 20.1 %

https://huggingface.co/spaces/vectara/Hallucination-evaluation-leaderboard
>>
>>101748734
>https://en.wikipedia.org/wiki/William_James_Sidis
>works mostly menial jobs and lives almost in poverty
>https://en.wikipedia.org/wiki/Nikola_Tesla
>lived in poverty, feeding pigeons
they must have been very dumb
>>
>>101748734
hi Undi
>>
As always, jart is lying. First it was "the test failing have nothing to do with my PR" when it obviously did. Then the claimed "17% end to end improvement" ends up being the same or slower in benchmarks.
https://github.com/ggerganov/llama.cpp/pull/8878#issuecomment-2270546553
>>
>>101748846
>they must have been very dumb
They clearly were. EQ is far more telling of true intelligence than IQ, which is basically just an autism measurement.
>>
>>101748889
You already posted that in this thread.
>>
>>101748889
rent free
>>
>>101748845
Who
>>
There are literally HF orgs with members whose role is to shill their models, attack any other model maker, and shit on their tunes, regardless of whether the 'competing' finetune is just as good or better.

>>101748734
$3000 in revenue after 1 year is nothing, literally nothing. This is not a lucrative market. It consists of broke ass basement dwellers who are mentally sick enough to find joy in talking to a computer program.
>>
>>101748911
xD
whatever floats your boat Undi
>>
>>101748845
>Higher the hallucination rate, more soulful it is.
Unironically a useful reference chart. Gemini 1.5 Pro is near the top at 28.1% for good reason. If only it was a bit looser for lewds, then it would surpass Opus by miles. Getting access is easy too, but that's a bit too much remote talk for /lmg/.
Little weird small models are so common high up.
>>
>>101748636
>>101748687
Why do they all have tranime profile pics
>>
>>101748933
>This is not a lucrative market.
Now look at this: >>101748654
>>
>>101748845
glm4 9b is one of the worst models i've ever tried
>>
>>101748998
Careful. The anti-shill poster will accuse you of shilling literally every model but glm4...
>>
>>101748994
>single consultancy agreement
>10k for compute only
kek even the crypto grifters are business gods compared to these guys. it's just a hobby and a money sink if you are not a pro training base models from the ground up
>>
>>101748654
>llama2
>10k
that money dried up a long time ago
>>
>>101749036
Of course it did. It's probably the only sponsorship and consultancy gig they ever got. Everybody knows that reputation doesn't accumulate over time. That's why they're doing full fine-tunes on H100s now, they have no money!
>>
>>101749053
>>101749053
>>101749053
>>
>>101748687
>>101748636
>3328$
>1370$
WHOA, look, it's fucking crazy, they are ALMOST millionaires right now just from slop finetuning... damned ko-fi richfags
>>
Yep, no money involved. Nothing to see here. Just organic word of mouth.
>>
>>101749111
Pocket money until employment arrives, you know that.
>>
>>101748845
>Intel neural chat 7b v3-3
Omg how did it fly under the radar? It must be absolute dogshit if it scores this high on a benchmark.
>>
Does vllm gguf inference support CPU offloading?
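To be concrete, the knob I mean is cpu_offload_gb; no idea if it composes with vLLM's experimental gguf loader, but a sketch under those assumptions, with a placeholder path:
[code]
# Hedged sketch: vLLM loading a GGUF file with some weights offloaded to system RAM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/models/mistral-large-q4_k_m.gguf",  # placeholder path
    cpu_offload_gb=16,  # offload ~16 GB of weights to CPU RAM
    # gguf loading may also need tokenizer=<original HF repo> depending on version
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=32))[0].outputs[0].text)
[/code]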
>>
>>101748502
Runtime for LLaMA 3 8b q8_0 on my desktop when generating a single token on an empty context:
-CPU, mmap: 1.985s
-CPU, no mmap: 6.816s
-CUDA, mmap: 1.748s
-CUDA, no mmap: 1.537s
So for the CPU backend the startup with memory mapping is much faster and for CUDA the difference is negligible.
You can also prevent paging via --mlock which in turn necessitates the use of memory mapping.
The memory mapping provided by Windows was kind of bad though I think.
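For anyone who wants to flip these knobs themselves, the relevant flags (same names for llama-cli and llama-server) look like this; model path is a placeholder:
[code]
# default: weights are memory-mapped from disk
./llama-cli -m model.gguf -p "hi"

# disable memory mapping: read the whole file into RAM up front
./llama-cli -m model.gguf --no-mmap -p "hi"

# lock weights in RAM to prevent paging; requires mmap, so don't combine with --no-mmap
./llama-cli -m model.gguf --mlock -p "hi"
[/code]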


