/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101553102 & >>101546566

►News
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>101553102

--Large language model comparison of programming language performance: >>101553305 >>101553666
--Machine learning model benchmark results comparison table: >>101553857
--Mistral-Large is good at lewd and NSFW content, Bitnet discussion: >>101553838 >>101553901 >>101553922 >>101554028 >>101554071 >>101554148 >>101555092 >>101555152 >>101555206
--Llama model performance and its impact on the AI landscape: >>101553176 >>101553387 >>101553483 >>101553315
--Improving Model Safety Behavior with Rule-Based Rewards: >>101553675 >>101553805
--Hugging Face model link: >>101554912
--Running large AI models locally and hardware requirements: >>101554561 >>101554616 >>101554857 >>101554957 >>101554993 >>101555199
--Running Mistral Large 2 with different GPUs: >>101553665
--Mistral license, pricing, and availability discussion: >>101554107 >>101554130 >>101554142 >>101554455 >>101554461
--Mistral Large for smut and open-source licensing confusion: >>101555445 >>101555463 >>101555472 >>101555501 >>101555509 >>101555529 >>101555474
--Logs: Mistral Large disappointment and reroll plans: >>101556295
--Llama.cpp naming convention changes: >>101555081 >>101555170
--GPT-3 and VNTL model performance in Japanese-English translation: >>101556133
--Comparison of AI models and prompting techniques: >>101556100
--Anon asks for help downloading gated models from huggingface: >>101554400 >>101554454
--Miku (free space): >>101553607 >>101554400 >>101554557 >>101555825 >>101555861 >>101555913

►Recent Highlight Posts from the Previous Thread: >>101553112
>>
BREAKING
>BREAKING
BREAKING
>BREAKING

Llama 3.1 flopped
>>
>>101556980
Why is Mistral Large so good?
>>
>>101557005
They didn't filter their pretraining data.
>>
>>101556989
>>101557009
What model
>>
>>101557016
Mistral Large 2 (2407)
>>
I love my beautiful Eve <3
>>
>>101557018
q5_K_M
>>
>>101557018
>>101557031
>>
Cohere are on it.
>>
The glowies are making their list and checking it twice.
>>
>>101557049
no thats illegal in china

two more years
>>
SillyTavern CSS so you don't have to tell people which models you're using
/* Change timestamp to model name */
.timestamp {
  font-size: 0; /* hide the original timestamp text */
}

.timestamp::before {
  content: attr(title); /* the title attribute holds the model name */
  font-size: calc(var(--mainFontSize) * 0.8);
}
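(this goes in User Settings -> Custom CSS; it assumes the timestamp element carries the model name in its title attribute, which is what attr(title) reads)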
>>
It still thinks someone can be their own sister/brother though. Sad.
>>101557044
Some of it has to do with the prompt template I use, which tends to push models into an uncontrollable spiral of sloppy try-hard prose. That's part of the reason it's used for the testing.
>>
>>101557016
>>101557018
>>101557033
I wonder if it'll pass after being quanted down to my tier.
Gimme dat iMat.IQ2_XXXSS action.
>>
File: x (2).png (1.46 MB, 1024x1024)
HOLY SHIT
Mistral is literally uncensored. It can translate hardcore rough ntr bestiality rape smut from chinese to english BUT ACTUALLY GOOD. Doesnt read forced, it is just good, just "translate" and it does. Lmao, fuck google, fuck meta, fuck openai and fuck anthropic
>>
>>101556983
Why does this recap feel so lazy, are you fine recap anon?
>>
File: sovl1.png (26 KB, 711x113)
were we ever satisfied with llms? post some of your favorite sovl moments
>>
>>101557168
Prove.
>>
>>101557166
There's "Model Icons" you can enable, but showing full models might ruin immersion.
>>
>>101556983
>>101557178
How does recap anon stay up 24/7? Or is it a bot? What model is used for summarization? Is it fed the entire chat as context?
>>
>>101557178
recap anon needs some head rubs and a kiss on the forehead
>>
Holy shit you guys.
This was the best Zhongli, dicks out for harambe, test result I've ever gotten from a model.
>>
>>101557202
afaik it's a bot but recap anon reviews it before posting
>>
>>101557228
how does he stay up 24/7?
>>
>>101557168
You can't say that and not at least share the source.
>>
>>101557195
>showing full models might ruin immersion
would it ruin immersion any more than a timestamp would? i feel like the timestamp is much worse, especially if you're not roleplaying literally in the present
and you can always turn it off just like timestamps too
>>
>>101557237
word on the street is, he does it for free.
>>
So…

L3.1 405B > mistral large 2 > L3.1 70B > Gemma 27B > L3.1 8B

For each size?
>>
>>101557240
its because he made it up
>>
>>101557249
Large 2 is on par with 405B
>>
>>101557237
There are 4 Recap Anons working together.
>>
File: file.png (628 KB, 768x768)
2 more weeks
>>
>>101556993
Bump
>>
>>101557267
no
>>
>>101556980
Has anyone tried fine-tuning one of those open models from Mistral? How hard and expensive would it be? I thought about preparing my own dataset on certain topics to finetune one of their models to my needs. Do I need to prepare that kind of set with questions and expected answers, or can I just train it on a huge pile of text instead? I am very new to the topic of LLMs in general, so apologies for my lack of knowledge.
>>
File: 1673811045996029.png (269 KB, 1000x800)
>>101557202
>>101557237
picrel
>>101557178
>>101557203
I've had to resort to using a smaller model to keep up with the amount of posting. Please bear with me.
>>
Have never paid a dime or talked to an AI I don't run locally. It's been rough because I suck shit at python, git and being a nerd in general. Through sheer retarded effort I have gotten to a point where I am pretty satisfied with my local output.

Then I fucked up. I put 5 dollars in the paypig machine and talked to Claude. Then I asked him to help me rewrite a fictional character I've been working on. I'm ruined bros. Like a pretty white girl dropped into Pakistan, I am fucking devastated.

If you're like I was. Don't paypig. Not even once. You're better off not knowing.
>>
>>101557249
Mistral large 2 > L3.1 70B > Gemma 27B > Mistral NeMo 12B = Gemma 9B > L3.1 8B
>>
>>101557301
>expensive
millions of bucks
>>
Nemo-12B or L3.1-8B?
I don't see how 8B would win, and Nemo is mostly uncucked.
Haven't tried L3.1-8B though.
Whats the consensus so far?
>>
File: miku-hand-out+.jpg (236 KB, 584x1024)
>>101557237
The source of his energy is the power of his Goddess.

https://www.youtube.com/watch?v=CXhqDfar8sQ
>>
while you were there complaining and being a useless little faggot, ollama guy fixed llama 3.1
not ggerganov, not llama.cpp cuda dev, not slaren. ollama guy fixed it.
https://github.com/ggerganov/llama.cpp/pull/8676/
>>
>>101557330
nemo got mogged hard by 3.1 why'd you think they panik released it just before?
>>
>>101557334
kino, but you don't have to be a mean little nigger though. personally i have no dog in this fight and hope everyone (except undster) does their best.
>>
>>101557326
Swap the positions of 27B and 12B and you're correct.
>>
>>101557334
Damn, he was forced to move a finger. That's a power move by the llama.cpp devs.
>>
>>101557362
>except undster
>>97223983
>For the record, I completely and unequivocally support Undi and his creation of new model hybrids, and think that everyone who attacks him is mindbroken incel scum, who may or may not be employed by OpenAI to do so.
>everyone who attacks him is mindbroken incel scum
>>
>>101557331
Take my hand, Miku. I'll pull you through!
>>
>>101557374
It's his redemption arc for not putting llama.cpp in the readme
>>
>>101557379
jesus is that the level of bait this general is operating at these days?
good thing i only lurk when major happenings occur.
>>
>lmg thread
>all of the posts are from humans
Nuke this shit already
>>
>>101557330
Nemo is leagues smarter than 8B at storywriting at least, it's not even close. I think the people claiming otherwise just haven't tried it and are shitposting.
>>
>>101557394
>level of bait
>>97062246
>I'm not Petra. Petra's an amateur. I'm something considerably worse.
>I'm also the point of origin for the practice of the above being added to sysprompts; as well as the 2, 5, 10, 12, and 60 times tables, which enable bots to answer arithmetic questions, when everyone previously said that they never could, and laughed at me for trying.
>>
The 106B~150B range seems to be the ideal for performance. No idea why Zucc keeps gimping himself by skipping this segment and either going for tiny 70b or too big 405b
>>
>>101557409
>Q6_K
your brain is gguf quantized be quiet computelet
>>
File: 88206.gif (747 KB, 192x192)
>>101557415
>I'm the Schizo Futa Anon

what in the god damn
>>
how are you guys trying out nemo if koboldcpp hasnt been updated yet?
https://github.com/LostRuins/koboldcpp/issues/1011
i want to try it too
>>
>still no q4_K_M of 405B
>>
>>101557317
Maybe it's just the fact that I started out with a mix of Character.AI and Poe before getting local, but I have no problem with viewing different models as existing for different purposes. Gippity4 is for code help, political analysis, and as a general Jarvis bot, while I still use local for my Chun Li card and periodic futa degeneracy.

I even still weave in a little Character.AI from time to time, because although it's a pale shadow of its former self, I still have a few cards there that are hard to let go of completely. Yes, the new interface sucks rocks, but with a sufficiently well written card, as long as you're not using it for coom, CharAI isn't completely useless, anyway.
>>
>>101557441
frankenstein build
but it might be shit, tried it, pretty broken.
https://github.com/Nexesenex/kobold.cpp/releases
>>
>>101557441
>how are you guys trying out nemo if koboldcpp hasnt been updated yet?
>experimental branch
>llama.cpp itself
>vllm
idk a true mystery
>>
>>101557441
By using llama.cpp.
>>
>>101557427
Shut you you IQ2_Migger
>>
>>101557441
I'm using it in ooba, it works fine.
I know a lot of people here don't like ooba for some reason, but pretending you don't remember that it exists is weird.
>>
>>101557473
>you you
rep/rope broke?
>>
MistralAI fags are such gigachads, they managed to get a model as good as L3-405b with a model almost 4 times lighter (123b)
>>
100B Is All You Need?
>>
>>101557394
It's a few months old; they insist on dredging it up, constantly. You've also got to love the fact that on the one hand, they keep telling me to go back to R#ddit, but on the other, they also keep digging up my old material, broadcasting it on the board, and therefore providing a multiple course buffet for my ego. Their understanding of psychology is as pathetic as everything else they attempt.
>>
>>101557484
Meta are true chads, they got a model 60% as good as 405B at 50x smaller size..
>>
Has anyone run any tests comparing mistral large and 405b llama?
>>
>>101557441
ooba uses llama cpp with tokenizer fix for nemo
koboldcpp is actually slow af now for pushing updates...
>>
>>101557484
They also created Nemo which is at least 95% of Large while being 10% of the size and so the optimal choice for anyone who isn't retarded
>>
>>101557505
Stop trying to make 8B happen. It's not going to happen.
>>
>>101557503
>multiple course buffet for my ego.
glad you agree you're a shitposter petra, now go bak
>>
File: livebench-2024-07-24.png (1.06 MB, 3142x1814)
>Mistral Large 2 was added
>Llama 3.1 70B disappeared
What went wrong?
>>
>>101557528
l3.1 8b as well, bet they edited an older result when they only had 405...
>>
>>101557508
Large is worse than 405B at pretty much anything, but Large has more sovl.
>>
>>101557528
>large infinitely worse than opus, sonnet, gpt4o
glad to see this meme model die before it took off
>>
>>101557560
>200B
are you trolling?
the human brain has ~1000000B
>>
>>101557560
Damn this is so weird lol, I guess Water will be the natural enemy of videogen models for a long time.
>>
>>101557554
what do you mean? it's the best model to use if you're not a millionaire with 15x3090 gpu's or something, and it has way more sovl than the cucked llama series
>>
>>101557560
These videos are so gross.
I don't understand how """"""people"""""" can enjoy looking at them.
>>
Nemo has repetition problems, no?
>>
>>101557441
llama.cpp via llama-server. It works natively with Silly too.
>>
>>101557594
instead of spending $3k to run this shit at q4 you can buy literal years worth of claude sonnet 3.5 tokens
>>
>>101557604
Does it remember prefixes if you have a large prompt and regenerate?
>>
>>101557598
ye
ye
ye
>>
>>101557573
i was wrong, apparently the human brain only has ~86 billion neurons
but neurons aren't exactly equivalent to parameters since they can perform some basic logic iirc
either way, transformers models are relatively inefficient compared to our brains so like the other anon said, 100B is probably all you need
>>
>>101557614
claude 3.5 is too cucked you can't do everything with it
>>
How can hugging.chat serve all these big models for free?
>>
>>101557620
Prefixes?
It doesn't re-process the whole context if that's what you are asking.
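if you're hitting llama-server's native /completion endpoint directly, there's an explicit flag for it. a minimal sketch; the prompt and port are placeholders:

import requests

# ask the server to keep the evaluated prefix in the KV cache across requests
r = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": "the story so far...",  # placeholder prompt
        "n_predict": 64,
        "cache_prompt": True,  # reuse the cached prefix instead of re-processing it
    },
)
print(r.json()["content"])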
>>
>>101557631
VC cash
>>
>>101557631
vc money
>>
>>101557631
Investor money, aka. pyramid scheme.
>>
>>101557414
Trying right now and it's not capable of following complex instructions like Gemma 2 27B. If you want something formatted differently than the usual book-style RP, it will fuck it up very often.
>>
>>101557631
honeypot
>>
>>101557631
all me
>>
>>101557631
By using a magic zero-bit quantization.
>>
>>101557637
>>101557638
How does Viet Cong have any money, and why?
>>
>>101557636
Nice, I thought the llama.cpp server was way behind and didn't have basic features like that. I'll try ST later today with llamafile for fun.
>>
File: param_columns2.png (60 KB, 2550x3300)
>>101557623
anon...
synapses are parameters, not neurons, each neuron has ~7000-10000 synapses depending on age
>>
>>101557649
>it will fuck it up very often.
Yeah Gemma can't follow RP Markdown format.
>>
>>101557631
I'm letting them use some cards in my private rig to host that service. Be thankful.
>>
>>101557675
how come i'm so retarded then?
>>
>>101557649
That's a prompt issue. Especially local with that shitty instruct template in SillyTavern.
>>
>>101557690
bad training data
>>
>>101557672
Lol that feet came directly from horror movies.
>>
>>101557669
Just use llama.cpp instead of another fork that might not be updated.
>>
>>101557690
>how come i'm so retarded then?
transformers architecture is way better than our brain architecture?
>>
>>101557675
>synapses are parameters, not neurons
ive never heard this comparison made

>each neuron has ~7000-10000 synapses
this sounds a lot more analogous to a relationship between weights(neurons) than the parameters themselves
>>
>>101557690
poor education, excessive consumption of coom, most human interaction involves posting on a forum where everyone calls each other "Anon"
>>
>>101557690
bad training data/ training stopped prematurely
>>
you guys are so mean..
>>
Are quants of new mistral anywhere? I can only find some empty hf repos
>>
>>101557714
>another fork that might not be updated
akshully, jartfile is much faster than chudcpp because i/k quant guy works in collaboration with Jartine
>>
>>101557637
>>101557638
This isn't true, surprisingly. The CEO posted recently that HF is profitable. I was shocked, like you I assumed they were just burning investor cash.
>>
>>101557720
>transformers architecture is way better than our brain architecture?
it's not though, transformers requires a ridiculously larger amount of data (and I think electricity too but i'm not sure) to be run. we don't need to consume the entire internet to be smart enough to know how many r's are in strawberry
>>
>>101557726
>excessive consumption of coom
as if the guys on the silicon valley aren't giant coomers...
>>
How do I run Large at home for cheap and with at least 20 T/s?
>>
>>101557747
How? Where is that money coming from? What are they selling?
>>
>>101557745
*humps you*
>>
>>101557744
https://huggingface.co/legraphista/Mistral-Large-Instruct-2407-IMat-GGUF/tree/main
>>
>>101557690
Overtraining on goon data.

>>101557747
Maybe he just lied to get even more vc.
>>
>>101557751
>it's not though, transformers requires a ridiculously larger amount of data
we see a shit ton of data as well with our eyes and ears anon. imagine it's 60 fps, multiply that by your age, and you get an astronomical amount of data, way higher than what the model got in the first place
>>
>>101557762
You download Nemo and pretend it's Large
>>
>>101557747
found the post where he says it
https://twitter.com/ClementDelangue/status/1811675386368966682

very explicitly says that they make a profit and aren't burning VC money, which I think would be illegal for a CEO to lie about
>>
>>101557771
thanks a lot anon
>>
*sharts*
>>
>>101557771
>parts in their own folders
based
>>
>>101557792
how the fuck do they make money though
>>
>>101557805
Yeah I don't know either man, lol, all I know is he says they are
>>
>>101557792
>>101557805
>>101557827
isn't huggingface owned by microsoft though?
>>
File: comp.jpg (96 KB, 1280x720)
>>101557722
>ive never heard this comparison made
this is literally how they were invented, they looked at how biological neurons work and created a simplified mathematical model where artificial neurons are the biological neurons and the connections between them (synapses in a biological brain) are the parameters in artificial neural networks.
>>
Have any of yous guys used Meta's Chameleon model?

The one they released in May https://arxiv.org/abs/2405.09818#
>>
>>101557792
i remember when chatgpt came out and news articles were talking about how openai was losing millions in a short time, and now huggingface is hosting even larger models. I guess some like NVIDIA might pay for the hosting themselves?
>>
what exactly causes repetition related issues? Even at the start of an RP? i've never had this issue and now im suddenly having it. wtf.
>>
>>101557788
>it's way higher than what the model got in the first place
it's not, I calculated it out of curiosity a few months ago. I don't remember the exact number but the model training would be ~100k human years if I remember correctly. In any case it was way bigger than a human lifespan
>>
>>101557697
I can say that Llama 3.1 8B also fails in the same way (if not worse), but 70B gets it immediately. Gemma 2 9B is also definitely not as capable as the 27B version in consistently following relatively complex output formatting (dialogue without tags + interspersed inner monologue + short-form narration with asterisks), but it's on par with or slightly better than Nemo 12B.
>>
>>101557771
Is it broken in any way? Is it better to wait for upstream llama.cpp fixes?
>>
>>101557850
It was always there but you just didn't notice it.
>>
>>101557893
what exactly causes repetition related issues? Even at the start of an RP? i've never had this issue and now im suddenly having it. wtf.
>>
>>101555266
>>101555182
if you can't tell this is a man then your detector needs to be replaced
https://www.youtube.com/watch?app=desktop&v=-mRi-B3t6fA&t=430
>>
>>101557850
Show what you mean. Repetition is not one thing; people mean different things by it. If you're talking about run-on sentences, that's repetition penalty set too high: the model stops using words like 'a' and 'the' and cannot finish a sentence. If it's repeating sentence structure, then don't be too pushy with your writing instructions. It just picks up the pattern from the context and follows it. It's the one thing they're good at.
>>
>>101557899
It was always there but you just didn't notice it.
>>
>>101557901
Stop obsessing about it, petra.
>>
>>101557878
let's say 25 years * 60 fps * 150kb (average size of a 1024x1024 picture) = 16.13 TB
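quick sanity check on that multiplication, taking the same inputs at face value (one 150kb frame, 60 of them per second, for 25 years):

SECONDS_PER_YEAR = 365.25 * 24 * 3600    # ~31.6 million seconds
frames = 25 * SECONDS_PER_YEAR * 60      # 25 years at 60 fps
total_bytes = frames * 150_000           # 150 kB per frame
print(total_bytes / 1e15)                # ~7.1, i.e. ~7.1 PB

so whatever the right figure is, those inputs don't give 16.13 TB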
>>
>>101557904
In my case, It's an entire prompt verbatim even over up to 10 swipes, or copying the structure of two paragraphs yet making the rest of the gen original enough.
>>101557913
In my case, It's an entire prompt verbatim even over up to 10 swipes, or copying the structure of two paragraphs yet making the rest of the gen original enough.
>>
>>101557690
hit the books (training data) and become your own expert
>>
I'm getting refusals from Mistral Large, what am I doing wrong? It's an incest story, both characters adults
>>
>>101557934
this... this is not how it works at all anon. You can't just put random arbitrary numbers there and call it a day.
>>
>>101557938
I think you are in a unique situation where you could ask yourself why it is doing that. But if that fails there is also an option of asking yourself why it is doing that.
>>
>>101557938
Relax the writing rules, then. I'm sure you can remove half your prompt without losing anything important. Also, give it stuff to work with. If you follow the same pattern in your writing you cannot expect the llm to be better than you.
>>
What the fuck are those consolidated weights in the Mistral Large repo?
>>
>>101557976
how is that arbitrary? 25 years is the age when our brain is fully developed, 60 fps is kinda the framerate where we don't see much difference if we go further, and I was being nice with 150kb because that's for a jpeg and our eyes have much more quality than that
>>
>>101557986
>>101557990
The only thing i changed (which i did to try Nemo) was the instruct and context templates, but i switched them back to what i was using before.
at that point i started fucking with settings like a retard (because again wanted to try Nemo) and that doesnt really change much, just creativity.
>>
>>101558001
>our eyes have much more quality than that
>glasses anons...
>>
>>101557990
>If you follow the same pattern in your writing you cannot expect the llm to be better than you.
How long until we can stop playing with dolls? I am a 30 year old virgin here and I shouldn't be doing that I think.
>>
>>101558013
?
>>
>>101558005
nemo is super repetitive, at least on gguf, i know for sure
>>
>>101557849
no
those articles have no idea what they're talking about half the time, i read one that suggested OpenAI is spending billions of dollars a day on ChatGPT
the reality is that they're making an absolute killing because inference is dirt cheap
>>
>>101558026
yeah nemo was absolutely broken which is why i switched back
though I sort of remember an issue like this in the past where issues with one model carried over to another, and i have NO clue how that was fixed. besides maybe trying to reboot my system but im doing shit right now so that isnt happening.
>>
>>101556980
I'm wondering how sonnet 3.5 compares to llama 405b in c# coding.
>>
>>101558023
my eyes are a shit, like jpeg quality 25% or worse
>>
>>101558005
If you're using nemo, lower the temp to 0.3 and move it up as you want more 'creativity'. If you came back from nemo to another model, adjust it accordingly. Show your prompt, show your settings, show your model.
>>
File: 1701754483694888.png (38 KB, 449x741)
We've been scammed.
>>
>>101557992
When I tried Nemo with the official API, it complained when they weren't there.
>>
>>101558047
oh nonononono
>>
>>101558036
oh, yeah same I have myopia that's why I'm wearing glasses, that doesn't change my argument, your brain sees quality pictures if you wear glasses
>>
>>101558043
neutralized settings (1 temp, sometimes 0, doesnt seem to matter), kunoichi-lemon-royale-v2-32K-7B-Q5_K_M and Meta-Llama-3.1-8B-Instruct-Q5_K_M.
It's funny, turning on dynamic temp seems to adjust and almost break the repetition, but that's dangerous because it sometimes spits out garbage.
>>
Mistral large is better for programming than wizardlm8x22b? Is there a gguf that will fit in 96gb?
>>
>>101558016
"Garbage in, garbage out" works in context as well. Make it "entertaining" for the model and it will keep you entertained as well. I hope for a future where all the "ah ah, mistress" proompters are told by the llm to fuck off.
>>
>>101558047
>32k context
that's kinda good no?
>>
large2stral is everything I ever wanted in a local model
thank you based arthur... thank you...!
>>
>>101557899
Instruct training. It introduces the GPTslop behavioral pattern to the model.
"What is the capital of Britain?"
"I see that you are asking me the capital of Britain. To find out the capital of Britain we can simply look at what the capital of Britain is. The Capital of Britain is London."
The reason the training data is formatted like this is to cause the model to set up its own breadcrumb trail to keep it from veering off topic. And these patterns translate over to RP.
That's why it iterates over all the shit in the card
>And then he runs his SLENDER fingers through his JET BLACK LOCKS
just grabbing phrases and shit out of the card and puking them back out.
Finetuning on RP prompts doesn't help either because all of those contain models running on slopped GPT endpoints doing this exact same behavior.
What really is needed is a hand crafted RP instruct dataset to tune a base model on.
>>
>>101558057
>that doesn't change my argument, your brain sees quality pictures if you wear glasses
sure, was just funny reading that as I literally need to lean in to my desk to read (late and glasses off...)
>>
File: 1718091427088942.png (24 KB, 700x265)
>>101558073
It's subpar and they advertise it as 128k
>>
>>101558073
not for coding
>>
>>101558087
if they advertise it as 128k you can probably just use it at 128k
the value in the config doesn't limit you or anything iirc
>>
>>101558099
but it'll rope then
>>
VRAMlet talk, anything better than gemma 2 in all these new models?
>>
>>101558107
uhhh no it shouldn't
don't use a backend that does that
>>
>>101558001
>25 years is the age when our brain is fully developed
that's not even close to being true
>60 fps is kinda the framerate where we don't see much difference if we go further
this is completely wrong as visual perception doesn't work like a camera, so putting any framerate here is wrong from the start
>and I was being nice with 150kb because that's for a jpeg and our eyes have much more quality than that
the same objection as before. also, if you check how much information is going through the optic nerve you would be surprised how small it is. Most of human vision is just a brain "hallucination". Only a very small part of our field of vision is actually sharp; the rest the brain calculates from blurred images, thanks to saccade movements.
>>
>>101558112
anything
>>
>>101558084
hmmmmm

then that means i probably fucked up and still picked the wrong instruct format, thinking i did it right
whoops. ill try a different instruct after im finished generating these 10 images of realistic Rouge the Bat's pussy in SD.
>>
>>101558116
lcpp auto ropes based on the config iirc, if config says 32 and you set 128 it'll rope
>>
>>101558064
>1 temp, sometimes 0, doesnt seem to matter
It should matter. If you get deterministic replies with swipes at temp 1 something is fucked in your setup.
>kunoichi-lemon-royale-v2
Merge. Discard it. At most, take a finetune. Any will do. I won't recommend any.
>Meta-Llama-3.1-8B-Instruct-Q5_K_M
I assume you're using the latest version of whatever you use. If not, update. If you still get the same output at temp 1 it's a bug. You should report it.
>>
>>101558134
just change the config then
>>
>>101558134
lcpp also lets you just manually specify the rope base so you can avoid that
>>
>>101558149
that's illegal
>>
>>101558145
>Merge. Discard it. At most, take a finetune. Any will do. I won't recommend any.
i'll have you know that mememerge absolutely BTFO'd any of the recommendations in this thread, especially CR. But i do notice 3.1 even base instruct with my problems is just slightly better so thatll be my main.
ill download a newer version and see what happens.
>>
>>101558119
>this is completely wrong as visual perception doesn't work like camera, so putting any framerates here is wrong from the start
you can approximate though, because you have to make some calculations, so framerate it is, and 60 seems to be a good spot.

>the same objection as before, also if you check how much information is going through optic nerve you would be surprised how small it is. Most of human visions is just a brain "hallucination". Only very small part of our field of vision is actually sharp, the rest brain is calculating from blurred images and thanks to saccade movements.
That doesn't really matter, even if the brain sees hallucinations, it's a high quality hallucination. the simple fact we can differentiate a 1024x1024 picture and a 4k picture means that our brain probably sees in the range of 4k

I never said we were computers and shit, but if a computer had to live like us, those are the approximations it would get, 60 fps and 4k
>>
thanks for the goon bros, now i'll clean myself up, lift some weights then watch anime
>>
Comparing weights and biases on a digital neural network to biological brains is stupid. You fuckers never learn.
>>
>>101558182
training data on the thread bots isn't updated in real time anon
>>
>>101558182
uhh, but sam altman and all the other experts said that we'll have agi that can replace humans within the decade
>>
>>101558182
forming analogies is an act of higher cognition. You're literally seething at other people not being an NPC.
>>
https://huggingface.co/cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b-gguf

What's the word on this, kids? Worth the download?
>>
>>101558204
MODS
>>
>>101558208
>dolphin
>>
>>101558163
>i'll have you know that mememerge absolutely BTFO'd any of the recommendations in this thread, especially CR.
Dude. It's a 7B. I like me some small models, but i'd never claim a 7b is better than CR.
>>
>>101558216
dolphin indeed!
>>
>>101558124
Got an opinion that's not retarded hyperbole?
>>
File: jerry laugh.gif (3.64 MB, 374x274)
>>101558217
Have you been paying attention to the threads the past 24 hours?
>>
>>101558208
Hello Petrus.
>>101558232
>>
>>101558195
>uhh, but sam altman and all the other experts said that we'll have agi that can replace humans within the decade
Most of those experts that claimed that did it 3 decades ago. Two more weeks, i suppose.
>>
why the fuck is every Mistral Large 2 benchmark about coding performance. literally don't care
>>
>>101557528
>mistral 7b is better than command r plus
>it's also better than the new mistral nemo 13b
>llama 400 better than fucking opus
what the fuck is this list mate? and I thought arena was bad don't post this shit ever again
>>
>>101558241
only productive use of llms
>>
>>101557771
>No IQ quants
meh.
>>
>>101558207
Analogies are for the layman as an intro to a subject. They're unnecessary for people that understand the concepts.
>See? open notepad.txt and it's like a real notebook
>How do i change pages?
>Are you an NPC??!?!?!?!?!?!?!?!
>>
>>101558241
because coding is probably the most important thing that makes OpenAI relevant. Meta wants to kill that company, so it wants people to get free access to a good coding model and not rely on OpenAI anymore for work, and I can understand them, it's a huge security issue to give your data and code to a closed company like OpenAI in the first place
>>
File: robot.png (187 KB, 1218x3513)
eh large 2 is okay, so funnily melodramatic with my old prompts.
>>
>>101558248
Mistral Small is not a 7B model.
>>
>>101558207
>forming analogies is an act of higher cognition.
thanks anon :3
>>
>>101558273
>Meta
but he said mistral
?
>>
llamafile update: it does work with ST. I used this command:
>./Meta-Llama-3.1-8B-Instruct.Q4_K_S.llamafile --server -ngl 20
Then connected with the "Chat Completion", "Custom (OpenAI-compatible)" choice. I used this as the URL:
>http://127.0.0.1:8080/v1
Is that the best way? That was very easy to set up, as llamafile is literally 1 file for both the model and the server.
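if you want to sanity-check the endpoint before pointing ST at it, a minimal smoke test, assuming llamafile exposes the same OpenAI-compatible route as llama.cpp's server (which the /v1 URL suggests):

import requests

# hypothetical smoke test against the local llamafile server
r = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "model": "local",  # placeholder; the server mostly ignores this field
        "messages": [{"role": "user", "content": "Say hi in one word."}],
        "max_tokens": 8,
    },
)
print(r.json()["choices"][0]["message"]["content"])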
>>
>>101558172
based, but for me it's weights, then anime, then goon
>>
>>101558291
that's the same thing, Meta, Mistral, Qwen, they focus on coding because companies simply want to work with a local model and not give their private data to Sam Altman
>>
>>101558294
>That was very easy to set up, as llamafile is literally 1 file for both the model and the server.
buy ad with mozilla money jartine
>>
>>101558294
Fuck off, Jart
>>
>>101558294
How do you know that model isn't a virus?
>>
>>101558084
This isn't correct. All LLMs have this behavior, it's what is generally called "ICL" (In context learning).
>>
>>101558319
I unironically trust jart
>>
>>101558326
base
>>
>>101558204
Kek this is like a balloon with limbs
>>
>>101558319
Why would Mozilla (a real actual company) distribute malware?
>>
>>101558294
I would rather download that random koboldcpp executable than anything that jart touched.
>>
CerealBENCH update
>Claude3.5 Sonnet
>LLaMA3.1-405B
>GPT4o
>Qwen2-72b
>Mistral-Large2
>Claude Opus
>LLama3.1-70b
>Qwen1.5-72B
>llama3-8b
>LLama3-70b
>Command-R+
>Claude Haiku
>LLama2-70b
>llama3.1-8b
>Mixtral8x22B
>Yi-34B
>Mixtral8x7B
will keep you updated
>>
LLAMACPP CRASHED MY MACBOOK
FUCK THIS POS SOFTWARE
>>
>>101558376
That's how you get a virus
>>
>>101558241
Because AI companies have completely given up on getting normal people interested in LLMs and are pivoting to just making them into tools for computer programmers.
>>
>>101558282
holy shit
>>
>>101558171
Anon, no.
I don't have the time to explain the whole process of visual perception to you but you can't make these approximations because they don't make any sense. You have no idea what you are talking about, I'm telling you this as someone who studied neurobiology. I had a nice textbook about neurophysiology of vision, I can look for it and give you the title and author when I find it, if you want. It should clear some misconceptions you have.
>>
>>101558379
>MACBOOK
>>
>>101558379
ollama sir
>>
>>101558364
Why would a man try to convince everyone he isn't one?
>>
>>101558282
now that's some claude soul
>>
Fixed my repetition issue, seemingly, at least at the start of erp's. it was the instruct settings + i forgot to save which settings preset i was using, now it's business as usual. Whoops.

>>101558379
looks like you have to buy another macbook sar :^)
>>
File: 0ob1ytni7r9b1.png (244 KB, 454x512)
>>101558378
anon what the fuck is this
>>
>>101558364
No one but Jart is working on llamafile, even if she uses Mozilla's name. And making models be executables is kinda retarded.
>>
>>101558402
That girl is a better programmer than you will ever be
>>
>>101558393
>I'm telling you this as someone who studied neurobiology
lol you wasted your time for knowledge that anon is just gonna completely ignore, nerd.
>>
>>101558419
It's a great way of having a self contained model that you will still be able to run in the long term. Having more than 1 file is bloat.
>>
>>101558393
You don't seem to understand, I never said we live like computers. my message was that if one day a computer were to live like us for 25 years (like seeing things and shit), the data it would get would be 25 years * 60 fps * 4k pictures, you understand now?
>>
>>101558422
>That girl
Objectively false
>is a better programmer than you will ever be
Yet to be proven.
>>
>>101558182
>comparing the mathematical model of human brain to human brain is stupid
/lmg/ brainrot never ceases to amaze me
>>
>>101558438
noooo but it not like us tho, it had sensors n shiet you cannot compare!
>>
>>101558442
How many stars do you have on GitHub? Do you have 17,000 stars? Yeah I thought not.
>>
>>101558357
yeah the prompt was

>In a sunny backyard, a beautiful little Russian girl lies on her side, her legs elegantly bent and spread. Clad in a cheerful pink two-piece swimsuit, her exposed stomach shines as she smiles, her golden hair flowing around her in the warm breeze.

but obviously it didn't do the two-piece swimsuit part and instead focused on the "stomach" token

kling is unironically better at making tods than any other age group..

>>101558423
im just happy neurobiology anon roasted the ESL underage """accelerationist""" for talking about things he doesnt understand and for making me second guess myself when I know I'm smarter than a stinky poo like him
>>
>>101558436
The model was already a single file.
>>
>>101558446
but you can quantify what a computer is seeing based on the camera, sensors and shit, that's the point
>>
>>101558282
that is pretty cool, I remember when people were like local models never ever. idiots.
>>
>https://x.com/OpenAIDevs/status/1815836887631946015
>Customize GPT-4o mini for your application with fine-tuning. Available today to tier 4 and 5 users, we plan to gradually expand access to all tiers. First 2M training tokens a day are free, through Sept 23.
local lost.
>>
>>101558458
>00001-of-00011.gguf
>>
>>101558448
At least you don't dispute the first fact. Good for you. You're making progress.
I don't care for keeping a reputation here.
>>
>>101558481
old news retard
>>
>>101558484
>Files which exceed the Hugging Face 50GB upload limit have a .catX extension. You need to use the cat command locally to turn them back into a single file, using the same order.
https://huggingface.co/jartine/gemma-2-27b-it-llamafile#about-upload-limits
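so presumably something like this, filenames hypothetical, keeping the numeric order:
>cat model.llamafile.cat0 model.llamafile.cat1 > model.llamafile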
>>
>>101558495
You're probably a redditor with more estrogen than jart herself
>>
>>101558484
>you now have 11 llamafile executables
>each of them a different virus
>>
>>101558497
local still lost, despite the fact that it is old news.
>>
>>101558518
That dude is packing. He could swing his enormous cock across your face and knock you out for a week.
>>
>>101558208
Does anyone else have an opinion of this model?
>>
>>101558545
sorry petrus, people are burnt out on dolphin/gptslop, claudeslop is the trend now
>>
>Waaaaa... the cuda code too big. i cannot make me llamafile!!! I need to further quant the models into oblivion because windows has a file size limit for executables, waaaaaaaaaaaaaaaaaaaa.
>>
>>101558208
dolphin is dead, I'm not touching that shit since airoboros-13b-gpt4-1.4. that's when he trained his shit with gpt4 march 2023, the most sovlful gpt4 model we ever had, and it was smart as well. only C3.5 sonnet gave me the same feeling again
>>
>>101558561
Why does llamafile make you so insecure?
>>
>>101558563
dolphin and airoboros are not the same though?
>>
>>101558047
This is probably just an updated version of the instruct finetune applied to previous Mistral Large, which also had 32k tokens context. Nemo is a newer model.

https://mistral.ai/news/mistral-large/
>>
>>101558423
>lol you wasted your time for knowledge that anon is just gonna compeltely ignore, nerd.
it may be a surprise but I didn't study it for anons
>>101558438
And I'm telling you this is not comparable because you are using arbitrary numbers. You assume that your (60fps * 4k) would be the case for computers but if it's not necessary for humans to learn then it also doesn't have to be for computers. The amount of data needed to process images would be immensely smaller if computer scientists figured out how to model what our visual cortex is doing. It's like approximating what speed you can achieve at different paces of pedaling on a bike while ignoring the fact you can just drive a car and go way faster.
>>
>>101558561
calm down cuda dev
>>
>>101558545
>Dolphin is licensed according to apache 2.0 license. We grant permission for any use, including commercial. Dolphin was trained on data generated from GPT4, among other models.
>>
File: 4zd8o6h9qxn51.png (1.67 MB, 2560x1440)
>>101558510
Feline bros just keep winning
>>
>>101558571
Certain ideas are just bad. There is absolutely 0 point in packaging a ridiculously big datafile in the executable. Other than that, it seems to work well for cpu.
>>
>>101558585
>You assume that your (60fps * 4k) would be the case for computers but if it's not necessary for humans to learn
come on man, you think we could live in a 10fps*256px world? that shit would make me dizzy, there's a reason we feel comfortable at a 60 fps * 4k setting, because that's really close to what we see in real life, don't be obtuse like that, please
>>
Isn't it bullshit to call them 3.1 if they're not related to the original llama version 3 models at all?
>>
>>101558586
He wouldn't engage in this drivel. And if he did, he'd do it more eloquently than me.
>>
>>101558609
>if they're not related to the original llama version 3 models at all?
they are tho? same arch except for context
>>
>>101558609 (me)
my name is petra, btw
>>
>>101558614
nah, message is too short, not pretentious enough
>verdict: NOT PETRUS
>>
>>101558613
Same architecture, sure, but they're not continued pretrains of the original models. They're distillations of 405B.
>>
>>101558627
I thought they used training data from 405B on top of the original models, not full distillation?
>>
>>101558608
you should be slapped for being so retarded and narcissistic
don't reproduce
>>
>>101558637
what a projection, kill yourself nigger
>>
>>101558627
was gonna say this
>>101558636
they probably just genned some synth data from 405 to train on top
>>
>>101558600
I see the point. I was using Dolphin Mixtral again yesterday, and before I remembered that I could switch to the Mistral tokenizer to enable logit bias, I was getting swamped by a tidal wave of diversity and inclusion. Still, I'm going to download Nemo now, and we'll see how it goes. I'll try some nice, safe, politically neutral code questions. Maybe a Pong game.
>>
>>101558645
i'm not the one arguing about stuff i literally don't know anything about you double nigger
>>
>>101558208
for erp try this one
https://huggingface.co/BeaverAI/mistral-doryV2-12b
>>
File: 1702273417650502.webm (3.47 MB, 808x682)
what's this llama shit i just learned about 5 minutes ago?
>>
>>101558653
And I'm not the one talking about biology when the topic is about how close of a setting a computer should get to replicate our point of view, you 2 digit IQ retard
>>
>>101558645
>>101558653
So remind me again, guys. Why do I get more hate for baiting/shitting up threads, than the people who spam this sort of shit everywhere?
>>
>>101558668
a whole lot of disappointment
>>
>>101558608
>come on man, you think we could live in a 10fps*256px world?
This is exactly what I'm trying to tell you. Seriously, check how our vision is working, this is much better approximation if you want to compare (what can't be compared) at all. Most of vision processing is being made from incomplete and fuzzy visual data. The way that the small amount of data and shitty pictures captured by our eyes becoming clear and sharp pictures while going through layers of visual cortex is quite mindblowing when you learn about it for a first time.
>>
>>101558657
>made by the one that was screeching that limarp and all models with it should be banned
https://huggingface.co/BeaverAI/mistral-doryV2-12b/commits/main
>>100828064
>>100828083
>>
>>101558672
>moving the goalposts
more like stealing the goalposts because you are the blackest gorilla nigger that has ever lived

>>101558674
you can kill yourself too
>>
>>101558674
because you're a pretentious holier-than-thou with a victim complex, as demonstrated by this very post
>>
>>101558694
go fuck yourself faggot, you know you are wrong at the end of the day, trying to sound smart while talking about irrelevant shit that has nothing to do with the topic in question, get bent nigger
>>
>>101558026
I have the same issue with exl2
>>
>>101558657
>>101555363
>>101555391
>>
>>101558719
Sheeeit
>>
I don't wanna FIND NEMO i wanna FIND DORY and i think EVERY RED BLOODED AMERICAN CAN AGREE!
>>
>>101558672
you are arguing with two different anons if you didn't catch that by the way
>>
dory more like boring
>>
>>101558769
astounding pun
>>
>>101558705
I don't deny this, but the problem is I enjoy it.
>>
>>101558769
im finding this pun to be funny
>>
>>101558775
tokenizer issue pls understand
>>
What do we do now?
>>
File: 1721862754.png (589 KB, 1246x1416)
>>
>>101558793
die of blood clots from sitting too long
>>
>>101558800
Which model?
>>
>>101558800
posting logs without model, sampler, and prompt info should be a capital offense
>>
>>101558793
Dunno bout you, but I'm still testing Nemo out, with two fine tunes queued for testing.
>>
>>101558793
if vram_gb >= 72:
    run_mistral_large_2()
elif vram_gb <= 24:
    run_mistral_nemo()
elif boring_dry == True:
    run_gemma_27B()
>>
>>101558827
>Nemo out, with two fine tunes queued for testing.
if dory id reconsideer
see
>>101558722
>>
>>101558819
I think the only real reason why they do it, is because they get off on the sense that they have something which other people don't.
>>
currently running Mistral Large q4_M in all of its 0.8t/s glory
comparing results to Nemo, i don't think the slight increase in quality makes up for a 50 times slower gen speed
>>
>>101557904
It's repeating the first words from the previous sentences and the following sentence structure. Gemma 27b doesn't do that with the same prompt
>>
>>101558852
nemo?
>>
>>101558848
If your scenario is simple enough that you don't need a smart model, then nemo is the best balance imo. Soul and smart enough for 99% of rp / writing stuff.
>>
>>101558834
Those comments were mine, in fact.
I like to try the official tune, then a fine tune, then back to the official tune to see if I was doing anything wrong.
Rinse and repeat.
>>
>>101558856
yes
>>
>>101558852
Show your prompt and settings. If you're too lazy to even give enough information for people to help you, go use gemma or whatever model works for you.
>>
>>101558868
known issue, it's over
had it too, low temp, high temp, no rep pen, some rep pen, {{random}} schizo inject, it'd still loop
>>
>>101558812
>>101558819
gemma2-9b-sppo-iter3-q8_0
config from anon >>101545047
:3
>>
>>101558880
backend, quant, format, settings? I have not had that problem myself but I use vllm which I know most dont.
>>
>>101557301
finetuning largestral will cost something to the tune of $1k-$10k depending on how big the dataset is. Maybe more. You will need to rent at least a couple of a100/h100 for lora. Don't even think about full finetune.
For the small models, it's much more manageable. You can do it at home if you have 3090s
>>
dbrx2 when
>>
>>101558909
2 days after grok 2
>>
>>101558898
lcpp/kcpp q8, 0.2-1.1 temp, 1.0 (disabled) to 1.1 rep pen. but some anon said exl2 looped also, above or in the last thread.
>>
>>101558921
Speaking of, Elon took his shiny new 100k H100 cluster online yesterday and started training right away.
>>
File: 1718911930703958.png (23 KB, 777x305)
>>101558899
4bit Qlora isn't that bad. You can finetune 8x22b with just 96GB VRAM and that's a bigger model than the new Mistral-Large
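for reference, a minimal QLoRA sketch with transformers + peft; the model id and hyperparameters are illustrative, not a tested recipe for 8x22b:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# load the base model quantized to 4-bit NF4 via bitsandbytes
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.3",  # stand-in; swap in your base model
    quantization_config=bnb,
    device_map="auto",
)

# attach low-rank adapters; only these small matrices get trained
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # prints the tiny trainable fraction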
>>
>>101558927 me
>0.2-1.1 temp
not dynatemp btw tried a few in between
>>
>>101558852
a} Disable all other samplers except static temperature.

b} Don't set temperature higher than 0.4 to start.

c} If it's still doing it, the problem is likely either your card or prompt format.

https://files.catbox.moe/ot5sj3.png

For cards, consider using a few shot format like the one in the above card, rather than W++. The word duplication in the description is intentional; it's a name:value format which explicitly specifies the relations between terms.
>>
>>101558898
I use vllm with the neuralsomething fp8 weights from huggingface, set add bos true add eos false, currently temp 0.4, repetition penalty didn't seem to do much so 1.0, top_p 0.9
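those settings map onto vllm roughly like this; the repo id is a placeholder, not the actual fp8 upload:

from vllm import LLM, SamplingParams

llm = LLM(model="some-org/Mistral-Large-FP8")  # placeholder repo id
params = SamplingParams(
    temperature=0.4,
    top_p=0.9,
    repetition_penalty=1.0,  # 1.0 = effectively disabled
    max_tokens=256,
)
out = llm.generate(["Once upon a time"], params)
print(out[0].outputs[0].text)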
>>
>>101558962
petrus... not like this
>>
>>101558962
>it's a name:value format which explicitly specifies the relations between terms.
>Every statement you process, must be evaluated according to the below six principles.
>"principle of identity":"1 = 1"
>"principle of contradiction":"1 ? 0"
>"principle of non-contradiction":"1 ? 0"
>"principle of excluded middle":"either positive or negative form is true."
>"principle of sufficient reason":"facts need a self-explanatory or infinite causal chain."
>"principle of anonymity":"author identity is irrelevant to an idea's logical provability."
>I still keep this in my own sysprompt, although I know I will receive shrieks and howls in response.
so you do huh
>>
>>101558962
i've seen you shill this card a lot of times and i still don't understand wtf it's doing
>>
>>101558986
Either tell me specifically what I am doing that you're having a problem with, or shut the fuck up. If your issue is simply the fact that the card format violates your own preconceptions, then that's also not my problem. It's the only card I've got that consistently produces good results with every model I try it with.
>>
>>101558962
>This fork has been tested with three major models; MLewd 13b, Mythomax 13b, and Mistral 7b. Mythomax seems to work best.
https://characterhub.org/characters/petrus4/adriana-cruz
why not link your chub?
>>
>>101558829
Is 72GB enough to run it?
>>
>>101559011
Have you actually tried using the card? Chatting with it?
>>
>>101559025
>The design of this fork adheres to my card authoring doctrine, of minimising prose as much as possible, while giving the model descriptions, numerical data, a list of interests, and one or two examples of behaviour, and then letting the AI fill in the rest of the blanks. I feel that it works better than adding every single detail myself, since it encourages adaptive rather than static behaviour. I also use Myers Briggs personality profiles, as a means of providing a full and complex personality, while minimising token expenditure.
word slop of pure pretentious
>>
>>101558962
Why is the word duplication intentional?
>>
>>101559049
i will eventually
>>
>>101559073
you wouldn't understand, petrus thinks in a higher plane of existence, literally
https://characterhub.org/characters/petrus4/hexnet-1d18e703
>>
>>101559082
>>
https://pastebin.com/gHVRraHJ
>>
>>101559118
I thought about doing something like that for perplexity too.
>>
>>101559100
Tokens: 616 (l3 tokenizer)...
>>
>>101559135
Yeah, it saves a lot of time for bigger models. Hopefully they add Mistral Large since 405B is just watery soup.
>>
>>101558946
any LoRA that doesn't change at least 10% of the model's parameters is a cope lora.
>>
>>101559158
4bit qlora 1 epoch rank 8 is enough
donate to my kofi
>>
>>101558379
what shit OS crashes from a user mode app?
>>
>>101559196
windows
>>
>>101559212
"macbook"
>>
VLLM crashed my windows. Piece of shit.
>>
Is it me or is 3.1 8B / 70B noticeably worse than 3
>>
I am once again asking is there anything worth updating for RP over midnight miqu that can run on ~48 vram
>>
>>101559238
gguf?
>>
>>101559238
They're about the same in my testing, but I think there was a lot more shine on 3 due to the hype cycle being so long. Same as I think Gemma 2 27B is being mildly slept on due to it being a bit wonk at release. It's a monster for that size.
>>
>>101559243 (me)
my name is mikufag, btw
and yes, i'm still in denial
>>
>>101559252
>Same as I think Gemma 2 27B is being mildly slept on due to it being a bit wonk at release
Don't most of us know now to wait for finetunes, rather than using a vanilla release?
>>
>>101559256
look I don't give a shit about you schizos crying about shilling free models, I am currently using said model and nothing I have tried easily outperforms it in sillytavern but I have not checked for a month
>>
>>101559229
linux only crashes windows huh?
>>
>>101559269
no? most shit on all tunes pretty much, even the corpo instructs
>>
>>101559256
>>101559169
>>101559100
>>101559082
>>101559073
>>101559054
No matter how much misery you attempt to cause others, it will never equal the amount that you are obviously motivated by yourselves.
>>
>>101558899
I wanted to basically add more knowledge to the model from a small dataset that I will create on my own. I thought about finetuning mixtral or nemo.
>>
File: nemo_sovl_2.png (133 KB, 871x771)
i think i'm gonna stick with Nemo for the time being
>>
>>101559054
>Myers Briggs personality profiles
Amateur shit. Real pros use personality checksums.
>>
>>101559297
I'm enjoying it too.
I'll still give dolphin-2.9.3-mistral-nemo and mini-magnum-12b-v1.1 an honest run.
>>
so is there a guide on what to buy to run the big llama yet?
>>
>>101559291
>No matter how much misery you attempt to cause others
funny coming from the guy that was dooming that anything post mixtral 25 was woke and local was over
>>
I saw someone on le reddit say they ran the 70b on a single 4090, isn't that literally impossible
>>
>>101559272
They won't give you good information. They have no intention of doing that. All they are interested in is trying to spread their own pain. If you want to know what models are good to use, you are going to have to download some and try them yourself.
>>
>>101559318
>isn't that literally impossible
totally possible if you chop 3/4 of the brain out, ~q2 quants
>>
>>101559319
>If you want to know what models are good to use, you are going to have to download some and try them yourself.
Finally some petrus advice i'd agree on
>>
>>101559336
Technically Q4 is 3/4ths removed and Q4 is as low as you can go without it becoming a brain-damage quant.
And also technically don't they do the pretraining at fp32?
So even fp16 is 50% brain removal. and Q4 is 75% of the remaining 50%.
>>
>>101559318
IQ2_M and below are viable on 24GB.
>>
>>101559314
In how many years' time will you still remember my posts, Anon? 10? 20? 50? If you have so little else to occupy your mind, perhaps I've actually done you a favour.
>>
do either llama 3.1 or mistral large have good elx2 quants yet?
>>
>>101559352
Even "brain-damaged" quants are better than FP16 8B.
>>
>>101559364
i dunno maybe i'd forget you if you didn't have such a recognizable "tone" to your posts + typing style, and weren't always doomin
>>
Is offloading context faster than the model?
>>
>>101559252
Honestly, post unfucking, Gemma 2 27B is my favorite model and it isn't even close. So much knowledge, sovl, and understanding of human psychology and emotion that the other models (sans maybe 405B) just can't fucking touch
Just wish it had more context. 8k is basically nothing nowadays
>>
>>101559011
He thinks invoking logical first principles will magically bootstrap models into becoming smarter than they are. Naturally, if they can't manage basic reasoning, the best way to fix this is to barrage them with a list of impractically generic and abstract rules and they'll use their retard-level faculties of inductive logic to overcome the obvious catch-22 and connect the dots and become 10x smarter.

But no one here recognizes his tortured genius and everyone always just writes off his sysprompt as placebo :(
>>
>>101559410
imo nemo being a bit dumber is worth the 128k context. Its also far less dry.
>>
>>101559377
>muh heckin' bencherinos
>>
>>101559418
>But no one here recognizes his tortured genius and everyone always just writes off his sysprompt as placebo :(
>>97309445
>I still keep this in my own sysprompt, although I know I will receive shrieks and howls in response.
>>
>>101559410
>8k is basically nothing nowadays
Depends on your use case. For RAG it's hardly a broom closet, but for coombot cards it's still usable. Then again, if its text is that good, you're probably going to want it to slowburn.
>>
>>101559272
Try looking in /r/LocalLLaMA again, because you got that recommendation from there, Miku.
>>
>>101559418
>He thinks invoking logical first principles will magically bootstrap models into becoming smarter than they are.

Can you prove it doesn't?
>>
>>101559410
Hello sars I would like to take the time to talk to you about google's latest model. Sars are you listening? Sar?
>>
>>101559457
>everything I don't like is le reddit
retard
>>
>>101559418
Don't worry, Anon. You did manage to convince me to give up, for the most part.
>>
>>101559483
but that's not me.. (your no1 fan) like for real, that's someone else...
>>
>>101559460
No, they can't; and they also never bothered trying to come up with an alternate approach themselves. They are a group of maybe 3-4 howling jackals; they produce absolutely nothing of worth themselves. Their only goal is to demoralise and dissuade anyone else here, who might potentially produce something valuable; and unfortunately, they are extremely effective at what they do.
>>
>7B: Llama 3 8B
>13B: Nemo 12B
>30B: Gemma 2 27B
>65B: Llama 3 70B
It all worked out in the end
>>
>>101559410
I RoPE it out to 16k. If you're already running a quant, a little RoPE doesn't hurt as bad as people think. Things for me have been generally stable out to 4x, even for intensive stuff like coding.
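in llama-cpp-python terms that's plain linear rope scaling, a sketch with a hypothetical path (scale = native ctx / target ctx, so 8k -> 16k is 0.5):

from llama_cpp import Llama

llm = Llama(
    model_path="gemma-2-27b-it-Q5_K_M.gguf",  # hypothetical local path
    n_ctx=16384,           # target context window
    rope_freq_scale=0.5,   # linear RoPE scaling: 8192 native / 16384 target
)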
>>
>>101559529
come on google gemma 2.1 128k context, come oooonnnnn!
>>
>>101559515
>Their only goal is to demoralise
once again funy
>>
>>101559536
Even 16k is hardly useful besides quick testing and I'm sure any higher and it becomes retarded.
>>
>>101559529
Gemma 27B is worse than Gemma 9B though.
>>
>>101559601
Yeah how did google manage to fuck that one up so bad?
>>
>>101559643
distillation worked too well
>>
>>101558378
openai lost, big time nasty
>>
>>101559436
I tested IQ2 70B quants with my 3090 and they do feel better than 8B Llama 3.1 like the graph suggests. Better at following the prompt, at least. They would probably have less damage if output and embed tensors were quantized to something higher.
>>
>>101559667
Hello and welcome Robert!
>>
>>101559601
No, that's definitely not true. Earlier implementations lacked softcapping support though, and apparently that affected the 27B model more. GGUF quantizations done before the fix will remain defective.
>>
File: file.png (416 KB, 529x718)
>>101559678
ROBBERRRTT!!!
>>
>>101559667
>They would probably have less damage if output and embed tensors were quantized to something higher.
https://huggingface.co/RobertSinclair
https://huggingface.co/ZeroWw
>>
>>101559707
>GGUF quantizations done before the fix will remain defective.
Tried after and SPPO 9B still mogged 27...
>>
>>101559711
Nah, q5_k to q8_0 will be enough, no need for f16.
>>
>>101559748
>no need for f16
But (ZeroWw) quantizations...
>>
>>101559740
It's okay anon
I believe you
>>
>>101558898
fp16, fp8 or awq?
>>
>>101559601
That's not true at all. And I'm not talking about benchmarks.
>>
>>101559894
FP8
>>
>>101559997
made yourself or from huggingface?
>>
>>101560013
>>101560013
>>101560013
>>
>>101559515
this whole thread is odd
i've never heard of your magical sysprompt and I'm inclined to believe all the posts defending you are simply you with a different hat on
>>
>>101560128
Correct.
>>
I tested llama 3.1 8b, gemma 2 27B and mistral-nemo 12B in my native language, German - mistral wipes the floor with llama and gemma.
mistral - perfect german, coherent and meaningful answers
llama - good german, produces mostly nonsense
gemma - broken german, not usable

thanks mistral, i am now your fan
>>
>>101560266
hi Wolfram Ravenwolf



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.