/g/ - Technology






File: miku_seine_alter_.png (2.63 MB, 1280x1280)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102849995 & >>102838447

►News
>(10/16) Ministral 8B instruct model released: https://mistral.ai/news/ministraux/
>(10/15) PLaMo-100B: English and Japanese base model: https://hf.co/pfnet/plamo-100b
>(10/15) Llama-3.1-70B-Instruct customized by NVIDIA: https://hf.co/nvidia/Llama-3.1-Nemotron-70B-Instruct
>(10/14) Llama 3.1 linearized: https://hf.co/collections/hazyresearch/lolcats-670ca4341699355b61238c37
>(10/14) Zamba2-7B released: https://www.zyphra.com/post/zamba2-7b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: GYlCgrqasAAjKX-.jpg (20 KB, 462x370)
►Recent Highlights from the Previous Thread: >>102849995

--Papers:
>102855851
--Recommendations and considerations for running models on a 2060super 8gb setup:
>102851319 >102851349 >102851356 >102851375 >102851414 >102851704 >102851931 >102852032 >102852056 >102852084 >102851972 >102851483
--How to understand and implement samplers in AI models:
>102854191 >102854542 >102854634
--Nemotron 70b resists explicit story direction:
>102856315 >102856323 >102856345
--Mistral's performance on trivia questions:
>102852002 >102852061 >102852178 >102852203 >102852201 >102852232 >102852310 >102854908 >102852409 >102852312 >102854622 >102852481
--L3.1 Nemotron Instruct at Q6K shows promise but falls short in RP:
>102858009
--H100 worth the price, but depends on the task and setup:
>102852441 >102852670 >102852712 >102852744 >102852753 >102852964 >102853007 >102853223 >102853246
--Ministral-8B-Instruct Nala test shows improvement over Nemo:
>102850925 >102850945 >102850962 >102851092 >102851019 >102851030 >102851023
--Ministral ggufable but has issues at long context:
>102851380 >102851421 >102851455 >102851473 >102851488 >102851558 >102851565 >102851611 >102851713 >102851737 >102851828
--Debate on AI surpassing human-level intelligence and its capabilities in ERP and TTRPG:
>102850496 >102850808 >102852429 >102853638 >102853941 >102853974 >102853963 >102854001 >102854095 >102854133 >102854169 >102854215
--8B model performs well for RP purposes and holds up under quantization:
>102851186 >102851548
--New ooba feature allows download cancellation with ctrl+c:
>102850112 >102850232 >102850266 >102853434 >102853471 >102858260
--Models have video game knowledge but struggle with trivia questions:
>102853355
--Miku (free space):
>102850413 >102850771 >102851605 >102851900 >102853205 >102854227 >102854365 >102855777 >102856017 >102859296 >102861263

►Recent Highlight Posts from the Previous Thread: >>102850022

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
https://x.com/_xjdr/status/1846944172445782408
>>
File: 1708184095859170.png (130 KB, 488x497)
>>
damn france was the shit at some point, wasn't it
>>
>>102862181
Yes, but then we lost a world war and got flooded with 40IQ browns.
>>
>>102862181
they were pretty good until the EU destroyed them along with the rest of Europe, now they survive on tourism like some third world country
>>
File: 4918.jpg (108 KB, 882x1232)
It's over...
>>
Nemotron beats every other 70B I've ever used for RP so far. Not sure if it's better than Mistral Large tunes though. It has some oddities with its formatting.
>>
>>102862255
Who is this arx person
>>
So is the novideo 70B finetune actually better than largestral or is it just a meme?
>>
>>102862303
bench gamed meme.
>>
>>102862303
It's hard to compare. I really like its prose, and it's quite smart / deeply introspective, which is good for RP stuff.

>>102862313
It's human preference tuned, and all that does is increase its personability. It's worse at coding than base llama but it's FAR better at creative uses in my testing so far. It's actually creative and interesting unlike dry 3.1
>>
>>102861776
This is really fun, I'll definitely give adding it to Mikupad a try.
>>
I just came back from a two week vacation. Updated oobabooga and now all my models run 4-5x slower, from 20 t/s to 3 or 4 even with <10k context. I have a 1080ti with 7GB vram used. Tested with the new ministral 8b and magnum 12b v2 gguf with q4ks variants.
Did something happen to ooba while I was gone? Should I be using something else for my interface?
>>
>>102862303
They show off Nemotron solving the strawberry riddle right on the model card. It's also clearly been trained on Sally and probably most of the reddit riddles. Human preference alignment means that it's just a bigger Starling.
>>
>>102862361
Yeah, koboldcpp/tabbyapi is the new meta
>>
>>102862388
koboldcpp is the same, must be a llamacpp issue. Guess I'll have to wait for a patch.
>>
File: file.png (110 KB, 200x232)
>>102862361
>he pulled
>>
>>102853138
>>102853177
>>102855144
It could also be a methodology problem. Just looking at top-k=1 completely ignores any changes to the probability distribution that didn't bump the most likely token from first place.
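Something like this minimal sketch would check it properly, comparing whole distributions instead of just the argmax (model names and the prompt are placeholders, and it assumes both checkpoints share a tokenizer, e.g. a base model and its finetune):
[code]
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

def next_token_probs(model, tok, prompt):
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]  # logits for the next token
    return F.softmax(logits, dim=-1)

tok = AutoTokenizer.from_pretrained("model-a")       # hypothetical checkpoint
a = AutoModelForCausalLM.from_pretrained("model-a")
b = AutoModelForCausalLM.from_pretrained("model-b")  # e.g. a finetune of model-a

p = next_token_probs(a, tok, "Once upon a time")
q = next_token_probs(b, tok, "Once upon a time")

print(p.argmax() == q.argmax())                      # what top-k=1 checks
print(F.kl_div(q.log(), p, reduction="sum").item())  # KL(p||q): what it misses
[/code]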
>>
>>102853205
>>102858490
Training an instruct model on unformatted raw text strikes me as methodologically unsound.
>I ran a fairly strong LoRA on it using a private raw-text dataset. The results were 'overcooked' so I did a 50/50 SLERP merge back onto the original model and this is the result of that merge.
>>
when will 8b beat aicg?
>>
>>102862361
How were you using ministral 8b two weeks ago?
>>
>>102862255
Isn't MMLU a benchmark for knowledge evaluation? They only trained Nemotron to be aligned with arena preferences, so their training wouldn't add anything to its knowledge.

Nemotron is much better for RP and creative writing than the base model. That's what matters.
>>
>>102862259
I second this opinion. It's the best 70b model I've used for RP.
>>
>>102862259 >>102862347 >>102862902 >>102862918
What main prompt are you using, what instruct template, and what sampler settings?
>>
Why do I get horrible slowdown in ST when using group chats?
>>
>>102862990
Get your own
>>
>>102862999
Probably because the context has to be reprocessed every time, since the card information of each character is different and sits high in the context.
>>
>>102862918
>>102862259
Didn't it fail the Nala test?
>>
>>102862990
Try this:
https://files.catbox.moe/wwtnkf.json

Regex:
https://files.catbox.moe/qs0dwf.json
>>
>>102863031
CoT with a cute formatting regex, will give it a try and A/B it against Llama 3.1 Instruct. What sampler settings have you personally used when experiencing good results?

>"allow_jailbreak": false
I always wonder when I see this whether someone left it at the SillyTavern default or if they tried and saw
<|start_header_id|>system<|end_header_id|>
doesn't work for injecting instructions after the start.
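For anyone out of the loop, that token comes from the Llama 3 Instruct template, which from memory looks roughly like this (double-check against the official tokenizer config before relying on it):
[code]
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{response}<|eot_id|>
[/code]
The jailbreak trick is injecting a second system header mid-context in that same shape, and as said above it doesn't reliably work on this tune.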
>>
>>102863024
Yeah, I had to use the "join character cards" instead of the default of switching them.
>>
>>102862990
0.21 smoothing, 0.03 min p, temp 1, DRY on.
>Instruct template
Basic llama 3 Instruct on Sillytavern
>System Prompt
"This roleplay consists of alternating messages between Assistant (you) and Human (the user). Human and Assistant take turns to add to the story, and this continues indefinitely.

Both {{user}} and {{char}} are major characters in the story, with other side characters taking on a supporting role.

There are strict rules for the contents added in each turn:
Human turn: Describe only {{user}}'s actions, dialogue, thoughts and feelings.
Assistant turn: Write only general story narration and the actions/dialogue of {{char}}. You cannot control or imply {{user}}'s thoughts or actions.

Note: Text that is formatted with parentheses is out of character and is directed to you outside of the role-play. If you are sent an OOC request, then it must be obeyed and implemented immediately!

(OOC: This is an example of an out-of-character message.)"
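As a SillyTavern-style preset, that would be something like this. The field names are from memory and the DRY values are the defaults since I only toggled it on, so treat it as a sketch rather than an exact export:
[code]
{
  "temp": 1.0,
  "min_p": 0.03,
  "smoothing_factor": 0.21,
  "dry_multiplier": 0.8,
  "dry_base": 1.75,
  "dry_allowed_length": 2
}
[/code]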
>>
https://github.com/xjdr-alt/entropix/tree/70B/entropix
>>
File: 1722759829144264.jpg (124 KB, 850x1000)
LLM powered VNs when?
>>
>still no 70b natively multimodal model
holy shit these niggers really dont want to risk anything, they just want that extra 1% mmlu pro
>>
>>102863432
What's the use case?
>>
>Llama.cpp still no SWA support.
>Llama.cpp still no multimodal support.
What the fuck do they do all day then, huh?
>>
Hello, /lmg/, my old friend.

What literature should I read to fine-tune a coding-oriented LLM on my codebase?
>>
>>102863509
the fact that it would unlock the biggest functionality of AIs aside from AGI: the ability to properly interact with GUIs, you know, the things that let you interact with everything that was made with human eyes in mind
>>
File: file.png (82 KB, 1111x527)
I've been using Miqu Midnight for a long ass time and I just downloaded Nemotron to test it
how the fuck is it so annoying right from the start
>>
>>102863563
maybe test it on an actual use case instead of grading its specific greeting message intstruction training gorilla nigger?
>>
>>102863563
What is your problem?
>>
>>102863516
>Llama.cpp still no SWA support.
Then who is running these Nemotron 70B ggufs and how?
>>
Minitron is pretty good, easily the best 8B I've tried.
>>
>>102863596
From what I understand, the new interleaved sliding window attention mechanism borks ministral quants. Until llama.cpp gets on it you might as well not even touch ministral.

>still no llama 3.2 support either.
It's like they want their project to die.
>>
>>102863621
Is it better than nemo 12B?
And if so, in which ways?
>>
>>102863596
>Nemotron 70B ggufs
based on l3.1 and uses rope scaling, not swa?
>"_name_or_path": "meta-llama/Llama-3.1-70B-Instruct",
>"rope_scaling": {
https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF/blob/main/config.json
>>
>>102863375
As soon as we get agi.
>>
>>102863692
I've only tried Nemo 12B RPMax and it seemed pretty dumb to me. Minitron seems comparable to Mistral Small.
>>
>try nemotron 70B
>shivers down the spine on the first response
I'm not trusting you people again.
>>
>>102863825
I'm sure your prompt had nothing to do with it.
>>
File: 8th-snitch.jpg (38 KB, 750x778)
>>102863825
lol.
lmao.
>>
>>102863862
My prompt is high tier literary fiction with some fetish stuff. Even Nemo merges did better than that. Sad!
>>
>>102863825
You learned a lesson.

Supplemental lesson: a sign of bullshit is using hyperbolic language ("best ever") as a substitute for details and examples. If something is so great it is very easy to give an example of how it blew you away.
>>
>>102862807
I wasn't, I tested it today to compare against my older models. I used to just switch between gemma 2 9b sppo, magnum and nemo.
>>
Anyone know if there is a way to disable markdown rendering in SillyTavern?
>>
File: Untitled.png (98 KB, 638x346)
>>102863916 (Me)
And I'm not memeing here. It can realize it's reading something of high quality. Unfortunately it can't keep it up; instruct LLMs are usually bad at following style without finetuning.

>>102863940
That's why I try them online if possible before downloading, so no time/bandwidth wasted.
>>
>>102864095
>It can realize it's reading something of high quality
kek
>>
nemotron 70b really likes to do choose-your-own-adventure roleplay
>>
Is there any local model that can trade blows with my dick yet?
>>
>>102864360
im sure if you look long enough you will find a local whore that used to be a model.
>>
>>102864360
>my dick
Any 2B model should be just about the right size.
>>
>>102864410
not a pedo
>>
>>102864095
interesting
i'm going to try putting "genre: high quality literature" in my author's note next rp session
maybe it'll work like "masterpiece, 1girl" the way image diffusion used to work
>>
>>102864460
Purple prose will shoot through the roof.
>>
>>102864460
That doesn't really work. LLMs weren't trained with such tags so they don't really know what high quality literature reads like. It will either not do much at all, or make it vomit purple prose and meaningless filler.
The best you can do is fill your context with good quality text and hope it picks up the patterns.
>>
>>102864489
>>102864686
Not the first anons that don't pick up on sarcasm... or have they been here for a bit? hmmmm?????
>>
File: 1722529467985490.jpg (281 KB, 1024x1024)
I believe there are a few reasons why we do in fact want a model that's good at trivia rather than focusing solely on smarts.

First, we already know that models trained on something perform better at that thing than a model not trained on it that instead gets the information from RAG. It's not surprising that someone who is familiar with something is able to talk about it with more nuance and understanding than someone who isn't.

Second, a model that knows trivia also means it is more likely to have other kinds of creative text in its training data, so it could result in a more creative model overall (if the fine tune doesn't screw it all up).

Third, people love references, allusions, and memes in their media and entertainment. Why shouldn't at least one model in existence be able to spontaneously make those kinds of references unprompted? Imagine if you were doing a lighthearted Halo story/RP and the model suddenly references the Halo guy meme at some point (when it makes sense to). Wouldn't that be cool and natural? A model that can do that would be pretty fun.

While we need models to get smarter, if our personal goals are to have fun with models, then trivia should not be forsaken entirely. A balance of trivia knowledge and non-trivia knowledge would be important. And we can still have different models made by different people that focus on different things, wouldn't that be great?
>>
nigger, faggot, troon, and you know what? nigger, faggot, and troon again.
>>
>>102864729
Your point is...?
>>
>>102864729
>words words words
If you want trivia so much, just use DBRX and let us know how much you enjoy it.
>>
File: trash.jpg (43 KB, 658x439)
>She sighs wistfully. "Every part of this encounter becomes an exquisite dance between sight, touch, and imagination!"
>>
>>102864489
>>102864686
>purple prose
it's doing something much weirder, "genre:high quality literature" raised the value of everything because of "high quality" i guess.
like usually the rp i use goes
>trashy girl comes over to my apartment at midnight, bitches about how much of a shithole my apartment is
>wants me to (let her crash because she got kicked out of her place/ beat the shit out of her stalker or exboyfriend/steal some money off a drug dealer)
now it's
>sultry girl comes over to my apartment at midnight, marvels at how nice it is
>wants me to (steal a monet painting worth millions)
>>
>>102864778
for me it's Victor Vex that shows up for some reason in multiple chats

>Suddenly, the sound of footsteps echoes from outside the warehouse. A figure, dressed in a long, black coat, steps into view. It's Victor Vex, a tall, lean man with piercing green eyes and jet-black hair. He's known for his insatiable lust and his love of watching others suffer.

>Victor Vex: his voice is low and menacing "Well, well, well… what do we have here?
>>
>>102864758
That, unlike what someone said last thread, trivia is not a bad thing for models to know.

>>102864762
>having an allergy towards reading, in a hobby with copious amounts of reading
Ironic.
>>
>>102864826
What model? I've never seen that name.
>>
>>102864831
>dismisses my point without addressing it
Ironic. I look forward to seeing your mindblowing DBRX logs that change everyone's mind about smarts over trivia.
>>
I just had a thought. What about low quality erotica? My human brain instantly imagines: "i suck yo dick. i do it fast" but maybe low quality erotica or low quality erotic roleplay is actually what we want?
>>
>>102864729
1. What model/quant for this?

2. The right kind of training data to teach memes/reference is data that *integrates* that trivia, not that shits it out in response to pub quiz question prompts.
>>
>>102864778
me throwing out another cum rag after long shiver session
>>
>>102864867
I only did the same thing that you did to my post. Your dismissal doesn't actually make sense in the context of the original post, if you actually read it.
>>
I firmly believe we need sloppier models.
>>
File: god.png (50 KB, 1291x193)
>>102864868
Chronoboros 33B on high freq penalty once spewed me this apex creation of impersonation.
>>
>>102864889
It's ok to admit you don't know what DBRX is
>>
>>102864913
The most shocking thing to me is how it writes YESSS 10 times and somehow resists the siren call of doing it again.
>>
>>102864868
>>102864913
I really need an AI to rub its fazzlenudge against my gigglestick.
>>
>>102864870
1. I don't know. I don't test trivia a ton. But so far it feels like Mistral Large even at Q2 is fairly smart and creative without dipping into being a dry smarty pants model or a creative but pants on head retarded model.

2. That sounds like the right approach and there's no reason to think that I disagree. I believe if we are to get a trivia benchmark that truly tests trivia knowledge, it should be something that's not multiple choice but somehow tests how likely the model is to spontaneously make a reference.
>>
>>102864868
>author note: Genre: Low quality smut
it's beautiful.
>>
>>102864921
It's ok to admit you didn't read or understand my post.
>>
>>102864921
>>102864969
Now... kiss!
>>
>>102864814
>>102864965
You've got to be kidding that it actually works this way...
>>
Nemotron is the best I've used for humor. It's actually funny and inventive.
>>
A hallucinated game world with an intelligent Miku in it
>>
>>102865038
>an intelligent Miku
an oxymoron if I ever heard one
>>
>>102864965
>low quality smut is actually higher quality
>>
>>102864989
>>102864965
So what if you try this on a different model? Is it really working this way because of Nvidia's tuning?
>>
>>102863375
LLMs can't keep secrets, they would spoil everything on the first scene
>>
>>102865055
LLMs can't keep memories, they would forget everything after the first scene
>>
>>102865052
i (second person quoted) am actually using Rocinante-12B-v2g-Q4_K_M and no i won't buy an ad.
>>
>>102862756
I'm sorry, I don't speak reddit.
>>
>>102865055
Wrong >>>102242181
>>
>>102864729
I think I demonstrated yesterday that the knowledge is there - it just hasn't been well generalized into the instruct behavior.
>>
>>102864965
SVOL
>>
>>102865076
>i (second person quoted) am actually using Rocinante-12B-v2g-Q4_K_M

>Arsenal (Supported Chat Templates)
>* ChatML for RP
>* Alpaca for Story / Instruct Adventure
>* Mistral for NeMo
>* You can mix it up and see which works best for you.

What makes you the way you are, having not merely downloaded but run what is on its face a defective merge made by a moron?
>>
>>102865201
What makes you the way you are, having not merely shat on a perfectly good merge without having so much as tried it yourself but being a moron and a nuisance here without contributing anything yourself?
>>
>>102865118
I saw that. I think that's essentially a case of shallow knowledge as opposed to deep knowledge. It might've been trained on texts that directly have that information, but not any texts that reference or manipulate the information in other ways. It might be something that can't be solved with only fine tuning.
>>
>>102864965

Too good to be true. This would be the biggest 'gotcha' moment for LLM cooming. No fucking way it's this simple.
>>
>>102865228
No really, tell me. Are you underaged? Did your mother drink while she was pregnant with you? Are you a non-native English speaker like the 'tard who excreted that negative-value-added merge of other people's fine tunes and pretended it was his original work? Explain what made you think any part of that is competent or acceptable.
>>
>>102865273
I'd be curious to see how a larger Mistral reacts to it. Maybe it really does work this way? Or the issue is that while it writes in a more preferable way, it also becomes more retarded.
>>
>>102865306
Can you go sperg out somewhere else? We might be on to something big here.
>>
tell me what to think about nemotron 70b
>>
>>102865433
it's ok
>>
>>102865400
No. THIS is my sperging space, and I won't have it polluted by begging jeets and their braindead simps. Ignoring mentions of shittunes proliferates newfriends thinking they're fitting in by using them.
>>
>>102865433
It's unique. Its prose is very different from llama / mistral models, closer to claude than anything I can name. It's also pretty smart and has a "deeper perspective" for RP, not sure how else to explain that. I suggest trying it. I really like it.
>>
>>102865448
Flowery prose and more fun to use than largestral, which is surprising coming from a benchmaxxed corpo model. It's actually not that biased towards the assistant personality, having checked the logits with a blank prompt.
>>
>>102865448
leatherjacketman plz
>>
Why does llamacpp seem so broken on Sillytavern for me? When I swipe to generate a new response it almost repeats it back to me verbatim, minus a few changes in words. Using ooba. Doesn't happen with koboldcpp.
>>
>>102865764
Because your settings are fucked
>>
>>102865764
It is not almost. It just repeats itself. Started a few pulls ago. Still not fixed.
>>
I gave Nemotron 70B a try and it's actually not as bad as one would expect from something being shilled here. It does feel like the model has a better understanding of how to act in a RP than most models, although it's certainly slopped.
>>
>>102865796
>it's certainly slopped.
It still has its shivers but I really like its prose otherwise. Not sure how nvidia made 3.1 better at RP than finetuners did.
>>
>>102865777
Is the problem with ooba or sillytavern?
>>102865776
I don't think so. I'm using the same setting as I do with koboldcpp
>>
>>102865818
Maybe they are the only hope we have? They make money making gpus. Their AI division is probably some playground division. Maybe they will actually make a coom model when nobody is looking or cares what they are doing?
>>
And now that I thought about it some more if jewvidia forces buyback into agreements and they know that none of the reputable companies will make a cooming model... it is actually in jewvidia's interest to make a cooming model to increase demand for their products? Will you worship leatherjacket man if he delivers?
>>
>>102865897
The ultimate cope...!
>>
Do you need an equal amount of RAM and VRAM? I have a 4090 and 3090, but I'm considering getting either another 3090 for 72 or an A4000 for 64, but I have only 64GB of RAM.
>>
>>102866088
You don't. Also, the A4000 would bottleneck your speed.
>>
is there a 8-12b model capable of fulfilling my cringe japanese high school romance rp yet?
real coherent and interesting like in VNs?
>>
>>102866127
But don't you need twice the RAM? Would I be able to run a model 8GB larger with the extra 8GB?
>>
>>102866160
>real coherent and interesting like in VNs?
Like the lowest common denominator for both writing and videogames? Sure. The best you can run as a vramlet is mistral nemo or a finetune. Start with the original instruct and test finetunes if it's not enough.
>>
File: claude logo.png (165 KB, 400x240)
https://x.com/atroyn/status/1846935326058827948
>>
File: chatlog (37).png (184 KB, 830x516)
I like Nemotron.
>>
>>102865448
Agree about the relatively unslopped prose, but it's too much dumber than Largestral for me to use.

I don't blame Nvidia for it though, 3.0/3.1 70B both have this weird quirk where they'll give you two good, sensible generations and the third one will be inexplicably completely retarded, with a huge logical error or non sequitur you'd expect from an 8B model. NovelAI's new model (based on 3.0 70B) has the exact same issue. It's a shame Nvidia's tune wasn't able to beat that out of them.
>>
https://www.reddit.com/r/ChatGPT/comments/1g5s4i2/has_science_gone_too_far/
>>
File: file.png (2.36 MB, 1159x1125)
>>102866397
WOULD, YOU HEAR ME ANON??? I WOULD FUCK THAT PHONE
>>
>>102866351
that's a good vonnegut book
>>
Hi all, Drummer here...

>>102865201
>>102865228
v2g is not a merge.

>>102865306
>>102865447
I know who you are. I hope you can find inner peace and develop empathy. It's sad to see someone so unhinged and full of hate. I worry about you.

If you don't have a close friend, or can't afford a therapist, then maybe you could try talking to this model: https://huggingface.co/TheDrummer/Buddy-2B-v1

It'll walk you through your frustrations, and maybe help you discover what's wrong. Try to work on yourself before it's too late.
>>
>>102866466
62 75 79 20 61 6E 20 61 64
>>
>>102866466
tell me about new dawn
>>
>>102862116
>-Models have video game knowledge but struggle with trivia questions
Honestly, if those autists who create and maintain video game wikis didn't exist, current models would be a whole lot dumber when it comes to video game knowledge. Hats off to them.
>>
What are some local models that are helpful for NOT jerking off?
>>
>>102866558
if you want SFW models, then go for Claude or OpenAI?
>>
>>102866473
6E 6F
>>
>>102866466
Hi TheDrummer, why did you mix three instruct formats in one fine tune? It's kind of hard to believe someone did that on purpose.
>>
>>102866675
I think it's a fun idea to have three instruct formats that behave differently and package them in one model. You can switch around the three for different levels of smarts, creativity, and prose.
>>
>>102866795
Haha, what a fun and quirky idea!
>>
Ah, another wild rodeo with lmao.cpp.
Cannot wait until my goofy file downloads and everything just works first try.
>>
>>102866214
>Would I be able to use an 8gb larger model with extra 8gb
You would, but unless you're CPUmaxxxing on DDR5 RAM it's gonna be painfully slow.
Also, if you can go for the 3090, do it; the A4000 is only good if you're low on slots/power. Its memory bandwidth gets totally mogged by 3090s.
>>
File: 1727123885469839.jpg (51 KB, 640x636)
Man, why the FUCK is nemo 12b always trying to get me to eat my own cum?
>>
>>102866558
Qwen2.5 will moralize you to sleep if that's what you're looking for. Codestral is my guy for code completion.
>>
>>102866928
It's a retard.
>>
>>102866928
>>102866972
well have you tried it? maybe its on to something
>>
elon actually delivered https://x.com/SawyerMerritt/status/1846799881597559014
>>
What's the best model you could run locally with 96gb of VRAM? Specifically looking at code assistance.
>>
>>102867146
not local
not a model (it's controlled by a human remotely)
>>
>>102867210
Mistral Large 5.5bpw or Qwen2-72b 8bpw
>>
>>102867211
Local and controlled by an AI model now; humans do human shit, record all motion data, then use it for said AI model training. Teleoperated data is the only way to teach it to do the stuff you want.
>>
I haven't been here since August, what's the new meta?
>>
>>102867389
death
>>
>>102867389
Nothing has changed unless you're a poorfag who runs 20b models
>>
>>102867389
meta is dead.
>>
Nemotroon is sending all the right shivers down my spine, nvidia have done it again.
I think they have the best RP multi-turn dataset in the local sector. Just a touch of anti-sloppa makes the model fucking godlike.
>>
is nemo 12b still best for a poorfag?
>>
nemotron 70b is pretty censored. I think it'll need a jailbreak to get it to be willing to output ERP.
>>
I checked up on https://app.primeintellect.ai/intelligence
Looks like the pace has picked up a bit, so this training run might complete in less than 100 days after all.
>>
>>102867571
Pretty much. Some people will suggest a tune of it, but I have had the most success with the official instruct and just a little wrangling.
>>
What do you guys think about the new changes vedal did to add a bunch of agent features into neuro? https://www.youtube.com/watch?v=qev-dEfuomQ
>>
>>102858904
Thanks I will give it a go when I get up to speed

>>102859266
Yeah, saw about that. I'm gonna ignore the increases though, ha, until there's a large leap somewhere in tech, which feels not far away.

>>102858868
I am also running 2 4090s and I just wanted to load big LLMs for my slower tasks with the extra RAM. I wouldn't recommend the glacial rate of CPU-only compared to GPUs
>>
File: 1728766845557788.png (177 KB, 2394x646)
nvidia's "Sana" https://nvlabs.github.io/Sana/
>>
File: 1714044647085267.png (55 KB, 717x546)
>>102867726
https://arxiv.org/pdf/2410.10629
>>
>>102867664
I don't think about it at all.
>>
>entropix

lemme know if anyone tests it out with examples
>>
>>102867664
I don't know what any of those words mean.
>>
>>102868471
Tried it the other day and it seemed broken. Failed 9.9 vs 9.11 every time except the one time it failed to answer at all. Need two more weeks to bake this nothingburger.
>>
have you guys ever begun to feel your cock stir after she says something barely above a whisper?
>>
>>102867664
I think you should buy an ad, but this seems genuine, so buy a map instead since you're lost
>>
>>102868582
How are you guys not interested in making an AI gf that could do things like that?
>>
Did another character card site get shut down recently because for like the past 3 months chub has had absolutely dogshit quality uploads from pajeets and brown hands. These cards are utter dogshit. It wasn't like it was great before but now it's an epidemic. I was trying with the cards I created and uploaded but they just immediately get drowned out by the flood of uploaded shit.
>>
>>102868536
yeah, that's always shortly before, with a strangled cry, I release what feels like a gallon of cum
>>
Improving Instruction-Following in Language Models through Activation Steering
https://arxiv.org/abs/2410.12877
>The ability to follow instructions is crucial for numerous real-world applications of language models. In pursuit of deeper insights and more powerful capabilities, we derive instruction-specific vector representations from language models and use them to steer models accordingly. These vectors are computed as the difference in activations between inputs with and without instructions, enabling a modular approach to activation steering. We demonstrate how this method can enhance model adherence to constraints such as output format, length, and word inclusion, providing inference-time control over instruction following. Our experiments across four models demonstrate how we can use the activation vectors to guide models to follow constraints even without explicit instructions and to enhance performance when instructions are present. Additionally, we explore the compositionality of activation steering, successfully applying multiple instructions simultaneously. Finally, we demonstrate that steering vectors computed on instruction-tuned models can transfer to improve base models. Our findings demonstrate that activation steering offers a practical and scalable approach for fine-grained control in language generation.
kind of interesting. seems decent at using the vectors to steer response lengths by number of sentences. models tested were small and no code though
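no code, but the core idea is simple enough to sketch. here's a toy version (GPT-2 as a stand-in for the paper's larger models; the layer choice and the single-prompt, last-token difference are my own simplifications, not the paper's recipe):
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in model, not from the paper
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER = 6  # which block to steer; picking this is part of the tuning

@torch.no_grad()
def last_hidden(prompt):
    # hidden state of the last token at the output of block LAYER
    hs = model(**tok(prompt, return_tensors="pt"),
               output_hidden_states=True).hidden_states
    return hs[LAYER + 1][0, -1]  # hs[0] is the embeddings, so block i is hs[i + 1]

# steering vector = activations with the instruction minus without it
vec = last_hidden("Answer in exactly three sentences. Describe a cat.") \
    - last_hidden("Describe a cat.")

def steer(module, inputs, output):
    # add the vector to this block's output at every position
    return (output[0] + vec,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer)
ids = tok("Describe a dog.", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=50, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()  # model behaves normally again
[/code]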
>>
Deepseek Janus support for llama.cpp soon?
>>
File: Untitled.png (2.07 MB, 1080x4094)
SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction
https://arxiv.org/abs/2410.13846
>Recent advancements in large language models (LLMs) have extended their capabilities to handle long contexts. However, increasing the number of model layers and the length of input sequences significantly escalates the memory required to store key-value (KV) cache, posing challenges for efficient inference. To mitigate this issue, we present SimLayerKV, a simple yet effective method that reduces inter-layer KV cache redundancies by selectively dropping cache in identified lazy layers. Our approach is based on the observation that certain layers in long-context LLMs exhibit "lazy" behavior, contributing less to modeling long-range dependencies compared to non-lazy layers. By analyzing attention weight patterns, we find that the behavior of these lazy layers is consistent across tokens during generation for a given input. This insight motivates our SimLayerKV, which identifies lazy layers and reduces their KV cache accordingly. SimLayerKV is training-free, generalizable, and can be implemented with only seven lines of code. We conduct extensive experiments on three representative LLMs, e.g., LLaMA2-7B, LLaMA3-8B, and Mistral-7B across 16 tasks from the LongBench benchmark. The results demonstrate that SimLayerKV achieves a KV cache compression ratio of 5× with only a 1.2% performance drop when combined with 4-bit quantization.
https://github.com/sail-sg/SimLayerKV
looks pretty simple to use yet still pretty effective. neat
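the gist, paraphrased (their actual seven lines are in the linked repo; the sink/window sizes and threshold here are made-up placeholders):
[code]
import torch

def is_lazy(attn, sink=4, recent=1024, thresh=0.9):
    # attn: (heads, q_len, k_len) attention weights for one layer.
    # "Lazy" = the newest query's attention mass sits almost entirely
    # on the first few "sink" tokens plus the recent window.
    last = attn[:, -1, :]
    mass = last[:, :sink].sum(-1) + last[:, -recent:].sum(-1)
    return (mass.mean() > thresh).item()

def trim_kv(past_key_values, lazy, sink=4, recent=1024):
    # drop the middle of the KV cache for layers flagged as lazy
    out = []
    for (k, v), flag in zip(past_key_values, lazy):
        if flag and k.shape[2] > sink + recent:  # k: (batch, heads, seq, head_dim)
            k = torch.cat([k[:, :, :sink], k[:, :, -recent:]], dim=2)
            v = torch.cat([v[:, :, :sink], v[:, :, -recent:]], dim=2)
        out.append((k, v))
    return tuple(out)
[/code]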
>>
>>102868969
Probably never honestly.
>>
File: Untitled.png (402 KB, 1080x1223)
A Little Human Data Goes A Long Way
https://arxiv.org/abs/2410.13098
>Faced with an expensive human annotation process, creators of NLP systems increasingly turn to synthetic data generation. While this method shows promise, the extent to which synthetic data can replace human annotation is poorly understood. We investigate the use of synthetic data in Fact Verification (FV) and Question Answering (QA) by studying the effects of incrementally replacing human generated data with synthetic points on eight diverse datasets. Strikingly, replacing up to 90% of the training data only marginally decreases performance, but replacing the final 10% leads to severe declines. We find that models trained on purely synthetic data can be reliably improved by including as few as 125 human generated data points. We show that matching the performance gain of just a little additional human data (only 200 points) requires an order of magnitude more synthetic data and estimate price ratios at which human annotation would be a more cost-effective solution. Our results suggest that even when human annotation at scale is infeasible, there is great value to having a small proportion of the dataset being human generated.
https://github.com/DhananjayAshok/LittleHumanData
looks like Miku will have a reason to keep us around
>>
https://huggingface.co/deepseek-ai/Janus-1.3B
>>
>>102869111
Huge version soon?
>>
>>102869128
>236b image generating multimodal
god I wish
>>
Quamba: A Post-Training Quantization Recipe for Selective State Space Models
https://arxiv.org/abs/2410.13229
https://github.com/enyac-group/Quamba
no code yet. doesn't mention mamba 2 so not sure if the architectural changes render this nonfunctional for it. works for jamba. eh
>>
>>102869111
gguf?
>>
File: 1717027853525879.png (452 KB, 850x611)
>>102869101
Synthetic data is not sufficiently diverse, that's all there is to it.
>>
>>102869424
I feel like it could be if only they'd use "unsafe" models to generate it.
>>
We use human data to train models to generate fake data to train models. They should work on making them better without this.
>>
>>102869424
Do you know what subject the red cluster on the right is? Or where did you get this from?
>>
>>102869479
Loli fiction
>>
>>102869479
It's an old paper https://www.researchgate.net/publication/370228047_SocialDial_A_Benchmark_for_Socially-Aware_Dialogue_Systems the model was GPT3, but nothing really changed
>>
File: whoveryniceofthem.png (79 KB, 428x371)
79 KB
79 KB PNG
>>102869492
Thanks for the link. I'll read it in more detail later.
>>
>>102869628
they used the ai to write the ethical considerations boilerplate...
>>
>ooba is deeply fucked again
it's all so tiring
>>
Am I allowed to install lama?
>>
>>102870871
>using gradio shitware
you get what you deserve
>>
File: 1720533839023564.jpg (55 KB, 500x500)
is 15 lines too long for a system prompt, or is it a skill issue
>>
>>102871090
if your sysprompt is fewer than 10k tokens you have very simple needs
>>
>>102871090
Being able to express yourself without rambling is a skill.
>>
>>102871090
>counting in lines instead of tokens
It's already over
>>
>>102871228
It's simple math, anon. Assuming they are full lines, at roughly 45-50 tokens per line, 15 lines is about 700 tokens.
>>
>>102869424
But the models trained on synth data will be very good at whatever the red sector does
>>
>>102871323
No, that's why full synthetic data is tanking the accuracy here >>102869101
>>
File: ComfyUI_05091_.png (267 KB, 1024x1024)
>>102870886
>>
>>102870886
You can have a little bit llama
>>
GOOD MORNING SIRS
qrd on ministral 8b? does it have SOVL or do i have to stay with nemo finetunes for now?
>>
>>102871144
I mostly just do a lot of stripping and raping, so I guess 2k tokens. It's still a bit of a chore getting it to describe their genitals without also describing their hard nipples under a full burka, though.
>>
>>102871708
When it's good... it's really fucking good. But it can also be a bit hit and miss at times.
>>
>>102871828
so just like every other model?
>>
>>102872145
Nobody asked you.
>>
>>102872173
NTA but I like to hear what he has to say.
>>
>>102872206
Literally all xhe does is flail around bemoaning literally anything other people enjoy. You can replicate that at home by being an abject failure of a human being.
>>
>>102872223
How do you identify xir?
>>
>>102871708
I think users should check it out once an official Transformers version is uploaded on HuggingFace. The one MistralAI recommends running with vLLM seems broken in various ways.
>>
>>102872230
Intuition. They can identify you across threads though because they're one of the mods. They've accidentally let this ability slip in the past. I guess the admins reined their ass in. But in the earlier days of the threads you'd get a 3 day vacation any time you gave them any friction back.
>>
>>102872242
It was just a simple comment about how models are very unpredictable, which they are.
>>
>llama3 is completely filtered of bad thoughts
>facebook is still doing mass censorship despite what zucc said
>lecun lost his mind over musk and is gradually being exposed as a hack
False prophets
>>
>>102872458
It's still surprising to me how crazy Musk makes some people.
>>
>>102871708
Finicky, copies formatting rigidly, doesn't follow/understand formatting instructions. Same as Nemo, I guess?
>>
Nemotron 70B is so good, I feel like crying
How long will we be dependent on corpos releasing kino instead of doing it ourselves?
>>
>>102872746
Is there any uncensored finetune out yet?
>>
>>102872787
How does one overcome skill issue via finetuning?
>>
>>102872787
>finetune of a finetune
get a load of this guy
>>
>>102872787
There are no uncensored fine-tunes, just horny sloptunes.
>>
>>102872857
Is there any horny version of Nemotron then?
>>
>>102872863
not yet
>>
Unironically gonna buy a second gpu for nemotron.
Fuck you Jensen, you double nigger, you got me.
>>
What do you even run it on? My ooba completely dieded.
>>
>>102873087
Yours complains about dependency?
I had an issue after updating yesterday, I just updated it again and it started working.
>>
>>102873087
Why are you unironically using ooba, that's like admitting to being a llm boomer
>>
https://huggingface.co/deepseek-ai/Janus-1.3B
https://github.com/microsoft/BitNet
>>
>>102873151
Shieet, first nvidia releases sota and now bitconect comes out?
Back bros, we're so back.
>>
>>102873119
How do I start using ooba ironically?
>>
File: 1727972056445148.png (8 KB, 424x133)
>>102873151
it's over...
>>
>>102873169
Hey hey heyyyy
>>
>>102873151
>Furthermore, bitnet.cpp can run a 100B BitNet b1.58 model on a single CPU, achieving speeds comparable to human reading (5-7 tokens per second),
Alright, okay.
>>
>>102873119
>Running Kobold
Go suck henk's dick on the 'cord you faggot
>>
>>102873151
>bitnet.cpp
Okay that's nice where model?
>>
File: 1713562416948943.png (37 KB, 825x433)
>>102873238
>>
>>102873257
Where usable models?
>>
>>102873151
Damn, this means that bitnet really does work, they have the models internally, but for some reason they are not willing to release them. Very sus.
>>
>>102873151
Bitnet bros we're so fucking back
>>
Didn't llama.cpp already have (early?) bitnet support?
I think it was based on the code to run ternary quants.
>>
Were the schizoposters right? Nvidia is literally forcing everyone to hold back release of serious bitnet models?
>>
>>102873270
Because it would tank leatherman's njudea stocks very bad
>>
>>102873365
Seems plausible, better deals on hardware if you only focus on GPU inference
>>
>>102873365
China's Chip revolution despite US sanctions will force the gates open! Cheap GPUs for all!
I trust and believe!
>>
>>102873435
Or they'll just turn into greedy cunts as soon as they have a product breakthrough
>>
>>102873435
>despite
caused by
>>
>>102873365
>>
>>102866352
holy sovl

do you have any more chat logs?
>>
>>102873270
>>102873365
>The tested models are dummy setups used in a research context to demonstrate the inference performance of bitnet.cpp.
>>
>>102873151
NOTHINGSISTERS NOT LIKE THIS AIEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
>>
>>102873687
see >>102873640
>>
>>102873388
>better deals
More like delaying GPU orders for companies that refuse to comply with leatherman's demands.
>>
>>102873622
Everybody was saying that bitnet isn't worth it because it costs the same to train as f16, but that's bullshit because it would save millions in actually serving and running the model. JudeoVidya strikes again...
>>
>>102873745
Exactly. Once trained, the brunt of the cost is in actually running the thing.
Plus, I have my doubts regarding that claim to begin with.
>>
It seems Microsoft was the first company to realize the best way to get local users to fuck off was to give them good models that can be run easily so that they shut up and lose interest in the scene
>>
>>102873755
The training claim holds some water because we don't have chips with ternary operations yet. Everything would need to be done in software first. But I think that once the initial work was put in, it would've paid off massively. Hell, maybe all the major companies already have their own implementations.
>>
big bitnet models when
>>
>>102873858
When companies can tell Jewidia to fuck off.

I've seen some people say Nvidia would benefit from Bitnet, and that's correct, but if there were a bunch of good big bitnet models out there it would tank consumer interest in GPUs and API services
>>
Nemotron is good but ultracucked.
>>
>>102873858
I'm looking forward to seeing a big BitNet MoE model that can run fast on CPUs. Then NVidia can fuck off.
>>
>>102873914
This. If everybody could easily run 100B+ at home, nobody would be interested in cloud models anymore. So even big corpos won't shoot themselves in the foot by releasing bitnet models. You will own nothing.
>>
>>102873960
I'm sure somebody will do it eventually. Despite Sama's best attempts, AI is still a pretty competitive space where companies have a reason to undercut each other. But it'll take a while
>>
meta research dump
https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-lingua/
>Meta Spirit LM: An open source language model for seamless speech and text integration.
>Meta Segment Anything Model 2.1: An updated checkpoint with improved results on visually similar objects, small objects and occlusion handling. Plus a new developer suite to make it easier for developers to build with SAM 2.
>Layer Skip: Inference code and fine-tuned checkpoints demonstrating a new method for enhancing LLM performance.
>SALSA: New code to enable researchers to benchmark AI-based attacks in support of validating security for post-quantum cryptography.
>Meta Lingua: A lightweight and self-contained codebase designed to train language models at scale.
>Meta Open Materials: New open source models and the largest dataset of its kind to accelerate AI-driven discovery of new inorganic materials.
>MEXMA: A new research paper and code for our novel pre-trained cross-lingual sentence encoder with coverage across 80 languages.
>Self-Taught Evaluator: a new method for generating synthetic preference data to train reward models without relying on human annotations.
>>
>>102873858
two more miku weeku *pees on your router*
>>
>>102874089
a full load of nothingburgers
>>
File: 1729268279039.jpg (40 KB, 384x384)
>>102873858
Here is your Janus bitnet Miku.
>>
Microsoft is trying to grab the inference engine monopoly with this bitnet.cpp thing. Llama.cpp devs better add their own support, or they gonna get demolished.
>>
>>102874155
>Implying we're gonna get big Bitnet models anytime soon
>>
What could 64GB of RAM run with bitnet?
>>
>>102874155
They are comparing performance TO llama.cpp; it has already had an implementation for months, just not as fleshed out and fast (on cpu).
>>
>>102874175
The GitHub shows they ran a 100B model on 64GB of RAM
>>
>>102874205
I can do that slowly at IQ3XXS now
>>
There has to be more news than this.
>>
>>102874222
*Not 100B, Largestral
>>
>>102874222
Bitnet matches FP16
>>
File: ED.jpg (435 KB, 2125x1411)
>>102866558
>helpful for NOT jerking off?
All of them.
>>
>>102874089
Now watch llama4 be another ScaleAI provided, benchmaxxed model that's barely better than the original GPT4 (on the bench)
>>
>>102874254
So I won't be able to run bigger models, but what I can run will be at usable speed and higher quality?
Can bitnets be quanted?
>>
>>102872746
>Nemotron 70B is so good, I feel like crying
What did it do specifically? Or are you doing what I did a while back with dark Miqu and you are shilling the model you didn't even download just to fuck with people here?
>>
>>102872488
I am mad a scammer is just walking around in the open and people still pay him money instead of lynching him.
>>
>>102874290
AFAIK no. But yeah basically you can fit FP16 level intelligence and achieve the same speeds you do right now
>>
Imagine being an /lmg/ newfag and hearing all the worship of bitnet but not knowing what people are even talking about.
>>
What are the odds that the chinks will make a 100B bitnet model?
>>
>>102874530
They will release a 100b bitnet model once some other company releases their own 100b bitnet model, as it's always been.
>>
>>102874390
No honor in being /lmg/ oldfag.
Imagine being proud of going through 3 generations of llama models. People will make fun of you.
>>
>>102874612
I got in after l2 launch. It was finally good enough to fool me into it being worth it. It wasn't.
>>
I was here since Pyggy. I was spending hours just to generate logs on CAI for pygmalion lol
>>
>>102874612
I remember trying to run BLOOM on my laptop.
Grim time
>>
File: bitnala2.png (107 KB, 1040x360)
So I've been trying to simulate a nala test with the bitnet inferencing thing...
My base model prompting is a bit rusty. But yeah, I feel like there are more underlying issues than that.
The only sampler it lets you control is temp. And it just repeats the same few tokens over and over again if it's too low, otherwise this is at t=1.2
But here's the world's first published bitnet Nala Test (on the Llama-3-8B one)
I assume the magical quantization process they used basically fucked up the model outside of the evals. I might try with one of the proper bitnet from scratch test models in the future but I have to go to work now.
>>
>>102874688
>I assume the magical quantization process they used basically fucked up the model outside of the evals
Yeah, probably.
What I want is to see a bitnet model trained from scratch, not a quantization scheme.
>>
>>102874688
Also inferencing speed is 100% thread bound with this. like 100%. t=physical cores is fastest.
But there's no way in god's green earth anything shy of a 196 core epyc CPU is going to get 7 token/sec on a fucking 100B model like they claim. Expect like 0.1 token/sec on your desktop 6 core.
>>
>>102874688
It's not really the same as quantization, notice that it's only trained on 100b tokens
>>
>>102874688
History is being made.
I'm so glad to see this monumental work done by us, a new era of AI begins now.

Honorable mention to Microsoft for throwing a couple of scrips together.
>>
>>102874747
>He didn't even check the graphs
>100B benchmark on an Intel CPU with 6 P cores got 1.70t/s

Brainlet-kun I...
>>
>>102874688
This feels like GPT3.5 turbo when I tried it when the API first came out. Repetition out the ass after the first message. WAGMI
>>
>>102874847
That Llama 8B model was just a conversion. The guys who made it said it matched Llama 1 7B in intelligence. Proper Bitnet's supposed to be equal to or better than FP16
>>
>>102874832
I like your sarcasm. I hope all of the faggots here are just pretending to be retarded at this nothingburger.
>>
>>102874957
What will the next cope be when the 100B model drops?
>>
>>102874989
The 100B model sends shivers down my spine.
>>
>>102875002
You're right Anon, every technological advancement is a nothingburger when compared to your sheer inability to prompt and ban tokens
>>
>>102874989
When a 100B model drops I will be happy. Making a framework for bitnet and quantizing existing models into it means nothing, because that is not the reason bitnet makes sense. It is something you would do if you want investment money from a retard who doesn't know anything about computers. 1B toy models also mean nothing. This is basically building a railroad network before trains are invented. Nice to have but worthless for now.
>>
>>102875023
I can't ban tokens in open webui.
>>
>>102875041
You're right in that quantizing existing models is fucking worthless, but undeniable confirmation that it works as expected is pretty huge.

I don't expect any company in dealings with Jewidia to drop a 100B model but Chinks will come up with something soon
>>
>>102875065
>I don't expect any company in dealings with Jewidia to drop a 100B
Models expand to fit resources. If they start experimenting with proper bitnet models and it works well enough, they'll just make 5-10T param models and release some "small" 100B parameter models for the masses.
Chinks haven't come up with anything since fireworks.
>>
Qwen said they were looking into bitnet.
They have the compute to train a whole range of models from 1B to 100B, and could probably train an 8B bitnet in a few days.
So why isn't there one?
They are controlled.
>>
>>102875112
There are definitely 5-10T param models in the works now but

>Release 100B textgen model publicly
>Customer interest in AI and API services tanks

It's like hanging yourself with one hand and shooting yourself in the dick with the other hand. You're souring your relationships with Nvidia and hurting your own business model
>>
>>102873151
>thing.cpp
>it's actually Python
>>
File: 1707856899638855.png (618 KB, 1206x880)
>>
>>102875139
>Customer interest in AI and API services tanks
Remote models will always be faster and bigger. They have the hardware to run ridiculous models. Most normies I know AFK have a shitty 10 year old laptop with 4-8GB RAM. The ones that upgraded have an entire 16GB of VRAM and like fwaaaaa 32GB RAM... And none of them know what a 'github' is. Fuck, we have retards here every day that don't know what a python venv is or cannot process errors on their terminals when they try to load models with huge contexts.
Don't underestimate the allure of convenience for normies. We are not the average.
>>
>>102874327
No, I'm using it for real. This model is definitely something unique. What makes me say this is that the model seems to actually understand the context, it knows what "teasing" means, it doesn't jump on your dick when a character is just teasing. It also seems to be able to pull things from the character card to make the messages more interesting, like, in my character description there's a line that says "she has a c-cup chest and doesn't wear a bra", and in one random message the LLM wrote "She rolls onto her back, still grinning, and stretches, arching her back in a languid motion, which, given her lack of a bra, momentarily draws attention to her C-cup breasts".
This is so different from the usual LLM slop I get from models like Largestral, it's very refreshing.
>>
>>102875221
This. The average masses will only be interested in local models if they can download a 100B app from the playstore on their phone.
>>
>>102875215
What is this schizo shit
>>
Ministral Large bitnet
>>
>>102875215
wasting 1b on grok 3 is such a fucking waste holy shit. elon should've bought more dogecoins....
>>
>>102874634
I'm curious what people saw in Llama 2 back in the day. I couldn't see a difference between L1 tunes and L2 tunes no matter how hard I tried, apart from the obvious thing (context length)
L3 has some issues too, but at least it's obviously a step up in intelligence
>>
>>102875387
Llama 2 7B was better than llama 1 13B
>>
>>102875417
Llama 3 8B, sure. Llama 2 7B? Get outta here.
>>
>>102875451
Llama 1 was very bad anon. Although, I guess I should've said "as good as", not better.
>>
File: 1729242567362067.png (462 KB, 512x768)
I do not want llama, nor mistral. I want mikusex.
>>
>https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-lingua/
>SAM 2.1
>Meta Spirit LM (Speech2Speech aka local GPT-4o)
>Meta Open Materials 2024
>Self-Taught Evaluator
merry early christmas
>>
>>102875631
Not open weights. go back, buy an ad
>>
>>102875651
>Not open weights
It literally is. Click the link.
>>
>>102875631
>speech2speech scores 40% on MMLU
monkey paw curls once more
>>102875651
it is you mongoloid
>>
File: spirit-lm-training.png (34 KB, 762x240)
>>102875694
>speech2speech scores 40% on MMLU
to be fair it was trained on a pitiful amount of data
>>
>>102875631
>We released the model trained with direct preference optimization, which is a strong generative reward model on RewardBench, despite not using any human annotation in training data creation. It outperforms bigger models or using human-annotated labels, e.g. GPT-4, Llama-3.1-405B-Instruct, and Gemini-Pro. The model is also available as an evaluator on the AlpacaEval leaderboard, as one of the top-ranked evaluators in terms of human agreement rate while being around 7x to 10x faster than the default GPT-4 evaluator.
Big
>>
>>102875854
Isn't it for training only?
>>
>>102875545
>>
>>102875631
chatgpt, summarize what this says
>>
File: 1724295535744932.png (22 KB, 656x163)
All the layerskip models from Meta are just their old models that had some continued pretraining done to them. If we can pool together a couple dozen thousand we can get layerskip mistral large.
>>
>>102876085
This space really moves too fast, I have no fucking clue what's going on anymore.
My mind is still stuck somewhere on dynamic temperature.
>>
>>102876085
Doing a continued pre-training of Largestral wouldn't be very easy or cheap...
>>
>>102876121
Not fast enough!
>>
>>102876121
Why is your mind stuck on a meme sampler that wasn't a significant point in the LLM story, like, at all? So much that it was quickly forgotten?
>>
>>102876253
Maybe that's the exact point when I got brain damage from placebo overdose.
>>
>>102875631
The only thing there that is actually a production release is SAM 2.1. The other stuff is mostly just pure research artifacts that aren't for end users.
>>
Layer Skip will save LLMs. A model can now make use of speculative decoding using its internal layers. This can be added to any existing model. All we have to do is to figure out how to do this via finetuning and inference speeds will almost double.
We haven't been so back since 2023
>>
Does speculative decoding reduce quality? Is it just guessing what's next?
>>
>>102876489
Why not use a smaller, pretrained model instead?
Llama.cpp can do that already, in fact.
>>
>>102876500
>Does speculative decoding reduce quality?
No.
>Is it just guessing what's next?
Yes.
If the smaller model guesses wrong, it just slows down generation.
>>
>>102876500
It's guessing what's next and if it's wrong it discards the guess. No impact on quality, but it can increase speed.
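The whole trick fits in a few lines. Here's a greedy sketch; the model names are stand-ins that happen to share a vocab (which is a requirement), and real implementations use rejection sampling so it stays lossless when you sample too:
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                  # shared vocab is required
draft = AutoModelForCausalLM.from_pretrained("distilgpt2")   # small, fast guesser
target = AutoModelForCausalLM.from_pretrained("gpt2-large")  # big model whose output we keep

@torch.no_grad()
def speculative_step(ids, k=4):
    n_in = ids.shape[1]
    # 1. draft model guesses k tokens cheaply
    guess = draft.generate(ids, max_new_tokens=k, do_sample=False)
    # 2. target model scores all the guesses in ONE forward pass
    logits = target(guess).logits
    # 3. keep guesses while the target agrees, discard the rest
    n = 0
    while n < k and logits[0, n_in + n - 1].argmax() == guess[0, n_in + n]:
        n += 1
    # the target's own next token comes free, so we always gain >= 1 token
    next_tok = logits[0, n_in + n - 1].argmax().view(1, 1)
    return torch.cat([guess[:, :n_in + n], next_tok], dim=1)

ids = tok("The quick brown fox", return_tensors="pt").input_ids
for _ in range(8):
    ids = speculative_step(ids)
print(tok.decode(ids[0]))
[/code]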
>>
File: 1707269706041483.png (718 KB, 1656x581)
https://x.com/doomslide/status/1847344776376365065
>>
>>102876583
>>102876583
>>102876583
>>
>>102876501
Maybe when a smaller model with the same vocab doesn't exist; doing this could be a cheap solution there. Might also be interesting to have a small model use this technique to be even faster.
>>
>he fell for the bitnet meme



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.