/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102011438 & >>102001133

►News
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: img_1.jpg (324 KB, 1360x768)
►Recent Highlights from the Previous Thread: >>102011438

(1/2)

--Paper: Minitron Approach for compressing LLMs using pruning and distillation: >>102019300 >>102021940
--Papers: >>102019494 >>102019215
--Jamba 1.5 Mini (12B active/52B total) and Jamba 1.5 Large (94B active/398B total) released: >>102025061 >>102025083 >>102025105 >>102025135
--Planning a collaborative storytelling session with Mikupad and Llama.cpp: >>102017625 >>102018191 >>102018413 >>102018638
--Phi-3-medium-128k-instruct-onnx-cpu runs fast on CPU, GGUF q8 quant available: >>102012668 >>102012876
--Ollama struggles with extracting user credentials due to formatting issues: >>102018067 >>102018168 >>102021202
--MoE and sparse architectures discussion: >>102012838 >>102012877 >>102012959 >>102013545 >>102013637 >>102013760
--Gemma 2 2b control vector experiments and results: >>102019052 >>102019178 >>102019506 >>102019204 >>102019219 >>102019601
--Anons discuss and share terminal-based chat projects and ideas: >>102018061 >>102018212 >>102018449 >>102018224 >>102018319 >>102018484 >>102018432
--Anon fixes RAG issue by unchecking "Summarize Chat messages when sending" toggle: >>102022620 >>102022875 >>102023118 >>102023544
--Anon discusses the difficulties of creating a local model that can handle both normal and smutty content without being overly horny or dry, and how current models like Claude struggle with this: >>102012011 >>102012374 >>102012459 >>102012619 >>102014256 >>102014863
--Anon considers making alternative to SillyTavern, seeks feedback on features: >>102023701 >>102023763 >>102023775 >>102023788 >>102023833 >>102023843 >>102023928

►Recent Highlight Posts from the Previous Thread: >>102011588
>>
File: img_14.jpg (301 KB, 1360x768)
►Recent Highlights from the Previous Thread: >>102011438

(2/2)

--Magnum-123B has perspective switching issues, unlike Mistral-Large-Instruct: >>102018999 >>102019038 >>102019185 >>102019331 >>102019491
--Anon seeks help with prompt to make AI respond concisely: >>102012681 >>102012743 >>102013429
--Anon runs 8B LLM on gaming PC, discusses societal implications: >>102018513 >>102018662 >>102018592 >>102021742
--Phi-3-medium-128k struggles with adult roleplay content: >>102015287 >>102015718 >>102015826
--Meta-Llama-3.1-70B-Instruct has limitations and may not live up to expectations: >>102021345 >>102021449 >>102021515 >>102021502 >>102021639 >>102021646
--Hermes 405b model struggles with asterisk quotation mark mix-ups: >>102021297
--Anon thinks diffusion-guided LLMs are necessary to avoid hallucination and misalignment: >>102014643
--Anon discusses The Living AI Dataset and its potential to create a sentient AI model with empathy and love: >>102022143 >>102022218 >>102022241 >>102022256 >>102022310 >>102022318
--Miku (free space): >>102013020 >>102013180 >>102013618 >>102013630 >>102013793 >>102013946 >>102014401 >>102014423 >>102016872 >>102018287 >>102020209

►Recent Highlight Posts from the Previous Thread: >>102011588
>>
Jambalove
>>
>>102025568
I love Miku and I love you Anon.
>>
I'M THINKING
MIKU
>MIKU
OO EE OO
>>
Working on a new model, my friends. It's very much a work in progress. Here's a log:

https://files.catbox.moe/1tg4k2.txt

It's a little long, so feel free to skim it. This is based on llama 3.1 8B btw.

There were some rerolls and some minor edits, but I mostly kept things as is. For example, the model got into the habit of writing "Oh boy," as the starting phrase every single time, so I had to edit that out.
>>
Interesting, with proof from RULER it seems like jamba finally fixed the context issue. Hopefully we get compatibility soon.

https://www.ai21.com/blog/long-context-yoav-shoham
>>
>>102025941
>Hopefully we get compatibility soon.
We still don't have Jamba 1.0 compatibility.
>>
>>102025941
>yoav-shoham
>>
What's this about onnx on cpu? Will I be able to run Mistral Large above 0.6t/s with it?
>>
>>102025941
Fascinating how on this they don't mention llama3 at all. Phi, Mistral, Command R/+ but no mention of llama3.
>>
How can one detect degradation of Q2 vs Q8 quants?
As in, what's going to be retarded in Q2 that isn't going to be retarded in Q8?

Being a 24gb vramlet, I'm just trying to understand if it's better to use a bigger model at lower quants, or use a smaller model at q8.
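If you want to put a number on it, the usual trick is to run the same perplexity test over both quants and compare. A rough sketch, assuming you've built llama.cpp's llama-perplexity tool and have a wikitext-style test file around (the filenames here are placeholders):

import subprocess

# Placeholder filenames - point these at your own quants and test text.
models = ["model-Q2_K.gguf", "model-Q8_0.gguf"]
test_file = "wiki.test.raw"

for m in models:
    print(f"=== {m} ===")
    # llama-perplexity streams a running PPL estimate and a final value;
    # lower PPL on the same text = less quantization damage.
    subprocess.run(["./llama-perplexity", "-m", m, "-f", test_file, "-c", "2048"], check=True)

Lower perplexity on the same text means less quant damage; whether big-model-at-Q2 beats small-model-at-Q8 varies per model pair, so test your actual candidates.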
>>
>>102026020
?

https://www.ai21.com/blog/announcing-jamba-model-family

https://www.ai21.com/blog/long-context-yoav-shoham

They do on all the benchmarks
>>
What's the proper ST context template for Qwen2?
>>
>>102026107
Look at the images in the "long context" post, none show llama3
>>
>>102026132
chatml
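For reference, ChatML just wraps every turn in <|im_start|>/<|im_end|> markers, so if you want to sanity-check what ST ends up sending, the format looks roughly like this (the system text is only an example):

def chatml(messages):
    # messages: list of {"role": "system" | "user" | "assistant", "content": str}
    out = "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages)
    return out + "<|im_start|>assistant\n"   # leave an open assistant turn for the model to complete

print(chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi"},
]))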
>>
>>102026171
Thank you.
>>
>>102026107
Oh, I'm retarded and hadn't noticed this wasn't related to jamba 1.5...
>June 26, 2024
>>
https://github.com/exo-explore/exo
>>
So now that the dust has settled, did jamba 1.5 save the hobby?
>>
>>102026265
Would be cheaper and almost certainly get you more t/s to just buy more RAM.
>>
After extensive use of mixtral I've come to the conclusion that it's absolute dogshit. The only advantage it has over a 70b quant of comparable size is speed.
>>
jamba large seems underwhelming for its size, I don't know why companies go all in training these behemoths
it's a massive waste to train such a large model if you haven't produced a top tier small model as a proof of concept
>>
>>102025941
Very cool, where is the gguf?
>>
>>102026278
>>102026286

Yes!
>>
I like the new Jamba mini's writing style.
>>
>>102026314
What did you run it on? Any logs?
>>
>>102026332
just azure, I don't think anything supports it yet.
>>
>>102026286
that's wild. 8 retarded 7b models smashed together in one isn't better than a 70b? who would've guessed!
>>
>>102026389
thats not how moes work
>>
>>102026401
Don't feed the troll, anon.
>>
>>102026389
Last time I read about it anons were saying it's the greatest thing since sliced bread. It does kinda feel like 7b now that i think about it.
>>
>>102026430
That was one anon being obsessed with Mixtral for some reason.
>>
>>102026430
It was very good for its time, before miqu and such.
>>
>>102026457
Any l2 70b shitmix wipes the floor with it.
>>
Jamba verdict?
>>
>>102026502
meme
>>
>>102026447
cuz he's a poor retard.
>>
>>102026502
No one can even run it yet. And the API has censored inputs.
>>
>Jamba 1.5 Large (94B active/398B total)
oh come on, i can run 120B largestral fine at q4 but they couldn't make a medium-sized model for this? it's either the cucked 54b or a ridiculous 400b?
i'll pass
>>
>>102026780
bro they don't make models with you in mind.
>>
>>102026780
Bwo? Just have 4 4090s for the active portion and 500 exabytes of ram for the unused experts?
>>
File: 1707090112465128.jpg (24 KB, 635x601)
>54B models are now considered "mini"
>>
Okay, I just tried both Jambas for translation, and... They both are dog shit. It's not even funny, Large is worse than 70B and it has 300B+ parameters.
>>
>>102026852
>for the unused experts
Unfortunately this is not how it works in practice. For real use, all experts are basically used. The active parameters thing is just about how many are active per the processing of each token, not per prompt.
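A toy sketch of top-k routing to illustrate, with made-up sizes: each token only runs through k experts, but across a whole prompt the router ends up touching nearly all of them, so every expert's weights still have to be resident.

import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, n_tokens, d_model = 16, 2, 64, 32

router = rng.normal(size=(d_model, n_experts))   # router/gating projection
tokens = rng.normal(size=(n_tokens, d_model))    # one hidden state per token position

used = set()
for x in tokens:
    logits = x @ router                      # score every expert for this token
    picked = np.argsort(logits)[-top_k:]     # keep only the top-k ("active") experts
    used.update(int(i) for i in picked)
    # only these k experts' FFN weights get multiplied for this token

print(f"{top_k} experts active per token, but "
      f"{len(used)}/{n_experts} experts were needed across a {n_tokens}-token prompt")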
>>
>>102026942
>Multilingual: In addition to English, the models support Spanish, French, Portuguese, Italian, Dutch, German, Arabic and Hebrew
You tested Japanese right? They don't support that it seems.
>>
https://rentry.org/magnum-v2-4b
>>
>>102026996
That may be the case, however, many models claim to have no support for Japanese, but most big models do somewhat well on it anyway.
>>
smedrins
>>
File: ihavelehardware.png (101 KB, 756x838)
>>102027071
the compute, bro... it's too expensive bro... donate pls
>>
>>102027162
go back
>>
>>102027199
No, you go away. I'm sick and tired of no friend-having losers like yourself trying to keep everyone at your level instead of encouraging people to connect with each other.
>>
>>102027239
projection???
>>
>>102026931
>Original mini was phi at 3.8 billion parameters
>OpenAI then names their 82 percent MMLU model mini
>Then Elon names his 85 percent MMLU model mini
>Now Jamba mini
Yikes
>>
>>102027239
This is why people give up and hand shit over to discord, since at least there's a way of dealing with trolls and shitposters.
>>
>>102027239
>Why won't anons test the models for me? I can't waste time with them!
>t. alpindale

Agreed, go back to discord, you have enough testers there already.
>>
>>102027750
I wanna test shit too, is discord open?
>>
>>102026389
>>102026286
Does Mixtral have any good finetunes?

I just use https://huggingface.co/TheBloke/dolphin-2.7-mixtral-8x7b-GGUF/tree/main
>>
File: 1718726413358418.jpg (38 KB, 570x744)
>virtually EVERY SINGLE GAME i play with the ai is either 1) ww2 natsoc larp where i join the reich and help win the war or 2) some other larp about my character joining the fascists and destroying the opposition
why can't i just do normal boring coomshit?
what's wrong with me bros?
>>
>>102027821
starling 7b
>>
>>102027828
>what's wrong with me bros?
Modern society failed to diminish your innate desire to conquer and build.
>>
>>102027828
it's because you're 16 years old
>>
>>102027828
based. sex is boring.
>>
>>102027828
do one about a russian girl castrating you
>>
In an effort to improve my cooming quality I finally sat down and started pasting a hentai game script into the window as a prefill. I made sure to use something that uses almost only dialogue with some minimalistic matter of fact descriptions for actions.

I made it like 8k tokens of prefill and then I started actually using it. What shocked me is that after I was done it felt like I had generated another 8-12k tokens, but it turns out the whole AI-generated part was only 4k tokens. Now I am starting to think that leaving the model to generate shit from the start is a terrible thing, because it will just not stop itself from describing the iridescent radiance of the particular peculiar gleam of her one iris in her eye. The second thing I noticed is that even with an 8k prefill those fuckers still want to stuff that novel-type purple prose everywhere they can, even though longer multi-sentence descriptions weren't there in the prefill.

TL;DR: all models still fundamentally suck for cooming.
>>
can you cpumaxx the jambaree? the tripfag did 8 bit llama 405b so the official 8bit quant would fit in ram with about the same size right? or is it exclusively gpu only?
>>
>>102027947
>If you don't have access to a GPU, you can also load and run Jamba 1.5 Large on a CPU. Note this will result in poor inference performance.
https://huggingface.co/ai21labs/AI21-Jamba-1.5-Large#model-features
>>
>>102028011
interesting, in theory it could be around 4x the speed of 405b, might cross the usability threshold
>>
>mini
>52B
Now that is a mini I can get behind to fuck.
>>
File: 1701307139265892.jpg (150 KB, 432x2048)
wtf I want to enjoy the schizokino of the 405B base model
>>
>>102027908
What model? Was the prefill formatted as required by the model's prompting specifications or just copy-pasted as a lore entry? I often found that simply copy-pasting a long conversation (from a novel, fiction, etc) as-is doesn't work as it intuitively should, and converting pre-made dialogues into (many) formatted turns can end up dumbing the model down significantly.
>>
>>102028098
>prefill formatted as required by the model's prompting specifications
Yes of course it was...
>>
>>102028083(me)
>check buggedcpp
>support Jamba hybrid Transformer
>may 25
>still not merged
Never mind...
>>
>>102026158
It's on the benchmark they reference though.
https://github.com/hsiehjackson/RULER
>>
>>102027071
what the fuck is this model. it's so good what the fuck?
>>
>>102028147
see >>102026211
>>
>>102026286
I've known that since it came out, but there are some people that get really angry if you say it and yell at you non-stop. Just like with nemo currently.
>>
>>102028151
It is not, you are just vram starved. If you had a... never mind. You know what? Buy a fucking ad.
>>
File: 1700823992572928.png (758 KB, 768x1024)
flux lora training is so good, and simple. what a time to be alive
>>
>>102028219
what the fuck is this perspective
>>
>>102028219
Good lord, I didn't realize how good it was at generating body horror
>>
>>102028219
hi petra
>>
File: 1693221376780302.png (42 KB, 722x360)
>>102025568
>they're marketing jamba-large as a competitor to l3.1 70b and mistral large instead of 405B in their blog
Impressive. Shame about the extra 300GB of VRAM I need to run Jamba-Large to get performance similar to theirs during inference when I want to run both at 8bpw.
>>
>>102028219
I thought people said that flux loras are impossible to make a few weeks ago
>>
>>102028259
The corpos aren't really bright.
>>
>>102028268
It was just copro damage control.
>>
>>102028268
Retards say that about every single model that comes out.
>>
>>102028268
turns out dimwits like to come to bad conclusions for clout while the smart guys figure out how to do it.
>>
>>
File: 1702318120346045.jpg (549 KB, 1664x2432)
>>102025568
hello /lmg/
>>
>>102028298
corpo* lmao
>>
>>102028311
>>
>mikufags already shitting the thread with their presence
>>
>>102028330
Nothing violent, just fixing her hair :)

>>102028312
>we posted at literally the exact same time
Woah.
>>
>mikufags pissing up the place
right on schedule
>>
>>102028259
The main thing is the context performance. At long context it should be much faster than even 70B. Vram does not matter a huge amount to corpos, inference speed does.
>>
File: .png (9 KB, 256x256)
>>
>>102028341
I forgor the image in the middle of replying to the other guy.
>>
File: 1567919777866.jpg (62 KB, 500x618)
>>102026286
Dogshit compared to what else?

It's one of the few models that's perfect for 24GB cards. Dropping to 12B always feels like a waste, and with CR's disgusting RAM usage as context grows, it's pretty much the only above-12B model worth using up until the 70Bs
>>
>>102028382
4/4 ok I'm done no more today, sorry if you didn't like it bros.
>>
>>102028259
Are we just assuming the mini will be shit?
>>
>>102028144
I built the PR and converted a model but the server implementation still has that fucking deprecated wait call coded into it.
>>
>>102028400
>sorry if you didn't like it bros.
wtf are you talking about?
more miku is ALWAYS welcome
>>
>>102028406
what local models AREN'T shit?
>>
>>102028429
Mixtral
>>
>>102028428
>>
magnum 123b is pretty good but I've given up on 123b because it's too fucking slow
magnum v2 72b is... not that good. writing kind of sucks in that typical qwenny way. why'd they train on the instruct?
>>
>>102028469
That's not miku, that's a random whore that attempts to emulate miku's looks
>>
>>102028419
Jart was right about llama.cpp. It's time to put the old dog out to pasture.

Let's support llamafile from now on and get the architecture in there first.
>>
>>102028485
classic cope
>>
>>102028491
Does anyone have the "your miku is not my miku" image for this bozo?
>>
>>102028485
It CLEARLY says "My Beloved Miku" right there faggot
>>
>>102028520
exactly, (You)r beloved miku
>>
>>102028527
How can you say miku isn't your beloved? Identify yourself so we can kick you out of our discord.
>>
File: 1705210344435309.jpg (96 KB, 828x980)
are you ready?
>>
Making Miku the mascot of this general was a mistake. Why did we do it, anyway?
>>
>>102028312
Hello Miku
>>
>>102028562
You probably joined the discord too late. That channel where we discussed raiding this place mentions that miku should be the mascot because this is what we should all aspire to be after we transition.
>>
File: 1711187160977338.png (29 KB, 1340x701)
>>102028406
Mini's direct competition is llama 3.1 8b and gemma 9b according to the same blog post by AI21. They aren't even mentioning Mistral-Nemo despite it being the better comparison even by their own cope logic, considering Jamba-Mini and Nemo both have 12b active parameters.
>>
>>102028562
Idk about "we", but I simply just use Miku as a subject for my gens because she's easy to prompt and she's a cute anime girl. I'd gen Teto and others if the model was better at getting them right but it's unfortunately not. Waiting on loras I guess.
>>
>>102028652
you could always gen some Makise Kurisu
>>
>>102028679
Flux knows her?
>>
>>102028652
Yeah she's getting easier and easier to gen too, since her presence is so prevalent in synthetic data now.
>>
File: file.png (12 KB, 288x230)
Just as it should be.
>>
>>102028723
Yeah I'm saying AGI.
>>
>>102028723
Let me guess, you need more?
>>
>>102028487
>here's your 90GB executable
>>
>>102028778
*takes the executable and pockets it*
Thank you anon, this is very convenient.
>>
File: gabagool.jpg (808 KB, 1664x2432)
>>102028312
why stop there really crank it
>>
>>102028562
petra spamming OPs and a cute anime girl was needed to unite the general.
>>
>>102028679
You can't spell Makisu Kurisu without Miku
>>
Something big is coming next week.
>>
>>102028867
I prefer the term "Maku"
>>
Is it just me or is the koboldcpp implementation of MiniCPM broken? Has anyone gotten the OAI chat completion endpoint to return the same response as in the huggingface demo? Responses are usually short and sometimes completely schizo.
>>
>>102028900
ONE MORE WEEK UNTIL PROJECT STRAWBERRY IS FINALIZED
THE FRUITS ARE ALMOST SPROUTING SEEDS AMONG THE HAMSTERS
ELEVEN HOTDOGS
TRUST THE PLAN
>>
>>102028910
Some anon a couple of days ago said the same thing and then discovered that apparently copy-pasting the image works while uploading it somehow breaks it. No idea if it's true though.
>>
>>102028904
Makusex
>>
>>102028900
>>102028922
Big, if true.
>>
>>102028927
that anon wasn't using the oai endpoint, he was using kobold lite ui
>>
>>102028927
I remember that, but he said after it worked the first time it worked even when uploading. That it might have been a cache issue. But I'm calling the oai endpoint directly, not using the ui.
>>
>>102028723
dog level intelligence achieved
>>
deepsex 405b
>>
Colossus-R-513B
>>
>>102029008
let drummer cook
>>
DeepThroat-V2
>>
Simulated cat brain
>>
Best local model for vscode continue plugin? They themselves recommend llama 3, but what the fuck do they know?
>>
>>102028562
It was supposed to be Chesh.
>>
>>102029127
Jamba 1.5 mini
>>
how do I get over the embarrassment of asking the model to have sex with me
>>
>>102029149
stop asking. start taking.
>>
>>102029149
i just say *rapes you* in the middle of a normal rp and the model takes care of the rest usually tbqh
>>
>>102029092
A loader that is just a wrapper for another loader and the only unique feature it has is that it inserts another sysprompt, telling the model to pretend it is a cat brain pretending to be a helpful assistant.
>>
File: file.png (51 KB, 804x465)
so sassy
>>
File: 1702316034809359.png (53 KB, 587x546)
Metamate open release when? They're keeping it from us.
>>
I personally feel that we need bigger models.
>>
>>102029271
>internal company docs
If anyone leaks this they will be hunted down kek.
>>
>>102029285
This, enough with all those 70bs and 120bs. We need to go back to the pre-chatgpt GPT3 doctrine of JUST MAKE IT BIGGER. 405b was a step in the right direction at least.
>>
>>102029285
>>102029337
"We" (corpos) are doing exactly that but it takes a long time to train big models. So there will be months between releases at a minimum, or closer to a year for new frontiers.
>>
>>102025568
Why do her eyes look like pussy hair moustaches?
>>
I wanna roleplay chat with my AI waifu in sillytavern. Like we are texting over a messenger or so. What LLM is best for that scenario? I have a 3080 with 10GB VRAM and my PC has 32GB ram. Right now I'm using Poppy_Porpoise-0.72-L3-8B-Q8_0-imat.gguf and it's kinda ok. Not as intelligent as characterai but at least not censored

Thanks!
>>
>>102029593
I've been enjoying Lumimaid so far.
https://huggingface.co/NeverSleep/Lumimaid-v0.2-8B
>>
>>102029593
https://huggingface.co/fblgit/UNA-TheBeagle-7b-v1
>>
>>102029593
Mini-magnum has a conversational style that probably works pretty well for that.
Try mixtral 8x7b limarp zloss too.
>>
I am so curious whether it is oldfags trolling, or there are no oldfags left and it is newfags being genuine.
>>
>>102029593
https://huggingface.co/turboderp/Mistral-Nemo-Instruct-12B-exl2/tree/5.0bpw
>>
Is LoTA better?
https://nitter.poast.org/PandaAshwinee/status/1825571610230723027
>>
Redpill me on the Yi 35B Models.

Are they good, what finetunes to use if yes? (for someone who can't run 70b models) Wanna just use them for cooming
>>
>non-Transformer model
>look inside
>transformers
>>
>>102029623
>>102029631
>>102029648
>>102029658
Thanks a lot, gonna try them!
>>
>>102029687
no
>>
>>102029691
they're always in disguise
>>
>>102029654
I can guarantee you that there is at least one oldfag remaining.
If you have any questions about how 4chan used to be, feel free to ask.
>>
>>102029654
I like to think the oldfags are not engaging and instead lurking, waiting for something truly interesting to occur.
>>
>>102029662
I thought it was a joke but it actually looks very promising, too promising in fact, either it isn't all that good or it will be revolutionary.
>>
What if I'm making a project where you could call your LLM.
Need a name for the project
>>
>>102029856
SillyVoice
>>
>>102029734
Is /loli/ an urban legend or was it actually a thing?
>>
>>102029789
This. I'm too old to engage with bait and shitposts.
>>
>>102029654
oldfag here, I can guarantee that I'm here.
>>
>>102029662
still waiting for MORA
>>
>>102029856
Slopline.
>>
>>102029789
>mikutroons spam all the time regardless if something is happening or not
Checks out.
>>
>>102029873
It was an actual thing. Loli wasn't even that rare of a thing on the internet back in the day.
But the more popular something gets, the more law-abiding it becomes,
>>
>>102029856
Shiver.
>>
>>102029856
Husky.
>>
>>102029873
Yes, that was a real thing on 4chan. Lewd drawings of small anime girls were still in the gray area back then. It's still rampant in Japan from what I heard. Can't believe it's been so long that it's become an urban legend.
>>
File: ssrlkk24py531.jpg (242 KB, 1168x1368)
>>102027162
>>102027239
>spamming the thread with dumb drama from your schizo headcanon
>>102028343
>>102028520
>>102028546
>>102028562
>>102028608
>>102029932
miku isn't going anywhere
seethe
>>
>>102030082
>unfunny reaction image
>mikutroon
Checks out.
>>
Is Jamba Strawberry?
>>
Don't listen to Miku.
>>101997677
>>
>no jamba support for llamaccp and exllama2
>>
>>102030293
Serious backends do not support meme architectures.
>>
File: media_GVmixV3WgAArcz8.jpg (134 KB, 1200x675)
NovelAI just made every other open source model obsolete.
>>
>>102030293
What is there to run?
>>
>>102030336
https://blog.novelai.net/novelai-diffusion-v1-weights-release-en-e40d11e16bd5
https://huggingface.co/NovelAI/nai-anime-v1-curated
https://huggingface.co/NovelAI/nai-anime-v1-full
https://huggingface.co/NovelAI/nai-furry-beta-v1.3
>>
>>102030336
lol, so an official release of the leaked ones?
>>
File: 1713026826658109.png (35 KB, 642x264)
>>102030336
>>102030365
This is just the model that was leaked two years ago.
>>
>>102030385
And it's still better than every other open source model.
>>
>>102030361
The new model that just came out with 256k context + better understanding at long context than any other local model of any size.
>>
>>102030336
>>102030365
I thought people were saying it was v3 that was going to be open sourced. So they were lying?
>>
>>102030405
Not many people have the ram for that, and the mini is probably too small to be any good. So, who cares?
>>
File: 1704474174252463.png (144 KB, 1206x378)
>playing a casual rpg
>inserts a random woman and starts hinting at intimacy out of nowhere
Why do they always do this? At least it gives an option to decline...
>>
>>102030469
Why did you censor the random woman's name?
>>
>>102029856
llamaphone
>>
>>102030469
This is what LLMs are for, Sam...
>>
>>102030541
Because it's Lily and I don't like seeing that name.
>>
>>102030437
The mini is 52B
>>
>>102030541
because it's a themed setting and i don't need every autist knowing what stories i like to play :^)
technically only her surname was recognisable though, so i guess i could've left the first name in, "Eve" if it makes any difference

>>102030587
i did define a general outline of the plot in the instruction though, no romances included...
just worried that if i go down the intimacy route then the next 20 paragraphs are going to be mindless slop describing graphic sex or flirting instead of continuing the story like i want it to
>>
>>102030336
>>102030385
What a joke kek, they should've done that much earlier. I guess they realized SD-based models are obsolete now because of flux.
>>
>>102030661
Only 12B parameters active, and it's only as good as Gemma 2 9B according to some redditor.
>>
>>102030661
But they're comparing its performance in their benchmarks to llama 8b, and gemma 9b. So expect it to be about that level. It's not going to be like a 52b model intelligence wise. The active parameters are only 12b and it performs on the level of an 8b model, so the context is all it has going for it.
>>
>>102030661
Why use it over Mixtral 8x7?
>>
File: file.png (3 KB, 383x40)
why does kcpp reprocess my prompt so often? i thought the smart context or whatever was supposed to prevent that?
i literally just typed out a response and sent it and this is what it does, it's not the only time either
set to an 8k context limit if that matters
>>
>>102030868
Probably have a lorebook or authors note or something that inserts into the context further up.
>>
>>102030868
try without smart context, i don't think they use context shifting with it
smart context is an old method of avoiding reprocessing by making it less frequent, by leaving half of the remaining context window empty before repeating when it fills up
context shifting will keep rolling the context window even when it's near the cap and rarely reprocess unless you're changing stuff at the beginning of the prompt
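rough sketch of the difference in terms of how many tokens get reprocessed when the window fills (the real backends operate on the KV cache rather than token lists, but the bookkeeping is the same idea):

def smart_context(tokens, limit):
    # old "smart context": when the window fills, keep only the last ~half of it
    # and reprocess that entire chunk from scratch
    if len(tokens) <= limit:
        return tokens, 0
    kept = tokens[-limit // 2:]
    return kept, len(kept)          # everything kept gets re-evaluated

def context_shift(tokens, limit, prefix_len):
    # context shifting: keep the fixed prefix (system prompt, card), slide out the
    # oldest chat tokens right after it, so only newly appended tokens need processing
    if len(tokens) <= limit:
        return tokens, 0
    overflow = len(tokens) - limit
    kept = tokens[:prefix_len] + tokens[prefix_len + overflow:]
    return kept, 0                  # nothing old is re-evaluated unless the prefix changes

# toy demo: 9000 "tokens" in an 8k window with a 500-token fixed prefix
toks = list(range(9000))
print(smart_context(toks, 8192)[1])        # ~4096 tokens reprocessed
print(context_shift(toks, 8192, 500)[1])   # 0 reprocessed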
>>
Magnum OOMs for me on Apple, others work fine. I am confused how to debug. RAM is plentiful.
llama.cpp with metal

Is there discord by the way?
>>
>>102031104
which magnum
which quant
how much RAM
>>
>>102031104
Which magnum?
>>
>>102030899
nope, no lorebooks or anything
tbdesu i've never gone far enough over the context window to warrant adding notes, nor have i really wanted to put effort into writing them for other reasons
but as far as i'm aware sillytavern SHOULDN'T be modifying earlier parts of the prompt in any way so that's why i'm confused

>>102030913
context shifting, yep, got the names mixed up
just checked and it's enabled
so not sure why it's happening still
>>
>>102031104
Set the context to something lower than the default (specified by the model). Start with -c 8192 and move up.
>>
>>102030117
That was pretty funny actually
>>
>>102031159
Log the context it sends, hard word wrap it and do a diff, see what's being changed.
>>
Think its possible to influence the LLM's writing style by vectorizing smut novels and feed it to your character's data bank in ST?
>>
>>102031259
Even a few examples can help set the tone at the start and then you have the whole chat as an example. What's the need to go that far?
>>
>>102031283

Because I use gemma and it getting it to write spicy depictions of the female body is a struggle.
>>
>>102031151
https://huggingface.co/anthracite-org/magnum-v2-12b
>>
>>102031361
I assume you aren't trying to load the .safetensors files with llama.cpp and are instead using a GGUF.
In that case, >>102031165 is probably right.
If you don't specify the context size, it'll try to load the full 128k tokens, which will take an absurd amount of memory.
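If you're going through llama-cpp-python rather than the raw CLI, the same fix looks roughly like this (the model path is a placeholder):

from llama_cpp import Llama

llm = Llama(
    model_path="magnum-v2-12b-Q8_0.gguf",  # placeholder path
    n_ctx=8192,        # explicitly cap the context so the KV cache stays small
    n_gpu_layers=-1,   # offload all layers (Metal on Apple silicon)
)

out = llm("Hello there.", max_tokens=64)
print(out["choices"][0]["text"])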
>>
>>102031404
>it'll try to load the full 128k tokens
a million actually
>"max_position_embeddings": 1024000,
https://huggingface.co/anthracite-org/magnum-v2-12b/blob/main/config.json#L14
>>
>>102031404
>I assume you aren't trying to load the .safetensors files with llama.cpp and are instead using a GGUF.
Of course he's loading a gguf with llama.cpp. What else would he load with the thing he specifically said in his post?
It's one thing debugging normies that cannot read the console output. Being so confused about how things work is a different thing.
>>
>>102031477
eh we often get people asking how to load safetensors in kobold and such
>>
>>102031456
Geez.

>>102031477
He could be trying to load the safetensors and getting a completely unrelated error, as has happened more than once in these threads.
>>
>>102031504
>>102031493
>Magnum OOMs for me on Apple, others work fine. I am confused how to debug. RAM is plentiful.
>llama.cpp with metal
He obviously capable of running other models. He cannot read console outputs, but he can at least read instructions.
>>
File: ComfyUI_00138_.jpg (1.33 MB, 1344x1728)
hey, I mainly use ai for image gen.
so I got the idea of using ai to troll people on ai generals because they keep asking for smut of the character i'm making, and i lack the creativity to impersonate her.
I want the ai to roleplay as a preppy smug brat that harshly denies any request some anon makes. do i just tell the ai to act like a brat and then feed the prompt the anon's request?
I got it up and running. so like do i tell the prompt "you are a smug preppy brat that denies every request a user makes and makes fun of them for it" or something like that ?
I hate being that newfag but here i am.
>>
>>102031636
You might want to use either a prefill or an instruction at depth zero to make sure that the brat will do its best to deny anon's request, otherwise there's a good chance that it'll forget that specific instruction real quick.
>>
>>102031636
Ask the 12 shitposting bots that roam this very general
>>
>>102031636
I bet the teto guy knows how to do it.
>>
File: 1489083716440.gif (388 KB, 230x139)
Let's play a game! This Saturday at 1 PM PT, I will do a collaborative storytelling/RP session (location TBD, maybe in the thread itself?), where I post a scenario and responses from the model in the thread, and people discuss what to do in the user chat turns, or edit previous user turns or the system prompt and start over. This is going to be both for fun and to get us (mostly) reproducible reference logs, as I'll be using greedy sampling in Mikupad and have the full log in a pastebin at the end. No editing the model's responses, we're going to use pure prompting to try and get the thing to do what we want!

The scenario is also still TBD. We're going to go for as long a context as possible until the model breaks down uncontrollably, so it should be a complex enough scenario for that. If anyone has suggestions for scenarios I'm all ears. Also, I'm planning on starting these games with Mistral Nemo at Q8 for the first session, and other models in the future, so we have reference logs available for a whole range. But I'll take suggestions for models people want. I'm only a 36 GB VRAMlet though so I'm a bit limited. I can run larger models up to ~88 GB but it'd be slower. If anyone with more VRAM would like to host any of these games themselves and run such larger models at a good speed, please do, and I will step down.

>current suggestions
>>102002238
>>
>>102031774
The scenario anon proposed but one of the 3 is a doppelganger infiltrating for some even more nefarious reason.
>>
>>102031774
complex sex with miku
>>
>>102031636
>anon speedrunning getting a ban for being an avatarfag
>>
>>102031259
People say example messages heavily influence style but in my experience that has never worked. Using author note to tell the AI to use specific words/phrases when describing x works infinitely better
>>
File: 1715794606976513.png (95 KB, 2497x1289)
>>102031212
well that's fucking weird
it DID modify my prompt somehow, in two locations
once at the very beginning just after the instruction, it inserted something i never said, "let's get started..."
and a second time near the beginning of the last response it inserted a "Narrator:" (presumably because it didn't actually finish and i chose to continue it, i guess it only got inserted after the response was properly finished)
that first one is weird though, what could be going on there?
>>
>>102031774
This bunker scenario could be fun if anons pitch the world events as things go on.
>>
>>102031636
You should ask in a general with people that know how to write, not here.
>>>/vg/491349658
>>
File: example.jpg (41 KB, 1259x270)
>>102031636
how is the tone?
do you want more or less kaomojis
>>
>>>102023701

I completely gave up on the idea, but I was thinking about using an IRC server and a modified version of HexChat, given how similar the SillyTavern interface already is, and how it would support multiple users straight out of the box. It's the sort of idea that people would tell me to kys for though, for no other reason than because it probably wouldn't be written in React or whatever the flavour of the month language is.
>>
>>102031966
Multiplayer llm? Sounds interesting, actually.
>because it probably wouldn't be written in React or whatever the flavour of the month language is
Honestly, who cares what the technologically inept would rather use?
I'm using Wails + Svelte for the thing I'm making because it just works.
>>
>>102031953
kek. No like a really mean bitch, harsh. she has to hurt my feelings.

oh no.. this is.. i'm scared.
>>
>>102031828

Can I somehow trigger a recall in the long term memory (file entries in the character database) in the Author’s note? Like, write in the style of “spicy_stories by x author?”
>>
>>102031804
I like that.

>>102031852
You mean during the game or while we're making suggestions here?

Actually for this general scenario to work I'm guessing we'd have to flesh out the characters a bit at least. Asking Nemo just to come up with all of it on the fly and having those instructions in context sounds like maybe too challenging of a task, one that could confuse it. Then again we could just try it out I guess, and then do something simpler if we find out Nemo can't handle it.
>>
>>102031636
Here is some very low effort gen with Mistral Large q8_0.
The prompt is highlighted.
>>
SLOP IS SOVL and I'm tired of pretending it's not
>>
>>102032099
i see. im kinda going the tavern route and inputting the scenario there.
I like where this is going
>>
>>102032079
Sorry anon, not sure what you are talking about. If you are using ST you might be able to reference lorebooks but I dunno
>>
>>102032083
>You mean during the game or while we're making suggestions here?
During the game. Something new gets pitched and the characters need to struggle through it. Basically the user takes over as a narrator. Should be easy for you to handle as well.
Also like this other anon's idea with a traitor.
>>
>you reply, your voice [X]
>she says, her voice [X]
>you say, your voice [X]
>[X] says, his voice [Y]
>he asks, his voice [X]
STOOOOOOOOOOOOOOOOOOOOP
>>
>>102032210
Just ahh ahh mistress stop X, don't be shy.
>>
>>102032001
>Implying this was never done before
Did people already forget about agnai?
>>
>>102032270
Okay, so?
Are we just going to stop developing new things entirely because they've already been done before in some aspect?
Get the fuck out of here.
>>
>>102032282
I never said that schizo, get your meds.
>>
File: giant fuckign kettles.png (1.82 MB, 1280x1477)
>>102032301
Welcome to hell.
>>
>>102032192

I want to have a file depicting female anatomy in a sexy way. Thinking of using ST’s RAG/Vector Database feature to influence the output of the model, in this case, Gemma 27b. Since you mentioned using Author’s Notes as a power tool, I am wondering if I can chain that together with the entries in the Vector Database so I don’t have to prompt specific keywords and shit to make it write in a style that I want depending on the context.
>>
>>102032210
I (>>102027908) checked my manual prefill of pain and there were no voices. No eye sparks/gleams/explosions either. While it tries to put in the harlequin novel shit all the time if I don't let it in, most of it is gone. I guess it really is as simple as all of this slop being tied to novels for biowhores, and novels for biowhores sounding like the closest thing to the cooming material you request from the LLM. I hate women.
>>
>>102032195
Oh I see how that'd work. It wasn't clear to me whether the scenario was for us to be one of the three characters or some kind of co-writer.
>>
>>102032337
>eye gleams
geez that's another one that really gets on my nerves
I'M SO SICK OF SLOP
not even doing erotica, just normal CYOA story rpgs
>>
>>102032368
What is your prompt/card/whatever?
>>
>>102030688
They did the same thing with their first writing model: they open sourced the weights, but at least also the training config, after no one gave a fuck and it had been surpassed by other models. This is even more useless anyway because this is just the weights. The code or the configuration for training the models would've been the interesting part and there was no reason not to release it. I don't even know why they have an open source page, seriously, if they are going to be this behind the 8 ball on open sourcing. Why pretend to care when it's fuckall useless?
>>
>>102032210
Using nemo or something related to it I take it?
>>
>>102032581
>related
Yes they both end with .gguf
>>
File: 1710095300375024.png (131 KB, 648x445)
>>102032453
largestral + picrel, a custom generic "Narrator" i made
it's a somewhat old prompt that i've modified over time to cater to different situations, for instance it would frequently try to make my character express regret over anything it deemed "immoral" so i added a clause to try and work around that
>>
Any anon out there who can help me with good configurations and templates for Silly Tavern using Magnum v2.5? I'm trying to get the model to function correctly, but I'm having trouble getting it to follow the instructions in the text format or limit its output to only five lines.
>>
>>102032660
>it would frequently try to make my character express regret over anything it deemed "immoral" so i added a clause to try and work around that
lol, did that work? I feel like most models are RLHF-deep-fried to be like this.
>>
>>102028562
I got really bored in the summer of 2023 and forced it
>>
>>102032726
it actually did from my limited testing
with that clause added in, and sometimes by explicitly mentioning "character is evil" or similar in the generation guidelines it doesn't complain as much as it used to, if at all
i recall it used to bug me a lot but i don't think it's happened again recently
occasionally the 4 options it provides will lean more towards the "moral" side but in those cases i can always type an action manually without any issues
>>
File: 1594415927049.jpg (16 KB, 295x342)
New MoEs when?
>>
>>102032821
Phi-3.5 released 2 days ago, does that count?
Jamba is a MoE too, right?
>>
>>102032833
Yes and the answer is 2 more weeks at least because no goofs.
>>
>>102032821
We just got two today. And 2 days ago, so using AI researcher logic you'll get 1.5 more in 2 days.
>>
>>102031165
>>102031404
>>102031559

Thank you, setting the context worked. Now the relationship with how much memory gets eaten is clearer; before, I expected it to be a more straightforward function of the file size.

This was my first experience getting help from distributed anon intelligence here, appreciate it ^^

Idk what you mean by reading the console, there is a bunch of stats and that line:
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
Without knowing about ctx, I could only see the high requested RAM in the stats
>>
File: 1678741433645086.png (10 KB, 259x288)
>>102032833
Oh, I've been under a rock for a month and I'll admit they slipped past me purely because I was only looking for "# x #b" name formats, whoops.
>>
>>102029149
why would you want to get over it?
>>
>>102032821
Mistral abandoned MoEs and Jamba straight up admitted that MoEs only compete with models of a similar active parameter count. Enjoy your 400B model that trades blows with 70b.
>>
>>102032991
cpufags still come out ahead tho due to more t/s
>>
File: ctx.png (8 KB, 680x156)
>>102032845
This is what i get
>buffer size 167772160032
>failed to allocate buffer for kv cache
>llama_kv_cache_init() failed for self-attention cache
Those 167GB sound like a lot. Different backend, but still. Don't read just the last line. Not only should you read it when it fails, you should read it when it succeeds to have a point of comparison.
Glad you got it working.
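That 167GB is about what you'd expect for an fp16 KV cache at the model's full advertised context. Back-of-the-envelope check, assuming Nemo-style attention dims (40 layers, 8 KV heads, head dim 128) and the 1,024,000 max_position_embeddings quoted earlier in the thread:

# KV cache bytes = 2 (K and V) * layers * context * kv_heads * head_dim * bytes per element
n_layers, n_kv_heads, head_dim = 40, 8, 128   # Mistral-Nemo-style config (assumption)
n_ctx = 1_024_000                             # max_position_embeddings from the config
bytes_per_elem = 2                            # fp16 cache

kv_bytes = 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem
print(f"{kv_bytes:,} bytes = {kv_bytes / 1e9:.1f} GB")
# -> 167,772,160,000 bytes, ~167.8 GB, which lines up with the failed allocation above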
>>
>>102033018
I don't think it'll be faster than a 70b even if you did have the 400GB+ ram.
>>
>>102033113
It'll run like a dense 96B would considering it has that many active parameters.
>>
>>102033208
Exactly, which is slower than a 70b would run. So it's pointless.
>>
>>102032991
>MoEs only compete with models of a similar active parameter count
WizardLM and Deepseek V2 both proved that obviously wrong. Seems more like a Jamba issue if anything. Or it might not even be the architecture but just this specific lab's data quality for all we know.
>>
>>102033208
>>102033272
They claim it runs faster and suffers less of a speed drop than pure transformers in high context situations. I'd be curious to see head to head comparisons if it ever gets implemented in llama.cpp.
>>
File: 1721614998713526.png (77 KB, 959x713)
>>102033316
They also claim llama 3.1 70b is slower than 405b or mistral large.
>>
>>102033397
Obviously 70b and 405b were accidentally swapped in that chart. You can tell since they drop it out after 64k because they presumably couldn't fit it with full ctx on their setup.
>>
I'm curious about Jamba-Large. Maybe the size and the inferior benchmarks will add up and create something that has the soul we seek
>>
>>102025568
it's never been more over
>>
Anyone try Chronos Gold 12B? It seems pretty good at first use. Small model that has some kick...
https://huggingface.co/elinas/Chronos-Gold-12B-1.0
>>
AI21 and making a model that's somehow stupid despite being fuckhuge, name a better duo

I think this is the third time they've done that? How do they keep getting funding
>>
>>102033555
I started downloading bartowski's Q8 quant of this a few minutes ago, still waiting for it to finish
>>
>>102033565
>How do they keep getting funding
check their early life
>>
Anyone found a reliable way to cut down dirty talking? I don't want the bot to constantly be like "Hmm baby I like how you feel inside me" shit like I'm in a porno.
>>
>>102033579
>>102033555
Samefag. Buy an ad. No, seriously.
>>
>>102033565
They're the only ones with the balls to train models that aren't purely transformers. If nothing else, their new models have proven that big Mamba + Transformer hybrids deliver what they promise in terms of context and prompt-processing speeds while also performing decently, even if it's not cutting edge.
There's also a really good jump in performance between JambaV1 and Jamba1.5-Mini despite being comparable in size so the chances are good that the performance of the architecture can be increased even further.
>>
>>102033713
take your meds schizo faggot
stop shitting up the thread with your false shill accusations, I haven't even tried the fucking model yet
>>
>>102033704
gagging them helps if its a smart enough model to know gagged people can't talk
>>
>>102033704
If one would remove all the dirty talk and all the purple prose... what would be left?
>>
>>102033735
These are pretty expensive proofs-of-concept. They really need to focus more on the small end to iterate and refine before blowing their load on nearly half a trillion parameters
>>
>>102033765
*plap* *plap* *plap* *plap* *plap* *plap* GET BULLIED! GET BULLIED! GET BULLIED!
>>
what actually is mamba
>>
File: file.png (1.53 MB, 1897x1795)
>>102033827
>>
>>102033775
It's fine, as long as they do something that's marketable the investor money's going to keep coming. Remember how Mistral started with $137 million months before they even had a single model out.
>>
is nvidia gonna minify mistral large and give us the sota for 2024?
>>
>>102033827
rwkv
>>
>>102027828
what you want isn't coom
>>
File: file.png (59 KB, 147x327)
>>102033765
The problem is the models don't know good dirty talk. If they were panting or gurgling or choking or saying they'll end up pregnant or talking about their titpussy or whatever the fuck, then sure, that's hot. But they all talk like thots in shitty gringo porn movies.
>>
Brave's stable release channel finally got local model support going for their in-browser LLM integration. I hooked it up to Llama.cpp and it just werked. I think it's kind of neat. It has several functions you can do after highlighting text on a page and right clicking. There are some limitations, but generally this is still a pretty cool feature. Damn, I don't want to switch my main browser. Are there any extensions like this for Firefox?
>>
>>102033853
Yeah, get ready for Mistral Large 4B
>>
>>102033892
>Are there any extensions like this for Firefox?
You can always make your own.
>>
>>102033892
>Are there any extensions like this for Firefox?
You can always ask the AI to make one.
>>
>>102033579
>>102033555 (You)
> Samefag. Buy an ad. No, seriously.
The fuck you on about? It's a model I was asking about, dumbass motherfucker. I know you haven't even tried it, looking like an absolute fool. God damn lmg has gone to shit with newfags.
>>
The new Jambas are a huge deal if you need a decent model to chew through huge context lengths very quickly. No idea what the applications of this are but there's surely something.
>>
>>102033279
Could very well just be undertrained in tokens or maybe they payed the price for supposedly having better effective context than everyone else
>>
>>102033492
Seems sloppy, and how do they test the speed of closed models on the same hardware? I don't trust their data.
>>
>>102033892
>I hooked it up to Llama.cpp and it just werked
Can you share how you did it? I'm retarded.
Also what model would be good, mixtral?
>>
>>102034120
Why would they have to test them on the same hardware? It's fine to test cloudshit as-is via their API since you'll never be able to run it on faster hardware anyway.
>>
jamba on llama.cpp please...........
>>
>>102034203
I just updated Brave and adjusted Leo's settings according to the info in the question mark bubbles. What are you having an issue with?
>>
So as a retard just using koboldcpp in instruct mode to fap, is there a source for lewd loras? Is that even a model-agnostic thing?
>>
>>102028562
This, it attracted some AGP atrocities. The unnamed rule "never make a general with an anime OP if you want it to be high quality and calm most of the time" exists for a reason.
>>
>>102034342
I haven't used llama-server before so I don't know how to connect it with brave.
I already tried but failed
>>
>browsing loras
>see https://civitai.com/models/118398
>think that this could make some funny images where infinite Migus are surrounding the viewer
>try it out
>this is the first thing that plops out of the machine
>>
>>102011438#p102018061

Made a basic ass python script to do this. I only use koboldcpp so that's what it calls.
rentry dot co/vhqaewth
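For anyone who doesn't want to click the rentry, the core of a script like that is just a loop around koboldcpp's KoboldAI-compatible API. A minimal sketch, assuming koboldcpp is running on its default port 5001 (not the actual script from the rentry, just the general shape):

import requests

API = "http://localhost:5001/api/v1/generate"   # koboldcpp's default endpoint
history = ""

while True:
    user = input("You: ")
    history += f"\nUser: {user}\nAssistant:"
    r = requests.post(API, json={
        "prompt": history,
        "max_length": 200,
        "temperature": 0.7,
        "stop_sequence": ["\nUser:"],   # stop before the model writes your next turn
    })
    reply = r.json()["results"][0]["text"]
    history += reply
    print("Bot:" + reply.strip())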
>>
>>102034576
In that case there's probably some learning you should do about just getting Llama.cpp set up with something in general. Have you tried a different backend? I'd guess Brave works with most just fine.
>>
>>102034790
>nuts.wad
>>
Oki so here's a more normal happi version.
>>
I got hit with the "ministrations, shivers, audible, ...for now" combo in the same message
I need a break after this
>>
File: 1697471409105670.png (8 KB, 423x24)
wtf how did EA get into my story?
>>
Why would anyone use koboldcpp (rebranded llama.cpp with bloat, a shitty UI, and unaudited diffs from upstream) instead of llama.cpp? Is /g/ really so dumb that it can't compile a C++ program? Or is it astroturfing and nobody actually uses that thing...?
>>
>>102035262
https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md
>>
>>102035262
>Why would anyone use koboldcpp
>>102034358
>So as a retard
>>
>>102035262
Compiling koboldcpp is the same as llama.cpp, so if you can do one you can do the other.
>>
@102035262
because it's easier and more convenient and i don't want to have to compile shit
>>
>>102035262
Grooming from the Discord.
>>
File: file.png (19 KB, 531x282)
>>102034873
pretty cool
>>
102035262
I don't want to compile C or C++ because I use windows most of the time, and C compilation fucking suuuucks in a windows environment, there's always some shit broken

it usually just werks under linux, but in windows it's a nightmare and I don't often want to boot into my linux partition
>>
File: print.png (306 KB, 950x653)
>>102035349
This, 90% of the koboldcpp discussion is the guy himself. You're probably replying to him
>>
File: Untitled.png (602 KB, 1043x2524)
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
https://arxiv.org/abs/2408.12570
>We present Jamba-1.5, new instruction-tuned large language models based on our Jamba architecture. Jamba is a hybrid Transformer-Mamba mixture of experts architecture, providing high throughput and low memory usage across context lengths, while retaining the same or better quality as Transformer models. We release two model sizes: Jamba-1.5-Large, with 94B active parameters, and Jamba-1.5-Mini, with 12B active parameters. Both models are fine-tuned for a variety of conversational and instruction-following capabilities, and have an effective context length of 256K tokens, the largest amongst open-weight models. To support cost-effective inference, we introduce ExpertsInt8, a novel quantization technique that allows fitting Jamba-1.5-Large on a machine with 8 80GB GPUs when processing 256K-token contexts without loss of quality. When evaluated on a battery of academic and chatbot benchmarks, Jamba-1.5 models achieve excellent results while providing high throughput and outperforming other open-weight models on long-context benchmarks.
https://huggingface.co/ai21labs
https://github.com/vllm-project/vllm/pull/7415
merged code for their new quant method
jamba 1.5 paper
>>
>>102035262
It's a quick onramp to see if this seems interesting enough to justify more effort. My path was koboldcpp -> ooba -> running sillytavern and connecting to llama.cpp or TabbyAPI (occasional mikupad and ooba use for story writing, occasional use of llama-cli for batch jobs).
>>
Weight Scope Alignment: A Frustratingly Easy Method for Model Merging
https://arxiv.org/abs/2408.12237
>Merging models becomes a fundamental procedure in some applications that consider model efficiency and robustness. The training randomness or Non-I.I.D. data poses a huge challenge for averaging-based model fusion. Previous research efforts focus on element-wise regularization or neural permutations to enhance model averaging while overlooking weight scope variations among models, which can significantly affect merging effectiveness. In this paper, we reveal variations in weight scope under different training conditions, shedding light on its influence on model merging. Fortunately, the parameters in each layer basically follow the Gaussian distribution, which inspires a novel and simple regularization approach named Weight Scope Alignment (WSA). It contains two key components: 1) leveraging a target weight scope to guide the model training process for ensuring weight scope matching in the subsequent model merging. 2) fusing the weight scope of two or more models into a unified one for multi-stage model fusion. We extend the WSA regularization to two different scenarios, including Mode Connectivity and Federated Learning. Abundant experimental studies validate the effectiveness of our approach.
big if true. they kind of muddled it by throwing in federated learning stuff and they used retnet models to test with
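For context, the "averaging-based model fusion" baseline the paper builds on is just element-wise weight averaging, something like the sketch below (checkpoint names are placeholders); WSA's contribution is regularizing training so the two sets of weights stay in compatible ranges before you do this.

import torch

# Placeholder checkpoints: two finetunes of the same base architecture.
a = torch.load("model_a.pt", map_location="cpu")
b = torch.load("model_b.pt", map_location="cpu")

# Plain element-wise weight averaging - the baseline that weight-scope
# mismatches are said to break, and that WSA is meant to make reliable.
merged = {k: (a[k] + b[k]) / 2 for k in a}
torch.save(merged, "merged.pt")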
>>
>>102035477
ooba is even worse, it's slow for some reason too.
>>
>>102035519
Ooba being super slow is what made me eventually move off it.
>>
What parameters you normally use when executing your llama-server instance
>>
>>102035549
Yeah so the downgrade in the middle makes no sense.
>>
File: 1700953427978367.png (380 KB, 512x620)
hello my /lmg/brudis
>>
>>102035728
I don't think he cares about losers on /g/, but he has been flooding /pol/ with bot posts and comments since 2020 or so. And he succeeded, pol is so low quality now that it's dead.
>>
>>102035728
hi petra
>>
>>102035776
i think her name is grimes lad
>>
>>102035671
The first time I wanted to use a non GGUF model I installed ooba and I found I liked the interface a lot more and ended up using it for everything. Because I was trying new models I didn't realize at first that ooba was slower since I hadn't used the same model in both kobold and ooba.
>>
>>
>>102036024
after the 3rd impact with miku
>>
>>102036066


