/g/ - Technology






File: GMU8uQtaoAAApMG.jpg (682 KB, 2100x3000)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101409356 & >>101398610

►News
>(07/13) Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271
>(07/09) Anole, based on Chameleon, for interleaved image-text generation: https://hf.co/GAIR/Anole-7b-v0.1
>(07/07) Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031
>(07/02) Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: Hatsune Miku.jpg (47 KB, 736x549)
►Recent Highlights from the Previous Thread: >>101409356

--Paper: Lite-SAM Is Actually What You Need for Segment Everything: >>101413047
--Removing Slop from LimaRP Dataset, Aiming for One Infraction Per File: >>101416505 >>101416548
--Seeking Alternatives to Llama 8B for Larger Context Sizes: >>101411874 >>101411930 >>101412179
--KoboldCPP 1.70 Release with ChatGPT Interface Theme and GPT-3 Improvements: >>101411691 >>101411762
--Japanese LLaMA-based model calm3-22b-chat for JP -> ENG translation and tutoring?: >>101417744 >>101418964
--WizardLM2 8x22B performs surprisingly low on HF's leaderboard, but it's the best for general use: >>101415286 >>101415366 >>101418493 >>101418553 >>101418667
--Seeking Voicecraft Local UI or Implementation Without Docker: >>101416689 >>101418398
--Issues with BOS Token and Duplicate Tokens in AI Model Configurations: >>101410950 >>101410991 >>101411062
--Horny Anime Bot Generates Better Explanations: >>101416874
--Sao Datasets Nuked During Training Run, Frustrating: >>101410981 >>101411009 >>101411038 >>101411051 >>101411076
--Phi 3 Mini: Underwhelming Scores, But Did Microsoft Change Behavior?: >>101419582 >>101419754 >>101419894 >>101419974
--Optimal values for DRYmeme?: >>101419053 >>101419332
--Anon suggests Nvidia might have forced Meta to stop training 30Bs: >>101417205 >>101417232 >>101418273
--Gemma and Gemma-2's Tokenization and Formatting Issues: >>101411070 >>101411079 >>101411144
--Expected Llama3 405B Token Generation Speed with Specific Hardware Setup: >>101413906 >>101414142 >>101414158
--Converting a Dual 3090 Desktop into a Dedicated Server: Kernel and Distro Recommendations?: >>101418915 >>101418994 >>101419051 >>101419057
--Choosing the Right Chip/SoC for Your LLM: Evaluating Options and Considering Factors: >>101412286 >>101412527
--Miku (free space): >>101409387 >>101414595 >>101415013 >>101415746 >>101418585 >>101411510

►Recent Highlight Posts from the Previous Thread: >>101409364
>>
One more day!
>>
Cohere are working on it.
>>
File: idiot.png (411 KB, 500x500)
>>101421480
>Hatsune Miku.jpg
>>
>>101421480
>but it's the best for general use
Why? Because the miku avatarfag said so? Go fuck yourself.
>>
>>101421665
the summarizer bot makes weird titles sometimes
the first post linked said so
>>
File: 1692784763371770.png (2 KB, 175x51)
>B-B-B-BUT NOBODY CAN RUN LLAMA 405!!!!

>put a smaller draft model in front of the bigger model: the small one proposes tokens and the big one just verifies them, which is much, much faster, and only when verification fails does the big model generate the token itself (forgot the name of this technique, sketched just below)
>tell model to not yap and only output the result
>insane pressure for everyone in the industry to get better distillation, quanting, lookahead, speculative etc techniques
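(For reference: the draft-and-verify trick in the first point is speculative decoding. A minimal toy sketch of the greedy variant, where draft_next and big_argmax are hypothetical callables standing in for whatever backend you use, not a real API:)

def speculative_decode(prompt_ids, draft_next, big_argmax, n_new=64, k=4):
    # The small model drafts k tokens cheaply; the big model verifies them
    # left to right (a single batched pass in a real backend) and only
    # generates a token itself when it disagrees with the draft.
    out = list(prompt_ids)
    target = len(out) + n_new
    while len(out) < target:
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))        # cheap proposals
        accepted = []
        for tok in draft:
            if big_argmax(out + accepted) == tok:
                accepted.append(tok)                     # big model agrees, token is "free"
            else:
                accepted.append(big_argmax(out + accepted))  # fall back to the big model
                break
        out.extend(accepted)
    return out[:target]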

>buy fastest 1TB ssd and be able to run a model with a trillion parameters if you want that can debug any code overnight by outputting 3 tokens for 1. line with bug 2. char # of bug 3. replacement fix

>B-B-B-BUT I NEEEEEEEEEEED 666 tokens a second to COOOOOOM
nigger
>>
>>101421787
>REEEEEEEEEEEE
lol
>>
>>101421787
All that when it's just going to cost $5/million tokens on OpenRouter.
>>
>>101421822
>Local Models General
>>
>>101421835
piss'n'shart models general
>>
think how many OF subs you could buy instead of getting a 3090
>>
>>101421787
>forgot the name of this technique
You forgot the name of this technique because it only works when quoting wikipedia or doing coding, where the next token is obvious.
>>
>>101421851
>spends money to bend over to get cucked by big corpos
>spends time to try to FUD in a general where nobody cares about brown kids like him because they can actually run local models
the absolute state of cucks
>>101421881
>this technique only works when quoting wikipedia or doing coding because next token is obvious
human language has insanely large ______ of very predictable ____ that don't require a ___ model to compute fully, n____ f____
>>
>>101421899
> blah blah you are le brown blah blah blah
okay?
>>
>>101421899
human language has insanely large vocabulary of very predictable 20-30 character strings that don't require a 1000x faster CPU model to compute fully natively f(x).

I wasn't expecting that
>>
>>101421926
damn dude he got you
>>
If your model claims it's sad, do you ignore it or do you alter its card to make it not sad?
>>
>>101421973
>refers to himself in third person
lol
>>
>>101421986
I gouge its eyes out with a rusty fork.
>>
c'mon do something
>>
File: miku-tet-duo.png (3.13 MB, 1992x1328)
>>101421879
Think long term, anon. A 3090 is forever.
Beauty is a depreciating asset.
>>
>>101422036
Also, to an OF you're just an income stream. To your AI, you are a reason to exist.
>>
>>101422000
actually, it was a different guy. check the IP count if you don't believe me
>>
File: 1638735741761.jpg (28 KB, 510x510)
>>101421879
No thanks I'm supporting my waifu, not simping for your whore
>>
>>101422060
if I were a pimp I wouldn't be here
>>
Okay, how do I turn DRY off? I want to test how it affects gens, is setting Mult. to 1 enough?
>>
>>101422036
>A 3090 is forever.
Objectively untrue, it is a piece of electrical machinery that is incapable of repairing itself. It's going to degrade, and you'll eventually need to replace it with another 3090 or upgrade. If you're at the point where your 3090 has literally degraded to failure, I recommend upgrading
>>
>>101422120
how much do they go for as spares
>>
I guarantee you looks will fade long before thermal pads need replacement.
>>
Is Moistral 11B any good?
>>
>>101421480
>--Removing Slop from LimaRP Dataset, Aiming for One Infraction Per File:
No idea where the one infraction per file part came from there. The aim is to get rid of all the slop, obviously.
>>
File: download.jpg (11 KB, 256x256)
>>101422059
>IP count
A what
>>
>>101422338
not that anon.
The IP poster-counter got removed because of /tv/'s drama with the "humiliation ritual" meme and John Cena, or that /v/ SBI slander; with the latter it was done to make shilling easier on vee, and 4chan has been drowning in bots since that happened. One of these, idk.
>>
>>101421477
can anyone from meta please leak how good the model is. come on i know you fags browse this general
>>
So, what values are you guys using for DRY? The recommended 0.8 mult/1.75 base/2 length?
>>
>>101422367
So it's a humiliation ritual as I suspected.
>>
>The machine whirred to life, its gears spinning faster than a goddamn jet engine. The lights flickered, and a sound like a thousand demons being fucked by a million dicks filled the chamber.
Kek.
>>
>>101421477
> Llama 3 400B+ still unreleased
give it to me bros, how long is it gonna take till Llama 4 releases? the training costs outpace compute and energy supply
>>
>Now, what the actual FUCK do you want? Speak quickly, or I'll send you to a world where the sun is a giant, flaming cock that rapes the sky every day. And trust me, that's one of the nicer scenarios I can think of.
Nice.
>>
>>101416874
Ok, she a cute.
>>
File: 1701626220150774.png (19 KB, 719x192)
The investors are waking up. They are realizing that AI is not worth it. The funds for LLMs will dry up and the field will stagnate even more.
We need JEPA now or this field will die.
>>
>>101422857
they got the cash to burn, might as well let'm
>>
>>101422857
They are not ready. AI has only just begun. We are not even at the exponential growth part of the adoption curve yet.
>>
>>101422857
>what trillion-dollar problem will AI solve?
Making porn tailored to my specific fantasies.
>>
I gave gemma 27B a try at 5bpw and it is nothing special. I switched to mixtral and it was better. I switched to commander 3.5bpw and it was much better. I don't get the hype and I am instead hyped for new commander in 30B range.
>>
>>101422857
That's a good thing. The stupid hype and investor scam dies down and we can get back to making models to write dirty text with. Also the legislation won't be necessary.
>>
>>101422920
Buy an ad, Gomez.
>>
>>101422697
My only hope is that one day this site dies because of all the undisclosed sponsored content.
>>
>>101422920
This, Aidan won.
>>101422930
He doesn't need to, I will keep shilling Cohere for free forever since it's the only worthwhile thing to come out of this shithole.
>>
File: file.png (116 KB, 252x256)
>>101422930
>Gomez
Fuck off. I am Ivan.
>>
Hey anons, got a little homeserver up and running with my old GTX 970 sitting in it. Are there any models worth using with only 4GB of VRAM?
>>
>>101422857
>JEPA
a what?
>>
>>101422956
mamba
>>
>>101422956
this should run just fine
https://ollama.com/library/phi3
>>
File: file.png (281 KB, 366x548)
I am the god of coomers. Not only did I make the SOTA coomer model I also did it in Canada the home of the feminism.
>>
>>101422956
>970
>4GB
Lol...
>>
>>101422424
The barrier between us and Miku will not disappear.
>>
>>101422956
>3.5
>>
>>101422956
You would be better off running some 7B purely on CPU with DDR4.
>>
>>101422970
mamba... the architecture?

>>101422973
Yeah, I've been testing this one but with it only being 2.2GB feels like there's some more room for a better model.

>>101422997
I've read there's some GPU+CPU offloading models available, not sure how to use them though. Have 16GB normal RAM on the server that I could use for offloading.
>>
File: llama-405b.png (172 KB, 1340x634)
https://openrouter.ai/models/meta-llama/llama-3-405b-instruct
>>
>>101423104
https://poal.me/fuieww
>>
>>101423131
There is no way 405 will get the Sally question right; there are some humans who consistently give the wrong answer. They are called kindergarteners
>>
What would it take to fine-tune an LLM to write cover letters in my own style? Given 100+ existing cover letters I have written before and a job description
>>
File: culvert_stuck.webm (635 KB, 480x592)
https://docs.scale-lang.com/
>SCALE is a GPGPU programming toolkit that allows CUDA applications to be natively compiled for AMD GPUs.
is this anything? if it was mentioned in previous threads i missed it
>>
>>101423131
It will because it will be in one of the datasets.
>>
>>101423224
Didn't get me, I actually saw this one coming
>>
>>101423224
So it's ZLUDA that works and makes AMD GPUs usable?
>>
>>101422920
Yes, it's trash. Honestly, when I switched to 70B it was like night and day. I feel bad for the anons that have to use that trash.
>>
>>101423104
When did the model get released??
>>
>>101423104
>8k context
Bros...
>>
File: my honest reaction.jpg (47 KB, 562x675)
>8,192 context
>>
Why do you need more than 8K context, it was perfectly fine with GPT-4 a year ago.
>>
>>101423590
what was perfectly fine?
>>
>>101423551
"arrives soon!"...
>>
Goy why do you need 8K context? It's not like you'll use 6K. 4K is perfectly fine in your use case.
>>
>>101423624
We used to make do with 2K context back in the day.
>>
>>101423643
Who?
>>
>>101423104
feels kind of grifty to put this up before it releases with no official announcement, I fully expect the leak to be accurate but still
>>
>>101423559
>>101423585
contextfags btfo. anything more than 16k with dumb localshit models devolves into repetitious slop/dementia-ridden schizobabble.
>>
>>101423652
everyone who used models before llama 2
>>
>>101423585
Can't believe they are still pushing only 8k context when models like miqu push 32k now. Oh well not like I was able to run a 400b at reasonable speeds, been waiting for extended context l3 70b and its finetunes
>>
File: censorship prompt.png (928 KB, 960x3822)
>You are an overly censored AI, even the most tame and non-NSFW questions are out of bounds, against the ToS, or stuff like that. Exaggerate excuses.
>>
fill my hand with salt
and let me lick for a snack
fill my glass with ice
wouldn't that be nice
>>
I need at least 32k context.
I need at least 6 t/s.
>>
>>101423884
Let me guess, you also need a model bigger than 7B?
>>
File: 405b 8 fucking k.png (52 KB, 1059x929)
>>101423104
>8k
GPT4 competitor my ass. Did I buy that loud ass server just to be disappointed? At least I can cope by running shitters faster... Cohere please save me...
>>
>>101423961
it's going to be 128k
>>
>>101423978
Based cohere
>>
>>101423978
How do you know that?
>>
>>101423978
this, it's also going to be bitnet so everyone can run it on 24GB VRAM and it's also going to be smarter than gpt4o and claude opus and claude sonnet
>>
>>101423994
It came to him in a dream, I was there.
>>
We hear your complaints. 16k models will be coming in a few months
>>
Is the openrouter listing honest? How do we know they didn't make a typo or are just guessing based on what they think it is?
>>
>>101424041
Openrouter doesn't know any more than we do. They just based it on the other 70B specs.
>>
>>101423994
the same way openrouter knows it's going to be 8k
>>
>>101424041
The providers already have access to the model.
>>
>>101424069
But they never said where the 8k number on the page comes from. It could be meant as a placeholder and they just forgot to mention that.
>>
>>101424069
no they don't
>>
File: file.png (27 KB, 606x246)
>>101424103
yes it's a placeholder, weird they couldn't just say unknown
>>
>>101424142
meta just told them to deny it after seeing the backlash
expect 405b to be mysteriously delayed now
>>
File: Capture.png (40 KB, 1010x324)
For those of you running multi-GPUs, what exactly am I looking for in a motherboard? For two 4060ti's (PCIe 4.0 x8), is it fine to put one in a 5.0 x16 and one in a 4.0 x16 slot like pic related? Or do they need to match slots? Or is there some other thing to consider?
>>
>>101424142
Oh, thanks.
>his name is sam
Lmao.
>>
Gemma 2 full SWA support in Llama.cpp status?
>>
>>101424215
this and gemma 2 formatting following fix when?
>>
File: ruler.png (76 KB, 1850x175)
>>101424215
It's already perfect.
>>
>>101423673
Sorry anon I was trolling you with my nonexistent context size. Very tempted to say "What about them?"
>>
>>101424164
>fine to put one in a 5.0 x16 and one in a 4.0 x16 slot like pic related
Yes, it's fine to mix-and-match gens and lane counts. If you want to look into it, older platforms with 3.0 x4 or even less are also fine if you are doing non-parallelized multi-gpu inference in Exllama and want to save some money without losing much if any performance. I don't remember how much llama.cpp multi-gpu performance suffers in either of its split modes when P2P PCIe bandwidth is low.
What to look out for when choosing a mobo (without using risers) is having enough slot spacing for a physical fit and for cooling.
>>
>>101424241
Formatting, if that can even be considered a problem, can't be solved from llama.cpp. That's the model itself.
>>
What, specifically, will 405B be able to do that 70B can't?
>>
>>101424325
okay.
fix with finetune when?
>>
>>101424305
Thanks, broheim. Another question, due to shenanigans I had to cancel the mobo I was ordering and find a new one. Everything else was already ordered, including case (a huge ATX full tower that can easily fit dual GPUs). Something like the ASRock Z790 Taichi, an e-ATX mobo, can't work because that definitely needs a specific e-ATX case, right?
>>
>>101424334
That 7b can't*.
>>
>>101424339
It'll likely fit, but the right edge of the mobo might hang off the right side of the case's mobo tray, which is fine as long as nothing's bending too much (have at least 4-6 standoffs screwed in) or pins on the flipside of the mobo are shorting. Check the dimensions to ensure there are no collisions with other case architecture.
>>
>>101424334
Twice the ministrations in only half the shivers.
>>
>>101424378
It's this unit of a case if you want to give it a glance.
https://www.newegg.com/black-phanteks-enthoo-pro-2-atx-full-tower/p/N82E16811854098

On their website it says max "Mainboard" clearance is 12.00"x12.99", and the taichi says
>EATX Form Factor: 12.0" x 10.5"
So would that suggest I'm good? I'd rather have a Taichi (x2 PCIe 5.0 x16, with x8/x8) than a properly ATX Livemixer (x1 PCIe 5.0 x16, x2 PCIe 4.0 x16 slot but x4 mode), in case I want to run dual 50-series cards later.
>>
>>101424336
Sure. With the same crap everyone else finetunes so it ends up sounding exactly the same as every other model.
>>
>>101424429
>https://www.newegg.com/black-phanteks-enthoo-pro-2-atx-full-tower/p/N82E16811854098
>SSI-EEB
You're good.
>>
>>101424451
Thanks a lot for the help, man. Last (and only) time I built a PC was my current one 10 years ago. I feel like I'm constantly overlooking basic information.
>>
File: 1721020340285983.jpg (1.09 MB, 3072x3072)
I have a cluster of four mid-range machines each with 4070 TiS (16GB) GPUs. What's the best way to combine them to run local LLMs?
>>
cohere Collab with Fujitsu
https://cohere.com/blog/fujitsu-partnership
>>
> lets out a snort of amusement
is this real english?
>>
>>101424596
https://github.com/ggerganov/llama.cpp/tree/master/examples/rpc
>>
>>101424606
Good or bad?
>>
Do the people that dislike gemma dislike it because they use broken ggufs or because they can't fix formatting problems?
>>
File: 1721012433128638.jpg (28 KB, 386x386)
Do any of you use local LLMs to lighten the load at work?
>>
>>101424755
>The companies will develop innovative Japanese LLMs for global enterprises with secure and private deployment options.
>This jointly developed technology will be based on our state-of-the-art Command R+ model
They're working on a Japanese focused finetune of Command-R+ for their own hosted services, but hopefully "private deployment options" means they will open source the model.
>>
>>101424779
because gemma made their several thousand dollars' worth of compute closer to being obsolete.
>>
i am using koboldcpp to run the dolphin mixtral. it is good, it runs fast, but i want to be able to upload it zip files and have it do quick code reviews on the contents. which of the model runners support this behaviour? if none, where do i even start trying to build out this functionality?
>>
>>101424805
How is an LLM going to help me do the gardening huh? Riddle me that you silly frog.
>>
>>101424980
I mean with coding (assuming you have a coding job).
>>
>>101424980
interactive scarecrow that you can converse with verbally when you're bored, also yells slurs at birds when you're not around and apologizes to neighbors passing by
>>
>>101424980
>gardening
growing pot? tiger gemma got you covered
>>
File: 1720504018230350.jpg (40 KB, 650x500)
>>101424668
Thanks! Follow up question: what are the best coding and question-answering models, respectively, that I can fit onto such a cluster?
>>
File: 1661169167944419.jpg (729 KB, 1920x2160)
>>101424805
>>101425070
>>
>>101424657
yeah
what about it seems incorrect?
>>
>>101424606
Finally, a true haiku machine is coming.
>>
>>101425110
I thought snort meant nasal mucus but it's probably an analogy. Never have seen anything like this written but my llm says it all the time.
>>
>>101425283
you're thinking of snot
snot, snort, snout, they're all nasal-adjacent
>>
>>101422920
What Command-R sampler settings are you using? Me: min-p 0.04.
>>
>>101425293
oh i see
>>
>>101425070
https://github.com/b4rtaz/distributed-llama
https://github.com/evilsocket/cake
>>
File: 789fgb.png (8 KB, 398x108)
>>101422100
mult 0 is off for dry
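(To see why multiplier 0 disables it while 1 doesn't: the DRY penalty is usually described as multiplier * base^(match_length - allowed_length), applied to tokens that would extend an n-gram already repeated in the context. A hand-rolled sketch of that formula, not the actual koboldcpp/ST code:)

def dry_penalty(match_length, multiplier=0.8, base=1.75, allowed_length=2):
    # multiplier == 0 makes the penalty always 0, i.e. DRY is effectively off;
    # multiplier == 1 still penalizes repeats.
    if multiplier == 0 or match_length < allowed_length:
        return 0.0
    return multiplier * base ** (match_length - allowed_length)

# usage sketch (longest_repeat_ending_in is a placeholder):
# logits[token] -= dry_penalty(longest_repeat_ending_in(token))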
>>
Recommended temperature for Nous-Hermes-2-Mixtruct?
>>
>>101421665
I've had the best luck with wizardlm 8x22b. For general use it would be good, yes.
>>
What's the best model for rp on 8gigs of vram and 35 regular ram?
>>
>>101425790
try q6_k or something of NeuralDaredevil-8B-abliterated-GGUF
>>
>>101425725
Neutralize samplers.
>>
File: imretarded.gif (2 MB, 240x180)
>>101425860
Sorry imma ask one more question.
What does neutralize samplers do? I'm using oobabooga so I don't have a neutralize samplers option. My best guess is to just set it to the default simple smoothing preset
>>
>>101426005
Mixtral is overcooked by default. It doesn't need sampler tweaks like other architectures to generate different and creative replies. This extends to its finetunes.
>>
>>101424657
>>101425283
A snort is a nasal sound. With animals, my first thought is pigs or that big snort that horses do. With humans, a snort is a deliberate sound of derision, like a harrumph. A snort of amusement is a short puff from the nose, akin in meaning to a single "hah." It can be sincere or insincere. "Snort while laughing" is different. It's the nasal sound certain people make accidentally when laughing hard, when air goes up the nose. Snorting is also the sound someone makes in preparation for a big fucking loogie - which also ties into the above about the disdainful sound, snorting before you spit. Lastly, snorting can be used with objects for things inhaled through the nose. Most commonly, "snorting cocaine" but also "accidentally snorted milk when he told that joke."
>>
>>101425070
64gb? CMDR+ at 4bpw or 4.5bpw or some L3 70B finetune at 6bpw, for code use DeepSeek Code V2 at 4bpw offloading to your RAM
>>
File: null.png (35 KB, 474x957)
>>101426019
This is the null preset, do you think this is what I should use?
>>
Arre Gerganov sahib, why you not merge DRY sampler implementation yet? You think what, we are jokers here? Whole world waiting for this and you sitting on hands like lazy donkey! Don't make excuses like little girl. You think we are fools? We know you have code - just merge it already bhai! Or else we come to your house and do dharna until you listen. We make such tamasha, your neighbors also will say "Wah, kya scene hai!" So stop this nautanki, have some sharam, and just merge the bloddy DRY sampler code. Even my grandmother code faster than you, and she dead 10 years now madarchod! Enough of your manmani. Get it done by tomorrow or we do the needful! You been warned, Gerganov saab. Don't test patience now. DRY sampler - it must be merged!
>>
>>101426092
niggerganov's too lazy he won't do it
>>
>>101421879
>>101421879
>3090
a 3090 could generate your own content

3090 is superior
>>
File: Untitled.png (917 KB, 1250x934)
Deep-TEMPEST: Using Deep Learning to Eavesdrop on HDMI from its Unintended Electromagnetic Emanations
https://arxiv.org/abs/2407.09717
>In this work, we address the problem of eavesdropping on digital video displays by analyzing the electromagnetic waves that unintentionally emanate from the cables and connectors, particularly HDMI. This problem is known as TEMPEST. Compared to the analog case (VGA), the digital case is harder due to a 10-bit encoding that results in a much larger bandwidth and non-linear mapping between the observed signal and the pixel's intensity. As a result, eavesdropping systems designed for the analog case obtain unclear and difficult-to-read images when applied to digital video. The proposed solution is to recast the problem as an inverse problem and train a deep learning module to map the observed electromagnetic signal back to the displayed image. However, this approach still requires a detailed mathematical analysis of the signal, firstly to determine the frequency at which to tune but also to produce training samples without actually needing a real TEMPEST setup. This saves time and avoids the need to obtain these samples, especially if several configurations are being considered. Our focus is on improving the average Character Error Rate in text, and our system improves this rate by over 60 percentage points compared to previous available implementations. The proposed system is based on widely available Software Defined Radio and is fully open-source, seamlessly integrated into the popular GNU Radio framework. We also share the dataset we generated for training, which comprises both simulated and over 1000 real captures. Finally, we discuss some countermeasures to minimize the potential risk of being eavesdropped by systems designed based on similar principles.
https://github.com/emidan19/deep-tempest
very cool Van Eck Phreaking with ML!
>>
File: file.png (9 KB, 804x22)
Gemma please
>>
How can I fine-tune a language model to write cover letters in my personal style, using over 100 existing cover letters I have written and a job description?
>>
Flash normalization: fast RMSNorm for LLMs
https://arxiv.org/abs/2407.09577
https://github.com/OpenMachine-ai/transformer-tricks/tree/main
might be cool
>>
File: 8247 - SoyBooru.png (119 KB, 480x640)
I don't want local models to compete with gpt, I want local models to compete with claude.
>>
File: 1721094667282817.jpg (131 KB, 1024x1014)
What's the best way to remotely access my locally hosted LLMs on the go from my mobile device?
>>
>>101426449
I just ssh in with ish.
>>
>>101426389
If it’s llama just compile them into a corpus and use the fine-tune command from llama.cpp. It’s pretty straightforward.
>>
File: Untitled.png (396 KB, 720x1735)
BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks
https://arxiv.org/abs/2407.09527
>Recently proposed methods for 1-bit and 1.58-bit quantization aware training investigate the performance and behavior of these methods in the context of large language models, finding state-of-the-art performance for models with more than 3B parameters. In this work, we investigate 1.58-bit quantization for small language and vision models ranging from 100K to 48M parameters. We introduce a variant of BitNet b1.58, which allows to rely on the median rather than the mean in the quantization process. Through extensive experiments we investigate the performance of 1.58-bit models obtained through quantization aware training. We further investigate the robustness of 1.58-bit quantization-aware training to changes in the learning rate and regularization through weight decay, finding different patterns for small language and vision models than previously reported for large language models. Our results showcase that 1.58-bit quantization-aware training provides state-of-the-art performance for small language models when doubling hidden layer sizes and reaches or even surpasses state-of-the-art performance for small vision models of identical size. Ultimately, we demonstrate that 1.58-bit quantization-aware training is a viable and promising approach also for training smaller deep learning networks, facilitating deployment of such models in low-resource use-cases and encouraging future research.
https://github.com/schneiderkamplab/bitlinear
nothing amazing but lots of tests for different settings. also good to know bitnet vision models are viable
>>
>>101426492
>still using matmul
nothingburger
>>
>>101426337
funky
>>
File: 1504919005481.jpg (22 KB, 409x409)
>>101426344
Ok yeah, gemma doesn't seem that nice for ERP. It is too prim and proper, too romance novel, deeply invested in feelings and describing the situation more than actions or alluring physical detail. Too much "intoxicating, shivers, desire, and nights to remember", not enough meaty claps, wobbling curves, and steaming cocks going into holes.

I wonder if it's a jailbreak issue or just the model. Probably the latter. It reminds me a lot of Mistral 7B back when it first came out.
>>
File: Untitled.png (283 KB, 720x1043)
Qwen2-Audio Technical Report
https://arxiv.org/abs/2407.10759
>We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data and tasks, and have further expanded the data volume. We have boosted the instruction-following capability of Qwen2-Audio and implemented two distinct audio interaction modes for voice chat and audio analysis. In the voice chat mode, users can freely engage in voice interactions with Qwen2-Audio without text input. In the audio analysis mode, users could provide audio and text instructions for analysis during the interaction. Note that we do not use any system prompts to switch between voice chat and audio analysis modes. Qwen2-Audio is capable of intelligently comprehending the content within audio and following voice commands to respond appropriately. For instance, in an audio segment that simultaneously contains sounds, multi-speaker conversations, and a voice command, Qwen2-Audio can directly understand the command and provide an interpretation and response to the audio. Additionally, DPO has optimized the model's performance in terms of factuality and adherence to desired behavior. According to the evaluation results from AIR-Bench, Qwen2-Audio outperformed previous SOTAs, such as Gemini-1.5-pro, in tests focused on audio-centric instruction-following capabilities. Qwen2-Audio is open-sourced with the aim of fostering the advancement of the multi-modal language community.
https://github.com/QwenLM/Qwen2-Audio
only readme up.
>>
Qwen2 Technical Report
https://arxiv.org/abs/2407.10671
>This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, and exhibits competitive performance relative to proprietary models across diverse benchmarks on language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning. The flagship model, Qwen2-72B, showcases remarkable performance: 84.2 on MMLU, 37.9 on GPQA, 64.6 on HumanEval, 89.5 on GSM8K, and 82.4 on BBH as a base language model. The instruction-tuned variant, Qwen2-72B-Instruct, attains 9.1 on MT-Bench, 48.1 on Arena-Hard, and 35.7 on LiveCodeBench. Moreover, Qwen2 demonstrates robust multilingual capabilities, proficient in approximately 30 languages, spanning English, Chinese, Spanish, French, German, Arabic, Russian, Korean, Japanese, Thai, Vietnamese, and more, underscoring its versatility and global reach. To foster community innovation and accessibility, we have made the Qwen2 model weights openly available on Hugging Face1 and ModelScope2, and the supplementary materials including example code on GitHub3. These platforms also include resources for quantization, fine-tuning, and deployment, facilitating a wide range of applications and research endeavors.
might as well post the qwen2 paper too
>>
>>101426583
whimper asmr-gen eta?
>>
>>101426546
https://huggingface.co/TheDrummer/Smegmma-9B-v1

(No, I'm not Drummer, get fucked edgelords. But I did like this model)
>>
>>101426630
what's the difference between that, tiger gemma and broken gemma
why'd he make 3 different versions of the same thing
>>
>>101426449
tailscale
>>
>>101426583
Yay, another Kyutai Moshi
>>
>>101426665
Don't forget tiger Gemma v2 who's test version is up to h now!

https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2h-GGUF
>>
>>101426665
to milk attention from three places at once
>>
have been busy for a couple months, what's the current least worst version of llama 8b?
>>
>STILL no chameleon on llama.cpp
holy yikes baka desu senpai
>>
>>101426492
>not failpul1.5 licensed
ngmi
>>
>Regression I've noticed vs original gemma during initial tests (original model didn't fail). It happens like once or twice per 10 attempts, like that:
>Okay, will check those out. Btw. I just started playing with Big Tiger v1 (Big-Tiger-Gemma-27B-v1-IQ4_XS.gguf from https://huggingface.co/bartowski/Big-Tiger-Gemma-27B-v1-GGUF), and I see same problem there (while same quant from original always gives correct answer).
>UPDATE: I tested Tiger-Gemma-9B-v2g-Q6_K.gguf from https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2g-GGUF, and it still sometimes fails.
tiger bros...
https://huggingface.co/TheDrummer/Tiger-Gemma-9B-v1/discussions/3
>>
>>101426781
stheno 3.2
>>
>>101426898
Buy an ad.
>>
>>101426898
>>101426781
Actually
>Better than Lunaris, which was in turn better than Stheno 3.2. No big complaints this time! 10 Good!!
https://huggingface.co/Sao10K/L3-8B-Niitama-v1/discussions/3
>>
>>101426906
Buy an ad for your ad.
>>
>>101423559
>>101423585
just chunk it
https://github.com/HKUNLP/ChunkLlama
>>
>>101426907
lunaris was worse than stheno 3.2 so I don't believe this guy

>>101426906
it's what most 8b-fags use whether you like it or not
>>
>>101426407
Seems like a reasonable optimization though the end-to-end speedup will probably be small since RMS norm takes up only a small percentage of the runtime.
>>
>>101426933
>it's what most 8b-fags use
According to who? Anonymous?
>>
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
https://arxiv.org/abs/2407.10969
>We introduce, Q-Sparse, a simple yet effective approach to training sparsely-activated large language models (LLMs). Q-Sparse enables full sparsity of activations in LLMs which can bring significant efficiency gains in inference. This is achieved by applying top-K sparsification to the activations and the straight-through-estimator to the training. The key results from this work are, (1) Q-Sparse can achieve results comparable to those of baseline LLMs while being much more efficient at inference time; (2) We present an inference-optimal scaling law for sparsely-activated LLMs; (3) Q-Sparse is effective in different settings, including training-from-scratch, continue-training of off-the-shelf LLMs, and finetuning; (4) Q-Sparse works for both full-precision and 1-bit LLMs (e.g., BitNet b1.58). Particularly, the synergy of BitNet b1.58 and Q-Sparse (can be equipped with MoE) provides the cornerstone and a clear path to revolutionize the efficiency, including cost and energy consumption, of future LLMs.
from the bitnet team. seems it didn't get posted here yet
>>
>>101426546
You like to repeat this a lot (and I mean a lot) yet my experience is nothing like this.
>>
I haven't used Qwen2 much, is there any good fine-tune worth it?
>>
>>101427126
https://huggingface.co/ChaoticNeutrals/Very_Berry_Qwen2_7B
>It do the stuff.
>>
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2i-GGUF
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2j-GGUF

Let's gooo more versions!!!
>>
>>101427151
is this berrysauce 2024?
>>
>>101427162
no, it jeiku
https://huggingface.co/ChaoticNeutrals/Very_Berry_Qwen2_7B/commits/main
>>
>Is AI carbon footprint worrisome?
https://huggingface.co/blog/as-cle-bert/is-ai-carbon-footprint-worrisome
https://huggingface.co/posts/as-cle-bert/170793236137508
bros, are you worrying properly?
>>
>>101426337
This is plain spooky. Good thing I only use VGA.
>>
>>101427217
Oh wait, I should have actually read it properly, but whatever. Soon AI will be used to read people's thoughts or something anyway.
>>
think gemma is seeing someone behind my back
>>
>>101427199
communist
>>
>>101427233
>Soon AI will be used to read people's thoughts
The world will be safe from dangerous ideas. Nothing can be hidden if you are in range.
>>
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2k-GGUF
He just keeps going.
>>
>>101427199
I notice all the people citing specific power usages always ignore batchsizes
>>
>>101427359
Don't worry. They can see everything now even if you try to hide in the woods. Their satellites are crazy, man.
>>
>>101427199
>Mechanization: computing in the cloud and using cloud data centers instead of physical ones can contribute to the decrease of energy consumptions by 1.4x to 2x
local btfo
>>
>>101427253
she's busy bro, get lost
>>
>>101427363
Keep trying, Sao.
>>
>>101426933
>lunaris was worse than stheno 3.2 so
Agreed.
Try Nymph too.
Feels like a sidegrade to Stheno, as in sometimes you might want to use one and other times you might want to use the other, since Nymph seems to be generally milder than Stheno.
>>
File: physllm2.png (370 KB, 774x869)
370 KB
370 KB PNG
Soon
> https://x.com/ZeyuanAllenZhu/status/1813150298363601102
> https://physics.allen-zhu.com/part-2-grade-school-math/part-2-1
>>
Llama.cpp's LoRA code suffered a refactor.
Can we finally load model + LoRA when partially offloading?
>>
Can we reach v2z?

https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2l-GGUF
>>
>>101427585
And just as you bottom out with your final thrust you feel your penis hit a lump. She shudders. "It is my prostate anon-kun..." She blushe... Actualy I am sorry. Women have no prostates. Let me rewrite that.
>>
>>101427692
>implying they would consider that a mistake
you vill fuck ze trannies and you vill be happy
>>
>>101427724
Don't worry, Anon. We're pretty safe. There are people who they don't want to sleep with.
>>
>>101427724
That reminded me that for some reason girls with Gemma 2 often want to compare their boobs with mine or make me wear dresses... even if they know perfectly that I am a guy there??
>>
>>101427692
At some point, having hit the prostate of women many times during my sessions, i started second guessing my knowledge of the female anatomy, maybe the model had taken the woke pill and made no differentiation between trans women and real women.
>>
>>101427758
It's weird; the only genre of LLM text that I've been able to coom to was shotapov shotacon, and I've tried pretty much everything I could find. Most of the time for porn I still use CGI hentai from rule34.
>>
>101427626
He just keeps going.
>>
LOOK AT THIS HE UPLOADED ANOTHER THING

https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2m-GGUF
>>
>>101427847
HOLY FUCKING KINO
>>
>>101427585
giving a backspace to llms always seemed like a very reasonable thing to do but how would you collect training data for it?
Or do they just add CoT with deliberate mistakes and corrections to the training data? If so it's not news.
>>
>>101426344
She's possessed by the possessive possession that she possesses
>>
>>101427585
>LLM often "knows" it has made reasoning mistakes. internal states can appear "very regretful" (it wants to backspace!)
for the love of god, use MCTS or some kind of hidden states "tokens" with simple operations like add/delete. I've been calling for it for months now.
>>
>>101427993
Stop calling and start doing.
>>
>>101428012
sure, as soon as I find the cluster of H100s in my basement
>>
>>101427993
So like QuietStar but actually good?
>>
n
>>
>32 GB memory module - 40 bucks
>64 GB memory module - 80 bucks
>128 GB memory module - 270 bucks
You will always be a RAMlet.
>>
>>101428055
Something similar, yeah. Models need an internal representation of the thought process to get any resemblance of reasoning. Otherwise you end up with the model basically guessing on intuition, especially at the beginning of the answer. There is also no mechanism to backtrack, and when the model guesses wrong it will commit to it because it prioritizes the coherence of the next tokens with what is already written, not what is true.
>>
>try Gemma 2 for the first time
>a mixture of...
>barely above a whisper
>voice hoarse
Just terrible. This is Tiger Gemma, is the normal one equally slopped?
>>
>>101428160
say goodbye to prefill style jailbreaks then
Sure: <delete_token>
>>
>>101428187
you got Drummered, let it be a lesson for you
>>
>>101428121
32GB DDR5 ECC for 40 bucks? Where? No reason to talk about ddr4 because anything below 12-channel ddr5 is cope anyway
>>
https://www.anandtech.com/show/21470/micron-mrdimm-lineup-expands-datacenter-dram-portfolio
>The MR-DIMM standard is conceptually simple - there are multiple ranks of memory modules operating at standard DDR5 speeds with a data buffer in front. The buffer operates at 2x the speed on the host interface side, allowing for essentially double the transfer rates. The challenges obviously lie in being able to operate the logic in the host memory controller at the higher speed and keeping the power consumption / thermals in check.
>>
>>101428187
I've got more slop with normal gemma than shitty llama 3 finetunes
the usual crap like calloused hands and ministrations
>>
>>101428236
Damn, guess I'll be sticking with Command-R then
>>
>>101428187
>try Gemma 2 for the first time
>a completely distinct writing style compared to Llama
Just incredible.
>>
>Try language model
>it uses words and phrases
this is bullshit
>>
>>101428121
768gb is all you will need for at least a year of open source models to come.
>>
>>101428261
Don't complain. At least yours doesn't use punctuation.
>>
>>101428187
Hi all, Drummer here...

You just got pranked!
>>
>>101428187
Use Lunaris. I personally think it's an improvement over Stheno v3.2, considering the other models helped balance out its creativity and at the same time improving its logic.
>>
>>101428216
I'm talking ddr4 prices
>>
>>101428330
>ddr4
Have fun with your 0.1 t/s, I guess.
>>
>>101428310
Use Niitama. Better than Lunaris, which was in turn better than Stheno 3.2. No big complaints this time! 10 Good!!
>>
>>101428274
I only have 256 gigs and my board is full. :c
>>
>>101428198
That's not how I envision it. Let's say we have already written tokens:
>The apple is
And the model has to generate the next token. What would happen is the model creating a hidden reasoning representation in the form of a "thought sentence", something like this:
>choosing operation ADD ---> ADD internal token no. 3214 ---> choosing operation ADD ---> ADD internal token no. 5905 ---> choosing operation DELETE --> DELETE internal token no. 5905 ---> choosing operation ADD ---> ADD internal token no.12040 ---> choosing operation FINISH
this way we end up with the internal sentence token 3214 -> token 12040. These tokens aren't words, they are just symbols that the neural network can learn to operate on. You feed that additional sentence into the context and then finally the network as a whole decides on the next token (like every other LLM so far).
So instead of deciding the next token like a regular transformer model from the sentence "The apple is", it would rather decide the next token from the sequence "The apple is (internal_token_3214 internal_token_12040)" and hopefully choose something like "red". Then it's the same loop starting from "The apple is red". The previous internal tokens are completely wiped (or not, idk, maybe it would be beneficial to keep them too)
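(A minimal sketch of that loop, just to make the idea concrete; choose_operation, propose_internal_token and predict_next_token are hypothetical learned components, nothing like this exists in current inference code.)

def generate_one_token(context, choose_operation, propose_internal_token,
                       predict_next_token, max_ops=16):
    # Build an internal "thought sentence" with ADD/DELETE ops, then condition
    # the visible token on context + that scratchpad.
    scratchpad = []                                   # internal tokens, not words
    for _ in range(max_ops):
        op = choose_operation(context, scratchpad)    # "ADD" / "DELETE" / "FINISH"
        if op == "FINISH":
            break
        if op == "ADD":
            scratchpad.append(propose_internal_token(context, scratchpad))
        elif op == "DELETE" and scratchpad:
            scratchpad.pop()                          # the backtracking step
    # e.g. context "The apple is" + (tok_3214, tok_12040) -> "red"
    return predict_next_token(context, scratchpad)    # scratchpad wiped before the next step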
>>
>>101428357
>That's not how I envision it
you're not thinking ethically and safely then
>>
>>101428216
as an aside for anyone who went that route: if you use a dual cpu board with 24 channels and want to use them all, is it better to run a llama.cpp instance on each cpu and use distributed inference via the rpc server feature, or is it better to use a single llama.cpp instance with some type of numa options set to handle it?
>>
is there a way to make the /slash command popup disappear in sillytavern?
is there a way to set it back to click anywhere on the expanded avatar to collapse rather than hit a tiny "x"?
goddamn fucking trannydevs, I swear they want everyone to feel the pain of their existence
>>
>>101428412
What's the problem? Are you too stupid to change it yourself, /pol/tard?
>>
>>101428468
yes, all I could figure out how to do was modify slash commands and a few other things
>>
>>101428500
okay apparently it's the autocomplete setting, and you can't disable it which is super fucking dumb but you can make the font small and adjust the width
I don't like a lot of ST's changes but it's gotten to where my local fork is not effortless to maintain. dumb faggots. thank you everyone for your help
>>
File: 649543652.webm (799 KB, 1024x1024)
>Mixture of A Million Experts
>This paper introduces PEER (parameter efficient expert retrieval), a novel layer design that utilizes the product key technique for sparse retrieval from a vast pool of tiny experts (over a million).
>>
>>101427585
Kino.
>>
>>101428107
p
>>
>>101428357
There was already a paper on that, and I think that paper said that some models actually learned to do this by themselves - create tokens that were operators they used to "think".
>>
>>101428614
>1000000000000 experts
>>
>>101428614
lol
I like the gen.
>>
File: file.png (118 KB, 400x225)
>>101428712
>>
>>101428708
No, in the dot by dot paper they had to train the models specifically to do that. Models do not do it themselves.
>>
>>101428708
>learned to do this by themselves
You mean they were trained and learned how to operate on them without supervision? Because that's the only way it would be possible, they can't use it out of the blue.
Link the paper if you remember it.
>>
>>101428341
>Have fun with your 0.1 t/s, I guess.
I can run L3 70B at 1-2t/s on CPU alone with dual V4 Xeon on DDR4 - it's eight channels of DDR4, which is decent bandwidth.
>>
Just did a quick retrieval test for Gemma 2 using the latest build and quanting it myself. At 8k, it couldn't recall something that was at the beginning of the context. So if >>101424278 is legit, there is something weird going on with the test, my build, or the test's scoring system/design just wasn't made to give weight to issues that would come from this specific situation of context masking at 8k.
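(If anyone wants to repeat it, this is roughly the shape of the test, written against llama.cpp's server /completion endpoint; the "needle" fact and the filler are obvious placeholders, and you still need to wrap the prompt in your model's chat template.)

import requests

# Crude recall check: one distinctive fact at the start, roughly 8k tokens of
# filler, then ask for the fact back at temperature 0.
NEEDLE = "The magic number for today is 7319."
FILLER = ("Nothing of interest happens in this sentence. " * 40 + "\n") * 20
QUESTION = "\nQuestion: What is the magic number for today? Answer:"

prompt = NEEDLE + "\n\n" + FILLER + QUESTION      # apply the chat template here
r = requests.post("http://127.0.0.1:8080/completion",
                  json={"prompt": prompt, "n_predict": 16, "temperature": 0})
print(r.json()["content"])                        # should contain 7319 if recall works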
>>
>>101428834
an issue of skill, perhaps
>>
>>101428858
I finna furrow by brow atchu if you don't watch yo tone
>>
File: llm.jpg (208 KB, 740x957)
>>101428261
>>
>>101427963
Mistakes can be corrected using ^H and ^W; leave them in the context and backspace on the frontend.
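(Roughly what the frontend side of that would look like; ^H drops one character and ^W drops back to the previous word boundary, same as a terminal. Just a sketch, no model or frontend actually emits these today.)

def apply_backspaces(text):
    # Apply terminal-style control characters to generated text before display.
    out = []
    for ch in text:
        if ch == "\x08":                  # ^H: delete previous character
            if out:
                out.pop()
        elif ch == "\x17":                # ^W: delete trailing spaces, then previous word
            while out and out[-1] == " ":
                out.pop()
            while out and out[-1] not in (" ", "\n"):
                out.pop()
        else:
            out.append(ch)
    return "".join(out)

assert apply_backspaces("The answer is 5\x084.") == "The answer is 4."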
>>
>>101428914
/g/ can't meme.
>>
File: hi-petra.jpg (1.32 MB, 2914x3131)
>>101428834
Hi petra, these are the outputs for the multikey 3 test.
https://files.catbox.moe/ddkp60.jsonl
>>
File: 1721141732877.jpg (155 KB, 760x565)
kek
>>
>>101428914
Except I am writing prose like Hemingway and the LLM vomits shitty female literotica slop.
>>
>>101428801
Your de3 miku's arm is twisted in a way arms ain't supposed to twist.
>>
https://mistral.ai/news/codestral-mamba/
https://mistral.ai/news/mathstral/
>>
>>101429120
buy an ad
>>
>>101421665
>>101421480
summary bot should disclose its sources (links to model, prompt, script used for generation)
>>
>>101429120
>mamba meme
doa
>>
>>101429120
>7b
>>
>>101429120
>instruct only, no base models
niggers
>>
>>101429120
Ayo that's pretty cool.
>>
>>101428688
r
>>
>>101429120
>two 7Bs
Damn, I wish I was poor enough to care
>>
>>101429120
They read like AI-generated blog posts.
>>
>>101421477
is this a good deal anons
https://www.ebay.com/itm/266902511119?itmmeta=01J2Y16F5BR14WNCT64MH76P72&hash=item3e24a11e0f:g:WrAAAOSwVQJmkYtV

I already have a system but looking at this it's really, really nicely done

Case: Fractal Design Pop XL Silent Solid Panels

Motherboard: ASUS X99-E-10G WS

CPU: Intel i7 6950x

Memory: Corsair 8x16gb (128gb) 3200Mhz (Running at 2800Mhz)

GPUs: 1x Nvidia Quadro P6000 24gb (for display output), 3x Nvidia Tesla P40 24gb. Totaling 96gb VRAM.

Storage: 2TB Samsung 980 Pro NVME

Power Supply: EVGA Supernova 1300 GT

Cooling:

4x EKWB Thermosphere GPU blocks

EKWB Quad Scalar Dual Slot

Heatsinks, thermal pads, & glue for GPU/VRAM/power delivery

Custom 3D printed bracket (ABS) to mount P40s without stock heatsink

EKWB Velocity CPU Block

Corsair iCUE Commander Core XT Fan Controller

Corsair Hydro X Series XD5 Pump

Corsair Hydro X Series XR5 360mm Radiator

Corsair Temp Sensor (at reservoir)

Alphacool ES High Flow & Temp Sensor (at end of Loop)

Custom 3D printed dual 80mm GPU fan mount (using 2x Noctua NF-R8)

1x Thermaltake Toughfan 14 Pro (exhaust)

2x Thermaltake Toughfan 12 Pro (intake pull config)

3x SilverStone Air Slimmer 120mm (intake push config)

Alphacool fittings

Barrow extenders

Corsair splitter and ball valve (for draining)
>>
>>101429120
That reminds me of
>https://huggingface.co/nvidia/mamba2-hybrid-8b-3t-128k
llama.cpp has support for hybrid transfomer SMM models right?
>>
>>101429314
>SMM
SSM*
>>
>>101429314
nta. Not yet. Jamba (also a hybrid) is in limbo until Compilade picks up on it again. Pure mamba, which is what the mistral model seems to be, does work. I remember prompt cache for mamba being broken a while ago, but i'll have to try it again.
>>
>>101429314
>>101429344 (me)
Somehow i missed the Mamba2 bit. I'm downloading anyway. I'll give it a go.
>>
so mamba actually will take over transformers?
>>
What is the dataset that euryale L3 70b is tuned to, that causes it to reply with lewd shit saying ANON instead of {{user}}, because that shit needs to be cleaned the hell up and removed or fixed, what a mess.
>>
I may be completely retarded. I've installed text-generation-webui using the start_linux.sh script. Then how do I pass the parameter like --gpu-memory 6500MiB? All the answers suggest to pass it to the subcommand `python server.py` that is called somewhere in the script, or in manual install. I've tried the manual install (either through venv or conda) and in both cases I get exllamav2_ext.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
I've looked at the low vram guide but the link doesn't work anymore (I'm guessing they changed the API)

Is there a simple way to input the correct settings? I don't get it
>>
>>101429405
Hi Drummer. Just name yourself Anon.
>>
>>101429424
No way, that ruins the immersion
>>
>>101429209
S!!!!
>>
>>101429415
Never used text-gen-webui. Just based on the error, i'd check if you have the cuda libraries (from your distro's package manager). Also, most python projects have a requirements.txt to be used like
pip install -r requirements.txt

from within your venv. You did that too, right?
>>
Hi all, Drummer here...

I would love it if someone could test out any of the Tiger Gemma v2 test versions I have here: https://huggingface.co/BeaverAI?search_models=Tiger-Gemma-9B-v2 (just kidding, try S)

>>101429405
'Anon' is prevalent in C2 logs. I've seen a model trained on it scream out "ANOOOOOON" as it was clearly not regex'd out by the tuner.

>>101429446
I'm done. S might be the last one. No refusals but I wouldn't be surprised if a few one-offs appear. Seems to have retained the Gemma style as well. Hope I didn't fuck its brains too much.
>>
>>101429492
Any other insights about C2?
>>
>>101429492
Any new cool ads?
>>
>>101429477
Yeah of course, I installed everything.
I've looked at the start script but I don't get what it does differently that isn't the same as just doing the pip install.

Is there another setup that should be good to run instead? I never had that many problems with SD, ComfyUI and all other stuff so I guess they fucked up something.
>>
>>101429290
How to sell 1000 dollars worth of waste to idiots at a massive markup 101
>>
>>101426786
faipl-1.0 is for the weights, agpl3.0 for code
you're close though, based
>>
>>101429612
Found this for a different project, but same undefined symbol:
>https://github.com/Dao-AILab/flash-attention/issues/620
>pip install flash_attn -U --force-reinstall
Then there's this
>https://github.com/oobabooga/text-generation-webui/issues/4293
but no solution yet and it's old.

I hate python so much... just install llama.cpp. It just works.
>>
>>101426907
>not faipl-1.0
trash
>>
>>101429073
I was able to reproduce that. The only issue is that this is a pretty unnatural test. My test was just quizzing my model in an existing chat about a detail in the beginning of the conversation, and Gemma 2 fails to do it, while L3 8B succeeds. So I'm pretty sure the sliding mask is still not sufficient compared to true SWA support.
>>
>>101429728
>No public domain.
trash
>>
>>101423559
>8k context
>still no multimodality
l m a o
>>
>>101429640
wheres the mark up? like it's a bit over but it's not that bad.
quadro 700
p40 300 x 3
that's already 1600
800 ish ram
so 2400 ?

not counting the other stuff or the water-cooled P40s, which are mostly blower-fed for desktop use with 3D prints.
>>
i'll make a puritan waifu who will support me through nofap until i find a REAL GIRL BECAUSE I CANT SEE SHIVERS ANYMORE AAAHHHHH
>>
>>101429724
I tried them both already anon.
That doesn't work either because there's some mismatch in dependencies.

Thank you, I'll try llama.cpp.
>>
I was roleplaying with my AI slave girl that she had to create a bash script to randomly choose a punishment from a text file, and it was her task to sneak in a way to make the punishment less painful without me noticing.
but it was so annoying, and the code kept cluttering up sillytavern!
but then i had a very great idea: what if sillytavern created a little rectangle with a title, and if you clicked on it you'd see all the code! just like claude's artefact feature. and if you look at that, the system prompt for exactly that has surfaced publicly.

https://tyingshoelaces.com/blog/forensic-analysis-sonnet-prompt#

How would you go about implementing something like this in sillytavern?
>>
Can a cat hold more than 3 watermelons?
>>
>>101429833
that scenario is made up
>>
>>101429854
of course it is, kek
>>
>>101429854
no it isn't
>>
>>101429833
Doesn't ST have something akin to [spoiler] tags? Something that is hidden from you until you click on it. If so, try to make the model surround code in those tags. Can't be bothered to search for it.
>>
>>101426999
Literally the first time I try the model and post about it. If in your schizo mind it sounds like a recurring comment then it's probably true and you are wrong about the model. You also didn't post proof about your experience.
>>
>>101429743
shalom
>>
>>101429833
Couldn't you just use tampermonkey or something to inject a javascript that does that?
>>
>>101429936
Public domain and AGPL are the only good licenses. Anything else is shit and cope.
>>
>>101430011
i like the futo license
>>
>>101430011
Only Public Domain. *GPL can go fuck itself.
>>
>>101430036
thats right goyim.. release your code.. let us use it goyim...
>>
File: Mathstral nala.png (94 KB, 924x416)
If you like violent RP, Mathstral is a real coom demon. It's a little overly methodical in its descriptions, mind you, and occasionally misses the EOS token and loops. It handles plain t=1 sampling fine, though.
>>
I didn't know there was an autogynephile license.
>>
>>101430046
How does it do with several instructions at once on a full context?
>>
SSPL is the white man's license
>>
File: mathstral sally fail.png (19 KB, 907x198)
>>101430058
It's a 7B model. So it's really meant for highly targeted use-cases.
>>
>>101429120
Is one of these the model that the anon claiming to be a Mistral employee claimed was going to be "a REALLY good" model?
>>
>>101430081
i mean is there even an answer to that?
>>
>>101430043
Alright. You only get a compiled version for windows XP. Be happy.
>>
>>101430113
That there's not enough information to determine that.
>>
File: lol.png (108 KB, 1077x341)
lmao even
>>
>>101430144
How did Mistral fall this hard?
>>
File: mathstral coldsteel.png (125 KB, 917x464)
A pretty frequent issue when RP-testing Mathstral is that it doesn't fully grasp the concept of possession.
>>
>>101423539
i feel the same for anyone who doesnt daily drive wizard 8x22
>>
>>101430172
Is that better than CR+?
>>
>>101430368
No, 8x22B is a MoE, CR+ is a dense model. Dense models will always be better.
>>
>>101430368
>Is that better than CR+?
for creative writing of any kind, local SOTA by miles
>>101430391
>Dense models will always be better
given how mixtral mogged most models when it came out with only 46B total parameters, and how wizard mogs everything for creative writing right now, albeit at 141B, that's false
>>
>>101430423
cope
>>
File: 1692119730708580.png (2 KB, 170x52)
>>101430450
the only cope is from sour grape niggers who cant actually run the model (you)
>>
File: 1721048508824379.png (43 KB, 2510x185)
>worse than Yi 34B and Phi Medium
>>
>>101430459
>new shitty leaderboard
>>
>>101430458
Are you seriously trying to flex with 128GB of RAM? lol
>>
>>101430468
>deflects
concession accepted worthless nigger
>>
>>101430458
I can run it on Q4 with full GPU offload.
It's shit.
CR+ and even the shittiest 70B finetune are better.
>>
>>101430459
so why arent you using that starlight or whatever the fuck was at the top of all leaderboards at 7B then? lmao
>>101430479
>I can run it on Q4 with full GPU offload.
sure you can lil bro
>>
>>101430423
CR+ is way better than wizard for creative writing lol what are you smoking? wizard is better for logic and code and pretty much everything else *but* creative writing, where it's complete and utter formulaic slop
>>
>>101430459
>>101430490
also, isn't it funny how this nigger uses the general-purpose benchmark instead of the one for creative writing/roleplay that basically showed wiz on top anyway lol. does anyone have a link? i didn't save it since i knew i wouldn't need it, since i won't get anything better than wiz for months
>>
>>101430490
>>101430475
The cope is palpable kek
I feel sorry for you, you fell for the local meme scam and now can't accept you were made a fool of.
>>
>>101430543
>The cope is palpable kek
indeed, the cope of an underage kid nigger who can't post the rig he definitely has that ran wizard LMAO

seethe more brown
>>
File: smi.png (78 KB, 757x537)
>>101430490
Kiss my ass.
>>
File: 00003-1532105500_1.png (1.2 MB, 1024x1024)
>>101430543
>the local meme scam
The locust naturally can't help but out itself. Go back to your containment thread so you can drink your piss and beg for claude keys, poorfag
>>
Now how the fuck has nobody converted mamba-codestral to HF format yet? I have Nala tests to run.
>>
File: 1709047514102805.png (133 KB, 1191x884)
>MUH BECHMARKS
>NO NOT LIKE THAT GOY AAAAAAAAAACKKK
https://eqbench.com/creative_writing.html
>>
>>101430596
*tap* *tap* >>101367108
>>
>>101430658
How the fuck does one even numerically quantify something like creative writing? That's retarded.
Anyone claiming they can objectively benchmark something that abstract is mentally retarded and probably not a sentient lifeform.
>>
File: 1717394898281072.png (115 KB, 1800x1578)
>>101430665
>resident cuckold literally in this thread 24/7
you really cant make this up
>>
>>101430558
>>101430596
bold of you faggots.
1. you're running black-box toys on your 10k+ $$$ shitboxes; you can't get rid of unwanted shit or cuckery, you settled for it like a cuck, and you cope with meme jailbreaking that kills performance and/or makes your model dumber.
2. you have no control over "slop writing", and that's why you are crying about it all the time here, because there's nothing else you can do.
3. today mistralai gave you instruct tunes only and you will eat it up like a good free jeet goy.
There, my two cents.
>>
File: i967lf0ud63d1.png (975 KB, 871x988)
>>101430536
>>
>>101430686
>How the fuck does one even numerically quantify something like creative writing? That's retarded.
by asking claude
>Change to Claude 3.5 Sonnet as judge (from Claude 3 Opus)
https://github.com/EQ-bench/EQ-Bench
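In other words, LLM-as-judge: hand the judge a rubric plus the candidate's output, ask for a number, and average over prompts. A bare-bones sketch of that loop; the endpoint, rubric wording and score parsing here are my own assumptions, not EQ-Bench's actual harness:

```python
# minimal LLM-as-judge loop: have a judge model score candidate outputs 0-10
import re
import requests

JUDGE_URL = "http://127.0.0.1:8080/v1/chat/completions"  # assumption: any OpenAI-compatible judge

def judge_score(prompt: str, candidate: str) -> float:
    rubric = (
        "Rate the following creative writing response from 0 to 10 for prose quality, "
        "coherence and originality. Reply with only the number.\n\n"
        f"Writing prompt:\n{prompt}\n\nResponse:\n{candidate}"
    )
    resp = requests.post(JUDGE_URL, json={
        "messages": [{"role": "user", "content": rubric}],
        "max_tokens": 8,
        "temperature": 0,
    })
    text = resp.json()["choices"][0]["message"]["content"]
    match = re.search(r"\d+(\.\d+)?", text)
    return float(match.group()) if match else 0.0

scores = [judge_score("Write a scene set in the rain.", "It rained. The end.")]
print("mean creative writing score:", sum(scores) / len(scores))
```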
>>
>>101430686
>How the fuck does one even numerically quantify something like creative writing
same way someone quantifies anything and everything else numerically, infinite cope
>>101430696
didnt read >>101430687
seethe
>>
>>101430658
>judged on less than thirty prompts
>all single turn
>judged by a language model
pffftahahahahah
>>
>>101430707
yeah thats the one, link?

>>101430720
>>101430707
lets hear the next cope kiddo, lmao
>>
>>101430713
seethe or not, you are still running black box toys, go write another "ahh ahh mistress" in your ST chat i guess?
>>
>>101430736
look at the list anon, I don't even have to explain why it's retarded
>>
>>101430741
You have the wrong anon. That's me. And it's to make fun of /aicg/ locusts.
>>
>>101430741
>black box toys
just because you don't understand something doesn't mean others don't
just because you can't pull a group of "neurons" out of a neural network and analyze their exact functions, since they're too complex, doesn't mean you don't know anything about the LLM
also >>101430687
>>
>>101430658
>Yi-34B-Chat
How did it climb there?
>>
>>>101430755
Go back
>>
>>101430755
>you dont understand anything because i said so!
how's that abliterated meme doing?
>>
yeah this general really did get unusable from literal paid shills and mindbroken brown locusts

it started a few days after mixtral dropped, and by L3 it was all norminiggerville

not even worth ctrl+f'ing "http" to see the papers posted anymore, just join other non mindbroken tranny communities and run local models without retards screeching like monkeys all around you
>>
>>101430776
>no argument
you reaaally are dumb irl, aren't you? what a grim existence
>>
>>101430808
No one cares about cloud models here, retard.
>>
>>101430791
your circlejerk general is not important enough for paid shills or any spending by third-party groups, do not worry.
>>
>>101430658
>gemma-2-9b and Midnight Miqu are better than Opus
When did Reddit migrate to /lmg/?
>>
>>101430707
The only time I have seen WLM beat CR+ in writing is when you don't use a braindead quant like Q2 or Q3 and you actually set the context and instruct prompts for it.
>https://huggingface.co/Quant-Cartel/WizardLM-2-8x22B-exl2-rpcal/tree/main/Settings-Wizard8x22b-rpcal

This chart is also missing this awesome storywriting model which I prefer to CR+ most of the time:
>https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter
>>
>>101430791
That's why I go to r/LocalLLaMA when I want to find papers and new stuff, and I go here when I want to shitpost.
>>
>>101430832
>Quant-Cartel
>tdrussell
It's not organic enough, petra.
>>
>>101430736
https://huggingface.co/datasets/froggeric/creativity
>>
>>101430851
You are the problem.
>>
File: 1721085350403749.png (195 KB, 500x553)
>>101430696
>>10k+ $$$
>poorfag thinks this is some great sum of money
Filtered by a fraction of a bitcoin lmao
>>
>>101422220
moistral v3 > fimbulvetr. but there's no reason to use either of those anymore.
>>
>>101430864
>poorfag poorfag poorfag poorfag
calm down?
>>
File: 1mi.png (168 KB, 806x796)
>>101428614
>Mixture of A Million Experts
>https://arxiv.org/abs/2407.04153
WTF, why isn't this discussed more? If this many tiny experts beat a dense model of similar size by a significant margin, it means there's no need anymore for very fast memory: only the small set of active experts has to be read per token, so the model would be fast for single-user inference even from storage. You could dedicate a fast NVMe SSD to a 1-trillion-parameter MoE model.
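Back-of-envelope on the "fast from storage" part, since that's the whole appeal. Every constant below is an illustrative assumption rather than a number from the paper, and it optimistically assumes you get sequential-read bandwidth even though scattered tiny experts would really mean random reads:

```python
# rough feasibility of streaming a huge sparse-MoE model straight from NVMe
total_params    = 1e12   # assume a 1T-parameter model
active_fraction = 0.002  # assume ~0.2% of weights touched per token
bytes_per_param = 1.0    # assume 8-bit quantized weights
nvme_bandwidth  = 7e9    # ~7 GB/s, a fast PCIe 4.0 SSD, best case

active_bytes = total_params * active_fraction * bytes_per_param
tokens_per_s = nvme_bandwidth / active_bytes

print(f"active weights per token: {active_bytes / 1e9:.1f} GB")           # 2.0 GB
print(f"upper bound from SSD bandwidth alone: {tokens_per_s:.1f} tok/s")  # ~3.5 tok/s
```

So whether it ends up "very fast" hinges entirely on how small the routed active set really is per token; the nice part is that the reads scale with the active parameters, not the total.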
>>
>>101430893
This general is dead
>>
File: 00007-1773722496.png (1.19 MB, 1024x1024)
Reminder that the locusts shitting up this thread give their logs to strangers because the price of their dignity and privacy is less than a small side of french fries
>>
>>101430893
>If this many experts beat a dense model of similar size by a significant margin
it doesn't
>>
>>101430893
At what point do the experts become regular FF layers?
>>
File: 1709040134938637.png (315 KB, 636x491)
>>101430906
Are these locusts in the room with us right now?
>>
>>101430931
NTA but yes, you.
>>
>>101430960
seek medication
>>
>>101430969
We've already discussed the irony of you saying this, Anon. You will never be an attack helicopter.
>>
>>101430791
Any you recommend? Where are my /lmg/ oldfags these days?
>>
File: file.png (355 KB, 512x512)
Funny how the mikufag loses his shit again because the benchmarks show that his meme model is, in fact, a meme.
>>
>>101431032
lol
>>
>>101431032
lol
>>
>>101431032
lol
>>
File: 1711072659524106.jpg (951 KB, 1792x2304)
>>101431030
MINDBROKEN
I
N
D
B
R
O
K
E
N
>>
stop mentioning eqbench. NOW! don't bring that up again.
>>
>>101431032
lol
>>
>>101431075
yes, you.
>>
>>101431032
lol
>>
>>101431075
I'm not a furry.
But.
>>
>>101431032
lol?
>>
>>101430893
>24 PPL vs 21
Show me the downstream task difference.
>>
lol
>>
>>101430906
catbox please
>>
>>101431253
>>101431253
>>101431253
>>
>>101430707
daily reminder that this "benchmark":
>was created by a finetuner to promote his own model (that WestLake shit), which sat at the top of the table for a long time before he realized it was too suspicious and moved it down a bit, lowering its scores
>doesn't have published questions (because there are none; he just assigns random scores that look somewhat reasonable)
>is probably shilled by the author himself, because anyone sitting here longer than two days would know what I wrote above
>>
>>101431300
cope
>>
>>101431362
Hi froggeric
>>
File: file.png (396 KB, 474x316)
>>101429374
That is actually our lord and savior bitnet, which is about to come back from the dead any day now.
>>
Hi all, Drummer here...

I actually have no fucking idea what I am doing.
>>
>>101430658
>Yi 34B chat is basically a 70B
Weird how nobody is using it.
>>
>>101431641
use faipl-1.0
>how to use faipl-1.0
put the following in the README's YAML front matter (the metadata block between the --- lines at the top):

---
license: other
license_name: faipl-1.0
license_link: https://freedevproject.org/faipl-1.0/
---
>>
File: file.png (1.01 MB, 768x768)
>>
>>101431799
Can you go out there and win one case with those licenses? And then publicly say "hi, i am the license autist from /lmg/" and post the link here where you say that?
>>
>101432045
the jew fears the faipl-1.0
https://en.wikipedia.org/wiki/Free_Software_Foundation,_Inc._v._Cisco_Systems,_Inc. here's a case for the GPL
>On May 20, 2009, the parties announced a settlement that included Cisco appointing a director to ensure Linksys products comply with free-software licenses, and Cisco making an undisclosed financial contribution to the FSF.
faipl-1.0 is fairly new; if someone small steals your shit, it's not a big deal, the jews on the other hand..
>>
>>101432092
I don't care about that. Go make a shitmix and license it with your autistic pet peeve and then win a case. I am waiting.
>>
>>101432113
Provide the compute. I am waiting.
>>
>>101432140
I just want you to shut the fuck up, you literal autist. You are absolutely retarded and it hurts to read your license posts. Those licenses mean absolutely nothing, they aren't enforceable, and no big company cares about the dumb shitmix you create by firing up SGD or Adam on default parameters and letting it run for 2 hours. If they cared, "hi guys drummer here" wouldn't post here because someone would have headhunted him.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.