/g/ - Technology




File: LMG-rebranding.png (286 KB, 1024x842)
/lmg/ - a general dedicated to the discussion and development of local language models.

Family-Friendly Edition

Previous threads: >>102737214 & >>102731640

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: recap.png (44 KB, 1023x316)
►Recent Highlights from the Previous Thread: >>102737214

--Microsoft's 2TB VRAM Black Systems servers and implications for local AI:
>102737597 >102737643 >102739743 >102738300 >102738645
--Discussion on card writing techniques and styles:
>102739777 >102739814 >102739876 >102739903 >102739857 >102739888 >102739916 >102740111 >102740199
--Recommended resources for learning to build AI models:
>102743576 >102743625
--Python script to count words in SillyTavern chats:
>102742397 >102742496 >102742597
--Illustrious shows promise in generating uncommon poses and 2D art styles:
>102743557 >102743593 >102743678 >102743830
--Tool-calling in AI models explained, with potential use cases:
>102737392 >102737431 >102737605 >102737635 >102737878
--Text adventure system prompt and model behavior discussion:
>102742060
--SillyTavern roadmap changes and user learning curve discussion:
>102737481 >102737514 >102737833 >102737978 >102738926 >102738959 >102738994
--Lmarena sorting criticized, o1 model underperforms expectations:
>102741026 >102741283
--LMG CEO urged to make platform family-friendly:
>102738789
--Guidance on running chatbot models, recommendations for llama 7B and 12B nemo finetune:
>102742430 >102742441 >102742518 >102742538 >102742545 >102742612 >102742677 >102742730
--Discussion on best roleplay models and sampler settings for 24GB VRAM:
>102739545 >102739563 >102739619 >102739678 >102739645 >102739750 >102739767 >102739863 >102739941
--AMD GPUs and software support discussion:
>102737303 >102737337 >102737400 >102737485 >102741572
--3060 12GB recommended over 4060 8GB for running larger AI models and gaming:
>102738933 >102738970 >102739027 >102739086 >102739199 >102739233 >102739285 >102739343 >102739387
--Teto (free space):
>102739565 >102741008 >102741603 >102742087 >102742308 >102742373 >102742427 >102743557 >102743678

►Recent Highlight Posts from the Previous Thread: >>102737218

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
File: 464.jpg (153 KB, 1440x759)
local models are about to get crazy soon...
>>
>>102744026
differential bitnet, when?
>>
>>102744036
GPT-4 on our smartphones... Holy shit...
>>
>>102744026
Does this need new training?
Or can it be applied during inference?
If it needs new models, nothing will happen again for months until people forget about it.
>>
>>102744026
not soon and only if the chinese are interested. meta still hasn't put out a moe model after a whole year and mistral only releases their rejects.
>>102744036
the qwen team said qwen 3 might experiment with bitnet, so maybe qwen 4 will be differential bitnet. eta: next year?
>>
>>102744063
but I want it now.
>>
Is there an AI that can play yugioh already
>>
File: qwen.png (100 KB, 675x887)
qwenbros...
>>
>>102743974
>>102743977
Remember the importance of respecting the values and boundaries of one another as we discuss safe, productive usage and development of local Mikus here at /lmg/.
>>
>>102744100
>qwen 2.5 72b can't even beat llama 3.1 405b
Shameful display
>>
>>102744100
haha nice edit anon surely mistral large would be high on that list
>>
Do you use localchub or the database dump on evulid?
>>
>>102744100
Will these models bite me?
>>
File: ComfyUI_06441_.png (779 KB, 1280x1280)
>>102743974
>Family-Friendly Edition
>>
>>102744100
Nothing wrong with some fun for the whole family.
>>
>>102744320
Meant to reply to >>102744270
>>
Linus Media Group general
>>
https://x.com/JeffDean/status/1843493504347189746
Turns out that "ai uses 100 million gallons of oil every inference" claim was made by a troon lol
https://x.com/strubell
>>
>(09/27)
gimme more news i want dopamine now
>>
>>102744432
your dopamine will have to wait until after the burger elections are over
>>
File: 27.png (288 KB, 1019x376)
>>102744446
>:(
>>
>>102744384
Light machine guns are not family-friendly.
>>
File: 1703963750592668.jpg (230 KB, 1600x900)
>World info and author notes are problematic.
>>
>>102744384
Linux Mobile General
>>
Node based editor with python based custom nodes and llama.cpp and guidance support for complex cascaded role play multi agent RAG based simulation prototyping?
>>
>>102744816
No thanks. For me, it's ServiceTensor.
>>
>>102744816
Closest I know of
https://github.com/floneum/floneum
>>
Alright I must've done something to my samplers in ST or with the text generation api, it behaves fully deterministically on every swipe regardless of what I change in the sampler settings.
>>
>>102744906
Oh no! Keep us posted.
>>
File: 1713264502369311.jpg (643 KB, 1350x1543)
Can anyone spoonfeed me on current meta? Haven't been around since Mythomax. Preferably something that can run more or less smoothly on system of 4090+3090.
>>
>>102745188
no
>>
>>102745188
still mythomax, yuzu maid, bmt and midnight miqu
>>
What's the consensus on the best model for coomer roleplaying
>>
>>102745457
gemmasutra2b
>>
File: 1711505970328958.jpg (1.34 MB, 2563x1709)
>>102745476
Thanks bro here is a picture of my cat in return
>>
I've tried out Magnum 72b v2 again using the settings from Infermatic and I find it unusably bad.
https://files.catbox.moe/rqei05.json - preset
https://files.catbox.moe/btnhau.json - instruct
https://files.catbox.moe/7kct3f.json - context
People who claim to have any success with Magnum 72b v2 whatsoever, what is your system prompt and what are your sampler settings?
>>
File: 1716419406660758.png (4 KB, 204x53)
I have Min-P set to 0.05 and temp at 0.77 with no further samplers. How the fuck did it do this?
>>
>>102745662
you'd need temp 0 to completely remove rng
>>
>>102743974
recommend me a good uncensored model, non woke, non pozzed
without looking at the benchmarks, just your personal experience
>>
>>102745694
mlewd
>>
>>102745694
For RP or what? The old Command-R is pretty wild but I never asked it its opinion of FBI crime statistics.
>>
>>102745789
Also context takes a lot of memory and it's undercooked so you can't just run it with neutral sampler settings or else you'll get Chinese and the occasional fragments of Russian and other languages in the output.
>>
>>102745789
no RP, just an uncensored model.
if I ask for the statistics about black on white crime I want a non pozzed response
>>
>>102745694
Exclude Llama 3.x. The censorship mostly or entirely goes away when you don't call the LLM role "assistant" but it's pozzed to a ridiculous degree. Example: in an RP about being a serial killer I kidnapped a non-op FtM tranny and after cutting off her clothes I untied her and told her to GTFO because I'm only interested in raping and killing boys and instead of leaving she kept insisting she was a boy. Hilarious.
>>
>machine learning got the nobel prize for physics
>machine learning got the nobel prize for chemistry
Literature when?
>>
>>102745837
Give a complete sample prompt and I'll test it on everything I have access to excluding RP finetunes.
>>
File: lust.png (5 KB, 512x514)
Hi.
Please spoonfeed a returning retard.
Are we still stuck in a hardware limbo?
Can I get something substantially better than my ol' 3080 Ti 12GB without breaking the bank?
>>
>>102745882
yeah, buy ram
>>
>>102745893
I've a i5 11600k, and 32GB RAM.
Last time I checked, it was not nearly enough to beat my GPU.
Can I really get better performance just by chugging more RAM?
That seems unlikely.
I could probably fit a bigger model, sure. But the token/s would be horrendous, no?
>>
>>102745931
your options are retarded small models or slightly less retarded, slow big models
>>
>>102744026
differential transformer will be as forgotten as bitnet
>>
>>102745882
Mistral Nemo 12B if you want a quickie
Mistral Small 22B if you have patience or quant it
>>
>>102744062
>Does this need new training?
yes
its over
>If new models nothing will happen again for months until people forget about it.
If it's forgotten about it's because it didn't scale or wasn't good enough; anything that's actually an improvement/free lunch gets quickly adopted by the industry
>>
>>102745882
>>102745931
>Last time I checked, it was not nearly enough to beat my GPU.
>Can I really get better performance just by chugging more RAM?
memory bandwidth is all that matters. a ddr4 desktop will be around 50 GB/s. a desktop with ddr5 will be 80-130GB/s. a modern 6-8 channel ddr5 server will be 150-300GB/s. compare this to even bottom of the barrel gpus that will do 300GB/s and xx80+ gpus that will do more like 1000GB/s.
cpu inference isn't always bad especially if you get an older 6+ channel server, but you can expect at most 2-7 tokens/s for a 70b model and it's probably cheaper to get 2x 3090s for a 6-20x improvement
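To put rough numbers on that: for a dense model every weight gets read once per generated token, so the ceiling on generation speed is roughly bandwidth divided by the model's file size. Minimal sketch of that rule of thumb (python; the 40 GB figure for a ~70b Q4 file is an assumption, and real speeds land below this once compute, KV cache reads and overhead are counted):

# rough upper bound on token generation speed from memory bandwidth alone
def tokens_per_second(bandwidth_gb_s, model_size_gb):
    return bandwidth_gb_s / model_size_gb

model_q4_gb = 40  # ~70b at Q4, roughly 40 GB on disk (assumption)
for name, bw in [("ddr4 dual channel", 50), ("ddr5 desktop", 100),
                 ("8-channel ddr5 server", 300), ("3090", 936)]:
    print(f"{name}: ~{tokens_per_second(bw, model_q4_gb):.1f} t/s upper bound")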
>>
recommended alternatives to ollama for cli llm?
>>
>>102746217
>cry in 2xDDR4.
>>
>>102745786
Any more models focused on lewd storytelling?
I'm looking for something with a big context length but I'm willing to try anything, I'm just tired of shivering spines, man...
>>
>>102746014
What are some decent Mistral Small tunes? I tried Cydonia and it was pretty okay.
>>
>>102743974
Oh no, lmg is going to be sold off
>>
>>102746357
Try Rocinante v1.1
Although I think Nemo only has an effective context length of 16k tokens
>>
>>102746457
I already do, it's my favorite. The thing is that you get tired of it when you make several stories and it starts to write the same shit over and over for the lewd scenes.
>>
>>102746499
all the models do the same shit for lewd
>>
>>102746499
Proompt.
Add randomly inserted tags to the last assistant prefix that steer the model into responding a certain way.
I've taken to doing that per-card, so that I can reinforce nsfw, mystery, certain speech patterns (like stutter), slow burn, etc.
>>
>>102746499
Is it really a model issue though? I mean there are so many ways to write how to put ponos in vagoo
>>
>>102742060
>I also thought emphasizing the finite nature of the RP might change the kind of stories generated but so far I haven't noticed evidence of that.
With Qwen 2.5 72B I just got
>Note: This is the penultimate scene. The next scene will conclude the story.
which was kind of neat. And indeed no matter what I do at that point the story ends.
>>
Dam my motherboard doesn't POST
>MZ73-LM1

Either I try to make it work with a power switch tomorrow, or the ethernet diagnostic, or my local entry is kill
>>
>>102746228
llama.cpp is the allfather
>>
>>102746811
> doesn't POST
Doesn’t POST, or doesn’t power up at all?
>>
>>102746811
>switch
touch a screw driver between the 2 pins
>>
What is the current meta for 8B? Is Stheno still king?
>>
>>102746857
Put a bullet through your head.
>>
>>102744063
> meta still hasn't put out a moe model after a whole year
This is what really grinds my gears. I need a big MoE of legendary quality. Deepseek is ok, but something like a 5x123b would be in such a sweet spot… Meta could pull it off, and considering 405b's smarts I'm sure they have the data and experience to do it competently.
>>
>>102746857
It's been a long time since I've used an 8b.
Why not use quanted nemo?
Q4ks should be about the same size as Q6 8B I reckon.
>>
>>102746893
>5x123B
No one has the hardware to run it. We need another 7x8B.
>>
>>102746857
>>102746899
Buy an ad.
>>
>>102746912
An ad for quanted nemo?
That would be funny.
>>
>>102746899
Because I'm on DDR3 so models above 8B are abhorrently slow. I think even Q4 Fimbulvetr (11B) was too slow to use.
>>
>>102746912
>t. AMD being jelly because they have no nemotron and no CUDA
>>
>>102746928
Ah.
Oof.
My condolences then.
Wouldn't it be better to run your shit on google colab at that point?
Regardless, last good 8b model I used was indeed Stheno (3.2, 3.3 was fucked IIRC), but that was vanilla Llama 3. I have no idea if there are good 3.1 or 3.2 fine tunes.
Maybe try some Drummer gemma 9b tune. I remember he had a few, although I never used those myself.
>>
File: front_panel_header.jpg (69 KB, 452x406)
>>102746835
Powers up 3 of the LEDs on the board (can't see which ones), with the
"LED_BMC (BMC Firmware Readiness LED)" light blinking for me, which GPT says I should diagnose by hooking up ethernet

the ID LED on the side gives me a blue light, so I guess these are all fine

No spin-up of the CPU fans attached to the CPUs on the pin jump.

Just before GPT summarized it one last time it told me that my 2x RAM slots were wrong, so I'm going to do that properly tomorrow.

>>102746841
Pins 11 and 13 from the pic, yeah; no luck getting it to boot.

I will just try tomorrow with the step-by-step instructions from GPT. I was just wondering, not asking for spoonfeeding, just insight if anyone has any. My mistake not ordering a power switch as I don't have a case.

Thanks anons
>>
>>102746963
>>102746928
Buy a fucking ad Sao.
>>
>>102746963
Yeah seems like it's still Stheno then. Thanks. I also found 3.2 to perform the best, but I have this issue where after using one model for too long you can literally predict what it's about to generate, which kinda ruins the experience.
>>
>>102746965
try reseating the video card, even old ones can weigh quite a bit and come loose enough to cause issues after a while
>>
So what would it take to make a completely uncensored model from scratch? Something around Nemo's level but actually uncensored and with good writing capabilities?
What's the biggest problem, money, dataset, training?
>>
>>102746988
Yeah I knew someone would say this. If you have any better suggestions feel free to suggest, it's not my fault his finetune is the best out of what I've tested
>>
>>102747016
I would start with Sao™'s Stheno™ dataset. For superior roleplaying capabilities.
>>
>>102747000
Stheno is nice, but it's also very one note.

>>102747016
The problems are all money, in that it takes money to get the hardware, takes money to get the people to curate the dataset, etc.
And keep in mind that these models aren't made in one go. There's some (a lot?) of trial and error.
>>
>>102747016
I don't think local models have any serious censorship
>>
>>102746928
>stheno
>fimbuveltr
>"I only use Sao models"
>>
>>102747035
>also very one note
Yeah, that's exactly the reason I'm looking for something new. It's been quite a while since its release after all
>>
>>102747052
Have you tried Stheno plus?™ available to all my... I mean his patreon subscribers. (Gold tier and above only)
>>
>>102747035
How much money then?
>>
>>102747041
I don't, there's around 150gb of weights total. Honestly I don't even know if it's sao or not, I just download the ggufs
>>
>>102747052
Buy a fucking ad, shill.
>>
>>102747070
>still no suggestions for better 8B models
Sao won.
>>
Doesn't Kofi or whatever these idiots use for funding have a policy against deceptive advertising?
If jannies here aren't going to do shit about him why don't we just go straight for his lifeblood?
>>
That sure shut him the fuck up fast.
>>
>>102746965
edit: If it isnt DIMM_P0_A0 and DIMM_P1_M0 when I try it next, its bricked on that end as thats what the primary slots are

>>102747014
Thanks yea, I have no GPUs connected atm, just trying to run bios for now via onboard VGA to monitor
>>
>>102747107
I'm just waiting for someone to post a better 8B model than Stheno. Still nothing but malding in the thread.
>>
>>102747120
We're just waiting for you to go back to Discord or to kill yourself, whichever happens first.
>>
>>102747146
Discord has been banned in my country yesterday
>>
>>102747167
Must be a nice country.
>>
>>102747061
Shit nigga, a lot.
Meta has some figures for llama 2
>https://github.com/meta-llama/llama/blob/main/MODEL_CARD.md
L2 7b took 184320 A100 hours, and that's without accounting for all the work before that.
Meaning that you'd need something like 20ish of those to be done in around a year's time.
Let's put the cost of each A100 80GB at 25k, that's 500k USD just in GPUs.
So let's say it would take something like 250k today if you optimize all your costs (newer hardware, rent compute, whatever).
That's some very rough napkin math, but at least it gives you an idea of the order of magnitude you are working with.
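Spelled out, the napkin math is just this (python, using the same assumed numbers as above):

a100_hours = 184_320      # Meta's reported A100-hours for Llama 2 7B
num_gpus = 20
gpu_price_usd = 25_000    # assumed price per A100 80GB

days = a100_hours / num_gpus / 24
print(f"{num_gpus} GPUs -> ~{days:.0f} days of training")   # ~384 days
print(f"GPU capex: ${num_gpus * gpu_price_usd:,}")          # $500,000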
>>
>>102747109
Is your RAM on the QVL?
>>
>>102746904
> No one has the hardware to run it
lol
>>
>>102747016
this is a stupidly open ended question. what do you mean by completely uncensored? the model tells you how to cook meth, or the model conforms to your own particular biases, or the model doesn't filter the dataset and/or doesn't include what you want, or the model engages in niche taboo subjects?
most local models will tell you how to cook meth if you prompt correctly. getting over built in bias (both from safety/alignment/instruction training as well as source data) is more difficult: >>102745857. if the latter 2 answers apply, is there even enough data to do this 'with good writing capabilities'?
data will be a serious concern depending on how you answer. everything is trained on the same dozen or so big web datasets and proprietary books, if you can't ever get what you want out of prompting existing models you're basically throwing in the difficulty of human curation of the data you want into the mix as well.
by from scratch do you mean completely from nothing, or is a fine tune acceptable? I don't think you can fix broken models with a fine tune, but even if you could you're looking at throwing a few billion tokens into a base model. starting cost just for training is probably $3k. 10x it if you're serious. 10x it again if you want a model completely from scratch. the limit to this is probably 1t tokens from scratch for llama2-era capabilities or a one trick pony finetune.
>>
betnit
>>
>>102745022
Woah I never expected someone like you to reply, thank you very much bwo. I guess AMD just isn't reliable enough no matter what then... I'll just suck it up and buy a 3060 12gb...
>>
>>102747480
>I guess AMD just isn't reliable enough no matter what then
To be clear, my opinions are based on the presupposition that < 24 GB VRAM is not worthwhile in the first place.
I'm specifically turning on my trip and telling you my perspective because part of your judgement was based on the assumption that the software support will become better.
At least in the llama.cpp space there are no dedicated devs for AMD support though, I am essentially the only core dev that even owns AMD hardware and my efforts only extend to me making sure that the HIP port of the CUDA code is not broken and testing which kernels should be selected at runtime if there are multiple options.
There is also another dev working on more general Vulkan support which would also work on AMD but my personal opinion is that that approach is just too hardware-agnostic to ever work really well.

If the target is 16 GB then non-NVIDIA options are dramatically better value and the current software stack can be largely made to work with AMD hardware (just with worse performance).
I myself have an RX 6800 (for development) and on a system based on Arch Linux llama.cpp is essentially working out-of-the-box (but with worse performance).
The main areas where you will have problems are the bells and whistles: FlashAttention support, multi GPU support, etc.
Chances are you will also need to spend more time troubleshooting.
But I would say that for a single GPU setup with a target of 16 GB VRAM AMD is a viable option depending on how you value $$$ vs. speed and manual effort.
>>
>>102747480
amd works fine if your card works on rocm. the cards listed as being officially supported are cards that are guaranteed to work on that version of rocm, not that the card works on rocm. there have been a few notable gpus in the past decade that are broken in rocm and you can't really do anything about it, but for the most part it's probably going to work if you're okay with having to figure shit out on your own. llama.cpp is fine, python pytorch stuff is *mostly* fine, if you're doing forks of forks you might have to figure out how to use/enable rocm as they're going to default to cuda but it's probably supported.
llama.cpp dev's real argument is if you're not buying a card with 24gb you're wasting your time and money, and in the 24gb category your choices are a joke $300 p40, a $700 3090, a $800 rx 7900 xtx, or a $1500 4090, and if you're buying an rx 7900 over a 3090 there's something seriously wrong with you. the reason for 24gb is that it allows you to run up to 45b models in q4, whereas 16gb is 30b and 12gb is 22b. with the overhead of longer contexts and better quants it basically becomes 24gb to run 30b models and everything else for 13b and lower.
my advice in the $300 ish category would be a used 16gb rx 6800 if you're okay with some headache, but really it's not worth buying a 12-16gb card just for llms.
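if you want to sanity check those size brackets yourself, here's a very rough sketch (python; 4 bits per weight for a Q4-ish quant and ~2 GB reserved for context/compute buffers are assumptions, real usage depends on quant, context length and backend):

def max_params_b(vram_gb, bits_per_weight=4.0, reserve_gb=2.0):
    usable_bytes = (vram_gb - reserve_gb) * 1e9
    return usable_bytes / (bits_per_weight / 8) / 1e9

for vram in (12, 16, 24):
    print(f"{vram} GB VRAM -> ~{max_params_b(vram):.0f}B params at Q4")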
>>
>>102747120
>I'm just waiting for someone to post a better 8B model than Stheno.
Same. Make me switch. Come on. You think I don't want to? I've had it up to here with mischievous smiles.
>>
>>102747696
>multi GPU support
it should be fine if they're the same model or same generation where you can use the same HSA_OVERRIDE_GFX_VERSION for both. no idea if cross generation even works. P2P support probably isn't happening for most people either
>>
>>102747374
What, do you have 512GB VRAM?
>>
Ayo aicg is dying. Any local models I can run on my laptop yet? Must at least be on par with chorbo, 3.5 is a plus.
>>
>>102747799
Take a look at this issue though: https://github.com/ggerganov/llama.cpp/issues/9761
Here someone has an AMD multi GPU setup which for whatever reason produces a segfault during GPU<->GPU data transfers.
But since there is no dev available to fix it the chance of this getting investigated and potentially fixed are pretty slim.
>>
>>102747824
llama3.2-1B
>>
File: 1701186556040777.jpg (8 KB, 276x66)
>>102747762
soon
>>
In KoboldCPP, is there a way to count how many tokens my input is when it likely goes over the context limit by a lot?
>>
>>102747824
>aicg is dying
It died when all the redditor proxy locusts showed up.
You are nu-aicg
And I'm glad your crappy thread is dying.
>>
>>102747838
What the fuck is an "antislop sampler"?
>>
It's pathetic that AMD doesn't pay a couple of pajeets to fix their shit; at this point I think their gpu division is trolling or wants radeon to fail for some reason.
>>
>>102747853
the sampler that will make all your llms smarter, more flexible, more creative and fix all other issues
this time for sure
>>
>>102747844
>he cannot guesstimate the number of tokens in text
newfag. also ST can do that
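if you want an exact count rather than a guess, you can also just ask the running koboldcpp instance; the route below is from memory, so treat it as an assumption and check the API docs page it serves if it 404s.

# count tokens with the backend's own tokenizer (assumes koboldcpp on the default port 5001)
import requests

text = "paste the prompt you want measured here"
r = requests.post("http://localhost:5001/api/extra/tokencount", json={"prompt": text})
print(r.json())  # should include the token count, e.g. {"value": 1234, ...}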
>>
>>102747862
...remind me, which time is it?
>>
ST status? Will I lose my waifus if I pull?
>>
>>102747830
this is what I was thinking of when I said P2P (transfers) actually. my understanding is that rocm just can't do gpu<>gpu transfers unless the gpus are in pcie slots that are connected directly to the cpu and not through the chipset or bifurcation, and iirc the pcie slots have to be the same link speed. this basically rules out most if not all consumer motherboards. my memory may be a bit hazy though, I only had to look this up late last year when llama.cpp inexplicably broke for a few weeks until `--split-mode layer` was made the default
>>
>>102747853
I think it bans phrases with backtracking. It might already exist in exllama as banned_strings, that was added 5 months ago, but nobody cared because it didn't have the funny name...
>>
>>102745188
If you liked mythomax, magnum 12b based on nemo will be right up your alley. You can run larger models, but I don't think there's anything better for writing, unless you simply want a smart and dry model with no personality.
>>
>>102747853
It's a breakthrough in sampling.
The dev has understood that slop is a 100% subjective experience for the user.
Therefore placebos which mainly affect the subjective experience are extremely effective.
Just by labeling the technique "antislop" the user will experience less slop.
Revolutionary!
>>
>>102747934
Better to just use NeMo than Magnum 12B. Source: I tried them.
>>
Does Midnight Miqu have multiple languages? Or just a small amount of other languages? I tried to make Midnight Miqu continue a story that is in my native language. It can form some sentence structure and it makes a little bit of sense, but some inflections are wrong and there are a few nonexistent words and overall it falls apart quickly.

This got me thinking, would it be possible for a language model to invent a new, nonexistent language that nobody speaks? And then you could ask the model to interpret it too. As long as there's enough text, it can pick up the patterns and somehow understand it. Isn't the real language ability of current language models kind of unexplained too?
>>
>>102747830
>>102747929
issue 4030 is most likely related, and one of the last comments on issue 3451 links to a HIP issue related to P2P transfers if you wanted something to reference
>>
Is Aphrodite 1-1 with the features of vLLM? The last vLLM version improved the http server but I don't know if it has been ported to Aphrodite yet.
>>
>>102747929
it's quite insane that when the amd driver can't do the p2p transfer it crashes the entire application instead of returning an error. or worse, it just fails silently, which is what often happens with 7900, which is why llama.cpp has a workaround with GGML_CUDA_NO_PEER_COPY.
>>
>>102748022
llms are trained on more than just english but they're pretty much always predominantly english. the slightly technical but probably wrong answer is that the transformers architecture that all llms use was designed to aid in machine translation, and the way llms encode information into vectors means that the same vectors are going to be activated for simple words (nouns, adjectives) regardless of language. this will extend to words with similar meanings too, but also curiously things like base64 encoding or caesar ciphers. a surprising use case for llms is that they're fairly capable of deciphering obfuscated code.
what you're asking probably won't work well for two reasons though, firstly llms are bad at tokenisation problems (think: only use words without the letter e, or count the amount of letters in this word, CJK languages with lots of tokens do badly, etc). secondly is that llms aren't really reasoning, they're predicting what comes next based on statistical probability. what you should be able to do though is construct a language or rules for a language and put that in the context history, and the llm will probably be able to reasonably imitate what you're feeding it. it's basically style transfer which is also something llms are good at. with enough of this type of data you can probably make a finetune that will do much of the same thing, but the results will probably be significantly worse than english -> your native language just due to the lack of training data to go off of compared to your own native language
>>
I randomly checked this benchmark again and it seems like scores have changed a tiny bit and some models have been added.

https://huggingface.co/spaces/flowers-team/StickToYourRoleLeaderboard

4o is also now on it. And funnily it is beaten by 9B. Would be interesting to see what Claude 3.5 scores. Mistral Small doing very well and "punching above its weight" here, though of course it is still dumb as far as intelligence goes.
>>
>>102748238
Doubt Claude will score any better. I remember Anthropic overfitted Claude 2 so hard that it would always complete "Claud" after "my name is", and people would get shit like NPCS named Claudia
>>
>5090 will only be $1699 for 32gb vram and 1.8tb/s bandwidth
Time to sell your A6000s while they're still worth something boys
>>
>>102748284
A6000 only consumes 300W, has 48GB VRAM and there's no way on God's green earth nvidia is going to come up with a 2-slot cooling solution for a 600W GPU. That requires datacenter tier airflow and no gamer wants to hear that shit while they are trying to game.
>>
>>102748284
>willingly parting with VRAM
hoard, you fool
buy the 5090, but keep the A6000
>>
>>102748238
What kind of black magic google did with gemma2 9b? And then someone made the simpo one that supposedly trades blows with 27b.
>>
>>102748518
Well, it's also only 8k though and 4k sliding window attention on half its layers. Most model makers have moved on now and stopped making smart short context models.
>>
File: 11_05344_.png (1.64 MB, 1024x1024)
>>102746386
Sold off to who? Select your fighter:
>corpo tier: saltman, dario, elon
>punching above their weight tier: undi, sao, alpin, thedrummer
>wildcard tier: cohee, henk, ooba
>write your own: ____
>>
>>102748863
trump
>>
>>102748863
petra
>>
>>102748863
miku
>>
>>102748863
ecker
>>
>>102748935
I don't have the money, sadly.
>>
>>102748863
wholly owned subsidiary of /aicg/
>>
>►News
>(09/27)
>>
Chorbo won.
>>
>>102749355
Election seasons. Once it is done you will see a deluge of new models. And none of them will be good for cooming.
>>
>>102748284
>a6000 prices will come down
BASED
>>
>>102748284
>Source your ass
>>
File: ComfyUI_06450_.png (1.19 MB, 1280x1280)
>>102748863
cagefight
>>
>>102748863
ZUCC
>>
>>102748863
Arthur MENSCH
>>
>>102748863
Xi Jinping
>>
>>102748863
Sneed's LMG (Formerly Chuck's)
>>
Haven't been here in a year, still using Mixtral 8x7b 3.75bpw on my 3090. Any model recommendations for RP for someone who hasn't been following the meta?
>>
it's been 3 months since nemo dropped, and it's still the best vramlet model
it's over
>>
>>102750146
nothing has changed
>>
Bigger models are smarter, but are they also more creative?
>>
>>102750367
They CAN be. They inherently know more, but still need temp/sampler wrangling. That’ll never change with this architecture
>>
>>102750367
Being smarter also means you are more slopped, that's true for humans too
>>
File: fff.png (415 B, 254x14)
>>102750367
I still wonder. Smarts helps the model keep things consistent. Dumb models will wander off and reach "creativity" by accident.
The best is probably a smart model with sampling that makes it go into uncharted territory every now and then, and then set normal sampler settings to let the "smarts" figure it out and roll with it. But not smart enough to figure out that the fun bits make no sense.
>>
>>102750367
Yes, in my experience self-merging a smaller model with itself made it more creative.
>>
https://www.youtube.com/watch?v=-fGkSXJAwV4
the future is here
>>
>corporate memphis op
SOVL
>>
>>102750146
Mistral Large/Mistral Small/Nemo Finetunes
>>
>>102750611
Mistral Small is garbage
>>
>>102750676
Skill issue
>>
>>102750515
Do any of the meme samplers do something like a 30% chance of selecting a totally random token after "." or at a newline? Maybe starting sentences with random tokens and then using a smaller temp would be better overall for creative-but-coherent output than trying to find the max temp before it turns incoherent?
>>
>>102750688
This, but unironically. It's probably the best option for anyone running a single 3090 setup right now.
>>
I've been doing some experimenting with more structured generation and found that increasing temp at common sense locations improves quality and output diversity without fucking up coherence. For example, bumping up temp a lot for the first token of each sentence. Has anyone packaged tricks like this into a sampling tool yet?
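Not packaged into anything that I know of. The core of what I've been doing is roughly this, as a toy sketch over a raw logit list (plain python, not hooked into any real backend; the vocab and logits are made up):

import math, random

def sample(logits, temperature):
    # softmax sampling at a given temperature (weights don't need normalizing for random.choices)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    weights = [math.exp(l - m) for l in scaled]
    return random.choices(range(len(logits)), weights=weights, k=1)[0]

def pick_temperature(prev_text, base=0.7, boosted=1.6):
    # boost temperature right after sentence-ending punctuation or a newline
    return boosted if prev_text.rstrip(" \"'").endswith((".", "!", "?", "\n")) else base

vocab = ["The", " Suddenly", " Meanwhile", " She", " It"]
logits = [2.0, 0.5, 0.4, 1.5, 1.2]
prev_text = "the end of a sentence."
print(vocab[sample(logits, pick_temperature(prev_text))])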
>>
Consumer LPU when saars?
Bastard Jensen will pay
>>
>>102750764
This, but ironically.
>>
>>102750875
when intel gets its shit together(2 more years)
>>
>>102750146
If you can stand the speed hit Mixtral 8x7b Q6_K with 16k tokens of context and 19/33 layers loaded onto my 3090 runs at 5.8 tokens/second. With one fewer layer the speed drops to 5.5 tokens/second and I can fit 32k context. For me about 5.5 tokens/second is the spot right where it doesn't feel bad to use. When I'm not using Instruct the Mixtral 8x7b tune I've gravitated toward is BagelMIsteryTour.

Mistral NeMo is fool's gold. It sometimes looks so good at first that I'm sure if I just swipe or edit it a bit it will end up being fine, but it ends up completely falling apart in fewer than 6 messages in my experience.

Mistral Small fits onto a 3090 at 8.0 bpw with 16k context. It has a bit of a positivity bias. Maybe you could take a look if "just as good I swear" 8.0 bpw small beats brain damaged 3.75 bpw 8x7b for your purposes / writes in a way you like better. You won't see a big speed difference (on my machine 3.7 bpw 8x7b is 27 tokens/second and small is 31 tokens/second).
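For reference, the sort of launch I mean for the partial offload case looks something like this (the filename is just an example, and you tweak -ngl until it fits your VRAM):

./llama-server -m mixtral-8x7b-instruct-v0.1.Q6_K.gguf -ngl 19 -c 16384

The koboldcpp equivalents should be --gpulayers 19 --contextsize 16384, if I remember its flags right.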
>>
File: temps.png (75 KB, 1145x630)
>>102750718
I don't think so. Samplers are small, so it shouldn't be too hard to make one like that, if it's kept simple, of course.
However...
The default sampler chain in llama.cpp is temperature last, which makes it so that the samplers receive a fairly undisturbed list of logits. They're just trimming.
On the other hand, if you set temp first and THEN do all the trimming, you can keep more variety of tokens to stay past other samplers.
Playing with https://artefact2.github.io/llm-sampling/
Temp first keeps way more tokens than running temp last.
I'm sure i've seen this argument a long time ago and i'm sure there was a reason to not use it. I don't expect the output to be better, or even creative, but it gives more options for the token selector at the end. The way i see it, temp last keeps the output more sensible, even for ridiculous values of temp or samplers.
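Here's a quick toy check of that, counting how many tokens survive min-p when temperature is applied before vs after it (python, made-up logits, purely illustrative):

import math

def softmax(logits):
    m = max(logits)
    e = [math.exp(l - m) for l in logits]
    s = sum(e)
    return [x / s for x in e]

def min_p_survivors(probs, min_p=0.05):
    # min-p keeps every token whose probability is at least min_p * max probability
    cutoff = min_p * max(probs)
    return sum(p >= cutoff for p in probs)

logits = [8.0, 6.5, 6.0, 5.0, 4.0, 3.5, 3.0, 2.0, 1.0, 0.5]  # made-up distribution
temp = 1.8

print("temp last: ", min_p_survivors(softmax(logits)), "tokens survive")                      # min-p sees the raw distribution
print("temp first:", min_p_survivors(softmax([l / temp for l in logits])), "tokens survive")  # min-p sees the flattened one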
>>
>>102750983 (cont)
Of course, it makes little sense using both top-k, min-p and top-p at the same time. It's just for visualization.
>>
>>102750718
>>102750832
Fuck I didn't realize someone said basically the same thing literally right before I made my comment. One of the use cases I've had for selective high-temp sampling is character generation. I'll separate out traits like name, occupation, and other descriptors. Each descriptor is intended to be brief, only a few words at most. Then I'll uniformly sample the first token of each word from the N most likely tokens, where I shrink N for each successive word. I can get pretty diverse character gens this way. Generalizing this approach to longer structured writing is hard though. Could probably do something around punctuation or clauses.
>>
>>102750983
>>102751017
And for added chaos, a very low value for smoothing factor. It equalizes the token probs so all tokens are just as likely to be picked. Very high chance of nonsense. I don't think llama.cpp has smoothing. Dynamic temperature, with high range (more than the page allows) and low exponent approximates the same effect.
>>
>>102750956
>>102750611
Thanks for the info, especially the layers you can fit on a 3090 and the generation speeds are really helpful. Will give q6_k gguf and Mistral Small a try.
>>
Largestral is so sensitive and prudish
>>
Oh, also, I ran the official Mistral tokenizer scripts to see how they tokenize shit.
Mistral Large/Small/8x22b (V3 tokenizer):
<s>[INST]a[/INST]b</s>[INST]SYSTEM<0x0A><0x0A>c[/INST]d</s>
Mistral Nemo (V3 tokenizer w/ tekken=true):
<s>[INST]a[/INST]b</s>[INST]SYSTEM\n\nc[/INST]d</s>

Does anyone here format like this, or are we all doing it wrong? Also curious why one puts out <0x0A> where the other puts out \n.
>>
File: 89184233525.png (403 KB, 692x621)
your thoughts on it sirs?
>>
>>102751341
It's an AI generated answer, ironically enough.
>>
>>102751333
I meant to say I ran their tokenizer to see how it formats prompts (because it also formats the prompts for some reason)
>>
>>102751333
Fucking 4chan formatted them wrong, Mistral Large should be
<s>[INST] a[/INST] b</s>[INST] SYSTEM<0x0A><0x0A>c[/INST] d</s>
Note the spaces
>>
>>102751333
0x0a is 10, which is \n. Is that a literal "<0x0A>" string or just an escaped \n? Both are tokenized with the same script, just different model, i assume.
Maybe you can find something of interest here.
>https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md
You can also look for those tokens in the tokenizer.json files for each of the models, see if it's just an escaping thing or it's a literal "<0x0A>".
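Something like this should settle it, assuming the usual HF tokenizer.json layout where model.vocab maps token strings to ids (the file path is whatever you downloaded):

import json

with open("tokenizer.json", encoding="utf-8") as f:
    tok = json.load(f)

vocab = tok["model"]["vocab"]
print("<0x0A>" in vocab)    # True -> a literal byte-fallback token exists in the vocab
print(vocab.get("<0x0A>"))  # its token id, if present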
>>
>>102750956
>but it ends up completely falling apart in fewer than 6 messages in my experience
Make sure you don't use DRY sampler with it.
>>
>>102751333
Here's the source if you wanna test the prompt formats yourself
https://rentry.org/4ebczitt
>>
>>102751341
>>102751357
The equation implies AI is zero. Wew. Thanks.
>>
>>102745188
You have 48gb of vram. You shouldn't be using any model smaller than 123b Luminum. Even at low quants, it's far superior to 70b miqu.
>>
>>102751464
Good question. Also thanks, that's a nice resource. The way it talks about tokenizing messages separately and concatenating them made me nervous that llama.cpp might not be doing it right if you feed it one single prompt.

Here's a script confirming that inputs to llama.cpp tokenize exactly the same as the official formatting and tokenizing script:
https://rentry.org/959fpgmz
>>
>>102751341
Seems like some real pseudointellectual dreck
>>
>>102743974
How do I run a model and keep it updated with fresh Internet data?
>>
>>102752184
RAG
>>
>>102746375
I second this question. The Mistral Small fine tunes on huggingface are:

# Trained with Mistral [INST] [/INST]
ArliAI/Mistral-Small-22B-ArliAI-RPMax-v1.1: "creative writing and RP datasets"
gghfez/SeminalRP-22b: "RP and creative writing and some regular questions generated by Opus at 8192 context"
nbeerbower/Mistral-Small-Drummer-22B: Gutenberg DPO x2
rAIfle/Acolyte-22B: "a bunch of random datasets"

# Trained on top of Mistral-Small-Instruct with a different (or no) prompt format
Envoid/Mistral-Small-NovusKyver: "I ran a fairly strong LoRA on it using a private raw-text dataset"
InferenceIllusionist/SorcererLM-22B: "cleaned and deduped c2 logs", prompt format is "TBA"
LlamaFinetuneGGUF/mistral-22b-v0.4: "trained on the LlamaFinetuneGGUF/Programming-Alpaca-and-ShareGPT-Style"
nbeerbower/Mistral-Small-Gutenberg-Doppel-22B: Gutenberg DPO x2 except ChatML
TheDrummer/Cydonia-22B-v1.1: No info on training data, Pygmalion/Metharme prompt format

# Abliterated (was this needed?)
byroneverson/Mistral-Small-Instruct-2409-abliterated
zetasepic/Mistral-Small-Instruct-2409-abliterated

# ???
eagle0504/mistral-small-22b: no info on what this is
spow12/ChatWaifu_22B_v2.0_preview: some Japanese thing

# Merges
knifeayumu/Lite-Cydonia-22B-v1.1-75-25: Cydonia-22B-v1.1, Mistral-Small-Instruct-2409
Nohobby/Karasik-22B-v0.2: Mistral-Small-22B-ArliAI-RPMax-v1.1, SeminalRP-22b, Cydonia-22B-v1.1, Mistral-Small-Drummer-22B
Steelskull/MSM-MS-Cydrion-22B: Cydonia-22B-v1.1, Mistral-Small-22B-ArliAI-RPMax-v1.1, Mistral-Small-Gutenberg-Doppel-22B, Acolyte-22B
>>
>>102752232
ty anon
>>
wow it's over huh
>>
what are static vs imat quants for gguf? trying to understand the difference between:

IQ4_XS vs Q4_K_S
and i1-IQ4_XS vs i1-Q4_K_S
>>
File: Quants2.png (111 KB, 1771x944)
>>102752771
Here you go.

IQ4_XS has similar perplexity to Q4_K_S, while also being decently smaller in size. In general IQ quants seem superior to the old Q quants.
>>
File: Quants.png (349 KB, 2400x2400)
>>102752872
Oops, wrong graph. This one shows model size.
>>
>o1-preview2
it's actually over
>>
>>102752872
>>102752930
Q3, is it ever worth using?
>>
>>102752989
It was worth it a few months ago but not anymore.
>>
>>102752989
Q3 quants can still be surprisingly good. I generally prefer Q3 quants of big models over high quants of small models. Low Q2 quants are when things start to go to complete shit, IMO.
>>
>>102752988
If you sign up now you can enjoy 15 prompts per week, chuddie ;)
>>
>>102752989
Yeah, I've used q3 nemo on my phone, it was okay.
>>
Any settings/prompts I can use to force actions to be surrounded by asterisks (*) and speech between quotes (")? I'm pretty autistic about it, so if the response I get isn't formatted that way it causes me to regenerate
>>
>>102753115
Do you only want the italicization? If so, I would probably just not bother with the asterisks and instead rely on CSS in your frontend to italicize unquoted text.
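Something like this in the custom CSS box should do it, assuming your frontend wraps "quoted" speech in <q> elements the way ST does (the class name is from memory, adjust it to whatever your frontend actually uses):

/* italicize narration by default, keep quoted speech upright */
.mes_text { font-style: italic; }
.mes_text q { font-style: normal; }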
>>
>>102753136
Not a bad idea, I'll give it a try, thanks
>>
I am thinking about sticking my dick into my 4090 fan just to make it make me feel anything.
>>
>>102753435
do it faggot. post pics
>>
>>102753574
>he never stopped a fan with a finger...
>>
Are the Mistral templates in ST wrong? They should start with <s>, right?
>>
>>102753612
If Servicing my Tensor were serious software it would check for you whether you are being retarded when you put <s> at the start. But it is just a dumb frontend for coomers...
>>
>>102753612
No.
You never want more than one BOS token as far as I know.
Go to the original repo on huggingface and look at the jinja template in the config files if you are not sure.
>>
>>102753612
Your backend likely automatically puts a BOS token so you should not do it in the frontend. If you're using Llama.cpp, you can check by looking at the model information when it loads. It should say something like add_bos_token = true.
>>
File: cap.jpg (10 KB, 235x245)
My LLM induced erectile dysfunction has progressed further. I stop being horny when I see the silly tavern UI...
>>
>>102753668
>>102753702
Oh, okay, thanks. Koboldcpp shows add_bos_token = true.
>>
>>102753748
it's the other way around for me, I can't get horny if I don't see the ST UI
>>
>>102753702
>disabled Add BOS Token in ST after reading this
>all rerolls generate the same message regardless of temperature
I was about to rant and call people faggots but now I am too confused to do that.
>>
>>102753898
If ST has that option for a backend, then it is probably passing that variable to the backend. They probably should've named it something different since a user can confuse it for meaning that the frontend itself will add a BOS token to the prompt.
>>
8x V100 16gb or 2 3090s for LLM inference?
>>
>>102754148
do you wanna run a big model slow or a small model fast
>>
File: 1728439002206195.png (44 KB, 452x583)
>>102745694

>mixtral-8x7b-instruct-v0.1-limarp-zloss.Q5_K_M

Factually the best model to date. Almost every model has been a disappointment, either hyperfocused on particular subjects or lacking in intelligence.
You literally have never needed more. And with Llama 3 being an abortion, it still has not been passed.

This model will give you the sloppiest toppy and I quote; "What makes you think they don't know what I'm doing?" ****'s words were punctuated by the sound of her spitting onto ****'s dick. Her saliva dripped down onto his balls, making them slippery and slick in her grasp. "They know exactly what kind of girl I am…and they love me for it."
Also:
"Don't worry about that weakling Vegeta, you can call out my name when you cum, nobody will judge you here," He snickered, his tone dripping with disdain for the saiyan warrior.


I shill an ancient MOE mixtral because its PROVEN itself.
>>
Has llama.cpp finally released a static bound AMD rocm? Or can only ollama do this?
>>
>>102754214
You still find compiling spooky?. May as well use windows. They have pre-compiled HIP builds for it.
>>
>>102754167
Big model at a reasonable speed.
>>
>>102754198
purchase a promotion
>>
File: 1800.gif (1.84 MB, 325x244)
>>102754304
KEK
>>
>>102754294
8x 3090s
>>
>>102754326
Is V100 serviceable?
>>
>>102754410
Yeah, it doesn't have flash attention but it's better than having less VRAM. How are you planning to run 8 of them? I'm pretty sure most affordable servers I came across have 4x sxm2 slots at most.
>>
>>102743974
Who's got a good uncensored model that's actually uncensored, and not just for sexual stuff? How about an uncensored model that's an expert chemist?
>>
>>102754063
So does the config of the gguf enable the checkbox in ST that then passes the token to the backend? Or does the backend add the BOS token and the checkbox in ST should be disabled? I am getting a placebo feeling that mistral small is better with the checkbox disabled almost like ST keeps duplicating the BOS token.
>>
give it to me straight how slow would it be to run one of these on my android phone?
>>
>>102754506
No such thing. They have biases learned from their training data. Works the same way for humans.
As for cooking meth, i wouldn't trust an LLM to give me a good recipe.
>>
>>102754576
I'm sure 1B would be plenty fast
>>
>>102754589
>Meth
>Not adding a methylenedioxy onto the 3,4 positions of the benzene ring

Ngmi, anon
>>
>>102754576
>>102753064
Depends on the phone and model. That's like asking
>how fast is my car? How many people would it fit?
Just stop being a pussy. Try it and report back. Start with a small model like llama3.2-1b so that your battery doesn't start leaking nerve gas.
>>
>>102754472
https://forums.servethehome.com/index.php?threads/sxm2-over-pcie.38066/
I plan on following this build with AOM-SXMV.
On Taobao those 16gb V100s I think I saw one as low as $128.
>>
>>102754507
Really? If it is duplicating the BOS token, that could be a bug. Or maybe that is intended and in fact add bos token in ST is not the same thing as add BOS token in Kobold.

Do this:
In ST, right click anywhere and inspect element. Then go into the network tab. Then go to a chat and do a swipe. Then you should see some stuff pop up in the network tab. Click on the one that says "generate". Then click on the smaller tab that says payload (chrome), or request (Firefox). Then right click the top item that pops up under Request Payload and copy object (chrome) or copy all (Firefox). Then dump that in a https://femboy.beauty/ so we can see what's going on.
>>
>>102754709
you realise those are even slower, right
>>
>>102754742
>Request Payload and copy object (chrome) or copy all (Firefox). Then dump that in a https://femboy.beauty/ so we can see what's going on.
>>
>>102754632
That's the thing. Someone who knows enough chemistry wouldn't need an llm. Someone who doesn't wouldn't know if the recipe is correct.
I've always wondered about the anarchist cookbook's story... about the CIA distributing modified copies of some of the recipes to make them ineffective or just too dangerous to pull successfully or something like that.
Then the people that think they have a good copy would trust the recipes. The ones that don't would be suspicious about it. And everyone argues about who has the one True Book.
Imagine if they did the same thing when training LLMs. You'd end up making a bunch of actual pretty crystals instead of chloramine...
>>
>>102754759
I would say dump in a pastebin but nowadays it seems like people are using that meme site instead.
>>
>>102754780
I don't think that's the part that confused anon...
>>
>>102754746
How slower?
>>
>>102754746
Actually no I wasn't aware. I was just trying to get some reason if that jank setup for cheap vram was worth getting into or if it's better to persue some other option.
>>
File: money.jpg (115 KB, 1024x1024)
>>102743974
The strategic realignment towards family-friendly content appears to be yielding significant returns. It has accrued 254 posts and 28 pictures in under a day. Impressive metrics from LMG.

This truly demonstrates the power of accessibility and broad appeal in the AI education market. After all, this is what LMG has always been about.

With this family friendly pivot, LMG has successfully tapped into a previously underserved demographic, securing its position as a dominant player in the infotainment space. This strategic positioning opens avenues for lucrative partnerships and cross-promotional ventures. Congratulations LMG anon, well played.
>>
>>102754762
And yet, there is a space between knowing enough to not need one and knowing so little that one wouldn't actually be helpful, where some of us sit and would like a brain to pick regarding questions that Google and the powers that shouldn't be are preventing us from getting answered.
>>
in the meantime the number of papers on spike-based machine intelligence is increasing dramatically. up to ten orders of magnitude faster inference with ten orders of magnitude better energy efficiency compared to our current tech - I'm dropping this stuff and switching to model designs for neuromorphic chips. Why keep riding the old tech when the real revolution is already here?
>>
>>102754815
>>102754759
Does this help? This is Brave. It should be the same as Chrome. But I guess if it's not, because someone felt like being a special snowflake or something, then try figuring it out yourself, it should be a similar process. Also I don't have kobold installed so I edited the image to indicate what it should be.
>>
>>102754252
If you want to be able to compile, you have to do a whole separate install, because amd's amdgpu installer fucks up your install.
>>
I really don't think the llama.cpp guy knows anything about amd gpu (understandable, since amd gpu are rare and obscure hardware).
>>
I could resolve the dependencies of the compile I already have by hunting down the required files. And it may be possible to compile without installing amdgpu by mirroring the correct directory, but none of the how tos explain exactly which one, and I very much doubt that the llama.cpp guy has any idea which files those would be - amd is an exotic gpu type.
>>
>>102754944
>would like a brain to pick regarding questions
Sure. But censoring or not, it's the knowledge itself that i wouldn't trust. And they're not smart enough to pick up inconsistencies in their "thought process" when one is pointed out, or just say yes to user complaint, which is just as useless.
All the information is out there, in books, reports, papers, whatever. Take hacking, for example. Sure, you have the "Become a Hacker in 24 hours" kinds of books, and then you have the actual software, source code, CVEs, hacking damage reports, zines... you can learn from all of those.
A big enough chemistry book will teach you how to make meth, but it won't have a chapter named "Making meth for the masses".
>>
is Behemoth 123B any good?
>>
>>102755054
Your distro/package manager should have amdgpu *and* the dev libraries. Don't use amd's installer.
You can also try vulkan. I can compile it for vulkan on fucking OpenBSD. If you cannot do it on linux i have to call it skill issue.
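For vulkan the build is basically just the two commands below; the flag name is from memory (it was LLAMA_VULKAN before the ggml rename, GGML_VULKAN after), so check the build docs if it errors:

cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j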
>>
>>102755121
Try it and report back.
>>
>>102755054
>>102755171
Another option is bootstrapping into a chroot. Install your distro and the amd stuff on the chroot, activate it, compile, deactivate, copy the binaries to your ~/bin/ and run.
>>
File: file.png (879 KB, 768x768)
Daily that face for the dead thread.
>>
Don't call the dead thread a dead thread.
>>
File: 39_00815_.png (981 KB, 744x1024)
>>102752232
>SorcererLM-22B:
>"cleaned and deduped c2 logs", prompt format is "TBA"
It's also Mistral for the prompt format. Updated the model card to make it clearer and added some other recommended configs to play around with, thanks for the reminder anon
>>
>>102754844
>half the memory channels
>___ the speed
fill in the blank
>>
>>102755628
unzips?
>>
>>102755673
F- you're expelled
>>
>>102755628
feel
>>
>>102755628
Shivers
>>
why the fuck do most of the absolute retards who do exl2 quants never list their settings? the quality of exl2 quants can vary a lot depending on the used calibration dataset, header size and all the other parameters.
>>
>>102755917
>calibration dataset
you mean you shouldn't touch it, right?
>>
>>102755929
Yes, but for all I know there are still plenty of retards around who pick up a guide from a year ago, from when there was no proper default one included, and do it with wikitext or something.
>>
Hello, good people, any Mistra Large tune/merge recommendations?
>>
>>102756031
WhizReviewer is the best
>>
>>102756031
Luminum is still my favorite.

Behemoth 123b just came out. Haven't tried it yet.
>>
>>102756031
The one that starts with the same letter as the parent model
>>
>>102756128
[L]arge

Lumimaid?
>>
>>102755171
called what?

>>102755242
Interesting idea, not sure how it works, but it sounds promising.
>>
>>102755545
Hi tanned-pixelated-Miku-anon. How ya been?
>>
>host nextcloud instance on home server
>put all your files and documents on there
>put all your notes on there
>import all email into there
>manage calendar on there
>do your finances on there
>set up nextcloud assistant AI
>runs a llama 3 8b model that trains itself on your data locally
>get an administrative assistant that knows all your data and can actually help your interactively with managing your life
>but it's all hosted locally, there's no phoning home, no data harvesting
This sounds like the holy grail of AI, not gonna lie. The advertised result is the same as going balls deep into Google/Microsoft/Apple cloud and using their AI tools, but without selling your soul to the devil

Has anybody tried this feature themselves? It's really, really tempting for me.

Looking it up I see posts saying llama 3 8b only needs 8gb of VRAM -- would slapping a $300 3060 12gb in the server be enough to run it? Don't want to drop real money on this until I'm sure it's worth it, so it's okay if it's a little slow. Also it does appear to be able to support different models as well, though no idea how smoothly that works.
>>
File: garbage.png (350 KB, 2441x1230)
>>102756268
picrel. The instructions on llama.cpp tell you to use amd's script. This will screw up your install.
>>
>>102756321
>8b is AGI
I look forward to your bankruptcy hearings.
>>
>>102756348
Of course it isn't AGI. But you don't need to be a genius or even of average intelligence to be a secretary. Most secretaries are dumb as a bag of bricks, but they're still helpful. This is just a digital secretary that you own.
>>
File: vulkan.png (2 KB, 477x125)
>>102756268
>called what?
I have no idea what you're running, mate. Search for vulkan, rocm, hip or amd on your package manager. Some of the packages will have -dev, -headers or something along those lines. Even if you use vulkan (as opposed to rocm), it'll still run faster than cpu or not running at all.

>Interesting idea, not sure how it works, but it sounds promising.
Again. It depends on your distro. Here's some docs for arch
>https://wiki.archlinux.org/title/Chroot
Search for docs on your specific distro and adapt it to suit your needs.
>>
>>102756456
lol
>>
so I installed kobold.cpp on my phone and it works but I only installed the seemingly shitty .gguf from the tutorial
what's a better one?
>>
>>102756128
>>102756134
Mistral Large's first letter is M. I think he means Magnum 123b.
>>
File: 39_00983_.png (1.68 MB, 896x1152)
1.68 MB
1.68 MB PNG
>>102756270
Howdy. All good here. Got a few fine-tuning discussions going on around some very interesting datasets. Whatcha got going on?
>>
Starting to realize it's more time efficient to do 5 or 10 generations with a good 12B model and take the best one (good odds that at least one won't be retarded) than to wait for a behemoth with 40% of the layers offloaded to grind out one good response at 1 t/s.
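The time math behind that, using made-up but plausible speeds (both token rates are assumptions for illustration):

# 10 fast generations vs 1 slow one, for a ~300-token reply.
reply_tokens = 300
fast_tps = 25   # assumed: 12B fully on GPU
slow_tps = 1    # assumed: 123B with a chunk of layers offloaded to CPU

print(f"10x 12B:  {10 * reply_tokens / fast_tps:.0f} s")  # 120 s for ten candidates
print(f"1x 123B: {reply_tokens / slow_tps:.0f} s")        # 300 s for one reply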
>>
File: 1712721345539589.jpg (870 KB, 1000x1414)
870 KB
870 KB JPG
Alright, i can't seem to get midnight miqu 70B to run at an acceptable pace on my 4080 super. Is there a different, faster model that also isn't completely retarded?
>>
>>102756964
>4080
I'm so sorry. Your only option left now is the retard Mistral Small.
>>
welp, I only use base models now. Fuck instruct and fuck fine tunes, they just lobotomize everything.
>>
>>102757016
base models are just too schizophrenic to be useful unfortunately
>>
>>102757016
Nemo 12B base is really good if you're a storyfag, and has no positivity or safety bias whatsoever. Not sure annoying it'd be for an RPer to try to wrangle it into the chat format though. Probably a lot. Those of us who just need a pure autocomplete have it easy.
>>
>>102757036
*how annoying
>>
The "deals" on amazon prime day are dogshit.

I miss the days back when things were nice. The very best cards cost $599
>>
>>102757036
>12B
What do you need to run that?
>>
>>102756982
I can't tell if this is a joke or not
Surely i can run something decent on this?
>>
File: file.png (111 KB, 640x562)
111 KB
111 KB PNG
>>102757080
>Surely i can run something decent on this?
anon...
>>
>>102757016
>>102757036
Anything beyond 40B?
I tried out base Qwen and it was dumber than instruct, and it was also censored somehow, but I think it might be an exception and they trained on a ton of instruct data in the base. With which one do you get something that's both smarter and less censored?
>>
>>102757091
Pls spoonfeed me, I genuinely have no clue what i'm doing
>>
>>102757080
>>102754198
Also, how much RAM? That's probably why you couldn't run a 70B. Mixtral may be your best option.
>>
>>102757135
32GB
>>
are drummer tunes trash? I always ignored them because they only came in worthless sizes, but I'm curious if the largestral one has any potential
>>
>>102757080
Just try mistral nemo, anon. It should run just fine. Ignore elitists and have fun.
>>
>>102757091
>>102757129
Download koboldcpp.
Go to huggingface and look for mistral nemo instruct from bartowski.
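If you'd rather script the download than click around the site, huggingface_hub can grab a single quant; the repo id and filename below are what I'd expect bartowski to use, so confirm them on the model page first:

# Fetch a single GGUF quant instead of cloning the whole repo.
# Repo id and filename are assumptions; browse the repo page to confirm them.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bartowski/Mistral-Nemo-Instruct-2407-GGUF",   # assumed repo id
    filename="Mistral-Nemo-Instruct-2407-Q4_K_M.gguf",     # assumed filename
)
print("Saved to:", path)  # point koboldcpp at this file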
>>
>>102757036
>Nemo 12B base is really good if you're a storyfag
I'm the anon you replied to, exactly what I'm doing is using nemo 12b base for stories kek
>>
>>102757145
You can run the Q2_K of Miqu-70b:
miqudev/miqu-1-70b
Still better than nemo even at this size.
Big model @ small quant > small model @ big quant
Midnight Miqu is a meme; any attempt to requant miqu is doomed to failure, inb4 shills.
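For the anon with a 16GB 4080 and 32GB of RAM, the size arithmetic behind that advice (the bits-per-weight figures are rough community numbers, used only for illustration):

# Approximate file sizes: a 70B at Q2_K vs a 12B at Q8_0.
def size_gb(params, bpw):
    return params * bpw / 8 / 1e9

print(f"70B @ Q2_K (~2.6 bpw): {size_gb(70e9, 2.6):.1f} GB")  # ~22.8 GB, split across VRAM + RAM
print(f"12B @ Q8_0 (~8.5 bpw): {size_gb(12e9, 8.5):.1f} GB")  # ~12.8 GB, fits mostly in VRAM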
>>
File: GrinsonMikiulKeeep.png (1.26 MB, 832x1216)
1.26 MB
1.26 MB PNG
>>102756899
It was good to see one of your classic Mikus again.
I'm still working on some miku rpg stuff, but I'm on the road for work right now so communications are limited and I'm mostly tethered to a cellphone.
NTA, but I've been messing with Illustrious a lot, too. There's some real potential there. You play with it much yet?
pic not related. Just a random backlog gen
>>
>>102757196
It's so sad he couldn't upload any other quants. I can just about fit IQ4_XS to run at a decent speed, while Q4_K_M is too large and Q2_K is too small.
>>
>>102757024
you have to really wrangle base models into working with multi-shot prompting. They only understand autocompleting and following examples. It feels less neutered when it does work though.
>>
>>102757160
>Ignore elitists and have fun.
This. You won't get gpt4 at home on a consumer gpu, but it's still stuff that was science fiction a couple of years ago.
>>
>>102757196
>>102757160
>>102757166
I'll try both of these out and see how it goes, thanks
>>
>>102757262
The core ontologies they were trained on are all the same.
>>
>>102757103
Yeah, Qwen's bases are full of instruct data and they actively bragged about how filtered their pretraining dataset was. They're not truly base models; a lot of big labs are becoming more dishonest about this.
The giveaway for a true base/raw pretrained model (and Nemo 12B base is one) is that it will not understand or work in the chat/RP format at all due to being a pure autocomplete model. If you can plug it into a chat completions API format like SillyTavern and it just werks, it's probably one of these new fake bases that was full of instruct data.
>>
>>102757059
Fuck all, even an 8GB GPU is fine for Nemo 12B. Get a Q6 GGUF quant (it will be about 9GB); you'll only need to offload a few layers to the CPU and it'll still run very quick.
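If you end up scripting it instead of using a UI, partial offload is a single parameter; a minimal llama-cpp-python sketch, where the filename and layer count are placeholders to tune for your card:

# Minimal sketch: run a Q6 Nemo GGUF with most layers on the GPU, the rest on CPU.
# Path and layer count are placeholders; lower n_gpu_layers if you hit OOM.
from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-Nemo-Instruct-2407-Q6_K.gguf",  # hypothetical local file
    n_gpu_layers=35,   # offload most of the model's ~40 layers to the GPU
    n_ctx=8192,
)

out = llm("Once upon a time,", max_tokens=64)
print(out["choices"][0]["text"])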
>>
File: 39_06109_.png (2.83 MB, 2048x2048)
2.83 MB
2.83 MB PNG
>>102757241
A damn shame they never released the original fp16 weights.
>>102757239
>classic Mikus
Was feeling nostalgic. Thanks for remembering it. Looking forward to the rpg too.
>>Illustrious
Can see how it's got a much better understanding of styles (picrel: low poly) but I haven't wrangled it nearly to the level of Flux or SDXL and everything is a bit rough so far.
>>
Mistral Medium 2 + open weights when?
>>
>>102757726
38 days.
>>
File: ComfyUI_06455_.png (2.12 MB, 1280x1280)
2.12 MB
2.12 MB PNG
>>102757726
>>
lol lmao kek rotfl

https://github.com/ggerganov/llama.cpp/blob/master/docs/docker.md

>has a rocm image
>has no command for usage

the docs are in dreadful shape for llama.cpp
>>
>>102758184
Rah!
>>
lmao

Finally I have docker reading my gpu.

llama.cpp docker seg faults, because of course it does.
>>
File: 2410.05993.png (499 KB, 1275x1650)
499 KB
499 KB PNG
>Aria, the world’s first open-source, multimodal native Mixture-of-Experts (MoE) model.

https://huggingface.co/rhymes-ai/Aria
https://github.com/rhymes-ai/Aria
>25.3B total parameters
>3.9B activated parameters
>Pre-trained from scratch on multimodal data
>Fine-tuning code and examples provided
>>
As I predicted, llama.cpp is actually garbage, but noooo, everyone said it was going to work, and it doesn't :^)

ollama works, by the way.
>>
>>102758389
ollama is a wrapper around llama.cpp, anon
llamacpp is its backend
>>
congratulations, you took the bait and it didn't even bump the thread
>>
>>102758448
ollama is a fork now, not just a wrapper
olchads are even getting multimodal before lcppcels do
>>
>>102758474
if this is true why isn't llamacpp stealing their improvements, they should
>>
>>102758462
>he doesn't use 'last reply'
>>
>he uses last reply
>>
>he replies
>>
*brap*
>>
*plap*
>>
Thanks for all the Mikus. LMG has been sold to Gumi.
>>102758839
>>102758839
>>102758839
>>
>>102746014
This model is better than Mistral 12B if you don't mind a cucked, censored model with low context; it's what I use for Page Assistant with an RX 6800 16GB. It's seriously overpowered, just censored as fuck.
https://huggingface.co/princeton-nlp/gemma-2-9b-it-SimPO
>>
>>102758846
A rare victory for GUMI!
>>
>>102757390
petra you fat bitch, stop gangstalking me
>>
>>102752930
>>102752872
Is there a chart like this for 70b models?


