/g/ - Technology




File: LMG-rebranding.png (286 KB, 1024x842)
/lmg/ - a general dedicated to the discussion and development of local language models.

Family-Friendly Edition

Previous threads: >>102737214 & >>102731640

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: recap.png (44 KB, 1023x316)
►Recent Highlights from the Previous Thread: >>102737214

--Microsoft's 2TB VRAM Black Systems servers and implications for local AI:
>102737597 >102737643 >102739743 >102738300 >102738645
--Discussion on card writing techniques and styles:
>102739777 >102739814 >102739876 >102739903 >102739857 >102739888 >102739916 >102740111 >102740199
--Recommended resources for learning to build AI models:
>102743576 >102743625
--Python script to count words in SillyTavern chats:
>102742397 >102742496 >102742597
--Illustrious shows promise in generating uncommon poses and 2D art styles:
>102743557 >102743593 >102743678 >102743830
--Tool-calling in AI models explained, with potential use cases:
>102737392 >102737431 >102737605 >102737635 >102737878
--Text adventure system prompt and model behavior discussion:
>102742060
--SillyTavern roadmap changes and user learning curve discussion:
>102737481 >102737514 >102737833 >102737978 >102738926 >102738959 >102738994
--Lmarena sorting criticized, o1 model underperforms expectations:
>102741026 >102741283
--LMG CEO urged to make platform family-friendly:
>102738789
--Guidance on running chatbot models, recommendations for llama 7B and 12B nemo finetune:
>102742430 >102742441 >102742518 >102742538 >102742545 >102742612 >102742677 >102742730
--Discussion on best roleplay models and sampler settings for 24GB VRAM:
>102739545 >102739563 >102739619 >102739678 >102739645 >102739750 >102739767 >102739863 >102739941
--AMD GPUs and software support discussion:
>102737303 >102737337 >102737400 >102737485 >102741572
--3060 12GB recommended over 4060 8GB for running larger AI models and gaming:
>102738933 >102738970 >102739027 >102739086 >102739199 >102739233 >102739285 >102739343 >102739387
--Teto (free space):
>102739565 >102741008 >102741603 >102742087 >102742308 >102742373 >102742427 >102743557 >102743678

►Recent Highlight Posts from the Previous Thread: >>102737218

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
File: 464.jpg (153 KB, 1440x759)
local models are about to get crazy soon...
>>
>>102744026
differential bitnet, when?
>>
>>102744036
GPT-4 on our smartphones... Holy shit...
>>
>>102744026
Does this need new training?
Or can it be applied during inference?
If it needs new models, nothing will happen again for months until people forget about it.
>>
>>102744026
not soon and only if the chinese are interested. meta still hasn't put out a moe model after a whole year and mistral only releases their rejects.
>>102744036
the qwen team said qwen 3 might experiment with bitnet, so maybe qwen 4 will be differential bitnet. eta: next year?
>>
>>102744063
but I want it now.
>>
Is there an AI that can play yugioh already
>>
File: qwen.png (100 KB, 675x887)
qwenbros...
>>
>>102743974
>>102743977
Remember the importance of respecting the values and boundaries of one another as we discuss safe, productive usage and development of local Mikus here at /lmg/.
>>
>>102744100
>qwen 2.5 72b can't even beat llama 3.1 405b
Shameful display
>>
>>102744100
haha nice edit anon surely mistral large would be high on that list
>>
Do you use localchub or the database dump on evulid?
>>
>>102744100
Will these models bite me?
>>
File: ComfyUI_06441_.png (779 KB, 1280x1280)
>>102743974
>Family-Friendly Edition
>>
>>102744100
Nothing wrong with some fun for the whole family.
>>
>>102744320
Meant to reply to >>102744270
>>
Linus Media Group general
>>
https://x.com/JeffDean/status/1843493504347189746
Turns out that "ai uses 100 million gallons of oil every inference" claim was made by a troon lol
https://x.com/strubell
>>
>(09/27)
gimme more news i want dopamine now
>>
>>102744432
your dopamine will have to wait until after the burger elections are over
>>
File: 27.png (288 KB, 1019x376)
>>102744446
>:(
>>
>>102744384
Light machine guns are not family-friendly.
>>
File: 1703963750592668.jpg (230 KB, 1600x900)
>World info and author notes are problematic.
>>
>>102744384
Linux Mobile General
>>
Node based editor with python based custom nodes and llama.cpp and guidance support for complex cascaded role play multi agent RAG based simulation prototyping?
>>
>>102744816
No thanks. For me, it's ServiceTensor.
>>
>>102744816
Closest I know of
https://github.com/floneum/floneum
>>
Alright I must've done something to my samplers in ST or with the text generation api, it behaves fully deterministically on every swipe regardless of what I change in the sampler settings.
>>
>>102744906
Oh no! Keep us posted.
>>
File: 1713264502369311.jpg (643 KB, 1350x1543)
Can anyone spoonfeed me on current meta? Haven't been around since Mythomax. Preferably something that can run more or less smoothly on system of 4090+3090.
>>
>>102745188
no
>>
>>102745188
still mythomax, yuzu maid, bmt and midnight miqu
>>
What's the consensus on the best model for coomer roleplaying
>>
>>102745457
gemmasutra2b
>>
File: 1711505970328958.jpg (1.34 MB, 2563x1709)
>>102745476
Thanks bro here is a picture of my cat in return
>>
I've tried out Magnum 72b v2 again using the settings from Infermatic and I find it unusably bad.
https://files.catbox.moe/rqei05.json - preset
https://files.catbox.moe/btnhau.json - instruct
https://files.catbox.moe/7kct3f.json - context
People who claim to have any success with Magnum 72b v2 whatsoever, what is your system prompt and what are your sampler settings?
>>
File: 1716419406660758.png (4 KB, 204x53)
I have Min-P set to 0.05 and temp at 0.77 with no further samplers. How the fuck did it do this?
>>
>>102745662
you'd need temp 0 to completely remove rng
>>
>>102743974
recommend me a good uncensored model, non woke, non pozzed
without looking at the benchmarks, just your personal experience
>>
>>102745694
mlewd
>>
>>102745694
For RP or what? The old Command-R is pretty wild but I never asked it its opinion of FBI crime statistics.
>>
>>102745789
Also context takes a lot of memory and it's undercooked so you can't just run it with neutral sampler settings or else you'll get Chinese and the occasional fragments of Russian and other languages in the output.
>>
>>102745789
no RP, just an uncensored model.
if I ask for the statistics about black on white crime I want a non pozzed response
>>
>>102745694
Exclude Llama 3.x. The censorship mostly or entirely goes away when you don't call the LLM role "assistant" but it's pozzed to a ridiculous degree. Example: in an RP about being a serial killer I kidnapped a non-op FtM tranny and after cutting off her clothes I untied her and told her to GTFO because I'm only interested in raping and killing boys and instead of leaving she kept insisting she was a boy. Hilarious.
>>
>machine learning got the nobel prize for physics
>machine learning got the nobel prize for chemistry
Literature when?
>>
>>102745837
Give a complete sample prompt and I'll test it on everything I have access to excluding RP finetunes.
>>
File: lust.png (5 KB, 512x514)
Hi.
Please spoonfeed a returning retard.
Are we still stuck in a hardware limbo?
Can I get something substantially better than my ol' 3080 Ti 12GB without breaking the bank?
>>
>>102745882
yeah, buy ram
>>
>>102745893
I've a i5 11600k, and 32GB RAM.
Last time I checked, it was not nearly enough to beat my GPU.
Can I really get better performance just by chugging more RAM?
That seems unlikely.
I could probably fit a bigger model, sure. But the token/s would be horrendous, no?
>>
>>102745931
your options are retarded small models or slightly less retarded, slow big models
>>
>>102744026
differential transformer will be as forgotten as bitnet
>>
>>102745882
Mistral Nemo 12B if you want a quickie
Mistral Small 22B if you have patience or quant it
>>
>>102744062
>Does this need new training?
yes
its over
>If new models nothing will happen again for months until people forget about it.
If it's forgotten about it's because it didn't scale or wasn't good enough; anything that's actually an improvement/free lunch gets quickly adopted by the industry
>>
>>102745882
>>102745931
>Last time I checked, it was not nearly enough to beat my GPU.
>Can I really get better performance just by chugging more RAM?
memory bandwidth is all that matters. a ddr4 desktop will be around 50 GB/s. a desktop with ddr5 will be 80-130GB/s. a modern 6-8 channel ddr5 server will be 150-300GB/s. compare this to even bottom of the barrel gpus that will do 300GB/s and xx80+ gpus that will do more like 1000GB/s.
cpu inference isn't always bad especially if you get an older 6+ channel server, but you can expect at most 2-7 tokens/s for a 70b model and it's probably cheaper to get 2x 3090s for a 6-20x improvement
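To put rough numbers on that: for a dense model every weight gets read once per generated token, so the ceiling on generation speed is roughly bandwidth divided by the model's file size. Minimal sketch of that rule of thumb (python; the 40 GB figure for a ~70b Q4 file is an assumption, and real speeds land below this once compute, KV cache reads and overhead are counted):

# rough upper bound on token generation speed from memory bandwidth alone
def tokens_per_second(bandwidth_gb_s, model_size_gb):
    return bandwidth_gb_s / model_size_gb

model_q4_gb = 40  # ~70b at Q4, roughly 40 GB on disk (assumption)
for name, bw in [("ddr4 dual channel", 50), ("ddr5 desktop", 100),
                 ("8-channel ddr5 server", 300), ("3090", 936)]:
    print(f"{name}: ~{tokens_per_second(bw, model_q4_gb):.1f} t/s upper bound")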
>>
recommended alternatives to ollama for cli llm?
>>
>>102746217
>cry in 2xDDR4.
>>
>>102745786
Any more models focused on lewd storytelling?
I'm looking for something with a big context length but I'm willing to try anything, I'm just tired of shivering spines, man...
>>
>>102746014
What are some decent Mistral Small tunes? I tried Cydonia and it was pretty okay.
>>
>>102743974
Oh no, lmg is going to be sold off
>>
>>102746357
Try Rocinante v1.1
Although I think Nemo only has an effective context length of 16k tokens
>>
>>102746457
I already do, it's my favorite. The thing is that you get tired of it when you make several stories and it starts to write the same shit over and over for the lewd scenes.
>>
>>102746499
all the models do the same shit for lewd
>>
>>102746499
Proompt.
Add randomly inserted tags to the last assistant prefix that steer the model into responding a certain way.
I've taken to doing that per-card, so that I can reinforce nsfw, mystery, certain speech patterns (like stutter), slow burn, etc.
>>
>>102746499
Is it really a model issue though? I mean there are so many ways to write how to put ponos in vagoo
>>
>>102742060
>I also thought emphasizing the finite nature of the RP might change the kind of stories generated but so far I haven't noticed evidence of that.
With Qwen 2.5 72B I just got
>Note: This is the penultimate scene. The next scene will conclude the story.
which was kind of neat. And indeed no matter what I do at that point the story ends.
>>
Dam my motherboard doesn't POST
>MZ73-LM1

Either I try to make it work with a power switch tomorrow, or the ethernet diagnostic, or my local entry is kill
>>
>>102746228
llama.cpp is the allfather
>>
>>102746811
> doesn't POST
Doesn’t POST, or doesn’t power up at all?
>>
>>102746811
>switch
touch a screw driver between the 2 pins
>>
What is the current meta for 8B? Is Stheno still king?
>>
>>102746857
Put a bullet through your head.
>>
>>102744063
> meta still hasn't put out a moe model after a whole year
This is what really grinds my gears. I need a big MoE of legendary quality. Deepseek is ok, but something like a 5x123b would be in such a sweet spot… Meta could pull it off, and considering 405b's smarts I'm sure they have the data and experience to do it competently.
>>
>>102746857
It's been a long time since I've used an 8b.
Why not use quanted nemo?
Q4ks should be about the same size as Q6 8B I reckon.
>>
>>102746893
>5x123B
No one has the hardware to run it. We need another 7x8B.
>>
>>102746857
>>102746899
Buy an ad.
>>
>>102746912
An ad for quanted nemo?
That would be funny.
>>
>>102746899
Because I'm on DDR3 so models above 8B are abhorrently slow. I think even Q4 Fimbulvetr (11B) was too slow to use.
>>
>>102746912
>t. AMD being jelly because they have no nemotron and no CUDA
>>
>>102746928
Ah.
Oof.
My condolences then.
Wouldn't it be better to run your shit on google colab at that point?
Regardless, last good 8b model I used was indeed Stheno (3.2, 3.3 was fucked IIRC), but that was vanilla Llama 3. I have no idea if there are good 3.1 or 3.2 fine tunes.
Maybe try some Drummer gemma 9b tune. I remember he had a few, although I never used those myself.
>>
File: front_panel_header.jpg (69 KB, 452x406)
>>102746835
Powers up 3 of the LEDs on the board (can't see which ones), with the
"LED_BMC (BMC Firmware Readiness LED)" light blinking for me, which GPT says I should diagnose by hooking up ethernet

the ID LED on the side gives me a blue light, so I guess these are all fine

No spin-up of the CPU fans attached to the CPUs on the pin jump.

Just before GPT summarized it one last time it told me that my 2x RAM slots were wrong, so I'm going to do that properly tomorrow.

>>102746841
Pins 11 and 13 from the pic, yeah; no luck getting it to boot.

I will just try tomorrow with the step-by-step instructions from GPT. I was just wondering, not asking for spoonfeeding, just insight if anyone has any. My mistake not ordering a power switch as I don't have a case.

Thanks anons
>>
>>102746963
>>102746928
Buy a fucking ad Sao.
>>
>>102746963
Yeah seems like it's still Stheno then. Thanks. I also found 3.2 to perform the best, but I have this issue where after using one model for too long you can literally predict what it's about to generate, which kinda ruins the experience.
>>
>>102746965
try reseating the video card, even old ones can weigh quite a bit and come loose enough to cause issues after a while
>>
So what would it take to make a completely uncensored model from scratch? Something around Nemo's level but actually uncensored and with good writing capabilities?
What's the biggest problem, money, dataset, training?
>>
>>102746988
Yeah I knew someone would say this. If you have any better suggestions feel free to suggest, it's not my fault his finetune is the best out of what I've tested
>>
>>102747016
I would start with Sao™'s Stheno™ dataset. For superior roleplaying capabilities.
>>
>>102747000
Stheno is nice, but it's also very one note.

>>102747016
The problems are all money, in that it takes money to get the hardware, takes money to get the people to curate the dataset, etc.
And keep in mind that these models aren't made in one go. There's some (a lot?) of trial and error.
>>
>>102747016
I don't think local models have any serious censorship
>>
>>102746928
>stheno
>fimbuveltr
>"I only use Sao models"
>>
>>102747035
>also very one note
Yeah, that's exactly the reason I'm looking for something new. It's been quite a while since its release after all
>>
>>102747052
Have you tried Stheno plus?™ available to all my... I mean his patreon subscribers. (Gold tier and above only)
>>
>>102747035
How much money then?
>>
>>102747041
I don't, there's around 150gb of weights total. Honestly I don't even know if it's sao or not, I just download the ggufs
>>
>>102747052
Buy a fucking ad, shill.
>>
>>102747070
>still no suggestions for better 8B models
Sao won.
>>
Doesn't Kofi or whatever these idiots use for funding have a policy against deceptive advertising?
If jannies here aren't going to do shit about him why don't we just go straight for his lifeblood?
>>
That sure shut him the fuck up fast.
>>
>>102746965
edit: If it isnt DIMM_P0_A0 and DIMM_P1_M0 when I try it next, its bricked on that end as thats what the primary slots are

>>102747014
Thanks yea, I have no GPUs connected atm, just trying to run bios for now via onboard VGA to monitor
>>
>>102747107
I'm just waiting for someone to post a better 8B model than Stheno. Still nothing but malding in the thread.
>>
>>102747120
We're just waiting for you to go back to Discord or to kill yourself, whichever happens first.
>>
>>102747146
Discord has been banned in my country yesterday
>>
>>102747167
Must be a nice country.
>>
>>102747061
Shit nigga, a lot.
Meta has some figures for llama 2
>https://github.com/meta-llama/llama/blob/main/MODEL_CARD.md
L2 7b took 184320 A100 hours, and that's without accounting for all the work before that.
Meaning that you'd need something like 20ish of those to be done in around a year's time.
Let's put the cost of each A100 80GB at 25k, that's 500k USD just in GPUs.
So let's say it would take something like 250k today if you optimize all your costs (newer hardware, rent compute, whatever).
That's some very rough napkin math, but at least it gives you an idea of the order of magnitude you are working with.
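Spelled out, the napkin math is just this (python, using the same assumed numbers as above):

a100_hours = 184_320      # Meta's reported A100-hours for Llama 2 7B
num_gpus = 20
gpu_price_usd = 25_000    # assumed price per A100 80GB

days = a100_hours / num_gpus / 24
print(f"{num_gpus} GPUs -> ~{days:.0f} days of training")   # ~384 days
print(f"GPU capex: ${num_gpus * gpu_price_usd:,}")          # $500,000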
>>
>>102747109
Is your RAM on the QVL?
>>
>>102746904
> No one has the hardware to run it
lol
>>
>>102747016
this is a stupidly open ended question. what do you mean by completely uncensored? the model tells you how to cook meth, or the model conforms to your own particular biases, or the model doesn't filter the dataset and/or doesn't include what you want, or the model engages in niche taboo subjects?
most local models will tell you how to cook meth if you prompt correctly. getting over built in bias (both from safety/alignment/instruction training as well as source data) is more difficult: >>102745857. if the latter 2 answers apply, is there even enough data to do this 'with good writing capabilities'?
data will be a serious concern depending on how you answer. everything is trained on the same dozen or so big web datasets and proprietary books, if you can't ever get what you want out of prompting existing models you're basically throwing in the difficulty of human curation of the data you want into the mix as well.
by from scratch do you mean completely from nothing, or is a fine tune acceptable? I don't think you can fix broken models with a fine tune, but even if you could you're looking at throwing a few billion tokens into a base model. starting cost just for training is probably $3k. 10x it if you're serious. 10x it again if you want a model completely from scratch. the limit to this is probably 1t tokens from scratch for llama2-era capabilities or a one trick pony finetune.
>>
betnit
>>
>>102745022
Woah I never expected someone like you to reply, thank you very much bwo. I guess AMD just isn't reliable enough no matter what then... I'll just suck it up and buy a 3060 12gb...
>>
>>102747480
>I guess AMD just isn't reliable enough no matter what then
To be clear, my opinions are based on the presupposition that < 24 GB VRAM is not worthwhile in the first place.
I'm specifically turning on my trip and telling you my perspective because part of your judgement was based on the assumption that the software support will become better.
At least in the llama.cpp space there are no dedicated devs for AMD support though, I am essentially the only core dev that even owns AMD hardware and my efforts only extend to me making sure that the HIP port of the CUDA code is not broken and testing which kernels should be selected at runtime if there are multiple options.
There is also another dev working on more general Vulkan support which would also work on AMD but my personal opinion is that that approach is just too hardware-agnostic to ever work really well.

If the target is 16 GB then non-NVIDIA options are dramatically better value and the current software stack can be largely made to work with AMD hardware (just with worse performance).
I myself have an RX 6800 (for development) and on a system based on Arch Linux llama.cpp is essentially working out-of-the-box (but with worse performance).
The main areas where you will have problems are the bells and whistles: FlashAttention support, multi GPU support, etc.
Chances are you will also need to spend more time troubleshooting.
But I would say that for a single GPU setup with a target of 16 GB VRAM AMD is a viable option depending on how you value $$$ vs. speed and manual effort.
>>
>>102747480
amd works fine if your card works on rocm. the cards listed as being officially supported are cards that are guaranteed to work on that version of rocm, not that the card works on rocm. there have been a few notable gpus in the past decade that are broken in rocm and you can't really do anything about it, but for the most part it's probably going to work if you're okay with having to figure shit out on your own. llama.cpp is fine, python pytorch stuff is *mostly* fine, if you're doing forks of forks you might have to figure out how to use/enable rocm as they're going to default to cuda but it's probably supported.
llama.cpp dev's real argument is if you're not buying a card with 24gb you're wasting your time and money, and in the 24gb category your choices are a joke $300 p40, a $700 3090, a $800 rx 7900 xtx, or a $1500 4090, and if you're buying an rx 7900 over a 3090 there's something seriously wrong with you. the reason for 24gb is that it allows you to run up to 45b models in q4, whereas 16gb is 30b and 12gb is 22b. with the overhead of longer contexts and better quants it basically becomes 24gb to run 30b models and everything else for 13b and lower.
my advice in the $300 ish category would be a used 16gb rx 6800 if you're okay with some headache, but really it's not worth buying a 12-16gb card just for llms.
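if you want to sanity check those size brackets yourself, here's a very rough sketch (python; 4 bits per weight for a Q4-ish quant and ~2 GB reserved for context/compute buffers are assumptions, real usage depends on quant, context length and backend):

def max_params_b(vram_gb, bits_per_weight=4.0, reserve_gb=2.0):
    usable_bytes = (vram_gb - reserve_gb) * 1e9
    return usable_bytes / (bits_per_weight / 8) / 1e9

for vram in (12, 16, 24):
    print(f"{vram} GB VRAM -> ~{max_params_b(vram):.0f}B params at Q4")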
>>
>>102747120
>I'm just waiting for someone to post a better 8B model than Stheno.
Same. Make me switch. Come on. You think I don't want to? I've had it up to here with mischievous smiles.
>>
>>102747696
>multi GPU support
it should be fine if they're the same model or same generation where you can use the same HSA_OVERRIDE_GFX_VERSION for both. no idea if cross generation even works. P2P support probably isn't happening for most people either
>>
>>102747374
What, do you have 512GB VRAM?
>>
Ayo aicg is dying. Any local models I can run on my laptop yet? Must at least be on par with chorbo, 3.5 is a plus.
>>
>>102747799
Take a look at this issue though: https://github.com/ggerganov/llama.cpp/issues/9761
Here someone has an AMD multi GPU setup which for whatever reason produces a segfault during GPU<->GPU data transfers.
But since there is no dev available to fix it the chance of this getting investigated and potentially fixed are pretty slim.
>>
>>102747824
llama3.2-1B
>>
File: 1701186556040777.jpg (8 KB, 276x66)
>>102747762
soon
>>
In KoboldCPP, is there a way to count how many tokens my input is when it likely goes over the context limit by a lot?
>>
>>102747824
>aicg is dying
It died when all the redditor proxy locusts showed up.
You are nu-aicg
And I'm glad your crappy thread is dying.
>>
>>102747838
What the fuck is an "antislop sampler"?
>>
It's pathetic that AMD doesn't pay a couple of pajeets to fix their shit; at this point I think their gpu division is trolling or wants radeon to fail for some reason.
>>
>>102747853
the sampler that will make all your llms smarter, more flexible, more creative and fix all other issues
this time for sure
>>
>>102747844
>he cannot guesstimate the number of tokens in text
newfag. also ST can do that
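if you want an exact count rather than a guess, you can also just ask the running koboldcpp instance; the route below is from memory, so treat it as an assumption and check the API docs page it serves if it 404s.

# count tokens with the backend's own tokenizer (assumes koboldcpp on the default port 5001)
import requests

text = "paste the prompt you want measured here"
r = requests.post("http://localhost:5001/api/extra/tokencount", json={"prompt": text})
print(r.json())  # should include the token count, e.g. {"value": 1234, ...}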
>>
>>102747862
...remind me, which time is it?
>>
ST status? Will I lose my waifus if I pull?
>>
>>102747830
this is what I was thinking of when I said P2P (transfers) actually. my understanding is that rocm just can't do gpu<>gpu transfers unless the gpus are in pcie slots that are connected directly to the cpu and not through the chipset or bifurcation, and iirc the pcie slots have to be the same link speed. this basically rules out most if not all consumer motherboards. my memory may be a bit hazy though, I only had to look this up late last year when llama.cpp inexplicably broke for a few weeks until `--split-mode layer` was made the default
>>
>>102747853
I think it bans phrases with backtracking. It might already exist in exllama as banned_strings, that was added 5 months ago, but nobody cared because it didn't have the funny name...
>>
>>102745188
If you liked mythomax, magnum 12b based on nemo will be right up your alley. You can run larger models, but I don't think there's anything better for writing, unless you simply want a smart and dry model with no personality.
>>
>>102747853
It's a breakthrough in sampling.
The dev has understood that slop is a 100% subjective experience for the user.
Therefore placebos which mainly affect the subjective experience are extremely effective.
Just by labeling the technique "antislop" the user will experience less slop.
Revolutionary!
>>
>>102747934
Better to just use NeMo than Magnum 12B. Source: I tried them.
>>
Does Midnight Miqu have multiple languages? Or just a small amount of other languages? I tried to make Midnight Miqu continue a story that is in my native language. It can form some sentence structure and it makes a little bit of sense, but some inflections are wrong and there are a few nonexistent words and overall it falls apart quickly.

This got me thinking, would it be possible for a language model to invent a new, nonexistent language that nobody speaks? And then you could ask the model to interpret it too. As long as there's enough text, it can pick up the patterns and somehow understand it. Isn't the real language ability of current language models kind of unexplained too?
>>
>>102747830
>>102747929
issue 4030 is most likely related, and one of the last comments on issue 3451 links to a HIP issue related to P2P transfers if you wanted something to reference
>>
Is Aphrodite 1-1 with the features of vLLM? The last vLLM version improved the http server but I don't know if it has been ported to Aphrodite yet.
>>
>>102747929
it's quite insane that when the amd driver can't do the p2p transfer it crashes the entire application instead of returning an error. or worse, it just fails silently, which is what often happens with 7900, which is why llama.cpp has a workaround with GGML_CUDA_NO_PEER_COPY.
>>
>>102748022
llms are trained on more than just english but they're pretty much always predominantly english. the slightly technical but probably wrong answer is that the transformers architecture that all llms use was designed to aid in machine translation, and the way llms encode information into vectors means that the same vectors are going to be activated for simple words (nouns, adjectives) regardless of language. this will extend to words with similar meanings too, but also curiously things like base64 encoding or caesar ciphers. a surprising use case for llms is that they're fairly capable of deciphering obfuscated code.
what you're asking probably won't work well for two reasons though, firstly llms are bad at tokenisation problems (think: only use words without the letter e, or count the amount of letters in this word, CJK languages with lots of tokens do badly, etc). secondly is that llms aren't really reasoning, they're predicting what comes next based on statistical probability. what you should be able to do though is construct a language or rules for a language and put that in the context history, and the llm will probably be able to reasonably imitate what you're feeding it. it's basically style transfer which is also something llms are good at. with enough of this type of data you can probably make a finetune that will do much of the same thing, but the results will probably be significantly worse than english -> your native language just due to the lack of training data to go off of compared to your own native language
>>
I randomly checked this benchmark again and it seems like scores have changed a tiny bit and some models have been added.

https://huggingface.co/spaces/flowers-team/StickToYourRoleLeaderboard

4o is also now on it. And funnily it is beaten by 9B. Would be interesting to see what Claude 3.5 scores. Mistral Small doing very well and "punching above its weight" here, though of course it is still dumb as far as intelligence goes.
>>
>>102748238
Doubt Claude will score any better. I remember Anthropic overfitted Claude 2 so hard that it would always complete "Claud" after "my name is", and people would get shit like NPCS named Claudia
>>
>5090 will only be $1699 for 32gb vram and 1.8tb/s bandwidth
Time to sell your A6000s while they're still worth something boys
>>
>>102748284
A6000 only consumes 300W, has 48GB VRAM and there's no way on God's green earth nvidia is going to come up with a 2-slot cooling solution for a 600W GPU. That requires datacenter tier airflow and no gamer wants to hear that shit while they are trying to game.
>>
>>102748284
>willingly parting with VRAM
hoard, you fool
buy the 5090, but keep the A6000
>>
>>102748238
What kind of black magic google did with gemma2 9b? And then someone made the simpo one that supposedly trades blows with 27b.
>>
>>102748518
Well, it's also only 8k though and 4k sliding window attention on half its layers. Most model makers have moved on now and stopped making smart short context models.
>>
File: 11_05344_.png (1.64 MB, 1024x1024)
>>102746386
Sold off to who? Select your fighter:
>corpo tier: saltman, dario, elon
>punching above their weight tier: undi, sao, alpin, thedrummer
>wildcard tier: cohee, henk, ooba
>write your own: ____
>>
>>102748863
trump
>>
>>102748863
petra
>>
>>102748863
miku
>>
>>102748863
ecker
>>
>>102748935
I don't have the money, sadly.
>>
>>102748863
wholly owned subsidiary of /aicg/
>>
>►News
>(09/27)
>>
Chorbo won.
>>
>>102749355
Election seasons. Once it is done you will see a deluge of new models. And none of them will be good for cooming.
>>
>>102748284
>a6000 prices will come down
BASED
>>
>>102748284
>Source your ass
>>
File: ComfyUI_06450_.png (1.19 MB, 1280x1280)
>>102748863
cagefight
>>
>>102748863
ZUCC
>>
>>102748863
Arthur MENSCH
>>
>>102748863
Xi Jinping
>>
>>102748863
Sneed's LMG (Formerly Chuck's)
>>
Haven't been here in a year, still using Mixtral 8x7b 3.75bpw on my 3090. Any model recommendations for RP for someone who hasn't been following the meta?
>>
it's been 3 months since nemo dropped, and it's still the best vramlet model
it's over
>>
>>102750146
nothing has changed
>>
Bigger models are smarter, but are they also more creative?
>>
>>102750367
They CAN be. They inherently know more, but still need temp/sampler wrangling. That’ll never change with this architecture
>>
>>102750367
Being smarter also means you are more slopped, that's true for humans too
>>
File: fff.png (415 B, 254x14)
>>102750367
I still wonder. Smarts helps the model keep things consistent. Dumb models will wander off and reach "creativity" by accident.
The best is probably a smart model with sampling that makes it go into uncharted territory every now and then, and then set normal sampler settings to let the "smarts" figure it out and roll with it. But not smart enough to figure out that the fun bits make no sense.
>>
>>102750367
Yes, in my experience self-merging a smaller model with itself made it more creative.
>>
https://www.youtube.com/watch?v=-fGkSXJAwV4
the future is here
>>
>corporate memphis op
SOVL
>>
>>102750146
Mistral Large/Mistral Small/Nemo Finetunes
>>
>>102750611
Mistral Small is garbage
>>
>>102750676
Skill issue
>>
>>102750515
Do any of the meme samplers do something like a 30% chance of selecting a totally random token after "." or at a newline? Maybe starting sentences with random tokens and then using a smaller temp would be better overall for creative-but-coherent output than trying to find the max temp before it turns incoherent?
>>
>>102750688
This, but unironically. It's probably the best option for anyone running a single 3090 setup right now.
>>
I've been doing some experimenting with more structured generation and found that increasing temp at common sense locations improves quality and output diversity without fucking up coherence. For example, bumping up temp a lot for the first token of each sentence. Has anyone packaged tricks like this into a sampling tool yet?
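Not packaged into anything that I know of. The core of what I've been doing is roughly this, as a toy sketch over a raw logit list (plain python, not hooked into any real backend; the vocab and logits are made up):

import math, random

def sample(logits, temperature):
    # softmax sampling at a given temperature (weights don't need normalizing for random.choices)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    weights = [math.exp(l - m) for l in scaled]
    return random.choices(range(len(logits)), weights=weights, k=1)[0]

def pick_temperature(prev_text, base=0.7, boosted=1.6):
    # boost temperature right after sentence-ending punctuation or a newline
    return boosted if prev_text.rstrip(" \"'").endswith((".", "!", "?", "\n")) else base

vocab = ["The", " Suddenly", " Meanwhile", " She", " It"]
logits = [2.0, 0.5, 0.4, 1.5, 1.2]
prev_text = "the end of a sentence."
print(vocab[sample(logits, pick_temperature(prev_text))])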
>>
Consumer LPU when saars?
Bastard Jensen will pay
>>
>>102750764
This, but ironically.
>>
>>102750875
when intel gets its shit together(2 more years)
>>
>>102750146
If you can stand the speed hit Mixtral 8x7b Q6_K with 16k tokens of context and 19/33 layers loaded onto my 3090 runs at 5.8 tokens/second. With one fewer layer the speed drops to 5.5 tokens/second and I can fit 32k context. For me about 5.5 tokens/second is the spot right where it doesn't feel bad to use. When I'm not using Instruct the Mixtral 8x7b tune I've gravitated toward is BagelMIsteryTour.

Mistral NeMo is fool's gold. It sometimes looks so good at first that I'm sure if I just swipe or edit it a bit it will end up being fine, but it ends up completely falling apart in fewer than 6 messages in my experience.

Mistral Small fits onto a 3090 at 8.0 bpw with 16k context. It has a bit of a positivity bias. Maybe you could take a look if "just as good I swear" 8.0 bpw small beats brain damaged 3.75 bpw 8x7b for your purposes / writes in a way you like better. You won't see a big speed difference (on my machine 3.7 bpw 8x7b is 27 tokens/second and small is 31 tokens/second).
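For reference, the sort of launch I mean for the partial offload case looks something like this (the filename is just an example, and you tweak -ngl until it fits your VRAM):

./llama-server -m mixtral-8x7b-instruct-v0.1.Q6_K.gguf -ngl 19 -c 16384

The koboldcpp equivalents should be --gpulayers 19 --contextsize 16384, if I remember its flags right.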
>>
File: temps.png (75 KB, 1145x630)
>>102750718
I don't think so. Samplers are small, so it shouldn't be too hard to make one like that, if it's kept simple, of course.
However...
The default sampler chain in llama.cpp is temperature last, which makes it so that the samplers receive a fairly undisturbed list of logits. They're just trimming.
On the other hand, if you set temp first and THEN do all the trimming, you can keep more variety of tokens to stay past other samplers.
Playing with https://artefact2.github.io/llm-sampling/
Temp first keeps way more tokens than running temp last.
I'm sure i've seen this argument a long time ago and i'm sure there was a reason to not use it. I don't expect the output to be better, or even creative, but it gives more options for the token selector at the end. The way i see it, temp last keeps the output more sensible, even for ridiculous values of temp or samplers.
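Here's a quick toy check of that, counting how many tokens survive min-p when temperature is applied before vs after it (python, made-up logits, purely illustrative):

import math

def softmax(logits):
    m = max(logits)
    e = [math.exp(l - m) for l in logits]
    s = sum(e)
    return [x / s for x in e]

def min_p_survivors(probs, min_p=0.05):
    # min-p keeps every token whose probability is at least min_p * max probability
    cutoff = min_p * max(probs)
    return sum(p >= cutoff for p in probs)

logits = [8.0, 6.5, 6.0, 5.0, 4.0, 3.5, 3.0, 2.0, 1.0, 0.5]  # made-up distribution
temp = 1.8

print("temp last: ", min_p_survivors(softmax(logits)), "tokens survive")                      # min-p sees the raw distribution
print("temp first:", min_p_survivors(softmax([l / temp for l in logits])), "tokens survive")  # min-p sees the flattened one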
>>
>>102750983 (cont)
Of course, it makes little sense using both top-k, min-p and top-p at the same time. It's just for visualization.
>>
>>102750718
>>102750832
Fuck I didn't realize someone said basically the same thing literally right before I made my comment. One of the use cases I've had for selective high-temp sampling is character generation. I'll separate out traits like name, occupation, and other descriptors. Each descriptor is intended to be brief, only a few words at most. Then I'll uniformly sample the first token of each word from the N most likely tokens, where I shrink N for each successive word. I can get pretty diverse character gens this way. Generalizing this approach to longer structured writing is hard though. Could probably do something around punctuation or clauses.
>>
>>102750983
>>102751017
And for added chaos, a very low value for smoothing factor. It equalizes the token probs so all tokens are just as likely to be picked. Very high chance of nonsense. I don't think llama.cpp has smoothing. Dynamic temperature, with high range (more than the page allows) and low exponent approximates the same effect.
>>
>>102750956
>>102750611
Thanks for the info, especially the layers you can fit on a 3090 and the generation speeds are really helpful. Will give q6_k gguf and Mistral Small a try.
>>
Largestral is so sensitive and prudish
>>
Oh, also, I ran the official Mistral tokenizer scripts to see how they tokenize shit.
Mistral Large/Small/8x22b (V3 tokenizer):
<s>[INST]a[/INST]b</s>[INST]SYSTEM<0x0A><0x0A>c[/INST]d</s>
Mistral Nemo (V3 tokenizer w/ tekken=true):
<s>[INST]a[/INST]b</s>[INST]SYSTEM\n\nc[/INST]d</s>

Does anyone here format like this, or are we all doing it wrong? Also curious why one puts out <0x0A> where the other puts out \n.
>>
File: 89184233525.png (403 KB, 692x621)
your thoughts on it sirs?
>>
>>102751341
It's an AI generated answer, ironically enough.
>>
>>102751333
I meant to say I ran their tokenizer to see how it formats prompts (because it also formats the prompts for some reason)
>>
>>102751333
Fucking 4chan formatted them wrong, Mistral Large should be
<s>[INST] a[/INST] b</s>[INST] SYSTEM<0x0A><0x0A>c[/INST] d</s>
Note the spaces
>>
>>102751333
0x0a is 10, which is \n. Is that a literal "<0x0A>" string or just an escaped \n? Both are tokenized with the same script, just different model, i assume.
Maybe you can find something of interest here.
>https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md
You can also look for those tokens in the tokenizer.json files for each of the models, see if it's just an escaping thing or it's a literal "<0x0A>".
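Something like this should settle it, assuming the usual HF tokenizer.json layout where model.vocab maps token strings to ids (the file path is whatever you downloaded):

import json

with open("tokenizer.json", encoding="utf-8") as f:
    tok = json.load(f)

vocab = tok["model"]["vocab"]
print("<0x0A>" in vocab)    # True -> a literal byte-fallback token exists in the vocab
print(vocab.get("<0x0A>"))  # its token id, if present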
>>
>>102750956
>but it ends up completely falling apart in fewer than 6 messages in my experience
Make sure you don't use DRY sampler with it.
>>
>>102751333
Here's the source if you wanna test the prompt formats yourself
https://rentry.org/4ebczitt
>>
>>102751341
>>102751357
The equation implies AI is zero. Wew. Thanks.
>>
>>102745188
You have 48gb of vram. You shouldn't be using any model smaller than 123b Luminum. Even at low quants, it's far superior to 70b miqu.
>>
>>102751464
Good question. Also thanks, that's a nice resource. The way it talks about tokenizing messages separately and concatenating them made me nervous that llama.cpp might not be doing it right if you feed it one single prompt.

Here's a script confirming that inputs to llama.cpp tokenize exactly the same as the official formatting and tokenizing script:
https://rentry.org/959fpgmz
>>
>>102751341
Seems like some real pseudointellectual dreck
>>
>>102743974
How do I run a model and keep it updated with fresh Internet data?
>>
>>102752184
RAG
>>
>>102746375
I second this question. The Mistral Small fine tunes on huggingface are:

# Trained with Mistral [INST] [/INST]
ArliAI/Mistral-Small-22B-ArliAI-RPMax-v1.1: "creative writing and RP datasets"
gghfez/SeminalRP-22b: "RP and creative writing and some regular questions generated by Opus at 8192 context"
nbeerbower/Mistral-Small-Drummer-22B: Gutenberg DPO x2
rAIfle/Acolyte-22B: "a bunch of random datasets"

# Trained on top of Mistral-Small-Instruct with a different (or no) prompt format
Envoid/Mistral-Small-NovusKyver: "I ran a fairly strong LoRA on it using a private raw-text dataset"
InferenceIllusionist/SorcererLM-22B: "cleaned and deduped c2 logs", prompt format is "TBA"
LlamaFinetuneGGUF/mistral-22b-v0.4: "trained on the LlamaFinetuneGGUF/Programming-Alpaca-and-ShareGPT-Style"
nbeerbower/Mistral-Small-Gutenberg-Doppel-22B: Gutenberg DPO x2 except ChatML
TheDrummer/Cydonia-22B-v1.1: No info on training data, Pygmalion/Metharme prompt format

# Abliterated (was this needed?)
byroneverson/Mistral-Small-Instruct-2409-abliterated
zetasepic/Mistral-Small-Instruct-2409-abliterated

# ???
eagle0504/mistral-small-22b: no info on what this is
spow12/ChatWaifu_22B_v2.0_preview: some Japanese thing

# Merges
knifeayumu/Lite-Cydonia-22B-v1.1-75-25: Cydonia-22B-v1.1, Mistral-Small-Instruct-2409
Nohobby/Karasik-22B-v0.2: Mistral-Small-22B-ArliAI-RPMax-v1.1, SeminalRP-22b, Cydonia-22B-v1.1, Mistral-Small-Drummer-22B
Steelskull/MSM-MS-Cydrion-22B: Cydonia-22B-v1.1, Mistral-Small-22B-ArliAI-RPMax-v1.1, Mistral-Small-Gutenberg-Doppel-22B, Acolyte-22B
>>
>>102752232
ty anon
>>
wow it's over huh
>>
what are static vs imat quants for gguf? trying to understand the difference between:

IQ4_XS vs Q4_K_S
and i1-IQ4_XS vs i1-Q4_K_S
>>
File: Quants2.png (111 KB, 1771x944)
>>102752771
Here you go.

IQ4_XS has similar perplexity to Q4_K_S, while also being decently smaller in size. In general IQ quants seem superior to the old Q quants.
>>
File: Quants.png (349 KB, 2400x2400)
>>102752872
Oops, wrong graph. This one shows model size.
>>
>o1-preview2
it's actually over
>>
>>102752872
>>102752930
Q3, is it ever worth using?
>>
>>102752989
It was worth it a few months ago but not anymore.
>>
>>102752989
Q3 quants can still be surprisingly good. I generally prefer Q3 quants of big models over high quants of small models. Low Q2 quants are when things start to go to complete shit, IMO.
>>
>>102752988
If you sign up now you can enjoy 15 prompts per week, chuddie ;)
>>
>>102752989
Yeah, I've used q3 nemo on my phone, it was okay.
>>
Any settings/prompts I can use to force actions to be surrounded by asterisks (*) and speech between quotes (")? I'm pretty autistic about it, so if the response I get isn't formatted that way it causes me to regenerate
>>
>>102753115
Do you only want the italicization? If so, I would probably just not bother with the asterisks and instead rely on CSS in your frontend to italicize unquoted text.
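Something like this in the custom CSS box should do it, assuming your frontend wraps "quoted" speech in <q> elements the way ST does (the class name is from memory, adjust it to whatever your frontend actually uses):

/* italicize narration by default, keep quoted speech upright */
.mes_text { font-style: italic; }
.mes_text q { font-style: normal; }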
>>
>>102753136
Not a bad idea, I'll give it a try, thanks
>>
I am thinking about sticking my dick into my 4090 fan just to make it make me feel anything.
>>
>>102753435
do it faggot. post pics
>>
>>102753574
>he never stopped a fan with a finger...
>>
Are the Mistral templates in ST wrong? They should start with <s>, right?
>>
>>102753612
If Servicing my Tensor were serious software it would check for you whether you are being retarded when you put <s> at the start. But it is just a dumb frontend for coomers...
>>
>>102753612
No.
You never want more than one BOS token as far as I know.
Go to the original repo on huggingface and look at the jinja template in the config files if you are not sure.
>>
>>102753612
Your backend likely automatically puts a BOS token so you should not do it in the frontend. If you're using Llama.cpp, you can check by looking at the model information when it loads. It should say something like add_bos_token = true.
>>
File: cap.jpg (10 KB, 235x245)
My LLM induced erectile dysfunction has progressed further. I stop being horny when I see the silly tavern UI...
>>
>>102753668
>>102753702
Oh, okay, thanks. Koboldcpp shows add_bos_token = true.
>>
>>102753748
it's the other way around for me, I can't get horny if I don't see the ST UI
>>
>>102753702
>disabled Add BOS Token in ST after reading this
>all rerolls generate the same message regardless of temperature
I was about to rant and call people faggots but now I am too confused to do that.
>>
>>102753898
If ST has that option for a backend, then it is probably passing that variable to the backend. They probably should've named it something different since a user can confuse it for meaning that the frontend itself will add a BOS token to the prompt.
>>
8x V100 16gb or 2 3090s for LLM inference?
>>
>>102754148
do you wanna run a big model slow or a small model fast
>>
File: 1728439002206195.png (44 KB, 452x583)
>>102745694

>mixtral-8x7b-instruct-v0.1-limarp-zloss.Q5_K_M

Factually the best model to date. Almost every model has been a disappointment, either hyperfocused on particular subjects or lacking in intelligence.
You literally have never needed more. And with Llama 3 being an abortion, it still has not been passed.

This model will give you the sloppiest toppy and I quote; "What makes you think they don't know what I'm doing?" ****'s words were punctuated by the sound of her spitting onto ****'s dick. Her saliva dripped down onto his balls, making them slippery and slick in her grasp. "They know exactly what kind of girl I am…and they love me for it."
Also:
"Don't worry about that weakling Vegeta, you can call out my name when you cum, nobody will judge you here," He snickered, his tone dripping with disdain for the saiyan warrior.


I shill an ancient MOE mixtral because its PROVEN itself.
>>
Has llama.cpp finally released a static bound AMD rocm? Or can only ollama do this?
>>
>>102754214
You still find compiling spooky?. May as well use windows. They have pre-compiled HIP builds for it.
>>
>>102754167
Big model at a reasonable speed.
>>
>>102754198
purchase a promotion
>>
File: 1800.gif (1.84 MB, 325x244)
>>102754304
KEK
>>
>>102754294
8x 3090s
>>
>>102754326
Is V100 serviceable?
>>
>>102754410
Yeah, it doesn't have flash attention but it's better than having less VRAM. How are you planning to run 8 of them? I'm pretty sure most affordable servers I came across have 4x sxm2 slots at most.
>>
>>102743974
Who's got a good uncensored model that's actually uncensored, and not just for sexual stuff? How about an uncensored model that's an expert chemist?
>>
>>102754063
So does the config of the gguf enable the checkbox in ST that then passes the token to the backend? Or does the backend add the BOS token and the checkbox in ST should be disabled? I am getting a placebo feeling that mistral small is better with the checkbox disabled almost like ST keeps duplicating the BOS token.
>>
give it to me straight how slow would it be to run one of these on my android phone?
>>
>>102754506
No such thing. They have biases learned from their training data. Works the same way for humans.
As for cooking meth, i wouldn't trust an LLM to give me a good recipe.
>>
>>102754576
I'm sure 1B would be plenty fast
>>
>>102754589
>Meth
>Not adding a methylenedioxy onto the 3,4 positions of the benzene ring

Ngmi, anon
>>
>>102754576
>>102753064
Depends on the phone and model. That's like asking
>how fast is my car? How many people would it fit?
Just stop being a pussy. Try it and report back. Start with a small model like llama3.2-1b so that your battery doesn't start leaking nerve gas.
>>
>>102754472
https://forums.servethehome.com/index.php?threads/sxm2-over-pcie.38066/
I plan on following this build with AOM-SXMV.
On Taobao those 16gb V100s I think I saw one as low as $128.
>>
>>102754507
Really? If it is duplicating the BOS token, that could be a bug. Or maybe that is intended and in fact add bos token in ST is not the same thing as add BOS token in Kobold.

Do this:
In ST, right click anywhere and inspect element. Then go into the network tab. Then go to a chat and do a swipe. Then you should see some stuff pop up in the network tab. Click on the one that says "generate". Then click on the smaller tab that says payload (chrome), or request (Firefox). Then right click the top item that pops up under Request Payload and copy object (chrome) or copy all (Firefox). Then dump that in a https://femboy.beauty/ so we can see what's going on.
>>
>>102754709
you realise those are even slower, right
>>
>>102754742
>Request Payload and copy object (chrome) or copy all (Firefox). Then dump that in a https://femboy.beauty/ so we can see what's going on.
>>
>>102754632
That's the thing. Someone who knows enough chemistry wouldn't need an llm. Someone who doesn't wouldn't know if the recipe is correct.
I've always wondered about the anarchist cookbook's story... about the CIA distributing modified copies of some of the recipes to make them ineffective or just too dangerous to pull successfully or something like that.
Then the people that think they have a good copy would trust the recipes. The ones that don't would be suspicious about it. And everyone argues about who has the one True Book.
Imagine if they did the same thing when training LLMs. You'd end up making a bunch of actual pretty crystals instead of chloramine...
>>
>>102754759
I would say dump in a pastebin but nowadays it seems like people are using that meme site instead.
>>
>>102754780
I don't think that's the part that confused anon...
>>
>>102754746
How slower?
>>
>>102754746
Actually no I wasn't aware. I was just trying to get some reason if that jank setup for cheap vram was worth getting into or if it's better to persue some other option.
>>
File: money.jpg (115 KB, 1024x1024)
>>102743974
The strategic realignment towards family-friendly content appears to be yielding significant returns. It has accrued 254 posts and 28 pictures in under a day. Impressive metrics from LMG.

This truly demonstrates the power of accessibility and broad appeal in the AI education market. After all, this is what LMG has always been about.

With this family friendly pivot, LMG has successfully tapped into a previously underserved demographic, securing its position as a dominant player in the infotainment space. This strategic positioning opens avenues for lucrative partnerships and cross-promotional ventures. Congratulations LMG anon, well played.
>>
>>102754762
And yet, there is a space between knowing enough to not need one and knowing so little that one wouldn't actually be helpful, where some of us sit and would like a brain to pick regarding questions that Google and the powers that shouldn't be are preventing us from getting answered.
>>
in the meantime the number of papers on spike-based machine intelligence is increasing dramatically. up to ten orders of magnitude faster inference with ten orders of magnitude better energy efficiency compared to our current tech - I'm dropping this stuff and switching to model designs for neuromorphic chips. Why keep riding the old tech when the real revolution is already here?
>>
>>102754815
>>102754759
Does this help? This is Brave. It should be the same as Chrome. But I guess if it's not, because someone felt like being a special snowflake or something, then try figuring it out yourself, it should be a similar process. Also I don't have kobold installed so I edited the image to indicate what it should be.
>>
>>102754252
If you want to be able to compile, you have to do a whole separate install, because amd's amdgpu installer fucks up your install.
>>
I really don't think the llama.cpp guy knows anything about amd gpu (understandable, since amd gpu are rare and obscure hardware).
>>
I could resolve the dependencies of the compile I already have by hunting down the required files. And it may be possible to compile without installing amdgpu by mirroring the correct directory, but none of the how tos explain exactly which one, and I very much doubt that the llama.cpp guy has any idea which files those would be - amd is an exotic gpu type.
>>
>>102754944
>would like a brain to pick regarding questions
Sure. But censoring or not, it's the knowledge itself that i wouldn't trust. And they're not smart enough to pick up inconsistencies in their "thought process" when one is pointed out, or just say yes to user complaint, which is just as useless.
All the information is out there, in books, reports, papers, whatever. Take hacking, for example. Sure, you have the "Become a Hacker in 24 hours" kinds of books, and then you have the actual software, source code, CVEs, hacking damage reports, zines... you can learn from all of those.
A big enough chemistry book will teach you how to make meth, but it won't have a chapter named "Making meth for the masses".
>>
is Behemoth 123B any good?
>>
>>102755054
Your distro/package manager should have amdgpu *and* the dev libraries. Don't use amd's installer.
You can also try vulkan. I can compile it for vulkan on fucking OpenBSD. If you cannot do it on linux i have to call it skill issue.
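For vulkan the build is basically just the two commands below; the flag name is from memory (it was LLAMA_VULKAN before the ggml rename, GGML_VULKAN after), so check the build docs if it errors:

cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j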
>>
>>102755121
Try it and report back.
>>
>>102755054
>>102755171
Another option is bootstrapping into a chroot. Install your distro and the amd stuff on the chroot, activate it, compile, deactivate, copy the binaries to your ~/bin/ and run.
>>
File: file.png (879 KB, 768x768)
Daily that face for the dead thread.
>>
Don't call the dead thread a dead thread.
>>
File: 39_00815_.png (981 KB, 744x1024)
>>102752232
>SorcererLM-22B:
>"cleaned and deduped c2 logs", prompt format is "TBA"
It's also Mistral for the prompt format. Updated the model card to make it clearer and added some other recommended configs to play around with, thanks for the reminder anon
>>
>>102754844
>half the memory channels
>___ the speed
fill in the blank
>>
>>102755628
unzips?
>>
>>102755673
F- you're expelled
>>
>>102755628
feel
>>
>>102755628
Shivers
>>
why the fuck do most of the absolute retards who do exl2 quants never list their settings? the quality of exl2 quants can vary a lot depending on the used calibration dataset, header size and all the other parameters.
>>
>>102755917
>calibration dataset
you mean you shouldn't touch it, right?
>>
>>102755929
Yes, but for all I know there are still plenty of retards around who pick up a guide from a year ago, from when there was no proper default one included, and do it with wikitext or something.
>>
Hello, good people, any Mistra Large tune/merge recommendations?
>>
>>102756031
WhizReviewer is the best
>>
>>102756031
Luminum is still my favorite.

Behemoth 123b just came out. Haven't tried it yet.
>>
>>102756031
The one that starts with the same letter as the parent model
>>
>>102756128
[L]arge

Lumimaid?
>>
>>102755171
called what?

>>102755242
Interesting idea, not sure how it works, but it sounds promising.
>>
>>102755545
Hi tanned-pixelated-Miku-anon. How ya been?
>>
>host nextcloud instance on home server
>put all your files and documents on there
>put all your notes on there
>import all email into there
>manage calendar on there
>do your finances on there
>set up nextcloud assistant AI
>runs a llama 3 8b model that trains itself on your data locally
>get an administrative assistant that knows all your data and can actually help your interactively with managing your life
>but it's all hosted locally, there's no phoning home, no data harvesting
This sounds like the holy grail of AI, not gonna lie. The advertised result is the same as going balls deep into Google/Microsoft/Apple cloud and using their AI tools, but without selling your soul to the devil

Has anybody tried this feature themselves? It's really, really tempting for me.

Looking it up I see posts saying llama 3 8b only needs 8gb of VRAM -- would slapping a $300 3060 12gb in the server be enough to run it? Don't want to drop real money on this until I'm sure it's worth it, so it's okay if it's a little slow. Also it does appear to be able to support different models as well, though no idea how smoothly that works.
>>
File: garbage.png (350 KB, 2441x1230)
>>102756268
picrel. The instructions on llama.cpp tell you to use amd's script. This will screw up your install.
>>
>>102756321
>8b is AGI
I look forward to your bankruptcy hearings.
>>
>>102756348
Of course it isn't AGI. But you don't need to be a genius or even of average intelligence to be a secretary. Most secretaries are dumb as a bag of bricks, but they're still helpful. This is just a digital secretary that you own.
>>
File: vulkan.png (2 KB, 477x125)
>>102756268
>called what?
I have no idea what you're running, mate. Search for vulkan, rocm, hip or amd on your package manager. Some of the packages will have -dev, -headers or something along those lines. Even if you use vulkan (as opposed to rocm), it'll still run faster than cpu or not running at all.

>Interesting idea, not sure how it works, but it sounds promising.
Again. It depends on your distro. Here's some docs for arch
>https://wiki.archlinux.org/title/Chroot
Search for docs on your specific distro and adapt it to suit your needs.
>>
>>102756456
lol
>>
so I installed kobold.cpp on my phone and it works but I only installed the seemingly shitty .gguf from the tutorial
what's a better one?
>>
>>102756128
>>102756134
Mistral Large's first letter is M. I think he means Magnum 123b.
>>
File: 39_00983_.png (1.68 MB, 896x1152)
1.68 MB
1.68 MB PNG
>>102756270
Howdy. All good here. Got a few fine-tuning discussions going on around some very interesting datasets. Whatcha got going on?
>>
Starting to realize it's more time efficient to do 5 or 10 generations with a good 12B model and take the best one (good odds that at least one won't be retarded) than to wait for a behemoth with 40% of the layers offloaded to grind out one good response at 1 t/s.
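The time math behind that, using made-up but plausible speeds (both token rates are assumptions for illustration):

# 10 fast generations vs 1 slow one, for a ~300-token reply.
reply_tokens = 300
fast_tps = 25   # assumed: 12B fully on GPU
slow_tps = 1    # assumed: 123B with a chunk of layers offloaded to CPU

print(f"10x 12B:  {10 * reply_tokens / fast_tps:.0f} s")  # 120 s for ten candidates
print(f"1x 123B: {reply_tokens / slow_tps:.0f} s")        # 300 s for one reply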
>>
File: 1712721345539589.jpg (870 KB, 1000x1414)
870 KB
870 KB JPG
Alright, i can't seem to get midnight miqu 70B to run at an acceptable pace on my 4080 super. Is there a different, faster model that also isn't completely retarded?
>>
>>102756964
>4080
I'm so sorry. Your only option left now is the retard Mistral Small.
>>
welp, I only use base models now. Fuck instruct and fuck fine tunes, they just lobotomize everything.
>>
>>102757016
base models are just too schizophrenic to be useful unfortunately
>>
>>102757016
Nemo 12B base is really good if you're a storyfag, and has no positivity or safety bias whatsoever. Not sure annoying it'd be for an RPer to try to wrangle it into the chat format though. Probably a lot. Those of us who just need a pure autocomplete have it easy.
>>
>>102757036
*how annoying
>>
The "deals" on amazon prime day are dogshit.

I miss the days back when things were nice. The very best cards cost $599
>>
>>102757036
>12B
What do you need to run that?
>>
>>102756982
I can't tell if this is a joke or not
Surely i can run something decent on this?
>>
File: file.png (111 KB, 640x562)
111 KB
111 KB PNG
>>102757080
>Surely i can run something decent on this?
anon...
>>
>>102757016
>>102757036
Anything beyond 40B?
I tried out base Qwen and it was dumber than instruct, and it was also censored somehow, but I think it might be an exception and they trained on a ton of instruct data in the base. With which one do you get something that's both smarter and less censored?
>>
>>102757091
Pls spoonfeed me, I genuinely have no clue what i'm doing
>>
>>102757080
>>102754198
Also, how much RAM? That's probably why you couldn't run a 70B. Mixtral may be your best option.
>>
>>102757135
32GB
>>
are drummer tunes trash? I always ignored them because they only came in worthless sizes, but I'm curious if the largestral one has any potential
>>
>>102757080
Just try mistral nemo, anon. It should run just fine. Ignore elitists and have fun.
>>
>>102757091
>>102757129
Download koboldcpp.
Go to huggingface and look for mistral nemo instruct from bartowski.
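If you'd rather script the download than click around the site, huggingface_hub can grab a single quant; the repo id and filename below are what I'd expect bartowski to use, so confirm them on the model page first:

# Fetch a single GGUF quant instead of cloning the whole repo.
# Repo id and filename are assumptions; browse the repo page to confirm them.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bartowski/Mistral-Nemo-Instruct-2407-GGUF",   # assumed repo id
    filename="Mistral-Nemo-Instruct-2407-Q4_K_M.gguf",     # assumed filename
)
print("Saved to:", path)  # point koboldcpp at this file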
>>
>>102757036
>Nemo 12B base is really good if you're a storyfag
I'm the anon you replied to, exactly what I'm doing is using nemo 12b base for stories kek
>>
>>102757145
You can run the Q2_K of Miqu-70b:
miqudev/miqu-1-70b
Still better than nemo even at this size.
Big model @ small quant > small model @ big quant
Midnight Miqu is a meme; any attempt to requant miqu is doomed to failure, inb4 shills.
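For the anon with a 16GB 4080 and 32GB of RAM, the size arithmetic behind that advice (the bits-per-weight figures are rough community numbers, used only for illustration):

# Approximate file sizes: a 70B at Q2_K vs a 12B at Q8_0.
def size_gb(params, bpw):
    return params * bpw / 8 / 1e9

print(f"70B @ Q2_K (~2.6 bpw): {size_gb(70e9, 2.6):.1f} GB")  # ~22.8 GB, split across VRAM + RAM
print(f"12B @ Q8_0 (~8.5 bpw): {size_gb(12e9, 8.5):.1f} GB")  # ~12.8 GB, fits mostly in VRAM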
>>
File: GrinsonMikiulKeeep.png (1.26 MB, 832x1216)
1.26 MB
1.26 MB PNG
>>102756899
It was good to see one of your classic Mikus again.
I'm still working on some miku rpg stuff, but I'm on the road for work right now so communications are limited and I'm mostly tethered to a cellphone.
NTA, but I've been messing with Illustrious a lot, too. There's some real potential there. You play with it much yet?
pic not related. Just a random backlog gen
>>
>>102757196
It's so sad he couldn't upload any other quants. I can just about fit IQ4_XS to run at a decent speed, while Q4_K_M is too large and Q2_K is too small.
>>
>>102757024
you have to really wrangle base models into working with multi-shot prompting. They only understand autocompleting and following examples. It feels less neutered when it does work though.
>>
>>102757160
>Ignore elitists and have fun.
This. You won't get gpt4 at home on a consumer gpu, but it's still stuff that was science fiction a couple of years ago.
>>
>>102757196
>>102757160
>>102757166
I'll try both of these out and see how it goes, thanks
>>
>>102757262
The core ontologies they were trained on are all the same.
>>
>>102757103
Yeah, Qwen's bases are full of instruct data and they actively bragged about how filtered their pretraining dataset was. They're not truly base models; a lot of big labs are becoming more dishonest about this.
The giveaway for a true base/raw pretrained model (and Nemo 12B base is one) is that it will not understand or work in the chat/RP format at all due to being a pure autocomplete model. If you can plug it into a chat completions API format like SillyTavern and it just werks, it's probably one of these new fake bases that was full of instruct data.
>>
>>102757059
Fuck all, even an 8GB GPU is fine for Nemo 12B. Get a Q6 GGUF quant (it will be about 9GB); you'll only need to offload a few layers to the CPU and it'll still run very quick.
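If you end up scripting it instead of using a UI, partial offload is a single parameter; a minimal llama-cpp-python sketch, where the filename and layer count are placeholders to tune for your card:

# Minimal sketch: run a Q6 Nemo GGUF with most layers on the GPU, the rest on CPU.
# Path and layer count are placeholders; lower n_gpu_layers if you hit OOM.
from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-Nemo-Instruct-2407-Q6_K.gguf",  # hypothetical local file
    n_gpu_layers=35,   # offload most of the model's ~40 layers to the GPU
    n_ctx=8192,
)

out = llm("Once upon a time,", max_tokens=64)
print(out["choices"][0]["text"])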
>>
File: 39_06109_.png (2.83 MB, 2048x2048)
2.83 MB
2.83 MB PNG
>>102757241
A damn shame they never released the original fp16 weights.
>>102757239
>classic Mikus
Was feeling nostalgic. Thanks for remembering it. Looking forward to the rpg too.
>>Illustrious
Can see how it's got a much better understanding of styles (picrel: low poly) but I haven't wrangled it nearly to the level of Flux or SDXL and everything is a bit rough so far.
>>
Mistral Medium 2 + open weights when?
>>
>>102757726
38 days.
>>
File: ComfyUI_06455_.png (2.12 MB, 1280x1280)
2.12 MB
2.12 MB PNG
>>102757726
>>
lol lmao kek rotfl

https://github.com/ggerganov/llama.cpp/blob/master/docs/docker.md

>has a rocm image
>has no command for usage

the docs are in dreadful shape for llama.cpp
>>
>>102758184
Rah!
>>
lmao

Finally I have docker reading my gpu.

llama.cpp docker seg faults, because of course it does.
>>
File: 2410.05993.png (499 KB, 1275x1650)
499 KB
499 KB PNG
>Aria, the world’s first open-source, multimodal native Mixture-of-Experts (MoE) model.

https://huggingface.co/rhymes-ai/Aria
https://github.com/rhymes-ai/Aria
>25.3B total parameters
>3.9B activated parameters
>Pre-trained from scratch on multimodal data
>Fine-tuning code and examples provided
>>
As I predicted, llama.cpp is actually garbage, but noooo, everyone said it was going to work, and it doesn't :^)

ollama works, by the way.
>>
>>102758389
ollama is a wrapper around llama.cpp, anon
llamacpp is its backend
>>
congratulations, you took the bait and it didn't even bump the thread
>>
>>102758448
ollama is a fork now, not just a wrapper
olchads are even getting multimodal before lcppcels do
>>
>>102758474
if this is true why isn't llamacpp stealing their improvements, they should
>>
>>102758462
>he doesn't use 'last reply'
>>
>he uses last reply
>>
>he replies
>>
*brap*
>>
*plap*
>>
Thanks for all the Mikus. LMG has been sold to Gumi.
>>102758839
>>102758839
>>102758839
>>
>>102746014
This model is better than Mistral 12B if you don't mind a cucked, censored model with low context; it's what I use for Page Assistant with an RX 6800 16GB. It's seriously overpowered, just censored as fuck.
https://huggingface.co/princeton-nlp/gemma-2-9b-it-SimPO
>>
>>102758846
A rare victory for GUMI!
>>
>>102757390
petra you fat bitch, stop gangstalking me
>>
>>102752930
>>102752872
Is there a chart like this for 70b models?


