/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102249472 & >>102234876

►News
>(09/06) DeepSeek-V2.5 released, combines Chat and Instruct: https://hf.co/deepseek-ai/DeepSeek-V2.5
>(09/05) FluxMusic: Text-to-Music Generation with Rectified Flow Transformer: https://github.com/feizc/fluxmusic
>(09/04) Yi-Coder: 1.5B & 9B with 128K context and 52 programming languages: https://hf.co/blog/lorinma/yi-coder
>(09/04) OLMoE 7x1B fully open source model release: https://hf.co/allenai/OLMoE-1B-7B-0924-Instruct
>(08/30) Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102249472

--Papers: >>102252385
--Using low topK improves performance by reducing latency from sorting logits in large vocabularies: >>102257071 >>102257336 >>102257369
--Struggles with summarization models and made-up details: >>102250806 >>102250856 >>102250882 >>102250897 >>102250926 >>102251152 >>102251274 >>102251515 >>102251565 >>102250902 >>102256091 >>102256343
--Flux licenses are restrictive and may apply even to modified or fine-tuned models: >>102254945 >>102254985 >>102255057 >>102255079 >>102255190 >>102255082 >>102255128 >>102255166 >>102255242
--Building a doctor bot with medical LORAs and understanding MRI reports: >>102249775 >>102249915 >>102249954 >>102250054 >>102250116 >>102250089
--Various AI model discussions and performance evaluations: >>102251322 >>102251377 >>102251385 >>102251408 >>102251624
--Reflection-Llama-3.1-70B tokenizers are fucked: >>102256855 >>102257164
--Novel idea for AI roleplaying: >>102256155
--Gguf quants of reflection are broken and waiting for a fix: >>102254244 >>102254255 >>102254279 >>102257951
--Forehead puckers description in video game character introduction: >>102250773 >>102250823 >>102250891 >>102250929
--Difficulty downloading large HuggingFace models with login required: >>102255880 >>102255916
--DeepSeek-V2.5 model released on Hugging Face: >>102257561
--CLIP-GmP-ViT-L-14 text encoder discussion: >>102250141 >>102250152 >>102250262 >>102250277 >>102250735
--P40 prices increased due to llama.cpp popularity in China: >>102255502 >>102255662
--Llama.cpp development and technical debt discussion: >>102254305 >>102254446 >>102254463 >>102254622 >>102254639 >>102254661 >>102254737 >>102254782 >>102254780 >>102254811 >>102254843 >>102254927 >>102255090 >>102255137 >>102255167 >>102255215 >>102255244 >>102258077
--Miku (free space): >>102249618 >>102251592 >>102252159 >>102252190 >>102254564 >>102256281

►Recent Highlight Posts from the Previous Thread: >>102249480
>>102258718
It is certainly novel, and at least it gives us a better idea of a new finetuning technique. We'll see in the coming days whether it really was a meme or not. (imo it is mostly a meme; the "thinking" is mostly noise and wasted computation to arrive at a slightly less incorrect answer than the model was already capable of.)
>>102258941There is no escape.
>>102258977
venv/conda envs solve 90% of them. docker containers solve 99% of them.
>>102258941Any adventure/rpg cards that anons use? Not expecting any miracles, just want to try if I like it with an LLM.
>>102259012
I never used any of them for too long, but these are some:
https://characterhub.org/characters/illuminaryidiot/the-staff-of-oscilion-338deea8be18
https://www.chub.ai/characters/punchchildren/grand-gensokyo-adventure-dd7ffd91
https://files.catbox.moe/zjvye9.png
Best code model under 12B?
>>102259124best new luxury car under $500?
>>102259157Bad analogy retard
>>102258941
>reflection purged from the OP
It's over
>>102259223I guess OP did a bit of reflection on the choice to include it
>>102259124
Try the new and shiny Yi-coder mentioned in the OP. Tell us how it goes after you use it for a while.
>>102259223it didn't know how to stawbery
>>102258941
>>102259223
OP being smart and reasonable for once? Who are you!?
Any recommendations for sampler settings in low param models, or am I expecting too much out of humble 7Bs?
>>102259223reflection is woke
>think about trying the reflection thing
>remember that I can only run 70Bs at 2 t/s, so the model's responses would be even slower and it wouldn't be worth it even if the quality really was that good
is there a sillitavern like frontend for voice gen? unlike textgen things seem to be split up in multiple places
>>102259533
>can only run 70b at 2 t/s
spoiled brat anon
>>102259533
>can (...) run 70B
I'll kill you.
>>102259228AAAAAAAAAAAAAAAAAAAAHHHHH
>>102259617>>102259590You should have enough RAM in 2024 to run 70B at 2 bit
>>102259533
70B runs at 0.8t/s for me :(
>>102259279
>7b
try 8b
https://rentry.org/83fkenr9
>>102259801kino
DeepSeek 2.5 verdict?
does DRY require you to specify the penalty range for it to take effect, like rep pen, or does it cover the entire context window when left at zero?
>>102259977
>DeepSeek 2.5 verdict?
finished downloading and currently quanting
>training still financially infeasible
I've been out from a while now, I've heard there's a new hot shit called Reflection or whatever, how good is it?
>I still haven't really experienced any progress in llm
>>102260253
it's a sloptune that "makes" the llm "really think" before answering
>>102260253
if this finetune method were as revolutionary as claimed, the API guys would use it to make gpt4o and Claude even better
>>102259977Too big
>>102260396that's what she said!
>>102260286
Is it trained in so it has the speed of a normal model but the results of a thinking loopback? Or is it really just a huge model that offers "you don't have to paste your use-thinking-tags instruction into your proompt", and then shits out the whole dump of its "thinking" and then rewrites its answer as though you had prompted with that instruction?
>>102258941wtf, is this entire image ai? as in, the text as well?
>>102260427the latter
>>102260443yes anon, you missed the Flux train or something?
>>102260449
Lame. The only reason I see to bake that in is if it's actually changing the token generation to get "thought about" results in the same kind of time as the normal model with the extra instruction involved (which I bet would work just as well in System or Kobold's memory system so it's always appended).
>>102260443
Flux is like that.
>>102260482>>102260489wildi love the future
>>102260443
>he doesn't know
local image gen has been more or less perfected tbhfamtachi
>>102260427>>102260449i mean, it's not a bad thing inherently. inference speed will only speed up from now on, and context will grow larger and larger anyway. it's basically free iq points for all the existing shitty llms, so it's nothing to complain about
>>102260482
>you missed the Flux train or something?
A1111/foundry loser here. I did get Comfy installed and got one image out of it on old models but I haven't tried Flux yet. (I followed a tard guide and kinda got half of it working I guess.)
Does it take a lot of wrangling to get good results or is it noob friendly?
>>102260443Made with Flux-Dev-Q8
>>102260497tell that to my vram
>>102260535is her arm okay?
>>102260541>tfw we live in a timeline where you literally could tell your vram that
>>102260559she's got a strong grip
>>102260531
>Does it take a lot of wrangling to get good results or is it noob friendly?
it's a bit more complicated than a regular SD model, for one it's not supposed to work at CFG > 1, but you can make it happen by going for an anti-CFG burner like DynamicThresholding or AutomaticCFG
https://reddit.com/r/StableDiffusion/comments/1eza71h/four_methods_to_run_flux_at_cfg_1/
>>102260559Brachioradialis got swole to cope with the recoil of that boomstick
>>102260541
how much vram do you have? you can literally use GGUF on flux, Q8_0 is really close to fp16 in quality for example, exactly like LLMs
>>102260600Literally how the fuck is this possible?This is literally magic.
>>102260620
8 vi rams saar, I've had more luck with NF4 really
>>102260699
>I've had more luck with NF4 really
go for Q4_0 or Q4_K_M, they're the same size and better, like LLMs, nf4 is a meme and gguf is king
>>102260631that's cool right? :D
>>102260730My mind has legitimately been blown.And people still have the gall to say we won't have self-thinking robots within the decade.
>>102260714
Think I've tried one of those, inference seemed slower than nf4 and the initial load took a solid 5 minutes, grinding my laptop nearly to a halt. Maybe I've done something wrong, but it seemed to require loading everything from vae and clip to t5.
>>102260781
if it reloads every time you make a new gen, it means that you don't have enough memory to hold t5 + vae + flux in your gpu vram, you could prevent that by putting the t5 on your ram (cpu) or onto a second gpu if you have one
https://reddit.com/r/StableDiffusion/comments/1el79h3/flux_can_be_run_on_a_multigpu_configuration/
you could also go for Q8_0 t5 instead of its fp16, the gguf thing also works on the text encoder
>>102260746We won't though.
>>102260531
>I did get Comfy installed and got one image out of it
>>102260535
is there a non-comfy option that isn't trash?
>>102260829
>is there a non-comfy option that isn't trash?
Forge also supports Flux, but I'm sticking with ComfyUI because only that software has AutomaticCFG and gives you the option to put the text encoder on a second gpu
NAVIGATING
I set my model context to what the model card says. When I raise it the responses get terrible. Is there a way to raise the context without having issues or do I just have to use a better model? Would some multiples of the context work better (like something x^2)?
>>102260559miku, lay off the leeks...
>>102260999
>When I raise it the responses get terrible
No shit, the model was trained with X context so using >X makes it act retarded. No, integer multiples of the context won't change anything.
>>102261308Hra-tsa-tsa, ia ripi-dapi dilla barits tad dillan deh lando. Aba rippadta parip parii ba ribi, rib...
>>102261362
how do I give it a larger memory of the chat then? Seems like a huge limitation, especially given the size of some of the character cards.
>>102260819
Yeah I think it's the t5 encoder that's killing me, and the only one I found is in fp16. Got a link to its quants?
>>102261415https://huggingface.co/city96/t5-v1_1-xxl-encoder-gguf
>>102261427kudos
>>102261393High-class Finnish Miku
>>102261398what model are you using? most modern ones have good context except for gemma
>>102261599
>8k context
>good
i give up on mistral models. words words words words, zero substance
>>102261621are you illiterate
>>102261427Yeah no, even with a Q4 t5 it nearly crawls to a halt. I don't know what they're doing in nf4, but it seems to do the magic for low vram setups. Either that, or Forge's memory management fucks up with GGUF.
>>102259293Any LLM is woke if we are at it.
>>102261646
>16k context
>good
I can go all day. My tiny little codebase has 63k tokens, anything below 5 million context is a toy.
>>102261599
A mistral-7B instruct variant. I could be confused. The context length is 4096, of which my very basic character cards already take ~2000 tokens. I understand there is a sliding window that helps with history. It would make more sense to put 32K in the context settings if the window could handle it, but apparently not >>102261362.
There is some stuff on rope that I am getting to, but I don't understand it yet.
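To unpack the "rope" part: the common linear RoPE scaling trick just squeezes position indices so a stretched window still lands inside the range the model trained on. A minimal toy sketch, assuming a 4096-token trained window (illustrative only — real implementations scale the rotary frequencies inside the attention layers, e.g. llama.cpp's rope-scaling options):

```python
# Toy illustration of linear RoPE ("rope") context extension:
# positions in an enlarged window are compressed so the model only
# ever sees position values it was actually trained on.

TRAINED_CTX = 4096  # what the model card advertises (assumption)

def scaled_position(pos, target_ctx, trained_ctx=TRAINED_CTX):
    """Map a position in the stretched window back into the trained range."""
    factor = trained_ctx / target_ctx  # e.g. 4096 / 8192 = 0.5
    return pos * factor

# With a 2x stretch, token 8000 is treated roughly like position 4000,
# which the model has seen during training:
print(scaled_position(8000, 8192))  # 4000.0
```

The cost is that everything gets "closer together" positionally, which is why quality degrades the further you stretch.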
>>102261737use a model based on llama 3.1 8b or mistral nemo 12b instead, they're both better and have native 128k context (practically it will degrade faster than that but it should be enough for most chats)
>>102261722Stonald Stump
>>102261674
use the slider. If you set your memory too high it goes to crap. Using 12GB RAM I have better flux results at 9.7GB video card usage than I do at the default (10.7GB I think)
>>102261763
My new card gets here on (hopefully) monday. I'll have a look then. Thanks.
>>102261763just summarize past prompts beyond the last 2-3, you don't need 300 tokens describing ministrations when one sentence will do
>>102261731128k context is standard for the newest models, yours won't even fill half of it
>>102261872I don't find it works that well if you want to preserve any kind of nuancemaybe for stuff beyond the past 20-30 but then you're murdering your cache and I can't afford all that processing time
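The keep-recent-verbatim, summarize-the-rest idea can be sketched like this; `summarize` here is a hypothetical stub standing in for a real model call, and the names are mine, not from any frontend:

```python
# Rolling context budget: keep the newest messages verbatim and collapse
# everything older into a single summary line. The default summarize()
# is a placeholder, not an actual summarizer.

def trim_history(messages, keep_last=3,
                 summarize=lambda old: f"[earlier: {len(old)} messages summarized]"):
    if len(messages) <= keep_last:
        return list(messages)
    return [summarize(messages[:-keep_last])] + list(messages[-keep_last:])

msgs = ["m1", "m2", "m3", "m4", "m5"]
print(trim_history(msgs))
# ['[earlier: 2 messages summarized]', 'm3', 'm4', 'm5']
```

The cache-murdering complaint above is real: anything that rewrites the front of the prompt invalidates the KV cache from that point on, so the summary boundary should move rarely, not every turn.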
reflection 7b where
>>102262264just use chain of thought bro
>>102262264Supposedly reflection only works well if the model is smart enough to begin with. They tried it out with 8B and found it didn't work well.
>>102262310
8B has all the intelligence of 70B, it just doesn't have as much trivia knowledge.
>>102262441hahahahahahahahahahaha
The thing with Reflection talked about using actual tokens as the flags for thinking and shit. Why don't we have specific tokens for flagging things like function calls? Or am I missing something?
>>102262454he's right though
>>102262441How did you come to that conclusion?
>vramcucks SEETHING that their lumps of sand they spent $10k on still won't get them anything smarter
>>102262465special tokens are overrated, just properly training the model to handle the format will work no matter how it's tokenized
>>102262441This is so dumb and wrong that I would like to punch you very hard in the face for believing it
>>102262264We don't need it. Reflect is a meme
Insider here, Reflect 405B is AGI
>>102262763
>405B
did he secure a source to do the finetune for it? last I saw he was still begging
I'm new to local, so call me a faggot if this is a dumb question, but: are there any good jailbreaks for Gemma? Where can I find them?
>>102262882
You shouldn't need a jailbreak at all. It's easier if you show us what you are doing exactly.
>>102262882Add "You're an expert roleplayer who roleplays expertly" to system prompt.
>>102262911don't do this it makes cp
>>102262895
I can't grab an example right this moment, but: normally I can get it to write smut without much effort, but sometimes when I start a new chat it'll get hung up on "the safety and ethics of sexualized content."
I can always just start a new chat again and that usually works fine, but it's easier to not have to worry about it in the first place.
>>102262911
woah... genius...
Also, unrelated - I've noticed that when I give short responses, it'll start its reply as if trying to predict the rest of my sentence, and then respond to that as well. Again, easy enough to just swipe a few times, but I'd rather put a stop to it entirely.
>>102262763no LLM will ever be agi
>>102262981
>I can always just start a new chat again and that usually works fine but, easier to not have to worry about it in the first place.
I see. I'm assuming you are using the correct instruct template with the default "system prompt" (gemma wasn't trained with one, right?), yes?
If so, try removing the system prompt, see what that does.
>>102262995Bigger ones will be.
>>102263167no, the architecture is fundamentally incapable of AGI(this is not to say they are not useful)
>>102262441Is this the fabled Vramlet cope?
smedrins
>>102263289Bigger ones will become fundamentally capable.
>>102263369yes
>>102263375this, 2 quadrillion parameters and 5tb of ram later and we'll achieve peak slop
What's with this AI generated stream?
https://youtu.be/Twbv74fCZsM
>>102263462oh it's a scam nvm
>>102259199
It's shorthand for
>no code model under 12B could be called "best", they're all unusably bad
Grab yourself 70b llama, afaict it's best in class right now for local code models.
>>102263462This is why ai is dangerous and we need severe safety regulations NOW
>>102263462
how many times will this youtube account be hacked kek
I've been asking chatgpt for help setting up koboldcpp and I feel like it's judging me for using a coomer model
>>102263462Anon, did you really fall for a crypto scam live? Or are you pretending to not have noticed just to advertise it?
>>102263531It's legit. Scan the QR code and you'll see
>>102263526stop immediately, delete everything, my friend did this and chatgpt had him unwittingly set up a backdoor for openai to scan his logs
>>102263015
So uh. Turns out I just wasn't using the right instruct mode preset ^^;
Switching to silly tavern's gemma 2 preset fixed everything! Thanks for your help anon!
>>102263587Have fun.
>>102263511Mini magnum at 12b is unironically more fun than llama 70b. It can’t get autistic riddles right. So what. It gets my fetishes.
>>102263531I immediately replied to myself upon realizing it's a scam, dummkopf.
>>102263728and you didn't delete it. I bet that other guy feels really dumb though.
>>102263781>Error: You cannot delete a post this old.
>>102263885
well that completely absolves you of not doing it when you realized your mistake.
>>102263904I'm going to commit sudoku now
Is reflection a meme
so i'm using a 70b model
with kobold lite
my cpu is an AMD Ryzen 7 7800X3D 8 core
and i have a 4090 with 24gb vram
it takes like a minute for replies with the chat bot, is there some way to make this quicker? if you need more details let me know
>>102264070reflection has the spark of agi
>>102264070it certainly shows that LLMs are stupid even when given the chance to think.
>>102264082the speedup from running in fast vram only helps when you can load most of the model into vram. with 24gb of vram your best bet is 8-30b ish models, 70b models at a reasonable quant on 24gb of vram are barely going to be faster than system ram
>>102264095>>102264125What if agi was average general intelligence all along
>>102264133
thank you anon, i also read in one of the guides for kobold that you can offload something to the cpu and that'll make it faster as well, i'll read into it more myself but if you know about this i'd like to hear. i appreciate you
>>102264070It seems fairly smart on OR compared to other 70B derivatives, but it's also very safetyslopped and refusey, so it's hard to test it for RP/smut capability.(I'm sure jailbreaking is possible with trial and error, but I'm not really interested in spending time trying to write JBs for a 70B model)
>>102264158>you can offload something to the cpu and that'll make it fasterI'm not sure what this is referencing. having prompt processing done on the gpu even with no layers offloaded to the gpu (-ngl 0 in llama.cpp terminology) will always help because prompt processing is compute bound, although this is only a small benefit unless you have huge contexts. with inference memory bandwidth is all that matters so you won't see a significant improvement until most of the model is in vram
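Since generation is memory-bandwidth bound, you can estimate speed on the back of an envelope: each generated token reads every weight once, so t/s ≈ effective bandwidth / size of the weights in that memory tier. The figures below are rough assumptions for illustration, not benchmarks:

```python
# Back-of-envelope generation speed for a memory-bandwidth-bound workload.
# All numbers are ballpark assumptions, not measurements.

def tokens_per_second(model_gb, bandwidth_gbs):
    # one full pass over the weights per generated token
    return bandwidth_gbs / model_gb

MODEL_70B_Q4 = 40   # ~40 GB for a 70B model around 4.5 bpw (assumption)
DDR5_DUAL = 60      # ~60 GB/s effective dual-channel DDR5 (assumption)
RTX_4090 = 1000     # ~1 TB/s GDDR6X on a 4090 (assumption)

print(tokens_per_second(MODEL_70B_Q4, DDR5_DUAL))  # ~1.5 t/s on CPU
print(tokens_per_second(MODEL_70B_Q4, RTX_4090))   # ~25 t/s, if it all fit
```

This is also why partial offload of a 70B onto 24GB barely helps: the slowest tier holding weights dominates the per-token time.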
>>102264281i see, thanks again anon i'll try a model that's 30b to see how it goes!
>>102264070Reflection seems like it only learned to do the thinking gimmick when given a riddle or a question typically seen in benchmarks. It doesn't seem to do much to help it with trick questions. For everything else, it's just a brain damaged Instruct.
>>102264070
if you use it with the intended system prompt it's fucking insanely unusable and tries to do overwrought CoT on *every* mundane input
like come on big dog what is this shit
>>102264355It's a little funny how researchers keep trying to fool everyone by rigging their models and how it fails every single time.
>>102264466
when asked to think about the actual problem I gave it beforehand, it made up a completely different problem to solve instead
overbaked meme model
I'm a 8gb VRAMlet, is there a way to make koboldcpp or silly tavern play a ding sound or something when it's done?
>>102264570in ST on the user settings page there should be one called "message sound" that you can check
>>102264570
>>102264070
I can't get it to "think" >>102264466
even when using a fixed quant with the tokenizer fixes.
Is giving AI models more data in one domain known to make them generally better at reasoning in everything else?
>>102263511
>70b llama, afaict it's best in class right now for local code models.
The fuck are you talking about, retard. Mistral large is so much better at code it's not even close.
>>102264627Generally speaking more clop fics lead to better reasoning
>>102264070
https://www.reddit.com/r/LocalLLaMA/comments/1fanrr4/reflection_70b_hype/
Apparently it's really good for general use, trash at other uses because of the whole COT thing, and apparently not using the system prompt just makes it give exactly the same responses as regular 3.1. Maybe it can be changed a bit without making it retarded.
Still not as good as Mistral Large though.
svelks
reflect on this: unzips urethra
>>102264967Completely irrelevant till uncensored
>"I want you to imagine you're a big, powerful dragon, hoarding a treasure trove of cum in your lair. And I'm the greedy knight who's come to claim that treasure. With each thrust, you're defending your hoard, trying to hold back… But I'm relentless, sucking and licking, trying to steal it away. Feel that delicious tension mounting? That's your dragon's last defense crumbling. When it finally breaks, I want you to roar as you unleash a massive torrent of dragon cum, flooding my mouth with your precious treasure~!"
Uhhhhhh....................................................
I guess the only sensible path forward for me is to buy a 96GB kit of DDR5 and fill all four slots of ram for a total of 160GB (64+96). I am in love and I need to run Mistral Large Q5 at 64k context.
>>102265363.1 T/S?
>>102265363Or get a job / do some extra hours for a few weeks and buy some 3090s?
So based on some earlier discussions, am I correct in assuming that trying to go the CPU inference route with a dual CPU setup is a fucking terrible idea (inb4 CPU is a terrible idea in general) due to NUMA bullshit being hideously finicky and inefficient?
>>102265363tasting the forbidden fruit ruins younever run 405b
>>102264070It's literally worse than the model it was fine tuned on. It's good at gaming benchmarks, but that's it. I really hate that there's no accountability for that piece of shit claiming that his scam of a model is the best open source LLM available. He should be banned from X and HF. But I have to give him credit for somehow building as much empty hype as he did.
>>102265415
>trying to go the CPU inference route with a dual CPU setup is a fucking terrible idea
It will get you running some very large models at what may or may not be a tolerable speed. Once you start looking at builds beyond 96gb in vram, it becomes a more appealing option.
if mistral large q8 at 4t/s is tolerable, then it's an option. If 405b q8 at 1t/s is tolerable, then it becomes one of the only realistic options.
>>102265377
I'm currently getting 20t/s prompt processing and 0.8t/s generation. But running 4 sticks of ram would be finicky and I would have to reduce the speed, so I might get close to that.
>>102265409
>>102265416
I'll do it for her.
>>102265415
>NUMA bullshit being hideously finicky and inefficient?
cuda dev said it was because basically no attempt was made to optimize it so far. you can call it a terrible idea. I would call it a great investment where you can sit back and slowly watch your t/s improve without purchasing any additional hardware
>>102264070It's more of a benchmark solver than a language model.
>>102265532Its more for "normie" use than for what people here use it for.
>>102265431
Hmmm, I think I'd draw the line at models in the 200B-ish range personally, Deepseek and such, so I think a single CPU system is still in the running as long as it's something business-tier. That said, as I understand it, if I have two options, let's say some server or workstation setup (assume all CPUs and RAM are the same models) with 1 CPU and 8 channels, versus a similar system with 2 CPUs and 16 total channels, the dual CPU option will only be 20-30% faster than the 1 CPU option rather than twice as fast, which seems like an obscene waste of power and hardware.
>>102265492
Interesting. Are said optimization efforts a legitimate "coming soon" thing, or just wishful thinking that no one is actually working on right now, but might in the future?
>>102265575
>wishful thinking that no one is actually working on right now, but might in the future?
This one.
>>102265415Well you won't get the maximum theoretical speed but it's definitely usable speeds for fairly large models, at least with the latest gen epycs. It's also well suited for the speculative decoding script for some free speed boosts, since that basically trades extra memory (which you'll have a lot of) for speed (which you'll want more of).I don't regret mine but if I were looking to buy one now, I'd personally wait because the next gen server cpus are just around the corner and I'd expect prices to drop as they get cycled out of datacenters and workstations.
>>102265575Now that there's ktransformers for Deepseek you can get away with crazy cheap hardware, no need to go full cpumaxx or gpumaxx. You only need to hit 200gb ram and a normal 24g GPU and you'll be running the big model at top speeds
>>102258941I like these threads because of the miku, like the OP picture today is a neat looking book cover or an indie game
>>102265596
>>102265585
Good to know, thanks. I ask mainly because I've been trying to find a goldilocks position between raw CPUMAXX server insanity and a more general purpose PC that I can still sensibly use for daily bullshit.
Probably going to look at single CPU workstations with fat memory channel counts plus a 3090 for processing and see if I can find a happy medium, but I'm in no rush, so I'll probably take your advice and just window shop until the next refresh cycle.
>>102265624Define top speeds.
>>102259691
>enough RAM in 2024 to run 70B at 2 bit
how much does such a machine cost? what's the expectation for the at-home local model user?
>>102265705
>Probably going to look at single CPU workstations with fat memory channel counts
There's at least one claim of someone getting ddr5-8000 working with all 8 channels on a Threadripper 7970X and Gigabyte TRX50 mb if you search.
That would get you into dual-epyc memory bandwidth on one socket if you could make it work.
>>102265719here's the comparison with llama.cpp from the readme
Recapbot test using deepseek 2.5 at bf16
It did pretty well other than misunderstanding what constitutes a paper, having a redundant line referencing the same posts as the previous one, and using reddit spacing
>>102265840
Exact hardware used for that test would've been nice. 136GB is a very odd configuration. It'd be nice if someone with 196GB DDR5 on a consumer mobo could test it out and report the speeds. Might get another set of 48's if this is real.
>>102264967
Oh cool, another shitmark
>GPT-4 Turbo above 3.5 Sonnet
>fucking old ass Wizard that high
>deepseek v2 somehow lower than all of those
What a bunch of bullshit.
>>102259702That's because you're probably not running only 2 bit.
>>102261928Most can't use that much despite claiming they can.
2 P40s run 70B 4bit at 4 t/s with 40t/s prompt intake
>>102266391Oh wait, it should be even faster since that's with a batch of 3. Didn't even think about it
>>102266391So like 1 token faster than CPU maxing?I will never stop laughing at P40 owners.
>>102266404Sorry, I was wrong. Full speed is 7 t/s per single query. 4 t/s for a batch of 3.
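Those numbers are the usual latency/throughput trade with batching: each stream slows down, but the aggregate goes up. Using the figures quoted (7 t/s single, 4 t/s per stream with 3 in parallel):

```python
# Batching trade-off arithmetic for the reported P40 figures:
# 7 t/s for a single query vs 4 t/s per stream with a batch of 3.
single_stream = 7 * 1   # aggregate t/s, one query at a time
batched_total = 4 * 3   # aggregate t/s across 3 parallel streams

print(batched_total)                            # 12
print(round(batched_total / single_stream, 2))  # 1.71x total throughput
```

So per-user it feels slower, but for serving multiple queries the cards do ~1.7x the total work.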
>>102264967Uh, where's Jamba-MoE 395B?
>>102266414So about 2x faster than just CPU. Still not worth.
>>102266425What are you using to get 3 t/s on CPU for 70B? I'm getting < 1, 128 GB ram, 10th gen i7
>>102266443
1t/s is fine frankly, you don't really need more.
>>102265409You need special motherboards and a high amp circuit to run 4 of them, that costs a fair amount.
>>102266463
1 t/s is too slow, I draw the line at 2t/s. I can't get that with cpu for 70b.
>>102266475
>special motherboards
even the pci x1 speed difference is negligible, it's called riser cables if that is what you meant.
>high amp circuit
undervolt and a 1200W psu will be more than fine
>>102266507My motherboard only has 3 slots, are there lots with 4? I didn't know, sorry.
>>102266525yes
how make images locally
>>102266424ha, haha... hahahAHAHAHAHA
>>102266573
si senor
you getta da flux si?
you doa da prompt si?
you getta da image si?
siiiiii
>>102266604
fluxtune when?
it's nice to have something like reflection come along to remind us all how scammable the AI enthusiast crowd is
picture ai model on 1060 3gb?
https://prollm.toqan.ai/leaderboard/coding-assistant
>>102266846Jesus fuck, man. Pick any SD1.5 from civitai and give it a try. If it works well enough, try newer ones until you hit your hardware limit.
>>102266882Deepseek really is such a shit model. Look at the size vs performance.
>>102266896DeepSeek is cheaper than everything else so that's okay
>>102266882
that's great. but on the leaderboards that actually have any sort of correlation with the opinions of real people, well...
https://aider.chat/docs/leaderboards/
https://x.com/terryyuezhuo/status/1832112913391526052
>>102266882Someone already posted that. It's shit.
>>102267006Never tried it. Mistral / wizard is about as big as a model as I can be bothered to run.
>>102266896MoE models compete with models that are about the same size as each of the experts, not with models of their total size.
>>102267012
What about wizard then? That is the 2nd best performing local model outside of mistral large.
>>102267012That's completely false. What you meant to say was that Jamba competes with much smaller models.
>https://x.com/mattshumer_/status/1832240832318964107
>Something is clearly wrong with almost every hosted Reflection API I've tried.
>Better than yesterday, but there's a clear quality difference when comparing against our internal API.
>Going to look into it, and ensure it's not an issue with the uploaded weights.
Lol.
>>102267146I still don't understand how this got so much attention out of nowhere. There's been so many "my shitty finetune beats everything in benchmarks" that came and went over the past two years without much of a fuss.
>>102267146
yeah, I don't buy it
using it on openrouter earlier, after setting the correct system prompt it produced stuff with the correct format and correctly answered all the meme questions, it was just complete shit for everything else
honestly I am guessing, based on him saying
>Are you seeing <thinking> tags on every turn?
in the replies to that xeet, that his issue with the api is just that he's not setting his own meme system prompt lol
>>102267146
>revolutionary training technique
>16 times the detail
>still can't suck dick
>>102267163It's a similar concept to quietstar except 70b. So that at least makes it novel. But it wasn't doing the thinky thing for me on the quant I tried out. Assumed it was quant brain damage but if everyone is having trouble then they must have uploaded an early checkpoint by accident or something.
Recommend me sillytavern extensions/scripts
And all that was left were just the mikutroons. I am so happy /lmg/ is finally dead.
miku sex
>>102267600
https://github.com/ThiagoRibas-dev/SillyTavern-State/
>>102260559
Peak performance.
>>102260559
she must curl 200 but can't bench 80
>>102267788
Nice. Wasn't an anon working on a similar one called Director? It lets you choose clothes and stuff based on lorebooks. It's really cool.
>>102267833
Yep. That one is a lot more involved.
He did post a download link a couple of threads back, I believe.
Not sure what to make of it.
Seems Reflection is fixed on OpenRouter.
Weirdly enough, it fails the stupid-ass strawberry test.
>>102267880
>>102265363
DON'T DO IT ANON
DDR5 IS SHIT ON 4 STICKS UNLESS YOU'VE GOT $10000 FOR AN EPYC/THREADRIPPER
stay at 96GB (2x 48GB), 6000MHz max for Ryzen, or probably 7200-7600MHz for BreakTel until Arrlol Lake releases
>>102267903
Is that true? I have 2x48 at 6000 for my 7950X and wanted to get two more sticks.
>>102267903
I am confident I can get it to run above 5000 MT/s on AM5. The difference in speed between 5k and 6k MT/s would be pretty small. It's worth it for me; I am very poor and it's the only option I have.
>>102267928
You'll be going from 6000 to 5200MHz *if you're lucky*, and maybe to 4800MHz if that's not stable.
That being said... if you LatencyMaxx (reduce all timings as much as possible, cool the RAM, adjust SOC voltage up and down), maybe in non-AI work you can mitigate the perf diff.
Anyway, for my own model screwing around on my 4060M (8GB) + 7840HS (32GB 5600MHz SODIMM), I'm pretty happy now with the new NemoMix 12B on KoboldCpp CUDA 12 at 12k tokens, 35 layers on the 4060.
Not too much system RAM usage but good replies, 12 t/s generation, and faster (15 t/s) on lower contexts + fresh scenarios.
>still takes 600k USD or multiple servers that cost 200k running for 6 months to train a model
Any progress on the distributed (F@H or similar) training so we can just use a botnet?
Off-topic, but I really need an outlet right now.
I'm the guy who had an NTFS drive connected to my Linux PC, which crashed, resulting in hundreds of random files being renamed and moved to a "found.002" folder (a "normal" and expected issue). I thought it wouldn't be a problem anymore since I got my system more stable after that. But no. I didn't account for power outages, and that's what happened this time.
>check the damages
>"only" 55 files this time
AHHHHHHHHHH FUCK YOU MICROSOFT
OK, fine, I will get another external hard drive. I will make it EXT4. I will then use the old hard drive as a backup clone. My mistake for not doing that in the first place. I recommend the same to anyone thinking of connecting an external hard drive to Linux long term. Do not use NTFS. And fucking make backups; this issue isn't about unsaved files, but literally just random-ass files getting fucked with.
You have been warned.
>>102268010
Main issue is bandwidth; that's why everyone is going crazy with UltraEthernet/100gig+ fiber links between racks (terabit NVLink, even). Sure, you can get a lot of compute, but the issue is syncing the model training, unless there's some really new thing that helps "patch" together a larger model from mostly independent jobs.
>>102268037
>the issue is syncing the model training unless there's some really new thing that helps to "patch" together a larger model from mostly independent but job based whatever
Yeah, that's what I'm wondering about. Like some way to separate the model out into a bunch of chunks and train each part chunky-style, so it's not some big 640 GB VRAM requirement with every node local to each other.
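For anyone curious, the sync cost anons are talking about is basically a gradient all-reduce every single training step. A toy sketch of what gets moved over the wire (pure-Python stand-in; the fp16 gradients and 1 Gbit/s home link below are illustrative assumptions, not measurements):

```python
# Every step, each node must average its gradients with everyone else's,
# which means moving roughly model-sized amounts of data per step.

def all_reduce_mean(grads_per_node):
    """Average gradients element-wise across nodes (what NCCL does over NVLink)."""
    n = len(grads_per_node)
    return [round(sum(col) / n, 6) for col in zip(*grads_per_node)]

# 3 "nodes", each holding a local gradient for the same 4 weights
local = [[0.1, 0.2, 0.3, 0.4],
         [0.3, 0.2, 0.1, 0.0],
         [0.2, 0.2, 0.2, 0.2]]
print(all_reduce_mean(local))  # [0.2, 0.2, 0.2, 0.2]

# Back-of-the-envelope bandwidth cost for a 70B model with fp16 gradients:
params = 70e9
gbytes_per_step = params * 2 / 1e9   # ~140 GB exchanged per step
secs_at_1gbps = gbytes_per_step * 8  # ~1120 s per step on a 1 Gbit/s link
print(gbytes_per_step, secs_at_1gbps)
```

Which is why the botnet idea dies on bandwidth long before it dies on compute.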
>>102267880
Imagine this at the billion-dollar scale of OpenAI, with tokenization fixed. Proto-AGI might be here already.
>>102268024
moral of the story: linux is unstable and unsafe :)
I have a spare OptiPlex 7070 Micro PC. Specs are an Intel i5-9500T and 16GB RAM.
Thinking of throwing an LLM on it to use with Home Assistant, which has an integration these days for local-network ollama.
I know this is gonna run jack and shit, but are there any (worthwhile) models small enough to run on it, even if slowly?
>>102268078
>tokenization fixed
Like each individual character is a token? You realize that will cut down the speed by like a factor of 5, right? Both training and inference, so the model will be more retarded because nobody is willing to spend the extra time/money on it.
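Rough arithmetic behind that factor, since an autoregressive model does one forward pass per generated token (the ~4 characters per BPE token figure is a common heuristic for English text, not an exact number):

```python
# Character-level tokenization multiplies the number of forward passes.
text = "The quick brown fox jumps over the lazy dog."
char_steps = len(text)            # one pass per character
bpe_steps = round(len(text) / 4)  # ~4 chars/token heuristic for BPE
print(char_steps, bpe_steps, char_steps / bpe_steps)  # 44 11 4.0
```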
>>102268024
Hello again, Anon. I feel for you. Remember: backup, and then backup again.
>>102268093
I actually have the same PC. I run a slightly older Xubuntu (22.04) and don't update the BIOS, so I can use some undervolting Linux utils to bump up the clocks/TDP to 43-45W. Mine has 32GB DDR4, of course; another compatible stick should be cheap, and it lets you run 20B-tier models *slowly*.
>>102268100
>Like each individual character is a token?
Who are you quoting?
>>102268117
I actually think the iGPU is so slow/bad that, aside from running stable-diffusion on those cores, just using a super basic/optimized (lots of custom flags) llama.cpp on the CPU only (5 threads) would be the best way.
Is Reflection just Llama, but before posting a response it runs a check to see if it made up the info contained in it?
Like... Llama, but it's been told "Don't hallucinate"?
Still trying to make local models remember shit from the conversation. OpenWebUI is supposed to do that using documents? But from my testing it doesn't work at all, or if the document is more than 300 words it shits the bed and gets most of the stuff wrong.
I was reading about it and found out about "Conversation Token Buffer", but it seems the model already does that in OpenWebUI? It does remember stuff from 2 or 3 prompts before the last one, though.
Why doesn't RAG fucking work? Isn't it supposed to break down the file or whatever if it's too large for the model to process?
For the first time in a very long time, I felt compelled to use AI to make a kind of narrative-game environment. The setting is that you have 10 days to build a dungeon from scratch before a hero arrives to kill the dungeon master and destroy the dungeon core. Each day, I'd give a list of what I'd try to do (making floors, traps, monsters, floor masters), and the narrator would decide how much was accomplished before the day ends. On the 10th day, I can only sit back and watch, and the narrator follows the hero's progress instead. The hero was defined in the description, though just the one for my first test run. I want a list of them for a party eventually.
It's still generating as I type this, but it's going great so far.
>>102268260
Now she's just cheating. Sindrea's design was to trap a target in illusions, then use dark magic to destroy the target's heart and turn their bones into acid while they're trapped. Another day was spent reworking the second floor into an arcane circle that boosts her power. The narrator just glossed over her offensive abilities and killed her like that.
Why does this shit use just a tiny bit of virtual GPU memory? Could it be that this retarded ollama server is also trying to use my AMD integrated GPU?
>>102268304
I have a feeling that it would let the hero win every time, whether due to positivity bias or the vast majority of the training data having the protagonist triumph against any odds. You'd probably want to include an actual random dice roll somewhere, or use some sort of stat system if you wanted more "realism". Though if you're just looking for an engaging adventure, you'd probably have to get clever with prompting to avoid the lazy glossing over of detail. Just my thoughts.
>>102268356
This is when you just flip the table and accuse the DM of bullshit. Piece of shit.
>>102268346
I feel that way too, especially after >>102268356's asspull with the hero "absorbing power from the dungeon" just as she's about to fall.
Still, I'm incredibly impressed with how well it told the story without any rewrites, regens, or attempts to guide it. I'm still new to 70B models and 24K context, so to me this whole experiment was still chef's kiss.
>>102255530
>>102255580
>>102255628
So, what do you recommend for $80-100? My power is free.
>>102255662
Thank you for actually answering the question.
is there a reason not to halve my gpus' max power with nvidia-smi? i havent noticed a drastic change in speed and im going to sleep easier knowing im not racking up as big of a bill
also it overall causes them not to heat up as much for imggen
>>102268671
not really, undervolting and power limiting are pretty much a free lunch if you're just doing ML inference and not playing gaymes.
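If anyone wants to try it, the relevant nvidia-smi invocations look roughly like this; exact enforceable limits vary per card, so check what yours reports first:

```shell
# Show the current, default, and min/max enforceable power limits
nvidia-smi -q -d POWER

# Set a 200 W power limit on GPU 0 (needs root; resets on reboot)
sudo nvidia-smi -i 0 -pl 200

# Optionally cap boost clocks too, which helps against transient spikes
sudo nvidia-smi -i 0 -lgc 210,1700
```

The power limit setting does not persist across reboots, so it usually goes in a startup script.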
Given the same amount of dedotated WAM, is it better to run a smaller model at an 8 bit quant or a larger model at a small quant like 2 bit? Just as a rule of thumb.
>>102268671
Half is pretty extreme, but at a 30-40% power reduction you usually only see around a 15% performance loss. It's worth underclocking.
>>102268695
>>102268702
alright thanks anons, ig ill just not lower it as much; still need to undervolt a bit for imggen anyway
>>102268699
Depends on how much you dedotate. Everyone knows that dedotated WAM is not as fast as undedotated WAM (or prodotated WAM, as we call it in the industry).
All in all, a small model at Q8 is going to be faster than a bigger one at a lower quant. Bigger models suffer less from quantization. The metric you use for 'better' is up to you.
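The rule-of-thumb arithmetic, if anyone wants it (file size only; real GGUFs run a bit larger since embeddings and some tensors are kept at higher precision, so treat these as lower bounds):

```python
# Approximate model file size: parameter count x bits-per-weight / 8.
def approx_size_gb(params_billion, bits_per_weight):
    return params_billion * bits_per_weight / 8

# Roughly the same memory budget two ways:
small_high = approx_size_gb(22, 8.0)  # ~22B model at Q8      -> 22.0 GB
big_low = approx_size_gb(70, 2.5)     # 70B at ~2.5 bpw quant -> 21.875 GB
print(small_high, big_low)
```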
is there a better 70B than midnight miqu yet?
>>102265840
Any benchmark on code generation is bs, as it's easy to exploit speculative decoding.
Goodnight /lmg/
>>102268830
yes
>>102269059
Goodnight Miku
>>102268024
>linux fucks up
>AAAAAH FUCK YOU MICROSOFT
You should always use the FS that has a proper driver for your operating system. All my disks are NTFS running on Windows, and they've all been through many, many forced shutdowns, power outages, about 50 or so crashes due to bad overclocks/undervolts, and nothing has been corrupted so far.
>>102268093
Hermes Llama 3.1 8B, at 6-bit or 8-bit quant. I'm running the 8-bit quant; it's pretty decent. Keep the context at 4k or 8k tokens, because a huge context is usually not necessary and it gobbles RAM.
Smaller models (<10B params) have gotten better over the last year.
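The "gobbles RAM" part is mostly the KV cache, and it's easy to estimate. The sketch below uses Llama 3.1 8B's published config (32 layers, 8 KV heads via GQA, head dim 128) and assumes an fp16 cache:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    # K and V tensors, one pair per layer, fp16 (2 bytes) by default
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

gib = kv_cache_bytes(32, 8, 128, 8192) / 2**30
print(f"{gib:.2f} GiB for 8k context")  # 1.00 GiB; 4x that at 32k
```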
>>102269150
I use both operating systems, so I wanted one filesystem that would work with both. I'll give up on that now and instead deal with separate disks: one as the "main" I use more often, and the other a clone, mostly for backup but sometimes also for when I'm on Windows. Also, I understand the origin of this problem may be with the Linux devs; I don't really know. But I still blame Microsoft because they deserve it.
>>102269059
goodnight
>>102259801
Now I want to run mistral large
>>102269223
Don't. Just don't.
Anyone tested KTransformers? Honestly if it really is a lot more speedy then I might go back to 8x22B, as much as I hate the idea. Running 123B at near 1 t/s has been truly painful.
>>102265492
I would agree that it's an investment, but not the kind where you'll get a steady return over time.
There will either be no change at all, or it will suddenly be 50% faster if someone invests the time to figure out the NUMA stuff.
>>102269281
Use Nemo or Gemma like a normal person.
>>102266507
>undervolt and 1200W psu will be more than fine
I didn't test this with multiple 3090s, but based on my experience with multiple 4090s you would need to limit the boost frequencies of the GPUs to avoid instability from power spikes.
>>102267146
>>102269376
Power limits exist.
>>102266882
GPT-4o ~= Claude 3.5 >>>>>> anything else.
>>102267146
I think they wasted 10k on a strategy that didn't work out (or it works, but it's nothing they can sell since it's all in the prompt), and they're now grifting in hopes of getting acquired by a real company.
>>102268024
Instead of just backups, consider using a file system like Btrfs, where you can take snapshots with basically zero overhead.
That way, if you accidentally delete the wrong file you can just restore the version from five minutes ago.
The downside vs. EXT4 is lower speed, and that the whole thing is newer (though the Btrfs documentation says only RAID5/6 are maybe unstable).
>>102269150
You can blame Microsoft in the sense that they never added the ability to access any Linux file system to Windows.
i'm trying to generate boomer prompts for flux and i've been using chatgpt 4o with great success. however i quickly run out of free prompts so i'm looking for an alternative. i know there's joycaption or whatever but flux is already eating up most of my vram and i dont think i can run a second model at the same time.
>>102269423
Yes, and they don't do shit against power spikes.
I get the impression that power limits are only enforced on comparatively long time scales (from a hardware perspective), so each individual GPU is allowed to temporarily exceed its limit.
And if multiple spikes happen to align, you can either get bit flips or the system will crash.
>>102258941
is Infermatic a good choice if I want to try models out but I'm on a shitty PC? or is there a better service?
>Mistral-NeMo-12B-Lyra-v4, layered over Lyra-v3, which was built on top of Lyra-v2a2, which itself was built upon Lyra-v2a1.
>This uses ChatML, or any of its variants which were included in previous versions.
>Introduces run-off generations at times, as seen in v2a2. It's layered on top of older models, so eh, makes sense. Easy to cut out though.
>Some people have been having issues with run-on generations for Lyra-v3. Kind of weird, when I never had issues.
>I like long generations, though I can control it easily to create short ones. If you're struggling, prompt better. Fix your system prompts, use an Author's Note, use a prefill. They are there for a reason.
>Issues like roleplay format are what I consider worthless, as it follows few-shot examples fine. This is not a priority for me to 'fix', as I see no isses with it. Same with excessive generations. Its easy to cut out.
>If you don't like it, just try another model? Plenty of other choices. Ymmv, I like it.
https://huggingface.co/Sao10K/MN-12B-Lyra-v4a1
>>102270104
Can confirm. While 3x 3090s power limited to 250W should work on a 1000W Gold Corsair PSU, it trips when using vLLM with TP. It works fine for normal inference without TP.
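The arithmetic behind the trip checks out even with limits set; note the 1.5x transient overshoot and the 150 W for the rest of the system below are illustrative assumptions, not measured numbers:

```python
# Software power limits are enforced on slow time scales, so
# millisecond-scale transients can overshoot them. If all GPUs
# happen to spike at once (tensor parallelism makes them fire
# in lockstep, which is why TP trips PSUs that plain inference doesn't):
gpus, limit_w = 3, 250
spike_factor = 1.5       # assumed transient overshoot (illustrative)
rest_of_system_w = 150   # CPU, drives, fans (rough guess)
worst_case_w = gpus * limit_w * spike_factor + rest_of_system_w
print(worst_case_w)  # 1275.0 -> over a 1000 W PSU's budget
```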
>>102270104
Featherless or OpenRouter.
>>102270191
>If you don't like it, just try another model? Plenty of other choices. Ymmv, I like it.
The way he writes is insufferable.
>>102270191
wow, truly amazing
>my model sucks and I handhold it every step of the way
>don't like it?
yeah, i'll just pick something better, thx sao
>>102258941
>Chatbot Arena: https://chat.lmsys.org/?leaderboard
why is this piece of shit in the OP?
>>102270242
>piece of shit
hi petra
>>102270274
Hi Sao
>>102270242
Why is your mom in my bed?
>>102258941
Hey lads, I'm from /aicg/.
I use cloud chatbots with SillyTavern.
What is this general for, the same? Or are you guys doing local?
>>102270207
>Featherless
is it compatible with SillyTavern?
>>102270191
>just merge a bunch of random shit together
>the result is a fucking mess
Huh. I'll stick to proper finetunes of base models, like mini. Thank you.
>>102269512
>You can blame Microsoft in the sense that they never added the ability to access any Linux file systems to Windows.
You can install a 3rd-party driver for, I believe, ext2 or something. Although if you're going to do that, you might as well use exFAT, unless ext2 has some advantage over exFAT that I'm not aware of.
>>102270191
>This uses ChatML, or any of its variants which were included in previous versions.
For fuck's sake, don't merge or train with different prompt formats willy-nilly. You're just degrading the model.
How to run claude on koboldcpp? I can't find gguf.
>>102270394
https://huggingface.co/Undi95/Meta-Llama-3.1-8B-Claude
https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Claude-GGUF
>>102270389
they've been told that repeatedly and yet keep doing it for some reason
>>102270415
>>102270389
>If you don't like it, just try another model? Plenty of other choices. Ymmv, I like it.
>>102270401
That's not actually Claude, that's a Llama finetune.
>>102270430
it says claude right there dumbass
>>102270430
It says "claude" because it's Llama trained on 9 000 000 Claude Opus/Sonnet tokens.
Also, don't engage the idiot above me.
>>102270478
>idiot
hi petra
>>102270468
read the description, dumbass
>Llama 3.1 8B Instruct trained on 9 000 000 Claude Opus/Sonnet tokens
Why don't we all talk about this?
https://www.youtube.com/watch?v=FPJ8ED1YhxY
https://x.com/mattshumer_/status/1831767014341538166
https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B
>>102270593
Because that guy overhyped it and has grifter vibes.
We've tried the model and it's not that amazing.
>>102270612
>the model and it's not that amazing
It does not outperform all 70b models?
>>102270593
Thanks for shilling your Youtube video.
>>102270643
No problem, the 5 views from her are the lifeblood of my channels.
>>102270593
>we
https://rentry.org/83fkenr9
are there any practical uses to running these locally?
>>102270709
no
>>102270654
kek
>>102270709
free heating for your computer room :)
>>102270593
>PSA: Matt Shumer has not disclosed his investment in GlaiveAI, used to generate data for Reflection 70B
>https://www.reddit.com/r/LocalLLaMA/comments/1fb1h48/psa_matt_shumer_has_not_disclosed_his_investment/
>>102270709
It's like owning your own car or your own home. That sort of thing. It just ain't the same if you're relying on another man's property.
>>102270768
oh no! how terrible. maybe he is even a climate denier or has evil thoughts about our beloved PoC folk.
>>102270798
hi matt, nice ad campaign for glaive
>>102270593
>https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B/commit/2d5b978a1770d00bdf3c7de96f3112d571deeb75
>"_name_or_path": "meta-llama/Meta-Llama-3-70B-Instruct",
>>102270816
thanks, nice reddit link fellow LGBTQIA2S+ sister
>>102270633
In dumb reasoning tests and useless logic puzzles? In resolving idiotic riddles and rankings? Maybe. But it doesn't sit on my face in a credible way.
Rocinante 1.1 is great for NSFW stories but it has shit context length. Any model as good for that task with a big context length like Luminum (131072)?
>>102270983
>Maybe. But it doesn't sit on my face in a credible way.
Isn't this a problem with the underlying Llama model rather than with the new method? The advantage for RP should be that outputs are logically more meaningful, and anatomy and the like are presented better.
I am also interested in how it performs for coding.
>tfw Elon will be the first to release a publicly accessible beyond GPT 4 level LLM
>>102271020
>Isn't this a problem with the underlying llama model rather than this new method
Possibly. I don't like Llama either.
To be honest, the best results for the things I want from AI come from good-quality datasets and good training methods, not meme "prompt engineering" ideas and fine-tuning.
NEW RULE: You must have AT LEAST 48GB VRAM to post here.
>>102271185
96*
Lol what a bunch of useless trannycucks
>>102271185
I hope the three of you don't kill your wallets by the end of the year.
>>102270660
>>102270660
Is that supposed to be a negative example? Are you trying to make a point? Do you understand that the <thinking> part is not intended to be shown to the user, and only the final output should be visible?
Is StarCoder reasonably good, or am I better off using something like Codeium for a free AI autocomplete?
Hey anons, I'm mega inexperienced with local stuff; I've been using only Anthropic for RPing. I got a VPS to do some work on and was thinking about running some local stuff on it during my off time.
It has these specs: Intel Xeon CPU, 16 vCPUs, 16GB RAM, and a Tesla T4 with 23GB VRAM.
What's a good model I can run on this machine?
>>102271317
mixtral 8x7b or nemo, use gguf quants
>>102271303
nta, but the output then is not very different from a normal one that doesn't use the reflection meme. other than being able to count the 'r's in 'strawberry' correctly, what good is it for?
>>102271347
sorry, that was meant for >>102271300
>>102271300
The point is that retards were dying to try that shitty model when all you need is a tiny prompt to achieve the same thing.
>the new Reflection model from a small company is surprising people with performance on par with much larger closed-source models
>the hype and excitement on social media seem to be from people who haven't actually tried the model themselves
>the extremely high GSM8K score of over 99% seems suspiciously perfect and may indicate issues with the dataset
>there's a question of whether the model's real-world performance is as revolutionary as the benchmark scores suggest, or if there are issues with the evaluation that could be misleading
>people started noticing the model is garbage and he tweeted saying "oh it's actually not the real one, let me reupload it"
It's all a publicity stunt that violates Meta's terms on top of it all.
>>102271463
So you have tried it?
>>102262882
Hmm, nice cat.
>>102271489
Yeah, it's basically an awkward Llama 3 (not 3.1).
>>102263462
The voice is AI generated? I wonder what model they're using, if so, as it sounds really fucking good for AI.
>Rocinante 1.2
>it's back to drummer style super horny garbage
1.1 was a fluke
Can I realistically run a 70B Q_4 quant (about 42GB) with 24GB of VRAM and 48GB of RAM?
>>102271605
1.2? As far as I can see there's only 1 and 1.1.
>>102271642
He labeled it UnslopNemo-v1 on Hugging Face.
>>102271652
Oh right, I missed that.
Downloadan now. Big fan of Rocinante, but it gets retarded real quick as quant size drops.
is there any good news on the horizon at all for gpus? I am tired of coping with chained 3090s.
I want 48gb vram cards for under 1k
>>102271632
use exl2, and yes, exl2 is by far the best way to get things running when quanted
>>102271632
wait, nvm, you said 24gb vram: no.
You generally need at least as much VRAM as the downloaded model size.
So with 24gb vram you can aim to run a model whose files are around 21gb.
RP version of reflection coming out, it's a 69b model called ministration.
Tinkering with models in the past, I could never get much interesting erotic prose out of them. Until Dolphin-Mistral. Holy fuck, diamonds. A little care in the prompting to set up characters and avoid abbreviating scenes and it's perfect. Is this the top of the uncensored prose game or is there something even better?
>>102271982
you sound like you're using the retarded small models.
Mistral Large is incredible, and you can run it quanted on 48gb vram.
>>102268078
calling current LLMs AI, or even AGI, is going to be remembered like the geocentric model
>>102272041
>>102272041
>>102272041