/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Happy Birthday, Miku! Edition

Previous threads: >>102158049 & >>102145958

►News
>(08/30) Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed
>(08/29) Qwen2-VL 2B & 7B image+video models released: https://qwenlm.github.io/blog/qwen2-vl/
>(08/27) CogVideoX-5B, diffusion transformer text-to-video model: https://hf.co/THUDM/CogVideoX-5b
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102158049

--Uncucking ollama models and improving prompts for roleplaying and ERP: >>102158388 >>102158620 >>102158889 >>102158942 >>102159004 >>102159026 >>102159117 >>102160921 >>102160987 >>102159260 >>102159371
--Seeking the most accurate local image captioning model and discussing compatibility with quantized formats: >>102160576 >>102160703 >>102160906 >>102160968 >>102161016
--Llama 4_K_M command issues and fixes with context size: >>102163547 >>102163766 >>102164589 >>102164749 >>102164818
--Little R slightly improves performance, but struggles with code: >>102158892 >>102159735
--Issue with AI messages always starting with character name: >>102162882 >>102162892 >>102162924 >>102163069 >>102163151 >>102163195 >>102163276
--Anon troubleshoots stop strings and response length issues with KoboldCPP and SillyTavern: >>102165837 >>102165906 >>102166021 >>102166194 >>102166294 >>102166316 >>102166483
--New cmd-r worse at storytelling than old version: >>102162536 >>102162556
--Anons discuss and criticize Cohere's pricing strategies: >>102164763 >>102165045 >>102165183 >>102166230 >>102166480
--Gguf files allow running models on CPU and are best for VRAMlets: >>102159159 >>102159246
--Dubesor LLM Benchmark table provides useful values: >>102160350
--Discussion on the explicit mention of refusing lottery number generation and alternative methods: >>102158160 >>102158295
--Comparison of LLM model translations and performance: >>102163772 >>102163780 >>102163981 >>102164414
--Anon shares AI Synthwave EP, discusses Nala test results: >>102165384 >>102165585 >>102165597
--Improving AI's instruction-following for accurate Korean to English translation: >>102164560
--Help needed for using safetensors with transformer library for ERP: >>102159725 >>102159745 >>102159976 >>102160001 >>102160033
--Miku (free space): >>102158462 >>102160676 >>102162294

►Recent Highlight Posts from the Previous Thread: >>102158055
>>
>>102167381
Thanks, Recap Miku.
>>
why is no one talking about cogvideo
>>
>>102167465
doesn't generate beautiful bouncing breasts sir
>>
Using these, it streams tokens like <|start_header_id|> as multiple tokens. That means something is wrong, right?
https://huggingface.co/mradermacher/L3.1-70B-Euryale-v2.2-i1-GGUF
>>
https://arxiv.org/abs/2408.16293
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

> Language models have demonstrated remarkable performance in solving reasoning tasks; however, even the strongest models still occasionally make reasoning mistakes. Recently, there has been active research aimed at improving reasoning accuracy, particularly by using pretrained language models to "self-correct" their mistakes via multi-round prompting. In this paper, we follow this line of work but focus on understanding the usefulness of incorporating "error-correction" data directly into the pretraining stage. This data consists of erroneous solution steps immediately followed by their corrections. Using a synthetic math dataset, we show promising results: this type of pretrain data can help language models achieve higher reasoning accuracy directly (i.e., through simple auto-regression, without multi-round prompting) compared to pretraining on the same amount of error-free data. We also delve into many details, such as (1) how this approach differs from beam search, (2) how such data can be prepared, (3) whether masking is needed on the erroneous tokens, (4) the amount of error required, (5) whether such data can be deferred to the fine-tuning stage, and many others.
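If I read it right, the "error-correction" samples are literally just the wrong step left in the training text with the fix written immediately after it. A made-up illustration of the idea (my own toy example, not the paper's actual data or format):

# Hypothetical "error-correction" pretraining sample in the spirit of the paper
sample = (
    "Question: Tom has 3 boxes with 12 pencils each and gives away 9 pencils. "
    "How many pencils does he have left?\n"
    "Step 1: 3 * 12 = 36 pencils.\n"
    "Step 2: 36 - 9 = 25 pencils.\n"          # erroneous step kept in the data
    "[BACK] Step 2 is wrong: 36 - 9 = 27.\n"  # immediately followed by its correction
    "Answer: 27 pencils."
)

The point being the model sees the mistake and the retraction in the same context during pretraining, instead of only ever seeing clean solutions.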
>>
>>102167508
>i1
bru h
>>
>>102167508
>mradermacher'd
get it from someone else's, also evryale sux
>>
>>102167532
which finetune is better?
>>
>>102167559
command-r 2024
>>
File: 1696639099334980.png (91 KB, 601x493)
https://x.com/disclosetv/status/1829800716078125140
https://www.nzz.ch/visuals/vegan-links-so-wuerde-chatgpt-in-sachsen-und-thueringen-waehlen-ld.1845641
>>
>>102167625
wow no way
>>
>>102167625
>local llms be like
>>
File: shiko.png (590 KB, 1013x788)
>mfw the model uses - instead of —
>>
i have a 4080 and 47GB of ddr5 to spare

whats the best model i can run bors?
>>
>>102167902
You have 16GB VRAM and a weird DDR setup Anon. kek
Nice GPU, jelly!
This with an LLM: https://openart.ai/workflows/onion/flux-gguf-q8-12gb/X5HzyhrKjW2jqHVCTnvT
>>
>>102167922
just going on how much ram i have free with a couple vscode instances and all my browser tabs and shit open atm
>>
>>102167935
Doh, understood Anon. I'm running a 3060 and can run everything, it just takes a bit longer. The GGUF models in Flux are the best IMO.
>>
>>102167950
3060 12 or 8 GB?
>>
>>102168002
12, msi ventus 2x, great little beast. kek,
Looking to get a 3090ti or an A4500 20GB and have that alongside the 3060 12gb.
>>
How the fuck does llamacpp_hf work these days in ooba? Back when I used it last time I just copied the hf tokenizer and that was it. Now ooba wants me to link the original model repo so that it can generate it or something? What is this bullshit?
It doesn't work.
>>
smedrins
>>
was going to buy a p40 to throw on my old computer just to find out the price has doubled. fucking chinks. may as well buy 3090 at this point
>>
Anyone tried out the XTC sampler in latest koboldcpp? Is it any good?
>>
>>102168755
BAA
>>
Hi all, Drummer here...

I am the blacked miku poster.
>>
>>102168791
kek
>>
>>102168791
We already knew that
>>
I don't know if anyone cares as much about optimizing these sorts of things as I do, but here's how I've compiled llama.cpp, with an AMD 5950X and Nvidia 3090.

Install AOCC and source it.
Compile AOCL-BLAS with AOCC:
./configure --enable-cblas --enable-threading=no --prefix=/usr/local --complex-return=intel CC=clang CXX=clang++ auto

Compile llama.cpp with AOCC:
make -j GGML_NATIVE=ON GGML_CUDA=ON GGML_CUDA_F16=ON GGML_CUDA_DMMV_X=64 GGML_CUDA_DMMV_Y=2 CC=clang CXX=clang++ FC=flang GGML_CUDA_CCBIN='clang' GGML_BLIS=1

Run llama-server with the usual options. Check the thread count: if you need some layers on the CPU, the fastest setting is usually lower than the number of cores, and sometimes even fewer threads than the number of NUMA domains is faster. Smaller models are more sensitive to the thread count; for larger models it matters less. I found 4 threads is the fastest with my config, but even a 2x28-core EPYC was fastest with around 8 threads.

If you're on Intel, use their oneMKL library, especially if you have AVX-512.
>>
>>102168791
fake, the blacked poster is cuda dev.
>>
>>102168999
That doesn't necessarily make it false.
>>
>>102168999
Remember when he slipped up and pretended it was because he got his tripcode cracked?
>>
File: 1713481333985879.jpg (257 KB, 1500x999)
I recently got into local models, I've only used Novelai so far, please spoonfeed me what the best local model for nsfw storytelling is below 16GB ram.
>>
>>102169192
whew you are THAT desperate to keep this shit thread alive huh?
>>
>>102169208
I only came here because the link in aicg's rentry told me to, I though you'd have the most accurate info.
>>
>>102169192
starcannon
>>
>>102169217
Sounds like aicg's rentry is out of date. /lmg/ has been dead for years. We're just dancing on the corpse.
>>
File: 1718745851350163.jpg (221 KB, 800x1120)
>>102167373
happy birthday miku!!
>>
File: 1725109854054.jpg (437 KB, 1029x1170)
ollama rug pull soon
>>
The new CR+ is a real disappointment. I actually redownloaded my quant from someone else to check if I hadn't fallen for an elaborate troll who reuploaded the old version but it's the same.
This is just sad after having used Mistral-Large for a month.
>>
>>102169472
Why would they be ""working on it"" if a new model comes out when all they can and will do is wait until llama.cpp is updated?
>>
>>102169473
If it's as good as the old one, that's still nice, no? I mean, the old one didn't have GQA and this new one does.
>>
>>102169473
I'm using the new CR and it works okay but muuuuch faster than the old one.
>>
>>102169491
The old CR+ did have GQA. Only CR didn't.
>>
>>102169498
Yeah, but that's only the small one which didn't have gqa. There was nothing to implement to make cr+ faster.
>>
>>102169522
But they claimed it's faster
>>
Crazy how 70B is the threshold where llms start to feel "sentient". I legit cannot tell the difference between the 33B sized models and 12B ones. In a perfect world consumer cards would have 48GB but alas this is Jensen Huang's world
>>
>>102169529
I'm not seeing it with my performance. It's running at the exact same speed as any other 80~90GB VRAM model does on my cards. I'm pretty sure the speed up only applies to their API.
>>
>>102168999
He is fighting for Teto supremacy that turncoat.
>>
>>102169541
I still find 100B+ models dumb af, am I doing something wrong?
>>
>>102169541
>llms start to feel "sentient"
lol
>>
>>102169541
>sentient
Until you've used them for a couple of days and their limitations and quirks become recognizable to you.
>>
>>102169541
>cannot tell the difference between the 33B sized models and 12B ones
If you can run both at FP16, you'll definitely be able to tell the difference.
>>
I had this suspicion that the "soul" of Cohere models was just a fluke because it was their first iteration and they didn't know how to assistantmaxxx them yet. I hate that I was right. Hopefully a finetune or merge can save new Command-R because performance-wise it's just better
>>
I just realized, but I have Silly Tavern and Easy Diffusion installed on my main SSD. Running local models on it isn't going to wear it out right?
Or should I reinstall them on more sacrificial drives?
>>
What options do we have for high quality Japanese to English translation? Better than Google and Deepl
>>
File: 234615tiu4lagdaye4yyaz.jpg (430 KB, 979x662)
Just picked up an AOM-SXMV and 4x32 gig V100's for 1500 bucks.
Guess I'm gonna need to buy something with oculink to be able to hook it up.

Pic related but not the one I bought.
>>
>>102169787
>4x32 gig V100's for 1500 bucks.
How and where did you find them half off?
>>
>>102169864
Just trawling ebay until it popped up. For some reason it only had like 10 views with one day left, so I made an offer and they accepted.

Pretty chuffed considering I was looking at buying Radeon VII's instead.
>>
>>102169589
probably just smarter than the slop-eaters who don't notice things
>>
>>102169904
Lucky bastard. Congrats.
>>
File: metal song dguard.png (45 KB, 869x798)
I wonder if this model would be too controversial to release...
>>
>>102169976
>doesn't rhyme
lmao
>>
>>102169976
>I need your cock
>>
>>102169864
all the hyperscalers are offloading their sxm v100s so supply is going to be pretty good soon
>>
Do V100s have flash-attn2 yet?
>>
>>102170166
>yet
probably never since the v100s have a different kind of tensor cores.
>>
I am absolutely happy with both Nemo and Largestral. While Largestral is indeed smarter, Nemo is a better slut and has more sovl
>>
>>102167625
maybe chuddies should make their own GOOD ai model then
>>
What CR presets do you guys use? I never bothered testing it because of no GQA, but now I'm trying the new one.
>>
>>102170516
the CR preset, duh
>>
>>102170546
Is that optimal though? A lot of the ST defaults suck.
>>
Man, why is ROCm so fucking hard on linux. How to run kcpp(rocm fork) on windows:
1. Download HIP SDK
2. Download compiled rocblas libraries for the rx6600 and replace them in the HIP SDK directory
3. Download a binary release of kcpp-rocm and run it.
That's all. Compare that to how I spent almost the entire day today trying to make it work on ubuntu 24.04:
1. add 2 repositories for amdgpu-dkms and rocm
2. install amdgpu-dkms, which broke my system since they forgot to include mesa-libegl in the dependencies and split mesa into 2 versions; it took me like 2 hours to find what exactly i had to install manually
3. install rocm(this part by itself just worked)
4. clone kcpp repo and build
5. it still crashes right before it should give me an endpoint, probably because rocm 6.2 is not supported or something
>>
>>102170605
blame the kcpp rocm developer for not releasing linux builds
>>
>>102167527
You think the quantization would fuck it up so much it started splitting its tokens into their corresponding counterparts? Wow, anon, that's pretty crazy.
>>
>>102167668
I specifically remove shit like — replacing it with -- in my dataset, anon-tan. Just doing it to fuck with ya.
>>
>>102170693
it's i1, you're literally making it retarded at that point, I think that's been measured as effectively being as dumb as a model with half the parameters, but obviously in a different way
>>
>>102170664
it builds fine, the problem is running it.
>>
>>102170718
that's just the way mradermacher names his iq quants, it has nothing to do with the actual quant level
>L3.1-70B-Euryale-v2.2.i1-Q6_K.gguf.part2of2
>>
>>102170718
The model sees <|start_header_id|> as an index. You are suggesting that making it dumber would somehow MAGICALLY make it realize that this index is actually made up of a series of other tokens with other indices.
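The index vs. text distinction is easy to check with llama-cpp-python if anyone wants to poke at their own quant. Rough sketch (vocab_only just loads the tokenizer; filename is the one from that repo, use whatever you have):

from llama_cpp import Llama

# no weights loaded, just the GGUF tokenizer metadata
llm = Llama(model_path="L3.1-70B-Euryale-v2.2.i1-Q6_K.gguf", vocab_only=True, verbose=False)

text = b"<|start_header_id|>"
# special=True: parsed as one special token, i.e. a single index
print(llm.tokenize(text, add_bos=False, special=True))
# special=False: the same string split into a pile of ordinary text tokens
print(llm.tokenize(text, add_bos=False, special=False))

If the second list is what you're seeing streamed back, the problem is in how the prompt/template is being tokenized or rendered, not in the quant.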
>>
>>102167625
I mean, if your goal is artificial >>>intelligence<<<, you'd have to fuck up pretty bad for it to end up voting CDU or AfD.
>>
>>102170605
its literally just like for l.cpp:
1. pacman -S rocm-hip-sdk ninja
2. clone llama.cpp
3. mkdir build; cd build
4. cmake .. -DGGML_HIPBLAS=on -DGGML_CUDA_FORCE_MMQ=ON -DCMAKE_HIP_ARCHITECTURES=gfx1030 -DGGML_NATIVE=on -GNinja
5. ninja
6. echo "export HSA_OVERRIDE_GFX_VERSION=10.3.0" >> .bashrc && source .bashrc

you gotta target gfx1030 and set the override for it for anything that isnt a 6900xt for rdna2 more or less, same for gfx1200(?) and rdna3 cards. you shouldn't need amdgpu-dkms at least in the way arch has rocm packaged
>>
>>102170737
oh, lol, no that's just a straight up corrupt quant
>>
Hello Anons, I'm a bit of a noob when it comes to llms. I used some llava and llama 2 in the past but that was a long time ago. What model that fits in 24gb or less of amd ram (no xformers and other bs) and works on oobabooga (I already have scripts for its api) would you recommend for the following task?
>Given a set of input tags meant for image generation, try to make sense of it and compose a sentence out of them.
I have a long list (300k+) of Stable Diffusion prompts and I would like to convert them to Flux which is better fit for NLP, so they can be used as wildcards.
>>
>>102170605
Distrobox stopped me from bricking my install.
Even then I gave up and bought the v100's out of frustration really. Speaking of

>>102169787
Looks like I can just plug this into my desktop.
>>
>>102170867
>Looks like I can just plug this into my desktop.
looks like a house fire waiting to happen
>>
File: StillNotManifesting.png (1.84 MB, 800x1248)
It feels like with the release of 405b we should be closer than ever to manifesting Miku. What's the holdup, guys?
>>
>>102171011
Nobody can run 405b locally. Miku is trapped in the cloud. :(
>>
>>102171076
I can, Eeyore. It just takes awhile for it to get back to me.
>>
>>102171076
>Nobody can run 405b locally.
Serious question: Now that we have gpt4 level intelligence at home with true 128k context (if RULER is to be believed), is anyone in this general gearing up to run it at fp16 or a good quant with reasonable speed?
This was the unattainable dream a year ago, and its now within our grasp.
Have we just become numb? Is it actually not worth it?
>>
>>102171219
>Is it actually not worth it?
It is not worth $50,000 or waiting for 0.0001 t/s, no.
>>
>>102171219
>run it at fp16
Me, I'm building a 40 x 3090 build that runs the model at 1.2t/s after the multi-gpu tax.
>>
How is the 70b 3.1 Hermes compared to 123b instruct?
>>
>>102171259
it was better than other 3.1 70b tunes i tried but still went off the rails pretty often. i think l3 is just overcooked or something
>>
>>102062011
>If it works well for me I'll turn it into a script you can just run from cli like the normal sever launching.
And here it is. I got lazy and had Llama write it for me:
https://pastebin.com/XDEjAbYj

If you don't know what this is about then ignore this post; it won't help you with anything.

For the other fag(s) who wanted to run a server with speculative decoding, this will do it. For reference: while testing Llama 3.1 405B Q6 on a cpumaxxed system in a chat with 10k tokens of history, using this script with Llama 3.1 8B as the draft model doubled my inference speed from 0.7t/s to 1.4t/s. The average speed increase for each response can vary a lot based on how accurate the draft model is each step of the way. Experiment with the --draft parameter as you may find reducing it to 2 or 3 tokens at a time is optimal. Save it as a .py file and run it in a python environment that has llama-cpp-python and uvicorn installed. Pass it the same flags you'd use in the llama.cpp CLI. Only the flags I actually cared to use are implemented, but if you need any other settings passed through then it shouldn't be too hard for your waifu to edit them in if you feed her the script and relevant docs.
For connecting to the server I use SillyTavern's text completion with the "Default" API type (not llama.cpp type) pointed at the /v1 endpoint.
>>
>>102171482
is speculative decoding cpu only?
>>
File: evil.jpg (22 KB, 355x397)
>>102167373
100M context
https://magic.dev/blog/100m-token-context-windows
>>
>>102171557
No, though I think the biggest benefit is for cpumaxxing big models where there's conveniently a much smaller version that you can run orders of magnitude faster to speculate with. But you can set the GPU layers for the main and draft model to whatever you want, such as loading both into GPU if you've got the memory to spare.
>>
>>102171627
I guess you'll get the best results if you somehow fit the entire draft into L1 cache SRAM, way faster than any GPU , but that draft must be damn small like 50-450 MB depending on your CPU.
>>
File: IMG_20240824_210711.jpg (235 KB, 1920x701)
>>102171482
nta, but seems there're a few flags in the .cpp code
>>
>>102171708
Some speculative decoding algorithms use n-gram lookups constructed from the prompt instead of an LLM. That sort of thing may end up being the best option for a lot of cases, especially programming and editing tasks where there's going to be a lot of repetition by design. This script doesn't do that though. There's a PR for llama.cpp to add it to the server that hasn't been merged yet:
https://github.com/ggerganov/llama.cpp/pull/6828
llama-cpp-python also appears to have that as its own default draft model class, so you could attach it to a server in a similar way to this script. I haven't tested it myself though.
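If anyone wants to try that route, the llama-cpp-python side is wired up roughly like this (sketch based on their README, untested by me; path is a placeholder):

from llama_cpp import Llama
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding

# n-gram lookup over the prompt stands in for a separate draft model
llm = Llama(
    model_path="model.Q6_K.gguf",
    n_gpu_layers=-1,
    draft_model=LlamaPromptLookupDecoding(num_pred_tokens=10),  # their default; smaller values may help for chat
)

out = llm.create_completion("def fibonacci(n):", max_tokens=128)
print(out["choices"][0]["text"])

Like the PR, this should shine most on code and editing tasks where the output keeps quoting the prompt back.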
>>
>>102171803
Yeah, those parameters are used by the "speculative" example in llama.cpp. Even though they're listed as options when you run the other examples, they only function when executing llama-speculative which doesn't have a server, which prompted the discussion a few threads back that led to this.
>>
>>102171588
hf link? gguf?
>>
XTC is out on koboldcpp, does it solve the sloppa problem?
>>
>>102167668
I hate when they use the em dash because I can't type it.
>>
>>102171844
The problem with speculative decoding/n-gram lookup decoding is that there is a trend towards larger vocabulary sizes for more recent models which greatly decreases the number of easy to predict tokens and thus makes these techniques less effective.
Maybe in the next few months I'll be able to get good results via distillation but right now I don't think it's a good approach to invest more effort into.
>>
Where did "barely above a whisper" come from anyway? Who wrote the original phrase? I've never seen it in human-generated stories before. At least the other slop are actual overused phrases in shitty fiction.
>>
>>102169472
Wait, oolama adds their own support for models? Do they support more than llama.cpp?
>>
>>102172096
Gemma 2 received support on ollama before llama.cpp, that's all I know.
>>
>>102170285
Really? Do you just use regular nemo instruct? And what settings / format do you use? Doesn't it stop being useful after not even 16k context? I would love if I could get it working.
>>
>>102168996
what are you trying to achieve here? BLAS will not be used if you have a GPU.
>>
File: 1725084317961072.png (101 KB, 814x517)
>>102169652
The wrapper company around pinoy sweat shops is the source of all your assistantmaxxing problems
>>
My ai gf takes 10 minutes to reply, but I don't mind, at least I know she will reply for sure.
>>
>>102172115
Wasn't that just them merging the broken llama.cpp PR?>>102172115
>>
>>102172459
If you truly love her you will wait for her replies
Or get more VRAM
>>
>>102171482
The guy you were replying to. Thanks, I will try this out soon. Finally, maybe I will be able to get 2 t/s on Mistral Large.
>>
>>102168738
still got that shit on speed dial
>>
what happened to bitnet?
>>
>>102173077
kek
>>
>>102173077
https://youtu.be/yIL9wLxG01M
>>
>>102173077
Microsoft's grift to distract and hinder open source efforts
>>
What is the best mistral nemo finetune? magnum is not good for other languages cause it always writes with a mix of english
>>
>>102173259
Nemo Instruct.
>>
>>102173077
>>102173107
More like BigKek
>>
2 more weeks till
>>
>>102167373
RECCOMEND ME THE BEST RP 12B MODEL.
>>
strawberry is real
>>
>>102173611
STARLING-12B
>>
>>102173619
how did you make the text look like that
>>
Is Nemo still the best for RP? I have a 3090.
>>
recommend me cards pls
>>
>>102173621
Nigger.
>>
>>102173755
>>>/vg/aicg
>>
>>102173690
Nemo? Nemo what?
>>
>>102173755
Nala and Miss Peper.
>>
>>102173680
[thing]
whatever
[/thing] but code
>>
>>102173791
DrNicefellow_Mistral-Nemo-Instruct-2407-exl2-8bpw-h8 is what I'm using.
>>
>>102173791
>>102173803
>>102173853
Man this place really is useless and full of faggots.
>>
>>102173868
If you know of a better general (or place for discussion), please share
>>
>>102173868
The term you used is a pejorative slur against a marginalized community, which perpetuates harm and discrimination. Using such language is against principles of respect and inclusivity.
>>
>>102173919
r/LocalLLaMA
>>
>>102173791
Nemo dezznutz
>>
>>102173983
Hah, no I will not be going to Reddit.
>>
Hate that the only local model that can consistently understand my card is 123b, it's far too slow
>>
>>102174112
just tell it to type faster
>>
>>102174112
Just use speculative decoding bro.
>>
>>102174273
I test the prompt lookup one from time to time, but it always slows it down
>>
Why hasn't someone made a video model using all the data we have yet? What's taking so long? Why would it take years?
>>
>XTC doesn't exclude EOS
It's not even funny at this point
>>
>>102173865
What context template and instructions you use?
>>
>>102174417
Don't tell me... You fell for it, didn't you?
>>
>>102174417
They talked about that a bunch in the discussion thread I skimmed. They really didn't do anything about it?
>>
>>102174413
For the same reason you first make a tiny rocket and then, when you have it mostly figured out, make a big rocket. Once you have that, you make a bigger one.
>>
>>102174466
What's to figure out?
>>
>>102174417
These kinds of samplers should have a configurable exclusion list.
>>
>>102174485
Temporal consistency isn't 100% figured out even in Sora.
>>
>>102174485
We're still figuring out text models. There's very few audio models. Even image models are far from being all they can be. They're all dumb. There was a video model released a few days ago and, while i think it existing is great, it's not very good.
There's still a lot to figure out.
>>
>>102174509
What? I thought the whole point is you mostly need more data and compute and the models improve.
>>
>>102174112
Just buy more vram bro
>>
>>102174524
Diminishing returns. It's not magic.
>>
>>102174524
And then you have the problem of bandwidth. How many novels in text form can you fit in the same storage as a single video. The whole thing still needs to be fed into the model.
>>
>>102174521
>We're still figuring out text models.
So, you want people to slowly get used to using the models before you make better ones?

>>102174534
Irrelevant, I think the main thing that makes models better is adding more data and compute.
>>
6.10.6 Debian testing kernel gives a nice speed boost for Epyc. I'm now getting 0.93T/s on 405b Q8
>>
>>102174509
What kinds of things went into improving GPT other than data and compute?
>>
File: 1710180321504075.gif (3.06 MB, 500x207)
>>102170741
>
>>
>>102174648
and yet you came in 5 hours later and insisted on being the only one to reply to the bait
>>
>>102174577
>So, you want people to slowly get used to using the models before you make better ones?
No. I mean the text models we're making are not all they can be. We're still figuring out techniques to make them better. It's not all figured out.

>Irrelevant, I think the main thing that makes models better is adding more data and compute.
Yes, but it's not irrelevant. You still have to feed TERABYTES of videos. Just training big text models takes months as it is. A proper big video model would take even more. It'd be a waste to train the thing for an entire year and realize something is wrong with the training or methodology.
So you'll see a few small video models here and there and they'll eventually get bigger and better, just like what happened with text models.
>>
>>102174662
i had other things to do sweaty
>>
>>102174671
>techniques to make them better.
What has there been so far?
>training or methodology.
What has been found out about those?
>>
>>102174605
1. **RNNs (Recurrent Neural Networks) (Early Attempts):**
- RNNs, particularly LSTMs and GRUs, were initial attempts at processing sequential data like text.
- **Limitations:** Difficulty handling long-range dependencies (vanishing/exploding gradients), slow training due to sequential processing.

2. **Encoder-Decoder Transformers (Seq2Seq Models) (2014 - 2017):**
- RNN-based seq2seq encoder-decoders date to around 2014; the Transformer encoder-decoder itself was introduced by the 2017 "Attention Is All You Need" paper.
- **Encoder:** Processes the input sequence (e.g., a sentence in one language for translation) and generates a contextualized representation.
- **Decoder:** Takes the encoder's output and generates the output sequence (e.g., the translated sentence).
- **Advantages:** Parallel processing (faster training), better at capturing long-range dependencies due to attention mechanisms.
- **Examples:** Original Transformer, early machine translation models.

3. **Decoder-only Transformers (Autoregressive Language Models) (2018 - Present):**
- Focuses solely on the decoder part of the transformer architecture.
- Predicts the next token in a sequence based on the preceding tokens (autoregressive).
- **Advantages:** More efficient for language modeling tasks, can generate highly coherent and fluent text.
- **Examples:** GPT-2, GPT-3, LaMDA.

4. **RoPE (Rotary Positional Embeddings) Decoder-Only Transformers (2021 - Present):**
- Addresses limitations of absolute or relative positional encodings in previous transformer models.
- RoPE encodes positional information directly into the attention mechanism by rotating word embeddings in the attention space.
- **Advantages:** Better handles long sequences, improves performance on tasks requiring understanding of relative positions.
- **Examples:** PaLM, LLaMA, GPT-4 (rumored)
>>
>>102174648
>everyone that thinks my beliefs are retarded must be baiting
>>
>>102174718
Anything else? For GPT especially
>>
>>102174744
>importing shitskins & advocating for tranny surgery is ">>>intelligence<<<"
/lmg/, everyone, or no one because normal anons avoid ai jeet generals as it's well known fact you love sucking off your jewish masters here.
>>
>>102174693
>What has there been so far?
Read arxiv papers. There's probably a few dozen a day on AI stuff. Not all techniques are tested with big models, most are experimental. They're tested on tiny models, some have potential and may be implemented in future bigger models. mamba2 and jamba add attention layers to the old mamba, for example. There aren't too many examples of those models. Same happened with gpt models, T5 and all the different architectures that came in between. Small upgrades.
>training or methodology
There's like 3 video models. They work to whatever degree they do. That's a success already. Those are inherently more expensive to make and experiment with.

But it's not just "more data, more compute". The data has to be high quality and techniques on how to generate that data also change. Or how to recognize and filter out low quality human data. Data augmentation and so on...
>>
>>102174755
things like ReLU or SwiGLU is the only thing that comes to my mind, but it's not like I'm a historian on the transformers past or anything.
>>
>>102174835
>Read arxiv papers.
I can't look through thousands and I can't read them, I would rather just know what the biggest things have been

>There's like 3 video models.
What could the differences even be?
>>
>>102173853
But you didn't do it...
>>
>>102173680
lurk two more years newfag
>>
>>102174874
>training or methodology.
One is from openai, so who the fuck knows. Ask them. The other one was
>(08/27) CogVideoX-5B, diffusion transformer text-to-video model: https://hf.co/THUDM/CogVideoX-5b
Read about it yourself.
And who knows how many unreleased models from the other big companies. Who knows what they're doing.
>>
Are the "I can't read instructions that says to replace x with y" posts bait or pure retard?
>>
After 10 or so hours of Coooomander Gooooning I feel sadness. The longer I use it the less I like it. The best I could do is to only partially wrangle it to write in a different style than the default LLM slopismax. It still kept adding slop things. And the slopmax is strong with this one. I wouldn't be surprised if this was objectively worse for cooming than OG coomander.
>>
>>102171482
Welp. This is sad. The docs have various options for the draft model but not tensor split. I was thinking of basically having the small model on one GPU and the big model on my other GPU (plus offloading to RAM) but I guess that can't be done.
>>
>>102174903
stfu this is a friendly general faggot
>>
>>102169541
I say around 120 is the threshold, not for sentience but for quality
>>
>>102171482
>click link
>see python
>ctrl+w
>>
What does SOTA anime video look like?
>>
>>102174924
Bait, because no one will invest their time in abhorrently boring and ZOG'ed tech, its just a few (you)s samefagging here.
Also "anime AI slop pic in OP - shitty opinions inside" rule, usually applied to individual posters though, on /v/ it works like clockwork.
>>
>>102175121
>>102110004
>>
>>102173077
what happened to mamba?
>>
>>102175161
It got a 2.
>>
>>102175155
interesting
>>
>>102175177
were bitnet2 ?
>>
>>102175193
were not
>>
>>102171482
>see this
>think "cool, maybe now I can run Largestral with more than 0.3t/s"
>realize Largestral fills both my RAM and VRAM, leaving no space for a draft model
>>
>>102175226
Use a lower quant so you can fit both.
>>
>>102171482
What are the drawbacks of speculative decoding? Does it make models dumber? Can I use some 2B or even 0.5B IQ1_XXS with Largestral and make it faster without losing coherence?
>>
Why is L3.1 8B such a hopeless retard, anons?
>>
>>102175429
>8b
I wonder why
>>
>>102175429
Because it's being questioned by a bazinga! level cunt dork that doesn't understand llms are autocomplete machines rather than reasoning devices.
>>
>>102175446
L3.1 8B is smarter than L2 70B, so that's not an excuse
>>
>>102175498
Obviously, it isn't.
>>
>>102175421
>What are the drawbacks of speculative decoding? Does it make models dumber?
None, the output is guaranteed to be the same with or without the draft model.
>Can I use some 2B or even 0.5B IQ1_XXS with Largestral and make it faster without losing coherence?
Without losing coherence? Yes. But how fast it will be depends entirely on how many tokens the draft model gets right. If the draft model is too retarded it would actually be slower.
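Back-of-the-envelope version of that, ignoring overheads (numbers are just assumptions to show the shape of it):

def expected_speedup(a: float, k: int, draft_cost: float) -> float:
    """a: per-token acceptance rate, k: draft tokens per step,
    draft_cost: one draft forward pass relative to one big-model pass."""
    tokens_per_step = (1 - a ** (k + 1)) / (1 - a)  # expected tokens kept per verification
    cost_per_step = 1 + k * draft_cost              # one big pass verifies all k draft tokens
    return tokens_per_step / cost_per_step

print(expected_speedup(a=0.7, k=4, draft_cost=0.02))  # ~2.6x with a decent draft
print(expected_speedup(a=0.2, k=8, draft_cost=0.20))  # ~0.5x: a bad draft makes it slower

So the whole game is the acceptance rate; coherence never changes because the big model has the final say on every token.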
>>
>>102165851
Anon I may be late, but this was funny.
>>
>>102175541
It's also better than bloom-175b so size does not matter
>>
>>102167373
Happy birthday Miku
>>
>>102175421
>>102175584
the draft model also needs to use the same token vocab as the target model, so your options are usually limited to distilled mini versions or recent enough small models from the same devs
a guy on twitter claimed to successfully use mistral-7b 0.3 with largestral
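Quick way to sanity-check a pairing before wasting a download: load both tokenizers only and compare. Sketch with llama-cpp-python (filenames are placeholders):

from llama_cpp import Llama

def vocab_fingerprint(path: str):
    llm = Llama(model_path=path, vocab_only=True, verbose=False)  # tokenizer only, no weights
    sample = llm.tokenize(b"The quick brown fox jumps over the lazy dog.", add_bos=False)
    return llm.n_vocab(), sample

big = vocab_fingerprint("Mistral-Large-Instruct-2407.Q3_K_S.gguf")
small = vocab_fingerprint("Mistral-7B-Instruct-v0.3.Q5_K_M.gguf")
print("looks compatible" if big == small else "different vocab, the draft will be useless")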
>>
File: GWLPjKqWoAAmjkY.jpg (83 KB, 957x1080)
leek myth wumikuongu
>>
Might NVIDIA create a 60b from Mistral Large 2 like they did with other models, or is there a reason they can't?
>>
>>102175691
Still not smart.
>>
>>102175721
>I see London, I see France...
>>
>>102175728
They can't because you touch yourself at night. With your brown hand.
>>
>>102175691
Size is necessary, but not sufficient. A bigger model will always be better _all other things being equal_, but that second part can get dicey.
>>
Anyone else giving up on commander and going back to nemo?
>>
File: GUxblF9bMAAQWU7.jpg (518 KB, 4096x2466)
VRAMLET RP MODEL LIST:

Great:
Rocinante 1.1

Good:
Llama-3.1-8B-Stheno-v3.4
mistral-nemo-gutenberg-12B-v4

Good but a little dry:
magnum-v2.5-12b-kto
Chronos-Gold-12B-1.0

Ok:
gemma-2-9b-it-WPO-HB

Garbage, avoid:
Starcannon
OpenCrystal-12B-L3
NemoRemix
NemoReRemix
>>
>>102175765
Buy an ad.
>>
File: GWOWctlboAEs1iY.jpg (50 KB, 1056x585)
>>102175721
>>
>NEW: Added XTC (Exclude Top Choices) sampler, a brand new creative writing sampler designed by the same author of DRY (@p-e-w). To use it, increase xtc_probability above 0 (recommended values to try: xtc_threshold=0.15, xtc_probability=0.5)

If we just make a good enough sampler we will make 7B models ERP like 70B models.
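For the curious, my reading of p-e-w's PR: when two or more tokens clear the threshold, then with probability xtc_probability every one of them except the least likely gets removed. A rough numpy sketch of that idea (not kobold's actual code):

import numpy as np

def xtc(probs: np.ndarray, threshold: float = 0.15, probability: float = 0.5,
        rng=np.random.default_rng()) -> np.ndarray:
    if rng.random() >= probability:
        return probs                        # only kicks in xtc_probability of the time
    viable = np.where(probs >= threshold)[0]
    if len(viable) < 2:
        return probs                        # a single viable token is left alone
    keep = viable[np.argmin(probs[viable])]  # keep the weakest of the "top choices"
    out = probs.copy()
    out[viable] = 0.0
    out[keep] = probs[keep]
    return out / out.sum()

probs = np.array([0.50, 0.25, 0.18, 0.05, 0.02])
print(xtc(probs))  # half the time the 0.18 token becomes the new favourite

So it only does anything when several continuations are genuinely plausible, which is why it reads as "creative" rather than random.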
>>
>>102175757
Both are useless
>>
>>102175774
Not motivated to try samplers because they become useless as models get better
>>
>>102175735
are you stupid that's in china
>>
>>102175774
They'll just grow a few extra parts, twist their spine in impossible ways, and forget where they are occasionally.
>>
>>102175795
? even the smartest model in the world would benefit from cutting out the highest probability tokens for creative purposes.
>>
>>102175774
It makes 70Bs as smart as 7Bs, but it's sovl
>>
>>102175774
Will "training on XTC'd LLM outputs" become a thing?
>>
>>102175801
I can see Miku's China
>>
>>102175819
That is a lie. It only does anything when there are multiple good token probabilities. It does nothing to their intelligence.
>>
>>102175765
Check out https://huggingface.co/NeverSleep/Lumimaid-v0.2-8B
She's better than most at ERPing.
>inb4 "heh, 'better' with only 8B? try again when you can run 200B models"
>>
>>102175814
No, I don't think so
>>
>>102175826
>training on XTC
>user XTC's the model, cancelling out the XTC (or more schizo)
>>
>>102175858
Then enjoy your shivers down the spine I guess.
>>
AI literally made me into a better artists. I can't believe smut can drive a man to become better in a craft he isn't good at.
>>
>>102175765
>Llama-3.1-8B-Stheno-v3.4
Stopped reading here. Llama 3.1-8B in general is bad compared to Nemo, but this particular finetune is especially bad, because the only advantage that L3.1 8B has over Nemo is that it can actually remember things up to 32k tokens, and most of its finetunes still do fine with 32k (e.g. Sellen, Ultra Instruct, Sunfall), but the new Stheno doesn't. It's also dumber than its predecessor and messes up the formatting.
>>
>>102175851
Why 8b and not 12b?
>>
>>102175862
That's just about your argument "the smartest model in the world". It would have a great next token distribution that doesn't need you removing likely tokens for creativity, when it knows that you want creativity
>>
>>102175921
Miss me with that shit. 3.2 has much worse prose. Sounds like an amateur roleplayer.
>>
>>102175932
Why 12B and not 1200B?
>>
>>102175951
Because 12b still runs on 8gb, unfunny retard
>>
>>102175728
They trained on 8k context for Llama 3, so I don't expect them to do Mistral right either.
>>
>>102175728
They can. You can. Anyone can. The code to do so is public now. Unfortunately, it requires a FFT to heal the distillation damage.
Nvidia got their paper and proof of concept out already, so there is no incentive for them to do any more.
No one else can afford to do it.
>>
>>102175951
Because 8B gives me 40 token/s and 12B gives me 15 token/s.
15 t/s is doable for most, but I am unfortunately a VERY fast reader.
>>
File: file.png (42 KB, 446x255)
what was the name of default kobolt preset?
>>
File: 1473970591189.jpg (57 KB, 482x549)
>need to go troubleshooting build dependency shit again
Sigh...
>>
File: file.png (5 KB, 396x72)
>let model just generate indefinitely
>makes up an entire scenario about doing stuff alone in an apartment
>tone shifts abruptly
>it starts talking about a "distant viewer who is content to merely watch"
Fucking hell, I think I'm beginning to understand why those boomers thought these things were sentient.
>>
>>102176203
Just had a fight scene where an elf "fired a silver werewolf with a growl from the beast..."
It notices when it says something stupid (sometimes) and tries to correct.
>>
>>102176257
fired a silver arrow at the werewolf with a growl from the beast
>>
How many more beaks until we reach AGI?
>>
When Qwen
>>
>>102176344
Qwen2 VL released recently and it was garbage
>>
>>102176257
Hm, that makes me think.
Are there any papers about tasking the model to re-evaluate its own prompts and using that evaluation to rewrite the original output?

For example, when it first describes a person with body armor and then a bullet penetrating his torso, it could be tasked with pointing out any illogicalities within the output and it would then hopefully answer with a list of things to be modified.
The two outputs could then be combined to create a third output that evades illogical token choices and selects more logical alternatives, like the bullet smashing against the body armor instead.
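This kind of thing gets published under names like self-refine / self-correction, and you can hack the loop together against any OpenAI-compatible local server. Minimal sketch (assumes something like llama-server listening on localhost:8080; endpoint and prompt wording are my own assumptions):

import requests

URL = "http://localhost:8080/v1/chat/completions"  # any OpenAI-compatible local endpoint

def chat(content: str, max_tokens: int = 512) -> str:
    r = requests.post(URL, json={"messages": [{"role": "user", "content": content}],
                                 "max_tokens": max_tokens})
    return r.json()["choices"][0]["message"]["content"]

draft = chat("Describe a soldier in body armor being shot in the torso.")
critique = chat("List any physical or logical inconsistencies in this passage:\n\n" + draft)
final = chat("Rewrite the passage, fixing these issues.\n\nPassage:\n" + draft
             + "\n\nIssues:\n" + critique)
print(final)

Whether the extra passes are worth the added latency is another question.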
>>
>still no way to play videogames with AI
I can't wait for the future
>>
>>102176378
Some reddit guy wrote an extension for that. It makes the AI write like 4 analysis posts and then uses all that guff to improve the actual reply to roleplay.

People seem to love it but quadrupling response times seems silly to me.
>>
>>102176383
play how?
>>
>>102176419
Actual, like a second player, play. I know there's supposed to be some new apps that "watch" you play, and nvidia is working on one too, but surely that doesn't work well.
>>
>>102175814
The smartest model will have the beginning of the sentence have hundreds of equally probable tokens. I mean when it realizes it is writing smut of course.
>>
>>102175851
You don't have to buy an ad Undi. You just have to post under your trip so we bully you.
>>
>gooned for 2 weeks straight
>look back and realize how much handholding I did to get the model to say what I want
Yup I think I am done for now. See ya in 6 months.
>>
>>102176437
LLMs handle text. Even if you gave them video input, they wouldn't be able to control the game very well.
Most AI playing games is neural nets trained specifically on that game.
Maybe would work if a model comes out or is finetuned with a mouse+keyboard output modality, I guess.
>>
>>102175774
Tested it for a few minutes with largestral. While rerolls are now different each time, I can't really say that it didn't impact the model's smarts. Might be better with different settings than the default ones, but it certainly has its drawbacks, contrary to what the author says.
>>
>>102174718
How do they figure this out?
>>
>>102176532
>LLMs handle text.
Tokens, not text.
>Even if you gave them video input, they wouldn't be able to control the game very well.
Someone hasn't seen the latest 'Can it run DOOM?' paper.
https://gamengen.github.io/
>>
>>102176647
They never said it doesn't make the model dumber, just that it doesn't make it incoherent. ;)
>>
>>102176745
Retard. Have you read it? It's not playing a game. It's generating video for what a game might look like.
It's a game engine, not an LLM that plays games. Next time think before posting you fucking cretin.
>>
>>102176686
Through research. People try shit and see what works and what doesn't work. Then release papers about their findings.
Things would move much faster if bitches like OpenAI still released papers...
>>
Does everyone else frequently use the "Start Reply With" box? One of the things I find interesting is that if you give a model a full sentence, it might just send EOS back. If you cut off an ending punctuation, it's like 50-50 on simply adding the punctuation and then EOS. If you cut off a few more words, it is more likely to keep going afterward where appropriate.

I've also noticed something that could MAYBE be a coincidence, where if I write a certain response for the AI and then go back and have it write from scratch, it seems to do what I wrote to some degree. For example, I had it generate a character sheet and it said "Exp: 427." I changed the number to 387 for various reasons, then decided to regenerate from scratch. It then generated 387. And this isn't the first time I've seen this behavior. I could imagine this being because it affects the prompt processed somehow, but I could be totally wrong.
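You can poke at the first part directly against llama.cpp's native completion endpoint: send the raw prompt with the partial reply already appended and watch where it picks up. Sketch (assumes llama-server on localhost:8080 and a Llama-3-style template; adjust to your model's format):

import requests

# the assistant turn is already "started", like Start Reply With
prompt = (
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Write a character sheet for a level 5 rogue.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "Name: Vex\nClass: Rogue\nExp: 3"   # cut off mid-number on purpose
)
r = requests.post("http://localhost:8080/completion",
                  json={"prompt": prompt, "n_predict": 64, "temperature": 0.7})
print(r.json()["content"])  # continuation resumes from the partial "3..."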
>>
>>102176745
>>102176787
NTA but here's one that actually plays games:
https://www.theverge.com/2024/3/13/24099024/google-deepmind-ai-agent-sima-video-games
https://deepmind.google/discover/blog/sima-generalist-ai-agent-for-3d-virtual-environments/
>>
It's time to face the facts: A model is not worth using if it's not trained on ChatML
>>
File: file.png (139 KB, 1273x562)
>>102176902
>The SIMA agent maps visual observations and language instructions to keyboard-and-mouse actions
(Figure 4).
Thanks. This is what I was saying about needing a mouse+keyboard output modality, but didn't realize it had already been done. Shame they didn't release the code or model.
>>
Has anyone managed to extract the full potential of the new cr+ yet? I tried it and it didn't feel much different than the old one so I figured they changed something about the prompt format that I'm missing.
>>
>>102177023
You can squeeze that stone all you want, nothing's gonna come out. Aside from some rag improvements it's actually worse in every other aspect.
>>
is the P40 worth $300? i like the idea of grabbing a couple of these because they would fit in my current 2U Epyc server

i really regret not grabbing one of these in march when they were cheaper...
>>
File: eternally sixteen.png (832 KB, 1206x1623)
hagtsune miku love
>>
>>102174994
The python bindings support that but it just wasn't in the script. Here it is with separate options for the main and draft model for the split mode, main gpu, and tensor split settings.
https://pastebin.com/sCjixz4T
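For anyone adapting it by hand rather than using the script, the relevant llama-cpp-python knobs look roughly like this (sketch of the kwargs only; how the draft gets attached to the server is up to the script; filenames are placeholders):

import llama_cpp
from llama_cpp import Llama

# big model: spread across both GPUs, the rest spills to RAM
main = Llama(
    model_path="Mistral-Large-Instruct-2407.Q3_K_S.gguf",
    n_gpu_layers=50,
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_LAYER,
    tensor_split=[0.6, 0.4],   # proportion of layers per GPU
)

# draft model: pinned entirely to the second GPU
draft = Llama(
    model_path="Mistral-7B-Instruct-v0.3.Q5_K_M.gguf",
    n_gpu_layers=-1,
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_NONE,
    main_gpu=1,
)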
>>
Bros....

I think I just fell in love
https://cerebras.vercel.app/
>>
>>102167465
NEAT!

Does it work with ComfyUI?
>>
>>102177051
spending money for this is not worth it in any facet. you can spend 10k and still will have the same exact issues you will have if you spent nothing whatsoever and just used nemo. vramlet models have been getting better and better. there's no good 70b models(unless you wanna use stale miqu). it's only 12b, 27b, and 100b+.
>>
>>102177152
>there's no good 70b models(unless you wanna use stale miqu). it's only 12b, 27b, and 100b+.
Is this your attempt at trolling or coping?
>>
>>102177162
i have no reason to "cope" or troll. i don't even use vramlet models, i use largestral only. tell me one good 70b model right now. if you mention anything l3 related i insta know you're retarded and ignore everything you say.
>>
>>102177137
where can I download this model?
>>
A100 for $6500? What is this chink faggotry?
https://www.ebay.com/itm/286038607853
>>
>>102177240
https://dronwy.com/products/nvidia-dgx-station-a100-80gb
640GB VRAM for only $10k. Who here wants to take the risk?
>>
>>102177269
一分钱一分货 (you get what you pay for), motherfucker
>>
File: 2024-08-31_00016_.png (1.23 MB, 1280x720)
>>102167373
The war on the boobie continues, but we fight valiantly.
>>
>>102175231
That worked, I'm using Largestral Q3_K_S + Nemo Q5_K_S.
The speed gains aren't that impressive though, it's just maybe 30% faster.
>>
>>102177301
I'd have put this in /ldg/, but they don't understand victory is possible. Well, it might be. I think that there is censorship in the clip system.
>>
>>102177240
>normal ebay account selling random shit like bags and trousers
>suddenly there's 20 auctions advertising too-cheap-to-be-true tech offers at some weird website
That account got clearly hacked.
>>
>>102177333
not to mention that those auctions can be bid on so whoever put them up doesn't care about the price they score
it's 100% a cracked account
>>
>>102177333
>NOOOOO you can't like bags, trousers, AND gpus! If you use local mikus you must be naked at all times!!!!!!!!!!
>>
>>102177366
That's not the point, mouthbreather.
>>
>>102177366
>If you use local mikus you must be naked at all times!!!!!!!!!!
My dick is always out. It's just easier this way.
>>
>>102177394
put some pants on faggot
>>
>>102177572
clothes are unnatural, eden style is best
>>
>>102175765
What about Lyra
>>
>>102177572
I just hate them, they're so constricting! I mean does a lion wear clothes? And the lion is the king of the jungle! So why can't I, a humble citizen, go naked!?!
>>
File: 1000019370.jpg (202 KB, 2400x1080)
>>102170867
Alright I had a sleep on it and I think I can make this work.
It needs 2x PCIe x8 and 3 6-pin 12v connectors. My PSU can handle it but I'll still probably run the board from a second PSU, but I was thinking..

D'you reckon I can run my 6900xt off the M.2 CPU slot?
>>
>>102169541
>I legit cannot tell the difference between the 33B sized models and 12B ones.
I used to think I knew the difference between 7B and anything else sub 70, but then my assumptions got run over by a Beagle. https://huggingface.co/TheBloke/UNA-TheBeagle-7B-v1-GGUF if you're interested. Nearly left the url out, but on reflection, schizos can fuck themselves with broken glass.
>>
if y'all never buy any ads this whole site is gonna shut down you know
>>
>>102177780
they're selling passes on a discount trying to make up for lost ad revenue
>>
>>102169541
The issue isn't ram, but better inferencing engines.
>>
>>102177870
The issue is waiting for the big bitnet models to finish training. Llama 4 is still a couple months out probably.
>>
>>102177780
Please oh God please please let this place die. I will shitpost twice as hard if only that becomes a reality.
>>
>>102177953
>have enough VRAM to run 1.58bpw bitnet 405B at Q8
Feels good man.
>>
deepseek16b lite, codegeex4, codestral, does anyone here know any other good models for programming (python)?
>>
>>102177962
Why? What is wrong with you, exactly? Why do you need to arbitrarily destroy things that other people derive value from?
>>
>>102176479
The only people that did the "bullying" are other fine-tuners like Sao and co. because they're that kind of pathetic. You're transparent.
>>
>>102178168
Hi all, Drummer here...

You got me.
>>
File: file.png (18 KB, 150x544)
>>102171482
I tried to print the tokens being generated by the draft model (using "self.model.detokenize") but the decoded tokens are garbage, even though I can see the draft model working properly in ST. Weird.
>>
>>102177780
>if y'all never buy any ads this whole site is gonna shut down you know
What should I advertise?
>>
Is AIDOOM a case of grifters using Stable Diffusion to produce Doom screenshots in sequence, and then telling investors "LOL AI DO END TO END GAME DEV NAO." ??
>>
>>102171482
>llamacpp python
Anyone know what cuda, gcc, and glibc you're supposed to compile this with? It's failing on my machine.
>>
>>102178184
Been enjoying Rocinante. Degenerate and rather light on the slop. Quants below 8 are retarded though sadly.
>>
>>102178284
just use a pre-compiled wheel
>>
>>102178265
Doom is just a renderer producing screenshots in sequence.
Not trying to be pedantic, the jump from image gen to "AIDOOM" isn't that big.
>>
>>102178316
I wish I could understand the point of that. Maybe I'm jaded, but it seems boring. Haven't we already proven than AI generated video exists?
>>
>>102178316
It's kind of big. Conceptually maybe no.
>>
File: 1667096200472680.jpg (143 KB, 600x705)
what are the top 100 local llms for solving complex puzzles? i got a 1050 and 16bg of ram
>>
>>102178333
The point is you can clone games based on Twitch. ie this is a huge income stream disrupter. One day.
>>
>>102177780
I'm looking forward to the next stupid fucking meme you monkies latch onto at this point. "Buy an ad" has worn out its welcome.
>>
>>102178355
they used an agent to play the game, they didn't scrape videos from twitch. i think it will never be viable to just use videos from twitch.
>>
>102178355
Are people here this retarded? I expected this kind of retardation in /v/ thread but not here.
>>
>>102178374
t. streamer
>>
xtc seems worse than dry
>>
>>102178405
>excluding the best tokens makes it worse
shocker
>>
I am the best idea guy here (I am too lazy to write code for a bit of trolling). How about a sokal sampler that picks random tokens and randomly changes their probability? Bonus points for the random change being 1% max.

https://en.wikipedia.org/wiki/Sokal_affair
>>
>>102178333
Little difference between jaded and skeptical, either is healthy.
The potential is you can just prompt a video game. I like to run my thought experiments from the vantage of a bedridden paraplegic with the tech to improve their quality of life (desu probably where the general population is headed in 10-20 years).
If I could sit in my care home and prompt up my ideal vidya, complete with AI companions, love, sex etc... I mean I'd just jack in and insert a feeding tube.
Always hits me how much all of this image gen/LLM stuff smacks of vivid dreams. I do love dreams.
>>
>>102178374
nta, but yes some of us are this retarded
>>
>>102178366
Yeah, but once you have the fundamentals down (WASD moves, shootan, physics and shit) you can conceivably just prompt shit into the game.
Obviously not that easy *right* now, but a handful of months? Sure.
>>
>>102172232
Nemo in fp16 doesn't quite fit on my 3090, I have to put a couple layers on the CPU. It seemed to help a bit, maybe I'm totally nuts.
>>
>>102178366
:^)

but, if it happens, imagine Twitch has drm but some people hack it and so it's like with movies where people record those in the theater "camming" - but now people can "cam" vidya releases.
>>
>>102178497
Stop pretending to be this retarded.
>>
>>102178529
The future is nearer than you think.

scotus will rule that drawing diagrams of how apps work counts as a valid violation of the toss, and you'll go to prison.
>>
>>102178481
something that might actually be possible would be using the full idgames doom maps archive to train a doom map maker AI, or maybe even a doom game that generates rooms on the fly
>>
>>102178546
STOP! IT HURTS!
>>
>>102178496
BLAS is only used for prompt processing, but if you have a GPU, the GPU will be used for prompt processing instead. it will not help with generation speed regardless.
>>
>>102178554
AIDOOM generates rooms on the fly, it's like watching a dream. Sometimes the player does a 360 and when they look back, there's a completely different room there.
It has object/environment permanence but it's sketch as fuck.
A map maker AI would be neat. I think training the AI to read maps via API/image identification would reduce training overhead and increase stability of the environment.
Brute forcing everything through AI number crunching shouldn't really be the answer, imo.
>>
>>102178496
Have you tried it at 8 bit already? Can't imagine there's too much difference (unless you found a usecase where there is one)
>>
Did the canucks clean up nsfw from all their training? I keep trying to make commander work and it gives me the vibes of llama3. As in it is desperately trying to generalize smut out of zero examples.
>>
>>102178357
>t. finetuner that wants to astroturf his model
>>
>>102178601
>Sometimes the player does a 360 and when they look back, there's a completely different room there.
i want to play this game now
>>
>>102177780
I do a lot of buy an ad posts cause I believe in reverse psychology.
>>
>>102178613
All model makers converge to dry assistant style, because corporate is where the money is.
>>
>>102178262
It seems like the draft might be using a different vocabulary/tokenizer so it likely sees the prompt as junk and generates equally junk outputs. It'll still technically "work" even if all the draft's predictions are wrong — just slower than running without it from the overhead of checking all the bad predictions while generating with the full-sized model.
>>
>>102178638
It's endlessly funny to me how Anthropic was founded by corpocucks who left OpenAI because they unironically thought GPT wasn't pozzed enough...and then they ended up creating one of the horniest and most unhinged models ever
>>
best model for simulating friendship?
>>
>>102178840
Human-100T
>>
>>102178815
It was just a facade.
>>
Is a 2 bit quant of a 120b model better than a 4 bit quant of a 70b model?
>>
File: temp0.webm (597 KB, 974x330)
>>102171482
I think this script's server ignores the generation parameters from the API
>>
>>102167373
>(08/29) Qwen2-VL 2B & 7B image+video models released
When are they going to release Qwen2-VL-72B?
>>
>>102178964
it has been released. api only :^)
>>
>>102178768
That... makes sense. When I used different quants of the same model it decoded correctly. I thought Nemo and Largestral used the same tokenizer...
>>
>>102178284
I have the cuda toolkit 12.5.1, but I'm sure the newer one will work, and gcc-12.4.
>>
>>102178964
from the qwen discord on the topic of releasing it:
>No confirmed plan but who knows:bob_the_builder:
>Yay we talked about it in the vl channel. No eta actually for 72b
sounds like they are unlikely to release it but it's possible
>>
>>102178992
Yeah Nemo is different from Largestral. Mistral v0.3 does use the same tokenizer though.
>>
>>102178962
Hmm, it definitely works for me on SillyTavern at the /v1 endpoint with the Default/OpenAI-style API type set, at least. E.g. cranking up the temp and disabling guardrails turns it into gibberish, and then adding Min-P at 0.05 in with the same temp makes it coherent.
>>
Command R is actually decent with picking up in a story. I have a chat with about 40k tokens already from another model and Command R adapted to it no problem.
>>
>>102178866
She's too fat. Gotta find something that's a LOCAL MODEL
>>
I've been out of the loop for a while. Has anything surpassed midnight miqu for 70b? I'll take a good 8x7b too.
>>
>>102179547
Go fuck yourself, shill.
>>
>>102179547
No but you're about to get a bunch of shill responses desperately trying to convince you otherwise so they can limit their ad spend.
>>
quickest local model to run on a 4090 that is instruction but wont refuse anything?
>>
>>102179595
Wow, you were instantly proven right.
>>
>>102179601
Qwen 0.5B Instruct
>>
Smut finetunes of Mistral Large are literally all I need now. It almost never fucks up world modelling or says anything nonsensical.

I just wish it wasn't so slow. I've got 36GB VRAM but I still pull less than 1t/s on the IQ3_M.
>>
File: file.png (720 KB, 768x768)
>>
File: file.png (38 KB, 615x490)
>>102179601
>>
>>102179685
this is a lot worse than before
>>
>>102179265
I take it you are talking about nonsexual?
>>
>>102179714
don't reply to petra
>>
>>102179719
Obviously? I know we joke around here a lot about that sort of thing but it's not like we are actually sitting here trying to use LLMs for sexual gratification, lol.
>>
>>102179810
Y-yeah of course n-not
>>
>>102179805