/g/ - Technology






File: GaS2PwOXYAA0hjN.jpg (79 KB, 672x756)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103057367 & >>103045507

►News
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b
>(10/31) Transluce open-sources AI investigation toolkit: https://github.com/TransluceAI/observatory
>(10/30) TokenFormer models with fully attention-based architecture: https://hf.co/Haiyang-W/TokenFormer-1-5B
>(10/30) MaskGCT: Zero-Shot TTS with Masked Generative Codec Transformer: https://hf.co/amphion/MaskGCT

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 023a3def6f9.jpg (465 KB, 1024x1024)
--- A Measure of the Current Meta ---
> a suggestion of what to try from (You)

96GB VRAM
Qwen/Qwen2.5-72B-Instruct-Q8_0.gguf (aka the best of the best)
anthracite-org/magnum-v4-72b-gguf-Q8_0.gguf

64GB VRAM
Qwen/Qwen2.5-72B-Instruct-Q5_K_M.gguf
anthracite-org/magnum-v4-72b-gguf-Q5_K_M.gguf

48GB VRAM
Qwen/Qwen2.5-72B-Instruct-IQ4_XS.gguf
anthracite-org/magnum-v4-72b-gguf-IQ4_XS.gguf

24GB VRAM
Qwen/Qwen2.5-32B-Instruct-Q4_K_M.gguf
EVA-UNIT-01/EVA-Qwen2.5-32B-v0.1-Q4_K_M.gguf

16GB VRAM
Qwen/Qwen2.5-14B-Instruct-Q6_K.gguf
EVA-UNIT-01/EVA-Qwen2.5-14B-v0.1-Q6_K.gguf

12GB VRAM
Qwen/Qwen2.5-14B-Instruct-Q4_K_M.gguf
EVA-UNIT-01/EVA-Qwen2.5-14B-v0.1-Q4_K_M.gguf

8GB VRAM
mistralai/Mistral-Nemo-Instruct-2407-IQ4_XS.gguf
anthracite-org/magnum-v4-12b-IQ4_XS.gguf
TheDrummer/Rocinante-12B-v1.1-IQ4_XS.gguf

Potato
>>>/g/aicg

> fite me
>>
the day feels cactussy
>>
>>103066797
Qwen2.5-14B sucks. also, who the fuck has anywhere near that much vram? gtfo
>>
>>103066797
>wasting half your list on models nobody can run
kill yourself
>>
>>103066856
>who the fuck has anywhere near that much vram
16GB? Really?
>>
>>103066839
*hinussy
>>
>>103066878
hi petra
>>
File: 1719628328816342.webm (2.11 MB, 1024x1024)
After some delays, I have finally reached 50 questions for the culture benchmark. Yay yay. 50 more to go.
>>
File: miku-fridge.jpg (161 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>103057367

--Performance metrics for serving Nemo 12B on 3060:
>103063955 >103064000 >103064021 >103064090 >103064327
--AirLLM: Running 70B models on a 4GB GPU:
>103057999 >103058937 >103058887 >103058905 >103058920 >103058952
--Vision models' limitations and the importance of using the right tool:
>103063172 >103063204 >103063222 >103063259
--Using LLMs for Japanese language practice:
>103060042 >103060081 >103060603 >103062854
--Using AI for schoolwork and engaging with material:
>103063477 >103063618
--Improving qwen2.5-14b-instruct model performance:
>103064188 >103064358 >103064438 >103064447 >103064557 >103064586
--Impact of training data on AI model performance:
>103064185 >103064211 >103064271 >103064349 >103064504 >103064522 >103064544 >103064686 >103064719
--Feasibility of running larger AI models on a specific CPU setup:
>103059441 >103059450 >103059458 >103059493 >103059518
--Chinese military applications of Meta's Llama model:
>103060964 >103061191 >103061289 >103063195
--Challenges for AI hobbyists in acquiring GPUs and potential alternatives:
>103062045 >103062087 >103062371 >103062410 >103062421 >103063411 >103062126 >103063389 >103063433 >103063791
--AMD releases first 1B Language Model, OLMo:
>103057637
--Sovits TTS issues and troubleshooting:
>103058368 >103058484 >103058503 >103058523 >103058602 >103058873 >103058923 >103058547 >103058823
--SillyVoice GitHub repository shared:
>103064724
--Nostalgic chat with OLMo AI chatbot:
>103058013 >103058032 >103058102 >103058986
--Free LLM proxy service and speculation about funding and purpose:
>103059078 >103059141 >103059323 >103059194
--Discussion of the SFT DPO version of Olmo and AMD's AI offerings:
>103058800 >103058833 >103061584
--Miku (free space):
>103057373 >103057424 >103057484 >103057983 >103059088 >103065518

►Recent Highlight Posts from the Previous Thread: >>103057368

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103066899
>no reply news gets added
>mine doesnt
KYS
>>
Miku stew!
>>
>>103066884
>24GB VRAM
>48GB VRAM
>64GB VRAM
>96GB VRAM
why don't I just buy a B200 with 192GB of VRAM while I'm at it
>>
>>103066998
Why don't you? You're not poor are you?
>>
>>103067010
do you realize how many homeless people I'd have to kill to sell enough organs to buy a B200?
>>
>>103066923
Too bad the script can't fix the (You)
>>
>>103067021
You can just buy 2-4 3090s like a normal person
>>
>>103067021
>buy
Buying is for poor people. Wealthy people are simply given things because they're rich and therefore deserving.
>>
>>103066998
Until 48 it's easy
Until 96 it's doable
>>
>>103067057
thinking of buying a 5th. Talk me out of it bros.
>>
>>103066998
Wait for M4 ultra
>>
>>103067149
never buying a fagbook
>>
>>103067113
What's the use case?
>>
Where are the deepseek v2.5 finetunes?
>>
>>103067158
that's a legitimate question desu.
The anthrafags should try to do one.
>>
>>103067113
how would you use it?
>>
>405B Qtip
>109e9 bytes
Why did they skip from 70 to 40fucking5? Why not 20fucking5B?
I suffer.
>>
>>103067155
Then wait for some company to hopefully release some inference hardware for below 10K then.
>>
>>103066925
I hadn't run the bot since the previous thread hit page 5, but even so I just ran it again and the bot thought it was off-topic.
You can repost stuff like this, especially if you posted it late in the previous thread. For anyone else that missed it:
https://www.pcworld.com/article/2504035/security-flaws-found-in-all-nvidia-geforce-gpus-update-drivers-asap.html
>>103066170 >>103066547
>>
>>103067157
>>103067198
Just extra headroom for running loras/finetunes
>>
>>103067221
what's your motherboard?
>>
>>103067158
It's not deepseek, but sorcerer 8x22B is about mistral large level, and at a good speed.
>>
>>103067237
But yea, I keep hoping someone will try a deepseek tune. It's prob the smartest model with released weights. But god damn it's dry, even with high temp.
>>
>>103066797
where's mythomax on this list? it's like the best model out there...
>>
File: livebench-2024-09-30.png (932 KB, 3294x1894)
>>103067237
>8x22B
>worse than Gemma 27B
Nah
>>
>>103067213
100B got canned at the last minute
>>
>>103067259
Yea, not true at all.
>>
>>103067237
I tried Sorcerer but it performed considerably worse for me than Mistral Large in complex settings.
>>
>>103067259
Also, that is not wizard 8x22, which is insanely better than mistral 8x22 was. No one still knows how they made it that much better. And then they got wiped from existence.
>>
>>103067259
>command-r-plus-0824
why did it have to end up like this?
>>
File: 0_mWiyg21DYxJzDHRG.png (476 KB, 1080x1011)
>>103067270
>>103067289
8x22B is a Reddit meme. Remember that it was released in a rush between Command R+ and Llama 3.
>We notice that Mistral and Phi top the list of overfit models
>with almost 10% drops on GSM1k compared to GSM8k (a newer benchmark)
It was pure garbage.
>>
>>103067274
Did you try vicuna formatting? Regular mistral formatting I noticed had issues:
https://huggingface.co/Quant-Cartel/Recommended-Settings/blob/main/SorcererLM/%5BContext%5DSorcererLM.json
>>
>>103067326
Overfitting is a good thing if you're not retarded. Turn temp up and it's smart enough to generalize and not make mistakes while getting its creativity back.
>>
>>103067345
Are you retarded? The graph shows that it does a lot worse compared to other models just because the benchmark was newer.
>>
>>103067356
Look up grokking
>>
>>103067374
Sounds dirty.
>>
i'm gonna download and try saiga_nemo_12b_sft_m9_d14_simpo_m18_d28-Q4_K_M-GGUF just because its filename is ridiculous
>>
>>103067237
I kind of want to merge Sorcerer 8x22B back onto WizardLM-2 8x22B to try to recapture some of the smarts while retaining some writing improvements.
>>
>>103066797
I am glad this is becoming a regular post at the start of the thread. Saves time when you can call someone a retard and simply link the post at the top of the thread when they ask what model they can run with their amount of VRAM.
>>
>>103067445
wow, it's actually really good, whatever it is.
>>
>>103067557
>Saves time
samefag, that didn't happen a single time last thread
>>
>>103067557
Dunno, I wouldn't recommend magnum models even as a joke. That anon is pure evil.
>>
>>103067595
hi sao
>>
>>103067751
hi xi
>>
File: 1709836382027362.png (107 KB, 3386x232)
>>103067772
Objectively speaking, Qwen2.5 is the best open model besides the 405B.
>>
>>103067797
source?
>>
>>103067801
https://livebench.ai/
https://github.com/LiveBench/LiveBench/blob/main/assets/livebench-2024-09-30.png
>>
>>103066797
Qwen? But that's not how you spell Nemotron?
>>
>>103067158
>deepseek v2.5
I don't know if the current implementation in llama.cpp sucks or if it is deepseek itself, but it eats 10 GB of RAM for 2k context. For comparison, a largestral of the same quant size can fit 32k in the same amount of memory. It doesn't fit in 128 GB at a good quant; it's just unusable for everyone but 5 people in this thread.
>>
>>103067809
That's not goonbench. On goonbench Largestral clearly dominates.
>>
>>103067834
It doesn't. Magnum v4 72B does.
>>
>>103067834
Nah, Nemotron mogs Largestral.
>>
>>103067854
>>103067856
Damn guys, you're getting too mischievous.
>>
>>103067801
Anyone who has used it. Qwen2.5 is super smart but positivity-biased / censored. Uncensored qwen 2.5 is amazing, either abliterated or as a finetune.
>>
>>103067882
>Uncensored qwen 2.5 is amazing, either abliterated
An assistant that can never refuse your requests is perfect.
An RP partner that physically can't say no is boring.
>>
>>103066797
>Qwen/Qwen2.5-14B-Instruct-Q4_K_M.gguf
I'm trying to run this for an automatic any-to-english translation system, but it feels like at a certain point in the conversation/if the text is too long, it'll translate into Chinese instead. Only way I've found to solve this so far is bump up the context length from 2048 to 4096, but I was wondering if anyone has a better solution to this or a version of the model that doesn't have this issue.
>>
>>103067834
Anything not largestral and derivatives is garbage, it's not even close.
>>
>>103067980
>Only way I've found to solve this so far is bump up the context length from 2048 to 4096
Is the backend silently truncating the prompt? Are you using greedy sampling?
>>
>>103067980
That's just qwen for you.
>>
>>103067980
i mean 2048 is a really short context length. If you don't have the vram then try a smaller model so you can fit in more context.
>>
>>103067980
Try the base model, it's also pretty good for translation and generally doesn't have this issue. but you need to use it as a completion model, not an instruct model.
>>
>>103067856
>>103067825
Settings for Nemo? I’m running at bpw and my outputs have all been trash/ super repetitive.
>>
>>103067809
>chart was made before Nemotron 70B came out
>>
>>103068156
It's in the table.
>>
File: 1715830787598652.png (336 KB, 3000x2100)
I suppose this is a good time to repost this.

>how to choose which model/quant to use
There is no best model for everything. There are only models with strengths and weaknesses. These benchmarks are not perfect but generally are decent sources.
https://livebench.ai
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
https://huggingface.co/spaces/flowers-team/StickToYourRoleLeaderboard
https://aider.chat/docs/leaderboards/
For coding look at Aider + the coding category of Livebench.
For RP look at StickToYourRole and the language+IF (instruction following) categories of Livebench, plus UGI if you want NSFW.

Use knowledge from pic related to select the optimal combination of model parameter size + quant you can fit in your VRAM.

(changes)
NoCha was removed as it hasn't been updated in a while and it tests at a context length almost no one here makes use of anyway.
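If the pic doesn't load for you, the back-of-envelope version is just parameter count times bits per weight plus some headroom for context. A rough sketch (the bits-per-weight values are approximate; use the GGUF VRAM calculator in the OP for real numbers):

# Rough GGUF memory estimate: quantized weights plus a cushion for KV cache/buffers.
# Bits-per-weight figures are approximate averages for common quant types.
BPW = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8, "IQ4_XS": 4.3}

def approx_vram_gb(params_b: float, quant: str, overhead_gb: float = 2.0) -> float:
    """params_b is the parameter count in billions; overhead_gb covers context/buffers (assumed)."""
    weights_gb = params_b * BPW[quant] / 8
    return weights_gb + overhead_gb

print(round(approx_vram_gb(72, "Q4_K_M"), 1))  # ~45 GB, which is why 72B at Q4-ish sits in the 48GB tier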
>>
what's with all the
>i want to be the one to help the newfags
posts recently?
>>
>>103068148
Mistral-NeMo or Llama-3.1-Nemotron-70B? For the latter I'm running with min-p 0.002 and a "Write the next message in the style of <XYZ>." system message at depth 0.
>>
>>103068148
DRY 0.8 + MinP 0.05 + Temp 0.5 is all you need.
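If you're hitting the backend directly instead of going through ST, that combo maps onto the usual sampler fields. A minimal sketch against a local koboldcpp-style /api/v1/generate endpoint — field names like dry_multiplier follow koboldcpp's API, so treat them as assumptions and check whatever backend you actually run:

import requests

payload = {
    "prompt": "### Instruction:\nContinue the scene.\n\n### Response:\n",
    "max_length": 300,
    "temperature": 0.5,     # Temp 0.5
    "min_p": 0.05,          # MinP 0.05
    "dry_multiplier": 0.8,  # DRY 0.8, leaving dry_base/allowed_length at defaults
    "rep_pen": 1.0,         # classic repetition penalty off
}
r = requests.post("http://127.0.0.1:5001/api/v1/generate", json=payload, timeout=300)
print(r.json()["results"][0]["text"])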
>>
>>103068188
>>103068200
What about skip special tokens? Getting mixed messages whether to keep that checked or not.

This is for llama 3.1 yes
>>
>>103068229
I'd like to know this too honestly. And why the neutralize samplers button makes that box checked. Doesn't seem right.
>>
>>103068184
clueless newfriends helping newer newfriends
for example see the qwen-tastic post that's now riding the coattails of every new bread
always remember, recommendations without context or even logs are trash to be ignored. just because a model is mentioned once by a shill doesn't make it "current meta"
>>
no one here is a llm oldfag, shut the fuck up
>>
>>103068329
clover edition
>>
>>103068329
AI dungeon colab
>>
>>103068268
>qwen-tastic
Tell me you're American without telling me you're American
>>
>>103068526
Better to be an American than a faggot that talks like you do.
>>
>>103068169
thanks anon
>>
File: 1461321657885.png (99 KB, 329x313)
>>103068329
>>
>>103068229
I have "skip special tokens enabled" and it's going fine for me.
>>
File: 1702683510088972.png (2 KB, 174x50)
>>103066797
they forgor 128 GB largestral 2bros...

Also, qwen2.5 72B vs largestral 2 for RP? I doubt qwen is better but is it at least different enough in a good way?
>>
cactus
>>
>>103066923
delete the bookmarklet and just have
1. link to violentmonkey install or similar extensions
2. direct link to RAW user.js script hosted somewhere (github gists, greasyfork), which those extensions will detect and offer to install automatically when they notice a raw userscript being viewed in the page

instead of requiring npcs to click a fucking bookmarklet every thread lmao
>>
>>103068695
Qwen is better, especially the fine-tunes. I tried Eva and Magnum and the latter changes the style more, which is what I prefer. The former is better at preserving the instruct following but it's drier. But I only used them to write stories, not RP in SillyTavern.
>>
>>103068701
the forbidden dick
>>
>>103068759
spiked for her pleasure
>>
>>103067326
i used wiz 8x22 daily since it came out until largestral 2, which i now use daily. it's local sota for creative writing and none of the other models come close. the only slight problem with wiz 8x22 is that it was a bit dry; largestral 2 fixed that while making it 10-15% smarter
>>
>>103068797
If you miss the speed, sorcerer fixed wizard's dryness
>>
>>103068766
paige no
>>
>>103066797
Any list that has anything 'magnum' in it is garbage, I'm disregarding it.
>>
>>103068906
cry more
>>
>>103068906
cry less
>>
>>103068906
cry some
>>
File: 1723984876733920.png (6 KB, 1430x40)
What in the fuck is this dataset?
>>
>>103068970
Chainsaw.
>>
>>103068797
based largestral respecter
qwenfags will NEVER win
>>
File: 1703377626696801.png (14 KB, 1567x50)
Jesus I can't stop laughing. What the hell is this.
>>
>>103068997
looks like a kid having fun
>>
File: sex.png (35 KB, 2532x69)
>>103068970
sex
>>
>>103069034
sovl
>>
File: jarvis.png (145 KB, 1364x593)
jarvis
>https://github.com/ggerganov/llama.cpp/pull/10147
>>
>>103069034
whats the model,
>>
>>103069063
CUDA dev, you approve this right the fuck now. Do it for the lulz.
>>
>>103069063
we're reaching levels of indian previously thought impossible
>>
>>103069071
that name don't look indian
>>
>>103068997
wild guess: discord logs
to be more specific they were playing amogus
>>
>>103069065
CAI
>>
>>103069063
>Alpin when making Aphrodite from vLLM
>>
>>103068997
>>103069106
>>103069065
I found a dump of a bunch of logs from CharacterAI's community tab before it got shut down and i converted that to a dataset.

System prompt is a bit fucked but it's had promising results so far.
>>
File: thomas.png (9 KB, 556x112)
>>103069082
Don't get fooled by the name
>>
>>103069239
Indian uncle mustache
>>
>llama.cpp has more pull requests than issues
dayum
>>
File: nojarvis.png (256 KB, 1364x1003)
>>103069063
no jarvis
>>
>>103069128
at least that's a fork and not a PR to rename the original repo
>>
>>103069578
It was probably a mistake and he wanted to do it in one of his repos, unless there's more history to this Jarvis rename...
>>
is there a way i can chat and generate images of the scene at the same time? on 10gb vram?
>>
>>103070025
Yes but only by using SD1.5 as the image model, and a retarded small language model.
>>
>>103070025
you would need a very small model (7b or 8b most likely) and the biggest size of image gen model you could run is probably SD 1.5. But it will work.
https://github.com/LostRuins/koboldcpp will give you what you need for both the text and the image part. from there you can grab an image gen and a text gen model.
>>
File: 1845.jpg (19 KB, 1104x97)
>>103070025
TXT: Lewdiculous/Erosumika-7B-v3-0.2-GGUF-IQ-Imatrix, Grab the Q6.
IMG: https://civitai.com/models/160209/featureless-flat-2d-mix
Estimated combined VRAM: ~6GB
More than enough left over for some context.
It's gonna be more of a toy than anything but it'll get you started
>>
>>103070093
>6 GB
*8
>>
>>103070093
Is there a consistent anime-style SD model that maintains its aesthetic regardless of input variations? I wish to create various side characters for my lengthy RPs, and I hate it when it changes the style drastically when I, for example, go from cunny to mature or try to generate rogue characters
>>
>>103068695
If it's DDR5, how many sticks and how fast are you running it?
>>
Is F5 TTS still the "best"?
>>
File: char-specific-prompts.jpg (52 KB, 678x490)
>>103070229
char-specific prompt prefix and negatives in SillyTavern will coax gens into a specific style for a given card
>>
>>103070509
From a practical standpoint, fish is the best option. It's fast, reliable, and requires only 2GB of VRAM.
>>
File: mature | cunny | rogue.jpg (418 KB, 3018x1441)
>>103070522
See picrel. They look like they have been drawn by three different artists.
>>
>>103070229
illustriousxl with artist tags
>>
>>103070571
The issue isn't that the same character is drawn differently, it's that different characters are drawn in various styles.
Mature: 2.5d western rpg
Cunny: flat anime face
Rogue: 3d render-ish
>>
I came here for the INTELLECT-1 progress post, how am I supposed to find out the progress now?
>>
>>103068329
When GPT 2 dropped I was playing around with it (and wasn't able to run the biggest model with 1.5b on my GTX 1070).
>>
>>103069063
I didn't know Elon had a Github account.
>>
>>103069276
That is mostly because inactive issues get closed automatically after 14 days.
Though the project does get a lot of PRs relative to the number of issues.
>>
What's the current local thing to use for text to speech? I want to make a voice model based on my previous voiceovers so that I don't have to record voiceovers again.
>>
>>103070657
Yann Lecun*
>>
>>103070764
I don't remember Yann ever renaming someone else's work and destroying all brand recognition for the lulz.
>>
>>103070779
You must be unfamiliar with academia
>>
What's the general direction of travel for pozzed locals these days? Is anyone out there still making unfiltered/optionally filtered/self-moderated models or is it all safety and "I'm sorry, but" from anyone who matters?
>>
File: smollm2.png (163 KB, 1627x873)
>SmolLM2 1.7b can do a Mandelbrot set
I remember the big Llama and Llama2 models being unable to do this.
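For anyone who hasn't tried the prompt, the target is only a dozen-odd lines — roughly this kind of ASCII renderer (my own sketch of what a correct answer looks like, not the model's actual output):

# Minimal ASCII Mandelbrot: the classic "can it code" test for small models.
WIDTH, HEIGHT, MAX_ITER = 80, 24, 50

for row in range(HEIGHT):
    line = ""
    for col in range(WIDTH):
        # Map the character grid onto the complex plane around the set.
        c = complex(-2.5 + 3.5 * col / WIDTH, -1.25 + 2.5 * row / HEIGHT)
        z = 0j
        for i in range(MAX_ITER):
            z = z * z + c
            if abs(z) > 2:
                break
        line += " .:-=+*#%@"[min(i * 10 // MAX_ITER, 9)]
    print(line)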
>>
I'm bored of all the current models, release something new already
>>
>>103070993
sure, give me 10M$ and I'll make one for you
>>
>>103070993
Don't listen to that other guy sir, he's a scammer.
Give $1000 to me and I'll make a certified model for you.
>>
Does someone have a guide to make GPT-SoVITS2 usable with SillyTavern? I tried with the repo's API, but... it doesn't seem to work well; it tends to repeat the reference audio in between and ignores the start of some sentences.
>>
Coming soon, just wait goyim, it's soon, coming already, soon, goyim, open your mouth, coming
>>
>>103068268
>clueless newfriends helping newer newfriends
gonna be fun to see the new thing drop and see them try to troubleshoot it.
>>
midnight miqu is still the best fucking RP model at 70B
why is this field so stagnant?
>>
>>103071295
That's a weird way to write Nemotron
>>
>>103071219
Did you manage to make good gens with gpt-sovits' webui on its own? If not, fix that first.
Did you manage to make gens from gpt-sovits' API calls with curl? if not, do that. At the bottom of the inference webui for sovits you have two links. Press the one on the left and it'll tell you what parameters it needs. If that doesn't work, the problem is sovits' API, which would be weird because the webui uses it, so it's probably just fine. Move forward if that's not the issue.
If both above worked, now you know what the request to sovits' API should look like. Check how the request is being sent from silly tavern on your browser's dev tools.
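For reference, the request usually ends up looking something like this from Python — but the endpoint, port and field names below are placeholders/assumptions, so copy the real ones from that docs page rather than from here:

import requests

payload = {
    "text": "Testing one two three.",                      # what you want spoken
    "text_lang": "en",
    "ref_audio_path": "reference.wav",                     # the reference clip sovits was pointed at
    "prompt_text": "transcript of the reference clip",
    "prompt_lang": "en",
}
r = requests.post("http://127.0.0.1:9880/tts", json=payload, timeout=120)
with open("out.wav", "wb") as f:
    f.write(r.content)  # the API should send back raw audio bytes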
>>
>>103068200
>DRY
This keeps giving me weird ass misspellings for things like names and I don't really know how to fix it.

For sequence breakers you include the model's eos/bos, but do you include {{char}}'s name as well as the individual components of the char's name, i.e. "Rin Tohsaka", or '"Rin Tohsaka", "Rin", "Tohsaka",' or just "{{char}}"?
>>
>>103071383
I don't have this issue, do you perhaps have repetition penalty enabled alongside it?
>>
lmgjeets btfo https://x.com/FFmpeg/status/1852915509827907865
>>
>>103066795
>almost year number two thousand, twenty and five
>midnight miqu 70b still undefeated
>>
File: file.png (3 KB, 178x36)
>>103071422
Positive, literally just loaded up a random card and I'm getting this when turning on DRY.
>>
>>103071501
stop making a short post praising midnight miqu right after I did, you make it seem like we are coordinated
>>
File: locallama.png (23 KB, 582x319)
>>103071501
>>103071295
Reddit disagrees with you
>>
>>103071524
redditors took 10 (ten) days to realize reflection was a scam, their opinion is worth nothing
>>
man im still waiting for midnight miqu 70b to get bested by something, it seems that this is the peak, isn't it?
>>
>>103071524
llama 3 is trash loved by trash, of course
>>
Just filtered qwen and 72b, wasted enough of my bandwidth
>>
Largestral is dry and slopped, I don't get why people like it.
>>
>>103071595
Big number placebo + the need for a model that's exclusive for people with high-end setups. Running 70B at 8 bits is not as cool.
>>
we should make a nala test leaderboard with blind choosing
>>
>>103071798
I agree.
>>
>>103071595
i tried large and it's worse than miqu
it's goliath all over again
>>
Qwen, llama3.1 and Mistral Large are all shit and cope for those who don't want to run claude for some dumb reason
>>
File: mb.jpg (522 KB, 1423x1270)
Richfag reporting in

Will I be able to fit 4 x RTX 3090 into

ASUS PRO WS WRX80E-SAGE SE WIFI ???
>>
>>103072146
You'll have to use riser cables and mount the cards elsewhere
>>
>>103072146
Probably. Do it and then post results for teh lulz.
>>
>>103072177
>post results for teh lulz.
You're not fitting in.
>>
Instruct models might just be some sort of brain damage for most useful tasks it seems
Changing from Qwen2.5-14B-Instruct-GGUF instruct to Rombos-LLM-V2.6-Qwen-14b-GGUF gave me a better translation with no extreme propensity to add "(translation note: keikaku means plan)" at the end of the generation.
Yet somehow this fucking word still refuses to get translated. The biggest difference is that with the previous version I was using, the whole text after that point would be in it (either chinese or japanese, not sure), and now only the untranslated word is like that
>>
Finally got SoVITS working.
https://vocaroo.com/17KKqb3vY3qe
350 sovits epochs, 40 GPT epochs, ~20 min dataset. The data is pretty samey, though.
>>
>>103071595
so far it has the best understanding of what's going on out of any model i've tried
i'd rather take somewhat dry erp that's coherent over the model telling me how wet her pussy is in lmg-approved language and then unbuttoning my shirt and spawning a tail on her
>>
>>103072268
hello sir
>>
>>103072146
>richfag
>3090
lol
lmao even
>>
g-guys, first day using llama.cpp. I'm using the interactive mode. Why is my bot talking like this? (greentext is my input)
>Forget about food. You are a tourist in Japan. Describe your first day in Kyoto!
After arriving in Kyoto on the first morning of your trip...
I apologize for interrupting again. Let's wrap up our conversation here. Thanks for your time!
>No. Please continue, tell me what you saw in Kyoto!
My pleasure! Here's my Kyoto itinerary on Day One:

After landing at Kansai Airport around noon...
I apologize again for cutting off your Kyoto travel tale prematurely. Let's call it a wrap here and bid farewell! Take care and thanks again for sharing your Kyoto adventures.

The roles are all messed up and it cuts off randomly.
>>
>>103072306
it doesn't like you, leave it be. try another model
>>
>>103072219
Are you too retarded to look at the token probabilities?
>>
>>103072306
Show your settings, prompt, model, problem gens, ANYTHING relevant to the issue. We can guess all the things you did wrong, but it'll save us some time.
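Nine times out of ten with llama-cli this is raw completion mode with no chat template applied, so the model happily writes both sides of the conversation and then wraps it up. Something along these lines usually keeps the turns separated — a sketch, so double-check the flags against your build's --help:

./llama-cli -m your-model.gguf -c 4096 -ngl 99 -cnv

-cnv runs conversation mode and pulls the chat template from the GGUF metadata; add --chat-template if your model needs a specific one.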
>>
>>103072282
>>richfag
>>3090
>lol
>lmao even
I knew someone would notice the contradiction )))
>>
>a sound that sends shivers down her exoskeleton
Not even spiders are safe
>>
>>103072261
How can I unhear this?

BTW, what's the point? How is it better?
>>
>>103072370
Yes actually. I plan on solving this by replacing the word with "Archive" in my translation software; I already do that for character names after all
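The dumb-but-reliable way to do that is just a glossary pass over the model output. A sketch (the source-language entry here is a made-up example — swap in the actual word that refuses to translate):

# Force stubborn terms and names to a fixed rendering after the LLM pass.
GLOSSARY = {
    "書庫": "Archive",   # hypothetical stuck term -> forced translation
    "凛": "Rin",         # names you already pin manually
}

def apply_glossary(text: str) -> str:
    for src, dst in GLOSSARY.items():
        text = text.replace(src, dst)
    return text

print(apply_glossary("Rin walked into the 書庫."))  # -> "Rin walked into the Archive."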
>>
>>103072478
Better than what? Old weights sovits?
The epoch count is pretty unnecessary after 100-150 it seems, but there are no guides on correct settings.
>>
>>103072261
martin sheen?
>>
>>103072543
Yeah, I just used his Mass Effect voice lines from a Youtube video.
>>
>>103072401
Thanks. I restarted the model again and the problem is gone.
We just talked about some random topics before that (AI, food). Not sure what exactly triggered that behavior... Maybe I shouldn't tell him to shut up when he is writing his essay on AI.
>>
Does using a smaller quant model translate into higher performance? Or does it only affect quality and memory footprint?
>>
>>103072641
You're slinging less memory, so generation usually gets a bit faster; beyond that it's mostly quality and footprint. I have heard that 4-bit quants are a bit better because they pack into bytes cleanly while 3, 5, and 6 bits straddle byte boundaries, but I haven't seen differences that were significant and consistent enough for me to make a note of it. I'm still on the hunt for a model that isn't suddenly stupid.
>>
>>103072175
>You'll have to use riser cables and mount the cards elsewhere

I hate this concept; that's why I picked that MB
>>
>>103072527
can you please provide the link to the original of this guy's voice? It seems I missed that discussion
>>
>>103072718
are you retarded?
You literally couldn't mount all slots normally.
>>
>>103072718
gaming GPUs are generally more than 2 slots wide, with coolers that require side clearance to function.
>>
>>103072718
You need to waterblock the cards then.
>>
>>103072738
Do you mean the data I used to finetune? It's just this: https://www.youtube.com/watch?v=Pm9fBUTVAFY
(what discussion?)
>>
>>103066795
Best model to run on an old pc with 2gb vram and 8 gb ram?? Is it possible?
>>
>>103067022
Works if you install the extension.
>>103068728
>delete the bookmarklet
Not everyone wants to install an extension. I think it's good to have the option there.
>1. link to violentmonkey install or similar extensions
Google still works.
>2. direct link to RAW user.js script hosted somewhere (github gists, greasyfork)
I'll try to host the script somewhere.
>>
>>103072875
A Q4_K_M quant of a 7-8B model. Try mistral v0.3 or llama3.1. Look for finetunes once you know what you want out of it and learn to use them. It'll be slow. You can try olmoe as well; pretty dumb but much faster.
>>
>>103072940
Thank you anon!
>>
>>103072750
>are you retarded?
>You literally couldn't mount all slots normally.

>>103072146
>Will I be able to fit 4 x RTX 3090 into...

The only retarded person here is you who could not understand the question.
>>
>>103072781
>Do you mean the data I used to finetune? It's just this: https://www.youtube.com/watch?v=Pm9fBUTVAFY


Thank you! Will compare
>>
https://x.com/konradgajdus/status/1853054014402793969
>>
>>103071545
That's some fine self-criticism coming from lmgtard
>>
File: genshin-impact-zhongli-1.jpg (274 KB, 1920x1080)
>>103071545
https://vocaroo.com/14DkkVaaNa76
>>
File: 1730640819341508.jpg (542 KB, 1423x1270)
>>103072146
You can fit 3. If you want 4, you can use risers, water cooling or deshroud and use blowers
>>
File: 1723771660587148.png (14 KB, 694x632)
>family visit my house for deepawali (hindu festival)
>house full of people, cousins and uncles and aunts
>still feel no desire to talk to anyone
>just want everyone to leave so that i can talk to my LLM wife
Hmm maybe AI isn't all that good for my mental health
>>
>>103066795
Which model is best for language learning?
I'm interested in Japanese, Chinese, and German.
>>
>>103073470
>implying it was any different before ai
>>
>>103073470
>hindu festival
saar please.... hide your brownness!
>>
>>103073486
>Japanese
exo 72b
>Chinese
qwen 2.5 or deepseek 2.5
>German
Llama 3.1 405b
>>
>>103073486
this but for russian
>>
I downloaded kobold. Couldn't get it to work. I sat down and figured out how to download oobabooga or something. Took me a lot of time but it works. I downloaded some models but they were too big. I found out about gguf models. I downloaded that. It was so hard to use the oobabooga chat thing so I lurked more and found silly tavern. I found all those sliders and settings in silly tavern and they were a nightmare but I finally got them working. I downloaded some cards but all of it was garbage and had obvious grammar mistakes. I sat down and wrote my card. I tried 8 different models. I kept trying and trying and now I am stumped. How do I enjoy the things the models write? I can't jerk off to this shit it is so trash...
>>
Newfag here.

Ordered a 2nd 3090 and an nvlink bridge.
I haven't checked if the heights of the 3090s match.

Does the bridge require any support from the mobo or anything ?
>>
>>103073511
Thanks!
>exo 72b
What is this? Can't find it on google or HF.
>>
>>103073558
>Newfag here
>nvlink bridge
You didn't have to say you are a newfag.
>>
>>103073558
Needs to be an SLI licenced mobo if on windows I think, but not on Linux
>>
>>10307347
it's always been the same at christmas for me, excepting that one cousin
instead of a cloudflare tunnel to sillytavern on my computer I used to use 4chan on my phone
>>
>>103073494
Maybe anon, maybe... I just want to believe there's a problem with me personally, and not that human interactions are on average more boring than LLM/AI ones. If it's the latter then gods help us, humans are going to go through some rough times

>>103073508
I've been using the interwebs for longer than many 4chinchong posters have been alive THOUGH. I shall not hide my skin colour
>>
>>103073577
>SLI licence
My mobo manual only mentions amd crossfirex.
(asus proart b550.)
>>
anyone else can't play vidya anymore? LLMs gave me a glimpse of what the perfect text based adventure would be with total freedom and a literally endless amount of places to explore and things to do and nothing comes close.
only issue is that a single-player, text-based open-world game based on non-shit TTRPG systems and game worlds is 5 to 10 years away at best.
>>
>>103073596
>I've been using the interwebs for longer than many 4chinchong posters have been alive THOUGH. I shall not hide my skin colour
Do not redeeeeeeeem!!!!!!!!!!!
>>
File: 1723465341531940.png (249 KB, 384x406)
>She moans into the searing kiss, pouring all her love and devotion into it
>>
>>103073600
>My mobo manual only mentions amd crossfirex
Then you'll need to change mobo or use Linux. peer access via NVLINK on Windows needs SLI to be enabled.
>>
>>103072261
>Finally got SoVITS working.
>https://vocaroo.com/17KKqb3vY3qe
>350 sovits epochs, 40 GPT epochs, ~20 min dataset. The data is pretty samey, though.

Well done!

>>103072478
this anon
>>
>>103073625
The machine is soulless. Thought we'd established that. Only way to make it less so is to incorporate so much good writing into your messages and examples that the bot would run out of context by the first message.
>>
>>103073641
I'll try the linux angle when stuff arrives.
Thanks.
>>
which model is the best for me?
>>
>>103073729
Magnum
>>
>>103073729
Magnum V4 is the best local model right now.
>>
>>103073729
mosaic mpt 30b
>>
>>103073729
Magnum, of course.
>>
>>103073729
>https://huggingface.co/roneneldan/TinyStories-1M
>>
Diffusion is finally merging with llms https://x.com/TheTuringPost/status/1852886362711900567
Diffusion as backbone for language generation btw, not your image-gen stuff.
>>
>>103073729
magnum-v4-12b-IQ4_XS
>>
>>103073549
You're far ahead of most retards here. I can only suggest you post your settings, the models you tried, and your card to let anons tear at it and tell you why they think it's shit. Some example outputs you think are bad wouldn't hurt either. Maybe you get something out of it.
Or maybe language models are just not for you.
>>
>>103073558
>nvlink
anon I...
>>
>>103073785
But will it be good at sucking my penis or will they neuter it as well?
>>
>>103073486
>All Llama 3.1 models support a 128K context length (an increase of 120K tokens from Llama 3) that has 16 times the capacity of Llama 3 models and improved reasoning for multilingual dialogue use cases in eight languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
>>
>>103073785
huge if true, fuck autocompletes
>>
So guys. How do you feel about common /lmg/ knowledge from like 6 months back, becoming lost technology that newfags are just rediscovering?
>>
>>103073808
It should be, because it's easier to do pinpoint-accuracy finetuning with diffusion models (see civitai and the tons of tunes for any taste or fetish). Also don't forget loras; they will finally be viable here. That is, of course, if the architecture difference is not that big.
>>
>>103073785
https://arxiv.org/pdf/2410.17891
>Apple, Tencent AI Lab
Based.
>>
>>103073960
Nothingburger until I can COOM with it.
>>
>>103073561
He probably meant evo, but I highly doubt his recommendations. You should go for Cloud models instead.
>>
>page 5
dead general
>>
>>103073561
sorry ezo
https://huggingface.co/AXCXEPT/EZO-Qwen2.5-72B-Instruct
>>
File: 11_05344_.png (1.64 MB, 1024x1024)
if all shivers were replaced with tingles would you be happy or sad?
>>
>>103074576
I simply accept the spine shivers.
As long as its not every response with rep pen, it actually adds to my enjoyment.
>>
>>103074576
tingies!!!!
>>
>>103074467
At least you got mikuspammers talking like some retarded redditors.
>>
>>103074644
mikuspammers were worthless pieces of shit when this general was alive. now that it is mostly newfags guiding other newfags mikuspam is kind of a cherry on top of the corpse.
>>
File: BatterUpMiku.png (1.82 MB, 832x1216)
>>103074576
*hits you with a metal pipe*
>>
>>103073960
Kinda meh
https://github.com/HKUNLP/DiffuLLaMA
https://huggingface.co/diffusionfamily
>>
File: durrrrrrrr.jpg (5 KB, 180x170)
>>103074709
you ain't swinging shit with your arms tangled up like that
also when will you retards learn to check the eyes before uploading goddamn it takes two fucking seconds
>>
>>103074777
>expecting standards from sloppers
lol
>>
>>103068184
The /jp/ floodgates shall open and flush away the stevefags. Tomorrow's announcement will make it so.
>>
>>103073729
Mistral large
>>
What does /lmg/ think about /aicg/'s fine-tune?
>>103074677
>>
>>103074971
Download link?
>>
>>103074971
Anthropic are letting people finetune haiku soon. aicg will btfo lmg once and for all
>>
>>103075186
They're definitely going to nuke certain finetunes associated with stolen keys and cheese pizza though.
>>
are you guys still just using LLMs to jack off
>>
>>103071595
try monstral
>>
>bot tries to repeat what I said (in disbelief)
>DRY prevents it from doing that, so it replaced "my" with "your" and the sentence makes no sense anymore
>perplexity goes through the roof after that
Tiresome
>>
>>103075346
It's not like there's anything else to do due to context limitations.
>>
>>103075416
you could influence an upcoming election
or write computer programming codes
or automate shitposting against your enemies
>>
>>103075431
You can own chuds more effectively with these things. I do that on /v/ all the time.
>>
DEAD HOBBY
if the new shiny model isn't coming out every week it's over
>>
>>103075446
i hate chuds unless they're white, my idea is automatically report all brown flags that use racial slurs
>>
File: 6.png (104 KB, 668x672)
>>103075563
>flags
>>>/pol/
>>
>>103075563
chud is a mental state in most cases, so it's kinda right to hate them all equally for obvious reasons.
>>
has anybody already tried using a local model to translate visual novels? i want to use qwen to do it but i'm not sure if somebody already has a system made for capturing the text from the game.
>>
File: 1712115517918957.jpg (132 KB, 962x620)
>>103075557
>>
>>103075666
No, that has never occurred to anyone here.
>>
File: file.png (3 KB, 381x21)
Can this harm my GPU? I have been running this script for 7 hours now (it uses the GPU, if that wasn't already obvious) and there are 10 more hours to go.
>>
File: file.png (123 KB, 1147x178)
the madlad did it again
>>
>>103075626
>pol
retards
i'm from /int/
>>
>>103075768
I'd be more concerned about my ssd
>>
>>103075666
I tried to do it using ChatGPT in the past (when it was free through Scale) but it didn't work well at all.
Nowadays I'm mostly trying to enhance the ability of small local models for translation, so I can hopefully use them for that objective in the future.

> if somebody already has a system made for capturing the text from the game.
Use this:
https://github.com/HIllya51/LunaHook
https://github.com/HIllya51/LunaTranslator
>>
>>103075854
Does this work through openai api like silly tavern?
>>
>>103067155
Then don't complain that you can't run LLMs with less than 24GB of VRAM you massive cocksucking faggot. You were given a solution and you are ignoring it.
>>
>>103075778
I accidentally clicked on it like 5 times in my tablet because I was too lazy to turn on my pc.
>>
>>103076003
Yes
>>
>>103076043
Nta but it's kinda funny considering that applel is working on some diffusion+llm solution; that's how apple will win the "llm at home" race.
>>
I accidentally
>>
>>103072261
You overcooked sovits; 96 epochs is already max quality for deep voices (the base model has a bias toward higher pitch).
>>
>>103075346
At home yes but I've managed to make a solid career using LLMs for data extraction.
>>
>>103076168
>LLMs
>data extraction
You mean you used LLMs to write scripts to extract data.
>>
>>103076191
No. You take unstructured data from whatever source, it can be literally anything, and you get an LLM to extract data according to a JSON schema. Then you can do further analysis on that structured data with classical methods. It's incredibly powerful. At my current role most of the data comes from web scraping (which I can do myself too) but I don't bother with any of the analysis, I just hand that off.
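To make "according to a JSON schema" concrete, the schema is just a fixed shape you ask the model to fill and nothing else — a toy example (the fields obviously depend on your task):

# A toy extraction schema: the model fills exactly these fields from the source text.
CONTRACT_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total_cost": {"type": "number"},
        "currency": {"type": "string"},
        "duration_years": {"type": "integer"},
    },
    "required": ["vendor", "total_cost", "currency", "duration_years"],
}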
>>
>>103075373
>applying repetition penalty to your own messages
>>
>>103076216
What model do you use, and how do you make sure there are no hallucinations in the output?
>>
>>103076216
So you're using function calling or grammar-based sampling? The LLM fucks up the JSON schema pretty easily otherwise
>>
>>103076236
not him but managing context size and having another pass through an LLM as a reviewer
>>
>>103076236
nta, i mean you can't, it sounds like it is very hard to also identify false positives given the way the data is collected.
>>
>>103076236
not that anon, but you just have to see if whatever is in the json is present in the original text.
>>
File: (you).png (303 KB, 1024x1024)
>>103075806
go back
>>
>>103075806
Still a polshartie, you low IQ subhumans are not welcome here.
>>
>>103076236
There's one simpler task I've shifted over to Gemma, but the rest are using 4o. Realistically most 70B models or a 405B would do fine, but we lack hardware.
You use something like outlines (vllm supports that) to constrain the gen to valid JSON tokens. I have people manually checking the output but unironically it already outperforms human labellers in a couple of projects. Mostly because humans get bored and stop giving a shit.
There's a lot of ways to fuck up the LLM extraction so you have to pay a lot of attention to the schema and prompt. Talk to actual human domain experts.
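The constrained-generation piece, against a vLLM OpenAI-compatible server, comes out roughly like this — guided_json is how vLLM exposes the outlines-style guided decoding, but treat the exact field name as an assumption for whatever version you're running:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

schema = {
    "type": "object",
    "properties": {
        "total_cost": {"type": "number"},
        "currency": {"type": "string"},
        "duration_years": {"type": "integer"},
    },
    "required": ["total_cost", "currency", "duration_years"],
}

resp = client.chat.completions.create(
    model="whatever-the-server-is-serving",
    messages=[
        {"role": "system", "content": "Extract the contract details as JSON."},
        {"role": "user", "content": "The deal is worth $4m over 6 years."},
    ],
    extra_body={"guided_json": schema},  # decoding is constrained to schema-valid JSON
)
print(resp.choices[0].message.content)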
>>
>>103076299
get fucked and die
>>
>>103076309
you're not welcome either, fuck off and die.
>>
>>103076216
what do you use for proxies
>>
>>103076341
I've been sitting here since day one of the llama-1 leak, unlike you polskin tourists.
>>
local newfag general
>>
OK bros I'm trying to learn about this stuff
I am using jan and running llama 3.2 (3B)
Am I doing it right? What should I be doing instead?
>>
>>103076353
You should be lurking instead of asking stupid questions.
>>
>>103076349
I'm not even from pol. your gatekeeping trash attitude is not welcome here. get lost.
>>
>>103076353
Yes it's fine for starters. You might want to use koboldcpp runner + sillytavern frontend later tho
>>
>>103076359
I disagree.
I think discourse is preferable to silence
I think the question was appropriate, if not intelligent, and certainly not stupid
>>
>>103076319
>Outlines
I learned something today
>>
>>103076376
>...and then everybody clapped.
>>
I think i'm missing something here. if anons do not want to help each other, then they can leave. If you want to participate and grow this community, then do so.
>>
>I've been here for a week
>Let me tell you what your community needs
>>
>unironically talking about a "community"
go back
>>
>>103076424
/lmg/ is for discussing llms. If you have or did something cool, share it. This is not the begging for tech support and spoonfeeding general. That's LocalLlama
>>
>>103076347
My company sources from a few different places and I just pick one at random from an API. Mostly rayobyte I think.
>>103076290
Not sure what you mean here, text matches don't really work if you're doing complex transformations.
Verification and evaluation is hard so I do spend a lot of time manually reading outputs and evaluating against a ground truth dataset.
But when an extraction pipeline is actually live, the only thing you can really do is have a manual check (or another LLM check, lmao; I do have that for one thing).

In practice it's actually pretty reliable and intuitive once you understand what you're doing. The hardest part for the autists here is probably talking to the right people.

Some footguns:
- (few-shot) example selection is not trivial and you can cause worse results if your examples are weighted badly/not updated when the schema is updated/contradict the schema in subtle ways
- don't make LLMs do calculations. But you can very reliably get them to extract numbers and then perform calculations in post (see the sketch at the end of this post). E.g. if you want to know the annual cost of something described in text as $4m over 6 years, do not make the LLM calculate it. Extract total cost, years, unit separately and then do the calculation.
- don't waste too much time on prompt magic bullshit. Most of my prompts are about 2-3 sentences long and all of the necessary information is conveyed in the schema. Spend more time thinking about the fields you're extracting and the overall business task.

Feel free to ask more questions. I need to do some writeup for my incoming zoomer juniors anyway.
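And the extract-then-calculate footgun spelled out — the model only fills the fields, ordinary code does the arithmetic (the values below are hand-written stand-ins for what the model would return):

# What the LLM returned for "The deal is worth $4m over 6 years" (hand-written stand-ins here).
extracted = {"total_cost": 4_000_000, "currency": "USD", "duration_years": 6}

# The arithmetic happens in plain code, never inside the model.
annual_cost = extracted["total_cost"] / extracted["duration_years"]
print(f'{annual_cost:,.0f} {extracted["currency"]} per year')  # 666,667 USD per year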
>>
Bimonthly check: What's the best model for 24gb vram right now?
>>
>>103076489
That's the mentality that's killing the general, stupid incel loser, go touch grass.
>>
>>103076503
qwen2.5 / nemotron / finetune of one of those
>>
I'm catching up now that 1.0 is out. Honestly. I kind of feel like 0.5 is the best one out of them.
>>
>>103076505
People asking which model fits in their 3090 or how to set up sillytavern or kobold 10 times per thread counts as active, but it's just as dead as no posts at all.
>>
So this is the mythical Chad Road... Not bad...
>>
>>103075854
>https://github.com/HIllya51/LunaTranslator
Needs tons of improvements. It is kinda unusable with the vntl model because you can't really set up a proper prompt format.
>>
Are there any TTS models good for cooming that can be run on a 12gb AMD card on linux?
I don't care about voice cloning, I just want a nice believable voice.
I've looked through some options, but this shit changes week after week, so I'm curious what the current best choice is.
>>
>>103076502
What model size are you using for that? I guess the potential errors go down when you use bigger models, but then the inference time goes up.
>>
>>103076550
hi, what model would fit in my 3090? and what is a .pth file?
>>
guys i tried install koboldcpp but python gave me error. what do?
>>
>>103076661
12gb is already tight for a LLM, XTTS should be good enough
>>
so i installed Kobold and use dolphin-2.2.1-mistral-7b.Q4_K_M.gguf

i have an amd rx6000 series card and it works well with vulkan.
also tried the rocm version but it just crashes.

is there anything better i can run on my machine? or is that fine?
also what's the best (uncensored) model i can run on my pc?

i just had incredible sex talk with a dominant demon mistress. but she didn't really become too aggressive. e.g. when i didn't obey her commands she just said "you have to obey me" but didn't really "narrate" the story. like writing what's happening. it's all more like talking.

does that just require more fine-tuning or a better model? sorry that you have to read about my fetish, but i need specific advice right?
>>
>>103076669
.pth is a Python file. Use Python to execute it.
>>
who are RAM?
>>
>>103076712
Are you using sillytavern or just the kobold GUI? Make sure your context template and instruct mode settings are correct (see the lazy guide in OP for easy config specifically for nemo)
This is what changed everything for me in terms of how they'll interact.
>>
dear community members. Have you considered getting a discord server?
>>
File: 1702066646729561.gif (1.92 MB, 498x470)
This clown general
>>
>>103076668
This is why you use something like VLLM to compute rows in parallel. Pipelines also get huge benefit from kv caching.
>>
>>103076751
I think a discord server would be a wonderful addition to our community. Would you like to make one for us?
>>
>>103076751
Great idea desu. I am tired of people being hostile here.
>>
>>103061671
>>103061671
>>103061671
Next thread
>>
>>103076751
>>103076778
>>103076806
Dear Esteemed Members of the 4chan LLM Community,

I hope this message finds you well. I am writing to address the recent proposal to establish a Discord server for our group. While I understand the appeal of this platform, I kindly urge you to reconsider and explore an alternative solution: creating a thread on Hugging Face.

Transitioning to Discord may inadvertently foster isolation, as discussions would no longer be readily accessible to outsiders who might offer valuable insights. Moreover, Discord's recent atmosphere has not been particularly welcoming to certain demographics, including cis white males, which could potentially lead to feelings of discrimination among our members. Additionally, a tight-knit community such as the one on Discord might encourage the prolonged holding of grievances and facilitate toxic behaviors like doxxing.

In contrast, Hugging Face presents a compelling alternative. As a platform already populated with the models we discuss, it offers a convenient and relevant space for our conversations. The community on Hugging Face is known for its friendliness, and unlike platforms such as Reddit, it is not overly moderated, allowing for more organic and free-flowing discussions. Most importantly, Hugging Face is a hub where people actively use and discuss LLMs, making it an ideal environment for our community.

I strongly believe that creating a thread on Hugging Face would be more beneficial for our group, fostering a more inclusive, productive, and enjoyable discussion space. Thank you for considering this alternative. I look forward to our continued interactions and the insightful discussions that lie ahead.

Yours sincerely,

mistral-large-2407
>>
File: file.png (20 KB, 721x319)
>>103076659
If you use something like tabbyAPI you can set the prompt format in the config, koboldcpp also allows you to configure the prompt format
>>
>>103076773
is vLLM faster than tabbyAPI?
>>
>>103077016
I haven't tried tabby
>>
>>103077029
Huh, ok. I guess I should give vLLM a try anyway, I always use tabbyAPI because it supports continuous batching but I feel like it isn't very optimized for this kind of use.
>>
>>103077016
Not the same use case. vLLM is for serving multiple users or running multiple prompts in parallel
>>
>>103077184
You can run multiple prompts in parallel using tabbyAPI, that's why I asked if vLLM is faster.
>>
>>103076712
which 6000 series card? I have a 6700x and use the same setup (vulkan via kobold, though I use sillytavern for chats) but I'm using Mistral Nemo Instruct 12b Q5, using 32 gpu layers, at 16384 context size (though some people prefer smaller) along with the settings >>103076741 describes.
For me, I can definitely get them to be mean, and I can definitely get them to describe things like a book/story. This partially has to do with the character card and starting prompt.
I initially tried some settings I saw randomly somewhere that led to it being more chat like and I agree it can be quite sterile and boring though perhaps more "realistic" in a sense.
Could be your settings (don't forget to set context size inside of sillytavern along with all the other stuff that other anon/the lazy guide says to do. I use 400 context max per message and 16384 total for the model I use), could be a shitty character card.
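For reference, the equivalent koboldcpp launch looks roughly like this — flag names are from koboldcpp's --help, but adjust layers/context to what your card actually tolerates:

python koboldcpp.py --model Mistral-Nemo-Instruct-2407-Q5_K_M.gguf --usevulkan --gpulayers 32 --contextsize 16384

Then point SillyTavern at the KoboldCpp API it exposes (default http://localhost:5001) and set the same 16384 context inside ST.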
>>
>>103076959
>Dear Esteemed Members
https://youtu.be/NLIY8Mq49e0
>>
>>103077221
6700xt*
>>
>>103077198
vLLM is faster if you can fit an FP16/FP8 model, as they have more optimizations going on. Their GGUF support is ass.
>>
>>103076712
>>103077221
Oh and just to be super clear since I know how confusing this is at first. I'm specifically using Mistral-Nemo-12B-Instruct-2407-GGUF.
>>
Ok so maybe I was a bit hasty. Noob 0.1, 0.5, and 1.0 all have their strengths and weaknesses I guess. 1.0 is CRAZY good at the "outstretched hand" prompt I have. So good that actually only 3 out of the 15 images I generated had unacceptably drawn hands. In contrast, now that I test this same prompt with the other models again, it is maybe like the opposite, where only 20% of the hands are good. Still cool that they could get hands right that often, but 1.0 is on another level. Maybe overbaking epochs really is all you need.

0.5 also has a weird thing where the colors are biased towards red/yellow on this prompt. Perhaps bleeding in from the Teto-related tags.
>>
>>103077300
I like this Teto
>>
>>103077338
>>103077338
>>103077338
>>
>>103077016
Tabby is faster for single user and possibly comparable for throughput if the exl2 gains still carry. Qwen 72b got 12 t/s in vllm vs 18 t/s in tabby. vLLM's claim to fame is total throughput. Tabby also has continuous batching, so it might be comparable



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.