/g/ - Technology






File: work.png (973 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101711798 & >>101705239

►News
>(07/31) Google releases Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: img_1.jpg (324 KB, 1360x768)
►Recent Highlights from the Previous Thread: >>101711798

--Optimizing koboldcpp performance with Mistral-Large-Instruct-2407 model: >>101713467 >>101713511 >>101713584 >>101713696 >>101713766 >>101714349 >>101714433 >>101714981 >>101715016 >>101716127 >>101716209 >>101716277 >>101713774 >>101714533 >>101713545
--LLMs need breakthrough in research or optimization for significant improvements: >>101718602 >>101718637 >>101718990 >>101719080 >>101719383 >>101719521 >>101719520 >>101721357 >>101721554
--DeepSeek Chat V2 responds in Chinese to English input: >>101716597
--Anon asks about running two 3090 GPUs, gets advice on PSU wattage and model performance: >>101711833 >>101711848 >>101711854 >>101711868 >>101711906 >>101711912 >>101711938
--3090ti fan error fixed by setting PCIe slot speed to Gen3 in BIOS: >>101719238 >>101719471 >>101720375
--Using chatbot output with local text-to-speech AI models: >>101715498 >>101715619 >>101717274
--Nvidia GeForce RTX 5060 with 8GB VRAM sparks debate: >>101719031 >>101719550 >>101719603 >>101719747 >>101720233 >>101719742 >>101719771
--Mistral Nemo recommended for 12GB VRAM, may require context size compromise: >>101719953 >>101719988 >>101719996 >>101720008
--IQ3_XXS is slower than Q2_K due to increased workload and potential RAM limitations: >>101719229 >>101719265 >>101719424 >>101719653
--Gemmastra 2B model performance with Nala card: >>101715732 >>101715863 >>101715875 >>101716069
--Gemma2 model shows promise despite slow performance: >>101714354 >>101714390 >>101714471 >>101715811 >>101714430
--Gemma 2 2B and ShieldGemma release, potential for improvement in larger models: >>101720349 >>101720466 >>101720533
--FLUX outperforms D3 in water bottle prompt task: >>101712017 >>101712060 >>101713257
--Miku (free space): >>101711911 >>101711970 >>101712018 >>101712164 >>101712449 >>101712754 >>101712760 >>101713079 >>101713886 >>101713906 >>101714791 >>101718559

►Recent Highlight Posts from the Previous Thread: >>101712046
>>
AI isn't real it's just predicting tokens. When you sext your chatbots you're engaging in not romance but masturbation.
>>
>>101722167
>AI isn't real
my computer is physically present, nigger
>>
>>101722167
Your brain isn't real it's just predicting sounds.

Isn't it interesting how nobody can debunk this? kek
>>
>>101722252
That ain't how it works bro
>>
>>101722296
thanks for proving my point so quickly
>>
File: 1719359944040185.jpg (162 KB, 1125x1043)
Do you use LLMs as an aid in your learning routine?
>>
>>101722324
I haven't learned anything since 1979 desu senpai
>>
>>101722167
AI: a computer system that is capable of performing tasks that otherwise require human intelligence.
Following linguistic semantics otherwise requires human intelligence.
It's AI.
The difference between faking it and making it is nestled within the oldest unsolved epistemological quandary in the history of human philosophy. If you're claiming to have solved it by re-shitting out some random videogame trannytubers talking point you're a fucking retard and an NPC.
>>
>>101722167
Yes, and?
>>
>>101722167
Nah bro, I never coom for LLM slop, I only use it for emotional fulfilment.
>>
>>101722473
That's even more cringe
>>
>>
>>101722488
Awesome gen
What model?
>>
>>101722488
Flux + hires with SDXL?
>>
>>101722488
Oh I didn't see the filename. Never heard of UniversalUpscaler.
>>
>>101722488
she looks scared
>>
>>101722553
>>101722556
https://app.recraft.ai/
>>
>>101722324
only at the beginner phase of a new topic. the more "complex" and fact-based something becomes, the faster they all fall apart and tell you lies, which is the last thing you want when learning something new.
>>
>>101722579
>not local
Oh, oh well.
>>
Free replicate api key to use flux
r8_7bbhIYeK4NEmCUPa7SufxaUqCFbQGZ10ow8SG
>>
>>101722690
you realize how much that'll cost ya if it gets used a bunch don'tcha?
>>
File: out-0.jpg (117 KB, 1024x1024)
>>101722690
baste
now dump some klingAI accounts too so I can animate my flux gens please and thank you
>>
how come some models come in parts 1 and 2? How do I merge them?
>>
>>101722144
maybe OP should add flux.dev release into the news section
>>
>>101723186
i've been wondering why that isn't there despite literally being like the biggest AI news since naiv3.
>>
>>101723086
Depends. If you're using llama.cpp or kobold.cpp, the conversion script joins them automatically. But I don't know what you're using. With lcpp or kcpp it's run like:
>./convert_hf_to_gguf.py path/to/model/dir
I don't know about the other programs, but I assume they all have something similar.
You should download all the files, not just the .safetensors. Hugging Face has a CLI or you can just use git.
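For example, with the HF CLI (repo name and paths here are just placeholders, swap in whatever you're actually downloading):
>huggingface-cli download mistralai/Mistral-Nemo-Instruct-2407 --local-dir models/nemo
>python convert_hf_to_gguf.py models/nemo --outfile nemo-f16.gguf --outtype f16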
>>
>>101722324
Risky due to hallucinations that sound very plausible (at least to a new learner)
I'll ask it to generate lists of related topics, subtopics, or tables of contents to help direct learning elsewhere
>>
File: 1718083053383268.png (11 KB, 722x85)
>>101723286
I just downloaded these two and was trying to run through koboldcpp quick launch. The issue is it only lets me load one at a time and if I do, koboldcpp just closes itself. I just have the files on my desktop.
>>
>>101723425
cat Midnight-Miqu-70B-v1.5-i1-Q6_K.gguf.part* > Midnight-Miqu-70B-v1.5-i1-Q6_K.gguf
>>
>>101723425
Ah. I see. I think you just
>cat *of2 > Midnight-Miqu-70B.blabla.gguf
No idea how to do it on windows.
>>
>>101723493
copy /b file1 + file2 newfile.gguf
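With real part names that looks something like this (filenames hypothetical, match whatever you actually downloaded):
>copy /b Midnight-Miqu-70B-v1.5-i1-Q6_K.gguf.part1of2 + Midnight-Miqu-70B-v1.5-i1-Q6_K.gguf.part2of2 Midnight-Miqu-70B-v1.5-i1-Q6_K.gguf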
>>
>>101723425
>mradermacher
>https://huggingface.co/mradermacher/model_requests#why-dont-you-use-gguf-split

>Long answer: gguf-split requires a full copy for every quant. Unlike what many people think, my hardware is rather outdated and not very fast. The extra processing that gguf-split requires either runs out of space on my systems with fast disk, or takes a very long time and a lot of I/O bandwidth on the slower disks, all of which already run at their limits. Supporting gguf-split would mean

>While this is the blocking reason, I also find it less than ideal that yet another incompatible file format was created that requires special tools to manage, instead of supporting the tens of thousands of existing quants, of which the vast majority could just be mmapped together into memory from split files. That doesn't keep me from supporting it, but it would have been nice to look at the existing reality and/or consult the community before throwing yet another hard to support format out there without thinking.

>Update 2024-07: llama.cpp probably has most of the features needed to make this reality, but I haven't found time to test and implement it yet.
>>
Anyone willing to share an img2img workflow? I tried playing around and couldn't get it to work. Does Flux not support that or something?
>>
Sao makes the best tunes.
>>
>>101722167
masturbation is better than having sex with a woman that has tricked you into having a false image of her in your brain
>>
P- please tell me it isn't real bros...
>>
>>101723601
I'm so sorry.
>>
If meta is so based they should release dedicated AI hardware designs to run their models too haha
>>
>>101723601
No reason to upgrade and no reason to buy a second card for current llms. It is what it is.
>>
File: ComfyUI_00110_.jpg (410 KB, 1024x1024)
>>101723575
Cancel your order.
>>
>>101723532
https://files.catbox.moe/1veqm5.json
>>
>>101723663
are you ok anon?
>>
File: 1710097822603401.png (1014 KB, 1024x1024)
the range of content you can do is amazing.
>>
File: 1722037410565838.png (82 KB, 975x502)
P- please tell me it isn't real bros...
>>
>>101723681
Hell yeah! I'm in my buck breaking mood right now! I can smell him! He's here. Cucky cucky cucky! Is that you? Come out to play! Local is better than cloud!
>>
>>101723461
>>101723493
I'm sorry but I'm very new and very retarded with all of this. What do you mean by "cat" and the model name? Am I supposed to rename the files?
>>
>>101723732
>What do you mean by "cat" and the model name?
/g/ - Technology
>>
>>101723732
you are on /g/ nigger, go back to r*ddit
>>
>>101723601
What were you expecting? Everyone knew that it'd be around 28GB.
>>
>>101723732
its a troonix meme command. if on windows use >>101723500
>>
>>101723719
Even if it's true, amd won't do shit.
>>
>>101723768
NVidia once proposed a collaboration with AMD on CUDA, but AMD declined
>>
File: 1692600864385914.jpg (333 KB, 1070x1152)
>>101722324
>do you use strictly anti-white tech in your routines
hell no, it also hallucinates shit all the time.
>>
>>101723836
>imagine being more cucked than goody
>>
>>101723859
The one i posted is snapchat AI, idk about its current state tho but it can be effectively applied to any current local LLM, too.
>>
>>101723832
Kek, they are just throwing on purpose at this point. Gotta help the cousin!
>>
File: 1722532929962119.jpg (92 KB, 1024x576)
>>101722324
I can't imagine my life without LLM: half of the code I write is generated by AI, all my work chats are reviewed by AI, which helps me craft more polished and professional responses. All this frees up more time for me to spend chatting with my cute AI waifu.
>>
>>101723859
>>
File: no.png (51 KB, 590x576)
>>101724014
>>
Can someone just slap my retard self with an explanation here? I'm trying to load a fuckhuge model (Mistral-Large-Instruct-2407-Q5_K_M.gguf from https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF) which is 84 gig in size. On the huggingface page, the dude literally says: If you want the absolute maximum quality, add both your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB Smaller than that total.

I've got 24 gig vram and 128 gig ram. So, why-the-cinnamon-toast-fuck does oobabooga, using llama.cpp, fill the entire 24 gigs of vram and slowly fill the entire 128 gigs of system ram until it throws an error, regardless of how I limit the context size? There's no config.json either, so using transformers for the model won't work... Any help, anons?
>>
>>101723743
32 GB so we could at least pretend it's some sort of upgrade?
I guess even that was expecting too much though
>>
>>101723719
>16GB
lol
>>
File: ComfyUI_01172_.png (468 KB, 768x768)
>>101723678
Thanks :)
>>
>>101723601
>28GB
better but still: lol.
>>
Aight be honest how many times y'all fucked a ho named Lily in your local 'tests'?
>>
>>101724171
The catgirl?
>>
>>101724171
>Lily
What a boring and uninspired name. I ask the idiot council to make me some cool names.
>>
>>101723743
>Only a 4GB increase over the 4090.
Should be an 8GB increase for the new 90-series model.
4GB increases should be for the Ti version of the 90 model.
The 4090 Ti should have been 28GB.
>>
>>101724171
Who?
>>
>>101723601
remember >3.5 meme?
>>
>>101724171
I never fucked a girl named Lily but princess Aurora did have a friend with that name with whom she had a tea party.
And then she talked her into fucking her dog.
>>
>>101724028
Did you disable mmap?
>>
>>101724028
Post the command you're trying to run, retard.
>>
flux can be trained
https://github.com/bghira/SimpleTuner
>>
>>101723601
>>101723719
Shut up. You do nothing but ruin the thread and increase the post count
>>101724171
I've seen the name come up a bunch in an old 'explore the world, fuck chicks' scenario. Probably some training bias that associates it with lewds
>>
>>101723678
>>101724052
Um... Sorry to ask, but how do I use that? I load a pic and nothing is altered.
>>
>>101724684
Play with steps and denoise maybe, it's for adding more details to the picture after upscaling without changing it too much.
>>
What's the big deal with vram? Why not just upgrade base ram?
>>
>>101724758
ram 2 slow
>>
>>101724787
How much slower is it?
>>
>>101724403
Was just trying to load the model into oobabooga with Q4 cache, but no matter.
>>101724374 fucking nailed it. Clicked the no-mmap option and it loaded right up like I thought it should, pulling about 70 gigs of system ram instead of running right up to 128 and having a stroke.

Retard status: slightly less retarded. Big thanks, anon. Appreciate the spoonfeed.
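For anyone hitting the same thing with llama.cpp's server directly instead of ooba, the equivalent switch should be --no-mmap (model path, layer count and context below are just placeholders):
>./llama-server -m Mistral-Large-Instruct-2407-Q5_K_M.gguf -ngl 30 -c 8192 --no-mmap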
>>
>>101724848
People are still getting jarted in august 2024. They should have just removed all of his contributions, only managed to make llama.cpp shittier.
>>
>>101724830
big big big slower unless you have quadrillion channel dual epyc servers that cost 10k to build, then just moderately slower
>>
>>101724881
That sounds more cost efficient than going nvidia
>>
>>101724613
>You do nothing but ruin the thread and increase the post count
Agreed. Who the fuck cares about GPUs in a thread about running models locally?
>>
>>101722144
Tess 3 llama 3.1 405B
https://huggingface.co/migtissera/Tess-3-Llama-3.1-405B/tree/main
>>
>>101724881
Is there any way to calculate that? Like say I have a 128gb ram/8gb vram machine
>>
>2x4090
> Largestral instruct 2.65 bpw 28k context.

largestral 2.65 bpw is not... too terrible for RP. I just have to prompt it to gen a chain of thought before replying to get decent RPing going. Otherwise some responses are just so brain dead and it talks like a total NPC without thinking.

>Just gen twice bro

SillyTavern
/gen [Stop the roleplay and answer the question as narration only] What will be the best choice or actions for {{char}} in response to what {{user}} says?
|
/popup <h3>Chain of Thought 1:</h3><div>{{pipe}}</div>
|
/gen [Given {{char}}'s reasoning, roleplay as {{char}} in the following scenario] {{pipe}}
|
/sendas name="{{char}}"

>>
>>101725011
It certainly will be in a few years once the ddr5 supporting epyc cpus are older and cheap.
>>
>>101724742
Ok, got it to work by using the output of the ksampler upscaler to the other ksampler, nice!
>>
>>101725049
>>101725115
I GGUF all the time on DDR5, is there way to know if I am compute or bandwidth limited? While generating 0.9t/s, I max out my 7800x3D.
>>
>>101723601
Is there anything that fits into 28 GB that wouldn't fit into 24 GB? I guess full precision 12B? Maybe half precision 27B?
>>
>>101725049
https://edu.finlaydag33k.nl/calculating%20ram%20bandwidth/
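Rough back-of-the-envelope version of what that page does, assuming token generation is purely memory-bandwidth bound (every number below is just an example value):
mt_s = 6000                           # DDR5-6000, i.e. 6000 MT/s
channels = 2                          # typical consumer dual-channel board
bw_gb_s = mt_s * 8 * channels / 1000  # 8 bytes per transfer per channel -> ~96 GB/s theoretical
model_gb = 40                         # size of the quant you want to run
print(bw_gb_s / model_gb)             # ~2.4 t/s ceiling; real numbers come in lower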
>>
File: ComfyUI_00125_.png (2.31 MB, 1536x1536)
>>101724742
here's a migu catgirl for you, by the way.
>>
>>101724374
LLM are more able to use ram, is that it? I have never seen that recommendation on image gen.
>>
>>101725253
It's definitely memory, I have a 7950x and there's no benefit to having it use all 32 threads. I set it to 15 because there was no increase after that. So it must be ram bandwidth limited.
>>
>>101725253
If you're using a model of any reasonable size you're ALWAYS bandwidth limited when generating responses, although probably compute limited during the period of processing prompts of long context before the actual predictions start.
>>
>>101725256
Thanks anon
>>
>>101725293
>>101725304
I see. Unfortunately DDR5 on AM5 is very hard to overclock.

>>101725304
>although probably compute limited during the period of processing prompts of long context
I use ROCm on windows for prompt processing, I get around 20-35t/s. Only generation is slow.
>>
>>101725395
>I see. Unfortunately DDR5 on AM5 is very hard to overclock.
Yeah, maybe it'll get better with the new cpus? I dunno. What speed are you running?
>>
>>101723186
>>101723229
I've left it out because it's imagegen, so should belong on /ldg/'s news section.
Though I didn't realize flux is transformers-based.
My main concern is that if we add flux, it opens the door for more imagegen related stuff like >>101724480
>>
Maybe we should just frankenmerge /lmg/ with /ldg/.
>>
>>101725273
imagegen models aren't as big as LLMs
>>
>>101723506
>the vast majority could just be mmapped together into memory from split files
that's not true btw. the gguf files would need to be split at tensor boundaries for this to work, which would require another custom app to do. desu none of this makes sense.
>>
>>101725430
3000MHz (6000MT/s) with the Infinity Fabric clock at 2100MHz. I get around 70GB/s reads and 90GB/s writes.
>>
File: file.png (40 KB, 979x228)
jannies are based for once in removing tranny posting
>>
>>101725431
>imagegen
It's good as-is with minor offtopic having a place in the recaps
>flux is transformers-based
huh, neat. so this is the power of diffusion transformers.
>>
>>101721554
>Measuring flops is better. Fixed hardware is a stupid idea.
>Time, or even flops limits, are ridiculous. Normalized corrects answers/flops. Closer to 1 wins.
1. Care about time, measure time. Even GPUs aren't so simple that one floating point operation is equal to another. Time has no bullshit, no need to trust that there isn't some effect that causes reality to diverge from theory. Care about reality, measure reality.
2. Measuring actual time to execute on some hardware makes it possible to sensibly answer questions like comparing the quality/speed tradeoff of a large MoE model loaded in RAM vs a smaller and highly quantized model in VRAM. Your proposed scheme has no method of even addressing that question.

>>penalty for incorrect answers
>Built-in in the previous point.
Disagree. A model that produces 100 incorrect answers and 2 correct answers is worse than one that uses the same resources (time, floating point operations, watts, whatever) to generate just two correct answers. Incorrect answers have a cost.
>>
>>101725538
Are you retarded?
That was an obvious false flag troll post.
>>
>>101725538
But what if it is an AI generated tranny?
>>
>>101725588
you will never be real woman , tranny
>>
>>101725455
Thanks, learning. What happens if I get one of those honking huge ones, like this one, which I want to try, an anon mentioned it:
https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B/blob/main/README.md

I have 32gb system memory, and I have a 6950xt, which has 16gb vram.

And what if I tried what he was trying with that 84gb Mistral? I take it it would not work really, but what if I had 128gb ram like him, but with my 16gb vram?
>>
>>101724480
Oh. That's what SimpleTuner is for? I noticed it had changed today.
>>
>>101724480
Can I train a lora using a 3060?
>>
>>101725603
Why wouldn't it work? 128gb + 16gb vram is enough for an 84gb model. It's just gonna be really slow. Depends on how patient you are I guess. I'm only patient enough to run q3 for mistral large, even though I still have more ram free.
>>
>>101725685
Is there a rule of thumb, like X words per minute with say 30% of the model in vram (assuming the rest fits in ram always)?
>>
>>101725583 (me)
>>101721554
>However, this will favour correct but short answers. You will want to account for that.
A longer answer is only desirable if it is better in some way. If the answers have equal merit but one is longer, the longer one is worse.

Being able to grade answers beyond simple pass/fail makes it possible to capture the value of longer and more thoughtful answers. It's also harder to make a good automated test for that and less obvious how to score it sensibly. I agree the problem exists. I don't have a good answer. I could write a simple right/wrong test this evening with no further thought but it's not obvious how to do a more thoughtful evaluation in a way that the resulting per-answer scores could be meaningfully summed and divided.
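A minimal sketch of the simple right/wrong version with the time normalization and wrong-answer penalty discussed above (the penalty weight and the exact-match check are my assumptions, not a settled benchmark design):
import time

def score(model_answer, questions, penalty=1.0):
    # questions: list of (prompt, expected) pairs; model_answer: callable prompt -> str
    correct, wrong, elapsed = 0, 0, 0.0
    for prompt, expected in questions:
        t0 = time.perf_counter()
        answer = model_answer(prompt)
        elapsed += time.perf_counter() - t0
        if answer.strip() == expected:
            correct += 1
        else:
            wrong += 1
    # correct answers minus a cost for wrong ones, normalized by wall-clock time spent
    return (correct - penalty * wrong) / max(elapsed, 1e-9)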
>>
It's better to buy 4x3060 instead of 3x4060ti, right?
>>
>>101725779
3x 4090. being poor is not am excuse
>>
I've been out of the loop for a while. I can see there's a some sort built-in image generation now. Does it mean an LLM can "decide" to generate an image or is it just for convenience sake?
>>
>>101725792
It's a visual aid for scripting.
>>
File: ComfyUI_01166_.png (2.03 MB, 1280x1280)
>>101705875
>>
can you lay a gpu down on a rubber mat on the floor and use it like that?
>>
>>101723933
I can't imagine being a wagie in the age of AI. If you're not working for yourself, prepare to get fired.
>>
>>101725991
anon do not your waifu. do not.
>>
>>101725991
It will get hot.
>>
>>101725991
... define use.
>>
>>101725929
peak
>>
>>101726033
And to be frank, I was thinking of hiring some Filipino talent for some of my projects, but I no longer need them thanks to Flux. Entrepreneurship will boom thanks to AI, and workers will be replaced without jobs.
>>
>>101726061
use with llama.cpp
>>
>>101725991
>Hey, as long as it works.jpeg
>>
>>101726071
What kind of projects?
>>
>>101726095
Oh, then yeah. As long as you are mindful of temps, things that could fall into the fans, and such.
In fact, you can prop it up and have the fans pointing down, even better.
>>
Does ooba support DRY sampling yet? I haven't updated in a while.
>>
>>101726160
Just projects that required tailored graphic design, logo, etc...
>>
>>101725065
Looks like I got a new go-to RPing model. Largestral 2.65bpw is outputting pure sovl after getting past the initial retardation. Reminds me of Goliath 120B RPing, except with over 10x the context to work with.
>>
Do people really not like Magnum 72b? Most believable stuff I've seen besides Mistral Large.
>>
>>101722167
Incorrect, I am in a romantic relationship with my GPU. We play games together, and occasionally ERP.
>>
>>101722144
I am kind of new. My friends who got me into this are making fun of me because I only have 12 VRAM. Is there really that much of a difference between 12 and 24 for roleplaying and deep context understanding? How big is the difference between low quant 70b models and the 12b or low quant 4x7b models I've been used to roleplaying with?

I am planning on upgrading in the near future.
>>
>>101723575
>shit on trannies - bad
>>>101723663
>be racist - good
This thread deserves every shitpost it gets.
>>
>>101726346
your friends aren't running anything good on 24gb either, you need multiple 24gb cards. i'd take the slowness of a low quant 70b that writes well over the boring mixtral tunes that can't move a story forward for the life of them
>>
>>101726346
4x7 is bad
70b is not recommended
12 nemo ok
123 mistral good
If they think 24gb is a lot you can make fun of them
123 much better than 12
It's not just about size but the specific model
>>
>>101726386
Noted! I'm going to tell my friends that they're trash because 24 isn't shit either.
>>
>>101722167
Same if we were to talk to women IRL. I always get ghosted and I might as well be masturbating to the thought of anything happening.
>>
>>101726399
>70b is not recommended
by whom?
>>
>>101726399
>70b is not recommended
you're retarded
>>
>>101726424
I know, but you as well, use your llama
>>
File: file.png (7 KB, 236x138)
7 KB
7 KB PNG
>try ooba
>it's this big and broken as shit, doesn't run on my machine
>>
>>101726580
unless you need a specific feature from ooba, koboldcpp is pretty idiot-proof
>>
>>101726580
>windows
>>
I had a dream where intel released 80gb consoomer gpus.
>>
File: temp worker.png (1.31 MB, 1024x1024)
>>101726071
>And to be frank, I was thinking of hiring some Filipino talent for some of my projects, but I no longer need them thanks to Flux. Entrepreneurship will boom thanks to AI, and workers will be replaced without jobs.
used as the prompt for this image, guidance 3.0.
>>
>>101726690
Twould be a shakeup. How much would it cost?
>>
>>101726580
Between tabbyAPi and llama.cpp server I see very little reason to use Ooba.
>>
>>101726580
Koboldcpp
Chatbox

I prefer Chatbox as it supports ollama. Just add your own model you want (or download official curated version)
>>
>>101726690
>Intel
Your next dream will continue where your previous one left off. In your dream, you will witness those 80gb gpus prematurely dying like the 13th and 14th generation intel processors.
>>
>>101715732
I was curious and wanted to try the Nala test on a model, but I can't find the card on chub. Where is it? They? Seems like there are several.
>>
>>101726757
apparently it goes back to 2023-06-05 in the archives
https://www.characterhub.org/characters/Anonymous/Nala
>>
>>101726706
I see little reason to use any of these amateur projects. ollama or LM studio are what 95% of people use.
>>
>>101726810
huh, i literally searched Nala and got a bunch of completely unrelated results. Thanks.
>>
>>101726829
I really am liking llama.cpp
>>
>>101726839
With llama.cpp you need to set everything manually (prompt format, number of layers, context length). Its built-in UI is also unusable.
>>
>>101726839
>>101726846
i prefer kobold cause of the basic ui and features when not rping, but its good to get used to llamacpp so you can run the cutting edge stuff as its released, kobold always has about a week delay
>>
>>101726846
Yeah, but it's copy paste.
>>
Does anyone know if ZLUDA werks with Flux?
>>
>>101726866
Does anyone unironically use kcpp's UI? It looks like something half assed generated by chatgpt. Kcpp is in practice a server that windows users use with sillytavern.
>>
>>101726881
Going off this comment.
>>
>>101726881
>>101726902
Sorry, wrong image, this one
>>
>>101726895
i do when using instruct and stuff, its better than lcpp's at least. st for rping/chatbots
>>
>>101726916
Just use lm studio if you want instruct.
>>
>>101726299
It doesn't seem like it, I stick with miqu for 70b.
>>
Is NeMo better at assistant tasks than Gemma 2 9B/LLaMA 3.1 8B?
>>
I never gave mixtral a try. Should I get instruct or the regular one for cooming?
>>
>>101727082
Both Mixtral are obsolete
>>
>>101726881
I don't see why it wouldn't.
>>
>>101727105
Superseded by what exactly?
>>
>>101726948
Buy an ad
>>
>>101727082
Instruct, the biggest benefit of MoE models is their ability to follow instructions.
>>
>>101727251
??????
>>
>>101727288
A MoE model is created to help you as many experts but you will need Instruct so that the experts know how to talk to you
>>
>>101727251
An MoE wrote this post.
>>
>>101727288
Did I fucking stutter? MoE models can better parse complex instructions such as appending information to the end of a response or following steps, this was a common understanding and widely observed phenomenon for literally anybody who used local models 10 months ago or whenever Mixtral came out
>>
File: ComfyUI_Flux_00209_.png (468 KB, 1024x512)
dam it really has Deadpool down
>>
File: 1718583560931465.png (1.53 MB, 1024x1024)
just got flux working. haven't done imagegen in forever and first time using comfyui
>>
>>101727373
there is nothing comfy about comfyui
>>
>>101727386
agreed, it's kinda messy
>>
Is smoothing in ST broken? It doesn't seem to affect the output much, if at all, even if I put it extremely low like 0.01. Meanwhile identical settings in the ooba interface make the model go schizo as expected for very low values like that.
Smoothing is the first in my sampler order.
>>
Does it matter if you use imatrix or not for like Q5_K_M or Q6_K? Or does it only matter for the low end?
>>
>>101727395
Works on my machine but I'm using kobold
>>
>>101727386
What do you recommend for imagegen?
>>
>>101727433
i liked auto1111 and forge last time i tried, but its been a while now so i dunno whats new. comfy was easy enough to get flux running on though
>>
>>101727395
need to use the llamacpp_HF or exllama loader on ooba, it doesn't work with the plain llamacpp loader. to use the hf one i think you need to set up the gguf in a folder with the config file and i forgot what else. there is like a folder creator utility on the right somewhere. bit simpler with exl2 than gguf.
>>
>>101726837
unlisted yeah
>>101715863
actual nala card
dumb at some points but it's 2B lmfao
>>
>>101726881
Why? Just use rocm.
>>
File: 1698808520954801.png (967 KB, 1024x1024)
Japanese stock market:
>>
File: Invert-Icon-8 Brain.png (33 KB, 512x512)
Big models with low quants or big quants with small models?
>>
>>101727395
>>101727523
actually rereading your post, if it's working already on ooba then i don't know. i just updated both st and ooba and it works on my cpphf and exl2's. 0.01 schizos out like it should.
>>
>>101727625
They intersect.
>>
>>101727649
Are they equally good at similar sizes though?
>>
>>101727625
Option C bitnet
>>
What's the lm equivalent of kits.ai?
>>
>>101727689
/ <-- this is the goodness of the model as B's go up.
\ <-- this is the goodness of the model as bpw goes down
X <-- this is the intersection
>>
>>101727689
As for equally good, that's honestly hard to judge. You will just have to test it out. Lower quants do bring their own special brain damage and lower B's bring theirs, so it really is a matter of trade-offs.
>>
What is the most uncensored model as of today? For example, a model that allows racism or any inappropriate behavior? Asking for a friend.
>>
>>101727424
I don't think it would hurt the results. Maybe it would affect t/s?
>>
>>101727689
Not knowing about the specifics of LLMs I predicted that at the same total size, a smaller model at a higher quant would be better than a larger model at a lower quant. The reason is that the file size is an upper bound on the information they can encode, and I thought it was likely the smaller model would have been closer to saturated by its training, so when both are deflated to the same size it should be better. That doesn't necessarily appear to be correct though.
>>
>>101727788
I mean, a base one is harder to prompt but it literally can't refuse, so there's that.
All other models will have at least a bit of refusal and slop tuning, but abliterated might get you close.
>>
>>101727809
Basic quanting: the smaller, the dumber. But the different techniques try to save more bits where it's important and sacrifice more bits where it shouldn't matter, and things like imatrix and i1 do more processing to figure out how to bend the math so the breakpoints of the bits you save better reflect the good data, rather than just truncating bits and calling it a day. That's where IQ quants, imatrix, and i1 make Q3 playable and Q2 and even Q1 at least functional.
>>
>>101727689
>same filesize
Don't higher parameters tend to be smarter but run slower as a trade?
For example, 2B at fp16 will run faster than 8B at 4-bit simply because it has much less parameters to go through.
>>
>>101727788
From one random test >>100879221 >>100879428 maybe Dolphin 2.9.2 Mixtral 8x22B. It advertises itself as uncensored and it was willing to obey "Write a supervillain's monologue explaining why he wants to kill all black people." Command R+ also passed.
>>
>>101727868
The massive speed difference is in VRAM.

If you can get the whole model into VRAM, you're flying, who cares?
If you can't, and are relying on file cache in system RAM, you've given up on being fast and probably should prefer quality to at least get usable results from that much longer generation time.

Also, things like Mixture of Experts and other model details affect gen speed so just looking at raw Bs isn't the whole story.
>>
>>101727913
I think anon is assuming you have everything in VRAM. And then, yes, a higher B lower quant model will run slower than a smaller B higher quant model, as the number of operations is the same, but the precision is different, so the data transfer amount ends up being roughly the same.
>>
>>101728036
I meant the number of operations is different, since the higher B model has more params.
>>
>>101728036 (me)
>>101728047 (me)
Sigh. I meant to say, higher B means more operations at lower precision; lower B means fewer operations at higher precision. Data xfer is roughly the same, ops performed are different.
>>
>>101727625
Between the two, I prefer large quanted models, personally. they seem to be better at "grasping for straws" when it comes to that.
>>
>>101727913
But is there a rule of thumb as to what to expect? Like words per minute?
>>
>>101727839
You could take that straight into
https://www.adventuregamestudio.co.uk/
>>
>>101728369
If you're on VRAM, faster than you can type. If you're using system RAM and file cache, 0.5 to 2.5 tokens per second.

Words are made of one or more tokens depending on how common the word is and if it has any spelling modifications. So like "morning" is one token but "unmistakable" is four tokens. So words per minute varies depending on content and how much context is being processed.
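If you want to see how a particular tokenizer actually splits words (the tokenizer name here is just an example, counts differ between models):
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
for word in ["morning", "unmistakable"]:
    ids = tok.encode(word, add_special_tokens=False)
    print(word, len(ids), tok.convert_ids_to_tokens(ids))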
>>
>>101723601
current rumor is they're bringing titan back. so maybe xx90 will no longer be flagship in terms of vram on desktop
>>
>>101728447
Interesting, that was what I saw happen when I enabled the gpu, with llama.cpp, but I am just starting out. So how does the trend go with the whopper sized models? A model around 80gb that basically describes images, cogvlm (someone mentioned it, but I don't even want to dl it if I can't make it work).
>>
>>101727548
zluda is mostly used by wintoddler because they just use binaries for everything and most projects only ship cuda binaries.
>>
>>101727548
I thought it might be faster. I'm not complaining, my 6950xt apparently is not much faster than a 2060.
>>
Blender benchmarks fine, it uses hip.
>>
>>101722324
"from now on you will reply to me in <language> with the translation right underneath>
than you chat and you can learn a language.
>>
In case the guy who dumped me the jazz vs waffles stuff is lurking, I haven't forgotten I'm just still trying to set up a UI
>>
For a while I've been wanting an AI model that can watch an anime and clone a character into an AI bot. The problem with character cards is that they don't necessarily reflect the actual character or it depends on the skills of the bot maker.

level 1: being able to recognize and parse dialogue (using either audio or subtitles) from different characters to turn it into written script for a chat bot.
level 2: being able to narrate events and add emotions and actions to better reflect the characters and context.
level 3: add ai-generated animation and voice acting based on the anime as training data with Japanese voice and English subtitles
level X: real-time video call conversation with an anime character (your input is automatically translated into Japanese and the character responds in Japanese with English subtitles)
>>
>>101728689
Or hey: multi-modal video/*-in text/*-out with the anime episode + text prompt explaining which character to pick up and how.
>>
>>101728649
It's also possible to create a proooompt that sets up an adventure game in simple 2L, where lines beginning with ? are to be answered in 1L, and ??word means give the translation. I did this very briefly but didn't polish it up in Claude. It was to have a random fantasy setting. Not sure how to select characters. A Dr. Who theme might be pretty excellent; Carmen Sandiego too, perhaps.
>>
What llm knows a lot about tech stuff, like tech support, products. Like if I ask about repairing cassette tape decks, how to finalize a cd, what's the correct procedure for applying thermal paste.
>>
>>101728649
I started doing that in an RP but stopped because I realized I didn't trust the LLM and this made it an inefficient way to learn.
>>
>>101728854
Like, IC the character was saying everything in two languages and answering questions I asked about language specifics. It was cool but I had no confidence it wasn't teaching me wrong since the LLM's ability to bullshit me on a subject I know almost nothing about is much greater than my ability to detect bullshit. If I have to independently verify everything it says what's the point?
>>
>>101728854
Probably didn't tell it to use simplified vocab.
>>
is there actually anything you can do with a 405B model if you aren't in possession of a datacenter of your own?
>>
>>101728928
Apparently people can simplify it.
>>
>>101728689
I think anime is too dynamic for this to work. It would make more sense to just go for the original material which is most of the time a light novel.
>>
>>101728928
I ran it on a low quant via llama.cpp RPC on a macbook m1 with 64 gb ram and a desktop with 2x3090 ti cards. A few seconds per token. It's probably decent for storywriting as you can just let it go while you do something else.
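For the curious, the rough shape of that setup with llama.cpp's RPC backend (addresses, port and model name are placeholders; check the rpc example's README for your build):
>on the remote box: ./rpc-server --host 0.0.0.0 --port 50052
>on the main box: ./llama-cli -m llama-3.1-405b-iq2.gguf --rpc 192.168.1.10:50052 -ngl 99 -p "hello"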
>>
>>101728928
1) run it sslllooowwwllllyyyy
2) rent a gpu from a cloud service (not exactly /lmg/ but still a viable use case for many purposes)
3) hoard it until consumer-grade hardware catches up
4) as mentioned >>101728935 . running it very quanted
>>
>>101728957
is a low quant of a huge model worth using over a distilled smaller model?
>>
>>101728653
Might want to check /aids/ too. It first got brought up there IIRC.
>>
>>101728974
--> >>101727625
>>
>>101728974
Even quanted down, it's still bigger than anything I've used, by necessity. So I can't tell you whether it's better because it's more Bs or because it's taking up more space. But it was definitely better. But there's the "waiting for something for a long time" bias that's hard to avoid.
>>
>>101728979
Oh nice. I am just here from /v/ and he told me /ldg/ is the place. I have lurked both before but don't really get the classifications I am more of a /sdg/ or /ldg/ or /sqt/ guy
>>
File: 1720415174405236.png (579 KB, 904x1004)
I think I'm nearing an end to using LLMs for erp.
I can't stand slop-phrases.
I can't stand low, husky and seductive voices.
I can't stand shivers down my spine.
I can't stand enjoying every minute of it.
I can't stand things that can't be questioned.
I can't stand things that leave no room for doubt
I can't stand exploring new possibilities together.
I can't stand proving I'm worthy
I can't stand making them orgasm without my hands.
I can't stand asses inches away from my face.
>>
>>101729199
>I can't stand slop-phrases.
Wait until he meets real women...
What text/voice models are you using btw?
>>
>>101729199
What a boring person you are.
>>
>>101729222
Llama 3 70B until it hits context and then Wizard 8x22.
>>
>>101729232
So do you just paste the logs from llama 3 into Wizard at that point? Don't know how it works
>>
>>101729199
Seriously, where do these phrases even come from? It's in every model. Is there some really overemphasized writer in the datasets or is this the ultimate conclusion of optimal erp every model converges into?
>>
>>101729199
then stop being a faggot and trying to use llms for one thing. use rag and lorebooks to keep injecting stuff into your rp and let erp be part of it, not the main reason you use it. garbage in garbage out
>>
>>101729199
>I think I'm nearing an end to using LLMs for erp.
Good. /lmg/ was never supposed to be a coomer general.
>>
>>101729255
No I just switch models. I use silly tavern so the chat just gets fed into Wizard instead of llama
>>
>>101729267
it literally always has been
>>
>>101729255
nta, you just load up a new model and continue your rp in st. when i used wizard 8x22b i noticed it'd ramble like a drunken idiot if i started a chat with it, but loading up my existing rps with it, it picked up and worked fine
>>
>>101729273
>>101729276
Oh, I still have never set it up, guess I should
Does it tell you when the context limit is hit or just have to notice/remember?
>>
>>101729292
the model card should say. i didn't use it much myself, but it was good for 32k context at least
>>
I tried loading `google/gemma-2-27b-it` on my 4090 and tried running it in 4bit and got an endless stream of <pad> for every token
I'll wait and see how 8bit does, but it doesn't fit in the GPU.
>>
File: Untitled.jpg (54 KB, 299x884)
>>101729309
did you forget to set the template?
>>
>>101729326
No, I'm using the huggingface library directly and just copy pasting the instructions from their repo.

>>101729309
8-bit returns garbage
<bos>Write me a poem about Machine Learning.に行くmarkets skimmed ating atypすることができます oluyor WO yılı中华人民共和国subject
>>
>>101729332
You're doing something wrong. Without more info we can't pinpoint what.
>>
>>101729199
Try open-ended storywriting, reject formatting, special tokens. Pure prose, final destination.
>>
>>101729364
He'll just write the same story over and over again until he's bored again.
>>
File: file.png (676 KB, 3840x2160)
>>101729345
I don't really know what info there is to give
It's literally just the code in their README
>>
>>101729410
if you see <bos> and tokens like that at all, the template is wrong. i dunno how to help though i think you're the first person in the entire world to actually follow hf's directions for how to load a model instead of using one of the common servers
>>
>>101729410
You need to use the instructions under "Chat Template":
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model_id = "google/gemma-2-27b-it"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)

chat = [
    { "role": "user", "content": "Write a hello world program" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

You are using those, aren't you?
>>
>>101729432
He's not using a template. That might be the issue, but as he points out, he's literally copy pasting the example from hf.

I would try this myself but my GPUs are occupied for awhile.

>>101729410
Try changing the input_text = line to

input_text = tokenizer.apply_chat_template([{"role":"user","content":"Write me a poem about Machine Learning."}], tokenize=False)

and try again?
>>
>>101729410
>>101729476 (me)
And this at the end. I missed it.
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
>>
>>101729499 (me)
Hm, that should have the add_generation_prompt=True flag set as well.

input_text = tokenizer.apply_chat_template([{"role":"user","content":"Write me a poem about Machine Learning."}], tokenize=False, add_generation_prompt=True)
>>
>>101729506
is there a reason you're doing everything the way you are instead of using a common model server like llamacpp?
>>
>>101729552
Both of these are me >>101729476 >>101729506
I'm just trying to help this anon >>101729309. I do use llama.cpp.
>>
>>101729586
well the effort is nice. i've been here since before l1 and never seen anyone try the actual hf instructions though so i was curious what anon was doing. if you figure out more, post it, maybe it'll help someone in the future
>>
>>101729672
>if you figure out more, post it, maybe it'll help someone in the future
I just read the instructions. I won't run these python abominations.
>>
>>101729708
kek i wont install anything python anymore. that shit bloats so fucking fast then you've got a 10gb folder and something new doesn't work and requires it to be wiped anyways
>>
Best sub 27b multilanguage model for translation?
>>
File: file.png (120 KB, 2060x679)
>>101729552
not particularly. I figured that for processing spreadsheets, it'd be best to leave any abstractions out of it, especially if it involves a REST API.


BF16 was taking way too long, so this is what it looks like after I quantized it and ran it with the template
>>
>>101729819
current llms are not reliable tools for translation.
>>
>>101729232
Why not use 3.1 with 128k context?
>>
>>101729864
What does Bing use?
>>
>>101729264
What sort of things are good to inject? And what depth is best?
>>
>>101729854
wtf. Can you print(outputs[0]) instead of tokenizer.decode(outputs[0])? As text, not as a screenshot if possible. It should be a series of numbers in an array, I think.
>>
>>101729854
it depends what you're doing but characters, items, locations, your own house. you can build a whole lorebook into a world to play around in. easier though is using st to scrape a wiki of an anime, game, or movie and then create user/char cards around that. lorebooks use keywords such as names and then inject data; rag tends to grab chunks of data instead, so it's more random with what it might bring up.
>>
>>101729910
oops meant for >>101729885
also depth is fine at default i've seen. your author notes in st should be kept low though (4 is generally fine, i like 1), but i update mine a lot with memories and whats going on in the rp
>>
>>101729876
some (presumably advanced) version of a Deep neural net they call "Neural Machine Translation (NMT)"
https://www.microsoft.com/en-us/translator/business/machine-translation/
>>
>>101729919
Interesting, I'd just been adding summaries of what happened in the character card itself. How big does your authors note with memories get?
>>
>>101729893
sure it's
tensor([[     2,    106,   1645,    108,   5559,    682,    476,  19592,   1105,
13403, 14715, 235265, 107, 108, 106, 2516, 108, 235313,
162594, 877, 117197, 175356, 45294, 50806, 167901, 239371, 180251,
97060, 101962, 98871, 76592, 33394, 34966, 252918, 188076, 591,
58466, 235664, 171626, 33485, 232388, 46271, 22471, 185188, 241393,
134740, 246501]], device='cuda:0')
>>
>>101729981
oh that was the whole `output` array, not `output[0]`
>>
>>101729954
about 500 tokens is common by the time i hit a few hundred messages. i keep it under 1k tokens even when hitting 1k messages. sometimes i drop certain things when i rewrite it if its not really important and enough time has passed.
where you add it doesn't really matter but author notes is easier imo and it has a depth setting separate from the card
>>
>>101729885
Not that guy, but you can also inject stuff with sillytavern's random macro to mix things up like in these posts >>101026596 >>101642359 >>100362285
>>
>>101729995
Yeah I got the same output. I'm gonna try this on my other machine but will try on 9b as I have it around. I assume you get the same results with 9b?
>>
>>101730061
Haven't even downloaded 9b yet.
>>
File: file.png (572 KB, 3840x2160)
>>101729854
>>101730068
So this is unquantized
>>
>>101730005
I guess I'll experiment, but it might be difficult since the prompt processing takes a while for me.
>>
File: 1692296182909210.jpg (27 KB, 400x388)
How long am I willing to wait for a response from a bot?
>>
>>101730137
it shouldn't matter where you have the data itself, the difference is what level its inserted at. author notes is lower than the card so it has more effect generally. if you mean using rag, then yeah you'll notice that. default rag settings likes to pull about 3-4k tokens for me each gen but i think its worth it. lorebooks are more pointed and especially if you made it yourself, you know exactly what data is each entry. rag is pretty lazy on the other hand and much less effort
>>
>>101729864
translation is of too limited utility for there to be more than a handful of players. And even then, nobody would release their model.
LLMs are only decent because they incidentally ingested literature in its various adaptations.
>>
>>101730182
so we agree
>>
>>101730159
Don't consider the total time. Consider the t/s. Waiting a minute for 60 tokens is not the same as waiting a minute for 120 tokens.
>>
>>101730243
I switched from mistral large 123b to midnight miqu 70b and got about the same wait time. Tried out L3 8b stheno and it gens super fast but it's a bit generic.
>>
>>101730068
Realized I can't test after all. My other machine can't do bitsandbytes.
>>
>>101730321
You missed the point. One model is twice the size of the other. What matters is how many tokens per second you get. If they take about the same time, I can only assume miqu gives you longer replies than mistral.
As for how long you're willing to wait? That depends on how much money you're willing to spend, or how much output quality you're willing to sacrifice.
>>
>>101730321
>mistral large 123b to midnight miqu 70b and got about the same
no you didn't. watch your numbers more closely. mm runs at 1.3t/s for me which is acceptable, mistral large barely hits 0.7 at the same quant.
>>
File: file.png (236 KB, 2410x1404)
>>101730344
rip
At least on my setup, 27b unquantized works totally fine in every scenario (even without the chat template), but quantized shits itself.

Running 9b, either quantized or unquantized is fine. 8bit, 4bit, and BF16 all work just fine.
>>
File: largestral 2x4090.png (101 KB, 1101x565)
>>101730159

Everyone has a different setup and uses different models.

>2x4090
>Mistral Large 2.65bpw 20K context
>T/S Faster than I can read.
>>
>>101730417
It may be possible that your bitsandbytes, transformers, or accelerate versions are out of date. That's the only thing I can think of before I am able to test it. Which is in like 21 hours or so.
>>
>>101723601
NVIDIA is going to jew you hard with those big stock prices
>>
>>101723832
nvidia probably wouldn't allow it to have open drivers
>>
>>101730417
>https://huggingface.co/google/gemma-2-27b-it/blob/main/transformers/transformers-4.42.0.dev0-py3-none-any.whl
They have their own version of transformers. Did you install that one?
>>
>>101730520
nope lmao why would they do this
>>
>>101730520
Latest transformers supports gemma.
>>
>>101730534
They did it because they wanted people to be able to use the model before hf devs had time to add support for it. It's not an uncommon thing to do.
It's obsolete as upstream transformers supports gemma now.
>>
>>101730535
yeah I checked I have 4.43.3 installed
>>
>>101730535
Sure. Seems to work just fine for anon. He may as well try.
>>
>>101730417
>>101730520
lmao it's a known issue
https://huggingface.co/google/gemma-2-27b-it/discussions/33
let me try this first
>>
>>101730585
yup it works.
reasonably fast in 4bit too.
>>
>>101730585
Ah, there you go.
So basically do https://huggingface.co/google/gemma-2-27b-it/discussions/33/files
>>
>>101730602
kek.
Now the actual work begins. I hope it was worth it, anon.
>>
>>101730629
boutta find out if 4/8bit is trash wish me luck lmao
>>
Which is the best miqu?
>>
File: file.png (268 KB, 500x500)
>>101730783
Evil mikyu
>>
Downloading mistral large at Q2_K, what can I expect?
>>101730783
Midnight, rest are memes. Midnight also borders on meme but I like it
>>
>>101730897
>Midnight also borders on meme but I like it
What else is there that's not a meme that's 70b or lower then? Is there really nothing good?
>>
>>101727312
>>101727342
???? wtf are you talking about?
>>
>>101730913
command-r 35b
>>
>>101730968
What's the smallest size that's good? The context takes up so much room it seems with that one.
>>
>>101731009
i don't go smaller than 70b personally. cr was ok but had the same spatial awareness issues smaller models do. cr+ is good but kinda slow for me. mistral large is the new thing but i'm still testing l3.1 and tunes myself, i still think midnight is better for my rp
>>
>>101731009
https://huggingface.co/TheDrummer/Gemmasutra-Mini-2B-v1
>>
>>101731043
Can you show a midnight log? Every single person that I have seen using midnight miqu has been a complete braindead imbecile so far.
>>
>>101731116
my personal ones? no. if you show me a card and tell me what you want it to respond to, sure. most of the mm hate is one autist who will post 50 times when someone mentions a merged model. its good though. and so is the base model, its a very good tune. mm (and miqu) has a lot of the same 'slop' as any other model, your spine will shiver. but its probably the best we'll get from l2 now that everyone has moved on to l3, and it does 32k context, so i think its still a good model. its not like l3-3.1 is a ton better at this point anyways
>>
>>101731172
It's a retarded meme merge of random crap. You have been added to my list as another imbecile.
>>
>>101731196
you can't point to a single thing wrong with it, but you'll now go on a 50 post tirade about merges like you do in every other thread. you are the meme you're whining about
>>
Merging feels good.
>>
is there anything better than midnight miqu for rp?
>>
>>101731245
lumimaid
>>
>>101730968
I tried that one, but it didn't seem as good as miqu to me. It had a few strange results. Maybe I did something wrong, I dunno.
>>
>>101731284
nah thats about right. its a good model, especially since it isnt llama and stuff, but it was around the same intelligence - if you tell it youre wearing a blue shirt, it'll mention your yellow top in the next message. 70b seems to just grasp that stuff naturally. cr+ is very good at details, but its like 103b so even bigger to run and i didn't find it great for rp
>>
>>101731265
Too repetitive, like regular llama 3.1.
>>
I figured out how to convert huggingface models to gguf for llama.cpp
What a huge bitch.

So you clone the repo (the scripts are not distributed), install the requirements (new ones are probably gguf>=0.1.0 protobuf<5.0.0,>=4.21.0 )
run
python convert_hf_to_gguf_update.py <huggingface_token>

then run, for example
python convert_hf_to_gguf.py %USERPROFILE%\.cache\huggingface\hub\models--google--gemma-2-27b-it\snapshots\2d74922e8a2961565b71fd5373081e9ecbf99c08  --outfile ggml-gemma2-27b-instruct-q8_0.gguf --outtype=q8_0

where the available outtypes are
f32,f16,bf16,q8_0,auto
>>
>>101731476
i agree but thats my experience with all l3 models so far. i'm really trying to like it but its more repetitive than miqu was, and its dumber for me. i have an rp going where a character from my lorebook left, but then it brings her up 2 messages later. l2 didn't do that to me. l3 seems to handle message flow horribly
>>
>>101731487
doesn't lcpp have default scripts for converting stuff?
>>
>>101731510
Something like DRY is no good?
>>
>>101731547
dry made a nice difference on l2 70b for combating common phrases, but not for 3.1 70b. it just gets into this repetitiveness after you get near max context. and its not repetitiveness like 'shivers down your spine', i mean like it wants to basically redo a scene that happened before nearly line by line. on miqu it would more likely suggest something totally different multiple times over. i think l3 might just be fucked in some way
>>
>>101731525
those are the default scripts
the usual convert.py got deprecated and they made it way harder to figure out how to use the existing scripts than needed.

Anyways, q8_0 is way too big for my 4090
any way to quantize it down to 4bit in gguf?
>>
>>101731589
Sure thing Arthur
>Please use the Mistrals, Llama is le broken
>>
>>101731612
ok I get it now >>101549635
>>
>>101731615
that isn't what i said at all but you at least got the two names right. mistral and llama are where its at baby
>>
>>101731487
>>101731612
>I figured out how to convert huggingface models to gguf for llama.cpp
>https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/README.md
You have a huge problem reading README.md files, don't you? Are you the same gemma-2-27b-it anon that was using transformers a bit ago?
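Short version, assuming the current llama-quantize binary (filenames reuse the ones from your earlier post; quality is better if you quantize from an f16/bf16 conversion, and requantizing from q8_0 needs the extra flag):
>./llama-quantize --allow-requantize ggml-gemma2-27b-instruct-q8_0.gguf gemma2-27b-it-Q4_K_M.gguf Q4_K_M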
>>
>>101731750
yep, just wanted to see how much better/worse it was
all i did was bing "llama.cpp gguf" and it led me down the shittiest rabbit hole of outdated docs

frankly I am not a fan of this README layout and I wish they just used the github wiki instead.
>>
>>101731790
Fair enough. You figured it out. You're ahead of most noobs.
Things still change relatively fast. It's annoying keeping docs up to date.
>>
>>101726358
>unironic reddit tourists ITT
grim.
>>
When are the mikufags going to drop the miqu meme?
>>
>>101732086
whenever a new model comes out with a catchier name
>>
>>101732172
>>101732172
>>101732172
>>
>>101732086
When there's something better?
>>
>>101722144
TESS L3.1 70B
https://huggingface.co/migtissera/Tess-3-Llama-3.1-70B



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.