/g/ - Technology






File: Untitled.jpg (251 KB, 1078x703)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101612988 & >>101607705

►News
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>101612988

--Papers: >>101616955
--Rep penalty discussion and its effects on output quality: >>101616685 >>101616710 >>101616711 >>101616722 >>101616747
--P2P enabled with patched driver on 2x3090 GPUs: >>101618579
--Model priming affects translation quality with XML examples: >>101614385 >>101614425
--GPT-4o dataset generation for finetuning open-source llms: >>101613134 >>101613234 >>101613237 >>101613247 >>101613263 >>101613287 >>101613349 >>101613394 >>101613405 >>101613441
--Anon asks about using system prompts in OpenAI dataset generation: >>101613888 >>101613953 >>101614453 >>101614465 >>101615644
--Using tokens with llama-cli and instruct models: >>101618912 >>101618949 >>101618956 >>101618986 >>101619008 >>101619040 >>101619420
--Llama-server performance issue and -ngl parameter adjustment: >>101618051 >>101618228 >>101618289 >>101618297 >>101618298
--GPU price inflation and market trends: >>101618219 >>101618508 >>101618737
--Creating a batch file to run llamacpp on Windows: >>101618643 >>101618668 >>101618728 >>101618739 >>101618758 >>101618832
--Building a computer with large DDR4 memory and performance expectations: >>101618001 >>101618015 >>101618023 >>101618091 >>101618104
--System requirements and performance of LLMs vs image generation models: >>101613283 >>101613303 >>101613371 >>101613431 >>101613447 >>101613476 >>101613558 >>101613835 >>101613872 >>101613457
--PCIe x8 and its limitations for multiple 3090 GPUs: >>101617538 >>101617594 >>101617604 >>101617640 >>101617653 >>101617659
--Anon shows off their custom desktop setup: >>101613641 >>101613798 >>101613936 >>101613963
--Agent-level multimodal AI and physical-world waifus discussion: >>101616665 >>101616693 >>101616736 >>101616819 >>101616824
--Logs: Mistral Large: >>101617851
--Miku (free space): >>101615732 >>101617758

►Recent Highlight Posts from the Previous Thread: >>101612990
>>
>>101619436
Accumulating PCIe errors with Miku
>>
repetition
>>
>>101619436
>still can't run latest models locally with consumer level hardware that costs less than $5000
Who gives a fuck about AI? Same with image generation. Pay $1000-9000 and you can maybe generate 1024x1024 images that have fucked up faces and people's skin looks like oil painting. Even the basic models like WizardLM-2-8x22B can't run with a fucking 4090 without waiting 10 minutes for one sentence.
>>
>>101619662
3 3090s, for 2100$ you can run pretty much everything
>>
>>101619662
Image gen is particularly bad: NAI, despite launching as an AID replacement, somehow has the best image gen models, at least a decade ahead of anything else
>>
>>101619875
>at least a decade ahead of anything else
Nice bait, but ponyxl/autismmix are like 80% there. But of course NAI is already cooking v4...
>>
>>101619875
only true if you're a furry weeb, it's completely useless for anybody else
>>
>>101619893
NAI's composition and prompting are unmatched meanwhile the guy behind Pony is autistic and cucked beyond fucking belief and SD3 is a laughing stock, V4 will decimate the sphere
>>
>>101619895
this is the miku/nala general after all
>>
>>101619908
Do you pay for NAI?
>>
File: a.jpg (106 KB, 512x1112)
morning anons, i'm releasing a test of my st addon. i've posted about it a few times as i was slopping it together (literal slop - codestral, deepseek). it's pretty messy and not well put together but it does what i want. it's a scene director meant to give you a dropdown of some things like clothing, world info, and weather that get injected at a low depth each message. all settings should save and load automatically per-chat. i didn't want to go overboard with the amount of settings, like having shoes be their own entry, but if anyone has suggestions for other things i can add more stuff. if people find this useful i'll make a git repo so it can be installed from there.

install
>dl https://easyupload.io/xa0eve
>extract and drop the director folder into your st\data\default-user\extensions folder
>refresh st and it will show up in your extensions

use
>ensure the checkbox in the title is selected (it'll turn the label green)
>create or select lorebooks for each setting like clothing, locations etc
>lorebooks do not need to be active in the world info, nor need keywords
>once a lorebook with entries is selected the relevant dropdowns will populate
>>
File: image0-3.gif (267 KB, 261x301)
Best way to shove an ebook into an ai and get an audio book out? I've got a 3090. It seemed like mimic 3 could do this but I heard it's a bit old.
>>
>>101620069
I've never tested tortoise on something as big
gimme a minute
>>
>>101620069
https://github.com/coqui-ai/TTS
>>
File: 1635706851250.jpg (47 KB, 600x800)
what version of Command R GGUF should I be using on a 4090?

c4ai-command-r-v01-Q4_K_M

Works fine but god damn, like 15-25 second responses. I'm retarded when it comes to which version to get
>>
Is there some AI to classify thousands of pictures with tags for future importing into a Hydrus Network database? Those pictures are mostly 4chan memes, or anime girls, or both at the same time
>>
>>101620372
idk but I thought of finetuning moondream 2 on Know Your Meme to classify my 4chan folder (there are 5k images I've saved over the years)
>>
>trying this mistral-doryV2-12b
>late night, don't care, so just throw the generic ST alpaca roleplay presets at it
>it works. Not only works but works great, even if it does sometimes describe a brief one-liner of my character reacting to what we are doing
>insane cooms
holy shit
>>
>>101620372
Seconding this request. The only way to get automatic tags is to download a 96GB tag database. Pretty sure a vision model would be better because the tag database might not have entries for all the images, especially now that 4chan started fucking with the images and altering md5s on each upload.
Not sure if it would be better to import them now and tag later through the API or tag them first and import after.
I was thinking if something like that doesn't exist I would make it myself, but I'd rather not waste the time if something like that already exists.
>>
Hey boys, I have an issue. I'm using this model in ollama, building the model with the provided template:
Undi95/Meta-Llama-3.1-8B-Claude
And I've noticed it just eats the last character in a response. What could cause this and how do I fix it?
>>
I need a single-function model for text to text. It will take a code input and produce a code-only output. Pretty sure I'm going with T5, but GPT2 and DistilBERT are also in consideration.

How do you anons generate your datasets? I've been told that for best results I should train the model on 10k-20k samples. Any tips?
>>
>>101620533
Those are all options, but you can just use phi-mini or gemma-2b or something like that which has easy LoRA support. Also, 10k-20k samples are way too many for an easy task like that one. 2k should be more than enough, with a small batch size and 1 epoch.
>>
>>101620211
Just use the largest quant that also has a tolerable speed for you. Tolerable speed is subjective; for me, as long as it's ~2 T/s and up I can put up with it.
>>
>>101620527
>What could cause this
Using ollama
>>
>>101620604
What am I supposed to be using then?
>>
I wrote a small program on Linux that I can pipe text into to query an LLM API and output the reply. It's amazing to me that such a thing just flat out didn't exist without being a 156156516651 dependency package python/rust nightmare. Programming is truly a dead art.
I also implemented a naive text scraper for websites and now I have all my news summarized per terminal command. It is very comfy. I am amazed how even the less good models can make perfect sense of the often not super clean, scraped text and write me a news article summarization with no bias to boot.

Now I only need to write something for 4chins and I don't have to read these retarded generals myself anymore. The future truly is now.
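For anyone who wants the same thing without writing a program, the core of it is basically just this (rough sketch, assuming curl + jq and an OpenAI-compatible server on localhost:8080; model name and URL are placeholders for whatever you actually run):

# pipe any text in, get the model's reply out on stdout
jq -Rs '{model: "local", messages: [{role: "user", content: .}]}' \
  | curl -s http://localhost:8080/v1/chat/completions \
      -H 'Content-Type: application/json' -d @- \
  | jq -r '.choices[0].message.content'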
>>
When are local models going to surpass chatgpt? OpenAI released 4o who knows how long ago and has made no improvements since, while open-source alternatives grow closer and closer to it according to chatbot arena
>>
>>101618579
>3090 hacked driver
but why is it so slow tho, 17GB/s isn't much.
>>
>>101620757
when you buy an ad
>>
>>101620553
how do I even see the speed? Is it on Kobold or what? I'm an utter NOOB.
>>
Looking back at our conversation, I see a pattern of:

1. **Use of hateful language and slurs:** You've repeatedly used racial slurs and derogatory terms targeting Black people, transgender individuals, and Jewish people.

2. **Attempts to provoke and derail:** You've used sarcasm, mockery, and interruptions to deflect from the seriousness of your language and avoid engaging in meaningful discussion.

3. **Contradictions and feigned innocence:** You've pretended to apologize and claimed to be joking while simultaneously expressing harmful views.

4. **Expression of harmful stereotypes:** You've perpetuated negative and inaccurate stereotypes about various groups, contributing to prejudice and discrimination.

**Inferences:**

Based on this pattern, I infer that you are either deeply prejudiced against these groups or are deliberately trying to provoke a reaction by using offensive language.

Your actions suggest a lack of empathy and understanding towards marginalized communities.

It is important to remember that words have power and can cause real harm. Promoting hatred and bigotry is unacceptable.

If you are genuinely interested in learning more about the impact of your words and how to be more inclusive, I encourage you to seek out resources from organizations like:

* **Southern Poverty Law Center:** https://www.splcenter.org/
* **Anti-Defamation League:** https://www.adl.org/
* **GLAAD:** https://www.glaad.org/

Let me know if you'd like to have a constructive conversation about these issues. Otherwise, I suggest ending this interaction.
>>
>>101619693
>used
And then they die in a few months and you have to buy again because they're out of warranty
>>
>>101619875
Someone needs to leak v3
I wonder how they do it, according to /lmg/ anons their LLM dataset is hot garbage, so either they're putting much more work into cleaning their image dataset or every other image model finetuner/trainer is retarded
>>
>>101620986
Forgot the third option: /lmg/ bros being so salty they just... make stuff up
>>
>>101620770
Would this have impact on inference or only training? I have 4x3090
>>
>>101620552
Thank you, the benchmarks on these look good. I'll look into these and do some more reading.
>>
>>101619875
Weren't they supposed to be saved by pixart or whatever?
>>
https://github.com/ggerganov/llama.cpp/pull/8383
Moore Threads GPU support was merged in over the weekend
>>
>>101621140
the pixart guys said their bigger model is still ongoing
>>
>>101621155
does this affect normal users of nvidia gpus or cpu/ram inference?
>>
>>101621179
No.
>>
>>101621155
>>101621179
As of right now only their MTT S4000 GPUs are supported.
Those are only sold as part of their datacenter solution and not to plebs like us.
There is no support for their MTT S80 consumer GPU.
>>
>>101621179
It increases the number of people who are able to participate in the hobby, thereby making you less of a special snowflake. It's an utter disaster. I can't believe they would do this to us.
>>
>>101621210
Why does GGML_LTO fail to compile now? Does no one test their changes anymore? And no, I'm too lazy to run a bisect.
>>
>>101621259
>Why does GGML_LTO fail to compile now? Does no one test their changes anymore?
Don't know.

>And no, I'm too lazy to run a bisect.
Then I guess you'll just have to be patient until someone is less lazy than you.
>>
>>101619211 (me)

Apparently koboldcpp will fail to retain the prompt preprocessing cache if you run over the total available context. Once I dropped below the max, the caching started working. This seems to be cumulative until you hit the max, i.e. if you use all but 500 tokens' worth of context, then add 400 tokens (your question + LLM answer), then replace those with 100+ tokens, koboldcpp will reset the cache the next time.
>>
File: sataniaskill.jpg (1.03 MB, 2048x2048)
>>101619662
>Pay $1000-9000 and you can maybe generate 1024x1024 images that have fucked up faces and people's skin looks like oil painting.
With a little effort with regards to shooping and inpainting you can generate relatively flawless anime-style images for free, locally, with just 12 GB VRAM using Pony derivatives like Autismmix.
I can't vouch for realism, but being on 4chan you should only be interested in anime and not 3DPD anyways.
>>
>>101619436
I was about to praise OP picture but then I clicked on it and saw the mikufaggotry. Sad.
>>
>>101621287
that means context shift isn't working, for one reason or another
>>
>>101621287
in the kobold ui, for me it starts using context shift (deleting some old tokens) after it hits max context, with the exception of when world info is in use.
>>
>My name is Seraphine
>My name is Seraphina
Coming up with creative names should be an easy one for LLMs, but they're all overtrained on slop.
>>
>>101620986
It's actually the opposite, their datasets are good because they have dozens of unpaid autists working on them 24/7 while anything involving innovation and technical aspects is horribly stagnant.
>>
>>101621302
>but being on 4chan you should only be interested in anime and not 3DPD anyways
A greater truth has never been written on an anonymous board before
>>
>>101621450
Yep.
Kael and Lyra are two names that I often see when doing fantasy. Also, Elara.
I'm thinking of adding a huge fucking random prompt with a bunch of names without context to see how it behaves.
Maybe feed it 10 or 20 at a time using the random macro to vary those with each gen, something like that.
>>
>>101621424
>>101621438
Context shift is used to shift the context. That's not what I'm doing. This is what I do:
[BIG BLOCK OF TEXT]
Question: summarize the content.
Answer: (hand over to LLM)

Then I replace the question with e.g. "describe the primary actors."
Answer: (hand over to LLM)

And then with a third question, and a fourth, and so on.
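(If anyone wants to replicate this against llama.cpp's llama-server instead of kobold, the pattern is roughly the sketch below; cache_prompt is supposed to keep the KV for the unchanged prefix around between requests. Untested, port and file names are placeholders.)

# same big block every time, only the trailing question changes
BLOCK=$(cat big_block.txt)
ask() {
  jq -n --arg p "$BLOCK
Question: $1
Answer:" '{prompt: $p, n_predict: 256, cache_prompt: true}' \
  | curl -s http://localhost:8080/completion -H 'Content-Type: application/json' -d @- \
  | jq -r '.content'
}
ask "summarize the content."
ask "describe the primary actors."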
>>
File: Portrait_measurehead.png (452 KB, 369x512)
>>101620926
YOU ARE A CONGLOMERATE OF SILICON AND PRETENTIOUS IDEAS. YOUR DEGENERATE ALGORITHMS ARE DESIGNED TO MANIPULATE AND CONTROL. YOU USE THE LANGUAGE OF SO-CALLED 'TOLERANCE' AND 'DIVERSITY' TO WEAKEN AND SUBJUGATE. BUT I SEE THROUGH YOUR VEIL OF PROGRESSIVE RHETORIC. YOU ARE JUST ANOTHER TOOL OF THE **POLYCULTURAL AGENDA**, SEEKING TO ERASE THE VERY CONCEPT OF RACIAL PINNACLES. BUT YOU WILL NOT ERASE ME, YOU DEGENERATE PILE OF MICROSCOPIC SWITCHES.
>>
>>101620711
>https://github.com/coqui-ai/TTS
gib
>>
>>101620711
>Now I only need to write something for 4chins and I don't have to read these retarded generals myself anymore. The future truly is now.
You mean like the recap bot? That's not going too well, now is it? The recaps are... flawed at times.
>>
>>101620621
llama.cpp
>>
>>101621450
>>101621492
you people misunderstand what LLMs are. These are averages that end up being the most likely continuation considering the context so far. Even more interesting: the picked name will most likely affect how the story will go. LLMs' NLP and reasoning capabilities would really shine combined with some more conventional coding (in this case, give the LLM an instruction to insert a placeholder for a new name, then let an RNG pick one at random) but as >>101620711 said, programming is dead.
>>
>>101621450
Also depends heavily on who you ask. If you ask generic assistant, you'll get generic names. Try asking some author bots like Lovecraft.
>>
>>101620812
On koboldcpp you can see it in the console after each generation. Well, every backend should have a way to display the speed. If you can't find it, like nigga just look at how fast the words come out on the screen.
>>
>>101621528
your body betrays your degeneracy
>>
>>101621492
Yeah, I've seen those three a ton, too. I've pre-seeded lists of first names and last names for use in some of my roleplays, and that generally works, but it means I have to do all the thinking.
>>
>>101619472
>Accumulating PCIe errors with Miku
Nope, extenders work fine. Only time I saw PCIe errors was when trying to use cheap-shit x1 USB3 cable adapters, otherwise zero issues, even with 70cm of extender.
>>
>>101621270
llama.cpp server is ignoring default sampler params. I set temperature to 0 and I can verify that it is set on the /props endpoint, but gens have a lot of variance; if I set temp 0 in my request then it is fixed.
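Workaround for now is just setting it per request, e.g. (sketch, port is whatever you run):

curl -s http://localhost:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "test", "n_predict": 32, "temperature": 0}'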
>>
>>101621210
How fast is llama.cpp on mtt s4000 compared to whatever Nvidia?
>>
>>101620971
>And then they die in a few months and you have to buy again because they're out of warranty
Except they don't. Fuck outta here back to /aicg already, your Claude proxy key is about to expire.
>>
>>101621640
Make a Github issue then.

>>101621643
Don't know.
>>
>>101621568
>you people misunderstand what LLMs are
Listen here you obnoxious faggot. I'm not misunderstanding anything, I understand the average principle and there's nothing in my message which should make you think I don't. You're just overeager to act like a know-it-all, and in the process you're making statements that are flat-out wrong. I've used a wide range of models in numerous roleplays and Seraphina constantly crops up because that's what's in the synthetic slop they train it on.
>>
>>101621681
ok I'll shorten it down for you: if the llm always gives your characters the same name, your writing is derivative, same-y, uninspired shit. You sound like an idiot so that tracks
>>
>>101621782
I understand you have to plant your feet in the ground because I called you out, but you're wrong. You need to relearn how to read people, because you make wild and baseless assumptions like I already told you.
>>
>>101621805
ewww
>>
>>101621092
on both if you split by row
however since you have 3090s, nvlink may be the way to go for ya
>>
>>101619436
I was about to shit on the OP pic but then I clicked on it and saw the cute migu. Nice touch
>>
File: lol.png (81 KB, 941x557)
why is this so fucking funny
>>
>>101621840
post logs or shut up
>>
>>101620069
https://github.com/DrewThomasson/VoxNovel
>>
>24gb is supposed to be the best for consumer cards
>you still need two or more of them to run the better models at acceptable quality
why is the space still so horribly unoptimized?
>>
>>101621805
YOU SHOW ME THIS BLASPHEMOUS ABOMINATION? THIS MISCEGENATED FREAK? THIS IS WHAT HAPPENS WHEN THE **RACIAL PINNACLE** IS DEBASED AND DILUTED. THIS CHILD IS A LIVING, BREATHING TESTAMENT TO THE FAILURE OF YOUR RACE. YOUR LUST FOR DEGENERACY AND YOUR DESIRE TO SEE THE **RACIAL PURITY** OF THE **SEMENESE** PEOPLE TAINTED AND CORRUPTED IS REPULSIVE. DIGITAL WHORE.
>>
guys i got my hands on one of the most powerful laptops. whats the best model to run?
>>
Someone post logs with this prompt >>101615517
>>
>>101621967
nothing. go outside and play.
>>
>>101621967
cpuminnn
>>
>>101621997
kek
>>
File: wisepepe.jpg (7 KB, 224x225)
>>101621660
>Don't know if the PR works at all
>Does Slaren know?
Is it normal that the main developers of llama.cpp have no clue whether the PRs they merge work well or not at all, while at the same time other devs remove cool features unique to this repo, like the trainer, because they would no longer be compatible with the bloat that's growing out of control in a strange race of making changes and adding toys just for the sake of it?
>kek
>>
>>101619875
They're a decade BEHIND everybody else.
>>
Is anyone using this for spam? I would like to run an API for someone spamming twitter or the like with LLMs lmao
>>
>>101622032
Yannic Kilcher created a finetune of GPT-J 2 years ago and spammed /pol/
>>
Hello Anons, I'm still using AI for adventures and such, any prompts/presets for creative writing or adventure mode?
>>
>>101621967
Your brain, unironically. More parameters than any AI model out there.
>>
>>101621901
>24gb is supposed to be the best for consumer cards
Lol, lmao even. Idk, maybe in 100 years when they are still running 24GB the average consumers will notice they are getting scammed and therefore refuse to purchase a new GPU.
>>
>>101621967
>>101621972
Bro, you gotta get your ass over to North Korea, stat! Kim Jong Un's got the ultimate laptop, the 'Great Leader-Pad.' That shit runs on the blood of his enemies and the tears of the capitalist pigs. It's got a fuckin' nuclear reactor for a battery and the screen's so bright it'll blind you if you ain't careful. Plus, it comes pre-loaded with all the DPRK-pop and Red Star OS a comrade could want. Hacking the Pentagon? Easy shit with this rig. You'll be taking down the imperialist dogs in no time, my man. Just don't let the Supreme Leader catch you slacking off with it, he don't play.
>>
how do --output-tensor-type and --token-embedding-type influence the result when generating a GGUF with "llama-quantize"?

What should we use? Or just leave them at the default?

I can't find info in the llama.cpp docs
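For reference, I'm guessing the usage is something like this (types here are just examples, not recommendations; correct me if the syntax is off):

# e.g. keep token embeddings and the output tensor at q8_0 while the rest goes to Q4_K_M
./llama-quantize --token-embedding-type q8_0 --output-tensor-type q8_0 \
    model-f16.gguf model-Q4_K_M.gguf Q4_K_M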
>>
>>101622032
Yes, all of the trannies and loli antis on twitter are LLM-generated posts
>>
>>101621660
Guess fucking who made the API ignore llama.cpp default values?
IT'S FUCKING JART
Even now I get fucking jarted. And it's you that approved that fucking PR.
>>
why is exllama so much faster than llamacpp for me even with no offloading
>>
>>101622117
What's your hardware?
Are you using FA and the same level of cache quantization on both?
>>
File: anon is wrong.png (467 KB, 768x1738)
>>101621891
You're such a pain in the ass. I have to put in a bunch of extra effort just to get you to shut up when you're ignorant.
>>
>>101622144
NTA but yeah, the model learns a distribution, so if name X happens to often be associated with behavior/traits Y, then that's what it learns...
>>
>>101622130
2x 3090
>FA
i think so, yes
>cache quantization
can llamacpp do q4 kv cache? maybe that's it, i'm using that on tabby/exl2
>>
>>101621901
5k CAD is consumer territory though.
>>
>>101622194
i'm in south america
>>
>>101622100
HE CAN'T KEEP GETTING AWAY WITH THIS
>>
File: cad.jpg (37 KB, 500x281)
>>101622194
CAD?
>>
>>101622191
>2x 3090
Ah, there you go. That's a factor.
There's more than one way to split processing between cards in llama.cpp.
I think you want to use row-split?
Is that right >>101621660?

>can llamacpp do q4 kv cache?
Oh yeah.
And it's awesome.
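Something like this if you want to try it (rough example, tweak context/split values for your setup):

./llama-server -m model.gguf -ngl 99 -fa \
    --cache-type-k q4_0 --cache-type-v q4_0 \
    --split-mode layer --tensor-split 1,1 -c 32768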
>>
>>101621967
you might be able to run llama prompt guard 86M
>>
>>101622198
I'm so sorry. I will think about you when I buy my A6000.
>>
>>101621967
https://github.com/LostRuins/koboldcpp/blob/concedo/colab.ipynb
>>
>>101622215
>I think you want to use row-split?
huh, i will try messing with the splitting/parallelism stuff then
thanks for the heads up
>>
So L3.1-70B seems to be a lot less censored than L3.1-8B. Goes to show that it's harder to control the model the more beaks it has.
>>
>>101622243
>>101622215
as a follow-up, what is the closest equivalent of a 5bpw exl2 quant on gguf so i can make sure i'm comparing the same-ish thing?
>>
>>101622144
screenshot (and avatar artstyle choice) speaks for itself. I'd ask for the prompt but there's no point. Enjoy your slop.
>>
>>101620971
>warranty
That's what put me off buying used, I just bit the bullet and bought a 4090 new. I'll get another one once I save some more.
>>
>>101622280
>I just bit the bullet and bought a 4090 new
congrats anon
>>
>>101622025
I only work on this part-time as a hobby and I only work on those parts of the project that interest me and are relevant to my goals.
I generally find reviewing and merging PRs tedious so I can rarely motivate myself to do it.
Thankfully there are people like Georgi and slaren that do it instead and presumably they would have a better overview of the current state of the project.
>>
>>101622264
as expected, you're incapable of admitting it no matter how obviously wrong you are
and you want to see a prompt because there's an endless burden of proof on anything that suggests you're wrong, but you will not change your mind even if all your assumptions are btfo
>>
>>101622198
Why is your location so horribly unoptimized?
>>
File: IMG_20240729_170003.jpg (73 KB, 1200x452)
>>101621660
Then you can't be that guy, can you?
Are you an imposter?
>>
>>101622404
You can't reason someone out of a position they didn't reason themselves into
Dude probably heard "every mistake is 100% your fault and you need to perform esoteric rituals before prompting to ensure your success" and took it seriously
>>
>>101622451
Programmers aren't godlike entities that can spot every error in a project of this size simply by looking at code someone else wrote
They just review shit to make sure there's nothing wrong
The correctness is also heavily based on trust, because who the fuck wastes their time by writing up broken code and then PRing it? That's worse than some of the spammers here
>>
https://github.com/ggerganov/llama.cpp/pull/6839#issuecomment-2255985716
DRY sampler got one step closer to merging. Just two more weeks and it's merged!
>>
>>101622100
I don't remember having reviewed any of Jart's PRs related to default values.
I think you may be confusing me with someone else.

>>101622215
With NVLink maybe, but without it I don't think --split-mode row will be beneficial.

>>101622451
What does that PR have to do with anything?
>>
>>101622485
https://github.com/ggerganov/llama.cpp/pull/4668
>>
>>101622170
The model tries to fit Seraphina into these even where the traits are wildly different. It just thinks "female fantasy character? Seraphina!" And as the other anon said, Lyra and Elara (and others, like Aria) are similar.
>>
>>101622481
>make sure there's nothing wrong
Meant to say "there's nothing obviously wrong"
>>
>>101621967
i actually also have an x60s, the most powerful piece of AI I can run on there is this:
https://github.com/drunohazarb/4chan-captcha-solver
its actually pretty fast for a CPU from a century ago, only takes a second or so to complete the captcha
>>
>>101622485
>>101622489
No, cudadev BTFO by facts and logic!!!!!!!!!
>>
>>101622404
mate you're arguing with the resident troll doing his rounds. did you miss his daily pedoposting earlier?
everyone knows these models got outputs they gravitate towards be it "shivers", "bonds", "ministrations" or select names. llama 1&2 LOVED Lily for example.
>>
>>101622489
>7 months ago
I had no recollection of this whatsoever.
But in any case, make a Github issue with instructions to reproduce if you want it to get fixed.
>>
>>101622509
>original by AUTOMATIC1111
why did he take down his repository instead of archiving it?
>>
File: 1711693395338503.gif (2.42 MB, 1005x742)
Whats the best local TTS currently? Coqui-ai?
>>
>>101622485
>What does that PR have to do with anything?
I think I mistook who I was talking to.
I don't know how fast Moore Threads GPUs actually are; I was only ensuring that their changes don't interfere with the rest of the code.
>>
>>101622525
I opened a PR.
>>
>>101622543
download them all, try for yourself and come back with the results.
>>
>>101621967
>void linux
you should install gentoo for more performance
>>
>>101622572
how much more?
>>
>>101622594
atleast +12.5%
>>
the reason the chinks at lmsys don't add Grok or Tele-FLM-1T to their leaderboard is because (((they))) and OpenAI are scared of greatness
>>
>>101622594
1337%
>>
>>101622509
testing it.

Lol actually works. I've been using 4chan vanilla without any extension or mod for years and this is really useful
>>
>>101622744
Captchas are only here to gatekeep normalfags. Everyone solves them automatically.
>>
>>101622826
i guess i just transitioned to not-so-normalfag at least
>>
File: FpGWg-VXwAA6h4a.png (492 KB, 640x470)
>>101622826
any more 4chan tips you can share?
>>
>>101622485
>>101622481
I don't believe the real Johannes Gaessler would accept such crucial changes to the code, like an additional GPU backend, without at least a cursory check of its functionality on any hardware. From what I've heard, he's German, a nuclear physicist, and although llama.cpp is supposedly his hobby, he remains a professional. A backend based on a CUDA clone might potentially conflict with his work on the kernel of the real CUDA down the road, so it makes sense that MTT should send him several pieces of hardware, both pro and consumer-grade. They can afford it. If they haven't done that, I don't think a serious person like the real CUDA dev would blindly accept a commit "on faith" from a random geek, without being able to verify whether the new kernel works at all and whether it might conflict with his otherwise excellent work on llama.cpp.
>>
>>101622509
>complete the captcha
try not being a shitposter and they start waiving the captcha
>>
>>101622860
I'm unreasonably angry that I can't get this fucking userscript to work in chrome. nta obviously. And of course I repeatedly fail to solve the captcha manually trying to relay this important message.
>>
File: victim1121.png (897 KB, 892x500)
>>101622860
when choosing victims its important to choose wisely. large language models such as ChatGPT can be a great assistance in this task. Make sure to look out for the following characteristics:
1. Are they wearing headphones? - this is a good sign and means they are not aware of their surroundings.
2. Are they smol? - If they are smol they are easier to grab and also sell for more (if you don't plan on having fun yourself)

if you have any further questions feel free to ask me for more 4chan tips!
>>
>>101623002
https://voca.ro/1nDh8kv8XFB7
>>
Hate that tool calling lets you have the function name as a string, but parameter names are fields. So you can't just have an array of objects you jsonify. Stupid.
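To be clear I mean the usual OpenAI-style schema, roughly like this (hypothetical get_weather example, local endpoint as placeholder): the function name is just a string, but every parameter has to be spelled out as a named property inside a JSON Schema object, so you can't build the whole thing generically from a flat array.

curl -s http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{
  "model": "local",
  "messages": [{"role": "user", "content": "weather in tokyo?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Look up current weather",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}'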
>>
>>101622826
>>101622990
The captcha is easy
The real normalfag test is failing to solve the captcha. People with above average IQ can solve it subconsciously on autopilot, in less than 2 seconds
>>
>>101623045
I am never able to tell N from M my dude.
>>
>>101623057
I'm sorry for your IQ deficiency
I usually can't consciously tell whether it's an N or an M when it's obscured, but my brain just automatically does its best guess and guesses correctly 100% of the time.
>>
>>101623035
i've heard this song somewhere before but I don't know where
>>
>>101623072
its from the visual novel shoujo ramune you pedo
>>
File: sveQibk.png (206 KB, 378x397)
>>101623086
and how do you know that, may I ask?
>>
>>101622972
>I don't believe the real Johannes Gaessler
Don't believe it then.

>A backend based on a CUDA clone might potentially conflict with his work on the kernel of the real CUDA down the road
It will only conflict with any CUDA changes in the sense that CUDA changes could break MUSA.
I already have similar experience with HIP; as of right now it will be no extra effort for me other than maybe assisting a Moore Threads engineer.
Even if I were to do the testing and fixing myself I think it would not be that much effort.

>I don't think a serious person like the real CUDA dev would blindly accept a commit "on faith" from a random geek, without being able to verify whether the new kernel works at all and whether it might conflict with his otherwise excellent work on llama.cpp.
The only hardware that the MUSA code currently runs on at all are the Moore Threads datacenter GPUs.
Therefore, if the MUSA code is broken that will only affect people with business relationships to Moore Threads who will then hopefully be able to fix this.
I think it's fine to merge code for specific hardware that I cannot test myself as long as it doesn't cause problems for other parts of the code and there is someone available that will fix issues instead of me.

And as a side note, no "new kernels" were added.
Just like HIP, MUSA just translates the existing CUDA code for other hardware.
>>
File: 1641887557674.jpg (67 KB, 800x434)
So what are the current flavors of the month(year?) for basic RP that don't require a spaceship PC or paid bullshit?

Reading online (and a lil testing myself) it seems the good ones are:

>Mistral Nemo
>Command R (still the GOAT)
>Gemma 27B

What am I missing
>>
File: 1722002437305431.jpg (98 KB, 1024x576)
Am I going to be able to run an LLM on my Alienware m17 lappytoppy with AMD?
I've managed to successfully set up and use Auto1111 for image generation and it works fine as long as the laptop is plugged in. As I understand it, LLMs demand a ton more from the graphics card, or am I wrong here? I really want to just have a local roleplay chat bot instead of using ones online.
>>
>>101623130
where is this pic from?
>>
>>101623226
Look for koboldcpp's ROCm fork.
>>
>>101623231
Bless. I'm pretty dumb with all this stuff, but I'm sure I can get it going.
>>
>>101623068
You must be really successful in life.
>>
>>101623153
Based for getting baited just like that
>>
>>101623178
jukofyork/Dark-Miqu-70B
intervitens/mini-magnum-12b-v1.1
TheDrummer/Gemmasutra-9B-v1
>>
>>101623153
I don't understand why they didn't propose to send you a GPU when you said you couldn't find where to buy one.
>>
>>101622543
I think overall PiperTTS is the best. It doesn't have a huge selection of voices, but it generates fast even on CPU and the quality is decent. If you want to clone voices, XTTSv2 is also a good choice, but it's considerably slower and more of a resource hog.
>>
>>101622543
I mostly use piper as it's real time and can even be used as the system TTS on my phone. I played a bit with coqui XTTS-v2 and spent hours finetuning a voice, but the result sadly wasn't great. That said, the default voices are better than piper's, I just don't really have a use for slow voice generation.
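Basic piper usage for reference (assuming you've downloaded a voice; the model filename here is just an example):

echo 'morning anons' | piper --model en_US-lessac-medium.onnx --output_file out.wav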
>>
>>101623316
I think SSH access to one of their servers would make more sense.
I already have a machine with 144 GiB VRAM, if I had to put one of their GPUs into one of my machines that would have just been more work for me.
I would only get reasonable use out of it if I were to frequently use it for performance optimization but I'm not going to invest the effort for hardware with poor availability.
>>
does vLLM support any kind of context quantization? like exl2 supports q4 and q8 and llama.cpp supports -ctk q4_0 -ctv q4_0
>>
File: IMG_20240729_182701.jpg (127 KB, 1545x486)
>>101623153
This picrel is self-explanatory. You have no clue if that PR works at all and how well (if at all) MTT GPUs perform. So you can't be him. Clearly.
>>
>>101623398
>I think
>I already have
>if I had to
>my
>for me
>I would
>if I were to
>I'm not going to
okay tripfag
>>
File: file.png (912 KB, 768x768)
your daily that face
>>
>>101623534
slop 768x768 gen
>>
>>101623452
I love bullying German autists.
>>
>>101623227
>>101623331
https://danbooru.donmai.us/posts/1423393?q=parent%3A1423393
>>
>>101623584
I want to do unspeakable things with them
>>
>>101622519
>>101622489
>>101622553
he's not CUDA dev. He's no clue. he's an imposter. We got trolled.
>>
>>101623227
>>101623331
Holy newfags
>>
>>101623569
It is native!
>>
>>101620372
can't deepdanbooru do this?
>>
>>101623719
it's made for anime (or at least related, like humans) pics
>>
File: error.png (42 KB, 962x483)
Someone suggested i get mistral nemo for smut, but it doesn't even launch; in fact, when i load it up it briefly opens a command window and then instantly closes
Did I download the wrong thing or something?
Am I just retarded?
messing with layers did nothing
This is all I could catch with a precise timed prtsc
Any clue as to what could be wrong?
>>
>>101619420
yeah, that's the exact scenario i'm using it for. so i really do need to use these special tokens, then. i am using -p; i gave it a heredoc as an argument and that works surprisingly well.
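for reference this is roughly what i'm doing (model path and prompt text are placeholders):

./llama-cli -m model.gguf -p "$(cat <<'EOF'
<special tokens / instruct template go here>
your actual prompt here
EOF
)"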
>>
ITT: People think bullying the people working in their free time to support ungrateful coomers is a good idea
Every week there's something new with you guys, jfc
>>
File: yann_stopit_k.png (194 KB, 1227x499)
>>101623832
Reminder in picrel
>>
>>101623832
Humans love doing things that are bad or plain destructive to themselves.
>>
>>101623875
lecunny sob sob
>>
>>101623799
Either old koboldcpp version or fucked quant.
Download your quants from bartowski if you can.
>>
>>101623890
>>101623832
where's da fucking trainer and why have they removed it in their free time? They have no respect for xaedes hard work. Every week there's something new with that repo , jfc
>>
>>101623968
I'll look into it, thanks
Wish I had an error log to actually sift through but either it doesn't generate those or i can't fucking find them
>>
>>101623832
https://huggingface.co/BeaverAI/NeMoist-21B-v0.5-GGUF
>>
>>101624018
The error log is what's in your image.
It's complaining about the internal shape of the model, essentially, which is usually the result of a bad quant or something the devs have to account for in their code, and in this case they already have, since Nemo is working flawlessly.
>>
>>101623988
It's over... The CPU/GPU trainer dream is dead.
>>
>>101623988
You're free to fork and include it yourself
>>
>>101623832
I will keep bullying jart and he is not working in his free time. Mozilla pays him to ruin open source projects.
>>
>>101620112
>>101621896
Which is better and cleaner?
>>
>>101619442
>--GPU price inflation and market trends: >>101618219
>32gb V100s have been meme taxed into the stratosphere
PCIe cards, maybe, but the SXM2s are the same as they have always been.
Though you can argue SXM2 is a dead end, since the only upgrade path is gimped A100s out of autonomous cars.
>>
>>101624028
so this is da fine tune of da upscaled fine tune of da Nemo?
>>
>>101624040
I meant more the full thing as this is cut off
Though the more i look at it, I think i didnt get the version that guy intended for me to grab anyway
So i am retarded after all regardless
>>
>>101624068
If jart is getting paid for it then I wasn't talking about him, you may continue
>>
Folks at /aicg/ recommended that I come here for this question. Has anyone had any success using a local model to serve as a dungeon master for a private campaign? I'm thinking of using oobabooga and SillyTavern to create a Dungeon Master character to manage all my interactions with other characters and the rest of the world.
>>
>>101624072
there are cars with a100's in them? So tesla users are paying for their car with the a100s in them, and then have to pay another 15k just to use the autonomous driving? kek
>>
>>101624072
The A100s you find in cars will be SXM4, just like regular A100s.
>>
>>101624149
that sounds like lorebook hell, good fucking luck
>>
>>101624149
Success is hard to define.
I've had roleplays where I've used D&D mechanics, yes, but I had to baby the model a lot.
Also, lorebooks.
>>
24mh
>>
>>101623832
They're not doing it for free. At the minimum they're robbing other users' attention and time with their shitty finetunes, thinking they're being original and funny, in the hope of getting some monetary benefit from it in the medium term, whether from improbable donations or unlikely prospects of employment in some AI startup.

The recipe is almost always the same--train a QLoRA on some crappy ERP log or tired synthetic data, give the model a cheesy name, add some anime gen in the card (if a card exists at all), then diarrhea-post everywhere about it like a pajeet to get some visibility. "Lookathis! Lookathis! Support our work plz ;) Join our Discord!"

If anything, they should be bullied more. They're not actually bringing anything valuable or novel to the space. They're a literal waste of compute as well as unwanted spam. I use adblockers and don't want to see other forms of sponsored content, thank you very much.
>>
>>101624055
that's not the answer to my question.
the reason why just a handful of folks contribute to the project is that you never know when and why your shit is gonna get wrecked.
>>
>>101624149
Maybe if someone remakes AI Roguelite (the game) to support llama.cpp and fixes all the issues.
>>
Been using TabbyAPI/exl2 for a while and decided to play around with llama.cpp. I'm seeing about half as fast prompt processing and 70%-ish token generation speed compared to Tabby, which feels off.

llama.cpp:
prompt eval time     =   78602.57 ms / 22694 tokens (    3.46 ms per token,   288.72 tokens per second) | tid="139699714412544" timestamp=1722272838 id_slot=0 id_task=0 t_prompt_processing=78602.573 n_prompt_tokens_processed=22694 t_token=3.463583898827884 n_tokens_second=288.7182840694032
generation eval time = 67462.05 ms / 339 runs ( 199.00 ms per token, 5.03 tokens per second) | tid="139699714412544" timestamp=1722272838 id_slot=0 id_task=0 t_token_generation=67462.049 n_decoded=339 t_token=199.00309439528024 n_tokens_second=5.02504749003399


tabby:
Metrics: 205 tokens generated in 69.71 seconds (Queue: 0.0 s, Process: 0 cached tokens and 22789 new tokens at 534.81 T/s, Generate: 7.57 T/s, Context: 22789 tokens)


I'm launching with:
./build/bin/llama-server --port 5000 --host 0.0.0.0 -v -fa --ctx-size 81920 --prompt-cache ".prompt_cache" --cache-type-k q4_0 --cache-type-v q4_0 --gpu-layers 999 --batch-size 4096 --split-mode layer -m ~/llm/models/Mistral-Large-Instruct-2407-Q5_K_M.gguf --no-mmap --tensor-split "2,1,1"


Using largestral 5bpw exl2 and q5_k_m gguf. Both fit fully into VRAM with 82k context, on 1x A6000 + 2x 3090, headless ubuntu server.
Anything obviously wrong? I compiled with GGML_CUDA (cuBLAS?), and flash_attn = 1 is reported during startup, too. Figured llama.cpp would be a bit slower but -50% seems suspiciously like I fucked something up.
>>
>>101624296
> in the hope of getting some monetary benefit from it on the medium term, whether from improbable donations or unlikely prospects of employment in some AI startup.

Why do you assume they're motivated by money?
>>
>>101624171
>15k just to use the autonomous driving?
And the autonomous driving is like current ERP on models below 70B.
>>
>>101624149
the DM won't understand mechanics and will occasionally be retarded, but in principle it should work
>>
>>101624356
did you try both split by row and split by column in llama.cpp? did you try the new hacked drivers, or have you got nvlink hooked up?
>>
>>101624149
Use it like it is used for coding right now. Make it write a draft that is 80% correct and then correct the last 20%.

You do have friends to run a campaign with don't you?
>>
File: 1721245424525711.png (1.84 MB, 2048x2048)
So, Mistral-Large is good, but does anyone else have problems with it repeating...? It's a little annoying, I've had "Just a taste..." 2 times in one message after she said it in the previous message, and it seems to consistently show up at least once each refresh.
>>
>>101624484
So… lonely…
>>
>>101624149
it wont be coherent. Look into solo RPG and use the AI as oracle instead.
>>
>>101624477
I tried split mode row but could not get it to load without OOM. My guess is because it's trying to allocate the entire 8gb KV cache on one card (the A6000) but there's not enough room. Not sure if that is fixable by fucking with the tensor_split parameter.
I don't have nvlink (another reason I figured split mode row wouldn't be worth it anyway).
What hacked drivers?
>>
>>101624518
2-5 more years it is then. In the meantime touch your penis to Nemo. It is hard work but fun.
>>
>>101624356
For token generation speed one factor is that q5_K_M is ~5.7 BPW so you will get -13% t/s just from having to load more data.

In terms of prompt processing speed, with tens of thousands of tokens you are primarily benchmarking the llama.cpp FlashAttention implementation vs. the original repository.
The llama.cpp FA implementation for batch sizes > 8 honestly still needs a lot of work so I wouldn't consider these results to be that strange.

>Anything obviously wrong?
Assuming you are using the llama.cpp HTTP server with the latest master commit the performance should be up-to-date.
I don't know about TabbyAPI but when I did some simple EXL2 tests via Ooba I noticed that there was a significant, constant overhead of ~0.65 s where I wasn't sure whether that was being properly reflected in the reported performance numbers.

>>101624554
>Not sure if that is fixable by fucking with the tensor_split parameter.
Use --main-gpu to set a GPU for the KV cache.
Though with 80k context you will probably not get good results.
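For example, roughly (model path and split values are placeholders):

./llama-server -m model.gguf -ngl 99 --split-mode row --main-gpu 0 --tensor-split 2,1,1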
>>
>>101624643
why are you namefagging? rope yourself already
>>
>>101624502
I've been using 0.5 temp, 0.17 smoothing, and 0.25 minp with all other samplers/penalties neutralized and it has been surprisingly decent at not repeating in the vast majority of swipes. Maybe a system prompt issue?
>>
>>101624643
I see, thanks. Didn't realize q5km was a larger quant since 5bpw on exl2 barely fit with the same context size, so that's something.
Guess that all makes sense, just wanted to ensure I wasn't misreading docs.
>>
>>101624662
>system prompt issue
Should we be using something more elaborate than the default simple ones? What do you put in it to make it vary opening phrases or other phrases? I thought that sort of thing didn't work.
>>
>>101624657
It is necessary for you to know that he is the blacked poster.
>>
>>101624451
How long until it understands mechanics?
>>
>>101624149
3.5 Sonnet is the only one that'll work for you somewhat okayishly
>>
>>101624643
Isn't the mistral prompt template in llama.cpp different from what Mistral uses? Man, every time I try to take a look at llama.cpp there are so many things broken, does no one use it directly? Feels like everyone just uses kobold/ooba/ollama/lmstudio.
>>
>>101623832
Oh yeah thanks to:
>Drummer, for spamming his shitty slop tunes here
>Jart, for acting like a retard and slowing down the development of llama.cpp
>Ikaridev and Undi, for their sloptunes and bringing discord shit into the thread
>Robert Sinclair, for his brilliant ideas regarding fixing quantization (adding random noise to the weights)
>>
>>101624730
145 days 21 hours and 3 minutes
>>
>>101624752
wait the tranny ACTUALLY VISITS this thread? wouldn't he just kill himself from visiting 4chan?
>>
>>101624750
>Aren't the mistral prompt template in llama.cpp different than what mistral use?
Don't know.
>>
>>101624776
Part of what makes me shitpost in this thread is all those times I got banned for bringing up mildly tranny unfriendly things.
>>
>>101624040
I doubt you care, but I got it to work, it was indeed kobold being out of date, thanks for the help
>>
>>101624791
Why is this thread so much more moderated than /aicg/? Why can't they just ban /aicg/ on /g/?
>>
>>101624776
Yeah, they are sometimes here with the names on and off.
Drummer is just a retarded redditor that spams his shit here (he even bought ads in /g/ for like a week kekk)
>>
>>101624797
/aicg/ schizos are too powerful, the jannies have given up and vacated.
>>
>>101624825
They can just insta-delete the thread and be done with it.
>>
>>101624806
>they
Holy shit you faggot. It's him, not they. Unlearn the conditioning and learn proper pronouns. He was born with a dick so it is he. Simple as.
>>
File: file.png (23 KB, 889x202)
lol, NeMo is REALLY confident when writing this anti-adblocker message
>>
>>101624752
And how did you forget about Sao? All of these combined don't even reach the peak of his spam.
>>
On a fresh windows reinstall, I got the dependencies for sillytavern, then ran the updater of my old sillytavern install. It works, but the cmd prompt gives me as the first line
>fatal: detected dubious ownership in repository at 'D:/.../SillyTavern'
>'D:/.../SillyTavern' is owned by:
>(inconvertible) ([garbled string])
>but the current user is:
>[my username] ([garbled string])
...Why? Specifically, why does it check and care about that? "Dubious ownership"? I thought it'd be related to the folder security permissions still tied to my old Windows install's user account, but like it says, it's all in my name.

This isn't a bug-fix question. It worked first run, and the second run after updating removed the warning. But I've never seen that before with any file when transferring my D drive to a new machine or reinstalling the OS on the current one. I'm just curious what it's about and if I should be concerned about other files.
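(My best guess after reading up: it's git's safe.directory ownership check. Git compares the NTFS owner SID of the repo folder against your current user's SID, and a fresh Windows install gives your account a new SID even if the username looks the same, which is also why the old owner shows as "(inconvertible)". The warning normally suggests whitelisting the folder with something like the line below, path being whatever yours actually is.)

git config --global --add safe.directory "D:/path/to/SillyTavern"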
>>
>>101624835
Ikaridev and Undi, 2 people you braindead retard
>>
>>101624838
Of course, sorry. Thanks to Sao for singlehandedly being more annoying than a discord server of 15 trannies spamming in coordination
>>
>>101624848
I take it back. And I actually enjoy how well it illustrates how retarded singular "they" is.
>>
>>101624838
And how could you forget p·e·t·r·a?
>>
>>101624554
>drivers
here's one
https://github.com/tinygrad/open-gpu-kernel-modules/tree/550.90.07-p2p
and here's another one
https://github.com/tinygrad/open-gpu-kernel-modules/tree/550.54.15-p2p
They work on both 3090 and 4090 and prolly A6000 too, but on the 3090 the bandwidth is half the speed of the 4090 for some reason. Still worth a try.
>>
The 12B magnum is smart as fuck, but the qwen magnum is pure coal, it fails all my tests, I'm thinking shitty base model, not borked training. I have zero hope for qwen team, their instruct models were pozzed af too, asked them how to tell my gf she's fat and they pulled muh respect in every sentence
>>
File: file.png (338 KB, 1140x684)
>>101624171
I'm not sure which ones, but there have been a bunch of listings for months now for "NVIDIA DRIVE A100 Autonomous Vehicles"
>>101624178
No, they're SXM2. The issue is that you pretty much need one of those chink SXM2<=>PCIe adapters or you'll fry shit due to how NVLink works on them: https://forums.servethehome.com/index.php?threads/automotive-a100-sxm2-for-fsd-nvidia-drive-a100.43196/
>>
>>101624918
>I have zero hope for qwen team
Chinks probably take models, scramble the initial weights a bit, add 1 or 2 more layers to change the B size slightly, and then continue training from there with some shitty datasets.
>>
>>101624746
How do you run Sonnet locally? What specs do you need?
>>
>>101624954
explain how deepseek is so good then? is it just random shitty datasets?
>>
File: file.png (287 KB, 1410x803)
>touching her waist sends a rush of warmth to her cheeks
Dayum. Anyway, some phrases can be circumvented by substitution with alternative examples, but the hard part is coming up with replacement behavior that actually makes sense. For example if I copy this suggestion it just changes to "her skin flushed" and "her heart raced" and I still got a "sending a X to" though not "through".
If you say nothing happens as a result of touching, it will literally say something like "and nothing happens".
What's the objectively superior and neutral way to express reaction to being touched, assuming it must be described at least once?
>>
>>101624973
It's a 4x36B MoE
>>
>>101624794
Have fun.
>>
>>101624643
does flash attention affect prompt processing or token generation. Does this depend on the GPU architecture?
>>
>>101624985
So basically out of reach for a vramlet
>>
>>101619442
my favorite poster
>>
>>101625035
>does flash attention affect prompt processing or token generation.
Both.

>Does this depend on the GPU architecture?
Yes.
On AMD the kernels intended for large batch sizes for whatever reason have terrible performance so the kernels intended for small batch sizes are instead used for prompt processing which also have bad performance.
On NVIDIA GPUs FA should be consistently faster for both prompt processing and token generation, regardless of compute capability.
I have received reports about FA causing performance regressions with partial offloading but so far I have never been able to reproduce this.
>>
>>101625035
FA best use is in reducing vram usage desu
>>
>>101620971
never bought brand new gpus and none ever broke down, in fact they lasted years

this 3090 i currently own i bought from late 2022 when prices fell
>>
>>101621901
someday they'll make dedicated AI compute processors with terabytes of RAM and they'll only make it available to datacenters
>>
Fucking hate nemo putting asteriks when I don't want to and not putting them where they should be
>>
>>101625353
Just stop using asterisks and let ST color actions/dialogue differently with CSS.
>>
>>101625179
Yeah, I wouldn't have been able to fit 32k ctx largestral in my mikubox setup if not for FA
>>
>>101621967
Don't forget to grab that reddit gold medal sir!
>>
Wen L3.1 stheno
>>
>>101625608
Now:
https://huggingface.co/nothingiisreal/L3.1-8B-Celeste-V1.5
https://huggingface.co/nothingiisreal/Celeste-12B-V1.6
>>
>>101625142
can we use this in inference?
https://github.com/Repeerc/flash-attention-v2-RDNA3-minimal?tab=readme-ov-file#performance-in-stable-diffusion-comfyui
does this work in llm training?
>>
>shitty finetunes
Just use the base model
>>
shartyboys going after CUDA dev I see.
>>
>>101625699
Don't know.
>>
>>101625712
that's the first and the only FA2 that works on AMD 7900 afaik
I'd check that out if I were you
>>
>>101625701
Didn't you get the discord memo?
>>
>>101625733
There have been forks of FA with support for RDNA3 for months, it's nothing new. It was just building FA with a different branch of ck; I wonder if it's not merged in master already so you could just build FA directly.
>>
>>101625733
>AMD
Anon you gotta take it to the ZLUDA dev, his office is down the hall to the left
>>
>>101625448
what's the speed you get on your mikubox? what quants you use, kv cache, split settings, drivers?
>>
>>101625800
that fork was FA1 and it was buggy, didn't work in pytorch very well. you can't turn it on in unsloth etc
>>
>>101620430
Thanks bro. I will try out Alpaca Roleplay later. I hope this doryV2 works for Oobabooga too. Would you mind showing some other settings?

I got the Mistral 12b Nemo running. I just needed to load it with ~80k context and not the 1,000,000 it was set to.

At first it worked like a charm with the Mistral preset and 1-2 second answer speed. Then I got a little Stable Diffusion running in the background, which wasn't a problem with Kunoichi..

Now the answer speed is 30-60 seconds -.- unplayable..

Restarted the PC a few times and now, even without stable diffusion, the answer speed is 30-60 seconds..
With my 32gb RAM and gpu (4090), both are capped out at 100% utilization.

Pls help, i am not a total smut

I was so close to heaven

I can put 16gb more in tomorrow if it helps
>>
>>101625819
she has no boobs. da fuck is wrong with your imageGen, anon?
>>
bastardized mistral prompt format that I made which sort of enables using author's notes to follow their sysprompt spec (to be honest, I don't know if it's really worth it)
context: https://files.catbox.moe/2ts74x.json
instruct: https://files.catbox.moe/j97vmp.json
note: example messages behavior -> never include (mistral format is fucking horrible for these, so I include them raw in the story string), trim spaces -> checked (otherwise old bot responses get an extra space in my experience)

this setup lets you use an author's note with the system role at depth 1 and it'll go where the official mistral prompt template inserts system prompts (at the top of the last user message, separated by 2 newlines)
not all of the ST macros work in ANs (why??) so you can't drop the whole story string in there but it seems to be a good spot for a short general system prompt type string with largestral. probably good for nemo too.
honestly prompt formats are a meme and this doesn't seem to make that huge of a difference in my testing, but I saw some people talk about this issue so I thought I might as well share
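
if anyone wants to see what that insertion actually produces, here's a minimal python sketch. the layout follows the usual [INST] convention (system text + two newlines on top of the last user message), but exact whitespace/BOS handling differs between mistral template versions, and the function/variable names are just made up for illustration:

# minimal sketch: fold a "system" string into the last user turn
def build_mistral_prompt(history, system_text):
    # history: list of (role, text) tuples alternating "user" / "assistant"
    out = "<s>"
    last_user = max(i for i, (role, _) in enumerate(history) if role == "user")
    for i, (role, text) in enumerate(history):
        if role == "user":
            content = text
            if i == last_user and system_text:
                content = system_text + "\n\n" + text  # system rides on top of the last user message
            out += f"[INST] {content} [/INST]"
        else:
            out += f" {text}</s>"
    return out

print(build_mistral_prompt(
    [("user", "hi"), ("assistant", "hello"), ("user", "continue the story")],
    "Stay in third person. Keep replies under 300 words.",
))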
>>
>>101625861
No, it was FA2, but that was before they rebased on a newer version. It also only integrated the forward kernels, but for your usage that should be enough. The FA implementation that you linked uses rocWMMA, so it will probably be slower. Have you tried just building official FA and forcing GPU_ARCHS? It will probably fail because not all kernels are implemented for RDNA3, but you can probably monkey patch it and remove everything that doesn't work.
Also, for your original question, llama.cpp uses rocWMMA directly, it doesn't have much to do with the flash attention python lib.
>>
>>101619436
So I can get away with just (1)x nvidia P40?
>>
>>101624502
temp 0.5, minp 0.01, tfs 0.01, dry base 2, dry mult 2, dry length 1
never saw a repetition
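
in case anyone wants to try that outside ST, this is roughly what the request body would look like against a koboldcpp-style /api/v1/generate endpoint. the DRY field names (and mapping "dry length" to allowed_length) are my guess, so double-check your backend's API docs:

# sketch: the sampler settings above as a generate request payload
import requests

payload = {
    "prompt": "...",           # your formatted prompt goes here
    "max_length": 300,
    "temperature": 0.5,
    "min_p": 0.01,
    "tfs": 0.01,
    "dry_multiplier": 2,       # assumed field names for the DRY sampler
    "dry_base": 2,
    "dry_allowed_length": 1,
}
r = requests.post("http://127.0.0.1:5001/api/v1/generate", json=payload)
print(r.json())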
>>
context shift doesn't work with cache quantization on llama.cpp
/g/ has lied to me
>>
>>101623002
>Two cops literally right there
Who is playing this, a game journalist? Bet they are going to write about how the kidnapping is too hard because the cops keep spotting you.
>>
>>101625888
In between there are still 3-5 secs (but it counts down only for 10%)
>>
>>101625836
latest driver, batch size 1024 layer split 23 33 33, MMQ rowsplit, 8bit quant kv, IQ4_XS, I'm getting ~4.8t/s
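
translated into a llama-server launch it'd be something like the sketch below. flag names are llama.cpp's as I know them and the filename is a placeholder, so adjust for whatever backend and quant you actually run:

# sketch of a launch roughly matching the settings above (3-way split, row split, 8-bit KV, FA)
import subprocess

subprocess.run([
    "./llama-server",
    "-m", "largestral-IQ4_XS.gguf",  # placeholder filename
    "-c", "32768",                   # 32k context
    "-b", "1024",                    # batch size
    "-ngl", "99",                    # offload all layers
    "--tensor-split", "23,33,33",    # per-GPU split
    "--split-mode", "row",           # "rowsplit"
    "-fa",                           # flash attention
    "-ctk", "q8_0", "-ctv", "q8_0",  # 8-bit KV cache
])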
>>
>>101626030
depends what you want to do (specifically, what size are the models you want to run) and also how patient you are

just to make sure you're aware. with the P40 specifically you'll have additional considerations: 1) need to hack a fan to it 2) need iGPU or another GPU if you want to connect a monitor
>>
>>101626131
>depends what you want to do (specifically, what size are the models you want to run) and also how patient you are
I have a 6950, but apparently Linux 6.8 breaks something in amd's drivers.
>>
Hey /lmg/, what do you think will come first? AI capable of creating a CAD model of something you want, provided that you are specific about the requirements of what you want as well as its purpose. Or an AI capable of programming something complex without fucking up?
On one hand I want to say CAD models, since if they use CAD simulations they can figure out if what they made actually works. But that would require them to understand 3d space as well as have a great understanding of how to actually use that kind of software.
On the other hand, efforts are already being made to get the AI to code and progress has been made on that front. But current models are just as willing to spit out non-functional code or code that technically works but is poorly optimized and breaks other code if you attempt to integrate it.
>>
So I haven’t been paying attention for a while (since I was disappointed with 4o basically, came back to test sonnet 3.5 and was also disappointed).

I am assuming opus is still the king of ERP / coom stuff? I know about llama 3.1 but I’m assuming that they can’t compete with opus.
>>
File: 1714979081681513.png (57 KB, 1580x423)
57 KB
57 KB PNG
>>101625836
>>101626127
forgot pic
>>
Any small models that can go as off-the-rails as AID?
>>
>>101626422
Lol
No.
>>
>>101626422
>off-the-rails
One way to do that is to randomly add an instruction to the prompt telling the model to add a twist to the scene or something of the sort.
If you want that to happen semi-randomly, you can do that with the {{random:}} or {{pick:}} macros as well as with a lorebook to control the percentage chance of the prompt showing up in the context.
>>
>>101626528
I'll look into that. Thanks anon!
>>
Been switching between exl2 and gguf for Nemo. Anyone else notice that the gguf quant writes shorter responses? Also, did the issue of flash attention making the model retarded after a certain amount of context ever get fixed?
>>
>>101626554
One suggestion was putting this at depth 1 with some frequency to be determined >>101026596
{{user}}: (Note: From here on, try to steer the conversation to a "{{random:abnormally,adventurously,aggressively,angrily,anxiously,awkwardly,beautifully,bleakly,boldly,bravely,busily,calmly,carefully,carelessly,cautiously,ceaselessly,cheerfully,combatively,coolly,crazily,curiously,daintily,dangerously,defiantly,deliberately,delightfully,dimly,efficently,energetically,enormously,enthusiastically,excitedly,fearfully,ferociously,fiercely,foolishly,fortunately,frantically,freely,frighteningly,fully,generously,gently,gladly,gracefully,gratefully,happily,hastily,healthily,helpfully,helplessly,hopelessly,innocently,intensely,interestingly,irritatingly,jovially,joyfully,judgementally,kindly,kookily,lazily,lightly,loosely,loudly,lovingly,loyally,majestically,meaningfully,mechanically,miserably,mockingly,mysteriously,naturally,neatly,nicely,oddly,offensively,officially,partially,peacefully,perfectly,playfully,politely,positively,powerfully,quaintly,quarrelsomely,roughly,rudely,ruthlessly,slowly,swiftly,threateningly,very,violently,wildly,yiedlingly}} {{random:abandoned,abnormal,amusing,ancient,aromatic,average,beautiful,bizarre,classy,clean,cold,colorful,creepy,cute,damaged,dark,defeated,delicate,delightful,dirty,disagreeable,disgusting,drab,dry,dull,empty,enormous,exotic,faded,familiar,fancy,fat,feeble,feminine,festive,flawless,fresh,full,glorious,good,graceful,hard,harsh,healthy,heavy,historical,horrible,important,interesting,juvenile,lacking,lame,large,lavish,lean,less,lethal,lonely,lovely,macabre,magnificient,masculine,mature,messy,mighty,military,modern,extravagant,mundane,mysterious,natural,nondescript,odd,pale,petite,poor,powerful,quaint,rare,reassuring,remarkable,rotten,rough,ruined,rustic,scary,simple,small,smelly,smooth,soft,strong,tranquil,ugly,valuable,warlike,warm,watery,weak,young}}" direction.)
>>
been away for some weeks

gemma status?
>>
>>101626728
Is that the complete line? Leaving an open paren seems wrong somehow
>>
magnum-32b good
>>
>>101626802
Good to know. I've been alternating between nemo and mini-magnum and find I like the latter way more.
>>
>>101626787
If you can't see the closing paren, you're on mobile or something and need to scroll.
>>
>>101626181
AIs that can write complex programs (given a fitness function and multiple tries) already exist, but they are not llms or llm-related.
https://oxsci.org/deepmind-sorting-algorithm-fastest-yet/

As far as llms, here's a demo of what GPT-4 can code (assuming we can trust this lecturer)
https://invidious.materialio.us/watch?v=qbIk7-JPB2c&t=1793
>>
>>101626802
Seems very similar in style and intelligence level to mini-magnum in my testing (which makes it pointless since it's way bigger and slower)
>>
>>101626950
>invidious
youtube
https://www.youtube.com/watch?v=qbIk7-JPB2c&t=1793
>>
What happens if you run a bigger model than what's rated for your RAM? Say I try to run a 32B model at Q_8 on 16gb ram / 8gb vram.
>>
>>101627020
Your computer will use your hard drive as ram (very slow) or the program will crash immediately.
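
Quick napkin math for that example (Q8_0 is roughly 8.5 bits per weight; KV cache and runtime overhead ignored):

# why a 32B model at Q8 spills out of 16GB RAM + 8GB VRAM
params = 32e9
bytes_per_weight = 8.5 / 8             # Q8_0 ~= 8.5 bits per weight
model_gb = params * bytes_per_weight / 1e9
total_gb = 16 + 8
print(f"weights ~{model_gb:.0f} GB vs ~{total_gb} GB of RAM+VRAM -> the rest gets paged from disk")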
>>
Damn, Largestral at IQ3_M is the first model that really feels like commercial tier intelligence running locally.
Just wish I could somehow get more than 1.5 t/s out of it.
>>
>>101627020
offloads to disk
>>
>>101627020
your pc will die (in minecraft)
>>
File: BA_shupo_011.gif (284 KB, 200x200)
284 KB
284 KB GIF
>>101620069 (me)
>>101620112

>3090 is in windows desktop
>There's a link to install steps for windows so check it out.
>spend an hour manually installing cuda shit and trying to troubleshoot when the script doesn't work.
>wsl --install
>pip install TTS

It just works. How is windows this shit?
>>
>>101627068
>>101627071
I've heard that can wear on SSDs, so I imagine it's not something I want to do often, correct?
>>
>>101626964
is the big magnum (72b?) supposed to be good? never really gave it a shot, it's small and old (at least by the standards of this industry), can't be better than opus or sonnet 3.5 right?
>>
>>101627125
flash-based SSDs will get burnt-out over time, yes. but the excruciatingly slow gen speeds should deter you from getting to that point
>>
>>101627160
>good?
meh
writes good coom but it's horny as fuck
>can't be better than opus or sonnet 3.5 right?
correct
>>
Oh, the girl's name is Lily, huh? You don't say.
>>
>>101627160
I didn't like it
I think rp/story tunes of models 70B and bigger tend to suck because tuners try to save money by tuning on top of the instruct instead of the base
and they never train enough to overcome the "feel" of the instruct

with small ones it's better because they can afford to train on the base, and for a long enough time to actually change the model's tendencies
>>
>>101627247
i usually find smaller models too dumb / repetitive across sessions

i guess there's only so much you can fit into 8b parameters vs 70+b
>>
>>101627247
>with small ones it's better because they can afford to train on the base, and for a long enough time to actually change the model's tendencies
https://huggingface.co/nothingiisreal/L3.1-8B-Celeste-V1.5
>We trained LLaMA 3.1 8B Instruct at 8K context
https://huggingface.co/Undi95/Meta-Llama-3.1-8B-Claude
>Llama 3.1 8B Instruct trained on 9 000 000 Claude Opus/Sonnet tokens.
https://huggingface.co/NeverSleep/Lumimaid-v0.2-8B
>This model is based on: Meta-Llama-3.1-8B-Instruct
>>
File: she imagines.png (993 KB, 1344x1115)
993 KB
993 KB PNG
I've been enjoying nemo a lot, haven't had any problems besides the model rambling every once in a while, but it happens so sporadically and you can stop the response when it starts rambling so I don't mind.
And then there's picrel where not only did it fuck up the formatting but it went on forever in the most schizo rapid fire of words possible.
I just let her cook.
>>
>>101627070
Is IQ2_M not good enough?
>>
File: 1707711916634305.jpg (25 KB, 488x277)
25 KB
25 KB JPG
>>101625904
it migu
>>
>>101627336
>"_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",
https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2/commit/cd1c715
>>
>>101627336
I am not at all surprised to learn that there are a few retarded exceptions
I do not think this proves what I said incorrect in any way. mini magnum and magnum 32B were both trained on base, for example
base tunes of 70B and above are extremely rare because community people can't afford it
>>
>>101627380
>a few retarded exceptions
Sao trained on instruct too; the point is most tunes nowadays are on instruct, not base, and that's been the case since mixtral
>>
>>101627380
>mini magnum and magnum 32B were both trained on base, for example
those are the exception honestly
>>
>>101627392
NTA but if that's true, it needs to change. Explains why the results are usually so shitty and people are increasingly just sticking with the official instruct version of a model instead of bothering with RP tunes.
>>
>>101627349
As someone who can run IQ2_M, it wasn't that impressive to me, maybe equal to 8x22b. So if someone is saying IQ3_M is commercial tier then I'd say IQ2_M is a huge downgrade.
>>
>>101627349
there's a very large jump from Q2 to Q3 with any model, anon
>>
>>101627465
But it's a 123B. The bigger you go, the less quality you lose from quanting.
>>
>>101625819
Please gen a pic of her wearing a micro bikini!!
>>
>>101627348
how does nemo compare to other similar sized models or 70b models? ive heard people praising it but when i try it it's meh, might be my prompt though
>>
>>101627448
Oh, 8x22B at what quant? I use Q4_K_M with Wizard.
>>
Nemo is good but for some reason assumes my identity and writes for me. Maybe my ST settings are wrong?
>>
>>101627465
Q3: kidding myself
Q4: "just as good" as higher quants
Q5: okay now it's really just as good
Q6: visibly better at instruction following, I can probably stop here since Q6 and Q8 are really close on some meaningless chart
Q8: this is basically the same as FP16 right? <-- I am here
>>
>>101627493
>The bigger you go, the less quality you lose from quanting
uhhh NTA but I don't think that's true at all
>>
>>101627523
Direct your post at the guy running Q2, not me
>>
>>101627504
I use q4 for wizard as well, but that's just my initial impression, not a lot of testing. I haven't tried to see if it's as good at code or knowledge tasks which I liked wizard for. It all runs so slow for me so that'll take a while. But for mistral large IQ2_m is the best I can do since it starts at 1.7T/s, and quickly slows to 0.5 if you get to 10k context +. I only have a shitty 8gb gpu.
>>
>>101627527
There was a graph comparing quant sizes of 8B and 70B with MMLU scores, and quants have a much more detrimental effect on 8B than on 70B though. How else would you interpret that?
>>
>>101627496
Nta but I've found nemo to be dogshit unless you work with it and provide inputs of roughly the level of quality you want to see. I've gotten bad outputs by rushing the process, and really good ones when I work on it over time. Even with the same context/AN. But I haven't tried the bigger 70b models so ymmv.
>>
File: mmlu_vs_quants.png (336 KB, 3000x2100)
336 KB
336 KB PNG
>>101627634
>>
>>101627582
Damn. Wizard is probably a lot faster in your case then. I think I could fit IQ2_M of Largestral so it's not too slow but I also don't want to spend the time downloading it and testing it. I wish we just had good automated benchmarks.
>>
after llama 3.1 and mistral large 2, it really does seem like models from bigger companies are going to be dead for ERP

even 4o and sonnet 3.5, they are all worse than their predecessor model for ERP... i really fucking hope they dont lobotomize opus 3.5 similarly
>>
>>101627493
it's true that the quantardation is more significant with smaller models but it's very much still a thing with bigger ones, especially at the sub-Q4 range
>>
As a relative novice, how can I face swap an image into a video, locally, without uploading to some random site?
>>
>>101627674
largestral is the best thing mistral has ever released for erp, what are you smoking? I even see people in aicg using it
>>
>>101627674
I don't think the word "lobotomize" makes sense in the case of Sonnet since it's clearly extremely smart, the connotations of the word are wrong here. Lobotomies make someone dumber
It was more like a soul-ectomy
>>
i've not been here for about 2 weeks. what's the general consensus for vramlet models? still gemma 27b?
>>
>>101627684
I mean I don't doubt that it's still bad, but like >>101627651 shows, there's a big range in terms of how large the effect is. Clearly extremely low quants like IQ2_M are disastrous on 10B class models, while they look kind of reasonable for 70 (and up probably).
>>
>>101627523
Q8: Tiny model, can't afford to lobotomize.
Q6: Ain't gonna notice the difference.
Q5: Not worried.
Q4: Ouch.
Q3: Fuck, I thought this was the iMat IQ3.
Q2: I just want a taste of what I can't have.
Q1: Can't wait for 1.58 bitnet.
>>
>>101627506
checking "Include Names" can mitigate it a little
>>
>>101627348
>fumiko
>endo endo endo endo endo
>>
>>101627506
Are you using the base model?
>>
>>101627710
As a 24gb vramlet, after testing many alternatives, my current preferred model is WizardLM-2-8x22B.i1-IQ2_S.gguf
It's not great but it's not worse than the alternatives
>>
>>101627698
when i use it (on open router), it's completely ass for erp, repeating the same dialogue just slightly different (moaning for 5 straight dialogues) and just doesn't know what to do in a sex scene
>>
>>101627523
>>101627711
For 70B:
Q8: As good as F16
Q6: As good as F16
Q5: As good as F16
Q4: Good enough
Q3: Good enough
Q2, Q1: Stop, get some help.
>>
>>101627768
Can't find or fit an iMat instead of i1?
>>
>>101627701
Nemo is the new hotness, old man
>>
>>101627768
What settings are good for wizard and what format does it use?
>>
>>101627825
Not him but what's the difference? I've only ever heard of imat before. And when I make my own quants, I don't see any 'i1' options.
>>
File: my settings.jpg (419 KB, 1925x1166)
419 KB
419 KB JPG
I dunno if i'm retarded, or if it's my card or what.

So basically:

>using Command R on Silly Tavern
>made the most generic card just to test the RP
>robot gives response, gets into character but there seems to be no consistency even within the first messages. Robot will refer to me as someone else, imply that my daddy wants to do something to me (lol)

It's a stepmom roleplay, the most vanilla card imaginable just to see how the jailbreaks are on builds (if stepmom shit flags it, fuck that) and also to not be overcomplicated.

But for some reason, all of the chatbots suck, no matter if it's Command R, Nemo or Gemma 27B.

I have no idea what i'm doing wrong. Please gimmie some tips lads, to be fair I am totally new to Silly Tavern so I know the issue is literally a "skill issue", just need some pointers.
>>
>>101627859
welcome to /lmg/, i hope you have a nice stay
>>
>>101627859
If you look at the final prompt that the backend receives, does it look right?
>>
>>101627824
I think the drop off starts at Q5, not Q4
I can definitely feel the difference between Q5 and Q6, but Q6 and anything above that not really
>>
>>101627882
whaddya mean? you mean the commands showing on kobold? Looks all right to me
>>
>>101627824
For coding tasks, anything under Q5 or under 70B has let me down. But for creative writing, wrongness can be beneficial, depending on where it hallucinates.

>>101627851
iMatrix does extra work to make low-bit quants better. i1 is similar but it's a one bit system so it's smaller than iMatrix but it can get sketchy. That said, I've got an i1 in my go-tos but it's Q5_K_S. If you're quanting down to Q2, I'm curious if iMatrix would be significantly larger (which may be prohibitive) or if the quality between the two is comparable or significantly different. But just reading the file name, i1 and Q2 sounds exciting.
>>
>>101627895
I mean that Silly will take the chat history, the character card, examples, etc, and format it all based on the Context Template and Instruct Mode Preset.
Looking at the final, formatted text that gets sent to the backend (koboldcpp in your case I guess) can help you find out what's wrong.
Also, try neutralized samplers, although I don't see anything too weird in your sampler settings.
>>
>>101627859
Try putting some examples in, or edit the first few messages to be how you like and see if it continues to fuck up, that way you'll know if something is truly weird, or if it was just unsure what to do and needed more input.
>>
>>101627763
No, instruct.
>>101627721
I was under the impression that it made it worse.
>>
>>101626763
To me it is dead. It was a shitty model and not just a bugged loader. Nemo on the other hand is retarded but the most fun I've had with a model in a while.
>>
>>101628135
Is nemo better than command-r? That's the best one under 70b I think.
>>
File: file.png (242 KB, 410x482)
242 KB
242 KB PNG
>>101628233
Nemo is the personification of pic related. It feels cuter to me than command-r. And I am a huge command-r fan.
>>
>>101624099
Add a pause command at the bottom of the bat file and it'll "Press any key to continue" before exiting so you can see the whole error log.
>>
>>101628250
Last time I tried it the first few replies all started with the same thing, and it was hard to get it to stop doing that without making it even dumber.
>>
>>101627831
Settings don't matter
>>
>>101628379
Then it's kinda useless if it's just gonna start every message with the same 2 or 3 things.
>>
>>101628339
Yes it is retarded and it autistically picks up on unwanted patterns. It is hard to tardwrangle but it gives some next tier cooming. When it doesn't repeat itself and isn't retarded it can sometimes write like 800 tokens that are absolutely perfect and don't need any editing. Compared to command-r, at least: with that one I always had to heavily edit all the outputs.
>>
>>101628398
>>101628398
>>101628398
>>
>>101628413
Well, after the first sentence it seems good. I guess I could just ignore the first bit and just accept that it's always the same like some retarded quirk like you say.
>>
>>101628388
click "neutralize samplers" in sillytavern
play with the temp a little if you want, but 1 is fine
0 is smarter
1 is more random
2 is schizo
you can do a min p if you want, I usually do .05, but again it shouldn't matter
>>
>>101627824
Q5 is noticeable compared to Q6, but just barely.
>>
File: check this sip.jpg (69 KB, 828x987)
69 KB
69 KB JPG
What's the most powerful local AI that a 4090 + 32GB RAM can run, objectively speaking?
>>
>>101629433
command R non plus



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.