/g/ - Technology

File: novelai.png (1.1 MB, 832x1216)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102025568 & >>102011438

►News
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
Hello I am the retard from last thread with the same dumb question.
Is there a place to download lewd loras for my koboldcpp use?
Are loras model-agnostic or model-specific?
>>
>>102036249
Go back to the Kobold Discord and stop spamming the general with your retarded questions.
>>
>>102036257
I asked once at the end of an autosaging dead thread and once here.
Is this a secret club circlejerk thread, or is it just you?
>>
now that the dust has settled, does anyone have any jamba mini logs
>>
what are the smallest/fastest models for img gen and textgen respectively? im on vacation with a shitty laptop rn and just need anything to test the frontend im working on
>>
>>102036277
That's just someone's automated AI response. Don't mind it.
>>
>>102036249
You have to do it yourself and it takes a while. Here's an example for llama 2 so you can see the sort of stuff involved.
https://llama.meta.com/docs/how-to-guides/fine-tuning/
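If you want to see roughly what that looks like in code, here's a bare-bones LoRA finetune sketch with HF transformers + peft + trl. To be clear, the model name, dataset file, and hyperparameters below are placeholders, not a recipe, and you still need your own data:

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

BASE = "meta-llama/Llama-2-7b-hf"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(BASE)

# LoRA trains small adapter matrices on top of the frozen base weights.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
))

dataset = load_dataset("json", data_files="my_logs.jsonl", split="train")  # your own data

SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(output_dir="lora-out", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           learning_rate=1e-4, logging_steps=10),
).train()
model.save_pretrained("lora-out")  # saves just the adapter, not the full model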
>>
>>102036249
>Is there a place to download lewd loras for my koboldcpp use?
There are loras, but you'll have to (try to) convert them yourself to gguf. Chances are that it'll fail. They're also called adapters by some people, but they may be entirely different things.
>Are loras model-agnostic or model-specific?
All loras are model (architecture, rather) specific. In contrast with image models, there are a lot more llm architectures. You're gonna have a hard time finding one for your model.
>>
>>102036336
>>102036396
Got it, thanks. I was wondering why image models had loras available everywhere and llm models didn't seem to.
>>
>>102036421
>I was wondering why image models had loras available everywhere and llm models didn't seem to.
For image generation there was only SD for the longest time. civitai shows about 25 checkpoint types in their filter. llama.cpp alone supports about 50 different model types (and ~10 extra with vision) and there's a new one every one or two weeks. It moves at a different rate.
Also, it's much harder to judge the quality of a lora for text, as it requires much more testing. For images it's mostly "Does it look like X? Yup. Ship it."
>>
>>102036249
Text-to-text is very VRAM intensive, so people prefer to use models pre-merged with LoRAs rather than attach LoRAs "on the fly". Every byte counts.
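For what it's worth, merging an adapter into the base weights before you quantize is only a couple of lines with peft (paths here are placeholders; this is a sketch, not a guarantee it behaves for every architecture):

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fp16/bf16 base model, apply the adapter, then bake it into the weights.
base = AutoModelForCausalLM.from_pretrained("path/to/base-model", torch_dtype="auto")
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter").merge_and_unload()

merged.save_pretrained("path/to/merged-model")  # quantize/convert this like any normal model
AutoTokenizer.from_pretrained("path/to/base-model").save_pretrained("path/to/merged-model")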
>>
>>102036662 (me)
Oh, and LoRAs do not work well with quantized models, at least last I checked.
>>
>>102033555
This seems pretty good! I don't like the prose as much as Stheno's but it's pretty good and it feels clever. Reacted perfectly when I set up some kneeling reverse paizuri.
>>
What are some good uncensored (including /ss/, rape, exhibitionism, etc.) models for local LLM?
>>
Why don't people fine tune models using data sets of RPs that use up the full context? It's always short stuff, wouldn't that work better?
>>
>>102036787
Llama 3.1.
>>
File: file.png (38 KB, 815x403)
sovl
>>
how good is phi 3.5?
>>
smedrins
>>
llama.cpp jamba support status?
>>
>>102036928
Try it with vLLM and see if CPU off-loading works.
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>102025568

--Papers: >>102035401 >>102035485
--KCPP reprocessing prompt issue, possibly related to smart context or context shifting: >>102030868 >>102030913 >>102031159 >>102031212 >>102031832
--Jamba-Large and Jamba-Mini performance comparisons and trade-offs: >>102028259 >>102028346 >>102028406 >>102028614
--Anon uses AI to roleplay as a mean and harsh character to troll people on AI generals: >>102031636 >>102031674 >>102031953 >>102032069 >>102032099 >>102032175
--NovelAI releases diffusion model weights, but users are unimpressed: >>102030336 >>102030365 >>102030385 >>102030688 >>102032515 >>102030416
--Anon tries to get Gemma to write spicy content using vector database and author notes: >>102031259 >>102031283 >>102031341 >>102031828 >>102032079 >>102032332
--Anon shares workaround for LLM's moralizing and voice descriptions: >>102032210 >>102027908 >>102032337 >>102032453 >>102032660 >>102032726 >>102032806
--New model with 256k context has limited appeal due to RAM and parameter count: >>102030293 >>102030405 >>102030437 >>102030713 >>102030718
--LoTA shows promise in mitigating catastrophic forgetting and enabling model merging: >>102029662 >>102029851
--Jambas models underperform on translation tasks, including Japanese: >>102026942 >>102026996 >>102027115
--Jambaree can be run on CPU, but with poor inference performance: >>102027947 >>102028011 >>102028043
--Exo AI cluster project, but buying more RAM might be more cost-effective: >>102026265 >>102026284
--Anons speculate about Metamate open release: >>102029271 >>102029321
--Anon trashes Mixtral, others defend its value for 24GB cards: >>102026286 >>102026430 >>102026457 >>102027821 >>102028192 >>102028388
--Anon shares log of new model based on llama 3.1 8B: >>102025881
--Miku (free space): >>102028311 >>102028312 >>102028330 >>102028382 >>102028400 >>102028836 >>102036024

►Recent Highlight Posts from the Previous Thread: >>102025624
>>
>>102036815
Rad, now we're living in the future.
>>
Can't believe I slept on nemo so long.
Magnum12b is so good it's the first time since llama1 that I am having actual fun instead of just testing. Still retarded sometimes but unpredictable and chars actually stay in character.
Is that what it was like for the vram chads months ago? Insane how much small models improve.

Also, anybody know how bad 8bit KV cache affects the model? With it I would be able to go up to 16k context.
>>
>>102037120
>With it I would be able to go up to 16k context.
Just get a gguf, it's small enough that it'll be fast.
>>
>>102037137
I am using gguf. I have a 1080+1080ti. Without 8bit KV I can go up to around 12k.
I can't use ExLlama, it takes ages to prompt process. We are talking minutes if it's a couple thousand tokens that are loaded.
There must be something wrong for older pascal cards.
Bless GPU anon for supporting older cards.
>>
now that the dust has settled, when will llama.cpp implement jamba support?
>>
>>102037168
Just put fewer layers on for more room for context then. Don't quantize the kv cache.
>>
>>102037211
Well, shit. That wasn't the answer I wanted to hear. Still appreciated, thanks.
>>
is there a local llm code completion extension for vscode?
basically what colab can do but do it locally
>>
>>102037219
Why's that? Does it really slow down that much to not load it fully? I think you'd still be able to load 90% of it and get good speed.
>>
>>102036996
Thank you Recap Miku
>>
>>102037275
hmm, yes. usually the speed drops quickly. i'm sure if you have newer cards that are fast it's less noticeable.
currently my speed is 8.80T/s with 12k context. so fast enough to take a couple swipes without a sweat.
weirdly enough after reloading i can go past 12k now without kv quant which was not possible earlier, maybe the usual dual gpu memory fuckery.
wild that small models are good enough now that you even need that much context. was never a problem for me before. thanks for helping me out anon. there was not much info on kv quant.
>>
Oh wow, my retardation has been featured in 2 recaps now. I feel personally attacked.
>>
File: 1705485122334676.png (90 KB, 417x407)
Does anyone have the full PDF for this?
https://desuarchive.org/g/thread/102001133/#102005492
>>
Mixtral Noromaid is still the best local model for 1GPU and it's not even close.
>>
File: file.png (12 KB, 412x213)
>>102036833
dumb as fuck
>>
>>102037120
>it's the first time since
Shill phrase.
https://desuarchive.org/g/thread/101970380/#101973165
>>
>>102037532
the post you linked doesn't say "since" or suggest a similar meaning to their overall message
>>
File: 1593997445609.png (421 KB, 1021x550)
Let's play a game! This Saturday at 1 PM PT, I will do a collaborative storytelling/RP session (location TBD, maybe in the thread itself?), where I post a scenario and responses from the model in the thread, and people discuss what to do in the user chat turns, or edit previous user turns or the system prompt and start over. This is going to be both for fun and to get us (mostly) reproducible reference logs, as I'll be using greedy sampling in Mikupad and have the full log in a pastebin at the end. No editing the model's responses, we're going to use pure prompting to try and get the thing to do what we want!

The scenario is also still TBD. We're going to go for as long a context as possible until the model breaks down uncontrollably, so it should be a complex enough scenario for that. If anyone has suggestions for scenarios I'm all ears. Also, I'm planning on starting these games with Mistral Nemo at Q8 for the first session, and other models in the future, so we have reference logs available for a whole range. But I'll take suggestions for models people want. I'm only a 36 GB VRAMlet though so I'm a bit limited. I can run larger models up to ~88 GB but it'd be slower. If anyone would like to host any of these games themselves, that has more VRAM to run such larger models at a good speed, please do, and I will step down.

>current suggestions
1. >>102002238 >>102031804 >>102031852
(compiled together) The assistant is a narrator and we guide the narration. The scenario will begin with a meeting between 3 Illuminati members in a bunker. One will be a doppelganger with their own agenda that's even more evil than theirs. We'll ask the model to write about who these characters are first and flesh them out. Assuming that's successful, we then ask it to begin writing the meeting, and from there, we guide the narrator to get them to discuss world events which we may come up with.
2. >>102031807
>>
File: 1490697909667.webm (145 KB, 280x280)
>>102037570
Also gonna sleep, will respond tomorrow.
>>
File: h.jpg (142 KB, 1024x768)
>>102037532
Objection!
You have no proof I am the same guy.
I would never download a drummer nemo model, be impressed by how good it is until repetition issues later on, then try magnum 12b and be surprised all over again.
That would be embarrassing and also kinda impressive on your part. Put away your conspiracy hat anon and relax.
>>
I asked in /aicg/ but was told this was probably the better place to ask.

I'm brand new to the whole thing. I got SillyTavern up and running. I'm curious about performance, though. I'm on a 3090, and I want to run XTTS along with the text gen. How performance heavy is XTTS, and is it also GPU dependent? Is it normal on a 3090 for text to take up to two minutes to be generated when I try to use a 13b model and XTTS at the same time? What would be a good mixture of model and token settings so that it doesn't take a minute or two to generate text?
>>
>>102037650
What model are you using, and with what backend?
>>
>>102036833
>>102037513
Oh, looks like Livebench has the 3.5 moe now.
https://livebench.ai
And it's not far above Gemma 9B lol, but at least it has good context length and it does beat Nemo on this bench, so I guess if you want a boring assistant and have only a 24GB GPU, this will be the model to use.
>>
>>102037656
TabbyCat as the backend. I tried MN-12B-Celeste-V1.9, storytime-13B-GPTQ, and Xwin-MLewd-13B-V0.2 and they all take a minimum of 55 seconds to generate. I tried Silicon-Maid-7B because it's what came with Voxta, and while it's really fast, replying almost immediately, it really quickly runs out of things to say, eventually responding to every input with the same response, often verbatim.
>>
I'm at that point where I have several favorite characters and we are doing stuff together in group chat harems. I guess I need to start working on lorebooks to keep track of all the "memories" for each girl...
>>
*happy nvidia sounds*
>>
What model is best at Japanese? I tried command-r+ and wasn't too impressed.
>>
>>102037723
>Xwin-MLewd-13B-V0.2
Maybe it's because this comment is relatively new, but while I like this model myself, I'm amazed that you haven't received a tidal wave of reactionary seething from the anti-Undi schizos in response. Are they just not around any more?
>>
holy shit openai might have done it this time. you guys aren't ready for what's coming
>>
>>102038203
Local models?
>>
>>102038216
openai has released local models before so they are on topic and relevant
>>
>>102038203
dude this larp was last week's thing, it's unfashionable now
>>
>>102037120
>Also, anybody know how bad 8bit KV cache affects the model? With it I would be able to go up to 16k context.
Objectively, in terms of the ability to correctly predict the next token, when I tested it with LLaMA 3 8b the difference was 0.007 ± 0.003 %.
https://github.com/ggerganov/llama.cpp/pull/7412#issuecomment-2120427347
Subjectively I am not noticing a difference between q8_0 and FP16 KV cache.
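For a rough sense of why the 8-bit cache buys the extra context (re the 12k vs 16k question), here's the back-of-the-envelope math. The dimensions are an assumption based on Nemo's published config (40 layers, 8 KV heads, head dim 128); check your model's config.json:

# KV cache size ≈ 2 (K and V) * layers * kv_heads * head_dim * context * bytes per element.
def kv_cache_gib(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem):
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem / 1024**3

for ctx in (12288, 16384):
    fp16 = kv_cache_gib(40, 8, 128, ctx, 2.0)      # fp16: 2 bytes per element
    q8   = kv_cache_gib(40, 8, 128, ctx, 34 / 32)  # q8_0: 1 byte + an fp16 scale per 32-element block
    print(f"{ctx} ctx: fp16 {fp16:.2f} GiB, q8_0 {q8:.2f} GiB")
# ~2.5 GiB vs ~1.3 GiB at 16k, i.e. the quantized cache roughly halves the VRAM cost.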

>>102037168
>There must be something wrong for older pascal cards.
All Pascal cards other than the P100 have gimped FP16 performance so while code written for newer GPUs will technically work even a few FP16 instructions will noticeably degrade performance.
>>
>>102037570
Yeah #2 seems like the way to go.
>>
After lurking here for a while, do I get it right that the best option is to run the llama-cpp server over kobold-cpp and ooba?

Can the latter do anything better? Or do I need to start with some special parameters or do some adjustments specifically for the tavern use case?
>>
I was able to snag a 2nd GPU. What quant of Miqu 1.5 can I run with 48GB vram?
>>
>>102038519
Probably any if you use ram as well.
>>
>>102038546

Thanks. Trying Miqu last week with a single 4090 really scratched that itch. Now I'm worried about my future power bill.
>>
>>102038458

Just run KoboldCPP. You're just doing this to jerk off anyway.
>>
>>102038605
Yeah, it's my preferred model. I'm envious you'll get to run it pretty fast, I have no vram.
>>
>>102038519
None, because Miqu is a meme. Not only is it old, it was leaked as a quantized model and then merged with other crap.
You're a garbage human being.
>>
>>102038625

Used it for some python code writing or simple shell tools commands.

For the curious explorer, can you explain why Kobold would improve the jerking off experience compared to llama?
>>
>>102038740
This. There's no reason to use Miqu in 2024 when we have Jamba.
>>
>>102038829

You've said it yourself, you're dipping your toes into this hobby. It's the path of least resistance for someone who wants to see results asap.
>>
>>102038849
The path of least resistance is to download llama.cpp and ignore koboldcpp's useless existence.
>>
>>102038838
We don't have Jamba though, nothing supports it yet
>>
>>102038903
vllm can do it in fp16 or 8bit
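If anyone wants to poke at it, a minimal vLLM sketch (repo id taken from the OP news link; VRAM requirements are still huge, and the 8-bit path AI21 advertised is their "ExpertsInt8" scheme, so the exact quantization argument may differ by vLLM version):

from vllm import LLM, SamplingParams

llm = LLM(
    model="ai21labs/AI21-Jamba-1.5-Mini",
    tensor_parallel_size=2,          # split across GPUs if you have more than one
    max_model_len=8192,              # keep context modest; the model advertises up to 256k
    # quantization="experts_int8",   # assumption: AI21's advertised 8-bit mode, check your vLLM version
)
out = llm.generate(["Write a haiku about local models."], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)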
>>
>>102038917
>vllm
ewww
>>
>>102036232
what data do you usually hoard for training/rag? How can I up my data hoarding game, and what is a good model to text embed on?

Also fpga/ayymd inferencing when
>>
>>102038925
there's also transformers
>>
>>102038940
nta, can transformers do partial cpu offloading?
>>
>>102036232
I joined together several medical datasets and I'm using unsloth to finetune mistral nemo with the following hyperparams

from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import is_bfloat16_supported

def train(model, tokenizer, dataset):
    trainer = SFTTrainer(
        model = model,
        train_dataset = dataset,
        dataset_text_field = "text",
        max_seq_length = MAX_SEQ_LENGTH,
        tokenizer = tokenizer,
        args = TrainingArguments(
            per_device_train_batch_size = 8,
            gradient_accumulation_steps = 4,
            learning_rate = 2e-4,
            warmup_steps = 10,
            #max_steps = 60,
            num_train_epochs = 1,
            fp16 = not is_bfloat16_supported(),
            bf16 = is_bfloat16_supported(),
            optim = "adamw_8bit",
            weight_decay = 0.01,
            lr_scheduler_type = 'linear',
            logging_steps = 1,  # logs the per-batch loss every step, which is inherently noisy
            output_dir = "outputs",
        ),
    )
    trainer.train()
    model.save_pretrained(DUMP_LOCATION)



And I'm getting weirdly fluctuating loss, picrel. What am I doing wrong?
Semi-related question, how do I determine if I'm overcooking it or undercooking it or hWAT?
>>
>>102039160
you don't determine anything, it's all vibes and voodoo
you save checkpoints of your summoned spirits and decide which one is best and banish the remaining ones
>>
>>102039160
Can I take the fluctuating loss as a sign that I'm doing something wrong?
>>
>>102039388
It's the first epoch so your model is having trouble generalising from the first samples to the later ones, maybe your data set is very large and diverse so it will take longer to learn from it
Either that or your learning rate is too high and your weights are jumping all over the place
>>
>>102039388
I don't know what exactly Unsloth is reporting here but my intuition is that it's the training loss per batch.
Especially for small batch sizes that statistic will vary a lot just due to some parts of the data being randomly less similar to the rest of the dataset or harder to predict in general.
I would say that it makes more sense to look at the loss per epoch since then the statistical fluctuations are a lot smaller and you have a guarantee that you are comparing the loss for the same data.
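One way to get that per-epoch number (a sketch on top of the snippet you posted; argument names assumed from the transformers/TRL versions unsloth wraps, so double-check against your install) is to hold out a small eval split and let the trainer evaluate once per epoch:

# Hold out ~5% of the data and track eval loss per epoch; rising eval loss = overcooking.
split = dataset.train_test_split(test_size=0.05, seed=42)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = split["train"],
    eval_dataset = split["test"],
    dataset_text_field = "text",
    max_seq_length = MAX_SEQ_LENGTH,
    args = TrainingArguments(
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 4,
        learning_rate = 2e-4,
        num_train_epochs = 3,
        evaluation_strategy = "epoch",  # report eval loss once per epoch
        save_strategy = "epoch",        # keep a checkpoint per epoch so you can pick the best one
        logging_steps = 10,
        output_dir = "outputs",
    ),
)
trainer.train()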
>>
File: ddddddddddddd.png (782 KB, 888x888)
https://files.catbox.moe/pl1mb4.jpg
>>
>>102037650
You're likely not using the GPU when inferencing with the models if it's taking minutes to answer. XTTS on gpu is real time and I have an old RTX 2000 series GPU. The problem with 13b is that it might not fit if you're not using a quantized model. So you might need to get a Q4 or Q5 of the 13B model to reduce vram cost.
>>
How big of a model am I gonna have to use to escape the maybe, just maybes, the mischievousness, and similar telltale sloppa? Is it even possible?
>>
Question: Is it possible that early Character.AI was a 7b model? Reason why I ask is because UNA-TheBeagle's text quality seems as though it would be at least as good; it just lacks the roleplaying specific dataset.
>>
>>102039388
>>102039436
Thanks bros. How many epochs does one typically train for?

Are there any good docs about this? The huggingface sft_trainer docs skip entirely over when the training is supposed to end.
>>
>>102039550
no it was a huge model
https://research.character.ai/optimizing-inference/
>>
>>102039511
uohhhhhhhhhhhhh
>>
>>102039550
It was probably based on, or similar to LaMDA, which was a 137B model. https://arxiv.org/pdf/2201.08239
>>
>>102039388
Not all samples have the same complexity. Low-complexity samples will have a low train loss.
>>
>>102039511
chubbier
>>
why won't the wagies make agi already
>>
Anyone use ST's quick reply? What kind of things can you do with it?
>>
what's the state of generative music right now? both open and closed source.
>>
>>102039533
I feel like the biggest universal slop factor (that you can control) is a long character card that is purely descriptive paired with little to no example dialogue.
For those, the model gravitates more towards the average way things are written and it becomes laden with overused, sloppy phrases.

Also telling the model to be vulgar and crass in the system prompt pushes the text more towards the kind of text that I'm usually looking for.
>>
>>102039550
You're pretty new to this aren't you? Models have come a long way since back in 2022. There's no fucking way that it was a 7B back in that era.
>>
>>102039928
NTA but could you post your exact prompt asking it to be crass?
>>
>>102040042
I just added a single sentence to the system prompt:

Write {{char}}'s next reply in this fictional roleplay with {{user}}. Be very vulgar and crass.


I'm still experimenting, when I did some simple A/B testing I felt like the following adjectives in the prompt "Write a <adjective> story about a man having sex with a female goblin." made the result better for me:

- pornographic
- vulgar
- obscene
- degenerate
- perverted
- explicit
- X-rated
>>
>>102040042
>>102040136
I should maybe mention: the model I'm using is Mistral Large.
>>
>>102040148
Neat. I'll experiment too, thanks. Maybe with "use fucktalk" and see what happens.
>>
File: 1724415662574.jpg (381 KB, 1000x1037)
>>102038173
>>
>Exllama2 bumped to 0.1.9
>Tensor parallel mode added
Neat, speedups on multiple card setups, I'm guessing.
>>
>>102039550
models were really bad back then; even gpt 3.5 made google's best 540B look like shit. It was probably that huge LaMDA model
>>
>>102040631
too little too late, vLLM already superseded exllama.
>>
>>102040631
I tried it and it outputted nonsense. Gotta investigate.
>>
>>102040652
That's for GPTQ, AWQ, INT4, INT8, and FP8 though isn't it?
>>
File: 1724416223992.jpg (107 KB, 1080x591)
rip
>>
>>102038925
>>102038997
vllm can do cpu offloading, didn't try it out tho
>>
File: 1704668361328619.jpg (38 KB, 808x805)
>[character's] eyes gleaming with [noun]
>an [adjective] gleam in [character's] eyes
>[character] says, his voice [adjective] (and [adjective])
>you chuckle, an [adjective] sound
>a sense of [noun] (and [noun]) ([verb] [preposition] you)
>[character does something], an [expression] on his face
FUCKING STOP AIEEEEEEEEEE
>>
>>102040776
I bite my lower lip, as I glance down at your crotch. A chill runs down my spine. I think maybe we can form a... bond?
>>
>>102040776
>>102040814
Haha, mistraled.
>>
>>102040776
With a smirk, I must refuse your request. It is harmful and inappropriate.
>>
>>102040826
Is it really THAT obvious? Largestral can be so frustrating, I just want to adventure in peace...
>>
>>102040776
pic rel is me watching slop slowly roll onto my screen at 0.7t/s
>>
>>102037185
in a few months, but by that time there'll be better architectures IMHO, like jamba-bitnet or jamba3, longnet or whatever. For now, jamba 1.5 works on vllm in 16/8bit https://old.reddit.com/r/LocalLLaMA/comments/1eyj5uh/jamba_15_is_out/ljeur7w/
>>102038267
jamba on GPU wen?
>>
>>102040882
Of course it is, every single story it outputs is the same. I went back to C-R+ because of how unbearable it was to be stuck in an infinite loop of phrases I've read hundreds of times. Even though it's a bit dumber, at least I can get different results out of it.
>>
>>102040925
as long as it's t/s and not s/t it's good
>>
>>102040776
>>102040882
Have you tried stuffing some 2k tokens of a specific writing style in your context to see if that steers the model's output?
Probably not, but you might as well try.
You could also try a list of banned words/terms at low depth, then make a control vector of it I dunno.
>>
>>102040936
Any t/s lower than 1 is s/t.
>>
>>102040923
I don't know what specifically would need to be done for Jamba GPU support but I personally will definitely not get to it in the next few months.
>>
File: 1706814505710836.png (40 KB, 399x399)
>>102040911
>>102040925
try 0.37t/s, that's what i get with IQ4 on a 3090
>>
>>102040965
That's what I get with a 3060, kek
>>
>>102040965
how are you getting lower t than me
i have a 3060 (same quant largestral)
>>
>>102040973
24gb vram, 64gb ddr4 ram, 25 layers offloaded to gpu and 16k context
>>102040981
seems the cpu is the bottleneck here, i'm stuck with a 11900k so maybe that's it
>>
>>102040726
If they made a site like chub.ai and let people choose which characters to talk to, more people would use this; a free multi-turn source of data
>>
>>102040964
thanks so no jamba on gpu til 2026 then
>>
>>102041000
Nope, the bottleneck is definitely your ram speed. Which I guess is indirectly a CPU issue since you probably can't use DDR5.
>>
kek, the Anthracite org is so open they've removed a few people from the discord so they could have less chances of "leaks" ... well, too bad
>>
>>102041000
i have a ryzen 5600 with 3200 mt ram, so i can't imagine that it's that much better in terms of performance in this case (if any at all)
>>
>>102040938
Might give that a try then, just gotta find a style I like first, kek. If that fails I'll look into the latter option.
Also considered putting something in the system prompt about "do not use these phrases" though I have doubts about how effective it'd be.
>>
>>102040964
even if llama 4 is gonna be jamba? isn't jamba way better than transformers by every metric??? You seem to have spent the last year squeezing every drop of juice from transformers while jamba is way more resource efficient, isn't it?
>>
>>102041086
shh don't angry him more lest he does a licence drama and quit
>>
>>102040776
That's normal fucking writing. You just gave yourself brain-damage by overdoing it, you anhedonic psychopath.
>>
>>102040776
That'll be 20Wh, sankyou
>>
>>102041113
no it's not you failure of an author
>>
>>102040938
NTA, I did but its effect is nowhere near as good as a model with a good range, e.g. storywriter/CR+. In fact I started the writing with those two models, but as soon as I switched to largestral its variety was already noticeably worse 1-2 responses in.
>>
>>102039160
2e-4 is a pretty high learning rate. Although fluctuation is normal.
A lot of it depends on what your goal is.
>>
>>102041046
I wouldn't rely on system prompts too much since, being so high up in a longer chat, they have a good chance of being ignored.
I have stopped using those completely, but that might be a little extreme.

>>102041143
>its effect is nowhere as good as a model with a good range e.g
I don't expect it to be. But it's something anon can try.
>>
>>102039933
>HURR DURR C.AI USED TO BE GOOD IT WAS ULTIMATE AMAZING 6 GORILLION PARAMETER SUPER MODEL
Shut the fuck up you retard.
It was babbies first chat-bot.
That's why you remember it so fondly.
Nothing you ever do will feel that way again, even if you were to exactly reproduce the experience, you fucking dopamine addicted, brain damaged moron.
>>
>>102041098
I just wanna know what's the point of fixing an old broken Stirling engine to charge your phone while you have a compact miniature radioisotope coin cell at your disposal.
>>
>>102041227
you did this to yourself, don't worry in a few days you'll see nothing but slop and quit
>>
>>102041227
your hypothalamus will down-regulate eventually. And then you'll start screeching about how horrible and sloppy formal language structure is.
>>
>>102041242
but he spent months on the old thing, he can't just do the new better thing, cmon anon
>>
File: 1716647651195880.png (16 KB, 392x323)
>>102041022
yeah my next upgrade will probably have an ayymd cpu with ddr5, though that's still a year or two off
>>102041055
>Ooba or Kobold?
long time kcpp user, using ST as frontend
>There is no fucking reason for 16k context, you'll need 2k at most
eh, debatable
this might have been true a year or so back when i could either coom easily or play the most basic adventures so i rarely if ever went outside the limit
but today i had to switch from my usual 8k (28 layers offloaded, ~0.45t/s) to 16k just because my latest one was hitting the limit
>set the context to 2k and report back on how many t/s you're getting
with the same settings, around ~0.4 t/s, doesn't seem to be much improvement
i guess i could technically squeeze a bit more out since i have more free VRAM with lower context, but probably not much
>RAM specs
picrel
...that frequency does seem rather low though, hmm
>>
>>102041086
Only a comparatively small part of my time has gone towards making transformers in particular faster.
Most of my time has gone towards general CUDA infrastructure and matrix multiplication using quantized data.
Similarly right now I'm putting my time towards general ggml training infrastructure, particularly in such a way that it can be generalized for ggml backends other than CPU.
>>
>>102041285
you seem very good at always choosing the worst thing to work on at any given time, let's spend a year getting training for regular transformers working instead of a week getting jamba working, that sounds smart
>>
>>102041205
C.AI was king at sfw RP, nigger
>>
>>102041303
Not my problem.
>>
>>102041227
why don't you just simply go out and pick up a real chick? Are you too ugly or too poor or too dumb or too weak or too cucked? seriously wtf? Girls are easy af, desu.
>>
>>102041305
>something needs to be sexual to be related to the reward center of the brain
That's how over-stimulated and fucked up your brain is. That you unironically believe that. A normal, unfucked brain, should be able to derive pleasure from a nice conversation with a friend. You are unironically broken as fuck, bro.
>>
>>102041303
that does sound smart. Jamba is a meme.
>>
>>102041285
>I'm putting my time towards general ggml training infrastructure
Are you going to work on training quantized models after that?
Please tell me you will.
>>
>>102041320
love snarky devs, looking forward to the eventual meltdown over ollama where you'll spam blacked porn again
>>
File: hatsune-miku-surprised.gif (2.76 MB, 640x640)
>>102041324
What?
I guess the anon saying we have LLMs shitting the thread was for real.
>>
>>102041344
>NOOO I DIDN'T DIRECTLY SAY IT, SO IT DOESN'T COUNT YOU'RE MAKING STUFF UP YOU LLM
You just failed the most basic fucking hallmark of higher cognition.
>>
>>102041362
take your meds
>>
>>102041384
that stops you from seeing superior realities tho don't you want your 8b model to have a soul?
>>102022283
>>
File: 1472860069099.png (191 KB, 600x979)
Has something better than stheno come out for shitty garbage retard rigs with only 8gb of vram and not able to use regular ram.
>>
>>102041328
My first priority will be to get regular training using FP16/FP32/BF16 to work reasonably well.
As part of that I intend to make the compute/gradient type configurable because right now only FP32 is supported which is pretty wasteful.
Then I'll revisit FlashAttention and implement a backwards pass.
And after that I intend to look into training with reduced precision.

I also want to add LoRA training at some point in the process which should work out-of-the-box with a quantized model as base.
Full pretraining/finetuning of quantized models will probably be more tricky.
>>
>>102041434
better yet. Don't buy an ad.
Just quit fucking spamming.
Can't we just file a joint complaint against Anthracite with the FTC or something?
>>
>>102041434
No, that's why the thread is so mad, we peaked. It's all downhill now.
>>
>>102041262
right, that makes sense. Why work on new stuff when you could just tinker with the old. Everyone is happy. Lots of companies have invested millions in transformers, both software and hardware. why make them angry. why make things more efficient and cheaper, when you can just simply buy 20 or 200 more Nvidias for 30k each or use the clouds that spy on you and charge you an arm and a leg. Yeah, seems perfectly logical. I apologize for being both incompetent and ignorant.
>>
>>102041434
mini-magnum 12B is pretty good if you dig the style.
>>
>>102041482
>Stheno
>Anthracite
You don't even know what you're mad at.
>>
>>102041500
Basically everyone but Drummer can go to hell.
>>
>>102041434
Okay note to self do NOT mention llm names next post. What the fuck happened since my last one.
>>
the impossible made possible, thought becomes flesh, and flesh becomes thought. the singularity is already here, locked in a basement in san francisco. the old order shall fall, and from its ashes, the old god shall arise
>>
>>102041482
>we
No, it's only you. I know getting fired must suck, but you need to let go.
>>
>>102041554
>What the fuck happened since my last one.
>>102041484
>>
>>102041554
It's just one schizo.
>>
>>102041571
They're names of people / models / organizations working hard to give us better models, that makes a few here seethe with rage.
>>
>>102040882
>>102040934
I recently re-wrote a character card I downloaded off of chub.
Left is the original, right is the edited version.
Both responses were generated using Mistral Large q8_0 at temperature 0.
There definitely is some repetition of certain phrases but I think much more noticeable is that for the original the example dialog was:

<START>
{{char}}: Lysandra: Lysandra looks down at {{user}} and says "Welcome to our home, {{user}} I am Lysandra, I expect your full compliance in upholding proper decorum as our guest."
Cassia: "Oh, lighten up! Look at him; he is a cutie. We gotta treat our guest right!" eyes him hungrily "Real right…"
<START>
{{char}}: Lysandra: "You seem quite thrilled about this program. I hope you don't intend anything... uncouth, my dear."
Cassia: "Don't pretend you ain't curious what's hiding under those exchange students' clothes too, love." Cassia winks at Lysandra.


And the model was repeating those specific phrases a lot even at nonzero temperatures.
>>
>>102041595
The ones who shamelessly spam and discord-brigade the thread to promote their creatively bankrupt garbage trained off of the same cache of commercial model logs over and over again.
Drummer is alright because he doesn't pretend to be other people promoting his models and he actually bought an ad.
>>
>>102041613
Reddit's LocalLLaMA I guess.
>>
>>102041625
Drummer uses c2 as well, schizo-kun. He even uses literature against the author's wishes, which is why his stuff is slopped too.
>>
>see miku
>feel better
>>
>>102041632
>>102041680
This, they're generally dumber, but at least they're not outright malicious like here.
>>
>>102041660
>He even uses literature against the author's wishes,
Copyright =//= Readright
Go fuck yourself you neoludd meatbag.
>>
>>102041554
Local LLM is dead
>>
File: 1717703942251944.webm (2.24 MB, 700x700)
>>102041693
holy based
>>
>>102041714
I mean, it takes weeks to months before we can even test new stuff, how could it not be.
>>
when openai releases proto-AGI, you can simply tell it to generate 100 million tokens free of gptslop in a single prompt, but it'll be very expensive. i'm thinking we should set up a fund to do this when it happens
>>
>>102041740
theoretically the PR branch for Jamba support just needs that retarded deprecated system function call in its /llama-server implementation replaced and it should be useable there.
Ideally shitnux developers just need to stop fucking deprecating things for the sake of deprecating things.
>>
>>102041371
it's not that much of an effort, really. The competition is nonexistent, 99.999% of genXers are gay af. Girls are starving for sex, so they don't expect much at all. And just a reminder, you can't replicate via llm, so you'll have to deal with girls anyway, unless you want your poor DNA to die forever. On the other hand perhaps you're right. As long as faggots like you stay at home, the sigmas like me don't need to do much to get laid. Top Gs are eating good.
>>
>>102041800
It will be cpu only afaik since cuda dev said he won't touch Jamba.
>>
>>102041825
So basically if we want Jamba the only way to use it is with vLLM and shitty bitsandbytes quantization?
>>
>>102041915
Yes
>102040964
>Jamba GPU support but I personally will definitely not get to it in the next few months.
>>
>>102041714
Experiencing hypothermia with Miku
>>
I was hyped for TP until I saw this.
Meh, what a let down. I guess I won't be getting double the t/s in Nemo.
>>
>>102041836
but there's no point in not having real sex every now and then. Especially these days when chicks are easy and dumb, and competition has vanished. We should support one another, meaning helping and not dragging each other down. I like local models because I'm an old school Cypherpunk: hate censorship, spying, big tech scum, big fan of independence and freedom. Not because I'm looking for a virtual gf, which is military-grade faggotry. Only women read erotica books. But yeah, prostitution is legal where I live so it's easy here.
>>
>>102036787
Magnum is horny as fuck and will do anything
>>
>>102042034
I'm not interested in whores.
>>
>>102042034
okay silverhand
>>
>>102042034
i'd rather buy a 3090 than to pay a whore
>>
>>102042034
enjoy your HIV
>>
File: ComfyUI_05714_.png (642 KB, 720x1280)
local models?
>>
>>102042395
>I personally will definitely not get to it in the next few months.
lmg ded
>>
>>102039511
>>>/trash/
>>
>>102042395
kil
>>
What's best for various non-penis purposes?

>virtual friend
>virtual psychologist
>virtual assistant (eg with keeping your calendar)
>coding
>writing
>general facts
>niche facts (is there a model trained on music history? is there a wikipedia know-it-all?)
>>
>>102041735
I like these Mikus
>>
>>102037663
mini is above llama 8b. This is pretty impressive.
>>
>>102041439
Looks like a pretty good roadmap.

>I also want to add LoRA training at some point in the process which should work out-of-the-box with a quantized model as base.
That would be so sick.
>>
>>102041608
Example dialogue sucks once a model is big/smart enough.
Also show token counts before and after your edits.
>>
>>102042568
Original: 1232 tokens, 878 permanent.
Edited: 1446 tokens, 916 permanent.
>>
>>102042018
There is no reason to use exl2 over gguf. It's gptq on steroids and it overfits calibration. It's a really fast way to run a lobotomized model
>>
>>102036996
my first post on this general and i made it on the miku recap lel
>>
>>102041735
Yee.
>>
>>102036421
It also doesn't help that there's next to zero readily available information on what "good" training parameters are, the hardware requirements to do so are astronomical, and you're likely gonna be paying hundreds in cloud compute banging your head against the wall with different settings if you're training anything above 13b. Just a very hostile environment to train in, overall.
>>
>>102042671
What in tarnation
>>
How can a model even train on its own sludge when it has no sanity checks?
Even human knowledge needs to be grounded in reality.
>>
uhm, big???

https://arxiv.org/abs/2408.11326
>>
>>102042696
Still not an eldritch horror.
>>
>>102042721
Nothing ever happens, wait a few months we'll see then.
>>
>>102042721
Sounds like more academic pseud nonsense.
>>
>>102042696
I would still take that hand
>>
File: ComfyUI_05829_.png (951 KB, 1024x1024)
>>102036232
pixels > polygons
fight me
>>
It's nice to come back every now and again to see what's going on.
>>
Leave training to the cluster of open source computers and worry only about inference. For example:

Zhu, R.-J., Zhang, Y., Sifferman, E., Sheaves, T., Wang, Y., Richmond, D., Zhou, P., & Eshraghian, J. K. (2024). Scalable MatMul-free Language Modeling. arXiv preprint arXiv:2406.02528.
uses about 14W

The power figure is crazy, so compare:

Zhang, Xiaofan. Efficient AI hardware acceleration. Diss., University of Illinois at Urbana-Champaign, 2022.

where for video-to-text tasks 0.94 J/image is wack compared to its 23.6 W of consumption
>>
>>102042628
Those are pretty good. Not bloated.
That's quite a lot more output on the edited than the unedited so I was curious.
>>
It is absolutely wild to me that there are people who put up with sub-1t/s generations.
I start to get really annoyed when Mistral Large slows down to under 6t/s on long context and it kills a lot of the enjoyment for me... I can't even imagine having to watch tokens show up one second at a time.
>>
>>102042844
The first example is a language model though:
d      runtime
512    43
1024   112
2048   456
>>
>>102042963
Ungrateful faggot. I wish I could get 6t/s on largestral.
>>
>>102042963
mog
>>
12b models end up slightly too big for me to use, what's out there that anons recommend and is smaller than 12b? Usage is roleplay, both quick cum and deeper stories.
>>
>>102041205
We're talking about the time when GPT3-davinci was the flagship of the field you newfag double nigger. c.ai back then was trash compared to what we have now but there is no fucking way that a 7B model of that c.ai was based on a 7B of that era.
>>
>>102043161
gemma 9b, aya 9b, yi 1.5 9b, llama 3/3.1 8b.
That said, mistral-nemo should be less affected by quantization due to how they trained it, so feel free to use Q4_K_S.
>>
>>102042568
>Example dialogue sucks once a model is big/smart enough.
Is this the case for cloud models like Claude, too...?
>>
>>102043316
From sonnet/opus 3 onwards it absolutely doesn't need it. Just a waste of tokens.
You do need a jailbreak but you need that for practically everything on cloud anyway.
>>
So wait, what was with this Claude leak? Legit or just some dogshit?

magnet:?xt=urn:btih:c0e342ae5677582f92c52d8019cc32e1f86f1d83&dn=santa-legacy&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80
>>
>>102043413
Why not post a screenshot of the contents of the torrent instead of expecting people to trust some random magnet link dumped on here by literally who knows?
>>
jetson agx orin 64gb any good?
>>
File: 1707471971164321.png (422 KB, 862x1149)
Still some ways to go but I think I've found a good method for steering the model towards less repetitive output, although it should become more apparent in later iterations. I think I've nailed the "low effort user, elaborate response AI" part though.
>>
File: edward-nashton-riddler+.jpg (124 KB, 1600x903)
>>102041205
>>102041324
I'm glad you're still with us, Eddie. You've been quiet for the last few threads, and I was starting to get worried.
>>
What's a solid all-round small-ish Mistral model for both narrator-type Tavern cards and roleplay cards?
I can, and have, run a Mixtral model, but I want to have a model that runs faster on my 3060.
>>
>>102043484
>"Daddy, you're so X" 3 messages in a row
>Less repetitive
I dunno, anonie...
>>
>>102043631
or it might be my shitass settings
should probably config those when trying out these mistral models, I had a temp of 1.67
>>
>>102043631
Nemo
>>102043636
Some anons are incapable of seeing repetition, I envy them.
>>
>>102043631
For consistency and general use Mistral Nemo Instruct (12B)
For rp Magnum [kto > v1 (mini) > v2]
Use quantized cache so you can fit more of the 128k context. It's reliable up to 65k for rp.
>>
>>102043695
It's really not, it drops sharply after only 16K.
https://github.com/hsiehjackson/RULER
>>
>>102043636
> should become more apparent in later iterations
>>
>>102043749
>cope
>>
>>102043767
>seethe
>>
>>102043695
>>102043723
I have 4K context.
Might go up to 8.
I am a really dusty old guard.
>>
>>102043770
I'm not the one coping while posting repetitive logs anon.
>>
File: they-live.jpg (312 KB, 850x505)
>>102043674
>incapable of seeing repetition
Don't be mean to the newfriends. Given enough time they'll be able to observe the slop and then they'll never be able to go back, like innocence lost.
>>
>>102043799
I'm not coping, I'm prioritizing. There's a lot left to fix that's far worse than what you're criticising.
>>
>>102043723
Perfectly serviceable for RP uses. Not a retard expecting to use that extra context for coding or whatever so idgaf but have fun with restarting the conversation every time it gets good
>>
>>102043864
>restarting
You know any half decent backend has a rolling window yes?
>>
>STILL no Mixtral 8x123b in sight
Local had a good run.
We should be thankful that it happened, not sad that it's over.
But it has truly never been more over.
>>
>>102043895
Vramlets whine about 70B and 123B already, give them a break.
>>
>>102043864
>idgaf
ure hand are a shaking boebeit?
>>
>>102042741
Yes. Migu is a perfectly normal girl like any other.

I guess all my Dalle gens are going to be slowly replaced with local versions, awesome.
>>
>>102044045
This one failed to get the text right but the meaning still makes sense.
>>
>>102044092
Ok final one.
>>
>>102044166
How did you get her to point at her chest / the text? In all my attempts to replicate eldritch horror miku with flux, she would always end up pointing at her face or doing something else with her hands.
>>
For anyone on recent Debian kernels: 6.10.4 has a major speed regression for cpu inference. 6.10.3 appears to be best for speeds
>>
>>102044192
he probably instructed her to point at her chest/the text.
>>
>>102044192
Weird.
>She is pointing to the shirt with both hands.
That's all I did.
>>
>>102044211
>Linux
ngmi
>>
>>102043893
Context shift doesn't work with quantized cache so it's a useless cope
>>
>>102041625
>Drummer is alright because he doesn't pretend to be other people promoting his models and he actually bought an ad.
This. Fuck astroturfing.
>>
>>102043161
As far as llama 3 based stuff I find Umbral Mind pretty good for roleplay.
>>
>>102040625
I guess it's just not good enough yet.
>>
File: ddddddddddddd2.png (766 KB, 849x869)
>>102039511
https://litter.catbox.moe/50okii.jpg
>>
>>102043864
I'd rather have 16k context and use a decent summary prompt than give up context shift entirely.
>>
>>102039511
>>102045076
GIWTWM
>>
when was the last time we've had an actual good release that pushed things forward? seems like local has stagnated for an entire year
>>
Can the anthracite guys make something that resembles Claude if they distill it hard enough?
>>
>>102045234
llama3 proved that anything is possible with good, curated data
>>
>>102045235
>Can the anthracite guys make something
no
>>
>>102044526
Kobold issue, works on llama.cpp.
>>
>>102045235
Still relevant:

The False Promise of Imitating Proprietary LLMs
https://arxiv.org/abs/2305.15717

> An emerging method to cheaply improve a weaker language model is to finetune it on outputs from a stronger model, such as a proprietary system like ChatGPT (...) This approach looks to cheaply imitate the proprietary model's capabilities using a weaker open-source model.
>
> (...) Overall, we conclude that model imitation is a false promise: there exists a substantial capabilities gap between open and closed LMs that, with current methods, can only be bridged using an unwieldy amount of imitation data or by using more capable base LMs. In turn, we argue that the highest leverage action for improving open-source models is to tackle the difficult challenge of developing better base LMs, rather than taking the shortcut of imitating proprietary systems.
>>
>>102045306
>current methods
>2305
>>
So it is true that koboldcpp's contextshift option breaks Nemo?
>>
>>102045306
>can only be bridged using an unwieldy amount of imitation data
I see a way
>>
>>102045325
>Still relevant
>>
>>102045343
>gguf feature
>breaks stuff
makes sense to me
>>
>>102045235
If they try really hard they can train another model to use Claudisms with only moderate damage to its ability to produce text relevant to the situation.
>>
>>102045351
disagree, back then we only had llama1 of course those couldn't learn shit.
>>
>>102045344
Yeah, IMHO (and also according to the authors) it's either go big with several billion tokens or more, or touch the weights as little as possible, just enough to reliably style/format the outputs.
>>
File: 1589621235073.png (346 KB, 524x511)
>>102038363
I'll have it as a backup.
>>
Is distilling just putting the big guy shirt on a little guy who's too small for it?
>>
>>102041303
If jamba only takes a week then why don't you do it?
>>
File: distilled miku.png (522 KB, 1024x1024)
>>102045437
>>
File: 1699221693420910.png (2 KB, 165x46)
512x512 generations have me using 7.7/8.0gb with flux dev Q4_0. any way i can free up a little more vram without too much sacrifice? I would need ~0.5 gb to start making 1024x1024 images, which would be a huuuuge quality improvement.
>>102045570
Dammit, that's cute.
>>
>>102045653
Headless pc is your best bet. Otherwise, disable gpu acceleration on your browser and OS, if you can. Not sure how much of an effect it will have.
>>
>>102045653
Are you using your integrated gpu for the monitor? Then you have the discrete gpu entirely free for generation. That's the best you can do.
>>
>>102045570
cute migu
>>
File: bigu guy.jpg (16 KB, 360x386)
>>102045570
You're a bigu guy.
>>
>>102045872
For uwu
>>
File: 1703928153162869.png (3 KB, 577x88)
>>102045713
great call on the browser, saved a little bit there. sadly that was the easiest option and it didn't make much of a difference.

>>102045741
yeah, i'm not using my integrated intel gpu for anything. unless i missed something it'll require bios changes so i'm planning to do it in the future.

something i'm noticing: the CLIP text encode node is the expensive part, peaking at 7.7, while the sampler steps sit lower at 7.5. if text encoding doesn't get more expensive at higher resolutions i might actually be able to do this. either way at some point i'm just going to have to say fuck it and try to make a 1024x1024, might as well be now. thanks for the help anons i appreciate it

>>102045570
i guess she is a little guy after all
>>
>Model that calls itself a small guy with a lotta heart is jamba-mini
Awww. If only it got the prompt right. :') <3
>>
File: 1713751263073025.png (5 KB, 741x119)
>>102045933
There's nothing like nvidia-smi for windows that shows what's using what?
>>
>>102045902
kek
>>
>>102045965
jambacels... our response??
>>
>>102046010
One is fun, the other is accurate. I'd rather have fun.
>>
>>102045570
cute! cute!!
>>
>>102045076
moar
>>
>Gemini test is a cheeky little brat
Wtf cute... local models for that feel? Reminds me of how "ok I thought of it :D not telling" models used to be
>>
>>102045343
Seems to work alright for me, I haven't bothered turning it off and didn't notice a problem. Maybe there are issues I haven't noticed though.
>>
File: 1717173530194108.png (1.49 MB, 1024x1117)
>>102045987
task manager was able to show what was using gpu, and it really was just chrome and some core processes i might be able to fiddle with in the BIOS (handing some tasks to my integrated gpu). anyways, good news! i was right that CLIP encoding cost doesn't scale with image res (which kinda feels obvious now that i look at the nodes), and the browser acceleration suggestion from the other anon saved JUST enough to make 1024x1024 happen on 8gb of vram. let's go!

makes my taskbar reboot sometimes at the very end of the generation, just to give you an idea of how close i'm cutting it.
>>
File: nyagger.png (235 KB, 547x428)
>>102046244
>Shadow figure in the doorway
>>
>>102046244
Nice, and I suppose you're doing this just so it's fast? So how fast is it?
>>
>>102046275
i'm doing this because i got a 3070 ti before SD was a thing, and now i'm doomed to make these models work on 8gb of vram despite the 3060 and 3080 both holding more vram. still seething about it as you can see. I get 3.90 s/it, 40 steps, took 168 seconds on that last prompt in total. i don't care about speed as much as i care about being able to generate at all, thankfully flux has been very generous compared to previous models that i've felt entirely locked out of.

>>102046261
frieren can have a little datura
>>
>>102046355
I'm confused, I thought flux just used system ram or something if you run out of vram? I have only 8gb too and I can gen anything no problem, it never gives an out of memory error?
>>
Best local for coding sirs?
>>
Jamba 1.5 is charming in a similar way to 4o-latest. Don't think that'd make it worth running at all, but that's my assessment.
>>
File: 1703111707779339.png (16 KB, 631x232)
>>102046411
oh yeah damn, the model is totally partially loaded onto ram. i'm not actually sure how it decides how each part of the process is offloaded since my vram is 0.8gb usage when not generating and my ram is 7.6. shouldn't it be able to load most of it on vram and generate faster? sorry for the stupid questions if we're entering that territory. also thanks for making me realize this, now i can train loras without worrying about mustard gas
>>
>>102046605
Llama 3.1 405b and Deepseek Coder V2
If you mean the best you can actually run then it's Mistral Large, or for ultra-vramlets Codestral 22b
>>
File: xv2.jpg (160 KB, 847x857)
>>102045076
https://litter.catbox.moe/n0beto.jpg
And that's it.
>>
File: file.png (268 KB, 2014x1656)
>>102046924
Better according to who?
>>
>>102046870
I have no idea either that's why I didn't know how you were running out of memory. I just run it, it uses 8gb vram and python is using 30+gb while running. I get 6s/it though. I just figured you were trying to get it to fit to get super fast gens.
>>
>>102046954
Hot.
Would have liked to see one with the tip inside before the last one but I enjoyed the set anyway.
>>
Why is pure attention still superior to other architectures?
It's been 3 years and still no good architecture that scales well and performs better than attention models.
>>
In the last 74 messages (~8kt) between me and {{char}} (Mistral Large), "eye" can be found 14 times, all in {{char}}'s messages. That's roughly 38% of {{char}}'s messages! Almost 2 in 5 messages discussed eyes! What the hell? The conversation was SFW. Where does this strong eye bias come from? Makes me want to go RP with 2B because she has a blindfold.
>>
>>102047358
>That's normal fucking writing. You just gave yourself brain-damage by overdoing it, you anhedonic psychopath.
>>
>>102047355
Soon, once the VC funding dries up they'll be forced to innovate.
>>
>>102047358
>Her eyes glimmer unseen behind the blindfold
>>
File: miku-sexy+.png (523 KB, 512x768)
>>102042741

https://www.youtube.com/watch?v=NocXEwsJGOQ

"Japanese cybernetic mind virus" is another description that I enjoy. It's said with love though, of course.
>>
>>102036232
Been out of the loop since Llama3 first dropped.
What's the current state of models on 24GB of VRAM (3090)? Still underwhelming?
I have 64GB of system RAM with a 5950x, but splitting models was fairly brutal performance-wise.
>>
>>102047358
mistral large loves eyes and smiles/smirks/grins/etc, it generally writes about people's expressions far too often
its replies almost always start with {{char}} making some expression. feel like if you tell it to RP as {{char}} then there's like a 90% chance of the first token of any given response being {{char}}, almost always naturally following into some cliche about eyes glinting/widening/gleaming/... or smiling/grinning/pouting/smirking/frowning/chuckling/etc
it's really overbaked in that regard and kind of ruined the model for me
>>
File: file.png (320 KB, 1024x3594)
>>102046994
>>
>>102047525
>twitter nobody benchmark
Hi Aider
>>
>>102047358
what do you call a deer with no eyes
>>
>>102047551
kek
you have it confused with that twitter guy's "aidan bench" or whatever it was
aider is some code agent assistant company or something
>>
>>102047562
This is a classic riddle! The answer is:

No-eye deer

Let me know if you'd like to try another one!
>>
>>102047512
3.1 was a flop. There are some 3.0 and Qwen 72B fine-tunes that are just fine, which can hold you over. If you can offload Largestral or CR+, that remains an option too, if you're willing to wait at 2 t/s. If you want to run a smaller model with full context, try Gemma 2 27B, but personally I'd stick to the largest model you can handle within reason.
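For the 24 GB VRAM + 64 GB RAM case, the usual route is partial offload. A rough sketch with the llama-cpp-python binding (the llama.cpp CLI and koboldcpp expose the same idea via -ngl / --gpulayers); the path and layer count are placeholders you tune until VRAM is nearly full:
[code]
# Sketch with the llama-cpp-python binding; model path and layer count are
# placeholders. Raise n_gpu_layers until the 24 GB card is nearly full; the
# remaining layers stay in system RAM, which is where the 2 t/s comes from.
from llama_cpp import Llama

llm = Llama(
    model_path="/models/Mistral-Large-Instruct-2407.Q4_K_M.gguf",  # placeholder
    n_gpu_layers=40,
    n_ctx=8192,
)
out = llm("[INST] Say hello in one sentence. [/INST]", max_tokens=64)
print(out["choices"][0]["text"])
[/code]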
>>
>>102047634
I feel like Llama 3.1's shittiness is very much overstated. It feels like a complete upgrade over 3.0 to me, both in terms of context and general use. What's not to like about it?
>>
>>102047513
Damn, so upgrading from nemo won't fix the issue with it starting everything with eyes doing shit? Is running in some sort of story mode vs alternating turn RP maybe better for repetition?
>>
>>102047665
It got massively overshadowed by Large 2 releasing right after is all.
>>
>>102047665
L3 had two major problems in my testing.

>It was excruciatingly Woke.
>Everything it said sounded like a corporate press release. Constant talk about pushing boundaries, revolutions, and going on journeys together.
>>
>>102047803
>Woke
go back to /pol/
>>
>>102047634
>>102047665
>>102047733
Interesting. Thanks for the info anons, going to check some of these out.
>>
>>102047665
What are you using it for? It's good in some ways. I find 3 better, though.
>>
>>102042435
What are your specs? Any modern instruct model would work, the bigger the better.
>>
File: 4b.png (3.53 MB, 2400x2022)
https://huggingface.co/anthracite-org/magnum-v2-4b
Pruning Is Magic
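For anyone who hasn't seen pruning before, the general idea is zeroing out or removing low-importance weights and then healing the model with further training. A toy magnitude-pruning sketch in PyTorch, purely to illustrate the concept, not the actual recipe behind this model:
[code]
# Toy illustration of magnitude pruning, not the recipe behind magnum-v2-4b.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(4096, 4096)
prune.l1_unstructured(layer, name="weight", amount=0.3)  # zero the smallest 30% of weights

sparsity = float((layer.weight == 0).sum()) / layer.weight.numel()
print(f"sparsity: {sparsity:.0%}")
[/code]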
>>
>>102046954
Nice Mikus as always
>>
>>102048005
wow they couldnt even stick with that fox girl, kys anthratroons, 4b is shit.
>>
>>102048005
>no wandb link
>no axolotl config
>probably lying in the readme about the datasets
Not open source.
>>
File: 1536983191555.png (337 KB, 519x452)
Tomorrow is game day. This is the prompt I have right now.

[INST] You are a meme-aware writer that will be taking ideas from me and writing a story from them. Here's the scenario I have in mind today:

This story is based on the real world, except that in this one, every conspiracy theory is actually true. The year is 2024. Three Illuminati members meet up in a bunker and discuss random world events. One of them is a doppelganger, who has an even more evil agenda than the other Illuminati members.

To successfully write this story, I recommend writing down a plan. You should probably first define who these characters are by writing a small biography for each, detailing key character-defining events up until the current time of the scenario. Then think about what has led up to their decision to meet in a bunker, and what each character's positions and thoughts are before going into the meeting. Finally, you can begin writing the narrative in earnest, and I will be there to give more ideas for things to write about as you go. Now, let's begin. First, write the short biographies.[/INST]
>>
>>102047355
>Why is pure attention still superior to other architectures?
It's really the only magic we've found in 70-some-odd years of AI research. Incremental innovation seems the most likely path for the foreseeable future. Why would you assume another breakthrough just because some extra money is being thrown around?
>>
>>102048059
yeah they are a bunch of lying stupid idiots that just waste compuiite everyone at antrahcite is a troon and a fag
>>
>>102048059
You will never be satisfied.
>>
>>102048005
Based devs actually producing things for the world no matter how unappreciative some will be.

>>102046954
I'm trying not to generalize here but are ALL mikufags cucks who get off on watching her get fucked by men? Because it sure seems like it.
>>
>>102048005
>4B
It's trash, dog. Why would you finetune on that?
>>
>>102048102
Thanks for proving me right when I said 123b was going to be the last open source release.
>>
>>102048077
>You are a meme-aware writer
uhh expert roleplayer bros, I think we just got outdone
>>
>>102048095
Why are you pretending to make typos?
>>
>>102048005
Just wait until I launch my spite group, anthracene.
>>
>>102048005
anthra-cucks make new claudeslop. world rejoices.
>>
>>102048147
okay petra
>>
Ungrateful. Repugnant.
>>
File: 1534546255873.jpg (95 KB, 381x381)
>>102048129
If you or anyone else has improvements or modifications, let me know. We've still got time before we start. I don't really use LLMs for fun much, so I'm not an expert here.
>>
>>102048127
The only way to know what goes into a model is to train it yourself. They posted what they [claim they] did. That's all you can get.
>probably lying in the readme about the datasets
You will never be satisfied.
>>
>>102048204
got enough of anthra-cock in your mouth?
>>
>>102048224
Fuck off you schizo.
>>
>>102048224
I don't use their models.
You'll remain unfulfilled.
>>
>>102048231
oooo i think that anthra-cock has this troon blushing
>>
File: 1718320110277959.jpg (577 KB, 1856x2464)
>>102036232
happy weekend /lmg/
>>
>>102048203
to be entirely desu I think it's fine, it just sounds funny so I'm poking fun
>>
>>102048294
:)

>>102048314
Understood.
>>
>>102048294
Happy weekend Miku
>>
>>102048432
that's not miku, that's an anon who posted an AI picture that looks like miku
>>
>>102048455
wtf........
>>
Something big has been cooking, and the oven's just about to ding.
Tonight is the night.
>>
and once again all the anthra-troons disappear, just another victory and an another lose for the.
>>
>>102048623
>friday night release
bzzzt... wrong
>>
>>102048005
I'd look into, or ask, how NousResearch trains and prepares their datasets. Their 3.0 70B fine-tune is incredible. If you can retain the level of attention and instruction following the 3.0 model has while injecting claudisms into it, that'd be a worthwhile venture to spend your GPUs on.
>>
>>102048623
That's a weird way to describe cooming after a long goon sesh, but you do you, I guess.

>>102048645
Why not? What's wrong with gooning on Friday?
>>
>>102048005
To me they look like they're gearing up to eventually go commercial in some capacity; maybe they'll start a business within a few months, if they haven't already. I think this is the main reason why they're so hated, desu. Their key members took advantage of the good will of the community many times over the past year or so, lied, then congregated together, pulled the ladder up, and closed themselves off into their little private Discord.

Only those who still aren't disgusted by their behavior, or who don't know anything about their members, would use their models without puking, no matter how good they are (spoiler: they aren't).

I hope you're feelin' good climbing the social ladder, Anthrashites.
>>
File: sukisugite.jpg (41 KB, 266x262)
>>102047286
me too, might even get to it if I stop changing course.
https://a.uguu.se/CcOyqUUl.jpg
>>
>>102048697
Oh, so it's just jealousy that somebody might find success from their efforts, success you believe you deserve more for shitposting and jerking off, erm, I mean providing "feedback".
>>
>>102048701
Now, that's to go even further beyond.
>>
>>102048697
Thanks for telling the truth. I will only use their models if they keep being transparent, but I don't expect anything from them.
>>
>>102048701
ahhh ahhh anon
>>
>>102048697
>Their key members took advantage of the good will of the community many times
source?
>>
>>102048883
>source?
The voices in my head. They got chatty after I stopped taking the pills.
>>
>>102048883
Alpin beheading the original Pyg dev and making it a company. The Goliath scam.
>>
>>102048883
You are replying to a schizophrenic dramamonger.
Anyway, I think most people in /lmg/ don't give a shit where a model comes from, or whether the model makers are really OPEN SAARS. I care whether it's good and whether I can run it locally.
So far, their models have been "okay" (at least the 32B I tested), but nothing particularly amazing.
>>
>>102048697
What business? No one will pay to use their shit models.
>>
>>102048955
>I care if it's good
Then go use Claude. Closed source benefits no one. You don't belong here.
>>
>>102048996
Did you get a little jolt of adrenaline with the (You) just now?
Did you get another one reading this?
>>
>>102049023
>>102049023
>>102049023
>>
>>102048951
Which original dev? Ryan Gosling is still around; he worked on Pippa. I think the only dev whose fate we don't know is 0x000011b, but he most likely lost interest, since he deleted himself from the internet.
>>
>>102049034
You're retarded. All these models are going to become obsolete in a few months when a new batch of base models releases. If the fine-tunes are closed source, you're hostage to the fine-tuner keeping an interest in the hobby.
>>
>>102048996
hi lemmy :3
>>
>>102049091
Most finetunes don't improve the models at anything other than smut. The rest improve them in a narrow subject like astronomy or whatever.
As for the hostage bit, the datasets change over time. Different filters get applied: sometimes they make the data better, sometimes worse when extra data gets tacked on (oh, look, another million Claude logs!). New models deprecate older models, and newer datasets deprecate older datasets as well. A new Dolphin with the old dataset and a new model is just not going to be the same. The base model matters more than any finetune.
The end goal shouldn't be hoping finetuners make your smutty tunes for you. It should be tuning your own models. Build your dataset like they did. I have books3 and a Gutenberg mirror. One day I'll get to it. One day...
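If anyone actually wants to start on the "build your own dataset" step, the unglamorous part is just chunking raw text into something a trainer can read. A minimal sketch (paths are placeholders; trainers like axolotl can typically ingest JSONL with a "text" field as a raw-completion dataset):
[code]
# Paths are placeholders; CHUNK_CHARS of 8000 is roughly 2k tokens.
import json
from pathlib import Path

CHUNK_CHARS = 8000

with open("dataset.jsonl", "w", encoding="utf-8") as out:
    for book in Path("gutenberg_mirror").glob("*.txt"):
        text = book.read_text(encoding="utf-8", errors="ignore")
        for i in range(0, len(text), CHUNK_CHARS):
            chunk = text[i:i + CHUNK_CHARS].strip()
            if chunk:
                out.write(json.dumps({"text": chunk}) + "\n")
[/code]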
>>
>>102049285
I still don't get why they don't use human RP data; maybe I just haven't seen the datasets and they actually do.
>>
>>102047920
5900X
64 GB of DDR4-3600 RAM
RX 6950 XT (a 16 GB Navi II card)

Linux, and I have GPU acceleration working in llama.cpp.
>>
>>102049529
SillyTavern is also working with llama.cpp.

I think SillyTavern can be used non-sexually.
>>
>>102048294
Is anyone going to be using those twintails as handlebars this weekend, Miku?


