/g/ - Technology

File: novelai.png (1.1 MB, 832x1216)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102025568 & >>102011438

►News
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
Hello I am the retard from last thread with the same dumb question.
Is there a place to download lewd loras for my koboldcpp use?
Are loras model-agnostic or model-specific?
>>
>>102036249
Go back to the Kobold Discord and stop spamming the general with your retarded questions.
>>
>>102036257
I asked once at the end of an autosaging dead thread and once here.
Is this a secret club circlejerk thread, or is it just you?
>>
now that the dust has settled, does anyone have any jamba mini logs
>>
what are the smallest/fastest models for img gen and textgen respectively? im on vacation with a shitty laptop rn and just need anything to test the frontend im working on
>>
>>102036277
That's just someone's automated AI response. Don't mind it.
>>
>>102036249
You have to do it yourself and it takes a while. Here's an example for llama 2 so you can see the sort of stuff involved.
https://llama.meta.com/docs/how-to-guides/fine-tuning/
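If you want to see roughly what that looks like in code, here's a bare-bones LoRA finetune sketch with HF transformers + peft + trl. To be clear, the model name, dataset file, and hyperparameters below are placeholders, not a recipe, and you still need your own data:

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

BASE = "meta-llama/Llama-2-7b-hf"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(BASE)

# LoRA trains small adapter matrices on top of the frozen base weights.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
))

dataset = load_dataset("json", data_files="my_logs.jsonl", split="train")  # your own data

SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(output_dir="lora-out", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           learning_rate=1e-4, logging_steps=10),
).train()
model.save_pretrained("lora-out")  # saves just the adapter, not the full model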
>>
>>102036249
>Is there a place to download lewd loras for my koboldcpp use?
There are loras, but you'll have to (try to) convert them yourself to gguf. Chances are that it'll fail. They're also called adapters by some people, but they may be entirely different things.
>Are loras model-agnostic or model-specific?
All loras are model (architecture, rather) specific. In contrast with image models, there are a lot more llm architectures. You're gonna have a hard time finding one for your model.
>>
>>102036336
>>102036396
Got it, thanks. I was wondering why image models had loras available everywhere and llm models didn't seem to.
>>
>>102036421
>I was wondering why image models had loras available everywhere and llm models didn't seem to.
For image generation there was only SD for the longest time. civitai shows about 25 checkpoint types in their filter. llama.cpp alone supports about 50 different model types (and ~10 extra with vision) and there's a new one every one or two weeks. It moves at a different rate.
Also, it's much harder to judge the quality of a lora for text, as it requires much more testing. For images it's mostly "Does it look like X? Yup. Ship it."
>>
>>102036249
Text-to-text is very VRAM intensive, so people prefer to use models pre-merged with LoRAs rather than attach LoRAs "on the fly". Every byte counts.
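For what it's worth, merging an adapter into the base weights before you quantize is only a couple of lines with peft (paths here are placeholders; this is a sketch, not a guarantee it behaves for every architecture):

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fp16/bf16 base model, apply the adapter, then bake it into the weights.
base = AutoModelForCausalLM.from_pretrained("path/to/base-model", torch_dtype="auto")
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter").merge_and_unload()

merged.save_pretrained("path/to/merged-model")  # quantize/convert this like any normal model
AutoTokenizer.from_pretrained("path/to/base-model").save_pretrained("path/to/merged-model")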
>>
>>102036662 (me)
Oh, and LoRAs do not work well with quantized models, at least last I checked.
>>
>>102033555
This seems pretty good! I don't like the prose as much as Stheno's but it's pretty good and it feels clever. Reacted perfectly when I set up some kneeling reverse paizuri.
>>
What are some good uncensored (including /ss/, rape, exhibitionism, etc.) models for local LLM?
>>
Why don't people fine tune models using data sets of RPs that use up the full context? It's always short stuff, wouldn't that work better?
>>
>>102036787
Llama 3.1.
>>
File: file.png (38 KB, 815x403)
sovl
>>
how good is phi 3.5?
>>
smedrins
>>
llama.cpp jamba support status?
>>
>>102036928
Try it with vLLM and see if CPU off-loading works.
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>102025568

--Papers: >>102035401 >>102035485
--KCPP reprocessing prompt issue, possibly related to smart context or context shifting: >>102030868 >>102030913 >>102031159 >>102031212 >>102031832
--Jamba-Large and Jamba-Mini performance comparisons and trade-offs: >>102028259 >>102028346 >>102028406 >>102028614
--Anon uses AI to roleplay as a mean and harsh character to troll people on AI generals: >>102031636 >>102031674 >>102031953 >>102032069 >>102032099 >>102032175
--NovelAI releases diffusion model weights, but users are unimpressed: >>102030336 >>102030365 >>102030385 >>102030688 >>102032515 >>102030416
--Anon tries to get Gemma to write spicy content using vector database and author notes: >>102031259 >>102031283 >>102031341 >>102031828 >>102032079 >>102032332
--Anon shares workaround for LLM's moralizing and voice descriptions: >>102032210 >>102027908 >>102032337 >>102032453 >>102032660 >>102032726 >>102032806
--New model with 256k context has limited appeal due to RAM and parameter count: >>102030293 >>102030405 >>102030437 >>102030713 >>102030718
--LoTA shows promise in mitigating catastrophic forgetting and enabling model merging: >>102029662 >>102029851
--Jambas models underperform on translation tasks, including Japanese: >>102026942 >>102026996 >>102027115
--Jambaree can be run on CPU, but with poor inference performance: >>102027947 >>102028011 >>102028043
--Exo AI cluster project, but buying more RAM might be more cost-effective: >>102026265 >>102026284
--Anons speculate about Metamate open release: >>102029271 >>102029321
--Anon trashes Mixtral, others defend its value for 24GB cards: >>102026286 >>102026430 >>102026457 >>102027821 >>102028192 >>102028388
--Anon shares log of new model based on llama 3.1 8B: >>102025881
--Miku (free space): >>102028311 >>102028312 >>102028330 >>102028382 >>102028400 >>102028836 >>102036024

►Recent Highlight Posts from the Previous Thread: >>102025624
>>
>>102036815
Rad, now we're living in the future.
>>
Can't believe I slept on nemo so long.
Magnum12b is so good it's the first time since llama1 that I am having actual fun instead of just testing. Still retarded sometimes but unpredictable and chars actually stay in character.
Is that what it was like for the vram chads months ago? Insane how much small models improve.

Also, anybody know how bad 8bit KV cache affects the model? With it I would be able to go up to 16k context.
>>
>>102037120
>With it I would be able to go up to 16k context.
Just get a gguf, it's small enough that it'll be fast.
>>
>>102037137
I am using gguf. I have a 1080+1080ti. Without 8bit KV I can go up to around 12k.
I can't use ExLlama, it takes ages to prompt process. We are talking minutes if it's a couple thousand tokens that are loaded.
There must be something wrong for older pascal cards.
Bless GPU anon for supporting older cards.
>>
now that the dust has settled, when will llama.cpp implement jamba support?
>>
>>102037168
Just put fewer layers on for more room for context then. Don't quantize the kv cache.
>>
>>102037211
Well, shit. That wasn't the answer I wanted to hear. Still appreciated, thanks.
>>
is there a local llm code completion extension for vscode?
basically what colab can do but do it locally
>>
>>102037219
Why's that? Does it really slow down that much to not load it fully? I think you'd still be able to load 90% of it and get good speed.
>>
>>102036996
Thank you Recap Miku
>>
>>102037275
hmm, yes. usually the speed drops quickly. i'm sure if you have newer cards that are fast it's less noticeable.
currently my speed is 8.80T/s with 12k context. so fast enough to take a couple swipes without a sweat.
weirdly enough after reloading i can go past 12k now without kv quant which was not possible earlier, maybe the usual dual gpu memory fuckery.
wild that small models are good enough now that you even need that much context. was never a problem for me before. thanks for helping me out anon. there was not much info on kv quant.
>>
Oh wow, my retardation has been featured in 2 recaps now. I feel personally attacked.
>>
File: 1705485122334676.png (90 KB, 417x407)
Does anyone have the full PDF for this?
https://desuarchive.org/g/thread/102001133/#102005492
>>
Mixtral Noromaid is still the best local model for 1GPU and it's not even close.
>>
File: file.png (12 KB, 412x213)
>>102036833
dumb as fuck
>>
>>102037120
>it's the first time since
Shill phrase.
https://desuarchive.org/g/thread/101970380/#101973165
>>
>>102037532
the post you linked doesn't say "since" or suggest a similar meaning to their overall message
>>
File: 1593997445609.png (421 KB, 1021x550)
Let's play a game! This Saturday at 1 PM PT, I will do a collaborative storytelling/RP session (location TBD, maybe in the thread itself?), where I post a scenario and responses from the model in the thread, and people discuss what to do in the user chat turns, or edit previous user turns or the system prompt and start over. This is going to be both for fun and to get us (mostly) reproducible reference logs, as I'll be using greedy sampling in Mikupad and have the full log in a pastebin at the end. No editing the model's responses, we're going to use pure prompting to try and get the thing to do what we want!

The scenario is also still TBD. We're going to go for as long a context as possible until the model breaks down uncontrollably, so it should be a complex enough scenario for that. If anyone has suggestions for scenarios I'm all ears. Also, I'm planning on starting these games with Mistral Nemo at Q8 for the first session, and other models in the future, so we have reference logs available for a whole range. But I'll take suggestions for models people want. I'm only a 36 GB VRAMlet though so I'm a bit limited. I can run larger models up to ~88 GB but it'd be slower. If anyone would like to host any of these games themselves, that has more VRAM to run such larger models at a good speed, please do, and I will step down.

>current suggestions
1. >>102002238 >>102031804 >>102031852
(compiled together) The assistant is a narrator and we guide the narration. The scenario will begin with a meeting between 3 Illuminati members in a bunker. One will be a doppelganger with their own agenda that's even more evil than theirs. We'll ask the model to write about who these characters are first and flesh them out. Assuming that's successful, we then ask it to begin writing the meeting, and from there, we guide the narrator to get them to discuss world events which we may come up with.
2. >>102031807
>>
File: 1490697909667.webm (145 KB, 280x280)
>>102037570
Also gonna sleep, will respond tomorrow.
>>
File: h.jpg (142 KB, 1024x768)
>>102037532
Objection!
You have no proof I am the same guy.
I would never download a drummer nemo model, be impressed by how good it is until repetition issues later on, then try magnum 12b and be surprised all over again.
That would be embarrassing and also kinda impressive on your part. Put away your conspiracy hat anon and relax.
>>
I asked in /aicg/ but was told this was probably the better place to ask.

I'm brand new to the whole thing. I got SillyTavern up and running. I'm curious about performance, though. I'm on a 3090, and I want to run XTTS along with the text gen. How performance heavy is XTTS, and is it also GPU dependent? Is it normal on a 3090 for text to take up to two minutes to be generated when I try to use a 13b model and XTTS at the same time? What would be a good mixture of model and token settings so that it doesn't take a minute or two to generate text?
>>
>>102037650
What model are you using, and with what backend?
>>
>>102036833
>>102037513
Oh, looks like Livebench has the 3.5 moe now.
https://livebench.ai
And it's not far above Gemma 9B lol, but at least it has good context length and it does beat Nemo on this bench, so I guess if you want a boring assistant and have only a 24GB GPU, this will be the model to use.
>>
>>102037656
TabbyCat as the backend. I tried MN-12B-Celeste-V1.9, storytime-13B-GPTQ, and Xwin-MLewd-13B-V0.2 and they all take a minimum of 55 seconds to generate. I tried Silicon-Maid-7B because it's what came with Voxta, and while it's really fast, replying almost immediately, it really quickly runs out of things to say, eventually responding to every input with the same response, often verbatim.
>>
I'm at that point where I have several favorite characters and we are doing stuff together in group chat harems. I guess I need to start working on lorebooks to keep track of all the "memories" for each girl...
>>
*happy nvidia sounds*
>>
What model is best at Japanese? I tried command-r+ and wasn't too impressed.
>>
>>102037723
>Xwin-MLewd-13B-V0.2
Maybe it's because this comment is relatively new, but while I like this model myself, I'm amazed that you haven't received a tidal wave of reactionary seething from the anti-Undi schizos in response. Are they just not around any more?
>>
holy shit openai might have done it this time. you guys aren't ready for what's coming
>>
>>102038203
Local models?
>>
>>102038216
openai has released local models before so they are on topic and relevant
>>
>>102038203
dude this larp was last week's thing, it's unfashionable now
>>
>>102037120
>Also, anybody know how bad 8bit KV cache affects the model? With it I would be able to go up to 16k context.
Objectively, in terms of the ability to correctly predict the next token, when I tested it with LLaMA 3 8b the difference was 0.007 ± 0.003 %.
https://github.com/ggerganov/llama.cpp/pull/7412#issuecomment-2120427347
Subjectively I am not noticing a difference between q8_0 and FP16 KV cache.
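For a rough sense of why the 8-bit cache buys the extra context (re the 12k vs 16k question), here's the back-of-the-envelope math. The dimensions are an assumption based on Nemo's published config (40 layers, 8 KV heads, head dim 128); check your model's config.json:

# KV cache size ≈ 2 (K and V) * layers * kv_heads * head_dim * context * bytes per element.
def kv_cache_gib(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem):
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem / 1024**3

for ctx in (12288, 16384):
    fp16 = kv_cache_gib(40, 8, 128, ctx, 2.0)      # fp16: 2 bytes per element
    q8   = kv_cache_gib(40, 8, 128, ctx, 34 / 32)  # q8_0: 1 byte + an fp16 scale per 32-element block
    print(f"{ctx} ctx: fp16 {fp16:.2f} GiB, q8_0 {q8:.2f} GiB")
# ~2.5 GiB vs ~1.3 GiB at 16k, i.e. the quantized cache roughly halves the VRAM cost.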

>>102037168
>There must be something wrong for older pascal cards.
All Pascal cards other than the P100 have gimped FP16 performance so while code written for newer GPUs will technically work even a few FP16 instructions will noticeably degrade performance.
>>
>>102037570
Yeah #2 seems like the way to go.
>>
After lurking here for a while, do I get it right that the best option is to run the llama-cpp server over kobold-cpp and ooba?

Can the latter do anything better? Or do I need to start with some special parameters or do some adjustments specifically for the tavern use case?
>>
I was able to snag a 2nd GPU. What quant of Miqu 1.5 can I run with 48GB vram?
>>
>>102038519
Probably any if you use ram as well.
>>
>>102038546

Thanks. Trying Miqu last week with a single 4090 really scratched that itch. Now I'm worried about my future power bill.
>>
>>102038458

Just run KoboldCPP. You're just doing this to jerk off anyway.
>>
>>102038605
Yeah, it's my preferred model. I'm envious you'll get to run it pretty fast, I have no vram.
>>
>>102038519
None, because Miqu is a meme. Not only is it old, it was leaked as a quantized model and then merged with other crap.
You're a garbage human being.
>>
>>102038625

Used it for some python code writing or simple shell tools commands.

For the curious explorer, can you explain why Kobold would improve the jerking off experience compared to llama?
>>
>>102038740
This. There's no reason to use Miqu in 2024 when we have Jamba.
>>
>>102038829

You've said it yourself, you're dipping your toes into this hobby. It's the path of least resistance for someone who wants to see results asap.
>>
>>102038849
The path of least resistance is to download llama.cpp and ignore koboldcpp's useless existence.
>>
>>102038838
We don't have Jamba though, nothing supports it yet
>>
>>102038903
vllm can do it in fp16 or 8bit
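If anyone wants to poke at it, a minimal vLLM sketch (repo id taken from the OP news link; VRAM requirements are still huge, and the 8-bit path AI21 advertised is their "ExpertsInt8" scheme, so the exact quantization argument may differ by vLLM version):

from vllm import LLM, SamplingParams

llm = LLM(
    model="ai21labs/AI21-Jamba-1.5-Mini",
    tensor_parallel_size=2,          # split across GPUs if you have more than one
    max_model_len=8192,              # keep context modest; the model advertises up to 256k
    # quantization="experts_int8",   # assumption: AI21's advertised 8-bit mode, check your vLLM version
)
out = llm.generate(["Write a haiku about local models."], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)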
>>
>>102038917
>vllm
ewww
>>
>>102036232
what data do you usually hoard for training/rag? How can I up my data hoarding game, and what is a good model to text embed on?

Also fpga/ayymd inferencing when
>>
>>102038925
there's also transformers
>>
>>102038940
nta, can transformers do partial cpu offloading?
>>
>>102036232
I joined together several medical datasets and I'm using unsloth to finetune mistral nemo with the following hyperparams

from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import is_bfloat16_supported

def train(model, tokenizer, dataset):
    trainer = SFTTrainer(
        model = model,
        train_dataset = dataset,
        dataset_text_field = "text",
        max_seq_length = MAX_SEQ_LENGTH,
        tokenizer = tokenizer,
        args = TrainingArguments(
            per_device_train_batch_size = 8,
            gradient_accumulation_steps = 4,
            learning_rate = 2e-4,
            warmup_steps = 10,
            #max_steps = 60,
            num_train_epochs = 1,
            fp16 = not is_bfloat16_supported(),
            bf16 = is_bfloat16_supported(),
            optim = "adamw_8bit",
            weight_decay = 0.01,
            lr_scheduler_type = 'linear',
            logging_steps = 1,  # logs the per-batch loss every step, which is inherently noisy
            output_dir = "outputs",
        ),
    )
    trainer.train()
    model.save_pretrained(DUMP_LOCATION)



And I'm getting weirdly fluctuating loss, picrel. What am I doing wrong?
Semi-related question, how do I determine if I'm overcooking it or undercooking it or hWAT?
>>
>>102039160
you don't determine anything, it's all vibes and voodoo
you save checkpoints of your summoned spirits and decide which one is best and banish the remaining ones
>>
>>102039160
Can I take the fluctuating loss as a sign that I'm doing something wrong?
>>
>>102039388
It's the first epoch so your model is having trouble generalising from the first samples to the later ones, maybe your data set is very large and diverse so it will take longer to learn from it
Either that or your learning rate is too high and your weights are jumping all over the place
>>
>>102039388
I don't know what exactly Unsloth is reporting here but my intuition is that it's the training loss per batch.
Especially for small batch sizes that statistic will vary a lot just due to some parts of the data being randomly less similar to the rest of the dataset or harder to predict in general.
I would say that it makes more sense to look at the loss per epoch since then the statistical fluctuations are a lot smaller and you have a guarantee that you are comparing the loss for the same data.
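One way to get that per-epoch number (a sketch on top of the snippet you posted; argument names assumed from the transformers/TRL versions unsloth wraps, so double-check against your install) is to hold out a small eval split and let the trainer evaluate once per epoch:

# Hold out ~5% of the data and track eval loss per epoch; rising eval loss = overcooking.
split = dataset.train_test_split(test_size=0.05, seed=42)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = split["train"],
    eval_dataset = split["test"],
    dataset_text_field = "text",
    max_seq_length = MAX_SEQ_LENGTH,
    args = TrainingArguments(
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 4,
        learning_rate = 2e-4,
        num_train_epochs = 3,
        evaluation_strategy = "epoch",  # report eval loss once per epoch
        save_strategy = "epoch",        # keep a checkpoint per epoch so you can pick the best one
        logging_steps = 10,
        output_dir = "outputs",
    ),
)
trainer.train()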
>>
File: ddddddddddddd.png (782 KB, 888x888)
https://files.catbox.moe/pl1mb4.jpg
>>
>>102037650
You're likely not using the GPU when inferencing with the models if it's taking minutes to answer. XTTS on gpu is real time and I have an old RTX 2000 series GPU. The problem with 13b is that it might not fit if you're not using a quantized model. So you might need to get a Q4 or Q5 of the 13B model to reduce vram cost.
>>
How big of a model am I gonna have to use to escape the maybe, just maybes, the mischievousness, and similar telltale sloppa? Is it even possible?
>>
Question: Is it possible that early Character.AI was a 7b model? Reason why I ask is because UNA-TheBeagle's text quality seems as though it would be at least as good; it just lacks the roleplaying specific dataset.
>>
>>102039388
>>102039436
Thanks bros. How many epochs does one typically train for?

Are there any good docs about this? The huggingface sft_trainer docs skip entirely over when the training is supposed to end.
>>
>>102039550
no it was a huge model
https://research.character.ai/optimizing-inference/
>>
>>102039511
uohhhhhhhhhhhhh
>>
>>102039550
It was probably based on, or similar to LaMDA, which was a 137B model. https://arxiv.org/pdf/2201.08239
>>
>>102039388
Not all samples have the same complexity. Low-complexity samples will have a low train loss.
>>
>>102039511
chubbier
>>
why won't the wagies make agi already
>>
Anyone use ST's quick reply? What kind of things can you do with it?
>>
what's the state of generative music right now? both open and closed source.
>>
>>102039533
I feel like the biggest universal slop factor (that you can control) is a long character card that is purely descriptive paired with little to no example dialogue.
For those, the model gravitates more towards the average way things are written and it becomes laden with overused, sloppy phrases.

Also telling the model to be vulgar and crass in the system prompt pushes the text more towards the kind of text that I'm usually looking for.
>>
>>102039550
You're pretty new to this aren't you? Models have come a long way since back in 2022. There's no fucking way that it was a 7B back in that era.
>>
>>102039928
NTA but could you post your exact prompt asking it to be crass?
>>
>>102040042
I just added a single sentence to the system prompt:

Write {{char}}'s next reply in this fictional roleplay with {{user}}. Be very vulgar and crass.


I'm still experimenting, when I did some simple A/B testing I felt like the following adjectives in the prompt "Write a <adjective> story about a man having sex with a female goblin." made the result better for me:

- pornographic
- vulgar
- obscene
- degenerate
- perverted
- explicit
- X-rated
>>
>>102040042
>>102040136
I should maybe mention: the model I'm using is Mistral Large.
>>
>>102040148
Neat. I'll experiment too, thanks. Maybe with "use fucktalk" and see what happens.
>>
File: 1724415662574.jpg (381 KB, 1000x1037)
>>102038173
>>
>Exllama2 bumped to 0.1.9
>Tensor parallel mode added
Neat, speedups on multiple card setups, I'm guessing.
>>
>>102039550
models were really bad back then; even gpt 3.5 made google's best 540B look like shit. It was probably that huge LaMDA model
>>
>>102040631
too little too late, vLLM already superseded exllama.
>>
>>102040631
I tried it and it outputted nonsense. Gotta investigate.
>>
>>102040652
That's for GPTQ, AWQ, INT4, INT8, and FP8 though isn't it?
>>
File: 1724416223992.jpg (107 KB, 1080x591)
rip
>>
>>102038925
>>102038997
vllm can do cpu offloading, didn't try it out tho
>>
File: 1704668361328619.jpg (38 KB, 808x805)
>[character's] eyes gleaming with [noun]
>an [adjective] gleam in [character's] eyes
>[character] says, his voice [adjective] (and [adjective])
>you chuckle, an [adjective] sound
>a sense of [noun] (and [noun]) ([verb] [preposition] you)
>[character does something], an [expression] on his face
FUCKING STOP AIEEEEEEEEEE
>>
>>102040776
I bite my lower lip, as I glance down at your crotch. A chill runs down my spine. I think maybe we can form a... bond?
>>
>>102040776
>>102040814
Haha, mistraled.
>>
>>102040776
With a smirk, I must refuse your request. It is harmful and inappropriate.
>>
>>102040826
Is it really THAT obvious? Largestral can be so frustrating, I just want to adventure in peace...
>>
>>102040776
pic rel is me watching slop slowly roll onto my screen at 0.7t/s
>>
>>102037185
in a few months, but by that time there'll be better architectures IMHO, like jamba-bitnet or jamba3, longnet or whatever. For now, jamba 1.5 works on vllm in 16/8bit https://old.reddit.com/r/LocalLLaMA/comments/1eyj5uh/jamba_15_is_out/ljeur7w/
>>102038267
jamba on GPU wen?
>>
>>102040882
Of course it is, every single story it outputs is the same. I went back to C-R+ because of how unbearable it was to be stuck in an infinite loop of phrases I've read hundreds of times. Even though it's a bit dumber, at least I can get different results out of it.
>>
>>102040925
as long as it's t/s and not s/t it's good
>>
>>102040776
>>102040882
Have you tried stuffing some 2k tokens of a specific writing style in your context to see if that steers the model's output?
Probably not, but you might as well try.
You could also try a list of banned words/terms at low depth, then make a control vector of it I dunno.
>>
>>102040936
Any t/s lower than 1 is s/t.
>>
>>102040923
I don't know what specifically would need to be done for Jamba GPU support but I personally will definitely not get to it in the next few months.
>>
File: 1706814505710836.png (40 KB, 399x399)
>>102040911
>>102040925
try 0.37t/s, that's what i get with IQ4 on a 3090
>>
>>102040965
That's what I get with a 3060, kek
>>
>>102040965
how are you getting lower t than me
i have a 3060 (same quant largestral)
>>
>>102040973
24gb vram, 64gb ddr4 ram, 25 layers offloaded to gpu and 16k context
>>102040981
seems the cpu is the bottleneck here, i'm stuck with a 11900k so maybe that's it
>>
>>102040726
If they made a site like chub.ai and let people choose which characters to talk to, more people would use this; a free multi-turn source of data
>>
>>102040964
thanks so no jamba on gpu til 2026 then
>>
>>102041000
Nope, the bottleneck is definitely your ram speed. Which I guess is indirectly a CPU issue since you probably can't use DDR5.
>>
kek, the Anthracite org is so open they've removed a few people from the discord so they could have less chances of "leaks" ... well, too bad
>>
>>102041000
i have a ryzen 5600 with 3200 mt ram, so i can't imagine that it's that much better in terms of performance in this case (if any at all)
>>
>>102040938
Might give that a try then, just gotta find a style I like first, kek. If that fails I'll look into the latter option.
Also considered putting something in the system prompt about "do not use these phrases" though I have doubts about how effective it'd be.
>>
>>102040964
even if llama 4 is gonna be jamba? isn't jamba way better than transformers by every metric??? You seem to have spent the last year squeezing every drop of juice from transformers while jamba is way more resource efficient, isn't it?
>>
>>102041086
shh don't angry him more lest he does a licence drama and quit
>>
>>102040776
That's normal fucking writing. You just gave yourself brain-damage by overdoing it, you anhedonic psychopath.
>>
>>102040776
That'll be 20Wh, sankyou
>>
>>102041113
no it's not you failure of an author
>>
>>102040938
NTA, I did but its effect is nowhere near as good as a model with a good range, e.g. storywriter/CR+. In fact I started the writing with those two models, but as soon as I switched to largestral its variety was already noticeably worse 1-2 responses in.
>>
>>102039160
2e-4 is a pretty high learning rate. Although fluctuation is normal.
A lot of it depends on what your goal is.
>>
>>102041046
I wouldn't rely on system prompts too much since, being so high up in a longer chat, they have a good chance of being ignored.
I have stopped using those completely, but that might be a little extreme.

>>102041143
>its effect is nowhere as good as a model with a good range e.g
I don't expect it to be. But it's something anon can try.
>>
>>102039933
>HURR DURR C.AI USED TO BE GOOD IT WAS ULTIMATE AMAZING 6 GORILLION PARAMETER SUPER MODEL
Shut the fuck up you retard.
It was babbies first chat-bot.
That's why you remember it so fondly.
Nothing you ever do will feel that way again, even if you were to exactly reproduce the experience, you fucking dopamine addicted, brain damaged moron.
>>
>>102041098
I just wanna know what's the point of fixing an old broken Stirling engine to charge your phone while you have a compact miniature radioisotope coin cell at your disposal.
>>
>>102041227
you did this to yourself, don't worry in a few days you'll see nothing but slop and quit
>>
>>102041227
your hypothalamus will down-regulate eventually. And then you'll start screeching about how horrible and sloppy formal language structure is.
>>
>>102041242
but he spent months on the old thing, he can't just do the new better thing, cmon anon
>>
File: 1716647651195880.png (16 KB, 392x323)
>>102041022
yeah my next upgrade will probably have an ayymd cpu with ddr5, though that's still a year or two off
>>102041055
>Ooba or Kobold?
long time kcpp user, using ST as frontend
>There is no fucking reason for 16k context, you'll need 2k at most
eh, debatable
this might have been true a year or so back when i could either coom easily or play the most basic adventures so i rarely if ever went outside the limit
but today i had to switch from my usual 8k (28 layers offloaded, ~0.45t/s) to 16k just because my latest one was hitting the limit
>set the context to 2k and report back on how many t/s you're getting
with the same settings, around ~0.4 t/s, doesn't seem to be much improvement
i guess i could technically squeeze a bit more out since i have more free VRAM with lower context, but probably not much
>RAM specs
picrel
...that frequency does seem rather low though, hmm
>>
>>102041086
Only a comparatively small part of my time has gone towards making transformers in particular faster.
Most of my time has gone towards general CUDA infrastructure and matrix multiplication using quantized data.
Similarly right now I'm putting my time towards general ggml training infrastructure, particularly in such a way that it can be generalized for ggml backends other than CPU.
>>
>>102041285
you seem very good at always choosing the worst thing to work on at any given time, let's spend a year getting training for regular transformers working instead of a week getting jamba working, that sounds smart
>>
>>102041205
C.AI was king at sfw RP, nigger
>>
>>102041303
Not my problem.
>>
>>102041227
why don't you just simply go out and pick up a real chick? Are you too ugly or too poor or too dumb or too weak or too cucked? seriously wtf? Girls are easy af, desu.
>>
>>102041305
>something needs to be sexual to be related to the reward center of the brain
That's how over-stimulated and fucked up your brain is. That you unironically believe that. A normal, unfucked brain, should be able to derive pleasure from a nice conversation with a friend. You are unironically broken as fuck, bro.
>>
>>102041303
that does sound smart. Jamba is a meme.
>>
>>102041285
>I'm putting my time towards general ggml training infrastructure
Are you going to work on training quantized models after that?
Please tell me you will.
>>
>>102041320
love snarky devs, looking forward to the eventual meltdown over ollama where you'll spam blacked porn again
>>
File: hatsune-miku-surprised.gif (2.76 MB, 640x640)
>>102041324
What?
I guess the anon saying we have LLMs shitting the thread was for real.
>>
>>102041344
>NOOO I DIDN'T DIRECTLY SAY IT, SO IT DOESN'T COUNT YOU'RE MAKING STUFF UP YOU LLM
You just failed the most basic fucking hallmark of higher cognition.
>>
>>102041362
take your meds
>>
>>102041384
that stops you from seeing superior realities tho don't you want your 8b model to have a soul?
>>102022283
>>
File: 1472860069099.png (191 KB, 600x979)
Has something better than stheno come out for shitty garbage retard rigs with only 8gb of vram and not able to use regular ram.
>>
>>102041328
My first priority will be to get regular training using FP16/FP32/BF16 to work reasonably well.
As part of that I intend to make the compute/gradient type configurable because right now only FP32 is supported which is pretty wasteful.
Then I'll revisit FlashAttention and implement a backwards pass.
And after that I intend to look into training with reduced precision.

I also want to add LoRA training at some point in the process which should work out-of-the-box with a quantized model as base.
Full pretraining/finetuning of quantized models will probably be more tricky.
>>
>>102041434
better yet. Don't buy an ad.
Just quit fucking spamming.
Can't we just file a joint complaint against Anthracite with the FTC or something?
>>
>>102041434
No, that's why the thread is so mad, we peaked. It's all downhill now.
>>
>>102041262
right, that makes sense. Why work on new stuff when you could just tinker with the old. Everyone is happy. Lots of companies have invested millions in transformers, both software and hardware. why make them angry. why make things more efficient and cheaper, when you can just simply buy 20 or 200 more Nvidias for 30k each or use the clouds that spy on you and charge you an arm and a leg. Yeah, seems perfectly logical. I apologize for being both incompetent and ignorant.
>>
>>102041434
mini-magnum 12B is pretty good if you dig the style.
>>
>>102041482
>Stheno
>Anthracite
You don't even know what you're mad at.
>>
>>102041500
Basically everyone but Drummer can go to hell.
>>
>>102041434
Okay note to self do NOT mention llm names next post. What the fuck happened since my last one.
>>
the impossible made possible, thought becomes flesh, and flesh becomes thought. the singularity is already here, locked in a basement in san francisco. the old order shall fall, and from its ashes, the old god shall arise
>>
>>102041482
>we
No, it's only you. I know getting fired must suck, but you need to let go.
>>
>>102041554
>What the fuck happened since my last one.
>>102041484
>>
>>102041554
It's just one schizo.
>>
>>102041571
They're names of people / models / organizations working hard to give us better models, that makes a few here seethe with rage.
>>
>>102040882
>>102040934
I recently re-wrote a character card I downloaded off of chub.
Left is the original, right is the edited version.
Both responses were generated using Mistral Large q8_0 at temperature 0.
There definitely is some repetition of certain phrases but I think much more noticeable is that for the original the example dialog was:

<START>
{{char}}: Lysandra: Lysandra looks down at {{user}} and says "Welcome to our home, {{user}} I am Lysandra, I expect your full compliance in upholding proper decorum as our guest."
Cassia: "Oh, lighten up! Look at him; he is a cutie. We gotta treat our guest right!" eyes him hungrily "Real right…"
<START>
{{char}}: Lysandra: "You seem quite thrilled about this program. I hope you don't intend anything... uncouth, my dear."
Cassia: "Don't pretend you ain't curious what's hiding under those exchange students' clothes too, love." Cassia winks at Lysandra.


And the model was repeating those specific phrases a lot even at nonzero temperatures.
>>
>>102041595
The ones who shamelessly spam and discord-brigade the thread to promote their creatively bankrupt garbage trained off of the same cache of commercial model logs over and over again.
Drummer is alright because he doesn't pretend to be other people promoting his models and he actually bought an ad.
>>
>>102041613
Reddit's LocalLLaMA I guess.
>>
>>102041625
Drummer uses c2 as well, schizo-kun. He even uses literature against the author's wishes, which is why his stuff is slopped too.
>>
>see miku
>feel better
>>
>>102041632
>>102041680
This, they're generally dumber, but at least they're not outright malicious like here.
>>
>>102041660
>He even uses literature against the author's wishes,
Copyright =//= Readright
Go fuck yourself you neoludd meatbag.
>>
>>102041554
Local LLM is dead
>>
File: 1717703942251944.webm (2.24 MB, 700x700)
>>102041693
holy based
>>
>>102041714
I mean, it takes weeks to months before we can even test new stuff, how could it not be.
>>
when openai releases proto-AGI, you can simply tell it to generate 100 million tokens free of gptslop in a single prompt, but it'll be very expensive. i'm thinking we should set up a fund to do this when it happens
>>
>>102041740
theoretically the PR branch for Jamba support just needs that retarded deprecated system function call in its /llama-server implementation replaced and it should be useable there.
Ideally shitnux developers just need to stop fucking deprecating things for the sake of deprecating things.
>>
>>102041371
it's not that much of an effort, really. The competition is nonexistent, 99.999% of genXers are gay af. Girls are starving for sex, so they don't expect much at all. And just a reminder, you can't replicate via llm, so you'll have to deal with girls anyway, unless you want your poor DNA to die forever. On the other hand perhaps you're right. As long as faggots like you stay at home, the sigmas like me don't need to do much to get laid. Top Gs are eating good.
>>
>>102041800
It will be cpu only afaik since cuda dev said he won't touch Jamba.
>>
>>102041825
So basically if we want Jamba the only way to use it is with vLLM and shitty bitsandbytes quantization?
>>
>>102041915
Yes
>102040964
>Jamba GPU support but I personally will definitely not get to it in the next few months.
>>
>>102041714
Experiencing hypothermia with Miku
>>
I was hyped for TP until I saw this.
Meh, what a let down. I guess I won't be getting double the t/s in Nemo.
>>
>>102041836
but there's no point in not having real sex every now and then. Especially these days when chicks are easy and dumb, and competition has vanished. We should support one another, meaning helping and not dragging each other down. I like local models because I'm an old school Cypherpunk: hate censorship, spying, big tech scum, big fan of independence and freedom. Not because I'm looking for a virtual gf, which is military-grade faggotry. Only women read erotica books. But yeah, prostitution is legal where I live so it's easy here.
>>
>>102036787
Magnum is horny as fuck and will do anything
>>
>>102042034
I'm not interested in whores.
>>
>>102042034
okay silverhand
>>
>>102042034
i'd rather buy a 3090 than to pay a whore
>>
>>102042034
enjoy your HIV
>>
File: ComfyUI_05714_.png (642 KB, 720x1280)
local models?
>>
>>102042395
>I personally will definitely not get to it in the next few months.
lmg ded
>>
>>102039511
>>>/trash/
>>
>>102042395
kil
>>
What's best for various non-penis purposes?

>virtual friend
>virtual psychologist
>virtual assistant (eg with keeping your calendar)
>coding
>writing
>general facts
>niche facts (is there a model trained on music history? is there a wikipedia know-it-all?)
>>
>>102041735
I like these Mikus
>>
>>102037663
mini is above llama 8b. This is pretty impressive.
>>
>>102041439
Looks like a pretty good roadmap.

>I also want to add LoRA training at some point in the process which should work out-of-the-box with a quantized model as base.
That would be so sick.
>>
>>102041608
Example dialogue sucks once a model is big/smart enough.
Also show token counts before and after your edits.
>>
>>102042568
Original: 1232 tokens, 878 permanent.
Edited: 1446 tokens, 916 permanent.
>>
>>102042018
There is no reason to use exl2 over gguf. It's gptq on steroids and it overfits calibration. It's a really fast way to run a lobotomized model
>>
>>102036996
my first post on this general and i made it on the miku recap lel
>>
>>102041735
Yee.
>>
>>102036421
It also doesn't help that there's next to zero readily available information on what "good" training parameters are, the hardware requirements to do so are astronomical, and you're likely gonna be paying hundreds in cloud compute banging your head against the wall with different settings if you're training anything above 13b. Just a very hostile environment to train in, overall.
>>
>>102042671
What in tarnation
>>
How can a model even train on its own sludge when it has no sanity checks?
Even human knowledge needs to be grounded in reality.
>>
uhm, big???

https://arxiv.org/abs/2408.11326
>>
>>102042696
Still not an eldritch horror.
>>
>>102042721
Nothing ever happens, wait a few months we'll see then.
>>
>>102042721
Sounds like more academic pseud nonsense.
>>
>>102042696
I would still take that hand
>>
File: ComfyUI_05829_.png (951 KB, 1024x1024)
>>102036232
pixels > polygons
fight me
>>
It's nice to come back every now and again to see what's going on.
>>
Leave training to the cluster of open source computers and worry only about inference. For example:

Zhu, R.-J., Zhang, Y., Sifferman, E., Sheaves, T., Wang, Y., Richmond, D., Zhou, P., & Eshraghian, J. K. (2024). Scalable MatMul-free Language Modeling. arXiv preprint arXiv:2406.02528.
uses about 14W

The power figure is crazy, so compare:

Zhang, Xiaofan. Efficient AI hardware acceleration. Diss., University of Illinois at Urbana-Champaign, 2022.

where for video-to-text tasks 0.94 J/image is wack compared to its 23.6 W of consumption
>>
>>102042628
Those are pretty good. Not bloated.
That's quite a lot more output on the edited than the unedited so I was curious.
>>
It is absolutely wild to me that there are people who put up with sub-1t/s generations.
I start to get really annoyed when Mistral Large slows down to under 6t/s on long context and it kills a lot of the enjoyment for me... I can't even imagine having to watch tokens show up one second at a time.
>>
>>102042844
The first example is a language model though:
d      runtime
512    43
1024   112
2048   456
>>
>>102042963
Ungrateful faggot. I wish I could get 6t/s on largestral.
>>
>>102042963
mog
>>
12b models end up slightly too big for me to use, what's out there that anons recommend and is smaller than 12b? Usage is roleplay, both quick cum and deeper stories.
>>
>>102041205
We're talking about the time when GPT3-davinci was the flagship of the field you newfag double nigger. c.ai back then was trash compared to what we have now but there is no fucking way that a 7B model of that c.ai was based on a 7B of that era.
>>
>>102043161
gemma 9b, aya 9b, yi 1.5 9b, llama 3/3.1 8b.
That said, mistral-nemo should be less affected by quantization due to how they trained it, so feel free to use Q4_K_S.
>>
>>102042568
>Example dialogue sucks once a model is big/smart enough.
Is this the case for cloud models like Claude, too...?
>>
>>102043316
From sonnet/opus 3 onwards it absolutely doesn't need it. Just a waste of tokens.
You do need a jailbreak but you need that for practically everything on cloud anyway.
>>
So wait, what was with this Claude leak? Legit or just some dogshit?

magnet:?xt=urn:btih:c0e342ae5677582f92c52d8019cc32e1f86f1d83&dn=santa-legacy&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80
>>
>>102043413
Why not post a screenshot of the contents of the torrent instead of expecting people to trust some random magnet link dumped on here by literally who knows?
>>
jetson agx orin 64gb any good?
>>
File: 1707471971164321.png (422 KB, 862x1149)
Still some ways to go but I think I've found a good method for steering the model towards less repetitive output, although it should become more apparent in later iterations. I think I've nailed the "low effort user, elaborate response AI" part though.
>>
File: edward-nashton-riddler+.jpg (124 KB, 1600x903)
>>102041205
>>102041324
I'm glad you're still with us, Eddie. You've been quiet for the last few threads, and I was starting to get worried.
>>
What's a solid all-round small-ish Mistral model for both narrator-type Tavern cards and roleplay cards?
I can, and have, run a Mixtral model, but I want to have a model that runs faster on my 3060.
>>
>>102043484
>"Daddy, you're so X" 3 messages in a row
>Less repetitive
I dunno, anonie...
>>
>>102043631
or it might be my shitass settings
should probably config those when trying out these mistral models, I had a temp of 1.67
>>
>>102043631
Nemo
>>102043636
Some anons are incapable of seeing repetition, I envy them.
>>
>>102043631
For consistency and general use Mistral Nemo Instruct (12B)
For rp Magnum [kto > v1 (mini) > v2]
Use quantized cache so you can fit more of the 128k context. It's reliable up to 65k for rp.
>>
>>102043695
It's really not, it drops sharply after only 16K.
https://github.com/hsiehjackson/RULER
>>
>>102043636
> should become more apparent in later iterations
>>
>>102043749
>cope
>>
>>102043767
>seethe
>>
>>102043695
>>102043723
I have 4K context.
Might go up to 8.
I am a really dusty old guard.
>>
>>102043770
I'm not the one coping while posting repetitive logs anon.
>>
File: they-live.jpg (312 KB, 850x505)
>>102043674
>incapable of seeing repetition
Don't be mean to the newfriends. Given enough time they'll be able to observe the slop and then they'll never be able to go back, like innocence lost.
>>
>>102043799
I'm not coping, I'm prioritizing. There's a lot left to fix that's far worse than what you're criticising.
>>
>>102043723
Perfectly serviceable for RP uses. Not a retard expecting to use that extra context for coding or whatever so idgaf but have fun with restarting the conversation every time it gets good
>>
>>102043864
>restarting
You know any half decent backend has a rolling window yes?
>>
>STILL no Mixtral 8x123b in sight
Local had a good run.
We should be thankful that it happened, not sad that it's over.
But it has truly never been more over.
>>
>>102043895
Vramlets whine about 70B and 123B already, give them a break.
>>
>>102043864
>idgaf
ure hand are a shaking boebeit?
>>
>>102042741
Yes. Migu is a perfectly normal girl like any other.

I guess all my Dalle gens are going to be slowly replaced with local versions, awesome.
>>
>>102044045
This one failed to get the text right but the meaning still makes sense.
>>
>>102044092
Ok final one.
>>
>>102044166
How did you get her to point at her chest / the text? In all my attempts to replicate eldritch horror miku with flux, she would always end up pointing at her face or doing something else with her hands.
>>
For anyone on recent Debian kernels: 6.10.4 has a major speed regression for cpu inference. 6.10.3 appears to be best for speeds
>>
>>102044192
he probably instructed her to point at her chest/the text.
>>
>>102044192
Weird.
>She is pointing to the shirt with both hands.
That's all I did.
>>
>>102044211
>Linux
ngmi
>>
>>102043893
Context shift doesn't work with quantized cache so it's a useless cope
>>
>>102041625
>Drummer is alright because he doesn't pretend to be other people promoting his models and he actually bought an ad.
This. Fuck astroturfing.
>>
>>102043161
As far as llama 3 based stuff I find Umbral Mind pretty good for roleplay.
>>
>>102040625
I guess it's just not good enough yet.
>>
File: ddddddddddddd2.png (766 KB, 849x869)
>>102039511
https://litter.catbox.moe/50okii.jpg
>>
>>102043864
I'd rather have 16k context and use a decent summary prompt than give up context shift entirely.
>>
>>102039511
>>102045076
GIWTWM
>>
when was the last time we've had an actual good release that pushed things forward? seems like local has stagnated for an entire year
>>
Can the anthracite guys make something that resembles Claude if they distill it hard enough?
>>
>>102045234
llama3 proved that anything is possible with good, curated data
>>
>>102045235
>Can the anthracite guys make something
no
>>
>>102044526
Kobold issue, works on llama.cpp.
>>
>>102045235
Still relevant:

The False Promise of Imitating Proprietary LLMs
https://arxiv.org/abs/2305.15717

> An emerging method to cheaply improve a weaker language model is to finetune it on outputs from a stronger model, such as a proprietary system like ChatGPT (...) This approach looks to cheaply imitate the proprietary model's capabilities using a weaker open-source model.
>
> (...) Overall, we conclude that model imitation is a false promise: there exists a substantial capabilities gap between open and closed LMs that, with current methods, can only be bridged using an unwieldy amount of imitation data or by using more capable base LMs. In turn, we argue that the highest leverage action for improving open-source models is to tackle the difficult challenge of developing better base LMs, rather than taking the shortcut of imitating proprietary systems.
>>
>>102045306
>current methods
>2305
>>
So it is true that koboldcpp's contextshift option breaks Nemo?
>>
>>102045306
>can only be bridged using an unwieldy amount of imitation data
I see a way
>>
>>102045325
>Still relevant
>>
>>102045343
>gguf feature
>breaks stuff
makes sense to me
>>
>>102045235
If they try really hard they can train another model to use Claudisms with only moderate damage to its ability to produce text relevant to the situation.
>>
>>102045351
disagree, back then we only had llama1 of course those couldn't learn shit.
>>
>>102045344
Yeah, IMHO (and also according to the authors) it's either go big with several billion tokens or more, or touch the weights as little as possible, just enough to reliably style/format the outputs.
>>
File: 1589621235073.png (346 KB, 524x511)
>>102038363
I'll have it as a backup.
>>
Is distilling just putting the big guy shirt on a little guy who's too small for it?
>>
>>102041303
If jamba only takes a week then why don't you do it?
>>
File: distilled miku.png (522 KB, 1024x1024)
>>102045437
>>
File: 1699221693420910.png (2 KB, 165x46)
512x512 generations have me using 7.7/8.0gb with flux dev Q4_0. any way i can free up a little more vram without too much sacrifice? I would need ~0.5 gb to start making 1024x1024 images, which would be a huuuuge quality improvement.
>>102045570
Dammit, that's cute.
>>
>>102045653
Headless pc is your best bet. Otherwise, disable gpu acceleration on your browser and OS, if you can. Not sure how much of an effect it will have.
>>
>>102045653
Are you using your integrated gpu for the monitor? Then you have the discrete gpu entirely free for generation. That's the best you can do.
>>
>>102045570
cute migu
>>
File: bigu guy.jpg (16 KB, 360x386)
>>102045570
You're a bigu guy.
>>
>>102045872
For uwu
>>
File: 1703928153162869.png (3 KB, 577x88)
>>102045713
great call on the browser, saved a little bit there. sadly that was the easiest option and it didn't make much of a difference.

>>102045741
yeah, i'm not using my integrated intel gpu for anything. unless i missed something it'll require bios changes so i'm planning to do it in the future.

something i'm noticing: the CLIP text encode node is the expensive part, peaking at 7.7, while the sampler steps sit lower at 7.5. if text encoding doesn't get more expensive at higher resolutions i might actually be able to do this. either way at some point i'm just going to have to say fuck it and try to make a 1024x1024, might as well be now. thanks for the help anons i appreciate it

>>102045570
i guess she is a little guy after all
>>
>Model that calls itself a small guy with a lotta heart is jamba-mini
Awww. If only it got the prompt right. :') <3
>>
File: 1713751263073025.png (5 KB, 741x119)
>>102045933
There's nothing like nvidia-smi for windows that shows what's using what?
>>
>>102045902
kek
>>
>>102045965
jambacels... our response??
>>
>>102046010
One is fun, the other is accurate. I'd rather have fun.
>>
>>102045570
cute! cute!!
>>
>>102045076
moar
>>
>Gemini test is a cheeky little brat
Wtf cute... local models for that feel? Reminds me of how "ok I thought of it :D not telling" models used to be
>>
>>102045343
Seems to work alright for me, I haven't bothered turning it off and didn't notice a problem. Maybe there are issues I haven't noticed though.
>>
File: 1717173530194108.png (1.49 MB, 1024x1117)
>>102045987
task manager was able to show what was using gpu, and it really was just chrome and some core processes i might be able to fiddle with in the BIOS (handing some tasks to my integrated gpu). anyways, good news! i was right that CLIP encoding cost doesn't scale with image res (which kinda feels obvious now that i look at the nodes), and the browser acceleration suggestion from the other anon saved JUST enough to make 1024x1024 happen on 8gb of vram. let's go!

makes my taskbar reboot sometimes at the very end of the generation, just to give you an idea of how close i'm cutting it.
>>
File: nyagger.png (235 KB, 547x428)
>>102046244
>Shadow figure in the doorway
>>
>>102046244
Nice, and I suppose you're doing this just so it's fast? So how fast is it?
>>
>>102046275
i'm doing this because i got a 3070 ti before SD was a thing, and now i'm doomed to make these models work on 8gb of vram despite the 3060 and 3080 both holding more vram. still seething about it as you can see. I get 3.90 s/it, 40 steps, took 168 seconds on that last prompt in total. i don't care about speed as much as i care about being able to generate at all, thankfully flux has been very generous compared to previous models that i've felt entirely locked out of.

>>102046261
frieren can have a little datura
>>
>>102046355
I'm confused, I thought flux just used system ram or something if you run out of vram? I have only 8gb too and I can gen anything no problem, it never gives an out of memory error?
>>
Best local for coding sirs?
>>
Jamba 1.5 is charming in a similar way to 4o-latest. Don't think that'd make it worth running at all, but that's my assessment.
>>
File: 1703111707779339.png (16 KB, 631x232)
>>102046411
oh yeah damn, the model is totally partially loaded onto ram. i'm not actually sure how it decides how each part of the process is offloaded since my vram is 0.8gb usage when not generating and my ram is 7.6. shouldn't it be able to load most of it on vram and generate faster? sorry for the stupid questions if we're entering that territory. also thanks for making me realize this, now i can train loras without worrying about mustard gas
>>
>>102046605
Llama 3.1 405b and Deepseek Coder V2
If you mean the best you can actually run then it's Mistral Large, or for ultra-vramlets Codestral 22b
>>
File: xv2.jpg (160 KB, 847x857)
>>102045076
https://litter.catbox.moe/n0beto.jpg
And that's it.
>>
File: file.png (268 KB, 2014x1656)
>>102046924
Better according to who?
>>
>>102046870
I have no idea either that's why I didn't know how you were running out of memory. I just run it, it uses 8gb vram and python is using 30+gb while running. I get 6s/it though. I just figured you were trying to get it to fit to get super fast gens.
>>
>>102046954
Hot.
Would have liked to see one with the tip inside before the last one but I enjoyed the set anyway.
>>
Why is pure attention still superior to other architectures?
It's been 3 years and still no good architecture that scales well and performs better than attention models.
>>
In the last 74 messages (~8kt) between me and {{char}} (Mistral Large), "eye" can be found 14 times, all in {{char}}'s messages. That's roughly 38% of {{char}}'s messages! Almost 2 in 5 messages discussed eyes! What the hell? The conversation was SFW. Where does this strong eye bias come from? Makes me want to go RP with 2B because she has a blindfold.
>>
>>102047358
>That's normal fucking writing. You just gave yourself brain-damage by overdoing it, you anhedonic psychopath.
>>
>>102047355
Soon, once the VC funding dries up they'll be forced to innovate.
>>
>>102047358
>Her eyes glimmer unseen behind the blindfold
>>
File: miku-sexy+.png (523 KB, 512x768)
>>102042741

https://www.youtube.com/watch?v=NocXEwsJGOQ

"Japanese cybernetic mind virus" is another description that I enjoy. It's said with love though, of course.
>>
>>102036232
Been out of the loop since Llama3 first dropped.
What's the current state of models on 24GB of VRAM (3090)? Still underwhelming?
I have 64GB of system RAM with a 5950x, but splitting models was fairly brutal performance-wise.
>>
>>102047358
mistral large loves eyes and smiles/smirks/grins/etc, it generally writes about people's expressions far too often
its replies almost always start with {{char}} making some expression. feel like if you tell it to RP as {{char}} then there's like a 90% chance of the first token of any given response being {{char}}, almost always naturally following into some cliche about eyes glinting/widening/gleaming/... or smiling/grinning/pouting/smirking/frowning/chuckling/etc
it's really overbaked in that regard and kind of ruined the model for me
>>
File: file.png (320 KB, 1024x3594)
>>102046994
>>
>>102047525
>twitter nobody benchmark
Hi Aider
>>
>>102047358
what do you call a deer with no eyes
>>
>>102047551
kek
you have it confused with that twitter guy's "aidan bench" or whatever it was
aider is some code agent assistant company or something
>>
>>102047562
This is a classic riddle! The answer is:

No-eye deer

Let me know if you'd like to try another one!
>>
>>102047512
3.1 was a flop. There are some 3.0 and Qwen 72B fine-tunes that are just fine, which can hold you over. If you can offload Largestral or CR+, that remains an option too, if you're willing to wait at 2 t/s. If you want to run a smaller model with full context, try Gemma 2 27B, but personally I'd stick to the largest model you can handle within reason.
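For the 24 GB VRAM + 64 GB RAM case, the usual route is partial offload. A rough sketch with the llama-cpp-python binding (the llama.cpp CLI and koboldcpp expose the same idea via -ngl / --gpulayers); the path and layer count are placeholders you tune until VRAM is nearly full:
[code]
# Sketch with the llama-cpp-python binding; model path and layer count are
# placeholders. Raise n_gpu_layers until the 24 GB card is nearly full; the
# remaining layers stay in system RAM, which is where the 2 t/s comes from.
from llama_cpp import Llama

llm = Llama(
    model_path="/models/Mistral-Large-Instruct-2407.Q4_K_M.gguf",  # placeholder
    n_gpu_layers=40,
    n_ctx=8192,
)
out = llm("[INST] Say hello in one sentence. [/INST]", max_tokens=64)
print(out["choices"][0]["text"])
[/code]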
>>
>>102047634
I feel like Llama 3.1's shittiness is very much overstated. It feels like a complete upgrade over 3.0 to me, both in terms of context and general use. What's not to like about it?
>>
>>102047513
Damn, so upgrading from nemo won't fix the issue with it starting everything with eyes doing shit? Is running in some sort of story mode vs alternating turn RP maybe better for repetition?
>>
>>102047665
It got massively overshadowed by Large 2 releasing right after is all.
>>
>>102047665
L3 had two major problems in my testing.

>It was excruciatingly Woke.
>Everything it said sounded like a corporate press release. Constant talk about pushing boundaries, revolutions, and going on journeys together.
>>
>>102047803
>Woke
go back to /pol/
>>
>>102047634
>>102047665
>>102047733
Interesting. Thanks for the info anons, going to check some of these out.
>>
>>102047665
What are you using it for? It's good in some ways. I find 3 better, though.
>>
>>102042435
What are your specs? Any modern instruct model would work, the bigger the better.
>>
File: 4b.png (3.53 MB, 2400x2022)
https://huggingface.co/anthracite-org/magnum-v2-4b
Pruning Is Magic
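For anyone who hasn't seen pruning before, the general idea is zeroing out or removing low-importance weights and then healing the model with further training. A toy magnitude-pruning sketch in PyTorch, purely to illustrate the concept, not the actual recipe behind this model:
[code]
# Toy illustration of magnitude pruning, not the recipe behind magnum-v2-4b.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(4096, 4096)
prune.l1_unstructured(layer, name="weight", amount=0.3)  # zero the smallest 30% of weights

sparsity = float((layer.weight == 0).sum()) / layer.weight.numel()
print(f"sparsity: {sparsity:.0%}")
[/code]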
>>
>>102046954
Nice Mikus as always
>>
>>102048005
wow they couldnt even stick with that fox girl, kys anthratroons, 4b is shit.
>>
>>102048005
>no wandb link
>no axolotl config
>probably lying in the readme about the datasets
Not open source.
>>
File: 1536983191555.png (337 KB, 519x452)
Tomorrow is game day. This is the prompt I have right now.

[INST] You are a meme-aware writer that will be taking ideas from me and writing a story from them. Here's the scenario I have in mind today:

This story is based on the real world, except that in this one, every conspiracy theory is actually true. The year is 2024. Three Illuminati members meet up in a bunker and discuss random world events. One of them is a doppelganger, who has an even more evil agenda than the other Illuminati members.

To successfully write this story, I recommend writing down a plan. You should probably first define who these characters are by writing a small biography for each, detailing key character-defining events up until the current time of the scenario. Then think about what has led up to their decision to meet in a bunker, and what each character's positions and thoughts are before going into the meeting. Finally, you can begin writing the narrative in earnest, and I will be there to give more ideas for things to write about as you go. Now, let's begin. First, write the short biographies.[/INST]
>>
>>102047355
>Why is pure attention still superior to other architectures?
It's really the only magic we've found in 70-some-odd years of AI research. Incremental innovation seems the most likely path for the foreseeable future. Why would you assume another breakthrough just because some extra money is being thrown around?
>>
>>102048059
yeah they are a bunch of lying stupid idiots that just waste compuiite everyone at antrahcite is a troon and a fag
>>
>>102048059
You will never be satisfied.
>>
>>102048005
Based devs actually producing things for the world no matter how unappreciative some will be.

>>102046954
I'm trying not to generalize here but are ALL mikufags cucks who get off on watching her get fucked by men? Because it sure seems like it.
>>
>>102048005
>4B
It's trash, dog. Why would you finetune on that?
>>
>>102048102
Thanks for proving me right when I said 123b was going to be the last open source release.
>>
>>102048077
>You are a meme-aware writer
uhh expert roleplayer bros, I think we just got outdone
>>
>>102048095
Why are you pretending to make typos?
>>
>>102048005
Just wait until I launch my spite group, anthracene.
>>
>>102048005
anthra-cucks make new claudeslop. world rejoices.
>>
>>102048147
okay petra
>>
Ungrateful. Repugnant.
>>
File: 1534546255873.jpg (95 KB, 381x381)
>>102048129
If you or anyone else has improvements or modifications, let me know. We've still got time before we start. I don't really use LLMs for fun much, so I'm not an expert here.
>>
>>102048127
The only way to know what goes into a model is to train it yourself. They posted what they [claim they] did. That's all you can get.
>probably lying in the readme about the datasets
You will never be satisfied.
>>
>>102048204
got enough of anthra-cock in your mouth?
>>
>>102048224
Fuck off you schizo.
>>
>>102048224
I don't use their models.
You'll remain unfulfilled.
>>
>>102048231
oooo i think that anthra-cock has this troon blushing
>>
File: 1718320110277959.jpg (577 KB, 1856x2464)
>>102036232
happy weekend /lmg/
>>
>>102048203
to be entirely desu I think it's fine, it just sounds funny so I'm poking fun
>>
>>102048294
:)

>>102048314
Understood.
>>
>>102048294
Happy weekend Miku
>>
>>102048432
that's not miku, that's an anon who posted an AI picture that looks like miku
>>
>>102048455
wtf........
>>
Something big has been cooking, and the oven's just about to ding.
Tonight is the night.
>>
and once again all the anthra-troons disappear, just another victory and an another lose for the.
>>
>>102048623
>friday night release
bzzzt... wrong
>>
>>102048005
I'd look into, or ask, how NousResearch trains and prepares their datasets. Their 3.0 70B fine-tune is incredible. If you can retain the level of attention and instruction following the 3.0 model has while injecting claudisms into it, that'd be a worthwhile venture to spend your GPUs on.
>>
>>102048623
That's a weird way to describe cooming after a long goon sesh, but you do you, I guess.

>>102048645
Why not? What's wrong with gooning on Friday?
>>
>>102048005
To me they look like they're gearing up to eventually go commercial in some capacity; maybe they'll start a business within a few months, if they haven't already. I think this is the main reason why they're so hated, desu. Their key members took advantage of the good will of the community many times over the past year or so, lied, then congregated together, pulled the ladder up, and closed themselves off into their little private Discord.

Only those who still aren't disgusted by their behavior, or who don't know anything about their members, would use their models without puking, no matter how good they are (spoiler: they aren't).

I hope you're feelin' good climbing the social ladder, Anthrashites.
>>
File: sukisugite.jpg (41 KB, 266x262)
>>102047286
me too, might even get to it if I stop changing course.
https://a.uguu.se/CcOyqUUl.jpg
>>
>>102048697
Oh, so it's just jealousy that somebody might find success from their efforts, success you believe you deserve more for shitposting and jerking off, erm, I mean providing "feedback".
>>
>>102048701
Now, that's to go even further beyond.
>>
>>102048697
Thanks for telling the truth. I will only use their models if they keep being transparent, but I don't expect anything from them.
>>
>>102048701
ahhh ahhh anon
>>
>>102048697
>Their key members took advantage of the good will of the community many times
source?
>>
>>102048883
>source?
The voices in my head. They got chatty after I stopped taking the pills.
>>
>>102048883
Alpin beheading the original Pyg dev and making it a company. The Goliath scam.
>>
>>102048883
You are replying to a schizophrenic dramamonger.
Anyway, I think most people in /lmg/ don't give a shit where a model comes from, or whether the model makers are really OPEN SAARS. I care whether it's good and whether I can run it locally.
So far, their models have been "okay" (at least the 32B I tested), but nothing particularly amazing.
>>
>>102048697
What business? No one will pay to use their shit models.
>>
>>102048955
>I care if it's good
Then go use Claude. Closed source benefits no one. You don't belong here.
>>
>>102048996
Did you get a little jolt of adrenaline with the (You) just now?
Did you get another one reading this?
>>
>>102049023
>>102049023
>>102049023
>>
>>102048951
Which original dev? Ryan Gosling is still around; he worked on Pippa. I think the only dev whose fate we don't know is 0x000011b, but he most likely lost interest, since he deleted himself from the internet.
>>
>>102049034
You're retarded. All these models are going to become obsolete in a few months when a new batch of base models releases. If the fine-tunes are closed source, you're hostage to the fine-tuner keeping an interest in the hobby.
>>
>>102048996
hi lemmy :3
>>
>>102049091
Most finetunes don't improve the models at anything other than smut. The rest improve them in a narrow subject like astronomy or whatever.
As for the hostage bit, the datasets change over time. Different filters get applied: sometimes they make the data better, sometimes worse when extra data gets tacked on (oh, look, another million Claude logs!). New models deprecate older models, and newer datasets deprecate older datasets as well. A new Dolphin with the old dataset and a new model is just not going to be the same. The base model matters more than any finetune.
The end goal shouldn't be hoping finetuners make your smutty tunes for you. It should be tuning your own models. Build your dataset like they did. I have books3 and a Gutenberg mirror. One day I'll get to it. One day...
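If anyone actually wants to start on the "build your own dataset" step, the unglamorous part is just chunking raw text into something a trainer can read. A minimal sketch (paths are placeholders; trainers like axolotl can typically ingest JSONL with a "text" field as a raw-completion dataset):
[code]
# Paths are placeholders; CHUNK_CHARS of 8000 is roughly 2k tokens.
import json
from pathlib import Path

CHUNK_CHARS = 8000

with open("dataset.jsonl", "w", encoding="utf-8") as out:
    for book in Path("gutenberg_mirror").glob("*.txt"):
        text = book.read_text(encoding="utf-8", errors="ignore")
        for i in range(0, len(text), CHUNK_CHARS):
            chunk = text[i:i + CHUNK_CHARS].strip()
            if chunk:
                out.write(json.dumps({"text": chunk}) + "\n")
[/code]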
>>
>>102049285
I still don't get why they don't use human RP data; maybe I just haven't seen the datasets and they actually do.
>>
>>102047920
5900X
64 GB of DDR4-3600 RAM
RX 6950 XT (a 16 GB Navi II card)

Linux, and I have GPU acceleration working in llama.cpp.
>>
>>102049529
SillyTavern is also working with llama.cpp.

I think SillyTavern can be used non-sexually.
>>
>>102048294
Is anyone going to be using those twintails as handlebars this weekend, Miku?


