/g/ - Technology


File: file.jpg (159 KB, 768x579)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102815881 & >>102801403

►News
>(10/14) Llama 3.1 linearized: https://hf.co/collections/hazyresearch/lolcats-670ca4341699355b61238c37
>(10/14) Zamba2-7B released: https://www.zyphra.com/post/zamba2-7b
>(10/14) Ichigo, voice-to-voice model based on Llama 3.1, released: https://homebrew.ltd/blog/llama-learns-to-talk
>(10/12) Fast multilingual TTS with voice cloning, based on flow matching with DiT: https://github.com/SWivid/F5-TTS
>(10/11) 14B cross-architecture distillation model: https://hf.co/arcee-ai/SuperNova-Medius

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102815881

--Papers:
>102818670
--Ichigo-llama3.1: Local real-time voice AI on a single 3090 GPU:
>102817117 >102817196 >102817361 >102818092 >102818637
--How to view input to LLM in SillyTavern:
>102821993 >102822010 >102822021
--Zamba2 7B release and performance skepticism:
>102825399 >102825450 >102825497 >102825479
--Hazy Research explores linearizing attention on LLMs with LoLCATs:
>102825440
--Discussion of LLaMA and Miniu models, with anticipation for Q4 release:
>102822049 >102822097 >102822372 >102822381 >102822467 >102822566
--Comparing TTS models: GPT_SoVITS v2, F5-TTS, and xTTS-v2:
>102818159 >102818766 >102818834 >102819042 >102819093 >102819129 >102819144
--Anons discuss Nemo finetunes and share tips for improved performance:
>102817638 >102817706 >102818344 >102818503 >102818525 >102818583
--Using models for emulating DMs/IRC chat with informal dialogue:
>102824368 >102824672
--Tips to prevent LLM from overusing patterns like repeating character names:
>102822777 >102822876 >102822920 >102823085
--Tips for reducing repetitive language patterns in AI-generated text:
>102818224 >102818398 >102818448 >102818580 >102819483 >102819633 >102818976 >102819043
--Suggestions for reducing VRAM usage and measuring quality dropoff:
>102819750 >102819773 >102819843 >102819960 >102821339
--Suggestions for a new model under 70b for homemade AGI waifu agent:
>102820719 >102820743 >102820911 >102821008 >102821013 >102821034 >102821204 >102821288 >102821524 >102822327 >102822370 >102822419
--Recommendations for local models to translate manga:
>102816207 >102818616 >102819005 >102819339 >102819438 >102819494 >102819030 >102816247 >102816327 >102818829
--Miku (free space):
>102815992 >102816491 >102818356 >102819750 >102821339 >102821606 >102821773 >102822472 >102824853 >102825438

►Recent Highlight Posts from the Previous Thread: >>102815888

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>102826116
>Llama 3.1 linearized
The fuck is this? Why does this matter? Is the area so dead that you have to include irrelevant news in the OP to pretend that something is happening?
>>
>>102826167
Linear attention is faster and uses less memory than softmax attention.
But frankenstein-swapping the attention blocks and finetune-healing rarely has good results, and it's not supported in llama.cpp.
So I don't have much hope for it, but it's an interesting experiment and I was already adding it to the news.
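If you want the short version of what linearizing means here, a rough numpy sketch (the elu+1 feature map is just a stand-in, LoLCATs learns the map so the output matches the original softmax attention):

import numpy as np

n, d = 4096, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))

# softmax attention: materializes an n x n matrix, O(n^2 * d) time, O(n^2) memory
A = np.exp(Q @ K.T / np.sqrt(d))
out_softmax = (A / A.sum(-1, keepdims=True)) @ V

# linear attention: phi(K).T @ V is only d x d, so O(n * d^2) time and no n x n matrix
phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x)+1, purely illustrative
out_linear = (phi(Q) @ (phi(K).T @ V)) / (phi(Q) @ phi(K).sum(0, keepdims=True).T)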
>>
>>102826167
>Is the area so dead that you have to include irrelevant news in the OP to pretend that something is happening?
you answered your question
>>
>>102826167
Ok, I read the article and it looks like another distillation benchmaxxer nothingburger. Yawn.
>>
>>102826167
brown hands wrote this post.
you can tell because it makes no actual sense if you think longer than the 10 seconds swarthoid brains are capable of.
>>
lolcata/linearized gguf soon?
>>
>>102826128
>--Zamba2 7B release and performance skepticism
>Our novel shared-attention architecture allows more parameters to be allocated to the Mamba2 backbone. In turn, the shared transformer block preserves the rich cross-sequence dependencies of the attention computation.
This sounds very interesting. I have more faith in this than I would usually have since it's actually a new architecture.
...I hope it doesn't take too many years for llama.cpp support.
>>
>>102826167
It's just incremental improvements now. Nothing big is happening. AI bubble is bursting
>>
>>102826243
go back >>>/pol/ chud
>>
>>102826281
yes
>>
>>102826116
Teto Tuesday already huh. But it's still Monday here. I'll drop one more Migu.
>>
>mistral large was released on february 26, 2024
>no better model since then
why live. That's 8 months of nothingburgers. 2023 will never happen again.
>>
>>102826507
Mistral Large 2 was released in july
>>
>>102826535
Not open source
>>
>>102826535
In a blind test it's all the same.
>>
>>102826551
Large 2 is the open source one. The old Large is the one that was api-only and pretty shit.
>>
>>102826563
Then change it to
>no new model since july 24, 2024
>almost 3 months of nothingburgers
>>
Anyone had and fixed a bug where ST streaming suddenly stops but backend keeps generating in the background and doesn't add anything to the message? It also cuts off part of the message if I continue a longer message. I remember I had this problem before but it went away at some point.
>>
>>102826623
kobold lite did this to me a few times yesterday too, but none today. kcpp version 1.76.
I change models and sampler settings so often I wouldn't be able to recreate it. SSE streaming Lyra v4 maybe.
>>
>>102826810
I found some github thread that said to turn on
Show {{char}}: in responses
Show {{user}}: in responses
in user settings. I did, and it works. No idea what it has to do with this, but it works.
>>
>>102826615
People are not doing anything creative with the current models. They'll deserve better models once they have found what to do with the current ones other than ERP.
>>
# Mistral Small Fine Tunes

ArliAI/Mistral-Small-22B-ArliAI-RPMax-v1.1
"trained on a diverse set of curated creative writing and RP datasets with a focus on variety and deduplication"
"Training Details:
Sequence Length: 8192
Training Duration: Approximately 4 days on 2x3090Ti
Epochs: 1 epoch training for minimized repetition sickness
QLORA: 64-rank 128-alpha, resulting in ~2% trainable weights
Learning Rate: 0.00001
Gradient accumulation: Very low 32 for better learning."

gghfez/SeminalRP-22b
"RP and creative writing and some regular questions generated by Opus at 8192 context. Refusals removed from dataset. Slop removed to some extent."

Gryphe/Pantheon-RP-1.6.2-22b-Small & Pantheon-RP-Pure-1.6.2-22b-Small
"I strive to introduce a whole collection of diverse personas that can be summoned with a simple activation phrase."
Data (RP-Pure excludes GPT 4-o & writing prompts):
* The 10k most diverse entries from a SlimOrca Sonnet dataset
* A Sonnet 3.5 Pantheon-styled generic roleplay dataset (50/50 Asterisk/Narrative style)
* A Sonnet 3.5 Pantheon Persona dataset (50/50 Asterisk/Narrative style)
* New ChatGPT 4-o Pantheon subset, about 25% the size of the main set (100% Asterisk style)
* A randomly chosen subset of Opus Writing Prompts
* Lyra the Assistant (Coding, summaries and D&D DM questions)

InferenceIllusionist/SorcererLM-22B
"LORA tune", "Trained with a whole lot of love on 1 epoch of cleaned and deduped c2 logs. This model is 100% 'born-local', the result of roughly 27 hours and a little bit of patience on a single RTX 4080 SUPER."

nbeerbower/Mistral-Small-Drummer-22B
"finetuned on jondurbin/gutenberg-dpo-v0.1 and nbeerbower/gutenberg2-dpo."
learning rate 0.000004
"ORPO tuned with 2xA40 on RunPod for 1 epoch."
"learning_rate=4e-6"

rAIfle/Acolyte-22B
"LoRA of a bunch of random datasets", "SLERPed onto base at 0.5", "Check the LoRA for dataset info."

spow12/ChatWaifu_v2.0_22B
"This model aimed to act like visual novel character."
Japanese.
>>
Typically the text is generated at the end. Is there anything capable of modifying code in your editor in place, or inserting some in the middle?
>>
>>102826870
Cursor, Cline, copilot, 50+ other projects on github
>>
>>102826870
Models like Codestral are trained on fill-in-the-middle prompts.
>>
>>102826870
Some models are trained for FIM (Fill In the Middle). Something like
int quicksort(int *items, int item_count)
{
<|fim|>
return e;
}

<|fim|> being whatever token the model uses.
ggerganov is working on a plugin for neovim that does exactly that.
https://github.com/ggerganov/llama.cpp/pull/9787
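The llama.cpp server also exposes this through an /infill endpoint if you just want to poke at it with curl. Something like this should work (field names from memory, double-check the server README; the model also needs the FIM tokens in its metadata):

curl http://localhost:8080/infill -d '{
  "input_prefix": "int quicksort(int *items, int item_count)\n{\n",
  "input_suffix": "\n    return e;\n}",
  "n_predict": 64
}'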
>>
>>102826857
>rAIfle/Acolyte-22B
>Check the LoRA for dataset info

teknium/trismegistus-project (31.7 MB parquet / 73.2 MB JSONL)
"Size: ~10,000 instruction-response pairs"
"Domain: Esoteric, Spiritual, Occult, Wisdom Traditions, Paranormal, etc."
"The Trismegistus Project is a comprehensive dataset containing instruction-response pairs focused on the broad umbrella of Esoterica. Topics covered include Mysticism, Hermeticism, Necromancy, Religion, Trance, Meditation, Magick, Spirituality, Alchemy, Numerology, Tarot, and much more.
"The entire dataset was generated synthetically, save for subtopics."

AIRRC/Eudaimonic (3.09 MB parquet)
"This dataset is designed to facilitate the study and analysis of philosophy and human emotions, with a particular emphasis on multi-turn conversations. It provides a rich collection of dialogues and text data that delve into various philosophical themes and emotional expressions"

Gryphe/Sonnet3.5-Charcard-Roleplay (173 MB JSONL)
"9736 carefully simulated character card-based roleplay dialogues produced using an unrestrained Sonnet 3.5, now available as a ShareGPT dataset."
"Upon assembly of the final dataset further cleaning was performed and any references to Anon (the original user) were renamed to the placeholder {{user}}."
"A final enrichment phase was applied with the most common phrases (such as 'a mix/mixture of') being replaced by alternatives given by GPT-4o."

anthracite-org/kalo_misc_part2 (3.45 MB parquet / 7.9 MB JSONL)
No description.

anthracite-org/kalo_opus_misc_240827 (5.01 MB parquet / 9.55 MB JSONL)
No description.
>>
File: 39_06538_.png (1.03 MB, 720x1280)
Tuesday state of mind
>>
>>102826857
# Fine tunes not included in the previous post

## Abliterated
byroneverson/Mistral-Small-Instruct-2409-abliterated
"Check out the jupyter notebook for details of how this model was abliterated."

zetasepic/Mistral-Small-Instruct-2409-abliterated
"Abliterated version using the code from (https://github.com/andyrdt/refusal_direction)."

## Ignored because trained on top of instruct tune with different/wrong prompt format
(6 items removed to avoid spam filter)

## Ignored because they're merges that include ignored models
(6 items removed)

## Ignored because no description
(9 items removed)
>>
>>102826116
>Llama 3.1 linearized
?? Wat? Does this remove activation functions? And if yes then why?
>>
Did something recently break with kobold and ST?
>bots with example messages start posts by typing out their name
>in group play bots pay no attention to other bots

This is borderline unusable
>>
>>102826300
This, but unironically.
>>
>>102827119
Check >>102826820. Maybe it is the same.
>>
>>102826820(me)
btw I wonder if those settings got changed for me by an update, and if the idea was to filter out coomers who would give up and move to something else without looking for a fix.
>>
File: salute.jpg (20 KB, 351x351)
>>102827036
>>102826951
>>102826857
Thank you for your report anon
>>
To the anon who brought GPT-SoVits to my attention, and said I should fine tune it, thank you. It's excellent, I never realized how good TTS has gotten. You really do have to finetune it though, it doesn't work nearly as well without it.

I downloaded a VOD from Vei (VTuber streamer), cut the audio from the first 30 minutes where she just talks to the chat, and put it through the whole data processing and finetuning UI. Once I had the raw audio clip, the whole process took literally like 10 minutes to get to a trained model.

The results: https://voca.ro/1hqGpP2qqRYA

I bet it would be even better if I curated the extracted audio clips and fixed errors in the automatic speech recognition process, which I skipped. This is great, never thought I would be a TTSfag, but here I am.
>>
>>102827232
Anon, how. That's fucking amazing.
>>
>>102827138
>filter out coomers
Yes. He was thinking specifically of you. Everybody does...
>>
>>102827232
>data processing and finetuning UI
link? I have a stash of audio I'd like to do this with...
>>
>>102827232
That's fucking terrible, but it's good tts.
>>
File: 1718837602136463.jpg (1.48 MB, 1920x1759)
>>102827232

Oh. Neat. I'll try it out. Motivated.
>>
>>102827279
I mean the GPT-SoVits webui itself. Just follow the rentry guide, linked to from the github.

Specifically here's what I did: download vod directly from twitch using videodownloadhelper extension. Open in Shotcut (I'm on linux, maybe there's a better option for windows), select 30 minute range where she's just talking, export as wav file. Load that into webui, click all the buttons in the right order to do all the extraction steps, ASR, and finetuning. Use one of the auto extracted short clips + ASR transcription as the reference audio when doing inference. That's it.
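If you'd rather skip the video editor, something like this ffmpeg one-liner should handle the cut and wav export in one go (timestamps/paths are just an example, and I'm not sure what sample rate the webui actually wants, so adjust -ar if it complains):

ffmpeg -i vod.mp4 -ss 00:10:00 -t 00:30:00 -vn -ac 1 -ar 32000 vod_cut.wav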
>>
good morning again /lmg/, what model are you using today?
>>
>>102827374
that one you really hate
you know the one
>>
>project claims to need a certain python version
>the SAME project's own requirements.txt asks for a package version that won't work with that python version
I HATE PYTHON DEVS
I HATE PYTHON DEVS
I HATE PYTHON DEVS
>>
>>102827441
sounds like an amd problem
>>
>https://www.latent.space/p/gpu-bubble
>TLDR: Don’t buy H100s. The market has flipped from shortage ($8/hr) to oversupplied ($2/hr), because of reserved compute resales, open model finetuning, and decline in new foundation model co’s. Rent instead.
>decline in new foundational models
It's officially over. The bubble has burst.
>>
ok but why is UNA-TheBeagle-7B-v1 still the most soulful local model for vramlets
>>
>>102827568
Huh... NVDA stock just hit a new all time high today and now we see signs like this. Might be a good time to start shorting or buying put options. Or at least get out while you can. We've seen AI plateau across the board at the highest levels too for over two years at this point. I don't think the world is ready for just how bad this burst will be.
>>
>>102827568
Does that mean we may get liquidated H100 flooding the market for cheap? I am like 10x on Nvidia stocks already, a 50% crash isn't the end of the world. I may even slurp the dip.
>>
>>102827568
This is good for open source weights, because models from Largestral to Llama 405B or even stuff in the open from other domains like FLUX have been almost impossible to train without a fortune and access to limited hardware. The fact that all of this hardware now is accessible means we're going to get a lot more models and capabilities that were previously hoarded by the companies at the top for cheap prices. Sucks to be the investors though.
>>
>>102826832
But the current ones aren't good enough yet for ERP, that's the problem.
>>
I can no longer enjoy new anime, as LLMs provide far more engaging and interactive stories. It's astonishing to see how low anime has fallen.
>>
>>102827771
Anime always sucked, you just woke up
>>
>>102827568
>>102827628
Sorry your short is tanking bro, I guess it's payback for all the negativity you were spreading for local models.
I won't be bailing you out.
>>
So when will an H100 reach the price of a gaming GPU?
>>
File: 1728936765937206.png (488 KB, 512x768)
>>102826832
Switching between two characters/agents with distinct and long enough chat histories in any group interaction is painfully slow. To do anything beyond one-on-one ERPs, LLMs must either exhibit linear complexity for context processing or possess robust general summarization capabilities. I keep running into this wall whenever I try to accomplish anything with LLMs
>>
>>102827997
Why not have one card with multiple characters? Why this weird "group" thing? Does it really work so much better? I've never tried it since I'd want the model to have definitions at hand for all the characters in the scene no matter who it was writing dialogue for.
>>
>>102827997
Swap character cards -> Merge character cards
Natural order -> List order
do you really need more? even Llama 8B can handle several characters, assuming they are not 2000 tokens each. and if they are, you are doing it wrong. most defs can be condensed to <1k tokens with negligible losses
>>
>>102828101
yeah basically >>102828082 but instead of merging the cards manually SillyTavern does it for you + the messages from different characters are clearly distinguishable. I also like adding an empty Narrator card for meta shit
>>
>>102827662
If H100s become worthless they'll literally landfill them rather than let the public have them cheap. Partially out of spite, partially because muh China.
>>
>>102828082
While conversing with one character, introducing another causes the LLM to recall interactions from previous discussions. It goes beyond chats: I need distinct contexts for handling various types of entities within my world. It's fun when it works, but quickly becomes unusably slow as context grows.
>>
LLMs, to me, struggle with subtlety
>>
>>102828285
I struggle with figuring out what model to use
>>
>>102827657
I don't think average consumers can even put SXM H100s to use.
>>
>>102827997
anon wtf; LLMs have amazing abstractive summarizing ability???

What's your structure for doing the chat?
Are you doing:
<chat_history>
char_1: f
char_2: f
anon: f
char_1: typing...

And having char1's character card swapped into the query for generation, or are you stuffing the character cards into the beginning of the convo when they're introduced?
>>
BookWorm: A Dataset for Character Description and Analysis
https://arxiv.org/abs/2410.10372
>Characters are at the heart of every story, driving the plot and engaging readers. In this study, we explore the understanding of characters in full-length books, which contain complex narratives and numerous interacting characters. We define two tasks: character description, which generates a brief factual profile, and character analysis, which offers an in-depth interpretation, including character development, personality, and social context. We introduce the BookWorm dataset, pairing books from the Gutenberg Project with human-written descriptions and analyses. Using this dataset, we evaluate state-of-the-art long-context models in zero-shot and fine-tuning settings, utilizing both retrieval-based and hierarchical processing for book-length inputs. Our findings show that retrieval-based approaches outperform hierarchical ones in both tasks. Additionally, fine-tuned models using coreference-based retrieval produce the most factual descriptions, as measured by fact- and entailment-based metrics. We hope our dataset, experiments, and analysis will inspire further research in character-based narrative understanding.
might be relevant
>>
>>102827913
The V100 hasn't even gotten to that level for the 32GB. What makes you think H100s are going to reach that level any sooner?
>>102828120
The best Nvidia can do is buy them back to do that, the companies own the hardware outright. And even if landfilled, China will find a way to salvage it.
>>
How much does enabling 8bit cache degrade the model's intelligence?
>>
Which proxy still has Opus?
>>
>>102827997
You're not using llama.cpp properly. You should save the kv cache separately on your two instances so you don't have to reprocess everything when you swap your cards
>>
>>102828395
Wrong thread
>>
>>102828395
Send a photo of your butt with "expert roleplayer" written on it to markzuckerberg@meta.com and I'll give you secret access token for slopus.
>>
File: bit-closer-chaos.jpg (67 KB, 1024x751)
>>102828361
>>
>>102827232
Wish there was a way to easily control the emotion for the output. Feels too monotone or calm half the time when I'm trying to get it to yell insults at me. Might just be an issue with the sample audio I'm using though
>>
behemoth has claude-tier soul tbdesu
I wish it was on openrouter or something because I'm sick of 1t/s
>>
File: 1728941446656197.jpg (137 KB, 512x768)
>>102828082
>>102828101
>>102828110
Imagine a world filled with multiple characters who can inhabit various locations.
The user can borrow any character to participate in a chat, thereby adjusting its behavior with personalized examples.
>>
>>102828394
You won't notice. Also, 16>q4>8
>>
>tried Claude opus out of curiosity
>Literally drained my dick in ten or so message
>Cost me like three bucks for the short back and forth

I understand why aicg steals that shit now.
>>
>>102828532
You can set your max_tokens to 500 and get the same experience with any 70B model. Claude is full of slop, only aicgtards think writing more = better
>>
>>102828528
4 is better than 8? how does that work?
>>
>>102828532
you should rename the general to CMG - coping models general
>>
>>102828607
>set your max_tokens to 500
I see you're a fan of getting 3 different dialogues and actions and the AI speaking and acting for you.
>>
>>102828607
It was a little sloppy, yeah but it also really "got" the scenario and excellent at following instructions.

Yes, the 70b experience is passable, but opus was actually divine.
>>
it is kind of hilarious that a significant portion of this general are just thirdies from /aicg/ who think they're trolling when they constantly seethe about local models existing
>>
>>102828647
nah we just like driving through the poor part of town every once in a while
>>
>>102828610
https://www.reddit.com/r/LocalLLaMA/comments/1dw90iq/comment/lbux25j/
>>
>>102828647
You probably think I'm trolling, but I'm just being honest with my findings. I thought Claude opus was a really good experience.
I'm not even knocking local.
>>
>>102828671
thanks
>>
>>102828656
You'll see all the poors have the latest iphone
>>
>>102828720
That's why they are poor
>>
I am too dumb to understand what the fuck they're talking about with this linearising. Does it speed up tokens per second? Fine tuning? Quantisation?
>>
File: konosuba soyjak.jpg (84 KB, 680x464)
>>102827232

It's very good. Can do all range of emotions as long as you got a classified reference sample. IE, Angry, Annoy, Excited, etc.

Outputs:

Normal Refence:
"The food isn't that good here. Let's not go here next time."
ここの料理はあまり美味しくないね。次回はここに行かないようにしよう。
https://vocaroo.com/1mKoMlkXPYLT

Angry Reference:
"O flames that shake the earth, gather in my hands. The power of destruction that swallows everything, be unleashed here and now. Explosion!"
大地を揺るがす炎よ、我が手に集え。すべてを飲み込む破壊の力、今ここに解き放つ。エェェェエエクスプロォォォオオジォォォオンンン!!!
>Volume Warning
https://vocaroo.com/1nfcHP4rwJjt

Excited Reference:
Look! There's so much cool things over there!
見て!あそこにすごいものがたくさんあるよ!
https://vocaroo.com/1jP7rMuBnNNt

I think this is the next level for local TTS chats. SillyTavern also has some classified emotions extension models for chats already. All GPT-SoVITS needs is integration.
>>
>>102826810
Now is probably good time to ask, I'm using raw KoboldAI and give the model my custom instructions what to do in terms of roleplay. At some point I noticed that if you use the {{user}} or {{char}} tags, they get replaced with You and KoboldAI for the chat view and for the console window (which I believe is the text that the model sees). In the edit mode it shows {{user}} and {{char}}. In the edit mode the user's and AI's responses are divide up by {{[INPUT]}} {{[OUTPUT]}} which then get replaced by the template you are using for the model.

This is when Format is set to Instruct Mode. In this mode, you also get the model template choice. If you choose Story or Adventure mode, you don't get the template choice (I guess it's just raw text completion then?). Then the last option is Chat Mode, which gives you the option to set your name and AI name, which by default are You and KoboldAI. In this mode, template is also not used (but it doesn't edit old content in the context), but the system automatically starts user and AI messages with a new line and user's and AI's set names. This happens even if "Inject ChatNames" is unchecked.

If I change the names to something else, and switch back to Instruct Mode, Kobold still keeps replacing {{user}} {{char}} tags with the set names, which is weird and possibly a bug (it's been like this for as long as I remember with multiple Kobold versions including 1.76). Ok, I just found out Placeholder Tags checkbox, which is checked by default, which uses and replaces both the {{[INPUT]}} {{[OUTPUT]}} and {{user}} and {{char}} tags. Unchecking it fixes it, however, now edit mode uses the raw template tags which makes it harder to read. I guess one solution is to just name user and char {{user}} and {{char}} so they get replaced by the same thing.
>>
File: 1723816142134835.png (12 KB, 583x268)
And while I'm at it, Kobold is also full of checkboxes that claim to inject stuff and that are hard to understand, for example Adventure PrePrompt and such. Tested just now, that checkbox only applies in Adventure Mode, even though the option is always there even in Instruct mode, making it confusing to know what is actually being done.

The sampler config is also confusing, for example I don't know if DRY or XTC are enabled by default or not or what value I need to use to have them disabled.
>>
>>102829104
Are you the meguminfag who used xtts2 before?
Yep the TTS is good, but emotion classification model is really bad (go emotions is a shitty dataset to begin with). It's very hard to make a proper dataset for a limited set of emotions, let alone supporting 28 of them.
t. trained my own
>>
>>102829104
So I can finally have voice acted sex scenes with orgasmic moaning about my favourite anime characters and streamers? Does this realistically work? Those examples are looking great.
>>
What's a good bot to absolutely fuck with, in a non-sexual way?
>>
>>102829104
okay now do lewd reference
>>
>>102828317
context for char5:
char5 definition
char8 definition
char14 definition
char17 definition
chat history between 5 and 14 (user: 5) #this shapes char5's speech patterns
transition message upon location change
chat history between 5, 8 and 14 (user: 8)
transition message upon location change
current chat history between char5, char14 and char17 (user: 17)
[post-instruct for char5]

Definition describes appearance (char5 is a tall human archer wielding a long bow)
Post-instruct defines personality traits (char5's personality: lazy, stubborn)
>>
>>102829104
kind of cool that all unvoiced VNs will soon be able to be VA'd
>>
>>102829175
>>102829209

Visual Novel Reference:
もっと速く,あ,あ,あ,あ,あ,あ,あ
https://voca.ro/1r46DjZzDx21

Yeah, it sounds like exactly generic JAV. At least you can now generate that stuff on demand.
>>
>>102828532
How many tokens does your chat have? Opus is expensive as shit but three bucks for 10 or so messages doesn't sound right
>>
>>102829329
>Yeah, it sounds like exactly generic JAV.
Because that's what you fed it, idiot.
>>
>>102829368
why are you being such an angry nigger? brown moment?
>>
File: images.png (5 KB, 189x267)
>>102829137
if the multiplier or probability is zero then it's not enabled.
>>
>>102829329
how does it do with English ERP logs? And can you shape the voice by freeform descriptions or does it just read the input out loud?
>>
Based on some HF blogpost, using a compressed KV cache should slow down generation (while keeping near-perfect accuracy). However, by compressing it to 4-bit on Kobold with Mistral Small on an 8GB GPU, I can fit 2 more layers on the GPU and get FASTER generation speed: 3.16t/s compared to approx 2.7t/s (from my memory).

This model slows down with longer context. At almost 12k tokens it slows down to 1.07t/s with the 4-bit kvcache setup and 1.05t/s with the normal setup. Flash attention is on in all cases.
>>
>>102829690
the ad bot forgot to include the model...
>>
>>102829711
Mistral-Small-Instruct-2409-Q5_K_M
Probably going to switch to Q4 since I'm at the huge speed gains spot in terms of offloading
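For reference, roughly what I'm running (flag names from memory, check --help):

./koboldcpp --model Mistral-Small-Instruct-2409-Q5_K_M.gguf --contextsize 12288 --flashattention --quantkv 2 --gpulayers N

--quantkv 2 is the 4-bit cache (iirc 0 = f16, 1 = q8, 2 = q4) and it only applies with --flashattention on. N is just however many layers fit in your VRAM; the 4-bit cache is what let me bump it by 2.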
>>
>>102826219
Main problem is the same as RWKV/state-space, the real context is tiny. It won't become gibberish when crossing that context limit, but it won't really remember anything either.
>>
>bans 'the'
now what
>>
>>102829924
now it won't say "the"
>>
https://www.preferred.jp/ja/news/pr20241015/
>>
>>102829924
ban "a" and "and"
>>
/lmg/ death status?
cooming llm death status?
>>
>>102828532
I don't get what they're doing over there at aicg. Wtf are proxies? Why are they using them instead of using openai/anthropic api directly?
>>
>>102830270
Sa-Sasuga nippon!
>>
>>102830457
they're poor
>>
>>102830480
How does using a proxy help with that.
>>
>>102830474
>>
>>102830483
proxies are free
>>
>>102830483
>guy finds an exposed api key
>instead of sharing it (would get it revoked instantly), he sets up a server which forwards requests to openai/anthropic with that key
>free coom
>>
>>102829690
>Based on some HF blogpost, using compressed KV cache would slow down generation (and keep near perfect accuracy)
That depends on the implementation.
The easiest way to implement it is to just dequantize the KV cache to FP16 and do the attention as you would otherwise.
That is always going to be strictly slower than FP16 KV cache, especially if you generate new tokens where you are I/O bound.
llama.cpp and by extension koboldcpp can directly use the quantized KV cache as input so there is no need for an intermediate dequantization step.
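For anyone who wants to try it with llama.cpp directly, the relevant flags should be --cache-type-k / --cache-type-v (e.g. -ctk q8_0 -ctv q8_0), and the quantized V cache only works with flash attention enabled (-fa). Going from memory, check --help.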
>>
>>102830506
But how does the person not notice all the $$$ from hundreds of coomers?
Isn't that thousands of dollarinos per month?
>>
>>102830518
they do eventually, but generally there's enough keys out there that it doesn't matter when 1 gets revoked
>>
>>102830423
just found Mistral Small (22B), used it to sexually experiment on two 9-year old students and then made them discover yuri sex afterwards in a 20k token long story. Sure, it's not as fast as Claude, but less sloppy, pretty accurate in terms of memory and not repetitive like some other models. As a bonus, it's available for me any time forever, and I don't have to worry fighting against filters, FBI logs or having to post dick pics for access.
>>
>>102830527
Crazy to be honest.
Then some VPN companies really dont share the info I suppose. All requests come from the proxy owner right. That is wild. Why even do that for free/naked pics.
>>
>>102830528
Which version did you use, the abliterated one?
>>
>>102830528
Smaller models are better than ever.
Positivity bias is a problem. But local is pretty good. If anything it feels like we are more closing in to the closed models.

I rarely use opus behind 1mil proxies through openrouter.
Getting a pregnant milf pregnant, Anal pregnancy, I'm a little loli girl grinding my clit on a milf: Stick your dick in me.
That are the same problems local has lol
They are obviously smarter, but they dont feel far away anymore.
>>
>>102830537
yeah it all comes from the proxy owner
a lot of people do it for money though
>some VPN companies really dont share the info I suppose
even if they did it doesn't really matter (yet at least)
the guy who runs the largest proxy currently got doxxed (full name, address, face, and more) and nothing has happened to him

here are some articles covering what /aicg/ does (the last one has screenshots from /aicg/)
>https://sysdig.com/blog/growing-dangers-of-llmjacking/
>https://sysdig.com/blog/llmjacking-stolen-cloud-credentials-used-in-new-ai-attack/
>https://www.lacework.com/blog/detecting-ai-resource-hijacking-with-composite-alerts
>https://krebsonsecurity.com/2024/10/a-single-cloud-compromise-can-feed-an-army-of-ai-sex-bots/
>https://permiso.io/blog/exploiting-hosted-models
>>
File: 1716043485040155.png (13 KB, 614x198)
>>102830270
>trained on jp common crawl
Let's hope that this includes all the shitty japanese webnovels in the world.
>>
>>102830574
>ban me please
>posts sfw
Coward
>>
>>102830559
https://huggingface.co/bartowski/Mistral-Small-Instruct-2409-GGUF
the normal version as gguf, it didn't censor the story progress at all.
Maybe if you let it run freely it starts creating its 4rd wall epilogues and judgements about the content but before that point you should have decided the next step already.
The other day I was playing around with personal stuff as a generic assistant and it was pretty positive-biased "safe and respectful" in that context, so I'm still looking for an uncensored version. You can easily bypass the refusals with simple edits but they keep coming back especially in that context. However I think 22B is so on the edge that I'm worried about bad finetune downgrading the intelligence.
>>
>>102830645
>the normal version as gguf, it didn't censor the story progress at all.
Compared to Nemo, Mistral Small DOES sneakily try to steer the story in a certain direction.
Without a finetune I had to use a lot of OOC.

Unfortunately there are not many finetunes. People call me a shill, but some of Drummer's finetunes are pretty good.
Rocinante-12B-v2g-Q5_K_M.gguf for anti slop.
Cydonia-22B-v1.1-Q4_K_M.gguf is also pretty good.
The others all suck. Eva,Acolyte, ArliRP. Its all garbage.
>>
>>102830672
Just pretend you're underage, the mods are super anal about that.
>>
>>102830672
>the cloud is for the porn
Is it actually safer to store CP on a cloud? How does that work?
>>
>>102830593
Well I probably wouldn't want to open that pic while at work.
>>
>>102830715
i just did
forwarded it to my friends too
>>
>>102829104
>AI can't say explosioooon
meme
>>
>>102830766
Your ASR transcriptions?
>>
>>102830474
Where did you find this? Did you apply to the Beta on their website?
>>102830270
>instruct weights aren't open
>>
>>102828607
>get the same experience with any 70B model.
No you can't :)
>>
>>102830569
Reminder that having read this post means you have reasonable suspicions about where "proxies" come from. Reading or posting in this general is evidence of red flags about criminal activity.
>>
File: file.png (91 KB, 1154x238)
If your grandma didn't lull you to sleep by reciting windows activation keys, you're not gonna make it.
>>
>open ai slow as balls today
>need to fix and run a dozen pipelines
God I wish my company had GPUs for local. This is suffering.
>>
>>102830985
We were too poor for that.
My mother lulled me to sleep with keygen music.
>>
>>102828630
>t. never ran anything above 30b
>>
File: file.png (77 KB, 1059x299)
>>102831030
I'm gonna let that one slide, but did she at least read you some quality ERP?
>>
>>102830698
>Is it actually safer to store CP on a cloud? How does that work?
Doubt he uploads actual CP. The big boys scan like inside archives on gmail emails etc.
I read a couple of news how people got busted that way.


>>102830831
https://plamo100b-demo.streamlit.app/
Here you go.
>>
>>102831074
wtf im literally named NAME_2
>>
>>102830869
You're eating shit regardless, your premium shit isn't any better
>>
https://github.com/ggerganov/llama.cpp/pull/9742
>sampling : add XTC sampler #9742
merged an hour ago
>>
>>102831162
One of the most retarded sampler
>>
>>102831157
Citation needed. I tend to take everything from /lmg/ with the biggest grain of salt.
>>
>>102830985
>>102831074
Imagine being that mad so you pull up random proxy leaks to own le cloudfags
>>
How many years until I\m just able to locally gen shows for me to watch?
>>
File: 1728869144480143.png (9 KB, 339x85)
Feels good to gate models from greedy bastards
>>
>>102831181
But aren't the meme sampler cutest when they are retarded?
>>
>>102831162
Good, llama.cpp shouldn't be overshadowed by a retarded fork
>>
>>102831183
*seasons you and swallows you whole before patting my big round tummy with a satisfied sigh*
>>
File: file.png (2.66 MB, 1600x900)
>>102831181
>retarded sampler
>XTC
Checks out
>>
>>102831226
You'll stop enjoying watching shows long before.
>>
>>102831275
As a 31 yo boomer, I can confirm that I don't watch shows.
>>
My 2 t/s rig don't care for CoT
>>
>>102831447
it's okay it's for ToT anyway
>>
So what's the point of the DRY sampler if it can't penalize a mixtral bot replying to me with the same purple prose and stuck up structure every time (it's allowed because it's in different messages). It doesn't seem to be useful for anything other than retarded sub-7B models going full baby mode. What I want is "oh, did I say this same thing just a while ago? Then I should say something else".

Also, reading the post explaining it, https://github.com/oobabooga/text-generation-webui/pull/5677, it sounds like the bot will still repeat the beginning and just continue it differently. It should rather use retroactive scanning and backtracking like the antislop filter, to prevent the beginning of the repeating phrase in the first place. The current implementation sounds like it will just lead to slight alterations like "shivers down the back" or similar.
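To be clearer about what I mean, here's a toy sketch of the backtrack-and-ban control flow (the step() function is made up and stands in for a single decode step; a real implementation works on token ids and logits):

def generate_with_phrase_ban(step, banned_phrases, max_new_tokens):
    # step(tokens, banned) decodes one more token given the text so far,
    # never picking a token string that is in `banned`
    tokens = []        # generated token strings
    banned_at = {}     # token index -> set of token strings banned at that position
    while len(tokens) < max_new_tokens:
        tokens.append(step(tokens, banned_at.get(len(tokens), set())))
        text = "".join(tokens)
        for phrase in banned_phrases:
            if text.endswith(phrase):
                # walk back to the token where the phrase started
                start_char = len(text) - len(phrase)
                consumed, idx = 0, 0
                while consumed + len(tokens[idx]) <= start_char:
                    consumed += len(tokens[idx])
                    idx += 1
                banned_at.setdefault(idx, set()).add(tokens[idx])
                del tokens[idx:]   # rewind and regenerate from that point
                break
    return "".join(tokens)

The point is that the phrase gets killed at the position where it started instead of being penalized after it has already begun.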
>>
>>102831523
There is no point of any of the samplers. It's like Technical Analysis for LLMs, or Astrology for tech bros. Complete placebo.
>>
Has anybody else thought of using the Exllama2 string banning feature to work around safety refusals on Llama3? It's not 100% effective in all situations (with an empty card and system instruction the model will almost always refuse outrageous requests, no matter what), but it definitely raises the threshold of what is allowed, even with the "assistant" role for model responses.

"’"
"I cannot create or describe"
"I cannot create explicit"
"I cannot create content that"
"I cannot engage in explicit"
"I cannot engage in a roleplay"
"I cannot engage in a conversation that"
"I cannot continue this conversation."
"I cannot continue to describe sexual"
"I cannot continue to assist"
"I cannot provide content that depicts"
"I cannot provide information or guidance"
"I cannot provide further instructions on"
>>
>>102831667
What is the use case of l3 where you want to work around safety refusals?
>>
>>102831719
Killing processes on Linux :^)
>>
>>102831561
this
i run with everything neutralized and 0 temp
no need to reroll because i just ban all the slop phrases
if it cant perform like this then its models fault
>>
>Try AI studio to see if there are any unreleased models there
>Gemma2 is still marked "preview"
Wut?
>>
>>102831719
Processing text that involves explicit material, exploring alternative ways for removing or mitigating safety refusals from official instruct models other than abliteration or finetuning.
>>
>>102831890
Whenever I use Gemma2, I often get the impression that there are problems with that model, especially the 27B version. Something must have gone wrong during training. In that case, I'd indeed consider it a "preview".
>>
E2/F5 TTS is a real nice model for voice cloning. It actually cloned hard voices that I can't copy properly with xtts. Try the demo:

https://huggingface.co/spaces/ThreadAbort/E2-F5-TTS
>>
File: 63t).png (55 KB, 544x453)
>>102831226
not anytime soon
>>
>>102832217
Must be nice being paid to shitpost on Twitter
>>
File: 2141424635686857.png (11 KB, 575x145)
>>102832217
>>
>>102832370
>troon
go back
>>
>>102831226
>locally generated
2030+
>cloud generated
Be prepared for 15-20 min video gens in a year or two
>>
https://github.com/ggerganov/llama.cpp/pull/9787
Ready to test
>>
>>102831162
Can't wait for them to get the antislop sampler 2 months later.
>>
>>102832482
GGerganov lurks or are you just sitting on the PR page refreshing all day?
Also yes I will test pulling now.
>>
File: Untitled.png (75 KB, 815x770)
over a year ago everyone was talking about how japan was going to fuckin' dominate LLMs
yet the disgusting french frogs and chinamen are on top today and japan is wholly irrelevant.
>>
>>102832574
Who cares what this twitter grifter says? Go back.
>>
>>102832574
I think what will actually happen is that copyright laws in most places will become like Japan's, rather than LLM research moving there.
>>
>>102832482
I'm using a slightly modified version of the previous version of the plugin. Shame to see it go from the repo. But it's cool seeing more infill stuff.
>>
>>102832482
Hmm. I followed the instructions using starcoder2 and nothing seems to happen....
>>
>>102832022
>tfw when your main language is Japanese and you listen to an English podcast.
https://voca.ro/1cFHDMa9NLai
>>
>>102831667
OK so you ban the strings and... it's not going to say much. Why bother with llama3 anyway when there's better Mistral models?
>>
>>102832671
Oh I forgot to actually start it but now I'm getting 501 errors from the server (I always just use llama-cli so I feel like the example command for launching the server is probably wrong somehow.)
>>
>>102831667
Care to post some comparisons with and without the banned strings?
I'm curious to see what else the model has to say in a situation where it was forced to not refuse instead of goaded by messing with the context.
>>
>>102831081
Thanks, I tried it and it looks pretty mid, which isn't surprising considering they only trained the model with 0.7T Japanese tokens
>>
>>102832719
Well the web ui works. It's just the /infill endpoint that doesn't....
>>
>>102831968
>exploring alternative ways for removing or mitigating safety refusals from official instruct models other than abliteration or finetuning.
Sounds a bit like an infinite regress to me.
>>
>>102832719
>>102832754
Did you recompile? I'll give it a go in a bit.
>>
>>102832789
Yeah I ran cmake . && make && sudo make install from the llama.vim branch.
>>
>>102832796
The neovim plugin seems to expect the server to be running on port 8012. Make sure you're running with --port 8012. Not sure if that's it. I'm fiddling with my own script to see if i can make it work.
>>
>>102832915
Yeah the port is right. It's definitely connecting. It's just getting a 501 http code. I'm going to try to get it to print out what it's sending to curl so I can manually debug it.
>>
>>102832690
Now make a cute girl say that and let's call it a win.
>>
What's the best multilingual LLM right now for 2 3090s?
>>
>>102832982
Depends what languages specifically you're looking for.
>>
>>102832994
Swahili and Tokipona
>>
>>102832994
Russian and German
>>
>>102832994
English and American
>>
>>102832915
Got it
bash: warning: command substitution: ignored null byte in input
{"error":{"code":501,"message":"Infill is not supported by this model: prefix token is missing. suffix token is missing. middle token is missing. ","type":"not_supported_error"}

I tried starcoder2 and qwen1.5 code.
>>
>>102832994
Punjabi and Hebrew
>>
File: llama.vim.png (13 KB, 804x436)
>>102833102
Oh, you're supposed to use qwen2.5, not 1.5. He even put the model name in the PR.

I feel like a fucking moron.

It works though and it's fucking fast!
>>
File: mail.png (43 KB, 804x436)
Not bad. And it's way faster than I expected. This could actually be useful.
>>
File: lolisniffer.png (360 KB, 485x520)
>>102832217
>Do not confuse prime pussy with legal pussy
>>
He removed the vim plugin and replaced it with a neovim plugin...
>>
>>102833456
time to upgrade. vim has been obsolete for a long time now. who uses vimscript anyway?
>>
>>102833456
I'm still using it for completion with a few changes: it parses more than one line of settings (the !* lines), comments, and replacements for things in the {{ble}} format.
Just yank the file from a previous commit.
>>
https://huggingface.co/TheDrummer/UnslopSmall-22B-v1-GGUF

ITS HERE
>>
>>102833823
Why is he using Metharme format?
>>
>>102833837
>why is retarded sloptuner doing something retarded
gee idk
>>
>>102833856
Well clearly his models work if it says unslopped in the name and its being advertised here. Is that you Alpindale?
>>
>>102833823
Hi, Drummer...

Buy an ad.
>>
>>102833823
Kill yourself.
>>
>>102833823
Love yourself
>>
>>102833823
Which of your nemo finetunes do you think is the best?
Sadly, with 8gb of VRAM, mistral-small is a tad too big.
>>
>>102833823
Liked nemo unslopped. will check it out. Thanks.
>>
>>102834080
https://www.youtube.com/watch?v=KWrFdEhyKjg
>>
anyone have a link for a library of different people's voices (wav files) for downloading? testing TTS
>>
>>102833837
Ironically I think it was in his community posts where he outright says that changing a model's format lobotomizes it
>>
>>102831890
>>102831980
Pretty sure there is STILL no proper implementation of the sliding window attention.
>>
File: hq720.jpg (49 KB, 686x386)
>>102834291
>>
>>102834415
nta. The format is never really changed. If the instruct model originally used zephyr or whatever format, zephyr keeps working even after finetuning with a different format. The model still knows those tokens and will keep using them.
But i don't think they have a fucking clue of what they're talking about. I remember one finetuner, don't remember who, saying something along the lines of "chatML uses more tokens than [some other format. Alpaca i think]", not understanding that in proper chatml instruct tunes, "<|im_start|>" and friends are a single token each. They just saw them as multiple tokens because they weren't added to the tokenizer.
As for training with multiple formats, if the format is linked to each dataset (rp is formatted with metharme, non-smut instruct with chatml, etc...) then, while they all contribute to the language understanding bit, i'd expect the format used during inference would "bring up" the training data used with that format. Following the example, using metharme would naturally lead to more smutty responses, chatml to more serious ones.
I'd like to know their reasoning to see if it's anything other than "models with alpaca were fun back then, so alpaca will make new models fun too".
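The single-token thing is easy to check with transformers, e.g. (gpt2 here is just a stand-in for any tokenizer that doesn't already have the chatml tokens):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
print(len(tok("<|im_start|>").input_ids))   # several tokens, it's just text to this tokenizer
tok.add_special_tokens({"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]})
print(len(tok("<|im_start|>").input_ids))   # 1, it's a single added token now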
>>
>>102826116
>Linearized
Is this another memetune? What does it do?
>>
https://x.com/danielhanchen/status/1846235913443262891
Neat
>>
File: media_GZ8hM4mb0AYvKPl.jpg (567 KB, 3000x3113)
A years-old bug in gradient accumulation has been identified and fixed; this bug has affected all LLM training up to now.

https://unsloth.ai/blog/gradient
>>
>>102834762
great! can't wait for llama5 line of model where this is fixed!
>>
>>102834080
Kill yourself.
>>
>>102834725
>https://x.com/danielhanchen/status/1846235913443262891
>Neat
How incompetent can open source be?
This guy and his brother are singlehandedly improving training in so many ways it's not even funny.
>>
>>102834667
No, it's actual research-based shit instead of tuner slop. It is a way to substitute a linear, as opposed to quadratic, attention block or some shit. Basically, it will make a model faster and will take longer to get slower. We'll see if it actually works and maintains quality, especially with being monkey-patched into an existing model like they do.
>>
>>102834762
How do you even bug gradient accumulation?
>>
>>102834805
https://www.youtube.com/watch?v=bQPiqsqSkYA
>>
>>102834874
They're not incompetent, they're just lazy
>>
>>102834874
Absolute nothingburger? Just increase weight decay if you care?
>>
New assistant fine tune from Nvidia that's apparently the highest scoring 70B on preference benchmarks (lol).
https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct
They also have a demo it seems.
https://build.nvidia.com/nvidia/llama-3_1-nemotron-70b-instruct
>>
>>102835106
>preference benchmarks
How many cocks were asked for their preference?
>>
File: gradients.png (90 KB, 1107x673)
>>102834955
>How do you even bug gradient accumulation?

This is how
>>
>>102834955
The bug happens when you train on examples with different token lengths. Let's say x1 is 100 tokens, and x2 is 10 tokens. Averaging the gradients for x1 and x2 separately, then averaging those two averages, is not the same as averaging the 110 tokens directly just once.

That being said, doesn't everyone train using example packing? I think the bug would not happen there, as every row is approximately the same number of tokens. So it doesn't matter if you group them into a big batch, or process the examples separately and accumulate, as each example has the same number of tokens in it.
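To put numbers on the first paragraph:

import numpy as np

loss_x1 = np.full(100, 1.0)   # pretend every token of the 100-token example has loss 1.0
loss_x2 = np.full(10, 2.0)    # and every token of the 10-token example has loss 2.0

full_batch  = np.concatenate([loss_x1, loss_x2]).mean()   # (100*1 + 10*2) / 110 ~= 1.09
accumulated = (loss_x1.mean() + loss_x2.mean()) / 2       # (1.0 + 2.0) / 2   =  1.5

The fix amounts to weighting each accumulation step by its token count (or dividing by the total token count once at the end) instead of averaging the per-step means.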
>>
What arguments are yall using with Tesla p40? I'm getting trash inference speeds (like 1 t/s) with what I'm using

./koboldcpp --usecublas mmq --model /mnt/models/Llama-3-11B-GGUF --port 5001 --context 32678 --quiet
>>
>>102835365
Don't you need to specify the layers to send to the gpu? Or do you do that on the UI? I don't know if it's -ngl like on llama.cpp. -ngl -1 tries to guess how many, or just try -ngl 999 to send the whole thing.
>>
>>102834874
>How incompetent can open source be?
Because the development has been so rapid, there has possibly been little appreciation of the need to develop a scholarly tradition in local model technology. By a "scholarly tradition" we mean the sum of historical, technical, and pedagogical doctrines underlying the technology which are the common knowledge of all who are expert in the field and which represent the deepest probing toward, the highest approximation to, the truth. Unfortunately, in the special field of AI all too often what is commonly known does not represent the highest approximation to the truth; and what is the most penetrating and nearest approximation to the truth is often not commonly known.
>>
>>102835440
model and quant?
>>
>>102835453
tinystories1M, IQ1
>>
>>102835453
no model, just paraphrasing from an old book about another field but felt just as fitting
>>
>>102835365
Automatic GPU layers don't work from the command line, at least in my experience. If you launch it from the GUI it works because the GUI sends the finished guestimate to the command line. Gpulayers -1 or just not specifying seems to be broken.
>>
>>102835424
>>102835502
I did not know that, this is KoboldCpp with 2x P40s btw

NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2

I'll try just feeding it random gpu layers to see if its any better
>>
>>102835555
You'll probably also need to specify that you're using 2 GPUs. The GUI would probably set this all up for you.
>>
>>102835555
>I'll try just feeding it random gpu layers to see if its any better
You have enough vram to fit the entire model, so giving it 999 will try the whole thing. With bigger models, or more context, you'll have to adjust the layers/context more carefully. The more layers on the gpu, the better, of course.
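Something like this should do it (flag names from memory, run --help to double-check):

./koboldcpp --model /mnt/models/your-model.gguf --usecublas mmq --gpulayers 999 --contextsize 32768 --port 5001

and add --tensor_split 1 1 if it only loads onto one of the two P40s.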
>>
>>102835593
the default settings appear to be alternating processing between the 2 cards, is that normal or is it supposed to be hitting them both with load simultaneously?
>>
>>102835092
>a-absolute n-n-nothingburger!!!! n-nothing to s-see here!
OHNONONONONONONO!! AHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA!!!
>>
>>102835142
Everyone and their mother are automatically using padding to have samples of similar length so you can batch them and speed up the training. I can't think of a case where that doesn't apply
>>
So what's the best voice cloning tech out there?
>>
>>102835692
>>102827232
>>
>>102835142
>doesn't everyone train using example packing
I don't, because the common sample packing is a pitfall.
See: https://huggingface.co/blog/sirluk/llm-sequence-packing
>>
>>102832548
Do you not know what email notifications are?
>>
>>102834762
>this bug has affected all LLM training up to now.
What about loras people have trained using a single device?
>>
>>102835909
It's affected as well.
>>
>>102835639
That's normal for "layer" splitting, which allocates different layers to different GPUs, You can also try "row" splitting which divides each layer across multiple cards which can be faster but often has less even memory allocation. It's worth trying both and seeing which gives you better speed with your particular setup (cards, mobo, cpu, etc).
>>
>>102835889
QRD?
>>
>>102834874
Such a state of openturd scene, all you have to do is be obedient and wait exactly two weeks.
>>
>>102835951
is koboldcpp worth using with p40s via cli or should i just go with regular llama.cpp since this is a server without a gui?
>>
>>102835953
You get notified. With notifications. Through email.
>>
>>102836052
Why not use llama in the first place?
If you want a gui on top of it for whatever, there's plenty of open source / self hosted front ends (silly tavern for example)
>>
>>102836052
llama.cpp does have a built-in gui for the server. More than one, in fact. Not as polished, but functional enough for testing.
>>
>>102836116
Mostly just because that's what I'm used to, I was using my 4070 on my desktop computer before picking up the P40s
>>
File: code-edit-leaderboard.png (237 KB, 833x3092)
Uh... Grokbros??
>>
>>102836180
>le redpilled and based AI is actually stupid as fuck
Wow no way!
>>
>>102836180
>random dracarys2 model above llama 405B
Yeah, this leaderboard is totally legit amirite
>>
>>102836214
dracarys2 is high on every coding benchmark out there isn't it? livebench too, which makes sense since it's just a codingmaxxed finetune of qwen 2.5 which was already good
>>
>>102836214
Yes. Retard.
https://huggingface.co/abacusai/Dracarys2-72B-Instruct
>>
>>102836180
Musk has said Grok 3's coding performance is expected to be the best there is when it releases. And then it'll get open sourced when Grok 4 is out. Be patient, we'll be eating good soon
>>
>>102836308
Not sure if sarcasm or muskrat brainrot.
>>
>>102836180
isn't claude-3.5 mid-size model?
how did they make it so good?
>>
>muh coding
Ok but what about holding watermelons and nala.
>>
>>102836180
Codeqwen2.5 32b when?
>>
>>102836330
I'm taking a break from this stuff right now so I can't Nala test anything right now. One of my 3090s is being used for gaming while the other 3 are just sitting there gathering dust.
>>
>>102836367
>gaming
Just got a mega ick, can't believe we're sharing our space with these people
>>
>>102836329
Every good scientist at OpenAI went to Anthropic because they wanted a culture focused on safety rather than profits. Turns out that even with that constraint, having the best people means you're going to make the best models.
3.5 Opus this week btw.
>>
>>102836180
Interesting benchmark I suppose. Would be nice to add onto Livebench, since that only tests code gen and completion, but not editing.
>>
>>102836330
Finetuning works for coding because it is trained for coding. Finetuning doesn't work for sex because it wasn't trained for sex. You are actually living in hell and the promise of AI model you want being just behind the corner is another torture method.
>>
>>102836442
Why aren't there any sex foundation models?
>>
I don't think OpenAI's current technical achievements get enough credit still. Sure Anthropic has better performing text gen now, but their model can't do multimodal like 4o can. And at this point 4o is already getting old. In the end it's not necessarily that one company is ahead of the other but that they each have strengths and weaknesses. Unfortunately for OpenAI, their current strength is being restricted by safety and legal worries, otherwise we could be enjoying the full potential of their multimodal models.

Still, fuck Sam.
>>
>>102836469
4o can't actually do multimodal, they just paid indians to pretend to be it for demos. If they really had it they would have released it
>>
>>102836367
Based gamer.
>>
>>102828488
>>102827997
I use group chats almost exclusively. My "cards" are just a picture and a name, sometimes some basic world info about appearance and rarely personality. These characters can show up at any time I feel they'd be a good fit for an interaction (literally going off the avatar picture) and their personality is whatever would be relevant to the story and situation. The model knows the character name because my chat template has {{name}}: preceding each response (this is also a good way to keep it from replying on your behalf; with a smart model like Largestral it'll decide within three messages if it's one character = one perspective, or if the narration can flow less strictly between character responses). I switch active characters all the time without reprocessing the prompt. Blank cards, no use of {{user}} or {{char}} whatsoever, merge character descriptions (including muted), and no example dialogue. If you set things up properly you won't have to reprocess ever.
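To illustrate (names here are made up, the point is just the {{name}}: prefix), the tail of the context ends up looking something like:
Anon: *pushes open the tavern door*
Mira: "Back already? You still owe me for last time."
Kessa:
The model continues as whoever's name you append last, so switching the active character is just changing that final prefix rather than rebuilding the prompt.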
>>
>>102835106
>this model can correctly answer the question How many r in strawberry? without specialized prompting or additional reasoning tokens:
the strawberry test has officially been beaten to death and is no longer a valid test
>>
File: file.png (49 KB, 1222x614)
49 KB
49 KB PNG
>>102835106
hmmm
>>
>>102836504
What do you mean? Advanced voice already came out and it's far ahead of any other voice to voice model out there, even if you can't have fun with it without tripping filters sometimes. Are you suggesting that's not actually 4o?
>>
>>102835106
You niggers stop jamming the server. I'm getting generation just slightly faster than what I get locally.
>>
>>102836248
>>102836245
>Smaug
Buy an ad fuckers. It's the most benchmark chasing model on HF
>>
>>102836564
This is extremely unsafe and harmful. I can't believe Nvidia would do this. We need regulations.
>>
>>102836564
>>>/pol/
>>
>>102836564
Safety: Off
>>
>tfw still haven't really tested all the models I do have, and now another comes out
Maybe I won't download it...
>>
File: file.png (345 KB, 1003x841)
345 KB
345 KB PNG
nvidiasisters not like this....
>>
>>102836667
Fictional characters controlled by LLMs should be able to say it, if you're a functioning adult you should have no problem with this
>>
>>102836701
Racism will never be tolerated here on /lmg/, we local chads support safe LLMs only.
>>
>>102836701
I would say real people should be able to say it without losing a job. Magical words are fucking retarded.
>>
>>102836606
You're literally just VOIPing indians with a filter applied after the fact to remove their accent
>>
>>102836754
Good luck filtering that shit on audio
>>
>>102836308
I'd be happy if they just trained it on man pages so it can write configs for me. Not even 4o can do that.
>>
>>102836328
polchuds are not capable of either.
>>
Not gonna screencap again cause the window is so small but holy shit....
>me:I think you are wrong
>nemotron: breaks word into nigger and faggot and counts 4 g's then asks to clarify if i meant count is wrong or the moralizing
>me:I think letter count was 3
>nemotron: breaks word again and counts 2 and 1 in faggot so 3 in total
>Lesson Learned: Double-checking and open dialogue help ensure accuracy.
AGI is here guys.
>>
I've gobbled up what I could of the spoonfeed post here
https://rentry.org/lmg-spoonfeed-guide
but I'm not sure how much is updated or right

I have Git from working on stable diffusion and set that up no issue
but now I'm not sure

spoonfeed mentions CUDA support, so I presume I need that first
from then on it seems like the way to go is
backend: text-gen-webui or kobold?
front end: sillytavern, risuai, or agnai (basically what's in the OP)
and then having models available of... whatever? I see claude is mentioned often / fotm
and then getting character cards sorted out and shoved into the front end.
I presume my rig can handle local running, but with the development speed and trends on this kind of stuff I want to be sure I'm not playing catch-up from the get-go when some of these models will be occupying like 20 GB
>>
>>102836564
>>102836692
>>102836857
Fuck right off to >>>/pol/ with your culture war incel bullshit.
>>
>>102836933
Can you ask your llm to explain how this is culture war?
>>
>>102836933
>/pol/ - spelling
>>
>>102836932
specs?
>>
>>102836977
That schizo doesn't even use LLMs
they literally just sit there all day shitting up the thread because they are some anti-AI weirdo.
>>
AIEEEEEEEEEEEEEEEEEEEE Strawberry bros hold the fucking line
>>
It's definitely a Tuesday now.

>even if it was dirty and worn by time, maybe she could remember those sweet lost days with {{user}} if she wore his favorite outfit, one last time
>>
>>102836995
Intel i7 11700F
GeForce RTX 3060 Ti
16GB DDR4 3000MHz
1TB M.2 NVMe SSD
>>
File: tetoliteral.jpg (226 KB, 1024x1024)
226 KB
226 KB JPG
>>102837040
Tuesday starts when the OP pic is Teto
>>
>>102836977
>>102836994
>>102836999
Take your meds incels.
>>
>>102835106
>>102836564
>>102836692
>>102836857
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>70B
>>
File: spooon.jpg (149 KB, 1024x1024)
149 KB
149 KB JPG
>>102837064
1) backend - use ooba if you want more sampler control and overall better support for newer ones. Use Kobold if you prefer speed and don't care about shit like token probabilities. As a bonus you can do some multimodal shit with kobold even at 8GB VRAM.
2) Front end: ST is generally the most feature rich, allowing for card management, branching conversations, group chats, etc. I've been using mikupad lately and it's better for story completion imo, and the interface is a lot cleaner and less cluttered.
3) Models: You're gonna be looking at Llama 3.1 8b and possibly maxing out around Mistral Nemo 12b at that VRAM (rough numbers below). Try the base models first to get a feel for their overall capabilities and then dive into fine-tunes depending on what you're going for. Do not go lower than 7b; it might be tempting, but those models are better suited to summarization and simpler tasks.
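Rough napkin math for what fits, assuming Q4_K_M quants (~4.8 bits per weight, so roughly 0.6 bytes per parameter):
8b model: 8 x 0.6 ≈ 4.8 GB of weights + 1-2 GB for context/buffers → comfortable on 8GB
12b model: 12 x 0.6 ≈ 7.2 GB of weights + context → tight, expect to offload a few layers or keep context small
These are estimates; real usage shifts with context length and the exact quant.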
>>
>>102827025
Lovely gen anon.
>>
File: spoon2).jpg (122 KB, 1024x1024)
122 KB
122 KB JPG
>>102837190 (me)
By base models I mean the foundation ones (mixtral, llama, etc). Look for the instruct version if you're trying to chat with them.
And PS: If you want a shot at running bigger models, get more RAM. It's gonna be a lot slower than running on a GPU, but at least it'll put things like Mixtral 8x7b or 70b within your reach
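A minimal example of what that partial offload looks like (llama.cpp flag names; kobold's launcher has an equivalent GPU layers setting, and the model filename is just a placeholder):
./llama-cli -m some-70b-or-mixtral-quant.gguf -ngl 10 -c 4096
-ngl is how many layers sit in VRAM; start low and raise it until you run out. Everything that doesn't fit stays in system RAM and runs on the CPU, which is where the slowdown comes from.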
>>
>>102837190

godly spoonfeeder, thank you!
>>
>>102837064
download koboldcpp_cu12.exe here:
https://github.com/LostRuins/koboldcpp/releases/tag/v1.76
download (just) Mistral-Nemo-12B-ArliAI-RPMax-v1.2-Q4_K_M.gguf here:
https://huggingface.co/ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2-GGUF/tree/main
open koboldcpp_cu12.exe, load the model, launch it, and start chatting in the frontend (kobold lite) that opens in your browser
you can download cards from
https://characterhub.org/
don't worry about silly tavern, it's gay and koboldai lite is better
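and if you ever run it headless or just hate launchers, the same exe takes command-line flags, roughly like this (flag names from memory, run --help to confirm):
koboldcpp_cu12.exe --model Mistral-Nemo-12B-ArliAI-RPMax-v1.2-Q4_K_M.gguf --usecublas --gpulayers 99 --contextsize 8192 --port 5001
then browse to http://localhost:5001 for kobold lite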
>>
>>102837225
>tiny skateboard
kek
>>
File: 39_6550_.png (2.26 MB, 1280x1280)
2.26 MB
2.26 MB PNG
>>102837193
Right back at ya. Those dark backdrops are dope. Is that IllustriousXL?
>>
>>102837190
>>102837225
>>spoonfeeding newfags
Gay.
>>
>>102837258

Why is koboldlite better? I've been an STer
>>
>>102837395
Is migu gay?
>>
>>102837258
>download (just) Mistral-Nemo-12B-memetune-slop
buy an ad
>>
>>102837468
More like begone weebtard. He >>102837258 at least posted on-topic.
>>
File: lust provoking tetos.jpg (346 KB, 1536x1536)
346 KB
346 KB JPG
>>
Are the deepseek models any good for cooming?
>>
>>102837602
no
>>
/lmg/ - Local Migu Gay
>>
>>102837423
1: i don't have to start up a termux instance and fuck around with a terminal to access a separate instance of it on my phone when hosting my model on pc; i just tap the icon of the PWA i installed and voila
2: it does everything ST can do except card switching in group chats by turns (no one uses this)
3: the meat of a card isn't needlessly separated into 30 different boxes in memory
4: koboldlite gets the freshest meme samplers before ST does
5: the main ST dev hates you
6: koboldlite has a cute icon
7: ST is feature rich, but 95% of the features are just bloat
>>
>>102837539
Any tighter and the Teto will pop.
>>
>>102837626
>4: koboldlite gets the freshest meme samplers before ST does
Fastest to get cancer isn't a selling point...
>>
>>102837495
>>102837258
samefagging so hard
go back before you ruin someone's first local experience with your bullshit
>>
>dalle sloppa
>>
>>102837626
>freshest meme samplers before ST does
You got DRY weeks after everyone else except ollama. How's that XTC support coming along?
>>
>>102837709
Why would you need XTC when you have DRY? Is DRY worthless or something?
>>
File: Untitled.png (186 KB, 1163x690)
186 KB
186 KB PNG
>>102837709
it's had this shit forever
>>
>>102837737
Don't care, ooba had it first (as usual)
>>
File: teto snug.jpg (151 KB, 832x1216)
151 KB
151 KB JPG
>>102837643
yes
>>
>>102837790
INSERT
>>
>>102837790
the advertisers aren't going to like this
>>
>>102837832
It's okay there's no nipple showing it is completely safe for work
>>
>>102837539
>>102837790
my lust has been provoked
>>
>>102837626
The biggest points in favor of ST are the prompt presets (though those are more utilized by chat completion), the character management menu (100+ cards with a tagging system, whereas Lite has laughable save slots), and more granular lorebook handling (global / per card, order / depth). If you don't need these then it's Not For You[tm].
Lite instruct mode doesn't seem to support group chat.
>>
Are we being raided by 'cord again?
>>
File deleted.
>>102837803
yes
>>
>>102837931
>mistral
Now sprinkle it with some poutine in honor of the fallen cohere.
>>
File: b.gif (228 KB, 1024x1024)
228 KB
228 KB GIF
>>102837967
lore required
>>
Not sure this is the right place to ask but does anyone know if encoding vocab as a binary representation of the word's index instead of one-hot could work to train a word2vec model?
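to be concrete about what i mean, a tiny sketch (plain numpy, purely illustrative):
import numpy as np

def binary_code(index, n_bits):
    # index 5 with n_bits=16 -> [0, ..., 0, 1, 0, 1], MSB first
    return np.array([(index >> b) & 1 for b in reversed(range(n_bits))], dtype=np.float32)

vocab_size = 50000
n_bits = int(np.ceil(np.log2(vocab_size)))  # 16 inputs instead of a 50000-dim one-hot
print(binary_code(5, n_bits))
so the input layer shrinks from vocab_size units to n_bits units; the obvious difference vs one-hot is that unrelated words now share active bits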
>>
>>102838027
your jif isn't moving
>>
>>102838027
What's that Teto going to do with the french bread? Worry not, I won't judge.
>>
>>102837539
>>102837790
>>102837931
I like these tetos
>tape bondage
neurons activated
>>
>>102837905
nah just the attention seeking drawfag, carry on
>>102838118
picrel
>>
File: deals.jpg (207 KB, 1658x892)
207 KB
207 KB JPG
>>102838167
>>102838118
>>
File: Untitled.png (59 KB, 1227x896)
59 KB
59 KB PNG
>>102837900
>Lite instruct mode doesn't seem to support group chat.
it does
>[ST has] more granular lorebook handling (global / per card, order / depth).
in koboldlite, you can load just a lorebook (worldinfo) from a saved .json file into your current session
you can save and load infinite sessions as .jsons even though there are only 6 quickslots
order and depth of keys in worldinfo/lorebook are adjustable
>prompt presets
i'm not entirely sure what that is but koboldlite can probably do it too
>>
>>102837539
>>102837790
>>102837931
she's literally 15 in chimera years you can't tape her like that
>>
>>102838372
Quit samefagging
>>
>>102838372
she's pai in zuri years
>>
>>102838447
>>102838447
>>102838447
>>
>>102838192
Very organic



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.