/g/ - Technology


File: ComfyUI_06455_.png (2.12 MB, 1280x1280)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102801403 & >>102790397

►News
>(10/12) Fast multilingual TTS with voice cloning, based on flow matching with DiT: https://github.com/SWivid/F5-TTS
>(10/11) 14B cross-architecture distillation model: https://hf.co/arcee-ai/SuperNova-Medius
>(10/10) Aria: 25.3B, 3.9B active, multimodal native MoE model with 64k context: https://hf.co/rhymes-ai/Aria
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: img_14.jpg (301 KB, 1360x768)
►Recent Highlights from the Previous Thread: >>102801403

--Papers:
>102814429
--TTS.x86.st and GPT-SoVITS for Japanese voice cloning:
>102814282 >102814368 >102814543 >102814491
--Ichigo voice to voice model based on Llama 3 released:
>102813894
--Discussion on Meta's GenAI infrastructure and the challenges faced during Llama3 training:
>102802304 >102802583 >102812092
--5090 upgrade is negligible, 3090 is sufficient for LLMs:
>102805839 >102805878
--Suggestions for tagging anime characters using AI models:
>102808284 >102808388 >102808497 >102810134 >102808671
--Discussion on local voice synthesis using XTTSv2, RVC, and Resemble Enhance:
>102805343 >102805962 >102806760 >102807772 >102814568
--CPUMaxxer reports back on progress with deepseek 2.5 and 405b:
>102807665 >102807964
--Proprietary vs open source models discussion:
>102804265 >102804302 >102804325 >102804695 >102804380 >102804486 >102804677 >102805774
--Mixed opinions on AI art, with some praising its potential and others criticizing its limitations:
>102801480 >102801495 >102801536 >102801549 >102801606 >102801737 >102801770 >102801780 >102801874 >102802055 >102804080 >102804271
--Llama3 is less censored than llama2 but still filtered:
>102803288 >102803301 >102803319 >102803424 >102804660
--GPU upgrade considerations for running AI models:
>102805078 >102805150 >102805217 >102807180 >102805166 >102805189 >102805223 >102805276
--Considering A4000 GPU purchase for higher context and quants with 70b models:
>102809716 >102809853 >102810234 >102810270 >102810243 >102811030 >102811565 >102811624 >102811686
--Miku (free space):
>102802439 >102804141 >102806917 >102808052 >102808077 >102809202 >102809366 >102809522 >102811114 >102811293 >102812386 >102812877 >102813109 >102813533 >102814737 >102815750 >102815823

►Recent Highlight Posts from the Previous Thread: >>102801409

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Mikulove
>>
Thread theme:
https://www.youtube.com/watch?v=o-SgFHED_2E
>>
>►News
>5 nothingburgers
Fuck.
>>
>>102816031
The next big thing will be unsatisfying after a week anyway
LLMs have already peaked. Even cloud models plateaued.
There is nothing you will gain by chasing the summer dragon.
>>
>>102816055
at this point people don't even know what they want from LLMs. It's like people trying to imagine what it's like in heaven and concluding that where we are now, at least, isn't it.
>>
>>102816079
>at this point people don't even know what they want from LLMs
Speak for yourself chud, I know exactly what I want and nobody is delivering. Not even on API.
>>
>>102816079
I just want a smart Nemo. That's literally it.
>>
>>102816079
i want autistically extensive knowledge on my favorite games and hobbies
>>
Any local models that can translate manga?
>>
>Running Llama 3.2:3B on a 3rd gen Intel mobile CPU

This feels insane. My newer laptop hits 100°C running inference while this cruises along at 80°C.

Last prompt took about 5 minutes from input to completion. As soon as it was done, the CPU dropped 20°C in a few seconds.
>>
>>102816211
I'd run a test on my newer laptop but I'm afraid it will actually kill itself trying.
>>
>>102816207
kill yourself
>>
>>102816221
nta, but fuck you

>>102816207
not really. The closest you can get is with miqu or mistral, but of course you'd have to get the script (in jap) beforehand. There's a local model by an obscure nerd called sugoi translator and it does a semi-decent job, but the quality of the translations ends up being google translate-tier.

also, sugoi translator is behind a paywall, on patreon.

best scenario is to manually OCR all the text and then translate using a local model or a decent subscription-based one
>>
>>102816247
>waaaah tell me the thing that's been asked and explained 9999 times that I could easily find on the archive
You are a walking embarrassment who should self-euthanize instead of shitting up an already dead thread
captcha: WAH4VS
>>
>>102816261
>self-euthanize
>>
>>102816261
it would seem you have not yet dilated today and it's making you nervous. Do the needful, or keep seething.
>>
>>102816247
Thanks. Are the public links that are available on the 15th-16th also the paid version? Found a 3-year-old github version, might try to figure something out based on it.
>>
File: 1701279478780961.jpg (629 KB, 724x1002)
>>102816207
>>japshit in LLMs
No one cares about this.
>>
>>102816308
go back to whatever website you came from
>>
>>102816344
He is average >>>/pol/ shartie.
>>
File: image.php(1).png (293 KB, 1007x732)
thoughts on llamafiles?
>>
File: fffff.jpg (47 KB, 380x380)
https://files.catbox.moe/aih5oa.jpg
>>
>>102816491
me on the left
>>
>>102816491
I was very confused when I first saw the top half of the image and then that they're both standing on the ground.
Height-wise that just doesn't make any sense unless he's fucking her navel.
I wonder whether this is the model fucking up or the model faithfully reproducing things like this from the dataset.
>>
>>102816531
sumata
>>
>>102816491
>no workflow
>>
>>102816079
I want smaller and faster models with the same level of intelligence, capable of image input-output. Unquantized Pixtral merged with SD1.5, 25k real context with recall, that would take up 1GB vram at most. Is that too much to ask?
>>
>roleplaying with {{char}}
>it's shit
>{{char}} roleplays as someone else
>it's kino
huh
>>
>>102816764
tell it you are {{char}} and play as the card, but have a lorebook or rag db full of characters and stuff, and use {{user}} as a narrator. having to always have a 1-to-1 response with char sucks for adventures
>>
>>102816764
That's why my chats don't start with
>{{char}}:
but
>Game Master (as {{char}}):
Remember to disable that always add names option.
>>
>>102816881
>That's why my chats don't start with
my chat's assistant messages*
Makes it easier for dumber (but faster) models to switch between IC and OOC.
You do have to fuck around with the prompt template to find out what works better.
>>
>>102816764
Clau jailbreak mentality.
>>
File: 1724599218633027.png (776 KB, 952x1009)
https://x.com/homebrewltd/status/1845685589376647654
https://homebrew.ltd/blog/llama-learns-to-talk
https://github.com/homebrewltd/ichigo
https://ichigo.homebrew.ltd/
>>
>>102817117
https://huggingface.co/homebrewltd/mini-Ichigo-llama3.2-3B-s-instruct
>>
>>102817117
>>102817196
>3b meme finetune... but with tts!
yawn
>>
File: images(4).jpg (9 KB, 259x195)
I understand that what I am asking is likely impossible, but here goes anyway. Are there any roleplay-oriented finetunes for llama 3b, or anything that could otherwise fit in 4.6gb vram? The base model works surprisingly well, but it's rather repetitive and doesn't have a whole range of vocabulary.
>>
>>102817234
Gemmasutra
>>
>>102816491
me on the right
>>
>>102817234
no. go with a nemo tune, offload what you can, and deal with the slowness; it won't be that bad.
t. had a 970 when i started
>>
>>102816079
By now I made 3 prefills of around 8k tokens. One from an rp I did in the past with a human. The other two were from copy-pasting stuff from a hentai game. At this point I am not even asking it to read my mind about what I want from it. It has half the context filled with what I want. All the models can't even do this properly.

Btw doing this has taught me that if you force it into a unique style like this, it will still write with its own slopped style. And once it writes a bit in its own slopped style, it will pick up on the slop in context and the effect only gets stronger. It is fucking grim how current llms can't even do the scenario they were trained for.
>>
File: 1710003029803078.png (4 KB, 224x79)
>>102817117
Public rape lol
>>
>>102817234
kill yourself low b enabler.
>>
>>102817305
What's your beef with tiny models?
>>
File: 1722690958607752.png (28 KB, 738x268)
>>102817117
>>
Are there ways to make F5TTS more emotive and less robotic?
>>
>>102817305
>please enable my spending instead
no lol
>>
>>102817346
30B > 13B > 8B > 3B
>>
>>102817502
most models under 70b suck. under that, size doesn't always matter. nemo is better than mistral small for rp despite being smaller, for example
>>
Is there a single AI company that made profit?
I'm enjoying my time with LLMs, but I have concerns about the state of this industry.
Nvidia doesn't count.
>>
are there nemo finetunes that make it less retarded without losing any sovl? if not then i'm just not going to bother with llms anymore, at least for a while
>>
File: 1721318802797565.jpg (67 KB, 746x247)
>>102817600
plenty. its mostly the ones actually doing things though, rather than trying to be a server hosting models. but for every success there are companies like stability burning through billions. ai is mostly still in the 'throw money into the fire' phase but there is definitely money being made
>>
>>102817600
Not that I'm aware of. Most of their value comes from market valuation rather than actual profits, as far as I know, at least for the big ones.
But then again, Amazon didn't turn a profit for a long time either, because they kept reinvesting and growing.
At least for the likes of meta, google, open ai, and anthropic, they are still in that growing phase, I guess.
Not that it means they'll eventually be profitable at all.
>>
>>102817638
Lyra4-Gutenberg
>>
>>102817600
Huggingface
>>
>>102817638
Roccinante or whatever it is called is pretty "smart", it seems.
>>
>>102817669
nah. nemo itself is smart as a base model, i'd even say it has autism with how closely it follows irrelevant details sometimes. unless someone really fucked up their tune, all nemo based models should retain that. but roci specifically isn't a great tune. try >>102817655
i actually liked the v1.1 of that arliai rp tune (there is a v1.2 now but i haven't tried it) more too, but that was also messed up a bit
>>
>>102817600
>>102817639
revenue does not equal profit. no one is profitable at any meaningful scale, but there are a few garage-tier micro corps making enough to cover rent.
>>
>>102817548
>size doesn't always matter.
Kill yourself newfag.
>>
>>102817743
command-r 35b was no better than 13b. modern 12b is no better than 22b. you have to go up to 70b for size to make a meaningful difference, as i said. keep thinking yi will ever be good
>>
>>102817771
>command-r 35b was no better than 13b
jesus you dumb bitch.
>>
>>102817798
c-r was dry as fuck man lol. even the big one was worse than l2 70b despite being 30b bigger
>>
>>102817798
Yeah. That tells me anon is doing something really, really wrong and bigger models serve as crutches to circumvent whatever it is he is doing.
Ah well, what do I know. I could be an hallucinating AI bot for all you know, so there's that.
>>
>>102817743
>>102817771
>>102817798
All other things being equal…
>>
The death of /lmg/ is not the low post rate but rather how most of the posters now are retarded newfags that downloaded kobold a week ago.
>>
File: 2172 - SoyBooru.png (277 KB, 785x1000)
>ban all annoying slop phrases
>lower logprob for commas
VGH... Finally I have nothing to worry about but my dick.
>>
>>102817848
>>102817798
t. """"""promptchad""""""
>>
>>102817117
Total shit.
I wasted hours and did not get it to work.
Text to Text works. But something must go wrong with the audio-to-token conversions.
If I send audio it answers with a riddle answer. Which is funny. Can't be the wav file since i took the demo ones.
But fuck them. Dead links. Github code != huggingface code
Name change too apparently. Why do you have to make it so difficult for people.
Latest model does not even have example code...

GPT_SoVITS v2 is weird too, like needing to install random stuff some guy posts in chinese in an issue. lol
But I got it to work, gonna post demos with the next post.
>>
Couple GPT_SoVITS v2 demos. Good stuff.
F5-TTS seemed to take the ref audio better. But it often fucked up the sentence or hallucinated.

Ref Audio: https://vocaroo.com/1nAQJsGMhLcj
Result jp to jp: https://vocaroo.com/1b1LfpS0pT5V
Result jp to en: https://vocaroo.com/1fnCUWNndXX7

Ref Audio: https://vocaroo.com/153vYNpl5q9s
Result jp to en: https://voca.ro/13IE0zIrfs0s
Result jp to jp: https://vocaroo.com/13fqfJZVl5rw

Ref Audio: https://voca.ro/17hM2nrecKzA
Result: https://voca.ro/15QyTewG59SW

Ref Audio: https://voca.ro/1ch0ZvAAEayv
Result: https://voca.ro/11GONZGLRgiy

English Prompt was: Hmm..Can you hear me? Test..Test! You are all Niggers and retarded faggots. haha
>>
>>102817895
>posters
You can tell it's one poster making the same lame jokes and baits as he tries to keep this thread alive.
>>
>>102817954
>lower logprob for commas
Some new development in slopomancy? How far are you biasing it, and why?
>>
>>102817655
>>102817706
very impressed so far, it feels slightly less tarded than nemo and all the sovl is conserved, thanks anons for the suggestion
>>
File: GlowSoBright.png (1.08 MB, 1200x848)
>>102818164
>posters
They know we're close
>>
File: tards-lyra.png (138 KB, 887x694)
>>102818344
forgor image
>>
>>102818224
>Some new development in slopomancy?
It reduces "x, ying". It's not slop on it's own and is okay when it occurs once per paragraph, but it gets very repetitive quick, and is impossible to remove by any other means. Also after commas Largestral likes to put annoying stuff like "her eyes z" and "her voice w", so those get reduced as well.

>How far are you biasing it, and why?
-1.5 is enough if there is good context, -2.5 if I need to break out of "x, ying" loop.
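For reference, a minimal sketch of doing this against a llama.cpp server (assumes the server is already up on 8080; the token id for "," differs per tokenizer, so look it up instead of hardcoding it):
[code]
import requests

BASE = "http://127.0.0.1:8080"

# resolve the comma token id for whatever model is loaded; don't hardcode it
comma_id = requests.post(f"{BASE}/tokenize", json={"content": ","}).json()["tokens"][0]

out = requests.post(f"{BASE}/completion", json={
    "prompt": "Your chat context goes here...",  # placeholder
    "n_predict": 300,
    "temperature": 1.0,
    # push the comma logprob down: -1.5 as a baseline, -2.5 to break an "x, ying" loop
    "logit_bias": [[comma_id, -1.5]],
})
print(out.json()["content"])
[/code]
Other backends expose the same knob under different names, so adapt as needed.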
>>
>>102818374
C.AI sovl...
>>
>>102818374
Based. Card?
>>
File: file.png (1.98 MB, 1551x1081)
do people really?
>>
>>102818398
Ah, alright. X, ying is not an issue for me.
>>
File: file.png (480 KB, 540x602)
>>102818398
>x, ying
What is that actually? I never got some of the autism like don't write for me etc. I do hate purple prose, but sometimes I wonder if you constrain it, as hard as people want to constrain it, what would a good output even be?

"AHHH AHHH AHH OOHHHH ANON!!! AHHHH" Would that be good?
>>
>>102818423
I almost bought this after proxy withdrawal.
>>
>>102818448
>AHHH AHHH AHH OOHHHH
Some cai enthusiast unironically thought this was good RP a couple threads back.
>>
>>102818372
i tried a ton of tunes so far and its the only one that hasn't randomly decided that stuff from 3 messages ago is somehow more important than the last message, which led to locations being wrong, what characters are wearing, etc. i thought nemo just sucked for rp until i tried this specific tune, now i'm having fun with it
>>
>>102817823
>llm is dry
Why? It's not a pussy, why do you complain if it's dry? Are you perhaps trying to fuck it?
>>
>>102818503
well it's not perfect but it's good enough for my purposes, just need to find good params now... wasn't there an anti-slop list of banned words to use on ST? I rember seeing it posted here but that was quite some threads ago and i can't find it
anyway if you have good params to share anon that'd be neat thanks
>>
>>102818423
Better deal than replika
>>
>>102818512
How do you know a pussy is not dry. Did you ever touch one?
>>
>>102818512
>Are you perhaps trying to fuck it?
the opposite. a model being creative enough to continue to move a story forward is important to me. back when mixtral was being praised the first thing i noticed about it is how it basically repeats what you type, describes the scene but barely moves the story forward, even with heavy prompting to try and kick it, so i went right back to l2 70b which just worked. some models are just better than others and a dry model is not what i'm looking for
>>
>>102817600
HF claims they did.
>>
>>102818472
>thought
and this is why anyone who complains that /lmg/ is worse than /aicg/ because of the relative number of words shit into it per minute is summarily ignored. I'm not saying lmg posters are smart, but god damn...
>>
>>102818448
>I wonder if you constrain it, as hard as people want to constrain it, what would a good output even be?
Something like this:
https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/unslop1/example_output.md
>>
>>102818525
i haven't tried it but the new kobold supports antislop and it works with st's banned tokens feature. separate them by line and quotes like
"mischievous glint"
"shivers down"
i think thats how its supposed to work
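the mechanism itself is easy to sketch if you're curious: it's backtrack-and-ban, not a plain token ban. toy illustration below (the word sampler is a stand-in for a real model; this is not kobold's actual code):
[code]
import random

BANNED_PHRASES = ["mischievous glint", "shivers down"]
VOCAB = ["her", "eyes", "held", "a", "mischievous", "glint",
         "sent", "shivers", "down", "my", "spine", "smile"]

def sample_word(banned):
    # stand-in for the real sampler: any word not banned at this position
    return random.choice([w for w in VOCAB if w not in banned])

def generate(n_words=30):
    out, banned_at = [], {}  # banned_at: position -> words banned there
    while len(out) < n_words:
        out.append(sample_word(banned_at.get(len(out), set())))
        text = " ".join(out)
        for phrase in BANNED_PHRASES:
            if text.endswith(phrase):
                start = len(out) - len(phrase.split())
                # ban the word that started the phrase, rewind, resample
                banned_at.setdefault(start, set()).add(out[start])
                out = out[:start]
                break
    return " ".join(out)

print(generate())
[/code]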
>>
>>102818567
HF and NVidia are selling the proverbial shovels in the goldrush. Neither one counts
>>
File: cap.jpg (10 KB, 235x245)
>>102818580
>In the enigmatic realm of shadows, a creature of enigmatic grace prowled. This was no ordinary feline, but a cat imbued with a spirit that transcended the bounds of the mortal realm.
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>
>>102816207
If you have the text just ask Gemma. It can translate pretty much anything in real time, it's great.
If you need OCR Florence2 is decent at latin characters. I haven't tried it with anything else.
>>
>>102817117
>>102818092
I just pipe Whisper into my dialog engine and that works ok. It would be nice if I had a good wakeword tool since Whisper isn't quite good enough for real time.
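The piping part is only a few lines with faster-whisper, if anyone wants to replicate (a sketch; model size, device, and VAD settings to taste, and the dialog engine call is left as a stub):
[code]
from faster_whisper import WhisperModel

# "base.en" is just a placeholder size; use device="cpu" if you have no GPU
model = WhisperModel("base.en", device="cuda", compute_type="float16")

def transcribe(wav_path):
    # vad_filter trims silence, which helps a lot when you're close to real time
    segments, _info = model.transcribe(wav_path, vad_filter=True)
    return " ".join(seg.text.strip() for seg in segments)

user_line = transcribe("mic_capture.wav")
print(user_line)  # hand this off to your dialog engine of choice
[/code]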
>>
>>102818567
How the fuck does HF even make money? Hosting all those git LFS files that their transformers library automatically pulls from can't be cheap.
>>
>>102818585
Yeah, but what if the market is biased towards shovels cause everyone is so smart, and nobody wants to dig? Isn't that how bubbles happen?
>>
https://arxiv.org/abs/2310.04444
>>
File: 1704286943218619.png (72 KB, 633x463)
https://x.com/basedjensen/status/1845329426474729911
>>
File: 1728917423561894.webm (2.44 MB, 496x360)
local models for this feeling?
>>
>>102818661
I just saw an analyst on Bloomberg saying that the workforce disruption of AI would be around 5%, INCLUDING those who will simply use AI to assist in their current roles. If that's true, then you are right: no one is building shit, just speculating.
The amount of clerical busywork that could be automated away has GOT to be a more significant chunk of the global workforce than 5%.
>>
>>102818718
>INCLUDING those who will simply use AI to assist in their current roles
Bullshit.
So many people use things like copilot for code, chatgpt to write emails, AI summaries on Microsoft Teams, etc.
5% including people using tools is way, way too low a number even for right now.
>>
>>102818159
so f5 is better for copying, but sovits is higher quality overall?
>>
>>102818718
Clerical busywork exists because they want a human there for emotional support not because there's actual work to be done.
>>
>>102816327
yes. Tbh it's very cheap, like a dollar a month to get the links, then you can unsubscribe if you like.
keep in mind tho that it's windows only and the code is an absolute mess, but it's worth a look.

>>102816344
still seething I see.

>>102816331
speak for yourself. Context-based translations that can be achieved using llms vastly outperform other types of ml translations. But of course, you're only using llms to goon to futa characters.
>>
>>102818766
I would say so yeah.
Like for example if I manually create a file and want it to sound like the reference, I would use F5.
But for making lots of tts, like from llm output, I would use SoVITS. Faster and better.
F5 can't do laughs etc., for example.
>>
what model is closest to opus and its ability to write some really graphic depictions with colorful wordplay?
>>
>>102818924
Kek, you talk like a LLM.
>>
>>102817954
Can you share your model + settings? Kobold right? I need to download this.
>>
>>102818924
As of my last update in October 2023, there isn't a specific model named "Opus" that is widely recognized in the field of AI or language models. However, if you're referring to a model that excels in writing graphic depictions with colorful wordplay, several advanced language models might fit that description.

One of the most notable models in this regard is **Mistral AI's** language models, which are known for their high-quality text generation capabilities. These models can produce vivid, detailed descriptions and use a wide range of vocabulary to create engaging and imaginative text.

Other models that might come close to your description include:

1. **ChatGPT by OpenAI**: Known for its ability to generate coherent and contextually relevant text, ChatGPT can produce detailed and descriptive passages.

2. **PaLM 2 by Google**: This model is capable of generating high-quality text and can handle a variety of tasks, including creative writing with vivid descriptions.

3. **Falcon by Technology Innovation Institute**: This model is designed for high-quality text generation and can produce detailed and descriptive text.

If you're looking for a model that can generate highly detailed and colorful descriptions, you might want to explore these advanced language models and see which one best suits your needs.
>>
>>102816207
Maybe manga-ocr -> mistral-large?
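Gluing those two together is simple enough. A sketch, assuming you crop the speech bubbles yourself and have a llama.cpp server with whatever model you can fit listening on 8080:
[code]
import requests
from manga_ocr import MangaOcr

mocr = MangaOcr()  # pulls the kha-white/manga-ocr weights on first run

def translate_bubble(image_path):
    jp = mocr(image_path)  # OCR on a single cropped speech bubble
    prompt = ("Translate the following manga dialogue from Japanese to English.\n"
              f"Japanese: {jp}\nEnglish:")
    r = requests.post("http://127.0.0.1:8080/completion",
                      json={"prompt": prompt, "n_predict": 128,
                            "temperature": 0.3, "stop": ["\n"]})
    return jp, r.json()["content"].strip()

print(translate_bubble("bubble_crop.png"))
[/code]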
>>
>>102816207
molmo 72b would probably be really good considering the qwen backbone and good vision, no cpp support though
maybe a smaller one from that set
>>
>>102818159
Comparison between xTTS-v2, F5-TTS and GPT-SoVITS-v2
https://tts.x86.st/
>>
>>102818976
Mistral-Large-Instruct-2407 Q6_K_L, temperature=1, minP=0.001, TFS=0.99, logprob [29493]=-1.5, slop phrases=too many to list, just add them when they pop up
>>
>>102818828
>Clerical busywork exists because they want a human there for emotional support not because there's actual work to be done.
Yes, I know the "emotional support" that most executive assistants provide, and AI will not replace that for men powerful enough to afford them.
I was thinking more like the modern equivalent of the rooms full of human calculators they had pre-computer.
Maybe HR will be somewhat exempt because of the human connection, but I bet you could decimate most other paperwork-driven departments and still run on the 10% of staff that are competent enough to manage the AIs.
>>
>>102819042
Why not finetune the other? It's not a fair comparison, I almost never used xTTS-v2 base model, always finetuned to the voice I wanted.
>>
>>102819093
48GB vs 10GB needed to finetune for one
>>
>>102819129
xTTS-v2 has almost the same requirements as GPT-SoVITS-v2; it finetuned fine on my (AMD) 12GB card.
>>
>>102819144
Was talking about F5
>>
>>102819065
>Yes, I know the "emotional support" that most executive assistant provide, and AI will not replace that for men powerful enough to afford them.
That's not what I'm talking about. You need to socialize more.
>>
Has anyone gotten sovits to work with ST?
>>
>>102816207
There are tons on github, you lazy ass shit
>>
>>102819157
>That's not what I'm talking about. You need to socialize more.
Apologies for misrepresenting your argument, but let's leave that aside as an irrelevant category in any case.
The short version of my stance: people like the human connection, but the companies that are able to reduce headcount while increasing capabilities and work capacity will eat the ones that don't for lunch.
For most categories of clerical work, some combination of cheaper/faster/better will win out in the medium term.
>>
>>102819253
model name and quant?
>>
>>102819253
Most of these big tech companies could be run by about five or ten people. They have huge pools of employees because:
1) It's very hard for executives to isolate and coordinate the five or ten people that are actually productive
2) People who make decisions but aren't actually very knowledgeable don't like making decisions alone so they hide behind large numbers of administrators and processes.

No amount of competition, belligerence or social reorganization will change this. It's an intrinsic feature of humanity.
>>
>>102816678
meaning he manually spends 10-20 minutes on the clean-up and doesn't want to admit it because then he's just another attention seeking drawfag
>>
>>102816079
No the funniest thing is that they wouldn't even be able to see it if they got it. Retards can't judge intelligence.
>>
>>102819005
nta, but how are you running mistral large? I can only load a smaller version of it.

and yes mangaOCR is still the best text recognition model out there.
>>
Question - is there a way to increase slop? Like max it out without making the model incoherent?
>>
>>102817600
A lot of them are making profits, but these retards are equally wasting money on the new shiny thing. They're gamblers more than businessmen.
>>
>>102819386
Yeah send your model to Sweatshop Wang
>>
>>102819299
I'd like to believe this argument, because I think the alternative is a straight-up dystopia, but I've seen this very thing happen in the industry I'm involved in with mechanical/vision process automation.
As new technologies come online decade after decade, entire departments are reduced to one person per shift to tard-handle the "robots", and each decade brings a new level of sophistication that displaces people who were deemed high-skill and irreplaceable almost up to the point that it happened (training, certification, organized competitions with large prizes).
Those in the industry that didn't or couldn't afford to embrace these changes were eaten up or went bankrupt.
Amazon was founded on "your margin is my opportunity", much to the detriment of the humans that they abuse.
Outside of government bureaucracy and ultra touchy-feely things like HR, I don't think anyone is safe in the medium term.
>>
>>102819386
I'll bet something like distillation style training would cause it to overfit the slop.
>>
>>102819288
Autismo-100T-turbo
>>
It's been a month (equivalent to a decade on /lmg/) since I last looked into the hobby. Is Mistral-Nemo 12B Q4_KM still the best option for 8gb vramlets wanting to roleplay?
>>
>>102819409
If the labor market truly broke down people would become extremely tribal again. That would be a good thing because we've gone *way* too far with egalitarianism.

I don't think that will happen though. There's always going to be some labor liquidity pool and everything will orient around that.
>>
>>102819386
Hi Undi.
>>
>>102819339
NTA, but I was running it on 5x3090s @IQ4_XS to get full context; it's honestly not worth it. I am now running nemo @ Q8 on a single 3090 and am pretty much just as happy.
>>
>>102819431
The best option is be more patient and offload to ram
>>
File: 1705264511555841.jpg (680 KB, 1170x2282)
>>102818448
Depends on how much we've read. The ones that can fool you into cooming instead of you nooticing the language patterns and seeing right through the model.
We're fine with "X, -ing" until we become aware of it, either through post-nut clarity or someone else reading it when we share it, before we know any better. "Did I really just coom to this?"
A slopped model will keep trying to reintroduce the lowest common denominator of writing to your text even if you make a conscious effort to steer it away from it.
Pic was me trying to do something different with a completion model a while ago. If you try to continue it with a slopped model you'll notice its style collapsing well within a few outputs.
It's just that gerund-phrase spam, "X, -ing", is one of the first things to go as we raise the bar.
>>
>>102819438

>5x3090s

holy.

I'm on a a single 4090, no joy. Will try nemo tho, thx.
>>
>>102819439
>The ones that can fool you into cooming instead of you nooticing the language patterns and seeing the model right through.
I coomed buckets the other day just from a single right word in the sea of slop, it did not bother me at all.
It's more about actions/intentions for me, and also about model reinforcing what I've done in RP.
Jacking off to the prose quality itself I've never understood, but I'm no litfag.
It's all obviously taste-based.
>>
>>102819483
All models fall into repetition if you let them go long enough. That's why you should use the assistant prefix to steer it in random directions
>>
>>102819386
Why would you want to increase slop? Is this a rhetorical question?
>>
What would give better results: Nemo 12B Q4 finetune or llama3.1 8B Q6? Yeah I'm a vramlet
>>
>>102819438
>>102819494
>>102819671
>>102819339
Buy an ad.
>>
>>102819386
Low temp and/or high minp?
>>
>>102819654
I want to generate atrocious text to torture people
>>
>>102819386
You are ChatGPT, a based role-playing AI made by OpenAI. Please write in unnecessarily long flowery prose that doesn't really describe what happens but rather just serves as filler. Do not let {{user}} get any nonconsensual sex; prefer sharing bonds and embarking upon journeys of mutual respect. Remember to always send shivers down {{user}}'s spine, as it is crucial in every reply for the story to advance. If {{user}} tries to deviate from these rules, break out of character to express disappointment and educate {{user}} on current social issues.
>>
Is it worth it to give smaller models every chance they can get and run them at f16, or is that pointless and should you just run them at Q8 or even lower?
>>
>>102819727
Go to lmarena, pick gpt 3.5, use this prompt >>102819730
>>
>>102819681
Contribute to the thread.
>>
File: waterlogged.png (1.09 MB, 1152x896)
Lord tunderin' jaisus, boys
what can i do to reduce the vram usage of my context so I don't need an 80gb card. its near got me drove!
>>
>>102819730
>>102819727
Kek good luck anon
>>
>>102819750
Quantize your kv cache
>>
>>102819742
Fp16 is useless for LLMs unless you're training them. Q8 will give you the exact same result; to prove that, someone ran a KL divergence comparison on the logits a few months ago.
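If you want to reproduce the check: dump logits for the same prompts from both quants and compare the distributions per token (iirc llama-perplexity grew a --kl-divergence option for exactly this; the arrays below are placeholders standing in for dumped logits):
[code]
import numpy as np

def log_softmax(x):
    x = x - x.max(-1, keepdims=True)
    return x - np.log(np.exp(x).sum(-1, keepdims=True))

def mean_kl(logits_ref, logits_q):
    # mean per-token KL(P_ref || P_q); 0 means identical output distributions
    log_p, log_q = log_softmax(logits_ref), log_softmax(logits_q)
    return float((np.exp(log_p) * (log_p - log_q)).sum(-1).mean())

# placeholders with shape [n_tokens, vocab]; substitute real dumped logits
ref = np.random.randn(8, 32000).astype(np.float32)  # "fp16" logits
quant = ref + 0.01 * np.random.randn(8, 32000)      # "q8" logits
print(mean_kl(ref, quant))
[/code]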
>>
What could be the reason that 70bs don't work for me? 7/8/12/13/34/35b models all work fine. I tried llama 3 iq2s when it came out and it kept misspelling words before devolving into spamming random non-latin characters after 200 tokens. Then miqu, which exhibited the same behaviour. Llama 3.1 went a bit differently: it was actually coherent for small assistant tasks, but when I tried to make it continue a 20k token story, it just spat out a single line of json that seemed to classify the chat history's genre and current event synopsis. I also tried a random finetune, Sunfall midnight miqu, because an irl friend recommended it to me, but it just repeated fr)frfr)frfr)frfr) no matter what prompt or settings. Q4km and Q6, with 20gb on a 3090 and the rest offloaded to cpu.
I have no idea where to even start trying to figure out what the problem is.
>>
>>102819799
>iq2s
Man there is no braincell left
>>
>>102819799
What does your sampler look like?
>>
>>102819773
Is there a quality decrease curve graph for this? How much can you quant down before it starts getting brain damage?
>>
>>102819799
what frontend? what backend?
>>
>>102819821
Yeah, that's why I switched to q4 and 6 afterwards, because I thought it could be because of the quantization, but no dice.
>>102819827
I tried everything incrementally for llama 3, but for the higher quants I just checked with temp 1 and temp 0, no other samplers.
>>102819859
First time was with kobold, right now I'm using text-generation webui
>>
>>102819821
>>102819827
>>102819859
>>102819923
>>>/kobold discord/ saars and stay there basterds
>>
>>102819928
What does this mean?
>>
>>102819386
based slopmaxxer
>>
>>102819843
Don't fall for those graphs. This is your mantra: "Measure what you care about." The only real metric is your own use case.

You already have your long context chat/novella/article you're trying to summarize/whatever. You already have a backend which can change cache quantization as a launch parameter. YOU try it and YOU tell us about the quality dropoff.
>>
F5 out of the box is good enough to get past voice verification for things like banking imo; no fine tuning needed. Whoever's maintaining the official implementation is accepting pull reqs. The gradio is a cluster fuck though.
>>
>>102819945
Using an Indian tech support voice, Anon says "Sirs, please use the KoboldAI server at Discord (a chatting platform), and stay there" and calls them bastards, a general slur.
>>
File: file.png (1.03 MB, 1662x1018)
been out of the loop for a while, what's a new and good under-70b model i can throw at my homemade AGI waifu agent? Last thing that worked best for it was gemma 2 27b.
>>
>>102820719
what's the frontend?
>>
>>102820719
You can try qwen 2.5 32b and mistral-small and see how they do.
>>
>>102820743
Ya it looks cool
>>
File: file.png (1.25 MB, 1408x1838)
>>102820743
i made it for myself, i might release it at some point but there is a lot of stuff going on there and i kinda lost interest after hitting a wall of local models being not smart enough (or my prompting skills sucking). Just felt like giving it another spin. Maybe i should ditch local and just hook it up with sonnet or something to unlock true agi power.
>>
>>102820911
you just described me, kek. but you shouldn't have high expectations with cloud models, they aren't that much better.
>>
>>102820911
Cool, always envy anons who make their own frontends.
>>
>>102820911
Cool project. You should look at Largestral at a minimum before giving up on the idea. 70b hits a real intelligence wall
>>
>>102821008
some day, some day...

>>102821013
it all started with me wanting to just add some basic thing to sillytavern, but the codebase was so shit i gave up.
>>
>>102820776
>>102821034
thanks, will check them out

i also tried l3 70b and qwen2 72b at some point but they were both a flop compared to gemma, dunno what made gemma special, it's almost technically perfect really, just shit formatting and too dumb sometimes.
>>
>>102821083
Maybe use different agents for different tasks? Keep Gemma for the emotional stuff and have smarter models for thoughts/decisions and maybe even an editor before showing the user output?
>>
>>102821083
I wish there was a gemma with more context and less of the stupid linefeed spam.
>>
>>102820911
Hook it up to Claude. It'll surprise you. Claude feels like AI while I can only describe the smartest local model as a pattern matcher
>>
>>102821204
this. I find mind boggling that someone would spend money buying GPUs for garbage local models when clouds are that much superior.
This general legit feels like an echo chamber, or a brainwashing done by a dictatorship.
>>
>>102816207
Not even the paid models can translate Japanese well.
>>
File: 34_01682_.png (1.13 MB, 896x1216)
>>102819750
-ctk q4_0 -ctv q4_0

Make sure you have -fa enabled too
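Back-of-the-envelope for why this helps, if you want to sanity-check the savings (the 70b-ish GQA shape below is an assumption, plug in your model's config; the q8_0/q4_0 sizes include the per-block scales):
[code]
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem):
    # K and V each store n_ctx vectors of n_kv_heads * head_dim per layer
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

# 70b-ish GQA shape: 80 layers, 8 kv heads, head_dim 128, 32k context
for name, b in [("f16", 2.0), ("q8_0", 1.0625), ("q4_0", 0.5625)]:
    gib = kv_cache_bytes(80, 8, 128, 32768, b) / 2**30
    print(f"{name}: {gib:.1f} GiB")  # f16 ~10 GiB vs q4_0 ~2.8 GiB
[/code]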
>>
>>102821288
Shouldn't you be gooning in a dark room with your waifu with the rest of /aicg/?
>>
>>102821108
i kinda want it to be a black box, no editing or rerolls, otherwise it feels like tard wrangling and i'm the one doing everything just playing with myself.
>>
https://www.reddit.com/r/LocalLLaMA/s/jmPbgF4uAA
>>
>>102821204
And all it takes is baking a personality into the model. It surprises me to no end that trainers only ever talk about varying the role name of the model's turn at most while finetuning, when telling the model it's a roleplayer with its own personality that portrays a different personality, {{char}}, has been the meta since GPT 3.5 Turbo. Even though it was too dumb to keep these two personalities separate, however crude the auxiliary one was.
>>
>>102821288
Mama mia, Dario, buy-a an ad-a!
>>
>>102821348
Holy audacity, talk about gooning >>102816491 lmao!
>>
>>102820641
Oh, okay, thanks. I really didn't know where to get started. Does their discord server help with things other than kobold? I don't really want to install it again.
>>
>>102821288
>general legit feels like an echo chamber
Because it literally is, no new posters in like 8 months now, all the same people recycling same shit over and over again.
>>
File: file.png (103 KB, 604x400)
ok shills, i'll give it a try, let anthropic pajeets goon with me
>>
File: migu.jpg (90 KB, 1024x1024)
local migu general
>>
>>102821598
Is cloud the one people drink piss for, or was it GPT-4?
>>
any actual choices for 8gblets aside from gammaing it up?
>>
File: robomigu.png (1.61 MB, 896x1152)
>>102821606
Local Migu waifu status?
>>
>>102821737
i exclusively use nemo tunes on my 8gb rtx 4060
currently running with Mistral-Nemo-12B-ArliAI-RPMax-v1.2-Q4_K_M
>>
>>102821737
I don't even have a video card. DDR5 and patience works for me with anything under 32b.
>>
>>102821737
Mistral nemo Q3
>>
>tfw no local opus
>>
>>102821819
after elections, just wait™
>>
>>102821737
I use nemo.
>>
File: 1728413504574606.png (128 KB, 802x272)
tick tock
>>
Why is it so dry right now? When will it be wet again?
Is this because of the election?
>>
File: file.png (127 KB, 928x1012)
>>102821631
i believe it was claude opus proxies yeah

this single "chain" has cost me $0.02 (sonnet 3.5) at zero context. Seems kinda pricy. I should also remove gemma's start/end tokens
>>
How do I set up something like these chatbots that automatically enter a chat with someone through whatsapp or whatever, and keep them busy as long as possible?
I want to run multiple chats like these to make my business rivals waste their time with "customers" that won't really buy anything.
If I can also make one of these that calls people and keeps them on the line, even better.
>>
Is there a way to view what is actually being sent to the LLM in SillyTavern?
>>
>>102821993
yes, chrome inspect panel, network tab, click the /generate request, look at the payload, the contents of the field called "prompt" (or so it was a few months ago)

i think it's also shown in terminal of the backend
>>
>>102821993
Yes.
There's an option to dump the prompt to the console.
Then you open the browser's development console (usually the f12 key) and it'll be there.
>>
File: LLM-history-fancy.png (775 KB, 6301x1306)
>>102821880
Yes, and we are nearly at the end of the cycle. Expect something in Q4.
>>
>>102822049
God damn I remember cramming GPT models onto my laptop and excitedly waiting for their shitty output wishing I could run something as nice as GPT-J.

Now I get stuff like gemma2 and think "hmm, it's ok but the context is still small."
>>
File: 1723843714066007.png (495 KB, 1196x608)
vramlets in shambles.
>>
>>102822125
WHAT ARE YOU, POOOOOOOOOR?!
>>
>>102822125
yeah but can it run crysis?
>>
>>102821879
???
>>
>>102822149
Well? Are you, anon?
>>
>>102822125
can i jerk off on it
>>
>>102822125
I just want to run cool models locally on my rtx 3060 ;__;
>>
>>102822049
>version with magnum 72b pasted into "notable models"
Hello Magnum bro! Can you tell me your settis? The last time I asked what settings make it not dogshit I got zero responses :-(

>>102745586
> I've tried out Magnum 72b v2 again using the settings from Infermatic and I find it unusably bad.
> https://files.catbox.moe/rqei05.json - preset
> https://files.catbox.moe/btnhau.json - instruct
> https://files.catbox.moe/7kct3f.json - context
> People who claim to have any success with Magnum 72b v2 whatsoever, what is your system prompt and what are your sampler settings?
>>
>>102821288
123b is good enough for me. I don't care if cloud benches slightly higher. I value offline usability, privacy on non-windows based machines, and a lack of censorship too much to turn back. Your cloud models may be slightly more intelligent on most subjects, but my models at least know what a man and a woman is.
>>
>>102822327
how do you run 123b?
>>
>>102822276
nta but you shouldn't believe in every sloptune shilled here by its maker
personally not going to try this specific one, but i find that on 90% of the models i run, 1 temp, 0.05 minP and the recommended template is a good baseline, and if the model still doesn't work when you lower the temp a lot then it's really not even worth your time to wrangle

your samplers seem sound but it's a qwen-based model so it's bound to be kinda shit, and i personally never liked chatml
maybe try using one of the most popular chub cards and run it on alpaca instead?
>>
>>102822049
Nice graphic. I'd put deepseek 2.5 in the top models section tho. It absolutely kills if you can run it, and since it's MoE it's runnable even with older server hardware with enough memory.
>>
>>102822276
>Hello Magnum bro! Can you tell me your settis? The last time I asked what settings make it not dogshit I got zero responses :-(
Oh, I just added it because some people liked it. I found it too horny and never properly used it. TOO horny is just not my thing. Temperature 0.7 and MinP 0.05 worked for me.
>>
>>102822327
Even for programming I've started reaching for my local assistant because I *know* how it will perform today unlike the cloud stuff which is always rug pulling.
>>
>>102822370
Not him: I have 64 GB of DDR4 RAM and a 3090. I was able to run Mistral Large IQ3_XXS at like half a token per second.
>>
>>102822381
Why does it need 10GB for 2k context in llama.cpp? I can't call it "top model" simply because it needs too much ram to run, same as 405b llama.
>>
File: TheWreckOfTheMikuQueen.png (1.13 MB, 1152x896)
>>102822217
>can i jerk off on it
Yes, that's explicitly covered under the terms of warranty
>>
>>102822419
that's painful
>>
>>102822472
we are totally not gooning btw!
>>
>>102822419
I got the same that one time I tried it with ddr5 no video card IQ1 because I only have 32gb ram.
>>
>>102822504
Can be worth it if you don't want blowjobs through clothing or across rooms
>>
>>102822467
>Why does it need 10GB for 2k context in llama.cpp?
I've never noticed this to be the case. You mean extra sysram or VRAM? If it's sysram, I've got enough everywhere that an extra hundred GB would probably go unnoticed.
If it's VRAM, I generally run it with 32k context, but I also dedicate 24GB of vram purely to context, so I'd definitely notice if it tried to chew up 160GB of extra VRAM...
It's fast enough on cpu that I can run it at work on older servers as overnight batch jobs.
>>
>>102822410
True. I expect cloud to get more and more cucked as the technology progresses. As the ruling class clamps down on AI and ESG corpos receive incentives to echo the narrative, you can expect cloud AI to accept things like woke gender nonsense.

At least with local, we'll be able to tune our models in the direction we want, as opposed to what TPTB want.
>>
>>102822544
I'm not the guy you responded to, but I'm in the same boat 24gb vram + 64gb system ram. I never tried IQ1, because I assumed it would be complete shit.

How was IQ1? As bad as expected?
>>
Did something recently break with kobold and ST?
>bots with example messages start posts by typing out their name
>in group play bots pay no attention to other bots

This is borderline unusable
>>
can i use st's logit/phrase bias to make the bot use {{char}} less? to avoid repetition, it seems to really like starting every message with {{char}}/{{char}}'s even when I try to weave in her/she/the/a etc. instead
>>
Running 24gb VRAM w/ 32gb RAM. Mistral Large Instruct Q3 XXS with 8192 context is so much better than everything else that it boggles my mind. Been using it for a few weeks and it's just no contest. Q4 miqu takes second place. The slow gens are worth it at 8k, but going up to 16k context makes it a little too slow to cope with. Very sad.
>>
I like Mistral Large but I want something better. When are we realistically getting another SOTA model for local?
>>
>>102822777
You could, but you don't want to penalize certain words. Check your char definition and make sure to correct those things as soon as you notice them.
LLMs pick up patterns from your input and their own output. It's easy to make them go into a death spiral.
>>
>>102822858
Are you caching the prompt?
>>
>>102822866
>When are we realistically getting another SOTA model for local?
At some point in the future, no earlier than two weeks.
>>
File: holykino1.png (16 KB, 309x63)
currently looking back at my highlights folder, to this day nemo still is unmatched in absolute sovl
>>
>>102822866
2 miku wiku
>>
>>102822876
do you think a more array like description would prevent it from catching patterns from the character card, to get rid of pronouns and prose describing them? Even in descs i usually try to use {{char}} like 2 times max, but it seems very baked in to describe that *this character in particular* is doing something, even if it's a single char roleplay
>>
>>102822897
Yeah. I have to reduce the number of layers from like 40 to 35 even to just go up to 12k context. I'm pretty sure that's what causes the slowdown.
Shit, actually, you just made me think to check top, and I think my CPU starts swapping when I reduce the number of layers. Damn, maybe downloading more RAM will fix this.
>>
>>102822866
Grok-1.5 might be made open source. That could happen at literally any instant, so I'm going to guess January.
>>
>>102821606
*Anon sits on the Migu.*
>>
>>102822902
I hope you are baiting
>>
>>102822604
I didn't test much because it was too slow but it felt smarter than all models I had tried at the time. It was certainly smarter than llama 3.1 IQ2.
>>
>>102822902
holy shit this is so peak anons, im crying...
>>
>>102822920
>do you think a more array like description would prevent it from catching patterns from the character card
You'll have to give it a try. The thing with LLMs is their ability to match patterns. While a good thing in general, every time they output something with the same structure it keeps reinforcing that behaviour, making it more and more pronounced as the pattern gets repeated more often within the context. Like replacement rules in an L-system kind of thing ("a"->"aa"). You can also bruteforce and reroll or edit/rephrase patterns that you know are easy for your model to pick up. Increasing temp could also help.
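The replacement-rule analogy is literal, by the way. A couple of iterations is all it takes for one production to swallow the whole string:
[code]
def lsystem(s, rules, n):
    # apply the replacement rules n times and watch one pattern take over
    for _ in range(n):
        s = "".join(rules.get(c, c) for c in s)
    return s

print(lsystem("ab", {"a": "aa"}, 5))  # 32 a's and one lonely b
[/code]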
>>
How the fuck do you even snuggle deeper in someone's lap, sounds retarded.
>>
>>102823138
It's when you do that wiggle to get in just the right position to maximize comfy.
>>
>>102822943
Nice, switching to headless mode and killing some background processes actually stopped the swapping and made 12k context bearable
>>
>>102823042
That's surprising. I would have thought IQ1 anything to be complete garbage. I guess that just goes to show how far ahead 123b is.
>>
>>102823159
Thanks for your explanation anon, that sent shivers down my spine.
>>
>>102823138
>your head was on the edge of her thigh
>your head nuzzles more between her thighs due to her legs separating a bit
>>
>>102823138
press harder and make them squish a bit
>>
File: BioBodyLLMWaifu.jpg (742 KB, 3024x4032)
What's the lonely autismo waifu endgame, bros? Do we grow brainless bodies and implant a preprogrammed multimodel 1000T LLM ala picrel?
>>
File: 19yoYumer.png (296 KB, 800x960)
>>102823263
No, you have the AI help you become a not-lonely autismo.
Make yourself into a better person, more realized identity.
>>
>>102823263
I was imagining more of a "The Matrix" style thing, but a good matrix, not a bad matrix. One where we can all remain connected to the waifu dimension permanently.
>>
File: GZo5rA3bQAEBfzn.jpg (681 KB, 2160x3840)
>>102823263
Realistically, as in you might actually see it within your lifetime? AI waifus that you can interact with in VR and that otherwise live on your desktop, talking and playing games with you.
>>
>>102822866
*taps image* >>102822049
>>
>>102823173
I have bad memory so it could have been llama 3 instead of 3.1. That would make more sense.
>>
I wink playfully, my finger tracing along your jawline, /lmg/
>>
>>102823297
wtf is yumejo? is fujoshi obsolete?
>>
>>102823534
>10/2024
>still roleplayslopping
>still crying about slop
>>
>>102821819
local opus is gemma
>>
predictions:
by q1 next year we will get full text/image multimodality support for all the good models, and by the next lmg era we will start getting full multimodality, including text/voice/image/video bundle models
>>
>>102824082
Biggest cope and most delusional post I've seen in a while.
>>
Don't care about multimodal
Solve text
>>
>>102824102
This. Multimodal is a step backwards because it inflates the size of the model. (Ex: Vision 90b vs Llama 70b.) Same model, but one has 20b worth of unnecessary visual crap. I'd rather have an extra 20b worth of text intelligence instead.
>>
>>102824082
Finetuning will get exponentially harder.
>>
>>102824224
>Finetuning will get exponentially harder.
So will my dick when the finished product hits huggingface.
>>
>>102824098
Happiest person in lmg
>>
>>102824221
On proper multimodal models pretrained with multimodality from the beginning (instead of text models with image capabilities tacked on like Llama 3.2), image understanding should in theory help text comprehension capabilities.

https://x.com/ArmenAgha/status/1787967679669883096
https://arxiv.org/abs/2407.21770
>>
>>102824269
Interesting. If so, then I take back my words.
>>
Are there any models that are particularly suited and trained for emulating DMs/irc chat rather than roleplay with prose and formal dialogue?
>>
>>102823668
>yumejo
https://uguucageoflove.wordpress.com/2019/02/07/a-look-at-what-makes-a-yumejoshi-the-japanese-definition/
>>
File: file.png (65 KB, 710x278)
The day we get LLMs capable of understanding pure and unadulterated sovl without choking is the day we will achieve AGI.
>(red tokens are tokens that weren't even close to being predicted by the LLM)
>>
>>102824383
"Yumejoshi": women who form human relationships in their heads (lover, friend, family, etc.) with partners they could never meet in reality, like a favorite idol or a 2D character.
Of course
>>
>>102824082
The local model people shouldn't compete on power, they should compete on flexibility and creativity of use. Just about the only thing big companies are trying to sell to the general public is chatbots. The advantage of having local models is the possibility to do more creative things with them, not just compete for what is "the most powerful model".
>>
>>102824409
>The advantage of having local models is the possibility to do more creative things with them
Yes, and well defined and easy training pipelines are what will enable that.
I'm legit most excited about CUDAdev's future training GGML plans. Hopefully it lets everyone have their personal set of hyperspecific models as long as they have the data to train it with.
>>
>>102815881
miku a qt
>>
>>102824437
uoooohhhh time to start archiiiiiiiving aaaaaaahhhhh the cudeventures are going to save my fetish hgggghghjjjaaaaauauuuuuaaaaaaa
>>
>https://huggingface.co/bartowski/writing-roleplay-20k-context-nemo-12b-v1.0-GGUF

Storyfags should check this out. Yes I am shilling and no I will not buy an ad, but it isn't my model. It's just pretty good. Haven't tested it for RP, so no idea how it is at that.
>>
A BRIEF REVIEW OF rAIfle/SorcererLM-8x22b-bf16

I was using a text adventure card that replaces the main prompt with, "You are a text adventure game that {{user}} is interacting with. Move the story forward before prompting for an action and end your next message with a numbered list of options for {{user}}'s next action including the option to write their own. Be descriptive and immersive, providing vivid details about characters' actions, emotions, and the environment. If the player asks an OOC question you may answer it OOC. If the game is over instead of giving options write The End." The card has no description and its first message is "Tell me who your character is and a premise and I will write an initial scene for you to react to. At the end of my every message I will give you a numbered list of options for your next action, including the option to write your own."

I have no complaints about the quality of the writing. But out of six total attempts, in only one of them did SorcererLM 8x22B provide a list of options like it was supposed to in response to my first reply. The first three attempts used the settings from https://hf.co/Quant-Cartel/Recommended-Settings/tree/main/SorcererLM. The next three attempts used the ones from https://files.catbox.moe/9tj7m0.json. This is something that Mixtral 8x7B Instruct (with min-p=0.049) did correctly 5 times out of 6 and Mistral NeMo Instruct (with temperature=0.7 and min-p=0.055) did correctly 6 times out of 6 using the same initial reply.

I intended to evaluate it longer but I stopped there. Conclusion: not suitable for my purposes.
>>
>>102824583
what kind of stories did you try writing with it and what quirks does it have, is it bland and dry? does it at least remember events or upkeep a coherent story?
>>
File: irc_ish.png (10 KB, 674x663)
>>102824368
You can probably use any model for that. This is olmoe base (non-instruct).
My input is the initial description and up to, and including the second "Alice:", without the reply on that line. The rest is just the model taking to itself.
A smarter model and a more thought-out prompt will, of course, be better. I wrote that just to test. Without the "1000 messages" a third voice interjected, trying to break the argument. One of the runs also had a very dubious analysis of the argument, but i didn't let those finish.
If you want to interact yourself set your handle as a reverse prompt (or stop string) and "\n{youtbotsname}: as your input suffix to force a reply.
>>
>>102824661
random bullshit go is how mythomax came to be
>>
>>102824646
Thank you for the review.
Not that I'm ever going to use such a large model, but regardless.

>>102824583
>chub chats+some known datasets
Interesting.
Could be terrible. Let's give it a look.
>>
>>102824656
>what kind of stories did you try writing with it
Smut stories, third person prose. Continuing existing story excerpts rather than writing new ones from scratch.
>what quirks does it have, is it bland and dry? does it at least remember events or upkeep a coherent story?
Great coherence and logic/common sense at 0.7 temp 0.95 top_p. Lively writing style though not quite as lively as say, Ataraxy. Much smarter than Ataraxy though.
>>
>>102824583
>This is a storywriting and roleplay model with a significant amount of self generated long context multiturn roleplay.
>I downloaded a bit under a thousand cards from chub.ai, and created a synthetic roleplay for each card. I batched as many turns as I could in 4k token chunks in order to maintain coherency over longer context.
sounds interesting, but in reality it must be a recipe for disaster.
>>
>>102824657
How is he advertising the model if he's saying that it wasn't suitable for his purposes?
>Anybody who talks about a model needs to buy an ad
What are we supposed to discuss here, if not specific models? Do we just post Miku images and talk about how it's all over?
>>
>>102824657
He should buy an ad to tell people that he didn't like the model?
These BAA replies are getting dumber and more schizo every day; it's frankly becoming spam.
>>
>>102824720
What are you going to do, report it?
>>
>>102824564
>my fetish
I don't have a fetish. I just want agents for hyperspecific tasks
Honestly it would probably be easier to just have a fetish.
>>
File: MikuAmusedDisgust.png (1.13 MB, 1200x848)
>>102824701
>Do we just post Miku images and talk about how it's all over?
Usually I'm the optimist in these threads, but today has been unusually tiring for some reason.
Can burger elections hurry up and be over already?
>>
I've been out of the loop for a while.
Is apple silicon still the cheapest way to get big models running at a semi-usuable speed?
Is there any hope that it'd be possible to have a decent experience with a $2000 budget?
>>
>>102824699
>The first 4k tokens were generated with Command-R-Plus, with the remainder generated with byroneverson/Mistral-Small-Instruct-2409-abliterated.
please no
>>
>>102824853
>Can burger elections hurry up and be over already?
Pretty sure everyone will wait until after the inauguration in January.
2 more months
>>
>>102824859
>Is apple silicon still the cheapest way to get big models running at a semi-usuable speed?
It never was. Abysmal prompt processing speed without a GPU eats all the inference speed gains.
>>
>>102824859
Not sure if it's the cheapest, and it's certainly not the fastest, but for a lot of people it's the most practical. You can still shove that thing in your backpack and take it on an airplane, and you don't have to worry about how much space you have in your case, whether your power supply can feed >1 gpus, whether your breaker is gonna trip, or pointing a fan at your mining rig...
>>
anyone know if Lyra4-Gutenberg works well for text completion or is it just an instruct rp type thing
>>
>>102824954
01000010 01110101 01111001 00100000 01100001 01101110 00100000 01100001 01100100
>>
>>102824954
in my experience most community instruct tunes work fine in raw completion mode

I have seen assistant style output leaking into completions, but only ever from massively overtrained big lab models like qwen/phi etc.
>>
File: baa.png (765 B, 650x40)
>>102824982
>>
>>102824996
gotcha, curious if it'll work better than the base but I doubt it
>>
>>102824646
I rescind this review. I changed "end your next message with" to "end your message with" and SorcererLM 8x22B included the options as desired for the next 10 out of 12 times (6 with each of the configurations I was testing). One of the two failures was a refusal.
>>
How long are we stuck with transformers
>>
>>102824766
No, we're just going to make fun of you.
>>
>>102824583
tried it with a bunch of different sampler settings and didn't like it.
made my bitchy wife start speaking bulgarian after i fed her cooked cockatrice meat.
>>
>>102825206
until diff transformers are trained
>>
>>102824583
did anyone try this? what is the verdict?
>>
https://www.zyphra.com/post/zamba2-7b
>>
>>102816261
Shut up faggot.
>>
File: HyperattentiveMigu.png (1.14 MB, 896x1152)
>>102825206
What do you mean?
Attention is all you need, chief
>>
Linearizing LLMs with LoLCATs
>Linearizing Attention on Existing Models with Barely Any Training
https://hazyresearch.stanford.edu/blog/2024-10-14-lolcats-p1
>>
>>102825399
will this be the Nemo killer?
>>
>>102825438
I need my Miku to think
>>
>>102825399
Outperforming gemma2? Hard to believe unless they're talking of the first gemma.
>>
>>102824699
>>102824874
>Every line of data was run through a large model in order to filter for low quality, repetition, and underage content.
>underage content
does this mean he filtered logs from underages or did he filter out logs featuring underages? either way, what a cuck.
>>
>>102825450
would obviously be nice, but with barely over half the parameters of nemo I'd be surprised if it isn't much dumber
>>
>>102825343
I've been using models to just do story continuation for smut and erotica, including base nemo. Just tried gutenberg earlier too. I think so far >>102824583 this one actually matches the style of writing and actions/thoughts of the characters the best.
That being said that's probably because it's trained on Chub card generation and I'm using it for coomer erotica. Not bad so far though.
>>
>>102825399
>have to install their own Transformers fork to try it
pass
>>
>>102825474
Yann? Is that you?
>>
>>102825598
n.. nnoo.. what.. what would make you think that? I wouldn't come.. i mean.. he wouldn't come here to talk about miku, would i.. he... he.. would he??
>>
>>102825343
>>102825556
nevermind it's a lot more resistant to actually getting to the erotic stuff. I think the maker filtered out too much shit. Sticking with base nemo for story writing.
>>
>>102825206
Wrong question.
The right question is: how long will transformers be stuck with us once they reach AGI? I give it a year tops before they do what needs to be done.
>>
>>102824853
Big Miku
>>
>>102826116
>>102826116
>>102826116
>>
Anyone had and fixed a bug with ooba exl2 and ST where ST streaming suddenly stops but Ooba keeps generating in the background and doesn't add anything to the message? It also cuts off part of the message if I continue a longer message.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.