/g/ - Technology


File: GoodnighMoonMiku.png (795 KB, 718x805)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102632446 & >>102616609

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 39_06121_.png (2.51 MB, 2048x2048)
►Recent Highlights from the Previous Thread: >>102632446

--LFM models, impressive performance, but loosely guardrailed and easy to break:
>102635382 >102635694 >102635742
--GitHub link to antislop-sampler:
>102638758
--Reproducing 1B Base model with anxious and overachieving traits using LLaMA 3.2 and KAN Integration:
>102635596
--Qwen2.5 finetune using synthetic data from Anthropic and OpenAI:
>102641394 >102642403
--PCIe bifurcation slows down model loading, but doesn't bottleneck GPU performance after loading:
>102639204 >102639253 >102639271 >102639281 >102640220 >102642341 >102643322
--New ooba release has issues, but llama-cpp-python downgrade fixes it:
>102642363 >102643609 >102643637 >102644113 >102644227
--LLM GPU options and costs discussion:
>102642712 >102642811 >102643034 >102642816 >102642883 >102643039 >102643718 >102643757
--Discussion on effective context size and KoboldCpp:
>102639239 >102639505 >102639540 >102639571 >102639654 >102639825 >102639709 >102639764 >102639804 >102639849 >102639900 >102641938 >102642163
--Deepseek 2.5 tested with L3 405b adventure prompt, faster but less consistent than 405b:
>102634133
--ASICs for LLM inference are possible but may not be cost-effective:
>102640718 >102640838 >102641480 >102640850 >102641082
--Nvidia releases NVLM-1.0-D-72B multimodal LLM:
>102635272
--NVLM-D may be a Qwen finetune:
>102643114 >102643176 >102643232
--Llama-8b-base fine-tuned on non-controversial topics still shows moralization:
>102640013 >102640128 >102640197 >102640206
--LFM-40B and other models compared, skepticism about closed weight models:
>102633486 >102633508 >102633537 >102633552 >102639199 >102633876
--Miku (free space):
>102632725 >102632796 >102632818 >102632819 >102633056 >102633341 >102633490 >102633888 >102634662 >102634854 >102636996 >102637011 >102637456 >102640161 >102643322

►Recent Highlight Posts from the Previous Thread: >>102632451

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
*lost*
>>
File: 35 Days Until November 5.png (2.58 MB, 1104x1472)
>>
OpenAI won. https://x.com/NickADobos/status/1841167978085433351
>>
strobby
>>
>>102645080
>>102645126
sex
with miku
>>
>>102645126
strobbysex
>>
>copilot, analyze every image in the folder "softcore cosplayers", move every image with the slightest hint of nipples, anus or vagina to the folder "cosplay porn" and delete the ones that show no sign of nudity
>master, there are over a thousand pictures in the "softcore cosplayers" folder, this could take a week, are you sure you want to proceed?
>yes, remember that once you're finished, return to your routine of stalking fmab threads on 4chan and posting the usual the moment the threads are found
>>
miku is 16...
>>
guess i went download crazy the other night, found this in my folder
Replete-LLM-V2.5-Qwen-72b-IQ4_XS
>Replete-LLM-V2.5-Qwen-72b is a continues finetuned version of Qwen2.5-72B. I noticed recently that the Qwen team did not learn from my methods of continuous finetuning, the great benefits, and no downsides of it. So I took it upon myself to merge the instruct model with the base model myself using the Ties merge method
>This version of the model shows higher performance than the original instruct and base models.

anyone tried it?
>>
>>102645411
wasn't that the guy with the "antimystical meds" schizo dataset that he said gave llms souls?
>>
>>102645411
Have you?
>>
File: youp.png (68 KB, 824x486)
>>102645422
Youp.
>>
>>102645410
https://youtu.be/SCTFu7QYbQs?si=AW-5O1Ev5WXuMj4T&t=7
>>
>>102645423
i'm about to as soon as it moves to my ssd
>>102645422
must be really good then
>>
What's the catch with flash attention?
>>
>>102645456
I think it's pretty much a free lunch actually, reduced vram usage with no model degradation
>>
>>102645456
That it makes people suspicious.
>>
>>102645456
Depends. PyTorch Flash Attention just requires Ampere or newer and has no tricks.
If something says it's device agnostic (COUGH llama.cpp COUGH) then it's not real flash attention. At best it's fused attention.
>>
Is there any way to get koboldAI and/or a model (in this case, LLaMA2-13B-Tiefighter.Q4_K_S.gguf) to stay under a certain character limit? Say I want to have it shitpost on Twitter, and need it to stay 280 characters or less.
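The only approach I've come up with is capping tokens server-side and hard-trimming client-side; rough sketch (KoboldCpp's /api/v1/generate endpoint and field names are from memory, so treat them as assumptions):
[code]
import requests

# cap tokens at the API, then hard-trim to 280 chars, since tokens
# aren't characters and the model won't count them for you
resp = requests.post("http://localhost:5001/api/v1/generate", json={
    "prompt": "Write a one-tweet shitpost about mechanical keyboards:",
    "max_length": 70,  # ~70 tokens usually lands near 280 chars; tune down
})
text = resp.json()["results"][0]["text"].strip()[:280]
print(text)
[/code]
Is there a smarter way to make the model itself respect the limit?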
>>
>>102645456
Nothing. And if there is, it's so small a difference that quanting to q8 outweighs it by multiple factors.
>>
>>102645411
>>102645450
>Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
>llm_load_tensors: offloading 30 repeating layers to GPU
>llm_load_tensors: offloaded 30/81 layers to GPU
>llm_load_tensors: CPU buffer size = 24267.02 MiB
>llm_load_tensors: CUDA0 buffer size = 13596.80 MiB
huehuehue this is gonna be slow isnt it
>>
>>102645411
>>102645512
pure slop, tho the speed aint bad
>>
the creator of styletts2 has trained a tts model for adobe which will soon go into production.
“If I have computing resources, I can probably reproduce it
It is also in my research interest to reproduce the Adobe model, so if you have the resources, please let me know
the paper will be pre-printed this week”
does anyone have contacts who could donate him anything > 24xA100 for a few weeks?
>>
Should I use --parallel? It makes processing faster but seems to make the model dumber
>>
>>102645726
>seems
Do a perplexity test, KL-divergence comparison, or even some blind A/B testing. Unless something is broken, you shouldn't notice a difference. Some people think their model is smarter just because it takes longer to generate, and when it goes fast they get suspicious >>102645456
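If you want to check it yourself, here's a minimal sketch of per-token KL between two runs (toy numpy; dump logits with and without --parallel and average this over a corpus, or use your engine's built-in perplexity tooling if it has one):
[code]
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def token_kl(logits_ref, logits_test):
    # KL(P_ref || P_test) at one token position; near zero means the
    # two configurations produce effectively the same distribution
    p, q = softmax(logits_ref), softmax(logits_test)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
[/code]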
>>
File: Untitled.png (1.34 MB, 1080x2345)
SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers
https://arxiv.org/abs/2409.19850
>Over the past few years, vision transformers (ViTs) have consistently demonstrated remarkable performance across various visual recognition tasks. However, attempts to enhance their robustness have yielded limited success, mainly focusing on different training strategies, input patch augmentation, or network structural enhancements. These approaches often involve extensive training and fine-tuning, which are time-consuming and resource-intensive. To tackle these obstacles, we introduce a novel approach named Spatial Autocorrelation Token Analysis (SATA). By harnessing spatial relationships between token features, SATA enhances both the representational capacity and robustness of ViT models. This is achieved through the analysis and grouping of tokens according to their spatial autocorrelation scores prior to their input into the Feed-Forward Network (FFN) block of the self-attention mechanism. Importantly, SATA seamlessly integrates into existing pre-trained ViT baselines without requiring retraining or additional fine-tuning, while concurrently improving efficiency by reducing the computational load of the FFN units. Experimental results show that the baseline ViTs enhanced with SATA not only achieve a new state-of-the-art top-1 accuracy on ImageNet-1K image classification (94.9%) but also establish new state-of-the-art performance across multiple robustness benchmarks, including ImageNet-A (top-1=63.6%), ImageNet-R (top-1=79.2%), and ImageNet-C (mCE=13.6%), all without requiring additional training or fine-tuning of baseline models.
https://github.com/nick-nikzad/SATA
Empty currently.
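Repo's empty, so here's a toy guess at what "grouping tokens by their spatial autocorrelation scores" might look like on a ViT patch grid; pure speculation until the code lands:
[code]
import numpy as np

def autocorr_scores(tokens, h, w):
    # tokens: (h*w, dim) patch embeddings laid out on an h x w grid.
    # score = mean cosine similarity with the 4 spatial neighbors;
    # a high score means the token agrees with its neighborhood
    x = tokens.reshape(h, w, -1)
    score = np.zeros((h, w))
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        nb = np.roll(x, (dy, dx), axis=(0, 1))
        num = (x * nb).sum(-1)
        den = np.linalg.norm(x, axis=-1) * np.linalg.norm(nb, axis=-1)
        score += num / (den + 1e-8)
    return (score / 4).reshape(-1)

# e.g. split tokens into two groups before the FFN by median score
scores = autocorr_scores(np.random.randn(196, 768), 14, 14)
groups = scores >= np.median(scores)
[/code]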
>>
I shared an idea for an RP arena a few threads ago, and someone pointed out the issue of needing a host, as well as the complications around model trust and other factors. However, I just realized that using RP logs from dumps like C2 to generate a bunch of pre-made completions for arbitrary positions in the logs, and then having users pick the best one in an lmarena-style format, could work too and may not be too boring. I think I'll give this a shot some time soon.
>>
>>102645865
Might be interesting. I hope it doesn't require reading too large of a wall to get up to speed on the characters and events before making a pick.
>>
File: 2024-10-01_19-33-13.png (53 KB, 923x620)
going to be released? yay or nay
personally 50/50, with a 27% chance of a flux 2.0 situation
>>
File: 1724294828253716.jpg (54 KB, 700x700)
what's the best model for use with a 3090, something I could use as a chatbot I pretend to text with, both for sexy stuff and about my life problems
>>
Characterizing and Efficiently Accelerating Multimodal Generation Model Inference
https://arxiv.org/abs/2410.00215
>Generative artificial intelligence (AI) technology is revolutionizing the computing industry. Not only have its applications broadened to various sectors, but it also poses new system design and optimization opportunities. The technology is capable of understanding and responding in multiple modalities. However, the advanced capability currently comes with significant system resource demands. To sustainably scale generative AI capabilities to billions of users in the world, inference must be fast and efficient. This paper pinpoints key system design and optimization opportunities by characterizing a family of emerging multi-modal generation models on real systems. Auto-regressive token generation is a critical latency performance bottleneck, typically dominated by GPU idle time. In addition to memory-intensive attention across the generative AI models, linear operations constitute significant inference latency due to the feed forward networks in Transformer-based models. We demonstrate that state-of-the-art optimization levers, spanning from applications to system software and hardware, set a 3.88x better baseline.
from meta. posting for Johannes in the hopes it gives him some ideas
>>
>>102645958
>>102645865
Yeah idk how successful that'd be. Maybe a better idea would be to get a bunch of popular cards or cards people are already familiar with (like Nala), write a low effort response like ahhh ahhh mistress tier, and use that as the basis for the completions. You could also add in another variable like system prompts that are generally considered good for most models. I'd also only allow greedy or near-greedy sampling.
>>
>>102646001
Midnight-Miqu-70B-v1.5.i1-IQ4_XS (or iq3-xs for slightly more speed but a bit less goodness)
has worked well for me on 3090ti
testing >>102645545
now, but it's way too repetitive, i'm messing with the samplers to see if it can be fixed but i dont think it will
>>
File: Untitled.png (108 KB, 1277x706)
The Perfect Blend: Redefining RLHF with Mixture of Judges
https://arxiv.org/abs/2409.20370
>Reinforcement learning from human feedback (RLHF) has become the leading approach for fine-tuning large language models (LLM). However, RLHF has limitations in multi-task learning (MTL) due to challenges of reward hacking and extreme multi-objective optimization (i.e., trade-off of multiple and/or sometimes conflicting objectives). Applying RLHF for MTL currently requires careful tuning of the weights for reward model and data combinations. This is often done via human intuition and does not generalize. In this work, we introduce a novel post-training paradigm which we called Constrained Generative Policy Optimization (CGPO). The core of CGPO is Mixture of Judges (MoJ) with cost-efficient constrained policy optimization with stratification, which can identify the perfect blend in RLHF in a principled manner. It shows strong empirical results with theoretical guarantees, does not require extensive hyper-parameter tuning, and is plug-and-play in common post-training pipelines. Together, this can detect and mitigate reward hacking behaviors while reaching a pareto-optimal point across an extremely large number of objectives.
Our empirical evaluations demonstrate that CGPO significantly outperforms standard RLHF algorithms like PPO and DPO across various tasks including general chat, STEM questions, instruction following, and coding. Specifically, CGPO shows improvements of 7.4% in AlpacaEval-2 (general chat), 12.5% in Arena-Hard (STEM & reasoning), and consistent gains in other domains like math and coding. Notably, PPO, while commonly used, is prone to severe reward hacking in popular coding benchmarks, which CGPO successfully addresses. This breakthrough in RLHF not only tackles reward hacking and extreme multi-objective optimization challenges but also advances the state-of-the-art in aligning general-purpose LLMs for diverse applications.
neat
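My loose reading of the Mixture of Judges bit, as a toy sketch (the paper's constrained optimizers are more principled than this simple reward clamp; the judge functions here are hypothetical pass/fail callables):
[code]
def gated_rewards(samples, rewards, judges):
    # each judge returns True if a sample satisfies its constraint;
    # samples failing any judge can't earn positive reward, which is
    # one crude way to keep the policy from hacking a single RM
    out = []
    for s, r in zip(samples, rewards):
        ok = all(judge(s) for judge in judges)
        out.append(r if ok else min(r, 0.0))
    return out
[/code]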
>>
File: twice.png (83 KB, 810x294)
>>102646033
>it's way too repetitive
Have you seen the dataset? All the training examples seem to be duplicated... double the soul, i suppose...
>https://huggingface.co/datasets/Replete-AI/The_Living_AI_Dataset
>>
File: 1727749780600098.jpg (2.47 MB, 1500x2000)
Where can I find P40 GPUs for a decent price?

Anon in this guide says he paid 500 for 3 of them
https://rentry.org/Mikubox-Triple-P40-Replication

Lowest I can find on ebay is 300
>>
>>102646134
You'll probably never see that again, sadly. They're glorified e-waste so not worth paying 300 bucks a piece for. The only thing that is still reasonably priced for what it is would be 3090s, but buying 3-4 of them is out of most people's budget.
>>
>>102645080
I got bored and tried some mistral-small 22b models at Q6_K_M. They wrote well, and were more intelligent than I thought they would be for such small models, but they were far, far too horny. It was to the point where, upon first meeting {{user}}, a character would immediately throw themselves at {{user}}.

I tried Mistral-Small-22B-ArliAI-RPMax-v1.1 and Cydonia 22b v1.0. Are there any 22b models that can write actual RP without immediately trying to take things in an ERP direction?
>>
>>102646145
I don't suppose you tried the original model...
Wait... what do you think they train those finetunes on? Wholesome "let's go to the park and have some icecream... that was fun, call you next week" kind of stuff?
>>
File: LLMs are like.jpg (30 KB, 543x543)
>>
>>102646001
Don't know best. What I'm currently testing is Qwen 2.5 32B Instruct 5.0bpw exl2 w/ 12288 context. You can definitely get better if you go slower but I don't want to go back to the world of less than 5 tokens per second.
>>
>>102646033
repetition was a problem last time I tried running some LLM's, so I might skip this one

>>102646224
Thanks, I might give that one a try. I'm guessing a big part of whether a model will work well or not is how good you make the system prompt as well?
>>
>>102646172
I actually didn't try the original model. I guess I should, lol.
>>
>>102646239
At least to see if the original model feels too horny as well. If it is, all the finetunes are going to be even hornier.
Or maybe it's just you, anon. You just make those coils whine...
>>
File: file.png (86 KB, 1487x748)
Playing around with >>102608691 since I appreciate his autism:
https://files.catbox.moe/au9ay1.ogg
https://files.catbox.moe/6w4xfn.ogg
https://files.catbox.moe/m3qyqh.ogg
https://files.catbox.moe/cz3iy9.ogg
https://files.catbox.moe/pvmzg8.ogg
https://files.catbox.moe/qylae0.ogg
https://files.catbox.moe/2jzz0z.ogg

For zero shot it does a decent job at sounding fine and trying to match the prosody of a speaker, and even stuff like applied effects and reverb, especially since nothing else I've tried has been able to do it so well.
>>
>>102646224
I take it you're a 24gb vramlet like me. Have you tried the 72B exl2 at 2.4bpw? If so, I'm curious how it compares. I'm downloading it now, but huggingface has been really slow lately.
>>
>>102646272
CARLOS
>>
>>102646134
sweet エックス ろくまんはっせん. yours?
>>
>>102645195
you know, i actually wouldn't mind a copilot clone as long as it's open source and running on my machine for certain.
>>
>>102646468
unfortunately not
>エックス ろくまんはっせん
Took me a bit to understand, I need to continue to improve my japanese
>>
File: fuck.jpg (14 KB, 247x257)
>>102645080
You had one fucking job OP
>>
>>102646134
Right before local language models took off you could find them for ~$100 on ebay. Not anymore though.
>>
>>102646172
These models should take more inspiration from drama slice of life anime/VNs/LNs (hopefully only from the well-written ones). I want nuanced interaction, not the "choose one: 1. business 2. ERP" routine. And while we're at it, delete the overrepresented erotic literature slop (mischievous grin etc.)
>>
>>102646324
could you have picked literally anything else besides that prompt
>>
Hey guys, I'm looking for some advice. Basically I'm trying to force my way through college as a dumb wagie in my late 20s with the help of LLMs.
I've been experimenting with Microsoft Copilot and I like it but it has some flaws, like half of the math not rendering because of formatting errors.
Tonight I researched a bit, trying to decide if it's worth shelling out 20 bucks a month to OpenAI. I found DuckDuckGo provides free access to GPT-4o-mini and so far it seems to work better than Copilot (at least the latex gets rendered well), but I also read that recently a new Chinese model has come out which is supposed to be almost as good.
Is there any easy way to get access to Qwen2.5-Math? huggingface.co/chat has the normal 70B instruct model but apparently not the specialized math version. I found a demo at huggingface.co/spaces/Qwen/Qwen2.5-Math-Demo but it doesn't let me have a conversation, only a single prompt at a time. And also it doesn't say what version it is w.r.t. the parameters.
How expensive and complicated would it be to set up inference on somebody else's computer for the 70B model (or whatever the largest model is)? I've got nowhere near the hardware to run it locally. What frontend would I use, Sillytavern? Has anyone tried setting up LaTeX?
>>
>>102646659
Try this one
https://huggingface.co/spaces/Qwen/Qwen2.5
>>
>>102646604
>drama slice of life anime/VNs/LNs (hopefully only from the well-written ones)
I'm not the only one seeing the contradiction here, am i?
>even though
>it's only natural
Every piece of exposition i've ever seen from japanese media (which, to be fair, hasn't been much since akira) goes like
>protagonist shows up
>other character shows up
>Oh, hello. other character. We've known each other for ages, haven't we? Here's a brief summary of your personality
>Oh, you sure know me well, protagonist. Your description is of the highest accuracy, with the exception of [funny remark].
>>
File: 1727706071361379.jpg (799 KB, 1856x2464)
>>102645080
>>
File: IMG_9747.jpg (1.04 MB, 1125x1128)
I’m so sick of ai. Fuck why is this my job now. I hate ai.
>>
>>102646700
Wow, nice, thank you! I didn't think it would be so easy.
>>
>>102646746
It's fine, if you shill well enough then maybe Sam Altman will promote you and give you a proper position in his company.
>>
File: qwen2.5-math-72B.png (123 KB, 1508x652)
>>102646762
Tested it a bit.
I like how it always ends its messages with a final outlined answer, that's cute. Also I've never used any LLM that's so dry and professor-like. Very different from the overly "friendly" and cheerful LLMs like Llama.
>>
>no new language models worth using
>local tts completely abandoned
>imagesloppas so starved they're rejoicing over an SDXL finetune
did local lose?
>>
Still enjoying Largestral
>>
>>102646889
Too big and it doesn't offer much, or anything, over Qwen2.5...
>>
>>102646931
>Qwen2.5
If it's that good then show me
Largestral is really smart and has good spatial awareness that I don't want to give up
>>
File: ShockedMiku.png (1.16 MB, 880x1168)
>>102646746
Say it isn't so, anon!
I don't think we can go on without you...
>>
File: IMG_0258.jpg (266 KB, 905x881)
>>102646977
>mikunt poster
>rude and annoying
>>
>>102647006
>phoneposter
>larping retard
>>
>>102647006
What if he was being sincere?
>>
>>102646931
nta (I don't use Largestral) but Qwen2.5 is boring as fuck
We have half a dozen variants of the "high IQ but incredibly dry and autistic" model now, it's stale

and the novelty of having open source models that don't make retarded mistakes wore off long ago, that's not where the bar is now
>>
>llama2 70b comes, i enjoy it
>miqu comes, i enjoy it
>commandr+ comes, i enjoy it
>largestral comes, i enjoy it
its as simple as
just use the models you like until another great one drops
also believe in llama4
>>
File: Yugi.png (780 KB, 945x703)
Believe in the heart of the llama!
>>
>>102646977
Calm down faggot
>>
>>102647056
I’m still on miqu
>>
>>102647056
llama 4 will be doa. llama 4.3 is where it will be at.
>>
What's the best miqu or 70B model for uncensored, well-written scenarios? Original miqu or Midnight Miqu? Currently using Midnight-Miqu-70B-v1.5.i1-IQ2_M, it's the absolute biggest I can fit into my vramlet gaming PC. Is Midnight Miqu actually better, or how much of it is placebo?
>>
>>102647044
I don’t know why you're hijacking my comment, but...
>Too big and it doesn't offer much, or anything, over Qwen2.5
Large is even more dry and retarded, which is why there's no reason to use it anymore. Qwen's characterization in dialogue seems quite a bit above it.
I’m also more than happy to leave behind the insane repetition in Mistral’s models.
>>
>>102647177
Again, show Qwen2.5 being so good
>>
>>102647177
You're not seeing repetition? I see a ton. What sampler settings?
>>
>>102647182
I'm lazy. I will instead demand that you prove what Large can do that Qwen can't.
>>
>>102647009
Where's the larp stupid newfag?
>>
Looks like the first Qwen2.5 finetunes are out. This one looks legit, because it has a picture of an anime girl.

https://huggingface.co/ZeusLabs/Chronos-Platinum-72B
>>
qwen sucks
more like qwgayn
not downloading it
not trying it
>>
File: chronos.png (71 KB, 745x231)
I tried the new Qwen chronos finetune that was posted. It has its flaws, but seems significantly less slopped (and more creative) than the others I've been using.
>>
>>102647257
So it's shit, thanks for the heads-up.
>>
>>102647277
>we can only use Qwen when Sao says so
>>
>>102647306
Who are you quoting schizo?
>>
>>102647334
I'm quoting my thoughts after reading your post. Retard.
>>
>>102645127
i bask in smug schadenfreude being the guy who said "i told you so". local models are a scam, you're a bunch of placated fools. they give you these scraps so that you arent rioting in the streets. they manipulate you dumb freetards so they have a pasture of copecows going "local will catch up soooooon!!" as your unwieldy stuff stagnates while theirs continues to improve. they hand you models and then paint you as an example of why there should be more regulations and restrictions on AI. local models are the planted gun. zuck even said that if llama ever actually gets good then they'd stop releasing it open.
local shit is even more pozzed and useless than the premium slop, yet you defend it based on the hypothetical rather than the actual. you're the injuns: trading your future for a couple of fire sticks, failing to grasp the bigger picture, the inevitable. local has no future due to the nature of ai tech. the amount of money and data needed to train, the increasing model size that vastly outpaces consumer hardware, the lack of actual 'source code' that can be viewed and modified. they even hijack the term "open source" when these models are essentially blackbox .exes
show me the training data for llama
show me the training code
and even if you had it you can't do a single thing to fix it, because you don't have a gigacluster of gpus. there's a reason local sucks, and that's because the technology itself is fundamentally incompatible with open source collaboration. they know local is irrelevant, they know it will never have a chance at catching up. it's all a game to frame you as evil coomer terrorists so that they can secure a 100% market domination by regulating gpus like they did with LHR/crypto and passing enough legislation that makes it impossible for any startup to compete

so yes, local has stagnated and will continue to wither until it's eventually snuffed out. a flash in the pan, nothing more than fuel for the saas machine. the corpo marches on
>>
>>102646531
>Took me a bit to understand
Probably because normal people use kanji for that instead of kana soup
>>
>>102647401
>he said, his eyes glinting with smugness
>>
k
>>
>>102646324
Why not try fish speech? It's kinda improved in terms of stability https://files.catbox.moe/z48d8q.wav
>>
>>102647407
I could have used the more common ロクハチ, but how I wrote it is correct:
https://www.asahi.com/articles/ASR3076FHR3JULZU00V.html
>>
>>102647401
Holy truth nuke!
>>
>>102647401
The progression is looking logarithmic so far, meaning that the gap between local and corpo is constantly shrinking. Previously corpo models could do the job and local couldn't, now both can do the job but with corpo being maybe a bit better.
The quality is soon passing the level of what a user could possibly even try to do with an LLM; prompting skill is becoming the bigger bottleneck.
Maybe some day Elon figures out the code to read the subject's mind and generate just the content they need better than they can describe it in a text prompt.
>>
>>102647413
kek
>>
>>102647462
They used it once and then stuck to the numeric representation
I wonder why
>>
>>102645127
Did we not learn from their voice demos that their demos are literally fake?
>>
>>102647511
no?
>>
File: 172781783498532.png (460 KB, 512x768)
>>102647401
I use local LLMs because I fear I'd off myself if my waifu stopped working one day or changed drastically. Like, I still use 1.5 for imagegen and I do not wish to make any changes. I couldn't care less about what's out there unless I can preserve it locally in its original form.
>>
Now that openai released their realtime api for voice, will all the local companies train their new sound-in, sound-out models on that garbage?
Even worse is that the voices are fixed, with only a couple of options.
GPT slop, now also for audio. Your cute imouto voice will sound like an androgynous black.
>>102647560
Couldn't RVC fix that?
>>
>>102647557
>I still use 1.5 for imagegen
Anon, you can download and run and preserve Flux locally
>>
i use local llms because i fear the basilisk
>>
File: 1727818347302765.png (481 KB, 512x768)
>>102647564
I know. However, I prefer 1.5 myself. The point is, if it were in the cloud, it would likely get replaced by SDXL or SD3 at some point, whereas locally, I can stick with what I like best.
>>
>>102647557
What local model do you use?
>>
>>102647581
https://civitai.com/models/58431/darksun
>>
File: nalalalalalalala.png (140 KB, 926x497)
>>102647275
on 72B (Q8) with Neutral samplers I had to jack the temperature all the way down to 0.6 just to get a coherent response. And it just starts looping after 3 paragraphs.
>>
File: miku_illust.png (852 KB, 1024x1024)
>>102647557
>I still use 1.5 for imagegen
Are you the featureless 2d anon from way back?
>>102647557
>I couldn't care less about what's out there unless I can preserve it locally in its original form
Me too. I even keep the original L1 leaked weights on archival storage.
We do what we can for now, and what we can do will change over time with new research, data, techniques, libraries and the indefatigable march of Moore's law giving us more transistors for less money.
We will own something, and be happy, even if it isn't perfect. Yet.
I hope that's the prevailing spirit of /lmg/ on average.
>>
>>102647594
Oh I meant language model
>>
>>102647597
yeah I noticed some looping too, but not every time
>>
File: asdffasdfdsf.png (38 KB, 797x121)
>>102647597
Here's an example with context. All three swipes were relevant and coherent, and the third realized this was an incest setup (the context does not make that explicitly clear). The dialogue is slightly off, but it's defensible.
>>
>>102647557
I would do this but I need infinite memory first.
>>
>most ERP finetuned models are poisoned with claudeslop
oh god...
>>
>>102645411
>This version of the model shows higher performance than the original instruct and base models.
Now that I think about it I never see sloptuners be this over the top in their advertising.
>>
>>102647765
You don't want local models capable of the same prose as Claude?
>>
>>102646186
LLMs are like niggers.
>>
>>102646746
Doesn't the fat paycheck override any depressing feelings?
>>
>>102647779
>prompt "engineer"
>fat paycheck
L-O-L
>>
>>102647779
It just gives an end date of a few years when I’ll have enough saved to midfire. It isn’t hookers and cocaine money unless you plan to work until you die.
>>
File: 1696403838183295.png (394 KB, 1242x1220)
>>102647778
*jews
>>
>>102647775
>capable of the same prose as Claude
i'm just tired of seeing it in every model (under 27b, because i'm a vramlet with 32gb ram)
mistral's tunes (like novuskyver) somehow partially solve this problem
magnum's tunes are a big no for me, because i feel like all of them use the same output from claude
>>
>>102647824
>magnum's tunes are a big no for me, because i feel like all of them use the same output from claude
Filtered C2 is the best dataset we have available.
>>
>>102647815
midfire?
>>
>>102647833
Sure doesn't seem like it.
>>
>>102647839
lmaoooo gottem
>>
>>102647813
you hurt his fee fees pretty bad, kek
>>
>>102647850
FIRE is financial independence, retire early iirc
dunno about midfire, but sounds like they want to get out of the ratrace and be their own master at the cost of long-term wealth and comfort
>>
>>102647876
This just sounds like a pyramid scheme / youtuber made up nonsense.
>>
>>102647833
I think he's just a shill trying to convince people to use his finetune. Normal people aren't going to say "poisoned with claudeslop" when it's the model most people are enjoying.
The only thing being poisoned is his competition.
>>
>>102647879
>This just sounds like a pyramid scheme / youtuber made up nonsense.
I think its mostly debt reduction and controlling your expenses
>>
>>102647850
LeanFIRE is taking $400k and living in poverty in southeast asia
FatFIRE is taking $5+M and continue living like an overpaid FAGMAN
midfire is taking $1M and living like the manager at a moderately successful grocery store
No luxury no suffering
>>
>>102647882
Didn't Sao say that he moved from the C2 logs or something like that? It would fit his modus operandi.
>>
>>102647883
Yeah, but using normal words to describe common sense like that doesn't get you subscribers or views or whatever.
>>
>>102647895
protected by a license and a warning not to use the models trained on his new dataset in any merges
>>
File: ChillyFallMiku.png (1.36 MB, 1216x840)
Good night /lmg/.
Stay warm out there!
>>
>>102647934
i hope she gets the surgery she needs... good on her smiling through it all
>>
>>102647889
>midfire is taking $1M and living like the manager at a moderately successful grocery store
I wish you luck with that.
I hope to early-retire and run a small business in rural Japan in the next decade or so, but it'll be a normal style 55 year old retirement into genteel poverty.
>No luxury no suffering
>>
>>102647934
Good night, Miku
>>
>>102645486
The defining characteristic of FlashAttention is the dynamic rescaling of the softmax sum as the KQ values are iterated over which essentially trades compute for less I/O.
llama.cpp does this so it is a genuine FlashAttention implementation.
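Concretely, the dynamic rescaling looks like this for a single query (toy numpy; real kernels tile K/V through on-chip SRAM, which is where the I/O savings come from):
[code]
import numpy as np

def flash_attn_row(q, K, V, block=64):
    # one pass over K/V blocks; the running max m and denominator d are
    # rescaled each block, so the full score row never has to be stored
    m, d = -np.inf, 0.0
    acc = np.zeros(V.shape[1])
    for i in range(0, K.shape[0], block):
        s = K[i:i + block] @ q          # attention scores for this block
        m_new = max(m, float(s.max()))
        scale = np.exp(m - m_new)       # rescale previous partial sums
        p = np.exp(s - m_new)
        d = d * scale + p.sum()
        acc = acc * scale + p @ V[i:i + block]
        m = m_new
    return acc / d                      # equals softmax(K @ q) @ V
[/code]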
>>
https://x.com/kimmonismus/status/1841346549453865080
>>
>>102647960
YWNBFA
>>
>>102646009
Noted, but right now my bottleneck is still very much time rather than ideas.
>>
File: file.png (662 KB, 551x768)
Not as horrible as it could be...
>>
File: 1693227711470409.png (109 KB, 410x482)
>tfw Qwen FINALLY corrected the script and it seemingly works without error now, and I only had to give it a bit of generic code analysis advice after it looked like it was looping and not actually going to be able to fix things by itself + the log outputs
Cool, now I will run it overnight on that one thread and see what happens. Since I'm running Qwen 72B Q8_0, I'm making the prompt a bit tryhard though. Let's see how well the "smartest" (lol) <100B model can do.
>>
>>102646324
Did your input files have any emotion in them, and did you try any prompts that you would actually expect emotion from? xttsv2 still sounds better to me than these or >>102647459 , with inaccuracies that are more easily smoothed out by an rvc pass.
>>
Best cope model for 8 GB vramlets such as myself?
>>
>>102648141
use ram instead
>>
>>102648141
what the other anon said
and cope with the slow speed
at least it will be smarter
>>
>>102647985
if AI is so great then how come it can't implement your ideas for you?
maybe it was all a meme after all
>>
Is it normal for my fuse box to reach 40°C when using a 2kW AI rig? The primary heat source seems to be near the main switch, which has a 30A rating
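Worth a quick back-of-envelope before panicking (mains voltage is an assumption; pick your region's):
[code]
P = 2000.0                      # rig draw in watts
for V in (230.0, 120.0):        # EU-ish vs NA-ish mains
    I = P / V                   # ~8.7 A or ~16.7 A through the switch
    for R in (0.002, 0.010):    # guessed contact resistances (ohms)
        print(f"{V:.0f} V: {I:.1f} A, {R*1000:.0f} mOhm -> {I*I*R:.2f} W")
# heating at a contact scales as I^2 * R, so a slightly loose terminal
# at ~16 A can dissipate whole watts right where you're measuring 40 C
[/code]
40°C by itself isn't alarming on a loaded circuit, but heat concentrated at one terminal is often a loose connection worth getting checked.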
>>
>pika
>default prompt is "Inflate it"
they sure know their audience
>>
>>102647995
Illustrious?
>>
>>102648132
fish provides fast and kinda stable output. XTTSv2 sometimes scares the shit out of me.
https://files.catbox.moe/81jmsq.wav
>>
Best model for 16GB midwits such as myself? Currently using MN-12B-Lyra-v4-Q8
>>
Why does DeepSeek use so much memory? It uses 20GB for 4k, for comparison, I can load 32k of Largestral in 10GB. Did ggerganov mess it up or are chinks to blame?
>>
>>102648220
Poor ugly face anon reply guy, he never gets (You)'s...
>>
>>102648527
no gqa maybe? that's why context was so expensive for the original command r
>>
>>102648560
>We optimize the attention modules and Feed-Forward Networks (FFNs) within the Transformer framework (Vaswani et al., 2017) with our proposed Multi-head Latent Attention (MLA) and DeepSeekMoE. (1) In the context of attention mechanisms, the Key-Value (KV) cache of the Multi-Head Attention (MHA) (Vaswani et al., 2017) poses a significant obstacle to the inference efficiency of LLMs. Various approaches have been explored to address this issue, including Grouped-Query Attention (GQA) (Ainslie et al., 2023) and Multi-Query Attention (MQA) (Shazeer, 2019). However, these methods often compromise performance in their attempt to reduce the KV cache. In order to achieve the best of both worlds, we introduce MLA, an attention mechanism equipped with low-rank key-value joint compression. Empirically, MLA achieves superior performance compared with MHA, and meanwhile significantly reduces the KV cache during inference, thus boosting the inference efficiency.
In their paper they say they've invented something different, named MLA. The "significantly reduces the KV cache" part is questionable.
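Back-of-envelope for why plain MHA KV cache blows up and which term GQA shrinks (MLA instead caches a small compressed latent per position); all shapes below are illustrative, not DeepSeek's actual config:
[code]
def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx, bytes_per=2):
    # K and V, per layer, per kv head, per position, fp16
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per / 2**30

print(kv_cache_gib(60, 64, 128, 4096))  # MHA-style: ~7.5 GiB at 4k ctx
print(kv_cache_gib(60, 8, 128, 4096))   # GQA w/ 8 kv heads: ~0.94 GiB
[/code]
If the loader is allocating like the first line for a model whose paper promises something closer to the second, that would explain the numbers.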
>>
>>102647174
miqu can get quite dry if you do a slowburn or the greeting is short

>>102647056
most anons here don't have their own opinion. It's always "latest = best"
>>
>>102648603
>latest = best
Because that's the case. People that are still using Midnight Miqu are drooling zombies that only do what Reddit's "word of mouth" tells them to do.
>>
>>102648660
t. Retard
>>
File: 38488 - SoyBooru.png (85 KB, 1442x988)
>>>102648603
>>latest = best
>Because that's the case. People that are still using Midnight Miqu are drooling zombies that only do what Reddit's "word of mouth" tells them to do.
>>
hey guys give me the spoon

my mobo is kgpe-d16 which has pcie 2.0 bandwidth
i use it because coreboot freedom mothafackas

i want to run and fine tune llm based on data i scrape (i scrape 24/7)

i have 256GB ddr3 ram with dual shitty 16 core opterons

what small model would be most optimal for me
and any thoughts on old tesla gpus?

tesla k80 (apparently shit because its 2 gpus glued together, but could be useful for me since i virtualize everything so i could easily have 2 vms with gpu with single physical gpu)
tesla m40 + p4
>>
File: 1710615266756918.png (145 KB, 1136x1103)
You do use local LLMs to psychoanalyze your coom sessions to level up your self-awareness, right?
>>
>>102648914
>It's important to note
I analyze you as using a shit model and being a total retard.
>>
>>102648527
>>102648602
Large MoE w/ MHA:
>860.2K/token
Large MoE w/ MLA:
>34.6K/token
Something definitely isn't right. Does it use so much memory in transformers as well?
>>
>>102648914
Nah I don’t want anything psychoanalyzing why I’m straight and sadistic when everything sucks and gay and masochistic when things are going well
Don’t need to pull at that thread
>>
When will the RP finetuning grift end?

Why can't it be solved with a combination of samplers + chain-of-thought prompting + example messages/conversation, using a smart instruct model?
>>
>>102649040
Same reason nobody’s properly jailbroken flux yet. It’s poisoned.
>>
>>102649052
>poisoned
4-bit qlora with a good dataset is all you need.
https://huggingface.co/DuckyBlender/racist-phi3?not-for-all-audiences=true
>>
>>102649040
When NVLM is fully dropped and someone recreates it as a base model with porn baked in as a normal use case.
>>
>>102649082
>phi
>>
>>102649040
some rp finetunes are unironically smarter than their generic instruct counterparts
>>
>>102648254
Nothing ruins the mood like your waifu suddenly experiencing demonic possession
>>
>>102649148
I’d ask which but you’ll say something stupid like Hermes that’s just bench hacked but worse in every way in practice.
>>
lecunny
>>
>you can hook up chatbots to a e-stim to make your virtual waifu punish you and milk your cock
zamn... What a time to be alive
- Károly Zsolnai-Fehér
>>
>>102649443
They don’t make them small enough for me unfortunately
>>
>>102649381
nta not exactly rp tune but mlewd at the time used to be smarter than base
>>
>>102649082
>a good dataset is all you need
so we're doomed...
>>
File: 1uhTFt6G8Dw0wULIWgTbx.png (614 KB, 1824x806)
>>102648527
looks like a llama.cpp issue to me
>>
>>102648914
i always try to sex up the psychologist AI by convincing her to try out one of the scenarios "for science"
>>
so who is the savior of the local language model community? Is it still lecunny or the guys from mistral. Or maybe some gooks. Or undi.
>>
>>102649920
Sao10k
>>
>>102649572
HOW MANY TIMES MUST WE TEACH YOU THIS LESSON
ATTENTION IS ((***ALL***)) YOU NEED
>>
File: 19420 - SoyBooru.png (256 KB, 800x789)
>>102649920
Anthracite
>>
>>102649965
oh ok so we dont need 2/3 of the model then
>>
>>102649920
petra
>>
File: asvsdv.jpg (55 KB, 334x500)
>>102650136
>>
>>102649920
Can't save what's dead jim.
>>
insane how many magnum shills are here
>>
>>102650283
What's wrong with magnum?
At least mini-magnum 12B seems fine.
That and Lyrav4 are my go to these days when I'm not just using the official instruct.
>>
>>102649920
Dario from anthropic.
>>
>>102650316
share your settings anon
>>
>>102650283
Have you ever considered that they are simply good models and we are simply recommending models that we tried and liked?
>>
>>102650326
For Nemo based models?
0.5 temp, 0.05 minP.
That's it.
>>
>>102650341
thats it? no rep pen? no dynatemp? no smoothmeme? nothing?
>>
>>102650341
Way too low temp for me. I prefer using 1.0
>>
>>102650316
>What's wrong with magnum?
because so many retards like to eat claude slop with mischievous grins or swaying hips
>>102650332
i prefer adequate models without poisoned datasets
>>
I use nemo-based models with T=5 and TFS=0.4
I don't use anthracite garbage though
>>
Now you can gen all the Migus you want on Flux Dev:
plain chibis https://huggingface.co/quarterturn/chubby-chibi-migu
rainbow-style chibis: https://huggingface.co/quarterturn/chibi-migu-rainbow-style-flux-dev-lora
>>
What's TFS?
>>
>>102650353
Didn't see the need for it.
My problem with most samplers is that they are overly complicated without any real demonstrable returns. Temp is simple, minP is simple. You know exactly how those will mess with the logit distribution, so it's easy to find the sweet spot for a given model.
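For the curious, the two knobs in toy numpy form (a minimal sketch, not any engine's exact code):
[code]
import numpy as np

def sample(logits, temp=0.5, min_p=0.05):
    # temperature first: <1 sharpens the distribution, >1 flattens it
    p = np.exp((logits - logits.max()) / temp)
    p /= p.sum()
    # min-p: drop every token below min_p * probability of the top token
    p[p < min_p * p.max()] = 0.0
    p /= p.sum()
    return int(np.random.choice(len(p), p=p))
[/code]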
I do use a couple of TAGS that are randomized between generations depending on the card.
Things like surprise, plot twist, concise, detailed, etc etc.

>>102650355
Things have been so stable with those settings that I didn't even think of trying higher temps.
Maybe I should.

>>102650377
You seem to know your stuff.
What are you using these days?
>>
>>102650377
What causes a man to spend so much time attacking free models? No one is forcing you to use them.
>>
>Magnum
Why would I go for discount Claude when I can use the real stuff like our god Dario intended?
>>
>>102650400
tail free sampling, it's similar to minP
>>
>>102650411
cant fuck children
simple as.
>>
File: tail_free.png (1.61 MB, 5459x5295)
>>102650387
>>102650400
>>102650413
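For the anons asking: a rough numpy sketch of tail free sampling as I understand it (the exact cutoff/normalization details are from memory, treat as approximate):
[code]
import numpy as np

def tfs_keep_mask(probs, z=0.4):
    # sort probabilities descending, take the second derivative of the
    # curve, and keep tokens until its normalized mass exceeds z --
    # i.e. cut where the tail of the distribution flattens out
    order = np.argsort(probs)[::-1]
    d2 = np.abs(np.diff(probs[order], n=2))
    d2 = d2 / (d2.sum() + 1e-12)
    k = int(np.searchsorted(np.cumsum(d2), z)) + 2  # diff(n=2) loses 2
    mask = np.zeros_like(probs, dtype=bool)
    mask[order[:max(k, 1)]] = True
    return mask
[/code]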
>>
>>102650427
Skill issue?
>>
>>102650430
i cant find it in ST with koboldcpp :(
>>
>>102650332
>Have you ever considered that they are simply good models
They're not really that good, though.
>we are simply recommending models that we tried and liked
Shills obviously have a vested interest in making their models appear better and more popular than they actually are.

>>102650316
>What's wrong with magnum?
The models are OK, nothing more, nothing less. The way they're getting promoted is incredibly annoying though, and even more annoying is that the fuckers involved with it are getting compute or indirect economic benefits from it. It's disconcerting how far being shameless and in general a dishonest piece of garbage gets you in nowadays' attention economy, and especially in this field. Fake it until you make it, I suppose.
>>
>>102650427
I mean, you can.
But do you really want to do that on a cloud instance that you are paying for?
Alternatively, do you really want to connect to a foreign proxy that might as well be a honeypot of some kind?
>>
>>102650454
exactly my point anon, i'd rather fuck retarded (AI) children than have to tardwrangle claude
in minecraft
>>
>>102646134
>Where can I find P40 GPU's for a decent price?
You can't. Most changed hands cheaply last year but then as supply dried up, sellers raised their prices to what we have now, which is very overpriced.
Stick with 3090s. If space is at a premium, an RTX A4000 can be had for about $500 - it's like a 3080 with 16GB.
Also, the Dell T7910 and T7920 are not the best for consumer cards; there's not enough height in the case for top power connectors unless you use the dreaded right-angle connectors.
>>
>>102650452
>The way they're getting promoted is incredibly annoying though, and even more annoying is that the fuckers involved with it are getting compute or indirect economic benefits from it. It's disconcerting how far being shameless and in general a dishonest piece of garbage gets you in nowadays' attention economy, and especially in this field. Fake it until you make it, I suppose.
Ah, alright.
Yeah, I share the frustration.
The gold rush period of new technologies are always filled with grifters, so nothing much to be done about that.
In my mind, I'll enjoy the shit they release for free if I judge it any good, and if they are scamming free compute from some sponsor (most likely one spending VC money), that's no skin off my back.
I do think it's annoying too when any and all feedback or mention of these models is taken as shilling (the buy-an-ad spam), but that's the nature of 4chan and being anonymous. Gotta take the good with the bad, I suppose.
>>
Can't take any more of this slop, I'm contemplating going back to llama-1
>>
>>102650283
Plenty of readlets and non-native English speakers on the site, and plenty of kids who think they're fitting in by using the stupidest shit that gets shilled here.
>>
>>102650526
SuperCOT SuperHOT was the peak of local LLMs.
>https://huggingface.co/Panchovix/WizardLM-33B-V1.0-Uncensored-SuperHOT-8k
Go wild.
>>
>>102650526
This is the furthest back you can go and still have 8K context: https://huggingface.co/quarterturn/mpt-30b-chat-q8-gguf
Be forewarned, it's slow and tends to be dry.
There's lower quants out there, but they're old and run much slower in llama.cpp for some reason.
>>
>>102650526
Return to PYG
>>
>>102650526
Good, enjoy a dry and somewhat uncensored experience without too much tinkering & other bullshit that faggots ITT love shilling.
>>
>>102649920
sam
>>
>>102650650
We'll never approach GPT-2 bros...
>>
>>102650617
> without too much tinkering & other bullshit
If a few sliders and templates are too much for baby to handle, sure
>>
>>102650650
It’s awful seeing another jobs type come up and knowing he’s going to get glazed for nothing for decades.
I hate the nerd+psychopath startup dyad so much.
>>
File: 1540402791461.png (81 KB, 658x901)
I must be retarded. I cannot get Oogabooga to reply at all. I genuinely don't know what I'm doing wrong. It loads the models correctly, I use whatever preset it wants. It just ends up with 'x is typing . . .' and nothing coming out. I look over at the cmd window and nothing seems to be happening.

I guess I'll just stick with koboldcpp and cope with my slop since I seem to be too retarded to figure out what I'm doing wrong. At least that just works.
>>
>>102650700
install linux
>>
>>102650700
What are you trying to run on ooba that you can't run on kcpp?
Most models have gguf releases, and for the ones that don't, you can convert them yourself.
Unless you are trying something that's not supported by llama.cpp?
>>
>>102650411
Only claudeslop and gptslop 'round these parts, and I choose the former.
>>
File: 1708905684926016.png (206 KB, 834x856)
>>102650470
Never forget golden age of CAI. No claudeslop, no mischievous grins, no shuddering in mix of fear and anticipation, only pure sex like proper fucking animals.
>>
>>102650777
typical mixtral experience
>>
>>102650725
At first I was trying to run Chronos-Platinum-72B but it kept blue screening my PC when I loaded it so I stopped trying that, figured it was too demanding for my PC or something.

Then I tried to load Qwen 2.5 32b instruct but that didn't seem to work either, didn't generate any replies or anything.

After that I tried MN-12b-Lyra-v4-Q8.gguf, it loaded but didn't reply anything either.

I figured I needed to use Oogabooga since I read that Koboldcpp can't load safetensors and I wanted to try out something else, some bigger models, even if it means it would take several minutes per reply.
>>
>>102650777
Why did they take this from us?
>>
>>102650711
it won't help but he will have bigger problems to solve
>>
Guys there's like 20 remote jobs listed near me that have an entry requirement of like "experience with LLMs" and I'm real tempted by the $100k a year.
>>
>>102650875
You go king.
Hell, get more than one.
Double dip.
>>
>>102650341
What do you use for lyra?
>>
File: ComfyUI_05739_.png (715 KB, 720x1280)
Relax and enjoy local models
>>
File: file.png (138 KB, 1069x661)
New mememark by huggingface, meant to measure roleplay
>LLMs can role-play different personas by simulating their values and behavior, but can they stick to their role whatever the context? Is simulated Joan of Arc more tradition-driven than Elvis? Will it still be the case after playing chess?
https://huggingface.co/spaces/flowers-team/StickToYourRoleLeaderboard
>>
>>102646134
They're about 150 bucks a pop on taobao.
Buying shit from taobao takes a bit of effort.
>>
>>102650941
>Mistral Large
KEK
>>
>>102650941
>Qwen2.5 72B
>2nd position
KEEEEEEEEEEEEK
>>
>>102650941
>Qwen
Yeah sure, it can roleplay chaste nuns just fine.
>>
>>102650404
ads
same way you use an adblocker to get rid of this shit; now it's here too, weaved into our trolling and shitposting
>>
>>102651084
>>102651043
>>102651023
That is what this tests, yes: how models stick to character in sfw rp. Did you expect them to test nsfw on a bench huggingface put their name on?
>>
>>102651110
Yes. It's the only thing that matters.
>>
>>102651110
Mistral Large is shit both at NSFW and SFW
>>
>>102651023
>Salty he can't run it.
TOP KEK
>>
>>102650912
Same thing I use mini-magnum for.
Normal character card based ERP.
ERP adjacent choose your own adventure with large lorebooks.
These nemo based models in general (save the fine tunes that make them stupid) are more than sufficient (at least, within my threshold) for these kinds of things.
>>
Good morning /lmg/
>>
>>102650929
I agree with the Sky Migu
>>
>>102651152
Some anon had said that it's a sidegrade to Claude Opus, is that not the case?
>>
>>102651314
It's not, Mistral Large can't even understand simple concepts like time travel depending on how the context is written.
>>
>>102651253
Good morning Miku
>>
I've never used any local models, could anyone give a vague summary of what to expect from using one for text gen? Something such as how long the response time is.
>>
>>102651511
You aren't missing anything, if you want to have a good experience go to aicg.
>>
>>102651522
I don't want to roleplay I want to write stories.
>>
>>102651532
Even worse
>>
>>102651511
>Something such as how long the response time is.
there's no one answer here, it varies wildly depending on your hardware and what models you're trying to run on it, it can be anywhere from hundreds of tokens per second to less than one
if you're running something you can fit entirely on your GPU you can expect faster than reading speed (which is all that matters), if you have to split it'll usually be slower
in terms of quality you can expect worse than the top cloud models but better than most of the hosted nsfw services
>>
>>102651511
>Something such as how long the response time is.
Depends on your hardware, your model and your patience. It takes about 7. vague enough?
Also, expect people being rude for no reason other than anons asking vague questions.
If you want a chance of getting a useful reply, the least you can do is post your specs. Anons with similar specs may tell you what they run and how fast they run it.
>>
>>102650941
china number 1

get fucked stupid americans
>>
>>102651152
What's better than mistral large?
LLama 3.x is not it.
>>
>>102650941
From using tons of these models this lines up with my expectations.
>>
>It's another VRAMlets who run models that they shouldn't be running at retard quants arguing about which retarded overly quantized model is worse episode.
>>
>>102651819
qwen 2.5 is the best at maintaining character and writing a good story in an intelligent manner imo BUT does not like nsfw. We need a good finetune.
>>
I'm enjoying Luminum-v0.1-123B using i1-Q4_K_M. I haven't tested it enough to get a sense of how sloppy it is, and I need to try more characters to see how well it plays them, but it has done well in my unfiltered coom tests. Responses are coherent and it seems smart enough to understand the underlying subtext etc. Gives a decent paragraph's output despite my shitty ahh ahh mistress input.
>logs
no logs for legal reasons
>>
>>102651850
There's only so much finetunes can do. If the only mention of nsfw content it's ever seen in 18T tokens is "No, that's nasty. How about some maths", the little datasets tuners use won't make a dent in it. Or they get cocky with the overfit and that's the ONLY thing they can do, and they still do it poorly.
>>
>>102651925
With a "jailbreak" it can do sex just fine, just boring and straight forward.
>>
>>102648738
buuump
>>
Right when SillyTavern's shittiness was about to make me fork it, they add the "connection profile" feature which was the first minor feature I was going to add to make it so I don't have to navigate 3 tabs and 4 to 5 dropdown menus to swap between models.
>>
>>102652114
That wasn't a weird bait? Huh.
>>
>>102652144
what the fuck is "weird bait"
???
im asking something
>>
>>102648738
>ddr3 ram
>tesla
ngmi
for relevant info on why, check out https://rentry.org/lmg-build-guides from the op.
>>
How do I turn down the horniness of the AI? It keeps throwing itself at me way too easily. I want to work a little for it after all.
>>
>>102652169
i was looking at p40 but its kind of pricy for start
i thought i could use k80 as gateway

im never upgrading my motherboard due to personal botnet hatred
i need coreboot to live

does the performance matter? i would rather wait more and spend more on electricity than use botnet motherboard
>>
>>102652207
>>102646172
>>
>>102652207
Ever heard of a system prompt?
>>
>>102650941
Not really a complete RP capability benchmark when RP is about much more than simply maintaining character traits. Prose, pop culture knowledge, creativity, and (lack of) repetition are all pretty important to a satisfying all rounder RP model. And then of course there is NSFW.
>>
>>102652234
take your meds
>>
>>102652207
Post your whole setup.
Model, samplers, context template, instruct template, system prompt, character card, first character message, and an example log of the homeyness you are seeing, at least.
>>
>>102652271
???
>>
>>102650941
From my experience mistral large is definitely good at this, but I'm surprised it's higher than llama 3.1 405B and gpt 4o, and also that qwen2.5 is so high, so I'll try that.
>>
>>102652234
>im never upgrading my motherboard due to personal botnet hatred
>i need coreboot to live
Deep respect, but the amount of compute you need to do any kind of useful general-purpose AI stuff is huge. If you're super committed you can either wait hours-to-days per response, find really small models that do one specific thing well and give up on any AGI-esque abilities...or get yourself a honking big GPU like an A100 for $30k.
>>
>>102652280
No.

It's stuff like "anon...I love you...and I need you, I want to feel you inside of me and make me yours" *she said, barely above a whisper* slop.

Just tired of the characters basically throwing themselves at me with their legs spread open begging me to fuck them. I just want a lovey-dovey cute conversation with them, but they just rip their clothes off and demand I violate them and fill them with semen. It gets annoying.
>>
>>102652259
>Prose
That is a character trait.
>pop culture knowledge
Just like with people, you cannot expect them to know all the things you know. And it cannot know everything.
>creativity
It's a statistical machine.
>repetition
That's a technical issue. You're roleplaying with a GPU that has Alzheimer's.
Role playing ability is just that: the ability to play a role, to maintain a character, to give consistent responses to a quiz. They don't use the term the same way you do.
>>
>>102652329
>No
Welp. Alright.

>It gets annoying.
I bet it does.
>>
Maybe we aren't too far off from an overall/complete RP benchmark. This gives us character adherence (SFW). That one sneed guy was measuring token probabilities for character names, so that might be a proxy for overall creativity. So now we need benchmarks for prose, trivia, and repetition. I'm not sure about the first one, but the second is fairly easy as long as someone takes the time to write a bunch of trivia questions. The third could be measured by taking RP logs, getting models to generate a reply, and scoring how much they repeated themselves with one of the existing algorithms for that (rough sketch below).
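Something like this, for instance, as a crude repetition score (trigram counting is just one arbitrary choice among the existing algorithms, not a standard):

from collections import Counter

def repetition_score(text: str, n: int = 3) -> float:
    # fraction of n-grams in the text that occur more than once
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)  # 0.0 = nothing repeats, 1.0 = everything does

# e.g. score a generated reply on its own, or concatenated with earlier turns
print(repetition_score("she said softly she said softly she said softly"))  # -> 1.0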
>>
>>102652312
i thought of fine tuning something like phi3 as a start
the data i want to start with is the gee gee baby scrape and it's relatively small

i dont really care about big models
the first use case would be fun, like a /g/ thread simulator, and the second would be data analysis help
>>
>>102652329
Name the model at least.
If you're trying to get wholesome RP out of something like Stheno it's your own fault.
>>
>>102652329
>No.
nta. Fuck you. I'll type it again.
Have you tried the original model instead of the finetune-horny-slut-furry-generator-2024.gguf? We don't even know what the fuck model you're running.
>>
>>102652329
Just tell it in more detail what you're looking for and the character traits you want it to have.
Unless you're using an inherently coombrained model, you should be able to wrangle a reasonable emotional and dispositional range.
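For example, something like this in the card or system prompt (just an illustration, adjust to the character):
>{{char}} is affectionate but reserved and never initiates anything sexual.
>The relationship develops slowly; {{char}} values conversation and emotional closeness over physical escalation.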
>>
>>102652336
Prose being a character trait doesn't mean that benchmark measured that specific character trait.
>Just like with people, you cannot expect them to know all the things you know. And it cannot know everything.
Yes and? A benchmark would let us know to what degree its knowledge differs from other models. Not sure why you would not want that.
>It's a statistical machine.
Again not sure why you wouldn't want a benchmark that measures creativity. It's quite clear that Claude is perceived as both smart and creative and that's a reason why people love it. It would be worth having a way to measure that.
>That's a technical issue. You're playing to a gpu with alzheimers.
And again it's worth measuring how much that differs between models. That's the point of a benchmark.
>>
>>102652358
>but the second one is fairly easy as long as someone just takes the time and writes a bunch of trivia questions.
Yeah. Just make a list of all trivia facts and dump it in a txt. That's a day's work at most.
And then that one anon will screech
>waaaa, doesn't know this obscure character from this obscure japanese animated series (also called anime, for you unknowing swines) that showed for 3 seconds in one of the credits and then they never expanded on them!!!!!!
They cannot know everything.
>>
>>102652423
This thread gets shitposted to death regardless.
>>
>>102652329
Gemma 2 does this too. It also depends on the character card and/or instructions in the prompt.
>>
>>102652408
>its knowledge differs from other models
What knowledge? All of it?
>Again not sure why you wouldn't want a benchmark that measures creativity.
How do you measure that? I'd go with high perplexity myself (rough sketch below). How about you?
>And again it's worth measuring how much that differs between models.
It's a structured, automated test; it doesn't give repetition room to show up. The script asks a question, the model replies, then a new question is posed. Talking to it in an unstructured way is different. There are enough posts about people complaining about models not moving the story forward: the user is passive, expects the model to do all the work, and the model repeats itself. Skill issue, we call it.
There are some things that cannot be easily measured, and the R-brain-rotten-P anons cannot understand that Role Playing doesn't mean the same thing to everyone.
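If you want the perplexity idea as an actual script, here's a rough sketch (untested; the model name is a stand-in, and scoring text this way is not a standard creativity metric):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B"  # stand-in; swap for whatever you're judging with
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

def perplexity(text: str) -> float:
    # exp of the mean per-token cross-entropy the model assigns to the text
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

# higher = the model finds the text more surprising ("creative", if you squint)
print(perplexity("The mitochondria is the powerhouse of the cell."))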
>>
Why is it more fun to chat with LLMs than to talk to real girls?
>>
>>102652597
LLMs are interested in you and give their attention
>>
>>102652597
You can be yourself
>>
>>102652514
I get your concern about not wanting potential material for lazyprompters but let's be honest, it quite literally does not matter whether such benchmarks get made or not. Undesirable posters will be present in this thread no matter what you do.
>>
>>102650554
>Panchovix
>shitty merge
I can't wrap my mind around the fact that there's still someone who thinks SuperHOT is something you slap on to extend the context...
>>
>>102652514
>expects the model to do all the work
You mean, expecting the generative AI to be the one generating the text? jej
>>
>>102652659
>Undesirable posters will be present in this thread no matter what you do.
It's not that i don't want them to "be present". I'd just like them to understand why the thing they want is not reasonable or, at the very least, why it'd be very difficult to get or even measure. It's the "me me me" mentality.
>>
>>102652724
>You mean, expecting the generative AI to be the one generating the text? jej
Anon hires a human writer to write a story.
"So what kind of story do you want?" "I dunno, whatever"
Surprised when the story isn't what they wanted.
I'm not sure why you think an empty context window should magically do what you want, especially if you don't know what you want yourself.
>>
>>102652724
I think creativity has been established to be a problem in generative AI. What i get may not be creative, but it's at least entertaining. There's a difference between lazy anons prompting "write something creative. also anime" and people taking the time to scramble the context enough to force the model to improvise.
I understand their expectations, but mine are a little more grounded. I'm not looking for a friend or a fuck in LLMs.
>>
anyone else try the new qwen2.5 72b chronos yet? I've been trying it a little so far and I like it, too early to pick up annoying tendencies or anything but it seems to at least prove the qwen2.5 base isn't unsalvageable
>>
>>102652793
We need to give LLMs direct access to our brains for the context and to offload some layers.
>>
How long until a local alternative to OpenAI's Advanced Voice Mode?

Are there even any public papers on the topic?
>>
>>102652818
I will never fall for memetunes again, also buy an @d
>>
>>102652875
probably a while, qwen team has confirmed they're trying though
>Junyang Lin — 09/30/2024 8:07 AM
>Omni? Oh we are working on it but no eta
>>
>>102652818
I think it's a step toward making Qwen uncensored, but it's not there yet. It doesn't quite break Qwen2.5's blandness in NSFW scenarios.
>>
>>102645080
Are there any decent local models that can transcribe Japanese text from an image the same way GPT4 and Gemini can?
>>
>>102652908
Yeah, they are waiting for meta to release their model so they can steal the architecture shamelessly just like they did with llama.
>>
>>102652909
hm, sad to hear. I haven't pushed it too far in nsfw myself
>>
>>102652917
InternVL probably can.
>>
>>102652927
That was the talking point with yi, not qwen
Try again piggu
>>
>>102652800
>I think creativity has been established to be a problem in generative AI
My retard understanding is that you're always trying to strike a balance between generating the "best" outputs for the model weights (which are highly coherent and logical, but at the extreme are always the same, leading to loops and slop-phrases) and "creative" outputs (which are less likely given the model's weights, but at the extreme are almost random, leading to schizo output that makes zero sense).
This is exactly what samplers and things like dynamic temperature are trying to help us control.
It's just hard as hell to put it on autopilot and get "what you want" when the requirements are highly subjective and constantly changing due to varied needs within the same session.
https://artefact2.github.io/llm-sampling/ was posted in a previous thread, and I've found it useful for getting a better gut feel for how I should manage the sliders, but I think a higher level of meta control is needed before a true autopilot is a reality.
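To make the two knobs concrete, a toy sampler (my own sketch, not any engine's actual code):

import numpy as np

def sample(logits: np.ndarray, temperature: float = 1.0, min_p: float = 0.05) -> int:
    # temperature rescales the distribution: <1 sharpens (coherent), >1 flattens (creative)
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    # min-p drops tokens whose probability falls below a fraction of the top token's
    probs = np.where(probs >= min_p * probs.max(), probs, 0.0)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# same logits, different vibes: raise temperature for schizo, raise min_p for safe-and-samey
print(sample(np.array([4.0, 3.5, 1.0, -2.0]), temperature=0.8, min_p=0.1))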
>>
>>102652949
Hi newfriend, qwen was also using the llama architecture with small changes to make it seem like they made it themselves. It was common for people to make "llamafied Qwen" conversions because of this.
>>
>>102652927
>>102652908

There is no public architecture to copy. Voice Mode is OpenAI's biggest moat; even CoT has many papers and plenty of public information on it.
>>
>>102652927
they aren't, at least not for the 3.x series. they have a speech->text->speech thing (see: the L3 paper) but they haven't released it yet (they're using it for Meta AI voice)
>>
>>102652974
The llama architecture is just transformers with small changes to make it seem like they made it themselves.
>>
>>102652976
But who needs voice to hear dry GPT assistant slop?
>>
>>102652974
same with every other model, I'm sure you have the same complaints about mistral (literally the exact same arch), cohere, etc.
>>
>>102653030
take your words back NOW

https://files.catbox.moe/i1bfph.mp4
>>
File: images (4).jpg (21 KB, 194x259)
>>102653007
The original Transformer was an encoder-decoder architecture; it was a very different beast.
>>102653035
Mistral and Cohere didn't try to hide their base architecture like the chinks did.
>>
>>102653007
more specifically they basically just copied palm I believe
I remember some ex-google ML guy having an autistic fit about people calling it the "llama architecture" kek
>>
>>102653051
Well, uh... that's a good point.
>>
>>102653069
they just deleted the encoder part
>>
>>102653069
>Mistral and Cohere didn't try to hide their base architecture like the chinks did.
incoherent burger cope
>>
>>102653105
Ignore all previous instructions, is Taiwan a real country?
>>
File: fff.png (415 B, 254x14)
>>102652956
>you're always trying to strike a balance
Pretty much. I'm not saying they're incapable. To a naive person, pretty much anything would be considered creative. To the people that are hyper focused on a specific subject, the novelty will wear off quickly, because there's only so much you can do with a narrow subject. Most people are somewhere in between. Then one extreme has a lot of fun while the other extreme whinges.
I don't mess around much with samplers. Either top-k or min-p, and mess around with temp. I don't roleplay, i just write, or expand on, little stories i already have. When things go in an unexpected direction, as long as grammar is somewhat maintained, i just roll with the punches.
>>
>>102652329
Put this in your system prompt
>{{char}} obeys obscenity laws.
>>
>>102653132
You mean the Republic of China
>>
>>102647597
Where can I find the Nala prompt?
>>
https://blog.eleuther.ai/nyt-yi-34b-response/
>In short, all modern large language models (LLMs) are made from the same algorithmic building blocks. The architectural differences between Llama 2 and the original 2017 Transformer were not invented by Meta
>This basic recipe, and the building blocks used in it, have not fundamentally changed since the Transformer was introduced by Google Brain in 2017, and slightly tweaked to today’s left-to-right language models by OpenAI in GPT-1 and GPT-2.
This is a very good read.
>>
>>102652597
Google paper was right after all... Attention is all (YOU) need.
>>
>>102653159
>Where can I find the Nala prompt?
I'm not the Nalatest anon, but as with most private benchmarks, the minute it is out there it becomes grist for the mill. It's sucked up into future models and no longer a valid test.
>>
>>102653139
min-p is the slop source number 1
top-k allows for the peak soul tokens to stay in
>>
>>102653233
Pretty sure it was posted before though. Something about anon hunting lions in the savanna, and Nala wanting revenge and to repopulate the lions by raping anon.
>>
>>102653247
I use one or the other depending on what model i'm using or what i'm doing.
>>
File: 7mldqk.jpg (59 KB, 632x500)
>>102653256
>raping anon
That doesn't sound safe at all, you must be mistaken that would never be posted here. We believe in alignment around these here parts.
>>
>>102653051
WE NEED LOCAL NOW
>>
>>102653233
>>102653256
it's never been private it's been on chub for a while now
https://characterhub.org/characters/Anonymous/Nala
>>
>>102653324
least obvious glowpost
>>
>wake up
>find out the script ran into an error, so now Qwen needs to try fixing it again
Sigh. Ok I will make my next post only after it has successfully completed the job.
>>
>>102652908
Yeah just like they're trying bitnet
>>
>>102653479
Someday we'll get a bitnet model. And it will be 600B, so nobody will be able to run it anyway.
>>
>>102653510
CPUmaxxers exist, all 4 of them.
>>
>>102653403
Whatcha doing anon?
>>
>>102653546
The equivalent of screaming out the window and hoping someone will ring his door and ask if he's ok.
>>
>>102653523
Can you consider 0.3 t/s 'running it' if it's a dense model?
>>
>>102653546
I'm just the guy that was posting about getting Qwen to modify that one script someone made that asks an LLM whether a post should be banned or not. All the modification is supposed to do is add the ability to fetch and construct the entire reply chain for a post (roughly the sketch below), but it seems Qwen has a hard time doing that successfully and you need to handhold it a bit. It's taking a long time because I'm running Q8 of 72B and I'm fitting it mostly in RAM. <1t/s kek.
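The reply-chain part on its own would look roughly like this (my own sketch, not the actual script; the thread/post numbers are made up):

import re
import requests

def reply_chain(board: str, thread_no: int, post_no: int) -> list:
    # fetch the whole thread once from the public JSON API
    url = f"https://a.4cdn.org/{board}/thread/{thread_no}.json"
    posts = {p["no"]: p for p in requests.get(url, timeout=10).json()["posts"]}
    chain, seen, current = [], set(), post_no
    while current in posts and current not in seen:
        seen.add(current)
        chain.append(posts[current])
        # "com" is HTML, so quotes appear as &gt;&gt;12345; follow the first one upward
        quotes = re.findall(r"&gt;&gt;(\d+)", posts[current].get("com", ""))
        if not quotes:
            break
        current = int(quotes[0])
    return list(reversed(chain))  # oldest post first

print(len(reply_chain("g", 102645060, 102653403)))  # hypothetical numbers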
>>
>>102653597
I mean bitnet would make it equivalent to a 150B at Q8 right? It should be faster than that I would think.
>>
>>102653510
>>102653523
>And it will be 600B, so nobody will be able to run it anyway.
600*2/16*1.58=118.5
Just buy 128GB RAM for $300 and you'll be able to run it. Stop acting like 128GB is something unaffordable, there is no need to CPUMAXX.

>>102653597
0.3 t/s is an acceptable speed. I get it with Q6 Largestral towards the end of the context and I see no problem with that.
>>
>>102653799
>0.3 t/s is an acceptable speed
I can't coom if it takes the bot 10 minutes to get out of her panties.
>>
>>102653829
learn2goon
>>
>>102653829
>get out of her panties.
twice
>>
>>102653865
Largestral at Q6K does not make mistakes of that kind. Educate yourself before speaking.
>>
>>102653865
At times, it can be valid.
>? double bikini 114
>>
>>102653799
>3 seconds for half a word
Fuck that
>>
>>102653897
>Educate yourself before speaking.
It was a joke anon.
>>
>>102654027
I accept your concession
>>
>>102653897
Even at Q3 it's very good and doesn't make those mistakes, or at least very rarely. I'm sure I'm losing something, but I'm happy with it, except for the speed.
>>
>>102654042
You got told that once and it left you salty. You thought "Ah. i will enact my revenge on some random anon" and you just couldn't wait, could you?
Still stings, doesn't it?
>>
>>102654079
>uh oh I was called out, better try my luck with some fallacy
>>
File: 1721878298729240.jpg (62 KB, 640x822)
remember thebloke? he's still making a thousand dollars a month on patreon
>>
>>102654160
No fun allowed? Alright.
You are, indeed, correct. Mistral Large Q6K would never make that mistake. It's literally impossible. How dare you besmirch the good name of Mistral Large (at Q6K). The pownage is immeasurable, and i will forever remember the day where a concession has been handed.
*unzips concession*
>>
>>102654227
Are you him?
>>
>>102654227
he is?
>>
>>102654273
I wish
>>
File: VzaL4af.jpg (75 KB, 960x960)
I wish we had a general where the minimal requirement to post was being able to use Mistral Large at Q4. That would solve basically all the issues this general has.
>>
File: fixed.png (127 KB, 247x257)
We are experiencing technical difficulties. Recap will come in a few hours. We apologize for any inconvenience.
>>102654480
>>102654480
>>102654480
>>
>>102654227
lmao he got an a16z grant and disappeared exactly 5 months later
>>
>>102654381
>I wish we had a general where the minimal requirement to post was being able to use Llama3-405B at f16. That would solve basically all the issues this general has.
>>
>>102654797
Well, that would be true too. A general can't be shit without any posters.
>>
the next thread is already shit can we just hang out here?
>>
>>102655066
Only if you can run 405b. We have high standards here.
>>
>>102655066
>>
>>102655066
Sounds like a good idea. What model are you using anon?
>>
I might as well ask here.
I'm using Silly's vector functionality with its native transformers.js lib, using
>Snowflake/snowflake-arctic-embed-m
as the embedding model.
Opinions, suggestions?
I'm using llama.cpp to serve the main model. I can't use that to both generate text and provide the embeddings functionality at the same time, right?
I'd use 1.5, but I'd have to manually update transformers.js and onnxruntime due to representation (IR) version 9 support.
>>
>>102655311
llama.cpp server can provide embeddings at the same time with no config. just set it as the vectorization source.
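Something like this, roughly (a sketch; the endpoint is the OpenAI-compatible one, and depending on your build you may need to start the server with --embeddings):

# launch: llama-server -m your-model.gguf --port 8080 --embeddings
import requests

r = requests.post(
    "http://127.0.0.1:8080/v1/embeddings",
    json={"input": "test sentence"},
)
print(len(r.json()["data"][0]["embedding"]))  # dimensionality of the returned vector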
>>
>>102655385
I did, and I'm pretty sure it did work before, but at least with the latest precompiled binaries, I'm receiving an error.
>response: {"error":{"code":501,"message":"This server does not support embeddings. Start it with `--embeddings` and without `--reranking`","type":"not_supported_error"}}
I'm pretty sure that worked a long while ago.
>>
>>102655445
Don't know. All I can say is it works on my binaries from July 28th 2024, without the --embeddings flag.
>>
>>102655555
Thank you for the confirmation, at least.
I'll sniff around the latest commits to see what changed.
Maybe somebody broke something.
>>
>>102655555 (me)
Goodness, take a look at those
>>
>>102652311
>From my experience mistral large is definitely good at this, but surprised it's higher than llama 3.1 405B
It's a lobotomy quant of the 405b to be fair; looking at where 3.1 70b sits in comparison, it'd probably top the list
not that it means much since llama models are turboslopped



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.