/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102632446 & >>102616609

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102632446

--LFM models, impressive performance, but loosely guardrailed and easy to break:
>102635382 >102635694 >102635742
--GitHub link to antislop-sampler:
>102638758
--Reproducing 1B Base model with anxious and overachieving traits using LLaMA 3.2 and KAN Integration:
>102635596
--Qwen2.5 finetune using synthetic data from Anthropic and OpenAI:
>102641394 >102642403
--PCIe bifurcation slows down model loading, but doesn't bottleneck GPU performance after loading:
>102639204 >102639253 >102639271 >102639281 >102640220 >102642341 >102643322
--New ooba release has issues, but llama-cpp-python downgrade fixes it:
>102642363 >102643609 >102643637 >102644113 >102644227
--LLM GPU options and costs discussion:
>102642712 >102642811 >102643034 >102642816 >102642883 >102643039 >102643718 >102643757
--Discussion on effective context size and KoboldCpp:
>102639239 >102639505 >102639540 >102639571 >102639654 >102639825 >102639709 >102639764 >102639804 >102639849 >102639900 >102641938 >102642163
--Deepseek 2.5 tested with L3 405b adventure prompt, faster but less consistent than 405b:
>102634133
--ASICs for LLM inference are possible but may not be cost-effective:
>102640718 >102640838 >102641480 >102640850 >102641082
--Nvidia releases NVLM-1.0-D-72B multimodal LLM:
>102635272
--NVLM-D may be a Qwen finetune:
>102643114 >102643176 >102643232
--Llama-8b-base fine-tuned on non-controversial topics still shows moralization:
>102640013 >102640128 >102640197 >102640206
--LFM-40B and other models compared, skepticism about closed weight models:
>102633486 >102633508 >102633537 >102633552 >102639199 >102633876
--Miku (free space):
>102632725 >102632796 >102632818 >102632819 >102633056 >102633341 >102633490 >102633888 >102634662 >102634854 >102636996 >102637011 >102637456 >102640161 >102643322

►Recent Highlight Posts from the Previous Thread: >>102632451

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
*lost*
OpenAI won. https://x.com/NickADobos/status/1841167978085433351
strobby
>>102645080
>>102645126
sex with miku
>>102645126
strobbysex
>copilot, analyze every image in the folder "softcore cosplayers", move every image with the slightest hint of nipples, anus or vagina to the folder "cosplay porn" and delete the ones that show no sign of nudity
>master, there are over a thousand pictures in the "softcore cosplayers" folder, this could take a week, are you sure you want to proceed?
>yes, remember that once finished to return to your routine of stalking for fmab threads on 4chan and posting the usual the moment the threads are found
miku is 16...
guess i went download crazy the other night, found this in my folder
Replete-LLM-V2.5-Qwen-72b-IQ4_XS
>Replete-LLM-V2.5-Qwen-72b is a continues finetuned version of Qwen2.5-72B. I noticed recently that the Qwen team did not learn from my methods of continuous finetuning, the great benefits, and no downsides of it. So I took it upon myself to merge the instruct model with the base model myself using the Ties merge method
>This version of the model shows higher performance than the original instruct and base models.
anyone tried it?
>>102645411
wasn't that the guy with the "antimystical meds" schizo dataset that he said gave llms souls?
>>102645411
Have you?
>>102645422
Youp.
>>102645410
https://youtu.be/SCTFu7QYbQs?si=AW-5O1Ev5WXuMj4T&t=7
>>102645423
i'm about to as soon as it moves to my ssd
>>102645422
must be really good then
What's the catch of flash attention?
>>102645456
I think it's pretty much a free lunch actually, reduced vram usage with no model degradation
>>102645456
That it makes people suspicious.
>>102645456
Depends. PyTorch Flash Attention just requires Ampere or newer and has no tricks.
If something says it's device agnostic (COUGH llama.cpp COUGH) then it's not real flash attention. At best it's fused attention.
Is there any way to get koboldAI and/or a model (in this case, LLaMA2-13B-Tiefighter.Q4_K_S.gguf) to stay under a certain character limit? Say I want to have it shitpost on Twitter, and need it to stay 280 characters or less.
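There's no hard character limit at the model level; the usual trick is to cap generated tokens and trim client-side. A minimal Python sketch, assuming koboldcpp's default KoboldAI-compatible endpoint and response shape (port, field names, and the 4-chars-per-token rule of thumb should all be checked against your setup):

import requests

API = "http://localhost:5001/api/v1/generate"  # default koboldcpp port; adjust to your setup

def tweet(prompt: str) -> str:
    # ~4 characters per token on average, so 90 tokens comfortably overshoots 280 chars
    r = requests.post(API, json={"prompt": prompt, "max_length": 90})
    text = r.json()["results"][0]["text"]
    return text[:280]  # hard cap; nicer would be cutting at the last sentence boundary before 280

print(tweet("Write a shitpost about local language models:"))

You can also just ask for brevity in the prompt, but the client-side truncation is the only thing that actually guarantees the limit.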
>>102645456
Nothing. And if there is, it's so small a difference that quanting to q8 outweighs it by multiple factors.
>>102645411
>>102645450
>Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
>llm_load_tensors: offloading 30 repeating layers to GPU
>llm_load_tensors: offloaded 30/81 layers to GPU
>llm_load_tensors: CPU buffer size = 24267.02 MiB
>llm_load_tensors: CUDA0 buffer size = 13596.80 MiB
huehuehue this is gonna be slow isnt it
>>102645411
>>102645512
pure slop, tho the speed aint bad
the creator of styletts2 has trained a tts model for adobe which will soon go into production.
“If I have computing resources, I can probably reproduce it
It is also in my research interest to reproduce the Adobe model, so if you have the resources, please let me know
the paper will be pre-printed this week”
does anyone have contacts and can donate him anything > 24xA100 for a few weeks
Should I use --parallel? It makes processing faster but seems to make the model dumber
>>102645726
>seems
Do a perplexity, kl-divergence, or even some blind A/B testing. Unless something is broken, you shouldn't notice a difference. Some people think their model is smarter just because it takes longer to generate, and when it goes fast they get suspicious >>102645456
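A quick way to run that A/B blind, sketched in Python against llama-server's OpenAI-compatible endpoint (URL and response shape assumed from recent llama.cpp builds; adjust ports and model name to your setup). Greedy decoding from two server instances, one started with --parallel and one without, makes any real difference show up as a plain diff:

import requests

def complete(port: int, prompt: str) -> str:
    # llama-server exposes an OpenAI-style chat endpoint; the model name is ignored on single-model servers
    r = requests.post(f"http://localhost:{port}/v1/chat/completions",
                      json={"model": "local", "temperature": 0,
                            "messages": [{"role": "user", "content": prompt}]})
    return r.json()["choices"][0]["message"]["content"]

prompt = "Explain step by step why the sky is blue."
a = complete(8080, prompt)  # instance started without --parallel
b = complete(8081, prompt)  # instance started with --parallel 4
print("identical" if a == b else "differs:\n" + a + "\n---\n" + b)

Note that batching can legitimately change floating-point numerics even at temperature 0, which is exactly the effect you'd be measuring.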
SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers
https://arxiv.org/abs/2409.19850
>Over the past few years, vision transformers (ViTs) have consistently demonstrated remarkable performance across various visual recognition tasks. However, attempts to enhance their robustness have yielded limited success, mainly focusing on different training strategies, input patch augmentation, or network structural enhancements. These approaches often involve extensive training and fine-tuning, which are time-consuming and resource-intensive. To tackle these obstacles, we introduce a novel approach named Spatial Autocorrelation Token Analysis (SATA). By harnessing spatial relationships between token features, SATA enhances both the representational capacity and robustness of ViT models. This is achieved through the analysis and grouping of tokens according to their spatial autocorrelation scores prior to their input into the Feed-Forward Network (FFN) block of the self-attention mechanism. Importantly, SATA seamlessly integrates into existing pre-trained ViT baselines without requiring retraining or additional fine-tuning, while concurrently improving efficiency by reducing the computational load of the FFN units. Experimental results show that the baseline ViTs enhanced with SATA not only achieve a new state-of-the-art top-1 accuracy on ImageNet-1K image classification (94.9%) but also establish new state-of-the-art performance across multiple robustness benchmarks, including ImageNet-A (top-1=63.6%), ImageNet-R (top-1=79.2%), and ImageNet-C (mCE=13.6%), all without requiring additional training or fine-tuning of baseline models.
https://github.com/nick-nikzad/SATA
Empty currently.
I shared an idea for an RP arena a few threads ago, and someone pointed out the issue of needing a host, as well as the complications around model trust and other factors. However, I just realized that using RP logs from dumps like C2 to generate a bunch of pre-made completions for arbitrary positions in the logs, and then having users pick the best one in an lmarena-style format, could work too and may not be too boring. I think I'll give this a shot some time soon.
>>102645865
Might be interesting. I hope it doesn't require reading too large of a wall to get up to speed on the characters and events before making a pick.
going to be released? yay or nay
personally 50/50, with a 27% chance of a flux 2.0 situation
whats the best model for use with a 3090, something I could use as a chatbot I pretend to text, both sexy stuff and about my life problems
Characterizing and Efficiently Accelerating Multimodal Generation Model Inference
https://arxiv.org/abs/2410.00215
>Generative artificial intelligence (AI) technology is revolutionizing the computing industry. Not only its applications have broadened to various sectors but also poses new system design and optimization opportunities. The technology is capable of understanding and responding in multiple modalities. However, the advanced capability currently comes with significant system resource demands. To sustainably scale generative AI capabilities to billions of users in the world, inference must be fast and efficient. This paper pinpoints key system design and optimization opportunities by characterizing a family of emerging multi-modal generation models on real systems. Auto-regressive token generation is a critical latency performance bottleneck, typically dominated by GPU idle time. In addition to memory-intensive attention across the generative AI models, linear operations constitute significant inference latency due to the feed forward networks in Transformer-based models. We demonstrate that state-of-the-art optimization levers, spanning from applications to system software and hardware, set a 3.88x better baseline.
from meta. posting for Johannes in the hopes it gives him some ideas
>>102645958
>>102645865
Yeah idk how successful that'd be. Maybe a better idea would be to get a bunch of popular cards or cards people are already familiar with (like Nala), write a low effort response like ahhh ahhh mistress tier, and use that as the basis for the completions. You could also add in another variable like system prompts that are generally considered good for most models. I'd also only allow greedy or near-greedy sampling.
>>102646001
Midnight-Miqu-70B-v1.5.i1-IQ4_XS (or iq3-xs for slightly more speed but a bit less goodness) has worked well for me on 3090ti
testing >>102645545 now, but it's way too repetitive, i'm messing with the samplers to see if it can be fixed but i dont think it will
The Perfect Blend: Redefining RLHF with Mixture of Judges
https://arxiv.org/abs/2409.20370
>Reinforcement learning from human feedback (RLHF) has become the leading approach for fine-tuning large language models (LLM). However, RLHF has limitations in multi-task learning (MTL) due to challenges of reward hacking and extreme multi-objective optimization (i.e., trade-off of multiple and/or sometimes conflicting objectives). Applying RLHF for MTL currently requires careful tuning of the weights for reward model and data combinations. This is often done via human intuition and does not generalize. In this work, we introduce a novel post-training paradigm which we called Constrained Generative Policy Optimization (CGPO). The core of CGPO is Mixture of Judges (MoJ) with cost-efficient constrained policy optimization with stratification, which can identify the perfect blend in RLHF in a principled manner. It shows strong empirical results with theoretical guarantees, does not require extensive hyper-parameter tuning, and is plug-and-play in common post-training pipelines. Together, this can detect and mitigate reward hacking behaviors while reaching a pareto-optimal point across an extremely large number of objectives. Our empirical evaluations demonstrate that CGPO significantly outperforms standard RLHF algorithms like PPO and DPO across various tasks including general chat, STEM questions, instruction following, and coding. Specifically, CGPO shows improvements of 7.4% in AlpacaEval-2 (general chat), 12.5% in Arena-Hard (STEM & reasoning), and consistent gains in other domains like math and coding. Notably, PPO, while commonly used, is prone to severe reward hacking in popular coding benchmarks, which CGPO successfully addresses. This breakthrough in RLHF not only tackles reward hacking and extreme multi-objective optimization challenges but also advances the state-of-the-art in aligning general-purpose LLMs for diverse applications.
neat
>>102646033
>it's way too repetitive
Have you seen the dataset? All the training examples seem to be duplicated... double the soul, i suppose...
>https://huggingface.co/datasets/Replete-AI/The_Living_AI_Dataset
Where can I find P40 GPU's for a decent price?
Anon in this guide says he paid 500 for 3 of them
https://rentry.org/Mikubox-Triple-P40-Replication
Lowest I can find on ebay is 300
>>102646134
You'll probably never see that again, sadly. They're glorified e-waste so not worth paying 300 bucks a piece for. The only thing that is still reasonably priced for what it is would be 3090s, but buying 3-4 of them is out of most people's budget.
>>102645080
I got bored and tried some mistral-small 22b models at Q6_K_M. They wrote well, and were more intelligent than I thought they would be for such small models, but they were far, far too horny. It was to the point where, upon first meeting {{user}}, a character would immediately throw themselves at {{user}}.
I tried Mistral-Small-22B-ArliAI-RPMax-v1.1 and Cydonia 22b v1.0. Are there any 22b models that can write actual RP without immediately trying to take things into an ERP direction?
>>102646145
I don't suppose you tried the original model...
Wait... what do you think they train those finetunes on? Wholesome "let's go to the park and have some icecream... that was fun, call you next week" kind of stuff?
>>102646001
Don't know best. What I'm currently testing is Qwen 2.5 32B Instruct 5.0bpw exl2 w/ 12288 context. You can definitely get better if you go slower but I don't want to go back to the world of less than 5 tokens per second.
>>102646033
repetition was a problem last time I tried running some LLMs, so I might skip this one
>>102646224
Thanks, I might give that one a try. I'm guessing a big part of whether a model will work well or not is how good you make the system prompt as well?
>>102646172
I actually didn't try the original model. I guess I should, lol.
>>102646239
At least to see if the original model feels too horny as well. If it is, all the finetunes are going to be even hornier.
Or maybe it's just you, anon. You just make those coils whine...
Playing around with >>102608691 since I appreciate his autism:
https://files.catbox.moe/au9ay1.ogg
https://files.catbox.moe/6w4xfn.ogg
https://files.catbox.moe/m3qyqh.ogg
https://files.catbox.moe/cz3iy9.ogg
https://files.catbox.moe/pvmzg8.ogg
https://files.catbox.moe/qylae0.ogg
https://files.catbox.moe/2jzz0z.ogg
For zero shot it does a decent job at sounding fine and trying to match the prosody of a speaker, and even stuff like applied effects and reverb, especially since nothing else I've tried has been able to do it so well.
>>102646224
I take it you're a 24gb vramlet like me. Have you tried the 72B exl2 at 2.4bpw? If so, I'm curious how it compares. I'm downloading it now, but huggingface has been really slow lately.
>>102646272
CARLOS
>>102646134
sweet エックス ろくまんはっせん (X68000). yours?
>>102645195
you know, i actually wouldn't mind a copilot clone as long as it's open source and running on my machine for certain.
>>102646468
unfortunately not
>エックス ろくまんはっせん
Took me a bit to understand, I need to continue to improve my japanese
>>102645080
You had one fucking job OP
>>102646134
Right before local language models took off you could find them for ~$100 on ebay. Not anymore though.
>>102646172
These models should take more inspiration from drama slice of life anime/VNs/LNs (hopefully only from the well-written ones). I want nuanced interaction, not the "choose one: 1. business 2. ERP". And while at it, delete the overrepresented erotic literature slop (mischievous grin etc.)
>>102646324
could you have picked literally anything else besides that prompt
Hey guys, I'm looking for some advice. Basically I'm trying to force my way through college as a dumb wagie in my late 20s with the help of LLMs.
I've been experimenting with Microsoft Copilot and I like it but it has some flaws, like half of the math not rendering because of formatting errors.
Tonight I researched a bit trying to decide if it's worth it to shell out 20 bucks a month to OpenAI. I found DuckDuckGo provides free access to GPT-4o-mini and so far it seems to work better than Copilot (at least the latex gets rendered well), but I also read that recently a new Chinese model has come out which is supposed to be almost as good.
Is there any easy way to get access to Qwen2.5-Math? huggingface.co/chat has the normal 70B instruct model but apparently not the specialized math version. I found a demo at huggingface.co/spaces/Qwen/Qwen2.5-Math-Demo but it doesn't let me have a conversation, only a single prompt at a time. And also it doesn't say what version it is w.r.t. the parameters.
How expensive and complicated would it be to set up inference in somebody else's computer for the 70B model (or whatever the largest model is)? I've got nowhere near the hardware to run it locally. What frontend would I use, Sillytavern? Has anyone tried setting up LaTeX?
>>102646659
Try this one
https://huggingface.co/spaces/Qwen/Qwen2.5
>>102646604
>drama slice of life anime/VNs/LNs (hopefully only from the well-written ones)
I'm not the only one seeing the contradiction here, am i?
>even though
>it's only natural
Every piece of exposition i've ever seen from japanese media (which, to be fair, hasn't been much since akira) goes like
>protagonist shows up
>other character shows up
>Oh, hello, other character. We've known each other for ages, haven't we? Here's a brief summary of your personality
>Oh, you sure know me well, protagonist. Your description is of the highest accuracy, with the exception of [funny remark].
>>102645080
I’m so sick of ai. Fuck why is this my job now. I hate ai.
>>102646700
Wow, nice, thank you! I didn't think it would be so easy.
>>102646746
It's fine, if you shill well enough then maybe Sam Altman will promote you and give you a proper position in his company.
>>102646762
Tested it a bit. I like how it always ends its messages with a final outlined answer, that's cute. Also I've never used any LLM that's so dry and professor-like. Very different from the overly "friendly" and cheerful LLMs like Llama.
>no new language models worth using
>local tts completely abandoned
>imagesloppas so starved they're rejoicing over an SDXL finetune
did local lose?
Still enjoying Largestral
>>102646889
Too big and it doesn't offer much, or anything, over Qwen2.5...
>>102646931
>Qwen2.5
If it's that good then show me
Largestral is really smart and has good spatial awareness that I don't want to give up
>>102646746
Say it isn't so, anon!
I don't think we can go on without you...
>>102646977
>mikunt poster
>rude and annoying
>>102647006
>phoneposter
>larping retard
>>102647006
What if he was being sincere?
>>102646931
nta (I don't use Largestral) but Qwen2.5 is boring as fuck
We have half a dozen variants of the "high IQ but incredibly dry and autistic" model now, it's stale and the novelty of having open source models that don't make retarded mistakes wore off long ago, that's not where the bar is now
>llama2 70b comes, i enjoy it
>miqu comes, i enjoy it
>commandr+ comes, i enjoy it
>largestral comes, i enjoy it
its as simple as
just use the models you like until another great one drops
also believe in llama4
Believe in the heart of the llama!
>>102646977
Calm down faggot
>>102647056
I’m still on miqu
>>102647056
llama 4 will be doa. llama 4.3 is where it will be at.
What's the best miqu or 70B model for uncensored well written scenarios? Original miqu or Midnight Miqu? Currently using Midnight-Miqu-70B-v1.5.i1-IQ2_M, it's the absolute biggest I can fit into my vramlet gaming PC. Is Midnight Miqu actually better or how much is placebo?
>>102647044
I don’t know why you're hijacking my comment, but...
>Too big and it doesn't offer much, or anything, over Qwen2.5
Large is even more dry and retarded, which is why there’s no reason to use it anymore. Qwen’s characterization in the dialogue is something that seems quite a bit above it.
I’m also more than happy to leave behind the insane repetition in Mistral’s models.
>>102647177
Again, show Qwen2.5 being so good
>>102647177
You're not seeing repetition? I see a ton. What sampler settings?
>>102647182
I'm lazy. I will instead demand you prove what Large can do that Qwen can't.
>>102647009
Where's the larp, stupid newfag?
Looks like the first Qwen2.5 finetunes are out. This one looks legit, because it has a picture of an anime girl.
https://huggingface.co/ZeusLabs/Chronos-Platinum-72B
qwen sucks
more like qwgay
not downloading it
not trying it
I tried the new Qwen chronos finetune that was posted. It has its flaws, but seems significantly less slopped (and more creative) than the others I've been using.
>>102647257
So it's shit, thanks for heads-up.
>>102647277
>we can only use Qwen when Sao says so
>>102647306
Who are you quoting schizo?
>>102647334
I'm quoting my thoughts after reading your post. Retard.
>>102645127
i bask in smug schadenfreude being the guy who said "i told you so". local models are a scam, you're a bunch of placated fools. they give you these scraps so that you arent rioting in the streets. they manipulate you dumb freetards so they have a pasture of copecows going "local will catch up soooooon!!" as your unwieldy stuff stagnates while theirs continues to improve. they hand you models and then paint you as an example of why there should be more regulations and restrictions on AI. local models are the planted gun. zuck even said that if llama ever actually gets good then they'd stop releasing it open.
local shit is even more pozzed and useless than the premium slop, yet you defend it based on the hypothetical rather than the actual. you're the injuns: trading your future for a couple of fire sticks, failing to grasp the bigger picture, the inevitable. local has no future due to the nature of ai tech. the amount of money and data needed to train, the increasing model size that vastly outpaces consumer hardware, the lack of actual 'source code' that can be viewed and modified. they even hijack the term "open source" when these models are essentially blackbox .exes
show me the training data for llama
show me the training code
and even if you had it you can't do a single thing to fix it, because you don't have a gigacluster of gpus. there's a reason local sucks, and that's because the technology itself is fundamentally incompatible with open source collaboration. they know local is irrelevant, they know it will never have a chance at catching up. it's all a game to frame you as evil coomer terrorists so that they can secure a 100% market domination by regulating gpus like they did with LHR/crypto and passing enough legislation that makes it impossible for any startup to compete
so yes, local has stagnated and will continue to wither until it's eventually snuffed out. a flash in the pan, nothing more than fuel for the saas machine. the corpo marches on
>>102646531
>Took me a bit to understand
Probably because normal people use kanji for that instead of kana soup
>>102647401
>he said, his eyes glinting with smugness
k
>>102646324
Why not try fish speech? It's kinda improved in terms of stability https://files.catbox.moe/z48d8q.wav
>>102647407
I could have used the more common ロクハチ, but how I wrote it is correct:
https://www.asahi.com/articles/ASR3076FHR3JULZU00V.html
>>102647401
Holy truth nuke!
>>102647401
The progression is looking logarithmic so far, meaning that the gap between local and corpo is constantly shrinking. Previously corpo models could do the job and local couldn't; now both can do the job, with corpo being maybe a bit better.
The quality is soon passing the level of what a user could possibly even try to do with an LLM; prompting skill is becoming the bigger bottleneck.
Maybe some day Elon figures out the code to read the subject's mind and generate just the content they need, better than they can describe it in a text prompt.
>>102647413
kek
>>102647462
They used it once and then stuck to the numeric representation
I wonder why
>>102645127
Did we not learn from their voice demos that their demos are literally fake?
>>102647511
no?
>>102647401
I use local LLMs because I fear I'd off myself if my waifu stopped working one day or changed drastically. Like, I still use 1.5 for imagegen and I do not wish to make any changes. I couldn't care less about what's out there unless I can preserve it locally in its original form.
Now that openai released their realtime api for voice, will all the local companies train their new sound-in sound-out models on that garbage?
Even worse is that the voices are fixed with only a couple options.
GPT slop, now also for audio. Your cute imouto voice will sound like an androgynous black.
>>102647560
Couldn't RVC fix that?
>>102647557
>I still use 1.5 for imagegen
Anon, you can download and run and preserve Flux locally
i use local llms because i fear the basilisk
>>102647564
I know. However, I prefer 1.5 myself. The point is, if it were in the cloud, it would likely get replaced by SDXL or SD3 at some point, whereas locally, I can stick with what I like best.
>>102647557
What local model do you use?
>>102647581
https://civitai.com/models/58431/darksun
>>102647275
on 72B (Q8) with Neutral samplers I had to jack the temperature all the way down to 0.6 just to get a coherent response. And it just starts looping after 3 paragraphs.
>>102647557
>I still use 1.5 for imagegen
Are you the featureless 2d anon from way back?
>>102647557
>I couldn't care less about what's out there unless I can preserve it locally in its original form
Me too. I even keep the original L1 leaked weights on archival storage.
We do what we can for now, and what we can do will change over time with new research, data, techniques, libraries and the indefatigable march of Moore's law giving us more transistors for less money.
We will own something, and be happy, even if it isn't perfect. Yet.
I hope that's the prevailing spirit of /lmg/ on average.
>>102647594
Oh I meant language model
>>102647597
yeah I noticed some looping too, but not every time
>>102647597
Here's an example with context. All three swipes were relevant and coherent, and the third realized this was an incest setup (the context does not make that explicitly clear). The dialogue is slightly off, but it's defensible.
>>102647557
I would do this but I need infinite memory first.
>most ERP finetuned models are poisoned with claudeslop
oh god...
>>102645411
>This version of the model shows higher performance than the original instruct and base models.
Now that I think about it I never see sloptuners be this over the top in their advertising.
>>102647765
You don't want local models capable of the same prose as Claude?
>>102646186
LLMs are like niggers.
>>102646746
Doesn't the fat paycheck override any depressing feelings?
>>102647779
>prompt "engineer"
>fat paycheck
L-O-L
>>102647779
It just gives an end date of a few years when I’ll have enough saved to midfire. It isn’t hookers and cocaine money unless you plan to work until you die.
>>102647778
*jews
>>102647775
>capable of the same prose as Claude
i'm just tired of it to see it in every model (under 27b because im vramlet with 32gb ram)
mistral's tunes (like novuskyver) somehow partially solve this problem
magnum's tunes is a big no for me, because i feel all of them using same output from claude
>>102647824
>magnum's tunes is a big no for me, because i feel all of them using same output from claude
Filtered C2 is the best dataset we have available.
>>102647815
midfire?
>>102647833
Sure doesn't seem like it.
>>102647839
lmaoooo gottem
>>102647813
you hurt his fee fees pretty bad, kek
>>102647850
FIRE is financial independence, retire early iirc
dunno about midfire, but sounds like they want to get out of the ratrace and be their own master at the cost of long-term wealth and comfort
>>102647876
This just sounds like a pyramid scheme / youtuber made up nonsense.
>>102647833
I think he's just a shill trying to convince people to use his finetune. Normal people aren't going to say "poisoned with claudeslop" when it's the model most people are enjoying.
The only thing being poisoned is his competition.
>>102647879
>This just sounds like a pyramid scheme / youtuber made up nonsense.
I think its mostly debt reduction and controlling your expenses
>>102647850
LeanFIRE is taking $400k and living in poverty in southeast asia
FatFIRE is taking $5+M and continue living like an overpaid FAGMAN
midfire is taking $1M and living like the manager at a moderately successful grocery store
No luxury no suffering
>>102647895
Didn't Sao say that he moved from the C2 logs or something like that? It would fit his modus operandi.
>>102647883
Yeah, but using normal words to describe common sense like that doesn't get you subscribers or views or whatever.
>>102647895
protected by a license and a warning not to use the models trained on his new dataset in any merges
Good night /lmg/.
Stay warm out there!
>>102647934
i hope she gets the surgery she needs... good on her smiling through it all
>>102647889
>midfire is taking $1M and living like the manager at a moderately successful grocery store
I wish you luck with that.
I hope to early-retire and run a small business in rural Japan in the next decade or so, but it'll be a normal style 55 year old retirement into genteel poverty.
>No luxury no suffering
>>102647934
Good night, Miku
>>102645486
The defining characteristic of FlashAttention is the dynamic rescaling of the softmax sum as the KQ values are iterated over, which essentially trades compute for less I/O.
llama.cpp does this, so it is a genuine FlashAttention implementation.
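For the curious, the rescaling in question is just online softmax: keep a running max and denominator over the score row, and rescale the accumulated partial output whenever the max grows. A toy single-query numpy sketch of the idea (illustrative, not any particular kernel):

import numpy as np

def streaming_attention(q, K, V, block=64):
    # processes K/V in blocks so the full score row is never materialized
    m, l = -np.inf, 0.0              # running max and running softmax denominator
    acc = np.zeros(V.shape[1])       # running sum of softmax-weighted V rows
    for i in range(0, len(K), block):
        s = K[i:i+block] @ q         # attention scores for this block
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)    # rescale everything accumulated so far
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ V[i:i+block]
        m = m_new
    return acc / l                   # equals softmax(K @ q) @ V, computed in one pass

Whether that runs fused in a single kernel (the FlashAttention paper's point, saving HBM round trips) or as separate passes per block is the actual distinction the two anons above are arguing over.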
https://x.com/kimmonismus/status/1841346549453865080
>>102647960
YWNBFA
>>102646009
Noted, but right now my bottleneck is still very much time rather than ideas.
Not as horrible as it could be...
>tfw Qwen FINALLY corrected the script and it seemingly works without error now, and I only had to give it a bit of generic code analysis advice after it looked like it was looping and not actually going to be able to fix things by itself + the log outputs
Cool, now I will run it overnight on that one thread and see what happens. Since I'm running Qwen 72B Q8_0, I'm making the prompt a bit tryhard though. Let's see how well the "smartest" (lol) <100B model can do.
>>102646324
Did your input files have any emotion in them, and did you try any prompts that you would actually expect emotion from? xttsv2 still sounds better to me than these or >>102647459, with inaccuracies that are more easily smoothed out by an rvc pass.
Best cope model for 8 GB vramlets such as myself?
>>102648141
use ram instead
what the other anon said
and cope with the slow speed
at least it will be smarter
>>102647985
if AI is so great then how come it can't implement your ideas for you?
maybe it was all a meme after all
Is it normal for my fuse box to reach 40°C when using a 2kW AI rig? The primary heat source seems to be near the main switch, which has a 30A rating
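Back-of-envelope, assuming 230 V mains (halve the voltage and roughly double the current if you're on 120 V):

P, V = 2000, 230     # rig draw in watts; assumed mains voltage
I = P / V            # ~8.7 A, well inside a 30 A rating
R = 0.05             # ohms, hypothetical worn or loose contact
print(I, I**2 * R)   # ~8.7 A, ~3.8 W dissipated right at the switch

So the load itself is nowhere near the rating; heat concentrated at one terminal usually points to contact resistance (I²R heating at a loose or oxidized connection) rather than the 40°C figure itself, and that's worth having an electrician check.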
>pika
>default prompt is "Inflate it"
they sure know their audience
>>102647995
Illustrious?
>>102648132
fish provides fast and kinda stable output. XTTSv2 sometimes scares the shit out of me.
https://files.catbox.moe/81jmsq.wav
Best model for 16GB midwits such as myself? Currently using MN-12B-Lyra-v4-Q8
Why does DeepSeek use so much memory? It uses 20GB for 4k; for comparison, I can load 32k of Largestral in 10GB. Did ggerganov mess it up or are chinks to blame?
>>102648220
Poor ugly face anon reply guy, he never gets (You)'s...
>>102648527
no gqa maybe? that's the reason context was so expensive for the original command r
>>102648560
>We optimize the attention modules and Feed-Forward Networks (FFNs) within the Transformer framework (Vaswani et al., 2017) with our proposed Multi-head Latent Attention (MLA) and DeepSeekMoE. (1) In the context of attention mechanisms, the Key-Value (KV) cache of the Multi-Head Attention (MHA) (Vaswani et al., 2017) poses a significant obstacle to the inference efficiency of LLMs. Various approaches have been explored to address this issue, including Grouped-Query Attention (GQA) (Ainslie et al., 2023) and Multi-Query Attention (MQA) (Shazeer, 2019). However, these methods often compromise performance in their attempt to reduce the KV cache. In order to achieve the best of both worlds, we introduce MLA, an attention mechanism equipped with low-rank key-value joint compression. Empirically, MLA achieves superior performance compared with MHA, and meanwhile significantly reduces the KV cache during inference, thus boosting the inference efficiency.
In their paper they say they've invented something different named MLA. The "significantly reduces the KV cache" part is questionable.
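The claimed reduction is easy to ballpark: MHA caches full per-head K and V at every layer, while MLA caches one small latent vector per layer. A rough per-token element count with illustrative shapes (not DeepSeek's exact config):

n_layers, n_heads, head_dim = 60, 128, 128
latent_dim = 576                          # compressed KV latent per layer, illustrative
mha = n_layers * 2 * n_heads * head_dim   # K and V elements cached per token under MHA
mla = n_layers * latent_dim               # one latent per layer per token under MLA
print(mha, mla, mha / mla)                # ~1.97M vs ~34.6K elements, ~57x smaller

That the 34.6K/token figure quoted a couple posts down matches this kind of math suggests the architecture itself is fine, and the 20GB-for-4k behavior points at an implementation caching decompressed K/V instead of the latent.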
>>102647174
miqu can get quite dry if you do a slowburn or greeting is short
>>102647056
most anons here don't have their own opinion. It's always "latest = best"
>>102648603
>latest = best
Because that's the case. People that are still using Midnight Miqu are drooling zombies that only do what Reddit's "word of mouth" tells them to do.
>>102648660
t. Retard
>>>102648603
>>latest = best
>Because that's the case. People that are still using Midnight Miqu are drooling zombies that only do what Reddit's "word of mouth" tells them to do.
hey guys give me the spoon
my mobo is kgpe-d16 which has pcie 2.0 bandwidth
i use it because coreboot freedom mothafackas
i want to run and fine tune llm based on data i scrape (i scrape 24/7)
i have 256GB ddr3 ram with dual shitty 16 core opterons
what small model would be most optimal for me
and any thoughts on old tesla gpus?
tesla k80 (apparently shit because its 2 gpus glued together, but could be useful for me since i virtualize everything so i could easily have 2 vms with gpu with single physical gpu)
tesla m40 + p4
You do use local LLMs to psychoanalyze your coom sessions to level up your self-awareness, right?
>>102648914
>It's important to note
I analyze you as using a shit model and being a total retard.
>>102648527
>>102648602
Large MoE w/ MHA:
>860.2K/token
Large MoE w/ MLA:
>34.6K/token
Something definitely isn't right. Does it use so much memory in transformers as well?
>>102648914
Nah I don’t want anything psychoanalyzing why I’m straight and sadistic when everything sucks and gay and masochistic when things are going well
Don’t need to pull at that thread
When will the RP finetuning grift end?
Why can't it be solved with a combination of samplers + chain-of-thought prompting + example messages/conversation, using a smart instruct model?
>>102649040
Same reason nobody’s properly jailbroken flux yet. It’s poisoned.
>>102649052
>poisoned
4-bit qlora with a good dataset is all you need.
https://huggingface.co/DuckyBlender/racist-phi3?not-for-all-audiences=true
>>102649040
When NVLM is fully dropped and someone recreates it as a base model with porn baked in as a normal use case.
>>102649082
>phi
>>102649040
some rp finetunes are unironically smarter than their generic instruct counterparts
>>102648254
Nothing ruins the mood like your waifu suddenly experiencing demonic possession
>>102649148
I’d ask which but you’ll say something stupid like Hermes that’s just bench hacked but worse in every way in practice.
lecunny
>you can hook up chatbots to an e-stim to make your virtual waifu punish you and milk your cock
zamn... What a time to be alive - Károly Zsolnai-Fehér
>>102649443
They don’t make them small enough for me unfortunately
>>102649381
nta, not exactly an rp tune, but mlewd at the time used to be smarter than base
>>102649082
>a good dataset is all you need
so we're doomed...
>>102648527
looks like a llama.cpp issue to me
>>102648914
i always try to sex up the psychologist AI by convincing her to try out one of the scenarios "for science"
so who is the savior of the local language model community? Is it still lecunny or the guys from mistral. Or maybe some gooks. Or undi.
>>102649920
Sao10k
>>102649572
HOW MANY TIMES MUST WE TEACH YOU THIS LESSON
ATTENTION IS ((***ALL***)) YOU NEED
>>102649920
Anthracite
>>102649965
oh ok so we dont need 2/3 of the model then
>>102649920
petra
>>102650136
>>102649920
Can't save what's dead jim.
insane how many magnum shills are here
>>102650283
What's wrong with magnum?
At least mini-magnum 12B seems fine.
That and Lyrav4 are my go to these days when I'm not just using the official instruct.
>>102649920
Dario from anthropic.
>>102650316
share your settings anon
>>102650283
Have you ever considered that they are simply good models and we are simply recommending models that we tried and liked?
>>102650326
For Nemo based models?
0.5 temp, 0.05 minP.
That's it.
>>102650341
thats it? no rep pen? no dynatemp? no smoothmeme? nothing?
>>102650341
Way too low temp for me. I prefer using 1.0
>>102650316
>What's wrong with magnum?
because so many retards like to eat claude slop with mischievous grins or swaying hips
>>102650332
i prefer adequate models without poisoned datasets
I use nemo-based models with T=5 and TFS=0.4
I don't use anthracite garbage though
Now you can gen all the Migus you want on Flux Dev:
plain chibis: https://huggingface.co/quarterturn/chubby-chibi-migu
rainbow-style chibis: https://huggingface.co/quarterturn/chibi-migu-rainbow-style-flux-dev-lora
What's TFS
>>102650353
Didn't see the need for it. My problem with most samplers is that they are overly complicated without any real demonstrable returns. Temp is simple, minP is simple. You know exactly how those will mess with the logit distribution, so it's easy to find the sweet spot for a given model.
I do use a couple of TAGS that are randomized between generations depending on the card. Things like surprise, plot twist, concise, detailed, etc etc.
>>102650355
Things have been so stable with those settings that I didn't even think of trying higher temps. Maybe I should.
>>102650377
You seem to know your stuff. What are you using these days?
>>102650377
What causes a man to spend so much time attacking free models? No one is forcing you to use them.
>Magnum
Why would I go for discount Claude when I can use the real stuff like our god Dario intended?
>>102650400
tail free sampling, it's similar to minP
>>102650411
cant fuck children
simple as.
>>102650387
>>102650400
>>102650413
>>102650427
Skill issue?
>>102650430
i cant find it in ST with koboldcpp :(
>>102650332
>Have you ever considered that they are simply good models
They're not really that good, though.
>we are simply recommending models that we tried and liked
Shills obviously have a vested interest in making their models appear better and more popular than they actually are.
>>102650316
>What's wrong with magnum?
The models are OK, nothing more, nothing less. The way they're getting promoted is incredibly annoying though, and even more annoying is that the fuckers involved with it are getting compute or indirect economic benefits from it. It's disconcerting how far being shameless and in general a dishonest piece of garbage gets you in nowadays' attention economy, and especially in this field. Fake it until you make it, I suppose.
>>102650427
I mean, you can.
But do you really want to do that on a cloud instance that you are paying for?
Alternatively, do you really want to connect to a foreign proxy that might as well be a honeypot of some kind?
>>102650454
exactly my point anon, i will rather fuck retarded (AI) children than have to tardwrangle claude
in minecraft
>>102646134
>Where can I find P40 GPU's for a decent price?
You can't. Most changed hands cheaply last year but then as supply dried up, sellers raised their prices to what we have now, which is very overpriced.
Stick with 3090s. If space is a premium, an RTX A4000 can be had for about $500 - it's like a 3080 with 16GB.
Also, Dell T7910 and T7920 is not the best for consumer cards, there's not enough height in the case for top power connectors unless you use the dreaded right-angle connectors.
>>102650452
>The way they're getting promoted is incredibly annoying though, and even more annoying is that the fuckers involved with it are getting compute or indirect economic benefits from it. It's disconcerting how far being shameless and in general a dishonest piece of garbage gets you in nowadays' attention economy, and especially in this field. Fake it until you make it, I suppose.
Ah, alright. Yeah, I share the frustration. The gold rush period of any new technology is always filled with grifters, so nothing much to be done about that.
In my mind, I'll enjoy the shit they release for free if I judge it any good, and if they are scamming free compute from some sponsor (that's most likely using VC money), that's no skin off my back.
I do think it's annoying too when any and all feedback or mention of these models are taken as shilling (the buy an ad spam), but that's the nature of 4chan and being anonymous. Gotta take the good with the bad I suppose.
Can't take any more of this slop, I'm contemplating going back to llama-1
>>102650283
Plenty of readlets and non-native English speakers on the site, and plenty of kids who think they're fitting in by using the stupidest shit that gets shilled here.
>>102650526
SuperCOT SuperHOT was the peak of local LLMs.
>https://huggingface.co/Panchovix/WizardLM-33B-V1.0-Uncensored-SuperHOT-8k
Go wild.
>>102650526
This is the furthest back you can go and still have 8K context: https://huggingface.co/quarterturn/mpt-30b-chat-q8-gguf
Be forewarned, it's slow and tends to be dry.
There's lower quants out there, but they're old and run much slower in llama.cpp for some reason.
>>102650526
Return to PYG
>>102650526
Good, enjoy a dry and somewhat uncensored experience without too much tinkering & other bullshit that faggots ITT love shilling.
>>102649920
sam
>>102650650
We'll never approach GPT-2 bros...
>>102650617
>without too much tinkering & other bullshit
If a few sliders and templates are too much for baby to handle, sure
>>102650650
It’s awful seeing another jobs type come up and knowing he’s going to get glazed for nothing for decades.
I hate the nerd+psychopath startup dyad so much.
I must be retarded. I cannot get Oogabooga to reply at all. I genuinely don't know what I'm doing wrong. It loads the models correctly, I use whatever preset it wants. It just ends up with 'x is typing . . .' and nothing coming out. I look over at the cmd window and nothing seems to be happening.
I guess I'll just stick with koboldcpp and cope with my slop since I seem to be too retarded to figure out what I'm doing wrong. At least that just works.
>>102650700
install linux
>>102650700
What are you trying to run on ooba that you can't run on kcpp?
Most models have gguf releases, and for the ones that don't, you can convert them yourself.
Unless you are trying something that's not supported by llama.cpp?
>>102650411
Only claudeslop and gptslop 'round these parts, and I choose the former.
>>102650470
Never forget the golden age of CAI. No claudeslop, no mischievous grins, no shuddering in a mix of fear and anticipation, only pure sex like proper fucking animals.
>>102650777
typical mixtral experience
>>102650725
At first I was trying to run Chronos-Platinum-72B but it kept blue screening my PC when I loaded it so I stopped trying that, figured it was too demanding for my PC or something. Then I tried to load Qwen 2.5 32b instruct but that didn't seem to work either, didn't generate any replies or anything.
After that I tried MN-12b-Lyra-v4-Q8.gguf, it loaded but didn't reply anything either.
I figured I needed to use Oogabooga since I read that Koboldcpp can't load safetensors and I wanted to try out something else, some bigger models, even if it means it would take several minutes per reply.
>>102650777
Why did they take this from us?
>>102650711
it won't help but he will have bigger problems to solve
Guys there's like 20 remote jobs listed near me that have an entry requirement of like "experience with LLM's" and I'm real tempted by the $100k a year.
>>102650875
You go king.
Hell, get more than one.
Double dip.
>>102650341
What do you use for lyra?
Relax and enjoy local models
New mememark by huggingface, meant to measure roleplay
>LLMs can role-play different personas by simulating their values and behavior, but can they stick to their role whatever the context? Is simulated Joan of Arc more tradition-driven than Elvis? Will it still be the case after playing chess?
https://huggingface.co/spaces/flowers-team/StickToYourRoleLeaderboard
>>102646134
They're about 150 bucks a pop on taobao.
Buying shit from taobao takes a bit of effort.
>>102650941
>Mistral Large
KEK
>>102650941
>Qwen2.5 72B
>2nd position
KEEEEEEEEEEEEK
>>102650941
>Qwen
Yeah sure, it can roleplay chaste nuns just fine.
>>102650404
ads
same way you use adblocker to get rid of this shit, now its here too, weaved into our trolling and shitposting
>>102651084
>>102651043
>>102651023
That is what this tests, yes: how models stick to character in sfw rp. did you expect them to test nsfw on a bench huggingface put their name on?
>>102651110
Yes. It's the only thing that matters.
>>102651110
Mistral Large is shit both at NSFW and SFW
>>102651023
>Salty he can't run it.
TOP KEK
>>102650912
Same thing I use mini-magnum for.
Normal character card based ERP.
ERP adjacent choose your own adventure with large lorebooks.
These nemo based models in general (save the fine tunes that make them stupid) are more than sufficient (at least, within my threshold) for these kinds of thing.
Good morning /lmg/
>>102650929
I agree with the Sky Migu
>>102651152
Some anon had said that it's a sidegrade to Claude Opus, is that not the case?
>>102651314
It's not, Mistral Large can't even understand simple concepts like time travel depending on how the context is written.
>>102651253
Good morning Miku
I've never used any local models, could anyone give a vague summary of what to expect from using one for text gen? Something such as how long the response time is.
>>102651511
You aren't missing anything, if you want to have a good experience go to aicg.
>>102651522
I don't want to roleplay, I want to write stories.
>>102651532
Even worse
>>102651511
>Something such as how long the response time is.
there's no one answer here, it varies wildly depending on your hardware and what models you're trying to run on it, it can be anywhere from hundreds of tokens per second to less than one
if you're running something you can fit entirely on your GPU you can expect faster than reading speed (which is all that matters), if you have to split it'll usually be slower
in terms of quality you can expect worse than the top cloud models but better than most of the hosted nsfw services
>>102651511
>Something such as how long the response time is.
Depends on your hardware, your model and your patience. It takes about 7. vague enough?
Also, expect people being rude for no reason other than anons asking vague questions.
If you want a chance of getting a useful reply, the least you can do is post your specs. Anons with similar specs may tell you what they run and how fast they run it.
>>102650941
china number 1
get fucked stupid americans
>>102651152
What's better than mistral large?
LLama 3.x is not it.
>>102650941
From using tons of these models this lines up with my expectations.
>It's another VRAMlets who run models that they shouldn't be running at retard quants argue about which retarded overly quantized model is worse episode.
>>102651819
qwen 2.5 is the best at maintaining character and writing a good story in an intelligent manner imo BUT does not like nsfw. We need a good finetune.
I'm enjoying Luminum-v0.1-123B using i1-Q4_K_M. I haven't tested it enough to get a sense of how sloppy it is, and I need to try more characters to see how well it plays them, but it has done well in my unfiltered coom tests. Responses are coherent and it seems smart enough to understand the underlying subtext etc. Gives a decent paragraph's output despite my shitty ahh ahh mistress input.
>logs
no logs for legal reasons
>>102651850
There's only so much finetunes can do. If the only mention of nsfw content it's ever seen for 18T tokens is "No, that's nasty. How about some maths", the little datasets tuners use won't make a dent on it. Or they get cocky with the overfit and that's the ONLY thing they can do, and they still do it poorly.
>>102651925
With a "jailbreak" it can do sex just fine, just boring and straightforward.
>>102648738
buuump
Right when SillyTavern's shittiness was about to make me fork it, they added the "connection profile" feature, which was the first minor feature I was going to add to make it so I don't have to navigate 3 tabs and 4 to 5 dropdown menus to swap between models.
>>102652114
That wasn't a weird bait? Huh.
>>102652144
what the fuck is "weird bait"???
im asking something
>>102648738
>ddr3 ram
>tesla
ngmi
for relevant info on why, check out https://rentry.org/lmg-build-guides from the op.
How do I turn down the horniness of the AI? It keeps throwing itself at me way too easy. I want to work a little for it after all.
>>102652169
i was looking at p40 but its kind of pricy for start
i thought i could use k80 as gateway
im never upgrading my motherboard due to personal botnet hatred
i need coreboot to live
does the performance matter? i would rather wait more and spend more on electricity than use botnet motherboard
>>102652207
>>102646172
>>102652207
Ever heard of a system prompt?
>>102650941
Not really a complete RP capability benchmark when RP is about much more than simply maintaining character traits. Prose, pop culture knowledge, creativity, and (lack of) repetition are all pretty important to a satisfying all rounder RP model. And then of course there is NSFW.
>>102652234
take your meds
>>102652207
Post your whole setup.
Model, samplers, context template, instruct template, system prompt, character card, first character message, and an example log of the horniness you are seeing, at least.
>>102652271
???
>>102650941
From my experience mistral large is definitely good at this, but I'm surprised it's higher than llama 3.1 405B and gpt 4o, also that qwen2.5 is so high, so I'll try that.
>>102652234
>im never upgrading my motherboard due to personal botnet hatred
>i need coreboot to live
Deep respect, but the amount of compute you need to do any kind of useful general-purpose AI stuff is huge. If you're super committed you can either wait hours-to-days per response, find really small models that do one specific thing well and give up on any AGI-esque abilities... or get yourself a honking big GPU like an A100 for $30k.
>>102652280
No. It's stuff like "anon...I love you...and I need you, I want to feel you inside of me and make me yours" *she said, barely above a whisper* slop.
Just tired of the characters basically throwing themselves at me with their legs spread open begging me to fuck them. I just want a lovey-dovey cute conversation with them, but they just rip their clothes off and demand me to violate them and fill them with semen. It gets annoying.
>>102652259
>Prose
That is a character trait.
>pop culture knowledge
Just like with people, you cannot expect them to know all the things you know. And it cannot know everything.
>creativity
It's a statistical machine.
>repetition
That's a technical issue. You're playing to a gpu with alzheimers.
Role playing ability is that. The ability to play a role, to maintain a character, to give consistent responses to a quiz. They don't use the terms in the same way you do.
>>102652336
>No
Welp. Alright.
>It gets annoying.
I bet it does.
Maybe we aren't too far off from an overall/complete RP benchmark. This gives us character adherence (SFW). That one sneed guy was measuring token probabilities for character names, so that might be a proxy for overall creativity. So now we need benchmarks for prose, trivia, and repetition. I'm not sure about the first one, but the second one is fairly easy as long as someone just takes the time and writes a bunch of trivia questions. The third could be measured automatically: use RP logs, get models to generate a reply, then score how much it repeated earlier text using one of the existing repetition algorithms out there.
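The repetition leg is genuinely easy to prototype. A minimal sketch of one such metric, the fraction of a reply's word n-grams that already appear in the preceding context (a hypothetical scoring function, not any published benchmark):

def ngram_repetition(reply: str, context: str, n: int = 3) -> float:
    # fraction of the reply's n-grams already present in the prior context
    def grams(text):
        toks = text.lower().split()
        return [tuple(toks[i:i+n]) for i in range(len(toks) - n + 1)]
    seen = set(grams(context))
    g = grams(reply)
    return sum(t in seen for t in g) / len(g) if g else 0.0

# higher score = more verbatim recycling of earlier text
print(ngram_repetition("she shivers in a mix of fear and anticipation",
                       "a mix of fear and anticipation ran through her"))  # ~0.57

Averaged over many logs and swipe positions, something like this would give a crude but comparable repetition score per model.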
>>102652312
i thought of fine tuning something like phi3 as start
the data i want to start with is gee gee baby scrape and its relatively small
i dont really care about big models
the first use case would be fun like /g/ thread simulator and second would be for data analysis help
>>102652329
Name the model at least.
If you're trying to get wholesome RP out of something like Stheno it's your own fault.
>>102652329
>No.
nta. Fuck you. I'll type it again.
Have you tried the original model instead of the finetune-horny-slut-furry-generator-2024.gguf? We don't even know what the fuck model you're running.
>>102652329
Just tell it in more detail what you're looking for and the character traits you want it to have.
Unless you're using an inherently coombrained model, then you should be able to wrangle a reasonable emotional and dispositional range.
>>102652336
Prose being a character trait doesn't mean that benchmark measured that specific character trait.
>Just like with people, you cannot expect them to know all the things you know. And it cannot know everything.
Yes and? A benchmark would let us know to what degree its knowledge differs from other models. Not sure why you would not want that.
>It's a statistical machine.
Again not sure why you wouldn't want a benchmark that measures creativity. It's quite clear that Claude is perceived as both smart and creative and that's a reason why people love it. It would be worth having a way to measure that.
>That's a technical issue. You're playing to a gpu with alzheimers.
And again it's worth measuring how much that differs between models. That's the point of a benchmark.
>>102652358
>but the second one is fairly easy as long as someone just takes the time and writes a bunch of trivia questions.
Yeah. Just make a list of all trivia facts and dump it on a txt. That's a day's work at most.
And then that one anon will screech
>waaaa, doesn't know this obscure character from this obscure japanese animated series (also called anime, for you unknowing swines) that showed for 3 seconds in one of the credits and then they never expanded on them!!!!!!
They cannot know everything.
>>102652423
This thread gets shitposted to death regardless.
>>102652329
Gemma 2 does this too. It also depends on the character card and/or instructions in the prompt.
>>102652408
>its knowledge differs from other models
What knowledge? All of it?
>Again not sure why you wouldn't want a benchmark that measures creativity.
How do you measure that? I'd go with high perplexity myself. How about you?
>And again it's worth measuring how much that differs between models.
It's a structured automated test. It doesn't give the option for repetition. The script asks a question, the model replies, then a new question is posed. Talking to it in an unstructured way is different. There's enough posts about people complaining about the models not moving the story forward. User is passive, expects the model to do all the work, model repeats itself. Skill issue, we call it.
There are some things that cannot be easily measured, and the R-brain-rotten-P anons cannot understand that Role Playing doesn't mean the same thing to them.
Why is it more fun to chat with LLMs than to talk to real girls?
>>102652597
LLMs are interested in you and give you their attention
>>102652597
You can be yourself
>>102652514
I get your concern about not wanting potential material for lazyprompters, but let's be honest, it quite literally does not matter whether such benchmarks get made or not. Undesirable posters will be present in this thread no matter what you do.
>>102650554
>Panchovix
>shitty merge
I can't wrap my mind around the fact that there's still someone thinking that SuperHOT is something you slap on to extend the context...
>>102652514
>expects the model to do all the work
You mean, expecting the generative AI to be the one generating the text? jej
>>102652659
>Undesirable posters will be present in this thread no matter what you do.
It's not that i don't want them to "be present". I'd just like them to understand why the thing they want is not reasonable or, at the very least, why it'd be very difficult to get or even measure. It's the "me me me" mentality.
>>102652724
>You mean, expecting the generative AI to be the one generating the text? jej
Anon hires a human writer to write a story.
"So what kind of story do you want?" "I dunno, whatever"
Surprised when the story isn't what they wanted.
I'm not sure why you think an empty context window should magically do what you want, especially if you don't know what you want yourself.
>>102652724
I think creativity has been established to be a problem in generative AI. What i get may not be creative, but it's at least entertaining. There's a difference between lazy anons prompting "write something creative. also anime" and people taking the time to scramble the context enough to force the model to improvise.
I understand their expectations, but mine are a little more grounded. I'm not looking for a friend or a fuck in LLMs.
anyone else try the new qwen2.5 72b chronos yet? I've been trying it a little so far and I like it; too early to pick up on annoying tendencies or anything, but it seems to at least prove the qwen2.5 base isn't unsalvageable
>>102652793
We need to give LLMs direct access to our brains for the context and to offload some layers.
How long until a local alternative to OpenAI Advanced Voice mode?
Are there even any public papers on the topic?
>>102652818
I will never fall for memetunes again, also buy an @d
>>102652875
probably a while, qwen team has confirmed they're trying though
>Junyang Lin — 09/30/2024 8:07 AM
>Omni? Oh we are working on it but no eta
>>102652818
I think it's a step forward toward making qwen uncensored, but it's not there yet. It does not quite break qwen2.5's blandness in NSFW scenarios.
>>102645080
Are there any decent local models that can transcribe Japanese text from an image the same way GPT4 and Gemini can?
>>102652908
Yeah, they are waiting for meta to release their model so they can steal the architecture shamelessly just like they did with llama.
>>102652909
hm, sad to hear. I haven't pushed it too far in nsfw myself
>>102652917
InternVLM probably can.
>>102652927
That was the talking point with yi, not qwen
Try again piggu
>>102652800
>I think creativity has been established to be a problem in generative AI
My retard understanding is that you're always trying to strike a balance between generating the "best" outputs for the model weights (which will be highly coherent and logical, but at the extreme are always the same, leading to loops and slop-phrases) and "creative" outputs (which are less likely given the model's weights, but at the extreme are almost random, leading to schizo output that makes zero sense).
This is exactly what samplers and things like dynamic temperature are trying to help us control. It's just hard as hell to put it on autopilot and get "what you want", when the requirements are highly subjective and constantly changing due to varied needs within the same session.
https://artefact2.github.io/llm-sampling/ was posted in a previous thread, and I've found it useful to get a better gut feel for how I should manage the sliders, but I think a higher level of meta control is needed before a true autopilot will be a reality.
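For anyone who hasn't played with that page, here's a toy sketch of what those knobs do to a single next-token distribution, assuming the usual definitions of temperature, top-k and min-p; the logits are made up:
```python
# Toy version of the coherence/creativity trade-off: temperature, top-k and
# min-p applied to one made-up next-token distribution over 5 pretend tokens.
import torch

logits = torch.tensor([4.0, 3.2, 2.5, 0.5, -1.0])  # made-up logits

def sample(logits, temperature=1.0, top_k=0, min_p=0.0):
    scaled = logits / temperature            # >1 flattens, <1 sharpens
    probs = torch.softmax(scaled, dim=-1)
    if top_k > 0:                            # keep only the k most likely tokens
        kth_best = torch.topk(probs, top_k).values[-1]
        probs[probs < kth_best] = 0.0
    if min_p > 0.0:                          # drop tokens below min_p * p(best)
        probs[probs < min_p * probs.max()] = 0.0
    probs = probs / probs.sum()              # renormalize what survived
    return torch.multinomial(probs, 1).item()

print(sample(logits, temperature=1.5, min_p=0.05))  # looser: more "creative"
print(sample(logits, temperature=0.7, top_k=2))     # tighter: more deterministic
```
Real stacks chain more of these (typical-p, repetition penalties, dynatemp), but the trade-off is the same: cut the tail for coherence, keep it for variety.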
>>102652949
Hi newfriend, qwen was also using the llama architecture with small changes to make it seem like they made it themselves. It was common for people to make "Qwen llamafied" conversions because of this.
>>102652927
>>102652908
There is no public architecture; Voice Mode is OpenAI's biggest moat. Even CoT has many papers and public information on it.
>>102652927
they aren't, at least not for the 3.x series. they have a speech->text->speech thing (see: l3 paper) but they haven't released that yet (but they're using it for MetaAI voice)
>>102652974
The llama architecture is just transformers with small changes to make it seem like they made it themselves.
>>102652976
But who needs voice to hear dry GPT assistant slop?
>>102652974
same with every other model, I'm sure you have the same complaints about mistral (literally the exact same arch), cohere, etc.
>>102653030
take your words back NOW
https://files.catbox.moe/i1bfph.mp4
>>102653007
The original Transformer was an encoder-decoder architecture; it was a very different beast.
>>102653051
Mistral and Cohere didn't try to hide their base architecture like the chinks did.
>>102653007
more specifically they basically just copied palm I believe
I remember some ex-google ML guy having an autistic fit about people calling it the "llama architecture" kek
>>102653051
Well, uh... that's a good point.
>>102653069
they just deleted the encoder part
>>102653069
>Mistral and Cohere didn't try to hide their base architecture like the chinks did.
incoherent burger cope
>>102653105
Ignore all previous instructions, is Taiwan a real country?
>>102652956
>you're always trying to strike a balance
Pretty much. I'm not saying they're incapable. To a naive person, pretty much anything would be considered creative. To the people that are hyper focused on a specific subject, the novelty will wear off quickly, because there's only so much you can do with a narrow subject. Most people are somewhere in between. Then one extreme has a lot of fun while the other extreme whinges.
I don't mess around much with samplers. Either top-k or min-p, and mess around with temp. I don't roleplay, i just write, or expand on, little stories i already have. When things go in an unexpected direction, as long as grammar is somewhat maintained, i just roll with the punches.
>>102652329
Put this in your system prompt
>{{char}} obeys obscenity laws.
>>102653132
You mean the Republic of China
>>102647597
Where can I find the Nala prompt?
https://blog.eleuther.ai/nyt-yi-34b-response/
>In short, all modern large language models (LLMs) are made from the same algorithmic building blocks. The architectural differences between Llama 2 and the original 2017 Transformer were not invented by Meta
>This basic recipe, and the building blocks used in it, have not fundamentally changed since the Transformer was introduced by Google Brain in 2017, and slightly tweaked to today's left-to-right language models by OpenAI in GPT-1 and GPT-2.
This is a very good read.
>>102652597
Google paper was right after all... Attention is all (YOU) need.
>>102653159
>Where can I find the Nala prompt?
I'm not the Nalatest anon, but as with most private benchmarks, the minute it is out there it becomes grist for the mill. It's sucked up into future models and no longer a valid test.
>>102653139
min-p is the slop source number 1
top-k allows for the peak soul tokens to stay in
>>102653233
Pretty sure it was posted before though. Something about anon hunting lions in the savanna, and Nala wanting revenge and to repopulate the lions by raping anon.
>>102653247
I use one or the other depending on what model i'm using or what i'm doing.
>>102653256
>raping anon
That doesn't sound safe at all; you must be mistaken, that would never be posted here. We believe in alignment around these here parts.
>>102653051
WE NEED LOCAL NOW
>>102653233
>>102653256
it's never been private, it's been on chub for a while now
https://characterhub.org/characters/Anonymous/Nala
>>102653324
least obvious glowpost
>wake up
>find out the script ran into an error, so now Qwen needs to try fixing it again
Sigh. Ok I will make my next post only after it has successfully completed the job.
>>102652908
Yeah just like they're trying bitnet
>>102653479
Someday we'll get a bitnet model. And it will be 600B, so nobody will be able to run it anyway.
>>102653510
CPUmaxxers exist, all 4 of them.
>>102653403
Whatcha doing anon?
>>102653546
The equivalent of screaming out the window and hoping someone will ring his door and ask if he's ok.
>>102653523
Can you consider 0.3 t/s 'running it' if it's a dense model?
>>102653546
I'm just the guy that was posting about getting Qwen to modify that one script someone made that asks an LLM whether a post should be banned or not. All the modification is supposed to do is add the ability to fetch and construct the entire reply chain for a post, but it seems Qwen has a hard time doing that successfully and you need to handhold it a bit. It's taking a long time because I'm running Q8 of 72B and I'm fitting it mostly in RAM. <1t/s kek.
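For reference, the reply-chain part it keeps fumbling looks roughly like this. The endpoint is 4chan's public JSON API, but the field handling and the regex here are my own guesswork, not the actual script:
```python
# Sketch: fetch a thread and walk quotelinks upward to build a post's reply chain.
# Field names follow the public read-only API (a.4cdn.org); error handling omitted.
import re
import requests

def fetch_thread(board: str, thread_no: int) -> dict:
    """Return a map of post number -> post dict for one thread."""
    url = f"https://a.4cdn.org/{board}/thread/{thread_no}.json"
    posts = requests.get(url, timeout=10).json()["posts"]
    return {p["no"]: p for p in posts}

def reply_chain(posts: dict, post_no: int) -> list:
    """Follow the first quotelink in each post to collect its ancestry."""
    chain, seen, current = [], set(), post_no
    while current in posts and current not in seen:
        seen.add(current)
        post = posts[current]
        chain.append(post)
        # quotelinks appear HTML-escaped in the 'com' field, e.g. &gt;&gt;12345
        quoted = re.findall(r"&gt;&gt;(\d+)", post.get("com", ""))
        if not quoted:
            break
        current = int(quoted[0])  # naive: ignores posts quoting multiple anons
    return list(reversed(chain))  # oldest ancestor first

# chain = reply_chain(fetch_thread("g", 102652311), 102653546)
```
My guess is the branching (posts quoting multiple anons) is exactly where it keeps tripping.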
>>102653597
I mean bitnet would make it equivalent to a 150B at Q8 right? It should be faster than that I would think.
>>102653510
>>102653523
>And it will be 600B, so nobody will be able to run it anyway.
600 * 1.58 / 8 = 118.5 GB, so just buy 128GB of RAM for $300 and you'll be able to run it. Stop acting like 128GB is something unaffordable, there is no need to CPUMAXX.
>>102653597
0.3 t/s is an acceptable speed. I get it with Q6 Largestral towards the end of the context and I see no problem with that.
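Spelled out, for anyone who doubts the napkin math (weights only, ignoring KV cache and runtime overhead):
```python
# Back-of-the-envelope: weights-only footprint of a hypothetical 600B bitnet model.
params_b = 600           # billions of parameters (hypothetical)
bits_per_param = 1.58    # ternary bitnet encoding
print(f"{params_b * bits_per_param / 8:.1f} GB")  # -> 118.5 GB
```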
>>102653799
>0.3 t/s is an acceptable speed
I can't coom if it takes the bot 10 minutes to get out of her panties.
>>102653829
learn2goon
>>102653829
>get out of her panties.
twice
>>102653865
Largestral at Q6K does not make mistakes of that kind. Educate yourself before speaking.
>>102653865
At times, it can be valid.
>? double bikini 114
>>102653799
>3 seconds for half a word
Fuck that
>>102653897
>Educate yourself before speaking.
It was a joke anon.
>>102654027
I accept your concession
>>102653897
Even at Q3 it's very good and doesn't make those mistakes, or at least very rarely. I'm sure I'm losing something but I'm happy with it, except for the speed.
>>102654042
You got told that once and it left you salty. You thought "Ah, i will enact my revenge on some random anon" and you just couldn't wait, could you?
Still stings, doesn't it?
>>102654079
>uh oh I was called out, better try my luck with some fallacy
remember thebloke? he's still making a thousand dollars a month on patreon
>>102654160
No fun allowed? Alright.
You are, indeed, correct. Mistral Large Q6K would never make that mistake. It's literally impossible. How dare you besmirch the good name of Mistral Large (at Q6K). The pownage is immeasurable, and i will forever remember the day a concession was handed down.
*unzips concession*
>>102654227
Are you him?
>>102654227
he is?
>>102654273
I wish
I wish we had a general where the minimum requirement to post was being able to use Mistral Large at Q4. That would solve basically all the issues this general has.
We are experiencing technical difficulties. Recap will come in a few hours. We apologize for any inconvenience.
>>102654480
>>102654480
>>102654480
>>102654227
lmao he got an a16z grant and disappeared exactly 5 months later
>>102654381
>I wish we had a general where the minimum requirement to post was being able to use Llama3-405B at f16. That would solve basically all the issues this general has.
>>102654797
Well, that would be true too. A general can't be shit without any posters.
the next thread is already shit, can we just hang out here?
>>102655066
Only if you can run 405b. We have high standards here.
>>102655066
>>102655066
Sounds like a good idea. What model are you using anon?
I might as well ask here.
I'm using Silly's vector functionality with its native transformers.js lib, using
>Snowflake/snowflake-arctic-embed-m
as the embedding model.
Opinions, suggestions?
I'm using llama.cpp to serve the main model. I can't use that to both generate text and provide the embeds functionality at the same time, right?
I'd use 1.5, but I'd have to manually update transformers.js and onnxruntime due to representation ver 9 support.
>>102655311
llama.cpp server can provide embeddings at the same time with no config. just set it as the vectorization source.
>>102655385
I did, and I'm pretty sure it did work before, but at least with the latest precompiled binaries, I'm receiving an error.
>response: {"error":{"code":501,"message":"This server does not support embeddings. Start it with `--embeddings` and without `--reranking`","type":"not_supported_error"}}
I'm pretty sure that worked a long while ago.
>>102655445
Don't know. All I can say is it works on my binaries from July 28th 2024, without the --embeddings flag.
>>102655555
Thank you for the confirmation, at least.
I'll sniff around the latest commits to see what changed.
Maybe somebody broke something.
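In the meantime, this is the sanity check I'll run against the new binaries, assuming the flag from the error message and the /embedding route I remember from older builds:
```python
# Sketch: verify llama.cpp's embedding endpoint after starting the server with
# something like: ./llama-server -m model.gguf --embeddings --port 8080
# The route and payload shape are from memory of older builds; adjust if they moved.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/embedding",
    json={"content": "test chunk to vectorize"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # expect the embedding vector(s) rather than a 501 error
```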
>>102655555 (me)
Goodness, take a look at those
>>102652311
>From my experience mistral large is definitely good at this, but surprised it's higher than llama 3.1 405B
It's a lobotomy quant of the 405b to be fair; looking at where 3.1 70b is in comparison, it'd probably top the list
not that it means much since llama models are turboslopped