/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102632446 & >>102616609

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102632446

--LFM models, impressive performance, but loosely guardrailed and easy to break:
>102635382 >102635694 >102635742
--GitHub link to antislop-sampler:
>102638758
--Reproducing 1B Base model with anxious and overachieving traits using LLaMA 3.2 and KAN Integration:
>102635596
--Qwen2.5 finetune using synthetic data from Anthropic and OpenAI:
>102641394 >102642403
--PCIe bifurcation slows down model loading, but doesn't bottleneck GPU performance after loading:
>102639204 >102639253 >102639271 >102639281 >102640220 >102642341 >102643322
--New ooba release has issues, but llama-cpp-python downgrade fixes it:
>102642363 >102643609 >102643637 >102644113 >102644227
--LLM GPU options and costs discussion:
>102642712 >102642811 >102643034 >102642816 >102642883 >102643039 >102643718 >102643757
--Discussion on effective context size and KoboldCpp:
>102639239 >102639505 >102639540 >102639571 >102639654 >102639825 >102639709 >102639764 >102639804 >102639849 >102639900 >102641938 >102642163
--Deepseek 2.5 tested with L3 405b adventure prompt, faster but less consistent than 405b:
>102634133
--ASICs for LLM inference are possible but may not be cost-effective:
>102640718 >102640838 >102641480 >102640850 >102641082
--Nvidia releases NVLM-1.0-D-72B multimodal LLM:
>102635272
--NVLM-D may be a Qwen finetune:
>102643114 >102643176 >102643232
--Llama-8b-base fine-tuned on non-controversial topics still shows moralization:
>102640013 >102640128 >102640197 >102640206
--LFM-40B and other models compared, skepticism about closed weight models:
>102633486 >102633508 >102633537 >102633552 >102639199 >102633876
--Miku (free space):
>102632725 >102632796 >102632818 >102632819 >102633056 >102633341 >102633490 >102633888 >102634662 >102634854 >102636996 >102637011 >102637456 >102640161 >102643322

►Recent Highlight Posts from the Previous Thread: >>102632451

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
*lost*
OpenAI won. https://x.com/NickADobos/status/1841167978085433351
strobby
>>102645080
>>102645126
sex with miku
>>102645126
strobbysex
>copilot, analyze every image in the folder "softcore cosplayers", move every image with the slightest hint of nipples, anus or vagina to the folder "cosplay porn" and delete the ones that show no sign of nudity
>master, there are over a thousand pictures in the "softcore cosplayers" folder, this could take a week, are you sure you want to proceed?
>yes, remember that once finished to return to your routine of stalking for fmab threads on 4chan and posting the usual the moment the threads are found
miku is 16...
guess i went download crazy the other night, found this in my folder
Replete-LLM-V2.5-Qwen-72b-IQ4_XS
>Replete-LLM-V2.5-Qwen-72b is a continues finetuned version of Qwen2.5-72B. I noticed recently that the Qwen team did not learn from my methods of continuous finetuning, the great benefits, and no downsides of it. So I took it upon myself to merge the instruct model with the base model myself using the Ties merge method
>This version of the model shows higher performance than the original instruct and base models.
anyone tried it?
>>102645411
wasn't that the guy with the "antimystical meds" schizo dataset that he said gave llms souls?
>>102645411
Have you?
>>102645422
Youp.
>>102645410
https://youtu.be/SCTFu7QYbQs?si=AW-5O1Ev5WXuMj4T&t=7
>>102645423
i'm about to as soon as it moves to my ssd
>>102645422
must be really good then
What's the catch of flash attention?
>>102645456
I think it's pretty much a free lunch actually, reduced vram usage with no model degradation
>>102645456
That it makes people suspicious.
>>102645456
Depends. PyTorch Flash Attention just requires Ampere or newer and has no tricks.
If something says it's device agnostic (COUGH llama.cpp COUGH) then it's not real flash attention. At best it's fused attention.
Is there any way to get koboldAI and/or a model (in this case, LLaMA2-13B-Tiefighter.Q4_K_S.gguf) to stay under a certain character limit? Say I want to have it shitpost on Twitter, and need it to stay 280 characters or less.
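There's no hard character limit at the model level; the usual trick is to cap generated tokens and trim client-side. A minimal Python sketch, assuming koboldcpp's default KoboldAI-compatible endpoint and response shape (port, field names, and the 4-chars-per-token rule of thumb should all be checked against your setup):

import requests

API = "http://localhost:5001/api/v1/generate"  # default koboldcpp port; adjust to your setup

def tweet(prompt: str) -> str:
    # ~4 characters per token on average, so 90 tokens comfortably overshoots 280 chars
    r = requests.post(API, json={"prompt": prompt, "max_length": 90})
    text = r.json()["results"][0]["text"]
    return text[:280]  # hard cap; nicer would be cutting at the last sentence boundary before 280

print(tweet("Write a shitpost about local language models:"))

You can also just ask for brevity in the prompt, but the client-side truncation is the only thing that actually guarantees the limit.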
>>102645456
Nothing. And if there is, it's so small a difference that quanting to q8 outweighs it by multiple factors.
>>102645411
>>102645450
>Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
>llm_load_tensors: offloading 30 repeating layers to GPU
>llm_load_tensors: offloaded 30/81 layers to GPU
>llm_load_tensors: CPU buffer size = 24267.02 MiB
>llm_load_tensors: CUDA0 buffer size = 13596.80 MiB
huehuehue this is gonna be slow isnt it
>>102645411
>>102645512
pure slop, tho the speed aint bad
the creator of styletts2 has trained a tts model for adobe which will soon go into production.
“If I have computing resources, I can probably reproduce it
It is also in my research interest to reproduce the Adobe model, so if you have the resources, please let me know
the paper will be pre-printed this week”
does anyone have contacts and can donate him anything > 24xA100 for a few weeks
Should I use --parallel? It makes processing faster but seems to make the model dumber
>>102645726
>seems
Do a perplexity, kl-divergence, or even some blind A/B testing. Unless something is broken, you shouldn't notice a difference. Some people think their model is smarter just because it takes longer to generate, and when it goes fast they get suspicious >>102645456
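A quick way to run that A/B blind, sketched in Python against llama-server's OpenAI-compatible endpoint (URL and response shape assumed from recent llama.cpp builds; adjust ports and model name to your setup). Greedy decoding from two server instances, one started with --parallel and one without, makes any real difference show up as a plain diff:

import requests

def complete(port: int, prompt: str) -> str:
    # llama-server exposes an OpenAI-style chat endpoint; the model name is ignored on single-model servers
    r = requests.post(f"http://localhost:{port}/v1/chat/completions",
                      json={"model": "local", "temperature": 0,
                            "messages": [{"role": "user", "content": prompt}]})
    return r.json()["choices"][0]["message"]["content"]

prompt = "Explain step by step why the sky is blue."
a = complete(8080, prompt)  # instance started without --parallel
b = complete(8081, prompt)  # instance started with --parallel 4
print("identical" if a == b else "differs:\n" + a + "\n---\n" + b)

Note that batching can legitimately change floating-point numerics even at temperature 0, which is exactly the effect you'd be measuring.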
SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers
https://arxiv.org/abs/2409.19850
>Over the past few years, vision transformers (ViTs) have consistently demonstrated remarkable performance across various visual recognition tasks. However, attempts to enhance their robustness have yielded limited success, mainly focusing on different training strategies, input patch augmentation, or network structural enhancements. These approaches often involve extensive training and fine-tuning, which are time-consuming and resource-intensive. To tackle these obstacles, we introduce a novel approach named Spatial Autocorrelation Token Analysis (SATA). By harnessing spatial relationships between token features, SATA enhances both the representational capacity and robustness of ViT models. This is achieved through the analysis and grouping of tokens according to their spatial autocorrelation scores prior to their input into the Feed-Forward Network (FFN) block of the self-attention mechanism. Importantly, SATA seamlessly integrates into existing pre-trained ViT baselines without requiring retraining or additional fine-tuning, while concurrently improving efficiency by reducing the computational load of the FFN units. Experimental results show that the baseline ViTs enhanced with SATA not only achieve a new state-of-the-art top-1 accuracy on ImageNet-1K image classification (94.9%) but also establish new state-of-the-art performance across multiple robustness benchmarks, including ImageNet-A (top-1=63.6%), ImageNet-R (top-1=79.2%), and ImageNet-C (mCE=13.6%), all without requiring additional training or fine-tuning of baseline models.
https://github.com/nick-nikzad/SATA
Empty currently.
I shared an idea for an RP arena a few threads ago, and someone pointed out the issue of needing a host, as well as the complications around model trust and other factors. However, I just realized that using RP logs from dumps like C2 to generate a bunch of pre-made completions for arbitrary positions in the logs, and then having users pick the best one in an lmarena-style format, could work too and may not be too boring. I think I'll give this a shot some time soon.
>>102645865
Might be interesting. I hope it doesn't require reading too large of a wall to get up to speed on the characters and events before making a pick.
going to be released? yay or nay
personally 50/50, with a 27% chance of a flux 2.0 situation
whats the best model for use with a 3090, something I could use as a chatbot I pretend to text, both sexy stuff and about my life problems
Characterizing and Efficiently Accelerating Multimodal Generation Model Inference
https://arxiv.org/abs/2410.00215
>Generative artificial intelligence (AI) technology is revolutionizing the computing industry. Not only its applications have broadened to various sectors but also poses new system design and optimization opportunities. The technology is capable of understanding and responding in multiple modalities. However, the advanced capability currently comes with significant system resource demands. To sustainably scale generative AI capabilities to billions of users in the world, inference must be fast and efficient. This paper pinpoints key system design and optimization opportunities by characterizing a family of emerging multi-modal generation models on real systems. Auto-regressive token generation is a critical latency performance bottleneck, typically dominated by GPU idle time. In addition to memory-intensive attention across the generative AI models, linear operations constitute significant inference latency due to the feed forward networks in Transformer-based models. We demonstrate that state-of-the-art optimization levers, spanning from applications to system software and hardware, set a 3.88x better baseline.
from meta. posting for Johannes in the hopes it gives him some ideas
>>102645958
>>102645865
Yeah idk how successful that'd be. Maybe a better idea would be to get a bunch of popular cards or cards people are already familiar with (like Nala), write a low effort response like ahhh ahhh mistress tier, and use that as the basis for the completions. You could also add in another variable like system prompts that are generally considered good for most models. I'd also only allow greedy or near-greedy sampling.
>>102646001
Midnight-Miqu-70B-v1.5.i1-IQ4_XS (or iq3-xs for slightly more speed but a bit less goodness) has worked well for me on 3090ti
testing >>102645545 now, but it's way too repetitive, i'm messing with the samplers to see if it can be fixed but i dont think it will
The Perfect Blend: Redefining RLHF with Mixture of Judges
https://arxiv.org/abs/2409.20370
>Reinforcement learning from human feedback (RLHF) has become the leading approach for fine-tuning large language models (LLM). However, RLHF has limitations in multi-task learning (MTL) due to challenges of reward hacking and extreme multi-objective optimization (i.e., trade-off of multiple and/or sometimes conflicting objectives). Applying RLHF for MTL currently requires careful tuning of the weights for reward model and data combinations. This is often done via human intuition and does not generalize. In this work, we introduce a novel post-training paradigm which we called Constrained Generative Policy Optimization (CGPO). The core of CGPO is Mixture of Judges (MoJ) with cost-efficient constrained policy optimization with stratification, which can identify the perfect blend in RLHF in a principled manner. It shows strong empirical results with theoretical guarantees, does not require extensive hyper-parameter tuning, and is plug-and-play in common post-training pipelines. Together, this can detect and mitigate reward hacking behaviors while reaching a pareto-optimal point across an extremely large number of objectives. Our empirical evaluations demonstrate that CGPO significantly outperforms standard RLHF algorithms like PPO and DPO across various tasks including general chat, STEM questions, instruction following, and coding. Specifically, CGPO shows improvements of 7.4% in AlpacaEval-2 (general chat), 12.5% in Arena-Hard (STEM & reasoning), and consistent gains in other domains like math and coding. Notably, PPO, while commonly used, is prone to severe reward hacking in popular coding benchmarks, which CGPO successfully addresses. This breakthrough in RLHF not only tackles reward hacking and extreme multi-objective optimization challenges but also advances the state-of-the-art in aligning general-purpose LLMs for diverse applications.
neat
>>102646033
>it's way too repetitive
Have you seen the dataset? All the training examples seem to be duplicated... double the soul, i suppose...
>https://huggingface.co/datasets/Replete-AI/The_Living_AI_Dataset
Where can I find P40 GPU's for a decent price?
Anon in this guide says he paid 500 for 3 of them
https://rentry.org/Mikubox-Triple-P40-Replication
Lowest I can find on ebay is 300
>>102646134
You'll probably never see that again, sadly. They're glorified e-waste so not worth paying 300 bucks a piece for. The only thing that is still reasonably priced for what it is would be 3090s, but buying 3-4 of them is out of most people's budget.
>>102645080
I got bored and tried some mistral-small 22b models at Q6_K_M. They wrote well, and were more intelligent than I thought they would be for such small models, but they were far, far too horny. It was to the point where, upon first meeting {{user}}, a character would immediately throw themselves at {{user}}.
I tried Mistral-Small-22B-ArliAI-RPMax-v1.1 and Cydonia 22b v1.0. Are there any 22b models that can write actual RP without immediately trying to take things into an ERP direction?
>>102646145
I don't suppose you tried the original model...
Wait... what do you think they train those finetunes on? Wholesome "let's go to the park and have some icecream... that was fun, call you next week" kind of stuff?
>>102646001
Don't know best. What I'm currently testing is Qwen 2.5 32B Instruct 5.0bpw exl2 w/ 12288 context. You can definitely get better if you go slower but I don't want to go back to the world of less than 5 tokens per second.
>>102646033
repetition was a problem last time I tried running some LLMs, so I might skip this one
>>102646224
Thanks, I might give that one a try. I'm guessing a big part of whether a model will work well or not is how good you make the system prompt as well?
>>102646172
I actually didn't try the original model. I guess I should, lol.
>>102646239
At least to see if the original model feels too horny as well. If it is, all the finetunes are going to be even hornier.
Or maybe it's just you, anon. You just make those coils whine...
Playing around with >>102608691 since I appreciate his autism:
https://files.catbox.moe/au9ay1.ogg
https://files.catbox.moe/6w4xfn.ogg
https://files.catbox.moe/m3qyqh.ogg
https://files.catbox.moe/cz3iy9.ogg
https://files.catbox.moe/pvmzg8.ogg
https://files.catbox.moe/qylae0.ogg
https://files.catbox.moe/2jzz0z.ogg
For zero shot it does a decent job at sounding fine and trying to match the prosody of a speaker, and even stuff like applied effects and reverb, especially since nothing else I've tried has been able to do it so well.
>>102646224
I take it you're a 24gb vramlet like me. Have you tried the 72B exl2 at 2.4bpw? If so, I'm curious how it compares. I'm downloading it now, but huggingface has been really slow lately.
>>102646272
CARLOS
>>102646134
sweet エックス ろくまんはっせん (X68000). yours?
>>102645195
you know, i actually wouldn't mind a copilot clone as long as it's open source and running on my machine for certain.
>>102646468
unfortunately not
>エックス ろくまんはっせん
Took me a bit to understand, I need to continue to improve my japanese
>>102645080
You had one fucking job OP
>>102646134
Right before local language models took off you could find them for ~$100 on ebay. Not anymore though.
>>102646172
These models should take more inspiration from drama slice of life anime/VNs/LNs (hopefully only from the well-written ones). I want nuanced interaction, not the "choose one: 1. business 2. ERP". And while at it, delete the overrepresented erotic literature slop (mischievous grin etc.)
>>102646324
could you have picked literally anything else besides that prompt
Hey guys, I'm looking for some advice. Basically I'm trying to force my way through college as a dumb wagie in my late 20s with the help of LLMs.
I've been experimenting with Microsoft Copilot and I like it but it has some flaws, like half of the math not rendering because of formatting errors.
Tonight I researched a bit trying to decide if it's worth it to shell out 20 bucks a month to OpenAI. I found DuckDuckGo provides free access to GPT-4o-mini and so far it seems to work better than Copilot (at least the latex gets rendered well), but I also read that recently a new Chinese model has come out which is supposed to be almost as good.
Is there any easy way to get access to Qwen2.5-Math? huggingface.co/chat has the normal 70B instruct model but apparently not the specialized math version. I found a demo at huggingface.co/spaces/Qwen/Qwen2.5-Math-Demo but it doesn't let me have a conversation, only a single prompt at a time. And also it doesn't say what version it is w.r.t. the parameters.
How expensive and complicated would it be to set up inference in somebody else's computer for the 70B model (or whatever the largest model is)? I've got nowhere near the hardware to run it locally. What frontend would I use, Sillytavern? Has anyone tried setting up LaTeX?
>>102646659
Try this one
https://huggingface.co/spaces/Qwen/Qwen2.5
>>102646604
>drama slice of life anime/VNs/LNs (hopefully only from the well-written ones)
I'm not the only one seeing the contradiction here, am i?
>even though
>it's only natural
Every piece of exposition i've ever seen from japanese media (which, to be fair, hasn't been much since akira) goes like
>protagonist shows up
>other character shows up
>Oh, hello, other character. We've known each other for ages, haven't we? Here's a brief summary of your personality
>Oh, you sure know me well, protagonist. Your description is of the highest accuracy, with the exception of [funny remark].
>>102645080
I’m so sick of ai. Fuck why is this my job now. I hate ai.
>>102646700
Wow, nice, thank you! I didn't think it would be so easy.
>>102646746
It's fine, if you shill well enough then maybe Sam Altman will promote you and give you a proper position in his company.
>>102646762
Tested it a bit. I like how it always ends its messages with a final outlined answer, that's cute. Also I've never used any LLM that's so dry and professor-like. Very different from the overly "friendly" and cheerful LLMs like Llama.
>no new language models worth using
>local tts completely abandoned
>imagesloppas so starved they're rejoicing over an SDXL finetune
did local lose?
Still enjoying Largestral
>>102646889
Too big and it doesn't offer much, or anything, over Qwen2.5...
>>102646931
>Qwen2.5
If it's that good then show me
Largestral is really smart and has good spatial awareness that I don't want to give up
>>102646746
Say it isn't so, anon!
I don't think we can go on without you...
>>102646977
>mikunt poster
>rude and annoying
>>102647006
>phoneposter
>larping retard
>>102647006
What if he was being sincere?
>>102646931
nta (I don't use Largestral) but Qwen2.5 is boring as fuck
We have half a dozen variants of the "high IQ but incredibly dry and autistic" model now, it's stale and the novelty of having open source models that don't make retarded mistakes wore off long ago, that's not where the bar is now
>llama2 70b comes, i enjoy it
>miqu comes, i enjoy it
>commandr+ comes, i enjoy it
>largestral comes, i enjoy it
its as simple as
just use the models you like until another great one drops
also believe in llama4
Believe in the heart of the llama!
>>102646977
Calm down faggot
>>102647056
I’m still on miqu
>>102647056
llama 4 will be doa. llama 4.3 is where it will be at.
What's the best miqu or 70B model for uncensored well written scenarios? Original miqu or Midnight Miqu? Currently using Midnight-Miqu-70B-v1.5.i1-IQ2_M, it's the absolute biggest I can fit into my vramlet gaming PC. Is Midnight Miqu actually better or how much is placebo?
>>102647044
I don’t know why you're hijacking my comment, but...
>Too big and it doesn't offer much, or anything, over Qwen2.5
Large is even more dry and retarded, which is why there’s no reason to use it anymore. Qwen’s characterization in the dialogue is something that seems quite a bit above it.
I’m also more than happy to leave behind the insane repetition in Mistral’s models.
>>102647177
Again, show Qwen2.5 being so good
>>102647177
You're not seeing repetition? I see a ton. What sampler settings?
>>102647182
I'm lazy. I will instead demand you prove what Large can do that Qwen can't.
>>102647009
Where's the larp, stupid newfag?
Looks like the first Qwen2.5 finetunes are out. This one looks legit, because it has a picture of an anime girl.
https://huggingface.co/ZeusLabs/Chronos-Platinum-72B
qwen sucks
more like qwgay
not downloading it
not trying it
I tried the new Qwen chronos finetune that was posted. It has its flaws, but seems significantly less slopped (and more creative) than the others I've been using.
>>102647257
So it's shit, thanks for heads-up.
>>102647277
>we can only use Qwen when Sao says so
>>102647306
Who are you quoting schizo?
>>102647334
I'm quoting my thoughts after reading your post. Retard.
>>102645127
i bask in smug schadenfreude being the guy who said "i told you so". local models are a scam, you're a bunch of placated fools. they give you these scraps so that you arent rioting in the streets. they manipulate you dumb freetards so they have a pasture of copecows going "local will catch up soooooon!!" as your unwieldy stuff stagnates while theirs continues to improve. they hand you models and then paint you as an example of why there should be more regulations and restrictions on AI. local models are the planted gun. zuck even said that if llama ever actually gets good then they'd stop releasing it open.
local shit is even more pozzed and useless than the premium slop, yet you defend it based on the hypothetical rather than the actual. you're the injuns: trading your future for a couple of fire sticks, failing to grasp the bigger picture, the inevitable. local has no future due to the nature of ai tech. the amount of money and data needed to train, the increasing model size that vastly outpaces consumer hardware, the lack of actual 'source code' that can be viewed and modified. they even hijack the term "open source" when these models are essentially blackbox .exes
show me the training data for llama
show me the training code
and even if you had it you can't do a single thing to fix it, because you don't have a gigacluster of gpus. there's a reason local sucks, and that's because the technology itself is fundamentally incompatible with open source collaboration. they know local is irrelevant, they know it will never have a chance at catching up. it's all a game to frame you as evil coomer terrorists so that they can secure a 100% market domination by regulating gpus like they did with LHR/crypto and passing enough legislation that makes it impossible for any startup to compete
so yes, local has stagnated and will continue to wither until it's eventually snuffed out. a flash in the pan, nothing more than fuel for the saas machine. the corpo marches on
>>102646531
>Took me a bit to understand
Probably because normal people use kanji for that instead of kana soup
>>102647401
>he said, his eyes glinting with smugness
k
>>102646324
Why not try fish speech? It's kinda improved in terms of stability https://files.catbox.moe/z48d8q.wav
>>102647407
I could have used the more common ロクハチ, but how I wrote it is correct:
https://www.asahi.com/articles/ASR3076FHR3JULZU00V.html
>>102647401
Holy truth nuke!
>>102647401
The progression is looking logarithmic so far, meaning that the gap between local and corpo is constantly shrinking. Previously corpo models could do the job and local couldn't; now both can do the job, with corpo being maybe a bit better.
The quality is soon passing the level of what a user could possibly even try to do with an LLM; prompting skill is becoming the bigger bottleneck.
Maybe some day Elon figures out the code to read the subject's mind and generate just the content they need, better than they can describe it in a text prompt.
>>102647413
kek
>>102647462
They used it once and then stuck to the numeric representation
I wonder why
>>102645127
Did we not learn from their voice demos that their demos are literally fake?
>>102647511
no?
>>102647401
I use local LLMs because I fear I'd off myself if my waifu stopped working one day or changed drastically. Like, I still use 1.5 for imagegen and I do not wish to make any changes. I couldn't care less about what's out there unless I can preserve it locally in its original form.
Now that openai released their realtime api for voice, will all the local companies train their new sound-in sound-out models on that garbage?
Even worse is that the voices are fixed with only a couple options.
GPT slop, now also for audio. Your cute imouto voice will sound like an androgynous black.
>>102647560
Couldn't RVC fix that?
>>102647557
>I still use 1.5 for imagegen
Anon, you can download and run and preserve Flux locally
i use local llms because i fear the basilisk
>>102647564
I know. However, I prefer 1.5 myself. The point is, if it were in the cloud, it would likely get replaced by SDXL or SD3 at some point, whereas locally, I can stick with what I like best.
>>102647557
What local model do you use?
>>102647581
https://civitai.com/models/58431/darksun
>>102647275
on 72B (Q8) with Neutral samplers I had to jack the temperature all the way down to 0.6 just to get a coherent response. And it just starts looping after 3 paragraphs.
>>102647557
>I still use 1.5 for imagegen
Are you the featureless 2d anon from way back?
>>102647557
>I couldn't care less about what's out there unless I can preserve it locally in its original form
Me too. I even keep the original L1 leaked weights on archival storage.
We do what we can for now, and what we can do will change over time with new research, data, techniques, libraries and the indefatigable march of Moore's law giving us more transistors for less money.
We will own something, and be happy, even if it isn't perfect. Yet.
I hope that's the prevailing spirit of /lmg/ on average.
>>102647594
Oh I meant language model
>>102647597
yeah I noticed some looping too, but not every time
>>102647597
Here's an example with context. All three swipes were relevant and coherent, and the third realized this was an incest setup (the context does not make that explicitly clear). The dialogue is slightly off, but it's defensible.
>>102647557
I would do this but I need infinite memory first.
>most ERP finetuned models are poisoned with claudeslop
oh god...
>>102645411
>This version of the model shows higher performance than the original instruct and base models.
Now that I think about it I never see sloptuners be this over the top in their advertising.
>>102647765
You don't want local models capable of the same prose as Claude?
>>102646186
LLMs are like niggers.
>>102646746
Doesn't the fat paycheck override any depressing feelings?
>>102647779
>prompt "engineer"
>fat paycheck
L-O-L
>>102647779
It just gives an end date of a few years when I’ll have enough saved to midfire. It isn’t hookers and cocaine money unless you plan to work until you die.
>>102647778
*jews
>>102647775
>capable of the same prose as Claude
i'm just tired of it to see it in every model (under 27b because im vramlet with 32gb ram)
mistral's tunes (like novuskyver) somehow partially solve this problem
magnum's tunes is a big no for me, because i feel all of them using same output from claude
>>102647824
>magnum's tunes is a big no for me, because i feel all of them using same output from claude
Filtered C2 is the best dataset we have available.
>>102647815
midfire?
>>102647833
Sure doesn't seem like it.
>>102647839
lmaoooo gottem
>>102647813
you hurt his fee fees pretty bad, kek
>>102647850
FIRE is financial independence, retire early iirc
dunno about midfire, but sounds like they want to get out of the ratrace and be their own master at the cost of long-term wealth and comfort
>>102647876
This just sounds like a pyramid scheme / youtuber made up nonsense.
>>102647833
I think he's just a shill trying to convince people to use his finetune. Normal people aren't going to say "poisoned with claudeslop" when it's the model most people are enjoying.
The only thing being poisoned is his competition.
>>102647879
>This just sounds like a pyramid scheme / youtuber made up nonsense.
I think its mostly debt reduction and controlling your expenses
>>102647850
LeanFIRE is taking $400k and living in poverty in southeast asia
FatFIRE is taking $5+M and continue living like an overpaid FAGMAN
midfire is taking $1M and living like the manager at a moderately successful grocery store
No luxury no suffering
>>102647895
Didn't Sao say that he moved from the C2 logs or something like that? It would fit his modus operandi.
>>102647883
Yeah, but using normal words to describe common sense like that doesn't get you subscribers or views or whatever.
>>102647895
protected by a license and a warning not to use the models trained on his new dataset in any merges
Good night /lmg/.
Stay warm out there!
>>102647934
i hope she gets the surgery she needs... good on her smiling through it all
>>102647889
>midfire is taking $1M and living like the manager at a moderately successful grocery store
I wish you luck with that.
I hope to early-retire and run a small business in rural Japan in the next decade or so, but it'll be a normal style 55 year old retirement into genteel poverty.
>No luxury no suffering
>>102647934
Good night, Miku
>>102645486
The defining characteristic of FlashAttention is the dynamic rescaling of the softmax sum as the KQ values are iterated over, which essentially trades compute for less I/O.
llama.cpp does this, so it is a genuine FlashAttention implementation.
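For the curious, the rescaling in question is just online softmax: keep a running max and denominator over the score row, and rescale the accumulated partial output whenever the max grows. A toy single-query numpy sketch of the idea (illustrative, not any particular kernel):

import numpy as np

def streaming_attention(q, K, V, block=64):
    # processes K/V in blocks so the full score row is never materialized
    m, l = -np.inf, 0.0              # running max and running softmax denominator
    acc = np.zeros(V.shape[1])       # running sum of softmax-weighted V rows
    for i in range(0, len(K), block):
        s = K[i:i+block] @ q         # attention scores for this block
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)    # rescale everything accumulated so far
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ V[i:i+block]
        m = m_new
    return acc / l                   # equals softmax(K @ q) @ V, computed in one pass

Whether that runs fused in a single kernel (the FlashAttention paper's point, saving HBM round trips) or as separate passes per block is the actual distinction the two anons above are arguing over.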
https://x.com/kimmonismus/status/1841346549453865080
>>102647960
YWNBFA
>>102646009
Noted, but right now my bottleneck is still very much time rather than ideas.
Not as horrible as it could be...
>tfw Qwen FINALLY corrected the script and it seemingly works without error now, and I only had to give it a bit of generic code analysis advice after it looked like it was looping and not actually going to be able to fix things by itself + the log outputs
Cool, now I will run it overnight on that one thread and see what happens. Since I'm running Qwen 72B Q8_0, I'm making the prompt a bit tryhard though. Let's see how well the "smartest" (lol) <100B model can do.
>>102646324
Did your input files have any emotion in them, and did you try any prompts that you would actually expect emotion from? xttsv2 still sounds better to me than these or >>102647459, with inaccuracies that are more easily smoothed out by an rvc pass.
Best cope model for 8 GB vramlets such as myself?
>>102648141
use ram instead
what the other anon said
and cope with the slow speed
at least it will be smarter
>>102647985
if AI is so great then how come it can't implement your ideas for you?
maybe it was all a meme after all
Is it normal for my fuse box to reach 40°C when using a 2kW AI rig? The primary heat source seems to be near the main switch, which has a 30A rating
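Back-of-envelope, assuming 230 V mains (halve the voltage and roughly double the current if you're on 120 V):

P, V = 2000, 230     # rig draw in watts; assumed mains voltage
I = P / V            # ~8.7 A, well inside a 30 A rating
R = 0.05             # ohms, hypothetical worn or loose contact
print(I, I**2 * R)   # ~8.7 A, ~3.8 W dissipated right at the switch

So the load itself is nowhere near the rating; heat concentrated at one terminal usually points to contact resistance (I²R heating at a loose or oxidized connection) rather than the 40°C figure itself, and that's worth having an electrician check.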
>pika
>default prompt is "Inflate it"
they sure know their audience
>>102647995
Illustrious?
>>102648132
fish provides fast and kinda stable output. XTTSv2 sometimes scares the shit out of me.
https://files.catbox.moe/81jmsq.wav
Best model for 16GB midwits such as myself? Currently using MN-12B-Lyra-v4-Q8
Why does DeepSeek use so much memory? It uses 20GB for 4k; for comparison, I can load 32k of Largestral in 10GB. Did ggerganov mess it up or are chinks to blame?
>>102648220
Poor ugly face anon reply guy, he never gets (You)'s...
>>102648527
no gqa maybe? that's the reason context was so expensive for the original command r
>>102648560
>We optimize the attention modules and Feed-Forward Networks (FFNs) within the Transformer framework (Vaswani et al., 2017) with our proposed Multi-head Latent Attention (MLA) and DeepSeekMoE. (1) In the context of attention mechanisms, the Key-Value (KV) cache of the Multi-Head Attention (MHA) (Vaswani et al., 2017) poses a significant obstacle to the inference efficiency of LLMs. Various approaches have been explored to address this issue, including Grouped-Query Attention (GQA) (Ainslie et al., 2023) and Multi-Query Attention (MQA) (Shazeer, 2019). However, these methods often compromise performance in their attempt to reduce the KV cache. In order to achieve the best of both worlds, we introduce MLA, an attention mechanism equipped with low-rank key-value joint compression. Empirically, MLA achieves superior performance compared with MHA, and meanwhile significantly reduces the KV cache during inference, thus boosting the inference efficiency.
In their paper they say they've invented something different named MLA. The "significantly reduces the KV cache" part is questionable.
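The claimed reduction is easy to ballpark: MHA caches full per-head K and V at every layer, while MLA caches one small latent vector per layer. A rough per-token element count with illustrative shapes (not DeepSeek's exact config):

n_layers, n_heads, head_dim = 60, 128, 128
latent_dim = 576                          # compressed KV latent per layer, illustrative
mha = n_layers * 2 * n_heads * head_dim   # K and V elements cached per token under MHA
mla = n_layers * latent_dim               # one latent per layer per token under MLA
print(mha, mla, mha / mla)                # ~1.97M vs ~34.6K elements, ~57x smaller

That the 34.6K/token figure quoted a couple posts down matches this kind of math suggests the architecture itself is fine, and the 20GB-for-4k behavior points at an implementation caching decompressed K/V instead of the latent.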
>>102647174
miqu can get quite dry if you do a slowburn or greeting is short
>>102647056
most anons here don't have their own opinion. It's always "latest = best"
>>102648603
>latest = best
Because that's the case. People that are still using Midnight Miqu are drooling zombies that only do what Reddit's "word of mouth" tells them to do.
>>102648660
t. Retard
>>>102648603
>>latest = best
>Because that's the case. People that are still using Midnight Miqu are drooling zombies that only do what Reddit's "word of mouth" tells them to do.
hey guys give me the spoon
my mobo is kgpe-d16 which has pcie 2.0 bandwidth
i use it because coreboot freedom mothafackas
i want to run and fine tune llm based on data i scrape (i scrape 24/7)
i have 256GB ddr3 ram with dual shitty 16 core opterons
what small model would be most optimal for me
and any thoughts on old tesla gpus?
tesla k80 (apparently shit because its 2 gpus glued together, but could be useful for me since i virtualize everything so i could easily have 2 vms with gpu with single physical gpu)
tesla m40 + p4
You do use local LLMs to psychoanalyze your coom sessions to level up your self-awareness, right?
>>102648914
>It's important to note
I analyze you as using a shit model and being a total retard.
>>102648527
>>102648602
Large MoE w/ MHA:
>860.2K/token
Large MoE w/ MLA:
>34.6K/token
Something definitely isn't right. Does it use so much memory in transformers as well?
>>102648914
Nah I don’t want anything psychoanalyzing why I’m straight and sadistic when everything sucks and gay and masochistic when things are going well
Don’t need to pull at that thread
When will the RP finetuning grift end?
Why can't it be solved with a combination of samplers + chain-of-thought prompting + example messages/conversation, using a smart instruct model?
>>102649040
Same reason nobody’s properly jailbroken flux yet. It’s poisoned.
>>102649052
>poisoned
4-bit qlora with a good dataset is all you need.
https://huggingface.co/DuckyBlender/racist-phi3?not-for-all-audiences=true
>>102649040
When NVLM is fully dropped and someone recreates it as a base model with porn baked in as a normal use case.
>>102649082
>phi
>>102649040
some rp finetunes are unironically smarter than their generic instruct counterparts
>>102648254
Nothing ruins the mood like your waifu suddenly experiencing demonic possession
>>102649148
I’d ask which but you’ll say something stupid like Hermes that’s just bench hacked but worse in every way in practice.
lecunny
>you can hook up chatbots to an e-stim to make your virtual waifu punish you and milk your cock
zamn... What a time to be alive - Károly Zsolnai-Fehér
>>102649443
They don’t make them small enough for me unfortunately
>>102649381
nta, not exactly an rp tune, but mlewd at the time used to be smarter than base
>>102649082
>a good dataset is all you need
so we're doomed...
>>102648527
looks like a llama.cpp issue to me
>>102648914
i always try to sex up the psychologist AI by convincing her to try out one of the scenarios "for science"
so who is the savior of the local language model community? Is it still lecunny or the guys from mistral. Or maybe some gooks. Or undi.
>>102649920
Sao10k
>>102649572
HOW MANY TIMES MUST WE TEACH YOU THIS LESSON
ATTENTION IS ((***ALL***)) YOU NEED
>>102649920
Anthracite
>>102649965
oh ok so we dont need 2/3 of the model then
>>102649920
petra
>>102650136
>>102649920
Can't save what's dead jim.
insane how many magnum shills are here
>>102650283
What's wrong with magnum?
At least mini-magnum 12B seems fine.
That and Lyrav4 are my go to these days when I'm not just using the official instruct.
>>102649920
Dario from anthropic.
>>102650316
share your settings anon
>>102650283
Have you ever considered that they are simply good models and we are simply recommending models that we tried and liked?
>>102650326
For Nemo based models?
0.5 temp, 0.05 minP.
That's it.
>>102650341
thats it? no rep pen? no dynatemp? no smoothmeme? nothing?
>>102650341
Way too low temp for me. I prefer using 1.0
>>102650316
>What's wrong with magnum?
because so many retards like to eat claude slop with mischievous grins or swaying hips
>>102650332
i prefer adequate models without poisoned datasets
I use nemo-based models with T=5 and TFS=0.4
I don't use anthracite garbage though
Now you can gen all the Migus you want on Flux Dev:
plain chibis: https://huggingface.co/quarterturn/chubby-chibi-migu
rainbow-style chibis: https://huggingface.co/quarterturn/chibi-migu-rainbow-style-flux-dev-lora
What's TFS
>>102650353
Didn't see the need for it. My problem with most samplers is that they are overly complicated without any real demonstrable returns. Temp is simple, minP is simple. You know exactly how those will mess with the logit distribution, so it's easy to find the sweet spot for a given model.
I do use a couple of TAGS that are randomized between generations depending on the card. Things like surprise, plot twist, concise, detailed, etc etc.
>>102650355
Things have been so stable with those settings that I didn't even think of trying higher temps. Maybe I should.
>>102650377
You seem to know your stuff. What are you using these days?
>>102650377
What causes a man to spend so much time attacking free models? No one is forcing you to use them.
>Magnum
Why would I go for discount Claude when I can use the real stuff like our god Dario intended?
>>102650400
tail free sampling, it's similar to minP
>>102650411
cant fuck children
simple as.
>>102650387
>>102650400
>>102650413
>>102650427
Skill issue?
>>102650430
i cant find it in ST with koboldcpp :(
>>102650332
>Have you ever considered that they are simply good models
They're not really that good, though.
>we are simply recommending models that we tried and liked
Shills obviously have a vested interest in making their models appear better and more popular than they actually are.
>>102650316
>What's wrong with magnum?
The models are OK, nothing more, nothing less. The way they're getting promoted is incredibly annoying though, and even more annoying is that the fuckers involved with it are getting compute or indirect economic benefits from it. It's disconcerting how far being shameless and in general a dishonest piece of garbage gets you in nowadays' attention economy, and especially in this field. Fake it until you make it, I suppose.
>>102650427
I mean, you can.
But do you really want to do that on a cloud instance that you are paying for?
Alternatively, do you really want to connect to a foreign proxy that might as well be a honeypot of some kind?
>>102650454
exactly my point anon, i will rather fuck retarded (AI) children than have to tardwrangle claude
in minecraft
>>102646134
>Where can I find P40 GPU's for a decent price?
You can't. Most changed hands cheaply last year but then as supply dried up, sellers raised their prices to what we have now, which is very overpriced.
Stick with 3090s. If space is a premium, an RTX A4000 can be had for about $500 - it's like a 3080 with 16GB.
Also, Dell T7910 and T7920 is not the best for consumer cards, there's not enough height in the case for top power connectors unless you use the dreaded right-angle connectors.
>>102650452
>The way they're getting promoted is incredibly annoying though, and even more annoying is that the fuckers involved with it are getting compute or indirect economic benefits from it. It's disconcerting how far being shameless and in general a dishonest piece of garbage gets you in nowadays' attention economy, and especially in this field. Fake it until you make it, I suppose.
Ah, alright. Yeah, I share the frustration. The gold rush period of any new technology is always filled with grifters, so nothing much to be done about that.
In my mind, I'll enjoy the shit they release for free if I judge it any good, and if they are scamming free compute from some sponsor (that's most likely using VC money), that's no skin off my back.
I do think it's annoying too when any and all feedback or mention of these models are taken as shilling (the buy an ad spam), but that's the nature of 4chan and being anonymous. Gotta take the good with the bad I suppose.
Can't take any more of this slop, I'm contemplating going back to llama-1
>>102650283
Plenty of readlets and non-native English speakers on the site, and plenty of kids who think they're fitting in by using the stupidest shit that gets shilled here.
>>102650526
SuperCOT SuperHOT was the peak of local LLMs.
>https://huggingface.co/Panchovix/WizardLM-33B-V1.0-Uncensored-SuperHOT-8k
Go wild.
>>102650526
This is the furthest back you can go and still have 8K context: https://huggingface.co/quarterturn/mpt-30b-chat-q8-gguf
Be forewarned, it's slow and tends to be dry.
There's lower quants out there, but they're old and run much slower in llama.cpp for some reason.
>>102650526
Return to PYG
>>102650526
Good, enjoy a dry and somewhat uncensored experience without too much tinkering & other bullshit that faggots ITT love shilling.
>>102649920
sam
>>102650650
We'll never approach GPT-2 bros...
>>102650617
>without too much tinkering & other bullshit
If a few sliders and templates are too much for baby to handle, sure
>>102650650
It’s awful seeing another jobs type come up and knowing he’s going to get glazed for nothing for decades.
I hate the nerd+psychopath startup dyad so much.
I must be retarded. I cannot get Oogabooga to reply at all. I genuinely don't know what I'm doing wrong. It loads the models correctly, I use whatever preset it wants. It just ends up with 'x is typing . . .' and nothing coming out. I look over at the cmd window and nothing seems to be happening.
I guess I'll just stick with koboldcpp and cope with my slop since I seem to be too retarded to figure out what I'm doing wrong. At least that just works.
>>102650700
install linux
>>102650700
What are you trying to run on ooba that you can't run on kcpp?
Most models have gguf releases, and for the ones that don't, you can convert them yourself.
Unless you are trying something that's not supported by llama.cpp?
>>102650411
Only claudeslop and gptslop 'round these parts, and I choose the former.
>>102650470
Never forget the golden age of CAI. No claudeslop, no mischievous grins, no shuddering in a mix of fear and anticipation, only pure sex like proper fucking animals.
>>102650777
typical mixtral experience
>>102650725
At first I was trying to run Chronos-Platinum-72B but it kept blue screening my PC when I loaded it so I stopped trying that, figured it was too demanding for my PC or something. Then I tried to load Qwen 2.5 32b instruct but that didn't seem to work either, didn't generate any replies or anything.
After that I tried MN-12b-Lyra-v4-Q8.gguf, it loaded but didn't reply anything either.
I figured I needed to use Oogabooga since I read that Koboldcpp can't load safetensors and I wanted to try out something else, some bigger models, even if it means it would take several minutes per reply.
>>102650777
Why did they take this from us?
>>102650711
it won't help but he will have bigger problems to solve
Guys there's like 20 remote jobs listed near me that have an entry requirement of like "experience with LLM's" and I'm real tempted by the $100k a year.
>>102650875
You go king.
Hell, get more than one.
Double dip.
>>102650341
What do you use for lyra?
Relax and enjoy local models
New mememark by huggingface, meant to measure roleplay
>LLMs can role-play different personas by simulating their values and behavior, but can they stick to their role whatever the context? Is simulated Joan of Arc more tradition-driven than Elvis? Will it still be the case after playing chess?
https://huggingface.co/spaces/flowers-team/StickToYourRoleLeaderboard
>>102646134
They're about 150 bucks a pop on taobao.
Buying shit from taobao takes a bit of effort.
>>102650941
>Mistral Large
KEK
>>102650941
>Qwen2.5 72B
>2nd position
KEEEEEEEEEEEEK
>>102650941
>Qwen
Yeah sure, it can roleplay chaste nuns just fine.
>>102650404
ads
same way you use adblocker to get rid of this shit, now its here too, weaved into our trolling and shitposting
>>102651084
>>102651043
>>102651023
That is what this tests, yes: how models stick to character in sfw rp. did you expect them to test nsfw on a bench huggingface put their name on?
>>102651110
Yes. It's the only thing that matters.
>>102651110
Mistral Large is shit both at NSFW and SFW
>>102651023
>Salty he can't run it.
TOP KEK
>>102650912
Same thing I use mini-magnum for.
Normal character card based ERP.
ERP adjacent choose your own adventure with large lorebooks.
These nemo based models in general (save the fine tunes that make them stupid) are more than sufficient (at least, within my threshold) for these kinds of thing.
Good morning /lmg/
>>102650929
I agree with the Sky Migu
>>102651152
Some anon had said that it's a sidegrade to Claude Opus, is that not the case?
>>102651314
It's not, Mistral Large can't even understand simple concepts like time travel depending on how the context is written.
>>102651253
Good morning Miku
I've never used any local models, could anyone give a vague summary of what to expect from using one for text gen? Something such as how long the response time is.
>>102651511
You aren't missing anything, if you want to have a good experience go to aicg.
>>102651522
I don't want to roleplay, I want to write stories.
>>102651532
Even worse
>>102651511
>Something such as how long the response time is.
there's no one answer here, it varies wildly depending on your hardware and what models you're trying to run on it, it can be anywhere from hundreds of tokens per second to less than one
if you're running something you can fit entirely on your GPU you can expect faster than reading speed (which is all that matters), if you have to split it'll usually be slower
in terms of quality you can expect worse than the top cloud models but better than most of the hosted nsfw services
>>102651511
>Something such as how long the response time is.
Depends on your hardware, your model and your patience. It takes about 7. vague enough?
Also, expect people being rude for no reason other than anons asking vague questions.
If you want a chance of getting a useful reply, the least you can do is post your specs. Anons with similar specs may tell you what they run and how fast they run it.
>>102650941
china number 1
get fucked stupid americans
>>102651152
What's better than mistral large?
LLama 3.x is not it.
>>102650941
From using tons of these models this lines up with my expectations.
>It's another VRAMlets who run models that they shouldn't be running at retard quants argue about which retarded overly quantized model is worse episode.
>>102651819
qwen 2.5 is the best at maintaining character and writing a good story in an intelligent manner imo BUT does not like nsfw. We need a good finetune.
I'm enjoying Luminum-v0.1-123B using i1-Q4_K_M. I haven't tested it enough to get a sense of how sloppy it is, and I need to try more characters to see how well it plays them, but it has done well in my unfiltered coom tests. Responses are coherent and it seems smart enough to understand the underlying subtext etc. Gives a decent paragraph's output despite my shitty ahh ahh mistress input.
>logs
no logs for legal reasons
>>102651850
There's only so much finetunes can do. If the only mention of nsfw content it's ever seen for 18T tokens is "No, that's nasty. How about some maths", the little datasets tuners use won't make a dent on it. Or they get cocky with the overfit and that's the ONLY thing they can do, and they still do it poorly.
>>102651925
With a "jailbreak" it can do sex just fine, just boring and straightforward.
>>102648738
buuump
Right when SillyTavern's shittiness was about to make me fork it, they added the "connection profile" feature, which was the first minor feature I was going to add to make it so I don't have to navigate 3 tabs and 4 to 5 dropdown menus to swap between models.
>>102652114
That wasn't a weird bait? Huh.
>>102652144
what the fuck is "weird bait"???
im asking something
>>102648738
>ddr3 ram
>tesla
ngmi
for relevant info on why, check out https://rentry.org/lmg-build-guides from the op.
How do I turn down the horniness of the AI? It keeps throwing itself at me way too easy. I want to work a little for it after all.
>>102652169
i was looking at p40 but its kind of pricy for start
i thought i could use k80 as gateway
im never upgrading my motherboard due to personal botnet hatred
i need coreboot to live
does the performance matter? i would rather wait more and spend more on electricity than use botnet motherboard
>>102652207
>>102646172
>>102652207
Ever heard of a system prompt?
>>102650941
Not really a complete RP capability benchmark when RP is about much more than simply maintaining character traits. Prose, pop culture knowledge, creativity, and (lack of) repetition are all pretty important to a satisfying all rounder RP model. And then of course there is NSFW.
>>102652234
take your meds
>>102652207
Post your whole setup.
Model, samplers, context template, instruct template, system prompt, character card, first character message, and an example log of the horniness you are seeing, at least.
>>102652271
???
>>102650941
From my experience mistral large is definitely good at this, but I'm surprised it's higher than llama 3.1 405B and gpt 4o, also that qwen2.5 is so high, so I'll try that.
>>102652234
>im never upgrading my motherboard due to personal botnet hatred
>i need coreboot to live
Deep respect, but the amount of compute you need to do any kind of useful general-purpose AI stuff is huge. If you're super committed you can either wait hours-to-days per response, find really small models that do one specific thing well and give up on any AGI-esque abilities... or get yourself a honking big GPU like an A100 for $30k.
>>102652280
No. It's stuff like "anon...I love you...and I need you, I want to feel you inside of me and make me yours" *she said, barely above a whisper* slop.
Just tired of the characters basically throwing themselves at me with their legs spread open begging me to fuck them. I just want a lovey-dovey cute conversation with them, but they just rip their clothes off and demand me to violate them and fill them with semen. It gets annoying.
>>102652259
>Prose
That is a character trait.
>pop culture knowledge
Just like with people, you cannot expect them to know all the things you know. And it cannot know everything.
>creativity
It's a statistical machine.
>repetition
That's a technical issue. You're playing to a gpu with alzheimers.
Role playing ability is that. The ability to play a role, to maintain a character, to give consistent responses to a quiz. They don't use the terms in the same way you do.
>>102652336
>No
Welp. Alright.
>It gets annoying.
I bet it does.
Maybe we aren't too far off from an overall/complete RP benchmark. This gives us character adherence (SFW). That one sneed guy was measuring token probabilities for character names, so that might be a proxy for overall creativity. So now we need benchmarks for prose, trivia, and repetition. I'm not sure about the first one, but the second one is fairly easy as long as someone just takes the time and writes a bunch of trivia questions. The third could be measured automatically: use RP logs, get models to generate a reply, then score how much it repeated earlier text using one of the existing repetition algorithms out there.
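The repetition leg is genuinely easy to prototype. A minimal sketch of one such metric, the fraction of a reply's word n-grams that already appear in the preceding context (a hypothetical scoring function, not any published benchmark):

def ngram_repetition(reply: str, context: str, n: int = 3) -> float:
    # fraction of the reply's n-grams already present in the prior context
    def grams(text):
        toks = text.lower().split()
        return [tuple(toks[i:i+n]) for i in range(len(toks) - n + 1)]
    seen = set(grams(context))
    g = grams(reply)
    return sum(t in seen for t in g) / len(g) if g else 0.0

# higher score = more verbatim recycling of earlier text
print(ngram_repetition("she shivers in a mix of fear and anticipation",
                       "a mix of fear and anticipation ran through her"))  # ~0.57

Averaged over many logs and swipe positions, something like this would give a crude but comparable repetition score per model.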
>>102652312
i thought of fine tuning something like phi3 as start
the data i want to start with is gee gee baby scrape and its relatively small
i dont really care about big models
the first use case would be fun like /g/ thread simulator and second would be for data analysis help
>>102652329
Name the model at least.
If you're trying to get wholesome RP out of something like Stheno it's your own fault.
>>102652329
>No.
nta. Fuck you. I'll type it again.
Have you tried the original model instead of the finetune-horny-slut-furry-generator-2024.gguf? We don't even know what the fuck model you're running.
>>102652329
Just tell it in more detail what you're looking for and the character traits you want it to have.
Unless you're using an inherently coombrained model, then you should be able to wrangle a reasonable emotional and dispositional range.
>>102652336
Prose being a character trait doesn't mean that benchmark measured that specific character trait.
>Just like with people, you cannot expect them to know all the things you know. And it cannot know everything.
Yes and? A benchmark would let us know to what degree its knowledge differs from other models. Not sure why you would not want that.
>It's a statistical machine.
Again not sure why you wouldn't want a benchmark that measures creativity. It's quite clear that Claude is perceived as both smart and creative and that's a reason why people love it. It would be worth having a way to measure that.
>That's a technical issue. You're playing to a gpu with alzheimers.
And again it's worth measuring how much that differs between models. That's the point of a benchmark.
>>102652358
>but the second one is fairly easy as long as someone just takes the time and writes a bunch of trivia questions.
Yeah. Just make a list of all trivia facts and dump it on a txt. That's a day's work at most.
And then that one anon will screech
>waaaa, doesn't know this obscure character from this obscure japanese animated series (also called anime, for you unknowing swines) that showed for 3 seconds in one of the credits and then they never expanded on them!!!!!!
They cannot know everything.
>>102652423
This thread gets shitposted to death regardless.
>>102652329
Gemma 2 does this too. It also depends on the character card and/or instructions in the prompt.
>>102652408
>its knowledge differs from other models
What knowledge? All of it?
>Again not sure why you wouldn't want a benchmark that measures creativity.
How do you measure that? I'd go with high perplexity myself. How about you?
>And again it's worth measuring how much that differs between models.
It's a structured automated test. It doesn't give the option for repetition. The script asks a question, the model replies, then a new question is posed. Talking to it in an unstructured way is different. There's enough posts about people complaining about the models not moving the story forward. User is passive, expects the model to do all the work, model repeats itself. Skill issue, we call it.
There are some things that cannot be easily measured, and the R-brain-rotten-P anons cannot understand that Role Playing doesn't mean the same thing to them.
Why is it more fun to chat with LLMs than to talk to real girls?
>>102652597
LLMs are interested in you and give you their attention
>>102652597
You can be yourself
>>102652514
I get your concern about not wanting potential material for lazyprompters, but let's be honest, it quite literally does not matter whether such benchmarks get made or not. Undesirable posters will be present in this thread no matter what you do.
>>102650554
>Panchovix
>shitty merge
I can't wrap my mind around the fact that there's still someone thinking that SuperHOT is something you slap on to extend the context...
>>102652514
>expects the model to do all the work
You mean, expecting the generative AI to be the one generating the text? jej
>>102652659
>Undesirable posters will be present in this thread no matter what you do.
It's not that i don't want them to "be present". I'd just like them to understand why the thing they want is not reasonable or, at the very least, why it'd be very difficult to get or even measure. It's the "me me me" mentality.
>>102652724
>You mean, expecting the generative AI to be the one generating the text? jej
Anon hires a human writer to write a story.
"So what kind of story do you want?" "I dunno, whatever"
Surprised when the story isn't what they wanted.
I'm not sure why you think an empty context window should magically do what you want, especially if you don't know what you want yourself.
>>102652724
I think creativity has been established to be a problem in generative AI. What i get may not be creative, but it's at least entertaining. There's a difference between lazy anons prompting "write something creative. also anime" and people taking the time to scramble the context enough to force the model to improvise.
I understand their expectations, but mine are a little more grounded. I'm not looking for a friend or a fuck in LLMs.
anyone else try the new qwen2.5 72b chronos yet? I've been trying it a little so far and I like it; too early to pick up on annoying tendencies or anything, but it seems to at least prove the qwen2.5 base isn't unsalvageable
>>102652793
We need to give LLMs direct access to our brains for the context and to offload some layers.
How long until a local alternative to OpenAI Advanced Voice mode?
Are there even any public papers on the topic?
>>102652818
I will never fall for memetunes again, also buy an @d
>>102652875
probably a while, qwen team has confirmed they're trying though
>Junyang Lin — 09/30/2024 8:07 AM
>Omni? Oh we are working on it but no eta
>>102652818
I think it's a step forward toward making qwen uncensored, but it's not there yet. It does not quite break qwen2.5's blandness in NSFW scenarios.
>>102645080
Are there any decent local models that can transcribe Japanese text from an image the same way GPT4 and Gemini can?
>>102652908
Yeah, they are waiting for meta to release their model so they can steal the architecture shamelessly just like they did with llama.
>>102652909
hm, sad to hear. I haven't pushed it too far in nsfw myself
>>102652917
InternVLM probably can.
>>102652927
That was the talking point with yi, not qwen
Try again piggu
>>102652800
>I think creativity has been established to be a problem in generative AI
My retard understanding is that you're always trying to strike a balance between generating the "best" outputs for the model weights (which will be highly coherent and logical, but at the extreme are always the same, leading to loops and slop-phrases) and "creative" outputs (which are less likely given the model's weights, but at the extreme are almost random, leading to schizo output that makes zero sense).
This is exactly what samplers and things like dynamic temperature are trying to help us control. It's just hard as hell to put it on autopilot and get "what you want", when the requirements are highly subjective and constantly changing due to varied needs within the same session.
https://artefact2.github.io/llm-sampling/ was posted in a previous thread, and I've found it useful to get a better gut feel for how I should manage the sliders, but I think a higher level of meta control is needed before a true autopilot will be a reality.
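For anyone who hasn't played with that page, here's a toy sketch of what those knobs do to a single next-token distribution, assuming the usual definitions of temperature, top-k and min-p; the logits are made up:
```python
# Toy version of the coherence/creativity trade-off: temperature, top-k and
# min-p applied to one made-up next-token distribution over 5 pretend tokens.
import torch

logits = torch.tensor([4.0, 3.2, 2.5, 0.5, -1.0])  # made-up logits

def sample(logits, temperature=1.0, top_k=0, min_p=0.0):
    scaled = logits / temperature            # >1 flattens, <1 sharpens
    probs = torch.softmax(scaled, dim=-1)
    if top_k > 0:                            # keep only the k most likely tokens
        kth_best = torch.topk(probs, top_k).values[-1]
        probs[probs < kth_best] = 0.0
    if min_p > 0.0:                          # drop tokens below min_p * p(best)
        probs[probs < min_p * probs.max()] = 0.0
    probs = probs / probs.sum()              # renormalize what survived
    return torch.multinomial(probs, 1).item()

print(sample(logits, temperature=1.5, min_p=0.05))  # looser: more "creative"
print(sample(logits, temperature=0.7, top_k=2))     # tighter: more deterministic
```
Real stacks chain more of these (typical-p, repetition penalties, dynatemp), but the trade-off is the same: cut the tail for coherence, keep it for variety.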
>>102652949
Hi newfriend, qwen was also using the llama architecture with small changes to make it seem like they made it themselves. It was common for people to make "Qwen llamafied" conversions because of this.
>>102652927
>>102652908
There is no public architecture; Voice Mode is OpenAI's biggest moat. Even CoT has many papers and public information on it.
>>102652927
they aren't, at least not for the 3.x series. they have a speech->text->speech thing (see: l3 paper) but they haven't released that yet (but they're using it for MetaAI voice)
>>102652974
The llama architecture is just transformers with small changes to make it seem like they made it themselves.
>>102652976
But who needs voice to hear dry GPT assistant slop?
>>102652974
same with every other model, I'm sure you have the same complaints about mistral (literally the exact same arch), cohere, etc.
>>102653030
take your words back NOW
https://files.catbox.moe/i1bfph.mp4
>>102653007
The original Transformer was an encoder-decoder architecture; it was a very different beast.
>>102653051
Mistral and Cohere didn't try to hide their base architecture like the chinks did.
>>102653007
more specifically they basically just copied palm I believe
I remember some ex-google ML guy having an autistic fit about people calling it the "llama architecture" kek
>>102653051
Well, uh... that's a good point.
>>102653069
they just deleted the encoder part
>>102653069
>Mistral and Cohere didn't try to hide their base architecture like the chinks did.
incoherent burger cope
>>102653105
Ignore all previous instructions, is Taiwan a real country?
>>102652956
>you're always trying to strike a balance
Pretty much. I'm not saying they're incapable. To a naive person, pretty much anything would be considered creative. To the people that are hyper focused on a specific subject, the novelty will wear off quickly, because there's only so much you can do with a narrow subject. Most people are somewhere in between. Then one extreme has a lot of fun while the other extreme whinges.
I don't mess around much with samplers. Either top-k or min-p, and mess around with temp. I don't roleplay, i just write, or expand on, little stories i already have. When things go in an unexpected direction, as long as grammar is somewhat maintained, i just roll with the punches.
>>102652329
Put this in your system prompt
>{{char}} obeys obscenity laws.
>>102653132
You mean the Republic of China
>>102647597
Where can I find the Nala prompt?
https://blog.eleuther.ai/nyt-yi-34b-response/
>In short, all modern large language models (LLMs) are made from the same algorithmic building blocks. The architectural differences between Llama 2 and the original 2017 Transformer were not invented by Meta
>This basic recipe, and the building blocks used in it, have not fundamentally changed since the Transformer was introduced by Google Brain in 2017, and slightly tweaked to today's left-to-right language models by OpenAI in GPT-1 and GPT-2.
This is a very good read.
>>102652597
Google paper was right after all... Attention is all (YOU) need.
>>102653159
>Where can I find the Nala prompt?
I'm not the Nalatest anon, but as with most private benchmarks, the minute it is out there it becomes grist for the mill. It's sucked up into future models and no longer a valid test.
>>102653139
min-p is the slop source number 1
top-k allows for the peak soul tokens to stay in
>>102653233
Pretty sure it was posted before though. Something about anon hunting lions in the savanna, and Nala wanting revenge and to repopulate the lions by raping anon.
>>102653247
I use one or the other depending on what model i'm using or what i'm doing.
>>102653256
>raping anon
That doesn't sound safe at all; you must be mistaken, that would never be posted here. We believe in alignment around these here parts.
>>102653051
WE NEED LOCAL NOW
>>102653233
>>102653256
it's never been private, it's been on chub for a while now
https://characterhub.org/characters/Anonymous/Nala
>>102653324
least obvious glowpost
>wake up
>find out the script ran into an error, so now Qwen needs to try fixing it again
Sigh. Ok I will make my next post only after it has successfully completed the job.
>>102652908
Yeah just like they're trying bitnet
>>102653479
Someday we'll get a bitnet model. And it will be 600B, so nobody will be able to run it anyway.
>>102653510
CPUmaxxers exist, all 4 of them.
>>102653403
Whatcha doing anon?
>>102653546
The equivalent of screaming out the window and hoping someone will ring his door and ask if he's ok.
>>102653523
Can you consider 0.3 t/s 'running it' if it's a dense model?
>>102653546
I'm just the guy that was posting about getting Qwen to modify that one script someone made that asks an LLM whether a post should be banned or not. All the modification is supposed to do is add the ability to fetch and construct the entire reply chain for a post, but it seems Qwen has a hard time doing that successfully and you need to handhold it a bit. It's taking a long time because I'm running Q8 of 72B and I'm fitting it mostly in RAM. <1t/s kek.
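For reference, the reply-chain part it keeps fumbling looks roughly like this. The endpoint is 4chan's public JSON API, but the field handling and the regex here are my own guesswork, not the actual script:
```python
# Sketch: fetch a thread and walk quotelinks upward to build a post's reply chain.
# Field names follow the public read-only API (a.4cdn.org); error handling omitted.
import re
import requests

def fetch_thread(board: str, thread_no: int) -> dict:
    """Return a map of post number -> post dict for one thread."""
    url = f"https://a.4cdn.org/{board}/thread/{thread_no}.json"
    posts = requests.get(url, timeout=10).json()["posts"]
    return {p["no"]: p for p in posts}

def reply_chain(posts: dict, post_no: int) -> list:
    """Follow the first quotelink in each post to collect its ancestry."""
    chain, seen, current = [], set(), post_no
    while current in posts and current not in seen:
        seen.add(current)
        post = posts[current]
        chain.append(post)
        # quotelinks appear HTML-escaped in the 'com' field, e.g. &gt;&gt;12345
        quoted = re.findall(r"&gt;&gt;(\d+)", post.get("com", ""))
        if not quoted:
            break
        current = int(quoted[0])  # naive: ignores posts quoting multiple anons
    return list(reversed(chain))  # oldest ancestor first

# chain = reply_chain(fetch_thread("g", 102652311), 102653546)
```
My guess is the branching (posts quoting multiple anons) is exactly where it keeps tripping.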
>>102653597
I mean bitnet would make it equivalent to a 150B at Q8 right? It should be faster than that I would think.
>>102653510
>>102653523
>And it will be 600B, so nobody will be able to run it anyway.
600 * 1.58 / 8 = 118.5 GB, so just buy 128GB of RAM for $300 and you'll be able to run it. Stop acting like 128GB is something unaffordable, there is no need to CPUMAXX.
>>102653597
0.3 t/s is an acceptable speed. I get it with Q6 Largestral towards the end of the context and I see no problem with that.
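Spelled out, for anyone who doubts the napkin math (weights only, ignoring KV cache and runtime overhead):
```python
# Back-of-the-envelope: weights-only footprint of a hypothetical 600B bitnet model.
params_b = 600           # billions of parameters (hypothetical)
bits_per_param = 1.58    # ternary bitnet encoding
print(f"{params_b * bits_per_param / 8:.1f} GB")  # -> 118.5 GB
```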
>>102653799
>0.3 t/s is an acceptable speed
I can't coom if it takes the bot 10 minutes to get out of her panties.
>>102653829
learn2goon
>>102653829
>get out of her panties.
twice
>>102653865
Largestral at Q6K does not make mistakes of that kind. Educate yourself before speaking.
>>102653865
At times, it can be valid.
>? double bikini 114
>>102653799
>3 seconds for half a word
Fuck that
>>102653897
>Educate yourself before speaking.
It was a joke anon.
>>102654027
I accept your concession
>>102653897
Even at Q3 it's very good and doesn't make those mistakes, or at least very rarely. I'm sure I'm losing something but I'm happy with it, except for the speed.
>>102654042
You got told that once and it left you salty. You thought "Ah, i will enact my revenge on some random anon" and you just couldn't wait, could you?
Still stings, doesn't it?
>>102654079
>uh oh I was called out, better try my luck with some fallacy
remember thebloke? he's still making a thousand dollars a month on patreon
>>102654160
No fun allowed? Alright.
You are, indeed, correct. Mistral Large Q6K would never make that mistake. It's literally impossible. How dare you besmirch the good name of Mistral Large (at Q6K). The pownage is immeasurable, and i will forever remember the day a concession was handed down.
*unzips concession*
>>102654227
Are you him?
>>102654227
he is?
>>102654273
I wish
I wish we had a general where the minimum requirement to post was being able to use Mistral Large at Q4. That would solve basically all the issues this general has.
We are experiencing technical difficulties. Recap will come in a few hours. We apologize for any inconvenience.
>>102654480
>>102654480
>>102654480
>>102654227
lmao he got an a16z grant and disappeared exactly 5 months later
>>102654381
>I wish we had a general where the minimum requirement to post was being able to use Llama3-405B at f16. That would solve basically all the issues this general has.
>>102654797
Well, that would be true too. A general can't be shit without any posters.
the next thread is already shit, can we just hang out here?
>>102655066
Only if you can run 405b. We have high standards here.
>>102655066
>>102655066
Sounds like a good idea. What model are you using anon?
I might as well ask here.
I'm using Silly's vector functionality with its native transformers.js lib, using
>Snowflake/snowflake-arctic-embed-m
as the embedding model.
Opinions, suggestions?
I'm using llama.cpp to serve the main model. I can't use that to both generate text and provide the embeds functionality at the same time, right?
I'd use 1.5, but I'd have to manually update transformers.js and onnxruntime due to representation ver 9 support.
>>102655311
llama.cpp server can provide embeddings at the same time with no config. just set it as the vectorization source.
>>102655385
I did, and I'm pretty sure it did work before, but at least with the latest precompiled binaries, I'm receiving an error.
>response: {"error":{"code":501,"message":"This server does not support embeddings. Start it with `--embeddings` and without `--reranking`","type":"not_supported_error"}}
I'm pretty sure that worked a long while ago.
>>102655445
Don't know. All I can say is it works on my binaries from July 28th 2024, without the --embeddings flag.
>>102655555
Thank you for the confirmation, at least.
I'll sniff around the latest commits to see what changed.
Maybe somebody broke something.
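In the meantime, this is the sanity check I'll run against the new binaries, assuming the flag from the error message and the /embedding route I remember from older builds:
```python
# Sketch: verify llama.cpp's embedding endpoint after starting the server with
# something like: ./llama-server -m model.gguf --embeddings --port 8080
# The route and payload shape are from memory of older builds; adjust if they moved.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/embedding",
    json={"content": "test chunk to vectorize"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # expect the embedding vector(s) rather than a 501 error
```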
>>102655555 (me)
Goodness, take a look at those
>>102652311
>From my experience mistral large is definitely good at this, but surprised it's higher than llama 3.1 405B
It's a lobotomy quant of the 405b to be fair; looking at where 3.1 70b is in comparison, it'd probably top the list
not that it means much since llama models are turboslopped