/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

LMGumi Edition

Previous threads: >>102743974 & >>102737214

►News
>(10/10) Aria: 25.3B, 3.9B active, multimodal native MoE model with 64k context: https://hf.co/rhymes-ai/Aria
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: normal gumi gen.png (588 KB, 512x512)
►Recent Highlights from the Previous Thread: >>102743974

--Aria: Open-source multimodal native MoE model by Rhymes AI:
>102758358
--Mistral tokenizer and prompt formatting discussion:
>102751333 >102751359 >102751381 >102751464 >102751910 >102751487
--AMD GPUs face challenges with multi-GPU support and P2P transfers, 16GB RX 6800 recommended for sub-$300 category:
>102747480 >102747696 >102747799 >102747830 >102747929 >102748171 >102748032 >102747714
--Mistral Small and Mixtral 8x7b Q6_K recommended for 3090 setup:
>102750146 >102750611 >102750764 >102750956 >102751248 >102751465
--Strubell's 100 million gallons of oil per AI inference claim debunked:
>102744385
--Larger models can be more creative with the right sampling techniques:
>102750367 >102750404 >102750470 >102750515 >102750718 >102750983 >102751017 >102751242 >102750832 >102751123 >102750521
--Language models can imitate constructed languages but have limitations:
>102748022 >102748182
--Creating an uncensored model from scratch would require addressing dataset limitations, built-in biases, and training costs:
>102747016 >102747423
--Backend automatically adds BOS token, don't add in frontend:
>102753612 >102753668 >102753702 >102753786 >102753898 >102754063 >102754507 >102754742 >102754780 >102755039
--Request for node-based editor with specific AI features:
>102744816 >102744869
--GPU upgrades recommended for better performance over CPU inference:
>102745882 >102745931 >102746014 >102746375 >102752232 >102746217
--Benchmark leaderboard discussion, Mistral Small performing well, questions about Gemma2 9b performance:
>102748238 >102748272 >102748518 >102748557
--Miku (free space):
>102744114 >102744270 >102744320 >102744360 >102746386 >102748863 >102754933 >102755545 >102756899 >102756964 >102757239 >102757390 >102758184

►Recent Highlight Posts from the Previous Thread: >>102743977

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Gumilove
>>
My server is down so I can't Nala test Aria, sorry.
>>
File: 00202-4258927505.png (371 KB, 512x512)
>>102758901
Oh yeah, PSA:
regarding running multiple desktop PSUs in tandem. Went to take one of my PSUs and 3090s out to put in my gaming PC for a while (while I wait for 90B gguf support) and couldn't get it to power on. Tried different cables etc.; troubleshooting more or less places the problem at the 24-pin motherboard connector on the PSU side. Presumably it somehow got cooked (it was sharing a mutual ground cable with my other PSU, so I never noticed since it was only supplying ground and power for one 3090). I'd advise anyone building their own server to avoid running tandem PSUs. Just stick to what you can fit on a single one, and power limit if you have to.
>>
File: 2347656478769.gif (929 KB, 326x318)
>>102758842
>AMD GPUs face challenges with multi-GPU support

I use a 7900 XTX and a 7800 XT, and for some reason kobold version 1.61.2.yr0-ROCm is the only program that will allow any sort of GPU splitting.

It simply does not work on any other version; the output is garbled bullshit. Outside of this specific version, I can only use 1 GPU.

My performance is great, but fuck me, not being able to GPU split on new versions for no fucking reason sucks ass. Fortunately L3 also sucks ass, so updating isn't exactly necessary.
>>
Aria.gguf?
>>
>>102758839
>LMGumi Edition
Affirmative. Local Gumi activated.
>>
>>102758839
Is there a way to hide or rename the assistant and user roles in the tavern chat history the model is fed? I want to see whether that might improve how natural the outputs are with some models.
>>
svelk
>>
File: Untitled.png (1.79 MB, 1080x3084)
Accelerating Diffusion Transformers with Token-wise Feature Caching
https://arxiv.org/abs/2410.05317
>Diffusion transformers have shown significant effectiveness in both image and video synthesis at the expense of huge computation costs. To address this problem, feature caching methods have been introduced to accelerate diffusion transformers by caching the features in previous timesteps and reusing them in the following timesteps. However, previous caching methods ignore that different tokens exhibit different sensitivities to feature caching, and feature caching on some tokens may lead to 10× more destruction to the overall generation quality compared with other tokens. In this paper, we introduce token-wise feature caching, allowing us to adaptively select the most suitable tokens for caching, and further enable us to apply different caching ratios to neural layers in different types and depths. Extensive experiments on PixArt-α, OpenSora, and DiT demonstrate our effectiveness in both image and video generation with no requirements for training. For instance, 2.36× and 1.93× acceleration are achieved on OpenSora and PixArt-α with almost no drop in generation quality.
https://github.com/Shenyi-Z/ToCa
neat
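The core idea is simple enough to sketch. Here's a toy torch version of token-wise caching for a token-wise sub-block like the DiT MLP (my own simplification, not the ToCa repo's code; the recompute heuristic and the 30% ratio are made up for illustration):

import torch

class TokenwiseFeatureCache:
    # Toy sketch of token-wise feature caching for a *token-wise* sub-block
    # (e.g. the MLP of a DiT block). Not the paper's implementation.
    def __init__(self, block, recompute_ratio=0.3):
        self.block = block              # module mapping (B, N, D) -> (B, N, D), applied per token
        self.recompute_ratio = recompute_ratio
        self.prev_in = None             # inputs seen at the previous diffusion timestep
        self.prev_out = None            # cached outputs from the previous diffusion timestep

    @torch.no_grad()
    def __call__(self, x):
        if self.prev_out is None:       # first timestep: no cache yet, compute everything
            self.prev_in, self.prev_out = x, self.block(x)
            return self.prev_out
        B, N, D = x.shape
        k = max(1, int(N * self.recompute_ratio))
        # heuristic "sensitivity" score: how much each token's input moved since the last step
        score = (x - self.prev_in).norm(dim=-1)                 # (B, N)
        idx = score.topk(k, dim=1).indices                      # tokens worth recomputing
        gather_idx = idx.unsqueeze(-1).expand(-1, -1, D)        # (B, k, D)
        fresh = self.block(torch.gather(x, 1, gather_idx))      # recompute only the selected tokens
        out = self.prev_out.clone()
        out.scatter_(1, gather_idx, fresh)                      # splice fresh features into the cached ones
        self.prev_in, self.prev_out = x, out
        return out

Attention layers still need the whole sequence, which is presumably why the paper applies different caching ratios to layers of different types and depths.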
>>
can i run joycaption on koboldcpp
>>
File: rtx.png (822 KB, 1918x562)
>32GB
>16GB
Actual humiliation ritual
>>
>>102759501
Already set money aside for this beauty.
>>
>>102759501
Massive improvements to bandwidth though. You'll run Mistral-Large 5bpw at 20t/s on three of these babies.
>>
>>102759501
wtf is that image
>>
Tried that mahou nemo finetune
It's good actually, better than nemo by itself i think (so far at least).
Nemo was very repetitive and predictable at times, this one feels fresh, but not sure how long it will last.
>>
>>102759501
Having 32GB would be nice, but I bet it's going to overheat my current setup.
>>
Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling
https://arxiv.org/abs/2410.07145
>One essential advantage of recurrent neural networks (RNNs) over transformer-based language models is their linear computational complexity concerning the sequence length, which makes them much faster in handling long sequences during inference. However, most publicly available RNNs (e.g., Mamba and RWKV) are trained on sequences with less than 10K tokens, and their effectiveness in longer contexts remains largely unsatisfying so far. In this paper, we study the cause of the inability to process long context for RNNs and suggest critical mitigations. We examine two practical concerns when applying state-of-the-art RNNs to long contexts: (1) the inability to extrapolate to inputs longer than the training length and (2) the upper bound of memory capacity. Addressing the first concern, we first investigate *state collapse* (SC), a phenomenon that causes severe performance degradation on sequence lengths not encountered during training. With controlled experiments, we attribute this to overfitting due to the recurrent state being overparameterized for the training length. For the second concern, we train a series of Mamba-2 models on long documents to empirically estimate the recurrent state capacity in language modeling and passkey retrieval. Then, three SC mitigation methods are proposed to improve Mamba-2's length generalizability, allowing the model to process more than 1M tokens without SC. We also find that the recurrent state capacity in passkey retrieval scales exponentially to the state size, and we empirically train a Mamba-2 370M with near-perfect passkey retrieval accuracy on 256K context length. This suggests a promising future for RNN-based long-context modeling.
https://github.com/thunlp/stuffed-mamba
Git isn't live yet. good news though mambabros
>>
>>102759501
buying a 600w launch card with a new connector seems like a bad idea
>>
File: Untitled.png (942 KB, 1080x2579)
>>102759535
woops
>>
>>102759501
First gen of GDDR7 memory didn't increase in density, so besides going clamshell, fitting another 8GB was all they could do.
>>
>>102759501
>going to legit need 2 power supplies to run them all
>>
>>102759516
No thanks, I'm fine with 15t/s on 4x3090
>>
>>102759516
It's just a 50% improvement at most for the 5090
That's going from 10 t/s to 15 t/s
>>
>>102759587
It'll be nice in cases where you go from something like 4 t/s to 6 t/s and into acceptable speed territory.
>>
>Aria
Another multimodal that I can't run yet >:(
>>
File: Untitled.png (1.44 MB, 1080x3040)
Round and Round We Go! What makes Rotary Positional Encodings useful?
https://arxiv.org/abs/2410.06205
>Positional Encodings (PEs) are a critical component of Transformer-based Large Language Models (LLMs), providing the attention mechanism with important sequence-position information. One of the most popular types of encoding used today in LLMs are Rotary Positional Encodings (RoPE), that rotate the queries and keys based on their relative distance. A common belief is that RoPE is useful because it helps to decay token dependency as relative distance increases. In this work, we argue that this is unlikely to be the core reason. We study the internals of a trained Gemma 7B model to understand how RoPE is being used at a mechanical level. We find that Gemma learns to use RoPE to construct robust "positional" attention patterns by exploiting the highest frequencies. We also find that, in general, Gemma greatly prefers to use the lowest frequencies of RoPE, which we suspect are used to carry semantic information. We mathematically prove interesting behaviours of RoPE and conduct experiments to verify our findings, proposing a modification of RoPE that fixes some highlighted issues and improves performance. We believe that this work represents an interesting step in better understanding PEs in LLMs, which we believe holds crucial value for scaling LLMs to large sizes and context lengths.
really interesting
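If you've never actually looked at what RoPE does mechanically, it's tiny. Bare-bones numpy version (standard RoPE with adjacent channel pairs; the exact interleaving convention varies between implementations):

import numpy as np

def rope(x, base=10000.0):
    # x: (seq_len, dim) queries or keys, dim even.
    # Each adjacent pair of channels gets rotated by an angle that grows with the
    # position and shrinks with the channel index (the "frequencies" the paper studies).
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)        # high frequencies first, low last
    angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                  # split channels into pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin               # 2D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

Queries and keys get rotated the same way, so their dot product only depends on relative position; the paper's finding is that the high-frequency pairs end up carrying the positional attention patterns while the low-frequency ones mostly carry semantics.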
>>
>>102759580
Liar, you are getting about 6t/s
>>
>>102759380
>>102759535
>>102759675
none of this ever amounts to anything. nothing ever happens.
>>
>>102759705
Why don't you try it for yourself?
https://github.com/theroyallab/tabbyAPI/
tensor_parallel: true
>>
>>102759634
As a vramlet, I consider 0.5 t/s (miqu) a slow cook, 1.5 slow but usable, and 4+ (mixtral) the acceptable minimum for what I would consider real-time.
I guess on top of that it would be nice to be able to regenerate walls of text and only glance through them, but that's a luxury.
>>
>>102759634
You won't notice much difference if you're offloading.
>>
File: Untitled.png (1.11 MB, 1080x2036)
Restructuring Vector Quantization with the Rotation Trick
https://arxiv.org/abs/2410.06424
>Vector Quantized Variational AutoEncoders (VQ-VAEs) are designed to compress a continuous input to a discrete latent space and reconstruct it with minimal distortion. They operate by maintaining a set of vectors -- often referred to as the codebook -- and quantizing each encoder output to the nearest vector in the codebook. However, as vector quantization is non-differentiable, the gradient to the encoder flows around the vector quantization layer rather than through it in a straight-through approximation. This approximation may be undesirable as all information from the vector quantization operation is lost. In this work, we propose a way to propagate gradients through the vector quantization layer of VQ-VAEs. We smoothly transform each encoder output into its corresponding codebook vector via a rotation and rescaling linear transformation that is treated as a constant during backpropagation. As a result, the relative magnitude and angle between encoder output and codebook vector becomes encoded into the gradient as it propagates through the vector quantization layer and back to the encoder. Across 11 different VQ-VAE training paradigms, we find this restructuring improves reconstruction metrics, codebook utilization, and quantization error.
https://github.com/cfifty/rotation_trick
since VQ-VAEs are used so much this will have a lot of cool downstream effects
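From the abstract, the trick is: the forward value is still the codebook vector, but the gradient reaches the encoder through a rotation + rescaling that maps e onto q and is treated as a constant. Rough torch sketch of my reading of it (not the repo's code, check theirs before trusting this):

import torch

def rotation_trick(e, q, eps=1e-6):
    # e: encoder output, q: nearest codebook vector, both (..., d).
    # Returns a tensor whose value equals q but whose gradient w.r.t. e is
    # lam * R, where R rotates e_hat onto q_hat and lam rescales |e| to |q|.
    e_hat = (e / (e.norm(dim=-1, keepdim=True) + eps)).detach()
    q_hat = (q / (q.norm(dim=-1, keepdim=True) + eps)).detach()
    lam = (q.norm(dim=-1, keepdim=True) / (e.norm(dim=-1, keepdim=True) + eps)).detach()
    # apply R to the live e without materializing the matrix:
    # R v = v + 2*q_hat*(e_hat.v) - (e_hat+q_hat)*((e_hat+q_hat).v) / (1 + e_hat.q_hat)
    w = e_hat + q_hat
    denom = 1.0 + (e_hat * q_hat).sum(-1, keepdim=True)   # degenerate if e and q point exactly opposite
    rot_e = e + 2 * q_hat * (e_hat * e).sum(-1, keepdim=True) \
              - w * (w * e).sum(-1, keepdim=True) / denom
    return lam * rot_e

Compare with the usual straight-through estimator, which just copies the gradient through the quantizer unchanged and throws away the angle/magnitude mismatch between e and q.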
>>
Single 5090, 2000€ min price, actual market price around 3000€ probably, 32gb vram, 600W.
x4 3090, 2000€, 96gb vram, around 1000w total with undervolt.
x2 3090, 1000€, 48gb vram, around 600w total with undervolt.
Lmao xd lol
>>
>>102759501
you can get 48gb workstation cards for about that price used on ebay, and 3090s are cheaper than either choice
>>
Has anyone here tried to use local models for language practice? Specifically Japanese? Would something like Nemo do the trick? I've tried its Spanish, and although it's not perfect, it's good enough.
>>
>>102759874
I ain't got space in my PC for many cards
>>
File: Untitled.png (454 KB, 1080x2314)
InAttention: Linear Context Scaling for Transformers
https://arxiv.org/abs/2410.07063
>VRAM requirements for transformer models scale quadratically with context length due to the self-attention mechanism. In this paper we modify the decoder-only transformer, replacing self-attention with InAttention, which scales linearly with context length during inference by having tokens attend only to initial states. Benchmarking shows that InAttention significantly reduces VRAM usage during inference, enabling handling of long sequences on consumer GPUs. We corroborate that fine-tuning extends context length efficiently, improving performance on long sequences without high training costs. InAttention offers a scalable solution for long-range dependencies in transformer models, paving the way for further optimization.
paper by one guy (actually only paper of his on arxiv). can't find code and he hasn't uploaded the models to HF. still, interesting
>>
>>102759956
An open rig is like 80€.
>>
File: 1472860069099.png (191 KB, 600x979)
Hey it's me that guy who posts burger catastrophes and asks for models and only has 8gb of vram. You all know me by now so is there any new good models? See you in a month.
>>
>>102760006
How loud is that gonna be with 4x 3090s running?
>>
>>102760015
just tried mistral small 22b instruct, first impression is great, like miqu 70b at home but at an acceptable speed, and it even leaves room for 32k context if I wish.
>>
>>102760073
This seems too big. Would using a Q2 still be higher quality than an 8b at a higher quant?
>>
>>102760073
why are you pushing this garbo model so hard, what is your end game
>>
>>102760145
You need to offload
>>
File: 118792626_p0.png (2.76 MB, 2508x3541)
>>102760265
I cannot. The model must exist solely within 8GB of vram. Please understand.
>>
File: 1707912212705564.png (20 KB, 90x88)
>>102760273
you can choose to do that but then you are stuck with retarded 7B models with goldfish memory
>>
Aria verdict?
>>
>>102760368
Waiting for gguf
>>
File: IMG_6183.jpg (134 KB, 896x735)
Creator of this ad definitely knows something we don’t, how are they fine tuning a closed model?? OpenAI hates this one easy trick!
>>
>>102760427
>how are they fine tuning a closed model?
https://openai.com/index/gpt-3-5-turbo-fine-tuning-and-api-updates/
>August 22, 2023
https://platform.openai.com/docs/guides/fine-tuning
any other stupid questions?
>>
>>102760427
go back
>>
>>102760152
NTA but what's the alternative? Llama 3? Qwen? These models are too censored
>>
>Pyramid Flow, a training-efficient Autoregressive Video Generation method based on Flow Matching. By training only on open-source datasets, it generates high-quality 10-second videos at 768p resolution and 24 FPS, and naturally supports image-to-video generation.
https://pyramid-flow.github.io/
>>
>>102760427
There was just a leak of an online AI girlfriend site so better to keep it local or use a throwaway email.
>>
>>102760486
Nemo. More parameters don't make a model automatically better.
>>
>>102760451
>https://platform.openai.com/docs/guides/fine-tuning
Interesting stuff in there actually.
>If you would like to shorten the instructions or prompts that are repeated in every example to save costs, keep in mind that the model will likely behave as if those instructions were included, and it may be hard to get the model to ignore those "baked-in" instructions at inference time.
>entities
>>
>>102760506
How does Nemo compare to Small? 12B vs 22B. Mistral 7B was good for the size and miqu is great, what's the sweetspot?
>>
>>102760511
So OpenAI describes it as a way to bake few-shot prompts into the model, huh?
>>
File: file.png (69 KB, 601x262)
APOLOGIZE

SAMA IN SHAMBLES
>>
>>102760671
small is slightly smarter, but nemo is more open minded
small is better for assistant stuff
nemo is better for funsies
>>
>>102760671
The way I would describe it is that Small is a Nemo sidegrade, but it's more assistant slopped.
>what's the sweetspot?
There isn't a sweetspot imo, either go small or go big.
>>
anthropic won btw
https://www.anthropic.com/news/message-batches-api
>>
>>102760681
what is this, spoonfed context for ants?
>>
>>102760776
no 3.5 opus me no care
>>
File: file.png (149 KB, 1338x164)
>>102760779
1b AGI

https://github.com/xjdr-alt/entropix
>>
>>102760738
Personally I never had trouble fucking the bot and I hate it when it just agrees to everything and doesn't "get" what kind of scenario I'm trying to do. (Midnight) Miqu has been great but the speed is just not usable on my machine. So I guess 22B should be second best option unless it has some serious weaknesses. Though Nemo is faster. But I hate when the bot is stupid and keeps repeating itself. Mixtral was the worst case of this, even though it otherwise understood instructions very well.
>>
llamacpp is slow at implementing multimodal models like pixtral and llama 3.2. Is there any good alternative to it?
>>
File: 1701959181888378.png (343 KB, 1019x773)
>>102760799
ollama
>>
>>102760789
okay now show how many attempts that took
>>
File: file.png (110 KB, 1144x264)
>>102760833
apparently it's not cherrypicked
>>
>>102760856
>>102760789
>>102760681
i can't believe reddit ate up the reflection grift, yet it's still sleeping on an actual happening
>>
>>102760856
am I supposed to be impressed by that nonsense?
>>
>>102760789
holy fuk relfection 2 is out already?
>>
>>102760870
>assuming his biological ""reasoning"" is better than whatever ASI came up with
ngmi
>>
>>102760876
there are at least 3 different people posting screencaps on x so it's actually a thing unlike reflection
>>
>>102760856
>subtract 9
>refuses to explain
>guesses the correct answer
>we're supposed to accept that this isn't cherrypicked
>>
>>102760870
>that nonsense
uhm, anon....
>>
>>102760899
subtracting 9 doesn't get you any closer to the solution, it's the same as answering that 9.9 is larger without any explanation
>>
>>102760908
uhm actually it probably does, since i guess the base model is more confident about 0.11 < 0.9 because of the training data
>>
so what's the deal with the SillyTavern drama? qrd?
i haven't updated mine since last month, do i pull the trigger and allow it to update?
>>
>>102760920
yeah that's called overfitting, not reasoning
>>
>>102760925
that's not what overfitting means
>>
>>102760908
The sampler changes direction when the model becomes too uncertain about the next token. It doesn't have to conform to human ideals of how reasoning works.
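The broad idea (entropy-gated sampling) is easy to sketch, even if nobody outside the repo knows exactly what entropix layers on top of it. Toy numpy version, thresholds pulled out of thin air:

import numpy as np

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def entropy_gated_sample(logits, low=0.5, high=3.0, hot_temp=1.2, rng=np.random):
    # act greedy when the model is confident, explore harder when it is uncertain
    p = softmax(logits)
    h = -(p * np.log(p + 1e-12)).sum()        # entropy in nats: low = confident, high = unsure
    if h < low:
        return int(p.argmax())                # confident: just take the top token
    temp = hot_temp if h > high else 1.0      # very unsure: crank temperature / "change direction"
    p_t = softmax(logits / temp)
    return int(rng.choice(len(p_t), p=p_t))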
>>
>>102760924
are you a proxy negro from /aicg/?
is seraphina your waifu and you're too lazy to download her again?
if you answered no to both questions, you can update
>>
>>102760948
no and no kek, i'm just using it as a frontend for kcpp and basic text adventure rpgs
guess i'll updoot then
>>
>>102760789
how high was the author when he wrote that readme?
>>
>>102760738
Shill your favourite Nemo tune to me. Or I'm just going to download Lyra.
>>
I've never used SillyTavern, is it worth the trouble to run on top of KoboldCPP? Does it make things any more convenient or will it just overcomplicate things? Can you save settings for a model or do you need to do that on Kobold anyway? Do the roleplay prompt insertions and all the extra stuff ST does actually help compared to just writing instructions for the AI on Kobold manually? I guess the only problem right now is that I can't easily use the "cards" on Kobold (or can I? It has some features). And no swipes. I don't care about visuals whole lot, I care about ease of use. Remembering which settings each model needs on my PC is sometimes a hassle. Swipes would be cool. Does ST allow swiping between multiple branches like miku.gg or only most recent message?
>>
>>102761053
ServiceTensor isn't made for casuals like you
>>
File: file.png (18 KB, 701x134)
>>102761057
>>102761053
>>
>>102761057
*ServiceTesnor
>>
>>102760856
The average midwit reading that would believe that.
>>
>>102761017
nta but magnum 12b v2
>>
>>102761070
anon...
>>
>>102761088
9.11 never happened
>>
>>102761017
kill ys
>>
>>102761053
>Does it make things any more convenient or will it just overcomplicate things?
It will overcomplicate things in that you have a lot more options.
Download it, see if any of the options available are anything you'd use; if not, go back to kcpp's native UI.
>>
>>102761085
Why not v2.5?
>>
>>102761053
are you talking about things like context length and cpu offload? you still have to configure them from kobold
also kobold does have kcpps files, you don't have to remember the settings for each model, just save one or more kcpps for each
if you were referring to sampler settings, system prompt, instruct format, etc. then yes, ST allows you to control and save those settings from its UI, but you still have to switch them manually
as for the swipes, you can only swipe the last message from each branch
cards are the main reason I use ST, I often edit them or make my own, I don't think kobold can do that
>>
File: 1702214080635066.png (84 KB, 530x1077)
>>102761053
i was hesitant about running ST over kcpp too, main concern was that it seemed too "chat-oriented" and i thought it wouldn't support the text adventure format i preferred
though ST does have a "story" view and i got around the user/bot chat model by making a generic "Narrator" character that responds to all my actions in second person, actually prefer it this way
>Does it make things any more convenient
for one the UI is better/cleaner, at least in my opinion - koboldlite or whatever it is that kcpp has is pretty limited
you also get a whole host of useful extensions like timelines (picrel) which helps immensely in tracking stories and branches
character/story management in general is also much, MUCH better in ST than kcpp
>Can you save settings for a model or do you need to do that on Kobold anyway
you use the same preset file (containing options such as GPU offloading, context size, etc.) as usual to load the model in kcpp, but actual sampler settings and such are done in ST and can be saved
>Do the roleplay prompt insertions and all the extra stuff ST does actually help compared to just writing instructions for the AI on Kobold manually
dunno, not sure i've ever tried
migrating from kcpp to ST i just rewrote my prompt a little and put it in the system prompt section for instruct
>I care about ease of use
my 2 cents: while ST may seem daunting at first due to all the new options, you only ever end up touching a small handful of them and in general i find it easier to use than kcpp
kcpp had HORRIBLE bugs for me in adventure mode that would often mix up AI/user-generated sections and cause other issues in the text view, ST doesn't have that
>Does ST allow swiping between multiple branches like miku.gg or only most recent message?
there's a timelines extension (picrel) that allows branching but it's not swipe-based, still have to click and switch branches manually
>>
>>102761233
What do you actually do with timelines?
Never used that feature. I only swipe, edit replies or start new chats if something is bothering me.
>>
>>102761332
When you have trouble deciding which way to take the scenario or which swipe to pick, you can go with one and return to the other branch and explore it later. Like when playing a VN, exploring all the "routes" except that in here there are infinite routes. You don't have to choose anything and can always return to your favourite branch. Though the lack of commitment can hinder the immersion since no one path is "canon" anymore
>>
Aria is getting close to a perfect vramlet model. Just needs quantization aware 4 bit training and pre-gating.
>>
>>102761422
Did you test it?
>>
>>102761383
>lack of commitment can hinder the immersion
Feels like that could lessen the consequences of your RP actions. But then again, you always have full control over the story anyway.
>>
>>102761233
TIL that there are retards that actually use the broken kcpp UI... I guess Discord shilling in these threads does work.
>>
>>102761163
I haven't tried it.
>>
>>102761529
buy an ad for your meds
>>
cpp bros...

>Practical Llama 3 (and 3.1) inference in a single Java file
https://github.com/mukel/llama3.java/blob/main/Llama3.java
>>
>>102761571
Not much point in trying as 2.5 felt like a downgrade.
>>
>>102761590
>java
oof
>>
>>102761422
But does it kiss on the lips while blowing you?
>>
do i need a jailbreak for local models?
>>
>>102761857
good ones, no
>>
>>102761857
jailbreaks no longer work on regular models. you need finetuned models or abliterated models, which work without jailbreaks
>>
Why bother with LLMs? Just get a tulpa.
>>
>>102761934
Electric tulpas don't require years of practicing self-induced schizophrenia.
>>
>>102761596
only 2k loc of single-file java code to run llama 3. that's a pretty cool and educational project.
>>
Hold on... Then they are training us on compressed data through LLMs.
>>
It's hard to make a tulpa
>>
will issue
>>
Llama.ccp
>>
meds
>>
>>102761983
>m-m-muh loc!
lmao, who the fuck cares. cpp is faster
>>
t: homosexual
>>
>>102762292
faster at breaking shit, maybe. lcpp has become unmaintainable
>>
>>102762354
It will be maintainable as long as the cuda devs don't give up on doing it for free and making the ollama guy rich without any credit. Gggggergavov cannot maintain the project on his own.
>>
>>102762354
gpt5 will maintain it, we don't need meatbags and their loc stinginess anymore
>>
File: file.png (35 KB, 723x192)
>>102762400
cuda dev tries not to seethe at ollama chads, impossible challenge!
https://www.reddit.com/r/LocalLLaMA/comments/1g00fq3/comment/lr7vmsn/
>>
If your work has value, there will always be grifters trying to profit off it. I made some free software and some nigger sold it for years, and when I emailed him he said he simply sold software discovery as a service.
>>
>>102762436
Who will pay for all those tokens?
>>
>>102762483
tokens will have the same price as breathable air
>>
>>102762470
>sold software discovery as a service.
holy mother of based! (i do the same btw, thanks for playing fosscucks)
>>
Does a CPU's NPU matter when offloading part of a model to the CPU?
>>
File: file.png (37 KB, 637x608)
>>102758839
trying again with xtts2
no errors yet, but is picrel the proper way to activate and then install?
>>
File: Screenshot_2834.png (7 KB, 402x110)
How do I make XTC appear in ST? I can enable it in the sampler selector but it's not added in the sampler menu for me.
>>
>>102762962
XTC is obsolete in ST because it is primarily used for Role****** and that's not something they want in their software.
>>
>>102762988
Based, it has been proven that "roleyplaying" is just an euphemism for pedophilia with ai chatbots.
>>
File: venv.png (1 KB, 207x143)
>>102762949
Seems correct. Run
>which python
or
>where python
to make sure. It should point to a path inside the directory where you created the venv.
>>
>>102762949
>>102763509
Meant to say:
>which python
or
>whereis python #whereis, not where
>>
AI Sex got deprecated...
>>
File: tmpi3n9d434.png (515 KB, 512x720)
*pauses dramatically* When life gives you wifi, pee all over it.
>>
>>102763575
That's just masturbation.
>>
phonesex
sexting
>>
>>102763056
Excuse me, I exclusively use llms to generate stories about big titty milfs
>>
File: _06445_.png (2.91 MB, 936x1664)
>>102758839
osha violations with gumi
>>
Will these ever be able to generate a complex narrative from very small user input?
>>
>>102760908
You're absolutely wrong. The reason 9.9 is greater than 9.11 is that its decimal portion is greater. It is decomposing the problem by removing the common part. Frankly I can't fathom how anyone could fail to understand this.
>>
>>102763819
Sorry, my crystal ball is in the shop until next week. Try asking again later
>>
>>102763819
they've always been able to.
>>
File: wait_what.png (23 KB, 304x83)
>>102763835
>>
>>102763819
wait for diff transformer gpt 5 o2
>>
File: 672809804.png (170 KB, 398x281)
its over, child rape rp is the only use case for llms, shut it down
>>
>>102763583
>>
>>102763926
If AI isn't the future of coding then explain this: >>102763936
>>
>>102763936
based
i remember the time that miku suplexed me and cracked my neck, cried over my corpse, and used the power of love and friendship to revive me
this was after i let her pee on my face of course
>>
>>102763936
>4 variables initialized but unused
i'm rejecting miku's pr
>>
>>102763926
Betting on entertainment would not be a bad bet, but we live in the worst timeline when it comes to it and the politics of it. Otherwise, we would likely already have GPT available for roleplay. There may be some social concerns, but honestly nothing will ever replace human interaction. Humans are more than just the words leaving their mouths.
>>
File: opinion.png (163 KB, 2044x872)
>>102763926
>>
Is there any practical reason not to make an llm front end entirely in rpgmaker mv?
>>
>>102764211
There's probably something in Leviticus that makes it a sin, if you're of that inclination
>>
>>102763936
>classic miku log from feb surfaces again
just when you think it's lost it comes back kek, proof of the sovl in mixtral and envoid models tbdesu
>>
>>102763926
why does the thumbnail look like a gameboy game
>>
>>102758927
You re-used cables between the PSUs, right? That's a 50/50 on frying things since they always change the pinout on the SATA power connector.
>>
>>102764211
If that's what you're most comfortable with, go right ahead. I assume it can talk over the network and generate/parse JSON. The latter is simple enough to do yourself if it can't.
If you care about char cards, you'll also need a tEXt parser for the png (ridiculously easy, you can skip the image data entirely), the json parser (already discussed), and a b64decode (also simple enough); rough sketch below.
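Something like this covers the whole card path in python (assuming the usual ST-style convention of a base64 JSON blob in a tEXt chunk keyed "chara"; adjust the keyword if your cards differ):

import base64, json, struct

def read_card(path, keyword=b"chara"):
    # walk the PNG chunks, find the tEXt chunk with the given keyword,
    # base64-decode its text into JSON; the image data itself is skipped
    with open(path, "rb") as f:
        assert f.read(8) == b"\x89PNG\r\n\x1a\n", "not a PNG"
        while True:
            head = f.read(8)
            if len(head) < 8:
                break
            length, ctype = struct.unpack(">I4s", head)
            data = f.read(length)
            f.read(4)                          # skip the CRC
            if ctype == b"tEXt":
                key, _, text = data.partition(b"\x00")
                if key == keyword:
                    return json.loads(base64.b64decode(text))
            if ctype == b"IEND":
                break
    return None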
>>
File: file.png (12 KB, 581x370)
>>102763509
where do I use that command?
git bash & python 3.10 don't recognize it
>>
File: tEXt_b64d.png (2 KB, 463x142)
>>102764211
>>102764358 (cont)
>>
>>102758839
hey guys, haven't been here for months, what would be the best model to fit in 24gb of ram to use to make cover letters and shit to get new employment?
my gpu is fucked, so I'll deal with the really slow generation
>>
File: file.png (14 KB, 581x370)
>>102764426
nvm this one works
>>
>>102764471
Cool. When you activate the venv, the path shown should point somewhere within the venv dir (wherever you created it). If that's the case, it means it worked fine. cd to the xtts2 dir, Install the requirements and try to launch it.
The venv remains active ONLY on that terminal and until you close it or run 'deactivate'. If you open a new term, you need to reactivate the venv.
I take it from that screenshot that, on that terminal, you haven't activated it yet.
>>
File: file.png (43 KB, 581x370)
>>102764613
still throws this error at the end of the picrel
+ I have other things relying on python so I have an inkling this venv will conflict with those
>>
I'm looking at 16gb cards. How bad is intel ARC compared to AMD?
Also, I can run 70b q2 models on cpu at ~1.2 t/s, will I reach 2t/s with a single 16gb card running a q4?
Would it be better if I get something like 8000mt/s ddr5 memory instead?(assuming that my mobo/cpu can handle it)
>>
File: ross.jpg (27 KB, 679x988)
>>102764703
>How bad is intel ARC compared to AMD?
>he doesn't know
>>
NVLM D 72B opinions?
>>
File: 39_06480_.jpg (724 KB, 2048x2048)
>>102758839
Psychedelic Gumi edition
>>
File: xtts2.png (18 KB, 830x123)
>>102764673
I gave you some links to the typer issue yesterday. I'm not sure how to fix them and i don't have windows anywhere handy to test myself.
Regarding conflicts, yes. Make a new venv to be used exclusively for xtts2. I'm sure i mentioned that already. If not, now you know.
Have you tried picrel? Seems to be only tested on ubuntu, but worth a shot. just below that there's a link with instructions for windows.
>https://github.com/coqui-ai/TTS
Is that the project you're trying to run?
>>
been out for a while...
bacc status?
>>
>>102764826
https://github.com/BoltzmannEntropy/xtts2-ui?tab=readme-ov-file
this is the proj
>>
>>102764826
+ im pretty illiterate in programming, let alone AI
>>
>>102764719
I know that both are bad but which one is worse? And by how much.
I'm assuming intel but I don't know.
both 16gb cards barely fit in my budget.
>>
>>102764888
>bacc status
Broken: https://www.youtube.com/watch?v=j3fkDQiCuf0
>>
>>102761017
https://huggingface.co/mradermacher/Stellar-Odyssey-12b-v0.0-i1-GGUF/tree/main
Get this one Q6. Nice for RP.
>>
>>102765066
arc is far worse
if you use linux amd will naturally just werk.
If not general amd support fucking sucks cock
>>
>>102764952
>>102764967
It's a tough one. Last update was 9 months ago. It depends on coqui (it has TTS in its requirements.txt), whose last update was 8 months ago and, if i remember correctly, it's officially abandoned. Seems like a pile of jank on top of another pile of jank.
IF i were to try it, i'd remove one of those piles and try to use coqui directly, but you may find the same or other issues as well. It seems to only come with a cli, though.
>>
okay I got llama running on my pc now what
>>
>>102765469
Masturbate. Go on an adventure.
Ask it for a cake recipe.
>>
>>102765469
sex with miku
>>
>>102765501
This. In exactly that order.
>>
How retarded is it to buy an RTX A6000 with money I was given on Hanukkah in 2024?

I obviously cannot afford an A6000 Ada
>>
File: spines=chilled.jpg (13 KB, 288x171)
>>102761529
I wonder what your relationship to kobold is. Do you find yourself thinking about him or her in various contexts? How does he or she fit into your life? Maybe it would be interesting for us both to try talking to kobold as if he were there, and seeing how it feels for both of us?
>>
>>102765614
pretty retarded if you need that money for anything more important, but if you have the disposable income and this is an important hobby then go for it
>>
>>102765614
Wait until the 5090 launch.
Then the a6000 prices will follow.
Also one is not enough, to run anything good you need at least 80GB VRAM
>>
if you are using stable diffusion, it only makes sense to get a 5090.
if you are using LLMs, 3090s are still the best option if you can get them for a fair price, but if you want new hardware, the 5090 has 1.5x the bandwidth of a 3090, unlike the 4090, which has the same bandwidth as a 3090 (though a 4090 destroys the 3090 for stable diffusion). But 2 3090s are better than 1 5090.
Also, the RTX A6000 is 30% slower than 2x 3090s because the A6000 uses GDDR6 while the 3090 uses GDDR6X. And I don't know what price you are getting the A6000 at, but 2x 5090s should be much faster and you get more vram.
I would wait for 5090 reviews, but if history says anything, the 5090 should be the best GPU (due to the price ladder) and it will get sold out by scalpers.
Also note the A6000 is old, so model runtimes might use cuda features only 50-series GPUs support (but we will see).
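For napkin math: single-stream token generation is basically memory-bandwidth bound, so the ceiling is roughly bandwidth divided by model size. Rough illustrative numbers, real speeds land below this:

# rough decode ceiling: every new token has to stream the whole model from VRAM once
bandwidth_gb_s = 936      # 3090 memory bandwidth in GB/s
model_gb = 40             # e.g. a ~70B at ~4.5 bpw
print(bandwidth_gb_s / model_gb, "t/s upper bound")   # ~23 t/s; overhead and multi-GPU splits eat into it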
>>
>>102765809
yeah lick my fucking balls and get an infection from eating my ass, i'm not touching your retarded housefire 500w gpu's jensen.
>>
>A6000 is old
lmao, lol even
Cuda dev still supports P40s from 2016, a6000 has FA2. What is this disinformation lol lmao there aren't even model runtimes that only use 40XX features and that's been out for a while now.
As long as it's ampere or above it gets all the perks, nvidia meme features from the past 3-4 years don't mean jack shit for LLMs.
>>
>>102765809
>5090
The wait for the 5090 will suck, and even when it comes out, chances are it will be hard to come by for a while. Ex, when the 4090 came out people bought them up fast and resold them at a premium.

I have also learned to never be first in line for new (overpriced) technology. Ex, there were cases of 4090s melting before the technology was refined - and let's not forget the 13th and 14th generation intel failures.

Be a year beyond the cutting edge and you'll save a ton of money, and have far more assurance on reliability.
>>
>>102765809
>and it will get sold out by scalpers.
So if you want it, buy day 1 or expect to pay a premium. I'm thinking of buying a few extra just to profit on the appreciation myself.
>>
>>102765626
>>102765644
>>102765809
Thank you for your inputs.

These are exactly the sort of doubts that I have
>>
>>102765809
>Also the RTX a6000 is 30% slower than 2x 3090's
It's slightly slower, but nowhere near 30%. I run a slight memory overclock on mine to narrow the gap and when I tested a 70b model on an A6000 vs a 3090 in the same rig, it was only around 5% slower on the A6000 than the 3090.
>>
File: IMG_1543.jpg (483 KB, 815x1168)
>>102758839
>just started college as a mech engineering student
>meet compsci girl at one of my lectures on her final year
>she mentions that her final project is making an ai chatbot
>sperg about llms
>realise i know more about her final project than her

Wow
>>
>>102766204
Now you two can become a couple and work together to make the first commercial robotic AI girlfriend.
>>
>>102765892
>Be a year beyond the cutting edge and you'll save a ton of money, and have far more assurance on reliability.
Oops, I meant be a year BEHIND the cutting edge.
>>
>>102765809
>scalpers
Doubt it, it's already being sold at scalpers prices
>>
>>102766204
That's nothing. The majority of programming students suck at computers for some reason.
I could destroy them in anything that wasn't programming or networking.
>>
>>102766133
I agree with you but I think the price is awful.
The only metric a6000 wins in power.
in every other metric a6000 is equal to 2x3090 setup.
The problem is that the a6000 and 2x3090 have zero upgrade options, assuming you get 10tk/s at max vram, nobody wants 5tk/s even if they got 96gb (If you want 5tk/s at 48gb, go buy 2xP40).
You must have a bandwidth surplus for an upgrade (if you double your GPU, you get half the token speed).
So buying a 5090 is still extremely MID compared to a used 3090 (in price), but hey, you will get 50% more token speed (hopefully) and you get more vram.
Is it worth it? depends, maybe for a 3x setup is probably the ideal setup for 10tk/s, but it depends on the models, are there going to be good 96gb models in the future? Will 96gb be enough for next gen open source AI video generators?
>>
>>102766204
Yo, can you pass her my mixture of expert roleplayers system prompt to check out?
>>
>>102766393
>Will 96gb be enough for next gen open source AI video generators?
*scratch that, I am 99% certain that every AI video company does not use a multi-gpu setup. You need a single GPU.
>>
>>102766296
What exactly does
>at computers
entail?
Thanks to ChatGPT, they barely need to know how to program. Once cloud IDEs like Google's become the norm, they won't need any skills "at computers" besides turning it on and opening their browser to be productive in their field.
So not sure what you're trying to brag about. Your config file editing skills are not that impressive.
>>
>>102766446
Sorry, I omitted some context. I meant in the past when I was a student.
>>
>>102766204
No one needs a gf nowadays
>>
File: file.png (1001 KB, 733x743)
What should I use to stuff my 4090 in there so I can get a riser and put a 5090 where the 4090 was?
>>
>>102764770
I want to fuck the anime girl.
>>
>>102766296
they suck at networking too
t. devops guy who has to fix all the dev environments every week because everyone keeps finding new and exciting ways to suck at networking
>>
ITS UP

https://huggingface.co/TheDrummer/UnslopNemo-12B-v3-GGUF
>>
>>102766594
DRUMMERGAWDS WE WON
>>
>>102766534
Nta, but speaking of that. Is it possible to run 3 3090s / 4090s in a case without a crypto farm setup?
If it's possible, I would really love to see a motherboard model, the rest I'll figure out.
>>
>>102766594
Sounds like a nothingburger.
>>
>>102766594
>leave slop in your datasets
>release models
>go back to your datasets and curate them
>release UNSLOPPED models
If only he was working for a soulless corporation with retarded modern policies. Imagine the continuous improvement project you could make on this.
>>
>>102766594
Did you take all the slop you removed and make a KTO dataset with it?
>>
>>102766648
Inference barely uses any PCIe bandwidth.
You can turn spare M.2 slots into PCIe slots using $2 adapters.
Most mid-range mobos have enough slots for 4 GPUs and one NVMe SSD.
>>
File: 1707892813017576.webm (3.87 MB, 1186x714)
>AutoDAN Turbo - A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
https://x.com/_akhaliq/status/1844258704633340284
https://arxiv.org/abs/2410.05295
>>
>>102766594
Respect to drummer for trying to make nemo even better after all hype is gone, my fav model so far for local.
Gonna try this tune tomorrow when I wake up.
Making this post as inorganic as possible to keep buy an ad schizo guessing.
>>
>>102766737
Well, it's going to look like a mining setup at this point, and you are going to spend most of the money on cables + external enclosure.
https://www.aliexpress.com/item/1005004714035083.html
>>
>>102766783
Buy, uh... or sell... look just perform a transaction please
>>
>>102766966
Can there be an "in case" solution for 3 bricks? That is what I'm really asking.
My cuck rig is already struggling with just one brick.
Does airflow between GPUs matter if I don't do back-to-back inference?
If I have to go for risers, I'll go server at that point.
>>
>>102767074
I'm sure you could find a $500 case that serves your needs; if not, a custom case builder could.
>>
>>102767095
If people buy $500 cases I'm better off making those instead.
>>
File: AMD.png (19 KB, 836x177)
>AMD makes a 28GB GPU
>Consumers can't buy it and it's reserved for datacenters
The monkey paw curls once again for /lmg/
>>
File: file.png (3.14 MB, 2000x1000)
>KTO
All I can think about whenever I see that.
>>
>>102767268
Lisa can't hurt her cousin Jensen's business.
>>
>>102767268
Proves they're chasing engineering autism instead of solving computational tasks.
>>
>>102766594
This model cured my erectile dysfunction and I can get hard when chatting with AI bots again.
>>
>>102767305
>No ads that are harmful or malicious in nature will be accepted.
Oh, it's alright then, sloptunes are not eligible for ads. They can advertise for free.
>>
>>102767268
They should make like 64G consumer cards and 256G for enterprise. There must be a good reason why they don't but I wish they did
>>
>>102767525
>There must be a good reason why they don't
Money
>>
>>102767560
Just Moore's law not accounting for diversity hires.
>>
>>102767621
more like mooney's law
>>
>>102766534
If you have enough money to buy those GPUs then you have enough money to get a proper case setup and motherboard for them.
>>
you don't actually need more vram
>>
Best local language model available for 4050 6GB + i7-13620H + 16GB RAM?
>>
yes i do i need enough vram for mixtral large
>>
>>102767671
>proper case setup
What is a proper case setup? A fucking rack?
>>
File: relax.gif (902 KB, 498x374)
how I sleep at night knowing there's no point spending thousands of dollars on hardware upgrades, because there's no open source model that's good enough to be worth it yet
>>
>>102766594
Wouldn't you like to expose your creations to a more sophisticated audience? An ad might just be the ticket.
>>
>>102767718
No. Literally just a full size tower instead of a mid like that photo.
>>
Mistral Small ought to be enough for anyone
>>
File: Sensible Chuckle.gif (1.14 MB, 250x250)
>>102767725
>sophisticated audience
>People who click ads
Alright anon you got me, nice one!
>>
>>102767721
Local SOTA is like $500 quanted. Don't let the watermelon salesmen inflate it.
>>
>>102767525
consumer cards are already 1/4th the cost of server cards.
gaming GPUs are already the least profitable part of nvidia and AMD; they are just underselling to prevent marketshare loss (shareholders might care enough to pull out, but honestly I think it's the opposite: I think nvidia would make more money if they stopped making gaming GPUs, every 5090 sold is like $50,000 of lost profit on valuable silicon that could have gone to an H200).
Nvidia makes a 50% margin, while AMD makes 30%.
It's neat that Nvidia even made the 5090 32gb; everyone expected it to be 24gb and for a Titan / Ti re-release to be 32gb (maybe nvidia plans to release a 64gb Titan for $4000).
>>
>>102767525
>a good reason why they don't
No one needs 256 petabytes of VRAM for something people can get on their phones for 20 bux a month without ever hearing about python.
>>
>>102767693
Gemma2 9b SimPO for anything other than cooming.
>>
>>102767756
Nemo?
>>
>>102767799
And what about for cooming?
>>
File: file.png (2.01 MB, 1024x1024)
>>
>>102767777
This, I stopped using local when I realized the best model (mistral large) is just a shitty claude.
>>
>>102767731
> 544 x 242 x 530 mm
How much bigger is a full tower? A meter?
>>
>>102767817
Some mistral nemo finetune. I'm not familiar with them so I can't help you more than that.
>>
File: file.png (805 KB, 768x768)
>>
>>102767721
I only got into this cancer because I got a 4090 for gayming. Getting a second 24GB set just for current LLM's would make me suicidal.
>>
>>102767817
if you are autistic and like tinkering, I use google colab for cooming. The downsides: google spies on you (but honestly they already know too much about my fetishes), it takes a moment to start up, sometimes you can't get a GPU, and sometimes they detect you are breaking the terms (having sex). I think that detection is based on the python log (so if you load a card that says futanari fuckventures, or a model/huggingface repo with "lewd" in the name, and it gets printed, you sometimes get caught, but most of the time I don't?), and you can't continue where you left off because your session can only last a day. On the bright side you get a Tesla T4 (a 16gb GPU that's half the speed of a 3090, not bad), you can load new models every time, and it won't make your gpu warm.
https://colab.research.google.com/github/lostruins/koboldcpp/blob/concedo/colab.ipynb
I haven't used colab in a while however; I stopped having sex because I was pretty disappointed with llama3 and nemo so I'm waiting for something new. Llama3 and nemo are "fine" but I didn't notice a next-gen improvement over what I saw from llama2 (at least for the erp part).
>>
>>102767756
$500 per T/s?
>>
>>102767978
>sometimes they detect you are breaking the terms (having sex) but I think it's based on the python log (so if you load a card that says futanari fuckventures
Have you tried disabling the console output?
>>
What can we do to make /lmg/ more dead?
>>
Just stop making threads
>>
>>102768040
Even more discord shilling
>>
>>102768012
I switched from oogabooga to kobold, and I think kobold prints less than oogabooga.
The oogabooga notebook I use is painful with gguf, since I can't figure out how to make it load a Q8 model without modifying the python code to use --specific-file and load the model manually.
KoboldCpp also starts up like 10x faster, but if you run out of context it needs to redownload everything, and it's harder to modify the tavernAI png context, since I like changing things.
I tried to run TavernAI, but it's complicated on colab??? and I can't figure out how to load custom huggingface models, and all the python code is hidden for some reason so I can't figure out what's happening.
And I use LM Studio on desktop; it can't run tavernAI png's, but it's good for some simple r34 "write me an erotic story about ...." and driving the story in my direction.
>>
>>102768184
>but if you run out of context it needs to redownload everything
What?
Also, wouldn't it be better to run koboldcpp as a backend in google colab and use Silly on your computer as a frontend so that cards and shit are saved locally?
>>
>>102768067
This, what's the point? We've had like four major model releases in the past two weeks and nobody even uses them. /lmg/ is now mostly just drama and tech support. It's about as on topic at this point as /aicg/ is with botmaking.
>>
>>102768245
I just realized that I can use sillytavern with LM studio, that's pretty interesting.
I didn't really fully understand the concept of frontends, I thought sillytavern was just another runtime like oogabooga and KoboldCpp.
I'll give sillytavern a shot.
>>
>>102768256
>image of apps that depend on llamacpp waiting around with shovels
>>
>>102768299
Let's be honest, what's the point even once the support is implemented? Everyone's going to show their models funny pictures for 5 minutes before losing interest and switching back to what we already had. Multimodal is a meme.
>>
>>102768299
ollama is actively working on multimodal support while llamacpp won't even bother
>>
>>102768324
You, much like the models you hold so dear, are severely lacking in vision.
>>
>llama.cpp just went out to lunch one day and never came back
>>
>>102768350
ok palpatine
>>
>>102768256
> We've had like four major model releases in the past two weeks
There has been nothing of note in the past 2 weeks
>>
>>102768374
so that's why it reminds me of my dad
>>
File: file.png (79 KB, 562x772)
I sure love totally not damage control
>>
It sucks that AI is still dumb enough to not be trusted with anything more complicated than a blowjob. The moment AI gets good enough to consistently simulate whatever setting I want without randomly time skipping or changing the entire setting on a whim will be a very good day. I'm interested in seeing AI dungeonmasters for text adventure games, that sounds like it'd be fun.
>>
File: 1984.gif (2.78 MB, 498x367)
>>102768607
>>
>>102768607
What's going on? What is misinformation?
>>
File: file.png (46 KB, 767x268)
>>102768678
Nothing, don't listen to trolls, ST is doing just fine despite all the FUD!
>>
Just woke up from cryo sleep. wtf is Aria?
>>
>>102768749
multimodal meme
>>
>>102768789
Am I missing something? People are claiming it's really resource intensive to run, but the total file size isn't that big?
>>
>>102768749
A pretty great manga and anime. I can recommend it.
>>
>>102768864
holy based
>>
>>102759501
Fork the money over, paypiggies!
>>
>>102767721
>have something that basically beats turing tests on an offline computer nowadays, which isn't chasing mememarks and 10x VRAM for 0.1% improvement
>>
>>102759501
Nvidia is legitimately going to lose all the midrange market to AMD at these prices.
>>
>>102769140
You're a naive idiot if you think AMD will not match their prices to the same or ever so slightly below what NVIDIA charges. NVIDIA dictates what everyone else charges through their market share alone.
>>
>>102767721
You will never know the feeling of fine-tuning your own model.
You'll never know the joy of getting an output that's tailored to you, because you trained it to be so.
Just keep consuming what others put in your sloptrough, ignorance is bliss.
Until they take it away from you, and history shows that they will.
Keep enjoying that sleep until then.
>>
>>102769159
From what I am hearing, AMD has stopped trying to match Nvidia in the high-range department and is solely focusing on the mid range. We will have to wait and see what the prices will be when AMD releases it, but I would legitimately be surprised if it breaches $1,000.
>>
>Yes the developers do want to realign the labeling/branding of ST to not be primarily Roleplay focused BUT this is not a change to kill roleplay, it’s simply a change that will align ST with its primary long term goal of being the “LLM Frontend for Power Users”. By being a neutral tool that does open up ST to be used in any environment whether that be a business, a university or for roleplay use. In my mind this will only help ST grow and keep the developers passionate about continuing the project.

>MYTH ST is being changed so it can be monetized.

>This is simply a lie that keeps getting spread by doomers. I have seen countless messages from the development team that contradict this but angry users keep calling them liars. Look In my day job (going to keep this vague) I have a masters of information systems and work in the financial investments space.

>MYTH ST will be preventing users from using it for RP in the future.

>I’m really not sure how this got started but one bad joke about RP being a bannable offense from Cohee didn’t help lol.

>So I ask the community for two things. One please be patient and wait and see as these changes roll out. I think you’ll find your RP experience won’t be disrupted/changed like you fear. Second please tone down the rhetoric around this. I’ve had to remove probably around 100 comments hurling personal attacks against the developers. Nasty insults against people who have donated 1000s of hours of their time to bring you a FREE tool that provides countless hours on entertainment using a cutting edge technology.

https://www.reddit.com/r/SillyTavernAI/comments/1g0x2m4/proposed_changes_megathread/

they keep pouring fuel on the fire huh?
>>
>>102769342
who cares this is a mikupad general
>>
>>102769342
Do they actually think people will fall for this shit? We have pattern recognition. We know how this tale always ends.
>>
>>102769207
> Implying $1000 is "high range", and AMD has given up.
MI300 would like a word with you.
>>
>>102769375
I am talking about the RDNA 4 cards that should be announced in Q1 2025
>>
>>102769342
Objectively I think it's a fine move, and most of the people panicking about it are being hysterical. ST has all the pieces in place to be *the best* general LLM interface out there, and I've actually wished there was something like a corpo-ST for a while now: something with great support for all sorts of backends and samplers, great prompting functionality, and additional tooling. I think it's stupid to believe they're going to be outright hostile to RP stuff in the future just because of a few seemingly tongue-in-cheek comments from the devs when everyone was freaking out; I'd fully expect them to keep their word about continuing to support it through extensions or whatever
however, the
>ServiceTesnor
shitposting is really funny to me so I'll continue to indulge in it for the time being
>>
>>102769375
datacenter is a whole separate market, none of you niggers are running h100s or equivalents
AMD is giving up on high end consumer and probably also pro cards
>>
Got any tips on how to make Hermes 405B good? Preset? Samplers?
>>
>>102770124
It's objectively worse than largestral
>>
File: file.png (693 KB, 734x978)
Are there any models that are similar to the original ChatGPT? I want to recreate nigga mode
>>
>>102770153
I just use the Big Nigga card.
It's by far the best assistant card.
>>
>>102769342
Sounds like this was all a nothingburger and SillyTavern will still be the best RP frontend going forward.
>>
https://huggingface.co/PocketDoc/Dans-PersonalityEngine-v1.0.0-8b
I trained this. I like it, but it's still an 8B, so YMMV.
>>
PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency
https://arxiv.org/abs/2410.07563
>We introduce PLaMo-100B, a large-scale language model designed for Japanese proficiency. The model was trained from scratch using 2 trillion tokens, with architecture such as QK Normalization and Z-Loss to ensure training stability during the training process. Post-training techniques, including Supervised Fine-Tuning and Direct Preference Optimization, were applied to refine the model's performance. Benchmark evaluations suggest that PLaMo-100B performs well, particularly in Japanese-specific tasks, achieving results that are competitive with frontier models like GPT-4.
https://huggingface.co/pfnet
not up yet but that seems to be their HF. posting for VNTLanon
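For anyone unfamiliar with the QK Normalization trick the abstract mentions: it normalizes the query and key vectors per head before the attention dot product, which keeps the attention logits bounded and helps avoid loss spikes during training. A minimal PyTorch sketch of the generic technique, not PLaMo's actual code; the RMSNorm choice and layer sizes are assumptions, and nn.RMSNorm needs a recent PyTorch (2.4+), otherwise swap in LayerNorm.

```python
# Minimal sketch of QK Normalization (normalize Q/K per head before the
# dot product) as a training-stability trick. Illustrative only; not
# PLaMo's implementation. RMSNorm and the sizes below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)
        # per-head normalization of queries and keys
        self.q_norm = nn.RMSNorm(self.d_head)
        self.k_norm = nn.RMSNorm(self.d_head)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, time, d_head)
        q = q.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # QK-norm: bounded attention logits -> fewer loss spikes at scale
        q, k = self.q_norm(q), self.k_norm(k)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(b, t, -1))

out = QKNormAttention()(torch.randn(2, 16, 512))  # -> (2, 16, 512)
```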
>>
>>102769342
>>102769342
We are so back
>>
>>102770605
I'm hyped.
>>
>>102766594
>unslopped
>immediately talks about "her glimmering azure orbs"
FUCK YOUUUUUUUUUU
>>
>>102770605
Thank you!
>>
>>102770605
>fp8 training
Nice. Can't wait to try an IQ1 of it.
>>
llama.cpp is reprocessing the prompt every time now, even when I click continue with no edits.
>>
>>102771009
Check if whatever you're using to talk to it is sending "cache_prompt": true in the request. If not, click on random things until it does.
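If you want to rule the frontend out entirely, you can hit llama-server's /completion endpoint directly. A minimal sketch with Python requests, assuming the server is on the default http://localhost:8080; cache_prompt is the field in question.

```python
# Minimal sketch: send a completion request straight to llama-server with
# cache_prompt enabled so the server can reuse the KV cache for the shared
# prompt prefix on the next request. Assumes default port 8080.
import requests

payload = {
    "prompt": "Once upon a time",
    "n_predict": 32,
    "cache_prompt": True,  # reuse matching prompt prefix across requests
}
r = requests.post("http://localhost:8080/completion", json=payload, timeout=120)
print(r.json()["content"])
```

If the second request with the same prefix still reprocesses everything, the problem is on the server side rather than in the frontend.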
>>
IterGen: Iterative Structured LLM Generation
https://arxiv.org/abs/2410.07295
https://github.com/uiuc-arc/itergen
In case anyone is interested in structured outputs. The repo isn't live yet.
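Since the repo isn't up yet: the core idea behind structured generation is just masking the logits so only tokens the grammar or schema allows can be sampled at each step. A toy sketch of that idea follows; the allowed-token set is a hypothetical stand-in for a real grammar engine (IterGen, outlines, llama.cpp GBNF, etc.), not IterGen's actual API.

```python
# Toy sketch of constrained decoding: mask every token the structure does
# not allow, then sample from what's left. The `allowed` set stands in for
# a real grammar engine; this is not IterGen's API.
import math, random

def constrained_sample(logits: list[float], allowed: set[int]) -> int:
    masked = [l if i in allowed else -math.inf for i, l in enumerate(logits)]
    # softmax over the surviving tokens only
    m = max(masked)
    exps = [math.exp(l - m) if l != -math.inf else 0.0 for l in masked]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# example: vocab of 5 tokens, structure only permits tokens 1 and 3
print(constrained_sample([0.2, 1.5, -0.3, 2.0, 0.1], allowed={1, 3}))
```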
>>
>>102747280
>Is your RAM on the QVL?
Thanks. Yes and no: the Gigabyte site doesn't have the rev 2.0 version of the board listed for the QVL check, but it does link it via the 1.0/generic QVL version.

I got "M321R4GA3BB6-CQKVS" RAM, which is only on that list as 'M321R4GA3BB6-CQKMG', which was the closest one but not the same product name (different testing batch bins?), and GPT says "check if BIOS is compatible with the QS version of the CPU".

I messaged memory-net just in case, but I think it's either the CPUs or the power supply not allowing it to boot, if not the RAM.
>>
>>102771300
Are you SURE all the RAM is seated properly? Those slots are a real bitch to seat, and can even pseudo “click” but not actually be slotted right
>>
>>102771371
Haha, don't do that to me anon, but I'll check when I get home. I still think it's another issue.

Dunno if I should troubleshoot with the eBay guy or just return it; I've got a week to work it out.
>>
File: Untitled.png (656 KB, 1080x2936)
Upcycling Large Language Models into Mixture of Experts
https://arxiv.org/abs/2410.07524
>Upcycling pre-trained dense language models into sparse mixture-of-experts (MoE) models is an efficient approach to increase the model capacity of already trained models. However, optimal techniques for upcycling at scale remain unclear. In this work, we conduct an extensive study of upcycling methods and hyperparameters for billion-parameter scale language models. We propose a novel "virtual group" initialization scheme and weight scaling approach to enable upcycling into fine-grained MoE architectures. Through ablations, we find that upcycling outperforms continued dense model training. In addition, we show that softmax-then-topK expert routing improves over topK-then-softmax approach and higher granularity MoEs can help improve accuracy. Finally, we upcycled Nemotron-4 15B on 1T tokens and compared it to a continuously trained version of the same model on the same 1T tokens: the continuous trained model achieved 65.3% MMLU, whereas the upcycled model achieved 67.6%. Our results offer insights and best practices to effectively leverage upcycling for building MoE language models.
neat
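In case the routing comparison in the abstract isn't obvious, here is a tiny numpy sketch of the two router orderings the paper compares. Illustrative only, not the paper's implementation; expert count and logits are made up.

```python
# Minimal sketch of the two expert-routing orders compared in the paper.
# softmax-then-topK: normalize over ALL experts first, then keep the top-k
#   probabilities (they no longer sum to 1 unless renormalized).
# topK-then-softmax: pick the top-k logits first, then softmax over just those.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_then_topk(router_logits, k=2):
    probs = softmax(router_logits)
    idx = np.argsort(probs)[-k:]
    return idx, probs[idx]

def topk_then_softmax(router_logits, k=2):
    idx = np.argsort(router_logits)[-k:]
    return idx, softmax(router_logits[idx])

logits = np.array([1.2, -0.5, 0.3, 2.1])   # one token, 4 experts
print(softmax_then_topk(logits))            # weights reflect all 4 experts
print(topk_then_softmax(logits))            # weights sum to 1 over the top-2
```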
>>
>>102771074
It does include it, but it's still reprocessing every time. It's SillyTavern.
>>
>>102771522
Did you check the actual request (your browser's dev tools) or just ST telling you it's doing it? It's working fine for me with the shitty (now old) vim plugin for llama-server. I set it up a few days ago, so I remember that query flag specifically.
You can try llama-server directly with their integrated ui as well, just to make sure there's no funny business in the middle.
Also. Is this a long chat already or a new one? Are you completely sure ST is not doing some sort of context shifting, trimming old messages or anything like that?
>>
From the paper, Aria actually looks quite impressive, especially for video understanding. Of course I'm a coomer, so I'm mainly interested in its ability to accurately caption short porn clips. We finally got what looks like a decent text2video model (PyramidFlow). I can't help but feel like the era of local NSFW text2video models isn't that far away.
>>
I know that Qwen2.5 is decent for RP, but terrible for ERP due to censorship. While waiting on the fine-tunes, I came across this, and was curious if anybody has tried it yet.

https://huggingface.co/gghfez/Magnum-v1-72b-Qwen2.5
>>
>>102769342
>align ST with its primary long term goal of being the “LLM Frontend for Power Users”.
>any environment whether that be a business, a university
Do they not know what a power user is?
>>
>>102771701
How do those mixed vision models get affected by quantization? Worse than text-only?
>>
In llama.cpp/koboldcpp, can you mix a Pascal card with an Intel Arc A770?
I'm worried about getting problems on Linux too. Nvidia drivers are a hassle. Can I just plug this shit in and be done with it?
How are there no dedicated AI cards yet?
>>
>>102771826
>In llama.cpp/koboldcpp, can you mix a Pascal card with an Intel Arc A770?
Yes, the Vulkan backend supports using multiple GPU vendors at the same time.
>I'm worried about getting problems on Linux too. Nvidia drivers are a hassle. Can I just plug this shit in and be done with it?
Nvidia drivers on Linux are plug and play. You should be more concerned about the Intel card.
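For reference, a hedged launcher sketch for that multi-vendor setup: build llama.cpp with the Vulkan backend (cmake -DGGML_VULKAN=ON) and split layers across both cards. The model path and the 2:1 split ratio below are placeholders (roughly a 16 GB A770 next to an 8 GB Pascal card); tune them to your VRAM, and if your build ignores the tensor-split ratio, plain layer splitting should still work.

```python
# Hedged sketch: launch llama-server with the Vulkan backend so layers are
# split across an Nvidia (Pascal) card and an Intel Arc A770 at once.
# Assumes a llama.cpp build with -DGGML_VULKAN=ON and that both cards show
# up as Vulkan devices. Paths and ratios are placeholders, not a recipe.
import subprocess

subprocess.run([
    "./llama-server",
    "-m", "model.Q4_K_M.gguf",     # placeholder model path
    "-ngl", "99",                  # offload as many layers as will fit
    "--split-mode", "layer",       # put whole layers on each GPU
    "--tensor-split", "2,1",       # rough VRAM ratio between the two cards
    "--port", "8080",
], check=True)
```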
>>
>>102771872
>Nvidia drivers on Linux are plug and play.
I can't boot with more recent Liquorix kernels anymore, which seems to be caused by the Nvidia drivers.
I'm a brainlet and come from Windows with colorful kiddy buttons that do the work for me (as it should be). So installing Nvidia drivers and CUDA is not fun already.
I'm just worried that if I put an Intel card in there it fucks everything up. But it's nice that llama.cpp supports different vendors. Thanks for the info.
>>
>>102771826
There might be a really slow runtime that supports everything, which might work, but you're lucky if Intel is even supported.
The A770 is also pretty slow; the pic rel is like a 5 GB model. Imagine showing benchmarks for an AI that only uses a third of its VRAM.
Like 10 tok/s at full VRAM usage is very usable, but it's worthless for a dual-GPU setup, since you could just buy a 5090 and it would give you 3x more token performance (with the same model) because it has 3x more bandwidth, and Nvidia works with pretty much everything (you could probably use it with your Pascal GPU).
>>
>>102771913
forgot pic
>>
>>102771813
In my experience the language weights are affected by quantization just like any LLM. The vision weights (encoder, projector, cross attention if the model has it) need to be bf16, or maybe an int8 quant. But definitely not 4 bit.
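If you load these through transformers, one way to act on that advice is to quantize only the language tower and leave the vision parts in bf16. A hedged sketch with bitsandbytes: the module names ("vision_tower", "multi_modal_projector") are the LLaVA-style names in transformers and will differ for other models, and llm_int8_skip_modules (despite the name) is the knob recent transformers versions use to exclude modules from 4-bit quantization as well; treat all of that as assumptions.

```python
# Hedged sketch: 4-bit quantize only the language weights of a LLaVA-style
# multimodal model while keeping the vision encoder and projector in bf16.
# Module names follow transformers' LLaVA naming and may differ per model.
import torch
from transformers import BitsAndBytesConfig, LlavaForConditionalGeneration

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    # keep the vision weights out of the 4-bit conversion
    llm_int8_skip_modules=["vision_tower", "multi_modal_projector"],
)

model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",
    quantization_config=bnb,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```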
>>
How many tokens is an image?
>>
>>102771922
That's all well and good but unless they actually go and code that shit themselves open source trannies refuse to touch anything but CUDA.
>>
>>102771987
depends how much stuff is in it
>>
File: example.jpg (615 KB, 2541x1904)
>>102766534

I used a 3D-printed mount since no commercial solutions have anything vertical. You can also buy them off Etsy or something.

>pic related,
>>
>>102771604
Yes it is, I checked and it's sending it.
I don't know what was going on, but it's working now; I just had to restart.
>>
>>102771994
Lettuce say hypothetically I possess an image of Micu on a white background, looking at the viewer with a slightly sardonic smile on her face.
How many tokens is that, roughly speaking?
>>
>>102772024
how big is the image? would be faster to open up a space and test it with the actual encoder.
>>
>>102772024
depends on resolution, model, settings, etc
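Rough rule of thumb for ViT-patch encoders: the token count is about (height / patch size) * (width / patch size) after the image is resized to the encoder's input resolution, plus whatever tiling or extra tokens the specific model adds. A toy calculator assuming a CLIP-L/14-style 336 px encoder, not any particular model's exact scheme:

```python
# Toy estimate of image token count for a ViT-style encoder:
# tokens ~= (H / patch) * (W / patch) at the encoder's input resolution.
# 336 px with 14 px patches matches a CLIP-L/14-style setup; real models
# differ (tiling, pooling, extra special tokens).
def image_tokens(resized: int = 336, patch: int = 14) -> int:
    per_side = resized // patch
    return per_side * per_side

print(image_tokens())            # 24 * 24 = 576 tokens, LLaVA-1.5-ish
print(image_tokens(448, 14))     # 32 * 32 = 1024 tokens
```

So for a single Miku-on-white-background image, a few hundred to around a thousand tokens is the usual ballpark, regardless of what is actually in the picture.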
>>
>>102766737
It does. Only dumb fucks like him >>102759705
run LLMs sequentially at snail speed. Any parallel processing requires excessive communication between GPUs.
>>
>>102772087
because i like jpegs for the resolution, the color, everything about jpegs i like
>>
>>102770124
I have no particular tips but I run it at temperature 0.8.
>>
>>102771993
AMDrones, our response?
>>
>>102772862
>>102772862
>>102772862
>>
>>102768713
Wait a minute... that project is licensed under AGPL.
Why the fuck are they pivoting to corpos when they're never going to touch it anyways?


