/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

LMGumi Edition

Previous threads: >>102743974 & >>102737214

►News
>(10/10) Aria: 25.3B, 3.9B active, multimodal native MoE model with 64k context: https://hf.co/rhymes-ai/Aria
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: normal gumi gen.png (588 KB, 512x512)
►Recent Highlights from the Previous Thread: >>102743974

--Aria: Open-source multimodal native MoE model by Rhymes AI:
>102758358
--Mistral tokenizer and prompt formatting discussion:
>102751333 >102751359 >102751381 >102751464 >102751910 >102751487
--AMD GPUs face challenges with multi-GPU support and P2P transfers, 16GB RX 6800 recommended for sub-$300 category:
>102747480 >102747696 >102747799 >102747830 >102747929 >102748171 >102748032 >102747714
--Mistral Small and Mixtral 8x7b Q6_K recommended for 3090 setup:
>102750146 >102750611 >102750764 >102750956 >102751248 >102751465
--Strubell's 100 million gallons of oil per AI inference claim debunked:
>102744385
--Larger models can be more creative with the right sampling techniques:
>102750367 >102750404 >102750470 >102750515 >102750718 >102750983 >102751017 >102751242 >102750832 >102751123 >102750521
--Language models can imitate constructed languages but have limitations:
>102748022 >102748182
--Creating an uncensored model from scratch would require addressing dataset limitations, built-in biases, and training costs:
>102747016 >102747423
--Backend automatically adds BOS token, don't add in frontend:
>102753612 >102753668 >102753702 >102753786 >102753898 >102754063 >102754507 >102754742 >102754780 >102755039
--Request for node-based editor with specific AI features:
>102744816 >102744869
--GPU upgrades recommended for better performance over CPU inference:
>102745882 >102745931 >102746014 >102746375 >102752232 >102746217
--Benchmark leaderboard discussion, Mistral Small performing well, questions about Gemma2 9b performance:
>102748238 >102748272 >102748518 >102748557
--Miku (free space):
>102744114 >102744270 >102744320 >102744360 >102746386 >102748863 >102754933 >102755545 >102756899 >102756964 >102757239 >102757390 >102758184

►Recent Highlight Posts from the Previous Thread: >>102743977

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Gumilove
>>
My server is down so I can't Nala test Aria, sorry.
>>
File: 00202-4258927505.png (371 KB, 512x512)
>>102758901
Oh yeah, PSA:
regarding running multiple desktop PSUs in tandem. Went to take one of my PSUs and 3090s out to put in my gaming PC for a while (while I wait for 90B gguf support) and couldn't get it to power on. Tried different cables etc.; troubleshooting more or less places the problem at the 24-pin motherboard connector on the PSU side. Presumably it somehow got cooked (it was sharing a mutual ground cable with my other PSU, so I never noticed since it was only supplying ground and power for one 3090). I'd advise anyone building their own server to avoid running tandem PSUs. Just stick to what you can fit on a single one, and power limit if you have to.
>>
File: 2347656478769.gif (929 KB, 326x318)
>>102758842
>AMD GPUs face challenges with multi-GPU support

I use a 7900 XTX and a 7800 XT, and for some reason kobold version 1.61.2.yr0-ROCm is the only program that will allow any sort of GPU splitting.

It simply does not work on any other version; the output is garbled bullshit. Outside of this specific version, I can only use 1 GPU.

My performance is great, but fuck me, not being able to GPU split on new versions for no fucking reason sucks ass. Fortunately L3 also sucks ass, so updating isn't exactly necessary.
>>
Aria.gguf?
>>
>>102758839
>LMGumi Edition
Affirmative. Local Gumi activated.
>>
>>102758839
Is there a way to hide or rename the assistant and user roles in the tavern chat history the model is fed? I want to see whether that might improve how natural the outputs are with some models.
>>
svelk
>>
File: Untitled.png (1.79 MB, 1080x3084)
Accelerating Diffusion Transformers with Token-wise Feature Caching
https://arxiv.org/abs/2410.05317
>Diffusion transformers have shown significant effectiveness in both image and video synthesis at the expense of huge computation costs. To address this problem, feature caching methods have been introduced to accelerate diffusion transformers by caching the features in previous timesteps and reusing them in the following timesteps. However, previous caching methods ignore that different tokens exhibit different sensitivities to feature caching, and feature caching on some tokens may lead to 10× more destruction to the overall generation quality compared with other tokens. In this paper, we introduce token-wise feature caching, allowing us to adaptively select the most suitable tokens for caching, and further enable us to apply different caching ratios to neural layers in different types and depths. Extensive experiments on PixArt-α, OpenSora, and DiT demonstrate our effectiveness in both image and video generation with no requirements for training. For instance, 2.36× and 1.93× acceleration are achieved on OpenSora and PixArt-α with almost no drop in generation quality.
https://github.com/Shenyi-Z/ToCa
neat
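The core idea is simple enough to sketch. Here's a toy torch version of token-wise caching for a token-wise sub-block like the DiT MLP (my own simplification, not the ToCa repo's code; the recompute heuristic and the 30% ratio are made up for illustration):

import torch

class TokenwiseFeatureCache:
    # Toy sketch of token-wise feature caching for a *token-wise* sub-block
    # (e.g. the MLP of a DiT block). Not the paper's implementation.
    def __init__(self, block, recompute_ratio=0.3):
        self.block = block              # module mapping (B, N, D) -> (B, N, D), applied per token
        self.recompute_ratio = recompute_ratio
        self.prev_in = None             # inputs seen at the previous diffusion timestep
        self.prev_out = None            # cached outputs from the previous diffusion timestep

    @torch.no_grad()
    def __call__(self, x):
        if self.prev_out is None:       # first timestep: no cache yet, compute everything
            self.prev_in, self.prev_out = x, self.block(x)
            return self.prev_out
        B, N, D = x.shape
        k = max(1, int(N * self.recompute_ratio))
        # heuristic "sensitivity" score: how much each token's input moved since the last step
        score = (x - self.prev_in).norm(dim=-1)                 # (B, N)
        idx = score.topk(k, dim=1).indices                      # tokens worth recomputing
        gather_idx = idx.unsqueeze(-1).expand(-1, -1, D)        # (B, k, D)
        fresh = self.block(torch.gather(x, 1, gather_idx))      # recompute only the selected tokens
        out = self.prev_out.clone()
        out.scatter_(1, gather_idx, fresh)                      # splice fresh features into the cached ones
        self.prev_in, self.prev_out = x, out
        return out

Attention layers still need the whole sequence, which is presumably why the paper applies different caching ratios to layers of different types and depths.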
>>
can i run joycaption on koboldcpp
>>
File: rtx.png (822 KB, 1918x562)
>32GB
>16GB
Actual humiliation ritual
>>
>>102759501
Already set money aside for this beauty.
>>
>>102759501
Massive improvements to bandwidth though. You'll run Mistral-Large 5bpw at 20t/s on three of these babies.
>>
>>102759501
wtf is that image
>>
Tried that mahou nemo finetune
It's good actually, better than nemo by itself i think (so far at least).
Nemo was very repetitive and predictable at times, this one feels fresh, but not sure how long it will last.
>>
>>102759501
Having 32GB would be nice, but I bet it's going to overheat my current setup.
>>
Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling
https://arxiv.org/abs/2410.07145
>One essential advantage of recurrent neural networks (RNNs) over transformer-based language models is their linear computational complexity concerning the sequence length, which makes them much faster in handling long sequences during inference. However, most publicly available RNNs (e.g., Mamba and RWKV) are trained on sequences with less than 10K tokens, and their effectiveness in longer contexts remains largely unsatisfying so far. In this paper, we study the cause of the inability to process long context for RNNs and suggest critical mitigations. We examine two practical concerns when applying state-of-the-art RNNs to long contexts: (1) the inability to extrapolate to inputs longer than the training length and (2) the upper bound of memory capacity. Addressing the first concern, we first investigate *state collapse* (SC), a phenomenon that causes severe performance degradation on sequence lengths not encountered during training. With controlled experiments, we attribute this to overfitting due to the recurrent state being overparameterized for the training length. For the second concern, we train a series of Mamba-2 models on long documents to empirically estimate the recurrent state capacity in language modeling and passkey retrieval. Then, three SC mitigation methods are proposed to improve Mamba-2's length generalizability, allowing the model to process more than 1M tokens without SC. We also find that the recurrent state capacity in passkey retrieval scales exponentially to the state size, and we empirically train a Mamba-2 370M with near-perfect passkey retrieval accuracy on 256K context length. This suggests a promising future for RNN-based long-context modeling.
https://github.com/thunlp/stuffed-mamba
Git isn't live yet. good news though mambabros
>>
>>102759501
buying a 600w launch card with a new connector seems like a bad idea
>>
File: Untitled.png (942 KB, 1080x2579)
>>102759535
woops
>>
>>102759501
First gen of GDDR7 memory didn't increase in density, so besides going clamshell, fitting another 8GB was all they could do.
>>
>>102759501
>going to legit need 2 power supplies to run them all
>>
>>102759516
No thanks, I'm fine with 15t/s on 4x3090
>>
>>102759516
It's just a 50% improvement at most for the 5090
That's going from 10 t/s to 15 t/s
>>
>>102759587
It'll be nice in cases where you go from something like 4 t/s to 6 t/s and into acceptable speed territory.
>>
>Aria
Another multimodal that I can't run yet >:(
>>
File: Untitled.png (1.44 MB, 1080x3040)
Round and Round We Go! What makes Rotary Positional Encodings useful?
https://arxiv.org/abs/2410.06205
>Positional Encodings (PEs) are a critical component of Transformer-based Large Language Models (LLMs), providing the attention mechanism with important sequence-position information. One of the most popular types of encoding used today in LLMs are Rotary Positional Encodings (RoPE), that rotate the queries and keys based on their relative distance. A common belief is that RoPE is useful because it helps to decay token dependency as relative distance increases. In this work, we argue that this is unlikely to be the core reason. We study the internals of a trained Gemma 7B model to understand how RoPE is being used at a mechanical level. We find that Gemma learns to use RoPE to construct robust "positional" attention patterns by exploiting the highest frequencies. We also find that, in general, Gemma greatly prefers to use the lowest frequencies of RoPE, which we suspect are used to carry semantic information. We mathematically prove interesting behaviours of RoPE and conduct experiments to verify our findings, proposing a modification of RoPE that fixes some highlighted issues and improves performance. We believe that this work represents an interesting step in better understanding PEs in LLMs, which we believe holds crucial value for scaling LLMs to large sizes and context lengths.
really interesting
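If you've never actually looked at what RoPE does mechanically, it's tiny. Bare-bones numpy version (standard RoPE with adjacent channel pairs; the exact interleaving convention varies between implementations):

import numpy as np

def rope(x, base=10000.0):
    # x: (seq_len, dim) queries or keys, dim even.
    # Each adjacent pair of channels gets rotated by an angle that grows with the
    # position and shrinks with the channel index (the "frequencies" the paper studies).
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)        # high frequencies first, low last
    angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                  # split channels into pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin               # 2D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

Queries and keys get rotated the same way, so their dot product only depends on relative position; the paper's finding is that the high-frequency pairs end up carrying the positional attention patterns while the low-frequency ones mostly carry semantics.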
>>
>>102759580
Liar, you are getting about 6t/s
>>
>>102759380
>>102759535
>>102759675
none of this ever amounts to anything. nothing ever happens.
>>
>>102759705
Why don't you try it for yourself?
https://github.com/theroyallab/tabbyAPI/
tensor_parallel: true
>>
>>102759634
As a vramlet, I consider 0.5 t/s (miqu) a slow cook, 1.5 slow but usable, and 4+ (mixtral) the acceptable minimum for what I would consider real-time.
I guess on top of that it would be nice to be able to regenerate walls of text and only glance through them, but that's a luxury.
>>
>>102759634
You won't notice much difference if you're offloading.
>>
File: Untitled.png (1.11 MB, 1080x2036)
Restructuring Vector Quantization with the Rotation Trick
https://arxiv.org/abs/2410.06424
>Vector Quantized Variational AutoEncoders (VQ-VAEs) are designed to compress a continuous input to a discrete latent space and reconstruct it with minimal distortion. They operate by maintaining a set of vectors -- often referred to as the codebook -- and quantizing each encoder output to the nearest vector in the codebook. However, as vector quantization is non-differentiable, the gradient to the encoder flows around the vector quantization layer rather than through it in a straight-through approximation. This approximation may be undesirable as all information from the vector quantization operation is lost. In this work, we propose a way to propagate gradients through the vector quantization layer of VQ-VAEs. We smoothly transform each encoder output into its corresponding codebook vector via a rotation and rescaling linear transformation that is treated as a constant during backpropagation. As a result, the relative magnitude and angle between encoder output and codebook vector becomes encoded into the gradient as it propagates through the vector quantization layer and back to the encoder. Across 11 different VQ-VAE training paradigms, we find this restructuring improves reconstruction metrics, codebook utilization, and quantization error.
https://github.com/cfifty/rotation_trick
since VQ-VAEs are used so much this will have a lot of cool downstream effects
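From the abstract, the trick is: the forward value is still the codebook vector, but the gradient reaches the encoder through a rotation + rescaling that maps e onto q and is treated as a constant. Rough torch sketch of my reading of it (not the repo's code, check theirs before trusting this):

import torch

def rotation_trick(e, q, eps=1e-6):
    # e: encoder output, q: nearest codebook vector, both (..., d).
    # Returns a tensor whose value equals q but whose gradient w.r.t. e is
    # lam * R, where R rotates e_hat onto q_hat and lam rescales |e| to |q|.
    e_hat = (e / (e.norm(dim=-1, keepdim=True) + eps)).detach()
    q_hat = (q / (q.norm(dim=-1, keepdim=True) + eps)).detach()
    lam = (q.norm(dim=-1, keepdim=True) / (e.norm(dim=-1, keepdim=True) + eps)).detach()
    # apply R to the live e without materializing the matrix:
    # R v = v + 2*q_hat*(e_hat.v) - (e_hat+q_hat)*((e_hat+q_hat).v) / (1 + e_hat.q_hat)
    w = e_hat + q_hat
    denom = 1.0 + (e_hat * q_hat).sum(-1, keepdim=True)   # degenerate if e and q point exactly opposite
    rot_e = e + 2 * q_hat * (e_hat * e).sum(-1, keepdim=True) \
              - w * (w * e).sum(-1, keepdim=True) / denom
    return lam * rot_e

Compare with the usual straight-through estimator, which just copies the gradient through the quantizer unchanged and throws away the angle/magnitude mismatch between e and q.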
>>
Single 5090, 2000€ min price, actual market price around 3000€ probably, 32gb vram, 600W.
x4 3090, 2000€, 96gb vram, around 1000w total with undervolt.
x2 3090, 1000€, 48gb vram, around 600w total with undervolt.
Lmao xd lol
>>
>>102759501
you can get 48gb workstation cards for about that price used on ebay, and 3090s are cheaper than either choice
>>
Has anyone here tried to use local models for language practice? Specifically Japanese? Would something like Nemo do the trick? I've tried its Spanish, and although it's not perfect, it's good enough.
>>
>>102759874
I ain't got space in my PC for many cards
>>
File: Untitled.png (454 KB, 1080x2314)
InAttention: Linear Context Scaling for Transformers
https://arxiv.org/abs/2410.07063
>VRAM requirements for transformer models scale quadratically with context length due to the self-attention mechanism. In this paper we modify the decoder-only transformer, replacing self-attention with InAttention, which scales linearly with context length during inference by having tokens attend only to initial states. Benchmarking shows that InAttention significantly reduces VRAM usage during inference, enabling handling of long sequences on consumer GPUs. We corroborate that fine-tuning extends context length efficiently, improving performance on long sequences without high training costs. InAttention offers a scalable solution for long-range dependencies in transformer models, paving the way for further optimization.
paper by one guy (actually only paper of his on arxiv). can't find code and he hasn't uploaded the models to HF. still, interesting
>>
>>102759956
An open rig is like 80€.
>>
File: 1472860069099.png (191 KB, 600x979)
Hey it's me that guy who posts burger catastrophes and asks for models and only has 8gb of vram. You all know me by now so is there any new good models? See you in a month.
>>
>>102760006
How loud is that gonna be with 4x 3090s running?
>>
>>102760015
just tried mistral small 22b instruct, first impression is great, like miqu 70b at home but at an acceptable speed, and it even leaves room for 32k context if I wish.
>>
>>102760073
This seems too big. Would using a Q2 still be higher quality than an 8b at a higher quant?
>>
>>102760073
why are you pushing this garbo model so hard, what is your end game
>>
>>102760145
You need to offload
>>
File: 118792626_p0.png (2.76 MB, 2508x3541)
>>102760265
I cannot. The model must exist solely within 8GB of vram. Please understand.
>>
File: 1707912212705564.png (20 KB, 90x88)
>>102760273
you can choose to do that but then you are stuck with retarded 7B models with goldfish memory
>>
Aria verdict?
>>
>>102760368
Waiting for gguf
>>
File: IMG_6183.jpg (134 KB, 896x735)
Creator of this ad definitely knows something we don’t, how are they fine tuning a closed model?? OpenAI hates this one easy trick!
>>
>>102760427
>how are they fine tuning a closed model?
https://openai.com/index/gpt-3-5-turbo-fine-tuning-and-api-updates/
>August 22, 2023
https://platform.openai.com/docs/guides/fine-tuning
any other stupid questions?
>>
>>102760427
go back
>>
>>102760152
NTA but what's the alternative? Llama 3? Qwen? These models are too censored
>>
>Pyramid Flow, a training-efficient Autoregressive Video Generation method based on Flow Matching. By training only on open-source datasets, it generates high-quality 10-second videos at 768p resolution and 24 FPS, and naturally supports image-to-video generation.
https://pyramid-flow.github.io/
>>
>>102760427
There was just a leak of an online AI girlfriend site so better to keep it local or use a throwaway email.
>>
>>102760486
Nemo. More parameters don't make a model automatically better.
>>
>>102760451
>https://platform.openai.com/docs/guides/fine-tuning
Interesting stuff in there actually.
>If you would like to shorten the instructions or prompts that are repeated in every example to save costs, keep in mind that the model will likely behave as if those instructions were included, and it may be hard to get the model to ignore those "baked-in" instructions at inference time.
>entities
>>
>>102760506
How does Nemo compare to Small? 12B vs 22B. Mistral 7B was good for the size and miqu is great, what's the sweetspot?
>>
>>102760511
So OpenAI describes it as a way to bake few-shot prompts into the model, huh?
>>
File: file.png (69 KB, 601x262)
APOLOGIZE

SAMA IN SHAMBLES
>>
>>102760671
small is slightly smarter, but nemo is more open minded
small is better for assistant stuff
nemo is better for funsies
>>
>>102760671
The way I would describe it is that Small is a Nemo sidegrade, but it's more assistant slopped.
>what's the sweetspot?
There isn't a sweetspot imo, either go small or go big.
>>
anthropic won btw
https://www.anthropic.com/news/message-batches-api
>>
>>102760681
what is this, spoonfed context for ants?
>>
>>102760776
no 3.5 opus me no care
>>
File: file.png (149 KB, 1338x164)
>>102760779
1b AGI

https://github.com/xjdr-alt/entropix
>>
>>102760738
Personally I never had trouble fucking the bot and I hate it when it just agrees to everything and doesn't "get" what kind of scenario I'm trying to do. (Midnight) Miqu has been great but the speed is just not usable on my machine. So I guess 22B should be second best option unless it has some serious weaknesses. Though Nemo is faster. But I hate when the bot is stupid and keeps repeating itself. Mixtral was the worst case of this, even though it otherwise understood instructions very well.
>>
llamacpp is slow at implementing multimodal models like pixtral and llama 3.2. Is there any good alternative to it?
>>
File: 1701959181888378.png (343 KB, 1019x773)
>>102760799
ollama
>>
>>102760789
okay now show how many attempts that took
>>
File: file.png (110 KB, 1144x264)
>>102760833
apparently it's not cherrypicked
>>
>>102760856
>>102760789
>>102760681
i can't believe reddit ate up the reflection grift, yet it's still sleeping on an actual happening
>>
>>102760856
am I supposed to be impressed by that nonsense?
>>
>>102760789
holy fuk relfection 2 is out already?
>>
>>102760870
>assuming his biological ""reasoning"" is better than whatever ASI came up with
ngmi
>>
>>102760876
there are at least 3 different people posting screencaps on x so it's actually a thing unlike reflection
>>
>>102760856
>subtract 9
>refuses to explain
>guesses the correct answer
>we're supposed to accept that this isn't cherrypicked
>>
>>102760870
>that nonsense
uhm, anon....
>>
>>102760899
subtracting 9 doesn't get you any closer to the solution, it's the same as answering that 9.9 is larger without any explanation
>>
>>102760908
uhm actually it probably does, since i guess the base model is more confident about 0.11 < 0.9 because of the training data
>>
so what's the deal with the SillyTavern drama? qrd?
i haven't updated mine since last month, do i pull the trigger and allow it to update?
>>
>>102760920
yeah that's called overfitting, not reasoning
>>
>>102760925
that's not what overfitting means
>>
>>102760908
The sampler changes direction when the model becomes too uncertain about the next token. It doesn't have to conform to human ideals of how reasoning works.
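The broad idea (entropy-gated sampling) is easy to sketch, even if nobody outside the repo knows exactly what entropix layers on top of it. Toy numpy version, thresholds pulled out of thin air:

import numpy as np

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def entropy_gated_sample(logits, low=0.5, high=3.0, hot_temp=1.2, rng=np.random):
    # act greedy when the model is confident, explore harder when it is uncertain
    p = softmax(logits)
    h = -(p * np.log(p + 1e-12)).sum()        # entropy in nats: low = confident, high = unsure
    if h < low:
        return int(p.argmax())                # confident: just take the top token
    temp = hot_temp if h > high else 1.0      # very unsure: crank temperature / "change direction"
    p_t = softmax(logits / temp)
    return int(rng.choice(len(p_t), p=p_t))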
>>
>>102760924
are you a proxy negro from /aicg/?
is seraphina your waifu and you're too lazy to download her again?
if you answered no to both questions, you can update
>>
>>102760948
no and no kek, i'm just using it as a frontend for kcpp and basic text adventure rpgs
guess i'll updoot then
>>
>>102760789
how high was the author when he wrote that readme?
>>
>>102760738
Shill your favourite Nemo tune to me. Or I'm just going to download Lyra.
>>
I've never used SillyTavern, is it worth the trouble to run on top of KoboldCPP? Does it make things any more convenient or will it just overcomplicate things? Can you save settings for a model or do you need to do that on Kobold anyway? Do the roleplay prompt insertions and all the extra stuff ST does actually help compared to just writing instructions for the AI on Kobold manually? I guess the only problem right now is that I can't easily use the "cards" on Kobold (or can I? It has some features). And no swipes. I don't care about visuals whole lot, I care about ease of use. Remembering which settings each model needs on my PC is sometimes a hassle. Swipes would be cool. Does ST allow swiping between multiple branches like miku.gg or only most recent message?
>>
>>102761053
ServiceTensor isn't made for casuals like you
>>
File: file.png (18 KB, 701x134)
>>102761057
>>102761053
>>
>>102761057
*ServiceTesnor
>>
>>102760856
The average midwit reading that would believe that.
>>
>>102761017
nta but magnum 12b v2
>>
>>102761070
anon...
>>
>>102761088
9.11 never happened
>>
>>102761017
kill ys
>>
>>102761053
>Does it make things any more convenient or will it just overcomplicate things?
It will overcomplicate things in that you have a lot more options.
Download it, see if any of the options available are anything you'd use; if not, go back to kcpp's native UI.
>>
>>102761085
Why not v2.5?
>>
>>102761053
are you talking about things like context length and cpu offload? you still have to configure them from kobold
also kobold does have kcpps files, you don't have to remember the settings for each model, just save one or more kcpps for each
if you were referring to sampler settings, system prompt, instruct format, etc. then yes, ST allows you to control and save those settings from its UI, but you still have to switch them manually
as for the swipes, you can only swipe the last message from each branch
cards are the main reason I use ST, I often edit them or make my own, I don't think kobold can do that
>>
File: 1702214080635066.png (84 KB, 530x1077)
>>102761053
i was hesitant about running ST over kcpp too, main concern was that it seemed too "chat-oriented" and i thought it wouldn't support the text adventure format i preferred
though ST does have a "story" view and i got around the user/bot chat model by making a generic "Narrator" character that responds to all my actions in second person, actually prefer it this way
>Does it make things any more convenient
for one the UI is better/cleaner, at least in my opinion - koboldlite or whatever it is that kcpp has is pretty limited
you also get a whole host of useful extensions like timelines (picrel) which helps immensely in tracking stories and branches
character/story management in general is also much, MUCH better in ST than kcpp
>Can you save settings for a model or do you need to do that on Kobold anyway
you use the same preset file (containing options such as GPU offloading, context size, etc.) as usual to load the model in kcpp, but actual sampler settings and such are done in ST and can be saved
>Do the roleplay prompt insertions and all the extra stuff ST does actually help compared to just writing instructions for the AI on Kobold manually
dunno, not sure i've ever tried
migrating from kcpp to ST i just rewrote my prompt a little and put it in the system prompt section for instruct
>I care about ease of use
my 2 cents: while ST may seem daunting at first due to all the new options, you only ever end up touching a small handful of them and in general i find it easier to use than kcpp
kcpp had HORRIBLE bugs for me in adventure mode that would often mix up AI/user-generated sections and cause other issues in the text view, ST doesn't have that
>Does ST allow swiping between multiple branches like miku.gg or only most recent message?
there's a timelines extension (picrel) that allows branching but it's not swipe-based, still have to click and switch branches manually
>>
>>102761233
What do you actually do with timelines?
Never used that feature. I only swipe, edit replies or start new chats if something is bothering me.
>>
>>102761332
When you have trouble deciding which way to take the scenario or which swipe to pick, you can go with one and return to the other branch and explore it later. Like when playing a VN, exploring all the "routes" except that in here there are infinite routes. You don't have to choose anything and can always return to your favourite branch. Though the lack of commitment can hinder the immersion since no one path is "canon" anymore
>>
Aria is getting close to a perfect vramlet model. Just needs quantization aware 4 bit training and pre-gating.
>>
>>102761422
Did you test it?
>>
>>102761383
>lack of commitment can hinder the immersion
Feels like that could lessen the consequences of your RP actions. But then again, you always have full control over the story anyway.
>>
>>102761233
TIL that there are retards that actually use the broken kcpp UI... I guess Discord shilling in these threads does work.
>>
>>102761163
I haven't tried it.
>>
>>102761529
buy an ad for your meds
>>
cpp bros...

>Practical Llama 3 (and 3.1) inference in a single Java file
https://github.com/mukel/llama3.java/blob/main/Llama3.java
>>
>>102761571
Not much point in trying as 2.5 felt like a downgrade.
>>
>>102761590
>java
oof
>>
>>102761422
But does it kiss on the lips while blowing you?
>>
do i need a jailbreak for local models?
>>
>>102761857
good ones, no
>>
>>102761857
jailbreaks no longer work on regular models. you need finetuned models or abliterated models, which work without jailbreaks
>>
Why bother with LLMs? Just get a tulpa.
>>
>>102761934
Electric tulpas don't require years of practicing self-induced schizophrenia.
>>
>>102761596
only 2k loc of single-file java code to run llama 3. that's a pretty cool and educational project.
>>
Hold on... Then they are training us on compressed data through LLMs.
>>
It's hard to make a tulpa
>>
will issue
>>
Llama.ccp
>>
meds
>>
>>102761983
>m-m-muh loc!
lmao, who the fuck cares. cpp is faster
>>
t: homosexual
>>
>>102762292
faster at breaking shit, maybe. lcpp has become unmaintainable
>>
>>102762354
It will be maintainable as long as the cuda devs don't give up on doing it for free and making the ollama guy rich without any credit. Gggggergavov cannot maintain the project on his own.
>>
>>102762354
gpt5 will maintain it, we don't need meatbags and their loc stinginess anymore
>>
File: file.png (35 KB, 723x192)
>>102762400
cuda dev tries not to seethe at ollama chads, impossible challenge!
https://www.reddit.com/r/LocalLLaMA/comments/1g00fq3/comment/lr7vmsn/
>>
If your work has value, there will always be grifters trying to profit off it. I made some free software and some nigger sold it for years, and when I emailed him he said he simply sold software discovery as a service.
>>
>>102762436
Who will pay for all those tokens?
>>
>>102762483
tokens will have the same price as breathable air
>>
>>102762470
>sold software discovery as a service.
holy mother of based! (i do the same btw, thanks for playing fosscucks)
>>
Does a CPU's NPU matter when offloading part of a model to the CPU?
>>
File: file.png (37 KB, 637x608)
>>102758839
trying again with xtts2
no errors yet, but is picrel the proper way to activate and then install?
>>
File: Screenshot_2834.png (7 KB, 402x110)
How do I make XTC appear in ST? I can enable it in the sampler selector but it's not added in the sampler menu for me.
>>
>>102762962
XTC is obsolete in ST because it is primarily used for Role****** and that's not something they want in their software.
>>
>>102762988
Based, it has been proven that "roleyplaying" is just an euphemism for pedophilia with ai chatbots.
>>
File: venv.png (1 KB, 207x143)
>>102762949
Seems correct. Run
>which python
or
>where python
to make sure. It should point to a path inside the directory where you created the venv.
>>
>>102762949
>>102763509
Meant to say:
>which python
or
>whereis python #whereis, not where
>>
AI Sex got deprecated...
>>
File: tmpi3n9d434.png (515 KB, 512x720)
*pauses dramatically* When life gives you wifi, pee all over it.
>>
>>102763575
That's just masturbation.
>>
phonesex
sexting
>>
>>102763056
Excuse me, I exclusively use llms to generate stories about big titty milfs
>>
File: _06445_.png (2.91 MB, 936x1664)
>>102758839
osha violations with gumi
>>
Will these ever be able to generate a complex narrative from very small user input?
>>
>>102760908
You're absolutely wrong. The reason 9.9 is greater than 9.11 is that its decimal portion is greater. It is decomposing the problem by removing the common part. Frankly I can't fathom how anyone could fail to understand this.
>>
>>102763819
Sorry, my crystal ball is in the shop until next week. Try asking again later
>>
>>102763819
they've always been able to.
>>
File: wait_what.png (23 KB, 304x83)
>>102763835
>>
>>102763819
wait for diff transformer gpt 5 o2
>>
File: 672809804.png (170 KB, 398x281)
its over, child rape rp is the only use case for llms, shut it down
>>
>>102763583
>>
>>102763926
If AI isn't the future of coding then explain this: >>102763936
>>
>>102763936
based
i remember the time that miku suplexed me and cracked my neck, cried over my corpse, and used the power of love and friendship to revive me
this was after i let her pee on my face of course
>>
>>102763936
>4 variables initialized but unused
i'm rejecting miku's pr
>>
>>102763926
Betting on entertainment would not be a bad bet, but we live in the worst timeline when it comes to it and the politics of it. Otherwise, we would likely already have GPT available for roleplay. There may be some social concerns, but honestly nothing will ever replace human interaction. Humans are more than just the words leaving their mouths.
>>
File: opinion.png (163 KB, 2044x872)
>>102763926
>>
Is there any practical reason not to make an llm front end entirely in rpgmaker mv?
>>
>>102764211
There's probably something in Leviticus that makes it a sin, if you're of that inclination
>>
>>102763936
>classic miku log from feb surfaces again
just when you think it's lost it comes back kek, proof of the sovl in mixtral and envoid models tbdesu
>>
>>102763926
why does the thumbnail look like a gameboy game
>>
>>102758927
You re-used cables between the PSUs, right? That's a 50/50 on frying things since they always change the pinout on the SATA power connector.
>>
>>102764211
If that's what you're most comfortable with, go right ahead. I assume it can talk over the network and generate/parse JSON. The latter is simple enough to do yourself if it can't.
If you care about char cards, you'll also need a tEXt parser for the png (ridiculously easy, you can skip the image data entirely), the json parser (already discussed), and a b64decode (also simple enough); rough sketch below.
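Something like this covers the whole card path in python (assuming the usual ST-style convention of a base64 JSON blob in a tEXt chunk keyed "chara"; adjust the keyword if your cards differ):

import base64, json, struct

def read_card(path, keyword=b"chara"):
    # walk the PNG chunks, find the tEXt chunk with the given keyword,
    # base64-decode its text into JSON; the image data itself is skipped
    with open(path, "rb") as f:
        assert f.read(8) == b"\x89PNG\r\n\x1a\n", "not a PNG"
        while True:
            head = f.read(8)
            if len(head) < 8:
                break
            length, ctype = struct.unpack(">I4s", head)
            data = f.read(length)
            f.read(4)                          # skip the CRC
            if ctype == b"tEXt":
                key, _, text = data.partition(b"\x00")
                if key == keyword:
                    return json.loads(base64.b64decode(text))
            if ctype == b"IEND":
                break
    return None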
>>
File: file.png (12 KB, 581x370)
>>102763509
where do I use that command?
git bash & python 3.10 don't recognize it
>>
File: tEXt_b64d.png (2 KB, 463x142)
>>102764211
>>102764358 (cont)
>>
>>102758839
hey guys, haven't been here for months, what would be the best model to fit in 24gb of ram to use to make cover letters and shit to get new employment?
my gpu is fucked, so I'll deal with the really slow generation
>>
File: file.png (14 KB, 581x370)
>>102764426
nvm this one works
>>
>>102764471
Cool. When you activate the venv, the path shown should point somewhere within the venv dir (wherever you created it). If that's the case, it means it worked fine. cd to the xtts2 dir, Install the requirements and try to launch it.
The venv remains active ONLY on that terminal and until you close it or run 'deactivate'. If you open a new term, you need to reactivate the venv.
I take it from that screenshot that, on that terminal, you haven't activated it yet.
>>
File: file.png (43 KB, 581x370)
>>102764613
still throws this error at the end of the picrel
+ I have other things relying on python so I have an inkling this venv will conflict with those
>>
I'm looking at 16gb cards. How bad is intel ARC compared to AMD?
Also, I can run 70b q2 models on cpu at ~1.2 t/s, will I reach 2t/s with a single 16gb card running a q4?
Would it be better if I get something like 8000mt/s ddr5 memory instead?(assuming that my mobo/cpu can handle it)
>>
File: ross.jpg (27 KB, 679x988)
>>102764703
>How bad is intel ARC compared to AMD?
>he doesn't know
>>
NVLM D 72B opinions?
>>
File: 39_06480_.jpg (724 KB, 2048x2048)
>>102758839
Psychedelic Gumi edition
>>
File: xtts2.png (18 KB, 830x123)
>>102764673
I gave you some links to the typer issue yesterday. I'm not sure how to fix them and i don't have windows anywhere handy to test myself.
Regarding conflicts, yes. Make a new venv to be used exclusively for xtts2. I'm sure i mentioned that already. If not, now you know.
Have you tried picrel? Seems to be only tested on ubuntu, but worth a shot. just below that there's a link with instructions for windows.
>https://github.com/coqui-ai/TTS
Is that the project you're trying to run?
>>
been out for a while...
bacc status?
>>
>>102764826
https://github.com/BoltzmannEntropy/xtts2-ui?tab=readme-ov-file
this is the proj
>>
>>102764826
+ im pretty illiterate in programming, let alone AI
>>
>>102764719
I know that both are bad but which one is worse? And by how much.
I'm assuming intel but I don't know.
both 16gb cards barely fit in my budget.
>>
>>102764888
>bacc status
Broken: https://www.youtube.com/watch?v=j3fkDQiCuf0
>>
>>102761017
https://huggingface.co/mradermacher/Stellar-Odyssey-12b-v0.0-i1-GGUF/tree/main
Get this one Q6. Nice for RP.
>>
>>102765066
arc is far worse
if you use linux amd will naturally just werk.
If not general amd support fucking sucks cock
>>
>>102764952
>>102764967
It's a tough one. Last update was 9 months ago. It depends on coqui (it has TTS in its requirements.txt), whose last update was 8 months ago and, if i remember correctly, it's officially abandoned. Seems like a pile of jank on top of another pile of jank.
IF i were to try it, i'd remove one of those piles and try to use coqui directly, but you may find the same or other issues as well. It seems to only come with a cli, though.
>>
okay I got llama running on my pc now what
>>
>>102765469
Masturbate. Go on an adventure.
Ask it for a cake recipe.
>>
>>102765469
sex with miku
>>
>>102765501
This. In exactly that order.
>>
How retarded is it to buy an RTX A6000 with money I was given on Hanukkah in 2024?

I obviously cannot afford an A6000 Ada
>>
File: spines=chilled.jpg (13 KB, 288x171)
>>102761529
I wonder what your relationship to kobold is. Do you find yourself thinking about him or her in various contexts? How does he or she fit into your life? Maybe it would be interesting for us both to try talking to kobold as if he were there, and seeing how it feels for both of us?
>>
>>102765614
pretty retarded if you need that money for anything more important, but if you have the disposable income and this is an important hobby then go for it
>>
>>102765614
Wait until the 5090 launch.
Then the a6000 prices will follow.
Also one is not enough, to run anything good you need at least 80GB VRAM
>>
if you are using stable diffusion, it only makes sense to get a 5090.
if you are using LLMs, 3090s are still the best option if you can get them for a fair price, but if you want new hardware, the 5090 has 1.5x the bandwidth of a 3090, unlike the 4090, which has the same bandwidth as a 3090 (though a 4090 destroys the 3090 for stable diffusion). But 2 3090s are better than 1 5090.
Also, the RTX A6000 is 30% slower than 2x 3090s because the A6000 uses GDDR6 while the 3090 uses GDDR6X. And I don't know what price you are getting the A6000 at, but 2x 5090s should be much faster and you get more vram.
I would wait for 5090 reviews, but if history says anything, the 5090 should be the best GPU (due to the price ladder) and it will get sold out by scalpers.
Also note the A6000 is old, so model runtimes might use cuda features only 50-series GPUs support (but we will see).
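For napkin math: single-stream token generation is basically memory-bandwidth bound, so the ceiling is roughly bandwidth divided by model size. Rough illustrative numbers, real speeds land below this:

# rough decode ceiling: every new token has to stream the whole model from VRAM once
bandwidth_gb_s = 936      # 3090 memory bandwidth in GB/s
model_gb = 40             # e.g. a ~70B at ~4.5 bpw
print(bandwidth_gb_s / model_gb, "t/s upper bound")   # ~23 t/s; overhead and multi-GPU splits eat into it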
>>
>>102765809
yeah lick my fucking balls and get an infection from eating my ass, i'm not touching your retarded housefire 500w gpu's jensen.
>>
>A6000 is old
lmao, lol even
Cuda dev still supports P40s from 2016, a6000 has FA2. What is this disinformation lol lmao there aren't even model runtimes that only use 40XX features and that's been out for a while now.
As long as it's ampere or above it gets all the perks, nvidia meme features from the past 3-4 years don't mean jack shit for LLMs.
>>
>>102765809
>5090
The wait for the 5090 will suck, and even when it comes out, chances are it will be hard to come by for a while. Ex, when the 4090 came out people bought them up fast and resold them at a premium.

I have also learned to never be first in line for new (overpriced) technology. Ex, there were cases of 4090s melting before the technology was refined - and let's not forget the 13th and 14th generation intel failures.

Be a year beyond the cutting edge and you'll save a ton of money, and have far more assurance on reliability.
>>
>>102765809
>and it will get sold out by scalpers.
So if you want it, buy day 1 or expect to pay a premium. I'm thinking of buying a few extra just to profit on the appreciation myself.
>>
>>102765626
>>102765644
>>102765809
Thank you for your inputs.

These are exactly the sort of doubts that I have
>>
>>102765809
>Also the RTX a6000 is 30% slower than 2x 3090's
It's slightly slower, but nowhere near 30%. I run a slight memory overclock on mine to narrow the gap and when I tested a 70b model on an A6000 vs a 3090 in the same rig, it was only around 5% slower on the A6000 than the 3090.
>>
File: IMG_1543.jpg (483 KB, 815x1168)
>>102758839
>just started college as a mech engineering student
>meet compsci girl at one of my lectures on her final year
>she mentions that her final project is making an ai chatbot
>sperg about llms
>realise i know more about her final project than her

Wow
>>
>>102766204
Now you two can become a couple and work together to make the first commercial robotic AI girlfriend.
>>
>>102765892
>Be a year beyond the cutting edge and you'll save a ton of money, and have far more assurance on reliability.
Oops, I meant be a year BEHIND the cutting edge.
>>
>>102765809
>scalpers
Doubt it, it's already being sold at scalpers prices
>>
>>102766204
That's nothing. The majority of programming students suck at computers for some reason.
I could destroy them in anything that wasn't programming or networking.
>>
>>102766133
I agree with you but I think the price is awful.
The only metric a6000 wins in power.
in every other metric a6000 is equal to 2x3090 setup.
The problem is that the a6000 and 2x3090 have zero upgrade options, assuming you get 10tk/s at max vram, nobody wants 5tk/s even if they got 96gb (If you want 5tk/s at 48gb, go buy 2xP40).
You must have a bandwidth surplus for an upgrade (if you double your GPU, you get half the token speed).
So buying a 5090 is still extremely MID compared to a used 3090 (in price), but hey, you will get 50% more token speed (hopefully) and you get more vram.
Is it worth it? depends, maybe for a 3x setup is probably the ideal setup for 10tk/s, but it depends on the models, are there going to be good 96gb models in the future? Will 96gb be enough for next gen open source AI video generators?
>>
>>102766204
Yo, can you pass her my mixture of expert roleplayers system prompt to check out?
>>
>>102766393
>Will 96gb be enough for next gen open source AI video generators?
*scratch that, I am 99% certain that every AI video company does not use a multi-gpu setup. You need a single GPU.
>>
>>102766296
What exactly does
>at computers
entail?
Thanks to ChatGPT, they barely need to know how to program. Once cloud IDEs like Google's become the norm, they won't need any skills "at computers" besides turning it on and opening their browser to be productive in their field.
So not sure what you're trying to brag about. Your config file editing skills are not that impressive.
>>
>>102766446
Sorry, I omitted some context. I meant in the past when I was a student.
>>
>>102766204
No one needs a gf nowadays
>>
File: file.png (1001 KB, 733x743)
What should I use to stuff my 4090 in there so I can get a riser and put a 5090 where the 4090 was?
>>
>>102764770
I want to fuck the anime girl.
>>
>>102766296
they suck at networking too
t. devops guy who has to fix all the dev environments every week because everyone keeps finding new and exciting ways to suck at networking
>>
ITS UP

https://huggingface.co/TheDrummer/UnslopNemo-12B-v3-GGUF
>>
>>102766594
DRUMMERGAWDS WE WON
>>
>>102766534
Nta, but speaking of that. Is it possible to run 3 3090s / 4090s in a case without a crypto farm setup?
If it's possible, I would really love to see a motherboard model, the rest I'll figure out.
>>
>>102766594
Sounds like a nothingburger.
>>
>>102766594
>leave slop in your datasets
>release models
>go back to your datasets and curate them
>release UNSLOPPED models
If only he was working for a soulless corporation with retarded modern policies. Imagine the continuous improvement project you could make on this.
>>
>>102766594
Did you take all the slop you removed and make a KTO dataset with it?
>>
>>102766648
Inference barely uses any PCIe bandwidth.
You can turn spare M.2 slots into PCIe slots using $2 adapters.
Most mid-range mobos have enough slots for 4 GPUs and one NVMe SSD.
>>
File: 1707892813017576.webm (3.87 MB, 1186x714)
>AutoDAN Turbo - A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
https://x.com/_akhaliq/status/1844258704633340284
https://arxiv.org/abs/2410.05295
>>
>>102766594
Respect to drummer for trying to make nemo even better after all hype is gone, my fav model so far for local.
Gonna try this tune tomorrow when I wake up.
Making this post as inorganic as possible to keep buy an ad schizo guessing.
>>
>>102766737
Well, it's going to look like a mining setup at this point, and you are going to spend most of the money on cables + external enclosure.
https://www.aliexpress.com/item/1005004714035083.html
>>
>>102766783
Buy, uh... or sell... look just perform a transaction please
>>
>>102766966
Can there be an "in case" solution for 3 bricks? That is what I'm really asking.
My cuck rig is already struggling with just one brick.
Does airflow between GPUs matter if I don't do back-to-back inference?
If I have to go for risers, I'll go server at that point.
>>
>>102767074
I'm sure you could find a $500 case that serves your needs; if not, a custom case builder could.
>>
>>102767095
If people buy $500 cases I'm better off making those instead.
>>
File: AMD.png (19 KB, 836x177)
>AMD makes a 28GB GPU
>Consumers can't buy it and it's reserved for datacenters
The monkey paw curls once again for /lmg/
>>
File: file.png (3.14 MB, 2000x1000)
>KTO
All I can think about whenever I see that.
>>
>>102767268
Lisa can't hurt her cousin Jensen's business.
>>
>>102767268
Proves they're chasing engineering autism instead of solving computational tasks.
>>
>>102766594
This model cured my erectile dysfunction and I can get hard when chatting with AI bots again.
>>
>>102767305
>No ads that are harmful or malicious in nature will be accepted.
Oh, it's alright then, sloptunes are not eligible for ads. They can advertise for free.
>>
>>102767268
They should make like 64G consumer cards and 256G for enterprise. There must be a good reason why they don't but I wish they did
>>
>>102767525
>There must be a good reason why they don't
Money
>>
>>102767560
Just Moore's law not accounting for diversity hires.
>>
>>102767621
more like mooney's law
>>
>>102766534
If you have enough money to buy those GPUs then you have enough money to get a proper case setup and motherboard for them.
>>
you don't actually need more vram
>>
Best local language model available for 4050 6GB + i7-13620H + 16GB RAM?
>>
yes i do i need enough vram for mixtral large
>>
>>102767671
>proper case setup
What is a proper case setup? A fucking rack?
>>
File: relax.gif (902 KB, 498x374)
how I sleep at night knowing there's no point spending thousands of dollars on hardware upgrades, because there's no open source model that's good enough to be worth it yet
>>
>>102766594
Wouldn't you like to expose your creations to a more sophisticated audience? An ad might just be the ticket.
>>
>>102767718
No. Literally just a full size tower instead of a mid like that photo.
>>
Mistral Small ought to be enough for anyone
>>
File: Sensible Chuckle.gif (1.14 MB, 250x250)
>>102767725
>sophisticated audience
>People who click ads
Alright anon you got me, nice one!
>>
>>102767721
Local SOTA is like $500 quanted. Don't let the watermelon salesmen inflate it.
>>
>>102767525
consumer cards are already 1/4th the cost of server cards.
gaming GPUs are already the least profitable part of nvidia and AMD; they are just underselling to prevent marketshare loss (shareholders might care enough to pull out, but honestly I think it's the opposite: I think nvidia would make more money if they stopped making gaming GPUs, every 5090 sold is like $50,000 of lost profit on valuable silicon that could have gone to an H200).
Nvidia makes a 50% margin, while AMD makes 30%.
It's neat that Nvidia even made the 5090 32gb; everyone expected it to be 24gb and for a Titan / Ti re-release to be 32gb (maybe nvidia plans to release a 64gb Titan for $4000).
>>
>>102767525
>a good reason why they don't
No one needs 256 petabytes of VRAM for something people can get on their phones for 20 bux a month without ever hearing about python.
>>
>>102767693
Gemma2 9b SimPO for anything other than cooming.
>>
>>102767756
Nemo?
>>
>>102767799
And what about for cooming?
>>
File: file.png (2.01 MB, 1024x1024)
>>
>>102767777
This, I stopped using local when I realized the best model (mistral large) is just a shitty claude.
>>
>>102767731
> 544 x 242 x 530 mm
How much bigger is a full tower? A meter?
>>
>>102767817
Some mistral nemo finetune. I'm not familiar with them so I can't help you more than that.
>>
File: file.png (805 KB, 768x768)
>>
>>102767721
I only got into this cancer because I got a 4090 for gayming. Getting a second 24GB set just for current LLM's would make me suicidal.
>>
>>102767817
if you are autistic and like tinkering, I use google colab for cooming. The downsides: google spies on you (but honestly they already know too much about my fetishes), it takes a moment to start up, sometimes you can't get a GPU, and sometimes they detect you are breaking the terms (having sex). I think that detection is based on the python log (so if you load a card that says futanari fuckventures, or a model/huggingface repo with "lewd" in the name, and it gets printed, you sometimes get caught, but most of the time I don't?), and you can't continue where you left off because your session can only last a day. On the bright side you get a Tesla T4 (a 16gb GPU that's half the speed of a 3090, not bad), you can load new models every time, and it won't make your gpu warm.
https://colab.research.google.com/github/lostruins/koboldcpp/blob/concedo/colab.ipynb
I haven't used colab in a while however; I stopped having sex because I was pretty disappointed with llama3 and nemo so I'm waiting for something new. Llama3 and nemo are "fine" but I didn't notice a next-gen improvement over what I saw from llama2 (at least for the erp part).
>>
>>102767756
$500 per T/s?
>>
>>102767978
>sometimes they detect you are breaking the terms (having sex) but I think it's based on the python log (so if you load a card that says futanari fuckventures
Have you tried disabling the console output?
>>
What can we do to make /lmg/ more dead?
>>
Just stop making threads
>>
>>102768040
Even more discord shilling
>>
>>102768012
I switched from oogabooga to kobold, and I think kobold prints less than oogabooga.
The oogabooga notebook I use is painful with gguf, since I can't figure out how to make it load a Q8 model without modifying the python code to use --specific-file and load the model manually.
KoboldCpp also starts up like 10x faster, but if you run out of context it needs to redownload everything, and it's harder to modify the tavernAI png context, since I like changing things.
I tried to run TavernAI, but it's complicated on colab??? and I can't figure out how to load custom huggingface models, and all the python code is hidden for some reason so I can't figure out what's happening.
And I use LM Studio on desktop; it can't run tavernAI png's, but it's good for some simple r34 "write me an erotic story about ...." and driving the story in my direction.
>>
>>102768184
>but if you run out of context it needs to redownload everything
What?
Also, wouldn't it be better to run koboldcpp as a backend in google colab and use Silly on your computer as a frontend so that cards and shit are saved locally?
>>
>>102768067
This, what's the point? We've had like four major model releases in the past two weeks and nobody even uses them. /lmg/ is now mostly just drama and tech support. It's about as on topic at this point as /aicg/ is with botmaking.
>>
>>102768245
I just realized that I can use sillytavern with LM studio, that's pretty interesting.
I didn't really fully understand the concept of frontends, I thought sillytavern was just another runtime like oogabooga and KoboldCpp.
I'll give sillytavern a shot.
>>
>>102768256
>image of apps that depend on llamacpp waiting around with shovels
>>
>>102768299
Let's be honest, what's the point even once the support is implemented? Everyone's going to show their models funny pictures for 5 minutes before losing interest and switching back to what we already had. Multimodal is a meme.
>>
>>102768299
ollama is actively working on multimodal support while llamacpp won't even bother
>>
>>102768324
You, much like the models you hold so dear, are severely lacking in vision.
>>
>llama.cpp just went out to lunch one day and never came back
>>
>>102768350
ok palpatine
>>
>>102768256
> We've had like four major model releases in the past two weeks
There has been nothing of note in the past 2 weeks
>>
>>102768374
so that's why it reminds me of my dad
>>
File: file.png (79 KB, 562x772)
I sure love totally not damage control
>>
It sucks that AI is still dumb enough to not be trusted with anything more complicated than a blowjob. The moment AI gets good enough to consistently simulate whatever setting I want without randomly time skipping or changing the entire setting on a whim will be a very good day. I'm interested in seeing AI dungeonmasters for text adventure games, that sounds like it'd be fun.
>>
File: 1984.gif (2.78 MB, 498x367)
>>102768607
>>
>>102768607
What's going on? What is misinformation?
>>
File: file.png (46 KB, 767x268)
>>102768678
Nothing, don't listen to trolls, ST is doing just fine despite all the FUD!
>>
Just woke up from cryo sleep. wtf is Aria?
>>
>>102768749
multimodal meme
>>
>>102768789
Am I missing something? People are claiming it's really resource intensive to run, but the total file size isn't that big?
>>
>>102768749
A pretty great manga and anime. I can recommend it.
>>
>>102768864
holy based
>>
>>102759501
Fork the money over, paypiggies!
>>
>>102767721
>have something that basically beats turing tests on an offline computer nowadays, which isn't chasing mememarks and 10x VRAM for 0.1% improvement
>>
>>102759501
Nvidia is legitimately going to lose all the midrange market to AMD at these prices.
>>
>>102769140
You're a naive idiot if you think AMD will not match their prices to the same or ever so slightly below what NVIDIA charges. NVIDIA dictates what everyone else charges through their market share alone.
>>
>>102767721
You will never know the feeling of fine-tuning your own model.
You'll never know the joy of getting an output that's tailored to you, because you trained it to be so.
Just keep consuming what others put in your sloptrough, ignorance is bliss.
Until they take it away from you, and history shows that they will.
Keep enjoying that sleep until then.
>>
>>102769159
From what I am hearing, AMD has stopped trying to match Nvidia in the high-range department and is solely focusing on the mid range. We will have to wait and see what the prices will be when AMD releases it, but I would legitimately be surprised if it breaches $1,000.
>>
>Yes the developers do want to realign the labeling/branding of ST to not be primarily Roleplay focused BUT this is not a change to kill roleplay, it’s simply a change that will align ST with its primary long term goal of being the “LLM Frontend for Power Users”. By being a neutral tool that does open up ST to be used in any environment whether that be a business, a university or for roleplay use. In my mind this will only help ST grow and keep the developers passionate about continuing the project.

>MYTH ST is being changed so it can be monetized.

>This is simply a lie that keeps getting spread by doomers. I have seen countless messages from the development team that contradict this but angry users keep calling them liars. Look In my day job (going to keep this vague) I have a masters of information systems and work in the financial investments space.

>MYTH ST will be preventing users from using it for RP in the future.

>I’m really not sure how this got started but one bad joke about RP being a bannable offense from Cohee didn’t help lol.

>So I ask the community for two things. One please be patient and wait and see as these changes roll out. I think you’ll find your RP experience won’t be disrupted/changed like you fear. Second please tone down the rhetoric around this. I’ve had to remove probably around 100 comments hurling personal attacks against the developers. Nasty insults against people who have donated 1000s of hours of their time to bring you a FREE tool that provides countless hours on entertainment using a cutting edge technology.

https://www.reddit.com/r/SillyTavernAI/comments/1g0x2m4/proposed_changes_megathread/

they keep pouring fuel on the fire huh?
>>
>>102769342
who cares this is a mikupad general
>>
>>102769342
Do they actually think people will fall for this shit? We have pattern recognition. We know how this tale always ends.
>>
>>102769207
> Implying $1000 is "high range", and AMD has given up.
MI300 would like a word with you.
>>
>>102769375
I am talking about the RDNA 4 cards that should be announced in Q1 2025
>>
>>102769342
Objectively I think it's a fine move, and most of the people panicking about it are being hysterical. ST has all the pieces in place to be *the best* general LLM interface out there, and I've actually wished there was something like a corpo-ST for a while now: something with great support for all sorts of backends and samplers, great prompting functionality, and additional tooling. I think it's stupid to believe they're going to be outright hostile to RP stuff in the future just because of a few seemingly tongue-in-cheek comments from the devs when everyone was freaking out; I'd fully expect them to keep their word about continuing to support it through extensions or whatever
however, the
>ServiceTesnor
shitposting is really funny to me so I'll continue to indulge in it for the time being
>>
>>102769375
datacenter is a whole separate market, none of you niggers are running h100s or equivalents
AMD is giving up on high end consumer and probably also pro cards
>>
Got any tips on how to make Hermes 405B good? Preset? Samplers?
>>
>>102770124
It's objectively worse than largestral
>>
File: file.png (693 KB, 734x978)
Are there any models that are similar to the original ChatGPT? I want to recreate nigga mode
>>
>>102770153
I just use the Big Nigga card.
It's by far the best assistant card.
>>
>>102769342
Sounds like this was all a nothingburger and SillyTavern will still be the best RP frontend going forward.
>>
https://huggingface.co/PocketDoc/Dans-PersonalityEngine-v1.0.0-8b
I trained this. I like it, but it's still an 8B, so YMMV.
>>
PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency
https://arxiv.org/abs/2410.07563
>We introduce PLaMo-100B, a large-scale language model designed for Japanese proficiency. The model was trained from scratch using 2 trillion tokens, with architecture such as QK Normalization and Z-Loss to ensure training stability during the training process. Post-training techniques, including Supervised Fine-Tuning and Direct Preference Optimization, were applied to refine the model's performance. Benchmark evaluations suggest that PLaMo-100B performs well, particularly in Japanese-specific tasks, achieving results that are competitive with frontier models like GPT-4.
https://huggingface.co/pfnet
not up yet but that seems to be their HF. posting for VNTLanon
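For anyone unfamiliar with the QK Normalization trick the abstract mentions: it normalizes the query and key vectors per head before the attention dot product, which keeps the attention logits bounded and helps avoid loss spikes during training. A minimal PyTorch sketch of the generic technique, not PLaMo's actual code; the RMSNorm choice and layer sizes are assumptions, and nn.RMSNorm needs a recent PyTorch (2.4+), otherwise swap in LayerNorm.

```python
# Minimal sketch of QK Normalization (normalize Q/K per head before the
# dot product) as a training-stability trick. Illustrative only; not
# PLaMo's implementation. RMSNorm and the sizes below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)
        # per-head normalization of queries and keys
        self.q_norm = nn.RMSNorm(self.d_head)
        self.k_norm = nn.RMSNorm(self.d_head)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, time, d_head)
        q = q.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # QK-norm: bounded attention logits -> fewer loss spikes at scale
        q, k = self.q_norm(q), self.k_norm(k)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(b, t, -1))

out = QKNormAttention()(torch.randn(2, 16, 512))  # -> (2, 16, 512)
```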
>>
>>102769342
>>102769342
We are so back
>>
>>102770605
I'm hyped.
>>
>>102766594
>unslopped
>immediately talks about "her glimmering azure orbs"
FUCK YOUUUUUUUUUU
>>
>>102770605
Thank you!
>>
>>102770605
>fp8 training
Nice. Can't wait to try an IQ1 of it.
>>
llama.cpp is reprocessing the prompt every time now, even when I click continue with no edits.
>>
>>102771009
Check if whatever you're using to talk to it is sending "cache_prompt": true in the request. If not, click on random things until it does.
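If you want to rule the frontend out entirely, you can hit llama-server's /completion endpoint directly. A minimal sketch with Python requests, assuming the server is on the default http://localhost:8080; cache_prompt is the field in question.

```python
# Minimal sketch: send a completion request straight to llama-server with
# cache_prompt enabled so the server can reuse the KV cache for the shared
# prompt prefix on the next request. Assumes default port 8080.
import requests

payload = {
    "prompt": "Once upon a time",
    "n_predict": 32,
    "cache_prompt": True,  # reuse matching prompt prefix across requests
}
r = requests.post("http://localhost:8080/completion", json=payload, timeout=120)
print(r.json()["content"])
```

If the second request with the same prefix still reprocesses everything, the problem is on the server side rather than in the frontend.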
>>
IterGen: Iterative Structured LLM Generation
https://arxiv.org/abs/2410.07295
https://github.com/uiuc-arc/itergen
In case anyone is interested in structured outputs. The repo isn't live yet.
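Since the repo isn't up yet: the core idea behind structured generation is just masking the logits so only tokens the grammar or schema allows can be sampled at each step. A toy sketch of that idea follows; the allowed-token set is a hypothetical stand-in for a real grammar engine (IterGen, outlines, llama.cpp GBNF, etc.), not IterGen's actual API.

```python
# Toy sketch of constrained decoding: mask every token the structure does
# not allow, then sample from what's left. The `allowed` set stands in for
# a real grammar engine; this is not IterGen's API.
import math, random

def constrained_sample(logits: list[float], allowed: set[int]) -> int:
    masked = [l if i in allowed else -math.inf for i, l in enumerate(logits)]
    # softmax over the surviving tokens only
    m = max(masked)
    exps = [math.exp(l - m) if l != -math.inf else 0.0 for l in masked]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# example: vocab of 5 tokens, structure only permits tokens 1 and 3
print(constrained_sample([0.2, 1.5, -0.3, 2.0, 0.1], allowed={1, 3}))
```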
>>
>>102747280
>Is your RAM on the QVL?
Thanks. Yes and no: the Gigabyte site doesn't have the rev 2.0 version of the board listed for the QVL check, but it does link it via the 1.0/generic QVL version.

I got "M321R4GA3BB6-CQKVS" RAM, which is only on that list as 'M321R4GA3BB6-CQKMG', which was the closest one but not the same product name (different testing batch bins?), and GPT says "check if BIOS is compatible with the QS version of the CPU".

I messaged memory-net just in case, but I think it's either the CPUs or the power supply not allowing it to boot, if not the RAM.
>>
>>102771300
Are you SURE all the RAM is seated properly? Those slots are a real bitch to seat, and can even pseudo “click” but not actually be slotted right
>>
>>102771371
Haha, don't do that to me anon, but I'll check when I get home. I still think it's another issue.

Dunno if I should troubleshoot with the eBay guy or just return it; I've got a week to work it out.
>>
File: Untitled.png (656 KB, 1080x2936)
Upcycling Large Language Models into Mixture of Experts
https://arxiv.org/abs/2410.07524
>Upcycling pre-trained dense language models into sparse mixture-of-experts (MoE) models is an efficient approach to increase the model capacity of already trained models. However, optimal techniques for upcycling at scale remain unclear. In this work, we conduct an extensive study of upcycling methods and hyperparameters for billion-parameter scale language models. We propose a novel "virtual group" initialization scheme and weight scaling approach to enable upcycling into fine-grained MoE architectures. Through ablations, we find that upcycling outperforms continued dense model training. In addition, we show that softmax-then-topK expert routing improves over topK-then-softmax approach and higher granularity MoEs can help improve accuracy. Finally, we upcycled Nemotron-4 15B on 1T tokens and compared it to a continuously trained version of the same model on the same 1T tokens: the continuous trained model achieved 65.3% MMLU, whereas the upcycled model achieved 67.6%. Our results offer insights and best practices to effectively leverage upcycling for building MoE language models.
neat
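In case the routing comparison in the abstract isn't obvious, here is a tiny numpy sketch of the two router orderings the paper compares. Illustrative only, not the paper's implementation; expert count and logits are made up.

```python
# Minimal sketch of the two expert-routing orders compared in the paper.
# softmax-then-topK: normalize over ALL experts first, then keep the top-k
#   probabilities (they no longer sum to 1 unless renormalized).
# topK-then-softmax: pick the top-k logits first, then softmax over just those.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_then_topk(router_logits, k=2):
    probs = softmax(router_logits)
    idx = np.argsort(probs)[-k:]
    return idx, probs[idx]

def topk_then_softmax(router_logits, k=2):
    idx = np.argsort(router_logits)[-k:]
    return idx, softmax(router_logits[idx])

logits = np.array([1.2, -0.5, 0.3, 2.1])   # one token, 4 experts
print(softmax_then_topk(logits))            # weights reflect all 4 experts
print(topk_then_softmax(logits))            # weights sum to 1 over the top-2
```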
>>
>>102771074
It does include it, but it's still reprocessing every time. It's SillyTavern.
>>
>>102771522
Did you check the actual request (your browser's dev tools) or just ST telling you it's doing it? It's working fine for me with the shitty (now old) vim plugin for llama-server. I set it up a few days ago, so I remember that query flag specifically.
You can try llama-server directly with their integrated ui as well, just to make sure there's no funny business in the middle.
Also. Is this a long chat already or a new one? Are you completely sure ST is not doing some sort of context shifting, trimming old messages or anything like that?
>>
From the paper, Aria actually looks quite impressive, especially for video understanding. Of course I'm a coomer, so I'm mainly interested in its ability to accurately caption short porn clips. We finally got what looks like a decent text2video model (PyramidFlow). I can't help but feel like the era of local NSFW text2video models isn't that far away.
>>
I know that Qwen2.5 is decent for RP, but terrible for ERP due to censorship. While waiting on the fine-tunes, I came across this, and was curious if anybody has tried it yet.

https://huggingface.co/gghfez/Magnum-v1-72b-Qwen2.5
>>
>>102769342
>align ST with its primary long term goal of being the “LLM Frontend for Power Users”.
>any environment whether that be a business, a university
Do they not know what a power user is?
>>
>>102771701
How do those mixed vision models get affected by quantization? Worse than text-only?
>>
In llama.cpp/koboldcpp, can you mix a Pascal card with an Intel Arc A770?
I'm worried about getting problems on Linux too. Nvidia drivers are a hassle. Can I just plug this shit in and be done with it?
How are there no dedicated AI cards yet?
>>
>>102771826
>In llama.cpp/koboldcpp, can you mix a Pascal card with an Intel Arc A770?
Yes, the Vulkan backend supports using multiple GPU vendors at the same time.
>I'm worried about getting problems on Linux too. Nvidia drivers are a hassle. Can I just plug this shit in and be done with it?
Nvidia drivers on Linux are plug and play. You should be more concerned about the Intel card.
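For reference, a hedged launcher sketch for that multi-vendor setup: build llama.cpp with the Vulkan backend (cmake -DGGML_VULKAN=ON) and split layers across both cards. The model path and the 2:1 split ratio below are placeholders (roughly a 16 GB A770 next to an 8 GB Pascal card); tune them to your VRAM, and if your build ignores the tensor-split ratio, plain layer splitting should still work.

```python
# Hedged sketch: launch llama-server with the Vulkan backend so layers are
# split across an Nvidia (Pascal) card and an Intel Arc A770 at once.
# Assumes a llama.cpp build with -DGGML_VULKAN=ON and that both cards show
# up as Vulkan devices. Paths and ratios are placeholders, not a recipe.
import subprocess

subprocess.run([
    "./llama-server",
    "-m", "model.Q4_K_M.gguf",     # placeholder model path
    "-ngl", "99",                  # offload as many layers as will fit
    "--split-mode", "layer",       # put whole layers on each GPU
    "--tensor-split", "2,1",       # rough VRAM ratio between the two cards
    "--port", "8080",
], check=True)
```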
>>
>>102771872
>Nvidia drivers on Linux are plug and play.
I can't boot with more recent Liquorix kernels anymore, which seems to be caused by the Nvidia drivers.
I'm a brainlet and come from Windows with colorful kiddy buttons that do the work for me (as it should be). So installing Nvidia drivers and CUDA is not fun already.
I'm just worried that if I put an Intel card in there it fucks everything up. But it's nice that llama.cpp supports different vendors. Thanks for the info.
>>
>>102771826
There might be a really slow runtime that supports everything, which might work, but you're lucky if Intel is even supported.
The A770 is also pretty slow; the pic rel is like a 5 GB model. Imagine showing benchmarks for an AI that only uses a third of its VRAM.
Like 10 tok/s at full VRAM usage is very usable, but it's worthless for a dual-GPU setup, since you could just buy a 5090 and it would give you 3x more token performance (with the same model) because it has 3x more bandwidth, and Nvidia works with pretty much everything (you could probably use it with your Pascal GPU).
>>
>>102771913
forgot pic
>>
>>102771813
In my experience the language weights are affected by quantization just like any LLM. The vision weights (encoder, projector, cross attention if the model has it) need to be bf16, or maybe an int8 quant. But definitely not 4 bit.
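If you load these through transformers, one way to act on that advice is to quantize only the language tower and leave the vision parts in bf16. A hedged sketch with bitsandbytes: the module names ("vision_tower", "multi_modal_projector") are the LLaVA-style names in transformers and will differ for other models, and llm_int8_skip_modules (despite the name) is the knob recent transformers versions use to exclude modules from 4-bit quantization as well; treat all of that as assumptions.

```python
# Hedged sketch: 4-bit quantize only the language weights of a LLaVA-style
# multimodal model while keeping the vision encoder and projector in bf16.
# Module names follow transformers' LLaVA naming and may differ per model.
import torch
from transformers import BitsAndBytesConfig, LlavaForConditionalGeneration

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    # keep the vision weights out of the 4-bit conversion
    llm_int8_skip_modules=["vision_tower", "multi_modal_projector"],
)

model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",
    quantization_config=bnb,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```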
>>
How many tokens is an image?
>>
>>102771922
That's all well and good but unless they actually go and code that shit themselves open source trannies refuse to touch anything but CUDA.
>>
>>102771987
depends how much stuff is in it
>>
File: example.jpg (615 KB, 2541x1904)
>>102766534

I used a 3D-printed mount since no commercial solutions have anything vertical. You can also buy them off Etsy or something.

>pic related,
>>
>>102771604
Yes it is, I checked and it's sending it.
I don't know what was going on, but it's working now; I just had to restart.
>>
>>102771994
Lettuce say hypothetically I possess an image of Micu on a white background, looking at the viewer with a slightly sardonic smile on her face.
How many tokens is that, roughly speaking?
>>
>>102772024
how big is the image? would be faster to open up a space and test it with the actual encoder.
>>
>>102772024
depends on resolution, model, settings, etc
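Rough rule of thumb for ViT-patch encoders: the token count is about (height / patch size) * (width / patch size) after the image is resized to the encoder's input resolution, plus whatever tiling or extra tokens the specific model adds. A toy calculator assuming a CLIP-L/14-style 336 px encoder, not any particular model's exact scheme:

```python
# Toy estimate of image token count for a ViT-style encoder:
# tokens ~= (H / patch) * (W / patch) at the encoder's input resolution.
# 336 px with 14 px patches matches a CLIP-L/14-style setup; real models
# differ (tiling, pooling, extra special tokens).
def image_tokens(resized: int = 336, patch: int = 14) -> int:
    per_side = resized // patch
    return per_side * per_side

print(image_tokens())            # 24 * 24 = 576 tokens, LLaVA-1.5-ish
print(image_tokens(448, 14))     # 32 * 32 = 1024 tokens
```

So for a single Miku-on-white-background image, a few hundred to around a thousand tokens is the usual ballpark, regardless of what is actually in the picture.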
>>
>>102766737
It does. Only dumb fucks like him >>102759705
run LLMs sequentially at snail speed. Any parallel processing requires excessive communication between GPUs.
>>
>>102772087
because i like jpegs for the resolution, the color, everything about jpegs i like
>>
>>102770124
I have no particular tips but I run it at temperature 0.8.
>>
>>102771993
AMDrones, our response?
>>
>>102772862
>>102772862
>>102772862
>>
>>102768713
Wait a minute... that project is licensed under AGPL.
Why the fuck are they pivoting to corpos when they're never going to touch it anyways?


