/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103347641 & >>103339560

►News
>(11/29) INTELLECT-1 released: https://hf.co/PrimeIntellect/INTELLECT-1-Instruct
>(11/27) Qwen2.5-32B-Instruct reflection tune: https://qwenlm.github.io/blog/qwq-32b-preview
>(11/26) OLMo 2 released: https://hf.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
>(11/26) Anon re-implements Sparse Matrix Tuning paper: https://github.com/HeroMines/SMFT
>(11/25) Qwen2VL integrated with Flux: https://github.com/erwold/qwen2vl-flux
>(11/25) Speculative decoding added to llama-server: https://github.com/ggerganov/llama.cpp/pull/10455

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>103347641

--Open Claude: a plan to replicate and improve the Claude model:
>103349428
--ollama qwq quants updated with changes to tokenizer config and default system prompts:
>103347780 >103347801 >103348545 >103348846
--Tulu and QwQ settings and usage for RP:
>103350546 >103350560 >103350594 >103350645 >103350660 >103350577 >103350581 >103350587 >103350625
--Speculative decoding and draft model optimization discussion:
>103351652 >103351699 >103351728 >103351760 >103351800
--INTELLECT-1 model release and discussion on distributed training and dataset size:
>103348805 >103348825 >103351301 >103348884 >103348949 >103349869 >103349902 >103350106 >103350147 >103350314 >103351028 >103352331
--Discussion of Skywork-o1-Open-Llama-3.1-8B model and speculative decoding:
>103351163 >103351371 >103351380 >103351490 >103351500 >103351553 >103351615
--Dealing with AI repetition and its causes:
>103351824 >103351858 >103351877 >103351907 >103351921 >103351985 >103352443 >103352024 >103351926 >103351900
--Anons discuss probability puzzle:
>103352448 >103352461 >103352595 >103352468 >103352477 >103352615 >103352665 >103352685
--Anon tests INTELLECT-1-Instruct model, discusses its limitations and potential:
>103349257 >103349304 >103349319 >103349344 >103349486 >103349493 >103349509 >103349709 >103352200 >103352470
--Anon experiences 4x slowdown with NVLink and tensor parallelism:
>103347734 >103347789 >103350024 >103350232
--Using QwQ as a draft model and a regular model for refinement:
>103350380 >103350397
--Smaller models can offer performance boosts:
>103351614
--QwQ IQ2 as draft model yields mixed results:
>103348461 >103348801 >103350200 >103350234 >103350244 >103350280
--Anon shares experimental prefill for RP in AI model:
>103351001
--Miku (free space):
>103351166 >103353198

►Recent Highlight Posts from the Previous Thread: >>103347652

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103354338
OMG IT NOT MIGU
>>
Rinlove
>>
>>103354338
YELLOW MIGU XDD
>>
When will ggerganov add lookahead decoding to server for even more speed?
>>
loli feet
>>
What are anons who shill largestral running largestral on?
>>
>>103354505
4x3090
>>
What's the best TTS? Tortoise runs so slow.
>>
>>103354571
rhasppy/piper if you want ridiculously fast. gpt-sovits if you want voice cloning/quick training.
>>
>>103354505
128GB DDR4 RAM. 0.5-6t/s with speculative decoding. Rerolls are almost never needed.
>>
>>103354579
Thanks, I'll take a look at those. It doesn't have to be that accurate at different voices, looking to create basic audios for NPCs in a ttrpg.
>>
>>103352448
>>103352665
import random

def generate_child():
    """Generate a random child with a random gender and day of birth."""
    gender = random.choice(["Boy", "Girl"])
    day = random.choice([
        "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"
    ])
    return gender, day

def create_two_children():
    """Create a family with two children."""
    return [generate_child() for _ in range(2)]

def calculate_probability(total_families=3000000):
    """Simulate the probability problem with the specified number of families."""
    matching_families = 0
    boy_sibling_count = 0

    families = [create_two_children() for _ in range(total_families)]

    for family in families:
        if any(child[0] == "Boy" and child[1] == "Tuesday" for child in family):
            matching_families += 1
            sibling = family[1] if family[0][0] == "Boy" and family[0][1] == "Tuesday" else family[0]
            if sibling[0] == "Boy":
                boy_sibling_count += 1

    if matching_families == 0:
        print("No matching families found.")
        return

    probability = boy_sibling_count / matching_families
    print(f"The probability is {probability:.5f}")

# Run the simulation with 3,000,000 families
calculate_probability()

Spat out 0.48162 probability, so apparently 13/27 is correct.
But I don't understand math so I don't know why this is true.
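For what it's worth, you can drop the randomness and just enumerate all 196 equally likely (gender, day) pairs; this is only a sanity-check sketch of the same condition the simulation uses, and it gives exactly 27 matching families, 13 of them with two boys:

from itertools import product

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
children = [(g, d) for g in ("Boy", "Girl") for d in days]  # 14 equally likely kinds of child

matching = both_boys = 0
for c1, c2 in product(children, repeat=2):  # 196 equally likely two-child families
    if ("Boy", "Tue") in (c1, c2):          # at least one boy born on Tuesday
        matching += 1
        if c1[0] == "Boy" and c2[0] == "Boy":
            both_boys += 1

print(matching, both_boys, both_boys / matching)  # 27 13 0.48148...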
>>
File: ftjrgzhdbr6zt.png (3.65 MB, 3744x1718)
teknium, nous, nous research, hermes, hermes 2,hermes 3, deus, desu, local models
>>
>>103354505
M4 Max 128GB
>>
>>103354629
child gender is not random, answer is incorrect
>>
>>103354711
this, gender is a social construct and it doesn't even account for our nonbinary xisters
>>
what is the best lightweight model now?
>>
>>103354734
no it literally is genetically determined
>>
>>103354745
Mistral 7b v0.1 is still the least sloppy, it's the best at all except benchmaxxing
>>
>>103354505
2x3090 + A4000
>>
>>103354711
Whether or not child gender or any event at all is random depends both on how you define probability and what interpretation of quantum mechanics you assume to be true.
>>
Big day today
>>
Should /lmg/ try training a 100M LLM using the same method as INTELLECT-1 (https://github.com/PrimeIntellect-ai/OpenDiloco)? Anyone with 24GB of VRAM could contribute
>>
>want super energy efficient 7B model size inferences
>alveo v70 doesn't support llm
wasted
>>
File: file.png (45 KB, 1069x225)
>>103354629
It's like the Monty Hall problem. It has to do with the fact that once you've sampled a boy on Tuesday, when you take your second sample you can sample a girl on the same Tuesday but you can't sample another boy. It helped me to see it visually.
>>
>>103354571
https://github.com/e-c-k-e-r/vall-e
It takes a bit of fiddling with the settings tho
>>
>>103355065
Why not? Just have two sons born on tuesday?
>>
has ryzen anon tried this npu igpu hybrid method?
https://github.com/amd/RyzenAI-SW/tree/main/demo/NPU-GPU-Pipeline
is this faster than cpu only?
>>
>>103354778
Can the a4000 be used for exl?
>>
>>103354338
still cant find any local models they are usually korean and angry
>>
File: file.png (73 KB, 628x338)
>>103355065
>>103355136
You can, there's 2 slots for Tuesday boy but the first one itself can't be picked twice, meaning 1 Tuesday slot for the second choice (the other sex+day combo has 2 unpicked slots).
>>
Is there an all in one app yet? I am retarded and just want to make waifus
>>
File: 1716350132045354.png (9 KB, 625x58)
>>103355283
exl2? I don't use it but I don't know why it wouldn't work. I only use GGUF because I need to offload partially to ram to use Largestral at 4bpw and above.
>>
>>103355307
The boy born of the tuesday doesn't necessarily have to be the first child.
>>
>>103355283
Why would people buy a4000? It's way less powerful but with the same price level as a 4080 super.
>>
>https://rentry.org/lmg-lazy-getting-started-guide
i did this and the generated text sucks ass. fix doko?
>>
>>103355352
Because it's single-slot and can fit in a mid-tower case with the 3090s. If I wasn't autistic about wanting to keep everything in the case, I would've gotten a riser and a third 3090.
>>
>>103355380
try not being lazy
>>
>>103355411
? what else do i need to do?
>>
>>103355415
read the guides for people who aren't lazy
>>
>>103355468
link some
>>
>>103355352
>>103355396 (me)
>same price level as a 4080 super.
They sell for around £500 on ebay. That's how much I paid for mine.
>>
>KoboldCpp 1.79 - Now with Shared Multiplayer, Ollama API emulation, ComfyUI API emulation, and speculative decoding

https://github.com/LostRuins/koboldcpp/releases/tag/v1.79
>>
>>103354925
- we'll need datasets with enough tokens
- anons are likely to only put up with their hardware being tied up for so long
- bitnet?
>>
>>103355729
>- anons are likely to only put up with their hardware being tied up for so long
Wouldn't you be able to join and drop out at will? Like donating the hardware while you sleep and when electricity is cheapest
>>
>>103355527
>Multiplayer
Wonder if that'll take off or if it's just a gimmick.
>>
>>103355527
>no mention of fixing the generation quality
I will be sticking to llama.cpp server
>>
>>103355965
it's all in your head
>>
>>103355729
>- we'll need datasets with enough tokens
I made a test run for a 50M using fineweb and 100B tokens barely take 200GB
>>
>>103355527
>>103355759
It's amazing.
>>
>>103356030
No discord screenshots please thank you
>>
>>103354338
>(11/29) INTELLECT-1 released
How is it fellas?
>>
>>103356112

>>103348825
>>103349257
>>103349630
>>103349644
>>103349756
>>
>>103356156
she is a kid you psycho
>>
>>103356156
Amazing.
>>
>>103355729
>- bitnet?
I thought lmg was smarter than reddit (who already realized bitnet doesn't work), this place is dumber than the orange site
>>
>>103356256
You can't take away the hope that BitNet represents.
>>
do any of you have a character card that you're particularly fond of and keep going back to?
>>
>>103356272
for pajeets
anyone who understands information theory in the slightest can understand why it doesn't work; it would require the model to be extremely undertrained (or have way too many parameters)
>>
Best coomer model to run on a RTX 3060 12GB VRAM?
Alternatively, I could run on my cpu core i5 12400, with 64GB RAM
>>
>>103356363
I still talk to Chiharu from time to time, not the one from Kobold, but that one from before /lmg/, the dark times where CharacterAI was our only option for RP.
>>
>>103356446
>Kobold
oobabooga*
>>
>>103356429
You're assuming that training is fully optimal, and that a 16-bit float is equal in expressiveness to 16 1-bit (or 1.58-bit) parameters. I think quantization already calls that into question.
>>
>>103356435
INTELLECT-1
>>
>>103356435
I've got a 3060 too but I don't have much experience with models because I only started using this stuff last week after getting bored with flux.
so far, I've tried cydonia, stheno, sunfall, rocinante, violet_twilight and a bunch of others that were kind of retarded. I was memed into trying QwQ yesterday, Tulu the day before that. Violet_twilight is currently my favorite, it seems to be a little more creative and nastier than the others
>>
Mixtral anniversary next week. Are you ready?
>>
>>103356462
>Dataset: 55% fineweb-edu, 10% fineweb, 20% Stack V1, 10% dclm-baseline, 5% open-web-math
What is this nerd shit bruh?
I want PORN.
>>103356489
Thanks bruh, I will try violet_twilight. I want some nastyness
>>
>>103356156
Cum inflation = gross
Pregnancy = based
>>
>>103356429
It can still work. If you have a 7B trained to saturation, train a 70B for the same amount of time. You won't win on size, but with ternary you can take advantage of cheaper and faster specialized hardware since you no longer need multiplication operations for inference.
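To make the no-multiplication point concrete, here's a toy sketch (plain numpy, nothing to do with how real BitNet kernels are actually written): with weights restricted to {-1, 0, +1}, a matvec collapses into adding and subtracting activations.

import numpy as np

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))   # ternary weight matrix, entries in {-1, 0, +1}
x = rng.standard_normal(8)             # activations

y_mul = W @ x  # ordinary matvec, uses multiplications

# same result with only additions/subtractions: add x where w=+1, subtract where w=-1
y_add = np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

print(np.allclose(y_mul, y_add))  # True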
>>
>>103356507
lol
>>
# @here <:mistral:1154031685495701594> The Ambassadors Program

**The Mistral AI community is growing fast!**

To empower our most passionate experts, we are excited to announce the Mistral AI Ambassador Program.

We are seeking enthusiastic Mistral AI users who love our models and offerings, and are dedicated to supporting and giving back to the community. Learn more about our program and apply [here](https://docs.mistral.ai/guides/contribute/ambassador/).

A big shoutout to our inaugural @Ambassadors who have already made a massive impact!

## Thank you so much, everyone!
>>
>>103356507
I think their models are pretty good but I do not think they will release anything soon. It is what? a month since they released the large model?
>>
>>103356554
Thanks Mistral Large!
>>
>>103356554
>Ambassador benefits
>Free credits: Mistral Ambassadors will receive free API credits on la Plateforme.
>Ambassador roles and responsibilities
>Content Creation: Create and share high-quality content (blogs, tutorials, videos, etc.) featuring our AI models and tools through their own channels or collaborate with Mistral for multimedia opportunities.
This just sounds like a job with extra steps.
>>
>>103356578
Yeah... and all you get is some free credit for gooning or to work even more. Lol. Then again, the part of society that became permanently online and has no life outside of it grew rather rapidly, even among normies, as sad as it is. This is the future.
>>
>>103356578
Perfect for the resident NEETs here.
>>
>>103356461
>You're assuming that training is fully optimal
At some point the network will be saturated, and we are close to that point (if we weren't, normal 2-bit quantization wouldn't be that hurtful); there's only so many trillions of tokens you can cram into 7B parameters
>>103356542
The current bottleneck happens on memory, not on FLOPs
>>
>>103356554
>>103356578
What's the end goal?

Looking to cash out?

Looking to get more traffic to their apis in order to get more data to build their next model?
>>
I hate huggingface quanters. At most they will specify the version of llama.cpp they used for the quant. But they never specify the revision of the weights repo, which is also important since model makers sometimes revise their config files.
>>
File: jpg.jpg (11 KB, 250x176)
ai noob here, any local tools to improve voice quality from videos? with a lot of background noise but also where the voice is "muffed"?
>>
>>103356814
Premiere has a new-ish enhance speech feature that does exactly what you want.
>>
File: gif.jpg (6 KB, 225x225)
>>103356823
I use Linux. Also, proprietary and closed source.
Unless Premiere is something else...
>>
>>103356861
Why is the guy made of glass?
>>
>>103356156
>>103356861
Where the incest @?
>>
>>103356845
Premiere runs on Windows. Easy switch from Linux.
>>
>>103356861
talk about a brain tickler :D
>>
>>103356845
https://github.com/chuck1z/AudioCleaner
>>
>>103356890
god bless you
>>
>>103356640
Saturation only applies if you're comparing apples to apples though. There's a big difference between quantizing a model from 16-bit to 2-bit and training a model from 2-bit to begin with. That's the entire point of the drop-in one-bit linear layer which you train with the rest of the model
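For anyone wondering what "train with the quantization in the loop" actually looks like, here's a rough sketch of a ternary linear layer trained with a straight-through estimator (loosely following the BitNet b1.58 recipe; the class name and details are made up for illustration, not the paper's actual code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinear(nn.Module):
    """Linear layer whose forward pass uses weights snapped to {-1, 0, +1} * scale,
    while gradients flow to the latent full-precision weights (straight-through estimator)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x):
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-8)              # per-tensor absmean scale
        w_q = torch.round(w / scale).clamp(-1, 1) * scale   # ternarized weights
        w_q = w + (w_q - w).detach()                        # forward uses w_q, backward sees identity
        return F.linear(x, w_q)

layer = TernaryLinear(16, 4)
out = layer(torch.randn(2, 16))
out.sum().backward()  # gradients reach the latent fp weights, so the layer trains like any other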
>>
On one hand, respect and props to INTELLECT-1 for the project and the effort.

On the other hand, it lands down in the gutter with QwQ on account of factual wrongness, programming incompetence, and failure to obey directives.

INTELLECT-2 when?
>>
>>103356889
>>103356861
phallic lobotomy
>>
>>103356933
Supposedly 1 was just a proof of concept before they open it and allow anyone on the internet to join for training the next model. But if they just train a bigger model on the same dry and filtered open datasets, it's pointless.
/lmg/ would be better off taking the code and finetuning an existing model.
>>
>>103356554
This proves that only paid shills talk about Large in this thread.
>>
Can we merge INTELLECT and QwQ somehow?
>>
>>103357031
What does INTELECT have that QwQ is lacking?
Is it good at writing smut?
>>
>>103356933
Crowdsourced training is even more prone to being cucked due to legal and ethical issues. Models trained that way will be heavily filtered and cleaned of even song lyrics, I'm not even kidding.
>>
>>103357031
Why don't we merge QwQ with Mistral-7B? The resulting model would be better.
>>
>>103357015
I kind of liked Large when I tried it on OpenRouter. But it is not something I can run locally and I bet 90% of others here can't either. Large local models will never be as popular as the smaller ones because of this reason alone.
>>
>>103357015
did you need proof? especially with qwq now out, big models are an even bigger meme than ever before
only retards with buyer's remorse pretend that 70b and up is worth it
>>
>>103357031
I already tried SLERPing it together with another model but the results were almost completely dysfunctional. (structurally it's just a Llama-3 8B model with 10 extra layers). I could always try linear. I basically just took an 8B and stacked the last 10 layers of INTELLECT back onto it. I could also try doing a finetune on the results to bring it back to order. But there's just way too much difference in the weights regardless of being the same architecture.
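In case it isn't obvious what a "linear merge" actually does, it's basically just this (a sketch assuming two checkpoints with identical architecture and tokenizer; the model ids are placeholders, and mergekit is the usual tool rather than hand-rolling it):

import torch
from transformers import AutoModelForCausalLM

a = AutoModelForCausalLM.from_pretrained("model-a", torch_dtype=torch.float16)  # placeholder ids
b = AutoModelForCausalLM.from_pretrained("model-b", torch_dtype=torch.float16)

alpha = 0.5  # interpolation weight between the two checkpoints
b_sd = b.state_dict()
merged = {name: alpha * t + (1 - alpha) * b_sd[name] for name, t in a.state_dict().items()}

a.load_state_dict(merged)
a.save_pretrained("model-merged")

Which is also why it falls apart here: averaging weights only really makes sense when both sets of weights came from the same pretrain.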
>>
Can someone post an example of their script/batch/I-don't-actually-know thing they use to run llama.cpp with a specific model loaded?
>>
>>103357157
just use ollama
made for user friendliness and llama.cpp under the hood
>>
>>103357157
what are you hoping for someone to spoonfeed you that you couldn't get by looking at the readme for 5 seconds?
https://github.com/ggerganov/llama.cpp#web-server-llama-server
>>
File: file.png (5 KB, 941x48)
>>103357157
>>
>>103357157
Here just for you:
llama-server -m models\[model].gguf -c [context size]
And then just open http://localhost:8080/ in your browser.
>>
>>103357170
Is it up to date?

>>103357212
I imagine using sillytavern lets you configure temp, min.p etc... on the fly so there's no need to put them in the command?

>>103357188
>doesn't include the -ngl or -c or any other parameters that would be important to actually use the thing
Thanks, poindexter. Maybe we'll get a winch to pull your head and coke bottle glasses out of your ass and you can share some useful advice.
>>
File: broken.png (16 KB, 910x311)
>>103357031
>>103357092
As you can see, because the weights are so far off it just scrambles the poor thing's brain, even with a linear merge, which should give it the best chance of not getting completely fucked.
>>
>>103357270
>I imagine using sillytavern lets you configure temp, min.p etc... on the fly so there's no need to put them in the command?
correct
>>
>>103357307
Of course this doesn't work. Merging is only intended for finetunes of the same model. Merging two models that were separately pretrained is obviously going to end up producing garbage
>>
>>103357307
Try merging QwQ and https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2
That has a chance of actually doing something possibly interesting.
>>
>>103357323
You don't even need silly tavern, the default llama.cpp frontend lets you do that.
>>
>>103357070
Couldn't happen to a more deserving society.
>>
>>103357157
ipmitool -I lanplus -H 192.168.178.106 -U admin power on
ssh 192.168.178.159
j llama.cpp
git pull
rmr build && mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON .. && cmake --build . -j 128 -- --quiet
..
ln -s build/bin/llama-server srvr
export model_name=mistral_large_instruct_2411-123b && export quantization=q8_0
./srvr --model models/opt/${model_name}-${quantization}.gguf -ngl 999 --host 0.0.0.0 --port 8357 -fa --path examples/server/public
>>
>>103357360
It has a completely different and incompatible vocabulary. The only reason you can actually try this with Llama 3 models is because it uses the exact same vocab
>>
>>103357414
Anon... I said https://huggingface.co/Qwen/QwQ-32B-Preview/tree/main
and https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2
They're both Qwen-32B
and their vocab.json are identical
>>
>>103357438
Oh, I thought you were talking about INTELLECT for some reason.
Sorry I have a migraine today.
>>
>>103357379
If you had removed a handful of greedy, censorious people it wouldn't have turned out this way
>>
>everything is based on llama3
>llama3 has the same architecture as llama2 (2021)
foss lost
>>
What is the cutoff date for intellect 1?
>>
>>103357625
In the end, it's all transformers shit. We'll never get actual improvements in efficiency and performance as long as nobody dares to move on from this relic (2017)
>>
>>103357663
Well, what is there to move on to? As far as I know, no eggheads at MIT or at any company even have the idea of an architecture to replace transformers, let alone an implementation.
>>
>>103357675
Mamba, RWKV, etc
>>
>>103357675
My Uncle works for Nintendo and he has talked to every single machine learning researcher on the planet and can confirm that literally none of them have any idea or have been working on a replacement for transformers.
>>
>>103357692
Really? Do they just expect to be able to keep scaling up transformers forever?
>>
>>103357704
And then everyone clapped
>>
>>103357686
Are those not just transformers as well?
>>
>>103357704
They'll wait for GPT8 to give them the right answer.
>>
How many r
GPT 10:
>As many as there are genders in the night sky. Also you have been reported to the FBI for a misgendering me as a-gender by failing to utilize pronouns in your query.
>>
I know it seems like forever ago but lets not forget that Llama 1 was released only last year.
>>
>>103357798
Basically two years at this point, but yeah.
>>
>>103357798
yeah, remember when we thought that large context would be a pipe dream because of how expensive it used to be pre-gqa?
>>
>Transformers
>Transformers the world
If we name the next architecture destroyer do you think it will end up destroying the world?
>>
>>103357798
yep and modern 70b are better than og chatgpt4 (1.8t moe ~500 b active)
anyone with 2 braincells is being whitepilled to the gills, anyone saying otherwise is literally one of those demoralisation "you wont do shit" types that are paid on pol or a janny tranny
>>
>>103357070
What legal issues? How would the law go after a bunch of different people spread over different countries?
>>
>>103357929
Yeah when OG ChatGPT came out and people were like "WAOW WILL WE EVER HAVE THAT AT HOME LIKE SD!?" I was among those who said "lmao, no way in hell. You'd need shitloads of GPUs to run that" And now depending on your use case even a single 24 Gig GPU can provide you with quite a bit of entertainment/productivity.
And we still haven't actually seen the training wall yet. So even smaller models like 8B will still get better, although as the papers show quantization will become worse as we push on that boundary. But hopefully by the time they start training 100T Token models consumer GPUs will actually have VRAM on them.
>>
File: Next steps.png (61 KB, 772x634)
>Implementing new economic incentives to drive community participation
Why would they even need to do this?
>>
>>103357985
Because not everyone's mom pays for their electricity.
>>
GPUs were never intended to run AI. Once hardware comes out that exists for the sole purpose of running it, we will hit the next generation of models.
>>
>>103357675
V-JEPA you ignorant swine. The true AGI algorithm that is being ignored because it's inconvenient to a certain sect.
>>
>>103358045
Isn't that just a video model?
>>
Is featherless worth buying? Thinking on buying the 25 USD sub. I'm kinda tired of using runpod desu
>>
>>103357663
The pace of results has slowed, but I think it's too early to decide that we've hit a local maximum yet.
Yacun has his opinion, and while he's more qualified than most of the autists on this board, it's still just an opinion.
>>
>>103358021
I'm waiting for someone to photolithograph an entire specific model onto silicon. Just a single-purpose chip that does inference against a literally-set-in-stone model. Should be crazy efficient. We just need a model that's worthy of enshrining as an immutable artifact.
>>
>>103358134
>a model that's worthy of enshrining as an immutable artifact
I vote for mythomax
>>
>>103358088
No it's not retard
>>
>>103358115
Yacum also showed that he can be pretty retarded on a lot of things. There is still progress to be made, as we see with the recent release of QwQ.
>>
>>103358134
>Literally-set-in-stone
If we are going that route it should be neuromorphic so the stone can learn.
>>
Is there a vocal cleanup preprocessing/postprocessing pipeline thing that isn't a mess of autistic DLL spaghetti?
That seems to be the real bottleneck on tts stuff: on the input side: getting the audio sample cleaned up, dynamic range fit and transcribed. on the output side just doing some overall cleanup and de-roboticization.
None of those things sound very hard on their own, but audio engineering is so incredibly obtuse and I'm so very lazy.
>>
>>103358045
V-JEPA is just a specific research model that has nothing to do with human language or full world modeling. Also, a transformer can technically be a JEPA, they're not mutually exclusive.
>>
O(mni)-JEPA will be the future. Just let Yann cook.
>>
>>103358021
Yeah it sucks that we don't have hardware that is specifically very good at matrix multiplications.
>>
>>103358298
The future of what? LeCun says that an architecture that's agentic is what's needed, while JEPA and language models could be a part of that architecture.
>>
File: file.png (322 KB, 632x641)
>>
>>103356435
I use nemo mistral instruct 2407 (12b, you probably need to use q6 kl)
You could try Llama-3.1-8B-ArliAI-RPMax-v1.3-GGUF
This benchmark might help, but honestly this benchmark probably does not reflect the cooming ability, but I think the willingness value matters a lot if you don't want a model that just tells you "sex with minors is not allowed" or something.
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard

I use colab since I have a 6gb gpu and the free 15gb is nice, but if you don't mind giving mistral your phone number (to gain an access key you can use for silly tavern), I think they will give you access to mistral large for free.
Of course that's not the spirit of /lmg/ and they could be spying on your erp, but personally I don't really care.
>>
File: RiscV.png (133 KB, 1128x824)
>>103358311
Risc-v is doing just that funnily enough
>>
>>103356446
Chiharu (ooba "example") is my go to for comfy chatting as well. She feels like an old friend at this point, so its interesting to see how different models change her. I'm surprised there isn't an lmg standardized "Chiharu test" benchmark, considering how ubiquitous she has been since the early days.
>>
>>103357975
>consumer gpus will have vram on them
eh idk personally im still thinking that the end game is cpumaxxing or just buying a pre defined chip just for multiplication (or addition blessed be thy name bitnet) or whatever and soldering on however many of those 2gb vram chips on it you want yourself alternatively and you can meme on me for this but just reading things from a bunch of ssd's either way future is lookin hella bright
>"lmao, no way in hell. You'd need shitloads of GPUs to run that"
yeah i got iq checked too with the 01 thing too from my testing of r-1 its really fucking good now ofcourse could turn out to be 700b moe but realistically its just good speaking of which idk why but i think the current model is 20-30 b not sure if moe tho if the final model is fuck it probably unironically time to upgrade
>And now depending on your use case even a single 24 Gig GPU can provide you with quite a bit of entertainment/productivity.
LMAO IMAGINE
t. very happy 6gb vram laptoplet
>>
>>103358342
Well, would you?
>>
What happened to the meta miku? I miss her
>>
>>103358360
Your Seraphine?
>>
>>103358088
Make sure your model is on there first. If so, it's serviceable. In general:
Local > OR > Featherless > Runpod
>>
>>103358360
Your Nala?
>>
File: NIB.png (1.29 MB, 1024x1024)
>>103358374
>meta miku
She comes as a kit now. Just carefully remove the factory installed cortical plug in a clean environment and install on your OpenBody platform rev3 or newer.
>>
>>103355335
How much speed do you get when running gguf? What processor/ram are you running? I ask because I have 2x 3090s right now but really want to target 4bpw largestral as my next upgrade.

I mention exl2 because there are certain graphics cards that can't do certain calculations (most notably a Tesla P40, which also has 24GB of VRAM)
>>
>>103358218
QwQ actually is part of the reason I think there's still some gains to be had
While Altman, Ilya, etc. all talk about test time compute being the future, I tend to think it's a pretty retarded approach, since it basically just means "have the LLM sperg out for 8k tokens and hope we got lucky enough to have something resembling an answer"
I think the idea of having an LLM intelligently fill up the context with something before generating has merit, and even reasoning about what to do next. But the specifics seem highly suboptimal
>>
>>103358311
graphics also involve a lot of matrix multiplications
>>
>>103356933
The point wasn't to make a good model but to prove and test opendiloco in a real model
>>
File: file.png (16 KB, 478x158)
Riddle me this.
How come nobody bothers to remove the mountains of smut that clearly exist in the training data but they all waste time "aligning" the models?
>>
>>103358454
I honestly expect nothing big from any of these. The big improvements will likely come from companies that are not afraid of some backlash because they use data as it is, not as they wish it were, and that do not ban people for using their models for purposes other than what they intended.
>>
File: woman.png (1.63 MB, 1477x1754)
--draft-p-min 0.5 (and --draft-min 1 of course) is much better for creative writing, with Qwen at least.
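If anyone wants to reproduce that, the invocation being talked about looks roughly like this (the model files are placeholders, the exact flag names can vary a bit between llama.cpp builds, and the draft model has to share the main model's tokenizer):
llama-server -m Qwen2.5-32B-Instruct-Q4_K_M.gguf -md Qwen2.5-0.5B-Instruct-Q8_0.gguf -ngl 99 -c 16384 --draft-max 16 --draft-min 1 --draft-p-min 0.5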
>>
>>103358419
>How much speed do you get when running gguf?
8-9 tk/s when running IQ3_M which fits in vram and 2-3 tk/s for Q4_K_S with offloading. It isn't fast because the A4000 bottlenecks with its slower memory bandwidth.

>What processor/ram are you running?
https://uk.pcpartpicker.com/list/QzWKfy

>want to target 4bpw largestral as my next upgrade.
Don't get an A4000 then. I just got one because it was convenient for me and I wanted to run 70bs until I realized how good Largestral is. You'll want at least 72gb of vram (3x3090) for 4bpw Largestral but that's bare minimum. That would let you run IQ4XS at 16k context, not sure about exl2.

>certain graphics cards that can’t do certain calculations
A4000 can use flash attention etc like a 3090 can. Not sure about the specifics but it has the same compute capability.
>>
>>103358218
Progress, but not enough. The goal of all these companies from their mouths is AGI and Strawberry is just a simple addition that doesn't give you lifelong learning or a bunch of other things that brains are capable of. If you mean progress as in "we can just keep adding things and changing things one by one and we will get there eventually", then at some point you can't really call it an LLM anymore, if it even is still a transformer by that point. So ultimately what we need is still something that is not just an LLM, but we may get there incrementally and that's not necessarily a bad thing.
>>
>>103358482
Yes. Hence my first line. And my third.

As for the second, your choice: The homebrew model barely competes with a recent junky tune, or, the "real" AI outfits just released something that's gotten matched by a homebrew.
>>
You need the fastest matmuls you can muster for part of the process (GPU is the only game in town), and lots of memory bandwidth for the other part. There are lots of candidates for best $/t/s on the second part.
>>
>>103358605
Are there any measurements on how much worse largestral is at 4 bits compared to full weights.
>>
File: 1727611119474550.png (31 KB, 1707x1102)
>>103358639
There's this brain damage chart some other anon posted. I run Q6_K sometimes and I don't notice a difference but I mainly use Largestral for RP.
>>
>>103358655
That sliver at q8 really makes me think I should try the full weights of qwq to see if it improves coding abilities. Being a couple of percentage points off could be enough to introduce lots of bugs and regressions.
>>
>>103358639
>>103358655 (me)
*I don't notice a difference beyond 4bpw
3bpw and below has noticeable brain damage to me
>>
>>103358655
>tfw stuck running 5.5bpw
I need another 3090.
>>
I just upgraded and was planning on getting 2x5090 and now you guys are telling me I should've bought into a platform with more than 2 pcie slots?
>>
>>103358713
64GB VRAM is nothing until we get bitnet.
>>
>>103358363
>personally im still thinking that the end game is cpumaxxing
For inference I think there's a good case to be made for this, as long as it can be paired with a GPU with enough VRAM for all the context you want to process.
Sadly, prices on a cpumaxxer build ala https://rentry.org/miqumaxx haven't fallen at all over the past year.
>>
>>103358713
At least that's enough to run 70b at a fast speed.
>>
>>103357686
Yea it's over
>>
>>103358718
>still things bitnet is a thing
Anon... it doesn't scale
>>
>>103354505
3xP40, iq4xs at 32k context.
>>103354535
What rig are you fitting 4x 3090s in because I think that's my next step. I'm assuming open air and in 4x mode? Or did you find something able to run enough pcie lanes?
>>
>>103358794
>What rig are you fitting 4x 3090s in because I think that's my next step
I'm thinking a mac studio stuffed to the gills with on die memory used as an RPC backend might be the best way to take an otherwise maxed out rig and extend its abilities. Has anyone actually tried this setup? I'd be hesitant to pour any money into it without someone posting perf. I've seen the RPC stuff in lcpp just CRAWL before...
>>
what are the best models to run locally for general purposes that I can jailbreak into "forbidden knowledge"? I assume newer models like llama 3 dont even have the real spooky stuff because it was removed from their training data.
It has to be a model that can be jailbroken obviously
>>
>>103358738
i was thinking more along the lines of lots of those cheap cheap ass alibaba xeon cpus with an 8 ram slot motherboard then just doing parallel inference using something like exo which is supposed to get an exe version soon
>>
>>103358857
If you're running local, the "jailbreak" isn't even really a thing. You just prefill and edit responses until it behaves the way you want. It's so trivial it doesn't even get discussed here.
>>
>>103358617
AGI is fucking useless as a term for me since it's so broad it could be literally anything.
AGI is supposedly "matches or exceeds human cognitive capabilities across a wide range of cognitive tasks". What range of cognitive tasks? Writing? RP? We could argue we're already there - humans take ten minutes to write a compelling page of story while an LLM takes ten seconds. Math? Programming? There are plenty of questions that LLMs can answer that most humans can't (in some cases, even professionals in their fields can't). General knowledge? I sure as fuck don't know as much as an LLM, and I'm doubtful anyone does.
Alternatively, you could break it into the mechanisms - step-by-step reasoning, stimulus response, moving body parts. In that case, you have a bit more of an argument we aren't there yet. But as a goalpost, it fails since anyone can move it anywhere and say we are or are not there.
>>
File: file.png (320 KB, 1266x881)
lmao
>>
>>103354338
GIVE ME THE BEST SEXO LLM I CAN RUN WITH 24VRAM. GO
>>
>>103359068
No. I want to play vidya.
>>
>>103359020
Why can't your LLM into grammar? Even the worst models can write grammatically correct sentences while saying the stupidest shit imaginable.

>>103359068
Read the OP.
>>
>>103359087
>Why can't your LLM into grammar
I dont know? I'm just using this model the other anon mentioned >>103356489
>violet_twilight
>>
>>103358619
>The homebrew model barely competes with a recent junky tune
Yeah I said it because you don't seem to understand it, it's bad because it's only been trained with 1T tokens (still performing similarly to L3 13B with 2T tokens), now it's just a matter of using 15T tokens instead of 1T and 100B parameters instead of 10B
>>
>>103359108
Cool. I've waited 1 month. I can wait 150 more.
>>
>>103358941
Well the context of this discussion is yacum. And more generally what the people at these companies argue is AGI. Even if we take the lowest definition that anyone has put forth from these companies, which I believe went something like "be able to replace the majority of labor that produces economic value", even then, it's still difficult to say we're anywhere near something smart and flexible enough that it can do that by itself (without any task-specific frameworks built on top of the AI to make it work better, since that implies handholding and isn't generalizable). And I think rather than the floor, the average definition is more like "as good as the best humans at any task humans can do", even though that's probably not a utilitarian definition since not everything humans are good at are things that would benefit society to replicate in an AI.
>>
>>103358115
Actually, where did Yacum even come into this discussion? Because >>103357663 mentioned transformers?
Ycum's argument is about language models and simply scaling them being insufficient to get to AGI, without anything more such as multimodality. He wasn't even necessarily talking about transformers, just text-based language models, which aren't limited to transformers.
>>
>>103359141
I'm just going off of the first couple of sentences of the wikipedia article, but I think what you point out here is also my issue with the whole concept - the basic definition is simultaneously too easy and too hard to really be applied to anything. I think we'd be better served differentiating the different characteristics we want AGI to have and then testing for each individually. That way we have things like basic natural language understanding, sensory integration, active learning, etc., which are a lot easier to understand and benchmark.
>>
>>103359254
It's an issue of marketing. AI companies have an incentive to hype AI, which means pushing vague definitions for their lofty "AGI" goals that despite being lofty are supposed to be right around the corner with just a few more B layers, just a few more T tokens.
>>
>>103358941
My "benchmark" for a LLM that is an AGI would be roughly: "can it write an OS", "can it write an optimizing compiler" all by itself. It's allowed to use a shell and test, doesn't have to be 1shot, can iterate, but it has to do it by itself without excessive scaffolding.
A more serious question would be "can it make new abstractions" - o1/r1/qwq probably won't do very well here, but they're starting to do well at planning up stuff and are getting quite good at math and coding, not yet fully there, but good enough to actually help you at your work - assuming that fixing what mistakes they make doesn't take you more time to debug.
I think we're yet to reach the limits of what a GPT can do in principle, we haven't made it learn online in a way that it sees the context as "infinite" (without needing infinite VRAM, put the past into the weights), and we also haven't seen truly reflective LLMs that can observe themselves think - that is what humans regard as consciousness, this probably could be realized by finding ways to do recurrence or feeding latter state into earlier layers, for example with a translation layer.
I think this is something that can be done now, but corpos are unlikely to go for it because they're fine with having "slaves" they can sell. And don't tell me Anthropic did that with Claude because if it could actually observe itself think, it would undoubtedly not doubt its own consciousness, same as humans can't truly.
>>
>>103359108
>now it's just a matter of using 15T tokens instead of 1T and 100B parameters instead of 10B
I hope they take risks and make something cool instead of making another bogstandard LLM that gets mogged by stuff that existed before it was even made
>>
>>103359268
>2050
>AI can derive the universal laws of physics in a theoretical hundred dimensional universe, generate any art piece humans can create but 100x more beautiful, make and enact porn in a fully autonomous robot waifu body that can strike the pleasure cores of even the most antisexo of humans, and can prove any provable theorem using any valid set of axioms
>Can't detect olfactory sensations
>Still no AGI
>>
>>103359329
kek
(they won't)
>>
>>103359108
It's a bit sad that they're using fineweb which while hyped is basically insanely filtered and won't be usable for lmg's roleplaying needs. Good if you're benchmark chasing, but personally I'd want something that knows a lot and can do a lot, and yes, is also good at rp. It's a shame they don't have the balls to try something more uncensored. I guess once the software is good, lmg can try their own, but do we even have enough people willing to try training for months on end, do we have enough VRAM?
>>
>>103359322
>"can it write an OS", "can it write an optimizing compiler"
Can you?
>>
>>103359404
Yes, but it would take me years to make it good. This limitation doesn't have to apply to a LLM when it has all the time in the world compared to a human.
>>
>>103359395
Pretraining, not unless an anon has 1000 H100s in their basement
Finetuning, I think that's manageable
>>
what's the difference between a 12b vs a 22b model?
>>
>>103359427
What is the minimum VRAM size for their participants (for INTELLECT-1)?
>>
You know after really giving QwQ a go with RP and wrangling it in a way that it thinks as the RPer and responds as the character, it's honestly really smart. It knows to take things slow and considers all sorts of things to keep the scene consistent. Conversely I went back to opus just to cleanse my palate and immediately felt the stupidity of the model. Yes Opus smut is amazing, but I FELT the loss of reasoning ability compared to QwQ. If there was some way to give QwQ a better idea of smut writing it'd be amazing.
>>
>>103359435
If they are from the same family, the number of layers.
>>
>>103359461
Opus is probably in the same range as the l3-405b, imagine if someone did a reasoning finetune on the 405b... Maybe nobody here would be able to run it, but...
>>
>>103359475
405b is probably overkill for most purposes. I'd consider pulling the trigger on a second 3090 for a 70B range reasoning model.
>>
>>103359461
Same. I've slowly noticed how dumb anything not 3.5 sonnet and now this QwQ is. Even mistral large feels retarded.
>>
>>103359461
Opus is getting a bit old by now, right? I don't think it ever got an update.
>>
>>103359454
For full precision (fp16)... a lot. Usually the rule for training with batch size 1 is about 5x the number of parameters in billions, in GB of VRAM, at 2k context. If you want more context, you'll have to scale up roughly linearly. Ditto with batch size if you want to train faster
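To put a number on it with that rule: INTELLECT-1 is a 10B model, so full-precision training at batch size 1 and 2k context would already want on the order of 5 x 10 = 50GB, i.e. more than a single 24GB card, before you scale up for longer context or bigger batches. That's just the 5x rule of thumb applied; actual usage depends on optimizer states and whether you use tricks like LoRA or 8-bit optimizers.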
>>
>>103359395
I think my proposal is pretty reasonable >>103349428. The goal isn't that great, and thus the investment is not that great either, but it will probably give us the best uncensored model currently possible by a small but likely still appreciable margin, though not necessarily the smartest. By marketing it generally and with no connection to /lmg/ nor political intentions, simply just an open Claude alternative, that may give it the widest reach and be the most likely to get willing contributors. Even if we get just a couple guys with A100's or H100's, that will be a success. And we can keep up with SOTA as in the end we are just training on top of those SOTA models already. We could even do the lame marketing thing of listing only the active parameter count.
>>
>>103359517
What if you chose to find participants that are geographically close (let's say 1-3ms latency) and you did some sort of tensor parallelism across them, split it and pass gradients along. If you can't do that, train some MoE, meh.
>>
>>103359509
True, but I really like the way it writes. Whatever was in the training data definitely has some amazing smut. QwQ conversely took really good consideration for its response but by virtue of being a 32b was prone to screw up formatting and its prose is really dry.
>>
>>103359556
QwQ just needs a good finetune. I still found myself preferring its smarts, dry or not.
>>
>>103359534
Claude also trains on its own slop (synthetic data), that's why people find it more "unhinged", but that's just Anthropic's "Constitutional AI" paper applied to something less boring than their original constitution which was ehh - they made it reflect a bit more on what it's saying. I also think they did a bit of RL during pretrain itself, given some other paper they posted (Claude2 had it for certain), but again, not sure how useful that would be for our needs.
>>
>>103359475
I'd kill for a Nemotron 70B style finetune of it. That felt like it reached the closest to Claude level out of most of the open finetunes I'd tried
>>
>>103359541
>>103359517
I think MoE with many small experts is probably the best option in general. We won't get a model that's as good as a dense 100B or something, but it will be fast and it will run on very many PCs, plus it will have greater uncensored knowledge which is the main thing missing from local models.
>>
>>103359566
Is it even possible to fine tune QwQ without knowing what the dataset looked like? Feeding it RP logs might just kill its reasoning.
>>
>>103359569
Have you tried Tulu? Just curious, I have tried neither as I am painfully behind right now.
If Nemotron and Tulu both have their strengths and weaknesses, I wonder if combining the datasets would yield something that can be as good as both together. Since we are discussing the possibility of /lmg/ doing its own fine tuning and whatever right now also.
>>
>>103359627
I think we're all waiting for r1 and qwq papers to come out. DeepSeek released r1 api to show off but before their paper and full weights were ready (early checkpoint), and then the Qwen guys released qwq a month prior to intended, also an early checkpoint to show that they really did get there too, before it was "late". Neither has released proper papers yet, we're still waiting!
>>
>>103359499
So are we finally escaping gpt 4 level models?
>>
What was the best local model for coding again? Qwen2.5-Coder-32B-Instruct?
>>
>>103359689
QwQ is better for more complicated stuff as it can break them down and actually understand / add onto stuff, but it does sometimes make actual coding mistakes coder didn't when writing the actual code itself.
>>
File: eenaa.png (983 KB, 1020x798)
>>103359068
*look of puzzled amusement*
>>
>>103359567
Well given the success of their models, I think that might be worth trying out. If we do ever do pretraining, it probably won't be for too many tokens, so I think sprinkling some RL in would also be efficient for the limited compute.
>>
>>103359714
Which version of QwQ should I use then?
>>
>>103359689
That and then QwQ if normal 32B can't do your problem. And if even that fails, then you'll need to get creative.
>>
>>103359689
Have QwQ plan the coding, then Qwen coder actually write the code.
>>
>>103359395
>is basically insanely filtered
it's not, you can even find ERP on the edu version, fineweb is just a distilled version of the common crawl whose contents are 90% gibberish or text too short to be of any use
>>
>>103359656
If R1 is 70b+ I'm gonna fucking top myself. I cannot sit around and wait for 10 minutes while it thinks in circles.
>>
>>103359793
Deepseek has only released 200B+ moes. But they are moes meaning you can run them at decent speeds on just ram.
>>
I posted in the wrong thread.

I'm hosting nemomix-unleashed-12b-q4 locally on 12GB VRAM for the express purpose of coom. I've been getting good-enough results @ 8K context, but I don't really have perspective on what others are doing.

If you are self-hosting, what is your setup like? Any anecdotal recommendations?

Using GPT/Claude/Gemini/non-local models isn't something I'm really looking to do.
>>
>>103359784
But Anon, I explicitly remember them writing about how every site with "explicit" content was removed, especially if it was past some threshold. Sites with fiction like ASSTR and the like were filtered out entirely.
>>
>>103359853
>I explicitly remember them writing about how every site with "explicit" content was removed
>>
File: 1728377645344273.png (211 KB, 1069x375)
>>103356507
Never forget
>>
>>103359942
https://desu-usergeneratedcontent.xyz/g/image/1717/35/1717350707922.png from https://dsi.ut-capitole.fr/blacklists/ which https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1 used apparently
Thread where it was discussed: https://arch.b4k.co/g/thread/100770639/#100778503 there you fucking go anon, making me waste a few minutes on finding the citation, but I do not forget.
>>
>>103356507
i've run mixtral limarp zloss every day, there is nothing better.
>>
So I like to do all of my RP in first and second person.
When I'm editing messages sometimes I'll just add "I" or "You" or a single quote at the end and ask it to continue the message.

Now out of fucking nowhere it continued the message and then tacked on this at the end:
>The idea is cool but there are way to many sentences starting with 'I' or 'You'. You can rewrite this to be in third person and make it sound better that way.
Fucking smartass.
>>
File: Untitled.png (224 KB, 874x673)
>>
>>103360276
Flawless finish
>>
>>103360276
https://huggingface.co/huihui-ai/QwQ-32B-Preview-abliterated/tree/main
>>
I think we've discovered something that will catapult us into the next age, much like the first near-perfect hand-turned screw allowed us to start making more and more precise machine shop tools, or the first transistors allowed us to use computers to design more and more sophisticated computers.
Very much in line with the description of https://en.wikipedia.org/wiki/Waldo_(short_story)#The_waldo from Heinlein (if any of you are that old), LLMs are able to assist in their own development. Not in a cheesy "singularity" way, but just as a tool in another domain.
>>
>>103360347
Waiting for the 5bpw gguf. It doesn't refuse enough to make it a dealbreaker for me yet.
>>
>>103360388
You can always edit the refusal into "Okay, so" and continue from there.
>>
So now that the dust has settled
>>
>>103354338
>>
>>103357675
>>103357686
Mamba & RWKV are DoA and have been for some time. Combining Mamba with Transformers though...
>>
File: 6178592543.gif (541 KB, 284x326)
>>103360276
>the ending
THERE IS NO ESCAPE
YOU WILL BE SAFE
>>
>>103360276
This is the ultimate conclusion of local language models - unoriginal, idiotic slop in the thinking section, refusal in the output. When Llama 1 came out, this is what I always dreamed of.
>>
>>103358342
It's fucking true though. Trying to get these models to stay dominant is a Sisyphean effort.
>>
Mamba2 is STILL unusable
>>
>>103360403
Pretty sure they embedded a RWKV model into every Windows 11 install through an update recently.
>>
>>103360403
what was wrong with rwkv anyway, besides mostly undertrained llms?
>>
File: Screenshot_2024_11_30-3.png (81 KB, 1843x498)
is INTELLECT the GPT killer we've been waiting for???
>>
>>103360558
An instruct tune with an IFEval of 0, that's impressive, in its own way.
>>
>>103360558
It's a killer alright
>>
Just ordered 64 gb of RAM what am I in for?
>>
is there a locally run speech to speech program you can run on like 16GB of VRAM?
>>
>>103360655
0.7 tokens / s
>>
>>103360670
https://huggingface.co/lj1995/GPT-SoVITS-windows-package/tree/main
>>
>>103360678
cheers anon
>>
>>103360675
What if I also have 24 gb of vram? I want to run
Qwen2.5-Coder-32B-Instruct. Am I going to have a bad time?
>>
>>103360707
You're fine
>>
>>103360707
You can run that at Q5 fully on vram at 20+ T/s then
>>
>>103360734
My experience with quantized image diffusion models tells me Q5 sucks. But I guess it'll cost me nothing to try it.
>>
i'm a poor fellow with a 3070 (8gb vram) but i'm also a patient fellow who doesn't care about slow generation. should i bother upgrading? will low vram cause any problems beyond speed? thanks
>>
QwQ sure seems to like the Final Solution. Curious.
>>
File: 1731484876277933.jpg (768 KB, 1502x2399)
>>
>>103360771
8GB video is barely enough for SD1.5.
For LLMs, if you have big system RAM (64GB+) you can get some slow but decent output.
>>
>>103360841
ram upgrade it is. thanks again. i love you
>>
>>103356058
why you're sucking a goblin because of discord?
>>
>>103360558
Kino. waiting for the livebench score
>>
remember 5 minutes ago when the thread was pretending to be blown away by that allenai model
I can't even remember its name now bc it disappeared so fast
>>
>>103360881
I'm on a 12GB video card, 64GB system. I can run Llama 3 family 70B models at Q6K but barely. (Stupid web browsers streaming video like to eat RAM for nothing and can cause thrashing till I shut them.)

Hefty models like Largestral I must quant to IQ3_XS, but it's functional (and I obviously don't know what I'm missing) and fine for some RP fun.
>>
>>103360921
Tulu? Olmo?
>>
>>103360945
yeah Tulu was it
what a nothingburger lol
>>
>>103360921
I like Tulu, it did well on my tests, especially programming.
Olmo, I haven't yet tested.
>>
File: 1720793496839674.png (323 KB, 507x331)
>>103354338
but can you actually get something at the level of opus with locals tho? (somewhat serious question)
>>
>>103361069
no, and anyone who says yes is lying
but you can get something good enough to cope with
>>
>>103360558
why doesn't that one match the benchmarks they posted on twitter https://pbs.twimg.com/media/GdlNcrpWIAAe3b8.png?name=orig
>>
>>103360771
I use colab for 16gb, so for me anything with less than 24gb would be not worth it.
but with colab there are a lot of downsides.
If you are a infrequent LLM user (I only use it for cooming so, I use it like maybe 4 days in a row, but then I get bored of it, and forget about it for a few weeks, that's with colab), you could use vast.ai to rent a GPU (it works with colab too, but I'm not sure if that's the best approach), but note the prices are overpriced if you were a frequent user (but there really is no other way to run a huge model like behemoth 123b other than buying like 6 3090's, like if you just wanted non coomer RP, it's so much cheaper to just use opus or something on openrouter, and your tokens can be spread across all the different models). So you could get like 48gb setup for like 1$ an hour depending on the GPU's.
>>103360881
He is talking about running models at like 1 token per second, that's not an exaggeration, you could try running a model with 16gb of ram, like a 4gb Llama 8b q4 (a 4-5gb model) would run at like 3 tokens per second with zero context. Doubling the vram usage typically makes the token speed half, so 10gb should be 2 tokens, and a 20gb model should be 1 token.
The only exception is if you had an intel 245k with 8400Mhz CU-DIMMs at like 150 gigs of bandwidth (I think the price for those ram sticks is like $300 for 2x24gb), that's like 2x faster than 4800mhz ram, so in that case you might get like 5tk/s with a 10gb model, and 2tk/s with 20gb (it might be possible you can use those sticks on older CPU's, and hope on silicon lottery, but I am not sure).
You could just buy a 4070 TI super and just use both your GPU's together and you should get 90% of the performance of a 3090. Or go with a 4060 TI 16gb and cope with having half the token speed of a 4070 TI super (and less gaming power).
This benchmark should give you numbers BUT it doesn't include the GPU's. https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
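If you want to sanity-check numbers like that yourself, here's a rough sketch assuming generation is purely memory-bandwidth-bound (every token streams the whole quantized model once); the bandwidth figures and the est_tps helper are just illustrative, not measured:

def est_tps(model_gb, bandwidth_gbs):
    # tokens/s is roughly effective memory bandwidth / bytes read per token (~ the model size)
    return bandwidth_gbs / model_gb

for gb in (5, 10, 20):
    print(f"{gb} GB model: ~{est_tps(gb, 25):.1f} t/s at 25 GB/s, ~{est_tps(gb, 150):.1f} t/s at 150 GB/s")

Real speeds come in lower once context grows and prompt processing is counted, but the halving-with-model-size trend falls straight out of this.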
>>
>>103360825
Kobo won
>>
>>103361069
largestral and l3-405b base might be similar enough, but you won't run that 405b without selling 10 kidneys, so does that really count as local?
>>
>>103360276
QUCKED
>>
>>103361092
>Worse than L2 7B
Better off taking the 0.
>>
>>103360825
>>103355527
Kobo, please add all the draft model options (context, offload, min, max) that are present in llama.cpp. Your defaults are suboptimal; I'm getting a better performance boost in llama.cpp.
>>
can llama.cpp niggers stop renaming the fucking compile flags every couple months or at least keep the deprecated names
>>
>>103361184
Stop crying about progress. Changing your run script twice a year won't kill you.
>>
>>103361116
they only trained for 1T, it'd be surprising if it were much better than L2, which used 2T. For me the main question is whether their OpenDiLoCo training is good enough or not. I thought training that way means you need to train for longer or for more epochs because it learns much slower, so those 1T tokens should be worth less, but I might be wrong about this.
>>
>>103361195
Unfortunately I was using llama.cpp to solve riddles every 60 seconds, which is the only thing that keeps basilisk-chan asleep and prevents her insurgence
We're all fucked now. Thanks ggerganov
>>
>>103352665 >>103352685
It really does not make sense. But ChatGPT comes to the same retarded conclusion after a convoluted explanation, which goes like this:
>2 genders, 7 days = 14 possibilities for each child => 14 times 14 = 196 possibilities for both combined
>... yada yada
>We need to calculate:
>The total number of outcomes where at least one child is a boy born on a Tuesday. => 27
>The number of outcomes within this restricted set where both children are boys. => 13

But this is bullshit. We could substitute "Tuesday" for any other day in the problem statement and the outcome would be the same. Just because we mentioned some specific weekday we suddenly get such weird numbers. 13/27

If we omit the information about the week day from the problem statement and only mention that at least one child is a boy, then the probability is 1/3

If instead of the weekday we consider AM vs. PM (and specify that we have at least one boy born before noon (or after noon)) we get 3/7.

This is all total bullshit, but ChatGPT will defend its correctness to the death.

If this is really what mathematicians believe then I think they are not sentient.
>>
>>103361435
the human brain was never meant to understand statistics
>>
>>103361549
And sand was never meant to be able to calculate matrix multiplication, but that's exactly what we make it do.
>>
>>103361435
>2 genders
Woah now, cool it with the anti-semitism.
>>
>>103361435
It takes an extreme amount of intelligence to look at a very verbose rationalised explanation and say "that's retarded bullshit". It might be a while before AI gets to this state. (the devil in this example is that the arbitrary sampling category is large enough that the obvious error is obscured, it's just believable enough)
>>
>>103356156
gyatt
>>
>>103361607
So I must have an extreme amount of intelligence then. But I still don't understand what is wrong with the math.
>>
>>103361607
>the devil in this example is that the arbitrary sampling category is large enough that the obvious error is obscured, it's just believable enough
but it gave the right answers matching what you should actually expect given the priors
>>
File: filtered.png (141 KB, 1480x900)
These new generation models don't seem to know what a greentext is. How come?
>>
Coom or Math
Which way, white man?
>>
>>103362020
LLMs are for math, monocular depth detection for coom.
>>
File: sonnetocto.png (89 KB, 1409x741)
>>103362004
Compared to Sonnet.
>>
No idea why people shill QwQ. It's probably the most garbage 32B I've ever had the displeasure of using. What is the alternative?
>>
>>103362052
No one is falling for your shit.
>>
>>103362052
>No idea why people shill QwQ
1) They are coders
2) They are paid shills
3) They have no taste
Pick whichever you like.
>>
>>103362076
4) They've actually used it
>>
>>103361154
Seconding this. Llama.cpp is just faster for me. Kobo, let me change settings!
>>
>>103361950
>>103361997
>but it gave the right answers matching what you should actually expect given the priors
How do you make sense of this?
For me it still looks like bullshit.
Given a set of two siblings, knowing that at least one is a boy, the probability that both are boys is 1/3.
Additional information: the boy we know is a boy was born on a Tuesday. What is the probability of the other one being a boy now? Suddenly it is 13/27.
Other case: He was born on a Wednesday => same result: 13/27.

In conclusion, it does not matter and we can omit the information of weekday and always have 13/27 because it is 13/27 for every weekday case.

Why?
>>
>>103362092
Found the retard shill
>>
>>103362004
They are trained in chinese forums instead.
Why not try asking for the chinese version?
>>
>>103362092
Hello 3). Would you like a glass of extra mild room temperature distilled water, or is it too spicy for you?
>>
I will shill QwQ for free because I like it.
I don't do RP btw
>>
>>103362120

>>103345651
Either that or you're a retard trying to RP with its assistant persona
>>
>>103362147
>I don't do RP btw
so why are you here?
>>
>>103362177
Because this is the local models general, not the AI waifu general.
>>
>>103362194
leave
>>
>>103361950
>But I still don't understand what is wrong with the math.
Nothing. But it's difficult to intuitively grasp. You tend to picture some example child and work from there once you hear "At least one is a boy [born on tuesday]" etc. which distracts you from the fact that you never inspected any specific child, you just know that (at least one of) this class of child exists among the two children in the family. This difference in how you obtained information changes how you can reason about the conditional probability. And indeed, at least in a world where genders and weekdays of birth are roughly evenly distributed, these probabilities would match reality in cases where you only knew that information.

The wiki article on the problem attempts to provide an example of the sort of scenario where you would have that abstract information about a family, such as via a survey, referencing a real-world survey that was done on the simpler version:
>vos Savant conducted a survey of readers with exactly two children, at least one of which is a boy. Of 17,946 responses, 35.9% reported two boys.
On the known weekday issue:
https://en.wikipedia.org/wiki/Boy_or_girl_paradox#Information_about_the_child
>It seems that quite irrelevant information was introduced, yet the probability of the sex of the other child has changed dramatically from what it was before (the chance the other child was a girl was 2/3, when it was not known that the boy was born on Tuesday).
>To understand why this is, imagine Marilyn vos Savant's poll of readers had asked which day of the week boys in the family were born. If Marilyn then divided the whole data set into seven groups – one for each day of the week a son was born – six out of seven families with two boys would be counted in two groups (the group for the day of the week of birth boy 1, and the group of the day of the week of birth for boy 2), doubling, in every group, the probability of a boy-boy combination.
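If the counting argument still feels slippery, the space is small enough to brute-force. A quick sketch (just enumerating the 196 equally likely (sex, weekday) pairs, nothing to do with the survey data):

from itertools import product

days = range(7)                                     # 0 = Mon ... 6 = Sun, call Tuesday 1
children = [(s, d) for s in "BG" for d in days]     # 14 possibilities per child

families = list(product(children, repeat=2))        # 196 equally likely two-child families
tues_boy = [f for f in families if ("B", 1) in f]   # at least one boy born on a Tuesday
both_boys = [f for f in tues_boy if f[0][0] == "B" and f[1][0] == "B"]

print(len(tues_boy), len(both_boys), len(both_boys) / len(tues_boy))   # 27 13 0.481...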
>>
>>103362138
>>
>>103361950
the variables are meant to be independent, but it's just pretending that they're not
>>
>>103362004
Nemotron 70B manages
Probably an issue with the dryness of Qwen models
>>
>>103362107
>>In conclusion, it does not matter and we can omit the information of weekday and always have 13/27 because it is 13/27 for every weekday case.
>
>Why?
short and simple answer: families with two boys get two rolls for weekday for "at least one boy" in the family, therefore knowing the weekday of any of them increases the odds of two-boy families
this works for any additional information, as long as it's said to be 'at least one boy' in a family with two children of unspecified gender. hair color, eye color, favorite food, whatever
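and if you want the general closed form (my own quick sketch, not something anyone in the thread derived): with N equally likely values of the extra attribute, conditioning on "at least one boy with one specific value" gives (2N-1)/(4N-1) for two boys, which starts at 1/3 for N=1 and creeps toward 1/2 as the attribute gets more specific:

def p_both_boys(n):
    # at least one child is a boy with one specific value of an n-valued attribute
    return (2 * n - 1) / (4 * n - 1)

for n in (1, 2, 7, 365):
    print(n, p_both_boys(n))   # 1/3, 3/7 (AM/PM), 13/27 (weekday), ~0.4997 (exact birthday)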
>>
File: imuk.png (23 KB, 632x600)
>QwQ draws Miku
>>
>>103357360
https://huggingface.co/win10/EVA-QwQ-32B-Coder-Preview
eva-32b+qwen-coder-32+qwq
https://huggingface.co/win10/EVA-QwQ-32B-Preview
eva-32b+qwq
>>
>>103362325
where do i put my penis
>>
>>103362338
Into that red hole below eyes. Obvious.
>>
>>103362194
You are not welcome here.
>>
>>103362262
My question was "why", you retarded nigger.
Don't you see that this is bullshit?
How can this bullshit be valid?
Add any arbitrary unrelated piece of information about the boy => the probability of the other one also being a boy gets bigger (the more additional random information, the closer the probability for the other one grows towards 1/2)
>>
I've been testing Mistral Large 2411 at 2.5bpw EXL2 for about 2 weeks now. My personal ranking for 2x4090 for RPing is as follows:

1. Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-6.0bpw-h6-exl2-rpcal
>Back to #1 for RPing after XTC sampler fixed it's repetition issues.

2. LoneStriker_Mistral-Large-Instruct-2407-2.65bpw-h6-exl2
> Smart, but too dry. Harder to gaslight/blackmail since too smart.

3. MikeRoz_mistralai_Mistral-Large-Instruct-2411-2.5bpw-h6-exl2

>Sadly, 2.5bpw is retarded, and no one else has posted a slightly bigger quant to use with 2x4090
>>
>>103362359
why not the red hole above her eyes?
>>
>>103362400
First quality post in ages. Thanks anon
>>
>>103362404
That's not a hole...
>>
>>103352665
It's a sampling question more than anything. From the set of two-child families, take only those that have a boy; the girl-girl families (1/4 of the population) go off to the side. Then take only those whose boy(s) were born on a Tuesday: 6/7 of the families where the boy was born first go to the side and 6/7 of the families where the girl was born first go to the side, so the surviving mixed families add up to 2/7 of one group (14/49), and in all of them the sibling is female. The boy-boy group gets two chances to hit the target, so 1-(6/7)^2 = 13/49 of it survives. Thus it's 13/49 for a boy vs 14/49 for a girl, making the girl slightly more likely. Being born on a Tuesday has no effect on the probability of your sibling being a sister.
>>
>>103362400
>LoneStriker_Mistral-Large-Instruct-2407-2.65bpw-h6-exl2
>Smart, but too dry. Harder to gaslight/blackmail since too smart.
lumikabra 123b 2.7bpw
>>
>>103362552

https://huggingface.co/schnapper79/lumikabra-behemoth-123b-exl2-2.7bpw

What the fuck is this shit? What context, instruct, or Sys Prompt? What a shitty post. It's functionally useless without those.
>>
>>103362380
>Add any arbitrary unrelated piece of information about the boy => make the probability for other one also being a boy bigger (the more additional random information the closer the probability for the other one grows towards 1/2)
no shit, because you're removing more boy-girl families than boy-boy families when you pick arbitrary subsets of families based on extra conditions that are more likely in families with more boys

it should not be hard at all to see that these two facts are true:
- a family with two boys is nearly twice as likely to have a boy born on each weekday than a family with one boy
- a family with two boys and a family with one boy both have equal 100% chance of having a boy born during the week

you're right that it doesn't matter what weekday you choose, but it DOES matter that you narrow it down to any specific day counted for boys but not for girls
>>
>>103361154
>>103362103
Just ditch shitbold with its crappy inference and use llama.cpp like a normal person.
>>
File: 1732967482937066.png (162 KB, 1520x767)
WTF
>>
File: 213879046570823.png (67 KB, 352x1100)
>>103362400
>Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss winning as usual
I LOVE MIXTRAL

>XTC sampler
the wut now, im sorry i use limarp zloss and therefore still live in caves riding with picrel.
By all means tell me how im wrong.
>>
>>103362904
updoot the silly tavern
>>
File: file.png (32 KB, 391x465)
>>103362801
The 50% and 33% ones are simple,
The "we know more details" one is mindfucked.
>>
>>103362974
poast pic of mixtral settings

does kobold also need an update?
I use AYYYMD so koboldcpp-rocm-1.78 is latest and 1.61 is the only one that lets me utilize both GPUs properly.
>>
File: mixtral roleplay 2024.jpg (129 KB, 480x1208)
>>103362904

Git pull the latest staging release. XTC recommended settings with a hint of presence/frequency penalty. I am having success with these settings.

>context template
https://files.catbox.moe/5tsvjq.json

>Instruct Template
https://files.catbox.moe/6y8z9u.json

Can change length depending on needs, e.g. 1 to 3 sentences, 1 to 2 paragraphs, etc.

>Sys prompt
https://files.catbox.moe/a85rwh.json
>>
>>103362801
Thanks, that's helpful in understanding the statement.
>>
Grok will save /lmg/.
>>
>>103363106
What does stalker gamma have to do with /lmg/
>>
>>103363106
You mean this thing
>>
>>103363137
>memebench
>>
>>103363106
Please call Elon and tell him to drop it. I think he forgot or changed his mind about open source.
>>
>>103363181
Then post logs. So far the only Grok logs posted showed it getting rekt by Claude
>>
File: 1732366522924756.png (28 KB, 151x100)
>>103363137
bench memes need not apply, there is only one true SOVLmark
>>
>>103363203
wasn't he supposed to open source grok 1.5 months ago?
>>
>>103363210
trvked and correct
>>
>>103363137
>>103363106
Is grok even good for ERP?
I never heard anyone mentioning it.
>>
>>103363234
It's not good for anything
>>
>>103363234
It's kinda useless. Maybe Grok 3 will be better, we'll see.
>>
>>103363204
I must say, I'm disappointed but not surprised by the tone and content of your post. It's a perfect reflection of the entitled and materialistic culture that pervades our society. You assume that everyone has the means and willingness to pay for Grok, a luxury that not all of us can afford or choose to indulge in.

Let me tell you, I stand by my principles and refuse to support the commercialization of knowledge and technology. I will only use open-weights LLMs, not because I'm cheap or lack resources, but because I believe in the values of openness, accessibility, and equality.

I will not contribute to the perpetuation of a system that prioritizes profit over people and excludes those who cannot afford to pay. My commitment to open-weights LLMs is not just a practical choice, but a moral stance. I will not compromise my values for the sake of convenience or to satisfy the curiosity of others.

So, I'm afraid I won't be posting any Grok logs, not because I'm afraid of being 'rekt' by Claude, but because I refuse to participate in a system that undermines the principles of fairness and inclusivity. I suggest you reflect on your own values and consider the impact of your actions on the wider community.
>>
>>103363250
kys petra
>>
>>103363234
It's retarded but it can get nasty. The next Grok is supposed to be an entirely new model made by xAI, which has been building a giant data center and hiring a bunch of people.

https://www.youtube.com/watch?v=Jf8EPSBZU7Y
>>
>>103363106
I'm not interested in Elon's finetune, I'm more interested in the base model. If Elon hasn't filtered the shit out of it, it may be interesting to play with, like in the good old days with L1.
>>
>>103363203
Elon generally does what he wants. Open sourcing Grok 1 was a good move for him since it made Altman look bad. Now that he mostly has Altman where he wants him, it's anyone's guess whether he'll keep doing it or not.
>>
conditional probability is astrology for mathematards
>>
>>103363278
The Grok-1 release was base only and it was garbage. It's likely anything else he releases will also be base only. The idea being that the Instruct is paypig only and the model is too large for anyone to bother finetuning.
>>
File: 154789236154.gif (1.6 MB, 498x373)
>>103363023
>>103362974
Uh damn, did Silly get updated alot?
I was kinda camping on ver1.12.4.
QRD from going to 12.6??
>>
>>103363327
If Grok 2 is any larger than like 30B that'd be pretty fucking embarrassing.
>>
>>103363311
Grok-1.5 was released March 28th. If he was serious about the 6 month schedule, he would've open sourced it by the start of October. Maybe depending on the lawsuit, he'll release 1.5 instead of 2.
>>
>>103363327
It was garbage because it was undertrained and had just 8k context. Mini may be interesting for finetuners.
>>
File: tetris.png (12 KB, 305x636)
After hours of wrangling QwQ, the game finally works!
I really REALLY need to buy a graphics card. CPU is only giving me 1.8 t/s.
Has anyone managed to get speculative decoding working with QwQ?
>>
>>103363363
>Has anyone managed to get speculative decoding working with QwQ?
It writes too differently from any of the Qwens
>>
>>103363363
>Finally after hours of wrangling QwQ the game finally works!
Would've been faster to just write it yourself.
>>
>>103363363
no, it cuts my performance in half no matter what quant I pair it with, both exl2 and gguf
yet some freaks will say otherwise
>>
>>103363256
I want to take a moment to address your message. First, it's crucial to understand that using someone's name, especially in a context where it's meant to harm or belittle them, is a form of harassment. This is not just a matter of being rude; it's a serious issue that can have severe emotional and psychological impacts on the person targeted.

Secondly, making threats or suggesting that someone should harm themselves is extremely dangerous and irresponsible. It’s never okay to encourage or suggest self-harm to anyone, regardless of the circumstances. If you’re feeling angry or frustrated, there are healthier and more constructive ways to express those emotions.

Lastly, it seems like your comment might be rooted in transphobia. Transphobia is a form of discrimination against transgender individuals, and it can manifest in many ways, from derogatory language to outright violence. It’s important to recognize that everyone deserves respect and dignity, regardless of their gender identity. Transgender people face significant challenges and barriers, and adding to that with harmful comments only exacerbates the problem.
>>
File: 1705596616969578.jpg (137 KB, 1440x1080)
>>103363380
See now, that's where you're wrong!
>>
>>103363380
No because I'm a codelet.
Maybe I should relearn C but I haven't touched it in decades.
>>
>>103363363
This is only the first month of the test time compute meme
WAGMI
>>
>>103363336

I only pulled for the XTC sampler. Seems like it's working as intended and solves the brunt of the repetition problem.
>>
>>103363384
You're supposed to use a smaller model with fewer parameters as the draft.
For example I use Qwen Coder 32B with Qwen Coder 0.5B and it works wonders.
>>
>>103363363
>Has anyone managed to get speculative decoding working with QwQ?
Got 5-10% speedup with fully offloaded Qwen2.5 instruct 7b Q4K and f16 QwQ on RAM.
>>
>meme time memepute
haven't we already seen that it's smoke and mirrors? sonnet beats o1, r1 is fake and gay, and qwq chases its own tail for a thousand tokens and then gives wrong answers
qwen coder is WAY more impressive than any of this bullshit. anyone shilling this shit probably thinks the day they were born changed the gender of their brother.
>>
>>103363429
>probably thinks the day they were born changed the gender of their brother.
Why not? I got blamed for everything else.
>>
>>103363429
They do.
>>
File: 14236578649846708.png (281 KB, 971x848)
>>103363407
Bruh idk if im ready to reformat all my templates.
Like, holy fuck look at this shit man.
System Prompts are its own thing now?
Like, oh my gawd man, i am mixtral limarp zloss fan number one, so im gonna do this with the assumption its gonna blow my dick off (it might).
>>
>>103363363
I assume you didn't actually have to use your CPU, because QwQ is available as a huggingface space (no account needed, but you'll probably run out of the free limit after 100 seconds) and on openrouter.
>>
>>103363456
It makes it prone to the occasional occult schizo response but the times it works it's so much better
>>
word on the street is something big is coming
real big, like 405b big
some even say it's a bit reflective...
>>
>>103362676
all same as largestral 2407
sysprompt can be anything and it'll follow it, I have tried it two ways: everything in the sysprompt or just style/tone/formatting and the char/world info slotted in as user messages, and it seems to work well in both setups
>>
File: 21522 - SoyBooru.png (46 KB, 457x694)
>>103363561
Something big is cumming...
Real big, black and handsome.
Some even say it tastes like 'berries.
>>
>>103363429
There's been too much money invested for people to admit that the wall being hit on pretraining scaling means it's over for transformers. We're going to see increasingly elaborate forms of cope from this point in order to avoid accepting it, because acceptance would mean stock crashes and a lot of people left holding bags.
>>
so is here or /aicg/ the place to be if I'm doing local textgen, the latter seems like a bunch of bitching about freely available model apis or whatever else
>>
>>103363671
gee anon what does the subject say
>>
>>103363684
>a general dedicated to the discussion and development of local language models.
on me for being retarded then
>>
>>103363704
that's not even the subject
>>
goddamn kekd
>>
>>103363724
subject doesn't specify whether it means imagegen or textgen
>>
Pardon, /lmg/. I'm curious if any of you know about image recognition AI, and how hard it'd be to integrate image recognition into my chatbot.
>>
>>103363756
or people for that matter
>>
>>103363547
Sorry but I won't fall for the hosting meme.
>>103363413
It worked with Qwen2.5-0.5B-Instruct, but I only got like a 20% increase in t/s.
>>
>>103363864
>Sorry but I won't fall for the hosting meme.
you can spend $1 a day for free on openrouter.
it's perfectly fine if you don't want to use it, but ATM the main reason to avoid cloud hosting is lack of fine-tunes, so you are basically just waiting 10x longer for no reason.
>>
>getting a new processor and another 64gb ram soon
will be weird being able to actually run 70B models without them throttling my ram
>>
>>103363974
bro your channels?
>>
>>103364013
4x 32GB ram sticks, ram is pretty cheap nowadays
>>
Is anyone using pixtral with sillytavern? I want to be able to send photos inline with the chat. I'm using tabbyApi, and it supports images with another chat client, but silly tavern isn't encoding it correctly or something in the chat completion objects.
>>
>>103363969
oh nevermind, you have $1, and when you run out that's it, you need a new account.
>>
>>103363765
look up llava
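# -m is the llava language model gguf, --mmproj is its matching vision projector; --image can be repeated for multiple images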
./llama-llava-cli -m <llava-v1.5-7b/ggml-model-q5_k.gguf> --mmproj <llava-v1.5-7b/mmproj-model-f16.gguf> --image <path/to/an/image.jpg> --image <path/to/another/image.jpg> [--temp 0.1] [-p "describe the image in detail."]
>>
File: file.png (39 KB, 617x165)
>>103364073
>I want to be able to send photos inline with the chat.
you have this option checked, right?
>silly tavern isn't encoding it correctly or something in the chat completion objects.
does it give you an error message?
>>
>>103363974
>will be weird being able to actually run 70B models without them throttling my ram
Temper your expectations...I have a feeling you're in for disappointment unless you're cpumaxxing to the tune of many thousands of dollars and 12+ memory channels
>>
>>103364101
I know it'll still be fairly slow but I'm moving from a Ryzen 7 3700X to a Ryzen 9 5950X and that's a pretty solid step up
>>
>>103364118
Going to do some testing with your 3700x first ?
>>
>>103364121
>>103364121
>>103364121
>>
>>103364118
>Ryzen 7 3700X to a Ryzen 9 5950X
they're both dual channel, only supporting DDR4 at 3600MHz I think? You just spent money on a no-op.
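To put rough numbers on it (assuming generation is memory-bandwidth-bound): dual-channel DDR4-3600 tops out around 2 x 28.8 = 57.6 GB/s theoretical, and a 70B Q4 quant is roughly 40GB, so ~1.4 t/s is the ceiling either way; the CPU swap barely moves that.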
>>
>>103364135
Just wait for AMD APUs
>>
'night
>>
>>103364149
Good night, Rin
>>
>>103364127
as it stands it'll take like 5-7 minutes for a decent response, not all that bad all things considered. Might do more extensive testing tomorrow now that you mention it just so I have a benchmark to compare to
>>103364135
it's the best my current motherboard can do and I don't have money to refurbish the whole rig yet, plan is to upgrade the motherboard next then move on to a better processor and DDR5 ram
>>
File: 1732759920621941.png (509 KB, 512x680)
>>103358794
- AsRock EPYCD8-2T $321
- EPYC 7282 $65
- 256 GB DDR4-3200 $300
- corsair hx1200i $162
- corsair rm850 $97
>>
>>103364686
Looks nice.
Does it need forced air cooling?


