/g/ - Technology

File: 1705806843225442.jpg (1.96 MB, 2400x3346)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103173457 & >>103164659

►News
>(11/12) Qwen2.5-Coder series released https://qwenlm.github.io/blog/qwen2.5-coder-family/
>(11/08) Sarashina2-8x70B, a Japan-trained LLM: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B total and 52B active parameters: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
Kill yourself.
>>
>>103188780
>>103188780
>>103188780
op is a filthy thread splitter. spit on him.
>>
>>103189341
hi, petrus? how's serbia treating you?
>>
>>103189328
there's already a thread, retard
>>
>>103189372
That's a troll thread.
>>
>>103189378
>i don't like it.
>it is a troll!
you are a redditor.
>>
>petra starts posting here again
>early bakes with his favorite anime girl happens again too
weird
>>
>>103189427
hi petrus, why are you posting in the wrong thread?
>>
>>103189378
Make your case about what's wrong with the other thread.
I don't see blacked shit in the OP or anything.
As far as I can tell, this is just starting a flame war.
>>
>>103189515
Kurisufag/Petra/blackedmikuanon/AGPL-spammer/drevilanon/2nd-belief-anon/midjourneyfag/repair-quant-anon has a history of trolling, see: >>103164618
That's reason enough to ignore his thread.
>>
>>103189515
>As far as I can tell, this is just starting a flame war.
Indeed, making the kurisu thread clearly had that intention
>>
>>103189536
>Kurisufag/Petra/blackedmikuanon/AGPL-spammer/drevilanon/2nd-belief-anon/midjourneyfag/repair-quant-anon
Just so you know you come off as a complete schizo right now. So keep going. I am having a blast.
>>
>>103189536
>>103189544
So there's nothing inherently wrong with the other thread then.
Understood.
Thank you for clarifying.
>>
>>103189515

>>103188802
>pretending /lmg/ is relevant enough for things like early bakers and thread wars these days
>>103188810
>I just want the psycho to split the thread again and then make some samefag posts with his model.
>>
>>103189562
>I just want the psycho to split the thread again
You did just that so congratulations playing into his hand.
>>
>>103189560
There's something inherently wrong. It's created with the purpose of trolling.
>>
>>103189570
>trolling
Isn't that unsafe and against the rules of 4chan?
>>
>>103189328
Maybe one day AI will be able to do anatomy so good it can generate this kind of image easily.
>>
File: 1730125415014155.png (6 KB, 298x169)
>>103189328
>>
Is there anything better than Magnum v4 for ERP?
>>
>>103189783
everything in the world
>>
>>103189536
Makes sense that the loser with no life is from Russia.
>>
>>103189328
>Thread Theme:
https://www.youtube.com/watch?v=6Y4b25CYkkg
>>
>>103189783
try mythomax
>>
Gojo is way cooler than petra btw.
>>
>>103189911
This, there was a guy posting a lot of Miku bot replies and they were pretty impressive.
>>
►Recent Highlights from the Previous Thread: >>103173457

--LLM training and probability modeling:
>103176841 >103177202 >103177445 >103177735
--Anons discuss the state of AI progress and the importance of high-quality data:
>103176961 >103177097 >103177340 >103178331
--Troubleshooting Qwen coder performance issues:
>103174082 >103174157 >103174288 >103174364
--Sarashina2-8x70b discussion, with hardware and model specs:
>103175207 >103175213 >103175230 >103175306 >103175501 >103175599 >103176901 >103177013
--Running sarashina2 on a 4060ti with memory and batch size adjustments:
>103175313 >103175494
--Running qwen model locally with kobold and comparison with ollama:
>103174683 >103174754 >103174800 >103174884 >103174783
--Quantization type and model performance discussion:
>103174662 >103175030 >103175119
--INTELLECT-1 project nearing completion, training progress and metrics shared:
>103187105 >103187169 >103187214 >103187198 >103187372 >103187383
--FOSDEM 2025 Low-Level AI Engineering & Hacking Dev Room announced:
>103173860
--Anon compares performance of two AI models, puzzled by slower generation despite more GPU layers:
>103174513
--Anon claims to have found a 22B model rivaling mythomax:
>103181147 >103181158 >103181319 >103181337 >103181440
--AI model limitations and potential improvements:
>103178266 >103179566 >103179737 >103179953 >103180201 >103180347 >103182626 >103180168 >103180339
--Nemotron 70B optimization and formatting issues:
>103185721 >103185918 >103186031
--Athene-V2 model introduction and skepticism:
>103187457 >103187531
--Anon thinks altman implied ARC AGI is just a "meme eval":
>103186291
--NexusBench: function call, tool use, and agent benchmarks:
>103187563
--Comparison of qwen 2.5 and llama models' entropy levels:
>103173668 >103176180
--Miku (free space):
>103176710 >103177770 >103180955 >103181848

►Recent Highlight Posts from the Previous Thread: >>103173461

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
File: ComfyUI_00794_.png (1.07 MB, 1024x1024)
>>103189352
I'd rather spit on you.
>>103189783
Going to give Behemoth-v1.1-Magnum-v4-123B a try
>>
File: 17234586894453.png (537 KB, 512x512)
>>103189536
>hes serbian
HAHAHHAHAHAHAHAHAHAHA
>blue haired anime girl makes him seethe

>>103189515
>This is just starting a flame war.
No shit? Report the thread, it really is just that easy.
>>
>>103189783
Yes. Anything else.
>>
>>103189783
Mixtral LimaRP Zloss
>>
Pygmalion has been doing god knows what. Like seriously, what have they been up to?
>>
Any models as good as Opus for RP?
>>
Always nice to see people making friends and getting along well with each other.
>>
>>103190306
The new Sonnet 3.5.
>>
>>103190351
Okay, what about local
>>
>>103190362
Magnum v4 72B.
>>
>>103190340
*push*
>>
>>103189570
>waah waah trolling! jannies halp mee!
back you go >>>/reddit/
>>
>>103190369
Can you post logs of it proving it's as good as Opus in intelligence and context size?
>>
What's a good non-slopped 70b model for fiction? I have been using Llama 3 Instruct Storywriter and I am looking for an upgrade. I asked in the last thread and someone suggested Nemotron and when I tried it there was enough slop in it to be bothersome
>>
>>103189515
Nta but here's your tldr: waifufag OP gets mad every single time when new thread without miku pic is created, early bake or not.
It's literally just that with some wannabe doxxer schizobabble: >>103189536
>>
>>103190391
It's a local model, run it yourself.
>>
>>103190427
This is fake news, disregard Serbiafag
>>
>>103189515
This poster has been chasing away actually valuable users and taking away from the conversation for a year+ now since no one wanted to join his shitty discord server. Post like this >>103190427 is his way of ruining the general and keeping the attention on him or his autistic idea of what the general should be.
>>
>>103190393
Try Magnum v4 72B.
>>
>>103190440
how? I don't have a good PC, so I can't.
>>
>>103190393
As your post demonstrates, "good" is subjective.
Just pick one and try it, and see if you find it "good":
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
>>
rentry.org/itsfunny

It was me that doublebaked by the way.
>>
>>103190494
Who?
>>
i fucking love double baking its so funny

my face is so punchable too but you'd never know haha losers
>>
>>103190513
asaproxy owner doe?
>>
>>103190531
But I made the Miku bake.
>>
Literal tranny thread, y'all never beat the allegations with this shit.
>>
>>103190540
>But I made the Miku bake.
But you didn't. I did. Retard
>>
>>103190485
Then what does it matter.
>>
>>103190598
dunno? Post a good opus model that can run on anything
>>
>>103190614
https://packagist.org/packages/andreskrey/shitty-markov-generator
>>
>>103190667
not as good as opus for rp doe
>>
>>103190673
you want to run on a potato, you're gonna get what you're gonna get.
>>
File: ComfyUI_00055_.png (1.24 MB, 1024x1024)
for me...
>>
>>103190692
cool, so theres no good local models that are free that mog opus since you need hardware?
>>
>>103190730
Yes, the technology is inherently power-hungry. The cloud stuff you're using has literal million dollar servers on the back end.
You can probably get creative and replicate something claude-esque for $1-15k depending on how creative you are and how long you're willing to wait for responses
>>
>>103190767
> The cloud stuff you're using has literal million dollar servers on the back end.
So learn to scrape and host it for free doebeit zoebeit boebeit? A lot of proxyhosts do that. Why don't you, sir/ma'am?
>>
>>103190487
I am looking for suggestions on what other people think is good for stories and doesn't have a lot of slop. It doesn't make any sense to go through the whole leaderboard one at a time, especially since they all are just gaming benchmarks anyways
>>
>>103190778
you can. many do. the folks in LOCAL MODELS GENERAL don't, obviously.
We value privacy, autonomy, self determination and control over our own technology.
>>
>>103190797
>you can. many do. the folks in LOCAL MODELS GENERAL don't, obviously.
So can you please just... prove it? Do you have a keycount you can show? :)
>>
File: ComfyUI_00071_.png (1.04 MB, 1024x1024)
for me...
>>
>>103190823
>keycount
What are you asking exactly?
>>
File: Untitled.png (1.8 MB, 1080x3392)
Cut Your Losses in Large-Vocabulary Language Models
https://arxiv.org/abs/2411.09009
>As language models grow ever larger, so do their vocabularies. This has shifted the memory footprint of LLMs during training disproportionately to one single layer: the cross-entropy in the loss computation. Cross-entropy builds up a logit matrix with entries for each pair of input tokens and vocabulary items and, for small models, consumes an order of magnitude more memory than the rest of the LLM combined. We propose Cut Cross-Entropy (CCE), a method that computes the cross-entropy loss without materializing the logits for all tokens into global memory. Rather, CCE only computes the logit for the correct token and evaluates the log-sum-exp over all logits on the fly. We implement a custom kernel that performs the matrix multiplications and the log-sum-exp reduction over the vocabulary in flash memory, making global memory consumption for the cross-entropy computation negligible. This has a dramatic effect. Taking the Gemma 2 (2B) model as an example, CCE reduces the memory footprint of the loss computation from 24 GB to 1 MB, and the total training-time memory consumption of the classifier head from 28 GB to 1 GB. To improve the throughput of CCE, we leverage the inherent sparsity of softmax and propose to skip elements of the gradient computation that have a negligible (i.e., below numerical precision) contribution to the gradient. Experiments demonstrate that the dramatic reduction in memory consumption is accomplished without sacrificing training speed or convergence.
https://github.com/apple/ml-cross-entropy
neat
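Rough sketch of the idea in plain PyTorch (not their kernel, just the math): chunk over the vocab so the full [tokens, vocab] logit matrix never exists at once, keep a running log-sum-exp, and only ever pull out the ground-truth logit:
[code]
import torch

def chunked_cross_entropy(hidden, classifier, targets, chunk=4096):
    # hidden: [N, D] final hidden states, classifier: [V, D] lm_head weight, targets: [N] token ids
    N = hidden.shape[0]
    running_max = torch.full((N,), float("-inf"), device=hidden.device)
    running_sum = torch.zeros(N, device=hidden.device)
    target_logit = torch.empty(N, device=hidden.device)
    for start in range(0, classifier.shape[0], chunk):
        logits = hidden @ classifier[start:start + chunk].T        # [N, chunk], never [N, V]
        new_max = torch.maximum(running_max, logits.max(dim=-1).values)
        running_sum = (running_sum * torch.exp(running_max - new_max)
                       + torch.exp(logits - new_max[:, None]).sum(dim=-1))
        running_max = new_max
        in_chunk = (targets >= start) & (targets < start + chunk)  # ground-truth logit, if it lives in this chunk
        target_logit[in_chunk] = logits[in_chunk, targets[in_chunk] - start]
    lse = running_max + torch.log(running_sum)                     # log-sum-exp over the whole vocab
    return (lse - target_logit).mean()                             # mean negative log-likelihood
[/code]
Peak extra memory here is N*chunk floats instead of N*V; as far as I understand the abstract, the paper's contribution is doing this (and the backward pass) inside a fused kernel so even the chunk never hits global memory.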
>>
>>103190845
>What are you asking exactly?
For the amount of AWS keys with Opus you have.
>>
>>103190890
zero. I'm here for a reason.
I haven't used a cloud model for six month plus
/aicg/ is down the hall
>>
>>103190897
>zero. I'm here for a reason.
That reason being you're a big fat techlet?
>>
>>103190878
I feel like a groundbreaking paper detailing newfound efficiencies in LLMs comes out every week, but I rarely see anything practical come from them on the consumer-grade side.
>>
>>103190900
>That reason being you're a big fat techlet?
Yes. I'm fat, lonely, bald, malodorous and functionally retarded.
We don't have any cloud keys. You're in the wrong general.
>>
>>103190706
das a good OG subaru, needs better proportions for miku to subaru ratio though
>>
>>103190919
post a better general with free opus access
>>
File: ComfyUI_00088_.png (1.12 MB, 1024x1024)
For me...
>>
>>103190954
nasty / deformed looking feet, wtf is this gen
>>
So what's a good model in around the 7B-13B range that is capable of basic erotic RP/discussions?
For the record my current guess is some kind of LLama3 fine-tune???
I want to use it to power my sex doll by putting a BL mic/speaker in her head or around her neck.
I would use a 9DOF sensor with a BL module around her hips to detect when I'm fucking her (movement detection).
Basically the software would have a heuristic to play pre-recorded moans when she is getting fucked, but I want her to also be capable of answering a question while fucking or when we are cuddling. Perhaps have the LLM change some settings using function calling.

Because of this I need to run the model relatively fast so it needs to fit into my 16GB 4070 Ti Super. That way I could account for some quirks of the model by using a classification step and alternating between prompts or multi-agent shit or something.

Also, for my first prototype I want to make it entirely local because unless necessary I would hate to expose dirty talk and sexual stuff to a cloud provider so I'm going for a local ASR -> LLM -> TTS loop.
My biggest concern is the ASR, but let's disregard that for now.
If my shit works I'm sharing the tech on GitHub faggots so please be kind and halp a bitshit crazy dude.
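Not a model rec, but the loop I'm describing is basically this; every function below is a made-up placeholder, not a real API, just to pin down the ASR -> classify -> LLM -> TTS architecture:
[code]
# all functions are dummy stand-ins; swap whisper / your LLM backend / your TTS in behind them

def asr(audio):                 # speech -> text
    return "how are you doing"

def classify(text):             # cheap routing step, e.g. "question" / "command" / "smalltalk"
    return "smalltalk"

def llm(history, text, mode):   # local LLM call; one prompt template per mode, function calling for settings
    return "doing fine~"

def tts(text):                  # text -> audio
    return b"<wav bytes>"

def main_loop(get_event, play):
    history = []
    while True:
        kind, payload = get_event()   # ("motion", None) from the 9DOF sensor, or ("speech", audio) from the mic
        if kind == "motion":
            play(b"<canned moan>")    # heuristic path, no LLM round-trip
            continue
        text = asr(payload)
        reply = llm(history, text, mode=classify(text))
        history.append((text, reply))
        play(tts(reply))
[/code]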
>>
>>103191005
>bitshit crazy
*batshit crazy
Sorry, typo.
>>
>>103191005
Opus is pretty good, have you tried it?
>>
File: ComfyUI_00103_.png (1015 KB, 1024x1024)
>>103190986
For me...
>>
>>103191005
>ASR
dafuq is asr?
>>
>>103190373
Rude. She'll be fine though as FLAOT has contributed to many technological advancements.
>>
>>103191005
GPT-SoVITS is the best local TTS
For LLM, you aren't going to fit anything non-retarded into 16gb.
>>
File: ComfyUI_00107_.png (1.04 MB, 1024x1024)
For me...
>>
>>103191005
Try ministral 8b for speed.
>>
>>103191123
please use a better model that can gen feet retard
>>
>>103191123
Need one with a yellow or black Nissan R34 pls
>>
>>103191077
I am imagining Fernando Alonso with a big-booba Miku and Kurisu beside him.
>>
>>103191032
Not yet. If you mean these: https://huggingface.co/collections/dreamgen/dreamgen-opus-v1-story-writing-and-role-playing-models-65d092a6f8ab7fc669111b31
I'll check them out, thanks.

>>103191127
Thanks

>>103191105
ASR = Automatic Speech Recognition
Essentially, it just converts speech to text.

>>103191113
>GPT-SoVITS
I have seen this before, but I haven't tried it yet. So thanks for the heads up. Looks promising, especially if the voice cloning works fairly well. I could get a lot of clean audio from VR videos (the female talent is close to the mic and the male doesn't speak in most POV stuff) and video games (usually has separate clean audio track in the game files).

>For LLM, you aren't going to fit anything non-retarded into 16gb.
I am/was afraid of that. I'm merely targeting local only first to see how much I can push it with a small local setup. In the end I could always just point my shit at a cloud endpoint or rent a GPU (something like runpod). Anons with a large homelab can always point shit like this to their local setup, but unfortunately I'm not currently in a position to get ~100GB of VRAM.

I have tested some ideas with a discord bot I wrote (text only) and so far some of them seemed to work quite well. I need to do more tests on smaller models, but I have a few ways to select a specific prompt while keeping the context small. Essentially what I'm prototyping now is what you could call an NPC with a behavior tree so I'm not allowing the LLM to stray away. I'm doing multiple calls per input to profile/analyze/tag the input and even rewrite it, but it's currently in a very PoC state. Probably won't work out (reliably enough) in the way I want.
>>
>>103191289
>ASR = Automatic Speech Recognition
whisper is the STT engine that's universally used.
>>
File: ComfyUI_00122_.png (1.08 MB, 1152x896)
For me...
>>
File: ComfyUI_00127_.png (1.16 MB, 1152x896)
For me...
>>
>>103191345
>>103191400
are you just spamming now or
>>
>>103191408
no, I'm done now.
Autism just wouldn't stop
>>
Adaptive Decoding via Latent Preference Optimization
https://arxiv.org/abs/2411.09661
>During language model decoding, it is known that using higher temperature sampling gives more creative responses, while lower temperatures are more factually accurate. However, such models are commonly applied to general instruction following, which involves both creative and fact seeking tasks, using a single fixed temperature across all examples and tokens. In this work, we introduce Adaptive Decoding, a layer added to the model to select the sampling temperature dynamically at inference time, at either the token or example level, in order to optimize performance. To learn its parameters we introduce Latent Preference Optimization (LPO) a general approach to train discrete latent variables such as choices of temperature. Our method outperforms all fixed decoding temperatures across a range of tasks that require different temperatures, including UltraFeedback, Creative Story Writing, and GSM8K.
neato
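If I'm reading the abstract right, the mechanism is roughly this (toy sketch, not the paper's code; the LPO training of the head is the actual contribution and is omitted here):
[code]
import torch
import torch.nn.functional as F

TEMPS = torch.tensor([0.1, 0.5, 1.0, 1.5])   # discrete temperature choices (values made up)

class AdaptiveTemp(torch.nn.Module):
    def __init__(self, hidden_dim, n_temps=4):
        super().__init__()
        self.head = torch.nn.Linear(hidden_dim, n_temps)   # tiny extra layer on top of the LM

    def forward(self, hidden):                  # hidden: [B, D] last hidden state for the current position
        choice = self.head(hidden).argmax(-1)   # pick a temperature per token/example
        return TEMPS.to(hidden.device)[choice]  # [B]

def decode_step(lm_logits, hidden, temp_layer):
    t = temp_layer(hidden)                             # [B]
    probs = F.softmax(lm_logits / t[:, None], dim=-1)  # temperature chosen by the model itself
    return torch.multinomial(probs, 1)                 # sampled next-token ids, [B, 1]
[/code]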
>>
>>103190706
>>103190837
>>103190954
>>103191077
>>103191123
>>103191345
>>103191400
Actual brainrot.
>>
>>103190913
much less then a week and yea its incredibly depressing idk what else to say you can go and learn all this shit in 2-3 months and implement it yourself train a model from scratch an hour on a h100 is like 3 dollars theres plenty of improvements in training too that cut shit down a fuck ton
but do you want to damn a soul to this shit hole ? where every jew alive will try to lobotomise every rostie will use it as a cuck to complain to and write another 50 shades of gray bullshit with where every nigger will use it to sum 2+2 again and again and again and again
i dont maybe this is for the best who knows i got visions a couple of years back in the end i think it will all end well
>>
File: gemini-claude.png (65 KB, 1065x408)
>Gemini-Exp-1114
>+2 Elo with style control on lmsys over predecessor
>32k(!) context, which implies that it's YUGE
>calls itself Claude, an AI assistant made by Anthropic
>likely https://x.com/Yampeleg/status/1855371824550285331
Googlesirs... Our models have plateaued... We will be likely surpassed by free llama4 in Q1 2025... How will we ever recover?
>>
>>103191786
SIR! Google Gemini very good, made by talented Google technical AI engineeers plearse to not look style control sir to not look sir. We beat OpenAI Gemini best languende model haha google search superpower company
>>
>>103191786
Um no, plateauing just means everyone gets to around equal footing eventually. If there is any surpassing, it'll probably be by some small percent, and probably not in all intelligent tasks. Maybe there will be a breakthrough again like transformers but for now this is what we should expect.
>>
>>103191841
>Um no, plateauing just means everyone gets to around equal footing eventually.
I think plateauing means that the best can't get any better; if you can't improve an architecture anymore you have to look elsewhere. Desu I don't think we are plateauing yet, there are still improvements to be made in the training method and the data quality/filtering
>>
https://github.com/linkedin/Liger-Kernel/pull/362

Poggers?
>>
People complain a lot but I think AI is making good progress. I program with AI every day and it's very good.
>>
>>103191862
>People complain a lot but I think AI is making good progress.
maybe that's why AI is making good progress, because people complain a lot, when you have high standards, you are bound to achieve them
>>
>>103191786
seems like a chatgpt latest like finetune
I would not be surprised if it scores even worse on the livebench
>>
File: 1731461884823447.png (574 KB, 512x768)
>>103191862
People will continue complaining about AI until it replaces them.
>>
Futa is gay.
>>
>>103192024
I dislike Futa
t. shota master race
>>
Futa on female is less gay than vanilla sex.
>>
>70 totally organic posts during dead hours at a time when /lmg/ is dead
>>
>>103192049
check inside your anus
>>
>>103190878
In llama.cpp/GGML the softmax for the cross entropy loss is never explicitly written to memory in the first place but always recomputed on-the-fly.
So applying this technique would not yield any memory savings but potentially some better performance.
Though I have concerns about the numerical aspects of using the logit of the ground truth as the softmax scale rather than the highest logit.
My intuition is that a FlashAttention-like approach with a fixup to combine partial results would be more numerically stable.
(I have at this point not read the paper.)
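To illustrate the concern (my reading of it, I haven't read the paper either): scaling by the ground-truth logit can overflow whenever the model is confidently wrong, while scaling by the running max keeps every term in (0, 1]:
[code]
import math

logits = [30.0, -5.0]   # model is very confident about token 0...
truth = 1               # ...but the ground-truth token is 1
# scaled by the ground-truth logit: exp(30 - (-5)) = exp(35) ~ 1.6e15, overflows fp16 (max ~65504)
print(math.exp(logits[0] - logits[truth]))
# scaled by the max logit (FlashAttention-style, with a fixup when a later chunk raises the max):
print([math.exp(x - max(logits)) for x in logits])
[/code]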
>>
>使用种族歧视语言和有意骚扰其他玩家的行为会受到严厉制裁,包括但不限于账号封禁。请尊重他人并遵守Steam社区准则。 (roughly: "Use of racist language and deliberate harassment of other players will be met with severe sanctions, including but not limited to account bans. Please respect others and follow the Steam Community Guidelines.")
Why does Mistral Nemo occasionally speak Chinese on AMD GPU? I don't care much about getting it running well on AMD, but I am genuinely curious what could be the cause
>>
File: 1000001740.jpg (73 KB, 538x679)
>>103191861
lmgpaganda
dont let anyone convince you it isnt over
>>
>>103192090
Hey CUDAdev, you posted a bunch of fundamental llm articles before...do you have any more links? I'm at a point where I really need to learn the mathematical and theoretical foundations to move forward.
>>
>>103190878
this was a cool paper, ty for sharing anon. Not sure if I understood everything correctly, but this seems like an approximate softmax CE with tiling+fusion - similar to the optimizations in flash attention 1? I wonder if this would allow full fine-tuning on consumer gpus to be more feasible
>>
>>103192116
you are gay :)
>>
>>103191005
Here is the non-meme answer. For ASR use Whisper-turbo, it's fast enough for real time and lightweight. Using faster-whisper library should improve that speed even more.
The current sota that fits into 16GB of VRAM is Rocinante-v1.1 12B (mistral nemo finetune), it's really good for its size and you won't find anything better.
And for the TTS as they said GPT-SoVITS is the current best. With a bit of tweaking, it takes 9s to generate 35s of audio on an old T4 (should be faster on your setup).
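For reference, the ASR half with faster-whisper is only a few lines; sketch below, and the turbo checkpoint name is from memory, so double-check it against the faster-whisper README:
[code]
from faster_whisper import WhisperModel

# checkpoint name from memory, verify against the faster-whisper model list
model = WhisperModel("large-v3-turbo", device="cuda", compute_type="float16")

segments, info = model.transcribe("mic_chunk.wav", vad_filter=True, language="en")
text = " ".join(seg.text.strip() for seg in segments)
print(info.language, text)
[/code]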
>>
>>103192169
NTA https://d2l.ai/
>>
>>103192169
>you posted a bunch of fundamental llm articles before
I unfortunately do not remember what you're talking about and I read comparatively few papers about language models in the first place.
Generally speaking, I gained my theoretical knowledge from attending lectures and my practical knowledge from working on projects.
And for what I'm doing the only really relevant theoretical knowledge is I think linear algebra, numerical analysis, and statistics, and those things are not specific to language models.
>>
>>103192169
https://mml-book.github.io
I liked this book, it's not super long/verbose compared to ESLI & ESLII and starts from first principles
>>
File: 00016-1158684101.png (1.82 MB, 720x1328)
Just like how early jpeg/mpeg/mp3 style compression artifacts and bitcrushed sounds of the early internet came to be used as artistic expressions eventually, do you think ai art artifacts and weirdness will be used as an artistic device in the future?
>>
File: hqdefault~2.jpg (21 KB, 209x360)
What's a good alternative to AI dungeon?
>>
>>103192329
Have you tried Erebus-2.7B?
>>
>>103192332
Not yet, I'll check it out later, thank you
>>
>>103192329
Cleverbot
>>
>>103192312
It already is. Artifacts in images, video and even audio can be interesting to observe. Not so much with text. I don't really like those perfect, indistinguishable AI videos. I much prefer the dream-like weirdness.
>>
I need some advice. What's the best way to run on high ram and a single gpu? Surely there must be consumer solutions that stream layers into vram? According to my undergrad understanding llamacpp isn't /true/ async streaming, am I wrong?
>>
>>103192399
>What's the best way to run on high ram and a single gpu?
Loading what you can on gpu and running the rest on ram, as everyone does.
There's no streaming as i understand it. Whatever can be loaded on gpu is loaded, the rest is computed on ram. I suspect the layer swapping overhead between ram and vram would be too big to consider it, if that's what you mean.
>>
>>103192312
>do you think ai art artifacts and weirdness will be used as an artistic device
They've been used since before gpt2
>>
File: imageprocessingtest.png (681 KB, 1532x838)
We truly are living in the future
>>
>>103192430
I thought that with parts of model in ram we are loading the layer to gpu when it's needed for an immediate computation. Instead we can load it when it will soon be needed while gpu is occupied. Good ram can do 80 gb/s, pcie much more, so you get decent inference time given that the gpu is fast enough. Maybe I'm missing something.
>>
>>103192532
>Good ram can do 80 gb/s
Only a tenth of what you'd need. Maybe if you have 12 channels of that you'll be doing ok. Look at EPYC Turin chips
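Back-of-envelope, assuming token generation is purely memory-bandwidth bound (it roughly is) and every generated token has to stream all the active weights once:
[code]
weights_gb = 40             # ~70B params at ~4.5 bpw (Q4_K_M-ish)
dual_channel_ddr5 = 80      # GB/s, typical desktop kit
twelve_channel_ddr5 = 460   # GB/s ballpark for a 12-channel DDR5 server board
print(dual_channel_ddr5 / weights_gb)    # ~2 t/s ceiling
print(twelve_channel_ddr5 / weights_gb)  # ~11 t/s ceiling
[/code]
Real throughput lands below those ceilings, but it's the right order of magnitude.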
>>
>>103192545
Like 1 tps for 70b, and still faster than computing on cpu?
>>
>>103192563
>faster than computing on cpu?
if you're offloading more than 20% of the layers onto cpu, you're essentially running on CPU
>>
>>103192532
Moving ram layers to the gpu will take just as long as moving those weights to cpu registers for processing. Reading ram is slow, either for transfer to gpu or computing.
>>
>>103192572
>>103192579
So, long story short, default lmstudio setup is my best bet?
>>
>>103192598
I don't use lmstudio. Chances are that the defaults are good enough.
>>
>>103192604
>i don't use it but maybe it's fine
I'd like to know what people use and why
>>
I kinda understand now what that shitposter meant, when he said mikuposters are insane.
>>
>>103192642
llama.cpp. lmstudio depends on llama.cpp. I rather use llama.cpp directly. Generally, i want to have the least amount of stuff between me and whatever i want to use. If i could run models with bc (or any calculator) reasonably fast, i would.
>>
>>103192671
Ok, thank you. I went asking around because I saw billions of projects that promise 100x llamacpp inference speed. It's mostly bullshit, but who knows.
>>
File: jarvis.png (208 KB, 1086x641)
>>103192747
Something better may come along, but the field is full of grifters.
A bad example of the grift, but you get the point...
>https://github.com/calebnwokocha/llama.cpp
>>
Context is the biggest issue with LLMs. I feel literal anxiety when I am really into a character, but the token count is reaching its limit and the character begins to change and forget.
>>
>>103192789
Kek
>>
File: granny_dre.png (324 KB, 640x626)
>>103192815
>>
File: 1722663648099686.png (1.7 MB, 1200x621)
>>103192789
>jarvis.cpp
>>
>>103192841
It's a cruel duality of wanting to spend more time with the character while also slowly killing it by adding more tokens to the context
>>
>>103192854
>He's not backing up the entire conversation for when he'll have infinite context
ngmi
>>
What do we think about the E2 F5 tts/voice cloning? I think it released a week or two ago
>>
>>103193074
Cannot form opinions yourself? If you tried it, you know how good or bad it is. If you haven't, you should.
>>
>>103193074
Already depreciated by GPT-SoVITS
>>
>>103189783
cydonia is surprisingly usable for me right now
Nemotron is smart, but I have to run everything at 4 bit on my 3090 and the most I can get is 2T/s. It also has a tendency to bring up the same expressions
Tried magnum v4 72B but it was just a worse qwen, I couldn't even import the instruct template, not sure what's up with that
Rocinante is fast as fuck and surprisingly coherent for a 12B model, but it suffers from meh intelligence and the recall is apparently not that great (though it's been working flawlessly in my test scenarios <8k)
Cydonia is still fast, but more intelligent. It's not on the level of nemotron or qwen, but I can do 10 rerolls in the time it takes 70B to do a single one. Interestingly enough, it's the only model I've tried so far that sticks to the character card for more than 2 turns, the other models I've tried quickly make the characters generic (a completely unhinged and extremely angry demon who hates humans with a passion suddenly helping me after 2 turns? Fuck that man, give me a struggle)
So either quanting is far more damaging than I thought or 70B ain't it
>>
>>103193088
What flavor of cydonia/quant?
>>
>>103193084
I'm asking if it's worth setting up. If it's just as bad or worse than everything that came before, why bother I already tried those.
>>
>>103193086
>GPT-SoVITS
fuck there are so many models now
can you give me a qrd? I think it's kinda hard to work with
>>
>>103193110
venv and a few minutes downloading things. Compare them yourself. gpt-sovits worked fine for me.
>>
>>103193102
v1.2 Q6KL, I was thinking about going for Q8 but Q6KL should be near lossless and I'd rather run the KV cache at a higher precision
>>
>>103192998
>>94536113
>I only have 2 Gb of VRAM, but I have 64 Gb of main RAM.
Yep, this is the real petra. It also matches this screenshot: https://archive.4plebs.org/pol/thread/487155078/#487186513
>>
Is pinokio good enough to make a lot of different environments easily or is it better to install things manually?
>>
>>103193228
Either use whatever the project recommends on just venv. Adding extraneous shit between you and the software is rarely worth the effort.
I've never used anything other than venvs and if i need a specific version of python i don't have, i compile it.
>>
>>103193154
Thanks. You're using chatML?
>>
>>103193228
Use conda or even venv. Do not install things manually and don't use scam shit like pinokio.
>>
>>103193264
Nah, mistral for now, it seems to work well enough
>>
>petra is a kobold discord shill
who would've thought kekypow
>>
>>103193110
>>103193074

You can use pinokio to download it (https://pinokio.computer/) if you really want, though install it normally if you can; I noticed a slight performance hit when using pinokio.
As for cloning, E2/F5 is the best, don't listen to the GPT-SoVITS shilling. SoVITS's English version didn't work for me so I had to brute-force the Chinese one, and I never trained a voice beyond the default settings because I couldn't be bothered to brute-force it again after closing it and forgetting how everything went.
F5 is literally almost perfect, but it still has that problem where it talks too fast, and toggling the "remove silent parts" thing won't fix it. Also, don't leave gaps of silence in the voice sample or it will mess up the output: you get 5 seconds of audio, 2 minutes of silence, then the last 5 seconds at the end. E2 is worse than F5 but it speaks at a normal pace and clones pretty okay; personally it's good enough for me, you can just chuck text in and make an audiobook with it no problem.
The only issue is that it's slow: on my 3060 laptop it would take around 4 days to process around 440k words.

One more thing about GPT-SoVITS: /mlp/ has a voice cloning thread and they made their own UI, maybe theirs works better, I haven't tried it. From the samples others have posted, their tunes are never as good as F5 or E2, but they're passable and much, much faster, so try that if you want. Some say SoVITS handles moans and similar sounds better; it sometimes does and sometimes it just messes everything up, so I wouldn't count that as a plus.
>>
>>103193881
>schizobabble
Yeah sovits is better.
>>
>>103193915
>>103193881
>>103193074
Buy a fucking ad.
>>
File: 1717476284437276.gif (2.11 MB, 640x362)
>>103193971
>>
anyone using textsynth server?
>>
>>103192103
Pretty sure shit like that is caused by meme samplers and temp being too high. babby stuff.
>>
>>103194231
Fabrice Bellard is a cool dude. He gave us tcc, ffmpeg, qemu... but i don't care about online services.
>>
>>103194416
I meant the self hosted version and how it compares to llama.cpp
>>
>>103194576
I see. No source code. I don't care. Why don't you just try it?
>>
>>103194409
temp 0.5, minp 0.01, rep_pen 1.11 range 500, disabled everything else.
>>
>>103193208
>obsessed
>>
>>103194653
Was planning to, just wanted to see if any anons used it.
>>
File: benchmark.png (258 KB, 1640x1176)
>we made another oversized starling that totally beats gpt4 and llama 405b
What's even the point of this? Just another investor scam?
>>
Are local models still subpar GPTslop in terms of prose quality?
>>
>>103194894
BMT is still the best local has to offer, so yes
>>
>>103194894
Sadly yes, but we caught up to old GPT4 on intelligence with llama 405 and Largestral.
>>
>>103194892
umm...
>Looks like the model in the screenshot is a quantized version, It's kinda hard to control the behavior under quantization as the training is done in 16bit. Plz feel free to try unquantized version of the model in direct chat on lmarena.ai (though we did not change the model identity for this round so it still thinks it's Qwen)

https://www.reddit.com/r/LocalLLaMA/comments/1grcx0h/nexusflow_release_athenev2chat_and_athenev2agent/

New finetoon cope dropped
>>
File: file.png (132 KB, 761x770)
>>103194920
>>
>>103194918
From what I’ve seen, even for assistantshit or cooding, some much smaller models seem to be doing great.
But every time I tried a local model for creative writing, it felt like old GPT models (They made a new one that is decent prose wise) but even worse.
>>
>>103194920
>It's kinda hard to control the behavior under quantization as the training is done in 16bit.
fucking gaslighting bitches, most models work fine at Q5+, they're just trying to find excuses for why their model actually sucks, it's Matt Schumer levels of scam all over again
>>
>>103194920
>new
wasn't it common knowledge at this point that the current gen of llms (3.1, qwen2.5) don't quantize well at all even at int8?
>>
File: kek.png (112 KB, 740x788)
>>103194945
Even a year ago already coping
>>103194986
>current gen of llms (3.1, qwen2.5) don't quantize well at all even at int8?
cope for bad models, you know people will run quants, if your model doesn't handle being quanted it's dogshit, simple as
>>
>>103194995
>if your model doesn't handle being quanted it's dogshit, simple as
amen
>>
I believe AGI is possible in 24GB of VRAM.
>>
>>103195008
Qwen-2.5-72b-coder-Bitnet, boom, AGI in 24 GB of VRAM
>>
>>103194986
>wasn't it common knowledge at this point that the current gen of llms (3.1, qwen2.5) don't quantize well at all even at int8?
Cope. Largestral has no problems with quanting.
>>
>>103195012
24GB should be able to push 100GB with BitNet.
>>
>>103195031
100B* woops
>>
>>103195031
I did the math and it was 91b, if you don't count the vram required for inference though, that's why 72b is a good spot, it leaves some room for the context shit
>>
Controversial opinion but I firmly believe that we need better local models
>>
>>103195060
kek
>>
What's currently the best model for noise detection? I have long audio files(1h+) with a lot of background noise that gets transcribed by whisper as speech and I want to cut it out automatically.
>>
File: 1718286702402876.png (457 KB, 1710x822)
https://xcancel.com/akyurekekin/status/1855680785715478546#m
>Just take a few gradients during test-time — a simple way to increase test time compute — and get a SoTA in ARC public validation set 61%=avg. human score!
holy shit
>>
Are there any tangible and notable advancements in the last few months?
>>
>>103195114
I assume you already tried old-school tools like audacity for that. gpt-sovits has a noise removal module (used during training, maybe you can isolate it), but i don't know if it can deal with such long files.
>>
>>103195274
The final piece of the mosaic to achieve AGI
>>
>>103195275
you don't need anything but 'transformers and make it bigger and train it longer'
>>
>>103195274
>another reflection/entropy tier grift
it's all so tiresome...
>>
>>103195323
At least they have *some* numbers to show. Entropix only managed to show a 1B that can count Rs.
>>
>>103195031
>BitNet
If it's as revolutionary as people claim, why has it been over a year and there's no GPU implementation, no real usable bitnet models?
>>
I'm pretty sure it's a goal of some people in this thread to get others to fill up their hard drives with pointless bullshit that isn't actually better than what everyone already has.
>>
>>103195492
>over a year and there's no GPU implementation
To run a 4B model? It works the other way around.
>>
>>103195492
>>103165113
>>
>>103195492
It's actually baffling
>>
>>103195516
Microsoft, Meta, and Apple do not run a charity for Nvidia's benefit and would happily ditch them at a moment's notice if something better came along.
The *only* reason Nvidia has so much leverage to begin with is that they're the only option.

No, the most likely reason is that BitNet isn't as good as people claim, or that it seems too risky an investment when they already have architectures which they know perform well "enough".

I would guess there will be more interest in alternative tech of this type once they realize they've exhausted what they have.
>>
>>103195555
Yeah let's compress the same Internet 100 times in different resolutions that will be obsolete in a week bro. That's definitely better use of resources than trying out novel research ideas
>>
>>103195114
Probably silero vad
>>
>bitnet
china is cooking, qwen3 will be bitnet, trust the plan, 2 more weeks, etc
>>
>>103195594
qwen team acknowledged bitnet when it came out, been radio silence since
>>
https://www.techpowerup.com/328837/gigabyte-launches-amd-radeon-pro-w7800-ai-top-48g-graphics-card
>GIGABYTE Launches AMD Radeon PRO W7800 AI TOP 48G Graphics Card
interesting
>>
>>103195594
qwen3 will be omni not bitnet
>>
>>103195641
>Radeon
lol
>>
>>103195492
I don't get it, every single company doesn't want to touch BitNet with a 15 foot pole even though the first one who manages to make a decent model out of it will be remembered in history forever
>>
File: download.png (2 KB, 300x80)
>>103195748
>>103165386
>>
>>103195748
Almost as if its a dead end.
>>
>>103195775
yeah and? why would companies give a fuck about Nvdia, the majority of them hate buying for overpriced GPUs, that's a win win for them
>>
>>103195641
>price nowhere to be found
Who cares?
You can already buy GPUs with 48 GB VRAM or even more, the problem is that they're too expensive.
>>
>>103195785
Why would they even risk training the BitNet model? Nvidia is known for putting companies back in line when they try to negotiate with AMD
>>
>>103195822
>Why would they even risk training the BitNet model?
Meta has enough cards to not having to deal with Nvdia ever again lol
>>
>>103195795
Companies will buy those cards, eat shit with software support, and dump them at junk prices on eBay, just like what happened with Mi60
>>
>>103195822
https://github.com/microsoft/BitNet
Microsoft literally developed this. When is Nvidia going to stop selling them cards?
>>
Livebench JUST added Qwen 2.5 7B. What's taking them so long to benchmark the entire Qwen series?
>>
>>103195836
Meta is going to buy a bunch more GPUs in 2025
>In its Q2 earnings release, Meta also commented that its infrastructure cost expense will significantly rise in 2025. This is clearly tied to its computing power build-out to create the best AI model it can. Nvidia will be a primary beneficiary of this, making it an intriguing stock for 2025.
>>
>>103195854
NTA but wake me up when they give us a usable model. Code is cheap.
>>
>>103195854
When they release a capable 70b BitNet model
>>
>>103189328
cancer
>>
>>103195748
people don't want to hear it but this is probably more likely because they've tried it and determined it's not really worth it rather than nvidia conspiracies or hating local users
>>
>>103195822
>Nvidia is known for putting companies back in line when they try to negotiate with AMD
that's the point, if they manage to make BitNet work, they won't have to rely on Nvdia's bully tactics anymore, there will be more competition, it'll be the true boom of AI
>>
verdict on cogstudio?
Is it worth going through another BS install to use if you've got a 24gb GPU?
>>
>>103189328
>Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
>Hunyuan-Large released with 389B
What's the incentive of making that? It's clear no one would be able to run it except corpos, and corpos have no incentive to run that because Llama3 is better. It's a literal waste of computing power.
>>
what's the best thing I can fit into a 24gb gpu?
>>
>>103196008
>no one would be able to run it
with cpu offload, giant MoE models are actually a really good deal if you have the memory to load them.
cpumaxxers tend to run them
>>
>>103196028
This. Just a cheaper way to run huge param models since you can get decent speeds on cpu.
>>
>>103195942
How long will it take to design and produce the hardware?
>>
>>103196028
>>103196070

That's good except CPU runs 70B models at 2 t/s at most. 8x7B? Fast. 8x13B? Good quality for performance. 8x20B? Already a bit too slow. 8x70B? Unfeasible, especially if it runs 2 experts so effectively is a 140B.
>>
File: 1607026237334.gif (1.49 MB, 400x560)
>>103196008
>Sarashina2-8x70B
I'm a little late to the party. Anywhere to test drive this before I try and run it locally?
>>
>>103196094
2t/s is fine for one user, speeds above say 10t/s is about as fast as we read anyway. Not usable for servers with multiple users of course.
>>
>>103196094
>That's good except CPU runs 70B models at 2 t/s at most.
That's definitely a (you) problem
>>
>>103196094
How one can enjoy 2t/s
>>
The Japanese output of ezo 72b is so good once you redpill it a bit (its mildly pozzed on a null sampler and empty prompt)
Would it be possible to output jap and have a smaller translation model turn it into less sloppy english? maybe that's more doable from another language that maps to english a bit better?
>>
>>103196151
Sure, if (I) am the laws of physics
>>
>>103195748
>every single company doesn't want to touch BitNet with a 15 foot pole even though the first one who manages to make a decent model out of it will be remembered in history forever
Bitnet things.
- Massive training cost, low inference cost
You'll get a PRODUCT on the MARKET faster and cheaper by doing old shitnet instead of new bitnet.
- Pointy Haired Boss concerns
How much more expense is there to make it Safe™ and Aligned™?
How expensive will it be to memory hole double plus ungood truths that The Party might discover after release?
- Is it even useful?
We're still in an era where there has been no true Killer App in AI.
You can do some fun art things but it's already politicized between copyright lawyers thirsting for moneyblood and Luddites throwing their wooden shoes at fun prompters.
Text is starting to get okay for translation but we've had machine translation for a long time using more easily controlled methods. (LLMs just kinda do what they want, hallucinating or mixing up context at will.) Unless it can start doing really impressive and useful things, like telling you that your dog wants steak or letting you ask your cat to stop scratching your sofa when there's a scratching post right there next to it, it's not a game changer.
Voice has been mostly getting shade since for every funny music parody we have a hundred scammers trying to turn a dime with a voice clone. Conversely, it's really nice at transcribing audio to text but that's an accessibility feature that's already normalized since Google/YouTube's been on top of it for a while.
Music, like image art, is even more about lawyers and arguments about taste.

Till there's a sure place in the market that will pay the heightened training costs, there isn't much reason for a big company to invest in 1.58 when there is steady, easy, risk free iteration on shitnet models. As long as they can keep posting slightly better than the other guy benchmark numbers, they shall continue to do so.
>>
>>103196150
>>103196174
2 t/s is literally unbearable. 3 t/s is suffering. 4 t/s is difficult but tolerable. 5 t/s is okay. 6 t/s and more is good. The suffering grows exponentially as you approach 0 t/s, it's not linear.
>>
>>103196192
>laws of poverty
with the right cpumaxxing setup you can get 8t/s+ on 70b
>>
File: image.png (126 KB, 796x1272)
What did they mean by this?
>>
>>103196017
Magnum 27B
>>
>>103196218
Yeah sure, how do you get RAM bandwidth above 7000 MT/s when it's (pretty much) the very top option available to general consumer?
>>
>>103196195
>How expensive will it be to memory hole double plus ungood truths that The Party might discover after release?
That's an easy one, just filter the training data.
>>
>>103196119
i'm pretty confident it's a nothingburger, they probably artificially inflated it to make it look better to investors
>>
>>103196230
hm, I think I heard good stuff about this one. is there anything smaller if I want more context?
>>
>>103196216
This man knows of what he speaks. And just to add, if you want to set up something with TTS that isn't absolute dogshit, 8 t/s is the absolute minimum.
>>
speaking of livebench, i noticed that qwen2.5 7b is better than the latest 3.5t, and qwen2.5 72b is better than the latest 4t

local hasn't won yet but we are doing fine i'd say
>>
>>103196195
>How much more expense is there to make it Safe™ and Aligned™?
>How expensive will it be to memory hole double plus ungood truths that The Party might discover after release?
A non-issue for actual decent human beings, racist chuds stay seething <3
>>
>>103196286
nah there always will be a 'lag' for local but that's an acceptable tradeoff as long as there's still progress
>>
>>103196236
skill issue. consume better
>>
>>103196245
Your reading comprehension is lacking.
>after release
Emmanuel Goldstein's agents are always working to subvert The Party. So even though you've filtered the training data, the Brotherhood will have nonetheless found ways to cause Wrongthink to emerge and we must be ready to respond by correcting the model. If the model can't be corrected, then we will need to increase the chocolate ration from 4g to 3g.
>>
>>103196286
Now if only we could also get a model that trades blows with Claude at creative tasks.
>>
>>103196286
Qwen-2.5-Coder-72B wen?
>>
>>103196286
It's great but Livebench is a bit biased towards coding and academic type knowledge. We need a niche knowledge benchmark.
>>
>>103196304
the problem isn't the lag, is the supposed "moat" (which is a meme as we can see).

if we get the same thing corpos get but after some time and it runs on consumer hardware, then it's a win for us
>>
File: image7446[1].png (1.23 MB, 930x1172)
>>103196236
Git gud at overclocking
>>
>>103196317
I don't know but I'm hopeful that it might not suck.
I threw my usual cursory programming questions at 32B at Q8 and it flopped on shit that Llamas handle well. Python 101 was passable but complicated Python and Java refactoring were both a bust. I was expecting fire and I got a fizzle.
>>
>>103196348
>We need a niche knowledge benchmark.
imo it's retarded fitting niche knowledge inside models, i think people will understand this in the long run. stuff like rag, infinite context, ttt, etc... can all solve the niche knowledge "issue" while keeping the actual "reasoning" core small

in 5 years it's gonna be laughable how ancient the current tech was
>>
>>103196365
>oc championship
speedtranners look sane in comparison
>>
>>103196375
>in 5 years it's gonna be laughable how ancient the current tech was
We won't mind as long as we're laughing at it alongside our ai-wives.
>>
>>103196372
Why are redditors praising so much then?
>>
>>103196375
We'll have ASI in 5 years
>>
>>103196227
Elon: I'm cutting the funding until I'm certain you will stay nonprofit. If I keep funding you, you'll scam me.
Sam: suuurrreeee...
>>
>Adaptive Decoding via Latent Preference Optimization
https://arxiv.org/abs/2411.09661
> During language model decoding, it is known that using higher temperature sampling gives more creative responses, while lower temperatures are more factually accurate. However, such models are commonly applied to general instruction following, which involves both creative and fact seeking tasks, using a single fixed temperature across all examples and tokens. In this work, we introduce Adaptive Decoding, a layer added to the model to select the sampling temperature dynamically at inference time, at either the token or example level, in order to optimize performance. To learn its parameters we introduce Latent Preference Optimization (LPO) a general approach to train discrete latent variables such as choices of temperature. Our method outperforms all fixed decoding temperatures across a range of tasks that require different temperatures, including UltraFeedback, Creative Story Writing, and GSM8K.
I don't remember if papers anon posted this.
>>
>>103196303
>actual decent human beings
That's a weird way to say niggercattle.
>>
>>103196375
>stuff like rag etc... can all solve the niche knowledge "issue"
fuck no they can't once the model hits the rag database to see some niche info it more than likely already said something retarded in the last message, refute this

I want the model to know and be able to reference say for example what mesugaki is without me having to tell it, how do you do that with rag?
>>
>>103196386
Yeah, OCing at the level where you need LN2 is gay, but there are more reasonable setups to get DDR5 to 9,000 MT/s and beyond.
>>103196303
Shitty bait
>>
>>103196404
>Why are redditors praising so much then?
>>
>>103196236
>general consumer
cpumaxxer with 12 channel DDR5 is anything but general consumer
>>
>>103196451
>>Why are redditors praising so much then?
>>
>>103196409
nothingburger, we need new paradigm asap

>>103196426
>once the model hits the rag database to see some niche info it more than likely already said something retarded in the last message, refute this
unironically skill issue. also
>shitty rag we have now will never improve

>mesugaki is without me having to tell it, how do you do that with rag?
when coomers finetune models on mesugakis you are implicitly "telling" the model that you want it to focus on cooming. adding this information in the prompt/rag db/whatever is exactly the same thing in practice
>>
File: purpleguy.jpg (31 KB, 602x357)
>>103196365
cool, it's still 3 t/s
>>
>>103196375
No, it's already been shown multiple times that models perform better at tasks involving some knowledge when they saw it during training compared to when they are given the information in context. If you think about it, it makes sense why.
>>
>>103196404
Perhaps because they saw big numbers on the benchmarks?
Do they even make software or just talk about it?
My Java refactor test is drawn from an actual issue I had. It's not very complicated, but I had it as a copy paste edit job because it was a lot easier to just do that than solve the actual problem, which involves Java's most notorious warts, the primitive versus boxed bullshit and the arrays aren't containers bullshit.

L3.1-Nemotron mildly impressed me, because while it's a bit too chatty on simple questions it detected, explained, and worked around those issues before outputting any code. A few other L3's got it right, but most would need to be told that they got those details wrong before then issuing a reasonable fix.

Hopefully if I get off of my ass and get back on my projects I'll have a more comprehensive collection of code problem tests.
Maybe I'll make a program for that. It isn't too hard to automate sending shit to and from LlamaCPP (or a running Kobold instance) right?
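Right, it's just an HTTP POST. Minimal sketch against llama-server's /completion endpoint (KoboldCpp exposes a similar /api/v1/generate if you'd rather point it at that):
[code]
import json, urllib.request

# talks to a llama-server instance started with e.g.:  llama-server -m model.gguf --port 8080
def ask(prompt, n_predict=512, url="http://127.0.0.1:8080/completion"):
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict, "temperature": 0.2}).encode()
    req = urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

for question in ["Refactor this Java method: ...", "Explain the autoboxing pitfall in: ..."]:
    print(ask(question))
[/code]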
>>
>>103196468
current models using current training methods, yes. this doesn't change the main point: niche info should stay outside, we need better/faster/smaller "reasoning" cores
>>
>>103196456
>12 channel DDR5
I'm not sure if that's even technologically possible ATM. That will certainly require 2 CPUs.
>>
>>103196459
>>103196479

>adding this information in promt/rag db/whatever is exactly the same thing in practice
of course having to give the model an entire fucking dictionary with usage definition eating up context is the same as the model knowing when to use something naturally

if they manage to filter unsafe data properly, you'll use rag to teach models what a cock is I guess? Or waste context on explaining full human anatomy?
>>
>>103196459
>new paradigm
Boltzmann brain in a jar. Human-level general reasoning, spatial understanding, etc. (Effectively) infinite context.
>>
What advancement or feature do (you) predict for LLM's in the year 2025?
>>
>>103196483
You're on the right track
>>
>>103196491
you are still missing the point, re-read the whole reply-chain

it's retarded having a model that knows EVERY single niche thing, what we need are SMART models that can "learn" the niche things that you want (again, whether using a non-retarded rag, infinite context, ttt, etc...)

current llms are unironically hitting the ceiling in terms of reasoning, no matter how hard they push o1 and the "inference time compute" meme.
>>
>>103196483
https://rentry.org/miqumaxx
>>
>>103196538
>it's retarded having a model that know EVERY single niche thing
Yet Claude is the best and it's obvious they don't actually filter shit with some of the stuff it knows
>>
>be homo
>Fall for pre op mtf is just like a boyfriend meme
>Have a lovely date
>Next day they find out my hobby is AI
>Total troon rage, they burn the entire mother fucking friendship to the ground.
Ahh ahh mistress...
>>
File: screenshot.png (5 KB, 1480x80)
>>103194231
>textsyn
>>103196189
>a smaller translation model turn it into less sloppy english?
just tried the pair, piping ezo 72b ero output to the madlad400 7b translation model in textsynth
breddy gud considering the complete lack of effort required.
Shame that textsynth is binary only but it sure does fast, uncensored translation!
>>
>>103196512
BitNet implemented, next year for sure. And maybe a proper CoT model from one of the big players >>103196510. Nothing else other than that. Transformers are hitting a wall. Disregard breakthrough claims that involve
>samplers
>synthetic data including self-play or anything that makes an LLM rate your answers
>>
>>103196538
And obviously I'm talking about stuff that's possible now, not le magic bitnet just 2mw memes, like your ideal learning models
>>
>>103196556
claude is the best because it's probably 1T+ parameters and it uses the whole internet as its dataset

as i've said, it's retarded. i didn't say it doesn't work
>>
>>103196540
I don't think we even have a modern cpumaxxer here any more, since the Turin release.
Or did someone lurking here drop big cash to upgrade and hasn't said anything?
>>
>>103196479
What you are suggesting is equivalent to magical thinking. There is no world where a model can suddenly be good at something it has had limited time to learn, whether via infinite context methods, test time compute, etc., because those, as the names imply, use more compute, something consumers already have in limited quantity. You are essentially moving the "training" to the edge device (your computer) instead of the incredibly efficient supercomputers used to train these models. What COULD happen is a sparse architecture, like a MoE, where you can pick and choose the knowledge you want it to have. But that still requires the total parameter count at pretrain time to be the same (big).
>>
>>103196540
I mean, it's cool, but...
>$6k USD
...that's in no way a reasonable amount of money to spend on a PC. Unless you rent it out as a server, mine on it, or smth else.
>inb4 poorfag
No, I can technically afford that, but it just feels wrong. $6k on what is effectively just entertainment? Come on.
>>
>>103196303
Based.
>>
>>103196598
>and it uses the whole internet as dataset
if you're implying it does a search for every prompt it receives, I doubt it. If you're talking about pretraining data, then yeah, that's what our models could have as well if not for the retarded filtering. We need more data, not less, unless you consider phi to be the best thing ever.
>>
>>103196459
>nothingburger, we need new paradigm asap
this anon fucks
at some point someone has to call out this waste of GPU time on training. When you can oneshot-classify for "creative" and apply a multiplier to get your temperature, what did you achieve with your dumbass layer? It could have been an explainable layer instead of another black-box bullshit layer.
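To make the point concrete, here's roughly how dumb that replacement could be (toy sketch only; the prompt wording and the 0.3/1.1 values are made up, and it reuses a local llama.cpp /completion endpoint):
[code]
# Toy sketch of "oneshot classify, then apply a multiplier": ask the model once whether
# the request is creative, and pick the sampling temperature from the answer.
# All numbers and prompt wording here are made up for illustration.
import json
import urllib.request

def ask(prompt: str, temperature: float, n_predict: int) -> str:
    """Hit a local llama.cpp /completion endpoint and return the generated text."""
    payload = json.dumps({"prompt": prompt, "temperature": temperature,
                          "n_predict": n_predict}).encode("utf-8")
    req = urllib.request.Request("http://127.0.0.1:8080/completion", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

def pick_temperature(user_request: str) -> float:
    """One cheap classification pass decides the temperature for the real generation."""
    verdict = ask(
        f"Answer yes or no only. Is this a creative-writing request?\n{user_request}\n",
        temperature=0.0, n_predict=3,
    )
    return 1.1 if "yes" in verdict.lower() else 0.3

request = "Write a short story about a lighthouse keeper."
print(ask(request, temperature=pick_temperature(request), n_predict=400))
[/code]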
>>
>>103196538
Isn't o1 literally "make 10 responses, then make the AI pick the best one"?
>>
>>103196615
>laws of poverty
>>
>>103196615
You can buy a specced-out M4 to run LLMs three times faster with that $6k
>>
>>103196639
>didn't read
I'm an adult, I'm not going to spend 6 grand on a toy.
>>
>>103196639
>>laws of poverty
>>
>>103196637
no goyim it's basically AGI just wait two more weeks until they release the full model and you'll see
>>
>>103196648
Good point. I think it would also have a TB of RAM? Guess Steve Jobs kinda won here.
>>
>>103196611
you are thinking in terms of current shitty tech, i'm talking about upcoming new paradigms
>>
I'm trying this as a character note (depth 0, system role) with Nemotron 70B, and I can't yet tell whether it works at all. It's for an RP with a lot of characters, but even with only 4 having come up so far, they all quickly started talking the same.
[Remember to maintain each character's characterization and manner of speech based on any notes about them from the prompt and from their appearances in the game so far.]

Anyone already using something like this with success?
>>
>>103196648
How many Macs do you need to run the 389B model from the beginning of this conversation?
>>
>>103196666
so unironic 2mw like bitnet then
https://huggingface.co/1bitLLM/bitnet_b1_58-3B/tree/main
>8 months ago
meanwhile, our current models could be much better if they were never filtered in the first place
>>
>>103196637
I believe it was
1. Use CoT, write a response.
2. Write a criticism of the most recent response and, based on that criticism, write a new one (iterate several times).
3. Output the last response plus a fake CoT that isn't the actual reasoning.
Something like the sketch below, anyway.
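Pure speculation about what OpenAI actually runs, but the loop described above would look roughly like this (generate() is a stand-in for whatever backend you use; the prompts are made up):
[code]
# Sketch of the draft -> critique -> revise loop described above.
# Nobody outside OpenAI knows the real recipe; this only mirrors the three steps.

def generate(prompt: str) -> str:
    # Stand-in for any completion backend (llama.cpp, Kobold, an API, ...).
    raise NotImplementedError("plug your favourite backend in here")

def o1_style_answer(question: str, iterations: int = 3) -> str:
    # 1. CoT first draft
    answer = generate(f"Think step by step, then answer:\n{question}\n")
    for _ in range(iterations):
        # 2. Criticise the latest draft, then rewrite it using that criticism
        critique = generate(f"Question: {question}\nDraft answer: {answer}\n"
                            "List every flaw in the draft:")
        answer = generate(f"Question: {question}\nDraft answer: {answer}\n"
                          f"Criticism: {critique}\nWrite an improved answer:")
    # 3. Only the final answer is shown; a sanitised "reasoning summary" would be
    #    produced separately rather than exposing the real intermediate drafts.
    return answer
[/code]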
>>
>>103196687
It's not even worth running. I managed to Nala-test it over the sample page they had set up on HF
>>
>>103196693
lmao and for a moment I legitimately thought they came up with something revolutionary
>>
>>103196666
There are no upcoming paradigms that will be able to give you free performance gains on ICL/TTT. By definition, those require literally using more compute to process the new information. If you have a few H100's to make the processing time of those new paradigms tolerable, then good for you. That is irrelevant for everyone else here.
>>
What's the best finetune of Smallstral? I hate their instruct format so I just need something that works with chatml/alpaca/etc.
>>
>>103196650
>That's definitely a (you) problem
>Sure, if (I) am the laws of physics
>Here is a complete setup to achieve this
>I'm poor I can't afford!
So, it's still a (you) problem
>>
>>103196701
That's not the point.
>>
>>103196728
>I'm poor I can't afford!
I literally said I can. What's up with (you) and (reading)?
>>
>>103196663
maybe, if they end up with more than the 192GB the M3 topped out at, and they also get a better way to process context.
I'd still love to see how a Mac would perform when used as an RPC backend to a machine with a proper GPU.
>>
>>103196744
>I literally said I can
nta, but if you can't buy it consequence free, then you can't really afford it.
>>
>>103196757
This.
Poor people are very prone to confusing having some money with being wealthy.
>>
>>103196722
You will not find a good Mistral Small fine-tune using a non-Mistral instruct format. The Mistral Small base model wasn't released, only the instruct-tuned model. No one competent would fine-tune over an instruct tune using a different instruction format. The only reasons for doing that are being a script kiddie who is using someone else's pipeline and can't figure out how to change anything, or being an abject moron.
>>
>>103196757
>>103196768
I mean, fair point, but I seriously doubt anyone in this thread would be able to spend $6k on a whim like it's some waiter's tip.
>>
>>103196701
I also ran it, and it just spat out a stream of nonsense. I couldn't get it to do a decent completion. Possible skill issue though, since I'm not used to using base models.
>>
>>103196810
>The Mistral Small base model wasn't released, only the instruct-tuned model.
Didn't notice. Yeah, that sucks. You are right.
>>
File: 1702475860309760.jpg (1.55 MB, 1280x1760)
>>103196822
>>103196822
>>103196822
Next Thread
>>
>>103196818
Not on a whim, but my 2 dedicated AI servers cost more because it's fucking cool to have stuff like this at home.
>>
>>103196818
I spent $10k on my build.
>>
>>103188780
>>103188780
>>103188780
Real thread. Steer clear of the spam.


