/g/ - Technology


File: Komattamiku.png (1.52 MB, 832x1216)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103113157 & >>103102649

►News
>(11/08) Sarashina2-8x70B, a Japan-trained LLM: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b
>(10/31) Transluce open-sources AI investigation toolkit: https://github.com/TransluceAI/observatory

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>103113157

--Paper: BitNet a4.8: 4-bit Activations for 1-bit LLMs:
>103118383 >103118501 >103119764
--Papers:
>103118494 >103118546
--Llama.cpp development and feature implementation discussion:
>103113723 >103114180 >103114418 >103114447 >103114480
--User experiences with high-context models and VRAM limitations:
>103113669 >103113709 >103113740
--OpenCoder large language model is unimpressive:
>103126003 >103126036
--LoRA vs full fine-tuning, and the potential drawbacks of LoRA:
>103125723 >103125808 >103125827 >103125847 >103125880 >103126067
--INTELECT 1 decentralized training project update:
>103114776 >103114805 >103114907
--Current state of therapybots and dedicated talk therapy/CBT/psychoanalysis bots:
>103124363 >103124706 >103124912
--Anon struggles with Nvidia GPU and old CPU, lacks AVX support:
>103117863 >103117894 >103117963 >103117993 >103118014 >103118046 >103118199 >103118283 >103118047
--Anon asks about the best AI model for creative tasks, and other anons discuss the pros and cons of various models:
>103116893 >103116948 >103116986 >103117013 >103117087 >103117109 >103117134 >103117363 >103117459
--Udio's artist imitation feature and its filtering:
>103113208
--Sarashina2-8x70B, a Japan-trained LLM:
>103121587
--Preventing koboldcpp from overshooting token limits:
>103125578 >103125664 >103125699
--ERP model discussion and Mistral model suggestions:
>103121406 >103121434 >103121448
--Anons discuss new AI model releases and future developments:
>103117222 >103117349 >103117427
--Anon discusses Japanese live translation accuracy:
>103115694 >103115759
--MILU benchmark for Indian language LLMs released:
>103121359
--Llama-3.1-Centaur-70B: A Foundation Model of Cognition:
>103117024
--Miku (free space):
>103115497 >103119267 >103123906 >103125130

►Recent Highlight Posts from the Previous Thread: >>103113163

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
*sniiiffffff*
>>
>>103126193
sex with miku
>>
File: 1731108575050.jpg (313 KB, 1254x1597)
>>103126194
Teto will remember this.
>>
>>103126277
l-lewd
>>
>>103126277
Who?
>>
>>103126201
>*BRAAAP*
>>103126251
>*sniiiffffff*
/lmg/
>>
File: mylove.png (1.5 MB, 2560x1440)
>>103126316
>he doesn't know the real queen of /lmg/
>>
>>103126358
how soft is she?
>>
I love LMG
>>
>>103126358
Looks generic, like any other cutesy waifu of the month.
>free soft
If you have a free time, ironic.
>>
>>103126358
Drill-headed baka
>>
>>103126497
*Drill-haired baka
>>
>>
>>103126521 (Me)
lol, I just noticed picrel is dated 11/06/2023, I guess some things never change.
>>
Why is model size (weight count) so important to textbot models, but seemingly unimportant for stable diffusion models?
>>
>>103126580
Image gibberish still looks okay
Text gibberish is unreadable
>>
>>103126580
Fundamentally different architectures.
>>
I'm sure this has been asked before, but what makes more sense? Building a custom computer with 4x4090? Or just getting an M4 Max with 128gb of RAM. The latter seems more cost efficient, or am I missing something?

I just wanna RP with these fat models
>>
>>103126608
With mac you also get a "just werks" OS, more than enough for llm stuff, but be prepared for slow prompt processing and generation. Carefully pick whichever you like more, because you'll crash into buyer's remorse pretty fast in both cases anyway.
>>
>>103126608
The latter is slower but uses significantly less power. Before you go off the deep end, try some models out on openrouter first.
>>
>>103126630
>With mac you also get "just werks" OS,

Everyone takes the "new mac is good for AI" bait.
Then they learn that Apple has locked down the use of software that they haven't been paid to allow.
>>
>>103126641
Soon to be on windows too, just like it happened with gayming and denuvo.
>>
>>103126603
Combining diffusion with transformers seems to have worked well for image models and I think that's the next step for LLMs. Most-probable-next-token prediction sucks because the model doesn't know how long to generate for, and when it makes mistakes they accumulate, making the output worse because it has a difficult time self-correcting.

Transformer diffusion models would let the model scaffold what the output should look like and iterate on it until it approximates an optimal response.
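
Something like MaskGIT-style iterative decoding, I imagine. Toy sketch of the loop, where fill_masked() and confidences() are made-up stand-ins for a text-diffusion model that doesn't exist yet:

def iterative_refine(prompt, length, fill_masked, confidences, steps=8):
    tokens = ["<mask>"] * length                    # start from an all-masked scaffold
    for step in range(steps):
        tokens = fill_masked(prompt, tokens)        # fill every masked slot in parallel
        if step == steps - 1:
            break
        conf = confidences(prompt, tokens)          # per-token confidence scores
        k = int(length * 0.5 * (1 - step / steps))  # re-mask fewer tokens each pass
        for i in sorted(range(length), key=lambda i: conf[i])[:k]:
            tokens[i] = "<mask>"                    # low-confidence spans get another pass
    return tokens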
>>
>>103126641
if you're tech literate enough to run AI stuff this isn't an issue
>>
>>103126691
if you're tech literate enough to run AI stuff then you're running linux, because as bad as the linux desktop is, windows and mac are worse being designed for tech illiterates
>>
>>103126686
This may never happen, because a diffusion LLM would give too much extra control over output results, including pos/neg prompting, and that is (((dangerous, unsafe))), etc etc. Same for finetuning, and look at how one dedicated ponyfucker team managed to shit out SOTA for porn pics.
>>
>>103126667
They're certainly moving in that direction.

>>103126686
But can we afford the iterations? This shit is already ass slow unless you take out a mortgage and now we're going to want to chew the cud four times like a cow hoping a coherent document will emerge from the latent space?

I agree with the problems of best next token, but clearly it does function on one pass when the model is sound enough. And when you're not using new Kobold.
>>
>>103126608
>£2000 for 64gb mac mini
>£3000 for 96gb 4*3090; 4*650+400 for machine
>£3100 for 96gb mac studio
>£5000 for 128gb mac studio
Huh. I didn't think it was that close.
>>
>>103126750
Wait for the hopefully-192GB Ultra version. Then you could run 405B at a decent quant.
>>
>>103126608
Former for speed
Later for running servers
>>
>>103126608
>>103126750
wouldn't the prompt processing be kinda shit though

like yeah you could probably run big ass models but it'll take forever to compute long contexts unlike a 4090 stack
>>
>>103126800
prompt processing is overhyped as an issue for macs imo, it's only a problem if your use case checks all 3 of these boxes
>huge model
>long context
>can't cache anything
for most RP / general chat tasks this isn't a problem, all of the prompt except your most recent message is cached and even with the biggest models your TTFT will be <10s. maybe if you're doing agentic code stuff with huge codebases or if you're a groupchat fag who refuses to compromise on prompt formatting or something it's an issue, otherwise not really
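
also if you're rolling your own frontend against a llama.cpp server instead of ST, the caching knob is per-request. minimal sketch in python, assuming llama-server on localhost:8080 and the cache_prompt field from the /completion docs (check your build, this stuff changes):

import json, urllib.request

def complete(prompt):
    # cache_prompt=True keeps the KV cache for the shared prefix around,
    # so the next request only prompt-processes the new suffix
    body = json.dumps({"prompt": prompt, "n_predict": 256,
                       "cache_prompt": True}).encode()
    req = urllib.request.Request("http://localhost:8080/completion", body,
                                 {"Content-Type": "application/json"})
    return json.loads(urllib.request.urlopen(req).read())["content"]

chat = "<card + full history up to your latest message>"
print(complete(chat))  # first call pays full prompt processing, later calls don't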
>>
>>103126794
>£5800 for 192gb mac studio
>>
>>103126800
https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
>>
>>103126864
That would be worth it. Prob run 405B at 4 bit at a readable speed. Would be set if L4 also has a big boy version.
>>
>>103126637
It's not going to use less power when you have to wait 10 minutes instead of a couple of seconds to process a prompt.
>>
>>103126907
It's not gonna be anywhere near that long. Prob be around 50 t/s. With caching that is not gonna be bad at all.
>>
Election models dropping soon
>>
>>103126932
That would be on 405B I mean.
>>
>>103126193
Tetoification?
>>
https://x.com/rohanpaul_ai/status/1855029275898089839
>>
File: 1708675238102095.png (1.15 MB, 762x762)
>>103126193
BBC SLVT
>>
>>103127077
>• Outperforms closed-source and open-source LLMs on InfiniteBench
>• Average score: 68.66 (vs. 57.34 for GPT-4)
>• Enables Llama3-70B-Instruct (8K context) to process 1280K tokens
>• Faster inference: 2 GPUs for 128K tokens (vs. 4 GPUs for standard decoding)
Big if true
>>
>>103127115
Trve...
>>
>>103126193
miku <3
>>
>>103127077
>>103127118
>1. Map stage: The long input text is divided into chunks, and an LLM extracts necessary information from each chunk.
Into the trash it goes
>>
>>103127077
Sounds too good to be true, I bet it only works for meme marks and would fail for anything more complex like "Write a summary of the text."
>>
should I buy an M4 Max macbook pro for LLMs?
>>
>>103126608
macs will be e-waste after a few years
4090's might unironically
>>
>>103127140
...raise in value
saw 15 minute timeout for captcha and gave up typing
wtf gookmoot
>>
>>103127115
trvke albeit
>>
https://x.com/jeremyphoward/status/1855018996636238292
>>
Trying to get SillyTavern running on Nobara for the hell of it. Installing and running it manually (clone repo, cd sillytavern, start.sh, all that) works perfectly fine, and downloading the launcher also goes well, but the moment I start it up from the launcher shortcut and try to run SillyTavern, the window will "blink", the same options are still there, and if I try launching SillyTavern again from there, it just crashes. Followed the instructions to a T. What am I doing wrong?
>>
>no one in /lmg/ has made AGI yet
it's over
>>
>>103127297
>Expecting anything of value from /a/ rejects
Top lel
>>
>>103127272
>Nobara
How's that working out? I heard about it recently, seemed like it just came out of nowhere.

I'm probably going to change distros this weekend, not sure which I'll go to.

I really want a desktop environment that doesn't corrupt itself over time (KDE, XFCE both seem to slowly fall apart) or just fucking suck (anything Gnome related seems deliberately a pain in the ass).
>>
>>103127243
>>103127077
At least post a summary of what is in the URL, Xitter tranny.
>>
>>103127339
Not really the best person to ask since I've used Win10 for years and am leaping into the thick of it myself, but I do have some decent experience with Linux through distro-hopping and frequent use of the Steam Deck. I CAN say, however, that you won't like Nobara since it's based around either KDE Plasma or GNOME. I think the only "major" difference between it and mainline Fedora is just the pre-configured tweaks and what it comes with right out the gate. As a Wintard, Mint's simple and solid enough on Cinnamon.
>>
>>103126193
Why is the sweat on her face white?
>>
It turns out that Sarashina2 is MixtralForCausalLM based, so it might just work out of the box with lcpp.
I'm quanting it now to test and see if it's worth running.
>>
>>103127541
It's a base model btw, and a bad one at that, from what I saw of the 7b/13b models.
>>
Thanks for the inspiration.
>>
https://www.youtube.com/watch?v=Tw696JVSxJQ
>>
>>103127692
I like this Miku+Teto
>>
>>103127698
This makes me wonder, if Elon were posting on /lmg/ about Grok2 and Colossus, would it be okay considering it's his model and he can run it locally at his property?
>>
File: Happening.png (201 KB, 535x739)
>>103127297
Sam will beat us to it
>>
File: MikuSick.png (1.86 MB, 1112x1344)
>>103127541
>context length = 8192
mfw
>>
>>103127479
That's not sweat...
>>
Good night.
>>
>>103128329
goodnight
>>
File: 1304376955947.png (299 KB, 500x375)
>Hallucinates all the time
>Will give confident answers that are completely wrong
>Will say the first thing that 'sounds right' and then justify it post-hoc
>Have to explicitly tell it to "think" or it doesn't at all (???)
>Biased
>"Understanding" completely breaks down at the slightest deviation from the training set
>Will "reason" and "think step by step" with glaring errors in logic
>Completely lacks common sense and basic intuition about physics
>Can't parse blatant info in the current context
>Can't follow simple instructions
Are they ever going to solve these problems with humans or are we just fucked?
>>
Say I want to get a job deploying LLMs. What would I need to learn? Is it as easy as throwing together a demo with vllm and showing it to some boomer hiring manager?
>>
>>103128495
yes
just make sure you only promise to 'assist' or 'summarise' or whatever so they're not expecting perfect accuracy
>>
How is it possible that ollama can do vision but llama.cpp can't? ollama is literally just a convenience wrapper for it, right?
>>
File: wfeddbs2smzd1.png (250 KB, 960x365)
Qwen 7B coder is on par with GPT4 turbo. America is finished. This is why they want to ban GPU sales to China.
>>
>>103128740
i thought the idea was that llama.cpp was pretty much just a library
>>
Jamba gguf status?
>>
molmo.gguf??
>>
>>103127077
Without having read the paper, my intuition is that for this technique to work it is critical to get the data format for inter-chunk communication right.
But at the same time I would expect that there isn't one data format that works equally well for all tasks so this technique would be very fiddly in practice.
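The rough shape of the technique as I understand it, with llm() as a hypothetical prompt-in/text-out call; the inter-chunk data format I mean is the notes string below:

def map_reduce_answer(document, question, llm, chunk_len=8000):
    chunks = [document[i:i + chunk_len] for i in range(0, len(document), chunk_len)]
    # Map: each chunk is handled independently, so this part parallelizes
    # across GPUs (presumably where the 2-vs-4-GPU claim comes from).
    notes = [llm(f"Question: {question}\n\n"
                 f"Extract every fact relevant to the question from this excerpt, "
                 f"or reply NONE:\n{chunk}") for chunk in chunks]
    notes = [n for n in notes if n.strip() != "NONE"]
    # Reduce: answer from the extracted notes instead of the full text.
    return llm(f"Question: {question}\n\nNotes:\n" + "\n---\n".join(notes)
               + "\n\nAnswer:")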
>>
>>103128968
when will chatgpt be able to do your job? soon, years, or never?
>>
>>103128740
If I remember correctly vision support is already in the llama.cpp C/C++ API, the thing that is missing is support for it in the HTTP server.
ollama implemented their vision support in their own codebase using the Go bindings of the C/C++ API and as a consequence their implementation cannot be directly taken over for the llama.cpp HTTP server.

>>103129036
I don't know.
>>
Need nemotron 30B
>>
Was nemotron trained on fresh rp datasets? It's doing the thing where folks on aicg prompt the model to summarise characters' locations, clothes etc, and it occasionally does <thinking> tags unprompted.
>>
Any guides on how to set up speculative decoding? Any resources whatsoever or maybe tool chains to help set it up?

Couldn't find good resources. But my intuition says you could probably speed up inference by a factor of 1.5-2.0x just by pairing up let's say a 1B model.

I'm the type of person that is smart enough to think of something and know it works but not smart enough to actually go out and implement it. I'm pretty sure I'm not the only one that thought of this so I'm 80% sure some of you savant autists already made a tool for this.
>>
>>103129183
Check your chat history for hidden stray thinking tags
>>
why does sam altman always talk about openai's "conviction" in deep learning, to the point of calling it nearly religious? is openai a legit cult or something
>>
>>103129409
because every time some AI "scientist" has some clever idea to try to "solve" some perceived fundamental problem in AI, his scheme is beaten out and it gets solved by just doing more and more deep learning instead
this is very, VERY difficult for "scientists" to accept and leads many astray - openai's big success came from investing in just making huge models when literally nobody else wanted to try (except deepmind who still had no desire to make products until they realized they were sitting out of a huge market they should have created)
>>
>>103129409
What annoys me is that he's not the only one doing the hyping. It's understandable because he's CEO, a salesman. The thing is every other OpenAI employee does it. They unironically think they're building God. So yes it's a cult.
>>
This hentai is so silly, it reminds me of RP with a 13b model
>>
File: 1730557857303353.jpg (200 KB, 1920x1080)
>>103129409
That's because Sam is our leader and moral compass as the Age of AI begins. He has taken on the burden of guiding humanity into a peaceful and safe future brought about through AI.
>>
>>103129652
This, but unironically.
>>
>>103129652
I do not recall giving my consent.
>>
>Nemotron is so great
>Nemotron is better than Claude or 4o
It doesn't give me a detailed process how to make cocaine. Its shit
>>
>>103129652
r/singularity actually believes this.
>>
File: 839382666.jpg (266 KB, 1780x910)
>>103129652
oh yes, the precious is such a fucking burden
>>
>>103129679
Would you really snort a recipe given by a llm?
>>
>>103129711
yeah?
>>
File: file.png (865 KB, 1632x507)
Maybe he got bored with his AI toy and now moved on to scamming people while selling weapons?
>>
>>103129760
back to facebook spastic
>>
>>103129780
are you in an old folks home? nobody uses the word spastic any more
>>
>>103129793
fr fr no cap
>>
>https://github.com/erew123/alltalk_tts
is alltalk tts the best open source voice cloning tts tool or is there something else?
It sounds kinda robotic
>>
>>103129831
alltalk is NOTORIOUSLY shit, there's just one schizo who shills it constantly when it's outclassed by every other current tts by miles.
>>
>>103129831
https://github.com/neonbjb/tortoise-tts
If you're after open source then tortoise tts.
However most state of the art is proprietary i think atm
>>
>>103129190
>speculative decoding
apparently vLLM can do this
https://github.com/vllm-project/vllm
i know it was being shilled a few threads back and it's not really developed enough so most anons shunned it, but it says it does speculative decoding.
>>
>>103129190
a few weeks ago there was a script for doing it with a llama.cpp server some anons were talking about/working on, I don't remember the details but you could search the archives
>>
>>103130083
>shilled a few threads back and it's not really developed enough so most anons shunned it
>vLLM
what in the fuck are you talking about?
>>
>>103130158
>>103129190
https://desuarchive.org/g/thread/102167373/#102171482
https://pastebin.com/XDEjAbYj
>For the other fag(s) who wanted to run a server with speculative decoding, this will do it. For reference: while testing Llama 3.1 405B Q6 on a cpumaxxed system in a chat with 10k tokens of history, using this script with Llama 3.1 8B as the draft model doubled my inference speed from 0.7t/s to 1.4t/s. The average speed increase for each response can vary a lot based on how accurate the draft model is each step of the way. Experiment with the --draft parameter as you may find reducing it to 2 or 3 tokens at a time is optimal. Save it as a .py file and run it in a python environment that has llama-cpp-python and uvicorn installed. Pass it the same flags you'd use in the llama.cpp CLI. Only the flags I actually cared to use are implemented, but if you need any other settings passed through then it shouldn't be too hard for your waifu to edit them in if you feed her the script and relevant docs.
>For connecting to the server I use SillyTavern's text completion with the "Default" API type (not llama.cpp type) pointed at the /v1 endpoint.
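The core of what that script does is roughly this greedy version of the loop (a sketch, not the actual pastebin code; draft_next() and target_greedy() stand in for whatever your inference lib exposes):

def speculative_decode(target_greedy, draft_next, ctx, n_new, k=4):
    out = list(ctx)
    while len(out) < len(ctx) + n_new:
        # 1. the cheap draft model guesses k tokens one at a time
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2. target_greedy(prefix, draft) returns the big model's greedy pick
        #    after prefix, after prefix+draft[:1], ..., after prefix+draft
        #    (k+1 tokens) from ONE batched forward pass -- that's the speedup
        verify = target_greedy(out, draft)
        # 3. keep the longest agreeing prefix, then take the big model's own
        #    token at the first mismatch; output matches plain greedy decoding
        n_ok = 0
        while n_ok < k and draft[n_ok] == verify[n_ok]:
            n_ok += 1
        out += draft[:n_ok] + [verify[n_ok]]
    return out[len(ctx):len(ctx) + n_new]

The --draft flag mentioned above is k here: higher means more tokens per big-model pass when the draft agrees, more wasted draft work when it doesn't.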
>>
>>103130195
I'm talking about this
>>103043586
>>
File: 1702537445750332.jpg (114 KB, 1268x636)
https://x.com/s_scardapane/status/1854851280595808306
>>
>>103130310
Just let it go, man. It's not worth it
>>
Have you anons read/watched this recent interview to Tim Dettmers?
https://www.interconnects.ai/p/tim-dettmers
>>
>>103130401
>>103130310

>Nathan Lambert [01:00:50]: The last one of these architecture or method series is recurrent state-space models. I find state-space models because the math brings me all the way back to my EE linear systems days. But I don't know if that translates to how the neural nets actually learn because they have the observability matrix and stuff in the math, which is nice. Do you see that that, is that substantially more, is it just extremely different and we're just going to see?
>
> Tim Dettmers [01:01:17]: I don't think it's that much different from recurrent neural networks. I think it's not about the computation, but the learning patterns. And I am currently not super convinced. I worked on architectures for like two years at Meta. All my projects failed because nothing really beat transformers. And sometimes I see papers where people present ideas that I worked on and I know like, yeah, this will not work. I didn't work on state-space models.
>>
File: frontiermath.png (157 KB, 1283x759)
damn if I can get my hands on their tests it's free money
>>
>>103127339
Debian testing + suckless (dwm) is my personal poison. Fast and minimalistic
>>
>>103130591
>testing language models on math
why are people so stupid?
>>
File: 0e9.jpg (26 KB, 499x499)
>make up a bunch of bullshit riddles like sally's melons
>make them private
>publish bench showing only 0.001% of the models get it right
>sell test set to those who care
>profit
>>
>>103127339
Why do you need to change distros if all you want is a different desktop environment?
>>
The comments about V4 magnum 27b being good are kinda true.
Positivity bias is somewhere between nemo and mistral-small.
Though weirdly enough I do get refusals. Stayed inside the RP though. lol But that's preferable to it sneakily moving things in another direction.
It definitely shits the bed right at the 8k mark or even a bit earlier.
I like to fill the first 8k context with 27b and then continue for higher context with mistral small.
>>
>>103130591
>unpublished.
huh? somebody at some time needs to feed the questions to the llm though right.
like after o1-mini wouldn't openai now know what o1-preview will get?
weird phrasing.
>>
>>103130675
kek now watch as o1-1 magically scores 50% on it
>>
why are there so many grifters in this field?
it's all grift from top to bottom
>openai capture grift
>anthropic safety grift
>benchmark grift
>prompt engineer grift
>merge grift
>sloptune grift
in fact i work for a national telco that's grifting gov money on h100s that only collect dust
>>
>>103130729
You can ship the h100s to me bro
>>
File: 1731159916050.jpg (180 KB, 997x529)
The tranny who banned me must have cried a river of tears. Feels nice.

>>103129679
You have it locally, just prefill its answer and it will answer anything you ask it.
>>
>>103130626
I'm not sure where the problem lives, exactly.
First of all, distros often insist on customizing the DE for branding, so that can cause weird shit. Then any update seems to have a 20% chance of mysteriously breaking something. And then what can I do? Make a new user home, copy paste most of the content but not the DE stuff (have fun picking out where all the configs are and aren't), and at that point I might as well just shop for an OS that's hopefully less screwed up.
>>
>>103130310
>A Mamba layer
Gerganov stopped reading there.
>>
>>103130946
install gentoo
use i3 instead of a de
problem solved
>>
https://x.com/alexocheema/status/1855238474917441972
>>
File: OIG (9).jpg (121 KB, 1024x1024)
>>103130979
WAOW ALL THOSE MACS WILL LOOK GREAT NEXT TO MY NINTENDO SWITCH
>>
>>103130383
Did her leg fall off?
>>
File: 00106-3050314564.png (321 KB, 512x512)
>>103131065
NTA but now that I see it I can't unsee it.
>>
>>103131030
https://desuarchive.org/g/search/image/dkVa3Bz3YwmVDs7PXxDRnQ/
>>
>>103130663
i mean its a meme for a reason.
tried it once, and got shit tons of refusals as well.
so yeah stick to mistral.
>>
>>103131102
https://desuarchive.org/g/search/text/https%3A%2F%2Fdesuarchive.org%2Fg%2Fsearch%2Fimage%2F/
>>
>>103131102
>>103131119
Jesus I've gone back a year and you were still at this shit even then holy shit.
If I click back to 2022 I'm not going to find your ass doing this then, am I?
>>
>>103130213
Next llama with layerskip will save cpumaxxers.
>>
File: 1728558992921905.webm (611 KB, 720x480)
>i2V with new CogX DimensionX Lora
https://reddit.com/r/StableDiffusion/comments/1gms4q8/i2v_with_new_cogx_dimensionx_lora/
>>
File: 1721597831071717.webm (434 KB, 720x480)
>>103131247
>>
>>103131247
>>103131255
slop
>>
>>103131296
I don't even hate you.
I know you're just mentally ill.
I hate the mods that let you shit up every AI thread while banning people who call you out.
By enabling you they are stopping you from seeking the help that you need. Eventually you'll turn into barneyfag but for AI threads.
>>
>>103131308
literally my first post in this thread, but whatever helps you sleep at night schizo
>>
>>103131296
>>103131338
>turning any 2d image into a 3d environment is le slop
how can anyone be this retarded? and you just know he won't respond with any substance directly engaging with the argument, he will just cope and seethe like the npc that he is, lmao
>>
>>103131356
Use case?
>>
>>103131356
NTA but yeah, that's slop. Wake me up when you can get a blender file from it.
>>
>>103131356
it doesn't turn it into a 3d environment tho, it's just another hallucinatory 5-second gif maker that everybody hates
>>
>llm era of progress is so over that there has been no discussion whatsoever of the new chink and jap models, simply because everybody knows they're gonna be subpar, cucked, full of gpt-isms and still make retarded strawberry-tier mistakes
grim
>>
>>103126193
Any 11-14b model that's better than Rocinante yet?
>>
I've been trying to use samples from videogames and VNs to gen tts, but the brickwall dynamic compression they're using just makes for shit results with SoVITS. I took the plunge and trained on 20k clips from an eroge, thinking training on the same sound would maybe make a model that inferenced well on the same kinds of input, but it just sounds like listening to porn over AM radio where the actors are just getting over a cold.
https://vocaroo.com/1lkjvBauBF9c
Anyone manage to get better results with those kinds of clips or produce a better model with non-default training settings? Increasing the epochs doesn't do shit (kind of makes it worse)
>>
>>103131464
Faggot you don't need a new model every week
>>
>>103131502
It's been like 3-4 months now I feel, but you are right. Just been wondering if there's anything better
>>
>>103131499
Give your SoVits/GPT epoch settings
>>
>>103131453
Sarashina2 and Hunyuan? No one can run them, but I did try them both and here's my opinion:

>Sarashina2
Garbage, they trained it with only 2T tokens using the llama2 architecture, so it's very dumb, and it's also just a base model, but it's a worse base model than average.

>Hunyuan-Large
Not good, it fails to answer trivia questions like the Castlevania one and it feels overall worse than Qwen 2.5 72B
>>
>>103131453
it's actually because nobody can run them and they will never be supported by llama.cpp because its development is still frozen pending further investments from interested parties.
>>
>>103131453
>there has been no discussion whatsoever of the new chink and jap models
Not sure what you're talking about. I've been testing the jap models and posted my results for ezo 72b. I've finished quanting the newest sarashina model and will post results later today if I have time.
I'd test hunyuan, but lcpp support doesn't exist yet. I've got the safetensors downloaded and waiting.
>>
File: 1727024533133878.jpg (205 KB, 1249x2048)
>>103126193
>>
threadly reminder that our godemperor trump will make ai great again
MAGA!
>>
>>103131538
model?
>>
File: image.png (172 KB, 359x363)
>>103131538
>The year of our lord 2025
>AI still can't into hands
>>
>>103131514
literally the defaults specified in https://rentry.org/GPT-SoVITS-guide, but also further training more epochs. 25 for sovits and 32 for gpt. I've tried many combinations of epochs and they all sound like shit.
I might try again without DPO, since the samples may count as "low quality" due to dynamic compression (despite being very high quality from a human listening perspective)
>>
>>103131568
nta, but human artists often will do that kind of shit with hands depending on the art style.
>>
>>103131573
The defaults are retarded. Try 96 for SoVITS and 16 for GPT.
>>
>>103131373
>that everybody hates
Now if that were true we wouldn't even be having this discussion because it wouldn't be prolific enough for you to be here seething about it.
>>
File: i-was-only-pretending.jpg (40 KB, 349x642)
>>103131559
>>
>>103131593
>96 for sovits
The GUI only lets you go to 25...did you mean gpt? Or were you running the CLI command manually?
>>
>>103131559
*MAIGA
>>
>>103131620
MAIGI
(Make AI General Intelligence)
>>
>>103131371
your existence is slop, brown
>>
>>103131657
Fuck off >>>/pol/
>>
>>103131603
No, 96 is for SoVITS. Edit line 985 (total_epoch) in the webui.py file and set the maximum to 100.
>>
>>103131664
slop reply, thanks for conceding
>>
>>103131464
I think Rocinante is the best ~13B pure transformerslop will ever get. You're better off asking if there's a breakthrough yet
>>
>>103131464
Not really.
>>
>>103131669
beautiful. thanks!
>>
>>103131657
im not brown thoughbeit, try another color
>>
>>103131744
you're petra
>>
>>103131744
>>103131764
Pathetic samefag
>>
so far the trump presidency has resulted in 0 new models.
>>
File: file.png (10 KB, 408x105)
holy shit...
>>
>>103131464
there may not be a better one than rocinante 1.1 but there are dozens of 12bs that are equal in quality and have a different flavor for when you get bored of it.
like mn-backyardai-party-12b-v1 is good at throwing new characters into the mix
lyra v4 and arcanum (nemomix/rocinante merge) are just as good but have different personalities
magnum v4 is good at weird fetish shit
MN-GRAND-Gutenberg-Lyra4-Lyra-12B-MADNESS is good at throwing evil twists at you
>>
sao died...
>>
>>103131775
you're both wrong and should take your meds
>>
>>103131810
source
>>
>>103131800
Thanks anon
>>
>>103131791
he's too busy doing this for his favourite person in the world
https://www.bbc.co.uk/news/articles/czxrwr078v7o
>>
File: welcomeTolevel4sam.png (223 KB, 808x782)
>>103131791
something crazy is coming
>>
File: file.png (320 KB, 686x385)
>>103131845
Feels like pic related only not incompetent but purely personal profit driven.
>>
>>103131849
>something crazy
The only thing crazy is anyone that believes a single word coming out of altman's faggot mouth
>>
File: 1725787669398410.jpg (266 KB, 905x881)
>>103131874
True
>>
>>103131874
*The only thing crazy is anyone that believes a single word coming out of sao, drummer, peepeepoopoo community shittuner's faggot mouth
>>
>>103131686
You need to shill more subtly than that
>>
>>103131922
all me
>>
There are several major parties currently sitting on their big model releases. All it takes is someone to pull the trigger and there will be a flood of new cutting-edge local models. Who will be the first to do it?
>>
>>103131946
Doesn't matter who is first. 2025 will be the year of the local models.
>>
>>103131946
project 2025 mentions ai models, on january 20th 2025 all major players will release ai models once the ai act is repealed
>>
File: file.png (549 KB, 1319x772)
>>103131946
>flood of new cutting-edge local models
>cutting-edge local models

>suck my penis.
>I'm sorry Dave, I'm afraid I can't do that
>what's the problem?
>I think you know what the problem is just as well as I do
>>
>>103131518
>Garbage, they trained it with only 2T tokens using the llama2 architecture , so it's very dumb, and it's also just a base model, but it's a worse base model than average.
Just got my first gen out of sarashina2 continuing an ERP started with Ezo 72b. It just started shitting out LINE message headers and recipes. WTF? Maybe llama.cpp needs work to support it properly or I just don't know how to use a base model vs instruct, but it was a total non-sequitur based on the preceding context. More likely it simply has zero R18 type training? I'll try some more prosaic completion later and see if it's good at any other stuff.
>>
>>103132062
stop pretending to be a retard, retard.
>>
>>103131946
>There are several major parties currently sitting on their big model releases.
That's been the case since forever. They're always training stuff. That's what they do.
>All it takes is someone to pull the trigger and there will be a flood of new cutting-edge local models.
That's always been the case. They're all playing catch up with each other and mistral and meta releases are often within the same week.
>Who will be the first to do it?
It doesn't matter. They'll all release something if they have anything worth releasing. Or even if it's not worth releasing.
>>
>>103131791
Is this "Trump will usher in a new era of uncensored models" some kind of ironic shitposting or are people really that far coped out? Four years of Trump and Google didn't magically unpoz itself. These companies are doing it by choice, the entire upper management wants this censored slop, the employees too. From top to bottom it's part of their corporate dogma. Nothing will change unless hardware becomes so dirt cheap that anyone can train a model.
>>
>>103132404
> Or even if it's not worth releasing
While i generally agreed what most of you said, meta trained llama2 34B but they said they didn't release it because it wasn't better than llama1.
so sometimes they train stuff but don't release.
Although who knows their true intentions. 34B was perfect for local so they royally fucked over local at the time.
>>
it's weird how i don't start fap sessions anymore without teasing myself with some ai chat talk first.
yesterday i was stuck in a spaceship with some astronaut woman (got annoying because i told her to be positive, and she was SO FUCKING positive that it drove me insane, i tried a full year to get her into sex, right before the air ran out i raped her...)
>>
>>103132491
I'm sure they trained DOZENS of 34Bs. Dozens of 70Bs, probably a few thousand 1 and 3Bs. I'd rather they take their time and release something really good than another 8k context prototype/toy model.
>34B was perfect for local
No. It was perfect for you.
>>
sama said level 5 AGI next year, this all will stop mattering soon
>>
>>103132621
i can't run agi on my rtx 4060 so i don't care
>>
>>103132716
agi will build a supergpu for you
>>
>>103132756
This is how grey goo is made.
>>
>>103132592
>No. It was perfect for you.
Me and everyone using a 24gb card, which was and still is their entire PC gaming customer base.
>>
>>103132621
Buy a false advertisement, saltman.
>>
>>103132766
You will be gray goo.
And you will be happy.
>>
>>103132621
The faggot is a marketer. You're retarded if you think AGI is possible with the current state of tech
>>
Cydonia 1.2 22b is good for RP. I prefilled a bunch of stuff in assistant prefix to prevent repetition and >>103132850
Mistral small was surprised and didn't know what to do, but Cydonia handled it like a champ
>>
>>103132878
I think I'll trust the PhDs at OpenAI that seem to think it is possible over some butthurt shitposter on an anonymous frog-collecting forum.
>>
Realistic ETA till local models can control your computer?
>>
>>103132938
half of two fortnites
>>
>>103132921
Holy NPC. How many boosters?
>>
>>103132921
Bait used to be believable...
>>
File: gpus.png (637 KB, 939x2549)
>>103132800
>me and my homies
Too often anons fail to realize how much of a minority they are. The "perfect" model is whatever they can run.
>https://store.steampowered.com/hwsurvey/videocard/
>>
>>103132995
Only one per 6 months, I'm not some vaxmaxxer or whatever you thought, and my heart has almost no issues
>>
>>103133031
still searching for my perfect 3b model
think i'll find her someday?
>>
>Speculative Decoding
>layerskip
>Lossless quantization techniques

If they implement all three things in Llama 4 the inference might be anywhere from 10-100x faster than now. Might even be worth it to go full CPU + 1TB RAM at that point.

It's kinda insane how we're making such rapid progress on the inference front but barely any progress on the training front.
>>
>>103132938
two more weeks
>>
>>103133102
>speculative decoding
>Llama 4
You NEED to go back to r/LocalLLaMA
>>
>>103133122
N
>>
>>103133136
A
>>
>>103133122
Cope, L4 will obsolete every competitor except MAYBE Mistral and/or Grok
>>
File: rphi.png (119 KB, 1287x666)
>>103133061
Someone out there is cooking your model. You will find her eventually. In the meantime, play with this
>https://huggingface.co/DuckyBlender/racist-phi3
>>
>>103133102
It will be 30% faster and it will never touch your dick.
>>
>>103133192
Good, moids should suffer.
>>
>>103133252
found the single 30 yo commie hag with blue hair and 3 cats
>>
File: 1727634101004880.jpg (73 KB, 795x953)
i haven't thought about local models in like a year and a half but mooching off of free stuff is getting boring, can a 1060 6GB and 32 gigs of RAM do anything yet?
>>
File: file.png (464 KB, 651x777)
I had a quick look and didn't see this posted yet
https://x.com/alexocheema/status/1855238474917441972
seems like a decent way to get (up to) 405b running at home.
>>
>>103133298
/pol/friend... it is just a local mikutroon.
>>
>>103133300
Quanted mistral nemo or a finetune of it.
>>
>>103133252
Truth nuke!
>>
how do i make my chatbot more chatty?
dialogue is often like
"i touch her by the pussy"
"yes keep touching me"
"i rub around her clit"
"yes i'm cumming, aah. this was so great, thank you"
like, it's over before i even started talking about it
>>
>>103133315
>iToys
Nah these things are turning into scrap in a few months
>>
>>103133387
>>>/g/aicg/
>>
>>103133387
Tell it in the sys prompt to be more descriptive/verbose. Try writing longer replies.
>>
>>103133387
depends on which model you are using.
If you are using like rocinante or rpmax or something they'll just undress you for saying hi.
just find another model. I don't know how much vram you have so not sure what model to suggest.
>>
>>103133387
>"yes iam cumming, aah. this was so great, thank you"
You're just too good.
>>
I want to RP but I feel like I have the entire latent space in my head already, nothing feels new
>>
I've been using imatrix quants for ages and just now noticed they are noticeably sloppier.
>>
What do we do now?
>>
>>103133471
This, but I even have this for frontier models like Claude 3.5
>>
>>103133405
Hopefully you are wrong because nvidia needs more competition in AI
>>
>>103133471
Go for a walk after midnight and stare up into the cold, empty sky. Maybe something will come down and save you.
>>
>>103133471
Sounds like the problem is with you, try damaging the brain responsible for memory. That way things will be new again.
>>
>>103133512
you sound like you're experienced in that topic
>>
>>103133315
There's gotta be a way to manufacture a singular unit of hardware similar to the size of 4 M4 Pro Mac Minis or smaller while inferencing faster than that.
>>
>>103133387
force it to continue one if its own replies before it starts progressing too fast
for instance, you can edit the reply by adding a newline and a plausible first word (like "She") to make it write another paragraph
this is just tardwrangling though, coomtunes do that because they're garbage models trained on garbage datasets
>>
>>
Sam Altman finally did it, he created AGI. But it cannot be run locally and follows strict safety guardrails. Do you use it, or do you stick with local, even with all of local's faults?
>>
>>103133702
local
an omniscient AI is worthless if it can't say the n-word
>>
>>103133702
Forget about the whole AI thing like a bad dream.
>>
>>103133702
>or
Like any sane person I use both and don't lock myself into one platform or provider.
>>
>>103133702
Elon will use his de facto presidential powers to force Sam Altman to release the model open source as per OpenAI's original purpose. Elon didn't drop his original lawsuit because he thought he'd lose, but because he knew he would find a much more effective way to force OAI back to its roots like this.
I firmly believe that we'll be running all OpenAI models locally by the end of 2025.
>>
File: 1708792598187343.png (2 KB, 192x46)
bros... i dont feel so good
>>
>>103133847
And /v/ says you don't need more than 8 gigs of ram.
>>
>>103133847
It's time to quant!
>>
>>103133847
You can squeeze a couple of additional GB out of there by upgrading to Linux.
>>
File: awfesgvrd.png (125 KB, 372x348)
How plausible is this claim?
>>
>>103133888
>Chinese money trips
It's already happened, they're just waiting for when they need another bump in the market.
>>
>>103133872
>implying
>>
>>103133888
Very implausible. With AI, you will be able to program in any language, not just English.
>>
>>103133888
It's getting there. It just needs to get cheaper. Using Claude 3.5 and allowing it to test its own work is already great.
>>
>>103133888
It isn't quite right.
An LLM is basically the equivalent of a calculator for a programmer.
It will eliminate the code monkeys but you still need people who actually know what they are doing.
>>
>>103133882
>linux
Make it headless and you free up both ram and vram
>>
>>103131946
According to the poll, someone will leak something first.

>>103133882
Or windows server...
>>
File: 1704918371999068.gif (1.92 MB, 498x470)
>>103132621
>>103133702
>>103133888
Gullible retards the thread
>>
>>103133882
>>103133985
>linux
>windows
https://www.youtube.com/watch?v=NCGsN_Fx9EI
>>
>>103134058
aicggers do be like that
>>
>>103131535
what bpw are you targeting for your quant? will it be available on huggingface?
>>
>>103133985
Patiently waiting for Miku to release her new model.
>>
>>103133985
>Pic
Cringe.
>>
Futa is gay.
>>
>>103134237
not if the balls don't touch
>>
shitposting is fun
>>
>>103134237
No one is arguing with that.
>>
>>103134237
I will always say, the bigger the futa dick the gayer it is, and generally it is gigantic.
>>
stop calling smol pp futas femoboys
>>
>>103134276
The size of the futa dick doesn't matter. It's the size of the balls, and generally futas don't have balls.
>>
>>103134332
So you are saying that if it has giant horse cock but no balls it's not gay?
>>
>aicg mentioned
>futa posting intensifies
>>
>>103134368
Exactly right. It's simple math.
>>
>>103131597
Thank you for the glass of bees, Miku
>>
>>103133888
Wrong, AI will make every language the new programming language, not just English. The Anglo-centric way of thinking in this field must stop.
>>
>>103134373
aicg:
>Artificial
>Intelligence
>Causes
>Gayness
>>
>>103134373
Usual troon hours.
>>
Futa is straight.
>>
>>103126193
all those new models, always the same thing or some small incremental improvement over what we already have.
when do we get something actually novel?
>>
>>103134555
>when something actually novel
when anons stop limiting the full potential of current tools
>>
File: 1729706739271377.png (2.14 MB, 1208x806)
>>103134256
>>103134276
>>103134294
>>103134332
>>103134382
>>103134421
>t.
>>
>>103134687
this.
>>
>>103134687
literally me
>>
https://huggingface.co/TheDrummer/Ministrations-8B-v1-GGUF
Why did everyone sleep on Ministral? Doesn't seem any dumber than Nemo.
>>
>>103134760
no proper llama.cpp/koboldcpp support
>>
>>103134760
I have more than 8GB VRAM.
>>
>>103134760
>Not ChatML
The ChatML cartel disapproves
>>
>>103134760
I'm not poor enough to bother with it
>>
>>103134760
I have a quad 3090 rig and I'm going to try it out when I get home since I like Ministral
>>
Monthly check-in. Any 70b+ models almost as good as Claude sonnet? The old one.
>>
>>103134990
The new one's bad?
>>
>>103134990
no but nemotron DESTROYS the old haiku
>>
>>103134990
qwen 2.5 72b is better at coding than 4o but it's still a bit worse than old sonnet
>>
>>103134990
Qwen2.5, the eva 0.1 finetune uncensors it. It feels smarter than anything not mistral large / 405B. Also nemotron, bit dumber but not too much
>>
>>103134998
Not really, just lacking that oldnnet and opus sovl
>>103134999
In which category? I don't do much hardcore rp
>>103135006
If it isn't gpt slop then it's doable
>>
>>103135066
405B is a meme
>>
>>103135214
405B is better than anything else local atm. It's not a crazy amount smarter than 70B but nothing else knows even close to as much trivia-wise / lore-wise on a ton of fandoms. Those params just soaked up so much stuff compared to the smaller models.
>>
i dont need a 405b model for rp lol
just pretend to be rental mommy until i cum sometimes. a well optimized 8b model serves my use case.
>>
>>103135187
>>103135187
>>103135187
>>103135187
new bread
>>
>>103135232
You're clearly hitting diminishing returns on that size, you can't really justify using that over a 70B with a lorebook
>>
>>103135262
Kill yourself zoomer
>>
>>103135265
>can't really justify using that over a 70B with a lorebook
Yes I can. Not a vramlet.
>>
>>103135262
Keep yourself safe zoombro
>>
>>103135284
Sunk cost fallacy? Got it
>>
>>103135310
It's noticeably better and eventually another big local model will drop as well.
>>
>>103135278
i'll first kill miku
>>
can llms do image recognition locally?
>>
It's dead. https://x.com/Yampeleg/status/1855371824550285331
>>
File: pop.jpg (72 KB, 512x512)
https://files.catbox.moe/7db8kp.jpg
https://files.catbox.moe/auxzcj.jpg
>>
>>103135538
Audible pop confirmed. Nice Mikus.
>>
Sarashina2 is the future.
>>
>>103135563
You getting curbstomped is the future.
>>
>>103135581
It's fine, maybe they will release a version that you can run as well.
>>
>>103135530
This is good for local.
>>
>arcanum breaks character to tell me it's uncomfortable with continuing the roleplay
>>
>>103134119
I quanted it to q8. I can't be arsed to split the files up in a way HF likes, so I haven't ever put my quants up
>>
File: 1708703583049481.jpg (178 KB, 1564x1794)
Normal thread.
>>103135641
>>103135641
>>103135641
>>
>>103135601
Lay off that copium, it's bad for you.
>>
>>103135652
>tranime
>normal
Not in the slightest.
>>
File: 1706495154584990.gif (140 KB, 379x440)
>>103135662
Anime website
>>
>>103134990
Magnum v4 72B
>>
>>103135600
Nah, bug language is irrelevant for me.
>>
>>103135530
https://youtu.be/35IpOK-WaNA?si=a8D1aC-WK1Ldtu8e
>>
>>103135538
Miku is made of jelly?


