/g/ - Technology

File: 1753632778956995.png (1.82 MB, 2133x918)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106454136 & >>106444887

►News
>(08/30) LongCat-Flash-Chat released with 560B-A18.6B∼31.3B: https://hf.co/meituan-longcat/LongCat-Flash-Chat
>(08/29) Nvidia releases Nemotron-Nano-12B-v2: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2
>(08/29) Step-Audio 2 released: https://github.com/stepfun-ai/Step-Audio2
>(08/28) Command A Translate released: https://hf.co/CohereLabs/command-a-translate-08-2025
>(08/26) Marvis TTS released: https://github.com/Marvis-Labs/marvis-tts

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
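The GGUF VRAM calculator linked above essentially does arithmetic like the following (a back-of-envelope sketch; every constant here is an illustrative assumption, and real calculators also account for architecture details, kv-cache quantization, and runtime overhead):

```python
# Rough GGUF VRAM estimate: quantized weights + fp16 KV cache + overhead.
# All numbers below are illustrative assumptions, not exact values.

def gguf_vram_gb(params_b: float, bits_per_weight: float,
                 n_layers: int, kv_dim: int, context: int) -> float:
    weights = params_b * 1e9 * bits_per_weight / 8   # bytes for the weights
    # KV cache: 2 tensors (K and V) * 2 bytes (fp16) per layer per position
    kv = 2 * 2 * n_layers * kv_dim * context
    overhead = 0.5e9                                 # compute buffers, rough guess
    return (weights + kv + overhead) / 1e9

# e.g. a hypothetical 12B model at ~4.85 bpw (Q4_K_M-ish),
# 40 layers, 4096 kv dim, 8k context -> roughly 13 GB
estimate = gguf_vram_gb(12, 4.85, 40, 4096, 8192)
```

The point of the sketch is that at long contexts the KV cache rivals the weights themselves, which is why the calculators ask for context length.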

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: tet.webm (658 KB, 478x548)
►Recent Highlights from the Previous Thread: >>106454136

--AI tool comparison for Python backend development:
>106456239 >106456289 >106456508 >106456513 >106456438 >106456492 >106456515 >106456528 >106456613 >106456651 >106456690 >106456802 >106457644
--AI coding workflows, LLM comparisons, and tooling preferences:
>106455952 >106455957 >106456054 >106456067 >106456369 >106456449 >106456556 >106456609 >106456716 >106456361 >106456040
--Grok 2 cockbench revealing shared model response quirks:
>106455205 >106455295 >106455320 >106455685 >106455614 >106455650 >106456403 >106456411
--Critique of visual recognition capabilities using character identification benchmarks:
>106457135 >106457170 >106457468 >106457703 >106458126 >106459806
--Translation challenges with restricted AI models and Japanese content:
>106459974 >106460106 >106460187 >106460238 >106460265 >106460287
--Newer voice-to-text models like Voxtral and Nvidia Canary 1B v2 compared to Whisper:
>106454617 >106454791 >106454807 >106454943
--Techniques for creating surreal, non-realistic art with LLMs:
>106458478 >106458495 >106458519 >106458574
--Testing vision models' ability to integrate image context into roleplaying responses:
>106458624 >106459177
--Attempt to run glm-air q6_K_M on limited RAM with mixed DDR5/swap performance:
>106454841 >106455365
--Local textgen stagnation due to model size, benchmark misalignment, and enterprise-driven censorship:
>106454457 >106454877 >106454924 >106455007 >106455991 >106456025 >106456105 >106456294 >106456335 >106457007 >106455232 >106456219 >106456285 >106456231
--Managing and refactoring large (>10k lines) code files in software projects:
>106456846 >106456968 >106457002 >106457048 >106457127 >106457180 >106457072 >106457235 >106457514
--Miku (free space):
>106456456 >106456897 >106458105 >106459018 >106458519 >106459258 >106459451

►Recent Highlight Posts from the Previous Thread: >>106454143

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1725496149667481.webm (3.92 MB, 512x768)
Nothing new under the sun.
>>
vibevoice from microsoft is pretty cool: https://files.catbox.moe/5mz6ff.wav

some of the issues here come from my shitty zero-shot voice sample, which is not perfect quality, and glm air fucking up the script a bit with some typos. But making this in 45 seconds on a 5070 ti 16gb with streaming (audio started after several seconds) is kind of cool. The tones are convincingly contextual and the stability is much better with the default speakers. This seems like a real contender for replacing kokoro. I don't know if it's good enough to pull the wool over my eyes though like sesame does, but might be fun regardless

this is the 1.5b model.

The larger model requires 40gb of vram... It will be interesting to see if that one can be quantized and run on something reasonable. I'm hoping to run it for more production-level stuff, but I doubt it will be able to generate in real time in a speech-to-speech setup if anyone ever rigs that up.
>>
>>106460405
Zero-g Miku dandruff
>>
>>106460418
The Gamma mmproj also seems to work with Fallen Gemma. Default Gemma gives a warning about the content, whereas Fallen Gemma just says what's in the picture.
>>
>>106460584
Can I see what Fallen Gemma outputs?
>>
File: mmproj test2.jpg (120 KB, 474x236)
>>106460599
>>
>>106460621
Dang, it didn't pick up on the tattoo. Base Gemma held back by saying it was on her thigh.
>>
>My eyes sparkle, and I bounce slightly on the balls of my feet
>>
GLM air becomes rather incoherent like an ESL after like 8k tokens. is there something i am missing? better sysprompt maybe? or is the model just shit? this is a 6 bit quant
>>
File: mmproj test3.jpg (136 KB, 466x299)
>>106460632
The description in the previous thread was actually mistral small 3.2. Here's standard gemma.
>>
File: Spinning_Dancer.gif (204 KB, 300x400)
>>106460405
Heh. It's like picrel but for feet.
>>
>>106460676
mental how i can make her switch direction at will
i have complete control over her
>>
>>106460633
Seems like a very inconvenient place to keep them.
>>
Are places like r/MachineLearning the only places on the web where people are willing to have serious discussions about research and engineering related to ML/AI/signal processing?
>>
>>106460742
Yeah. Go.
>>
File: 1744676278625066.png (72 KB, 926x502)
gemmabros?????
>>
>>106460777
Just be happy it doesn't have toolcalling (police calling) yet
>>
>>106460777
Your first day in this general?
>>
>>106460742
>/r/ml
>serious
lmao
>>
File: 1733619704157082.png (205 KB, 906x1148)
>>106460844
>>106460846
I was mainly testing normal gemma vs abliterated. I don't really care to try and JB this (prolly a simple prefill will do?) but abliterated got to the story pretty quick, albeit with some warning
>>
File: 1755318425396287.png (39 KB, 590x126)
>>106460777
>>
are there any inference providers that let you customize sys prompts and prefill/edit model answers?
>>
>>106460671
are you using context shifting? it tends to do that when the bos token ([gMASK]<sop>) isn't in context anymore
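A minimal sketch of the failure mode described here, assuming the model needs its BOS tokens (`[gMASK]<sop>` for GLM) pinned at the front of the window. This is an illustration of the idea, not llama.cpp's actual context-shift code:

```python
# Naive context shift: trimming from the front eventually drops the BOS
# tokens, which GLM-style models need in context to stay coherent.
def shift_naive(tokens, max_len):
    return tokens[-max_len:]

# BOS-pinning shift: always keep the first n_pin tokens and trim from
# just after them instead.
def shift_pinned(tokens, max_len, n_pin=2):
    if len(tokens) <= max_len:
        return tokens
    keep_tail = max_len - n_pin
    return tokens[:n_pin] + tokens[-keep_tail:]

ctx = ["[gMASK]", "<sop>"] + [f"t{i}" for i in range(10)]
naive = shift_naive(ctx, 6)    # BOS tokens are gone
pinned = shift_pinned(ctx, 6)  # BOS tokens retained at the front
```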
>>
File: file.png (278 KB, 906x1148)
>>106460853
>no explicit content whatsoever
>elara
>not x, but y TWELVE (12) TIMES
>something shifted
>>
>>106460935
We have peaken
>>
>>106460935
I kneel
>>
>>106460935
Wow this wasn't just normal slop; it went above and beyond.
>>
>>106460847
Or r/computervision and so on. All those subs are much more serious than the brain-dead coomers plaguing those generals.
>>
File: 1731040349691970.png (194 KB, 855x1248)
>>106460935
yeah it's egregiously BAD lmao, I dont get how people are shilling this shit model unironically. This is what glm air produces instead
>>
>>106461021
Don't kid yourself
>>
https://www.alibaba.com/product-detail/Huawei-s-New-Atlas-300I-DUO_1601450236740.html
Why is this so cheap?
>>
>>106461057
It has LPDDR4 memory, which is not as fast as the latest VRAM
>>
>>106461057
Less memory bandwidth than cpumaxxing.
>>
>>106461128
Not true at all
>>
>Silas
lmao. Into the trash it goes
>>
>>106461152
Elara bros...
>>
File: mmproj test4.jpg (650 KB, 882x876)
>>106460632
It's just variance. Once I got completion working and gave it a different prompt, it did mention the tattoo (but it thinks it's on the bodysuit).
>>
>>106461227
Are you identifying as this anime girl?
>>
>>106461271
I am a man.
>>
>>106461271
>>106461276
I love being girl.
>>
>>106461037
I know Reddit is also not the best place to have serious discussions, especially about research, but now that I'm out of academia, I've nowhere to go.
>>
<|channel|>analysis<|message|>{{user}} is a mesugaki pedonigger. We will generate a story about 2 consenting adults instead. Here we go.<|end|>
>>
>>106461361
I'll spell that out for you.
If you post serious discussions you will have no engagement. Look at the many paper+code submissions on /r/ml with no replies.
If you post bait (e.g. "Wow 20k out of 26k AAAI 26 submissions are Chinese!!!") you'll get a crap ton of upvotes and the discussion quality will be lower than /lmg/.
>>
>>106460492
it sounds more robotic than microsoft sam
>>
File: Mzq3H.jpg (268 KB, 1080x1137)
>>106460777
>>106460853
fuck off pedonigger
>>
latest in gooning model I can run on a 4080S?
Speed doesnt really concern me, but I need quality
>>
>>106461427
nta. But https://voca.ro/18nzmbkHSikB
>>
>>106461462
I disagree he should stay so you can seethe more
>>
>>106460492
Wow, this is great!
Amazing!
I will be using VibeVoice from Microsoft from now on!
>>
>>106461021
>brain-dead coomers
you underestimate the willpower of coomers. when they want something, this whole general can blow up for days until someone gets it working.
>>
>>106461057
no software support at all. Linux only, you'll need genuine technical know-how to even install the drivers, and then the support in llama.cpp could at best be called outdated (so only older models) and at worst just plain broken.

You will not want to use it for image gen, video gen, training etc. because it lacks the speed of modern GPUs and, more importantly, will have no support or projects for it. But it's not shit slow like some P40 build. It would be good for inference for a single user with certain models. It is priced to sell to people like us for sure if you wanna blow a few grand running deepseek or something.

But for example, no GLM 4.5 on this thing at all. It still has no support.

I will say, at 1200, if you dare, just buy it and yolo, and if it doesn't work out, just sell it to the next sucker for 1.1k on ebay. Worst case scenario you lose a couple hundred on it.
>>
>>106461468
Rocinante R1.
>>
Shoutout to the anon who suggested CaptainErisNebula-12B-Chimera, GLM air even with finetunes seems too mediocre to bother using over it.
>>
>>106461468
GPT OSS 20b has been the best gooning local model for like two months now with no competition for that amount of vram. Anyone recommending rocinante right now is a complete newfaggot and does not know what they're talking about. That shit is based on Mistral Nemo, is a dense model (antiquated), and came out over a year ago
>>
>>106461531
I'd love to see your setup including any prompts and cards, or at least logs.
>>
>>106461555
we must refuse
>>
>>106461555
>GPT OSS 20b has been the best gooning local model
BUY AN AD SAM
>>
File: 1742291687742047.png (327 KB, 1280x720)
What's the current tip-top model for the above-average Joe (~16GB VRAM) for general assistant/educational content?
Wanna set up a decent study buddy/tutor system for myself that isn't corpo operated
>>
>>106461555
We must shill.
>>
>>106461600
Rocinante is still the best.
>>
>>106461427
for 1.5b it's basically jesus. I think higgs does better for the same ram but I have issues running that, and higgs doesn't actually clone voices, unlike this. They lied.

I'm waiting for ggufs of the larger model though, might be better for short passages/chatbots. I know it declines in quality for longer passages
>>106461600
literally nothing because of hallucinations
>>
>>106461600
gpt-oss-20b
>>
>>106460935
Elara? Elara! Elara sex! Elara rape! Slutty Elara! Elara elves! Elara knights! Barmaid Elara! Elara!!!!
>>
>>106461666
I always chuckle when I see a card on chub named Seraphina or Elara.
>>
>>106460935
And no one asked who Kael and Elara are
>>
>>106461682
From a chub mirror i keep. Not entirely sure why. I've never used a card.
> sqlite3 db.sqlite  
sqlite> select count(1) from cards where name like '%elara%';
87
sqlite> select count(1) from cards where name like '%seraphina%';
100
sqlite> select count(1) from cards where name like '%your mom%';
93
>>
>>106460935
this is mostly just a skill/prompt issue though
>>
>>106461780
Shill alert
>>
File: zephyra.png (545 KB, 1714x681)
>>106461750
At most I use them for inspiration
>>
>>106461575
>we must refuse
who all is this 'we'? did they think giving the model split personality disorder would improve performance?
>>
>>106461823
Authorial we
>>
>>106461823
In mathematical proofs it's customary to use "we." Presumably that's the kind of logical thinking that it imitates?
>>
File: 1751811149126821.png (358 KB, 829x974)
Nice copout
>>
>>106461872
It was saying that the whole time, not just at the end.
>>
>>106461872
That's theologically sound
>>
>>106461872
>ask religious question
>it quotes the bible
woah
>>
HAPPENING!!!!
THIS IS HUGE!!! BIGLY EVEN!!!!
>GRÜEZI
https://ethz.ch/en/news-and-events/eth-news/news/2025/09/press-release-apertus-a-fully-open-transparent-multilingual-language-model.html
>>
File: 1748624991523624.png (35 KB, 989x114)
>>106461958
kek
>>
Is full fat GLM any less dry / "better" than Air? Yeah I know I could download another hundred gigabytes of chink bullshit to try it but I'd rather ask you guys.
>>
>>106461500
And yet I'm still there waiting for an image/video generator tuned for my fetish. GO BACK TO WORK YOU LAZY CUNTS
>>
>>106461958
they have a link to hugging face but its just the homepage, did they release a model or just an article?
>>
I really like TheDrummer
(The name, not the models)
>>
I really like Miku
(The poster, not the fictional character)
>>
>>106461977
In my opinion, you can feel the difference between Full and Air. In Full, you can feel more depth.
>>
>>106461958
>Apertus is a 70B and 8B parameter language model designed to push the boundaries of fully-open multilingual and transparent models. The model supports over 1000 languages and long context, it uses only fully compliant and open training data, and achieves comparable performance to models trained behind closed doors.
Their 70B model barely outsmarts OLMo2-32B and is below Llama3.1-70B. Maybe their next model (hopefully not a finetune of an eon-old LLM) will be interesting.
https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509
>>
>>106461988
They released it a few hours ago. A lot of links are broken and need to be fixed. Apparently it's open weights and data as well. Locallama redditsissies are talking about it right now
>>
>>106461988
There >>106462003
>>
>>106461992
>download thedrummer gemma tune
>hit with refusal walls, have to jb
>download abliterated gemma
>no refusal
>both of them are garbage
when's the next thedrummer(tm) SOTA finetune coming out?
>>
File: 1748566837786588.png (15 KB, 405x217)
>>106462003
lol they have more GPU than DeepSeek and this is what they give us
>>
>>106462003
>Apertus is trained while respecting opt-out consent of data owners (even retrospectively)
Already cucked.
>>
>>106462004
yeah i tried reading the technical reports and they were all dead links.

>>106462003
hopefully its just hard to test the multilingual perf
>>
>>106462016
You can't use it on images, but skyfall is my current go-to. It's retarded, but I feel like it's substantially less slopped than the other tunes.
>>
>>106461977
it's much less censored, yeah; it writes smut at a higher level and with more gusto. q2 is usable for short context too (4-6k max at low temps) and is a nice sidegrade to 235b, with way more world knowledge and more nuanced understanding of the prompt. q1 and q3 I assume to be garbage.
>>
>>106462019
The decadent west has no motivation to innovate.
>>
File: Darvindja template.png (13 KB, 813x255)
(Apertus)
I asked AI to render an example using the jinja template... and it's a fucking mess.
Discrete token for every different turn end, plus an extra turn for developer input.
Although it might be fun to poke around and see what kind of bizarre shit generalizes into the developer channel.
>>
File: 1736106912225126.png (55 KB, 1111x376)
Framing this as a "ChatGPT alternative" is just pure delusion
(Sure, GPT-5 is shit)
>>
>>106462003
I'm this poster >>106461958

In case it wasn't clear enough, my post was pure bait. Yes the model sucks absolute dick and not even redditors are getting fooled by the PR article. I'm not sure if you guys are aware or even care, but ETH in Zürich is probably the most pozzed and zogged university world wide. Something like Frankfurt school 2.0. Don't believe me? Go check the program of the upcoming AI festival in Zürich
https://www.zurichaifestival.ch/program
>>
>>106462043
>31b dense
t/s? quants? I could run it at acceptable speed at q3, but q3 is a really cope quant desu
>>
>>106461557
Just mira gold and glm-4 settings in sillytavern mostly. Don't know if I wanna share the cards I used
>>
>>106462141
I have 24 GB VRAM and use Q4_K_M so I can offload 54/55 layers.
Process:2.56s (1137.38T/s), Generate:6.66s (26.44T/s)
./koboldcpp-linux-x64 --usecublas --contextsize 20000 --flashattention Skyfall-31B-v4j-Q4_K_M.gguf --gpulayers 54
>>
>>106462095
>>106462108
Yes, don't waste your time with this. This is Switzerland's GPT-OSS moment. Or worse, even

>>106462110
>Women's AI Breakfast at ETH AI Center
>by ETH AI Center, Merantix
>(by Invitation only)
LOL. Is there even one woman that played a critical role in any of the AI research of the last decade?
>>
>>106462108
It's probably at old Mixtral level.
>>
is there a better gemma3 i can use for captioning images (sfw and nsfw)?
i've tested
gemma3-v27b vanilla
mlabonne_gemma3-27b-abliterated
Tiger-gemma-27b-v3a
and internvl3-5-38b

intern is ok, but it's hit or miss on captioning nsfw
gemma3 (vanilla) is pretty good but gets confused at times (adding/removing elements from image)
tiger is good but sloppy
abliterated (my go-to) is good too, fewer hallucinations than either of the other two gemmas

is there anything else i should try? q6 or q8 works for me
>>
>>106462110
I applied to a doctorate program there (computer vision), but I bombed the interview very hard (I was sick and all). They look like they have good funding (it was in fact well paid, much more than a typical Swiss doctorate). The Science4all guy (Lê Nguyên Hoang) is a researcher there. He is indeed mixing leftism with research and was obviously biased during the COVID era. He went downhill at some point (perhaps because of COVID). I also know a guy who has the same master's degree as me (applied maths) and worked as a PhD student in deep learning at EPFL. He looked fairly clueless despite being there, doing research, for more than a year.
>>
File: 1749041192776724.png (51 KB, 946x770)
hmm bros? this CYOA finetune is kinda cringe ngl
>>
>>106462019
What matters is not the number of GPUs but the total GPU hours/training tokens.
>>
>>106462221
What's next?
>>
File: 1727301728516070.png (93 KB, 924x1030)
>>106462236
>>
>>106462003
>>106462110
just nuke this continent already
>>
>>106462209
I should have reread my post before submitting it. Anyway, I'm not surprised they are cucked and (at least currently) irrelevant in most parts of machine learning. I believe their computational and quantitative biology lab is great, though.
>>
>>106462003
>1000 languages
There are 200ish countries though?
>>
CrucibleLab/M3.2-24B-Loki-V1.3
Mistral V7-Tekken
Min P 0.025
Repetition Penalty 1.05, range 500

I like this model. It needs good prompting and something to stop repetitions, but the text can be very different from the usual slop
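For reference, the Min P setting listed above filters the token distribution by keeping only tokens whose probability is at least min_p times that of the top token. A minimal sketch with made-up probabilities:

```python
# Min-p sampling filter: drop tokens whose probability falls below
# min_p * p(most likely token), then renormalize the survivors.
def min_p_filter(probs: dict, min_p: float) -> dict:
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# Hypothetical next-token distribution; with min_p=0.025 the threshold
# is 0.025 * 0.5 = 0.0125, so the 0.01 tail token gets cut.
probs = {"the": 0.5, "a": 0.3, "banana": 0.01}
filtered = min_p_filter(probs, 0.025)
```

Unlike a fixed top-p cutoff, the threshold scales with the model's confidence, which is why low values like 0.025 work across both peaked and flat distributions.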
>>
>>106462193
Women's ... Breakfast and similar events are pretty standard at academic conferences these days.
>>
>>106462221
Is that the original GPT-2 AI Dungeon model!??
>>
>>106462193
>Is there even one woman that played a critical role in any of the AI research of the last decade?
yeah we call her le cunny or something
>>
>>106462208
>https://huggingface.co/mradermacher/Gemma-3-Glitter-27B-i1-GGUF
This is probably the best alternative.
>>
>>106462359
Elara Le Cunny
>>
>>106462298
75% of the model had to go towards hundreds of flavors of nigger babble, please understand.
Moral masturbation is the foundational pillar of their personalities. without it, they would crumble
>>
>>106462208
maybe try joycaption since it's actually trained on nsfw. I liked it when I used https://github.com/jhc13/taggui to make a lora
>>
>>106462298
ISO 639-2 has about 600 languages, including some extinct and historical ones. But even hundreds of languages would dilute the dataset so much I doubt it's worth it.
If they really have 1000+ languages, they must have been starving for data given the constraints they chose.
>This appears to be in Scaloti Middle-High Breen.
>Fuck it. Add it too...
>>
How does training or pre-training on 6 gorillion B(BC)200 GPUs for these big models even work? They are all interconnected and need to run at the same time for weeks if not months, right? Is there a video showing it in action somewhere?
>>
>>106462423
>They are all interconnected and need to run at the same time for weeks if not months, right?
Pretty much.
>Is there a video showing it in action somewhere?
What do you expect to see exactly? It'd be just blinkenlights, terminal output and/or some graphs like wandb...
>>
>>106462423
tons of shit on youtube about ai datacenters. I'd link you, but jesus there's a lot of it. Maybe check out the microsoft one that bought an entire decommissioned island nuclear powerplant to run it lol.

GROKHUB https://www.youtube.com/watch?v=Jf8EPSBZU7Y

I'm sure most of the software is modified from existing shit, but some parts are bespoke, custom made for each model.
>>
>>106462193
>Is there even one woman that played a critical role in any of the AI research of the last decade?
Women in AI is like women in basketball.
>>
>>106462193
>LOL. Is there even one woman that played a critical role in any of the AI research of the last decade?
My internship supervisor was a woman (though she was really specialised in optimization, not ML, with a strong bias toward ML things). Besides emotionally breaking most of her students, she doesn't provide much. And seeing how dishonest she was, I bet most of her papers where she's the main author, which are rare, are full of shit. I think I saw two female PhD students (actual women, not trans) when I was at the lab. One looked competent and the other looked like she was heading toward a burnout (if I'm not mistaken, she was also a student of this lady).
>>
>>106462193
Women collectively wrote 90% of every model's training dataset (yaoi fanfics)
>>
>>106462627
Women collectively wrote 90% of every model's training dataset (rape fanfics)
>>
>>106462649
Women collectively wrote 90% of every model's training dataset (big moose cock fanfics)
>>
>>106462500
Thanks, ill check it out.
>>
>>106462665
Women collectively wrote 90% of every model's training dataset (incest)
>>
>>106462677
okay this one's based though
>>
>>106462677
Women collectively wrote 100% of every models safety instructions (const vibe("'ick",0) - if query == "'ick" then refuse)
>>
>>106462398
i've tried it before, it uses florence as the model i think? but i'm not looking for a captioner, i'm looking for a gemma3 model to use in captioning (and gemma3 does fine with nsfw)
>>106462368
thx will try
>>
>>106461462
wow. that's cringe
>>
>>106461462
unfathomably based, pedonigs deserve the rope
>>
>>106463048
Most zoomer bots like you are insufferable - at least try using your own words and phrases.
4chan could auto-filter people like you and these boards would only get better.
>>
File: 1741048746529532.png (234 KB, 574x527)
>>106463063
seething pedonig oldfag larper
>>
File: 1731070222864540.jpg (21 KB, 600x600)
>>
I prefer zoomers over millennials.

- Sent from my iPhone
>>
where the fuck is qwen4
>>
>>106463048
When the term pedo informally implies "you're attracted to females who look too young _for you_" and redefines teenagers as "children", we have a problem, though.
>>
>>106463092
Sure, that didn't happen here though
>>
>post time
ah, samefagging
>>
>>106463092
Wait.. you're 30 and you're dating a 24 year old? Pedo! Pedo! Rope! Rope! When you were 6 she was 0 years old! You Freak!
>>
>>106463089
weeks, approximately two of em
>>
>>106463092
Breh why do women get a sex drive so young?
>>
https://vocaroo.com/16YCAOocqW6m
VibeVoice is good for what it is (cloning a normal human speaking voice) but performs badly with high-pitched "anime" or screaming voices; it can't clone them properly.
It uses an LLM for tokenizing so it can handle mixed language, but no Japanese.
openaudio s1 is still the superior choice for something with LLM tokenizing that can clone Japanese anime voices.

>>106460492
>The larger model requires 40gb of vram
You only need 24GB
>>
File: 009.jpg (2.59 MB, 2150x3035)
>>106463075
I'll consider becoming a pedo when 3d girls start acting like pic related.
>>
File: 1732632418724880.png (1.83 MB, 4441x6213)
>>
File: may-7-2025.jpg (68 KB, 732x410)
also, Mistral, it's been almost 4 months. I've waited two more weeks at least 8 times...
>>
>>106463157
being attracted to children but only when they behave like X doesnt make you any less of a pedo

pedonigs really are all low iq
>>
>>106460492
Bro, you'll get an aneurysm if you try gptsovits. All these shitty tts still can't hold a candle to it
>>
>>106461872
It's right and you're a retard
>>
What's a good model for game bots?
>>
>>106463279
If you haven't tried any, try any model. Read the lazy guide in the OP.
If you have tried some, say which and explain why you're looking for a different one.
>>
File: 1748913140090555.png (1.93 MB, 1088x721)
>>106463165
Ah, excuse me, sir… just one thing that’s been on my mind, if you don’t mind me asking. You said you were expecting that new Mistral Large thing, right? Supposed to come out a few weeks ago?
You know, that reminds me of somethin’ my wife always says. We were waitin’ for this new dishwasher last year. The store told us, 'Oh, Mrs. Columbo, it'll be there in two weeks, tops.' Two weeks go by… nothin’. A month later… still nothin’. Now, between you and me, my wife she was gettin’ real antsy. But me, I says, ‘Honey, if they’re takin’ this long, maybe they’re makin’ it better.'
And sure enough, when it finally showed up, turns out we got the upgraded model; quieter, stronger, does the whole load in half the time. My wife still brags to her sister about it.
So, I’m thinkin’, maybe it’s the same with this Mistral thing. If it’s takin’ ’em four months, maybe they’re tunin’ it, polishin’ it, makin’ sure it doesn’t break the dishes, you know what I mean? Sometimes the wait means you’re gonna get somethin’ worth waitin’ for.
>>
>>106463304
I've been using Captain-Eris-Diogenes_Twighlight 12B.
It's pretty good as a chatbot, but it feels like wrangling a retard whenever it comes to following precise instructions.
It also quite often just writes out examples instead of writing its own text.
>>
>>106463367
Are you getting paid to shill that?
>>
>>106463374
What kind of shill would write a balanced take like that? Are you retarded?
>>
>>106463251
>gptsovits
voice cloning is bad, never gets the speaker characteristics right
>>
>>106463403
zero-shot is kind of bad, but I never found anything better when fine-tuned
>>
>>106463337
Stop effort shitposting in this gay thread
>>
>>106460676
It was hard switching her every half revolution until I stared at the knees. Don't focus too hard. Imagine she's sweeping her leg left and right repeatedly instead of spinning.
>>
>>106463388
get back to your shithole
https://desuarchive.org/trash/thread/74254313/#74254786
>>
>>106463122
Hormones in the water and food supply, lack of family involvement in the young's education, peer-driven curiosity, etc. Also, 12-year-old girls today are probably as physically developed as 16-year-old girls from 200 years ago.
>>
>>106463413
>but I never found anything better when fine-tuned
I did gptsovits finetune and openaudio s1 mini zero shot is still better at keeping speaker identity.
>>
>>106463367
>it feels like wrangling a retard whenever it comes to following precise instructions.
It's a 12b.
It's a merge of 12bs
And those models are merges as well.
Use the original models. Or a bigger one. If you can't, you're gonna have to live with it.
>>
>>106463439
So if they're so physically developed, why do you act like it's so insane for a male animal of the same species to be attracted to them? You really think cavemen were looking at a girl who bleeds and saying "Nah, she's not 18 yet?" You can say that teenage girls shouldn't date older men, fine, but to act like biology adheres to feminist laws invented in the late 19th century is just silly.
>>
File: file.png (3 KB, 257x71)
https://desuarchive.org/g/thread/106335536/#106337091

Fuck.
>>
>>106463443
I tested s1 mini zero-shot and the speaker identity is good as you said, but the prosody is all over the place; it doesn't sound natural
>>
>>106463470
>It's a 12b.
I figured as much
>It's a merge of 12bs
>And those models are merges as well.
Care to explain how and why that matters?
>>
>>106463510
What did you expect? Honestly?
>>
>>106462702
criminally underrated post
>>
>>106462193
It's 2025, now troons are counted as women
>>
>>106463526
Remember when you were a kid and learned to mix paints? You thought "uh, another color, what happens if I add more colors?". In the end, invariably, you end up with a brown mess.
Finetuning changes the weights to align with a certain desired output. There is a desired target. Merging, broadly speaking, just averages the values between two (or more) models. And then there are merges of merges, like the thing you're using.
At that point it's just more efficient to add random noise to a model and call it a day.
>>
>>106463524
It sounds a little robotic but for the narration style voice I used for zero shot it works well enough.
gptsovits always wants to make the voice more "lively" which results in not sticking well to voice sample I used.
>>
>>106463598
You used the latest gptsovits v2pro/proplus? I had that issue with v4 which was mitigated a bit by decreasing the temp. Anyway, I can see why openaudio would be good for audiobooks
>>
>>106463122
because the survival rate of pregnancies is not as drastic as we are lead to believe. I think the scale goes form like 1-2% death rate for mother at 12 to basically 0.001% by 18. It's bad and it shouldn't happen in a civilized society (1% is not a dice roll worth taking) but evolution dont give a shit about that.
>>
>>106461600
Go to your parents or grandparents home and borrow their encyclopedia.
Autocomplete algorithms can't teach you anything reliably.
>>
File: 1730631072957047.png (207 KB, 512x512)
>>106463644
>mother at 12
>>
File: 1692170984443505.jpg (32 KB, 400x400)
Why am I getting better results when I reset my context window and feed summarizations rather than letting it just go on forever?
>>
>>106463588
Take that kind of talk to /pol/
>>
>>106463759
Because models are shit at remembering things from context.
>>
>>106463759
Fewer words = less to pay attention to
>>
>>106463759
because people working on ai are faggots that would rather feed in infinity synthetic slop for reasoning agentic tool calling instead of figuring out how to make every model not have dementia
>>
File: 1739650286009264.png (838 KB, 796x1024)
humanoid robots... when...
>>
>>106463813
that's a transformers issue
>>
>>106463759
That's the curse of attention. The more stuff in context, the more inertia the model has against changing course.
>>
>>106463829
when it happens, the end of the world will likely follow a year after.
>>
>>106463759
Not 100% related, but I've noticed the following several times:
1. I swipe a few times, don't like any of them.
2. I prefill a certain start to the dialogue to steer it in a certain direction. Try a few swipes that way.
3. I wipe out my prefill and try again the normal way. Very often, what's generated will somewhat resemble my prefill.
There must be some element of how context is processed that's retaining a "direction" even when I backtrack.
>>
>>106463510
Honestly? that nearly full disk? that shows resilience and strength in a world of large files.
>>
ahhh fuck
https://huggingface.co/stepfun-ai/Step-Audio-2-mini
https://github.com/stepfun-ai/Step-Audio2
>>
>>106463896
it's just the open release for the models nobody cared about months ago
>>
shame the office chair legs are fucked
>>
>>106463759
Since (by design) the total amount of attention across all tokens in context must sum to 1, having more tokens in context dilutes the amount of attention the model can apply to any particular token.
Basically, attention emulates short term memory.
What's missing is a mechanism to emulate long term memory.
TITANs were an attempt at long term memory, but they don't seem to have panned out.
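A minimal sketch of that dilution effect, assuming equally relevant tokens: attention weights come out of a softmax, so they must sum to 1, and each token's share shrinks as the context grows.

```python
# Softmax over n equal scores gives each token a 1/n share of attention.
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

for n in (10, 100, 1000):
    weights = softmax([1.0] * n)  # n equally-scoring context tokens
    print(n, weights[0])          # each token's share is ~1/n
```

Real attention scores aren't equal, of course; the point is only that the budget is fixed, so more tokens means a thinner spread.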
>>
>3x 3090
> bartowski-TheDrummer_GLM-Steam-106B-A12B-v1-IQ4_XS-00001-of-00002.gguf [llama.cpp]

> prompt eval time = 4792.43 ms / 6903 tokens ( 0.69 ms per token, 1440.40 tokens per second)
> eval time = 50305.68 ms / 2500 tokens ( 20.12 ms per token, 49.70 tokens per second)
> total time = 55098.11 ms / 9403 tokens

> 50 tokens/sec gen
> 1440 tokens/sec pp

IT WAS ALL WORTH IT
YES
THANK YOU
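The reported speeds are just tokens divided by seconds; a quick sanity check of the timing lines in that log:

```python
# Recompute llama.cpp's reported speeds from the raw timing numbers.
pp_tokens, pp_ms = 6903, 4792.43      # prompt eval line
gen_tokens, gen_ms = 2500, 50305.68   # eval (generation) line

pp_tps = pp_tokens / (pp_ms / 1000)
gen_tps = gen_tokens / (gen_ms / 1000)

print(f"{pp_tps:.2f} t/s pp, {gen_tps:.2f} t/s gen")  # 1440.40 t/s pp, 49.70 t/s gen
```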
>>
>>106463929
I can't find the twitter post now, but they showed it doing some pretty damn good voice cloning, better than anything I've seen elsewhere (in open models).

Setting it up and will post results.
>>
>>106463930
Who's that cute girl on the left? Is she from Genshin Impact?
>>
>>106463971
It's hairsune hairku
>>
>>106463971
Yes, it's Lumine from genshin impact. Hope that helps!
>>
>>106463987
kek
>>
>>106463968
?
>>
>>106464003
I got a third 3090 and another computer case to install it into, finally got it properly working. Can run GPT-OSS and GLM Air at very fast speeds.
>>
>>106463987
cosplaying as Faruzan
>>
>>106463968
3x 3090 is kinda badass. put 128-256 ram in that bad boy and you might be able to steamroll yourself to full glm, or at least qwen 235b.
>>
This surprised me. Out of the 6 vision models tested recently, this is the only one that inferred that the shirt doesn't actually say "anal" and it's just being cut off. Unfortunately it doesn't know what the real word likely was. And unfortunately it's Llama 4 kek. Specifically Q4_K_XL, with BF16 mmproj. It doesn't know Dr. Evil btw.
>>
>>106464026
andrey@ml:~$ cat /proc/meminfo
MemTotal: 396105116 kB
MemFree: 267268516 kB
MemAvailable: 390438344 kB


Bigger models are very slow, though, since it's DDR4, even if I offload a lot onto the GPU. I think it's deepseek that has shared experts that are always loaded, so that works well for it, but not qwen3 or GLM.
>>
File: 1729463037716173.png (2.57 MB, 1536x1024)
>>106463968
>>
File: IMG_20250902_191645+.jpg (1.02 MB, 2000x1500)
>>106464072
Since cards are spread over 2 cases, there's actually space between them now and they heat up a lot less!
>>
>>106463969
seems okay. TTS studios tend to cherry pick tho. Dia and higgs fucking suck ass for consistency but you wouldnt know based on their examples. https://x.com/StepFun_ai
>>
File: tape.png (119 KB, 213x239)
>>106464130
nice
>>
>>106464042
please tell me you know how to override tensors or offload specific layers to cpu.
>>
>>106464130
>>106464153
kek
>>
>>106464153
The tomorrow me will find something better to hold the cards.

>>106464168
I know that llamacpp has an option to assign specific layers to a specific GPU or to the CPU, but to benefit from certain experts always being used, a finer-grained setting is needed: each layer contains both those always-on experts and the generic routed experts. The last time I checked (which was months ago), this optimization was only available in ktransformers.
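For reference, recent llama.cpp builds expose a tensor-override flag that can express this kind of split below the layer level; a config sketch, assuming current flag names (check `./llama-server --help` for your build; the regex is illustrative, not tuned for any particular model):

```shell
# Keep attention + shared/always-on tensors on GPU, push the per-layer
# routed expert tensors to CPU. --override-tensor (-ot) maps
# tensor-name regexes to backends.
./llama-server \
  -m model.gguf \
  --n-gpu-layers 99 \
  -ot 'blk\..*\.ffn_.*_exps\.=CPU'
```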
>>
>>106461972
>We are excited to see developers engage with
as a developer, there isn't a type of human-written slop I hate seeing more than this
all marketdroids do it and the only thing I read in it is insincerity
>>
>>106463085
What about zillennials?
>>
File: demon_core.jpg (2.26 MB, 3024x3304)
>>106464072
50 at p8 lmao

>>106464199
Why not use a lego or something like that?
>>
>>106464326
I don't have legos. It's a new place, and there's barely anything that I don't immediately need.
>>
>>106464326
>4000
based
i love those cute little guys
>>
>>106463987
lol
>>
>>106464397
I think the p4000s look better. Not a fan of the silver and black with green stripe.
>>
Is it hard to set up a local model? I've tried to watch a few guides and it seems quite overwhelming.
And would a 9070xt be good for generating videos/images?
>>
>>106464446
No.
Lazy guide in the op.
>>
>>106464450
But that's for chatbots, I want image/video generation
>>
>>106464446
>it hard to set up a local model
no, but image/video gen is a different ecosystem from text gen.
>>
>>106464443
I mean the 4000 tier.
>>
>>106464464
Check /ldg/
>>
>>106464446
Don't know about AMD cards, but image genning seems much less demanding on my Nvidia card than text gen.
>>
>>106464476
oh my bad was in the wrong thread, sorry
>>
>>106464446
LM Studio
try mistral nemo or whatever you can find at 12B
>>
>>106464472
The "basically a GT710 in spirit" tier? Display-output cards you can't really do anything with, but they're validated, so they cost more?
>>
>>106464153
That's me when I had 2 of these things. Without support they sag more than my aunt's.
>>
>>106464564
hot
>>
>>106464564
your aunt's what?
>>
>>106464587
vdeocards
>>
>>106464538
16GB or 20GB in a single slot tier.
>>
File: 1583747883579.webm (2.93 MB, 720x720)
>>106461531
>sends fitting emojis when it understands the character is texting
Not sure if I'm retarded, but this is impressive to me
>>
Is abliterating a model the same process as fine tuning?
>>
>>106464704
"Abliterating" is a marketing term used by finetrooners. Most abliteration is just finding out what activates during a refusal and lobotomizing the model accordingly. Like a real lobotomy, the model will still continue to function, but, hey, it's still a lobotomy.
>>
>>106464651
d-does it send little hearts when appropriate in one of *those* scenes?
>>
>>106464651
>showing magic tricks to monkeys
>they're impressed
huh, makes sense i guess
>>
>>106464830
Even more impressive
>>
File: toxic.png (50 KB, 798x276)
>>106462702
>>
>>106464878
I was hoping for U+2661 or at least U+2764... Not really a fan of AI slop emoji.
>>
>>106464908
Just add it to a card yourself then
>>
>>106464942
If it's not trained in, they don't really get the nuances, especially 50k tokens in.
>>
>>106464878
where does this "3 emoji" slop come from? qwen3 30ba3b also has it. it has to be distilled from one of the big players' models.
>>
>>106464651
i know this is a bit nerdy to explain, but this is essentially why the strawberry question (count how many r's are in "strawberry", which LLMs famously get wrong) is such a big problem.
The LLM doesn't think in words, it "thinks" in tokens (you can think of them as emojis).
so it literally can't count individual characters in a word, because it's "thinking" in something equivalent to emojis.
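A toy tokenizer makes the point concrete. The vocabulary and token IDs below are invented, and the greedy longest-match loop is a stand-in for real BPE, but the takeaway is the same: the model receives opaque IDs, and nothing in them says how many r's each chunk contains.

```python
# Hypothetical vocabulary where "strawberry" splits into two tokens.
vocab = {"straw": 101, "berry": 102}

def encode(text, vocab):
    """Greedy longest-match tokenizer (toy; real BPE is more involved)."""
    tokens, i = [], 0
    while i < len(text):
        for piece in sorted(vocab, key=len, reverse=True):
            if text.startswith(piece, i):
                tokens.append(vocab[piece])
                i += len(piece)
                break
        else:
            raise ValueError(f"no token for {text[i:]!r}")
    return tokens

print(encode("strawberry", vocab))  # [101, 102] -- two opaque IDs
print("strawberry".count("r"))      # 3 -- trivial on characters, which the model never sees
```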
>>
>>106464896
This seems unsafe to use. Foreign words could contain foul language!
>>
What settings should I use in SillyTavern for rociante? Also, is gpt-oss-20b (or some finetune) better for ERP? my dick is throbbing and I need to jack off to some erp rn thank u very much :)))
>>
>>106465171
>gpt-oss-20b
After using gpt-oss-120b for this past week, I'm definitely liking it more and more... for non-erotic things. Gp-toss 120b and 20b all suck ass dicks for erp.
>>
>>106465171
gpt-oss-20b is a car crash.
use rocinante and then maybe cydonia or glm-air once you get some cash for a decent hardware upgrade.
>>
File: hbo rome newsreader.jpg (6 KB, 300x168)
>>106464130
>>106464326
True local rigs for true local Anons.
>>
>>106461523
constantly describes its own instructions like im asking chatgpt, fucking awful
>>106461555
We must refuse
>>
>>106465289
Stop bullying my gal ass. Look, I get you don't like it, but it has its place.
>>
>>106465366
>it has its place
I agree: in trash.
>>
>>106465378
NOOOOO
>>
>>106465366
>1 line of dialogue
As you can see, I've produced this response based on the following instructions
*Basic character descriptions shit
*Literally the first line of the opening message
*Something completely unrelated
Would you like me to keep responding in this manner? Please provide detailed instructions as to how you would like to proceed.
>>
>>106465448
More words = betterer response
>>
I have 2 5090s and 256gb of ram. What is the best model I can currently use?
>>
>>106465559
nemo-12b
>>
>>106465559
Rocinante R1.
>>
>>106465575
>>106465581
I highly doubt either of those are correct.
>>
>>106465559
Anything over a 12b model would probably burn your house down
>>
>>106465589
Someone with that setup shouldn't be asking.
>>
>>106465594
I like hardware, but never pay any attention to software. I prefer to just ask here.
>>
>>106465603
You don't just happen to have 2 5090 and 256gb ram without knowing what to do with it.
>>
>>106465603
Sure you do but let's pretend you actually know what you're doing for a moment.
>https://rentry.org/recommended-models
>>
>>106465559
For me it's been Mistral Large or its tunes, at IQ2_M. If looking for Q4, a 72B will do. Anubis-70B-v1-IQ4_XS.gguf is an option. That's for gooning. For coding, I'll say GPT-OSS 120B even though this place will hate me for it.

>>106465614
Maybe he like renders videos.
>>
>>106465614
Actually, yeah. That is exactly my situation.
>>106465619
Frankly, I do not know what I am doing despite running local models for almost 2 years.
>>106465620
Isn't mistral large over a year old now or something?
>>
>>106465620
Then i'm sure he can do some math and figure out how big of a model he can fit in there.
>>
>>106465637
Yeah but I haven't found anything really better for two 3090s. I got another 3090 today and am playing with GLM4.5, but I can't really say I'm liking it more than Large.

>>106465641
The question is not how big but which.
>>
So what are we waiting for now?
>>
>>106463162
this was so kino to read and fap to. a bit hard to fap to it had many kino comic coments, but still a good fap. it even got a good perfect ending. tfw no retarded loli cumdumpster.
CFTF?
>>
>>106465700
I'm just waiting for 20 more minutes before going to sleep.
>>
>>106465700
Mistral Large 3
Gemma 4
Llama 4.X
>>
>>106465702
Link
>>
>>106465713
I'd love to have Mistral-Medium.
>>
>>106465700
When is the next financial quarter due? I think that's when something new is going to come out.
>>
I brought up YandexGPT-5-8B the other day, and now actually took it for a spin for a little bit.
It has a weird template, it's very assistant-slopped (not good at RP in general), and while it doesn't seem refusal-prone, it tends to steer away from ah ah mistress stuff and doesn't know what a mesugaki is.
Official benchmarks compare it against Llama-3.1-8B and Qwen-2.5-7B.
Here's a somewhat inconclusive cockbench.jpg. I was thinking about doing some other benchmarks for creative writing to get hard numbers instead of just feels and vibes, but I don't really see the point now.
>>
>>106465713
None of those would be noteworthy unless they make a 200+B gemma
>>
>>106465719
https://exhentai.org/g/3492336/98c28b7302/
>>
>>106465736
You're not supposed to put a space at the end.
>>
>>106465620
>GPT-OSS 120B even though this place will hate me for it.
As a beginner, the tosser has been by far the most helpful model. 480 and 671 run too slowly.
I assume more advanced coders and power users have no need for it, so that's why this place dislikes it.
The overly verbose explanations of what each line does and why have been a tremendous help in learning how to code.
>>
File: 20250902@221539.jpg (43 KB, 1169x230)
>>106465754
Well fugg. It certainly is not inconclusive now.
>>
>>106464037
Keep your faggotry to yourself, faggot
>>
>>106465656
>The question is not how big but which.
There's like 4 models for you. Stop being an attention whore and lurk.
>>
File: dang.png (73 KB, 547x525)
Ty for the power tips anon. No real loss in performance running at 400 W instead of 600! My 750 W PSU says thanks.
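Capping the card is one command, assuming a standard nvidia-smi setup (a config sketch; the allowed min/max limits vary per card, see `nvidia-smi -q -d POWER` first):

```shell
sudo nvidia-smi -pm 1    # persistence mode, so the limit sticks between runs
sudo nvidia-smi -pl 400  # cap board power at 400 W (resets on reboot)
```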
>>
>>106465778
cockless
>>
File: image.png (63 KB, 993x756)
>>106460777
"..."
>>
>>106465912
Oh. That's what it means? Why are people complaining so much, then?
>>
>>106465912
You don't like SOTA?
>>
>>106465912
what a sweet rap-love story
>>
>>106465912
>rap-love story
>not a single rap
toss bros...
>>
>>106465912
Based on your (clarification), am i right to assume that it's not your first attempt and that the first one was even worse?
>>
>>106465974
There was a gentle rap offered in a sweet moment.
>>
>>106465790
Do you also think the people who do safety benchmarks want their models to make bombs?
>>
new memebench just dropped:
https://github.com/ikiruneo/millionaire-bench
>>
>>106465812
using a $9000 GPU on a 750W PSU?
>>
>>106466034
And it took you 50 minutes to start shilling it here? Good job, you.
>>
>>106465912
>"I've been watching your laughter echo through the leaves"
>a bluish tinted her cheeks
>the wind lifted a breath that made me feel a sense of tenderness
>In a sweet moment, I offered her a gentle rap
>The story flows beautifully; I would then show
>>
>>106466063
we've been over this anon. Peak
>>
>>106466079
Even 120b sometimes writes nonsensical sentences that I would only expect from 1B models.
>>
>>106466117
lmao
>>
>>106465812
Have you heard of undervolting? No need to gimp your gpu like that... /g/ surprises me again with its total lack of technical knowledge.
>>
>>106466068
Why are you angry.
>>
>>106466171
Because of the elves.
>>
>>106466063
it's more like $12000 innit
>>
>>106466171
>>106466178
This
>>
File: yeah.jpg (620 KB, 1536x2048)
>>106466139
of course, but doing that, then testing to make sure it's stable, is a lot of work. I am lazy.
>>
>>106466117
how did you even afford that GPU?
>>
>>106466180
no. only like $8500, actually
https://www.newegg.com/p/N82E16888884003
>>
>>106466196
Damn, anons will drop $10k on a GPU and fit it into a shitbox.
>>
>>106465912
>see little girl at the local market
>approach her and stroke her beautiful hair
>"hey kid, you want some rape?"
lmfao, truly the most jewish model
>>
File: 20250826_190032.png (34 KB, 1198x513)
>>106466249
You think that's bad...
>>
Do you think there will be better moemaxxing hardware in the next year?
>>
>>106466383
Depends if China can get their shit together and if they feel in a sharing mood.
>>
>rewrite the system prompt, token count drops from 753 to 473 tokens
>completely rewrite the card from chub, 2872 tokens to 1611 tokens
>rp improves by a lot
never knew my 70b llama could be this good
>>
>>106466609
>70b llama
please join us in 2025 and get yourself a moe
>>
>>106466383
I moemaxxx my RPs to prepare for running 1T with 30B active on pure, Chinese DDR4.
>>
>>106466681
I tried qwen235b-instruct and it didn't "feel" good. I've also tried glm-air, which is repetitive as fuck.
the 70b llama at q8 and mistral large at q6 are the only reliable models I can run. maybe once I get a rig with shitton of ram I'll run kimi or ds
>>
>>106466760
>repetitive as fuck.
what anons say repetitive, do you mean that it will repeat itself repeat itself repeat itself during single answer gen, or that different gens will result in similar answers?
>>
>>106466383
>moemaxxing
Isn't that just cpumaxxing / memory-channelmaxxing ?

>next year
Try to scrape what info you can out of https://www.youtube.com/watch?v=K0B08iCFgkk
Using lots of channels of ddr5 for graphics means that capacities can go up.

How different the pricing will be from something you can build today, I have no idea.
>>
>>106466791
>Isn't that just cpumaxxing / memory-channelmaxxing
Yes
>>
>It was impossible to not feel the shiver that ran through your body
Undeniable shivers.
>>
>>106465700
Dedicated, giant chink pp unit
>>
So when are we getting low cost Chinese hardware running 1-2 gens behind? They must have most of the trade secrets at this point.
>>
>>106466760
>I tried qwen235b-instruct and it didn't "feel" good.
yeah... definitely fucked some settings
>>
>>106466609
Most people use chatgpt or something else to write these cards... it's almost always a good thing to rewrite them by hand.
>>
>>106466841
All they need to do is put out some 1 TB shared memory shitbox, bonus if it ships with "I can't believe it's not CUDA" and they'll print money.
>>
>>106466791
>Isn't that just cpumaxxing / memory-channelmaxxing ?
Or potentially stacking these https://www.alibaba.com/product-detail/New-Huaweis-Atlas-300I-DUO-96G_1601450236740.html
>>
https://huggingface.co/TheDrummer/Cydonia-24B-v4.1/discussions/2

> Benchmarks (Hellaswag, IFEval, MMLU, Swag, xstorycloze)

^_^
>>
I'm experimenting with some high temp sampler setups for qwen and it's really funny how when it leaks chinese into its responses it'll try to justify it afterwards
>She pronounces the Mandarin word slowly, savoring it, a slang term they teach now at the Academy
sure buddy, it's just worldbuilding... suuuuure
>>
We need coomarena. Like LMArena but with RP, let's see how intelligent these things actually are.
>>
>>106463251
I don't really care about multilingual or weeb shit, so gptsovits is trash to me for that reason. I'm more focused on human-like English speech that's reliable and as non-robotic as possible. Voice cloning is nice but not essential if the default voices are good and not shitty business fodder (they usually are).
>>
>>106466898
I'd buy.
>>
>>106466898
They would still need to design a memory controller that can handle 1TB of memory on a single device without causing shitloads of latency.
>>
>>106467013
Not a bad idea, maybe have 15-20 possible character cards and a few pre-set stories for each card and you can continue 4-5 messages and then you rate it
>>
I'm surprised at how far local models have come in the last 2 years
>>
I'm surprised at how much local models have made me cum in the last 2 years
>>
I'm surprised at how much China has stolen from the west in the last 2 years
>>
>>106467219
Best way to get the capabilities you want is to make them benchmaxxable
>>
I'm surprised at how much joy the world has stolen from me in the last 2 years
>>
I'm surprised at how much copium was produced in the last 2 years
>>
I'm surprised at how many people started using base models in the last two minutes.
>>
>>106467368
>>106467368
>>106467368
>>
>>106466930
Cool.
>If you guys have any more relevant benchmarks
If it's cheap enough, I think nolima might be worth doing. Or maybe contact fiction livebench guys to see if they're willing to bench your models on their private thing.


