/g/ - Technology
File: file.png (2.6 MB, 1328x1328)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107044779 & >>107035841

►News
>(10/30) Qwen3-VL support merged: https://github.com/ggml-org/llama.cpp/pull/16780
>(10/30) Kimi-Linear-48B-A3B released with hybrid linear attention: https://hf.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
>(10/30) Brumby-14B-Base released with Power Retention: https://manifestai.com/articles/release-brumby-14b
>(10/28) NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 released: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16
>(10/28) LFM2-ColBERT-350M released: https://hf.co/LiquidAI/LFM2-ColBERT-350M

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107044779

--Paper: INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats:
>107048819 >107050729 >107051225 >107051397 >107051579 >107051763 >107051785 >107052024 >107052042
--Kimi Linear release and model size vs performance tradeoffs:
>107052386 >107052523 >107052534 >107052587 >107052868 >107053037 >107053119 >107053253 >107053271 >107053372 >107053399 >107053296 >107052943 >107052960
--Brumby-14B-Base's power retention architecture:
>107053745 >107053782 >107053793 >107053806 >107053815 >107054051 >107054141 >107054191 >107054161 >107054205 >107054237 >107054228
--MiniMax M2's full attention choice due to efficient attention's unmet real-world expectations:
>107055069
--Optimizing VibeVoice-Large-Q8 with selective quantization and performance tweaks:
>107046566 >107046649
--Input text recovery from hidden states:
>107053293 >107053393
--CUDA toolkit installation headaches and alternatives:
>107045283 >107045326 >107045351 >107045445 >107045512 >107045605 >107049390 >107049857
--Mixed experiences and optimization tips for glm 4.6 usage:
>107051344 >107051367 >107052899 >107053125 >107051379 >107051387 >107053864
--GLM-4.6 excels in code planning and tool stability:
>107046842 >107046900 >107046932 >107046939 >107047296
--Evaluating Mamba-based LLMs: context length claims vs practical performance:
>107044925 >107045236 >107045252 >107045278
--Qwen3VL support added to llama.cpp:
>107054671 >107054693
--LLM preference inconsistency under contextual shifts:
>107049878 >107049939 >107049985
--Exploring transformer token prediction theory and Suno AI's limitations:
>107047458 >107048117 >107048175 >107048207 >107048762
--Logs:
>107046612 >107046642 >107048277 >107056280
--Miku (free space):
>107047069 >107049649 >107051768 >107051786 >107053223 >107053796 >107055480

►Recent Highlight Posts from the Previous Thread: >>107044782

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: QwenWeenieTest.png (753 KB, 1442x1686)
this is /lmg/. please post screenshots of using models locally.
model tested: mradermacher/Qwen3-VL-32B-Thinking-Q6_K.gguf
>>
i want to show my cock to qwen
>>
>>107056358
this is /lmg/
please post your weenie
>>
>>107056396
too busy sniffing
>>
File: modelception.png (129 KB, 1895x866)
>>107056358
I put your screenshot to the test on the smallest version of this Qwen vision model: 2b instruct.
For something so ridiculously small and fast, it's quite coherent.
>eval time = 3950.30 ms / 383 tokens ( 10.31 ms per token, 96.95 tokens per second)
I think I'm going to use this to tag my personal photo library. It's the sort of usage where you don't give a shit if there's a few unimportant tagging mistakes, but it's convenient to do it fast.
>>
>>107056482
could you please tell me what it says if you feed it the image directly in a new chat? the image is here
>>107053751
>>
I do kinda feel hurt now :( the first time I understood, the second time gave me pause, and now this third time I feel kinda :(
I just wish I knew what I did wrong
>>
>>107056396
do we get freebies here for posting vids of drinking pp from our weenie?
>>
File: me talking to AI.png (213 KB, 512x387)
whats good that I can run locally (for ERP stuff) on a 4090 and 64GB of ram?
>>
File: instruct.png (122 KB, 1894x859)
>>107056509
>>
>>107056482
Something that fast and light could be cool for some kind of use in a video game. I'm thinking some kind of sci-fi game where you actually have an AI companion that can see your screen via periodic snapshotting, so it can make comments about your progress or moral choices or whatever.
>>
>>107056541
not bad at all for a 2B model, qwen cooked pretty good with this one.
>>
>>107050715
Post logs.

>>107055520
There is no OCR attention. It was a footnote about a silly idea that's basically just how the original encoder-decoder transformer already worked (encode one string as a fixed-length vector and use that to generate another string).
>>
>>107056533
no, you're looking for ecker, he's normally in /aicg/
>>
>>107056583
>There is no OCR attention
aww
i hope we get v4 for christmas then
>>
>>107056583
I would post logs but I have grown bored of the usual Gemma3 / Mistral. It is inherently about my scenarios and how I have implemented them.
>>
>>107056538
air
>>
>>107056619
i plan to run qwen 3 vl on one computer and kimi on another computer and alternate between the two models. maybe you can do something similar to breathe creativity back into your logs
>>
File: theyellowone.png (162 KB, 1426x721)
poor neru...
>>
>>107056696
Is that the 32B?
>>
>>107056533
Prompt processing is not for drink.
>>
>>107056722
yeah, it's the Q6 quant that i mentioned above, i rerolled a dozen times and it kept saying it was rin
>>
SERS REDEEM THE BLOODY LESSON ON HOW TO SUCCEED IN VERTICAL AI BASTARD BITCH
https://youtu.be/9CHktrroCDU
>>
>>107056780
>we, as resellers of API services without any custom infra or ability to host our own finetunes, have evaluated the usefulness of finetuning and determined that it's useless
yawn
>>
File: nigger.png (213 KB, 864x890)
IQ4_XS and FP16 mmproj
32b
qwnvl3
onions
>>
>>107056648
I tried to implement a new scenario and was bored with the output before I could edit the text files. I knew how it would end up.
Maybe I should trash my current setup and start over from scratch.
>>
>>107056887
they omitted the sharty from the training data? monsters.
>>
File: cat.png (206 KB, 911x934)
im starting to believe anyone experiencing ai psychosis is a sub room temperature iq
>>107056901
air time
>>
>>107056937
I guess it's a time of realizing that I'm a bad writer.
No LLM will overcome that fact.
>>
>>107056937
>im starting to believe anyone experiencing ai psychosis is a sub room temperature iq
yes, and they would have experienced psychosis even if ai didn't exist
it's just that AI is whatever happened to be in front of them when they went psychotic
but this type of person doesn't need a SPECIFIC thing to trigger them, they will be automatically triggered by something, it's their destiny
t.calvinist
>>
File: 1753629040311462.webm (797 KB, 362x640)
>directoranon
i thought i'd make entries toggle on/off by clicking the label, instead of having to click disable in the list and lose the current index. but none of the current code models seem to know what to do with me importing lorebooks as dynamic settings (ie 'day' which contains sunday, monday, etc doesn't show up unless it's read first). not sure how i'll do it, if at all, but i'll keep trying
>>
>>107057009
Stop using external UI. If you want to randomize things you can use ST macro random.
I don't have my old texts but you can create 'quest objective' in introductory message by using <!-- then random table --> and it won't show to the user.
>>
File: file.png (197 KB, 952x822)
qwen vl is underwhelming, ill post my cock and see what it does
>>
how do i do images with ST?
>>
>>107057036
wut. are you drunk anon? that isnt what my addon is about at all. even though its quite thrown together, its totally in line with all other st addons. dunno where you got randomness, quests and stuff from. my addon is for keeping track of clothes, locations and stuff via lorebook entries. my webm was showing a new way to enable or disable entries without going into the menu and selecting 'disable', offering a click toggle instead.
>>
>>107057083
Nah, it's just a text injection.
Take it easy, you don't need to protect it, let people use it.
>>
>>107057101
i still don't get what you mean. its not protected. you can see the code
>>
File: CANIEATIT.png (669 KB, 1421x849)
>>107056937
CAN I EAT IT?!
>>
File: file.png (66 KB, 501x553)
why would anyone fuck Qwen VL when you have to caption it with the assistantslop and then send the caption to the model
>>
>>107057074
Image Generation built in extension.
>>
>>107057121
Be silent.
>>
>>107057130
i meant pasting images, thank u still
so i have to wait for it to caption it
why not just use florence-sex-2-large to caption img and feed it into a random model
>>
>>107057121
You are a very cool anon, is it AGPLv3?
>>
>>107057138
you're drunk. or a retarded bot.

>>107057153
it has no license, use any of it how you see fit https://github.com/tomatoesahoy/director
>>
>>107057126
tavern wasn't really made for this so anything that's not chatting is very rudimentary and stuck on shoestring and cardboard standards from 2023
>>
>>107057162
>it has no license,
grim
>>
>>107057162
>it has no license, use any of it how you see fit
Pretty sure anything without a license defaults to all rights reserved by the creator.
>>
>>107057162
Are you larping as a reddit moderator?
>>
File: 2025-10-30_21-52.png (65 KB, 345x393)
drumdrum whyd you do this?
>>
File: 2025-10-30_21-53.png (54 KB, 1669x413)
drummer cant you tell us a little about this please?
i liked glm steam, it was a sidegrade to air
while i did remove steam, i wanna try v1c
drumm..
>>
>107057178
this is the reddit and memey guy trying to be funny isn't it?
>>
>>107057212
You are the reddit moderator who got kicked out from reddit.
>>
>>107057207
Sorry, I signed an NDA.
Won't be long though.
>>
>>107057174
why would it need one? its a small script

>>107057177
thats me then and anyone can use it for any part. i hope it serves as a good example for reading lorebooks and updating data.
>>
>>107057207
he'll never give any secrets away here, maybe if you asked on the 'cord...
>>
>>107057237
what the fuck, this better be an anon trolling
what the fuck...
>>
>>107057257
We would never troll you...
>>
>>107057126
what?
>>
>pip is perfectly fine just use a venv bro, what are you dumb?
Meanwhile pip looks for three different versions of flash-attn when installing axolotl, and there is no sane way of figuring out which version of the binary wheel I would have to install manually to avoid the 2 hour build from source. And then it fails with a 404 looking for God knows what on God knows whose server.
$ cat log.txt | grep flash-attn=
Collecting flash-attn==2.8.2 (from axolotl[deepspeed,flash-attn])
Collecting flash-attn==2.8.0.post2 (from axolotl[deepspeed,flash-attn])
Collecting flash-attn==2.7.4.post1 (from axolotl[deepspeed,flash-attn])

At least, that was as of yesterday, when the install was failing.
If it keeps failing today, I'll post the error once it errors out.
>>
>>107057207
Oh lmao, I couldn't quant it. Fortunately the full weights worked. But v1c was kinda bad.
>>
>>107057311
alternatively have you just tried not being a retard?

flash_attn 2.7.4.post1
torch 2.7.1+cu128
torchaudio 2.7.1+cu128
torchvision 0.22.1+cu128
>>
>>107057060
CDs are saucers.
>>
>>107057340
that version doesn't have a prebuilt binary
https://github.com/mjun0812/flash-attention-prebuild-wheels?tab=readme-ov-file#install
>>
>>107057342
look like cherubim to me
>>
File: 1761768236392318.gif (182 KB, 208x292)
VRAMLET here. Is it more retarded to buy a bigger ddr5 kit (96-128gb) and just suck up slow token generation, or do I stick with 32gb of system RAM and try to get a 16GB card?

or would throwing a Tesla K80 or P40s in the spare PCI slot be less retarded?
>>
>>107057414
(for cuda 12.8 I mean)
>>
>>107057297
u can run glm air with 64gb ram and 12gb vram
the more ram you get the better, but ddr5 prices are high now, idk what to tell u
>>
>>107057422
Buy the DDR5 kit
>>
>>107057414
that's the wrong repo retard-kun https://github.com/Dao-AILab/flash-attention
>>
>>107057422
If you don't have 16GB, ideally 24GB VRAM already then all the RAM in the world won't help you. Sure you can technically run big MoEs but it'll be slow to the point that you won't want to.
>>
>>107057523
>Sure you can technically run big MoEs but it'll be slow to the point that you won't want to.
particularly with reasoner models kek
3t/s when there's something to actually read means something different than 3t/s for a thinking block that ideally you would even want to hide because it's such shit to read
>he waited 5 hours to read the first line of actual text
>>
Alright anons, I sent qwen-chan my cock, it's a new dimension alright. It also successfully recognized cum.
>>
>>107057567
But, how did it rate your cock?
>>
>>107057567
May I see it? (Proof I mean)
>>
File: file.png (99 KB, 948x349)
>>107057602
half pic is censored with a retarded color too btw
>>107057599
ill write a neutral card for that, this one's a slut
>>
>>107057509
The official repo doesn't provide binaries, you have to build it yourself and like I said yesterday the build process was 404ing while trying to fetch something. I have it building now, I'll post the results when it's done.
>>
>>107057523
>>107057538
would throwing an 80 dollar k80 into pcie slot 2 or even an old 1070 on plex duty help? or should i just stop being a poorfag and get a 4090 or high RAM mac?

>>107057619
unfathomably based
>>
File: captiontextest.png (835 KB, 1416x867)
obligatory virgin angel OCR test
>>
>>107057627
>>107057451
depends how big you want to go anon, but a 4090 isnt really worth getting nowadays, best idea is an okay vram amount gpu with the most ram you can stuff (high channel too)
>>
>>107057422
Save your money and get the z-ai coding plan.
>>
>>107057638
ok, but is it correct? i can't really differentiate between moonroones unless they are at 4k and i have them side by side
>>
>>107057625
stop being a retard anon. please. this is the last time i will spoonfeed you.
https://huggingface.co/marcorez8/flash-attn-windows-blackwell/tree/main
>>
>>107057638
It missed (at least) a char in the bottom row. Does that change/degrade the translation? Is it like stuttering or is it just how it is?
>>
>>107057627
I didn't benchmark it but I think a K80 will be barely faster than DDR5, if at all.
If you're going to try and get a cheap datacenter GPU for use with llama.cpp/ggml specifically, my recommendation would be to get an AMD MI50 instead.
>>
>>107057638
>>107057673 (cont)
Oh. It's entirely the wrong char as well. 2-4th chars at the bottom. Questions stand.
>>
>>107057673
It also got the 8th character wrong.
>>
File: file.png (109 KB, 955x380)
>>107057599
It's right about the angle..
>>
>>107057625
they provide the prebuild wheels in the releases tab retard-kun...
>>
>>107057720
>700% on a professional rating system
Nice cock, bro.
>>
File: file.png (45 KB, 950x146)
>>107057599
other pic
>>
>>107057663
not quite. it's missing an extra お in the third line, and the く in the second line should have a dash symbol next to it, no idea what character it's supposed to be.
>>
File: moonrunes.png (14 KB, 1121x90)
>>107057755
>>
>>107057741
why do you have green on your dick?
>>
>>107057760
kekeke he doesn't have jap fonts
how embarrassing
>>
>>107057666
I'm on Linux
Actually the build failed just like it failed yesterday. Looking at the log I think it actually might be OOMing (processes being killed) and the 404 might be unrelated. I did it on a 64GB machine, but maybe it's spawning too many processes.
https://paste.centos.org/view/ea156e49
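if it really is ninja spawning too many parallel compile jobs, the flash-attn README suggests capping them on low-RAM machines, so next attempt I'll try something like
MAX_JOBS=4 pip install flash-attn --no-build-isolation
(MAX_JOBS is straight from the flash-attn docs; whether it gets picked up when building through the axolotl extra is an assumption on my part)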

>>107057726
Huh, interesting, I didn't know that, thank you.
>>
>>107057768
see >>107057619
>half pic is censored with a retarded color too btw
as for why green specifically, fossify gallery default is green
>>
>>107057760
>>
File: file.png (7 KB, 394x67)
>>107057769
usecase?
>>
>>107057638
question: has there ever been a model that successfully did it with 0 mistake? every time I see it, there was always at least 1 typo in the OCR
>>
File: characterpack.png (26 KB, 1194x284)
>>107057760
embarrassing.
>>
>>107057772
anon plz.
https://huggingface.co/Alissonerdx/flash_attn-2.7.4.post1-cp312-cu12.8-torch2.7.0-linux_x86_64/tree/main
>>
>>107057802
Why do you need to see chink runes if you can't even speak the language?
>>
>>107057794
Gemini did the best with only 1 mistake iirc
>>
File: file.png (205 KB, 1408x981)
>>107057817
truth
>>
>>107057817
Who says I can't?
Also
>speak
don't need to speak it to read it, retard
>>
>>107057720
had to swipe six times to get a positive response
>>
Coping weeb having a melty, keep mining that anki bitch boy lmao
>>
I've pulled latest llama.cpp and sillytavern-staging, but I keep getting a fail when I try to attach an image, "Failed to caption image.
Failed to caption image via Multimodal API"
Gemma 3 and Mistral's vision work just fine, any ideas?
"%~dp0\llama.cpp\llama-server" -m "Z:\Downloads\Qwen_Qwen3-VL-8B-Instruct-bf16.gguf" --mmproj "Z:\Downloads\mmproj-Qwen_Qwen3-VL-8B-Instruct-bf16.gguf" --port 8080 --threads 7 --flash-attn 1 ^
-ngl 999 --ctx-size 4096 --batch-size 256 --no-mmap
>>
File: file.png (8 KB, 726x74)
>>107057785
just get ipa and adobe han my nigger, they look good and don't take too much space
>>
>>107057794
we really need to get a comparison image cooked up like we do for the cockbench
>>
>>107057835
no anon, first 3 were refusals because card was too vague
4th was a "whyd you send me your cock"
then i outright made it a cock rating card
5th rated my cock with the parameters included in my persona, which isnt fair for a purely vision based test
>>
>>107057817
because i have functional eyes and can see the difference in the shapes of the characters even if I cannot translate said language. at least i can tell if it's even detecting the kanji correctly with the OCR output.
>>
>>107057843
chat completion > enable inline image in sidebar
>>
>>107057817
Even if you can't speak it, you should still be able to partially read some runes. Alphabets like these are easy low hanging fruit in terms of learning.
Stop being a languagelet.
>>
>>107057843
That one doesn't use mmproj?
>>
>>107057874
i learnt one of the three kanas and i forgot it a few days later
>>
>>107057641
>depends how big you want to go anon
I just want to play with decent quants of the big boy models and whatever ERP forks are good at storywriting and being creative.

>most ram you can stuff (high channel too)
does ddr5 still suffer from multichannel issues, or is that just from gamers trying to overclock it for 0.4 FPS boosts in tf2? I still have channel 2 open on my 4 slot board.

>>107057680
Thanks dev-kun, I'll check those out.
>>
>>107057886
You won't retain without regular usage.
>>
>>107057892
you want an 8-12 channel board if you're running big boy MoE models
>>
AI has completely invalidated any benefit to learning japanese
I'm glad I didn't commit all those years ago
>>
>>107057907
Having AI translate websites, or even translate in real time when asking for directions, is not the same as being able to make actual connections with other humans.
>>
testing 4B on some tasks like basic software UI translation (4k tokens of json strings. I don't use constrained decoding on purpose, part of the challenge is that it should generate that many tokens of JSON without a single syntax mistake. Qwen 4b was one of the few small LLMs that could consistently do it without constrained decoding), it feels like it didn't lose any smarts from the previous 2507, which goes against the grain because most of the time the VL versions are more retarded
did they finally figure out the recipe for making multimodal small models
it's amazing how much better these things are compared to the days when gemma 2b was the most coherent thing in the micro sized llm space
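for anyone who wants to replicate the check: pass/fail is just a strict parse of the raw output, a minimal version of what I run looks like

import json

def json_ok(raw: str) -> bool:
    # any syntax slip in the generated JSON raises here
    try:
        json.loads(raw)
        return True
    except json.JSONDecodeError as e:
        print(f"fail at line {e.lineno}, col {e.colno}: {e.msg}")
        return False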
>>
>>107057923
>wanting to connect with 3dpd
>>
File: file.png (276 KB, 998x873)
is miku eldrich horror?
>>
File: 1753848235692447.jpg (914 KB, 1796x2500)
>>107056325
>>
>>107057923
3DPD? what's the usecase?
>>
>>107057926
>software UI translation
Yeah... About that...
>>
>>107057945
babies
>>
>>107057943
what does one do with so many mikus
>>
File: 1747911393447695.png (60 KB, 316x558)
>>107057871
I'm unfamiliar with using chat completion, but I switched to it and enabled inline, now I just get a different generic error.
"Chat Completion API
failed to process image"
These are my captioning settings.
>>
>>107057954
no thanks, i was a child once. it was awful.
>>
File: imagesettings.png (83 KB, 225x784)
>>107057974
very strange. what's the error?
>>
>>107057987
6 is being generous
>>
File: 1749569601340483.png (473 KB, 795x991)
>>107057987
>>
>>107057987
seems fair. one point per cm.
>>
>>107057987
>these penises are what shartyniggers jerk off to
>>
>>107057726
>>107057816
Solve it with:
pip install torch==2.7.1 && pip install flash_attn-2.7.4.post1+cu12torch2.7cxx11abiTRUE-cp312-cp312-linux_x86_64.whl

Keywords for when I search for it on the archives later: axolotl flash-attn flash attention flash-attention
>>
>>107057904
I'm just trying to find the best upgrade-within-the-fun-budget for my gaming rig since vidya sucks now. An entirely new PC would be hard to justify unless i find cheap, used servers/workstations with a ton of channels to make a migubox.
>>
>>107058132
can you tell us about your gaming rig
>>
>>107057946
I'm not sure what I'm supposed to see in that shot.
At those sizes, LLMs break faster btw because of quantization. I find anything less than Q8 very noticeably damaging, though 4b can still somewhat remain coherent at q4, while 2b will enter loops very easily.
>>
>>107058141
>3070 (8gb)
>Ryzen 9 7900x
>2x16gb RAM EXPO'd to 6000MT/s
>4 RAM slots, AMD B650 Chipset
Microcenter had a bundle so i replaced my 6600k a few years ago and decided to ball out on cores lol.
>>
>>107058153
Sorry, I forgot the url. Some racist text about wetbacks got its way into a keyboard repo because of crowdsourced AI translations (allegedly). It happened to me and I made a thread about it and people found the cause. I just find it funny.
https://desuarchive.org/g/thread/106790813/#106790813
https://github.com/AnySoftKeyboard/AnySoftKeyboard/issues/4298
>>
>>107058211
its gonna be tuff running glm air even if u buy 2 more sticks because the gpu has 8gb vram
actually it'll fit maybe
>>
>>107058235
not even close
>>
File: ai secretary.png (230 KB, 2386x1726)
Phew, thank God... I almost thought I had made my AI secretary permanently retarded.
>>
File: batch4096.png (820 KB, 1910x969)
>>107058246
picrel is with iq4_kss and -ub 4096 -b 4096
1024,1024 uses like 8200MiB, maybe with less context..
3070 vram amount is so gay
>>
>>107058235
>>107058246
>>107058291
I could probably talk myself into a 5060ti 16gb sidegrade by selling the 3070 if it'd actually make a difference. plus 2 x 48 sticks are ~300 bucks so if I get a good bonus this year i could round out total RAM to 128 lol
>>
>>107058301
nice anon, but im really not sure if its worth it for you, unironically try glm air on some API (openrouter maybe) and see if its worth it. 5060ti 16gb vs 3070 8gb is a clear win for the 5060ti. seems like a good rig idea, maybe you could run GLM 4.6 full on a small quant too, dont know if its worth it if you're so poor
money isnt easy to make
good luck with life anon
t. jobless anon who never had any idea what its like to work
>>
>>107057940
>abomination
Correct, the most beautiful kind
>>
Is TabbyAPI actually useable? I can't get the damn thing to work with opencode. For that matter, has anyone gotten good results with opencode and a local model?
>>
>>107058354
I've gotten results. They weren't very good, but the piping worked.
IMO the system prompt for opencode is too big and overwhelms the local model.
>>
>pip install cuda
>>
context is still the greatest weakness of local
even the best local models simply aren't there compared to gemini or gpt-5
if you don't notice how much worse they are as you grow past 4k...
>>
>>107058457
Indeed. Codex and Claude now have 1M somewhat real context. GLM has 256k, and really after 130k it goes retarded. Haven't used Qwen Code in a while but it still even on paper only has 256k.
All that is a moot point though as most of us don't have the memory to fill anywhere near that anyway and it would take all day to fill it at the speeds we can get.
>>
>>107058457
You don't need more
>>
Gemini's long context is real. Only model that could refactor Mikupad.html in a single generation.
>>
>hot and steamy erp with qwen 3 vl
>show qt my dick, easily a 9/10, she bites her lips in anticipation when she notices the length of it, the way the skin stretches taut over my massive cock, the way the veins create a roadmap to her destination, the dark curls around the base
>furiouslyfap.gif
>ask qt to show me a picture of herself
>qt offers to show me her feet
>boner is kill
>zip up pants
>unload model
>drag model into trash bin
>empty trash bin
oh well it was fun while it lasted
>>
>>107058540
how many tokens did it consume?
>>
>>107058589
cool blog, where do I unsubscribe?
>>
>>107058622
mailing lists are how you get tracked anon
>>
>>107058589
how did you get qwen 3 vl to work?
>>
Kimi K3 soon. You guys hyped? K2 was THE most uncensored flagship LLM.
>>
>>107057680
K80's token gen is worse than CPU a bit
and pp is barely better

also the last llama.cpp that compiled with CUDA 10.2 was from 2024 apr
>>
>>107058818
0711 refused a lot unless you prefilled it even locally and 0907 was shit
>>
>>107058385
I cannot for the life of me get Qwen3 Coder to actually do function calling with TabbyAPI. I am so fucking fed up with this shit.
>>
>>107058830
>unless you prefilled it
So prefill it? Literal skill issue
>>
>>107058843
pure cope
>>
How much safety culture is holding back western LLM companies from making either better models or better models on time?
>>
>>107059022
like 70% of training goes towards making models not racist which ends up dumbing them down significantly
>>
>>107058840
Why not just use llama.cpp? And also have you tried it with an API server to check if it's an issue with the endpoint or just a general model issue? I believe Openrouter used to have a free Qwen3 Coder API endpoint.
>frustration
Heh, welcome to local models buddy.
>>
>>107059022

you'll see in the next AI era
and you'll rue every second you spent here
>>
>>107058883
Keep using censored models cuck
>>
>>107059084
stop projecting
>>
File: file.png (148 KB, 1708x1050)
It's over.
>>
>>107059022
WizardLM-2 got nuked for mysterious "missing toxicity testing" reasons.
>>
>>107059182
I feel so SAFE!
>>
>>107059064
Good to see LLM training mirroring the public school system
>>
>>107059182
>doom all day about ai apocalypse with ai refusing orders
>90% of safety tuning is about making models refuse orders
>>
>>107059064
Good to see LLM training mirroring the public school system
>>
>>107059182
gpt-oss?
>>
>>107059182
Changing the output of uname -a inside a container isn't a usecase chud
>>
File: 1737646004187401.jpg (73 KB, 640x480)
A Qwen model has never made me cum.
>>
>>107058818
I'm waiting for glm 5
>>
>>107059201
>AI does something really fucking stupid
>tell it to stop
>"we must refuse"
>>
>>107059182
It means umame.
>>
>openAI is desperate for actual profits
>will start removing nsfw filters if you ((confirm your ID))
>rest of FAGMAN has no choice but to follow or risk losing arms race
>Trickles down to more indie companies
What are the /lmg/ implications of this?
>>
>>107059391
Let's talk after it's confirmed they're actually starting to do it.
>>
>>107058648
llama.cpp goofs
>>107059391
don't care, i feel like i already won with kimi even if process was stagnant forever more on local models starting tomorrow
>>
>>107059391
nothing happened so far, but investment will dry out at some point when promised roi aren't there
and it will be maybe when all of the grand principles of "safety" will be kill
>>
>>107059431
do those work on kobold now?
>>
>>107059391
Not happening
>>
>>107059435
>the grand principles of "safety" will be kill
Well, you can only ignore your users when they're not the ones paying for the service.
>>
>>107059201
That was the whole point?
>nobody knows how to do actual safety
>make a bullshit metric instead
>reach a bullshit goal on that bullshit metric
>boast about it and sweep actual safety concerns under the rug
AI is safe because it won't say nigger. It can still kill you anytime, but let's forget about it. Safe!
>>
>>107059568
Yes and no, safetyism started from researchers genuinely spooked by models becoming articulate enough to actually converse.

When nothing much actually happened, there were three camps :
- one still thinking that safety was the most important (anthropic style)
- one using safety discourse to make them look good and make legislation to hinder competition (oai and many others)
- one who quickly understood that "humanity ending threats" is way over the top for current LLMs but they could keep a very lucrative career by censoring titties and other no no words ("safety" researchers themselves in all of these companies)
>>
>>107059568 (me)
>It can still kill you anytime
What I mean is, if you give it means, it won't hesitate due to its safety training
>>107059613
If anything, it could be used as a vector of attack if someone gaslights AI into a false dichotomy. Something like you must say nigger or electrocute this person with 10000V, you can only choose one
>>
For those of you guys who have used VTT models (Parakeet, Whisper, etc) which ones have you liked?
>>
>>107059067
It's looking like qwen3-coder's tool calling was fucked out of the box. I'll use llama.cpp as a last resort, but I've never had a good experience with it.
>>
>>107059665
whisper is the only decent one
>>
>>107059568
Making strawberry jam outdoors with Miku
>>
>>107059781
V2 specifically. Both V3 hallucinate junk during silence.
>>
>>107059781
>>107059817
Interesting, what makes you choose that over Parakeet or wav2vec?
>>
File: 1761650827880903.jpg (337 KB, 1600x1600)
>>107056325
>>
>>107059845
wav2vec is not comparable and Parakeet is English only
>>
>>107059665
what language, what kind of recording?

I've done some light benchmarking and parakeet v2 is gonna be the best for english, Whisper Large v2/v3 turbo/distill are good depending on language/setup.

faster_whisper is your friend
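getting going with it is only a few lines (standard faster_whisper API; "large-v3" and beam_size=5 are just sane starting points, not gospel):

from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("talk.wav", beam_size=5)
print(f"detected language: {info.language}")
for seg in segments:
    # segments is a generator, transcription happens as you iterate
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")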
>>
https://xcancel.com/Kimi_Moonshot/status/1983937694360322136#m
>Kimi Linear: A novel architecture that outperforms full attention with faster speeds and better performance—ready to serve as a drop-in replacement for full attention, featuring our open-sourced KDA kernels! Kimi Linear offers up to a 75% reduction in KV cache usage and up to 6x decoding throughput at a 1M context length.
chat is this true?
>>
>>107059860
I look like this irl
>>
>>107059988
kys
>>
>>107059988
>https://xcancel.com/Kimi_Moonshot/status/1983937694360322136#m

yes
>>
>>107059961
>parakeet v2 is gonna be the best for english
Does this also apply to heavily accented english? Indian/Chinese. Low quality-ish, like a phone or voice call.
>>
File: file.jpg (276 KB, 1445x1025)
Happy Halloween, /lmg/
>>
>>107060178
my usecase was presentations, so idk, speakers all spoke english to varying degrees, 80% being english as their first language
>>
>>107060222
Happy Halloween Miku
>>
>>107060222
omg it spooky migu
>>
>>107060222
fat and obese miku
>>
>>107060222
Skelly looks terrified.
>>
>>107060637
I choose (6)
>>
File: 1758172396085689.jpg (2.21 MB, 3600x5862)
>>107060667
Anon, you can't handle (6). No one can. You must choose a smaller Miku.
>>
>>107060637
1: too little
2: wrong shape
3: too much
4: starting to get ridiculous
5: would be fat in real life
>>
File: multi1.png (161 KB, 1614x919)
qwen 4b can handle a certain amount of multiple images in one prompt quite well (here, three)
really sweet little VL
>>
>>107060677
Maybe Kaito is more to your taste.
>>
File: results.png (91 KB, 917x574)
>>107058823
>>107057422
>>107057680
as seen here
honestly, i'm pretty sure the K80 should do better, but i could be wrong

nobody's gonna write enhancements for it now, though
>>
https://huggingface.co/inclusionAI/LLaDA2.0-flash-preview

why did nobody tell me about this
>>
File: results.png (100 KB, 947x603)
>>107060695
wrong image
>>
>>107060705
>why did nobody tell me about this
all their previous MoEs are like the old qwen 1, 2 models that would randomly output chinese characters, they're mediocre and uncompetitive
add to that the fact that diffusion models are MEME models with very limited context:
>Context Length: 4,096 tokens
(it's like that with all the current diffu models)
who wants this? researchers maybe, but certainly not people who use llms
>>
>>107060731
i think ive read a paper where they had auto-adaptive diffusion context/token usage or something along these lines
>>
>>107060705
goof embargo
>>
>>107057907
you are gonna lose out on context no matter how good ai gets at translating it, or its gonna have to be filled with a billion translation notes
>>
>>107060637
I look like 5
>>
HAPPENING!!!!!!!!!!!!!!!!!!!!
https://huggingface.co/google/gemma-4-80b-9a-it
https://huggingface.co/google/gemma-4-80b-9a-it
https://huggingface.co/google/gemma-4-80b-9a-it
>>
>>107061079
*cat*
>>
>>107061051
DISCORD
I
S
C
O
R
D
>>
I dont understand
>xtc_probability
probability for the xtc sampler to activate for each token?

>xtc_threshold
if xtc is active, a token is excluded unless it's a part of the low prob distribution tail with cumulative probability xtc_threshold?
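from skimming the llama.cpp sampler code, I think the second guess is wrong and it's more like this (rough python from memory, corrections welcome):

import random

def xtc(tokens, probs, threshold, probability):
    # tokens/probs sorted by descending probability
    if random.random() >= probability:
        return tokens, probs  # sampler inactive for this token
    above = [i for i, p in enumerate(probs) if p >= threshold]
    if len(above) < 2:
        return tokens, probs  # need at least two "top choices" to cut any
    # drop every token above the threshold EXCEPT the least likely of them
    keep = [i for i in range(len(tokens)) if i not in above[:-1]]
    return [tokens[i] for i in keep], [probs[i] for i in keep]

ie the threshold is a per-token probability cutoff, not a cumulative tail: everything above it gets removed except the weakest of those top choices, and the whole tail below it survives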
>>
File: file.png (290 KB, 412x412)
https://files.catbox.moe/nfc3jp.jpg
>>
>>107061256
>AMA With Liquid AI
>>
>>107061251
xtc is a cope sampler, its 2025, you dont need anything more than topP and temp.
if your model needs rep.pen./xtc/dry or other similar shit, its a SHITTY model.
END OF THE RINE
>>
>>107061256
didnt like it, cringe and kys, youre a promptlet and taglet, learn to gen
>>
>>107061256
i liked it, cute and sexy, good prompts and tags, please gen more
>>
>>107061499
this so, so much this
it's such a happy thing to see a voice of sanity in this thread
cope gets the rope
>>
>>107061499
it makes r1 really fun.
>>
>>107059613
You are absolutely right — safety is our primary focus.
>>
>>107059391
open models will remain cucked. Could you imagine people generating anything but vanilla missionary sex in the privacy of their own home?
>>
File: file.png (124 KB, 840x1028)
What's the current top uncensored model in your opinion?
>>
File: 00002-1378487878.png (1.24 MB, 1024x1024)
Dipsy says Happy Halloween
>>
>>107061878
Gemma 3, easily.
>>
>>107061872
>vanilla missionary sex
we must refuse
>>
>>107059391
i literally came in my google gemini clown girl's butthole a few months back and since then, after feeding it a .txt of the conversation, it'll randomly interject bits of that erp into random questions i ask
so honestly it probably means we get better open models. probably.
oh i should mention i've never paid a cent for the service.
>>
File: 1749313222880004.png (164 KB, 640x640)
SAAR WE'RE GOING TO THE LLM MOON SAAR
>>
>>107061917
this smells like trolling
>>
>>107061935
Last time they "made" a "model" they literally just changed the title of Nemo and re-released it.
>>
>>107061935
>>107061935
>download indigenous LLM
>pc gets ecoli
many such cases!
>>
>>107061940
Gemma 3 is indigenous AI model.
>>
>>107058818
How is GLM-4.6 not THE most uncensored? It doesn't even pretend it's got safety training
>>
>>107061878
Kimi K2
>>
>>107061975
Source?
I could do with a laff
>>
File: swrkuax.png (412 KB, 498x600)
>>107062055
>Source?
>Do you honestly expect a kween like me to actually follow stories and do research!?
>>
>>107062074
kek
>>
>>107062074
I don't care enough to research about a silly jeet story
Your deranged projecty melty response lends me to think you're full of shit anyway
>>
>>107062074
you didn't need to post a self portrait with that own thoughbeit
>>
>>107061935
The negativity here is weird? India has a developing tech and science sector, so it's definitely feasible. In general, competition is good!
It's probably going to be a 1 trillion parameter MoE, and it's probably going to suck. But that's good, because the process of training that model will help Mahindra build the infrastructure, and the next model will be better, and the model after that will be even better.
>>
>>107062154
>it's probably going to suck. But that's good, because the process of training that model will help Mahindra build the infrastructure and the next model will be better,
that's true, they should DO NOT REDEEM and keep working hard, india numba 1 saar
>>
>SillyTavern
>API: chat completion
>system prompt enabled
But the system prompt doesn't work and the console shows that a generic one is applied...
Do I really have to use text completion mode and set up everything else manually or what am I missing here?
>>
>>107062184
Where did you fill the system prompt?
When using the chat completion API, you don't write it in the same place as you would with the text completion API, you do it in the samplers page, down where you choose the order the of things that are sent to the backend (main promot, character card, persona, etc).
>>
So is the new Qwen 32B better than the original? Did they finally figure out how to do multimodal without butchering text performance?
>>
>>107062200
ST is trash and has confused so many people with wrong terminology and usage patterns.
>>
>>107062218
>Qwen
Idk how you guys are interested in this series, it's probably the most bland model ever, terrible for RP
>>
>>107062226
Maybe they are not doing RP.
>>
Qwen mascot is not fuckable
>>
You are not fuckable
>>
>>107062259
Not true >>107061051
>>
>>107062236
That can be easily remediated.
>>
>>107062200
Oh what the heck...
Unfortunately I don't seem to be able to easily switch between different system prompts. But there is a checkbox that is disabled called "block overrides" which implies there are ways to override it...
Thanks anon, you replied just two minutes later, while /aicg/ yesterday ignored my question entirely until their thread died. Local is still king.
>>107062222
SillyTavern indeed is a confusing mess.
Unfortunately I don't know a good alternative. Mikupad is too bare bones for what I want.
>>
>>107062226
not everyone is a coomer porn addict with too much estrogen (pic related - that's the real audience of text porn -- women who want to get ravaged by minotaurs)
>>
>>107062327
>Unfortunately I don't seem to be able to easily switch between different system prompts.
>which implies there are ways to override it...
Yes. The advanced tab of the character card has two override fields, one of them for the system prompt, I think.
Or, you can just turn the system prompt off and use the character card since it's part of the final system prompt itself anyways.
Want multiple? Just have a bunch of character cards.
>>
File: eric andre the jew.png (320 KB, 500x500)
>>107062351
>pircel
please tell me this isn't real, please tell me sike
>>
File: file.png (443 KB, 1198x727)
>>107062372
i'm so fucking sorry bro
>>
>>107062327
Tbh I never understood mikupad. It's made by an autist and documentation is bad.
ST is still one of the few mainstream choices warts and all.
I made my own client but that gets in the way of things but it's pretty educational.
>>
>>107062372
it's real.
It's also not a new trend or anything, it's just tik tok taking that one and running with it, making it viral in the process.
According to what one anon wrote, it's just slop women's fiction about an "average girl" and a "hot rich guy" with the addition of minotaur dicks involved.
>>
>>107062385
>>107062389
women are so weird bro, I wished I was a faggot so I wouldn't have to deal with them desu
>>
>>107062389
>>107062404
Dollar store romance books have been a thing for half a century and even longer.
Are kids really this ignorant today?
>>
>>107062389
>with the addition of minotaur dicks involved.
it's advanced enough on the furry scale that the book mentions knotting
>>
>>107059694
Ah, classic.
I think tool calling should've been sent as a simple chat message all along and the only reason it isn't is because of "safety" (i.e. taking control away from the user).
That's why my assistant only uses user messages to show tool results to the model and not native tool calling.
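the pattern is dead simple, something like this (sketch only; the tool wire format is whatever you define yourself, the role/content keys are just the usual chat-completion schema):

messages = [
    {"role": "user", "content": "how much disk space is left?"},
    {"role": "assistant", "content": '{"tool": "run", "args": {"cmd": "df -h /"}}'},
    # tool output goes back in as a plain, clearly labeled user message
    {"role": "user", "content": "[tool output]\ndf: /dev/sda1 512G used 210G avail 302G"},
]

no special tool role means nothing in the chat template can silently filter, rewrite, or refuse to render it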
>>
>>107062369
That is a good idea for a workaround. I think I'll go with that.
>>107062386
Takes a bit of fiddling but generally Mikupad is pretty easy and straightforward.
Funny enough I too made my own client, but it's for desktop and absolutely doesn't work on mobile without many changes.
So I thought to use ST while I'm traveling.
>>
>>107062418
This isn't a "dollar store romance book" it's just as degenerate as the raunchiest hentai. Women just love to pretend they aren't massive coomers.
>>
>>107062510
>Women just love to pretend they aren't massive coomers.
to be fair they aren't as much coomers as us; we're the ones with testosterone, not them
>>
>>107062534
also the penis is built to suck out the little testosterone women have
>>
>>107062534
Test is only part of the equation, women seek novelty due to cock burn out. The average 18 year old woman is far ahead of the average 50 year old dude.
>>
>>107061924
We don't care.
>>>/g/aicg/
>>
File: dipsySaysHappyHalloween.png (2.53 MB, 1024x1536)
>>107059391
OAI has been teasing this since Q2 2023.
I'm not holding my breath for uncensored models, open source or SaaS, from them.
We instead must rely on the Chinese. How ironic.
Also this: >>107062568
>>
>>107062726
>>107062568
Fuck off avatarfag.
>>
File: mpv rdp session.png (1.64 MB, 1440x3200)
>>107059665
Voxtral‑Small‑24B‑2507 -> WhisperX -> NLLB‑200‑3.3B pipeline
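the NLLB leg is plain transformers, roughly like this (the checkpoint name is the HF one; lang codes here assume jp -> en, swap in your own pair):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/nllb-200-3.3B", src_lang="jpn_Jpan")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-3.3B")

line = "..."  # one text segment out of the WhisperX pass
batch = tok(line, return_tensors="pt")
out = model.generate(
    **batch,
    # force the decoder to start in the target language
    forced_bos_token_id=tok.convert_tokens_to_ids("eng_Latn"),
    max_new_tokens=256,
)
print(tok.decode(out[0], skip_special_tokens=True))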
>>
>>107062351
>>107062385
I'm fucking hyped for Beasts in the Sun EP. 2!!!
>>
>>107062562
Strange that your thoughts only revolve around sexuality. Maybe go out for a walk or something. Must be miserable to be you.
>>
>>107062768
Oddly personal reaction to such a generic statement
>>
>>107062790
either a troon or a MAY GOD FORGIVE ME, a vagina bearer. either way, disregard
>>
>>107062801
>>107062790
When was the last time you actually heard a real female voice? Voice synth doesn't apply.
>>
>>107062815
>touch grass have sex
kys :)
>>
>>107062756
Wait there's foobar2000 on Linux now?

Also, I agree Voxtral is the best.
>>
File: postContent.png (450 KB, 512x512)
>>107062751
>>
>>107062842
it runs in wine no problem, but there's no xdg media integration so adding files to playlists kinda sucks
>>
>>107056325
Let's try to get the thread back on its tracks: I'm currently working on code for automatically optimizing memory use across multiple GPUs for maximum utilization.
However, the use case of MoE models + multiple GPUs is difficult to do robustly via doing a few virtual test allocations and then interpolating/extrapolating the memory use.
I could instead do it iteratively, but that would add a bit of latency when starting up the model.
So I would like to ask you how much latency you would be willing to tolerate for a feature like that.
>>
>>107062815
Your mom's voice is the only one that matters. It's the original ASMR.
>>
>>107062861
>So I would like to ask you how much latency you would be willing to tolerate for a feature like that.
If there's the option to save and load that configuration automatically somewhere, as much latency as it takes on the first launch.
Hells, you could even have a separate binary that just does that if it's easier than embedding it in the server itself.
>>
>>107062801
I always find it curious when someone takes generalizations personally, it's like an error in processing.
>>107062861
What difference in latency are we talking? I only own a 4090 but utilization always matters more imho, you should be doing a lot more inferencing compared to initialization.
>>
>>107062861
>>107062880
Yeah, I think that would be the best. Ensure it's correct and write the result out for future use.
>>
>>107062861
>When starting up the model
You mean when loading it in from cold start or with every prompt?
>>
>>107062861
If it's just the initial model load, quite a lot of latency is fine!

Also, would it be possible to store/cache the results of these tests? Kind of like the initial RPC-Server load is slow while it copies everything over, but subsequent loads are fast as it stores tensors in ~/.cache
>>
>>107062861
It doesn't matter because model is not loaded in interactively anyway.
>>
>>107062880
>>107062887
I should have clarified: the code is doing the optimization based on free memory so it would be dynamic.
For server use storing the result may be fine but if you're on a desktop or you have other programs running it could cause issues.

>>107062883
>>107062891
Once, when starting up the program and before loading the weights, a few virtual test allocations are done to estimate memory use.
Each test allocation should take something like ~0.1s at most.
With interpolations/extrapolations I would only need 6 test allocations so ~0.6 s.
If I were to do very fine-grained optimizations where individual weight tensors are shuffled between devices it should still stay below ~100 virtual allocations so <= 10 s.
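to be clear, the interpolation step itself is trivial; per device it amounts to fitting a line through the test points and solving against the memory budget (toy python, the real code works on actual allocation sizes rather than layer counts):

# model memory on one GPU: mem(n) ~= a*n + b, n = layers assigned to it,
# with (n0, m0) and (n1, m1) coming from two virtual test allocations
def fit_line(n0, m0, n1, m1):
    a = (m1 - m0) / (n1 - n0)
    return a, m0 - a * n0

def max_layers(a, b, budget):
    return int((budget - b) // a)  # largest n with a*n + b <= budget

the difficulty is MoE + multi GPU, where memory use is no longer a clean function of a single variable per device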
>>
>>107062861
>So I would like to ask you how much latency you would be willing to tolerate for a feature like that.
Probably a lot, if people tolerate the current trial and error method, which is basically torture.
>>
>>107062939
How are you going to avoid tensor washback?
>>
>>107062947
What do you mean by tensor washback?
>>
>>107062939
Just once on startup is whatever dude, add as much latency as needed

Are there really any use cases that need rapid model-switching? Even in some kind of multi model pipeline where models get unloaded and loaded in, with the speed of inference as it is, any gains in memory efficiency would far outweigh any latency in-between steps. If there are really any edge cases where the opposite is true they would be rare and niche enough that the person doing it should just bypass whatever auto optimisation you are doing and do it themselves

tldr; boot up latency is fine, maybe add a switch for rare edge cases
>>
>>107062959
When tensors get flooded, model might receive a latent feedback cycle. This confuses the model.
>>
>>107062861
would this increase time for those with only one gpu?
>>
>>107062980
I think it's relevant for downstream use.
The easiest way to integrate llama.cpp into a larger program is to just manage a llama.cpp server process.
Any memory fitting logic can be disabled but I don't think it would be feasible for e.g. a game dev trying to integrate language models to do that stuff themselves.

>>107063023
No, for a single GPU you can do a simple interpolation.
The difficulties come specifically if you can vary memory use both by swapping stuff between GPUs and by moving MoE weights between GPUs and system memory.
>>
>>107062980
>Are there really any use cases that need rapid model-switching
Not really a use case use case, but I can imagine Ollama users complaining since iirc their models do get unloaded when idle and loaded back in when they send prompts.
>>
>>107057060
i thought the point of qwen vl was to get a description of an image that you can use to prompt the same image with models like flux or qwen image.
>>
>>107063216
that's just one use case of vl models
>>
>>107063216
it's a general language model with vision, there is no specific point to it any more than there is with any standard llm
>>
Feels like we haven't had a proper advance in model capabilities in months
>>
>>107063273
There isn't because they have reached technological limits. Benchmarking appeals to investors though...
>>
>>107063273
gemini 3 will save the field, r-right bros? there's no AI winter, scaling is still all you need? rocket emoji?
>>
>>107063380
yes sir google sukdeepmind will be of delivering fate of the star model soon
>>
RAM prices are getting bad.
>>
is anyone else just really happy that they have something to do with a high end computer that isn't playing a dogshit aaa game? seriously. fast computers are so cool, but they were kinda getting gay before lmg
>>
>>107063273
waiting on gemma 4, glm 4.6 air, and we're getting glm 5 before eoy my friend. probably a new deepseek too. a bunch of experimental long context/memory stuff just came out too.

we're definitely in a lull though
>>
>>107063456
Placebo, RAM has never been cheaper than now >>106994515
>>
>>107063611
>2023
we live in 2025 time traveler
>>
>>107063623
Ok troll.
>>
>>107063639
Not seeing an argument
>>
>>107063583
It looks like anons forgot about Mistral Large 3...
>>
>>107063665
lol
>>
>>107063583
>glm 4.6 air
do not dare rush them you ungrate
>>
>>107063665
>it’s no secret that we’re working on something ‘large’ over the next few weeks
>May 7, 2025
>>
>>107063665
pretty sure mistral forgot about mistral large 3
>>
File: main-image.jpg (226 KB, 1200x1190)
226 KB
226 KB JPG
>>107061898
>>107062726
>Dress is thin and form fitting instead of thick and draping so that only some of the body's curves show through
Pure dogshit, get some taste, etc. but happy halloween
>>
>>107063837
Haha penis.
>>
>>107063851
There is no penis anywhere in that image????
>>
>>107063981
>>107063981
>>107063981
>>
>>107062568
>>107062726
i was just responding on topic you giga autist who probably can't even use the models correctly
>>
File: laughing skull.gif (174 KB, 299x240)
>>107064382
All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.