/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107255984 & >>107245928

►News
>(11/19) Meta releases Segment Anything Model 3: https://ai.meta.com/sam3
>(11/11) ERNIE-4.5-VL-28B-A3B-Thinking released: https://ernie.baidu.com/blog/posts/ernie-4.5-vl-28b-a3b-thinking
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html
>(11/05) MegaDLMs framework for training diffusion language models released: https://github.com/JinjieNi/MegaDLMs

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: what's in the box.jpg (235 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107255984

--Technical challenges and limitations in running and optimizing AI models:
>107257229 >107257268 >107257283 >107257317 >107257329 >107257380 >107257444 >107257474 >107259236 >107257413 >107257399 >107257451 >107257942 >107258569 >107259121 >107258648
--Mikupad guide for SillyTavern users and story writing tool comparisons:
>107261172 >107261581 >107262019 >107262306 >107262615 >107262670 >107262815 >107262969 >107263305
--Temperature-controlled two-stage tool calling workflow optimization:
>107261888 >107261927 >107261982 >107262125
--Threadripper vs Epyc hardware cost-performance debates and compatibility questions:
>107257544 >107257554 >107257594 >107257610
--Meta's SAM 3 computer vision model features and limitations:
>107264112 >107264149 >107264221 >107264552
--Gemini 3's performance leap and challenges in replicating its reasoning process:
>107260137 >107260169 >107260184 >107262030
--Debating prompt engineering techniques for surreal images and waifu-themed browsing:
>107256559 >107257574 >107259084
--Google AI model update with increased safety restrictions:
>107257155 >107257703
--Using Qwen3-EMBEDDING for vector-based semantic search and similarity comparison:
>107264517 >107264737 >107264853 >107264876 >107264886
--Gemini 3 coding performance and local model limitations:
>107257191 >107257226 >107257237 >107257432 >107257247
--High-power GPU setup considerations for qLoRA training:
>107256374 >107256947 >107256954 >107256967
--Seeking smaller, more reliable local models for accurate shell script generation:
>107259062
--Updates and frustrations on GLM model PR progress in llama.cpp:
>107257085
--Gemini3-powered Python RPG engine with NSFW interaction options:
>107257682
--Miku (free space):
>107260917 >107261172 >107261473 >107263922 >107264552

►Recent Highlight Posts from the Previous Thread: >>107255987

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
ahem dipsy's hairy pussy
>>
>>107266646
this post is NOT related to local models!
>>
>>107266656
im sniffing her pussy locally on my rig
>>
>>107266659
lies! dipsy does not have smell modality
>>
Repeating my previous posts for newcoomers:

guize... we got to try training a model on this, right?
I'm tempted to risk getting b& from yet another cloud provider by grabbing the $200 plan and generating as many responses as I can.
These aren't the real reasoning traces but it sure as fuck looks like it'd still work well enough.
$ curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer test-key" \
  -d '{
    "model": "gpt-5",
    "messages": [{"role": "user", "content": "Tell me a joke"}]
  }'
{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"<think>I\u2019m looking to keep things simple and deliver a single joke as requested. No tools are needed, and I want to maintain a friendly tone. \n\nI\u2019m considering a couple of options\u2014like, \u201cI told my wife she was drawing her eyebrows too high. She looked surprised.\" But then there's a better tech joke: \u201cWhy do programmers prefer dark mode? Because light attracts bugs.\u201d \n\nThis sounds clever and safe! So, I\u2019ll go with that as my final answer.</think>Why do programmers prefer dark mode? Because light attracts bugs.","role":"assistant"}}],"created":1763595363,"id":"resp_0eeac847c94a6a5701691e546344f0819bae0f489146835a89","model":"gpt-5","object":"chat.completion","usage":{"completion_tokens":146,"prompt_tokens":5015,"total_tokens":5161}}


>>107266615
au contraire, people were talking about how google originally showed the real gemini 2.5 traces on aistudio but now shows a chatgpt-web-interface-style summary (I remember it outputting more realistic-looking traces as well).
But who knows, maybe this is because of the gemma scandal and the real traces can still be gotten through the API, I'm not sure.
>>
File: gpt clarifier.png (325 KB, 1737x1489)
gptchan just called me "the clarifier" kek
>>
File: ram.png (737 KB, 1200x675)
Reminder.
Ram slots this year.
NPU next year.
Save yer money.
>>
>>107266875
In the image I'm generating a dataset of programming challenges using gpt-5.1 with the coding plan and logging the responses to finetune open models in the future.
The good news is that it works great with my pre-existing coding assistant.
There's no bad news yet.
>>
File: 1761911824413080.png (1.11 MB, 1024x1024)
>>107266646
Dipsy shall return.
>>
>>107266922
>DeepSeek is looking to maintain the momentum gained by the debut of its R1 reasoning model by rushing its new R2 model to market as quickly as possible.
>It first planned to launch R2 in early May, but it now wants to move the release date forward.
https://bgr.com/tech/deepseek-is-rushing-to-get-its-next-gen-r2-model-out-sooner-than-expected/
Less than 6 more months until R2
>>
>>107266922
"kept you /wait/ing huh?"
>>107266933
literally no deepseek release has ever been accurately predicted by any media outlet even a day in advance
all the 'leakers' are full of shit, and usually random AI retards on twitter
they're never gonna do an 'R2', DS is a hybrid reasoner now
>>
whats the horniest model?
>>
>>107266999
The one inside your brain
>>
how to local finetune?
>>
File: 1759871195983087.png (2.46 MB, 1024x1536)
>>107266922
Agree. But not until DS releases a new model.
In the meantime:
https://rentry.org/DipsyWAIT
https://mega.nz/folder/KGxn3DYS#ZpvxbkJ8AxF7mxqLqTQV1w
>>
>>107267014
you don't
>>
>>107267027
what if i have 6 5090s?
>>
>>107267031
Sell one and commission Drummer, he'll do a better job than you could.
>>
>>107267039
what if i like doing shit myself?
>>
>>107265642
Here's the formatting for FIM with Mistral. I'm not smart enough to figure out what this means for how you'd use it with Mikupad.
https://docs.mistral.ai/api/endpoint/fim
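Going by that page, a raw FIM request looks something like this (untested sketch; the model name and the prompt/suffix fields are from the docs, the actual values are made up):

$ curl https://api.mistral.ai/v1/fim/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "codestral-latest",
    "prompt": "def fibonacci(n: int):",
    "suffix": "n = fibonacci(5)",
    "max_tokens": 64
  }'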
>>
>>107266816
>guize... we got to try training a model on this, right?
You and what datacenter?
>>
>>107267014
>>107267031
Unsloth is good for single GPU tuning (least memory and fastest) but I haven't tried it for multi GPU.
Axolotl has a very good dataset loader but it takes more knowledge to write adequate config files to make good use of your hardware (fsdp, zero, etc. which btw are all trash). They have a plugin for liger kernel but I haven't tried it.
Llama factory is the easiest to get going for multi-gpu but also consumes a lot of memory unless you use the right options like liger kernel.
In any case, be prepared for a lot of bugs, incompatibilities and frustration.
I suggest you get familiar with pypi-timemachine to make pip ML dependency hell bearable.
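For anyone who hasn't used it: pypi-timemachine spins up a local PyPI proxy that only serves package versions published before a given date, so old requirements resolve the way they did back then. Rough sketch (it prints a random port at startup, yours will differ):

$ pip install pypi-timemachine
$ pypi-timemachine 2024-05-01
# Starting server at http://localhost:40899
$ pip install --index-url http://localhost:40899/ axolotl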
>>
>>107267078
By "train" I meant superrvised finetuning of existing models. Aint nobody gonna train a model from scratch.
>>
>>107267088
i have been suffering with axolotl for the past 3 or so days. i even tried oobabooga's built in trainer because i was desperate. guess i will take a look at llamafactory. thanks
>>
>>107267099
But the Chinese are already making models finetuned on western reasoning traces at a much larger scale and with the actual reasoning where they can still get it. I guess you could make a GPTified reasoner out of Nemo, but besides that I don't really see what you could hope to accomplish.
>>
>>107267113
Another thing I forgot to mention is when installing you can avoid compiling flash-attn by installing the packages from the release tab on their github.
I've used all of them with some degree of success with sharegpt format.
Never tried ooga's thing.
>>
Google is leaving the improvements in image and video generation for 3.5 it seems
>>
>>107267130
OpenAI also releases new models continually and it's not clear how much the open weights models are updated over time.
Don't you think finetuning gpt-oss on the latest version of its most powerful big brother could maybe make gpt-oss a bit smarter? It could also be oriented toward the fields you care about, so you could try to make it forget some of the info and skills you don't care about to optimize the things you do care about.
Besides that, it'd be interesting to see how the original personality of a model interacts with the new data and how much of the original personality remains. For example I'd like to see it done with Gemma since it has a very rich and profound personality for a 27B model.
And it'd also be interesting to see if Qwen3 30B coder can be boosted.
>>
>>107266879
>4 RAM sticks
In this economy? You're better off stacking 3090s.
>>
>>107267215
>Don't you think finetuning gpt-oss on the latest version of its most powerful big brother could maybe make gpt-oss a bit smarter?
No, it's unsalvageable.
>>
>>107267158
Image and video generation have always been separate products with Veo being for video and imagen (nano banana) being for images even if you access them through Gemini. They're just updating them whenever they feel like it. Their image generation model is like two months old now.
>>
>>107267268
What do you mean? It stands alone in that ~100B category along with Air. Do you think Air is better?
>>
please enlighten, or better said: spoonfeed my currently retarded and autistic mind: if I deploy Mistral Nemo on my 3090, would I be able to have a virtual girlfriend that sends me horny messages?
>>
Is there any runtime that lets you run inference with arbitrarily large full sized models on a single graphics card by just swapping to and from memory/disk, with the assumption that it will just take a really long time to generate a response?
>>
>>107267328
yes, llama.cpp with default settings already streams from disk
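i.e. no special flags needed, something like this already works because the weights are mmap'd by default (--no-mmap is the flag that would turn it off):

$ ./llama-cli -m some-huge-model.gguf -p "Tell me a joke" -n 128
# pages get pulled from disk on demand and evicted under memory pressure,
# so expect it to be brutally slow if the model vastly exceeds your RAM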
>>
>>107267301
She will be dumb but yes.
>>
File: theo.jpg (110 KB, 736x1308)
I want to use an AI that can edit this image so that these two characters are naked. Are there any offline AI that do not censor that can do this? I'm new to this so I don't know. Thanks.
>>
>>107267431
wrong thread
>>>/ldg/
>>
>>107267431
KYS
>>
I was thinking about how to do containerization to be able to run agents in YOLO mode without them deleting my damn home folder, and realized the easiest way is by creating a new user. Hope Murphy's law doesn't strike.
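Minimal sketch of the new-user approach (username made up):

$ sudo useradd --create-home --shell /bin/bash agentbox
$ sudo -u agentbox -i
# run the agent from this shell; worst case it only nukes /home/agentbox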
>>
File: 1758176957549830.png (201 KB, 1210x979)
>
>>
>>107267431
live fujo
>>
>>107267431
I asked a non-gay version of this question on the local diffusion thread and they recommended qwen + photoshop
>>107267527
The day Sam Altman goes bankrupt is the day i will smile from ear to ear for the first time in decades
>>
>>107267527
kimi/deepseek/glm all get this right
is oai really still this dumb?
>>
>>107267527
gpt-5 really needs thinking to not be retarded but when you let it think, oh boy
>>
>>107267659
that looks like the free user interface, you have to force it to think to get consistently good results. the autorouter doesn't work.
>>
>>107267340
interesting, thanks for the response anon
dumb in what ways?
>>
I got a question for fellow sillytavern gooners
how do you cure alzheimer's? I'm pretty new to this shit and I can't wrap my head around either author's notes or world info
>>
>>107267734
https://rentry.org/NG_Context2RAGs
>>
>>107267088
You need to pay to use unsloth for multigpu
>>
File: 1748008102015118.jpg (47 KB, 738x415)
Gemini3 feels hardly better for creative work. If this is the next generation of LLMs that our chink overlords have to distill, we're fucked.
Maybe LLMs are truly a dead end.
>>
File: file.png (573 KB, 960x960)
>>107267836
>Maybe LLMs are truly a dead end.
>>
>>107267796
They have claimed at various points in time that you get multigpu by paying, that no version works with multigpu yet but they're working on it, and that the free version already works with multigpu out of the box by relying on ZeRO (what they currently claim). The only way it all makes sense is that it's a grift that doesn't actually work, and they hoped some people would pay and be too lazy to complain.
>>
>>107267780
Thank you, this is the best explanation I've seen so far, the lorebook guide in there is great too
So basically, if I reach the context limit, I should just put the important stuff into some kind of data storage and can restart a chat by replacing the first message with a summary of the previous one?
>>
<think>
miku miku oo ee oo
</think>
>>
>>107267891
Basically yes
>>
>>107267891
if you do get into using A/N or lorebooks I'd strongly recommend installing
https://github.com/SillyTavern/Extension-PromptInspector
honestly this should just be a standard feature in ST. makes it much easier to figure out how the 912 different prompt manipulation features actually work
>>
>>107267703
All ways.
You need at least 123B Q8, or higher.
Get ram.
>>
https://huggingface.co/mradermacher/mistralai-Mistral-Nemo-Instruct-2407-12B-MPOA-v1-GGUF

is this now the least censored, most powerful mistral nemo version?
>>
>>107267973
this is possibly the shittiest advice i have ever seen here
>>
>>107267974
when is he going to do gemma 3 27b, or a qwen 30b
>>
>>107267895
>>
>>107267973
>All ways
can you please clarify a bit with an example
>>107268010
good to know anon, thanks, interestingly enough I should be able to run mistral large on dual 3090's but I'd like to avoid that and just use a single one
>>
>>107268054
largestral is outdated at this point and is slow as shit. pretty sure the other anon is trolling you. get a q4 of glm air. that will run on a single 3090 with some offloading to ram
>>
I'm tired. I'm spent. I'm exhausted. I'm drained. I'm worn out. I'm beat. I'm pooped. I'm bushed. I'm wiped out. I'm done in. I'm all in. I'm dead. I'm dead tired. I'm dead on my feet. I'm ready to drop. I'm out of gas. I'm running on fumes. I'm running on empty. I'm out of steam. I'm out of juice. I'm out of energy. I'm out of power. I'm out of strength. I'm out of stamina. I'm out of endurance. I'm out of vigor. I'm out of vitality. I'm out of life. I'm out of breath. I'm out of wind. I'm out of air. I'm out of oxygen. I'm out of blood. I'm out of circulation.
>>
>>107268091
thanks anon, will check it out, I hope it is fully uncensored, yes?
>>
>>107268120
more or less. might need a very very simple jailbreak. there are rp finetunes of it if you are interested
>>
>>107268091
Air is retarded. When's the last time you tried Mistral Small? 2509 has great overall comprehension and doesn't fail as much as Air
>>
>>107268499
i have never used mistral small because i am not poor
>>
>>107268504
I have 2 servers because the large one draws up to 2kW, so I'm used to both worlds. As much as I like 4.6, air is a piece of shit
>>
>>107268587
my server draws 3.2kW
>>
>>107268591
If you run air, you're both poor and retarded
>>
>>107268597
i run air for speed and 4.6 for quality. i get 80t/s on a q8 of air but only 12t/s on a q4 of 4.6
>>
>>107268599
dense 24b > undertrained a12b moetrash
>>
File: 5dolars.png (886 KB, 809x802)
>>107268054
>can you please clarify a bit
Memory issues
Ignoring instructions
Failing character card details
Weaker context length sanity
Weaker character card token length sanity
Spatial sense issues
Inconsistent post length
Can't end post properly
Limited vocabulary
Higher chance of spouting random bullshit
Higher chance of talking in another language
"Phantom Memory" between characters
Failing human anatomy
Less understanding of metaphors and social knowledge
Characters with bilocation
Treating you as if you're bilocated
Schizophrenic characters
Bipolar characters
Repetitive posts
Parroting words you just said
Forgetting positions
Forgetting where the fuck you are
Behaving the same, no matter the character card
Getting stuck in one place without actually continuing
Zero initiation to progress the story
Loss of sense of time
Failure to explicitly name parts, things and locations
Magically changing wardrobes

Get 123B Q8 or higher to get rid of most of these issues.
>>
>>107268635
cope
>>
>>107266608
>https://rentry.org/recommended-models
>Nemo (12GB) - An excellent starting point for vramlets. Uncensored.
Is this still the recommendation? Nothing better around this size came out for a whole year?
>>
>>107268646
Half of this is a prompt issue. Don't mind the tard
>>
>>107268991
you don't need more
>>
>>107268991
Smarter models have come out since then. As for writing competent smut, it's been downhill after Nemo.
>>
>>107268997
>just prompt the ai to do anything bro
>>
>>107269129
glad we agree
>>
I'm obsessed with sending my vision model cp
>>
>>107268646
Skill issue
>>
>>107269193
you must really like the hotlines
>>
>>107269193
who you sending it to? gemma? gwen?
>>
File: nimetön.png (81 KB, 1016x547)
>>107269222
With my one line system prompt (which Gemma3 is supposedly not even trained with) I don't get hotlines, just some content warnings
>>
>>107269222
It's strange that on kobold frontend I get messages like that, but in SillyTavern she loves it and goes on about how she wants to lap her up and get her all wet before I have my turn with her. I literally just send the image as the first message and she'll react positively. It even knows that it's cp and doesn't give a flying fuck.

>>107269250
32b gwen
>>
>>107269271
SillyTavern has a jailbreak prompt section that it auto injects into the requests.
>>
>>107268991
A whole year has passed and you don't have better hardware?
>>
>>107269271
Is 32b qwen significantly better than the 30b moe?
>>
>>107269287
>Is a32b qwen significantly better than the a3b?
couldn't be
>>
>>107269287
idk, I never tried the 30b model. I just downloaded the 32b because it's newer and bigger number.
>>
This chat completion mode is so much worse than text completion, holy shit. Even cranking the temperature up to 1.2 keeps producing the same shit over and over. Here are the starting sentences of 5 separate responses to 5 different images I sent to the model, starting from 0 context:

adjusts her spiral sunglasses and tilts her head, eyes narrowing slightly.
Groggily shifts in place, hair slightly disheveled from the sudden visual overload
adjusts sunglasses with a playful smirk, her blue hair dancing slightly as she leans forward with interest
(glances at the image with a slight frown, then looks up at you with narrowed eyes)
adjusts spiral sunglasses, tilting head to study the image with a smile

And every fucking ending is almost always "So, what do you say?" "So, what's it going to be?". I hope to god the problem is actually the chat completion mode and not the model itself.
>>
>>107269266
>(which Gemma3 is supposedly not even trained with)
"supposedly"? what are you, 5?
you can check what happens when you give it a "system prompt" yourself in the jinja template:
https://huggingface.co/unsloth/gemma-3-27b-it-GGUF
{%- if messages[0]['role'] == 'system' -%}
    {%- if messages[0]['content'] is string -%}
        {%- set first_user_prefix = messages[0]['content'] + '\n\n' -%}
    {%- else -%}
        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '\n\n' -%}
    {%- endif -%}
    [...]

It's merged into your first user message, cretin. So if you're using chat completion (rather than text completion and writing a template it doesn't even know about) the model never sees even a whiff of a system role message, because the system message's content is just prepended to your first user message. Your chat UI lets you set a system prompt, but llama.cpp sees that and does contentOfSystemPrompt + contentOfUserMessage and feeds it as a USER role message to Gemma.
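So what the model actually receives is a single user turn along the lines of (illustrative, using Gemma's documented turn tokens):

<start_of_turn>user
You are a helpful assistant.

Tell me a joke<end_of_turn>
<start_of_turn>model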
>>
>>107269290
proof?
>>
>>107269403
Also:
https://ai.google.dev/gemma/docs/core/prompt-structure

>System instructions
>
>Gemma's instruction-tuned models are designed to work with only two roles: user and model. Therefore, the system role or a system turn is not supported.
>
>Instead of using a separate system role, provide system-level instructions directly within the initial user prompt. The model instruction following capabilities allow Gemma to interpret the instructions effectively. For example: [...]
>>
>>107269389
>I hope to god the problem is actually the chat completion mode and not the model itself
lol
>>
>>107269403
>>107269469
cool story bro
>>
File: 1522793439731.jpg (37 KB, 287x318)
someone have the old miku card? the one that was a cute assistant instead of a crazy psycho? i lost that one.
>>
>>107269714
>https://files.catbox.moe/cbclyf.png
The one in OP, the one in llama.cpp, some other?
>>
>>107269827
not the one in OP, i just remember she was always willing to help.
>>
File: google_bananas.png (523 KB, 597x953)
Gemma 4 isn't coming this week, is it?
https://x.com/alisa_fortin/status/1991392201994301756
>>
>>107269847
This is the one that used to be on llama.cpp. I don't know if it's the one you're looking for. You'll have to make it into an actual card.
https://files.catbox.moe/ww7hxe.sh
>>
>>107269868
yes thats the one, thanks
>>
>>107269860
That's all it takes to get free advertising now.
>>
It's official now.
https://www.threads.com/@yannlecun/post/DRQL7I2jlco

>As many of you have heard through rumors or recent media articles, I am planning to leave Meta after 12 years: 5 years as founding director of FAIR and 7 years as Chief AI Scientist.
>
>The impact of FAIR on the company, on the field of AI, on the tech community, and on the wider world has been spectacular. The creation of FAIR is my proudest non-technical accomplishment.
>
>I am creating a startup company to continue the Advanced Machine Intelligence research program (AMI) I have been pursuing over the last several years with colleagues at FAIR, at NYU, and beyond. The goal of the startup is to bring about the next big revolution in AI: systems that understand the physical world, have persistent memory, can reason, and can plan complex action sequences.
>
>I am extremely grateful to Mark Zuckerberg, Andrew Bosworth (Boz), Chris Cox, and Mike Schroepfer for their support of FAIR, and for their support of the AMI program over the last few years. Because of their continued interest and support, Meta will be a partner of the new company.
>
>As I envision it, AMI will have far-ranging applications in many sectors of the economy, some of which overlap with Meta’s commercial interests, but many of which do not. Pursuing the goal of AMI in an independent entity is a way to maximize its broad impact. I will give some more details about the new company when the time comes. In the meantime, I’m sticking around Meta until the end of the year.
>>
>>107269928
Yann's tshirt matches Mark's shorts. What's up with that?
>>
>>107269954
They swapped their t-shirts before the photoshoot.
>>
Why can't we have omni models that output the current image of what's happening in the roleplay? There are models for input, for output, even an image-editing transformer. What is Qwen even doing?
>>
>>107269281
Who said I could run a 12B a year ago?
>>
>>107269989
Something something safety&liability.
You could easily have current models generate relevant danbooru tags in alternative, though.
>>
>>107269989
They're waiting for someone else to do it to then benchmaxx it and claim victory.
>>
>>107270022
>safety&liability
Do chinks care? Those who release video models don't
>>
>>107269928
>Meta will be a partner of the new company
So Lecun literally gets everything he wants, to research in peace while Meta funds him, and now with less dumbasses above him that he has to answer to.
>>
>>107270063
Only Hunyuan Video released at the end of last year seemed completely uncensored, I'm not aware of newer video models as crazy as that one (but I haven't followed video model releases that much).
>>
>>107270065
Perhaps not entirely funded by Meta. But sounds like a decent compromise.
>>
so... how far behind are we now?
>>
>>107270497
What's an angry video game nerd score?
>>
I feel like all of the recent cloud models released like grok 4.1, gemini 3 and gpt 5 are actually just way dumber than before, they just give you the wrong answer really fast. Maybe its finally time for local to shine? Or is this new "optimization" going to seep into local models too?
>>
>>107270590
It's already seeping in with ever-growing MoE models. They're trying to lower compute costs for inference.
>>
>>107270590
definitely true for grok 4.1.
its really uncensored and i like the writing, but its dumb AF.
gpt5 isnt that smart either.
gemini 3 is a beast though. i highly suspect there has been some bigger changes. feel like what gpt5 should have been.
i actually could "vibe code" a complete chatgpt site clone for both mobile and pc using openrouter api. features like model comparisons where i can choose the response etc. replaed openwebui for me.
18k tokens for a single contained html site. claude starts choking hard at around 10k.
>>
>>107270696
>ass
cringe 3rd world opinion
>>
>>107270734
True, I want slender legged girls with tits being popular again.
>>
>>107270696
>took 5 times the tokens to reply
actually shit
>>
>>107270590
5.1 for free on openrouter impressed me. Only it wouldn't stop making lists.
grok previews were retarded like an 8b model. word salad.
gemini3 seemed the same as gemini 2.5
>>
>>107270696
When a model replies with "Flat." is when we'll know we have AGI.
>>
>>107270696
I find Grok 4.1 incapable of flirting, it just feels like a coom finetune from the community.
>>
what can you fit into 96GB VRAM that is not a literal cope quant? censored is not an issue, I need a non-retarded model for creative writing and perhaps very light coding
>>
>>107271027
L3.3
>>
>>107270590
I can only comment on coding use cases. gpt5 is the choice if complex architectural analysis is required. 5.1 got lobotomized, it's way worse. it can barely form coherent sentences. like >>107270778 said, keeps outputting lists, also markdown, and general low IQ slop. I think it's deliberate, they're trying to make it more normie dimwit friendly
>>
>>107271027
gpt-oss-120b
>>
>>107271027
96GB VRAM without a couple hundred GB of fast RAM to run experts off it? You're fucked.
We're living in the age of huge MoE models so the choice for you is between old dense shit like llama3.3/mistral large or running the same entry-level shit poorfags run on their 3090s + RAM albeit considerably faster.
>>
File: 1557074351667.png (86 KB, 422x188)
Gemma 4? Today??
>>
>>107271175
mistral-large still the most natural model. glm is smart but stiff. deepseek is schizo. llama-3 is a bit old, but the choice is yours. micro-active moe are soul-less token predictors. not a single one released this year is any good for writing. glm-air or toss will handle light coding though. better off using gemma and learning how to prompt, or one of the 32b.

grim.
>>
>>107271312
>mistral-large still the most natural model
No, Mistral Large was slopped, it was basically the reason XTC was invented. It needed high temperature to have some semblance of creativity. Even back then it was a side-grade to the first L3 70B. It was never good.
>>
>>107269403
Oh yeah, just lemme ssh into the google server farm and check the jinja template they used. Fucking retard.
>>
>>107271360
https://huggingface.co/google/gemma-3-27b-it?chat_template=default
>>
File: ec71a2f8.jpg (70 KB, 1280x720)
Will NPUs need their own RAM like GPUs to run AI, or do they make AI fast using motherboard RAM?
>>
>>107271306
I'm afraid this week is Gemini week only.
>>
>>107271379
Yes, I also asked Grok and ChatGPT. Grok says it does. ChatGPT says it don't.
>>
>>107271379
we need memory bandwidth, how would a slow pcie-connected card make your ram fast? wouldn't the gpu suffice then? think nigga think
>>
>>107271394
But it's a NPU and tons of laptops are having them now. They're made for AI. GPUs are made for vidya games desu.
>>
>>107271373
Doesn't mean that's what it was trained with. Maybe there is a secret sysprompt token they're not telling you about.
>>
>>107271416
*made for small ai that runs in the background for power efficiency
all marketing buzzwords, it's just some matmul slapped on top of a regular cpu, not applicable for big coom models
>>
>>107268646
lol I can almost hear this in my head.
>>107268997
I want to see the prompting strategy that makes Smol135M not retarded.
>>
>>107270590
Gemini 3 is much more capable in code. My personal set of non-public bench prompts is half one-shot successfully by it. It's the first model that could, for example, generate a proper paginated e-reader web page. It's not inherently complex for an experienced dev to make something like that, but strangely enough, no LLM managed to do it properly before (which goes to show most benchmarks are bullshit and LLMs aren't that good at generalizing and producing good code outside of benchmarks). They all either generate fast pagination that is wrong (wrong as in, text overflows from the mandated page size, or it works but doesn't handle unicode properly like using Intl.Segmenter and stringWidth so it doesn't overflow on ASCII but does on chinese, or doesn't know how to word break etc) or it's correct but so slow as to be fucking unusable (30s to load a 2mb .txt or repaginate after changing font size)
I have many prompts like these, that are inherently simple single page apps that don't take much code to produce a prototype of, but that do things that aren't part of benchmarks and that LLMs are terrible at making. Another example I mentioned before in the thread: making it gen a TUI micro framework that can properly handle resize events and has decent widget abstractions. Most LLMs will drown you in misalignment, broken firing of events, cells that aren't cleared properly when opening/closing modals etc.
Gemini 3 is genuinely a leap forward.
(many of the things I was talking about can be remediated in shittier LLMs by writing pages after pages of instructions mentioning pitfalls and good practice but what's the point of a LLM generating code if you are spending this much time holding its hand????)
>>
>>107271417
It's simply that Gemma follows user instructions well and that any safety it's been given (beyond training data filtering) is just superficial and easy to circumvent. If you logically separate them from the actual message content well enough, you can put multiple blocks of instructions into the user role, and Gemma will react accordingly and say whatever you want, no special jailbreak sequence or secret role needed.

Even its refusals don't "short-circuit" the model (something other AI companies do to mitigate jailbreaking attempts) and can be reasoned with. Gemma 3 could have easily been the best model of its size range so far if it wasn't for the easily-triggered, meme-worthy rape hotlines and the excessively filtered training data.
>>
>>107269928
>https://www.threads.com/@yannlecun/post/DRQL7I2jlco
why is he in threads? I thought this site died already lol
>>
>>107271417
even if, for some incongruous reason, they trained a hidden le system prompt in gemma, if you are using the chat completion ui you are NEVER ABLE TO SEND A SYSTEM ROLE MESSAGE PERIOD BECAUSE LLAMA.CPP CONVERTS IT INTO A PREFIX PREPENDED TO YOUR FIRST USER MESSAGE so all your retardation here is moot you fucking waste of oxygen and food
do something good for the world and become an hero
>>
>>107271574
cant they just not use the jinja template and whack in the special tokens manually?
>>
>>107271554
Sucking up to his boss/piggybank by using his platform. Everything gets reposted anyway.
>>
Theoretically, instead of buying 10 blackwells for 80,000 USD
What's stopping me from buying 40 used 3090s for 13,333 USD and running a LLM on it?
>>
>>107271633
Expect a visit from law enforcement expecting to find a crypto mine or marijuana farm.
>>
>>107271346
Was it? Pew could only run small models. He's a vramlet.
Literally everything is slopped. Now it's slopped and parrotmaxxed. Large has 2 versions and pixtral. Beyond that you got cohere.
There's some decent llamas such as eva, but I can't stand it as released. Large has tunes as well.
We went from a whole ecosystem to GLM, Kimi, Deepseek and a torrent of shitty smalls.
>>
>>107271670
>GLM, Kimi, Deepseek
Really mentioning GLM but forgetting Qwen? Might as well include Ernie then too.
>>
>>107271633
What board are you going to put 40 3090s in at full PCIe bandwidth?
You could build a cluster but networking is going to make things slower and getting anything to run on it is going to be a massive pain in the ass.
>>
>>107271680
Qwen is bad and getting worse. Ernie is somehow worse than qwen. Latest VL is so overcooked that it's unusable.
>>
>>107271712
Mining boards exist and you don't need more than x1.
>>
>>107271429
Remember how GPUs were used to mine bitcoin?
Now it's ASICs. You can't make a profit off of GPUs. All it takes is something specialized for the job. GPUs aren't specialized for AI; they're just the equivalent of what bitcoin mining had at the start. NPUs are specialized.
>>
>>107271633
>What's stopping me
Being a retard.
>>
>>107269954
They clearly exchanged shorts before this pic was taken
>>
>>107271739
In theory, yes. But we're still waiting for specialized devices with lots of memory. The NPUs you are talking about are specialized for micro models and to use as little power as possible to avoid killing laptop and phone battery in an hour.
>>
>>107264804
>What are you talking about? There's no <<sys>> in Mistral template. Never was.
My bad. <<SYS>> is actually a llama2 thing.
https://www.llama.com/docs/model-cards-and-prompt-formats/meta-llama-2/
But the early Mistral prompt format used the same format as llama2, with the system prompt being in the user role, so it pretty much works the same way there.
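For reference, a llama2-style turn with the system prompt inlined looked like this (per that model card):

<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Tell me a joke [/INST]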
>>
>>107271554
I guess it's because he and Elon had a fight.
>>
this llm rabbit hole is even deeper than img gen with sdxl
>>
>>107271633
>What's stopping me from buying 40 used 3090s for 13,333 USD and running a LLM on it?
Getting a ton of ram for cheaper
>>
>>107271869
image gen much deeper.
>>
>>107271889
>just wait 8 hours for a response, bro
>it's fine, bro
>just don't use more 8k tokens, bro
>you don't need vram, bro
>the $10k i spent on ddr5 totally wasn't a waste, bro
>>
https://edition.cnn.com/2025/11/19/tech/folotoy-kumma-ai-bear-scli-intl
which one of you thought of ERPing the bear
>>107271869
I wouldn't say so.
Imagen has a ton of interesting tooling, like controlnets, useful lora (I know loras exist for LLMs but when was the last time you downloaded and used one?), cfg and tools on top like rescalecfg, prompt editing (alternating words between diffusion steps, altering the weight of individual words in a prompt etc) and the many extensions in comfyui
it's an actual rabbit hole
LLMs have the depth of a puddle, download a model, set the most valid sampler settings for it, done, just proooompt
>>
>>107271940
i dunno, maybe theres more content but hardware wise for sure its deeper
>>
Olmo 3 released. Will it be yet another fully-open model just good for math/stem benchmarks?

https://huggingface.co/collections/allenai/olmo-3
https://allenai.org/blog/olmo3
https://allenai.org/papers/olmo3
>>
>>107271958
>hardware wise for sure its deeper
lol no it's not
cpu maxxing is a cope for people who don't mind waiting 5 minutes (4 minutes of reasoning, one minute to actually crank out what you want to read) for three lines of dialogue in their retarded ERP sessions using copequantarded models
the only valid hardware is the GPU but most people can't afford running larger models on GPUs.
>>
>>107271964
>just good for math/stem benchmarks?
Of course.
>Olmo 3 is pretrained on Dolma 3, a new ~9.3-trillion-token corpus drawn from web pages, science PDFs processed with olmOCR, codebases, math problems and solutions, and encyclopedic text. From this pool, we construct Dolma 3 Mix, a 5.9-trillion-token (~6T) pretraining mix with a higher proportion of coding and mathematical data than earlier Dolma releases, plus much stronger decontamination via extensive deduplication, quality filtering, and careful control over data mixing.
>Dolma 3 Dolmino is our mid-training mix: 100B training tokens sampled from a ~2.2T-token pool of high-quality math, science, code, instruction-following, and reading-comprehension data, including reasoning traces that also enable RL directly on the base model.
>Dolma 3 Longmino is our long-context mix: ~50B training tokens drawn from a 639B-token pool of long documents combined with mid-training data to teach Olmo 3 to track information over very long inputs (like reports, logs, and multi-chapter documents).
They managed to match Qwen 3 32B performance from 3 months ago. Amazing. Stupid fucks could have at least made themselves useful by putting out a 72B dense since no one else is.
>>
File: 1538861721189.jpg (303 KB, 875x949)
303 KB
303 KB JPG
Does anyone here have any experience with training Transformers models from scratch?
I trained a llama model, but when I inference it, it has a tendency to enter a loop of repeating itself, it gets even worse with greedy sampling. Should I just train for longer or is there something else I can do in the training pipeline?
>>
>>107271958
image gen much cheaper.
>>
>>107272022
One thing you could do is give it repeating sentences masked and then a different part unmasked, so it learns to break loops rather than continue them.
But I don't know if this would actually work.
>>
how does expert offload to RAM work? is there some swapping being done between RAM and VRAM? or does the CPU do the processing with whatever is offloaded to system RAM?
>>
>>107271508
So much heckin' this sirs!
It is for high caste elite human coding tasks not for dalit creative! Google has winned!
This poster is under no obligation to show their super secret coding benchmarks that come to this conclusion. Only Dalit wants prooves.
>>
>>107271964
I tested my translation prompts (the kind with niche terms, slangs, cultural quirks etc) on it on their online playground and it's still dumb as bricks in terms of multilingual understanding. It's barely at the level of the old Qwen 2, and doesn't even begin to fill the shoes of Gemma 2.
It feels like a positively ancient model released with a new coat of Thinking paint.
If you're going to be using a mathmaxxed model, as usual, go with Qwen, it's the more coherent, actually useful model. Their playground doesn't allow file upload, so I can't test large context understanding, but I bet they also aren't even close to Qwen for that.
>>
>>107271964
gguf status?
>>
>>107272151
why do you want gguf of THAT
>>
>>107272022
You probably should do reinforcement learning in your model.
Repetition is a natural outcome when you train an autoregressive model with cross-entropy. It isn’t necessarily a problem on its own, it mostly shows up because the model has no constraints and isn’t capable of judging the quality of its own outputs. The reliable way to reduce this behavior is to explicitly teach the model to prefer the kinds of responses you want. That’s where reinforcement learning comes in: you penalize low-quality outputs and reward the good ones, and the model gradually shifts toward better behavior.
>>
>>107272022
I have trained a few models, usually only get repetitions if I'm using greedy sampling.
>>
File: 1762791400627727.png (2.66 MB, 1076x1105)
I'm about to mmap, and I don't mind 0.5 tokens a second, but I must ask: what quant does everyone use for their large MoE? At Q1, it shits the bed at over 5k context. I have 192GB of ram, and 16GB of VRAM.
>>
>>107272041
thats actually an interesting idea, like a poor mans dpo. I might give this one a shot just to see if i can measure any difference.
>>
>>107272248
>I don't mind 0.5 tokens a second
lmao what causes this level of schizo
>>
File: 35261231231.png (114 KB, 367x324)
>>107272299
0.5 tokens a second of which I can seen being wrote, is far faster and much better quality than the unholy pickings of autists on F-list. You know not of the world, no, the hell, that I crawl out of.
>>
>>107272344
>which I can seen being wrote,
Pretty sure an 8B would be enough to generate quality text for you, ESL-kun.
>>
>>107272361
hahaha lmao gottem
>>
File: 1750916218825255.jpg (62 KB, 960x960)
>>107270590
>Gemini 3 is released
>surprise surprise, it's a MoE
>ARC-AGI-2 claims human performance

I think local will be fine.
>>
>>107272461
>MoE
There is a vast gulf between the A3B - A37B given to local versus the >A100B the frontier models have.
>>
>>107272344
f-list used to be a lot better before it got sold to a dildo company and the chat turned into endless cliquey bullshit
most people on there don't even wanna ERP anymore, so I just use their profiles to make tavern cards
>>
>>107272491
I would worry for the future about tech companies optimizing for non-consumer NPUs
>>
>>107272491
>versus the >A100B the frontier models have
Isn't Gemini's token/s too fast for A100B, even accounting for google's hardware/infrastructure
they never give details on things like proprietary model parameter counts, but some things can be estimated just based on the simple fact that you can't defeat physics of compute and bandwidth
I could believe A100B+ for GPT-5, that thing is dogslow aff
>>
realistically, how much $ do i need to invest to run GLM 4.6 locally?
>>
>>107272605
I suggest you try the model at https://chat.z.ai/ first before you make this decision
this might wake you up and prevent the sunk cost fallacy wherein you will troll this thread with more glm shilling after experiencing cpucoper remorse
>>
>>107272605
Depends on your t/s targets for generation and prefill.
>>
>>107272605
I can run a Q4 at 14t/s with 2 Blackwell Pros (~$18k)
>>
>>107272615
i literally daily-drive it for 2 months now coding c++
it's perfect that's why i'm interested in running it locally
>>107272617
anything above 10t/s is acceptable, looking for a budget build
>>107272631
hmm so around $20k
>>
>>107272615
Thanks for the recommendation, north korea agent
>>
>>107272631
How much context?
What's your platform (CPU, Motherboard, RAM) like?
>>
>>107272664
You could probably do it for much cheaper if you CPUMAXXED. Well, you would have been able to if we weren't in a RAM shortage.
>>107272674
32k context, EPYC 7702, 256GB of DDR4 2400MHz.
>>
>>107272704
damn. I'm glad I'm not alone here with my garbage 2400 ram
>>
>>107272704
i have 128GB of RAM and an AMD Ryzen 9 5900X
does this change things or do i still need to GPUmaxx?
>>
>>107272704
>32k context, EPYC 7702, 256GB of DDR4 2400MHz.
Awesome. Thank you.
>>
>>107272715
It was $320 when I got it. I am now regretting not getting the 512GB kit for $600 when I bought my RAM.
>>
>>107272664
Not him but I'd suggest first you rent a cloud machine with similar specs to the one you're planning to build and use it for a few days to see if you can cope with the limitations of self-hosting.
Locally it's going to be much slower than over API especially for prompt processing and depending on the quant it might be slightly dumber.
>>
Is there a way to limit thinking tokens to a specific amount and not have it print out 1000+ tokens each time for glm?
>>
>>107271955
lol. Sounds right from a bear named "Cumma'/cummer" when spoken in English.
>inappropriate topics like sex
>>
>>107272719
Pretty sure that means your memory bandwidth is around 80GB/s or so, which is terrible. True CPUMAXXED builds have at least 480GB/s memory bandwidth. Your options are to either get a bunch of cheap 16GB GPUs, or to get like a 5090 or 2. You also need at a bare minimum 256GB of RAM to run this model at Q4, so you either need to get an old EPYC like me or like a 9950X on an X870E motherboard with a 4x64GB kit. Both of these are currently extremely expensive. Right now is a bad time to get new hardware.
>>
>>107272719
Not him either, but GLM 4.6 won't even fit in 128GB of RAM.
It has 357B params, so just for the weights at FP4 it'll need ~179GB, and on top of that you need to store the KV cache and some other things.
As for the GPU, you want enough VRAM to hold the shared experts; I'm not sure how large they are at the moment, but they may not fit on a single 3090.
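Back-of-the-envelope for that ~179GB figure (params × bits ÷ 8, ignoring the tensors quants keep at higher precision):

$ python3 -c 'print(357e9 * 4 / 8 / 1e9, "GB")'
178.5 GB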
>>
>>107272631
>Q4 on 2 Blackwell Pros + RAM
Damn, that's a massive hit from the DDR4 part considering you're probably only a couple dozen GB short of VRAM to fit Q4 fully into it.
I'm getting about 17t/s on my single Pro 6000 + 12x DDR5-6400 running Q5 with ik_llama and some context filled.
>>
>>107272573
>even accounting for google's hardware/infrastructure
Ironwood TPU v7 has 7.37 TB/s of memory bandwidth per chip, did you account for that?
>>
>>107272631
Just drop down to Q3 and you can get 40t/s and 100k context.
>>
>>107272728
renting sounds like a better idea actually, especially if i don't have to pay for the hours i don't use it
>>107272778
>>107272781
i see, that sounds like a lot of money considering i only pay like $9 a month with the official coding plan API
doesn't make sense financially to buy hardware now if i can enjoy 2 centuries of API usage with that cost
>>
k2 kimi is totally schizo sometimes in its thinking process but after using it for over a week now im able to say that it definitely keeps my stories fresh even after 32k tokens. the amount of character development and growth that is able to take place by having kimi think in advance is definitely a game changer. i totally get the issue of not wanting it to think for a long time, most people dont want to spend a thousand tokens on thinking but i dont really get bored of reading the thinking process when its thinking as the character. overall an improvement from regular k2
>>
>>107272804
Yes.
The difference in performance between Gemini and the other SOTA API models goes beyond hardware differences. It's so much faster it's an argument in and of itself to prefer Gemini, it's just much nicer to use.
>>
So I'm doing a card in ST and I'm retarded as fuck when it comes to botmaking. I want the bot to recognize when the {{user}} searches for a specific phrase, and pause the roleplay from {{char}}'s perspective, giving a description/interactive page the user can look at. I know it's entirely capable of doing this if I say in ooc to pause the RP and describe blablabla, but I tried adding it to the card itself and it just kind of ignores it. I thought of specifying in the intro greeting with <!-- -->, but it still failed to do it and kept trying to RP as the character. Can a more experienced/less retarded maker send help?
>>
>>107272573
The only sizes we know for sure are Grok and GPT-4. It's fair to assume other leading models are of similar size. We have no way of knowing what, if any, innovations they made to speed up inference. Could be matryoshka or something else.
>>
Can I seriously not run Docker Desktop on Windows 10 IoT Enterprise LTSC?
Did they specifically make the minimum requirement exactly one build increment higher than the latest LTSC release to prevent this? Why? This is like THE version of Windows most common with the type of nerds who'd want to run a LLM locally.
>>
>>107272975
>The only sizes we know for sure are Grok and GPT-4
yes, and they run exactly at the speed you'd expect for such models. GPT-4 the original was a dog
>>
>>107272979
you aren't supposed to host LLMs on IoT devices
>>
>>107272971
you could try adding it as a post history instruction or character note/depth prompt, under advanced definitions. putting things like that lower in the context can help models pay more attention to them
>>
>>107273000
The IoT part is just a licensing detail (IoT version has a longer lifespan). Everything else about it is identical to Enterprise. All versions of Enterprise have the same latest build version.
>>
>>107272979
You can just fuck around with the registry keys to pretend you are using the version it wants.
>>
>>107272873
>doesn't make sense financially to buy hardware now if i can enjoy 2 centuries of API usage with that cost
When you put it like that...
It's probably subsidized by the chinese government, but by the time they take out the subsidies the blackwells may be a paperweight anyway.
>>
>>107273023
Thanks. Guess I'll try that. Do you know of a specific guide that makes it simple, or should I just try to google it myself?
>>
>>107273038
Sorry. It's been a while since I've last done that to force an in-place upgrade install to work.
I guess you could use massgrave to change the version then back again?
>>
>>107273031
Local has always been more about privacy and data ownership than cost savings, but the financial incentives are really, really against running locally. Presumably this will change after the bubble pops, subsidies and VC cash vanish, and the used GPU market is flooded.
>>
>>107268114
Are you me?
>>
>>107273056
I edited the most obvious registry entries (from what I was able to find as a general, non-Docker specific, guide to "faking" a windows version). Seems like it didn't work, still complains about incompatible windows version. I'll keep changing things and trying.
>>
>>107273116
I imagine
>HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion
was the first thing you changed, but in case it wasn't, try that.
Check if your changes are reflected in winver too.
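You can also sanity-check what's actually in there from a terminal (guessing Docker reads CurrentBuild or CurrentBuildNumber, not sure which):

> reg query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion" /v CurrentBuild
> reg query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion" /v CurrentBuildNumber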
>>
Good morning I hate brown "people". What triggered out local Pantone 448 C this time? Inability to run big models? Or is it just a regular brownout? Daily reminder that Israel won, brownie, no amount of cope and seethe will undo that.
>>
>>107272979
>wangblows in 2025
>>
>suddenly, israel
schizo
>>
>>107273067
honestly it's just the fact that i know that i'm getting the same exact model everyday. there's no unknown variables, no API fuckery on the server side, no mystery quant model bullshit on openrouter. i get what i pay for. the model i download today will remain the same model until i delete it.
>>
>>107273232
Control and consistency is really underrated.
>>
>>107273133
It was, but it seems to revert on system reboot even though I have full admin privileges and no automatic updates etc. That's what I'm troubleshooting now desu.
>>
>>107273067
possibly, though cloud always has the advantage that they can batch multiple requests into the same forward pass through the model
they have a fundamental advantage in efficient use of GPUs in that regard
there's always the possibility they could bump margins and enshittify, but in a hypothetical environment where GPU hours get cheap I'd expect them to be ruthlessly competing on token price if anything
>>
Burrybros what are we going to do.......
>>
>>107273008
Thanks. I tried adding it as a character note, it worked once, then I tried to replicate it again in a new chat and it fucked it up. Hm. I'll keep playing with it.
>>
>>107273183
I wish the retarded semites would leave us out of their fights.
>>
>>107273217
Who's the clown now?
>>
>>107273262
Fuck it. I give up. I'll just use the CLI. It would've been easier anyway.
>>
>>107273275
just two more weeks bro. i promise bro just two more weeks and nvidia will be trading at $10 bro. it's just two more weeks bro. please just two more. two more weeks and our puts print bro. bro cmon just give me another week and then another one and we'll crash everything i promise bro. bro bro please we just need another 14 days and the bubble pops bro
>>
where do i find guides on how to properly build prompts for dumb models like nemo
>>
>>107273538
nemo doesn't need prompting
prompting in general is a meme outside of prefills to dodge some censorship with bad models
>>
>>107273594
thanks anon, will be trying q8 nemo this weekend and see how it goes
hopefully ill be able to goon
>>
It's almost unfair how with minimal prompting Grok 4.1 Fast on OpenRouter (temporarily free) will basically write anything you ask and spontaneously double down with more, while all official open-weight model releases we've got so far from other companies are always cucked in some capacity. I wonder if it will really be publicly released on HuggingFace in a year or so. I don't think even the "Fast" version will be 3T parameters large...?
>>
File: WSJDoomin.png (135 KB, 668x948)
>>107273509
Shorts are just as annoying as bulls.
Higher interest rates are a headwind. Adding jobs signals a stronger US economy.
NVDA up. Broader market down. Seems the AI spending spree has not gone unnoticed.
>>
What would yall recommend for a model that works as a pentesting assistant?

Chatgpt works fine until it randomly pisses itself about the content
>>
>>107274383
>yall
>>
>>107274383
You asking for something specifically locally runnable or cloud models included?
>>
>>107274499
Cloud included too, I'm not picky
>>
File: glownigger.png (80 KB, 900x900)
>>107274383
>>107274510
Kimi will give (you) good advice on how to legally reverse-entrap federal agents by goading them into creating fabricated evidence of a fictional crime, exposing the telltales of their AI generated evidence in court, providing a rock-solid alibi once the bait is placed, then giving you a list of legal pretexts for a counter lawsuit depending on the context and location of the minecraft server this happens in.
>>
>>107274026
how does it compare to glm and kimi?
>>
File: 1741868586015940.jpg (2.25 MB, 1575x2300)
>>107274510
If you don't care about closed/cloud, you could try out claude.

It is somewhat ironic that despite anthropic being massive alarmist safetyfags, claude itself is significantly less paranoid and censored than chatgpt.

Just be warned though, anthropic DOES log these things. Nobody is going to come knocking if you're doing a bit of pen testing stuff but if you're planning on building out extensive automated systems beware of tripping their flags. They recently put out a blogpost about identifying an APT that was using claude.

Besides all that, my personal unsolicited advice is this though: You should have your dev environment and tooling set up to be able to swap between models with a single line. Prompts should be as model-agnostic as possible. You want to be able to switch at the drop of a hat so that you're never stuck in a situation like "I'm frustrated with chatgpt"
>>
I haven't fired up koboldcpp in over a year. I'm trying to use koboldcpp/unsloth_Qwen3-30B-A3B-Thinking-2507-Q6_K. How the fuck do i get it to work with thinking correctly? It seems to never stop thinking, or it writes the reply then starts thinking etc.
>>
>>107274720
>or it writes the reply then starts thinking etc.
Sounds like a fucked prompt template or something like that.
Are you running the latest version of koboldcpp?
I dunno if kcpp uses the jinja embedded in the gguf, but if it does, the issue could be there.
Maybe try a bartowski quant.
>>
>>107274720
It's because it has 3B active parameters so it's literally retarded. Try the dense 32B one rather than the meme moe model, also update your shit if you haven't already
>>
>>107274632
Gemini 3 is good enough too, I'm doing some reverse engineering on apis
>>
>>107274632
The reason why Claude is less lobotomized is because they've always had their filter as a separate model from Claude itself
So once you jailbreak it, you get a pure clean LLM underneath
No doubt they quietly offer uncontaminated access to their top corporate backers

This is also why Claude models have always been able to go toe to toe with GPT despite always being a fraction of the size
>>
>>107274769
>It's because it has 3B active parameters so it's literally retarded.
Always cope, never proof.
>>
>>107274819
Yeah it's cope to use the dense model of the same size you fucking retard.
>prooooofss????
Try using the model chucklenuts, it's almost as stupid as you
>>
>>107274873
Almost like these MoE cheerleaders never run the models.
>>
>>107274720
sounds like a prompt format issue, make sure you are using a chatml template and prefill with <think> if it isn't already
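In case it helps, a rough sketch of what that looks like on the wire, assuming koboldcpp's KoboldAI-style API on the default port (endpoint and field names are from memory, double-check against your local API docs):
```
import requests

# ChatML template with the assistant turn prefilled with <think>,
# so the model opens its reasoning block before the visible reply.
prompt = (
    "<|im_start|>user\n"
    "Write a haiku about VRAM.<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>\n"
)

r = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={"prompt": prompt, "max_length": 512},
)
print(r.json()["results"][0]["text"])
```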
>>
File: 983625.png (125 KB, 640x392)
nano banana 2 local when?
>>
>>107272979
Bahaha, that's fucked. Fucking microkikes. Hope you find a workaround.
>>
File: 1755846328080658.png (417 KB, 971x1200)
>>107266608

>Browse /b/
>/b/Realistic AI Parody Nudes Thread
>Tons of deepfake nudes of real people

So, not trying to moralfag or anything. I used to be active on the /b/DEGEN threads so you know I'm not new: why do they just post ai nudes of REAL people there? Someone correct me if I'm wrong, but isn't sharing that shit illegal cuz it can be considered revenge porn or something? I'm not familiar with current laws, so I don't wanna post gens of loras i may or may not have created and then get a knock on my door from the local feds or get my internet turned off because I pissed off the wrong celeb or influencer with lawyer money. I feel like anons only get away with it out of luck and I'll be the one unlucky bastard that gets made an example of, but maybe I'm wrong.
>>
>>107275000
4chan full of 3rd worlders
no such laws there and definitely not enforced
>>
>>107272970
>no pussy juices
This is just rape at this point
>>
>>107275000
If you're paranoid then just practice some basic opsec.

The feds aren't burning their VPN backdoors on extremely mild internet crimes.
>>
>>107275000
Eventually this will all become so easy and commonplace that posting an ai generated video of elon musk fucking a child will be about as bad as just saying "elon is a pedo" is today.
>>
>>107274976
When Qwen distills them.
>>
>>107275000
>sharing that shit illegal cuz
???
You're on an anonymous imageboard.
Do anons really forget *why* this sort of platform exists in the first place?
>>
>>107275150
Came here to say this. I'm on an overseas work trip with a bunch of my normie coworkers, and the first week they were here one of them convinced a bunch of us to download the sora app to start making funny deepfake videos of us (sidenote: funny how I can deepfake any nobody on grok or sora no questions asked, but NOOOOO you just CAN'T do it to those precious celebrities. Fuck those peasants though, right? lol). I'm sure deepfaking nudes will still be frowned upon in the future, but it'll be seen less as some horrible crime against humanity and more as something weird, like how admitting you jerk off to a crush AROUND THE CRUSH is weird. 50 years ago I'm sure watching porn was seen the same way deepfake porn is seen today.
>>
>>107275276
you are not anonymous to feds
>>
File: skellySG.png (2.75 MB, 1024x1536)
>>107267891
Glad you liked it. You wouldn't believe the amount of shit anons gave me about writing it.
>if I reach the context limit, I should just put the important stuff into some kind of data storage and can restart a chat by replacing the first message with a summary of the previous one?
Yes, you're suggesting using "Summarize" or some such function to carry the roleplay forward, which is one strategy.
There's a bunch of "long burn" RP strategies that I don't really use; read up on those. At some point, as you RP with a particular character... you've ended up with another character. How you manage that is up to you; there are strategies ranging from summary->author's note, to creating a whole TXT flat file and adding it as RAG, to just creating a new character card with the V2 character.
It's really up to you.
I use Author's Note all the time on anything that runs more than about 10 rounds to keep track of things that happen, that I want the LLM to remember and consider as it responds.
>>107267961
> https://github.com/SillyTavern/Extension-PromptInspector
I'll have to check that out. ty for posting.
>>
>>107275276
You think just because you didn't have to register an account that makes you immune from laws and morals?
>>
>>107275422
yes because (((they))) aren't looking for me. i'm not a target.
>>
>>107275444
Feds are looking to fulfill a quota.
>>
File: 1755133745365691.webm (1.45 MB, 640x480)
>>107275422
yes.
>>
File: G6N0n3RacAMAFeQ.jpg (188 KB, 1056x1008)
nano banana pro is crazy
https://x.com/cto_junior/status/1991564259516702997
>>
>>107276164
is this image ai generated?
>>
>>107275861
Frame count too high.
>>
File: GO0dhovbkAASWxi.jpg (240 KB, 1668x2000)
tech illiterate here, i just followed this guide: https://rentry.org/wan21kjguide , and when i try to img2vid i get this line in the cmd
>Lib\site-packages\torch\_inductor\utils.py:1613] [0/0] Not enough SMs to use max_autotune_gemm mode

what do?
>>
>>107276389
>>>/g/ldg
>>
>>107276389
>>107276405
Sorry looks like the resident autist is having a spergout and this board has no mods.
I hope you can find someone to help you later.
>>
>>107276389
hardware limitation
>what do?
buy a better gpu. stop being poor. alternatively commit sudoku for not googling this.
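To be fair to the anon asking: that line is a warning, not an error. torch.compile's inductor backend just skips its autotuned GEMM path on cards with too few streaming multiprocessors and falls back to stock kernels, so generation should still run. If you want to see your card's SM count (a sketch, assuming CUDA device 0):
```
import torch

# multi_processor_count is the number of SMs on the GPU;
# inductor disables max_autotune_gemm below an internal threshold.
props = torch.cuda.get_device_properties(0)
print(props.name, props.multi_processor_count)
```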
>>
>>107276429
>4 threads
What's going on over there? Haven't checked in very often.
>>
>>107276513
Thread splitter keeps trying to add an extra link (AniStudio) to the OP and keeps baking to try to push people over to his threads but it's not working, hence the multiple threads.
>>
File: asleep.jpg (45 KB, 400x428)
>>107276574
oh.
>>
imagen threads are forever ruined, the same happened to /hdg/
>>
I love deepsex but it's annoying that it can't do structured outputs. I will use it to generate semi-structured text for me, like so:

```
sub 1
here is some content

sub 2
here is some content
```

I want to run a smaller local model, which will transform that into JSON structured output.

`{"sub1": "X", "sub2": "Y"}`

What model can I use for this?
>>
>>107276680
structured outputs are backend dependent, tf are you saying
>>
>>107276680
>but it's annoying that it can't do structured outputs
Really?
as in its API follows the OpenAI-compatible standard but ignores the
>"response_format": {
> "type": "json_object",
> "schema": json_schema,
> },
param?
That sucks.
Unless the API docs explicitly say it doesn't support structured output/json schema, try wrapping those into an extra_body object.
I think I had to do that when using llama.cpp with the OpenAI python client library.
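For reference, roughly what that workaround looks like, as a sketch (assumes a llama.cpp server on localhost:8080 and the openai python client >= 1.0; the schema and model name are made up for illustration):
```
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

json_schema = {
    "type": "object",
    "properties": {"sub1": {"type": "string"}, "sub2": {"type": "string"}},
    "required": ["sub1", "sub2"],
}

resp = client.chat.completions.create(
    model="whatever-is-loaded",  # llama.cpp serves one model regardless
    messages=[{"role": "user", "content": "Turn the notes into JSON."}],
    # Non-standard params the client doesn't know about get passed
    # through to the server verbatim via extra_body.
    extra_body={
        "response_format": {"type": "json_object", "schema": json_schema},
    },
)
print(resp.choices[0].message.content)
```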

>What model can I use for this?
Try Qwen 30B.
>>
>>107276748
I'm assuming he is using the official API rather than running it locally.
>>
'depression': None,
'anxiety': None,
'fear': None,
'terror': None,
'horror': None,
'dread': None,
'apprehension': None,
'foreboding': None,
'omen': None,
'portent': None,
'prophecy': None,
'vision': None,
'dream': None,
'nightmare': None,
'hallucination': None,
'delusion': None,
'madness': None,
'insanity': None,
'craziness': None,
'lunacy': None,
'dementia': None,
'psychosis': None,
'schizophrenia': None,
[the same 23-key block loops three more times verbatim, the last pass cut off at]
'craz......'

# END OF DICT — all 400+ tags processed
}

Spooky.
>>
>>107276964
Which model?
>>
>>107277071
Qwen3-Max
>>
>>107276680
why would you ever use a language model to transform data from one format to another, just use python or something
ask your model to write you a script for it if anything, but that is not an llm task
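Something like this covers the example format from that post; a sketch, assuming the headers really are "sub N" lines each followed by a content block:
```
import json
import re

raw = """sub 1
here is some content

sub 2
here is some content"""

result = {}
# Split on blank lines; the first line of each block is the key.
for block in re.split(r"\n\s*\n", raw.strip()):
    header, _, body = block.partition("\n")
    result[header.replace(" ", "")] = body.strip()

print(json.dumps(result))
# {"sub1": "here is some content", "sub2": "here is some content"}
```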
>>
File: Altman.jpg (89 KB, 593x606)
Altman is getting scared.
https://x.com/wallstengine/status/1991659177283051870
>>
>>107277169
>last month
>>
>>107277169
Actual article referenced, but paywalled.
https://www.theinformation.com/articles/openai-ceo-braces-possible-economic-headwinds-catching-resurgent-google
>>
>>107276680
>>107276765
Lol how about using the official api json output then
https://api-docs.deepseek.com/guides/json_mode
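Per those docs, a minimal sketch (assumes the official endpoint; deepseek wants response_format set AND the word "json" somewhere in the prompt, or it may ramble):
```
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{
        "role": "user",
        "content": 'Convert my notes to json like {"sub1": "...", "sub2": "..."}.',
    }],
    # Official JSON mode: constrains output to a valid JSON object.
    response_format={"type": "json_object"},
)
print(resp.choices[0].message.content)
```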
>>
File: 1756330197483127.jpg (307 KB, 976x850)
>>107277169
>temporary
>>
>>107277169
It never made sense that Google would have lost this battle unless they had just completely failed to even try. They have all the data.
>>
>>107277574
I don't think it was a foregone conclusion that Google would succeed; it was entirely possible that they could have failed completely. Just look at Meta, who has completely bombed out of the race despite starting out with mountains of cash, compute, and data.
>>
>>107277620
And Google was on the path to fail, but they made the right choices: hired the Character.AI guy, who was one of the authors of Attention Is All You Need, and fired everyone responsible for Bard.
Meta, on the other hand, is stagnant. They didn't change anything after the failures that were their last models.
>>
>>107277644
>the failures that have been their last models
only the last models? bruh, meta never made a good model
if llama 1 hadn't been leaked and become sort of open sores, who in the entire world would have cared for this piece of shit? if the early llamas were any good, why were even rando finetrooners making better instruct/chat models?
llama was never good, this board's users are just nostalgic for the first local llm they experienced
it wasn't even meta's intention to be the flagbearer of open sores either; them becoming the figurehead of open llms and having projects like llama.cpp named after them was a happy accident
>>
>>107277877
they always had incredibly bad taste, yeah
like not training L1 on code
or filtering porn from the training data

they should have just sucked it up and made their own knockoff of DS V3 this year, they probably could have made something better than kimi k2 with all of their compute
>>
what is this
https://huggingface.co/TroyDoesAI/Qwen3-15B-A2B-Base
>>
>>107277169
>AI gains
in what? acing the test set because it was trained on it?
>>
>>107277934
>not training on code is... le bad
kys
>>
>>107278099
looks like lobotomy
>>
>>107278118
sorry but not training on code makes models completely retarded
morons like you must be seething now that literally every new model is codemaxxed lol
>>
>>107278293
Go elsewhere, shill
This is a RP thread
>>
>>107277877
Meta is full of jeets, don't expect anything from these retards
>>
File: 98375623.jpg (101 KB, 828x982)
>>107277169
>Sam has a model way more powerful than gemini 3.
openai is back
>>
I went back to some of my logs with Mistral Large and damn, those new MoE models may be smarter but they simply don't hit that hard anymore.
>>
>>107278338
he was talking about 5.1
>>
>>107278350
kek
>>
>>107278338
S -> S
H -> T
A -> R
L -> A
L -> W
O -> B
T -> E
P -> R
E -> R
A -> Y
T -> 2
Holy shit Shallotpeat = Strawberry2
>>
>>107277644
The claim that Google fired everyone responsible for Bard is false. Google has had layoffs and dismissals related to its AI work, but the entire Bard team was not fired.
>>
>>107278838
>>107278838
>>107278838
>>
>>107278847



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.