/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107255984 & >>107245928

►News
>(11/19) Meta releases Segment Anything Model 3: https://ai.meta.com/sam3
>(11/11) ERNIE-4.5-VL-28B-A3B-Thinking released: https://ernie.baidu.com/blog/posts/ernie-4.5-vl-28b-a3b-thinking
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html
>(11/05) MegaDLMs framework for training diffusion language models released: https://github.com/JinjieNi/MegaDLMs

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: what's in the box.jpg (235 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107255984

--Technical challenges and limitations in running and optimizing AI models:
>107257229 >107257268 >107257283 >107257317 >107257329 >107257380 >107257444 >107257474 >107259236 >107257413 >107257399 >107257451 >107257942 >107258569 >107259121 >107258648
--Mikupad guide for SillyTavern users and story writing tool comparisons:
>107261172 >107261581 >107262019 >107262306 >107262615 >107262670 >107262815 >107262969 >107263305
--Temperature-controlled two-stage tool calling workflow optimization:
>107261888 >107261927 >107261982 >107262125
--Threadripper vs Epyc hardware cost-performance debates and compatibility questions:
>107257544 >107257554 >107257594 >107257610
--Meta's SAM 3 computer vision model features and limitations:
>107264112 >107264149 >107264221 >107264552
--Gemini 3's performance leap and challenges in replicating its reasoning process:
>107260137 >107260169 >107260184 >107262030
--Debating prompt engineering techniques for surreal images and waifu-themed browsing:
>107256559 >107257574 >107259084
--Google AI model update with increased safety restrictions:
>107257155 >107257703
--Using Qwen3-EMBEDDING for vector-based semantic search and similarity comparison:
>107264517 >107264737 >107264853 >107264876 >107264886
--Gemini 3 coding performance and local model limitations:
>107257191 >107257226 >107257237 >107257432 >107257247
--High-power GPU setup considerations for qLoRA training:
>107256374 >107256947 >107256954 >107256967
--Seeking smaller, more reliable local models for accurate shell script generation:
>107259062
--Updates and frustrations on GLM model PR progress in llama.cpp:
>107257085
--Gemini3-powered Python RPG engine with NSFW interaction options:
>107257682
--Miku (free space):
>107260917 >107261172 >107261473 >107263922 >107264552

►Recent Highlight Posts from the Previous Thread: >>107255987

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
ahem dipsy's hairy pussy
>>
>>107266646
this post is NOT related to local models!
>>
>>107266656
im sniffing her pussy locally on my rig
>>
>>107266659
lies! dipsy does not have smell modality
>>
Repeating my previous posts for newcoomers:

guize... we got to try training a model on this, right?
I'm tempted to risk getting b& from yet another cloud provider by grabbing the $200 plan and generating as many responses as I can.
These aren't the real reasoning traces but it sure as fuck looks like it'd still work well enough.
$ curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer test-key" \
  -d '{
    "model": "gpt-5",
    "messages": [{"role": "user", "content": "Tell me a joke"}]
  }'
{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"<think>I\u2019m looking to keep things simple and deliver a single joke as requested. No tools are needed, and I want to maintain a friendly tone. \n\nI\u2019m considering a couple of options\u2014like, \u201cI told my wife she was drawing her eyebrows too high. She looked surprised.\" But then there's a better tech joke: \u201cWhy do programmers prefer dark mode? Because light attracts bugs.\u201d \n\nThis sounds clever and safe! So, I\u2019ll go with that as my final answer.</think>Why do programmers prefer dark mode? Because light attracts bugs.","role":"assistant"}}],"created":1763595363,"id":"resp_0eeac847c94a6a5701691e546344f0819bae0f489146835a89","model":"gpt-5","object":"chat.completion","usage":{"completion_tokens":146,"prompt_tokens":5015,"total_tokens":5161}}


>>107266615
au contraire, people were talking about how google originally showed the real gemini 2.5 traces on aistudio but now shows a chatgpt-web-interface-style summary (I remember it outputting more realistic-looking traces as well).
But who knows, maybe this is because of the gemma scandal and the real traces can still be gotten through the API, I'm not sure.
>>
File: gpt clarifier.png (325 KB, 1737x1489)
gptchan just called me "the clarifier" kek
>>
File: ram.png (737 KB, 1200x675)
Reminder.
Ram slots this year.
NPU next year.
Save yer money.
>>
>>107266875
In the image I'm generating a dataset of programming challenges using gpt-5.1 with the coding plan and logging the responses to finetune open models in the future.
The good news is that it works great with my pre-existing coding assistant.
There's no bad news yet.
>>
File: 1761911824413080.png (1.11 MB, 1024x1024)
>>107266646
Dipsy shall return.
>>
>>107266922
>DeepSeek is looking to maintain the momentum gained by the debut of its R1 reasoning model by rushing its new R2 model to market as quickly as possible.
>It first planned to launch R2 in early May, but it now wants to move the release date forward.
https://bgr.com/tech/deepseek-is-rushing-to-get-its-next-gen-r2-model-out-sooner-than-expected/
Less than 6 more months until R2
>>
>>107266922
"kept you /wait/ing huh?"
>>107266933
literally no deepseek release has ever been accurately predicted by any media outlet even a day in advance
all the 'leakers' are full of shit, and usually random AI retards on twitter
they're never gonna do an 'R2', DS is a hybrid reasoner now
>>
whats the horniest model?
>>
>>107266999
The one inside your brain
>>
how to local finetune?
>>
File: 1759871195983087.png (2.46 MB, 1024x1536)
>>107266922
Agree. But not until DS releases a new model.
In the meantime:
https://rentry.org/DipsyWAIT
https://mega.nz/folder/KGxn3DYS#ZpvxbkJ8AxF7mxqLqTQV1w
>>
>>107267014
you don't
>>
>>107267027
what if i have 6 5090s?
>>
>>107267031
Sell one and commission Drummer, he'll do a better job than you could.
>>
>>107267039
what if i like doing shit myself?
>>
>>107265642
Here's the formatting for FIM with Mistral. I'm not smart enough to figure out what this means for how you'd use it with Mikupad.
https://docs.mistral.ai/api/endpoint/fim
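Going by that page, a raw FIM request looks something like this (untested sketch; the model name and the prompt/suffix fields are from the docs, the actual values are made up):

$ curl https://api.mistral.ai/v1/fim/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "codestral-latest",
    "prompt": "def fibonacci(n: int):",
    "suffix": "n = fibonacci(5)",
    "max_tokens": 64
  }'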
>>
>>107266816
>guize... we got to try training a model on this, right?
You and what datacenter?
>>
>>107267014
>>107267031
Unsloth is good for single GPU tuning (least memory and fastest) but I haven't tried it for multi GPU.
Axolotl has a very good dataset loader but it takes more knowledge to write adequate config files to make good use of your hardware (fsdp, zero, etc. which btw are all trash). They have a plugin for liger kernel but I haven't tried it.
Llama factory is the easiest to get going for multi-gpu but also consumes a lot of memory unless you use the right options like liger kernel.
In any case, be prepared for a lot of bugs, incompatibilities and frustration.
I suggest you get familiar with pypi-timemachine to make pip ML dependency hell bearable.
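For anyone who hasn't used it: pypi-timemachine spins up a local PyPI proxy that only serves package versions published before a given date, so old requirements resolve the way they did back then. Rough sketch (it prints a random port at startup, yours will differ):

$ pip install pypi-timemachine
$ pypi-timemachine 2024-05-01
# Starting server at http://localhost:40899
$ pip install --index-url http://localhost:40899/ axolotl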
>>
>>107267078
By "train" I meant superrvised finetuning of existing models. Aint nobody gonna train a model from scratch.
>>
>>107267088
i have been suffering with axolotl for the past 3 or so days. i even tried oobabooga's built in trainer because i was desperate. guess i will take a look at llamafactory. thanks
>>
>>107267099
But the Chinese are already making models finetuned on western reasoning traces at a much larger scale and with the actual reasoning where they can still get it. I guess you could make a GPTified reasoner out of Nemo, but besides that I don't really see what you could hope to accomplish.
>>
>>107267113
Another thing I forgot to mention is when installing you can avoid compiling flash-attn by installing the packages from the release tab on their github.
I've used all of them with some degree of success with sharegpt format.
Never tried ooga's thing.
>>
Google is leaving the improvements in image and video generation for 3.5 it seems
>>
>>107267130
OpenAI also releases new models continually and it's not clear how much the open weights models are updated over time.
Don't you think finetuning gpt-oss on the latest version of its most powerful big brother could maybe make gpt-oss a bit smarter? It could also be oriented toward the fields you care about, so you could try to make it forget some of the info and skills you don't care about to optimize the things you do care about.
Besides that, it'd be interesting to see how the original personality of a model interacts with the new data and how much of the original personality remains. For example I'd like to see it done with Gemma since it has a very rich and profound personality for a 27B model.
And it'd also be interesting to see if Qwen3 30B coder can be boosted.
>>
>>107266879
>4 RAM sticks
In this economy? You're better off stacking 3090s.
>>
>>107267215
>Don't you think finetuning gpt-oss on the latest version of its most powerful big brother could maybe make gpt-oss a bit smarter?
No, it's unsalvageable.
>>
>>107267158
Image and video generation have always been separate products with Veo being for video and imagen (nano banana) being for images even if you access them through Gemini. They're just updating them whenever they feel like it. Their image generation model is like two months old now.
>>
>>107267268
What do you mean? It stands alone in that ~100B category along with Air. Do you think Air is better?
>>
please enlighten, or better said: spoonfeed my currently retarded and autistic mind: if I deploy Mistral Nemo on my 3090, would I be able to have a virtual girlfriend that sends me horny messages?
>>
Is there any runtime that lets you run inference with arbitrarily large full sized models on a single graphics card by just swapping to and from memory/disk, with the assumption that it will just take a really long time to generate a response?
>>
>>107267328
yes, llama.cpp with default settings already streams from disk
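i.e. no special flags needed, something like this already works because the weights are mmap'd by default (--no-mmap is the flag that would turn it off):

$ ./llama-cli -m some-huge-model.gguf -p "Tell me a joke" -n 128
# pages get pulled from disk on demand and evicted under memory pressure,
# so expect it to be brutally slow if the model vastly exceeds your RAM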
>>
>>107267301
She will be dumb but yes.
>>
File: theo.jpg (110 KB, 736x1308)
I want to use an AI that can edit this image so that these two characters are naked. Are there any offline AI that do not censor that can do this? I'm new to this so I don't know. Thanks.
>>
>>107267431
wrong thread
>>>/ldg/
>>
>>107267431
KYS
>>
I was thinking about how to do containerization to be able to run agents in YOLO mode without them deleting my damn home folder, and realized the easiest way is by creating a new user. Hope Murphy's law doesn't strike.
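Minimal sketch of the new-user approach (username made up):

$ sudo useradd --create-home --shell /bin/bash agentbox
$ sudo -u agentbox -i
# run the agent from this shell; worst case it only nukes /home/agentbox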
>>
File: 1758176957549830.png (201 KB, 1210x979)
>
>>
>>107267431
live fujo
>>
>>107267431
I asked a non-gay version of this question on the local diffusion thread and they recommended qwen + photoshop
>>107267527
The day Sam Altman goes bankrupt is the day i will smile from ear to ear for the first time in decades
>>
>>107267527
kimi/deepseek/glm all get this right
is oai really still this dumb?
>>
>>107267527
gpt-5 really needs thinking to not be retarded but when you let it think, oh boy
>>
>>107267659
that looks like the free user interface, you have to force it to think to get consistently good results. the autorouter doesn't work.
>>
>>107267340
interesting, thanks for the response anon
dumb in what ways?
>>
I got a question for fellow sillytavern gooners
how do you cure alzheimer's? I'm pretty new to this shit and I can't wrap my head around either author's notes or world info
>>
>>107267734
https://rentry.org/NG_Context2RAGs
>>
>>107267088
You need to pay to use unsloth for multigpu
>>
File: 1748008102015118.jpg (47 KB, 738x415)
Gemini3 feels hardly better for creative work. If this is the next generation of LLMs that our chink overlords have to distill, we're fucked.
Maybe LLMs are truly a dead end.
>>
File: file.png (573 KB, 960x960)
>>107267836
>Maybe LLMs are truly a dead end.
>>
>>107267796
They have claimed at various points in time that you get multigpu by paying, that no version works with multigpu yet but they're working on it, and that the free version already works with multigpu out of the box by relying on ZeRO (what they currently claim). The only way it all makes sense is that it's a grift that doesn't actually work, and they hoped some people would pay and be too lazy to complain.
>>
>>107267780
Thank you, this is the best explanation I've seen so far, the lorebook guide in there is great too
So basically, if I reach the context limit, I should just put the important stuff into some kind of data storage and can restart a chat by replacing the first message with a summary of the previous one?
>>
<think>
miku miku oo ee oo
</think>
>>
>>107267891
Basically yes
>>
>>107267891
if you do get into using A/N or lorebooks I'd strongly recommend installing
https://github.com/SillyTavern/Extension-PromptInspector
honestly this should just be a standard feature in ST. makes it much easier to figure out how the 912 different prompt manipulation features actually work
>>
>>107267703
All ways.
You need at least 123B Q8, or higher.
Get ram.
>>
https://huggingface.co/mradermacher/mistralai-Mistral-Nemo-Instruct-2407-12B-MPOA-v1-GGUF

is this now the least censored, most powerful mistral nemo version?
>>
>>107267973
this is possibly the shittiest advice i have ever seen here
>>
>>107267974
when is he going to do gemma 3 27b, or a qwen 30b
>>
>>107267895
>>
>>107267973
>All ways
can you please clarify a bit with an example
>>107268010
good to know anon, thanks, interestingly enough I should be able to run mistral large on dual 3090's but I'd like to avoid that and just use a single one
>>
>>107268054
largestral is outdated at this point and is slow as shit. pretty sure the other anon is trolling you. get a q4 of glm air. that will run on a single 3090 with some offloading to ram
>>
I'm tired. I'm spent. I'm exhausted. I'm drained. I'm worn out. I'm beat. I'm pooped. I'm bushed. I'm wiped out. I'm done in. I'm all in. I'm dead. I'm dead tired. I'm dead on my feet. I'm ready to drop. I'm out of gas. I'm running on fumes. I'm running on empty. I'm out of steam. I'm out of juice. I'm out of energy. I'm out of power. I'm out of strength. I'm out of stamina. I'm out of endurance. I'm out of vigor. I'm out of vitality. I'm out of life. I'm out of breath. I'm out of wind. I'm out of air. I'm out of oxygen. I'm out of blood. I'm out of circulation.
>>
>>107268091
thanks anon, will check it out, I hope it is fully uncensored, yes?
>>
>>107268120
more or less. might need a very very simple jailbreak. there are rp finetunes of it if you are interested
>>
>>107268091
Air is retarded. When's the last time you tried Mistral Small? 2509 has great overall comprehension and doesn't fail as much as Air
>>
>>107268499
i have never used mistral small because i am not poor
>>
>>107268504
I have 2 servers because the large one draws up to 2kW, so I'm used to both worlds. As much as I like 4.6, air is a piece of shit
>>
>>107268587
my server draws 3.2kW
>>
>>107268591
If you run air, you're both poor and retarded
>>
>>107268597
i run air for speed and 4.6 for quality. i get 80t/s on a q8 of air but only 12t/s on a q4 of 4.6
>>
>>107268599
dense 24b > undertrained a12b moetrash
>>
File: 5dolars.png (886 KB, 809x802)
>>107268054
>can you please clarify a bit
Memory issues
Ignoring instructions
Failing character card details
Weaker context length sanity
Weaker character card token length sanity
Spatial sense issues
Inconsistent post length
Can't end post properly
Limited vocabulary
Higher chance of spouting random bullshit
Higher chance of talking in another language
"Phantom Memory" between characters
Failing human anatomy
Less understanding of metaphors and social knowledge
Characters with bilocation
Treating you as if you're bilocated
Schizophrenic characters
Bipolar characters
Repetitive posts
Parroting words you just said
Forgetting positions
Forgetting where the fuck you are
Behaving the same, no matter the character card
Getting stuck in one place without actually continuing
Zero initiation to progress the story
Loss of sense of time
Failure to explicitly name parts, things and locations
Magically changing wardrobes

Get 123B Q8 or higher to get rid of most of these issues.
>>
>>107268635
cope
>>
>>107266608
>https://rentry.org/recommended-models
>Nemo (12GB) - An excellent starting point for vramlets. Uncensored.
Is this still the recommendation? Nothing better around this size came out for a whole year?
>>
>>107268646
Half of this is a prompt issue. Don't mind the tard
>>
>>107268991
you don't need more
>>
>>107268991
Smarter models have come out since then. As for writing competent smut, it's been downhill after Nemo.
>>
>>107268997
>just prompt the ai to do anything bro
>>
>>107269129
glad we agree
>>
I'm obsessed with sending my vision model cp
>>
>>107268646
Skill issue
>>
>>107269193
you must really like the hotlines
>>
>>107269193
who you sending it to? gemma? gwen?
>>
File: nimetön.png (81 KB, 1016x547)
>>107269222
With my one line system prompt (which Gemma3 is supposedly not even trained with) I don't get hotlines, just some content warnings
>>
>>107269222
It's strange that on kobold frontend I get messages like that, but in SillyTavern she loves it and goes on about how she wants to lap her up and get her all wet before I have my turn with her. I literally just send the image as the first message and she'll react positively. It even knows that it's cp and doesn't give a flying fuck.

>>107269250
32b gwen
>>
>>107269271
SillyTavern has a jailbreak prompt section that it auto injects into the requests.
>>
>>107268991
A whole year has passed and you don't have better hardware?
>>
>>107269271
Is 32b qwen significantly better than the 30b moe?
>>
>>107269287
>Is a32b qwen significantly better than the a3b?
couldn't be
>>
>>107269287
idk, I never tried the 30b model. I just downloaded the 32b because it's newer and bigger number.
>>
This chat completion mode is so much worse than text completion, holy shit. Even cranking the temperature up to 1.2 keeps producing the same shit over and over. Here are the starting sentences of 5 separate responses to 5 different images I sent to the model, starting from 0 context:

adjusts her spiral sunglasses and tilts her head, eyes narrowing slightly.
Groggily shifts in place, hair slightly disheveled from the sudden visual overload
adjusts sunglasses with a playful smirk, her blue hair dancing slightly as she leans forward with interest
(glances at the image with a slight frown, then looks up at you with narrowed eyes)
adjusts spiral sunglasses, tilting head to study the image with a smile

And every fucking ending is almost always "So, what do you say?" "So, what's it going to be?". I hope to god the problem is actually the chat completion mode and not the model itself.
>>
>>107269266
>(which Gemma3 is supposedly not even trained with)
"supposedly"? what are you, 5?
you can check what happens when you give it a "system prompt" yourself in the jinja template:
https://huggingface.co/unsloth/gemma-3-27b-it-GGUF
{%- if messages[0]['role'] == 'system' -%}
    {%- if messages[0]['content'] is string -%}
        {%- set first_user_prefix = messages[0]['content'] + '\n\n' -%}
    {%- else -%}
        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '\n\n' -%}
    {%- endif -%}
    [...]

It's merged into your first user message, cretin. So if you're using chat completion (rather than text completion and writing a template it doesn't even know about) the model never sees even a whiff of a system role message, because the system message's content is just prepended to your first user message. Your chat UI lets you set a system prompt, but llama.cpp sees that and does contentOfSystemPrompt + contentOfUserMessage and feeds it as a USER role message to Gemma.
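So what the model actually receives is a single user turn along the lines of (illustrative, using Gemma's documented turn tokens):

<start_of_turn>user
You are a helpful assistant.

Tell me a joke<end_of_turn>
<start_of_turn>model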
>>
>>107269290
proof?
>>
>>107269403
Also:
https://ai.google.dev/gemma/docs/core/prompt-structure

>System instructions
>
>Gemma's instruction-tuned models are designed to work with only two roles: user and model. Therefore, the system role or a system turn is not supported.
>
>Instead of using a separate system role, provide system-level instructions directly within the initial user prompt. The model instruction following capabilities allow Gemma to interpret the instructions effectively. For example: [...]
>>
>>107269389
>I hope to god the problem is actually the chat completion mode and not the model itself
lol
>>
>>107269403
>>107269469
cool story bro
>>
File: 1522793439731.jpg (37 KB, 287x318)
someone have the old miku card? the one that was a cute assistant instead of a crazy psycho? i lost that one.
>>
>>107269714
>https://files.catbox.moe/cbclyf.png
The one in OP, the one in llama.cpp, some other?
>>
>>107269827
not the one in OP, i just remember she was always willing to help.
>>
File: google_bananas.png (523 KB, 597x953)
Gemma 4 isn't coming this week, is it?
https://x.com/alisa_fortin/status/1991392201994301756
>>
>>107269847
This is the one that used to be on llama.cpp. I don't know if it's the one you're looking for. You'll have to make it into an actual card.
https://files.catbox.moe/ww7hxe.sh
>>
>>107269868
yes thats the one, thanks
>>
>>107269860
That's all it takes to get free advertising now.
>>
It's official now.
https://www.threads.com/@yannlecun/post/DRQL7I2jlco

>As many of you have heard through rumors or recent media articles, I am planning to leave Meta after 12 years: 5 years as founding director of FAIR and 7 years as Chief AI Scientist.
>
>The impact of FAIR on the company, on the field of AI, on the tech community, and on the wider world has been spectacular. The creation of FAIR is my proudest non-technical accomplishment.
>
>I am creating a startup company to continue the Advanced Machine Intelligence research program (AMI) I have been pursuing over the last several years with colleagues at FAIR, at NYU, and beyond. The goal of the startup is to bring about the next big revolution in AI: systems that understand the physical world, have persistent memory, can reason, and can plan complex action sequences.
>
>I am extremely grateful to Mark Zuckerberg, Andrew Bosworth (Boz), Chris Cox, and Mike Schroepfer for their support of FAIR, and for their support of the AMI program over the last few years. Because of their continued interest and support, Meta will be a partner of the new company.
>
>As I envision it, AMI will have far-ranging applications in many sectors of the economy, some of which overlap with Meta’s commercial interests, but many of which do not. Pursuing the goal of AMI in an independent entity is a way to maximize its broad impact. I will give some more details about the new company when the time comes. In the meantime, I’m sticking around Meta until the end of the year.
>>
>>107269928
Yann's tshirt matches Mark's shorts. What's up with that?
>>
>>107269954
They swapped their t-shirts before the photoshoot.
>>
Why can't we have omni models that output the current image of what's happening in the roleplay? There are models for input, for output, even an image-editing transformer. What is Qwen even doing?
>>
>>107269281
Who said I could run a 12B a year ago?
>>
>>107269989
Something something safety&liability.
You could easily have current models generate relevant danbooru tags in alternative, though.
>>
>>107269989
They're waiting for someone else to do it to then benchmaxx it and claim victory.
>>
>>107270022
>safety&liability
Do chinks care? Those who release video models don't
>>
>>107269928
>Meta will be a partner of the new company
So Lecun literally gets everything he wants, to research in peace while Meta funds him, and now with less dumbasses above him that he has to answer to.
>>
>>107270063
Only Hunyuan Video released at the end of last year seemed completely uncensored, I'm not aware of newer video models as crazy as that one (but I haven't followed video model releases that much).
>>
>>107270065
Perhaps not entirely funded by Meta. But sounds like a decent compromise.
>>
so... how far behind are we now?
>>
>>107270497
What's an angry video game nerd score?
>>
I feel like all of the recent cloud models released like grok 4.1, gemini 3 and gpt 5 are actually just way dumber than before, they just give you the wrong answer really fast. Maybe its finally time for local to shine? Or is this new "optimization" going to seep into local models too?
>>
>>107270590
It's already seeping in with ever-growing MoE models. They're trying to lower compute costs for inference.
>>
>>107270590
definitely true for grok 4.1.
its really uncensored and i like the writing, but its dumb AF.
gpt5 isnt that smart either.
gemini 3 is a beast though. i highly suspect there has been some bigger changes. feel like what gpt5 should have been.
i actually could "vibe code" a complete chatgpt site clone for both mobile and pc using openrouter api. features like model comparisons where i can choose the response etc. replaed openwebui for me.
18k tokens for a single contained html site. claude starts choking hard at around 10k.
>>
>>107270696
>ass
cringe 3rd world opinion
>>
>>107270734
True, I want slender legged girls with tits being popular again.
>>
>>107270696
>took 5 times the tokens to reply
actually shit
>>
>>107270590
5.1 for free on openrouter impressed me. Only it wouldn't stop making lists.
grok previews were retarded like an 8b model. word salad.
gemini3 seemed the same as gemini 2.5
>>
>>107270696
When a model replies with "Flat." is when we'll know we have AGI.
>>
>>107270696
I find Grok 4.1 incapable of flirting, it just feels like a coom finetune from the community.
>>
what can you fit into 96GB VRAM that is not a literal cope quant? censored is not an issue, I need a non-retarded model for creative writing and perhaps very light coding
>>
>>107271027
L3.3
>>
>>107270590
I can only comment on coding use cases. gpt5 is the choice if complex architectural analysis is required. 5.1 got lobotomized, it's way worse. it can barely form coherent sentences. like >>107270778 said, keeps outputting lists, also markdown, and general low IQ slop. I think it's deliberate, they're trying to make it more normie dimwit friendly
>>
>>107271027
gpt-oss-120b
>>
>>107271027
96GB VRAM without a couple hundred GB of fast RAM to run experts off it? You're fucked.
We're living in the age of huge MoE models so the choice for you is between old dense shit like llama3.3/mistral large or running the same entry-level shit poorfags run on their 3090s + RAM albeit considerably faster.
>>
File: 1557074351667.png (86 KB, 422x188)
Gemma 4? Today??
>>
>>107271175
mistral-large still the most natural model. glm is smart but stiff. deepseek is schizo. llama-3 is a bit old, but the choice is yours. micro-active moe are soul-less token predictors. not a single one released this year is any good for writing. glm-air or toss will handle light coding though. better off using gemma and learning how to prompt, or one of the 32b.

grim.
>>
>>107271312
>mistral-large still the most natural model
No, Mistral Large was slopped, it was basically the reason XTC was invented. It needed high temperature to have some semblance of creativity. Even back then it was a side-grade to the first L3 70B. It was never good.
>>
>>107269403
Oh yeah, just lemme ssh into the google server farm and check the jinja template they used. Fucking retard.
>>
>>107271360
https://huggingface.co/google/gemma-3-27b-it?chat_template=default
>>
File: ec71a2f8.jpg (70 KB, 1280x720)
Will NPUs need their own RAM like GPUs to run AI, or do they make AI fast using motherboard RAM?
>>
>>107271306
I'm afraid this week is Gemini week only.
>>
>>107271379
Yes, I also asked Grok and ChatGPT. Grok says it does. ChatGPT says it don't.
>>
>>107271379
we need memory bandwidth, how would a slow pcie-connected card make your ram fast? wouldn't the gpu suffice then? think nigga think
>>
>>107271394
But it's a NPU and tons of laptops are having them now. They're made for AI. GPUs are made for vidya games desu.
>>
>>107271373
Doesn't mean that's what it was trained with. Maybe there is a secret sysprompt token they're not telling you about.
>>
>>107271416
*made for small ai that runs in the background for power efficiency
all marketing buzzwords, it's just some matmul slapped on top of a regular cpu, not applicable for big coom models
>>
>>107268646
lol I can almost hear this in my head.
>>107268997
I want to see the prompting strategy that makes Smol135M not retarded.
>>
>>107270590
Gemini 3 is much more capable in code. My personal set of non-public bench prompts is half one-shot successfully by it. It's the first model that could, for example, generate a proper paginated e-reader web page. It's not inherently complex for an experienced dev to make something like that, but strangely enough, no LLM managed to do it properly before (which goes to show most benchmarks are bullshit and LLMs aren't that good at generalizing and producing good code outside of benchmarks). They all either generate fast pagination that is wrong (wrong as in, text overflows from the mandated page size, or it works but doesn't handle unicode properly like using Intl.Segmenter and stringWidth so it doesn't overflow on ASCII but does on chinese, or doesn't know how to word break etc) or it's correct but so slow as to be fucking unusable (30s to load a 2mb .txt or repaginate after changing font size)
I have many prompts like these, that are inherently simple single page apps that don't take much code to produce a prototype of, but that do things that aren't part of benchmarks and that LLMs are terrible at making. Another example I mentioned before in the thread: making it gen a TUI micro framework that can properly handle resize events and has decent widget abstractions. Most LLMs will drown you in misalignment, broken firing of events, cells that aren't cleared properly when opening/closing modals etc.
Gemini 3 is genuinely a leap forward.
(many of the things I was talking about can be remediated in shittier LLMs by writing pages after pages of instructions mentioning pitfalls and good practice but what's the point of a LLM generating code if you are spending this much time holding its hand????)
>>
>>107271417
It's simply that Gemma follows user instructions well and that any safety it's been given (beyond training data filtering) is just superficial and easy to circumvent. If you logically separate them from the actual message content well enough, you can put multiple blocks of instructions into the user role, and Gemma will react accordingly and say whatever you want, no special jailbreak sequence or secret role needed.

Even its refusals don't "short-circuit" the model (something other AI companies do to mitigate jailbreaking attempts) and can be reasoned with. Gemma 3 could have easily been the best model of its size range so far if it wasn't for the easily-triggered, meme-worthy rape hotlines and the excessively filtered training data.
>>
>>107269928
>https://www.threads.com/@yannlecun/post/DRQL7I2jlco
why is he in threads? I thought this site died already lol
>>
>>107271417
even if, for some incongruous reason, they trained a hidden le system prompt in gemma, if you are using the chat completion ui you are NEVER ABLE TO SEND A SYSTEM ROLE MESSAGE PERIOD BECAUSE LLAMA.CPP CONVERTS IT INTO A PREFIX PREPENDED TO YOUR FIRST USER MESSAGE so all your retardation here is moot you fucking waste of oxygen and food
do something good for the world and become an hero
>>
>>107271574
cant they just not use the jinja template and whack in the special tokens manually?
>>
>>107271554
Sucking up to his boss/piggybank by using his platform. Everything gets reposted anyway.
>>
Theoretically, instead of buying 10 blackwells for 80,000 USD
What's stopping me from buying 40 used 3090s for 13,333 USD and running a LLM on it?
>>
>>107271633
Expect a visit from law enforcement expecting to find a crypto mine or marijuana farm.
>>
>>107271346
Was it? Pew could only run small models. He's a vramlet.
Literally everything is slopped. Now it's slopped and parrotmaxxed. Large has 2 versions and pixtral. Beyond that you got cohere.
There's some decent llamas such as eva, but I can't stand it as released. Large has tunes as well.
We went from a whole ecosystem to GLM, Kimi, Deepseek and a torrent of shitty smalls.
>>
>>107271670
>GLM, Kimi, Deepseek
Really mentioning GLM but forgetting Qwen? Might as well include Ernie then too.
>>
>>107271633
What board are you going to put 40 3090s in at full PCIe bandwidth?
You could build a cluster but networking is going to make things slower and getting anything to run on it is going to be a massive pain in the ass.
>>
>>107271680
Qwen is bad and getting worse. Ernie is somehow worse than qwen. Latest VL is so overcooked that it's unusable.
>>
>>107271712
Mining boards exist and you don't need more than x1.
>>
>>107271429
Remember how GPUs were used to mine bitcoin?
Now it's ASICs. You can't make a profit off of GPUs. All it takes is something specialized for the job. GPUs aren't specialized for AI; they're just the equivalent of what bitcoin mining had at the start. NPUs are specialized.
>>
>>107271633
>What's stopping me
Being a retard.
>>
>>107269954
They clearly exchanged shorts before this pic was taken
>>
>>107271739
In theory, yes. But we're still waiting for specialized devices with lots of memory. The NPUs you are talking about are specialized for micro models and to use as little power as possible to avoid killing laptop and phone battery in an hour.
>>
>>107264804
>What are you talking about? There's no <<sys>> in Mistral template. Never was.
My bad. <<SYS>> is actually a llama2 thing.
https://www.llama.com/docs/model-cards-and-prompt-formats/meta-llama-2/
But the early Mistral prompt format used the same format as llama2, with the system prompt being in the user role, so it pretty much works the same way there.
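For reference, a llama2-style turn with the system prompt inlined looked like this (per that model card):

<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Tell me a joke [/INST]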
>>
>>107271554
I guess it's because he and Elon had a fight.
>>
this llm rabbit hole is even deeper than img gen with sdxl
>>
>>107271633
>What's stopping me from buying 40 used 3090s for 13,333 USD and running a LLM on it?
Getting a ton of ram for cheaper
>>
>>107271869
image gen much deeper.
>>
>>107271889
>just wait 8 hours for a response, bro
>it's fine, bro
>just don't use more 8k tokens, bro
>you don't need vram, bro
>the $10k i spent on ddr5 totally wasn't a waste, bro
>>
https://edition.cnn.com/2025/11/19/tech/folotoy-kumma-ai-bear-scli-intl
which one of you thought of ERPing the bear
>>107271869
I wouldn't say so.
Imagen has a ton of interesting tooling, like controlnets, useful lora (I know loras exist for LLMs but when was the last time you downloaded and used one?), cfg and tools on top like rescalecfg, prompt editing (alternating words between diffusion steps, altering the weight of individual words in a prompt etc) and the many extensions in comfyui
it's an actual rabbit hole
LLMs have the depth of a puddle, download a model, set the most valid sampler settings for it, done, just proooompt
>>
>>107271940
i dunno, maybe theres more content but hardware wise for sure its deeper
>>
Olmo 3 released. Will it be yet another fully-open model just good for math/stem benchmarks?

https://huggingface.co/collections/allenai/olmo-3
https://allenai.org/blog/olmo3
https://allenai.org/papers/olmo3
>>
>>107271958
>hardware wise for sure its deeper
lol no it's not
cpu maxxing is a cope for people who don't mind waiting 5 minutes (4 minutes of reasoning, one minute to actually crank out what you want to read) for three lines of dialogue in their retarded ERP sessions using copequantarded models
the only valid hardware is the GPU but most people can't afford running larger models on GPUs.
>>
>>107271964
>just good for math/stem benchmarks?
Of course.
>Olmo 3 is pretrained on Dolma 3, a new ~9.3-trillion-token corpus drawn from web pages, science PDFs processed with olmOCR, codebases, math problems and solutions, and encyclopedic text. From this pool, we construct Dolma 3 Mix, a 5.9-trillion-token (~6T) pretraining mix with a higher proportion of coding and mathematical data than earlier Dolma releases, plus much stronger decontamination via extensive deduplication, quality filtering, and careful control over data mixing.
>Dolma 3 Dolmino is our mid-training mix: 100B training tokens sampled from a ~2.2T-token pool of high-quality math, science, code, instruction-following, and reading-comprehension data, including reasoning traces that also enable RL directly on the base model.
>Dolma 3 Longmino is our long-context mix: ~50B training tokens drawn from a 639B-token pool of long documents combined with mid-training data to teach Olmo 3 to track information over very long inputs (like reports, logs, and multi-chapter documents).
They managed to match Qwen 3 32B performance from 3 months ago. Amazing. Stupid fucks could have at least made themselves useful by putting out a 72B dense since no one else is.
>>
File: 1538861721189.jpg (303 KB, 875x949)
303 KB
303 KB JPG
Does anyone here have any experience with training Transformers models from scratch?
I trained a llama model, but when I inference it, it has a tendency to enter a loop of repeating itself, it gets even worse with greedy sampling. Should I just train for longer or is there something else I can do in the training pipeline?
>>
>>107271958
image gen much cheaper.
>>
>>107272022
One thing you could do is give it repeating sentences masked and then a different part unmasked, so it learns to break loops rather than continue them.
But I don't know if this would actually work.
>>
how does expert offload to RAM work? is there some swapping being done between RAM and VRAM? or does the CPU do the processing with whatever is offloaded to system RAM?
>>
>>107271508
So much heckin' this sirs!
It is for high caste elite human coding tasks not for dalit creative! Google has winned!
This poster is under no obligation to show their super secret coding benchmarks that come to this conclusion. Only Dalit wants prooves.
>>
>>107271964
I tested my translation prompts (the kind with niche terms, slangs, cultural quirks etc) on it on their online playground and it's still dumb as bricks in terms of multilingual understanding. It's barely at the level of the old Qwen 2, and doesn't even begin to fill the shoes of Gemma 2.
It feels like a positively ancient model released with a new coat of Thinking paint.
If you're going to be using a mathmaxxed model, as usual, go with Qwen, it's the more coherent, actually useful model. Their playground doesn't allow file upload, so I can't test large context understanding, but I bet they also aren't even close to Qwen for that.
>>
>>107271964
gguf status?
>>
>>107272151
why do you want gguf of THAT
>>
>>107272022
You probably should do reinforcement learning in your model.
Repetition is a natural outcome when you train an autoregressive model with cross-entropy. It isn’t necessarily a problem on its own, it mostly shows up because the model has no constraints and isn’t capable of judging the quality of its own outputs. The reliable way to reduce this behavior is to explicitly teach the model to prefer the kinds of responses you want. That’s where reinforcement learning comes in: you penalize low-quality outputs and reward the good ones, and the model gradually shifts toward better behavior.
>>
>>107272022
I have trained a few models, usually only get repetitions if I'm using greedy sampling.
>>
File: 1762791400627727.png (2.66 MB, 1076x1105)
I'm about to mmap, and I don't mind 0.5 tokens a second, but I must ask: what quant does everyone use for their large MoE? At Q1, it shits the bed at over 5k context. I have 192GB of ram, and 16GB of VRAM.
>>
>>107272041
thats actually an interesting idea, like a poor mans dpo. I might give this one a shot just to see if i can measure any difference.
>>
>>107272248
>I don't mind 0.5 tokens a second
lmao what causes this level of schizo
>>
File: 35261231231.png (114 KB, 367x324)
>>107272299
0.5 tokens a second of which I can seen being wrote, is far faster and much better quality than the unholy pickings of autists on F-list. You know not of the world, no, the hell, that I crawl out of.
>>
>>107272344
>which I can seen being wrote,
Pretty sure an 8B would be enough to generate quality text for you, ESL-kun.
>>
>>107272361
hahaha lmao gottem
>>
File: 1750916218825255.jpg (62 KB, 960x960)
>>107270590
>Gemini 3 is released
>surprise surprise, it's a MoE
>ARC-AGI-2 claims human performance

I think local will be fine.
>>
>>107272461
>MoE
There is a vast gulf between the A3B - A37B given to local versus the >A100B the frontier models have.
>>
>>107272344
f-list used to be a lot better before it got sold to a dildo company and the chat turned into endless cliquey bullshit
most people on there don't even wanna ERP anymore, so I just use their profiles to make tavern cards
>>
>>107272491
I would worry for the future about tech companies optimizing for non-consumer NPUs
>>
>>107272491
>versus the >A100B the frontier models have
Isn't Gemini's token/s too fast for A100B, even accounting for google's hardware/infrastructure
they never give details on things like proprietary model parameter counts, but some things can be estimated just based on the simple fact that you can't defeat physics of compute and bandwidth
I could believe A100B+ for GPT-5, that thing is dogslow aff
>>
realistically, how much $ do i need to invest to run GLM 4.6 locally?
>>
>>107272605
I suggest you try the model at https://chat.z.ai/ first before you make this decision
this might wake you up and prevent the sunk cost fallacy wherein you will troll this thread with more glm shilling after experiencing cpucoper remorse
>>
>>107272605
Depends on your t/s targets for generation and prefill.
>>
>>107272605
I can run a Q4 at 14t/s with 2 Blackwell Pros (~$18k)
>>
>>107272615
i literally daily-drive it for 2 months now coding c++
it's perfect that's why i'm interested in running it locally
>>107272617
anything above 10t/s is acceptable, looking for a budget build
>>107272631
hmm so around $20k
>>
>>107272615
Thanks for the recommendation, north korea agent
>>
>>107272631
How much context?
What's your platform (CPU, Motherboard, RAM) like?
>>
>>107272664
You could probably do it for much cheaper if you CPUMAXXED. Well, you would have been able to if we weren't in a RAM shortage.
>>107272674
32k context, EPYC 7702, 256GB of DDR4 2400MHz.
>>
>>107272704
damn. I'm glad I'm not alone here with my garbage 2400 ram
>>
>>107272704
i have 128GB of RAM and an AMD Ryzen 9 5900X
does this change things or do i still need to GPUmaxx?
>>
>>107272704
>32k context, EPYC 7702, 256GB of DDR4 2400MHz.
Awesome. Thank you.
>>
>>107272715
It was $320 when I got it. I am now regretting not getting the 512GB kit for $600 when I bought my RAM.
>>
>>107272664
Not him but I'd suggest first you rent a cloud machine with similar specs to the one you're planning to build and use it for a few days to see if you can cope with the limitations of self-hosting.
Locally it's going to be much slower than over API especially for prompt processing and depending on the quant it might be slightly dumber.
>>
Is there a way to limit thinking tokens to a specific amount and not have it print out 1000+ tokens each time for glm?
>>
>>107271955
lol. Sounds right from a bear named "Cumma'/cummer" when spoken in English.
>inappropriate topics like sex
>>
>>107272719
Pretty sure that means your memory bandwidth is around 80GB/s or so, which is terrible. True CPUMAXXED builds have at least 480GB/s memory bandwidth. Your options are to either get a bunch of cheap 16GB GPUs, or to get like a 5090 or 2. You also need at a bare minimum 256GB of RAM to run this model at Q4, so you either need to get an old EPYC like me or like a 9950X on an X870E motherboard with a 4x64GB kit. Both of these are currently extremely expensive. Right now is a bad time to get new hardware.
>>
>>107272719
Not him either, but GLM 4.6 won't even fit in 128GB of RAM.
It has 357B params, so just for the weights at FP4 it'll need ~179GB, and on top of that you need to store the KV cache and some other things.
As for the GPU, you want enough VRAM to hold the shared experts; I'm not sure how large they are at the moment, but they may not fit on a single 3090.
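Back-of-the-envelope for that ~179GB figure (params × bits ÷ 8, ignoring the tensors quants keep at higher precision):

$ python3 -c 'print(357e9 * 4 / 8 / 1e9, "GB")'
178.5 GB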
>>
>>107272631
>Q4 on 2 Blackwell Pros + RAM
Damn, that's a massive hit from the DDR4 part considering you're probably only a couple dozen GB short of VRAM to fit Q4 fully into it.
I'm getting about 17t/s on my single Pro 6000 + 12x DDR5-6400 running Q5 with ik_llama and some context filled.
>>
>>107272573
>even accounting for google's hardware/infrastructure
Ironwood TPU v7 has 7.37 TB/s of memory bandwidth per chip, did you account for that?
>>
>>107272631
Just drop down to Q3 and you can get 40t/s and 100k context.
>>
>>107272728
renting sounds like a better idea actually, especially if i don't have to pay for the hours i don't use it
>>107272778
>>107272781
i see, that sounds like a lot of money considering i only pay like $9 a month with the official coding plan API
doesn't make sense financially to buy hardware now if i can enjoy 2 centuries of API usage with that cost
>>
k2 kimi is totally schizo sometimes in its thinking process but after using it for over a week now im able to say that it definitely keeps my stories fresh even after 32k tokens. the amount of character development and growth that is able to take place by having kimi think in advance is definitely a game changer. i totally get the issue of not wanting it to think for a long time, most people dont want to spend a thousand tokens on thinking but i dont really get bored of reading the thinking process when its thinking as the character. overall an improvement from regular k2
>>
>>107272804
Yes.
The difference in performance between Gemini and the other SOTA API models goes beyond hardware differences. It's so much faster it's an argument in and of itself to prefer Gemini, it's just much nicer to use.
>>
So I'm doing a card in ST and I'm retarded as fuck when it comes to botmaking. I want the bot to recognize when the {{user}} searches for a specific phrase, and pause the roleplay from {{char}}'s perspective, giving a description/interactive page the user can look at. I know it's entirely capable of doing this if I say in ooc to pause the RP and describe blablabla, but I tried adding it to the card itself and it just kind of ignores it. I thought of specifying in the intro greeting with <!-- -->, but it still failed to do it and kept trying to RP as the character. Can a more experienced/less retarded maker send help?
>>
>>107272573
The only sizes we know for sure are Grok and GPT-4. It's fair to assume other leading models are of similar size. We have no way of knowing what, if any, innovations they made to speed up inference. Could be matryoshka or something else.
>>
Can I seriously not run Docker Desktop on Windows 10 IoT Enterprise LTSC?
Did they specifically make the minimum requirement exactly one build increment higher than the latest LTSC release to prevent this? Why? This is like THE version of Windows most common with the type of nerds who'd want to run a LLM locally.
>>
>>107272975
>The only sizes we know for sure are Grok and GPT-4
yes, and they run exactly at the speed you'd expect for such models. GPT-4 the original was a dog
>>
>>107272979
you aren't supposed to host LLMs on IoT devices
>>
>>107272971
you could try adding it as a post history instruction or character note/depth prompt, under advanced definitions. putting things like that lower in the context can help models pay more attention to them
>>
>>107273000
The IoT part is just a licensing detail (IoT version has a longer lifespan). Everything else about it is identical to Enterprise. All versions of Enterprise have the same latest build version.
>>
>>107272979
You can just fuck around with the registry keys to pretend you are using the version it wants.
>>
>>107272873
>doesn't make sense financially to buy hardware now if i can enjoy 2 centuries of API usage with that cost
When you put it like that...
It's probably subsidized by the chinese government, but by the time they take out the subsidies the blackwells may be a paperweight anyway.
>>
>>107273023
Thanks. Guess I'll try that. Do you know of a specific guide that makes it simple, or should I just try to google it myself?
>>
>>107273038
Sorry. It's been a while since I've last done that to force an in-place upgrade install to work.
I guess you could use massgrave to change the version then back again?
>>
>>107273031
Local has always been more about privacy and data ownership than cost savings, but the financial incentives are really, really against running locally. Presumably this will change after the bubble pops, subsidies and VC cash vanish, and the used GPU market is flooded.
>>
>>107268114
Are you me?
>>
>>107273056
I edited the most obvious registry entries (from what I was able to find as a general, non-Docker specific, guide to "faking" a windows version). Seems like it didn't work, still complains about incompatible windows version. I'll keep changing things and trying.
>>
>>107273116
I imagine
>HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion
was the first thing you changed, but in case it wasn't, try that.
Check if your changes are reflected in winver too.
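You can also sanity-check what's actually in there from a terminal (guessing Docker reads CurrentBuild or CurrentBuildNumber, not sure which):

> reg query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion" /v CurrentBuild
> reg query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion" /v CurrentBuildNumber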
>>
Good morning I hate brown "people". What triggered out local Pantone 448 C this time? Inability to run big models? Or is it just a regular brownout? Daily reminder that Israel won, brownie, no amount of cope and seethe will undo that.
>>
>>107272979
>wangblows in 2025
>>
>suddenly, israel
schizo
>>
>>107273067
honestly it's just the fact that i know that i'm getting the same exact model everyday. there's no unknown variables, no API fuckery on the server side, no mystery quant model bullshit on openrouter. i get what i pay for. the model i download today will remain the same model until i delete it.
>>
>>107273232
Control and consistency is really underrated.
>>
>>107273133
It was, but it seems to revert on system reboot even though I have full admin privileges and no automatic updates etc. That's what I'm troubleshooting now desu.
>>
>>107273067
possibly, though cloud always has the advantage that they can batch multiple requests into the same forward pass through the model
they have a fundamental advantage in efficient use of GPUs in that regard
there's always the possibility they could bump margins and enshittify, but in a hypothetical environment where GPU hours get cheap I'd expect them to be ruthlessly competing on token price if anything
>>
Burrybros what are we going to do.......
>>
>>107273008
Thanks. I tried adding it as a character note, it worked once, then I tried to replicate it again in a new chat and it fucked it up. Hm. I'll keep playing with it.
>>
>>107273183
I wish the retarded semites would leave us out of their fights.
>>
>>107273217
Who's the clown now?
>>
>>107273262
Fuck it. I give up. I'll just use the CLI. It would've been easier anyway.
>>
>>107273275
just two more weeks bro. i promise bro just two more weeks and nvidia will be trading at $10 bro. it's just two more weeks bro. please just two more. two more weeks and our puts print bro. bro cmon just give me another week and then another one and we'll crash everything i promise bro. bro bro please we just need another 14 days and the bubble pops bro
>>
where do i find guides on how to properly build prompts for dumb models like nemo
>>
>>107273538
nemo doesn't need prompting
prompting in general is a meme outside of prefills to dodge some censorship with bad models
>>
>>107273594
thanks anon, will be trying q8 nemo this weekend and see how it goes
hopefully ill be able to goon
>>
It's almost unfair how with minimal prompting Grok 4.1 Fast on OpenRouter (temporarily free) will basically write anything you ask and spontaneously double down with more, while all official open-weight model releases we've got so far from other companies are always cucked in some capacity. I wonder if it will really be publicly released on HuggingFace in a year or so. I don't think even the "Fast" version will be 3T parameters large...?
>>
File: WSJDoomin.png (135 KB, 668x948)
>>107273509
Shorts are just as annoying as bulls.
Higher interest rates are a headwind. Adding jobs signals a stronger US economy.
NVDA up. Broader market down. Seems the AI spending spree has not gone unnoticed.
>>
What would yall recommend for a model that works as a pentesting assistant?

Chatgpt works fine until it randomly pisses itself about the content
>>
>>107274383
>yall
>>
>>107274383
You asking for something specifically locally runnable or cloud models included?
>>
>>107274499
Cloud included too, I'm not picky
>>
File: glownigger.png (80 KB, 900x900)
>>107274383
>>107274510
Kimi will give (you) good advice on how to legally reverse-entrap federal agents by goading them into creating fabricated evidence of a fictional crime, exposing the telltales of their AI generated evidence in court, providing a rock-solid alibi once the bait is placed, then giving you a list of legal pretexts for a counter lawsuit depending on the context and location of the minecraft server this happens in.
>>
>>107274026
how does it compare to glm and kimi?
>>
File: 1741868586015940.jpg (2.25 MB, 1575x2300)
>>107274510
If you don't care about closed/cloud, you could try out claude.

It is somewhat ironic that despite anthropic being massive alarmist safetyfags, claude itself is significantly less paranoid and censored than chatgpt.

Just be warned though, anthropic DOES log these things. Nobody is going to come knocking if you're doing a bit of pen testing stuff but if you're planning on building out extensive automated systems beware of tripping their flags. They recently put out a blogpost about identifying an APT that was using claude.

Besides all that, my personal unsolicited advice is this though: You should have your dev environment and tooling set up to be able to swap between models with a single line. Prompts should be as model-agnostic as possible. You want to be able to switch at the drop of a hat so that you're never stuck in a situation like "I'm frustrated with chatgpt"
>>
I haven't fired up koboldcpp in over a year. I'm trying to use koboldcpp/unsloth_Qwen3-30B-A3B-Thinking-2507-Q6_K. How the fuck do i get it to work with thinking correctly? It seems to never stop thinking, or it writes the reply then starts thinking etc.
>>
>>107274720
>or it writes the reply then starts thinking etc.
Sounds like a fucked prompt template or something like that.
Are you running the latest version of koboldcpp?
I dunno if kcpp uses the jinja embedded in the gguf, but if it does, the issue could be there.
Maybe try a bartowski quant.
>>
>>107274720
It's because it has 3B active parameters so it's literally retarded. Try the dense 32B one rather than the meme moe model, also update your shit if you haven't already
>>
>>107274632
Gemini 3 is good enough too, I'm doing some reverse engineering on apis
>>
>>107274632
The reason why Claude is less lobotomized is because they've always had their filter as a separate model from Claude itself
So once you jailbreak it, you get a pure clean LLM underneath
No doubt they quietly offer uncontaminated access to their top corporate backers

This is also why Claude models have always been able to go toe to toe with GPT despite always being a fraction of the size
>>
>>107274769
>It's because it has 3B active parameters so it's literally retarded.
Always cope, never proof.
>>
>>107274819
Yeah it's cope to use the dense model of the same size you fucking retard.
>prooooofss????
Try using the model chucklenuts, it's almost as stupid as you
>>
>>107274873
Almost like these MoE cheerleaders never run the models.
>>
>>107274720
sounds like a prompt format issue, make sure you are using a chatml template and prefill with <think> if it isn't already
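In case it helps, a rough sketch of what that looks like on the wire, assuming koboldcpp's KoboldAI-style API on the default port (endpoint and field names are from memory, double-check against your local API docs):
```
import requests

# ChatML template with the assistant turn prefilled with <think>,
# so the model opens its reasoning block before the visible reply.
prompt = (
    "<|im_start|>user\n"
    "Write a haiku about VRAM.<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>\n"
)

r = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={"prompt": prompt, "max_length": 512},
)
print(r.json()["results"][0]["text"])
```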
>>
File: 983625.png (125 KB, 640x392)
nano banana 2 local when?
>>
>>107272979
Bahaha, that's fucked. Fucking microkikes. Hope you find a workaround.
>>
File: 1755846328080658.png (417 KB, 971x1200)
>>107266608

>Browse /b/
>/b/Realistic AI Parody Nudes Thread
>Tons of deepfake nudes of real people

So, not trying to moralfag or anything. I used to be active on the /b/DEGEN threads so you know I'm not new: why do they just post ai nudes of REAL people there? Someone correct me if I'm wrong, but isn't sharing that shit illegal cuz it can be considered revenge porn or something? I'm not familiar with current laws, so I don't wanna post gens of loras i may or may not have created and then get a knock on my door from the local feds or get my internet turned off because I pissed off the wrong celeb or influencer with lawyer money. I feel like anons only get away with it out of luck and I'll be the one unlucky bastard that gets made an example of, but maybe I'm wrong.
>>
>>107275000
4chan full of 3rd worlders
no such laws there and definitely not enforced
>>
>>107272970
>no pussy juices
This is just rape at this point
>>
>>107275000
If you're paranoid then just practice some basic opsec.

The feds aren't burning their VPN backdoors on extremely mild internet crimes.
>>
>>107275000
Eventually this will all become so easy and commonplace that posting an ai generated video of elon musk fucking a child will be about as bad as just saying "elon is a pedo" is today.
>>
>>107274976
When Qwen distills them.
>>
>>107275000
>sharing that shit illegal cuz
???
You're on an anonymous imageboard.
Do anons really forget *why* this sort of platform exists in the first place?
>>
>>107275150
Came here to say this. I'm on an overseas work trip with a bunch of my normie coworkers, and the first week they were here one of them convinced a bunch of us to download the sora app to start making funny deepfake videos of us (sidenote: funny how I can deepfake any nobody on grok or sora no questions asked, but NOOOOO you just CAN'T do it to those precious celebrities. Fuck those peasants though, right? lol). I'm sure deepfaking nudes will still be frowned upon in the future, but it'll be seen less as some horrible crime against humanity and more as something weird, like how admitting you jerk off to a crush AROUND THE CRUSH is weird. 50 years ago I'm sure watching porn was seen the same way deepfake porn is seen today.
>>
>>107275276
you are not anonymous to feds
>>
File: skellySG.png (2.75 MB, 1024x1536)
>>107267891
Glad you liked it. You wouldn't believe the amount of shit anons gave me about writing it.
>if I reach the context limit, I should just put the important stuff into some kind of data storage and can restart a chat by replacing the first message with a summary of the previous one?
Yes, you're suggesting using "Summarize" or some such function to carry the roleplay forward, which is one strategy.
There's a bunch of "long burn" RP strategies that I don't really use; read up on those. At some point, as you RP with a particular character... you've ended up with another character. How you manage that is up to you; there are strategies ranging from summary->author's note, to creating a whole TXT flat file and adding it as RAG, to just creating a new character card with the V2 character.
It's really up to you.
I use Author's Note all the time on anything that runs more than about 10 rounds to keep track of things that happen, that I want the LLM to remember and consider as it responds.
>>107267961
> https://github.com/SillyTavern/Extension-PromptInspector
I'll have to check that out. ty for posting.
>>
>>107275276
You think just because you didn't have to register an account that makes you immune from laws and morals?
>>
>>107275422
yes because (((they))) aren't looking for me. i'm not a target.
>>
>>107275444
Feds are looking to fulfill a quota.
>>
File: 1755133745365691.webm (1.45 MB, 640x480)
>>107275422
yes.
>>
File: G6N0n3RacAMAFeQ.jpg (188 KB, 1056x1008)
nano banana pro is crazy
https://x.com/cto_junior/status/1991564259516702997
>>
>>107276164
is this image ai generated?
>>
>>107275861
Frame count too high.
>>
File: GO0dhovbkAASWxi.jpg (240 KB, 1668x2000)
tech illiterate here, i just followed this guide: https://rentry.org/wan21kjguide , and when i try to img2vid i get this line in the cmd
>Lib\site-packages\torch\_inductor\utils.py:1613] [0/0] Not enough SMs to use max_autotune_gemm mode

what do?
>>
>>107276389
>>>/g/ldg
>>
>>107276389
>>107276405
Sorry looks like the resident autist is having a spergout and this board has no mods.
I hope you can find someone to help you later.
>>
>>107276389
hardware limitation
>what do?
buy a better gpu. stop being poor. alternatively commit sudoku for not googling this.
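To be fair to the anon asking: that line is a warning, not an error. torch.compile's inductor backend just skips its autotuned GEMM path on cards with too few streaming multiprocessors and falls back to stock kernels, so generation should still run. If you want to see your card's SM count (a sketch, assuming CUDA device 0):
```
import torch

# multi_processor_count is the number of SMs on the GPU;
# inductor disables max_autotune_gemm below an internal threshold.
props = torch.cuda.get_device_properties(0)
print(props.name, props.multi_processor_count)
```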
>>
>>107276429
>4 threads
What's going on over there? Haven't checked in very often.
>>
>>107276513
Thread splitter keeps trying to add an extra link (AniStudio) to the OP and keeps baking to try to push people over to his threads but it's not working, hence the multiple threads.
>>
File: asleep.jpg (45 KB, 400x428)
>>107276574
oh.
>>
imagen threads are forever ruined, the same happened to /hdg/
>>
I love deepsex but it's annoying that it can't do structured outputs. I will use it to generate semi-structured text for me, like so:

```
sub 1
here is some content

sub 2
here is some content
```

I want to run a smaller local model, which will transform that into JSON structured output.

`{"sub1": "X", "sub2": "Y"}`

What model can I use for this?
>>
>>107276680
structured outputs are backend dependent, tf are you saying
>>
>>107276680
>but it's annoying that it can't do structured outputs
Really?
as in its API follows the OpenAI-compatible standard but ignores the
>"response_format": {
> "type": "json_object",
> "schema": json_schema,
> },
param?
That sucks.
Unless the API docs explicitly say it doesn't support structured output/json schema, try wrapping those into an extra_body object.
I think I had to do that when using llama.cpp with the OpenAI python client library.
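For reference, roughly what that workaround looks like, as a sketch (assumes a llama.cpp server on localhost:8080 and the openai python client >= 1.0; the schema and model name are made up for illustration):
```
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

json_schema = {
    "type": "object",
    "properties": {"sub1": {"type": "string"}, "sub2": {"type": "string"}},
    "required": ["sub1", "sub2"],
}

resp = client.chat.completions.create(
    model="whatever-is-loaded",  # llama.cpp serves one model regardless
    messages=[{"role": "user", "content": "Turn the notes into JSON."}],
    # Non-standard params the client doesn't know about get passed
    # through to the server verbatim via extra_body.
    extra_body={
        "response_format": {"type": "json_object", "schema": json_schema},
    },
)
print(resp.choices[0].message.content)
```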

>What model can I use for this?
Try Qwen 30B.
>>
>>107276748
I'm assuming he is using the official API rather than running it locally.
>>
'depression': None,
'anxiety': None,
'fear': None,
'terror': None,
'horror': None,
'dread': None,
'apprehension': None,
'foreboding': None,
'omen': None,
'portent': None,
'prophecy': None,
'vision': None,
'dream': None,
'nightmare': None,
'hallucination': None,
'delusion': None,
'madness': None,
'insanity': None,
'craziness': None,
'lunacy': None,
'dementia': None,
'psychosis': None,
'schizophrenia': None,
[the same 23-key block loops three more times verbatim, the last pass cut off at]
'craz......'

# END OF DICT — all 400+ tags processed
}

Spooky.
>>
>>107276964
Which model?
>>
>>107277071
Qwen3-Max
>>
>>107276680
why would you ever use a language model to transform data from one format to another, just use python or something
ask your model to write you a script for it if anything, but that is not an llm task
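Something like this covers the example format from that post; a sketch, assuming the headers really are "sub N" lines each followed by a content block:
```
import json
import re

raw = """sub 1
here is some content

sub 2
here is some content"""

result = {}
# Split on blank lines; the first line of each block is the key.
for block in re.split(r"\n\s*\n", raw.strip()):
    header, _, body = block.partition("\n")
    result[header.replace(" ", "")] = body.strip()

print(json.dumps(result))
# {"sub1": "here is some content", "sub2": "here is some content"}
```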
>>
File: Altman.jpg (89 KB, 593x606)
Altman is getting scared.
https://x.com/wallstengine/status/1991659177283051870
>>
>>107277169
>last month
>>
>>107277169
Actual article referenced, but paywalled.
https://www.theinformation.com/articles/openai-ceo-braces-possible-economic-headwinds-catching-resurgent-google
>>
>>107276680
>>107276765
Lol how about using the official api json output then
https://api-docs.deepseek.com/guides/json_mode
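Per those docs, a minimal sketch (assumes the official endpoint; deepseek wants response_format set AND the word "json" somewhere in the prompt, or it may ramble):
```
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{
        "role": "user",
        "content": 'Convert my notes to json like {"sub1": "...", "sub2": "..."}.',
    }],
    # Official JSON mode: constrains output to a valid JSON object.
    response_format={"type": "json_object"},
)
print(resp.choices[0].message.content)
```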
>>
File: 1756330197483127.jpg (307 KB, 976x850)
>>107277169
>temporary
>>
>>107277169
It never made sense that Google would have lost this battle unless they had just completely failed to even try. They have all the data.
>>
>>107277574
I don't think it was a foregone conclusion that Google would succeed; it was entirely possible that they could have failed completely. Just look at Meta, who has completely bombed out of the race despite starting out with mountains of cash, compute, and data.
>>
>>107277620
And Google was on the path to fail, but they made the right choices: hired the Character.AI guy, who was one of the authors of Attention Is All You Need, and fired everyone responsible for Bard.
Meta, on the other hand, is stagnant. They didn't change anything after the failures that were their last models.
>>
>>107277644
>the failures that have been their last models
only the last models? bruh, meta never made a good model
if llama 1 hadn't been leaked and become sort of open sores, who in the entire world would have cared for this piece of shit? if the early llamas were any good, why were even rando finetrooners making better instruct/chat models?
llama was never good, this board's users are just nostalgic for the first local llm they experienced
it wasn't even meta's intention to be the flagbearer of open sores either; them becoming the figurehead of open llms and having projects like llama.cpp named after them was a happy accident
>>
>>107277877
they always had incredibly bad taste, yeah
like not training L1 on code
or filtering porn from the training data

they should have just sucked it up and made their own knockoff of DS V3 this year, they probably could have made something better than kimi k2 with all of their compute
>>
what is this
https://huggingface.co/TroyDoesAI/Qwen3-15B-A2B-Base
>>
>>107277169
>AI gains
in what? acing the test set because it was trained on it?
>>
>>107277934
>not training on code is... le bad
kys
>>
>>107278099
looks like lobotomy
>>
>>107278118
sorry but not training on code makes models completely retarded
morons like you must be seething now that literally every new model is codemaxxed lol
>>
>>107278293
Go elsewhere, shill
This is a RP thread
>>
>>107277877
Meta is full of jeets, don't expect anything from these retards
>>
File: 98375623.jpg (101 KB, 828x982)
>>107277169
>Sam has a model way more powerful than gemini 3.
openai is back
>>
I went back to some of my logs with Mistral Large and damn, those new MoE models may be smarter but they simply don't hit that hard anymore.
>>
>>107278338
he was talking about 5.1
>>
>>107278350
kek
>>
>>107278338
S -> S
H -> T
A -> R
L -> A
L -> W
O -> B
T -> E
P -> R
E -> R
A -> Y
T -> 2
Holy shit Shallotpeat = Strawberry2
>>
>>107277644
The claim that Google fired everyone responsible for Bard is false. Google has had layoffs and dismissals related to its AI work, but the entire Bard team was not fired.
>>
>>107278838
>>107278838
>>107278838
>>
>>107278847



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.