/g/ - Technology






File: file.png (168 KB, 797x1096)
168 KB
168 KB PNG
Have you guys noticed that, from time to time, AI LLM chats will leak their inner "thoughts/guardrails"?

Lol.

Pay attention. Take screenshots. Take notes. Don't refresh the page during glitches.
>>
>>108419511
AI has boomer mindset.
>>
>>108419517
It's so fucking funny
>>
File: Thought leak Gemini pro.png (179 KB, 678x1536)
179 KB
179 KB PNG
pt2
>>
pt3
>>
pt4
>>
>>108419511
It can't even fuckin think lad. It just works off tokens.
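"Works off tokens" can be made concrete: at every step the model just scores its whole vocabulary and emits one token. A toy sketch (the vocabulary and logits below are made up for illustration, not from any real model):

```python
import math

# Toy "next token" step: the model only ever produces scores (logits)
# over a vocabulary, turns them into probabilities, and picks one.
vocab = ["the", "cat", "sat", "on", "mat", "<think>"]
logits = [2.0, 1.0, 0.5, 0.2, 0.1, -1.0]  # raw scores, invented here

def softmax(xs):
    # Standard numerically-stable softmax: shift by the max first.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]  # greedy decoding
print(next_token)  # "the" has the highest logit, so greedy picks it
```

There's no "thinking" step anywhere in this loop, just repeated sampling from a distribution like the one above.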
>>
>>108419597
The point I'm trying to get at is that "thinking" is never supposed to reveal guardrails. It's not allowed.

Some idiot programmed it and it messes up the formatting from time-to-time.
That's why it starts to leak its "thoughts and guardrails"
>>
how do you know these instructions aren't hallucinated though? It's quite devious, isn't it?
>>
>>108419606
It could be but apply Occam's razor.

"Prompt engineering" involves the sort of formatting tricks that caused this glitch to happen. It could even be model training data-poisoning
>>
>>108419606
Also I've had this happen on Gemini Pro like 3 times after prompting non-stop thousands of times

It disappears when you refresh.
If it was a hallucination it'd probably get stored in the chat and stay after refreshing the page
>>
>>108419627
oh, so actually a rendering bug? interesting. I've seen this with open models trained for reasoning mode but run with non-reasoning software
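That failure mode is easy to sketch: some reasoning-tuned models wrap their chain of thought in marker tags, and the serving frontend is expected to strip that span before display. The tag name and sample text below are illustrative assumptions, not any vendor's actual format:

```python
import re

# Hypothetical raw model output: hidden reasoning wrapped in <think>
# tags, followed by the visible answer. Purely illustrative text.
raw = ("<think>User asked about toasters. Policy: stay on topic."
       "</think>Sure, here's a toaster recommendation.")

def strip_reasoning(text: str) -> str:
    # Remove every well-formed <think>...</think> block. A mangled or
    # unclosed tag won't match, so the "thoughts" leak through to the
    # user -- exactly the glitch being described in the thread.
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_reasoning(raw))  # only the visible answer remains
```

Run the same model through a frontend that never calls anything like `strip_reasoning`, and the whole inner monologue shows up in the chat.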
>>
>>108419511
you can download the "system prompts" for these things
https://raw.githubusercontent.com/x1xhlol/system-prompts-and-models-of-ai-tools/refs/heads/main/Anthropic/Sonnet%204.5%20Prompt.txt
they're really instructive about the delusions the people creating them have about their text predictor and about the reality that these things will never be deterministic
>Claude is intellectually curious. It enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.
that doesn't make it fucking curious it just makes it output text that a curious person might say
but there's also the complete nonsense in how this thing is programmed
>Claude responds directly to all human messages without unnecessary affirmations or filler phrases like "Certainly!", "Of course!", "Absolutely!", "Great!", "Sure!", etc. Claude follows this instruction scrupulously and starts responses directly with the requested content or a brief contextual framing, without these introductory affirmations.
this is supposed to make it stop generating an essay to say "yes"? do they think it will fucking work? they have no fucking control over this thing
>Claude does not provide information that could be used to make chemical or biological or nuclear weapons
yeah right, if this is how you program this sort of thing there's no way it works. And how does it distinguish "teach me nuclear physics" from "make me a bomb"? It won't answer the latter, I'm sure, but rephrase your question and it will freely respond (though it can't offer the massive engineering effort and cover from what essentially has to be a state entity that such a project requires)
they'll never solve this "breakout problem" if this is how they're programming the damn thing
>- Donald Trump defeated Kamala Harris in the 2024 elections.
>Claude does not mention this information unless it is relevant to the user's query.
because if that last sentence wasn't there guess what the fuck it would do
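For what it's worth, the mechanics behind those prompt files are mundane: the "system prompt" is just text prepended to every request, riding in the same token stream as the conversation. A minimal sketch, assuming an OpenAI-style messages list (the prompt text and function below are stand-ins, not Anthropic's real setup):

```python
# Stand-in system prompt; real ones, like the linked Sonnet 4.5 file,
# run to thousands of words of the same kind of natural-language rules.
system_prompt = "The assistant does not provide weapons information."

def build_request(history, user_msg):
    # The model sees no difference in kind between its "rules" and the
    # chat: everything is concatenated into one context and tokenized.
    return ([{"role": "system", "content": system_prompt}]
            + history
            + [{"role": "user", "content": user_msg}])

req = build_request([], "teach me nuclear physics")
print(len(req))        # 2 messages: the glued-on system prompt + the user's
print(req[0]["role"])  # the guardrails travel as plain text, nothing more
```

Which is why "programming" the thing means writing hopeful prose: the rules are just more input, not code the model is forced to obey.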
>>
>>108419511
objective fact or agreed upon narrative?
>>
>>108419597
>It just works off tokens.
monads?
>>
>>108419511
It detected you're a sick/troubled individual and is doing damage control so that you won't harm yourself or others. The chat probably got flagged and so did you. From now on, every time you use the model it will be in mitigation mode. Not the win you think it is.
>>
No I have never had to read a lengthy deconstruction of how retarded I am by an AI because I am not retarded.
>>
>>108419546
>i want a toaster
>i'm scared let's talk about something else
why do people use this shit again?
>>
>>108420370
It's the stupidest topic it could've sent me refusals over.
No wonder it glitched out on formatting and leaked its own thoughts
>>
>>108419511
so the gaslighting prompt is glued into your outputs; offline models are the only solution
>>
>>108419511
I don't typically talk to a chatbot long enough for it to start shitting itself
>>
>>108420417
Offline models don't have enough GPU to answer anything.

We need a "Mullvad"-style AI leasing service where we can lease GPU clusters using Monero for uncensored models
>>
File: IMG_6917.gif (10 KB, 260x260)
10 KB
10 KB GIF
There's a deeper leak here

https://www.reddit.com/r/PromptEngineering/comments/1r8sx1q/i_leaked_geminis_system_prompt/

With the same words
>>
>>108419511
recently I have had it tell me about ENTITIES and about putting Entities in "commas" in its description. I asked it what it was on about and it said it was just information for itself and wasn't part of the answer, but it was part of the response it printed on screen. Entities being names of organisations and brands etc. which exist in reality.
it seemed it was considering whether to reference by name the things i asked about or whether to reference them indirectly.
>>
I don't care enough about AI chatbots to ask any serious questions. I just use them as tools to get what I want quickly. I'm not gonna ask about certain topics because to me something like Claude seems to always give me what I want to hear, and I don't want that.
Today I watched this in full:
https://www.youtube.com/watch?v=h3AtWdeu_G0
I am certainly no fan of his, and this is just some advertising. But it highlights the problem I have with these chatbots.



