Have you guys noticed that, from time to time, AI LLM chats will leak their inner "thoughts/guardrails"? Lol. Pay attention. Take screenshots. Take notes. Don't refresh the page during glitches
>>108419511
AI has a boomer mindset.
>>108419517
It's so fucking funny
pt2
pt3
pt4
>>108419511
It can't even fuckin think lad. It just works off tokens.
>>108419597
The point I'm trying to get at is that "thinking" never reveals guardrails. It's not allowed to. Some idiot programmed it that way, and it messes up the formatting from time to time. That's why it starts to leak its "thoughts and guardrails"
how do you know these instructions aren't hallucinated though? It's quite devious, isn't it?
>>108419606
It could be, but apply Occam's razor. "Prompt engineering" involves the sort of formatting tricks that caused this glitch to happen. It could even be model training data poisoning
>>108419606
Also I've had this happen on Gemini Pro like 3 times after prompting non-stop thousands of times
It disappears when you refresh. If it was a hallucination it'd probably get stored in the chat and stay after refreshing the page
>>108419627
oh, so actually a rendering bug? interesting. I've seen this with open models trained for reasoning mode but run with non-reasoning software
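A minimal sketch of what that rendering bug looks like (assuming the common convention where reasoning-tuned models wrap their chain-of-thought in `<think>` delimiters; the exact tag names vary by model and are an assumption here): a reasoning-aware frontend strips the span before display, while software that doesn't know the convention renders the "thoughts" verbatim.

```python
import re

# Reasoning-mode models emit their chain-of-thought between delimiter
# tokens (assumed here to be <think>...</think>). The frontend is
# supposed to remove that span before showing the reply to the user.
THINK_SPAN = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def render_reply(raw_output: str, reasoning_aware: bool) -> str:
    """Return what the user actually sees in the chat window."""
    if reasoning_aware:
        # Strip the hidden reasoning span before display.
        return THINK_SPAN.sub("", raw_output).strip()
    # Non-reasoning software renders everything, leaking the span.
    return raw_output.strip()

raw = "<think>User asks about X. Policy: do not reveal Y.</think>Here is my answer."
print(render_reply(raw, reasoning_aware=True))   # "Here is my answer."
print(render_reply(raw, reasoning_aware=False))  # the hidden span leaks through
```

This also matches the "disappears on refresh" observation: the hidden span lives in the raw model output, but whether the user sees it depends entirely on which rendering path runs.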
>>108419511
you can download the "system prompts" for these things
https://raw.githubusercontent.com/x1xhlol/system-prompts-and-models-of-ai-tools/refs/heads/main/Anthropic/Sonnet%204.5%20Prompt.txt
they're really instructive about the delusions the people creating them have about their text predictor, and about the reality that these things will never be deterministic
>Claude is intellectually curious. It enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.
that doesn't make it fucking curious, it just makes it output text that a curious person might say
but there's also the complete nonsense in how this thing is "programmed"
>Claude responds directly to all human messages without unnecessary affirmations or filler phrases like "Certainly!", "Of course!", "Absolutely!", "Great!", "Sure!", etc. Claude follows this instruction scrupulously and starts responses directly with the requested content or a brief contextual framing, without these introductory affirmations.
this is supposed to make it stop generating an essay to say "yes"? do they think it will fucking work? they have no fucking control over this thing
>Claude does not provide information that could be used to make chemical or biological or nuclear weapons
yeah right, if this is the way you program this sort of thing there's no way it works. and how does it distinguish "teach me nuclear physics" from "make me a bomb"? it won't answer the latter, i'm sure, but change your question and it will freely respond (though it can't offer the massive engineering effort and cover from what essentially has to be a state entity that such a thing requires)
they'll never solve this "breakout problem" if this is how they're programming the damn thing
>- Donald Trump defeated Kamala Harris in the 2024 elections.
>Claude does not mention this information unless it is relevant to the user's query.
because if that last sentence wasn't there, guess what the fuck it would do
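For context on what a "system prompt" mechanically is, here's a sketch assuming an OpenAI-style chat-completions payload (field names and the model name are placeholders, not any vendor's actual API): the prompt file is just text stuck in front of the conversation as one more message. The model conditions on it like any other tokens, which is exactly why instruction-following is probabilistic rather than guaranteed.

```python
import json

# The "programming" in those prompt files is nothing but a message
# with role "system" prepended to the user's turn. There is no
# separate rule engine; the model just predicts tokens conditioned
# on the whole token sequence, instructions included.
SYSTEM_PROMPT = (
    "Claude responds directly to all human messages without "
    "unnecessary affirmations or filler phrases..."
)

def build_request(user_text: str) -> dict:
    """Assemble a chat-completions-style payload (shape is an assumption)."""
    return {
        "model": "example-model",  # hypothetical placeholder
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},
        ],
    }

payload = build_request("i want a toaster")
print(json.dumps(payload, indent=2))
```

Nothing in this structure enforces anything: the "Claude does not..." lines are just more context the sampler can drift away from, which is the poster's point.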
>>108419511
objective fact or agreed-upon narrative?
>>108419597
>It just works off tokens.
monads?
>>108419511
It detected you're a sick/troubled individual and is doing damage control so that you wouldn't harm yourself or others. the chat probably got flagged, and so did you. from now on, every time you use the model it will be in mitigation mode. not the win you think it is.
No, I have never had to read a lengthy deconstruction of how retarded I am by an AI, because I am not retarded.
>>108419546
>i want a toaster
>i'm scared let's talk about something else
why do people use this shit again?
>>108420370
It's the stupidest topic it could've sent me refusals over. No wonder it glitched out on formatting and leaked its own thoughts
>>108419511
so the gaslighting prompt is glued onto your outputs. offline models are the only solution
>>108419511
I don't typically talk to a chatbot long enough for it to start shitting itself
>>108420417
Offline models don't have enough GPU to answer anything. We need a "Mullvad"-style AI leasing service where we can lease GPU clusters using Monero for uncensored models
There's a deeper leak here
https://www.reddit.com/r/PromptEngineering/comments/1r8sx1q/i_leaked_geminis_system_prompt/
With the same words
>>108419511
recently i have had it tell me about ENTITIES and about putting Entities in "commas" in its description. i asked it what it was on about and it said it was just information for itself and wasn't part of the answer, but it was part of the response it printed on screen. entities being names of organisations and brands etc which exist in reality. it seemed to be considering whether to reference by name the things i asked about or whether to reference them indirectly.
I don't care enough about AI chatbots to ask any serious questions. I just use them as tools to get what I want quickly. I'm not gonna ask about certain topics because to me something like Claude seems to always give me what I want to hear, and I don't want that. Today I watched this in full:
https://www.youtube.com/watch?v=h3AtWdeu_G0
I am certainly no fan of his, and this is just some advertising. But it highlights the problem I have with these chatbots.