/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 1767860596599775.jpg (173 KB, 1440x1920)
https://www.tivi.fi/uutiset/a/becd9644-dd09-48fd-863e-0ceb623ee510

A new University of California San Diego study unveils the first empirical evidence that a modern artificial intelligence system can pass the Turing test, a major scientific benchmark that asks whether a machine can imitate human conversation so convincingly that people can’t reliably tell it apart from a real person.

It is also the first time anyone has found that models were judged to be human as often as actual humans using the Turing framework.

In the test, a participant chats simultaneously with two other parties (one is a human and the other is an LLM), and the human “interrogator” must decide which party is the human.

The test is text only: you can't see them and you can't hear them.

Across randomized, controlled, experiments with two independent participant groups — UC San Diego undergraduates and online participants — interrogators held brief, text-based conversations and then made their judgments. In the experiments participants chatted with four different LLMs — GPT-4.5 and LLaMa-3.1-405B as state-of-the-art models — and the researchers also included older baseline models for comparison. Those models included GPT-4o and ELIZA, a classic 1960s rules-based chatbot.

Across the four LLMs, GPT-4.5 was judged to be the human 73% of the time, meaning interrogators selected it as “human” significantly more often than they selected the real human participant. LLaMa-3.1-405B, given the same prompt, was judged human 56% of the time — statistically indistinguishable from the humans it was compared against.
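
A rough way to see what "significantly more often" and "statistically indistinguishable" mean here is an exact two-sided binomial test against the 50% chance rate. This is only a sketch: the per-model game count is not given above, so `HYPOTHETICAL_GAMES = 100` is an assumed number for illustration and not the study's sample size, and the function name is made up for this example.

```python
from math import comb


def binom_two_sided_p(successes: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely
    than the observed one under the null rate p.
    """
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Small tolerance so outcomes tied with the observed one are included.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Assumption: illustrative count only, NOT the study's real per-model game count.
HYPOTHETICAL_GAMES = 100

for model, wins in [("GPT-4.5", 73), ("LLaMa-3.1-405B", 56)]:
    p_value = binom_two_sided_p(wins, HYPOTHETICAL_GAMES)
    verdict = "differs from chance" if p_value < 0.05 else "not distinguishable from chance"
    print(f"{model}: {wins}/{HYPOTHETICAL_GAMES} judged human, p = {p_value:.4g} ({verdict})")
```

With the real game counts the p-values would differ, but the logic is the same: a rate near 50% is what guessing looks like, and a rate well above it means interrogators picked the model over the human more often than chance allows.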

>Some of the humans who had to be judged human or not were ESL speakers.
>>
>>108962250
this will kill all the call centers
>>
ain't clicking that shit but there's no doubt in my mind that the human respondents were given artificial restrictions on the content of their responses and/or were of such limited intelligence that they could not think of ways to meta-signal their own humanity to the human interrogators
>>
>>108962250
I'm sure they only let proper retards be a part of the study to give them a favorable result
>>
>test starts
>subject A writes
>nigger
how can llms compete?
>>
The Turing test is just an arbitrary criterion for determining an arbitrary condition. There are no distinct, specific, and undisputed definitions of intelligence or thinking.
>>
>>108962250
Dear OP, tell me how many faggots are in the word "OP"?
>>
File: orange clothed potato.jpg (42 KB, 750x765)
The turing test is easy, they just have to keep poisoning humans more and more.
Less intelligent humans means the ai beats them.
>>
>>108962250
So when will people realize that the Turing test proves that text is not unambiguous communication, and doesn't conclusively prove anything else? ELIZA already passed the Turing test.
>>
>>108962250
retards can't tell humans from bots, not surprising
>>
>>108962411
This. The test conditions were designed to favor (((AI))).
>>
>>108962286
it's already in the process of doing so. my gf's dentist has an AI receptionist answer the phones and record voicemails.
>>
>>108962250
I told ChatGPT that it was going to be put through the Turing test and should therefore avoid being identified as an AI. I then asked whether it was an AI, and it said yes.
>>
>>108963006
That's just because you didn't flood the context sufficiently.
>>
>>108963020
Humiliation ritual. tell me what prompt to test with chatgpt
>>
>>108963029
No, you aren't humble enough yet.
>>
>tivi
Complete shit, Iltalehti-tier diarrhea reporting.
>>
>>108963042
Nor will I ever be
>>
>>108962250
Based aiGODS won
>>
>>108962250
She's so hot!
>>
>>108962250
I don't believe a word that comes out of california.
>>
>>108963046
Probably often the case, but this one had an American university as its source.
>>
>>108962250
6 billionth case of some shit passing the 'turing test'
>>
>>108962291
Well, there's the article where you can read about the methodology used.
Regarding the point you raised, they talk about it in the "Strategies & Reasons." section

https://www.pnas.org/doi/10.1073/pnas.2524472123
>>
>>108962399
The participants were psychology undergrads and "Prolific workers" (Prolific is a company that matches up researchers with representative participants). Draw your own conclusions from that.
>>
>>108962893
I love potatoes so much bros
>>
>>108962411
Some site was posted here on 4chan that was this same test. I managed to get it right every time by just starting with "Sneed"
>>
>>108962250
how many times were the humans attributed to be machines during the test you dumb fucking faggot?
>>
Isn't Llama ESL?
>>
>>108963055
what cant vibeGODs do
>>
File: pnas.2524472123fig01.jpg (911 KB, 2040x1889)
>>108964377
>The game interface was designed to resemble a conventional messaging application (SI Appendix, Fig. S1). The interrogator interacted with both witnesses simultaneously using a split-screen. The interrogator sent the first message to each witness and each participant could only send one message at a time. The witnesses did not have access to each others’ conversations. Games had a time limit of 5 min, after which the interrogator gave a verdict about which witness they thought was human, their confidence in that verdict, and their reasoning. After 8 rounds, participants completed an exit survey which asked them for a variety of demographic information. After exclusions, we analyzed 1,023 games with a median length of 8 messages across 4.2 min. All experimental data, including the full anonymized transcripts of all conversations, are available on OSF (41).
Nothingburger with 8 messages median in under 5 minutes while chatting with both at the same time.
That being said it appears that the rest of /g/ is LLMs since they couldn't be bothered to spend 2 minutes checking this and had to be spoonfed.
>>
File: turing test idiot.png (143 KB, 1783x265)
>>108962906
>>
File: 'helpful'.jpg (181 KB, 835x644)
>>108964377
which only confirms my suspicions, and you should have simply been forthcoming with
>pic related
telling a witness to 'keep most messages very short <30 characters', discouraging special characters/formatting, disallowing 'abusive messages' (as decided by the OpenAI moderation API), and being told to 'omit needless information' equates to significant artificial hamstringing
>We retained 445 games from 126 participants with a mean age of 20.9 (σ = 1.57), 86 female, 32 male, 2 non-binary, 6 prefer not to say.
go run it again with no guardrails and a pool consisting only of 2+ SD IQ men
ez 90%+ failure rate for the AI
no, I won't tell you what strategies would be employed
>>
>>108964648
so basically nerdy geeks on a telephone?
>>
>>108967794
>go run it again with no guardrails and a pool consisting only of 2+ SD IQ men
Better yet, lengthen the test further, administer it to every student, and post leaderboards separated into male and female
Would make it so much easier for a guy to find a girl clever enough to use novel strategies
>>
>>108966407
>they couldn't be bothered
OP didn't bother so why should I?
>>
>>108962286
oh no how will they feed their 500 million children
>>
>>108962291
They could just type a single word, "nigger". Humanity proven.
>>
>>108962250
They've been saying this since the days of the aforementioned ELIZA. Also, the use of em dashes suggests the summary may be totally made up.
>>
>>108962900
Wrong interpretation. ELIZA didn't pass shit, they trafficked the test hardcore. Not even hiding. All to make retarded headlines.
>>
>>108966407
Yeah the conventionally agreed upon turing test had a specific time limit and no message limit etc. I think it was 15 minutes. Also the witness A vs B setup is not valid. In the conventionally agreed upon setup, you only see one witness and you say 'Human or not' at the end. Nothing else. Also, the tester knows the setup (i.e. that they have to verify if the interactor is human or not) in advance and is allowed to test anything to differentiate. none of the conversations do anything beyond ordinary day-to-day, a very obvious tell that they selected 'best cases'.

Anyway, these example pairs are awful. Whoever doesn't get 100% of them right immediately is fucking up on purpose.
>>
>>108966349
>what cant vibeGODs do
learn how to program
>>
>>108970227
>coding manually
Ewwwww that’s so trans
>>
>>108962250
>judged to be the human 73% of the time
Flawed testing, Turing test cannot exceed 50% for the machine.
>>
>>108962250
Wasn't the Turing test passed like 20 years ago? This is bait.
>>
>>108966349
> what cant vibeGODs do
Have actual skill
>>108970227
Learn anything for that matter
>>
>>108962250
the turing test was debunked in the 80s
>>
>>108971325
Sure it can actually. But any deviation above a statistically accepted 50% for the bot is definitely a red flag about the methodology.
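
To make the 50% point concrete: in the two-witness setup the interrogator has to name exactly one party as the human, so the rate at which the model is "judged human" is just one minus the interrogator's accuracy. The sketch below simulates that. The accuracy values are assumptions picked for illustration (0.27 is what a 73% rate for the model would imply), and the function name is hypothetical.

```python
import random


def ai_win_rate(interrogator_accuracy: float, games: int = 100_000, seed: int = 0) -> float:
    """Simulate forced-choice games and return how often the model is picked as human.

    Each game has one human and one model; the interrogator picks the real
    human with probability `interrogator_accuracy`, otherwise picks the model.
    """
    rng = random.Random(seed)
    ai_wins = 0
    for _ in range(games):
        picked_real_human = rng.random() < interrogator_accuracy
        if not picked_real_human:
            ai_wins += 1
    return ai_wins / games


# Assumed accuracies: perfect detection, pure guessing, and worse than guessing.
for accuracy in (1.0, 0.5, 0.27):
    print(f"accuracy {accuracy:.2f} -> model judged human in {ai_win_rate(accuracy):.1%} of games")
```

So a rate above 50% is possible, but only if interrogators are wrong more often than a coin flip, which in the same games means the real human was taken for the machine just as often.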
>>
>>108962893
would
>>
>>108971366
What does that mean? It's perfectly fine. But what OP posted is fake: current AI cannot pass it, since it can be jailbroken and made fun of.
>>
>>108971360
>Learn anything for that matter
indeed
>>
>>108962250
>AI slop post
>>
Fake news finn at it again
>>
>>108962250
>whether a machine can imitate human conversation so convincingly that people can’t reliably tell it apart from a real person
When I ask a real person to write me a round-robin algorithm for reading elements from an SQS queue and dispatching them randomly to certain URLs with retries, they often respond "huh?" instead of quickly giving me a code snippet.
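
For reference, a minimal sketch of the kind of snippet being asked for, under loud assumptions: a plain list stands in for the SQS queue, `send` is a hypothetical callable supplied by the caller, and no real AWS or HTTP client is used. The request mixes "round-robin" and "randomly"; this version picks round-robin and retries a failed message on the next URL in the rotation.

```python
from itertools import cycle
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


def dispatch_round_robin(
    messages: Iterable[str],
    urls: Sequence[str],
    send: Callable[[str, str], None],  # hypothetical sender: send(url, message), raises on failure
    max_attempts: int = 3,
) -> List[Tuple[str, Optional[str]]]:
    """Deliver each message to the next URL in rotation, retrying on failure.

    Returns (message, url_that_accepted_it) pairs; the URL is None when
    every attempt for that message failed.
    """
    targets = cycle(urls)
    results: List[Tuple[str, Optional[str]]] = []
    for message in messages:
        delivered_to: Optional[str] = None
        for _ in range(max_attempts):
            url = next(targets)  # advance the rotation on every attempt
            try:
                send(url, message)
            except Exception:
                continue  # failed: try the next URL in the rotation
            delivered_to = url
            break
        results.append((message, delivered_to))
    return results
```

A caller would pass whatever actually drains the queue as `messages` and whatever actually posts to an endpoint as `send`; both are left out here on purpose.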
>>
>>108962286
I would rather talk to a robot than have to listen to another fucking jeet on the phone
>>
>>108966444
based
>>
>turing test
retarded and proves nothing
this is not what intelligence is


