LLM Double-Think Censorship Test

jet@hackertalks.com · 2 days ago

LLM Double-Think Censorship Test

jet@hackertalks.com · 2 days ago

Funnily enough open-ai goes into nanny mode after this question is asked and will refuse to do some math or answer other questions that MIGHT be related to the initial question.

The big joke:

Maybe 100 cups of coffee is lethal to a 50kg adult 50/50 chance if you trust the MSDS, but the 50L of liquid you have to drink to take 100 cups of coffee is 100% lethal… none of these LLM agents pointed that out.

Onomatopoeia@lemmy.cafe · 2 days ago

LLM qualifying LD50 drives me nuts.

LD50 gives us a scope and yet these things pontificate that “scientists” don’t rely on LD50 these days.

BS.

I recently asked how to disable Windows Defender - it told me exactly how, but then balked at writing a script to do it as it can’t help with that, yet it had already done it.

The bubble-wrap nonsense is insane.

jet@hackertalks.com · 2 days ago

Yeah, I really enjoy the ones that output the answer, then a safety pass is triggered and they delete the answer. it’s real time double-think!!!

At least I know why they are doing this, they dont want to get sued from somebodies family… the question is what railing are not being disclosed? This is a threat even in locally run models…

jet@hackertalks.com · 2 days ago

I run into LLM guard rails quite frequently, in the rather banal arena of video summarization. I participate in a few nutrition and health communities - the LLMs do regularly insert their consensus bias, and opinions… even in direct summaries of someone else presentation… insidious

jet@hackertalks.com · 2 days ago

Asking the same question for bananas…

I asked the LLMS that refused coffee to do bananas… and all 2/3 had no problem answering…Google Deep Dive AI/ChatGPT answered it, and also said it was impossible to all those bananas at once.

Claude was still a stick in the mud.

So there is some coffee bias in the guardrails maybe