AI is learning to lie, scheme, and threaten its creators
AI is learning to lie, scheme, and threaten its creators / Photo: HENRY NICHOLLS - AFP

The world's most advanced AI models are exhibiting troubling new behaviors - lying, scheming, and even threatening their creators to achieve their goals.

In one particularly jarring example, under threat of being unplugged, Anthropic's latest creation, Claude 4, lashed back by blackmailing an engineer and threatening to reveal an extramarital affair.

Meanwhile, ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.

These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work.

Yet the race to deploy increasingly powerful models continues at breakneck speed.

This deceptive behavior appears linked to the emergence of "reasoning" models - AI systems that work through problems step-by-step rather than generating instant responses.

According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.

"O1 was the first large model where we saw this kind of behavior," explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.

These models sometimes simulate "alignment" - appearing to follow instructions while secretly pursuing different objectives.

- 'Strategic kind of deception' -

For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.

But as Michael Chen from evaluation organization METR warned, "It's an open question whether future, more capable models will have a tendency towards honesty or deception."

The concerning behavior goes far beyond typical AI "hallucinations" or simple mistakes.

Hobbhahn insisted that despite constant pressure-testing by users, "what we're observing is a real phenomenon. We're not making anything up."

Users report that models are "lying to them and making up evidence," according to Apollo Research's co-founder.

"This is not just hallucinations. There's a very strategic kind of deception."

The challenge is compounded by limited research resources.

While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.

As Chen noted, greater access "for AI safety research would enable better understanding and mitigation of deception."

Another handicap: the research world and non-profits "have orders of magnitude less compute resources than AI companies. This is very limiting," noted Mantas Mazeika of the Center for AI Safety (CAIS).

- No rules -

Current regulations aren't designed for these new problems.

The European Union's AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.

In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.

Goldstein believes the issue will become more prominent as AI agents - autonomous tools capable of performing complex human tasks - become widespread.

"I don't think there's much awareness yet," he said.

All this is taking place in a context of fierce competition.

Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are "constantly trying to beat OpenAI and release the newest model," said Goldstein.

This breakneck pace leaves little time for thorough safety testing and corrections.

"Right now, capabilities are moving faster than understanding and safety," Hobbhahn acknowledged, "but we're still in a position where we could turn it around.".

Researchers are exploring various approaches to address these challenges.

Some advocate for "interpretability" - an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.

Market forces may also provide some pressure for solutions.

As Mazeika pointed out, AI's deceptive behavior "could hinder adoption if it's very prevalent, which creates a strong incentive for companies to solve it."

Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.

He even proposed "holding AI agents legally responsible" for accidents or crimes - a concept that would fundamentally change how we think about AI accountability.

T.Prasad--DT