Dubai Telegraph - Inbred, gibberish or just MAD? Warnings rise about AI models

Photo: Fabrice COFFRINI - AFP/File

When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".

The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content but more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training AI programs, known in the industry as large language models (LLMs), involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
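As a rough illustration of "most likely sequence", the toy sketch below counts which word follows which in a tiny made-up corpus and then picks the most frequent follower. It is only an assumption-laden miniature: real LLMs use neural networks over sub-word tokens, and the names `train_bigram` and `most_likely_next` are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which token follows which in the training text."""
    tokens = text.split()  # crude whitespace "tokenizer", for illustration only
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts

def most_likely_next(counts, token):
    """Return the most frequent follower of `token`, or None if it has none."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Made-up training text: "cat" follows "the" twice, "mat" only once.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(most_likely_next(model, "the"))
```

Here the program answers "cat" after "the" simply because that pairing was the most common in its training text.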

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model were fed its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
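The loss of rarer elements can be seen in a toy simulation. This is a sketch under stated assumptions, not the method used in the Nature paper: the "model" is just a frequency table, the starting dataset is invented, and each generation is trained only on samples drawn from the previous one.

```python
import random
from collections import Counter

def next_generation(samples, size, rng):
    """Fit a frequency table to `samples`, then draw a new dataset from it."""
    counts = Counter(samples)
    tokens = sorted(counts)
    weights = [counts[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=size)

def simulate(generations=50, size=200, seed=0):
    """Track how many distinct tokens survive each round of self-training."""
    rng = random.Random(seed)
    # Hypothetical "human" data: one common token plus a long tail of rare ones.
    data = ["common"] * 150 + [f"rare{i}" for i in range(50)]
    history = [len(set(data))]
    for _ in range(generations):
        data = next_generation(data, size, rng)
        history.append(len(set(data)))
    return history

history = simulate()
print("distinct tokens at start:", history[0])
print("distinct tokens at end:  ", history[-1])
```

A token that fails to appear in one generation can never return in a later one, so the count of distinct tokens can only fall or stay level, and the rare ones are the first to vanish.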

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and riddled with undesirable elements as they added AI-generated data to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field that pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

A.Ansari--DT