Dubai Telegraph - Inbred, gibberish or just MAD? Warnings rise about AI models

Inbred, gibberish or just MAD? Warnings rise about AI models / Photo: Fabrice COFFRINI - AFP/File

When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".

The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content but more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training AI programs, known in the industry as large language models (LLMs), involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
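The idea of picking the most likely next token can be illustrated with a toy sketch. This is not how ChatGPT is built: real systems use neural networks and subword tokens, whereas the example below simply counts which word follows which in a three-sentence corpus. The corpus and the function names (`tokenize`, `train_bigram`, `most_likely_next`, `generate`) are invented for illustration.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Toy tokenizer: lowercase and split on whitespace.
    # Real systems split text into subword tokens instead.
    return text.lower().split()


def train_bigram(corpus):
    # For every token, count which tokens follow it in the training data.
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenize(sentence)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def most_likely_next(counts, token):
    # Return the follower seen most often in training, or None if the
    # token was never followed by anything.
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, start, max_tokens=4):
    # Greedily append the most likely next token, one step at a time.
    out = [start]
    for _ in range(max_tokens):
        nxt = most_likely_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


# Hypothetical training corpus, made up for this example.
corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the dog chased the cat",
]
model = train_bigram(corpus)
print(generate(model, "dog"))  # prints "dog chased the cat sat"
```

Even at this scale the output is fluent-looking but not drawn from any single training sentence, which hints at why such systems can produce plausible nonsense.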

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model were fed its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
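The loss of rarer elements can be reproduced in miniature. The sketch below is not the Nature authors' experiment; it assumes a drastically simplified "model" that is nothing more than a table of word frequencies, refitted each generation on samples drawn from the previous generation. The vocabulary, the probabilities and the function names are all invented for illustration.

```python
import random
from collections import Counter


def fit(samples):
    # "Train" a model: estimate each token's probability from its frequency.
    total = len(samples)
    return {tok: n / total for tok, n in Counter(samples).items()}


def sample(model, n, rng):
    # "Generate" n tokens from the model's current distribution.
    tokens = list(model)
    weights = [model[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=n)


def simulate(generations=30, n=200, seed=0):
    rng = random.Random(seed)
    # Hypothetical "human" data: one common token and a few rare ones.
    model = {"common": 0.90, "uncommon": 0.07, "rare": 0.02, "very_rare": 0.01}
    history = [set(model)]
    for _ in range(generations):
        # Each generation is trained only on the previous generation's output.
        model = fit(sample(model, n, rng))
        history.append(set(model))
    return history


history = simulate()
for gen, vocab in enumerate(history):
    if gen % 10 == 0:
        print(gen, sorted(vocab))
```

A token that happens to draw zero samples in one generation has probability zero in the next and can never return, so the vocabulary can only shrink. Rare tokens are the most likely to be lost this way.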

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and riddled with undesirable elements as they added AI-generated data to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field that pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

A.Ansari--DT