Inbred, gibberish or just MAD? Warnings rise about AI models

Dubai Telegraph - Inbred, gibberish or just MAD? Warnings rise about AI models


Inbred, gibberish or just MAD? Warnings rise about AI models / Photo: Fabrice COFFRINI - AFP/File

TECHNOLOGY 05.08.2024

When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".

The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content and more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training AI programs, known in the industry as large language models (LLMs), involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
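As a rough illustration of that idea, the sketch below builds a toy model that counts which token follows which in a tiny invented corpus, then always picks the most frequent follower. It is only a sketch under stated assumptions: real LLMs use neural networks and subword tokens rather than word counts, and the corpus, the whitespace tokenizer and the function names `train_bigram` and `generate` are all made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every token, which tokens follow it in the training text."""
    # Assumption: a crude whitespace tokenizer stands in for real tokenization.
    tokens = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, prompt_token, length=5):
    """Extend the prompt by repeatedly choosing the most frequent next token."""
    out = [prompt_token]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # the last token was never seen with a follower
        out.append(candidates.most_common(1)[0][0])
    return out


# Hypothetical training data, invented for illustration only.
corpus = "the cat sat on the mat . the cat ate the fish . the cat sat on the sofa ."
model = train_bigram(corpus)
print(" ".join(generate(model, "the", length=4)))  # the cat sat on the
```

Because the toy model always takes the single most likely continuation, anything that appears only rarely in its training text ("mat", "fish", "sofa") never shows up in its output.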

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model was fed on its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
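The loss of rare elements can be pictured with a small simulation. This is a minimal sketch, not the method used in the Nature paper: the "model" here is just a table of token frequencies, and the vocabulary, probabilities, sample size and function names are assumptions chosen for illustration. Each generation samples from the current table and refits the table on those samples alone.

```python
import random
from collections import Counter


def refit(samples, vocab):
    """Re-estimate token probabilities purely from sampled (synthetic) data."""
    counts = Counter(samples)
    return {tok: counts[tok] / len(samples) for tok in vocab}


def simulate(generations=10, sample_size=200, seed=0):
    """Return how many distinct tokens keep non-zero probability per generation."""
    rng = random.Random(seed)
    # Hypothetical starting distribution: one common token, twenty rare ones.
    vocab = ["common"] + [f"rare_{i}" for i in range(20)]
    probs = {tok: 0.01 for tok in vocab}
    probs["common"] = 0.8
    support_sizes = []
    for _ in range(generations + 1):
        support_sizes.append(sum(1 for p in probs.values() if p > 0))
        weights = [probs[tok] for tok in vocab]
        # Each new "model" only ever sees output sampled from the previous one.
        samples = rng.choices(vocab, weights=weights, k=sample_size)
        probs = refit(samples, vocab)
    return support_sizes


sizes = simulate()
print(sizes)  # the count of surviving tokens can only stay level or shrink
```

Once a rare token fails to appear in one generation's sample, its estimated probability drops to zero and it can never be sampled again, so diversity in this toy setup cannot recover without fresh outside data.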

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and riddled with undesirable elements as they added AI-generated data to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field that pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

A.Ansari--DT
