
What to Know About Tech Companies Using A.I. to Teach Their Own A.I.


OpenAI, Google and other tech companies train their chatbots with huge amounts of data culled from books, Wikipedia articles, news stories and other sources across the internet. But in the future, they hope to use something called synthetic data.

That’s because tech companies may exhaust the high-quality text the internet has to offer for the development of artificial intelligence. And the companies are facing copyright lawsuits from authors, news organizations and computer programmers for using their works without permission. (In one such lawsuit, The New York Times sued OpenAI and Microsoft.)

Synthetic data, they believe, will help reduce copyright issues and boost the supply of training materials needed for A.I. Here’s what to know about it.


It’s data generated by artificial intelligence.

Rather than training A.I. models with text written by people, tech companies like Google, OpenAI and Anthropic hope to train their technology with data generated by other A.I. models.

That approach carries risks. A.I. models get things wrong and make stuff up. They have also shown that they pick up the biases that appear in the internet data on which they were trained. So if companies use A.I. to train A.I., they can end up amplifying their own flaws.

Synthetic data is not yet widely used. Tech companies are experimenting with it. But because of its potential flaws, it is not a big part of the way A.I. systems are built today.

The companies think they can refine the way synthetic data is created. OpenAI and others have explored a technique where two different A.I. models work together to generate synthetic data that is more useful and reliable.

One A.I. model generates the data. Then a second model judges the data, much like a human would, deciding whether the data is good or bad, accurate or not. A.I. models are actually better at judging text than writing it.

“If you give the technology two things, it is pretty good at choosing which one looks the best,” said Nathan Lile, the chief executive of the A.I. start-up SynthLabs.

The idea is that this will provide the high-quality data needed to train an even better chatbot.
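The generate-then-judge loop described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `toy_generate`, `toy_judge`, `PRINCIPLE_WORDS` and `build_preference_data` are hypothetical names invented here, and the keyword-counting judge is a stand-in for a real second model, not the method used by OpenAI, Anthropic or anyone else.

```python
from typing import Callable, Dict, List, Tuple


def toy_generate(prompt: str) -> Tuple[str, str]:
    """Hypothetical stand-in for the first model: propose two candidate replies."""
    return (
        f"Nobody knows. ({prompt})",
        f"A careful and respectful answer to: {prompt}",
    )


# Hypothetical list of words the toy judge rewards. A real judge would be
# a trained model, not a keyword counter.
PRINCIPLE_WORDS = ("careful", "respectful", "accurate")


def toy_judge(first: str, second: str) -> int:
    """Hypothetical stand-in for the second model.

    Returns 0 if the first candidate is preferred, 1 if the second is.
    Ties go to the first candidate.
    """
    def score(text: str) -> int:
        lowered = text.lower()
        return sum(word in lowered for word in PRINCIPLE_WORDS)

    return 0 if score(first) >= score(second) else 1


def build_preference_data(
    prompts: List[str],
    generate: Callable[[str], Tuple[str, str]] = toy_generate,
    judge: Callable[[str, str], int] = toy_judge,
) -> List[Dict[str, str]]:
    """Pair each prompt with the candidate the judge chose and the one it rejected."""
    records = []
    for prompt in prompts:
        candidates = generate(prompt)
        winner = judge(*candidates)
        records.append(
            {
                "prompt": prompt,
                "chosen": candidates[winner],
                "rejected": candidates[1 - winner],
            }
        )
    return records


if __name__ == "__main__":
    for record in build_preference_data(["What is synthetic data?"]):
        print(record)
```

The design point the sketch tries to capture is that the judge only has to pick the better of two candidates, which is the task Mr. Lile describes the technology as being good at. The resulting chosen and rejected pairs are the synthetic data.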

Whether this works comes down to that second A.I. model. How good is it at judging text?

Anthropic has been the most vocal about its efforts to make this work. It fine-tunes the second A.I. model using a “constitution” curated by the company’s researchers. This teaches the model to choose text that supports certain principles, such as freedom, equality and a sense of brotherhood, or life, liberty and personal security. Anthropic’s method is known as “Constitutional A.I.”

[Graphic: how two A.I. models work in tandem to produce synthetic data using a process like Anthropic’s]

Even so, humans are needed to make sure the second A.I. model stays on track. That limits how much synthetic data this process can generate. And researchers disagree on whether a method like Anthropic’s will continue to improve A.I. systems.

The A.I. models that generate synthetic data were themselves trained on human-created data, much of which was copyrighted. So copyright holders can still argue that companies like OpenAI and Anthropic used copyrighted text, images and video without permission.

Jeff Clune, a computer science professor at the University of British Columbia who previously worked as a researcher at OpenAI, said A.I. models could ultimately become more powerful than the human brain in some ways. But they will do so because they learned from the human brain.

“To borrow from Newton: A.I. sees further by standing on the shoulders of giant human data sets,” he said.


Source: nytimes.com
