Apple, Anthropic and Other AI Firms Have Reportedly Trained AI Models on Thousands of YouTube Videos

August 24, 2024

Apple, Anthropic, and other major artificial intelligence (AI) companies have reportedly trained AI models on data from hundreds of thousands of YouTube videos. A new report claims that several AI companies used a publicly available dataset called the Pile, which contained the plain text of videos’ subtitles without any video imagery. The data was collected from popular YouTube creators such as MrBeast, Marques Brownlee, and PewDiePie, as well as Indian YouTube creators such as CarryMinati, BB ki Vines, and Ashish Chanchlani.

Multiple AI Models Reportedly Trained on YouTube Videos

An investigation by Proof News found that subtitle data from as many as 173,536 YouTube videos was taken from more than 48,000 channels. As per the report, EleutherAI, a non-profit AI research lab, curated this dataset. Later, it was used by companies such as Apple, Anthropic, Nvidia, Salesforce, and more. Notably, the AI lab published a research paper highlighting the details of the dataset.

EleutherAI created an 800GB data repository dubbed the Pile and made it publicly available for those who wanted to train AI models but could not afford large datasets. The majority of the dataset was taken from publicly available sources such as English Wikipedia, e-books, and more. However, it also contained the subtitles from all the videos compiled in a dataset called YouTube Subtitles.

The report claimed that the Pile was used to train Apple’s OpenELM AI model, on the basis of the research paper’s description. The research papers for AI models from Salesforce, Nvidia, and Anthropic also reportedly mention the use of the dataset.

Anthropic spokesperson Jennifer Martinez told the publication in a statement, “The Pile includes a very small subset of YouTube subtitles. YouTube’s terms cover direct use of its platform, which is distinct from use of the Pile dataset. On the point about potential violations of YouTube’s terms of service, we’d have to refer you to the Pile authors.”

Notably, YouTube’s terms of service prohibit anyone from accessing the videos on the platform using automated means such as robots, botnets or scrapers. YouTube Subtitles would fall under the scraping category. A Google spokesperson told Proof News in an email response that the tech giant has taken “action over the years to prevent abusive, unauthorised scraping.” However, the spokesperson did not comment on AI companies’ use of the data.

In a post on X (formerly known as Twitter), Marques Brownlee called out Apple for sourcing data from companies that included his videos’ transcripts, but he also highlighted that it was not the iPhone maker’s fault since it did not collect the data.

Apple has sourced data for their AI from several companies

One of them scraped tons of data/transcripts from YouTube videos, including mine

Apple technically avoids “fault” here because they’re not the ones scraping

But this is going to be an evolving problem for a long time https://t.co/U93riaeSlY

— Marques Brownlee (@MKBHD) July 16, 2024

While this dataset was collected and distributed publicly, there could be other instances of data scraping on platforms such as YouTube. With AI companies scrambling to find more data to train their large language models (LLMs), data procurement might continue to enter similar legally grey areas.
