Tumblr, WordPress plan to sell user data to OpenAI and MidJourney to train AI models: Report

Tumblr and WordPress users may soon find that their data is being used to train artificial intelligence (AI) models, according to a report. Automattic, the blog sites’ parent company, has reportedly struck deals with OpenAI and MidJourney to sell user-generated content for use in AI training. Although the details of the deals and data-sharing practices are unclear at the moment, the report has raised questions about data privacy and the ethics of companies sharing their users’ data with third parties.

Internal communications from Automattic’s employees, viewed by 404 Media, both confirmed agreements with AI companies and revealed details of these practices. In its report, the publication said that Automattic’s deals with OpenAI and MidJourney could be announced soon. Furthermore, it appears that data collection for AI companies has already begun. An internal post by Cyle Gage, a product manager, suggested that all of Tumblr’s publicly posted content between 2014 and 2023 had been compiled.

The report also highlights a specific message stating that, along with public data, private and deleted user content was automatically compiled. It is not clear whether that set of data has already been shared with AI firms. Since such a mishap puts the private information of the entire user base at risk, it also raises questions about the company’s ethical policy and data security infrastructure.

Automattic issued a statement on Tuesday, saying, “AI is rapidly changing nearly every aspect of our world, including the way we create and consume content. At Automattic, we have always believed in a free and open web and personal choice. Like other tech companies, we are closely following these developments, including working with AI companies in a way that respects our users’ preferences.”

The post details several measures the company is taking for its users, including blocking AI platform crawlers and offering settings that discourage search engines from indexing sites on WordPress and Tumblr. It also contains an assurance of an opt-out setting for users who do not wish to share their data with third parties. “Currently, no laws exist that require crawlers to follow these preferences,” the post said.
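For readers curious how crawler blocking typically works, the sketch below illustrates the general robots.txt convention, under which a site publishes rules and a well-behaved crawler checks them before fetching a page. It is only an illustration and not Automattic's actual configuration: the crawler name `ExampleAIBot`, the rules, the helper `is_allowed` and the URL are all hypothetical. Because the check runs on the crawler's side, compliance is voluntary, which is the gap the post refers to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow everyone else.
# "ExampleAIBot" is a made-up name used only for illustration.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/post/1"  # hypothetical URL
    print("ExampleAIBot allowed:", is_allowed("ExampleAIBot", page))
    print("SomeBrowser allowed:", is_allowed("SomeBrowser", page))
```

Nothing in this mechanism stops a crawler that simply skips the check, so a rule of this kind expresses a preference and does not enforce one.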

The mechanism for opting out of data sharing is also somewhat unclear. While the company said in the post that AI companies will honor opt-out settings and even remove old content from users who have newly opted out, the report claims the reality is more complex.

The report cited an internal document dated February 23 in which an employee asked whether the company had any assurances that the data partners would respect users’ opt-out decisions. Automattic’s AI chief Andrew Spittle reportedly replied, “We will ask that the content be removed and removed from any future training programs. I believe the partners will respect this based on our conversations with them so far. I don’t think they will get much overall benefit from keeping it going.”

According to the report, the response was vague and did not confirm whether Automattic had reached any agreement on this point. Furthermore, the entire chain of reasoning appears to rest on the assumption that AI companies would not have much to gain from retaining user data. It should be noted that the practice of sharing data with third parties is not new, and most social media platforms own the rights to public user-generated content on their platforms. However, making such deals without telling users could potentially expose private information to companies that are using the same data to train AI systems.

