Data Ex Machina: Synthetic Data and the Future of AI

After maintaining impressive form last year, tech giant Adobe is now losing some of its lustre.

Adobe’s stock took a hit in mid-March following lacklustre forecasts and concerns over whether the firm is incorporating and monetising artificial intelligence (AI) at the right pace. Many of the issues that Adobe is facing are relevant to the entire AI tech sector, where generative AI is experiencing growing pains despite both extreme hype and genuinely revolutionary potential.

These growing pains are stoked in part by the fact that the fresh data required to keep improving the technology’s capabilities is in increasingly short supply. To maintain the brisk pace of advancement that AI has seen over recent years, new sources of training data must become available. This has led to a focus on synthetic data: computer-generated sets of information created to train algorithms. Synthetic data is now among the most sought-after resources in the tech industry.

Corporate headaches

Problems are mounting at Adobe. The tech giant’s shares plummeted by as much as 11% in March following a soft sales forecast. Despite registering double-digit year-on-year revenue growth, Adobe also announced a decrease in net income, from $2.71 per share last year to $1.36 in 2024. After nearly reaching an all-time high at the beginning of the year, shares in the company are down 20% since the beginning of February. This has given rise to concerns that Adobe’s recent troubles are a red flag for the wider AI sector.

Part of the issue, naturally, is fallout from Adobe’s failed takeover of Figma. A $20 billion acquisition of the cloud-based design tool, shepherded by Adobe’s Chief Strategy Officer Scott Belsky, looked almost a done deal before Adobe pulled out of the transaction late last year due to EU and UK regulatory hurdles. Adobe also had to pay Figma $1 billion in breakup fees after the failed merger.

Following the failed Figma deal, Adobe and its investors doubled down on the company’s generative AI potential. “The company is innovating at a pace we’ve never seen,” Adobe CFO Dan Durn stressed last fall, before the Figma deal had officially fallen apart but while it was already facing intense regulatory scrutiny. “We’re natively, deeply integrating these technologies into those workflows and products that define how they operate. This is a seminal moment in Adobe’s history,” Durn explained. “There’s an opportunity in front of us.”

Yet the problem runs deeper than the collapsed Figma takeover: Adobe Firefly is making mistakes similar to the Google Gemini mishaps that have recently made ample headlines. Attempts to train AI models to avoid racial stereotypes produced a deluge of ahistorical representations that quickly went viral, providing fresh fodder for AI sceptics who point to the tech’s current limitations.

Data’s event horizon

Firefly is proving to be a formidable tool, but its public setbacks risk causing lasting damage to the app’s reputation. In some ways, Adobe is being punished for playing by the rules. The company trained its algorithm on stock images and openly licensed content to allay critics’ worries about the intellectual property implications of generative AI. This contrasts with competitors who often play fast and loose with copyright when training their algorithms.

Industry leader OpenAI, for example, which is facing multiple lawsuits over its use of copyrighted content, argues that it is “impossible” to train its AI tools without such material. OpenAI maintains that the limitations come from the fact that “copyright today covers virtually every sort of human expression – including blog posts, photographs, forum posts, scraps of software code, and government documents”.

Even including copyrighted data, however, companies are fast approaching a wall in terms of new data available for AI training. This is not merely a licensing issue: even the available copyrighted data is becoming too scarce to feed the hunger of Large Language Models (LLMs). Rather, it is a wave that could crash across the entire industry, but one that is hitting Adobe earlier than most because of its reliance on licensed data.

This is particularly true given the importance of high-quality data for training algorithms. User-generated content such as social media posts or low-quality photos is easily sourced, but makes no meaningful contribution to an AI model’s output. Worse, low-quality data may actively harm an algorithm’s output, just as burning bad fuel can ruin an engine. Alarms are already ringing across the industry about the looming lack of high-quality data: a data event horizon that could force AI tech to stagnate.

A synthetic future

To fully harness the emerging power of AI and to ensure exponential learning growth continues, there’s only one real solution: synthetic data, computer-generated sets of information which are explicitly created to train algorithms. This solution is particularly appealing because it not only offers the scalability needed for AI models to continue their exponential growth, but also solves inherent copyright and privacy issues.

In some industries, synthetic data is already proving extremely effective. Companies developing self-driving car technologies, for instance, supplement real-world data with generated data. This approach allows them to simulate a vast range of scenarios, including rare occurrences and extensive variations of each specific situation.

Using AI to identify fraud in banking transactions has so far proved challenging, as fraudulent transactions typically represent less than one-hundredth of a percent of all dealings. But synthetic data sets containing thousands of generated edge cases give algorithms enough examples to recognise similar patterns. Further applications can be found in healthcare, where AI training has so far been difficult due to strong privacy restrictions protecting medical data.

Synthetic data naturally poses risks, including “inbreeding”, whereby algorithms might replicate each other’s errors, a problem already present in AI training via web scraping. As AI generates more online content, algorithms risk training on this AI-created content, often unbeknownst to developers. However, custom synthetic datasets allow developers to address errors and inconsistencies more effectively than gathered data does.

While the road ahead is still long and winding, synthetic data will without a doubt be a massive piece of the generative AI puzzle. From start-ups like Scale AI and Gretel.ai to established giants like OpenAI or Microsoft, the industry is catching on to this fact, and an arms race for synthetic data is already under way. With the end of natural data already in sight, it might well be the race that saves artificial intelligence.
