Data analytics: Why edge analytics and synthetic data are trending

Data analytics is no longer a nice-to-have; it is a vital business function, as critical to an organization's success as IT or HR. Management now expects data-based decision-making as standard, and to make the most of their data, data teams need to be aware of the trends in data analytics, two of which are edge analytics and synthetic data.

Edge analytics occurs close to where data is collected and where digital content and applications are consumed. It enables real-time decision-making based on data from internet-connected sensors on factory floors, across transport networks, in retail outlets, and at remote locations. Although sending and storing colossal volumes of data in the cloud keeps getting cheaper, data generation is outpacing network capacity, making an all-to-the-cloud approach increasingly unsustainable. Users are turning to edge analytics to reduce latency and to ensure all data can be harnessed and acted upon in real time.

Such real-time analytics is critical for failure detection systems in chemical plants, power plants, factories, and autonomous vehicles (AVs). Relevant data is later transmitted from the edge to the cloud so businesses can see the big picture by aggregating data from thousands of devices. Edge analytics can also aid data governance by keeping sensitive data within the jurisdiction where it was collected.
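The pattern described above can be sketched in a few lines. This is a hypothetical illustration (the threshold, sensor values, and function names are invented for the example): an edge node keeps raw readings local, raises alerts immediately, and forwards only a compact summary to the cloud.

```python
from statistics import mean

# Assumed threshold for an illustrative temperature sensor (degrees Celsius).
ANOMALY_THRESHOLD = 90.0

def process_at_edge(readings):
    """Return (alerts, summary): local real-time alerts plus a summary for the cloud."""
    # Anomalies are acted on at the edge, with no round-trip to the cloud.
    alerts = [r for r in readings if r > ANOMALY_THRESHOLD]
    # Only this small summary dict is transmitted upstream, not the raw stream.
    summary = {
        "count": len(readings),
        "mean": round(mean(readings), 2),
        "max": max(readings),
        "anomalies": len(alerts),
    }
    return alerts, summary

readings = [71.2, 69.8, 94.5, 70.1, 72.3]
alerts, summary = process_at_edge(readings)
```

The design choice is the one the article describes: latency-sensitive decisions happen where the data is generated, while the cloud receives aggregates from thousands of such nodes to build the big picture.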

Leading cloud service providers all offer edge analytics to some extent, with examples including AWS's Snow family of devices, Google's Edge TPU and Cloud IoT Core solutions, and Microsoft's Azure Stack Edge. Start-ups to watch include New Relic, Datadog, and Edge Delta. Splunk also plays in this arena.

In recent years, the volume of data available to organizations of all sizes has exploded, and this trend shows no sign of changing soon. However, collecting, cleaning, and labeling high-quality, domain-specific data from the real world can be complicated, expensive, time-consuming, and susceptible to bias. In May 2022, gaming software company Unity blamed 'bad data' for a $110 million impact on its ads business. This might not have happened had the company used synthetic data: data and annotations generated by algorithms and computer simulations that can be used as an alternative or supplement to real-world collected data. Synthetic data is particularly useful for challenging edge cases where sufficient real training data may not exist (for example, when a giraffe crosses a road rather than a pedestrian), and for maintaining data privacy. Where privacy is the aim, creators of synthetic data must balance fidelity to the original data against privacy and usefulness.
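A minimal sketch of that fidelity-versus-privacy idea, using only invented example values: fit simple per-column statistics on a "real" column, then sample new records from that fitted distribution, so the synthetic column resembles the original statistically without copying any real row. A production generator would also model correlations between columns; this deliberately keeps it to one column.

```python
import random
from statistics import mean, stdev

# Illustrative "real" column (hypothetical values, not a real dataset).
real_ages = [34, 45, 29, 52, 41, 38, 47, 33]

# Fit simple summary statistics on the real data.
mu, sigma = mean(real_ages), stdev(real_ages)

# Sample synthetic records from the fitted distribution.
# No synthetic value points back to any individual real record.
random.seed(0)  # reproducible for the example
synthetic_ages = [round(random.gauss(mu, sigma)) for _ in range(8)]
```

Fidelity here means the synthetic column's statistics track the real column's; privacy comes from the fact that records are drawn from a model rather than disclosed from the source data.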

Start-ups in synthetic data have exploded and cover almost every application: behavioral data (Snowplow); avatars for modeling human behavior (DataGen, Synthesis AI); AVs and off-road applications (Applied Intuition, Cognata, Parallel Domain, Waabi); healthcare-related data (Syntegra); tabular data, relational databases, and time series data (Datacebo); tabular data for banking, insurance, and telecom applications (Mostly.ai); computer vision applications (Rendered.ai, Bifrost); synthetic text data for enhancing data privacy (Tonic.ai); and various applications including genomics (Gretel.ai).

In the coming years, GlobalData predicts that synthetic audio data startups will be formed. Synthetic data will make quality training data for AI models more accessible and affordable, undercutting the unparalleled strength of proprietary data. This could undermine the tech giants whose strength lies in their swathes of data—unless, of course, they partner with or acquire such startups, as Meta (then Facebook) did with AI.Reverie in October 2021.
