The arrival of ChatGPT in late 2022 showed a wide audience, for the first time, the practical applications of AI. AI can, and will, bring many benefits to the marketing industry, with new applications arriving daily. For the purposes of this perspective, we will look at how synthetic data can affect the information marketers need for both their tactical and strategic planning.

Definition of Synthetic Data

As with so much in this field, the speed at which the technology is moving means the industry hasn’t had time to settle on an agreed definition of synthetic data, or even a catchy name. For the purposes of this perspective, we will define synthetic data as data that has been created by AI from other, previously collected data that wasn’t created by AI.

Use Cases for Synthetic Data in Marketing Inputs

Synthetic data has the potential to provide marketers with the data inputs they need for their planning in a number of ways:

  • A view of how their own and competitors’ brands are perceived by the market
  • Potential appeal and purchase intent of new products or services
  • Creative execution evaluation
  • Message testing
  • Media planning
  • Plus many more use cases that will probably have arrived by the time you read this

Benefits of Synthetic Data

As things stand, the main benefits of synthetic data are that it has the potential to:

  • Be faster to generate
  • Provide data that is privacy-compliant, as it hasn’t come directly from people
  • Be cheaper
  • Provide views for market segments where it has historically been hard to collect data

All of these are very attractive benefits. Synthetic data has the potential to democratise marketing input information, making it more accessible to a wider audience, quicker.

Drawbacks of Synthetic Data

As with all AI and other early-stage technological innovations, there are drawbacks. Some of these may be mitigated by advances in the technology; others may be systemic to AI and therefore harder to overcome. These are the main drawbacks that people are currently seeing with synthetic data:

  • While there are some examples where it has been shown to be accurate, there are others where it hasn’t. When there’s a bit of hype, reporting tends to be biased towards the positive results
  • It’s still in the DIY stage, with few providers offering this as a service, so you’d need the skills internally to take advantage of it
  • It relies on good, bias-free training data, which is hard to come by
  • The data and models are only as good as the prompts used to create them, so make sure the prompts are as unbiased as possible
  • It works well when there are easily explained patterns in the data, less so where the world is more complex

These downsides are not insignificant, so it pays to go into using synthetic data aware of them and prepared to work with them.

At this stage, there isn’t any suggestion that synthetic data is better than traditionally collected data, so the advantages are as much about speed of gathering as anything. The planning and implementation that follow from the information still need to happen afterwards.

A Way of Approaching the Use of Synthetic Data

As we are at such an early stage of this journey, it would be foolhardy to suggest that there’s an ideal approach to using synthetic data. However, there are some guidelines that can be applied:

> If you have the tech skills in-house, great: get them building, and set them the task of trying to replicate some existing data
  • Keep working until you have a good match, then use the model to help close any information gaps you have. Just keep your eyes open to any biases or drawbacks

> If you don’t have the tech skills in-house, go out and try, on a small scale, some of the many small providers operating at this early stage
  • Keep these engagements small so that you’re not committing too much to the relationship in case the results don’t deliver. Again, keep your eyes open to any biases or drawbacks

> If you don’t have the skills or the budget to experiment, it’s okay to hold tight and see how the space evolves
  • As mentioned, if you already have good data, keep using it. Synthetic data won’t, at the moment, give you better data, so there isn’t a competitive advantage in that regard. Also, while the data can be accessed faster, you still need to plan and execute with it, so those using synthetic data gain only a minimal advantage
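
For teams following the first guideline, one possible way to judge whether synthetic data is a "good match" for existing data is to compare how answers to the same question are distributed in each. The sketch below is only an illustration under stated assumptions: the responses are made up, the function names are hypothetical, and both the choice of measure (total variation distance) and the threshold are placeholders rather than an industry standard.

```python
from collections import Counter


def answer_shares(responses):
    """Return each answer's share of all responses (shares sum to 1)."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}


def total_variation_distance(real, synthetic):
    """Half the sum of absolute share differences: 0 = identical, 1 = no overlap."""
    real_shares = answer_shares(real)
    synthetic_shares = answer_shares(synthetic)
    answers = set(real_shares) | set(synthetic_shares)
    return 0.5 * sum(
        abs(real_shares.get(a, 0.0) - synthetic_shares.get(a, 0.0))
        for a in answers
    )


# Made-up illustrative responses to a single purchase-intent question.
real_survey = ["likely"] * 50 + ["unsure"] * 30 + ["unlikely"] * 20
synthetic_survey = ["likely"] * 60 + ["unsure"] * 25 + ["unlikely"] * 15

# Hypothetical tolerance; choose one that suits the decision being made.
MATCH_THRESHOLD = 0.05

distance = total_variation_distance(real_survey, synthetic_survey)
good_match = distance <= MATCH_THRESHOLD
print(f"distance={distance:.2f}, good match: {good_match}")
```

With these made-up numbers the distance is 0.10, above the placeholder threshold, so the model would need more work before being used to fill gaps. A check like this only covers one question at a time and says nothing about bias in the original data, so it complements rather than replaces keeping your eyes open to the drawbacks above.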

Source: Andrew Gale, Head of Quantitative Practice, TRA