IBM creates system to speed up outbreak investigations

A system that combines retail data with public health case reports could speed up outbreak investigations and provide better information on suspected products, according to IBM.

The method uses novel algorithms, visualization, and statistical techniques to help food retailers, distributors and public health officials predict the most likely contaminated food sources and accelerate investigation of foodborne disease outbreaks.

Scientists used a likelihood-based approach that automatically identifies, contextualizes and displays data from multiple sources, which could cut the time needed to identify the most likely contaminated sources by days or weeks.

Retail sales data and case reports

It exploits product sales data and the geographic distribution of foodborne illness case reports to calculate the probability that each food is responsible for a current outbreak.
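To make that calculation concrete, here is one plausible formulation. It is a sketch, not the paper's exact model. It assumes that a case is reported in region r with probability equal to the share of product p's sales made in that region, that cases are independent, and that every product is equally likely to be the source before any cases arrive. The symbols s_{p,r} (sales of product p in region r), r_i (region of case i) and L(p) are introduced here for illustration.

```latex
P(r \mid p) = \frac{s_{p,r}}{\sum_{r'} s_{p,r'}}, \qquad
L(p) = \prod_{i=1}^{n} P(r_i \mid p), \qquad
P(p \mid r_1, \dots, r_n) = \frac{L(p)}{\sum_{q} L(q)}
```

For example, a product with 80% of its sales in one region gains a factor of 0.8 for every case reported there, while a product with only 5% of its sales there gains 0.05, so even a handful of cases can separate the two.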

Based on these findings, retail data could be used to speed up and target public health investigations of foodborne disease outbreaks, reducing the number of people who fall ill or die and cutting economic losses to industry.

“Predictive analytics based on location, content, and context are driving our ability to quickly discover hidden patterns and relationships from diverse public health and retail data,” said James Kaufman, manager of public health research for IBM Research.

“We are working with our public health clients and with retailers in the US to scale this research prototype and begin focusing on the 1.7B supermarket items sold each week in the United States.”

Response to foodborne disease outbreaks is complicated by the globalization of food supply chains, but rapid identification of contaminated products is essential to limit the damage caused by foodborne disease.

Likelihood-based approach

The analysis shows how, when information on food distribution channels is available, likelihood-based methods can quickly identify products likely to be causing an outbreak using the geographic locations of even relatively few cases.
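A minimal Python sketch of such a ranking follows. It assumes that cases land in regions in proportion to a product's sales, that cases are independent, and that all products are equally likely a priori. The product names, regions and sales figures are made up for illustration and are not data from the study. The function names are likewise hypothetical.

```python
import math

# Hypothetical units sold per product and region (illustrative numbers only).
SALES = {
    "lettuce": {"north": 900, "south": 50, "east": 50},
    "sprouts": {"north": 100, "south": 800, "east": 100},
    "cheese": {"north": 300, "south": 300, "east": 400},
}


def region_shares(sales_by_region):
    """P(region | product): the product's share of its total sales in each region."""
    total = sum(sales_by_region.values())
    return {region: units / total for region, units in sales_by_region.items()}


def rank_products(sales, case_regions):
    """Posterior probability that each product is the source, assuming a uniform prior."""
    log_lik = {}
    for product, by_region in sales.items():
        shares = region_shares(by_region)
        if any(shares.get(r, 0.0) == 0.0 for r in case_regions):
            # A case in a region where the product was never sold rules it out.
            log_lik[product] = float("-inf")
        else:
            log_lik[product] = sum(math.log(shares[r]) for r in case_regions)
    best = max(log_lik.values())
    if best == float("-inf"):
        raise ValueError("no product was sold in every region with a case")
    # Subtract the best log-likelihood before exponentiating to avoid underflow.
    weights = {p: math.exp(ll - best) for p, ll in log_lik.items()}
    total = sum(weights.values())
    return sorted(((p, w / total) for p, w in weights.items()),
                  key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    cases = ["south", "south", "east", "south"]
    for product, prob in rank_products(SALES, cases):
        print(f"{product:8s} {prob:.4f}")
```

With these toy numbers, four cases (three in the south and one in the east) put the south-heavy product on top with a posterior of roughly 0.83. Working in log space keeps the calculation stable when many cases multiply many small probabilities.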

It integrates pre-computed retail data with geocoded public health data, allowing investigators to see the distribution of suspect foods and, by selecting an area of the map, to view public health case reports and lab reports from clinical encounters.

The algorithm learns from every new report, recalculating the probability that each candidate food is causing the illness.
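The "learn from every new report" step can be read as a sequential Bayesian update. The current probabilities are multiplied by the likelihood of each new case and then renormalised. The sketch below assumes a toy model with hypothetical per-region sales shares and independent cases. `update` and the share table are illustrative names and numbers, not the system's code.

```python
# Hypothetical P(region | product): each product's share of its sales by region.
SHARES = {
    "lettuce": {"north": 0.90, "south": 0.05, "east": 0.05},
    "sprouts": {"north": 0.10, "south": 0.80, "east": 0.10},
    "cheese": {"north": 0.30, "south": 0.30, "east": 0.40},
}


def update(belief, shares, case_region):
    """Fold one new case report into the current probabilities (Bayes' rule)."""
    unnormalised = {p: belief[p] * shares[p].get(case_region, 0.0) for p in belief}
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError(f"no candidate product was sold in {case_region!r}")
    return {p: w / total for p, w in unnormalised.items()}


belief = {p: 1.0 / len(SHARES) for p in SHARES}  # uniform prior before any cases
for region in ["south", "south", "east", "south"]:
    belief = update(belief, SHARES, region)
    leader = max(belief, key=belief.get)
    print(f"case in {region:5s} -> leading suspect {leader} ({belief[leader]:.3f})")
```

The update only multiplies and renormalises, so processing cases one at a time gives the same final probabilities as scoring them all at once. This is what lets the ranking be refreshed cheaply whenever a report arrives.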

The research was published with collaborators from Johns Hopkins University, Purdue University and the German Federal Institute for Risk Assessment (BfR), using retail sales data from SymphonyIRI Group.

Working with BfR, IBM scientists simulated 60,000 outbreaks of foodborne disease across 600 products using real-world food sales data from Germany to test the performance of the system. 
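An evaluation of this kind can be sketched in three steps. Choose a "guilty" product, draw case locations at random in proportion to its sales, and check whether the method ranks it first. The toy version below uses three made-up products and a few hundred simulated outbreaks instead of the study's 600 products and German sales data. All names and numbers, including the `simulate` helper, are hypothetical.

```python
import math
import random

# Hypothetical P(region | product) tables (illustrative, not the study's data).
SHARES = {
    "lettuce": {"north": 0.90, "south": 0.05, "east": 0.05},
    "sprouts": {"north": 0.10, "south": 0.80, "east": 0.10},
    "cheese": {"north": 0.30, "south": 0.30, "east": 0.40},
}


def log_likelihood(shares, cases):
    """Sum of log P(region | product) over the cases; -inf if any case is impossible."""
    total = 0.0
    for region in cases:
        p = shares.get(region, 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


def top_suspect(all_shares, cases):
    """Product with the highest likelihood (= highest posterior under a uniform prior)."""
    return max(all_shares, key=lambda prod: log_likelihood(all_shares[prod], cases))


def simulate(all_shares, outbreaks_per_product, cases_per_outbreak, seed=0):
    """Fraction of simulated outbreaks in which the guilty product is ranked first."""
    rng = random.Random(seed)
    hits = trials = 0
    for guilty, shares in all_shares.items():
        regions = list(shares)
        weights = [shares[r] for r in regions]
        for _ in range(outbreaks_per_product):
            # Case locations follow the guilty product's sales distribution.
            cases = rng.choices(regions, weights=weights, k=cases_per_outbreak)
            hits += top_suspect(all_shares, cases) == guilty
            trials += 1
    return hits / trials


if __name__ == "__main__":
    for n_cases in (1, 3, 10):
        print(n_cases, "cases:", simulate(SHARES, 200, n_cases))
```

Under these toy numbers, accuracy should typically rise as outbreaks include more cases. This echoes the point that a relatively small number of geolocated cases can single out a product when its sales are unevenly distributed.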

In a previous study, the researchers proposed a likelihood-based method that could be an early response system to help determine the product most likely to be associated with a foodborne disease outbreak.

They tested the likelihood-based method using raw food sales data, modelling food consumption as taking place in the region where each product was sold.

Using a real-world food sales data set and artificially generated outbreak scenarios, they showed that the method performs very well for contamination scenarios originating from a single “guilty” food product.

In future work, they plan to test the assumption that food is consumed in the region where it was sold by applying Huff's “gravity model” of retail shopping to smooth the sales distribution over other regions, which will allow analysis of the method's sensitivity to spatial noise in the case reports.
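A common form of Huff's model gives the probability that a shopper living in region i buys in region j. It is attractiveness_j^α / distance_ij^β, normalised over all destinations. One way to use it for smoothing, assumed here rather than taken from the paper, is to reassign the sales recorded in each region to the regions where the buyers live. Each home region receives a share proportional to its population times that probability. The region coordinates, populations and attractiveness scores are all hypothetical, as are the exponents α = 1 and β = 2 and the 1 km distance floor.

```python
import math

# Hypothetical regions: centroid (km), population, and retail attractiveness score.
REGIONS = {
    "north": {"xy": (0.0, 10.0), "population": 50_000, "attractiveness": 3.0},
    "south": {"xy": (0.0, 0.0), "population": 80_000, "attractiveness": 5.0},
    "east": {"xy": (8.0, 5.0), "population": 30_000, "attractiveness": 1.0},
}


def distance(a, b, min_km=1.0):
    """Euclidean distance between centroids, floored so a region's self-distance is not 0."""
    return max(math.dist(a, b), min_km)


def huff_probabilities(regions, alpha=1.0, beta=2.0):
    """P[i][j]: probability that a shopper living in region i shops in region j."""
    probs = {}
    for i, ri in regions.items():
        utility = {
            j: rj["attractiveness"] ** alpha / distance(ri["xy"], rj["xy"]) ** beta
            for j, rj in regions.items()
        }
        total = sum(utility.values())
        probs[i] = {j: u / total for j, u in utility.items()}
    return probs


def smooth_sales(sales_by_region, regions, alpha=1.0, beta=2.0):
    """Reassign sales recorded in shopping region j to home regions i,
    in proportion to population_i * P[i][j] (an assumption of this sketch)."""
    p = huff_probabilities(regions, alpha, beta)
    consumed = {i: 0.0 for i in regions}
    for j, units in sales_by_region.items():
        flows = {i: regions[i]["population"] * p[i][j] for i in regions}
        total = sum(flows.values())
        for i, flow in flows.items():
            consumed[i] += units * flow / total
    return consumed


if __name__ == "__main__":
    print(smooth_sales({"north": 0.0, "south": 1000.0, "east": 0.0}, REGIONS))
```

Smoothing in this way spreads some sales into neighbouring regions. A case reported next to a product's main sales region, rather than inside it, then no longer receives a near-zero likelihood. Varying β is one way to probe how sensitive the ranking is to such spatial noise.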

Source: PLOS Computational Biology

Online ahead of print, DOI: 10.1371/journal.pcbi.1003692

“A Likelihood-Based Approach to Identifying Contaminated Food Products Using Sales Data: Performance and Challenges”

Authors: James Kaufman, Justin Lessler, April Harry, Stefan Edlund, Kun Hu, Judith Douglas, Christian Thoens, Bernd Appel, Annemarie Käsbohrer, Matthias Filter