IBM shows use of Big Data and analytics in outbreak investigation
Researchers showed that with as few as 10 medical-examination reports of foodborne illness they can narrow down the investigation to 12 suspected products in a few hours.
The goal is to give investigators a short list of suspect products for laboratory testing in a timely fashion, using a likelihood-based method (LM) that combines spatio-temporal retail scanner data with case reports.
Accelerate outbreak investigation
Grocery stores and supermarkets already collect retail scanner data with spatial information. Combining this data with confirmed, geo-coded cases reported by the public health agency makes it possible to quickly identify “suspect” products that should be tested in the laboratory and investigated further.
They created a data-analytics methodology to review spatio-temporal data, including geographic location and possible time of consumption, for hundreds of grocery product categories.
IBM and Mars consortium
IBM jointly set up the Consortium for Sequencing the Food Supply Chain with Mars last year.
The firms, along with Bio-Rad, which joined this year, are investigating the genetic fingerprints of bacteria, fungi and viruses to understand how they grow in different environments and raw materials.
The consortium combines genomics with informatics to observe microbial communities in food and detect hazards in the supply chain to lower the risk of contamination.
The team analyzed each product for its shelf life, geographic location of consumption and likelihood of harboring a particular pathogen, then mapped that information to the known locations of illness outbreaks.
The system then ranked all grocery products by likelihood of contamination in a list from which public health officials could test the top 12 suspected foods and alert the public accordingly.
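The ranking step can be illustrated with a minimal sketch. It is not the researchers' implementation: it assumes a simple multinomial model in which each case is attributed to a location in proportion to where a product is sold, and the names `log_likelihood`, `rank_products`, `sales` and `cases`, as well as the toy figures, are hypothetical.

```python
import math

def log_likelihood(shares, cases, eps=1e-9):
    """Log-likelihood (up to a constant) that the observed cases follow
    a product's sales distribution across locations.
    eps is an assumed smoothing term that avoids log(0)."""
    return sum(n * math.log(shares.get(loc, 0.0) + eps)
               for loc, n in cases.items())

def rank_products(sales, cases, top_k=12):
    """Return up to top_k product names, most likely source first."""
    scores = {}
    for product, units in sales.items():
        total = sum(units.values())
        if total <= 0:
            continue  # product has no recorded sales, skip it
        # Convert unit sales into the share sold in each location.
        shares = {loc: u / total for loc, u in units.items()}
        scores[product] = log_likelihood(shares, cases)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical scanner data: units sold per product per location.
sales = {
    "sausage": {"A": 90, "B": 10},
    "milk": {"A": 50, "B": 50},
    "bread": {"A": 10, "B": 90},
}
# Hypothetical geo-coded case counts per location.
cases = {"A": 8, "B": 2}

ranking = rank_products(sales, cases)
print(ranking)  # ['sausage', 'milk', 'bread']
```

In this toy example most cases fall in location A, so the product sold mainly in A ranks first. A real analysis would also need to account for shelf life and time of consumption, which this sketch leaves out.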
Researchers said the approach is not intended to replace proven tools of outbreak investigation.
“We believe the availability of electronic retail scanner data with new analytics and metagenomic laboratory techniques can hugely accelerate outbreak investigation and aid public health agency in that mission at the county, state, and national levels.
“The potential benefits of this opportunity include reducing the costs of foodborne illness and of economic losses from outbreaks and recalls.”
They said there are situations in which the prediction method is expected to fail.
“In the case where the guilty product and at least one other food have matching distribution proportion in every location, thus implying that the products are perfectly correlated, the value of the likelihood for those products will be the same.
“We hypothesized that if a contaminated food is sold in few locations, our LM will be more successful in identifying this food product.”
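The failure case the researchers describe can be reproduced with a small sketch, again assuming a simple multinomial likelihood rather than the study's actual model. The helper names and the sales figures are hypothetical. Two products with different volumes but the same proportion of sales in every location receive the same likelihood, so the method cannot tell them apart.

```python
import math

def to_shares(units):
    """Convert unit sales per location into proportions."""
    total = sum(units.values())
    return {loc: u / total for loc, u in units.items()}

def log_likelihood(shares, cases):
    """Multinomial log-likelihood of the cases, up to a constant."""
    return sum(n * math.log(shares[loc]) for loc, n in cases.items())

# Hypothetical geo-coded case counts.
cases = {"north": 6, "south": 4}

# The guilty product and a lookalike sell in a 60/40 split,
# although the lookalike sells ten times fewer units.
guilty = to_shares({"north": 300, "south": 200})
lookalike = to_shares({"north": 30, "south": 20})
# A third product with a different geographic footprint.
other = to_shares({"north": 20, "south": 80})

ll_guilty = log_likelihood(guilty, cases)
ll_lookalike = log_likelihood(lookalike, cases)
ll_other = log_likelihood(other, cases)

print(math.isclose(ll_guilty, ll_lookalike))  # True: the two are tied
print(ll_guilty > ll_other)                   # True: the third product ranks lower
```

Under this assumption, only differences in where products are sold separate the candidates, which is consistent with the hypothesis that foods sold in few locations are easier to identify.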
Method applied in Germany and Norway
Evaluated using real-world food distribution data from Germany, the LM achieved product identification rates of 80% or higher with as few as 10-20 case reports, before patient and family interview data were available.
With as few as 10 laboratory-confirmed case reports, the investigation can also be narrowed to 12 suspect products; for 80% of the products studied, this subset included the contaminated product 90% of the time.
A traditional investigation can take weeks to months, and its duration can significantly influence the economic and health impact of a disease outbreak. The typical process uses interviews and questionnaires to trace the contamination source.
Kun Hu, public health research scientist at IBM Research – Almaden, said when there's an outbreak, the biggest challenge for public health officials is the speed at which they can identify the contaminated food source and alert the public.
“While traditional methods like interviews and surveys are still necessary, analyzing big data from retail grocery scanners can significantly narrow down the list of contaminants in hours for further lab testing.
“Our study shows that Big Data and analytics can profoundly reduce investigation time and human error and have a huge impact on public health.”
The method was also applied to an E. coli O103 illness outbreak from 2006 in Norway. With 17 confirmed cases, public health officials were able to analyze grocery-scanner data related to more than 2,600 possible products and create a shortlist of 10 possible contaminants.
Further lab analysis pinpointed the source of contamination down to the batch and lot numbers of a sausage product.