The Limitations of Big Data in Cancer Care

Jessica Miller

February 2017, Vol 7, No 2 - Practice Management

There is little chance of escaping the spell that big data has cast over the healthcare industry. “Anytime you listen to any conference, just about every panel, just about every speaker at some point brings up the idea of the use of data for value based care, for value contracts, for quality measures, for new indications for drugs,” said Brad Hirsch, MD, Senior Medical Director, Flatiron Health, New York, NY, at the Sixth Annual Conference of the Association for Value-Based Cancer Care.

Although few could argue against the potential that big data brings to the table, how to systematically gather, organize, and utilize this information remains a work in progress. Dr Hirsch believes that a step in the right direction is the increased number of oncology practices that have adopted electronic health records (EHRs), but the issue of how to manage these structured and unstructured data remains.

Challenges of Accessing Big Data

Structured data that exist in the clinic include items such as a patient’s date of birth, demographic information, or drugs that were administered orally or intravenously.

As a genitourinary oncologist, Dr Hirsch said that the majority of patients he sees have already visited a primary care physician, a urologist, and a radiation oncologist. “They’ve seen a lot of people over time, and all of that is separated out,” he said. With structured data, Dr Hirsch is able to query a patient’s record and understand the basics of their health history. However, most of what truly matters is contained in unstructured notes.

“Often, when a patient is seeing me, they’ve been diagnosed with prostate cancer for a decade. They’ve had a prostatectomy, they had a PSA [prostate-specific antigen test] long ago, a Gleason score long ago. Best case scenario, that [information] may just be in the text of my notes. Again, there’s no way to query that, but critical if you’re understanding prognostic implications, how people do with different drugs,” he said.

“The reality is that the majority of records that I get aren’t necessarily even in my notes. They’re scanned documents that have been faxed to me. I’ve written on them. They’ve been scanned into the chart. They’re truly impossible to get access to at that point without sophisticated techniques,” he added.
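The gap between a queryable structured field and the same fact buried in note text can be sketched in a few lines of Python. This is a minimal illustration only: the record layout, the note wording, and the regular expression are all hypothetical, real clinical notes vary far more than one pattern can cover, and scanned or faxed pages would first need text recognition before anything like this could run.

```python
import re
from typing import Optional

# Hypothetical pattern: matches phrasings such as "Gleason score 4+3" or "Gleason 3 + 4".
GLEASON_PATTERN = re.compile(
    r"gleason(?:\s+score)?\s*:?\s*(\d)\s*\+\s*(\d)", re.IGNORECASE
)

def extract_gleason(note: str) -> Optional[int]:
    """Return the Gleason sum found in free text, or None if no match."""
    match = GLEASON_PATTERN.search(note)
    if match is None:
        return None
    primary, secondary = int(match.group(1)), int(match.group(2))
    return primary + secondary

# Structured record (hypothetical fields): the score was never entered as data.
structured_record = {"date_of_birth": "1948-05-02", "gleason_score": None}

# Unstructured note (invented wording): the score exists only as prose.
note = "Pt s/p prostatectomy 2007. Path: Gleason score 4+3, margins negative."

from_structured = structured_record["gleason_score"]  # nothing to query
from_note = extract_gleason(note)                      # recovered from the text
```

A query against the structured field finds nothing, while the value is recoverable from the note only with extra processing.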

Information that is communicated in person or over the phone rather than written into the chart, such as news of a patient’s death, can lead to bigger problems. According to Dr Hirsch, this is common, and deaths are often not reported in charts. The Social Security Death Index (SSDI) used to be a way to counter this, but its data are also poor, he said.

“If you were to look at SSDI, you would argue that there are fewer deaths every year from cancer, which we know, unfortunately, is not true,” Dr Hirsch said. “Then, if you look at it on a state-by-state basis…you see there are a number of states that have basically stopped reporting deaths over the last 5 years, or significantly less because of regulatory changes,” he added.

Real-World Data

The question Dr Hirsch and his colleagues hope to tackle is how to use data sources even when the information is not necessarily complete or accurate. They created a platform that aggregates and transforms data on a large scale.

Flatiron Health obtains its data from several sources, including EHRs, an analytic platform, and relationships with academic centers. More than 1.3 million active patients are in its network, with good geographic distribution.

Traditional chart abstraction typically involves someone sitting down with a chart that is hundreds of pages long and trying to find the right information. To simplify that process, Flatiron Health gives its experienced oncology nurses or tumor registrars a very specific task (eg, find the disease stage) and tells them which document may contain the information and where in that document it may be located.

Incorporating the Patient’s Journey

For Flatiron Health, getting a true sense of a patient’s journey involves looking at structured data, adding in unstructured data (like staging and test results), and then looking at mortality data from different sources, such as obituary data and nursing home data. This more complete information offers a more accurate picture than the SSDI alone, which, in Dr Hirsch’s experience, often severely undercounts deaths.
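One way to picture combining mortality data from several sources is a simple priority rule, sketched below in Python. This is not a description of Flatiron Health’s actual method, which the talk did not detail. The source names, their ordering, and the patient records are hypothetical and chosen only for illustration.

```python
from datetime import date
from typing import Optional

# Hypothetical source names, listed from most to least trusted.
# The ordering is an assumption made for this sketch.
SOURCE_PRIORITY = ["ehr_structured", "obituary", "nursing_home", "ssdi"]

def composite_death_date(records: dict[str, Optional[date]]) -> Optional[date]:
    """Return the death date from the most trusted source that reports one."""
    for source in SOURCE_PRIORITY:
        death = records.get(source)
        if death is not None:
            return death
    return None  # no source reports a death, so the patient appears alive

# Invented example patients.
patients = {
    "A": {"ehr_structured": None, "obituary": date(2016, 3, 1), "ssdi": None},
    "B": {"ehr_structured": None, "obituary": None, "ssdi": None},
    "C": {"ehr_structured": date(2015, 11, 20), "ssdi": date(2015, 11, 22)},
}

composite = {pid: composite_death_date(rec) for pid, rec in patients.items()}
ssdi_only = {pid: rec.get("ssdi") for pid, rec in patients.items()}
```

In this toy example, patient A’s death is visible only through the obituary source, so an analysis relying on the SSDI alone would count that patient as alive.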

“It shows the limitations of what we think are gold standards today,” he said. “The implication of this is that when groups are using SSDI for mortality analyses, what we found is that with our lung cancer cohort, which is a fairly up-to-date cohort, using SSDI alone we would guess that the average patient with non–small-cell lung cancer in our cohort has 2 and a half years [to live]. Unfortunately, anybody who knows the literature knows that that’s widely inaccurate, that it’s closer to a year. Ours is 14 months, and we’re closer to what expectation is in the literature, but it shows the implications of using data that are incomplete, and we don’t know the limitations of.”
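The mechanism behind this kind of overestimate can be shown with a small synthetic example. The Python sketch below computes a Kaplan-Meier median survival twice: once with every death recorded, and once with some deaths missing, so that those patients look alive at the data cutoff. All numbers are invented for illustration and are not Flatiron Health’s data; the 36-month cutoff and the pattern of missed deaths are assumptions.

```python
from fractions import Fraction

def km_median(durations, events):
    """Kaplan-Meier median: first time the survival estimate is <= 1/2."""
    at_risk = len(durations)
    survival = Fraction(1)  # exact arithmetic avoids rounding at the 0.5 boundary
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if deaths:
            survival *= Fraction(at_risk - deaths, at_risk)
            if survival <= Fraction(1, 2):
                return t
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return None  # survival never dropped to 1/2 within follow-up

CUTOFF = 36  # months of follow-up (assumed)
true_months = [4, 6, 8, 10, 12, 14, 16, 20, 26, 34]  # synthetic survival times
# Assumed capture pattern: 4 of the 10 deaths never reach the data source.
death_recorded = [True, False, True, False, True, False, True, False, True, True]

# Complete capture: every death is an observed event.
complete = km_median(true_months, [True] * len(true_months))

# Incomplete capture: a missed death looks like a patient alive at the cutoff.
observed = [m if rec else CUTOFF for m, rec in zip(true_months, death_recorded)]
incomplete = km_median(observed, death_recorded)
```

With these synthetic inputs, the median is 12 months when all deaths are captured and 26 months when four are missed, which is the direction of bias Dr Hirsch described.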

Once structured, unstructured, and mortality data are combined, key points of interest are derived from them. Next, claims data are incorporated.

“In the end, it really becomes a very complex process to build this. Everybody is doing it a different way, and ours is just one example,” Dr Hirsch said. He emphasized that the use of real-world big data is still a moving target.

How can data be used? What is the optimal way to analyze data? How can you document things like end points? These are all questions Dr Hirsch and his colleagues are working to answer.

“How do you look at progression in real-world data? How do you look at response in real-world data to replicate clinical trials in a way that’s meaningful?” Dr Hirsch asked. “To do that, in a lot of ways when Flatiron was started, I think the belief was we’d be a data company. We’d make the data pretty, and hand it to everybody else. I think over time there’s the realization that there are huge limitations in the ability to do that.”