The Importance of Data in Healthcare

Scene…
A patient walks into their doctor’s surgery for their regularly scheduled check-up. The reception staff greet the patient by name and direct them to a pleasant waiting room. Two minutes later, the doctor calls the patient into the office.
After greeting the patient and some perfunctory small talk, they get down to business. “OK, I’ve been looking through your data and mostly everything looks good. You seem to be getting sufficient exercise and sleep and your nutrition seems to be good. There is just one thing that concerns me very slightly. There have been some new reports in the literature that the proton pump inhibitor you’ve been taking may have some long-term effects. I’ve also noted that you have a particular genetic variant that means you may be more susceptible to these effects, so let’s look at the options.”
The doctor and patient then look at the data together and reach a joint decision, and the patient leaves with a new prescription and some good dietary advice.
Does this seem similar to your last doctor’s appointment?
No. Of course it doesn’t. In reality, your doctor has very little hard data about you beyond what you tell them. You probably don’t go to the doctor unless something is actually wrong, and even then they have 10 minutes, at most, allocated to you. You’ve probably had to wait at least 30 minutes past your scheduled appointment, and the doctor, who is likely working 60+ hours per week, is constantly aware of the queue of frustrated patients waiting outside his or her door.
The doctor will do their best, but they don’t know enough about you. They have no idea what your diet and nutrition are like. They have no idea what your genome could be telling them about your health risks. They can’t possibly keep up with all the scientific literature on every available drug, including the newest findings on potential issues and contraindications.
None of this is a criticism of your doctor. Or my doctor. They are doing the best they can with limited time, limited information and not enough hours in the day to do what they need to do. The issue is data: too much of it on the one hand and not enough on the other. Too much in the world in general, and not enough known about you personally.
You may have had your genome analysed by 23andMe (or a similar service), and you may track your nutrition, exercise, heart rate and so on with your phone, but your doctor probably doesn’t have access to any of this. In fairness, even if they did have access, they most likely wouldn’t know what to do with it. It’s a lot of information to collate and understand, and you have a 10-minute appointment.
Of course, even if they did have time to collate all this data about you, what would it really mean? Suppose they found out that I eat a relatively high-fat, low-carb diet, that I have genetic polymorphisms that make me a slow caffeine metaboliser even though I drink huge amounts of coffee, and that I have a genetic predisposition to age-related macular degeneration with an odds ratio of 1.2. What does any of that mean for me? Well, probably very little in the absence of population data to compare it to. But if we collated all this data from everyone, we might start to learn a huge amount. We might learn what triggers a genetic predisposition to turn into an actual case of the condition. Analysing healthcare data at the population level has the potential to transform how healthcare is delivered. It has the potential to change how we diagnose and treat disease, sure, but more fundamentally it has the potential to prevent disease through a deeper understanding of cause and effect.
You know that smoking causes lung cancer, right? How do we know this? Well, this information came out in the 1950s, when Drs Hammond and Horn carried out a prospective study and analysed the data. We recently did some work on population health in the United States using publicly available data. One of the things we looked at was the correlation between smoking rates and lung cancer incidence across states. The correlation is so strong that we don’t even need to break it down to the individual level: we can see the number of people with lung cancer increase as the percentage of the population who smoke increases. Of course, this is not new information; as I said, it was discovered in the 1950s. But the key point is that what was once a ground-breaking discovery is now something we can see relatively easily in large-scale, publicly available data.
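At its simplest, a state-level comparison like this comes down to a single number: the Pearson correlation coefficient between two columns, one value per state. The sketch below is illustrative only and assumes one smoking-prevalence figure and one lung cancer incidence figure per state. The function name `pearson_r` is hypothetical, and the five data points are invented for the example; they are not real state figures and not the data from the analysis described above.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and the two sum-of-squares terms
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(ss_x * ss_y)


# HYPOTHETICAL per-state figures, invented purely for illustration (NOT real data):
# adult smoking prevalence (%) and lung cancer incidence (cases per 100,000 people).
smoking_pct = [12.0, 15.5, 18.0, 21.0, 24.5]
lung_cancer_rate = [45.0, 55.0, 62.0, 70.0, 82.0]

r = pearson_r(smoking_pct, lung_cancer_rate)
print(f"Pearson r = {r:.3f}")
```

A value of r close to +1 means states with higher smoking rates tend to have proportionally higher lung cancer incidence. The coefficient measures linear association between state-level aggregates only. On Python 3.10 or later, `statistics.correlation(x, y)` from the standard library computes the same Pearson coefficient.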