Data analysis: Lifestyle plays huge role in health, healthcare's value less obvious

I created a dataset with many health-related variables with one observation for each state (plus Washington, D.C.).

I then ran a principal component analysis to generate the following plot of the first two principal components:

Principal component analysis plot of lifestyle and healthcare dimensions
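
For readers who want to try this kind of analysis, here is a minimal sketch in Python with NumPy. It is not the code behind the plot above. The standardization step, the function name `pca`, and the synthetic stand-in data are all assumptions made for illustration; the real dataset has many more variables.

```python
import numpy as np

def pca(data, n_components=2):
    """Standardize each column, then extract principal components via SVD.

    Standardizing is an assumption here: it is the usual choice when the
    variables are in different units (rates, percentages, counts).
    """
    X = np.asarray(data, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # one row per state
    loadings = Vt[:n_components].T                   # one row per variable
    explained = s**2 / np.sum(s**2)                  # share of total variance
    return scores, loadings, explained[:n_components]

# Synthetic stand-in for the state-level data: 51 rows (50 states plus D.C.)
# driven by two hidden factors. The column meanings are hypothetical.
rng = np.random.default_rng(0)
n_states = 51
lifestyle = rng.normal(size=n_states)
care = rng.normal(size=n_states)

def noise():
    return rng.normal(scale=0.5, size=n_states)

data = np.column_stack([
    lifestyle + noise(),   # e.g. exercise rate
    -lifestyle + noise(),  # e.g. smoking rate
    -lifestyle + noise(),  # e.g. obesity rate
    care + noise(),        # e.g. screening rate
    care + noise(),        # e.g. infant vaccination rate
])

scores, loadings, explained = pca(data)
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives a plot of this type, and the rows of `loadings` show which variables push each component up or down.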

The 1st principal component, which explains 31.6% of all the variation in the dataset, is easily interpreted as HEALTHY LIFESTYLE CHOICES AND BETTER HEALTH. High scores on the 1st p.c. are strongly associated with exercise, high school graduation, flu vaccinations, and healthy-birth-weight babies. High scores on the 1st p.c. are strongly negatively associated with smoking, teen pregnancies, obesity, diabetes, and cancer deaths (colon, prostate, lung, breast, and overall). States scoring high on HEALTHY LIFESTYLE CHOICES AND BETTER HEALTH include Utah, Vermont, and Minnesota. States scoring low include Washington, D.C., Mississippi, Louisiana, Alabama, and Kentucky.

The 2nd principal component, which explains 20.1% of all the variation in the dataset (so the first two p.c.s together explain 51.8%), is easily interpreted as QUALITY HEALTHCARE. (Confusingly, I’m interpreting negative scores on this p.c. as QUALITY HEALTHCARE. Positive scores represent POOR HEALTHCARE.) The QUALITY HEALTHCARE p.c. is strongly associated with vaccination of babies, use of medical screening tests (cholesterol levels, colonoscopies, pap smears, fecal occult blood tests, mammograms), prophylactic administration of antibiotics before and after surgery, and administration of flu vaccines to at-risk groups. The QUALITY HEALTHCARE p.c. is strongly negatively associated with suicide. This initially struck me as odd, but it is plausible, since quality psychiatric treatment can prevent many suicides. States scoring high on QUALITY HEALTHCARE include Rhode Island, Massachusetts, and Delaware. States scoring low include Nevada, New Mexico, and Wyoming.
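
The sign confusion mentioned above can be avoided, because the sign of a principal component is arbitrary: negating a component's scores and its loadings together describes exactly the same data. Here is a minimal sketch in Python with NumPy, where the helper name `flip_component` and the toy numbers are hypothetical and not taken from the analysis:

```python
import numpy as np

def flip_component(scores, loadings, k):
    """Return copies of scores and loadings with component k negated."""
    scores = scores.copy()
    loadings = loadings.copy()
    scores[:, k] *= -1
    loadings[:, k] *= -1
    return scores, loadings

# Toy example: 3 observations, 2 variables, 2 components.
scores = np.array([[1.0, -2.0],
                   [-0.5, 0.5],
                   [-0.5, 1.5]])
loadings = np.array([[0.6, -0.8],
                     [0.8, 0.6]])

flipped_scores, flipped_loadings = flip_component(scores, loadings, k=1)

# The reconstructed data are identical before and after the flip.
same = np.allclose(scores @ loadings.T, flipped_scores @ flipped_loadings.T)
```

After a flip like this on the 2nd p.c., positive scores would mean QUALITY HEALTHCARE and the plot's vertical axis would simply be mirrored.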

What jumps out at me is that the 1st principal component combines healthy lifestyles with better health and lower death rates. The 2nd principal component represents quality healthcare but isn’t linked to better health outcomes. It’s not obvious from this analysis that QUALITY HEALTHCARE is actually improving health or lowering death rates. Perhaps I’ve not included variables that would pick up this positive health impact. But QUALITY HEALTHCARE seems, if anything, to be associated with higher cancer death rates and lower-birth-weight babies. Though this association is almost certainly not causal, neither is it evidence of a strong relationship between QUALITY HEALTHCARE and better health outcomes.

The 3rd principal component (soaking up another 14.7% of the variance in the dataset) seems to be identifying states with more smokers and obese people who are dying of cancer at higher rates. These 3 principal components together explain 66.5% of total variation in the dataset.
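
As a quick check on the cumulative figures, the snippet below adds up the rounded percentages quoted in this post. The rounded values sum to 51.7 and 66.4, each 0.1 below the totals quoted in the text (51.8% and 66.5%). That gap is what you would expect if the quoted totals were computed from unrounded values, though that explanation is an assumption.

```python
from itertools import accumulate

# Rounded explained-variance percentages quoted for the first three p.c.s.
explained = [31.6, 20.1, 14.7]

# Running totals, rounded to one decimal place like the figures in the post.
cumulative = [round(total, 1) for total in accumulate(explained)]
print(cumulative)  # [31.6, 51.7, 66.4]
```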

My takeaway from this analysis: Your health is largely a function of healthy lifestyle choices. If you smoke, eat fast food, don’t exercise, weigh too much, etc., don’t expect hospitals to keep you alive. Great medical treatment may add a few years of fragile life. But healthy living can help you avoid cancer, heart disease, and diabetes, potentially giving you a happy, healthy, active, decades-long old age.

Posted by James on Tuesday, June 14, 2011