The State of Data

Recently we decided to augment the data that we have on skilled nursing homes by bringing in data provided by the individual states. This project began as an attempt to better understand the data that is published by the Centers for Medicare & Medicaid Services (“CMS”), and nearly all the data on the site is from this source. (We are still working on incorporating data that CMS provides.)

We have been looking at state-level datasets for a few years and have found some interesting data that the states provide. But trying to incorporate these datasets is a daunting task. There are 50 states, after all, plus three territories. Not only is it a problem to have to pull in 50 different datasets, but we have to find where each state has squirreled away its data, and we have to standardize it so that records in Florida can be read like records from Texas. And then we also have to figure out how often to update the information. That’s why we have delayed looking for so long. (Pulling data on hospitals is a similarly daunting problem. They are required to publish cost data, but the requirements do not specify where this data is to be published, what specific data is to be published, or in what format.)
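To give a rough idea of what that standardization step involves, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the column names, the sample rows, and the `FIELD_MAPS` and `standardize` names are hypothetical, and real state files will look different. The idea is simply to keep one mapping per state from its column names to a shared set of field names.

```python
import csv
import io

# Hypothetical per-state column names mapped to shared field names.
# Real state downloads will use different headers.
FIELD_MAPS = {
    "FL": {"Facility Name": "name", "Street Address": "address", "Licensed Beds": "beds"},
    "TX": {"PROVIDER": "name", "ADDR": "address", "BED_COUNT": "beds"},
}


def standardize(state, csv_text):
    """Return the rows of one state's CSV using the shared field names."""
    mapping = FIELD_MAPS[state]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {"state": state}
        for source_field, common_field in mapping.items():
            # A missing column becomes an empty string instead of an error.
            row[common_field] = (raw.get(source_field) or "").strip()
        rows.append(row)
    return rows


# Made-up sample data standing in for two state downloads.
florida = "Facility Name,Street Address,Licensed Beds\nSunny Acres,1 Palm Way,120\n"
texas = "PROVIDER,ADDR,BED_COUNT\nLone Star Care,2 Ranch Rd,95\n"

records = standardize("FL", florida) + standardize("TX", texas)
for record in records:
    print(record)
```

Once every state is mapped this way, a record from Florida and a record from Texas carry the same keys, so the rest of the pipeline does not need to know which state a row came from. The hard part, of course, is building and maintaining 50-plus mappings by hand.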

It’s Awful

Why am I writing about this? Because after spending several weeks diligently looking for additional data on nursing homes, we have concluded that the state of data in this country is awful!

What do I mean when I say that the state of data is awful?

  1. Health Care Data is Hard to Find: States do not make it easy to find the data that you are looking for. Approaches vary by state. And, of course, there are many different types of data, so trying to report out data can be challenging.
  2. Health Care Data is Not Consistent: Fields and data points vary from state to state, as does the delivery method. Some offer spreadsheet or CSV downloads; others provide a web page that you have to search to extract data.
  3. Health Care Data is Usually Out of Date: Much of the data that is available has not been updated in years. This can be a real problem.
  4. There is Insufficient Data: Often the data that is available is paltry. Name, address, phone number. Come on!

These problems plague not only state repositories but also CMS. CMS updates its nursing home data 11 times a year, its doctor data very often, and some other data sets less often. We regularly get requests from groups asking us to update their information. When we mention that this is the latest data from CMS, it turns out that CMS does have the correct information in another area but has not bothered to update what it publishes publicly. I know it can be difficult to synchronize different data sets, but come on, why do you have so many different storage systems that you have to maintain?

We Need to Do Better

Everyone agrees that good data is important; it is important for people who are trying to decide on a nursing home, doctor, hospital, home health provider, hospice provider, or dialysis center. The data is there, the agencies are collecting it; CMS is conducting detailed inspections. They just are not doing a good job of sharing it.

The data landscape in this country needs to improve. We need better standards for collecting and managing data, and we need more transparency about the type of data being collected and how it is being used.