With the rise of risk-bearing arrangements and value-based care, “big data” is becoming more prominent in the sphere of provider-centric healthcare. Big data, as its name would suggest, refers to vast, complex sets of data that can be analyzed using machine learning and AI algorithms to uncover nuanced trends and patterns that wouldn’t be apparent in smaller data sets. While the healthcare industry has entered the realm of big data, it has yet to adopt a standard model for this ever-expanding universe of data. As big data in healthcare continues to take shape, it’s helpful to understand the famous “three V’s of big data”: Volume, Velocity, and Variety.
The role of volume in big data, especially when it comes to the paradigm shift of data-driven healthcare, can be best summarized through the following statistic—the volume of data generated by the healthcare industry is estimated to double roughly every 73 days. Managing data on that scale can be very difficult, and certain storage models have come to the fore to tackle that process. Considering the unstructured nature of most healthcare data, relational databases—where the data have pre-defined relationships—aren’t necessarily the best option. As the size and number of relational databases increase, the performance of queries and updates tends to degrade as well. Instead, more flexible database management models such as NoSQL and columnar databases are most often relied upon to analyze the volume of healthcare data.
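The advantage of a NoSQL-style document model can be sketched in a few lines. This is a minimal illustration (hypothetical records, a toy in-memory “collection” rather than a real database): two clinical records with different shapes coexist in the same store, where a rigid relational table would force a single pre-defined column set on both.

```python
# Hypothetical encounter records with different fields per document —
# the kind of heterogeneity that strains a fixed relational schema.
encounters = [
    {"patient_id": "p001", "type": "radiology", "modality": "MRI",
     "report": "No acute findings."},
    {"patient_id": "p002", "type": "discharge",
     "medications": ["metformin"], "followup_days": 14},
]

def find(collection, **criteria):
    """Filter documents on whichever fields they happen to have,
    mimicking a document store's flexible querying."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

radiology = find(encounters, type="radiology")
print(radiology[0]["patient_id"])  # p001
```

A real deployment would use a document database such as MongoDB or a columnar store, but the design point is the same: the schema lives with each record, not with the table.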
Data velocity refers to streaming or frequent bursts of data from a number of sources. The means for collecting these data points include wearable smart devices such as a Fitbit as well as other health-tracking applications that can be downloaded on one’s phone. These devices and applications can have meaningful impacts on patients and the management of their care. For these devices to function properly, they need big data schemas that can handle a high velocity of data. Because healthcare has adopted big data later than other industries, much of the legwork in advancing the technology for data streaming and high-velocity data has already been done.
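One common pattern for taming high-velocity streams is windowed aggregation: rather than reacting to every raw sample, downstream systems consume a smoothed rolling value. A minimal sketch, using hypothetical heart-rate readings from a wearable:

```python
from collections import deque

def rolling_mean(stream, window=3):
    """Yield the mean of the last `window` readings as each new one arrives —
    a simple sliding-window aggregation over a data stream."""
    buf = deque(maxlen=window)  # old readings fall off automatically
    for reading in stream:
        buf.append(reading)
        yield sum(buf) / len(buf)

# A short burst of (hypothetical) wearable heart-rate samples.
readings = [72, 75, 120, 118, 76]
smoothed = [round(m, 1) for m in rolling_mean(readings)]
print(smoothed)  # [72.0, 73.5, 89.0, 104.3, 104.7]
```

Production systems would do this with a stream-processing framework (e.g., Kafka Streams or Flink), but the windowing idea is the same.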
We can understand the impact of data variety in healthcare by examining structured vs. unstructured data. In the simplest terms, structured data is how most of us are used to engaging with data—organized into spreadsheets with all data points neatly aligned within set categories. In healthcare, as much as 80% of the data is locked in unstructured data types, such as a discharge summary or a radiology report. An important step in managing data variety is codifying data. Codified data is any data that relies on consistent, uniform usage of code sets across multiple systems; the most common example in healthcare is claims data.
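Codification can be illustrated with a tiny lookup from free-text diagnosis strings to a shared code set. This is a minimal sketch: the two-entry map below is an illustrative subset (the ICD-10 codes shown are real, but a production system would use a full terminology service such as ICD-10 or SNOMED CT, not a hand-built dictionary).

```python
# Tiny illustrative slice of a code set mapping free text to ICD-10 codes.
CODE_SET = {
    "type 2 diabetes mellitus": "E11.9",
    "essential hypertension": "I10",
}

def codify(free_text):
    """Normalize a free-text diagnosis and look it up in the shared code set.
    Returns None when no mapping exists yet."""
    key = free_text.strip().lower()
    return CODE_SET.get(key)

print(codify("Essential Hypertension"))  # I10
```

Once multiple systems emit the same code for the same concept, their data can be aggregated and compared—this is exactly why claims data, which is codified by design, is so widely used for analytics.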
Now that we’ve broken down the specifics, the question you should be asking next is what your health system can do to address the unique challenges presented by each of the three V’s. One solution is for your health system to invest in a cloud-hosted Data Lake. A Data Lake is a repository that stores data in its raw format. Cloud-hosted Data Lakes provide a data architecture that’s flexible, extensible, and rapidly provisioned. With a cloud-hosted Data Lake, you’re not constrained by a predefined schema, which allows for faster analysis. The unstructured data you’re collecting, but may not currently have a use for, will be readily accessible when it’s needed with a cloud-hosted Data Lake architecture.
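The “not constrained by a predefined schema” property is often called schema-on-read: records land in the lake exactly as they arrive, and a schema is applied only when a specific analysis needs one. A minimal sketch of the idea, with hypothetical raw JSON records standing in for files in a lake:

```python
import json

# Raw records stored as-is — note the two documents have different shapes.
raw_lake = [
    '{"patient_id": "p001", "hr": 72, "ts": "2024-01-01T08:00:00"}',
    '{"patient_id": "p002", "note": "Patient reports improvement."}',
]

def read_with_schema(lake, fields):
    """Schema-on-read: project only the fields this analysis needs,
    skipping records that don't carry them."""
    for raw in lake:
        doc = json.loads(raw)
        if all(f in doc for f in fields):
            yield {f: doc[f] for f in fields}

vitals = list(read_with_schema(raw_lake, ["patient_id", "hr"]))
print(vitals)  # [{'patient_id': 'p001', 'hr': 72}]
```

The unstructured note in the second record isn’t lost—it stays in the lake untouched, ready for a future analysis that defines a schema for it.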
Interested in learning more about big data? For a more in-depth examination of this topic, check out our On-demand webinar, “Why Your Current Data Strategy May Not Work”.