Big Data Explained
A Simple Explanation of Big Data
In the real, tangible world, we’re used to organizing things by certain observable properties. Whether it’s author or title names in the library, labels in spice cabinets, or colors in loads of laundry, we’re accustomed to seeing, feeling, tasting, hearing, or even smelling things in order to qualify and order them. Big data is information about anything and everything, so it isn’t going to look, sound, or feel like any one specific thing.
In order to describe these digital collections of data, we can't rely on our senses alone to classify them. Instead, we often define big data in terms of the following five characteristics, sometimes called the five Vs: volume, variety, velocity, veracity, and value.
Volume
Almost every action we take today generates data. iHealth records our sleeping patterns, Amazon our purchases, and Google our searches. Our Fitbits track our steps, and Progressive will likely give us a discount for sharing our acceleration and deceleration rates.
Data volume has been growing at an exponential rate: according to 2017 research by DOMO, 90% of all data had been created in the previous two years. The larger the volume of data, the greater the potential for rich and comprehensive insights, and the greater the challenges companies face in storing, processing, and extracting those insights.
Variety
Text, audio, machine-to-machine conversations, structured to unstructured: data variety refers to all the different "types" of data in a collection. File formats such as jpg, pdf, mp3, rar, and docx; weather data, home noise volume, viewing history, contact preferences. It's all being recorded, and in order to access the insights contained within, we need tools capable of analyzing a myriad of file types simultaneously.
Knowing whether your collection contains images, audio, text, code, or any other type of data can help you approach your analysis more efficiently and effectively. A lexicographic sorting algorithm may be great for a data set of names, but it probably won't do much good on a set of pictures or audio clips.
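To make that concrete, here is a minimal Python sketch (the names and byte strings are invented for illustration). Lexicographic comparison yields a meaningful alphabetical order for text, while the same comparison applied to raw image or audio bytes yields an order with no real meaning:

```python
# Lexicographic sorting compares strings character by character,
# which produces a useful alphabetical order for a list of names.
names = ["Srinivasan", "Ada", "Miyamoto", "Grace"]
print(sorted(names))  # ['Ada', 'Grace', 'Miyamoto', 'Srinivasan']

# The same comparison still runs on raw media bytes, but the result
# is ordered by arbitrary leading byte values, not by content.
clips = [b"\x89PNG\r\n", b"ID3\x04", b"RIFF....WAVE"]
print(sorted(clips))  # a valid sort order that tells us nothing
```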
Velocity
Long gone are the days of scheduled updates or intentional uploads. Data velocity refers to the speed at which information is created and processed. Lifetimes of video data are now livestreamed, tags are scanned constantly, and sensor data influencing driving decisions needs to be processed in near real time.
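As a rough illustration, here is a toy Python sketch with a simulated sensor stream (all names and values are invented): instead of collecting data for a scheduled batch job, each event is handled the moment it arrives.

```python
import random
import time

def sensor_readings():
    """Simulate an endless stream of speed readings (made-up data)."""
    while True:
        yield {"speed_kmh": random.uniform(0.0, 120.0), "ts": time.time()}
        time.sleep(0.01)  # mimic events arriving continuously

# High-velocity data is processed event by event as it arrives,
# rather than stored first and analyzed later.
for reading in sensor_readings():
    if reading["speed_kmh"] > 100.0:
        print(f"brake alert at {reading['ts']:.0f}")
        break  # end the demo after the first alert fires
```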
Veracity
Data veracity refers to the noise or abnormalities that a collection of data carries; in simpler terms, it is the quality of the data. Knowing the data's veracity is important in order to calculate a margin of error when analyzing the data, or simply to decide whether or not to use it at all: just because the data are out there doesn't mean they're accurate.
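For example, here is a small Python sketch (the measurements are invented for illustration) of the two uses just mentioned: estimating a margin of error for a sample, and screening out readings that are probably noise.

```python
import statistics

# Invented sample of measurements; 250.0 is almost certainly noise.
sample = [98.2, 97.9, 101.4, 98.0, 250.0, 98.5, 97.7]

mean = statistics.mean(sample)
stdev = statistics.stdev(sample)

# A rough 95% margin of error for the sample mean: 1.96 * s / sqrt(n).
margin = 1.96 * stdev / len(sample) ** 0.5
print(f"mean = {mean:.1f} +/- {margin:.1f}")

# A simple veracity check: drop readings more than two standard
# deviations from the mean before trusting the statistics.
cleaned = [x for x in sample if abs(x - mean) < 2 * stdev]
print(f"cleaned mean = {statistics.mean(cleaned):.1f}")
```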
Value
Data value is pretty straightforward: it refers to what the data is actually worth, though not necessarily in a monetary sense. For example, a data set containing the names of the pets in your neighborhood will have far less value to a phone company than a set containing the number of text messages sent every month.