Understanding Big Data
The Five Vs of Big Data
Big Data is a profession and field dedicated to the analysis, processing and storage of large data sets that frequently originate from different sources. Big Data solutions and practices typically become of interest when traditional data analysis and processing techniques prove insufficient to solve a problem. In more detail, Big Data addresses very distinct requirements, such as finding correlations between multiple unrelated data sets, that can provide value to organizations or society. The field of Big Data is concerned with the scale and efficiency with which value can be derived from these data sets.
The key characteristics of data that is classed as ‘Big Data’ are commonly known as the Five Vs, which help to differentiate “Big” data from other forms of data:
- Volume. The anticipated volume of data that is processed by Big Data solutions is substantial and usually keeps growing (think about the data produced daily by jet engines). High data volumes impose significant data storage and processing requirements.
- Velocity. Big Data arrives at such a speed that enormous data sets accumulate within very short periods of time (think about the video data uploaded to YouTube every hour). From an organization’s point of view, high velocity means that this data needs to be processed as quickly as it arrives, which requires flexible and scalable data processing solutions.
- Variety. Data variety refers to the multitude of formats and types of data that need to be supported by Big Data solutions (think about the different data types an organization such as NASA generates). Data variety has traditionally been one of the major challenges for almost every organization, requiring tools and technology to integrate, transform and combine data sets.
- Veracity. Data veracity refers to the quality of data. Data that exists in Big Data environments can either be useful (i.e. the data has value) or have no value at all (frequently referred to as clutter). To understand data veracity, think about the number of emails sent within an organization that have little to no value.
- Value. The value of data is defined as the usefulness of that data for a specific organization. Whereas MRI data has a high value for a hospital, the same data set might have little to no value for a financial enterprise. Besides its intrinsic relation to data quality (veracity), the value of data is also determined by the speed (time) with which the data can be processed and transformed into meaningful information.
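As a rough illustration of how four of the five Vs might be measured for a batch of incoming data, consider the Python sketch below. The `Record` type, its fields and the `profile_batch` function are hypothetical names introduced only for this example. The `useful` flag stands in for whatever data-quality check an organization would really apply, and value is left out because it depends on the organization using the data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Record:
    # Hypothetical record shape, chosen only for this illustration.
    arrived_at: datetime
    fmt: str          # e.g. "json", "video", "csv"
    size_bytes: int
    useful: bool      # stand-in for a real data-quality check


def profile_batch(records):
    """Summarize a non-empty batch along volume, velocity, variety and veracity."""
    if not records:
        raise ValueError("records must not be empty")
    total_bytes = sum(r.size_bytes for r in records)
    times = [r.arrived_at for r in records]
    span_seconds = (max(times) - min(times)).total_seconds()
    return {
        # Volume: total size of the batch.
        "volume_bytes": total_bytes,
        # Velocity: bytes per second over the observed arrival window
        # (None when all records share the same timestamp).
        "velocity_bytes_per_s": total_bytes / span_seconds if span_seconds > 0 else None,
        # Variety: number of distinct formats seen.
        "variety_formats": len({r.fmt for r in records}),
        # Veracity: share of records flagged as useful.
        "veracity_useful_share": sum(r.useful for r in records) / len(records),
    }
```

A profile like this only describes the data; deciding whether it counts as “Big” still depends on what the organization's existing tools can handle.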
Big Data Solutions and Key Traits
Using Big Data solutions, organizations can complete complex analysis tasks with the objective of arriving at meaningful information for the business. Generally speaking, these Big Data solutions can process massive quantities of data that arrive at different speeds (velocity) and in different formats (variety).
Additionally, data within Big Data environments can come from internal sources within the organization (for example through applications) or from external data sources, in which case it is then stored by the Big Data solution (for example Twitter data). Data processed by a Big Data solution can be used directly by the organization, or can be stored in a data warehouse to enrich existing data. This data is subsequently subjected to analytics.
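The idea of enriching existing data with data from an external source can be sketched as a simple join on a shared key. This is a minimal illustration in plain Python, not a description of how any particular Big Data product works; the `enrich` function and the field names in the usage example are assumptions made for this example.

```python
def enrich(internal_rows, external_rows, key):
    """Attach external attributes to internal rows that share the same key value."""
    # Index the external rows by key for constant-time lookup.
    external_by_key = {row[key]: row for row in external_rows}
    enriched = []
    for row in internal_rows:
        extra = external_by_key.get(row[key], {})
        # Internal values win when both sources define the same field.
        enriched.append({**extra, **row})
    return enriched
```

For example, calling `enrich(customers, mentions, "customer_id")` with a list of customer records and a list of per-customer social media mention counts (both hypothetical) would return the customer records with a mention count attached wherever one exists, and leave the other records unchanged.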
Big Data solutions pursue one or more of the following objectives in order to realize value and benefits:
- Improvement of processes and operations
- Generation of actionable intelligence upon which decisions can be based
- Identification of new markets
- Prediction of (potential) change in the future
- Fault and fraud detection
- Optimization of data and records in the organization
- Improvement in the accuracy and speed of decisions
- Scientific research and discoveries
The data processed by Big Data solutions can be classified as either human-generated data (for example Facebook or Twitter posts) or machine-generated data (for example GPS data in cars or automated computations), although in both cases it is ultimately machines that generate the processing results. Needless to say, there are numerous ethical concerns, limitations and liabilities associated with the use of Big Data, and in every case the benefits must be balanced against the potential risks and impacts.