What is Big Data?


A brief introduction to Big Data

We all use electronic devices: computers, phones, and so on. But have we ever wondered how much data all these devices generate in the form of text, phone calls, emails, photos and videos, web searches and music? Every smartphone user produces a steady stream of data each month, and there are roughly 5 billion smartphone users; multiplied together, that adds up to exabytes of data every month, and we haven't even taken into account the data generated by other devices. Such a quantity of data is far too much for a single mind to process and understand, and it is even too much for traditional computing systems to handle. This massive amount of data is what we call Big Data.

Let's have a look at the data generated per minute on the internet:

This is only a snapshot of the data that actually exists in the world. So how do we classify data as Big Data? To do that, we use the concept of the 5 V's of Big Data:

  1. Volume: Big data refers to huge amounts of data, and volume is usually treated as the first defining dimension of a large data set. What counts as "big", however, varies with the industry and the organisation, which makes this first V the most relative and least precisely defined of the five. Today we talk about storing and processing exabytes (10¹⁸ bytes) or even zettabytes (10²¹ bytes) of data, whereas not long ago we measured data in megabytes (10⁶ bytes) on floppy discs.
  2. Velocity: Velocity is the speed at which data is created, processed, and stored. Being able to receive and transfer data immediately has become crucial for each of us and for a wide range of activities, which drives organisations to react and anticipate faster. The large number of data sources and formats makes this a major technological challenge.
  3. Variety: Variety refers to the different types of data and the many sources they come from, both inside and outside an organisation, including new sources that keep appearing. Data is commonly classified into three types: structured (for example, rows in a relational table), semi-structured (for example, JSON or XML documents), and unstructured (for example, free text, images, and videos).
  4. Veracity: Veracity concerns how messy or trustworthy the data is. Many forms of big data offer little control over quality and accuracy (think of tweets full of hashtags, abbreviations, typos, and colloquial language, whose reliability and correctness are often uncertain), but today's big data and analytics technology lets us work with such imperfect data. The sheer volume often compensates for a lack of quality or precision.
  5. Value: Last but not least, big data must be useful. If you are going to invest in the infrastructure required to collect and interpret data at a system-wide scale, you must make sure the insights it delivers are based on accurate data and lead to verifiable improvements.
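To make the Variety dimension more concrete, here is a minimal Python sketch of how the three types of data differ in practice. The sample records and function names are hypothetical: a CSV row stands in for structured data, a JSON document for semi-structured data, and a short post for unstructured data.

```python
import csv
import io
import json

# Hypothetical sample records, one for each type of data.
structured = "user_id,country,age\n42,FR,31\n"                   # fixed schema (CSV)
semi_structured = '{"user_id": 42, "tags": ["sport", "music"]}'  # self-describing (JSON)
unstructured = "Loved the concert tonight!! #music"              # free text, no schema


def parse_structured(text):
    # Every row has the same named columns, so it maps directly onto a table.
    return list(csv.DictReader(io.StringIO(text)))


def parse_semi_structured(text):
    # Keys describe the data, but fields may vary from record to record.
    return json.loads(text)


def parse_unstructured(text):
    # No schema at all: we must decide what to extract, here the hashtags.
    return [word for word in text.split() if word.startswith("#")]


print(parse_structured(structured))
print(parse_semi_structured(semi_structured))
print(parse_unstructured(unstructured))
```

The structured record maps straight onto named columns, the JSON record carries its own field names, and the free text only yields information (here, hashtags) once we decide what to pull out of it.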

We have seen how to classify data as Big Data, but how do we store and process such massive amounts of it? We can store, process and analyse Big Data with frameworks such as Hadoop and Spark. Let's take the Hadoop framework as an example and see how it works.

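The basic Hadoop framework is made up of four components that work together to form the core of the Hadoop ecosystem:

  1. HDFS (Hadoop Distributed File System): splits files into large blocks and stores replicated copies of those blocks across the machines of a cluster.
  2. YARN (Yet Another Resource Negotiator): manages the cluster's resources and schedules the jobs that run on it.
  3. MapReduce: a programming model for processing large data sets in parallel across the cluster.
  4. Hadoop Common: the shared libraries and utilities that the other modules depend on.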
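The MapReduce idea behind Hadoop can be illustrated without a cluster. The following is a minimal single-process Python sketch of a word count, not Hadoop's actual API: the function names `map_phase`, `shuffle` and `reduce_phase` are illustrative, and on a real cluster each phase would run in parallel on many machines, reading its input blocks from HDFS.

```python
from collections import defaultdict


def map_phase(document):
    # Map: emit a (key, value) pair for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_pairs):
    # Shuffle: group together all values that share the same key.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: combine the values of each key into a single result.
    return {key: sum(values) for key, values in groups.items()}


# Each string stands in for a block of a file stored on a different node.
splits = ["big data needs big storage", "Spark and Hadoop process big data"]

mapped = [pair for split in splits for pair in map_phase(split)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts["big"], word_counts["data"])  # 3 2
```

Because each split is mapped independently and each key is reduced independently, both phases can be distributed across machines; only the shuffle step needs to move data between them.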

The emergence of big data reflects a growing awareness of the "power" of data and of the need to improve how data is collected, exploited, shared, and analysed. Analytics solutions help ensure that ever-increasing volumes of data are put to good use for a wide range of business objectives: not only generating fundamental data-driven insights into operations, but also predicting future trends and events. There are several types of analytics, and it helps to have a basic idea of how they are organised and used. Let's see the diagram below.

Descriptive analytics is one of the most common forms of analytics that firms use to stay current on trends and operational performance. It is one of the first steps in analysing raw data and relies on basic mathematical operations, such as counts, sums and averages, to produce statements that summarise samples and measurements. Once you have discovered patterns and insights with descriptive analytics, you can use other types of analytics to learn more about what is driving those trends.
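As a small illustration of descriptive analytics, the sketch below summarises a week of daily sales with basic statistics from Python's standard `statistics` module. The sales figures are invented for the example.

```python
import statistics

# Hypothetical daily sales for one week (day 1 to day 7).
daily_sales = [120, 135, 128, 150, 170, 210, 190]

# Describe what happened: totals, central tendency, spread, and the peak.
summary = {
    "total": sum(daily_sales),
    "mean": statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    "stdev": round(statistics.stdev(daily_sales), 1),
    "best_day": daily_sales.index(max(daily_sales)) + 1,  # 1-based day number
}
print(summary)
```

None of these numbers explains why sales peaked on day 6; that question is where the other types of analytics come in.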

Predictive analytics, as the name suggests, is concerned with foreseeing future events, such as market trends, consumer trends and a number of other market-related events. It forecasts these events using historical and current data, and it is widely used in enterprises. Both customers and service providers benefit from predictive analytics: it keeps track of our previous actions and anticipates what we are likely to do next based on them.

Prescriptive analytics, arguably the most significant and most underused big data analytics method, gives you a laser-like focus on answering a single question. Given the current circumstances, it helps choose the best solution from many possible options and makes recommendations for capitalising on a future opportunity or avoiding a future risk. It can also be used to show the implications of each action, which improves decision-making.
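Prescriptive analytics goes one step further than prediction: it evaluates candidate actions and recommends one. Below is a minimal Python sketch, assuming a hypothetical shop that must decide how much stock to order given a demand forecast. The demand scenarios, prices and profit model are all assumptions for illustration; the code simply scores every option and picks the best. Real prescriptive systems typically rely on optimisation or simulation techniques rather than this brute-force search.

```python
# Hypothetical forecast: possible demand levels and their probabilities.
demand_scenarios = {140: 0.25, 160: 0.5, 180: 0.25}
unit_cost, unit_price = 6.0, 10.0  # assumed cost and selling price per unit
options = [140, 160, 180, 200]     # candidate order quantities


def expected_profit(order_qty):
    # Assumption: unsold units are a total loss (no salvage value).
    profit = 0.0
    for demand, probability in demand_scenarios.items():
        sold = min(order_qty, demand)
        profit += probability * (sold * unit_price - order_qty * unit_cost)
    return profit


# Show the implication of each action, then recommend the best one.
scores = {qty: expected_profit(qty) for qty in options}
best = max(scores, key=scores.get)
print(scores)
print(f"Recommended order: {best} units (expected profit {scores[best]:.2f})")
```

The `scores` dictionary plays the role of "showing the implications of each action": ordering too little leaves demand unmet, while ordering too much wastes stock, and the recommendation (160 units here) balances the two.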