Tech Explained: Big Data

We’ve all heard of Big Data. What exactly is it and do the pros outweigh the cons?

The speed of data creation continues to climb. As of May 2018, an estimated 2.5 quintillion bytes of data were being created every day (a quintillion is a 1 followed by 18 zeros). The pace of data collection is set to keep rising: by 2020, there are projected to be 200 billion smart devices in use, each one constantly collecting data.

This is Big Data: data that contains a wide variety of information, arriving in very large volumes and at high velocity. So, what exactly constitutes Big Data, what is it used for and how can it be processed?

How Big Data works

So much data is collected on the premise that the more you know, the more insights you gain and the better your predictions about the future will be. Comparing data points allows researchers to find patterns that can help solve problems. Until relatively recently, most data was “structured”: it consisted of figures in spreadsheets or databases.

However, as data sets grew larger and more complex, they became too large for traditional data processing software to manage. Today, analytics software uses artificial intelligence and machine learning to work with large data sets and “unstructured” data like images, sensor data, social media posts and sound recordings.

To analyse this data, analytics software builds models and runs simulations that look for answers to particular questions or problems. For example, a company might use Big Data to anticipate customer demand for a new product. To do this, it first collects various types of data, such as demographics, prices, customer purchase history, even the weather. It then uses this information to build a predictive model for the new product, estimating, for example, who is most likely to buy it. This model informs decisions about how best to market the product, such as what price to set and what market segment to target.
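To make the idea of a predictive model concrete, here is a minimal sketch in Python. It assumes a company has sales figures from earlier product launches at different prices (the numbers are invented for illustration) and fits a straight line relating price to demand. A real model would draw on far more variables, such as demographics and weather, and the function names here are hypothetical.

```python
# Invented figures from past launches: (price, units sold in the first month).
history = [(10.0, 500), (12.0, 460), (15.0, 400), (18.0, 340), (20.0, 300)]


def fit_line(points):
    """Ordinary least squares fit of: demand = slope * price + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_demand(price, slope, intercept):
    """Estimate units sold at a given price using the fitted line."""
    return slope * price + intercept


def best_price(candidates, slope, intercept):
    """Pick the candidate price with the highest predicted revenue."""
    return max(candidates, key=lambda p: p * predict_demand(p, slope, intercept))


slope, intercept = fit_line(history)
print(predict_demand(16.0, slope, intercept))
print(best_price([10.0, 12.5, 15.0, 17.5, 20.0], slope, intercept))
```

The same pattern, fitting a model to past data and then querying it about a situation that has not happened yet, underlies much larger systems; only the volume of data and the sophistication of the model change.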

Huge scientific research projects like the Large Hadron Collider at CERN use distributed computing to analyse massive sets of data. By leveraging the computing power of thousands of computers around the world, the researchers can tackle questions that could not have been approached before.
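The principle behind distributed computing can be sketched in a few lines of Python: split the data into chunks, let each worker summarise its own chunk independently, then merge the partial results. This is only an illustration. It uses threads on a single machine as a stand-in for separate computers, and it is not how CERN's systems are actually built.

```python
from concurrent.futures import ThreadPoolExecutor


def summarise(chunk):
    """Work done independently by one worker: a partial count and sum."""
    return len(chunk), sum(chunk)


def merge(partials):
    """Combine the partial results into a single overall mean."""
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count


def distributed_mean(readings, workers=4):
    """Compute the mean of `readings` by splitting the work across workers."""
    if not readings:
        raise ValueError("readings must not be empty")
    chunk_size = max(1, -(-len(readings) // workers))  # ceiling division
    chunks = [readings[i:i + chunk_size]
              for i in range(0, len(readings), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarise, chunks))
    return merge(partials)


print(distributed_mean(list(range(1, 1001))))
```

Because each chunk is processed without reference to the others, adding more workers lets the same job handle more data, which is what makes the approach attractive for very large data sets.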

Uses of Big Data

Big Data is revolutionising all areas of business. In addition to new product development, it can be used to predict when machinery needs maintenance or is close to failure, through analysing information such as log entries, sensor data and error messages. This allows organisations to perform maintenance more efficiently. 
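As a rough illustration of how sensor data might flag a machine that needs attention, the Python sketch below compares each new reading with the average of the readings just before it and reports any that stray unusually far. The vibration figures, the window size and the threshold are all invented for the example; production systems typically use far richer models.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return the positions of readings that sit more than `threshold`
    standard deviations from the mean of the preceding `window` readings."""
    flagged = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        baseline = mean(recent)
        spread = stdev(recent)
        # Skip perfectly flat windows, where no spread can be measured.
        if spread > 0 and abs(readings[i] - baseline) > threshold * spread:
            flagged.append(i)
    return flagged


# Invented vibration readings from a machine sensor, with one sudden spike.
vibration = [0.50, 0.52, 0.49, 0.51, 0.50, 0.51, 0.50, 0.95, 0.52, 0.50]
print(flag_anomalies(vibration))
```

A flagged reading does not prove a fault, but it tells maintenance staff where to look first.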

Data gathered from customers can also help companies develop more personalised services and products and handle issues more proactively, reducing customer churn. Banks and businesses use big data to identify patterns that can indicate fraud or theft, and to improve security and compliance.

Big Data is also having a significant impact on organisational efficiency. Software can analyse production, customer feedback, market demand, return rates and other factors in order to improve just-in-time systems and overall operational efficiency. Big Data can also help companies innovate, by examining micro and macro trends and providing insight into how best to deliver what customers want.

It is not only useful in business. In healthcare, medical records, images and analytics from wearables are being analysed for patterns that can help researchers spot disease early and develop new medicines. 

Apple’s ResearchKit, an open-source software framework, leverages Big Data by allowing researchers to build apps that collect study data from participants. These apps effectively turn users’ phones into biomedical research devices. Integrating data from medical records with social media analytics could also enable researchers to monitor flu outbreaks in real time by tracking how many people report they are feeling ill.

The development of autonomous vehicles also depends on Big Data. To operate safely, they must gather and analyse tremendous amounts of data — everything from traffic conditions to images of people on the side of the road. 

Cities are also embedding Big Data into urban design. For example, Los Angeles uses data from magnetic road sensors and traffic cameras to alter the timing of traffic lights and reduce congestion around the city. In Porto, Portugal, sensors tell the city’s waste management department when rubbish bins are full so they don’t waste time and fuel emptying containers that are only half full.
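The logic behind a sensor-driven collection round like the Porto example can be very simple. The Python sketch below assumes each bin reports a fill level between 0 and 1 and picks out those above a chosen threshold, fullest first. The bin names, readings and 80% threshold are hypothetical and not taken from the city's actual system.

```python
def bins_to_collect(fill_levels, threshold=0.8):
    """Return the IDs of bins at or above the threshold, fullest first."""
    due = [bin_id for bin_id, level in fill_levels.items() if level >= threshold]
    return sorted(due, key=lambda bin_id: fill_levels[bin_id], reverse=True)


# Hypothetical sensor readings: bin ID mapped to the fraction of the bin in use.
fill_levels = {"bin-a": 0.95, "bin-b": 0.40, "bin-c": 0.82, "bin-d": 0.55}
print(bins_to_collect(fill_levels))
```

Only the bins returned by the function would be added to the day's route, leaving the half-empty ones for a later round.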

Sensor data is also being analysed to predict where earthquakes, droughts or floods are likely to strike next. Relief organisations can use these predictions to direct supplies even before disaster strikes. Police forces use Big Data to deploy resources more efficiently to deter crime.

Sports teams are using Big Data too. In addition to tracking performance using equipment embedded with sensors, teams can also use smart technology to track nutrition, sleep and emotional state. For ordinary people, there are even smart yoga mats that can provide feedback on your postures, score your practice and guide you through a session at home.

Downsides of Big Data

Companies using Big Data need to be able to ask the right questions in order to get the best use out of these huge mounds of data. This can require expertise that is unavailable to many small and medium-sized enterprises. 

Of course, the biggest drawback is the lack of privacy and security inherent in Big Data. We’re nearing a point where data will be collected on almost every aspect of our lives, including many things that used to be private. 

Increasingly, we are asked to strike a balance between the amount of personal data we divulge and the convenience that data-powered apps and services offer. Even if we decide we are happy for someone to have our data for a particular purpose, can we trust that it will always be kept safe? 

All of this also increases the risk that people will face discrimination based on information they may not realise is being collected, such as what they communicate in private, what they eat and how much they exercise.