Big Data is a term that describes the large volume of data, both structured and unstructured, that floods businesses every day. But it is not the amount of data that matters; it is what organizations do with it. Big Data can be analyzed for insights that lead to better decisions and strategic business moves.
What is Big Data?
When we talk about Big Data, we refer to data sets, or combinations of data sets, whose size (volume), complexity (variability) and speed of growth (velocity) make them difficult to capture, manage, process or analyze with conventional technologies and tools, such as relational databases and statistics or visualization packages, within the time it takes for them to be useful.
Although the size used to determine whether a given data set qualifies as Big Data is not firmly defined and continues to change over time, most analysts and practitioners today refer to data sets ranging from 30-50 Terabytes to several Petabytes.
The complex nature of Big Data is primarily due to the unstructured nature of much of the data generated by modern technologies: web logs, radio frequency identification (RFID), sensors embedded in devices, machinery and vehicles, Internet searches, social networks such as Facebook, laptops, smartphones and other mobile devices, GPS devices and call center records.
In most cases, in order to use Big Data effectively, it must be combined with structured data (usually from a relational database) from a more conventional business application, such as an ERP (Enterprise Resource Planning) or CRM (Customer Relationship Management) system.
Why is Big Data so important?
What makes Big Data so useful to many companies is the fact that it provides answers to many questions that companies didn’t even know they had. In other words, it provides a reference point. With such a large amount of information, the data can be shaped or tested in any way the business sees fit. By doing so, organizations are able to identify problems in a more understandable way.
Collecting vast amounts of data and finding trends within the data allow businesses to move much more quickly, smoothly, and efficiently. It also allows them to eliminate problem areas before the problems kill their profits or reputation.
Big data analytics helps organizations harness their data and use it to identify new opportunities. That, in turn, leads to smarter business moves, more efficient operations, higher profits, and happier customers. The most successful companies with Big Data achieve value in the following ways:
- Cost reduction. Big data technologies such as Hadoop and cloud-based analytics bring significant cost advantages when it comes to storing large amounts of data, as well as identifying more efficient ways of doing business.
- Faster, better decision making. With the speed of Hadoop and in-memory analytics, combined with the ability to analyze new data sources, companies can immediately analyze information and make decisions based on what they have learned.
- New products and services. With the ability to measure customer needs and satisfaction through analytics comes the power to give customers what they want. With Big Data analytics, more companies are creating new products to meet customer needs.
Big Data also delivers value across a wide range of industries:
- Tourism: Keeping customers happy is key to the tourism industry, but customer satisfaction can be difficult to measure, especially in a timely manner. Resorts and casinos, for example, have only a short window in which to turn around a bad customer experience. Big Data analytics gives these companies the ability to collect customer data, apply analytics, and immediately identify potential issues before it's too late.
- Healthcare: The healthcare industry generates enormous amounts of data. Patient records, health plans, insurance information and other types of information can be difficult to manage, but they are full of key insights once analytics are applied. That is why data analysis technology is so important to healthcare: by analyzing large amounts of information – both structured and unstructured – quickly, diagnoses or treatment options can be provided almost immediately.
- Public administration: Governments face a great challenge: maintaining quality and productivity on tight budgets. This is particularly difficult in areas such as the justice system. Technology streamlines operations while giving management a more holistic view of activity.
- Retail: Customer service has evolved in recent years as savvy shoppers expect retailers to understand exactly what they need, when they need it. Big Data helps retailers meet those demands. Armed with endless amounts of data from customer loyalty programs, shopping habits, and other sources, retailers not only have a deep understanding of their customers, but can also predict trends, recommend new products, and increase profitability.
- Manufacturing companies: These deploy sensors on their products to receive telemetry data. Sometimes this is used to offer communications, security and navigation services. This telemetry also reveals usage patterns, failure rates, and other product improvement opportunities that can reduce development and assembly costs.
- Advertising: The proliferation of smartphones and other GPS devices offers advertisers the opportunity to target consumers when they are near a store, coffee shop or restaurant. This opens up new revenue for service providers and offers many businesses the opportunity to get new prospects.
Other examples of the effective use of Big Data exist in the following areas:
- Use of IT log records to improve IT troubleshooting, security breach detection, speed, efficiency, and prevention of future events.
- Rapid use of a call center's voluminous historical information to improve customer interaction and increase satisfaction.
- Use of social media content to understand customer sentiment more quickly and to improve products, services and customer interaction.
- Fraud detection and prevention in any industry that processes financial transactions online, such as shopping, banking, investing, insurance, and healthcare.
- Use of financial market transaction information to more quickly assess risk and take corrective action.
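As a toy illustration of the fraud detection use case above, a first-pass filter can flag transactions that deviate strongly from the norm. This is a deliberately simple statistical sketch, not a production fraud model; the transaction amounts are hypothetical:

```python
from statistics import mean, stdev

def flag_outliers(amounts, threshold=2.0):
    """Flag amounts more than `threshold` standard deviations from the
    mean -- a crude first-pass anomaly filter for transaction data."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    return [a for a in amounts if abs(a - mu) > threshold * sigma]

# Mostly routine purchases plus one anomalous transfer.
transactions = [25.0, 30.0, 27.5, 22.0, 31.0, 29.0, 5000.0]
print(flag_outliers(transactions))  # flags the 5000.0 transfer
```

Real systems layer many more signals (merchant, geography, device, velocity of spending) on top of this kind of baseline check.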
Data quality challenges in Big Data
The special characteristics of Big Data pose multiple data quality challenges. These are known as the 5 Vs: Volume, Velocity, Variety, Veracity and Value, and together they define the Big Data problem.
These five characteristics make it difficult for companies to extract accurate, high-quality data from such massive, changing and complicated data sets.
Until the arrival of Big Data, ETL let us load the structured information we had stored in our ERP and CRM systems, for example. But now we can also load additional information that no longer lives within the company's domain: comments or likes on social networks, results of marketing campaigns, statistical data from third parties, etc. All of this data helps us know whether our products or services are working well or, on the contrary, are having problems.
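As a sketch of the kind of ETL described above, the snippet below joins structured CRM-style records with semi-structured social media feedback. The field names and data are hypothetical:

```python
import json

# Structured records, as they might come from a CRM table (hypothetical fields).
crm_rows = [
    {"customer_id": 1, "name": "Acme Corp", "revenue": 120000},
    {"customer_id": 2, "name": "Globex", "revenue": 45000},
]

# Semi-structured feedback, e.g. as exported from a social network API.
social_feed = json.loads("""[
    {"customer_id": 1, "comment": "Great support!", "likes": 42},
    {"customer_id": 2, "comment": "Shipping was slow", "likes": 3}
]""")

# Join both sources on customer_id to enrich the structured view.
by_id = {row["customer_id"]: dict(row) for row in crm_rows}
for post in social_feed:
    by_id[post["customer_id"]]["last_comment"] = post["comment"]

print(by_id[2]["last_comment"])
```

A real pipeline would add error handling for customers present in one source but not the other, and would run at far larger scale, but the join-on-a-shared-key pattern is the same.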
Some challenges facing Big Data data quality are:
1. Many sources and types of data
With so many sources, data types, and complex structures, the difficulty of data integration increases.
The data sources of big data are very broad:
- Internet and mobile data.
- Internet of Things Data.
- Sectoral data compiled by specialized companies.
- Experimental data.
And the data types are equally varied:
- Unstructured data types: documents, videos, audios, etc.
- Semi-structured data types: software, spreadsheets, reports.
- Structured data types: records stored in relational databases, for example.
Only about 20% of the information is structured, and that can cause many errors if we do not undertake a data quality project.
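A data quality project typically starts with simple validation rules. The sketch below checks records against a hypothetical schema of required fields and expected types:

```python
def validate(record, schema):
    """Return a list of data quality problems found in `record`.
    `schema` maps field name -> expected Python type."""
    problems = []
    for field, expected in schema.items():
        if field not in record or record[field] is None:
            problems.append(f"missing: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type: {field}")
    return problems

# Hypothetical schema for a customer record.
schema = {"customer_id": int, "email": str}
print(validate({"customer_id": "7", "email": None}, schema))
```

Running checks like this as data is ingested catches the errors early, before they propagate into reports and models.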
2. Tremendous volume of data
As we have already seen, the volume of data is enormous, and this makes it difficult to execute a data quality process within a reasonable amount of time.
It is difficult to collect, clean, integrate and obtain high-quality data quickly. It takes a lot of time to transform unstructured types into structured types and process that data.
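Transforming unstructured types into structured ones often means parsing free text into records. A minimal sketch, assuming a hypothetical log format:

```python
import re

# Hypothetical raw log lines -- unstructured text.
raw = [
    "2024-05-01 12:03:11 ERROR payment-service timeout after 30s",
    "2024-05-01 12:04:02 INFO checkout completed order=8812",
]

# date, time, level, then the free-text remainder of the line.
LOG_RE = re.compile(r"^(\S+) (\S+) (\w+) (.*)$")

def to_structured(line):
    """Parse one free-text log line into a structured record."""
    m = LOG_RE.match(line)
    return {"date": m.group(1), "time": m.group(2),
            "level": m.group(3), "message": m.group(4)}

records = [to_structured(line) for line in raw]
print(records[0]["level"])
```

Once the text is in this record form, it can be loaded, filtered and aggregated like any other structured data.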
3. A lot of volatility
The data changes quickly, which makes it very short-lived; coping with this requires very high processing power.
If we do not process it well, analysis based on this data can produce wrong conclusions, which can lead to mistakes in decision making.
4. There are no unified data quality standards
In 1987 the International Organization for Standardization (ISO) published the ISO 9000 standards to guarantee the quality of products and services. However, the study of data quality standards did not begin until the 1990s, and it was not until 2011 that ISO published the ISO 8000 data quality standards.
These standards still need to mature and be refined. In addition, research on the data quality of Big Data has only recently started, and there are hardly any results yet.
The quality of Big Data is key, not only to gain competitive advantages but also to avoid grave strategic and operational mistakes based on erroneous data, whose consequences can be very serious.
How to build a Data Governance plan in Big Data
Governance means making sure that data is authorized, organized, and covered by the necessary user permissions in a database, with as few errors as possible, while maintaining privacy and security.
This doesn’t seem like an easy balance to strike, especially when the reality of where and how data is hosted and processed is in constant flux.
Below we will see some recommended steps when creating a Big Data Data Governance plan.
1. Granular Data Access and Authorization
You cannot have effective data governance without granular controls.
These granular controls can be achieved through access control expressions. These expressions use grouping and Boolean logic to control flexible data access and authorization, with role-based permissions and visibility settings.
At the bottom level, you protect sensitive data by hiding it, and at the top, you have confidential contracts for data scientists and BI analysts. This can be done with data masking capabilities and different views where raw data is blocked as much as possible and gradually more access is provided until, at the top, administrators are given greater visibility.
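As a rough illustration, an access control expression can be modeled as Boolean logic over a user's roles, combined with masking for anyone without full access. This is a minimal sketch; the role names, rule format and masking scheme are all hypothetical:

```python
def can_view(user_roles, rule):
    """Boolean access-control expression: the user needs ANY role in
    rule["any"] AND must not hold any role in rule["deny"]."""
    roles = set(user_roles)
    return bool(roles & set(rule.get("any", []))) and \
        not roles & set(rule.get("deny", []))

def mask(value, full_access):
    """Show the raw value only to privileged users; mask it otherwise."""
    return value if full_access else value[:2] + "***"

# Hypothetical rule: admins and BI analysts may see the field,
# external users never may, even if they also hold another role.
rule = {"any": ["admin", "bi_analyst"], "deny": ["external"]}
user = ["bi_analyst"]
ssn = "123-45-6789"
print(mask(ssn, can_view(user, rule)))
```

In a real platform these expressions would live in a policy engine and be evaluated per column or per view, but the grouping-plus-Boolean-logic idea is the same.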
2. Perimeter security, data protection and integrated authentication
Governance does not occur without security at the end point of the chain. It is important to build a good perimeter and put a firewall around the data, integrated with existing authentication systems and standards. When it comes to authentication, it’s important for businesses to sync up with proven systems.
With authentication, it's about looking at how to integrate with LDAP [Lightweight Directory Access Protocol], Active Directory, and other directory services. Tools such as Kerberos can also be supported for authentication. But the important thing is not to create a separate infrastructure; it is to integrate with the structure that already exists.
3. Encryption and Data Tokenization
The next step after securing the perimeter and authenticating all the granular data access being granted is to ensure that files and personally identifiable information (PII) are encrypted and tokenized from end to end of the data pipeline.
Once past the perimeter and with access to the system, protecting PII data is extremely important. That data needs to be encrypted so that regardless of who has access to it, they can run whatever analytics they need without exposing any of that data.
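A common way to protect PII while keeping it usable for analytics is deterministic tokenization, for example with a keyed hash: the same input always maps to the same token, so grouping and joining still work without exposing the raw value. This sketch uses a hypothetical hard-coded key; a real deployment would fetch the key from a key management service:

```python
import hashlib
import hmac

SECRET = b"demo-key"  # hypothetical; in production this lives in a key vault

def tokenize(pii_value):
    """Replace a PII value with a deterministic token so analytics can
    still group and join on it without seeing the raw data."""
    return hmac.new(SECRET, pii_value.encode(), hashlib.sha256).hexdigest()[:16]

email = "ana@example.com"
token = tokenize(email)
print(token == tokenize("ana@example.com"))  # True: joins still work
print(email in token)                        # False: raw PII is not exposed
```

Tokenization complements, rather than replaces, encryption at rest and in transit for the rest of the pipeline.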
4. Constant Audit and Analysis
The strategy does not work without an audit. That level of visibility and accountability at every step of the process is what allows IT to “govern” the data instead of simply setting policies and access controls and hoping for the best. It’s also how companies can keep their strategies current in an environment where the way we view data and the technologies we use to manage and analyze it are changing every day.
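As an illustration, the audit trail described above can be sketched as a decorator that records who accessed which dataset and when. The function and field names are hypothetical:

```python
import datetime
import functools

AUDIT_LOG = []

def audited(func):
    """Record who accessed which dataset, and when, for every call."""
    @functools.wraps(func)
    def wrapper(user, dataset, *args, **kwargs):
        AUDIT_LOG.append({
            "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "user": user,
            "dataset": dataset,
            "action": func.__name__,
        })
        return func(user, dataset, *args, **kwargs)
    return wrapper

@audited
def read_rows(user, dataset):
    # Stand-in for a real data access; only the audit pattern matters here.
    return f"{user} read {dataset}"

read_rows("alice", "sales_2024")
print(AUDIT_LOG[0]["dataset"])
```

Shipping these records to a central, append-only store is what turns them from a debugging aid into the accountability layer the governance plan needs.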
We are in the infancy of Big Data and IoT (Internet of Things), and being able to track access and recognize patterns in data is critical.