What Is Big Data?
Big data is a relative term describing a situation where the volume, velocity and variety
of data exceed an organization’s storage or compute capacity for accurate and timely
decision making.
Some of this data is held in transactional data stores – the byproduct of fast-growing
online activity. Machine-to-machine interactions, such as metering, call detail records,
environmental sensing and RFID systems, generate their own tidal waves of data. All
these forms of data are expanding, and that is coupled with fast-growing streams of
unstructured and semi-structured data from social media.
That’s a lot of data, but it is the reality for many organizations. By some estimates,
organizations in all sectors have at least 100 terabytes of data, many with more than
a petabyte. “Even scarier, many predict this number to double every six months going
forward,” said futurist Thornton May, speaking at a SAS webinar in 2011.
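As a rough illustration of what that prediction implies, the short Python sketch below projects a data volume forward under a fixed doubling period. The 100-terabyte starting point is the low-end estimate quoted above; the function name and the chosen time horizons are hypothetical, and the six-month doubling rate is a prediction rather than a measurement.

```python
def projected_volume_tb(start_tb: float, months: int, doubling_months: int = 6) -> float:
    """Return the volume after `months`, assuming it doubles every `doubling_months`."""
    return start_tb * 2 ** (months / doubling_months)


# Assumption: start from the 100 TB low-end estimate and look up to three years out.
for months in (6, 12, 24, 36):
    volume = projected_volume_tb(100, months)
    print(f"{months:>2} months: {volume:,.0f} TB")
```

Under these assumptions, 100 terabytes would grow to 6,400 terabytes (6.4 petabytes) in three years, which is why volume alone is a moving target.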
However, big data is defined less by volume – which is a constantly moving target – than
by its ever-increasing variety, velocity, variability and complexity.
- Variety. Up to 85 percent of an organization’s data is unstructured – not numeric –
but it still must be folded into quantitative analysis and decision making. Text,
video, audio and other unstructured data require different architecture and
technologies for analysis.
- Velocity. Thornton May says, “Initiatives such as the use of RFID tags and smart
metering are driving an ever greater need to deal with the torrent of data in near
real time. This, coupled with the need and drive to be more agile and deliver insight
quicker, is putting tremendous pressure on organizations to build the necessary
infrastructure and skill base to react quickly enough.”
- Variability. In addition to the speed at which data comes your way, the data flows
can be highly variable – with daily, seasonal and event-triggered peak loads that
can be challenging to manage.
- Complexity. Difficulties dealing with data increase with the expanding universe
of data sources and are compounded by the need to link, match and transform
data across business entities and systems. Organizations need to understand
relationships, such as complex hierarchies and data linkages, among all data.
A data environment can become extreme along any of the above dimensions or with a
combination of two or all of them at once. However, it is important to understand that
not all of your data will be relevant or useful. Organizations must be able to separate
the wheat from the chaff and focus on the information that counts – not on the
information overload.
Rethinking Data Management
The necessary infrastructure that May refers to will be much more than tweaks,
upgrades and expansions to legacy systems and methods.
“Because the shifts in both the amount and potential of today’s data are so epic,
businesses require more than simple, incremental advances in the way they manage
information,” wrote Dan Briody in Big Data: Harnessing a Game-Changing Asset
(Economist Intelligence Unit, 2011). “Strategically, operationally and culturally, companies
need to reconsider their entire approach to data management, and make important
decisions about which data they choose to use, and how they choose to use them. …
Most businesses have made slow progress in extracting value from big data. And some
companies attempt to use traditional data management practices on big data, only to
learn that the old rules no longer apply.”
Some organizations will need to rethink their data management strategies when they
face hundreds of gigabytes of data for the first time. Others may be fine until they reach
tens or hundreds of terabytes. But whenever an organization reaches the critical mass
defined as big data for itself, change is inevitable.
Three Key Technologies for Extracting Business Value from Big Data
According to Philip Carter, Associate Vice President of IDC Asia Pacific, “Big data
technologies describe a new generation of technologies and architectures, designed to
economically extract value from very large volumes of a wide variety of data by enabling
high-velocity capture, discovery and/or analysis.” (Source: IDC. Big Data Analytics:
Future Architectures, Skills and Roadmaps for the CIO, September 2011.) Furthermore,
this analysis is needed in real time or near-real time, and it must be affordable and secure.
Fortunately, a number of technology advancements have occurred or are under way
that make it possible to benefit from big data and big data analytics. For starters,
storage, server processing and memory capacity have become abundant and cheap.
The cost of a gigabyte of storage has dropped from approximately $16 in February
2000 to less than $0.07 today. Storage and processing technologies have been
designed specifically for large data volumes. Computing models such as parallel
processing, clustering, virtualization, grid environments and cloud computing, coupled
with high-speed connectivity, have redefined what is possible.
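To put the price drop in perspective, the sketch below compares what raw capacity for 100 terabytes (the low-end estimate cited earlier) would cost at each per-gigabyte price. It assumes 1,000 gigabytes per terabyte and treats $0.07 as an upper bound, and it ignores servers, redundancy, power and staff, so it illustrates the ratio rather than a real budget. The function and constant names are hypothetical.

```python
GB_PER_TB = 1_000  # assumption: decimal units (1 TB = 1,000 GB)


def storage_cost_usd(volume_tb: float, price_per_gb: float) -> float:
    """Raw capacity cost in US dollars for `volume_tb` terabytes."""
    return volume_tb * GB_PER_TB * price_per_gb


# Per-gigabyte prices quoted in the text; the second is an upper bound.
cost_2000 = storage_cost_usd(100, 16.00)
cost_today = storage_cost_usd(100, 0.07)

print(f"February 2000: ${cost_2000:,.0f}")
print(f"Today (upper bound): ${cost_today:,.0f}")
print(f"Price ratio: about {cost_2000 / cost_today:,.0f} to 1")
```

On these figures the same capacity falls from roughly $1.6 million to about $7,000, which helps explain why retaining data that was once discarded has become economically practical.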
Here are three key technologies that can help you get a handle on big data – and even
more importantly, extract meaningful business value from it.
• Information management for big data. Manage data as a strategic, core asset,
with ongoing process control for big data analytics.
• High-performance analytics for big data. Gain rapid insights from big data and
the ability to solve increasingly complex problems using more data.
• Flexible deployment options for big data. Choose between options for on-premises
or hosted, software-as-a-service (SaaS) approaches for big data and big
data analytics.