Big Data in the Enterprise
When evaluating Big Data applications in enterprise computing, one frequently asked question is how Big Data compares to the Enterprise Data Warehouse (EDW). What does Big Data bring to the organization that the EDW cannot handle? This article presents a technical and business discussion of the Big Data question. The technical discussion assumes familiarity with data architecture.
The business discussion draws some conclusions about the actual application of Big Data in the enterprise.
Big Data vs. the Enterprise Data Warehouse
Big Data hardware is quite similar to the EDW's massively parallel processing (MPP) SQL-based database servers. EDW vendors include Teradata, Oracle Exadata, IBM Netezza and Microsoft PDW SQL Server. Both Big Data and EDW SQL database servers are composed of large racks of Intel servers (each server called a node), and both distribute data across the nodes. Each node has local hard drives for data storage and does not use a centralized storage system such as a Storage Area Network (SAN), in order to prevent I/O contention.

The first major technology difference is that Big Data's most common software platform, known as Hadoop, is free, open source and runs on commodity (non-proprietary) hardware. Most EDW vendors use proprietary hardware with additional hardware accelerators that allow the servers to work better as a SQL-style relational database. These hardware costs, combined with the EDW vendor's proprietary software, typically reach cost levels per Terabyte that are many times higher than those of a Hadoop Big Data platform. Hadoop has more limitations than a SQL relational database but is far more scalable at a much lower price.

While both Hadoop and EDW databases break large data sets apart across massively parallel systems, the actual implementations are substantially different. EDW databases parallelize the data across smaller logical SQL databases that exist on each node. Data is imported via a loading process that divides the data among the logical databases on a row-by-row basis, based on a data key column. This extract, transform and load (ETL) process typically performs additional data cleansing and data homogenization that matches incoming data with existing data in the EDW.

To read the rest of the article by Fred Zimmerman, please download the formatted PDF version by clicking on the link below.

Big Data in the Enterprise Article (PDF)
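The key-based, row-by-row distribution described in the loading process can be sketched in a few lines of Python. This is an illustrative toy, not any vendor's actual loader: the node count, function names and row layout are all assumptions made for the example. The essential idea is that hashing the distribution key decides which node's logical database receives each row, so rows with the same key always co-locate.

```python
NUM_NODES = 4  # hypothetical cluster size for the sketch

def node_for_row(row: dict, key_column: str, num_nodes: int = NUM_NODES) -> int:
    """Pick the target node for a row by hashing its distribution key."""
    return hash(row[key_column]) % num_nodes

def distribute(rows, key_column: str, num_nodes: int = NUM_NODES) -> dict:
    """Group rows into per-node buckets, mimicking a key-based ETL load."""
    buckets = {n: [] for n in range(num_nodes)}
    for row in rows:
        buckets[node_for_row(row, key_column, num_nodes)].append(row)
    return buckets

# Example load: eight rows keyed on a hypothetical customer_id column.
rows = [{"customer_id": i, "amount": i * 10} for i in range(8)]
buckets = distribute(rows, "customer_id")
# Every row lands on exactly one node, and equal keys always co-locate.
```

Real MPP loaders add the cleansing and homogenization steps the article mentions, but the hash-on-key placement shown here is what makes later joins and lookups on that key local to a single node.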