Saturday, September 1, 2012

Column Store Database Technology

Column Store database technology is something creating a buzz these days in the world of Business Intelligence as the time of Self Service Business Intelligence is coming soon and performance is going to play major role in deciding which Self Service BI tool to adopt. The Column Store Technology is something not new and it’s been there in the market since long time. I believe Sybase IQ, Teradata and now SQL Server 2012 are the major players in the field of the column store technology.

As the adoption of the self service Business Intelligence tool will grow day by day which I am sure, it is highly important for these tools to store the large amount of data effectively and have ability to play around with the data at run time efficiently. As we know unlike the OLAP where we do have the pre aggregated stored in the within the cells, in the self-service BI tools it is done on real time data based on the request.
 
SAP HANA a self-service Business Intelligence tool leverages the Sybase column store technology for data storage in memory, similarly Microsoft based self service BI tool Power Pivot uses the xVelocity in memory analytic engine and same technology is used for the column store index in the SQL Server 2012.
 
What is Column Store and how column store drastically improves the speed of the data warehouse queries?
 
In a traditional table the data is stored in the row- based fashion. This storage is basically the row based storage. Column Store is something like turning the row based storage to 90 degree. In the column store all the values of a single column are stored in compressed form. The column store stores each column in separate set of disk pages rather than storing multiple rows per page.
Let us take an example of the Employee Table to evaluate the different ways of storing the data.
Table Containing the Employee Data –
 

Employee Data Stored in the Row Format


Row Store 


1           14417807      adventure-works\guy1            Production Technician - WC60       M      M
2           253022876     adventure-works\kevin0         Marketing Assistant                        S       M
3          509647174      adventure-works\roberto0      Engineering Manager                     M      M
 
4          112457891      adventure-works\rob0 Senior Tool Designer                                 S       M
 
5         480168528     adventure-works\thierry0       Tool Designer                             M       M
 


Column Store
1 2 3 4 5
14417807 253022876  509647174 112457891 480168528
adventure-works\guy1 adventure-works\kevin0  adventure-works\roberto0  adventure-works\rob0  adventure-works\thierry0
Production Technician - WC60  Marketing Assistant  Engineering Manager Senior Tool Designer  Tool Designer

M S M S M
M M M M M
 
As you might see the that column store groups and stores the data for each column together while row store groups the data for each row.
Now since we understand on how the data is stored in row and column based storage techniques, let us try to understand on how the column store significantly improves the speed of the queries.
The first and the biggest reason for significant improvement in the query speed is because the column store shares many more similar characteristics that the data stored at row level. Because of this data can be highly compressed in the column store database.
When the data is compressed, queries requires less IO because the amount of data that needs to be transferred from the disk is significantly reduced.
The compression permits the columnar operations like MIN, MAX, AVG, COUNT etc..to perform quickly.
In the case of SQL Server 2012 column store uses the xVelocity compression algorithm which compresses the data far superior than the row and page based compression techniques.
The column store technology is used to improve the Data Warehousing queries and we might see a lot of enhancements and improvements coming in this area in near future.