Column Store
database technology is something creating a buzz these days in the world of
Business Intelligence as the time of Self Service Business Intelligence is
coming soon and performance is going to play major role in deciding which Self
Service BI tool to adopt. The Column Store Technology is something not new and it’s
been there in the market since long time. I believe Sybase IQ, Teradata and now
SQL Server 2012 are the major players in the field of the column store
technology.
As the adoption of the self service Business Intelligence tool will grow
day by day which I am sure, it is highly important for these tools to store the
large amount of data effectively and have ability to play around with the data
at run time efficiently. As we know unlike the OLAP where we do have the
pre aggregated stored in the within the cells, in the self-service BI tools it
is done on real time data based on the request.
SAP HANA a self-service
Business Intelligence tool leverages the Sybase column store technology for
data storage in memory, similarly Microsoft based self service BI tool Power Pivot uses the xVelocity in
memory analytic engine and same technology is used for the column store index
in the SQL Server 2012.
What is Column Store and how column store drastically
improves the speed of the data warehouse queries?
In
a traditional table the data is stored in the row- based fashion. This storage
is basically the row based storage. Column Store is something like turning the
row based storage to 90 degree. In the column store all the values of a single
column are stored in compressed form. The column store stores each column in separate
set of disk pages rather than storing multiple rows per page.
Let
us take an example of the Employee Table to evaluate the different ways of
storing the data.
Table Containing the Employee Data –
Employee Data Stored in the Row Format
Row Store
1
14417807 adventure-works\guy1 Production Technician - WC60 M M
2
253022876 adventure-works\kevin0 Marketing
Assistant S M
3 509647174 adventure-works\roberto0 Engineering
Manager
M
M
4 112457891 adventure-works\rob0 Senior
Tool Designer S
M
5 480168528 adventure-works\thierry0 Tool
Designer M M
Column Store
1
2 3 4 5
14417807
253022876 509647174 112457891 480168528
adventure-works\guy1
adventure-works\kevin0 adventure-works\roberto0 adventure-works\rob0 adventure-works\thierry0
Production Technician - WC60 Marketing
Assistant Engineering Manager Senior
Tool Designer Tool Designer
M
S M S M
M
M M M M
As you might see the that column store groups and stores the data for
each column together while row store groups the data for each row.
Now since we understand on how the data is stored in row and column
based storage techniques, let us try to understand on how the column store
significantly improves the speed of the queries.
The first and the biggest reason for significant improvement in the
query speed is because the column store shares many more similar
characteristics that the data stored at row level. Because of this data can be
highly compressed in the column store database.
When the data is compressed,
queries requires less IO because the amount of data that needs to be transferred
from the disk is significantly reduced.
The compression permits the columnar operations like MIN, MAX, AVG,
COUNT etc..to perform quickly.
In the case of SQL Server 2012 column store
uses the xVelocity compression algorithm which compresses the data far superior
than the row and page based compression techniques.
The
column store technology is used to improve the Data Warehousing queries and we
might see a lot of enhancements and improvements coming in this area in near
future.