To use compression you have to understand what your data looks like, if your data is mostly unique then compression might not really help a lot. If your data is repeated on a page then it could be beneficial. In my own test compression would improve performance with some tables and not so much with other tables. I won't explain how compression works, this is all explained in books on line. the goal of this post is for you to have some code that you can run and then see how page and row level compression differs compared to the original data.


Here is some code that you can run to see how compression works. the data I have used is similar to what we use at my job(but not the same, i used fake tickers so don't try to invest in those  ;-0). I have used just the 10 industry groups from the Industry Classification Benchmark, I left out supersectors,sectors and subsectors otherwise we would have a lot more rows per table. I also did not create fake ISIN Code,SEDOL and CUSIP columns since that would have bloated the code in this post. The code will generate 2780650 rows per table, this is a nice size to test with.
Compression is really nice when you get charged $30 per GB for the SAN, so besides performance there is also a cost benefit in storage


Here is the code
This block of code will create the database, setup the lookup tables and create the tables for our tests

 

use

master

go

create

database CompressionTest2

go

use

CompressionTest2

go

--our Industry Classification Benchmark table

Create

table IndustryGroup(GroupCode char(4) not null primary key,

GroupDescription

varchar(40))

GO

--just industry groups

insert

IndustryGroup values('0001','Oil & Gas' )

insert

IndustryGroup values('1000','Basic Materials' )

insert

IndustryGroup values('2000','Industrials')

insert

IndustryGroup values('3000','Consumer Goods')

insert

IndustryGroup values('4000','Healthcare')

insert

IndustryGroup values('5000','Consumer Services')

insert

IndustryGroup values('6000','Telecommunications')

insert

IndustryGroup values('7000','Utilities')

insert

IndustryGroup values('8000','Financials')

insert

IndustryGroup values('9000','Technology')

GO

--currency table

Create

table Currency (CurrencyCode char(3) not null primary key, CurrencyDescription varchar(30))

GO

--just handful of currencies

insert

currency values('USD','U.S. Dollar')

insert

currency values('AUD','Australian Dollar')

insert

currency values('CAD','Canadian Dollar')

insert

currency values('JPY','Japanese Yen')

insert

currency values('MXN','Mexican Peso')

insert

currency values('GBP','U.K. Sterling')

insert

currency values('EUR','European Euro')

insert

currency values('ISK','Iceland Krona')

insert

currency values('BGN','Bulgarian Lev')

insert

currency values('RON','Romanian Leu')

insert

currency values('INR','Indian Rupee')

insert

currency values('RUB','Russia Rubles')

insert

currency values('BHD','Bahrain Dinar')

insert

currency values('EGP','Egypt Pounds')

insert

currency values('JOD','Jodan Dinars')

insert

currency values('KWD','Kuwait Dinars')

insert

currency values('MAD','Morocco Dirham')

insert

currency values('OMR','Omam Rial')

insert

currency values('QAR','Qatari Rial')

GO

--market cap table

create

table MarketCap (MarketCap varchar(20) not null primary key,MarketCapDescription varchar(100) )

GO

--left out a bunch of market caps

insert

MarketCap values('ALL','broad')

insert

MarketCap values('MID','mid cap')

insert

MarketCap values('MCR','micro cap')

insert

MarketCap values('SML','small cap')

insert

MarketCap values('LRG','large cap')

 

--calendar table

create

table Calendar (CalendarDate date not null primary key)

GO

insert

Calendar

select

dateadd(d,number,'19920101') from master..spt_values

where

type = 'p'

and

datepart(dw,dateadd(d,number,'20080101')) not in (1,7)

union

select

dateadd(d,number,'19970810') from master..spt_values

where

type = 'p'

and

datepart(dw,dateadd(d,number,'19970810')) not in (1,7)

 

 

 

 

--the table that we will test against

create

table IndexCloseValues (CalendarDate date not null,

IndexSymbol

varchar(30) not null,

GroupCode

char(4) not null,

CurrencyCode

char(3) not null,

MarketCap

varchar(20) not null,

CloseValue

decimal(30,10),

TotalReturnClose

decimal(30,10))

 

--2780650 rows

insert

IndexCloseValues

select

CalendarDate,

CurrencyCode

+ left(GroupCode,1) + MarketCap as Symbol, --fake tickers

GroupCode

,CurrencyCode,MarketCap,

ROW_NUMBER() OVER( ORDER BY CalendarDate,GroupCode,CurrencyCode,MarketCap ) *0.00100,

ROW_NUMBER() OVER( ORDER BY CalendarDate,GroupCode,CurrencyCode,MarketCap ) *0.247001

from IndustryGroup i cross join currency c

cross

join MarketCap

cross

join calendar

GO

 

--create a copy for page compression

select

* into IndexCloseValuesPage

from

IndexCloseValues

Go

--create a copy for row compression

select

* into IndexCloseValuesRow

from

IndexCloseValues

GO

 

 

--add unique constraint

ALTER

TABLE IndexCloseValues WITH NOCHECK

ADD

CONSTRAINT UQ_IndexCloseValues UNIQUE Clustered (CalendarDate,IndexSymbol)

GO

--add unique constraint

ALTER

TABLE IndexCloseValuesPage WITH NOCHECK

ADD

CONSTRAINT UQ_IndexCloseValuesPage UNIQUE Clustered (CalendarDate,IndexSymbol)

GO

--page compression

ALTER

TABLE IndexCloseValuesPage

REBUILD

WITH (DATA_COMPRESSION = PAGE);

GO

--add unique constraint

ALTER

TABLE IndexCloseValuesRow WITH NOCHECK

ADD

CONSTRAINT UQ_IndexCloseValuesRow UNIQUE Clustered (CalendarDate,IndexSymbol)

GO

--row compression

ALTER

TABLE IndexCloseValuesRow

REBUILD

WITH (DATA_COMPRESSION = ROW);

GO

Now that everything is setup we can look how big the tables are. Run the code below (old school I know)

dbcc

showcontig('IndexCloseValues')

dbcc

showcontig('IndexCloseValuesPage')

dbcc

showcontig('IndexCloseValuesRow')


Table: 'IndexCloseValues' (213575799); index ID: 1, database ID: 16
TABLE level scan performed.
- Pages Scanned................................: 23767
- Extents Scanned..............................: 2972
- Extent Switches..............................: 2971
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 99.97% [2971:2972]
- Logical Scan Fragmentation ..................: 0.01%
- Extent Scan Fragmentation ...................: 0.17%
- Avg. Bytes Free per Page.....................: 23.3
- Avg. Page Density (full).....................: 99.71%

Table: 'IndexCloseValuesPage' (245575913); index ID: 1, database ID: 16
TABLE level scan performed.
- Pages Scanned................................: 9307
- Extents Scanned..............................: 1165
- Extent Switches..............................: 1164
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 99.91% [1164:1165]
- Logical Scan Fragmentation ..................: 0.04%
- Extent Scan Fragmentation ...................: 0.77%
- Avg. Bytes Free per Page.....................: 10.7
- Avg. Page Density (full).....................: 99.87%

Table: 'IndexCloseValuesRow' (261575970); index ID: 1, database ID: 16
TABLE level scan performed.
- Pages Scanned................................: 13255
- Extents Scanned..............................: 1657
- Extent Switches..............................: 1656
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 100.00% [1657:1657]
- Logical Scan Fragmentation ..................: 0.01%
- Extent Scan Fragmentation ...................: 0.36%
- Avg. Bytes Free per Page.....................: 11.4
- Avg. Page Density (full).....................: 99.86%

As you can see compression really reduced the pages for the table, page level compression looks really good here.


First let's look at the execution plans, running the following block of code....

select * from IndexCloseValues

where

CalendarDate = '19920101'

 

select

* from IndexCloseValuesPage

where

CalendarDate = '19920101'

 

select

* from IndexCloseValuesRow

where

CalendarDate = '19920101'

 

...will result in this plan

Execution1day

Increasing the query to return a month of data instead of a day

 

select

* from IndexCloseValues

where

CalendarDate between '19920101' and '19920121'

 

select

* from IndexCloseValuesPage

where

CalendarDate between '19920101' and '19920121'

 

select

* from IndexCloseValuesRow

where

CalendarDate between '19920101' and '19920121'

will result in this plan.


Execution1month 

 

You can draw your own conclusions from those images

 

Let's look at some reads, first turn statistics io on

SET STATISTICS IO ON

go


 

select * from IndexCloseValues

where

CalendarDate = '19920101'

 

select

* from IndexCloseValuesPage

where

CalendarDate = '19920101'

 

select

* from IndexCloseValuesRow

where

CalendarDate = '19920101'


950 row(s) affected)

Table 'IndexCloseValues'. Scan count 1, logical reads 12, physical reads 0......

(950 row(s) affected)

Table 'IndexCloseValuesPage'. Scan count 1, logical reads 7, physical reads 0......

(950 row(s) affected)

Table 'IndexCloseValuesRow'. Scan count 1, logical reads 8, physical reads 0......

 

Those are some nice numbers for the reads, now we will increase the date range to one month
 

select * from IndexCloseValues

where

CalendarDate between '19920101' and '19920121'

 

select

* from IndexCloseValuesPage

where

CalendarDate between '19920101' and '19920121'

 

select

* from IndexCloseValuesRow

where

CalendarDate between '19920101' and '19920121'


(14250 row(s) affected)

Table 'IndexCloseValues'. Scan count 1, logical reads 125, physical reads 0......

(14250 row(s) affected)

Table 'IndexCloseValuesPage'. Scan count 1, logical reads 52, physical reads 0......

(14250 row(s) affected)

Table 'IndexCloseValuesRow'. Scan count 1, logical reads 69, physical reads 0......

 

When selecting more data the numbers look even better.

Turn statistics io off again

SET

STATISTICS IO OFF

go


So as you can see compression reduces the reads by over half when using page compression.


Posted by Denis Gobo, filed under Uncategorized. Date: April 7, 2009, 2:32 pm | No Comments »


Search Engine Optimization and SEO Tools