To use compression you have to understand what your data looks like, if your data is mostly unique then compression might not really help a lot. If your data is repeated on a page then it could be beneficial. In my own test compression would improve performance with some tables and not so much with other tables. I won't explain how compression works, this is all explained in books on line. the goal of this post is for you to have some code that you can run and then see how page and row level compression differs compared to the original data.
Here is some code that you can run to see how compression works. the data I have used is similar to what we use at my job(but not the same, i used fake tickers so don't try to invest in those ;-0). I have used just the 10 industry groups from the Industry Classification Benchmark, I left out supersectors,sectors and subsectors otherwise we would have a lot more rows per table. I also did not create fake ISIN Code,SEDOL and CUSIP columns since that would have bloated the code in this post. The code will generate 2780650 rows per table, this is a nice size to test with.
Compression is really nice when you get charged $30 per GB for the SAN, so besides performance there is also a cost benefit in storage
Here is the code
This block of code will create the database, setup the lookup tables and create the tables for our tests
use
mastergo
create
database CompressionTest2go
use
CompressionTest2go
--our Industry Classification Benchmark table
Create
table IndustryGroup(GroupCode char(4) not null primary key,GroupDescription
varchar(40))GO
--just industry groups
insert
IndustryGroup values('0001','Oil & Gas' )insert
IndustryGroup values('1000','Basic Materials' )insert
IndustryGroup values('2000','Industrials')insert
IndustryGroup values('3000','Consumer Goods')insert
IndustryGroup values('4000','Healthcare')insert
IndustryGroup values('5000','Consumer Services')insert
IndustryGroup values('6000','Telecommunications')insert
IndustryGroup values('7000','Utilities')insert
IndustryGroup values('8000','Financials')insert
IndustryGroup values('9000','Technology')GO
--currency table
Create
table Currency (CurrencyCode char(3) not null primary key, CurrencyDescription varchar(30))GO
--just handful of currencies
insert
currency values('USD','U.S. Dollar')insert
currency values('AUD','Australian Dollar')insert
currency values('CAD','Canadian Dollar')insert
currency values('JPY','Japanese Yen')insert
currency values('MXN','Mexican Peso')insert
currency values('GBP','U.K. Sterling')insert
currency values('EUR','European Euro')insert
currency values('ISK','Iceland Krona')insert
currency values('BGN','Bulgarian Lev')insert
currency values('RON','Romanian Leu')insert
currency values('INR','Indian Rupee')insert
currency values('RUB','Russia Rubles')insert
currency values('BHD','Bahrain Dinar')insert
currency values('EGP','Egypt Pounds')insert
currency values('JOD','Jodan Dinars')insert
currency values('KWD','Kuwait Dinars')insert
currency values('MAD','Morocco Dirham')insert
currency values('OMR','Omam Rial')insert
currency values('QAR','Qatari Rial')GO
--market cap table
create
table MarketCap (MarketCap varchar(20) not null primary key,MarketCapDescription varchar(100) )GO
--left out a bunch of market caps
insert
MarketCap values('ALL','broad')insert
MarketCap values('MID','mid cap')insert
MarketCap values('MCR','micro cap')insert
MarketCap values('SML','small cap')insert
MarketCap values('LRG','large cap')
--calendar table
create
table Calendar (CalendarDate date not null primary key)GO
insert
Calendarselect
dateadd(d,number,'19920101') from master..spt_valueswhere
type = 'p'and
datepart(dw,dateadd(d,number,'20080101')) not in (1,7)union
select
dateadd(d,number,'19970810') from master..spt_valueswhere
type = 'p'and
datepart(dw,dateadd(d,number,'19970810')) not in (1,7)
--the table that we will test against
create
table IndexCloseValues (CalendarDate date not null,IndexSymbol
varchar(30) not null,GroupCode
char(4) not null,CurrencyCode
char(3) not null,MarketCap
varchar(20) not null,CloseValue
decimal(30,10),TotalReturnClose
decimal(30,10))
--2780650 rows
insert
IndexCloseValuesselect
CalendarDate,CurrencyCode
+ left(GroupCode,1) + MarketCap as Symbol, --fake tickersGroupCode
,CurrencyCode,MarketCap, ROW_NUMBER() OVER( ORDER BY CalendarDate,GroupCode,CurrencyCode,MarketCap ) *0.00100, ROW_NUMBER() OVER( ORDER BY CalendarDate,GroupCode,CurrencyCode,MarketCap ) *0.247001 from IndustryGroup i cross join currency ccross
join MarketCapcross
join calendarGO
--create a copy for page compression
select
* into IndexCloseValuesPagefrom
IndexCloseValuesGo
--create a copy for row compression
select
* into IndexCloseValuesRowfrom
IndexCloseValuesGO
--add unique constraint
ALTER
TABLE IndexCloseValues WITH NOCHECKADD
CONSTRAINT UQ_IndexCloseValues UNIQUE Clustered (CalendarDate,IndexSymbol)GO
--add unique constraint
ALTER
TABLE IndexCloseValuesPage WITH NOCHECKADD
CONSTRAINT UQ_IndexCloseValuesPage UNIQUE Clustered (CalendarDate,IndexSymbol)GO
--page compression
ALTER
TABLE IndexCloseValuesPageREBUILD
WITH (DATA_COMPRESSION = PAGE);GO
--add unique constraint
ALTER
TABLE IndexCloseValuesRow WITH NOCHECKADD
CONSTRAINT UQ_IndexCloseValuesRow UNIQUE Clustered (CalendarDate,IndexSymbol)GO
--row compression
ALTER
TABLE IndexCloseValuesRowREBUILD
WITH (DATA_COMPRESSION = ROW);GO
Now that everything is setup we can look how big the tables are. Run the code below (old school I know)
dbcc
showcontig('IndexCloseValues')dbcc
showcontig('IndexCloseValuesPage')dbcc
showcontig('IndexCloseValuesRow')
Table: 'IndexCloseValues' (213575799); index ID: 1, database ID: 16
TABLE level scan performed.
- Pages Scanned................................: 23767
- Extents Scanned..............................: 2972
- Extent Switches..............................: 2971
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 99.97% [2971:2972]
- Logical Scan Fragmentation ..................: 0.01%
- Extent Scan Fragmentation ...................: 0.17%
- Avg. Bytes Free per Page.....................: 23.3
- Avg. Page Density (full).....................: 99.71%
Table: 'IndexCloseValuesPage' (245575913); index ID: 1, database ID: 16
TABLE level scan performed.
- Pages Scanned................................: 9307
- Extents Scanned..............................: 1165
- Extent Switches..............................: 1164
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 99.91% [1164:1165]
- Logical Scan Fragmentation ..................: 0.04%
- Extent Scan Fragmentation ...................: 0.77%
- Avg. Bytes Free per Page.....................: 10.7
- Avg. Page Density (full).....................: 99.87%
Table: 'IndexCloseValuesRow' (261575970); index ID: 1, database ID: 16
TABLE level scan performed.
- Pages Scanned................................: 13255
- Extents Scanned..............................: 1657
- Extent Switches..............................: 1656
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 100.00% [1657:1657]
- Logical Scan Fragmentation ..................: 0.01%
- Extent Scan Fragmentation ...................: 0.36%
- Avg. Bytes Free per Page.....................: 11.4
- Avg. Page Density (full).....................: 99.86%
As you can see compression really reduced the pages for the table, page level compression looks really good here.
First let's look at the execution plans, running the following block of code....
select * from IndexCloseValues
where
CalendarDate = '19920101'select
* from IndexCloseValuesPagewhere
CalendarDate = '19920101'select
* from IndexCloseValuesRowwhere
CalendarDate = '19920101'
...will result in this plan
Increasing the query to return a month of data instead of a day
select
* from IndexCloseValueswhere
CalendarDate between '19920101' and '19920121'
select
* from IndexCloseValuesPagewhere
CalendarDate between '19920101' and '19920121'
select
* from IndexCloseValuesRowwhere
CalendarDate between '19920101' and '19920121'will result in this plan.
You can draw your own conclusions from those images
Let's look at some reads, first turn statistics io on
SET STATISTICS IO ON
go
select * from IndexCloseValues
where
CalendarDate = '19920101'
select
* from IndexCloseValuesPagewhere
CalendarDate = '19920101'
select
* from IndexCloseValuesRowwhere
CalendarDate = '19920101'
950 row(s) affected)
Table 'IndexCloseValues'. Scan count 1, logical reads 12, physical reads 0......
(950 row(s) affected)
Table 'IndexCloseValuesPage'. Scan count 1, logical reads 7, physical reads 0......
(950 row(s) affected)
Table 'IndexCloseValuesRow'. Scan count 1, logical reads 8, physical reads 0......
Those are some nice numbers for the reads, now we will increase the date range to one month
select * from IndexCloseValues
where
CalendarDate between '19920101' and '19920121'select
* from IndexCloseValuesPagewhere
CalendarDate between '19920101' and '19920121'select
* from IndexCloseValuesRowwhere
CalendarDate between '19920101' and '19920121'(14250 row(s) affected)
Table 'IndexCloseValues'. Scan count 1, logical reads 125, physical reads 0......
(14250 row(s) affected)
Table 'IndexCloseValuesPage'. Scan count 1, logical reads 52, physical reads 0......
(14250 row(s) affected)
Table 'IndexCloseValuesRow'. Scan count 1, logical reads 69, physical reads 0......
When selecting more data the numbers look even better.
Turn statistics io off again
SET
STATISTICS IO OFF go
So as you can see compression reduces the reads by over half when using page compression.
