2005-11-12

2005.12.11_Index Clustering

Index clustering can improve your query time, if your query can take advantage of the ordering.
Following are the queries from 2005.12.11_TablePartitioning.html redone with index clustering.

The all in one table. A 10x improvement.

dms3_test=> \d da
   Table "temp20051206.da"
 Column |  Type   | Modifiers 
--------+---------+-----------
 doc_id | integer | not null
 an     | text    | not null
 av     | text    | not null
Indexes:
    "da_pkey" PRIMARY KEY, btree (doc_id, an, av)
    "da_an_av_idx" UNIQUE, btree (an, av, doc_id) CLUSTER
Foreign-key constraints:
    "da_an_fkey" FOREIGN KEY (an, av) REFERENCES attr(name, value) DEFERRABLE INITIALLY DEFERRED
    "da_doc_id_fkey" FOREIGN KEY (doc_id) REFERENCES public.document(id) DEFERRABLE INITIALLY DEFERRED



dms3_test=> explain analyze select doc.id from document as doc, da
da_client, da da_group, da da_type, da da_subtype where
doc.id=da_client.doc_id and doc.id=da_group.doc_id and
doc.id=da_type.doc_id and doc.id=da_subtype.doc_id and
da_client.an='ssis_client' and da_group.an='ssis_group' and
da_type.an='ssis_type' and da_subtype.an='ssis_subtype' and
da_client.av='client1' and da_group.av='group1' and da_type.av='type1'
and da_subtype.av='subtype1';
                                                                            QUERY PLAN                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=40335.69..53871.76 rows=1 width=4) (actual time=1904.361..5277.128 rows=140 loops=1)
   ->  Nested Loop  (cost=40335.69..53864.64 rows=2 width=16) (actual time=1904.289..5221.572 rows=1072 loops=1)
         ->  Nested Loop  (cost=40335.69..53330.49 rows=150 width=12) (actual time=1904.216..3747.333 rows=10153 loops=1)
               ->  Merge Join  (cost=40335.69..40494.44 rows=3558 width=8) (actual time=1882.068..2082.534 rows=10153 loops=1)
                     Merge Cond: ("outer".doc_id = "inner".doc_id)
                     ->  Sort  (cost=20817.14..20849.19 rows=12818 width=4) (actual time=778.661..835.451 rows=101159 loops=1)
                           Sort Key: da_client.doc_id
                           ->  Bitmap Heap Scan on da da_client  (cost=155.91..19942.58 rows=12818 width=4) (actual time=432.318..521.484 rows=101159 loops=1)
                                 Recheck Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text))
                                 ->  Bitmap Index Scan on da_an_av_idx  (cost=0.00..155.91 rows=12818 width=0) (actual time=418.064..418.064 rows=101159 loops=1)
                                       Index Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text))
                     ->  Sort  (cost=19518.54..19548.09 rows=11817 width=4) (actual time=1103.395..1160.311 rows=101190 loops=1)
                           Sort Key: da_subtype.doc_id
                           ->  Bitmap Heap Scan on da da_subtype  (cost=143.90..18719.21 rows=11817 width=4) (actual time=633.951..751.246 rows=101190 loops=1)
                                 Recheck Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text))
                                 ->  Bitmap Index Scan on da_an_av_idx  (cost=0.00..143.90 rows=11817 width=0) (actual time=633.700..633.700 rows=101190 loops=1)
                                       Index Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text))
               ->  Index Scan using document_pkey on document doc  (cost=0.00..3.60 rows=1 width=4) (actual time=0.162..0.163 rows=1 loops=10153)
                     Index Cond: (doc.id = "outer".doc_id)
         ->  Index Scan using da_pkey on da da_type  (cost=0.00..3.55 rows=1 width=4) (actual time=0.144..0.144 rows=0 loops=10153)
               Index Cond: (("outer".id = da_type.doc_id) AND (da_type.an = 'ssis_type'::text) AND (da_type.av = 'type1'::text))
   ->  Index Scan using da_pkey on da da_group  (cost=0.00..3.55 rows=1 width=4) (actual time=0.050..0.050 rows=0 loops=1072)
         Index Cond: (("outer".id = da_group.doc_id) AND (da_group.an = 'ssis_group'::text) AND (da_group.av = 'group1'::text))
 Total runtime: 5280.891 ms
(24 rows)

Time: 5287.639 ms


The separated out data. About 2x improvement.
 
 
dms3_test=> \d ssis_client
                                        Table "attr.ssis_client"
    Column     |           Type           |                          Modifiers                          
---------------+--------------------------+-------------------------------------------------------------
 doc_id        | integer                  | not null
 created_when  | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
 created_who   | integer                  | not null
 modified_when | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
v modified_who  | integer                  | not null
 value         | text                     | not null
 value_id      | integer                  | not null
Indexes:
    "ssis_client_pkey" PRIMARY KEY, btree (doc_id)
    "ssis_client_value_id_idx" btree (value_id)
    "ssis_client_value_idx" btree (value) CLUSTER
Foreign-key constraints:
    "ssis_client_created_who_fkey" FOREIGN KEY (created_who) REFERENCES public.who(id)
    "ssis_client_doc_id_fkey" FOREIGN KEY (doc_id) REFERENCES public.document(id)
    "ssis_client_modified_who_fkey" FOREIGN KEY (modified_who) REFERENCES public.who(id)
    "ssis_client_value_id_fkey" FOREIGN KEY (value_id) REFERENCES ssis_client_value(id)

dms3_test=> explain analyze select doc.id from document as doc,
ssis_client as sc, ssis_group as sg, ssis_type as st, ssis_subtype as
sst where doc.id=sc.doc_id and doc.id=sg.doc_id and doc.id=st.doc_id
and doc.id=sst.doc_id and sc.value='client1' and sg.value='group1' and
st.value='type1' and sst.value='subtype1';

                                                                                            QUERY PLAN                                                                                            
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=12322.80..16099.97 rows=108 width=4) (actual time=1941.188..2329.004 rows=140 loops=1)
   ->  Hash Join  (cost=12322.80..15710.34 rows=108 width=16) (actual time=1941.144..2306.762 rows=140 loops=1)
         Hash Cond: ("outer".doc_id = "inner".doc_id)
         ->  Index Scan using ssis_type_value_idx on ssis_type st  (cost=0.00..2850.93 rows=107106 width=4) (actual time=75.017..406.128 rows=100825 loops=1)
               Index Cond: (value = 'type1'::text)
         ->  Hash  (cost=12320.26..12320.26 rows=1019 width=12) (actual time=1862.846..1862.846 rows=993 loops=1)
               ->  Hash Join  (cost=8986.99..12320.26 rows=1019 width=12) (actual time=1531.920..1861.824 rows=993 loops=1)
                     Hash Cond: ("outer".doc_id = "inner".doc_id)
                     ->  Index Scan using ssis_group_value_idx on ssis_group sg  (cost=0.00..2797.65 rows=105085 width=4) (actual time=39.136..291.236 rows=100924 loops=1)
                           Index Cond: (value = 'group1'::text)
                     ->  Hash  (cost=8962.49..8962.49 rows=9802 width=8) (actual time=1481.859..1481.859 rows=10153 loops=1)
                           ->  Hash Join  (cost=3248.58..8962.49 rows=9802 width=8) (actual time=981.952..1474.601 rows=10153 loops=1)
                                 Hash Cond: ("outer".doc_id = "inner".doc_id)
                                 ->  Index Scan using ssis_client_value_idx on ssis_client sc  (cost=0.00..2680.53 rows=100706 width=4) (actual time=43.766..317.779 rows=101159 loops=1)
                                       Index Cond: (value = 'client1'::text)
                                 ->  Hash  (cost=2617.71..2617.71 rows=98349 width=4) (actual time=918.689..918.689 rows=101190 loops=1)
                                       ->  Index Scan using ssis_subtype_value_idx on ssis_subtype sst  (cost=0.00..2617.71 rows=98349 width=4) (actual time=53.700..820.030 rows=101190 loops=1)
                                             Index Cond: (value = 'subtype1'::text)
   ->  Index Scan using document_pkey on document doc  (cost=0.00..3.60 rows=1 width=4) (actual time=0.156..0.156 rows=1 loops=140)
         Index Cond: (doc.id = "outer".doc_id)
 Total runtime: 2329.395 ms
(21 rows)

Time: 2334.097 ms 
 
 
(originally from http://microjet.ath.cx/WebWiki/2005.12.11_IndexClustering.html)

No comments: