2005.12.11_Index Clustering
Index clustering can improve your query time, if your query can take advantage of the ordering.Following are the queries from 2005.12.11_TablePartitioning.html redone with index clustering.
The all in one table. A 10x improvement.
dms3_test=> \d da Table "temp20051206.da" Column | Type | Modifiers --------+---------+----------- doc_id | integer | not null an | text | not null av | text | not null Indexes: "da_pkey" PRIMARY KEY, btree (doc_id, an, av) "da_an_av_idx" UNIQUE, btree (an, av, doc_id) CLUSTER Foreign-key constraints: "da_an_fkey" FOREIGN KEY (an, av) REFERENCES attr(name, value) DEFERRABLE INITIALLY DEFERRED "da_doc_id_fkey" FOREIGN KEY (doc_id) REFERENCES public.document(id) DEFERRABLE INITIALLY DEFERRED dms3_test=> explain analyze select doc.id from document as doc, da da_client, da da_group, da da_type, da da_subtype where doc.id=da_client.doc_id and doc.id=da_group.doc_id and doc.id=da_type.doc_id and doc.id=da_subtype.doc_id and da_client.an='ssis_client' and da_group.an='ssis_group' and da_type.an='ssis_type' and da_subtype.an='ssis_subtype' and da_client.av='client1' and da_group.av='group1' and da_type.av='type1' and da_subtype.av='subtype1'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=40335.69..53871.76 rows=1 width=4) (actual time=1904.361..5277.128 rows=140 loops=1) -> Nested Loop (cost=40335.69..53864.64 rows=2 width=16) (actual time=1904.289..5221.572 rows=1072 loops=1) -> Nested Loop (cost=40335.69..53330.49 rows=150 width=12) (actual time=1904.216..3747.333 rows=10153 loops=1) -> Merge Join (cost=40335.69..40494.44 rows=3558 width=8) (actual time=1882.068..2082.534 rows=10153 loops=1) Merge Cond: ("outer".doc_id = "inner".doc_id) -> Sort (cost=20817.14..20849.19 rows=12818 width=4) (actual time=778.661..835.451 rows=101159 loops=1) Sort Key: da_client.doc_id -> Bitmap Heap Scan on da da_client (cost=155.91..19942.58 rows=12818 width=4) (actual time=432.318..521.484 rows=101159 loops=1) Recheck Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text)) -> Bitmap Index Scan on da_an_av_idx (cost=0.00..155.91 rows=12818 width=0) (actual time=418.064..418.064 rows=101159 loops=1) Index Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text)) -> Sort (cost=19518.54..19548.09 rows=11817 width=4) (actual time=1103.395..1160.311 rows=101190 loops=1) Sort Key: da_subtype.doc_id -> Bitmap Heap Scan on da da_subtype (cost=143.90..18719.21 rows=11817 width=4) (actual time=633.951..751.246 rows=101190 loops=1) Recheck Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text)) -> Bitmap Index Scan on da_an_av_idx (cost=0.00..143.90 rows=11817 width=0) (actual time=633.700..633.700 rows=101190 loops=1) Index Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text)) -> Index Scan using document_pkey on document doc (cost=0.00..3.60 rows=1 width=4) (actual time=0.162..0.163 rows=1 loops=10153) Index Cond: (doc.id = "outer".doc_id) -> Index Scan using da_pkey on da da_type (cost=0.00..3.55 rows=1 width=4) (actual time=0.144..0.144 rows=0 loops=10153) Index Cond: (("outer".id = da_type.doc_id) AND (da_type.an = 'ssis_type'::text) AND (da_type.av = 'type1'::text)) -> Index Scan using da_pkey on da da_group (cost=0.00..3.55 rows=1 width=4) (actual time=0.050..0.050 rows=0 loops=1072) Index Cond: (("outer".id = da_group.doc_id) AND (da_group.an = 'ssis_group'::text) AND (da_group.av = 'group1'::text)) Total runtime: 5280.891 ms (24 rows) Time: 5287.639 ms
The separated out data. About 2x improvement.
dms3_test=> \d ssis_client Table "attr.ssis_client" Column | Type | Modifiers ---------------+--------------------------+------------------------------------------------------------- doc_id | integer | not null created_when | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone created_who | integer | not null modified_when | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone v modified_who | integer | not null value | text | not null value_id | integer | not null Indexes: "ssis_client_pkey" PRIMARY KEY, btree (doc_id) "ssis_client_value_id_idx" btree (value_id) "ssis_client_value_idx" btree (value) CLUSTER Foreign-key constraints: "ssis_client_created_who_fkey" FOREIGN KEY (created_who) REFERENCES public.who(id) "ssis_client_doc_id_fkey" FOREIGN KEY (doc_id) REFERENCES public.document(id) "ssis_client_modified_who_fkey" FOREIGN KEY (modified_who) REFERENCES public.who(id) "ssis_client_value_id_fkey" FOREIGN KEY (value_id) REFERENCES ssis_client_value(id) dms3_test=> explain analyze select doc.id from document as doc, ssis_client as sc, ssis_group as sg, ssis_type as st, ssis_subtype as sst where doc.id=sc.doc_id and doc.id=sg.doc_id and doc.id=st.doc_id and doc.id=sst.doc_id and sc.value='client1' and sg.value='group1' and st.value='type1' and sst.value='subtype1'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=12322.80..16099.97 rows=108 width=4) (actual time=1941.188..2329.004 rows=140 loops=1) -> Hash Join (cost=12322.80..15710.34 rows=108 width=16) (actual time=1941.144..2306.762 rows=140 loops=1) Hash Cond: ("outer".doc_id = "inner".doc_id) -> Index Scan using ssis_type_value_idx on ssis_type st (cost=0.00..2850.93 rows=107106 width=4) (actual time=75.017..406.128 rows=100825 loops=1) Index Cond: (value = 'type1'::text) -> Hash (cost=12320.26..12320.26 rows=1019 width=12) (actual time=1862.846..1862.846 rows=993 loops=1) -> Hash Join (cost=8986.99..12320.26 rows=1019 width=12) (actual time=1531.920..1861.824 rows=993 loops=1) Hash Cond: ("outer".doc_id = "inner".doc_id) -> Index Scan using ssis_group_value_idx on ssis_group sg (cost=0.00..2797.65 rows=105085 width=4) (actual time=39.136..291.236 rows=100924 loops=1) Index Cond: (value = 'group1'::text) -> Hash (cost=8962.49..8962.49 rows=9802 width=8) (actual time=1481.859..1481.859 rows=10153 loops=1) -> Hash Join (cost=3248.58..8962.49 rows=9802 width=8) (actual time=981.952..1474.601 rows=10153 loops=1) Hash Cond: ("outer".doc_id = "inner".doc_id) -> Index Scan using ssis_client_value_idx on ssis_client sc (cost=0.00..2680.53 rows=100706 width=4) (actual time=43.766..317.779 rows=101159 loops=1) Index Cond: (value = 'client1'::text) -> Hash (cost=2617.71..2617.71 rows=98349 width=4) (actual time=918.689..918.689 rows=101190 loops=1) -> Index Scan using ssis_subtype_value_idx on ssis_subtype sst (cost=0.00..2617.71 rows=98349 width=4) (actual time=53.700..820.030 rows=101190 loops=1) Index Cond: (value = 'subtype1'::text) -> Index Scan using document_pkey on document doc (cost=0.00..3.60 rows=1 width=4) (actual time=0.156..0.156 rows=1 loops=140) Index Cond: (doc.id = "outer".doc_id) Total runtime: 2329.395 ms (21 rows) Time: 2334.097 ms
(originally from http://microjet.ath.cx/WebWiki/2005.12.11_IndexClustering.html)