It is really sad to see Jeff S. et. al fighting the horde of ignorants, but that is a battle that has been fought many times before and is not what I want to write about here.
What I want to show here is the difference in query time you'd expect to see by trimming down on unnecessary usage of surrogate keys.
A schema with unnecessary surrogate keys.
-- Done on postgresql 8.1
dms3_test=> \d document
Table "dms4.document"
Column | Type | Modifiers
---------------+--------------------------+-------------------------------------------------------------
id | integer | not null default nextval('document_id_seq'::regclass)
state | text | not null
modified_who | text | not null
modified_when | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
created_who | text | not null
created_when | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
Indexes:
"document_pkey" PRIMARY KEY, btree (id)
"document_created_when" btree (created_when)
"document_created_who" btree (created_who)
"document_modified_when" btree (modified_when)
"document_modified_who" btree (modified_who)
"document_state" btree (state)
Foreign-key constraints:
"document_created_who_fkey" FOREIGN KEY (created_who) REFERENCES who(name)
"document_modified_who_fkey" FOREIGN KEY (modified_who) REFERENCES who(name)
"document_state_fkey" FOREIGN KEY (state) REFERENCES state(name)
dms3_test=> \d attribute_name
Table "public.attribute_name"
Column | Type | Modifiers
--------+---------+------------------------------------------------------------------
id | integer | not null default nextval(('attribute_name_seq'::text)::regclass)
name | text | not null
Indexes:
"attribute_name_pkey" PRIMARY KEY, btree (id)
"an_name_idx" UNIQUE, btree (name)
dms3_test=> \d attribute
Table "public.attribute"
Column | Type | Modifiers
---------+---------+--------------------------------------------------------
id | integer | not null default nextval('attribute_id_seq'::regclass)
name_id | integer | not null
value | text | not null
Indexes:
"attribute_pkey" PRIMARY KEY, btree (id)
"attr_nv_idx" UNIQUE, btree (name_id, value)
Foreign-key constraints:
"$1" FOREIGN KEY (name_id) REFERENCES attribute_name(id)
dms3_test=> \d document_attribute
Table "public.document_attribute"
Column | Type | Modifiers
--------------+---------+-----------
doc_id | integer | not null
attribute_id | integer | not null
Indexes:
"document_attribute_pkey" PRIMARY KEY, btree (doc_id, attribute_id) CLUSTER
"da_ad_idx" UNIQUE, btree (attribute_id, doc_id)
"da_attr_idx" btree (attribute_id)
Foreign-key constraints:
"$1" FOREIGN KEY (doc_id) REFERENCES public.document(id)
"$2" FOREIGN KEY (attribute_id) REFERENCES attribute(id)
dms3_test=>
explain analyze SELECT doc.id FROM attribute as attr0, attribute_name
as an0, attribute_name as an2, document_attribute as da2, document as
doc, attribute_name as an1, attribute as attr3, document_attribute as
da1, attribute as attr2, document_attribute as da0, attribute_name as
an3, attribute as attr1, document_attribute as da3
WHERE
da0.doc_id = doc.id AND attr0.id = da0.attribute_id AND an0.id = attr0.name_id AND
da1.doc_id = doc.id AND attr1.id = da1.attribute_id AND an1.id = attr1.name_id AND
da2.doc_id = doc.id AND attr2.id = da2.attribute_id AND an2.id = attr2.name_id AND
da3.doc_id = doc.id AND attr3.id = da3.attribute_id AND an3.id = attr3.name_id AND
an0.name = 'ssis_client' AND attr0.value = 'client1' AND
an1.name = 'ssis_group' AND attr1.value = 'group1' AND
an2.name = 'ssis_type' AND attr2.value = 'type1' AND
an3.name = 'ssis_subtype' AND attr3.value = 'subtype1'
;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=655695.94..886471.16 rows=1 width=4) (actual time=188571.542..217327.467 rows=3 loops=1)
Join Filter: ("outer".id = "inner".name_id)
-> Seq Scan on attribute_name an0 (cost=0.00..1.07 rows=1 width=4) (actual time=0.045..0.048 rows=1 loops=1)
Filter: (name = 'ssis_client'::text)
-> Nested Loop (cost=655695.94..886470.07 rows=1 width=8) (actual time=188571.488..217327.404 rows=3 loops=1)
Join Filter: ("outer".id = "inner".name_id)
-> Seq Scan on attribute_name an5 (cost=0.00..1.07 rows=1 width=4) (actual time=0.008..0.015 rows=1 loops=1)
Filter: (name = 'ssis_e2xkey'::text)
-> Nested Loop (cost=655695.94..886468.99 rows=1 width=12) (actual time=188571.475..217327.378 rows=3 loops=1)
Join Filter: ("outer".id = "inner".name_id)
-> Seq Scan on attribute_name an4 (cost=0.00..1.07 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)
Filter: (name = 'ssis_comment'::text)
-> Nested Loop (cost=655695.94..886467.90 rows=1 width=16) (actual time=188571.453..217327.346 rows=3 loops=1)
-> Hash Join (cost=655695.94..886463.56 rows=1 width=40) (actual time=183814.259..217326.874 rows=11 loops=1)
Hash Cond: ("outer".name_id = "inner".id)
-> Hash Join (cost=655694.86..886462.45 rows=4 width=44) (actual time=183814.220..217326.774 rows=11 loops=1)
Hash Cond: (("outer".id = "inner".name_id) AND ("outer".attribute_id = "inner".id))
-> Hash Join (cost=655692.89..886452.06 rows=838 width=52) (actual time=183795.378..217307.410 rows=951 loops=1)
Hash Cond: ("outer".attribute_id = "inner".id)
-> Nested Loop (cost=655690.92..886280.53 rows=32236 width=52) (actual time=183664.609..217269.300 rows=97785 loops=1)
-> Hash Join (cost=655688.96..885633.84 rows=16118 width=44) (actual time=183640.064..217012.129 rows=97785 loops=1)
Hash Cond: ("outer".doc_id = "inner".id)
-> Nested Loop (cost=0.00..191997.58 rows=7557225 width=12) (actual time=12.250..10810.601 rows=7557225 loops=1)
-> Index Scan using attribute_name_pkey on attribute_name an3 (cost=0.00..3.08 rows=1 width=4) (actual time=12.218..12.232 rows=1 loops=1)
Filter: (name = 'ssis_subtype'::text)
-> Seq Scan on document_attribute da1 (cost=0.00..116422.25 rows=7557225 width=8) (actual time=0.021..4547.481 rows=7557225 loops=1)
-> Hash (cost=655683.57..655683.57 rows=2155 width=32) (actual time=183565.310..183565.310 rows=0 loops=1)
-> Nested Loop (cost=462162.26..655683.57 rows=2155 width=32) (actual time=128403.334..183556.998 rows=9257 loops=1)
-> Hash Join (cost=462162.26..650853.43 rows=288 width=24) (actual time=128403.319..183498.454 rows=906 loops=1)
Hash Cond: ("outer".attribute_id = "inner".id)
-> Hash Join (cost=462160.29..650793.18 rows=11080 width=24) (actual time=128337.287..183447.336 rows=90804 loops=1)
Hash Cond: ("outer".name_id = "inner".id)
-> Nested Loop (cost=462159.22..650348.93 rows=66476 width=28) (actual time=128337.257..183307.650 rows=90804 loops=1)
-> Hash Join (cost=462159.22..501285.43 rows=8888 width=20) (actual time=128306.517..132366.000 rows=8840 loops=1)
Hash Cond: ("outer".id = "inner".doc_id)
-> Hash Join (cost=1.06..25599.40 rows=202087 width=4) (actual time=41.568..2754.244 rows=329853 loops=1)
Hash Cond: ("outer".state_id = "inner".id)
-> Seq Scan on document doc (cost=0.00..18525.31 rows=1010431 width=8) (actual time=18.767..2089.253 rows=1010431 loops=1)
-> Hash (cost=1.06..1.06 rows=1 width=4) (actual time=14.681..14.681 rows=0 loops=1)
-> Seq Scan on state (cost=0.00..1.06 rows=1 width=4) (actual time=0.031..0.035 rows=1 loops=1)
Filter: (name = 'associated'::text)
-> Hash (cost=461808.06..461808.06 rows=44437 width=16) (actual time=128239.687..128239.687 rows=0 loops=1)
-> Hash Join (cost=157528.96..461808.06 rows=44437 width=16) (actual time=15479.239..128139.028 rows=27087 loops=1)
Hash Cond: ("outer".attribute_id = "inner".id)
-> Hash Join (cost=157526.99..452807.67 rows=1710811 width=16) (actual time=15478.512..127203.588 rows=2313438 loops=1)
Hash Cond: ("outer".doc_id = "inner".doc_id)
-> Seq Scan on document_attribute da2 (cost=0.00..116422.25 rows=7557225 width=8) (actual time=0.019..37889.434 rows=7557225 loops=1)
-> Hash (cost=156173.26..156173.26 rows=196292 width=8) (actual time=14908.286..14908.286 rows=0 loops=1)
-> Hash Join (cost=1.97..156173.26 rows=196292 width=8) (actual time=7.750..14650.188 rows=269366 loops=1)
Hash Cond: ("outer".attribute_id = "inner".id)
-> Seq Scan on document_attribute da4 (cost=0.00..116422.25 rows=7557225 width=8) (actual time=7.678..11635.892 rows=7557225 loops=1)
-> Hash (cost=1.96..1.96 rows=2 width=8) (actual time=0.049..0.049 rows=0 loops=1)
-> Seq Scan on attribute attr4 (cost=0.00..1.96 rows=2 width=8) (actual time=0.008..0.047 rows=1 loops=1)
Filter: (value = 'comment1'::text)
-> Hash (cost=1.96..1.96 rows=2 width=8) (actual time=0.046..0.046 rows=0 loops=1)
-> Seq Scan on attribute attr2 (cost=0.00..1.96 rows=2 width=8) (actual time=0.007..0.044 rows=1 loops=1)
Filter: (value = 'type1'::text)
-> Index Scan using document_attribute_pkey on document_attribute da0 (cost=0.00..16.67 rows=8 width=8) (actual time=5.713..5.748 rows=10 loops=8840)
Index Cond: ("outer".doc_id = da0.doc_id)
-> Hash (cost=1.07..1.07 rows=1 width=4) (actual time=0.011..0.011 rows=0 loops=1)
-> Seq Scan on attribute_name an2 (cost=0.00..1.07 rows=1 width=4) (actual time=0.007..0.010 rows=1 loops=1)
Filter: (name = 'ssis_type'::text)
-> Hash (cost=1.96..1.96 rows=2 width=8) (actual time=0.052..0.052 rows=0 loops=1)
-> Seq Scan on attribute attr0 (cost=0.00..1.96 rows=2 width=8) (actual time=0.014..0.050 rows=1 loops=1)
Filter: (value = 'client1'::text)
-> Index Scan using document_attribute_pkey on document_attribute da3 (cost=0.00..16.67 rows=8 width=8) (actual time=0.005..0.049 rows=10 loops=906)
Index Cond: (da3.doc_id = "outer".doc_id)
-> Materialize (cost=1.96..1.98 rows=2 width=8) (actual time=0.000..0.001 rows=1 loops=97785)
-> Seq Scan on attribute attr5 (cost=0.00..1.96 rows=2 width=8) (actual time=18.498..18.537 rows=1 loops=1)
Filter: (value = 'e2xkey1'::text)
-> Hash (cost=1.96..1.96 rows=2 width=8) (actual time=0.045..0.045 rows=0 loops=1)
-> Seq Scan on attribute attr1 (cost=0.00..1.96 rows=2 width=8) (actual time=0.007..0.044 rows=1 loops=1)
Filter: (value = 'group1'::text)
-> Hash (cost=1.96..1.96 rows=2 width=8) (actual time=18.781..18.781 rows=0 loops=1)
-> Seq Scan on attribute attr3 (cost=0.00..1.96 rows=2 width=8) (actual time=18.739..18.779 rows=1 loops=1)
Filter: (value = 'subtype1'::text)
-> Hash (cost=1.07..1.07 rows=1 width=4) (actual time=0.018..0.018 rows=0 loops=1)
-> Seq Scan on attribute_name an1 (cost=0.00..1.07 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)
Filter: (name = 'ssis_group'::text)
-> Index Scan using document_attribute_pkey on document_attribute da5 (cost=0.00..4.33 rows=1 width=8) (actual time=0.037..0.037 rows=0 loops=11)
Index Cond: ((da5.doc_id = "outer".doc_id) AND ("outer".id = da5.attribute_id))
Total runtime: 217405.692 ms
(82 rows)
Time: 217885.006 ms
The same data but in a schema without trimmed-down surrogate keys. -- Done on postgresql 8.1
dms3_test=> \d document
Table "dms4.document"
Column | Type | Modifiers
---------------+--------------------------+-------------------------------------------------------------
id | integer | not null default nextval('document_id_seq'::regclass)
state | text | not null
modified_who | text | not null
modified_when | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
created_who | text | not null
created_when | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
Indexes:
"document_pkey" PRIMARY KEY, btree (id)
"document_created_when" btree (created_when)
"document_created_who" btree (created_who)
"document_modified_when" btree (modified_when)
"document_modified_who" btree (modified_who)
"document_state" btree (state)
Foreign-key constraints:
"document_created_who_fkey" FOREIGN KEY (created_who) REFERENCES who(name)
"document_modified_who_fkey" FOREIGN KEY (modified_who) REFERENCES who(name)
"document_state_fkey" FOREIGN KEY (state) REFERENCES state(name)
dms3_test=> \d temp20051206.da
Table "temp20051206.da"
Column | Type | Modifiers
--------+---------+-----------
doc_id | integer | not null
an | text | not null
av | text | not null
Indexes:
"da_pkey" PRIMARY KEY, btree (doc_id, an, av)
"da_an_av_idx" UNIQUE, btree (an, av, doc_id)
Foreign-key constraints:
"da_an_fkey" FOREIGN KEY (an, av) REFERENCES attr(name, value) DEFERRABLE INITIALLY DEFERRED
"da_doc_id_fkey" FOREIGN KEY (doc_id) REFERENCES public.document(id) DEFERRABLE INITIALLY DEFERRED
dms3_test=>
explain analyze select doc.id from document as doc, da
da_client, da da_group, da da_type, da da_subtype
WHERE
doc.id=da_client.doc_id and doc.id=da_group.doc_id and
doc.id=da_type.doc_id and doc.id=da_subtype.doc_id and
da_client.an='ssis_client' and da_client.av='client1' and
da_group.an='ssis_group' and da_group.av='group1' and
da_type.an='ssis_type' and da_type.av='type1' and
da_subtype.an='ssis_subtype' and da_subtype.av='subtype1'
;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=40335.69..53871.76 rows=1 width=4) (actual time=17319.458..50292.429 rows=140 loops=1)
-> Nested Loop (cost=40335.69..53864.64 rows=2 width=16) (actual time=17319.392..50260.780 rows=1072 loops=1)
-> Nested Loop (cost=40335.69..53330.49 rows=150 width=12) (actual time=17289.476..35508.467 rows=10153 loops=1)
-> Merge Join (cost=40335.69..40494.44 rows=3558 width=8) (actual time=17263.290..17594.400 rows=10153 loops=1)
Merge Cond: ("outer".doc_id = "inner".doc_id)
-> Sort (cost=20817.14..20849.19 rows=12818 width=4) (actual time=10145.947..10314.729 rows=101159 loops=1)
Sort Key: da_client.doc_id
-> Bitmap Heap Scan on da da_client (cost=155.91..19942.58 rows=12818 width=4) (actual time=1165.917..9800.025 rows=101159 loops=1)
Recheck Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text))
-> Bitmap Index Scan on da_an_av_idx (cost=0.00..155.91 rows=12818 width=0) (actual time=1156.362..1156.362 rows=101159 loops=1)
Index Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text))
-> Sort (cost=19518.54..19548.09 rows=11817 width=4) (actual time=7117.330..7188.340 rows=101190 loops=1)
Sort Key: da_subtype.doc_id
-> Bitmap Heap Scan on da da_subtype (cost=143.90..18719.21 rows=11817 width=4) (actual time=913.707..6798.627 rows=101190 loops=1)
Recheck Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text))
-> Bitmap Index Scan on da_an_av_idx (cost=0.00..143.90 rows=11817 width=0) (actual time=904.720..904.720 rows=101190 loops=1)
Index Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text))
-> Index Scan using document_pkey on document doc (cost=0.00..3.60 rows=1 width=4) (actual time=1.762..1.763 rows=1 loops=10153)
Index Cond: (doc.id = "outer".doc_id)
-> Index Scan using da_pkey on da da_type (cost=0.00..3.55 rows=1 width=4) (actual time=1.451..1.452 rows=0 loops=10153)
Index Cond: (("outer".id = da_type.doc_id) AND (da_type.an = 'ssis_type'::text) AND (da_type.av = 'type1'::text))
-> Index Scan using da_pkey on da da_group (cost=0.00..3.55 rows=1 width=4) (actual time=0.028..0.028 rows=0 loops=1072)
Index Cond: (("outer".id = da_group.doc_id) AND (da_group.an = 'ssis_group'::text) AND (da_group.av = 'group1'::text))
Total runtime: 50296.288 ms
(24 rows)
Time: 50301.853 ms(originally from http://microjet.ath.cx/WebWiki/2005.12.11_PreferringNaturalToSurrogateKey.html)
No comments:
Post a Comment