It is really sad to see Jeff S. et. al fighting the horde of ignorants, but that is a battle that has been fought many times before and is not what I want to write about here.
What I want to show here is the difference in query time you'd expect to see by trimming down on unnecessary usage of surrogate keys.
A schema with unnecessary surrogate keys.
-- Done on postgresql 8.1 dms3_test=> \d document Table "dms4.document" Column | Type | Modifiers ---------------+--------------------------+------------------------------------------------------------- id | integer | not null default nextval('document_id_seq'::regclass) state | text | not null modified_who | text | not null modified_when | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone created_who | text | not null created_when | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone Indexes: "document_pkey" PRIMARY KEY, btree (id) "document_created_when" btree (created_when) "document_created_who" btree (created_who) "document_modified_when" btree (modified_when) "document_modified_who" btree (modified_who) "document_state" btree (state) Foreign-key constraints: "document_created_who_fkey" FOREIGN KEY (created_who) REFERENCES who(name) "document_modified_who_fkey" FOREIGN KEY (modified_who) REFERENCES who(name) "document_state_fkey" FOREIGN KEY (state) REFERENCES state(name) dms3_test=> \d attribute_name Table "public.attribute_name" Column | Type | Modifiers --------+---------+------------------------------------------------------------------ id | integer | not null default nextval(('attribute_name_seq'::text)::regclass) name | text | not null Indexes: "attribute_name_pkey" PRIMARY KEY, btree (id) "an_name_idx" UNIQUE, btree (name) dms3_test=> \d attribute Table "public.attribute" Column | Type | Modifiers ---------+---------+-------------------------------------------------------- id | integer | not null default nextval('attribute_id_seq'::regclass) name_id | integer | not null value | text | not null Indexes: "attribute_pkey" PRIMARY KEY, btree (id) "attr_nv_idx" UNIQUE, btree (name_id, value) Foreign-key constraints: "$1" FOREIGN KEY (name_id) REFERENCES attribute_name(id) dms3_test=> \d document_attribute Table "public.document_attribute" Column | Type | Modifiers --------------+---------+----------- doc_id | integer | not null attribute_id | integer | not null Indexes: "document_attribute_pkey" PRIMARY KEY, btree (doc_id, attribute_id) CLUSTER "da_ad_idx" UNIQUE, btree (attribute_id, doc_id) "da_attr_idx" btree (attribute_id) Foreign-key constraints: "$1" FOREIGN KEY (doc_id) REFERENCES public.document(id) "$2" FOREIGN KEY (attribute_id) REFERENCES attribute(id) dms3_test=> explain analyze SELECT doc.id FROM attribute as attr0, attribute_name as an0, attribute_name as an2, document_attribute as da2, document as doc, attribute_name as an1, attribute as attr3, document_attribute as da1, attribute as attr2, document_attribute as da0, attribute_name as an3, attribute as attr1, document_attribute as da3 WHERE da0.doc_id = doc.id AND attr0.id = da0.attribute_id AND an0.id = attr0.name_id AND da1.doc_id = doc.id AND attr1.id = da1.attribute_id AND an1.id = attr1.name_id AND da2.doc_id = doc.id AND attr2.id = da2.attribute_id AND an2.id = attr2.name_id AND da3.doc_id = doc.id AND attr3.id = da3.attribute_id AND an3.id = attr3.name_id AND an0.name = 'ssis_client' AND attr0.value = 'client1' AND an1.name = 'ssis_group' AND attr1.value = 'group1' AND an2.name = 'ssis_type' AND attr2.value = 'type1' AND an3.name = 'ssis_subtype' AND attr3.value = 'subtype1' ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=655695.94..886471.16 rows=1 width=4) (actual time=188571.542..217327.467 rows=3 loops=1) Join Filter: ("outer".id = "inner".name_id) -> Seq Scan on attribute_name an0 (cost=0.00..1.07 rows=1 width=4) (actual time=0.045..0.048 rows=1 loops=1) Filter: (name = 'ssis_client'::text) -> Nested Loop (cost=655695.94..886470.07 rows=1 width=8) (actual time=188571.488..217327.404 rows=3 loops=1) Join Filter: ("outer".id = "inner".name_id) -> Seq Scan on attribute_name an5 (cost=0.00..1.07 rows=1 width=4) (actual time=0.008..0.015 rows=1 loops=1) Filter: (name = 'ssis_e2xkey'::text) -> Nested Loop (cost=655695.94..886468.99 rows=1 width=12) (actual time=188571.475..217327.378 rows=3 loops=1) Join Filter: ("outer".id = "inner".name_id) -> Seq Scan on attribute_name an4 (cost=0.00..1.07 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1) Filter: (name = 'ssis_comment'::text) -> Nested Loop (cost=655695.94..886467.90 rows=1 width=16) (actual time=188571.453..217327.346 rows=3 loops=1) -> Hash Join (cost=655695.94..886463.56 rows=1 width=40) (actual time=183814.259..217326.874 rows=11 loops=1) Hash Cond: ("outer".name_id = "inner".id) -> Hash Join (cost=655694.86..886462.45 rows=4 width=44) (actual time=183814.220..217326.774 rows=11 loops=1) Hash Cond: (("outer".id = "inner".name_id) AND ("outer".attribute_id = "inner".id)) -> Hash Join (cost=655692.89..886452.06 rows=838 width=52) (actual time=183795.378..217307.410 rows=951 loops=1) Hash Cond: ("outer".attribute_id = "inner".id) -> Nested Loop (cost=655690.92..886280.53 rows=32236 width=52) (actual time=183664.609..217269.300 rows=97785 loops=1) -> Hash Join (cost=655688.96..885633.84 rows=16118 width=44) (actual time=183640.064..217012.129 rows=97785 loops=1) Hash Cond: ("outer".doc_id = "inner".id) -> Nested Loop (cost=0.00..191997.58 rows=7557225 width=12) (actual time=12.250..10810.601 rows=7557225 loops=1) -> Index Scan using attribute_name_pkey on attribute_name an3 (cost=0.00..3.08 rows=1 width=4) (actual time=12.218..12.232 rows=1 loops=1) Filter: (name = 'ssis_subtype'::text) -> Seq Scan on document_attribute da1 (cost=0.00..116422.25 rows=7557225 width=8) (actual time=0.021..4547.481 rows=7557225 loops=1) -> Hash (cost=655683.57..655683.57 rows=2155 width=32) (actual time=183565.310..183565.310 rows=0 loops=1) -> Nested Loop (cost=462162.26..655683.57 rows=2155 width=32) (actual time=128403.334..183556.998 rows=9257 loops=1) -> Hash Join (cost=462162.26..650853.43 rows=288 width=24) (actual time=128403.319..183498.454 rows=906 loops=1) Hash Cond: ("outer".attribute_id = "inner".id) -> Hash Join (cost=462160.29..650793.18 rows=11080 width=24) (actual time=128337.287..183447.336 rows=90804 loops=1) Hash Cond: ("outer".name_id = "inner".id) -> Nested Loop (cost=462159.22..650348.93 rows=66476 width=28) (actual time=128337.257..183307.650 rows=90804 loops=1) -> Hash Join (cost=462159.22..501285.43 rows=8888 width=20) (actual time=128306.517..132366.000 rows=8840 loops=1) Hash Cond: ("outer".id = "inner".doc_id) -> Hash Join (cost=1.06..25599.40 rows=202087 width=4) (actual time=41.568..2754.244 rows=329853 loops=1) Hash Cond: ("outer".state_id = "inner".id) -> Seq Scan on document doc (cost=0.00..18525.31 rows=1010431 width=8) (actual time=18.767..2089.253 rows=1010431 loops=1) -> Hash (cost=1.06..1.06 rows=1 width=4) (actual time=14.681..14.681 rows=0 loops=1) -> Seq Scan on state (cost=0.00..1.06 rows=1 width=4) (actual time=0.031..0.035 rows=1 loops=1) Filter: (name = 'associated'::text) -> Hash (cost=461808.06..461808.06 rows=44437 width=16) (actual time=128239.687..128239.687 rows=0 loops=1) -> Hash Join (cost=157528.96..461808.06 rows=44437 width=16) (actual time=15479.239..128139.028 rows=27087 loops=1) Hash Cond: ("outer".attribute_id = "inner".id) -> Hash Join (cost=157526.99..452807.67 rows=1710811 width=16) (actual time=15478.512..127203.588 rows=2313438 loops=1) Hash Cond: ("outer".doc_id = "inner".doc_id) -> Seq Scan on document_attribute da2 (cost=0.00..116422.25 rows=7557225 width=8) (actual time=0.019..37889.434 rows=7557225 loops=1) -> Hash (cost=156173.26..156173.26 rows=196292 width=8) (actual time=14908.286..14908.286 rows=0 loops=1) -> Hash Join (cost=1.97..156173.26 rows=196292 width=8) (actual time=7.750..14650.188 rows=269366 loops=1) Hash Cond: ("outer".attribute_id = "inner".id) -> Seq Scan on document_attribute da4 (cost=0.00..116422.25 rows=7557225 width=8) (actual time=7.678..11635.892 rows=7557225 loops=1) -> Hash (cost=1.96..1.96 rows=2 width=8) (actual time=0.049..0.049 rows=0 loops=1) -> Seq Scan on attribute attr4 (cost=0.00..1.96 rows=2 width=8) (actual time=0.008..0.047 rows=1 loops=1) Filter: (value = 'comment1'::text) -> Hash (cost=1.96..1.96 rows=2 width=8) (actual time=0.046..0.046 rows=0 loops=1) -> Seq Scan on attribute attr2 (cost=0.00..1.96 rows=2 width=8) (actual time=0.007..0.044 rows=1 loops=1) Filter: (value = 'type1'::text) -> Index Scan using document_attribute_pkey on document_attribute da0 (cost=0.00..16.67 rows=8 width=8) (actual time=5.713..5.748 rows=10 loops=8840) Index Cond: ("outer".doc_id = da0.doc_id) -> Hash (cost=1.07..1.07 rows=1 width=4) (actual time=0.011..0.011 rows=0 loops=1) -> Seq Scan on attribute_name an2 (cost=0.00..1.07 rows=1 width=4) (actual time=0.007..0.010 rows=1 loops=1) Filter: (name = 'ssis_type'::text) -> Hash (cost=1.96..1.96 rows=2 width=8) (actual time=0.052..0.052 rows=0 loops=1) -> Seq Scan on attribute attr0 (cost=0.00..1.96 rows=2 width=8) (actual time=0.014..0.050 rows=1 loops=1) Filter: (value = 'client1'::text) -> Index Scan using document_attribute_pkey on document_attribute da3 (cost=0.00..16.67 rows=8 width=8) (actual time=0.005..0.049 rows=10 loops=906) Index Cond: (da3.doc_id = "outer".doc_id) -> Materialize (cost=1.96..1.98 rows=2 width=8) (actual time=0.000..0.001 rows=1 loops=97785) -> Seq Scan on attribute attr5 (cost=0.00..1.96 rows=2 width=8) (actual time=18.498..18.537 rows=1 loops=1) Filter: (value = 'e2xkey1'::text) -> Hash (cost=1.96..1.96 rows=2 width=8) (actual time=0.045..0.045 rows=0 loops=1) -> Seq Scan on attribute attr1 (cost=0.00..1.96 rows=2 width=8) (actual time=0.007..0.044 rows=1 loops=1) Filter: (value = 'group1'::text) -> Hash (cost=1.96..1.96 rows=2 width=8) (actual time=18.781..18.781 rows=0 loops=1) -> Seq Scan on attribute attr3 (cost=0.00..1.96 rows=2 width=8) (actual time=18.739..18.779 rows=1 loops=1) Filter: (value = 'subtype1'::text) -> Hash (cost=1.07..1.07 rows=1 width=4) (actual time=0.018..0.018 rows=0 loops=1) -> Seq Scan on attribute_name an1 (cost=0.00..1.07 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1) Filter: (name = 'ssis_group'::text) -> Index Scan using document_attribute_pkey on document_attribute da5 (cost=0.00..4.33 rows=1 width=8) (actual time=0.037..0.037 rows=0 loops=11) Index Cond: ((da5.doc_id = "outer".doc_id) AND ("outer".id = da5.attribute_id)) Total runtime: 217405.692 ms (82 rows) Time: 217885.006 msThe same data but in a schema without trimmed-down surrogate keys.
-- Done on postgresql 8.1 dms3_test=> \d document Table "dms4.document" Column | Type | Modifiers ---------------+--------------------------+------------------------------------------------------------- id | integer | not null default nextval('document_id_seq'::regclass) state | text | not null modified_who | text | not null modified_when | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone created_who | text | not null created_when | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone Indexes: "document_pkey" PRIMARY KEY, btree (id) "document_created_when" btree (created_when) "document_created_who" btree (created_who) "document_modified_when" btree (modified_when) "document_modified_who" btree (modified_who) "document_state" btree (state) Foreign-key constraints: "document_created_who_fkey" FOREIGN KEY (created_who) REFERENCES who(name) "document_modified_who_fkey" FOREIGN KEY (modified_who) REFERENCES who(name) "document_state_fkey" FOREIGN KEY (state) REFERENCES state(name) dms3_test=> \d temp20051206.da Table "temp20051206.da" Column | Type | Modifiers --------+---------+----------- doc_id | integer | not null an | text | not null av | text | not null Indexes: "da_pkey" PRIMARY KEY, btree (doc_id, an, av) "da_an_av_idx" UNIQUE, btree (an, av, doc_id) Foreign-key constraints: "da_an_fkey" FOREIGN KEY (an, av) REFERENCES attr(name, value) DEFERRABLE INITIALLY DEFERRED "da_doc_id_fkey" FOREIGN KEY (doc_id) REFERENCES public.document(id) DEFERRABLE INITIALLY DEFERRED dms3_test=> explain analyze select doc.id from document as doc, da da_client, da da_group, da da_type, da da_subtype WHERE doc.id=da_client.doc_id and doc.id=da_group.doc_id and doc.id=da_type.doc_id and doc.id=da_subtype.doc_id and da_client.an='ssis_client' and da_client.av='client1' and da_group.an='ssis_group' and da_group.av='group1' and da_type.an='ssis_type' and da_type.av='type1' and da_subtype.an='ssis_subtype' and da_subtype.av='subtype1' ; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=40335.69..53871.76 rows=1 width=4) (actual time=17319.458..50292.429 rows=140 loops=1) -> Nested Loop (cost=40335.69..53864.64 rows=2 width=16) (actual time=17319.392..50260.780 rows=1072 loops=1) -> Nested Loop (cost=40335.69..53330.49 rows=150 width=12) (actual time=17289.476..35508.467 rows=10153 loops=1) -> Merge Join (cost=40335.69..40494.44 rows=3558 width=8) (actual time=17263.290..17594.400 rows=10153 loops=1) Merge Cond: ("outer".doc_id = "inner".doc_id) -> Sort (cost=20817.14..20849.19 rows=12818 width=4) (actual time=10145.947..10314.729 rows=101159 loops=1) Sort Key: da_client.doc_id -> Bitmap Heap Scan on da da_client (cost=155.91..19942.58 rows=12818 width=4) (actual time=1165.917..9800.025 rows=101159 loops=1) Recheck Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text)) -> Bitmap Index Scan on da_an_av_idx (cost=0.00..155.91 rows=12818 width=0) (actual time=1156.362..1156.362 rows=101159 loops=1) Index Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text)) -> Sort (cost=19518.54..19548.09 rows=11817 width=4) (actual time=7117.330..7188.340 rows=101190 loops=1) Sort Key: da_subtype.doc_id -> Bitmap Heap Scan on da da_subtype (cost=143.90..18719.21 rows=11817 width=4) (actual time=913.707..6798.627 rows=101190 loops=1) Recheck Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text)) -> Bitmap Index Scan on da_an_av_idx (cost=0.00..143.90 rows=11817 width=0) (actual time=904.720..904.720 rows=101190 loops=1) Index Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text)) -> Index Scan using document_pkey on document doc (cost=0.00..3.60 rows=1 width=4) (actual time=1.762..1.763 rows=1 loops=10153) Index Cond: (doc.id = "outer".doc_id) -> Index Scan using da_pkey on da da_type (cost=0.00..3.55 rows=1 width=4) (actual time=1.451..1.452 rows=0 loops=10153) Index Cond: (("outer".id = da_type.doc_id) AND (da_type.an = 'ssis_type'::text) AND (da_type.av = 'type1'::text)) -> Index Scan using da_pkey on da da_group (cost=0.00..3.55 rows=1 width=4) (actual time=0.028..0.028 rows=0 loops=1072) Index Cond: (("outer".id = da_group.doc_id) AND (da_group.an = 'ssis_group'::text) AND (da_group.av = 'group1'::text)) Total runtime: 50296.288 ms (24 rows) Time: 50301.853 ms
(originally from http://microjet.ath.cx/WebWiki/2005.12.11_PreferringNaturalToSurrogateKey.html)
No comments:
Post a Comment