2005-12-27

Using symbols for the wrong reason

The concept of symbols has long been familiar in the Lisp community, yet few programmers outside it knew about symbols, mainly due to general ignorance.
Then Ruby came along and made symbols a commodity programming construct. People who were not aware of symbols now are.

And they keep asking about them. Hardly a week goes by on the Ruby mailing list without a question about symbols: what are they, and what are they good for?

Many people have tried to answer that, but the answer has been along the lines presented in these two articles: http://glu.ttono.us/articles/2005/08/19/understanding-ruby-symbols and http://zephyrfalcon.org/weblog2/arch_e10_00850.html#e857.

Those answers put undue emphasis on the way the current Ruby VM implements symbols. A Ruby string is mutable, and it is not implemented efficiently in the current Ruby VM; hence the advice: use symbols when you want efficient, immutable, string-like objects.
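For what it is worth, the implementation-centric claim is easy to demonstrate in current Ruby (a small sketch, not an endorsement of the reasoning):

```ruby
# Two identical string literals are two distinct, mutable objects;
# a symbol literal always denotes one and the same object.
a = "host"
b = "host"
same_strings = a.equal?(b)          # object identity, not content equality
same_symbols = :host.equal?(:host)

puts same_strings   # false: each literal built a fresh String
puts same_symbols   # true: there is only one :host
```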

That is not wrong, and it is correct for the current Ruby VM. However, I think it is a misguided answer to the question. The answer should instead put the emphasis on the programmer's intention.
Rubyists use the #each() method more often than a for loop because it clarifies their intention of iterating over a sequence, even though they could have used the more efficient for loop, or even the if and goto constructs.

One does not tell another to use if and goto over a for loop for iterating over a sequence simply because if and goto may be more efficient. No matter how inefficiently a compiler/interpreter implements the for construct, the possibility of an efficient for loop implementation remains. In fact, by using the for construct you make it easier for the compiler to deduce your intent of looping, and if certain conditions are met (e.g., a loop closed over a known number of iterations), it could unroll your loop for better performance if your system allows it.
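To make the analogy concrete, here is the same trivial iteration written both ways:

```ruby
ports = [80, 443, 8080]

# The for construct: general-purpose control flow that happens to iterate.
doubled_for = []
for p in ports
  doubled_for << p * 2
end

# The #each method: the intent "visit every element" is stated up front,
# and the mechanics of iteration are left to the collection.
doubled_each = []
ports.each { |p| doubled_each << p * 2 }
```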

In short, any answer that depends on a particular implementation is doomed to be short-lived. What happens if the next Ruby VM implements COW (copy-on-write) strings? A COW string would share its initial instance; only if there is a modification is the initial instance copied, with the modification performed on the copy. As long as one does not try to modify a COW string instance, it can be as efficient as the current VM's symbols. In other words, any answer that rallies around so-called efficiency while abandoning intent will become obsolete, followed by a new scramble for an updated answer.

Thus, I finally come to say that one should not use symbols merely for an efficiency gain. Symbols are not meant to be immutable string-like objects. They are really meant for constructing user-defined identifiers, the users in this case being the programmers.
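A couple of places where symbols serve as programmer-defined identifiers (the Server class here is made up for illustration):

```ruby
# Symbols name program entities: methods, accessors, options.
class Server
  attr_accessor :host, :port   # :host and :port identify which accessors to define

  def initialize
    @host = 'localhost'
    @port = 80
  end
end

s = Server.new
s.send(:host)         # => "localhost"; the symbol identifies which method to call
s.respond_to?(:port)  # => true; and which method to ask about
```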

Consider:
foo1 = {
   :host => 'localhost',
   :port => 80
}
foo2 = {
   'host' => 'localhost',
   'port' => 80
}
In foo1, symbols are used to identify the data that follows them. The string 'localhost' is identified as a host, and 80 is not just any number but a port number.

In foo2, it is a bit unclear what purpose 'host' and 'port' serve. Is foo2 a macro replacement list? That is, if the program reads the string 'host', is it replaced with 'localhost'? What is the purpose of 'host' there? Is it an identifier for the string 'localhost'?

The programming world should borrow the real estate adage of "location, location, location" and translate it to "intention, intention, intention". It is the main reason comments that clarify the programmer's intention are so valuable. It is the main reason there is a variety of language constructs. It should also be the main reason for you to decide whether or not to use symbols.

2005.12.28 update: I am glad that not everyone has resorted to dumbing down the concept of symbols. http://onestepback.org/index.cgi/Tech/Ruby/SymbolsAreNotImmutableStrings.red
2006.01.06 update: What amazing interest in symbols! I don't think I've seen any single topic in Ruby generate 142 posts in one thread before this.
http://groups.google.com/group/comp.lang.ruby/browse_frm/thread/164ae5f5cbbac02e?q=differences+between+%3Afoo&hl=en&
2007.07.03 update: a related article: 2007.07.03_WhatAreSymbols

(originally from http://microjet.ath.cx/WebWiki/2005.12.27_UsingSymbolsForTheWrongReason.html)

2005-12-11

Preferring natural to surrogate key

I was reading an article at The Daily WTF today. The OP (original post) sparked a debate on the use of surrogate vs. natural keys.


It is really sad to see Jeff S. et al. fighting the horde of the ignorant, but that is a battle that has been fought many times before, and it is not what I want to write about here.

What I want to show here is the difference in query time you can expect from trimming down unnecessary use of surrogate keys.

A schema with unnecessary surrogate keys.
 
-- Done on postgresql 8.1

dms3_test=> \d document
                                         Table "dms4.document"
    Column     |           Type           |                          Modifiers                          
---------------+--------------------------+-------------------------------------------------------------
 id            | integer                  | not null default nextval('document_id_seq'::regclass)
 state         | text                     | not null
 modified_who  | text                     | not null
 modified_when | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
 created_who   | text                     | not null
 created_when  | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
Indexes:
    "document_pkey" PRIMARY KEY, btree (id)
    "document_created_when" btree (created_when)
    "document_created_who" btree (created_who)
    "document_modified_when" btree (modified_when)
    "document_modified_who" btree (modified_who)
    "document_state" btree (state)
Foreign-key constraints:
    "document_created_who_fkey" FOREIGN KEY (created_who) REFERENCES who(name)
    "document_modified_who_fkey" FOREIGN KEY (modified_who) REFERENCES who(name)
    "document_state_fkey" FOREIGN KEY (state) REFERENCES state(name)

dms3_test=> \d attribute_name
                            Table "public.attribute_name"
 Column |  Type   |                            Modifiers                             
--------+---------+------------------------------------------------------------------
 id     | integer | not null default nextval(('attribute_name_seq'::text)::regclass)
 name   | text    | not null
Indexes:
    "attribute_name_pkey" PRIMARY KEY, btree (id)
    "an_name_idx" UNIQUE, btree (name)

dms3_test=> \d attribute
                          Table "public.attribute"
 Column  |  Type   |                       Modifiers                        
---------+---------+--------------------------------------------------------
 id      | integer | not null default nextval('attribute_id_seq'::regclass)
 name_id | integer | not null
 value   | text    | not null
Indexes:
    "attribute_pkey" PRIMARY KEY, btree (id)
    "attr_nv_idx" UNIQUE, btree (name_id, value)
Foreign-key constraints:
    "$1" FOREIGN KEY (name_id) REFERENCES attribute_name(id)


dms3_test=> \d document_attribute
 Table "public.document_attribute"
    Column    |  Type   | Modifiers 
--------------+---------+-----------
 doc_id       | integer | not null
 attribute_id | integer | not null
Indexes:
    "document_attribute_pkey" PRIMARY KEY, btree (doc_id, attribute_id) CLUSTER
    "da_ad_idx" UNIQUE, btree (attribute_id, doc_id)
    "da_attr_idx" btree (attribute_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (doc_id) REFERENCES public.document(id)
    "$2" FOREIGN KEY (attribute_id) REFERENCES attribute(id)


dms3_test=> 

explain analyze SELECT doc.id
FROM document AS doc,
     attribute AS attr0, attribute AS attr1, attribute AS attr2, attribute AS attr3,
     attribute_name AS an0, attribute_name AS an1, attribute_name AS an2, attribute_name AS an3,
     document_attribute AS da0, document_attribute AS da1, document_attribute AS da2, document_attribute AS da3

WHERE 
da0.doc_id = doc.id AND attr0.id = da0.attribute_id AND an0.id = attr0.name_id AND 
da1.doc_id = doc.id AND attr1.id = da1.attribute_id AND an1.id = attr1.name_id AND 
da2.doc_id = doc.id AND attr2.id = da2.attribute_id AND an2.id = attr2.name_id AND 
da3.doc_id = doc.id AND attr3.id = da3.attribute_id AND an3.id = attr3.name_id AND 
an0.name = 'ssis_client' AND attr0.value = 'client1' AND
an1.name = 'ssis_group' AND attr1.value = 'group1' AND
an2.name = 'ssis_type' AND attr2.value = 'type1' AND
an3.name = 'ssis_subtype' AND attr3.value = 'subtype1'
;


                                                                                                                         QUERY PLAN                                                                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=655695.94..886471.16 rows=1 width=4) (actual time=188571.542..217327.467 rows=3 loops=1)
   Join Filter: ("outer".id = "inner".name_id)
   ->  Seq Scan on attribute_name an0  (cost=0.00..1.07 rows=1 width=4) (actual time=0.045..0.048 rows=1 loops=1)
         Filter: (name = 'ssis_client'::text)
   ->  Nested Loop  (cost=655695.94..886470.07 rows=1 width=8) (actual time=188571.488..217327.404 rows=3 loops=1)
         Join Filter: ("outer".id = "inner".name_id)
         ->  Seq Scan on attribute_name an5  (cost=0.00..1.07 rows=1 width=4) (actual time=0.008..0.015 rows=1 loops=1)
               Filter: (name = 'ssis_e2xkey'::text)
         ->  Nested Loop  (cost=655695.94..886468.99 rows=1 width=12) (actual time=188571.475..217327.378 rows=3 loops=1)
               Join Filter: ("outer".id = "inner".name_id)
               ->  Seq Scan on attribute_name an4  (cost=0.00..1.07 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)
                     Filter: (name = 'ssis_comment'::text)
               ->  Nested Loop  (cost=655695.94..886467.90 rows=1 width=16) (actual time=188571.453..217327.346 rows=3 loops=1)
                     ->  Hash Join  (cost=655695.94..886463.56 rows=1 width=40) (actual time=183814.259..217326.874 rows=11 loops=1)
                           Hash Cond: ("outer".name_id = "inner".id)
                           ->  Hash Join  (cost=655694.86..886462.45 rows=4 width=44) (actual time=183814.220..217326.774 rows=11 loops=1)
                                 Hash Cond: (("outer".id = "inner".name_id) AND ("outer".attribute_id = "inner".id))
                                 ->  Hash Join  (cost=655692.89..886452.06 rows=838 width=52) (actual time=183795.378..217307.410 rows=951 loops=1)
                                       Hash Cond: ("outer".attribute_id = "inner".id)
                                       ->  Nested Loop  (cost=655690.92..886280.53 rows=32236 width=52) (actual time=183664.609..217269.300 rows=97785 loops=1)
                                             ->  Hash Join  (cost=655688.96..885633.84 rows=16118 width=44) (actual time=183640.064..217012.129 rows=97785 loops=1)
                                                   Hash Cond: ("outer".doc_id = "inner".id)
                                                   ->  Nested Loop  (cost=0.00..191997.58 rows=7557225 width=12) (actual time=12.250..10810.601 rows=7557225 loops=1)
                                                         ->  Index Scan using attribute_name_pkey on attribute_name an3  (cost=0.00..3.08 rows=1 width=4) (actual time=12.218..12.232 rows=1 loops=1)
                                                               Filter: (name = 'ssis_subtype'::text)
                                                         ->  Seq Scan on document_attribute da1  (cost=0.00..116422.25 rows=7557225 width=8) (actual time=0.021..4547.481 rows=7557225 loops=1)
                                                   ->  Hash  (cost=655683.57..655683.57 rows=2155 width=32) (actual time=183565.310..183565.310 rows=0 loops=1)
                                                         ->  Nested Loop  (cost=462162.26..655683.57 rows=2155 width=32) (actual time=128403.334..183556.998 rows=9257 loops=1)
                                                               ->  Hash Join  (cost=462162.26..650853.43 rows=288 width=24) (actual time=128403.319..183498.454 rows=906 loops=1)
                                                                     Hash Cond: ("outer".attribute_id = "inner".id)
                                                                     ->  Hash Join  (cost=462160.29..650793.18 rows=11080 width=24) (actual time=128337.287..183447.336 rows=90804 loops=1)
                                                                           Hash Cond: ("outer".name_id = "inner".id)
                                                                           ->  Nested Loop  (cost=462159.22..650348.93 rows=66476 width=28) (actual time=128337.257..183307.650 rows=90804 loops=1)
                                                                                 ->  Hash Join  (cost=462159.22..501285.43 rows=8888 width=20) (actual time=128306.517..132366.000 rows=8840 loops=1)
                                                                                       Hash Cond: ("outer".id = "inner".doc_id)
                                                                                       ->  Hash Join  (cost=1.06..25599.40 rows=202087 width=4) (actual time=41.568..2754.244 rows=329853 loops=1)
                                                                                             Hash Cond: ("outer".state_id = "inner".id)
                                                                                             ->  Seq Scan on document doc  (cost=0.00..18525.31 rows=1010431 width=8) (actual time=18.767..2089.253 rows=1010431 loops=1)
                                                                                             ->  Hash  (cost=1.06..1.06 rows=1 width=4) (actual time=14.681..14.681 rows=0 loops=1)
                                                                                                   ->  Seq Scan on state  (cost=0.00..1.06 rows=1 width=4) (actual time=0.031..0.035 rows=1 loops=1)
                                                                                                         Filter: (name = 'associated'::text)
                                                                                       ->  Hash  (cost=461808.06..461808.06 rows=44437 width=16) (actual time=128239.687..128239.687 rows=0 loops=1)
                                                                                             ->  Hash Join  (cost=157528.96..461808.06 rows=44437 width=16) (actual time=15479.239..128139.028 rows=27087 loops=1)
                                                                                                   Hash Cond: ("outer".attribute_id = "inner".id)
                                                                                                   ->  Hash Join  (cost=157526.99..452807.67 rows=1710811 width=16) (actual time=15478.512..127203.588 rows=2313438 loops=1)
                                                                                                         Hash Cond: ("outer".doc_id = "inner".doc_id)
                                                                                                         ->  Seq Scan on document_attribute da2  (cost=0.00..116422.25 rows=7557225 width=8) (actual time=0.019..37889.434 rows=7557225 loops=1)
                                                                                                         ->  Hash  (cost=156173.26..156173.26 rows=196292 width=8) (actual time=14908.286..14908.286 rows=0 loops=1)
                                                                                                               ->  Hash Join  (cost=1.97..156173.26 rows=196292 width=8) (actual time=7.750..14650.188 rows=269366 loops=1)
                                                                                                                     Hash Cond: ("outer".attribute_id = "inner".id)
                                                                                                                     ->  Seq Scan on document_attribute da4  (cost=0.00..116422.25 rows=7557225 width=8) (actual time=7.678..11635.892 rows=7557225 loops=1)
                                                                                                                     ->  Hash  (cost=1.96..1.96 rows=2 width=8) (actual time=0.049..0.049 rows=0 loops=1)
                                                                                                                           ->  Seq Scan on attribute attr4  (cost=0.00..1.96 rows=2 width=8) (actual time=0.008..0.047 rows=1 loops=1)
                                                                                                                                 Filter: (value = 'comment1'::text)
                                                                                                   ->  Hash  (cost=1.96..1.96 rows=2 width=8) (actual time=0.046..0.046 rows=0 loops=1)
                                                                                                         ->  Seq Scan on attribute attr2  (cost=0.00..1.96 rows=2 width=8) (actual time=0.007..0.044 rows=1 loops=1)
                                                                                                               Filter: (value = 'type1'::text)
                                                                                 ->  Index Scan using document_attribute_pkey on document_attribute da0  (cost=0.00..16.67 rows=8 width=8) (actual time=5.713..5.748 rows=10 loops=8840)
                                                                                       Index Cond: ("outer".doc_id = da0.doc_id)
                                                                           ->  Hash  (cost=1.07..1.07 rows=1 width=4) (actual time=0.011..0.011 rows=0 loops=1)
                                                                                 ->  Seq Scan on attribute_name an2  (cost=0.00..1.07 rows=1 width=4) (actual time=0.007..0.010 rows=1 loops=1)
                                                                                       Filter: (name = 'ssis_type'::text)
                                                                     ->  Hash  (cost=1.96..1.96 rows=2 width=8) (actual time=0.052..0.052 rows=0 loops=1)
                                                                           ->  Seq Scan on attribute attr0  (cost=0.00..1.96 rows=2 width=8) (actual time=0.014..0.050 rows=1 loops=1)
                                                                                 Filter: (value = 'client1'::text)
                                                               ->  Index Scan using document_attribute_pkey on document_attribute da3  (cost=0.00..16.67 rows=8 width=8) (actual time=0.005..0.049 rows=10 loops=906)
                                                                     Index Cond: (da3.doc_id = "outer".doc_id)
                                             ->  Materialize  (cost=1.96..1.98 rows=2 width=8) (actual time=0.000..0.001 rows=1 loops=97785)
                                                   ->  Seq Scan on attribute attr5  (cost=0.00..1.96 rows=2 width=8) (actual time=18.498..18.537 rows=1 loops=1)
                                                         Filter: (value = 'e2xkey1'::text)
                                       ->  Hash  (cost=1.96..1.96 rows=2 width=8) (actual time=0.045..0.045 rows=0 loops=1)
                                             ->  Seq Scan on attribute attr1  (cost=0.00..1.96 rows=2 width=8) (actual time=0.007..0.044 rows=1 loops=1)
                                                   Filter: (value = 'group1'::text)
                                 ->  Hash  (cost=1.96..1.96 rows=2 width=8) (actual time=18.781..18.781 rows=0 loops=1)
                                       ->  Seq Scan on attribute attr3  (cost=0.00..1.96 rows=2 width=8) (actual time=18.739..18.779 rows=1 loops=1)
                                             Filter: (value = 'subtype1'::text)
                           ->  Hash  (cost=1.07..1.07 rows=1 width=4) (actual time=0.018..0.018 rows=0 loops=1)
                                 ->  Seq Scan on attribute_name an1  (cost=0.00..1.07 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)
                                       Filter: (name = 'ssis_group'::text)
                     ->  Index Scan using document_attribute_pkey on document_attribute da5  (cost=0.00..4.33 rows=1 width=8) (actual time=0.037..0.037 rows=0 loops=11)
                           Index Cond: ((da5.doc_id = "outer".doc_id) AND ("outer".id = da5.attribute_id))
 Total runtime: 217405.692 ms
(82 rows)

Time: 217885.006 ms

The same data, but in a schema with the unnecessary surrogate keys trimmed away.
-- Done on postgresql 8.1

dms3_test=> \d document
                                         Table "dms4.document"
    Column     |           Type           |                          Modifiers                          
---------------+--------------------------+-------------------------------------------------------------
 id            | integer                  | not null default nextval('document_id_seq'::regclass)
 state         | text                     | not null
 modified_who  | text                     | not null
 modified_when | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
 created_who   | text                     | not null
 created_when  | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
Indexes:
    "document_pkey" PRIMARY KEY, btree (id)
    "document_created_when" btree (created_when)
    "document_created_who" btree (created_who)
    "document_modified_when" btree (modified_when)
    "document_modified_who" btree (modified_who)
    "document_state" btree (state)
Foreign-key constraints:
    "document_created_who_fkey" FOREIGN KEY (created_who) REFERENCES who(name)
    "document_modified_who_fkey" FOREIGN KEY (modified_who) REFERENCES who(name)
    "document_state_fkey" FOREIGN KEY (state) REFERENCES state(name)

dms3_test=> \d temp20051206.da
   Table "temp20051206.da"
 Column |  Type   | Modifiers 
--------+---------+-----------
 doc_id | integer | not null
 an     | text    | not null
 av     | text    | not null
Indexes:
    "da_pkey" PRIMARY KEY, btree (doc_id, an, av)
    "da_an_av_idx" UNIQUE, btree (an, av, doc_id)
Foreign-key constraints:
    "da_an_fkey" FOREIGN KEY (an, av) REFERENCES attr(name, value) DEFERRABLE INITIALLY DEFERRED
    "da_doc_id_fkey" FOREIGN KEY (doc_id) REFERENCES public.document(id) DEFERRABLE INITIALLY DEFERRED

dms3_test=> 

explain analyze select doc.id
from document as doc, da da_client, da da_group, da da_type, da da_subtype

WHERE
doc.id=da_client.doc_id and doc.id=da_group.doc_id and
doc.id=da_type.doc_id and doc.id=da_subtype.doc_id and
da_client.an='ssis_client' and da_client.av='client1' and 
da_group.an='ssis_group' and da_group.av='group1' and
da_type.an='ssis_type' and da_type.av='type1' and 
da_subtype.an='ssis_subtype' and da_subtype.av='subtype1'
;

                                                                             QUERY PLAN                                                                             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=40335.69..53871.76 rows=1 width=4) (actual time=17319.458..50292.429 rows=140 loops=1)
   ->  Nested Loop  (cost=40335.69..53864.64 rows=2 width=16) (actual time=17319.392..50260.780 rows=1072 loops=1)
         ->  Nested Loop  (cost=40335.69..53330.49 rows=150 width=12) (actual time=17289.476..35508.467 rows=10153 loops=1)
               ->  Merge Join  (cost=40335.69..40494.44 rows=3558 width=8) (actual time=17263.290..17594.400 rows=10153 loops=1)
                     Merge Cond: ("outer".doc_id = "inner".doc_id)
                     ->  Sort  (cost=20817.14..20849.19 rows=12818 width=4) (actual time=10145.947..10314.729 rows=101159 loops=1)
                           Sort Key: da_client.doc_id
                           ->  Bitmap Heap Scan on da da_client  (cost=155.91..19942.58 rows=12818 width=4) (actual time=1165.917..9800.025 rows=101159 loops=1)
                                 Recheck Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text))
                                 ->  Bitmap Index Scan on da_an_av_idx  (cost=0.00..155.91 rows=12818 width=0) (actual time=1156.362..1156.362 rows=101159 loops=1)
                                       Index Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text))
                     ->  Sort  (cost=19518.54..19548.09 rows=11817 width=4) (actual time=7117.330..7188.340 rows=101190 loops=1)
                           Sort Key: da_subtype.doc_id
                           ->  Bitmap Heap Scan on da da_subtype  (cost=143.90..18719.21 rows=11817 width=4) (actual time=913.707..6798.627 rows=101190 loops=1)
                                 Recheck Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text))
                                 ->  Bitmap Index Scan on da_an_av_idx  (cost=0.00..143.90 rows=11817 width=0) (actual time=904.720..904.720 rows=101190 loops=1)
                                       Index Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text))
               ->  Index Scan using document_pkey on document doc  (cost=0.00..3.60 rows=1 width=4) (actual time=1.762..1.763 rows=1 loops=10153)
                     Index Cond: (doc.id = "outer".doc_id)
         ->  Index Scan using da_pkey on da da_type  (cost=0.00..3.55 rows=1 width=4) (actual time=1.451..1.452 rows=0 loops=10153)
               Index Cond: (("outer".id = da_type.doc_id) AND (da_type.an = 'ssis_type'::text) AND (da_type.av = 'type1'::text))
   ->  Index Scan using da_pkey on da da_group  (cost=0.00..3.55 rows=1 width=4) (actual time=0.028..0.028 rows=0 loops=1072)
         Index Cond: (("outer".id = da_group.doc_id) AND (da_group.an = 'ssis_group'::text) AND (da_group.av = 'group1'::text))
 Total runtime: 50296.288 ms
(24 rows)

Time: 50301.853 ms
 
 
(originally from http://microjet.ath.cx/WebWiki/2005.12.11_PreferringNaturalToSurrogateKey.html)

Table partitioning

PostgreSQL could benefit from an implementation of table partitioning. Also see: http://groups.google.com/group/pgsql.general/tree/browse_frm/thread/e72ead84aa16bc8e/64d61f78425b119b?rnum=1&q=partition&_done=%2Fgroup%2Fpgsql.general%2Fbrowse_frm%2Fthread%2Fe72ead84aa16bc8e%2F121bab4de037ce6a%3Fq%3Dpartition%26rnum%3D7%26#doc_64d61f78425b119b
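As of 8.1 the closest approximation is table inheritance combined with constraint exclusion. A hypothetical sketch (the partition names here are invented for illustration):

```sql
-- Hypothetical sketch: partition the da table by attribute name using
-- inheritance, which is what PostgreSQL 8.1 offers in lieu of true
-- declarative partitioning.
CREATE TABLE da_ssis_client (
    CHECK (an = 'ssis_client')
) INHERITS (da);

CREATE TABLE da_ssis_group (
    CHECK (an = 'ssis_group')
) INHERITS (da);

-- With constraint exclusion enabled (new in 8.1, off by default), the
-- planner can skip child tables whose CHECK constraint contradicts the
-- query's WHERE clause.
SET constraint_exclusion = on;
SELECT doc_id FROM da WHERE an = 'ssis_client' AND av = 'client1';
```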

All in one table

dms3_test=> select count(doc.id) from document as doc, da da_client, da
da_group, da da_type, da da_subtype where doc.id=da_client.doc_id and
doc.id=da_group.doc_id and doc.id=da_type.doc_id and
doc.id=da_subtype.doc_id and da_client.an='ssis_client' and
da_group.an='ssis_group' and da_type.an='ssis_type' and
da_subtype.an='ssis_subtype' and da_client.av='client1' and
da_group.av='group1' and da_type.av='type1' and da_subtype.av='subtype1';

 count 
-------
   140
(1 row)

Time: 60882.964 ms


Separated out
 
 
dms3_test=> select count(doc.id) from document as doc, ssis_client as sc,
ssis_group as sg, ssis_type as st, ssis_subtype as sst where
doc.id=sc.doc_id and doc.id=sg.doc_id and doc.id=st.doc_id and
doc.id=sst.doc_id and sc.value='client1' and sg.value='group1' and
st.value='type1' and sst.value='subtype1';

 count 
-------
   140
(1 row)

Time: 3912.358 ms
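The "separated out" timing comes from splitting the single da table into one table per attribute name. The queries above imply tables shaped roughly like this (a reconstruction; the actual DDL is not shown):

```sql
-- Hypothetical reconstruction of the per-attribute tables.
-- Each holds only the (doc_id, value) pairs for one attribute name,
-- so the attribute-name column and its index disappear entirely.
CREATE TABLE ssis_client (
    doc_id integer NOT NULL REFERENCES document(id),
    value  text    NOT NULL,
    PRIMARY KEY (doc_id, value)
);
-- ssis_group, ssis_type, and ssis_subtype follow the same shape.
```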

Explain analyze of all in one table
 
 
dms3_test=> explain analyze select doc.id from document as doc, da
da_client, da da_group, da da_type, da da_subtype where
doc.id=da_client.doc_id and doc.id=da_group.doc_id and
doc.id=da_type.doc_id and doc.id=da_subtype.doc_id and
da_client.an='ssis_client' and da_group.an='ssis_group' and
da_type.an='ssis_type' and da_subtype.an='ssis_subtype' and
da_client.av='client1' and da_group.av='group1' and da_type.av='type1' and
da_subtype.av='subtype1'

                                                                             QUERY PLAN                                                                             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=40335.69..53871.76 rows=1 width=4) (actual time=17319.458..50292.429 rows=140 loops=1)
   ->  Nested Loop  (cost=40335.69..53864.64 rows=2 width=16) (actual time=17319.392..50260.780 rows=1072 loops=1)
         ->  Nested Loop  (cost=40335.69..53330.49 rows=150 width=12) (actual time=17289.476..35508.467 rows=10153 loops=1)
               ->  Merge Join  (cost=40335.69..40494.44 rows=3558 width=8) (actual time=17263.290..17594.400 rows=10153 loops=1)
                     Merge Cond: ("outer".doc_id = "inner".doc_id)
                     ->  Sort  (cost=20817.14..20849.19 rows=12818 width=4) (actual time=10145.947..10314.729 rows=101159 loops=1)
                           Sort Key: da_client.doc_id
                           ->  Bitmap Heap Scan on da da_client  (cost=155.91..19942.58 rows=12818 width=4) (actual time=1165.917..9800.025 rows=101159 loops=1)
                                 Recheck Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text))
                                 ->  Bitmap Index Scan on da_an_av_idx  (cost=0.00..155.91 rows=12818 width=0) (actual time=1156.362..1156.362 rows=101159 loops=1)
                                       Index Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text))
                     ->  Sort  (cost=19518.54..19548.09 rows=11817 width=4) (actual time=7117.330..7188.340 rows=101190 loops=1)
                           Sort Key: da_subtype.doc_id
                           ->  Bitmap Heap Scan on da da_subtype  (cost=143.90..18719.21 rows=11817 width=4) (actual time=913.707..6798.627 rows=101190 loops=1)
                                 Recheck Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text))
                                 ->  Bitmap Index Scan on da_an_av_idx  (cost=0.00..143.90 rows=11817 width=0) (actual time=904.720..904.720 rows=101190 loops=1)
                                       Index Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text))
               ->  Index Scan using document_pkey on document doc  (cost=0.00..3.60 rows=1 width=4) (actual time=1.762..1.763 rows=1 loops=10153)
                     Index Cond: (doc.id = "outer".doc_id)
         ->  Index Scan using da_pkey on da da_type  (cost=0.00..3.55 rows=1 width=4) (actual time=1.451..1.452 rows=0 loops=10153)
               Index Cond: (("outer".id = da_type.doc_id) AND (da_type.an = 'ssis_type'::text) AND (da_type.av = 'type1'::text))
   ->  Index Scan using da_pkey on da da_group  (cost=0.00..3.55 rows=1 width=4) (actual time=0.028..0.028 rows=0 loops=1072)
         Index Cond: (("outer".id = da_group.doc_id) AND (da_group.an = 'ssis_group'::text) AND (da_group.av = 'group1'::text))
 Total runtime: 50296.288 ms
(24 rows)

Time: 50301.853 ms

Explain analyze of separated out
 
 
dms3_test=> explain analyze select doc.id from document as doc, ssis_client
as sc, ssis_group as sg, ssis_type as st, ssis_subtype as sst where
doc.id=sc.doc_id and doc.id=sg.doc_id and doc.id=st.doc_id and
doc.id=sst.doc_id and sc.value='client1' and sg.value='group1' and
st.value='type1' and sst.value='subtype1';

                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=48763.53..76529.21 rows=108 width=4) (actual time=3947.647..5401.517 rows=140 loops=1)
   ->  Nested Loop  (cost=48763.53..76139.59 rows=108 width=16) (actual time=3947.592..5377.496 rows=140 loops=1)
         ->  Hash Join  (cost=48763.53..71836.53 rows=1019 width=12) (actual time=3947.454..5027.993 rows=993 loops=1)
               Hash Cond: ("outer".doc_id = "inner".doc_id)
               ->  Seq Scan on ssis_group sg  (cost=0.00..22537.39 rows=105085 width=4) (actual time=169.102..1161.228 rows=100924 loops=1)
                     Filter: (value = 'group1'::text)
               ->  Hash  (cost=48739.02..48739.02 rows=9802 width=8) (actual time=3777.630..3777.630 rows=10153 loops=1)
                     ->  Hash Join  (cost=23168.26..48739.02 rows=9802 width=8) (actual time=2108.108..3770.001 rows=10153 loops=1)
                           Hash Cond: ("outer".doc_id = "inner".doc_id)
                           ->  Seq Scan on ssis_client sc  (cost=0.00..22537.39 rows=100706 width=4) (actual time=201.259..1662.239 rows=101159 loops=1)
                                 Filter: (value = 'client1'::text)
                           ->  Hash  (cost=22537.39..22537.39 rows=98349 width=4) (actual time=1904.210..1904.210 rows=101190 loops=1)
                                 ->  Seq Scan on ssis_subtype sst  (cost=0.00..22537.39 rows=98349 width=4) (actual time=234.704..1828.887 rows=101190 loops=1)
                                       Filter: (value = 'subtype1'::text)
         ->  Index Scan using ssis_type_pkey on ssis_type st  (cost=0.00..4.21 rows=1 width=4) (actual time=0.351..0.351 rows=0 loops=993)
               Index Cond: ("outer".doc_id = st.doc_id)
               Filter: (value = 'type1'::text)
   ->  Index Scan using document_pkey on document doc  (cost=0.00..3.60 rows=1 width=4) (actual time=0.169..0.169 rows=1 loops=140)
         Index Cond: (doc.id = "outer".doc_id)
 Total runtime: 5401.958 ms
(20 rows)

Time: 5406.568 ms 
 
 
(originally from http://microjet.ath.cx/WebWiki/2005.12.11_TablePartitioning.html)

2005-12-09

Implementing application-level timeout

I was tasked to fix an app that had been in production for two years. The app was having trouble communicating with a new customer's server. At times, it would just get stuck in a read() call even though it was supposed to have timed out after 2 minutes.
So I took a look at the source:
 
public void createConnection()
  {
    socket = new Socket(/*....*/);
    socket.setSoTimeout(TWO_MINUTES);
    output = socket.getOutputStream();
    input = socket.getInputStream();
  }

  public void sendPing()
  {
    output.write(PING_MESSAGE);
    /* ... */
    input.read();
    /* ... */
  }


This is the wrong way to implement an application-level timeout. For, you see, it turned out this particular server was sending a TCP keep-alive packet every 1 minute 15 seconds. I have no idea why they did that (the default interval is around 2 hours), but they did it anyway, and they were a valuable customer.
Telling the customer, 'you are incapable of configuring a server', just wouldn't do, and it was also not entirely their fault.


Our fault was in using the socket's timeout as an application-level timeout. Java's Socket#setSoTimeout method corresponds to setting the SO_RCVTIMEO and SO_SNDTIMEO socket options in the BSD socket API.
Each packet received, whether it contains application-level data or not, resets the timeout timer. Those keep-alive packets were preventing the timeout timer from ever reaching the two-minute mark.

The easiest solution to this problem was to create a timer service: each time you want to do a socket operation, you register with the timer service and unregister afterward.

import javautils.fun.VoidToVoid;


public class TimerService extends Thread
{
  /* a singleton */
  public static synchronized TimerService getInstance() { /* ... */ }

  public void run()
  {
    while (true) {
     mutex.acquire();
     try {
       interruptTimedOutThread();
     } finally {
       mutex.release();
     }
     try {
       sleep(ONE_SECOND);
     } catch (InterruptedException e) {
       /* the timer thread itself is never interrupted */
     }
    }
  }

  public void timeout(int timeout_millisecond, VoidToVoid func)
  {
    try {
      register(Thread.currentThread(), timeout_millisecond);
      try {
        func.with();
      } finally {
        unregister(Thread.currentThread());
      }
    /*
       Convert exceptions caused by interruption outside
       of the register-unregister block because we don't want
       exception handling to be interrupted
    */
    } catch (ClosedByInterruptException e) {
      throw new TimedOutException("timed out", e);
    } catch (InterruptedException e) {
      throw new TimedOutException("timed out", e);
    }
  }
}

public class ConnectionToServer 
{
  public void sendPing()
  {
    TimerService timer = TimerService.getInstance();
    try {
      timer.timeout(TWO_MINUTES, new VoidToVoid() {
        public void with() 
        {
          output.write(PING_MESSAGE);
          /* ... */
          input.read();
          /* ... */
        }});
    } catch (TimedOutException e) {
      /* ... */
    }
  }
} 
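For comparison, Ruby ships this same watchdog pattern in its standard timeout library: a separate thread interrupts the worker when the deadline passes, regardless of what packets arrive on the socket. A minimal sketch (ping_with_deadline is a made-up name for illustration):

```ruby
require 'timeout'

# Application-level timeout: Timeout.timeout spawns a watchdog that raises
# Timeout::Error in the calling thread once the deadline passes, no matter
# how many keep-alive packets the peer sends in the meantime.
def ping_with_deadline(seconds)
  Timeout.timeout(seconds) do
    yield
  end
end
```

Usage would look like `ping_with_deadline(120) { output.write(PING_MESSAGE); input.read }`; if the block has not finished within the deadline, Timeout::Error is raised.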
 
(originally from http://microjet.ath.cx/WebWiki/2005.12.09_Implementing_Application-Level_Timeout.html)

2005-11-12

Index clustering

Index clustering can improve your query time, if your query can take advantage of the ordering.
Following are the queries from 2005.12.11_TablePartitioning.html redone with index clustering.

The all-in-one table. A 10x improvement.

dms3_test=> \d da
   Table "temp20051206.da"
 Column |  Type   | Modifiers 
--------+---------+-----------
 doc_id | integer | not null
 an     | text    | not null
 av     | text    | not null
Indexes:
    "da_pkey" PRIMARY KEY, btree (doc_id, an, av)
    "da_an_av_idx" UNIQUE, btree (an, av, doc_id) CLUSTER
Foreign-key constraints:
    "da_an_fkey" FOREIGN KEY (an, av) REFERENCES attr(name, value) DEFERRABLE INITIALLY DEFERRED
    "da_doc_id_fkey" FOREIGN KEY (doc_id) REFERENCES public.document(id) DEFERRABLE INITIALLY DEFERRED



dms3_test=> explain analyze select doc.id from document as doc, da
da_client, da da_group, da da_type, da da_subtype where
doc.id=da_client.doc_id and doc.id=da_group.doc_id and
doc.id=da_type.doc_id and doc.id=da_subtype.doc_id and
da_client.an='ssis_client' and da_group.an='ssis_group' and
da_type.an='ssis_type' and da_subtype.an='ssis_subtype' and
da_client.av='client1' and da_group.av='group1' and da_type.av='type1'
and da_subtype.av='subtype1';
                                                                            QUERY PLAN                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=40335.69..53871.76 rows=1 width=4) (actual time=1904.361..5277.128 rows=140 loops=1)
   ->  Nested Loop  (cost=40335.69..53864.64 rows=2 width=16) (actual time=1904.289..5221.572 rows=1072 loops=1)
         ->  Nested Loop  (cost=40335.69..53330.49 rows=150 width=12) (actual time=1904.216..3747.333 rows=10153 loops=1)
               ->  Merge Join  (cost=40335.69..40494.44 rows=3558 width=8) (actual time=1882.068..2082.534 rows=10153 loops=1)
                     Merge Cond: ("outer".doc_id = "inner".doc_id)
                     ->  Sort  (cost=20817.14..20849.19 rows=12818 width=4) (actual time=778.661..835.451 rows=101159 loops=1)
                           Sort Key: da_client.doc_id
                           ->  Bitmap Heap Scan on da da_client  (cost=155.91..19942.58 rows=12818 width=4) (actual time=432.318..521.484 rows=101159 loops=1)
                                 Recheck Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text))
                                 ->  Bitmap Index Scan on da_an_av_idx  (cost=0.00..155.91 rows=12818 width=0) (actual time=418.064..418.064 rows=101159 loops=1)
                                       Index Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text))
                     ->  Sort  (cost=19518.54..19548.09 rows=11817 width=4) (actual time=1103.395..1160.311 rows=101190 loops=1)
                           Sort Key: da_subtype.doc_id
                           ->  Bitmap Heap Scan on da da_subtype  (cost=143.90..18719.21 rows=11817 width=4) (actual time=633.951..751.246 rows=101190 loops=1)
                                 Recheck Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text))
                                 ->  Bitmap Index Scan on da_an_av_idx  (cost=0.00..143.90 rows=11817 width=0) (actual time=633.700..633.700 rows=101190 loops=1)
                                       Index Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text))
               ->  Index Scan using document_pkey on document doc  (cost=0.00..3.60 rows=1 width=4) (actual time=0.162..0.163 rows=1 loops=10153)
                     Index Cond: (doc.id = "outer".doc_id)
         ->  Index Scan using da_pkey on da da_type  (cost=0.00..3.55 rows=1 width=4) (actual time=0.144..0.144 rows=0 loops=10153)
               Index Cond: (("outer".id = da_type.doc_id) AND (da_type.an = 'ssis_type'::text) AND (da_type.av = 'type1'::text))
   ->  Index Scan using da_pkey on da da_group  (cost=0.00..3.55 rows=1 width=4) (actual time=0.050..0.050 rows=0 loops=1072)
         Index Cond: (("outer".id = da_group.doc_id) AND (da_group.an = 'ssis_group'::text) AND (da_group.av = 'group1'::text))
 Total runtime: 5280.891 ms
(24 rows)

Time: 5287.639 ms


The separated-out data. About a 2x improvement.
 
 
dms3_test=> \d ssis_client
                                        Table "attr.ssis_client"
    Column     |           Type           |                          Modifiers                          
---------------+--------------------------+-------------------------------------------------------------
 doc_id        | integer                  | not null
 created_when  | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
 created_who   | integer                  | not null
 modified_when | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
 modified_who  | integer                  | not null
 value         | text                     | not null
 value_id      | integer                  | not null
Indexes:
    "ssis_client_pkey" PRIMARY KEY, btree (doc_id)
    "ssis_client_value_id_idx" btree (value_id)
    "ssis_client_value_idx" btree (value) CLUSTER
Foreign-key constraints:
    "ssis_client_created_who_fkey" FOREIGN KEY (created_who) REFERENCES public.who(id)
    "ssis_client_doc_id_fkey" FOREIGN KEY (doc_id) REFERENCES public.document(id)
    "ssis_client_modified_who_fkey" FOREIGN KEY (modified_who) REFERENCES public.who(id)
    "ssis_client_value_id_fkey" FOREIGN KEY (value_id) REFERENCES ssis_client_value(id)

dms3_test=> explain analyze select doc.id from document as doc,
ssis_client as sc, ssis_group as sg, ssis_type as st, ssis_subtype as
sst where doc.id=sc.doc_id and doc.id=sg.doc_id and doc.id=st.doc_id
and doc.id=sst.doc_id and sc.value='client1' and sg.value='group1' and
st.value='type1' and sst.value='subtype1';

                                                                                            QUERY PLAN                                                                                            
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=12322.80..16099.97 rows=108 width=4) (actual time=1941.188..2329.004 rows=140 loops=1)
   ->  Hash Join  (cost=12322.80..15710.34 rows=108 width=16) (actual time=1941.144..2306.762 rows=140 loops=1)
         Hash Cond: ("outer".doc_id = "inner".doc_id)
         ->  Index Scan using ssis_type_value_idx on ssis_type st  (cost=0.00..2850.93 rows=107106 width=4) (actual time=75.017..406.128 rows=100825 loops=1)
               Index Cond: (value = 'type1'::text)
         ->  Hash  (cost=12320.26..12320.26 rows=1019 width=12) (actual time=1862.846..1862.846 rows=993 loops=1)
               ->  Hash Join  (cost=8986.99..12320.26 rows=1019 width=12) (actual time=1531.920..1861.824 rows=993 loops=1)
                     Hash Cond: ("outer".doc_id = "inner".doc_id)
                     ->  Index Scan using ssis_group_value_idx on ssis_group sg  (cost=0.00..2797.65 rows=105085 width=4) (actual time=39.136..291.236 rows=100924 loops=1)
                           Index Cond: (value = 'group1'::text)
                     ->  Hash  (cost=8962.49..8962.49 rows=9802 width=8) (actual time=1481.859..1481.859 rows=10153 loops=1)
                           ->  Hash Join  (cost=3248.58..8962.49 rows=9802 width=8) (actual time=981.952..1474.601 rows=10153 loops=1)
                                 Hash Cond: ("outer".doc_id = "inner".doc_id)
                                 ->  Index Scan using ssis_client_value_idx on ssis_client sc  (cost=0.00..2680.53 rows=100706 width=4) (actual time=43.766..317.779 rows=101159 loops=1)
                                       Index Cond: (value = 'client1'::text)
                                 ->  Hash  (cost=2617.71..2617.71 rows=98349 width=4) (actual time=918.689..918.689 rows=101190 loops=1)
                                       ->  Index Scan using ssis_subtype_value_idx on ssis_subtype sst  (cost=0.00..2617.71 rows=98349 width=4) (actual time=53.700..820.030 rows=101190 loops=1)
                                             Index Cond: (value = 'subtype1'::text)
   ->  Index Scan using document_pkey on document doc  (cost=0.00..3.60 rows=1 width=4) (actual time=0.156..0.156 rows=1 loops=140)
         Index Cond: (doc.id = "outer".doc_id)
 Total runtime: 2329.395 ms
(21 rows)

Time: 2334.097 ms 
 
 
(originally from http://microjet.ath.cx/WebWiki/2005.12.11_IndexClustering.html)

2005-11-03

Connection from ruby to MS SQL Server

Quick setup guide for: debian, iodbc, ruby, mssql2k
 
apt-get install libdbi-ruby libdbd-odbc-ruby freetds-dev odbcinst1 iodbc

Installing freetds-dev and odbcinst1 should cause apt-get to offer you a choice of having freetds managed by odbcinst. Say yes. That will create a file /etc/odbcinst.ini
 
ysantoso@helen:~$ cat /etc/odbcinst.ini
[FreeTDS]
Description     = TDS driver (Sybase/MS SQL)
Driver          = /usr/lib/odbc/libtdsodbc.so
Setup           = /usr/lib/odbc/libtdsS.so
CPTimeout       =
CPReuse         =
FileUsage       = 1

The next thing to do is to set up ~/.freetds.conf. I started with the template at /etc/freetds/freetds.conf.
 
ysantoso@helen:~$ cat ~/.freetds.conf
[global]
        # Default TDS protocol version. 
        tds version = 4.2

        initial block size = 512

        swap broken dates = no

        swap broken money = no

        # Database server login method, if both server and domain
        # logins are enabled, domain login is tried first if a domain
        # is specified, and if that fails the server login will be
        # used.
        try server login = yes
        try domain login = no

        # Whether to write a TDSDUMP file for diagnostic purposes
        # (setting this to /tmp is insecure on a multi-user system)
;       dump file = /tmp/freetds.log
;       debug level = 10

        # If you get out of memory errors, it may mean that your client
        # is trying to allocate a huge buffer for a TEXT field.
        # (Microsoft servers sometimes pretend TEXT columns are
        # 4 GB wide!)   If you have this problem, try setting
        # 'text size' to a more reasonable limit
        text size = 64512


[FooServer]
host            =       fooserver.example.com
port            =       1433
tds version     =       7.0

Then we set up ~/.odbc.ini:

ysantoso@helen:~$ cat ~/.odbc.ini
[ODBC Data Sources]
FooDSN = description about FooDSN

[FooDSN]
Driver          = /usr/lib/odbc/libtdsodbc.so
ServerName      = FooServer
Database        = Bar

Next, we test:

ysantoso@helen:~$ iodbctest
iODBC Demonstration program
This program shows an interactive SQL processor
Driver Manager: 03.52.0205.0204

Enter ODBC connect string (? shows list): ?

DSN                              | Driver
------------------------------------------------------------------------------
FooDSN                           | FooDSN

Enter ODBC connect string (? shows list):
DSN=FooDSN;UID=sa;PWD=lookmanopassword

SQL>select * from tablebaz;
.
.
.
.

Alright, ODBC is set up. Let's try connecting from Ruby. There are two ways: use the ODBC driver directly, or use DBI.

I am not familiar with the ODBC driver's API. It looks similar to DBI, but I'm sure it differs in places. So, I just gave it a quick trial to ensure that the DSN is seen.
 
irb(main):002:0> require 'odbc'
true
irb(main):005:0> ODBC.datasources
[#]
irb(main):006:0> require 'dbi'
true
irb(main):008:0> dbh=DBI.connect('dbi:ODBC:FooDSN', 'sa', 'lookmanopassword')
#, @attr={}>, @trace_output=#, @trace_mode=\ 2>
irb(main):015:0> dbh.select_one('select count(*) from websession')
[2495950] 
 
(originally from http://microjet.ath.cx/WebWiki/2005.11.03_Connecting_From_Ruby_To_MSSQL.html)

2005-11-02

Removing stubborn Windows device driver

Not having played admin with a Windows machine for a long time, I was surprised to see that it still sported the same obscene-looking, obscure, cryptic registries. The problem started when I wanted to install the AT&T Global Network client last Sunday so I could access the office network (it's an MS shop; sad, I know) from home.
The installation completed without presenting any error message, but I couldn't figure out my password. I would have left the program as it was, but I felt booting up took longer after I installed it. So, I uninstalled it and didn't find any noticeable difference.
Come Monday at work, I put the laptop on its dock and was going to start my workday when I found out that while it could connect to some hosts, it couldn't reach DNS, email, or the Internet gateway. I didn't think it was the laptop at first, as I had been using it to browse web sites the night before. But after talking to some network people, it turned out that only my laptop couldn't reach those servers.
AppSupport was called in to take a look but couldn't figure out what was wrong. So, they gave me a 3com 10/100 PCMCIA ethernet card with an X-JACK where the network cable is supposed to be plugged in.
I'd have settled for the other network card except that the jack sat right to the left of my left hand, and as I worked, my left hand kept knocking it. Each time, the connection was disrupted.
So, I googled around and found these:
http://www.windowsnetworking.com/kbase/WindowsTips/WindowsXP/UserTips/TroubleShooting/RemoveUnusedDriversandDevices.html
and
http://groups.google.com/group/comp.os.ms-windows.misc/browse_frm/thread/ac65a539ed467eba/b9714c5f0a5d7682?lnk=st&q=failed+to+uninstall+the+device&rnum=12&hl=en#b9714c5f0a5d7682
With these tips, I removed all network drivers and reinstalled them. Things started working again. Yay, I avoided a Win2k reinstallation.

(originally from http://microjet.ath.cx/WebWiki/2005.11.01_Removing_Stubborn_Windows_Device_Driver.html)

2005-08-27

AMD 64 hyper transport single channel

Results from running memtest86 on three computers:
                            Memtest86 Summary

                     | AMD Athlon XP 2100 | Intel Pentium M 1.3GHz | AMD Athlon 64 3400
                     |                    |                        | (socket 754)
 --------------------+--------------------+------------------------+-------------------
 Operating Frequency | 1.7GHz             | 1.3GHz                 | 2.4GHz
 L1 Size(KiB)        | 128                | 64                     | 128
 L2 Size(KiB)        | 256                | 1024                   | 512
 RAM Size(MiB)       | 768                | 496                    | 1024
 L1 Rate(MiB/s)      | 10602              | 16034                  | 19664
 L2 Rate(MiB/s)      | 3375               | 7967                   | 4886
 RAM Rate(MiB/s)     | 505                | 939                    | 1441
 Chipset             | VIA KT400(A)/600   | Intel i855GM/GME       | SIS 760/M760
                     |                    | FSB: 99MHz             |
 Settings            | DDR266             | RAM: 132MHz (DDR264)   | DDR400
                     |                    | CAS: 2-3-2-6           |
Notes:
  • memtest misidentified the operating frequency of the Athlon64. It should be 2.2GHz.
The Athlon64, being on socket 754, has only a single channel to the RAM.
An Athlon64 on socket 939 (I don't have one) would have two channels to the RAM. Of course, to take advantage of the second channel, you'd have to install a second DIMM module.
I have no idea how to test the decrease in RAM-access latency that AMD touts for the Athlon64 with its built-in memory controller.

(originally from http://microjet.ath.cx/WebWiki/Amd64HyperTransportSingleChannel.html)

2005-08-23

Result pagination with postgresql

A common problem with webapps is providing an interface to page through a result set, like what various search engines do.
I have yet to find a website that discusses this in depth. So, here is a summary of solutions I came up with by looking at various pieces on the Web for doing result pagination with Postgresql. I hope this gives the sorely needed encouragement for people to start sharing their findings.
The problem actually has three components:
  • displaying result for a certain page,
  • not causing undue latency in page display, and
  • counting number of results.
Counting the accurate number of results almost always requires the full result set to be counted. Applications would usually cache this number. How exactly the counting is done depends on the approach taken.
And there are two basic approaches:
  • operating on piecemeal result set
  • operating on full result set
Please realise that there is no 'best' approach. Each comes with its own pros and cons.
To illustrate the pros and cons, I categorise queries according to the effort exacted from pgsql: cheap and expensive. This categorisation is only for simplicity; there are, of course, grey areas, queries that are neither cheap nor expensive, not to mention that cheap and expensive are subjective terms anyway.
dms3_test=> create view cheap as select id from document;
CREATE VIEW
Time: 150.078 ms

dms3_test=> create view expensive as SELECT doc.id FROM
attribute as attr0, attribute_name as an0, document as doc, state,
document_attribute as da0 WHERE state.id = doc.state_id AND state.name
= 'new' AND da0.doc_id = doc.id AND attr0.id = da0.attribute_id AND
an0.id = attr0.name_id AND an0.name = 'ssis_client' AND attr0.value
ILIKE 'client1';
CREATE VIEW
Time: 120.979 ms
Since what is a pro and what is a con depends very much on the context, I simply list the characteristics of each approach without further labelling.

Operating on Piecemeal Result Set (New Query for Each Page)

In this approach, a new query is run for each page. Each query differs only in OFFSET and LIMIT clauses.
For example, for the first page, the query would be executed with OFFSET 0 and LIMIT 10. The second page would be OFFSET 10 LIMIT 10.
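The per-page query assembly can be sketched as follows (page_query is a hypothetical helper; pages are numbered from 1):

```ruby
# Piecemeal approach: each page re-runs the same query, varying only the
# OFFSET. Note the off-by-one trap: page 2 starts at OFFSET 10, not 11.
def page_query(base_query, page, per_page = 10)
  offset = (page - 1) * per_page
  "#{base_query} OFFSET #{offset} LIMIT #{per_page}"
end
```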
This approach is popular and is found in various web applications. It is simple to implement and has an acceptable performance on cheap queries.
Number of matching results could be counted with a SELECT COUNT(*) in the beginning. This number could be cached as well so as to reduce the load on the server.
The latency in page display is low in the beginning and degrades linearly as the user moves deeper.
The problem with this approach is that it does not reuse previous effort. This is especially problematic if the query is expensive.
Another problem is that each query potentially sees a different snapshot of the data. If a user is browsing page n and the underlying data changes, refreshing or revisiting page n would show different data.
cheap query
dms3_test=> abort; 
begin; 
select count(*) from cheap; 
select * from cheap order by id offset 0 limit 10; 
select * from cheap order by id offset 50000 limit 10;

ROLLBACK
Time: 1.640 ms
BEGIN
Time: 6.187 ms
  count  
---------
 1010431
(1 row)

Time: 1094.187 ms
 id 
----
  1
  2
  3
  4
  5
  6
  9
 10
 11
 12
(10 rows)

Time: 57.371 ms
  id   
-------
 50003
 50004
 50005
 50006
 50007
 50008
 50009
 50010
 50011
 50012
(10 rows)

Time: 134.610 ms
expensive query
dms3_test=> abort; 
begin; 
select count(*) from expensive; 
select * from expensive order by id offset 0 limit 10; 
select * from expensive order by id offset 50000 limit 10;

ROLLBACK
Time: 2.698 ms
BEGIN
Time: 4.584 ms
count 
-------
 68276
(1 row)

Time: 18034.510 ms
 id  
-----
   6
  50
  55
  65
  89
 109
 110
 133
 144
 155
(10 rows)

Time: 76.929 ms
   id   
--------
 749659
 749661
 749667
 749685
 749692
 749720
 749732
 749740
 749741
 749778
(10 rows)

Time: 14424.053 ms
Characteristics:
  • simple implementation
  • no setup cost
  • suitable for cheap queries
  • suitable if user is expected to browse through only few pages
  • not isolated from underlying changes
  • latency degrades linearly

Operating on Full Result Set

This approach takes off from the previous one by reusing previous effort. The database takes a hit only on new query criteria, instead of every time the user changes pages.
This approach could be implemented by using either a temporary table or a without hold cursor. Both implementations require the webapp to maintain and reuse the transaction in which the table or cursor is defined.
A common strategy is to maintain a fixed number of connections to the database and assign one connection to the processing of a query in a round-robin way, i.e. map a specific query criteria to a specific connection.
In each connection, a transaction is held open throughout the duration of the webapp. This transaction would hold various temporary tables or cursors. You would want to keep the transaction open as long as possible.
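The criteria-to-connection mapping can be sketched as below; the pool size and the hashing choice are assumptions, the point is only that the same criteria must deterministically reach the connection that already holds the matching cursor or temporary table:

```ruby
require 'digest'

POOL_SIZE = 4  # assumed size of the fixed connection pool

# Map a query's criteria string to a pool slot. Hashing makes the mapping
# deterministic, so repeat requests for the same criteria land on the
# connection whose transaction holds the prepared result set.
def connection_index(criteria)
  Digest::MD5.hexdigest(criteria).to_i(16) % POOL_SIZE
end
```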
Warning: keeping a transaction open for a long time would have the
following negative side-effects:
  • prevents vacuum from removing all dead tuples.
  • blocks changes to schema
  • may block other transactions if data is modified within the transaction
Moreover, transactions may become invalid at any time due to some unforeseen event like Bob spilling his soda over the ethernet switch.
Therefore, your webapp should be able to re-connect and re-setup the temporary tables or cursors if the existing connection or transaction is no longer valid.
Being able to re-setup also allows the DBA to vacuum thoroughly and/or make schema updates by simply killing and temporarily blocking connections from your webapp during low-traffic hours, without having to restart your webapp. This is a big deal if the DBA is not the sysadmin or does not have permission to restart your webapp.
Before processing each query, it is recommended to generate a SAVEPOINT so that any error in processing a query would not destroy the transaction.
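The re-connect/re-setup logic can be sketched generically; connect and setup here are hypothetical callables standing in for whatever the webapp uses to open a session and recreate its temporary tables or cursors, and IOError stands in for the driver's real connection-lost exception:

```ruby
# Run a block against a session, re-establishing the session and re-running
# its setup when the connection or transaction turns out to be dead.
def with_resetup(connect, setup, retries = 1)
  session = connect.call
  setup.call(session)
  begin
    yield session
  rescue IOError
    raise if (retries -= 1) < 0   # give up after the allowed retries
    session = connect.call        # re-connect ...
    setup.call(session)           # ... and recreate temp tables / cursors
    retry
  end
end
```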

Using Temporary Tables

The result set can be piped into a temporary table via the CREATE TEMPORARY TABLE foo AS command. It is important to remember to use a temporary table, since a temporary table is not journalled into the WAL (write-ahead log); journalling would have a negative impact on performance.
This implementation gives you a free count of matching results when you do the CREATE TEMPORARY TABLE AS. I am not sure why psql does not show the count, but it is accessible from within a stored procedure or your DB driver.
cheap query
dms3_test=> abort;
begin;
create temporary table foo as select * from cheap order by id;
select * from foo order by id offset 0 limit 10; 
select * from foo order by id offset 50000 limit 10;

ROLLBACK
Time: 60.744 ms
BEGIN
Time: 0.686 ms
SELECT
Time: 15125.956 ms
 id 
----
  1
  2
  3
  4
  5
  6
  9
 10
 11
 12
(10 rows)

Time: 4397.762 ms
  id   
-------
 50003
 50004
 50005
 50006
 50007
 50008
 50009
 50010
 50011
 50012
(10 rows)

Time: 4413.789 ms
expensive query
dms3_test=> abort;
begin;
create temporary table foo as select * from expensive order by id;
select * from foo order by id offset 0 limit 10; 
select * from foo order by id offset 50000 limit 10;

ROLLBACK
Time: 52.777 ms
BEGIN
Time: 3.683 ms
SELECT
Time: 18666.615 ms
 id  
-----
   6
  50
  55
  65
  89
 109
 110
 133
 144
 155
(10 rows)

Time: 314.754 ms
   id   
--------
 749659
 749661
 749667
 749685
 749692
 749720
 749732
 749740
 749741
 749778
(10 rows)

Time: 342.207 ms
Characteristics:
  • complex implementation
  • high setup cost
  • free result count as a side-effect
  • suitable for expensive queries
  • suitable if user is expected to comprehensively browse the result set
  • isolated from underlying changes
  • latency still degrades linearly but more gently

Using Without Hold Cursors

Without hold cursors are destroyed at the end of a transaction, similar to temporary tables. With hold cursors, on the other hand, outlive the creating transaction, although they are still bound to a session. I recommend using without hold cursors to simplify garbage management.
cheap query
dms3_test=> abort;
begin;
declare cheap_cursor scroll cursor for select * from cheap order by id;
move all from cheap_cursor;
move first from cheap_cursor;
fetch 10 from cheap_cursor;
move absolute 50000 from cheap_cursor;
fetch 10 from cheap_cursor;

ROLLBACK
Time: 4.054 ms
BEGIN
Time: 0.970 ms
DECLARE CURSOR
Time: 1.022 ms
MOVE 1010431
Time: 12434.136 ms
MOVE 1
Time: 4.409 ms
 id 
----
  2
  3
  4
  5
  6
  9
 10
 11
 12
 13
(10 rows)

Time: 4.418 ms
MOVE 1
Time: 30.055 ms
  id   
-------
 50003
 50004
 50005
 50006
 50007
 50008
 50009
 50010
 50011
 50012
(10 rows)

Time: 3.875 ms
expensive query
dms3_test=> abort;
begin;
declare expensive_cursor scroll cursor for select * from expensive order by id;
move all from expensive_cursor;
move first from expensive_cursor;
fetch 10 from expensive_cursor;
move absolute 50000 from expensive_cursor;
fetch 10 from expensive_cursor;

ROLLBACK
Time: 2.044 ms
BEGIN
Time: 0.739 ms
DECLARE CURSOR
Time: 51.912 ms
MOVE 68276
Time: 19036.148 ms
MOVE 1
Time: 1.055 ms
 id  
-----
  50
  55
  65
  89
 109
 110
 133
 144
 155
 186
(10 rows)

Time: 0.911 ms
MOVE 1
Time: 30.226 ms
   id   
--------
 749659
 749661
 749667
 749685
 749692
 749720
 749732
 749740
 749741
 749778
(10 rows)

Time: 1.736 ms
Characteristics:
  • complex implementation
  • high setup cost
  • suitable for expensive queries
  • suitable if user is expected to comprehensively browse the result set
  • isolated from underlying changes
  • barely noticeable latency

Hybrid Approach

One could take a hybrid approach. The implementation would be even more complex, but in some cases it could combine the no-setup-cost benefit of the piecemeal approach with the low latency of the full-result-set approach.
The hybrid approach operates on piecemeal result sets until a certain threshold is reached, e.g. paging past page 7. When that happens, one of the full-result-set approaches is executed, preferably in the background. The webapp can transition to using the full result set when it is ready.
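The dispatch itself is simple; a sketch, with an assumed threshold and both strategies passed in as callables:

```ruby
FULL_SET_THRESHOLD = 7  # assumed page threshold for switching approaches

# Serve early pages with cheap per-page OFFSET/LIMIT queries; past the
# threshold, switch to the full-result-set machinery (cursor or temp table),
# which by then has hopefully finished setting up in the background.
def fetch_page(page, piecemeal, full_set)
  page <= FULL_SET_THRESHOLD ? piecemeal.call(page) : full_set.call(page)
end
```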

Summary

                          Summary of Implementations
Query Type  Implementation      Setup(ms)  Counting(ms)  First Page(ms)  5000th Page(ms)
Cheap       New query per page        N/A      1094.187          57.371          134.610
            Temporary Table     15125.956           N/A        4397.762         4413.789
            Cursor                  1.022     12434.136           8.827           33.930
Expensive   New query per page        N/A     18034.510          76.929        14424.053
            Temporary Table     18666.615           N/A         314.754          342.207
            Cursor                 51.921     19036.148           1.966           31.962

(originally from http://microjet.ath.cx/WebWiki/ResultPaginationWithPostgresql.html)

pvmove problem

I am using LVM2 with Linux 2.6.11. One of the drives in the volume group is going bad: I could hear a screeching noise while the platters were spinning.

That is bad. Not a problem, though, as I can just vacate the data from that drive using pvmove.
The bad drive is /dev/bad. /dev/avail is a physical volume with some free space.

# pvdisplay /dev/bad /dev/avail
  --- Physical volume ---
  PV Name               /dev/bad
  VG Name               vg0
  PV Size               148.09 GB / not usable 0
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              37911
  Free PE               25111
  Allocated PE          12800
  PV UUID               e5EvaO-0oo5-Zenl-KzkY-wFf7-EkQX-di1mXt

  --- Physical volume ---
  PV Name               /dev/avail
  VG Name               vg0
  PV Size               186.30 GB / not usable 0
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              47694
  Free PE               20658
  Allocated PE          27036
  PV UUID               HLtCMK-751U-IFAW-5Aj3-FQ3w-5tqO-8svvwt


Let's move it to the only drive with available space, /dev/avail. /dev/bad has 12800 allocated PE while /dev/avail has 20658 free PE. I was not expecting any problem fitting the data in /dev/bad into /dev/avail.
 
# pvmove -i 5 -v /dev/bad
    Finding volume group "vg0"
    Archiving volume group "vg0" metadata.
    Creating logical volume pvmove0
    Moving 0 extents of logical volume vg0/lv0
  Insufficient contiguous allocatable extents (1777) for logical volume pvmove0: 12800 required
  Unable to allocate temporary LV for pvmove.

Urk? It needs to be contiguous? What to do now?

Searching through the lvm mailing list shows that pvmove is dumb: it only considers the first contiguous run of free PEs (physical extents). OK, let's work around this.
 
# vgcfgbackup

# grep pv0 /etc/lvm/backup/vg0
                pv0 {
                                        "pv0", 30720
                                        "pv0", 0
                                        "pv0", 17920

pv0 is the physical volume corresponding to /dev/bad. From this, we see that there are three segments residing in pv0. The first starts at PE 30720.

Let's try to fill those 1777 contiguous free PEs on the destination drive, /dev/avail. That means we'll be moving PEs 30720 through 30720+1777-1 = 32496 from pv0.
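
The range arithmetic is simple but easy to get off by one, so here is the calculation as a tiny sketch (function name is mine, not an LVM tool):

```python
def pe_range(start_pe, free_pe):
    """Inclusive PE range that exactly fills `free_pe` contiguous
    free extents, starting at segment start `start_pe`."""
    # The range is inclusive on both ends, hence the -1.
    return start_pe, start_pe + free_pe - 1

# The values from this session:
start, end = pe_range(30720, 1777)
print(f"pvmove -i 5 -v /dev/bad:{start}-{end}")
```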


# pvmove -i 5 -v /dev/bad:30720-32496
    Finding volume group "vg0"
    Archiving volume group "vg0" metadata.
    Creating logical volume pvmove0
    Moving 0 extents of logical volume vg0/lv0
    Moving 1777 extents of logical volume vg0/lv1
    Moving 0 extents of logical volume vg0/lv2
    Moving 0 extents of logical volume vg0/lv3
    Found volume group "vg0"
    Updating volume group metadata
    Creating volume group backup "/etc/lvm/backup/vg0"
    Found volume group "vg0"
    Found volume group "vg0"
    Loading vg0-pvmove0
    Found volume group "vg0"
    Loading vg0-lv1
Checking progress every 5 seconds
  /dev/hdf2: Moved: 7.7%
  /dev/hdf2: Moved: 14.4%
  /dev/hdf2: Moved: 22.6%
  /dev/hdf2: Moved: 29.7%
  /dev/hdf2: Moved: 37.4%
  /dev/hdf2: Moved: 44.6%
  /dev/hdf2: Moved: 52.3%
  /dev/hdf2: Moved: 60.0%
  /dev/hdf2: Moved: 67.7%
  /dev/hdf2: Moved: 75.4%
  /dev/hdf2: Moved: 83.1%
  /dev/hdf2: Moved: 90.3%
  /dev/hdf2: Moved: 97.4%
  /dev/hdf2: Moved: 100.0%
    Found volume group "vg0"
    Found volume group "vg0"
    Found volume group "vg0"
    Loading vg0-pvmove0
    Found volume group "vg0"
    Loading vg0-lv1
    Found volume group "vg0"
    Found volume group "vg0"
    Removing temporary pvmove LV
    Writing out final volume group after pvmove
    Creating volume group backup "/etc/lvm/backup/vg0"

Finally, after repeating the above process for the remaining segments, /dev/bad, aka pv0, is free of data and is safe to take down.

# vgreduce vg0 /dev/bad
 
Toss it in the garbage bin.


(originally from http://microjet.ath.cx/WebWiki/pvmove%20problem.html)