2005-11-12

2005.12.11_Index Clustering

Index clustering can improve your query time, if your query can take advantage of the ordering.
Following are the queries from 2005.12.11_TablePartitioning.html redone with index clustering.

The all in one table. A 10x improvement.

dms3_test=> \d da
   Table "temp20051206.da"
 Column |  Type   | Modifiers 
--------+---------+-----------
 doc_id | integer | not null
 an     | text    | not null
 av     | text    | not null
Indexes:
    "da_pkey" PRIMARY KEY, btree (doc_id, an, av)
    "da_an_av_idx" UNIQUE, btree (an, av, doc_id) CLUSTER
Foreign-key constraints:
    "da_an_fkey" FOREIGN KEY (an, av) REFERENCES attr(name, value) DEFERRABLE INITIALLY DEFERRED
    "da_doc_id_fkey" FOREIGN KEY (doc_id) REFERENCES public.document(id) DEFERRABLE INITIALLY DEFERRED



dms3_test=> explain analyze select doc.id from document as doc, da
da_client, da da_group, da da_type, da da_subtype where
doc.id=da_client.doc_id and doc.id=da_group.doc_id and
doc.id=da_type.doc_id and doc.id=da_subtype.doc_id and
da_client.an='ssis_client' and da_group.an='ssis_group' and
da_type.an='ssis_type' and da_subtype.an='ssis_subtype' and
da_client.av='client1' and da_group.av='group1' and da_type.av='type1'
and da_subtype.av='subtype1';
                                                                            QUERY PLAN                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=40335.69..53871.76 rows=1 width=4) (actual time=1904.361..5277.128 rows=140 loops=1)
   ->  Nested Loop  (cost=40335.69..53864.64 rows=2 width=16) (actual time=1904.289..5221.572 rows=1072 loops=1)
         ->  Nested Loop  (cost=40335.69..53330.49 rows=150 width=12) (actual time=1904.216..3747.333 rows=10153 loops=1)
               ->  Merge Join  (cost=40335.69..40494.44 rows=3558 width=8) (actual time=1882.068..2082.534 rows=10153 loops=1)
                     Merge Cond: ("outer".doc_id = "inner".doc_id)
                     ->  Sort  (cost=20817.14..20849.19 rows=12818 width=4) (actual time=778.661..835.451 rows=101159 loops=1)
                           Sort Key: da_client.doc_id
                           ->  Bitmap Heap Scan on da da_client  (cost=155.91..19942.58 rows=12818 width=4) (actual time=432.318..521.484 rows=101159 loops=1)
                                 Recheck Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text))
                                 ->  Bitmap Index Scan on da_an_av_idx  (cost=0.00..155.91 rows=12818 width=0) (actual time=418.064..418.064 rows=101159 loops=1)
                                       Index Cond: ((an = 'ssis_client'::text) AND (av = 'client1'::text))
                     ->  Sort  (cost=19518.54..19548.09 rows=11817 width=4) (actual time=1103.395..1160.311 rows=101190 loops=1)
                           Sort Key: da_subtype.doc_id
                           ->  Bitmap Heap Scan on da da_subtype  (cost=143.90..18719.21 rows=11817 width=4) (actual time=633.951..751.246 rows=101190 loops=1)
                                 Recheck Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text))
                                 ->  Bitmap Index Scan on da_an_av_idx  (cost=0.00..143.90 rows=11817 width=0) (actual time=633.700..633.700 rows=101190 loops=1)
                                       Index Cond: ((an = 'ssis_subtype'::text) AND (av = 'subtype1'::text))
               ->  Index Scan using document_pkey on document doc  (cost=0.00..3.60 rows=1 width=4) (actual time=0.162..0.163 rows=1 loops=10153)
                     Index Cond: (doc.id = "outer".doc_id)
         ->  Index Scan using da_pkey on da da_type  (cost=0.00..3.55 rows=1 width=4) (actual time=0.144..0.144 rows=0 loops=10153)
               Index Cond: (("outer".id = da_type.doc_id) AND (da_type.an = 'ssis_type'::text) AND (da_type.av = 'type1'::text))
   ->  Index Scan using da_pkey on da da_group  (cost=0.00..3.55 rows=1 width=4) (actual time=0.050..0.050 rows=0 loops=1072)
         Index Cond: (("outer".id = da_group.doc_id) AND (da_group.an = 'ssis_group'::text) AND (da_group.av = 'group1'::text))
 Total runtime: 5280.891 ms
(24 rows)

Time: 5287.639 ms

The separated out data. About 2x improvement.

dms3_test=> \d ssis_client
                                        Table "attr.ssis_client"
    Column     |           Type           |                          Modifiers                          
---------------+--------------------------+-------------------------------------------------------------
 doc_id        | integer                  | not null
 created_when  | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
 created_who   | integer                  | not null
 modified_when | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
v modified_who  | integer                  | not null
 value         | text                     | not null
 value_id      | integer                  | not null
Indexes:
    "ssis_client_pkey" PRIMARY KEY, btree (doc_id)
    "ssis_client_value_id_idx" btree (value_id)
    "ssis_client_value_idx" btree (value) CLUSTER
Foreign-key constraints:
    "ssis_client_created_who_fkey" FOREIGN KEY (created_who) REFERENCES public.who(id)
    "ssis_client_doc_id_fkey" FOREIGN KEY (doc_id) REFERENCES public.document(id)
    "ssis_client_modified_who_fkey" FOREIGN KEY (modified_who) REFERENCES public.who(id)
    "ssis_client_value_id_fkey" FOREIGN KEY (value_id) REFERENCES ssis_client_value(id)

dms3_test=> explain analyze select doc.id from document as doc,
ssis_client as sc, ssis_group as sg, ssis_type as st, ssis_subtype as
sst where doc.id=sc.doc_id and doc.id=sg.doc_id and doc.id=st.doc_id
and doc.id=sst.doc_id and sc.value='client1' and sg.value='group1' and
st.value='type1' and sst.value='subtype1';

                                                                                            QUERY PLAN                                                                                            
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=12322.80..16099.97 rows=108 width=4) (actual time=1941.188..2329.004 rows=140 loops=1)
   ->  Hash Join  (cost=12322.80..15710.34 rows=108 width=16) (actual time=1941.144..2306.762 rows=140 loops=1)
         Hash Cond: ("outer".doc_id = "inner".doc_id)
         ->  Index Scan using ssis_type_value_idx on ssis_type st  (cost=0.00..2850.93 rows=107106 width=4) (actual time=75.017..406.128 rows=100825 loops=1)
               Index Cond: (value = 'type1'::text)
         ->  Hash  (cost=12320.26..12320.26 rows=1019 width=12) (actual time=1862.846..1862.846 rows=993 loops=1)
               ->  Hash Join  (cost=8986.99..12320.26 rows=1019 width=12) (actual time=1531.920..1861.824 rows=993 loops=1)
                     Hash Cond: ("outer".doc_id = "inner".doc_id)
                     ->  Index Scan using ssis_group_value_idx on ssis_group sg  (cost=0.00..2797.65 rows=105085 width=4) (actual time=39.136..291.236 rows=100924 loops=1)
                           Index Cond: (value = 'group1'::text)
                     ->  Hash  (cost=8962.49..8962.49 rows=9802 width=8) (actual time=1481.859..1481.859 rows=10153 loops=1)
                           ->  Hash Join  (cost=3248.58..8962.49 rows=9802 width=8) (actual time=981.952..1474.601 rows=10153 loops=1)
                                 Hash Cond: ("outer".doc_id = "inner".doc_id)
                                 ->  Index Scan using ssis_client_value_idx on ssis_client sc  (cost=0.00..2680.53 rows=100706 width=4) (actual time=43.766..317.779 rows=101159 loops=1)
                                       Index Cond: (value = 'client1'::text)
                                 ->  Hash  (cost=2617.71..2617.71 rows=98349 width=4) (actual time=918.689..918.689 rows=101190 loops=1)
                                       ->  Index Scan using ssis_subtype_value_idx on ssis_subtype sst  (cost=0.00..2617.71 rows=98349 width=4) (actual time=53.700..820.030 rows=101190 loops=1)
                                             Index Cond: (value = 'subtype1'::text)
   ->  Index Scan using document_pkey on document doc  (cost=0.00..3.60 rows=1 width=4) (actual time=0.156..0.156 rows=1 loops=140)
         Index Cond: (doc.id = "outer".doc_id)
 Total runtime: 2329.395 ms
(21 rows)

Time: 2334.097 ms

(originally from http://microjet.ath.cx/WebWiki/2005.12.11_IndexClustering.html)

2005-11-03

Connection from ruby to MS SQL Server

Quick setup guide for: debian, iodbc, ruby, mssql2k

apt-get install libdbi-ruby libdbd-odbc-ruby freetds-dev odbcinst1 iodbc

Installing freetds-dev and odbcinst1 should cause apt-get to offer you a choice of having freetds managed by odbcinst. Say yes. That will create a file /etc/odbcinst.ini

ysantoso@helen:~$ cat /etc/odbcinst.ini
[FreeTDS]
Description     = TDS driver (Sybase/MS SQL)
Driver          = /usr/lib/odbc/libtdsodbc.so
Setup           = /usr/lib/odbc/libtdsS.so
CPTimeout       =
CPReuse         =
FileUsage       = 1

Next thing to do is to setup ~/.freetds.conf. I started with the template at /etc/freetds/freetds.conf.

ysantoso@helen:~$ cat ~/.freetds.conf
[global]
        # Default TDS protocol version. 
        tds version = 4.2

        initial block size = 512

        swap broken dates = no

        swap broken money = no

        # Database server login method, if both server and domain
        # logins are enabled, domain login is tried first if a domain
        # is specified, and if that fails the server login will be
        # used.
        try server login = yes
        try domain login = no

        # Whether to write a TDSDUMP file for diagnostic purposes
        # (setting this to /tmp is insecure on a multi-user system)
;       dump file = /tmp/freetds.log
;       debug level = 10

        # If you get out of memory errors, it may mean that your
          client
        # is trying to allocate a huge buffer for a TEXT field.
        # (Microsoft servers sometimes pretend TEXT columns are
        # 4 GB wide!)   If you have this problem, try setting
        # 'text size' to a more reasonable limit
        text size = 64512


[FooServer]
host            =       fooserver.example.com
port            =       1433
tds version     =       7.0

Then we setup ~/.odbc.ini

ysantoso@helen:~$ cat ~/.odbc.ini
[ODBC Data Sources]
FooDSN = description about FooDSN

[FooDSN]
Driver          = /usr/lib/odbc/libtdsodbc.so
ServerName      = FooServer
Database        = Bar

Next, we test:

ysantoso@helen:~$ iodbctest
iODBC Demonstration program
This program shows an interactive SQL processor
Driver Manager: 03.52.0205.0204

Enter ODBC connect string (? shows list): ?

DSN                              | Driver
------------------------------------------------------------------------------
FooDSN                           | FooDSN

Enter ODBC connect string (? shows list):
DSN=FooDSN;UID=sa;PWD=lookmanopassword

SQL>select * from tablebaz;
.
.
.
.

Alright, ODBC is setup. Let's try connecting from Ruby. There are two ways: one is to use the ODBC driver directly or using DBI.

I am not familiar with the ODBC driver's API. Looks similar to DBI, but I'm sure it differs in places. So, I just did a quick trial at it just to ensure that the DSN is seen.

irb(main):002:0> require 'odbc'
true
irb(main):005:0> ODBC.datasources
[#]

irb(main):006:0> require 'dbi'
true
irb(main):008:0> dbh=DBI.connect('dbi:ODBC:FooDSN', 'sa', 'lookmanopassword')
#, @attr={}>, @trace_output=#, @trace_mode=\ 2>
irb(main):015:0> dbh.select_one('select count(*) from websession')
[2495950]

(originally from http://microjet.ath.cx/WebWiki/2005.11.03_Connecting_From_Ruby_To_MSSQL.html)

2005-11-02

Removing stubborn Windows device driver

Haven't been playing admin with a Windows machine for a long time, I was surprised to see that it still sported the same obscene-looking, obscure, cryptic registries. The problem started when I wanted to install the AT&T Global Network client last Sunday so I can access office network (it's a MS shop, sad, I know) from home.
The installation was successfully completed without any error message presented to me, but I couldn't figure out my password. I would have left the program as it was, but I felt booting in took longer since I installed it. So, I uninstalled it and didn't find any noticeable difference.
Came Monday at work. I put the laptop on its dock and was going to start my workday when I found out that while it can connect to some hosts, it can't connect to DNS, email and Internet gateway. I didn't think it was the laptop at first as I was using it to browse around web sites last night. But after talking to some network people, it turned out that only my laptop can't reach those servers.
AppSupport was called in to take a look but couldn't resolve what is wrong. So, they gave me this 3com eth 10/100 PCMCIA card with an X-JACK where the network cable supposed to be plugged in to.
I'd have settled with another network card except that the jack is right to the left of my left hand, and as I was working, my left hand kept knocking on it. Each time, the connection was disrupted.
So, I googled around and found these:
http://netsecurity.about.com/gi/dynamic/offsite.htm?zi=1/XJ&sdn=netsecurity&zu=http%3A%2F%2Fwww.windowsnetworking.com%2Fkbase%2FWindowsTips%2FWindowsXP%2FUserTips%2FTroubleShooting%2FRemoveUnusedDriversandDevices.html
and
http://groups.google.com/group/comp.os.ms-windows.misc/browse_frm/thread/ac65a539ed467eba/b9714c5f0a5d7682?lnk=st&q=failed+to+uninstall+the+device&rnum=12&hl=en#b9714c5f0a5d7682
With these tips, I removed all network drivers and reinstall them. Things started working again. Yay, I avoided a Win2k reinstallation.

(originally from http://microjet.ath.cx/WebWiki/2005.11.01_Removing_Stubborn_Windows_Device_Driver.html)