2005-08-27

AMD 64 hyper transport single channel

Results from running memtest86 on three computers:
                                  Memtest86 Summary
                     AMD Athlon XP 2100  Intel Pentium M 1.3GHz              AMD Athlon 64 3400 (socket754)
Operating Frequency  1.7GHz              1.3GHz                              2.4GHz
L1 Size(KiB)         128                 64                                  128
L2 Size(KiB)         256                 1024                                512
RAM Size(MiB)        768                 496                                 1024
L1 Rate(MiB/s)       10602               16034                               19664
L2 Rate(MiB/s)       3375                7967                                4886
RAM Rate(MiB/s)      505                 939                                 1441
Chipset              VIA KT400(A)/600    Intel i855GM/GME, FSB:99MHz         SIS 760/M760
Settings             DDR266              RAM: 132MHz (DDR264), CAS: 2-3-2-6  DDR400
Notes:
  • memtest misidentified the operating frequency of the Athlon64. It should be 2.2GHz.
  • The Athlon64, being on socket 754, has only a single channel to the RAM.
  • An Athlon64 on socket 939 (I don't have one) would have two channels to the RAM. Of course, to take advantage of the second channel, you'd need a second DIMM.
  • I have no idea how to test the decrease in RAM access latency that AMD touts for the Athlon64 with its built-in memory controller.

(originally from http://microjet.ath.cx/WebWiki/Amd64HyperTransportSingleChannel.html)

2005-08-23

Result pagination with postgresql

A common problem with webapps is providing an interface to page through a result set, as various search engines do.
I have yet to find a website that discusses this in depth. So, here is a summary of solutions I came up with by looking at various pieces on the Web for doing result pagination with PostgreSQL. I hope this gives people the sorely needed encouragement to start sharing their findings.
The problem actually has three components:
  • displaying result for a certain page,
  • not causing undue latency in page display, and
  • counting number of results.
Counting the number of results accurately almost always requires the full result set to be counted. Applications usually cache this number. How exactly the counting is done depends on the approach taken.
And there are two basic approaches:
  • operating on piecemeal result set
  • operating on full result set
Please realise that there is no 'best' approach. Each comes with its own pros and cons.
To illustrate the pros and cons, I am employing two kinds of queries, cheap and expensive, categorised according to the effort exacted from pgsql. This categorisation is only for simplicity, as there are, of course, grey areas: queries that are neither cheap nor expensive. Not to mention that cheap and expensive are subjective terms anyway.
dms3_test=> create view cheap as select id from document;
CREATE VIEW
Time: 150.078 ms

dms3_test=> create view expensive as SELECT doc.id FROM
attribute as attr0, attribute_name as an0, document as doc, state,
document_attribute as da0 WHERE state.id = doc.state_id AND state.name
= 'new' AND da0.doc_id = doc.id AND attr0.id = da0.attribute_id AND
an0.id = attr0.name_id AND an0.name = 'ssis_client' AND attr0.value
ILIKE 'client1';
CREATE VIEW
Time: 120.979 ms
Since what is a pro and what is a con depends very much on the context, I simply list the characteristics of each approach without further labelling.

Operating on Piecemeal Result Set (New Query for Each Page)

In this approach, a new query is run for each page. Each query differs only in OFFSET and LIMIT clauses.
For example, for the first page, the query would be executed with OFFSET 0 and LIMIT 10. The second page would be OFFSET 10 LIMIT 10.
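The offset arithmetic can be sketched as follows. This is a minimal illustration, assuming pages are numbered from 1 and a fixed page size; the names page_window and paged_query are made up for this example and are not part of any library.

```python
def page_window(page, page_size=10):
    """Return (offset, limit) for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (page - 1) * page_size, page_size


def paged_query(base_query, page, page_size=10):
    """Append OFFSET/LIMIT to a query that already has an ORDER BY."""
    offset, limit = page_window(page, page_size)
    # Both values are integers computed here, not user text pasted into SQL.
    return f"{base_query} OFFSET {offset} LIMIT {limit}"


print(paged_query("select * from cheap order by id", 1))
print(paged_query("select * from cheap order by id", 2))
```

Without a stable ORDER BY in the base query, consecutive pages are not guaranteed to line up, which is why the queries in this post always order by id.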
This approach is popular and is found in various web applications. It is simple to implement and has an acceptable performance on cheap queries.
The number of matching results could be counted with a SELECT COUNT(*) at the beginning. This number could be cached as well, so as to reduce the load on the server.
The latency in page display is low in the beginning and degrades linearly as the user moves deeper.
The problem with this approach is that it does not reuse previous effort. This is especially problematic if the query is expensive.
Another problem is that each query would potentially see a different snapshot of the data. If the user is browsing page n and the underlying data changes, refreshing or revisiting page n would show different data.
cheap query
dms3_test=> abort; 
begin; 
select count(*) from cheap; 
select * from cheap order by id offset 0 limit 10; 
select * from cheap order by id offset 50000 limit 10;

ROLLBACK
Time: 1.640 ms
BEGIN
Time: 6.187 ms
  count  
---------
 1010431
(1 row)

Time: 1094.187 ms
 id 
----
  1
  2
  3
  4
  5
  6
  9
 10
 11
 12
(10 rows)

Time: 57.371 ms
  id   
-------
 50003
 50004
 50005
 50006
 50007
 50008
 50009
 50010
 50011
 50012
(10 rows)

Time: 134.610 ms
expensive query
dms3_test=> abort; 
begin; 
select count(*) from expensive; 
select * from expensive order by id offset 0 limit 10; 
select * from expensive order by id offset 50000 limit 10;

ROLLBACK
Time: 2.698 ms
BEGIN
Time: 4.584 ms
count 
-------
 68276
(1 row)

Time: 18034.510 ms
 id  
-----
   6
  50
  55
  65
  89
 109
 110
 133
 144
 155
(10 rows)

Time: 76.929 ms
   id   
--------
 749659
 749661
 749667
 749685
 749692
 749720
 749732
 749740
 749741
 749778
(10 rows)

Time: 14424.053 ms
Characteristics:
  • simple implementation
  • no setup cost
  • suitable for cheap queries
  • suitable if user is expected to browse through only few pages
  • not isolated from underlying changes
  • latency degrades linearly

Operating on Full Result Set

This approach takes off from the previous one by reusing previous effort. The database takes a hit only on new query criteria, instead of every time the user changes pages.
This approach could be implemented by using either a temporary table or a without hold cursor. Both implementations require the webapp to maintain and reuse the transaction in which the table or cursor is defined.
A common strategy is to maintain a fixed number of connections to the database and assign one connection to the processing of a query in a round-robin way, i.e. map a specific query criteria to a specific connection.
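One way to read that strategy is sketched below: each new set of query criteria is handed the next connection slot in round-robin order, and later requests with the same criteria go back to the same slot, where their temporary table or cursor lives. The class ConnectionAssigner and the criteria strings are hypothetical, and a real webapp would also need to evict old criteria.

```python
from itertools import count


class ConnectionAssigner:
    """Map each distinct query criteria to one of a fixed number of slots."""

    def __init__(self, pool_size):
        if pool_size < 1:
            raise ValueError("pool_size must be at least 1")
        self.pool_size = pool_size
        self._next = count()   # running counter used for round-robin
        self._slot_for = {}    # criteria -> slot index

    def slot(self, criteria):
        # Reuse the slot that already holds this criteria's table or cursor.
        if criteria not in self._slot_for:
            self._slot_for[criteria] = next(self._next) % self.pool_size
        return self._slot_for[criteria]


assigner = ConnectionAssigner(pool_size=3)
for criteria in ["state=new", "state=old", "state=new", "client=client1"]:
    print(criteria, "->", assigner.slot(criteria))
```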
In each connection, a transaction is held open throughout the duration of the webapp. This transaction would hold various temporary tables or cursors. You would want to keep the transaction open as long as possible.
Warning: keeping a transaction open for a long time has the following negative side-effects:
  • prevents vacuum from removing all dead tuples
  • blocks changes to schema
  • may block other transactions if data is modified within the transaction
Moreover, transactions may become invalid at any time due to some unforeseen event, like Bob spilling his soda over the ethernet switch.
Therefore, your webapp should be able to re-connect and set up the temporary tables or cursors again if the existing connection or transaction is no longer valid.
Being able to redo the setup would also allow the DBA to vacuum thoroughly and/or make schema updates by simply killing and temporarily blocking connections from your webapp during low-traffic hours, without having to restart your webapp. This is a big deal if the DBA is not the sysadmin or does not have permission to restart your webapp.
Before processing each query, it is recommended to generate a SAVEPOINT so that any error in processing a query would not destroy the transaction.
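The savepoint pattern can be sketched as below. To keep the example self-contained it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; that substitution is an assumption made purely for illustration (SQLite accepts the same SAVEPOINT, ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT statements, but its error handling differs in detail). The helper guarded_query and the savepoint name page_query are made up.

```python
import sqlite3

# isolation_level=None: the module issues no implicit BEGIN/COMMIT,
# so the transaction below is controlled entirely by our own statements.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO document (id) VALUES (?)",
                 [(i,) for i in range(1, 31)])

conn.execute("BEGIN")  # the long-lived transaction
conn.execute("CREATE TEMP TABLE foo AS SELECT id FROM document ORDER BY id")


def guarded_query(conn, sql, params=()):
    """Run sql inside a savepoint; return rows, or None if it failed."""
    conn.execute("SAVEPOINT page_query")
    try:
        rows = conn.execute(sql, params).fetchall()
    except sqlite3.Error:
        # Undo only the work done since the savepoint.
        conn.execute("ROLLBACK TO SAVEPOINT page_query")
        rows = None
    conn.execute("RELEASE SAVEPOINT page_query")
    return rows


bad = guarded_query(conn, "SELECT * FROM no_such_table")
good = guarded_query(conn, "SELECT id FROM foo ORDER BY id LIMIT ? OFFSET ?",
                     (10, 10))
print(bad, good[0], conn.in_transaction)
```

The failed query leaves the outer transaction, and the temporary table inside it, intact, so the next page request can still be served.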

Using Temporary Tables

The result set could be piped into a temporary table via the CREATE TEMPORARY TABLE foo AS command. It is important to use a temporary table rather than a regular one: a temporary table is not journalled into the WAL (write-ahead log), so it avoids the negative performance impact that journalling would have.
This implementation gives you a free count of matching results when you do the CREATE TEMPORARY TABLE AS. I am not sure why psql does not show the count, but it is accessible from within a stored procedure or your DB driver.
cheap query
dms3_test=> abort;
begin;
create temporary table foo as select * from cheap order by id;
select * from foo order by id offset 0 limit 10; 
select * from foo order by id offset 50000 limit 10;

ROLLBACK
Time: 60.744 ms
BEGIN
Time: 0.686 ms
SELECT
Time: 15125.956 ms
 id 
----
  1
  2
  3
  4
  5
  6
  9
 10
 11
 12
(10 rows)

Time: 4397.762 ms
  id   
-------
 50003
 50004
 50005
 50006
 50007
 50008
 50009
 50010
 50011
 50012
(10 rows)

Time: 4413.789 ms
expensive query
dms3_test=> abort;
begin;
create temporary table foo as select * from expensive order by id;
select * from foo order by id offset 0 limit 10; 
select * from foo order by id offset 50000 limit 10;

ROLLBACK
Time: 52.777 ms
BEGIN
Time: 3.683 ms
SELECT
Time: 18666.615 ms
 id  
-----
   6
  50
  55
  65
  89
 109
 110
 133
 144
 155
(10 rows)

Time: 314.754 ms
   id   
--------
 749659
 749661
 749667
 749685
 749692
 749720
 749732
 749740
 749741
 749778
(10 rows)

Time: 342.207 ms
Characteristics:
  • complex implementation
  • high setup cost
  • free result count as a side-effect
  • suitable for expensive queries
  • suitable if user is expected to comprehensively browse the result set
  • isolated from underlying changes
  • latency still degrades linearly but more gently

Using Without Hold Cursors

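Without hold cursors are destroyed at the end of the transaction, similar to temporary tables declared with ON COMMIT DROP (by default a temporary table lasts until the end of the session). With hold cursors, on the other hand, outlive the creating transaction, although they are still bound to a session. I recommend using without hold cursors to simplify garbage management.
Note that in the transcripts below, MOVE FIRST leaves the cursor positioned on the first row, so the FETCH 10 that follows starts at the second row and the first page misses one row. MOVE ABSOLUTE 0 would position the cursor before the first row instead.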
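Positioning a scroll cursor for an arbitrary page can be sketched like this. It only builds the MOVE and FETCH statements, in the same form used in the transcripts in this section; the function name cursor_page_statements is made up, and the cursor name is assumed to be a trusted identifier chosen by the webapp.

```python
def cursor_page_statements(cursor_name, page, page_size=10):
    """Build the MOVE/FETCH pair for a 1-based page on a scroll cursor."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    # MOVE ABSOLUTE n leaves the cursor on row n (0 means before the first
    # row), so the FETCH that follows starts at row n + 1.
    return [
        f"move absolute {offset} from {cursor_name};",
        f"fetch {page_size} from {cursor_name};",
    ]


for statement in cursor_page_statements("cheap_cursor", 1):
    print(statement)
for statement in cursor_page_statements("cheap_cursor", 5001):
    print(statement)
```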
cheap query
dms3_test=> abort;
begin;
declare cheap_cursor scroll cursor for select * from cheap order by id;
move all from cheap_cursor;
move first from cheap_cursor;
fetch 10 from cheap_cursor;
move absolute 50000 from cheap_cursor;
fetch 10 from cheap_cursor;

ROLLBACK
Time: 4.054 ms
BEGIN
Time: 0.970 ms
DECLARE CURSOR
Time: 1.022 ms
MOVE 1010431
Time: 12434.136 ms
MOVE 1
Time: 4.409 ms
 id 
----
  2
  3
  4
  5
  6
  9
 10
 11
 12
 13
(10 rows)

Time: 4.418 ms
MOVE 1
Time: 30.055 ms
  id   
-------
 50003
 50004
 50005
 50006
 50007
 50008
 50009
 50010
 50011
 50012
(10 rows)

Time: 3.875 ms
expensive query
dms3_test=> abort;
begin;
declare expensive_cursor scroll cursor for select * from expensive order by id;
move all from expensive_cursor;
move first from expensive_cursor;
fetch 10 from expensive_cursor;
move absolute 50000 from expensive_cursor;
fetch 10 from expensive_cursor;

ROLLBACK
Time: 2.044 ms
BEGIN
Time: 0.739 ms
DECLARE CURSOR
Time: 51.912 ms
MOVE 68276
Time: 19036.148 ms
MOVE 1
Time: 1.055 ms
 id  
-----
  50
  55
  65
  89
 109
 110
 133
 144
 155
 186
(10 rows)

Time: 0.911 ms
MOVE 1
Time: 30.226 ms
   id   
--------
 749659
 749661
 749667
 749685
 749692
 749720
 749732
 749740
 749741
 749778
(10 rows)

Time: 1.736 ms
Characteristics:
  • complex implementation
  • high setup cost
  • suitable for expensive queries
  • suitable if user is expected to comprehensively browse the result set
  • isolated from underlying changes
  • barely noticeable latency

Hybrid Approach

One could do a hybrid approach. The implementation would be even more complex, but in some cases it could combine the zero setup cost of the piecemeal approach with the low latency of the full result set approach.
The hybrid approach would operate on a piecemeal result set until a certain threshold is reached, e.g. paging past page 7. When that happens, one of the full result set approaches is executed, preferably in the background. The webapp could transition to using the full result set when it is ready.
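The decision logic might look like the sketch below. Everything here is hypothetical: the class HybridPager, its attribute names and the strategy labels are invented for illustration, and the background job that actually builds the temporary table or cursor is left out.

```python
class HybridPager:
    """Pick a paging strategy per request."""

    def __init__(self, threshold_page=7):
        self.threshold_page = threshold_page
        self.full_set_requested = False  # background build was kicked off
        self.full_set_ready = False      # set when the background build ends

    def strategy_for(self, page):
        if self.full_set_ready:
            return "full"
        if page > self.threshold_page and not self.full_set_requested:
            # A real webapp would start building the temp table or cursor
            # here, in the background; this sketch only records the decision.
            self.full_set_requested = True
        return "piecemeal"

    def mark_full_set_ready(self):
        self.full_set_ready = True


pager = HybridPager()
print(pager.strategy_for(1), pager.full_set_requested)
print(pager.strategy_for(8), pager.full_set_requested)
pager.mark_full_set_ready()
print(pager.strategy_for(9))
```

Requests keep being served piecemeal while the full result set is being prepared, so the user never waits for the setup cost directly.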

Summary

                              Summary of Implementations
Query Type  Implementation      Setup(ms)  Counting(ms)  First Page(ms)  5000th Page(ms)
Cheap       New query per page        N/A      1094.187          57.371          134.610
            Temporary Table     15125.956           N/A        4397.762         4413.789
            Cursor                  1.022     12434.136           8.827           33.930
Expensive   New query per page        N/A     18034.510          76.929        14424.053
            Temporary Table     18666.615           N/A         314.754          342.207
            Cursor                 51.912     19036.148           1.966           31.962

 
 
(originally from http://microjet.ath.cx/WebWiki/ResultPaginationWithPostgresql.html)

pvmove problem

I am using LVM2 with linux 2.6.11. One of the drives in the volume group is going bad. I could hear a screeching noise while the platters were spinning.

That is bad. Not a problem, though, as I can just vacate the data from that drive using pvmove.
The bad drive is /dev/bad. /dev/avail is a physical volume with some free space.
 
# pvdisplay /dev/bad /dev/avail
  --- Physical volume ---
  PV Name               /dev/bad
  VG Name               vg0
  PV Size               148.09 GB / not usable 0
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              37911
  Free PE               25111
  Allocated PE          12800
  PV UUID               e5EvaO-0oo5-Zenl-KzkY-wFf7-EkQX-di1mXt

  --- Physical volume ---
  PV Name               /dev/avail
  VG Name               vg0
  PV Size               186.30 GB / not usable 0
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              47694
  Free PE               20658
  Allocated PE          27036
  PV UUID               HLtCMK-751U-IFAW-5Aj3-FQ3w-5tqO-8svvwt


Let's move it to the only drive with available space, /dev/avail. /dev/bad has 12800 allocated PE while /dev/avail has 20658 free PE. I was not expecting any problem fitting the data in /dev/bad into /dev/avail.
 
# pvmove -i 5 -v /dev/bad
    Finding volume group "vg0"
    Archiving volume group "vg0" metadata.
    Creating logical volume pvmove0
    Moving 0 extents of logical volume vg0/lv0
  Insufficient contiguous allocatable extents (1777) for logical volume pvmove0: 12800 required
  Unable to allocate temporary LV for pvmove.

Urk? It needs to be contiguous? What to do now?

Searching through the lvm mailing list shows that pvmove is dumb. It only sees the first contiguous run of free PEs (physical extents), which here is just 1777 extents long. OK, let's work around this.
 
# vgcfgbackup

# grep pv0 /etc/lvm/backup/vg0
                pv0 {
                                        "pv0", 30720
                                        "pv0", 0
                                        "pv0", 17920

pv0 is the physical volume corresponding to /dev/bad. From this, we see that there are three segments residing in pv0. The first starts at PE 30720.

Let's try to fill those 1777 free PEs on the destination drive, /dev/avail. That means we'll be moving PEs 30720 through 32496 (30720+1777-1) from pv0.
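The range arithmetic is easy to get off by one, so here is a tiny sketch of it. The helper name pvmove_range is made up; it only formats the device:first-last argument and does not call LVM.

```python
def pvmove_range(device, start_pe, count):
    """Return the device:first-last argument for moving count extents."""
    if count < 1:
        raise ValueError("count must be at least 1")
    last_pe = start_pe + count - 1  # the range is inclusive on both ends
    return f"{device}:{start_pe}-{last_pe}"


print(pvmove_range("/dev/bad", 30720, 1777))  # prints /dev/bad:30720-32496
```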


# pvmove -i 5 -v /dev/bad:30720-32496
    Finding volume group "vg0"
    Archiving volume group "vg0" metadata.
    Creating logical volume pvmove0
    Moving 0 extents of logical volume vg0/lv0
    Moving 1777 extents of logical volume vg0/lv1
    Moving 0 extents of logical volume vg0/lv2
    Moving 0 extents of logical volume vg0/lv3
    Found volume group "vg0"
    Updating volume group metadata
    Creating volume group backup "/etc/lvm/backup/vg0"
    Found volume group "vg0"
    Found volume group "vg0"
    Loading vg0-pvmove0
    Found volume group "vg0"
    Loading vg0-lv1
    Checking progress every 5 seconds
  /dev/hdf2: Moved: 7.7%
  /dev/hdf2: Moved: 14.4%
  /dev/hdf2: Moved: 22.6%
  /dev/hdf2: Moved: 29.7%
  /dev/hdf2: Moved: 37.4%
  /dev/hdf2: Moved: 44.6%
  /dev/hdf2: Moved: 52.3%
  /dev/hdf2: Moved: 60.0%
  /dev/hdf2: Moved: 67.7%
  /dev/hdf2: Moved: 75.4%
  /dev/hdf2: Moved: 83.1%
  /dev/hdf2: Moved: 90.3%
  /dev/hdf2: Moved: 97.4%
  /dev/hdf2: Moved: 100.0%
    Found volume group "vg0"
    Found volume group "vg0"
    Found volume group "vg0"
    Loading vg0-pvmove0
    Found volume group "vg0"
    Loading vg0-lv1
    Found volume group "vg0"
    Found volume group "vg0"
    Removing temporary pvmove LV
    Writing out final volume group after pvmove
    Creating volume group backup "/etc/lvm/backup/vg0"

Finally, after repeating the above process for the remaining segments, /dev/bad, aka pv0, is free of data and is safe to take down.

# vgreduce vg0 /dev/bad
 
Toss it in the garbage bin.


(originally from http://microjet.ath.cx/WebWiki/pvmove%20problem.html)

2005-08-14

How Does Emacs Know Which Functions Are Interactive

When you do M-x foo in emacs, you are asking emacs to execute an interactive function named "foo". The function must be declared in a special way, as follows:
 
(defun foo/1 ()
  (interactive)
  (if (eq some-condition t)
      (execute "rm -rf /")))

The form on the second line, "(interactive)", tells emacs that this function is an interactive function.

But how does emacs know you put that form in that function? It could try to execute the function, but that may produce an unwanted side-effect (like executing "rm -rf /" if the condition is met).

Could you do a partial execution, e.g.: when a function is defined, execute only the first form and don't execute the rest? But that does not explain the following capability:
 
(defun foo/2 (do-the-rm-rf)
  (interactive (list (read-string "Run rm -rf /?" "N")))
  (if (string= do-the-rm-rf "Y")
      (execute "rm -rf /")))

(defun call-foo/2 ()
  (funcall 'foo/2 "N"))

(defun call-foo/3 ()
  (interactive (list (execute "rm -rf /"))))

When you do M-x foo/2, emacs would first run the read-string function, prompt the user, get the answer from user, and then run the function foo/2. But if you call foo/2 non-interactively by calling it from another function like in call-foo/2, emacs will not prompt the user at all.

Well, that blows the hypothesis that emacs executes only the first form when a function is defined. There can be no execution whatsoever, since doing so may cause severe side-effects. What if the execute form in call-foo/3 were executed? Bad, bad, bad.

So, where is the magic?

The magic lies in macro

defun is actually a special form (in newer emacs versions, a macro). Ordinarily, '(something a b)' in elisp evaluates the arguments and then calls the function 'something'. A special form or macro is not evaluated that way: it receives its arguments unevaluated, as plain list data, and decides for itself what to do with them.

So defun never has to run the body. It simply stores the body as data in the function definition, and emacs can later scan that definition for an interactive form at the top of the body, without executing anything. When the function is called normally, the interactive form itself does nothing.
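The idea of inspecting a definition without running it can be illustrated with a toy model. The sketch below is not how emacs is implemented; it merely represents a defun as nested Python lists and looks at the first body form. The function interactive_spec and the list encoding are invented for this example, and docstrings are ignored for simplicity.

```python
def interactive_spec(definition):
    """Return the interactive form of a function definition, or None.

    definition mimics (defun NAME ARGS BODY...) as nested Python lists.
    """
    body = definition[3:]
    # Only look at the first body form; nothing is ever evaluated.
    if body and isinstance(body[0], list) and body[0][:1] == ["interactive"]:
        return body[0]
    return None


foo_1 = ["defun", "foo/1", [],
         ["interactive"],
         ["if", ["eq", "some-condition", "t"], ["execute", "rm -rf /"]]]
call_foo_2 = ["defun", "call-foo/2", [],
              ["funcall", "'foo/2", "N"]]

print(interactive_spec(foo_1) is not None)
print(interactive_spec(call_foo_2) is not None)
```

Because the body is only data at this point, the dangerous execute form sits there harmlessly while the scan takes place.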

(originally from http://microjet.ath.cx/WebWiki/HowDoesEmacsKnowWhichFunctionsAreInteractive.html)