Workiis: 2005-08

2005-08-27

AMD 64 hyper transport single channel

Results from running memtest86 on three computers:

Memtest86 Summary
	AMD Athlon XP 2100	Intel Pentium M 1.3GHz	AMD Athlon 64 3400 (socket754)
Operating Frequency	1.7GHz	1.3GHz	2.4GHz
L1 Size(KiB)	128	64	128
L2 Size(KiB)	256	1024	512
RAM Size(MiB)	768	496	1024
L1 Rate(MiB/s)	10602	16034	19664
L2 Rate(MiB/s)	3375	7967	4886
RAM Rate(MiB/s)	505	939	1441
Chipset	VIA KT400(A)/600	Intel i855GM/GME FSB:99MHz	SIS 760/M760
Settings	DDR266	RAM: 132MHz (DDR264) CAS: 2-3-2-6	DDR400

Notes:

memtest misidentified the operating frequency of the Athlon64. It should be 2.2GHz.

The Athlon64, being on socket 754, only have a single channel to the RAM.
Athlon64 on socket 939 (I don't have one) would have two channels to the RAM. Of course, to take advantage of the second channel, you'd have to have a second DIMM module.
I have no idea how to test the decrease in latency of RAM access AMD touts in Athlon64 with its built-in memory controller.

(originally from http://microjet.ath.cx/WebWiki/Amd64HyperTransportSingleChannel.html)

2005-08-23

Result pagination with postgresql

A common problem with webapps is providing an interface to page through result set like what various search engines do.
I have yet to find a website that discuss this in depth. So, here is a summary of solutions I came up with by looking at various pieces in the Web for doing result pagination with Postgresql. I hope this would give the sorely needed encouragement for people to start sharing their findings.
The problem actually has three components:

displaying result for a certain page,
not causing undue latency in page display, and
counting number of results.

Counting the accurate number of results would almost always require the full result set to be counted. Applications would usually cache this number. How exactly is the counting done depends on the approach taken.
And there are two basic approaches:

operating on piecemeal result set
operating on full result set

Please realise that there is no 'best' approach. Each comes with its own pros and cons.
To illustrate the pros and cons, I am employing two kind of queries: cheap and expensive. I categorise queries according to the effort exacted from pgsql: cheap and expensive. This categorisation is only for simplicity purposes as there are, of course, grey areas, queries that are neither cheap nor expensive; not to mention that cheap and expensive are subjective terms anyway.

dms3_test=> create view cheap as select id from document;
CREATE VIEW
Time: 150.078 ms

dms3_test=> create view expensive as SELECT doc.id FROM
attribute as attr0, attribute_name as an0, document as doc, state,
document_attribute as da0 WHERE state.id = doc.state_id AND state.name
= 'new' AND da0.doc_id = doc.id AND attr0.id = da0.attribute_id AND
an0.id = attr0.name_id AND an0.name = 'ssis_client' AND attr0.value
ILIKE 'client1';
CREATE VIEW
Time: 120.979 ms

Since what is pro and what is con depend very much against the context, I simply list the characteristics of each approach without further labelling.

Operating on Piecemeal Result Set (New Query for Each Page)

In this approach, a new query is run for each page. Each query differs only in OFFSET and LIMIT clauses.
For example, for the first page, the query would be executed with OFFSET 0 and LIMIT 10. The second page would be OFFSET 11 LIMIT 10.
This approach is popular and is found in various web applications. It is simple to implement and has an acceptable performance on cheap queries.
Number of matching results could be counted with a SELECT COUNT(*) in the beginning. This number could be cached as well so as to reduce the load on the server.
The latency in page display is low in the beginning and degrades linearly as user moves deeper.
The problem with this approach is it does not reuse previous effort. This is especially problematic if the query is expensive.
Another problem is each query would potentially see different snapshot of the data. If user is browsing page n and the underlying data changes, refreshing or revisiting page n would show a different data.
cheap query

dms3_test=> abort; 
begin; 
select count(*) from cheap; 
select * from cheap order by id offset 0 limit 10; 
select * from cheap order by id offset 50000 limit 10;

ROLLBACK
Time: 1.640 ms
BEGIN
Time: 6.187 ms
  count  
---------
 1010431
(1 row)

Time: 1094.187 ms
 id 
----
  1
  2
  3
  4
  5
  6
  9
 10
 11
 12
(10 rows)

Time: 57.371 ms
  id   
-------
 50003
 50004
 50005
 50006
 50007
 50008
 50009
 50010
 50011
 50012
(10 rows)

Time: 134.610 ms

expensive query

dms3_test=> abort; 
begin; 
select count(*) from expensive; 
select * from expensive order by id offset 0 limit 10; 
select * from expensive order by id offset 50000 limit 10;

ROLLBACK
Time: 2.698 ms
BEGIN
Time: 4.584 ms
count 
-------
 68276
(1 row)

Time: 18034.510 ms
 id  
-----
   6
  50
  55
  65
  89
 109
 110
 133
 144
 155
(10 rows)

Time: 76.929 ms
   id   
--------
 749659
 749661
 749667
 749685
 749692
 749720
 749732
 749740
 749741
 749778
(10 rows)

Time: 14424.053 ms

Characteristics:

simple implementation
no setup cost
suitable for cheap queries
suitable if user is expected to browse through only few pages
not isolated from underlying changes
latency degrades linearly

Operating on Full Result Set

This approach takes off from the previous one by reusing previous effort. The database takes a hit only on new query criteria, instead of every time the user changes pages.
This approach could be implemented by using either a temporary table or a without hold cursor. Both implementations require the webapp to maintain and reuse the transaction in which the table or cursor is defined.
A common strategy is to maintain a fixed number of connections to the database and assign one connection to the processing of a query in a round-robin way, i.e. map a specific query criteria to a specific connection.
In each connection, a transaction is held open throughout the duration of the webapp. This transaction would hold various temporary tables or cursors. You would want to keep the transaction open as long as possible.
Warning: keeping a transaction open for a long time would have the

following negative side-effects:

prevents vacuum from removing all dead tuples.
blocks changes to schema
may block other transactions if data is modified within the transaction

Moreover, transactions may become invalid at any time due to some unforeseen event like Bob spilling his soda over the ethernet switch.
Therefore, your webapp should be able to re-connect and re-setup the temporary tables or cursors setup if the existing connection or transaction is no longer valid.
Being able to re-setup would also allow the DBA to vacuum thoroughly and/or make schema updates by simply killing and temporarily blocking connections from your webapp during low-traffic hours without having to restart your webapp. This is a big deal if the DBA person is not the sysadmin or have permission to restart your webapp.
Before processing each query, it is recommended to generate a SAVEPOINT so that any error in processing a query would not destroy the transaction.

Using Temporary Tables

The result set could be piped into a temporary table via the CREATE TEMPORARY TABLE foo AS command. It is important to remember to use a temporary table since it is not journalled into the WAL (write-ahead logging) which would have negative impact on performance.
The implementation gives you a free count of matching result when you do the CREATE TEMPORARY TABLE AS. I am not sure why psql does not show the count, but it is accessible from within a stored procedure or your DB driver.
cheap query

dms3_test=> abort;
begin;
create temporary table foo as select * from cheap order by id;
select * from foo order by id offset 0 limit 10; 
select * from foo order by id offset 50000 limit 10;

ROLLBACK
Time: 60.744 ms
BEGIN
Time: 0.686 ms
SELECT
Time: 15125.956 ms
 id 
----
  1
  2
  3
  4
  5
  6
  9
 10
 11
 12
(10 rows)

Time: 4397.762 ms
  id   
-------
 50003
 50004
 50005
 50006
 50007
 50008
 50009
 50010
 50011
 50012
(10 rows)

Time: 4413.789 ms

expensive query

dms3_test=> abort;
begin;
create temporary table foo as select * from expensive order by id;
select * from foo order by id offset 0 limit 10; 
select * from foo order by id offset 50000 limit 10;

ROLLBACK
Time: 52.777 ms
BEGIN
Time: 3.683 ms
SELECT
Time: 18666.615 ms
 id  
-----
   6
  50
  55
  65
  89
 109
 110
 133
 144
 155
(10 rows)

Time: 314.754 ms
   id   
--------
 749659
 749661
 749667
 749685
 749692
 749720
 749732
 749740
 749741
 749778
(10 rows)

Time: 342.207 ms

Characteristics:

complex implementation
high setup cost
free result count as a side-effect
suitable for expensive queries
suitable if user is expected to comprehensively browse the result set
isolated from underlying changes
latency still degrades linearly but more gently

Using Without Hold Cursors

Without hold cursors are destroyed at the end of transaction, similar to temporary tables. On the other hand, with hold cursors outlive the creating transaction, although they are still bounded within a session. I recommend using without hold cursors to simplify garbage management.
cheap query

dms3_test=> abort;
begin;
declare cheap_cursor scroll cursor for select * from cheap order by id;
move all from cheap_cursor;
move first from cheap_cursor;
fetch 10 from cheap_cursor;
move absolute 50000 from cheap_cursor;
fetch 10 from cheap_cursor;

ROLLBACK
Time: 4.054 ms
BEGIN
Time: 0.970 ms
DECLARE CURSOR
Time: 1.022 ms
MOVE 1010431
Time: 12434.136 ms
MOVE 1
Time: 4.409 ms
 id 
----
  2
  3
  4
  5
  6
  9
 10
 11
 12
 13
(10 rows)

Time: 4.418 ms
MOVE 1
Time: 30.055 ms
  id   
-------
 50003
 50004
 50005
 50006
 50007
 50008
 50009
 50010
 50011
 50012
(10 rows)

Time: 3.875 ms

expensive query

dms3_test=> abort;
begin;
declare expensive_cursor scroll cursor for select * from expensive order by id;
move all from expensive_cursor;
move first from expensive_cursor;
fetch 10 from expensive_cursor;
move absolute 50000 from expensive_cursor;
fetch 10 from expensive_cursor;

ROLLBACK
Time: 2.044 ms
BEGIN
Time: 0.739 ms
DECLARE CURSOR
Time: 51.912 ms
MOVE 68276
Time: 19036.148 ms
MOVE 1
Time: 1.055 ms
 id  
-----
  50
  55
  65
  89
 109
 110
 133
 144
 155
 186
(10 rows)

Time: 0.911 ms
MOVE 1
Time: 30.226 ms
   id   
--------
 749659
 749661
 749667
 749685
 749692
 749720
 749732
 749740
 749741
 749778
(10 rows)

Time: 1.736 ms

Characteristics:

complex implementation
high setup cost
suitable for expensive queries
suitable if user is expected to comprehensively browse the result set
isolated from underlying changes
barely noticeable latency

Hybrid Approach

One could do a hybrid approach. The implementation would be even more complex, but in some cases, it could combine the no setup cost benefit of the piecemeal approach and the low latency of the full result approach.
The hybrid approach would operate on piecemeal result set until a certain threshold is reached, e.g.: paging past page 7. When that happens, one of the full result set approach is executed, preferably in the background. The webapp could transition to using the full result set when it is ready.

Summary

Summary of Implementations
Query Type	Implementation	Setup(ms)	Counting(ms)	First Page(ms)	5000th Page(ms)
Cheap	New query per page	N/A	1094.187	57.371	134.610
	Temporary Table	15125.956	N/A	4397.762	4413.789
	Cursor	1.022	12434.136	8.827	33.930
Expensive	New query per page	N/A	18034.510	76.929	14424.053
	Temporary Table	18666.615	N/A	314.754	342.207
	Cursor	51.921	19036.148	1.966	31.962

(originally from http://microjet.ath.cx/WebWiki/ResultPaginationWithPostgresql.html)

pvmove problem

I am using LVM2 with linux 2.6.11. One of the drive in the volume is going bad. I could hear screeching noise while the platters were spinning.

That is bad. Not a problem, though, as I can just vacate the data from that drive using pvmove.
The bad drive is /dev/bad. /dev/avail is a volume with some free space.

# pvdisplay /dev/bad /dev/avail
  --- Physical volume ---
  PV Name               /dev/bad

VG Name               vg0
  PV Size               148.09 GB / not usable 0
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              37911
  Free PE               25111
  Allocated PE          12800
  PV UUID               e5EvaO-0oo5-Zenl-KzkY-wFf7-EkQX-di1mXt

  --- Physical volume ---
  PV Name               /dev/avail

VG Name               vg0
  PV Size               186.30 GB / not usable 0
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              47694
  Free PE               20658
  Allocated PE          27036
  PV UUID               HLtCMK-751U-IFAW-5Aj3-FQ3w-5tqO-8svvwt

Let's move it to the only drive with available space, /dev/avail. /dev/bad has 12800 allocated PE while /dev/avail has 20658 free PE. I was not expecting any problem fitting the data in /dev/bad into /dev/avail.

# pvmove -i 5 -v /dev/bad

Finding volume group "vg0"
    Archiving volume group "vg0" metadata.
    Creating logical volume pvmove0
    Moving 0 extents of logical volume vg0/lv0

Insufficient contiguous allocatable extents (1777) for logical
    volume pvmove0: 12800 required
  Unable to allocate temporary LV for pvmove.

Urk? It needs to be contiguous? What to do now?

Searching through the lvm mailing list shows that pvmove is dumb. It only sees the first free PE (physical extent). OK, let's work around this.

# vgcfgbackup

# grep pv0 /etc/lvm/backup/vg0
                pv0 {
                                        "pv0", 30720
                                        "pv0", 0
                                        "pv0", 17920

pv0 is the physical volume corresponding to /dev/bad. From this, we see that there are three segments residing in pv0. The first starts at PE 30720.

Let's try to fill that 1777 free PE on the dest drive, /dev/avail. That means, we'll be moving PE 30720 to (30720+1777-1=32496) from pv0.

# pvmove -i 5 -v /dev/bad:30720-32496
    Finding volume group "vg0"
    Archiving volume group "vg0" metadata.
    Creating logical volume pvmove0
    Moving 0 extents of logical volume vg0/lv0

Moving 1777 extents of logical volume vg0/lv1
    Moving 0 extents of logical volume vg0/lv2

Moving 0 extents of logical volume vg0/lv3
    Found volume group "vg0"
    Updating volume group metadata
    Creating volume group backup "/etc/lvm/backup/vg0"
    Found volume group "vg0"
    Found volume group "vg0"
    Loading vg0-pvmove0
    Found volume group "vg0"
    Loading vg0-lv1

Checking progress every 5 seconds
  /dev/hdf2: Moved: 7.7%
  /dev/hdf2: Moved: 14.4%
  /dev/hdf2: Moved: 22.6%
  /dev/hdf2: Moved: 29.7%
  /dev/hdf2: Moved: 37.4%
  /dev/hdf2: Moved: 44.6%
  /dev/hdf2: Moved: 52.3%
  /dev/hdf2: Moved: 60.0%
  /dev/hdf2: Moved: 67.7%
  /dev/hdf2: Moved: 75.4%
  /dev/hdf2: Moved: 83.1%
  /dev/hdf2: Moved: 90.3%
  /dev/hdf2: Moved: 97.4%
  /dev/hdf2: Moved: 100.0%
    Found volume group "vg0"
    Found volume group "vg0"
    Found volume group "vg0"
    Loading vg0-pvmove0
    Found volume group "vg0"
    Loading vg0-lv1

    Found volume group "vg0"
    Found volume group "vg0"
    Removing temporary pvmove LV
    Writing out final volume group after pvmove
    Creating volume group backup "/etc/lvm/backup/vg0"

Finally, after repeating the above process for the remaining segments, /dev/bad, aka pv0, is free of data and is safe to take down.

# vgreduce vg0 /dev/bad

Toss it in the garbage bin.

(originally from http://microjet.ath.cx/WebWiki/pvmove%20problem.html)

2005-08-14

How Does Emacs Know Which Functions Are Interactive

When you do M-x foo in emacs, you are asking emacs to execute an interactive function named "foo". The function must be declared in a special way as follow:

(defun foo/1 ()
  (interactive)
  (if (eq some-condition t)
      (execute "rm -rf /")))

The form on the second line, "(interactive)", tells emacs that this function is an interactive function.

But how does the emacs knows you put that form in that function? It could try to execute the function, but it may produce an unwanted side-effect (like executing "rm -rf /" if the condition is met).

Could you do a partial execution, e.g.: when a function is defined, execute only the first form and don't execute the rest? But that does not explain the following capability:

(defun foo/2 (do-the-rm-rf)
  (interactive (list (read-string "Run rm -rf /?" "N")))
  (if (string= do-the-rm-rf "Y")
      (execute "rm -rf /")))

(defun call-foo/2 ()
  (funcall 'foo/2 "N"))

(defun call-foo/3 ()
  (interactive (list (execute "rm -rf /"))))

When you do M-x foo/2, emacs would first run the read-string function, prompt the user, get the answer from user, and then run the function foo/2. But if you call foo/2 non-interactively by calling it from another function like in call-foo/2, emacs will not prompt the user at all.

Well, that blows the hypotheses that when a function is defined, emacs executes only the first form. There could be no execution whatsoever since doing so may cause severe side-effect. What if the execute form in call-foo/3 is executed? Bad, bad, bad.

So, where is the magic?

The magic lies in macro

defun is actually a special form call. A special form is a form that may be implemented and/or executed differently. If a form is implemented in non-elisp language, it is a special form. If a form is not a function call form, it is a special form (ordinarily, '(something)' in elisp would execute the function 'something').

A special form basically can do anything to its body (macro), including scanning for the interactive form when it is called, just like what the special form defun does.

(originally from http://microjet.ath.cx/WebWiki/HowDoesEmacsKnowWhichFunctionsAreInteractive.html)