2010-12-17

Nook Color's (NC) PDF Handling

I bought a Nook Color for one purpose only: reading technical PDFs. It arrived on Nov 26th, and I've used it to read several documents. Here's my review, limited to NC's PDF handling, since the device handles EPUB and PDF differently. If you want a review of NC's intended main function, that is, as an EPUB reader, there are excellent reviews out there.


The likes:


* Orientation lock.

* Variable zoom. You could use pinch-and-zoom to size the document just right. Another way to zoom was to use the '+' and '-' buttons to zoom in pre-determined increments.

* Zoom setting was kept across pages. This was huge in terms of usability. I had tried some devices, including one that is currently very popular, that did not preserve the zoom setting across pages: each time you change pages, you have to re-zoom.


* Handled scanned PDFs. Scanned PDFs are PDFs whose pages are images. The largest I tried had 750 pages scanned at 150 dpi, resulting in a file size of 170 MB. NC handled it easily. There was no slowdown flipping from page 555 to the last pages. I read (actual reading) about 130 pages of the document comfortably.


* Ease of side loading. When connected to a computer, it presented itself as two USB mass storage devices, one for the internal memory and the other for the microSD card. There was a directory 'My Files'. You could re-arrange its contents to your liking.


The borderlines:


* If you drag a page (to expose more lines) or zoom out, the newly visible area appears white for a while; I presume NC does not do off-screen rendering. How long it stays white depends on how much new area NC has to render. With both scanned and non-scanned PDFs, dragging my finger from the very bottom to the very top of the screen, creating a very large new area, took about 1/2 to 3/4 of a second for NC to fill in. The lag was not noticeable at my reading pace. It became noticeable when I had to pan around, for example trying to look at a graph at the top of the page when I was almost at the end of the page.


* It couldn't display two pages at the same time. That would have been useful for looking at pictures that span two pages, as often found in magazines.


* I didn't see any text reflowing. Maybe it does that only on certain text PDFs?




The dislikes:


* If you switch applications, even something as simple as going to the home screen and then tapping the upper bar to continue reading, you'll start at page 1 instead of resuming where you were.


* No bookmarking. There was no bookmark facility for PDFs, so before you switch applications you need to remember to note the current page number.


* It did not keep track of the last page read. Again, you need to note the current page number yourself. Fortunately, sleeping and resuming the device preserved the current page.

* Did not support the table of contents. A TOC is essential when you have a voluminous PDF.

* No search functionality.

* There were handling problems with some PDF documents, for example this MQSeries API document. If you read from the first page, NC renders it correctly: you can rotate, zoom in, zoom out, and so on. But if you jump to, say, page 555, it goes bonkers. To get to the next page you have to use the go-to-page function; flipping pages by swiping or tapping no longer works. It also displays the document at the fit-page zoom level, which makes the text too tiny to read, and you can't change that.

* Did not display Chinese characters.




I returned the device on Dec 3rd. I could tolerate some of the dislikes, but I couldn't stand having to remember the current page number. This deficiency will likely be fixed in the future. The development team has been diligent about improving the original Nook, and I can certainly see them putting similar effort into the Nook Color.

There is also news of NC being successfully hacked into a general Android tablet. Maybe in the future we can finally load a proper PDF reader onto NC.

I'll probably revisit NC if it gets bookmarking or remember-last-page functionality.

2010-10-02

Last Labor Day I bought two dozen Maryland crabs to eat throughout the long weekend. The first day I ate 7 and ended up with a very bad headache; my eyes became very sensitive to light, and I felt somewhat nauseous too. I took aspirin and slept it off. The next day, I could eat only 4 of them before the headache and all the other ailments came back. The day after, I could eat only one before I had to stop. I threw away the rest of the crabs the next day.

I really didn't know why I couldn't eat more, until I read this article [http://www.hungrymonster.com/FoodFacts/Food_Facts.cfm?Phrase_vch=Seafood&fid=7203]:


What is the yellow stuff inside a cooked crab? Some people call it "mustard." Is it fat?

The "mustard" has a strong taste and is eaten by many people who consider it a delicacy.

Caution: Research shows that chemical contaminants such as polychlorinated biphenyls (PCBs) concentrate in the blue crab's hepatopancreas. Crabs caught in "advisory" areas may contain high levels of these contaminants.

Other resources, such as http://www.bluecrab.info/glossary.html, describe it as somewhat similar in function to a human's liver and kidneys. That document also mentions that the hepatopancreas of other crabs (not the Maryland blue crab) can actually serve as a marker for heavy metal pollution.


That means its purpose is to store all the crap that crabs do not want in their blood: all the rejects, toxins, poisons, all sorts of contaminants.

And I ate, savored, sought, and treasured it... I think I was poisoning myself, and I'm glad my body was able to stop me.

2010-09-27

Archiving on hard disk

How I archive at home:

The difference between an archive and a backup


The most important difference is that an archive is the master copy and a backup is not. If you lose the archive copy, you risk losing the content permanently. Because the archive is the master copy, it is sensible to keep a backup of it.


Media choice


I used to archive to CD-Rs and DVD-Rs. They had a great GiB/$ ratio, which was an important factor for me. They were also impervious to certain kinds of damage: power surges, file-system corruption, fat-finger deletion, etc. But they were inconvenient to handle because you need many of them. It was not so bad 10 years ago, when I got my first CD-R drive, because my data was small relative to a CD-R's capacity. But over the years my data grew and grew, and I've gotten lazier about feeding DVD-Rs into the drive.

Now I archive to hard disk. It has become economical to do so. As of today, the best GiB-per-dollar ratio for a SATA hard drive [1] on newegg.com is 14.70 GiB/$. You'd need at least 2 of them, which brings it down to 7.35 GiB/$. The cheapest DVD-R media goes for $18.00 for 100 single-layer discs [2], equivalent to 24.27 GiB/$. DVD-Rs are still cheaper, but it takes a lot of time to write to them.

Burning one DVD-R takes 15 minutes; writing the equivalent amount, 4.37 GiB, to a hard disk through USB2 (assuming a write speed of 20 MiB/s) takes about 3.7 minutes. Let's assume we are going for time, so we do not bother validating the archive (re-reading), and also that the data exactly occupies a DVD-R. The first assumption gives a huge advantage to the DVD-R because it eliminates a huge validation overhead: the media has to be ejected and re-inserted. The second assumption also eliminates the overhead of changing DVD-R media. So the time saved is (15 - 3.7) mins / 4.37 GiB = 2.59 mins/GiB. If you have to archive 100 GiB, copying it to a hard drive is quicker by 100 GiB * (15 - 3.7) mins / 4.37 GiB = 4.31 hours.

In reality, the time saved, and thus the time-cost difference, is even more pronounced because you don't have to wait in front of the computer while the data is being copied. I am usually present for less than 5 minutes total: initiate copying, initiate validation. The time difference that matters to you (you not being a slave to the DVD-R drive overlord) is 100 GiB * 15 mins / 4.37 GiB - 5 mins = 5.63 hours per 100 GiB.

For a time saving of 5.63 hours per 100 GiB, I'd pay the cost difference of 100 * (24.27 - 7.35) / (24.27 * 7.35) = $9.49 per 100 GiB for hard drives.


[1] $95 for 1.5 TB HD is $95 for 1396.98 GiB = 14.70GiB/$.
[2] SL DVD+R capacity: 2295104 2KiB sectors = 4.37 GiB. $18 for 100 discs = 24.27GiB/$.


Backup of archive


According to some articles, hard drives are prone to losing their magnetism when stored for a long time. This supposedly affects recent, high-density drives because the bits are packed so closely together. Other things can go wrong with a stored hard drive as well: the motor lubrication may evaporate or chemically degrade and prevent the disk from spinning correctly, or a power surge may fry some of the drive electronics.

Since the archive copy is the master copy, it is prudent to back it up. The easiest way is to have a similarly sized drive. Then you can just do a bit-by-bit copy from the master archive to the backup archive, which is faster than a filesystem-level copy if the drive is mostly full (see the sketch below).
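
A minimal sketch, assuming the master disk is /dev/sdX and the backup is /dev/sdY (example device names only; double-check them first, since dd will overwrite whatever you point it at):

# dd if=/dev/sdX of=/dev/sdY bs=1M
# cmp /dev/sdX /dev/sdY

cmp stays silent if the two disks are identical; if the backup drive is slightly larger, it will only complain about hitting EOF on the smaller one.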

The backup media should not be of the same make and model. Hard drives are manufactured in batches, and the same defect is usually present in every drive in a batch. If you buy two drives of the same make and model at the same time, they may come from the same batch.



Partitioning archive disk


If the archive disk is big, you may want to partition it into smaller pieces. Partitioning can limit the extent of filesystem corruption as long as you operate on (mount) one partition at a time.

How big each partition should be is up to you. For my data, I'd partition a 500 GB drive into two 250 GB slices, and a 1.5 TB drive into three 500 GB slices.
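
For example, splitting a 500 GB drive into two halves might look like this (a sketch using GNU parted; /dev/sdX is a placeholder, and the msdos label and ext3 filesystem are just examples of what I'd use):

# parted -s /dev/sdX mklabel msdos
# parted -s /dev/sdX mkpart primary 0% 50%
# parted -s /dev/sdX mkpart primary 50% 100%
# mkfs.ext3 /dev/sdX1
# mkfs.ext3 /dev/sdX2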


Encryption


Encryption is optional, but it can be useful on Windows.

Windows eagerly mounts all partitions on a disk, which defeats the intention of having multiple partitions. However, encrypting each partition lets you mount only the partitions you need. FreeOTFE or TrueCrypt is my encryption software of choice on Windows.


Operation & maintenance


After archiving to the master disk, I follow up with a bit-by-bit copy of the affected partition to the backup disk. This shows another benefit of partitioning: if you have a gargantuan drive, you don't have to touch all of it.

After copying to the master disk, I take both disks offline. I use SATA drives for my archive disks and have 2 SATA docks. The disks are offline (unpowered, unplugged, in safe storage) most of the time. The master is online only when I need to read from or write to it. The backup is online whenever I modify the master. Whenever the backup is online, there is a brief window of risk in which both drives could be destroyed (e.g., by a power surge).

The window can be eliminated. One way is to copy the master partition to a third, online disk; take the master offline; bring the backup online; then write the copy of the master partition from the online disk to the backup disk. This is too cumbersome for me, and I have no need for that kind of data guarantee.

Around every New Year, I run badblocks -n over the whole of each disk to rejuvenate the 'magnetism'. This takes a lot of time, about 5.5 hours per 100 GiB when connected through USB2. An alternative, according to this article, is to simply read the whole disk, which dd can do (see the commands below).
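
Concretely, the yearly pass looks something like this for each disk (/dev/sdX again being a placeholder; badblocks refuses to run -n on a mounted filesystem, and -s just shows progress):

# badblocks -n -s /dev/sdX
# dd if=/dev/sdX of=/dev/null bs=1M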


Update: A quick reading of Predicting Archival Life of Removable Hard Disk Drives (circa 2008) shows:
  • a hard drive can easily survive 20 years (in a simulated environment),
  • the 2.5" form factor has a better survival rate than 3.5" drives,
  • storing a drive at 20°C increases its survivability.
I honestly am not bothered by these for my personal usage. I'll likely move the content over to some new medium when SATA becomes obsolete, and I am pretty sure that SATA will become obsolete within the next 10 years. I'll keep upgrading the media to stay ahead of obsolescence. It is no fun trying to retrieve something from an obsolete interface.

Badblocks: non-destructive read-write mode

With the -n flag ("non-destructive read-write mode"), badblocks does the following sequence in every iteration:
  1. read data
  2. write test pattern (the test pattern is ~0, i.e. 0xffffffff as a 32-bit int)
  3. read test pattern
  4. compare test pattern
  5. write data
It will try to write back the original data when terminated with the following signals:
  • SIGHUP
  • SIGINT
  • SIGPIPE
  • SIGTERM
  • SIGUSR1
  • SIGUSR2
Killing it with SIGKILL is dangerous: the only safe moment is step 1 (reading the original data), and interrupting any of the other steps can leave the test pattern or a partial write on disk, so you have roughly a 3/4 chance of corrupting the block being tested.
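
If you need to stop badblocks early, send it one of the signals above rather than SIGKILL, for example:

# kill -TERM $(pidof badblocks)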

2010-05-26

Setting up Samsung ML-1740 in OpenVZ VE

This is a quick walkthrough for setting up a Samsung ML-1740 (or other Samsung laser printer) through the USB port in a Debian Linux OpenVZ VE (virtual environment). It builds on my previous guide and presents a shorter walkthrough than the one at http://wiki.openvz.org/USB_Printing_in_VE

On the HN (hardware node):


Make sure the printer is on and connected to the USB port.

# aptitude install usbutils
# lsusb

Make sure the printer is listed in lsusb's output.
If this is the first USB printer connected to the HN, there should be a /dev/usb/lp0.
Make sure /dev/usb/lp0 is owned by user "root" and group "lp" and that its permissions are at least 660 (u=rw,g=rw); see the commands below.
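
You can check and, if necessary, fix that with the usual tools:

# ls -l /dev/usb/lp0
# chown root:lp /dev/usb/lp0
# chmod 660 /dev/usb/lp0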


# vzctl set 555 --devnodes usb/lp0:rw --save


Now enter the VE:

# vzctl enter 555

Make sure /dev/usb/lp0 exists in the VE (exported by the HN), and that its ownership and permissions are the same as on the HN: user "root", group "lp", and at least 660.

# aptitude install cups splix
If you already have CUPS installed, restart it: CUPS has to start after the printer is powered on and plugged in for it to see the printer.
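
On Debian of this era the init script is typically /etc/init.d/cups (it may be named cupsys on older releases):

# /etc/init.d/cups restart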



# ${webbrowser} http://localhost:631

Go to "Administration", "Add printer", Enter name and click "Continue".
Next there's a device selection. Your printer should be listed there. My printer was listed as "Samsung ML-1740 USB #1 (Samsung ML-1740)".
If your printer is not there, the more likely causes are incorrect permission and/or CUPS started before the printer is up.

Select your Samsung model on the next screen and keep pressing "Continue".
You're done.

I'll never use JFS for root FS anymore

I love JFS, but I'm not going to use it for root FS anymore.

One of my computers had JFS for its root FS. The computer crashed, the filesystem became dirty, and it refused to be mounted rw until it was fsck-ed. Unfortunately, the fsck program for JFS exists only as a userland tool (the kernel cannot do the repair itself) and Debian's standard initramfs image does not include fsck.jfs. So the root FS was mounted ro and, as a result, several important services, including ssh, failed to start.

It was a good thing that the computer was just downstairs. I went down and used a rescue CD to fsck.jfs the volume. I had 3 rescue CDs: Trinity 3.3, Knoppix 6.2, and System Rescue Disk 1.3.2. Only the last one has fsck.jfs.
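
From the rescue environment, the repair itself is short; assuming the root volume were, say, /dev/sda1 (substitute your actual device), it is roughly:

# fsck.jfs /dev/sda1
# reboot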

From now on, I'll be using JFS only for data volumes.