Appendix C. Performance and Debugging Tools

dd. Very useful for testing the streaming performance of disks, CD-ROMs and DVDs. See man dd for more details. Here is an example that times how long a disk takes to read 1 GB (10**9 bytes, which is exactly 1953125 blocks of 512 bytes) starting from block 0:
$ time dd if=/dev/sda of=/dev/null bs=512 count=1953125
If the raw device /dev/raw/raw1 is bound to /dev/sda then the above line is equivalent to:
$ time dd if=/dev/raw/raw1 of=/dev/null bs=512 count=1953125
This may be slower than expected: raw devices bypass the buffer cache, so one 512 byte sector is read per request. Changing the last 2 arguments to "bs=8k count=122071" (slightly more than 10**9 bytes) should give better timings for the "raw" dd.
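The count argument for a given block size follows from dividing the number of bytes to read by the block size and rounding up. The sketch below only prints the resulting dd command lines rather than running them; the helper names blocks_for and show_dd are made up for this illustration, and /dev/sda is simply the device used in the example above.

```shell
#!/bin/sh
# blocks_for TOTAL_BYTES BLOCK_SIZE
# Print the smallest block count whose total size is at least TOTAL_BYTES.
blocks_for() {
    total=$1
    bs=$2
    echo $(( (total + bs - 1) / bs ))
}

# show_dd BLOCK_SIZE
# Print (do not run) a dd line that reads at least 10**9 bytes.
# /dev/sda is an assumption; substitute the device under test.
show_dd() {
    bs=$1
    echo "time dd if=/dev/sda of=/dev/null bs=$bs count=$(blocks_for 1000000000 "$bs")"
}

show_dd 512
show_dd 8192
```

With a block size of 512 the helper yields a count of 1953125, and with 8192 it yields 122071. Printing the command first makes it easy to check the device name before anything is read.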

lmdd. This command is part of the lmbench suite of programs and is a variant of the dd command. It has been tailored for IO measurements and outputs timing and throughput numbers on completion. Hence the time command and a calculator are not needed.
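For comparison, this is roughly the calculation that has to be done by hand when plain dd is timed with the time command. It is a minimal sketch: the function name throughput_mb_s is made up here, 1 MB is taken as 10**6 bytes, and the 20 second figure is a hypothetical elapsed time, not a measurement.

```shell
#!/bin/sh
# throughput_mb_s BYTES SECONDS
# Print the transfer rate in MB/s (1 MB = 10**6 bytes) to two decimal places.
throughput_mb_s() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.2f\n", bytes / secs / 1000000 }'
}

# Hypothetical figures: 10**9 bytes read in 20 seconds of elapsed time.
throughput_mb_s 1000000000 20
```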

sg_dd. This command is part of the sg_utils package (see below) and is another variant of the dd command in which the input file, the output file, or both are sg or raw devices. The block size argument ("bs") must match that of the physical device in question. The "skip" and "seek" arguments can be up to 2**31 - 1 on a 32 bit architecture, allowing 1 TB disks to be accessed (2G * 512 bytes). The Linux system call llseek() is used to seek with a 64 bit file read/write offset. lmdd does not handle the > 2 GB case, and dd works around it with multiple relative seeks. sg_dd has a "bpt" (blocks per transfer) argument that controls the number of blocks read or written in each IO transaction.
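The size limits quoted above can be checked with a little shell arithmetic. This is a sketch only: the function names are made up for the illustration, the bpt value of 64 is an arbitrary example rather than a recommendation, and the shell is assumed to do 64 bit integer arithmetic (bash does, as do most shells on 64 bit systems).

```shell
#!/bin/sh
# max_offset_bytes BLOCK_SIZE
# Largest byte offset reachable when skip/seek is limited to 2**31 - 1 blocks.
max_offset_bytes() {
    echo $(( 2147483647 * $1 ))
}

# bytes_per_transfer BLOCK_SIZE BPT
# Bytes moved in each IO transaction for a given "bs" and "bpt".
bytes_per_transfer() {
    echo $(( $1 * $2 ))
}

max_offset_bytes 512        # prints 1099511627264, just under 2**40
bytes_per_transfer 512 64   # prints 32768
```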

sard. This utility is modelled on System V Release 4's sar -d and produces IO statistics for mounted devices and partitions. It was developed by Stephen Tweedie; the distribution contains the sard utility and a required kernel patch that expands the output of /proc/partitions. It can be found at ftp.uk.linux.org/pub/linux/sct/fs/profiling. It collects statistics at a relatively low level (e.g. the SCSI mid level) compared to programs like vmstat.

vmstat      {in most distributions, try "man vmstat"}
scsi_debug  {lower level driver for debugging
             (no adapter required)}
sg_utils    {utilities package for sg:
             www.torque.net/sg}
CONFIG_MAGIC_SYSRQ=y  
            {see /usr/src/linux/Documentation/sysrq.txt}

When something appears to have partially locked up the system, the ps command can help find the cause. The following invocations may help identify which part of the kernel is involved. This information can then be forwarded to the maintainers.
ps -eo cmd,wchan
ps -eo fname,tty,pid,stat,pcpu,wchan
ps -eo pid,stat,pcpu,nwchan,wchan=WIDE-WCHAN-COLUMN -o args
The most interesting option for finding the location of the "hang" is "wchan", which names the kernel function in which the process is sleeping. If this is a kernel address then ps will use /proc/ksyms to find the nearest symbolic location. The "nwchan" option outputs the numerical address of the "hang".
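On a busy system the full ps listing can be long. One possible way to narrow it down, sketched below, is to keep only processes whose state starts with D (uninterruptible sleep, usually waiting on IO), since those are the likeliest candidates for a "hang". The function names only_blocked and list_blocked are made up for this illustration; the ps columns are the ones used above.

```shell
#!/bin/sh
# only_blocked: read ps output on stdin and keep the header line plus rows
# whose STAT column (field 2) starts with D (uninterruptible sleep).
only_blocked() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# list_blocked: run ps with pid, stat, wchan and args columns, then filter.
list_blocked() {
    ps -eo pid,stat,wchan,args | only_blocked
}
```

Running list_blocked a few times in succession shows whether the same process stays blocked at the same wchan, which is the detail worth reporting.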