Using an iPod with a bad hard drive

I recently bought a used iPod Classic 6th gen. While the back of the case specifies a storage capacity of 80 GB, I was only able to copy around 20 GB of data to the hard drive before the iPod would shut itself off.
Let’s fix that.

The iPod can boot in “emergency disk mode”, which boots a minimal OS that just exposes the internal hard drive over USB. I’ll use this mode in the rest of this post, as I figure that what I do won’t be disturbed by any software on the iPod.

To confirm the problem, I tried writing zeros all over the drive, and as expected, after writing 18 GB, the iPod crashed.

# dd if=/dev/zero of=/dev/sda bs=512K oflag=sync
dd: error writing '/dev/sda': Input/output error
34285+0 records in
34284+0 records out
17974689792 bytes (18 GB, 17 GiB) copied, 1065.78 s, 16.9 MB/s
# dmesg
[80318.901496] sd 2:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[80318.901504] sd 2:0:0:0: [sda] tag#0 Sense Key : Medium Error [current]
[80318.901509] sd 2:0:0:0: [sda] tag#0 Add. Sense: Write error - auto reallocation failed
[80318.901515] sd 2:0:0:0: [sda] tag#0 CDB: Write(10) 2a 00 00 42 f6 00 00 00 1e 00
[80318.901519] print_req_error: I/O error, dev sda, sector 35106816
[80318.901529] Buffer I/O error on dev sda, logical block 4388352, lost async page write
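
As a sanity check, the numbers in the dd and dmesg output above can be cross-checked against each other. This is a small sketch, assuming that the kernel's "sector" is counted in 512-byte units, that "logical block" and the LBA in the Write(10) CDB (bytes 2 to 5, here 00 42 f6 00) are counted in the drive's 4096-byte sectors, and that dd's 512K means 512 * 1024 bytes. Under those assumptions all four values point at the same byte offset:

```python
# Values copied from the dd and dmesg output above.
DD_BYTES_COPIED = 17974689792

byte_offsets = {
    "dd records out (512 KiB each)": 34284 * 512 * 1024,
    "kernel sector (512-byte units)": 35106816 * 512,
    "logical block (4096-byte units)": 4388352 * 4096,
    "LBA from the Write(10) CDB": 0x0042F600 * 4096,
}

for name, offset in byte_offsets.items():
    status = "matches" if offset == DD_BYTES_COPIED else "differs"
    print(f"{name}: {offset} bytes ({status})")
```

So the failed write sits exactly where dd stopped, at 4 KiB sector 4388352.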

From this, it is pretty clear that the hard drive is broken and should be replaced. Unfortunately, Apple made it notoriously hard to replace the hard drive, with iFixit rating the job as “Very difficult”. The drive will probably fail completely at some point, but I would like to use it until then.

So let’s see if we can work around the problem with software instead. As filesystem creation didn’t fail on the drive, I suspect that there are working sectors located after the bad sectors that made the iPod crash. So what if we could just tell the filesystem not to use those bad sectors? It turns out most filesystems support exactly that, and with FAT32 we can set the FAT entry to 0x?FFFFFF7 for the clusters with broken sectors.
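
To illustrate what the 0x?FFFFFF7 marker means: a FAT32 entry is 32 bits wide, but only the low 28 bits carry the value, which is why the top nibble is written as "?". The sketch below is only an illustration of that bit layout; the helper names are made up for this example and are not part of any tool used in this post.

```python
FAT32_VALUE_MASK = 0x0FFFFFFF   # low 28 bits hold the cluster value
FAT32_BAD_CLUSTER = 0x0FFFFFF7  # value that marks a cluster as bad


def is_bad_cluster(entry: int) -> bool:
    """Return True if a raw 32-bit FAT32 entry marks its cluster as bad."""
    return (entry & FAT32_VALUE_MASK) == FAT32_BAD_CLUSTER


def mark_bad(entry: int) -> int:
    """Mark an entry as bad while preserving the reserved top 4 bits."""
    reserved_bits = entry & 0xF0000000
    return reserved_bits | FAT32_BAD_CLUSTER


print(hex(mark_bad(0x00000000)))
print(is_bad_cluster(0xFFFFFFF7), is_bad_cluster(0x0FFFFFF8))
```

A filesystem driver that sees this value in the FAT simply never allocates that cluster to a file.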

man 8 mkfs.fat tells us that we can use -l FILENAME to “Read the bad blocks list from FILENAME.” So we’ll just have to figure out how to generate that list.

To figure out which sectors work, I started with the ‘badblocks’ program. The iPod claims a sector size of 4KB, so we’ll use that:

# fdisk -l /dev/sda
Disk /dev/sda: 74.4 GiB, 79824777216 bytes, 19488471 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

To find the bad sectors, I tried the following (-s shows progress, -v is verbose):

# badblocks -o bad-blocks.txt -b 4096 -sv /dev/sda
Checking blocks 0 to 19488470
Checking for bad blocks (read-only test): done
Pass completed, 15097507 bad blocks found. (15097507/0/0 errors)

At some point my iPod rebooted/froze, and badblocks reported all the rest of the sectors as bad. This sucks, as some of those sectors probably work.

What we can do is look at the output and find where it begins reporting all blocks as bad. Then we tell badblocks to start at that point to work around the reboot (in my case it happens when I try to read or write sector 4390976). This turned out to happen a lot, and it was slow and time-consuming, so I switched to a different approach.

I came to think of the ddrescue tool, which is designed to be used with faulty drives, so I gave that a go. It turns out that we can (mis)use ddrescue to figure out which sectors can’t be read, by trying to recover all the data from the disk and writing the output to /dev/null. For some reason ddrescue is able to detect a bad sector without rebooting the iPod. Any sector that can’t be read must be in the set of all bad sectors, so this will get us started. ddrescue records the unrecoverable sectors in mapfile.txt.

# ddrescue --verbose --force --sector-size=4096 --cluster-size=1 --log-events=ddrescue.txt /dev/sda /dev/null mapfile.txt
GNU ddrescue 1.22
About to copy 79824 MBytes from '/dev/sda' to '/dev/null'
Starting positions: infile = 0 B, outfile = 0 B
Copy block size: 1 sectors Initial skip size: 208 sectors
Sector size: 4096 Bytes
ipos: 17982 MB, non-trimmed: 0 B, current rate: 32768 B/s opos: 17982 MB, non-scraped: 0 B, average rate: 975 kB/s
non-tried: 0 B, bad-sector: 41922 kB, error rate: 0 B/s
rescued: 79782 MB, bad areas: 7916, run time: 22h 43m 9s
pct rescued: 99.94%, read errors: 10235, remaining time: n/a
time since last successful read: n/a
Finished

This took a long time. Around 24 hours with my iPod.

When it finishes, you can visualize the result using ddrescueview, which plots good sectors as green and bad sectors as red. This is my result:

If we zoom in on one of the areas with bad blocks, we can spot a pattern, which I think kind of looks like scratches.

Now we just need to turn this mapfile into a format that mkfs.fat will understand.

The mapfile is a table with a position, a size and a status column. Position and size are in bytes, and the status is a symbol with the following meaning:

'?'  non-tried block
'*'  failed block non-trimmed
'/'  failed block non-scraped
'-'  failed block bad-sector(s)
'+'  finished block

$ head mapfile.txt
Mapfile. Created by GNU ddrescue version 1.22
Command line: ddrescue --verbose --force --sector-size=4096 --cluster-size=1 --log-events=mapfile.txt /dev/sda /dev/null mapfile.txt
Start time: 2018-12-08 00:44:45
Current time: 2018-12-08 23:27:54
Finished
current_pos current_status current_pass
0x42FDDE000 + 5
pos size status
0x00000000 0x42F43E000 +
0x42F43E000 0x00001000 -
0x42F43F000 0x0022F000 +
0x42F66E000 0x00001000 -
0x42F66F000 0x0018F000 +

Now that we have a list of sectors that can be read, we need to determine which of those can be written to as well. To test this, we use badblocks, and tell it to skip the bad sectors found by ddrescue.

To convert the mapfile into a format that is supported by badblocks, do:

$ ddrescuelog --list-blocks=- --block-size=4096 mapfile.txt > badblocks-from-ddrescue.txt

$ head badblocks-from-ddrescue.txt
4387902
4388462
4388862
4389821
4389822
4390088
4390364
4390515
4390630
4390648
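
To make the conversion less of a black box, here is a rough Python sketch of what I understand ddrescuelog to be doing in this case: take every mapfile row with status '-' and list the 4096-byte blocks it covers. The function name is made up for this example, and it is not a replacement for ddrescuelog. It only looks at data rows of the form "hex position, hex size, status" and ignores everything else.

```python
def bad_blocks_from_mapfile(lines, block_size=4096):
    """List block numbers covered by '-' (bad-sector) rows of a ddrescue mapfile."""
    bad = []
    for line in lines:
        fields = line.split()
        # Data rows look like: 0x42F43E000 0x00001000 -
        if len(fields) != 3:
            continue
        if not (fields[0].startswith("0x") and fields[1].startswith("0x")):
            continue
        pos, size, status = int(fields[0], 16), int(fields[1], 16), fields[2]
        if status != "-":
            continue
        first = pos // block_size
        last = (pos + size - 1) // block_size
        bad.extend(range(first, last + 1))
    return bad


# Rows copied from the head of mapfile.txt shown earlier.
sample = [
    "0x42FDDE000 + 5",
    "pos size status",
    "0x00000000 0x42F43E000 +",
    "0x42F43E000 0x00001000 -",
    "0x42F43F000 0x0022F000 +",
    "0x42F66E000 0x00001000 -",
    "0x42F66F000 0x0018F000 +",
]
print(bad_blocks_from_mapfile(sample))  # [4387902, 4388462]
```

The two block numbers agree with the first two lines of badblocks-from-ddrescue.txt.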

We use this file as input to badblocks, and specify a block size of 4096 bytes which is the sector size.

# badblocks -n -v -s -b 4096 -i badblocks-from-ddrescue.txt -o badblocks.txt /dev/sda

-n = non-destructive read-write
-v = verbose
-s = Show progress
-b 4096 = block size
-i badblocks-from-ddrescue.txt = Skip a list of known bad blocks
-o = write the bad blocks to this file

I was expecting that command to find the rest of the bad blocks, but unfortunately it didn’t go that well. My iPod soon crashed again. Each time, I had to manually reboot the iPod, find the block index in badblocks.txt where it started to report all sectors as bad, merge all the bad blocks found so far, and tell badblocks to start from that new offset. After a few iterations of this, I gave up and decided to try something else.

In the pattern we saw using ddrescueview, it looked like the bad blocks were somewhat grouped together. So what if we “filled out” the bad areas? A simple way to do this is to just consider all blocks near a bad block as bad. I wrote the following awk script to do just that.

BEGIN {
  # Tune this value
  gapTreshold = 4000

  # Stores the previous bad block we handled
  lastBad = 0
} {
  currentBad = $1

  # Is the currentBad close to the lastBad block?
  if ((currentBad-lastBad) < gapTreshold) {
    # Fill out the entire space
    for (i = lastBad+1 ; i < currentBad; i++)
      print i
  } else {
    # We are not close to the previous block.
    # So lastBad must have been the last one in a streak.
    # Skip if lastBad = 0
    if (lastBad) {
      # Write bad blocks out after the last one
      for (i = lastBad+1; i < lastBad+(gapTreshold/2); i++)
        print i
    }

    # Write gapTreshold/2 bad blocks out before this one
    for (i = currentBad-(gapTreshold/2); i < currentBad; i++)
      print i
  }
  print currentBad
  lastBad = currentBad
}

END {
  # Print some bad blocks after the last block
  for (i = lastBad+1 ; i < lastBad + (gapTreshold/2); i++)
    print i
}
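
For readers who don't speak awk, here is a line-by-line Python port of the script above, as a sketch only. The function name is made up, the input is assumed to be a sorted list of block numbers, and the loop bounds are kept exactly as in the awk version (so a streak gets gapTreshold/2 padding blocks before it and one fewer after it). A tiny threshold is used in the example so the effect is easy to see.

```python
def fill_gaps(bad_blocks, gap_threshold=4000):
    """Pad and merge bad blocks the same way the awk script does."""
    half = gap_threshold // 2
    out = []
    last_bad = 0
    for current in bad_blocks:
        if current - last_bad < gap_threshold:
            # Close to the previous bad block: fill the whole gap.
            out.extend(range(last_bad + 1, current))
        else:
            # Far from the previous one: pad after the old streak...
            if last_bad:
                out.extend(range(last_bad + 1, last_bad + half))
            # ...and pad before the new one.
            out.extend(range(current - half, current))
        out.append(current)
        last_bad = current
    # Pad after the very last bad block.
    out.extend(range(last_bad + 1, last_bad + half))
    return out


print(fill_gaps([100, 102, 200], gap_threshold=4))
# [98, 99, 100, 101, 102, 103, 198, 199, 200, 201]
```

Blocks 100 and 102 are merged into one streak with padding around it, while 200 is far enough away to get its own padded island.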

Setting the gapTreshold to 4000 is what eventually worked for me, and with that list as input (and around 10 iterations of badblocks where I would find a new bad block, add it to the list and re-run the script), I could run the badblocks program on the hard drive without crashing the iPod! (By the way: badblocks gets really resource hungry when you give it a large list as input!)

$ awk -f fill-gaps.awk badblocks.txt > badblocks-filled.txt
$ sudo badblocks -n -v -s -b 4096 -i badblocks-filled.txt /dev/sda # (no output)
$ wc -l badblocks.txt
10267
$ wc -l badblocks-filled.txt
943079

943079 * 4 KiB sectors is a little less than 4 GB of unusable blocks, which is very much acceptable! That means I should be able to use around 75 GB of my iPod’s storage instead of only 18 GB!

The next step is creating the filesystem with our new list of bad blocks as input. I was expecting this to be the simplest step. I was wrong.

The first step is to make a partition that the filesystem can reside on. The layout below is the default on the iPod, and can be created using a tool like (g)parted:

# fdisk -l /dev/sda
Disk /dev/sda: 74.4 GiB, 79824777216 bytes, 19488471 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x20202020

Device Boot Start End Sectors Size Id Type
/dev/sda1 63 19488469 19488407 74.4G b W95 FAT32

Our partition starts on sector 63, so we need to compute our list of bad blocks relative to that offset. Moreover mkfs.vfat expects the list of bad blocks to be in 1KiB blocks (which took me a while to figure out – I just assumed it was sector size). We can get this by running:

$ awk '{oneKoffset = ($1 - 63) * 4; for (i = 0; i < 4; i++) { print oneKoffset + i } }' badblocks-filled.txt > 1KiB-rel-sector64.txt
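
The same conversion written out in Python, as a sketch with a made-up function name: each 4 KiB block is shifted by the partition start (63 sectors, from the fdisk output) and then expanded into the four 1 KiB blocks it covers. It assumes every input block lies inside the partition; a block before sector 63 would produce negative numbers.

```python
def to_1k_partition_relative(blocks_4k, partition_start=63, ratio=4):
    """Convert disk-relative 4 KiB block numbers to partition-relative 1 KiB blocks."""
    result = []
    for block in blocks_4k:
        base = (block - partition_start) * ratio
        result.extend(range(base, base + ratio))
    return result


# First bad block from badblocks-from-ddrescue.txt.
print(to_1k_partition_relative([4387902]))
# [17551356, 17551357, 17551358, 17551359]
```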

Now we can create the filesystem by issuing (this should create the filesystem in the same way iTunes would):

# mkfs.fat -l 1KiB-rel-sector64.txt -F 32 -n iPod -S 4096 -s 4 /dev/sda1

-l 1KiB-rel-sector64.txt = list of bad blocks used to mark clusters as bad
-F 32 = FAT32
-n iPod = label
-S 4096 = number of bytes per sector
-s 4 = sectors per cluster

This should have worked, but it didn’t. (By the time you read this, it may have been fixed in mkfs.fat.) I verified the result on a Windows machine by running chkdsk, which reports the total amount of space used by bad sectors. It reported only 131,216 KB, far less than the almost 4 GB I expected from the bad block list.

C:\Windows\system32>chkdsk /R D:
The type of the file system is FAT32.
Windows is verifying files and folders…
File and folder verification is complete.
Windows is verifying free space…
Free space verification is complete.
Windows has scanned the file system and found no problems.
No further action is required.
77,915,440 KB total disk space.
880 KB in 55 hidden files.
1,248 KB in 78 folders.
17,554,608 KB in 1,025 files.
131,216 KB in bad sectors.
60,227,472 KB are available.

16,384 bytes in each allocation unit.
4,869,715 total allocation units on disk.
3,764,217 allocation units available on disk.

So I got myself a copy of dosfstools which contains the mkfs.vfat program, and I realized that the bad block list didn’t work correctly when the logical sector size is specified (-S 4096) to be different from the default 512 bytes.

So I developed the following patch to mkfs.vfat, which instead of interpreting the list as 1KiB blocks, assumes that the blocks are 4KiB.

diff --git a/src/mkfs.fat.c b/src/mkfs.fat.c
index 5843550..95c374e 100644
--- a/src/mkfs.fat.c
+++ b/src/mkfs.fat.c
@@ -421,12 +421,15 @@ static void get_list_blocks(char *filename)
     char *line = NULL;
     size_t linesize = 0;
     int lineno = 0;
+    int clusterno;
     char *end, *check;
 
     listfile = fopen(filename, "r");
     if (listfile == (FILE *) NULL)
        die("Can't open file of bad blocks");
 
+    printf("start_data_sector: %d\nstart_data_sector/8: %d\n, bs.cluster_size: %d\n", start_data_sector, start_data_sector/8, bs.cluster_size);
+
     while (1) {
        lineno++;
        ssize_t length = getline(&line, &linesize, listfile);
@@ -464,7 +467,16 @@ static void get_list_blocks(char *filename)
        if (end == line)
            continue;
 
+
+       // bs.cluster_size is sectors per cluster
+       clusterno = (blockno - (start_data_sector/8)) / bs.cluster_size;
+
+       // "Cluster index are 2-based: cluster 2 is actually cluster 0 in the data region."
+       // https://cerbero-blog.com/?p=1355
+       mark_FAT_cluster(clusterno + 2, FAT_BAD);
+
        /* Mark all of the sectors in the block as bad */
+/*
        for (i = 0; i < SECTORS_PER_BLOCK; i++) {
            unsigned long sector = blockno * SECTORS_PER_BLOCK + i;
 
@@ -482,6 +494,7 @@ static void get_list_blocks(char *filename)
 
            mark_sector_bad(sector);
        }
+*/
        badblocks++;
     }
     fclose(listfile);

And ran:

$ awk '{ print($1 - 63) }' badblocks-filled.txt > badblocks-filled-63.txt
# ./mkfs.fat -v -l badblocks-filled-63.txt -F 32 -n iPod -S 4096 -s 4 /dev/sda1
mkfs.fat 4.1+git (2017-01-24)
mkfs.fat: Warning: lowercase labels might not work properly with DOS or Windows
/dev/sda1 has 255 heads and 63 sectors per track,
hidden sectors 0x01f8;
logical sector size is 4096,
using 0xf8 media descriptor, with 19488407 sectors;
drive number 0x80;
filesystem has 2 32-bit FATs and 4 sectors per cluster.
FAT size is 4756 sectors, and provides 4869715 clusters.
There are 32 reserved sectors.
943078 bad blocks
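
The block-to-cluster mapping that the patch relies on can be reproduced from the numbers mkfs.fat just printed. This is a sketch under one assumption I have not verified against the dosfstools source: that start_data_sector/8 in the patch is the start of the data region counted in 4096-byte sectors, i.e. reserved sectors plus both FATs. The function name is made up, and the mapping only makes sense for blocks inside the data region.

```python
# Values from the mkfs.fat -v output above (all in 4096-byte logical sectors).
RESERVED_SECTORS = 32
NUM_FATS = 2
FAT_SIZE_SECTORS = 4756
SECTORS_PER_CLUSTER = 4
TOTAL_SECTORS = 19488407

# Assumption: the data region starts right after the reserved sectors and FATs.
DATA_START = RESERVED_SECTORS + NUM_FATS * FAT_SIZE_SECTORS


def cluster_for_block(blockno):
    """Map a partition-relative 4 KiB block to its FAT32 cluster number."""
    # Cluster numbering starts at 2: cluster 2 is the first data cluster.
    return (blockno - DATA_START) // SECTORS_PER_CLUSTER + 2


total_clusters = (TOTAL_SECTORS - DATA_START) // SECTORS_PER_CLUSTER
print("data region starts at sector", DATA_START)
print("total clusters:", total_clusters)
print("first bad block lands in cluster", cluster_for_block(4387902 - 63))
```

The computed cluster count comes out at 4869715, the same number mkfs.fat reports, which suggests the assumption about the layout is at least consistent.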

And now I’m able to use almost the entire drive, as verified from Windows:

C:\Windows\system32>chkdsk D:
The type of the file system is FAT32.
Windows is verifying files and folders…
File and folder verification is complete.
Windows has scanned the file system and found no problems.
No further action is required.
77,915,440 KB total disk space.
32 KB in 2 hidden files.
8,960 KB in 550 folders.
29,575,712 KB in 10,650 files.
3,772,912 KB in bad sectors.
44,557,808 KB are available.

16,384 bytes in each allocation unit.
4,869,715 total allocation units on disk.
2,784,863 allocation units available on disk.
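
A quick back-of-the-envelope check of the chkdsk numbers, using only values that appear in this post. chkdsk counts whole 16 KB allocation units, so its figure should be at least the size of the bad block list and only slightly above it:

```python
BAD_BLOCKS = 943079          # wc -l badblocks-filled.txt
BLOCK_KB = 4                 # 4096-byte sectors
CLUSTER_KB = 16              # "16,384 bytes in each allocation unit"
REPORTED_KB = 3772912        # chkdsk after the patch
REPORTED_KB_BEFORE = 131216  # chkdsk with the unpatched mkfs.fat

expected_kb = BAD_BLOCKS * BLOCK_KB
reported_clusters = REPORTED_KB // CLUSTER_KB
# Fewest clusters that could hold all bad blocks (ceiling division).
min_clusters = -(-expected_kb // CLUSTER_KB)

print("expected from list:", expected_kb, "KB")
print("reported by chkdsk:", REPORTED_KB, "KB =", reported_clusters, "clusters")
print("extra clusters from alignment:", reported_clusters - min_clusters)
print("unpatched run covered %.1f%% of the list" % (100 * REPORTED_KB_BEFORE / expected_kb))
```

The patched filesystem marks a few dozen clusters more than the theoretical minimum, which is what you would expect when bad ranges don't start and end on cluster boundaries.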

Instead of using the OS supplied by Apple, I use a FOSS OS called Rockbox, which I totally recommend. You get custom themes, it plays FLAC tracks, and it even plays Doom!

Using Google’s BBR congestion control on Ubuntu Server 16.04

Most current congestion control algorithms rely on packet loss as the signal to slow down. According to [1], this is ill-suited for today’s networks.

The BBR congestion control algorithm is an alternative used by Google, designed so it “reacts to actual congestion, not packet loss or transient queue delay, and is designed to converge with high probability to a point near the optimal operating point.” [1]

To use BBR on Ubuntu 16.04, the first step is to make sure your kernel is >= 4.9.

If your kernel is 4.4 (as mine was), the best way to get a newer kernel is to enable Ubuntu’s “LTS Enablement Stack” [2]. On Ubuntu Server 16.04 this is simply done with:

# apt install --install-recommends linux-generic-hwe-16.04

For me, this installed 4.10.0.

Next we need to enable BBR for congestion control and change the packet scheduler to fq [3] (fq is no longer required since patch [5]). Append the following two lines to /etc/sysctl.conf (or put them in a new file in /etc/sysctl.d/):

net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

Reboot, and verify with:

$ sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = bbr
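
The same check can be scripted, since sysctl values are exposed as files under /proc/sys with the dots replaced by slashes. The helper below is a minimal sketch; read_sysctl and its proc_root parameter are names invented for this example, and it does not handle keys whose components themselves contain dots.

```python
from pathlib import Path


def read_sysctl(name, proc_root="/proc/sys"):
    """Read a sysctl value by mapping 'a.b.c' to <proc_root>/a/b/c."""
    path = Path(proc_root, *name.split("."))
    return path.read_text().strip()


def bbr_enabled(proc_root="/proc/sys"):
    """Return True if BBR is the default TCP congestion control algorithm."""
    return read_sysctl("net.ipv4.tcp_congestion_control", proc_root) == "bbr"
```

On a Linux machine, read_sysctl("net.core.default_qdisc") should return the configured qdisc in the same way.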

 


While attending the BornHack camp I noticed that even though we had a 1 Gbit/s internet connection, I couldn’t download faster than 30 Mbit/s from https://mirrors.dotsrc.org.

After some testing with iperf3, the cause of the low throughput seemed to be packet loss on downstream traffic. With iperf3, on a single TCP stream I could only download at around 30 Mbit/s, but upload (which had no packet loss) at around 600 Mbit/s.

I could use up the entire 1 Gbit/s link by telling iperf3 to use multiple TCP streams (e.g. iperf3 -P 30).

Having a 1 Gbit/s link but being unable to fill it has bugged me since BornHack. Then I stumbled upon the BBR congestion control algorithm while reading a blog post from Dropbox [4], and I decided to try it out.

So let’s see if we can do better with BBR. Step 1 is to create a link with conditions similar to those at BornHack. I used an lxc container connected to the same switch as the server hosting mirrors.dotsrc.org, and added some RTT and packet loss.

On my client (the lxc container), I added around 40 ms of delay using:

# tc qdisc change dev eth0 root netem delay 40ms 4ms

And made it drop 0.3% of all incoming packets (this gave me the ~30 Mbit/s I was aiming for):

# iptables -A INPUT -m statistic --mode random --probability 0.003 -j DROP

iperf3 results, with the default (net.core.default_qdisc = pfifo_fast and net.ipv4.tcp_congestion_control = cubic):

ato@kvaser:~$ iperf3 -c 130.225.254.107 -N
 Connecting to host 130.225.254.107, port 5201
 [ 4] local 130.225.254.116 port 55802 connected to 130.225.254.107 port 5201
 [ ID] Interval Transfer Bandwidth Retr Cwnd
 [ 4] 0.00-1.00 sec 3.16 MBytes 26.5 Mbits/sec 15 91.9 KBytes
 [ 4] 1.00-2.00 sec 2.42 MBytes 20.3 Mbits/sec 9 76.4 KBytes
 [ 4] 2.00-3.00 sec 2.05 MBytes 17.2 Mbits/sec 0 97.6 KBytes
 [ 4] 3.00-4.00 sec 2.73 MBytes 22.9 Mbits/sec 0 113 KBytes
 [ 4] 4.00-5.00 sec 2.61 MBytes 21.9 Mbits/sec 1 96.2 KBytes
 [ 4] 5.00-6.00 sec 2.42 MBytes 20.3 Mbits/sec 12 80.6 KBytes
 [ 4] 6.00-7.00 sec 2.24 MBytes 18.8 Mbits/sec 6 69.3 KBytes
 [ 4] 7.00-8.00 sec 1.80 MBytes 15.1 Mbits/sec 0 87.7 KBytes
 [ 4] 8.00-9.00 sec 2.61 MBytes 21.9 Mbits/sec 0 106 KBytes
 [ 4] 9.00-10.00 sec 2.42 MBytes 20.3 Mbits/sec 2 91.9 KBytes
 - - - - - - - - - - - - - - - - - - - - - - - - -
 [ ID] Interval Transfer Bandwidth Retr
 [ 4] 0.00-10.00 sec 24.5 MBytes 20.5 Mbits/sec 45 sender
 [ 4] 0.00-10.00 sec 23.9 MBytes 20.0 Mbits/sec receiver

iperf Done.
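
The cubic numbers can be sanity-checked with the usual rule of thumb that a TCP sender can have at most one congestion window in flight per round trip, so throughput is roughly cwnd / RTT. The sketch below assumes the RTT is dominated by the 40 ms netem delay and that iperf3's "KBytes" are 1024 bytes; the function name is made up for this example.

```python
def window_limited_mbit_per_s(cwnd_kbytes, rtt_ms):
    """Rough throughput ceiling for a given congestion window and RTT."""
    bits_per_window = cwnd_kbytes * 1024 * 8
    return bits_per_window / (rtt_ms / 1000) / 1e6


# Cwnd from the first and last interval of the cubic run above.
print(round(window_limited_mbit_per_s(91.9, 40), 1))  # 18.8
```

That lands close to the measured 20 Mbit/s: every lost packet makes cubic shrink its window, and with 40 ms between round trips the window never grows large enough to fill the link.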

With the new BBR congestion control enabled:

ato@kvaser:~$ iperf3 -c 130.225.254.107 -N
Connecting to host 130.225.254.107, port 5201
[ 4] local 130.225.254.116 port 55930 connected to 130.225.254.107 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 34.5 MBytes 289 Mbits/sec 1139 5.71 MBytes 
[ 4] 1.00-2.00 sec 57.5 MBytes 482 Mbits/sec 76 6.04 MBytes 
[ 4] 2.00-3.00 sec 57.5 MBytes 482 Mbits/sec 117 2.48 MBytes 
[ 4] 3.00-4.00 sec 45.0 MBytes 377 Mbits/sec 129 5.92 MBytes 
[ 4] 4.00-5.00 sec 48.8 MBytes 409 Mbits/sec 135 2.49 MBytes 
[ 4] 5.00-6.00 sec 53.8 MBytes 451 Mbits/sec 105 5.81 MBytes 
[ 4] 6.00-7.00 sec 58.8 MBytes 493 Mbits/sec 69 5.99 MBytes 
[ 4] 7.00-8.00 sec 50.0 MBytes 419 Mbits/sec 81 5.71 MBytes 
[ 4] 8.00-9.00 sec 53.8 MBytes 451 Mbits/sec 103 5.84 MBytes 
[ 4] 9.00-10.00 sec 47.5 MBytes 398 Mbits/sec 111 2.61 MBytes 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 507 MBytes 425 Mbits/sec 2065 sender
[ 4] 0.00-10.00 sec 506 MBytes 424 Mbits/sec receiver

iperf Done.

From 20 Mbits/s to 400 Mbits/s. That’s just awesome. I’ll keep it enabled on https://mirrors.dotsrc.org, which should help people download the content we host much faster on links with a bit of packet loss 🙂

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f8782ea14974ce992618b55f0c041ef43ed0b78

[2] https://wiki.ubuntu.com/Kernel/LTSEnablementStack

[3] “NOTE: BBR *must* be used with the fq qdisc (“man tc-fq”) with pacing enabled, since pacing is integral to the BBR design and implementation. BBR without pacing would not function properly, and may incur unnecessary high packet loss rates.” [1]

[4] https://blogs.dropbox.com/tech/2017/09/optimizing-web-servers-for-high-throughput-and-low-latency/

[5] https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=218af599fa635b107cfe10acf3249c4dfe5e4123