What is this long post about?
iSCSI is a standard for accessing block devices (e.g. disks) over a network, just as if they were local SCSI devices. It is similar to AoE and FCoE, although those two work on the LAN only, while iSCSI runs over IP and is therefore usable over a WAN too. This article focuses on iSCSI, but it could serve as a base for doing similar things with AoE and FCoE.
So iSCSI, in its simplest configuration, allows us to mount and manage a data disk that is physically connected to a remote computer (the "server", aka target) from our own computer (the client, aka initiator).
In this post I'll discuss the deep details of a more advanced stage: having the root (and boot) disk on a remote computer, so that the client can boot from it remotely. Surprisingly, it can be done even with relatively old hardware.
But.. why?
There are several possible uses for this neat technology, I'll mention some:
- A live, read-write rescue disk (USB alternative: no need to physically touch the machine)
- My original motivation was, by the way, an old machine that required rescuing but didn't support boot from USB. Plus I hate optical media.
- diskless machines have always been cool
- Root disk is an image file
- Allows easy backups, and even better, snapshots, of the root disk of important machines.
- In a similar fashion, allows quickly moving/copying the root disk from machine A to machine B, as it can be just an image file.
- Polymorphic machines: a simple script could make a machine boot a specific image, perform a task, and later boot another image and perform another task. Very useful for automated nightly tests: why waste four machines on testing different OSes, if the same machine can boot all four?
HW/SW Requirements
- A client (initiator) machine that supports network booting (PXE-UNDI). Even relatively old machines (e.g. from 2005) can do this. They don't need to support iSCSI boot directly, although that can make the process much easier. This post discusses machines that cannot boot from iSCSI natively.
- The client's OS should support booting from iSCSI. Recent Debuntu versions support that well, and I've heard that Windows 7 and 2008 do as well, while Win2003 needs some tweaks.
- A server machine: it should act as a DHCP server, a TFTP server and an iSCSI target. This post discusses the setup on Debuntu machines.
The theory in a nutshell
So, what do we need for booting iSCSI on a computer that doesn't support iSCSI boot? There's a quite crazy, repeating bootstrapping process:
- The BIOS or NIC sends a DHCP request to set up the IP network settings and find a network boot server, using the PXE-UNDI mechanisms
- A gPXE image is found and downloaded using TFTP. gPXE sends yet another DHCP request, and should now find the iSCSI address of the remote boot disk
- gPXE starts as an iSCSI initiator that logs in to the iSCSI target, reads the remote boot disk's MBR and starts its boot loader (grub)
- grub loads the kernel and initrd
- initrd sends yet another DHCP request, sets up the IP network, and uses the iscsistart script, which sets up the iSCSI initiator and logs in (yes, again) to the iSCSI target.
- iscsistart script then mounts this disk and uses pivot_root (as usual) to make it the new root
- boot process starts from the real root now, running /sbin/init
So.. Let’s get going!
Ok, just a quick disclaimer:
- I wrote the following instructions partly from memory, so I might have some imperfect parts. If you find such, let me know and I’ll fix them.
- Don’t blame me if any of the instructions below ruins your data/life/relationship.
STEP I: setup DHCP+TFTP+gPXE on server machine:
gPXE is a neat project that lets us boot from iSCSI and AoE. If your BIOS supports iSCSI or AoE boot, I guess you could skip this step.
The following steps are my paraphrase of this gPXE chainloading howto.
- Install the DHCP + TFTP daemons:
$ sudo aptitude install isc-dhcp-server atftpd
- Configure DHCP: in this example, the subnet is 192.168.1.0/24, and the server is 192.168.1.100. (Be careful about running this DHCP server on a shared network such as your workplace's, so that it doesn't interfere with the other DHCP servers.)
subnet 192.168.1.0 netmask 255.255.255.0 {
allow booting;
allow bootp;
next-server 192.168.1.100;
if exists user-class and option user-class = "gPXE" {
filename "";
option root-path "iscsi:192.168.1.100::::iqn.my-laptop:target1";
} else {
filename "undionly.kpxe";
}
range 192.168.1.101 192.168.1.200; # start above the server's own address (192.168.1.100)
}
- Put undionly.kpxe (the gPXE UNDI chain loader) in the tftp root:
- Get gPXE and take the undionly.kpxe file from it. (needs compiling first?)
- Place it in the tftp root directory, e.g. /srv/tftp or /tftproot, depending on your tftpd configuration.
- Test that everything is fine:
$ tftp localhost
tftp> get /undionly.kpxe
- Test booting a client machine: just boot a client from the network, and check that it loads gPXE, which then tries to connect to an iSCSI target. As we haven't set up the target yet, it should fail at that stage; but if it doesn't get that far, you'd better fix that first.
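The root-path value in the DHCP configuration above packs several fields into one string. As a minimal sketch, assuming the usual iscsi:<server>:<protocol>:<port>:<LUN>:<target name> layout (empty fields fall back to the defaults, i.e. TCP, port 3260 and LUN 0), a small shell helper can build it. The function name iscsi_root_path is made up for this illustration and is not part of gPXE or dhcpd:

```shell
#!/bin/sh
# Hypothetical helper (not part of gPXE or dhcpd): build an iSCSI root-path
# string of the form iscsi:<server>:<protocol>:<port>:<LUN>:<target name>.
# Empty protocol/port/LUN fields stay empty so the initiator uses its defaults.
iscsi_root_path() {
    # $1 = server, $2 = protocol, $3 = port, $4 = LUN, $5 = target name
    printf 'iscsi:%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Reproduces the value used in the DHCP configuration of this post.
iscsi_root_path 192.168.1.100 "" "" "" iqn.my-laptop:target1
```

Spelling the fields out this way makes it easier to see which part to change when the target listens on a non-default port or the disk is exported as a different LUN.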
STEP II: setup an iSCSI target which shares a disk
- Create the disk. It's more fun and flexible with an image file instead of a real physical disk. Let's create a 500MiB image and represent it as a loop block device:
$ sudo dd if=/dev/zero of=/data/my_root_disk.img bs=1024k count=500
$ sudo losetup /dev/loop0 /data/my_root_disk.img
- Install the iscsi target tools:
$ sudo aptitude install iscsitarget
- Configure /etc/iet/ietd.conf to share our block device, as Lun 0 (zero) on target1:
Target iqn.my-laptop:target1
Lun 0 Path=/dev/loop0,Type=fileio,ScsiId=xyz,ScsiSN=xyz
- Test the target by setting up an initiator that logs in to the target (iscsiadm comes with the open-iscsi package). This can be done locally on the target machine:
$ sudo iscsiadm -m discovery -t st -p <target's IP>
$ sudo iscsiadm -m node -L all
If everything worked well, the lines above discovered and logged in to the iSCSI target, and you should see new SCSI devices in /dev (and notes about these new devices in /var/log/messages).
Note: from this stage on, it's possible to do bad things such as mounting the same image twice (directly or over iscsi, e.g. from different initiators), so avoid doing that.
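Besides reading /var/log/messages, the new device names can also be taken from the initiator itself. The sketch below assumes that `iscsiadm -m session -P 3` prints one "Attached scsi disk sdX" line per disk, as open-iscsi normally does; the helper name iscsi_disks is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read `iscsiadm -m session -P 3` output on stdin and
# print the /dev path of every disk attached through an iSCSI session.
# Assumes lines of the form "Attached scsi disk sdX   State: running".
iscsi_disks() {
    awk '/Attached scsi disk/ { print "/dev/" $4 }'
}

# Typical use on a machine that is logged in to the target (needs root):
#   sudo iscsiadm -m session -P 3 | iscsi_disks
```

If the same disk shows up under two names, that matches the multiple-paths case: both entries point at the same image.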
STEP III: Create the root disk
- Partitioning: I've used the modern GPT partition table, but it should be possible with the ancient DOS partitions as well. So, using parted I've created the GPT partition table, then created two partitions:
#1 for the grub boot loader (note the bios_grub flag; grub requires us to set this flag from parted)
#2 for the root disk itself
That's my eventual partition table:
Disk /data/my_root_disk.img: 419MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start   End    Size   File system  Name  Flags
1       17.4kB  132kB  114kB               grub  bios_grub
2       132kB   419MB  419MB  ext4         root
- Create the root filesystem on partition 2 and mount it on /mnt/tmp: Unfortunately it doesn't seem easy to access a partition on a loop device (i.e. /dev/loop0p1 doesn't show up). [Update: see Petr's comment below, it IS possible in modern kernels] So my quick-and-dirty trick was simply doing it over local iSCSI, as mentioned in STEP II, item 4 (testing the target). Then we get /dev/sdX1 and /dev/sdX2; see /var/log/messages to find what X stands for. If you see multiple devices which seem the same (e.g. both sdc and sdd), that's because there are multiple IP paths to them (e.g. 127.0.0.1 and IPv6's ::1); any of them would do.
$ sudo mkfs.ext4 /dev/sdX2 # beware of picking the wrong device name, which could ruin your life
$ sudo mount /dev/sdX2 /mnt/tmp # make sure it's not already mounted anywhere
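Another way to reach a partition inside the image without going through iSCSI, as a sketch only: attach just the partition's byte range as a loop device, using losetup's --offset and --sizelimit options. The sector numbers would come from `parted /data/my_root_disk.img unit s print`; the numbers and the helper names below are placeholders, not values from my disk:

```shell
#!/bin/sh
# Hypothetical helpers: convert a partition's first and last sector (as shown
# by `parted <image> unit s print`) into the byte offset and byte length that
# losetup expects. Sector size defaults to 512 bytes.
part_offset_bytes() {
    # $1 = start sector, $2 = sector size in bytes (default 512)
    echo $(( $1 * ${2:-512} ))
}

part_size_bytes() {
    # $1 = start sector, $2 = end sector (inclusive), $3 = sector size (default 512)
    echo $(( ($2 - $1 + 1) * ${3:-512} ))
}

# Example with made-up sector numbers (start 2048, end 4095):
part_offset_bytes 2048       # prints 1048576
part_size_bytes 2048 4095    # prints 1048576

# The results could then be used like this (needs root; /dev/loop1 is a placeholder):
#   sudo losetup --offset 1048576 --sizelimit 1048576 /dev/loop1 /data/my_root_disk.img
```

The resulting loop device then stands in for /dev/sdX2 in the mkfs and mount commands.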
- Populate the root filesystem: I used the amazing debootstrap tool to put Debian sid on it:
$ sudo debootstrap sid /mnt/tmp
- Prepare to chroot into the new root filesystem: this will be useful for the following steps. But we'd better also have /dev, /sys and /proc there:
$ sudo mount -o bind /dev /mnt/tmp/dev
$ sudo mount -o bind /sys /mnt/tmp/sys
$ sudo mount -o bind /proc /mnt/tmp/proc
Now chroot:
# chroot /mnt/tmp
- Update the initial RAM filesystem (initramfs) to support an iSCSI root, from within the chrooted environment: As the initrd's responsibility is to mount the real final root device, and our real final root device is on iSCSI, the initrd should have iSCSI capabilities. Recent Debuntu's initramfs is capable of that:
- Enable iscsi-initramfs: this is done by setting up the /etc/iscsi/iscsi.initramfs file, making sure it exists and contains a unique IQN (which can be generated by the iscsi-iname tool) in this format:
InitiatorName=<unique IQN>
- Create the new initrd:
# update-initramfs -u
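IQNs conventionally look like iqn.<yyyy-mm>.<reversed domain>[:<identifier>], which is also the shape iscsi-iname prints. A quick sketch of a format check, useful when typing a name by hand instead of generating it; the function name and the example IQN are made up, and the check says nothing about uniqueness:

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 has the conventional IQN shape
# iqn.<yyyy-mm>.<reversed domain>[:<identifier>]. This only checks the format;
# it cannot tell whether the name is actually unique.
is_conventional_iqn() {
    printf '%s\n' "$1" |
        grep -Eq '^iqn\.[0-9]{4}-(0[1-9]|1[0-2])\.[a-z0-9]([a-z0-9.-]*[a-z0-9])?(:.+)?$'
}

# Example: print the InitiatorName line only for a well-formed name.
# The IQN below is a made-up placeholder.
iqn="iqn.2010-04.org.example:client1"
if is_conventional_iqn "$iqn"; then
    echo "InitiatorName=$iqn"
fi
```

The short target name used in this post (iqn.my-laptop:target1) does not follow this shape, so treat the check as advisory rather than as a requirement.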
- Install the grub boot loader on the root disk: I believe it's best to do this from the chrooted environment as well:
If grub is not there, install it:
# aptitude install grub
Install grub boot code on the MBR:
# grub-install /dev/sdX
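Before booting the client, the disk should be released on the server side, so that the image is not in use twice (see the note at the end of STEP II). Here is a sketch of the cleanup order, assuming the /mnt/tmp mount point used above. The helper is hypothetical, and it only prints the commands so that they can be reviewed before being run:

```shell
#!/bin/sh
# Hypothetical helper: print, in a safe order, the commands that release the
# root disk prepared under $1 (default /mnt/tmp). Nothing is executed here;
# review the output, leave the chroot, then run the lines by hand as root.
print_teardown() {
    root=${1:-/mnt/tmp}
    # Bind mounts go first, then the root filesystem itself.
    for fs in proc sys dev; do
        echo "umount $root/$fs"
    done
    echo "umount $root"
    # Log out of all iSCSI nodes so the local initiator no longer holds the disk.
    echo "iscsiadm -m node -U all"
}

print_teardown /mnt/tmp
```

The target itself keeps running, of course, since the client is about to log in to it.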
STEP IV: Boot the client
I hope it all worked 😉
Please comment about your experiences, your additions or mistakes you've found in this post.
References
- Good HOWTOs for various san-boot configuration for a variety of operating systems.
- booting Windows 7 from SAN while also taking LVM snapshot of the windows root disk.
Hi,
Can you explain the DHCP server config if I want to map MAC to "fixed" IP for client machine?
In a few days I'll test your setup in my system. DHCP is almost ready - except the above question -, TFTP and iSCSI target (FreeNAS) are ready. On the iSCSI disk there is a real RHEL4 WS which was copied from HDD (maybe it is not good for iSCSI booting).
TIA,
Ruzsi
@Ruzsi,
AFAIK, fixed-address setting is orthogonal to what I presented here.
You can additionally have 'host' blocks such as
host myhost {
fixed-address 192.168.1.1;
hardware ethernet 00:11:22:33:44:55;
}
Afaik it should receive all the parameters you've set in the matching "subnet" section above, including the booting and gpxe settings.
But, I've never tried it, so I might be wrong here.
Hi,
Here is my solution for fixed address by DHCP:
host vbox-t {
next-server myserver;
filename "undionly.kpxe";
hardware ethernet ;
fixed-address ;
}
New question:
Why do you use undionly.kpxe instead of normal gpxe file? What is the difference?
The file was loaded and it is waiting for something (maybe a wrong DHCP server config?)
TIA,
Nice post, thanks.
One note:
It is possible to partition loop devices, since linux-2.6.26. You just need to specify max_part parameter while loading the loop kernel module. Support for this feature has been added to parted recently (past 3.0).
Petr: Interesting, thanks for the correction, I'll update in the post's body.