iSCSI initiator (netbsd-iscsi-initiator) for OS X Mountain Lion (10.8.2)

I am playing around with iSCSI for my MacBook Pro. I looked around, and the formerly free SNS globalSAN iSCSI initiator is no longer free, while ATTO is too expensive to play with. Saw that MacPorts has a netbsd-iscsi initiator port, so I went that route.


$ sudo port install netbsd-iscsi-initiator
---> Computing dependencies for netbsd-iscsi-initiator
---> Dependencies to be installed: netbsd-iscsi-lib
---> Building netbsd-iscsi-lib
Error: org.macports.build for port netbsd-iscsi-lib returned: command execution failed
Error: Failed to install netbsd-iscsi-lib
Please see the log file for port netbsd-iscsi-lib for details:
/opt/local/var/macports/logs/_opt_local_var_macports_sources_rsync.macports.org_release_tarballs_ports_devel_netbsd-iscsi-lib/netbsd-iscsi-lib/main.log
Error: The following dependencies were not installed: netbsd-iscsi-lib
To report a bug, follow the instructions in the guide:
http://guide.macports.org/#project.tickets
Error: Processing of port netbsd-iscsi-initiator failed

Poking around in main.log shows the error is in compiling disk.c.


:info:build /bin/sh ../../libtool --tag=CC --mode=compile /usr/bin/clang -DHAVE_CONFIG_H -I. -I../../include -I../../include -I/opt/local/include -pipe -O2 -arch x86_64 -MT libiscsi_la-disk.lo -MD -MP -MF .deps/libiscsi_la-disk.Tpo -c -o libiscsi_la-disk.lo `test -f 'disk.c' || echo './'`disk.c
:info:build /usr/bin/clang -DHAVE_CONFIG_H -I. -I../../include -I../../include -I/opt/local/include -pipe -O2 -arch x86_64 -MT libiscsi_la-disk.lo -MD -MP -MF .deps/libiscsi_la-disk.Tpo -c disk.c -fno-common -DPIC -o .libs/libiscsi_la-disk.o
:info:build disk.c:811:40: error: assignment to cast is illegal, lvalue casts are not supported
:info:build *((uint64_t *) ((void *)data + 8)) = (uint64_t) ISCSI_HTONLL(key);
:info:build ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
:info:build 1 error generated.
:info:build make: *** [libiscsi_la-disk.lo] Error 1

So I patched that line.


- *((uint64_t *) (void *)data + 8) = (uint64_t) ISCSI_HTONLL(key);
+ *((uint64_t *) ((void *)data + 8)) = (uint64_t) (ISCSI_HTONLL(key));
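If you want to reproduce this, one way (a rough sketch from memory; the exact subdirectory holding disk.c under the work directory is a guess on my part) is to let MacPorts run its patch phase, edit disk.c in place, and then resume the install:

$ sudo port patch netbsd-iscsi-lib        # fetch, extract, and apply the port's stock patches
$ cd $(port work netbsd-iscsi-lib)        # "port work" prints the port's work directory
$ sudo vi netbsd-iscsi-*/src/lib/disk.c   # hand-apply the one-line fix above (path is a guess)
$ sudo port install netbsd-iscsi-lib      # resumes from the build phase without re-extracting
$ sudo port install netbsd-iscsi-initiator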

With that change in place, it builds. Now I just have to play with it to see if it works. More to report later.

ReadyNAS PSU replacement

I had a heart-stopping moment today. My SO told me that she “smelled” something funny in my office. I went in there and sure enough, it smelled like burnt plastic…. I thought, no problem, I have backups…. on my NASes (two of them).

Heh! Murphy’s Law and all that. The PSU in one of them (the ReadyNAS NV+) was the source of the burnt-plastic smell, ah, the fresh odor of melted circuits. The other one had (thankfully) shut down because of the heat.

The ReadyNAS is my main NAS (4x2TB); the other one (4x1TB) is my media server, stuff that I don’t mind losing because it is copied from the ReadyNAS.  Yes, in other words, a bad backup scheme: there is no backup for the ReadyNAS itself.  I’ll fix that soon enough now.

Anyway, I fired up my laptop and manually set the IP, DNS, etc., since my server NFS-mounts most everything from the ReadyNAS…  google, google, and ah ha!  Reports of failed PSUs in the ReadyNAS NV and NV+.  I am hoping that it’s just the PSU and not the main board.

http://www.readynas.com/forum/viewtopic.php?t=13492

Took my ReadyNAS apart and did the sniff test (smell around the various pieces) to isolate where it was coming from.   Seems to be the PSU.  Got dressed, ran to Central Computer nearby.  It’s closed, crap!  Headed to Fry’s in Santa Clara…. it’s already 8:20pm on a Tuesday evening…. In luck!  Fry’s is still open.  Ran to the parts aisle: lots of options, big PSUs (500W, 650W, 1000W, 1400W….).  I don’t need a monster PSU, I just need something small and around 220W, since that is the rating of the original NV+ PSU.

Saw a microATX PSU from Coolmax (300W), the CM-300.  Looks small, hope it’ll fit in the case.  Only 26 bucks (29.22 w/tax).  Grabbed it, ran home.  Followed the instructions linked above to get the two yellow cables onto the connector.  Lucky for me, the Molex connector already had two empty holes where the two yellow cables have to be inserted, so I just had to pull the yellow cables from one of the other connectors.  After some fiddling, got it.   Even though it’s a microATX PSU, it is still too big for the ReadyNAS and won’t fit in the bottom tray (argh!).  Oh well, thread the power cable through the original 3-prong power hole and plug the Molex onto the board.

Leaving everything open, I plugged the power cord into the PSU, then into the AC receptacle, and pressed the power button on the ReadyNAS….. and IT TURNED ON!  Wheeee!  The LCD says “Booting up…. please wait.”   Pulled the plug so I could put everything back in and put the screws on.  Taped over the hole in the bottom tray, since the new PSU is going to sit outside the case and I want things airtight so the cooling airflow goes where it should.

My ReadyNAS is now going through its fsck, which is going to take a loooong time with a 6.5TB filesystem.   But at least I am now back in business.

P.S. Actually, the dead PSU did leave me with a screwed-up filesystem.  I had to manually ssh into my ReadyNAS (thank god I had set up a root ssh login) and fix the MBR on drive 4 (/dev/hdi).  It turns out the simplest way was to copy the first sector from a healthy drive:

# dd if=/dev/hdg of=/dev/hdi bs=512 count=1
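To double-check the result, something like the following should work (a sketch; it assumes GNU cmp, sfdisk, and fdisk are present on the ReadyNAS, and it relies on the member drives being partitioned identically, which is also why copying the whole first sector, partition table included, is safe here):

# cmp -n 512 /dev/hdg /dev/hdi && echo "first sector matches"
# sfdisk -R /dev/hdi      # ask the kernel to re-read the partition table
# fdisk -l /dev/hdi       # sanity-check that the partitions are back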


Moving RAID 10 from one Dell R410 to another

Spent all of last week fighting fires.  We have a production server that was suffering both software (application bugs) and hardware failures.  It was crashing left and right, and I got very little sleep responding to my pager and going online to restart the app and/or the server.

The app (a Java app) was using too much memory, and the server just can’t take any more (we already have 32GB in it).  So we decided to throw more hardware at it.  Both the app and PostgreSQL were running on the same box (yes, I know, bad, bad, bad design; my excuse is that I did not set this up, I joined later).  Anyway, we brought up a new, faster server (Dell R410) and moved the Java app over onto it, leaving PostgreSQL on the old server.  The plan is that if we run into problems, it’s easy to move right back to the old server.  It’s also easier and quicker this way: no downtime to take down the DB, copy the data over, etc.   Besides which, the DB is currently over 65GB and would take a while to copy over.

Well, guess what…. the new R410 started experiencing hardware problems!  I have RAID 10 set up on the 4 drives.  Drives 1 & 2 (one from each RAID 1 element) faulted.  CRAP!  Swapped the drives.  Still faulting.  I got messages from the kernel (dmesg) that it kept having to rescan the SAS bus as the drives kept dropping out.  (The server runs CentOS 5.2, 64-bit.)

Talked with Dell support…. ah, what a pain in the rear they are.  They insisted that it was a firmware issue!!!!  Google for “Dell RAID controllers rejecting non-Dell drives”.  We paid for same-day support and we want support now!  After a couple of hours on the phone, we got them to agree to swap the motherboard and RAID controller the next day.

In the meantime, we have another R410 sitting in the same rack (in use, but its apps can be moved to another server).  So I spent a couple of hours at the data center moving the drives from the failing R410 over to the other one.  I was afraid there might be problems because the RAID was in a degraded state (2 drives in the RAID 10 had faulted and were still syncing), but it worked like a charm.  Shut down both systems, swapped the drives (two at a time, in order: drive 0, drive 1, drive 2, drive 3, so I don’t mix them up).  Brought up the good R410….

It came up fine, saw the new RAID drives, and asked if I wanted to import the foreign config.  I said yes, and pressed Ctrl-R anyway so I could check; the RAID controller saw the RAID 10 and told me that the two drives were syncing.  Great, exit out and reboot.

Then I noticed that this system only has 16GB of RAM…. aw, CRAP!  Shut it down, pulled them both off the rack, opened the cases, swapped the DIMMs.  Put them both back in, booted up the good one…. held my breath…..  and YES, it came up, 32GB, and it saw the RAID drives…

Once I got the login: prompt, I logged in and checked around, making sure everything was there, and realized that the network was not up.  Spent a couple of panic-stricken minutes checking cables, switch ports, etc.  Then I remembered that with Red Hat (and CentOS) the ifcfg-ethN scripts are updated at boot and keyed to the MAC address.  Since I moved the drives to another server, the MACs changed; RH/CentOS noticed that the MAC addresses in the existing ifcfg-ethN files did not match the current MACs and updated those files.  Luckily it renamed the existing ones to ifcfg-ethN.old.

I fired up vi and updated the ifcfg-ethN.old files with the new MAC addresses, renamed them back to ifcfg-ethN (eth0 and eth1), brought the interfaces down and back up (ifdown eth0, then ifup eth0), and the network was up.
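For reference, the fix looks roughly like this on a CentOS 5 box (a sketch; run as root, and repeat for eth1):

# cd /etc/sysconfig/network-scripts
# ip link show eth0            # note the NIC's new MAC address
# vi ifcfg-eth0.old            # change the HWADDR= line to the new MAC
# mv ifcfg-eth0.old ifcfg-eth0
# ifdown eth0; ifup eth0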

Rebooted the server just to be sure everything works, logged in, and started up the app.  Then I checked via a browser from an external address (ssh to my home server, point my browser at the Squid proxy at home) that the app is running and accessible from the outside world.
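That check can also be scripted; something like this (the proxy hostname and app URL are placeholders, and 3128 is Squid’s default port) fetches the app’s response headers from outside through the home proxy:

$ curl -x http://home-server.example.com:3128 -I http://app.example.com/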

I’ve done this before, e.g. moving an entire RAID set (RAID 1 and RAID 5 that time) from one Dell server to another with identical hardware, so I knew it works.  The only difference this time was the degraded state of the RAID, but I am glad that it worked fine too.

SSD and servers

SSDs, or Solid State Drives, are one of the next hot tech fads… alongside multicore CPUs/GPUs, cloud computing, virtualization, the Britney Spears comeback, Mel Gibson’s hot new young girlfriend…

Heh, sorry, I let my hands type faster than my brain.

My brothers and I were discussing the merits of adding SSDs to servers.  One of the things discussed was whether a new interface (connector, bus, whatnot) is needed to make the most efficient use of SSDs in servers.  People are looking at adding a dedicated bus, slot, whatever to the motherboard so you can get the highest possible throughput from the SSD.

My argument is that that is not really needed, except for the jobs that demand the highest possible speed.  And even then, it should really be decided on a sliding scale: how much is each additional percent of speed worth to you?

I think that for most people, using the existing SATA/SAS (I/II/III) interfaces is “good enough”.  Yes, you do not get the best possible speed, but the benefits still make SSDs worth using, and the low cost will induce people to adopt them.

SATA interfaces come for free on all modern motherboards, and most servers support hot-swap SATA drives; not all support hot-swap PCI/PCI-X/PCIe slots.  With hot-swap SATA/SAS drives, you can swap a drive without having to open up the case or power down the system.  If you have more than a handful of servers to upgrade, you will appreciate this.

The next step up from SATA is SAS, which is mainly used in I/O-intensive applications (such as DB servers, LDAP servers, etc.).

Don’t forget, too, that if you go with add-on cards (PCI, PCI-X, PCIe), you will most likely also have to deal with drivers.  Your OS is not going to automatically make use of these add-on SSD cards without them.