[rescue] State of the Thor report

Phil Stracchino phils at caerllewys.net
Sat Aug 12 01:40:14 CDT 2017

So, with a massive thank-you to Doug McIntyre and Steve Hatle, I have a
new NAS.  Being a Thor platform, it of course got a suitably Norse
hostname, asgard.

There were a few hiccups getting the OS (Solaris 11.3) set up to my
satisfaction, some of which got simpler once I managed to get
whatever-it's-called-this-week Studio 12.6 installed.  Other packages
had to be built with gcc, and for apcupsd I had no choice but to settle
for an OpenCSW package because it absolutely would not build.  Likewise,
getting SMB filesharing set up gave me a lot of headaches until I
realized that Samba and Solaris' own SMB service were fighting each
other.  (I'm using Samba rather than Solaris' built-in SMB service
because the Solaris service really, really wants you to authenticate to
an Active Directory domain, and I see no reason to subject myself to
that much pain for a home service with only five regular users.)
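
For anyone hitting the same fight: both servers want the same SMB
ports, so only one of them can be enabled at a time.  Below is a
minimal sketch of switching over to Samba.  The two service FMRIs are
assumptions based on a stock Solaris 11 install, so check yours with
"svcs -a | grep -i smb" first; prefer_samba is a made-up helper name,
not part of Solaris.

```shell
# Sketch only: turn off the native Solaris SMB server, then enable Samba.
# SVCS and SVCADM default to the real commands; override them to rehearse.
SVCS=${SVCS:-svcs}
SVCADM=${SVCADM:-svcadm}

prefer_samba() {
    # Assumed FMRIs for a stock Solaris 11 install; verify before use.
    native=svc:/network/smb/server:default
    samba=svc:/network/samba:default

    # Ask SMF for the current state of the native SMB server.
    state=$("$SVCS" -H -o state "$native") || return 1

    # Anything other than "disabled" may still be holding the SMB ports.
    if [ "$state" != "disabled" ]; then
        "$SVCADM" disable "$native" || return 1
    fi

    "$SVCADM" enable "$samba"
}

# Usage (as root):  prefer_samba
```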

So then with all services working, it was time to move the array.

zpool export -f spool
[move disks]
zpool import spool

So far so good, everything mounts.

zpool upgrade spool
zfs upgrade -a
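
One caution on those last two commands: a pool upgrade is one-way, so
the old box may no longer be able to import the pool afterwards.  Here
is a minimal sketch of gating the upgrade on the pool reporting ONLINE
first.  upgrade_if_healthy is a made-up helper name, and the gate
itself is my suggestion, not something I ran at the time.

```shell
# Sketch only: upgrade the pool and its filesystems only if it is ONLINE.
# ZPOOL and ZFS default to the real commands; override them to rehearse.
ZPOOL=${ZPOOL:-zpool}
ZFS=${ZFS:-zfs}

upgrade_if_healthy() {
    pool=$1

    # "zpool list -H -o health" prints a single word such as ONLINE or DEGRADED.
    health=$("$ZPOOL" list -H -o health "$pool") || return 1

    if [ "$health" != "ONLINE" ]; then
        echo "$pool is $health, not upgrading" >&2
        return 1
    fi

    # Same two steps as above: pool version first, then all filesystems.
    "$ZPOOL" upgrade "$pool" && "$ZFS" upgrade -a
}

# Usage (as root):  upgrade_if_healthy spool
```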

And all was great ... for a couple of days.  Then drives started
falling over.  I hadn't really been keeping track, and I don't think one
of the eight Samsung Spinpoint F3 drives still in the array was under
six years old.  They were keeping it together on the old box, but
couldn't handle the I/O rates the X4540 was asking of them.  After a
couple of days of frantically swapping drives around from slot to slot,
I realized the problem was the drives, not the slots.  Having already
replaced four drives, *sigh*, OK, I replaced the others as well, kept
the one rather younger Hitachi as a hot spare, and rebuilt the whole
array.  Since then everything's been stable.
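
In hindsight, a quicker way to tell drives from slots is to watch the
per-device error counters, which follow the disk when you move it.  A
minimal sketch, assuming the usual "iostat -En" summary line layout
(device name, then Soft, Hard and Transport error counts);
failing_disks is a made-up helper name.

```shell
# Sketch only: read "iostat -En" output on stdin and list every device
# with a nonzero hard or transport error count.
failing_disks() {
    # Summary lines look like:
    #   c0t1d0  Soft Errors: 0 Hard Errors: 3 Transport Errors: 1
    # so field 1 is the device, field 7 the hard errors and field 10
    # the transport errors.
    awk '/Soft Errors:/ {
        if ($7 + $10 > 0)
            print $1, "hard=" $7, "transport=" $10
    }'
}

# Usage:  iostat -En | failing_disks
```

Soft errors are left out on purpose, since they are usually recovered
retries rather than a sign that the drive is dying.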

Two of the six main disk controllers seem to be completely inoperative;
they don't even register as controllers.  Anyone have any ideas about
that?  I'm guessing it's probably not something that's repairable short
of a system board swap...

