[SunRescue] NetBSD 1.4 miniroot problem fixed
jwbirdsa at carfallin.picarefy.com
Mon Dec 6 19:16:49 CST 1999
>Well, there is nothing that bad with a high end 3/xxx machine. Sadly,
>all I have in that line are a 3/110 and a 3/260. They are more of
>the plebeian class machine.
The 3/2xx is nothing to sneeze at. It's the second-fastest of the 3/ line
and can be loaded up with 100M+ of RAM.
>I was thinking of dropping back to NetBSD-1.3.2 for my sun3's for
>some testing. I once had those up, and they worked reasonably well,
>but also had some timeout problems on tape drivers. The HD drivers
>seemed to work OK with non-interposer-controller setups (non md21 and
>non mt02). I am beginning to wonder if the interposer controllers may
>be part of the scsi timeout problem.
I have 1.4 running on two 3/50's and a 3/60, and they are all
MD21 systems. No problems with the MD21. Out of the three, two units
report phantom drives on LUN 1, but NetBSD recognizes them as being
offline, so no harm done. I don't use tape drives at all anymore, so I
don't know how those work.
>What particular NetBSD-1.4 are you using, and what problems did you
>solve to get it up and running? I usually boot my boxes from tape,
>so am stuck with tapebooting, although I have been known to roll the
>miniroot from swap on a sunos load. I don't seem to do well with
>netbooting the beasts.
Don't dd the miniroot to the start of the disk or you'll smash the
disk label. You can't dd the miniroot from a NetBSD system because it
won't let you write to raw partitions! Install via FTP doesn't seem to
work right, but that doesn't matter once you figure out how to avoid
smashing the disk label, because then you can put the install sets on a
filesystem on your miniroot disk. (I have an old SUN0207 which I'm using
as my magic install disk.) You may need to know how to remount the
miniroot read/write -- the install scripts expect it to be writable, but
it may not be.
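
To make the "don't smash the disk label" point concrete, here is a minimal sketch of copying an image to a target at a nonzero offset. It is only an illustration: the 512-byte sector size, the assumption that the label sits in the first sector, and the name write_image_at are mine, not anything from the NetBSD install tools. On a real disk the starting sector would have to come from the partition table.

```python
import io

SECTOR = 512  # assumed sector size; check the real disk geometry


def write_image_at(dev, image, start_sector, chunk=64 * 1024):
    """Copy `image` into `dev` starting at `start_sector`.

    Hypothetical helper: refuses sector 0 so the label at the
    start of the disk is never overwritten.
    """
    if start_sector < 1:
        raise ValueError("refusing to overwrite the label in sector 0")
    dev.seek(start_sector * SECTOR)
    total = 0
    while True:
        buf = image.read(chunk)
        if not buf:
            break
        dev.write(buf)
        total += len(buf)
    return total


if __name__ == "__main__":
    # Stand-in "disk": a fake label in sector 0 followed by empty space.
    disk = io.BytesIO(b"LABEL".ljust(SECTOR, b"\0") + bytes(8 * SECTOR))
    miniroot = io.BytesIO(b"M" * (2 * SECTOR))
    copied = write_image_at(disk, miniroot, start_sector=2)
    print(copied, disk.getvalue()[:5])
```

The same idea applies whether the target is a whole-disk device with an offset or a partition that starts past the label; the sketch only shows the offset arithmetic and the guard.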
>What we need to do is find the last working NetBSD on the sun3 and
>if it ever did work, on the sun4. Then see about getting that stable.
>I would be all for an older stable VME sun port with the right drivers
>to make that work correctly, that would be common across sun3 and sun4
Well, 1.4 seems to work fine on my Sun-3's, but as stated it's
seriously broken on old VME Sun-4's. 1.3.3 booted fine on my 4/2xx, and
it seemed to be stable as long as I didn't exercise the disks too hard.
Breaking up a long string of back-to-back activity by pausing the
process for a second or two seemed to allow it to catch its breath.
I suppose, if one wanted a really crude hack, one could add a counter
to the sd driver and have it insert pauses after a certain number of
rapid-fire requests, or something. Maybe turning off DMA would work,
since it was a DMA timeout that would crash it.
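
The counter idea can be sketched in user space to show the logic. This is not sd driver code and not a tested fix; the class name, the burst limit, the gap that counts as "rapid-fire", and the pause length are all made-up values for illustration.

```python
import time


class BurstThrottle:
    """Pause after too many back-to-back requests (hypothetical sketch)."""

    def __init__(self, burst_limit=32, gap=0.05, pause=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.burst_limit = burst_limit  # max requests in one rapid run
        self.gap = gap                  # seconds; closer than this is "rapid"
        self.pause = pause              # seconds to rest after a full burst
        self.clock = clock
        self.sleep = sleep
        self.last = None                # time of the previous request
        self.run = 0                    # length of the current rapid run

    def before_request(self):
        """Call before issuing each request; returns True if it paused."""
        now = self.clock()
        if self.last is not None and now - self.last < self.gap:
            self.run += 1
        else:
            self.run = 1                # a natural gap resets the run
        paused = False
        if self.run > self.burst_limit:
            self.sleep(self.pause)      # let the hardware catch its breath
            self.run = 1
            paused = True
            now = self.clock()
        self.last = now
        return paused


if __name__ == "__main__":
    throttle = BurstThrottle(burst_limit=3, pause=0.1)
    print([throttle.before_request() for _ in range(8)])
```

A real driver would have to do this without sleeping in interrupt context, so treat this purely as a picture of the counting, not of where it would live.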
Oh, one note about 1.4 sun3: there is some kind of really obscure
problem down in the TCP stack. So far I have seen it manifest only when
connected to a particular MUCK: it frequently gets into a situation
where the remote host just stops transmitting. As far as I can tell,
select() is working properly and there is no data sitting in buffers on
the local end. As soon as I send a packet, everything comes back to
life. So far I haven't had time to investigate with a packet sniffer
(I'd be more motivated except that I kluged up a workaround in
tinyfugue), but I suspect that the remote system is waiting for an ACK
and NetBSD doesn't think it has received enough data to be worth sending
one, or something ugly like that.
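
For anyone hitting the same stall, a client-side nudge along these lines might be enough. This is a guess at the general shape, not the actual tinyfugue change: the function name, the idle timeout, and the newline payload are all assumptions, and whether the remote end tolerates stray input depends on the server.

```python
import select
import socket


def poll_with_nudge(sock, idle_timeout, nudge=b"\n", bufsize=4096):
    """Wait for data; if none arrives within idle_timeout, send a nudge.

    Returns the received bytes (b"" means the peer closed the
    connection), or None if the wait timed out and a nudge was sent.
    """
    readable, _, _ = select.select([sock], [], [], idle_timeout)
    if readable:
        return sock.recv(bufsize)
    sock.sendall(nudge)  # any outbound packet seemed to wake things up
    return None


if __name__ == "__main__":
    # Local demonstration with a connected socket pair.
    local, remote = socket.socketpair()
    print(poll_with_nudge(local, 0.1))   # times out, sends the nudge
    print(remote.recv(16))
    remote.sendall(b"hello")
    print(poll_with_nudge(local, 1.0))
    local.close()
    remote.close()
```

This only hides the symptom. A packet trace would still be needed to tell whether the ACK theory is right.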
>You are right, though, trying to get working sun4 builds running has been,
>shall we say, less than optimal, at best. It is becoming a tired old
>story, yet some folks seem to report getting the things running ``OK''.
>We must be running the wrong hardware.....(:+\\.....
Yeah. I've given up and fallen back to the 3/4xx. This machine is
supposed to be my main window on the world and I want it to be
reasonably stable. Plus, I need to get it up *SOON* and get my personal
files restored so I can start catching up on stuff. I'll leave the
SPARCs aside for sometime after I get my site reassembled, so that
they're not in the critical path and I can take time to just dink with
them and not care if I don't make any progress. :)
The worst problem with SCSI is that it's a black art. I have had
systems which would not operate if terminators were installed, which is
kind of backward, and others that required terminators, which is at
least expected. Some would only operate with particular cables in
a particular order -- take the same cables and rearrange them and
kablooey! And of course, just about anything will appear to operate for
a little while, so you're left guessing whether a problem is systemic or
just a fluke, and if it's systemic whether it's the controller, cables,
disk, software, VME chassis...
(I have at least learned the great secret of VME chassis: rubber
mallet! If you just slide a board in, it will appear to be seated but
it's really not. It's easiest to demonstrate this in a middle slot of
an empty chassis. If you just slide the board in, it looks OK, but
notice that it requires essentially no force to slide it back out.
However, if you slam the board in, it suddenly requires a good deal of
force to pull out. Slamming it in by hand is easy enough in an empty
chassis, but a nice rubber mallet is handy for seating cards in a full
box. I just wish I'd learned this years ago.)
>James.... is there a current copy of the FAQ somewhere in postscript
>format? That would be nice if there were.... just thinking out loud.
>I thought you were, at one time, the FAQ maintainer?
I was, several years ago. I ran out of time to do anything with it,
and it just sat for a couple years until I officially announced that I
was giving up at the beginning of 1998. I was at one time hoping to
create a pretty printable PostScript version of it, but like a lot of
things I was hoping to do with it, that never happened.