[rescue] E250 temperatures
rdarlington at gmail.com
Wed Apr 16 13:55:27 CDT 2008
I ran the biggest sgi cluster on the planet for a while (6144 cpus).
It didn't take 30 minutes to boot an O2K with diags on. I don't have
experience with the little E10k systems (little by my standards), but
even our big alpha cluster that took up a football field didn't take
that long to boot up, at least if you don't count the fact that we
brought it up in sections. I'm not disputing that systems take that
long, I just can't see why. What exactly are they doing when they
POST that takes hours or days?
On Wed, Apr 16, 2008 at 12:41 PM, Mike Meredith <very at zonky.org> wrote:
> On Wed, 16 Apr 2008 11:58:05 -0600, Robert Darlington wrote:
> > I'd have to start asking why a system built in the last decade would
> > take any more than a few minutes to POST. What could the system
> > possibly be doing that takes that kind of time? I was extremely
> > annoyed at the 5 minutes my old dell server would take. None of my
> > big iron systems (alpha, or sgi) ever took more than a few minutes to
> > come up.
> a) An E10k isn't that modern.
> b) The key is 'diags turned on'. Your Dell doesn't have that level of
> diagnostics on board. POST times are slightly more reasonable with
> diags turned off.
> c) These systems are a bit bigger than your Dell. A later example
> (M8000) can take 32 cores, and 512Gbytes of memory. A PC memory
> 'test' would probably take more than 5 minutes to count that memory.
> d) There's a difference between the time taken to test during a reboot
> and the time taken to do a proper power-on test. I don't have figures
> to hand, but I'd guesstimate that some of the E6900 domains I admin
> (16 cores, 32Gbytes memory) would take 5 minutes to test during a
> reboot, and something like 30 minutes during power on.
> e) These systems aren't supposed to be powered off.
> f) If these systems have a slightly wonky CPU board or some memory with
> a bit of a problem, you *really* want to know about it early. Sure
> most can 'degrade' a CPU board during operation but it's better to
> degrade it early.
> I'm pretty sure the bigger SGI stuff takes a while to get going too ...
> at least the O2K I once worked on did.
> rescue list - http://www.sunhelp.org/mailman/listinfo/rescue
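As a rough illustration of point c) above, here is a back-of-the-envelope
sketch of how memory test time scales with memory size and the number of
test passes. The 32 and 512 Gbyte sizes come from the thread; the function
name, the 1 GB/s sustained test rate, and the pass counts are assumptions
made up for the example, not measured figures for any of these machines.

```python
def memory_test_seconds(size_gb, rate_gb_per_s, passes):
    """Seconds needed to sweep size_gb of memory `passes` times at a fixed rate."""
    if rate_gb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb * passes / rate_gb_per_s


# Hypothetical sustained test rate; real POST diagnostics will differ.
ASSUMED_RATE_GB_PER_S = 1.0

# Sizes quoted in the thread: an E6900 domain (32 GB) and an M8000 (512 GB).
for label, size_gb in (("E6900 domain", 32), ("M8000", 512)):
    # One quick pass versus a more thorough multi-pass test (pass counts assumed).
    for passes in (1, 8):
        minutes = memory_test_seconds(size_gb, ASSUMED_RATE_GB_PER_S, passes) / 60
        print(f"{label}: {size_gb} GB, {passes} pass(es): {minutes:.1f} min")
```

Under these assumptions the time grows linearly with both memory size and
pass count, so a thorough multi-pass test on a large configuration can take
far longer than a single quick sweep on a small one.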