[rescue] E250 temperatures

Robert Darlington rdarlington at gmail.com
Wed Apr 16 13:55:27 CDT 2008

I ran the biggest SGI cluster on the planet for a while (6144 CPUs).
It didn't take 30 minutes to boot an O2K with diags on.  I don't have
experience with the little E10k systems (little by my standards), but
even our big Alpha cluster that took up a football field didn't take
that long to boot up, at least if you don't count the fact that we
brought it up in sections.  I'm not disputing that systems take that
long, I just can't see why.  What exactly are they doing when they
POST that takes hours or days?


On Wed, Apr 16, 2008 at 12:41 PM, Mike Meredith <very at zonky.org> wrote:
> On Wed, 16 Apr 2008 11:58:05 -0600, Robert Darlington wrote:
>  > I'd have to start asking why a system built in the last decade would
>  > take any more than a few minutes to POST.  What could the system
>  > possibly be doing that takes that kind of time?  I was extremely
>  > annoyed at the 5 minutes my old Dell server would take.  None of my
>  > big iron systems (Alpha or SGI) ever took more than a few minutes to
>  > come up.
>  a) An E10k isn't that modern.
>  b) The key is 'diags turned on'. Your Dell doesn't have that level of
>    diagnostics on board. POST times are slightly more reasonable with
>    diags turned off.
>  c) These systems are a bit bigger than your Dell. A later example
>    (M8000) can take 32 cores, and 512Gbytes of memory. A PC memory
>    'test' would probably take more than 5 minutes to count that memory.
>  d) There's a difference between the time taken to test during a
>    reboot and the time taken to do a proper power-on test. I don't
>    have figures to hand, but I'd guesstimate that some of the E6900
>    domains I admin (16 cores, 32Gbytes memory) would take 5 minutes
>    to test during a reboot, and something like 30 minutes during
>    power on.
>  e) These systems aren't supposed to be powered off.
>  f) If these systems have a slightly wonky CPU board or some memory with
>    a bit of a problem, you *really* want to know about it early. Sure,
>    most can 'degrade' a CPU board during operation, but it's better to
>    degrade it early.
>  I'm pretty sure the bigger SGI stuff takes a while to get going too ...
>  at least the O2K I once worked on did.
