[rescue] Advice on Octanes

Francisco Javier Mesa-Martinez lefa at ucsc.edu
Tue Sep 23 20:03:11 CDT 2003

On Tue, 23 Sep 2003, Kurt Huhn wrote:

> None taken.  I was just surprised by your response, which seemed out of
> character for you somehow.

Sorry again...

> > What I thought is that you seemed to think that the processor had a
> > "dedicated" port to each component in the system, i.e. Memory and I/O.
> > Which is not true, since the R10K has to do every request through the
> > same pins, since IO and memory are the same as far as it is concerned.
> > The only time the Xbow can be superior to a PC is when 2 processors
> > are present, in which case it allows each processor to hit either
> > memory or I/O separately. For a single processor there is not much
> > gain from the Xbow when compared to a FeeCee.
> >
> Nononono.  I know what the architecture of an Octane is like.  Yes it
> has a crossbar, yes it has a heart chip, but in terms of bandwidth to
> the processor, there is a two-lane highway and only so many bits can fit
> on it.  However, in my experience, that bit limit is much higher on my
> Octane than the P-III - even with a single R12k.

The Octane has very wide paths to relatively slow memory, while FeeCees
have narrower paths to faster memory, so in the end they somehow manage
to catch up. The main point is that when an R10K issues an I/O request,
it needs that request to be serviced before it can do anything else. The
same is true for a FeeCee or any modern single-context processor. No
matter how switched your system is, the Xbow really does not make a lick
of difference compared to most modern single-processor commodity
systems. The only time a switched system can perform better is when you
have 2 processors in the system, since switches reduce the possibility of
contention, again assuming that I/O and memory requests are being issued
at the same time. I believe there is only 1 port to memory from the Xbow
(I may be wrong), so when 2 processors hit memory you have the same
contention problem as with a bused system. SGI may do something fancy
allowing 2 ports to memory, but then you would have to be lucky enough that
the interleaving is such that each processor sees each leave all to
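
As a rough back-of-the-envelope sketch of the two points above (width
versus speed, and bus versus switch contention), something like the
following toy model may help. All numbers and names here are made up for
illustration; they are not measured Octane or PC figures, and the model
ignores latency, arbitration and everything else a real system does.

```python
from collections import Counter


def peak_bandwidth_mb_s(bus_width_bits, mega_transfers_per_s):
    """Peak bandwidth in MB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * mega_transfers_per_s


def cycles_needed(requests, switched):
    """Toy contention model (an assumption, not how the Xbow really arbitrates).

    requests: list of (requester, target) pairs issued at the same time.
    A shared bus serializes every transfer; a switch serializes only the
    transfers that are aimed at the same target port.
    """
    if not requests:
        return 0
    if not switched:
        return len(requests)
    per_target = Counter(target for _, target in requests)
    return max(per_target.values())


# Hypothetical figures: a wide, slow path versus a narrow, fast one.
wide_slow = peak_bandwidth_mb_s(256, 50)
narrow_fast = peak_bandwidth_mb_s(64, 200)

one_cpu = [("cpu0", "mem")]
two_cpus_split = [("cpu0", "mem"), ("cpu1", "io")]
two_cpus_same = [("cpu0", "mem"), ("cpu1", "mem")]

if __name__ == "__main__":
    print(wide_slow, narrow_fast)
    for reqs in (one_cpu, two_cpus_split, two_cpus_same):
        print(cycles_needed(reqs, switched=False),
              cycles_needed(reqs, switched=True))
```

In this model the switch only wins in the two-processor case where the
requests go to different ports; with one processor, or with both
processors hitting a single memory port, it behaves like the bus.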

> > Newer Athlon and Xeon chipsets allow for switched approaches.
> >
> I wasn't aware of this.  I find it difficult to fathom though, how is it
> implemented?

You would be surprised to know that there is little difference between
XIO and AGP :). So you have had a switched port in your FeeCee for a long

> > And I have a completely different experience, so I am glad that you
> > have the tool that you require for your job.
> Related note: We have a pile of new G4 xServes in the datacenter, and a
> pile of G5s on order?  Vector processing is a wonderful thing.

I view those units more as SIMD than as true vector units :)... but I
guess vector processing sounds far more sexy for Apple.

> developers ran our matrix simulations on a P-IV and then forked the code
> for the vector unit and ran it again, and then they threw a beer bash.
> I heard whispers of "five times faster" in initial testing from them.
> Awesome stuff.

The G4 and G5 have at least 2 SIMD units, if I recall correctly, whereas
the PIV has only one. Plus the AltiVec stuff shows the power of RISC when
it comes to pipelining and, more importantly, pipelining SIMD ops....
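
A very crude way to see why the number of SIMD units matters, assuming
128-bit registers holding four 32-bit floats and one operation issued per
unit per cycle. The function name and the element count are hypothetical,
and the model ignores memory bandwidth, latency and issue restrictions, so
it says nothing about the "five times faster" figure quoted above.

```python
import math


def simd_cycles(n_elements, lanes_per_unit, units):
    """Idealized cycle count: every unit retires one full-width op per cycle."""
    return math.ceil(n_elements / (lanes_per_unit * units))


LANES = 128 // 32  # four 32-bit floats per 128-bit register

if __name__ == "__main__":
    n = 1024  # hypothetical number of elements
    print(simd_cycles(n, LANES, units=1))
    print(simd_cycles(n, LANES, units=2))
```

Under these assumptions a second unit halves the idealized cycle count,
which is an upper bound on the benefit rather than a prediction.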
