# [rescue] Cray J90s

Dave McGuire rescue at sunhelp.org
Tue Jun 12 17:51:01 CDT 2001

```
On June 12, Al Potter wrote:
> >   Another good point to ponder...some things vectorize better than
> > they parallelize.
>
> I'm probably never gonna get a better opportunity.....
>
> Could you take a few and `splain the difference, paying particular attention to the vector side?  I understand the basics of parallel computing.

Vector Processing 101
---------------------

I'll explain it with some C code.  First, data types.  A vector is
nothing more than a list of numbers.  One might write vectors like
this:

A = [ 5.2, 11.1, 3.65, 4.0, 12.0 ]
B = [ 3.81, 7.11, 4.01, 9.44, 1.0 ]

Coding in C, we'd naturally store these lists of numbers in arrays.
Operating on these arrays on a scalar (i.e., non-vector) computer
would usually involve stepping through the lists element by element
and performing some operation on each element of the vectors in turn.
Below, the arrays a, b, and c can be used to store vectors.  Let's
just say that we want to take every element of a, add it to the
corresponding element of b, and put the result in the corresponding
element of c:

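#include <stdio.h>

int main(void)
{
    /* declare the vector arrays and loop iterator variable */
    float a[5], b[5], c[5];
    int i;

    /* stuff some values into the arrays */
    a[0] = 5.2;
    a[1] = 11.1;
    a[2] = 3.65;
    a[3] = 4.0;
    a[4] = 12.0;

    b[0] = 3.81;
    b[1] = 7.11;
    b[2] = 4.01;
    b[3] = 9.44;
    b[4] = 1.0;

    /* scalar loop: one addition per iteration, five iterations */
    for (i = 0; i < 5; i++)
        c[i] = a[i] + b[i];

    /* print the result vector */
    for (i = 0; i < 5; i++)
        printf("c[%d] = %g\n", i, c[i]);

    return 0;
}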

On a typical scalar processor, this code makes five passes through
the loop and performs five additions, one at a time.  This is the
traditional scalar computing that we're all used to.

Vector processors, on the other hand, operate on vectors as atomic
values.  They can perform the same operation on a whole list of
numbers (a vector) at once...with one vector instruction.

On a YMP-architecture Cray (like a J90 or EL), that loop becomes five
instructions: one to load the VL (vector length) register with the
vector length (5 here), one to load vector "a" from memory into a
vector register, one to load vector "b" from memory into another
vector register, one to add the two vectors and place the result into
yet another vector register, and one to write the contents of that
vector register into array "c" in memory.

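This may not seem like such a huge win at first glance...but most
Crays operate on 64-bit numbers and have vector registers that are 64
elements deep (the YMP-C90 is an exception, with 128-element
registers), so a single vector can hold up to 64 numbers.  Those same
five instructions will add two 64-element vectors, where a scalar
machine would need 64 trips through the loop, each with its own loads,
add, store, index increment, and branch.  That's where the usefulness
and raw power of this form of operational parallelism becomes clear.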

Is this a useful explanation?

-Dave McGuire

```