[rescue] weird Opteron 865 / Tyan 4882 problem
patrick at zill.net
Tue Jul 14 20:35:29 CDT 2009
Hoping someone can give me some advice on this system:
4x Opteron 865 (dual-core CPUs) on a Tyan 4882 board
As you know, each Opteron has its "own" RAM using its on-board memory
controller. Each CPU has 4 slots.
All CPUs are fully populated with 4x1GB ECC DIMMs, except for CPU1.
Conditions under which system will hang:
CPU1 with no RAM
CPU1 with 4x DIMMs
Conditions under which system will boot fine:
CPU1 with 2x DIMMs
BUT the system does not see CPU1's RAM at all! That is, I see 12GB
RAM at bootup and when the system is running, not 14GB as it should be.
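For reference, the 14GB figure is just the DIMM count added up. A quick
sketch of the arithmetic in Python (the function and parameter names are
made up for illustration, and it assumes every DIMM is 1GB and every
installed DIMM gets detected):

```python
# Assumptions taken from the description above: three CPUs fully populated
# with 4 x 1GB DIMMs each, plus however many DIMMs are installed on CPU1.
# expected_ram_gb and its parameters are illustrative names only.
def expected_ram_gb(cpu1_dimms, other_cpus=3, slots_per_cpu=4, dimm_gb=1):
    """Total RAM in GB if every installed DIMM were detected."""
    return (other_cpus * slots_per_cpu + cpu1_dimms) * dimm_gb

reported_gb = 12                        # what the system shows at boot
expected_gb = expected_ram_gb(2)        # CPU1 with 2 DIMMs, the bootable case
missing_gb = expected_gb - reported_gb  # RAM installed but not seen

print(expected_gb, missing_gb)
```

The missing amount matches exactly the two DIMMs installed on CPU1.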
My thinking is that either:
1. CPU1 is partially busted (specifically its memory controller) and
should be replaced
2. There is a problem with the DIMM slots themselves, like maybe a resistor
or some other electrical channel problem.
My main concern is reliability - I want to use ESXi v4 on this beast and
put it in colocation... I can get by with 12GB provided the system
remains stable, and performance even going with UMA instead of NUMA is
still pretty decent.
Any ideas? Suggestions for further testing?