Original Link: https://www.anandtech.com/show/2200



Introduction

Apple didn't show up, Dell sent their cat, and Lenovo was not interested in attending, but there were still 6100 other exhibitors in the 27 huge expo halls at the "Hannover Messe". Although plenty of booth babes and jet engines on top of monstrous heatsinks were present, we managed to concentrate on the server and storage technology that could be found at CeBIT 2007.


IBM's booth had style, but too much woolly language

In this report we cover the new products of Tyan, Supermicro, Chenbro, Promise, Fujitsu-Siemens and MSI. But to make sure this is not just another CeBIT report, you will also find an analysis of the AMD "Barcelona" chip and how it compares to Intel's upcoming Xeon products.

Tyan

It was remarkable how many new and improved products were targeted at the HPC market. Although there were plenty of barebones servers and motherboards available, all attention went to Tyan's PSC, previously known as "Typhoon".


Tyan's T-650Qx

The "new" idea behind the PSC is that the scientist/3D artist should be able to render/simulate on his own Personal Super Computer instead being dependent on the whims of a strict dictator [also called a system administrator] to get access to blade servers in the datacenter. Blade servers can pack more crunching power in the same space, but the newly launched Tyan PSC T-500 and T-650Qx do not produce more than 52 dB of noise. 52 dB is far from whisper quiet as some have described it, but it is silent enough to be put under a desk and it won't create a Typhoon like a Blade server.

While the previously launched PSC featured four computing boards, the new PSC has five of them, each with two CPU sockets. The T-500A uses dual core Socket-F Opterons and can contain up to 20 cores and 80GB of RAM (16GB per computing board). The interesting thing is that the "master node" also has a PCIe x16 slot, which allows you to use a high-end graphics chip to visualize your rendering or simulation.

The real star is the T-650Qx which can use the new Intel Xeon L5310, the 1.6 GHz 50W low voltage quad core Xeon. At the Tyan booth, the Xeon 5150 (Dual core 2.66 GHz, 1333 MHz FSB) was used, running Windows Compute Cluster Server 2003 and Wolfram's Mathematica. You can use up to eight cores and 12GB per node, good for a total maximum of 60GB RAM and 40 cores (256 GFLOPs) for the complete PSC. Each Tyan PSC has three 600W PSUs. Prices start at $20,000, which is a bit less "personal", but the price/performance for the PSC is pretty reasonable in comparison to the competition.
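That 256 GFLOPS figure is easy to sanity check from the core count and clock speed alone. The quick back-of-the-envelope calculation below is ours, not Tyan's, and it assumes the usual four double precision SSE FLOPs per core per cycle for this generation of Core-based Xeons.

# Rough peak-FLOPS estimate for a fully populated T-650Qx (our own sanity
# check, not a Tyan figure). Assumes 4 double-precision SSE FLOPs per core
# per cycle, which is what Core-based Xeons of this era can issue
# (one 128-bit add plus one 128-bit multiply).
nodes = 5
sockets_per_node = 2
cores_per_socket = 4          # Xeon L5310 is a quad core
clock_ghz = 1.6
flops_per_cycle = 4           # assumption: 128-bit SSE add + multiply per cycle

cores = nodes * sockets_per_node * cores_per_socket
peak_gflops = cores * clock_ghz * flops_per_cycle

print(f"{cores} cores -> {peak_gflops:.0f} GFLOPS peak")   # 40 cores -> 256 GFLOPS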

There were other original ideas at Tyan's booth. For example, there is Tyan's Tank FT48, a 4U rack or tower server, which has two vertical memory daughterboards to save some space on the board.


This allows the board to have space for two 100 MHz PCI-X slots, two 133 MHz PCI-X slots, and two PCIe x16 slots (x8 electrically), as well as one PCI slot.


This is an excellent idea, as the whole purpose of a 4U server is to offer more I/O expandability than a 1U server, where 16 DIMM slots leave little space:


Another remarkable server is the Tyan VX50, which was updated to use up to eight dual core Opterons, or 16 cores in total. The 5U VX50 mounts two nForce Pro 2200 based boards on top of each other, and offers 32 DIMM slots.


This is quite a remarkable machine with a lot of processing power, but it might be rather complex to service. Personally, we would have preferred to see a SAS controller with one cable to the backplane, instead of eight separate SATA cables.


The Tyan VX50 is powered by a 3+1 PSU combination, good for 1620W in total.



Supermicro

Supermicro also decided to offer more HPC specific products, and launched their 1U Twin server.


The 1U Twin offers two nodes, and up to 16 cores (2 nodes x 2 quad cores) in a 1U server, an amazing combination for the people whose prime concerns are density and HPC processing power. The 1U twin is more than just two nodes in one 1U server; this machine has been well thought out. By powering both nodes from the same 980W "up to 91% efficient" power supply, chances are good that the PSU is working at a load where it is close to its maximum efficiency. Of course, this also means that the PSU is a single point of failure, and this might lower the attractiveness as a high availability dual node server. The HPC people will probably not really care as the Supermicro PSUs are known to be very reliable, so the high processing power density and relatively low energy cost might easily offset this disadvantage.
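To illustrate why feeding two nodes from one shared supply helps, here is a minimal sketch. The efficiency curve is purely an assumption of ours (PSUs of this class are typically least efficient at light load), and only the 91% peak comes from Supermicro's claim; the 280W per-node draw is likewise just an illustrative number.

# Minimal sketch of why two nodes on one PSU can beat one node per PSU.
# The efficiency curve below is illustrative only; the 91% peak is the sole
# figure taken from Supermicro's "up to 91% efficient" claim.
def psu_efficiency(load_fraction):
    """Hypothetical efficiency vs. load curve, peaking at 91% near 60% load."""
    if load_fraction < 0.6:
        return 0.75 + (0.91 - 0.75) * (load_fraction / 0.6)
    return 0.91 - 0.04 * (load_fraction - 0.6) / 0.4

psu_watts = 980.0
node_draw = 280.0                                      # assumed draw of one loaded node

one_node = psu_efficiency(node_draw / psu_watts)       # dedicated PSU per node
two_nodes = psu_efficiency(2 * node_draw / psu_watts)  # shared 1U Twin PSU

print(f"one node per PSU : {one_node:.1%} efficient")
print(f"two nodes shared : {two_nodes:.1%} efficient")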


Both nodes can also be "chained" together by the low latency, high bandwidth (20 Gbps) InfiniBand ports.

Several Supermicro servers also now feature a new proprietary interface, based on PCIe, which Supermicro calls "Universal I/O". It looks as if a rectangular part of the motherboard is missing.


At first, another proprietary I/O interface looks like a bad idea. However, Universal I/O was made to protect the investment of the resellers.


For example, consider the situation where a certain generation of SAS controllers is about to be replaced with a faster and cheaper alternative. In that case, several barebones in stock might lose quite a bit of their value, as customers will prefer the new barebones with the new SAS controller. It is also hard for a reseller to know whether to stock barebones with high end or low end SAS controllers, what percentage of barebones should have mostly PCIe or PCI-X slots, and so on. This is where the Universal I/O board comes in: a barebones server does not become outdated as quickly if you simply add the required I/O controller when the server is sold. The Universal I/O boards fit in as if they were part of the motherboard, so the system does not lose one or two PCIe and PCI-X slots that remain available for other expansion cards.



Superblade

In our first server guide we commented that "the idea behind blade servers is brilliant" but that the current blade server market is filled with relatively overpriced and "vendor lock-in" solutions. Blades are still proprietary, but there is a good chance that the blade market is going to be quite a bit more competitive now that Supermicro's Superblade is here. This puts Supermicro right up against the established blade server vendors such as HP, IBM, Sun and Fujitsu-Siemens...

We could not get Donn Clegg or Angela Rosario to give us the pricing of the Supermicro Superblade, beyond the indication of "very competitive pricing". Still, there are indications that the Superblade enclosure is going to stir things up. Considering that Supermicro sells as many servers as Sun and Fujitsu-Siemens combined, it is easy to see that even if Supermicro is considered a tier two OEM, this move cannot be ignored by the tier one OEMs.

The Supermicro Superblade is a 7U blade chassis, which can contain up to ten blades. Each blade can use up to four Socket-F Opterons, and Raphael Wong told us that the Superblade will make use of AMD's newest Barcelona (or K10) chip. This means that you can use up to 16 cores per blade, or no less than 160 (!) cores for the complete 7U chassis. The Superblade enclosure also supports blades with dual socket dual core Xeons (51xx) or quad core Xeons (53xx).


Another interesting aspect is the hard disk options. On a 2-way blade you have the flexibility to go for two drives, either 2.5" or 3.5" (either SAS or SATA). Best of all, you can buy your own hard disks; you are not forced to buy your disks from the blade chassis vendor, contrary to what is customary with the other blade manufacturers.


With the 4-way blade you can only use 2.5" SATA drives as the four CPU sockets take away the necessary space for 3.5" disks.


Three kinds of hot-swappable power supplies (1400W, 2000W, 2500W) can be used, and the chassis can hold up to four power supplies in a 3+1 configuration. Up to two Gigabit Ethernet switches or up to two 4x DDR InfiniBand switches (20Gb/s per port) can be used for connectivity. The Gigabit Ethernet switches we saw had ten ports each. An (optional) InfiniBand switch links to each blade server via that blade's own InfiniBand card.


Each of the blade servers has an IPMI 2.0 management module connector, and Supermicro's own management solution was being shown with the server at CeBIT.



Intel

Intel announced a few weeks ago that Xeon processors with a 1600 MHz FSB will be available in the second half of 2007. In November of last year, Intel announced that the Greencreek chipset (the workstation version of the Blackford chipset we tested) would be succeeded by the Seaburg chipset. At that time, Seaburg was shown with a 1333 MHz Dual Independent Bus (DIB), but Intel has since raised that speed to 1600 MHz.


It seems that the 1600 MHz FSB Xeons and the Seaburg chipset are targeted mostly towards HPC and workstation use. We would expect the 3.2 GHz Xeons to have a higher TDP (120W) than the typical Intel quad core server chips (80W).

Seaburg also supports up to 128GB of RAM, which makes it a bit odd to target it mostly at HPC and workstation applications. However, one of the most important improvements compared to Greencreek is Seaburg's larger, more efficient snoop filter (with more associativity to provide better coverage). A snoop filter only helps in bandwidth intensive applications: it reduces the amount of bus bandwidth consumed by cache coherency traffic, freeing up that bandwidth for the application itself. So basically a snoop filter helps in FP and I/O intensive applications.
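A rough way to picture the benefit: every snoop the filter suppresses returns bus cycles to the application. The toy model below uses our own assumed numbers (the snoop overhead and filter hit rate are illustrative, not Intel figures) purely to show the mechanism.

# Illustrative model of what a snoop filter buys you (numbers are assumptions,
# not Intel figures). With a dual independent bus, a miss on one bus normally
# has to be snooped on the other; a snoop filter tracks which lines the other
# bus's caches can possibly hold and suppresses the useless snoops.
fsb_bandwidth_gbs = 12.8          # one 1600 MHz, 64-bit front side bus
snoop_overhead = 0.20             # assumed: 20% of bus cycles spent on snoop traffic
filter_hit_rate = 0.75            # assumed: fraction of snoops the filter eliminates

without_filter = fsb_bandwidth_gbs * (1 - snoop_overhead)
with_filter = fsb_bandwidth_gbs * (1 - snoop_overhead * (1 - filter_hit_rate))

print(f"usable bandwidth without filter: {without_filter:.1f} GB/s")
print(f"usable bandwidth with filter   : {with_filter:.1f} GB/s")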

Running a 2.66 GHz quad core Xeon x5355 on the same 1333 MHz DIB, Seaburg is about 5% faster in SpecFP 2000 (base), and about 4% faster in LS-DYNA and Fluent. This snoop filter is not available in Blackford, so it shows that Seaburg not only has a faster DIB but also a more efficient one. Another indication that Seaburg is targeted at the workstation world is its support for up to 44 PCIe lanes.

So the newly announced 3 GHz quad core Xeon is a server chip intended to make the life of AMD's Barcelona a bit harder, but the 3.2 GHz Xeons with 1600 MHz DIB will probably be mostly HPC/workstation CPUs. As Intel has decided to increase the DIB speed of their HPC/Workstation chipset and CPUs to 1600 MHz, we may assume that Intel feels that AMD's Barcelona will be a bigger threat as an HPC CPU than as a server CPU.



AMD

Despite the enthusiastic and hard working marketing people, AMD's presence at CeBIT was somewhat of a disappointment. There was no demo system of AMD's "native quad core", and apart from some vague benchmarks, no real hard numbers or clock speeds. With a bit of help from a few enlightened people, however, we were able to dig up some new info.


It was quite interesting to hear AMD's representatives use the term "K10" again. The most famous benchmark so far is AMD's claim that the K10 is about 42% faster in floating point than the current top chip, the Xeon x5355.


So we decided to delve a little deeper. Let us first look at the dual socket systems, as we try to find more precise SPEC (base) numbers:

AMD vs. Intel Quad Core Performance Overview (two sockets, four cores per system)
SPEC (base) | Xeon 5160 (3 GHz) | Opteron 2220 (2.8 GHz) | Xeon vs. Opteron
CPU Int2006 | 17.5 | 12.2 | +43%
CPU Fp2006 | 17.1 | 13.1 | +31%
CPU Int2006 (rate) | 53.2 | 46.1 | +15%
CPU Fp2006 (rate) | 44.1 | 45.6 | -3%
CPU Fp2000 (rate) | 81.6 | 85.4 | -4%

First of all, it should be noted that Spec FP2000 rate and Spec FP2006 rate are already running better on the dual core Opteron than on the dual core Xeon "Woodcrest". SpecFP rate is nothing more than several SpecFP benchmarks running completely separate from each other, and it is well known that SpecFP is a bandwidth intensive benchmark. So running several of those benchmarks will only increase the bandwidth needed. In a two socket machine, the Opterons have roughly twice as much bandwidth as with one socket, so SpecFP rate is basically the ideal benchmark to show the benefits of AMD's NUMA platform.

If we compare a dual quad core Xeon (x5355) with a quad socket dual core Opteron, the bandwidth of the AMD platform doubles while the bandwidth of the Intel system stays the same. As we use eight cores in total instead of four, the bandwidth demands of the SpecFP benchmark also double.

AMD vs. Intel Octal Core Performance Overview (eight cores per system)
SPEC (base) | 2x Xeon x5355 (2.66 GHz) | 4x Opteron 8220 (2.8 GHz) | Xeon vs. Opteron | K10
CPU Int2006 | 16 | 10.5 | +52% | N/A
CPU Fp2006 | 16.1 | 12.1 | +33% | N/A
CPU Int2006 (rate) | 79.6 | 86.2 | -8% | N/A
CPU Fp2006 (rate) | 58.9 | 82.5 | -29% | N/A
CPU Fp2000 (rate) | 103 | 157 | -34% | +/- 146

The result is that the dual Xeon x5355 (eight cores) is heavily bottlenecked by a lack of bandwidth and only about 20% faster than the dual Opteron 2220 SE (four cores) in CPU FP2000 rates. If we look at the best dual core Xeon 5150 (2.66 GHz) result, it gets a score of 78.2. That means the quad core Xeon at 2.66 GHz is only about 32% faster than its dual core brother at the same clock speed, another clear indication that the dual Xeon x5355 scores are seriously limited by memory bandwidth. It is no surprise that the dual Xeon x5355 ends up about 34% behind the quad socket Opteron 8220 (and we are ignoring the probably inflated result of 184 you can get with the Sun Studio compiler).
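The percentages quoted above follow straight from the published rate scores; for clarity, here is the same arithmetic spelled out.

# Recomputing the SPECfp2000 rate comparisons quoted above, straight from the
# scores in the two tables.
dual_xeon_5150    = 78.2    # 2x dual core Xeon 5150, 2.66 GHz (4 cores)
dual_xeon_x5355   = 103.0   # 2x quad core Xeon x5355, 2.66 GHz (8 cores)
dual_opteron_2220 = 85.4    # 2x dual core Opteron 2220 (4 cores)
quad_opteron_8220 = 157.0   # 4x dual core Opteron 8220 (8 cores)

# Doubling the Xeon core count at the same clock only buys ~32%...
print(f"8-core vs 4-core Xeon        : +{dual_xeon_x5355 / dual_xeon_5150 - 1:.0%}")
# ...while the 8-core Opteron box, with twice the memory channels, scales much better.
print(f"8-core Xeon vs 4-core Opteron: +{dual_xeon_x5355 / dual_opteron_2220 - 1:.0%}")
print(f"8-core Xeon vs 8-core Opteron: {dual_xeon_x5355 / quad_opteron_8220 - 1:.0%}")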

This puts AMD's claim that the best "K10" (most likely at 2.3 GHz) will be 42% faster than the Xeon x5355 in Spec FP rate in the right perspective. We reported in our Barcelona architecture article that the AMD K10's Northbridge is set up to handle higher bandwidth than the current AMD chips. As has been shown numerous times, the current Athlon 64 X2/Opteron architecture is not able to use the extra bandwidth that DDR2 gives.

So most of the 42% advantage is probably due to K10's better Northbridge and better use of DDR2. Ron Myers of AMD claimed that the difference is now already greater than 42%. Combine this with the fact that the K10 is running at only 2.3 GHz, and we can conclude that the memory subsystem (Load/store unit, L1, L2, Northbridge) of the K10 is simply (vastly) superior compared to the Athlon 64s and to the quad Xeon. This confirms our and Intel's assumption that the K10 will probably make the largest impact as a very potent HPC chip. The hardware virtualization features in AMD's K10 are quite impressive, but we'll discuss them later.

There is more: AMD emphasizes the memory subsystem and SSE on the slide below, presenting the Intel Clovertown as a severely bottlenecked CPU.


So how much of this marketing slide is true? It is very likely that the instruction fetch bandwidth of the AMD K10 is twice as high as that of Intel's Core architecture. Pre-decoding bandwidth (and thus the complete chain of fetching and decoding) on Core is still limited to 16 bytes of code per clock cycle, while AMD claims 32 bytes per cycle for the whole pre-decoding, fetching and decoding pipeline of the K10. It should be remarked however that only in very CPU intensive code will the 16 bytes per cycle really be a bottleneck.
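To put that 16 bytes per cycle in perspective: average x86 instruction length in integer code is roughly 3.5 bytes (our assumption, not a vendor figure), so the fetch window usually covers more instructions than the four-wide decoders can handle anyway; it is in SSE-heavy code, with its longer instructions, that the window starts to pinch. The small sketch below illustrates this.

# When does a 16-byte-per-cycle fetch window actually limit a 4-wide decoder?
# The average instruction lengths are assumptions: x86 integer code tends to
# average roughly 3.5 bytes, while SSE-heavy code (prefix + opcode + ModRM +
# displacement) easily averages 5-6 bytes per instruction.
def decodable_per_cycle(fetch_bytes, avg_instruction_bytes, decode_width=4):
    return min(fetch_bytes / avg_instruction_bytes, decode_width)

for name, avg_len in [("typical integer code", 3.5), ("SSE-heavy FP code", 6.0)]:
    core = decodable_per_cycle(16, avg_len)   # Intel Core: 16 bytes/cycle
    k10  = decodable_per_cycle(32, avg_len)   # claimed for AMD K10: 32 bytes/cycle
    print(f"{name:22s}: Core {core:.1f} instr/cycle, K10 {k10:.1f} instr/cycle")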

However, stating that the data cache bandwidth is twice as high as that of Intel's Core ignores a few things. Eric Bron, probably one of the most knowledgeable developers when it comes to SSE, stated: "Intel Core can sustain one 128-bit load and one 128-bit store per cycle (I've measured actual timings very near this theoretical peak), so Core can copy 128 bits per cycle. Barcelona (K10) can only copy 64 bits per cycle from the above store bandwidth limitation." So the "twice as much load bandwidth" is only a small part of the story:
  • Intel Core can do one 128-bit load and one 128-bit store in the same cycle
  • AMD's K10 can do either two 128-bit loads, or two 64-bit stores, or one 128-bit load and one 64-bit store
Depending on the situation, AMD's K10 can thus do twice as much, about the same, or about 33% less work per cycle. So you cannot conclude that the AMD K10 has twice as much SSE bandwidth as the Intel quad core Xeon. It will only be faster if loads happen at least twice as often as stores. In most "harder to vectorize" FP code this is the case, so there the K10 chip will probably win by a small margin (as the percentage of SSE code is low); the SpecFP benchmark is an example. In some "easy to vectorize" SSE code this is not the case; there the K10 will probably not be beaten per clock cycle, but the clock speed disadvantage might give the Xeon the edge.
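A small sketch makes the trade-off concrete. It uses only the per-cycle limits quoted above (one 128-bit load plus one 128-bit store for Core, two 128-bit loads but only 64 bits of stores for K10); the workload mixes themselves are illustrative assumptions of ours.

# Per-cycle 128-bit SSE data throughput for different load:store mixes, using
# the limits quoted above: Core 2 sustains one 128-bit load + one 128-bit store
# per cycle; K10 sustains two 128-bit loads but only 64 bits of stores per cycle.
# The workload mixes are illustrative, not measured.
def cycles_needed(loads, stores, load_bytes_per_cycle, store_bytes_per_cycle):
    """Cycles to issue `loads` and `stores` 128-bit (16-byte) operations."""
    return max(loads * 16 / load_bytes_per_cycle, stores * 16 / store_bytes_per_cycle)

workloads = {
    "copy (1 load : 1 store)":    (1, 1),
    "mostly loads (4 : 1)":       (4, 1),
    "balanced FP kernel (2 : 1)": (2, 1),
}
for name, (loads, stores) in workloads.items():
    core = cycles_needed(loads, stores, 16, 16)   # Core: 16B loads + 16B stores per cycle
    k10  = cycles_needed(loads, stores, 32, 8)    # K10: 32B loads, 8B stores per cycle
    print(f"{name:28s}: Core {core:.0f} cycles, K10 {k10:.0f} cycles")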

A very enthusiastic Ron Myers emphasized that the AMD "Barcelona" launch is just as important as the Opteron launch back in 2003. And frankly, we believe him. The fast on-die interconnects and "native quad core" might not make a big difference in typical desktop applications, but they do make a difference in server applications. The improved fetching and decoding pipeline and the much improved out-of-order (OoO) execution should offer better integer performance. At the same time, it is becoming clear that AMD's K10 is more likely to outperform the current Xeon "Clovertown" by a tangible margin in HPC applications than in server applications, thanks to vastly improved SSE loading and execution together with a much better memory subsystem. Again, that is probably the reason why Intel is introducing a 1600 MHz DIB more quickly than planned.



Storage

Looking at the storage front, the 2U Promise VTrak E310f looks a lot like the VTrak J300s (a 2U 12 disk JBOD system). Looks can deceive of course: the E310f is an RBOD with built-in dual failover/failback RAID 6 SAS controllers. The VTrak E310f connects to your SAN switch via 4Gb fiber channel ports. The price of the E310f is around 4000 Euro (probably about $4500).

It is one of the first RBODs to use the Intel IOP341 CPU. This is a highly integrated, system-on-a-chip I/O processor built around the low-power Intel XScale core, running at clock speeds of about 800 MHz. The IOP341 also has a rather large 512KB L2 cache for an embedded chip. The XScale chip provides "pure hardware" or IOP-based RAID, including support for RAID 6; RAID 0, 1, 1E, 5, 10, 50 and 60 are also supported. No less than 512MB of cache memory is available (with a maximum of 2GB).
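As a side note, the capacity cost of RAID 6 on a 12-bay unit like this is easy to quantify; the arithmetic below is ours and assumes a hypothetical 500GB disk size.

# Usable capacity of a single RAID 6 array across all twelve bays of an E310f.
# The 500GB disk size is our assumption for illustration, not a Promise spec.
disks = 12
disk_gb = 500
raid6_usable = (disks - 2) * disk_gb      # RAID 6 dedicates two disks' worth of space to parity
raid5_usable = (disks - 1) * disk_gb      # RAID 5 comparison: single parity
print(f"RAID 6: {raid6_usable} GB usable, RAID 5: {raid5_usable} GB usable")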


Up to four VTrak J-Class JBOD systems can be daisy chained via the SAS port to provide more storage space. The RBOD can be completely configured via a comprehensive remote management web server interface.


This kind of RBOD still needs a third party SAS controller from Adaptec or LSI. Promise offers only SAS controllers with internal ports.



MSI

MSI was showing off their first quad socket server: the MSI K4-201-A4R3. The Socket-F based Opteron server with a hard to remember name has a 2+1 750W PSU. The server has plenty of expansion possibilities: two PCIe x16 slots (x8 electrically), one PCIe x8 slot, two PCI-X slots (100 MHz), and one PCI slot. The PSUs have been placed in the front of the server to make more room for all these PCI slots and the four CPU sockets on the motherboard. The server makes use of the NVIDIA nForce professional 3600 chipset.


The only thing missing is an internal SAS controller; the internal disk bays only support SATA. On the positive side, two USB ports are available on the front of the server.

Fujitsu Siemens

Is a tier one OEM still able to differentiate itself on the hardware aspects of a server? That was the question we asked the people of Fujitsu-Siemens. It is not that easy, as x86 servers have largely become industry standard machines which all use the same chipsets, memory configurations and disk subsystems. Still, there are some subtle differences. For example, the new Primergy RX300 S3 - one of the first servers to make use of the low voltage 50W quad core Xeons - has a very well thought out cooling system.


Three completely separate tunnels cool the (quad core) Xeon CPU, the PSUs and the I/O cards. This results in less noise, less power consumption, and better cooling. The Primergy server is also one of the few servers which actually supports hot-pluggable PCI slots.


Fujitsu Siemens was also very prominently present at Novell's booth, demonstrating a management console for the Xen hypervisor bundled in SUSE SLES 10. The demo showed how an Apache and Oracle workload was dynamically balanced across several virtual machines.



Others

Chenbro launched the first 1.5U server, the RM13204. The 1.5U height was necessary to accommodate some very specialized video cards which are used by the military.


Enermax showed how much hardware its Galaxy 1000W can power. According to Enermax, the PSU delivered 933 W to 24 80GB hard disks, four Opteron 8212 CPUs, four 3Ware 9650 drive controllers, a GeForce 7600GX and 8GB of RAM (16 x 512MB).


We also talked to the people of MDT, a German memory DIMM manufacturer which claims their automated module testing machines are able to pick out the best memory chips. To prove this claim, MDT promises to replace each bad MDT DIMM with two brand new DIMMs, or a 200% guarantee. It will be interesting to see if both their gaming DIMMs and server DIMMs are able to best (or even match) the other DIMM modules out there while maintaining competitive pricing.

Conclusion

As has been the case for a few years now, CeBIT featured very few genuinely new product launches, but there are some clear trends. First of all, we see renewed interest from Supermicro, Tyan, AMD, Sun and Intel in HPC. As HPC software is at the moment the only software that can never have enough CPUs and memory bandwidth, it is a clear target for even faster CPUs with even more cores.

In a similar vein, the first benchmark shows that AMD's newest K10 chip makes much better use of the available DDR2 bandwidth than the previous AMD generation. It is still unclear how the chip will perform in integer intensive applications, unfortunately; AMD will only tell us that the K10 is "significantly faster" in this respect than the competition on a clock for clock basis. Whether this will be enough for the 2.3 GHz K10 to outperform a 3 GHz Xeon x5365 remains to be seen. There are good indications that the AMD K10 will be the fastest chip in HPC however.
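To frame what "significantly faster clock for clock" has to mean in practice, a trivial calculation (ours, ignoring memory and platform effects) shows the per-clock deficit the 2.3 GHz K10 must overcome against a 3 GHz Xeon x5365.

# How much per-clock advantage does a 2.3 GHz K10 need just to match a
# 3 GHz Xeon x5365? Simple frequency ratio; platform effects are ignored.
k10_clock, xeon_clock = 2.3, 3.0
required_ipc_advantage = xeon_clock / k10_clock - 1
print(f"K10 needs ~{required_ipc_advantage:.0%} more work per clock to break even")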

The second interesting trend is that companies like Supermicro and Promise are ambitious enough to break open the rather "closed and proprietary" markets such as SAN and Blade Servers. This is very promising as these products have traditionally been way too expensive for the smaller enterprises. We will definitely take a look at these products and try to determine if they are an attractive alternative to the Tier One OEM products.
