
Benchmarking

In general, benchmarking (of computers) consists of defining one or several variables that describe a computer system's performance, and measuring these variables. There is no standard or generally accepted measure for computer system capacity: "capacity" is a mix of multiple parameters like cycle time, memory access time, and architectural peculiarities like parallelism of processors and their communication, instruction parallelism or pipelining, etc. Usually, benchmarks should also include system software aspects like compiler efficiency and task scheduling. Potential buyers of computer systems, in particular of large and parallel systems, usually have to acquire a more or less detailed understanding of the systems and perform benchmark tests, i.e. they execute performance measurements with their own program mix, in order to assess the overall performance of candidate systems ([Datapro83], [GML83], [Hennessy90]).
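As an illustration of measuring with one's own program mix, the following is a minimal sketch in Python, not a complete benchmarking procedure. The two workload functions and their weights of 0.6 and 0.4 are hypothetical stand-ins for a site's real programs; the helper names `time_workload` and `weighted_mix_time` are likewise invented for this example.

```python
import time


def time_workload(func, repeats=3):
    """Return the best wall-clock time in seconds over several runs of func()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def weighted_mix_time(workloads, repeats=3):
    """Time each program and combine the timings using the mix weights.

    workloads: list of (name, weight, callable); the weights are assumed
    to sum to 1 and to reflect how often each program runs at the site.
    """
    timings = {name: time_workload(func, repeats) for name, _, func in workloads}
    total = sum(weight * timings[name] for name, weight, _ in workloads)
    return timings, total


# Hypothetical stand-ins for a site's own programs.
def integer_heavy():
    return sum(i * i for i in range(50_000))


def float_heavy():
    x = 0.0
    for i in range(1, 50_000):
        x += 1.0 / i
    return x


mix = [("integer_heavy", 0.6, integer_heavy), ("float_heavy", 0.4, float_heavy)]
timings, total = weighted_mix_time(mix)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
print(f"weighted mix time: {total:.6f} s")
```

Taking the best of several runs is one common way to reduce the influence of other activity on the machine; whether the best, the mean or the median is appropriate depends on what the measurement is meant to predict.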

Attempts to express computer capacity in a single or a few numbers have resulted in more or less controversial measures; conscientious manufacturers advertise with several or all of these.

MIPS is an acronym for Million Instructions Per Second, and is one of the measures for the speed of computers. In theory, attempts have been made to impose an instruction mix of 70% additions and 30% multiplications (fixed point), with architectural factors as well as the efficiency of scheduling or compilation entirely ignored. This makes the measure a simple and crude one, barely superior to cycle time. In practice, vendors usually make some corrections for such factors, and the results are considered more or less controversial. Sometimes a floating point instruction mix is used; the unit is then called MFLOPS (million floating point operations per second), clearly not a useful measure for some types of programs.
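The arithmetic behind such a rating can be sketched as follows. This is only an illustration under stated assumptions: the 70%/30% mix comes from the text, but the clock rate of 100 MHz and the cycle counts (1 cycle per addition, 4 per multiplication) are hypothetical values chosen for the example, and the function names are invented here.

```python
def average_cycles_per_instruction(mix):
    """Weighted mean cycle count for an instruction mix.

    mix maps an instruction name to (fraction of the mix, cycles needed).
    """
    total_fraction = sum(fraction for fraction, _ in mix.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction mix fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix.values())


def mips(clock_hz, mix):
    """Million instructions per second for a given clock rate and mix."""
    return clock_hz / (average_cycles_per_instruction(mix) * 1e6)


# 70% additions, 30% multiplications as in the text;
# the cycle counts and the clock rate are assumptions for illustration.
mix = {"add": (0.70, 1.0), "multiply": (0.30, 4.0)}
rating = mips(100e6, mix)
print(f"average cycles per instruction: {average_cycles_per_instruction(mix):.2f}")
print(f"rating: {rating:.1f} MIPS")
```

With these assumed numbers the average is 1.9 cycles per instruction, giving roughly 52.6 MIPS. Nothing about memory access, scheduling or compilation enters the calculation, which is why the measure is so crude.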

The Whetstone benchmark (like a later relative, Dhrystone) is a group of synthetic (i.e. artificially defined) program pieces, meant to represent an instruction mix matching the average frequency of operations and operands of "typical" program classes.

A different effort resulted in the SPEC benchmarks: a group of major workstation manufacturers called the System Performance Evaluation Cooperative agreed on a set of real programs and inputs against which to measure performance. Real programs such as a mix of Linpack (linear algebra) operations are also frequently used for benchmarks.
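A Linpack-style measurement can be sketched as below: solve a dense linear system, time it, and convert the elapsed time into MFLOPS using the nominal operation count 2n^3/3 + 2n^2 conventionally used for this kind of benchmark. This is a minimal pure-Python sketch and not the Linpack code itself; the function names, the matrix size n = 60 and the random test matrix are assumptions made for the example, and the resulting figure says more about the Python interpreter than about the hardware.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies, leave the inputs untouched
    b = b[:]
    for k in range(n):
        # Pick the row with the largest pivot candidate in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from the rows below the pivot.
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_style_mflops(n=60, seed=0):
    """Time one dense solve and report (solution, MFLOPS)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]  # chosen so the exact solution is all ones
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    flops = 2.0 * n**3 / 3.0 + 2.0 * n**2  # nominal operation count
    return x, flops / (elapsed * 1e6)


x, rate = linpack_style_mflops()
print(f"max deviation from exact solution: {max(abs(v - 1.0) for v in x):.2e}")
print(f"rate: {rate:.2f} MFLOPS")
```

Constructing the right-hand side from the row sums gives a known solution, so the same run checks correctness as well as speed.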



Rudolf K. Bock, 7 April 1998