mikeg@watson.ibm.com


VLIW at IBM Research 
  The VLIW Simulation Environment 

Simulation and performance evaluation are critical during the early stages of designing, refining and experimenting with new computer architectures. Ideally, the tools used for these purposes should be efficient, allow experimentation with realistic workloads, be easily reconfigurable, and permit the evaluation of a variety of system designs. Moreover, the tools should support both the development process, which requires fast simulation turnaround, and the collection of accurate performance estimates, which may be more time-consuming.

Our VLIW environment uses an integrated/modular approach to simulation and performance measurement, oriented towards an early-stage evaluation of our VLIW architecture. This environment, which achieves a high degree of efficiency and versatility, consists of three major components:

  1. The VLIW compiler, which generates tree-instructions.
  2. A translator, which maps the VLIW program (tree-instructions) into a simulation executable that is run on an IBM RISC System/6000. Thus, the simulation executable consists of RS/6000 code that directly emulates the original native code of the VLIW architecture, as opposed to an interpreter using the native code as input.
  3. A processor model and a memory model, which are invoked by the simulation executable generated by the translator.
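
The difference between a translated simulation executable and an interpreter can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy register ISA, the tuple encoding of operations, and the names `translate_vliw`, `translate_program` and `run_executable` are hypothetical, and Python closures stand in for the RS/6000 code that the real translator emits. Branches within tree-instructions are left out entirely.

```python
import operator

# Hypothetical toy ISA: a VLIW is a tuple of (dest, op, src1, src2) operations.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def translate_vliw(vliw):
    """Translate one VLIW into a host-level function; decoding happens once, here."""
    decoded = [(dest, OPS[op], a, b) for dest, op, a, b in vliw]

    def emulate(regs):
        # All operations of a VLIW read the register state from before the VLIW,
        # so compute every result first and only then write the destinations.
        results = [(dest, fn(regs[a], regs[b])) for dest, fn, a, b in decoded]
        for dest, value in results:
            regs[dest] = value

    return emulate

def translate_program(program):
    """Stand-in for the translator: VLIW program in, 'simulation executable' out."""
    return [translate_vliw(vliw) for vliw in program]

def run_executable(executable, regs):
    # No opcode lookup or decoding at run time; each step is already host code.
    for emulate in executable:
        emulate(regs)
    return regs

program = [
    (("r2", "add", "r0", "r1"), ("r3", "sub", "r0", "r1")),  # two parallel operations
    (("r4", "mul", "r2", "r3"),),
]
regs = run_executable(
    translate_program(program),
    {"r0": 5, "r1": 3, "r2": 0, "r3": 0, "r4": 0},
)
```

An interpreter would repeat the opcode lookup every time an instruction executes; in this sketch the lookup is done once per VLIW at translation time.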

Performance measurement capabilities are integrated into the simulation executable through instrumentation to collect statistics regarding the emulated VLIW program. These mechanisms allow fast turn-around time for experimenting with architectural features and compiler algorithms, without yet introducing the detailed description of a processor and memory implementation.

In addition, the simulation executable can include a decoded form of the original VLIW code and calls to a generic timer routine. When such an augmented simulation executable is run, the timer is invoked before each VLIW instruction is emulated and is passed the decoded version of the VLIW as well as an image of the current machine state. The machine state, maintained by the simulator, specifies information such as the contents of the registers of the VLIW architecture. In this way, there is a clear separation between simulation of the instruction-set architecture (ISA) and simulation of a particular implementation of the architecture, though both levels of simulation are possible with the integrated environment.
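
A minimal sketch of such an augmented executable follows. The descriptor fields, the add-only toy operations of the form `(dest, src1, src2)`, and the names `build_augmented_executable`, `run` and `CountingTimer` are hypothetical; the sketch only shows the calling convention described above, in which the timer receives a decoded descriptor and the machine state before each VLIW is emulated.

```python
from types import MappingProxyType

def build_augmented_executable(program):
    """Pair each VLIW with a descriptor that is decoded once and is read-only."""
    executable = []
    for index, vliw in enumerate(program):
        descriptor = MappingProxyType({
            "index": index,
            "num_ops": len(vliw),
            "reads": frozenset(src for _, a, b in vliw for src in (a, b)),
            "writes": frozenset(dest for dest, _, _ in vliw),
        })
        executable.append((descriptor, vliw))
    return executable

def run(executable, regs, timer):
    for descriptor, vliw in executable:
        timer(descriptor, regs)  # the timer sees the state *before* emulation
        # ISA-level emulation: every operation is an add in this toy example.
        updates = [(dest, regs[a] + regs[b]) for dest, a, b in vliw]
        regs.update(updates)
    return regs

class CountingTimer:
    """Trivial stand-in for a timer: counts calls and records the states it saw."""
    def __init__(self):
        self.vliws = 0
        self.ops = 0
        self.seen_states = []

    def __call__(self, descriptor, regs):
        self.vliws += 1
        self.ops += descriptor["num_ops"]
        self.seen_states.append(dict(regs))

program = [(("r2", "r0", "r1"), ("r3", "r0", "r0")), (("r4", "r2", "r3"),)]
timer = CountingTimer()
final = run(
    build_augmented_executable(program),
    {"r0": 1, "r1": 2, "r2": 0, "r3": 0, "r4": 0},
    timer,
)
```

In this example the timer is called twice for three operations, once per VLIW rather than once per operation, and the emulation loop never needs to know what the timer does with the information.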

The timer invoked by the simulation executable consists of two parts: a processor model and a memory model. The processor model maintains the cycle count and other performance statistics, dealing with items such as register dependencies and operation latencies, for a given processor implementation. For memory operations, the processor model invokes the memory model, passing information such as the operation type and effective address. The processor and memory models each have a clearly defined interface, allowing a variety of models to be used interchangeably; the models may differ both in the system configuration they implement and in the degree of detail and accuracy involved. This versatility is further enhanced by ensuring that the interfaces provide all of the information required by the most detailed or accurate model that may be needed, even if that information is not necessary for earlier, simpler models.
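
The split between the two models might look roughly like the sketch below. The interface (`access(op_type, address)` returning a latency in cycles), the two memory models, the cost rule used by `ProcessorModel`, and all latencies are assumptions chosen for illustration; they are not the interfaces or parameters of the actual environment.

```python
from typing import Protocol

class MemoryModel(Protocol):
    """Hypothetical interface: return the latency, in cycles, of one access."""
    def access(self, op_type: str, address: int) -> int: ...

class FlatMemory:
    """Simplest model: every access costs the same."""
    def __init__(self, latency: int = 1):
        self.latency = latency

    def access(self, op_type: str, address: int) -> int:
        return self.latency

class DirectMappedCache:
    """More detailed model: one level of direct-mapped cache in front of memory."""
    def __init__(self, lines: int = 4, line_bytes: int = 16, hit: int = 1, miss: int = 10):
        self.tags = [None] * lines
        self.line_bytes = line_bytes
        self.hit = hit
        self.miss = miss

    def access(self, op_type: str, address: int) -> int:
        block = address // self.line_bytes
        index = block % len(self.tags)
        if self.tags[index] == block:
            return self.hit
        self.tags[index] = block  # fill the line on a miss
        return self.miss

class ProcessorModel:
    """Keeps the cycle count; delegates memory operations to the memory model."""
    def __init__(self, memory: MemoryModel):
        self.memory = memory
        self.cycles = 0

    def time_vliw(self, descriptor, regs):
        # Assumed cost rule: one issue cycle plus the slowest memory access.
        stall = 0
        for op_type, base, offset in descriptor["mem_ops"]:
            address = regs[base] + offset  # effective address from machine state
            stall = max(stall, self.memory.access(op_type, address))
        self.cycles += 1 + stall

descriptors = [
    {"mem_ops": [("load", "r1", 0)]},
    {"mem_ops": [("load", "r1", 4)]},
    {"mem_ops": []},
]
state = {"r1": 64}

simple = ProcessorModel(FlatMemory())
detailed = ProcessorModel(DirectMappedCache())
for model in (simple, detailed):
    for descriptor in descriptors:
        model.time_vliw(descriptor, state)
```

Because both memory models accept the same arguments, swapping one for the other changes only the constructor call, which is the kind of interchangeability the paragraph above describes.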

The high efficiency of the timing environment is largely achieved through the use of the pre-decoded descriptors for each VLIW. A single read-only descriptor of a VLIW is repeatedly used during the simulation, so the size of the descriptor is not an important consideration. Thus, our descriptors are designed to minimize the processing overhead of the timer. (In contrast, a conventional trace-driven timing environment typically must strive to minimize the size of the instruction and machine state information in the trace, at the expense of decoding overhead in the timer.)

In practice, our timing environment has allowed us to dispense with the generation of traces and to measure the performance of realistic workloads. Programs such as the SPEC92 benchmark suite, the Linpack benchmark, and the Livermore loops benchmark have been timed in their entirety. Our simulation executables without timer calls typically run only about 15 times slower than the optimized native RS/6000 code for the same program. With a timer that models the VLIW processor at the functional-unit level and a memory hierarchy consisting of two levels of cache plus main memory, a full timing run is slower than the untimed simulation executable by an additional factor of 75.
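
The two slowdown factors quoted above multiply. The short calculation below makes the combined figure explicit; the 2-second native run time is a hypothetical input used only to show the scale, not a measurement from the project.

```python
UNTIMED_SLOWDOWN = 15   # simulation executable vs. native RS/6000 code (from the text)
TIMER_FACTOR = 75       # additional factor for a full timing run (from the text)

full_timing_slowdown = UNTIMED_SLOWDOWN * TIMER_FACTOR  # relative to native code

def estimated_seconds(native_seconds, slowdown):
    """Scale a native run time by a slowdown factor."""
    return native_seconds * slowdown

native = 2.0  # hypothetical native run time in seconds
untimed = estimated_seconds(native, UNTIMED_SLOWDOWN)
timed = estimated_seconds(native, full_timing_slowdown)
print(f"combined slowdown: {full_timing_slowdown}x")
print(f"untimed: {untimed:.0f} s, full timing: {timed / 60:.1f} min")
```

Under these figures, a full timing run is on the order of a thousand times slower than native execution, which is why the untimed executable is the natural choice for quick functional experiments.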

Our approach is particularly well suited to modeling a VLIW processor. Because the timer is invoked once per VLIW rather than once per operation, there are fewer invocations and therefore less procedure-call overhead, while the larger descriptors needed for VLIWs do not affect the efficiency of the environment. However, the approach can also be used for other architectures.

Since in some cases it is still desirable to generate traces, the environment allows linking a trace dumper into the simulation executable instead of a timer module. Using the same approach, we have also envisioned the possibility of replacing the timer module with some form of debugging environment.
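
As a rough sketch of this substitution, the routine called before each VLIW can be any callable with the expected signature, so a trace dumper can take the place of a timer. The hook signature, the add-only toy operations, the trace line format, and the names `run_with_hook` and `make_trace_dumper` are all hypothetical.

```python
import io

def run_with_hook(program, regs, hook):
    """Emulate a toy program, calling `hook` before each VLIW (timer or dumper)."""
    for index, vliw in enumerate(program):
        hook(index, vliw, regs)
        updates = [(dest, regs[a] + regs[b]) for dest, a, b in vliw]
        regs.update(updates)
    return regs

def make_trace_dumper(stream):
    """Return a hook that writes one trace record per VLIW instead of timing it."""
    def dump(index, vliw, regs):
        stream.write(f"{index} {vliw} {dict(sorted(regs.items()))}\n")
    return dump

program = [(("r2", "r0", "r1"),), (("r3", "r2", "r2"),)]
trace = io.StringIO()
result = run_with_hook(program, {"r0": 1, "r1": 2, "r2": 0, "r3": 0}, make_trace_dumper(trace))
trace_lines = trace.getvalue().splitlines()
```

A debugging front end could be attached in the same way, since the emulation loop depends only on the hook's signature.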


Simulation/evaluation environment for a VLIW processor architecture [Abstract],
published in the IBM Journal of Research and Development, May 1997.

An integrated approach to architectural simulation... (Foils, pdf, 54 KB)

 
  VLIW Simulation Environment Image 
[Simulation environment image]