mikeg@watson.ibm.com


VLIW at IBM Research 
  Some VLIW Presentations 

The IBM Research VLIW Project

K. Ebcioglu, IBM T.J. Watson Research Center

The IBM Research VLIW project has been under way since 1986, focusing on hardware and compiler techniques for instruction-level parallelism. We will give an overview of various aspects of the project, including our VLIW hardware prototype, the application of VLIW compilation techniques to PowerPC superscalar processors, and our third-generation parallelizing compiler. We will describe novel VLIW architectural features and compiler techniques for achieving a high degree of instruction-level parallelism not only in scientific code but also in inherently sequential code with frequent unpredictable branches, pointers, and many data dependences. Our VLIW compilation techniques can parallelize multiple paths in a program directly (unlike trace scheduling), and can generate variable initiation intervals during software pipelining of loops with conditional branches (unlike modulo scheduling).
[ Presentation foils (pdf, 45 KB) ]



Simulation/evaluation approach for a VLIW processor

J. Moreno, M. Moudgill, K. Ebcioglu, IBM T.J. Watson Research Center

This presentation describes the approach being used for the simulation and early evaluation of a processor architecture based on Very Long Instruction Word (VLIW) principles. In this architecture, a program consists of a set of "tree-instructions", each corresponding to a multiway branch and multiple operations that are all performed simultaneously (as one VLIW). The representation of the tree-instructions in memory allows them to be executed by implementations with varying resources, so that the program representation is implementation-independent.
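
As a rough illustration of the tree-instruction idea described above, the following Python sketch models one tree-instruction as a binary tree of tests with operations attached to each node. The class names (`Op`, `Node`), the register names, and the rule that every test and operand reads the register values held on entry are assumptions made for this sketch; they are one plausible reading of "all performed simultaneously", not the project's actual encoding.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Regs = Dict[str, int]


@dataclass
class Op:
    """One primitive operation: dest = fn(values of srcs). Hypothetical format."""
    dest: str
    fn: Callable[..., int]
    srcs: Tuple[str, ...]


@dataclass
class Node:
    """A tree node: operations on this arc, then a test (or a leaf target)."""
    ops: List[Op] = field(default_factory=list)
    test: Optional[Callable[[Regs], bool]] = None  # None marks a leaf
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None
    target: Optional[str] = None  # label of the next tree-instruction (leaves only)


def execute(root: Node, regs: Regs) -> Tuple[Regs, str]:
    """Run one tree-instruction as a single step.

    Assumption: all tests and all source operands see the register values
    on entry, so the operations on the selected path behave as if executed
    simultaneously.
    """
    selected: List[Op] = []
    node = root
    while True:
        selected.extend(node.ops)
        if node.test is None:
            break
        node = node.if_true if node.test(regs) else node.if_false
    new_regs = dict(regs)
    for op in selected:
        new_regs[op.dest] = op.fn(*(regs[s] for s in op.srcs))
    return new_regs, node.target


# A two-way tree: r3 = r1 + r2 always; then branch on r1 > 0.
tree = Node(
    ops=[Op("r3", lambda a, b: a + b, ("r1", "r2"))],
    test=lambda r: r["r1"] > 0,
    if_true=Node(ops=[Op("r1", lambda a: a - 1, ("r1",))], target="LOOP"),
    if_false=Node(ops=[Op("r4", lambda a: a, ("r3",))], target="EXIT"),
)

taken_regs, taken_target = execute(tree, {"r1": 2, "r2": 5, "r3": 0, "r4": 0})
exit_regs, exit_target = execute(tree, {"r1": 0, "r2": 5, "r3": 0, "r4": 0})
```

In the second call, `r4` receives the value `r3` had on entry rather than the sum computed in the same instruction, which is the consequence of the simultaneous-read assumption.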

The simulation/evaluation environment consists of:

  • A high-level language (C, FORTRAN) compiler, which generates tree-instructions in a VLIW assembly language.
  • A VLIW assembler, which generates VLIW object code.
  • A translator from VLIW assembly code into RS/6000 assembly code. The RS/6000 code simulates the functionality of the VLIW processor for the specific VLIW program, including instrumentation to collect execution counts of VLIWs, VLIW profiling information, and generation of predecoded VLIW execution traces.
  • A cycle timer, integrated into the simulation environment, which the simulator invokes on a VLIW-by-VLIW basis so that the timer processes execution traces as they are generated.
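
The last item in the list above, a timer that consumes trace records as the simulator produces them, can be sketched with a Python generator. Everything here is hypothetical: the toy program format, the `CycleTimer` class, and especially the cost model (one cycle per `width` operations), which only stands in for whatever the real timer computes.

```python
from collections import Counter
from typing import Callable, Dict, Iterator, Tuple

# Hypothetical toy program: label -> (operation count, function returning next label).
Program = Dict[str, Tuple[int, Callable[[dict], str]]]


def simulate(program: Program, state: dict, entry: str) -> Iterator[Tuple[str, int]]:
    """Functional simulation that yields one trace record per executed VLIW."""
    label = entry
    while label != "HALT":
        n_ops, step = program[label]
        yield label, n_ops      # hand the record to the consumer immediately
        label = step(state)     # then perform the VLIW's effect on the state


class CycleTimer:
    """Toy timer: assumes a VLIW with n operations needs ceil(n / width) cycles."""

    def __init__(self, width: int) -> None:
        self.width = width
        self.cycles = 0
        self.counts: Counter = Counter()  # execution count per VLIW label

    def on_vliw(self, label: str, n_ops: int) -> None:
        self.counts[label] += 1
        self.cycles += -(-n_ops // self.width)  # ceiling division


def run(program: Program, state: dict, entry: str, timer: CycleTimer) -> CycleTimer:
    # No trace is stored: each record is timed as soon as it is generated.
    for label, n_ops in simulate(program, state, entry):
        timer.on_vliw(label, n_ops)
    return timer


def loop_step(state: dict) -> str:
    state["i"] -= 1
    return "LOOP" if state["i"] > 0 else "HALT"


program: Program = {"INIT": (2, lambda s: "LOOP"), "LOOP": (6, loop_step)}
timer = run(program, {"i": 3}, "INIT", CycleTimer(width=4))
```

Because the generator is consumed one record at a time, memory use stays constant regardless of how long the simulated program runs.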

The presentation will review the basic features of the architecture, including those which allow the same VLIW program to execute on processors with different resources, some of the VLIW compilation techniques used, trade-offs between compiler and architecture, the role of the evaluation/simulation environment in this process, and the requirements imposed on the different tools.

The environment is oriented towards early verification of the VLIW architecture rather than towards modeling a specific processor implementation ahead of hardware; however, its design allows for extension to that role. Emphasis has been placed on providing reasonably fast turnaround from compilation to simulation, so that architecture/compiler trade-offs can be analyzed over complete execution runs.
[ Presentation foils (pdf, 70 KB) ]



An integrated approach to architectural simulation, timing and memory hierarchy evaluation

E. Altman, C.B. Hall, R. Miranda, J. Moreno

Using an integrated, modular approach to simulation and performance measurement, we have built an environment for early-stage evaluation of new architectures that achieves a high degree of efficiency and versatility. Our environment consists of a compiler that generates code for an experimental architecture, and a separate translator that maps that code to a simulation executable that is run on an IBM RISC System/6000. The simulation executable consists of RS/6000 code that directly emulates the original native code of the experimental architecture, as opposed to an interpreter using the native code as input.
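
The distinction drawn above, between a translated executable that directly emulates the experimental code and an interpreter that reads it as input, can be illustrated with a toy example. The two-instruction ISA (`add`, `addi`) and the use of generated Python source in place of RS/6000 code are assumptions of this sketch only.

```python
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, str, str, object]  # (opcode, dest, src_a, src_b or immediate)


def interpret(code: List[Instr], regs: Dict[str, int]) -> Dict[str, int]:
    """Interpreter: decodes every instruction each time it is executed."""
    for op, d, a, b in code:
        if op == "add":
            regs[d] = regs[a] + regs[b]
        elif op == "addi":
            regs[d] = regs[a] + b
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs


def translate(code: List[Instr]) -> str:
    """Translator: decodes once and emits host-language source for the program."""
    lines = ["def emulate(regs):"]
    for op, d, a, b in code:
        if op == "add":
            lines.append(f"    regs[{d!r}] = regs[{a!r}] + regs[{b!r}]")
        elif op == "addi":
            lines.append(f"    regs[{d!r}] = regs[{a!r}] + {b!r}")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    lines.append("    return regs")
    return "\n".join(lines)


def build(code: List[Instr]) -> Callable[[Dict[str, int]], Dict[str, int]]:
    """Compile the translated source into a callable 'simulation executable'."""
    namespace: dict = {}
    exec(translate(code), namespace)
    return namespace["emulate"]


code: List[Instr] = [("addi", "r1", "r0", 5), ("add", "r2", "r1", "r1")]
emulate = build(code)
compiled_result = emulate({"r0": 0, "r1": 0, "r2": 0})
interpreted_result = interpret(code, {"r0": 0, "r1": 0, "r2": 0})
```

The translated function contains no opcode dispatch at run time; that work was done once during translation, which is the usual reason this style of simulation runs faster than interpretation.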

Performance measurement capabilities are integrated into the simulation executable by including a decoded form of the original native code, and by inserting calls to a generic timer routine into the emulation code. As the simulation executable emulates each original instruction, it calls the timer, passing the decoded version of the instruction and an image of the current machine state.

The timer invoked by the simulation executable consists of two parts, a processor model and a memory model, each with a clearly defined interface. This allows a variety of processor and memory models to be used interchangeably, with the models differing both in the system configuration they implement and in the degree of detail and accuracy involved.
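
A minimal sketch of such a two-part timer follows, using Python `Protocol` classes for the two interfaces. The method names (`issue_cycles`, `access_cycles`), the `Decoded` record, and the three concrete models are all hypothetical and far simpler than a functional-unit-level processor model or a two-level cache hierarchy; the point is only that models can be swapped without touching the timer.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class Decoded:
    """Decoded form of one instruction as passed to the timer (hypothetical)."""
    n_ops: int
    mem_addr: Optional[int] = None  # address touched, if any


class ProcessorModel(Protocol):
    def issue_cycles(self, instr: Decoded) -> int: ...


class MemoryModel(Protocol):
    def access_cycles(self, addr: int) -> int: ...


class FixedWidthProcessor:
    """Coarse processor model: ceil(n_ops / width) cycles per instruction."""

    def __init__(self, width: int) -> None:
        self.width = width

    def issue_cycles(self, instr: Decoded) -> int:
        return -(-instr.n_ops // self.width)


class FlatMemory:
    """Coarse memory model: every access costs the same latency."""

    def __init__(self, latency: int) -> None:
        self.latency = latency

    def access_cycles(self, addr: int) -> int:
        return self.latency


class DirectMappedCache:
    """More detailed memory model: one direct-mapped cache in front of memory."""

    def __init__(self, lines: int, line_bytes: int, hit: int, miss: int) -> None:
        self.lines, self.line_bytes, self.hit, self.miss = lines, line_bytes, hit, miss
        self.tags: Dict[int, int] = {}

    def access_cycles(self, addr: int) -> int:
        block = addr // self.line_bytes
        index = block % self.lines
        if self.tags.get(index) == block:
            return self.hit
        self.tags[index] = block
        return self.miss


class Timer:
    """Generic timer: called once per emulated instruction."""

    def __init__(self, cpu: ProcessorModel, mem: MemoryModel) -> None:
        self.cpu, self.mem, self.cycles = cpu, mem, 0

    def tick(self, instr: Decoded, state: Optional[dict] = None) -> None:
        # 'state' stands for the machine-state image; unused in this sketch.
        self.cycles += self.cpu.issue_cycles(instr)
        if instr.mem_addr is not None:
            self.cycles += self.mem.access_cycles(instr.mem_addr)


trace = [Decoded(4, mem_addr=0), Decoded(2, mem_addr=8), Decoded(3)]

flat = Timer(FixedWidthProcessor(4), FlatMemory(latency=10))
cached = Timer(FixedWidthProcessor(4), DirectMappedCache(4, 32, hit=1, miss=20))
for instr in trace:
    flat.tick(instr)
    cached.tick(instr)
```

Both timers process the same instruction stream; only the memory model passed to the constructor differs.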

In practice, our timing environment has allowed us to dispense with the generation of traces and to measure the performance of realistic workloads. Our simulation executables without timer calls typically run only about 14 times slower than the optimized native RS/6000 code for the same program. With a timer that models a VLIW processor at the functional-unit level and a memory hierarchy consisting of two levels of cache and main memory, a full timing run is slower than the simulation executable alone by an additional factor of 75.
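
Assuming the two slowdown factors quoted above simply multiply (the abstract does not state the combined figure, and the symbols below are introduced only for this calculation), the overall cost of a full timing run relative to native execution works out to roughly three orders of magnitude:

```latex
\[
\frac{T_{\text{timed}}}{T_{\text{native}}}
  = \frac{T_{\text{sim}}}{T_{\text{native}}} \times \frac{T_{\text{timed}}}{T_{\text{sim}}}
  \approx 14 \times 75 = 1050
\]
```

Under that assumption, a program that runs for one second natively would take about 14 seconds in pure simulation and about 17.5 minutes with full timing.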
[ Presentation foils (pdf, 54 KB) ]



Tree-based VLIW architecture

J. Moreno, IBM T.J. Watson Research Center

This talk describes the features of the VLIW architecture currently under development at IBM T.J. Watson. The architecture, based on the concept of tree-instructions, includes properties that allow binary compatibility across an entire family of processor implementations, ranging from scalar to VLIW, thus recovering the traditional separation between architecture and implementation.
[ Presentation foils (pdf, 46 KB) ]


 
  Related Research 
DAISY
LaTTe: an open-source JIT compiler