IBM
Skip to main content
 
Search IBM Research
     Home  |  Products & services  |  Support & downloads  |  My account
Select a country
IBM Home
IBM Research
VLIW Home
The VLIW project
Basic Principles
A VLIW based on
tree instructions
Processor Prototype
· VLIW Code
· Prototype implementation
· Semicustom chip
· Prototype images
VLIW Compiler
Simulation Environment
DAISY dynamic translation
More information
Talks and
Presentations
Publications
and Patents
Selected Abstracts
mikeg@watson.ibm.com


VLIW at IBM Research 
  VLIW prototype: a code example 

The prototype is optimized to do multiway branching every cycle, without branch stalls. Here is an example: a C code fragment for finding the maximum of an array:

    for(i=0; i < N; i++) 
      if (a[i] > max) max=a[i]; 

At cycle n, the relevant register contents are (assuming i=0; N>0;):

mar0, r4=&a[i]; r8=&a[N-1]; r2=max=-INF; 

The VLIW code for the loop starts as follows

(LOOP ((LOAD mar$0 (r$1))      ;r1=load a[i] (*r4)
       (ADD r$4 4 (r$4 mar$0)) ;increment array pointer r4 (=&a[i+1])
                               ;also copy it to mar0
       (GOTO L1)))

Compare operations set the condition codes cc0, cc1 in cycle n+1 (notice that the second iteration is started before the first one is finished; this is called a software pipelining technique):

(L1 ((GT r$1 r$2 (cc$0))       ;cc0= (a[i] > max)
     (GT r$4 r$8 (cc$1))       ;cc1= (&a[i+1] > &a[N-1])
     (LOAD? mar$0 (r$3))       ;r3=load a[i+1] (*r4)  (2nd iteration)
                               ;cannot overwrite r1 (=a[i]) yet,
                               ;still need to update max
                               ;"?" means a speculative operation
     (ADD? r$4 4 (r$4 mar$0))  ;r4,mar0= &a[i+2] (2nd iteration)
     (GOTO L2)))

Then. multiway branching is done in cycle n+2 using these condition codes, and the correct target instruction ( L2 or exit) is fetched (further compares may set condition codes at the same time):

(L2 ((IF (NOT cc$0)                ;if not(a[i]>max)
         ((COPY? r$3 (r$1))        ;can overwrite r1 with a[i+1] now
         (IF (NOT cc$1)            ;if not(&a[i+1]>&a[N-1])
             ((GT r$3 r$2 (cc$0))  ;cc0= (a[i+1] > max) (2nd it.)
              (GT r$4 r$8 (cc$1))  ;cc1= (&a[i+2]>&a[N-1]) (2nd it.)
              (LOAD? mar$0 (r$3))  ;r3=load a[i+2] (*r4) (3rd it.)
              (ADD? r$4 4 (r$4 mar$0))  ;r4,mar0=&a[i+3] (3rd it.)
              (GOTO L2))
          ELSE                     ;(&a[i+1]>&a[N-1])
             ((GOTO exit)) ))      ;r2= maximum of array at exit
      ELSE                         ;if (a[i]>max)
         ((COPY r$1 (r$2))         ;update max= a[i]
          (COPY? r$3 (r$1))        ;can overwrite r1 with a[i+1] now
          (IF (NOT cc$1)           ;if not(&a[i+1]>&a[N-1])
              ((GT r$3 r$1 (cc$0)) ;cc0= (a[i+1] > a[i]) (2nd it.)
                                   ;GT uses OLD r$1
               (GT r$4 r$8 (cc$1)) ;cc1= (&a[i+2]>&a[N-1]) (2nd it.)
               (LOAD? mar$0 (r$3)) ;r2=load a[i+2]     (3rd it.)
               (ADD? r$4 4 (r$4 mar$0)) ;r4,mar0=&a[i+3] (3rd it.)
               (GOTO L2))
          ELSE               ;(&a[i+1]>&a[N-1])
               ((GOTO exit)) )) )))     ;r2= maximum of array at exit

Then, the branch target VLIW is executed in cycle n+3 (either L2 or exit). Notice that the loop is executed at a rate of one iteration per cycle, and 4 way branching is done every cycle, without incurring any branch stalls. Identical operations that occur in multiple places in the tree (such as (COPY? r$3 (r$1))) require only one ALU. So the VLIW L2 uses a total of 7 ALUs, since there are only 7 distinct operations.

 
  VLIW Prototype Image 
[VLIW prototype image]
  Related Research 
arrow DAISY
arrow LaTTe: an open-source JIT compiler
  More Information
arrow Talks and Presentations
arrow Publications and Patents
arrow Selected Abstracts

 
  About IBM  | Privacy  | Legal  | Contact