0018-8670/00/$5.00 (C) 2000 IBM Java programming for high-performance numerical computing by J. E. Moreira, S. P. Midkiff, M. Gupta, P. V. Artigas, M. Snir, R. D. Lawrence Reprint Order No. G321-5715. First proposed as a mechanism for enhancing Web content, the Java[TM] language has taken off as a serious general-purpose programming language. Industry and academia alike have expressed great interest in using the Java language as a programming language for scientific and engineering computations. Applications in these domains are characterized by intensive numerical computing and often have very high performance requirements. In this paper we discuss programming techniques that lead to Java numerical codes with performance comparable to FORTRAN or C, the more traditional languages for this field. The techniques are centered around the use of a high-performance numerical library, written entirely in the Java language, and on compiler technology. The numerical library takes the form of the Array package for Java. Proper use of this package, and of other appropriate tools for compiling and running a Java application, results in code that is clean, portable, and fast. We illustrate the programming and performance issues through case studies in data mining and electromagnetism. The Java** language is an attractive general-purpose programming language for many fundamental reasons: clean and simple object semantics, cross-platform portability, security, and an increasingly large pool of adept programmers. In both industry and academia, using the Java language as a programming language for scientific and engineering computations is of great interest. Applications in these domains are characterized by intensive numerical computing. The goal of this paper is to show that the major impediment to the use of Java as a vehicle for numerical computing-performance-is not intrinsic to the language and can be solved using techniques we describe. It is true that, when commercial Java environments are used, the performance of numerically intensive Java programs falls woefully short of the performance of FORTRAN programs. As shown later in the case study of an electromagnetics application, the performance of numerically intensive Java programs, developed and compiled without the benefit of the techniques described in this paper, can be as low as 1 percent of the performance of equivalent FORTRAN programs. Although anecdotal evidence suggests that performance degradations of up to 50 percent relative to FORTRAN 90 might be tolerated in numerically intensive Java programs in order to gain its other benefits, a 100-fold performance slowdown is unacceptable. The Java Grande Forum,[1] reflecting in part results from our own research, has identified five critical Java language and Java virtual machine issues related to the applicability of the Java language to solving large computational problems in science and engineering. Unless these issues are resolved, it is unlikely that the Java language will be successful for numerical computing. The five critical issues are: 1. Multidimensional arrays-True rectangular multidimensional arrays are the most important data structures for scientific and engineering computing. A large fraction of this paper is dedicated to explaining the problem with the existing Java approach to multidimensional arrays. We also describe our solution to the problem, through the design of a package implemented entirely in the Java language for multidimensional arrays, which we call the Array package for Java. 2. Complex arithmetic-Complex numbers are an essential tool in many areas of science and engineering. Computations with complex numbers need to be supported as efficiently as computations with the primitive real number types, float and double. The issue of high-performance computing with complex numbers is directly tied to the next issue. 3. Lightweight classes-The excessive overhead associated with manipulation of objects in the Java language makes it difficult to efficiently support alternative arithmetic systems, such as complex numbers, interval arithmetic, and decimal arithmetic. The ability to manipulate certain objects as having just value semantics is absolutely fundamental to achieve high performance with these alternative arithmetic systems. In this paper, we describe how we were able to treat complex numbers in Java as values, thus achieving FORTRAN-like performance, without making any changes to the language. The same approach can be extended to other numerical types. 4. Use of floating-point hardware-Achieving the highest-possible level of performance on numerical codes typically requires exploiting unique floating-point features in each processor. This exploitation is often at odds with the Java goal of exact reproducibility of results in every platform. In this paper we specifically consider the performance impact of utilizing the POWER and PowerPC* fused multiply-add (fma) instruction in Java. 5. Operator overloading-If multidimensional arrays and complex numbers (and other arithmetic systems) are to be implemented in Java as a set of standard packages, then operator overloading is necessary to make the use of these packages more attractive to the application programmer. Although operator overloading is a feature primarily related to code usability and readability, it does have performance implications. We discuss these implications in the context of complex numbers. This paper discusses our efforts and lists our accomplishments in addressing the performance problems associated with using the Java language for numerical computing. We focus mostly on multidimensional arrays and complex numbers, but our discussion touches all of the previously mentioned five issues. We show that, when our techniques are applied, Java code can achieve between 55 and 90 percent of the performance of highly optimized FORTRAN code in a variety of engineering and scientific benchmarks. We illustrate the details of these performance results through detailed case studies in data mining and electromagnetism. We also provide a summary of results for other benchmarks. The compiler optimizations we discuss have been implemented in our research prototype version of the IBM High-Performance Compiler for Java (HPCJ).[2] The remainder of this paper is organized as follows. The next section is an overview of the problems associated with the current Java approaches to multidimensional arrays and nonprimitive numerical types. The third section describes the details of the Array package for Java, which supports FORTRAN-like performance for multidimensional array computations in Java code. The subsequent section is a case study of a data mining computation illustrating the performance benefits that result from using the Array package. The fifth section is a case study of an electromagnetics application that makes heavy use of complex numbers. The sixth section summarizes performance results from a larger set of numerically intensive benchmarks, showing that Java implementations can be performance-competitive with the best FORTRAN implementations. The seventh section discusses some related work, and finally, the last section presents our conclusions, discusses the impact of our work, and elaborates on some future research. The Array package is freely available from alphaWorks*, at http://www.alphaworks.ibm.com/tech/ninja. Our project Web page, with additional information on the Array package and our research in general, is located at http://www.research.ibm.com/ninja. Java approach to multidimensional arrays and complex numbers In this section, we explain the performance problems associated with the existing Java approaches to implementing multidimensional arrays and alternative arithmetic systems, particularly complex numbers. We also discuss why the corresponding FORTRAN 90 approaches are so much more efficient. We give a taste of our solutions to Java performance problems, which are discussed in more detail in later sections. We show how FORTRAN-like performance can be achieved with 100 percent Java code while remaining within the philosophy and spirit of the language. Java arrays. Although the specification of Java arrays does not mandate their exact organization, it places sufficient constraints so that all implementations we know of appear as shown in Figure 1. The figure shows how two-dimensional arrays would be laid out, and accessed, in a Java implementation. The Java language does not directly support arrays of rank greater than one. Thus, a two-dimensional array is represented as an array-of-arrays: an array whose elements are, in turn, references to one-dimensional row arrays. Each array is represented as an array object descriptor followed by a chunk of storage that contains the elements of the array. The array object descriptor contains the fields that are present in all objects (lock bits, bits used by the garbage collector, type information, etc.), as well as the upper bound of the array. If a two-dimensional array of type T is being represented, the elements of each row array are either of type T, if T is a Java primitive type, or references to objects of type T. In Figure 1, each element is a Java primitive of type double. Although the layout of the elements of the array are not specified, they are typically contained in a contiguous chunk of storage. The advantages of this layout for an array are: very general structures can be created, and the necessary information to perform bounds checking is always available. As seen in Figure 1, it is not required that every row have the same length. Figure 1 is an example of a ragged array. It is not even necessary for every row to be unique-rows 2 and n - 2 of array A point to the same row array object, and thus A[2][4] and A[n - 2][4] name exactly the same element. Even more generality is possible with arrays of Objects. In that case, each element can be an arbitrary Java object, even another array of any rank and type. All this generality comes with a performance price-a price that is too high for numerical programs that do not need the supplied generality. First, consider array bounds checking. The Java language specification requires that any access to an array element that is not in bounds must throw an exception. The job of checking for out-of-bounds exceptions exacts a significant toll on Java performance. In Reference 3 it is shown that eliminating array bounds checks alone produced speedups on the IBM POWER and PowerPC processors three to four times over keeping the checks. On some processors, this overhead is less. The real impact on performance, however, comes from having one or more potential exceptions for every data access. Because Java exceptions are precise, operations cannot be moved past a potential exception. This limitation effectively precludes almost all optimizations traditionally applied to numerical programs[4,5]-optimizations that are necessary for good performance. In Reference 3, techniques for creating program regions free of array bounds exceptions are presented. Even with these techniques, the Java array model interferes with good optimization. Optimizing bounds checks for an array reference A[i][j] inside a loop structure requires: (1) determining the range of values that i and j can take during execution of the loop and (2) being able to test, before starting loop execution, that those ranges will be within the legal bounds of the array. Since a Java two-dimensional array can have rows of different lengths, as shown in Figure 1, the test has to be performed for all rows of the array. A much simpler test could be used if the array were guaranteed to be rectangular (i.e., all rows of the same length). Another major impediment to optimization in Java is the ability to alias rows of an array. Aliasing occurs when two or more apparently different variables or references actually refer to the same datum. An example of aliasing is shown in Figure 1, where A[2] and A[n - 2] refer to the same row. Moreover, B[1], B[2], A[2] and A[n - 2] all refer to the same row. The problem with aliasing is that many optimizations rely on moving data reads and writes relative to one another. For this rearrangement to be legal, it is necessary that (1) all writes to a datum be kept in order; (2) that no write to a datum be moved prior to a read of the same datum; and (3) that no read from a datum be moved after a write to the same datum. Essential to determining the legality of a transformation is the ability to determine when two operations are to different data and, therefore, independent. Aliasing makes this more difficult because it is no longer possible to say that just because two variables have different names, or because different array elements have different coordinates, they must refer to different memory locations. The final problem with Java arrays that we discuss is the cost of accessing an element. Accessing element A[2][4] requires the following steps: (1) get the reference to the array object A; (2) determine that A is not a null pointer; (3) determine that "2" is an in-bounds index for A; (4) get the reference to the array object that is row 2; (5) determine that this object reference is not a null pointer; (6) determine that "4" is an in-bounds index for A[2]; and (7) get the data element A[2][4]. In contrast, as we will see, FORTRAN 90 requires only two steps. FORTRAN 90 arrays. The structure of a FORTRAN 90 two-dimensional array is shown in Figure 2. It has two major components: a block of dense, contiguous storage that holds the data elements of the array, and a descriptor that contains the address of the block of dense storage, bounds information for each dimension, the size of individual data elements, and other information used to index the array. This indexing information is used to access array elements and implement array sections. Accessing an element of the array is straightforward: the base address of the storage is extracted from the descriptor, and the subscript, along with the indexing information contained in the descriptor, is used to compute an offset into the data storage. Because of the information contained in the descriptor, array sections (see array B in Figure 2, which corresponds to the shaded elements of A) can be implemented efficiently. A section of a FORTRAN 90 array A is a view of A that accesses some subset of the elements of A. The section has the following properties: (1) it (logically) utilizes the same storage as the original array, thus a change of an element value contained in A and the section is reflected in both; (2) the section has rank less than or equal to the rank of A; (3) the elements in the section are defined by enumerating element indices along each axis; and (4) each element enumerated must be a valid, in-bounds element of the original array axis. Let A be an array of shape 8 X 8. Then B [implies] A(2 : 8 : 2, :) associates B with rows 2, 4, 6, 8, and all columns of the original array A. The reference B(3, 4) accesses row 6 and column 4 of the original array A. This example is illustrated in Figure 2. The design of FORTRAN 90 arrays has several advantages. First, because the storage is dense and contiguous, access requires a single pointer dereference and offset computation, regardless of the rank or dimensionality of the array being accessed. Second, the ability to take sections allows great flexibility in passing portions of arrays to library functions. Third, a rich set of intrinsic functions (i.e., built-in operators expressed syntactically as functions) that operate on arrays can be supported. Fourth, most primitive operations in the language (e.g., +, -, *, and /) are overloaded and specify the operation applied element-wise over the entire array. Finally, despite its support of array sections, aliasing disambiguation with FORTRAN 90 arrays is much simpler than with Java arrays. Given two two-dimensional FORTRAN 90 array sections A(l[sub]A[/sub][sup]1[/sup] : u[sub]A[/sub][sup]1[/sup] : s[sub]A[/sub][sup]1[/sup], l[sub]A[/sub][sup]2[/sup] : u[sub]A[/sub][sup]2[/sup] : s[sub]A[/sub][sup]2[/sup]) and B(l[sub]B[/sub][sup]1[/sup] : u[sub]B[/sub][sup]1[/sup] : s[sub]B[/sub][sup]1[/sup], l[sub]B[/sub][sup]2[/sup] : u[sub]B[/sub][sup]2[/sup] : s[sub]B[/sub][sup]2[/sup]), alias disambiguation is a matter of showing that A [neq] B or that the intersection of the ranges is empty: (l[sub]A[/sub][sup]1[/sup] : u[sub]A[/sub][sup]1[/sup] : s[sub]A[/sub][sup]1[/sup]) [intersect] (l[sub]B[/sub][sup]1[/sup] : u[sub]B[/sub][sup]1[/sup] : s[sub]B[/sub][sup]1[/sup]) = [sub][empty set][/sub] or (l[sub]A[/sub][sup]2[/sup] : u[sub]A[/sub][sup]2[/sup] : s[sub]A[/sub][sup]2[/sup]) [intersect] (l[sub]B[/sub][sup]2[/sup] : u[sub]B[/sub][sup]2[/sup] : s[sub]B[/sub][sup]2[/sup]) = [sub][empty set][/sub]. Even if disambiguation cannot be done at compile time, the run-time test is trivial. In comparison, the disambiguation of two two-dimensional Java arrays, with their unrestricted pointers as shown in Figure 1, is almost always impossible at compile time (without extensive interprocedural analysis) and very expensive at run time. FORTRAN 90 arrays are not, however, without their faults. Storage association exposes the programmer to the fact that data elements are contiguous in memory, and that the next logical element of the array is also adjacent in storage to the previous element. Quite frequently, FORTRAN programs rely on storage association to execute correctly. In addition, optimizations that alter the layout of an array[6-9] are globally visible. As a consequence, data typically have to be copied to and from the original layout. Copying makes these optimizations more fragile and less useful, since the overhead of copying the arrays twice must be factored into the cost model for determining whether the optimization should be performed. A solution for true multidimensional arrays in Java. To overcome the performance deficiencies inherent to Java arrays, we developed the Array package for Java. The Array package is a class library that implements true rectangular multidimensional arrays that combine (1) the efficiency of FORTRAN 90 arrays, (2) the flexibility of data layout of Java arrays, and (3) the safety and security features provided by the Java requirement for bounds checking. With the Array package, multidimensional arrays are created using standard Java constructs. For example, the code doubleArray2D A = new doubleArray2D(m,n); doubleArray2D B = new doubleArray2D(m,n) declares two two-dimensional arrays A and B of (rectangular) shape m X n. Elements of these arrays can be accessed using set and get methods from the class doubleArray2D. For example, A.set(j,i,B.get(i,j)) copies element (i, j) of array B into element (j, i) of array A. A more detailed description of the Array package appears in the next section. For now, suffice it to say that the Array package has FORTRAN-like semantics to facilitate compiler optimizations. Complex numbers in the Java and FORTRAN 90 languages. In the FORTRAN programming language, complex numbers are supported as primitive types. Arithmetic operations with complex numbers can be coded as easily as with real numbers. Multidimensional arrays of complex numbers are directly supported in FORTRAN. Java, in contrast, does not support primitive complex types. The typical approach for implementing complex numbers and complex arithmetic in Java is through the definition of a Complex class. Some components of such a class, which is being proposed for standardization by the Java Grande Forum,[1] are shown in Figure 3. Objects of this class are used to represent complex numbers. We note that a Complex object typically requires 32 bytes for representation: a 16-byte object descriptor (present in all Java objects) and two 8-byte fields for the real and imaginary parts. The operation a X b + c, where a, b, and c are complex numbers, can be coded in Java as a.times(b).plus(c) where a, b, and c are references to objects of class Complex. Note that the evaluation of a.times(b) creates a new Complex object to hold this intermediate result. Method plus is then invoked on this intermediate result, creating yet another object with the result of a X b + c. The approach of representing complex numbers as objects of class Complex, and the associated creation of objects to represent the output of computations, results in a very high cost for using complex numbers in Java. As seen in the MICROSTRIP benchmark results (in the case study on electromagnetics, where a detailed analysis of this benchmark is given), applications using a Java Complex class (to represent complex numbers) can be 100 times slower than the equivalent applications in FORTRAN 90. These performance problems can be addressed by using the technique of semantic expansion,[10] as implemented in our research prototype HPCJ. Semantic expansion treats standard classes as language primitives. Unlike traditional procedure or method "inlining," which (intuitively) treats the inlined method as a macro, semantic expansion uses the compiler's built-in knowledge of the class semantics to directly generate efficient, legal code. With semantic expansion, Complex objects and operations on those objects are recognized by the compiler and treated essentially as language primitives (largely as FORTRAN 90 would treat them). The arithmetic methods of class Complex are treated by semantic expansion as operating and returning complex values (pairs of [real, imaginary] doubles). If the consumers of theses values are other arithmetic methods of Complex, they are semantically expanded to directly use the complex values. That is, only the resulting value of an operation is propagated from one step to the next. Otherwise, the value is converted to a Complex object, with all the properties of a regular Java object. By treating complex numbers as values whenever possible, temporary Complex objects do not have to be created for arithmetic operations, thus reducing the allocation and garbage collection overhead. This approach delivers the same performance benefit for complex numbers as the proposed value object extension for Java,[1] without requiring language changes. We show in the section on the electromagnetics case study and in the subsequent section that semantic expansion can sometimes improve the performance of Java applications using complex arithmetic to approximately 80 percent of the performance of corresponding FORTRAN 90 programs. The problems we described earlier associated with Java multidimensional arrays are exacerbated with arrays of complex numbers. A Java two-dimensional array of Complex, declared as Complex[][] C, is just a collection of pointers to Complex objects. Optimizing null-pointer checks requires, in general, an inspection of the entire array. In addition to the indiscriminate row-level aliasing shown in Figure 1, Java arrays of Complex objects are subject to indiscriminate element-level aliasing, making the issue of alias disambiguation even harder. We address the problems related to Java arrays of complex numbers by including in the Array package classes that represent true rectangular arrays of complex numbers (e.g., ComplexArray2D). These arrays always have a complex number associated with each element, and there is no aliasing between two different elements. The role of operator overloading. The Array package, the Complex class, semantic expansion, and other compiler techniques we developed can successfully achieve high levels of performance with numerical computing in Java. However, we are still left with the difficulty of writing, maintaining, and reading programs that make extensive use of classes to represent numerical and array types. Some form (maybe limited) of operator overloading is necessary to allow application programmers to write their codes in a clear and intuitive notation. At a minimum, it is necessary to support operator overloading of the basic arithmetic (+, -, *, /, %) and relational (= =, ! =, < =, <, >, > =) operators, as well as of the array indexing operator ([ ]). As an example, consider the evaluation of b[i, j] = a[i + 1, j] + a[i - 1, j], where b and a are two-dimensional arrays of complex numbers (implemented as ComplexArray2D). Ideally, we would like to code this operation as b[i,j] = a[i+1,j] + a[i-1,j] (1) which is an operator-overloaded representation for b.set(i,j,a.get(i+1,j).plus(a.get(i-1,j))) (2) The set and get methods correspond to the overloaded array indexing operator ([ ]) and the plus method corresponds to the overloaded addition operator (+). Operator overloading is fundamental for productivity and clarity of numerical codes using nonprimitive arithmetic systems and array classes. (An alternative to defining operator overloading in the language is to make use of visual editors that present a more convenient interface to numerical programmers. Unfortunately, it binds the programmer to a particular development environment.) Operator overloading also has an impact on performance. Rules for the semantics of operator overloading must be simple, and code written with overloaded operators (e.g., code fragment 1) is typically equivalent to code that creates temporary objects (e.g., code fragment 2). In other words, object reuse in code with overloaded operators is difficult. Thus, the kind of optimizations through semantic expansion that we describe for complex numbers is even more important. The Array package for Java The goal of the Array package for Java is to overcome the limitations inherent in Java arrays in terms of performance and ease of use, as compared to array implementations in other languages. A major design principle of the Array package is to define a set of classes implementing FORTRAN 90-like multidimensional arrays that are highly optimizable, amenable to compiler analysis techniques, and that contain the functionality necessary for numerically intensive computing. We have modeled the Java Array package along features of the FORTRAN 90 language[11] and the A++/P++ libraries.[12,13] Not surprisingly, the reader will also find similarities with APL, since that programming language undoubtedly influenced the design of FORTRAN 90. However, APL is a much more dynamic language, with strong support for polymorphism of data and functions. The design of the Java Array package, particularly as described in this paper, follows the more static nature of FORTRAN 90. This design allows us to leverage mature compiler technologies that have been developed for FORTRAN over the course of many years. It is important to emphasize that the Array package is written entirely in the Java language. Its use does not require any code preprocessing steps, just the appropriate import statements. A word about notation: From now on, when we use the term Array (with an uppercase A) we refer to the constructs provided by the Array package. The standard arrays in the Java language will be referred to as Java arrays (with a lowercase a). The functionality provided by the Array package is implemented by a set of basic methods and operations that are highly optimizable. The class hierarchy inside the Array package has been defined to enable aggressive compiler optimization. As an example, we make extensive use of final classes, since most Java compilers (and, in particular, javac) are more easily able to apply method inlining to methods of final classes. This approach also statically binds the semantics of methods, thus facilitating semantic expansion. In the most commonly used methods of the Array package (i.e., simple element-wise operations) the cost of a method call dominates the cost of the computation done by the method. Therefore, inlining those methods is essential for good performance. In contrast to Java arrays discussed in the previous section, Arrays defined by the Array package have properties (e.g., a nonragged rectangular shape and constant bounds over their lifetime) that are easy to detect and use by an optimizing compiler. Thus, for problems that map well to rectangular multidimensional structures, the Array package provides both functionality and performance and is the obvious choice over Java arrays. Arrays are n-dimensional rectangular collections of elements. An Array is characterized by its rank (number of dimensions or axes), its elemental data type (all elements of an array are of the same type), and its shape (the extents along its axes). Rank, type, and shape are immutable properties of an Array. All Array classes are named using the following scheme: ArrayD. As an example, floatArray3D is a three-dimensional Array of single precision floating-point numbers. In the current version of the Array package, type can be any of the Java primitive types (boolean,byte,char,short,int,long,float,double), Complex (for complex numbers), or Object for general objects. The currently supported values for rank are 0, 1, 2, and 3. Class hierarchy. All Array classes are derived from a base abstract class Array, which defines the common methods for all types of Arrays. Arrays are further subdivided according to their (elemental) type and rank. We also define the Index and Range helper classes for indexing and sectioning an Array (see the later subsection on indexing for more details). Various exception classes are also provided by the Array package for reporting errors that occur during array operations. Figure 4 shows the class hierarchy of the Array package. We note that the fully qualified name of the Array package for Java, in accordance with Java standards, is com.ibm.math.array. Use of this fully qualified name allows us to build knowledge about the Array package into our compiler. Transactional behavior, array semantics, and exceptions. All operations (methods and constructors) of the Array package have, in the absence of Java virtual machine (Jvm) failure, transactional behavior. That is, only two outcomes of an array operation are possible: (1) the operation completes without any error, or (2) the operation is aborted, in which case an exception is thrown before any user-visible data are changed. If a Jvm failure occurs (e.g., it runs out of stack space) during the execution of an Array package operation, the program enters an unrecoverable error state, and the results of the partial operation are observable only by postmortem tools. Therefore, the fact that the transactional behavior is violated in this particular case is irrelevant. In order to clearly identify what was the cause for aborting an Array package operation, exceptions are thrown. These exceptions have descriptive names that make the cause of the exception clear to the user. For example, a NonconformingArrayException is thrown if the user attempts to add two Arrays of different shapes, indicating that the add operation cannot be performed because the Arrays do not match. Other exceptions are thrown in other scenarios. The number of methods that might throw a particular exception is very large, so we do not cover all cases extensively in this paper. All exception classes are derived from the RuntimeException class. In Java, exceptions derived from the RuntimeException class are not required to be caught by user code. Hence, it is not an error for a piece of code to invoke an Array package method but not declare (using a throws clause) or catch an exception that may be thrown by that method. In current numerical algorithms, conditions that would raise an exception typically result from programming bugs. The expectation is that the programmer will fix the bug rather than add exception-handling code to handle it at run time. Nevertheless, if the programmer would like to catch the exception, it can still be done, and so no limitations have been placed on exception handling. Array operations also implement array semantics: In evaluating an expression (e.g., A = A * B + C), all data are first fetched (i.e., A, B, and C are read from memory), the computation is performed (i.e., A * B + C is evaluated), and only then are the results stored (i.e., A is modified). Of course, a high-performance system should implement array semantics while optimizing to avoid unnecessary copying. (See the subsection on internal optimizations for more details.) Array constructors. The shape of an array is specified at its creation through a constructor, and it is immutable. The Array package defines three constructors for each Array class, illustrated in Figure 5. The first constructor shown builds a new Array whose shape is specified by the parameters. The second constructor shown builds a new Array that is a copy of the Array input parameter. The new Array has its own storage and shares no data with the input Array. Finally, the third constructor shown builds a new Array with elements copied from a conventional Java array. If the Java array is multidimensional, it must be rectangular. Again, the new Array has its own storage area. Indexing elements of an array. Elements of an array are identified by their indices (coordinates) along each axis. Let a k-dimensional array A of elemental type T have extent n[sub]j[/sub] along its jth axis, j = 0, ..., k - 1. Then, a valid index i[sub]j[/sub] along the jth axis must be greater than or equal to zero and less than n[sub]j[/sub]. An attempt to reference an element A[i[sub]0[/sub], i[sub]1[/sub], ..., i[sub]k-1[/sub]] with any invalid index i[sub]j[/sub] causes an ArrayIndexOutOfBoundsException to be thrown. The Array package allows an axis of an Array to be indexed by either an integer, a Range object or an Index object. Indexing with an integer specifies a single element along an axis. Indexing with a Range object is like addressing using triplets in FORTRAN: a first element, a last element, and an optional stride (which defaults to one) specify a regular pattern of elements. An Index object is a list of numbers that is used by the Array package method to select the enumerated elements along the given axis. Index objects are convenient for scatter-gather operations and for operations on sparse data in general. When indexing with an Index or Range object, all the indices must be valid (i.e., within bounds), or an ArrayIndexOutOfBoundsException is thrown. Indexing operations are used in accessor methods that read or write data from or to an array, and in sectioning methods that create views of an array. Accessor methods support all possible combinations of integer, Range, and Index indices. That is, the subscript along any given dimension can be either an integer, a Range object, or an Index object. Sectioning methods only support combinations of integer and Range indices. Supporting sections defined with Index objects would require a complicated and slow descriptor for the Array and impair efficient access to Array elements in the common cases. Array accessors can be implemented efficiently even with Index objects. Defined methods. The Array package defines four groups of methods that are always present in any Array package class: Array operation, Array manipulation, Array accessor, and Array inquiry. A group of methods is just a collection of methods with similar functionality. The complexity of operations is, in most cases, similar for methods of a group. Array operations. Array operations are scalar operations applied element-wise to a whole Array. The methods in this category include assignment, arithmetic and arithmetic-assign operations (analogous, for example, to += and *= operators in Java), relational operations, and logic operations (where appropriate for the elemental data type). These operations are further subdivided as Scalar-Array and Array-Array operations. Scalar-Array operations apply the operation to a scalar and each element of the Array. Array-Array operations apply the operation to all equivalent elements of two Arrays. If both Arrays in an Array-Array operation do not have the same shape, a NonconformingArrayException is thrown. These operations are highly optimizable and parallel in nature. In Figure 6 we compare some elementary Array operations described using a math notation and the notation defined by the Array package. When there is more than one Array package version for the expression, the first expression is the most straightforward and intuitive form. The alternative forms for an operation lead to better performance, as they avoid creating (at least some) intermediate objects. (The forms are semantically equivalent, if the whole operation is legal.) It is unrealistic to expect users of the Array package to write all Array expressions in the less intuitive but more efficient form. An intelligent compiler should be able to automatically translate the Array expressions to the best-performing version. Array manipulation. Methods in this group include section, permuteAxes, and reshape. These methods operate on the array as a whole, creating a new view of the data and returning a new Array object expressing this view. A new Array object is needed because an Array object always has constant properties, and the new Array view may have different basic properties than the original array (e.g., a different rank or shape). This new Array object shares its data with the old object when possible to avoid the overhead of data copying. Sectioning an Array means obtaining a new view to some elements of a given Array. A section is a new Array object that shares its data with the original Array, as a FORTRAN 90 section shares its data with the base array. Since data are shared, only a descriptor needs to be created, resulting in a fast sectioning operation. Figure 7 illustrates how a sectioning operation works. For two-dimensional Arrays, the first index applies to rows, and the second index applies to columns. This example creates a section B corresponding to rows 1 and 2 and columns 1 and 3 of A. Because sections are first-class Array objects, operating on an Array section is indistinguishable from operating on any other Array object. The permuteAxes method implements generic axis interchange. Although permuteAxes(1,0) returns a transposed two-dimensional array, permuteAxes is more general than a simple transpose because any axis can be replaced by any other axis, as long as all axes that exist in the original Array also exist in the permuted Array. If this condition is not met, the method call is illegal and an exception is thrown. For example, B = A.permuteAxes(2,1,0) exchanges axes 0 and 2 of a three-dimensional Array A. That is, B(i,j,k) = A(k,j,i). The expression B = A.permuteAxes(1,1,0) is illegal, and throws a run-time InvalidArrayAxisException, since axis 1 appears twice and axis 2 is missing. Permuting the axis of a given array is a fast operation that only creates a new descriptor. No data copy is needed as the data are shared with the original array. If the user wants data copy to take place, the copy constructor always guarantees that the data are actually copied, as in the example in Figure 8. The reshape method returns a new Array with the same data as the old Array but with a different shape. Every element in the old Array is mapped to an element of the new Array. Therefore, the new Array has the same number of elements as the old Array. The algorithm used to map Array elements is the following. The data are first (logically) linearized by visiting all axes in order and all elements within an axis in index order. The logically linearized elements are then copied into a new Array with the shape specified in the reshape call, again by visiting all axes and all elements within an axis in order. Figure 9 gives examples of reshape method calls. The reshape method throws an InvalidArrayShapeException if the specified shape does not have the right number of elements. Because Array objects have immutable shape, reshaping an Array implies creating a new Array object. A copy of data is needed because our descriptor, which is designed to be fast for common operations, is not rich enough to express the effect of a reshape. We choose not to have a complex descriptor that supports fast reshaping for two reasons: First, the complex descriptor would slow down most other methods. Second, a common purpose of an Array reshape operation is to reorganize the data in order to obtain faster access, and so data copying is a desirable consequence of the reshape operation. Array accessor methods. The methods of this group are get, set, and toJava. These are methods that allow the user to extract and change the array data. Figure 10 contains examples of uses of the methods in this group. The get and set methods are straightforward. The programmer is provided indexing methods (described earlier) to select one or more elements. A get returns the specified elements, and a set updates the specified elements. The method toJava allows a user to extract a Java array from an Array package Array. This feature, together with constructors that convert Java arrays into Array package Arrays, allows pieces of code that work with Java arrays to coexist with (i.e., exchange data with) pieces of code that use the Array package. Because the basic purpose of this group of methods is to move data in and out of Arrays, data copy is always performed. The common operations are fast-for example, extracting a single element from an array. Extracting an entire subarray (e.g., a row of a two-dimensional array) using a single Array class operation is typically faster than extracting one element at a time. Array inquiry methods. This group contains methods last, size, rank, and shape. They are fast, descriptor-only operations that return information about the constant properties of the Array. The method last (used as an example in Figure 10) returns the last valid index along a given axis. The method size, when invoked without arguments, returns the number of elements in the Array; if invoked with an int argument, it returns the number of elements along this given axis. The method rank returns the rank of the Array (i.e., the number of axes or dimensionality of the Array). Finally, the shape method returns a Java array of integers whose elements are the sizes of the Array along each individual axis. Comparing array operations in FORTRAN 90 and with the Array package for Java. As previously mentioned, we have patterned our design of the Array package along the lines of the FORTRAN 90 language. Figure 11 provides a side-by-side comparison of equivalent FORTRAN 90 and Java Array package constructs. Readers familiar with FORTRAN 90 will notice an almost one-to-one correspondence between the two approaches. This correspondence has both performance as well as usability implications. From a performance perspective, optimization techniques developed for FORTRAN 90 can be applied to the Array package. From a usability perspective, programmers used to FORTRAN 90 features will find corresponding functionality in the Array package. Internal optimizations. The semantics of Array operations imply that they are performed on Arrays as they would be performed on scalars. That is, the result is computed before assignment into the target array. Therefore, execution of an Array operation logically follows the sequence: (1) all operands are fetched from memory, (2) all computation is performed, and (3) the result is stored to memory. The ordering between steps (2) and (3) implies that, in general, the result of the operation must be first stored in a temporary Array and then copied to the final target. To provide good performance, the supplied Array operations avoid data copying-a costly operation-as much as possible. For example, consider the code in Figure 12. A naive implementation would perform the plusAssign method in two steps: First, a new Array object is created to store the sum of the two Arrays. Then, the contents of the new object are copied back to the first Array. The unnecessary copying degrades performance. The result of an Array operation can be written directly to the target Array (thus eliminating the need for the temporary Array and the copy operation) if doing so does not overwrite any memory location before its use as an operand. The Array package uses a form of dynamic dependence analysis[14] to determine when this condition holds. It then bypasses the unnecessary temporary Array and copy. BLAS routines. The Basic Linear Algebra Subprograms (BLASs)[15] are the building blocks of efficient linear algebra codes. BLAS implements a variety of elementary vector-vector (level 1), matrix-vector (level 2), and matrix-matrix (level 3) operations. We have designed a Blas class as part of the Array package. It provides the functionality of the BLAS operations for the multidimensional Arrays in that package. Note that this is a 100 percent Java implementation of BLAS and not an interface to already existing native libraries. Therefore, we have a completely portable parallel implementation of BLAS that works well with the Array package. The code in the Blas class is optimized to take advantage of the internals of the Array package implementation. Application code making use of these methods will have access to tuned BLAS routines with no programming effort. Figure 13 illustrates several features of the Array package by comparing two straightforward implementations of the basic BLAS operation dgemm. The dgemm operation computes C = [alpha]A* X B* + [beta]C, where A, B, and C are matrices and [alpha] and [beta] are scalars. A* can be either A or A[sup]T[/sup]. The same holds for B*. In Figure 13A the matrices are represented as doubleArray2D objects from the Array package. In Figure 13B the matrices are represented as double[][]. For simplicity, in both cases we omit the test for aliasing between C and A or B. Nevertheless, we want to emphasize that such a test is much simpler and cheaper for the Array package version. The first difference is apparent in the interfaces of the two versions. The dgemm version for the Array package transparently handles operations on sections of a matrix. Sections are extracted by the caller and passed on to dgemm as doubleArray2D objects. Section descriptors have to be explicitly passed to the Java arrays version, using the m, n, p, i0, j0, and k0 parameters. The calling format for computing C(0 : 9, 10 : 19) = A(0 : 9, 20 : 39) X B(20 : 39, 10 : 19) using the Array package is dgemm(NoTranspose,NoTranspose,1.0, A.section(new Range(0,9), new Range(20,39)), B.section(new Range(20,39), new Range(10,19)),0.0, C.section(new Range(0,9),new Range(10,19))) and using Java arrays is dgemm(NoTranspose,NoTranspose,10,20,10,0,10,20,1.0,A,B,0.0,C) Next, we note that the computational code in the Array package version is independent of the orientation of A or B. We just perform a (very cheap) transposition, if necessary, by creating a new descriptor for the data using the permuteAxes method. In comparison, the code in the Java arrays version has to be specialized for each combination of the orientation of A and B. (We only show the two cases in which A is not transposed.) Finally, in the Array package version, we can easily perform some shape consistency verifications before entering the computational loop. If we pass that verification, we know we will execute the entire loop iteration space without exceptions. This verification guarantees the transactional behavior of the method. Such verifications would be more expensive for the Java arrays, because we would have to traverse each row of the array to make sure they are all of the appropriate length. Furthermore, at least the row-pointer part of each array would have to be privatized inside the method to guarantee that no other thread changes the shape of the array. This example illustrates some of the benefits, both for the programmer and the compiler, of operating with the true multidimensional rectangular Arrays of the Array package. We summarize the benefits the Array package brings to compiler optimizations in the subsection that follows the next one. Arrays of complex numbers. Figure 14 shows some components of class ComplexArray2D that implements two-dimensional Arrays of complex numbers. Inside the Array, values are stored in a packed form, with (real, imaginary) pairs in adjacent locations of the data array. This is similar to the FORTRAN style for representing arrays of complex numbers and only requires 16 bytes of storage per complex value. The declaration ComplexArray2D a = new ComplexArray2D(m,n); creates and initializes to zero an m X n Array of complex values, not complex objects. The equivalent structure with Java arrays would be created with Complex[ ][ ] b = new Complex[m][n]; for (i=0; i