IBM Journal of Research and Development

Blue Gene   Volume 49, Number 2/3, 2005

Design and exploitation of a high-performance SIMD floating-point unit for Blue Gene/L - References

by S. Chatterjee, L. R. Bachega, P. Bergner, K. A. Dockser, J. A. Gunnels, M. Gupta, F. G. Gustavson, C. A. Lapkowski, G. K. Liu, M. Mendell, R. Nair, C. D. Wait, T. J. C. Ward, and P. Wu

References

  1. G. Almási, D. Beece, R. Bellofatto, G. Bhanot, R. Bickford, M. Blumrich, A. A. Bright, J. Brunheroto, C. Cascaval, J. Castaños, L. Ceze, P. Coteus, S. Chatterjee, D. Chen, G. Chiu, T. M. Cipolla, P. Crumley, A. Deutsch, M. B. Dombrowa, W. Donath, M. Eleftheriou, B. Fitch, J. Gagliano, A. Gara, R. Germain, M. E. Giampapa, M. Gupta, F. Gustavson, S. Hall, R. A. Haring, D. Heidel, P. Heidelberger, L. M. Herger, D. Hoenicke, R. D. Jackson, T. Jamal-Eddine, G. V. Kopcsay, A. P. Lanzetta, D. Lieber, M. Lu, M. Mendell, L. Mok, J. Moreira, B. J. Nathanson, M. Newton, M. Ohmacht, R. Rand, R. Regan, R. Sahoo, A. Sanomiya, E. Schenfeld, S. Singh, P. Song, B. D. Steinmacher-Burow, K. Strauss, R. Swetz, T. Takken, P. Vranas, and T. J. C. Ward, “Cellular Supercomputing with System-on-a-Chip,” Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC’02), 2002, pp. 196–197.
  2. K. Dockser, “‘Honey, I Shrunk the Supercomputer’—The PowerPC 440 FPU Brings Supercomputing to IBM’s Blue Logic Library,” IBM MicroNews 7, No. 4, 29–31 (November 2001).
  3. IBM Corporation, PowerPC 440 Embedded Processor Core Users Manual, Document No. SA14-2523-01, June 2001.
  4. S. Larsen and S. P. Amarasinghe, “Exploiting Superword Level Parallelism with Multimedia Instruction Sets,” Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2000, pp. 145–156.
  5. L. Bachega, S. Chatterjee, K. A. Dockser, J. A. Gunnels, M. Gupta, F. G. Gustavson, C. A. Lapkowski, G. K. Liu, M. P. Mendell, C. D. Wait, and T. J. C. Ward, “A High-Performance SIMD Floating Point Unit for BlueGene/L: Architecture, Compilation, and Algorithm Design,” Proceedings of the 13th International Conference on Parallel Architecture and Compilation Techniques (PACT'04), 2004, pp. 85–96.
  6. R. C. Agarwal, F. G. Gustavson, and M. Zubair, “Exploiting Functional Parallelism of POWER2 to Design High-Performance Numerical Algorithms,” IBM J. Res. & Dev. 38, No. 5, 563–576 (1994).
  7. R. K. Montoye, E. Hokenek, and S. L. Runyon, “Design of the IBM RISC System/6000 Floating-Point Execution Unit,” IBM J. Res. & Dev. 34, No. 1, 59–70 (1990).
  8. IBM Corporation, Book E: Enhanced PowerPC Architecture, March 2000; see http://www-3.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC.
  9. T. R. Halfhill, “IBM PowerPC Hits 1,000 MIPS,” Microprocessor Report 13, No. 14 (October 25, 1999).
  10. K. Diefendorff, P. K. Dubey, R. Hochsprung, and H. Scales, “AltiVec Extension to PowerPC Accelerates Media Processing,” IEEE Micro 20, No. 2, 85–89 (2000).
  11. C.-L. Yang, B. Sano, and A. R. Lebeck, “Exploiting Parallelism in Geometry Processing with General Purpose Processors and Floating-Point SIMD Instructions,” IEEE Trans. Computers 49, No. 9, 934–946 (September 2000).
  12. IEEE Standard 754-1985, “IEEE Standard for Binary Floating-Point Arithmetic,” ©2004 IEEE; see http://grouper.ieee.org/groups/754/.
  13. K. O'Brien, B. Hay, J. Minish, H. Schaffer, B. Schloss, A. Shepherd, and M. Zaleski, “Advanced Compiler Technology for the RISC System/6000 Architecture,” IBM RISC System/6000 Technology, IBM Corporation, Armonk, NY, 1990, pp. 154–161.
  14. P. Briggs, “Register Allocation via Graph Coloring,” Ph.D. thesis, Rice University, Houston, TX, 1992.
  15. A. Eichenberger, P. Wu, and K. O'Brien, “Vectorization for Short SIMD Architectures with Alignment Constraints,” Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation, 2004, pp. 82–93.
  16. J. J. Dongarra, J. Du Croz, S. Hammarling, and I. Duff, “A Set of Level 3 Basic Linear Algebra Subprograms,” ACM Trans. Math. Software 16, No. 1, 1–17 (March 1990).
  17. C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, “Basic Linear Algebra Subprograms for Fortran Usage,” ACM Trans. Math. Software 5, No. 3, 308–323 (September 1979).
  18. K. Goto and R. van de Geijn, “On Reducing TLB Misses in Matrix Multiplication,” Technical Report TR-2002-55 (FLAME Working Note No. 9), University of Texas at Austin, Department of Computer Sciences, 2002.
  19. J. A. Gunnels, F. G. Gustavson, G. M. Henry, and R. A. van de Geijn, “FLAME: Formal Linear Algebra Methods Environment,” ACM Trans. Math. Software 27, No. 4, 422–455 (December 2001).
  20. R. C. Whaley and J. J. Dongarra, “Automatically Tuned Linear Algebra Software (ATLAS),” Proceedings of the IEEE/ACM Supercomputing Conference, 1998, p. 38; Best Paper Award for Systems; see http://www.cs.utk.edu/~rwhaley/ATL/INDEX.HTM.
  21. K. Yotov, X. Li, G. Ren, M. Cibulskis, G. DeJong, M. Garzaran, D. Padua, K. Pingali, P. Stodghill, and P. Wu, “A Comparison of Empirical and Model-Driven Optimization,” Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, 2003, pp. 63–76; see http://iss.cs.cornell.edu/Publications/Papers/PLDI2003.pdf.
  22. F. G. Gustavson, “New Generalized Matrix Data Structures Lead to a Variety of High-Performance Algorithms,” Proceedings of the IFIP TC2/WG2.5 Working Conference on the Architecture of Scientific Software, 2000, pp. 211–234.

