IBM Journal of Research and Development
Volume 50, Number 2/3, 2006
Self-adapting numerical software (SANS) effort - References
by J. Dongarra, G. Bosilca, Z. Chen, V. Eijkhout, G. E. Fagg, E. Fuentes, J. Langou, P. Luszczek, J. Pjesivac-Grbovic, K. Seymour, H. You, and S. S. Vadhiyar
References
C. L. Lawson, R. J. Hanson, F. T. Krogh, and D. R. Kincaid, “Algorithm 539: Basic Linear Algebra Subprograms for FORTRAN Usage [F1],” ACM Trans. Math. Software 5, No. 3, 324–325 (1979).
R. C. Whaley, A. Petitet, and J. J. Dongarra, “Automated Empirical Optimization of Software and the ATLAS Project,” Parallel Computing 27, No. 1/2, 3–35 (2001).
G. E. Moore, “Cramming More Components onto Integrated Circuits,” Electronics 38, No. 8, 114–117 (1965).
R. Allen and K. Kennedy, Optimizing Compilers for Modern Architectures, Morgan-Kaufmann Publishing Co., San Francisco, 2002.
D. A. Padua and M. J. Wolfe, “Advanced Compiler Optimizations for Supercomputers,” Commun. ACM 29, No. 12, 1184–1201 (1986).
Q. Yi, K. Kennedy, H. You, K. Seymour, and J. Dongarra, “Automatic Blocking of QR and LU Factorizations for Locality,” Proceedings of the ACM SIGPLAN Workshop on Memory System Performance, 2004, pp. 12–22.
R. Schreiber and J. Dongarra, “Automatic Blocking of Nested Loops,” Technical Report CS-90-108, Department of Computer Science, University of Tennessee, Knoxville, TN 37996, 1990.
K. S. McKinley, S. Carr, and C.-W. Tseng, “Improving Data Locality with Loop Transformations,” ACM Trans. Program. Lang. & Syst. 18, No. 4, 424–453 (1996).
U. Banerjee, “A Theory of Loop Permutations,” Selected Papers of the 2nd Workshop on Languages and Compilers for Parallel Computing, Pitman Publishing Ltd., London, 1990, pp. 54–74.
J. Demmel, J. Dongarra, V. Eijkhout, E. Fuentes, A. Petitet, R. Vuduc, R. C. Whaley, and K. Yelick, “Self-Adapting Linear Algebra Algorithms and Software,” Proc. IEEE 93, No. 2, 293–312 (2005).
J. Bilmes, K. Asanovic, C.-W. Chin, and J. Demmel, “Optimizing Matrix Multiply Using PHiPAC: A Portable, High-Performance, ANSI C Coding Methodology,” Proceedings of the International Conference on Supercomputing, 1997, pp. 340–347.
M. Frigo and S. G. Johnson, “FFTW: An Adaptive Software Architecture for the FFT,” Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1998, pp. 1381–1384.
K. Yotov, X. Li, G. Ren, M. Cibulskis, G. DeJong, M. Garzaran, D. Padua, K. Pingali, P. Stodghill, and P. Wu, “A Comparison of Empirical and Model-Driven Optimization,” Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2003, pp. 63–76.
J. A. Nelder and R. Mead, “A Simplex Method for Function Minimization,” The Computer J. 7, No. 4, 308–313 (1965).
Q. Yi and D. Quinlan, “Applying Loop Optimizations to Object-Oriented Abstractions Through General Classification of Array Semantics,” Proceedings of the 17th International Workshop on Languages and Compilers for Parallel Computing, 2004; see http://www.cs.utsa.edu/~qingyi/papers/LCPC04.pdf.
D. Quinlan, M. Schordan, Q. Yi, and A. Saebjornsen, “Classification and Utilization of Abstractions for Optimization,” Proceedings of the 1st International Symposium on Leveraging Applications of Formal Methods, 2004, pp. 2–9.
E. Amaldi and V. Kann, “On the Approximability of Minimizing Nonzero Variables or Unsatisfied Relations in Linear Systems,” Theoret. Computer Sci. 209, 237–260 (1998).
P. Crescenzi and V. Kann, Eds., A Compendium of NP Optimization Problems, 2005; see http://www.nada.kth.se/theory/problemlist.html.
M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman & Co., New York, 1979.
J. Gergov, “Approximation Algorithms for Dynamic Storage Allocation,” Proceedings of the 4th Annual European Symposium on Algorithms, 1996, pp. 52–61.
D. S. Hochbaum and D. B. Shmoys, “A Polynomial Approximation Scheme for Machine Scheduling on Uniform Processors: Using the Dual Approach,” SIAM J. Computing 17, No. 3, 539–551 (1988).
V. Kann, “Strong Lower Bounds on the Approximability of Some NPO PB-Complete Maximization Problems,” Proceedings of the 20th International Symposium on Mathematical Foundations of Computer Science, 1995, pp. 227–236.
J. Lenstra, D. Shmoys, and E. Tardos, “Approximation Algorithms for Scheduling Unrelated Parallel Machines,” Math. Program. 46, No. 3, 259–271 (1990).
K. J. Roche and J. J. Dongarra, “Deploying Parallel Numerical Library Routines to Cluster Computing in a Self-Adapting Fashion,” Parallel Computing: Advances and Current Issues, Imperial College Press, London, 2002.
E. Anderson, Z. Bai, C. Bischof, S. L. Blackford, J. W. Demmel, J. J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. C. Sorensen, LAPACK Users' Guide, Third Edition, Society for Industrial and Applied Mathematics, Philadelphia, 1999.
L. S. Blackford, J. Choi, A. Cleary, E. F. D'Azevedo, J. W. Demmel, I. S. Dhillon, J. J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. W. Walker, and R. C. Whaley, ScaLAPACK Users' Guide, Society for Industrial and Applied Mathematics, Philadelphia, 1997.
Message Passing Interface Forum, “MPI: A Message-Passing Interface Standard,” Intl. J. Supercomputer Appl. & High Perform. Computing 8, No. 3/4, 159–416 (1994).
Message Passing Interface Forum, MPI: A Message-Passing Interface Standard Version 1.1, 1995; see http://www.mpi-forum.org/docs/docs.html.
Message Passing Interface Forum, MPI-2: Extensions to the Message-Passing Interface, 1997; see http://www.mpi-forum.org/docs/mpi2-report.pdf.
MPICH; see http://www.mcs.anl.gov/mpi/mpich/.
LAM/MPI Parallel Computing; see http://www.lam-mpi.org/.
T. Sterling, Beowulf Cluster Computing with Linux, MIT Press, Cambridge, MA, October 2001.
J. Choi, J. J. Dongarra, L. S. Ostrouchov, A. P. Petitet, D. W. Walker, and R. C. Whaley, “Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines,” Sci. Program. 5, No. 3, 173–184 (1996).
TOP500 Supercomputer Sites; see http://www.top500.org and http://www.netlib.org/benchmark/top500.html.
J. J. Dongarra, P. Luszczek, and A. Petitet, “The LINPACK Benchmark: Past, Present, and Future,” Concurrency & Computation: Pract. & Exper. 15, No. 9, 803–820 (2003).
V. Eijkhout and E. Fuentes, “A Proposed Standard for Numerical Metadata,” Technical Report ICL-UT-03-02, Innovative Computing Laboratory, University of Tennessee, Knoxville, TN 37996, 2003.
Matrix Market; see http://math.nist.gov/MatrixMarket.
K. Schloegel, G. Karypis, and V. Kumar, “Parallel Multilevel Algorithms for Multi-Constraint Graph Partitioning,” Proceedings of the 6th International Euro-Par Conference, 2000, pp. 296–310.
The ParMETIS/METIS package; see http://glaros.dtc.umn.edu/gkhome/views/metis/.
V. Eijkhout, “Automatic Determination of Matrix Blocks,” Technical Report UT-CS-01-458, Department of Computer Science, University of Tennessee, Knoxville, TN 37996, 2001.
G. E. Fagg, E. Gabriel, G. Bosilca, T. Angskun, Z. Chen, J. Pjesivac-Grbovic, K. London, and J. J. Dongarra, “Extending the MPI Specification for Process Fault Tolerance on High Performance Computing Systems,” Proceedings of the International Supercomputer Conference, 2004.
E. Gabriel, G. E. Fagg, A. Bukovsky, T. Angskun, and J. J. Dongarra, “A Fault-Tolerant Communication Library for Grid Environments,” Proceedings of the 17th Annual ACM International Conference on Supercomputing (ICS'03), International Workshop on Grid Computing, 2003; see http://icl.cs.utk.edu/news_pub/submissions/FTMPI-SF-gabriel.pdf.
J. S. Plank, Y. Kim, and J. J. Dongarra, “Fault-Tolerant Matrix Operations for Networks of Workstations Using Diskless Checkpointing,” J. Parallel & Distr. Computing 43, No. 2, 125–138 (1997).
G. Bosilca, Z. Chen, J. Dongarra, and J. Langou, “Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,” Technical Report UT-CS-04-538, Computer Science Department, University of Tennessee, Knoxville, TN 37996, 2004.
Z. Chen, G. E. Fagg, E. Gabriel, J. Langou, T. Angskun, G. Bosilca, and J. Dongarra, “Building Fault Survivable MPI Programs with FT-MPI Using Diskless Checkpointing,” Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005, pp. 213–223.
P. Sanders and J. F. Sibeyn, “A Bandwidth Latency Tradeoff for Broadcast and Reduction,” Info. Process. Lett. 86, No. 1, 33–38 (2003).
C. Engelmann and G. A. Geist, “Development of Naturally Fault Tolerant Algorithms for Computing on 100,000 Processors”; see http://www.csm.ornl.gov/~geist/Lyon2002-geist.pdf.
R. Rabenseifner, “Automatic MPI Counter Profiling of All Users: First Results on a CRAY T3E 900-512,” Proceedings of the Message Passing Interface Developer's and User's Conference, 1999, pp. 77–85.
S. S. Vadhiyar, G. E. Fagg, and J. Dongarra, “Automatically Tuned Collective Communications,” Proceedings of the ACM/IEEE Conference on Supercomputing, 2000, p. 3.
S. S. Vadhiyar, G. E. Fagg, and J. J. Dongarra, “Towards an Accurate Model for Collective Communications,” Intl. J. High Perform. Computing Appl. 18, No. 1, 159–167 (2004).
R. W. Hockney, “The Communication Challenge for MPP: Intel Paragon and Meiko CS-2,” Parallel Computing 20, No. 3, 389–398 (March 1994).
D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. von Eicken, “LogP: Towards a Realistic Model of Parallel Computation,” Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1993, pp. 1–12.
A. Alexandrov, M. F. Ionescu, K. E. Schauser, and C. J. Scheiman, “LogGP: Incorporating Long Messages into the LogP Model—One Step Closer Towards a Realistic Model for Parallel Computation,” Proceedings of the 7th Annual ACM Symposium on Parallel Algorithms and Architectures, 1995, pp. 95–105.
T. Kielmann, H. E. Bal, and K. Verstoep, “Fast Measurement of LogP Parameters for Message Passing Platforms,” Proceedings of the 15th IPDPS Workshops on Parallel and Distributed Processing, 2000, pp. 1176–1183.
D. E. Culler, L. T. Liu, R. P. Martin, and C. O. Yoshikawa, “Assessing Fast Network Interfaces,” IEEE Micro 16, No. 1, 35–43 (1996).
R. Thakur and W. Gropp, “Improving the Performance of Collective Operations in MPICH,” Proceedings of the 10th European PVM/MPI User's Group Meeting, 2003, pp. 257–267.
R. Rabenseifner and J. L. Träff, “More Efficient Reduction Algorithms for Non-Power-of-Two Number of Processors in Message-Passing Parallel Systems,” Proceedings of the 11th European PVM/MPI Users' Group Meeting, 2004, pp. 36–46.
T. Kielmann, R. F. H. Hofman, H. E. Bal, A. Plaat, and R. A. F. Bhoedjang, “MagPIe: MPI's Collective Communication Operations for Clustered Wide Area Systems,” Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1999, pp. 131–140.
C. Bell, D. Bonachea, Y. Cote, J. Duell, P. Hargrove, P. Husbands, C. Iancu, M. Welcome, and K. Yelick, “An Evaluation of Current High-Performance Networks,” Proceedings of the 17th International Symposium on Parallel and Distributed Processing, 2003, p. 28.
M. Bernaschi, G. Iannello, and M. Lauria, “Efficient Implementation of Reduce-Scatter in MPI,” J. Syst. Arch. 49, No. 3, 89–108 (2003).
M. Barnett, L. Shuler, S. Gupta, D. G. Payne, R. van de Geijn, and J. Watts, “Building a High-Performance Collective Communication Library,” Proceedings of the ACM/IEEE Conference on Supercomputing, 1994, pp. 107–116.
J. Pjesivac-Grbovic, T. Angskun, G. Bosilca, G. E. Fagg, E. Gabriel, and J. J. Dongarra, “Performance Analysis of MPI Collective Operations,” Proceedings of the 4th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems, 2005, p. 272a.
G. E. Fagg, A. Bukovsky, and J. J. Dongarra, “HARNESS and Fault Tolerant MPI,” Parallel Computing 27, No. 11, 1479–1495 (2001).