Dr. Georg Hager
Publications
2024
Algebraic temporal blocking for sparse iterative solvers on multi-core CPUs
In: International Journal of High Performance Computing Applications (2024)
ISSN: 1094-3420
DOI: 10.1177/10943420241283828
CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion
38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 (San Francisco, CA, 27 May 2024 - 31 May 2024)
In: 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2024
DOI: 10.1109/IPDPS57955.2024.00038
2023
Making applications faster by asynchronous execution: Slowing down processes or relaxing MPI collectives
In: Future Generation Computer Systems-The International Journal of Grid Computing Theory Methods and Applications (2023)
ISSN: 0167-739X
DOI: 10.1016/j.future.2023.06.017
Physical Oscillator Model for Supercomputing
14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) (Denver, CO, USA, 12 November 2023 - 17 November 2023)
In: 14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) 2023
DOI: 10.1145/3624062.3625535
SPEChpc 2021 Benchmarks on Ice Lake and Sapphire Rapids Infiniband Clusters: A Performance and Energy Case Study
14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) (Denver, CO, USA, 12 November 2023 - 17 November 2023)
In: 14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) 2023
DOI: 10.1145/3624062.3624197
Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications
14th International Conference on Parallel Processing and Applied Mathematics, PPAM 2022 (Gdansk, Poland, 11 September 2022 - 14 September 2022)
In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (Eds.): Lecture Notes in Computer Science 2023
DOI: 10.1007/978-3-031-30442-2_12
Orthogonal Layers of Parallelism in Large-Scale Eigenvalue Computations
In: ACM Transactions on Parallel Computing 10 (2023), Art. No. 3614444
ISSN: 2329-4949
DOI: 10.1145/3614444
Analytical performance estimation during code generation on modern GPUs
In: Journal of Parallel and Distributed Computing 173 (2023), pp. 152-167
ISSN: 0743-7315
DOI: 10.1016/j.jpdc.2022.11.003
Core-Level Performance Engineering with the Open-Source Architecture Code Analyzer (OSACA) and the Compiler Explorer
14th Annual ACM/SPEC International Conference on Performance Engineering, ICPE 2023 (Coimbra, 15 April 2023 - 19 April 2023)
In: ICPE 2023 - Companion of the 2023 ACM/SPEC International Conference on Performance Engineering 2023
DOI: 10.1145/3578245.3583716
MD-Bench: A performance-focused prototyping harness for state-of-the-art short-range molecular dynamics algorithms
In: Future Generation Computer Systems-The International Journal of Grid Computing Theory Methods and Applications (2023)
ISSN: 0167-739X
DOI: 10.1016/j.future.2023.06.023
MD-Bench: A performance-focused prototyping harness for state-of-the-art short-range molecular dynamics algorithms
In: Future Generation Computer Systems-The International Journal of Grid Computing Theory Methods and Applications 149 (2023), pp. 25-38
ISSN: 0167-739X
DOI: 10.1016/j.future.2023.06.023
2022
Addressing White-box Modeling and Simulation Challenges in Parallel Computing
ACM SIGSIM-PADS '22 (Atlanta, GA, USA, 8 June 2022 - 10 June 2022)
In: SIGSIM-PADS '22: SIGSIM Conference on Principles of Advanced Discrete Simulation 2022
DOI: 10.1145/3518997.3534986
Analytic performance model for parallel overlapping memory-bound kernels
In: Concurrency and Computation-Practice & Experience (2022)
ISSN: 1532-0626
DOI: 10.1002/cpe.6816
URL: https://onlinelibrary.wiley.com/doi/10.1002/cpe.6816
The Role of Idle Waves, Desynchronization, and Bottleneck Evasion in the Performance of Parallel Programs
In: IEEE Transactions on Parallel and Distributed Systems (2022), pp. 1-16
ISSN: 1045-9219
DOI: 10.1109/TPDS.2022.3221085
Level-based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication
In: IEEE Transactions on Parallel and Distributed Systems (2022), pp. 1-18
ISSN: 1045-9219
DOI: 10.1109/TPDS.2022.3223512
2021
Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact
36th International Conference on High Performance Computing, ISC High Performance 2021 (Virtual, Online, 24 June 2021 - 2 July 2021)
In: Bradford L. Chamberlain, Ana-Lucia Varbanescu, Hatem Ltaief, Piotr Luszczek (Eds.): Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2021
DOI: 10.1007/978-3-030-78713-4_19
Execution-Cache-Memory modeling and performance tuning of sparse matrix-vector multiplication and Lattice quantum chromodynamics on A64FX
In: Concurrency and Computation-Practice & Experience (2021)
ISSN: 1532-0626
DOI: 10.1002/cpe.6512
URL: https://onlinelibrary.wiley.com/doi/full/10.1002/cpe.6512
YaskSite: Stencil Optimization Techniques Applied to Explicit ODE Methods on Modern Architectures
19th IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2021 (Virtual, Korea, 27 February 2021 - 3 March 2021)
In: Jae W. Lee, Mary Lou Soffa, Ayal Zaks (Eds.): CGO 2021 - Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization 2021
DOI: 10.1109/CGO51591.2021.9370316
Opening the Black Box: Performance Estimation during Code Generation for GPUs
IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (Belo Horizonte, Brazil, 26 October 2021 - 28 October 2021)
DOI: 10.1109/sbac-pad53543.2021.00014
2020
Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs
35th International Conference on High Performance Computing, ISC High Performance 2020 (Frankfurt, 22 June 2020 - 25 June 2020)
In: Ponnuswamy Sadayappan, Bradford L. Chamberlain, Guido Juckeland, Hatem Ltaief (Eds.): Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2020
DOI: 10.1007/978-3-030-50743-5_20
A Recursive Algebraic Coloring Technique for Hardware-efficient Symmetric Sparse Matrix-vector Multiplication
In: ACM Transactions on Parallel Computing 7 (2020), Art. No. 19
ISSN: 2329-4949
DOI: 10.1145/3399732
Understanding HPC benchmark performance on Intel Broadwell and Cascade Lake processors
35th International Conference on High Performance Computing, ISC High Performance 2020 (Frankfurt, 22 June 2020 - 25 June 2020)
In: Ponnuswamy Sadayappan, Bradford L. Chamberlain, Guido Juckeland, Hatem Ltaief (Eds.): Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2020
DOI: 10.1007/978-3-030-50743-5_21
Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX
2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2020 (12 November 2020)
In: Proceedings of PMBS 2020: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems 2020
DOI: 10.1109/PMBS51919.2020.00006
Analytic performance modeling and analysis of detailed neuron simulations
In: International Journal of High Performance Computing Applications (2020)
ISSN: 1094-3420
DOI: 10.1177/1094342020912528
Performance engineering for a tall & skinny matrix multiplication kernel on GPUs
13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019 (Bialystok, Poland, 8 September 2019 - 11 September 2019)
In: Lecture Notes in Computer Science, vol. 12043, Cham: 2020
DOI: 10.1007/978-3-030-43229-4_43
Performance engineering for real and complex tall & skinny matrix multiplication kernels on GPUs
In: International Journal of High Performance Computing Applications (2020)
ISSN: 1094-3420
DOI: 10.1177/1094342020965661
Bridging the architecture gap: Abstracting performance-relevant properties of modern server processors
In: Supercomputing Frontiers and Innovations 7 (2020), pp. 54-78
ISSN: 2409-6008
DOI: 10.14529/jsfi200204
A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials
In: International Journal of High Performance Computing Applications (2020)
ISSN: 1094-3420
DOI: 10.1177/1094342020959423
PHIST: A Pipelined, Hybrid-Parallel Iterative Solver Toolkit
In: ACM Transactions on Mathematical Software 46 (2020), Art. No. 3402227
ISSN: 0098-3500
DOI: 10.1145/3402227
2019
Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study
2019 IEEE International Conference on Cluster Computing, CLUSTER 2019 (Albuquerque, NM, 23 September 2019 - 26 September 2019)
In: Proceedings - IEEE International Conference on Cluster Computing, ICCC 2019
DOI: 10.1109/CLUSTER.2019.8890995
Collecting and presenting reproducible intranode stencil performance: INSPECT
In: Supercomputing Frontiers and Innovations 6 (2019), pp. 4-25
ISSN: 2409-6008
DOI: 10.14529/jsfi190301
Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels
10th IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2019
DOI: 10.1109/PMBS49563.2019.00006
Automated instruction stream throughput prediction for intel and AMD microarchitectures
2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2018 (Dallas, TX, 12 November 2018)
In: Proceedings of PMBS 2018: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis 2019
DOI: 10.1109/PMBS.2018.8641578
2018
On the Accuracy and Usefulness of Analytic Energy Models for Contemporary Multicore Processors
High Performance Computing: 33rd International Conference, ISC High Performance 2018 (Frankfurt, 24. Juni 2018 - 28. Juni 2018)
In: High Performance Computing: 33rd International Conference, ISC High Performance 2018, Cham: 2018
DOI: 10.1007/978-3-319-92040-5_2
Efficient optical simulation of nano structures in thin-film solar cells
DOI: 10.1117/12.2312545
Chebyshev filter diagonalization on modern manycore processors and GPGPUs
Springer Verlag, 2018
ISBN: 9783319920399
DOI: 10.1007/978-3-319-92040-5_17
Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures
2018 ACM/IEEE Supercomputing Conference (Dallas, TX, 12 November 2018)
In: 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) 2018
DOI: 10.1109/PMBS.2018.8641578
URL: https://ieeexplore.ieee.org/document/8641578
CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance
In: IEEE Transactions on Parallel and Distributed Systems (2018)
ISSN: 1045-9219
DOI: 10.1109/TPDS.2018.2866794
URL: https://ieeexplore.ieee.org/document/8444763
Multicore Performance Engineering of Sparse Triangular Solves Using a Modified Roofline Model
30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (Lyon, 24 September 2018 - 27 September 2018)
In: 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2018), New York: 2018
DOI: 10.1109/SBAC-PAD.2018.00047
2017
Improved coefficients for polynomial filtering in ESSEX
1st International Workshop on Eigenvalue Problems: Algorithms, Software and Applications in Petascale Computing, EPASA 2015
DOI: 10.1007/978-3-319-62426-6_5
Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels
10th International Workshop on Parallel Tools for High Performance Computing (Stuttgart, Germany, 4 October 2016 - 5 October 2016)
In: Niethammer C, Gracia J, Hilbrich T, Knüpfer A, Resch MM, Nagel WE (Eds.): Tools for High Performance Computing 2016, Cham: 2017
Performance analysis of the Kahan-enhanced scalar product on current multi-core and many-core processors
In: Concurrency and Computation-Practice & Experience 29 (2017)
ISSN: 1532-0626
DOI: 10.1002/cpe.3921
An Analysis of Core- and Chip-Level Architectural Features in Four Generations of Intel Server Processors
32nd International Conference on High Performance Computing: ISC High Performance 2017 (Frankfurt)
In: High Performance Computing. ISC 2017. Lecture Notes in Computer Science, vol 10266, Cham: 2017
DOI: 10.1007/978-3-319-58667-0_16
LIKWID monitoring stack: A flexible framework enabling job specific performance monitoring for the masses
2017 IEEE International Conference on Cluster Computing, CLUSTER 2017
DOI: 10.1109/CLUSTER.2017.115
2016
Exploring performance and power properties of modern multi-core chips via simple machine models
In: Concurrency and Computation-Practice & Experience 28 (2016), pp. 189-210
ISSN: 1532-0626
DOI: 10.1002/cpe.3180
Analysis of Intel's Haswell microarchitecture using the ECM model and microbenchmarks
Springer Verlag, 2016
ISBN: 9783319306940
DOI: 10.1007/978-3-319-30695-7_16
Analysis of Intel's Haswell Microarchitecture Using the ECM Model and Microbenchmarks
29th International Conference on Architecture of Computing Systems (Nuremberg)
In: Architecture of Computing Systems -- ARCS 2016: 29th International Conference, Nuremberg, Germany, April 4-7, 2016, Proceedings, Cham: 2016
DOI: 10.1007/978-3-319-30695-7_16
Performance analysis of the Kahan-enhanced scalar product on current multi-core and many-core processors
In: Concurrency and Computation-Practice & Experience 28 (2016)
ISSN: 1532-0626
DOI: 10.1002/cpe.3921
GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems
In: International Journal of Parallel Programming (2016), pp. 1-27
ISSN: 0885-7458
DOI: 10.1007/s10766-016-0464-z
Towards an exascale enabled sparse solver repository
Springer Verlag, 2016
ISBN: 9783319405261
DOI: 10.1007/978-3-319-40528-5_13
High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations
In: Journal of Computational Physics 325 (2016), pp. 226-243
ISSN: 0021-9991
DOI: 10.1016/j.jcp.2016.08.027
2015
Automatic loop kernel analysis and performance modeling with kerncraft
6th International Workshop in Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2015 - Held as part of the 27th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015
DOI: 10.1145/2832087.2832092
Automatic Loop Kernel Analysis and Performance Modeling With Kerncraft
SC15 The International Conference for High Performance Computing, Networking, Storage and Analysis (Austin, TX, USA, 15. November 2015)
In: Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, New York, NY, USA: 2015
DOI: 10.1145/2832087.2832092
URL: http://dl.acm.org/citation.cfm?id=2832087&preflayout=flat
Performance analysis of the Kahan-enhanced scalar product on current multicore processors
11th International Conference on Parallel Processing and Applied Mathematics (Krakow, Poland)
In: PPAM 2015 (accepted), 2015
URL: http://arxiv.org/abs/1505.02586
Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International (Hyderabad, India, 25 May 2015 - 29 May 2015)
In: IEEE (Ed.): Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2015
DOI: 10.1109/IPDPS.2015.76
URL: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7161530
Multicore-optimized wavefront diamond blocking for optimizing stencil updates
In: SIAM Journal on Scientific Computing 37 (2015), pp. C439-C464
ISSN: 1064-8275
DOI: 10.1137/140991133
Increasing the performance of the Jacobi-Davidson method by blocking
In: SIAM Journal on Scientific Computing (2015), pp. 1-27
ISSN: 1064-8275
DOI: 10.1137/140976017
URL: http://elib.dlr.de/98373/
Building a Fault Tolerant Application Using the GASPI Communication Layer
1st International Workshop on Fault-Tolerant Systems (Chicago, IL, 8 September 2015 - 11 September 2015)
In: Proceedings of FTS 2015, in conjunction with IEEE Cluster 2015: 2015
DOI: 10.1109/CLUSTER.2015.106
Overhead Analysis of Performance Counter Measurements
43rd International Conference on Parallel Processing Workshops, ICPPW 2014
DOI: 10.1109/ICPPW.2014.34
Chip-level and multi-node analysis of energy-optimized lattice Boltzmann CFD simulations
In: Concurrency and Computation-Practice & Experience (2015), pp. 1-5
ISSN: 1532-0626
DOI: 10.1002/cpe.3489
URL: http://onlinelibrary.wiley.com/doi/10.1002/cpe.3489/full
2014
Quantifying performance bottlenecks of stencil computations using the Execution-Cache-Memory model
DOI: 10.1145/2751205.2751240
URL: http://arxiv.org/abs/1410.5010
Alvermann Andreas, Basermann Achim, Fehske Holger, Galgon Martin, Hager Georg, Kreutzer Moritz, Krämer Lukas, Lang Bruno, Pieper Andreas, Röhrig-Zöllner Melven, Shahzad Faisal, Thies Jonas, Wellein Gerhard:
ESSEX: Equipping Sparse Solvers for Exascale
In: Euro-Par 2014: Parallel Processing Workshops, SpringerLink, 2014, pp. 577-588 (Lecture Notes in Computer Science, vol. 8806)
ISBN: 9783319143125
URL: http://link.springer.com/chapter/10.1007/978-3-319-14313-2_49
Comparing the Performance of Different x86 SIMD Instruction Sets for a Medical Imaging Application on Modern Multi- and Manycore Chips
2014 1st ACM SIGPLAN Workshop on Programming Models for SIMD/Vector Processing, WPMVP 2014 - Co-located with PPoPP 2014 (Orlando, USA, 16 February 2014)
In: Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing, New York, NY, USA: 2014
DOI: 10.1145/2568058.2568068
URL: http://dl.acm.org/citation.cfm?doid=2568058.2568068
Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator
In: ARCS Workshops'14 2014
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6775080&isnumber=6775071
A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units
In: SIAM Journal on Scientific Computing 36 (2014), pp. C401-C423
ISSN: 1064-8275
DOI: 10.1137/130930352
URL: http://epubs.siam.org/doi/abs/10.1137/130930352
Modeling and analyzing performance for highly optimized propagation steps of the lattice Boltzmann method on sparse lattices
(2014)
URL: https://arxiv.org/abs/1410.0412
(Techreport)
2013
Pushing the limits for medical image reconstruction on recent standard multicore processors
In: International Journal of High Performance Computing Applications 27 (2013), pp. 162-177
ISSN: 1094-3420
DOI: 10.1177/1094342012442424
Model-guided Performance Analysis of the Sparse Matrix-Matrix Multiplication
International Conference on High Performance Computing & Simulation (HPCS 2013) (Helsinki, Finland, 1 July 2013 - 5 July 2013)
In: HPCS2013 Conference Proceedings (preprint on arXiv) 2013
DOI: 10.1109/HPCSim.2013.6641452
URL: http://arxiv.org/abs/1303.1651
A survey of checkpoint/restart techniques on distributed memory systems
In: Parallel Processing Letters 23 (2013), pp. 1340011-1340030
ISSN: 0129-6264
DOI: 10.1142/S0129626413400112
URL: http://www.worldscientific.com/doi/abs/10.1142/S0129626413400112
PGAS implementation of SpMVM and LBM with GPI
The 7th International Conference on PGAS Programming Models (Edinburgh, Scotland, UK)
In: Proceedings of the 7th International Conference on PGAS Programming Models, Edinburgh: 2013
An Evaluation of Different I/O Techniques for Checkpoint/Restart
2013 IEEE 27th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (Boston, MA, USA, 20 May 2013 - 24 May 2013)
In: Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International, 2013
DOI: 10.1109/IPDPSW.2013.145
MPC and CoArray Fortran: Alternatives to classic MPI implementations on the examples of scalable lattice Boltzmann flow solvers
15th Results and Review Workshop on High Performance Computing in Science and Engineering, HLRS 2012 (Stuttgart)
DOI: 10.1007/978-3-642-33374-3_27
Comparison of Different Propagation Steps for Lattice Boltzmann Methods
In: Computers & Mathematics with Applications 65 (2013), pp. 924-935
ISSN: 0898-1221
DOI: 10.1016/j.camwa.2012.05.002
URL: http://www.sciencedirect.com/science/article/pii/S0898122112003835
2012
Performance Patterns and Hardware Metrics on Modern Multicore Processors: Best Practices for Performance Engineering
5th Workshop on Productivity and Performance (PROPER 2012) (Rhodes Island, Greece)
In: Euro-Par 2012, 2012
URL: http://arxiv.org/abs/1206.3738
Performance Engineering for the Lattice Boltzmann Method on GPGPUs: Architectural Requirements and Performance Results
In: Computers & Fluids (2012), p. 10
ISSN: 0045-7930
DOI: 10.1016/j.compfluid.2012.02.013
URL: http://www.sciencedirect.com/science/article/pii/S0045793012000679
Exploring performance and power properties of modern multicore chips via simple machine models
In: Concurrency and Computation-Practice & Experience, submitted (2012), p. 22
ISSN: 1532-0626
URL: http://arxiv.org/abs/1208.2908
High performance smart expression template math libraries
High Performance Computing and Simulation (HPCS) 2012 (Madrid, 2 July 2012 - 6 July 2012)
In: High Performance Computing and Simulation (HPCS) 2012, International Conference on 2012
DOI: 10.1109/HPCSim.2012.6266939
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=06266939
Evaluation of the Coarray Fortran Programming Model on the Example of a Lattice Boltzmann Code
The 6th Conference on Partitioned Global Address Space Programming Models (Santa Barbara, CA, USA)
In: PGAS12, in press: 2012
Domain Decomposition and Locality Optimization for Large-Scale Lattice Boltzmann Simulations
In: Computers & Fluids (2012)
ISSN: 0045-7930
DOI: 10.1016/j.compfluid.2012.02.007
URL: http://www.sciencedirect.com/science/article/pii/S0045793012000527
2011
Efficient multicore-aware parallelization strategies for iterative stencil computations
In: Journal of Computational Science 2 (2011), pp. 130-137
ISSN: 1877-7503
DOI: 10.1016/j.jocs.2011.01.010
URL: http://www.sciencedirect.com/science/article/pii/S1877750311000172
A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters
In: Parallel Computing 37 (2011), pp. 536-549
ISSN: 0167-8191
DOI: 10.1016/j.parco.2011.03.005
URL: http://www.sciencedirect.com/science/article/pii/S0167819111000342
Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on NVIDIA GPUs using CUDA
PARENG 2009
In: Advances in Engineering Software, ScienceDirect: 2011
DOI: 10.1016/j.advengsoft.2010.10.007
URL: http://www.sciencedirect.com/science/article/pii/S0965997810001274
Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems
In: Parallel Processing Letters 21 (2011), pp. 339-358
ISSN: 0129-6264
DOI: 10.1142/S0129626411000254
Expression Templates Revisited: A Performance Analysis of the Current ET Methodology
In: SIAM Journal on Scientific Computing (2011), pp. 1-15
ISSN: 1064-8275
URL: http://arxiv.org/abs/1104.1729
Parallel sparse matrix-vector multiplication as a test case for hybrid MPI+OpenMP programming
25th IEEE International Parallel and Distributed Processing Symposium, Workshops and PhD Forum, IPDPSW 2011 (Anchorage, AK)
DOI: 10.1109/IPDPS.2011.332
2010
Introducing a Performance Model for Bandwidth-Limited Loop Kernels
8th International Conference, PPAM 2009, Revised Selected Papers, Part I (Wroclaw, Poland, 13 September 2009 - 16 September 2009)
In: Parallel Processing and Applied Mathematics, Berlin Heidelberg: 2010
DOI: 10.1007/978-3-642-14390-8_64
URL: http://www.springerlink.com/content/m720118145140122/
Complexities of Performance Prediction for Bandwidth-Limited Loop Kernels on Multi-Core Architectures
Transactions of the Fourth Joint HLRB and KONWIHR Review and Results Workshop (Leibniz Supercomputing Centre, Garching/Munich, Germany)
In: High Performance Computing in Science and Engineering, Garching/Munich 2009, Berlin Heidelberg: 2010
DOI: 10.1007/978-3-642-13872-0_1
URL: http://www.springerlink.com/content/m1288m0174021600/
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments
39th International Conference on Parallel Processing Workshops (San Diego, CA, USA, 13 September 2010 - 16 September 2010)
In: Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures, IEEE: 2010
DOI: 10.1109/ICPPW.2010.38
URL: http://arxiv.org/abs/1004.4431
LIKWID performance tools
URL: http://inside.hlrs.de/pdfs/inSiDE_spring2010.pdf
A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters
(2010), p. 18
URL: https://www10.cs.fau.de/publications/reports/TechRep_2010-07.pdf
(Techreport)
Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters
In: Parallel Processing Letters 20 (2010), pp. 359-376
ISSN: 0129-6264
DOI: 10.1142/S0129626410000296
URL: http://arxiv.org/abs/1006.3148
2009
Speeding up a Lattice Boltzmann Kernel on NVIDIA GPUs
PARENG2009 (Pécs, Hungary, 6 April 2009 - 8 April 2009)
In: Proceedings of the First International Conference on Parallel, Distributed and Grid Computing for Engineering, Kippen, Stirlingshire, United Kingdom: 2009
RZBENCH: performance evaluation of current HPC architectures using low-level and application benchmarks
In: High Performance Computing in Science and Engineering, Garching/Munich 2007: Transactions of the Third Joint HLRB and KONWIHR Status and Result Workshop, Dec. 3-4, 2007, Leibniz Supercomputing Centre, Garching/Munich, Germany, Berlin, Heidelberg: Springer, 2009, pp. 485-501 (Mathematics and Statistics, vol. V)
ISBN: 978-3-540-69181-5
DOI: 10.1007/978-3-540-69182-2_39
Fission of super-heavy nuclei explored with Skyrme forces
In: International Journal of Modern Physics E-Nuclear Physics 18 (2009), pp. 773-781
ISSN: 0218-3013
DOI: 10.1142/S0218301309012860
Challenges and Potentials of Emerging Multicore Architectures
Third Joint HLRB and KONWIHR Status and Result Workshop (Garching, 3 December 2007 - 4 December 2007)
In: High Performance Computing in Science and Engineering Garching-Munich 2007, Berlin Heidelberg: 2009
URL: http://www.springer.com/math/cse/book/978-3-540-69181-5
Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
COMPSAC 2009 (Seattle, USA, 20 July 2009 - 24 July 2009)
In: Proceedings of 2009 33rd Annual IEEE International Computer Software and Applications Conference, IEEE Computer Society: 2009
DOI: 10.1109/COMPSAC.2009.82
Benchmark analysis and application results for lattice Boltzmann simulations on NEC SX vector and Intel Nehalem systems
In: Parallel Processing Letters 19 (2009), pp. 491-511
ISSN: 0129-6264
DOI: 10.1142/S0129626409000389
URL: http://www.worldscinet.com/ppl/19/1904/S0129626409000389.html
The world's fastest CPU and SMP node: Some performance results from the NEC SX-9
23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS) (Rome, 23 May 2009 - 29 May 2009)
In: Proceedings of the IEEE International Symposium on Parallel & Distributed Processing 2009, IEEE Computer Society: 2009
DOI: 10.1109/IPDPS.2009.5161089
Vector Computers in a World of Commodity Clusters, Massively Parallel Systems and Many-Core Many-Threaded CPUs: Recent Experience Based on an Advanced Lattice Boltzmann Flow Solver
In: High Performance Computing in Science and Engineering '08: Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2008, Berlin Heidelberg: Springer, 2009, pp. 333-347 (Mathematics and Statistics, vol. 5)
ISBN: 978-3-540-88301-2
DOI: 10.1007/978-3-540-88303-6_24
Selecting an Appropriate Computational Platform for Supporting the Development of New Catalyst Carriers
In: Innovatives Supercomputing in Deutschland: inSiDE 7 Spring (2009), pp. 12-16
URL: http://inside.hlrs.de/htm/Edition_01_09/article_05.html
2008
Direct numerical simulation of turbulent flow over dimples - Code optimization for NEC SX-8 plus flow results
In: High Performance Computing in Science and Engineering '07: Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2007, Berlin/Heidelberg: Springer, 2008, pp. 303-318
ISBN: 9783540747383
DOI: 10.1007/978-3-540-74739-0_21
URL: http://link.springer.com/chapter/10.1007%2F978-3-540-74739-0_21
Data access characteristics and optimizations for Sun UltraSPARC T2 and T2+ systems
In: Parallel Processing Letters 18 (2008), pp. 471-490
ISSN: 0129-6264
DOI: 10.1142/S0129626408003521
Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers
IEEE International Symposium on Parallel and Distributed Processing, 2008. IPDPS 2008 (Miami, FL, USA, 14 April 2008 - 18 April 2008)
In: Proceedings of the 2008 IEEE International Parallel & Distributed Processing Symposium, IEEE Catalog Number: 2008
DOI: 10.1109/IPDPS.2008.4536341
What's next? Evaluating Performance and Programming Approaches for Emerging Computer Technologies
(2008), pp. 42-45
URL: http://www.rrze.uni-erlangen.de/wir-ueber-uns/publikationen/HPC-2008-Screenversion.pdf
(Techreport)
Introducing a parallel cache oblivious blocking approach for the lattice Boltzmann method
In: Progress in Computational Fluid Dynamics 8 (2008), pp. 179-188
ISSN: 1468-4349
DOI: 10.1504/PCFD.2008.018088
2006
Optimizing performance on modern HPC systems: learning from simple kernel benchmarks
The 2nd Russian-German Advanced Research Workshop (Stuttgart, Germany)
In: Computational Science and High Performance Computing II, Berlin Heidelberg: 2006
DOI: 10.1007/3-540-31768-6_23
URL: http://www.springerlink.com/content/8401n54088177483/
Optimization of Cache Oblivious Lattice Boltzmann Method in 2D and 3D
ASIM 2006 - 19. Symposium Simulationstechnik (Hannover)
In: Simulationstechnique - 19th Symposium in Hannover, September 2006, Erlangen: 2006
URL: https://www10.informatik.uni-erlangen.de/Publications/Papers/2006/Nitsure_ASIM06.pdf
Have the Vectors the Continuing Ability to Parry the Attack of the Killer Micros?
In: High Performance Computing on Vector Systems: Proceedings of the High Performance Computing Center Stuttgart, March 2005, Berlin Heidelberg: Springer, 2006, pp. 25-37 (Mathematics and Statistics, vol. 1)
ISBN: 978-3-540-29124-4
DOI: 10.1007/3-540-35074-8_2
Towards optimal performance for lattice Boltzmann applications on terascale computers
In: Parallel Computational Fluid Dynamics: Theory and Applications, Proceedings of the 2005 International Conference on Parallel Computational Fluid Dynamics, 2006
DOI: 10.1016/B978-044452206-1/50005-7
On the single processor performance of simple lattice Boltzmann kernels
In: Computers & Fluids 35 (2006), pp. 910-919
ISSN: 0045-7930
DOI: 10.1016/j.compfluid.2005.02.008
2005
Performance of Scientific Applications on Modern Supercomputers
High Performance Computing in Science and Engineering (Munich)
In: High Performance Computing in Science and Engineering Munich 2004: Transactions of the Second Joint HLRB and KONWIHR Status and Result Workshop, March 2-3, 2004, Technical University of Munich and Leibniz-Rechenzentrum Munich, Germany, Berlin Heidelberg: 2005
URL: http://link.springer.com/chapter/10.1007/3-540-26657-7_1#page-1
Optimizing performance of the lattice Boltzmann method for complex structures on cache-based architectures
In: Frontiers in Simulation: Simulationstechnique, 18th Symposium in Erlangen, September 2005 (ASIM), Erlangen: 2005
URL: http://www.rrze.uni-erlangen.de/dienste/arbeiten-rechnen/hpc/Projekte/Donath_ASIM05.pdf
Taming the Bandwidth Behemoth. First Experiences on a Large SGI Altix System
In: Innovatives Supercomputing in Deutschland: inSiDE 3 (2005), p. 24
cxHPC: Setting up ByGRID - First Steps Towards an e-Science Infrastructure in Bavaria
In: High Performance Computing in Science and Engineering, Garching 2004: Transactions of the KONWIHR Result Workshop, October 14-15, 2004, Technical University of Munich, Garching, Germany, Berlin Heidelberg: Springer, 2005, pp. 97-102 (Mathematics and Statistics, vol. II)
ISBN: 978-3-540-26145-2
DOI: 10.1007/3-540-28555-5_9
2004
cxHPC: Setting up ByGRID - First Steps Towards an e-Science Infrastructure in Bavaria
In: High Performance Computing in Science and Engineering, Garching 2004, Berlin, Heidelberg: 2004
2003
Pseudo-Vectorization and RISC Optimization Techniques for the Hitachi SR8000 Architecture
High Performance Computing in Science and Engineering (Munich, 10 October 2002 - 11 October 2002)
In: High Performance Computing in Science and Engineering, Munich 2002: Transactions of the First Joint HLRB and KONWIHR Status and Result Workshop, October 10-11, Technical University of Munich, Germany, New York, LLC: 2003
Awards
- Best Short Paper Award (PMBS 2020) (PMBS@SC20) – 2020
- Best Late-Breaking Paper Award (PMBS 2019) (PMBS@SC19) – 2019
- Best Workshop Paper Award (PPAM 2019) (PPAM) – 2019
- Gauss Award (Gauss Centre for Supercomputing (GCS)) – 2018