Scaling Results From the First Generation of Arm-based Supercomputers
McIntosh-Smith, Simon, Price, James, Poenaru, Andrei, and Deakin, Tom
2019
Cray User Group 2019, Montreal, Canada.

A performance analysis of the first generation of HPC‐optimized Arm processors
McIntosh-Smith, Simon, Price, James, Deakin, Tom, and Poenaru, Andrei
2019
Concurrency and Computation: Practice and Experience

Parallel multiprocessing and scheduling on the heterogeneous Xeon+FPGA platform
Rodriguez, Andres, Navarro, Angeles, Asenjo, Rafael, Corbera, Francisco, Suarez, Dario, Gran, Ruben, and Nunez-Yanez, Jose
2019
The Journal of Supercomputing.

Comparative benchmarking of the first generation of HPC-optimised ARM processors on Isambard
McIntosh-Smith, Simon, Price, James, Deakin, Tom, and Poenaru, Andrei
2018
Cray User Group 2018, Stockholm, Sweden.

Multi-Precision Convolutional Neural Networks on Heterogeneous Hardware
Amiri, Moslem, Hosseinabady, Mohammad, McIntosh-Smith, Simon, and Nunez-Yanez, Jose
2018
Design, Automation and Test in Europe (DATE 2018), Dresden, Germany.

Portable Methods for Measuring Cache Hierarchy Performance
Deakin, Tom, Price, James, and McIntosh-Smith, Simon
2017
Poster session presented at IEEE/ACM SuperComputing, Denver, Colorado, United States.

Correcting Detectable Uncorrectable Errors in Memory
Pawelczak, Grzegorz and McIntosh-Smith, Simon
2017
Poster session presented at IEEE/ACM SuperComputing, Denver, Colorado, United States.

Application-Based Fault Tolerance Techniques for Fully Protecting Sparse Matrix Solvers
Pawelczak, Grzegorz, McIntosh-Smith, Simon, Price, James, and Martineau, Matt
2017
3rd International Workshop on Fault Tolerant Systems (FTS 2017).

Application-based fault tolerance techniques for sparse matrix solvers
McIntosh-Smith, Simon, Hunt, Rob, Price, James, and Warwick Vesztrocy, Alex
2017
International Journal of High Performance Computing Applications.

The Productivity, Portability and Performance of OpenMP 4.5 for Scientific Applications Targeting Intel CPUs, IBM CPUs, and NVIDIA GPUs
Martineau, Matt and McIntosh-Smith, Simon
2017
Proceedings of 13th International Workshop on OpenMP, NY, USA.

On the performance of parallel tasking runtimes for an irregular fast multipole method application
Atkinson, Patrick and McIntosh-Smith, Simon
2017
Proceedings of 13th International Workshop on OpenMP, NY, USA.

Exploiting auto-tuning to analyze and improve performance portability on many-core architectures
Price, J and McIntosh-Smith, S
2017
2nd International Workshop on Performance Portable Programming Models for Accelerators (P^3MA)

On the mitigation of cache hostile memory access patterns on many-core CPU architectures
Deakin, T, Gaudin, W, and McIntosh-Smith, S
2017
High Performance Computing. ISC High Performance 2017. Intel Xeon Phi User Group workshop.

Assessing the performance portability of modern parallel programming models using TeaLeaf
Martineau, Matt, McIntosh-Smith, Simon, and Gaudin, Wayne
2017
Concurrency and Computation: Practice and Experience

Evaluating Attainable Memory Bandwidth of Parallel Programming Models via BabelStream
Deakin, T, Price, J, Martineau, M, and McIntosh-Smith, S
2017
International Journal of Computational Science and Engineering (special issue)

Adaptive voltage scaling in a heterogeneous FPGA device with memory and logic in-situ detectors
Nunez-Yanez, Jose
2017
Microprocessors and Microsystems, vol. 51, pp. 227-238

Pragmatic Performance Portability with OpenMP 4.x
Martineau, Matt, Price, James, McIntosh-Smith, Simon, and Gaudin, Wayne
2016
Proceedings of the 12th International Workshop on OpenMP

Performance Analysis and Optimization of Clang’s OpenMP 4.5 GPU Support
Martineau, Matt, McIntosh-Smith, Simon, Bertolli, Carlo, Jacob, Arpith, Antao, Samuel, Eichenberger, Alexandre, Bercea, Gheorghe-Teodor, Chen, Tong, Jin, Tian, O’Brien, Kevin, Rokos, Georgios, Sung, Hyojin, and Sura, Zehra
2016
Proceedings of the International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS)

Offloading Support for OpenMP in Clang and LLVM
Antao, Samuel, Bataev, Alexey, Jacob, Arpith, Bercea, Gheorghe-Teodor, Eichenberger, Alexandre, Rokos, Georgios, Martineau, Matt, Jin, Tian, Ozen, Guray, Sura, Zehra, Chen, Tong, Sung, Hyojin, Bertolli, Carlo, and O’Brien, Kevin
2016
Proceedings of the Third Workshop on LLVM Compiler Infrastructure in HPC

Software-level Fault Tolerant Framework for Task-based Applications
Yeh, J, Pawelczak, G, Sewart, J, Price, J, Avila Ibarra, A, McIntosh-Smith, S, Bautista-Gomez, L, and Zyulkyarov, F
2016
Poster session presented at IEEE/ACM SuperComputing, Salt Lake City, Utah, United States.

GPU-STREAM: Now in 2D!
Deakin, T, Price, J, Martineau, M, and McIntosh-Smith, S
2016
Poster session presented at IEEE/ACM SuperComputing, Salt Lake City, Utah, United States.

GPU-STREAM v2.0: Benchmarking the Achievable Memory Bandwidth of Many-Core Processors Across Diverse Parallel Programming Models
Deakin, T, Price, J, Martineau, M, and McIntosh-Smith, S
2016
Taufer M., Mohr B., Kunkel J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham

An Evaluation of Emerging Many-Core Parallel Programming Models
Martineau, Matt J, McIntosh-Smith, Simon N, Gaudin, Wayne, and Boulton, Michael
2016
Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM’16). ed. Pavan Balaji; Kai-Cheung Leung. Association for Computing Machinery

Many-core Acceleration of a Discrete Ordinates Transport Mini-app at Extreme Scale
Deakin, Tom, McIntosh-Smith, Simon N, and Gaudin, Wayne
2016
High Performance Computing, Networking and Storage - 30th International Conference, ISC High Performance 2016, Frankfurt, Germany. Springer Verlag (Lecture Notes in Computer Science).

Evaluating OpenMP 4.0’s Effectiveness as a Heterogeneous Parallel Programming Model
Martineau, Matt, McIntosh-Smith, Simon, and Gaudin, Wayne
2016
21st International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS), Chicago, United States.

Towards Portability for a Compressible Finite-Volume CFD Code
Curran, Daniel, Allen, Christian B, and McIntosh-Smith, Simon
2016
54th AIAA Aerospace Sciences Meeting. American Institute of Aeronautics and Astronautics. AIAA 2016-1813.

An improved parallelism scheme for deterministic discrete ordinates transport
Deakin, Tom, McIntosh-Smith, Simon, Martineau, Matt, and Gaudin, Wayne
2016
International Journal of High Performance Computing Applications

Application-Based Fault Tolerance Techniques for Sparse Matrix Solvers
McIntosh-Smith, Simon, Hunt, Rob, Price, James, and Vesztrocy, Alex
2016
International Journal of High Performance Computing Applications

A Performance Evaluation of Kokkos & RAJA using the TeaLeaf Mini-App
Martineau, Matt J, McIntosh-Smith, Simon, Gaudin, Wayne, Boulton, Michael, and Beckingsale, D.A.
2015
Poster session presented at IEEE/ACM SuperComputing, Austin, United States.

GPU-STREAM: Benchmarking the achievable memory bandwidth of Graphics Processing Units
Deakin, Tom and McIntosh-Smith, Simon
2015
Poster session presented at IEEE/ACM SuperComputing, Austin, United States.

OpenMP 4.0 vs. OpenCL: Performance comparison
Vinogradov, Sergei, Fedorova, Julia, Curran, Daniel, McIntosh-Smith, Simon, and Cownie, James
2015
The OpenMP Developers Conference (OpenMPCon), Aachen, Germany.

Exploiting spatial information in datasets to enable fault tolerant sparse matrix solvers
Hunt, Rob and McIntosh-Smith, Simon
2015
IEEE International Conference on Cluster Computing: Fault Tolerant Systems Workshop

Expressing Parallelism on Many-Core for Deterministic Discrete Ordinates Transport
Deakin, Tom, McIntosh-Smith, Simon, and Gaudin, Wayne
2015
IEEE International Conference on Cluster Computing: Workshop on Representative Applications

Developing a Future-Proof CFD Code
Curran, Daniel, McIntosh-Smith, Simon N, Allen, Christian B, and Beckingsale, D.A.
2015
ParCFD, Montreal, Canada.

High dynamic range computational photography on mobile GPUs
McIntosh-Smith, Simon N, Chohan, Amir, Curran, Daniel, and Lokhmotov, Anton
2015
GPU Pro 6 - Advanced Rendering Techniques. ed. Wolfgang Engel

High performance in silico virtual drug screening on many-core processors
McIntosh-Smith, Simon N, Price, James R, Sessions, Richard B, and Avila Ibarra, Amaurys
2015
International Journal of High Performance Computing Applications, Vol. 29, No. 2, 05.2015, p. 119-134.

Improving Auto-Tuning Convergence Times with Dynamically Generated Predictive Performance Models
Price, James and McIntosh-Smith, Simon
2015
Proceedings of IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip. IEEE Computer Society, 2015. p. 211-218.

Oclgrind: An Extensible OpenCL Device Simulator
Price, James R and McIntosh-Smith, Simon N.
2015
IWOCL, Palo Alto, United States.

The OPS Domain Specific Abstraction for Multi-Block Structured Grid Computations
Reguly, Istvan, Mudalige, Gihan, Giles, Michael, Curran, Daniel, and McIntosh-Smith, Simon N.
2014
IEEE/ACM SuperComputing, New Orleans, United States.

Portable performance with OpenCL
McIntosh-Smith, S. N. and Mattson, T.
2014
J Reinders & J Jeffers (eds), High Performance Parallelism Pearls: Multicore and Many-core Programming Approaches. Morgan Kaufmann, pp. 359–375.

On the performance portability of structured grid codes on many-core computer architectures
McIntosh-Smith, S. N., Boulton, M., Curran, D., and Price, J. R.
Jun 2014
International Supercomputing. Springer International Publishing Switzerland, Vol. LNCS 8488, p. 53-75 23 p.

A GPU-accelerated immersive audio-visual framework for interaction with molecular dynamics using consumer depth sensors
Glowacki, D. R., O’Connor, M., Calabro, G., Price, J. R., Tew, P., Mitchell, T., Jyde, J., Tew, D. P., Coughtrie, D. J., and McIntosh-Smith, S. N.
7 May 2014
25 p.

Evaluation of a performance portable lattice Boltzmann code using OpenCL
McIntosh-Smith, S. N. and Curran, D.
May 2014

Optimising Hydrodynamics applications for the Cray XC30 with the application tool suite
Gaudin, W., Mallinson, A., Perks, O., Herdman, J., Levesque, J., Jarvis, S., and McIntosh-Smith, S. N.
May 2014

Rapid Decomposition and Visualisation of Protein-Ligand Binding Free Energies by Residue and by Water
Woods, C. J., Malaisree, M., Long, B. J. O., McIntosh-Smith, S. N., and Mulholland, A. J.
May 2014
23 p.

High performance in silico virtual drug screening on many-core processors
McIntosh-Smith, S. N., Price, J. R., Sessions, R. B., and Avila Ibarra, A.
2014
International Journal of High Performance Computing Applications. 16 p.

Computational Assay of H7N9 Influenza Neuraminidase Reveals R292K Mutation Reduces Drug Binding Affinity
Woods, C. J., Malaisree, M., Long, B., McIntosh-Smith, S., and Mulholland, A. J.
20 Dec 2013
Scientific Reports. 3, 3561, 6 p.

Special issue of the Journal of Parallel and Distributed Computing (JPDC) on novel architectures for high-performance computing
McIntosh-Smith, S., Gillan, C., Sanna, N., Scott, S., and Steinke, T.
Nov 2013
Journal of Parallel and Distributed Computing. 73, 11, p. 1415-1416 2 p.

Porting a commercial application to OpenCL: A case study
Krige, S., Mackey, M., McIntosh-Smith, S. N., and Sessions, R. B.
May 2013

How GPUs can find your next hit: Accelerating virtual screening with OpenCL.
Krige, S., Mackey, M., McIntosh-Smith, S. N., and Sessions, R. B.
Apr 2013

Analysis and assay of oseltamivir-resistant mutants of influenza neuraminidase via direct observation of drug unbinding and rebinding in simulation
Woods, C. J., Malaisree, M., Long, B. J. O., McIntosh-Smith, S. N., and Mulholland, A. J.
2013
Biochemistry. 52, 45, p. 8150–8164

danceroom Spectroscopy: Interactive quantum molecular dynamics accelerated on GPU architectures using OpenCL
Glowacki, D. R., Tew, D. P., Mitchell, T. J. F., Price, J. R., and McIntosh-Smith, S. N.
Dec 2012

Accelerating Hydrocodes with OpenACC, OpenCL and CUDA
Herdman, J. A., Gaudin, W. P., McIntosh-Smith, S. N., Boulton, M., Beckingsale, D. A., Mallinson, A. C., and Jarvis, S. A.
29 Jun 2012
2012 SC Companion: High Performance Computing, Networking Storage and Analysis. IEEE Computer Society, p. 465-471 12 p.

SIMD array operable to process different respective packet protocols simultaneously while executing a single common instruction stream
McIntosh-Smith, S. N., Rhoades, J., Cameron, K., Winser, P., McConnell, R., and Panesar, G.
28 Feb 2012
Patent 8127112

Energy Efficient HPC Software: Learning from Embedded Systems Development
Kerrison, S. P., Eder, K. I., and McIntosh-Smith, S. N.
16 Feb 2012
SIAM Conference on Parallel Processing for Scientific Computing. SIAM, 30 p.

Benchmarking energy efficiency, power costs and carbon emissions on heterogeneous systems
McIntosh-Smith, S., Wilson, T., Ávila Ibarra, A., Crisp, J., and Sessions, R. B.
Feb 2012
The Computer Journal. 55, 2, p. 192 - 205 14 p.

Exploiting OpenCL for heterogeneous computing: A case study
McIntosh-Smith, S. N.
14 Dec 2011

A method for automatically generating analogue benchmark suites using low-level hardware metrics
McIntosh-Smith, S. N. and Thomas, O.
Nov 2011
ACM SIGMETRICS Performance Evaluation Review. ACM, p. 17-18 2 p.

The GPU Computing Revolution: From Multi-Core CPUs to Many-Core Graphics Processors: A Knowledge Transfer Report from the London Mathematical Society and Knowledge Transfer Network for Industrial Mathematics
McIntosh-Smith, S. N.
Sep 2011
London Mathematical Society. 32 p. (Knowledge Transfer Reports)

Accelerated molecular docking with OpenCL
McIntosh-Smith, S. N.
14 Jun 2011

A massively multicore parallelization of the Kohn-Sham energy gradients
Brown, P. S., Woods, C. J., McIntosh-Smith, S. N., and Manby, F. R.
Jul 2010
Journal of Computational Chemistry. 31, 10, p. 2008 - 2013 6 p.

Parallel path tracing using incoherent path-atom binning
Coulthurst, D. J., Dubla, P. B., Debattista, K., McIntosh-Smith, S. N., and Chalmers, A. G.
23 Apr 2008
p. 91-95

A 50-GFLOPS Processor for Scientific Computing and DSP
McIntosh-Smith, S. N.
Oct 2004

Turbo-charged applications on ClearSpeed’s streaming processors
McIntosh-Smith, S. N.
2004

Intelligent Algorithm Decomposition for Parallelism
Brown, M., Hurley, S., and McIntosh-Smith, S. N.
22 Jun 1994
p. 489-496

Intelligent Algorithm Decomposition for Parallelism with Alfer
McIntosh-Smith, S. N., Brown, M., and Hurley, S.
24 Apr 1994
p. 47-56