Publications

2025

Dinghong Song, Yuan Feng, Yiwei Wang, Shangye Chen, Cyril Guyot, Filip Blagojevic, Hyeran Jeon, Pengfei Su and Dong Li. AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache. arXiv.

Dinghong Song, Jierui Xu, Weichu Yang, Pengfei Su and Dong Li. NeuronMM: High-Performance Matrix Multiplication for LLM Inference on AWS Trainium. arXiv.

Tingfeng Lan, Yusen Wu, Bin Ma, Zhaoyuan Sun, Rui Yang, Tekin Bicer, Dong Li and Yue Cheng. ZenFlow: Enabling Stall-Free Offloading Training via Asynchronous Updates. arXiv.

[SC] Xi (Sherry) Wang, Bin Ma, Jongryool Kim, Byungil Koh, Hoshik Kim, and Dong Li. cMPI: Using CXL Memory Sharing for MPI One-Sided and Two-Sided Inter-Node Communications. In 37th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (acceptance rate: 21.2%).

[SC] Bin Ma, Viktor Nikitin, Xi (Sherry) Wang, Tekin Bicer, and Dong Li. mLR: Scalable Laminography Reconstruction based on Memoization. In 37th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (acceptance rate: 21.2%).

[IPDPS] Xi (Sherry) Wang, Jie Liu, Jianbo Wu, Shuangyan Yang, Jie Ren, Bhanu Shankar, and Dong Li. Performance Characterization of CXL Memory and Its Use Cases. In 39th IEEE International Parallel and Distributed Processing Symposium. (acceptance rate: %).

[HPCA] Bin Ma, Jie Ren, Shuangyan Yang, Benjamin Francis, Ehsan Ardestani, Min Si, and Dong Li. Machine Learning-Guided Memory Optimization for DLRM Inference on Tiered Memory. In 31st International Symposium on High-Performance Computer Architecture (acceptance rate: 21%).

[HPCA] Shuangyan Yang, Minjia Zhang, and Dong Li. Buffalo: Enabling Large-Scale GNN Training via Memory-Efficient Bucketization. In 31st International Symposium on High-Performance Computer Architecture (acceptance rate: 21%).

[MASCOTS] Kihyun Kim, Jinwoo Kim, James J. Kim, Dong Li, and Youngjae Kim. Flexible and Cost-Efficient LLM Serving with Heterogeneous GPUs. In IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (acceptance rate: 24%).

2024

[ATC] Dong Xu, Junhee Ryu, Jinho Baek, Kwangsik Shin, Pengfei Su, and Dong Li. FlexMem: Adaptive Page Profiling and Migration for Tiered Memory. In 30th USENIX Annual Technical Conference (acceptance rate: 15.7%).

[EuroSys] Jie Ren, Dong Xu, Junhee Ryu, Kwangsik Shin, Daewoo Kim, and Dong Li. MTM: Rethinking Memory Profiling and Migration for Multi-Tiered Large Memory Systems. In European Conference on Computer Systems (acceptance rate: 15.9%).

[HPCA] Jie Ren, Dong Xu, Shuangyan Yang, Jiacheng Zhao, Zhicheng Li, Christian Navasca, Chenxi Wang, Harry Xu, and Dong Li. Enabling Large Dynamic Neural Network Training with Learning-based Memory Management. In 30th International Symposium on High-Performance Computer Architecture (acceptance rate: 18%).

[SC] Dong Xu, Yuan Feng, Kwangsik Shin, Daewoo Kim, Hyeran Jeon, and Dong Li. Efficient Tensor Offloading for Large Deep-Learning Model Training based on Compute Express Link. In 36th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2024 (acceptance rate: 22.7%).

[MEMSYS] Jianbo Wu, Jie Liu, Gokcen Kestor, Roberto Gioiosa and Dong Li. Performance Study of CXL Memory Topology. In 10th International Symposium on Memory Systems.

[HPCI] Bin Ma, Viktor Nikitin, Dong Li, and Tekin Bicer. Accelerated Laminographic Image Reconstruction Using GPUs. In High Performance Computing for Imaging.

[TR] Jianbo Wu, Jie Ren, Shuangyan Yang, Konstantinos Parasyris, Giorgis Georgakoudis, Ignacio Laguna, and Dong Li. LM-Offload: Performance Model-Guided Generative Inference of Large Language Models with Parallelism Control. EECS Technical Report. University of California, Merced.

2023

[HPDC] Wenqian Dong, Gokcen Kestor, and Dong Li. Auto-HPCnet: An Automatic Framework to Build Neural Network-based Surrogate for High-Performance Computing Applications. In 32nd International Symposium on High-Performance Parallel and Distributed Computing (acceptance rate:)

[PPoPP] Zhen Xie, Jie Liu, Jiajia Li, and Dong Li. Merchandiser: Data Placement on Heterogeneous Memory for Task-Parallel HPC Applications with Load-Balance Awareness. In 28th Principles and Practice of Parallel Programming (acceptance rate: 23.7%)

[ASPLOS] Shuangyan Yang, Minjia Zhang, Wenqian Dong, and Dong Li. Betty: Enabling Large-Scale GNN Training with Batch-Level Graph Partitioning. In 28th Architectural Support for Programming Languages and Operating Systems (acceptance rate: %)

[arXiv] Jie Ren, Dong Xu, Ivy Peng, Junhee Ryu, Kwangsik Shin, Daewoo Kim, and Dong Li. Rethinking Memory Profiling and Migration for Multi-Tiered Large Memory Systems.

[arXiv] Yuan Feng, Hyeran Jeon, Filip Blagojevic, Cyril Guyot, Qing Li, and Dong Li. MEMO: Accelerating Transformers with Memoization on Big Memory Systems.

2022

[ATC] Xin He, Jianhua Sun, Hao Chen, and Dong Li. Campo: Cost-Aware Performance Optimization for Mixed-Precision Neural Network Training. In 28th USENIX Annual Technical Conference (acceptance rate: 16.3%)

[ICPP] Jie Liu, Bogdan Nicolae, and Dong Li. Lobster: Load Balance-Aware I/O for Distributed DNN Training. In 51st International Conference on Parallel Processing (acceptance rate: 27%).

[PPoPP] Zhen Xie, Jie Liu, Sam Ma, Jiajia Li, and Dong Li. LB-HM: Load Balance-Aware Data Placement on Heterogeneous Memory for Task-Parallel HPC Applications. Poster in Principles and Practice of Parallel Programming.

2021

[SEC] Jie Liu, Jiawen Liu, Zhen Xie, Xia Ning, and Dong Li. Flame: A Self-Adaptive Auto-Labeling System for Heterogeneous Mobile Processors. The Sixth ACM/IEEE Symposium on Edge Computing (acceptance rate: ).

[VLDB] Jie Liu, Wenqian Dong, Qingqing Zhou, and Dong Li. Fauce: Fast and Accurate Deep Ensembles with Uncertainty for Cardinality Estimation. In 47th International Conference on Very Large Data Bases (acceptance rate: ).

[ATC] Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li and Yuxiong He. ZeRO-Offload: Democratizing Billion-Scale Model Training. In 27th USENIX Annual Technical Conference (acceptance rate: 18.8%) (arXiv link) (Media report 1) (Media report 2)

[ICS] Zhen Xie, Wenqian Dong, Jie Liu, Ivy Peng, Yanbao Ma and Dong Li. MD-HM: Memoization-based Molecular Dynamics Simulations on Big Memory System. In 35th International Conference on Supercomputing (acceptance rate: 24%).

[ICS] Xin He, Jiawen Liu, Zhen Xie, Hao Chen, Guoyang Chen, Weifeng Zhang and Dong Li. Enabling Energy-Efficient DNN Training on Hybrid GPU-FPGA Accelerators. In 35th International Conference on Supercomputing (acceptance rate: 24%).

[ICS] Jie Ren, Jiaolin Luo, Ivy Peng, Kai Wu and Dong Li. Optimizing Large-Scale Plasma Simulations on Persistent Memory-based Heterogeneous Memory with Effective Data Placement Across Memory Hierarchy. In 35th International Conference on Supercomputing (acceptance rate: 24%).

[ICS] Jiawen Liu, Dong Li, Roberto Gioiosa and Jiajia Li. Athena: High-Performance Sparse Tensor Contraction Sequence on Heterogeneous Memory. In 35th International Conference on Supercomputing (acceptance rate: 24%).

[EuroSys] Zhen Xie, Wenqian Dong, Jiawen Liu, Hang Liu and Dong Li. Tahoe: Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU. In European Conference on Computer Systems (acceptance rate: 20.9%). Link to Tahoe.

[FAST] Kai Wu, Jie Ren, Ivy Peng and Dong Li. ArchTM: Architecture-Aware, High Performance Transaction for Persistent Memory. In 19th USENIX Conference on File and Storage Technologies (acceptance rate: 21.5%).

[ASPLOS] Bang Di, Jiawen Liu, Hao Chen and Dong Li. Fast, Flexible and Comprehensive Bug Detection for Persistent Memory Programs. In 26th Architectural Support for Programming Languages and Operating Systems (acceptance rate: 18.8%, distinguished artifact award). Full paper. Extended abstract (2 pages). Link to PMDebugger.

[PPoPP] Jiawen Liu, Jie Ren, Roberto Gioiosa, Dong Li and Jiajia Li. Sparta: High-Performance, Element-Wise Sparse Tensor Contraction on Heterogeneous Memory. In 26th Principles and Practice of Parallel Programming (acceptance rate: 20.7%). Link to Sparta.

[HPCA] Jie Ren, Jiaolin Luo, Kai Wu, Minjia Zhang, Hyeran Jeon and Dong Li. Sentinel: Efficient Tensor Migration and Allocation on Heterogeneous Memory Systems for Deep Learning. In 27th IEEE International Symposium on High-Performance Computer Architecture (acceptance rate: 24.4%).

[TPDS] Bang Di, Jianhua Sun, Hao Chen, and Dong Li. Efficient Buffer Overflow Detection on GPU. IEEE Transactions on Parallel and Distributed Systems.

[TPDS] Santosh Pandey, Zhibin Wang, Sheng Zhong, Chen Tian, Bolong Zheng, Xiaoye Li, Lingda Li, Adolfy Hoisie, Caiwen Ding, Dong Li, and Hang Liu. TRUST: Triangle Counting Reloaded on GPUs. IEEE Transactions on Parallel and Distributed Systems.

[JPDC] Luanzheng Guo, Dong Li, and Ignacio Laguna. PARIS: Predicting Application Resilience Using Machine Learning. Journal of Parallel and Distributed Computing.

2020

[NeurIPS] Jie Ren, Minjia Zhang and Dong Li. HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory. In 34th Conference on Neural Information Processing Systems (acceptance rate: 20%).

[Cluster] Jie Ren, Kai Wu and Dong Li. Exploring Non-Volatility of Non-Volatile Memory for High Performance Computing Under Failures. In IEEE International Conference on Cluster Computing (acceptance rate: %). (Link to the tech report) (Link to the NVC tool)

[PACT] Kai Wu, Ivy B. Peng, Jie Ren and Dong Li. Ribbon: High Performance Cache Line Flushing for Persistent Memory. In 29th International Conference on Parallel Architectures and Compilation Techniques (acceptance rate: 25%).

[SC] Wenqian Dong, Zhen Xie, Gokcen Kestor and Dong Li. Smart-PGSim: Using Neural Network to Accelerate AC-OPF Power Grid Simulation. In 32nd ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (acceptance rate: 22.3%) (Media report)

[SC] Jiaolin Luo, Luanzheng Guo, Jie Ren, Kai Wu and Dong Li. Enabling Faster NGS Analysis on Optane-based Heterogeneous Memory. Poster in 32nd ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[IPDPS] Ivy Peng, Kai Wu, Jie Ren, Dong Li and Maya Gokhale. Demystifying the Performance of HPC Scientific Applications on NVM-based Memory Systems. In 34th IEEE International Parallel and Distributed Processing Symposium. (acceptance rate: %).

[IISWC] Luanzheng Guo, Giorgis Georgakoudis, Konstantinos Parasyris, Ignacio Laguna and Dong Li. MATCH: An MPI Fault Tolerance Benchmark Suite. In IEEE International Symposium on Workload Characterization (acceptance rate: %) (Media report).

[USENIX OpML] Jiawen Liu, Zhen Xie, Dimitrios Nikolopoulos and Dong Li. RIANN: Real-time Incremental Learning with Approximate Nearest Neighbor on Mobile Devices. In USENIX Conference on Operational Machine Learning.

[MLSys-W] Jie Liu, Jiawen Liu, Zhen Xie and Dong Li. Flame: A Self-Adaptive Auto-Labeling System for Heterogeneous Mobile Processors. In On-Device Intelligence Workshop at Machine Learning and Systems Conference.

[TR] Jie Ren, Kai Wu and Dong Li. EasyCrash: Exploring Non-Volatility of Non-Volatile Memory for High Performance Computing Under Failures. EECS Technical Report. University of California, Merced.

[TR] Jie Ren, Jiaolin Luo, Ivy Peng, Kai Wu and Dong Li. Understanding and Optimizing Large-scale Plasma Simulations on Persistent Memory-based Systems. EECS Technical Report. University of California, Merced.

2019

[SC] Wenqian Dong, Jie Liu, Zhen Xie and Dong Li. Adaptive Neural Network-Based Approximation to Accelerate Eulerian Fluid Simulation. In 31st ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (acceptance rate: 22.6%).

[IPDPS] Jiawen Liu, Dong Li, Gokcen Kestor, and Jeffrey Vetter. Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training. In 33rd IEEE International Parallel and Distributed Processing Symposium. (acceptance rate: 27.7%).

[IPDPS] Luanzheng Guo and Dong Li. MOARD: Modeling Application Resilience to Transient Faults on Data Objects. In 33rd IEEE International Parallel and Distributed Processing Symposium. (acceptance rate: 27.7%).

[ICPADS] Jie Liu, Jiawen Liu, Wan Du and Dong Li. Performance Analysis and Characterization of Training Deep Learning Models on NVIDIA TX2. In 25th IEEE International Conference on Parallel and Distributed Systems (acceptance rate: 28.0%).

[MCHPC] Ivy B. Peng, Marty McFadden, Eric Green, Keita Iwabuchi, Kai Wu, Dong Li, Roger Pearce, and Maya Gokhale. UMap: Enabling Application-driven Optimizations for Page Management. In Workshop on Memory Centric High Performance Computing.

Jie Ren, Jiaolin Luo, Kai Wu, Minjia Zhang, and Dong Li. Sentinel: Runtime Data Management on Heterogeneous Main Memory Systems for Deep Learning. (arXiv Link).

Jie Ren, Kai Wu, and Dong Li. EasyCrash: Exploring Non-Volatility of Non-Volatile Memory for High Performance Computing Under Failures. (arXiv Link).

Kai Wu, Jie Ren, and Dong Li. Architecture-Aware, High Performance Transaction for Persistent Memory. (arXiv Link).

2018

[MICRO] Jiawen Liu, Hengyu Zhao, Matheus Ogleari, Dong Li, and Jishen Zhao. Processing-in-Memory for Energy-efficient Neural Network Training: A Heterogeneous Approach. In 51st IEEE/ACM International Symposium on Microarchitecture (acceptance rate: 21.3%).

[SC] Luanzheng Guo, Dong Li, Ignacio Laguna, and Martin Schulz. FlipTracker: Understanding Natural Error Resilience in HPC Applications. In 30th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (acceptance rate: 23.6%). (Media report)

[SC] Kai Wu, Jie Ren, and Dong Li. Runtime Data Management on Non-Volatile Memory-Based Heterogeneous Memory for Task Parallel Programs. In 30th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (acceptance rate: 23.6%).

[PACT] Bang Di, Jianhua Sun, Dong Li, Hao Chen, and Zhe Quan. GMOD: A Dynamic GPU Memory Overflow Detector. In 27th International Conference on Parallel Architectures and Compilation Techniques (acceptance rate: 28.6%).

[ICPP] Kai Wu, Wenqian Dong, Qiang Guan, Nathan DeBardeleben, and Dong Li. Modeling Application Resilience in Large Scale Parallel Execution. In 47th International Conference on Parallel Processing.

[MCHPC] Jie Ren, Kai Wu, and Dong Li. Understanding Application Recomputability without Crash Consistency in Non-Volatile Memory. In Workshop on Memory Centric Programming for HPC.

[NVMW] Jie Ren, Kai Wu, and Dong Li. Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC. In 9th Annual Non-Volatile Memories Workshop.

[NVMW] Kai Wu and Dong Li. Unimem: Runtime Data Management on Non-Volatile Memory-based Heterogeneous Main Memory. In 9th Annual Non-Volatile Memories Workshop.

2017

[SC] Kai Wu, Yingchao Huang, and Dong Li. Unimem: Runtime Data Management on Non-Volatile Memory-based Heterogeneous Main Memory. In 29th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (acceptance rate: 18.7%).

[SC] Kai Wu, Qiang Guan, Nathan DeBardeleben, and Dong Li. Characterization and Comparison of Application Resilience for Serial and Parallel Codes. Poster in 29th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[Cluster] Shuo Yang, Kai Wu, Yifan Qiao, Dong Li, and Jidong Zhai. Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC. In IEEE International Conference on Cluster Computing (acceptance rate: 21.8%).

[Cluster] Yingchao Huang and Dong Li. Performance Modeling for Optimal Data Placement on GPU with Heterogeneous Memory Systems. In IEEE International Conference on Cluster Computing (acceptance rate: 21.8%).

[NAS] Wei Liu, Kai Wu, Jialin Liu, Feng Chen, and Dong Li. Performance Evaluation and Modeling of HPC I/O on Non-Volatile Memory. In 12th International Conference on Networking, Architecture, and Storage.

[ICPADS] Xin He, Zhiwen Chen, Jianhua Sun, Hao Chen, Dong Li and Zhe Quan. Exploring Synchronization in Cache Coherent Manycore Systems: A Case Study with Xeon Phi. In 23rd International Conference on Parallel and Distributed Systems.

[TR] Kai Wu, Frank Ober, Shari Hamlin, and Dong Li. Early Evaluation of Intel Optane Non-Volatile Memory with HPC I/O Workloads. Technical Report, PASA Lab.

Luanzheng Guo, Hanlin He, and Dong Li. Application-Level Resilience Modeling for HPC Fault Tolerance. (arXiv link).

Yingchao Huang, Kai Wu, and Dong Li. High Performance Data Persistence in Non-Volatile Memory for Resilient High Performance Computing. (arXiv link).

2016

[HPDC] Panruo Wu, Dong Li, Zizhong Chen, Jeffrey S. Vetter, and Sparsh Mittal. Algorithm-Directed Data Placement in Explicitly Managed Non-Volatile Memory. In 25th ACM International Symposium on High Performance Parallel and Distributed Computing (acceptance rate: 16%).

[MEMSYS] Yuxiong Zhu, Borui Wang, Dong Li, and Jishen Zhao. Integrated Thermal Analysis for Processing In Die-Stacking Memory. In International Symposium on Memory Systems.

[SC] Luanzheng Guo, Jing Liang, and Dong Li. Understanding Ineffectiveness of Application-Level Fault Injection. Poster in ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (Nominated as the best poster, 2.9% of all poster submissions).

[HUCAA] Borui Wang, Martin Torres, Dong Li, Jishen Zhao, and Florin Rusu. Performance Implications of Processing-in-Memory Designs on Data-Intensive Applications. In 5th Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications.

[MODSIM] Dong Li. Modeling Methods and Tools for Optimizing Data Placement in Heterogeneous Memory Systems of GPU. In Workshop on Modeling and Simulation of Systems and Applications.

[DataCloud] Mehdi Bahrami, Dong Li, Mukesh Singhal, and Ashish Kundu. An Efficient Parallel Implementation of a Light-Weight Data Privacy Method for Mobile Cloud Users. Accepted as a short paper in the DataCloud workshop associated with SC’16.

[TC] Guoyang Chen, Xipeng Shen, Bo Wu, and Dong Li. Optimizing Data Placement on GPU Memory: A Portable Approach. Accepted in IEEE Transactions on Computers.

[TR] Yingchao Huang, Dong Li. Performance Modeling for Optimal Data Placement on GPU with Heterogeneous Memory Systems. Technical report (PASA-001-UCMERCED-2016), PASA Lab, University of California, Merced.

2015

[DATE] Poremba, M., Mittal, S., Li, D., Vetter, J. S., and Xie, Y. DESTINY: A Tool for Modeling Emerging 3D NVM and eDRAM Caches. In IEEE Design Automation and Test in Europe Conference and Exhibition.

[ICS] Wu, B., Chen, G., Li, D., Shen, X., and Vetter, J. S. Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations. In International Conference on Supercomputing.

[IEEE MICRO] Chen, G., Wu, B., Li, D., and Shen, X. Enabling Portable Optimizations of Data Placement on GPU. The Heterogeneous Computing special issue of IEEE Micro, (July/August Issue).

[Cluster] Feng, K., Venkata, M. G., Li, D., and Sun, X. Fast Fault Injection and Sensitivity Analysis for Collective Communications. In IEEE Cluster Conference.

2014

[ICCS] Tan, L., Chen, L., Chen, Z., Zong, Z., Ge, R., and Li, D. DAEMON: Distributed Adaptive Energy-efficient Matrix-multiplicatiON. In International Conference on Computational Science.

[HPDC] Mittal, S., Vetter, J. S., and Li, D. Improving Energy Efficiency of Embedded DRAM Caches for High-End Computing Systems. In ACM International Symposium on High Performance Parallel and Distributed Computing.

[IPDPS] Lee, S., Li, D., and Vetter, J. S. Interactive Program Debugging and Optimization for Directive-Based, Efficient GPU Computing. In IEEE International Parallel and Distributed Processing Symposium.

[ISVLSI] Mittal, S., Vetter, J. S., and Li, D. LastingNVCache: A Technique for Improving the Lifetime of Non-Volatile Caches. In IEEE Annual Symposium on VLSI.

[MICRO] Chen, G., Wu, B., Li, D., and Shen, X. PORPLE: An Extensible Optimizer for Portable Data Placement on GPU. In IEEE/ACM International Symposium on Microarchitecture.

[SC] Yu, L., Li, D., Mittal, S., and Vetter, J. S. Quantitatively Modeling Application Resiliency with the Data Vulnerability Factor. In International Conference for High Performance Computing, Networking, Storage and Analysis (Nominated as the best student paper).

[GLSVLSI] Mittal, S., Vetter, J. S., and Li, D. WriteSmoothing: Improving Lifetime of Non-volatile Caches Using Intra-set Wear-Leveling. In ACM Great Lakes Symposium on VLSI.

2013

[IPCCC] Tan, L., Chen, Z., Ge, R., Zong, Z., and Li, D. Adaptively Aggressive Energy Efficient DVFS Scheduling for Data Intensive Applications. In IEEE International Performance Computing and Communications Conference.

[Resilience] Li, D., Lee, S., and Vetter, J. S. Evaluating the Viability of Application-Driven Cooperative CPU/GPU Fault Detection. In International Workshop on Resilience in High Performance Computing.

[PACT] Wang, B., Wu, B., Li, D., Shen, X., Yu, W., Jiao, Y., and Vetter, J. S. Exploring Hybrid Memory for GPU Energy Efficiency through Software-Hardware Co-Design. In International Symposium on Parallel Architectures and Compilation Techniques.

[SC] Li, D., Chen, Z., Wu, P., and Vetter, J. S. Rethinking Algorithm-Based Fault Tolerance with a Cooperative Software-Hardware Approach. In ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[MASCOTS] Wang, B., Jiao, Y., Yu, W., Shen, X., Li, D., and Vetter, J. S. A Versatile Performance and Energy Simulation Tool for Composite GPU Global Memory. In IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.

2012

[SC] Li, D., Vetter, J. S., and Yu, W. Classifying Soft Error Vulnerabilities in Extreme-Scale Scientific Applications Using a Binary Instrumentation Tool. In International Conference for High Performance Computing, Networking, Storage and Analysis.

[PER] Su, C.-Y., Li, D., Nikolopoulos, D., Grove, M., Cameron, K. W., and de Supinski, B. (2012). Critical Path-Based Thread Placement for NUMA Systems. ACM SIGMETRICS Performance Evaluation Review, 106-112.

[IPDPS] Li, D., Vetter, J., Marin, G., McCurdy, C., Cira, C., Liu, Z., and Yu, W. Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications. In 26th IEEE International Parallel and Distributed Processing Symposium.

[IISWC] Su, C.-Y., Li, D., Nikolopoulos, D. S., Cameron, K. W., de Supinski, B. R., and Leon, E. A. Model-Based, Memory-Centric Performance and Power Optimization on NUMA Multiprocessors. In International Symposium on Workload Characterization.

[MASCOTS] Liu, Z., Wang, B., Carpenter, P., Li, D., Vetter, J., and Yu, W. PCM-Based Durable Write Cache for Fast Disk I/O. In International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.

[TPDS] Li, D., de Supinski, B., Schulz, M., Nikolopoulos, D. S., and Cameron, K. W. Strategies for Energy Efficient Resource Management of Hybrid Programming Models. IEEE Transactions on Parallel and Distributed Systems, 24, 144-157.

[CF] Spafford, K., Meredith, J., Lee, S., Li, D., Roth, P., and Vetter, J. The Tradeoffs of Fused Memory Hierarchies in Heterogeneous Architectures. In International Conference on Computing Frontiers.

2011

[PMBS] Su, C.-Y., Li, D., Nikolopoulos, D. S., Grove, M., Cameron, K. W., and de Supinski, B. Critical Path-Based Thread Placement for NUMA Systems. In International Workshop on Modeling, Benchmarking and Simulation of High Performance Computing Systems.

[SRMPDS] Li, D., Byna, S., and Chakradhar, S. Energy-Aware Workload Consolidation on GPU. In International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems.

[CF] Li, D., Nikolopoulos, D. S., Cameron, K. W., de Supinski, B., and Schulz, M. Scalable Memory Registration for High-Performance Networks Using Helper Threads. In International Conference on Computing Frontiers.

2010

[IPDPS] Li, D., de Supinski, B., Schulz, M., Nikolopoulos, D. S., and Cameron, K. W. Hybrid MPI/OpenMP Power-Aware Computing. In International Parallel and Distributed Processing Symposium.

[JPEDS] Cao, Z., Easterling, D., Watson, L., Li, D., Cameron, K. W., and Feng, W.-C. Power Saving Experiments for Large Scale Global Optimization. International Journal of Parallel, Emergent and Distributed Systems.

[IPDPS] Li, D., Nikolopoulos, D. S., Cameron, K. W., de Supinski, B., and Schulz, M. Power-Aware MPI Task Aggregation Prediction for High-End Computing Systems. In International Parallel and Distributed Processing Symposium.

[TPDS] Ge, R., Feng, X., Song, S., Chang, H.-C., Li, D., and Cameron, K. W. PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications. IEEE Transactions on Parallel and Distributed Systems, 21.

[ICPP] Li, D., Ge, R., and Cameron, K. W. System-level, Unified In-band and Out-of-band Dynamic Thermal Control. In International Conference on Parallel Processing.

2009

[IPDPS] Li, D., de Supinski, B., Schulz, M., Cameron, K. W., and Nikolopoulos, D. S. Poster: Model-based Hybrid MPI/OpenMP Power-Aware Computing. In International Conference for High Performance Computing, Networking, Storage and Analysis.

2008

[ICDCN] Li, D., Huang, S., and Cameron, K. W. CG-Cell: An NPB Benchmark Implementation on Cell Broadband Engine. In International Conference on Distributed Computing and Networking.

[HP-PAC] Li, D., Chang, H.-C., Pyla, H., and Cameron, K. W. System-level, Thermal-Aware Fully-loaded Process Scheduling. In International Workshop on High-Performance Power-Aware Computing.

2007

[SC] Pyla, H., Li, D., and Cameron, K. W. (2007). Poster: Thermal-Aware High Performance Computing Using TEMPEST. In International Conference on High Performance Computing, Networking, Storage and Analysis.

Parallel Architecture, System, and Algorithm Lab
