University of California, Merced

Parallel Architecture, System, and Algorithm Lab



2020


    [IISWC] Luanzheng Guo, Giorgis Georgakoudis, Konstantinos Parasyris, Ignacio Laguna and Dong Li. MATCH: An MPI Fault Tolerance Benchmark Suite. In IEEE International Symposium on Workload Characterization (acceptance rate: %).

    [Cluster] Jie Ren, Kai Wu and Dong Li. Exploring Non-Volatility of Non-Volatile Memory for High Performance Computing Under Failures. In IEEE International Conference on Cluster Computing (acceptance rate: %). (Link to the tech report) (Link to the NVC tool)

    [PACT] Kai Wu, Ivy B. Peng, Jie Ren and Dong Li. Ribbon: High Performance Cache Line Flushing for Persistent Memory. In 29th International Conference on Parallel Architectures and Compilation Techniques (acceptance rate: 25%).

    [SC] Wenqian Dong, Zhen Xie, Gokcen Kestor and Dong Li. Smart-PGSim: Using Neural Network to Accelerate AC-OPF Power Grid Simulation. In 32nd ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (acceptance rate: 22.3%).

    [SC] Jiaolin Luo, Luanzheng Guo, Jie Ren, Kai Wu and Dong Li. Enabling Faster NGS Analysis on Optane-based Heterogeneous Memory. Poster in 32nd ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

    [IPDPS] Ivy Peng, Kai Wu, Jie Ren, Dong Li and Maya Gokhale. Demystifying the Performance of HPC Scientific Applications on NVM-based Memory Systems. In 34th IEEE International Parallel and Distributed Processing Symposium. (acceptance rate: %).

    [USENIX OpML] Jiawen Liu, Zhen Xie, Dimitrios Nikolopoulos and Dong Li. RIANN: Real-time Incremental Learning with Approximate Nearest Neighbor on Mobile Devices. In USENIX Conference on Operational Machine Learning.

    [MLSys-W] Jie Liu, Jiawen Liu, Zhen Xie and Dong Li. Flame: A Self-Adaptive Auto-Labeling System for Heterogeneous Mobile Processors. In On-Device Intelligence Workshop at Machine Learning and Systems Conference.

    [TR] Jie Ren, Kai Wu and Dong Li. EasyCrash: Exploring Non-Volatility of Non-Volatile Memory for High Performance Computing Under Failures. EECS Technical Report. University of California, Merced.


2019


    [SC] Wenqian Dong, Jie Liu, Zhen Xie and Dong Li. Adaptive Neural Network-Based Approximation to Accelerate Eulerian Fluid Simulation. In 31st ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (acceptance rate: 22.6%).

    [IPDPS] Jiawen Liu, Dong Li, Gokcen Kestor, and Jeffrey Vetter. Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training. In 33rd IEEE International Parallel and Distributed Processing Symposium. (acceptance rate: 27.7%).

    [IPDPS] Luanzheng Guo and Dong Li. MOARD: Modeling Application Resilience to Transient Faults on Data Objects. In 33rd IEEE International Parallel and Distributed Processing Symposium. (acceptance rate: 27.7%).

    [ICPADS] Jie Liu, Jiawen Liu, Wan Du and Dong Li. Performance Analysis and Characterization of Training Deep Learning Models on NVIDIA TX2. In 25th IEEE International Conference on Parallel and Distributed Systems (acceptance rate: 28.0%).

    [MCHPC] Ivy B. Peng, Marty McFadden, Eric Green, Keita Iwabuchi, Kai Wu, Dong Li, Roger Pearce, and Maya Gokhale. UMap: Enabling Application-driven Optimizations for Page Management. In Workshop on Memory Centric High Performance Computing.

    Jie Ren, Jiaolin Luo, Kai Wu, Minjia Zhang, and Dong Li. Sentinel: Runtime Data Management on Heterogeneous Main Memory Systems for Deep Learning. (arXiv Link).

    Jie Ren, Kai Wu, and Dong Li. EasyCrash: Exploring Non-Volatility of Non-Volatile Memory for High Performance Computing Under Failures. (arXiv Link).

    Kai Wu, Jie Ren, and Dong Li. Architecture-Aware, High Performance Transaction for Persistent Memory. (arXiv Link).


2018


    [MICRO] Jiawen Liu, Hengyu Zhao, Matheus Ogleari, Dong Li, and Jishen Zhao. Processing-in-Memory for Energy-efficient Neural Network Training: A Heterogeneous Approach. In 51st IEEE/ACM International Symposium on Microarchitecture (acceptance rate: 21.3%).

    [SC] Luanzheng Guo, Dong Li, Ignacio Laguna, and Martin Schulz. FlipTracker: Understanding Natural Error Resilience in HPC Applications. In 30th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (acceptance rate: 23.6%).

    [SC] Kai Wu, Jie Ren, and Dong Li. Runtime Data Management on Non-Volatile Memory-Based Heterogeneous Memory for Task Parallel Programs. In 30th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (acceptance rate: 23.6%).

    [PACT] Bang Di, Jianhua Sun, Dong Li, Hao Chen, and Zhe Quan. GMOD: A Dynamic GPU Memory Overflow Detector. In 27th International Conference on Parallel Architectures and Compilation Techniques (acceptance rate: 28.6%).

    [ICPP] Kai Wu, Wenqian Dong, Qiang Guan, Nathan DeBardeleben, and Dong Li. Modeling Application Resilience in Large Scale Parallel Execution. In 47th International Conference on Parallel Processing.

    [MCHPC] Jie Ren, Kai Wu, and Dong Li. Understanding Application Recomputability without Crash Consistency in Non-Volatile Memory. In Workshop on Memory Centric Programming for HPC.

    [NVMW] Jie Ren, Kai Wu, and Dong Li. Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC. In 9th Annual Non-Volatile Memories Workshop.

    [NVMW] Kai Wu and Dong Li. Unimem: Runtime Data Management on Non-Volatile Memory-based Heterogeneous Main Memory. In 9th Annual Non-Volatile Memories Workshop.


2017


    [SC] Kai Wu, Yingchao Huang, and Dong Li. Unimem: Runtime Data Management on Non-Volatile Memory-based Heterogeneous Main Memory. In 29th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (acceptance rate: 18.7%).

    [SC] Kai Wu, Qiang Guan, Nathan DeBardeleben, and Dong Li. Characterization and Comparison of Application Resilience for Serial and Parallel Codes. Poster in 29th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

    [Cluster] Shuo Yang, Kai Wu, Yifan Qiao, Dong Li, and Jidong Zhai. Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC. In IEEE International Conference on Cluster Computing (acceptance rate: 21.8%).

    [Cluster] Yingchao Huang and Dong Li. Performance Modeling for Optimal Data Placement on GPU with Heterogeneous Memory Systems. In IEEE International Conference on Cluster Computing (acceptance rate: 21.8%).

    [NAS] Wei Liu, Kai Wu, Jialin Liu, Feng Chen, and Dong Li. Performance Evaluation and Modeling of HPC I/O on Non-Volatile Memory. In 12th International Conference on Networking, Architecture, and Storage.

    [ICPADS] Xin He, Zhiwen Chen, Jianhua Sun, Hao Chen, Dong Li and Zhe Quan. Exploring Synchronization in Cache Coherent Manycore Systems: A Case Study with Xeon Phi. In 23rd International Conference on Parallel and Distributed Systems.

    [TR] Kai Wu, Frank Ober, Shari Hamlin, and Dong Li. Early Evaluation of Intel Optane Non-Volatile Memory with HPC I/O Workloads. Technical Report, PASA Lab.

    Luanzheng Guo, Hanlin He, and Dong Li. Application-Level Resilience Modeling for HPC Fault Tolerance. (arXiv link).

    Yingchao Huang, Kai Wu, and Dong Li. High Performance Data Persistence in Non-Volatile Memory for Resilient High Performance Computing. (arXiv link).


2016


    [HPDC] Panruo Wu, Dong Li, Zizhong Chen, Jeffrey S. Vetter, and Sparsh Mittal. Algorithm-Directed Data Placement in Explicitly Managed Non-Volatile Memory. In 25th ACM International Symposium on High Performance Parallel and Distributed Computing (acceptance rate: 16%).

    [MEMSYS] Yuxiong Zhu, Borui Wang, Dong Li, and Jishen Zhao. Integrated Thermal Analysis for Processing In Die-Stacking Memory. In International Symposium on Memory Systems.

    [SC] Luanzheng Guo, Jing Liang, and Dong Li. Understanding Ineffectiveness of Application-Level Fault Injection. Poster in ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (Nominated as the best poster, 2.9% of all poster submissions).

    [HUCAA] Borui Wang, Martin Torres, Dong Li, Jishen Zhao, and Florin Rusu. Performance Implications of Processing-in-Memory Designs on Data-Intensive Applications. In 5th Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications.

    [MODSIM] Dong Li. Modeling Methods and Tools for Optimizing Data Placement in Heterogeneous Memory Systems of GPU. In Workshop on Modeling and Simulation of Systems and Applications.

    [DataCloud] Mehdi Bahrami, Dong Li, Mukesh Singhal, and Ashish Kundu. An Efficient Parallel Implementation of a Light-Weight Data Privacy Method for Mobile Cloud Users. Accepted as a short paper in the DataCloud workshop associated with SC’16.

    [TC] Guoyang Chen, Xipeng Shen, Bo Wu, and Dong Li. Optimizing Data Placement on GPU Memory: A Portable Approach. Accepted in IEEE Transactions on Computers.

    [TR] Yingchao Huang, Dong Li. Performance Modeling for Optimal Data Placement on GPU with Heterogeneous Memory Systems. Technical report (PASA-001-UCMERCED-2016), PASA Lab, University of California, Merced.


2015


    [DATE] Poremba, M., Mittal, S., Li, D., Vetter, J. S., and Xie, Y. DESTINY: A Tool for Modeling Emerging 3D NVM and eDRAM Caches. In IEEE Design Automation and Test in Europe Conference and Exhibition.

    [ICS] Wu, B., Chen, G., Li, D., Shen, X., and Vetter, J. S. Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations. In International Conference on Supercomputing.

    [IEEE MICRO] Chen, G., Wu, B., Li, D., and Shen, X. Enabling Portable Optimizations of Data Placement on GPU. The Heterogeneous Computing special issue of IEEE Micro, (July/August Issue).

    [Cluster] Feng, K., Venkata, M. G., Li, D., and Sun, X. Fast Fault Injection and Sensitivity Analysis for Collective Communications. In IEEE Cluster Conference.


2014


    [ICCS] Tan, L., Chen, L., Chen, Z., Zong, Z., Ge, R., and Li, D. DAEMON: Distributed Adaptive Energy-efficient Matrix-multiplicatiON. In International Conference on Computational Science.

    [HPDC] Mittal, S., Vetter, J. S., and Li, D. Improving Energy Efficiency of Embedded DRAM Caches for High-End Computing Systems. In ACM International Symposium on High Performance Parallel and Distributed Computing.

    [IPDPS] Lee, S., Li, D., and Vetter, J. S. Interactive Programming Debugging and Optimization for Directive-Based, Efficient GPU Computing. In IEEE International Parallel and Distributed Processing Symposium.

    [ISVLSI] Mittal, S., Vetter, J. S., and Li, D. LastingNVCache: A Technique for Improving the Lifetime of Non-Volatile Caches. In IEEE Annual Symposium on VLSI.

    [MICRO] Chen, G., Wu, B., Li, D., and Shen, X. PORPLE: An Extensible Optimizer for Portable Data Placement on GPU. In IEEE/ACM International Symposium on Microarchitecture.

    [SC] Yu, L., Li, D., Mittal, S., and Vetter, J. S. Quantitatively Modeling Application Resiliency with the Data Vulnerability Factor. In International Conference for High Performance Computing, Networking, Storage and Analysis. Nominated as the best student paper.

    [GLSVLSI] Mittal, S., Vetter, J. S., and Li, D. WriteSmoothing: Improving Lifetime of Non-volatile Caches Using Intra-set Wear-Leveling. In ACM Great Lakes Symposium on VLSI.


2013


    [IPCCC] Tan, L., Chen, Z., Ge, R., Zong, Z., and Li, D. Adaptively Aggressive Energy Efficient DVFS Scheduling for Data Intensive Applications. In IEEE International Performance Computing and Communications Conference.

    [Resilience] Li, D., Lee, S., and Vetter, J. S. Evaluating the Viability of Application-Driven Cooperative CPU/GPU Fault Detection. In International Workshop on Resilience in High Performance Computing.

    [PACT] Wang, B., Wu, B., Li, D., Shen, X., Yu, W., Jiao, Y., and Vetter, J. S. Exploring Hybrid Memory for GPU Energy Efficiency through Software-Hardware Co-Design. In International Symposium on Parallel Architectures and Compilation Techniques.

    [SC] Li, D., Chen, Z., Wu, P., and Vetter, J. S. Rethinking Algorithm-Based Fault Tolerance with a Cooperative Software-Hardware Approach. In ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

    [MASCOTS] Wang, B., Jiao, Y., Yu, W., Shen, X., Li, D., and Vetter, J. S. A Versatile Performance and Energy Simulation Tool for Composite GPU Global Memory. In IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.


2012


    [SC] Li, D., Vetter, J. S., and Yu, W. Classifying Soft Error Vulnerabilities in Extreme-Scale Scientific Applications Using a Binary Instrumentation Tool. In International Conference for High Performance Computing, Networking, Storage and Analysis.

    [PER] Su, C.-Y., Li, D., Nikolopoulos, D., Grove, M., Cameron, K. W., and de Supinski, B. Critical Path-Based Thread Placement for NUMA Systems. ACM SIGMETRICS Performance Evaluation Review, 106-112.

    [IPDPS] Li, D., Vetter, J., Marin, G., McCurdy, C., Cira, C., Liu, Z., and Yu, W. Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications. In 26th IEEE International Parallel and Distributed Processing Symposium.

    [IISWC] Su, C.-Y., Li, D., Nikolopoulos, D. S., Cameron, K. W., de Supinski, B. R., and Leon, E. A. Model-Based, Memory-Centric Performance and Power Optimization on NUMA Multiprocessors. In International Symposium on Workload Characterization.

    [MASCOTS] Liu, Z., Wang, B., Carpenter, P., Li, D., Vetter, J., and Yu, W. PCM-Based Durable Write Cache for Fast Disk I/O. In International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.

    [TPDS] Li, D., de Supinski, B., Schulz, M., Nikolopoulos, D. S., and Cameron, K. W. Strategies for Energy Efficient Resource Management of Hybrid Programming Models. IEEE Transactions on Parallel and Distributed Systems, 24, 144-157.

    [CF] Spafford, K., Meredith, J., Lee, S., Li, D., Roth, P., and Vetter, J. The Tradeoffs of Fused Memory Hierarchies in Heterogeneous Architectures. In International Conference on Computing Frontiers.


2011


    [PMBS] Su, C.-Y., Li, D., Nikolopoulos, D. S., Grove, M., Cameron, K. W., and de Supinski, B. Critical Path-Based Thread Placement for NUMA Systems. In International Workshop on Modeling, Benchmarking and Simulation of High Performance Computing Systems.

    [SRMPDS] Li, D., Byna, S., and Chakradhar, S. Energy-Aware Workload Consolidation on GPU. In International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems.

    [CF] Li, D., Nikolopoulos, D. S., Cameron, K. W., de Supinski, B., and Schulz, M. Scalable Memory Registration for High-Performance Networks Using Helper Threads. In International Conference on Computing Frontiers.


2010


    [IPDPS] Li, D., de Supinski, B., Schulz, M., Nikolopoulos, D. S., and Cameron, K. W. Hybrid MPI/OpenMP Power-Aware Computing. In International Parallel and Distributed Processing Symposium.

    [JPEDS] Cao, Z., Easterling, D., Watson, L., Li, D., Cameron, K. W., and Feng, W.-C. Power Saving Experiments for Large Scale Global Optimization. International Journal of Parallel, Emergent and Distributed Systems.

    [IPDPS] Li, D., Nikolopoulos, D. S., Cameron, K. W., de Supinski, B., and Schulz, M. Power-Aware MPI Task Aggregation Prediction for High-End Computing Systems. In International Parallel and Distributed Processing Symposium.

    [TPDS] Ge, R., Feng, X., Song, S., Chang, H.-C., Li, D., and Cameron, K. W. PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications. IEEE Transactions on Parallel and Distributed Systems, 21.

    [ICPP] Li, D., Ge, R., and Cameron, K. W. System-level, Unified In-band and Out-of-band Dynamic Thermal Control. In International Conference on Parallel Processing.


2009


    [SC] Li, D., de Supinski, B., Schulz, M., Cameron, K. W., and Nikolopoulos, D. S. Poster: Model-based Hybrid MPI/OpenMP Power-Aware Computing. In International Conference for High Performance Computing, Networking, Storage and Analysis.


2008


    [ICDCN] Li, D., Huang, S., and Cameron, K. W. CG-Cell: An NPB Benchmark Implementation on Cell Broadband Engine. In International Conference on Distributed Computing and Networking.

    [HP-PAC] Li, D., Chang, H.-C., Pyla, H., and Cameron, K. W. System-level, Thermal-Aware Fully-loaded Process Scheduling. In International Workshop on High-Performance Power-Aware Computing.


2007


    [SC] Pyla, H., Li, D., and Cameron, K. W. Poster: Thermal-Aware High Performance Computing Using TEMPEST. In International Conference for High Performance Computing, Networking, Storage and Analysis.