M03 Data Analytics for Scalable Computing Systems Design: Challenges, Opportunities, and Solutions
Abstract
The rapid growth of Big Data, the slowing of Moore’s law, and the rise of emerging applications pose significant challenges in the design of large-scale computing systems that deliver high performance, energy efficiency, and reliability. This tutorial will consider solutions based on machine learning and data analytics to address these challenges and answer the following questions:
- How to use machine learning and statistical modeling for effective design space exploration of computing systems to optimize for power, performance, and thermal metrics?
- How to use machine learning techniques to efficiently manage resources of computing systems (e.g., power, memory, interconnects) to improve performance and energy efficiency?
- What are the challenges in using Processing-in-Memory (PIM) to efficiently execute machine learning algorithms?
- How can data analytics facilitate fault diagnosis, detect anomalies, and increase robustness in the network backbone of emerging large-scale networking systems?
- How can machine learning be used during the design process to produce higher-quality, more robust manufactured devices?
To address these outstanding challenges, out-of-the-box approaches need to be explored. By integrating machine learning algorithms, data analytics, statistical modeling, and the design of advanced computing systems, this tutorial will engage a broad section of DATE conference attendees. It will attract newcomers who want to learn how to apply machine learning and data analytics to solve problems in computing systems, as well as experienced researchers looking for exciting new directions in computing systems design, EDA methodologies, and multi-scale computing. The tutorial covers design, optimization, and resilience: three main pillars of designing computing systems. It also highlights how machine learning and EDA researchers can collaborate to design energy-efficient and reliable chips and systems.
Objectives
The main objective of the tutorial is to help attendees understand the emerging interdependence of data analytics and computer system design. We will elaborate on the most important hardware-software co-design challenges that both the data analytics and EDA communities need to fully comprehend. We will also provide an overview of some interesting emerging solutions to these problems. Specific aims are as follows:
- Design principles for advanced manycore systems as an enabler for machine learning and big data applications
- Data-driven methods for design space exploration and dynamic resource management
- Accelerator designs on conventional and emerging platforms
- Proactive fault tolerance through failure prediction based on time-series data analysis
- Machine-learning-inspired test, manufacturing, and validation methodologies