
Melon: breaking the memory wall for resource-efficient on-device ...
Melon further incorporates novel techniques to deal with high memory fragmentation and to adapt to changing memory budgets. We implement and evaluate Melon with various typical DNN models on commodity mobile devices. The results show that Melon can achieve up to 4.33× larger batch size under the same memory budget.
The memory fragmentation of existing memory pools can reach up to 42% during DNN training. To deal with this fragmentation, Melon uses a model-specific user-space memory pool that incorporates knowledge of static memory access patterns (i.e., when a tensor is needed and how much memory it takes, also called its lifetime) during model training.
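As a rough illustration of what lifetime-aware memory planning means, the sketch below greedily assigns pool offsets from tensor sizes and lifetimes that are known ahead of time from the static training graph; the names and the greedy policy are illustrative assumptions, not Melon's actual implementation.

from dataclasses import dataclass

# Illustrative lifetime-aware planner: tensors whose lifetimes do not overlap
# may share the same region of a single user-space pool.
@dataclass
class Tensor:
    name: str
    size: int    # bytes
    start: int   # first step that produces/uses the tensor
    end: int     # last step that still needs it (its "lifetime" end)
    offset: int = -1

def plan_offsets(tensors):
    """Place each tensor at the lowest offset that does not collide with any
    already-placed tensor whose lifetime overlaps; return the pool size."""
    placed = []
    for t in sorted(tensors, key=lambda x: x.size, reverse=True):
        # Byte ranges occupied by tensors that are alive at the same time.
        busy = sorted((p.offset, p.offset + p.size) for p in placed
                      if not (p.end < t.start or t.end < p.start))
        offset = 0
        for lo, hi in busy:
            if offset + t.size <= lo:
                break                  # fits in the gap before this range
            offset = max(offset, hi)
        t.offset = offset
        placed.append(t)
    return max((p.offset + p.size for p in placed), default=0)

# Example: two activations with disjoint lifetimes reuse the same region,
# so the pool is smaller than the sum of all tensor sizes.
ts = [Tensor("act0", 4 << 20, 0, 2), Tensor("act1", 4 << 20, 3, 5),
      Tensor("weight", 8 << 20, 0, 5)]
print(plan_offsets(ts), [(t.name, t.offset) for t in ts])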
Integrating memristors and CMOS for better AI - Nature
Sep 17, 2019 · By integrating memristor arrays with CMOS circuitry, a computing-in-memory architecture can be created that could provide efficient deep neural network processors.
Domino: A Tailored Network-on-Chip Architecture to Enable …
Jul 18, 2021 · The ever-increasing computation complexity of fast-growing Deep Neural Networks (DNNs) calls for new computing paradigms to overcome the memory wall of conventional von Neumann architectures. The emerging Computing-In-Memory (CIM) architecture has been a promising candidate for accelerating neural network computing.
Illusion of large on-chip memory by networked computing chips …
Jan 11, 2021 · Here, we report a DNN inference system—termed Illusion—that consists of networked computing chips, each of which contains a certain minimal amount of local on-chip memory and mechanisms for...
We formalize the problem of trading off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies.
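One schematic way to write that trade-off (not necessarily the paper's exact formulation; the symbols R, S, c_i, m_i and B are introduced here only for illustration) is:

% Schematic rematerialization trade-off: R_{t,i}=1 if operation i is
% (re)computed at stage t, S_{t,i}=1 if its output is kept resident then.
\begin{aligned}
\min_{R,S}\;& \sum_{t}\sum_{i} c_i\, R_{t,i}
  && \text{(total recomputation cost)}\\
\text{s.t.}\;& \sum_{i} m_i\, S_{t,i} \le B
  && \text{(memory budget at every stage } t\text{)}\\
& R_{t,i} \le S_{t,j} + R_{t,j}
  && \text{for every input } j \text{ of } i \text{ (inputs must be available)}
\end{aligned}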
An Energy-Efficient Near-Data Processing Accelerator for DNNs …
Oct 27, 2023 · The 3D-stacked memory is especially appealing for DNN accelerators due to its high-density, low-energy storage and near-memory computation capabilities that allow DNN operations to be performed massively in parallel. However, memory accesses remain the main bottleneck for running modern DNNs efficiently.
We propose a memory-centric deep learning system that can transparently expand the memory capacity available to the accelerators while also providing fast inter-device communication for parallel training.
Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization
In this paper, we formalize the problem of trading-off computation time and memory requirements for DNN training as the tensor rematerialization optimization problem. We develop a new system to optimally solve the problem in reasonable times …
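A toy sketch of the underlying keep-versus-recompute decision is below; the greedy cost-per-byte heuristic and the names are illustrative assumptions only, whereas the paper solves the trade-off optimally rather than greedily.

# Toy keep-vs-recompute policy behind tensor rematerialization (illustrative;
# an optimal solver would replace this greedy loop).
def choose_residents(acts, budget_bytes):
    """acts: list of (name, size_bytes, recompute_cost). Evict the activations
    that are cheapest to recompute per byte freed until the rest fit."""
    resident = list(acts)
    evicted = []
    while sum(a[1] for a in resident) > budget_bytes:
        victim = min(resident, key=lambda a: a[2] / a[1])
        resident.remove(victim)
        evicted.append(victim)   # will be recomputed during backward
    return resident, evicted

acts = [("conv1", 64 << 20, 5.0), ("conv2", 32 << 20, 4.0), ("fc", 8 << 20, 0.5)]
keep, recompute = choose_residents(acts, budget_bytes=70 << 20)
print([a[0] for a in keep], [a[0] for a in recompute])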
STR: Hybrid Tensor Re-Generation to Break Memory Wall for DNN …
In this article, we propose a novel hybrid tensor re-generation strategy, called STR, which combines swapping and recomputation techniques to find the optimal execution plan for DNN training when memory is limited.
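A minimal sketch of the per-tensor swap-versus-recompute choice, assuming a simple transfer-time cost model (the bandwidth constant and function names are illustrative, not STR's actual optimizer):

PCIE_BW = 12e9  # assumed host<->device bandwidth in bytes/s, for illustration

def plan_tensor(size_bytes, recompute_secs, pcie_bw=PCIE_BW):
    """Return ('swap' | 'recompute', estimated overhead in seconds)."""
    swap_secs = 2 * size_bytes / pcie_bw   # offload now + prefetch back later
    return (("swap", swap_secs) if swap_secs < recompute_secs
            else ("recompute", recompute_secs))

# A large, cheap-to-produce activation tends to be recomputed; a small tensor
# produced by an expensive op tends to be swapped out instead.
print(plan_tensor(512 << 20, recompute_secs=0.020))  # big but cheap to redo
print(plan_tensor(16 << 20, recompute_secs=0.030))   # small but costly to redo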