Data Loading and Data Transfer Bottlenecks
Machine Learning Frameworks Interoperability, Part 2: Data Loading and Data Transfer Bottlenecks | NVIDIA Developer Blog
“Thus far, we have worked on the assumption that the data is already loaded in memory and that a single GPU is used. This section highlights a few bottlenecks that might occur when loading your dataset from storage to device memory or when transferring data between two GPUs using either a single node or multiple nodes setting. We then discuss how to overcome them.
In a traditional workflow, when a dataset is loaded from storage to GPU memory, the data will be copied from the disk to the GPU memory using the CPU and the PCIe bus. Loading the data requires at least two copies of the data. The first one happens when transferring the data from the storage to the host memory (CPU RAM). The second copy of the data occurs when transferring the data from the host memory to the device memory (GPU VRAM)…”