2017-06-29

Created: June 29, 2017 / Updated: March 22, 2020 / Status: in progress / 1 min read (~100 words)

  • How to deal with loading and batching huge amounts of data, particularly in the form of images?
    • Loading thousands of images directly from the filesystem is inefficient due to the large number of system calls involved
    • It seems straightforward to pack these images into more compact structures, such as numpy arrays stored in compressed files like npz
    • However, how does one load this data at training time so that 10 GB of compressed data does not translate into 20 GB of RAM used throughout training? (a sketch follows this list)
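One way to keep memory bounded is to store the packed images uncompressed as a .npy file and memory-map it at training time. The sketch below is one possible approach, not a prescribed one; it assumes fixed-shape uint8 images, and the names `pack_images`, `batches`, `images.npy`, and `batch_size` are illustrative.

```python
import numpy as np

# Packing step (illustrative): stack decoded images of identical shape,
# e.g. (H, W, C) uint8, into one array and save it uncompressed so it
# can be memory-mapped later. A compressed npz cannot be memory-mapped.
def pack_images(images, path="images.npy"):
    np.save(path, np.stack(images).astype(np.uint8))

# Training-time loading: mmap_mode="r" maps the file into virtual memory
# instead of reading it all into RAM; only the pages each batch touches
# are actually paged in.
def batches(path="images.npy", batch_size=32, seed=0):
    data = np.load(path, mmap_mode="r")
    order = np.random.default_rng(seed).permutation(len(data))
    for start in range(0, len(data), batch_size):
        idx = np.sort(order[start:start + batch_size])  # sorted reads are kinder to the mapping
        # fancy indexing copies only this batch into RAM as a regular array
        yield data[idx].astype(np.float32) / 255.0
```

Note that a compressed npz addresses the disk-space concern but not the RAM one: accessing an array inside the loaded archive decompresses that entire array into memory. Memory-mapping an uncompressed .npy (or using a chunked format such as HDF5, or a framework loader that streams batches from disk) trades disk space for bounded RAM.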