Motivation

With the recent enormous success of deep learning in speech, image, and natural language processing, people have started to dream about an intelligent brain built with the help of massive computational power applied to large datasets. Nvidia, the hardware company behind much of the rapid development of deep learning, put together an ad page demonstrating the success stories of AI. More and more researchers and companies are starting to realize the value of deep learning and are beginning to explore its inspirational applications.
To continue the discussion from the previous post, we want a standard folder structure, instead of HDF5, to store a dataset temporarily for processing or permanently for sharing. To keep this folder-structure approach flexible, we impose only minimal requirements on the folder itself and leave the finer details to a metadata file. So what is the best format for such metadata?
Basically, we want a hash table that maps keywords to values that are meaningful to the user or audience.
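To make the hash-table idea concrete, here is a minimal sketch of keyword-to-value metadata stored next to a dataset folder. It assumes JSON as the on-disk format purely for illustration; the file name `meta.json`, the helper names `write_metadata` and `read_metadata`, and every key in the example dictionary are hypothetical and not part of any standard proposed here.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(folder, metadata, filename="meta.json"):
    """Write a keyword -> value mapping into the dataset folder (hypothetical helper)."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / filename
    with path.open("w", encoding="utf-8") as f:
        # indent and sort_keys keep the file readable and diff-friendly
        json.dump(metadata, f, indent=2, sort_keys=True)
    return path


def read_metadata(folder, filename="meta.json"):
    """Load the mapping back as a plain Python dict (a hash table)."""
    with (Path(folder) / filename).open("r", encoding="utf-8") as f:
        return json.load(f)


# Hypothetical metadata: the keys are whatever is meaningful to the audience.
example_metadata = {
    "name": "toy-dataset",
    "num_samples": 3,
    "files": {"images": "images/", "labels": "labels.csv"},
}

# Round-trip through a temporary folder standing in for the dataset folder.
with tempfile.TemporaryDirectory() as tmp:
    write_metadata(tmp, example_metadata)
    loaded = read_metadata(tmp)
```

Because the loaded object is an ordinary dict, looking up a keyword such as `loaded["files"]["labels"]` needs no format-specific API, which is the main appeal of keeping the metadata as a simple key-value file.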
Recently, the increasing volume of data and the growing use of neural networks have both forced me to look at data formats again. Previously, I thought the HDF5 format was the best fit for most of my applications. The nice APIs for HDF5, e.g. h5py and DeepDish, give me both flexibility and ease of use when storing and sharing my datasets. However, as my datasets started to grow substantially, loading them into memory put a significant burden on my I/O bus, especially when I only need part of a dataset each time.