I was looking at different ways to build custom TensorFlow datasets. I am used to PyTorch's datasets, but when I went to look at TensorFlow's, I saw this example:
```python
import time

import tensorflow as tf


class ArtificialDataset(tf.data.Dataset):
    def _generator(num_samples):
        # Opening the file
        time.sleep(0.03)
        for sample_idx in range(num_samples):
            # Reading data (line, record) from the file
            time.sleep(0.015)
            yield (sample_idx,)

    def __new__(cls, num_samples=3):
        return tf.data.Dataset.from_generator(
            cls._generator,
            output_signature=tf.TensorSpec(shape=(1,), dtype=tf.int64),
            args=(num_samples,),
        )
```
But two questions came up:
- It looks like all this does is that, when the object is instantiated, the `__new__` method calls the `tf.data.Dataset.from_generator` static method. So why not just call that directly? What is the point of even subclassing `tf.data.Dataset`? Are any methods from `tf.data.Dataset` actually used?
- Would there be a way to do it like a data generator, where one fills out an `__iter__` method while inheriting from `tf.data.Dataset`? Idk, something like:
```python
import pandas as pd
import tensorflow as tf


class MyDataLoader(tf.data.Dataset):
    def __init__(self, path, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.data = pd.read_csv(path)

    def __iter__(self):
        for datum in self.data.iterrows():
            yield datum
```
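For what it's worth, here is what I mean in the first question by "just calling it" directly, with a trivial stand-in generator instead of the file-reading simulation (so no subclassing at all):

```python
import tensorflow as tf


# Plain generator standing in for the file-reading logic.
def generate(num_samples):
    for sample_idx in range(num_samples):
        yield (sample_idx,)


# Calling the factory directly, no subclass of tf.data.Dataset involved.
dataset = tf.data.Dataset.from_generator(
    generate,
    output_signature=tf.TensorSpec(shape=(1,), dtype=tf.int64),
    args=(3,),
)

print([int(x[0]) for x in dataset.as_numpy_iterator()])
```

This gives the same dataset object as the subclass version, which is why I'm unsure what the subclass buys you.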
Thank you all very much!