Question 1

I have a list of LongTensors and another list of labels. I'm new to PyTorch and RNNs, so I'm quite confused about how to implement minibatch training for the data I have. There is much more to this data, but I want to keep it simple so I can understand just the minibatch training part. I'm doing multiclass classification based on the final hidden state of an LSTM/GRU trained on variable-length inputs. I managed to get it working with batch size 1 (basically SGD), but I'm struggling to implement minibatches.

Do I have to pad the sequences to the maximum size and create a new tensor matrix of larger size which holds all the elements? I mean like this:

inputs = pad(sequences)
train = DataLoader(inputs, batch_size=batch_size, shuffle=True)
for i, data in enumerate(train):
    pass  # do stuff using LSTM and/or GRU models

Is this the accepted way of doing minibatch training on custom data? I couldn't find any tutorials on loading custom data using DataLoader (but I assume that's the way to create batches in PyTorch?).

Another doubt I have is with regard to padding. The reason I'm using an LSTM/GRU is the variable length of the input. Doesn't padding defeat the purpose? Is padding necessary for minibatch training?

Answer

Yes. The issue with minibatch training on sequences which have different lengths is that you can't stack sequences of different lengths together.

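Normally one would do:

import random
import torch

for e in range(epochs):
    random.shuffle(sequences)  # reshuffle in place at the start of each epoch
    for mb in range(len(sequences) // mb_size):
        # torch.stack requires every tensor in the slice to have the same shape
        batch = torch.stack(sequences[mb*mb_size:(mb+1)*mb_size])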

and then you apply your neural network to the batch. But because your sequences have different lengths, torch.stack will fail. So what you have to do is indeed pad your sequences with zeros so that they all have the same length (at least within a minibatch). You have 2 options:

1) Once, before training starts, pad all your sequences with leading zeros so that they all have the same length as the longest sequence in your whole dataset.

OR

2) On the fly, for each minibatch, before stacking the sequences together, pad all the sequences that will go into the minibatch with leading zeros so that they all have the same length as the longest sequence in that minibatch.
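
The two options can be sketched roughly as follows. This is only an illustration under some assumptions: the helper names `left_pad` and `collate_left_pad` are made up for this example, the toy data stands in for your real LongTensors, and index 0 is assumed to be free to use as the padding value. Option 2 is done here by passing a custom `collate_fn` to `DataLoader`, which is one possible way to pad per minibatch.

```python
import torch
from torch.utils.data import DataLoader


def left_pad(seq, length, pad_value=0):
    """Prepend pad_value to a 1-D tensor until it has the given length.

    Hypothetical helper; assumes pad_value is not a real token index.
    """
    padding = seq.new_full((length - seq.size(0),), pad_value)
    return torch.cat([padding, seq])


def collate_left_pad(batch):
    """Option 2: pad each minibatch to the length of its own longest sequence."""
    seqs, labels = zip(*batch)
    max_len = max(s.size(0) for s in seqs)
    padded = torch.stack([left_pad(s, max_len) for s in seqs])
    return padded, torch.tensor(labels)


# Toy data standing in for the real list of LongTensors and labels.
sequences = [
    torch.tensor([1, 2, 3]),
    torch.tensor([4, 5]),
    torch.tensor([6]),
    torch.tensor([7, 8, 9, 10]),
]
labels = [0, 1, 0, 2]

# Option 1: pad everything once to the longest sequence in the dataset.
global_max = max(s.size(0) for s in sequences)
all_padded = torch.stack([left_pad(s, global_max) for s in sequences])

# Option 2: pad on the fly, per minibatch, inside the DataLoader.
# shuffle=False only keeps this example deterministic; use shuffle=True to train.
dataset = list(zip(sequences, labels))
loader = DataLoader(dataset, batch_size=2, shuffle=False,
                    collate_fn=collate_left_pad)
batches = list(loader)

for padded, batch_labels in batches:
    print(padded.shape, batch_labels.tolist())
```

With leading zeros, the last time step of every row is a real token, so the final hidden state of the LSTM/GRU still corresponds to the end of each actual sequence. Option 2 usually wastes less computation, because a minibatch of short sequences is only padded to its own maximum length.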

Mini batch training for inputs of variable sizes
