Keras ConvLSTM2D: ValueError on output layer

I am trying to train a 2D convolutional LSTM to make categorical predictions based on video data. However, my output layer seems to be running into a problem:

"ValueError: Error when checking target: expected dense_1 to have 5 dimensions, but got array with shape (1, 1939, 9)"

My current model is based on the ConvLSTM2D example provided by the Keras team. I believe that the above error is the result of my misunderstanding the example and its basic principles.

Data

I have an arbitrary number of videos, where each video contains an arbitrary number of frames. Each frame is 135x240x1 (channels last). This results in an input shape of (None, None, 135, 240, 1), where the two "None" values are batch size and timesteps, in that order. If I train on a single video with 1052 frames, then my input shape becomes (1, 1052, 135, 240, 1).

For each frame, the model should predict values between 0 and 1 across 9 classes. This means that my output shape is (None, None, 9). If I train on a single video with 1052 frames, then this shape becomes (1, 1052, 9).
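The Sequence that feeds these arrays is referenced but not shown in the source code below; for reference, here is a minimal sketch of a keras.utils.Sequence that yields these shapes (class and attribute names here are illustrative, not my actual training_sequence):

import numpy as np
from keras.utils import Sequence

class VideoSequence(Sequence):  # illustrative stand-in for training_sequence
    def __init__(self, videos, labels):
        self.videos = videos  # list of arrays, each (T, 135, 240, 1)
        self.labels = labels  # list of arrays, each (T, 9), one-hot per frame

    def __len__(self):
        return len(self.videos)  # one video per batch

    def __getitem__(self, idx):
        x = self.videos[idx][np.newaxis, ...]  # add batch axis -> (1, T, 135, 240, 1)
        y = self.labels[idx][np.newaxis, ...]  # -> (1, T, 9)
        return x, y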

Model

Layer (type)                                Output Shape                Param #
===============================================================================
conv_lst_m2d_1 (ConvLSTM2D)                 (None, None, 135, 240, 40)  59200
_______________________________________________________________________________
batch_normalization_1 (BatchNormalization)  (None, None, 135, 240, 40)  160
_______________________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D)                 (None, None, 135, 240, 40)  115360
_______________________________________________________________________________
batch_normalization_2 (BatchNormalization)  (None, None, 135, 240, 40)  160
_______________________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D)                 (None, None, 135, 240, 40)  115360
_______________________________________________________________________________
batch_normalization_3 (BatchNormalization)  (None, None, 135, 240, 40)  160
_______________________________________________________________________________
dense_1 (Dense)                             (None, None, 135, 240, 9)   369
===============================================================================
Total params: 290,769
Trainable params: 290,529
Non-trainable params: 240

Source code

from keras.models import Sequential
from keras.layers import ConvLSTM2D, BatchNormalization, Dense

model = Sequential()
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3), input_shape=(None, 135, 240, 1), padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3), padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3), padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(Dense(units=classes, activation='softmax'))  # classes == 9
model.compile(loss='categorical_crossentropy', optimizer='adadelta')
model.fit_generator(generator=training_sequence)

Traceback

Epoch 1/1
Traceback (most recent call last):
  File ".\lstm.py", line 128, in <module>
    main()
  File ".\lstm.py", line 108, in main
    model.fit_generator(generator=training_sequence)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\models.py", line 1253, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 2244, in fit_generator
    class_weight=class_weight)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1884, in train_on_batch
    class_weight=class_weight)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1487, in _standardize_user_data
    exception_prefix='target')
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 113, in _standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking target: expected dense_1 to have 5 dimensions, but got array with shape (1, 1939, 9)

A sample input shape printed with batch size set to 1 is (1, 1389, 135, 240, 1). This shape matches the requirements I described above, so I think my Keras Sequence subclass (in the source code as "training_sequence") is correct.

I suspect that the problem is caused by my going directly from BatchNormalization() to Dense(). After all, the traceback indicates that the problem is occurring in dense_1 (the final layer). However, I wouldn't want to lead anyone astray with my beginner-level knowledge, so please take my assessment with a grain of salt.
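This hunch can be sanity-checked in isolation: in Keras 2, Dense applied to an input with more than two dimensions operates on the last axis only, so dense_1's output stays 5D, which would explain why a (1, 1939, 9) target is rejected. A minimal standalone sketch (not from my script):

from keras.models import Sequential
from keras.layers import Dense

m = Sequential()
# Four non-batch dims: (timesteps, height, width, channels)
m.add(Dense(9, input_shape=(None, 135, 240, 40)))
print(m.output_shape)  # (None, None, 135, 240, 9) -- still 5D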

Edit 3/27/2018

After reading this thread, which involves a similar model, I changed my final ConvLSTM2D layer so that the return_sequences parameter is set to False instead of True. I also added a GlobalAveragePooling2D layer before my Dense layer. The updated model is as follows:

Layer (type)                                         Output Shape                Param #
========================================================================================
conv_lst_m2d_1 (ConvLSTM2D)                          (None, None, 135, 240, 40)  59200
________________________________________________________________________________________
batch_normalization_1 (BatchNormalization)           (None, None, 135, 240, 40)  160
________________________________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D)                          (None, None, 135, 240, 40)  115360
________________________________________________________________________________________
batch_normalization_2 (BatchNormalization)           (None, None, 135, 240, 40)  160
________________________________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D)                          (None, 135, 240, 40)        115360
________________________________________________________________________________________
batch_normalization_3 (BatchNormalization)           (None, 135, 240, 40)        160
________________________________________________________________________________________
global_average_pooling2d_1 (GlobalAveragePooling2D)  (None, 40)                  0
________________________________________________________________________________________
dense_1 (Dense)                                      (None, 9)                   369
========================================================================================
Total params: 290,769
Trainable params: 290,529
Non-trainable params: 240
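
I did not paste the updated source; reconstructed from the summary above, the tail of the model changed roughly like this (a sketch, not verbatim):

from keras.layers import GlobalAveragePooling2D

# ...the first two ConvLSTM2D/BatchNormalization pairs are unchanged...
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3), padding='same', return_sequences=False))  # drops the time axis
model.add(BatchNormalization())
model.add(GlobalAveragePooling2D())              # (None, 40)
model.add(Dense(units=9, activation='softmax'))  # (None, 9)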

Here is a new copy of the traceback:

Traceback (most recent call last):
  File ".\lstm.py", line 131, in <module>
    main()
  File ".\lstm.py", line 111, in main
    model.fit_generator(generator=training_sequence)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\models.py", line 1253, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 2244, in fit_generator
    class_weight=class_weight)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1884, in train_on_batch
    class_weight=class_weight)
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1487, in _standardize_user_data
    exception_prefix='target')
  File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 113, in _standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (1, 1034, 9)

I printed the x and y shapes on this run: x was (1, 1034, 135, 240, 1) and y was (1, 1034, 9). This narrows the problem down: the issue is with the y data rather than the x data. Specifically, the Dense layer doesn't accept the temporal dimension. However, I am not sure how to rectify this.
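
For reference (an illustrative check, not from my original script), the mismatch is visible directly:

print(model.output_shape)  # (None, 9) -- the time axis is gone
print(y.shape)             # (1, 1034, 9) -- one label vector per frame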

Edit 3/28/2018

Yu-Yang's solution worked. For anyone with a similar problem who wants to see what the final model looked like, here is the summary:

Layer (type)                                Output Shape                Param #
===============================================================================
conv_lst_m2d_1 (ConvLSTM2D)                 (None, None, 135, 240, 40)  59200
_______________________________________________________________________________
batch_normalization_1 (BatchNormalization)  (None, None, 135, 240, 40)  160
_______________________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D)                 (None, None, 135, 240, 40)  115360
_______________________________________________________________________________
batch_normalization_2 (BatchNormalization)  (None, None, 135, 240, 40)  160
_______________________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D)                 (None, None, 135, 240, 40)  115360
_______________________________________________________________________________
batch_normalization_3 (BatchNormalization)  (None, None, 135, 240, 40)  160
_______________________________________________________________________________
average_pooling3d_1 (AveragePooling3D)      (None, None, 1, 1, 40)      0
_______________________________________________________________________________
reshape_1 (Reshape)                         (None, None, 40)            0
_______________________________________________________________________________
dense_1 (Dense)                             (None, None, 9)             369
===============================================================================
Total params: 290,769
Trainable params: 290,529
Non-trainable params: 240

Also, the source code:

from keras.models import Sequential
from keras.layers import ConvLSTM2D, BatchNormalization, AveragePooling3D, Reshape, Dense

model = Sequential()
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3), input_shape=(None, 135, 240, 1), padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3), padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3), padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(AveragePooling3D(pool_size=(1, 135, 240)))
model.add(Reshape((-1, 40)))
model.add(Dense(units=9, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta')
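
Training is then started the same way as before:

model.fit_generator(generator=training_sequence)
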
Answer

If you want a prediction per frame, then you should definitely set return_sequences=True in your last ConvLSTM2D layer.

For the ValueError on target shape, replace the GlobalAveragePooling2D() layer with AveragePooling3D((1, 135, 240)) plus Reshape((-1, 40)) to make the output shape compatible with your target array.
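
Concretely, the replacement collapses the spatial axes while keeping the variable-length time axis (this matches the final model in the question; imports as in the final code block above):

# AveragePooling3D pools over (time, height, width); pool_size=(1, 135, 240)
# collapses the spatial axes and leaves time intact:
# (None, None, 135, 240, 40) -> (None, None, 1, 1, 40)
model.add(AveragePooling3D(pool_size=(1, 135, 240)))
# Fold away the two singleton axes; -1 keeps the time axis variable:
# (None, None, 1, 1, 40) -> (None, None, 40)
model.add(Reshape((-1, 40)))
model.add(Dense(units=9, activation='sigmoid'))  # -> (None, None, 9)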
