Question 1

I'm currently using a basic LSTM to make regression predictions and I would like to implement a causal CNN as it should be computationally more efficient.

I'm struggling to figure out how to reshape my current data to fit a causal CNN layer while preserving the same data/timestep relationship, and what the dilation rate should be set to.

My current data has the shape (number of examples, lookback, features). Here's a basic example of the LSTM network I'm using right now:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Activation, Dense

lookback = 20   #  height -- timeseries
n_features = 5  #  width  -- features at each timestep

# Build an LSTM to perform regression on time series input/output data
model = Sequential()
model.add(LSTM(units=256, return_sequences=True, input_shape=(lookback, n_features)))
model.add(Activation('elu'))
model.add(LSTM(units=256, return_sequences=True))
model.add(Activation('elu'))
model.add(LSTM(units=256))
model.add(Activation('elu'))
model.add(Dense(units=1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=50, batch_size=64,
          validation_data=(X_val, y_val), verbose=1, shuffle=True)
prediction = model.predict(X_test)

I then created a new CNN model, although it is not causal: per the Keras documentation, 'causal' padding is only an option for Conv1D, not Conv2D. If I understand correctly, having multiple features means I need to use Conv2D rather than Conv1D, but if I set Conv2D(padding='causal'), I get the following error: Invalid padding: causal

Anyway, I was able to reshape the data to (number of examples, lookback, features, 1) and run the following model using Conv2D layers:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

lookback = 20   #  height -- timeseries
n_features = 5  #  width  -- features at each timestep

model = Sequential()
# padding='same' keeps the 5-wide feature axis from shrinking below the 3x3 kernel after pooling
model.add(Conv2D(128, 3, activation='elu', padding='same', input_shape=(lookback, n_features, 1)))
model.add(MaxPool2D())
model.add(Conv2D(128, 3, activation='elu', padding='same'))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=50, batch_size=64,
          validation_data=(X_val, y_val), verbose=1, shuffle=True)
prediction = model.predict(X_test)
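The reshape mentioned above, from (number of examples, lookback, features) to (number of examples, lookback, features, 1), only adds a trailing channel axis. A minimal NumPy sketch, with a random array standing in for X_train (an assumption):

```python
import numpy as np

# Hypothetical stand-in for X_train: (number of examples, lookback, features)
X_train = np.random.rand(64, 20, 5).astype("float32")

# Conv2D expects a trailing channel axis: (examples, lookback, features, 1)
X_train_2d = X_train[..., np.newaxis]   # same as np.expand_dims(X_train, -1)
print(X_train_2d.shape)

# The extra axis changes only the shape; dropping it recovers the original data
print(np.array_equal(X_train_2d[..., 0], X_train))
```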

However, as I understand it, the Conv2D model above does not process the data causally; it simply treats each (lookback, features, 1) sample as an image.

Is there any way to either reshape my data to fit into a Conv1D(padding='causal') layer with multiple features, or somehow run the same data and input shape through Conv2D with 'causal' padding?

Answer

I believe that you can have causal padding with dilation for any number of input features. Here is the solution I would propose.

The TimeDistributed layer is key to this.

From Keras Documentation: "This wrapper applies a layer to every temporal slice of an input. The input should be at least 3D, and the dimension of index one will be considered to be the temporal dimension."
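A small sketch of that behaviour with tf.keras (the sizes are arbitrary assumptions): the wrapped Conv1D is applied with the same weights to every slice along axis 1, so the result matches calling the layer on each slice by hand.

```python
import numpy as np
import tensorflow as tf

# Hypothetical batch: 2 samples, 5 features on axis 1, 20 timesteps, 1 channel
x = np.random.rand(2, 5, 20, 1).astype("float32")

conv = tf.keras.layers.Conv1D(4, 3, padding="causal")
td = tf.keras.layers.TimeDistributed(conv)
out = np.asarray(td(x))  # shape (2, 5, 20, 4)

# Call the same (already built, same weights) Conv1D on each slice by hand
manual = np.stack([np.asarray(conv(x[:, i])) for i in range(x.shape[1])], axis=1)
print(out.shape, np.allclose(out, manual, atol=1e-5))
```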

For our purposes, we want this layer to apply "something" to each feature, so we move the features to the temporal index, which is 1.
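A minimal sketch of that move, assuming data in the question's (number of examples, lookback, features) layout: Permute((2, 1)) puts the features on axis 1 and Reshape adds the channel axis. It also shows why the permute has to come before the reshape.

```python
import numpy as np
import tensorflow as tf

lookback, n_features = 20, 5
# Hypothetical input: 2 samples, values numbered so positions are easy to trace
x = np.arange(2 * lookback * n_features, dtype="float32").reshape(2, lookback, n_features)

# Move the features to axis 1, the axis TimeDistributed iterates over...
permuted = tf.keras.layers.Permute((2, 1))(x)                            # (2, 5, 20)
# ...then add a trailing channel axis so each feature is a (20, 1) sequence
reshaped = np.asarray(tf.keras.layers.Reshape((n_features, lookback, 1))(permuted))

# Slice 3 along axis 1 is exactly feature 3's time series
print(np.array_equal(reshaped[0, 3, :, 0], x[0, :, 3]))
# Reshaping without permuting first mixes timesteps and features together
naive = x.reshape(2, n_features, lookback, 1)
print(np.array_equal(naive[0, 3, :, 0], x[0, :, 3]))
```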

Also relevant is the Conv1D documentation.

Specifically about channels: "The ordering of the dimensions in the inputs. "channels_last" corresponds to inputs with shape (batch, steps, channels) (default format for temporal data in Keras)"
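For comparison, a sketch assuming tf.keras: under "channels_last", the question's (number of examples, lookback, features) array already has the (batch, steps, channels) layout, so Conv1D(padding='causal') also accepts the 5 features directly as channels. The check perturbs only the last timestep and confirms that earlier outputs are unchanged. A layer used this way mixes all features from the first layer on, unlike the per-feature stack in this answer.

```python
import numpy as np
import tensorflow as tf

lookback, n_features = 20, 5
# Filter count and dilation are arbitrary choices for the illustration
layer = tf.keras.layers.Conv1D(8, 3, padding="causal", dilation_rate=2)

x = np.random.rand(1, lookback, n_features).astype("float32")
y1 = np.asarray(layer(x))        # (1, 20, 8): the 5 features act as input channels

# Perturb only the last timestep; with causal padding, earlier outputs must not change
x2 = x.copy()
x2[:, -1, :] += 10.0
y2 = np.asarray(layer(x2))
print(np.allclose(y1[:, :-1], y2[:, :-1]))
```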

from tensorflow.keras import Sequential, backend
from tensorflow.keras.layers import (Conv1D, Dense, GlobalMaxPool1D, InputLayer,
                                     MaxPool1D, Permute, Reshape, TimeDistributed)

backend.clear_session()
lookback = 20
n_features = 5
filters = 128

model = Sequential()
model.add(InputLayer(input_shape=(lookback, n_features)))
# Causal layers are first applied to the features independently
model.add(Permute(dims=(2, 1)))  # UPDATE: must permute prior to adding the new dim and reshaping
model.add(Reshape(target_shape=(n_features, lookback, 1)))
# After the reshape, the 5 input features are treated as the temporal axis
# for the TimeDistributed layer.
# When Conv1D is applied to each input feature, it sees an input of shape (20, 1)
# with the default "channels_last", therefore:
#   20 time steps is the temporal dimension
#   1 is the "channel", the new location for the feature maps
model.add(TimeDistributed(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**0)))
# You could add pooling here if you want.
# If you want interaction between features AND causal/dilation, apply it later
model.add(TimeDistributed(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**1)))
model.add(TimeDistributed(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**2)))
# Stack feature maps on top of each other so each time step can look at
# all features produced earlier
model.add(Permute(dims=(2, 1, 3)))  # UPDATED to fix issue with reshape
model.add(Reshape(target_shape=(lookback, n_features * filters)))  # (20 time steps, 5 features * 128 filters)
# Causal layers are now applied to the 5 input features jointly
model.add(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**0))
model.add(MaxPool1D())
model.add(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**1))
model.add(MaxPool1D())
model.add(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**2))
model.add(GlobalMaxPool1D())
model.add(Dense(units=1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()
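On the dilation-rate part of the question: for stacked stride-1 causal Conv1D layers, each layer adds (kernel_size - 1) * dilation_rate timesteps of history, so doubling the dilation grows the receptive field exponentially. A plain-Python sketch (the helper name is hypothetical, and it ignores pooling, so it applies to the TimeDistributed stack rather than the pooled layers after it):

```python
def causal_receptive_field(kernel_size, dilation_rates):
    """Timesteps visible to one output of stacked stride-1 causal Conv1D layers."""
    # Each layer extends the visible history by (kernel_size - 1) * dilation steps
    return 1 + sum((kernel_size - 1) * d for d in dilation_rates)

# The three TimeDistributed layers above: kernel 3, dilations 2**0, 2**1, 2**2
print(causal_receptive_field(3, [1, 2, 4]))
# A fourth layer with dilation 2**3 would cover the whole lookback of 20
print(causal_receptive_field(3, [1, 2, 4, 8]))
```

Picking enough layers that the receptive field covers the lookback is a common rule of thumb, not a requirement.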

Final Model Summary

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
reshape (Reshape)            (None, 5, 20, 1)          0         
_________________________________________________________________
time_distributed (TimeDistri (None, 5, 20, 128)        512       
_________________________________________________________________
time_distributed_1 (TimeDist (None, 5, 20, 128)        49280     
_________________________________________________________________
time_distributed_2 (TimeDist (None, 5, 20, 128)        49280     
_________________________________________________________________
reshape_1 (Reshape)          (None, 20, 640)           0         
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 20, 128)           245888    
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 10, 128)           0         
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 10, 128)           49280     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 5, 128)            0         
_________________________________________________________________
conv1d_5 (Conv1D)            (None, 5, 128)            49280     
_________________________________________________________________
global_max_pooling1d (Global (None, 128)               0         
_________________________________________________________________
dense (Dense)                (None, 1)                 129       
=================================================================
Total params: 443,649
Trainable params: 443,649
Non-trainable params: 0
_________________________________________________________________
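The parameter counts in this summary follow from filters * (kernel_size * input_channels + 1) for each Conv1D. A quick arithmetic check (the helper name is hypothetical): the first TimeDistributed layer has 512 parameters rather than 5 x 512 because its weights are shared across features, and dilation adds no parameters.

```python
def conv1d_params(filters, kernel_size, in_channels):
    # One kernel_size x in_channels weight slab per filter, plus one bias per filter
    return filters * (kernel_size * in_channels + 1)

filters, k, n_features = 128, 3, 5
param_counts = [
    conv1d_params(filters, k, 1),                     # time_distributed
    conv1d_params(filters, k, filters),               # time_distributed_1
    conv1d_params(filters, k, filters),               # time_distributed_2
    conv1d_params(filters, k, n_features * filters),  # conv1d_3
    conv1d_params(filters, k, filters),               # conv1d_4
    conv1d_params(filters, k, filters),               # conv1d_5
    filters + 1,                                      # dense: 128 weights + 1 bias
]
print(param_counts, sum(param_counts))
```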

Edit:

"why you need to reshape and use n_features as the temporal layer"

The reason n_features is moved to the temporal axis initially comes from how the TimeDistributed layer is implemented: it applies the wrapped layer separately to each slice along axis 1. With the features on that axis, each dilated, causal Conv1D sees only one feature at a time.

From their documentation "Consider a batch of 32 samples, where each sample is a sequence of 10 vectors of 16 dimensions. The batch input shape of the layer is then (32, 10, 16), and the input_shape, not including the samples dimension, is (10, 16).

You can then use TimeDistributed to apply a Dense layer to each of the 10 timesteps, independently:"
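A sketch of that documentation example with the tf.keras functional API (written here, not copied from the docs):

```python
import tensorflow as tf

# Each sample: a sequence of 10 vectors of 16 dimensions (batch size left open)
inputs = tf.keras.Input(shape=(10, 16))
# One Dense(8) layer, applied to each of the 10 timesteps independently
outputs = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(8))(inputs)
model = tf.keras.Model(inputs, outputs)

print(tuple(outputs.shape))
# 16 * 8 weights + 8 biases, shared by all 10 timesteps
print(model.count_params())
```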

Applying the TimeDistributed layer to each feature independently reduces the problem to the single-feature case, where dilation and causal padding are straightforward. With 5 features, each one is first handled separately (by the same shared Conv1D weights), and the features are only combined after the second Permute/Reshape.
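A sketch of that independence, assuming tf.keras and a smaller filter count than the model above: after the Permute/Reshape/TimeDistributed steps, changing one feature leaves the feature maps of the other features untouched.

```python
import numpy as np
import tensorflow as tf

lookback, n_features, filters = 20, 5, 8  # small filter count for illustration

per_feature = tf.keras.Sequential([
    tf.keras.Input(shape=(lookback, n_features)),
    tf.keras.layers.Permute((2, 1)),
    tf.keras.layers.Reshape((n_features, lookback, 1)),
    tf.keras.layers.TimeDistributed(
        tf.keras.layers.Conv1D(filters, 3, padding="causal", dilation_rate=1)),
    tf.keras.layers.TimeDistributed(
        tf.keras.layers.Conv1D(filters, 3, padding="causal", dilation_rate=2)),
])

x = np.random.rand(1, lookback, n_features).astype("float32")
x2 = x.copy()
x2[:, :, 4] += 10.0  # change only feature 4, at every timestep

y1 = np.asarray(per_feature(x))   # (1, 5, 20, 8)
y2 = np.asarray(per_feature(x2))
# Feature maps for features 0-3 are unchanged: nothing mixes features yet
print(np.allclose(y1[:, :4], y2[:, :4]))
```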

After your edits, this recommendation still applies.
It makes no difference to the network whether the input shape is given through a separate InputLayer or through the first layer's input_shape argument, so you can put it in the first layer instead if that resolves the issue.

Multi-feature causal CNN - Keras implementation
