I have a TensorFlow model with a single `Dense` layer:
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
model.build(input_shape=(None, None, 25))
I construct a single input vector in `float32`:
np_vec = np.array(np.random.randn(1, 1, 25), dtype=np.float32)
vec = tf.cast(tf.convert_to_tensor(np_vec), dtype=tf.float32)
I want to feed that to my model for prediction, but it is very slow.
If I call `predict` or `__call__`, it takes a really long time compared to doing the same operation in NumPy.
- Call `predict`: `%timeit model.predict(vec)` gives 10 loops, best of 3: 21.9 ms per loop
- Call the model as-is: `%timeit model(vec, training=False)` gives 1000 loops, best of 3: 806 µs per loop
- Perform the multiplication operation myself in NumPy: `weights = np.array(model.layers[0].get_weights()[0])`, then `%timeit np_vec @ weights` gives 1000000 loops, best of 3: 1.27 µs per loop
- Perform the multiplication myself using torch: 100000 loops, best of 3: 2.57 µs per loop (the torch version is sketched after this list)
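For reference, the torch comparison is just the equivalent matmul; a minimal sketch, assuming it mirrors the NumPy comparison above (the tensor names here are placeholders, not taken from the notebook):

```python
import torch

# Reuse the same input and kernel as in the NumPy comparison.
torch_vec = torch.from_numpy(np_vec)        # shape (1, 1, 25), float32
torch_weights = torch.from_numpy(weights)   # shape (25, 2), float32

# %timeit torch_vec @ torch_weights
# 100000 loops, best of 3: 2.57 µs per loop
```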
Google Colab: https://colab.research.google.com/drive/1RCnTM24RUI4VkykVtdRtRdUVEkAHdu4A?usp=sharing
How can I make my TensorFlow model faster at inference time? Especially because I don't only have a `Dense` layer; I also use an `LSTM`, and I don't want to reimplement that in NumPy.
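For context, the real model is shaped roughly like the following (a minimal sketch only, assuming an LSTM feeding a Dense head; the layer sizes are placeholders, not the actual ones):

```python
# Hypothetical stand-in for the real model; 64 units is a placeholder.
real_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.Dense(2),
])
real_model.build(input_shape=(None, None, 25))
```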