I have a TensorFlow model with a single `Dense` layer:
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
model.build(input_shape=(None, None, 25))
I construct a single input vector in `float32`:
np_vec = np.array(np.random.randn(1, 1, 25), dtype=np.float32)
vec = tf.cast(tf.convert_to_tensor(np_vec), dtype=tf.float32)
I want to feed that to my model for prediction, but it is very slow.
If I call `predict` or `__call__`, it takes a really long time compared to doing the same operation in NumPy.
- Call `predict`: `%timeit model.predict(vec)` gives 10 loops, best of 3: 21.9 ms per loop
- Call the model as-is: `%timeit model(vec, training=False)` gives 1000 loops, best of 3: 806 µs per loop
- Perform the multiplication operation myself in NumPy: `weights = np.array(model.layers[0].get_weights()[0])`, then `%timeit np_vec @ weights` gives 1000000 loops, best of 3: 1.27 µs per loop
- Perform the multiplication myself using torch: 100000 loops, best of 3: 2.57 µs per loop (the torch version is sketched after this list)
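For reference, the torch comparison is just the equivalent matmul; a minimal sketch, assuming it mirrors the NumPy comparison above (the tensor names here are placeholders, not taken from the notebook):

```python
import torch

# Reuse the same input and kernel as in the NumPy comparison.
torch_vec = torch.from_numpy(np_vec)        # shape (1, 1, 25), float32
torch_weights = torch.from_numpy(weights)   # shape (25, 2), float32

# %timeit torch_vec @ torch_weights
# 100000 loops, best of 3: 2.57 µs per loop
```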
Google Colab: https://colab.research.google.com/drive/1RCnTM24RUI4VkykVtdRtRdUVEkAHdu4A?usp=sharing
How can I make my TensorFlow model faster at inference time? Especially because I don't only have a `Dense` layer; I also use an `LSTM`, and I don't want to reimplement that in NumPy.
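For context, the real model is shaped roughly like the following (a minimal sketch only, assuming an LSTM feeding a Dense head; the layer sizes are placeholders, not the actual ones):

```python
# Hypothetical stand-in for the real model; 64 units is a placeholder.
real_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.Dense(2),
])
real_model.build(input_shape=(None, None, 25))
```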