Something wrong with Keras code Q-learning OpenAI gym FrozenLake

2024/9/21 7:14:54

Maybe my question will seem stupid.

I'm studying the Q-learning algorithm. To understand it better, I'm trying to port the TensorFlow code of this FrozenLake example to Keras.

My code:

import gym
import numpy as np
import random
from keras.layers import Dense
from keras.models import Sequential
from keras import backend as K

import matplotlib.pyplot as plt
%matplotlib inline

env = gym.make('FrozenLake-v0')

model = Sequential()
model.add(Dense(16, activation='relu', kernel_initializer='uniform', input_shape=(16,)))
model.add(Dense(4, activation='softmax', kernel_initializer='uniform'))

def custom_loss(yTrue, yPred):
    return K.sum(K.square(yTrue - yPred))

model.compile(loss=custom_loss, optimizer='sgd')

# Set learning parameters
y = .99
e = 0.1
#create lists to contain total rewards and steps per episode
jList = []
rList = []

num_episodes = 2000
for i in range(num_episodes):
    current_state = env.reset()
    rAll = 0
    d = False
    j = 0
    while j < 99:
        j+=1
        current_state_Q_values = model.predict(np.identity(16)[current_state:current_state+1], batch_size=1)
        action = np.reshape(np.argmax(current_state_Q_values), (1,))

        if np.random.rand(1) < e:
            action[0] = env.action_space.sample() #random action

        new_state, reward, d, _ = env.step(action[0])

        rAll += reward
        jList.append(j)
        rList.append(rAll)

        new_Qs = model.predict(np.identity(16)[new_state:new_state+1], batch_size=1)
        max_newQ = np.max(new_Qs)

        targetQ = current_state_Q_values
        targetQ[0,action[0]] = reward + y*max_newQ
        model.fit(np.identity(16)[current_state:current_state+1], targetQ, verbose=0, batch_size=1)
        current_state = new_state
        if d == True:
            #Reduce chance of random action as we train the model.
            e = 1./((i/50) + 10)
            break

print("Percent of succesful episodes: " + str(sum(rList)/num_episodes) + "%")

When I run it, it doesn't work well: Percent of succesful episodes: 0.052%

plt.plot(rList)


The original TensorFlow code does much better: Percent of succesful episodes: 0.352%

plt.plot(rList)


What have I done wrong?

Answer

Besides setting use_bias=False, as @Maldus mentioned in the comments, you can try starting with a higher epsilon value (e.g. 0.5 or 0.75). Another trick is to decrease epsilon only when the agent reaches the goal, rather than at the end of every episode. That way the agent keeps exploring the map randomly until it starts to converge on a good route, and only then does epsilon start to shrink.
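To make the epsilon idea concrete, here is a minimal sketch in plain Python. The helper names (choose_action, update_epsilon) and the decay and min_epsilon values are my own assumptions for illustration, not something taken from the question's code, and the list of outcomes stands in for real episodes.

```python
import random

def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def update_epsilon(epsilon, reached_goal, decay=0.99, min_epsilon=0.01):
    """Shrink epsilon only after an episode that reached the goal.

    decay and min_epsilon are hypothetical values; tune them for your setup.
    """
    if reached_goal:
        return max(min_epsilon, epsilon * decay)
    return epsilon

# Toy run: made-up episode outcomes standing in for real FrozenLake episodes.
epsilon = 0.5  # start higher than the 0.1 used in the question
for reached_goal in [False, False, True, False, True]:
    epsilon = update_epsilon(epsilon, reached_goal)
```

In the training loop from the question, this would mean calling update_epsilon once per episode with reached_goal set to something like reward == 1 on the final step, instead of recomputing e whenever d is True.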

I've actually implemented a similar model in Keras in this gist, using convolutional layers instead of Dense layers. I managed to get it to work in under 2000 episodes. Might be of some help to others :)
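As for why use_bias=False can matter, here is a small NumPy sketch. It assumes a single linear layer (not the two-layer model from the question), and W and b are random stand-ins for a layer's kernel and bias. With a one-hot state and no bias, the layer output is just one row of the weight matrix, so it behaves like a Q-table lookup; a bias adds the same vector to the Q-values of every state.

```python
import numpy as np

n_states, n_actions = 16, 4
rng = np.random.default_rng(0)

# Random stand-ins for a Dense layer's kernel and bias (illustrative only).
W = rng.uniform(0.0, 0.01, size=(n_states, n_actions))
b = rng.uniform(0.0, 0.01, size=n_actions)

state = 5
one_hot = np.identity(n_states)[state:state + 1]  # shape (1, 16), as in the question

q_no_bias = one_hot @ W        # selects row `state` of W
q_with_bias = one_hot @ W + b  # same row plus a vector shared by all states

print(np.allclose(q_no_bias[0], W[state]))  # the layer acts as a table lookup
```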
