Cartpole with Q-Learning not Learning Anything
So I was trying to solve the cartpole problem, a classic exercise in reinforcement learning: a cart balancing a pole can move left or right, and the episode ends when the pole falls. The goal is to keep the pole balanced for as long as possible.
People usually take the shortcut of using the OpenAI Gym environment, but I wanted to build the environment myself, since I eventually want to make my own, more complex environments. I got it to run, but it doesn't seem to be learning anything: it collects a maximum of 14 points, every single time. What am I doing wrong?
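For reference, the tabular Q-learning update I'm trying to implement is the standard one:

    Q(s, a) ← (1 − α) · Q(s, a) + α · (r + γ · max_a′ Q(s′, a′))

where α is LEARNING_RATE and γ is DISCOUNT in the code below.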
```python
import numpy as np
import math
import statistics

# Environment dimensions
SCREEN_WIDTH = 800
SCREEN_HEIGHT = 600

# Cart properties
CART_WIDTH = 100
CART_HEIGHT = 20
cart_x = SCREEN_WIDTH // 2 - CART_WIDTH
cart_y = SCREEN_HEIGHT - 50
cart_speed = 5

# Pole properties
POLE_LENGTH = 100
POLE_ANGLE = math.pi / 4
POLE_ANGULAR_VELOCITY = 0.0
POLE_ANGULAR_ACCELERATION = 0.0
GRAVITY = 0.01

# Game loop flag
running = True

EPISODES = 20000
LEFT = 0
RIGHT = 1
ACTIONS = [LEFT, RIGHT]
EPSILON = 0.9
EPSILON_DECAY = 0.01
MIN_EPSILON = 0.01
LEARNING_RATE = 0.5
DISCOUNT = 0.9

q_table = np.zeros((800, len(ACTIONS)))


def check_game_over(pole_angle):
    if abs(pole_angle) > math.pi / 2:
        return True
    return False


def update_pos(state, action, pole_angular_acceleration, pole_angle, pole_angular_velocity):
    if action == 0:
        state -= cart_speed
    if action == 1:
        state += cart_speed
    # Constrain cart within screen boundaries
    state = max(0, min(state, SCREEN_WIDTH - CART_WIDTH))
    # update pole physics
    pole_angular_acceleration = GRAVITY * math.sin(pole_angle)
    pole_angular_velocity += pole_angular_acceleration
    pole_angle += pole_angular_velocity
    # apply damping to stabilize the pole
    pole_angular_velocity *= 0.99
    return state, pole_angle, pole_angular_velocity, pole_angular_acceleration


def choose_action(state, epsilon):
    if np.random.uniform() < epsilon:
        action = np.argmax(q_table[state])
    else:
        action = np.random.choice(ACTIONS)
    return action


def train():
    for e in range(EPISODES):
        pole_angular_velocity = POLE_ANGULAR_VELOCITY
        pole_angle = POLE_ANGLE
        pole_angular_acceleration = POLE_ANGULAR_ACCELERATION
        reward = 0
        rewards = []
        avg_rewards = []
        epsilon = EPSILON
        state = SCREEN_WIDTH // 2 - CART_WIDTH
        while not check_game_over(pole_angle):
            # choose action
            action = choose_action(state, epsilon)
            # update positions
            old_pos = q_table[state][action]
            next_s, pole_angle, pole_angular_velocity, pole_angular_acceleration = update_pos(state, action, pole_angular_acceleration, pole_angle, pole_angular_velocity)
            next_max = max(q_table[int(old_pos)])
            new_value = (1 - LEARNING_RATE) * old_pos + LEARNING_RATE * (reward + DISCOUNT * next_max)
            q_table[int(old_pos)][action] = new_value
            state = next_s
            # reward stuff
            reward += 1
        print(reward)
        rewards.append(reward)
        epsilon = max(MIN_EPSILON, epsilon * EPSILON_DECAY)
        if e % 100 == 0:
            avg_rewards.append(statistics.mean(rewards))
            print(avg_rewards)


train()
```
I thought the problem was that I wasn't decreasing epsilon properly, but fixing that didn't change the performance at all.
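For what it's worth, the decay schedule I experimented with looked roughly like this (a minimal sketch; the 0.995 factor is illustrative, not the exact value I used):

```python
# Sketch of the epsilon schedule I tried: decay once per episode
# instead of resetting, so exploration shrinks gradually over training.
EPSILON = 0.9
MIN_EPSILON = 0.01
DECAY_FACTOR = 0.995  # illustrative value, close to 1 for a slow decay

epsilon = EPSILON
for e in range(20000):
    # run_episode(epsilon) would go here in the real training loop
    epsilon = max(MIN_EPSILON, epsilon * DECAY_FACTOR)
```

Even with a schedule like that, the agent still tops out at 14 points every time.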