Cartpole with Q-Learning not Learning Anything
So I was trying to solve the cartpole problem, a classic exercise in reinforcement learning: a cart balancing a pole can move left or right, and the episode ends when the pole falls. The goal is to keep the pole balanced for as long as possible.
People usually take the shortcut of using the OpenAI Gym environment, but I wanted to build the environment myself, since I eventually want to make my own, more complex environments. I got it to run, but it doesn't seem to be learning anything: it collects a maximum of 14 points, every single time. What am I doing wrong?
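For reference, the tabular Q-learning update I'm trying to implement is the standard one:

    Q(s, a) ← (1 − α) · Q(s, a) + α · (r + γ · max_a′ Q(s′, a′))

where α is LEARNING_RATE and γ is DISCOUNT in the code below.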
```python
import numpy as np
import math
import statistics

# Environment dimensions
SCREEN_WIDTH = 800
SCREEN_HEIGHT = 600

# Cart properties
CART_WIDTH = 100
CART_HEIGHT = 20
cart_x = SCREEN_WIDTH // 2 - CART_WIDTH
cart_y = SCREEN_HEIGHT - 50
cart_speed = 5

# Pole properties
POLE_LENGTH = 100
POLE_ANGLE = math.pi / 4
POLE_ANGULAR_VELOCITY = 0.0
POLE_ANGULAR_ACCELERATION = 0.0
GRAVITY = 0.01

# Game loop flag
running = True

EPISODES = 20000
LEFT = 0
RIGHT = 1
ACTIONS = [LEFT, RIGHT]
EPSILON = 0.9
EPSILON_DECAY = 0.01
MIN_EPSILON = 0.01
LEARNING_RATE = 0.5
DISCOUNT = 0.9

q_table = np.zeros((800, len(ACTIONS)))


def check_game_over(pole_angle):
    if abs(pole_angle) > math.pi / 2:
        return True
    return False


def update_pos(state, action, pole_angular_acceleration, pole_angle, pole_angular_velocity):
    if action == 0:
        state -= cart_speed
    if action == 1:
        state += cart_speed
    # Constrain cart within screen boundaries
    state = max(0, min(state, SCREEN_WIDTH - CART_WIDTH))
    # update pole physics
    pole_angular_acceleration = GRAVITY * math.sin(pole_angle)
    pole_angular_velocity += pole_angular_acceleration
    pole_angle += pole_angular_velocity
    # apply damping to stabilize the pole
    pole_angular_velocity *= 0.99
    return state, pole_angle, pole_angular_velocity, pole_angular_acceleration


def choose_action(state, epsilon):
    if np.random.uniform() < epsilon:
        action = np.argmax(q_table[state])
    else:
        action = np.random.choice(ACTIONS)
    return action


def train():
    for e in range(EPISODES):
        pole_angular_velocity = POLE_ANGULAR_VELOCITY
        pole_angle = POLE_ANGLE
        pole_angular_acceleration = POLE_ANGULAR_ACCELERATION
        reward = 0
        rewards = []
        avg_rewards = []
        epsilon = EPSILON
        state = SCREEN_WIDTH // 2 - CART_WIDTH
        while not check_game_over(pole_angle):
            # choose action
            action = choose_action(state, epsilon)
            # update positions
            old_pos = q_table[state][action]
            next_s, pole_angle, pole_angular_velocity, pole_angular_acceleration = update_pos(state, action, pole_angular_acceleration, pole_angle, pole_angular_velocity)
            next_max = max(q_table[int(old_pos)])
            new_value = (1 - LEARNING_RATE) * old_pos + LEARNING_RATE * (reward + DISCOUNT * next_max)
            q_table[int(old_pos)][action] = new_value
            state = next_s
            # reward stuff
            reward += 1
        print(reward)
        rewards.append(reward)
        epsilon = max(MIN_EPSILON, epsilon * EPSILON_DECAY)
        if e % 100 == 0:
            avg_rewards.append(statistics.mean(rewards))
            print(avg_rewards)


train()
```
I thought the problem was that I wasn't decreasing epsilon properly, but fixing that didn't change the performance at all.
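For what it's worth, the decay schedule I experimented with looked roughly like this (a minimal sketch; the 0.995 factor is illustrative, not the exact value I used):

```python
# Sketch of the epsilon schedule I tried: decay once per episode
# instead of resetting, so exploration shrinks gradually over training.
EPSILON = 0.9
MIN_EPSILON = 0.01
DECAY_FACTOR = 0.995  # illustrative value, close to 1 for a slow decay

epsilon = EPSILON
for e in range(20000):
    # run_episode(epsilon) would go here in the real training loop
    epsilon = max(MIN_EPSILON, epsilon * DECAY_FACTOR)
```

Even with a schedule like that, the agent still tops out at 14 points every time.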