
Reinforcement Learning - GridWorld Navigation

November 10, 2024

2 min read

Overview

This project explores Reinforcement Learning (RL) techniques for autonomous agent navigation in a GridWorld environment. The implementation includes classical RL algorithms such as Q-Learning and SARSA, demonstrating how an agent learns optimal navigation strategies through trial and error.
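To make the setting concrete, here is a minimal GridWorld sketch. The layout and reward values are hypothetical (the post does not specify them): the agent starts in the top-left corner of a 4x4 grid, pays -1 per step, and receives +10 for reaching the bottom-right goal.

```python
# Minimal GridWorld sketch (hypothetical layout, not the project's exact
# environment): start at (0, 0), goal at (size-1, size-1),
# -1 reward per step, +10 on reaching the goal.
class GridWorld:
    def __init__(self, size=4):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.reset()

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        # Actions: 0 = up, 1 = down, 2 = left, 3 = right
        moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
        dr, dc = moves[action]
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        done = self.pos == self.goal
        reward = 10.0 if done else -1.0
        return self.pos, reward, done
```

Moves that would leave the grid simply clamp to the border, a common convention for tabular GridWorlds.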

Key Features

Technical Implementation

Algorithms

Q-Learning

SARSA (State-Action-Reward-State-Action)

Environment Design

The GridWorld environment features:

Technologies Used

Results & Analysis

The project demonstrates:

Key Findings

  1. Q-Learning tends to find optimal policies faster, but its off-policy max operator can make training less stable
  2. SARSA shows more stable learning curves because its on-policy updates account for the exploration actions actually taken
  3. With proper tuning of the hyperparameters (learning rate, discount factor, exploration rate), both algorithms successfully learn to navigate complex grid layouts
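Both algorithms depend on an exploration strategy to drive the trial-and-error learning described above. The post does not name one, but a common choice is ε-greedy action selection, sketched here over a tabular Q array:

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, state, epsilon=0.1):
    # With probability epsilon take a random action (explore);
    # otherwise take the current best action (exploit).
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[state]))
```

Annealing ε from a high initial value toward a small floor is a common way to trade early exploration for late exploitation.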

Implementation Highlights

import numpy as np

# Q is an (n_states, n_actions) array; the learning rate alpha and
# discount factor gamma are assumed to be defined elsewhere.

# Simplified Q-Learning update (off-policy: bootstraps from the greedy next action)
def q_learning_update(state, action, reward, next_state):
    max_next_q = np.max(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * max_next_q - Q[state, action])

# Simplified SARSA update (on-policy: bootstraps from the action actually taken)
def sarsa_update(state, action, reward, next_state, next_action):
    next_q = Q[next_state, next_action]
    Q[state, action] += alpha * (reward + gamma * next_q - Q[state, action])
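An update rule alone doesn't learn anything; it has to sit inside a training loop. Here is a hypothetical end-to-end sketch using the Q-Learning update with ε-greedy exploration on a 5-state corridor (not the project's actual environment): action 1 moves right, action 0 moves left, and reaching the rightmost state ends the episode with reward +1.

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    # Deterministic corridor dynamics; state n_states-1 is terminal.
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

for episode in range(500):
    state = 0
    for _ in range(200):  # cap episode length
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            # Greedy action with random tie-breaking
            best = np.flatnonzero(Q[state] == Q[state].max())
            action = int(rng.choice(best))
        next_state, reward, done = step(state, action)
        # Q-Learning update, same form as the snippet above
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
        if done:
            break

# After training, the greedy policy should prefer moving right
# in the non-terminal states.
```

Swapping in the SARSA update only requires selecting `next_action` with the same ε-greedy policy before the update and bootstrapping from `Q[next_state, next_action]` instead of the max.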

Learning Outcomes

This project provided hands-on experience with:

Future Enhancements



Project Status: Completed
Date: November 2024
Course: Reinforcement Learning