[UA] NaUKMA RL Spring '24, Lecture 2 - RL Algorithms
Markov chains, MDPs, Bellman equation, Q-function, value function, off-policy, on-policy, model-based, policy gradients, actor-critic, exploration-exploitation, k-armed bandits, epsilon-greedy, optimistic greedy, UCB, gradient bandits.