Partially observable Markov decision process
A POMDP is a generalization of a Markov decision process (MDP).
It models an agent's decision process in which the system dynamics are assumed to be governed by an MDP, but the agent cannot directly observe the underlying state; instead, the agent maintains a belief, a probability distribution over states, which it updates from the observations it receives.
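The belief update described above can be sketched as a discrete Bayes filter. This is a minimal illustration, not from the source; the function name, array layout, and all model numbers are assumptions.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes-filter belief update for a discrete POMDP.

    b: current belief over states, shape (S,)
    a: action index
    o: observation index
    T: transition model, T[a, s, s2] = P(s2 | s, a), shape (A, S, S)
    O: observation model, O[a, s2, o] = P(o | s2, a), shape (A, S, num_obs)
    Returns b'(s') proportional to O[a, s', o] * sum_s T[a, s, s'] * b[s].
    """
    predicted = b @ T[a]               # predict: sum_s b(s) * P(s' | s, a)
    unnormalized = predicted * O[a][:, o]  # weight by observation likelihood
    return unnormalized / unnormalized.sum()

# Tiny two-state, one-action example (hypothetical numbers):
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, o=0, T=T, O=O)
```

Starting from a uniform belief, observing `o=0` (which is more likely in state 0 under this model) shifts the belief toward state 0.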
Belief MDP bound (Token Entropy)
The observation likelihood enters the belief update as an additional multiplicative factor: it weights each candidate state by how strongly the observation supports it, i.e., how much we trust the observation.
Multi-agent observation sharing as an approximation for POMDPs: agents exchange their observations so that each agent's belief over the underlying state becomes sharper.
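The sharing idea can be sketched as fusing several agents' observations into one belief. This sketch assumes the shared observations are conditionally independent given the state (so their likelihoods multiply); the function name and all model numbers are illustrative assumptions, not from the source.

```python
import numpy as np

def fuse_shared_observations(prior, obs_list, O):
    """Fuse observations shared by multiple agents into one belief.

    Assumes observations are conditionally independent given the state,
    so each likelihood multiplies into the belief before normalizing.
    prior: belief over states, shape (S,)
    obs_list: observation indices reported by the agents
    O: per-state observation model, O[s, o] = P(o | s), shape (S, num_obs)
    """
    b = prior.copy()
    for o in obs_list:
        b = b * O[:, o]        # weight by each shared observation's likelihood
    return b / b.sum()

# Hypothetical 2-state, 2-observation model:
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])
prior = np.array([0.5, 0.5])
# Two agents both report observation 0, so the belief concentrates on state 0.
b = fuse_shared_observations(prior, [0, 0], O)
```

With more agents reporting consistent observations, the fused belief approaches certainty about the underlying state, which is what makes sharing a useful approximation.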