A policy, commonly written π(a|s), is a mapping from a state s to a probability distribution over actions.
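As a concrete illustration, here is a minimal sketch (assuming PyTorch, a discrete action space, and a hypothetical `SoftmaxPolicy` class introduced only for this example) of a parameterized policy that maps a state vector to a categorical distribution over actions:

```python
import torch
import torch.nn as nn

class SoftmaxPolicy(nn.Module):
    """Maps a state vector to a categorical distribution over discrete actions."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.distributions.Categorical:
        logits = self.net(state)        # unnormalized action preferences
        return torch.distributions.Categorical(logits=logits)

# Example: sample an action for a 4-dimensional state.
policy = SoftmaxPolicy(state_dim=4, n_actions=2)
dist = policy(torch.zeros(4))
action = dist.sample()                  # action drawn from pi(a | s)
```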
Policy-based methods learn the policy directly: they parameterize it (for example, with a neural network) and update those parameters, without maintaining a table of values. Value-based methods instead learn a value function and derive the policy from it, for example by acting greedily with respect to the estimated action values. Actor-critic methods combine the two: an actor updates the policy while a critic approximates the value function and guides the actor's updates.
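The sketch below illustrates this division of labor with a one-step actor-critic update. It is a minimal example assuming PyTorch, a discrete action space, and dummy dimensions and hyperparameters (`state_dim`, `n_actions`, `gamma`, the learning rates, and the helper `actor_critic_step` are all chosen here for illustration, not taken from the text):

```python
import torch
import torch.nn as nn

# Illustrative dimensions and hyperparameters.
state_dim, n_actions, gamma = 4, 2, 0.99

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
critic = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def actor_critic_step(state, action, reward, next_state, done):
    """One-step actor-critic update driven by the TD error."""
    value = critic(state).squeeze(-1)
    with torch.no_grad():
        next_value = critic(next_state).squeeze(-1)
        target = reward + gamma * (1.0 - done) * next_value
    td_error = target - value

    # Critic: regress V(s) toward the bootstrapped target.
    critic_loss = td_error.pow(2)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: increase log pi(a|s) in proportion to the TD error,
    # used here as an advantage estimate.
    dist = torch.distributions.Categorical(logits=actor(state))
    actor_loss = -dist.log_prob(action) * td_error.detach()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

# Example transition with dummy values.
s = torch.zeros(state_dim)
a = torch.tensor(1)
actor_critic_step(s, a, reward=torch.tensor(1.0),
                  next_state=torch.ones(state_dim), done=torch.tensor(0.0))
```

The key design point is that the actor never needs explicit value targets of its own: the critic's TD error tells it whether the sampled action turned out better or worse than expected, and the policy parameters are nudged accordingly.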