The step() function takes an action as input and applies it to the environment, which then transitions to a new state.
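In Gymnasium, step() returns five values: the new observation, the reward, the terminated flag, the truncated flag, and an info dictionary. The older Gym API returned four (observation, reward, done, info), with a single done flag in place of terminated and truncated.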
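As a minimal sketch of the five-value convention, the code below defines a hypothetical toy environment, CountdownEnv (not part of Gymnasium), whose reset() and step() mimic Gymnasium's signatures, plus a hypothetical run_episode helper. The loop unpacks all five values and stops when either flag is set; a real environment created with gymnasium.make would be driven the same way.

```python
from __future__ import annotations

from typing import Any, Callable


class CountdownEnv:
    """Hypothetical toy environment mimicking Gymnasium's reset/step signatures.

    Action 1 decrements a counter; action 0 does nothing. Reaching 0 is a true
    terminal state (terminated). Hitting max_steps first is a time limit (truncated).
    """

    def __init__(self, start: int = 5, max_steps: int = 3) -> None:
        self.start = start
        self.max_steps = max_steps

    def reset(self, seed: int | None = None) -> tuple[int, dict[str, Any]]:
        # seed is accepted only for signature compatibility; this env is deterministic.
        self.counter = self.start
        self.steps = 0
        return self.counter, {}

    def step(self, action: int) -> tuple[int, float, bool, bool, dict[str, Any]]:
        self.counter -= action
        self.steps += 1
        observation = self.counter
        reward = 1.0 if action == 1 else 0.0
        terminated = self.counter <= 0  # the task itself has ended
        # In this toy env, truncation only applies if the episode did not terminate.
        truncated = not terminated and self.steps >= self.max_steps
        info = {"steps": self.steps}
        return observation, reward, terminated, truncated, info


def run_episode(env: CountdownEnv, policy: Callable[[int], int]) -> tuple[float, bool, bool]:
    """Run one episode and report the total reward and which flag ended it."""
    observation, info = env.reset(seed=0)
    total_reward = 0.0
    while True:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        # Either flag means the episode is over and the env must be reset.
        if terminated or truncated:
            return total_reward, terminated, truncated
```

For example, run_episode(CountdownEnv(start=5, max_steps=3), lambda obs: 1) ends by truncation after three steps, while CountdownEnv(start=2, max_steps=10) ends by termination after two.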
In Gymnasium, two of the values returned by env.step are booleans that signal the end of an episode. The terminated flag is True when the agent reaches a terminal state of the task itself, so the trajectory genuinely ends. The truncated flag is True when the episode is cut off for a reason outside the task, most commonly reaching the maximum episode length. In the truncated case you should still reset the environment, but the done flag used for TD updates (the one stored in the replay buffer) should be False: the final state is not terminal, so the TD target should still bootstrap from its value.
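A minimal sketch of how this plays out in a one-step Q-learning target, assuming a discrete action space where next_q_values holds the estimated values of the next state's actions. The helper names td_target and to_buffer_entry are hypothetical; the point is that only terminated stops bootstrapping, while truncated only controls when to reset.

```python
from __future__ import annotations

from typing import Any


def td_target(reward: float, next_q_values: list[float], terminated: bool, gamma: float = 0.99) -> float:
    """One-step Q-learning target: r + gamma * max_a' Q(s', a'), unless s' is terminal.

    A truncated (but not terminated) transition still bootstraps, because the
    episode was cut off externally and the next state still has value.
    """
    bootstrap = 0.0 if terminated else gamma * max(next_q_values)
    return reward + bootstrap


def to_buffer_entry(obs: Any, action: int, reward: float, next_obs: Any,
                    terminated: bool, truncated: bool) -> tuple[Any, int, float, Any, bool]:
    # Store only `terminated` as the done flag. Storing `terminated or truncated`
    # would wrongly zero out the bootstrap term at time limits.
    # `truncated` is used by the rollout loop to decide when to reset, not here.
    return (obs, action, reward, next_obs, terminated)
```

With gamma = 0.5, reward 1.0, and next-state values [2.0, 3.0], a truncated transition gives a target of 1.0 + 0.5 * 3.0 = 2.5, while a terminated one gives just 1.0.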