In real-world applications, the test distribution usually does not match the training distribution.
- iteration: one update step over a single batch; the number of iterations needed to complete one epoch equals the dataset size divided by the batch size
- batch size: the total number of training examples processed in a single batch (a small worked example follows)
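For concreteness, here is a minimal sketch of how dataset size, batch size, and iterations per epoch relate; the numbers are illustrative placeholders, not values from these notes.

```python
# Relationship between dataset size, batch size, and iterations per epoch.
import math

num_examples = 50_000   # hypothetical training-set size
batch_size = 128        # examples processed per iteration (one weight update)

# One epoch = one full pass over the data; each iteration consumes one batch.
iterations_per_epoch = math.ceil(num_examples / batch_size)
print(iterations_per_epoch)  # 391 iterations to complete one epoch
```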
Model Training Workflow
- Data preprocessing
- Model architecture
- Choose a reasonable loss function
- Start by overfitting a small dataset with a few iterations
- Find an optimal learning rate and regularization parameters
- If the loss exceeds 3x the initial loss, quit early (the learning rate is too high); see the sketch after this list
- Scale up the dataset size and number of epochs
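As a rough illustration of the workflow above, the sketch below (assuming PyTorch) overfits a small synthetic subset for a few iterations and quits early if the loss grows beyond 3x its initial value, the sign that the learning rate is too high. The model, data, and hyperparameters are arbitrary placeholders, not values from these notes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small synthetic subset: the goal at this stage is just to overfit it quickly.
X = torch.randn(64, 10)
y = torch.randn(64, 1)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
# Regularization via weight decay; lr is a placeholder to be tuned.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)
loss_fn = nn.MSELoss()  # a reasonable loss for this toy regression problem

initial_loss = None
for step in range(200):  # few iterations on the small subset
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

    if initial_loss is None:
        initial_loss = loss.item()
    elif loss.item() > 3 * initial_loss:
        # Loss exploded: quit early, the learning rate is too high.
        print(f"step {step}: loss {loss.item():.3f} > 3x initial loss, stopping")
        break
else:
    print(f"final loss on the small subset: {loss.item():.3f}")
```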
Model Training Tips
It used to be common advice to set batch sizes to powers of 2 (e.g., 64, 128, 256, ...) for performance reasons, but recent research suggests this is not strictly necessary. What matters more is finding the optimal value through actual experiments.
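As one way to run such an experiment, the sketch below (again assuming PyTorch) times a few candidate batch sizes, both powers of 2 and not, on the same toy model and compares throughput; the candidate sizes and model are illustrative placeholders.

```python
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Mix of power-of-2 and non-power-of-2 candidate batch sizes.
for batch_size in (96, 128, 160, 256):
    X = torch.randn(batch_size, 100)
    y = torch.randint(0, 10, (batch_size,))
    start = time.perf_counter()
    for _ in range(50):  # a fixed number of training iterations per candidate
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
    elapsed = time.perf_counter() - start
    print(f"batch_size={batch_size}: {50 * batch_size / elapsed:.0f} examples/sec")
```

In practice the winner depends on the hardware and model, which is exactly why measuring beats assuming a power of 2.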