Model Layer Scaling Techniques
  Depth Up-Scaling
  COCONUT
  Recursive Transformer

Model Layer Optimization
  MoD
  LayerSkip
  Recursive Transformers
  Transkimmer

arxiv.org: https://arxiv.org/pdf/2203.00555