LoRA builds on the theoretical finding that the intrinsic rank of the weight update during adaptation is low
LoRA reduces the number of trainable weight parameters by representing the adaptation layer's update, a matrix of low intrinsic rank, as a product of two low-dimensional matrices
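A minimal sketch of this decomposition (an illustration, not the reference LoRA implementation; the layer size and rank below are arbitrary): the pretrained weight W is frozen and only the rank-r update ΔW = BA is trained, scaled by α/r.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus a trainable low-rank update ΔW = B @ A."""
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)               # pretrained W stays frozen
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)   # (r x d_in), small random init
        self.B = nn.Parameter(torch.zeros(d_out, r))          # (d_out x r), zero init so ΔW = 0 at start
        self.scale = alpha / r

    def forward(self, x):
        # (W + scale * B A) x, computed without ever forming the full d_out x d_in update
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 8 * 768 = 12,288 trainable parameters vs. 768 * 768 = 589,824 in W
```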
For separating a single convex polytope with m faces, the paper establishes an upper bound for 2-layer networks (width m is sufficient) and a lower bound that deeper networks must satisfy (constraints on how the widths of the layers combine). For the inverse problem (trained network → extracting the polytope structure of the data), it views trained ReLU networks as implicitly encoding the polytope structure of the data and presents an algorithm that finds minimal representations by 'compressing' polytopes through neuron removal and rescaling. On toy geometries such as two triangles or an outer hexagon with an inner pentagonal hole, training with BCE/MSE converges to the boundary hyperplane/polytope structures predicted by the theory, which is demonstrated visually.
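A minimal numpy sketch of the width-m upper bound, using the natural "one hidden ReLU neuron per face" construction (an assumed illustration; the paper's exact construction and constants may differ): the hidden layer measures how much each face inequality is violated, and the linear output sums the violations, so the score is 0 exactly on the polytope.

```python
import numpy as np

def polytope_separator(A, b):
    """2-layer ReLU network (hidden width m) for the polytope {x : A x <= b}.

    Hidden neuron i computes ReLU(a_i . x - b_i), the violation of face i;
    the output layer sums them, giving 0 inside the polytope and > 0 outside.
    """
    def score(x):
        violations = np.maximum(A @ x - b, 0.0)  # hidden ReLU layer, one neuron per face
        return violations.sum()                  # linear output layer
    return score

# Example: the unit square as a polytope with m = 4 faces
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])  # encodes 0 <= x <= 1, 0 <= y <= 1
f = polytope_separator(A, b)
print(f(np.array([0.5, 0.5])))  # 0.0 (inside)
print(f(np.array([2.0, 0.5])))  # 1.0 (outside)
```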
Polytope simplicity of real datasets: analyzing MNIST/Fashion-MNIST/CIFAR10 one-vs-all per class, each class can be covered by at most 2 polytopes (face counts are generally limited to 30 or fewer, splitting into 2 polytopes when necessary), and the paper gives concrete numbers for a small-width 3-layer architecture that achieves complete (constructive) separation on MNIST/Fashion-MNIST/CIFAR10. The resulting networks have very sparse connections, suggesting a possible connection to the Lottery Ticket Hypothesis (LTH); in particular, the paper notes that preserving weight signs is important for preserving the convex polytope structure.
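A hedged sketch of how the "at most 2 polytopes per class" finding could be read as a one-vs-all decision rule (hypothetical illustration, not the paper's procedure): each class keeps a small list of polytopes, and a point is assigned to the class whose polytopes it violates least.

```python
import numpy as np

def class_violation(x, polytopes):
    """Smallest total face violation over a class's polytopes (each given as (A, b)
    with faces {x : A x <= b}); equals 0 iff x lies inside at least one of them."""
    return min(np.maximum(A @ x - b, 0.0).sum() for A, b in polytopes)

def predict(x, class_polytopes):
    """class_polytopes: dict mapping class label -> list of (A, b) polytopes (<= 2 per class)."""
    return min(class_polytopes, key=lambda c: class_violation(x, class_polytopes[c]))
```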
https://openreview.net/pdf?id=qXoqV40imX
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
https://arxiv.org/abs/2012.13255

Measuring the Intrinsic Dimension of Objective Landscapes
https://arxiv.org/abs/1804.08838


Seonglae Cho