Data Classification

Creator: Seonglae Cho
Created: 2023 Mar 5 13:4
Edited: 2026 Jan 12 17:39

The target variable takes discrete values

  • the discrete values correspond to classes
  • the target takes on only a small number of discrete values
  • the classification function is defined by a mapping from inputs to a finite label set, f: X → {1, …, K}
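As a minimal sketch of such a mapping, an untrained linear score function followed by argmax already has the right type: it sends a continuous input to one of K discrete labels (the weights and dimensions below are illustrative, not from the source).

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 3, 4                    # 3 classes, 4 input features (arbitrary choices)
W = rng.normal(size=(K, d))    # per-class score weights, untrained

def classify(x):
    scores = W @ x                 # one continuous score per class
    return int(np.argmax(scores))  # discrete label in {0, ..., K-1}

label = classify(rng.normal(size=d))
assert label in range(K)
```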
Data Classifications

Data Classification Notion
It's not that regression is bad, but that classification works better as the final evaluation loss function, according to Google DeepMind.
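One common way to recast a regression target as a classification problem is to discretize the scalar target into bins and train with cross-entropy against a "two-hot" target distribution spread over the two nearest bin centers. The sketch below is illustrative (the bin count and target value are arbitrary choices, not from the source).

```python
import numpy as np

def two_hot(y, edges):
    """Spread a scalar target over the two nearest bin centers."""
    centers = (edges[:-1] + edges[1:]) / 2
    idx = np.clip(np.searchsorted(centers, y) - 1, 0, len(centers) - 2)
    lo, hi = centers[idx], centers[idx + 1]
    w = np.clip((y - lo) / (hi - lo), 0.0, 1.0)   # interpolation weight
    p = np.zeros(len(centers))
    p[idx], p[idx + 1] = 1 - w, w                 # two-hot distribution
    return p

def cross_entropy(logits, target_p):
    logp = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    return -np.sum(target_p * logp)

edges = np.linspace(0.0, 1.0, 11)      # 10 bins over the target range
p = two_hot(0.37, edges)               # mass split between centers 0.35 and 0.45
loss = cross_entropy(np.zeros(10), p)  # uniform logits -> loss = log(10)
```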
For separating a single convex polytope with m faces, the paper establishes an upper bound for 2-layer networks (width m is sufficient) and a lower bound that deeper networks must satisfy (constraints on the combination of widths across layers).

Inverse problem (trained network → extracting the polytope structure of the data): viewing trained ReLU networks as intrinsically encoding the polytope structure of the data, the paper presents an algorithm that finds minimal representations by "compressing" polytopes through neuron removal and scaling.

In toy geometries such as "two triangles" or "outer hexagon + inner pentagonal hole," training with BCE/MSE converges to the boundary hyperplane/polytope structures predicted by the theory, which the paper demonstrates visually.
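The width-m upper bound can be sketched constructively: give each hidden ReLU unit one face halfspace a_i · x ≤ b_i and let the output layer sum the positive margins, so the network outputs 0 exactly on the polytope and a positive value outside. The triangle below is an illustrative toy geometry of my own choosing, not a figure from the paper.

```python
import numpy as np

# Triangle {x >= 0, y >= 0, x + y <= 1} written as A x <= b: m = 3 faces.
A = np.array([[-1.0,  0.0],   # -x <= 0   (i.e. x >= 0)
              [ 0.0, -1.0],   # -y <= 0   (i.e. y >= 0)
              [ 1.0,  1.0]])  #  x + y <= 1
b = np.array([0.0, 0.0, 1.0])

def polytope_net(x):
    hidden = np.maximum(A @ x - b, 0.0)  # ReLU layer, width = number of faces
    return hidden.sum()                  # linear output layer with all-ones weights

inside = polytope_net(np.array([0.2, 0.2]))   # 0.0: all face constraints hold
outside = polytope_net(np.array([1.0, 1.0]))  # 1.0: violates x + y <= 1 by 1
```

The zero level set of this 2-layer network is exactly the polytope, which is the sense in which width m suffices for a single convex polytope with m faces.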
Polytope simplicity of real datasets: analyzing MNIST/Fashion-MNIST/CIFAR10 one-vs-all for each class, each class can be covered by at most 2 polytopes (the face count is generally limited to 30 or fewer, splitting into 2 polytopes when necessary), and the paper provides concrete numbers for a small-width 3-layer architecture that achieves complete separation (constructively) on MNIST/Fashion-MNIST/CIFAR10. The resulting networks have very sparse connections, suggesting a possible connection to the Lottery Ticket Hypothesis (LTH), particularly noting that maintaining weight signs is important for preserving the convex polytope structure.
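A minimal sketch of what sign-preserving sparsification looks like, assuming simple magnitude pruning (the matrix size and 90% sparsity level are arbitrary illustrative choices): surviving weights keep their values, so their signs are untouched, which is the property the note flags as important for preserving the convex polytope structure.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))  # illustrative dense weight matrix

def prune(W, sparsity=0.9):
    thresh = np.quantile(np.abs(W), sparsity)      # magnitude cutoff
    return np.where(np.abs(W) >= thresh, W, 0.0)   # keep weight (and its sign) or zero it

W_sparse = prune(W)
mask = W_sparse != 0
# Every surviving weight is unchanged, so its sign is preserved.
assert np.all(np.sign(W_sparse[mask]) == np.sign(W[mask]))
```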