Data Classification

Creator: Seonglae Cho
Created: 2023 Mar 5 13:4
Edited: 2026 Jan 12 17:39

The target variable takes discrete values

  • the discrete values correspond to classes
  • the target takes on only a small number of discrete values
  • the classification function is defined by a mapping from inputs to a finite label set, f: X → {1, …, K}
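As a minimal sketch of such a mapping, an untrained linear score function followed by argmax already has the right type: it sends a continuous input to one of K discrete labels (the weights and dimensions below are illustrative, not from the source).

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 3, 4                    # 3 classes, 4 input features (arbitrary choices)
W = rng.normal(size=(K, d))    # per-class score weights, untrained

def classify(x):
    scores = W @ x                 # one continuous score per class
    return int(np.argmax(scores))  # discrete label in {0, ..., K-1}

label = classify(rng.normal(size=d))
assert label in range(K)
```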
Data Classifications

Data Classification Notion
It's not that regression is bad, but that classification works better as the final evaluation loss function, according to Google DeepMind.
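One common way to recast a regression target as a classification problem is to discretize the scalar target into bins and train with cross-entropy against a "two-hot" target distribution spread over the two nearest bin centers. The sketch below is illustrative (the bin count and target value are arbitrary choices, not from the source).

```python
import numpy as np

def two_hot(y, edges):
    """Spread a scalar target over the two nearest bin centers."""
    centers = (edges[:-1] + edges[1:]) / 2
    idx = np.clip(np.searchsorted(centers, y) - 1, 0, len(centers) - 2)
    lo, hi = centers[idx], centers[idx + 1]
    w = np.clip((y - lo) / (hi - lo), 0.0, 1.0)   # interpolation weight
    p = np.zeros(len(centers))
    p[idx], p[idx + 1] = 1 - w, w                 # two-hot distribution
    return p

def cross_entropy(logits, target_p):
    logp = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    return -np.sum(target_p * logp)

edges = np.linspace(0.0, 1.0, 11)      # 10 bins over the target range
p = two_hot(0.37, edges)               # mass split between centers 0.35 and 0.45
loss = cross_entropy(np.zeros(10), p)  # uniform logits -> loss = log(10)
```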
For separating a single convex polytope with m faces, the paper establishes an upper bound for 2-layer networks (width m is sufficient) and a lower bound that deeper networks must satisfy (constraints on the combination of widths across layers).

Inverse problem (trained network → extracting the polytope structure of the data): viewing trained ReLU networks as intrinsically encoding the polytope structure of the data, the paper presents an algorithm that finds minimal representations by "compressing" polytopes through neuron removal and scaling.

In toy geometries such as "two triangles" or "outer hexagon + inner pentagonal hole," training with BCE/MSE converges to the boundary hyperplane/polytope structures predicted by the theory, which the paper demonstrates visually.
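The width-m upper bound can be sketched constructively: give each hidden ReLU unit one face halfspace a_i · x ≤ b_i and let the output layer sum the positive margins, so the network outputs 0 exactly on the polytope and a positive value outside. The triangle below is an illustrative toy geometry of my own choosing, not a figure from the paper.

```python
import numpy as np

# Triangle {x >= 0, y >= 0, x + y <= 1} written as A x <= b: m = 3 faces.
A = np.array([[-1.0,  0.0],   # -x <= 0   (i.e. x >= 0)
              [ 0.0, -1.0],   # -y <= 0   (i.e. y >= 0)
              [ 1.0,  1.0]])  #  x + y <= 1
b = np.array([0.0, 0.0, 1.0])

def polytope_net(x):
    hidden = np.maximum(A @ x - b, 0.0)  # ReLU layer, width = number of faces
    return hidden.sum()                  # linear output layer with all-ones weights

inside = polytope_net(np.array([0.2, 0.2]))   # 0.0: all face constraints hold
outside = polytope_net(np.array([1.0, 1.0]))  # 1.0: violates x + y <= 1 by 1
```

The zero level set of this 2-layer network is exactly the polytope, which is the sense in which width m suffices for a single convex polytope with m faces.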
Polytope simplicity of real datasets: analyzing MNIST/Fashion-MNIST/CIFAR10 one-vs-all for each class, each class can be covered by at most 2 polytopes (the face count is generally limited to 30 or fewer, splitting into 2 polytopes when necessary), and the paper provides concrete numbers for a small-width 3-layer architecture that achieves complete separation (constructively) on MNIST/Fashion-MNIST/CIFAR10. The resulting networks have very sparse connections, suggesting a possible connection to the Lottery Ticket Hypothesis (LTH), particularly noting that maintaining weight signs is important for preserving the convex polytope structure.
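A minimal sketch of what sign-preserving sparsification looks like, assuming simple magnitude pruning (the matrix size and 90% sparsity level are arbitrary illustrative choices): surviving weights keep their values, so their signs are untouched, which is the property the note flags as important for preserving the convex polytope structure.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))  # illustrative dense weight matrix

def prune(W, sparsity=0.9):
    thresh = np.quantile(np.abs(W), sparsity)      # magnitude cutoff
    return np.where(np.abs(W) >= thresh, W, 0.0)   # keep weight (and its sign) or zero it

W_sparse = prune(W)
mask = W_sparse != 0
# Every surviving weight is unchanged, so its sign is preserved.
assert np.all(np.sign(W_sparse[mask]) == np.sign(W[mask]))
```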