OpenProteinSet: Training data for structural biology at scale
Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction...
https://arxiv.org/abs/2308.05326