Protein sequencing: Techniques and applications

Understanding protein structure and function is fundamental to molecular biology. Protein sequencing is central to this effort, as proteins regulate nearly all cellular activities. By analyzing amino acid sequences, researchers gain critical insights into protein function, interactions, evolutionary relationships, and disease mechanisms. Recent advancements in sequencing technologies and analytical tools have significantly expanded applications in biotechnology, diagnostics, and therapeutic development.

What is protein sequencing?

Protein sequencing is the method to determine the specific sequence of amino acids in a protein or peptide. Sequencing helps to identify the three-dimensional structure of proteins along with their biological functions. Edman degradation and mass spectrometry (MS) are the two major methods used for protein sequencing. Both of the techniques have some different advantages depending on the research needs.

Primary techniques in protein sequencing

Edman degradation: Edman degradation was first developed by Pehr Edman in 1950s. This is a stepwise chemical method, in which, the N-terminal amino acid of a protein is removed, labelled and identified. Although time-consuming and limited to shorter peptides of around 50 residues, it is still helpful in the confirmation of the N-terminal sequence of a purified protein.

Mass spectrometry-based sequencing: Mass spectrometry (MS) transformed protein sequencing with high sensitivity, velocity, and throughput. Proteins are usually spliced using enzymes such as trypsin into peptides. These peptides are then ionized and separated according to their mass-to-charge (m/z) ratios.

Tandem mass spectrometry (MS/MS) further fragments peptides into smaller ions that can be used for the reconstruction of amino acid sequences. Modern instruments such as Orbitrap and time-of-flight (TOF) analyzers have immensely enhanced the resolution and precision of this technique.

Key advantages of MS-based sequencing include:

  • High sensitivity (can detect tiny amounts of protein)
  • Rapid analysis
  • Suitable for complex mixtures
  • Allows identification of post-translational modifications (PTMs)

De novo sequencing: When reference databases are unavailable, de novo sequencing constructs amino acid sequences directly from MS data. Although computationally demanding, modern algorithms have significantly improved its accuracy and utility, especially in the discovery of novel proteins and variants.

Integration of protein sequencing with proteomics

Protein sequencing plays a crucial role within the field of proteomics—the large-scale study of proteins, including their structures, functions, and interactions. Proteomics heavily depends on mass spectrometry to examine protein expression under varying conditions, detect isoforms, uncover novel biomarkers, and explore disease mechanisms.

One widely used technique, shotgun proteomics—a bottom-up approach—involves enzymatic digestion of proteins followed by tandem mass spectrometry (MS/MS) to identify and quantify thousands of proteins simultaneously from complex samples. When integrated with advanced data analysis and sequencing, this method can uncover post-translational modifications (PTMs), as well as sequence variants and splice isoforms. These insights are critical for advancing systems biology and accelerating drug discovery.

Applications of protein sequencing

Biopharmaceutical development: Therapeutic proteins, such as monoclonal antibodies, require stringent characterization to ensure safety and efficacy. Protein sequencing confirms amino acid sequences, detects variants, and verifies biosimilarity for regulatory approval. It also helps assess protein stability and aggregation risks.

Clinical diagnostics and biomarker discovery: Protein sequencing enables the detection of disease-specific isoforms and PTMs, which support the design of diagnostic markers. For example, sequence determination of tumor-specific proteins has enabled the identification of neoantigens applied in personalized cancer vaccines.

Evolutionary and comparative biology: Protein sequences are also used to compare proteins across different organisms. By aligning sequences, scientists can trace how proteins have evolved, identify conserved regions, and learn about shared biological pathways. This has applications in evolutionary biology, agriculture, and infectious disease research.

Antibody and epitope mapping: For therapeutic antibody development, it is essential to understand which parts of the antibody bind to the target antigen. Protein sequencing helps map these epitopes, aiding the design of more effective and specific antibodies for treatment.

Forensics and food safety: Protein sequencing can be used in forensic science to complement DNA evidence, mainly in degrading samples. In food safety, it provides authenticity and traceability of protein-derived products, detecting allergens or adulterants.

Challenges and future directions

Although mass spectrometry-based sequencing has become the standard, it is not without its challenges. Sequence coverage may be incomplete, specifically for hydrophobic or low-abundance proteins. Disulfide bonds and post-translational modifications make interpretation more difficult.

However, new technologies are emerging to address these issues. For example, top-down proteomics sequences intact proteins rather than digested peptide fragments. This allows for a more complete picture of the protein, including its full set of PTMs.

Single-molecule protein sequencing, similar to nanopore sequencing with a DNA molecule, is another horizon. Though in its initial phases, it has the promise of delivering real-time, label-free sequencing of a single protein molecule.

Conclusion

Protein sequencing is a fundamental and developing field in molecular biology, underpinning advances in therapeutics, diagnostics, and systems biology. The use of powerful tools like mass spectrometry, combined with the growing integration of proteomics, has greatly enhanced our ability to decode the proteome. As technology continues to advance, protein sequencing will further expand its reach, offer deeper molecular insights and pave the way for the future of precision medicine.

References

  1. Lu, C., Bonini, A., Viel, J.H. et al. Toward single-molecule protein sequencing using nanopores. Nat Biotechnol 43, 312–322 (2025).
  2. Aebersold, R., Mann, M. Mass-spectrometric exploration of proteome structure and function. Nature 537, 347–355 (2016). 
  3. Tran, J., Zamdborg, L., Ahlf, D. et al. Mapping intact protein isoforms in discovery mode using top-down proteomics. Nature 480, 254–258 (2011).
  4. Pandeeswari, P. B., & Sabareesh, V. Middle-down approach: a choice to sequence and characterize proteins/proteomes by mass spectrometry. RSC advances9(1), 313-344 (2019).
  5. https://www.abcam.com/en-us/knowledge-center/proteins-and-protein-analysis/protein-sequencing

Leave a Reply

Your email address will not be published. Required fields are marked *