A protein is in essence a chain of amino acids translated from messenger RNA, which is in turn transcribed from DNA. Proteins often undergo post-translational modifications. In glycosylation, for instance, a sugar chain (called a glycan or glycan tree) is added to the protein. In Figure 1, five glycans are attached to the nascent protein via an N-linkage. Glycans play an important role in biological functions such as protein folding, co-assembly, and stabilization.
N-glycosylation typically occurs at specific amino acid motifs: the Asn-X-Ser sequon or the Asn-X-Thr sequon, where X can be any amino acid except Pro. However, although the presence of the Asn-X-Ser/Thr sequon is necessary for the attachment of an N-glycan, it is not sufficient: glycosylation of a sequon does not always occur. It is estimated that only about two-thirds of sequons are glycosylated, owing to conformational or other constraints during folding. Many amino acids are buried within the protein’s core, while glycan modifications reside on the surface, often extending as large molecular masses away from the attached protein. Moreover, glycan occupancy is a function of enzyme kinetics and enzyme concentrations in the endoplasmic reticulum. The identity of “X” may also reduce the efficiency of glycosylation, for example when “X” is an acidic amino acid (aspartate or glutamate).
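The sequon rule described above can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions: sequences are given in one-letter code, the function name `find_sequons` and the toy sequence are hypothetical, and a match marks a potential site only, since the presence of a sequon does not guarantee that it is glycosylated.

```python
import re

# Asn (N), then any residue except Pro (P), then Ser (S) or Thr (T).
# The lookahead lets overlapping sequons (e.g. "NNTS") both be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return (1-based Asn position, tripeptide) for each potential N-glycosylation sequon."""
    seq = sequence.upper()
    return [(m.start() + 1, seq[m.start():m.start() + 3])
            for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any real protein.
    for position, triplet in find_sequons("MKNGTLNPSANASQ"):
        print(position, triplet)
```

In the toy sequence, the Asn followed by Pro is skipped, which reflects the exclusion of Pro at the X position.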
The variation in glycan occupancy is referred to as macro-heterogeneity or variable site occupancy. In previous studies, the amino acids surrounding Asn-X-Ser/Thr sequons have been mutated. This has proven useful both for controlling glycosylation efficiency, thus enhancing glycan occupancy, and for influencing which specific glycan trees are attached. Glycan occupancy may also affect the functioning of the protein. An example is Kaposi sarcoma-associated herpesvirus-encoded interleukin-6 (vIL-6), which has two potential N-glycosylation sites (Asn78 and Asn89). Mutation of Asn89 disrupts the conformation of the vIL-6 protein and diminishes the binding of vIL-6 to gp130. Some mutations can thus lead to errors in glycosylation, and the protein may gain or lose function as a result, which is a known contributor to cancer.
However, despite the availability of site-specific sequence information resulting from years of sequencing and sequence feature curation, there have been few efforts to study the biophysical effect of N-X-S/T sequons and mutations on cancer glycoproteins. This thesis aims to give insight into how mutations affecting glycosylation alter the conformation, dynamics, and flexibility of cancer proteins. This will be done with a combination of existing tools in glycomics and newly developed computational tools.
Fig. 1 (Top) Amino acid sequence of a protein with potential glycosylation sites on the Asn-X-Ser or Asn-X-Thr sequons. (Bottom) Glycan binding to nascent protein showing exposed and buried glycosylation sites. Glycan chains prefer exposed regions on the protein surface.
Proteins in cancer are highly glycosylated, and the potential sites of glycosylation can change due to mutations. Newly emerging potential glycosylation sites, or mutated sites near existing sequons, can expose different glycan surfaces while still producing a functional variant. Therefore, the goal is to identify variants with mutations within and around glycan sequons that are most likely to induce conformational, dynamical, and functional changes leading to the loss or gain of glycan site occupancy.
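As a first, purely sequence-level illustration of this goal, the sketch below compares the potential sequons of a wild-type sequence with those of a single-substitution variant. It is a minimal sketch rather than the project's method: the function names, the one-letter input format, and the toy sequence are assumptions, and it reports only gained or lost potential sites, not actual occupancy or any conformational effect.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr (overlaps allowed).
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(sequence):
    """Return the set of 1-based Asn positions that start a potential sequon."""
    return {m.start() + 1 for m in SEQUON.finditer(sequence.upper())}


def apply_substitution(sequence, position, new_residue):
    """Return the sequence with the residue at a 1-based position replaced."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    return sequence[:position - 1] + new_residue + sequence[position:]


def sequon_changes(wild_type, position, new_residue):
    """List potential sequons gained and lost by a single substitution."""
    mutant = apply_substitution(wild_type, position, new_residue)
    before = sequon_positions(wild_type)
    after = sequon_positions(mutant)
    return {"gained": sorted(after - before), "lost": sorted(before - after)}


if __name__ == "__main__":
    toy = "MKNGTLNPSANASQ"  # hypothetical sequence for illustration only
    print(sequon_changes(toy, 5, "A"))  # Thr5 -> Ala removes the sequon at Asn3
    print(sequon_changes(toy, 8, "A"))  # Pro8 -> Ala creates a sequon at Asn7
```

Variants flagged in this way would still need the structural and dynamical analyses described in this proposal to judge whether site occupancy is likely to change.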
The student will conduct a literature search and learn to use bioinformatics tools for data collection, as well as techniques such as site-directed mutagenesis. A potential workflow is given in Figure 2. Sequence variants will be analyzed before and after mutation to understand their flexibility and their exposed or buried binding surfaces, using tools such as DynaMine, DisoMine, and the ProBiS web server. These results constitute a biophysical fingerprint, which will aid in understanding protein structures obtained from X-ray crystallography and nuclear magnetic resonance. A comparative study will be performed across different cancer glycoproteins. The essential amino acids identified as inducing favorable biophysical changes in nascent glycoproteins will be used in site-directed mutagenesis for further analysis. In the next step, molecular dynamics simulations will be performed on sequence variants of glycosylated proteins and their non-glycosylated counterparts. The project will also contribute to the understanding of glycoinformatics methods, a fairly recent area of development in bioinformatics.
Fig. 2 Tentative research scheme