Most proteins in eukaryotic cells are post-translationally modified by a wide range of chemical groups. Among these modifications, the addition and removal of lipid groups on specific amino acids is a key mechanism that orchestrates the subcellular trafficking (Draper et al., 2007; Linder & Deschenes, 2007), signaling (Smotrys & Linder, 2004; Resh, 2004) and membrane association (Levental et al., 2010) of proteins. With the rapid development of experimental techniques, three prevalent forms of lipid modification, namely S-palmitoylation, prenylation and N-myristoylation, have been extensively studied.

The reversible attachment of the 16-carbon fatty acid palmitate to a protein via a thioester linkage is called S-palmitoylation (Nadolski & Linder, 2007). By increasing the hydrophobicity of its substrates, S-palmitoylation can dynamically regulate the membrane association of various cellular proteins (Draper et al., 2007; Linder & Deschenes, 2007). Besides S-palmitoylation, cellular proteins may also be covalently modified with the 14-carbon saturated fatty acid myristate, a process known as N-myristoylation. N-myristoyltransferase (NMT) recognizes an MGXXXS/T signature at the N-terminus and catalyzes the attachment of myristate to the N-terminal glycine via an amide bond (Maurer-Stroh et al., 2002; Towler et al., 1988). Another important type of lipid modification is prenylation, which involves the addition of a 15-carbon farnesyl group or a 20-carbon geranylgeranyl group to a C-terminal cysteine within a consensus CAAX motif (Maurer-Stroh & Eisenhaber, 2005). Farnesylation is typically catalyzed by protein farnesyltransferase (FTase) (Hougland et al., 2010), whereas geranylgeranylation is performed by protein geranylgeranyltransferase type I (GGTase-I) (Gangopadhyay et al., 2014; McGuire et al., 1996). For Rab proteins, however, geranylgeranylation is catalyzed by geranylgeranyltransferase type II (GGTase-II), which recognizes a C-terminal CC/CXC motif (Maurer-Stroh & Eisenhaber, 2005; Pereira-Leal et al., 2001). Although the enzymatic machinery of protein lipidation varies greatly, different lipid groups are often found on the same protein substrates, which suggests extensive co-regulation among lipid modifications. One of the most striking examples is the regulation of small GTPase trafficking by prenylation and palmitoylation (Sanchez-Mir et al., 2014).
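
The consensus motifs described above (MGXXXS/T, CAAX and CC/CXC) can be written as simple regular expressions. The Python sketch below is a rough illustration only: it assumes that a match to these simplified patterns merely marks a candidate site, and it is not how GPS-Lipid predicts lipidation sites. Real substrates often deviate from the consensus sequences. The function name scan_motifs and the example sequences are hypothetical.

```python
import re
from typing import Dict

# Simplified consensus motifs; "." stands for any residue (X in the text).
# Assumption for illustration: a regex hit is only a candidate, not a prediction.
MOTIFS = {
    # N-myristoylation: MGXXXS/T at the very N-terminus (the Gly is acylated).
    "N-myristoylation (MGXXXS/T)": re.compile(r"^MG...[ST]"),
    # Prenylation by FTase/GGTase-I: CAAX box ending the chain.
    # "A" is usually aliphatic, but this sketch accepts any residue.
    "prenylation (CAAX)": re.compile(r"C...$"),
    # Rab geranylgeranylation by GGTase-II: C-terminal CC or CXC.
    "Rab geranylgeranylation (CC/CXC)": re.compile(r"C.?C$"),
}


def scan_motifs(seq: str) -> Dict[str, int]:
    """Return {motif name: 1-based start position} for every motif matched."""
    seq = seq.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        match = pattern.search(seq)
        if match:
            hits[name] = match.start() + 1
    return hits


# Hypothetical toy peptide with both an N-terminal MGXXXS and a C-terminal CAAX box.
print(scan_motifs("MGQSLSKEEKCVIM"))
# -> {'N-myristoylation (MGXXXS/T)': 1, 'prenylation (CAAX)': 11}
```

Anchoring the patterns with ^ and $ reflects the positional requirement in the text: the myristoylation signature must sit at the N-terminus, while the CAAX and CC/CXC motifs must end the chain.
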
In the Ras and Rho families, palmitoylation frequently occurs in the hypervariable region adjacent to the prenylated C-terminal CAAX box (Michaelson et al., 2001). Together, these two lipid modifications provide sufficient hydrophobicity for membrane localization, and the precise subcellular localization of these small GTPases is essential for their proper function (Symons & Settleman, 2000). Moreover, an intracellular palmitoylation–depalmitoylation cycle allows prenylated small GTPases to traffic dynamically from the Golgi apparatus to the plasma membrane (Rocks et al., 2010). A similar co-regulatory mechanism has been reported between myristoylation and palmitoylation. Because myristoylation alone does not provide enough hydrophobicity for stable membrane association (Navarro-Lerida et al., 2002), myristoylated proteins usually require additional N-terminal palmitoylation for stable membrane attachment and for translocation to rafts/caveolae or intracellular liquid-ordered domains (McCabe & Berthiaume, 1999; McCabe & Berthiaume, 2001). Proteins regulated in this way include some guanylate cyclase-activating proteins (Stephen et al., 2007), most members of the Src family of protein tyrosine kinases (Patwardhan & Resh, 2010) and the Giα subfamily of G-protein α subunits (Preininger et al., 2012). Taken together, these examples show that proper dual lipid modification is responsible for the correct localization of signaling proteins and plays a crucial role in coupling extracellular stimuli to intracellular signaling.

Given these essential physiological functions, aberrant lipid modification can contribute to a variety of diseases. As reported previously (Selvakumar et al., 2007; Sebti, 2005), dysregulated lipid modification is closely associated with oncogenesis. Aberrant lipid modification has also been linked to other severe diseases. For example, dysfunction of palmitoyl acyltransferases (PATs) has been implicated in schizophrenia (Mukai et al., 2004) and Huntington’s disease (Yanai et al., 2006). N-myristoylation has also been shown to mediate viral infectivity and infections by eukaryotic pathogens (Maurer-Stroh & Eisenhaber, 2004). Research on lipid modification, and especially on its co-regulatory mechanisms, is therefore important for identifying potential drug targets for diagnosis and therapy. However, owing to the lack of integrative bioinformatics resources, systematic investigations of the co-regulation of lipid modifications have seldom been performed. This deficiency may seriously hamper the development of effective therapies for disorders related to lipid modification.

In this work, we present GPS-Lipid, a comprehensive predictor of protein lipid modification sites. From the literature published before November 2014, we manually collected 737 S-palmitoylation sites in 361 proteins, 106 S-farnesylation sites in 97 proteins, 95 S-geranylgeranylation sites in 70 proteins and 283 N-myristoylation sites in 281 proteins. To obtain a high-performance predictor, we incorporated particle swarm optimization with an aging leader and challengers (ALC-PSO) (Chen et al., 2013) into our previously developed Group-based Prediction System (GPS) algorithm for model training and prediction. For convenience, an online service was implemented in PHP and JavaScript and is freely available at http://lipid.biocuckoo.org.
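
The paragraph names ALC-PSO as the optimizer used during model training but does not describe it. For orientation only, the sketch below shows the basic particle swarm loop that ALC-PSO builds on. It is a plain global-best PSO and not ALC-PSO itself: ALC-PSO additionally gives the swarm leader a limited lifespan and may hand leadership to a newly generated challenger once that lifespan expires. The objective function, the target vector and all parameter values are hypothetical and do not represent GPS-Lipid's actual scoring function or training criterion.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    dim: int,
    n_particles: int = 20,
    n_iter: int = 200,
    bounds: Tuple[float, float] = (-5.0, 5.0),
    w: float = 0.72,   # inertia weight
    c1: float = 1.49,  # pull toward each particle's own best position
    c2: float = 1.49,  # pull toward the swarm leader (global best)
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Plain global-best PSO (NOT ALC-PSO); returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    leader = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[leader][:], pbest_val[leader]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep particles inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in for a training criterion: squared distance to an
# arbitrary "ideal" weight vector (lower is better).
TARGET = [0.5, -1.0, 2.0]


def toy_objective(weights: Sequence[float]) -> float:
    return sum((wt - t) ** 2 for wt, t in zip(weights, TARGET))


best, best_val = pso_minimize(toy_objective, dim=len(TARGET))
print([round(x, 3) for x in best], best_val)
```

In a real predictor the objective would be a performance measure of the scoring model (for example, cross-validated accuracy) rather than this toy distance, and each evaluation would be far more expensive.
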

※ Data Summary


                 S-Palmitoylation                N-Myristoylation  S-Farnesylation  S-Geranylgeranylation  Total
                 Exp. Veri.      High            Exp. Veri.        Exp. Veri.       Exp. Veri.
Species          Protein  Site   Protein  Site   Protein  Site     Protein  Site    Protein  Site          Protein  Site
H. sapiens           165   360       144   344        85    87          38    38         34    46              448   875
M. musculus           58   122       106   239        23    23          11    11          7    10              202   405
R. norvegicus         36    74        77   182        18    18           5     5          1     1              133   280
A. thaliana            8    12       117   215        32    32          10    10          4     4              167   273
S. cerevisiae         10    16        21    34        12    12           8     8          5     8               51    78
B. taurus              8    20         0     0        13    13           2     2          5     6               27    41
D. melanogaster        3     3         0     0         4     4           4     4          1     1               12    12
Others                73   130       247   433        94    94          19    28         13    19              439   704
Total                361   737       712  1447       281   283          97   106         70    95             1479  2668

Exp. Veri.: experimentally verified.
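
As a consistency check on the Data Summary, the short Python sketch below re-adds the table. The numbers suggest that site counts add up both across species and across modification types, whereas protein counts add up across species but not across modification types. The Total protein column therefore appears to count distinct proteins, because one protein can carry more than one modification type; this interpretation is an assumption drawn from the numbers, not a statement from the table itself. The names ROWS, TOTAL and table_discrepancies are hypothetical.

```python
from typing import Dict, List, Tuple

# (proteins per column, sites per column, Total proteins, Total sites), copied
# from the Data Summary table. Column order: S-palmitoylation Exp. Veri.,
# S-palmitoylation High, N-myristoylation, S-farnesylation, S-geranylgeranylation.
Row = Tuple[List[int], List[int], int, int]

ROWS: Dict[str, Row] = {
    "H. sapiens":      ([165, 144, 85, 38, 34], [360, 344, 87, 38, 46], 448, 875),
    "M. musculus":     ([58, 106, 23, 11, 7],   [122, 239, 23, 11, 10], 202, 405),
    "R. norvegicus":   ([36, 77, 18, 5, 1],     [74, 182, 18, 5, 1],    133, 280),
    "A. thaliana":     ([8, 117, 32, 10, 4],    [12, 215, 32, 10, 4],   167, 273),
    "S. cerevisiae":   ([10, 21, 12, 8, 5],     [16, 34, 12, 8, 8],     51, 78),
    "B. taurus":       ([8, 0, 13, 2, 5],       [20, 0, 13, 2, 6],      27, 41),
    "D. melanogaster": ([3, 0, 4, 4, 1],        [3, 0, 4, 4, 1],        12, 12),
    "Others":          ([73, 247, 94, 19, 13],  [130, 433, 94, 28, 19], 439, 704),
}
TOTAL: Row = ([361, 712, 281, 97, 70], [737, 1447, 283, 106, 95], 1479, 2668)


def table_discrepancies(rows: Dict[str, Row], total: Row) -> List[str]:
    """Return human-readable problems; an empty list means the table adds up."""
    problems = []
    for name, (prots, sites, tot_p, tot_s) in rows.items():
        # Sites are additive across modification columns.
        if sum(sites) != tot_s:
            problems.append(f"{name}: site columns sum to {sum(sites)}, not {tot_s}")
        # Assumption: Total proteins counts distinct proteins, so it can be
        # smaller than, but never larger than, the column sum.
        if sum(prots) < tot_p:
            problems.append(f"{name}: Total proteins {tot_p} exceeds column sum {sum(prots)}")
    col_p, col_s, all_p, all_s = total
    # Both proteins and sites should be additive across species.
    for j in range(len(col_s)):
        if sum(r[1][j] for r in rows.values()) != col_s[j]:
            problems.append(f"column {j}: species site counts do not sum to {col_s[j]}")
        if sum(r[0][j] for r in rows.values()) != col_p[j]:
            problems.append(f"column {j}: species protein counts do not sum to {col_p[j]}")
    if sum(r[3] for r in rows.values()) != all_s:
        problems.append("Total sites column does not sum to the grand total")
    if sum(r[2] for r in rows.values()) != all_p:
        problems.append("Total proteins column does not sum to the grand total")
    return problems


print(table_discrepancies(ROWS, TOTAL))
```

For H. sapiens, for example, the five protein columns sum to 466 while the Total column lists 448, consistent with 18 proteins being counted under more than one modification column.
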