Structure-Function Analysis of Diacylglycerol Acyltransferase Sequences from 70 Organisms

Background: Diacylglycerol acyltransferase families (DGATs) catalyze the final and rate-limiting step of triacylglycerol (TAG) biosynthesis in eukaryotic organisms. Understanding the roles of DGATs will help to create transgenic plants with value-added properties and provide clues for therapeutic intervention for obesity and related diseases. The objective of this analysis was to identify conserved sequence motifs and amino acid residues for better understanding of the structure-function relationship of these important enzymes.

Results: 117 DGAT sequences from 70 organisms including plants, animals, fungi and human were obtained from a database search using tung tree DGATs. Phylogenetic analysis separates these proteins into DGAT1 and DGAT2 subfamilies. These DGATs are integral membrane proteins with more than 40% of the total amino acid residues being hydrophobic. They have similar properties and amino acid composition except that DGAT1s are approximately 20 kDa larger than DGAT2s. DGAT1s and DGAT2s have 41 and 16 completely conserved amino acid residues, respectively, although only two of them are shared by all DGATs. These residues are distributed in 7 and 6 sequence blocks for DGAT1s and DGAT2s, respectively, and located at the carboxyl termini, suggesting the location of the catalytic domains. These conserved sequence blocks do not contain the putative neutral lipid-binding domain, mitochondrial targeting signal, or ER retrieval motif. The importance of conserved residues has been demonstrated by site-directed and natural mutants.

Conclusions: This study has identified conserved sequence motifs and amino acid residues in all 117 DGATs and the two subfamilies. None of the completely conserved residues in DGAT1s and DGAT2s is present in recently reported isoforms in the multiple sequence alignment, raising the important question of how proteins with completely different amino acid sequences could perform the same biochemical reaction. The sequence analysis should facilitate studying the structure-function relationship of DGATs with the ultimate goal of identifying critical amino acid residues for engineering superior enzymes in metabolic engineering and selecting enzyme inhibitors in therapeutic application for obesity and related diseases.


Background
The complete genomes of many organisms including human, mouse, Arabidopsis and rice have been sequenced. The immediate challenge of post-genomic biology is to determine the biological functions of proteins encoded by unknown genes. Many endogenous proteins occur in extremely low abundance (such as the anti-inflammatory protein tristetraprolin/zinc-finger protein 36, TTP/ZFP36) [1] and are labile (such as omega-3 fatty-acid desaturase, FAD3) [2], which complicates characterization of those proteins.
One approach to gain clues about the structure-function relationship of proteins is to perform comprehensive amino acid sequence analysis. It is generally accepted that critical amino acid residues and sequence motifs in the same family of proteins are evolutionarily conserved. We previously used a protein sequence analysis approach to identify conserved sequence motifs and critical amino acid residues in several families of proteins from diverse organisms. The protein sequences we analyzed previously include the TTP/ZFP36 family involved in mRNA binding and destabilization [3,4], adenylate translocators [5], starch/glycogen synthases [6], starch/glycogen branching enzymes [7,8], and starch/glycogen debranching enzymes [9].
Triacylglycerols (TAGs) are the major molecules of energy storage in eukaryotes. They also serve as a reservoir of fatty acids for membrane biogenesis and lead to obesity due to excessive accumulation in adipose tissues. Diacylglycerol acyltransferase families (DGATs) are integral microsomal membrane proteins that catalyze the last and rate-limiting step of TAG biosynthesis in eukaryotic organisms. DGATs esterify sn-1,2-diacylglycerol with a long-chain fatty acyl-CoA. DGAT genes have been isolated from many organisms. At least two forms of DGATs are present in mammals [10,11] and plants [12,13] with additional forms reported in burning bush (Euonymus alatus) [14], peanut [15] and Arabidopsis [16]. Plants and animals deficient in DGATs accumulate less TAG [17-19]. Animals with reduced DGAT activity are resistant to diet-induced obesity [18,20] and lack milk production [18]. Over-expression of DGAT enzymes increases TAG content in plants [14,21-26], animals [27-30] and yeast [31]. DGATs have nonredundant functions in TAG biosynthesis in species such as mice [19] and tung tree (Vernicia fordii) [13]. Mice deficient in DGAT1 are viable, have modest decreases in TAG, and are resistant to diet-induced obesity [18,32]. In contrast, mice deficient in DGAT2 have severe reduction of TAG and die shortly after birth [19]. The fact that DGAT1 is unable to compensate for the deficiency in DGAT2 knockout mice indicates the nonredundant functions of each DGAT isoform in TAG biosynthesis during mammal development. Therefore, understanding the roles of DGATs in plants and animals will have tremendous potential in creating new oilseed crops with value-added properties and providing information for therapeutic intervention for obesity and related diseases.
Limited numbers of DGAT amino acid sequences were analyzed previously [13,33]. However, a comprehensive analysis of DGAT amino acid sequences among diverse organisms has been lacking. The objective of this analysis was to identify conserved sequence motifs and amino acid residues in 117 DGATs from 70 organisms to provide a better understanding of the structure-function relationship of these important enzymes.

Phylogenetic analysis and classification of DGATs
A database search using tung tree (Vernicia fordii) DGAT1 and DGAT2 protein sequences [13] has identified 117 DGAT sequences from 70 organisms including plants, animals, fungi and human (Table 1). More than two forms of DGATs are present in a number of species. For example, Bos taurus (cow) and Brassica napus (rape) have four forms of DGATs, whereas Homo sapiens (human) and Danio rerio (zebrafish) have three forms of DGATs (Table 1). Phylogenetic analysis indicates that all 117 DGAT protein sequences are grouped in the same phylogenetic tree (data not shown) and clearly separated into DGAT1 and DGAT2 subfamilies (Figure 1). DGAT1s are more conserved than DGAT2s. DGATs from plants, animals and fungi are also distinctly separated from each other with a few exceptions (Figure 1).
DGAT1s have an average of 515 amino acid residues with a standard deviation of 44 residues among the 55 sequences (Table 2). On average, DGAT1s are 171 residues longer than DGAT2s, which have an average of 344 residues with a standard deviation of 29 residues among the 54 sequences (Table 2). This corresponds to a difference of approximately 20 kDa in molecular mass. The isoelectric points of DGAT1s and DGAT2s are similar, with average values of 9.17 and 9.28, respectively; however, DGAT1s have approximately 4 more charges at pH 7 than DGAT2s (Table 2).
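The mass difference quoted above can be checked with a back-of-envelope calculation. The sketch below assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a value taken from Table 2; the function name is illustrative only.

```python
# Rough check of the DGAT1/DGAT2 mass difference from the average lengths.
# ASSUMPTION: ~110 Da per residue is a rule-of-thumb average residue mass,
# not a number reported in this analysis.
AVG_RESIDUE_MASS_DA = 110.0


def mass_difference_kda(len_a: int, len_b: int,
                        residue_mass_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Approximate mass difference (kDa) between proteins of two lengths."""
    return (len_a - len_b) * residue_mass_da / 1000.0


# Average DGAT1 and DGAT2 lengths quoted in the text (515 and 344 residues).
diff = mass_difference_kda(515, 344)
print(f"{diff:.1f} kDa")  # prints "18.8 kDa"
```

Under this assumption the 171-residue difference gives roughly 19 kDa, consistent with the stated value of approximately 20 kDa.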
The frequencies of functional amino acid residue groups in the DGAT1 and DGAT2 subfamilies are remarkably similar (Table 2). In both subfamilies, charged residues (RKHYCDE) account for approximately 26% of the total, including approximately 7% acidic residues (DE) and 10% basic residues (KR). The frequency of polar residues (NCQSTY) is also similar, with an average of 25% for DGAT1s vs. 22% for DGAT2s. DGAT1s and DGAT2s are integral membrane proteins with 42% and 41% of the total residues being hydrophobic (AILFWV), respectively (Table 2).
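The group frequencies above can be computed per sequence by counting residues. The following is a minimal sketch that uses the residue groups exactly as listed in the text; the function name and the toy sequence are illustrative assumptions, and this is not the software used to produce Table 2.

```python
from collections import Counter
from typing import Dict

# Residue groups as listed in the text (one-letter amino acid codes).
GROUPS = {
    "charged": "RKHYCDE",
    "acidic": "DE",
    "basic": "KR",
    "polar": "NCQSTY",
    "hydrophobic": "AILFWV",
}


def group_frequencies(seq: str) -> Dict[str, float]:
    """Percentage of residues falling in each group for one protein sequence.

    Alignment gaps ('-') are removed before counting.
    """
    seq = seq.upper().replace("-", "")
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    total = len(seq)
    return {
        name: 100.0 * sum(counts[aa] for aa in members) / total
        for name, members in GROUPS.items()
    }


# Toy sequence for illustration only (not a real DGAT).
for name, pct in group_frequencies("MAILKDESTFWVRNQ").items():
    print(f"{name}: {pct:.1f}%")
```

Note that the groups overlap (for example, C and Y appear in both the charged and the polar lists as given), so the percentages are not expected to sum to 100.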

Identification of amino acid residues and sequence motifs conserved among all DGATs
The sequences of DGAT1s and DGAT2s are very divergent. No common features between these two subfamilies were reported previously using 7 DGAT1s and 8 DGAT2s from 10 organisms [13]. However, the biochemical function of the two DGAT isoforms is essentially identical in enzymatic assays, suggesting that some sequence features are probably conserved between them. Multiple sequence alignment was performed to analyze all 117 DGATs including 59 DGAT1s from 48 organisms and 58 DGAT2s from 44 organisms. The alignment identified only two completely conserved amino acid residues among all DGATs. One is a proline residue corresponding to P248 in AaDGAT1-XP_001658299 and P151 in VfDGAT2-DQ356682.1 (Figure 2A); the other is a phenylalanine residue corresponding to F344 in AaDGAT1-XP_001658299 and F225 in VfDGAT2-DQ356682.1 (Figure 2B). The conserved phenylalanine residue is followed by a glycine residue conserved in all DGATs except the DGAT1s of Aedes aegypti, Drosophila melanogaster and Tribolium castaneum (Figure 2B). A second proline residue (Figure 2C), corresponding to P411 in AaDGAT1-XP_001658299 and P276 in VfDGAT2-DQ356682.1, is conserved in all DGATs except DGAT2 of Ostreococcus tauri. Based on the sequence conservation patterns with these conserved residues as anchors, the conserved sequence motifs among all DGATs are named as Motif 1 (P Block), Motif 2 (FG Block) and Motif 3 (P-1 Block). The highly conserved residues among diverse organisms may be located at the active sites of the enzymes and play important roles in structure, substrate binding and/or catalysis.
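Residue numbers such as P248 in one sequence and P151 in another refer to the same alignment column. A minimal sketch of how a column index maps back to a residue number in each aligned row is given below; the rows are toy examples rather than real DGAT sequences, and the function name is an assumption made for illustration.

```python
from typing import Optional


def residue_number_at_column(aligned_seq: str, column: int,
                             start: int = 1) -> Optional[int]:
    """Residue number found at a 0-based alignment column.

    `start` is the residue number of the first residue in the aligned row.
    Returns None when the row has a gap ('-') at that column.
    """
    if aligned_seq[column] == "-":
        return None
    gaps_before = aligned_seq[:column].count("-")
    return start + column - gaps_before


# Toy aligned rows (hypothetical): the proline sits in column 4 of both rows.
row_a = "MK--PLF"
row_b = "MKAGPLF"
print(residue_number_at_column(row_a, 4))  # 3, i.e. P3 in row_a
print(residue_number_at_column(row_b, 4))  # 5, i.e. P5 in row_b
```

The same column therefore carries different residue numbers in different sequences, depending on how many gaps precede it in each row.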

Identification of amino acid residues and sequence motifs conserved in DGAT1s
Multiple sequence alignment was performed to identify conserved amino acid residues and sequence motifs within the DGAT1 subfamily. Among the 55 full-length DGAT1s, 41 amino acid residues are completely conserved, which corresponds to 8.0% of the average 515 residues of DGAT1s. Table 3 shows the positions of the 41 completely conserved residues in DGAT1s from representatives of animal group (mouse), plant group (tung tree) and fungus group (Dictyostelium discoideum). These completely conserved residues are located in seven sequence motifs of DGAT1s. Based on the sequence conservation patterns with the completely conserved residues as anchors, the conserved sequence motifs of DGAT1s are named as Motif 1 (GL Block), Motif 2 (KSR Block), Motif 3 (PTR Block) ( Figure
The great majority of the conserved residues are located within the carboxyl terminal halves of DGAT1s. Among the completely conserved residues, 33 of them are located within the last 200 residues from the carboxyl termini in 5), and 23 of them are concentrated in the most conserved region with approximately 100 residues in Motif 6 (Figure 4B). The first two conserved residues (G, L) in Motif 1 start approximately 100 residues from the amino termini (Figure 3A). The next three conserved residues (K, S, R) in Motif 2 do not start until approximately 200 residues from the amino termini (Figure 3B). The last conserved residue (Y) in Motif 7 ends within the last 20 residues from the carboxyl termini of DGAT1s (Figure 5B).
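A residue is "completely conserved" when every sequence in the alignment carries the same amino acid in that column. The sketch below shows one way to find such columns and to express the count as a percentage of the average length, as done in the text. The alignment rows are toy examples and the function names are assumptions; this is not the alignment software used in the study.

```python
from typing import List, Tuple


def completely_conserved_columns(alignment: List[str]) -> List[Tuple[int, str]]:
    """Return (0-based column, residue) for every column in which all rows
    share one identical residue and no row has a gap ('-')."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    conserved = []
    for col in range(width):
        residues = {row[col] for row in alignment}
        if len(residues) == 1 and "-" not in residues:
            conserved.append((col, alignment[0][col]))
    return conserved


def percent_conserved(n_conserved: int, average_length: int) -> float:
    """Conserved residues as a percentage of the average sequence length."""
    return 100.0 * n_conserved / average_length


# Toy alignment (hypothetical rows, not real DGAT sequences).
toy = ["GL-KSR", "GLAKTR", "GL-KSR"]
print(completely_conserved_columns(toy))
# [(0, 'G'), (1, 'L'), (3, 'K'), (5, 'R')]

# Counts and average lengths quoted in the text.
print(round(percent_conserved(41, 515), 1))  # 8.0 (DGAT1s)
print(round(percent_conserved(16, 344), 1))  # 4.7 (DGAT2s)
```

Requiring the absence of gaps is a design choice of this sketch; it means that a single truncated or partial sequence removes a column from the conserved set, which is one reason to restrict such an analysis to full-length sequences.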

Identification of amino acid residues and sequence motifs conserved in DGAT2s
Multiple sequence alignment was also performed using all DGAT2s to identify conserved amino acid residues and sequence motifs within this subfamily. Sixteen residues are completely conserved in the 54 full-length DGAT2s, corresponding to 4.7% of the total 344 residues. Table 3 shows the positions of the 16 completely conserved residues in DGAT2s from representatives of animal group (mouse), plant group (tung tree) and fungus group (Dictyostelium discoideum). Based on the sequence conservation patterns with the completely conserved residues as anchors, the conserved sequence motifs of DGAT2s are

Figure 1 Phylogenetic analysis of DGAT1s and DGAT2s. The presumed evolutionary relationships among the 117 DGATs from 70 organisms (listed in Table 1) were analyzed by phylogenetic analysis based on the Neighbor-Joining method of Saitou and Nei. The numbers in parentheses following DGAT names are the calculated distance values, which reflect the degree of divergence between all pairs of DGAT sequences analyzed. The sequences above the red line are from animals, whereas the sequences below the green line are from plants. A red "star" before the sequence indicates the exceptional sequence from the grouping.

Similar to DGAT1s, these conserved residues are located at the carboxyl termini of DGAT2s. Eight of them are concentrated within 25 residues in the highly conserved Motifs 4 and 5 of DGAT2s (Figure 7A-B). The first two conserved residues (PH) in Motif 1 do not start until approximately 100 residues from the amino termini, and the last conserved residue (G) ends within the last 50 residues from the carboxyl termini of DGAT2s (Figures 6A and 7C).

Sequence analysis of important motifs in less conservative regions of DGATs
Several studies have reported functional motifs in DGATs. However, the conserved sequence motifs identified by our extensive sequence analysis as discussed above do not contain any of the reported putative neutral lipid-binding domain [34], mitochondrial targeting signal [35] or ER retrieval motif [13]. It was reported that mouse DGAT2 contains a consensus sequence (FLXLXXXn) for a putative neutral lipid-binding domain [34], which was shown to be present in proteins that either bind to or metabolize neutral lipids [36]. However, this putative motif is only modestly conserved in animal DGAT2s and not present in any plant DGAT2 (Figure 8A). Mouse DGAT2 was also reported to contain a putative mitochondrial targeting signal with positively charged residues (RXKXXK) targeting proteins to mitochondria [35]. This motif is found in only a few animal DGAT2s and is not conserved in any of the plant or fungal DGAT2s (Figure 8B). DGATs are ER-localized enzymes, and tung DGAT2 has an ER retrieval motif (LKLEI) at its extreme carboxyl terminus [13]. This sequence analysis shows that this pentapeptide ER-retrieval motif is only modestly conserved in plant DGAT2s and not in animal or fungal DGAT2s, although animal and fungal DGATs are also localized to the ER (Figure 8C).
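The three reported motifs can be expressed as simple sequence patterns and searched for in each ungapped sequence. The sketch below is one possible way to do this with regular expressions. X is read as "any residue"; the trailing "n" in FLXLXXXn is not defined in this text, so treating it as a single nonpolar residue drawn from the hydrophobic group (AILFWV) is an assumption of this sketch. The test sequence and function name are hypothetical.

```python
import re
from typing import Dict, List

# X = any residue. ASSUMPTION: the trailing "n" of FLXLXXXn is interpreted
# as one nonpolar residue, approximated here by the hydrophobic group
# AILFWV; the text above does not define it.
MOTIFS = {
    "neutral_lipid_binding": re.compile(r"FL.L...[AILFWV]"),
    "mitochondrial_targeting": re.compile(r"R.K..K"),
    # The ER retrieval motif must sit at the extreme carboxyl terminus.
    "er_retrieval": re.compile(r"LKLEI$"),
}


def find_motifs(seq: str) -> Dict[str, List[int]]:
    """1-based start positions of each motif in one protein sequence.

    Alignment gaps ('-') are removed first; matches are non-overlapping.
    """
    seq = seq.upper().replace("-", "")
    return {
        name: [m.start() + 1 for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


# Toy sequence built to contain all three patterns (not a real DGAT2).
hits = find_motifs("MAFLALGGGVRAKTTKPLKLEI")
print(hits)
# {'neutral_lipid_binding': [3], 'mitochondrial_targeting': [11], 'er_retrieval': [18]}
```

Exact pattern matching of this kind only reports strict occurrences; the "modestly conserved" variants discussed above would require inspecting the aligned columns rather than a fixed pattern.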

Sequence analysis of important amino acid residues in less conservative regions of DGATs
The importance of some less conserved residues of DGATs has been demonstrated by site-directed mutants (Table 4). Mutagenesis of a putative SnRK1 target site S197 in Tropaeolum majus DGAT1 results in a 38%-80% increase in DGAT1 activity, and over-expression of the mutated TmDGAT1 in Arabidopsis results in a 20%-50% increase in oil content on a per seed basis (Table 4) [25]. Figure 9A shows that this serine residue is conserved in most plants and some animals; at the same alignment position, DGAT1s from other organisms have a proline, glycine, threonine or lysine residue. A similar serine residue is not found in any of the DGAT2s. Mutagenesis at P216 in Tropaeolum majus DGAT1 eliminates almost all of the activity (Table 4) [25]. The P216 residue is completely conserved in plant DGAT1s but is missing in mammalian DGAT1s (Figure 9B). A highly conserved region with a consensus sequence of "YFP" in DGAT2s is essential for enzymatic activity of DGAT2 from Saccharomyces cerevisiae (Table 4) [33]. These three residues are highly conserved and located before Motif 1, but none of them is completely conserved among all DGAT2s in our analysis using 54 full-length DGAT2s (Figure 10A). The consensus "YFP" is replaced with "YYP" in DGAT2s of human, chimpanzee (Pan troglodytes) and Ashbya gossypii, "FFP" in Helianthus annuus and Nematostella vectensis, and "HFP" in castor bean (Ricinus communis), Vernonia galamensis, and Selaginella moellendorffii. It was reported that a unique region is present in DGAT2 of Saccharomyces cerevisiae [33], but in our expanded analysis, a similar region is also found in Ashbya gossypii (data not shown).
Mutations at F80/L81/L83 in mouse DGAT2 [34] and F71/L73 in baker's yeast DGAT2 [33] result in partial loss of the activity (Table 4). This region is only relatively conserved in some animal DGAT2s (Figure 10B). Finally, ScDGAT2 has a unique cysteine residue (C314) which is not involved in catalysis but may be located near the active site or related to proper folding of the protein [37]. However, this residue is only found in DGAT2s from baker's yeast and the other fungi Ashbya gossypii and Physcomitrella patens, but is not present in the same position of the alignment in any of the other 51 DGAT2s or any of the 55 DGAT1s analyzed (Figure 10C).

Sequence analysis of important amino acid residues of DGATs shown in natural mutants
The importance of some relatively conserved residues in TAG biosynthesis has been demonstrated by two well-known natural mutants in corn and cattle. A phenylalanine insertion (F469) in DGAT1-2 increases oil and oleic-acid contents in maize. Ectopic expression of the high-oil DGAT1-2 allele increases oil and oleic-acid contents by up to 41% and 107%, respectively (Table 4) [38]. This phenylalanine residue is conserved in all plants except Brassica napus (rape) and conserved in all fungi except mold (Dictyostelium discoideum and Polysphondylium pallidum) (Figure 11A). Rape DGAT1 is the only sequence in the plant group with a serine residue in the place of phenylalanine in the sequence alignment (Figure 11A). Because this gene was isolated from suspension cultures of Brassica napus [39] and the native form is not available, it is not known whether this replacement was caused by mutations arising under cell culture conditions, considering that cell culture could cause significant changes in gene expression [40]. It is interesting to note that a similar phenylalanine residue is not present in any of the animal DGAT1s or any of the DGAT2s.

Figure 2 Conservation of proline and phenylalanine residues in all DGATs. Multiple sequence alignment was performed using the ClustalW algorithm and 117 DGAT protein sequences from 70 organisms (listed in Table 1). The DGAT sequence name is on the left of the alignment, followed by the position of the first amino acid residue shown for each DGAT protein sequence. The completely conserved proline and phenylalanine residues are highlighted in red on yellow. Other color code and related information are described in "Methods" section.

In cattle, a nonconservative substitution of lysine by alanine (K232A) in DGAT1 (Table 4) is directly responsible for the quantitative trait loci (QTL) variation with the lysine-encoding allele being associated with higher milk fat content [41]. Figure 11B shows that the lysine residue is conserved in mammalian DGAT1s but not in plants, or fungi, or other animals (fly, frog, insect and worm) except one of the two forms from dog and zebrafish. The wild-type cattle DGAT1 shows the normal lysine at 232 position [GenBank:AAL49962.1] (Table 4). A similar lysine residue is not found in any of DGAT2s.

DGAT classification
The nomenclature of proteins derived from DNA sequences in GenBank databases can lead to confusion in some cases. One of the utilities of this extensive sequence analysis is to use the completely conserved amino acid residues in respective sequence blocks of DGAT1 and DGAT2 subfamilies as signatures of DGAT proteins for classification. It is generally accepted that DGATs are divided into DGAT1 and DGAT2 subfamilies. However, more than two forms of DGATs are present in a number of species (Table 1) indicate that these sequences are completely different from DGAT1s and DGAT2s. In fact, none of the completely conserved residues in DGAT1s (41 residues) and DGAT2s (16 residues) is present in these new DGATs in the multiple sequence alignment (data not shown). This sequence divergence is in contrast to the general belief that the active sites of enzymes should be conserved during evolution because all catalyze the same or a similar biochemical reaction. Therefore, this sequence divergence raises the important question of how completely different proteins could perform the same biochemical reaction.

The completely conserved residues in DGAT1s and DGAT2s are underlined and listed below the sequences. The underlined P and F residues are conserved in all DGATs.

DGAT properties and amino acid composition
This study has analyzed the properties and amino acid composition of 109 full-length DGATs from 70 organisms. On average, DGAT1s are 171 amino acid residues longer than DGAT2s, resulting in a difference of approximately 20 kDa in molecular mass. Other DGAT properties are similar: both are basic proteins under neutral pH with high isoelectric points (Table 2). The frequency of functional amino acid residue groups between DGAT1 and DGAT2 subfamilies is also very similar in terms of charged residues, acidic residues, basic residues, polar residues and hydrophobic residues (Table 2). The remarkable feature of DGAT1s and DGAT2s is that both subfamilies of proteins contain more than 40% hydrophobic residues (Table 2). These high amounts of hydrophobic residues in DGATs are in agreement with them being integral membrane proteins [33,34] with multiple transmembrane domains [13,33,34], localized to the endoplasmic reticulum of plant and animal cells [13,34], and associated with mitochondria in COS-7 cells [35,42] and lipid bodies in 3T3-L1 adipocytes [42]. The membrane association of the proteins presents an extra hurdle to purification of recombinant DGATs from any source [43,44].

[Figure: multiple sequence alignment of DGAT sequences (listed in Table 1). The completely conserved amino acid residues are highlighted in red on yellow. Other color code and related information are described briefly in Figure 2 legend and with details in "Methods" section.]

Catalytic and regulatory domains of DGATs
Generally speaking, critical amino acid residues of proteins are conserved during evolution because they are essential for enzymatic activity. The conserved amino acid residues tend to be clustered at the active centers of the enzymes. Multiple sequence alignment has shown that DGAT1s and DGAT2s have 41 and 16 completely conserved amino acid residues, respectively. Most of them are located at the carboxyl termini of DGATs (Table 3). This sequence analysis suggests that the catalytic domains of DGATs are located at the carboxyl termini of the proteins. This is supported by the finding that mutations of some completely conserved amino acid residues in the C-termini of these proteins result in complete loss of the enzymatic activity of DGATs (see below). This suggestion is in line with our previous assignment of the catalytic domains of ADPGlc-dependent α-1,4-glucosyltransferases and α-1,6-glucan hydrolases from plants and prokaryotes at the carboxyl termini of the enzymes because of the presence of the conserved amino acid residues and sequence motifs in the different isoforms from diverse organisms [6,9].
Several lines of evidence suggest that the regulatory domains of DGATs are located at the amino termini of the proteins. First, a recent study showed that the amino terminal domain of mouse DGAT1 is not required for the catalytic activity of DGAT1 but may be involved in regulating enzyme activity and dimer/tetramer formation [45]. Second, the N-terminal region of mouse DGAT2 or yeast DGAT2 is not essential for DGAT activity in vitro [33,35]. Finally, mutagenesis of a putative protein kinase SnRK1 (SNF1-related kinase 1) target site at S197 to alanine in TmDGAT1 results in a 38%-80% increase in DGAT1 activity, and over-expression of the mutated TmDGAT1 in Arabidopsis results in a 20%-50% increase in oil content on a per seed basis [25]. This serine residue is conserved in most of the plants and located at the N-termini of DGAT1s (Figure 9A). All of the above-mentioned sequence analysis and experimental evidence support the concept that the catalytic and regulatory domains of DGATs are located at the C- and N-termini of the enzymes, respectively.

Functional significance of less conserved motifs in DGAT2s
Recent studies have reported functional motifs in DGATs including a putative neutral lipid-binding domain (FLXLXXXn in mouse DGAT2) [34], a mitochondrial targeting signal (RXKXXK in mouse DGAT2) [35] and an ER retrieval motif (LKLEI in tung DGAT2) [13]. However, the conserved sequence motifs identified by our extensive sequence analysis do not contain any of these reported motifs. In our analysis, the putative neutral lipid-binding domain [34], which was shown to be present in proteins that either bind to or metabolize neutral lipids [36], is only modestly conserved in animal DGAT2s and not present in any plant DGAT2 (Figure 8A). Similarly, the putative mitochondrial targeting signal is found in only a few animal DGAT2s and is not conserved in any plant or fungal DGAT2 (Figure 8B). This sequence analysis also shows that the pentapeptide (LKLEI) ER-retrieval motif identified at the extreme carboxyl terminus of tung DGAT2 [13] is only modestly conserved in plant DGAT2s and not in animal or fungal DGAT2s (Figure 8C). Together, these observations suggest that less conserved regions in a subset of DGATs may play specific roles in TAG biosynthesis in that particular subset of organisms.

Functional significance of the completely conserved residues
Multiple sequence alignment has shown that 55 DGAT1s and 54 DGAT2s have 41 and 16 completely conserved amino acid residues, respectively, although only two residues are completely conserved among all DGATs (Table 3). It is likely that these completely conserved amino acid residues are critical for DGAT enzymatic activities. These residues may be involved in substrate binding, direct catalysis, and/or maintenance of protein structure including oligomer formation. The importance of some conserved residues in DGAT1s has been demonstrated by site-directed mutagenesis (Table 4). Mutagenesis at H426 in mouse DGAT1 to alanine impairs the ability of DGAT1 to synthesize triacylglycerols, retinyl and wax esters in an in vitro acyltransferase assay [45]. This histidine residue is completely conserved in Motif 5 of all DGAT1s (Figure 4B). Similarly, mutagenesis at Y392, W395 and F439 in Tropaeolum majus DGAT1 eliminates nearly all activity [25]. These three residues are also completely conserved in Motif 5 of all DGAT1s (Figure 4B). All four residues are located in the most conserved region of DGAT1s in which 23 completely conserved residues are located in Motif 5 of the multiple sequence alignment (Figure 4B). The importance of the completely conserved residues in DGAT2s is also supported by site-directed mutagenesis. Single mutations at H161, P162 and H163, as well as the triple mutation, in mouse DGAT2 result in a substantial loss of activity (Table 4) [34]. Mutation at the corresponding sites H193 and H195 in DGAT2 of baker's yeast results in complete loss of the activity (Table 4) [33]. These three residues are located in the highly conserved Motif 1 (PH Block) of DGAT2s (Figure 6A). These results suggest that the mutated residues may be located at the active centers of DGATs, but whether they are involved in substrate binding or catalysis is not clear. Further experiments are required to assess the contribution of the other completely conserved residues to the enzymatic activity of DGATs.

Functional significance of the less-well conserved residues in site-directed mutants
The importance of some less conserved residues in DGATs has also been demonstrated by site-directed mutagenesis (Table 4). As described above, mutation at S197 (a putative SnRK1 target site) in TmDGAT1 results in a 38%-80% increase in DGAT1 activity. This serine residue is conserved in most of the plants (Figure 9A). In addition, mutagenesis at E145 in Motif 1 of Tropaeolum majus DGAT1 results in the loss of almost half of the activity [25]. This glutamate residue is conserved in all plant DGAT1s and most other DGAT1s except those of bird, chimpanzee, Dictyostelium discoideum, Polysphondylium pallidum and Metarhizium acridum (Figure 3A). Mutagenesis at P216 in Tropaeolum majus DGAT1 eliminates almost all of the activity [25]. P216 is completely conserved in plant DGAT1s but is missing in all mammalian DGAT1s (Figure 9B). Mutation at Y129/F130/P131 in DGAT2 of baker's yeast results in a complete loss of the activity [33]. These three residues are highly conserved but none of them is completely conserved among all DGAT2s in our analysis using 54 full-length DGAT2s (Figure 10A). Mutations at F80/L81/L83 in mouse DGAT2 [34] and F71/L73 in baker's yeast DGAT2 [33] result in partial loss of the DGAT activity (Figure 10B). Finally, ScDGAT2 has a unique cysteine residue (C314) which is not involved in catalysis but may be located near the active site or related to proper folding of the protein [37]. However, this residue is only found in DGAT2s from baker's yeast and the other two fungi Ashbya gossypii and Physcomitrella patens, but is not present in any of the other 51 DGAT2s or any of the 55 DGAT1s analyzed (Figure 10C). Nonetheless, site-directed mutagenesis indicates that these less conserved residues, although not essential, contribute to the full activity of DGATs.

Functional significance of the relatively conserved residues in natural mutants
Two well-known natural mutants in corn and cattle demonstrate the importance of some relatively conserved residues in TAG biosynthesis (Table 4). Genetic mapping has identified a high-oil QTL (qHO6) that affects maize seed oil and oleic-acid contents associated with DGAT1-2 [38]. A phenylalanine insertion (F469) in DGAT1-2 is responsible for the increased oil and oleic-acid contents. Ectopic expression of the high-oil DGAT1-2 allele increases oil and oleic-acid contents by up to 41% and 107%, respectively [38]. This phenylalanine residue is conserved in all plants except Brassica napus (rape, AAD45536.1) and conserved in all fungi except mold (Dictyostelium discoideum and Polysphondylium pallidum) (Figure 11A). It is not present in any of the animal DGAT1s or any of the DGAT2s. This case suggests that oil content can potentially be improved in transgenic plants by introducing site-specific amino acid substitutions or other changes in DGATs.
DGAT1 knockout mice are completely devoid of milk secretion, most likely because of deficient triglyceride synthesis in the mammary gland [18]. DGAT1 sequences from pooled DNA show significant frequency shifts at several residue positions between groups of animals with high and low breeding values for milk fat content in different breeds [41]. Substitution of lysine by alanine (K232A) is directly responsible for the QTL variation, with the lysine-encoding allele being associated with higher milk fat content [41]. Both DGAT1 alleles were expressed in Sf9 cells, an insect expression system, and the expressed proteins were characterized. The K allele, which causes an increase in milk fat percentage in the live animal, is characterized by a higher Vmax in producing triglycerides than the A allele [46]. This lysine residue is conserved in mammals but not in plants, fungi or other animals except one of the two forms from dog and zebrafish (Figure 11B). This case also suggests that lipid content can be improved in transgenic animals by bioengineering specific amino acid residues of DGATs.

Conclusions
Understanding the precise roles of DGATs may help to create transgenic plants with value-added properties and provide information for therapeutic intervention for obesity and related diseases because DGATs catalyze the final and rate-limiting step of TAG biosynthesis in eukaryotic organisms. This report analyzed 117 DGAT sequences from 70 organisms ranging from plants, animals and fungi to aid our understanding of the structure-function relationship of these important enzymes. The report identified conserved sequence motifs and amino acid residues in all 117 DGATs and DGAT1 and DGAT2 subfamilies, reassigned some DGAT subfamily members based on the phylogenetic analysis and  reaction. Therefore, this sequence divergence raises an important question how proteins with completely different amino acid sequences could perform the same biochemical reaction, although some variations of the conserved sequence motifs and amino acid residues are expected when more sequences of DGATs are used in the multiple sequence alignment. It has been well-documented that many of the enzymes in the oil biosynthesis pathway are not stable. Although the precise reasons are unknown, it is possible that plants develop a feedback mechanism to regulate the optimal amount of enzymes so that the biophysical properties of ER membranes are functionally intact without dramatic alterations by over-expressed enzymes in the host. If this is the case, it may be advantageous to introduce genes with low copy numbers but with high catalytic efficiency. This concept is supported by three studies with plants (S197 in Arabidopsis and F469 in corn) and animals (K232 in cattle) which demonstrate the potential to increase oil/fat production by altering a single amino acid residue of DGAT1. Therefore, the sequence analysis should facilitate studying the structure-function relationship of DGATs with the ultimate goal of identifying critical amino acid residues. 
This will guide the construction of superb enzymes for metabolic engineering and rational design of DGAT inhibitors to be used for obesity and related diseases.

Figure 10 Sequence analysis of important amino acid residues in less conservative regions of DGAT2s. (A) YFP motif (The boxed Y, F and P residues are mutated in baker's yeast DGAT2 corresponding to Y129/F130/P131), (B) The boxed F, L and L residues are mutated in mouse DGAT2 corresponding to F80, L81 and L83 and in baker's yeast DGAT2 corresponding to F71 and L73, (C) The boxed C residue is mutated in baker's yeast DGAT2 corresponding to C314. Multiple sequence alignment was performed using 54 full-length DGAT2 protein sequences from 44 organisms (listed in Table 1). Color code and related information are described briefly in Figure 2 legend and with details in "Methods" section. The amino acid residues studied by mutagenesis and the corresponding conserved residues in other organisms are boxed within the sequence alignment.

Database search of DGATs
DGAT sequences were obtained from Blastp search [47,48] [...] yeast, mold, moss) and human. The names of DGATs used in the analysis and their corresponding organisms, classification of DGAT subfamily, and the GenBank accession numbers are presented in Table 1. The name of each protein sequence consists of the initials of the organism followed by the assigned subfamily of DGATs in the databases and the GenBank accession number.

Figure 11 Sequence analysis of important amino acid residues of DGAT1s shown in natural mutants. Multiple sequence alignment was performed using 55 full-length DGAT1 protein sequences from 45 organisms (listed in Table 1). The completely conserved amino acid residues are highlighted in red on yellow. Other color code and related information are described briefly in Figure 2 legend and with details in "Methods" section. The amino acid residues affected by natural mutation and the corresponding conserved residues in other organisms are boxed within the sequence alignment. (A) maize DGAT1-2 F468, (B) cattle DGAT1 K232A.

Protein analysis
The properties and amino acid compositions of DGATs were analyzed using Vector NTI software (Invitrogen) [49]. Statistical analysis was performed using Microsoft Excel.
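
As an illustration of this kind of composition analysis, the following minimal Python sketch computes the percentage of each amino acid and the percentage of hydrophobic residues in a protein sequence. It is not the Vector NTI procedure: the hydrophobic residue set (A, I, L, M, F, W, V) is an assumed convention, the function names are hypothetical, and the input is a toy sequence rather than a real DGAT.

```python
from collections import Counter

# Assumption: one common definition of hydrophobic residues; the grouping
# used by Vector NTI may differ.
HYDROPHOBIC = set("AILMFWV")


def composition(seq):
    """Return {residue: percent of all residues} for a protein sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    total = len(seq)
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def hydrophobic_percent(seq):
    """Return the percentage of residues that fall in HYDROPHOBIC."""
    seq = seq.upper()
    hits = sum(1 for aa in seq if aa in HYDROPHOBIC)
    return 100.0 * hits / len(seq)


# Toy sequence for demonstration only (not a DGAT).
toy_seq = "MAILLKDEFW"
toy_composition = composition(toy_seq)
toy_hydrophobic = hydrophobic_percent(toy_seq)
print(toy_composition)
print(toy_hydrophobic)
```

Applied to each full-length sequence, such a calculation would give per-protein values that can then be averaged within the DGAT1 and DGAT2 subfamilies.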

Phylogenetic analysis
Phylogenetic analysis was used to study the presumed evolutionary relationships among the 117 DGATs from 70 organisms. This analysis was performed using the Vector NTI software (Invitrogen) based on the Neighbor-Joining method of Saitou and Nei [50]. The numbers in parentheses following the DGAT names are the calculated distance values, which reflect the degree of divergence between all pairs of DGAT sequences analyzed.
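
The Neighbor-Joining procedure can be sketched in a few lines of Python. The code below is a minimal textbook version of the algorithm, not the Vector NTI implementation. It assumes that a symmetric pairwise distance matrix is already available; how Vector NTI derives its distance values is not reproduced here. The function name, the Newick output format and the toy five-taxon matrix are illustrative assumptions.

```python
def neighbor_joining(names, dist):
    """Build a tree by Neighbor-Joining from a symmetric distance matrix.

    names: list of leaf labels; dist: list of lists of pairwise distances.
    Returns a Newick string with branch lengths (3 decimal places).
    """
    nodes = list(names)
    # Distances stored in nested dicts keyed by node label.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Choose the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to their new parent node.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = "(%s:%.3f,%s:%.3f)" % (a, la, b, lb)
        # Distances from the new node to every remaining node.
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                dc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
                d[new][c] = dc
                d[c][new] = dc
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    # The last two nodes are connected by a single branch.
    return "(%s,%s:%.3f);" % (a, b, d[a][b])


# Toy distance matrix for five hypothetical sequences (not DGAT data).
toy_names = ["a", "b", "c", "d", "e"]
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
toy_tree = neighbor_joining(toy_names, toy_dist)
print(toy_tree)
```

In this toy example, sequences a and b are joined first and d and e are grouped together, which mirrors how closely related DGATs cluster into the same subfamily branch.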

Multiple sequence alignment
Multiple sequence alignment was performed using the ClustalW algorithm [51,52] of the AlignX program of the Vector NTI software. This method is based on algorithms that assign scores to aligned residues and detect sequence similarities. Identical residues at an aligned position receive higher scores than non-identical or less similar residues. Each DGAT sequence name is on the left of the alignment, followed by the position of the amino acid residue of the DGAT protein sequence in the alignment. The numbers at the top of the alignment are the positions of the multiple sequence alignment. The letters at the bottom of the alignment are the consensus residues. Color codes for amino acid residues are as follows: 1) red on yellow: consensus residue derived from a completely conserved residue at a given position; 2) black on green: consensus residue derived from the occurrence of greater than 50% of a single residue at a given position; 3) blue on cyan: consensus residue derived from a block of similar residues at a given position; 4) green on white: residue weakly similar to consensus residue at a given position; 5) black on white: non-similar residues.
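
The first two consensus categories above (completely conserved, and greater than 50% of a single residue) can be illustrated with a short Python sketch that labels each column of an existing alignment. This is a simplified illustration, not the AlignX implementation: the similarity-based categories are omitted because they depend on a residue similarity table not specified here, gaps ('-') are assumed never to count toward conservation, and the function name and toy alignment are hypothetical.

```python
from collections import Counter


def classify_columns(aligned):
    """Label each column of a multiple sequence alignment.

    aligned: list of equal-length strings; '-' marks a gap.
    Returns a list of (consensus_residue, label) tuples, where label is
    'conserved' (same residue in every sequence), 'majority' (one residue
    in more than 50% of sequences) or 'variable' (anything else).
    """
    n = len(aligned)
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    result = []
    for col in zip(*aligned):
        # Assumption: gaps are ignored when counting residues.
        counts = Counter(c for c in col if c != "-")
        if not counts:
            result.append(("-", "variable"))
            continue
        residue, top = counts.most_common(1)[0]
        if top == n:
            label = "conserved"
        elif top > n / 2:
            label = "majority"
        else:
            label = "variable"
        result.append((residue, label))
    return result


# Toy alignment of four hypothetical sequences (not DGAT data).
toy_alignment = ["HPFW", "HPLW", "HAL-", "HPIY"]
toy_labels = classify_columns(toy_alignment)
for position, (residue, label) in enumerate(toy_labels, start=1):
    print(position, residue, label)
```

Counting the columns labeled 'conserved' across all sequences of a subfamily corresponds to tallying the completely conserved residues reported for DGAT1s and DGAT2s.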