The analysis of the complete set of putatively secretory proteins from eight fungi showed that 38-61% of them display Ser/Thr-rich regions, i.e. regions of at least 20 residues with a minimum Ser/Thr content of 40%, and that 18-31% of them contain pHGRs, i.e. regions of 20 or more residues of which at least 25% are predicted to be O-glycosylated. pHGRs were found anywhere along the proteins but showed a slight preference for the protein ends, especially the C-terminus.

Methods

Prediction of O-glycosylation sites in secretory proteins

Protein sequences used in this study were obtained from publicly available databases. The complete sets of proteins encoded by the genomes of Magnaporthe grisea (strain 70–15), Sclerotinia sclerotiorum (strain 1980), Ustilago maydis (strain 521), Aspergillus nidulans (strain FGSC A4), and Neurospora crassa (strain N15) were obtained from the Broad Institute [27]. Those of Botrytis cinerea (strain T4), Trichoderma reesei (strain QM6a), and Saccharomyces cerevisiae (strain S288C) were obtained from URGI [28], JGI [29], and SGD [30], respectively. The predicted protein sequences for each genome were downloaded and transferred to a Microsoft Excel 2010 spreadsheet with the aid of Fasta2tab [31]. All proteins were then tested for the presence of a secretion signal peptide using the standalone version of SignalP 3.0 [32]. SignalP 3.0 has a false positive rate
of 15%. In each genome, the proteins that gave a positive result, i.e. all proteins putatively entering the secretory pathway at the endoplasmic reticulum, were then run through the web-based O-glycosylation prediction tool NetOGlyc 3.1 [12]. Results from NetOGlyc were saved as a text file from within the web browser and imported into Microsoft Word 2010, where they were converted into a table format that could be incorporated into a Microsoft Excel 2010 spreadsheet (Additional file 2). Sets of proteins with randomized O-glycosylation positions were generated from this spreadsheet with the aid of the RAND function in Microsoft Excel. Each randomized set contains the same proteins as the original one, i.e. all SignalP-positive proteins in a given genome, and every individual protein carries the same number of predicted O-glycosylation sites as in the original. The difference is that the position of each site along the protein was chosen by generating an appropriate random number (according to the length of that protein), taking care not to assign two sites to the same residue.

Detection of Ser/Thr-rich regions and pHGRs

To study the presence, in SignalP-positive fungal proteins, of regions that are rich either in Ser/Thr or in predicted O-glycosylation sites, we first developed a simple algorithm that runs as a macro (named XRR) in a Microsoft Excel spreadsheet (Additional file 4). The macro was written in Microsoft Visual Basic for Applications.
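The text gives the thresholds used by XRR but not how the macro scans a sequence. As a rough illustration only, the Python sketch below assumes a sliding window of 20 residues: every window in which at least 40% of residues are Ser/Thr (or at least 25% carry a predicted O-glycosylation site) is flagged, and overlapping or abutting flagged windows are merged into one region. The window-merging rule and the names `find_rich_regions`, `flags_from_sites` and `randomize_sites` are hypothetical and do not reproduce the authors' VBA code. `randomize_sites` mimics the randomized control described above: each protein keeps its number of sites, and their positions are redrawn uniformly along the protein without placing two sites on the same residue.

```python
import random
from typing import List, Set, Tuple

# Thresholds quoted in the text: regions span at least 20 residues and are
# Ser/Thr-rich at >= 40% Ser/Thr, or pHGRs at >= 25% predicted O-glycosylated
# residues. The sliding-window scan itself is an assumption, not XRR's code.
MIN_LENGTH = 20


def find_rich_regions(flags: List[bool], min_percent: int,
                      min_len: int = MIN_LENGTH) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) regions built by merging every
    window of `min_len` residues in which at least `min_percent` % of the
    residues are flagged. Overlapping or abutting windows are merged."""
    n = len(flags)
    if n < min_len:
        return []
    regions: List[Tuple[int, int]] = []
    count = sum(flags[:min_len])  # flagged residues in the first window
    for start in range(n - min_len + 1):
        if start > 0:
            # Slide one residue: add the new right end, drop the old left end.
            count += flags[start + min_len - 1] - flags[start - 1]
        # Integer comparison avoids floating-point rounding at the threshold.
        if 100 * count >= min_percent * min_len:
            end = start + min_len
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the current region
            else:
                regions.append((start, end))
    return regions


def flags_from_sites(length: int, sites: Set[int]) -> List[bool]:
    """Mark each residue (0-based) that carries a predicted O-glycosylation site."""
    return [i in sites for i in range(length)]


def randomize_sites(length: int, n_sites: int, rng: random.Random) -> Set[int]:
    """Randomized control: keep the number of sites, redraw their positions.
    random.sample draws without replacement, so no residue gets two sites."""
    return set(rng.sample(range(length), n_sites))


if __name__ == "__main__":
    seq = "A" * 30 + "S" * 20 + "A" * 30           # hypothetical toy sequence
    print(find_rich_regions([aa in "ST" for aa in seq], min_percent=40))

    predicted = set(range(30, 50, 2))              # hypothetical predicted sites
    print(find_rich_regions(flags_from_sites(len(seq), predicted), min_percent=25))

    rng = random.Random(42)                        # fixed seed for a repeatable toy run
    shuffled = randomize_sites(len(seq), len(predicted), rng)
    print(find_rich_regions(flags_from_sites(len(seq), shuffled), min_percent=25))
```

Because each qualifying window is kept whole, a reported region can extend beyond the Ser/Thr cluster itself: for the toy sequence the Ser/Thr-rich region is reported as positions 18 to 61 rather than 30 to 49. XRR may define region boundaries differently.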