Help
CysModDB Introduction
CysModDB contains 70,536 CysPTM sites from 21,654 proteins in 12 distinct species, identified through high-throughput proteomics experiments. It includes 12 different PTM types classified into three categories: Lipid, Metabolite and Oxidation. The PTM sites are annotated with information from the literature and functional databases.
Information Querying
Browse
Users can intuitively browse modified proteins on the browse page by selecting a PTM type or an organism.
Search
The search page provides three search modes: simple search, advanced search and multiple search. Simple search accepts one of three query types: UniProt AC, gene name or protein name. Advanced search allows users to enter at most three terms and connect them with Boolean operators (AND, NOT) to find information more precisely. Multiple search lets users submit a list of UniProt ACs and query them all at once.
Meaning of the Boolean operators:
AND: The term following this operator must be included in the specified field(s).
NOT: The term following this operator must not be contained in the specified field(s).
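The AND/NOT semantics described above can be illustrated with a minimal Python sketch. It is not CysModDB's implementation: the `Term` class, the `matches` function, the field names and the example records are all hypothetical, and case-insensitive substring matching is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Term:
    operator: str  # "AND" or "NOT"
    field: str     # hypothetical field name, e.g. "gene_name"
    text: str      # the term entered by the user


def matches(record: Dict[str, str], terms: List[Term]) -> bool:
    """Return True if the record satisfies every AND/NOT term."""
    if len(terms) > 3:
        raise ValueError("advanced search accepts at most three terms")
    for term in terms:
        value = str(record.get(term.field, "")).lower()
        found = term.text.lower() in value  # assumption: substring match
        if term.operator == "AND":
            if not found:
                return False
        elif term.operator == "NOT":
            if found:
                return False
        else:
            raise ValueError(f"unknown operator: {term.operator}")
    return True


# Hypothetical records, for illustration only.
records = [
    {"gene_name": "GAPDH", "organism": "Homo sapiens"},
    {"gene_name": "Gapdh", "organism": "Mus musculus"},
    {"gene_name": "ACTB", "organism": "Homo sapiens"},
]
query = [Term("AND", "gene_name", "gapdh"), Term("NOT", "organism", "mus")]
hits = [r for r in records if matches(r, query)]
```

Here `hits` keeps only the human GAPDH record: the mouse entry is excluded by the NOT term and ACTB fails the AND term.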
Query Results
The result page contains a table of matched items. Each item is annotated with Gene Name, Protein Name, PTM categories, Organism, UniProt AC and CMID. The CMID is the unique item ID in CysModDB and links to the item's detailed information.
Detailed page
The detailed information for the modified protein is shown below.
Summary
This part shows the essential information, including protein name, gene name, UniProt AC, organism, and identified CysPTM types. The UniProt AC is hyperlinked to the UniProt database.
Graphic browser
The graphic browser visualizes the protein sequence with its modified cysteine residues and PTM types, aligned with protein functional regions and secondary structures. A zoom bar is provided to zoom in or out on the displayed region. Four buttons on the left side control the browser or download the related data for further analysis.
Detailed information
Protein information
This part contains the UniProt ID, a description of the protein's functions, its subcellular location, and the protein sequence with modified cysteine residues coloured red.
PTM table
The table records detailed information for each modified site of this protein. It includes the site's position, sequence window (P-7 to P+7), PTM type, identification strategy, identification approach, sample origin, literature and publication year. The literature is hyperlinked to the PubMed website.
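As an illustration of the P-7 to P+7 sequence window, the following Python sketch extracts the 15 residues centred on a modified cysteine. The function name `sequence_window` and the example sequence are hypothetical, and padding with "-" at the protein termini is an assumption; how CysModDB treats sites near the termini is not stated here.

```python
def sequence_window(sequence: str, position: int, flank: int = 7, pad: str = "-") -> str:
    """Return the residues from P-flank to P+flank around a 1-based position."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    index = position - 1  # convert to a 0-based index
    left = sequence[max(0, index - flank):index]
    right = sequence[index + 1:index + 1 + flank]
    # Assumption: pad with '-' when the window runs past a terminus.
    return left.rjust(flank, pad) + sequence[index] + right.ljust(flank, pad)


# Hypothetical sequence with a cysteine at position 11.
example = "MKTAYIAKQRCISFVKSHFSRQ"
window = sequence_window(example, 11)
```

With the default flank of 7, the window always has 15 characters and the modified residue sits in the middle.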
Cross reference
Three external databases are provided for cross-references, including the Reactome database for protein pathways, the STRING database for protein-protein interactions, and the dbPTM database for other PTM information on this protein.
Online analysis tools
Users can use the 'Basket' module to save items of interest from the PTM table for further online analysis on the tools page. The basket is designed as a slide-out panel that appears when the basket button is clicked. There are two baskets (A and B) on the tools page: A is required, and B is optional and used only for comparison analysis. Items can be transferred between the two baskets. Three tools are provided: Gene Ontology enrichment analysis, Sequence Logo and Composition heatmap.
Basket
The 'Basket' button is shown on the right side. Clicking it opens the basket panel. Users can select items in the PTM table and click the 'Add to basket' button to save them. Saved items can be removed with the 'Delete' button.
Overview
This page shows the items selected from the PTM table. There are two baskets on this page, and users can select items and transfer them from one basket to the other with the arrow button. Changes take effect only after they are saved with the 'Apply' button.
Gene ontology analysis
This module is based on the Enrichr web API (https://maayanlab.cloud/Enrichr/). It performs statistical enrichment analysis for the proteins in Basket A. The output shows the results for GO:BP (biological process), GO:MF (molecular function) and GO:CC (cellular component). Detailed information is also provided.
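For readers who want to run a comparable analysis outside the website, the following Python sketch calls the public Enrichr endpoints `addList` and `enrich`. It is not the code used by CysModDB. The helper names and the library versions in `GO_LIBRARIES` are assumptions (which Enrichr library versions the module uses is not stated), and the network call needs the third-party `requests` package.

```python
ENRICHR_BASE = "https://maayanlab.cloud/Enrichr"

# Assumption: 2021 releases of the GO libraries; other versions exist.
GO_LIBRARIES = {
    "GO:BP": "GO_Biological_Process_2021",
    "GO:MF": "GO_Molecular_Function_2021",
    "GO:CC": "GO_Cellular_Component_2021",
}


def build_add_list_payload(genes, description="basket A"):
    """Build the multipart form fields expected by the addList endpoint."""
    cleaned = (g.strip() for g in genes)
    # dict.fromkeys removes duplicates while keeping the input order.
    gene_block = "\n".join(dict.fromkeys(g for g in cleaned if g))
    return {"list": (None, gene_block), "description": (None, description)}


def enrich(genes, library, description="basket A"):
    """Upload a gene list and return the enrichment rows for one library."""
    import requests  # third-party; imported here so the sketch loads without it

    upload = requests.post(
        f"{ENRICHR_BASE}/addList",
        files=build_add_list_payload(genes, description),
        timeout=30,
    )
    upload.raise_for_status()
    user_list_id = upload.json()["userListId"]

    result = requests.get(
        f"{ENRICHR_BASE}/enrich",
        params={"userListId": user_list_id, "backgroundType": library},
        timeout=30,
    )
    result.raise_for_status()
    return result.json()[library]
```

A call such as `enrich(gene_names, GO_LIBRARIES["GO:BP"])`, where `gene_names` is a list of gene symbols taken from a basket, would return one row per enriched term.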
Sequence Logo
This module is based on WebLogo (http://weblogo.berkeley.edu/logo.cgi). The logos are generated from the flanking sequences of the modified cysteines. Each logo consists of stacks of symbols, and each stack corresponds to one position of the sequence. The height of a symbol within its stack indicates the relative frequency of that amino acid at that position.
Composition heatmap
The composition heatmap contains two modules: the Position Probability Matrix (PPM) and the Position Weight Matrix (PWM). Both modules are based on the SeqLogo Python package (http://bioconductor.org/packages/release/bioc/vignettes/seqLogo/inst/doc/seqLogo.html). The PPM describes the probability of each amino acid at each position of the sequences. The PWM illustrates the pattern of the amino acid distribution around the modified cysteines. The PPM and PWM for each basket are calculated separately with the formulas below:
\begin{equation} M_{P P M}=\left(\begin{array}{cccc} P_{1,1} & P_{1,2} & \cdots & P_{1, n} \\ P_{2,1} & P_{2,2} & \cdots & P_{2, n} \\ \vdots & \vdots & \ddots & \vdots \\ P_{m, 1} & P_{m, 2} & \cdots & P_{m, n} \end{array}\right) \end{equation}
\begin{equation} M_{P W M}=\log _{2}\left(\frac{M_{P P M}}{b_{m}}\right) \end{equation}
Here \(P_{m, n}\) is the probability of amino acid m at position n of the sequences. The PWM is the PPM converted into log-likelihood ratios, where \(b_m\) is the probability of amino acid m in the proteome. In this module, m is up to 20 and n ranges from -15 to +15.
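The two formulas above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the SeqLogo implementation: the function names are hypothetical, the example windows are made up, the uniform background of 1/20 stands in for real proteome frequencies \(b_m\), and returning negative infinity for zero probabilities (instead of applying a pseudocount) is a simplification.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_probability_matrix(windows, pseudocount=0.0):
    """Return {amino acid: [P at each position]} for equal-length windows."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all sequence windows must have the same length")
    ppm = {aa: [0.0] * length for aa in AMINO_ACIDS}
    for n in range(length):
        # Count only the 20 standard residues; padding symbols are ignored.
        counts = Counter(w[n] for w in windows if w[n] in ppm)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        for aa in AMINO_ACIDS:
            ppm[aa][n] = (counts[aa] + pseudocount) / total if total else 0.0
    return ppm


def position_weight_matrix(ppm, background):
    """Convert a PPM into log2(P / b) for each amino acid and position."""
    return {
        aa: [math.log2(p / background[aa]) if p > 0 else float("-inf") for p in row]
        for aa, row in ppm.items()
    }


# Made-up three-residue windows, for illustration only.
windows = ["ACD", "ACE", "GCD"]
# Assumption: uniform background instead of proteome frequencies.
background = {aa: 1 / 20 for aa in AMINO_ACIDS}
ppm = position_probability_matrix(windows)
pwm = position_weight_matrix(ppm, background)
```

In this toy example the central cysteine has probability 1, so its weight is log2(20), while residues never observed at a position receive negative infinity unless a pseudocount is supplied.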
Prediction tools (external)
This part lists external online prediction tools that users can apply to their own data.
Download
Users can download the data in TSV, XML, FASTA and JSON formats.