SalMotifDB 0.0.0.9000
Recently developed genome resources in salmonid fish provide tools for studying the genomics underlying a wide range of properties, including life history trait variation in the wild, economically important traits in aquaculture and the evolutionary consequences of whole genome duplications. Although genome assemblies now exist for a number of salmonid species, the lack of regulatory annotations is holding back our mechanistic understanding of how genetic variation in non-coding regulatory regions affects gene expression and the downstream phenotypic effects.
Eukaryotic regulatory regions are characterized based on a set of discovered transcription factor binding sites (TFBSs), which can be represented as sequence patterns with varying degrees of degeneracy.
The SalMotifDB package and its associated web interface are designed to be a computational tool for the analysis of transcription factors (TFs) and their cis-regulatory binding sites in five salmonid genomes. SalMotifDB integrates TF-binding site information for non-redundant DNA patterns (motifs) assembled from a large number of metazoan motif databases.
So far this package contains a set of integrated functions. All functions access a public database, so you need internet access to use this tool. Alternatively, you can use the SalMotifDB web interface.
The following tutorial demonstrates how to query the current version of the SalMotifDB database using the functions available in the SalMotifDB package, how to interpret the results, and some associated methods defined for these functions.
To explore the basic functions of SalMotifDB, we’ll use the lipidGenes dataset shipped with the SalMotifDB R package. This dataset contains 1421 genes grouped into different KEGG pathways. The dataset is obtained from “Life‐stage‐associated remodelling of lipid metabolism regulation in Atlantic salmon”.
The EnrichMotif function allows you to input a list of genes (e.g. differentially expressed genes) and identify motifs that match the promoters of these genes more often than expected by chance. The tool gives details about enrichment p-values (using the hypergeometric distribution), as well as details about all individual motif matches to promoters of genes in the list.
For more details, please see the help page with ?EnrichMotif.
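As a rough illustration of the hypergeometric test mentioned above, the sketch below computes an upper-tail p-value with base R's phyper(). The helper name enrichment_p and the background size n_background are assumptions made for this example only. The background actually used by SalMotifDB is not stated here, so the value is not expected to reproduce the package's p-values.

```r
# Upper-tail hypergeometric p-value: the probability of observing at least
# `hits_in_set` genes with a motif match in a gene set of size `set_size`,
# given `hits_in_genome` matching genes among `n_background` genes overall.
enrichment_p <- function(hits_in_set, hits_in_genome, n_background, set_size) {
  # phyper() returns P(X <= q); q = hits - 1 with lower.tail = FALSE gives P(X >= hits)
  phyper(hits_in_set - 1,
         m = hits_in_genome,
         n = n_background - hits_in_genome,
         k = set_size,
         lower.tail = FALSE)
}

n_background <- 50000  # hypothetical number of genes with a scanned promoter
p <- enrichment_p(hits_in_set = 2, hits_in_genome = 1038,
                  n_background = n_background, set_size = 17)
print(p)
```

The counts 2, 1038 and 17 are taken from the tutorial's SIX4 example and gene set size; only the background size is invented.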
We will extract the 17 genes annotated to the “Fatty acid elongation, Biosynthesis of unsaturated fatty acids, Fatty acid metabolism” KEGG pathway group.
library(SalMotifDB)
library(data.table)
data('lipidGenes', package='SalMotifDB')
fam <- as.data.table(lipidGenes[KEGG_pathway_name == 'Fatty acid elongation, Biosynthesis of unsaturated fatty acids, Fatty acid metabolism', gene_id])
fam$V1
#> [1] "gene1745:100136433" "gene18253:106611334" "gene18572:106611586"
#> [4] "gene23520:106562668" "gene25123:106564267" "gene38708:106577593"
#> [7] "gene39757:106578705" "gene40749:100192341" "gene43488:106582347"
#> [10] "gene46138:100286513" "gene46147:100286513" "gene47639:100196500"
#> [13] "gene49196:106587973" "gene51359:100192340" "gene51681:106590135"
#> [16] "gene52193:106590656" "gene909:106603767"
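The identifiers above have two parts separated by a colon. The sketch below splits them with base R and flags entries that share the same numeric part. Treating the second part as an NCBI gene identifier is an assumption based on how the IDs look, and the column names are made up for this example.

```r
# A few identifiers copied from the output above
gene_ids <- c("gene1745:100136433", "gene46138:100286513",
              "gene46147:100286513", "gene909:106603767")

# Split "geneNNN:NNNNNNNNN" into its two parts
parts <- strsplit(gene_ids, ":", fixed = TRUE)
genes <- data.frame(
  gene_id    = gene_ids,
  gene_label = vapply(parts, `[`, character(1), 1),
  ncbi_id    = vapply(parts, `[`, character(1), 2),  # assumed to be an NCBI gene ID
  stringsAsFactors = FALSE
)

# Flag entries whose numeric part occurs more than once
genes$shared_ncbi_id <- genes$ncbi_id %in% genes$ncbi_id[duplicated(genes$ncbi_id)]
print(genes)
```

In this excerpt, gene46138 and gene46147 carry the same numeric part, which may be worth keeping in mind when counting genes.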
Using the gene list prepared above, run the EnrichMotif function for Atlantic salmon and for motifs predicted in upstream promoter sequences.
resultList <- EnrichMotif(myFile=fam,mySpecies="Atlantic salmon",topTFs=10,dbType="DNA-seq",RepeatsMasked="Yes", Conserved = "Yes")
The result contains 5 elements:
enrichedMotifs: the most enriched motifs for the gene set
associatedGenes: the target genes for each motif
networkEdges: network edges linking the selected genes via shared TFs
networkNodes: network nodes for the selected genes and their shared TFs
resTableBed: the result table in BED format
Let’s walk through the objects one by one.
A) enrichedMotifs
enrichedmotifs <- resultList$enrichedMotifs
knitr::kable(enrichedmotifs[1:10], format="html") %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center") %>% scroll_box(width = "80%", height = "200px")
ID | Motif ID | TF | Central TF | Central motif ID | Database source | Occurence in geneset | Occurence in genome | P-value |
---|---|---|---|---|---|---|---|---|
25 | six4_M01374_TRANSFAC | SIX4 | SIX4 | 25 | TRANSFAC | 2 | 1038 | 0.0333661 |
29 | Irx6_M01377_TRANSFAC | IRX6 | IRX5 | 10884 | TRANSFAC | 1 | 779 | 0.2069358 |
62 | Irx2_M01405_TRANSFAC | IRX2 | IRX5 | 10884 | TRANSFAC | 1 | 775 | 0.2059727 |
68 | IRX4_M01410_TRANSFAC | IRX4 | IRX5 | 10884 | TRANSFAC | 1 | 879 | 0.2306992 |
125 | Tcfe3_M0174_1.02_CISBP | TCFE3 | BHLHB2 | 9398 | CISBP | 2 | 816 | 0.0212285 |
156 | Mitf_M0208_1.02_CISBP | MITF | BHLHB2 | 9398 | CISBP | 2 | 800 | 0.0204468 |
235 | Obox6_M01445_TRANSFAC | OBOX6 | OTX2 | 10925 | TRANSFAC | 1 | 427 | 0.1183637 |
331 | Elf-1_M00110_TRANSFAC | ELF-1 | GRH | 909 | TRANSFAC | 2 | 855 | 0.0231879 |
369 | lin54_M0593_1.02_CISBP | LIN54 | LIN54 | 370 | CISBP | 1 | 831 | 0.2193679 |
370 | LIN54_M0594_1.02_CISBP | LIN54 | LIN54 | 370 | CISBP | 1 | 650 | 0.1753803 |
This table (scroll the table to see all rows and columns) lists the enriched motifs together with their enrichment p-values (using the hypergeometric distribution). Let’s go through the columns in detail.
Motif ID: the motif ID as given by the source database. Each Motif ID is made up of three parts delimited by underscores.
TF: The name of the transcription factor that binds to the motif.
Central TF: To reduce motif redundancy, we clustered our motif collections from different sources. We first clustered motifs within each database and then clustered the central motifs (i.e. the motif with the highest similarity to other motifs in the cluster, as calculated by matrix-clustering) of these database-specific clusters across databases. Each cluster is represented by one non-redundant central motif. This column shows the representative TF for each cluster.
Central motif ID: The motif ID for the Central TF in SalMotifDB.
Database source: The source database from which the motif was obtained.
Occurrence in geneset: The motif occurrence in your test geneset.
Occurrence in genome: The motif occurrence in the genome.
P-value: Hypergeometric distribution p-value.
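To make the three-part Motif ID concrete, here is a minimal sketch that splits IDs into a TF name, a source accession and a database name. The regular expression is an assumption inferred from the IDs shown in the table (accessions starting with "M", database name last), not a documented format, and the column names are invented for the example.

```r
motif_ids <- c("six4_M01374_TRANSFAC",
               "Tcfe3_M0174_1.02_CISBP",
               "GKLF_KLF4_M01588_TRANSFAC")

# Assumed layout: <TF name>_<accession starting with M>_<database>
pattern <- "^(.*)_(M[0-9]+[0-9._]*)_([A-Za-z]+)$"
m <- regmatches(motif_ids, regexec(pattern, motif_ids))

# IDs that do not fit the assumed layout end up as NA in every part
parts <- data.frame(
  motif_id  = motif_ids,
  tf        = vapply(m, function(x) x[2], character(1)),
  accession = vapply(m, function(x) x[3], character(1)),
  database  = vapply(m, function(x) x[4], character(1)),
  stringsAsFactors = FALSE
)
print(parts)
```

Note that the TF part can itself contain an underscore (as in GKLF_KLF4), which is why a plain split on "_" would not be enough.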
B) associatedGenes
associatedgenes <- resultList$associatedGenes
knitr::kable(associatedgenes[1:10, -c("Gene strand", "Motif strand")], format="html") %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center") %>% scroll_box(width = "80%", height = "200px")
Gene ID | Chromosome | Biotype | Product | Motif ID | TF | Score | Distance | Motif length | Start | Stop | Matched sequence | Occurence in genome | Occruence in gene set | P-value | Source database | Central TF | Central motif ID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | six4_M01374_TRANSFAC | SIX4 | 11.40740 | -396 | 17 | 49456794 | 49456810 | CACTCTGACACCTCAGG | 1038 | 2 | 0.0333661 | TRANSFAC | SIX4 | 25 |
gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | Irx6_M01377_TRANSFAC | IRX6 | 11.61730 | -307 | 17 | 49456705 | 49456721 | GACCTACATGTTGTGCT | 779 | 1 | 0.2069358 | TRANSFAC | IRX5 | 10884 |
gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | Irx2_M01405_TRANSFAC | IRX2 | 11.22220 | -306 | 17 | 49456704 | 49456720 | GCACAACATGTAGGTCA | 775 | 1 | 0.2059727 | TRANSFAC | IRX5 | 10884 |
gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | IRX4_M01410_TRANSFAC | IRX4 | 10.92680 | -306 | 17 | 49456704 | 49456720 | GCACAACATGTAGGTCA | 879 | 1 | 0.2306992 | TRANSFAC | IRX5 | 10884 |
gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | IRX4_M01410_TRANSFAC | IRX4 | 10.96340 | -307 | 17 | 49456705 | 49456721 | GACCTACATGTTGTGCT | 879 | 1 | 0.2306992 | TRANSFAC | IRX5 | 10884 |
gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | Tcfe3_M0174_1.02_CISBP | TCFE3 | 11.06170 | -301 | 10 | 49456699 | 49456708 | GGTCACATGG | 816 | 2 | 0.0212285 | CISBP | BHLHB2 | 9398 |
gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | Mitf_M0208_1.02_CISBP | MITF | 11.23460 | -301 | 10 | 49456699 | 49456708 | GGTCACATGG | 800 | 2 | 0.0204468 | CISBP | BHLHB2 | 9398 |
gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | Obox6_M01445_TRANSFAC | OBOX6 | 11.46910 | -130 | 15 | 49456528 | 49456542 | AAAAACAGATTATGG | 427 | 1 | 0.1183637 | TRANSFAC | OTX2 | 10925 |
gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | Elf-1_M00110_TRANSFAC | ELF-1 | 11.71900 | -140 | 16 | 49456538 | 49456553 | GGGTATGGTTTAAAAA | 855 | 2 | 0.0231879 | TRANSFAC | GRH | 909 |
gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | lin54_M0593_1.02_CISBP | LIN54 | 8.17241 | -80 | 8 | 49456478 | 49456485 | GTTTGAAT | 831 | 1 | 0.2193679 | CISBP | LIN54 | 370 |
This table (scroll the table to see all rows and columns) provides details about all individual motif matches to the promoters of genes in the list. The first five columns describe each gene and are obtained from the NCBI database annotation. Motif ID, TF, Central TF and Central motif ID are explained above. The remaining columns that need explanation are:
Score: the log-odds scores using log base 2 computed by the FIMO tool used to scan motifs.
Distance: the motif distance from transcription start site
Start: Motif start location in the genome
Stop: Motif stop location in the genome
Matched sequence: Actual matched sequence in the promoter sequence
P-value: statistical threshold used by FIMO (<0.0001)
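The sketch below shows one way to sanity-check and reorder individual motif matches using three rows copied from the table above. The column names are simplified for the example and are not the names returned by the package.

```r
hits <- data.frame(
  motif_id     = c("six4_M01374_TRANSFAC", "Tcfe3_M0174_1.02_CISBP", "lin54_M0593_1.02_CISBP"),
  score        = c(11.40740, 11.06170, 8.17241),
  distance     = c(-396, -301, -80),
  motif_length = c(17, 10, 8),
  start        = c(49456794, 49456699, 49456478),
  stop         = c(49456810, 49456708, 49456485),
  stringsAsFactors = FALSE
)

# In these rows Start and Stop behave as inclusive coordinates:
# stop - start + 1 equals the reported motif length
hits$span <- hits$stop - hits$start + 1
stopifnot(all(hits$span == hits$motif_length))

# A base-2 log-odds score s corresponds to an odds ratio of 2^s
hits$odds_ratio <- 2^hits$score

# Put the matches closest to the transcription start site first
hits <- hits[order(abs(hits$distance)), ]
print(hits)
```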
C) networkEdges and networkNodes
edges <- resultList$networkEdges
nodes <- resultList$networkNodes
setDT(edges)
edges[, size := .N, by='tf']
nodes <- as.data.table(unique(left_join(nodes,edges[,.(tf,size,count)], by=c("label"="tf"))))
nodes[is.na(size), size := 0]
nodes[ , shape := ifelse(size == 0, "circle", no="triangle")]
nodes[ , color := ifelse(size == 0, "#5CFFFF", no="#F25FD0")]
nodes[size != 0, value := count]
library(visNetwork)
visNetwork(nodes=nodes, edges=edges, physics = FALSE, width = "90%") %>% visIgraphLayout() %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>% visNodes(scaling = list(label = list(enabled = T))) %>% visNetwork::visHierarchicalLayout(parentCentralization = FALSE, enabled = FALSE)
We provide a method to visualize the relationship between genes and their associated TFs. The network visualization above is prepared from the top 10 enriched motifs sorted by p-value. The network diagram is interactive, so you can click on a gene (blue circle) or a TF (red triangle) to see its relationship with other nodes.
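For readers who want to see how such a gene and TF network can be assembled, here is a minimal base R sketch that builds node and edge tables from TF and gene pairs. The input table assoc and the gene "hypothetical_gene" are invented for the example. The id, from and to columns follow the layout visNetwork expects, but this is not necessarily how SalMotifDB constructs networkNodes and networkEdges.

```r
# TF-gene pairs: the first three come from the associatedGenes table above,
# the last one is made up so that one TF is shared by two genes
assoc <- data.frame(
  gene = c("gene38708:106577593", "gene38708:106577593",
           "gene38708:106577593", "hypothetical_gene"),
  tf   = c("SIX4", "TCFE3", "MITF", "SIX4"),
  stringsAsFactors = FALSE
)

# One edge per distinct TF-gene pair
edges <- unique(data.frame(from = assoc$tf, to = assoc$gene, stringsAsFactors = FALSE))

# One node per TF or gene; TFs are triangles, genes are circles
labels <- unique(c(edges$from, edges$to))
is_tf  <- labels %in% edges$from
degree <- vapply(labels, function(l) sum(edges$from == l), integer(1), USE.NAMES = FALSE)

nodes <- data.frame(
  id    = labels,
  label = labels,
  shape = ifelse(is_tf, "triangle", "circle"),
  color = ifelse(is_tf, "#F25FD0", "#5CFFFF"),
  value = ifelse(is_tf, degree, NA_integer_),  # TF node size = number of target genes
  stringsAsFactors = FALSE
)
print(nodes)
```

Tables shaped like this could then be passed to visNetwork(nodes = nodes, edges = edges) if the visNetwork package is installed.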
The position-based search tool allows you to specify a genomic region of interest and retrieve details about all motif matches to promoters of genes located in that region.
Run the MotifSearchPosition function for Atlantic salmon to find motifs between base pairs 1 and 1,000,000 on chromosome ssa01. Note that the call below sets RepeatsMasked="Yes" and Conserved="Yes".
resultList <- MotifSearchPosition(coordinate="ssa01:1-1000000",mySpecies="Atlantic salmon",topTFs=10,dbType="DNA-seq",RepeatsMasked="Yes", Conserved = "Yes")
The result contains only one element: identifiedMotifs.
identifiedMotifs <- resultList$identifiedMotifs
knitr::kable(identifiedMotifs[1:10, -c("Gene strand", "Motif strand")], format="html") %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center") %>% scroll_box(width = "80%")
ID | Gene ID | Chromosome | Gene start | Gene end | Score | Distance from TSS | Motif length | Matched sequence | P-value | Motif start | Motif stop | Occurence in genome | Motif ID | TF | Motif source |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
303 | gene2:106601976 | ssa01 | 228330 | 231471 | 11.209900 | 50 | 8 | GGGGTACG | 8.88e-05 | 231414 | 231421 | 210 | Zfp128_M0438_1.02_CISBP | ZFP128 | CISBP |
321 | gene2:106601976 | ssa01 | 228330 | 231471 | 13.086400 | 6 | 8 | CTATGGGG | 1.67e-05 | 231458 | 231465 | 770 | F52B5.7_M0454_1.02_CISBP | F52B5.7 | CISBP |
328 | gene2:106601976 | ssa01 | 228330 | 231471 | 11.609800 | -3 | 10 | TAGGTGGTCC | 1.60e-05 | 231465 | 231474 | 1386 | ztf-14_M0461_1.02_CISBP | ZTF-14 | CISBP |
381 | gene2:106601976 | ssa01 | 228330 | 231471 | 5.602410 | 49 | 9 | CCGTACCCC | 5.00e-05 | 231414 | 231422 | 325 | KDM2B_M0607_1.02_CISBP | KDM2B | CISBP |
389 | gene2:106601976 | ssa01 | 228330 | 231471 | 5.578310 | 48 | 10 | CCCGTACCCC | 3.68e-05 | 231414 | 231423 | 718 | mll_M0615_1.02_CISBP | MLL | CISBP |
717 | gene2:106601976 | ssa01 | 228330 | 231471 | 9.445780 | 49 | 10 | CCGTACCCCG | 2.37e-05 | 231413 | 231422 | 758 | Gm98_M1419_1.02_CISBP | GM98 | CISBP |
994 | gene2:106601976 | ssa01 | 228330 | 231471 | 9.924420 | 10 | 9 | GGGGTGCTG | 8.43e-05 | 231453 | 231461 | 1462 | ZIC2_M4148_1.02_CISBP | ZIC2 | CISBP |
997 | gene2:106601976 | ssa01 | 228330 | 231471 | 9.172840 | 37 | 14 | GGGGACCCTCCCAG | 8.63e-05 | 231421 | 231434 | 1075 | RELA_M4444_1.02_CISBP | RELA | CISBP |
1004 | gene2:106601976 | ssa01 | 228330 | 231471 | -0.518519 | 44 | 21 | GGTCCCCGTACCCCGGCATCC | 8.48e-05 | 231407 | 231427 | 1441 | SMARCC2_M4527_1.02_CISBP | SMARCC2 | CISBP |
1041 | gene2:106601976 | ssa01 | 228330 | 231471 | 10.276900 | 36 | 12 | GACCCTCCCAGG | 6.39e-05 | 231424 | 231435 | 2520 | GKLF_KLF4_M01588_TRANSFAC | GKLF_(KLF4) | TRANSFAC |
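The region is passed as a "chromosome:start-end" string. The following sketch parses such a string and checks a few motif positions from the table above against it. The helper parse_region and the simplified column names are assumptions made for illustration and are not part of the SalMotifDB API.

```r
# Parse "chromosome:start-end" into its components
parse_region <- function(coordinate) {
  m <- regmatches(coordinate, regexec("^([^:]+):([0-9]+)-([0-9]+)$", coordinate))[[1]]
  if (length(m) != 4) stop("expected 'chromosome:start-end', got: ", coordinate)
  list(chrom = m[2], start = as.numeric(m[3]), end = as.numeric(m[4]))
}

region <- parse_region("ssa01:1-1000000")

# Three motif matches copied from the table above
hits <- data.frame(
  tf          = c("ZFP128", "F52B5.7", "ZTF-14"),
  chromosome  = "ssa01",
  motif_start = c(231414, 231458, 231465),
  motif_stop  = c(231421, 231465, 231474),
  stringsAsFactors = FALSE
)

# TRUE when a match lies entirely inside the requested region
hits$in_region <- hits$chromosome == region$chrom &
  hits$motif_start >= region$start &
  hits$motif_stop <= region$end
print(hits)
```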
The MotifSearchGene function allows you to input a list of genes and identify motifs that match the promoters of these genes more often than expected by chance. The tool gives details about enrichment p-values (using the hypergeometric distribution), as well as details about all individual motif matches to promoters of genes in the list.
For more search criteria, please check the help page with ?MotifSearchGene.
We will extract the 3 genes annotated to the Fatty acid metabolism KEGG pathway.
data( 'lipidGenes', package='SalMotifDB')
fam <- lipidGenes[KEGG_pathway_name == 'Fatty acid metabolism', gene_id]
fam
#> [1] "gene23858:106563087" "gene2736:106568363" "gene9236:106602820"
Run the MotifSearchGene function for Atlantic salmon. Note that the call below sets RepeatsMasked="Yes" and Conserved="Yes".
resultList <- MotifSearchGene(fam,mySpecies="Atlantic salmon",topTFs=10,dbType="DNA-seq",RepeatsMasked="Yes", Conserved = "Yes")
MotifSearchGene enrichedMotifs
enrichedMotifs <- resultList$enrichedMotifs
knitr::kable(enrichedMotifs[1:10,], format="html") %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center") %>% scroll_box(width = "80%", height = "200px")
ID | Motif ID | TF | Central TF | Central motif ID | Database source | Occurence in geneset | Occurence in genome | P-value |
---|---|---|---|---|---|---|---|---|
303 | Zfp128_M0438_1.02_CISBP | ZFP128 | ZFP128 | 303 | CISBP | 1 | 210 | 0.0102170 |
321 | F52B5.7_M0454_1.02_CISBP | F52B5.7 | F52B5.7 | 321 | CISBP | 1 | 770 | 0.0374623 |
328 | ztf-14_M0461_1.02_CISBP | ZTF-14 | GLI | 7908 | CISBP | 1 | 1386 | 0.0674321 |
381 | KDM2B_M0607_1.02_CISBP | KDM2B | KDM2B | 377 | CISBP | 1 | 325 | 0.0158120 |
389 | mll_M0615_1.02_CISBP | MLL | KDM2B | 377 | CISBP | 1 | 718 | 0.0349324 |
717 | Gm98_M1419_1.02_CISBP | GM98 | ENSDARG00000078676 | 715 | CISBP | 1 | 758 | 0.0368785 |
994 | ZIC2_M4148_1.02_CISBP | ZIC2 | GLI | 7908 | CISBP | 1 | 1462 | 0.0711297 |
997 | RELA_M4444_1.02_CISBP | RELA | RELA | 997 | CISBP | 1 | 1075 | 0.0523013 |
1004 | SMARCC2_M4527_1.02_CISBP | SMARCC2 | SMARCC2 | 1004 | CISBP | 1 | 1441 | 0.0701080 |
1041 | GKLF_KLF4_M01588_TRANSFAC | GKLF_(KLF4) | SP4 | 3370 | TRANSFAC | 1 | 2520 | 0.1226039 |
This table gives details about enrichment p-values (using the hypergeometric distribution). The columns are explained in the EnrichMotif section above.
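Because several motifs can belong to the same cluster, it can help to keep only the best-scoring motif per Central motif ID. The sketch below does this with base R on five rows copied from the table above. The simplified column names are assumptions for the example, and this filtering step is a suggestion rather than something the package does for you.

```r
enriched <- data.frame(
  tf               = c("ZFP128", "KDM2B", "MLL", "ZTF-14", "ZIC2"),
  central_tf       = c("ZFP128", "KDM2B", "KDM2B", "GLI", "GLI"),
  central_motif_id = c(303, 377, 377, 7908, 7908),
  p_value          = c(0.0102170, 0.0158120, 0.0349324, 0.0674321, 0.0711297),
  stringsAsFactors = FALSE
)

# Sort by p-value, then keep the first (lowest p-value) row of each cluster
enriched <- enriched[order(enriched$p_value), ]
best_per_cluster <- enriched[!duplicated(enriched$central_motif_id), ]
print(best_per_cluster)
```

Here MLL and ZIC2 are dropped because KDM2B and ZTF-14 represent the same clusters with lower p-values.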
associatedgenes <- resultList$associatedGenes
knitr::kable(associatedgenes[1:10, -c("Gene strand", "Motif strand")], format="html") %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center") %>% scroll_box(width = "80%", height = "200px")
Gene ID | Chromosome | Biotype | Product | Motif ID | TF | Score | Distance | Motif length | Start | Stop | Matched sequence | Occurence in genome | Occruence in gene set | P-value | Source database | Central TF | Central motif ID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | Zfp128_M0438_1.02_CISBP | ZFP128 | 11.209900 | 50 | 8 | 231414 | 231421 | GGGGTACG | 210 | 1 | 0.0102170 | CISBP | ZFP128 | 303 |
gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | F52B5.7_M0454_1.02_CISBP | F52B5.7 | 13.086400 | 6 | 8 | 231458 | 231465 | CTATGGGG | 770 | 1 | 0.0374623 | CISBP | F52B5.7 | 321 |
gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | ztf-14_M0461_1.02_CISBP | ZTF-14 | 11.609800 | -3 | 10 | 231465 | 231474 | TAGGTGGTCC | 1386 | 1 | 0.0674321 | CISBP | GLI | 7908 |
gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | KDM2B_M0607_1.02_CISBP | KDM2B | 5.602410 | 49 | 9 | 231414 | 231422 | CCGTACCCC | 325 | 1 | 0.0158120 | CISBP | KDM2B | 377 |
gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | mll_M0615_1.02_CISBP | MLL | 5.578310 | 48 | 10 | 231414 | 231423 | CCCGTACCCC | 718 | 1 | 0.0349324 | CISBP | KDM2B | 377 |
gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | Gm98_M1419_1.02_CISBP | GM98 | 9.445780 | 49 | 10 | 231413 | 231422 | CCGTACCCCG | 758 | 1 | 0.0368785 | CISBP | ENSDARG00000078676 | 715 |
gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | ZIC2_M4148_1.02_CISBP | ZIC2 | 9.924420 | 10 | 9 | 231453 | 231461 | GGGGTGCTG | 1462 | 1 | 0.0711297 | CISBP | GLI | 7908 |
gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | RELA_M4444_1.02_CISBP | RELA | 9.172840 | 37 | 14 | 231421 | 231434 | GGGGACCCTCCCAG | 1075 | 1 | 0.0523013 | CISBP | RELA | 997 |
gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | SMARCC2_M4527_1.02_CISBP | SMARCC2 | -0.518519 | 44 | 21 | 231407 | 231427 | GGTCCCCGTACCCCGGCATCC | 1441 | 1 | 0.0701080 | CISBP | SMARCC2 | 1004 |
gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | GKLF_KLF4_M01588_TRANSFAC | GKLF_(KLF4) | 10.276900 | 36 | 12 | 231424 | 231435 | GACCCTCCCAGG | 2520 | 1 | 0.1226039 | TRANSFAC | SP4 | 3370 |
This table gives details about all individual motif matches to promoters of genes in the list. The first five columns describe each gene and are obtained from the gene annotation file in the NCBI database. The columns are explained in the EnrichMotif section above.
networkEdges and networkNodes
edges <- resultList$networkEdges
nodes <- resultList$networkNodes
setDT(edges)
edges[, size := .N, by='tf']
nodes <- as.data.table(unique(left_join(nodes,edges[,.(tf,size,count)], by=c("label"="tf"))))
nodes[is.na(size), size := 0]
nodes[ , shape := ifelse(size == 0, "circle", no="triangle")]
nodes[ , color := ifelse(size == 0, "#5CFFFF", no="#F25FD0")]
nodes[size != 0, value := count]
library(visNetwork)
visNetwork(nodes=nodes, edges=edges, physics = FALSE, width = "90%") %>% visIgraphLayout() %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>% visNodes(scaling = list(label = list(enabled = T))) %>% visNetwork::visHierarchicalLayout(parentCentralization = FALSE, enabled = FALSE)
The network diagram is explained in the EnrichMotif section above.
The SearchPredictedTFs function allows you to search predicted transcription factors for a single gene or a set of genes for a selected salmonid species. The TFs are predicted salmonid orthologs with information on the BLAST E-value and the shared domain from the NCBI conserved domain database (CDD).
For more search criteria, please check the help page with ?SearchPredictedTFs.
We will use all 1421 lipid metabolism genes in our dataset.
data( 'lipidGenes', package='SalMotifDB')
fam <- lipidGenes$gene_id
length(fam)
#> [1] 1421
Run the SearchPredictedTFs function for Atlantic salmon to check whether each gene is predicted to be a TF.
resultList <- SearchPredictedTFs(fam,mySpecies="Atlantic salmon")
predictedTFs <- resultList$predictedTFs
knitr::kable(predictedTFs[,-c("Gene strand")], format="html") %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center") %>% scroll_box(width = "80%", height = "200px")
Gene ID | Gene start | Gene end | Chromosome | Gene name | Product | TF | E-value | Bitscore | Predicted as TF | CDD ID | Accession ID | TF superfamily | CD name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gene11716:106605079 | 49591271 | 49593238 | ssa05 | LOC106605079 | nuclear receptor subfamily 0 group B member 2-like | DAX1 | 0.000 | 171.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene12735:106606082 | 76683067 | 76704849 | ssa05 | LOC106606082 | retinoic acid receptor RXR-beta-A-like | NHR-154 | 0.000 | 209.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene13206:100502556 | 13986970 | 14001140 | ssa06 | srebf1 | sterol regulatory element binding transcription factor 1 | SREBF1 | 0.000 | 74.6 | TRUE | 238036 | cd00083 | cl00081 | HLH |
gene13245:106606621 | 15334374 | 15352974 | ssa06 | LOC106606621 | sterol regulatory element-binding protein 1-like | SREBF1 | 0.000 | 79.3 | TRUE | 238036 | cd00083 | cl00081 | HLH |
gene13982:106607132 | 34091778 | 34110167 | ssa06 | LOC106607132 | sterol regulatory element-binding protein 2-like | SREBP-2 | 0.000 | 80.5 | TRUE | 238036 | cd00083 | cl00081 | HLH |
gene21542:106560693 | 59507339 | 59583473 | ssa10 | LOC106560693 | forkhead box protein O1-A-like | FOXO | 0.000 | 169.0 | TRUE | 238016 | cd00059 | cl00061 | FH |
gene21752:100136415 | 72569414 | 72629719 | ssa10 | LOC100136415 | peroxisome proliferator-activated receptor alpha | PPARA | 0.018 | 34.3 | FALSE | 100121 | cd06224 | cl02520 | REM |
gene22207:106561422 | 94620065 | 94657017 | ssa10 | LOC106561422 | bile acid receptor-like | NHR-168 | 0.000 | 202.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene22778:106561932 | 8588464 | 8627339 | ssa11 | LOC106561932 | oxysterols receptor LXR-alpha-like | NR1I3 | 0.000 | 202.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene24141:106563228 | 69740722 | 69889585 | ssa11 | LOC106563228 | retinoic acid receptor RXR-alpha-A | RXRA | 0.000 | 210.0 | FALSE | 132726 | cd06157 | cl11397 | NR_LBD |
gene2444:106565394 | 123978637 | 124148799 | ssa01 | LOC106565394 | retinoic acid receptor RXR-alpha-A-like | RXRA | 0.000 | 210.0 | FALSE | 132726 | cd06157 | cl11397 | NR_LBD |
gene27586:106566498 | 10457571 | 10477633 | ssa13 | LOC106566498 | ETS domain-containing protein Elk-1-like | ELK3 | 0.000 | 170.0 | TRUE | 197710 | smart00413 | cl02599 | ETS |
gene27782:106566754 | 19141373 | 19162258 | ssa13 | LOC106566754 | peroxisome proliferator-activated receptor gamma-like | EIP75B | 0.000 | 194.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene28097:106566994 | 32579139 | 32594159 | ssa13 | LOC106566994 | peroxisome proliferator-activated receptor delta-like | NHR-177 | 0.000 | 194.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene30035:106568909 | 14472434 | 14522880 | ssa14 | LOC106568909 | retinoic acid receptor RXR-gamma-A-like | RARA_RXRA | 0.000 | 209.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene31190:106569952 | 59250891 | 59280126 | ssa14 | LOC106569952 | retinoic acid receptor RXR-beta-A-like | NHR-204 | 0.000 | 204.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene31376:106570251 | 66147552 | 66148931 | ssa14 | LOC106570251 | nuclear receptor subfamily 0 group B member 2-like | DAX1 | 0.000 | 171.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene33072:100136393 | 59828382 | 59834982 | ssa15 | pparg | peroxisome proliferator activated receptor gamma | PPAR | 0.000 | 195.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene34435:106573302 | 16397223 | 16422055 | ssa16 | LOC106573302 | bile acid receptor-like | NHR-168 | 0.000 | 200.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene42704:106581554 | 82231554 | 82238345 | ssa20 | LOC106581554 | forkhead box protein O1-A-like | FOXO1 | 0.000 | 165.0 | TRUE | 238016 | cd00059 | cl00061 | FH |
gene4331:100195621 | 31210986 | 31212681 | ssa02 | nr0b2 | nuclear receptor subfamily 0 group B member 2 | DAX1 | 0.000 | 173.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene45110:106583866 | 63366547 | 63382160 | ssa22 | LOC106583866 | peroxisome proliferator-activated receptor gamma-like | SMP_016180 | 0.000 | 151.0 | FALSE | 143512 | cd06916 | cl02596 | NR_DBD_like |
gene45770:106584489 | 34814178 | 34883599 | ssa23 | LOC106584489 | peroxisome proliferator-activated receptor alpha-like | CBR-SEX-1 | 0.004 | 36.6 | FALSE | 100121 | cd06224 | cl02520 | REM |
gene47012:106585776 | 33054252 | 33097663 | ssa24 | rxra | retinoid X receptor alpha | NHR-204 | 0.000 | 208.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene48321:100270809 | 8673140 | 8697083 | ssa26 | nr1h3 | nuclear receptor subfamily 1 group H member 3 | NR1I3 | 0.000 | 201.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene49859:106588394 | 10323956 | 10344753 | ssa27 | LOC106588394 | retinoic acid receptor RXR-beta-A | NHR-154 | 0.000 | 213.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene6246:106599103 | 15076653 | 15099785 | ssa03 | LOC106599103 | retinoic acid receptor RXR-gamma-A-like | NHR-3 | 0.000 | 209.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
gene68452:106596748 | 2690 | 3817 | NW_012361756.1 | LOC106596748 | retinoic acid receptor RXR-gamma-A-like | NHR-232 | 0.000 | 126.0 | FALSE | 143512 | cd06916 | cl02596 | NR_DBD_like |
gene7578:100502557 | 65072943 | 65087099 | ssa03 | srebf2 | sterol regulatory element binding transcription factor 2 | SREBF1 | 0.000 | 80.2 | TRUE | 238036 | cd00083 | cl00081 | HLH |
gene8199:100502556 | 82955243 | 82969092 | ssa03 | srebf1 | sterol regulatory element binding transcription factor 1 | SREBF1 | 0.000 | 74.6 | TRUE | 238036 | cd00083 | cl00081 | HLH |
gene9442:106602890 | 30736626 | 30824370 | ssa04 | LOC106602890 | forkhead box protein O1-A-like | DAF-16 | 0.000 | 170.0 | TRUE | 238016 | cd00059 | cl00061 | FH |
The search returns 31 candidate genes, 25 of which are marked TRUE in the Predicted as TF column. As you can see in the table, some important TFs in lipid metabolism such as SREBF1, SREBF2, LXR, RXRA and PPAR are predicted.
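A quick way to summarise such a result is to cross-tabulate the conserved domain name against the Predicted as TF flag. The sketch below rebuilds those two columns from the 31 rows shown above, using counts tallied by hand from the table and simplified column names. With a live result you would tabulate the corresponding columns of predictedTFs instead.

```r
# CD name and "Predicted as TF" flag for the 31 rows shown above
# (row order is not preserved; only the counts matter here)
group_sizes <- c(16, 2, 5, 3, 1, 2, 2)
predicted <- data.frame(
  cd_name = rep(c("NR_LBD", "NR_LBD", "HLH", "FH", "ETS", "REM", "NR_DBD_like"),
                times = group_sizes),
  predicted_as_tf = rep(c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE),
                        times = group_sizes),
  stringsAsFactors = FALSE
)

# Rows: conserved domain; columns: FALSE / TRUE
summary_table <- table(predicted$cd_name, predicted$predicted_as_tf)
print(summary_table)

n_true <- sum(predicted$predicted_as_tf)
print(n_true)
```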