1 Introduction

Recently developed genome resources for salmonid fish provide tools for studying the genomics underlying a wide range of properties, including life history trait variation in the wild, economically important traits in aquaculture, and the evolutionary consequences of whole genome duplications. Although genome assemblies now exist for a number of salmonid species, the lack of regulatory annotations is holding back our mechanistic understanding of how genetic variation in non-coding regulatory regions affects gene expression and downstream phenotypes.

Eukaryotic regulatory regions are characterized based on a set of discovered transcription factor binding sites (TFBSs), which can be represented as sequence patterns with varying degrees of degeneracy.

The SalMotifDB package and its associated web interface are designed as a computational tool for the analysis of transcription factors (TFs) and their cis-regulatory binding sites in five salmonid genomes. SalMotifDB integrates TF-binding site information for non-redundant DNA patterns (motifs) assembled from a large number of metazoan motif databases.

The package currently provides a set of integrated functions. All of them query a public database, so you need internet access to use them. Alternatively, you can use the SalMotifDB web interface.

2 SalMotifDB functions

The following tutorial demonstrates how to query the current version of the SalMotifDB database using the functions available in the SalMotifDB package, how to interpret the results, and how to use some of the methods associated with these functions.

3 Data: lipidGenes

To explore the main functions of SalMotifDB, we'll use the lipidGenes dataset shipped with the SalMotifDB R package. This dataset contains 1421 genes grouped into different KEGG pathways. It is taken from "Life-stage-associated remodelling of lipid metabolism regulation in Atlantic salmon".

3.1 EnrichMotif

This enrichment function allows you to input a list of genes (e.g. differentially expressed genes) and identify motifs that match the promoters of these genes more often than expected by chance. The tool reports enrichment p-values (using the hypergeometric distribution), as well as details about all individual motif matches to promoters of genes in the list. For more details, see the help page with ?EnrichMotif.

3.1.1 Load the example dataset.

We will extract the 17 genes annotated with the KEGG pathway name "Fatty acid elongation, Biosynthesis of unsaturated fatty acids, Fatty acid metabolism".

library(SalMotifDB)
library(data.table)  # lipidGenes is subset below with data.table syntax
data('lipidGenes', package = 'SalMotifDB')
fam <- as.data.table(lipidGenes[KEGG_pathway_name == 'Fatty acid elongation, Biosynthesis of unsaturated fatty acids, Fatty acid metabolism', gene_id])
fam$V1
#>  [1] "gene1745:100136433"  "gene18253:106611334" "gene18572:106611586"
#>  [4] "gene23520:106562668" "gene25123:106564267" "gene38708:106577593"
#>  [7] "gene39757:106578705" "gene40749:100192341" "gene43488:106582347"
#> [10] "gene46138:100286513" "gene46147:100286513" "gene47639:100196500"
#> [13] "gene49196:106587973" "gene51359:100192340" "gene51681:106590135"
#> [16] "gene52193:106590656" "gene909:106603767"

3.1.2 Run EnrichMotif function

Using the gene list prepared above, run the EnrichMotif function for Atlantic salmon, using motifs predicted in upstream promoter sequences.

resultList <- EnrichMotif(myFile=fam,mySpecies="Atlantic salmon",topTFs=10,dbType="DNA-seq",RepeatsMasked="Yes", Conserved = "Yes")

The result is a list with 5 elements:

  • enrichedMotifs: the most enriched motifs for the gene set

  • associatedGenes: the target genes for each motif

  • networkEdges: edges of the network linking the selected genes through shared TFs

  • networkNodes: nodes of the network linking the selected genes through shared TFs

  • resTableBed: the result table in BED format

Let’s walk through the objects one by one.


A) enrichedMotifs

library(kableExtra)  # provides kable_styling(), scroll_box() and the %>% pipe
enrichedmotifs <- resultList$enrichedMotifs
knitr::kable(enrichedmotifs[1:10], format = "html") %>%
  kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = TRUE, position = "center") %>%
  scroll_box(width = "80%", height = "200px")
| ID | Motif ID | TF | Central TF | Central motif ID | Database source | Occurence in geneset | Occurence in genome | P-value |
|---|---|---|---|---|---|---|---|---|
| 25 | six4_M01374_TRANSFAC | SIX4 | SIX4 | 25 | TRANSFAC | 2 | 1038 | 0.0333661 |
| 29 | Irx6_M01377_TRANSFAC | IRX6 | IRX5 | 10884 | TRANSFAC | 1 | 779 | 0.2069358 |
| 62 | Irx2_M01405_TRANSFAC | IRX2 | IRX5 | 10884 | TRANSFAC | 1 | 775 | 0.2059727 |
| 68 | IRX4_M01410_TRANSFAC | IRX4 | IRX5 | 10884 | TRANSFAC | 1 | 879 | 0.2306992 |
| 125 | Tcfe3_M0174_1.02_CISBP | TCFE3 | BHLHB2 | 9398 | CISBP | 2 | 816 | 0.0212285 |
| 156 | Mitf_M0208_1.02_CISBP | MITF | BHLHB2 | 9398 | CISBP | 2 | 800 | 0.0204468 |
| 235 | Obox6_M01445_TRANSFAC | OBOX6 | OTX2 | 10925 | TRANSFAC | 1 | 427 | 0.1183637 |
| 331 | Elf-1_M00110_TRANSFAC | ELF-1 | GRH | 909 | TRANSFAC | 2 | 855 | 0.0231879 |
| 369 | lin54_M0593_1.02_CISBP | LIN54 | LIN54 | 370 | CISBP | 1 | 831 | 0.2193679 |
| 370 | LIN54_M0594_1.02_CISBP | LIN54 | LIN54 | 370 | CISBP | 1 | 650 | 0.1753803 |

This table lists, for each motif, how often it occurs in the gene set and in the genome, together with the enrichment p-value (from the hypergeometric distribution). Let's go through the columns in detail.

Motif ID: the motif identifier, built from the source database entry. Each Motif ID is made up of three parts delimited by underscores.

  • For example, Six-3_M01358_TRANSFAC.
    • Six-3: the TF name that binds to the binding site
    • M01358: motif id from the source database
    • TRANSFAC: source database
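The three-part layout can be unpacked programmatically. The sketch below is only an illustration: `parse_motif_id` is a hypothetical helper, not part of SalMotifDB, and it assumes that the database-specific identifier starts with a token of the form `M` followed by digits, as in every example shown in this tutorial. That assumption matters because some IDs contain extra underscores (for example `Tcfe3_M0174_1.02_CISBP` or `GKLF_KLF4_M01588_TRANSFAC`), so a plain three-way split is not enough.

```r
# Hypothetical helper (not a SalMotifDB function): split a Motif ID into
# TF name, source-database motif id, and source database.
# Assumption: the motif id begins with a token such as "M01358".
parse_motif_id <- function(id) {
  parts <- strsplit(id, "_", fixed = TRUE)[[1]]
  n <- length(parts)
  acc <- which(grepl("^M[0-9]+$", parts))[1]  # first token that looks like a motif id
  if (is.na(acc) || acc == 1 || acc == n) {
    stop("unexpected Motif ID layout: ", id)
  }
  data.frame(
    tf       = paste(parts[seq_len(acc - 1)], collapse = "_"),
    motif    = paste(parts[acc:(n - 1)], collapse = "_"),
    database = parts[n],
    stringsAsFactors = FALSE
  )
}

ids <- c("Six-3_M01358_TRANSFAC",
         "Tcfe3_M0174_1.02_CISBP",
         "GKLF_KLF4_M01588_TRANSFAC")
parsed <- do.call(rbind, lapply(ids, parse_motif_id))
print(parsed)
```

The last token is always taken as the database, and everything before the first `M`-number token is treated as the TF name, which keeps multi-part TF names intact.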

TF: The name of the transcription factor that binds to the motif.

Central TF: To reduce motif redundancy, we clustered our motif collections from different sources. We first clustered motifs within each database and then clustered the central motifs (i.e. the motif with the highest similarity to the other motifs in the cluster, as calculated by matrix-clustering) of these database-specific clusters across databases. Each cluster is represented by one non-redundant central motif. This column shows the representative TF for each cluster.

Central motif ID: The motif ID for the Central TF in the SalMotifDB.
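To make the idea of a central motif concrete, here is a toy sketch. It assumes a made-up pairwise similarity matrix for four motifs in one cluster and picks the motif with the highest mean similarity to the other members. The similarity values and the use of the mean are assumptions for illustration; SalMotifDB itself relies on the matrix-clustering tool, whose exact scoring is not reproduced here.

```r
# Toy pairwise similarities for four motifs in one cluster (made-up values).
sim <- matrix(
  c(1.0, 0.9, 0.8, 0.4,
    0.9, 1.0, 0.6, 0.5,
    0.8, 0.6, 1.0, 0.3,
    0.4, 0.5, 0.3, 1.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("motif", 1:4), paste0("motif", 1:4))
)

# Mean similarity of each motif to the other cluster members
# (the diagonal, i.e. self-similarity, is excluded).
diag(sim) <- NA
mean_sim <- rowMeans(sim, na.rm = TRUE)

# The central motif is the one most similar to the rest of the cluster.
central <- names(which.max(mean_sim))
print(round(mean_sim, 3))
print(central)
```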

Database source: The source database from which the motif was obtained.

Occurrence in geneset: The motif occurrence in your test geneset.

Occurrence in genome: The motif occurrence in the genome.

P-value: Hypergeometric distribution p-value.
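As a rough illustration of how such a p-value can be computed, the sketch below uses base R's `phyper()` to obtain the probability of observing at least `k` motif-bearing genes in a gene set of size `n`. The function name `enrichment_pvalue`, the upper-tail convention, and all numbers (in particular the background size `N`) are assumptions chosen for the example; they are not taken from SalMotifDB and will not reproduce the values in the table.

```r
# Upper-tail hypergeometric test: P(X >= k) when drawing n genes from a
# background of N genes, K of which carry the motif.
enrichment_pvalue <- function(k, n, K, N) {
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Hypothetical numbers for illustration only
N <- 50000  # background genes with a scanned promoter (assumed)
K <- 1000   # background genes with at least one match to the motif
n <- 17     # genes in the test set
k <- 2      # test-set genes with at least one match

p <- enrichment_pvalue(k, n, K, N)
print(p)
```

A small p-value indicates that the motif occurs in the gene set more often than expected from its genome-wide frequency.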


B) associatedGenes

associatedgenes <- resultList$associatedGenes
knitr::kable(associatedgenes[1:10, -c("Gene strand", "Motif strand")], format="html")  %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center")  %>% scroll_box(width = "80%", height = "200px")
| Gene ID | Chromosome | Biotype | Product | Motif ID | TF | Score | Distance | Motif length | Start | Stop | Matched sequence | Occurence in genome | Occruence in gene set | P-value | Source database | Central TF | Central motif ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | six4_M01374_TRANSFAC | SIX4 | 11.40740 | -396 | 17 | 49456794 | 49456810 | CACTCTGACACCTCAGG | 1038 | 2 | 0.0333661 | TRANSFAC | SIX4 | 25 |
| gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | Irx6_M01377_TRANSFAC | IRX6 | 11.61730 | -307 | 17 | 49456705 | 49456721 | GACCTACATGTTGTGCT | 779 | 1 | 0.2069358 | TRANSFAC | IRX5 | 10884 |
| gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | Irx2_M01405_TRANSFAC | IRX2 | 11.22220 | -306 | 17 | 49456704 | 49456720 | GCACAACATGTAGGTCA | 775 | 1 | 0.2059727 | TRANSFAC | IRX5 | 10884 |
| gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | IRX4_M01410_TRANSFAC | IRX4 | 10.92680 | -306 | 17 | 49456704 | 49456720 | GCACAACATGTAGGTCA | 879 | 1 | 0.2306992 | TRANSFAC | IRX5 | 10884 |
| gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | IRX4_M01410_TRANSFAC | IRX4 | 10.96340 | -307 | 17 | 49456705 | 49456721 | GACCTACATGTTGTGCT | 879 | 1 | 0.2306992 | TRANSFAC | IRX5 | 10884 |
| gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | Tcfe3_M0174_1.02_CISBP | TCFE3 | 11.06170 | -301 | 10 | 49456699 | 49456708 | GGTCACATGG | 816 | 2 | 0.0212285 | CISBP | BHLHB2 | 9398 |
| gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | Mitf_M0208_1.02_CISBP | MITF | 11.23460 | -301 | 10 | 49456699 | 49456708 | GGTCACATGG | 800 | 2 | 0.0204468 | CISBP | BHLHB2 | 9398 |
| gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | Obox6_M01445_TRANSFAC | OBOX6 | 11.46910 | -130 | 15 | 49456528 | 49456542 | AAAAACAGATTATGG | 427 | 1 | 0.1183637 | TRANSFAC | OTX2 | 10925 |
| gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | Elf-1_M00110_TRANSFAC | ELF-1 | 11.71900 | -140 | 16 | 49456538 | 49456553 | GGGTATGGTTTAAAAA | 855 | 2 | 0.0231879 | TRANSFAC | GRH | 909 |
| gene38708:106577593 | ssa18 | protein_coding | 3-hydroxyacyl-CoA dehydratase 4 | lin54_M0593_1.02_CISBP | LIN54 | 8.17241 | -80 | 8 | 49456478 | 49456485 | GTTTGAAT | 831 | 1 | 0.2193679 | CISBP | LIN54 | 370 |

This table provides details about all individual motif matches to the promoters of genes in the list. The leading gene columns (Gene ID through Product) come from the NCBI database annotation. Motif ID, TF, Central TF and Central motif ID are explained above. The remaining columns are described below.

Score: the log-odds score (log base 2) computed by FIMO, the tool used to scan for motif matches.

Distance: the distance of the motif from the transcription start site

Start: Motif start location in the genome

Stop: Motif stop location in the genome

Matched sequence: Actual matched sequence in the promoter sequence

P-value: statistical threshold used by FIMO (<0.0001)
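To give a feel for what a base-2 log-odds score measures, the following sketch scores a short sequence against a toy position probability matrix with a uniform background. Everything here is an assumption for illustration: the matrix, the background frequencies, and the helper name `log_odds_score` are made up, and the calculation omits details of FIMO's actual scoring such as pseudocounts.

```r
# Toy position probability matrix: rows are bases, columns are motif positions.
ppm <- matrix(
  c(0.500, 0.125, 0.125,   # A
    0.250, 0.500, 0.125,   # C
    0.125, 0.250, 0.500,   # G
    0.125, 0.125, 0.250),  # T
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)
background <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)  # assumed uniform

# Sum over positions of log2(P(base | motif position) / P(base | background)).
log_odds_score <- function(seq, ppm, background) {
  bases <- strsplit(seq, "", fixed = TRUE)[[1]]
  stopifnot(length(bases) == ncol(ppm), all(bases %in% rownames(ppm)))
  probs <- ppm[cbind(match(bases, rownames(ppm)), seq_along(bases))]
  sum(log2(probs / background[bases]))
}

best  <- log_odds_score("ACG", ppm, background)  # most likely base at each position
worst <- log_odds_score("TTT", ppm, background)
print(c(best = best, worst = worst))
```

Positive scores mean the sequence looks more like the motif than like background; negative scores mean the opposite.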


C) networkEdges and networkNodes

library(data.table)  # setDT() and := below
library(dplyr)       # left_join() below
edges <- resultList$networkEdges
nodes <- resultList$networkNodes

setDT(edges)
edges[, size := .N, by='tf']

nodes <- as.data.table(unique(left_join(nodes,edges[,.(tf,size,count)], by=c("label"="tf"))))
nodes[is.na(size), size := 0]

nodes[ , shape := ifelse(size == 0, "circle", no="triangle")]
nodes[ , color := ifelse(size == 0, "#5CFFFF", no="#F25FD0")]
nodes[size != 0, value := count]

library(visNetwork)
visNetwork(nodes = nodes, edges = edges, physics = FALSE, width = "90%") %>%
  visIgraphLayout() %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
  visNodes(scaling = list(label = list(enabled = TRUE))) %>%
  visNetwork::visHierarchicalLayout(parentCentralization = FALSE, enabled = FALSE)

The code above visualizes the relationship between genes and their associated TFs. The network is built from the top 10 enriched motifs sorted by p-value. The diagram is interactive: you can click on a gene (light-blue circle) or a TF (pink triangle) to see its relationship with other nodes.

3.2 MotifSearchPosition

The position based search tool allows you to specify a genomic region of interest and retrieve details about all motif matches to promoters of genes located in that region.

3.2.1 Run MotifSearchPosition function

Run the MotifSearchPosition function for Atlantic salmon with the RepeatsMasked and Conserved options both set to "Yes". Let's find motifs between base pairs 1 and 1000000 on chromosome ssa01.

resultList <- MotifSearchPosition(coordinate="ssa01:1-1000000",mySpecies="Atlantic salmon",topTFs=10,dbType="DNA-seq",RepeatsMasked="Yes", Conserved = "Yes")
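Before sending a query, it can help to validate the coordinate string locally. The sketch below assumes the format is `chromosome:start-end`, as inferred from the call above; `parse_coordinate` is a hypothetical helper and not part of SalMotifDB.

```r
# Hypothetical helper: check and split a "chromosome:start-end" string.
parse_coordinate <- function(coordinate) {
  pattern <- "^([^:]+):([0-9]+)-([0-9]+)$"
  m <- regmatches(coordinate, regexec(pattern, coordinate))[[1]]
  if (length(m) != 4) {
    stop("expected 'chromosome:start-end', got: ", coordinate)
  }
  region <- list(chromosome = m[2],
                 start = as.numeric(m[3]),
                 end = as.numeric(m[4]))
  if (region$start > region$end) {
    stop("start must not exceed end: ", coordinate)
  }
  region
}

region <- parse_coordinate("ssa01:1-1000000")
width <- region$end - region$start + 1  # region width in base pairs
print(region$chromosome)
print(width)
```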

The result contains only one element: identifiedMotifs.

identifiedMotifs <- resultList$identifiedMotifs
knitr::kable(identifiedMotifs[1:10, -c("Gene strand", "Motif strand")], format="html")  %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center")  %>% scroll_box(width = "80%")
| ID | Gene ID | Chromosome | Gene start | Gene end | Score | Distance from TSS | Motif length | Matched sequence | P-value | Motif start | Motif stop | Occurence in genome | Motif ID | TF | Motif source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 303 | gene2:106601976 | ssa01 | 228330 | 231471 | 11.209900 | 50 | 8 | GGGGTACG | 8.88e-05 | 231414 | 231421 | 210 | Zfp128_M0438_1.02_CISBP | ZFP128 | CISBP |
| 321 | gene2:106601976 | ssa01 | 228330 | 231471 | 13.086400 | 6 | 8 | CTATGGGG | 1.67e-05 | 231458 | 231465 | 770 | F52B5.7_M0454_1.02_CISBP | F52B5.7 | CISBP |
| 328 | gene2:106601976 | ssa01 | 228330 | 231471 | 11.609800 | -3 | 10 | TAGGTGGTCC | 1.60e-05 | 231465 | 231474 | 1386 | ztf-14_M0461_1.02_CISBP | ZTF-14 | CISBP |
| 381 | gene2:106601976 | ssa01 | 228330 | 231471 | 5.602410 | 49 | 9 | CCGTACCCC | 5.00e-05 | 231414 | 231422 | 325 | KDM2B_M0607_1.02_CISBP | KDM2B | CISBP |
| 389 | gene2:106601976 | ssa01 | 228330 | 231471 | 5.578310 | 48 | 10 | CCCGTACCCC | 3.68e-05 | 231414 | 231423 | 718 | mll_M0615_1.02_CISBP | MLL | CISBP |
| 717 | gene2:106601976 | ssa01 | 228330 | 231471 | 9.445780 | 49 | 10 | CCGTACCCCG | 2.37e-05 | 231413 | 231422 | 758 | Gm98_M1419_1.02_CISBP | GM98 | CISBP |
| 994 | gene2:106601976 | ssa01 | 228330 | 231471 | 9.924420 | 10 | 9 | GGGGTGCTG | 8.43e-05 | 231453 | 231461 | 1462 | ZIC2_M4148_1.02_CISBP | ZIC2 | CISBP |
| 997 | gene2:106601976 | ssa01 | 228330 | 231471 | 9.172840 | 37 | 14 | GGGGACCCTCCCAG | 8.63e-05 | 231421 | 231434 | 1075 | RELA_M4444_1.02_CISBP | RELA | CISBP |
| 1004 | gene2:106601976 | ssa01 | 228330 | 231471 | -0.518519 | 44 | 21 | GGTCCCCGTACCCCGGCATCC | 8.48e-05 | 231407 | 231427 | 1441 | SMARCC2_M4527_1.02_CISBP | SMARCC2 | CISBP |
| 1041 | gene2:106601976 | ssa01 | 228330 | 231471 | 10.276900 | 36 | 12 | GACCCTCCCAGG | 6.39e-05 | 231424 | 231435 | 2520 | GKLF_KLF4_M01588_TRANSFAC | GKLF_(KLF4) | TRANSFAC |

3.3 MotifSearchGene

The MotifSearchGene function allows you to input a list of genes and identify motifs that match the promoters of these genes more often than expected by chance. The tool reports enrichment p-values (using the hypergeometric distribution), as well as details about all individual motif matches to promoters of genes in the list.

For more search criteria, see the help page with ?MotifSearchGene.

3.3.1 Load the example dataset.

We will extract the 3 genes annotated with the "Fatty acid metabolism" KEGG pathway.

data( 'lipidGenes', package='SalMotifDB')
fam <- lipidGenes[KEGG_pathway_name == 'Fatty acid metabolism', gene_id]
fam
#> [1] "gene23858:106563087" "gene2736:106568363"  "gene9236:106602820"

3.3.2 Run MotifSearchGene function

Run the MotifSearchGene function for Atlantic salmon with the RepeatsMasked and Conserved options both set to "Yes".

resultList <- MotifSearchGene(fam,mySpecies="Atlantic salmon",topTFs=10,dbType="DNA-seq",RepeatsMasked="Yes", Conserved = "Yes")

MotifSearchGene enrichedMotifs

enrichedMotifs <- resultList$enrichedMotifs
knitr::kable(enrichedMotifs[1:10,], format="html")  %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center")  %>% scroll_box(width = "80%", height = "200px")
| ID | Motif ID | TF | Central TF | Central motif ID | Database source | Occurence in geneset | Occurence in genome | P-value |
|---|---|---|---|---|---|---|---|---|
| 303 | Zfp128_M0438_1.02_CISBP | ZFP128 | ZFP128 | 303 | CISBP | 1 | 210 | 0.0102170 |
| 321 | F52B5.7_M0454_1.02_CISBP | F52B5.7 | F52B5.7 | 321 | CISBP | 1 | 770 | 0.0374623 |
| 328 | ztf-14_M0461_1.02_CISBP | ZTF-14 | GLI | 7908 | CISBP | 1 | 1386 | 0.0674321 |
| 381 | KDM2B_M0607_1.02_CISBP | KDM2B | KDM2B | 377 | CISBP | 1 | 325 | 0.0158120 |
| 389 | mll_M0615_1.02_CISBP | MLL | KDM2B | 377 | CISBP | 1 | 718 | 0.0349324 |
| 717 | Gm98_M1419_1.02_CISBP | GM98 | ENSDARG00000078676 | 715 | CISBP | 1 | 758 | 0.0368785 |
| 994 | ZIC2_M4148_1.02_CISBP | ZIC2 | GLI | 7908 | CISBP | 1 | 1462 | 0.0711297 |
| 997 | RELA_M4444_1.02_CISBP | RELA | RELA | 997 | CISBP | 1 | 1075 | 0.0523013 |
| 1004 | SMARCC2_M4527_1.02_CISBP | SMARCC2 | SMARCC2 | 1004 | CISBP | 1 | 1441 | 0.0701080 |
| 1041 | GKLF_KLF4_M01588_TRANSFAC | GKLF_(KLF4) | SP4 | 3370 | TRANSFAC | 1 | 2520 | 0.1226039 |

This table lists, for each motif, how often it occurs in the gene set and in the genome, together with the enrichment p-value (from the hypergeometric distribution). The columns are explained under enrichedMotifs in the EnrichMotif section above.

associatedgenes <- resultList$associatedGenes
knitr::kable(associatedgenes[1:10, -c("Gene strand", "Motif strand")], format="html")  %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center")  %>% scroll_box(width = "80%", height = "200px")
| Gene ID | Chromosome | Biotype | Product | Motif ID | TF | Score | Distance | Motif length | Start | Stop | Matched sequence | Occurence in genome | Occruence in gene set | P-value | Source database | Central TF | Central motif ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | Zfp128_M0438_1.02_CISBP | ZFP128 | 11.209900 | 50 | 8 | 231414 | 231421 | GGGGTACG | 210 | 1 | 0.0102170 | CISBP | ZFP128 | 303 |
| gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | F52B5.7_M0454_1.02_CISBP | F52B5.7 | 13.086400 | 6 | 8 | 231458 | 231465 | CTATGGGG | 770 | 1 | 0.0374623 | CISBP | F52B5.7 | 321 |
| gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | ztf-14_M0461_1.02_CISBP | ZTF-14 | 11.609800 | -3 | 10 | 231465 | 231474 | TAGGTGGTCC | 1386 | 1 | 0.0674321 | CISBP | GLI | 7908 |
| gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | KDM2B_M0607_1.02_CISBP | KDM2B | 5.602410 | 49 | 9 | 231414 | 231422 | CCGTACCCC | 325 | 1 | 0.0158120 | CISBP | KDM2B | 377 |
| gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | mll_M0615_1.02_CISBP | MLL | 5.578310 | 48 | 10 | 231414 | 231423 | CCCGTACCCC | 718 | 1 | 0.0349324 | CISBP | KDM2B | 377 |
| gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | Gm98_M1419_1.02_CISBP | GM98 | 9.445780 | 49 | 10 | 231413 | 231422 | CCGTACCCCG | 758 | 1 | 0.0368785 | CISBP | ENSDARG00000078676 | 715 |
| gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | ZIC2_M4148_1.02_CISBP | ZIC2 | 9.924420 | 10 | 9 | 231453 | 231461 | GGGGTGCTG | 1462 | 1 | 0.0711297 | CISBP | GLI | 7908 |
| gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | RELA_M4444_1.02_CISBP | RELA | 9.172840 | 37 | 14 | 231421 | 231434 | GGGGACCCTCCCAG | 1075 | 1 | 0.0523013 | CISBP | RELA | 997 |
| gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | SMARCC2_M4527_1.02_CISBP | SMARCC2 | -0.518519 | 44 | 21 | 231407 | 231427 | GGTCCCCGTACCCCGGCATCC | 1441 | 1 | 0.0701080 | CISBP | SMARCC2 | 1004 |
| gene2:106601976 | ssa01 | protein_coding | fibroblast growth factor receptor 3-like | GKLF_KLF4_M01588_TRANSFAC | GKLF_(KLF4) | 10.276900 | 36 | 12 | 231424 | 231435 | GACCCTCCCAGG | 2520 | 1 | 0.1226039 | TRANSFAC | SP4 | 3370 |

This table gives details about all individual motif matches to promoters of genes in the list. The leading gene columns (Gene ID through Product) are obtained from the gene annotation file in the NCBI database. The columns are explained under associatedGenes in the EnrichMotif section above.

networkEdges and networkNodes

edges <-  resultList$networkEdges
nodes <-  resultList$networkNodes

setDT(edges)
edges[, size := .N, by='tf']

nodes <- as.data.table(unique(left_join(nodes,edges[,.(tf,size,count)], by=c("label"="tf"))))
nodes[is.na(size), size := 0]

nodes[ , shape := ifelse(size == 0, "circle", no="triangle")]
nodes[ , color := ifelse(size == 0, "#5CFFFF", no="#F25FD0")]
nodes[size != 0, value := count]

library(visNetwork)
visNetwork(nodes = nodes, edges = edges, physics = FALSE, width = "90%") %>%
  visIgraphLayout() %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
  visNodes(scaling = list(label = list(enabled = TRUE))) %>%
  visNetwork::visHierarchicalLayout(parentCentralization = FALSE, enabled = FALSE)

The network diagram is explained in the EnrichMotif section above.

3.4 SearchPredictedTFs

The SearchPredictedTFs function allows you to search for predicted transcription factors among a single gene or a set of genes in a selected salmonid species. The TFs are predicted salmonid orthologs, reported together with the BLAST E-value and the shared domain from the NCBI Conserved Domain Database (CDD).

For more search criteria, see the help page with ?SearchPredictedTFs.

3.4.1 Load the example dataset.

We will use all 1421 lipid metabolism genes in the dataset.

data( 'lipidGenes', package='SalMotifDB')
fam <- lipidGenes$gene_id
length(fam)
#> [1] 1421

3.4.2 Run SearchPredictedTFs function

Run the SearchPredictedTFs function for Atlantic salmon to check which genes are predicted to be TFs.

resultList <- SearchPredictedTFs(fam,mySpecies="Atlantic salmon")

predictedTFs <- resultList$predictedTFs
knitr::kable(predictedTFs[,-c("Gene strand")], format="html")  %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center")  %>% scroll_box(width = "80%", height = "200px")
| Gene ID | Gene start | Gene end | Chromosome | Gene name | Product | TF | E-value | Bitscore | Predicted as TF | CDD ID | Accession ID | TF superfamily | CD name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gene11716:106605079 | 49591271 | 49593238 | ssa05 | LOC106605079 | nuclear receptor subfamily 0 group B member 2-like | DAX1 | 0.000 | 171.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene12735:106606082 | 76683067 | 76704849 | ssa05 | LOC106606082 | retinoic acid receptor RXR-beta-A-like | NHR-154 | 0.000 | 209.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene13206:100502556 | 13986970 | 14001140 | ssa06 | srebf1 | sterol regulatory element binding transcription factor 1 | SREBF1 | 0.000 | 74.6 | TRUE | 238036 | cd00083 | cl00081 | HLH |
| gene13245:106606621 | 15334374 | 15352974 | ssa06 | LOC106606621 | sterol regulatory element-binding protein 1-like | SREBF1 | 0.000 | 79.3 | TRUE | 238036 | cd00083 | cl00081 | HLH |
| gene13982:106607132 | 34091778 | 34110167 | ssa06 | LOC106607132 | sterol regulatory element-binding protein 2-like | SREBP-2 | 0.000 | 80.5 | TRUE | 238036 | cd00083 | cl00081 | HLH |
| gene21542:106560693 | 59507339 | 59583473 | ssa10 | LOC106560693 | forkhead box protein O1-A-like | FOXO | 0.000 | 169.0 | TRUE | 238016 | cd00059 | cl00061 | FH |
| gene21752:100136415 | 72569414 | 72629719 | ssa10 | LOC100136415 | peroxisome proliferator-activated receptor alpha | PPARA | 0.018 | 34.3 | FALSE | 100121 | cd06224 | cl02520 | REM |
| gene22207:106561422 | 94620065 | 94657017 | ssa10 | LOC106561422 | bile acid receptor-like | NHR-168 | 0.000 | 202.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene22778:106561932 | 8588464 | 8627339 | ssa11 | LOC106561932 | oxysterols receptor LXR-alpha-like | NR1I3 | 0.000 | 202.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene24141:106563228 | 69740722 | 69889585 | ssa11 | LOC106563228 | retinoic acid receptor RXR-alpha-A | RXRA | 0.000 | 210.0 | FALSE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene2444:106565394 | 123978637 | 124148799 | ssa01 | LOC106565394 | retinoic acid receptor RXR-alpha-A-like | RXRA | 0.000 | 210.0 | FALSE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene27586:106566498 | 10457571 | 10477633 | ssa13 | LOC106566498 | ETS domain-containing protein Elk-1-like | ELK3 | 0.000 | 170.0 | TRUE | 197710 | smart00413 | cl02599 | ETS |
| gene27782:106566754 | 19141373 | 19162258 | ssa13 | LOC106566754 | peroxisome proliferator-activated receptor gamma-like | EIP75B | 0.000 | 194.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene28097:106566994 | 32579139 | 32594159 | ssa13 | LOC106566994 | peroxisome proliferator-activated receptor delta-like | NHR-177 | 0.000 | 194.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene30035:106568909 | 14472434 | 14522880 | ssa14 | LOC106568909 | retinoic acid receptor RXR-gamma-A-like | RARA_RXRA | 0.000 | 209.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene31190:106569952 | 59250891 | 59280126 | ssa14 | LOC106569952 | retinoic acid receptor RXR-beta-A-like | NHR-204 | 0.000 | 204.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene31376:106570251 | 66147552 | 66148931 | ssa14 | LOC106570251 | nuclear receptor subfamily 0 group B member 2-like | DAX1 | 0.000 | 171.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene33072:100136393 | 59828382 | 59834982 | ssa15 | pparg | peroxisome proliferator activated receptor gamma | PPAR | 0.000 | 195.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene34435:106573302 | 16397223 | 16422055 | ssa16 | LOC106573302 | bile acid receptor-like | NHR-168 | 0.000 | 200.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene42704:106581554 | 82231554 | 82238345 | ssa20 | LOC106581554 | forkhead box protein O1-A-like | FOXO1 | 0.000 | 165.0 | TRUE | 238016 | cd00059 | cl00061 | FH |
| gene4331:100195621 | 31210986 | 31212681 | ssa02 | nr0b2 | nuclear receptor subfamily 0 group B member 2 | DAX1 | 0.000 | 173.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene45110:106583866 | 63366547 | 63382160 | ssa22 | LOC106583866 | peroxisome proliferator-activated receptor gamma-like | SMP_016180 | 0.000 | 151.0 | FALSE | 143512 | cd06916 | cl02596 | NR_DBD_like |
| gene45770:106584489 | 34814178 | 34883599 | ssa23 | LOC106584489 | peroxisome proliferator-activated receptor alpha-like | CBR-SEX-1 | 0.004 | 36.6 | FALSE | 100121 | cd06224 | cl02520 | REM |
| gene47012:106585776 | 33054252 | 33097663 | ssa24 | rxra | retinoid X receptor alpha | NHR-204 | 0.000 | 208.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene48321:100270809 | 8673140 | 8697083 | ssa26 | nr1h3 | nuclear receptor subfamily 1 group H member 3 | NR1I3 | 0.000 | 201.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene49859:106588394 | 10323956 | 10344753 | ssa27 | LOC106588394 | retinoic acid receptor RXR-beta-A | NHR-154 | 0.000 | 213.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene6246:106599103 | 15076653 | 15099785 | ssa03 | LOC106599103 | retinoic acid receptor RXR-gamma-A-like | NHR-3 | 0.000 | 209.0 | TRUE | 132726 | cd06157 | cl11397 | NR_LBD |
| gene68452:106596748 | 2690 | 3817 | NW_012361756.1 | LOC106596748 | retinoic acid receptor RXR-gamma-A-like | NHR-232 | 0.000 | 126.0 | FALSE | 143512 | cd06916 | cl02596 | NR_DBD_like |
| gene7578:100502557 | 65072943 | 65087099 | ssa03 | srebf2 | sterol regulatory element binding transcription factor 2 | SREBF1 | 0.000 | 80.2 | TRUE | 238036 | cd00083 | cl00081 | HLH |
| gene8199:100502556 | 82955243 | 82969092 | ssa03 | srebf1 | sterol regulatory element binding transcription factor 1 | SREBF1 | 0.000 | 74.6 | TRUE | 238036 | cd00083 | cl00081 | HLH |
| gene9442:106602890 | 30736626 | 30824370 | ssa04 | LOC106602890 | forkhead box protein O1-A-like | DAF-16 | 0.000 | 170.0 | TRUE | 238016 | cd00059 | cl00061 | FH |

The search returns 31 genes, 25 of which are flagged TRUE in the Predicted as TF column. As the table shows, important lipid metabolism TFs such as SREBF1, SREBF2, LXR, RXRA and PPAR are among the hits.
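A result like this is easy to summarise further. The sketch below rebuilds five rows of the table above as a plain data frame (keeping only a few columns), selects the genes flagged in the Predicted as TF column, and counts them per conserved domain. The object names `predicted` and `confirmed` are made up for the example, and in a real session you would work on `resultList$predictedTFs` directly.

```r
# Five rows copied from the table above, reduced to a few columns.
predicted <- data.frame(
  `Gene ID` = c("gene13206:100502556", "gene21752:100136415",
                "gene24141:106563228", "gene33072:100136393",
                "gene21542:106560693"),
  TF = c("SREBF1", "PPARA", "RXRA", "PPAR", "FOXO"),
  `E-value` = c(0, 0.018, 0, 0, 0),
  `Predicted as TF` = c(TRUE, FALSE, FALSE, TRUE, TRUE),
  `CD name` = c("HLH", "REM", "NR_LBD", "NR_LBD", "FH"),
  check.names = FALSE,       # keep the column names with spaces as shown
  stringsAsFactors = FALSE
)

# Keep only the genes flagged as predicted TFs
confirmed <- predicted[predicted[["Predicted as TF"]], ]

# Count the confirmed genes per conserved domain
domain_counts <- table(confirmed[["CD name"]])
print(confirmed[, c("Gene ID", "TF", "CD name")])
print(domain_counts)
```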