1 Introduction

Recently developed genome resources in salmonid fish provide tools for studying the genomics underlying a wide range of properties, including life history trait variation in the wild, economically important traits in aquaculture and the evolutionary consequences of whole genome duplications. Although genome assemblies now exist for a number of salmonid species, the lack of regulatory annotations is holding back our mechanistic understanding of how genetic variation in non-coding regulatory regions affects gene expression and downstream phenotypes.

Eukaryotic regulatory regions are characterized based on a set of discovered transcription factor binding sites (TFBSs), which can be represented as sequence patterns with varying degrees of degeneracy.
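To make the idea of a degenerate sequence pattern concrete, here is a minimal base R sketch that turns an IUPAC consensus into a regular expression and locates its matches in a sequence. The helper names (iupac_to_regex, find_sites) and the consensus CANNTG are illustrative assumptions; they are not part of SalMotifDB, which works with motif matrices rather than consensus strings.

```r
# IUPAC nucleotide codes mapped to regular-expression character classes.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
           K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
           H = "[ACT]", V = "[ACG]", N = "[ACGT]")

# Translate a degenerate consensus (e.g. "CANNTG") into a regex.
# Hypothetical helper, not a SalMotifDB function.
iupac_to_regex <- function(consensus) {
  paste(iupac[strsplit(consensus, "")[[1]]], collapse = "")
}

# Return the 1-based start positions of all matches, including overlapping
# ones (the lookahead makes each match zero-length).
find_sites <- function(consensus, sequence) {
  pattern <- paste0("(?=", iupac_to_regex(consensus), ")")
  hits <- gregexpr(pattern, sequence, perl = TRUE)[[1]]
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

iupac_to_regex("CANNTG")
find_sites("CANNTG", "GGTCACATGGCAGCTGA")
```

The more N-like codes a consensus contains, the more degenerate it is and the more often it matches by chance, which is why motif matches are usually assessed statistically rather than taken at face value.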

The SalMotifDB package and its associated web interface are designed as computational tools for the analysis of transcription factors (TFs) and their cis-regulatory binding sites in five salmonid genomes. SalMotifDB integrates TF-binding site information for non-redundant DNA patterns (motifs) assembled from a large number of metazoan motif databases.

So far, this package contains a set of integrated functions. All functions access a public database, so you need internet access to use this tool. Alternatively, you can use the SalMotifDB web interface.

2 SalMotifDB functions

The following tutorial demonstrates how to query the current version of the SalMotifDB database using the functions available in the SalMotifDB package, how to interpret the results, and how to use some of the methods associated with these functions.

3 Data: lipidGenes

To explore the basic functions of SalMotifDB, we’ll use the lipidGenes dataset shipped with the SalMotifDB R package. This dataset contains 1421 genes grouped into different KEGG pathways. The dataset is taken from the study "Life-stage-associated remodelling of lipid metabolism regulation in Atlantic salmon".

3.1 EnrichMotif

This enrichment function allows you to input a list of genes (e.g. differentially expressed genes) and identify motifs that match the promoters of these genes more often than expected by chance. The tool reports enrichment p-values (using the hypergeometric distribution), as well as details about all individual motif matches to the promoters of the genes in the list. For more details, please see the help page with ?EnrichMotif.

3.1.1 Load the example dataset.

We will extract the 17 genes annotated to the “Fatty acid elongation, Biosynthesis of unsaturated fatty acids, Fatty acid metabolism” KEGG pathway entry.

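library(data.table)  # the subsetting syntax below and as.data.table() require data.table
data('lipidGenes', package = 'SalMotifDB')
# Keep the gene IDs annotated to the combined fatty acid pathway entry
fam <- as.data.table(lipidGenes[KEGG_pathway_name == 'Fatty acid elongation, Biosynthesis of unsaturated fatty acids, Fatty acid metabolism', gene_id])
fam$V1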
#>  [1] "gene1745:100136433"  "gene18253:106611334" "gene18572:106611586"
#>  [4] "gene23520:106562668" "gene25123:106564267" "gene38708:106577593"
#>  [7] "gene39757:106578705" "gene40749:100192341" "gene43488:106582347"
#> [10] "gene46138:100286513" "gene46147:100286513" "gene47639:100196500"
#> [13] "gene49196:106587973" "gene51359:100192340" "gene51681:106590135"
#> [16] "gene52193:106590656" "gene909:106603767"
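Each identifier in this output consists of two parts separated by a colon. The following base R sketch splits a few of the identifiers shown above and flags numeric parts that occur under more than one gene label. The column names are illustrative assumptions, and the meaning attributed to the two parts is an observation from the example output rather than documented behaviour.

```r
# A few identifiers copied from the output above.
gene_ids <- c("gene1745:100136433", "gene46138:100286513",
              "gene46147:100286513", "gene909:106603767")

# Split each identifier at the colon into its two parts.
parts <- do.call(rbind, strsplit(gene_ids, ":", fixed = TRUE))
genes <- data.frame(gene_label = parts[, 1],
                    numeric_id = parts[, 2],
                    stringsAsFactors = FALSE)

# Flag numeric IDs that are shared by more than one gene label.
genes$shared_id <- genes$numeric_id %in%
  genes$numeric_id[duplicated(genes$numeric_id)]
genes
```

In this example gene46138 and gene46147 carry the same numeric part, so a gene list can contain several labels that point to one numeric ID.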

3.1.2 Run EnrichMotif function

Using the gene list prepared above, run the EnrichMotif function for Atlantic salmon, using motifs predicted in upstream promoter sequences.

resultList <- EnrichMotif(myFile=fam,mySpecies="Atlantic salmon",topTFs=10,dbType="DNA-seq",RepeatsMasked="Yes", Conserved = "Yes")

The result is a list with 5 elements:

enrichedMotifs: the most enriched motifs for the gene set
associatedGenes: the target genes of each motif
networkEdges: edges of the network linking the selected genes via shared TFs
networkNodes: nodes of the network linking the selected genes via shared TFs
resTableBed: the result table in BED format

Let’s walk through the objects one by one.

A) enrichedMotifs

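library(kableExtra)  # provides kable_styling(), scroll_box() and the %>% pipe
enrichedmotifs <- resultList$enrichedMotifs
# Show the first 10 rows as a scrollable HTML table
knitr::kable(enrichedmotifs[1:10], format="html")  %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center")  %>% scroll_box(width = "80%", height = "200px")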

ID	Motif ID	TF	Central TF	Central motif ID	Database source	Occurence in geneset	Occurence in genome	P-value
25	six4_M01374_TRANSFAC	SIX4	SIX4	25	TRANSFAC	2	1038	0.0333661
29	Irx6_M01377_TRANSFAC	IRX6	IRX5	10884	TRANSFAC	1	779	0.2069358
62	Irx2_M01405_TRANSFAC	IRX2	IRX5	10884	TRANSFAC	1	775	0.2059727
68	IRX4_M01410_TRANSFAC	IRX4	IRX5	10884	TRANSFAC	1	879	0.2306992
125	Tcfe3_M0174_1.02_CISBP	TCFE3	BHLHB2	9398	CISBP	2	816	0.0212285
156	Mitf_M0208_1.02_CISBP	MITF	BHLHB2	9398	CISBP	2	800	0.0204468
235	Obox6_M01445_TRANSFAC	OBOX6	OTX2	10925	TRANSFAC	1	427	0.1183637
331	Elf-1_M00110_TRANSFAC	ELF-1	GRH	909	TRANSFAC	2	855	0.0231879
369	lin54_M0593_1.02_CISBP	LIN54	LIN54	370	CISBP	1	831	0.2193679
370	LIN54_M0594_1.02_CISBP	LIN54	LIN54	370	CISBP	1	650	0.1753803

This table gives the enrichment p-value of each motif (computed with the hypergeometric distribution), together with its occurrence counts in the gene set and in the genome. In the rendered HTML, the table can be scrolled to see all rows and columns. Let’s go through the columns in detail.

Motif ID: the motif ID as given by the source database. Each Motif ID is made up of three parts delimited by underscores.

For example, Six-3_M01358_TRANSFAC.
- Six-3: the TF name that binds to the binding site
- M01358: motif id from the source database
- TRANSFAC: source database
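The three parts can be separated programmatically. The sketch below is a hypothetical helper (parse_motif_id is not a SalMotifDB function) that assumes the pattern seen in the example output: the source motif ID starts with "M" followed by digits, may carry a version suffix such as "_1.02", and the database name is the last underscore-delimited token. IDs that do not follow this pattern yield NA.

```r
# Split a SalMotifDB-style motif ID into TF name, source motif ID and database.
# The regular expression encodes an assumption about the naming scheme.
parse_motif_id <- function(motif_id) {
  pattern <- "^(.*?)_(M\\d+(?:_[0-9.]+)?)_([^_]+)$"
  m <- regmatches(motif_id, regexec(pattern, motif_id, perl = TRUE))
  out <- t(vapply(m, function(x) {
    # x holds the full match plus three capture groups when the ID matches
    if (length(x) == 4) x[2:4] else rep(NA_character_, 3)
  }, character(3), USE.NAMES = FALSE))
  colnames(out) <- c("tf", "source_motif_id", "database")
  as.data.frame(out, stringsAsFactors = FALSE)
}

parse_motif_id(c("Six-3_M01358_TRANSFAC",
                 "Tcfe3_M0174_1.02_CISBP",
                 "GKLF_KLF4_M01588_TRANSFAC"))
```

Splitting naively on every underscore would break on IDs such as Tcfe3_M0174_1.02_CISBP or GKLF_KLF4_M01588_TRANSFAC, which is why the sketch anchors on the motif ID token instead.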

TF: The name of the transcription factor that binds to the motif.

Central TF: To reduce motif redundancy, we clustered our motif collections from different sources. We first clustered motifs within each database and then clustered the central motifs of these database-specific clusters across databases. A central motif is the motif with the highest similarity to the other motifs in its cluster, as calculated by matrix-clustering. Each cluster is represented by one non-redundant central motif. This column shows the representative TF for each cluster.

Central motif ID: The SalMotifDB motif ID of the central motif.

Database source: The source database from which the motif was obtained.

Occurrence in geneset: The motif occurrence in your test geneset.

Occurrence in genome: The motif occurrence in the genome.

P-value: Hypergeometric distribution p-value.
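As an illustration of how a hypergeometric enrichment p-value can be computed from these counts, here is a minimal sketch using phyper from base R. The function name enrichment_p and the background size of 50000 genes are assumptions made for the example; the exact background and counting rules used by SalMotifDB are not stated here, so the result is not expected to reproduce the P-value column.

```r
# Upper-tail hypergeometric test: probability of observing at least
# `hits_in_set` genes with the motif when drawing `set_size` genes at random
# from a background of `genome_size` genes, `hits_in_genome` of which have it.
enrichment_p <- function(hits_in_set, set_size, hits_in_genome, genome_size) {
  phyper(hits_in_set - 1,                 # P(X > k - 1) equals P(X >= k)
         hits_in_genome,                  # "successes" in the background
         genome_size - hits_in_genome,    # "failures" in the background
         set_size,                        # number of genes drawn
         lower.tail = FALSE)
}

# Counts for a motif found in 2 of 17 genes and 1038 genes genome-wide;
# genome_size = 50000 is a hypothetical background size.
p <- enrichment_p(hits_in_set = 2, set_size = 17,
                  hits_in_genome = 1038, genome_size = 50000)
p
```

Subtracting 1 from the observed count matters: without it, the call would return the probability of strictly more hits than observed and understate the p-value.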

B) associatedGenes

associatedgenes <- resultList$associatedGenes
knitr::kable(associatedgenes[1:10, -c("Gene strand", "Motif strand")], format="html")  %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center")  %>% scroll_box(width = "80%", height = "200px")

Gene ID	Chromosome	Biotype	Product	Motif ID	TF	Score	Distance	Motif length	Start	Stop	Matched sequence	Occurence in genome	Occruence in gene set	P-value	Source database	Central TF	Central motif ID
gene38708:106577593	ssa18	protein_coding	3-hydroxyacyl-CoA dehydratase 4	six4_M01374_TRANSFAC	SIX4	11.40740	-396	17	49456794	49456810	CACTCTGACACCTCAGG	1038	2	0.0333661	TRANSFAC	SIX4	25
gene38708:106577593	ssa18	protein_coding	3-hydroxyacyl-CoA dehydratase 4	Irx6_M01377_TRANSFAC	IRX6	11.61730	-307	17	49456705	49456721	GACCTACATGTTGTGCT	779	1	0.2069358	TRANSFAC	IRX5	10884
gene38708:106577593	ssa18	protein_coding	3-hydroxyacyl-CoA dehydratase 4	Irx2_M01405_TRANSFAC	IRX2	11.22220	-306	17	49456704	49456720	GCACAACATGTAGGTCA	775	1	0.2059727	TRANSFAC	IRX5	10884
gene38708:106577593	ssa18	protein_coding	3-hydroxyacyl-CoA dehydratase 4	IRX4_M01410_TRANSFAC	IRX4	10.92680	-306	17	49456704	49456720	GCACAACATGTAGGTCA	879	1	0.2306992	TRANSFAC	IRX5	10884
gene38708:106577593	ssa18	protein_coding	3-hydroxyacyl-CoA dehydratase 4	IRX4_M01410_TRANSFAC	IRX4	10.96340	-307	17	49456705	49456721	GACCTACATGTTGTGCT	879	1	0.2306992	TRANSFAC	IRX5	10884
gene38708:106577593	ssa18	protein_coding	3-hydroxyacyl-CoA dehydratase 4	Tcfe3_M0174_1.02_CISBP	TCFE3	11.06170	-301	10	49456699	49456708	GGTCACATGG	816	2	0.0212285	CISBP	BHLHB2	9398
gene38708:106577593	ssa18	protein_coding	3-hydroxyacyl-CoA dehydratase 4	Mitf_M0208_1.02_CISBP	MITF	11.23460	-301	10	49456699	49456708	GGTCACATGG	800	2	0.0204468	CISBP	BHLHB2	9398
gene38708:106577593	ssa18	protein_coding	3-hydroxyacyl-CoA dehydratase 4	Obox6_M01445_TRANSFAC	OBOX6	11.46910	-130	15	49456528	49456542	AAAAACAGATTATGG	427	1	0.1183637	TRANSFAC	OTX2	10925
gene38708:106577593	ssa18	protein_coding	3-hydroxyacyl-CoA dehydratase 4	Elf-1_M00110_TRANSFAC	ELF-1	11.71900	-140	16	49456538	49456553	GGGTATGGTTTAAAAA	855	2	0.0231879	TRANSFAC	GRH	909
gene38708:106577593	ssa18	protein_coding	3-hydroxyacyl-CoA dehydratase 4	lin54_M0593_1.02_CISBP	LIN54	8.17241	-80	8	49456478	49456485	GTTTGAAT	831	1	0.2193679	CISBP	LIN54	370

This table provides details about all individual motif matches to the promoters of the genes in the list (in the rendered HTML, scroll the table to see all rows and columns). The first five columns describe each gene and are obtained from the NCBI database annotation. Motif ID, TF, Central TF and Central motif ID are explained above; some of the remaining columns are explained below.

Score: the log-odds score (log base 2) computed by FIMO, the tool used to scan for motif matches.

Distance: the distance of the motif from the transcription start site

Start: Motif start location in the genome

Stop: Motif stop location in the genome

Matched sequence: Actual matched sequence in the promoter sequence

P-value: statistical threshold used by FIMO (<0.0001)
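The Score column described above is a base 2 log-odds score. The sketch below shows the general idea with a hypothetical 4-position motif and a uniform background of 0.25 per base. It is a simplified illustration, not FIMO's implementation, which also handles pseudocounts and user-supplied background models.

```r
# Hypothetical motif: rows are bases, columns are motif positions.
# matrix() fills column by column, so position 1 prefers A, 2 prefers C, etc.
pwm <- matrix(c(0.7, 0.1, 0.1, 0.1,
                0.1, 0.7, 0.1, 0.1,
                0.1, 0.1, 0.7, 0.1,
                0.1, 0.1, 0.1, 0.7),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

# Sum over positions of log2(motif probability / background probability).
log_odds_score <- function(site, pwm, background = 0.25) {
  bases <- strsplit(site, "")[[1]]
  stopifnot(length(bases) == ncol(pwm))
  probs <- pwm[cbind(match(bases, rownames(pwm)), seq_along(bases))]
  sum(log2(probs / background))
}

log_odds_score("ACGT", pwm)  # best possible site for this motif
log_odds_score("TTTT", pwm)  # mostly disfavoured bases
```

A site made up mostly of disfavoured bases gets a negative score, which is how a match in the table can have a score below zero.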

C) networkEdges and networkNodes

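library(data.table)  # setDT() and the := syntax
library(dplyr)       # left_join()

edges <-  resultList$networkEdges
nodes <-  resultList$networkNodes

# Count the number of edges of each TF
setDT(edges)
edges[, size := .N, by='tf']

# Attach the TF sizes to the nodes; nodes that are not TFs get size 0
nodes <- as.data.table(unique(left_join(nodes,edges[,.(tf,size,count)], by=c("label"="tf"))))
nodes[is.na(size), size := 0]

# Draw genes (size 0) as circles and TFs as triangles, with different colours
nodes[ , shape := ifelse(size == 0, "circle", no="triangle")]
nodes[ , color := ifelse(size == 0, "#5CFFFF", no="#F25FD0")]
nodes[size != 0, value := count]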

library(visNetwork)
visNetwork(nodes=nodes, edges=edges, physics = FALSE, width = "90%") %>% visIgraphLayout() %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>% visNodes(scaling = list(label = list(enabled = T))) %>% visNetwork::visHierarchicalLayout(parentCentralization = FALSE, enabled = FALSE)

We provide a method to visualize the relationship between genes and their associated TFs. The network visualization above is prepared from the top 10 enriched motifs sorted by p-value. The network diagram is interactive: you can click on a gene (blue circle) or a TF (red triangle) to see its relationship with other nodes.
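For readers who want to build a comparable network from their own gene-TF pairs, the following base R sketch constructs node and edge tables in the layout visNetwork expects (an id column for nodes, from and to columns for edges). The gene and TF names are placeholders, and the tables are not the networkNodes and networkEdges objects returned by SalMotifDB.

```r
# Hypothetical gene-TF pairs, e.g. derived from a table of motif matches.
pairs <- unique(data.frame(
  gene = c("geneA", "geneA", "geneB"),
  tf   = c("TF1", "TF2", "TF1"),
  stringsAsFactors = FALSE
))

# Nodes: one row per distinct gene or TF, with a group used for styling.
gene_labels <- unique(pairs$gene)
tf_labels   <- unique(pairs$tf)
nodes <- data.frame(
  id    = seq_len(length(gene_labels) + length(tf_labels)),
  label = c(gene_labels, tf_labels),
  group = rep(c("gene", "TF"), c(length(gene_labels), length(tf_labels))),
  stringsAsFactors = FALSE
)

# Edges: connect each TF to the gene whose promoter it matches.
edges <- data.frame(
  from = nodes$id[match(pairs$tf, nodes$label)],
  to   = nodes$id[match(pairs$gene, nodes$label)]
)

# Number of target genes per TF, usable as a node size.
targets_per_tf <- table(pairs$tf)
targets_per_tf
```

Tables of this shape can be passed to visNetwork(nodes = nodes, edges = edges) once the visNetwork package is loaded.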

3.2 MotifSearchPosition

The position-based search tool allows you to specify a genomic region of interest and retrieve details about all motif matches to the promoters of genes located in that region.

3.2.1 Run MotifSearchPosition function

Run the MotifSearchPosition function for Atlantic salmon with the settings shown in the call below (RepeatsMasked="Yes" and Conserved="Yes"). Let’s find motifs between positions 1 and 1,000,000 bp on chromosome ssa01.

resultList <- MotifSearchPosition(coordinate="ssa01:1-1000000",mySpecies="Atlantic salmon",topTFs=10,dbType="DNA-seq",RepeatsMasked="Yes", Conserved = "Yes")
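The coordinate argument uses the form chromosome:start-end. Before sending a query it can be useful to validate such a string locally. The helper below is a hypothetical sketch in base R and is not part of SalMotifDB; how the package itself parses or validates the argument is not covered here.

```r
# Parse a "chromosome:start-end" string into its components and
# fail early on malformed input.
parse_coordinate <- function(coordinate) {
  m <- regmatches(coordinate,
                  regexec("^([^:]+):([0-9]+)-([0-9]+)$", coordinate))[[1]]
  if (length(m) != 4) {
    stop("expected 'chromosome:start-end', got: ", coordinate)
  }
  region <- list(chromosome = m[2],
                 start = as.numeric(m[3]),
                 end = as.numeric(m[4]))
  if (region$start > region$end) {
    stop("start must not exceed end")
  }
  region
}

region <- parse_coordinate("ssa01:1-1000000")
region$chromosome
region$end - region$start + 1  # width of the region in base pairs
```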

The result contains only one element: identifiedMotifs.

identifiedMotifs <- resultList$identifiedMotifs
knitr::kable(identifiedMotifs[1:10, -c("Gene strand", "Motif strand")], format="html")  %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center")  %>% scroll_box(width = "80%")

ID	Gene ID	Chromosome	Gene start	Gene end	Score	Distance from TSS	Motif length	Matched sequence	P-value	Motif start	Motif stop	Occurence in genome	Motif ID	TF	Motif source
303	gene2:106601976	ssa01	228330	231471	11.209900	50	8	GGGGTACG	8.88e-05	231414	231421	210	Zfp128_M0438_1.02_CISBP	ZFP128	CISBP
321	gene2:106601976	ssa01	228330	231471	13.086400	6	8	CTATGGGG	1.67e-05	231458	231465	770	F52B5.7_M0454_1.02_CISBP	F52B5.7	CISBP
328	gene2:106601976	ssa01	228330	231471	11.609800	-3	10	TAGGTGGTCC	1.60e-05	231465	231474	1386	ztf-14_M0461_1.02_CISBP	ZTF-14	CISBP
381	gene2:106601976	ssa01	228330	231471	5.602410	49	9	CCGTACCCC	5.00e-05	231414	231422	325	KDM2B_M0607_1.02_CISBP	KDM2B	CISBP
389	gene2:106601976	ssa01	228330	231471	5.578310	48	10	CCCGTACCCC	3.68e-05	231414	231423	718	mll_M0615_1.02_CISBP	MLL	CISBP
717	gene2:106601976	ssa01	228330	231471	9.445780	49	10	CCGTACCCCG	2.37e-05	231413	231422	758	Gm98_M1419_1.02_CISBP	GM98	CISBP
994	gene2:106601976	ssa01	228330	231471	9.924420	10	9	GGGGTGCTG	8.43e-05	231453	231461	1462	ZIC2_M4148_1.02_CISBP	ZIC2	CISBP
997	gene2:106601976	ssa01	228330	231471	9.172840	37	14	GGGGACCCTCCCAG	8.63e-05	231421	231434	1075	RELA_M4444_1.02_CISBP	RELA	CISBP
1004	gene2:106601976	ssa01	228330	231471	-0.518519	44	21	GGTCCCCGTACCCCGGCATCC	8.48e-05	231407	231427	1441	SMARCC2_M4527_1.02_CISBP	SMARCC2	CISBP
1041	gene2:106601976	ssa01	228330	231471	10.276900	36	12	GACCCTCCCAGG	6.39e-05	231424	231435	2520	GKLF_KLF4_M01588_TRANSFAC	GKLF_(KLF4)	TRANSFAC
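The coordinate columns of this table are related to each other, which can be checked directly. The sketch below copies three rows from the output above into a data frame with illustrative column names. In these rows the motif length equals stop minus start plus one, and the distance from the TSS equals the gene end minus the motif stop. The second relation is an observation from this example (consistent with a gene whose TSS lies at its end coordinate) and should not be read as the documented definition for all genes.

```r
# Three rows copied from the identifiedMotifs output (gene2:106601976).
hits <- data.frame(
  gene_end     = c(231471, 231471, 231471),
  motif_start  = c(231414, 231458, 231465),
  motif_stop   = c(231421, 231465, 231474),
  motif_length = c(8, 8, 10),
  distance     = c(50, 6, -3)
)

# Motif length follows from the inclusive start and stop coordinates.
length_ok <- all(hits$motif_stop - hits$motif_start + 1 == hits$motif_length)

# In these rows, the distance equals gene end minus motif stop.
distance_ok <- all(hits$gene_end - hits$motif_stop == hits$distance)

c(length_ok = length_ok, distance_ok = distance_ok)
```

Checks like these are a quick way to confirm how coordinates are defined before using them to extract sequences or intersect with other annotations.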

3.3 MotifSearchGene

The MotifSearchGene function allows you to input a list of genes and identify motifs that match the promoters of these genes more often than expected by chance. The tool reports enrichment p-values (using the hypergeometric distribution), as well as details about all individual motif matches to the promoters of the genes in the list.

For more search criteria, please see the help page with ?MotifSearchGene.

3.3.1 Load the example dataset.

We will extract the 3 genes annotated to the Fatty acid metabolism KEGG pathway.

data( 'lipidGenes', package='SalMotifDB')
fam <- lipidGenes[KEGG_pathway_name == 'Fatty acid metabolism', gene_id]
fam
#> [1] "gene23858:106563087" "gene2736:106568363"  "gene9236:106602820"

3.3.2 Run MotifSearchGene function

Run the MotifSearchGene function for Atlantic salmon with the settings shown in the call below (RepeatsMasked="Yes" and Conserved="Yes").

resultList <- MotifSearchGene(fam,mySpecies="Atlantic salmon",topTFs=10,dbType="DNA-seq",RepeatsMasked="Yes", Conserved = "Yes")

MotifSearchGene enrichedMotifs

enrichedMotifs <- resultList$enrichedMotifs
knitr::kable(enrichedMotifs[1:10,], format="html")  %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center")  %>% scroll_box(width = "80%", height = "200px")

ID	Motif ID	TF	Central TF	Central motif ID	Database source	Occurence in geneset	Occurence in genome	P-value
303	Zfp128_M0438_1.02_CISBP	ZFP128	ZFP128	303	CISBP	1	210	0.0102170
321	F52B5.7_M0454_1.02_CISBP	F52B5.7	F52B5.7	321	CISBP	1	770	0.0374623
328	ztf-14_M0461_1.02_CISBP	ZTF-14	GLI	7908	CISBP	1	1386	0.0674321
381	KDM2B_M0607_1.02_CISBP	KDM2B	KDM2B	377	CISBP	1	325	0.0158120
389	mll_M0615_1.02_CISBP	MLL	KDM2B	377	CISBP	1	718	0.0349324
717	Gm98_M1419_1.02_CISBP	GM98	ENSDARG00000078676	715	CISBP	1	758	0.0368785
994	ZIC2_M4148_1.02_CISBP	ZIC2	GLI	7908	CISBP	1	1462	0.0711297
997	RELA_M4444_1.02_CISBP	RELA	RELA	997	CISBP	1	1075	0.0523013
1004	SMARCC2_M4527_1.02_CISBP	SMARCC2	SMARCC2	1004	CISBP	1	1441	0.0701080
1041	GKLF_KLF4_M01588_TRANSFAC	GKLF_(KLF4)	SP4	3370	TRANSFAC	1	2520	0.1226039

This table gives the enrichment p-value of each motif (using the hypergeometric distribution). The columns are explained for the enrichedMotifs element in the EnrichMotif section (3.1.2) above.

associatedgenes <- resultList$associatedGenes
knitr::kable(associatedgenes[1:10, -c("Gene strand", "Motif strand")], format="html")  %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center")  %>% scroll_box(width = "80%", height = "200px")

Gene ID	Chromosome	Biotype	Product	Motif ID	TF	Score	Distance	Motif length	Start	Stop	Matched sequence	Occurence in genome	Occruence in gene set	P-value	Source database	Central TF	Central motif ID
gene2:106601976	ssa01	protein_coding	fibroblast growth factor receptor 3-like	Zfp128_M0438_1.02_CISBP	ZFP128	11.209900	50	8	231414	231421	GGGGTACG	210	1	0.0102170	CISBP	ZFP128	303
gene2:106601976	ssa01	protein_coding	fibroblast growth factor receptor 3-like	F52B5.7_M0454_1.02_CISBP	F52B5.7	13.086400	6	8	231458	231465	CTATGGGG	770	1	0.0374623	CISBP	F52B5.7	321
gene2:106601976	ssa01	protein_coding	fibroblast growth factor receptor 3-like	ztf-14_M0461_1.02_CISBP	ZTF-14	11.609800	-3	10	231465	231474	TAGGTGGTCC	1386	1	0.0674321	CISBP	GLI	7908
gene2:106601976	ssa01	protein_coding	fibroblast growth factor receptor 3-like	KDM2B_M0607_1.02_CISBP	KDM2B	5.602410	49	9	231414	231422	CCGTACCCC	325	1	0.0158120	CISBP	KDM2B	377
gene2:106601976	ssa01	protein_coding	fibroblast growth factor receptor 3-like	mll_M0615_1.02_CISBP	MLL	5.578310	48	10	231414	231423	CCCGTACCCC	718	1	0.0349324	CISBP	KDM2B	377
gene2:106601976	ssa01	protein_coding	fibroblast growth factor receptor 3-like	Gm98_M1419_1.02_CISBP	GM98	9.445780	49	10	231413	231422	CCGTACCCCG	758	1	0.0368785	CISBP	ENSDARG00000078676	715
gene2:106601976	ssa01	protein_coding	fibroblast growth factor receptor 3-like	ZIC2_M4148_1.02_CISBP	ZIC2	9.924420	10	9	231453	231461	GGGGTGCTG	1462	1	0.0711297	CISBP	GLI	7908
gene2:106601976	ssa01	protein_coding	fibroblast growth factor receptor 3-like	RELA_M4444_1.02_CISBP	RELA	9.172840	37	14	231421	231434	GGGGACCCTCCCAG	1075	1	0.0523013	CISBP	RELA	997
gene2:106601976	ssa01	protein_coding	fibroblast growth factor receptor 3-like	SMARCC2_M4527_1.02_CISBP	SMARCC2	-0.518519	44	21	231407	231427	GGTCCCCGTACCCCGGCATCC	1441	1	0.0701080	CISBP	SMARCC2	1004
gene2:106601976	ssa01	protein_coding	fibroblast growth factor receptor 3-like	GKLF_KLF4_M01588_TRANSFAC	GKLF_(KLF4)	10.276900	36	12	231424	231435	GACCCTCCCAGG	2520	1	0.1226039	TRANSFAC	SP4	3370

This table gives details about all individual motif matches to the promoters of the genes in the list. The first five columns describe each gene and are obtained from the gene annotation file in the NCBI database. The columns are explained for the associatedGenes element in the EnrichMotif section (3.1.2) above.

networkEdges and networkNodes

edges <-  resultList$networkEdges
nodes <-  resultList$networkNodes

setDT(edges)
edges[, size := .N, by='tf']

nodes <- as.data.table(unique(left_join(nodes,edges[,.(tf,size,count)], by=c("label"="tf"))))
nodes[is.na(size), size := 0]

# Draw genes (size 0) as circles and TFs as triangles
nodes[ , shape := ifelse(size == 0, "circle", no="triangle")]
nodes[ , color := ifelse(size == 0, "#5CFFFF", no="#F25FD0")]
nodes[size != 0, value := count]

library(visNetwork)
visNetwork(nodes=nodes, edges=edges, physics = FALSE, width = "90%") %>% visIgraphLayout() %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>% visNodes(scaling = list(label = list(enabled = T))) %>% visNetwork::visHierarchicalLayout(parentCentralization = FALSE, enabled = FALSE)

The network diagram is explained in the EnrichMotif section (3.1.2) above.

3.4 SearchPredictedTFs

The SearchPredictedTFs function allows you to search for predicted transcription factors among a single gene or a set of genes in a selected salmonid species. The TFs are predicted salmonid orthologs, reported with their BLAST E-value and the shared domain from the NCBI Conserved Domain Database (CDD).

For more search criteria, please see the help page with ?SearchPredictedTFs.

3.4.1 Load the example dataset.

We will use all 1421 lipid metabolism genes in our dataset.

data( 'lipidGenes', package='SalMotifDB')
fam <- lipidGenes$gene_id
length(fam)
#> [1] 1421

3.4.2 Run SearchPredictedTFs function

Run SearchPredictedTFs function for Atlantic salmon species to check if a gene is predicted as TF.

resultList <- SearchPredictedTFs(fam,mySpecies="Atlantic salmon")

predictedTFs <- resultList$predictedTFs
knitr::kable(predictedTFs[,-c("Gene strand")], format="html")  %>% kable_styling(bootstrap_options = c("striped"), font_size = 10, full_width = T, position = "center")  %>% scroll_box(width = "80%", height = "200px")

Gene ID	Gene start	Gene end	Chromosome	Gene name	Product	TF	E-value	Bitscore	Predicted as TF	CDD ID	Accession ID	TF superfamily	CD name
gene11716:106605079	49591271	49593238	ssa05	LOC106605079	nuclear receptor subfamily 0 group B member 2-like	DAX1	0.000	171.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene12735:106606082	76683067	76704849	ssa05	LOC106606082	retinoic acid receptor RXR-beta-A-like	NHR-154	0.000	209.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene13206:100502556	13986970	14001140	ssa06	srebf1	sterol regulatory element binding transcription factor 1	SREBF1	0.000	74.6	TRUE	238036	cd00083	cl00081	HLH
gene13245:106606621	15334374	15352974	ssa06	LOC106606621	sterol regulatory element-binding protein 1-like	SREBF1	0.000	79.3	TRUE	238036	cd00083	cl00081	HLH
gene13982:106607132	34091778	34110167	ssa06	LOC106607132	sterol regulatory element-binding protein 2-like	SREBP-2	0.000	80.5	TRUE	238036	cd00083	cl00081	HLH
gene21542:106560693	59507339	59583473	ssa10	LOC106560693	forkhead box protein O1-A-like	FOXO	0.000	169.0	TRUE	238016	cd00059	cl00061	FH
gene21752:100136415	72569414	72629719	ssa10	LOC100136415	peroxisome proliferator-activated receptor alpha	PPARA	0.018	34.3	FALSE	100121	cd06224	cl02520	REM
gene22207:106561422	94620065	94657017	ssa10	LOC106561422	bile acid receptor-like	NHR-168	0.000	202.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene22778:106561932	8588464	8627339	ssa11	LOC106561932	oxysterols receptor LXR-alpha-like	NR1I3	0.000	202.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene24141:106563228	69740722	69889585	ssa11	LOC106563228	retinoic acid receptor RXR-alpha-A	RXRA	0.000	210.0	FALSE	132726	cd06157	cl11397	NR_LBD
gene2444:106565394	123978637	124148799	ssa01	LOC106565394	retinoic acid receptor RXR-alpha-A-like	RXRA	0.000	210.0	FALSE	132726	cd06157	cl11397	NR_LBD
gene27586:106566498	10457571	10477633	ssa13	LOC106566498	ETS domain-containing protein Elk-1-like	ELK3	0.000	170.0	TRUE	197710	smart00413	cl02599	ETS
gene27782:106566754	19141373	19162258	ssa13	LOC106566754	peroxisome proliferator-activated receptor gamma-like	EIP75B	0.000	194.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene28097:106566994	32579139	32594159	ssa13	LOC106566994	peroxisome proliferator-activated receptor delta-like	NHR-177	0.000	194.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene30035:106568909	14472434	14522880	ssa14	LOC106568909	retinoic acid receptor RXR-gamma-A-like	RARA_RXRA	0.000	209.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene31190:106569952	59250891	59280126	ssa14	LOC106569952	retinoic acid receptor RXR-beta-A-like	NHR-204	0.000	204.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene31376:106570251	66147552	66148931	ssa14	LOC106570251	nuclear receptor subfamily 0 group B member 2-like	DAX1	0.000	171.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene33072:100136393	59828382	59834982	ssa15	pparg	peroxisome proliferator activated receptor gamma	PPAR	0.000	195.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene34435:106573302	16397223	16422055	ssa16	LOC106573302	bile acid receptor-like	NHR-168	0.000	200.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene42704:106581554	82231554	82238345	ssa20	LOC106581554	forkhead box protein O1-A-like	FOXO1	0.000	165.0	TRUE	238016	cd00059	cl00061	FH
gene4331:100195621	31210986	31212681	ssa02	nr0b2	nuclear receptor subfamily 0 group B member 2	DAX1	0.000	173.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene45110:106583866	63366547	63382160	ssa22	LOC106583866	peroxisome proliferator-activated receptor gamma-like	SMP_016180	0.000	151.0	FALSE	143512	cd06916	cl02596	NR_DBD_like
gene45770:106584489	34814178	34883599	ssa23	LOC106584489	peroxisome proliferator-activated receptor alpha-like	CBR-SEX-1	0.004	36.6	FALSE	100121	cd06224	cl02520	REM
gene47012:106585776	33054252	33097663	ssa24	rxra	retinoid X receptor alpha	NHR-204	0.000	208.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene48321:100270809	8673140	8697083	ssa26	nr1h3	nuclear receptor subfamily 1 group H member 3	NR1I3	0.000	201.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene49859:106588394	10323956	10344753	ssa27	LOC106588394	retinoic acid receptor RXR-beta-A	NHR-154	0.000	213.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene6246:106599103	15076653	15099785	ssa03	LOC106599103	retinoic acid receptor RXR-gamma-A-like	NHR-3	0.000	209.0	TRUE	132726	cd06157	cl11397	NR_LBD
gene68452:106596748	2690	3817	NW_012361756.1	LOC106596748	retinoic acid receptor RXR-gamma-A-like	NHR-232	0.000	126.0	FALSE	143512	cd06916	cl02596	NR_DBD_like
gene7578:100502557	65072943	65087099	ssa03	srebf2	sterol regulatory element binding transcription factor 2	SREBF1	0.000	80.2	TRUE	238036	cd00083	cl00081	HLH
gene8199:100502556	82955243	82969092	ssa03	srebf1	sterol regulatory element binding transcription factor 1	SREBF1	0.000	74.6	TRUE	238036	cd00083	cl00081	HLH
gene9442:106602890	30736626	30824370	ssa04	LOC106602890	forkhead box protein O1-A-like	DAF-16	0.000	170.0	TRUE	238016	cd00059	cl00061	FH

The search returns 31 genes with a TF match, 25 of which are flagged TRUE in the Predicted as TF column. As the table shows, several important regulators of lipid metabolism, such as SREBF1, SREBF2, LXR, RXRA and PPAR, are among them.
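The Predicted as TF column can be used to separate confident predictions from the remaining hits. The sketch below rebuilds five rows of the table above as a plain data frame with illustrative column names and summarizes them. With the real result, the same steps would be applied to resultList$predictedTFs, whose exact column names should be checked first.

```r
# Five rows copied from the predictedTFs output above.
predicted <- data.frame(
  gene_name       = c("srebf1", "srebf2", "pparg", "rxra", "LOC100136415"),
  tf              = c("SREBF1", "SREBF1", "PPAR", "NHR-204", "PPARA"),
  e_value         = c(0, 0, 0, 0, 0.018),
  predicted_as_tf = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# How many hits are flagged as predicted TFs?
n_predicted <- sum(predicted$predicted_as_tf)
n_predicted

# Keep only the flagged hits and list their gene names.
confident <- predicted[predicted$predicted_as_tf, ]
confident$gene_name
```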

SalMotifDB quick start guide

24 August 2019

Package