How to generate a publication-quality sequence logo in genomics studies using an R package?

A sequence logo is a graphical representation of sequence conservation across aligned nucleotide (DNA/RNA) or amino acid (protein) sequences. Sequence logos are useful in many omics applications, such as ChIP-seq and m6A-seq analyses.
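Here is one way to make "conservation" concrete. In the common bits-based logo, the stack height at a position is its information content, log2(N) minus the Shannon entropy of the letter frequencies there (N = 4 for DNA, 20 for protein), and each letter's height is its frequency times that value. The function below is a minimal base-R sketch of that definition. It assumes aligned, equal-length sequences and applies no small-sample correction, so its numbers may differ from what ggseqlogo draws. The name `info_content` is hypothetical and not part of any package.

```r
# Hypothetical helper: per-position information content (bits) for aligned,
# equal-length sequences. No small-sample correction is applied.
info_content <- function(seqs, alphabet = c("A", "C", "G", "T")) {
  # rows = sequences, columns = positions (assumes equal lengths)
  mat <- do.call(rbind, strsplit(toupper(seqs), ""))
  max_bits <- log2(length(alphabet))  # 2 bits for DNA, about 4.32 for protein
  apply(mat, 2, function(col) {
    # letter frequencies at this position; letters outside the alphabet are ignored
    freqs <- table(factor(col, levels = alphabet)) / length(col)
    freqs <- freqs[freqs > 0]          # treat 0 * log2(0) as 0
    max_bits + sum(freqs * log2(freqs))  # log2(N) - H
  })
}

seqs <- c("ACGT", "ACGA", "ACTT", "ACGT")
round(info_content(seqs), 3)
```

Fully conserved columns reach the maximum of 2 bits, while the mixed third and fourth columns fall below it. For peptides, pass `alphabet = strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]`.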

In R, several packages can generate attractive sequence logos.

Here I introduce ggseqlogo, which is built on ggplot2, one of the most elegant and flexible graphics frameworks in R, so its logos can be customized with ordinary ggplot2 functions.

Installation

# Install ggseqlogo from CRAN
install.packages("ggseqlogo")
# or install it from GitHub using the devtools package:
devtools::install_github("omarwagih/ggseqlogo")

Load data

Assume you have a FASTA file containing 100 short peptides, each 15 amino acids long. I use the seqinr package to import the sequences into R.

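library(seqinr)
# seqtype = "AA": read the file as protein (the default assumes DNA)
sample_seq <- read.fasta("~/Downloads/sample_seq.fasta", seqtype = "AA")
# collapse each sequence's letters into one upper-case string
sample_seq_df <- data.frame(IDs = names(sample_seq), Sequences = vapply(sample_seq, function(x) toupper(paste0(x, collapse = "")), character(1)), stringsAsFactors = FALSE)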
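Before plotting, it can help to confirm that the peptides really are aligned. As far as I know, ggseqlogo expects equal-length sequences, and non-standard letters such as X are worth spotting early. The sketch below uses a hypothetical helper, `check_peptides`, and a toy vector in place of `sample_seq_df$Sequences`. The expected length of 15 comes from the example above.

```r
# Hypothetical helper: check that peptides share one length and use only
# the 20 standard amino-acid letters.
check_peptides <- function(seqs, expected_len = 15) {
  std_aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]
  lens <- nchar(seqs)
  # any character that is not one of the 20 standard residues
  bad_chars <- setdiff(unique(unlist(strsplit(seqs, ""))), std_aa)
  list(all_same_length = length(unique(lens)) == 1 && all(lens == expected_len),
       non_standard = bad_chars)
}

# Toy input standing in for sample_seq_df$Sequences
toy <- c("ACDEFGHIKLMNPQR", "ACDEFGHIKLMNPQX")
check_peptides(toy)
```

On the real data you would call `check_peptides(sample_seq_df$Sequences)`.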

Plot sequence logo

library(ggseqlogo)
ggseqlogo(sample_seq_df$Sequences)

This draws a sequence logo in which each position is a stack of letters whose total height reflects conservation (in bits, by default).
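For a publication figure you will usually also want a vector file at a fixed size. The sketch below uses ggseqlogo's `method = "prob"` option to plot letter frequencies instead of bits, and `seq_type = "aa"` to skip alphabet auto-detection. It then saves the result with ggplot2's `ggsave()`. The toy peptides, the file name, and the 6 × 3 inch size are illustrative assumptions. Swap in `sample_seq_df$Sequences` and your journal's dimensions.

```r
library(ggseqlogo)
library(ggplot2)

# Toy, equal-length peptides standing in for sample_seq_df$Sequences
peptides <- c("ACDKLMN", "ACEKLMN", "GCDKLQN", "ACDRLMN")

# method = "prob": letter heights are frequencies rather than bits
# seq_type = "aa": treat the input as protein instead of auto-detecting
p <- ggseqlogo(peptides, method = "prob", seq_type = "aa")

# Save as a vector PDF; width and height are in inches (arbitrary example size)
ggsave("sequence_logo.pdf", p, width = 6, height = 3)
```

A vector PDF stays sharp at any zoom level. Change the file extension (for example to `.png` or `.tiff`) if a raster format is required.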

You can customize the plots further, for example with ggplot2 annotation layers. Details are in the ggseqlogo tutorial: https://omarwagih.github.io/ggseqlogo/
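As a sketch of that kind of customization, the example below follows the pattern in the ggseqlogo tutorial. It builds the plot from `ggplot()` plus `geom_logo()` and `theme_logo()`, so a shaded rectangle can be drawn underneath the letters and a text label added above. The toy peptides, the highlighted positions (3 and 4), the y-limits, and the label text are all illustrative assumptions.

```r
library(ggseqlogo)
library(ggplot2)

# Toy, equal-length peptides standing in for sample_seq_df$Sequences
peptides <- c("ACDKLMN", "ACEKLMN", "GCDKLQN", "ACDRLMN")

p <- ggplot() +
  # Shaded box behind positions 3-4; added first so the letters sit on top
  annotate("rect", xmin = 2.5, xmax = 4.5, ymin = -0.05, ymax = 4.4,
           alpha = 0.1, col = "black", fill = "yellow") +
  # The logo itself (bits; log2(20) is about 4.32 for protein)
  geom_logo(peptides, method = "bits", seq_type = "aa") +
  # Illustrative text label above the shaded region
  annotate("text", x = 3.5, y = 4.6, label = "region of interest") +
  theme_logo()

print(p)
```

Because the result is an ordinary ggplot object, titles, themes, and other ggplot2 layers can be added in the same way.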

WebLogo (https://weblogo.berkeley.edu/logo.cgi) is also popular and lets you create sequence logos online.
