Bioinformatics catalogue: October 2008

Monday, October 20, 2008

TESS web tool

http://cbil.upenn.edu/cgi-bin/tess/tess/

TESS is a web tool for predicting transcription factor binding sites in DNA sequences. It can identify binding sites using site or consensus strings and positional weight matrices from the TRANSFAC, JASPAR, IMD, and CBIL-GibbsMat databases. You can use TESS to search a few of your own sequences, or to search genome-wide for user-defined CRMs (cis-regulatory modules) near genes in genomes of interest.
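To make the idea of a positional weight matrix search concrete, here is a minimal sketch in Python. It is not TESS's own scoring code: the function names, the pseudocount, the uniform background of 0.25 and the toy count matrix are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds scores."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits

# Hypothetical counts for a 4-bp TATA-like motif; not taken from any database.
TOY_COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]

pwm = log_odds_matrix(TOY_COUNTS)
hits = scan("GGCTATAGGC", pwm, threshold=5.0)
print(hits)
```

With this toy matrix the only window above the threshold is the TATA at 0-based position 3. A real search would also scan the reverse strand and use a background model that fits the genome in question.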

Sunday, October 19, 2008

RepeatMasker

RepeatMasker (http://www.repeatmasker.org/) is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence, as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). On average, almost 50% of a human genomic DNA sequence is currently masked by the program. Sequence comparisons in RepeatMasker are performed by the program cross_match, an efficient implementation of the Smith-Waterman-Gotoh algorithm developed by Phil Green.
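The masking step itself is easy to picture. The sketch below replaces annotated intervals with Ns and reports what fraction of a sequence ended up masked. The helper names and the 0-based, half-open interval convention are assumptions for illustration only; they do not reproduce RepeatMasker's output format or its repeat detection.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Replace each 0-based, half-open [start, end) interval with mask_char."""
    masked = list(sequence)
    for start, end in intervals:
        # Clip the interval so out-of-range coordinates are ignored.
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)

def masked_fraction(sequence, mask_char="N"):
    """Return the fraction of positions equal to mask_char (0.0 if empty)."""
    if not sequence:
        return 0.0
    return sequence.upper().count(mask_char.upper()) / len(sequence)

# Hypothetical repeat annotation covering positions 2, 3 and 4.
masked = mask_intervals("ACGTACGTAC", [(2, 5)])
print(masked, masked_fraction(masked))
```

Running masked_fraction over a real masked FASTA record is a quick way to check how much of your own sequence was hidden before you pass it to a downstream search.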

Arabidopsis Gene Regulatory Information Server (AGRIS)

The Arabidopsis Gene Regulatory Information Server (AGRIS) is a new information resource of Arabidopsis promoter sequences, transcription factors and their target genes. AGRIS currently contains two databases, AtcisDB (Arabidopsis thaliana cis-regulatory database) and AtTFDB (Arabidopsis thaliana transcription factor database). The two databases, used in tandem, provide a powerful tool for use in continuous research.

AtcisDB consists of 25,516 promoter sequences of annotated Arabidopsis genes with a description of putative cis-regulatory elements.

AtTFDB contains information on approximately 1,770 transcription factors. These transcription factors are grouped into 50 families, based on the presence of conserved DNA-binding domains.

Saturday, October 18, 2008

Tandem Repeats Database (TRDB)

Tandem Repeats Database (TRDB) is a public repository of information on tandem repeats in genomic DNA and contains a variety of tools for their analysis. These currently include the Tandem Repeats Finder algorithm, query and filtering capabilities for finding particular repeats of interest, repeat clustering algorithms based on sequence similarity, polymorphism prediction based on common patterns of mutation, PCR primer selection, and data download in a variety of formats. In addition, TRDB serves as a centralized research workbench. It provides storage space for results of analysis and permits collaborators to privately share their data and analysis.

You can upload up to 100 MB of FASTA sequence to be analyzed online for tandem repeats!
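For readers who have not met tandem repeats before, here is a minimal sketch that finds exact tandem repeats in a short sequence. It is a naive left-to-right scan, not the Tandem Repeats Finder algorithm, and it ignores the approximate (mutated) copies that real repeat finders are built to handle. The function name and the default parameters are assumptions.

```python
def find_tandem_repeats(sequence, max_period=6, min_copies=3):
    """Return (start, unit, copies) for exact tandem repeats, left to right."""
    sequence = sequence.upper()
    repeats = []
    i = 0
    while i < len(sequence):
        best = None  # (unit, copies) covering the longest span from position i
        for period in range(1, max_period + 1):
            unit = sequence[i:i + period]
            if len(unit) < period:
                break  # ran off the end of the sequence
            copies = 1
            while sequence[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            span = copies * period
            if copies >= min_copies and (best is None or span > len(best[0]) * best[1]):
                best = (unit, copies)
        if best:
            repeats.append((i, best[0], best[1]))
            i += len(best[0]) * best[1]  # jump past the reported repeat
        else:
            i += 1
    return repeats

# Toy sequence with four copies of "CA" starting at 0-based position 2.
repeats = find_tandem_repeats("GGCACACACATT")
print(repeats)
```

The scan reports one repeat per starting position and then skips past it, so nested or overlapping repeats are not listed separately.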

Friday, October 17, 2008

Galaxy tool

Use this site to access popular sources of data like the UCSC Table Browser. Run analyses right on the spot using a variety of integrated tools. Your results are always available and can be easily shared with others.

Galaxy is a framework for computational tools. It's great for scientists who need to use the same command line tools over and over again and want to keep a simple history of what they did with their data. If you create an account, your history is saved indefinitely and your results can be shared with other people on the same server.

How to retrieve many genes sequences automatically?

This is a question many researchers ask. If you work with a vertebrate, you will have no problems! Go to the UCSC Genome Browser and click Tables.

Select - Genome region
Paste your list of gene abbreviations or GenBank accession numbers
You can then retrieve the full genomic sequence or the upstream/downstream regions of the genes.
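If you prefer to work out the upstream windows yourself from a table of gene coordinates, the arithmetic depends on the strand. The sketch below assumes 0-based, half-open coordinates and a hypothetical helper name; check the coordinate convention of the table you actually download before relying on it.

```python
def upstream_region(tx_start, tx_end, strand, length, chrom_size=None):
    """Return the 0-based, half-open (start, end) window upstream of a gene."""
    if strand == "+":
        # Plus strand: upstream lies before the transcription start.
        start, end = tx_start - length, tx_start
    elif strand == "-":
        # Minus strand: upstream lies after the highest coordinate.
        start, end = tx_end, tx_end + length
    else:
        raise ValueError("strand must be '+' or '-'")
    start = max(start, 0)  # do not run off the start of the chromosome
    if chrom_size is not None:
        end = min(end, chrom_size)  # do not run off the end either
    return start, end

# Hypothetical gene spanning 1000-2000 with a 500-bp upstream window.
print(upstream_region(1000, 2000, "+", 500))
print(upstream_region(1000, 2000, "-", 500))
```

For genes on the minus strand, the sequence taken from this window still has to be reverse-complemented to read in the direction of the gene.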

Arabidopsis genome browser

http://www.arabidopsis.org/cgi-bin/gbrowse/arabidopsis/

The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana. Data available from TAIR includes the complete genome sequence along with gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, publications, and information about the Arabidopsis research community. Gene product function data is updated every two weeks from the latest published research literature and community data submissions. Gene structures are updated 1-2 times per year using computational and manual methods as well as community submissions of new and updated genes. TAIR also provides extensive linkouts from our data pages to other Arabidopsis resources.

The Genome Browser provides a clear visualization of all genomic features and lets the user explore them interactively.

UCSC database

http://genome.ucsc.edu

The Genome Browser stacks annotation tracks beneath genome coordinate positions, allowing rapid visual correlation of different types of information. The user can look at a whole chromosome to get a feel for gene density, open a specific cytogenetic band to see a positionally mapped disease gene candidate, or zoom in to a particular gene to view its spliced ESTs and possible alternative splicing. The Genome Browser itself does not draw conclusions; rather, it collates all relevant information in one location, leaving the exploration and interpretation to the user.
The Genome Browser supports text- and sequence-based searches that provide quick, precise access to any region of specific interest. Secondary links from individual entries within annotation tracks lead to sequence details and supplementary off-site databases. To control information overload, tracks need not be displayed in full. Tracks can be hidden, collapsed into a condensed or single-line display, or filtered according to the user's criteria. Zooming and scrolling controls help to narrow or broaden the displayed chromosomal range to focus on the exact region of interest. Clicking on an individual item within a track opens a details page containing a summary of properties and links to off-site repositories such as PubMed, GenBank, Entrez, and OMIM. The page provides item-specific information on position, cytoband, strand, data source, and encoded protein, mRNA, genomic sequence and alignment, as appropriate to the nature of the track.

BioEdit Software

BioEdit is a biological sequence alignment editor written for Windows 95/98/NT/2000/XP. An intuitive multiple document interface with convenient features makes alignment and manipulation of sequences relatively easy on your desktop computer. Several sequence conversion and analysis options and links to external analysis applications provide a working environment in which you can view and manipulate sequences with simple point-and-click operations.

You can get BioEdit from our BioBox storage place!
