Algebraic Engineering Group@TUHH: Bioinformatics


Thursday, November 17, 2011

Computational Biology - Lecture 4

Topics:

Normal cones and normal fans
polytope algebra
Newton polytopes
polytope propagation
example: finding the optimal alignments for all choices of scoring parameters
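In the polytope algebra, the "sum" of two polytopes is the convex hull of their union and the "product" is their Minkowski sum. Polytope propagation evaluates the usual alignment recursion with these two operations, so the result is the Newton polytope of the polynomial that sums over all alignments. The following is a minimal sketch restricted to two-dimensional lattice polytopes stored as vertex lists; the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def convex_hull(points: List[Point]) -> List[Point]:
    """Hull vertices in counter-clockwise order (Andrew's monotone chain).

    Points lying on an edge are dropped, so only true vertices remain.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> int:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def poly_add(P: List[Point], Q: List[Point]) -> List[Point]:
    """Polytope-algebra sum: convex hull of the union of P and Q."""
    return convex_hull(P + Q)


def poly_mul(P: List[Point], Q: List[Point]) -> List[Point]:
    """Polytope-algebra product: Minkowski sum, i.e. the hull of all pairwise sums."""
    return convex_hull([(p[0] + q[0], p[1] + q[1]) for p in P for q in Q])
```

For example, the product of the segment from (0,0) to (1,0) with the segment from (0,0) to (0,1) is the unit square.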

Thursday, October 27, 2011

Computational Biology - Lecture 1

Topics:

Representation of alignments,
scoring schemes,
pair hidden Markov model,
tropicalization of the scoring function.
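Tropicalization replaces the ordinary (+, ×) used to sum the weights of all alignments by (max, +) on scores (equivalently (min, +) on negative log-weights), which turns the sum over alignments into the optimal alignment score. A minimal sketch of that idea: one dynamic program parameterized by a semiring. The linear gap model and the weight parameters are assumptions for illustration, not the pair hidden Markov model from the lecture.

```python
import math
import operator
from typing import Callable, TypeVar

T = TypeVar("T")


def semiring_align(x: str, y: str,
                   plus: Callable[[T, T], T], times: Callable[[T, T], T],
                   one: T, zero: T,
                   w_match: T, w_mismatch: T, w_gap: T) -> T:
    """Combine the weights of all alignments of x and y in a given semiring.

    Each alignment contributes the 'times'-product of its column weights,
    and alignments are combined with 'plus'. Gaps use a per-column weight.
    """
    n, m = len(x), len(y)
    D = [[zero] * (m + 1) for _ in range(n + 1)]
    D[0][0] = one
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            acc = zero
            if i > 0 and j > 0:
                w = w_match if x[i - 1] == y[j - 1] else w_mismatch
                acc = plus(acc, times(D[i - 1][j - 1], w))
            if i > 0:
                acc = plus(acc, times(D[i - 1][j], w_gap))
            if j > 0:
                acc = plus(acc, times(D[i][j - 1], w_gap))
            D[i][j] = acc
    return D[n][m]


def count_alignments(x: str, y: str) -> int:
    # Ordinary (+, *) with all weights 1 counts the alignments.
    return semiring_align(x, y, operator.add, operator.mul, 1, 0, 1, 1, 1)


def best_score(x: str, y: str, match: float = 1, mismatch: float = -1,
               gap: float = -2) -> float:
    # Tropical (max, +): the same recursion yields the optimal alignment score.
    return semiring_align(x, y, max, operator.add, 0, -math.inf, match, mismatch, gap)
```

Only the pair of operations changes between the two uses; the recursion itself is identical.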

Tuesday, October 11, 2011

Lecture: Computational Biology

Algebraic Methods

Time and place:

Thursday, 16:00 - 17:30,

Start:

October 27, 2011 (second week of lectures).

Language:

English

Recommended prerequisites:

Basic knowledge of discrete mathematics, linear algebra, and analysis.

Contents:

Algebraic geometry (Gröbner bases, algebraic varieties, elimination theory)
Algebraic-statistical models (linear and toric models, Markov models, invariance, statistical inference)
Applications: alignment of biological sequences, hidden Markov models.

Learning outcomes:

Knowledge: In-depth knowledge of a new field at the interface of algebraic geometry and statistics.
Skills: Theory-guided application of algebraic-statistical methods.
Competences: Formalizing problems, evaluating different solution approaches, using computer algebra systems.

Literature:

L. Pachter, B. Sturmfels: Algebraic Statistics for Computational Biology. Cambridge Univ Press, 2004.

Assessment:

Oral examination.

Tuesday, February 8, 2011

Bioinformatics - Exam

1. Consider the following proteins from UniProtKB:

P04655
P04654
P47710
P08949
P47851

a) Determine the families to which these proteins belong by using an appropriate tool.
Describe the consensus patterns of these families.

b) For each detected family, construct a multiple sequence alignment for the proteins belonging to the family. Find highly conserved regions in the proteins under consideration by examining the multiple sequence alignments. Are the consensus patterns of the families correctly reflected in the alignments?

c) Use a suitable tool to construct a phylogenetic tree for the five proteins.
Does the tree correctly reflect the family relationships?

2. Consider the fibrinogen-binding protein from Staphylococcus aureus (accession P68799). Predict its secondary structure with a method of your choice, and compare the prediction with the experimentally determined secondary structure.

3. Basic Questions:
a) Explain the difference between the standard Monte Carlo method and importance sampling.

b) How many genes does HIV have?
Which genes are not present in the human genome?

c) Given a phylogenetic tree with character data at the leaves:

            /\
           /  \
          /    \
         /\    /\
        G  C  C  A

(in bracket notation: ((G,C),(C,A)))

Find the most parsimonious assignment of states to the internal nodes of this tree using Fitch's method.
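Reading the tree in the question as ((G,C),(C,A)), the bottom-up pass of Fitch's method can be sketched as follows. The nested-tuple tree encoding is an assumption made for illustration; the top-down pass that picks one state per internal node is omitted.

```python
from typing import Set, Tuple, Union

# A tree is either a leaf state (a one-letter string) or a pair (left, right).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree) -> Tuple[Set[str], int]:
    """Bottom-up pass of Fitch's algorithm for one character.

    Returns the candidate state set at the root of `tree` and the minimum
    number of state changes needed in that subtree.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_set, left_cost = fitch(tree[0])
    right_set, right_cost = fitch(tree[1])
    common = left_set & right_set
    if common:
        # Non-empty intersection: no extra change at this node.
        return common, left_cost + right_cost
    # Disjoint sets: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


states, changes = fitch((("G", "C"), ("C", "A")))
```

Here the two cherries give {C, G} and {A, C} at one change each, and their intersection {C} at the root yields 2 changes in total; labeling every internal node with C attains this minimum.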

d) What is an additive tree?

Thursday, January 27, 2011

Introduction to Bioinformatics - Lecture 13

Today, statistical sampling methods were considered. Here are the topics:

Statistical mechanics,
canonical and micro-canonical ensemble,
observables and partition function,
molecular dynamics simulations (Verlet and velocity Verlet algorithms),
consideration of time step,
improvement of simulation (cutoff, distance, and multipole schemes),
standard Monte Carlo method,
calculation of partition function,
importance sampling,
sampling of protein structures.
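The difference between standard Monte Carlo and importance sampling can be seen on a toy system. Standard Monte Carlo draws configurations uniformly and weights each by its Boltzmann factor exp(-E/kT); importance sampling (here via the Metropolis algorithm) draws configurations from the Boltzmann distribution itself, so the observable is averaged directly. A minimal sketch, assuming a one-dimensional harmonic potential E(x) = x²/2, for which the exact average is ⟨x²⟩ = kT.

```python
import math
import random


def energy(x: float) -> float:
    # Toy one-dimensional harmonic potential (an assumption for illustration).
    return 0.5 * x * x


def plain_monte_carlo(n: int, kT: float = 1.0, half_width: float = 10.0,
                      seed: int = 0) -> float:
    """Estimate <x^2> by uniform sampling weighted with exp(-E/kT)."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = rng.uniform(-half_width, half_width)
        w = math.exp(-energy(x) / kT)
        num += x * x * w
        den += w  # (2 * half_width / n) * den estimates the partition function Z
    return num / den


def metropolis(n: int, kT: float = 1.0, step: float = 2.5, seed: int = 0) -> float:
    """Estimate <x^2> by importance sampling with the Metropolis algorithm."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(n):
        x_new = x + rng.uniform(-step, step)
        d_e = energy(x_new) - energy(x)
        # Accept downhill moves always, uphill moves with probability exp(-dE/kT).
        if d_e <= 0 or rng.random() < math.exp(-d_e / kT):
            x = x_new
        total += x * x
    return total / n
```

Both estimates approach 1.0 for kT = 1. The uniform sampler spends most samples where exp(-E/kT) is negligible, which is the inefficiency importance sampling avoids.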

Thursday, January 20, 2011

Introduction to Bioinformatics - Lecture 12

Today, we will give an introduction to 3D structure prediction of proteins:

Force fields (CHARMM, Oobatake-Crippen),
rigid geometry models,
buildup method,
basic heuristic methods,
conformational space annealing,
HP model.
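In the HP model, residues are classified as hydrophobic (H) or polar (P), a conformation is a self-avoiding walk on a lattice, and the energy rewards H-H contacts. A minimal sketch of the energy function on the 2D square lattice, assuming a contact energy of -1; a conformational search would minimize this value.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def hp_energy(sequence: str, coords: List[Coord]) -> int:
    """Energy of a conformation in the 2D square-lattice HP model.

    Each pair of H residues that are lattice neighbours but not consecutive
    in the chain contributes -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    index = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            j = index.get(nb)
            # j > i + 1 skips chain neighbours and counts each contact once.
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return -contacts
```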

Thursday, January 13, 2011

Introduction to Bioinformatics - Lecture 11

Today, we completed our treatment of secondary (2D) structure prediction:

Nearest neighbor classification (intrinsic dimension, Bhattacharyya distance).
Consensus prediction.
Neural network classification (Rost-Sander approach).

Thursday, January 6, 2011

Introduction to Bioinformatics - Lecture 10

Today, we gave an introduction to the prediction of protein secondary structure:

GOR method,
Chou-Fasman method,
Generation of sample sets,
Nearest neighbor classification.

Wednesday, January 5, 2011

The Reluctant Mr. Darwin

The other day, I read the book The Reluctant Mr. Darwin by David Quammen (2006). The book traces Charles Darwin's life from his return from the voyage of the Beagle until his death. Darwin spent years cataloguing the vast collection of specimens he had brought back. He was a self-taught scientist who eventually arrived at a theory of evolution. He was reluctant to publish and instead first chose to share his ideas with colleagues. He feared a public backlash and at times diverted his attention and energy elsewhere. In the meantime, another scientist, Alfred R. Wallace, independently arrived at the same ideas about evolution. Darwin's colleagues then convinced him to publish his theory. His book, On the Origin of Species, met with both acclaim and hostility.

German translation:

David Quammen: Charles Darwin - Der große Forscher und seine Theorie der Evolution, Piper Verlag, 2010, 9,95 €.

Thursday, December 16, 2010

Introduction to Bioinformatics - Lecture 9

Today, we continued with additive-tree methods for constructing phylogenetic trees. As examples, we presented the UPGMA method and a genetic algorithm for searching the space of trees with a fixed number of leaves.
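UPGMA builds a rooted tree by repeatedly merging the two closest clusters; the distance from the new cluster to any other cluster is the size-weighted average of the old distances, and the new node sits at half the merging distance. Because of this, UPGMA assumes a constant rate of evolution (a molecular clock) and produces an ultrametric tree. A minimal sketch; the nested-tuple output format (left, right, height) is an assumption.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple, Union

# Internal nodes are (left, right, height); leaves are names.
Node = Union[str, Tuple["Node", "Node", float]]


def upgma(names: List[str], D: List[List[float]]) -> Node:
    """Cluster the taxa in `names` from the symmetric distance matrix D."""
    clusters: Dict[int, Tuple[Node, int]] = {i: (name, 1) for i, name in enumerate(names)}
    dist: Dict[FrozenSet[int], float] = {
        frozenset((i, j)): D[i][j] for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        i, j = sorted(pair)
        (tree_i, size_i), (tree_j, size_j) = clusters.pop(i), clusters.pop(j)
        height = dist.pop(pair) / 2
        for k in clusters:  # distances from the merged cluster to the rest
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = ((tree_i, tree_j, height), size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

For four taxa where A and B are closest, C is next, and D is farthest, the sketch first joins A and B, then adds C, and finally D.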

Thursday, December 9, 2010

Introduction to Bioinformatics - Lecture 8

Today, we will first consider the maximum likelihood method for reconstructing phylogenies. To this end, we will introduce evolutionary models. Such a model is based on a rooted tree and a rate matrix that determines the substitution matrices along the branches of the tree. We will then define the likelihood of a tree and discuss the problem of optimizing its branch lengths.

Second, we will study distance-based methods. The goal is to construct the most additive tree for a given set of evolutionary distances. We will use linear programming to tackle this problem and work through an example involving four species.
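A distance matrix (a metric) can be represented exactly by a tree with non-negative edge lengths if and only if it satisfies the four-point condition: for every four taxa i, j, k, l, the two largest of the sums d(i,j)+d(k,l), d(i,k)+d(j,l), d(i,l)+d(j,k) are equal. A small check of this condition tells whether the given distances are already additive before a tree is fitted; the tolerance parameter is an assumption for floating-point input.

```python
from itertools import combinations
from typing import List


def is_additive(D: List[List[float]], tol: float = 1e-9) -> bool:
    """Check the four-point condition for a symmetric distance matrix D."""
    n = len(D)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([D[i][j] + D[k][l],
                       D[i][k] + D[j][l],
                       D[i][l] + D[j][k]])
        # The two largest sums must coincide.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True
```

For four species related by the tree ((A,B),(C,D)) with leaf edges of length 1 and an internal edge of length 2, the sums are 4, 8, and 8, so the condition holds.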

Thursday, December 2, 2010

Introduction to Bioinformatics - Lecture 7

Today, we will start with the basics on phylogenetics:

Distance and character data; basic assumptions.
Number of labeled binary trees with a fixed number of leaves, rooted or unrooted.
Small parsimony methods: weighted (Sankoff) and unweighted (Fitch).
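Two items from this list lend themselves to a short sketch: the number of labeled binary trees, which is (2n-3)!! for rooted and (2n-5)!! for unrooted trees with n leaves, and Sankoff's weighted small parsimony, which computes for every node and every state the minimum cost of the subtree below it. The nested-tuple tree encoding and the cost dictionaries are assumptions for illustration.

```python
import math
from typing import Dict, Sequence, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]
Cost = Dict[str, Dict[str, float]]


def sankoff(tree: Tree, states: Sequence[str], cost: Cost) -> Dict[str, float]:
    """Weighted small parsimony (Sankoff) for one character.

    Returns, for each state s, the minimum total cost of the subtree when its
    root is assigned s; cost[s][t] is the price of changing s into t.
    """
    if isinstance(tree, str):
        return {s: 0.0 if s == tree else math.inf for s in states}
    left = sankoff(tree[0], states, cost)
    right = sankoff(tree[1], states, cost)
    return {
        s: min(left[t] + cost[s][t] for t in states)
        + min(right[t] + cost[s][t] for t in states)
        for s in states
    }


def num_rooted_trees(n: int) -> int:
    """Labeled rooted binary trees with n >= 2 leaves: (2n-3)!! = 1*3*...*(2n-3)."""
    result = 1
    for k in range(3, 2 * n - 2, 2):
        result *= k
    return result


def num_unrooted_trees(n: int) -> int:
    """Labeled unrooted binary trees with n >= 3 leaves: (2n-5)!!."""
    return num_rooted_trees(n - 1)
```

With unit costs (0 for no change, 1 otherwise), Sankoff's method reproduces Fitch's count, e.g. 2 changes for the tree ((G,C),(C,A)).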

Thursday, November 25, 2010

Introduction to Bioinformatics - Lecture 6

Today, we considered the following topics:

profile-sequence alignment,
profile-profile alignment,
center-star algorithm.
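The first step of the center-star algorithm picks the center: the sequence with the smallest total distance to all other sequences. The remaining sequences are then aligned pairwise to the center and merged into one multiple alignment (that merging step is not shown). A minimal sketch of the center selection, assuming unit-cost edit distance as the pairwise measure.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit (Levenshtein) distance using one rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def center_string(seqs: List[str]) -> str:
    """Center-star step 1: the sequence with minimum total distance to all others."""
    return min(seqs, key=lambda s: sum(edit_distance(s, t) for t in seqs))
```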

Monday, November 22, 2010

Graphics Card Processing: Accelerating Profile-Profile Alignment

Alignment is the fundamental operation in molecular biology for comparing biomolecular sequences. The most widely used method for aligning two groups of aligned sequences is based on aligning the profiles corresponding to the groups. We show that profile-profile alignment can be significantly accelerated by general-purpose computing on a modern commodity graphics card. In this way, the huge computational power of graphics cards can be exploited to develop high-performance solutions for multiple sequence alignment.
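One way to see why profile-profile alignment maps well onto graphics hardware: if each profile column is a vector of residue frequencies, the score of aligning column i of one profile with column j of another can be taken as the scalar product p_i^T S q_j with a substitution matrix S, so all column-pair scores form the matrix product P S Q^T, a level-3 BLAS operation. The pure-Python sketch below only illustrates this formulation; it is not the authors' GPU implementation, and ignoring gaps when building profiles is a simplifying assumption.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def profile(alignment: Sequence[str], alphabet: str) -> Matrix:
    """Column-wise residue frequencies; gap characters are simply not counted."""
    n_seqs = len(alignment)
    return [[col.count(a) / n_seqs for a in alphabet] for col in zip(*alignment)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product; on a GPU this is where BLAS would be used."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def column_scores(P: Matrix, S: Matrix, Q: Matrix) -> Matrix:
    """M[i][j] = p_i^T S q_j for all column pairs, computed as P * S * Q^T."""
    Q_T = [list(c) for c in zip(*Q)]
    return matmul(matmul(P, S), Q_T)
```

The resulting score matrix can then be fed into the usual alignment recursion in place of a pairwise substitution score.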

Keywords: alignment · progressive alignment · graphics processor card · basic linear algebra subprograms · performance.

M.K. Hanif, K.-H. Zimmermann: Accelerating Profile-Profile Alignment Using Graphics Processor Units. J. Sig. Proc. Systems, submitted.

Thursday, November 18, 2010

Introduction to Bioinformatics - Lecture 5

Today, we focussed on the problem of multiple-sequence alignment:

alignment scores,
dynamic programming algorithm for multiple-sequence alignment,
progressive alignment.
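A common score for multiple-sequence alignments is the sum-of-pairs (SP) score, which adds up the pairwise scores of all entries in every column. The exact dynamic programming algorithm for k sequences of length n fills a k-dimensional table with O(n^k) cells, each with up to 2^k - 1 predecessors, which is why progressive alignment is used in practice. A minimal sketch of the SP score; the match, mismatch, and gap values are chosen for illustration.

```python
from itertools import combinations
from typing import Sequence


def sp_score(alignment: Sequence[str], match: int = 1, mismatch: int = -1,
             gap: int = -2) -> int:
    """Sum-of-pairs score of a multiple alignment (equal-length rows, '-' = gap).

    A gap paired with a gap scores 0, which is a common convention.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total
```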

Thursday, November 11, 2010

Introduction to Bioinformatics - Lecture 4

Today's lecture will carry on with the topic of pairwise sequence alignment:

Local alignment.
Heuristic methods: FASTA and BLAST.
Scoring models: PAM and BLOSUM.
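Local alignment is computed by the Smith-Waterman variant of the global recursion: every cell is floored at 0, and the result is the maximum over the whole table rather than the bottom-right cell. FASTA and BLAST speed this up heuristically by starting from short word matches. A minimal score-only sketch with a linear gap penalty; the parameter values are illustrative and not taken from PAM or BLOSUM.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -1) -> int:
    """Best local alignment score of a and b (linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 option lets a local alignment start fresh at any cell.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```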

Thursday, November 4, 2010

Introduction to Bioinformatics - Lecture 3

Alignment is the standard technique in molecular biology for comparing sequences. Today, we gave an introduction to this field:

Similarity between proteins.
Change of genetic material.
Homology.
Definition of pairwise alignment.
Scoring model: match and mismatch scores.
Alignment scores.
Global alignment problem.
Optimal algorithm for global alignment (Needleman-Wunsch).
Optimal algorithm for global-local alignment.
Dynamic programming (Bellman).
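The Needleman-Wunsch algorithm is a direct application of Bellman's dynamic programming: F[i][j] is the best score for aligning the first i characters of one sequence with the first j of the other, computed from F[i-1][j-1], F[i-1][j], and F[i][j-1]. A minimal sketch with a linear gap penalty and a traceback that returns one optimal alignment; the scoring values are assumptions.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Optimal global alignment score and one optimal alignment of a and b."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Traceback from the bottom-right cell, preferring diagonal moves.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```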

Thursday, October 28, 2010

Introduction to Bioinformatics - Lecture 2

This lecture has provided a short introduction to DNA, RNA, genes, genomes, and biosynthesis:

Structure of DNA.
Domains of life.
Structure of genomes, prokaryotes and eukaryotes.
Viruses.
Structure of genes, prokaryotes and eukaryotes.
Structure of RNA.
Transcription.
Promoter regions in E. coli and humans.
Exons, introns, and splicing.
Genetic code.
Transfer RNA (tRNA).
Ribosomes.
Translation.
Central dogma of molecular biology.
Life cycle of HIV.
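During translation, the mRNA is read in non-overlapping triplets, and the genetic code maps each codon to an amino acid or a stop signal. A minimal sketch of that lookup, assuming the standard genetic code and reading frame 1 (no search for a start codon); the helper names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str, to_stop: bool = True) -> str:
    """Translate a DNA or mRNA sequence codon by codon in reading frame 1.

    With to_stop=True translation ends at the first stop codon.
    Trailing bases that do not form a full codon are ignored.
    """
    seq = seq.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```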

Thursday, October 21, 2010

Introduction to Bioinformatics - Lecture 1

Proteins perform many functions essential for life. Their building blocks are the 20 standard amino acids. In the first lecture, we considered the structure of amino acids and proteins.
Here are the contents:

Chemical structure of amino acids.
Condensation reaction to establish linear chains of amino acids.
Backbone structure of proteins.
Peptide bond between neighboring residues in a linear chain of amino acids.
Structure of the protein crambin from Crambe abyssinica.
Side chains of amino acids.
Geometry of proteins: bond lengths and angles, dihedral angles.
Ramachandran plots and distance maps.
Conformations - thermodynamic hypothesis of Anfinsen.
Secondary structures: helices and sheets.
Proteins in a watery compartment: hydrophobic core, globular form.
Quaternary structures (deoxy human hemoglobin).
Example: Penicillin amidase.
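The backbone dihedral angles shown in a Ramachandran plot are phi (C of the previous residue, N, Cα, C) and psi (N, Cα, C, N of the next residue). Given four atom positions, the signed angle can be computed with the standard atan2 formula. A minimal sketch; the coordinates in the usage are artificial rather than taken from a real structure.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees (range -180 to 180) of four atoms."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For example, dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)) describes a trans arrangement and gives 180 degrees, while replacing the last point by (1, 0, 1) gives the cis value 0.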

Tuesday, October 19, 2010

Bioinformatics Algorithms on Graphics Hardware

Alignment is one of the basic operations in molecular biology for comparing sequences. The most widely used methods for multiple-sequence alignment include scalar-product-based alignment of groups of sequences.

We have shown that scalar-product-based alignment algorithms can be significantly accelerated by general-purpose computing on a modern commodity graphics card. This makes it possible to develop high-performance solutions for multiple-sequence alignment that exploit the huge computational power of graphics cards.

Keywords: alignment · progressive alignment · graphics processor card · basic linear algebra subprograms · performance.

Literature

C.S. Bassoy, M. Yang, S. Torgasin, K.-H. Zimmermann: Accelerating scalar-product based sequence alignment using graphics processor units. J. Sig. Proc. Syst., ISSN 1939-8115 (online), 2009.
GPGPU.org
