Program

Program Booklet (pdf)
Sunday August 26
8:30-9:00 Registration
Welcome Coffee
Rosalind (Chair: Pavel Pevzner)
9:00-9:10

Pavel Pevzner
(St. Petersburg Academic University, University of California, San Diego)
Welcome Remarks

9:10-9:15

Nikolay Vyahhi, Phillip Compeau
(St. Petersburg Academic University, University of California, San Diego)
Rosalind Introduction

9:15-10:00

Christopher Lee
(University of California, Los Angeles)
Bioinformatics Learning 2.0: proposing an open source consortium for bioinformatics teaching materials

I propose an open source consortium for bioinformatics teaching materials, including textbook chapters, slides, concept tests, homework and exam questions and answers, programming problems and data analysis projects, and software tools for using these materials in class and out.

To seed this effort I am contributing materials from two courses: a bioinformatics theory course (for Computer Science students) emphasizing probabilistic models and methods; and a genomics and computational biology course (for Life Science students).

This effort is based on several principles. First, bioinformatics is highly interdisciplinary, yet each bioinformatics textbook tends to reflect only one of its constituent disciplines. Furthermore, both available textbooks and the traditional lecture method fall far short of giving students adequate exercises to truly learn the concepts and skills. In effect, the job of writing all these teaching materials is too big for any one person. Instead, every teacher should be enabled to focus on writing materials in areas where they are expert, while drawing whatever materials they want from everyone else, via an open source consortium for sharing teaching materials.

Second, bioinformatics teaching should draw lessons from other fields such as physics teaching, where it has been shown that traditional lecturing (passive learning) is far less effective than active learning, in which students answer and discuss problems in class. Specifically, I have developed teaching materials and software tools for in-class concept tests, each defined as a question that challenges the students' understanding of a specific concept.

Whereas ROSALIND computational problems may be viewed as empirical (implicit) tests of mastery of a concept or skill, in-class concept testing explicitly teaches such mastery by challenging students to think about how to use a concept, and rapidly exposing the most common errors for all to see and understand. I illustrate with examples from the approximately 300 bioinformatics concept tests I have written for this effort.

I also present software tools for in-class concept testing, and for selecting and "re-compiling" content in flexible ways. Finally, I will discuss critical issues for such a consortium, such as automatic authorship tracking, sharing, and security.

For more details, see http://thinking.bioinformatics.ucla.edu/teaching.

Rosalind Problem Presentations 1
10:00-10:10

Gabriel Valiente
(Technical University of Catalonia)
Sequence Composition

A genomic or proteomic sequence can be seen as composed of a number of possibly overlapping words of a certain length, and the composition of a sequence is given by the frequency with which each possible word occurs within the sequence. In this talk, we review the biological significance of sequence composition and discuss efficient methods to obtain the word composition of a sequence, along with their implementation in the framework of the ROSALIND programming and testing environment for bioinformatics problems.
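As an illustration of the word composition described in this abstract, here is a minimal sketch in Python. It is not the speaker's method; the function name `kmer_composition`, the default DNA alphabet, and the choice to report counts in lexicographic word order are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def kmer_composition(sequence, k, alphabet="ACGT"):
    """Count every (possibly overlapping) word of length k in `sequence`.

    Returns one count per possible word over `alphabet`, listed in
    lexicographic order (an assumed output convention).
    """
    # Slide a window of width k over the sequence; windows overlap by k - 1.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Enumerate all |alphabet|^k words so that absent words are reported as 0.
    words = ("".join(letters) for letters in product(sorted(alphabet), repeat=k))
    return [counts[word] for word in words]


if __name__ == "__main__":
    print(kmer_composition("ACGTAC", 2))
```

The single pass over the sequence takes time linear in its length for fixed k, while the output has one entry per possible word, so the sketch is only practical for small k.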

10:10-10:20

Tomas Vinar, Brona Brejova
(Comenius University)
Three Problems Illustrating Bioinformatics Concepts In a Standard Spreadsheet

Here, we describe three problems that we have previously used in the context of a bioinformatics class taught at the Comenius University in Bratislava. The class is targeted at both computer science and biology students. Students with both backgrounds attend the same lectures, while tutorials and assignments are provided separately for biologists and computer scientists.

One particular challenge in teaching this course is to design assignments for biology students, illustrating basic algorithmic and mathematical concepts used in bioinformatics without requiring prior programming experience. The class does not require any previous programming courses, nor is it the goal of the class to teach programming. We have found that many concepts can be illustrated in a standard spreadsheet (MS Excel or one of its open-source equivalents) to which most of the students have been exposed previously.

10:20-10:50 Coffee Break
Rosalind Problem Presentations 2
10:50-11:10

Brian Tjaden
(Wellesley College)
From Sequence to Structure and Function: Inspiring Students with Bioinformatics Problems

Recent advances in sequencing technology have enabled scientists to gather large amounts of DNA and RNA sequence data. One of the bioinformatics challenges is extracting new insights about the structure and function of biomolecules from the wealth of sequence data. In this talk, we look at two problems in the field of computational molecular biology designed to stimulate and challenge students. The first problem relates to understanding the secondary structure of an RNA molecule based on its primary sequence. The second problem relates to processing large amounts of DNA sequence data so as to capture the internal structure in the data and support a range of queries on the data efficiently. Applications will be discussed for aligning high-throughput sequencing reads to a genome and for screening a genome for interesting genetic elements such as CRISPRs.

11:10-11:20

Sergey Naumenko
(Institute for Information Transmission Problems, Russian Academy of Sciences)
The Number of Reversing Substitutions

The basic task for molecular evolution studies is to calculate the frequency of a particular event in evolutionary history. A reversing substitution is an example of such a molecular event. At some moment in the past, the direct amino acid substitution A → B occurred, and after a certain period of time we observe the reversing substitution B → A. Unfortunately, in most cases, with the possible exception of experimental evolution in bacteria, we don't know the intermediate (ancestral) state of a protein. We can observe proteins in human, mouse, dog, elephant and other species only in their current state, in the form of a multiple alignment of orthologous protein-coding genes. But we can restore the ancestral states at the internal nodes of the phylogenetic tree using knowledge of the amino acids on the terminal branches of the tree and the tree topology itself. There are a variety of methods (maximum parsimony, maximum likelihood, Bayesian methods) and programs (PAML, Phylip, PAUP) to do so. Using the ancestral and terminal amino acids at a site, we can infer the substitutions.

Problem. Given a multiple alignment with internal states restored and the phylogenetic tree, calculate the number of reversing substitutions for different distances between the direct and reversing substitutions.

The solution of this problem does not require a sophisticated algorithm, but it is a (simplified) example of a real-world problem in molecular evolution. It contains the basic concepts: the site, the phylogenetic tree, the multiple alignment, the correspondence between these two, and the inference of substitution events.
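A minimal sketch of how the counting could be done for a single alignment site is given below. The input format (a child-to-parent map and a per-node state map), the function name, and the definition of distance as the number of tree edges between the direct and the reversing substitution are all assumptions for illustration; the abstract does not specify them.

```python
from collections import Counter


def reversing_substitution_distances(parent, state):
    """Histogram of distances between direct and reversing substitutions.

    parent: dict mapping each node to its parent (the root maps to None).
    state:  dict mapping each node to its amino acid at one alignment site,
            with ancestral states already restored.
    Distance is measured in edges (an assumed convention): 1 means the
    reversing substitution is on the edge directly below the direct one.
    """
    histogram = Counter()
    for child, par in parent.items():
        if par is None or state[par] == state[child]:
            continue  # no substitution on the edge par -> child
        # Walk towards the root until the edge where the parent's state arose.
        node, distance = par, 1
        while parent[node] is not None and state[parent[node]] == state[node]:
            node = parent[node]
            distance += 1
        ancestor = parent[node]
        # Reversal: the earlier substitution was state[child] -> state[par].
        if ancestor is not None and state[ancestor] == state[child]:
            histogram[distance] += 1
    return histogram


if __name__ == "__main__":
    parent = {"root": None, "n1": "root", "n2": "n1",
              "leaf1": "n2", "leaf2": "n1", "leaf3": "root"}
    state = {"root": "A", "n1": "B", "n2": "B",
             "leaf1": "A", "leaf2": "B", "leaf3": "A"}
    print(reversing_substitution_distances(parent, state))
```

For a full alignment, the same function would be applied to each site in turn and the histograms summed; weighting distances by branch length instead of edge count is an equally plausible reading of the problem.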

11:20-11:30

Jennifer McDowal
(EMBL EBI)
How do we teach bioinformatics to 10,000 students at the same time?

The shape of education is changing from strictly classroom-based learning to encompassing online learning, either as auxiliary learning tools or as a complete learning environment in its own right. Finding that hands-on training, while very useful, does not meet the demand for courses, the European Bioinformatics Institute has developed a Train-on-line site to provide a series of bioinformatics courses to a wider audience. Online learning can be particularly useful for bioinformatics courses, where students often have diverse backgrounds, as it permits students with similar learning needs to link up. The EBI plans to promote this through the use of subject-focused online forums, where experts will be able to link directly with groups of students.

To be successful, online learning needs good visibility. One approach is to connect with the efforts of Wikipedia, Wikiversity and Wikibooks. EBI online courses link glossary terms to Wikipedia, and plan to link terms back from Wikipedia to online courses, such as to modules covering EBI databases that have entries in Wikipedia. Courses can also be placed on Wikiversity for greater accessibility to the public.

A second major change in the education system is an online environment for teachers, where they can share materials thereby improving the quality of classroom-based learning and helping to provide education standards. The Bioinformatics Training Network is one such site, a community-based project that aims to provide a centralised facility to share materials, to list training events (including course content) and to discuss training experiences. The site was developed and is maintained by those active in the field of bioinformatics education from any country worldwide.

11:30-12:10

Discussion Panel 1: How do we teach bioinformatics to 10,000 students at the same time?

The scalability of bioinformatics education is a question of the utmost importance in the next decade. Everyone interested in taking part will be divided into small groups to discuss a number of questions related to this central theme.

12:10-13:40 Lunch (Radisson)
Bioinformatics for Biologists (Chair: Ron Shamir)
13:40-14:20

Uri Keich
(University of Sydney)
Estimating the statistical significance of sequence motifs

The identification of transcription factor binding sites is an important step in understanding the regulation of gene expression. To address this need, many motif-finding tools have been described that can find short sequence motifs given only an input set of sequences.

Somewhat surprisingly, development of the significance analysis of the motifs reported by those motif finders has lagged considerably behind the extensive development of the finders themselves. Nevertheless, this analysis is often crucial.
