CSE 6406: Bioinformatics Algorithms
April 2008 Semester
Starting from 10-th May 2008
Quick Links: Contact/Course teacher: Books Schedule and Course syllabus Announcements Marks Distribution Study Topics Web resource Presentation Project Final Exam Bonus/Open Problems Exercise
Dr. Masud Hasan
Assistant Professor
Department of CSE, BUET
Office: CSE 657
Phone: PABX: 8614640-44, 9665650-80, Ext. 7738
Email: masudhasan "at" cse "dot" buet "dot" ac "dot" bd
Course web: http://203.208.166.84/masudhasan/cse6406_April_08.html
An Introduction to Bioinformatics Algorithms, by Neil C. Jones and Pavel A. Pevzner, MIT Press, 1st Edition, 2004, [We will follow this book, collect it. Indian Edition should be available at Nilkhet (try "Memory Books")]
Computational Molecular Biology: An Algorithmic Approach, by Pavel A. Pevzner, MIT Press, 1st Edition, 2000.
Schedule and Course syllabus:
Time: Tuesday 5:00pm to 8:00pm
Location: MML (Multimedia Lab), 3rd floor, CSE Department
We will try to cover the following topics as per the following tentative sequence:
Date, Class | Topics | Presentation schedule |
Lecture 1, 13/05/2008 | Course policies, Introduction [1: Ch 3, 2: Ch 13, slides] | |
Lecture 2, 20/05/2008 | Restriction Mapping: Partial Digest Problem [1: Ch 4], Double Digest Problem [2: Ch 2.2, 2.3, slides] | |
Lecture 3, 27/05/2008 | Motif Finding [1: 4.4-4.10, Ch 5.5, 12.2, slides-1, slides-2] | |
Lecture 4, 03/06/2008 | Genome Rearrangement: Sorting Permutations [1: Ch 5.1-5.4, slides] | |
Lecture 5, 10/06/2008 | Sequence Alignment [1: Ch 6.4-6.6, 6.8-6.9, Cormen: 15.4] | |
Lecture 6, 17/06/2008 | No Lecture |
Masum, Saddam |
Lecture 7, 24/06/2008 | DNA Sequencing [1: Ch 8.3-8.8] | Rifat, Anupom, Jamil, Mahbub |
Lecture 8, 01/07/2008 |
|
Bayzid, Mahmudul, Faysal, Nahiduzzaman |
Lecture 9, 08/07/2008 |
Combinatorial Pattern Matching [1: Ch 9.1-9.5] Guest Lecturer: Dr. Mohammad Sohel Rahman |
Swakkhar, Tanvir, Ahsan |
15/07/2008 | No class, Summer vacation | |
Lecture 10, 22/07/2008 | No Lecture | Kuntal, Rasel, Salahuddin, Sayed |
Lecture 11, 29/07/2008 | Clustering and Trees (Ch 10.1-10.2, 10.4 (follow slides)) | Anindya, Rajkumar, Mahmuda, Mohsina |
Lecture 12, 05/08/2008 | No Lecture | Atif, Foyez, Shahrear, Rakib, Musa, Ashis |
Lecture 13, 12/08/2008 | No Lecture | Mashfiqui, Shuvabrata, Tariq, Mehrab, Tanim |
Lecture 14, 19/08/2008 | No Lecture | Nahiyan, Amit, Tanvirul Islam, Tanvir Anwar |
Final Exam, 02/09/2008, 5:00pm | Final Exam |
19/08/2008: Project submission deadline: 13/09/2008, Saturday, slide under my office door or keep in my pigeon hole, hardcopy only (no soft copy accepted)
12/07/2008: Summer vacation: No class on 15/07/2008.
20/5/2008: New bonus and exercise posted, see below.
21/5/2008: Bonus 1 goes to Saddam Hossain, see the "Bonus" section below for details.
Active participation in the class: 20%
Select a study topic and give a presentation on that: 20%
Write a project on that topic: 20%
Final exam: 40%
Bonus: depending upon the level of work in finding any new result on your research topic or an open problem mentioned in the class.
You have to spend some time for selecting your topic first. I will suggest a list of topics. The list will grow in the coming weeks. You may also suggest a topic, but that must be verified by me. For most of the topics suggested below I have tried to make them limited and specific.
To choose a topic, first select one according to your interest. Then read (not in detail) the accompanied papers that I have suggested (or more if necessary). If you does not like it, go by another topic. Once you have finally selected your topic, read the suggested papers carefully. If necessary, explore other related papers mentioned in the the references of the initial papers and/or the related papers where the initial papers have been referred. Citeseer or IEEE Explorer will help you a lot to get most of the papers online and to find the earlier and following papers. Throughout your adventure on the papers you should identify the important papers and the results therein.
In each topic I have pointed out what is to be done under this topic, but that is not enough. After you have finally selected your topic, come to me, I will tell you what to cover in your presentation and project. Try to do what I suggest. You are always welcome with any new/additional/better idea other than that I have suggested, with more recent papers that I missed, etc. And any such good idea/findings will add marks to your presentation and project. It would be superb if you can find any new result on your topic and you will get some bonus mark for that.
It may happen that you can not select any topic yourself. Do not worry for that, feel free to inform me, I will help you selecting a topic.
Throughout the course you will study this topic and the accompanied papers in detail and you will give a presentation and write a project on it.
Possible topics: [In primary stage, more topics and more details to come...] [You need GhostView to view ps files]
[Saddam] Polynomial algorithm for PDP by Moris Nivat.
[Tanvir Anwar] The Point Placement Problem: This geometric problem is very much related to restriction mapping problem: Papers:
If you get time and if you are interested, then let me know, I will suggest you some papers on geometric probing that are related to this topic.
[Masum] Other Mapping Techniques: Study Double Digest Problem in more detail, study other mapping techniques such as Optical Mapping and Probed Partial Digest Problem, find recent related results. Start with book [2], Chapter 2, and the references there in.
[Jamil, Mahbub] Simple and faster algorithms for sorting signed permutation by reversals. Try to find out why it is difficult to achieve an O(n) or O(nlogn) time algorithm for this problem. The second and third papers below compute the reversal distance, not the sequence of reversals, in subquadretic (i.e., o(n2) time). Papers:
Piotr Berman, Sridhar Hannenhalli: Fast Sorting by Reversal. CPM 1996: 168-185
[Mahmudul] Why sorting unsigned permutations by reversals is NP-hard but that for signed permutations is polynomial? What is the glitch? Read the following by Caprara and the polynomial papers, as necessary, from the immediate above project..
[Faysal] Sorting by Transpositions: Can you say why it is so far difficult to determine the complexity of sorting by transpositions, while the problem of sorting by reversals has already been settled down. Read the following two papers, of which the first one is the best approximation result on transpositions and the second one is NP-Hardness of reversals.
There are many other papers for this topic, talk to me if you need more papers.
[Nahiduzzaman] Sorting by Short Swap: Other than reversals and transpositions, there are many other operations that have been considered for sorting permutations, and short swap is one of them. It is not well-studied. Few results include these papers:
[Swakkhar] Mathematical Challenges from Genomics and Molecular Biology [best suitable for a student taking Combinatorial Optimization course]: Read the following two papers and explain the mathematical challenges in bioinformatics.
[Anindya, Rajkumar] Approximation algorithms on Protein Folding in different models
[Bayzid] Haplotype
[Ahsan] Approximation algorithms for subtree distances between phylogenies
[Rasel] Recombination in Phylogenetic Network
[Salahuddin] New Techniques for Phylogenetic Tree Reconstruction
[Kuntal] BLAST, FASTA
[Rifat] Bioinformatics Software
[Sayed] Biological databases
[Mohsina] Biological databases
[Anupom] AI related algorithms in Bioinformatics
[Mahmuda] Sorting permutations by more than one rearrangement operation
[Atif] Finding motifs with gaps.
[Foyez] Data mining
[Shahrear] Finding segregated sites in DNA sequences
[Mashfiqui] Protein folding
[Rakib] Repetitions and regularities in weighted sequence
[Shuvabrata] Parallel algorithms in bioinformatics problems
[Tariq] DNA computing
[Mehrab] Phylognetic tree
[ABM Musa] Cache Oblivious algorithms for Bioinformatics
[Ashis] Clustering in bioinformatics
[Tanim] System Biology
[Nahian] Service oriented systems for bioinformatics
Research fields, other than the algorithms, in Bioinformatics: Database, System Bioinformatics, etc.
Follow the guideline below in your presentation.
Complete your presentation within 15-20 minutes
Give a summary first
Clearly explain the topic and its key features that you are covering (should include what I have suggested)
Application of this topic
Related work done so far (should be recent works)
Future work possible (as it comes to your mind and as it is mentioned in related papers)
Remember that this is a graduate course. So try to make your presentation attractive, informative, and friendly. Try to become less formal and communicative with the audience.
You may visit "How to give a good presentation" here: http://203.208.166.84/masudhasan/
Project will be the written form of what you find on your topic. It must be written in LaTeX. Report size is expected to be within10 to 15 pages with 11 size fonts and single spacing. You can collect latex installation files for Windows from me (~450 MB).
Web resources: [more to come...]
Some good algorithmic bioinformatics conference: Now a days almost all algorithm conferences accept algorithmic bioinformatics papers. However, the following list are some good conferences dedicated to bioinformatics including algorithms in bioinformtaics:
Again, almost all algorithm related journals accept algorithmic bioinformatics papers, but some good journals dedicated to bioinformatics, including their algorithms, are:
Bioinformatics
Journal of Computational Biology
BMC Bioinformatics
Online Journal of Bioinformatics
Journal of Bioinformatics and Computational Biology
Active participation in the class [20%]:
You will get this mark when you actively participate in the class, where by an "active participation" means you come to the class in time, turn your mobile off, attentively listen, ask (logical) questions, give comments, etc. It is not necessary that you ask questions or give comments all the time; even, your body language without any verbose participation may indicate you are attentive in the class. In your final exam some marks may be reserved for questions that will judge whether you actively participated in the class or not.
Syllabus: See the schedule and course syllabus table.
If you can find any new result on your/others topic or solve any problem mentioned in the class you will get some bonus marks on top of your total marks.
Bonus 1. [2 marks, first correct solution will get the mark] Find an example of L in which if the Skiena's PDP algorithm is applied, then at some step both left and recursive calls fail to match with remaining L and thus back track.
Solved! By Saddam Hossain, see his solution here, Saddam gets 2 marks bonus. Congratulation to Saddam!
Bonus 2: [4 marks, first correct solution will get the mark] Prove that the greedy algorithm for shortest superstring problem gives an approximation ratio of 2. Or 2 marks if you can only find the current status of the problem, whether it is still unsolved or anything new has come out, with sufficient source of references.
Ahsanur Rahman found some sources which shows that a 2(1 + ln n)-approximation is possible for this problem. Ahsan gets a partial mark, 1. Congratulation to Ahsan.
Bonus 3: [2 marks, first correct solution will get the mark] Chapter 10: How is the distance matrix calculated from intensity matrix?
Solved! By Md. Abu Sayeed Mondol. See his solution here. It's so easy! Abu Sayeed gets 2 marks bonus. Congratulation to Abu Sayeed!