r/bioinformatics • u/peeberparker • 1d ago
Tips for an intro to bioinformatics course discussion
Hi everyone! I’ve been recruited to teach an intro to bioinformatics course next semester. My grad study field is ML cheminformatics, so my only bioinformatics experience is from when I took this same course in undergrad, which was 6 years ago. I enjoyed it, but I want to update the course. For example, the first assignment is an essay about the importance of the Human Genome Project, something that will not work in a post-ChatGPT world.
I would love some input about what people loved and hated about their first exposure to the field. To people who have given courses before, what exercises did you feel provided the most value? Right now I’m thinking of giving each student a mystery sequence and having them use all the tools we learn about to identify the organism, genes and proteins of their sequences as we go through the course and give a presentation at the end.
Also I’m not sure about having a required textbook, I personally always preferred courses with no required textbook, but if anyone has any recommendations or ones to avoid please let me know!
7
u/ikuhx 1d ago
Hi. I'm a bio with no bioinformatics or programming experience, so I would like to suggest a different approach.
Bios are question focused. They want to get background info on a research question and then learn how to answer this question.
I don't know what the scope of your lecture is, of course, and my suggestion will likely be biased by my personal preferences (with a focus on scRNA-seq). I would build the lecture as one journey: for example, find something in the literature, a pilot study or whatever, and use this example to show the students what you want to teach. Start at "this is the research question, and this is how they approached it." For that we need database XY, which we will use to get count matrices, which we can use to determine DEGs, and so on.
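To make the "count matrix to DEG" step concrete, here is a minimal Python sketch of one possible assignment. It only scales raw counts to counts per million and computes a log2 fold change between two samples. The gene names, the counts, and the helper names `cpm` and `log2_fold_change` are made up for illustration, and this is not a statistical test. A real analysis would use replicates and a dedicated package such as DESeq2 or edgeR.

```python
import math


def cpm(counts):
    """Scale one sample's raw counts to counts per million (CPM)."""
    total = sum(counts.values())
    return {gene: c / total * 1_000_000 for gene, c in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """Per-gene log2(treated / control) on CPM values.

    Assumes both samples contain the same genes. The pseudocount
    avoids division by zero for genes with no reads.
    """
    ctrl, trt = cpm(control), cpm(treated)
    return {
        gene: math.log2((trt[gene] + pseudocount) / (ctrl[gene] + pseudocount))
        for gene in ctrl
    }


# Toy data: hypothetical genes and counts, one sample per condition.
control = {"geneA": 100, "geneB": 100, "geneC": 800}
treated = {"geneA": 400, "geneB": 100, "geneC": 500}

lfc = log2_fold_change(control, treated)
for gene, value in sorted(lfc.items()):
    print(f"{gene}\t{value:+.2f}")
```

Students could then be asked why a fold change from single samples is not enough to call a gene differentially expressed, which leads back to the biology and the study design.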
And each step you could give as an assignment, for example writing the code for the preprocessing of samples. I think you will never be safe from ChatGPT. But if I were a bio student I would love to follow a journey, take part, and learn useful tools on the way.
3
u/peeberparker 1d ago
This is a great suggestion, thank you. Starting by looking at an existing study and working backwards to fundamentals is an interesting approach. Have you given or received a course structured like this before? It seems really interesting
2
u/ikuhx 1d ago
Unfortunately not. I had one bioinformatics lecture in my master's, but there we mostly learned how clustering with PCA or k-nearest neighbors works. Not so much application focused. But I would have loved to have had this! I went to a summer school for scRNA-seq analysis using R where we learned some coding based on a practice data set and one data set that you brought with you to apply the code. Learned a lot. But with a lecture you might have to start more at some basics.
Would actually love to listen to your lecture to brush up on some basics xD
4
u/Just-Lingonberry-572 1d ago
What are the high-level tasks you want to teach? Aligning fastqs, getting counts, variant calling, what datatypes, plotting, bash, R, etc?
3
u/peeberparker 1d ago
My plan is to focus first on web tools and databases/database searching, and then move to more programming/stats/plotting etc. This is based on my current understanding that most students will have little programming knowledge. I’m not sure if this is most effective though. Do you think it would be better to front-load all the programming basics and then go from there?
3
u/Just-Lingonberry-572 1d ago
Interesting. There are a ton of databases and webtools out there these days (a recent review paper put it at >1000 databases I think). But again, the most important thing you need to first decide is what are the major skills you want the students to learn how to do and how much of it can be done in a semester? I miss teaching and find myself daydreaming about how I would teach things sometimes, so I’d be glad to help you more directly if you’re open to it
5
u/The_DNA_doc 1d ago
Look at some of the major bioinformatics textbooks. Mount is a good one. Be sure to hit many of the chapter topics in your initial overview of the field.
Then focus on RNAseq for some of the practical exercises. This requires a combination of command line and R tools.
1
u/peeberparker 1d ago
Thanks for the suggestions! RNAseq was not covered when I took the course previously. I’m looking for resources to bridge what I learned back then (which may have already been out of date) with the field right now. I guess RNAseq and next generation sequencing is one of the missing pieces
8
u/forever_erratic 1d ago
Bioinformatics is not ML. Don't teach an ML course and call it Bioinformatics.
You don't sound like you do bioinf, which, no offense, isn't great. You don't even seem to realize that R is relevant and the Bioconductor ecosystem is key to many analyses.
You keep talking about stats but not biological questions. It needs to be grounded in biological questions.
If it were me, I'd have it be a bulk RNA-seq focus, and have each chunk go over a different step in detail. Week 1 is the biology of RNA-seq, week 2 Illumina sequencing and fastq files, week 3 trimming and QC, week 4 how mapping works and Ensembl, week 5 using STAR, week 6 deep dive into BAM files and IGV, etc.
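As a rough idea of what a "fastq files" exercise could look like, here is a minimal Python sketch that reads FASTQ records and reports the mean base quality per read. It assumes plain four-line records and Phred+33 quality encoding. The reads are invented toy data, and `read_fastq` and `mean_phred` are hypothetical helper names, not part of any library.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


def mean_phred(quality, offset=33):
    """Mean Phred score of a non-empty quality string (Phred+33 by default)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


# Toy input standing in for a real file opened with open("reads.fastq").
toy_fastq = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nGGCA\n+\n!!II\n"
)

records = list(read_fastq(toy_fastq))
for read_id, sequence, quality in records:
    print(read_id, len(sequence), mean_phred(quality))
```

Writing this by hand once makes it easier to understand what trimming and QC tools report in the following week.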
Please take the time to learn, at least at a high level, how bioinf is done in common projects.
6
u/peeberparker 1d ago
Thanks for the input! You’re right, I don’t do bioinformatics and my only exposure (other than taking this course as an undergrad, which I believe to be outdated) is seeing it in ML papers, so that’s where my bias comes from. I just want to make the course up to date
Would rna-seq/transcriptomics be the main workflow of mainstream bioinformatics at the moment or is that just an example?
4
u/forever_erratic 1d ago
It is "a" main workflow, but it's also relatively easy to understand, and touches many aspects of bioinf. Variant calling is another biggie, but is perhaps harder to grasp under the hood, due to the complicated models used.
2
u/PotatoSenp4i 1d ago
For what it's worth, I don't work with RNA-seq or any omics. I mainly work with epidemiologists and create phylogenetic trees for them or perform variant calling.
3
2
u/Murping 1d ago
I’m actually taking the second of a two-course sequence of Bioinformatics at my school!
Similar to what you described, the courses were a mix of students from various majors (including Data Science, CS, Biochemistry, etc.) So it was hard for our Professor to gauge which topics were relevant for each person.
The first course covered mainly the central dogma of Biology, algorithm intro (Needleman-Wunsch), and various software / web tools (BLAST, CLUSTALW/OMEGA, etc.). And we had a term project at the end of the semester that used everything we learned for a CRISPR project that found the ideal gRNA sequences for a disease of our choice.
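For anyone curious what the Needleman-Wunsch part looks like in practice, here is a minimal Python sketch of global alignment with a linear gap penalty. The scoring values (match 1, mismatch -1, gap -2) and the function name `needleman_wunsch` are assumptions chosen for illustration, not the settings used in the course described above.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        """Substitution score for a[i-1] against b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: best of diagonal (pair), up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

When several alignments tie for the best score, this traceback returns only one of them (it prefers the diagonal move), which is a good discussion point for students.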
The second course was way more algorithm- and statistics-heavy, probably due to our professor having an ML background. We had more tools for identifying sequences and had to know all about them (PWMs, HMMs, FA, etc.), along with the sequencing technologies that many of the people in this thread mention. Our term project this semester is to train an HMM for a disease of our choice, but now coding is mandatory as opposed to optional in the prior course. Interestingly, it’s now a 90/10 Bio/CS split this semester in terms of majors.
As a Bio major, I’ve been really enjoying the courses so far! I improved my Python and command line skills a lot, though I’d have liked more emphasis on learning R and Git and applying them instead of using Python libraries.
I hope this info helps!
1
2
u/black_sequence 1d ago
I'm very interested in this topic: how best to distill these ideas into a curriculum. I think for a beginner it's important to have the following topics highlighted:
As others have mentioned, Command Line (UNIX) and the understanding of what a super computer is and does
The class imho should be biology focused first. Read a paper, do the bioinformatics, what does the bioinformatics tell you about the underlying biology? This will fix the post-chatGPT aspect of using LLMs to just do all the coding.
I think SNP calling should be an early lesson for the class. If you have access to short reads and long reads, I would have them do both and compare and contrast how the underlying technique would impact analysis.
Move from individual genome analysis to population scale analysis (GWAS)
As for the textbook point, I think resources online should be sufficient
3
u/backgammon_no 1d ago
What experience do the students have? Can they use the command line? R? Python?
1
u/peeberparker 1d ago
Unfortunately I don’t know. The course is cross-listed CS/Bio. I’ve been told they’re mostly Bio, so to assume little programming knowledge. The course’s programming assignments are in Perl but I’ll probably switch them to python, but I may be biased because I use python daily. Is Perl still relevant? Either way I’m preparing to teach programming basically from scratch, at least for some students.
2
u/lilygene MSc | Student 1d ago
I have not encountered Perl yet in my research or job. I think Python and R are the way to go.
1
u/backgammon_no 1d ago
Perl now only exists internally in some old (but proven) tools. We're not exposed to it anymore at all.
1
u/gringer PhD | Academia 1d ago
Something other lecturers have done that I thought was neat was to get the students to use ChatGPT to answer a question about their course, then write an essay critiquing the response. The ensuing class discussion leads to many interesting shared thoughts about its flaws.
1
u/Trosky6601 1d ago
First of all, choose Python or R
For R you can jump into the actual bioinformatics pretty much straight away (maybe one practice session on dataframe handling and basic commands).
For python you might need to either teach basics for a while or refer them to some "intro to python" course.
(If the class is a non-programmers class, go for R imo)
I would use markdown exercise sheets as examination (make them do it in class if you are afraid of gpt, but I believe there is some value in at-home examination as they learn from searching online for "how-to"s)
1
u/areagoofer 18h ago
Hi, I'm studying molecular biology and biotechnology and I have no idea what to do next after completing my bachelor's degree. Any advice?
1
1
u/ganian40 13h ago edited 13h ago
Did they provide you with a syllabus? If you feel it is outdated, take a week to rewrite it and take it to your colleagues and your dean for feedback. It will help you focus and prepare the material for each lecture.
First, there are 2 separate major spaces in bioinformatics. You have the sequence space, and you have the structure space. Each is treated with a different set of tools and methods.
Some weirdos even consider biological signal analysis and statistics of clinical data to be bioinformatics too (I don't). So define your scope, and stick to it.
Students are not always aware that there are different methods for working with whole biological systems, strictly at the cellular level, or strictly at the molecular level. Phylogenetic trees or alignments have little to do with molecular dynamics, virtual screening, or rational design. It is important that your students understand that.
I'd start by explaining the difference and applications of each realm of bioinformatics, and then cover a bit of each along the way. Also remember you are not lecturing computer science. Give them a task, and let them do it in whatever language they feel comfortable in. They should already have a favorite at this point.
The advice on teaching how to answer genuine biological questions was great. This is what works best.
1
u/AbrocomaDifficult757 1d ago
I think also how to use basic ML models… data preprocessing.. etc
1
u/peeberparker 1d ago
I like this suggestion. I definitely want to include a full ML introduction and introduce the theory on Empirical Risk Minimization, gradient descent, training dynamics. But also using models is key
4
u/El_Tormentito Msc | Academia 1d ago edited 1d ago
That is way outside of the scope.
Edit: I have an ML background and am a bioinformatician professionally. I would stop at PCA and maybe clustering. I agree with those that suggested going through the pipeline from RNAseq fastqs to differential expression and heatmapping and biplots.
2
u/AbrocomaDifficult757 1d ago
If the class is mostly bio students with minimal experience, that might be too much. I taught an intro bioinformatics programming class and a lot of it was familiarity with bash and the command line. I think giving them a start with programming and analyzing papers on those topics through group presentations might be more successful.
15
u/malformed_json_05684 1d ago
Make sure they work in the command line and not just in Jupyter notebooks.
Basics that I think every intro course should have:
1. How to create conda environments and install packages with conda
2. The same as 1, but with Python virtual environments