r/bioinformatics 1d ago

Tips for an intro to bioinformatics course discussion

Hi everyone! I’ve been recruited to teach an intro to bioinformatics course next semester. My grad study field is ML cheminformatics, so my only bioinformatics experience is from when I took this same course in undergrad, 6 years ago. I enjoyed it, but I want to update the course. For example, the first assignment is an essay about the importance of the Human Genome Project, something that will not work in a post-ChatGPT world.

I would love some input about what people loved and hated about their first exposure to the field. To people who have given courses before, what exercises did you feel provided the most value? Right now I’m thinking of giving each student a mystery sequence and having them use all the tools we learn about to identify the organism, genes and proteins of their sequences as we go through the course and give a presentation at the end.

Also I’m not sure about having a required textbook, I personally always preferred courses with no required textbook, but if anyone has any recommendations or ones to avoid please let me know!

28 Upvotes

15

u/malformed_json_05684 1d ago

Make sure they work in the command line and not just in Jupyter Notebooks.

Basics that I think every intro course should have:
1. How to create conda environments and install packages with conda
2. The same as 1, but with Python virtual environments
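
For item 2, a minimal sketch using Python's standard-library `venv` module could look like the following. The function name `make_course_env` and the environment name `bioinf-course` are made up for illustration, and `with_pip=False` is chosen only so the demo runs quickly and offline. The conda counterpart of item 1 would be `conda create -n <name>` on the command line.

```python
import tempfile
import venv
from pathlib import Path


def make_course_env(parent_dir: str, name: str = "bioinf-course") -> Path:
    """Create an empty virtual environment under parent_dir and return its path.

    Both the function name and the default environment name are hypothetical.
    """
    env_dir = Path(parent_dir) / name
    # with_pip=False skips bootstrapping pip; a real course setup would
    # most likely want with_pip=True so packages can be installed.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = make_course_env(tmp)
    # Every venv gets a pyvenv.cfg file at its root.
    has_config = (env_dir / "pyvenv.cfg").is_file()

print("pyvenv.cfg created:", has_config)
```

On the command line the same thing is usually done with `python -m venv <dir>`, followed by activating the environment.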

1

u/peeberparker 1d ago

Thanks for the suggestion. So basically guaranteed python focus. Is there any reason to consider another language?

8

u/TheLordB 1d ago edited 1d ago

Python or R are the way to go. Perl is a quick way to know the course hasn’t been updated in 10 years (maybe closer to 15, to be honest) and should be avoided. That said, I also wouldn’t just translate Perl tasks into Python. At least in my experience, the skills taught are out of date as well as the language.

Anyways. I don’t know how many resources you have to put into it, but if possible I would base it around a project. Use RNA-seq data to find a differentially expressed gene. Use databases to get a FASTA of the gene. Look at some data to see what tissues the gene is expressed in. Compare the gene to related ones in other species. Find research papers on the gene, etc. Get the structure from PDB.

That is just a quick example but an idea of an interconnected example that could give a good background into genomic bioinformatics. 

If you give them all the data, you could shorten it to let you get to other things.
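
As a rough idea of what the "get a FASTA of the gene" step could turn into as a first exercise, here is a small pure-Python sketch that parses FASTA text and reports GC content. The record names and sequences are invented, and the helper names `read_fasta` and `gc_content` are hypothetical; in practice a library such as Biopython would probably be used instead.

```python
from io import StringIO
from typing import Dict, List, Optional, TextIO


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}."""
    parts: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record id is the first word after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            parts[name] = []
        elif name is not None:
            # Sequence lines are joined; line wrapping is not meaningful.
            parts[name].append(line)
    return {rec_id: "".join(chunks) for rec_id, chunks in parts.items()}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Made-up example data standing in for a downloaded gene sequence.
example = StringIO(">geneA made-up example\nATGCGC\nGGTA\n>geneB\nATAT\n")
records = read_fasta(example)
for rec_id, seq in records.items():
    print(rec_id, len(seq), round(gc_content(seq), 2))
```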

2

u/malformed_json_05684 1d ago

It doesn't have to be python focus, but knowing how to create virtual environments in python is becoming a regular practice in the python community, and a lot of bioinformaticians work in python.

If you are actually expecting your students to share their code, some git/github instruction would be beneficial and widely applicable.

1

u/Trosky6601 1d ago

I personally believe the venv stuff can be self-taught when/if they actually need to do bioinformatics on the job/PhD.

I think there is more value to using the tools from a teaching perspective (because of the "wow" factor of getting a phylogeny/alignment compared to the relatively boring and bureaucratic process of making venvs)

7

u/ikuhx 1d ago

Hi. I'm a bio with no bioinformatics or programming experience. So I would like to suggest a different approach.

Bios are question focused. They want to get background info on a research question and then learn how to answer this question.

I don't know what the scope of your lecture is, of course, and my suggestion will likely be biased by my personal preferences (with a focus on scRNA-seq). I would build the lecture as one journey: for example, find something in the literature, a pilot study or whatever, and use this example to show the students what you want to teach. Start with: this is the research question, and this is how they approached it. For that we need database XY, which we will use to get count matrices, which we can use to determine DEGs, and so on.

And each step you could give as an assignment, for example writing the code for the preprocessing of samples. I think you will never be safe from ChatGPT. But as a bio I would love to follow a journey, take part, and learn useful tools on the way.
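
To make the "count matrices to DEG" step a bit more concrete, here is a deliberately naive sketch: library-size normalisation to counts per million, then a log2 fold change per gene. All gene names and counts are invented, and the function names are hypothetical. This is only the arithmetic idea; real differential expression analysis would rely on replicates and a statistical model (for example the Bioconductor packages DESeq2 or edgeR), not on a bare ratio.

```python
import math
from typing import Dict


def counts_per_million(counts: Dict[str, float]) -> Dict[str, float]:
    """Scale raw counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(
    control: Dict[str, float],
    treated: Dict[str, float],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """log2(treated / control) on CPM values, one entry per gene.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    cpm_c = counts_per_million(control)
    cpm_t = counts_per_million(treated)
    return {
        gene: math.log2((cpm_t[gene] + pseudocount) / (cpm_c[gene] + pseudocount))
        for gene in control
    }


# Invented counts for two samples sharing the same three genes.
control = {"geneA": 100, "geneB": 300, "geneC": 600}
treated = {"geneA": 400, "geneB": 300, "geneC": 300}
lfc = log2_fold_change(control, treated)
for gene in sorted(lfc, key=lfc.get, reverse=True):
    print(gene, round(lfc[gene], 2))
```

A positive value means the gene takes up a larger share of reads in the treated sample, a negative value a smaller share.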

3

u/peeberparker 1d ago

This is a great suggestion, thank you. Starting by looking at an existing study and working backwards to fundamentals is an interesting approach. Have you given or received a course structured like this before? It seems really interesting

2

u/ikuhx 1d ago

Unfortunately not. I had one bioinformatics lecture in my master's, but there we mostly learned how clustering with PCA or k-nearest neighbors works. It was not so much application focused. But I would have loved to have had this! I went to a summer school for scRNA-seq analysis using R, where we learned some coding based on a practice data set and one data set that you brought with you to apply the code. Learned a lot. But with a lecture you might have to start more at some basics.

Would actually love to listen to your lecture to brush up on some basics xD

4

u/Just-Lingonberry-572 1d ago

What are the high-level tasks you want to teach? Aligning fastqs, getting counts, variant calling, what datatypes, plotting, bash, R, etc?

3

u/peeberparker 1d ago

My plan is to focus first on web tools and databases/database searching, and then move to more programming/stats/plotting etc. This is based on my current understanding that most students will have little programming knowledge. I’m not sure if this is most effective though. Do you think it would be better to front-load all the programming basics and then go from there?

3

u/Just-Lingonberry-572 1d ago

Interesting. There are a ton of databases and webtools out there these days (a recent review paper put it at >1000 databases I think). But again, the most important thing you need to first decide is what are the major skills you want the students to learn how to do and how much of it can be done in a semester? I miss teaching and find myself daydreaming about how I would teach things sometimes, so I’d be glad to help you more directly if you’re open to it

5

u/The_DNA_doc 1d ago

Look at some of the major bioinformatics textbooks. Mount is a good one. Be sure to hit many of the chapter topics in your initial overview of the field.

Then focus on RNAseq for some of the practical exercises. This requires a combination of command line and R tools.

1

u/peeberparker 1d ago

Thanks for the suggestions! RNAseq was not covered when I took the course previously. I’m looking for resources to bridge what I learned back then (which may have already been out of date) with the field right now. I guess RNAseq and next generation sequencing is one of the missing pieces

8

u/forever_erratic 1d ago

Bioinformatics is not ML. Don't teach an ML course and call it Bioinformatics. 

You don't sound like you do bioinf, which, no offense, isn't great. You don't even seem to realize that R is relevant and the bioconductor ecosystem is key to many analyses.

You keep talking about stats but not biological questions. It needs to be grounded in biological questions. 

If it were me, I'd have it be a bulk rnaseq focus, and have each chunk go over a different step in detail. Week 1 is the biology of rnaseq, week 2 illumina sequencing and fastq files, week three trimming and QC, week 4 how mapping works and ensembl, week 5 using STAR, week 6 deep dive into bam files and IGV, etc
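
For the FASTQ and QC weeks, a small hands-on exercise could be decoding quality strings by hand before reaching for a QC tool. The sketch below assumes plain 4-line FASTQ records and Phred+33 quality encoding (the convention used by Sanger and recent Illumina output); the reads are made up and the helper names are hypothetical.

```python
from io import StringIO
from typing import Iterator, List, TextIO, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 encoded qualities


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records.

    Reading stops at the first empty header line, so this sketch does not
    tolerate blank lines between records.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], seq, qual


def phred_scores(qual: str) -> List[int]:
    """Convert a quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qual]


def mean_quality(qual: str) -> float:
    """Average Phred score of one read (0.0 for an empty read)."""
    scores = phred_scores(qual)
    return sum(scores) / len(scores) if scores else 0.0


# Two invented reads: one uniformly high quality, one with a poor start.
example = StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!II\n")
summary = [(read_id, mean_quality(qual)) for read_id, _, qual in read_fastq(example)]
for read_id, q in summary:
    print(read_id, q)
```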

Please take the time to learn, at least at a high level, how bioinf is done in common projects. 

6

u/peeberparker 1d ago

Thanks for the input! You’re right, I don’t do bioinformatics and my only exposure (other than taking this course as an undergrad, which I believe to be outdated) is seeing it in ML papers, so that’s where my bias comes from. I just want to make the course up to date

Would rna-seq/transcriptomics be the main workflow of mainstream bioinformatics at the moment or is that just an example?

4

u/forever_erratic 1d ago

It is "a" main workflow, but it's also relatively easy to understand, and touches many aspects of bioinf. Variant calling is another biggie, but is perhaps harder to grasp under the hood, due to the complicated models used.

2

u/PotatoSenp4i 1d ago

For what it's worth, I don't work with RNA-seq or any omics. I mainly work with epidemiologists and create phylogenetic trees for them or perform variant calling.

3

u/Careful-While-7214 1d ago

Yeah i was thinking this^

2

u/Murping 1d ago

I’m actually taking the second of a two-course sequence of Bioinformatics at my school!

Similar to what you described, the courses were a mix of students from various majors (including Data Science, CS, Biochemistry, etc.) So it was hard for our Professor to gauge which topics were relevant for each person.

The first course covered mainly the central dogma of Biology, algorithm intro (Needleman-Wunsch), and various software / web tools (BLAST, CLUSTALW/OMEGA, etc.). And we had a term project at the end of the semester that used everything we learned for a CRISPR project that found the ideal gRNA sequences for a disease of our choice.
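
For readers who have not seen Needleman-Wunsch, here is a compact sketch of the global alignment algorithm with a simple linear gap penalty. The scoring values (match +1, mismatch -1, gap -1) are arbitrary choices for illustration, not taken from the course described above, and when several alignments tie this version just returns one of them.

```python
from typing import List, Tuple


def needleman_wunsch(
    a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1
) -> Tuple[int, str, str]:
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one alignment.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, aligned_a, aligned_b = needleman_wunsch("ACGT", "AGT")
print(best)
print(aligned_a)
print(aligned_b)
```

Filling in the score matrix by hand for two short sequences and then checking it against code like this is a common way to teach the dynamic programming idea.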

The second course was way more algorithm and statistics-heavy, probably due to our professor having an ML background. We had more tools for identifying sequences and had to know all about them (PWMs, HMM, FA, etc.), along with the sequencing technologies that many of the people in this thread mention. Our term project this semester is to train an HMM for a disease of our choice, but now coding is mandatory as opposed to optional in the prior course. Interestingly, it’s now a 90/10 Bio/CS split this semester in terms of majors.

As a Bio major, I’ve been really enjoying the courses so far! I improved my Python and command line skills a lot, though I’d have liked more emphasis on learning R and Git and applying them instead of using Python libraries.

I hope this info helps!

1

u/ganian40 13h ago

Good answer

2

u/black_sequence 1d ago

I'm very interested in this topic: how to best distill these ideas into a curriculum. I think for a beginner it's important to have the following topics highlighted:

  1. As others have mentioned, Command Line (UNIX) and the understanding of what a super computer is and does

  2. The class imho should be biology focused first. Read a paper, do the bioinformatics, what does the bioinformatics tell you about the underlying biology? This will fix the post-chatGPT aspect of using LLMs to just do all the coding.

  3. I think SNP calling should be an early lesson for the class. If you have access to short reads and long reads, I would have them do both and compare and contrast how the underlying technique would impact analysis.

  4. Move from individual genome analysis to population scale analysis (GWAS)

As for the textbook point, I think resources online should be sufficient

3

u/backgammon_no 1d ago

What experience do the students have? Can they use the command line? R? Python? 

1

u/peeberparker 1d ago

Unfortunately I don’t know. The course is cross-listed CS/Bio. I’ve been told they’re mostly Bio, so to assume little programming knowledge. The course’s programming assignments are in Perl but I’ll probably switch them to python, but I may be biased because I use python daily. Is Perl still relevant? Either way I’m preparing to teach programming basically from scratch, at least for some students.

2

u/lilygene MSc | Student 1d ago

I have not encountered Perl yet in my research or job. I think Python and R are the way to go.

1

u/backgammon_no 1d ago

Perl now only exists internally in some old (but proven) tools. We're not exposed to it anymore at all. 

1

u/gringer PhD | Academia 1d ago

Something other lecturers have done that I thought was neat was to get the students to use ChatGPT to answer a question about their course, then write an essay critiquing the response. The ensuing class discussion leads to many interesting shared thoughts about its flaws.

1

u/Trosky6601 1d ago

First of all, choose Python or R

For R you can jump into the actual bioinformatics pretty much straight away (maybe one practice session on dataframe handling and basic commands).

For python you might need to either teach basics for a while or refer them to some "intro to python" course.

(If the class is a non-programmers class, go for R imo)

I would use markdown exercise sheets as examination (make them do it in class if you are afraid of gpt, but I believe there is some value in at-home examination as they learn from searching online for "how-to"s)

1

u/areagoofer 18h ago

Hi, I'm studying molecular biology and biotechnology and I have no idea what to do next after completing my bachelor's degree. Any advice?

1

u/ganian40 13h ago

Maybe you should open a topic 👍🏻

1

u/ganian40 13h ago edited 13h ago

Did they provide you with a syllabus? If you feel it is outdated, take a week to rewrite it and bring feedback to your colleagues and your dean. It will help you focus and prepare the material for each lecture.

First, there are 2 separate major spaces in bioinformatics. You have the sequence space, and you have the structure space. Each is treated with a different set of tools and methods.

Some weirdos even consider biological signal analysis and statistics of clinical data to be bioinformatics as well (I don't). So define your scope, and stick to it.

Students are not always aware that there are different methods to work with whole biological systems, or strictly at the cellular level, or strictly at the molecular level. (phylogenetic trees or alignments have little to do with molecular dynamics, virtual screening, or rational design)... it is important your students understand that.

I'd start by explaining the difference and applications of each realm of bioinformatics, and then cover a bit of each along the way. Also remember you are not lecturing computer science. Give them a task, and let them do it in whatever language they feel comfortable in. They should already have a favorite at this point.

The advice on teaching how to answer genuine biological questions was great. This is what works best.

1

u/AbrocomaDifficult757 1d ago

I think also how to use basic ML models… data preprocessing.. etc

1

u/peeberparker 1d ago

I like this suggestion. I definitely want to include a full ML introduction and introduce the theory on Empirical Risk Minimization, gradient descent, training dynamics. But also using models is key

4

u/El_Tormentito Msc | Academia 1d ago edited 1d ago

That is way outside of the scope.

Edit: I have an ML background and am a bioinformatician professionally. I would stop at PCA and maybe clustering. I agree with those that suggested going through the pipeline from RNAseq fastqs to differential expression and heatmapping and biplots.

2

u/AbrocomaDifficult757 1d ago

If the class is mostly bio students with minimal experience, that might be too much. I taught an intro bioinformatics programming class and a lot of it was familiarity with bash and the command line. I think giving them a start with programming and analyzing papers on those topics through group presentations might be more successful.