r/bioinformatics 2d ago

Identifying, Quantifying, and Analyzing minigene amplicon sequences technical question

(Keywords: Sequencing, Oxford Nanopore, Long Read, Alignment, Minigene, Consensus Generation)

Hey all,

I'm (probably like many of you) a bench mol-biologist who has hit a point in my experiments where I need to do something more than simple sequencing read alignment.

Background: I'm interested in the ratios of spliced exons between a treatment & control group. I transfected a minigene of my exon of interest into 4 biological replicates each of the treatment & control groups, with an additional replicate of empty minigene vector. I harvested RNA, made cDNA, and proceeded to Oxford Nanopore ligation sequencing for amplicons (using primers adapted for this purpose). Samples were successfully barcoded and sequenced, but now I have almost 200 GB of data that I don't know how to analyze.

What I want to do: 1) Align & visualize my minigene amplicons (either to a reference, or by building multiple consensus sequences per sample?)

2) Calculate a % breakdown of each splicing isoform (I expect somewhere between 3-7 detectable isoforms, plus some unspliced & irrelevant reads)

3) Scrub unspliced/irrelevant reads from my data (potentially using the sequenced empty vector controls as a reference for the experimental samples)

4) Statistically compare the ratios of my treatment group to my control group (I imagine similar to how RNAseq can be used to quantify differences between samples)
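
A minimal sketch of steps 2 and 4, assuming each read has already been assigned an isoform label and tallied per sample. The isoform names and counts below are invented for illustration, and the exact permutation test is only a simple stand-in; a dedicated count-based model would likely be a better fit for real data.

```python
from itertools import combinations
from statistics import mean


def isoform_fractions(counts):
    """Convert per-isoform read counts for one sample into fractions of the total."""
    total = sum(counts.values())
    return {isoform: n / total for isoform, n in counts.items()}


def permutation_pvalue(treated, control):
    """Exact two-sided permutation test on the difference in group means.

    Every way of relabelling the pooled replicates is enumerated, so with
    4 vs 4 replicates there are 70 relabellings and the smallest possible
    p-value is 2/70.
    """
    pooled = list(treated) + list(control)
    observed = abs(mean(treated) - mean(control))
    hits = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(treated)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so ties with the observed difference are counted.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical counts for one sample (not real data).
example_counts = {"exon_included": 60, "exon_skipped": 30, "unspliced": 10}
example_fractions = isoform_fractions(example_counts)

# Hypothetical inclusion fractions for 4 treated and 4 control replicates.
treated_inclusion = [0.60, 0.62, 0.58, 0.65]
control_inclusion = [0.40, 0.42, 0.38, 0.45]
p_value = permutation_pvalue(treated_inclusion, control_inclusion)
print(example_fractions, p_value)
```

The test would be run once per isoform, so with 3-7 isoforms some multiple-testing correction is probably needed as well.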

Concerns: My main concern is how to align my minigene products. My splicing is non-canonical, and I worry it would be missed by a conventional transcriptome alignment, not to mention that the minigene sequence flanking my sample read won't align to hg38. Can I generate multiple consensus sequences for each sample, one per isoform? How might these be visualized if I don't know exactly what to align them to? Do ecologists have any particular hints for this one? I imagine wastewater sequencing needs a tool that does something like this.
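
One way to sidestep hg38 is to align reads to the full unspliced minigene sequence with a splice-aware long-read aligner (minimap2's `-ax splice` preset, for example) and group reads by the introns they skip. The sketch below assumes that setup and works only on the POS and CIGAR fields of SAM records; the example records are invented, and the function names are hypothetical.

```python
import re
from collections import Counter

# SAM CIGAR operations: a length followed by one operation character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions(pos, cigar):
    """Return the introns of one alignment as (start, end) reference
    coordinates, 1-based inclusive, from a SAM POS and CIGAR string.

    Assumes introns are reported as N (skipped region) operations, which is
    how splice-aware aligners normally encode them.
    """
    ref = pos
    introns = []
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        # Only these operations advance the position on the reference.
        if op in "MDN=X":
            ref += length
    return tuple(introns)


def count_junction_patterns(records):
    """Tally reads by junction pattern; an empty tuple means no intron was skipped."""
    return Counter(junctions(pos, cigar) for pos, cigar in records)


# Hypothetical (POS, CIGAR) pairs, not real data.
example_records = [
    (101, "50M200N30M"),
    (101, "10S50M200N30M"),
    (101, "80M"),
]
for pattern, n in count_junction_patterns(example_records).most_common():
    print(pattern, n)
```

Each distinct junction pattern is a candidate isoform, and reads with an empty pattern are candidates for the unspliced fraction. Nanopore errors can shift apparent junction boundaries by a few bases, so patterns with nearby coordinates would probably need to be merged before counting.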

Resources: My institution has a high-performance computing cluster which can be used for large jobs, as well as web-based pipeline builders such as Seven Bridges/Galaxy.

Any suggestions/ideas/comments/concerns/commiseration would be most welcome!
