/sci/ - Bio Informatics - Science & Math


Anonymous
Bio Informatics 03/11/26(Wed)04:50:12 No.16927424

File: genomopipe.webm (945 KB, 1420x928)

I've set up a bioinformatics toolchain that I run on my Arch Linux machine.

The workflow is managed by two shell scripts: one does the fetch, and it is called by the other, which runs the tools.
You pass the organism name on the command line, then it runs for a long time and outputs reference genomes, annotations, filtered protein sequences, designed backbones + sequences + predicted structures, validation results, logs, and a README summary.
The steps it goes through and the tools are:
Genome download (RefSeq preferred, falling back to GenBank, then to a clustered fallback) + header cleanup
Quality control (adapter trimming, basic stats): BBTools (bbduk)
Repeat masking: RepeatModeler, RepeatMasker
Gene prediction: BRAKER3 for eukaryotes (OrthoDB + optional RNA hints) or Prokka for prokaryotes
Protein extraction/filtering (longest isoform, ≥100 aa prok / >200 aa euk): gffread and Biopython
Backbone generation: RFdiffusion (de novo or motif-scaffolded)
Sequence design: ProteinMPNN (8 variants per backbone)
Batch structure prediction: ColabFold (AlphaFold2)
Remote BLASTp validation against nr (top hit recorded)
Functional renaming based on BLAST inference
Markdown report
(Everything is managed in Conda envs.)
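Rough sketch of the protein filtering step in plain Python, to show the logic only. This is not the actual script (that one uses gffread and Biopython). The header convention (isoforms named like `gene.t1`, `gene.t2`) and the function names `read_fasta` and `filter_proteins` are assumptions made up for the example.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_proteins(records, min_len, strict=False):
    """Keep the longest isoform per gene, then apply a length cutoff.

    Assumption: the first word of each header looks like "gene.t1", so the
    gene ID is everything before the last dot. IDs without a dot are treated
    as their own gene.
    strict=False keeps len >= min_len; strict=True keeps len > min_len.
    """
    longest = {}
    for header, seq in records:
        seq = seq.rstrip("*")  # drop a trailing stop-codon marker if present
        name = header.split()[0]
        gene = name.rsplit(".", 1)[0]
        if gene not in longest or len(seq) > len(longest[gene][1]):
            longest[gene] = (name, seq)
    kept = []
    for name, seq in longest.values():
        passes = len(seq) > min_len if strict else len(seq) >= min_len
        if passes:
            kept.append((name, seq))
    return kept


# Thresholds from the step list: >=100 aa for prokaryotes, >200 aa for eukaryotes.
PROK = {"min_len": 100, "strict": False}
EUK = {"min_len": 200, "strict": True}
```

Usage would be something like `filter_proteins(read_fasta(open("proteins.faa").read()), **EUK)`, where `proteins.faa` is a placeholder file name.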
The toolchain automates the entire process, from an organism name through reference genome retrieval, quality control and annotation, protein extraction, AI-driven de novo protein design (backbones and sequences), structure prediction, BLAST-based validation, and functional renaming, to a Markdown report. There's also an Electron GUI for handing the output to other programs for visualization and for making queries in a search bar, with a fallback 3Dmol.js viewer.
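For the BLAST validation and renaming steps, here is a minimal sketch of how "top hit recorded" and "functional renaming" could fit together. It assumes tab-separated BLAST output with the columns `qseqid sseqid pident evalue bitscore stitle` (a custom tabular layout you can request from BLAST+ with `-outfmt "6 ..."`). The e-value cutoff of 1e-5, the "hypothetical protein" fallback label, and the function names are assumptions for illustration, not what the scripts necessarily do.

```python
def top_hits(tabular_text):
    """Return the best-scoring hit per query from BLAST tabular text.

    Assumed columns: qseqid, sseqid, pident, evalue, bitscore, stitle.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        qseqid, sseqid, pident, evalue, bitscore, stitle = line.split("\t")[:6]
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid]["bitscore"]:
            best[qseqid] = {
                "sseqid": sseqid,
                "pident": float(pident),
                "evalue": float(evalue),
                "bitscore": score,
                "stitle": stitle,
            }
    return best


def rename(protein_id, hit, max_evalue=1e-5):
    """Build a new FASTA header from the top hit, if it is convincing enough."""
    if hit is None or hit["evalue"] > max_evalue:
        return f"{protein_id} hypothetical protein"
    # nr titles usually end in "[Organism name]"; keep only the description.
    title = hit["stitle"].split(" [")[0].strip()
    return f"{protein_id} {title} (inferred from {hit['sseqid']})"
```

Keeping the subject accession in the new header means the inference can be traced back to the hit it came from.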

Anonymous
03/11/26(Wed)05:14:42 No.16927431

>>16927424
Did you vibe code it?

Anonymous
03/11/26(Wed)06:34:56 No.16927454

>>16927424
How would that be interesting to a biologist? It seems neat for a data hoarder, but as a biologist you're usually trying to figure out a protein or a pathway. Predicted TFs, binding domains, homology to other known proteins, and ATAC-seq + RNA-seq of the gene of interest under different conditions are quite useful, and you don't save that data.
As far as I understand it, most genomes people work on have already been annotated better than your pipeline can manage, and the ones that haven't usually don't have enough homology for it to be useful.

Anonymous
03/12/26(Thu)04:19:47 No.16928112

File: demos.png (185 KB, 900x1223)

>>16927454
it does design too now
https://zenodo.org/records/18976525

Anonymous
03/13/26(Fri)05:43:21 No.16928645

>>16927431
yes
