
[LLM-based Platform for Automated Protein Structure Analysis and Drug Development: Poster Presented at the 2025 Autumn Conference of the Korean Society of Medical and Biological Engineering]

  • Writer: Paul
  • Nov 6

Introduction


Challenges in Drug Development:


  • Protein structure preprocessing is a critical step that determines the accuracy of molecular docking and molecular dynamics simulations

  • Experimental structures (X-ray, NMR) often contain unneeded water molecules, crystallization ligands, and metal ions, and their hydrogen information is incomplete

  • These differences from the physiological environment reduce the reliability of computational studies


Limitations of Current Preprocessing:


  • Requires the sequential use of multiple software packages (PyMOL, ChimeraX, Schrödinger Suite)

  • Demands specialized knowledge of structural biology and computational chemistry at each step

  • Low accessibility due to expensive licenses, installation requirements, and compatibility issues

  • Data loss during file conversion, and poor reproducibility and standardization because results vary between researchers


Recent Technological Trends:


  • Advances in web-based molecular visualization enable direct structure manipulation in the browser

  • High-performance web viewers such as Mol* can render complex structures

  • WebGL and WebAssembly bring near-desktop performance to web applications

  • LLMs open new possibilities for automating structure analysis and quality assessment


Project Goal: Integrate web and AI technologies into a single platform that automates the entire protein structure preprocessing workflow and runs directly in the web browser.


Method


Preprocessing Process (6 Steps):


  1. Water Removal: Remove water molecules (residue names HOH, WAT) from the input PDB file (*users can choose to keep specific waters)

  2. Hydrogen Addition: Automatically add basic hydrogens to nitrogen and oxygen atoms

  3. Charge Assignment: Assign basic charges per amino acid residue; more precise charges are calculated when a backend is connected

  4. Ligand Removal: Remove selected ligands to generate an apo structure

  5. Metal Ion Management: Remove metal ions that are not structurally or catalytically important

  6. PDBQT Output: Generate a stable PDBQT output file (*reusable, which makes batch processing more efficient)
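Steps 1, 4, and 5 amount to filtering atom records by residue name. The Python sketch below is only an illustration, not the platform's JavaScript implementation: `filter_pdb_lines`, its parameters, and the `METAL_RESNAMES` set are hypothetical, and deciding which metals are structurally or catalytically important is left to the caller. It relies only on the fixed PDB column layout (residue name in columns 18-20, chain ID in column 22, residue number in columns 23-26).

```python
from collections.abc import Container

# Residue names treated as water (HOH, WAT as in step 1).
WATER_RESNAMES = {"HOH", "WAT"}
# Assumed, non-exhaustive set of metal-ion residue names (hypothetical).
METAL_RESNAMES = {"NA", "K", "MG", "CA", "MN", "FE", "CO", "NI", "CU", "ZN", "CD"}
# Per-atom records that carry a residue name and number.
ATOM_RECORDS = ("ATOM", "HETATM", "ANISOU")


def filter_pdb_lines(
    pdb_text: str,
    keep_waters: Container[tuple[str, int]] = frozenset(),
    remove_ligands: Container[str] = frozenset(),
    keep_metals: Container[str] = frozenset(),
) -> str:
    """Hypothetical filter for steps 1, 4 and 5.

    keep_waters:    (chain ID, residue number) of waters the user chose to keep
    remove_ligands: residue names of ligands to strip (apo structure)
    keep_metals:    metal residue names judged structurally/catalytically important
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(ATOM_RECORDS):
            resname = line[17:20].strip()                # columns 18-20
            chain, resnum = line[21], int(line[22:26])   # column 22, columns 23-26
            if resname in WATER_RESNAMES and (chain, resnum) not in keep_waters:
                continue  # step 1: water removal
            if resname in remove_ligands:
                continue  # step 4: ligand removal
            if resname in METAL_RESNAMES and resname not in keep_metals:
                continue  # step 5: metal ion removal
        kept.append(line)  # protein atoms and non-atom records pass through
    return "\n".join(kept) + "\n"
```

CONECT records that point at removed atoms are not cleaned up in this sketch; a fuller version would need to handle them as well.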


Workflow: PDB file upload → JavaScript engine parsing → automatic step-by-step execution to obtain results
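One way to read this workflow is as a parser followed by a chain of step functions that each take and return a list of atoms, which is also what allows any step to run on its own. The sketch below assumes that design and is not taken from the platform's code; `Atom`, `parse_pdb`, `run_pipeline`, and the charge table are hypothetical. The charge step only illustrates the idea of basic per-residue charges, using textbook formal side-chain charges near pH 7 and ignoring termini and histidine protonation.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Atom:
    record: str
    name: str
    resname: str
    chain: str
    resnum: int
    x: float
    y: float
    z: float
    bfactor: float
    charge: float = 0.0  # set by the charge-assignment step


def parse_pdb(text: str) -> list[Atom]:
    """Parse ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append(Atom(
                record=line[0:6].strip(),
                name=line[12:16].strip(),
                resname=line[17:20].strip(),
                chain=line[21],
                resnum=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
                bfactor=float(line[60:66]),
            ))
    return atoms


Step = Callable[[list[Atom]], list[Atom]]


def remove_waters(atoms: list[Atom]) -> list[Atom]:
    return [a for a in atoms if a.resname not in {"HOH", "WAT"}]


# Assumed formal side-chain charges near pH 7, each placed on one representative
# atom so that summing atom charges gives the net formal charge (hypothetical).
FORMAL_CHARGE_SITES = {"ASP": ("CG", -1.0), "GLU": ("CD", -1.0),
                       "LYS": ("NZ", 1.0), "ARG": ("CZ", 1.0)}


def assign_residue_charges(atoms: list[Atom]) -> list[Atom]:
    out = []
    for a in atoms:
        site = FORMAL_CHARGE_SITES.get(a.resname)
        out.append(replace(a, charge=site[1]) if site and site[0] == a.name else a)
    return out


def run_pipeline(atoms: list[Atom], steps: list[Step]) -> list[Atom]:
    """Apply each step in order; every step can also be called on its own."""
    for step in steps:
        atoms = step(atoms)
        print(f"{step.__name__}: {len(atoms)} atoms")  # per-step progress
    return atoms
```

For example, `atoms = run_pipeline(parse_pdb(pdb_text), [remove_waters, assign_residue_charges])` runs two steps in order, after which `sum(a.charge for a in atoms)` gives the net formal charge.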


LLM-Integrated Analysis System:

  • Built an LLM integration module based on OpenAI GPT-4

  • Automatically generates an expert-level analysis report when a structure file is provided

  • Provides quantitative metrics: Clashscore, Ramachandran plot, and B-factors

  • Analyzes the structure's druggability and expected docking success rate, and recommends improvements
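To give the LLM something quantitative to reason about, simple metrics can be computed locally and embedded in the prompt. The sketch below is an assumption about how this could look, not the platform's module: the helper names and the 2.2 Å cutoff are hypothetical, the close-contact count is a crude distance-based proxy and not MolProbity's Clashscore (which counts all-atom overlaps of at least 0.4 Å per 1000 atoms, with hydrogens added), Ramachandran analysis is not shown, and the GPT-4 request itself is omitted.

```python
import math
import statistics


def parse_heavy_atoms(pdb_text: str) -> list[tuple]:
    """(chain, residue number, atom name, (x, y, z), B-factor) for non-hydrogen ATOM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[76:78].strip() != "H":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[21], int(line[22:26]), line[12:16].strip(), xyz,
                          float(line[60:66])))
    return atoms


def close_contacts_per_1000(atoms: list[tuple], cutoff: float = 2.2) -> float:
    """Crude clash proxy (NOT MolProbity Clashscore): heavy-atom pairs closer than
    `cutoff` Å per 1000 atoms, skipping same/adjacent residues and SG-SG pairs."""
    clashes = 0
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            if a[0] == b[0] and abs(a[1] - b[1]) <= 1:
                continue  # same or covalently adjacent residue
            if a[2] == b[2] == "SG":
                continue  # likely a disulfide bond, not a clash
            if math.dist(a[3], b[3]) < cutoff:
                clashes += 1
    return 1000 * clashes / len(atoms) if atoms else 0.0


def structure_metrics(pdb_text: str) -> dict:
    atoms = parse_heavy_atoms(pdb_text)
    bfactors = [a[4] for a in atoms]
    return {
        "heavy_atoms": len(atoms),
        "mean_bfactor": statistics.fmean(bfactors) if bfactors else float("nan"),
        "max_bfactor": max(bfactors, default=float("nan")),
        "close_contacts_per_1000": close_contacts_per_1000(atoms),
    }


def build_report_prompt(metrics: dict) -> str:
    """Hypothetical prompt text; sending it to the model is not shown."""
    lines = [f"- {key}: {value:.4g}" for key, value in metrics.items()]
    return ("You are a structural biology expert. Using these metrics:\n"
            + "\n".join(lines)
            + "\nAssess druggability and likely docking success, and recommend improvements.")
```

Keeping the numbers outside the model and passing them in as text means the report can cite values that were actually computed rather than estimated by the LLM.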


Platform Features and Implementation:

  • Technical Implementation: Data persistence with IndexedDB, parallel processing with the Web Workers API, and offline support as a Progressive Web App (PWA)

  • Visualization: Real-time rendering with the Mol* viewer, and a diff viewer to verify the changes made at each step

  • Result Storage: Save results as a ZIP archive or as individual files; LLM reports can be saved as PDF for later use
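Bundling all outputs into one ZIP archive can be done entirely in memory. This Python sketch uses the standard `zipfile` module as an illustration; in the browser the platform would need a JavaScript equivalent, which the post does not name. `bundle_results` and any file names passed to it are hypothetical.

```python
import io
import zipfile


def bundle_results(files: dict[str, str]) -> bytes:
    """Pack processed outputs (archive path -> text content) into an in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path, text in files.items():
            archive.writestr(path, text)  # str content is stored UTF-8 encoded
    return buffer.getvalue()
```

The returned bytes can be written to a `.zip` file or offered as a single download, while the same dictionary still allows saving each file individually.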


Results


System Configuration:

  • Automatic preprocessing algorithms designed on web standards

  • Runs automatically in the browser with no separate installation

  • Visualization and processing implemented with the Mol* viewer and JavaScript

  • Six-step preprocessing pipeline in which each step can also be run independently

  • Stable simultaneous processing of multiple files, with progress and error reporting for better user feedback

  • Web-based platform immediately usable without software installation or license → improved accessibility and scalability
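Multi-file processing with progress and per-file error reporting can be sketched as follows. The platform does this with Web Workers in the browser; this Python version swaps in `concurrent.futures.ThreadPoolExecutor` as a stand-in, and `process_batch` and the `preprocess` callback are hypothetical. A failure in one file is recorded in `errors` instead of aborting the whole batch.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def process_batch(
    files: dict[str, str],
    preprocess: Callable[[str], str],
    max_workers: int = 4,
) -> tuple[dict[str, str], dict[str, str]]:
    """Run `preprocess` on every file concurrently; return (results, errors) by file name."""
    results: dict[str, str] = {}
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(preprocess, text): name for name, text in files.items()}
        for done, future in enumerate(as_completed(futures), start=1):
            name = futures[future]
            try:
                results[name] = future.result()
                status = "ok"
            except Exception as exc:  # record the failure and keep processing the rest
                errors[name] = f"{type(exc).__name__}: {exc}"
                status = "error"
            print(f"[{done}/{len(files)}] {name}: {status}")  # progress report
    return results, errors
```

Threads keep the sketch simple; for CPU-heavy preprocessing in Python, a `ProcessPoolExecutor` with a module-level `preprocess` function would be the closer analogue to separate Web Workers.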


System Performance Evaluation:

  • Small proteins: Fast processing

  • Large proteins: Completed within a reasonable time

  • Consistent result quality across protein sizes

  • Verified that the user interface is intuitive and reliable


Batch Processing and Accessibility:

  • Immediately usable without installation/license

  • Consistent and reliable LLM-based intelligent preprocessing

  • Preprocessing workflow runs automatically once a structure file is provided


Key Benefits: The proposed web-based preprocessing platform can serve as an efficient, highly reproducible tool that processes protein structures of various sizes consistently. By automating and standardizing the entire preprocessing workflow, it provides an integrated solution for the essential preprocessing steps in computational drug-development studies.

 
 
 

© 2019-2025, Paul & Companies | AI Cloud Tech leaders Insight  All rights reserved.
