[LLM-based Platform for Automated Protein Structure Analysis and Drug Development - Poster presented at the 2025 Autumn Conference of the Korean Society of Medical and Biological Engineering]
- Paul
- Nov 6
Introduction
Challenge in Drug Development:
Protein structure preprocessing is a critical step that determines the accuracy of molecular docking and molecular dynamics simulations
Experimental structures (X-ray, NMR) often contain water molecules, crystallization ligands, and metal ions that are unnecessary for simulation, and their hydrogen information is frequently incomplete
These differences from physiological conditions reduce the reliability of computational studies
Limitations of Current Preprocessing:
Requires sequential use of multiple software packages (PyMOL, ChimeraX, Schrödinger Suite)
Demands specialized knowledge in structural biology and computational chemistry for each step
Low accessibility due to expensive licenses, installation requirements, and compatibility issues
Data can be lost during file format conversion, and researcher-dependent variation undermines reproducibility and standardization
Recent Technological Trends:
Advances in web-based molecular visualization enable direct structure manipulation in the browser
High-performance web viewers like Mol* can render complex structures
WebGL and WebAssembly bring near-desktop performance to web applications
LLMs expand possibilities for automating structure analysis and quality assessment
Project Goal: Develop a platform that combines web and AI technologies to automate the entire protein structure preprocessing workflow and that runs directly in the web browser.
Method
Preprocessing Process (6 Steps):
Water Removal: Remove water molecules (residue names HOH, WAT) from the input PDB file (*user-specified waters can be preserved)
Hydrogen Addition: Automatically add a basic set of hydrogens to nitrogen and oxygen atoms
Charge Assignment: Assign default charges per amino acid residue; more precise charges are calculated when a backend is connected
Ligand Removal: Remove selected ligands to generate an apo structure
Metal Ion Management: Remove metal ions that are not structurally or catalytically important
PDBQT Output: Generate stable PDBQT output (*reusable, which improves batch processing efficiency)
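The water, ligand, and metal ion removal steps above can be sketched as simple filters over PDB records. This is a minimal TypeScript sketch, not the platform's actual code: the function names, the `keep` option, and its "chain ID + residue number" key format are assumptions made for illustration. It relies only on the fixed-column PDB layout, in which the residue name occupies columns 18-20, the chain ID column 22, and the residue number columns 23-26.

```typescript
// Residue names treated as water, taken from the step description above.
const WATER_NAMES = new Set(["HOH", "WAT"]);

// True for coordinate records (ATOM or HETATM).
function isAtomRecord(line: string): boolean {
  return line.startsWith("ATOM") || line.startsWith("HETATM");
}

// Residue name from PDB columns 18-20 (0-based slice 17..20).
function residueName(line: string): string {
  return line.substring(17, 20).trim();
}

// Hypothetical residue key: chain ID (column 22) + residue number (columns 23-26).
function residueKey(line: string): string {
  return line.charAt(21) + line.substring(22, 26).trim();
}

// Drop water atoms, except residues listed in `keep` (assumed option, e.g. "A301").
function removeWaters(pdb: string, keep: Set<string> = new Set()): string {
  return pdb
    .split("\n")
    .filter((line) => {
      const isWater = isAtomRecord(line) && WATER_NAMES.has(residueName(line));
      return !isWater || keep.has(residueKey(line));
    })
    .join("\n");
}

// Drop HETATM records whose residue name is in `names`.
// The same filter covers selected ligands and unwanted metal ions (e.g. "ZN").
function removeHetResidues(pdb: string, names: Set<string>): string {
  return pdb
    .split("\n")
    .filter((line) => !(line.startsWith("HETATM") && names.has(residueName(line))))
    .join("\n");
}
```

A fuller implementation would also need to drop CONECT records that reference removed atoms; that bookkeeping is omitted here to keep the filters readable.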
Workflow: PDB file upload → JavaScript engine parsing → automatic step-by-step execution to obtain results
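The upload, parse, and step-by-step execution flow can be pictured as a list of text-to-text steps run in order, with each intermediate result kept so that changes can be inspected per step. The sketch below is an assumption about how such a runner might look; `Step`, `runPipeline`, `lineFilter`, and the residue name `LIG` are hypothetical, and the line-count delta is only a crude stand-in for a real diff.

```typescript
// One preprocessing step: a name plus a pure text-to-text transform.
interface Step {
  name: string;
  run: (pdb: string) => string;
}

// What the runner records after each step.
interface StepResult {
  name: string;
  output: string;
  lineDelta: number; // lines after minus lines before (negative = lines removed)
}

const countLines = (text: string): number =>
  text === "" ? 0 : text.split("\n").length;

// Run the steps in order, feeding each step's output into the next one.
function runPipeline(input: string, steps: Step[]): StepResult[] {
  const results: StepResult[] = [];
  let current = input;
  for (const step of steps) {
    const output = step.run(current);
    results.push({
      name: step.name,
      output,
      lineDelta: countLines(output) - countLines(current),
    });
    current = output;
  }
  return results;
}

// Helper that builds a step from a per-line predicate.
function lineFilter(name: string, keep: (line: string) => boolean): Step {
  return { name, run: (pdb) => pdb.split("\n").filter(keep).join("\n") };
}

// Example steps. The residue name sits in columns 18-20, i.e. 11 characters
// after the 6-character record name.
const exampleSteps: Step[] = [
  lineFilter("water-removal", (l) => !/^(ATOM  |HETATM).{11}(HOH|WAT)/.test(l)),
  lineFilter("ligand-removal", (l) => !/^HETATM.{11}LIG/.test(l)), // "LIG" is a placeholder
];
```

Because each step is an independent function, a single step can be run on its own by passing a one-element list to `runPipeline`.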
LLM-Integrated Analysis System:
Built an LLM integration module based on OpenAI GPT-4
Automatically generates an expert-level analysis report when a structure file is provided
Reports quantitative quality metrics: Clashscore, Ramachandran plot statistics, B-factors
Assesses the structure's druggability and docking success rate, and provides improvement recommendations
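Of the metrics listed, B-factors can be read straight from the structure file, since the temperature factor occupies columns 61-66 of each ATOM/HETATM record. The following sketch summarizes them and packs the summary into role/content chat messages for an LLM request. The function names and the prompt wording are assumptions, no API call is made, and Clashscore and Ramachandran statistics are left out because they need geometry calculations beyond this sketch.

```typescript
interface BFactorSummary {
  atoms: number;
  mean: number;
  max: number;
}

// Collect B-factors from PDB columns 61-66 (0-based slice 60..66).
function summarizeBFactors(pdb: string): BFactorSummary {
  let atoms = 0;
  let sum = 0;
  let max = 0;
  for (const line of pdb.split("\n")) {
    if (!line.startsWith("ATOM") && !line.startsWith("HETATM")) continue;
    const b = parseFloat(line.substring(60, 66));
    if (Number.isNaN(b)) continue; // skip records without a readable B-factor
    atoms += 1;
    sum += b;
    if (b > max) max = b;
  }
  return { atoms, mean: atoms === 0 ? 0 : sum / atoms, max };
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Build the messages for a report request; the wording is purely illustrative.
function buildReportMessages(name: string, s: BFactorSummary): ChatMessage[] {
  return [
    {
      role: "system",
      content: "You are a structural biology assistant. Assess structure quality for docking.",
    },
    {
      role: "user",
      content: [
        `Structure: ${name}`,
        `Atoms with B-factors: ${s.atoms}`,
        `Mean B-factor: ${s.mean.toFixed(2)}`,
        `Max B-factor: ${s.max.toFixed(2)}`,
      ].join("\n"),
    },
  ];
}
```

Sending precomputed numbers rather than the raw file keeps the prompt short and makes the quantitative part of the report independent of the model.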
Platform Features and Implementation:
Technical Implementation: Data persistence with IndexedDB, parallel processing with the Web Workers API, offline support as a Progressive Web App (PWA)
Visualization: Real-time rendering with the Mol* viewer, step-by-step change verification with a diff viewer
Result Storage: Results can be saved as a ZIP archive or as individual files; LLM reports are saved and reused as PDF
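Multi-file processing with progress and error reporting can be sketched with plain Promises. In the sketch below, `processOne` stands in for whatever does the per-file work; in a browser it could post the file to a Web Worker and resolve when the worker replies, but that wiring is an assumption and is not shown. All names here are hypothetical.

```typescript
interface FileJob {
  name: string;
  text: string;
}

interface FileOutcome {
  name: string;
  ok: boolean;
  output?: string; // set when processing succeeded
  error?: string;  // set when processing failed
}

// Process all files concurrently. A failure in one file is captured in its
// outcome instead of aborting the batch, and progress is reported per file.
async function processBatch(
  jobs: FileJob[],
  processOne: (text: string) => Promise<string> | string,
  onProgress: (done: number, total: number) => void = () => {},
): Promise<FileOutcome[]> {
  let done = 0;
  const tasks = jobs.map(async (job): Promise<FileOutcome> => {
    try {
      const output = await processOne(job.text);
      return { name: job.name, ok: true, output };
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      return { name: job.name, ok: false, error };
    } finally {
      done += 1;
      onProgress(done, jobs.length);
    }
  });
  // Outcomes come back in the same order as the input jobs.
  return Promise.all(tasks);
}
```

Catching errors per file is what keeps the batch stable: one malformed structure produces an error entry for that file while the remaining files still complete.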
Results
System Configuration:
Designed an automatic preprocessing algorithm based on web standards
Runs automatically in the browser without separate installation
Visualization and processing implemented with the Mol* viewer and JavaScript
Six-step preprocessing pipeline in which each step can also be executed independently
Stable simultaneous processing of multiple files, with progress and error information reported to the user
Usable immediately without software installation or a license, which improves accessibility and scalability
System Performance Evaluation:
Small proteins: Fast processing
Large proteins: Completed within reasonable time
Consistent result quality across various sizes
Verified the intuitiveness and reliability of the user interface
Batch Processing and Accessibility:
Immediately usable without installation/license
Consistent, reliable LLM-assisted preprocessing
Preprocessing workflow runs automatically when a structure file is provided
Key Benefits: The proposed web-based preprocessing platform serves as an efficient, reproducible tool for processing protein structures of various sizes. By automating and standardizing the entire preprocessing workflow, it provides an integrated solution for the preprocessing steps that computational drug development studies require.


