[LLM-based Platform for Automated Protein Structure Analysis and Drug Development: Poster presented at the 2025 Autumn Conference of the Korean Society of Medical and Biological Engineering]

Paul
Nov 6

Introduction

Challenge in Drug Development:

Protein structure preprocessing is a critical step that determines the accuracy of molecular docking and molecular dynamics simulations
Experimental structures (X-ray, NMR) contain unnecessary water molecules, crystallization ligands, metal ions, and incomplete hydrogen information
These differences from physiological conditions reduce the reliability of computational studies that use the raw structures

Limitations of Current Preprocessing:

Requires sequential use of multiple software packages (PyMOL, ChimeraX, Schrödinger Suite)
Demands specialized knowledge in structural biology and computational chemistry for each step
Low accessibility due to expensive licenses, installation requirements, and compatibility issues
Data loss during file conversion and lack of reproducibility/standardization due to researcher-dependent variations

Recent Technological Trends:

Advances in web-based molecular visualization enable direct structure manipulation in browsers
High-performance web viewers like Mol* can render complex structures
WebGL and WebAssembly achieve desktop-level performance in web environments
LLMs expand possibilities for automating structure analysis and quality assessment

Project Goal: Develop a platform that integrates web and AI technologies to automate the entire protein structure preprocessing workflow and that runs directly in the web browser.

Method

Preprocessing Process (6 Steps):

Water Removal: Remove water molecules (residue names HOH, WAT) from the input PDB (*user-specified waters can be preserved)
Hydrogen Addition: Automatically add basic hydrogens around nitrogen and oxygen atoms
Charge Assignment: Assign basic charges per amino acid residue; more precise calculation is available when a backend is connected
Ligand Removal: Remove selected ligands to generate the apo structure
Metal Ion Management: Remove metal ions that are not structurally or catalytically important
PDBQT Output: Generate stabilized PDBQT output (*reusable, which makes batch processing more efficient)
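The record-filtering steps (water, ligand, and metal ion removal) can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the platform itself runs a JavaScript engine, and it is not the poster's implementation. The function names, the `LIG` residue name, and the short `METAL_NAMES` set are assumptions made for illustration. Only the fixed-column layout of ATOM/HETATM records comes from the PDB format.

```python
WATER_NAMES = {"HOH", "WAT"}  # water residue names listed in the Water Removal step
# Assumed, deliberately incomplete set of metal-ion residue names.
METAL_NAMES = {"ZN", "MG", "CA", "FE", "MN", "CU"}


def residue_name(line):
    """Residue name: columns 18-20 of an ATOM/HETATM record."""
    return line[17:20].strip()


def residue_key(line):
    """(chain ID, residue number): column 22 and columns 23-26."""
    return (line[21], int(line[22:26]))


def clean_structure(lines, keep_waters=(), ligands=(), keep_metals=()):
    """Drop waters, selected ligands, and metal ions unless explicitly kept.

    Deciding which metals are structurally or catalytically important is
    left to the caller via keep_metals; this sketch does not infer it.
    """
    kept = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            name = residue_name(line)
            if name in WATER_NAMES:
                if residue_key(line) not in keep_waters:
                    continue
            elif line.startswith("HETATM"):
                if name in ligands:
                    continue
                if name in METAL_NAMES and residue_key(line) not in keep_metals:
                    continue
        kept.append(line)
    return kept


# Made-up records for illustration only.
SAMPLE = [
    "ATOM      1  N   MET A   1      10.000  10.000  10.000  1.00 15.00",
    "HETATM    2  O   HOH A 101      11.000  11.000  11.000  1.00 30.00",
    "HETATM    3  O   HOH A 102      11.500  11.500  11.500  1.00 30.00",
    "HETATM    4 ZN    ZN A 201      12.000  12.000  12.000  1.00 25.00",
    "HETATM    5  C1  LIG A 301      13.000  13.000  13.000  1.00 40.00",
]

cleaned = clean_structure(SAMPLE, keep_waters={("A", 102)}, ligands={"LIG"})
print([residue_name(line) for line in cleaned])  # ['MET', 'HOH']
```

Keying preserved waters and metals by chain and residue number, rather than by residue name, lets a user keep one specific water or ion while the rest are removed.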

Workflow: PDB file upload → JavaScript engine parsing → automatic step-by-step execution to obtain results
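One way to picture this workflow is as an ordered list of step functions applied to the parsed records, with a snapshot kept after each step so that intermediate results can be inspected. The sketch below is in Python for brevity (the platform uses a JavaScript engine), and `run_pipeline` and the two simplified stand-in steps are hypothetical names, not the platform's API.

```python
def run_pipeline(lines, steps):
    """Apply each (name, function) step in order, keeping a snapshot per step."""
    current = list(lines)
    snapshots = [("input", list(current))]
    for name, step in steps:
        current = step(current)
        snapshots.append((name, list(current)))
    return current, snapshots


def remove_water(lines):
    # Residue name sits in columns 18-20 of ATOM/HETATM records.
    return [line for line in lines if line[17:20] not in ("HOH", "WAT")]


def remove_hetero(lines):
    # Simplified stand-in for ligand removal: drops every HETATM record.
    return [line for line in lines if not line.startswith("HETATM")]


# Made-up records for illustration only.
lines = [
    "ATOM      1  N   MET A   1      10.000  10.000  10.000  1.00 15.00",
    "HETATM    2  O   HOH A 101      11.000  11.000  11.000  1.00 30.00",
    "HETATM    3  C1  LIG A 301      13.000  13.000  13.000  1.00 40.00",
]
steps = [("water_removal", remove_water), ("ligand_removal", remove_hetero)]

final, snapshots = run_pipeline(lines, steps)
for name, snapshot in snapshots:
    print(name, len(snapshot))
```

Because each step takes and returns a plain list of records, any single step can also be run on its own, which matches the independent per-step execution described for the platform.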

LLM-Integrated Analysis System:

Built an LLM integration module based on OpenAI GPT-4
Upon structure file input, automatically generates expert-level analysis reports
Provides quantitative metrics: Clashscore, Ramachandran plot, B-factor
Assesses the structure's druggability and expected docking success rate, and recommends improvements
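Of the quantitative metrics listed, the B-factor is the simplest to derive directly from a structure file, since it is stored in columns 61-66 of each ATOM/HETATM record. The Python sketch below is an assumption about how such a summary could be computed before being handed to the LLM as context. It is not the platform's code, and `b_factor_summary` is a hypothetical name.

```python
from statistics import mean


def b_factors(lines):
    """Read temperature factors from columns 61-66 of ATOM/HETATM records."""
    return [
        float(line[60:66])
        for line in lines
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 66
    ]


def b_factor_summary(lines):
    """Return mean, minimum, and maximum B-factor, or None if no atoms are found."""
    values = b_factors(lines)
    if not values:
        return None
    return {"mean": mean(values), "min": min(values), "max": max(values)}


# Made-up records for illustration only.
atoms = [
    "ATOM      1  N   MET A   1      10.000  10.000  10.000  1.00 15.00",
    "ATOM      2  CA  MET A   1      11.000  10.000  10.000  1.00 20.00",
    "ATOM      3  C   MET A   1      12.000  10.000  10.000  1.00 40.00",
]

print(b_factor_summary(atoms))  # {'mean': 25.0, 'min': 15.0, 'max': 40.0}
```

Clashscore and Ramachandran statistics require geometric calculations over atom coordinates and are not covered by this sketch.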

Platform Features and Implementation:

Technical Implementation: Data persistence with IndexedDB, parallel processing with Web Workers API, offline support with Progressive Web App
Visualization: Real-time rendering with Mol* viewer, step-by-step change verification with diff viewer
Result Storage: Results are saved as a ZIP archive or as individual files; LLM reports are saved as PDF
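The step-by-step change verification can be approximated with a plain text diff between the structure before and after a step. The sketch below uses Python's standard `difflib` module for illustration. The platform's diff viewer is browser-based, and `step_diff` is a hypothetical helper name.

```python
import difflib


def step_diff(before, after, step_name):
    """Return a unified diff of the records before and after one step."""
    return list(
        difflib.unified_diff(
            before,
            after,
            fromfile="before " + step_name,
            tofile="after " + step_name,
            lineterm="",
        )
    )


# Made-up records for illustration only.
before = [
    "ATOM      1  N   MET A   1      10.000  10.000  10.000  1.00 15.00",
    "HETATM    2  O   HOH A 101      11.000  11.000  11.000  1.00 30.00",
    "ATOM      3  CA  MET A   1      11.000  10.000  10.000  1.00 20.00",
]
after = [before[0], before[2]]  # the water record has been removed

diff = step_diff(before, after, "water_removal")
print("\n".join(diff))
```

Lines prefixed with `-` in the output are records the step removed, so a user can confirm that only the intended records disappeared.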

Results

System Configuration:

Web standard-based automatic preprocessing algorithm design
Automatic execution in browser without separate installation
Visualization and processing functionality with Mol* viewer and JavaScript
6-step preprocessing process with independent execution capability for each step
Stable simultaneous processing of multiple files, with progress and error reporting for better user feedback
Web-based platform immediately usable without software installation or license → improved accessibility and scalability
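Multi-file processing with progress and error reporting can be sketched as a loop that isolates failures per file, so that one bad input does not stop the batch. This Python version runs sequentially, whereas the platform parallelizes with the Web Workers API. `process_batch` and `count_atoms` are hypothetical names, and `count_atoms` is only a stand-in for the real preprocessing.

```python
def process_batch(files, process, on_progress=None):
    """Run process() on each file, collecting results and per-file errors."""
    results, errors = {}, {}
    total = len(files)
    for done, (name, text) in enumerate(files.items(), start=1):
        try:
            results[name] = process(text)
        except Exception as exc:  # keep going; report the failure instead
            errors[name] = f"{type(exc).__name__}: {exc}"
        if on_progress is not None:
            on_progress(done, total, name)
    return results, errors


def count_atoms(text):
    """Stand-in for real preprocessing: count ATOM/HETATM records."""
    count = sum(
        1 for line in text.splitlines() if line.startswith(("ATOM", "HETATM"))
    )
    if count == 0:
        raise ValueError("no ATOM/HETATM records found")
    return count


# Made-up inputs for illustration only.
files = {
    "a.pdb": (
        "ATOM      1  N   MET A   1      10.000  10.000  10.000  1.00 15.00\n"
        "ATOM      2  CA  MET A   1      11.000  10.000  10.000  1.00 20.00"
    ),
    "empty.pdb": "",
}

progress_log = []
results, errors = process_batch(
    files, count_atoms, on_progress=lambda d, t, n: progress_log.append((d, t, n))
)
print(results)  # {'a.pdb': 2}
print(errors)   # {'empty.pdb': 'ValueError: no ATOM/HETATM records found'}
```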

System Performance Evaluation:

Small proteins: Fast processing
Large proteins: Completed within reasonable time
Consistent result quality across various sizes
Verified user interface intuitiveness and reliability

Batch Processing and Accessibility:

Immediately usable without installation/license
Consistent, reliable LLM-assisted preprocessing
Automatic preprocessing and workflow implementation upon structure file input

Key Benefits: The proposed web-based preprocessing platform can serve as an efficient, reproducible tool for processing protein structures of various sizes consistently. By automating and standardizing the entire preprocessing workflow, it provides an integrated solution for the preprocessing steps that computational drug development studies require.