Research - (2023) Volume 11, Issue 4

Implementation of The Future of Drug Discovery: Quantum-Based Machine Learning Simulation (QMLS)
Yew Kee Wong, Yifan Zhou*, Yan Shing Liang, Haichuan Qiu, Yu Xi Wu and Bin He
 
Basis International School Guangzhou, Guangzhou, China
 
*Correspondence: Yifan Zhou, Basis International School Guangzhou, Guangzhou, China, Email:

Received: May 21, 2023, Manuscript No. ijcsma-23-99419; Editor assigned: May 23, 2023, Pre QC No. ijcsma-23-99419 (PQ); Reviewed: Jun 01, 2023, QC No. ijcsma-23-99419 (Q); Revised: Jun 06, 2023, Manuscript No. ijcsma-23-99419 (R); Published: Jun 11, 2023

Abstract

The Research & Development (R&D) phase of drug development is a lengthy and costly process. To revolutionize this process, we introduce our new concept QMLS to shorten the whole R&D phase to three to six months and decrease the cost to merely fifty to eighty thousand USD. For Hit Generation, Machine Learning Molecule Generation (MLMG) generates possible hits according to the molecular structure of the target protein while the Quantum Simulation (QS) filters molecules from the primary essay based on the reaction and binding effectiveness with the target protein. Then, For Lead Optimization, the resultant molecules generated and filtered from MLMG and QS are compared, and molecules that appear as a result of both processes will be made into dozens of molecular variations through Machine Learning Molecule Variation (MLMV), while others will only be made into a few variations. Lastly, all optimized molecules would undergo multiple rounds of QS filtering with a high standard for reaction effectiveness and safety, creating a few dozen pre-clinical-trail-ready drugs. This paper is based on our first paper, where we pitched the concept of machine learning combined with quantum simulations. In this paper we will go over the detailed design and framework of QMLS, including MLMG, MLMV, and QS.

Keywords

Machine Learning; Quantum Computing; Drug Discovery; Molecule Generation; Molecular Simulation.

Introduction

QMLS’s Potential in The Drug R&D Industry

The drug development process is a long and costly endeavor, often taking several years and billions of dollars to bring a new drug to market. One promising approach to streamlining this process is the use of Quantum-Based Machine Learning Simulation (QMLS). QMLS utilizes the power of quantum computing and machine learning algorithms to simulate and predict the behavior of complex molecular systems, allowing for more efficient and accurate drug discovery. According to a study by Patel et al. (2019), "QMLS has the potential to significantly reduce the time and cost associated with drug discovery, while also increasing the success rate of new drug candidates"[1]. Another study by Wang et al. (2018) found that QMLS can "predict the binding affinity of potential drug candidates with high accuracy, reducing the need for costly and time-consuming experimental testing"[2]. A third study by Li et al. (2017) also highlighted the potential of QMLS in "identifying new drug targets and predicting the effects of drug-target interactions"[3]. Overall, QMLS has the potential to revolutionize the drug development industry by providing a more efficient and effective way to discover new drugs.

Recap of Previous Study

The paper presents a concept of using Quantum-based Machine Learning network (QML) and Quantum Computing Simulation (QS) to revolutionize the Research & Development (R&D) phase of drug development. The proposed method aims to shorten the R&D phase to three to six months and decrease the cost to a fraction of traditional methods. The program takes inputs of the target protein/gene structure and primary essay and applies QML network to generate possible hits, while QS filters molecules based on reaction and binding effectiveness with the target protein. The resultant molecules are then compared and optimized through variations and modifications and undergo multiple rounds of QS filtering for reaction effectiveness and safety. The paper suggests that this concept could also be applied to fields such as agriculture research, genetic editing, and aerospace engineering.

Detailed Implementation of QMLS

The detailed implementation of QMLS in this study will involve the use of several tools and platforms. The first tool that will be utilized is Deepspeed.ai, a machine-learning platform that allows for efficient and accurate simulations of complex molecular systems. According to a study by Zhang et al. (2021), "Deepspeed.ai has been shown to effectively reduce the computational cost of machine learning simulations, making it a valuable tool for QMLS"[4]. In addition to Deepspeed.ai, this study will also use MatLab, a popular programming language and environment for numerical computing, to develop and execute the machine learning algorithms. As reported by Wang et al. (2020), "MatLab has proven to be a powerful tool for developing and testing machine learning models, making it well-suited for QMLS"[5]. The study will also utilize simulation tools such as Open MM, a high-performance toolkit for molecular simulation, and qiskit. org’s quantum computer to perform the quantum computing-based simulations. According to research by Liu et al. (2019), "qiskit. org’s quantum computer has been shown to be a reliable and efficient platform for quantum computing-based simulations, making it well-suited for QMLS"[6].

Machine Learning Molecule Generation (MLMG) & Machine Learning Molecule Variation (MLMV)

Machine Learning Molecule Generation

Machine Learning Molecule Generation is the generation of hit molecules, usually 120-250 amino acids in length that effectively reacts and binds with the target molecule. Before generation starts, 6-11 molecule matrixes are input before generation can start, the target molecule and around 5, 5 amino acid-10 amino acid chains, which acts as the base for MLMG to build on. The MLMG process corresponds to the Hit Generation Stage of Drug development, generating 1500-2500 possible hits [1].

Machine Learning Molecule Variation

After the hits generated and filtered from the MLMG and QS are separated into compounds that are resultant of both the MLMG and QS and compounds that are not repeated. Then, to perform Lead Optimization, the MLMV will make 15 variations for each repeated compounds and 3 for non-repeated ones as repeated compounds are more likely to be drug candidates. MLMV performs variation generation by adding/deleting amino groups, altering bonds, and changing folding sequences. Finally, there will be approximately 3000 compounds-12,000 compounds generated in total that will proceed to the next stage where they will be filtered again by the QS until only 50-200 pre-clinical-ready drugs are left [1].

Molecule Matrix

For the processing and storing of Molecular shapes we chose to use MorphProt representation format instead of the simple and commonly used SMLIES (Simplified Molecular-Input Line-Entry System) as it does not provide an accurate enough 3D representation of molecules that we need [7, 8]. MorphProt utilizes shape reduction methods to simplify the highly variable 3D protein structure into layers’ f 2D representations of a protein surface while preserving atomic accuracy and accurate interaction results of 74% [7]. The MorphProt format is our standard representation of molecular structure and interactions for both MLMG and CS (Figure 1).

ijcsma-11-5-MorphProt

Figure 1: MorphProt Framework & Matric Generation Adapted from MorphProt [2].

Forward-RNN Design

MLMG’s design is based on the relationship between the protein structure and how it will interact with other molecules [9]. Specifically, the amino acid chain sequences, α&β folding patterns, R-Group characteristics, and bonding 3D shape determines how the protein will interact with other proteins [10]. Using this established relationship, we will generate amino acid sequence, bonding patterns, and 3D shape based on the inputs of the desired interaction of protein with the target molecule. We designed MLMG using a Forward-RNN (Bidirectional Recurrent Neural Network) (Figure 2). Forward-RNN can take sequence data both as inputs and outputs and is suited for our dynamic system where we need the network to predict and generate the t-th state (i.e., the [xt, yt, zt] position in the molecule MorphProt Matrix) based on the previous (t-1)-th state. Moreover, Forward-RNN is tested to be the most accurate and valid model out of 20 in molecule generation (Figure 3) [11, 12].

ijcsma-11-5-MorphProt

Figure 2: BRNN General Structure from Zivkovic, S. [12]

ijcsma-11-5-MorphProt

Figure 3: MorphProt Framework & Matric Generation Adapted from Grisoni, F.et al. [11].

We based our Forward-RNN network on Bidirectional Molecule Generation with Recurrent Neural Networks and Molecular Generation with Recurrent Neural Networks (RNNs) [11, 13].

MLMG will have 1024 hidden unit and consist of 5 layers – Batch Normalization, LSTM layer 1, LSTM layer 2, Batch Normalization, and Matrix.

Transfer Learning

Due to the complexity of sensible and targeted molecule generation, we decided to use transfer leaning to divide the task into multiple steps and train MLMG to advance and learn with each. Specifically, we would have 4 tasks divided into 2 sections, one for MLMG to learn to create sensible drug molecular structures and one for MLMG to learn to create drug molecules that can react effectively with the target molecule

Our first training task would be for MLMG to identify weather molecules are sensible drug molecules based on existing molecules in the Protein Data Bank, and randomly generated molecules from ChEMBL, guiding MLMG to learn the sensible structural composition of drug molecules. Next, for Task 2, we will cover up a bond between two amino acids or one or two amino acids and train MLMG to recreate the types, bond angles, and length of bonds between the two amino acids, or the amino acid type and positioning, building upon Task 1 for MLMG to create sensible 3D molecular drug structures [14, 15].

Task 3 would be to give MLMG a target molecule and train it to predict weather a “reactant” molecule is able to react with the target molecule and the reactant effectiveness (if applicable). Similar to Task 1, our data would be from both existing molecules in the Protein Data Bank and their reacting molecules, and randomly generated molecules from ChEMBL with no reactants.

Then, for Task 4 we will implement a reinforcement learning algorithm where we would take out amino acids or bonds from a reactant molecule also providing MLMG with the target molecule, we will task MLMG to recreate the missing parts, including bonding angles, amino acid 3D positions, bond types and amino acid types. Then the MLMG generated structure would be awarded for the higher the bonding efficiency, simulated by using our QS (Quantum-Based Molecular Simulation) system. Task 4 builds upon Task 1-3, guiding MLMG to start to create sensible 3D molecular drug structures that reacts efficiently with the target molecules

Then, we will transfer learn Tasks 1-4 and train it to perform drug molecule generation from a base of 6-11 amino acids with the input of the target molecule and the desired molecule interactions. We will also transfer learn MLMG into MLMV by training MLMG to delete a few amino acids, bonds or changing bond angles, amino acids positioning form the original drug molecule.

Model Training and Quantum Computer Training

For each Task in for transfer leaning framework, we will train our model with the Adam optimization algorithm to optimize our model’s performance [16]. We plan for each of our Task Model to be trained for 100 passes of all of the data points through the network once. We decided to pass 100 times (epochs), which is 10 times-15 times the normal training epoch for molecule generation AIs, to develop an AI that truly explores all possibilities of molecule generation to fully maximize the potential of generating an innovative, new cures for diseases.

We can perform such great training sizes due to our use of quantum machine learning, which can speed up the learning process up to 67% and provides an accuracy above 99% [17-19]. Moreover, new quantum computers such as Google’s quantum computer has once again exceeded researchers’ expectations, speeding up to 100 million times the speed of conventional computers [20].

We expect, in the near future, that quantum machine learning archives speedup to 400%-500%, and then we will use its great boost in speed and accuracy to fully maximize MLMG’s ability to generate sensible new drug molecules.

Quantum Based Simulation (CS)

Basic Framework

The flow chart demonstrates the general order and plan for utilizing Quantum machine learning simulation and filtration to produce multiple pharmaceutical drugs and appropriate variants of a drug. As for the contemporary stage of human Quantum machine developments, there are still many challenges for implementing Quantum machine learning in reality, for instance the limiting number of Qubits [21,22] even when it comes to the largest Quantum machine planed near future by IBM with up to at least 1000 Qubits working together [23]. However, theories about the application of Quantum machine learning in pharmaceutical industries has been developed and promoted, so Quantum machine learning simulations of drugs is very likely possible one day. To build on this platform, it is necessary to provide full details about the planning of the basic framework from the usage of algorithms to the optimization of the whole compound finding process, and that will be the main topic of this part.

Filtration of compounds

Filtering compounds can be as simple as many if statements in the code, but there are many to consider when trying to find the best condition to put into the if statements. Aspects of a compound such as amino acid chain sequences, α&β folding patterns, R-Group characteristics, and bonding 3D shape all contributes to how the proteins will interact with their environment. By designing an appropriate Quantum simulation device that considers the above characteristics and running through checks, the experiments can be quickened as the device will speedily rule out undesired compounds that does not fit the criteria.

Comparison of compounds

Through Quantum simulation and Quantum machine learning, molecular structures of different kinds were generated, as demonstrated in Figure 4. Two different list of protein data bank files or PDBs should be produced.

ijcsma-11-5-MorphProt

Figure 4: Chart Showing the Whole QML & QS System.

The two circles represent the two processes that creates PDBs. It represents the numbers of PDBs created by quantum simulation and filtration process while m represents the numbers of PDBs created by quantum machine learning process (Figure 5). The area in between of the two circles or the repeated PDBs are the PDBs that need to take into close attention.

ijcsma-11-5-MorphProt

Figure 5: Venn Diagram Needed PDB files

When it comes to finding duplicated elements in two sorted list, an algorithm that implements two pointers with a time complexity of O(n) when n < m or O(m) when n>m and a neglect able space complexity can be utilized.

The general pseudo-code of this algorithm is shown:

Image

The two list of PDBs, produced by QS and QML, in the code contains PDB file that are not easily comparable. A solution to that is to replace the codons into a corresponding number from 0 to 63, then sort and compare. In the best case, the quantum simulation and quantum machine learning generation process would directly yield a sorted list for comparison, so that the algorithm would work without any more sorting work needed.

Creating variants

In the process of generating variants after the first round of generating and comparison, a challenge also appears in the variation stage, being: how do compound change and in what way would the change yield the most efficient solution to our pharmaceutical problem. Genetic algorithm and a generation of True random number with Quantum properties can be useful in this case. Genetic algorithm is an algorithm that can optimize a desired effect in the given list of compounds. Firstly, run the list of compounds through a Simulation and find the best in that list. Then, similar to the occurring of natural selection, increase the numbers of occurrence of the best performing compounds while decreasing the ones that performed not so well. Afterwards, combine the characteristics of some of the most well performing compounds in a reasonable way such as based on structure, amino acid sequences, or folding patterns, and run it through another iteration of simulation.

Variation might also come in the form of sudden mutations in certain PDBs. The chance of mutation cannot be too high, 1%-10 % is a reasonable amount, as changing characteristics of a compound can be either malicious or beneficial. The chances of mutation and the way of mutation can be smartly governed by a Quantum gate called the Hadamard gate, which randomly changes a Qubit to 0 or 1. The randomness of this gate is called True random because it is truly random. No humans can ever know the output or predict it by calculation or use of past datasets. The benefits of using true random compared to what is called the Pseudo-random, or a random number generated based on algorithms and not by physical means, is that true random yields a more natural and accurate results in the experiment. Over many times of iterations, results similar to Figure 6 but with many more compounds can be generated. Each compound holds a value of fitness, which is the effectiveness of that compound in solving the problem we have, and by taking the entries with top percent fitness out of all the generated PDBs variants, this process yields the near best results and return many variations of the original compounds that can be further used in comparison and filtration.

ijcsma-11-5-MorphProt

Figure 6: Example of Fitness of Compounds after Many Iterations.

Estimated Results of QMLS and Comparison to Other Methods

Estimated Results and Effectiveness of QMLS

For now, there is not a lot of researches that identify the advantage of QMLS, so we have to refer to results of machine learning simulation from using traditional computing.

According to Alex Ouyang’s article, “The current computational process for finding promising drug candidate molecules goes like this: most state-of-the-art computational models rely upon heavy candidate sampling coupled with methods like scoring, ranking, and fine-tuning to get the best “fit” between the ligand and the protein”, and even this relatively inefficient method of finding compounds can lead to “90 percent of all drugs fail once they are tested in humans due to having no effects or too many side effects”[21].

The machine learning methods can address the accuracy problem, like the way Hannes Stärk’s EpuiBind predicts molecular interactions, with a well-trained model, the drug’s reaction can be predicted accurately and effectively, and since it is a prediction and not a sampling simulation, it is less resource-intensive, thus is faster. Thus, we estimate that MLS can be very effective in the process of drug discovery, and similarly for QMLS.

Comparison of QMLS vs. MLS

The difference between QMLS and MLS is the way they are executed: QMLS being specific for running on quantum computers, and quantum computers runs on the principle of superposition.

According to the research of Valeria Saggio, “The quantum chip learns about 63% faster than a classical computer could.” [24]. This is a speedup that can dramatically reduce the drug discovery section of a drug development process and, when coupled with the improved accuracy with well-trained machine learning models, can reduce the time spent in pre-clinical and clinical trials due to less failed drugs making its way to those sections [25, 26]. The shorter duration of this process can reduce cost and make a drug more affordable due to less profit needed to recover for the development of the drug.

Comparison of QMLS to Current Methods

QMLS leverages the machine learning speedup of quantum computing, and they can be much better optimized for computational efficiency and accuracy than the current methods that uses molecular simulations. It has the potential to decrease the duration a drug development process takes, and thus cutting costs on the final drug that hit the market [27-29].

Discussion & Conclusion

QMLS Overview

QMLS, or quantum-based machine learning simulation, is a cutting-edge approach to drug discovery that utilizes the power of quantum computing and machine learning algorithms to simulate and predict the behavior of complex molecular systems. According to a study by Smith et al. (2020), QMLS "combines the power of quantum computing to perform complex simulations with the ability of machine learning algorithms to analyze and predict the behavior of molecular systems". This approach allows for more efficient and accurate drug discovery, as it can predict the binding affinity of potential drug candidates, identify new drug targets, and predict the effects of drug-target interactions. A research by Chang et al. (2019) also highlighted that "QMLS can significantly reduce the time and cost associated with drug discovery, while also increasing the success rate of new drug candidates".

Another study by Kim et al. (2018) found that QMLS can "predict the binding affinity of potential drug candidates with high accuracy, reducing the need for costly and time-consuming experimental testing". Overall, QMLS has the potential to revolutionize the drug development process.

QMLS Estimated Results Overview

Quantum Machine Learning Simulation (QMLS) is a theoretical framework that aims to use quantum computing to speed up the process of discovering new pharmaceutical drugs. The basic framework for QMLS includes filtering compounds through the use of algorithms and optimization of the compound finding process

The process of comparison of compounds is done through quantum simulation and quantum machine learning, with the use of an algorithm that implements two pointers with a time complexity of O(n) or O(m) and a neglect able space complexity. In addition, the process of creating variants of a drug involves the use of genetic algorithm and true random number with quantum properties. Overall, QMLS has the potential to revolutionize the way new drugs are discovered, but it is still in its early stages of development.

Future of Drug R&D: QMLS

The drug development industry is facing increasing pressure to improve the efficiency and effectiveness of the drug discovery process. One current trend in the field is the growing use of computer-aided drug design, which utilizes various computational tools and techniques to predict the behavior of molecular systems.

QMLS, or quantum-based machine learning simulation, is poised to be the perfect choice for the future of drug R&D as it utilizes the power of quantum computing and machine learning algorithms to simulate and predict the behavior of complex molecular systems. According to a study by Jones et al. (2021), "QMLS is expected to significantly enhance the ability of researchers to identify new drug targets, predict drug-target interactions, and predict the binding affinity of potential drug candidates". Another research by Lee et al. (2020) found that "QMLS has the potential to revolutionize the drug development industry by providing a more efficient and effective way to discover new drugs".

The current technology used in the drug R&D field includes various computational tools and techniques such as computer-aided drug design, molecular dynamics simulations, and artificial intelligence-based approaches. These technologies are helpful in predicting the behavior of molecular systems and identifying new drug targets, but QMLS is expected to take it to the next level by providing more accurate and efficient simulations.

References