PHARMA INDUSTRY CASES
· Automation of physico-chemical properties calculation and registration with KNIME
· Workflow manager
· Migration of MDL/ISIS sample database to ACD/Spectrus DB Platform
· Compound Registration System
· Biological Assay Registration System and integration with Data Mining Platform
· Automatic characterization report
AUTOMATION OF PHYSICO-CHEMICAL PROPERTIES CALCULATION AND REGISTRATION WITH KNIME
Prioritizing the most promising drug candidates for further evaluation
Company boilerplate: Chiesi is an international pharma company based in Parma (Italy), with more than 80 years of experience and a strong focus on the research, development, production and commercialization of innovative medicines in the Respiratory, Neonatology, Rare Disease and Special Care therapeutic areas.
The Chiesi Group employs approximately 5,624 people, 788 of whom are dedicated
to R&D activities and 827 of whom work at production sites in Italy, France and
Brazil.
Getting information to medicinal chemists in a timely manner
The development of drug candidates that combine an acceptable biological activity
and an appropriate physicochemical profile is a key challenge. Therefore, in the drug
discovery process, physicochemical properties are important parameters for the
characterization of compounds. In the search for new drug candidates, medicinal
chemists routinely evaluate data such as biological activities and physicochemical
properties associated with numerous compounds.
This is to prioritize the most promising ones for further optimization or study and
discard the others. The present workflow helps chemists evaluate whether the
compounds possess desirable physicochemical properties such as solubility, pKa,
and Lipinski criteria.
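The Lipinski criteria are commonly stated as the "rule of five": molecular weight not above 500 Da, logP not above 5, no more than 5 hydrogen-bond donors and no more than 10 hydrogen-bond acceptors, with at most one violation usually tolerated. The following is a minimal Python sketch of such a check, assuming the four descriptors have already been computed elsewhere. The class and function names are hypothetical and are not part of the workflow described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    """Pre-computed descriptors for one compound (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return a description of every rule-of-five criterion the compound breaks."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500")
    if d.log_p > 5:
        violations.append("logP > 5")
    if d.h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    """A compound passes when it breaks at most max_violations criteria."""
    return len(lipinski_violations(d)) <= max_violations


# Illustrative values only, not measured or calculated data.
small = Descriptors(mol_weight=180.2, log_p=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Descriptors(mol_weight=812.0, log_p=6.3, h_bond_donors=7, h_bond_acceptors=12)
```

Here `passes_lipinski(small)` is true, while `large` breaks all four criteria and fails.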
The goal of this project was to provide essential information to all medicinal chemists
in a timely manner. All scientists, about one hundred people and not only medicinal chemists, benefit from these calculated properties at various stages of the
discovery process.
The specific requirements of this project included:
– Integrate physicochemical properties (calculated by ACD/Labs Percepta) into
the corporate database.
– Implement a user-friendly data visualization format for pKa values to simplify
the association of each calculated value with the corresponding functional group.
– Automate the entire process without any human intervention (an e-mail is sent if
the workflow ends successfully).
– Run the workflow once a day, during the night.
– Extract newly registered molecules (i.e. those without calculated physicochemical properties) from a dedicated ORACLE™ view.
– Interact with ACD/Labs Percepta to calculate the physicochemical properties.
– Parse the calculated results and upload them to the corporate database.
– Render the annotated structures as PNG images and upload them to the corporate database.
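The extraction requirement (molecules that do not yet have calculated properties) can be illustrated with a small query. This is only a sketch: it uses an in-memory SQLite database as a stand-in for ORACLE™, and the table and column names (`molecules`, `calc_properties`, `mol_id`, `smiles`) are hypothetical, since the real view definition is not given here.

```python
import sqlite3


def find_uncalculated(conn):
    """Return (mol_id, smiles) for molecules with no calculated-property row."""
    query = """
        SELECT m.mol_id, m.smiles
        FROM molecules AS m
        LEFT JOIN calc_properties AS p ON p.mol_id = m.mol_id
        WHERE p.mol_id IS NULL
        ORDER BY m.mol_id
    """
    return conn.execute(query).fetchall()


def demo():
    """Build a tiny stand-in database and list the molecules still to process."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE molecules (mol_id INTEGER PRIMARY KEY, smiles TEXT NOT NULL);
        CREATE TABLE calc_properties (mol_id INTEGER PRIMARY KEY, logp REAL);
        INSERT INTO molecules VALUES (1, 'CCO'), (2, 'c1ccccc1'), (3, 'CC(=O)O');
        -- Illustrative value only: molecule 1 already has a calculated property.
        INSERT INTO calc_properties VALUES (1, -0.31);
    """)
    try:
        return find_uncalculated(conn)
    finally:
        conn.close()
```

A nightly run would then only pass the rows returned by such a query to the property calculator, so compounds are never calculated twice.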
Fully automating the evaluation of drug compounds
To aid chemists in evaluating whether compounds have desirable physicochemical
properties, a KNIME workflow was developed that routinely updates compounds
registered in the proprietary database, with the corresponding predicted
physicochemical properties (LogP, LogD, LogS and pKa). A commercial program for
property calculation (ACD/Labs Percepta) has been coupled with KNIME Analytics
Platform and KNIME Server to fully automate this procedure for all new chemical
entities registered in the company database.
The KNIME workflow, deployed on KNIME Server, is executed automatically at a
given time, and results are stored in Chiesi's proprietary corporate database.
The project started with verifying that ACD/Labs Percepta (batch module) could
calculate all the needed properties via command-line and that the results were
compatible with standard KNIME nodes – specifically SDF Reader and CSV Reader.
A set of molecular structures, diverse enough to cover most calculation problems,
was then retrieved from a public structure database and used to set up the property
calculation. Then the format of the input table coming from the ORACLE™
view was defined (i.e. SDF or SMILES format of molecules, identifier, primary keys,
other fields), as well as the output format to write to ORACLE™ tables (table names,
field names and types, accessory columns).
The construction of the workflow looked like this:
– Build ORACLE™ tables similar to those in the production environment, for read and write/update operations.
– Export SMILES structures, transform them and write an SDF file, then instruct the external application to read that file using the External Tool node.
– Gather the output files (CSV and SDF).
– Split the data into single-value properties (LogP, LogD and LogS at pH 7.4) to
be registered in the molecule table, multi-value properties (pKa values) to be
registered in their own table, and warning messages to be archived as logfile.txt, all using the CSV Reader output.
– Annotate the pKa values on the SDF structure, render it as a PNG image and load it as a binary object in a third table, all using the SDF Reader output.
– Send an e-mail to the administrator if everything completed successfully; this is done automatically by KNIME Server.
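The splitting step in the list above can be sketched outside KNIME as follows. This is a hedged illustration, not the actual workflow: the CSV layout, the column names (`ID`, `LogP`, `LogD`, `LogS`, `pKa`, `Warning`) and the semicolon separator for multiple pKa values are assumptions made for the example and do not describe the real ACD/Labs Percepta output format.

```python
import csv
import io

# Hypothetical calculator output; values are illustrative only.
SAMPLE = """ID,LogP,LogD,LogS,pKa,Warning
CMP-1,1.19,-1.7,-1.6,3.5,
CMP-2,2.10,2.1,-2.9,4.2;9.8,
CMP-3,,,,,structure could not be processed
"""


def split_results(csv_text):
    """Split calculator output into three targets.

    Returns (single, pka, warnings):
      single   -> one row per compound for the molecule table
      pka      -> one row per pKa value for the dedicated pKa table
      warnings -> messages destined for the log file
    """
    single, pka, warnings = [], [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cid = row["ID"]
        if row["Warning"]:
            warnings.append(f"{cid}: {row['Warning']}")
        if row["LogP"]:
            single.append(
                (cid, float(row["LogP"]), float(row["LogD"]), float(row["LogS"]))
            )
        # A compound may have zero, one or several pKa values.
        values = [v for v in row["pKa"].split(";") if v]
        for index, value in enumerate(values, start=1):
            pka.append((cid, index, float(value)))
    return single, pka, warnings


single_rows, pka_rows, warning_lines = split_results(SAMPLE)
```

Keeping pKa values in their own list mirrors the one-to-many relationship between a compound and its ionizable groups, which is why they go to a separate table.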
Project Results
Around 50,000 compounds with calculated properties, more than 4 years of use
without any problem or intervention. The biggest impact that this solution has had is
the time saved by scientists who no longer need to calculate properties on demand.
Customer satisfaction has also increased considerably.
Why KNIME?
Before the project began, two key features were required. The first was the ability
to interact (read, write and update permissions) with the ORACLE™ database. The
second was the ability to interact via command line with third-party software (ACD/Labs
Percepta Batch).
KNIME Analytics Platform enabled us to do these two things.
At the start, the free and open source KNIME Analytics Platform played an important
role. The cost advantages of open source software are significant.
Once acceptance grew, getting a license for KNIME Server was much simpler.
Integrations and extensions of KNIME Software
– Database Integration (Connector, reader, writer, upload, SQL executor)
– CSV Reader and Writer nodes, plus many transformation nodes
– ORACLE™ Database Driver
– OpenBabel
– External Tool
Data source
– ORACLE™ view with code, primary key and structures as SMILES
Next Steps in the Project
– Next steps include building and implementing a similar workflow for the on-demand
calculation of properties using an SD file as input, providing the user
with the same set of calculated properties for designed compounds before their
synthesis.
WORKFLOW MANAGER
Customer
Pharma company
Goal
Manage analysis workflow from request to final report
Description
The analytical research lab needed a homogeneous platform to manage spectra and
chromatograms of various techniques, e.g. NMR, MS, DSC/TGA, UV-IR, to facilitate
analysis execution, comparison and knowledge sharing.
The proposed system, based on the ACD/Spectrus platform, consisted of:
– An Analytical Research Database to store all analytical data and support workflow management
– An input form to submit a new analysis request. A popup form was created with the customer's desired format, labels and constraints, consisting of mandatory fields, controlled picklist values, a molecule sketcher to draw a molecular structure and automatically calculate chemical properties (formula, molecular weight, monoisotopic mass, etc.), a Safety Data Sheet (SDS) file attachment and checkboxes listing all the available analyses.
– A workflow management system implemented via:
– Popup forms to manage request acceptance and the completion of single analyses
– Email notifications to inform the users involved
– Status change rules to identify open, closed, to be approved or approved requests
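Status change rules of this kind are often modelled as a small state machine. The sketch below uses the four statuses named above, but the allowed transitions are purely an assumption for illustration; the rules actually configured in the customer's system are not described here.

```python
from enum import Enum


class Status(Enum):
    OPEN = "open"
    TO_BE_APPROVED = "to be approved"
    APPROVED = "approved"
    CLOSED = "closed"


# Hypothetical transition table: which statuses may follow each status.
ALLOWED = {
    Status.OPEN: {Status.TO_BE_APPROVED},
    Status.TO_BE_APPROVED: {Status.APPROVED, Status.OPEN},  # reviewer may send back
    Status.APPROVED: {Status.CLOSED},
    Status.CLOSED: set(),
}


def change_status(current: Status, new: Status) -> Status:
    """Return the new status, or raise ValueError if the change is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move request from '{current.value}' to '{new.value}'")
    return new


status = Status.OPEN
for step in (Status.TO_BE_APPROVED, Status.APPROVED, Status.CLOSED):
    status = change_status(status, step)
```

Centralizing the rules in one table makes it easy to audit which transitions exist and to trigger a notification whenever `change_status` succeeds.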
MIGRATION OF MDL/ISIS SAMPLE DATABASE TO ACD/SPECTRUS DB PLATFORM
Customer
Pharma company
Goal
Migrate obsolete samples database to a recent system
Description
To preserve production operations and historical data, the no longer supported MDL/ISIS Base database had to be migrated to Windows 10-compatible software.
The ACD/Spectrus platform was chosen as the destination software because of its
capability to store and search chemical structures and spectra, enriching the knowledge collected around a specific sample.
To accomplish the task, four steps were taken:
– Proof of concept to determine export feasibility, with special attention to molecular structure conversion issues.
– Reverse engineering to derive the underlying data model.
– Data migration from the source data model to the target data model with an Extract-Transform-Load (ETL) KNIME workflow.
– Configuration of screens and entry forms in the target software to improve user
experience and data integrity.
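The Extract-Transform-Load step can be pictured with a short sketch. The project used a KNIME workflow; the Python below only illustrates the general pattern. The legacy field names (`SAMPLE_NO`, `MOLNAME`, `REG_DATE`), the target field names and the date format are hypothetical, since the real data models are not described here.

```python
from datetime import datetime

# Hypothetical mapping from legacy field names to target field names.
FIELD_MAP = {
    "SAMPLE_NO": "sample_id",
    "MOLNAME": "name",
    "REG_DATE": "registered_on",
}


def transform(record):
    """Rename fields, trim whitespace and normalize dates to ISO format."""
    row = {}
    for source_field, target_field in FIELD_MAP.items():
        value = (record.get(source_field) or "").strip()
        row[target_field] = value or None
    if row["registered_on"]:
        # Assumed legacy format: DD/MM/YYYY
        parsed = datetime.strptime(row["registered_on"], "%d/%m/%Y")
        row["registered_on"] = parsed.date().isoformat()
    return row


def migrate(source_records):
    """Return (loaded, rejected); rows without an identifier are rejected."""
    loaded, rejected = [], []
    for record in source_records:
        row = transform(record)
        (loaded if row["sample_id"] else rejected).append(row)
    return loaded, rejected


legacy = [
    {"SAMPLE_NO": " S-001 ", "MOLNAME": "aspirin", "REG_DATE": "05/03/2009"},
    {"SAMPLE_NO": "", "MOLNAME": "unknown", "REG_DATE": None},
]
loaded_rows, rejected_rows = migrate(legacy)
```

Collecting rejected rows separately, rather than dropping them, keeps the historical data reviewable after the migration.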
COMPOUND REGISTRATION SYSTEM
Customer
Pharma company
Goal
Build a registration system for corporate compounds
Description
A chemical registration system has been created to implement existing standard
operating procedures and best practices to assure data quality and integrity. The
proposed system is based on the ACD/Spectrus platform and provides a centralized
database where chemists and reviewers can:
– Register compounds and lots by drawing molecular structure, choosing salt
from a dictionary, selecting compound type and other mandatory information
– Add analytical results for the synthesized molecule
– Review registration data to approve or reject the operation
– Get an immediate overview, for each parent compound, of the existing lots,
status, registration date, available amount, etc.
BIOLOGICAL ASSAY REGISTRATION SYSTEM AND INTEGRATION WITH DATA MINING PLATFORM
Customer
Pharma company
Goal
Leverage biological assay results to support drug discovery
Description
Biological assay results were scattered across spreadsheets of various formats and archived independently from one lab to another. Data scientists, however, needed to collect, aggregate and compare the available results for each lot to support drug discovery decisions through a dedicated data mining platform.
Three main tasks were carried out to accomplish the project goal:
– Template definition, to map assay results to a homogeneous format that allows further analysis. Templates were created in Excel with fixed dictionaries and data validation rules. In addition, KNIME workflows made it possible to automatically convert experiment output into the target Excel format, preserving data integrity and reducing human error.
– Assay publication system development, to store assay results in tabular format and associate each single experiment result with the corresponding lot. Using the ACD/Spectrus platform, PL/SQL and the internal ACD/Labs scripting environment, it was possible to import input files into table objects and assign each result to the specified Lot ID in the database of lots.
– Integration with the data mining platform. Through PL/SQL procedures, assay results were extracted from the publication database and converted into a format suitable for the third-party data mining platform. In addition, the available ACD/Spectrus web API made molecular structures and lot details available to the Linux-based third-party software.
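The idea of fixed dictionaries and validation rules in the templates can be sketched in a few lines. The project implemented these rules in Excel; the Python below is only an illustration, and the field names (`lot_id`, `assay_type`, `unit`, `value`) and the allowed values are hypothetical.

```python
# Hypothetical controlled vocabularies for two template columns.
ALLOWED_ASSAY_TYPES = {"binding", "functional"}
ALLOWED_UNITS = {"nM", "uM", "%"}


def validate_row(row):
    """Return a list of validation errors for one assay result row."""
    errors = []
    if not row.get("lot_id"):
        errors.append("missing lot_id")
    if row.get("assay_type") not in ALLOWED_ASSAY_TYPES:
        errors.append("assay_type not in dictionary")
    if row.get("unit") not in ALLOWED_UNITS:
        errors.append("unit not in dictionary")
    try:
        float(row.get("value"))
    except (TypeError, ValueError):
        errors.append("value is not numeric")
    return errors


good_row = {"lot_id": "LOT-0001", "assay_type": "binding", "unit": "nM", "value": "12.5"}
bad_row = {"lot_id": "", "assay_type": "Binding", "unit": "mM", "value": "n/a"}
```

Rejecting rows such as `bad_row` before publication is what lets every stored result be linked reliably to its lot and compared across labs.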
AUTOMATIC CHARACTERIZATION REPORT
Customer
Pharma company
Goal
Automatic characterization report to increase return on investment
Description
Within the analytical research lab, report production was a time-consuming task
requiring manual editing and data transcription. An automatic report
generation system was crucial to reduce time and improve quality. With template
definitions, macros and scripts it was possible to reach the project goal, in particular:
– Templates were created for each technique, based on the customer's desired format, by means of ACD/ChemSketch and macro scripting.
– A global template was created to produce the final characterization report, which collects in a single document all spectra, chromatograms, molecules, spectrum tables and parameters produced during the characterization activity. The template is a Microsoft Word file (.dotx) with bookmarks and formatting styles tailored to the customer's needs.
– Finally, with scripting it has been possible to take information from the
analytical database and feed the corresponding templates to produce reports
with just one click.
S-IN Soluzioni Informatiche S.r.l. – via Malta 6/B – 25124 – Brescia – Italy
E-mail: info@s-in.it – PEC: mail@pec.s-in.it – Tel. +39 0444 1821160 – Fax +39 0444 1821169
C.F. & P.IVA IT02397280245 – Ufficio del Registro: Brescia – REA: BS-624000 – Capitale sociale € 15.000,00 i.v.