Automation of physicochemical property calculation and registration with KNIME

Prioritizing the most promising drug candidates for further evaluation

Chiesi is an international pharma company based in Parma (Italy), with more than 80 years of experience and a strong focus on the research, development, production and commercialisation of innovative medicines in the Respiratory, Neonatology, Rare Disease and Special Care therapeutic areas.

The Chiesi Group employs approximately 5,624 people, 788 of whom are dedicated to R&D activities and 827 of whom work at production sites in Italy, France and Brazil.

Getting information to medicinal chemists in a timely manner

The development of drug candidates that combine acceptable biological activity with an appropriate physicochemical profile is a key challenge. Physicochemical properties are therefore important parameters for characterizing compounds in the drug discovery process. In the search for new drug candidates, medicinal chemists routinely evaluate data such as the biological activities and physicochemical properties associated with numerous compounds.

They do so to prioritize the most promising compounds for further optimization or study and to discard the others. The workflow presented here helps chemists evaluate whether compounds possess desirable physicochemical properties, such as solubility and pKa, and whether they meet the Lipinski criteria.
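As an illustration of the Lipinski criteria mentioned above, the following minimal Python sketch counts rule-of-five violations from four descriptors. It is not part of the Chiesi workflow: the `Descriptors` class, the function names and the example values are hypothetical, and the tolerance of one violation is a commonly used convention exposed here as a parameter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    """Hypothetical container for the four rule-of-five descriptors."""
    mol_weight: float       # molecular weight in Da
    log_p: float            # octanol/water partition coefficient
    h_bond_donors: int      # number of hydrogen-bond donors
    h_bond_acceptors: int   # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return the rule-of-five criteria that the compound violates."""
    violations = []
    if d.mol_weight > 500:
        violations.append("mol_weight > 500")
    if d.log_p > 5:
        violations.append("log_p > 5")
    if d.h_bond_donors > 5:
        violations.append("h_bond_donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("h_bond_acceptors > 10")
    return violations


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    """True if the compound has at most max_violations violations."""
    return len(lipinski_violations(d)) <= max_violations


# Illustrative values only, not calculated by ACD/Labs Percepta.
small = Descriptors(mol_weight=180.16, log_p=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Descriptors(mol_weight=650.0, log_p=6.1, h_bond_donors=2, h_bond_acceptors=8)
print(lipinski_violations(small), passes_lipinski(small))
print(lipinski_violations(large), passes_lipinski(large))
```

In practice the descriptor values would come from the property calculation step rather than being typed in by hand.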

The goal of this project was to provide essential information to all medicinal chemists in a timely manner. All scientists, about one hundred people and not only those in medicinal chemistry, benefit from these calculated properties at various stages of the discovery process.

The specific requirements of this project included:

  • Integrate physicochemical properties (calculated by ACD/Labs Percepta) into the corporate database.
  • Implement a user-friendly data visualization format for pKa values to simplify the association of each calculated value with the corresponding functional group of the compound.
  • Automate the entire process without any human intervention (an e-mail is sent if the workflow ends successfully).
  • Run the workflow once a day, at night.
  • Extract newly registered molecules (i.e. those without calculated physicochemical properties) from a dedicated ORACLE™ view.
  • Interact with ACD/Labs Percepta in order to calculate physicochemical properties.
  • Parse the calculated results and upload them to the corporate database.
  • Render the annotated structures as PNG images and upload them to the corporate database.
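The extraction and upload requirements above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Chiesi implementation: Python's built-in `sqlite3` in-memory database stands in for ORACLE™, and the `molecule` table, its columns and the compound codes are hypothetical.

```python
import sqlite3

# In-memory stand-in for the corporate database (assumption: the real
# system is an ORACLE database accessed through KNIME database nodes).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE molecule (code TEXT PRIMARY KEY, smiles TEXT NOT NULL, logp REAL)"
)
conn.executemany(
    "INSERT INTO molecule (code, smiles, logp) VALUES (?, ?, ?)",
    [
        ("CMP-001", "CC(=O)O", -0.17),  # already has a calculated value
        ("CMP-002", "CCO", None),       # newly registered, nothing calculated yet
    ],
)
conn.commit()


def fetch_pending(connection):
    """Return (code, smiles) for molecules without calculated properties."""
    cursor = connection.execute(
        "SELECT code, smiles FROM molecule WHERE logp IS NULL ORDER BY code"
    )
    return cursor.fetchall()


def store_logp(connection, code, logp):
    """Write a calculated value back so the molecule is not selected again."""
    connection.execute("UPDATE molecule SET logp = ? WHERE code = ?", (logp, code))
    connection.commit()


print(fetch_pending(conn))
```

Selecting on a missing property value makes the nightly run idempotent: a molecule drops out of the pending set as soon as its results are stored.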

Fully automating the evaluation of drug compounds

To aid chemists in evaluating whether compounds have desirable physicochemical properties, a KNIME workflow was developed that routinely updates the compounds registered in the proprietary database with the corresponding predicted physicochemical properties (LogP, LogD, LogS and pKa). A commercial program for property calculation (ACD/Labs Percepta) has been coupled with KNIME Analytics Platform and KNIME Server to fully automate this procedure for all new chemical entities registered in the company database.

The KNIME workflow, deployed on KNIME Server, is executed automatically at a scheduled time, and the results are stored in Chiesi's proprietary corporate database.

The project started by verifying that ACD/Labs Percepta (batch module) could calculate all the required properties via the command line and that the results were compatible with standard KNIME nodes, specifically SDF Reader and CSV Reader. A set of molecular structures was then obtained from a public structure database; the set was diverse enough to cover most calculation problems and was used to set up the property calculation. Next, the format of the input table coming from the ORACLE™ view was defined (i.e. SDF or SMILES format for the molecules, identifier, primary keys, other fields), as well as the output format to write to the ORACLE™ tables (table names, field names and types, accessory columns).

The workflow was constructed as follows:

  • Build ORACLE™ tables similar to those in the production environment, for reading and for writing/updating.
  • Export the SMILES structures, convert them and write an SDF file, then instruct the external application to read that file using the External Tool node.
  • Gather the output files (CSV and SDF).
  • Split the data into single-value properties (LogP, LogD and LogS at pH 7.4), to be registered in the molecule table; multiple-value properties (pKa values), to be registered in their own table; and warning messages, to be archived as logfile.txt. All of this uses the CSV Reader output.
  • Annotate the pKa values on the SDF structure, render it as a PNG image and load it as a binary object into a third table, all using the SDF Reader output.
  • Send an e-mail to the administrator if everything went well, which is done automatically by KNIME Server.
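The splitting step in the list above can be sketched in Python as follows. The real workflow uses KNIME nodes, and the actual layout of the ACD/Labs Percepta CSV output is not described here, so the column names (`code`, `property`, `value`, `atom_index`, `warning`) and the sample rows are purely hypothetical.

```python
import csv
import io

# Hypothetical long-format output, one row per calculated value.
SAMPLE_CSV = """code,property,value,atom_index,warning
CMP-001,LogP,2.31,,
CMP-001,LogD_7.4,1.05,,
CMP-001,LogS_7.4,-3.20,,
CMP-001,pKa,4.76,7,
CMP-001,pKa,9.12,12,
CMP-002,LogP,,,structure could not be parsed
"""

# Properties with exactly one value per molecule.
SINGLE_VALUE = {"LogP", "LogD_7.4", "LogS_7.4"}


def split_results(csv_text):
    """Split rows into molecule-table values, pKa rows and warning lines."""
    molecule_rows = {}  # code -> {property: value}, one record per molecule
    pka_rows = []       # (code, atom_index, value), several rows per molecule
    warnings = []       # text lines destined for the log file
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["warning"]:
            warnings.append(f"{row['code']}: {row['warning']}")
        elif row["property"] in SINGLE_VALUE:
            molecule_rows.setdefault(row["code"], {})[row["property"]] = float(row["value"])
        elif row["property"] == "pKa":
            pka_rows.append((row["code"], int(row["atom_index"]), float(row["value"])))
    return molecule_rows, pka_rows, warnings


molecule_rows, pka_rows, warnings = split_results(SAMPLE_CSV)
print(molecule_rows)
print(pka_rows)
print(warnings)
```

Keeping pKa values in their own structure mirrors the separate table described above, since a molecule can have any number of ionizable groups.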

Project Results

Around 50,000 compounds now have calculated properties, and the workflow has been in use for more than 4 years without any problem or intervention. The biggest impact of this solution is the time saved by scientists, who no longer need to calculate properties on demand. Customer satisfaction has also increased considerably.

Lessons Learned

Solving a real-life business case through integration and automation increases productivity and improves the user experience.


Before the project began, two key features were required. The first was the ability to interact with the ORACLE™ database (with read, write and update permissions). The second was the ability to interact via the command line with third-party software (ACD/Labs Percepta Batch).

KNIME Analytics Platform enabled us to do these two things.
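Outside KNIME, the second capability (driving a third-party program from the command line and collecting its output files) could be sketched in Python as below. The ACD/Labs Percepta Batch command line is not documented in this article, so no real flags are shown: the command is passed in as an opaque list, `run_external_tool` is a hypothetical helper, and the demonstration uses a stand-in Python one-liner instead of the actual tool.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_external_tool(command, expected_outputs, timeout_s=3600):
    """Run an external command and verify that it wrote the expected files.

    `command` is the full argument list for the tool; its content is
    site-specific and is an assumption of this sketch.
    """
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    if result.returncode != 0:
        raise RuntimeError(
            f"tool exited with code {result.returncode}: {result.stderr.strip()}"
        )
    missing = [str(p) for p in expected_outputs if not Path(p).is_file()]
    if missing:
        raise RuntimeError(f"tool finished but did not write: {', '.join(missing)}")
    return result.stdout


# Demonstration with a stand-in command that writes a small CSV file.
with tempfile.TemporaryDirectory() as workdir:
    out_csv = Path(workdir) / "results.csv"
    stand_in = [
        sys.executable,
        "-c",
        "import sys; open(sys.argv[1], 'w').write('code,LogP\\nCMP-001,2.31\\n')",
        str(out_csv),
    ]
    run_external_tool(stand_in, [out_csv])
    print(out_csv.read_text())
```

Checking both the exit code and the presence of the output files guards against a tool that exits cleanly without producing results, which matters when nobody is watching a nightly run.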

At the start, the free and open-source KNIME Analytics Platform played an important role: the cost advantages of open-source software are significant.

Once acceptance grew, getting a license for KNIME Server was much simpler.

Integrations and extensions of KNIME Software 

  • Database Integration (Connector, reader, writer, upload, SQL executor)
  • CSV Reader and Writer nodes, plus many transformation nodes
  • ORACLE™ Database Driver
  • OpenBabel
  • External Tool

What type of data sources are involved?

  • ORACLE™ view with code, primary key and structures as SMILES


Did you automate certain processes?

  • The whole process doesn’t require any human intervention.

Are you using the scheduler available with KNIME Server?

  • Yes, every night.

Is the workflow deployed as an app on KNIME WebPortal?

  • No.

Did you export the data to a visualization tool?

  • We deploy the calculated properties and the PNG images to the corporate database. The data are then accessed using a web-based visualization tool. 

Next Steps in the Project

  • Next steps include building and implementing a similar workflow for the on-demand calculation of properties, using an SD file as input and providing the user with the same set of calculated properties for designed compounds before their synthesis.

