AI FOR MEDICINE

Analytics for Discovery

FILES

White paper

BME TOOLKIT                Toolkit Demonstration

Why we are unique: out-of-the-box "Easy AI" office software

Introduction to AI and Chemoinformatics

Transcript: https://medium.com/@patrickchirdon/what-is-chemoinformatics-b4c12449a0d

Subscribe to us on Medium and our newsletter for updates on the intersection of AI and pharma!  Learn about how artificial intelligence works and see how general pharmacology and chemical engineering can be really cool.

 

What are your biggest challenges?

What metrics is your CEO most interested in?

Why are you looking for new software?

When can we give you a demonstration or a sales presentation?

Subscribe to Our Newsletter

Send a picture of your molecule and get a free compound report card!

Example: GSK3 report card https://www.dropbox.com/s/nt5kqgsp2pghumf/reportcard.csv?dl=0

Get A Free Hour of Consulting with the Toolkit- Learn How We Can Advance Your Project and What Would Be Involved- Fill Out Contact Us Form

Describe your need for biomedical or chemical industry software and we can build you a solution! 

email patrickchirdon@aidrugdiscovery.net

or call 440-897-6916

WHAT THE SOFTWARE CAN DO

Basic Edition

Target Prediction

Look up SMILES by typing in the molecule name and pressing Submit.
Support Vector Machine
46 models. Metrics for test data (percent):
accuracy 97.59 +/- 2.41
sensitivity 91.9 +/- 8.1
specificity 98.6 +/- 1.4
215/247 (87%) correct mechanism on independent test set.
======================================
ChEMBL Target Prediction
Multiclass Classifier
Number of unique targets: 560
Ion channel 5
Kinase 96
Nuclear receptor 21
GPCR 180
Others 258
accuracy 0.87
AUC 0.92
sensitivity 0.76
specificity 0.92
precision 0.82
225/225 (100%) correct mechanism on independent test set. Note: for a given target, 1 is considered positive and 0 is considered negative.
======================================
Interpreting output: In the target prediction images, green marks a region of the molecule that contributes positively to the tested property, red marks a region that contributes negatively, and gray marks a region with no detected contribution. For more on this method, please read "Similarity maps - a visualization strategy for molecular fingerprints and machine learning methods."
======================================
After a run, the output folder inside the targetprediction folder should contain a .png image for each of the SMILES. Under the Images menu, make sure to set the directory to the output directory of the targetprediction folder. Because there are 46 models, it is best to submit only a few SMILES at a time.
======================================
Creating your own models: Find assay data on PubChem (for example https://pubchem.ncbi.nlm.nih.gov/#query=interferon&tab=assay) or in the ChEMBL bioassays. Each assay must be saved as a .txt file with two columns: the first for the SMILES and the second for either 1 or 0 (active and inactive, respectively). Place the text file in the targetprediction folder, and include the name of the assay in the file name. You want a model with both good sensitivity and good specificity (each as close to 1 as possible). Note that a model can appear highly accurate, but if its sensitivity is zero, it does not detect positives.
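As an illustration of the two-column format described above, here is a minimal Python sketch that splits assay lines into SMILES and labels. It assumes the columns are separated by whitespace, which the description does not specify, and the function name parse_assay_lines is hypothetical rather than part of the toolkit.

```python
from typing import List, Tuple


def parse_assay_lines(lines: List[str]) -> Tuple[List[str], List[int]]:
    """Split 'SMILES label' lines into parallel lists.

    Assumption: columns are whitespace-separated. Lines that do not
    have exactly two columns, or whose label is not 0 or 1, are skipped.
    """
    smiles, labels = [], []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # blank or malformed line
        if parts[1] not in ("0", "1"):
            continue  # header row or invalid label
        smiles.append(parts[0])
        labels.append(int(parts[1]))
    return smiles, labels


# Made-up example rows: 1 = active, 0 = inactive.
example = [
    "CCO 0",
    "c1ccccc1O 1",
    "",
    "CC(=O)Oc1ccccc1C(=O)O 1",
]
smiles, labels = parse_assay_lines(example)
n_active = sum(labels)
print(f"{len(smiles)} compounds: {n_active} active, {len(labels) - n_active} inactive")
```

Checking the active/inactive counts before training is a quick way to spot a heavily imbalanced assay, which is the situation where accuracy alone becomes misleading.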
======================================
Confusion matrix
tn fp
fn tp
Note that row 1, column 1 is the true negative count, NOT the true positive count as you might expect from a statistics class. A sensitive model will not have 0 in the bottom right corner (tp). If you are not getting good sensitivity and specificity, you may want to change the penalty C=500000 to some other value. By default the SVC uses an RBF kernel, but this can be changed as described in the scikit-learn documentation. Trained models are saved as .pkl files that can be loaded later for reuse.
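To make the matrix layout concrete, the following Python sketch computes accuracy, sensitivity, and specificity from a matrix laid out as tn fp / fn tp. The function name and the example counts are made up for illustration; the example shows how a model that never predicts a positive can still look accurate.

```python
def metrics_from_confusion(matrix):
    """Return (accuracy, sensitivity, specificity) for [[tn, fp], [fn, tp]]."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    # Sensitivity: fraction of true actives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true inactives that were rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity


# Hypothetical imbalanced test set: 95 inactives and 5 actives,
# scored by a model that predicts "inactive" for every compound.
always_negative = [[95, 0], [5, 0]]
acc, sens, spec = metrics_from_confusion(always_negative)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Here the bottom right corner (tp) is 0, so sensitivity is 0 even though accuracy is 0.95.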
======================================
Pan Assay Interference
See "Seven Year Itch: Pan-Assay Interference Compounds (PAINS) in 2017 - Utility and Limitations" and "New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays." Pan Assay Interference Compounds commonly result in false positives in biological screening assays.
Because they tend to bind many targets, they are not selective and therefore do not make good drug candidates. We found that the higher the drug score in DataWarrior (http://www.openmolecules.org/datawarrior/), the lower the frequency of compounds containing PAINS. Using DataWarrior's evolutionary algorithm (be sure to use the wand tool if you want to fix the scaffold), run a few rounds of evolution: calculate the drug scores (macro → run macro → calculate properties), take the top 5 scoring compounds as starting points for the next round, and repeat until you get drug scores greater than 0.9. Select based on SkelSpheres similarity and the algorithm will generate a large number of compounds that have high drug scores, which are often free of PAINS.
The program reports which functional groups in each compound were responsible for a positive PAINS result. It also reports the fraction of sp3-hybridized carbons. Compounds with a fraction greater than 0.47 tend to be more selective binders. Note that double bonds reduce the sp3 fraction, as they make the compound flatter. See "Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success." PAINS are defined in the following references:
Doveston R, et al. A Unified Lead-oriented Synthesis of over Fifty Molecular Scaffolds. Org Biomol Chem 13 (2014) 859-65. doi:10.1039/C4OB02287D
Jadhav A, et al. Quantitative Analyses of Aggregation, Autofluorescence, and Reactivity Artifacts in a Screen for Inhibitors of a Thiol Protease. J Med Chem 53 (2009) 37-51. doi:10.1021/jm901070c
======================================
Fragmenter
Input a list of SMILES. These will be recombined into new combinations. If you take the lowest-energy ligands from a docking program and recombine them, some of the resulting compounds may bind with lower energy than the originals.
======================================
Make Spreadsheet
Input SMILES. The output is a spreadsheet called test.xlsx, saved in the target prediction folder, that contains images of the molecules.
======================================
Solubility
Predicts log S (aqueous solubility). A log S greater than -4 is considered soluble.
Model: linear regression.
Root mean square error: 1.27 on a scale from -4 to 4.
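The solubility cutoff and the error metric quoted above can be sketched in a few lines of Python. This is only an illustration: the helper names and the log S values are made up, and the toolkit's own regression model is not reproduced here.

```python
import math

SOLUBLE_THRESHOLD = -4.0  # from the text: log S greater than -4 is soluble


def is_soluble(log_s: float) -> bool:
    """Apply the log S > -4 cutoff."""
    return log_s > SOLUBLE_THRESHOLD


def rmse(actual, predicted) -> float:
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up measured and predicted log S values, for illustration only.
measured = [-2.0, -5.0, -3.5]
predicted = [-1.0, -4.0, -4.5]

print("RMSE:", rmse(measured, predicted))
print("Predicted soluble:", [is_soluble(x) for x in predicted])
```

With an RMSE above 1 log unit, predictions that fall close to the -4 cutoff are best treated as borderline rather than as a firm soluble or insoluble call.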
======================================
Build a SAR model
Loss functions:
cross entropy - the default loss function for binary classification problems. Summarizes the average difference between the actual and predicted probability distributions.
hinge - an alternative to cross entropy for binary classification, developed for use with support vector machine models.
mse (mean squared error) - the default loss for regression problems. Calculated as the average of the squared differences between the predicted and actual values.
mae (mean absolute error) - for regression problems, used in cases where there are outliers. Calculated as the average of the absolute differences between the actual and predicted values.
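For readers who want to see how these four losses are calculated, here is a plain-Python sketch using the standard textbook definitions. The function names are illustrative and are not the toolkit's or Keras's API; the hinge version assumes labels of 0 and 1 that are mapped to -1 and +1.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels given probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def hinge(y_true, scores):
    """Average hinge loss; 0/1 labels are mapped to -1/+1 (an assumption)."""
    return sum(max(0.0, 1.0 - (2 * y - 1) * s)
               for y, s in zip(y_true, scores)) / len(y_true)


def mse(actual, predicted):
    """Mean of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean of absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up regression example with one large outlier in the predictions.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 14.0]
print("mse:", mse(actual, predicted))
print("mae:", mae(actual, predicted))
```

Because mse squares each error, the single outlier dominates it far more than it dominates mae, which is why mae is suggested when outliers are present.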
=============

Substructure Search

=============

Wash Library

=============

Library Creation

=============

3D Coordinate SDF Creator

==============

Molecular Descriptors

==============

Compound Report Card

======================

Autodock

======================

Draw Compounds


 

modulator of glutamate NMDA receptor-- pink is inactive, green is active

46 pre-built models

UPGRADE

Tell us what you would like in the software for your particular chemical need and we will build it!!

Take our survey!  https://us10.list-manage.com/survey?u=a8422babdc552e8db848a1c92&id=73f3f8dad5

Partner With Us!

DIGITAL DRUG DISCOVERY SERVICES

Add us to your pipeline!

If you're an academic lab, startup, chemical company, regulatory agency, AI/software company, or patent attorney, we would love to hear from you! In recent years, contract research organizations like ours have partnered with pharmaceutical companies to provide data mining and custom algorithm creation for their customers' pipelines. Whether it is a long-term contract for us to screen your compounds or just a one-time job, get in touch. We could create compound report cards for you, design libraries, and possibly synthesize the compounds for you. Contact us for an introductory session. The first 30 minutes of the introductory consultation are free, and then we can set up a strategic planning session for your particular analytics needs.

Brainstorming

CONTACT US

Not satisfied? Want more features? Tell us!

440-897-6916

CV

Background Details

Chief Technology Officer

BA BIOLOGY

Aug 2008- May 2012

Education
B.A. in Biology, minor in cognitive linguistics, Case Western Reserve University, 2012

FULBRIGHT/SWISS GOVERNMENT SCHOLARSHIP

Aug 2012-May 2013

École polytechnique fédérale de Lausanne

POST-BACC

Aug 2013-May 2015

Cleveland State University. Took classes and prepared for the MCAT

MEDICAL SCHOOL Y1 AND Y2

May 2015- Mar 2018

Ohio University Heritage College of Osteopathic Medicine. I completed years one and two of medical school. I was close to passing my board exam, and rather than continuing to struggle, I decided to pursue other scientific strengths. I was always interested in AI, and fortunately for me, this was the right time to make a switch, as many medical fields are headed in this direction.

MS- BIOMEDICAL ENGINEERING

2020

Ohio University Russ College of Engineering

CV

Background Details

Chief Financial Officer, CEO

May 2012

Bachelor of Science Chemistry, May 2012

2012-2013

Geochemical Simulations Technician, Global Resource Engineering, Aurora, Colorado

2013-2014

Organic Extractions Technician, Curtis & Thompkins Ltd., Berkeley, California

2014-2015

Laboratory Manager, Environmental and Plant Biology, Ohio University, Athens, Ohio

2015-2018

Research Assistant, Environmental and Plant Biology, Ohio University, Athens, Ohio

2018-Present

Research Assistant, Ohio University Genomics Facility, Ohio University, Athens, Ohio

PhD, Molecular and Cell Biology

2019 Ohio University

ABOUT US

We have not launched yet; the site is under construction. The software was created in the lab of Dr. Sumit Sharma and Dr. Douglas Goetz by Patrick Chirdon as part of his master's thesis. We created a virtual compound library using DataWarrior and performed protein-ligand docking with Rosetta. However, there was no tool that easily integrated QSAR models for target prediction with existing pipelines in an easy-to-use GUI format, so we created this toolkit. The software is based on the TensorFlow, Keras, and RDKit Python modules. The project began in 2019.

©2019 by Digital Drug Discovery. Proudly created with Wix.com