Plex has announced that it is partnering with Ginkgo Datapoints, a service of Ginkgo Bioworks, to use Plex’s artificial intelligence platform, Plex AI, to analyze the GDPx2 dataset, the latest release of a large transcriptomics survey of compound-induced gene expression covering four human primary cell types, 85 compound treatments, six doses, and four replicates. The cell types used in the dataset are human melanocytes, aortic smooth muscle cells, dermal fibroblasts, and skeletal muscle myoblasts. The data, close to four terabytes in size, were generated using Drug-seq, an ultra-high throughput miniaturized transcriptomics assay.
The partners plan to use the Plex AI platform to find new connections between drugs and diseases, with an eye toward finding drug repurposing opportunities and adding to the body of knowledge about drug efficacy and safety. Speaking with GEN during this year’s Bio-IT World conference in Boston, which took place April 2–4, Douglas Selinger, PhD, CEO and co-founder of Plex Research, said that the partners were not focused on a specific disease area like oncology or neurodegenerative disease. Primarily, the purpose is to show the “richness” of the GDPx2 dataset and the benefit of using Plex’s platform to extract meaningful insights from the data. The analysis could, for example, elucidate new uses for existing drugs or shed light on better ways of stratifying patients.
While at Novartis, Selinger began thinking about taking the algorithms that search engines like Google use to find webpages and applying them to raw biological data. He even built a prototype system that was disseminated broadly. He later left Novartis to launch Plex Research and build out a brand-new platform based on similar ideas for searching chemical biology and omics datasets.
Crucially, Plex’s target is raw datasets: the company’s platform leverages information from biological databases as well as other sources like scientific publications. “This is not just about finding what was in the text of papers or patents, which is what most people think about when they say they think about the scientific literature,” Selinger told GEN at Bio-IT. Most papers present only a small subset of the data that scientists generated and used for their study. “We generate all these massive data sets … tens of thousands of gene measurements [that don’t] get mentioned anywhere in the paper,” he said. The result is millions of data points consigned to supplementary files.
“We get a lot from those sort of sources … that aggregate many other data sources. We also dig into individual studies that we think are especially important [and] we’ll reformat them and sometimes reanalyze them and then incorporate those,” Selinger explained.
Underlying the company’s AI platform are a proprietary focal graph technology and large language models. The data is represented as a knowledge graph, which allows different types of datasets to be represented in a searchable structure. Users can run queries to find novel links between proteins, gene pathways, and drugs. The system ranks proposed drug targets and provides detailed supporting data to back its responses. “You’re not predicting the targets,” Selinger stressed. “You’re identifying these new findings… and providing the experimental data that supports that finding and the details of where it came from [and] how that data was generated.”
Besides publicly available information, the company can also incorporate proprietary datasets into its platform and take that information into account when selecting potential targets. “The data model is a graph [with] nodes and edges” that shows how compounds, targets, pathways, biomarkers, and more are connected, he said. “There certainly is scientific input in choosing which nodes to put in and which edges,” he added. “But the data model is incredibly simple.”
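To make the “nodes and edges” idea concrete, here is a minimal sketch of a typed graph in which every edge carries a note on where it came from, so that a query returns the supporting source along with the link. It is a generic illustration, not Plex’s focal graph technology; the class, the node names (compound_A, target_X, and so on), and the source labels are all hypothetical.

```python
from collections import defaultdict, deque


class KnowledgeGraph:
    """Toy undirected graph; each edge records a relation and its data source."""

    def __init__(self):
        self.node_type = {}            # node name -> kind (compound, target, ...)
        self.adj = defaultdict(list)   # node name -> [(neighbor, relation, source)]

    def add_node(self, name, kind):
        self.node_type[name] = kind

    def add_edge(self, a, b, relation, source):
        # Store the edge in both directions so it can be traversed either way.
        self.adj[a].append((b, relation, source))
        self.adj[b].append((a, relation, source))

    def find_path(self, start, goal):
        """Breadth-first search; returns a list of
        (from_node, relation, to_node, source) steps, or None if unconnected."""
        prev = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                break
            for nbr, rel, src in self.adj[node]:
                if nbr not in prev:
                    prev[nbr] = (node, rel, src)
                    queue.append(nbr)
        if goal not in prev:
            return None
        steps = []
        node = goal
        while prev[node] is not None:
            parent, rel, src = prev[node]
            steps.append((parent, rel, node, src))
            node = parent
        steps.reverse()
        return steps


# Hypothetical example data, invented purely for illustration.
kg = KnowledgeGraph()
kg.add_node("compound_A", "compound")
kg.add_node("target_X", "target")
kg.add_node("pathway_P", "pathway")
kg.add_node("disease_D", "disease")
kg.add_node("compound_B", "compound")  # deliberately left unconnected

kg.add_edge("compound_A", "target_X", "binds", "hypothetical_assay_1")
kg.add_edge("target_X", "pathway_P", "member_of", "hypothetical_database_2")
kg.add_edge("pathway_P", "disease_D", "implicated_in", "hypothetical_study_3")

path = kg.find_path("compound_A", "disease_D")
for src_node, relation, dst_node, source in path:
    print(f"{src_node} --{relation}--> {dst_node}  (evidence: {source})")
```

Keeping the source on each edge is what lets a result be traced back to the experiment that produced it, rather than being presented as a bare prediction.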
Furthermore, the platform is very specific. “We can go to the Ginkgo data set and say ‘you’ve treated with this compound … and we’ve collected the data about how cells respond to that compound’,” he explained. “Now we’re going to the [public datasets] and say, ‘Where have we ever seen that pattern before?’ Some of it will be, you know, patterns we expect. But then we may see connections where someone did something different. Or maybe it matches a disease signature or a disease pattern that we didn’t know about.”
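The “where have we ever seen that pattern before?” question can be illustrated with a simple signature comparison. The sketch below assumes each expression signature is a mapping from gene to log fold change and ranks reference signatures by cosine similarity over the genes they share with the query. This is one common, generic way to compare signatures and is not a description of how Plex’s platform scores matches; the gene names, values, and reference labels are made up.

```python
import math


def cosine_similarity(sig_a, sig_b):
    """Cosine similarity between two signatures over their shared genes.

    Returns 0.0 when the signatures share no genes or either one is all
    zeros on the shared genes.
    """
    shared = sig_a.keys() & sig_b.keys()
    if not shared:
        return 0.0
    dot = sum(sig_a[g] * sig_b[g] for g in shared)
    norm_a = math.sqrt(sum(sig_a[g] ** 2 for g in shared))
    norm_b = math.sqrt(sum(sig_b[g] ** 2 for g in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_matches(query, references):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, cosine_similarity(query, sig))
              for name, sig in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical signatures (gene -> log fold change), invented for illustration.
query = {"GENE1": 2.0, "GENE2": -1.0, "GENE3": 0.5}
references = {
    "similar_treatment": {"GENE1": 1.8, "GENE2": -0.9, "GENE3": 0.4},
    "opposite_signature": {"GENE1": -2.0, "GENE2": 1.0, "GENE3": -0.5},
    "unrelated": {"GENE4": 1.0},
}

ranked = rank_matches(query, references)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

A strongly positive score suggests a reference that moves the same genes in the same direction, while a strongly negative score flags a reversed pattern, which can be just as interesting when the reference is a disease signature.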
Plex has provided its platform to more than 40 companies in total, including several major pharmaceutical companies. Customers so far have used the Plex platform to run queries across a broad range of drug modalities. The agreement with Ginkgo is different. The partners plan to incorporate datasets from Ginkgo Datapoints that have been made public into the Plex platform. “They have the capacity to generate amazing data, and we have a way to make sense of that data,” Selinger said. They also plan to publish a paper that showcases the value of the data as well as the analysis approach used.
Besides its partnership with Ginkgo, Plex also works with academic institutions. Recently, the company published a pre-print with scientists from Harvard Medical School that describes the use of knowledge graphs and large language models in drug discovery.