Sage Bionetworks uses open-science practices and develops novel research technology to enable scientists to collaborate more efficiently and share data responsibly. As a non-profit and neutral convener, Sage believes that fostering teamwork among distributed research communities helps accelerate and improve human health outcomes. While Sage is experienced in developing custom applications for a range of collaborative research projects, they sought the expertise of GenUI to enhance the reliability of the CRI iAtlas platform.
In partnership with the Cancer Research Institute (CRI) and Institute for Systems Biology (ISB), Sage Bionetworks developed the interactive portal CRI iAtlas. CRI iAtlas seeks to help scientists improve genetic research on tumors in the context of immune features and interactions. For example, suppose a study posits that a specific compound affecting a subtype of immune cells slows the growth of a particular tumor. An investigator can then explore results in iAtlas, compiled from multiple other extensive studies and over 10,000 individuals, for supporting evidence. To continue expanding this resource's utility, the iAtlas team partners with the global research community to integrate new cohorts of data, wrangling gigabytes of output files that cover 20,000 genes or more and potentially millions of mutations. This data needs to be stored safely and remain accessible to scientists.
“This wasn't my first project working with GenUI, so I was optimistic going in. One of the really beneficial things is how their engineers work with our scientists and data engineers side-by-side toward an end result that we can maintain ourselves.”
Dr. James Eddy, Director of Informatics & Biocomputing, Sage Bionetworks
Relying on flat files (e.g., tabular data with a .txt or .csv extension) to store and load this data is hard to maintain, labor-intensive, and slow at large scale. Sage wanted to replace flat files with a database architecture, and potentially a caching layer, to improve data management, performance, and scalability in support of current and future data visualization modules. They also sought to incorporate continuous integration (CI) and continuous deployment (CD) tools.
They needed an accomplished and innovative software partner who could turn their CRI iAtlas vision into a reliable platform. GenUI is that partner.
In Phase One of the project, the newly formed team refactored the CRI iAtlas application, allowing scientists and developers to query massive amounts of data. CRI iAtlas now finds and sorts the data more quickly and effectively before exposing it through secure portals for subset analyses. The more researchers use CRI iAtlas, the faster they can select a subset, and the easier it is to create visualizations. Simplifying access to this once-dense body of data allows for more immediate answers to hypotheses, facilitating future research into cancer diagnosis and treatment.
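To make the flat-file-to-database idea concrete, here is a minimal sketch in Python. It is an illustration only: the case study does not name the database used for CRI iAtlas, so SQLite from the standard library stands in for it, and the table name, column names, and sample values are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file contents (made-up values): one row per sample and gene.
FLAT_FILE = """sample_id,gene,value
S1,CD8A,12.5
S1,PDCD1,3.1
S2,CD8A,7.0
S2,PDCD1,9.4
S3,CD8A,15.2
"""


def load_flat_file(conn, text):
    """Load CSV text into an indexed table and return the number of rows loaded."""
    conn.execute(
        "CREATE TABLE gene_expression (sample_id TEXT, gene TEXT, value REAL)"
    )
    rows = [
        (r["sample_id"], r["gene"], float(r["value"]))
        for r in csv.DictReader(io.StringIO(text))
    ]
    conn.executemany("INSERT INTO gene_expression VALUES (?, ?, ?)", rows)
    # The index lets the database find one gene's rows without scanning the
    # whole table, which a flat file cannot do.
    conn.execute("CREATE INDEX idx_gene ON gene_expression (gene)")
    conn.commit()
    return len(rows)


def samples_for_gene(conn, gene, min_value=0.0):
    """Return (sample_id, value) pairs for one gene at or above a threshold."""
    cursor = conn.execute(
        "SELECT sample_id, value FROM gene_expression "
        "WHERE gene = ? AND value >= ? ORDER BY sample_id",
        (gene, min_value),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
row_count = load_flat_file(conn, FLAT_FILE)
subset = samples_for_gene(conn, "CD8A", min_value=10.0)
print(row_count, subset)
```

With a flat file, selecting a subset like this means reading and filtering the entire file on every request. With a database, the same question becomes a single indexed query.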
“The way that we managed data within [iAtlas] was prohibitive in terms of integrating new sources and data types, as well as scaling. We recognized the need to move away from flat files and had ideas for a solution, but it was hard to find the bandwidth to prototype and explore those ideas versus partnering with a group that's an expert in cloud-based database systems.”
Dr. James Eddy, Director of Informatics & Biocomputing, Sage Bionetworks
Building on their success in Phase One, the team then turned their attention to Phase Two: the design and development of an iAtlas Data API in Python and GraphQL so that researchers can more easily consume CRI iAtlas content. Drawing on work already done to automate the build and deployment of new datasets, the iAtlas Data API will allow Sage to run its data transformation and database build pipelines in a private environment. Creating a new version of the database is compute-intensive but happens only irregularly. GenUI therefore proposed an AWS solution that provisions EC2 Spot Instances as needed to run data-build tasks, so Sage pays for its build environment only while using it.
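As a rough sketch of what on-demand Spot provisioning can look like, the Python below builds a one-time Spot request for the EC2 `run_instances` call in boto3. This is an assumption-laden illustration, not the configuration GenUI delivered: the case study does not say which tooling provisions the instances, and the function names, AMI ID, and instance type here are hypothetical placeholders.

```python
def build_spot_request(image_id, instance_type, max_price=None):
    """Build keyword arguments for boto3's EC2 client.run_instances call
    that request a single one-time Spot Instance."""
    spot_options = {
        "SpotInstanceType": "one-time",
        # One-time Spot requests are terminated when interrupted.
        "InstanceInterruptionBehavior": "terminate",
    }
    if max_price is not None:
        # The API expects the maximum hourly price as a string.
        spot_options["MaxPrice"] = str(max_price)
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": spot_options,
        },
        # If the build job shuts the machine down when it finishes,
        # the instance is terminated and billing for it stops.
        "InstanceInitiatedShutdownBehavior": "terminate",
    }


def launch_build_instance(params):
    """Launch the instance and return its ID.

    Requires the third-party boto3 package and AWS credentials, so the
    import is deferred and this function is not called in the sketch.
    """
    import boto3

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]


# Placeholder values; a real build would use its own AMI and instance size.
params = build_spot_request("ami-EXAMPLE", "c5.4xlarge")
print(params["InstanceMarketOptions"])
```

Because the instance exists only for the duration of a build, an irregular, compute-heavy job like a database rebuild incurs cost only while it runs.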
“In terms of long-term impact, our positive working relationship with GenUI has made this collaboration an appealing avenue for other groups and other projects.”
Dr. James Eddy, Director of Informatics & Biocomputing, Sage Bionetworks
Together, the GenUI and Sage teams have defined, built, and optimized the queries necessary for application features and use cases. All the technology choices were made to ensure that the Sage team would be able to maintain and build on the work after delivery.
The application has already grown from one initial dataset to nine, and the team is now better equipped to add more as research requires, rolling out new information to users as the platform grows.
Open access to CRI iAtlas can help cancer researchers explore data to look for trends, generate hypotheses, and determine where to dig deeper, ultimately supercharging the cancer research community and advancing the search for potential treatments and cures.