NIMS Hyderabad and IIITH create pathology datasets for cancer & kidney research

Hyderabad: In a significant step towards advancing India-centric clinical research, the International Institute of Information Technology, Hyderabad (IIITH) has partnered with Nizam’s Institute of Medical Sciences (NIMS), Hyderabad, to release publicly available datasets of digitized histopathological images. The datasets focus on brain cancer and a kidney disease, lupus nephritis, and provide resources for medical research and AI development.
The India Pathology Dataset (IPD) project is a collaborative effort involving academia, hospitals, industry, and the government. Its primary goal is to digitize tissue biopsy slides. Digitization reduces the risk of slide damage, shortens turnaround times, supports clinical decision-making, and expands research opportunities through AI.
The project is supported by the Technological Innovation Hub for Data Banks, Data Services, and Data Analytics (TiH-Data). As part of the initiative, IIITH has installed a whole-slide digital scanner at NIMS to facilitate digitization. According to Prof. Vinod P.K., who is leading the dataset curation, digitized slides can be viewed and shared on computers, enabling collaborative diagnosis across locations.
Brain Tumor Dataset Released
One of the first datasets released is IPD-Brain, described in a paper in Scientific Data, a Nature Portfolio journal. This open-access dataset includes 547 high-resolution hematoxylin and eosin (H&E) stained slides from 367 patients and is one of the largest of its kind in Asia. Dr. Megha Uppin of the Department of Pathology at NIMS emphasized that precise tumor typing and grading are essential for effective cancer management. The dataset provides a foundation for training machine learning models to improve diagnostic accuracy and to explore regional variations in brain tumors.
AI has the potential to bridge gaps in brain tumor diagnosis, particularly given the shortage of specialized neuropathologists. Digital pathology can also give peripheral hospitals remote access to specialist expertise.
The project aims to extend its datasets to other cancers, including breast, lung, colorectal, oral, and cervical cancer. NIMS is also contributing to the lung cancer dataset.
Lupus Nephritis Dataset
In addition to the cancer datasets, the IPD project has compiled a dataset on lupus nephritis, a kidney disease caused by an autoimmune response, which disproportionately affects women in India. The dataset will help nephropathologists at NIMS classify the disease and recommend appropriate treatment. AI tools are expected to help with the difficult task of classifying disease subtypes and to reduce interobserver variability among pathologists.
AI and Molecular Prediction
AI is also being used to predict molecular markers from H&E slide images. Traditionally, molecular profiles are obtained through genetic testing or immunohistochemistry (IHC). The IPD team, however, is exploring how tissue morphology reflects underlying DNA alterations, which could allow models to predict critical markers such as isocitrate dehydrogenase (IDH) mutations in brain tumors directly from the slides.
The Importance of Histopathological Datasets
The IPD project’s open-source datasets are valuable resources for researchers developing new AI models and conducting data analysis. Prof. Vinod noted that the project is one of the first instances of open-source medical data from India. The IIITH campus now houses a second slide scanner available for research and educational use, and dental colleges and corporate hospitals are already using it.
In addition to research, the dataset serves as an educational tool, offering MD students and pathologists a resource for studying histopathological images in depth.
Looking ahead, Prof. Vinod said that additional datasets, including one on breast cancer, are in development. The IPD project is unique in its focus on Indian demographics, filling a gap in region-specific data for histopathology research, which has traditionally relied on datasets such as the U.S.-based TCGA (The Cancer Genome Atlas).
The India Pathology Dataset project is set to contribute significantly to clinical research, education, and the development of AI-based diagnostic tools in India.