Subgroup Discovery (SD) is a supervised machine learning method used for exploratory data analysis to identify relationships (subgroups) within a dataset relative to a target variable. Key components in SD algorithms include the search strategy, which explores the problem’s search space, and the quality measure, which evaluates the subgroups identified. Despite the effectiveness of SD and the range of algorithms available, only a few Python libraries offer state-of-the-art SD tools. Existing libraries like Vikamine and pysubgroup lack comprehensive support, highlighting the need for a reliable, well-documented library that integrates popular SD algorithms.
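To make the two components concrete, here is a minimal sketch in plain pandas of an exhaustive search strategy paired with one quality measure, WRAcc (weighted relative accuracy), computed as (n/N) * (p/n - P/N). It is not the API of any SD library; the function names `wracc` and `exhaustive_search` and the toy dataset are assumptions made for illustration only.

```python
from itertools import combinations

import pandas as pd


def wracc(df, mask, target_mask):
    """WRAcc = (n/N) * (p/n - P/N): coverage times gain in target share."""
    N = len(df)                           # rows in the dataset
    n = int(mask.sum())                   # rows covered by the subgroup
    if n == 0:
        return 0.0
    p = int((mask & target_mask).sum())   # covered rows that hit the target
    P = int(target_mask.sum())            # rows that hit the target overall
    return (n / N) * (p / n - P / N)


def exhaustive_search(df, target_col, target_value, max_depth=2):
    """Hypothetical helper: score every conjunction of up to `max_depth`
    'attribute == value' conditions and rank them by WRAcc."""
    target_mask = df[target_col] == target_value
    features = [c for c in df.columns if c != target_col]
    conditions = [(c, v) for c in features for v in df[c].unique()]
    results = []
    for depth in range(1, max_depth + 1):
        for combo in combinations(conditions, depth):
            # Skip conjunctions that use the same attribute twice.
            if len({c for c, _ in combo}) < depth:
                continue
            mask = pd.Series(True, index=df.index)
            for c, v in combo:
                mask &= (df[c] == v)
            results.append((combo, wracc(df, mask, target_mask)))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Toy data, invented for this example.
toy = pd.DataFrame({
    "outlook": ["sunny", "sunny", "rain", "rain", "overcast", "overcast"],
    "windy":   ["no", "yes", "no", "yes", "no", "yes"],
    "play":    ["no", "no", "yes", "no", "yes", "yes"],
})

ranking = exhaustive_search(toy, "play", "yes")
best_pattern, best_score = ranking[0]
print(best_pattern, round(best_score, 4))
```

Exhaustive enumeration only works on tiny search spaces like this one; algorithms such as those the article names later exist precisely to prune or reorganize this search.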
Researchers from the Med AI Lab at the University of Murcia and the Murcian Bio-Health Institute have introduced Subgroups, an open-source Python library designed to simplify the use of SD algorithms. Built for efficiency in native Python, the library provides a user-friendly interface modeled after scikit-learn, making it accessible to experts and non-experts alike. Its algorithm implementations are based on established scientific research, and its modular design allows for customization and expansion. Subgroups is already used in multiple research papers and projects and is available on GitHub, PyPI, and Anaconda.org.
The Subgroups library is a modular Python tool for SD, organized into four parts: core elements, quality measures, data structures, and algorithms. It includes classes for key SD components such as selectors, patterns, and subgroups. The library implements various SD algorithms, such as VLSD and SDMap, along with multiple quality measures, including WRAcc and the Binomial Test. It supports silent and log modes for flexible output and ships with extensive unit tests to verify correct functionality. Built with Python 3 and leveraging pandas, the library is designed for easy extension and reliable algorithm behavior.
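As a rough illustration of how selectors, patterns, and subgroups relate, the sketch below models them as small dataclasses on top of pandas. These classes are hypothetical and are not the library's own; the assumption here is that a selector is a single "attribute == value" condition, a pattern is a conjunction of selectors, and a subgroup pairs a pattern with a target selector.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class Selector:
    """One condition of the form attribute == value (illustrative only)."""
    attribute: str
    value: object

    def covers(self, df):
        return df[self.attribute] == self.value


@dataclass(frozen=True)
class Pattern:
    """A conjunction of selectors."""
    selectors: tuple

    def covers(self, df):
        mask = pd.Series(True, index=df.index)
        for selector in self.selectors:
            mask &= selector.covers(df)
        return mask


@dataclass(frozen=True)
class Subgroup:
    """A pattern (the description) paired with a target selector."""
    description: Pattern
    target: Selector

    def counts(self, df):
        """Return (tp, fp, TP, FP), the counts quality measures are built on."""
        covered = self.description.covers(df)
        positive = self.target.covers(df)
        tp = int((covered & positive).sum())    # covered, target holds
        fp = int((covered & ~positive).sum())   # covered, target fails
        TP = int(positive.sum())                # target holds, whole dataset
        FP = int((~positive).sum())             # target fails, whole dataset
        return tp, fp, TP, FP


# Toy data, invented for this example.
toy = pd.DataFrame({
    "outlook": ["sunny", "sunny", "rain", "rain", "overcast", "overcast"],
    "windy":   ["no", "yes", "no", "yes", "no", "yes"],
    "play":    ["no", "no", "yes", "no", "yes", "yes"],
})

subgroup = Subgroup(
    description=Pattern((Selector("outlook", "overcast"),)),
    target=Selector("play", "yes"),
)
counts = subgroup.counts(toy)
print(counts)
```

Quality measures such as WRAcc can be written as functions of these four counts, which is one reason a modular design can keep measures separate from the algorithms that call them.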
The Subgroups library comes with manuals and examples that help users and developers familiarize themselves with both SD techniques and the library’s implementation. It provides practical examples, such as one for the VLSD algorithm, and because it is open-source, researchers can apply key SD algorithms across various domains. This versatility allows the library to be used in both past and ongoing research where SD tools were previously unavailable, and it contributes to generating new scientific knowledge.
In addition to being a valuable resource for research, the library is also used in real-world projects, having been downloaded over 7,100 times and featured in several scientific papers. It allows for fair comparison and evaluation of SD algorithms within a unified framework, avoiding the need to combine multiple machine learning libraries. The Subgroups Library is continuously evolving, offering the potential for further expansion and the integration of new algorithms. It has already been applied in several notable research projects and collaborations, demonstrating its growing impact in academic and practical contexts.
The Subgroups Library is an open-source Python tool that simplifies using SD algorithms in machine learning and data science. Key features include improved efficiency due to its native Python implementation, a user-friendly interface modeled after scikit-learn, and reliable algorithm implementations based on scientific publications. The library’s modular design allows easy customization, enabling users to add new algorithms, quality measures, and data structures. It has already been applied in numerous research papers and projects, highlighting its effectiveness and adaptability in various domains. Future updates will include additional SD algorithms and search strategies.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.