NEXT is a system that makes it easy to develop, evaluate, and apply active learning.
Active learning methods automatically adapt data collection by selecting the most
informative samples in order to accelerate machine learning. Because of this,
testing and comparing active learning algorithms in the real world requires
collecting new datasets (adaptively), rather than simply applying algorithms to
benchmark datasets, as is the norm in (passive) machine learning research.
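To make the contrast between adaptive and passive data collection concrete, the following is a minimal sketch on a toy problem: estimating a hidden threshold on the interval [0, 1] from binary labels. It is not taken from NEXT. The names `oracle`, `active_learn`, and `passive_learn`, and the threshold value 0.37, are assumptions chosen only for illustration. The active learner always queries the midpoint of the interval that still contains the threshold, while the passive learner labels uniformly random points.

```python
import random


def oracle(x, threshold=0.37):
    """Hypothetical labeler: returns 1 if x is at or above the hidden threshold."""
    return 1 if x >= threshold else 0


def active_learn(label, n_queries):
    """Adaptive sampling: query the midpoint of the interval that still
    contains the threshold, which halves the uncertainty with each label."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = (lo + hi) / 2
        if label(x) == 1:
            hi = x  # threshold is at or below x
        else:
            lo = x  # threshold is above x
    return (lo + hi) / 2


def passive_learn(label, n_queries, seed=0):
    """Non-adaptive baseline: label points drawn uniformly at random."""
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = rng.random()
        if label(x) == 1:
            hi = min(hi, x)
        else:
            lo = max(lo, x)
    return (lo + hi) / 2


if __name__ == "__main__":
    for name, learner in [("active", active_learn), ("passive", passive_learn)]:
        estimate = learner(oracle, 10)
        print(f"{name}: estimate={estimate:.4f}, error={abs(estimate - 0.37):.4f}")
```

In this sketch, the points queried by the active learner depend on the labels it has already received, so its dataset cannot be fixed in advance. This is the property that makes benchmark datasets insufficient for evaluating such algorithms.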
To facilitate the development, testing and deployment of active learning for real applications, we have built an open-source software system for large-scale active learning research and experimentation. The system, called NEXT, provides a unique platform for real-world, reproducible active learning research. This paper details the challenges of building the system and demonstrates its capabilities with several experiments. The results show how experimentation can help expose strengths and weaknesses of active learning algorithms, in sometimes unexpected and enlightening ways.
Publishing data and software needed to reproduce experimental results is
essential to scientific progress in all fields.
Due to the adaptive nature of data collection in active learning experiments, it is not enough to simply publish data gathered in a previous experiment. For other researchers to recreate the experiment, they must also be able to reconstruct the exact adaptive process that was used to collect the data. This means that the complete system, including any web-facing crowdsourcing tools, not just algorithm code and data, must be made publicly available and easy to use.
By leveraging cloud computing, NEXT abstracts away the difficulties of building a data collection system and lets the researcher focus on active learning algorithm design. Any other researcher can replicate an experiment in under one hour simply by using the same experiment initialization parameters.
NEXT puts state-of-the-art active learning algorithms
in the hands of non-experts interested in collecting data in more efficient ways.
This includes psychologists, social scientists, biologists, security analysts and researchers in any other field in which large amounts of data are collected, sometimes at great monetary and time expense. For non-experts, choosing an appropriate active learning algorithm is perhaps an easier step than data collection itself.
NEXT is accessible through a REST Web API and can be easily deployed in the cloud with minimal knowledge and expertise using automated scripts. NEXT provides researchers with a set of example templates and widgets that can be used as graphical user interfaces to collect data from participants (see supplementary materials for examples).