NEXT is a system that makes it easy to develop, evaluate, and apply active learning.
Active learning methods automatically adapt data collection by selecting the most
informative samples in order to accelerate machine learning. Because of this,
testing and comparing active learning algorithms in the real world requires
collecting new datasets (adaptively), rather than simply applying algorithms to
benchmark datasets, as is the norm in (passive) machine learning research.
To facilitate the
development, testing and deployment of active learning for real applications, we
have built an open-source software system for large-scale active learning research
and experimentation. The system, called NEXT, provides a unique platform for
real-world, reproducible active learning research. This paper details the challenges
of building the system and demonstrates its capabilities with several experiments.
The results show how experimentation can help expose strengths and weaknesses
of active learning algorithms, in sometimes unexpected and enlightening ways.
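To make the idea of adaptively selecting informative samples concrete, the following is a minimal sketch on a textbook problem: locating a hidden threshold on [0, 1] from binary labels. It is an illustration only and is not drawn from NEXT or from the experiments reported here; the function names, the threshold value, and the tolerance are all assumptions chosen for the example. The active learner queries the midpoint of the interval still consistent with past labels, while the passive learner queries points uniformly at random.

```python
import random


def label(x, theta):
    """Hypothetical labeling oracle: 1 if x is at or above the hidden threshold."""
    return 1 if x >= theta else 0


def active_queries(theta, eps):
    """Count labels needed when each query is the midpoint of the
    interval (lo, hi] that is still consistent with all labels so far."""
    lo, hi, n = 0.0, 1.0, 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        n += 1
        if label(mid, theta):
            hi = mid  # threshold is at or below mid
        else:
            lo = mid  # threshold is above mid
    return n


def passive_queries(theta, eps, rng):
    """Count labels needed when queries are drawn uniformly at random,
    independently of the labels seen so far."""
    lo, hi, n = 0.0, 1.0, 0
    while hi - lo > eps:
        x = rng.random()
        n += 1
        if label(x, theta):
            hi = min(hi, x)
        else:
            lo = max(lo, x)
    return n


if __name__ == "__main__":
    theta, eps = 0.3, 2 ** -10  # assumed values for illustration
    print("active :", active_queries(theta, eps))
    print("passive:", passive_queries(theta, eps, random.Random(0)))
```

In this sketch the active learner halves the uncertainty interval with every label, so the number of queries grows only logarithmically in 1/eps. Note also that the points queried by the active learner depend on the labels it has already received, which is why a dataset logged from one adaptive run cannot simply be replayed to evaluate a different algorithm.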
Publishing data and software needed to reproduce experimental results is
essential to scientific progress in all fields.
Due to the adaptive nature of data collection in active
learning experiments, it is not enough to simply publish data gathered in a previous experiment. For
other researchers to recreate the experiment, they must also be able to reconstruct the exact adaptive
process that was used to collect the data. This means that the complete system, including any
web-facing crowdsourcing tools, not just algorithm code and data, must be made publicly available
and easy to use.
By leveraging cloud computing, NEXT abstracts away the difficulties of building
a data collection system and lets the researcher focus on active learning algorithm design. Any
other researcher can replicate an experiment in under one hour by just using the same experiment
initialization parameters.
NEXT puts state-of-the-art active learning algorithms
in the hands of non-experts interested in collecting data in more efficient ways.
This includes psychologists, social scientists, biologists, security analysts, and researchers in any
other field in which large amounts of data are collected, sometimes at great cost in money and time.
For non-experts, choosing an appropriate active learning algorithm is perhaps an easier step than
data collection.
NEXT is accessible through a REST Web API and can be easily deployed in the cloud with minimal
knowledge and expertise using automated scripts. NEXT provides researchers with a set of example
templates and widgets that can be used as graphical user interfaces to collect data from participants
(see supplementary materials for examples).