
We present a noise resilient probabilistic model for active learning of a Gaussian process classifier from crowds, i.e., a set of noisy labelers. It explicitly models both the overall label noise and the expertise level of each individual labeler with two levels of flip models. Expectation propagation is adopted for efficient approximate Bayesian inference of our probabilistic model for classification, based on which a generalized EM algorithm is derived to estimate both the global label noise and the expertise of each individual labeler. The probabilistic nature of our model immediately allows the adoption of the prediction entropy for active selection of the data samples to be labeled, and for active selection of high-quality labelers, based on their estimated expertise, to label the data. We apply the proposed model to four visual recognition tasks, i.e., object category recognition, multi-modal activity recognition, gender recognition, and fine-grained classification, on four datasets with real crowd-sourced labels from the Amazon Mechanical Turk. The experiments clearly demonstrate the efficacy of the proposed model. In addition, we extend the proposed model with the Predictive Active Set Selection Method to speed up the active learning system, whose efficacy is verified by experiments on the first three datasets. The results show that our extended model not only preserves a higher accuracy, but also achieves a higher efficiency.

As research on visual recognition gradually evolves towards an experimental science, partly due to the successful introduction of the machine learning approach to computer vision (Burl et al. 2008; Sanchez and Perronnin 2011; Krizhevsky et al. 2011), collecting labeled visual datasets at large scale from crowd-sourcing tools such as the Amazon Mechanical Turk has become a common practice (Deng et al.). Although it is cheap to obtain a large quantity of labels through crowdsourcing, it is well known that the collected labels can be very noisy. It is therefore desirable to model the expertise levels of the labelers to ensure the quality of the labels (Deng et al. 2009; Vijayanarasimhan and Grauman 2014; Ambati et al.): the higher the expertise level a labeler is at, the lower the label noise he/she will produce.

Previous works on modeling the labelers’ expertise have mainly adopted two approaches. The first attempts to evaluate the labelers against a pre-labeled gold standard dataset (Ambati et al.): when a labeler constantly generates labels that contradict the gold standard, all labels from that labeler may be discarded, as he/she is highly likely to be irresponsible. The second approach instead evaluates the labels by collecting multiple labels for each data sample (Deng et al.); online or postmortem majority voting, or a majority model consistency check, is then conducted to obtain the more likely ground-truth label of each data sample. The basic assumption is that the majority of the labelers are behaving in good faith.

The first approach is able to evaluate the labelers online, which is desirable, but it needs a pre-labeled set of data to serve as the gold standard, which may be an obstacle by itself. The second approach focuses on the label noise; it does not explicitly evaluate the labelers, although it may be extended to do so by tracking online how often a labeler contradicts the majority. Notwithstanding their demonstrated success, these two approaches are rather ad hoc: a principled approach that jointly models the global noise level of the labels and the expertise level of each individual labeler, in the absence of gold standard labels, is still lacking, and that is what we want to achieve in this paper.
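The two levels of flip models described in the abstract can be sketched generatively: a global flip rate first corrupts the true label, and each labeler then flips the result with his/her own rate. This is a minimal illustration only; the parameters `eps` and `rho` below are made-up assumptions, not values estimated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not the paper's estimates):
eps = 0.1                          # global label-noise flip probability
rho = np.array([0.05, 0.2, 0.4])   # per-labeler flip rates (lower = higher expertise)

def crowd_labels(z):
    """Draw one crowd label per labeler for a sample with true label z in {-1, +1}."""
    # Level 1: the global flip model corrupts the true label.
    h = z if rng.random() > eps else -z
    # Level 2: each labeler flips the (possibly corrupted) label at his/her own rate.
    return np.where(rng.random(len(rho)) > rho, h, -h)

labels = crowd_labels(+1)          # one noisy label per labeler
```

In the paper's full model these flip rates are latent quantities estimated by the generalized EM algorithm; the sketch only shows the assumed label-generation process.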

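The entropy-based active selection mentioned in the abstract can likewise be illustrated with a small sketch: query the unlabeled sample whose predictive distribution has the highest entropy, and route it to the labeler with the lowest estimated flip rate. The predictive probabilities and flip-rate estimates below are hypothetical placeholders.

```python
import numpy as np

def entropy(p):
    """Binary prediction entropy; p is P(y = +1 | x)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

# Hypothetical predictive probabilities for five unlabeled samples:
probs = np.array([0.05, 0.35, 0.50, 0.80, 0.99])

# Query the sample the classifier is most uncertain about
# (binary entropy is maximized at p = 0.5) ...
query = int(np.argmax(entropy(probs)))    # -> 2

# ... and assign it to the labeler with the lowest estimated flip rate.
flip_rates = np.array([0.05, 0.2, 0.4])   # hypothetical per-labeler estimates
labeler = int(np.argmin(flip_rates))      # -> 0
```

In the actual system the predictive probabilities come from the Gaussian process classifier and the flip rates from the EM estimates; the selection rule itself is this simple argmax/argmin.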