Semi-Supervised Learning via Generalized Maximum Entropy

Candidate: Ayse Naz Erkan

Advisor: Yann LeCun

Maximum entropy (MaxEnt) framework has been studied extensively in the
supervised setting. Here, the goal is to find a distribution p, that
maximizes an entropy function while enforcing data constraints so that
the expected values of some (pre-defined) features with respect to p,
match their empirical counterparts approximately. Using different
entropy measures, different model spaces for p and different
approximation criteria for the data constraints yields a family of
discriminative supervised learning methods (e.g., logistic regression,
conditional random fields, least squares and boosting). This framework
is known as the generalized maximum entropy framework.

Semi-supervised learning (SSL) has emerged in the last decade as a
promising field that enables utilizing unlabeled data along with
labeled data so as to increase the accuracy and robustness of
inference algorithms. However, most SSL algorithms to date have had
trade-offs, for instance in terms of scalability or applicability to
multi-categorical data.

In this thesis, we extend the generalized MaxEnt framework to develop
a family of novel SSL algorithms using two different approaches:
i. Introducing Similarity Constraints
We incorporate unlabeled data via modifications to the primal MaxEnt
objective in terms of additional potential functions. A potential
function stands for a closed proper convex function that can take the
form of a constraint and/or a penalty representing our structural
assumptions on the data geometry. Specifically, we impose similarity
constraints as additional penalties based on the semi-supervised
smoothness assumption; i.e., we restrict the generalized MaxEnt
problem such that similar samples have similar model outputs.
ii. Augmenting Constraints on Model Features
We incorporate unlabeled data to enhance the estimates on the model
and empirical expectations based on our assumptions on the data
geometry.

In particular, we derive the semi-supervised formulations for three
specific instances of the generalized MaxEnt on conditional
distributions, namely logistic regression and kernel logistic
regression for multi-class problems, and conditional random fields for
structured output prediction problems. A thorough empirical evaluation
on standard data sets that are widely used in the literature
demonstrates the validity and competitiveness of the proposed
algorithms. In addition to these benchmark data sets, we apply our
approach to two real-life problems: i. vision based robot grasping,
and ii. remote sensing image classification, where the
scarcity of the labeled training samples is the main bottleneck in the
learning process. For the particular case of grasp learning, we
propose a combination of semi-supervised learning and active learning,
another sub-field of machine learning that is focused on
the scarcity of labeled samples, when the problem setup is suitable
for incremental labeling.

The novel SSL algorithms proposed in this thesis have numerous
advantages over the existing semi-supervised algorithms as
they yield convex, scalable, inherently multi-class loss functions
that can be kernelized naturally.