Maximum entropy (MaxEnt) framework has been studied extensively in the
supervised setting. Here, the goal is to find a distribution p, that
maximizes an entropy function while enforcing data constraints so that
the expected values of some (pre-defined) features with respect to p,
match their empirical counterparts approximately. Using different
entropy measures, different model spaces for p and different
approximation criteria for the data constraints yields a family of
discriminative supervised learning methods (e.g., logistic regression,
conditional random fields, least squares and boosting). This framework
is known as the generalized maximum entropy framework.
Semi-supervised learning (SSL) has emerged in the last decade as a promising field that enables utilizing unlabeled data along with labeled data so as to increase the accuracy and robustness of inference algorithms. However, most SSL algorithms to date have had trade-offs, for instance in terms of scalability or applicability to multi-categorical data.
In this thesis, we extend the generalized MaxEnt framework to develop a family of novel SSL algorithms using two different approaches: i. Introducing Similarity Constraints We incorporate unlabeled data via modifications to the primal MaxEnt objective in terms of additional potential functions. A potential function stands for a closed proper convex function that can take the form of a constraint and/or a penalty representing our structural assumptions on the data geometry. Specifically, we impose similarity constraints as additional penalties based on the semi-supervised smoothness assumption; i.e., we restrict the generalized MaxEnt problem such that similar samples have similar model outputs. ii. Augmenting Constraints on Model Features We incorporate unlabeled data to enhance the estimates on the model and empirical expectations based on our assumptions on the data geometry.
In particular, we derive the semi-supervised formulations for three specific instances of the generalized MaxEnt on conditional distributions, namely logistic regression and kernel logistic regression for multi-class problems, and conditional random fields for structured output prediction problems. A thorough empirical evaluation on standard data sets that are widely used in the literature demonstrates the validity and competitiveness of the proposed algorithms. In addition to these benchmark data sets, we apply our approach to two real-life problems: i. vision based robot grasping, and ii. remote sensing image classification, where the scarcity of the labeled training samples is the main bottleneck in the learning process. For the particular case of grasp learning, we propose a combination of semi-supervised learning and active learning, another sub-field of machine learning that is focused on the scarcity of labeled samples, when the problem setup is suitable for incremental labeling.
The novel SSL algorithms proposed in this thesis have numerous advantages over the existing semi-supervised algorithms as they yield convex, scalable, inherently multi-class loss functions that can be kernelized naturally.