Up to this point, the PhD candidate has performed a thorough study of the theoretical basis surrounding data representations and machine learning models. Specifically, the literature on local and distributed representations, shallow and deep learning models, and representation learning versus hand-crafted approaches has been covered, grounded by an examination of the fundamentals of computational learning.
Following this theoretical preparation, research work began, focusing on popular representation and learning models for multiple modalities as well as on multimodal approaches. An initial focus has been set on the text modality, with additional emphasis on approaches that introduce semantic information into the learning pipeline. Bag-based methods, post-processing representation learning techniques, and deep neural language models that generate word, document, and sense embeddings in an end-to-end fashion were examined. The semantic augmentation techniques covered include learning-objective modifications, feature combination strategies, embedding fine-tuning, and sense-aware representation extraction. This body of research is being compiled and catalogued into a survey study on semantically enriched text representation methods for classification, adding to the considerable number of scientific publications produced by the project thus far. Additionally, the codebase used in the experimental evaluations supporting these contributions is bundled into a software package, a tool under continuous development and improvement that enables fast, out-of-the-box, large-scale experimentation on popular text datasets.
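To make the distinction concrete, the bag-based methods mentioned above can be illustrated with a minimal count-based bag-of-words sketch. This is a purely illustrative example (the function name and toy documents are hypothetical, not part of the project's codebase): each document becomes a fixed-length count vector over a shared vocabulary, discarding word order entirely, which is precisely the limitation that the embedding-based and semantically augmented approaches discussed in the survey aim to address.

```python
from collections import Counter


def bag_of_words(docs):
    """Map each document to a count vector over a shared, sorted vocabulary.

    Illustrative sketch of a bag-based text representation; real pipelines
    would add tokenization, stop-word handling, and tf-idf weighting.
    """
    # Build the shared vocabulary from all documents (whitespace tokens).
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    # One count vector per document, aligned with the vocabulary order.
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([counts.get(tok, 0) for tok in vocab])
    return vocab, vectors


docs = ["the cat sat", "the dog sat on the mat"]
vocab, vecs = bag_of_words(docs)
# vocab → ['cat', 'dog', 'mat', 'on', 'sat', 'the']
# vecs  → [[1, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2]]
```

Note that "the cat sat" and "sat the cat" would receive identical vectors here; distributed representations such as word and sense embeddings trade this sparse, order-free encoding for dense vectors learned from context.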