Master's thesis projects

If you are enrolled in a master's program and interested in writing your thesis in collaboration with the UiT Machine Learning Group, the projects proposed below are open for you to get involved with.

Learning from limited labeled data

Most successful applications of deep learning have so far relied largely on supervised approaches, fueled by large amounts of labeled data. However, in many application domains, such as the medical domain, obtaining labels can be challenging (for example because labeling requires expert knowledge or is costly). This project aims to develop new deep learning based algorithms that learn from limited labels. Potential directions include zero-shot and few-shot learning, clustering and/or domain adaptation.

Prerequisites: FYS-2021, FYS-3012, FYS-3033

Contact person: Michael Kampffmeyer (michael.c.kampffmeyer@uit.no)

Machine learning and compressive sensing

Recent progress on transformer deep neural networks has shown that applying a Fourier transform to the input can help reduce computational costs, see [FNets](https://arxiv.org/abs/2105.03824). The Fourier transform can be seen as a sort of mixing of the input sequence that helps the network relate long-range information. However, a random mixing does not perform as well, and this puzzling behavior is not well understood at the moment. The goal of the project is to explore this direction by testing different linear mixing schemes on different tasks (not necessarily related to NLP). This may be related to the concept of compressive sensing in signal processing, where the signal is mixed before it is measured. Knowledge from this domain could help make neural networks more efficient.
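To make the comparison concrete, here is a minimal NumPy sketch of the two kinds of mixing mentioned above. It assumes the FNet-style formulation (a 2D discrete Fourier transform over the sequence and hidden dimensions, keeping only the real part). The Gaussian random mixing matrix, its scaling, and the function names are illustrative assumptions, not part of the project description.

```python
import numpy as np


def fourier_mixing(x):
    """FNet-style mixing: 2D DFT over (sequence, hidden) axes, real part only."""
    return np.fft.fft2(x, axes=(-2, -1)).real


def random_mixing(x, rng):
    """Baseline: mix tokens with a fixed random matrix.

    Assumption for illustration: Gaussian entries scaled by 1/sqrt(seq_len).
    """
    seq_len = x.shape[-2]
    w = rng.standard_normal((seq_len, seq_len)) / np.sqrt(seq_len)
    return w @ x


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))  # toy input: 8 tokens, hidden size 4

y_fourier = fourier_mixing(x)
y_random = random_mixing(x, rng)
```

Both operations are linear maps that need no learned parameters and preserve the input shape, so other candidate mixing matrices can be swapped in the same way and compared on a downstream task.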

Recommended prerequisites: Knowledge of machine learning and signal/image processing. A background in math would be a plus (spectral theory, functional analysis, optimization) but is not mandatory.

Contact person: Benjamin Ricaud (https://bricaud.github.io/)

Hidden Markov Model Time Series Segmentation

The hidden Markov model remains the most used model for time series segmentation. Its main advantage, discrete hidden states and observations, is also its main limitation. There have been several works extending the model to more versatile distributions. Our objective is to train a recurrent network to segment time series by enforcing sparsity of the latent representation, all in an unsupervised manner. We will investigate what the model "naturally" extracts and, eventually, how to guide it. Possible applications include medical data, electrical usage, NLP, and many others.

Recommended prerequisites: FYS-3012, FYS-3033

Contact person: Ahcene Boubekki (ahcene.boubekki@uit.no)

Population counting using Drone Images for Marine Surveys

Marine surveys require valuable resources (experts' time and boats). UiT, in collaboration with the Norwegian Polar Institute and the University of Southern Denmark, is working towards a solution for population counting based on images captured by a drone. The initial plan is to develop a supervised learning based methodology for detecting the number of porpoises in an image. Later on, the plan is to extend the framework to accommodate other similar mammal species (with fewer training samples).

Prerequisites: FYS-2021, FYS-3033

Contact person: Puneet Sharma (puneet.sharma@uit.no)

Safe AI using Bayesian Deep Learning

Current decision support tools are usually designed using expert knowledge or data-driven techniques. However, these methods mostly depend on a high level of understanding of the subject, or on a dataset of unrealistically high quality, to achieve optimal or desired performance. Many real-world problems are highly complex and require new techniques that can model uncertainties and make decisions based on the availability and quality of data. Approaches toward building personalized decision support tools include developing a prediction model of the risk and outcome, or deriving safe and effective data-driven decision algorithms. With the development of artificial intelligence, deep learning has been used extensively in modelling and prediction. Combining deep learning with Bayesian inference allows information and uncertainties to be accurately estimated from the training data. The AI agent needs to be designed carefully so that it can safely explore the environment and propose actions that are both risk-averse and robust. Integrating deep learning and Bayesian inference with a reinforcement learning framework will bring great opportunities to solve the problem and contribute toward safe AI.

Background: A background in Bayesian inference, deep learning and reinforcement learning would be ideal, but a general background in machine learning and statistical methodology will be sufficient. Good programming skills are required.

Contact persons: Fred Godtliebsen, UiT Machine Learning Group and Phuong Ngo, Norwegian Centre for E-health Research

Reference:

[1] Ngo, P. and Godtliebsen, F., “Data-Driven Robust Control Using Reinforcement Learning,” 2020. [Online]. Available: https://arxiv.org/pdf/2004.07690.pdf.

Constructing a Norwegian De-identification tool for clinical text

Research using data-hungry machine learning methods is growing, but not for less-resourced languages such as Norwegian and Swedish, and in particular not for clinical text. Patient records contain personal information that may reveal the patient's identity, and exposing it is not permitted, both for ethical reasons and because of the EU GDPR legislation. Therefore, a master's thesis topic is proposed to create a Norwegian de-identification tool for clinical text. The tool should perform de-identification by using NER (Named Entity Recognition) methods to identify so-called PHI (Protected Health Information), such as personal names, locations, phone numbers, ages, dates and social security numbers, and then replace and obscure the sensitive entities found.
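The replacement step can be illustrated with a short Python sketch. It assumes that an NER component has already produced character spans with labels. The `(start, end, label)` span format, the `<LABEL>` placeholder style, the function name and the example sentence are all invented for illustration and are not prescribed by the project.

```python
def replace_entities(text, spans):
    """Replace each (start, end, label) span in text with a <LABEL> placeholder.

    Spans are assumed to be non-overlapping character offsets, end exclusive.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not supported")
        pieces.append(text[cursor:start])  # keep the text before the entity
        pieces.append(f"<{label}>")        # obscure the entity itself
        cursor = end
    pieces.append(text[cursor:])           # keep the remaining text
    return "".join(pieces)


# Made-up example sentence; the spans stand in for the output of an NER model.
text = "Patient Kari Nordmann called on 12.03.2021."
name, date = "Kari Nordmann", "12.03.2021"
spans = [
    (text.index(name), text.index(name) + len(name), "NAME"),
    (text.index(date), text.index(date) + len(date), "DATE"),
]

deidentified = replace_entities(text, spans)
print(deidentified)  # Patient <NAME> called on <DATE>.
```

Keeping detection and replacement separate makes it possible to evaluate the NER component on its own and to change the obscuring strategy, for instance to realistic surrogate values, without retraining.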

Norwegian training data are available in open resources such as NorNe (Norwegian Named Entities corpus). For the evaluation, one may use the Norwegian synthetic clinical text corpus NorSynthClinical-PHI.

Prerequisites:

  • Programming skills in any programming language, preferably Python, Perl or Java.

  • Some experience with machine learning platforms.

  • Natural language skills, ideally in Norwegian, or access to someone who knows Norwegian and can evaluate the results.

Contact person: Professor Hercules Dalianis, Norwegian Centre for E-health Research, Tromsø

Building and annotating a corpus of Norwegian biomedical text

Research in biomedical text mining is growing, but for less-resourced languages such as Norwegian and Swedish there is a lack of resources. Therefore, a master's thesis topic is proposed to create a corpus of Norwegian biomedical text. A starting point is to use Tidsskriftet for Den norske legeforening: download the files, extract the text and create the corpus. Once the corpus is created, annotate it using automatic methods, for example MeSH or ICD-10 diagnosis code assignment, which are available for English, Norwegian and Swedish. The aim is to automatically categorise the texts by scientific topic. Tools to use can include lemmatisation, text matching and word embeddings.

Prerequisites:

  • Programming skills in any programming language, preferably Python, Perl or Java.

  • Some experience with machine learning platforms.

  • Natural language skills, ideally in Norwegian, or access to someone who knows Norwegian and can evaluate the results.

Contact person: Professor Hercules Dalianis, Norwegian Centre for E-health Research, Tromsø