Dissertation Project: Infinite Vocabulary Topic Modelling for Protein Recognition in a Mass Spectrometer

Above is a short video about my Computing Science BSc (Hons) dissertation project. The aim of the project was to develop a proof-of-concept application of an ‘infinite vocabulary’ topic modelling algorithm to mass spectrometry data.

The ‘infinite vocabulary’ feature implemented in this model is intended to let the model extract additional meaningful insight from the data. It may allow the model to distinguish between the standard noise found in mass spectrometry data and more nuanced differences between molecules that can be explained by a common underlying structure.

In addition to extracting more meaningful insight from the data, the ‘infinite vocabulary’ feature eliminates some of the pre-processing steps that otherwise have to be applied to the data.

Read the Dissertation

Click here to read a PDF of the dissertation.