My bachelor thesis project was conducted at the Digital Humanities Lab (DHLab) in connection with the Impresso research project, which aims to develop innovative methods for studying historical newspapers in the digital age. The project focused on mining the metadata of the newspapers contained in the Impresso archive.
I proposed an original approach: exploring irregularities in publication dates as a way to detect events such as paper or print shortages, newspaper staff strikes, and more. For each newspaper, I used a least-squares fit to predict the date of the next issue and compared this prediction with the actual publication date; whenever the difference exceeded a given threshold, the issue was classified as an outlier.
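The outlier rule can be illustrated with a short Python sketch. It is not the thesis implementation, and it makes several assumptions: a newspaper's issues are given as a sorted list of `datetime.date` values, the publication day is fitted as a linear function of the issue index over a sliding window of previous issues, and an issue is flagged when its actual date differs from the extrapolated one by more than a threshold in days. The names `fit_line`, `find_outliers`, `window` and `threshold_days` are hypothetical.

```python
from datetime import date, timedelta


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def find_outliers(dates, window=5, threshold_days=3.0):
    """Flag issues whose date deviates from a least-squares prediction.

    dates: sorted list of publication dates for one newspaper.
    window, threshold_days: hypothetical parameters, chosen for illustration.
    Returns (issue_index, actual_date, predicted_date, gap_in_days) tuples.
    """
    if window < 2:
        raise ValueError("window must contain at least two issues")
    days = [d.toordinal() for d in dates]  # dates as day numbers
    outliers = []
    for i in range(window, len(days)):
        xs = list(range(i - window, i))  # indices of the previous issues
        ys = days[i - window:i]          # their publication days
        a, b = fit_line(xs, ys)
        predicted = a + b * i            # extrapolate to the next issue
        gap = days[i] - predicted        # positive: issue came out late
        if abs(gap) > threshold_days:
            outliers.append((i, dates[i], date.fromordinal(round(predicted)), gap))
    return outliers


if __name__ == "__main__":
    # Hypothetical weekly newspaper that skips two issues (e.g. a strike).
    start = date(1914, 1, 5)
    schedule = [start + timedelta(weeks=k) for k in range(12)]
    issues = schedule[:7] + schedule[9:]
    for idx, actual, predicted, gap in find_outliers(issues):
        print(idx, actual, predicted, round(gap, 1))
```

In this example the first flagged issue is index 7, published on 1914-03-09 instead of the predicted 1914-02-23, i.e. 14 days late. Fitting on a short window lets the expected rhythm adapt if a paper changes frequency, at the cost that a disrupted issue enters the next fits and can cause a few following issues to be flagged as well.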
To make the tool easy to use, I packaged it as a plug-and-play application with a graphical interface served from a local web server. It provides data visualizations and object listings that can be used to analyze the metadata of historical newspapers and gain insights from it. By pointing to irregular gaps in a newspaper's publication record, the tool could help researchers uncover previously undocumented events, making it a useful starting point for future historical research.