Research

I am now employed as a postdoc at the NorwAI center at the Department of Computer Science at NTNU, where I am mostly concerned with the TrustLLM (2023-2026) project. Previously, I was involved in the PRESEMT (2009-2012) and Habit (2014-2017) projects.

My research primarily concerns computational linguistics, the intersection between computer science and linguistics, with Natural Language Processing (NLP) as its most prominent application. This can also be seen as part of Artificial Intelligence (AI), or perhaps better yet, as one of many entry points to AI research. I see the recent advances in language technology as joyous, not depressing, and agree with Ignat et al. (2024) that NLP has not been solved. That many tasks are now handled well only makes it more interesting to research the current limits of this technology.

In summary, I am involved in the following projects:

Machine translation (MT): For the two written forms of Norwegian, Bokmål and Nynorsk, rule-based MT has dominated to this day. Due to both data paucity and the short linguistic distance between the two, rule-based approaches have thrived, even attracting commercial interest. With current GPT technology, however, we might be in for a paradigm shift. In collaboration with colleagues, I am actively developing models for translating between Bokmål and Nynorsk.
Sentence splitting and simplification: As a precursor to creating MT systems that can account for how languages differ in their preferred sentence length, I am working with colleagues on fine-tuning LLMs for sentence splitting (while retaining semantic information).
Evaluation of Large Language Models: A key part of the TrustLLM project is to evaluate the models on both new and existing datasets. With continued training and fine-tuning, there is no shortage of applications to evaluate. Critically, bias and safety concerns must be addressed. In addition, I take a particular interest in the linguistic capabilities of LLMs. In collaboration with partners in the project, we are developing datasets and metrics to research these capabilities, especially with regard to advanced linguistic constructions.
LLMs and Higher Education: LLMs have entered higher education like a juggernaut. It is an ambition of mine to rigorously research the impact LLMs have had, especially on thesis and paper writing (Liang et al. 2024; Kobak et al. 2024). Currently, this research is in the development stage.
Linguistic Landscapes: I am currently working with colleagues to investigate the linguistic landscape of parts of modern Norway, using shopping malls as a case study. Computer vision simplifies data gathering, and LLMs unlock new potential for analyzing the collected data.

References

Ignat, Oana, Zhijing Jin, Artem Abzaliev, Laura Biester, Santiago Castro, Naihao Deng, Xinyi Gao, et al. 2024. “Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 8050–94. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.708.

Kobak, Dmitry, Rita González Márquez, Emőke-Ágnes Horvát, and Jan Lause. 2024. “Delving into ChatGPT Usage in Academic Writing Through Excess Vocabulary.” arXiv Preprint arXiv:2406.07016.

Liang, Weixin, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, et al. 2024. “Mapping the Increasing Use of LLMs in Scientific Papers.” https://arxiv.org/abs/2404.01268.