Presentation of the Master's Thesis of Mr. Nikolaos Mallios, titled "Modalities Fusion Utilizing Transformer Translation for Sentiment Analysis", on Friday 29/9/2023 at 16:00 via teleconference.
On 29 September 2023, at 16:00, graduate student Mr. Nikolaos Mallios of the graduate program "Data Science and Information Technologies" will present his Master's Thesis in an open presentation.
Thesis Title: Modalities Fusion Utilizing Transformer Translation for Sentiment Analysis
Examination Committee:
- Stavros Perantonis, Researcher A', NCSR "Demokritos", Head of Computational Intelligence Laboratory - External Instructor
- Elias S. Manolakos, Professor, Dept. of Informatics and Telecommunications, NKUA
- Theodoros Giannakopoulos, Principal Researcher, NCSR "Demokritos"
Use the following Zoom details to join:
Nikolaos Mallios is inviting you to a scheduled Zoom meeting.
Topic: Master thesis presentation - Nikolaos Mallios
Time: Sep 29, 2023 04:00 PM Athens
Join Zoom Meeting
Meeting ID: 857 9264 0336
This thesis investigates Transformer-based modality fusion for multimodal speech sentiment classification. More specifically, we utilize the Transformer deep learning model for modality fusion through a process analogous to an encoder-decoder translation between each modality. We explore and implement two model architectures, the Hierarchical Translation Transformer and the Single Translation Transformer. Both models work end-to-end, fusing modalities through a translation-like process that creates a representation of the fused modalities and finally predicts the speaker's sentiment.
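The translation-like fusion described above can be illustrated with a minimal sketch: one modality provides the queries and another the keys/values of a scaled dot-product cross-attention step, which is the core mechanism of the Transformer decoder's encoder-decoder attention. All names, dimensions, and the single-head setup here are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def cross_attention(queries, keys_values):
    """Scaled dot-product cross-attention: one modality (queries)
    attends over another (keys/values), analogous to the decoder
    side of an encoder-decoder translation step."""
    d_k = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d_k)        # (Tq, Tk)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # row-wise softmax
    return weights @ keys_values                           # (Tq, d)

rng = np.random.default_rng(0)
text  = rng.normal(size=(12, 64))   # 12 text tokens, 64-d features (hypothetical sizes)
audio = rng.normal(size=(50, 64))   # 50 acoustic frames, 64-d features

# "Translate" text into the acoustic modality: each text token attends
# over the audio frames, yielding one fused vector per text token.
fused = cross_attention(text, audio)
print(fused.shape)   # (12, 64)
```

A real model would wrap this in multi-head attention with learned projections and feed-forward layers; the sketch only shows why the fused output keeps the query modality's sequence length while mixing in information from the target modality.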
We train both developed models using the CMU-MOSEI dataset for multimodal sentiment analysis. CMU-MOSEI is a tri-modal dataset including text, acoustic, and visual input modalities and a target sentiment. We use the CMU-MOSEI data pre-processing and alignment common in similar research work, along with the default train-test split, for reliable and reproducible results. Lastly, we tune the hyper-parameters of our two models on the validation set.
Our first model, the Hierarchical Translation Transformer, performs modality fusion through a Transformer encoder-decoder translation procedure. Our contribution is that the translation fusion is applied directly to the features of each modality's input sequence. As a result, it achieves results comparable to other state-of-the-art models while offering faster training and inference than RNN-based architectures. Our second model, the Single Translation Transformer, works similarly, but the two target modalities are concatenated and processed together so that a smaller model suffices. We show that the Single Translation Transformer performs almost identically to our Hierarchical Translation Transformer and other state-of-the-art models. At the same time, it does not require separate sub-modules for fusing modalities, in contrast to the usual approach in most multimodal sentiment classification models. Finally, we extensively compare the computational gains of our two models with an RNN-based translation-fusion model called MCTN. This comparison shows that our models achieve 470% lower inference time and a 77% reduction in training time per epoch.
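The structural difference between the two variants can be sketched as follows: the hierarchical variant runs one translation pass per target modality (two fusion sub-modules), while the single variant concatenates the target modalities along the time axis and runs a single pass. The `cross_attention` helper, the feature dimensions, and the concatenation scheme here are simplifying assumptions for illustration only.

```python
import numpy as np

def cross_attention(q, kv):
    """Single-head scaled dot-product cross-attention (illustrative)."""
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ kv

rng = np.random.default_rng(1)
text  = rng.normal(size=(10, 32))   # hypothetical token/frame counts and dims
audio = rng.normal(size=(40, 32))
video = rng.normal(size=(25, 32))

# Hierarchical variant: one translation pass per target modality,
# i.e. two separate fusion sub-modules whose outputs are combined.
fused_hier = np.concatenate(
    [cross_attention(text, audio), cross_attention(text, video)], axis=-1)

# Single variant: concatenate the target modalities along time and
# run one translation pass, halving the number of fusion modules.
fused_single = cross_attention(text, np.concatenate([audio, video], axis=0))

print(fused_hier.shape, fused_single.shape)   # (10, 64) (10, 32)
```

The sketch makes the size trade-off concrete: the single variant reuses one attention module for both target modalities, which is the sense in which it "requires a smaller model".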