Dr. Su Luo, School of Computer Science, Central South University, public academic report

Time: February 21, 2022 10:03

Report title: Multimodal Machine Learning: Efficient Visual-Language Deep Learning Models for Image Captioning and Cross-Modal Retrieval

Reporting time: 8:30 a.m., February 22, 2022

Reporting location: Yifu Tower 109

Reporter: Osolo Ian Raymond

Topic of Report 1: Making images matter more: A Fourier-augmented image-captioning transformer

Summary:

Many vision-language models that output natural language, such as image-captioning models, use image features merely to ground the captions, and most of their good performance can be attributed to the language model, which does all the heavy lifting. In this report, we propose a method that makes the images matter more by using fast Fourier transforms to further break down the input features and extract more of their intrinsic salient information, resulting in more detailed yet concise captions. Furthermore, we analyze and provide insight into the use of fast Fourier transform features as alternatives or supplements to regional features for self-attention in image-captioning applications.
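To make the idea of "Fourier-transformed region features" concrete, here is a minimal sketch under stated assumptions. It borrows FNet-style mixing (a 2-D DFT over the region and feature axes, keeping the real part) and combines it with the original features through a residual sum and layer normalization. The function names `fourier_mix` and `fourier_augment`, the residual combination, and the 36 x 512 feature size are all hypothetical illustrations; this is not the speaker's actual architecture.

```python
import numpy as np


def fourier_mix(features: np.ndarray) -> np.ndarray:
    """FNet-style mixing: 2-D DFT over (regions, channels), keeping the real part."""
    # np.fft.fft2 transforms the last two axes: the region axis and the feature axis.
    return np.fft.fft2(features).real


def fourier_augment(features: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Hypothetical augmentation: residual sum of the original and the
    Fourier-mixed features, followed by layer normalization per region."""
    mixed = features + fourier_mix(features)
    mean = mixed.mean(axis=-1, keepdims=True)
    std = mixed.std(axis=-1, keepdims=True)
    return (mixed - mean) / (std + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    regions = rng.standard_normal((36, 512))  # assumed: 36 region features of width 512
    out = fourier_augment(regions)
    print(out.shape)  # (36, 512)
```

The DFT has no learned parameters, so every output position mixes information from all regions at a fixed cost, which is one reason Fourier features are attractive as a supplement to, or substitute for, attention over regional features.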

Topic of Report 2: An analysis of the use of feed-forward sub-modules in a transformer-based visual-language multimodal environment

Summary:

Transformers have become the go-to architecture for deep learning tasks in computer vision and natural language processing because of their state-of-the-art performance on most of those tasks. This performance is mainly attributed to the self-attention mechanism, yet little research has investigated whether self-attention is indeed responsible for most of it. In this report, we use image captioning as the application for a comprehensive analysis of the effect of replacing the self-attention mechanism with feed-forward layers in both the image encoder and the text decoder. We examine how this substitution affects memory usage and how it behaves across sequence lengths, and our experiments revealed many surprising results. The study provides a qualitative analysis of the resulting captions and an empirical analysis of the evaluation metrics and memory usage, giving practical insight into the effect of this substitution in vision-language tasks while also demonstrating competitive results with the much simpler architecture.
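The substitution described above can be illustrated with a small sketch. It contrasts single-head self-attention, which builds an n x n score matrix for every input, with an MLP-Mixer-style feed-forward layer that mixes information along the sequence axis using fixed learned weights. The function names, weight shapes, and sizes are hypothetical. The sketch is non-causal (encoder-style) and omits the masking a text decoder would need, so it should not be read as the speaker's exact design.

```python
import numpy as np


def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention on x of shape (n, d).
    Builds an (n, n) score matrix, so its size grows quadratically with n."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def feedforward_token_mixing(x, w1, w2):
    """Attention-free substitute (MLP-Mixer style): an MLP applied along the
    sequence axis. w1 has shape (h, n) and w2 has shape (n, h), so the
    sequence length n is fixed by the weights."""
    return w2 @ gelu(w1 @ x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, h = 20, 64, 128  # sequence length, model width, token-MLP hidden size
    x = rng.standard_normal((n, d))
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    w1 = rng.standard_normal((h, n)) / np.sqrt(n)
    w2 = rng.standard_normal((n, h)) / np.sqrt(h)
    print(self_attention(x, wq, wk, wv).shape)          # (20, 64)
    print(feedforward_token_mixing(x, w1, w2).shape)    # (20, 64)
```

Both layers return a tensor of the same shape, which is what makes a drop-in comparison possible. The trade-off is that attention adapts its mixing weights to each input at quadratic cost in n, while the feed-forward version uses input-independent weights tied to a fixed sequence length.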

Topic of Report 3: A Nonlinear Supervised Discrete Hashing framework for large-scale cross-modal retrieval

Summary:

In cross-modal retrieval, the biggest issue is the large semantic gap between the feature distributions of heterogeneous data, which makes it very difficult to compute the relationships between different modalities directly. To bridge this heterogeneity gap, many techniques have been proposed that learn an effective common latent representation shared by the heterogeneous modalities, so that cross-modal similarity can then be computed efficiently with standard distance metrics. Some shortcomings of current supervised cross-modal hashing methods will be discussed. Then, a novel hashing-based cross-modal retrieval method that uses food ingredient retrieval as a proof of concept will be presented.
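The retrieval step of hashing-based cross-modal search can be sketched as follows, assuming each modality already has a projection into a shared k-bit space. Features are binarized with `sign` to codes in {-1, +1}, and candidates are ranked by Hamming distance, which for such codes equals (k - inner product) / 2. The random projections, feature sizes, and function names are hypothetical placeholders. Learning those projections (the supervised, nonlinear part of the speaker's framework) is not shown.

```python
import numpy as np


def to_hash_codes(features: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Map features into the shared k-bit space and binarize to {-1, +1}."""
    codes = np.sign(features @ projection)
    codes[codes == 0] = 1  # break ties so every bit is +1 or -1
    return codes.astype(np.int8)


def hamming_distances(query_code: np.ndarray, database_codes: np.ndarray) -> np.ndarray:
    """For +/-1 codes of length k: Hamming distance = (k - <b_q, b_i>) / 2."""
    k = query_code.shape[-1]
    inner = database_codes.astype(np.int32) @ query_code.astype(np.int32)
    return (k - inner) // 2


def retrieve(query_code: np.ndarray, database_codes: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Indices of the top_k database items closest to the query in Hamming space."""
    d = hamming_distances(query_code, database_codes)
    return np.argsort(d, kind="stable")[:top_k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 32  # assumed code length
    image_proj = rng.standard_normal((2048, k))  # hypothetical image-side projection
    text_proj = rng.standard_normal((300, k))    # hypothetical ingredient-text projection
    image_codes = to_hash_codes(rng.standard_normal((1000, 2048)), image_proj)
    ingredient_query = to_hash_codes(rng.standard_normal((1, 300)), text_proj)[0]
    print(retrieve(ingredient_query, image_codes))  # indices of the 5 nearest images
```

Because both modalities land in the same binary space, comparing an ingredient-list query against a large image database reduces to cheap integer arithmetic, which is what makes hashing attractive for large-scale retrieval.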

About the speaker: Osolo Ian Raymond received his BTech degree in Electrical Engineering from Nelson Mandela University, South Africa, and his M.Eng. degree in Software Engineering from Central South University, China, where he is currently pursuing a PhD degree in Computer Science Application & Technology. He has published papers in reputed ESCI/SCI journals on image captioning and cross-modal retrieval. His research interests include machine learning, specifically deep learning for computer vision, natural language processing, and embedded systems.