Hi, I am Alex Falcon, a postdoctoral researcher at the University of Udine (AI Lab). I completed my PhD in Computer Science, Mathematics and Physics, jointly held at Fondazione Bruno Kessler (Technology of Vision - TeV) and the University of Udine, under the supervision of Oswald Lanz (Free University of Bolzano) and Giuseppe Serra (University of Udine). My research currently focuses on multimedia, video and language understanding, and deep learning. Before that, I completed my Bachelor’s and Master’s degrees in Computer Science at the University of Udine. During my Master’s, I started working with AI, machine learning, and deep learning, with a focus on Predictive Maintenance.

E-mail: falcon.alex ‘at’ spes.uniud.it / Google Scholar / Github / LinkedIn / CV

News

  • 2023
  • 2022
    • one paper (pdf) accepted as an Oral at AIABI@AIxIA 2022!
    • one paper accepted at Computers in Industry! code
    • one paper accepted as an Oral at ACM MM 2022! code
    • I delivered two talks at the University of Bolzano: "Data-driven approaches for the Remaining Useful Life Estimation problem" and "Learning video retrieval models with relevance-aware online mining"
    • I was featured in the FBK magazine (Italian)!
    • our solution (report) got 1st place in the EPIC-Kitchens-100 Multi-Instance Retrieval Challenge @ CVPR 2022!
    • I attended the fantastic International Computer Vision Summer School (ICVSS) in Scicli, Italy and presented a poster titled "Relevance-aware Online Mining for Video Retrieval"!
    • one paper accepted as an Oral at ICMR 2022! code
    • one paper accepted as an Oral at ICIAP 2021! code
    • I delivered a seminar on "Data-driven approaches for the remaining useful life estimation problem" at FBK!
  • 2021
    • our solution (report) got 3rd place in the EPIC-Kitchens-100 Action Recognition Challenge @ CVPR 2021!
    • I completed the "Fundamentals of Deep Learning for Multi-GPUs" course held by NVIDIA Deep Learning Institute!
    • we organized the VIQA workshop @ ICPR 2020 (later merged into the VTIUR workshop)!
  • 2020
  • 2019
    • one paper accepted at IRCDL 2019!
    • October 2019: I started my PhD under the supervision of Oswald Lanz and Giuseppe Serra!
    • July 2019: I successfully completed the Master's Degree in Computer Science cum laude!

Main projects and lines of research

Text-Video Retrieval

Topics: multimedia, cross-modal understanding, vision and language

Overview of the algorithm

TL;DR: Text-video retrieval is the task of ranking a collection of videos by their relevance to a user's textual query, so that the most relevant videos appear in the top positions. State-of-the-art video retrieval systems are trained with contrastive learning techniques. However, these techniques enforce constraints at training time that neglect the fact that multiple videos may be relevant to the same caption, and vice versa. In this line of research, we focus on the importance of introducing semantic knowledge into the training process to overcome this limitation. The results confirm our hypotheses, and the learning strategies we proposed effectively address the limitation (e.g., from about 36% nDCG to almost 60% on the challenging EPIC-Kitchens-100 dataset).
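For readers unfamiliar with nDCG, here is a minimal Python sketch of how the metric rewards rankings that place relevant videos first. It assumes the common linear-gain, log2-discount formulation, which may differ from the exact variant used in the papers, and the relevance values are made up for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance scores listed in ranked order."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(ranked_relevances):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance of five videos to one caption,
# listed in the order in which a model ranked them.
good_ranking = [1.0, 0.5, 0.5, 0.0, 0.0]  # relevant videos at the top
poor_ranking = [0.0, 0.0, 0.5, 0.5, 1.0]  # relevant videos at the bottom

print(f"good ranking nDCG: {ndcg(good_ranking):.3f}")
print(f"poor ranking nDCG: {ndcg(poor_ranking):.3f}")
```

Because the relevance values are graded rather than binary, a ranking is still partially rewarded for retrieving videos that are only somewhat relevant to the caption.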

Selected publications/endeavors
  1. Improving semantic video retrieval models by training with a relevance-aware online mining strategy.
    A. Falcon, G. Serra, O. Lanz. Computer Vision and Image Understanding 245, 104035. 2024. [pdf]
  2. Semantics for vision-and-language understanding.
    A. Falcon. PhD Thesis, 2023. [pdf]
  3. A Feature-space Multimodal Data Augmentation Technique for Text-video Retrieval.
    A. Falcon, G. Serra, O. Lanz. ACM MM 2022. [pdf]
  4. UniUD-FBK-UB-UniBZ Submission to the EPIC-Kitchens-100 Multi-Instance Retrieval Challenge 2022. (ranked 1st)
    A. Falcon, G. Serra, S. Escalera, O. Lanz. EPIC@CVPR 2022. [pdf]
  5. Relevance-based margin for contrastively-trained video retrieval models.
    A. Falcon, S. Sudhakaran, G. Serra, S. Escalera, O. Lanz. ACM ICMR 2022. [pdf]
  6. Learning video retrieval models with relevance-aware online mining.
    A. Falcon, G. Serra, O. Lanz. ICIAP 2021. [pdf]

Ranking complex 3D scenes

Topics: multimedia, cross-modal understanding, vision and language

Overview of the algorithm

TL;DR: Many enticing experiences are now available in the Metaverse, so many that it is difficult to find the ones relevant to a given user. Can we formalize this as a ranking problem? We introduced and evaluated state-of-the-art techniques in various scenarios involving complex 3D scenes, either composed of multiple furnished rooms (e.g., apartments) or containing many multimedia items (e.g., paintings in museums), both of which can affect relevance.

Selected publications/endeavors
  1. AdOCTeRA: Adaptive Optimization Constraints for improved Text-guided Retrieval of Apartments.
    A. Abdari, A. Falcon, G. Serra. ACM ICMR 2024. [pdf]
  2. Paving the Way for Personalized Museums Tours in the Metaverse.
    A. Falcon, B. Portelli, A. Abdari, G. Serra. IRCDL 2024. [pdf]
  3. A Language-based solution to enable Metaverse Retrieval.
    A. Abdari, A. Falcon, G. Serra. MMM 2024. [pdf]
  4. FArMARe: a Furniture-Aware Multi-task methodology for Recommending Apartments based on the user interests.
    A. Abdari, A. Falcon, G. Serra. CV4Metaverse@ICCV 2023. [pdf]
  5. Metaverse Retrieval: Finding the Best Metaverse Environment via Language.
    A. Abdari, A. Falcon, G. Serra. MMIR@ACM MM 2023. [pdf]

Video Question Answering

Topics: multimedia, cross-modal understanding, vision and language

Overview of the algorithm

TL;DR: Given a video and a question about its visual contents, can a model automatically provide the correct answer to that question? We investigated and introduced data augmentation techniques to achieve better accuracy while avoiding costly manual annotations, and proposed customized learning strategies leveraging the contents of the question itself.

Selected publications/endeavors
  1. Video question answering supported by a multi-task learning objective.
    A. Falcon, G. Serra, O. Lanz. Multimedia Tools and Applications 82 (25), 38799-38826. 2023. [pdf]
  2. Semantics for vision-and-language understanding.
    A. Falcon. PhD Thesis, 2023. [pdf]
  3. Data augmentation techniques for the video question answering task.
    A. Falcon, G. Serra, O. Lanz. EPIC@ECCV 2020. [pdf]

Remaining Useful Life Estimation

Topics: predictive maintenance

Overview of the algorithm

TL;DR: We address the problem of estimating the remaining useful life (RUL) of mechanical engines (aircraft engines, in particular) using neural sequence models. The RUL measures how long it will take for the device under analysis to reach a failure (or a state in which a failure is very likely). We introduced to this field a neural model inspired by Neural Turing Machines and evaluated it under different scenarios. Experimental results confirm its robustness and precision compared to several neural sequence models.
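To make the RUL definition concrete, here is a minimal Python sketch of how target values could be derived from a run-to-failure recording and compared against predictions. It assumes that time is measured in operating cycles and that the failure occurs at the last recorded cycle. The function names and the predicted values are hypothetical and are not taken from the papers.

```python
def rul_labels(num_cycles):
    """RUL at each cycle of a run-to-failure sequence.

    Assumes the failure happens at the last recorded cycle, so the RUL
    counts down linearly and reaches 0 at the end of the sequence.
    """
    return [num_cycles - 1 - t for t in range(num_cycles)]


def mean_absolute_error(predicted, target):
    """Average absolute difference between predicted and true RUL values."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


# An engine that was recorded for 5 cycles before failing.
target = rul_labels(5)
# Hypothetical predictions from some sequence model, one per cycle.
predicted = [5, 3, 2, 2, 0]

print("target RUL:", target)
print("mean absolute error:", mean_absolute_error(predicted, target))
```

A neural sequence model would take the sensor readings observed up to each cycle as input and be trained to regress these targets.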

Selected publications/endeavors
  1. Neural turing machines for the remaining useful life estimation problem.
    A. Falcon, G. D’Agostino, O. Lanz, G. Brajnik, C. Tasso, G. Serra. Computers in Industry 143, 103762. 2022. [pdf]
  2. Estimating the Remaining Useful Life via Neural Sequence Models: a Comparative Study.
    G. D'Agostino, A. Falcon, O. Lanz, G. Brajnik, C. Tasso, G. Serra. AIABI@AIxIA 2022. [pdf]
  3. A Dual-Stream architecture based on Neural Turing Machine and Attention for the Remaining Useful Life Estimation problem.
    A. Falcon, G. D'Agostino, G. Serra, G. Brajnik, C. Tasso. PHME 2020. [pdf]
  4. A neural turing machine-based approach to remaining useful life estimation.
    A. Falcon, G. D'Agostino, G. Serra, G. Brajnik, C. Tasso. ICPHM 2020. [pdf]
  5. Remaining Useful Life Estimation using LSTM Networks and Attentive mechanisms.
    A. Falcon. MSc Thesis, 2019.

Service

  • Proceedings Chair: IRCDL 2023
  • Organizing Committee member: IRCDL 2025, EQAI 2024, EQAI 2023, ICIAP 2023, AIxIA 2022
  • Organizer: CV4Metaverse 2024 at ECCV, VIQA 2020/VTIUR 2020
  • Guest associate editor: SI on Text-Multimedia Retrieval (ACM TOMM)
  • Journal Reviewing: IJCV, IEEE TMM, CVIU, IET Computer Vision, ACM TOMM, IEEE Trans Hum Mach Syst.
  • Conference Reviewing: MMM 2025, ECCV 2024 (Outstanding Reviewer!), ACM MM 2024, ACM MM 2023, CCISP 2023, IRCDL 2023, ICIAP 2023, ICPR 2022, ICIAP 2021, EMNLP 2021, ICPR 2020.
  • Co-Supervision: 3 Bachelor's and 5 Master's students in the Computer Science or the IoT, Big Data, and ML degree programmes at UniUD, on topics related to Video&Language and Predictive Maintenance.
    • Gallegos Carvajal Ian Marco, MSc. Enhancing text-to-textured 3D mesh generation with training-free adaptation for textual-visual consistency using spatial constraints and quality assurance: a case study on Text2Room. 2024.
    • Rodaro Edoardo, BSc. Rilevamento del flusso di materiale su nastro trasportatore attraverso le reti neurali (English: Detecting material flow on a conveyor belt with neural networks). 2023.
    • Bianchi Carlo, MSc. L'Intelligenza Artificiale a supporto del metaverso (English: Artificial Intelligence in support of the metaverse). 2023.
    • Bruni Pierfrancesco, MSc. Circulant matrices lead to an improved baseline for question-driven video moment localization. 2022. (published at IRCDL 2023!)
    • De Martin Federica, MSc. Ricerca di un nuovo modello video e transfer learning nell’ambito del Multi-Instance video-text retrieval (English: Search for a new video model and transfer learning in the context of Multi-Instance video-text retrieval). 2022.
    • De Reggi Paolo, MSc. Generazione automatica di domande e risposte per il problema del video question answering (English: Automatic generation of questions and answers for the video question answering problem). 2021.
    • Ferroli Daniele, BSc. Implementazione di un sistema di Video Question Answering (English: Implementation of a Video Question Answering system). 2020.
    • Rosso Giovanni, BSc. Utilizzo di reti neurali convolutive per la manutenzione predittiva (English: Using convolutional neural networks for predictive maintenance). 2020.