publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2025
- LNCS
  Mapping Motion: A Cognitive Approach to Dyspraxia Multimodal Analysis
  In Advances in Artificial Intelligence – IBERAMIA 2024, 2025
This research paper presents a novel software tool designed to revolutionize speech therapy for individuals with Childhood Apraxia of Speech (CAS), also known as Developmental Verbal Dyspraxia. The software offers a comprehensive multi-modal analysis approach, utilizing video, audio, and speech-to-text data to extract valuable insights into articulation patterns, head pose, audio characteristics, and word usage. This information empowers Speech-Language Pathologists (SLPs) with data-driven tools for a more precise assessment and development of personalized treatment plans. While the current study employs data from just one healthy subject to evaluate the software’s overall accuracy and data coherence, the functionalities are promising to improve therapy effectiveness for individuals with CAS, giving space for further experiments, analysis and validations with experts. The paper explores how the software’s capabilities can complement existing therapies like PROMPT, Touch-Cue Method, and Melodic Intonation Therapy, based on a cognitive perspective, ultimately aiming to transform the field of speech-language pathology and enhance the lives of those affected by speech disorders.
@inproceedings{Muvdi2025, author = {}, title = {Mapping Motion: A Cognitive Approach to Dyspraxia Multimodal Analysis}, booktitle = {Advances in Artificial Intelligence – IBERAMIA 2024}, year = {2025}, publisher = {Springer}, doi = {10.1007/978-3-031-80366-6_2}, }
- CLEF
  VerbaNexAI at CLEF 2025 JOKER Task 3: Multi-Model LLM Approach for Onomastic Wordplay Translation
  Maria Paz Ramirez, Jeison D. Jimenez, Deyson Gómez Sánchez, and 2 more authors
  In CEUR Workshop Proceedings, 2025
Our approach achieved first place in the CLEF 2025 JOKER Task 3 competition, outperforming all other participating teams and establishing new benchmarks for LLM-based creative translation systems. We tested five different models using advanced prompting strategies. Our methodology involved systematic prompt engineering with Chain-of-Thought reasoning and universe-specific translation patterns. ChatGPT-4o achieved the best performance with 29.5% exact matches and 30.6% accent-tolerant matches, resulting in an overall 60.1% success rate, demonstrating the potential of LLM-based approaches for creative multilingual wordplay translation.
@inproceedings{Ramirez2025JOKER, author = {Ramirez, Maria Paz and Jimenez, Jeison D. and Sánchez, Deyson Gómez and Serrano, Jairo E. and Martinez-Santos, Juan Carlos}, title = {VerbaNexAI at CLEF 2025 JOKER Task 3: Multi-Model LLM Approach for Onomastic Wordplay Translation}, booktitle = {CEUR Workshop Proceedings}, year = {2025}, volume = {4038}, pages = {2860--2869}, doi = {}, }
- CLEF
  VerbaNexAI at CheckThat! 2025: Fine-Tuning DeBERTa for Multi-Label Scientific Discourse Detection in Tweets
  Mervin Jesus Sosa Borrero, Jairo E. Serrano, and Juan Carlos Martinez-Santos
  In CEUR Workshop Proceedings, 2025
This paper presents VerbaNexAI’s submission to Task 4a of the CheckThat! 2025 Lab, which focuses on the identification of scientific discourse in English-language tweets. We propose a multi-label classification approach based on a fine-tuned DeBERTa-v3 model, optimized through stratified cross-validation, threshold calibration using precision-recall curves, and ensemble prediction with soft-voting. Our system ranked 2nd overall in the official leaderboard with a macro-averaged F1 score of 0.7983 and achieved the top F1 score (0.8133) in Category 1 (scientific claims), demonstrating strong performance in detecting verifiable assertions in noisy social media contexts.
@inproceedings{Borrero2025CheckThat, author = {Borrero, Mervin Jesus Sosa and Serrano, Jairo E. and Martinez-Santos, Juan Carlos}, title = {VerbaNexAI at CheckThat! 2025: Fine-Tuning DeBERTa for Multi-Label Scientific Discourse Detection in Tweets}, booktitle = {CEUR Workshop Proceedings}, year = {2025}, volume = {4038}, pages = {1237--1245}, doi = {}, }
- CLEF
  COTECMAR-UTB at eRisk 2025: Semantic-Centroid Symptom Ranking and Early Depression Detection using Adaptive Decision Rule
  Luis Mendoza, Joan Suarez, Juan Carlos Martinez-Santos, and 1 more author
  In CEUR Workshop Proceedings, 2025
Depression remains a major global health concern, with millions of people affected in various demographics. However, the timely detection of depression symptoms remains a challenge due to biases and limitations in traditional diagnostic methods. Social networks have become valuable sources for identifying early signs of depression, as they provide real-time user interactions that reflect emotional states. This paper explores the eRisk 2025 challenge, focusing on two primary tasks for early detection of depression in online conversations. Task 1 involves ranking sentences according to their relevance to depression symptoms, while task 2 addresses the analysis of emotional progression in real-time conversations.
@inproceedings{Mendoza2025eRisk, author = {Mendoza, Luis and Suarez, Joan and Martinez-Santos, Juan Carlos and Serrano, Jairo E.}, title = {COTECMAR-UTB at eRisk 2025: Semantic-Centroid Symptom Ranking and Early Depression Detection using Adaptive Decision Rule}, booktitle = {CEUR Workshop Proceedings}, year = {2025}, volume = {4038}, pages = {1562--1582}, doi = {}, }
- CLEF
  VerbaNex at TalentCLEF2025: Semantic Matching of Multilingual Job Titles through a Framework Integrating ESCO Taxonomy
  Melissa Moreno Novoa, Juan Carlos Martinez-Santos, and Jairo E. Serrano
  In CEUR Workshop Proceedings, 2025
The accurate alignment of occupational titles across multiple languages poses a key challenge in modern talent management systems. Linguistic differences, semantic ambiguity, and the absence of structured references hinder the automatic identification of equivalent roles in diverse cultural contexts. Although language models have improved significantly, their performance remains limited. To address this, we propose a multilingual system for the semantic matching of job titles in English, Spanish, and German. The approach combines a pre-trained model with a fine-tuning process, using positive and negative examples generated from occupational family relationships defined by the ESCO taxonomy.
@inproceedings{Novoa2025TalentCLEF, author = {Novoa, Melissa Moreno and Martinez-Santos, Juan Carlos and Serrano, Jairo E.}, title = {VerbaNex at TalentCLEF2025: Semantic Matching of Multilingual Job Titles through a Framework Integrating ESCO Taxonomy}, booktitle = {CEUR Workshop Proceedings}, year = {2025}, volume = {4038}, pages = {4448--4458}, doi = {}, }
- CLEF
  Development of a Biomedical Question Answering System Based on Transformer Models
  Lila López and Juan Carlos Martinez-Santos
  In CEUR Workshop Proceedings, 2025
Recent advances in artificial intelligence have enabled the automation of complex tasks in the biomedical domain, such as automatic question-answering. Within this framework, the BioASQ international challenge encourages the development of systems capable of understanding natural language questions and generating accurate answers based on the scientific literature. This work aims to design a system that classifies the types of questions and produces suitable responses accordingly. We implemented a modular pipeline with six main stages: question type classification, linguistic preprocessing, Dynamic Routing and specialized model, hyperparameters, context retrieval, and performance evaluation.
@inproceedings{Lopez2025BioASQ, author = {López, Lila and Martinez-Santos, Juan Carlos}, title = {Development of a Biomedical Question Answering System Based on Transformer Models}, booktitle = {CEUR Workshop Proceedings}, year = {2025}, volume = {4038}, pages = {452--459}, doi = {}, }
- CLEF
  RoBERT-IA: Human-AI Collaborative Text Classification
  Deyson Gómez Sánchez, Jeison D. Jimenez, Maria Paz Ramirez, and 2 more authors
  In CEUR Workshop Proceedings, 2025
Within the framework of the Generative AI Detection 2025 SubTask 2: Human-AI Collaborative Text Classification challenge, this study addresses the classification of texts co-authored by humans and large language models (LLMs), aiming to identify the degree of contribution of each author across six specific categories. Given the increasing accessibility and use of models such as GPT-4o, Claude 3.5, and Gemini 1.5-pro, the proliferation of AI-generated or AI-assisted content presents significant challenges in areas including misinformation, academic integrity, and content authenticity.
@inproceedings{Sanchez2025GenAI, author = {Sánchez, Deyson Gómez and Jimenez, Jeison D. and Ramirez, Maria Paz and Serrano, Jairo E. and Martinez-Santos, Juan Carlos}, title = {RoBERT-IA: Human-AI Collaborative Text Classification}, booktitle = {CEUR Workshop Proceedings}, year = {2025}, volume = {4038}, pages = {3672--3680}, doi = {}, }
- CLEF
  UTBNLP at CLEF JOKER 2025 Task 2: mBART-50 Fine-Tuning with Dictionary-Guided Forced Decoding and Phoneme-Based Techniques for English-French Pun Translation
  Duvan Andres Marrugo-Tobón, Jeison D. Jimenez, Jairo E. Serrano, and 1 more author
  In CEUR Workshop Proceedings, 2025
This paper presents a pun-focused translation system developed for the JOKER-2025 Task 2 competition on English-to-French pun translation. The system combines mBART-50 fine-tuning, LoRA adapter training, and a novel phoneme-aware forced-decoding strategy guided by a specialized pun dictionary. The end-to-end pipeline encompasses robust pun detection and tagging, oversampling-based data augmentation, phoneme transcription, and sentiment feature enrichment, enabling comprehensive capture of the layered meanings and playful ambiguity inherent in puns.
@inproceedings{Marrugo2025JOKER, author = {Marrugo-Tobón, Duvan Andres and Jimenez, Jeison D. and Serrano, Jairo E. and Martinez-Santos, Juan Carlos}, title = {UTBNLP at CLEF JOKER 2025 Task 2: mBART-50 Fine-Tuning with Dictionary-Guided Forced Decoding and Phoneme-Based Techniques for English-French Pun Translation}, booktitle = {CEUR Workshop Proceedings}, year = {2025}, volume = {4038}, pages = {2838--2848}, doi = {}, }
- CLEF
  Tiles-wise Inference with Vision Transformers for Multispecies Identification in Vegetation Images
  Andrea Menco-Tovar, Jairo E. Serrano, and Juan Carlos Martinez-Santos
  In CEUR Workshop Proceedings, 2025
This paper presents a method for classifying vegetation plots containing multiple species in the context of the PlantCLEF 2025 challenge. It addresses the simultaneous identification of various species in high-resolution images using a segment-based inference approach with the Vision Transformer (ViT) model, previously pre-trained using the self-supervised learning technique DINOv2. The photos were systematically divided into different patch configurations to enable accurate classification. The results showed that the optimal configuration was the 4×2 patch, achieving a public average macro F1 score of 0.29096 and a private score of 0.28324, ranking 13th in the challenge.
@inproceedings{Menco2025PlantCLEF, author = {Menco-Tovar, Andrea and Serrano, Jairo E. and Martinez-Santos, Juan Carlos}, title = {Tiles-wise Inference with Vision Transformers for Multispecies Identification in Vegetation Images}, booktitle = {CEUR Workshop Proceedings}, year = {2025}, volume = {4038}, pages = {3102--3110}, doi = {}, }
- CLEF
  Prediction of Human Preferences and Explanation Generation with LLM: An Approach Based on RAG, Few-Shot Learning, and Auto-CoT
  Danileth Almanza-Gonzalez, Jairo E. Serrano, and Juan Carlos Martinez-Santos
  In CEUR Workshop Proceedings, 2025
This study presents an advanced approach for predicting human preferences and generating explanations in large language models (LLMs) within the context of the "Preference Prediction" task of ELOQUENT Lab 2025. We implemented techniques such as Few-Shot Learning, Auto Chain-of-Thought (Auto-CoT), and Retrieval-Augmented Generation (RAG), evaluating multiple pre-trained models, from LLaMA-3 to distilgpt2. The system developed by the VerbaNexAI team achieved first place in the competition, standing out for its high performance in both safety (94.15%) and truthfulness (75.16%) criteria.
@inproceedings{Almanza2025ELOQUENT, author = {Almanza-Gonzalez, Danileth and Serrano, Jairo E. and Martinez-Santos, Juan Carlos}, title = {Prediction of Human Preferences and Explanation Generation with LLM: An Approach Based on RAG, Few-Shot Learning, and Auto-CoT}, booktitle = {CEUR Workshop Proceedings}, year = {2025}, volume = {4038}, pages = {1360--1369}, doi = {}, }
- CLEF
  CEDNAV–UTB: Efficient Image Retrieval for Arguments with CLIP
  Diego Alberto Guevara Amaya, Jairo E. Serrano, and Juan Carlos Martinez-Santos
  In CEUR Workshop Proceedings, 2025
This paper introduces an efficient and reproducible system for argumentative image retrieval developed by the UTB–CEDNAV team for the 2025 edition of the Image Retrieval for Arguments challenge at Touché@CLEF. The system leverages the CLIP model (ViT-B/32) to represent textual arguments through images. Unlike previous approaches that rely heavily on complex text processing, image generation models, or multi-stage architectures, this solution focuses on computational simplicity. It significantly reduces energy consumption by reusing embeddings, enabling parallel processing, and eliminating redundant steps.
@inproceedings{Amaya2025ImageRetrieval, author = {Amaya, Diego Alberto Guevara and Serrano, Jairo E. and Martinez-Santos, Juan Carlos}, title = {CEDNAV–UTB: Efficient Image Retrieval for Arguments with CLIP}, booktitle = {CEUR Workshop Proceedings}, year = {2025}, volume = {4038}, pages = {4601--4609}, doi = {}, }
- CLEF
  COTECMAR–UTB at TalentCLEF 2025: Linking Job Titles and ESCO Skills with Sentence Transformer Embeddings
  Jhonattan Llamas, Jairo E. Serrano, and Juan Carlos Martinez-Santos
  In CEUR Workshop Proceedings, 2025
This paper describes the COTECMAR–UTB submission to TalentCLEF 2025 Task B, which focuses on retrieving the skills most relevant to a given job title. Our approach is a lightweight, unsupervised pipeline that normalizes job titles and skill aliases, encodes both with the Sentence-Transformer paraphrase-multilingual-mpnet-base-v2 into a shared 768-dimensional space, and ranks candidate skills by cosine similarity. The model is used strictly in a zero-shot setting with no fine-tuning, so the system is easy to reproduce and deploy.
@inproceedings{Llamas2025TalentCLEF, author = {Llamas, Jhonattan and Serrano, Jairo E. and Martinez-Santos, Juan Carlos}, title = {COTECMAR–UTB at TalentCLEF 2025: Linking Job Titles and ESCO Skills with Sentence Transformer Embeddings}, booktitle = {CEUR Workshop Proceedings}, year = {2025}, volume = {4038}, pages = {4438--4447}, doi = {}, }
- CLEF
  Hybrid Re-ranking for Biomedical Entity Linking using SapBERT Embeddings: A High-Performance System for BioNNE-L 2025-1
  Daniel Peña Gnecco, Jairo E. Serrano, and Juan Carlos Martinez-Santos
  In CEUR Workshop Proceedings, 2025
The BioNNE-L 2025-1 challenge advances biomedical entity linking (BEL) by mapping textual mentions to UMLS concepts, which is crucial for clinical and research applications. This study addresses Subtask 1 (English) with a novel SapBERT-based system. It integrates a hybrid re-ranking strategy combining cosine, Jaccard, and Levenshtein similarities, optimizing weights via grid search. Evaluated on the BioNNE-L development set, our system achieved an Accuracy@1 of 0.718, Accuracy@5 of 0.802, and MRR of 0.750. In the official competition, the VerbaNex AI Lab team secured first place in Accuracy@1 (0.70).
@inproceedings{Gnecco2025BioNNE, author = {Gnecco, Daniel Peña and Serrano, Jairo E. and Martinez-Santos, Juan Carlos}, title = {Hybrid Re-ranking for Biomedical Entity Linking using SapBERT Embeddings: A High-Performance System for BioNNE-L 2025-1}, booktitle = {CEUR Workshop Proceedings}, year = {2025}, volume = {4038}, pages = {497--508}, doi = {}, }
- SemEval
  VerbaNexAI at SemEval-2025 Task 11 Track A: A RoBERTa-Based Approach for the Classification of Emotions in Text
  Danileth Almanza, Juan Martínez Santos, and Edwin Puertas
  In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), 2025
Emotion detection in text has become a highly relevant research area due to the growing interest in understanding emotional states from human interaction in the digital world. This study presents an approach for emotion detection in text using a RoBERTa-based model, optimized for multi-label classification of the emotions joy, sadness, fear, anger, and surprise in the context of the SemEval 2025 - Task 11: Bridging the Gap in Text-Based Emotion Detection competition. Advanced preprocessing strategies were incorporated, including the augmentation of the training dataset through automatic translation to improve the representativeness of less frequent emotions. Additionally, a loss function adjustment mechanism was implemented to mitigate class imbalance, enabling the model to enhance its detection capability for underrepresented categories. The experimental results reflect competitive performance, with a macro F1 of 0.6577 on the development set and 0.6266 on the test set. In the competition, the model ranked 47th, demonstrating solid performance against the challenge posed.
@inproceedings{Almanza2025SemEval, author = {Almanza, Danileth and Santos, Juan Martínez and Puertas, Edwin}, title = {VerbaNexAI at SemEval-2025 Task 11 Track A: A RoBERTa-Based Approach for the Classification of Emotions in Text}, booktitle = {Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)}, year = {2025}, pages = {1192--1197}, }
- SemEval
  VerbaNexAI at SemEval-2025 Task 9: Advances and Challenges in the Automatic Detection of Food Hazards
  Andrea Menco Tovar, Juan Martinez Santos, and Edwin Puertas
  In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), 2025
Ensuring food safety requires effective detection of potential hazards in food products. This paper presents the participation of VerbaNexAI in the SemEval-2025 Task 9 challenge, which focuses on the automatic identification and classification of food hazards from descriptive texts. Our approach employs a machine learning-based strategy, leveraging a Random Forest classifier combined with TF-IDF vectorization and character n-grams (n=2-5) to enhance linguistic pattern recognition. The system achieved competitive performance in hazard and product classification tasks, obtaining notable macro and micro F1 scores. However, we identified challenges such as handling underrepresented categories and improving generalization in multilingual contexts. Our findings highlight the need to refine preprocessing techniques and model architectures to enhance food hazard detection. We made the source code publicly available to encourage reproducibility and collaboration in future research.
@inproceedings{Menco2025SemEval, author = {Tovar, Andrea Menco and Santos, Juan Martinez and Puertas, Edwin}, title = {VerbaNexAI at SemEval-2025 Task 9: Advances and Challenges in the Automatic Detection of Food Hazards}, booktitle = {Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)}, year = {2025}, pages = {1--6}, }
- SemEval
  VerbaNexAI at SemEval-2025 Task 2: Enhancing Entity-Aware Translation with Wikidata-Enriched MarianMT
  Daniel Peña Gnecco, Juan Carlos Martinez Santos, and Edwin Puertas
  In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), 2025
This paper presents the VerbaNexAi Lab system for SemEval-2025 Task 2: Entity-Aware Machine Translation (EA-MT), focusing on translating named entities from English to Spanish across categories such as musical works, foods, and landmarks. Our approach integrates detailed data preprocessing, enrichment with 240,432 Wikidata entity pairs, and fine-tuning of the MarianMT model to enhance entity translation accuracy. Official results reveal a COMET score of 87.09, indicating high fluency, an M-ETA score of 24.62, highlighting challenges in entity precision, and an Overall Score of 38.38, ranking last among 34 systems. While Wikidata improved translations for common entities like "Águila de San Juan," our static methodology underperformed compared to dynamic LLM-based approaches.
@inproceedings{Pena2025SemEval, author = {Gnecco, Daniel Peña and Santos, Juan Carlos Martinez and Puertas, Edwin}, title = {VerbaNexAI at SemEval-2025 Task 2: Enhancing Entity-Aware Translation with Wikidata-Enriched MarianMT}, booktitle = {Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)}, year = {2025}, pages = {1255--1262}, }
- SemEval
  UTBNLP at Semeval-2025 Task 11: Predicting Emotion Intensity with BERT and VAD-Informed Attention
  Melissa Moreno, Juan Martínez Santos, and Edwin Puertas
  In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), 2025
Emotion intensity prediction plays a crucial role in affective computing, allowing for a more precise understanding of how emotions are conveyed in text. This study proposes a system that estimates emotion intensity levels by integrating contextual language representations with numerical emotion-based features derived from Valence, Arousal, and Dominance (VAD). The methodology combines BERT embeddings, predefined VAD values per emotion, and machine learning techniques to enhance emotion detection, without relying on external lexicons. The system was evaluated on the SemEval-2025 Task 11 Track B dataset, predicting five emotions (anger, fear, joy, sadness, and surprise) on an ordinal scale. The results highlight the effectiveness of integrating contextual representations with predefined VAD values, enabling a more nuanced representation of emotional intensity. However, challenges arose in distinguishing intermediate intensity levels, affecting classification accuracy for certain emotions. Despite these limitations, the study provides insights into the strengths and weaknesses of combining deep learning with numerical emotion modeling, contributing to the development of more robust emotion prediction systems.
@inproceedings{Moreno2025SemEval, author = {Moreno, Melissa and Santos, Juan Martínez and Puertas, Edwin}, title = {UTBNLP at Semeval-2025 Task 11: Predicting Emotion Intensity with BERT and VAD-Informed Attention}, booktitle = {Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)}, year = {2025}, pages = {1217--1222}, }
- Systems Soft Comput.
  Enhancing consistency in piping and instrumentation diagrams using DistilBERT and smart PID systems
  F.S. Gómez-Vega, O. Acuña, Andrea C. Camargo, and 6 more authors
  Systems and Soft Computing, 2025
This study presents a novel approach utilizing DistilBERT, a lightweight variant of BERT, to identify inconsistencies in piping and instrumentation diagrams (P&IDs) within SmartPID systems. A structured dataset was constructed by extracting engineering design data from a SQL-based SmartPID database, monitoring all modifications and updates made throughout the design phase. The DistilBERT model was fine-tuned on this dataset to recognize inconsistencies in real-time, achieving an impressive F1 score of 99% and a loss of 0.04%. The model’s performance was validated by domain experts, who confirmed the detected inconsistencies as highly accurate. Our approach significantly reduces the manual effort required for P&ID review and improves design consistency, demonstrating the potential for enhanced safety and efficiency in complex industrial projects. Future work will focus on refining the model’s parameters and expanding its application across different industries.
@article{Gomez2025Systems, author = {Gómez-Vega, F.S. and Acuña, O. and Camargo, Andrea C. and Jimenez, Jeison D. and Galeano, Sara M. and Franco, Isabella E. and Lozano, Laura L. and Vásquez, Jenifer and Puertas, Edwin}, title = {Enhancing consistency in piping and instrumentation diagrams using DistilBERT and smart PID systems}, journal = {Systems and Soft Computing}, year = {2025}, volume = {7}, pages = {200373}, doi = {10.1016/j.sasc.2025.200373}, }
- SemEval
  VerbaNexAI at SemEval-2025 Task 3: Fact Retrieval with Google Snippets for LLM Context Filtering to identify Hallucinations
  Anderson Morillo, Edwin Puertas, and Juan Carlos Martinez Santos
  In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), 2025
The first approach leverages advanced LLMs, employing a chain-of-thought prompting strategy with one-shot learning and Google snippets for context retrieval, demonstrating superior performance. The second approach utilizes traditional NLP analysis techniques, including semantic ranking, token-level extraction, and rigorous data cleaning, to identify hallucinations.
@inproceedings{Morillo2025SemEval, author = {Morillo, Anderson and Puertas, Edwin and Santos, Juan Carlos Martinez}, title = {VerbaNexAI at SemEval-2025 Task 3: Fact Retrieval with Google Snippets for LLM Context Filtering to identify Hallucinations}, booktitle = {Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)}, year = {2025}, pages = {1534--1541}, }
2024
- ClinicalNLP
  VerbaNexAI at MEDIQA-CORR: Efficacy of GRU with BioWordVec and ClinicalBERT in Error Correction in Clinical Notes
  Juan Pajaro, Edwin Puertas, David Villate, and 2 more authors
  Proc. of ClinicalNLP, 2024
The automatic identification of medical errors in clinical notes is crucial for improving the quality of healthcare services. LLMs emerge as a powerful artificial intelligence tool for automating this task. However, LLMs present vulnerabilities, high costs, and sometimes a lack of transparency. This article addresses the detection of medical errors through the fine-tuning approach, conducting a comprehensive comparison between various models and exploring in depth the components of the machine learning pipeline. The results obtained with the fine-tuned ClinicalBERT and Gated Recurrent Unit (GRU) models show an accuracy of 0.56 and 0.55, respectively. This approach not only mitigates the problems associated with the use of LLMs but also demonstrates how exhaustive iteration in critical phases of the pipeline, especially in feature selection, can facilitate the automation of clinical record analysis.
@article{Pajaro2024, author = {Pajaro, Juan and Puertas, Edwin and Villate, David and Estrada, Laura and Tinjaca, Laura}, title = {VerbaNexAI at MEDIQA-CORR: Efficacy of GRU with BioWordVec and ClinicalBERT in Error Correction in Clinical Notes}, journal = {Proc. of ClinicalNLP}, year = {2024}, pages = {461--469}, doi = {10.18653/V1/2024.CLINICALNLP-1.46}, }
- ANDESCON
  Feature Selection for Forecasting of Energy Spot Price in the Colombian Market
  Mauro A. Gonzalez-Sierra, Rafael Arnedo, Edwin Puertas, and 1 more author
  In IEEE ANDESCON, 2024
Enhancing the forecasting accuracy of energy spot prices is vital for decision-making in energy trading. This study evaluates various feature selection methods and regression models to predict energy prices in Colombia. We compare Principal Components Analysis, Pearson Correlation Coefficients, Mutual Information, and Least Absolute Shrinkage methods with Random Forest, Extra Gradient boost, and CatBoost models. Mutual Information is the most compelling feature selection method, improving prediction accuracy notably. It identifies features such as reserves, month, and the Oceanic Niño Index in the study. The Random Forest model with Mutual Information-based selection achieved an R² value of 95%. The research emphasizes the enhanced model accuracy in spot energy price prediction using Mutual Information-based feature selection. Future research should focus on cross-validation techniques and the impact of temporal data for further improvements.
@inproceedings{Gonzalez2024, author = {Gonzalez-Sierra, Mauro A. and Arnedo, Rafael and Puertas, Edwin and Martinez-Santos, Juan Carlos}, title = {Feature Selection for Forecasting of Energy Spot Price in the Colombian Market}, booktitle = {IEEE ANDESCON}, year = {2024}, publisher = {IEEE}, doi = {10.1109/ANDESCON61840.2024.10755699}, }
- ResearchGate
  VerbaNex AI at DIPROMATS 2024: Enhancing Propaganda Detection in Diplomatic Tweets with Fine-Tuned BERT and Integrated NLP Techniques
  2024
This paper outlines the methodology used by VerbaNexAI for the DIPROMATS 2024 competition, part of the Iberian Languages Evaluation Forum (IberLEF). The challenge involves detecting propaganda and strategic narratives in tweets from diplomats of the USA, Europe, Russia, and China, in English and Spanish. For Task 1, which focuses on the identification and characterization of propaganda, we implemented four methodologies. Our pre-processing steps include text normalization, removal of URLs, retweets, user mentions, stop words, and lemmatization. Feature extraction was performed using TF-IDF vectorization, transformer fine-tuning, combined feature extraction from TF-IDF and transformers, and a hashtag-specific feature extraction. Regularization techniques such as class balancing and k-fold cross-validation were applied to ensure robust model performance. Various classifiers, including Random Forest, Support Vector Classifier, Naive Bayes, and Logistic Regression, were evaluated to determine the most effective models. Our approach aims to enhance the detection of propaganda in diplomatic tweets, contributing to a broader understanding of how propaganda operates in social media.
@misc{DIPROMATS2024, author = {}, title = {VerbaNex AI at DIPROMATS 2024: Enhancing Propaganda Detection in Diplomatic Tweets with Fine-Tuned BERT and Integrated NLP Techniques}, year = {2024}, }
- ResearchGate
  VerbaNexAI Lab at HOMO-MEX 2024: Multiclass and Multilabel Detection of LGBTQ+ Phobic Content Using Transformers
  Roger David Gonzalez-Henao, Duvan Andres Marrugo-Tobon, Juan Carlos Martinez-Santos, and Edwin Puertas
  2024
@misc{HOMOMEX2024, author = {Gonzalez-Henao, Roger David and Marrugo-Tobon, Duvan Andres and Martinez-Santos, Juan Carlos and Puertas, Edwin}, title = {VerbaNexAI Lab at HOMO-MEX 2024: Multiclass and Multilabel Detection of LGBTQ+ Phobic Content Using Transformers}, year = {2024}, }
- CCIS
  Unveiling Tourist Profiles in the Department of Sucre: A Text Analysis Approach
  Danileth Almanza-Gonzalez, Edwin Puertas, and Juan Carlos Martinez-Santos
  In Advances in Computing, 2024
Tourism has emerged as an industry powered by technological advancements that enrich tourists’ experiences. Digital interaction has transformed how tourists explore, share, and select destinations and activities. Platforms such as TripAdvisor and FourSquare stand out as sources of information and analysis, enabling understanding of tourist behavior and adapting offerings to tourists’ preferences. In this context, we analyzed comments extracted from TripAdvisor and FourSquare, applying Latent Semantic Analysis (LSA) techniques. Based on these data, segmentation of tourist types was implemented to gain a deep and detailed understanding of tourist behavior in the Sucre department, thus contributing to informed decision-making to enhance the quality of the tourism offer and experience.
@inproceedings{Almanza2024, author = {Almanza-Gonzalez, Danileth and Puertas, Edwin and Martinez-Santos, Juan Carlos}, title = {Unveiling Tourist Profiles in the Department of Sucre: A Text Analysis Approach}, booktitle = {Advances in Computing}, year = {2024}, publisher = {Springer}, doi = {10.1007/978-3-031-75233-9_3}, }
- CCIS
  Implementation of Convolutional Neural Networks for Automated Disease Detection in Cucumber Crops
  Andrea Menco Tovar, Edwin Puertas, and Juan Carlos Martinez-Santos
  In Advances in Computing, 2024
This study presents an automated system for detecting diseases in cucumber crops using Convolutional Neural Networks (CNN). Given the importance of agriculture and the challenges crops face due to pests and diseases, a vision-based approach is proposed to improve the early and accurate identification of diseases in cucumbers. Three CNN architectures were evaluated: Xception, VGG16, and ResNet50, using a balanced dataset of images of healthy and diseased cucumber leaves. The Xception model showed the best performance with an accuracy of 93.45% and a loss of 0.4842, surpassing the other models. Image preprocessing and transfer learning were key to achieving these results. Despite the good results, challenges were identified in accurately classifying some images, suggesting areas for future improvement. This system provides a valuable tool for farmers, enabling early detection and rapid decision-making to control diseases, which can significantly improve crop quality and yield. Future research could integrate this system with mobile technologies and drones for more efficient real-time monitoring.
@inproceedings{Tovar2024, author = {Tovar, Andrea Menco and Puertas, Edwin and Martinez-Santos, Juan Carlos}, title = {Implementation of Convolutional Neural Networks for Automated Disease Detection in Cucumber Crops}, booktitle = {Advances in Computing}, year = {2024}, publisher = {Springer}, doi = {10.1007/978-3-031-75233-9_12}, }
- CCIS
  Enhancing Vocational Guidance with Machine Learning: Predicting STEM Career Viability for High School Students
  Melissa Moreno-Novoa, Edwin Puertas, and Juan Carlos Martinez-Santos
  In Advances in Computing, 2024
The decision to pursue a professional career is paramount for young people. Yet, the absence of adequate guidance can result in misguided choices. In Colombia, most high school students need a clearer understanding of their future career aspirations. This lack of clarity often leads to a significant dropout rate at the university level. Furthermore, students entering university tend to choose careers outside STEM, which could result in a mismatch between their labor skills and future economic needs. To address this issue, we apply machine learning and data mining to information from tests such as Saber 11 and the Gardner test, aiming to predict each student's viability for STEM careers based on their capabilities. Applying machine learning models such as XGBoost, Stacking, Random Forest, Decision Tree, and KNN has shown promising results, albeit with challenges in balancing precision and recall for different classes. These advancements represent an opportunity to enhance vocational guidance and increase the likelihood of success and job satisfaction among young people.
@inproceedings{Moreno2024, author = {Moreno-Novoa, Melissa and Puertas, Edwin and Martinez-Santos, Juan Carlos}, title = {Enhancing Vocational Guidance with Machine Learning: Predicting STEM Career Viability for High School Students}, booktitle = {Advances in Computing}, year = {2024}, publisher = {Springer}, doi = {10.1007/978-3-031-75236-0_12}, }
- SemEval
  VerbaNexAI Lab at SemEval-2024 Task 1: A Multilayer Artificial Intelligence Model for Semantic Relationship Detection
  Anderson Morillo, Daniel Peña, Juan Carlos Martinez Santos, and 1 more author
  Proc. of SemEval, 2024
This paper presents an artificial intelligence model designed to detect semantic relationships in natural language, addressing the challenges of SemEval 2024 Task 1. Our goal is to advance machine understanding of the subtleties of human language through semantic analysis. Using a novel combination of convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and an attention mechanism, our model is trained on the STR-2022 dataset. This approach enhances its ability to detect semantic nuances in different texts. The model achieved an 81.92% effectiveness rate and ranked 24th in SemEval 2024 Task 1. These results demonstrate its robustness and adaptability in detecting semantic relationships and validate its performance in diverse linguistic contexts. Our work contributes to natural language processing by providing insights into semantic textual relatedness. It sets a benchmark for future research and promises to inspire innovations that could transform digital language processing and interaction.
@article{Morillo2024Semeval, author = {Morillo, Anderson and Peña, Daniel and Santos, Juan Carlos Martinez and Puertas, Edwin}, title = {VerbaNexAI Lab at SemEval-2024 Task 1: A Multilayer Artificial Intelligence Model for Semantic Relationship Detection}, journal = {Proc. of SemEval}, year = {2024}, doi = {10.18653/V1/2024.SEMEVAL-1.194}, }
- SemEval
  VerbaNexAI Lab at SemEval-2024 Task 10: Emotion recognition and reasoning in mixed-coded conversations based on an NRC VAD approach
  Santiago Garcia, Elizabeth Martinez, Juan Cuadrado, and 2 more authors
  Proc. of SemEval, 2024
This study introduces an innovative approach to emotion recognition and reasoning about emotional shifts in code-mixed conversations, leveraging the NRC VAD Lexicon and computational models such as Transformer and GRU. Our methodology systematically identifies and categorizes emotional triggers, employing Emotion Flip Reasoning (EFR) and Emotion Recognition in Conversation (ERC). Through experiments with the MELD and MaSaC datasets, we show that including VAD analysis improves the model’s identification of emotional shift triggers and its classification of emotions, as evidenced by an increase in the F1 score. These results underscore the importance of incorporating complex emotional dimensions into conversation analysis, paving new pathways for understanding emotional dynamics in code-mixed texts.
@article{Garcia2024Semeval, author = {Garcia, Santiago and Martinez, Elizabeth and Cuadrado, Juan and Martinez-Santos, Juan and Puertas, Edwin}, title = {VerbaNexAI Lab at SemEval-2024 Task 10: Emotion recognition and reasoning in mixed-coded conversations based on an NRC VAD approach}, journal = {Proc. of SemEval}, year = {2024}, doi = {10.18653/V1/2024.SEMEVAL-1.192}, }
- SemEval
  VerbaNexAI Lab at SemEval-2024 Task 3: Deciphering emotional causality in conversations using multimodal analysis approach
  Victor Pacheco, Elizabeth Martinez, Juan Cuadrado, and 2 more authors
  Proc. of SemEval, 2024
This study delineates our participation in the SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations, focusing on developing and applying an innovative methodology for emotion detection and cause analysis in conversational contexts. Leveraging logistic regression, we analyzed conversational utterances to identify emotions per utterance. Subsequently, we employed a dependency analysis pipeline, utilizing SpaCy to extract significant chunk features, including object, subject, adjectival modifiers, and adverbial clause modifiers. These features were analyzed within a graph-like framework, conceptualizing the dependency relationships as edges connecting emotional causes (tails) to their corresponding emotions (heads). Despite the novelty of our approach, the preliminary results were unexpectedly humbling, with a consistent score of 0.0 across all evaluated metrics. This paper presents our methodology, the challenges encountered, and an analysis of the potential factors contributing to these outcomes, offering insights into the complexities of emotion-cause analysis in multimodal conversational data.
@article{Pacheco2024Semeval, author = {Pacheco, Victor and Martinez, Elizabeth and Cuadrado, Juan and Santos, Juan Carlos Martinez and Puertas, Edwin}, title = {VerbaNexAI Lab at SemEval-2024 Task 3: Deciphering emotional causality in conversations using multimodal analysis approach}, journal = {Proc. of SemEval}, year = {2024}, doi = {10.18653/V1/2024.SEMEVAL-1.193}, }
2023
- SemEval
  UTB-NLP at SemEval-2023 Task 3: Weirdness, Lexical Features for Detecting Categorical Framings, and Persuasion in Online News
  Juan Cuadrado, Elizabeth Martinez, Anderson Morillo, and 4 more authors
  Proc. of SemEval, 2023
Persuasive messages are increasingly frequent in social networks, which generates great concern in several communities, given that persuasion seeks to guide others towards the adoption of ideas, attitudes or actions that they consider to be beneficial to themselves. The efficient detection of news genre categories, framing, and persuasion techniques requires several scientific disciplines, such as computational linguistics and sociology. Here we illustrate how we use lexical features to determine whether a given news article is an opinion piece, aims to report factual news, or is satire. This paper presents a novel strategy for news based on Lexical Weirdness. The results are part of our participation in subtasks 1 and 2 in SemEval 2023 Task 3.
@article{Cuadrado2023Semeval, author = {Cuadrado, Juan and Martinez, Elizabeth and Morillo, Anderson and Peña, Daniel and Sossa, Kevin and Martinez-Santos, Juan Carlos and Puertas, Edwin}, title = {UTB-NLP at SemEval-2023 Task 3: Weirdness, Lexical Features for Detecting Categorical Framings, and Persuasion in Online News}, journal = {Proc. of SemEval}, year = {2023}, doi = {10.18653/V1/2023.SEMEVAL-1.214}, }
- ResearchGate
  Automated Depression Detection in Text Data: Leveraging Lexical Features, Phonesthemes Embedding, and RoBERTa Transformer Model
  Edwin Puertas and Juan Carlos Martinez-Santos
  2023
Depression is a prevalent mental disorder characterized by persistent sadness, lack of interest, and diminished pleasure. Detecting depression is crucial for timely intervention and support. In this paper, we address the task of depression detection in text data, focusing on binary classification and regression. We present our approach, leveraging a dataset comprising labeled messages from Telegram groups related to mental disorders. We begin by exploring the existing literature on depression detection, highlighting the challenges faced and the methods employed. Our approach involves data pre-processing, lexical feature extraction, phonesthemes embedding, and using the RoBERTa transformer model. We achieved promising results in the training phase through rigorous experimentation and model refinement. However, we encountered challenges upon evaluating our approach in the MentalRiskEs evaluation. We identified areas for improvement, particularly in latency and speed of detection for real-time monitoring of depression-related risks. This research contributes to the ongoing efforts in automating depression detection and provides insights into the potential of text analysis techniques for mental health assessment. We remain committed to further enhancing our methodology and advancing the field to improve the well-being of individuals affected by depression, emphasizing the importance of early detection systems that are not only accurate but also responsive. Future work will explore multimodal inputs, real-time deployment strategies, and cross-linguistic generalization to increase the applicability and robustness of our system in diverse social and linguistic contexts.
@misc{DepressionDetection2023, author = {Puertas, Edwin and Martinez-Santos, Juan Carlos}, title = {Automated Depression Detection in Text Data: Leveraging Lexical Features, Phonesthemes Embedding, and RoBERTa Transformer Model}, year = {2023}, }
- ResearchGate
  Team UTB-NLP at FinancES 2023: Financial Targeted Sentiment Analysis Using a Phonestheme Semantic Approach
  Anderson Morillo, Daniel Peña, and Edwin Puertas
  2023
@misc{FinancES2023, author = {Morillo, Anderson and Peña, Daniel and Puertas, Edwin}, title = {Team UTB-NLP at FinancES 2023: Financial Targeted Sentiment Analysis Using a Phonestheme Semantic Approach}, year = {2023}, }
- CCIS
  Natural Language Contents Evaluation System for Multi-class News Categorization Using Machine Learning and Transformers
  Duván A. Marrugo, Juan Carlos Martinez-Santos, and Edwin Puertas
  Applied Computer Sciences in Engineering, 2023
The exponential growth of digital documents has come with rapid progress in text classification techniques in recent years. This paper presents text classification models that cover the steps of news classification, implementing machine learning algorithms such as Logistic Regression, Support Vector Machine, and Random Forest. We also apply Transformers as classification models for the same problem, comparing BERT and DistilBERT for the automatic classification of news articles belonging to four categories (World, Sports, Business, and Science/Technology). On the machine learning side, we obtained the highest accuracy, 88%, using Support Vector Machine with Word2Vec. However, using the DistilBERT Transformer, we obtained an efficient model that reached 91.7% accuracy for classifying news.
@article{Marrugo2023, author = {Marrugo, Duván A. and Martinez-Santos, Juan Carlos and Puertas, Edwin}, title = {Natural Language Contents Evaluation System for Multi-class News Categorization Using Machine Learning and Transformers}, journal = {Applied Computer Sciences in Engineering}, pages = {115--126}, year = {2023}, doi = {10.1007/978-3-031-46739-4_11}, }
- IEEE Latin Am. Trans.
  Long-Term Effects of Degradation on Photovoltaic System Return on Investment
  Juan Cuadrado, Elizabeth Martinez, Edwin Puertas, and 1 more author
  IEEE Latin America Transactions, 2023
The adoption of photovoltaic (PV) systems has increased significantly in recent years, driven by the demand for off-grid and on-grid residential and commercial applications. However, the high initial investment required for PV installations has limited their widespread adoption. To overcome this barrier, governments and marketing enterprises have implemented different strategies to promote PV systems, focusing on the return on investment (ROI) concept. However, the conventional approach uses limited economic factors to calculate the ROI. It fails to consider the impact of external factors, such as system degradation, which can vary between systems. To address this issue, we propose a new methodology to estimate the ROI of a photovoltaic system with greater accuracy. Our approach incorporates system-predicted degradation, calculated using historical meteorological data and prediction techniques. We applied this methodology to a photovoltaic system installed at the Universidad Tecnológica de Bolívar (UTB) in Cartagena and evaluated it against five different approaches. The results show that our proposed method offers a more accurate and reliable estimation of the ROI of a photovoltaic system, considering a broader range of factors. Overall, our work contributes to advancing the understanding of photovoltaic system ROI calculation and promotes using sustainable energy sources. By providing a more precise estimation of the ROI of a photovoltaic system, our methodology can help potential investors make more informed decisions and promote the adoption of clean energy.
@article{Cuadrado2023PV, author = {Cuadrado, Juan and Martinez, Elizabeth and Puertas, Edwin and Martinez-Santos, Juan Carlos}, title = {Long-Term Effects of Degradation on Photovoltaic System Return on Investment}, journal = {IEEE Latin America Transactions}, year = {2023}, doi = {10.1109/TLA.2023.10305232}, }
- IEEE C3
  RealCheck: A Web Application for Fake News Detection Using Natural Language Processing
  Edwin Puertas, Jenifer Vasquez, and Juan Carlos Martinez-Santos
  In 1st IEEE Colombian Caribbean Conference (C3), 2023
The spread of misinformation and disinformation online represents a significant challenge in today’s society. This paper proposes implementing a news verifier web application that uses artificial intelligence techniques to determine the veracity of news provided by users. The system consists of three stages: user input validation, selection of relevant sources, and information analysis. We trained a logistic regression model with an in-house dataset of 2000 sentences for input validation. We utilized cosine similarity and FastText for source selection. Finally, a large language model analyzes the context and coherence of selected articles to determine veracity. The transparent presentation of results fosters media literacy. While some implementation details may differ due to time constraints, this paper provides the overall methodology and architecture of the system.
@inproceedings{Puertas2023C3, author = {Puertas, Edwin and Vasquez, Jenifer and Martinez-Santos, Juan Carlos}, title = {RealCheck: A Web Application for Fake News Detection Using Natural Language Processing}, booktitle = {1st IEEE Colombian Caribbean Conference (C3)}, year = {2023}, publisher = {IEEE}, doi = {10.1109/C358072.2023.10436244}, }
- IEEE C3
  Detection of Online Sexism Using Lexical Features and Transformer
  Elizabeth Martinez, Juan Cuadrado, Juan Carlos Martinez-Santos, and 1 more author
  In 1st IEEE Colombian Caribbean Conference (C3), 2023
Social networks are now integral to modern life, offering instant global communication. Unfortunately, this ease of connectivity has also led to the misuse of freedom, with many social media users expressing inappropriate and often sexist comments. In response, the field of natural language processing has actively sought solutions to detect and counteract such content. Our study builds upon our previous work presented during the SemEval 2023 competition. Initial results, achieved using only lexical features, prompted a thorough reevaluation. Recognizing the potential of advanced techniques, we embarked on a comprehensive exploration to enhance our approach. Leveraging our experience, we focused on a feature union strategy. By combining the 'twitter-roberta-base-sentiment-latest' transformer model with established lexical features, we developed a refined methodology that transcends the limitations of individual elements. This approach improved performance and demonstrated the synergistic potential of merging these distinct features. It led to an improvement of approximately 20% in binary classification accuracy, a substantial leap from our previous results. The strategic blend of linguistic and transformer-based features emerged as the driving force behind this advancement.
@inproceedings{Martinez2023C3, author = {Martinez, Elizabeth and Cuadrado, Juan and Martinez-Santos, Juan Carlos and Puertas, Edwin}, title = {Detection of Online Sexism Using Lexical Features and Transformer}, booktitle = {1st IEEE Colombian Caribbean Conference (C3)}, year = {2023}, publisher = {IEEE}, doi = {10.1109/C358072.2023.10436298}, }