Natural Language Processing


📋 Course Overview

Course Information
Instructor: Edwin Puertas, PhD.
Email: epuerta@utb.edu.co
Office: AL-304
School: School of Digital Transformation
Hours: 3 hours per week
Credits: 4
Modality: Face-to-face
Methodology: Lectures - Theoretical

🎯 Course Purpose

Develop theoretical and practical competencies in Natural Language Processing (NLP), enabling students to apply machine learning techniques and models to analyze, understand, and generate text in various contexts, fostering the resolution of organizational and research problems with an ethical and sustainable approach.

Specific Objectives

  • Understand the theoretical foundations of NLP, including language modeling, sentiment analysis, vector semantics, and neural networks
  • Implement supervised and unsupervised learning models using Python, NLTK, and TensorFlow
  • Evaluate and optimize NLP models through appropriate metrics and hyperparameter adjustments
  • Develop NLP projects in teams, applying communication skills, collaboration, and critical thinking

📚 Course Content

Module 1: Fundamentals of NLP
  1. Introduction to NLP and human language technologies
  2. Regular expressions, text normalization, and edit distance
  3. Language modeling with N-Grams
  4. Text classification with Naïve Bayes and sentiment analysis
  5. Logistic regression and vector semantics

Module 2: Machine Learning & NLP
  1. Neural networks and neural language models
  2. Sequence labeling for parts of speech and named entities
  3. RNNs, LSTMs, and advanced language models
  4. Transformers and large language models (BERT, GPT)
  5. Fine-tuning, prompting, and in-context learning

Module 3: Applications and Ethics
  1. Machine translation and text generation
  2. Question answering and information retrieval
  3. Development of chatbots and dialogue systems
  4. Automatic speech recognition and voice synthesis
  5. Ethical considerations in NLP and applications in reducing inequalities
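
Edit distance, listed among the Module 1 topics, can be illustrated with a short sketch. The version below is the Levenshtein distance with a cost of 1 for each insertion, deletion, and substitution. That cost scheme is an assumption made for illustration, and the function name edit_distance is hypothetical; other cost schemes (for example, substitution cost 2) give different values.

```python
def edit_distance(source, target):
    """Levenshtein distance with unit costs (assumed cost scheme)."""
    # previous[j] is the distance between the prefix of source processed
    # so far (minus the current character) and the first j chars of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning i source chars into "" needs i deletions
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Only two rows of the dynamic-programming table are kept at a time, which is enough when just the distance, and not the alignment, is needed.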

🔄 Course Methodology

The learning process is supported by four main activities:

📝 Thematic Presentations

Synthesis of topics presented by the professor, enriched with valuable contributions and insights.

👤 Student Assignments

Individual activities validating students' understanding and preparation of course materials.

👥 Workshops

Group activities reinforcing learning through practical application of concepts and techniques.

📋 Exams

Individual evaluations measuring learning progress throughout the course.


🔍 What is NLP?

Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical and practical issues in the design and implementation of computer systems for processing human languages.


🧩 NLP Fundamentals

Language Processing Levels

Phonetics

The study of speech sounds, examining how sounds are produced, transmitted, and perceived

Phonology

The study of sound systems and how sounds function within a particular language

Morphology

The study of word formation, examining morphemes (smallest meaningful units of language)

Syntax

The study of sentence structure and arrangement of words and phrases

Semantics

The study of meaning in words and sentences, including lexical and compositional semantics

Pragmatics

The study of language use in social contexts, considering factors like speaker intention and implied meaning

Key NLP Components

  • Text Preprocessing: Preparing raw text for analysis by transforming it into a machine-readable format
  • Feature Extraction: Converting raw text into numerical representations
  • POS Tagging: Assigning each word its part of speech (noun, verb, adjective, and so on)
  • Named Entity Recognition: Identifying named entities such as people, locations, and dates
  • Coreference Resolution: Identifying when different expressions refer to the same entity
  • Parsing: Analyzing grammatical structure to extract meaning
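
The first two components can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production pipeline: the regular expression, the lowercasing step, and the names TOKEN_PATTERN, preprocess, and bag_of_words are assumptions chosen for the example. Libraries such as NLTK apply more careful tokenization rules.

```python
import re
from collections import Counter

# Assumed token definition: runs of letters/digits, optionally followed
# by an apostrophe suffix such as "n't" or "'s" (after lowercasing).
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def preprocess(text):
    """Text preprocessing: lowercase the text and split it into tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def bag_of_words(tokens):
    """Feature extraction: map each token to its frequency."""
    return Counter(tokens)


tokens = preprocess("The cat sat on the mat. The mat didn't move!")
counts = bag_of_words(tokens)
print(tokens)
print(counts.most_common(2))  # [('the', 3), ('mat', 2)]
```

Punctuation is discarded and case is folded here, which loses information that some tasks (for example, named entity recognition) rely on.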

Fundamental Techniques & Algorithms

  • Tokenization: Dividing text into smaller units
  • Lemmatization/Stemming: Reducing words to base form
  • POS Tagging: Identifying grammatical functions
  • Dependency Parsing: Understanding syntactic relationships
  • Bag of Words: Simple representation of word frequency
  • TF-IDF: Weighting word importance in documents
  • Word Embeddings: Vector representations of words
  • N-gram Models: Predicting words based on context
  • Pre-trained Models: Models trained on massive text corpora
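
Two of the techniques above, TF-IDF and N-gram models, are small enough to sketch directly. The code below assumes one common TF-IDF variant (term frequency as count divided by document length, inverse document frequency as log(N / df)) and an unsmoothed maximum-likelihood bigram estimate. The function names tf_idf and bigram_probability are hypothetical, and libraries typically use smoothed or otherwise adjusted formulas.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def bigram_probability(tokens, previous, word):
    """Unsmoothed estimate P(word | previous) = C(previous, word) / C(previous)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])  # the last token never acts as a context
    if contexts[previous] == 0:
        return 0.0
    return bigrams[(previous, word)] / contexts[previous]


docs = [["the", "cat", "sat"], ["the", "dog", "barked"]]
weights = tf_idf(docs)
print(weights[0]["the"])  # 0.0, because "the" occurs in every document

corpus = "the cat sat on the mat".split()
print(bigram_probability(corpus, "the", "cat"))  # 0.5
```

A term that appears in every document gets weight zero under this variant, which is the intended effect: it does not help distinguish one document from another.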

🚀 Practical Applications of NLP

😃 Sentiment Analysis

Determining opinions or emotions expressed in text

🤖 Chatbots

Automated conversations with users

🌐 Machine Translation

Translating text between languages

🔍 Information Extraction

Identifying and extracting relevant data from large text corpora

✍️ Text Generation

Automatically creating original text


📖 Bibliography

  • Jurafsky, D., & Martin, J. H. (2020). Speech and Language Processing (3rd ed. draft).
  • Beysolow II, T. (2018). Applied Natural Language Processing with Python: Implementing Machine Learning and Deep Learning Algorithms for Natural Language Processing. Apress.
  • Vajjala, S., Majumder, B., Gupta, A., & Surana, H. (2020). Practical Natural Language Processing: A Comprehensive Guide to Building Real-World NLP Systems. O'Reilly Media.
  • Srinivasa-Desikan, B. (2018). Natural Language Processing and Computational Linguistics: A practical guide to text analysis with Python, Gensim, spaCy, and Keras. Packt Publishing Ltd.
