Sapphirine/202412-27-Satellite-Data-and-Sentiment-Analysis-for-Comprehensive-Environmental-Monitoring

202412-27-Satellite-Data-and-Sentiment-Analysis-for-Comprehensive-Environmental-Monitoring

Project presentation video: https://www.youtube.com/watch?v=t9iQPvF8nIU

This project integrates high-resolution satellite data from the TROPOMI instrument, accessed through NASA's Earthdata archives, with natural language processing (NLP) techniques to provide a framework for environmental monitoring and for understanding public perceptions of climate change. By combining pollutant distribution maps with sentiment and thematic analyses of climate-related news articles, the research aims to address gaps in correlating geospatial data with societal discourse. This interdisciplinary approach offers insight into the intersection of environmental conditions and public awareness, with implications for policymaking, public communication, and environmental informatics.

Dataset: The project utilized two primary datasets:

NASA Satellite Data: Pollutant data, including CO2 and NO2 distributions, were sourced from public satellite instruments such as the TROPOspheric Monitoring Instrument (TROPOMI). These datasets provide high-resolution geospatial pollutant distributions and are publicly accessible through NASA's Earth Observing System Data and Information System (EOSDIS).

Climate-Related News Articles: A dataset of over 30,000 articles from The Guardian was acquired via Kaggle. The dataset spans December 2017 to January 2024 and was filtered to include only climate-related content from 2023. It contains titles, introductory summaries, full article texts, authors, and publication dates, structured for detailed sentiment and thematic analysis.

The models developed in this project are designed to incorporate additional datasets beyond the ones tested. For example, on the NLP side, the framework can accommodate textual datasets from diverse sources, including social media platforms (e.g., Twitter API), blogs, policy documents, or environmental reports. Pretrained NLP models such as BERT or DistilBERT can be fine-tuned to adapt to domain-specific vocabularies for sentiment analysis, topic modeling, and emotion classification. Additionally, the system's architecture allows for the integration of multimodal data, enabling analyses that combine geospatial, textual, and temporal datasets to derive richer, context-aware insights.
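
The filtering of the news dataset to 2023 climate-related articles could look roughly like the sketch below. The README does not say how "climate-related" was decided, so the keyword match, the keyword list, and the field names (`date`, `title`, `text`) are assumptions made for illustration only.

```python
from datetime import date

# Hypothetical keyword list; the project's actual filter criteria are not documented.
CLIMATE_KEYWORDS = ("climate", "emission", "global warming")


def is_climate_2023(article, keywords=CLIMATE_KEYWORDS):
    """Return True if the article was published in 2023 and mentions a keyword.

    `article` is assumed to be a dict with an ISO 'date' (YYYY-MM-DD),
    a 'title', and a 'text' field.
    """
    published = date.fromisoformat(article["date"])
    if published.year != 2023:
        return False
    haystack = f'{article["title"]} {article["text"]}'.lower()
    return any(keyword in haystack for keyword in keywords)


# Made-up sample rows, not taken from the real dataset.
articles = [
    {"date": "2023-05-01", "title": "Climate talks resume", "text": "Delegates met."},
    {"date": "2022-11-10", "title": "Climate summit ends", "text": "A deal was struck."},
    {"date": "2023-07-15", "title": "Football results", "text": "The match ended 2-1."},
]

selected = [a for a in articles if is_climate_2023(a)]
```

A plain substring match is deliberately simple; a topic classifier or section tags could replace it without changing the surrounding code.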

Analytics:

The study integrates multiple analytics techniques to provide a comprehensive understanding of climate change dynamics. Sentiment analysis quantifies the tone of climate-related articles, categorizing them as positive, negative, or neutral, and aggregates these scores regionally to capture geographic trends. Emotion analysis delves deeper into media narratives, identifying prevalent emotions such as anger, fear, joy, and sadness, which are crucial for understanding the emotional undertones of public discourse on environmental issues. Geospatial pollutant mapping utilizes satellite data from instruments like TROPOMI to create high-resolution maps of atmospheric pollutants such as NO2 and CO2, offering insights into air quality and pollution hotspots.
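
As a minimal sketch of the regional aggregation step described above, the snippet below turns per-article labels into signed scores and averages them per region. The signed-score scheme (label sign times classifier confidence), the function names, and the sample records are assumptions; the README does not specify how scores were combined.

```python
from collections import defaultdict
from statistics import mean

# Assumed mapping from label to sign; neutral articles contribute zero.
SIGN = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def signed_score(label, confidence):
    """Map a sentiment label and its confidence to a value in [-1, 1]."""
    return SIGN[label.lower()] * confidence


def aggregate_by_region(records):
    """Return {region: mean signed sentiment} for (region, label, confidence) tuples."""
    buckets = defaultdict(list)
    for region, label, confidence in records:
        buckets[region].append(signed_score(label, confidence))
    return {region: mean(scores) for region, scores in buckets.items()}


# Made-up classifier outputs, one tuple per article.
records = [
    ("United Kingdom", "negative", 0.9),
    ("United Kingdom", "positive", 0.5),
    ("Brazil", "negative", 0.8),
    ("Brazil", "neutral", 0.7),
]

regional = aggregate_by_region(records)
```

With these sample records the United Kingdom averages to about -0.2 and Brazil to about -0.4, so more negative coverage pulls a region's score further below zero.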

Algorithms

The research employs state-of-the-art algorithms for both geospatial and textual data processing. Named Entity Recognition (NER) is performed using a fine-tuned BERT-large-cased model optimized for extracting precise location-based entities from climate-related text. For sentiment and emotion classification, DistilBERT and DistilRoBERTa models are utilized, offering computational efficiency without compromising on accuracy. These pre-trained transformers, fine-tuned on domain-specific datasets, capture nuanced emotional tones and sentiment trends in climate discourse. Topic modeling is executed using Latent Dirichlet Allocation (LDA), which identifies thematic trends within large textual datasets by extracting the most relevant topics and keywords. These advanced algorithms enable detailed exploration and correlation of textual and geospatial data.
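
The NER and classification steps above could be wired together roughly as follows, using the Hugging Face `transformers` pipeline API. The specific checkpoints named in `build_pipelines` are publicly available models chosen for illustration; they are assumptions, not necessarily the ones the project fine-tuned. `tag_article` and `min_score` are likewise hypothetical names.

```python
def build_pipelines():
    """Load NER, sentiment, and emotion pipelines.

    Model names are assumptions for illustration, not the project's
    confirmed checkpoints. Requires the `transformers` package.
    """
    from transformers import pipeline  # imported lazily so the rest runs without it

    ner = pipeline("ner", model="dslim/bert-large-NER", aggregation_strategy="simple")
    sentiment = pipeline(
        "text-classification",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )
    emotion = pipeline(
        "text-classification",
        model="j-hartmann/emotion-english-distilroberta-base",
    )
    return ner, sentiment, emotion


def tag_article(text, ner, sentiment, emotion, min_score=0.8):
    """Extract location entities plus the top sentiment and emotion labels.

    `ner` is expected to return dicts with 'entity_group', 'word', and 'score'
    (the shape produced with aggregation_strategy="simple"); the classifiers
    are expected to return a list whose first item has a 'label' key.
    """
    entities = ner(text)
    locations = sorted(
        {
            entity["word"]
            for entity in entities
            if entity["entity_group"] == "LOC" and entity["score"] >= min_score
        }
    )
    return {
        "locations": locations,
        "sentiment": sentiment(text)[0]["label"],
        "emotion": emotion(text)[0]["label"],
    }
```

Typical usage would be `ner, sentiment, emotion = build_pipelines()` followed by `tag_article(article_text, ner, sentiment, emotion)`. Full-length articles usually exceed the models' input limit, so in practice the text would need truncation or chunking before classification.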

System Modules

The system architecture is designed to seamlessly integrate geospatial and textual data. A satellite data extraction module accesses pollutant information from NASA’s Earthdata archives, utilizing tools like Panoply for processing netCDF files. The natural language processing (NLP) module processes thousands of climate-related news articles, performing NER, sentiment analysis, emotion detection, and topic modeling. An interactive visualization interface, built using HTML and JavaScript, allows users to explore the results dynamically, featuring dropdown menus for time-based filtering and real-time updates of visual data. These modules work in unison to provide an integrated platform for analyzing the interplay between environmental conditions and public discourse, supporting policy-making and public awareness initiatives.
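
The README describes Panoply as the tool used on the netCDF files. As a programmatic alternative, the sketch below shows one way pollutant observations could be averaged onto a regular latitude/longitude grid before mapping. It assumes the observations have already been read from the netCDF file (for example with a library such as netCDF4 or xarray, not shown). The function name, the 1-degree cell size, and the 0.75 quality threshold are all assumptions.

```python
import math
from collections import defaultdict


def grid_mean(observations, cell_deg=1.0, min_qa=0.75):
    """Average observations into lat/lon cells.

    observations: iterable of (lat, lon, value, qa) tuples.
    Returns {(lat_index, lon_index): mean value}, where each index is
    floor(coordinate / cell_deg). Observations with qa below `min_qa`
    or with a NaN value are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value, qa in observations:
        if qa < min_qa or math.isnan(value):
            continue
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Made-up observations in arbitrary units: (lat, lon, value, qa).
observations = [
    (40.2, -74.5, 4.0, 0.9),
    (40.8, -74.1, 6.0, 0.8),
    (40.5, -74.9, 100.0, 0.3),  # dropped: quality flag below threshold
    (51.5, -0.1, 2.0, 1.0),
]

grid = grid_mean(observations)
```

The resulting dictionary can be serialized and handed to the visualization interface, with one grid per time step backing the time-based dropdown.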
