Lulliter/slogan

CORE files

tree analysis -C -L 2 # then https://carbon.now.sh/

# or
# tree -n --noreport analysis -L 2 | silicon --language bash -o images/tree.png

tree

TO DO

In analysis/01a_WB_project_pdo_prep.qmd

  • I added fresh (2025) info on sector, theme, etc. (attached to the 2024 project list); see what can be done with it
  • revise the OLD plot -> NEW plot thing

In analysis/01b_WB_project_pdo_EDA.qmd

  • revise the OLD plot -> NEW plot thing

In analysis/01c_WB_project_pdo_prep.qmd

  • everything needs to be reviewed ….
  • Study the theory of Lasso regression and classification with ML in Gabor’s book
  • rewrite the attempted models in a more condensed form
  • explain in plain English the choices available for improving ML prediction performance (even if not run)
    • defining the data / sample for analysis (preprocessing)
    • defining / label engineering of y
    • feature engineering of x (handling the share of missing data, which x to include and in what functional form)
    • model selection
    • model hyperparameter tuning

Abstract

Exploring World Bank Project Development Objectives (PDO) text data

This ongoing project serves as a proof-of-concept for applying text analytics to World Bank Projects & Operations data. Focusing on ~4,000 projects, I analyze the short texts that define Project Development Objectives (PDOs), which are concise summaries of each project’s goals. This exploration has uncovered intriguing insights, including:

  • Trends in sector-specific language and thematic shifts over time,
  • Unexpected patterns, such as recurring topics, phrases and conceptual relationships,
  • Enhanced text classification and metadata tagging through machine learning,
  • Novel text-based questions that could inform further research.

The analysis is conducted in R, integrating text mining, natural language processing, and machine learning techniques.
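
As a rough illustration of the most basic text-mining step mentioned above, the sketch below tokenizes a few short texts and counts term frequencies using base R only. The example sentences, the `stop_words` list, and the `tokenize()` helper are all hypothetical, made up for this sketch; they are not taken from the World Bank data or from the project's actual code, which may rely on dedicated text-mining packages.

```r
# Toy PDO-like sentences (hypothetical, NOT taken from the World Bank data)
pdo <- c(
  "Improve access to basic education services in rural areas",
  "Improve the quality of rural roads and access to markets",
  "Strengthen the capacity of health services"
)

# A small hand-picked stop-word list (an assumption made for this sketch)
stop_words <- c("to", "in", "the", "of", "and")

# Hypothetical helper: lowercase, replace non-letters with spaces,
# split on whitespace, then drop empty strings and stop words
tokenize <- function(text) {
  clean <- gsub("[^a-z ]", " ", tolower(text))
  tokens <- strsplit(clean, "\\s+")[[1]]
  tokens[nzchar(tokens) & !(tokens %in% stop_words)]
}

# One character vector of tokens per document
tokens_by_doc <- lapply(pdo, tokenize)

# Term frequencies across the whole toy corpus, most frequent first
term_freq <- sort(table(unlist(tokens_by_doc)), decreasing = TRUE)
print(head(term_freq, 5))
```

Counts like these are only a starting point; questions such as how language shifts over time would additionally require grouping the tokens by project year or sector.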

(This is an ongoing project, so comments, questions, and suggestions are welcome. The R source code is open, albeit not very polished.)