RTM_standard_pipeline.Rmd

---
title: "RTM standard pipeline processes"
author: "Amira Swedan"
date: "`r format(Sys.time(), '%d/%m/%Y')`"
output:
  word_document:
    toc: yes
---

```{r setup, include=FALSE}

knitr::opts_chunk$set(echo = TRUE)

```

## Overview

This data pipeline automates the process of RTM/mVAM data management for the RTM core variables related to the key indicators. The standard pipeline procedures are aligned with [RBC RTM SOPs](https://wfp-my.sharepoint.com/:w:/g/personal/amira_swedan_wfp_org/EVistHIsU_xAsdsF-2SEcRoB-cMrwS-IuyjyscsUVLfuig?e=SH9hHK) that includes a detailed explanation about the methodology applied in this script.

The key objective of the standard pipeline is to maintain the same process of data management across all RBC countries running RTM/mVAM activities.

[**The key functions implemented in this standard pipeline are**:]{.underline}

-   Fetching the raw dataset
-   Renaming and selecting core variables
-   Data quality checks
-   Data re-coding and merging
-   Indicators computation
-   Weights construction
-   Connecting to the regional master table

Before running this pipeline script, you need to update the input file included in the script dependencies *RTM_pipeline_input.xlsx*, Specify in this file the start and end date that you wish to extract the data within. [*Check readme for more details*]

In addition to modifying the input file, make sure that all population figures included in the weights input table file are updated, and the desired indicators are included.

The script running environment is included in the *readme* file, [make sure that you have all required packages installed]{.underline} before running the script

## Load packages and read relevant input files

```{r load packages, echo=TRUE, message=FALSE, warning=FALSE}

# load required packages

library(tidyverse)
library(readxl)
library(httr)
library(jsonlite)
library(lubridate)
library(lares)
library(labelled)
library(expss)
library(ggplot2)
library(ggrepel)
library(tools)
library(writexl)

# read input file
pipInput = read_excel('Pipeline input/RTM_pipeline_input.xlsx' , sheet = 'main_parameters')
stdNames = read_excel('Pipeline input/RTM_pipeline_input.xlsx' , sheet = 'standard_names_mapping')
pipInput$StartDate = format(as.Date(pipInput$StartDate), "%m/%d/%Y")
pipInput$EndDate = format(as.Date(pipInput$EndDate), "%m/%d/%Y")

```

## Connect and Extract Raw Data

Connect through Crystal API to fetch the raw data.

To connect you need to pass the key parameters (`APIKey`, `start date`, `end date`) for each country separately to *Crystal_API_connect* function.

The parameters should be inserted in the pipeline input file, the configuration list for each country (`Lebanon`,`Syria`, and `Yemen`) will be updated automatically.

To add a new country, first modify the input file and create a new configuration list that should be passed to the connection function.

The new variables names in this section are specified according to the standard variables names included in WFP codebook.

The codebook is embedded in WFP [survey designer](https://www.surveydesigner.vam.wfp.org/design/survey).


```{r extract raw data and rename, message=FALSE, warning=FALSE, echo=TRUE}

## create a configuration list for each country in the pipeline ## Don't proceed with running the pipeline functions for a country without creating this list 

Syria_config = list ('APIKey' = paste("Syria_" , unlist (strsplit(pipInput$EndDate[pipInput$CountryCode == 'Syria'] , "/")) [1] , unlist (strsplit(pipInput$EndDate[pipInput$CountryCode == 'Syria'] , "/")) [3] , sep = ""),  'DateFrom' = pipInput$StartDate[pipInput$CountryCode == 'Syria']  , 'DateTo' = pipInput$EndDate[pipInput$CountryCode == 'Syria'] )

Yemen_config = list ('APIKey' = paste("Yemen1_" , unlist (strsplit(pipInput$EndDate[pipInput$CountryCode == 'Yemen1'] , "/")) [1] , unlist (strsplit(pipInput$EndDate[pipInput$CountryCode == 'Yemen1'] , "/")) [3] , sep = ""),  'DateFrom' = pipInput$StartDate[pipInput$CountryCode == 'Yemen1']  , 'DateTo' = pipInput$EndDate[pipInput$CountryCode == 'Yemen1'] )

Leb_config = list ('APIKey' = "Lebanon_112023",  'DateFrom' = pipInput$StartDate[pipInput$CountryCode == 'Lebanon']  , 'DateTo' = pipInput$EndDate[pipInput$CountryCode == 'Lebanon'] )

Libya_config = list ('APIKey' =  "Libya_012024",  'DateFrom' = pipInput$StartDate[pipInput$CountryCode == 'Libya']  , 'DateTo' = pipInput$EndDate[pipInput$CountryCode == 'Libya'] )


##########################################################
#the function below could be used to pull the survey data for any country if the provider is Crystal
###########################################################

Crystal_API_connect <- function (con_config) {
  
json_body <- jsonlite::toJSON(con_config, auto_unbox = TRUE)

print (json_body)
  
 r =  POST(url = 'http://campaignapi.crystelcall.com/CampaignAPI/api/Campaign' , body = json_body ,  add_headers('Content-Type'='application/json')) 
 
 if (r$status_code == 200) {
   print ('Crystal API connection is ok ')
 }
 else {
     print (paste0('ERROR:reading from Crystal API returned the status code ' , r$status_code))
 }
 
 Rawdata = fromJSON(rawToChar(r$content))
 
 return(Rawdata)
 
}


##Variables standard names mapping

replace_variable_names <- function(data, old_names, new_names) {
  # Convert the data to a data frame if it's not already
  if (!is.data.frame(data)) {
    data <- as.data.frame(data)
  }
  
  # Check if the lengths of old_names and new_names are the same
  if (length(old_names) != length(new_names)) {
    stop("The lengths of old_names and new_names must be the same.")
  }
  
  # Iterate over the old_names and new_names to replace variable names
  for (i in seq_along(old_names)) {
    if (old_names[i] %in% names(data)) {
      names(data)[names(data) == old_names[i]] <- new_names[i]
    }
  }
  
  # Return the updated data frame
  return(data)
}


######################################################
##Pull the data for only countries provided in the input list and export the raw data in csv file
####################################################

if ('Syria' %in% pipInput$CountryCode) {

SyriaRaw = Crystal_API_connect (Syria_config)
SyriaRaw = SyriaRaw[SyriaRaw$Completed == 'Y' ,]
print(paste0('total number of completed surveys for Syria survey is ' , nrow(SyriaRaw)))
SyriaRaw = replace_variable_names(SyriaRaw , stdNames$Orignial[stdNames$Country == 'Syria'] , stdNames$Standard[stdNames$Country == 'Syria'])
write.csv(SyriaRaw , paste0("Raw data/Syria/Syria_raw_" , pipInput$SvyID[pipInput$CountryCode == 'Syria'] , "_" , Sys.Date() ,  ".csv"))
print ( paste0('Syria ', pipInput$SvyID[pipInput$CountryCode == 'Syria'] , ' raw data is exported' ))

}


if ('Yemen1' %in% pipInput$CountryCode ) {
YemenRaw = Crystal_API_connect(Yemen_config)
YemenRaw = YemenRaw[YemenRaw$Completed == 'Y' ,]
print(paste0('total number of completed surveys for Yemen survey is ' , nrow(YemenRaw)))
YemenRaw = replace_variable_names(YemenRaw , stdNames$Orignial[stdNames$Country == 'Yemen'] , stdNames$Standard[stdNames$Country == 'Yemen'])
write.csv(YemenRaw , paste0("Raw data/Yemen/Yemen_raw_" , pipInput$SvyID[pipInput$CountryCode == 'Yemen1'] ,"_"  , Sys.Date(), ".csv"))
print ( paste0('Yemen ', pipInput$SvyID[pipInput$CountryCode == 'Yemen1'] , ' raw data is exported' ))

}

if ('Lebanon' %in% pipInput$CountryCode) {
LebRaw = Crystal_API_connect(Leb_config)
LebRaw = LebRaw[LebRaw$Completed == 'Y' ,]
print(paste0('total number of completed surveys for Lebanon survey is ' , nrow(LebRaw)))
LebRaw = replace_variable_names(LebRaw , stdNames$Orignial[stdNames$Country == 'Lebanon'] , stdNames$Standard[stdNames$Country == 'Lebanon'])
write.csv(LebRaw , paste0("Raw data/Lebanon/Lebanon_raw_" , pipInput$SvyID[pipInput$CountryCode == 'Lebanon'] , "_" , Sys.Date() , ".csv"))
print ( paste0('Lebanon ', pipInput$SvyID[pipInput$CountryCode == 'Lebanon'] , ' raw data is exported' ))
}


if ('Libya' %in% pipInput$CountryCode) {
LibyaRaw = Crystal_API_connect(Libya_config)
LibyaRaw = LibyaRaw[LibyaRaw$Completed == 'Y' ,]
print(paste0('total number of completed surveys for Libya survey is ' , nrow(LibyaRaw)))
write.csv(LibyaRaw , paste0("Libya_raw_" , pipInput$SvyID[pipInput$CountryCode == 'Libya'] , "_", Sys.Date(), ".csv"))
}


```

## import raw data directly 

For countries that utilizes MODA, there is a need to manually extract the data from MODA and include it in the country respective folder under the raw data folder. 
Make sure while extracting the data through MODA to:
1. export it in csv
2. don't include the groups names in the header
3. extract the data with the variables labels
4. include one file at a time in the folder

```{r message=FALSE, warning=FALSE}

if ('Iraq' %in% pipInput$CountryCode) {
  setwd("Raw data//Iraq")
  files = list.files()
  IrqRaw = read.csv(files[1] , header = TRUE)
  IrqRaw = IrqRaw[(IrqRaw$CallDispo == "Someone answers"  & IrqRaw$RESPConsent == "Yes"),]
  print(paste0('total number of completed surveys for Iraq survey is ' , nrow(IrqRaw)))
  IrqRaw = replace_variable_names(IrqRaw , stdNames$Orignial[stdNames$Country == 'Iraq'] , stdNames$Standard[stdNames$Country == 'Iraq'])
}


```


## ADMIN 2 names normalization

```{r Admin2 normalization, message=TRUE, warning=FALSE}
if ("Yemen1" %in% pipInput$CountryCode){
YemStrata = read.csv("C:/Users/amira.swedan/OneDrive - World Food Programme/mVAM regular activities/Regional/RTM modified standard pipeline/codebook/yem_bnd_adm2_wfp_all.csv" )

YemStrata$Dis_NAM_En = trimws(YemStrata$Dis_NAM_En)

for (i in 1:length (YemenRaw$ADMIN2Name)) {
  YemenRaw$ADMIN2Name[i] = stringr::str_trim(YemenRaw$ADMIN2Name[i])
  if(YemenRaw$ADMIN2Name[i] %in% YemStrata$Dis_NAM_En ){
YemenRaw$ADMIN2Name[i] = YemStrata$adm2_id [YemenRaw$ADMIN2Name[i] == YemStrata$Dis_NAM_En]
  } 
}

}


if ("Lebanon" %in% pipInput$CountryCode){
LebStrata = read_excel("C:/Users/amira.swedan/OneDrive - World Food Programme/mVAM regular activities/Regional/RTM modified standard pipeline/Codebook/lbn_bnd_adm3_wfp_a_20210325.xlsx" , sheet = 1)

LebStrata$adm2_name = trimws(LebStrata$adm2_name)

for (i in 1:length (LebRaw$ADMIN2Name)) {
  LebRaw$ADMIN2Name[i] = str_trim(LebRaw$ADMIN2Name[i])
  if(LebRaw$ADMIN2Name[i] %in% LebStrata$adm2_name ){
LebRaw$ADMIN2Name[i] = LebStrata$adm2_id [LebRaw$ADMIN2Name[i] == LebStrata$adm2_name]
  } 
}

}


if ("Syria" %in% pipInput$CountryCode){
SyrStrata = read_excel("C:/Users/amira.swedan/OneDrive - World Food Programme/mVAM regular activities/Regional/RTM modified standard pipeline/Codebook/syr_bnd_adm3_wfp_a_20210325.xlsx" , sheet = 1)

SyrStrata$adm2_name = trimws(SyrStrata$adm2_name)

for (i in 1:length (SyriaRaw$ADMIN2Name)) {
  SyriaRaw$ADMIN2Name[i] = str_trim(SyriaRaw$ADMIN2Name[i])
  if(SyriaRaw$ADMIN2Name[i] %in% SyrStrata$adm2_name ){
SyriaRaw$ADMIN2Name[i] = SyrStrata$adm2_id [SyriaRaw$ADMIN2Name[i] == SyrStrata$adm2_name]
  } 
}

}

if ("Iraq" %in% pipInput$CountryCode){
IrqStrata = read_excel("C:/Users/amira.swedan/OneDrive - World Food Programme/mVAM regular activities/Regional/RTM modified standard pipeline/Codebook/irq_bnd_adm2_wfp_a_20210325.xlsx" , sheet = 1)
Irq_admin2_mapping = read_excel("C:/Users/amira.swedan/OneDrive - World Food Programme/mVAM regular activities/Regional/RTM modified standard pipeline/Codebook/Iraq Admin standard names mapping.xlsx" , sheet = 1)
IrqRaw$ADMIN2Name[IrqRaw$ADMIN2Name == "Dashti Hawler (Benssalawa)"] = "Benslawa"

IrqRaw$ADMIN2Name = replaceall( IrqRaw$ADMIN2Name  , Irq_admin2_mapping$Admin2_original_name , Irq_admin2_mapping$Admin2_standard_name , quiet = TRUE)

IrqStrata$adm2_name = trimws(IrqStrata$adm2_name)
IrqStrata$adm2_name = gsub ("Al-" , "" , IrqStrata$adm2_name)
IrqRaw$ADMIN2Name = gsub("Al-" ,"" , IrqRaw$ADMIN2Name)


for (i in 1:length (IrqRaw$ADMIN2Name)) {
  IrqRaw$ADMIN2Name[i] = str_trim(IrqRaw$ADMIN2Name[i])
  if(IrqRaw$ADMIN2Name[i] %in% IrqStrata$adm2_name ){
IrqRaw$ADMIN2Name[i] = IrqStrata$adm2_id [IrqRaw$ADMIN2Name[i] == IrqStrata$adm2_name]
  } 
}
}


```


## selecting core indicators from each country and recoding labels

core indicators are the key demographic and geographic indicators as well as all food security modules

Questionnaires should be included in the respective folders (make sure you upload the one aligned with this version of the pipeline)

```{r  selecting core indicators , echo=TRUE, message=FALSE, warning=FALSE}


# select core indicators and recode variables - Syria

if ('Syria' %in% pipInput$CountryCode) {
SyriaRaw_reduced = SyriaRaw %>% select (ADMIN0Name,ADMIN1Name, ADMIN2Name ,RspID,SvyDate, ObsDate ,EnuName,HHH_YN,HHHSex,HHHAge,HHHEducation,RESPAge,HHSize , HHDispl, IDPDuration,HDwellType,HWaterSRC, FCSStap, FCSPulse, FCSVeg, FCSFruit,FCSPr, FCSDairy, FCSFat , FCSSugar, FCS_SRf, MktAccess_1M,  rCSILessQlty, rCSIBorrow, rCSIMealNb , rCSIMealSize, rCSIMealAdult, Lcs_stress_DomAsset, Lcs_stress_Saving, Lcs_stress_SellFoodRation, Lcs_stress_CrdtFood, Lcs_crisis_OutSchool, Lcs_crisis_ProdAssets, Lcs_crisis_Migration, Lcs_em_ResAsset, LcsR_em_FemAnimal, Lcs_em_Begged)

## convert numeric variables 

num_vars = c("RESPAge" , "HHSize" , "HHHAge" , "FCSStap" ,	"FCSPulse",	"FCSVeg",	"FCSFruit",	"FCSPr",	"FCSDairy",	"FCSFat",	"FCSSugar", "rCSILessQlty",	"rCSIBorrow",	"rCSIMealNb",	"rCSIMealSize",	"rCSIMealAdult")

SyriaRaw_reduced [, num_vars] = apply (SyriaRaw_reduced [, num_vars] , 2 , as.numeric)

convert_to_char = c("HHHSex" , "HHDispl" , "HHHEducation" , "IDPDuration" , "HDwellType" , "HWaterSRC" , "FCS_SRf" , "MktAccess_1M")

lcs_vars = names (SyriaRaw_reduced %>% select (starts_with('Lcs')))

SyriaRaw_reduced[,convert_to_char] = apply(SyriaRaw_reduced[,convert_to_char] , 2 , as.character)

SyriaRaw_reduced[,lcs_vars] = apply(SyriaRaw_reduced[,lcs_vars] , 2 , as.character)


SyriaRaw_reduced = SyriaRaw_reduced %>% mutate(HHHSex=recode(HHHSex, '1' ~ 'Female' , '2' ~ 'Male' ) , HHDispl = recode(HHDispl , '1' ~ 'Resident' , '2' ~ 'Returnee' , '3' ~ 'IDP') , HHHEducation = recode(HHHEducation , '1' ~ 'None' , '2' ~ 'Primary' , '3' ~ 'Secondary' , '4' ~ 'Vocational' , '5' ~ 'University') , IDPDuration = recode(IDPDuration, '1' ~ 'LT3M' , '2' ~'3To6M'  , '3' ~ '7To12M' , '4' ~ 'Above12M' ) , HDwellType = recode(HDwellType, '1' ~ 'Own' , '2' ~ 'Rent' , '3' ~ 'Guest_family' , '4' ~ 'Guest_strangers' , '5' ~ 'SharingAcco' , '6' ~ 'Hotel' ,'7' ~ 'RentAnotherH' , '8' ~'NoPlace' , '9' ~ 'TempShelter' , '10' ~ 'Camp' , '11' ~ 'UnfinishedShelter' , '12' ~ 'CollectiveShelter' , '13' ~ 'Other') , HWaterSRC = recode(HWaterSRC, '1' ~ 'Piped' , '2' ~ 'PublicTap' , '3' ~ 'borehole' ,'4' ~ 'ProtectedWell' , '5' ~ 'ProtectedSpring' , '6' ~ 'UnprotectedSpring' , '7' ~ 'River' , '8' ~ 'Lake' , '9' ~ 'Rain' , '10' ~ 'BottledWater' , '11' ~ 'TruckedWater'), FCS_SRf = recode(FCS_SRf, '1' ~ 'OwnProd' , '2' ~ 'Labour' , '3' ~ 'Purchase' , '4' ~ 'FoodAssist' , '5' ~ 'Scavenging' , '6' ~ 'Gift') , MktAccess_1M = recode (MktAccess_1M , '1' ~ 'Yes' , '2' ~ 'No' , '3' ~ 'DK') )

SyriaRaw_reduced[lcs_vars] = replaceall( SyriaRaw_reduced [lcs_vars] , c('1' , '2' , '3' , '4') , c('Yes' , 'No' , 'AlreadyExhausted'  , 'NotApplicable') , quiet = TRUE)

}

#Yemen
if ('Yemen1' %in% pipInput$CountryCode) {
YemenRaw_reduced = YemenRaw %>% select(RspID, SvyDate,ObsDate,EnuName,RESPAge , HHHSex, ADMIN0Name , ADMIN1Name, ADMIN2Name , HHSize, HHDispl, HHChronIllNb , FCSStap , FCSPulse, FCSVeg, FCSFruit, FCSPr, FCSDairy, FCSFat, FCSSugar , FCS_SRf , rCSILessQlty, rCSIBorrow , rCSIMealNb , rCSIMealSize , rCSIMealAdult ,LcsEN_stress_DomAsset , LcsEN_stress_CrdtFood , LcsEN_stress_Saving, LcsEN_stress_BorrowCash, LcsEN_crisis_ProdAssets, LcsEN_crisis_Health , LcsEN_crisis_OutSchool , LcsEN_em_ResAsset , LcsEN_em_Begged, LcsEN_em_LastAnimal , LhCSIEnAccess , HHIncFirst_SRi , HHIncChg_1M  )


convert_to_char = c("HHIncFirst_SRi" , "HHIncChg_1M" , "LhCSIEnAccess" )

lcs_vars = names (YemenRaw_reduced %>% select (starts_with('Lcs')))

YemenRaw_reduced[,convert_to_char] = apply(YemenRaw_reduced[,convert_to_char] , 2 , as.character)

YemenRaw_reduced[,lcs_vars] = apply(YemenRaw_reduced[,lcs_vars] , 2 , as.character)

num_vars = c("RESPAge" , "HHSize" , "FCSStap" ,	"FCSPulse",	"FCSVeg",	"FCSFruit",	"FCSPr",	"FCSDairy",	"FCSFat",	"FCSSugar", "rCSILessQlty",	"rCSIBorrow",	"rCSIMealNb",	"rCSIMealSize",	"rCSIMealAdult")

YemenRaw_reduced [, num_vars] = apply (YemenRaw_reduced [, num_vars] , 2 , as.numeric)

YemenRaw_reduced = YemenRaw_reduced %>% mutate(HHHSex=recode(HHHSex, 'M' ~ 'Male' , 'F' ~ 'Female' ) , HHDispl = recode(HHDispl , 'Y' ~ 'IDP' , 'N' ~ 'Resident' ), FCS_SRf = recode(FCS_SRf, 'Produced by the household' ~ 'OwnProd' , 'Hunting/gathering/fishing' ~ 'Hunting' , 'Bought using cash' ~ 'Purchase' , 'Bought on credit' ~ 'On_Credit' , 'Borrowed/gifts (friends/relatives)' ~'Gift' ,  'Begging' ~ 'Begging' , 'Swap' ~ 'Swap' , 'Food assistance' ~ 'FoodAssist' , '9' ~ 'CasualLabour') , HHIncFirst_SRi = recode (HHIncFirst_SRi , '1' ~ 'RegEmp' , '2' ~ 'CasualLabour' , '3' ~ 'NoWork_assist') , HHIncChg_YN_1M = recode(HHIncChg_1M, '1' ~ 'Increased' ,'2' ~ 'Same' , '3' ~ 'Reduced' , '4' ~ 'Stopped'), LhCSIEnAccess = recode(LhCSIEnAccess, '1' ~ 'Food' , '2' ~ 'Education' , '3' ~ 'Health' , '4' ~ 'Shelter' , '5' ~ 'Sanitation' , '6' ~ 'Other') )

lcs_vars = names (YemenRaw_reduced %>% select (starts_with('Lcs')))
YemenRaw_reduced[lcs_vars] = replaceall( YemenRaw_reduced [lcs_vars] , c('1' , '2' , '3' , '4' , '99') , c('No' , 'AlreadyExhausted' , 'Yes'  , 'No_shortage' , 'NotApplicable') , quiet = TRUE)
}


#Lebanon


if ('Lebanon' %in% pipInput$CountryCode) {
LebRaw$HHSize = as.numeric(LebRaw$HHSizebelow5) +  as.numeric(LebRaw$HHSize_5_17) + as.numeric(LebRaw$HHSize_18_59_m) + as.numeric(LebRaw$HHSize_18_59_f) + as.numeric(LebRaw$HHSize_above60)

names(LebRaw) = trimws(names(LebRaw))
LebRaw$HHHSex [LebRaw$Relation_to_HoH == 1] = LebRaw$RESPSex [LebRaw$Relation_to_HoH == 1]
LebRaw$HoHH_Education [LebRaw$Relation_to_HoH == 1] = LebRaw$RESPEducation [LebRaw$Relation_to_HoH == 1]


LebRaw_reduced = LebRaw %>%  select(EnuName, ObsDate, RspID, RespNationality, RESPRelationHHH = Relation_to_HoH,  RESPEducation , HHSize,  HHHSex, HWaterSRC ,HHHEducation = HoHH_Education, HElectricitySRC = MainSourceElectricity , HToiletType = Hhtoiletfacility, ADMIN0Name , ADMIN1Name , ADMIN2Name , SvyDate,RESPAge,HDwellType = HHHousing, FCSStap, FCSPulse, FCSDairy, FCSPr, FCSVeg, FCSFruit, FCSFat, FCSSugar, rCSILessQlty, rCSIBorrow, rCSIMealSize, rCSIMealNb, rCSIMealAdult = rCSIMealAdul, LcsEN_stress_BorrowCash ,  LcsEN_stress_Utilities , LcsEN_stress_DomAsset ,LcsEN_stress_health1, LcsEN_crisis_Health  , LcsEN_crisis_ProdAssets  , LcsEN_crisis_Housing  , LcsEN_crisis_OutSchool  , LcsEN_crisis_ChildWork  , LcsEN_em_IllegalAct, LcsEN_em_Begged, LcsEN_em_Marriage, LcsEN_em_ResAsset,LhCSIEnAccess, FCS_SRf , MktAccess_7D )

LebRaw_reduced  = LebRaw_reduced %>% separate(HElectricitySRC , c("HElectricitySRC_1" , "HElectricitySRC_2" , "HElectricitySRC_3"))

# convert_to_char = c("RespNationality" , "HHHSex" , "HWaterSRC" , "HToiletType" , "HElectricitySRC_1" , "HElectricitySRC_2" , "HElectricitySRC_3" , "FCS_SRf")
# 
# LebRaw_reduced[,convert_to_char] = apply(LebRaw_reduced[,convert_to_char] , 2 , as.character)

num_vars = c("RESPAge" , "HHSize" , "FCSStap" ,	"FCSPulse",	"FCSVeg",	"FCSFruit",	"FCSPr",	"FCSDairy",	"FCSFat",	"FCSSugar", "rCSILessQlty",	"rCSIBorrow",	"rCSIMealNb",	"rCSIMealSize",	"rCSIMealAdult")

LebRaw_reduced [, num_vars] = apply (LebRaw_reduced [, num_vars] , 2 , as.numeric)


LebRaw_reduced = LebRaw_reduced %>% mutate(RespNationality=recode(RespNationality, '1' ~'Leb' , '2' ~ 'Syr_Leb' , '3' ~ 'Pal_Leb' , '4' ~ 'Syr' , '5' ~ 'Pal' , '6' ~ 'Irq' , '7' ~ 'Migrant' , '8' ~ 'Others' ) , HHHSex=recode(HHHSex, '1' ~ 'Male' , '2' ~ 'Female') , HWaterSRC = recode(HWaterSRC, '1' ~ 'Piped' , '2' ~ 'PublicTap' , '3' ~ 'Well_pump' ,'4' ~ 'ProtectedWell' , '5' ~ 'Distilled_water' , '6' ~ 'UnprotectedSpring' , '7' ~ 'tank_truck' , '8' ~ 'car_tank' , '9' ~ 'water_Seller' , '10' ~ 'bottled_water' , '11' ~ 'surface_water' , '12' ~ 'Other') , HToiletType = recode(HToiletType, '1' ~ 'Flush_toilet' , '2' ~ 'hole_tiledbath' , '3' ~ 'hole_unpavedbath' , '4' ~ 'bucket' , '5' ~ 'OpenAir' , '6' ~ 'Refused') , HElectricitySRC_1 = recode(HElectricitySRC_1 , '1' ~ 'company' , '2' ~ 'NonOwned_Generator' , '3' ~ 'Owned_Generator' , '4' ~ 'Relatives' , '5' ~ 'Solar' , '6' ~ 'Battery') ,HElectricitySRC_2 = recode(HElectricitySRC_2 , '1' ~ 'company' , '2' ~ 'NonOwned_Generator' , '3' ~ 'Owned_Generator' , '4' ~ 'Relatives' , '5' ~ 'Solar' , '6' ~ 'Battery') , HElectricitySRC_3 = recode(HElectricitySRC_3 , '1' ~ 'company' , '2' ~ 'NonOwned_Generator' , '3' ~ 'Owned_Generator' , '4' ~ 'Relatives' , '5' ~ 'Solar' , '6' ~ 'Battery')  , FCS_SRf = recode(FCS_SRf , '1' ~ 'OwnProd' , '2' ~ 'Purchase' ,'3' ~ 'Labour'  ,'4' ~ 'Gift' , '5' ~ 'FoodAssist' , '6' ~ 'Gov' , '7' ~ 'Community' , '8' ~ 'Other') )

lcs_vars = names (LebRaw_reduced %>% select (starts_with('Lcs')))
LebRaw_reduced[lcs_vars] = replaceall( LebRaw_reduced [lcs_vars] , c('1' , '2' , '3' , '4') , c('No' , 'AlreadyExhausted' , 'Yes'  , 'NotApplicable') , quiet = TRUE)
}


## Iraq

if ('Iraq' %in% pipInput$CountryCode) {
IrqRaw_reduced = IrqRaw %>% select (SvyDate,EnuName,RESPAge,RESPSex,RESPEducation,HHHSex,HHHAge,HHHEducation = HHHEdu,ADMIN1Name,ADMIN2Name,HHDispl,HDwellType,HHSize,FCSStap,FCSPulse,FCSDairy,FCSPr,FCSVeg,FCSFruit,FCSFat,FCSSugar,FCS_SRf,rCSILessQlty,rCSIBorrow,rCSIMealSize,rCSIMealNb,rCSIMealAdult,Lcs_stress_Saving,Lcs_stress_CrdtFood,Lcs_stress_BorrowCash,Lcs_stress_DomAsset,Lcs_crisis_HealthEdu,Lcs_crisis_ProdAssets,Lcs_crisis_OutSchool,Lcs_em_Begged,Lcs_em_Marriage,Lcs_em_IllegalAct,HHExpTOT,HHExpFood,HHIncFirst_SRi,HHIncChg_1M,mktaccess_14d,mktnoaccesswhy_14d
)

## convert numeric variables 

num_vars = c("RESPAge" , "HHSize" , "HHHAge" , "FCSStap" ,	"FCSPulse",	"FCSVeg",	"FCSFruit",	"FCSPr",	"FCSDairy",	"FCSFat",	"FCSSugar", "rCSILessQlty",	"rCSIBorrow",	"rCSIMealNb",	"rCSIMealSize",	"rCSIMealAdult" , "HHExpTOT" , "HHExpFood") 

IrqRaw_reduced [, num_vars] = apply (IrqRaw_reduced [, num_vars] , 2 , as.numeric)

convert_to_char = c("HHHSex" , "HHDispl" , "HHHEducation"  , "HDwellType" , "FCS_SRf" , "mktaccess_14d" , "HHIncFirst_SRi" , "HHIncChg_1M")

lcs_vars = names (IrqRaw_reduced %>% select (starts_with('Lcs')))

IrqRaw_reduced[,convert_to_char] = apply(IrqRaw_reduced[,convert_to_char] , 2 , as.character)

IrqRaw_reduced[,lcs_vars] = apply(IrqRaw_reduced[,lcs_vars] , 2 , as.character)

IrqRaw_reduced = IrqRaw_reduced %>% mutate( HHHEducation = recode(HHHEducation , 'Did not attend any school' ~ 'None' , 'Did not complete any level' ~ 'None' , 'Primary/Elementary Certificate (1-6)' ~ 'Primary' , 'Intermediate Certificate (7-9)' ~ 'Primary' , 'Basic Certificate (1-9)' ~ 'Secondary' , 'Preparatory/Secondary Certificate - Academic' ~ 'Secondary' , 'Preparatory/Secondary Certificate - Vocational' ~ 'Vocational' , 'Technical diploma (after Secondary)' ~ 'Secondary' , "Bachelor's Degree" ~ "University" , "Professional Degree" ~ "University" , "Higher Diploma Degree" ~ "University" , "Master's Degree" ~ "University" , "Doctoral Degree" ~ "University" )   , RESPEducation = recode(RESPEducation , 'Did not attend any school' ~ 'None' , 'Did not complete any level' ~ 'None' , 'Primary/Elementary Certificate (1-6)' ~ 'Primary' , 'Intermediate Certificate (7-9)' ~ 'Primary' , 'Basic Certificate (1-9)' ~ 'Secondary' , 'Preparatory/Secondary Certificate - Academic' ~ 'Secondary' , 'Preparatory/Secondary Certificate - Vocational' ~ 'Vocational' , 'Technical diploma (after Secondary)' ~ 'Secondary' , "Bachelor's Degree" ~ "University" , "Professional Degree" ~ "University" , "Higher Diploma Degree" ~ "University" , "Master's Degree" ~ "University" , "Doctoral Degree" ~ "University" )  ,HDwellType = recode(HDwellType, 'Own home' ~ 'Own' , 'Rent home' ~ 'Rent' , 'Living with parents' ~ 'Guest_family' , 'Staying as a guest hosted' ~ 'Guest_strangers' , 'Sharing accommodation with other families' ~ 'SharingAcco' , 'Informal Camp' ~ 'InformalCamp' , 'Informal settlement' ~'InformalSettlment'  , 'Camp' ~ 'Camp' )  ,FCS_SRf = recode(FCS_SRf, 'Own production' ~ 'OwnProd' , 'Exchange labor for food' ~ 'Labour' , 'Market \\ Grocery store' ~ 'Purchase' , 'Food assistance by humanitarian agencies' ~ 'FoodAssist' , 'PDS' ~ 'PDS' , "Food assistance by Government" ~ 'FoodAssist' , 'Gift from family, relatives or friends' ~ 'Gift') , mktaccess_14d = recode (mktaccess_14d , 'yes' ~ 'Yes' , 'no' ~ 'No' ) , mktnoaccesswhy_14d = recode(mktnoaccesswhy_14d , 'High food prices' ~ 'HighPrices' , 'Market\\grocery store is too far' ~ 'FarMarkets'))

IrqRaw_reduced[lcs_vars] = replaceall( IrqRaw_reduced [lcs_vars] , c('Yes' , "no - don't need to do" , "No, because I already did it \\(so cannot continue to do it\\)" , "No, I don't have-non applicable") , c('Yes' , 'No' , 'AlreadyExhausted'  , 'NotApplicable') , quiet = TRUE)


}

```


## Construct sampling weights 

```{r}

## Before you run this chunck, make sure you have the weights input table  updated under the respective folder

base_weights <- function (country_name , df) {
  
folder_path = paste0(getwd() , "/Weights input tables" )
xlsx_files <- list.files(folder_path, pattern = "\\.xlsx$", full.names = TRUE)
input_admin1 = read_excel( xlsx_files, sheet = country_name)

# list of the strata names in the dataset
sample_strata = unique (df$ADMIN1Name)
print ( sample_strata) 
# list of the strata names in the input file
country_strata = unique (input_admin1$adm1_name_en)
print ( country_strata) 

    
# Check if the names are matched between the dataset and the input file
if (setequal(intersect(sample_strata,country_strata), sample_strata) == TRUE)
  {

# join the pop figures with the dataset 
     
input_admin1$adm1_name_en = as.character(input_admin1$adm1_name_en)
df$ADMIN1Name = as.character(df$ADMIN1Name) 
  
df = left_join(df , input_admin1 , by = join_by ("ADMIN1Name" == "adm1_name_en"))
    
# get the number of surveys completed in each strata in the dataset
sample_size = df %>% group_by(ADMIN1Name) %>% summarise(strata_cnt = n())
print (paste0("Here is the sample size" , sample_size))
  
# Join the sample size per strata with the dataset
df = left_join(df , sample_size , by = c("ADMIN1Name" ))
  

# calculate the base weights and print the average and sum
df$base_weights = (df$adm1_pop_count / sum(df$adm1_pop_count) )/ (df$strata_cnt / sum(df$strata_cnt))
  
print (country_name)
print (head(df$base_weights))
  
print (paste0("the average of the base weight is " , mean(df$base_weights)))
print (paste0("the sum of the base weight is " , sum(df$base_weights)))
print (boxplot(df$base_weights))
  
return(df)
  
  }
  
  else {print ( "strata names are not matching or maybe one of the strata is missing in the input table" )}
}

YemenRaw_reduced = base_weights ("Yemen" , YemenRaw_reduced)
SyriaRaw_reduced = base_weights("Syria" , SyriaRaw_reduced)
LebRaw_reduced = base_weights("Lebanon" , LebRaw_reduced )


## Construct post-stratification weights for Lebanon

# Re-categorize HHHEducation to 3 groups 
LebRaw_reduced$HHHEdu_3gps [(LebRaw_reduced$HHHEducation == "0" | LebRaw_reduced$HHHEducation == "1" | LebRaw_reduced$HHHEducation == "99") ] = "Illiterate_Primary"
LebRaw_reduced$HHHEdu_3gps [(LebRaw_reduced$RESPEducation == "0" | LebRaw_reduced$RESPEducation == "1" | LebRaw_reduced$HHHEducation == "99") & LebRaw_reduced$RESPRelationHHH == "1"  ] = "Illiterate_Primary"

LebRaw_reduced$HHHEdu_3gps [(LebRaw_reduced$HHHEducation == "2" | LebRaw_reduced$HHHEducation == "3")] = "Secondary"
LebRaw_reduced$sample_HHHEdu_3gps [(LebRaw_reduced$RESPEducation == "2"  | LebRaw_reduced$RESPEducation == "3") & (LebRaw_reduced$RESPRelationHHH == "1")] = "Secondary"

LebRaw_reduced$HHHEdu_3gps [( LebRaw_reduced$HHHEducation == "4" | LebRaw_reduced$HHHEducation == "5" | LebRaw_reduced$HHHEducation == "6" | LebRaw_reduced$HHHEducation == "7")] = "Higher"

LebRaw_reduced$HHHEdu_3gps [( LebRaw_reduced$RESPEducation == "4" | LebRaw_reduced$RESPEducation == "5" | LebRaw_reduced$RESPEducation == "6" | LebRaw_reduced$RESPEducation == "7") & (LebRaw_reduced$RESPRelationHHH == "1")] = "Higher"

tbl1 = LebRaw_reduced %>% group_by(ADMIN1Name  ) %>% summarise(cnt_gov = n())

tbl2 = LebRaw_reduced %>% group_by(ADMIN1Name , HHHEdu_3gps ) %>% summarise(cnt = n())

full_tbl = full_join(tbl1 , tbl2)
full_tbl$HHHEdu_3gps_sample_perc = full_tbl$cnt / full_tbl$cnt_gov
LebRaw_reduced = full_join(LebRaw_reduced , full_tbl  , by = c("ADMIN1Name" , "HHHEdu_3gps"))

```


## Generate Master Table

```{r generate master table, echo=TRUE, message=FALSE, warning=FALSE}

IrqRaw_reduced$ADMIN0Name = 'Iraq'

df_master = full_join(SyriaRaw_reduced , YemenRaw_reduced)
df_master = full_join( df_master, LebRaw_reduced)
df_master = full_join(df_master , IrqRaw_reduced)


## Check that there are no missing values 
no_nulls = c("ADMIN0Name" ,	"ADMIN1Name" ,	"ADMIN2Name" ,	"RspID" ,	"SvyDate" ,	"ObsDate" ,	"EnuName" , "HHHSex" , "RESPAge" ,	"HHSize" , "FCSStap" ,	"FCSPulse" ,	"FCSVeg" ,	"FCSFruit" ,	"FCSPr" ,	"FCSDairy" ,	"FCSFat" ,	"FCSSugar" , "rCSILessQlty" ,	"rCSIBorrow",	"rCSIMealNb" ,	"rCSIMealSize" ,	"rCSIMealAdult" )

for (i in 1:length(no_nulls)) {
  if ( any (is.na (df_master[no_nulls[i]] ))) {
    print (paste0("variable " , no_nulls [i] , " has missing values and and it should not"))
    print (paste0("check the records between " ))
    print ( min (which(is.na(df_master[no_nulls[i]]))))
    print ( max (which(is.na(df_master[no_nulls[i]]))))
    print ("----------------------------------------")
  }
  
}

```

## Food Security Indicators Calculation

```{r calculate indicators , echo=TRUE, message=FALSE, warning=FALSE}


#assign variable and value labels
var_label(df_master$FCSStap) <- "Cereals, grains, roots and tubers"
var_label(df_master$FCSPulse) <- "Pulses/ legumes / nuts"
var_label(df_master$FCSDairy) <- "Milk and other dairy products"
var_label(df_master$FCSPr) <- "Meat, fish and eggs"
var_label(df_master$FCSVeg) <- "Vegetables and leave"
var_label(df_master$FCSFruit) <- "Fruits"
var_label(df_master$FCSFat) <- "Oil/fat/butter"

#calculate FCS and FCG

FCS_vars = names (df_master %>% select (starts_with('FCS')))

df_master[FCS_vars[-9]] = apply(df_master[FCS_vars[-9]] , 2 , as.numeric)


df_master <- df_master %>% mutate(FCS = (2 * FCSStap) +(3 * FCSPulse) +(4*FCSPr) +(4*FCSDairy) + FCSVeg  + FCSFruit +(0.5*FCSFat) +(0.5*FCSSugar))
var_label(df_master$FCS) <- "Food Consumption Score"

df_master <- df_master %>% mutate(FCG = case_when(
  FCS <= 28 ~ 1, between(FCS, 28.5, 42) ~ 2, FCS > 42 ~ 3))

val_lab(df_master$FCG) = num_lab("
             1 Poor
             2 Borderline
             3 Acceptable
")

var_label(df_master$FCG) <- "FCS Categories"

#rCSI (reduced Coping Strategies Index)
#assign variable and value labels
var_label(df_master$rCSILessQlty) <-  "Rely on less preferred and less expensive food in the past 7 days"
var_label(df_master$rCSIBorrow) <- "Borrow food or rely on help from a relative or friend in the past 7 days"
var_label(df_master$rCSIMealNb) <-  "Reduce number of meals eaten in a day in the past 7 days"
var_label(df_master$rCSIMealSize) <- "Limit portion size of meals at meal times in the past 7 days"
var_label(df_master$rCSIMealAdult) <-  "Restrict consumption by adults in order for small children to eat in the past 7 days"

#calculate reduced Coping Strategy Index (rCSI)
rcsi_vars = names (df_master %>% select (starts_with('rCSI')))
df_master[rcsi_vars] = apply(df_master[rcsi_vars] , 2 , as.numeric)

df_master <- df_master %>% mutate(rCSI = rCSILessQlty + (2 * rCSIBorrow) + rCSIMealNb + rCSIMealSize + (3 * rCSIMealAdult))
var_label(df_master$rCSI) <- "Reduced coping strategies index (rCSI)"


```

## Data profiling and quality checks

```{r data profiling, echo=TRUE, fig.height=5, fig.width=9, message=FALSE, warning=FALSE}

is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}

# Check number of completed survey per day per country


# Convert 'obs_date' to Date format
df_master$ObsDate = as.Date(df_master$ObsDate, format = "%m/%d/%Y")

# Group the data by 'obs_date' and 'ADMIN0NAME' and count the number of completed surveys
completed_surveys <- df_master %>%
  group_by(ObsDate, ADMIN0Name) %>%
  summarise(Completed_Count = n())

# Plot the results using ggplot2
ggplot(completed_surveys, aes(x = ObsDate, y = Completed_Count, fill = ADMIN0Name)) +
  geom_line(stat = "identity" , aes(colour = ADMIN0Name)) +
  labs(x = "Date", y = "Completed surveys", title = "Number of Completed Surveys by Country") +
  theme_bw() +
  theme(legend.position = "top", axis.text.x = element_text(angle = 45, hjust = 1))

# Group the data by 'ADMIN0NAME' and 'ADMIN1NAME' and count the number of completed surveys

completed_surveys_adm1 <- df_master %>%
  group_by(ADMIN0Name, ADMIN1Name) %>%
  summarise(Completed_Count = n())


ggplot(completed_surveys_adm1)+
  geom_bar(aes(y = Completed_Count, x = ADMIN1Name , fill = ADMIN0Name), stat = "identity" , width = 0.5)+
  theme_bw()+
  ylab("Completed surveys")+
  xlab("Governorate")+
  ggtitle("Completed surveys per governorate") +
  facet_wrap(~ADMIN0Name, scales = 'free_x') +
  theme(axis.text.x = element_text(angle = 45 ,  hjust=1 , size = 8) ,legend.position="none")


# Get duplicated cases in RspID excluding missing values
duplicated_cases <- df_master$RspID[duplicated(df_master$RspID[complete.cases(df_master$RspID) ]) ]

# Print the duplicated cases
print(duplicated_cases)

#completed surveys and average consumption per enumerator per country 

Enum_completed_surveys <- df_master %>%
group_by( ADMIN0Name, EnuName , RspID) %>%
summarise( avg_staples = round (mean(FCSStap),1) , avg_pulses = round (mean(FCSPulse),1) , avg_fruits = round (mean(FCSFruit),1) , avg_proteins = round (mean(FCSPr),1) , avg_veg = round (mean(FCSVeg),1) , avg_dairy = round (mean(FCSDairy),1) , avg_sug = round(mean(FCSSugar),1) , avg_fat = round(mean(FCSFat),1))

Enum_completed_surveys = reshape2::melt(Enum_completed_surveys)

if ('Syria' %in% pipInput$CountryCode) {
Enum_completed_surveys [Enum_completed_surveys$ADMIN0Name == "Syria",] %>% 
group_by(  variable) %>% 
mutate(outlier = if_else(is_outlier(value), EnuName, NA_character_)) %>% 
ggplot(aes(x = variable, y = value, color = variable)) +
geom_boxplot() +
geom_text_repel(aes(label = outlier , label.size = 1 ), na.rm = TRUE, show.legend = F) +
  theme(legend.position="none")
}

if ('Yemen1' %in% pipInput$CountryCode) {
Enum_completed_surveys [Enum_completed_surveys$ADMIN0Name == "Yemen",] %>% 
group_by(  variable) %>% 
mutate(outlier = if_else(is_outlier(value), EnuName, NA_character_)) %>% 
ggplot(aes(x = variable, y = value, color = variable)) +
geom_boxplot() +
geom_text_repel(aes(label = outlier , size = 1 ), na.rm = TRUE, show.legend = F) +
  theme(legend.position="none")
}

if ('Lebanon' %in% pipInput$CountryCode) {
Enum_completed_surveys [Enum_completed_surveys$ADMIN0Name == "Lebanon",] %>%
group_by(  variable) %>% 
mutate(outlier = if_else(is_outlier(value), EnuName, NA_character_)) %>% 
ggplot(aes(x = variable, y = value, color = variable)) +
geom_boxplot() +
geom_text_repel(aes(label = outlier , size = 1 ), na.rm = TRUE, show.legend = F) +
  theme(legend.position="none")
}


if ('Iraq' %in% pipInput$CountryCode) {
Enum_completed_surveys [Enum_completed_surveys$ADMIN0Name == "Iraq",] %>%
group_by(  variable) %>% 
mutate(outlier = if_else(is_outlier(value), EnuName, NA_character_)) %>% 
ggplot(aes(x = variable, y = value, color = variable)) +
geom_boxplot() +
geom_text_repel(aes(label = outlier , size = 1 ), na.rm = TRUE, show.legend = F) +
  theme(legend.position="none")
}


Enum_exp <- df_master [df_master$ADMIN0Name == 'Iraq',] %>%
group_by(  EnuName ) %>%
summarise( avg_total_exp = round (mean(HHExpTOT),1) , avg_food_exp = round (mean(HHExpFood),1)) %>% reshape2::melt() 

Enum_exp %>%
group_by(  variable) %>% 
mutate(outlier = if_else(is_outlier(value), EnuName, NA_character_)) %>% 
ggplot(aes(x = variable, y = value, color = variable)) +
geom_boxplot() +
geom_text_repel(aes(label = outlier , size = 1 ), na.rm = TRUE, show.legend = F) +
  theme(legend.position="none")

###Contact the call center in case there are any suspicious observations and ensure that the invalid observations are removed from the call center side##

##Don't remove or fix any value yourself##


## Reporting on the use of emergency coping strategies 

em_lcs_vars = names (df_master %>% select (contains('em')))


# Iterate over each variable in em_lcs_vars 

for (var in em_lcs_vars) {
  print (var)
  df_sub = df_master[[var]]
  df_sub = df_sub [!is.na(df_sub)]
  g =  ggplot(data.frame(x=df_sub)) +
  geom_bar(aes(x) , fill="steelblue") 
  print (g)
}


```

## Calls logs validation

```{r call logs validation, eval=FALSE, message=FALSE, warning=FALSE, include=FALSE}
## Activate this chunk when you have meta data about the number of call attempts and attempts status - template is included in the folder (panelists tracking)
##NOTE##
# the folder (Panelists tracking) should only include one file at a time each country should have a separated sheet

folder_path = paste0(getwd() , "/Panelists tracking")

# Get a list of all file names in the folder with the .xlsx extension
xlsx_files <- list.files(folder_path, pattern = "\\.xlsx$", full.names = TRUE)

# Check if there is exactly one xlsx file
if (length(xlsx_files) == 1) {
  file_path = xlsx_files  # Get the file path
  # Read the xlsx file (modify the below line if needed)
  sheet_names = excel_sheets(file_path)
  # Use map or lapply function to read data from each sheet
  calls_log_ls = map(sheet_names, ~ read_excel(file_path, sheet = .x))

  for (i in 1:length(calls_log_ls)) {
    df = calls_log_ls[[i]]
    print (unique (df["Admin0Name"]))
    print (prop.table(table(df["Paneslist (Yes/No)"]))) 
    print (prop.table(table(df["Call attempt outcome"])))
    df [df["Call attempt outcome" ] != "Completed Survey",] %>% group_by(RspID) %>% summarise(cnt = n()) 

    # create breaks
    breaks <- hour(hm("00:00", "6:00", "12:00", "18:00", "23:59"))
    # labels for the breaks
    labels <- c("Night", "Morning", "Afternoon", "Evening")
    
    df$Time_of_day <- cut(x=hour(df$`Call attempt time`), breaks = breaks, labels = labels, include.lowest=TRUE)
        df %>% group_by(RspID ,Time_of_day, `Call attempt outcome`) %>% summarise(cnt = n())
        df %>% group_by(RspID , `Call attempt outcome`) %>% summarise(dur_avg = mean(minutes(hms( `Call duration in minutes`))) )
        
  }  
  
  
} else {
  stop("No xlsx file or multiple xlsx files found in the folder.")
}


```

## Admin 1 divisions normalization

```{r admin1 normalization, message=FALSE, warning=FALSE}


df_master$ADMIN1Name = dplyr::case_match( df_master$ADMIN1Name , "Rural Damascus" ~ "900230" , "Homs" ~ "900226" , "Hama" ~ "900225" , "As-Sweida" ~ "900221" , "Lattakia" ~ "900228" , "Ar-Raqqa" ~ "900220" , "Al-Hasakeh" ~ "900218" , "Dar'a" ~ "900223"  , "Damascus" ~ "900222" , "Aleppo" ~ "900219" , "Deir-ez-Zor" ~ "900224" , "Tartous" ~ "900231" , "Quneitra" ~ "900229" , "Idleb" ~ "900227" , "Aden" ~ "903632" , "Abyan" ~ "903630" , "Dhamar" ~ "903639" , "Al Jawf" ~ "903635" , "Sa'ada" ~ "903646" ,"Sa'dah" ~ "903646" ,"Lahj" ~ "903643" , "Al Hudaydah" ~ "903634" , "Marib" ~ "903644" , "Al Dhale'e" ~ "903631" , "Ibb" ~ "903642" , "Amanat Al Asimah" ~ "903648" , "Sana'a" ~ "903647" , "Raymah" ~ "903645" , "Sana'a City" ~ "903648" ,"Ad Dali" ~ "903631"   , "Al Bayda" ~ "903633" , "Hajjah" ~ "903641" , "Shabwah" ~ "903649" , "Taizz" ~ "903651" , "Amran" ~ "903638"  , "Hadramaut" ~ "903640" , "Al Maharah" ~ "903636" , "Al Mahwit" ~ "903637" , "North" ~ "900943" , "Akkar" ~ "900944" , "Baalbek-El Hermel" ~ "900941" , "Mount Lebanon" ~ "900946" , "South" ~ "900945" , "Bekaa" ~ "900940" , "El Nabatieh" ~ "900947" , "Beirut" ~ "900942" , "Socotra" ~ "903650"  , "Najaf" ~ "901262" , "Baghdad" ~ "901252" , "Babylon" ~ "901251" , "Erbil" ~ "901256" , "Anbar" ~ "901250" , "Diyala"  ~ "901254" , "Qadissiya" ~ "901263" , "Kerbala" ~ "901257" , "Ninewa" ~ "901261" , "Salah Al-Din" ~ "901264" , "Wassit" ~ "901267" , "Thi-Qar" ~ "901266" , "Muthanna" ~ "901260" , "Basrah" ~ "901253" , "Dahuk" ~ "901255" , "Sulaymaniyah" ~ "901265" , "Missan" ~ "901259" , "Kirkuk" ~ "901258"  )

df_master$ADMIN0Name [df_master$ADMIN0Name == "Syria"] = 238
df_master$ADMIN0Name [df_master$ADMIN0Name == "Yemen"] = 269
df_master$ADMIN0Name [df_master$ADMIN0Name == "Lebanon"] = 141
df_master$ADMIN0Name [df_master$ADMIN0Name == "Iraq"] = 118

df_master$ADMIN0Name = as.numeric(df_master$ADMIN0Name)
df_master$ADMIN1Name = as.numeric(df_master$ADMIN1Name)


```


```{r}
#df_master$CmbAdjWt = NA

df_master = df_master %>% select (- c(EnuName ,HHHEdu_3grp_Illiterate_Primary,HHHEdu_3grp_Secondary, HHHEdu_3grp_Higher,HHHEdu_3gps, sample_HHHEdu_3gps, HHHEdu_3gps, sample_HHHEdu_3gps, cnt_gov, cnt, HHHEdu_3gps_sample_perc , `Male HoH` , `Female HoH` , adm0_name , adm0_WFP_code , adm1_WFP_code, adm1_pcode, adm1_pop_count, strata_cnt , HHH_YN, HHHAge, IDPDuration,mktnoaccesswhy_14d ) )

write_xlsx(df_master, paste0("Cleaned data/" , "RTM_master" , "_"  , Sys.Date(), ".xlsx"))
```

## Regional Master Table Connection

```{r upload data, eval=FALSE, message=FALSE, warning=FALSE, include=FALSE}

library(RODBC)
library(tibble)


## delete from master table columns that will not be uploaded

df_master$CmbAdjWt = df_master$base_weights

df_master = df_master %>% select (- c(EnuName ,HHH_YN,HHHAge, IDPDuration,HWaterSRC, MktAccess_1M, HHIncChg_1M, RespNationality, HHSizebelow2, HToiletType, HElectricitySRC_1,HElectricitySRC_2, HElectricitySRC_3, base_weights  ) )


## read config file
config = read.table("Pipeline input/SQL_connection.txt", header = FALSE, sep = ":", dec = ".")

server <- trimws(config$V2[config$V1 == "Servername"])
db_name <- trimws (config$V2[config$V1 == "Db"])
user <- trimws (config$V2[config$V1 == "Username"])
pwd <- trimws (config$V2[config$V1 == "Password"])

 
# Establishing the Connection
 
# Attempt to connect to the SQL Server database using the connection string.

conn <- odbcConnect(db_name, uid = user, pwd = pwd)


# Check if the connection is successful.
if (is.null(conn)) {
  cat("Connection failed\n")
} else {
  cat("Connection successful!\n")
  
qry1 = "Select top 10 * from Obs_Master"
Master_head <- sqlQuery(conn, qry1)

# Add columns that don't exist
vars_to_add <- setdiff(names(Master_head ) , names(df_master))
for (c in vars_to_add) {
  df_master[[c]] = NA
}

df_master = df_master %>% select (- ObsID)

# Align data types
for (c in names(df_master)) {
class(df_master[[c]]) <- class(Master_head[[c]])
}
   
   
# Close the connection to the database 
  odbcClose(conn)
   

}


```