diff --git a/papers/Alireza_Vaezi/banner.jpg b/papers/Alireza_Vaezi/banner.jpg deleted file mode 100644 index d210b89277..0000000000 Binary files a/papers/Alireza_Vaezi/banner.jpg and /dev/null differ diff --git a/papers/Alireza_Vaezi/banner.png b/papers/Alireza_Vaezi/banner.png index e6a793bd6c..add6516597 100644 Binary files a/papers/Alireza_Vaezi/banner.png and b/papers/Alireza_Vaezi/banner.png differ diff --git a/papers/Alireza_Vaezi/figure1.png b/papers/Alireza_Vaezi/figure1.png deleted file mode 100644 index cd768ee933..0000000000 Binary files a/papers/Alireza_Vaezi/figure1.png and /dev/null differ diff --git a/papers/Alireza_Vaezi/figure2.png b/papers/Alireza_Vaezi/figure2.png deleted file mode 100644 index 2ff94d5a6b..0000000000 Binary files a/papers/Alireza_Vaezi/figure2.png and /dev/null differ diff --git a/papers/Alireza_Vaezi/main.md b/papers/Alireza_Vaezi/main.md index 301e69a2bd..4d42eb295a 100644 --- a/papers/Alireza_Vaezi/main.md +++ b/papers/Alireza_Vaezi/main.md @@ -1,15 +1,12 @@ --- # Ensure that this title is the same as the one in `myst.yml` title: Training a Supervised Cilia Segmentation Model from Self-Supervision -exports: - - format: pdf - template: arxiv_two_column - output: exports/my-document.pdf - abstract: | - Cilia are organelles found on the surface of some cells in the human body that sweep rhythmically to transport substances. Dysfunctional cilia are indicative of diseases that can disrupt organs such as the lungs and kidneys. Understanding cilia behavior is essential in diagnosing and treating such diseases. But, the tasks of automatically analysing cilia are often a labor and time-intensive since there is a lack of automated segmentation. In this work we overcome this bottleneck by developing a robust, self-supervised framework exploiting the visual similarity of normal and dysfunctional cilia. This framework generates pseudolabels from optical flow motion vectors, which serve as training data for a semi-supervised neural network. Our approach eliminates the need for manual annotations, enabling accurate and efficient segmentation of both motile and immotile cilia. + Cilia are organelles found on the surface of some cells in the human body that sweep rhythmically to transport substances. Dysfunctional cilia are indicative of diseases that can disrupt organs such as the lungs and kidneys. Understanding cilia behavior is essential in diagnosing and treating such diseases. But the task of automatically analyzing cilia is often labor- and time-intensive since there is a lack of automated segmentation. In this work we overcome this bottleneck by developing a robust, self-supervised framework exploiting the visual similarity of normal and dysfunctional cilia. This framework generates pseudolabels from optical flow motion vectors, which serve as training data for a semi-supervised neural network. Our approach eliminates the need for manual annotations, enabling accurate and efficient segmentation of both motile and immotile cilia. --- +(sec:introduction)= + ## Introduction Cilia are hair-like membranes that extend out from the surface of the cells and are present on a variety of cell types such as lungs and brain ventricles and can be found in the majority of vertebrate cells. Categorized into motile and primary, motile cilia can help the cell to propel, move the flow of fluid, or fulfill sensory functions, while primary cilia act as signal receivers, translating extracellular signals into cellular responses [@doi:10.1007/978-94-007-5808-7_1].
Ciliopathies is the term commonly used to describe diseases caused by ciliary dysfunction. These disorders can result in serious issues such as blindness, neurodevelopmental defects, or obesity [@Hansen2021-fd]. Motile cilia beat in a coordinated manner with a specific frequency and pattern [@doi:10.1016/j.compfluid.2011.05.016]. Stationary, dyskinetic, or slow ciliary beating indicates ciliary defects. Ciliary beating is a fundamental biological process that is essential for the proper functioning of various organs, which makes understanding the ciliary phenotypes a crucial step towards understanding ciliopathies and the conditions stemming from it [@zain2022low]. @@ -20,7 +17,9 @@ Video segmentation techniques tend to be more robust to such noise, but still st To address this challenge, we propose a two-stage image segmentation model designed to obviate the need for expert-drawn masks. We first build a corpus of segmentation masks based on optical flow (OF) thresholding over a subset of healthy training data with guaranteed motility. We then train a semi-supervised neural segmentation model to identify both motile and immotile data as a single segmentation category, using the flow-generated masks as “pseudolabels”. These pseudolabels operate as “ground truth” for the model while acknowledging the intrinsic uncertainty of the labels. The fact that motile and immotile cilia tend to be visually similar in snapshot allows us to generalize the domain of the model from motile cilia to all cilia. Combining these stages results in a semi-supervised framework that does not rely on any expert-drawn ground-truth segmentation masks, paving the way for full automation of a general cilia analysis pipeline. -The rest of this article is structured as follows: The Background section enumerates the studies relevant to our methodology, followed by a detailed description of our approach in the Methodology section. Finally, the next section delineates our experiment and provides a discussion of the results obtained. +The rest of this article is structured as follows: The [Background section](#sec:background) enumerates the studies relevant to our methodology, followed by a detailed description of our approach in the [Methodology section](#sec:methodology). Finally, the [next section](#sec:results) delineates our experiment and provides a discussion of the results obtained. + +(sec:background)= ## Background @@ -28,10 +27,12 @@ Dysfunction in ciliary motion indicates diseases known as ciliopathies, which ca Accurate analysis of ciliary motion is essential but challenging due to the limitations of manual analysis, which is labor-intensive, subjective, and prone to error. [@zain2020towards] proposed a modular generative pipeline that automates ciliary motion analysis by segmenting, representing, and modeling the dynamic behavior of cilia, thereby reducing the need for expert intervention and improving diagnostic consistency. [@quinn2015automated] developed a computational pipeline using dynamic texture analysis and machine learning to objectively and quantitatively assess ciliary motion, achieving over 90% classification accuracy in identifying abnormal ciliary motion associated with diseases like primary ciliary dyskinesia (PCD). Additionally, [@zain2022low] explored advanced feature extraction techniques like Zero-phase PCA Sphering (ZCA) and Sparse Autoencoders (SAE) to enhance cilia segmentation accuracy. 
These methods address challenges posed by noisy, partially occluded, and out-of-phase imagery, ultimately improving the overall performance of ciliary motion analysis pipelines. Collectively, these approaches aim to enhance diagnostic accuracy and efficiency, making ciliary motion analysis more accessible and reliable, thereby improving patient outcomes through early and accurate detection of ciliopathies. However, these studies rely on manually labeled data. The segmentation masks and ground-truth annotations, which are essential for training the models and validating their performance, are generated by expert reviewers. This dependence on manually labeled data is a significant limitation making automated cilia segmentation the bottleneck to automating cilia analysis. -In the biomedical field, where labeled data is often scarce and costly to obtain, several solutions have been proposed to augment and utilize available data effectively. These include semi-supervised learning [@YAKIMOVICH2021100383,@van2020survey], which utilizes both labeled and unlabeled data to enhance learning accuracy by leveraging the data's underlying distribution. Active learning [@settles2009active] focuses on selectively querying the most informative data points for expert labeling, optimizing the training process by using the most valuable examples. Data augmentation techniques [@10.3389/fcvm.2020.00105], [@Krois2021], [@10.1148/ryai.2020190195], [@Sandfort2019], [@YAKIMOVICH2021100383], [@van2001art], [@krizhevsky2012imagenet], [@ronneberger2015u], such as image transformations and synthetic data generation through Generative Adversarial Networks [@goodfellow2014generative], [@yi2019generative], increase the diversity and volume of training data, enhancing model robustness and reducing overfitting. Transfer learning [@YAKIMOVICH2021100383], [@Sanford2020-yg], [@NEURIPS2019_eb1e7832], [@hutchinson2017overcoming] transfers knowledge from one task to another, minimizing the need for extensive labeled data in new tasks. Self-supervised learning [@kim2019self], [@kolesnikov2019revisiting], [@mahendran2019cross] creates its labels by defining a pretext task, like predicting the position of a randomly cropped image patch, aiding in the learning of useful data representations. Additionally, few-shot, one-shot, and zero-shot learning techniques [@li2006one], [@miller2000learning] are designed to operate with minimal or no labeled examples, relying on generalization capabilities or metadata for making predictions about unseen classes. +In the biomedical field, where labeled data is often scarce and costly to obtain, several solutions have been proposed to augment and utilize available data effectively. These include semi-supervised learning [@YAKIMOVICH2021100383,@van2020survey], which utilizes both labeled and unlabeled data to enhance learning accuracy by leveraging the data's underlying distribution. Active learning [@settles2009active] focuses on selectively querying the most informative data points for expert labeling, optimizing the training process by using the most valuable examples. Data augmentation techniques [@10.3389/fcvm.2020.00105;@Krois2021;@10.1148/ryai.2020190195;@Sandfort2019;@YAKIMOVICH2021100383;@van2001art;@krizhevsky2012imagenet;@ronneberger2015u], such as image transformations and synthetic data generation through Generative Adversarial Networks [@goodfellow2014generative;@yi2019generative], increase the diversity and volume of training data, enhancing model robustness and reducing overfitting. 
Transfer learning [@YAKIMOVICH2021100383;@Sanford2020-yg;@NEURIPS2019_eb1e7832;@hutchinson2017overcoming] transfers knowledge from one task to another, minimizing the need for extensive labeled data in new tasks. Self-supervised learning [@kim2019self;@kolesnikov2019revisiting;@mahendran2019cross] creates its labels by defining a pretext task, like predicting the position of a randomly cropped image patch, aiding in the learning of useful data representations. Additionally, few-shot, one-shot, and zero-shot learning techniques [@li2006one;@miller2000learning] are designed to operate with minimal or no labeled examples, relying on generalization capabilities or metadata for making predictions about unseen classes. A promising approach to overcome the dependency on manually labeled data is the use of unsupervised methods to generate ground truth masks. Unsupervised methods do not require prior knowledge of the data [@khatibi2021proposing]. Using domain-specific cues unsupervised learning techniques can automatically discover patterns and structures in the data without the need for labeled examples, potentially simplifying the process of generating accurate segmentation masks for cilia. Inspired by advances in unsupervised methods for image segmentation, in this work, we firstly compute the motion vectors using optical flow of the ciliary regions and then apply autoregressive modelling to capture their temporal dynamics. Autoregressive modelling is advantageous since the labels are features themselves. By analyzing the OF vectors, we can identify the characteristic motion of cilia, which allows us to generate pseudolabels as ground truth segmentation masks. These pseudolabels are then used to train a robust semi-supervised neural network, enabling accurate and automated segmentation of both motile and immotile cilia. +(sec:methodology)= + ## Methodology Dynamic textures, such as sea waves, smoke, and foliage, are sequences of images of moving scenes that exhibit certain stationarity properties in time [@doretto2003dynamic]. Similarly, ciliary motion can be considered as dynamic textures for their orderly rhythmic beating. Taking advantage of this temporal regularity in ciliary motion, OF can be used to compute the flow vectors of each pixel of high-speed videos of cilia. In conjunction with OF, autoregressive (AR) parameterization of the OF property of the video yields a manifold that quantifies the characteristic motion in the cilia. The low dimension of this manifold contains the majority of variations within the data, which can then be used to segment the motile ciliary regions. @@ -51,10 +52,12 @@ Where $I(x,y,t)$ is the pixel intensity at position $(x,y)$ a time $t$. Here, $( :label: fig:sample_vids_with_gt_mask A sample of three videos in our cilia dataset with their manually annotated ground truth masks. 
::: + + :::{figure} sample_OF.png :label: fig:sample_OF Representation of rotation (curl) component of OF at a random time @@ -66,7 +69,7 @@ Representation of rotation (curl) component of OF at a random time ```{math} :label: AR -y_t =C\vec{x_t} + \vec{u} +y_t =C\vec{x_t} + \vec{u} ``` ```{math} @@ -103,22 +106,25 @@ The next section discusses the results of the experiment and the performance of :::{table} Summary of model architecture, training setup, and dataset distribution :label: tbl:model_specs -| **Aspect** | **Details** | -|---------------------------------|------------------------------------------------------------------------------------------------------------------------------------------| -| **Architecture** | FPN with ResNet-34 encoder | -| **Input** | Grayscale images with a single input channel | -| **Batch Size** | 2 | -| **Training Samples** | 28,869 | -| **Validation Samples** | 5,095 | -| **Test Samples** | 108 | -| **Loss Function** | Binary Cross-Entropy Loss | -| **Optimizer** | Adam optimizer with a learning rate of $10^{-3}$ | -| **Evaluation Metric** | Dice score during training, validation, and testing | -| **Data Augmentation Techniques**| Resizing, random cropping, and rotation | -| **Implementation** | Using a Python library with Neural Networks for Image Segmentation based on PyTorch [@Iakubovskii:2019] | + +| **Aspect** | **Details** | +| -------------------------------- | ------------------------------------------------------------------------------------------------------- | +| **Architecture** | FPN with ResNet-34 encoder | +| **Input** | Grayscale images with a single input channel | +| **Batch Size** | 2 | +| **Training Samples** | 28,869 | +| **Validation Samples** | 5,095 | +| **Test Samples** | 108 | +| **Loss Function** | Binary Cross-Entropy Loss | +| **Optimizer** | Adam optimizer with a learning rate of $10^{-3}$ | +| **Evaluation Metric** | Dice score during training, validation, and testing | +| **Data Augmentation Techniques** | Resizing, random cropping, and rotation | +| **Implementation** | Using a Python library with Neural Networks for Image Segmentation based on PyTorch [@Iakubovskii:2019] | ::: +(sec:results)= + ## Results and Discussion The model's performance metrics, including IoU, Dice score, sensitivity, and specificity, are summarized in @tbl:metrics. The validation phase achieved an IoU of 0.398 and a Dice score of 0.569, which indicates a moderate overlap between the predicted and ground truth masks. The high sensitivity (0.997) observed during validation suggests that the model is proficient in identifying ciliary regions, albeit with a specificity of 0.882, indicating some degree of false positives. In the testing phase, the IoU and Dice scores decreased to 0.132 and 0.233, respectively, reflecting the challenges posed by the dyskinetic cilia data, which were not included in the training or validation sets. Despite this, the model maintained a sensitivity of 0.479 and specificity of 0.806. @@ -128,15 +134,16 @@ The model's performance metrics, including IoU, Dice score, sensitivity, and spe The model predictions on 5 dyskinetic cilia samples. The first column shows a frame of the video, the second column shows the manually labeled ground truth, the third column is the model's prediction, and the last column is a thresholded version of the prediction. ::: -@fig:out_sample provides visual examples of the model's predictions on dyskinetic cilia samples, alongside the manually labeled ground truth and thresholded predictions. 
The dyskinetic samples were not used in the training or validation phases. These predictions were generated after only 15 epochs of training with a small training data. The visual comparison reveals that, while the model captures the general structure of ciliary regions, there are instances of under-segmentation and over-segmentation, which are more pronounced in the dyskinetic samples. This observation is consistent with the quantitative metrics, suggesting that further refinement of the pseudolabel generation process or model architecture could enhance segmentation accuracy. +@fig:out_sample provides visual examples of the model's predictions on dyskinetic cilia samples, alongside the manually labeled ground truth and thresholded predictions. The dyskinetic samples were not used in the training or validation phases. These predictions were generated after only 15 epochs of training with a small training dataset. The visual comparison reveals that, while the model captures the general structure of ciliary regions, there are instances of under-segmentation and over-segmentation, which are more pronounced in the dyskinetic samples. This observation is consistent with the quantitative metrics, suggesting that further refinement of the pseudolabel generation process or model architecture could enhance segmentation accuracy. :::{table} The performance of the model in validation and testing phases after 15 epochs of training. :label: tbl:metrics -| Phases | Metrics | | | | -|------------|---------------|-------------|------------|------------| -| | IoU over dataset | Dice Score | Sensitivity| Specificity| -| Validation | 0.398 | 0.569 | 0.997 | 0.882 | -| Testing | 0.132 | 0.233 | 0.479 | 0.806 | + +| Phases | Metrics | | | | +| ---------- | ---------------- | ---------- | ----------- | ----------- | +| | IoU over dataset | Dice Score | Sensitivity | Specificity | +| Validation | 0.398 | 0.569 | 0.997 | 0.882 | +| Testing | 0.132 | 0.233 | 0.479 | 0.806 | ::: @@ -146,11 +153,12 @@ Since dyskinetic videos contain cilia that show some degree of movement we gener :::{table} The performance of the model after retraining with an addition of 283 videos of dyskinetic cilia to the training dataset. :label: tbl:exp2_metrics -| Phases | Metrics | | | | -|------------|---------------|-------------|------------|------------| -| | IoU over dataset | Dice Score | Sensitivity| Specificity| -| Validation | 0.202 | 0.337 | 0.999 | 0.765 | -| Testing | 0.139 | 0.245 | 0.732 | 0.696 | + +| Phases | Metrics | | | | +| ---------- | ---------------- | ---------- | ----------- | ----------- | +| | IoU over dataset | Dice Score | Sensitivity | Specificity | +| Validation | 0.202 | 0.337 | 0.999 | 0.765 | +| Testing | 0.139 | 0.245 | 0.732 | 0.696 | ::: diff --git a/papers/Alireza_Vaezi/myst.yml b/papers/Alireza_Vaezi/myst.yml index 99a858d56a..17e40926b1 100644 --- a/papers/Alireza_Vaezi/myst.yml +++ b/papers/Alireza_Vaezi/myst.yml @@ -1,50 +1,51 @@ version: 1 +extends: ../proceedings.yml project: + doi: 10.25080/HXCJ6205 # Update this to match `scipy-2024-` the folder should be `` id: scipy-2024-alireza-vaezi # Ensure your title is the same as in your `main.md` title: Training a Supervised Cilia Segmentation Model from Self-Supervision - subtitle: University Of Georgia + description: Understanding cilia behavior is essential in diagnosing and treating diseases caused by dysfunctional cilia, but the task of automatically analyzing cilia is often labor- and time-intensive.
In this work we overcome this bottleneck by developing a robust, self-supervised framework exploiting the visual similarity of normal and dysfunctional cilia. # Authors should have affiliations, emails and ORCIDs if available authors: - - name: Seyed Alireza Vaezi - email: sv22900@uga.edu - orcid: 0009-0000-2089-8362 - affiliations: - - University of Georgia - corresponding: true - - name: Shannon Quinn - email: spq@uga.edu - affiliations: - - University of Georgia + - name: Seyed Alireza Vaezi + email: sv22900@uga.edu + orcid: 0009-0000-2089-8362 + affiliations: + - name: University of Georgia + ror: https://ror.org/00te3t702 + corresponding: true + - name: Shannon Quinn + email: spq@uga.edu + affiliations: + - name: University of Georgia + ror: https://ror.org/00te3t702 keywords: - - Cilia - - Unsupervised biomedical Image Segmentation - - Optical Flow - - Autoregressive - - Deep Learning + - Cilia + - Unsupervised biomedical Image Segmentation + - Optical Flow + - Autoregressive + - Deep Learning # Add the abbreviations that you use in your paper here abbreviations: - MyST: Markedly Structured Text + OF: optical flow + PCD: primary ciliary dyskinesia + ZCA: Zero-phase PCA Sphering + SAE: Sparse Autoencoders + AR: autoregressive + FPN: Feature Pyramid Network # It is possible to explicitly ignore the `doi-exists` check for certain citation keys error_rules: - - rule: doi-exists - severity: ignore - keys: - - Atr03 - - terradesert - - jupyter - - sklearn1 - - sklearn2 - - Iakubovskii:2019 - - settles2009active - # A banner will be generated for you on publication, this is a placeholder - banner: banner.jpg - # The rest of the information shouldn't be modified - subject: Research Article - open_access: true - license: CC-BY-4.0 - venue: Scipy 2024 - date: 2024-07-10 + - rule: doi-exists + severity: ignore + keys: + - Atr03 + - terradesert + - jupyter + - sklearn1 + - sklearn2 + - Iakubovskii:2019 + - settles2009active site: template: article-theme diff --git a/papers/Arushi_Nath/banner.png b/papers/Arushi_Nath/banner.png index 23676bb677..0e5a60bd47 100644 Binary files a/papers/Arushi_Nath/banner.png and b/papers/Arushi_Nath/banner.png differ diff --git a/papers/Arushi_Nath/main.md b/papers/Arushi_Nath/main.md index 4c680a9b90..f1b4995ea2 100644 --- a/papers/Arushi_Nath/main.md +++ b/papers/Arushi_Nath/main.md @@ -2,43 +2,50 @@ # Ensure that this title is the same as the one in `myst.yml` title: Algorithms to Determine Asteroid’s Physical Properties using Sparse and Dense Photometry, Robotic Telescopes and Open Data abstract: | - The rapid pace of discovering asteroids due to advancements in detection techniques outpaces current abilities to analyze them comprehensively. Understanding an asteroid's physical properties is crucial for effective deflection strategies and improves our understanding of the solar system's formation and evolution. Dense photometry provides continuous time-series measurements valuable for determining an asteroid's rotation period, yet is limited to a singular phase angle. Conversely, sparse photometry offers non-continuous measurements across multiple phase angles, essential for determining an asteroid's absolute magnitude, albedo (reflectivity), and size. This paper presents open-source algorithms that integrate dense photometry from citizen scientists with sparse photometry from space and ground-based all-sky surveys to determine asteroids' albedo, size, rotation, strength, and composition. 
- Applying the algorithms to the Didymos binary asteroid, combined with data from GAIA, the Zwicky Transient Facility, and ATLAS photometric sky surveys, revealed Didymos to be 840 meters wide, with a 0.14 albedo, an 18.14 absolute magnitude, a 2.26-hour rotation period, rubble-pile strength, and an S-type composition. Didymos was the target of the 2022 NASA Double Asteroid Redirection Test (DART) mission. The algorithm successfully measured a 35-minute decrease in the mutual orbital period following the DART mission, equating to a 40-meter reduction in the mutual orbital radius, proving a successful deflection. Analysis of the broader asteroid population highlighted significant compositional diversity, with a predominance of carbonaceous (C-type) asteroids in the outer regions of the asteroid belt and siliceous (S-type) and metallic (M-type) asteroids more common in the inner regions. These findings provide insights into the diversity and distribution of asteroid compositions, reflecting the conditions and processes of the early solar system. + The rapid pace of discovering asteroids due to advancements in detection techniques outpaces current abilities to analyze them comprehensively. Understanding an asteroid's physical properties is crucial for effective deflection strategies and improves our understanding of the solar system's formation and evolution. Dense photometry provides continuous time-series measurements valuable for determining an asteroid's rotation period, yet is limited to a singular phase angle. Conversely, sparse photometry offers non-continuous measurements across multiple phase angles, essential for determining an asteroid's absolute magnitude, albedo (reflectivity), and size. This paper presents open-source algorithms that integrate dense photometry from citizen scientists with sparse photometry from space and ground-based all-sky surveys to determine asteroids' albedo, size, rotation, strength, and composition. + Applying the algorithms to the Didymos binary asteroid, combined with data from GAIA, the Zwicky Transient Facility, and ATLAS photometric sky surveys, revealed Didymos to be 840 meters wide, with a 0.14 albedo, an 18.14 absolute magnitude, a 2.26-hour rotation period, rubble-pile strength, and an S-type composition. Didymos was the target of the 2022 NASA Double Asteroid Redirection Test (DART) mission. The algorithm successfully measured a 35-minute decrease in the mutual orbital period following the DART mission, equating to a 40-meter reduction in the mutual orbital radius, proving a successful deflection. Analysis of the broader asteroid population highlighted significant compositional diversity, with a predominance of carbonaceous (C-type) asteroids in the outer regions of the asteroid belt and siliceous (S-type) and metallic (M-type) asteroids more common in the inner regions. These findings provide insights into the diversity and distribution of asteroid compositions, reflecting the conditions and processes of the early solar system. This work empowers citizen scientists to become planetary defenders, contributing significantly to planetary defense and enhancing our understanding of solar system composition and evolution. - --- ## Introduction ### Background + There are over 1.3 million known asteroids, and advanced detection techniques lead to the discovery of hundreds of new near-Earth and main-belt asteroids every month. Studying these asteroids provides valuable insights into the early solar system's formation and evolution. 
Phase curves, which illustrate the change in an asteroid's brightness as its phase angle (the angle between the observer, asteroid, and Sun) changes, are essential for asteroid characterization. Understanding near-Earth asteroids is crucial because it allows for the development of effective deflection strategies, which are vital for preventing potential collisions with Earth and safeguarding our planet from catastrophic impacts. ### Research Problem + Despite advancements in detection techniques, the need for observations spanning multiple years, limited telescope availability, and narrow observation windows hinder detailed characterization of asteroids. To date, phase curves have been generated for only a few thousand asteroids. This slow pace of analysis hinders our planetary defense capabilities for deflecting potentially hazardous asteroids and limits our understanding of the solar system's evolution. ### Related Work -Recent efforts in the field have focused on various approaches to combine dense and sparse photometric datasets. For instance, the Pan-STARRS survey has used sparse photometry to estimate the absolute magnitudes and rotation periods of asteroids, while the Asteroid Terrestrial-impact Last Alert System (ATLAS) provides sparse photometry data for many asteroids observed across different phase angles. Studies by Shevchenko et al. (2019) have explored methods to derive phase integrals and geometric albedos from sparse data. On the dense photometry side, projects like the Zwicky Transient Facility (ZTF) and Gaia Data Release 3 (DR3) have contributed extensive datasets valuable for continuous observations of asteroid brightness variations. However, these efforts often face challenges in data integration due to differing observational cadences, filters, and coverage. + +Recent efforts in the field have focused on various approaches to combine dense and sparse photometric datasets. For instance, the Pan-STARRS survey has used sparse photometry to estimate the absolute magnitudes and rotation periods of asteroids, while the Asteroid Terrestrial-impact Last Alert System (ATLAS) provides sparse photometry data for many asteroids observed across different phase angles. Studies by @Shevchenko2019 have explored methods to derive phase integrals and geometric albedos from sparse data. On the dense photometry side, projects like the Zwicky Transient Facility (ZTF) and Gaia Data Release 3 (DR3) have contributed extensive datasets valuable for continuous observations of asteroid brightness variations. However, these efforts often face challenges in data integration due to differing observational cadences, filters, and coverage. ### Research Objectives + This paper presents an innovative methodology, PhAst, developed using Python algorithms to combine dense photometry from citizen scientists with sparse photometry from space and ground-based all-sky surveys to determine key physical characteristics of asteroids. The specific objectives of this research are to: + 1. Develop Python algorithms to integrate serendipitous asteroid observations with citizen-contributed and open datasets. 2. Apply these algorithms to planetary defense tests, such as NASA's DART mission. 3. Characterize large populations of asteroids to infer compositional diversity in our solar system. ### Significance of the Study + This study offers significant contributions to the field of asteroid characterization and planetary defense. 
By integrating dense and sparse photometry, the PhAst algorithm provides a comprehensive method for determining the physical properties of asteroids. The open-source nature of the algorithm encourages collaboration and improvements from a global community of researchers and citizen scientists, enhancing its robustness and accelerating advancements in the field. Furthermore, the study empowers citizen scientists to actively participate in planetary defense, contributing valuable data and insights that enhance our understanding and preparedness for potential asteroid impacts. ### Application in Planetary Defense: NASA’s DART Mission + The NASA Double Asteroid Redirection Test (DART) mission was designed to test and validate methods to protect Earth from hazardous asteroid impacts by demonstrating the kinetic impactor technique. It involved sending a spacecraft to collide with an asteroid to change its trajectory. PhAst provides a detailed pre- and post-impact analysis of the target asteroid, Didymos, and its moonlet, Dimorphos. ## Methodology + ### Overview The PhAst algorithm integrates dense and sparse photometry data to determine the physical properties of asteroids. Dense photometry provides continuous time-series measurements, crucial for determining rotation periods, while sparse photometry offers non-continuous measurements across multiple phase angles, essential for absolute magnitude and size determination. Integrating both methods can overcome their individual limitations. ### Development of Novel Open-Source PhAst -PhAst integrates several years of sparse photometry from serendipitous asteroid observations with dense photometry from professional and citizen scientists. **See Figure 1.** The algorithm effectively combines continuous light data (dense photometry) and infrequent light data (sparse photometry) by creating phase curves whose linear components yield the asteroid’s geometric albedo and composition, while the non-linear brightness surge at small angles determines the absolute magnitude. This methodology allows for the creation of folded light curves to measure the asteroid’s rotation period and, for binary asteroids, their mutual orbital period. Being open-source, the PhAst algorithm allows for collaboration and improvements from a global community of researchers and citizen scientists, enhancing its robustness and accelerating advancements in asteroid characterization. +PhAst integrates several years of sparse photometry from serendipitous asteroid observations with dense photometry from professional and citizen scientists [@fig1]. The algorithm effectively combines continuous light data (dense photometry) and infrequent light data (sparse photometry) by creating phase curves whose linear components yield the asteroid’s geometric albedo and composition, while the non-linear brightness surge at small angles determines the absolute magnitude. This methodology allows for the creation of folded light curves to measure the asteroid’s rotation period and, for binary asteroids, their mutual orbital period. Being open-source, the PhAst algorithm allows for collaboration and improvements from a global community of researchers and citizen scientists, enhancing its robustness and accelerating advancements in asteroid characterization. ```{figure} figure1.png :name: fig1 @@ -48,14 +55,16 @@ Flowchart Showing Data Integration Process of PhAst ``` ### Data Sources and Integration + 1. 
**Primary Asteroid Observations Using Robotic Telescopes:** Observation proposals were submitted to Alnitak Observatory, American Association of Variable Star Observers, Burke Gaffney Observatory, and Faulkes Telescope. 2. **Citizen Scientist Observations:** Observations submitted by backyard astronomers from locations such as Chile and the USA. 3. **Serendipitous Asteroid Observations in Sky Surveys:** Data from European Space Agency Gaia Data Release 3 and Zwicky Transient Facility (ZTF) Survey. -4. **Secondary Asteroid Databases:** Data from the Asteroid Lightcurve Database (ALCDEF) and Asteroid Photometric Data Catalog (PDS) 3rd update. +4. **Secondary Asteroid Databases:** Data from the Asteroid Lightcurve Database (ALCDEF) @ALCDEF and Asteroid Photometric Data Catalog (PDS) 3rd update. For searching asteroids in the ZTF dataset, the FINKS portal was utilized, which allowed searching asteroids by their Minor Planet Center (MPC) number. Similarly, asteroids in the GAIA dataset were searched using the Solar System Objects database of Gaia DR3. ### Observational Process + 1. **Identify Known Stars and Asteroids:** Using the GAIA Star Catalog and HORIZONS Asteroid Catalog, known stars and asteroids are identified and centroided in images. This step ensures that the exact positions of celestial objects are accurately determined, which is crucial for subsequent analysis. 2. **Determine Optimal Aperture Size:** Differential photometry is used to calculate the asteroid's instrumental magnitude by determining the optimal aperture size that balances brightness measurement and noise. Too small an aperture may not capture the full brightness of the asteroid, while too large an aperture may include excessive background noise. 3. **Select Suitable Comparison Stars:** Comparison stars with stable brightness are selected to remove the effects of seeing conditions and determine the asteroid's computed magnitude. This step is important to ensure that variations in observed brightness are due to the asteroid itself and not due to atmospheric conditions or instrumental errors. @@ -64,108 +73,122 @@ For searching asteroids in the ZTF dataset, the FINKS portal was utilized, which 6. **Determine Rotation and Orbital Periods:** Composite light curves are used to find the asteroid's rotation period and, for binary asteroids, the mutual orbital period. This analysis reveals the dynamic characteristics of the asteroid, including its spin state and orbital interactions with companion bodies. ### Python Tools and Libraries + The development and implementation of PhAst heavily relied on various Python tools and libraries: + - **Python:** The primary programming language used for developing PhAst. - **NumPy:** Used for numerical computations and handling large datasets efficiently. - **Matplotlib:** Utilized for plotting phase curves and light curves, visualizing the data, and generating graphs for analysis. - **AstroPy:** Employed for astronomical calculations and handling astronomical data, such as coordinate transformations and time conversions. ## Case Study: Didymos Binary Asteroid + ### Initial Observations + The Didymos binary asteroid, targeted by NASA's 2022 Double Asteroid Redirection Test (DART) mission, was selected for a detailed case study. 
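Its brightness measurements follow the differential-photometry procedure outlined in the Observational Process above. The following minimal sketch illustrates that step under stated assumptions: it relies on photutils (which is not among the libraries listed above), and the aperture radius, source positions, and function names are illustrative placeholders rather than PhAst's actual implementation.

```python
import numpy as np
from photutils.aperture import CircularAperture, aperture_photometry

def instrumental_magnitude(image, xy, radius):
    """Sum the counts inside a circular aperture and convert them to an instrumental magnitude."""
    flux = aperture_photometry(image, CircularAperture([xy], r=radius))["aperture_sum"][0]
    return -2.5 * np.log10(flux)

def calibrated_magnitude(image, asteroid_xy, comparison_xy, comparison_catalog_mags, radius=6.0):
    """Differential photometry: shift the asteroid's instrumental magnitude by the mean
    offset between the comparison stars' catalog and instrumental magnitudes."""
    offsets = [
        catalog_mag - instrumental_magnitude(image, xy, radius)
        for xy, catalog_mag in zip(comparison_xy, comparison_catalog_mags)
    ]
    return instrumental_magnitude(image, asteroid_xy, radius) + np.mean(offsets)
```

In practice the aperture radius would be chosen by repeating the measurement over a range of radii and keeping the one that best balances captured flux against background noise, as described in step 2 above.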
Initial observations determined Didymos to be 840 meters wide, with a 0.14 albedo, an 18.14 absolute magnitude (a measure of its intrinsic brightness), a 2.26-hour rotation period, rubble-pile strength (indicating it is a loose collection of rocks held together by gravity), and an S-type composition (indicating it is made of stony or siliceous minerals). These properties were derived by applying the PhAst algorithm to a combination of dense and sparse photometric data. ### Impact Analysis + PhAst successfully measured a 35-minute decrease in the mutual orbital period following the DART mission's impact. External sources validated these findings, demonstrating the algorithm's accuracy and reliability. The change in the mutual orbital period provided critical data on the effectiveness of the DART mission in altering the asteroid's trajectory, a key goal of planetary defense strategies. ## Results -PhAst was used to generate phase curves for over 2100 asteroids in 100 hours on a home computer, including data-retrieval time. The physical properties of various target asteroids of space missions and understudied asteroids were determined, including targets of the NASA LUCY Mission, UAE Mission, binary asteroids, and understudied asteroids. **See figure 2.** The rapid analysis capability highlights PhAst's potential for large-scale asteroid characterization, enabling detailed studies of large populations of asteroids in a relatively short time. + +PhAst was used to generate phase curves for over 2100 asteroids in 100 hours on a home computer, including data-retrieval time. The physical properties of various target asteroids of space missions and understudied asteroids were determined, including targets of the NASA LUCY Mission, UAE Mission, binary asteroids, and understudied asteroids [@fig2]. The rapid analysis capability highlights PhAst's potential for large-scale asteroid characterization, enabling detailed studies of large populations of asteroids in a relatively short time. ```{figure} figure3.png :name: fig2 :align: center -Physical Properties of Target Asteroids of Space Missions and Understudied Asteroids Determined +Physical Properties of Target Asteroids of Space Missions and Understudied Asteroids Determined ``` + ### Determining Physical Properties of Target Asteroids of Space Missions and Understudied Asteroids + PhAst was used to generate phase curves and determine the physical properties of various target asteroids of space missions and understudied asteroids. The results include: #### NASA LUCY Mission Targets -The NASA LUCY mission aims to explore Trojan asteroids, which share Jupiter's orbit around the Sun. Understanding these asteroids can provide insights into the early solar system since Trojans are considered remnants of the primordial material that formed the outer planets. -- **3548 Eurybates:** - - Absolute Magnitude (H) = 9.75 ± 0.05 - - Slope Parameter (G) = 0.11 - - Albedo = 0.05 - * Relevance: Eurybates is the largest and presumably the most ancient member of the Eurybates family, offering a window into the conditions of the early solar system. +The NASA LUCY mission aims to explore Trojan asteroids, which share Jupiter's orbit around the Sun. Understanding these asteroids can provide insights into the early solar system since Trojans are considered remnants of the primordial material that formed the outer planets. 
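The absolute magnitudes (H) and slope parameters (G) reported below come from phase-curve fits of reduced magnitude against phase angle. A minimal sketch of such a fit with the standard IAU (H, G) phase function is given here as an illustration only; it is not PhAst's actual implementation, and the observation arrays are synthetic placeholders.

```python
import numpy as np
from scipy.optimize import curve_fit

def hg_model(alpha_deg, H, G):
    """IAU (H, G) phase function: reduced magnitude versus phase angle (Bowell et al. 1989)."""
    alpha = np.radians(alpha_deg)
    phi1 = np.exp(-3.33 * np.tan(alpha / 2.0) ** 0.63)
    phi2 = np.exp(-1.87 * np.tan(alpha / 2.0) ** 1.22)
    return H - 2.5 * np.log10((1.0 - G) * phi1 + G * phi2)

# Synthetic placeholder observations: phase angles (degrees) and reduced magnitudes.
alpha_obs = np.array([2.1, 5.4, 9.8, 14.3, 19.7, 24.5])
mag_obs = np.array([18.35, 18.55, 18.74, 18.91, 19.08, 19.24])

(H_fit, G_fit), _ = curve_fit(hg_model, alpha_obs, mag_obs, p0=(18.0, 0.15))

# Diameter (km) from absolute magnitude and an assumed geometric albedo p_V,
# using the standard relation D = 1329 / sqrt(p_V) * 10**(-H / 5).
p_V = 0.14
diameter_km = 1329.0 / np.sqrt(p_V) * 10 ** (-H_fit / 5.0)
print(f"H = {H_fit:.2f}, G = {G_fit:.2f}, D = {diameter_km:.2f} km")
```

The final two lines use the standard diameter-albedo-magnitude relation; plugging in the paper's values for Didymos (H = 18.14, albedo 0.14) gives roughly 0.84 km, consistent with the 840-meter width quoted above.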
-- **10253 Westerwald:** - - Absolute Magnitude (H) = 15.33 ± 0.05 - - Slope Parameter (G) = 0.17 - - Albedo = 0.21 - * Relevance: Westerwald's high albedo suggests it might be a fragment from a larger parent body, providing clues about collisional processes in the early solar system. +3548 Eurybates +: Absolute Magnitude (H) = 9.75 ± 0.05 +: Slope Parameter (G) = 0.11 +: Albedo = 0.05 +: Relevance: Eurybates is the largest and presumably the most ancient member of the Eurybates family, offering a window into the conditions of the early solar system. +10253 Westerwald +: Absolute Magnitude (H) = 15.33 ± 0.05 +: Slope Parameter (G) = 0.17 +: Albedo = 0.21 +: Relevance: Westerwald's high albedo suggests it might be a fragment from a larger parent body, providing clues about collisional processes in the early solar system. #### UAE Mission Targets -The UAE space mission to explore asteroids aims to study their composition, structure, and history, contributing to our understanding of asteroid formation and the evolution of the solar system. -- **269 Justitia:** - - Absolute Magnitude (H) = 9.93 ± 0.09 - - Slope Parameter (G) = 0.11 - - Albedo = 0.09 - * Relevance: Justitia's relatively low albedo indicates a carbonaceous composition, which can help researchers understand the distribution of organic materials in the solar system. +The UAE space mission to explore asteroids aims to study their composition, structure, and history, contributing to our understanding of asteroid formation and the evolution of the solar system. -- **15094 Polymele:** - - Absolute Magnitude (H) = 11.69 ± 0.07 - - Slope Parameter (G) = 0.18 - - Albedo = 0.05 - * Relevance: Polymele's properties suggest it is a primitive body, providing valuable information about the early solar system's building blocks. +269 Justitia +: Absolute Magnitude (H) = 9.93 ± 0.09 +: Slope Parameter (G) = 0.11 +: Albedo = 0.09 +: Relevance: Justitia's relatively low albedo indicates a carbonaceous composition, which can help researchers understand the distribution of organic materials in the solar system. +15094 Polymele +: Absolute Magnitude (H) = 11.69 ± 0.07 +: Slope Parameter (G) = 0.18 +: Albedo = 0.05 +: Relevance: Polymele's properties suggest it is a primitive body, providing valuable information about the early solar system's building blocks. #### Binary Asteroids + Understanding binary asteroids, where two asteroids orbit each other, can offer insights into the formation and evolutionary history of these systems. The mutual orbital period and other physical properties provide data on their dynamics and interactions. -- **3378 Susanvictoria:** - - Absolute Magnitude (H) = 13.83 ± 0.05 - - Slope Parameter (G) = 0.27 - - Albedo = 0.19 - * Relevance: Studying binary systems like Susanvictoria helps in understanding the processes that lead to the formation of binary asteroids and their subsequent evolution. +3378 Susanvictoria +: Absolute Magnitude (H) = 13.83 ± 0.05 +: Slope Parameter (G) = 0.27 +: Albedo = 0.19 +: Relevance: Studying binary systems like Susanvictoria helps in understanding the processes that lead to the formation of binary asteroids and their subsequent evolution. -- **2825 Crosby:** - - Absolute Magnitude (H) = 13.33 ± 0.06 - - Slope Parameter (G) = 0.11 - - Albedo = 0.07 - * Relevance: Crosby's characteristics can provide insights into the collisional history and mechanical properties of binary asteroid systems. -The physical properties of the binary asteroids were submitted to the binary asteroid working group. 
+2825 Crosby +: Absolute Magnitude (H) = 13.33 ± 0.06 +: Slope Parameter (G) = 0.11 +: Albedo = 0.07 +: Relevance: Crosby's characteristics can provide insights into the collisional history and mechanical properties of binary asteroid systems. The physical properties of the binary asteroids were submitted to the binary asteroid working group. #### Understudied Asteroids -Characterizing understudied asteroids expands our knowledge of the diversity and distribution of asteroid properties in the solar system. -- **2006 MG13:** - - Absolute Magnitude (H) = 15.94 ± 0.08 - - Slope Parameter (G) = 0.21 - - Albedo = 0.19 - * Relevance: Detailed study of asteroids like 2006 MG13 helps fill gaps in our understanding of the physical and compositional diversity of asteroids. +Characterizing understudied asteroids expands our knowledge of the diversity and distribution of asteroid properties in the solar system. -- **2007 AD11:** - - Absolute Magnitude (H) = 15.76 ± 0.11 - - Slope Parameter (G) = 0.13 - - Albedo = 0.13 - * Relevance: Investigating such understudied bodies contributes to a more complete picture of asteroid population characteristics and their evolutionary paths. +2006 MG13 +: Absolute Magnitude (H) = 15.94 ± 0.08 +: Slope Parameter (G) = 0.21 +: Albedo = 0.19 +: Relevance: Detailed study of asteroids like 2006 MG13 helps fill gaps in our understanding of the physical and compositional diversity of asteroids. +2007 AD11 +: Absolute Magnitude (H) = 15.76 ± 0.11 +: Slope Parameter (G) = 0.13 +: Albedo = 0.13 +: Relevance: Investigating such understudied bodies contributes to a more complete picture of asteroid population characteristics and their evolutionary paths. ## Discussions + ### Determining the Success of Asteroid Deflection + The success of the DART mission was evaluated by analyzing the change in the orbital path of Dimorphos, the moonlet of Didymos, after deflection. Applying Kepler's Third Law, the pre-impact orbital period of 11.91 hours and post-impact orbital period of 11.34 hours were used to calculate an orbital radius change of 0.04 km. This change confirms the effectiveness of the DART mission in altering the asteroid's trajectory, a crucial component of planetary defense. ### Determining Asteroid Strength -Asteroid strength can be inferred from the rotation period. This inference is based on the fact that an asteroid's structural integrity must be sufficient to withstand the centrifugal forces generated by its rotation. If the rotation period is less than 2.2 hours, the asteroid must be a strength-bound single rock; otherwise, it would fly apart due to centrifugal forces exceeding the gravitational binding forces. This criterion is supported by studies such as those by Pravec and Harris (2000), who observed that most asteroids with rotation periods shorter than 2.2 hours are smaller than 150 meters and are likely monolithic. For larger asteroids, the rubble-pile structure is held together by self-gravity rather than cohesive forces, making them prone to disaggregation at faster rotation rates. This information is vital for assessing the structural integrity of asteroids and planning deflection missions. + +Asteroid strength can be inferred from the rotation period. This inference is based on the fact that an asteroid's structural integrity must be sufficient to withstand the centrifugal forces generated by its rotation. 
If the rotation period is less than 2.2 hours, the asteroid must be a strength-bound single rock; otherwise, it would fly apart due to centrifugal forces exceeding the gravitational binding forces. This criterion is supported by studies such as those by @Pravec2000, who observed that most asteroids with rotation periods shorter than 2.2 hours are smaller than 150 meters and are likely monolithic. For larger asteroids, the rubble-pile structure is held together by self-gravity rather than cohesive forces, making them prone to disaggregation at faster rotation rates. This information is vital for assessing the structural integrity of asteroids and planning deflection missions. ### Determining Asteroid Taxonomy + Asteroid taxonomy (chemical composition) can be determined from geometric albedo. C-type asteroids have lower albedo, S-type and M-type asteroids have moderate albedo, and rare E-type asteroids have the highest albedo. (S-type asteroids are made of stony or siliceous minerals, while C-type and M-type refer to carbonaceous and metallic compositions, respectively.) The taxonomic distribution provides insights into the conditions of the early solar system based on the spatial distribution of asteroid types. Understanding these compositions helps in determining the origins and evolutionary history of these asteroids. ### Early Solar System Conditions -The taxonomical distributions of carbonaceous, siliceous, and metallic asteroids in the main belt were compiled. Over 58% of the asteroids characterized by PhAst are carbonaceous, showing they are the most abundant type in our Solar System. Their abundance increases with distance from the Sun, reaching nearly 75% in the outer region of the main belt compared to over 45% in the inner region. **See figure 3.** This finding is consistent with previous research in the field, such as studies by DeMeo and Carry (2014), which indicate that carbonaceous asteroids are prevalent in the outer asteroid belt. + +The taxonomical distributions of carbonaceous, siliceous, and metallic asteroids in the main belt were compiled. Over 58% of the asteroids characterized by PhAst are carbonaceous, showing they are the most abundant type in our Solar System. Their abundance increases with distance from the Sun, reaching nearly 75% in the outer region of the main belt compared to over 45% in the inner region [@fig3]. This finding is consistent with previous research in the field, such as studies by @DeMeo2014, which indicate that carbonaceous asteroids are prevalent in the outer asteroid belt. Characterizing asteroid populations helps us better understand the diversity of compositions in the solar system by providing a detailed inventory of the different types of asteroids and their distribution. This information is crucial for several reasons: + - **Formation Conditions:** Different types of asteroids formed under varying conditions in the early solar system. For example, carbonaceous (C-type) asteroids, which are rich in organic compounds, are more prevalent in the outer regions of the asteroid belt, suggesting formation in cooler, volatile-rich environments. In contrast, siliceous (S-type) and metallic (M-type) asteroids are more common in the inner regions, indicating formation in hotter, more metal-rich conditions. - **Evolutionary Processes:** By studying the physical and chemical properties of asteroids, we can infer the processes that have shaped their evolution. 
This includes understanding how collisions, thermal processes, and space weathering have affected their surfaces and internal structures. @@ -177,46 +200,39 @@ Spatial Distribution of Asteroid Types ``` ### Errors and Limitations + Photometry was performed on images with a Signal-to-Noise Ratio (SNR) > 100, yielding a measurement uncertainty of 0.01. The average error in phase curve fitting was 0.10. Limited processing power restricted the preciseness of the best fit for rotation and mutual orbital periods to two significant digits. These limitations highlight the need for more powerful computational resources and more precise observational data to improve the accuracy of asteroid characterization. ## Conclusions + PhAst represents a significant advancement in asteroid characterization, combining dense and sparse photometry to yield comprehensive insights into asteroid properties. The successful application of PhAst to the Didymos binary asteroid and over 2100 other asteroids demonstrates its potential for large-scale use. By engaging citizen scientists, we can accelerate asteroid analysis and enhance our planetary defense strategies. ## Future Work -PhAst will serve as a powerful tool for accelerating the analysis of data produced by the Legacy Survey of Space and Time (LSST), set to begin in 2025. Over a decade, LSST aims to observe over 5 million asteroids across various filters, generating a nightly data volume of 20TB. The specific benefits and new opportunities that PhAst's applications might bring include: -- **Enhanced Planetary Defense:** By rapidly characterizing large populations of asteroids, including potentially hazardous asteroids (PHAs), PhAst can provide detailed analysis that are crucial for developing effective deflection strategies, thereby enhancing planetary defense capabilities. -- **Comprehensive Asteroid Mapping:** The integration of dense and sparse photometry allows for the creation of more accurate and comprehensive maps of asteroid distributions and compositions in the solar system. This can provide valuable insights into the formation and evolution of the solar system, aiding both scientific research and educational initiatives. -- **Resource Identification and Utilization:** PhAst's ability to determine the physical and compositional properties of asteroids can aid in identifying asteroids rich in valuable minerals or water. This opens up new opportunities for asteroid mining and resource utilization, which could support long-term space exploration and the development of space infrastructure. -- **Support for Future Space Missions:** PhAst can be used to provide detailed pre and post mission characterization of target asteroids for upcoming space missions including NASA’s OSIRIS-APEX which will fly-by near-Earth asteroid Apophis on April 23, 2029, JAXA’s Hayabusa2 SHARP to explore two asteroids, 2001 CC21 and 1998 KY26, and China’s first kinetic impact deflection test mission would target the near-Earth asteroid 2015 XF261 with a launch in 2027. -- **Exoplanetary Atmosphere Characterization:** PhAst can be expanded to exoplanetary atmosphere characterization by adapting its methodology to analyze the light curves from transiting exoplanets in multiple filters. This expansion would allow researchers to study the atmospheres of distant planets, providing insights into their composition, climate, and potential habitability. 
-- **Citizen Science and Public Engagement:** By making PhAst open-source and developing training modules for citizen scientists, the project promotes public engagement in scientific research. This democratization of science enables a wider community to contribute to and benefit from cutting-edge research, fostering a culture of curiosity and collaboration. -## Project Impact -The PhAst algorithm has been made open-source, and training modules have been developed for citizen scientists. These modules, created using Jupyter Notebooks, are designed for use by high school students and citizen scientists to support their engagement in asteroid characterization and planetary defense. Training on using open data for asteroid categorization has been provided to over 1,500 students during "Space Day" and "Asteroid Day" events in collaboration with observatories and community organizations such as Royal Astronomical Society of Canada. See link to Github: https://github.com/Spacegirl123/Asteroid-Characterization-By-PhAst - -## Acknowledgments -The development and application of PhAst have been possible thanks to contributions from numerous observatories, citizen scientists, and research institutions. Special thanks to the teams behind GAIA, Zwicky Transient Facility (ZTF), ATLAS, and other photometric surveys for providing the data that made this research possible. I also acknowledge the support of various citizen science communities and educational organizations for their collaboration and participation. - -## References - -[1] Center for Near-Earth Object Studies. Total number of asteroids discovered monthly. Retrieved from https://cneos.jpl.nasa.gov/stats/totals.html +PhAst will serve as a powerful tool for accelerating the analysis of data produced by the Legacy Survey of Space and Time (LSST), set to begin in 2025. Over a decade, LSST aims to observe over 5 million asteroids across various filters, generating a nightly data volume of 20TB. The specific benefits and new opportunities that PhAst's applications might bring include: -[2] NASA/Johns Hopkins University Applied Physics Laboratory. (2022, March). NASA's first planetary defense technology demonstration to collide with asteroid in 2022. https://www.nasa.gov/feature/nasa-s-first-planetary-defense-technology-demonstration-to-collide-with-asteroid-in-2022 +Enhanced Planetary Defense +: By rapidly characterizing large populations of asteroids, including potentially hazardous asteroids (PHAs), PhAst can provide detailed analysis that are crucial for developing effective deflection strategies, thereby enhancing planetary defense capabilities. -[3] Shevchenko, V. G., et al. (2019). Phase integral of asteroids. Astronomy & Astrophysics, 626(A87). https://doi.org/10.1051/0004-6361/201935588 +Comprehensive Asteroid Mapping +: The integration of dense and sparse photometry allows for the creation of more accurate and comprehensive maps of asteroid distributions and compositions in the solar system. This can provide valuable insights into the formation and evolution of the solar system, aiding both scientific research and educational initiatives. -[4] Talbert, T. (2022, October 11). NASA DART imagery shows changed orbit of target asteroid. NASA. https://www.nasa.gov/solar-system/nasa-dart-imagery-shows-changed-orbit-of-target-asteroid/ +Resource Identification and Utilization +: PhAst's ability to determine the physical and compositional properties of asteroids can aid in identifying asteroids rich in valuable minerals or water. 
This opens up new opportunities for asteroid mining and resource utilization, which could support long-term space exploration and the development of space infrastructure. -[5] Jet Propulsion Laboratory. (n.d.). Small-Body Database Lookup. https://ssd.jpl.nasa.gov/tools/sbdb_lookup.html#/?sstr=65803 +Support for Future Space Missions +: PhAst can be used to provide detailed pre- and post-mission characterization of target asteroids for upcoming space missions, including NASA’s OSIRIS-APEX, which will fly by the near-Earth asteroid Apophis on April 23, 2029; JAXA’s Hayabusa2 SHARP, which will explore two asteroids, 2001 CC21 and 1998 KY26; and China’s first kinetic impact deflection test mission, which is planned to launch in 2027 and target the near-Earth asteroid 2015 XF261. -[6] Fink Broker. (n.d.). ZTF Minor Planet Photometric Data Release. https://fink-portal.org/ +Exoplanetary Atmosphere Characterization +: PhAst can be expanded to exoplanetary atmosphere characterization by adapting its methodology to analyze the light curves from transiting exoplanets in multiple filters. This expansion would allow researchers to study the atmospheres of distant planets, providing insights into their composition, climate, and potential habitability. -[7] European Space Agency. (n.d.). Gaia Data Release 3. https://www.cosmos.esa.int/web/gaia/dr3 +Citizen Science and Public Engagement +: By making PhAst open-source and developing training modules for citizen scientists, the project promotes public engagement in scientific research. This democratization of science enables a wider community to contribute to and benefit from cutting-edge research, fostering a culture of curiosity and collaboration. -[8] ALCDEF. (n.d.). Asteroid Lightcurve Photometry Database. https://alcdef.org/ +## Project Impact -[9] Planetary Science Institute. (n.d.). Asteroid Photometric Catalog (APC) "Third Update." https://sbn.psi.edu/pds/resource/apc.html +The PhAst algorithm has been made open-source, and training modules have been developed for citizen scientists. These modules, created using Jupyter Notebooks, are designed for use by high school students and citizen scientists to support their engagement in asteroid characterization and planetary defense. Training on using open data for asteroid categorization has been provided to over 1,500 students during "Space Day" and "Asteroid Day" events in collaboration with observatories and community organizations such as the Royal Astronomical Society of Canada. See the GitHub repository: https://github.com/Spacegirl123/Asteroid-Characterization-By-PhAst -[10] Pravec, P., & Harris, A. W. (2000). Fast and Slow Rotation of Asteroids. Icarus, 148(1), 12-20. https://doi.org/10.1006/icar.2000.6482 +## Acknowledgments -[11] DeMeo, F. E., & Carry, B. (2014). Solar System evolution from compositional mapping of the asteroid belt. Nature, 505(7485), 629-634. https://doi.org/10.1038/nature12908 +The development and application of PhAst have been possible thanks to contributions from numerous observatories, citizen scientists, and research institutions. Special thanks to the teams behind Gaia, Zwicky Transient Facility (ZTF), ATLAS, and other photometric surveys for providing the data that made this research possible. I also acknowledge the support of various citizen science communities and educational organizations for their collaboration and participation.
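As a quick, illustrative cross-check of the measurement uncertainty quoted in the Errors and Limitations section (this sketch is ours, not part of the PhAst pipeline, and assumes only the standard first-order relation between flux SNR and magnitude error):

```python
import math

# Approximate photometric magnitude error from the signal-to-noise ratio:
# sigma_m ~= 2.5 / ln(10) / SNR (first-order propagation of the flux error)
def magnitude_uncertainty(snr: float) -> float:
    return 2.5 / math.log(10) / snr

# At the SNR > 100 threshold used for the photometry, this gives ~0.011 mag,
# consistent with the stated measurement uncertainty of 0.01.
print(round(magnitude_uncertainty(100), 3))
```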
diff --git a/papers/Arushi_Nath/mybib.bib b/papers/Arushi_Nath/mybib.bib index 9c1fa0275a..dbe5640eb5 100644 --- a/papers/Arushi_Nath/mybib.bib +++ b/papers/Arushi_Nath/mybib.bib @@ -1,130 +1,93 @@ -# Feel free to delete these first few references, which are specific to the template: - -@book{hume48, - author = "David Hume", - year = "1748", - title = "An enquiry concerning human understanding", - address = "Indianapolis, IN", - publisher = "Hackett", - doi = {https://doi.org/10.1017/CBO9780511808432}, +@misc{CNEOS_Totals, + author = {{Center for Near-Earth Object Studies}}, + title = {Total number of asteroids discovered monthly}, + howpublished = {\url{https://cneos.jpl.nasa.gov/stats/totals.html}}, + note = {Accessed: October 1, 2024} } -@article{Atr03, - author = "P Atreides", - year = "2003", - title = "How to catch a sandworm", - journal = "Transactions on Terraforming", - volume = 21, - issue = 3, - pages = {261-300} +@misc{NASA_DART2022, + author = {{NASA/Johns Hopkins University Applied Physics Laboratory}}, + title = {NASA's first planetary defense technology demonstration to collide with asteroid in 2022}, + month = {March}, + year = {2022}, + howpublished = {\url{https://www.nasa.gov/feature/nasa-s-first-planetary-defense-technology-demonstration-to-collide-with-asteroid-in-2022}}, + note = {Accessed: October 1, 2024} } -@misc{terradesert, - author = {{TerraDesert Team}}, - title = {Code for terraforming a desert}, - year = {2000}, - url = {https://terradesert.com/code/}, - note = {Accessed 1 Jan. 2000} +@article{Shevchenko2019, + author = {Shevchenko, V. G. and Tedesco, E. F. and Kovalchuk, L. O. and Fiacconi, A. M. and Zubarev, V. A.}, + title = {Phase integral of asteroids}, + journal = {Astronomy \& Astrophysics}, + volume = {626}, + pages = {A87}, + year = {2019}, + doi = {10.1051/0004-6361/201935588} } -# These references may be helpful: - -@inproceedings{jupyter, - abstract = {It is increasingly necessary for researchers in all fields to write computer code, and in order to reproduce research results, it is important that this code is published. We present Jupyter notebooks, a document format for publishing code, results and explanations in a form that is both readable and executable. 
We discuss various tools and use cases for notebook documents.}, - author = {Kluyver, Thomas and Ragan-Kelley, Benjamin and Pérez, Fernando and Granger, Brian and Bussonnier, Matthias and Frederic, Jonathan and Kelley, Kyle and Hamrick, Jessica and Grout, Jason and Corlay, Sylvain and Ivanov, Paul and Avila, Damián and Abdalla, Safia and Willing, Carol and {Jupyter development team}}, - editor = {Loizides, Fernando and Scmidt, Birgit}, - location = {Netherlands}, - publisher = {IOS Press}, - url = {https://eprints.soton.ac.uk/403913/}, - booktitle = {Positioning and Power in Academic Publishing: Players, Agents and Agendas}, - year = {2016}, - pages = {87--90}, - title = {Jupyter Notebooks - a publishing format for reproducible computational workflows}, +@misc{Talbert2022, + author = {Talbert, Tricia}, + title = {NASA DART imagery shows changed orbit of target asteroid}, + howpublished = {NASA}, + month = {October}, + day = {11}, + year = {2022}, + url = {https://www.nasa.gov/solar-system/nasa-dart-imagery-shows-changed-orbit-of-target-asteroid/}, + note = {Accessed: October 1, 2024} } -@article{matplotlib, - abstract = {Matplotlib is a 2D graphics package used for Python for application development, interactive scripting, and publication-quality image generation across user interfaces and operating systems.}, - author = {Hunter, J. D.}, - publisher = {IEEE COMPUTER SOC}, - year = {2007}, - doi = {https://doi.org/10.1109/MCSE.2007.55}, - journal = {Computing in Science \& Engineering}, - number = {3}, - pages = {90--95}, - title = {Matplotlib: A 2D graphics environment}, - volume = {9}, +@misc{JPL_SBDB, + author = {{Jet Propulsion Laboratory}}, + title = {Small-Body Database Lookup}, + howpublished = {\url{https://ssd.jpl.nasa.gov/tools/sbdb_lookup.html\#/?sstr=65803}}, + note = {Accessed: October 1, 2024} } -@article{numpy, - author = {Harris, Charles R. and Millman, K. Jarrod and van der Walt, Stéfan J. and Gommers, Ralf and Virtanen, Pauli and Cournapeau, David and Wieser, Eric and Taylor, Julian and Berg, Sebastian and Smith, Nathaniel J. and Kern, Robert and Picus, Matti and Hoyer, Stephan and van Kerkwijk, Marten H. 
and Brett, Matthew and Haldane, Allan and del Río, Jaime Fernández and Wiebe, Mark and Peterson, Pearu and Gérard-Marchant, Pierre and Sheppard, Kevin and Reddy, Tyler and Weckesser, Warren and Abbasi, Hameer and Gohlke, Christoph and Oliphant, Travis E.}, - publisher = {Springer Science and Business Media {LLC}}, - doi = {https://doi.org/10.1038/s41586-020-2649-2}, - date = {2020-09}, - year = {2020}, - journal = {Nature}, - number = {7825}, - pages = {357--362}, - title = {Array programming with {NumPy}}, - volume = {585}, +@misc{FinkBroker, + author = {{Fink Broker}}, + title = {ZTF Minor Planet Photometric Data Release}, + howpublished = {\url{https://fink-portal.org/}}, + note = {Accessed: October 1, 2024} } -@misc{pandas1, - author = {{The Pandas Development Team}}, - title = {pandas-dev/pandas: Pandas}, - month = feb, - year = {2020}, - publisher = {Zenodo}, - version = {latest}, - url = {https://doi.org/10.5281/zenodo.3509134}, +@misc{ESA_Gaia_DR3, + author = {{European Space Agency}}, + title = {Gaia Data Release 3}, + howpublished = {\url{https://www.cosmos.esa.int/web/gaia/dr3}}, + note = {Accessed: October 1, 2024} } -@inproceedings{pandas2, - author = {Wes McKinney}, - title = {{D}ata {S}tructures for {S}tatistical {C}omputing in {P}ython}, - booktitle = {{P}roceedings of the 9th {P}ython in {S}cience {C}onference}, - pages = {56 - 61}, - year = {2010}, - editor = {{S}t\'efan van der {W}alt and {J}arrod {M}illman}, - doi = {https://doi.org/10.25080/Majora-92bf1922-00a}, +@misc{ALCDEF, + author = {{ALCDEF}}, + title = {Asteroid Lightcurve Photometry Database}, + howpublished = {\url{https://alcdef.org/}}, + note = {Accessed: October 1, 2024} } -@article{scipy, - author = {Virtanen, Pauli and Gommers, Ralf and Oliphant, Travis E. and - Haberland, Matt and Reddy, Tyler and Cournapeau, David and - Burovski, Evgeni and Peterson, Pearu and Weckesser, Warren and - Bright, Jonathan and {van der Walt}, St{\'e}fan J. and - Brett, Matthew and Wilson, Joshua and Millman, K. Jarrod and - Mayorov, Nikolay and Nelson, Andrew R. J. and Jones, Eric and - Kern, Robert and Larson, Eric and Carey, C J and - Polat, {\.I}lhan and Feng, Yu and Moore, Eric W. and - {VanderPlas}, Jake and Laxalde, Denis and Perktold, Josef and - Cimrman, Robert and Henriksen, Ian and Quintero, E. A. and - Harris, Charles R. and Archibald, Anne M. and - Ribeiro, Ant{\^o}nio H. and Pedregosa, Fabian and - {van Mulbregt}, Paul and {SciPy 1.0 Contributors}}, - title = {{{SciPy} 1.0: Fundamental Algorithms for Scientific - Computing in Python}}, - journal = {Nature Methods}, - year = {2020}, - volume = {17}, - pages = {261--272}, - adsurl = {https://rdcu.be/b08Wh}, - doi = {https://doi.org/10.1038/s41592-019-0686-2}, +@misc{PSI_APC, + author = {{Planetary Science Institute}}, + title = {Asteroid Photometric Catalog (APC) "Third Update"}, + howpublished = {\url{https://sbn.psi.edu/pds/resource/apc.html}}, + note = {Accessed: October 1, 2024} } -@article{sklearn1, - author = {Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V. and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P. and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.}, - year = {2011}, - journal = {Journal of Machine Learning Research}, - pages = {2825--2830}, - title = {Scikit-learn: Machine Learning in {P}ython}, - volume = {12}, +@article{Pravec2000, + author = {Pravec, P. and Harris, A. 
W.}, + title = {Fast and Slow Rotation of Asteroids}, + journal = {Icarus}, + volume = {148}, + number = {1}, + pages = {12--20}, + year = {2000}, + doi = {10.1006/icar.2000.6482} } -@inproceedings{sklearn2, - author = {Buitinck, Lars and Louppe, Gilles and Blondel, Mathieu and Pedregosa, Fabian and Mueller, Andreas and Grisel, Olivier and Niculae, Vlad and Prettenhofer, Peter and Gramfort, Alexandre and Grobler, Jaques and Layton, Robert and VanderPlas, Jake and Joly, Arnaud and Holt, Brian and Varoquaux, Gaël}, - booktitle = {ECML PKDD Workshop: Languages for Data Mining and Machine Learning}, - year = {2013}, - pages = {108--122}, - title = {{API} design for machine learning software: experiences from the scikit-learn project}, +@article{DeMeo2014, + author = {DeMeo, F. E. and Carry, B.}, + title = {Solar System evolution from compositional mapping of the asteroid belt}, + journal = {Nature}, + volume = {505}, + number = {7485}, + pages = {629--634}, + year = {2014}, + doi = {10.1038/nature12908} } diff --git a/papers/Arushi_Nath/myst.yml b/papers/Arushi_Nath/myst.yml index 4e6e742d97..61ade50344 100644 --- a/papers/Arushi_Nath/myst.yml +++ b/papers/Arushi_Nath/myst.yml @@ -1,5 +1,7 @@ version: 1 +extends: ../proceedings.yml project: + doi: 10.25080/TWCF2755 # Update this to match `scipy-2024-` the folder should be `` id: scipy-2024-Arushi_Nath # Ensure your title is the same as in your `main.md` @@ -17,7 +19,18 @@ project: - Open Data - Citizen Scientists - Asteroid Characterization - + abbreviations: + ATLAS: Asteroid Terrestrial-impact Last Alert System + DART: Double Asteroid Redirection Test + ZTF: Zwicky Transient Facility + LSST: Legacy Survey of Space and Time + ALCDEF: Asteroid Lightcurve Database + C-type: carbonaceous + S-type: siliceous + M-type: metallic + PDS: Photometric Data Catalog + MPC: Minor Planet Center + SNR: Signal-to-Noise Ratio # It is possible to explicitly ignore the `doi-exists` check for certain citation keys error_rules: - rule: doi-exists @@ -28,13 +41,5 @@ project: - jupyter - sklearn1 - sklearn2 - # A banner will be generated for you on publication, this is a placeholder - banner: banner.png - # The rest of the information shouldn't be modified - subject: Research Article - open_access: true - license: CC-BY-4.0 - venue: Scipy 2024 - date: 2024-07-10 site: template: article-theme diff --git a/papers/Arushi_Nath/thumbnail.png b/papers/Arushi_Nath/thumbnail.png new file mode 100644 index 0000000000..3b98895060 Binary files /dev/null and b/papers/Arushi_Nath/thumbnail.png differ diff --git a/papers/Gagnon_Kebe_Tahiri/banner.png b/papers/Gagnon_Kebe_Tahiri/banner.png index c5dd028e26..c02cb04903 100644 Binary files a/papers/Gagnon_Kebe_Tahiri/banner.png and b/papers/Gagnon_Kebe_Tahiri/banner.png differ diff --git a/papers/Gagnon_Kebe_Tahiri/main.tex b/papers/Gagnon_Kebe_Tahiri/main.tex index c50c767999..bb06942f7c 100644 --- a/papers/Gagnon_Kebe_Tahiri/main.tex +++ b/papers/Gagnon_Kebe_Tahiri/main.tex @@ -1,21 +1,21 @@ \begin{abstract} Cumacea (crustaceans: Peracarida) are vital indicators of benthic health in marine ecosystems. This study investigated the influence of environmental (i.e., biological or ecosystemic), climatic (i.e., meteorological or atmospheric), and geographic (i.e., spatial or regional) attributes on their genetic variability in the Northern North Atlantic, focusing on Icelandic waters. We analyzed mitochondrial sequences of the 16S rRNA gene from 62 Cumacea specimens. 
Using the \textit{aPhyloGeo} software, we compared these sequences with relevant parameters such as latitude (decimal degree) at the start of sampling, wind speed (m/s) at the start of sampling, O\textsubscript{2} concentration (mg/L), and depth (m) at the start of sampling. -Our analyses revealed variability in most spatial and biological attributes, reflecting the diversity of ecological requirements and benthic habitats. The most common Cumacea families, Diastylidae and Leuconidae, suggest adaptations to various marine environments. Phylogeographic analysis showed a divergence between specific genetic sequences and two habitat attributes: wind speed (m/s) at the start of sampling and O\tsubscript{2} concentration (mg/L). This indicates potential local adaptation to these fluctuating conditions. +Our analyses revealed variability in most spatial and biological attributes, reflecting the diversity of ecological requirements and benthic habitats. The most common Cumacea families, Diastylidae and Leuconidae, suggest adaptations to various marine environments. Phylogeographic analysis showed a divergence between specific genetic sequences and two habitat attributes: wind speed (m/s) at the start of sampling and O\textsubscript{2} concentration (mg/L). This indicates potential local adaptation to these fluctuating conditions. -These results reinforce the importance of further research into the relationship between Cumacea genetics and global environmental factors. Understanding these relationships is essential for interpreting the evolutionary dynamics and adaptation of deep-sea Cumacea. This study sheds much-needed light on invertebrate acclimatization to climate change, anthropomorphic pressures, and deep-water habitat management. It can contribute to the evolution of more efficient conservation strategies and inform policies that protect vulnerable marine ecosystems. +These results reinforce the importance of further research into the relationship between Cumacea genetics and global environmental factors. Understanding these relationships is essential for interpreting the evolutionary dynamics and adaptation of deep-sea Cumacea. This study sheds much-needed light on invertebrate acclimatization to climate change, anthropogenic pressures, and deep-water habitat management. It can contribute to the development of more efficient conservation strategies and inform policies that protect vulnerable marine ecosystems. The \textit{aPhyloGeo} Python package is freely and publicly available on \href{https://github.com/tahiri-lab/aPhyloGeo}{GitHub} and \href{https://pypi.org/project/aphylogeo/}{PyPI}, providing an invaluable tool for future research. \end{abstract} \section{Introduction}\label{introduction} -The North Atlantic and Subarctic regions, particularly the Icelandic waters, are of ecological interest due to their diverse water masses and unique oceanographic features \citep{schnurr_composition_2014, meisner_benthic_2014, uhlir_adding_2021}. These areas form vital {benthic habitats}\footnote{These are areas on the bottom of the oceans or lakes, including sediments and organisms that live in them.} \citep{levin2009ecological} and enhance our understanding of deep-sea ecosystems and biodiversity patterns \citep{rogers2007corals, danovaro2008exponential, uhlir_adding_2021}.
The IceAGE project and its predecessors, BIOFAR and BIOICE, provide invaluable data for studying the impacts of climate change and seabed mining, especially in the Greenland, Iceland, and Norwegian (GIN) seas \citep{meisner_prefacebiodiversity_2018}. +The North Atlantic and Subarctic regions, particularly the Icelandic waters, are of ecological interest due to their diverse water masses and unique oceanographic features \citep{schnurr_composition_2014, meisner_benthic_2014, uhlir_adding_2021}. These areas form vital {benthic habitats}\footnote{These are areas on the bottom of the oceans or lakes, including sediments and organisms that live in them.} \citep{levin2009ecological} and enhance our understanding of deep-sea ecosystems and biodiversity patterns \citep{rogers2007corals, danovaro2008exponential, uhlir_adding_2021}. The IceAGE project and its predecessors, BIOFAR and BIOICE, provide invaluable data for studying the impacts of climate change and seabed mining, especially in the Greenland, Iceland, and Norwegian (GIN) seas \citep{meisner_prefacebiodiversity_2018}. -Cumacea, a crustacean taxon within Peracarida, provide major indicators of marine ecosystem health due to their sensitivity to environmental fluctuations \citep{stransky_diversity_2010} and their contribution to benthic food webs \citep{rehm2009cumacea}. Despite their ecological importance, deep-sea benthic invertebrates’ evolutionary history remains uncharted, notably in the North Atlantic \citep{jennings_phylogeographic_2014}. Interpreting these deep-sea organisms' genetic distribution and demography is central for predicting their response to climate change and anthropogenic pressures, such as seabed mining \citep{jennings_phylogeographic_2014, meisner_prefacebiodiversity_2018}. +Cumacea, a crustacean taxon within Peracarida, provide major indicators of marine ecosystem health due to their sensitivity to environmental fluctuations \citep{stransky_diversity_2010} and their contribution to benthic food webs \citep{rehm2009cumacea}. Despite their ecological importance, deep-sea benthic invertebrates’ evolutionary history remains uncharted, notably in the North Atlantic \citep{jennings_phylogeographic_2014}. Interpreting these deep-sea organisms' genetic distribution and demography is central for predicting their response to climate change and anthropogenic pressures, such as seabed mining \citep{jennings_phylogeographic_2014, meisner_prefacebiodiversity_2018}. Given the urgency of the above factors, this study aims to analyze the influence of ecological (climatic and environmental) and geographic parameters on the genetic variability of Cumacea in the Northern North Atlantic. Specifically, we will examine whether genetic adaptation exists between the genetic structure of the 16S rRNA mitochondrial gene region of cumacean species sampled and their habitat attributes. If so, we will determine the attribute that diverges most from a specific gene sequence of this cumaceans gene (i.e., a window) and further explore the potential associated protein using bioinformatics tools to interpret its biological relevance. Our approach includes confirming different {phylogeographic models}\footnote{Phylogeographic models are computational tools that analyze relationships between the genetic structures of populations and their geographic distributions. 
In our case, by incorporating regional, biological, and atmospheric characteristics, we can interpret their impact on the genetic distribution of cumacean species,} and updating a Python package (currently in beta), \textit{aPhyloGeo}, to simplify these analyses. -This paper is organized as follows: Section \autoref{related-works} reviews pertinent studies on the biodiversity and biogeography of deep-sea benthic invertebrates; Section \autoref{contribution} summarizes the aims and contributions of this study, highlighting aspects relating to the conservation and adaptation of marine invertebrates to climate change; Section \autoref{materials-methods} describes the data collection, sampling procedures, and genetic analyses; Section \autoref{metrics} describes the metrics used to evaluate the phylogeographic models; Section \autoref{results} presents the results; finally, Section \autoref{conclusion} discusses their implications for future research and conservation efforts. +This paper is organized as follows: \autoref{related-works} reviews pertinent studies on the biodiversity and biogeography of deep-sea benthic invertebrates; \autoref{contribution} summarizes the aims and contributions of this study, highlighting aspects relating to the conservation and adaptation of marine invertebrates to climate change; \autoref{materials-methods} describes the data collection, sampling procedures, and genetic analyses; \autoref{metrics} describes the metrics used to evaluate the phylogeographic models; \autoref{results} presents the results; finally, \autoref{conclusion} discusses their implications for future research and conservation efforts. \section{Related Works}\label{related-works} Assessing and quantifying the biodiversity of deep-sea benthic invertebrates has become increasingly crucial since it was discovered that their species richness may be underestimated \citep{grassle1992deep}. Subsequent research has highlighted the need for large-scale distribution models to interpret the diversity of these organisms across their ecological and evolutionary contexts \citep{rex1997large}. That is why recent efforts have focused on mapping, managing, and studying the seabed \citep{brown2011benthic}. Advanced technologies such as acoustic detection are improving our knowledge of benthic ecosystem complexity \citep{brown2011benthic}. Integrating genetic and habitat attributes gives a deeper understanding of how ecosystemic, meteorological, and spatial attributes influence the genetic differences, distribution, biodiversity, and resilience of deep-sea benthic organisms \citep{vrijenhoek2009cryptic}. @@ -30,19 +30,19 @@ \section{Our Contribution}\label{contribution} Furthermore, our genetic and environmental data highlights critical habitats of high conservation interest, which can be considered for establishing marine protected areas \citep{levin2009ecological}. These results are essential for developing informed conservation strategies in the context of climate change. Finally, our study paves the way for further research on other invertebrate species across different geographic regions. By extending this research to diverse environments and taxonomic groups, scientists will gain a more complete understanding of the adaptation and resilience of marine invertebrates to changing conditions. This work contributes essential insights to the field and supports the development of informed conservation strategies. 
\section{Materials and Methods}\label{materials-methods} -This section describes our data and introduces the main stages of data pre-processing and the \textit{aPhyloGeo} software. A flow chart, constructed with the diagram software \href{https://app.diagrams.net/}{draw.io}, summarizes this section (Figure \ref{fig:fig1}). +This section describes our data and introduces the main stages of data pre-processing and the \textit{aPhyloGeo} software. A flow chart, constructed with the diagram software \href{https://app.diagrams.net/}{draw.io}, summarizes this section (\autoref{fig:fig1}). \begin{figure}[htbp] \centering \includegraphics[width=0.7\textwidth]{diagram.drawio.png} - \caption{Flow chart summarizing the Materials and Methods section workflow. Six different colors highlight the blocks. The first block (blue) represents our database. The second block (red) is data pre-processing, where we remove attributes. The third and fourth blocks (orange) implement the \textit{aPhyloGeo} software and its parameters for our phylogeographic analyses (see in the second step of the section \autoref{aPhyloGeo-software}). The fifth block (grey) calculates phylogenetic tree comparison distances. The sixth block (yellow) compares the distances between the phylogenetic trees produced. The seventh block (purple) identifies regions with high mutation rates based on the results of the tree comparisons. *See YAML files on \href{https://github.com/tahiri-lab/aPhyloGeo}{GitHub} for more details on these parameters. \label{fig:fig1}} + \caption{Flow chart summarizing the Materials and Methods section workflow. Six different colors highlight the blocks. The first block (blue) represents our database. The second block (red) is data pre-processing, where we remove attributes. The third and fourth blocks (orange) implement the \textit{aPhyloGeo} software and its parameters for our phylogeographic analyses (see in the second step of the \autoref{aPhyloGeo-software}). The fifth block (grey) calculates phylogenetic tree comparison distances. The sixth block (yellow) compares the distances between the phylogenetic trees produced. The seventh block (purple) identifies regions with high mutation rates based on the results of the tree comparisons. *See YAML files on \href{https://github.com/tahiri-lab/aPhyloGeo}{GitHub} for more details on these parameters. \label{fig:fig1}} \end{figure} \subsection{Description of the data} The study area was located in a northern region of the North Atlantic, including the Icelandic Sea, the Denmark Strait, and the Norwegian Sea. The specimens examined were collected as part of the IceAGE project (Icelandic marine Animals: Genetic and Ecology; Cruise ship M85/3 in 2011), which focused on the deep continental slopes and abyssal waters around Iceland \citep{meisner_prefacebiodiversity_2018}. The sampling period for the included specimens was from August 30 to September 22, 2011, and they were collected at depths ranging from 316 m to 2568 m. Detailed protocols concerning the sampling plan, sample processing, DNA extraction steps, PCR amplification, sequencing, and aligned DNA sequences are available in \citep{uhlir_adding_2021}. \subsection{Data pre-processing} -We used data from the article \citep{uhlir_adding_2021}, IceAGE project, and related data from the bold system's database, as described in \citep{uhlir_adding_2021}. Given these databases' enormous breadth of features, we applied a selective reduction procedure. 
Attributes that were not directly relevant to the analysis of correlations between Cumacea genetics and habitat properties, displayed little to no variability (non-numerical data), and had a large number of missing data (> 95\%) were omitted from our study. Out of the 495 available in the IceAGE dataset, we considered 62 specimens for which mitochondrial DNA sequences of the 16S rRNA gene were available. +We used data from the article \citep{uhlir_adding_2021}, IceAGE project, and related data from the bold system's database, as described in \citep{uhlir_adding_2021}. Given these databases' enormous breadth of features, we applied a selective reduction procedure. Attributes that were not directly relevant to the analysis of correlations between Cumacea genetics and habitat properties, displayed little to no variability (non-numerical data), and had a large number of missing data (> 95\%) were omitted from our study. Out of the 495 available in the IceAGE dataset, we considered 62 specimens for which mitochondrial DNA sequences of the 16S rRNA gene were available. Next, we calculated the variance using the $var()$ function in RStudio Desktop 4.3.2 for each of the selected numerical attributes. This step aimed to eliminate attributes with low variation, as they are unlikely to provide critical data to the analysis. We set a variance threshold of ≤ 0.1 to exclude uninformative attributes. The latter allows us to retain attributes whose variability is reasonably sufficient for our analyses while rejecting those with little variation. Only water salinity was eliminated based on this criterion ($S^2 = 0.02146629$). The formula (equation \ref{variance}) and code (\autoref{lst:variance}) used to calculate the variance of our final features, available in the data file on \href{https://github.com/tahiri-lab/Cumacea_aPhyloGeo}{GitHub}, are provided below: @@ -107,39 +107,39 @@ \subsection{Data pre-processing} print(correlation_matrix) \end{lstlisting} -This selection of attributes and data resulted in a table containing 62 rows ($n=62$) and 17 columns (number of attributes). +This selection of attributes and data resulted in a table containing 62 rows ($n=62$) and 17 columns (number of attributes). \subsection{Selected attributes in the IceAGE database} -\subsubsection{Geographic data} +\subsubsection{Geographic data} \begin{itemize} \item The latitude (Figure \ref{fig:fig2}a) and longitude (Figure \ref{fig:fig2}b) at the start of sampling, both in decimal degrees (DD), as they are intimately linked to the environmental gradients and historical mechanisms modeling genetic heterogeneity \citep{gaither2013origins}. -\item The sectors across the seas around Iceland: the Denmark Strait ($n=28$), the Iceland Basin ($n=15$), the Irminger Basin ($n=12$), the Norwegian Sea ($n=4$), and the Norwegian Basin ($n=3$). +\item The sectors across the seas around Iceland: the Denmark Strait ($n=28$), the Iceland Basin ($n=15$), the Irminger Basin ($n=12$), the Norwegian Sea ($n=4$), and the Norwegian Basin ($n=3$). \end{itemize} -\subsubsection{Environmental data} +\subsubsection{Environmental data} \begin{itemize} -\item Depth (m) at the start of sampling (Figure \ref{fig:fig2}c), as well as water temperature ($^\circ$C) (Figure \ref{fig:fig2}e), and O\textsubscript{2} concentration (mg/L) (Figure \ref{fig:fig2}f), as these are vital elements of the marine ecosystem that have an impact on the distribution and evolutionary acclimatization of marine species \citep{rex2006global, danovaro2010first}. 
+\item Depth (m) at the start of sampling (Figure \ref{fig:fig2}c), as well as water temperature ($^\circ$C) (Figure \ref{fig:fig2}e), and O\textsubscript{2} concentration (mg/L) (Figure \ref{fig:fig2}f), as these are vital elements of the marine ecosystem that have an impact on the distribution and evolutionary acclimatization of marine species \citep{rex2006global, danovaro2010first}. \item The sampling sites' sedimentary characteristics directly influence the distribution of Cumacea \citep{uhlir_adding_2021}. In this study, they are divided into six ecological niche categories: mud ($n=30$), sandy mud ($n=15$), sand ($n=9$), forams ($n=3$), muddy sand ($n=3$), and gravel ($n=2$). \end{itemize} -\subsubsection{Climatic data} -Wind speed (m/s) (Figure \ref{fig:fig2}d) and wind direction at the start and end of sampling were also included, giving the contribution of wind to benthic ecosystem dynamics and the restructuring of species distribution by wind currents and sediment transport \citep{siedlecki2016experiments, waga_recent_2020,saeedi_environmental_2022}. The wind direction at the start of sampling comprises six orientations: South-West ($n=22$), South ($n=15$), North-East ($n=9$), South-South-East ($n=9$), North-West ($n=5$), and East ($n=2$); while that at the end of sampling is composed of seven orientations: South ($n=15$), South-West ($n=15$), North-East ($n=9$), West-South-West ($n=7$), South-East ($n=6$), North-North-West ($n=5$), South-South-East ($n=3$), and East ($n=2$). +\subsubsection{Climatic data} +Wind speed (m/s) (Figure \ref{fig:fig2}d) and wind direction at the start and end of sampling were also included, giving the contribution of wind to benthic ecosystem dynamics and the restructuring of species distribution by wind currents and sediment transport \citep{siedlecki2016experiments, waga_recent_2020,saeedi_environmental_2022}. The wind direction at the start of sampling comprises six orientations: South-West ($n=22$), South ($n=15$), North-East ($n=9$), South-South-East ($n=9$), North-West ($n=5$), and East ($n=2$); while that at the end of sampling is composed of seven orientations: South ($n=15$), South-West ($n=15$), North-East ($n=9$), West-South-West ($n=7$), South-East ($n=6$), North-North-West ($n=5$), South-South-East ($n=3$), and East ($n=2$). \subsection{Selected attributes in the bold system's database} -\subsubsection{Taxonomic data} +\subsubsection{Taxonomic data} The family, genus, and scientific name of the cumaceans sampled were integrated into our data to study evolutionary relationships and genetic variation to habitat attributes among the specimens in our dataset. These comprise seven families: Diastylidae ($n=21$), Lampropidae ($n=13$), Leuconidae ($n=12$), Astacidae ($n=7$), Bodotriidae ($n=4$), Ceratocumatidae ($n=3$), and Pseudocumatidae ($n=2$). A total of 21 cumacean species were found in our sample (Figure \ref{fig:fig3}). We have also included the sample identity (id) so that each sample remains unique. Some specimens were only identified to genus ($n=1$) or family ($n=5$) in our sample. 
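For readers who prefer Python to the R workflow used in the Data pre-processing subsection above, the variance-based attribute filter described there could be reproduced along the following lines (an illustrative sketch only; the file name and the use of pandas are our assumptions, and the paper's own \autoref{lst:variance} remains the reference implementation):

\begin{lstlisting}[language=Python]
import pandas as pd

# Hypothetical export of the pre-processed attribute table (62 specimens).
df = pd.read_csv("cumacea_attributes.csv")

# Sample variance of each numerical attribute; ddof=1 matches R's var().
variances = df.select_dtypes(include="number").var(ddof=1)

# Keep attributes above the 0.1 variance threshold used in the study;
# water salinity (variance ~0.021) would be dropped at this step.
kept = variances[variances > 0.1].index.tolist()
print(variances.sort_values())
print(kept)
\end{lstlisting}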
-\subsection{Selected attributes from article \cite{uhlir_adding_2021}} -\subsubsection{Other environmental data} +\subsection{Selected attributes from article \cite{uhlir_adding_2021}} +\subsubsection{Other environmental data} The habitat and water mass of the sampling points were the only water attributes taken directly from Table 1 of \citep{uhlir_adding_2021}, as they can give us insight into how they may affect Cumacea genetic diversity and the acclimatization of these species in the GIN seas around Iceland. Thus, the water masses definitions, as described in \citep{uhlir_adding_2021}, were used as a reference: Arctic Polar Water (APW, $n=15$), Iceland Sea Overflow Water (ISOW, $n=15$), North Atlantic Water (NAW, $n=9$), Arctic Polar Water/Norwegian Sea Arctic Intermediate Water (APW/NSAIW, $n=7$), warm Norwegian Sea Deep Water (NSDWw, $n=8$), Labrador Sea Water (LSW, $n=3$), cold Norwegian Sea Deep Water (NSDWc, $n=3$), and Norwegian Sea Arctic Intermediate Water (NSAIW, $n=2$) (Figure \ref{fig:fig4}). In terms of habitat, we considered the three categories used in \citep{uhlir_adding_2021}: Deep Sea ($n=38$), Shelf ($n=15$), and Slope ($n=9$) (Figure \ref{fig:fig5}). -\subsubsection{Genetic data} +\subsubsection{Genetic data} To better interpret benthic species' relationship and evolutionary responses, genetic data are required \citep{wilson_speciation_1987, uhlir_adding_2021}. Thus, the aligned DNA sequence of the 16S rRNA mitochondrial gene region from each of the samples was included in our analyses. This region is standard in phylogeny and phylogeography studies \citep{hugenholtz1998impact} and sufficiently conserved over time to guarantee exact alignments between different species or populations \citep{saccone1999evolutionary}. We examined 62 of the 306 aligned DNA sequences used for phylogeographic analyses by \citep{uhlir_adding_2021}. As some specimens in our sample have their DNA sequence duplicated, or even quadruplicated with a difference of one or two nucleotides, we took into account the longest-aligned DNA sequence of each specimen. \subsection{{\textit{aPhyloGeo} software}\label{aPhyloGeo-software}} -We used the cross-platform Python software \textit{aPhyloGeo} for our phylogeographic analyses, designed to analyze phylogenetic trees using ecological and geographic attributes (\autoref{lst:main}). Developed by My-Linh Luu, Georges Marceau, David Beauchemin, and Nadia Tahiri, \textit{aPhyloGeo} offers tools to study and identify potential divergence between species genetics and habitat characteristics, enabling us to understand the evolution of species under different environmental conditions \citep{koshkarov_phylogeography_2022}. +We used the cross-platform Python software \textit{aPhyloGeo} for our phylogeographic analyses, designed to analyze phylogenetic trees using ecological and geographic attributes (\autoref{lst:main}). Developed by My-Linh Luu, Georges Marceau, David Beauchemin, and Nadia Tahiri, \textit{aPhyloGeo} offers tools to study and identify potential divergence between species genetics and habitat characteristics, enabling us to understand the evolution of species under different environmental conditions \citep{koshkarov_phylogeography_2022}. 
We selected this software for our analysis because, to our knowledge, it is the first phylogeographic tool capable of establishing similarity or dissimilarity between species genetics and environmental, climatic, and geographical attributes - precisely the objective of our study \citep{koshkarov_phylogeography_2022}. The \textit{aPhyloGeo} software offers several key functionalities: @@ -177,15 +177,15 @@ \subsection{{\textit{aPhyloGeo} software}\label{aPhyloGeo-software}} # Generate phylogenetic trees based on aligned sequences # Create phylogenetic trees from the multiple sequence alignments (MSA). genetic_trees = utils.genetic_pipeline(alignments.msa) - + # Create a GeneticTrees object # Represent the generated phylogenetic trees in Newick format. trees = GeneticTrees(trees_dict=genetic_trees, format="newick") - + # Generate attribute trees based on attribute data # Create trees representing the relationships between different attributes. attribute_trees = utils.attribute_pipeline(attribute_data) - + # Filter the results based on the generated trees # Filter the results to ensure they meet certain criteria. utils.filter_results(attribute_trees, genetic_trees, attribute_data) @@ -196,26 +196,26 @@ \subsection{{\textit{aPhyloGeo} software}\label{aPhyloGeo-software}} \begin{enumerate} \item \textbf{The first step} was to collect DNA sequences from Cumacea of sufficient quality for the needs of our results \citep{koshkarov_phylogeography_2022}. In this study, 62 cumaceans samples were selected to represent 62 sequences of the 16S rRNA mitochondrial gene. We then included two climatic attributes, namely wind speed (m/s) at the start and end of the sampling; three environmental characteristics, such as depth (m) at the start of sampling, water temperature ($^\circ$C), and O\textsubscript{2} concentration (mg/L); and two geographic variables, latitude (DD) and longitude (DD) at the start of sampling. -\item \textbf{In the second step}, trees were generated separately from biological, spatial, meteorological, and genetic data. Concerning spatial attributes, we calculated the dissimilarity between each pair of cumaceans from distinct spatial conditions \citep{koshkarov_phylogeography_2022}. This produced a symmetrical square matrix \citep{koshkarov_phylogeography_2022}. The {neighbor-joining algorithm}\footnote{It is a method used to construct phylogenetic trees using distance matrices.} was used to build the spatial tree from this matrix \citep{koshkarov_phylogeography_2022}. Each geographic attribute gives rise to a geographic tree. If there are $m$ windows affected by this attribute, there will be $m$ geographic trees. The same approach was applied to biological, meteorological, and genetic data. +\item \textbf{In the second step}, trees were generated separately from biological, spatial, meteorological, and genetic data. Concerning spatial attributes, we calculated the dissimilarity between each pair of cumaceans from distinct spatial conditions \citep{koshkarov_phylogeography_2022}. This produced a symmetrical square matrix \citep{koshkarov_phylogeography_2022}. The {neighbor-joining algorithm}\footnote{It is a method used to construct phylogenetic trees using distance matrices.} was used to build the spatial tree from this matrix \citep{koshkarov_phylogeography_2022}. Each geographic attribute gives rise to a geographic tree. If there are $m$ windows affected by this attribute, there will be $m$ geographic trees. 
The same approach was applied to biological, meteorological, and genetic data. -For genetic data, phylogenetic reconstruction was reiterated to build genetic trees based on 62 mitochondrial 16S rRNA sequences, considering only data within a window that progresses along the alignment \citep{koshkarov_phylogeography_2022}. This displacement can vary according to the steps and the size of the window defined by the user (their length is determined by the number of base pairs (bp)) \citep{koshkarov_phylogeography_2022}. +For genetic data, phylogenetic reconstruction was reiterated to build genetic trees based on 62 mitochondrial 16S rRNA sequences, considering only data within a window that progresses along the alignment \citep{koshkarov_phylogeography_2022}. This displacement can vary according to the steps and the size of the window defined by the user (their length is determined by the number of base pairs (bp)) \citep{koshkarov_phylogeography_2022}. In our case, we set up the \textit{aPhyloGeo} software as follows: $pairwiseAligner$ for sequence alignment; $\text{Hamming distance}$ to measure simple dissimilarities between sequences of identical length; $\text{Wider Fit by elongating with Gap (starAlignment)}$ algorithm takes alignment gaps into account, which is often mandatory in the case of major deletions or insertions in the sequences; $\text{windows\_size}$: 1 nucleotide (nt); and finally, $\text{step\_size}$: 10 nt. The last two configurations imply that for each 1 nt window, a phylogenetic tree is produced using the nucleotide of each cumacean, then the window is moved by 10 nt, creating a new tree. Each window in the alignment will give a genetic tree. If there are $n$ windows, there will be $n$ phylogenetic trees. Genetic trees will be used in an object called $T_1$, while spatial and ecological trees are used in another object called $T_2$. -\item \textbf{In the third step}, the genetic trees constructed in each sliding window are compared with ecosystemic, atmospheric, and regional trees using Robinson-Foulds distance \citep{robinson_comparison_1981}, normalized Robinson-Foulds distance, Euclidean distance, and Least Squares distance. These contribute to understanding the correspondence between cumaceans genetic sequences and their habitat. The approach also takes bootstrapping into account \citep{koshkarov_phylogeography_2022}. The results of these metrics were obtained using the functions $least\_square(tree1, tree2)$, $robinson\_foulds(tree1, tree2)$, $euclidean\_dist(tree1, tree2)$ from the \textit{aPhyloGeo} software and were organized by the main function (\autoref{lst:main}). Those for the normalized Robinson-Foulds distance were obtained with the function $robinson\_foulds(tree1, tree2)$ (see the last line of code in \autoref{lst:robinsonFoulds}). The metric output tells us which of our attributes have the greatest divergence of phylogenetic relationships in our samples, based on the magnitude of the metric distances (see Figure \ref{fig:fig6} and Figure \ref{fig:fig7}). +\item \textbf{In the third step}, the genetic trees constructed in each sliding window are compared with ecosystemic, atmospheric, and regional trees using Robinson-Foulds distance \citep{robinson_comparison_1981}, normalized Robinson-Foulds distance, Euclidean distance, and Least Squares distance. These contribute to understanding the correspondence between cumaceans genetic sequences and their habitat. The approach also takes bootstrapping into account \citep{koshkarov_phylogeography_2022}. 
The results of these metrics were obtained using the functions $least\_square(tree1, tree2)$, $robinson\_foulds(tree1, tree2)$, $euclidean\_dist(tree1, tree2)$ from the \textit{aPhyloGeo} software and were organized by the main function (\autoref{lst:main}). Those for the normalized Robinson-Foulds distance were obtained with the function $robinson\_foulds(tree1, tree2)$ (see the last line of code in \autoref{lst:robinsonFoulds}). The metric output tells us which of our attributes have the greatest divergence of phylogenetic relationships in our samples, based on the magnitude of the metric distances (see Figure \ref{fig:fig6} and Figure \ref{fig:fig7}). In addition to identifying the specific attribute, a sliding-window approach enables the precise localization of subtle sequences with high rates of genetic mutation \citep{koshkarov_phylogeography_2022}. This method requires shifting a fixed-size window over the alignment of genetic sequences, allowing phylogenetic trees to be reconstructed for each part of the sequence. It therefore allows us to recognize changes in evolutionary relationships along the 16S rRNA mitochondrial gene region of cumacean species. This method is essential for determining whether cumaceans-specific gene sequences in this region of their genome may be affected by certain ecological or spatial attributes of their habitat (see Figure \ref{fig:fig6} and Figure \ref{fig:fig7}). \end{enumerate} \subsection{Metrics}\label{metrics} -Our phylogeographic study used four distance metrics to quantify topological differences between phylogenetic trees. It also assesses dissimilarities between genetic sequences and ecological and regional attributes. This enables a comprehensive analysis of the evolutionary dynamics of cumacean populations in different environmental contexts. +Our phylogeographic study used four distance metrics to quantify topological differences between phylogenetic trees. It also assesses dissimilarities between genetic sequences and ecological and regional attributes. This enables a comprehensive analysis of the evolutionary dynamics of cumacean populations in different environmental contexts. -The following section presents a more concise version of the functions mentioned in the second and third steps of section \autoref{aPhyloGeo-software}: +The following section presents a more concise version of the functions mentioned in the second and third steps of \autoref{aPhyloGeo-software}: \subsubsection{Robinson-Foulds distance}\label{RF} -The Robinson-Foulds (RF) distance calculates the distance between phylogenetic trees built in each sliding window ($T_1$) and the attributes trees ($T_2$) (see the list in the first step of the section \autoref{aPhyloGeo-software}) \citep{tahiri2018new, koshkarov_phylogeography_2022}. This measurement is used to evaluate the topological differences between the two sets of trees (see Equation \eqref{eq:rf} and \autoref{lst:robinsonFoulds}). +The Robinson-Foulds (RF) distance calculates the distance between phylogenetic trees built in each sliding window ($T_1$) and the attributes trees ($T_2$) (see the list in the first step of the \autoref{aPhyloGeo-software}) \citep{tahiri2018new, koshkarov_phylogeography_2022}. This measurement is used to evaluate the topological differences between the two sets of trees (see Equation \eqref{eq:rf} and \autoref{lst:robinsonFoulds}). 
-For example, it evaluates the number of division differences between phylogenetic trees built within certain user-defined sliding windows (see the second step of the section \autoref{aPhyloGeo-software}) and geographic trees built with latitude data (DD) at the start of sampling \citep{robinson_comparison_1981}. A high distance between a specific window and other windows considered in the RF distance analysis implies that the habitat feature has little to no impact on this particular DNA sequence and that this attribute cannot explain the genetic divergences observed in this DNA sequence. +For example, it evaluates the number of division differences between phylogenetic trees built within certain user-defined sliding windows (see the second step of the \autoref{aPhyloGeo-software}) and geographic trees built with latitude data (DD) at the start of sampling \citep{robinson_comparison_1981}. A high distance between a specific window and other windows considered in the RF distance analysis implies that the habitat feature has little to no impact on this particular DNA sequence and that this attribute cannot explain the genetic divergences observed in this DNA sequence. \begin{equation}\label{eq:rf} \text{RF}(T_1, T_2) = | \Sigma(T_1) \Delta \Sigma(T_2) | @@ -262,7 +262,7 @@ \subsubsection{Robinson-Foulds distance}\label{RF} \end{lstlisting} \subsubsection{Normalized Robinson-Foulds distance}\label{RFnorm} -The normalized Robinson-Foulds (nRF) distance scales the RF distance to account for the size variations in the trees (number of clades; i.e., a group of species with a common origin), allowing a more equitable comparison. It scales the distance to a range between 0 and 1. In our context, the distance has been normalized by $2n-6$, where $n$ represents the number of taxa (see Equation \eqref{eq:rf_norm} and the last line of code in \autoref{lst:robinsonFoulds}). +The normalized Robinson-Foulds (nRF) distance scales the RF distance to account for the size variations in the trees (number of clades; i.e., a group of species with a common origin), allowing a more equitable comparison. It scales the distance to a range between 0 and 1. In our context, the distance has been normalized by $2n-6$, where $n$ represents the number of taxa (see Equation \eqref{eq:rf_norm} and the last line of code in \autoref{lst:robinsonFoulds}). Since the size of environmental trees constructed with O\textsubscript{2} concentration data (mg/L) differs from that of other attributes due to missing data, this nRF distance allows us to compare its dissimilarity with the phylogenetic trees in a fairer way \citep{tahiri2018new, koshkarov_phylogeography_2022}. It reveals the relative influence of O\textsubscript{2} concentration (mg/L) on cumacean phylogenetic relationships, independent of tree size \citep{tahiri2018new, koshkarov_phylogeography_2022}. A high value of this metric between a specific window and other windows considered in the nRF distance analysis does not allow us to conclude that there is a correlation between this DNA sequence and the attribute. It may indicate a topological dissimilarity between the habitat attribute tree and the gene trees at that position in the DNA sequence alignments. 
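To make the normalization concrete, consider an illustrative calculation of our own (not a result from this study): assuming all $n = 62$ taxa analyzed here appear in both trees, the normalization factor is $2n - 6 = 118$, so a raw RF distance of, say, 59 between a genetic tree and an attribute tree would give

\begin{equation*}
\text{nRF}(T_1, T_2) = \frac{\text{RF}(T_1, T_2)}{2n - 6} = \frac{59}{118} = 0.5,
\end{equation*}

i.e., half of the maximum possible topological disagreement between the two trees.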
@@ -275,7 +275,7 @@ \subsubsection{Normalized Robinson-Foulds distance}\label{RFnorm} \subsubsection{Euclidean distance}\label{euclidean} In our study, the Euclidean distance calculates the distance between two sets of points in a multidimensional space, which designates the divisions of the two sets of trees ($T_1$ and $T_2$). It is used to compare divisions between two respective sets of trees to assess the degree of divergence or similarity of their topologies (see Equation \eqref{eq:euclidean} and \autoref{lst:euclideanDist}). Branches are weighted according to their length, which makes it possible to obtain quantitative dissimilarities between leaf (i.e., cumacean species) pairs (i.e., genetic distance) in the two sets of trees \citep{choi2009comparison}. Thus, for each pair of leaves, their distance in the genetic trees and the habitat attribute trees are compared \citep{choi2009comparison}. -By comparing the two sets of trees $T_1$ and $T_2$ using this metric, it is possible to measure the extent to which genetic divergences correspond to fluctuations in habitat attributes. This is crucial for interpreting evolutionary relationships with these factors. A high distance of this metric between a specific window and other windows considered in the Euclidean distance analysis reveals evolutionary divergences between members of the cumacean populations at the level of this DNA sequence (see Figure \ref{fig:fig6}d and Figure \ref{fig:fig7}d). +By comparing the two sets of trees $T_1$ and $T_2$ using this metric, it is possible to measure the extent to which genetic divergences correspond to fluctuations in habitat attributes. This is crucial for interpreting evolutionary relationships with these factors. A high distance of this metric between a specific window and other windows considered in the Euclidean distance analysis reveals evolutionary divergences between members of the cumacean populations at the level of this DNA sequence (see Figure \ref{fig:fig6}d and Figure \ref{fig:fig7}d). \begin{equation}\label{eq:euclidean} d_{\text{Euclidean}}(T_1, T_2) = \sqrt{\sum_{i=1}^{n} (T1_i - T2_i)^2} @@ -306,21 +306,21 @@ \subsubsection{Euclidean distance}\label{euclidean} # Load the first tree from Newick format into a dendropy Tree object # Analyzes the string formatted by Newick and prepares the tree for comparison. tree1_tc = dendropy.Tree.get( - data=tree1.format("newick"), - schema="newick", + data=tree1.format("newick"), + schema="newick", taxon_namespace=tns ) - + # Load the second tree from Newick format into a dendropy Tree object # Similar to the first tree, this step prepares the second tree for comparison. tree2_tc = dendropy.Tree.get( - data=tree2.format("newick"), - schema="newick", + data=tree2.format("newick"), + schema="newick", taxon_namespace=tns ) # Encode the bipartitions of both trees - # This step converts the trees into a format where the presence or absence of + # This step converts the trees into a format where the presence or absence of # Each bipartition (split) is coded, which is necessary to calculate distances. tree1_tc.encode_bipartitions() tree2_tc.encode_bipartitions() @@ -347,7 +347,7 @@ \subsubsection{Least Squares distance}\label{LS} def least_square(tree1, tree2): """ - + Parameters: - tree1: Genetic trees. - tree2: Atmospherical, ecosystemic, and spatial trees. @@ -355,16 +355,16 @@ \subsubsection{Least Squares distance}\label{LS} Returns: - ls: The Least-Squares distance between the two sets of trees. 
""" - + # Initialize the Least-Squares distance to zero ls = 0.0 - + # Retrieve the list of terminal leaves (species) from the first tree leaves = tree1.get_terminals() - + # Extract the names of the terminal leaves leaves_name = [leaf.name for leaf in leaves] - + # Iterate over each pair of leaves in the trees for i in leaves_name: # Remove the first leaf from the list to avoid redundant comparisons @@ -376,7 +376,7 @@ \subsubsection{Least Squares distance}\label{LS} d2 = tree2.distance(tree2.find_any(i), tree2.find_any(j)) # Accumulate the absolute difference of distances into the LSD ls += abs(d1 - d2) - + return ls \end{lstlisting} @@ -414,7 +414,7 @@ \section{Results}\label{results} \caption{Frequency distribution of cumacean species in our sample. The bars represent the number of individuals for each species. The percentages (\%) displayed above the bars indicate the relative abundance of each species in the total sample. The mean and median values of the frequency distribution are shown in the top right-hand corner of the histogram. Unlike less common species, those that are abundant (such as \emph{Leptostylis ampullacea} and \emph{Leucon pallidus}) may have adaptive characteristics that enable them to exploit resources more easily, resist interspecific competition or withstand changing biological conditions. \label{fig:fig3}} \end{figure} -The distribution and diversity of the various cumacean species found in our sample are shown in Figure \ref{fig:fig3}. It shows that the most represented species are \emph{Leptostylis ampullacea} (14.1\%) and \emph{Leucon pallidus} (12.5\%). In contrast, species like \emph{Bathycuma brevirostre} and \emph{Styloptocuma gracillimum} are less represented (1.6\%), implying that some species may have restricted ecological niches or face ecological forces that limit their distribution. The dominance of certain species (such as \emph{Leptostylis ampullacea} and \emph{Leucon pallidus}) suggests that they may have adaptive traits that enable them to make the most of the accessible resources, resist interspecific competition, or survive in fluctuating ecosystemic conditions, aligns with our study’s aim of relating genetic adaptation to habitat characteristics. +The distribution and diversity of the various cumacean species found in our sample are shown in Figure \ref{fig:fig3}. It shows that the most represented species are \emph{Leptostylis ampullacea} (14.1\%) and \emph{Leucon pallidus} (12.5\%). In contrast, species like \emph{Bathycuma brevirostre} and \emph{Styloptocuma gracillimum} are less represented (1.6\%), implying that some species may have restricted ecological niches or face ecological forces that limit their distribution. The dominance of certain species (such as \emph{Leptostylis ampullacea} and \emph{Leucon pallidus}) suggests that they may have adaptive traits that enable them to make the most of the accessible resources, resist interspecific competition, or survive in fluctuating ecosystemic conditions, aligns with our study’s aim of relating genetic adaptation to habitat characteristics. \begin{figure}[htbp] \centering @@ -422,7 +422,7 @@ \section{Results}\label{results} \caption{Distribution of cumacean families by water mass. This histogram represents the frequency of occurrence of the different cumacean families in our samples, classified according to the water mass in which they were collected. 
Eight water mass categories are represented: Arctic Polar Water (APW), Arctic Polar Water/North Sub-Arctic Intermediate Water (APW/NSAIW), Iceland Scotland Overflow Water (ISOW), Labrador Sea Water (LSW), North Atlantic Water (NAW), North Sub-Arctic Intermediate Water (NSAIW), cold North Sub-Atlantic Deep Water (NSDWc), and warm North Sub-Atlantic Deep Water (NSDWw). Seven families are represented: Astacidae (red), Bodotriidae (brown), Ceratocumatidae (green), Diastylidae (turquoise), Lampropidae (blue), Leuconidae (purple), and Pseudocumatidae (pink). The presence of the Diastylidae (turquoise) family in the majority of water bodies (APW, APW/NSAIW, ISOW, NSAIW, NSDWc, and NSDWw) accentuates the resilience and ecological acclimatization of this family to various ecological niches and conditions. \label{fig:fig4}} \end{figure} -The following figure supports the objective of our study by showing the distribution of the various cumacean families in the different water bodies (Figure \ref{fig:fig4}). The Diastylidae family, for example, is the most common in all water bodies (turquoise color in Figure \ref{fig:fig4}), testifying to its resilience and ecological adaptability to a wide variety of habitat conditions, reminiscent of the dominance of \emph{Leptostylis ampullacea} (Figure \ref{fig:fig3}, 14.1\%) which belongs to the Diastylidae family. +The following figure supports the objective of our study by showing the distribution of the various cumacean families in the different water bodies (Figure \ref{fig:fig4}). The Diastylidae family, for example, is the most common in all water bodies (turquoise color in Figure \ref{fig:fig4}), testifying to its resilience and ecological adaptability to a wide variety of habitat conditions, reminiscent of the dominance of \emph{Leptostylis ampullacea} (Figure \ref{fig:fig3}, 14.1\%) which belongs to the Diastylidae family. \begin{figure}[] \centering @@ -444,7 +444,7 @@ \section{Results}\label{results} \caption{Analysis of fluctuations in four distance metrics using multiple sequence alignment (MSA): a) Least Squares distance, b) Robinson-Foulds distance, c) normalized Robinson-Foulds distance, and d) Euclidean distance. These variations in distance are studied to establish their dissimilarity with the variation in Otextsubscript{2} concentration (mg/L) at the sampling sites. \label{fig:fig7}} \end{figure} -The divergence between the genetic sequences and two attributes, one climatic (wind speed (m/s) at the start of sampling) and the other environmental (O\textsubscript{2} concentration (mg/L)) is presented in Figure \ref{fig:fig6} and Figure \ref{fig:fig7}. All the attributes given in the first step of the \autoref{aPhyloGeo-software} section were analyzed and their script and figure will be soon available in the $img$ and $script$ python file on \href{https://github.com/tahiri-lab/Cumacea_aPhyloGeo}{GitHub}. However, only these two attributes showed the most interesting mutation rate. Using the four metrics mentioned in the section \autoref{metrics}, we noticed that the Euclidean distance is particularly sensitive to our data, manifesting considerable sequence variation at the position in MSA 520-529 amino acids (aa) (Euclidean distance: 0.8 < x < 0.9; Figure \ref{fig:fig6}d) and 1190-199 aa (Euclidean distance: 1.2 < x < 1.3; Figure \ref{fig:fig7}d). 
Unlike the other windows for this metric in the two figures (see Figure \ref{fig:fig6}d and Figure \ref{fig:fig7}d), the fluctuations in wind speed (m/s) at the start of sampling and in O\textsubscript{2} concentration (mg/L) do not appear to explain the variations in these two specific sequences. This implies that these genetic sites are subject to selection pressures or evolutionary changes, due to biological (O\textsubscript{2} concentration (mg/L)) and meteorological conditions (wind speed (m/s) at the start of sampling). These results align with our study's aim to identify the genetic region of cumaceans with the highest mutation rate linked to a specific habitat attribute. +The divergence between the genetic sequences and two attributes, one climatic (wind speed (m/s) at the start of sampling) and the other environmental (O\textsubscript{2} concentration (mg/L)) is presented in Figure \ref{fig:fig6} and Figure \ref{fig:fig7}. All the attributes given in the first step of the \autoref{aPhyloGeo-software} section were analyzed and their script and figure will be soon available in the $img$ and $script$ python file on \href{https://github.com/tahiri-lab/Cumacea_aPhyloGeo}{GitHub}. However, only these two attributes showed the most interesting mutation rate. Using the four metrics mentioned in the \autoref{metrics}, we noticed that the Euclidean distance is particularly sensitive to our data, manifesting considerable sequence variation at the position in MSA 520-529 amino acids (aa) (Euclidean distance: 0.8 < x < 0.9; Figure \ref{fig:fig6}d) and 1190-199 aa (Euclidean distance: 1.2 < x < 1.3; Figure \ref{fig:fig7}d). Unlike the other windows for this metric in the two figures (see Figure \ref{fig:fig6}d and Figure \ref{fig:fig7}d), the fluctuations in wind speed (m/s) at the start of sampling and in O\textsubscript{2} concentration (mg/L) do not appear to explain the variations in these two specific sequences. This implies that these genetic sites are subject to selection pressures or evolutionary changes, due to biological (O\textsubscript{2} concentration (mg/L)) and meteorological conditions (wind speed (m/s) at the start of sampling). These results align with our study's aim to identify the genetic region of cumaceans with the highest mutation rate linked to a specific habitat attribute. These results provide important insight into the genetic adaptation of cumaceans to their environment. These results need to be analyzed in greater depth to certify their involvement, especially in contrast with \citep{uhlir_adding_2021}, which investigated similar topics of environmental and climatic effects on cumaceans distribution and genetics. The \textit{aPhyloGeo} package is still in the process of being updated. A more in-depth analysis of the results is available on \href{https://github.com/tahiri-lab/Cumacea_aPhyloGeo}{GitHub} in the supplementary file. @@ -453,7 +453,7 @@ \section{Conclusion}\label{conclusion} The novelty in our research lies in the exhaustive divergence between habitat attributes and genetic mutability in cumaceans, particularly in identifying genetic windows associated with habitat fluctuations, which has not been widely investigated in previous studies \citep{manel2003landscape, vrijenhoek2009cryptic}. In this case, our integrated method identifies specific genetic regions sensitive to ecosystemic and atmospheric variations. 
Thus, by seeking to determine which of these two attributes diverges most with the DNA sequences, the eventual identification of proteins linked to one of these variable DNA sequences will make it possible to represent its functional effects in responses to habitat changes. Our future research will focus on verifying the prediction of this protein and assessing its role in the physiological adaptation of cumaceans to fluctuating conditions, adding a link between genetic data and ecological function. -Interpreting how marine invertebrates genetically adapt to variations in their habitat can help us better predict their responses to climate change and advance conservation plans to protect them. Identifying the specific attributes that influence genetic variability of Cumacea can contribute to the designation and supervision of marine protected areas, assuring they include habitats crucial to the survival and acclimatization of these species. Thus, our results can inform the management of fishing and seabed mining companies by revealing ecologically vulnerable areas where these disturbances can seriously affect benthic biodiversity. +Interpreting how marine invertebrates genetically adapt to variations in their habitat can help us better predict their responses to climate change and advance conservation plans to protect them. Identifying the specific attributes that influence genetic variability of Cumacea can contribute to the designation and supervision of marine protected areas, assuring they include habitats crucial to the survival and acclimatization of these species. Thus, our results can inform the management of fishing and seabed mining companies by revealing ecologically vulnerable areas where these disturbances can seriously affect benthic biodiversity. Furthermore, our results provide essential knowledge to guide future studies on the genetic adaptation of Cumacea and other invertebrates to ecological and regional variability. Based on these findings, future research should focus on additional ecosystemic and meteorological attributes, such as nutrient accessibility, water pH, ocean currents, and the degree of human disturbance, to further improve the interpretation of the complex interactions between genetics and the environment. Broadening the scope of application to other marine species, not just marine invertebrates, and diverse geographic regions would allow us to generalize the results more effectively. With this in mind, longitudinal study models on these species could reflect long-term climatic and biological fluctuations and improve our knowledge of the dynamics of genetic acclimatization. diff --git a/papers/Gagnon_Kebe_Tahiri/myst.yml b/papers/Gagnon_Kebe_Tahiri/myst.yml index 36a9b321ea..5945d39b9f 100644 --- a/papers/Gagnon_Kebe_Tahiri/myst.yml +++ b/papers/Gagnon_Kebe_Tahiri/myst.yml @@ -1,9 +1,12 @@ version: 1 +extends: ../proceedings.yml project: + doi: 10.25080/NVYF1037 # Update this to match `scipy-2024-` the folder should be `` id: scipy-2024-gagnon_kebe_tahiri title: Ecological and Geographic Influences on Cumacea Genetics in the Northern North Atlantic subtitle: by aPhyloGeo software + description: Cumacea are vital indicators of benthic health in marine ecosystems. This study investigated the influence of environmental (i.e., biological or ecosystemic), climatic (i.e., meteorological or atmospheric), and geographic (i.e., spatial or regional) attributes on their genetic variability in the Northern North Atlantic, focusing on Icelandic waters. 
# Authors should have affiliations, emails and ORCIDs if available authors: - name: Justin Gagnon @@ -27,19 +30,14 @@ project: - Phylogeography # Add the abbreviations that you use in your paper here abbreviations: - MyST: Markedly Structured Text + DD: decimal degrees + PCR: Polymerase Chain Reaction + rRNA: Ribosomal ribonucleic acid + GIN: Greenland, Iceland, and Norwegian # It is possible to explicitly ignore the `doi-exists` check for certain citation keys error_rules: - rule: doi-exists severity: ignore keys: - # A banner will be generated for you on publication, this is a placeholder - banner: banner.png - # The rest of the information shouldn't be modified - subject: Research Article - open_access: true - license: CC-BY-4.0 - venue: Scipy 2024 - date: 2024-09-23 site: template: article-theme diff --git a/papers/Gagnon_Kebe_Tahiri/thumbnail.png b/papers/Gagnon_Kebe_Tahiri/thumbnail.png new file mode 100644 index 0000000000..2ef59a3721 Binary files /dev/null and b/papers/Gagnon_Kebe_Tahiri/thumbnail.png differ diff --git a/papers/Suvrakamal_Das/banner.jpg b/papers/Suvrakamal_Das/banner.jpg deleted file mode 100644 index e7b6696129..0000000000 Binary files a/papers/Suvrakamal_Das/banner.jpg and /dev/null differ diff --git a/papers/Suvrakamal_Das/banner.png b/papers/Suvrakamal_Das/banner.png new file mode 100644 index 0000000000..e767488250 Binary files /dev/null and b/papers/Suvrakamal_Das/banner.png differ diff --git a/papers/Suvrakamal_Das/main.md b/papers/Suvrakamal_Das/main.md index dd5d4276e7..0d7024bbe0 100644 --- a/papers/Suvrakamal_Das/main.md +++ b/papers/Suvrakamal_Das/main.md @@ -1,6 +1,9 @@ -## Abstract - -The quest for more efficient and faster deep learning models has led to the development of various alternatives to Transformers, one of which is the Mamba model. This paper provides a comprehensive comparison between Mamba models and Transformers, focusing on their architectural differences, performance metrics, and underlying mechanisms. It analyzes and synthesizes findings from extensive research conducted by various authors on these models. The synergy between Mamba models and the SciPy ecosystem enhances their integration into science. By providing an in-depth comparison using Python and its scientific ecosystem, this paper aims to clarify the strengths and weaknesses of Mamba models relative to Transformers. It offers the results obtained along with some thoughts on the possible ramifications for future research and applications in a range of academic and professional fields. +--- +title: Mamba Models a possible replacement for Transformers? +subtitle: A Memory-Efficient Approach for Scientific Computing +abstract: | + The quest for more efficient and faster deep learning models has led to the development of various alternatives to Transformers, one of which is the Mamba model. This paper provides a comprehensive comparison between Mamba models and Transformers, focusing on their architectural differences, performance metrics, and underlying mechanisms. It analyzes and synthesizes findings from extensive research conducted by various authors on these models. The synergy between Mamba models and the SciPy ecosystem enhances their integration into science. By providing an in-depth comparison using Python and its scientific ecosystem, this paper aims to clarify the strengths and weaknesses of Mamba models relative to Transformers. 
It offers the results obtained along with some thoughts on the possible ramifications for future research and applications in a range of academic and professional fields. +--- ### Introduction @@ -20,11 +23,11 @@ Finally, we explore the potential applications and future directions of Mamba mo ### State Space Models -The central goal of machine learning is to develop models capable of efficiently processing sequential data across a range of modalities and tasks [@mamba_github]. This is particularly challenging when dealing with **long sequences**, especially those exhibiting **long-range dependencies (LRDs)** – where information from distant past time steps significantly influences the current state or future predictions. Examples of such sequences abound in real-world applications, including speech, video, medical, time series, and natural language. However, traditional models struggle to effectively handle such long sequences. +The central goal of machine learning is to develop models capable of efficiently processing sequential data across a range of modalities and tasks [@mamba_github]. This is particularly challenging when dealing with **long sequences**, especially those exhibiting **long-range dependencies (LRDs)** – where information from distant past time steps significantly influences the current state or future predictions. Examples of such sequences abound in real-world applications, including speech, video, medical, time series, and natural language. However, traditional models struggle to effectively handle such long sequences. **Recurrent Neural Networks (RNNs)** [@Sherstinsky_2020], often considered the natural choice for sequential data, are inherently stateful and require only constant computation per time step. However, they are slow to train and suffer from the well-known "**vanishing gradient problem**", which limits their ability to capture LRDs. **Convolutional Neural Networks (CNNs)** [@oshea2015introduction], while efficient for parallelizable training, are not inherently sequential and struggle with long context lengths, resulting in more expensive inference. **Transformers** [@vaswani2023attention], despite their recent success in various tasks, typically require specialized architectures and attention mechanisms to handle LRDs, which significantly increase computational complexity and memory usage. -A promising alternative for tackling LRDs in long sequences is **State Space Models (SSMs)** [@gu2022efficiently], a foundational mathematical framework deeply rooted in diverse scientific disciplines like control theory and computational neuroscience. SSMs provide a continuous-time representation of a system's state and evolution, offering a powerful paradigm for capturing LRDs. While SSMs and S4s does not prevent the vanishing gradient problem but it reduces the impact with the help of HiPPO framework and NPLR Parametrization. They represent a system's behavior in terms of its internal **state** and how this state evolves over time. SSMs are widely used in various fields, including control theory, signal processing, and computational neuroscience. +A promising alternative for tackling LRDs in long sequences is **State Space Models (SSMs)** [@gu2022efficiently], a foundational mathematical framework deeply rooted in diverse scientific disciplines like control theory and computational neuroscience. SSMs provide a continuous-time representation of a system's state and evolution, offering a powerful paradigm for capturing LRDs. 
While SSMs and S4s does not prevent the vanishing gradient problem but it reduces the impact with the help of HiPPO framework and NPLR Parametrization. They represent a system's behavior in terms of its internal **state** and how this state evolves over time. SSMs are widely used in various fields, including control theory, signal processing, and computational neuroscience. #### Continuous-time Representation @@ -32,21 +35,21 @@ The continuous-time SSM describes a system's evolution using differential equati The core equations of the continuous-time SSM are: -* **State Evolution:** -  $${x'(t) = Ax(t) + Bu(t)}$$ +- **State Evolution:** +   $${x'(t) = Ax(t) + Bu(t)}$$ -* **Output Generation:** -  $${y(t) = Cx(t) + Du(t)}$$ +- **Output Generation:** +   $${y(t) = Cx(t) + Du(t)}$$ where: -* $x(t)$ is the state vector at time $t$, belonging to a $N$-dimensional space. -* $u(t)$ is the input signal at time $t$. -* $y(t)$ is the output signal at time $t$. -* $A$ is the state matrix, controlling the evolution of the state vector $x(t)$. -* $B$ is the control matrix, mapping the input signal $u(t)$ to the state space. -* $C$ is the output matrix, projecting the state vector $x(t)$ onto the output space. -* $D$ is the command matrix, directly mapping the input signal $u(t)$ to the output. (For simplicity, we often assume $D$ = 0, as $Du(t)$ can be viewed as a skip connection.) +- $x(t)$ is the state vector at time $t$, belonging to a $N$-dimensional space. +- $u(t)$ is the input signal at time $t$. +- $y(t)$ is the output signal at time $t$. +- $A$ is the state matrix, controlling the evolution of the state vector $x(t)$. +- $B$ is the control matrix, mapping the input signal $u(t)$ to the state space. +- $C$ is the output matrix, projecting the state vector $x(t)$ onto the output space. +- $D$ is the command matrix, directly mapping the input signal $u(t)$ to the output. (For simplicity, we often assume $D$ = 0, as $Du(t)$ can be viewed as a skip connection.) This system of equations defines a continuous-time mapping from input $u(t)$ to output $y(t)$ through a latent state $x(t)$. The state matrix $A$ plays a crucial role in determining the dynamics of the system and its ability to capture long-range dependencies. @@ -56,9 +59,9 @@ Despite their theoretical elegance, naive applications of SSMs often struggle wi HiPPO focuses on finding specific state matrices $A$ that allow the state vector $x(t)$ to effectively memorize the history of the input signal $u(t)$. It achieves this by leveraging the properties of orthogonal polynomials. The HiPPO framework derives several structured state matrices, including: -* **HiPPO-LegT (Translated Legendre):** Based on Legendre polynomials, this matrix enables the state to capture the history of the input within sliding windows of a fixed size. -* **HiPPO-LagT (Translated Laguerre):** Based on Laguerre polynomials, this matrix allows the state to capture a weighted history of the input, where older information decays exponentially. -* **HiPPO-LegS (Scaled Legendre):** Based on Legendre polynomials, this matrix captures the history of the input with respect to a linearly decaying weight. +- **HiPPO-LegT (Translated Legendre):** Based on Legendre polynomials, this matrix enables the state to capture the history of the input within sliding windows of a fixed size. +- **HiPPO-LagT (Translated Laguerre):** Based on Laguerre polynomials, this matrix allows the state to capture a weighted history of the input, where older information decays exponentially. 
+- **HiPPO-LegS (Scaled Legendre):** Based on Legendre polynomials, this matrix captures the history of the input with respect to a linearly decaying weight. #### Discrete-time SSM: Recurrent Representation @@ -74,13 +77,13 @@ $$ where: -* $\Delta$ acts as a gating factor, selectively weighting the contribution of matrices $A$ and $B$ at each step. This allows the model to dynamically adjust the influence of past hidden states and current inputs. +- $\Delta$ acts as a gating factor, selectively weighting the contribution of matrices $A$ and $B$ at each step. This allows the model to dynamically adjust the influence of past hidden states and current inputs. -* $A$ represents the state transition matrix. When modulated by $\Delta$, it governs the propagation of information from the previous hidden state to the current hidden state. +- $A$ represents the state transition matrix. When modulated by $\Delta$, it governs the propagation of information from the previous hidden state to the current hidden state. -* $B$ denotes the input matrix. After modulation by $\Delta$, it determines how the current input is integrated into the hidden state. +- $B$ denotes the input matrix. After modulation by $\Delta$, it determines how the current input is integrated into the hidden state. -* $C$ serves as the output matrix. It maps the hidden state to the model's output, effectively transforming the internal representations into a desired output space. +- $C$ serves as the output matrix. It maps the hidden state to the model's output, effectively transforming the internal representations into a desired output space. :::{figure} ssm.svg :label: fig:ssm @@ -109,17 +112,20 @@ The state-space models (SSMs) compute the output using a linear recurrent neural $$ h_t = \overline{A} h_{t-1} + \overline{B} x_t $$ + where -* $h_t$ is hidden state matrix at time step t -* $x_t$ is input vector at time t +- $h_t$ is hidden state matrix at time step t +- $x_t$ is input vector at time t The initial hidden state $h_0$ is computed as: + $$ h_0 = \overline{A} h_{-1} + \overline{B} x_0 = \overline{B} x_0 $$ Subsequently, the hidden state at the next time step, $h_1$, is obtained through the recursion: + $$ h_1 = \overline{A} h_0 + \overline{B} x_1 = \overline{A} \overline{B} $$ @@ -130,10 +136,11 @@ $$ y_t = C h_t $$ -* C is the output control matrix -* $y_t$ is output vector at time t -* $h_t$ is the Internal hidden state at time t +- C is the output control matrix +- $y_t$ is output vector at time t +- $h_t$ is the Internal hidden state at time t +```{math} \begin{align*} y_0 &= C h_0 = C \overline{B} x_0 \\ y_1 &= C h_1 = C \overline{A} \overline{B} x_0 + C \overline{B} x_1 \\ @@ -141,6 +148,7 @@ y_2 &= C \overline{A}^2 \overline{B} x_0 + C \overline{A} \overline{B} x_1 + C \ &\vdots\\ y_t &= C \overline{A}^t \overline{B} x_0 + C \overline{A}^{t-1} \overline{B} x_1 + \ldots + C \overline{A} \overline{B} x_{t-1} + C \overline{B} x_t \end{align*} +``` $$ Y = K \cdot X @@ -148,8 +156,8 @@ $$ where : -* $X$ is the input matrix *i.e.* $[x_0, x_1, \ldots, x_L]$ -* $ +- $X$ is the input matrix _i.e._ $[x_0, x_1, \ldots, x_L]$ +- $ K = \left( C \overline{B}, \, C \overline{A} \overline{B}, \, \ldots, \, C \overline{A}^{L-1} \overline{B} \right) $ @@ -166,31 +174,31 @@ The core computational bottleneck in SSMs stems from repeated matrix multiplicat Diagonalization involves finding a change of basis that transforms $A$ into a diagonal form. However, this approach faces significant challenges when $A$ is **non-normal**. 
Non-normal matrices have complex eigenstructures, which can lead to several problems: -* **Numerically unstable diagonalization:** Diagonalizing non-normal matrices can be numerically unstable, especially for large matrices. This is because the eigenvectors may be highly sensitive to small errors in the matrix, leading to large errors in the computed eigenvalues and eigenvectors. -* **Exponentially large entries:** The diagonalization of some non-normal matrices, including the HiPPO matrices, can involve matrices with entries that grow exponentially with the dimension $N$. This can lead to overflow issues during computation and render the diagonalization infeasible in practice. +- **Numerically unstable diagonalization:** Diagonalizing non-normal matrices can be numerically unstable, especially for large matrices. This is because the eigenvectors may be highly sensitive to small errors in the matrix, leading to large errors in the computed eigenvalues and eigenvectors. +- **Exponentially large entries:** The diagonalization of some non-normal matrices, including the HiPPO matrices, can involve matrices with entries that grow exponentially with the dimension $N$. This can lead to overflow issues during computation and render the diagonalization infeasible in practice. Therefore, naive diagonalization of non-normal matrices in SSMs is not a viable solution for efficient computation. ### The S4 Parameterization: Normal Plus Low-Rank (NPLR) -S4 overcomes the challenges of directly diagonalizing non-normal matrices by introducing a novel parameterization [@gu2022parameterization]. It decomposes the state matrix *A* into a sum of a **normal matrix** and a **low-rank term**. This decomposition allows for efficient computation while preserving the structure necessary to handle long-range dependencies. The S4 parameterization is expressed as follows: +S4 overcomes the challenges of directly diagonalizing non-normal matrices by introducing a novel parameterization [@gu2022parameterization]. It decomposes the state matrix _A_ into a sum of a **normal matrix** and a **low-rank term**. This decomposition allows for efficient computation while preserving the structure necessary to handle long-range dependencies. The S4 parameterization is expressed as follows: -* SSM convolution kernel +- SSM convolution kernel $$ ~~~~~~~~ \overline K = \kappa _L(\overline A, \overline B, \overline C) \text{~~~for~~~} A = V \Lambda V^* − P Q^T$$ where: -* *V* is a unitary matrix that diagonalizes the normal matrix. -* *Λ* is a diagonal matrix containing the eigenvalues of the normal matrix. -* *P* and *Q* are low-rank matrices that capture the non-normal component. -* These matrices HiPPO - $LegS, LegT, LagT$ all satisfy $r$ = 1 or $r$ = 2. +- _V_ is a unitary matrix that diagonalizes the normal matrix. +- _Λ_ is a diagonal matrix containing the eigenvalues of the normal matrix. +- _P_ and _Q_ are low-rank matrices that capture the non-normal component. +- These matrices HiPPO - $LegS, LegT, LagT$ all satisfy $r$ = 1 or $r$ = 2. This decomposition allows for efficient computation because: -* **Normal matrices are efficiently diagonalizable:** Normal matrices can be diagonalized stably and efficiently using unitary transformations. -* **Low-rank corrections are tractable:** The low-rank term can be corrected using the Woodbury identity, a powerful tool for inverting matrices perturbed by low-rank terms. 
+- **Normal matrices are efficiently diagonalizable:** Normal matrices can be diagonalized stably and efficiently using unitary transformations. +- **Low-rank corrections are tractable:** The low-rank term can be corrected using the Woodbury identity, a powerful tool for inverting matrices perturbed by low-rank terms. ### S4 Algorithms and Complexity @@ -212,10 +220,12 @@ $$ Selective SSM +```{math} \begin{align*} y_t &= C_0 \overline{A}^t \overline{B}_0 x_0 + C_1 \overline{A}^{t-1} \overline{B}_1 x_1 + \ldots \\ &\quad \text{input-dependent } B \text{ and } C \text{ matrix} \end{align*} +``` By leveraging the parallel associative scan technique [@lim2024parallelizing], the selective SSM formulation can be efficiently implemented on parallel architectures, such as GPUs. This approach enables the exploitation of the inherent parallelism in the computation, leading to significant performance gains, particularly for large-scale applications and time-series data processing tasks. @@ -239,7 +249,7 @@ In summary, S4 offers a structured and efficient approach to SSMs, overcoming th ### Mamba Model Architecture -One Mamba Layer [@gu2023mamba] @fig:mamba is composed of a selective state-space module and several auxiliary layers. Initially, a linear layer doubles the dimensionality of the input token embedding, increasing the dimensionality from 64 to 128. This higher dimensionality provides the network with an expanded representational space, potentially enabling the separation of previously inseparable classes. +One Mamba Layer [@gu2023mamba] @fig:mamba is composed of a selective state-space module and several auxiliary layers. Initially, a linear layer doubles the dimensionality of the input token embedding, increasing the dimensionality from 64 to 128. This higher dimensionality provides the network with an expanded representational space, potentially enabling the separation of previously inseparable classes. Subsequently, a canonical 1D convolution layer processes the output of the previous layer, manipulating the dimensions within the linearly upscaled 128-dimensional vector. This convolution layer employs the **SiLU (Sigmoid-weighted Linear Unit)** activation function [@elfwing2017sigmoidweighted]. The output of the convolution is then processed by the selective state-space module, which operates akin to a linear recurrent neural network (RNN). :::{figure} mamba.svg @@ -259,12 +269,14 @@ Self attention, feed forward Neural Networks, normalization, residual layers and #### Architecture Overview ##### Transformer Architecture -Transformers @fig:transformer rely heavily on attention mechanisms to model dependencies between input and output sequences. A better understanding of the code will be of great help[@transformer_py]. + +Transformers @fig:transformer rely heavily on attention mechanisms to model dependencies between input and output sequences. A better understanding of the code will be of great help [@transformer_py]. The core components include: -* **Multi-Head Self-Attention**: Allows the model to focus on different parts of the input sequence. -* **Position-wise Feed-Forward Networks**: Applied to each position separately. -* **Positional Encoding**: Adds information about the position of each token in the sequence, as Transformers lack inherent sequential information due to the parallel nature of their processing. + +- **Multi-Head Self-Attention**: Allows the model to focus on different parts of the input sequence. 
+- **Position-wise Feed-Forward Networks**: Applied to each position separately. +- **Positional Encoding**: Adds information about the position of each token in the sequence, as Transformers lack inherent sequential information due to the parallel nature of their processing. :::{figure} transformer.webp :label: fig:transformer @@ -272,10 +284,12 @@ This diagram illustrates the transformer model architecture, featuring encoder a ::: ##### Mamba Architecture + Mamba models @fig:mamba are based on Selective State Space Models (SSMs), combining aspects of RNNs, CNNs, and classical state space models. Key features include: -* **Selective State Space Models**: Allow input-dependent parameterization to selectively propagate or forget information. -* **Recurrent Mode**: Efficient recurrent computations with linear scaling. -* **Hardware-aware Algorithm**: Optimized for modern hardware to avoid inefficiencies from the Flash Attention 2 Paper. [@] + +- **Selective State Space Models**: Allow input-dependent parameterization to selectively propagate or forget information. +- **Recurrent Mode**: Efficient recurrent computations with linear scaling. +- **Hardware-aware Algorithm**: Optimized for modern hardware to avoid inefficiencies from the Flash Attention 2 Paper. #### Key Differences @@ -292,10 +306,10 @@ Here, $ A $, $ B $, and $ C $ are state space parameters that vary with the inpu ##### 2. Computational Complexity -| Feature | Architecture | Complexity | Inference Speed | Training Speed | -|------------|:----------------|:-------------|:------------------|:-----------------| -| Transformer | Attention-based | High | O(n) | O(n²) | -| Mamba | SSM-based | Lower | O(1) | O(n) | +| Feature | Architecture | Complexity | Inference Speed | Training Speed | +| ----------- | :-------------- | :--------- | :-------------- | :------------- | +| Transformer | Attention-based | High | O(n) | O(n²) | +| Mamba | SSM-based | Lower | O(1) | O(n) | ##### 3. Sequence Handling and Memory Efficiency @@ -308,14 +322,16 @@ Mamba integrates selective state spaces directly into the neural network archite There are other competing architectures that aim to replace or complement Transformers, such as Retentive Network [@sun2023retentive], Griffin [@de2024griffin], Hyena [@poli2023hyena], and RWKV [@peng2023rwkv]. These architectures propose alternative approaches to modeling sequential data, leveraging techniques like gated linear recurrences, local attention, and reinventing recurrent neural networks (RNNs) for the Transformer era. ### Mamba's Synergy with Scipy + Scipy [@scipy] provides a robust ecosystem for scientific computing in Python, offering a wide range of tools and libraries for numerical analysis, signal processing, optimization, and more. This ecosystem serves as a fertile ground for the development and integration of Mamba, facilitating its training, evaluation, and deployment in scientific applications. Leveraging Scipy's powerful data manipulation and visualization capabilities, Mamba models can be seamlessly integrated into scientific workflows, enabling in-depth analysis, rigorous statistical testing, and clear visualization of results. The combination of Mamba's language understanding capabilities and Scipy's scientific computing tools opens up new avenues for exploring large-scale scientific datasets commonly encountered in scientific research domains such as astronomy, medicine, and beyond, extracting insights, and advancing scientific discoveries. 
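As a small, self-contained illustration of the recurrent and convolutional views of a discrete SSM described earlier — and of how naturally these computations sit alongside NumPy and SciPy — the sketch below checks that both views produce identical outputs. The matrices are random stand-ins for the discretized $\overline{A}$, $\overline{B}$, and $C$, not trained Mamba parameters.

```python
# Minimal sketch (toy parameters, not a Mamba implementation):
# verify that the recurrence h_t = A_bar h_{t-1} + B_bar x_t, y_t = C h_t
# matches the convolution y = K * x with kernel K_k = C A_bar^k B_bar.
import numpy as np
from scipy.signal import convolve

rng = np.random.default_rng(0)
N, L = 4, 32                          # state size, sequence length
A_bar = 0.9 * np.eye(N) + 0.05 * rng.standard_normal((N, N))
B_bar = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
x = rng.standard_normal(L)            # scalar input sequence

# Recurrent view: one state update per time step
h = np.zeros((N, 1))
y_rec = np.empty(L)
for t in range(L):
    h = A_bar @ h + B_bar * x[t]
    y_rec[t] = (C @ h).item()

# Convolutional view: precompute the kernel K, then convolve with the input
K = np.array([(C @ np.linalg.matrix_power(A_bar, k) @ B_bar).item()
              for k in range(L)])
y_conv = convolve(x, K)[:L]

assert np.allclose(y_rec, y_conv)     # both views agree
```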
#### Potential Applications and Future Directions: -* **Efficient Processing of Large Scientific Datasets:** Mamba's ability to handle long-range dependencies makes it well-suited for analyzing and summarizing vast amounts of scientific data, such as astronomical observations, medical records, or experimental results, thereby reducing the complexity and enabling more efficient analysis. -* **Enhancing Model Efficiency and Scalability:** Integrating Mamba with Scipy's optimization and parallelization techniques can potentially improve the efficiency and scalability of language models, enabling them to handle increasingly larger datasets and more complex scientific problems. -* **Advancing Scientific Computing through Interdisciplinary Collaboration:** The synergy between Mamba and Scipy fosters interdisciplinary collaboration between natural language processing researchers, scientific computing experts, and domain-specific scientists, paving the way for novel applications and pushing the boundaries of scientific computing. + +- **Efficient Processing of Large Scientific Datasets:** Mamba's ability to handle long-range dependencies makes it well-suited for analyzing and summarizing vast amounts of scientific data, such as astronomical observations, medical records, or experimental results, thereby reducing the complexity and enabling more efficient analysis. +- **Enhancing Model Efficiency and Scalability:** Integrating Mamba with Scipy's optimization and parallelization techniques can potentially improve the efficiency and scalability of language models, enabling them to handle increasingly larger datasets and more complex scientific problems. +- **Advancing Scientific Computing through Interdisciplinary Collaboration:** The synergy between Mamba and Scipy fosters interdisciplinary collaboration between natural language processing researchers, scientific computing experts, and domain-specific scientists, paving the way for novel applications and pushing the boundaries of scientific computing. The diverse range of models as U-Mamba [@ma2024umamba], Vision Mamba[@zhu2024vision], VMamba [@liu2024vmamba], MambaByte [@wang2024mambabyte]and Jamba [@lieber2024jamba], highlights the versatility and adaptability of the Mamba architecture. These variants have been designed to enhance efficiency, improve long-range dependency modeling, incorporate visual representations, explore token-free approaches, integrate Fourier learning, and hybridize with Transformer components. diff --git a/papers/Suvrakamal_Das/myst.yml b/papers/Suvrakamal_Das/myst.yml index 3f9309f2be..662ee76d6a 100644 --- a/papers/Suvrakamal_Das/myst.yml +++ b/papers/Suvrakamal_Das/myst.yml @@ -1,10 +1,13 @@ version: 1 +extends: ../proceedings.yml project: + doi: 10.25080/XHDR4700 # Update this to match `scipy-2024-` the folder should be `` id: scipy-2024-Suvrakamal_Das # Ensure your title is the same as in your `main.md` title: Mamba Models a possible replacement for Transformers? - subtitle: A Memory-Efficient Approach for Scientific Computing (SciPy'24 Paper) + subtitle: A Memory-Efficient Approach for Scientific Computing + description: The quest for more efficient and faster deep learning models has led to the development of various alternatives to Transformers, one of which is the Mamba model. This paper provides a comprehensive comparison between Mamba models and Transformers, focusing on their architectural differences, performance metrics, and underlying mechanisms. 
# Authors should have affiliations, emails and ORCIDs if available authors: - name: Suvrakamal Das @@ -12,13 +15,13 @@ project: orcid: 0009-0002-4791-9244 affiliations: - Maulana Abul Kalam Azad University Institute of Technology, West Bengal - + - name: Rounak Sen email: rony000013@gmail.com orcid: 0009-0003-9327-4712 affiliations: - Maulana Abul Kalam Azad University Institute of Technology, West Bengal - + - name: Saikrishna Devendiran email: dsaikrishna200r@gmail.com orcid: 0009-0003-6153-3177 @@ -55,13 +58,5 @@ project: - mamba_github - mamba_s4 - transformer_py - # A banner will be generated for you on publication, this is a placeholder - banner: banner.jpg - # The rest of the information shouldn't be modified - subject: Research Article - open_access: true - license: CC-BY-4.0 - venue: Scipy 2024 - date: 2024-07-10 site: template: article-theme diff --git a/papers/Suvrakamal_Das/thumbnail.png b/papers/Suvrakamal_Das/thumbnail.png new file mode 100644 index 0000000000..ca2072a51d Binary files /dev/null and b/papers/Suvrakamal_Das/thumbnail.png differ diff --git a/papers/Valeria_Martin/banner.png b/papers/Valeria_Martin/banner.png index c5dd028e26..7c3ec2ae04 100644 Binary files a/papers/Valeria_Martin/banner.png and b/papers/Valeria_Martin/banner.png differ diff --git a/papers/Valeria_Martin/main.tex b/papers/Valeria_Martin/main.tex index 844dec52f9..0c376666ec 100644 --- a/papers/Valeria_Martin/main.tex +++ b/papers/Valeria_Martin/main.tex @@ -1,107 +1,107 @@ -\documentclass{article} -\usepackage{graphicx} % Required for inserting images -\usepackage{listings} -\usepackage{float} -\usepackage{color} -\usepackage{booktabs} -\definecolor{codegreen}{rgb}{0,0.6,0} -\definecolor{codegray}{rgb}{0.5,0.5,0.5} -\definecolor{codepurple}{rgb}{0.58,0,0.82} -\definecolor{backcolour}{rgb}{0.95,0.95,0.92} -\usepackage[numbers]{natbib} -\usepackage{hyperref} - -\lstdefinestyle{mystyle}{ - backgroundcolor=\color{backcolour}, - commentstyle=\color{codegreen}, - keywordstyle=\color{magenta}, - numberstyle=\tiny\color{codegray}, - stringstyle=\color{codepurple}, - basicstyle=\footnotesize, - breakatwhitespace=false, - breaklines=true, - captionpos=b, - keepspaces=true, - numbers=left, - numbersep=5pt, - showspaces=false, - showstringspaces=false, - showtabs=false, - tabsize=2 -} - -\lstset{style=mystyle} - - -\begin{abstract} - -In recent years, leveraging satellite imagery with deep learning (DL) architectures has become an effective approach for environmental monitoring tasks, including forest wildfire detection. Nevertheless, this integration requires substantial high-quality labeled data to train the DL models accurately. Leveraging the capabilities of multiple Python libraries, such as rasterio and GeoPandas, and Google Earth Engine’s Python API, this study introduces a streamlined methodology to efficiently gather, label, augment, process, and evaluate a large-scale bi-temporal high-resolution satellite imagery dataset for DL-driven forest wildfire detection. Known as the California Wildfire GeoImaging Dataset (CWGID), this dataset comprises over 100,000 labeled 'before' and 'after' wildfire image pairs, created from pre-existing satellite imagery. An analysis of the dataset using pre-trained and adapted Convolutional Neural Network (CNN) architectures, such as VGG16 and EfficientNet, achieved accuracies of respectively 76\% and 93\%. 
The pipeline outlined in this paper demonstrates how Python can be used to gather and process high-resolution satellite imagery datasets, leading to accurate wildfire detection and providing a tool for broader environmental monitoring. - -\end{abstract} - -\section{Introduction}\label{introduction} - - +\documentclass{article} +\usepackage{graphicx} % Required for inserting images +\usepackage{listings} +\usepackage{float} +\usepackage{color} +\usepackage{booktabs} +\definecolor{codegreen}{rgb}{0,0.6,0} +\definecolor{codegray}{rgb}{0.5,0.5,0.5} +\definecolor{codepurple}{rgb}{0.58,0,0.82} +\definecolor{backcolour}{rgb}{0.95,0.95,0.92} +\usepackage[numbers]{natbib} +\usepackage{hyperref} + +\lstdefinestyle{mystyle}{ + backgroundcolor=\color{backcolour}, + commentstyle=\color{codegreen}, + keywordstyle=\color{magenta}, + numberstyle=\tiny\color{codegray}, + stringstyle=\color{codepurple}, + basicstyle=\footnotesize, + breakatwhitespace=false, + breaklines=true, + captionpos=b, + keepspaces=true, + numbers=left, + numbersep=5pt, + showspaces=false, + showstringspaces=false, + showtabs=false, + tabsize=2 +} + +\lstset{style=mystyle} + + +\begin{abstract} + +In recent years, leveraging satellite imagery with deep learning (DL) architectures has become an effective approach for environmental monitoring tasks, including forest wildfire detection. Nevertheless, this integration requires substantial high-quality labeled data to train the DL models accurately. Leveraging the capabilities of multiple Python libraries, such as rasterio and GeoPandas, and Google Earth Engine’s Python API, this study introduces a streamlined methodology to efficiently gather, label, augment, process, and evaluate a large-scale bi-temporal high-resolution satellite imagery dataset for DL-driven forest wildfire detection. Known as the California Wildfire GeoImaging Dataset (CWGID), this dataset comprises over 100,000 labeled 'before' and 'after' wildfire image pairs, created from pre-existing satellite imagery. An analysis of the dataset using pre-trained and adapted Convolutional Neural Network (CNN) architectures, such as VGG16 and EfficientNet, achieved accuracies of respectively 76\% and 93\%. The pipeline outlined in this paper demonstrates how Python can be used to gather and process high-resolution satellite imagery datasets, leading to accurate wildfire detection and providing a tool for broader environmental monitoring. + +\end{abstract} + +\section{Introduction}\label{introduction} + + This paper presents a Python-based methodology for gathering and using a labeled high-resolution satellite imagery dataset for forest wildfire detection. Forests are important ecosystems found globally. They are made up of trees, plants, and other various types of vegetation. Forests host many species and are crucial for maintaining environmental health, as they support biodiversity, climate regulation, and oxygen production. Moreover, they bring economic and social benefits, including energy production, job opportunities, and spaces for leisure and tourism. Protecting forests and tackling forest loss is a current global priority \citep{IUCN2021}. -With the development of Earth Observation (EO) systems, remote sensing became a time-efficient and cost-effective method for monitoring and detecting forest change \citep{Massey2023}. Moreover, recent advancements in satellite technology have significantly enhanced forest monitoring capabilities by providing high-resolution imagery and increasing the frequency of observations. 
- -Satellite imagery-based change detection and forest monitoring have traditionally relied on manually identifying specific features and using predefined algorithms and models, such as differential analysis, thresholding techniques, and clustering and classification algorithms. This approach requires considerable domain expertise and such algorithms and models may not capture the full complexity of the studied data \citep{rs14071552}. - +With the development of Earth Observation (EO) systems, remote sensing became a time-efficient and cost-effective method for monitoring and detecting forest change \citep{Massey2023}. Moreover, recent advancements in satellite technology have significantly enhanced forest monitoring capabilities by providing high-resolution imagery and increasing the frequency of observations. + +Satellite imagery-based change detection and forest monitoring have traditionally relied on manually identifying specific features and using predefined algorithms and models, such as differential analysis, thresholding techniques, and clustering and classification algorithms. This approach requires considerable domain expertise and such algorithms and models may not capture the full complexity of the studied data \citep{rs14071552}. + With the emergence of deep learning (DL) algorithms, specifically computer vision methods, such as Convolutional Neural Networks (CNNs) \citep{lecun} and Fully Convolutional Neural Networks (e.g., U-Nets \citep{DBLP:RonnebergerFB15}), there is a significant opportunity to enhance and facilitate forest change detection efforts. These advanced computational methods can rapidly identify complex patterns within vast datasets. Furthermore, when integrated with EO systems they can facilitate near real-time monitoring and detection of multiple forest loss causes, assess their extent, or even predict and evaluate their spread \citep{eleo,al-dabbagh2023uni}. Thus, integrating DL methods with satellite imagery offers a more dynamic and precise approach, capable of handling the patterns and variability associated with imagery data. For instance, DL models can automatically learn complex patterns related to wildfire spread from labeled examples, whereas traditional methods might miss subtle but important indicators. However, DL algorithms require a substantial amount of labeled data to effectively learn and identify change \citep{Alzubaidi2021ReviewOD}. Therefore, the development of labeled high-resolution satellite imagery datasets is important and relevant for addressing environmental problems. Currently, the availability of labeled high-quality satellite imagery datasets is an obstacle to developing DL models for environmental change detection \citep{Adegun2023}. Generally, building a satellite imagery dataset is a time-intensive process. However, Google Earth Engine (GEE) \citep{gorelick2017google} has recently revolutionized this process by providing an extensive, cloud-based platform for the efficient collection, processing, and analysis of satellite imagery. GEE’s Python API allows its users to programmatically query their platform and download cloud-free large-scale satellite imagery datasets from multiple satellite collections, such as Sentinel-2. -For example, while traditional methods might require manual search and download of images, GEE can automate this process, significantly reducing the time needed to find suitable satellite images. 
This technology makes data collection and processing faster and easier, facilitating environmental monitoring by providing reliable and easily accessible high-quality satellite imagery. - -Furthermore, Python facilitates the use of DL in environmental monitoring by providing a rich ecosystem of libraries and tools, such as TensorFlow \citep{tensorflow2015-whitepaper}, which contains multiple existing DL architectures that can be adapted and used with satellite imagery. Nevertheless, integrating DL and remotely sensed images requires multiple processing steps, such as having smaller imagery tiles and adapting the models to use GeoTIFF data, among others. These steps include: -\begin{enumerate} - \item Data Acquisition: Collect satellite imagery from sources such as Google Earth Engine or other satellite image providers. - \item Image Tiling: Divide large satellite images into smaller tiles to fit the input image size used in DL models. - \item Data Annotation: Label the images to create a ground truth dataset for training DL models. - \item Data Augmentation: Applying transformations such as rotations and flips to increase the diversity of the training dataset. +For example, while traditional methods might require manual search and download of images, GEE can automate this process, significantly reducing the time needed to find suitable satellite images. This technology makes data collection and processing faster and easier, facilitating environmental monitoring by providing reliable and easily accessible high-quality satellite imagery. + +Furthermore, Python facilitates the use of DL in environmental monitoring by providing a rich ecosystem of libraries and tools, such as TensorFlow \citep{tensorflow2015-whitepaper}, which contains multiple existing DL architectures that can be adapted and used with satellite imagery. Nevertheless, integrating DL and remotely sensed images requires multiple processing steps, such as having smaller imagery tiles and adapting the models to use GeoTIFF data, among others. These steps include: +\begin{enumerate} + \item Data Acquisition: Collect satellite imagery from sources such as Google Earth Engine or other satellite image providers. + \item Image Tiling: Divide large satellite images into smaller tiles to fit the input image size used in DL models. + \item Data Annotation: Label the images to create a ground truth dataset for training DL models. + \item Data Augmentation: Applying transformations such as rotations and flips to increase the diversity of the training dataset. \end{enumerate} -Python’s tools help manage these steps. For example, libraries like rasterio, Tifffile and GeoPandas, can be used to process and transform satellite imagery data into formats suitable for DL models. - - -This paper presents a methodology, implemented in Python, to streamline the creation and evaluation, via DL, of satellite imagery datasets. The methodology covers the entire workflow: from data acquisition, labelling, and preprocessing to model adaptation, training, and evaluation. Specifically, this approach is applied to gather and validate a high-resolution dataset for forest wildfire detection, the California Wildfire GeoImaging Dataset (CWGID). Additionally, this methodology can be adapted for various environmental monitoring tasks, showing its versatility in studying and responding to different environmental changes. - - -\section{Building a Sentinel-2 Satellite Imagery Dataset} -To construct the CWGID, a multi-step process is needed. 
-\subsection{Gathering and Refining Historic Wildfire Polygon Data from California} -The initial step is to gather georeferenced forest wildfire polygon data from California, sourced from the Fire and Resource Assessment Program (FRAP) maintained by the California Department of Forestry and Fire Protection \citep{california_department_of_forestry_and_fire_protection_2024}. This FRAP data includes perimeters of past wildfires and serves as the geographic reference needed to select satellite imagery with GEE. Figure \ref{fig2} illustrates the polygons from the FRAP. The polygons delineated in purple represent areas affected by wildfires in forested regions. These delineated polygons are used to create the CWGID. - - -\begin{figure}[h] - \centering - \includegraphics{polygons.png} - \caption{Representation of the Polygon Data from the FRAP. Polygons delineated in purple represent wildfires that occurred in forested areas, used for the California Wildfire GeoImaging Dataset (CWGID). \label{fig2}} -\end{figure} - -Then, in Python, the Pandas library \citep{pandas1} is used to organize the forest wildfire attribute data into a Pandas DataFrame, which is then filtered to align with the launch date and operational phase of the Sentinel-2 satellites, selected for their open-source, high-resolution imagery capabilities \citep{DRUSCH201225}. Additionally, the dates are adjusted to fall within the green-up period, avoiding the winter and fall seasons where snow cover could interfere with identifying burnt areas. - -Next, the data is formatted to meet GEE’s querying specifications: -\begin{itemize} - \item A 15-day range for pre- and post-wildfire dates is generated and added to the DataFrame. - \item Using Pandas, the date ranges are formatted to meet GEE's requirements. - \item Using the pyproj library \citep{pyproj2023}, the recorded point coordinates are converted from NAD83 to WGS84 to facilitate the querying process. - \item With the geopy \citep{geopy} library, the coordinates of the squared region of interest are calculated, featuring a side length of 15 miles. -\end{itemize} - - -\subsection{Downloading the Imagery Data Using GEE's Python API} -GEE is a cloud-based platform for global environmental data analysis. It combines an extensive archive of satellite imagery and geospatial datasets with powerful computational resources to enable researchers to detect and quantify changes on the Earth’s surface. GEE’s Python API offers an accessible interface for automating the process of satellite imagery downloads, making it a popular tool for environmental monitoring and research projects. - -Multiple steps are needed to set up the GEE's Python API. First, a project is created in Google Cloud Console and the Earth Engine API is enabled. Authentication and Google Drive editing rights are configured to effectively manage and store the downloaded imagery. Following the setup, the Earth Engine Python API is installed on a local machine, and the necessary authentications are performed to initialize the API. - -Then, a Python script is developed to automate the download of images depicting the pre- or post-wildfire data using GEE's Python API (see Code \ref{download}). To download three-channel RGB GeoTIFF imagery, the bands B4 (red), B3(green), and B2(blue) need to be specified (different band compositions can be selected in this step). 
In satellite imagery, bands refer to specific wavelength ranges captured by the satellite sensors, and they are used to create composite images that highlight different features of the Earth's surface. These bands correspond to the visible spectrum, which is useful for visual interpretation and analysis. -The script to download the satellite imagery needed using GEE (see Code \ref{download}) is configured with a for loop to iterate through each entry in the DataFrame, extracting necessary parameters such as date ranges, region of interest (ROI) coordinates, and center coordinates of each wildfire polygon. Also, the script is designed to specify parameters such as the desired image collection and a threshold for cloud coverage. Tiles exhibiting more than 10\% cloud coverage are automatically excluded to maintain data quality. Finally, the images are downloaded and exported to Google Drive in a GeoTIFF format. - +Python’s tools help manage these steps. For example, libraries like rasterio, Tifffile and GeoPandas, can be used to process and transform satellite imagery data into formats suitable for DL models. + + +This paper presents a methodology, implemented in Python, to streamline the creation and evaluation, via DL, of satellite imagery datasets. The methodology covers the entire workflow: from data acquisition, labelling, and preprocessing to model adaptation, training, and evaluation. Specifically, this approach is applied to gather and validate a high-resolution dataset for forest wildfire detection, the California Wildfire GeoImaging Dataset (CWGID). Additionally, this methodology can be adapted for various environmental monitoring tasks, showing its versatility in studying and responding to different environmental changes. + + +\section{Building a Sentinel-2 Satellite Imagery Dataset} +To construct the CWGID, a multi-step process is needed. +\subsection{Gathering and Refining Historic Wildfire Polygon Data from California} +The initial step is to gather georeferenced forest wildfire polygon data from California, sourced from the Fire and Resource Assessment Program (FRAP) maintained by the California Department of Forestry and Fire Protection \citep{california_department_of_forestry_and_fire_protection_2024}. This FRAP data includes perimeters of past wildfires and serves as the geographic reference needed to select satellite imagery with GEE. Figure \ref{fig2} illustrates the polygons from the FRAP. The polygons delineated in purple represent areas affected by wildfires in forested regions. These delineated polygons are used to create the CWGID. + + +\begin{figure}[h] + \centering + \includegraphics{polygons.png} + \caption{Representation of the Polygon Data from the FRAP. Polygons delineated in purple represent wildfires that occurred in forested areas, used for the California Wildfire GeoImaging Dataset (CWGID). \label{fig2}} +\end{figure} + +Then, in Python, the Pandas library \citep{pandas1} is used to organize the forest wildfire attribute data into a Pandas DataFrame, which is then filtered to align with the launch date and operational phase of the Sentinel-2 satellites, selected for their open-source, high-resolution imagery capabilities \citep{DRUSCH201225}. Additionally, the dates are adjusted to fall within the green-up period, avoiding the winter and fall seasons where snow cover could interfere with identifying burnt areas. 
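As an illustration of this filtering step, the short sketch below shows one possible way to restrict the FRAP records to the Sentinel-2 era and to an assumed May--September green-up window; the \texttt{ALARM\_DATE} column name and the month range are assumptions rather than fixed choices of the pipeline.

\begin{lstlisting}[language=Python, caption=Illustrative sketch (assumptions noted in comments) of filtering the FRAP attribute table to the Sentinel-2 era and an assumed green-up window.]
import geopandas as gpd
import pandas as pd

# Read the FRAP fire perimeters from a placeholder path
gdf = gpd.read_file("YourShapefileDirectory/FirePolygons.shp")

# ALARM_DATE is assumed to hold the fire start date in the attribute table
fires = pd.DataFrame(gdf.drop(columns="geometry"))
fires["ALARM_DATE"] = pd.to_datetime(fires["ALARM_DATE"], errors="coerce")

# Keep fires recorded after the Sentinel-2A launch (23 June 2015)
fires = fires[fires["ALARM_DATE"] >= "2015-06-23"]

# Keep fires whose start dates fall in an assumed green-up window (May to September)
fires = fires[fires["ALARM_DATE"].dt.month.between(5, 9)]
\end{lstlisting}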
+ +Next, the data is formatted to meet GEE’s querying specifications: +\begin{itemize} + \item A 15-day range for pre- and post-wildfire dates is generated and added to the DataFrame. + \item Using Pandas, the date ranges are formatted to meet GEE's requirements. + \item Using the pyproj library \citep{pyproj2023}, the recorded point coordinates are converted from NAD83 to WGS84 to facilitate the querying process. + \item With the geopy \citep{geopy} library, the coordinates of the squared region of interest are calculated, featuring a side length of 15 miles. +\end{itemize} + + +\subsection{Downloading the Imagery Data Using GEE's Python API} +GEE is a cloud-based platform for global environmental data analysis. It combines an extensive archive of satellite imagery and geospatial datasets with powerful computational resources to enable researchers to detect and quantify changes on the Earth’s surface. GEE’s Python API offers an accessible interface for automating the process of satellite imagery downloads, making it a popular tool for environmental monitoring and research projects. + +Multiple steps are needed to set up the GEE's Python API. First, a project is created in Google Cloud Console and the Earth Engine API is enabled. Authentication and Google Drive editing rights are configured to effectively manage and store the downloaded imagery. Following the setup, the Earth Engine Python API is installed on a local machine, and the necessary authentications are performed to initialize the API. + +Then, a Python script is developed to automate the download of images depicting the pre- or post-wildfire data using GEE's Python API (see Code \ref{download}). To download three-channel RGB GeoTIFF imagery, the bands B4 (red), B3(green), and B2(blue) need to be specified (different band compositions can be selected in this step). In satellite imagery, bands refer to specific wavelength ranges captured by the satellite sensors, and they are used to create composite images that highlight different features of the Earth's surface. These bands correspond to the visible spectrum, which is useful for visual interpretation and analysis. +The script to download the satellite imagery needed using GEE (see Code \ref{download}) is configured with a for loop to iterate through each entry in the DataFrame, extracting necessary parameters such as date ranges, region of interest (ROI) coordinates, and center coordinates of each wildfire polygon. Also, the script is designed to specify parameters such as the desired image collection and a threshold for cloud coverage. Tiles exhibiting more than 10\% cloud coverage are automatically excluded to maintain data quality. Finally, the images are downloaded and exported to Google Drive in a GeoTIFF format. + \begin{lstlisting}[language=Python, label=download, caption=Script to automate the download of pre- or post-wildfire images using GEE’s Python API. It iterates through a DataFrame, extracting relevant parameters and downloading images with less than 10\% cloud coverage. The images are exported as GeoTIFF files to Google Drive.] # Authenticate into EE ee.Authenticate() @@ -174,161 +174,161 @@ \subsection{Downloading the Imagery Data Using GEE's Python API} # Skip images with more than 10% cloud coverage print(f"Skipping image {i} due to cloudy percentage ({cloudy_percentage} %) > 10 %") \end{lstlisting} - - -Figure \ref{fig3} presents an example of a pre-and post-wildfire imagery pair downloaded from GEE to Google Drive using code \ref{download} . 
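As a complementary illustration of the coordinate preparation steps listed above (the NAD83 to WGS84 conversion with pyproj and the 15-mile square region of interest built with geopy), the following sketch shows one way these operations can be written; the centroid coordinates are invented for illustration and the exact CWGID implementation may differ.

\begin{lstlisting}[language=Python, caption={Illustrative sketch: reprojecting a fire centroid to WGS84 and deriving a square region of interest roughly 15 miles on a side.}]
from pyproj import Transformer
from geopy.distance import distance

# Reproject an illustrative centroid from NAD83 (EPSG:4269) to WGS84 (EPSG:4326)
to_wgs84 = Transformer.from_crs("EPSG:4269", "EPSG:4326", always_xy=True)
lon, lat = to_wgs84.transform(-121.5, 39.8)  # always_xy=True: arguments and results are (lon, lat)

# Move 7.5 miles from the center in each cardinal direction to obtain the
# bounding coordinates of a square roughly 15 miles on a side
half_side = distance(miles=7.5)
north = half_side.destination((lat, lon), bearing=0).latitude
south = half_side.destination((lat, lon), bearing=180).latitude
east = half_side.destination((lat, lon), bearing=90).longitude
west = half_side.destination((lat, lon), bearing=270).longitude

roi = [west, south, east, north]  # [min_lon, min_lat, max_lon, max_lat] for the GEE query
print(roi)
\end{lstlisting}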
- -\begin{figure}[H] - \centering - \includegraphics{prepost1.png} - \caption{Example of a pre and post-wildfire RGB image pair of a forested area downloaded using GEE's Python API. \label{fig3}} -\end{figure} -\subsection{Creating the Ground Truth Wildfire Labels} -Ground truth masks are essential in forest wildfire detection and general land cover classifications \citep{8113128}. In this project, these type of masks are generated to label the data. - -First, Python is used to rasterize the combined geometry of the forest wildfire polygon data and the downloaded post-wildfire RGB satellite imagery. Specifically, the forest wildfire polygons are accessed in Python using GeoPandas \citep{geopandas} and reprojected to match the coordinate system of the satellite imagery (EPSG:4326). Then, each post-wildfire RGB image is locally and temporarily downloaded from Google Drive, with essential properties such as width, height, transform, and bounds extracted using the rasterio library \citep{rasterio}. Next, the geometry column from the forest wildfire polygon data is extracted and intersected with each image bound using Python's shapely \citep{shapely} library. Finally, binary masks are created by rasterizing the combined geometries. These masks match the dimensions of the satellite images, ensuring that each pixel labeled as wildfire damage corresponds directly to the polygon data (see Code \ref{lst:fire_polygons_processing}). The binary masks are saved temporarily in GeoTIFF format and are uploaded to a dedicated Google Drive folder. All the temporary local files were deleted to clear space and maintain system efficiency. -\begin{lstlisting}[language=Python, label= lst:fire_polygons_processing, caption = Building Ground Truth Masks] -import geopandas as gpd -import numpy as np -import rasterio -from shapely.geometry import box, shape -import os - -# Define the path to your Shapefile - replace with your specific path -shapefile_path = "YourShapefileDirectory/FirePolygons.shp" - -# Read the Shapefile using geopandas -gdf = gpd.read_file(shapefile_path) - -# Reproject the shapefile to EPSG:4326 to match the satellite imagery coordinate system -gdf = gdf.to_crs(epsg=4326) - -# Create a directory to store the output raster masks if it doesn't already exist -output_dir = "raster_masks" -if not os.path.exists(output_dir): - os.makedirs(output_dir) - -# Iterate through the rows in the attribute table -for index, row in gdf.iterrows(): - object_id = row["OBJECTID"] - image_path = f"YourImageDirectory/RGB_AfterFire{object_id}.tif" - # Check if the image exists - if os.path.exists(image_path): - # Open the image using rasterio - with rasterio.open(image_path) as src: - image_width = src.width - image_height = src.height - image_transform = src.transform - image_bounds = box( - src.bounds[0], src.bounds[1], - src.bounds[2], src.bounds[3] - ) - - # Extract the geometry column - geom = shape(row["geometry"]) - clipped_geom = geom.intersection(image_bounds.envelope) - - if not clipped_geom.is_empty: - # Create a two-dimensional label by rasterizing the - # clipped geometry - clipped_mask = rasterio.features.geometry_mask( - [clipped_geom], - out_shape=(image_height, image_width), - transform=image_transform, - invert=True, - ) - - # Save the image with the two-dimensional label overlay - output_image_path - = f"{output_dir}/Masked_{object_id}.tif" - with rasterio.open( - output_image_path, - "w", - driver="GTiff", - width=image_width, - height=image_height, - count=1, - dtype=np.uint8, - crs=src.crs, - 
transform=image_transform, - ) as dst: - dst.write(clipped_mask.astype(np.uint8), 1) -\end{lstlisting} - -In the resulting ground truth masks, the pixel values are set to zero if they are outside of a wildfire polygon, indicating unaffected areas, and set to one if they are within the polygon boundaries, indicating areas affected by a forest wildfire. Figure \ref{fig4} displays an example of a resulting ground truth mask. - - -\begin{figure}[H] - \centering - \includegraphics{label1.png} - \caption{Example of a resulting ground truth mask in a forested area affected by wildfires. The mask highlights wildfire-affected areas in yellow and unaffected areas in purple. This binary mask is used to train and validate deep learning models for accurate wildfire detection \label{fig4}} -\end{figure} - - -\subsection{Image Segmentation and Data Preparation for Deep Learning Architectures} -Next, the satellite images and their corresponding ground-truth masks are cropped into smaller tiles that maintain the imagery's spatial resolution (10m). Often, satellite imagery needs to be resized and downscaled to accommodate deep learning (DL) architectures, which can result in the loss of essential details such as subtle indicators of early-stage wildfires. Moreover, using smaller images enhances the efficiency of DL models by lowering computational demands and speeding up training times \citep{hu2015transferring,marmanis2016deep}. - -To do this, a tile size of 256x256 pixels is specified and each RGB image is downloaded individually from Google Drive to a temporary local folder. Using Python’s rasterio library, the original RGB images are opened to obtain their dimensions. Then, the number of rows and columns for the tiles is calculated based on the chosen tile size. Next, a rasterio Window object is used to extract the corresponding portion from the original image and read the RGB data, ensuring the order of the bands (B4, B3, and B2). - -The segmented RGB tiles are then saved as GeoTIFF files using the tifffile Python library \citep{tifffile}. This is a critical step to maintain the integrity of the three-channel RGB data, as the rasterio library alone can alter the color of the images during saving. Additionally, the metadata of the saved tiles is updated to include georeferencing information and to modify parameters such as width, height, and transform (see Code \ref{lst:save_rgb_tiles}). - -A similar approach is used to segment the binary masks, specifying that the images contain only one band. 
-\begin{lstlisting}[language=Python, caption={Function to Crop RGB Image Tiles}, label= lst:save_rgb_tiles]
-from rasterio import Window
-from tifffile import imwrite
-
-# Function to save image tiles without changing the data type
-def save_rgb_tiles(image_path, output_folder, tile_size, parent_name):
-    # Open the source image file
-    with rasterio.open(image_path) as src:
-        height = src.height  # Get the height of the source image
-        width = src.width  # Get the width of the source image
-
-        # Calculate the number of tiles in both dimensions
-        num_rows = height // tile_size
-        num_cols = width // tile_size
-
-        tile_counter = 1  # Initialize the tile counter
-
-        # Iterate over the number of rows of tiles
-        for i in range(num_rows):
-            # Iterate over the number of columns of tiles
-            for j in range(num_cols):
-                # Define the window for the current tile
-                window = Window(j * tile_size, i * tile_size, tile_size, tile_size)
-
-                # Read the original data without modifications
-                # Assuming band order B4, B3, B2
-                tile = src.read((1, 2, 3), window=window)
-                # Create a unique name for the tile
-                tile_name = f"{parent_name}_tile_{tile_counter}.tif"
-                # Define the path to save the tile
-                tile_path = os.path.join(output_folder, tile_name)
-
-                # Save the tile using tifffile without changing
-                # data type
-                imwrite(tile_path, tile)
-
-                # Copy the metadata from the source image
-                meta = src.meta.copy()
-                # Get the transformation matrix for the current window
-                transform = src.window_transform(window)
-                # Update the metadata with the new dimensions and transformation
-                meta.update({
-                    'width': tile_size,
-                    'height': tile_size,
-                    'transform': transform
-                })
-                # Save the tile with updated metadata using rasterio
-                with rasterio.open(tile_path, 'w', **meta) as dst:
-                    dst.write(tile)
-                tile_counter += 1  # Increment the tile counter
-\end{lstlisting}
+
+
+Figure \ref{fig3} presents an example of a pre- and post-wildfire imagery pair downloaded from GEE to Google Drive using Code \ref{download}.
+
+\begin{figure}[H]
+    \centering
+    \includegraphics{prepost1.png}
+    \caption{Example of a pre- and post-wildfire RGB image pair of a forested area downloaded using GEE's Python API. \label{fig3}}
+\end{figure}
+\subsection{Creating the Ground Truth Wildfire Labels}
+Ground truth masks are essential in forest wildfire detection and general land cover classifications \citep{8113128}. In this project, these types of masks are generated to label the data.
+
+First, Python is used to rasterize the combined geometry of the forest wildfire polygon data and the downloaded post-wildfire RGB satellite imagery. Specifically, the forest wildfire polygons are accessed in Python using GeoPandas \citep{geopandas} and reprojected to match the coordinate system of the satellite imagery (EPSG:4326). Then, each post-wildfire RGB image is locally and temporarily downloaded from Google Drive, with essential properties such as width, height, transform, and bounds extracted using the rasterio library \citep{rasterio}. Next, the geometry column from the forest wildfire polygon data is extracted and intersected with each image's bounds using Python's shapely \citep{shapely} library. Finally, binary masks are created by rasterizing the combined geometries. These masks match the dimensions of the satellite images, ensuring that each pixel labeled as wildfire damage corresponds directly to the polygon data (see Code \ref{lst:fire_polygons_processing}).
The binary masks are saved temporarily in GeoTIFF format and are uploaded to a dedicated Google Drive folder. All the temporary local files are deleted to clear space and maintain system efficiency.
+\begin{lstlisting}[language=Python, label= lst:fire_polygons_processing, caption = Building Ground Truth Masks]
+import geopandas as gpd
+import numpy as np
+import rasterio
+import rasterio.features  # geometry_mask lives in the features submodule
+from shapely.geometry import box, shape
+import os
+
+# Define the path to your Shapefile - replace with your specific path
+shapefile_path = "YourShapefileDirectory/FirePolygons.shp"
+
+# Read the Shapefile using geopandas
+gdf = gpd.read_file(shapefile_path)
+
+# Reproject the shapefile to EPSG:4326 to match the satellite imagery coordinate system
+gdf = gdf.to_crs(epsg=4326)
+
+# Create a directory to store the output raster masks if it doesn't already exist
+output_dir = "raster_masks"
+if not os.path.exists(output_dir):
+    os.makedirs(output_dir)
+
+# Iterate through the rows in the attribute table
+for index, row in gdf.iterrows():
+    object_id = row["OBJECTID"]
+    image_path = f"YourImageDirectory/RGB_AfterFire{object_id}.tif"
+    # Check if the image exists
+    if os.path.exists(image_path):
+        # Open the image using rasterio
+        with rasterio.open(image_path) as src:
+            image_width = src.width
+            image_height = src.height
+            image_transform = src.transform
+            image_bounds = box(
+                src.bounds[0], src.bounds[1],
+                src.bounds[2], src.bounds[3]
+            )
+
+            # Extract the geometry column
+            geom = shape(row["geometry"])
+            clipped_geom = geom.intersection(image_bounds.envelope)
+
+            if not clipped_geom.is_empty:
+                # Create a two-dimensional label by rasterizing the
+                # clipped geometry
+                clipped_mask = rasterio.features.geometry_mask(
+                    [clipped_geom],
+                    out_shape=(image_height, image_width),
+                    transform=image_transform,
+                    invert=True,
+                )
+
+                # Save the image with the two-dimensional label overlay
+                output_image_path = f"{output_dir}/Masked_{object_id}.tif"
+                with rasterio.open(
+                    output_image_path,
+                    "w",
+                    driver="GTiff",
+                    width=image_width,
+                    height=image_height,
+                    count=1,
+                    dtype=np.uint8,
+                    crs=src.crs,
+                    transform=image_transform,
+                ) as dst:
+                    dst.write(clipped_mask.astype(np.uint8), 1)
+\end{lstlisting}
+
+In the resulting ground truth masks, the pixel values are set to zero if they are outside of a wildfire polygon, indicating unaffected areas, and set to one if they are within the polygon boundaries, indicating areas affected by a forest wildfire. Figure \ref{fig4} displays an example of a resulting ground truth mask.
+
+
+\begin{figure}[H]
+    \centering
+    \includegraphics{label1.png}
+    \caption{Example of a resulting ground truth mask in a forested area affected by wildfires. The mask highlights wildfire-affected areas in yellow and unaffected areas in purple. This binary mask is used to train and validate deep learning models for accurate wildfire detection. \label{fig4}}
+\end{figure}
+
+
+\subsection{Image Segmentation and Data Preparation for Deep Learning Architectures}
+Next, the satellite images and their corresponding ground-truth masks are cropped into smaller tiles that maintain the imagery's spatial resolution (10 m). Often, satellite imagery needs to be resized and downscaled to accommodate deep learning (DL) architectures, which can result in the loss of essential details such as subtle indicators of early-stage wildfires. Moreover, using smaller images enhances the efficiency of DL models by lowering computational demands and speeding up training times \citep{hu2015transferring,marmanis2016deep}.
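As a rough, back-of-the-envelope check on tile footprints (assuming the 10 m pixel size stated above and the 15-mile regions of interest described earlier): each 256$\times$256-pixel tile used in the next step spans $256 \times 10 = 2560$ m, about 2.56 km per side, so a region of interest roughly 24 km across yields on the order of a $9 \times 9$ grid of full tiles.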
+
+To do this, a tile size of 256x256 pixels is specified and each RGB image is downloaded individually from Google Drive to a temporary local folder. Using Python’s rasterio library, the original RGB images are opened to obtain their dimensions. Then, the number of rows and columns for the tiles is calculated based on the chosen tile size. Next, a rasterio Window object is used to extract the corresponding portion from the original image and read the RGB data, ensuring the order of the bands (B4, B3, and B2).
+
+The segmented RGB tiles are then saved as GeoTIFF files using the tifffile Python library \citep{tifffile}. This is a critical step to maintain the integrity of the three-channel RGB data, as the rasterio library alone can alter the color of the images during saving. Additionally, the metadata of the saved tiles is updated to include georeferencing information and to modify parameters such as width, height, and transform (see Code \ref{lst:save_rgb_tiles}).
+
+A similar approach is used to segment the binary masks, specifying that the images contain only one band.
+\begin{lstlisting}[language=Python, caption={Function to Crop RGB Image Tiles}, label= lst:save_rgb_tiles]
+import os
+import rasterio
+from rasterio.windows import Window  # Window is provided by the rasterio.windows submodule
+from tifffile import imwrite
+
+# Function to save image tiles without changing the data type
+def save_rgb_tiles(image_path, output_folder, tile_size, parent_name):
+    # Open the source image file
+    with rasterio.open(image_path) as src:
+        height = src.height  # Get the height of the source image
+        width = src.width  # Get the width of the source image
+
+        # Calculate the number of tiles in both dimensions
+        num_rows = height // tile_size
+        num_cols = width // tile_size
+
+        tile_counter = 1  # Initialize the tile counter
+
+        # Iterate over the number of rows of tiles
+        for i in range(num_rows):
+            # Iterate over the number of columns of tiles
+            for j in range(num_cols):
+                # Define the window for the current tile
+                window = Window(j * tile_size, i * tile_size, tile_size, tile_size)
+
+                # Read the original data without modifications
+                # Assuming band order B4, B3, B2
+                tile = src.read((1, 2, 3), window=window)
+                # Create a unique name for the tile
+                tile_name = f"{parent_name}_tile_{tile_counter}.tif"
+                # Define the path to save the tile
+                tile_path = os.path.join(output_folder, tile_name)
+
+                # Save the tile using tifffile without changing
+                # data type
+                imwrite(tile_path, tile)
+
+                # Copy the metadata from the source image
+                meta = src.meta.copy()
+                # Get the transformation matrix for the current window
+                transform = src.window_transform(window)
+                # Update the metadata with the new dimensions and transformation
+                meta.update({
+                    'width': tile_size,
+                    'height': tile_size,
+                    'transform': transform
+                })
+                # Save the tile with updated metadata using rasterio
+                with rasterio.open(tile_path, 'w', **meta) as dst:
+                    dst.write(tile)
+                tile_counter += 1  # Increment the tile counter
+\end{lstlisting}
 \begin{figure}[H]
     \centering
     \includegraphics[width=0.9\textwidth]{crop.png}
-    \caption{Example of cropped pre- and post-wildfire images and their corresponding label}. \label{figcrop}
+    \caption{Example of cropped pre- and post-wildfire images and their corresponding label.} \label{figcrop}
 \end{figure}

 By combining the capabilities of rasterio for efficient geospatial data handling and the tifffile library for preserving the RGB data during saving, the original images are cropped into smaller RGB tiles.
This approach preserves the resolution and the georeferencing information of the images, preparing them to train DL applications. @@ -371,7 +371,7 @@ \subsection{VGG16 Implementation} \end{itemize} -\begin{lstlisting}[language=Python, caption={Custom Function to Feed GeoTIFF Files to the VGG16 Model: The function reads batches of 32 GeoTIFF files, shuffles them, and processes them into a three-dimensional array compatible with VGG16.}, label= lst:custom_generator] +\begin{lstlisting}[language=Python, caption={Custom Function to Feed GeoTIFF Files to the VGG16 Model: The function reads batches of 32 GeoTIFF files - shuffles them - and processes them into a three-dimensional array compatible with VGG16.}, label= lst:custom_generator] from sklearn.utils import shuffle # Define the base paths for training and testing @@ -471,7 +471,7 @@ \subsection{EfficientNet Implementation} \begin{figure}[H] \centering \includegraphics[width=0.9\textwidth]{6channel.png} - \caption{Representation of a 6 Channel RGB GeoTIFF Input. A: Representation of a 3-channel RGB GeoTIFF forested area \textit{before} a wildfire B: Visual example of a 3-channel RGB GeoTIFF forested area \textit{after} a wildfire.}. \label{fig1} + \caption{Representation of a 6 Channel RGB GeoTIFF Input. A: Representation of a 3-channel RGB GeoTIFF forested area \textit{before} a wildfire B: Visual example of a 3-channel RGB GeoTIFF forested area \textit{after} a wildfire.} \label{fig1} \end{figure} EfficientNet \citep{tan2019} is a CNN architecture that uniformly scales network width, depth, and resolution with a fixed set of scaling coefficients. EfficientNet’s architecture begins with a base model, EfficientNet-B0, designed to find the optimal baseline network configuration. The following versions of the network are further scaled versions of B0, offering multiple models for different computational budgets. @@ -667,6 +667,6 @@ \section{Conclusion} \bibliography{mybib} \bibliographystyle{unsrtnat} - - + + diff --git a/papers/Valeria_Martin/myst.yml b/papers/Valeria_Martin/myst.yml index 62cb8da3ef..1d8ca321a3 100644 --- a/papers/Valeria_Martin/myst.yml +++ b/papers/Valeria_Martin/myst.yml @@ -1,9 +1,11 @@ version: 1 +extends: ../proceedings.yml project: + doi: 10.25080/YADT7194 # Update this to match `scipy-2024-` the folder should be `` id: scipy-2024-Valeria_Martin title: Python-Based GeoImagery Dataset Development for Deep Learning-Driven Forest Wildfire Detection - subtitle: + description: In recent years, leveraging satellite imagery with deep learning architectures has become an effective approach for environmental monitoring tasks, including forest wildfire detection. This paper presents a Python-based methodology for gathering and using a labeled high-resolution satellite imagery dataset for forest wildfire detection. # Authors should have affiliations, emails and ORCIDs if available authors: - name: Valeria Martin @@ -15,7 +17,7 @@ project: email: jmorgan3@uwf.edu orcid: 0000-0003-2321-3765 affiliations: - - University of West Florida + - University of West Florida - name: K. 
Brent Venable email: bvenable@uwf.edu orcid: 0000-0002-1092-9759 @@ -84,16 +86,8 @@ project: - al-dabbagh2023uni - SEYDI2022108999 - Hunan - - 8113128 + - '8113128' - DBLP:RonnebergerFB15 - lecun - # A banner will be generated for you on publication, this is a placeholder - banner: banner.png - # The rest of the information shouldn't be modified - subject: Research Article - open_access: true - license: CC-BY-4.0 - venue: Scipy 2024 - date: 2024-07-10 site: - template: article-theme \ No newline at end of file + template: article-theme diff --git a/papers/Valeria_Martin/polygons.png b/papers/Valeria_Martin/polygons.png index e9f8036999..4f328ec4c9 100644 Binary files a/papers/Valeria_Martin/polygons.png and b/papers/Valeria_Martin/polygons.png differ diff --git a/papers/Valeria_Martin/thumbnail.png b/papers/Valeria_Martin/thumbnail.png new file mode 100644 index 0000000000..71a87c3515 Binary files /dev/null and b/papers/Valeria_Martin/thumbnail.png differ diff --git a/papers/alan_lujan/banner.png b/papers/alan_lujan/banner.png index e6a793bd6c..391831e724 100644 Binary files a/papers/alan_lujan/banner.png and b/papers/alan_lujan/banner.png differ diff --git a/papers/alan_lujan/main.md b/papers/alan_lujan/main.md index c6a88643fd..ffacee30d3 100644 --- a/papers/alan_lujan/main.md +++ b/papers/alan_lujan/main.md @@ -46,17 +46,17 @@ The arrangement of known data points, called the grid, significantly influences :label: tbl:grids :header-rows: 1 * - Grid - - Structure - - Geometry + - Structure + - Geometry * - Rectilinear - - Regular - - Rectangular mesh + - Regular + - Rectangular mesh * - Curvilinear - - Regular - - Quadrilateral mesh + - Regular + - Quadrilateral mesh * - Unstructured - - Irregular - - Random + - Irregular + - Random ``` ### Existing Interpolation Methods diff --git a/papers/alan_lujan/myst.yml b/papers/alan_lujan/myst.yml index 3b825b099e..64cbf6ffe5 100644 --- a/papers/alan_lujan/myst.yml +++ b/papers/alan_lujan/myst.yml @@ -1,12 +1,15 @@ version: 1 +extends: ../proceedings.yml site: template: article-theme project: + doi: 10.25080/FGCJ9164 # Update this to match `scipy-2024-` the folder should be `` id: scipy-2024-alan_lujan # Ensure your title is the same as in your `main.md` title: multinterp subtitle: A Unified Interface for Multivariate Interpolation in the Scientific Python Ecosystem + description: Multivariate interpolation is a fundamental tool in scientific computing used to approximate the values of a function between known data points in multiple dimensions. Despite its importance, the Python ecosystem offers a fragmented landscape of specialized tools for this task; the multinterp package was developed to address this challenge. 
# Authors should have affiliations, emails and ORCIDs if available authors: - name: Alan Lujan @@ -15,6 +18,7 @@ project: affiliations: - institution: Johns Hopkins University department: Department of Economics + ror: https://ror.org/00za53h95 - institution: Econ-ARK url: https://econ-ark.org/ corresponding: true @@ -29,6 +33,10 @@ project: # Add the abbreviations that you use in your paper here abbreviations: MyST: Markedly Structured Text + CPU: Central Processing Unit + GPU: Graphics Processing Unit + API: Application Programming Interface + RBF: Radial Basis Function # It is possible to explicitly ignore the `doi-exists` check for certain citation keys error_rules: - rule: doi-exists @@ -43,11 +51,3 @@ project: - Bradbury2018 - Pedregosa2011 - Paszke2019 - # A banner will be generated for you on publication, this is a placeholder - banner: banner.png - # The rest of the information shouldn't be modified - subject: Research Article - open_access: true - license: CC-BY-4.0 - venue: Scipy 2024 - date: 2024-07-10 diff --git a/papers/alan_lujan/thumbnail.png b/papers/alan_lujan/thumbnail.png new file mode 100644 index 0000000000..ac3eaf4a14 Binary files /dev/null and b/papers/alan_lujan/thumbnail.png differ diff --git a/papers/aleksandar_makelov/arxiv_template.bib b/papers/aleksandar_makelov/arxiv_template.bib deleted file mode 100644 index 95744c20fc..0000000000 --- a/papers/aleksandar_makelov/arxiv_template.bib +++ /dev/null @@ -1,11 +0,0 @@ -@inproceedings{Vaswani+2017, - author = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, \L ukasz and Polosukhin, Illia}, - booktitle = {Advances in Neural Information Processing Systems}, - pages = {}, - publisher = {Curran Associates, Inc.}, - title = {Attention is All you Need}, - url = {https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf}, - volume = {30}, - year = {2017} -} - diff --git a/papers/aleksandar_makelov/arxiv_template.bst b/papers/aleksandar_makelov/arxiv_template.bst deleted file mode 100644 index a85a0087d1..0000000000 --- a/papers/aleksandar_makelov/arxiv_template.bst +++ /dev/null @@ -1,1440 +0,0 @@ -%% File: `iclr2024.bst' -%% A copy of iclm2010.bst, which is a modification of `plainnl.bst' for use with natbib package -%% -%% Copyright 2010 Hal Daum\'e III -%% Modified by J. Fürnkranz -%% - Changed labels from (X and Y, 2000) to (X & Y, 2000) -%% -%% Copyright 1993-2007 Patrick W Daly -%% Max-Planck-Institut f\"ur Sonnensystemforschung -%% Max-Planck-Str. 2 -%% D-37191 Katlenburg-Lindau -%% Germany -%% E-mail: daly@mps.mpg.de -%% -%% This program can be redistributed and/or modified under the terms -%% of the LaTeX Project Public License Distributed from CTAN -%% archives in directory macros/latex/base/lppl.txt; either -%% version 1 of the License, or any later version. -%% - % Version and source file information: - % \ProvidesFile{icml2010.mbs}[2007/11/26 1.93 (PWD)] - % - % BibTeX `plainnat' family - % version 0.99b for BibTeX versions 0.99a or later, - % for LaTeX versions 2.09 and 2e. - % - % For use with the `natbib.sty' package; emulates the corresponding - % member of the `plain' family, but with author-year citations. - % - % With version 6.0 of `natbib.sty', it may also be used for numerical - % citations, while retaining the commands \citeauthor, \citefullauthor, - % and \citeyear to print the corresponding information. 
- % - % For version 7.0 of `natbib.sty', the KEY field replaces missing - % authors/editors, and the date is left blank in \bibitem. - % - % Includes field EID for the sequence/citation number of electronic journals - % which is used instead of page numbers. - % - % Includes fields ISBN and ISSN. - % - % Includes field URL for Internet addresses. - % - % Includes field DOI for Digital Object Idenfifiers. - % - % Works best with the url.sty package of Donald Arseneau. - % - % Works with identical authors and year are further sorted by - % citation key, to preserve any natural sequence. - % -ENTRY - { address - author - booktitle - chapter - doi - eid - edition - editor - howpublished - institution - isbn - issn - journal - key - month - note - number - organization - pages - publisher - school - series - title - type - url - volume - year - } - {} - { label extra.label sort.label short.list } - -INTEGERS { output.state before.all mid.sentence after.sentence after.block } - -FUNCTION {init.state.consts} -{ #0 'before.all := - #1 'mid.sentence := - #2 'after.sentence := - #3 'after.block := -} - -STRINGS { s t } - -FUNCTION {output.nonnull} -{ 's := - output.state mid.sentence = - { ", " * write$ } - { output.state after.block = - { add.period$ write$ - newline$ - "\newblock " write$ - } - { output.state before.all = - 'write$ - { add.period$ " " * write$ } - if$ - } - if$ - mid.sentence 'output.state := - } - if$ - s -} - -FUNCTION {output} -{ duplicate$ empty$ - 'pop$ - 'output.nonnull - if$ -} - -FUNCTION {output.check} -{ 't := - duplicate$ empty$ - { pop$ "empty " t * " in " * cite$ * warning$ } - 'output.nonnull - if$ -} - -FUNCTION {fin.entry} -{ add.period$ - write$ - newline$ -} - -FUNCTION {new.block} -{ output.state before.all = - 'skip$ - { after.block 'output.state := } - if$ -} - -FUNCTION {new.sentence} -{ output.state after.block = - 'skip$ - { output.state before.all = - 'skip$ - { after.sentence 'output.state := } - if$ - } - if$ -} - -FUNCTION {not} -{ { #0 } - { #1 } - if$ -} - -FUNCTION {and} -{ 'skip$ - { pop$ #0 } - if$ -} - -FUNCTION {or} -{ { pop$ #1 } - 'skip$ - if$ -} - -FUNCTION {new.block.checka} -{ empty$ - 'skip$ - 'new.block - if$ -} - -FUNCTION {new.block.checkb} -{ empty$ - swap$ empty$ - and - 'skip$ - 'new.block - if$ -} - -FUNCTION {new.sentence.checka} -{ empty$ - 'skip$ - 'new.sentence - if$ -} - -FUNCTION {new.sentence.checkb} -{ empty$ - swap$ empty$ - and - 'skip$ - 'new.sentence - if$ -} - -FUNCTION {field.or.null} -{ duplicate$ empty$ - { pop$ "" } - 'skip$ - if$ -} - -FUNCTION {emphasize} -{ duplicate$ empty$ - { pop$ "" } - { "\emph{" swap$ * "}" * } - if$ -} - -INTEGERS { nameptr namesleft numnames } - -FUNCTION {format.names} -{ 's := - #1 'nameptr := - s num.names$ 'numnames := - numnames 'namesleft := - { namesleft #0 > } - { s nameptr "{ff~}{vv~}{ll}{, jj}" format.name$ 't := - nameptr #1 > - { namesleft #1 > - { ", " * t * } - { numnames #2 > - { "," * } - 'skip$ - if$ - t "others" = - { " et~al." 
* } - { " and " * t * } - if$ - } - if$ - } - 't - if$ - nameptr #1 + 'nameptr := - namesleft #1 - 'namesleft := - } - while$ -} - -FUNCTION {format.key} -{ empty$ - { key field.or.null } - { "" } - if$ -} - -FUNCTION {format.authors} -{ author empty$ - { "" } - { author format.names } - if$ -} - -FUNCTION {format.editors} -{ editor empty$ - { "" } - { editor format.names - editor num.names$ #1 > - { " (eds.)" * } - { " (ed.)" * } - if$ - } - if$ -} - -FUNCTION {format.isbn} -{ isbn empty$ - { "" } - { new.block "ISBN " isbn * } - if$ -} - -FUNCTION {format.issn} -{ issn empty$ - { "" } - { new.block "ISSN " issn * } - if$ -} - -FUNCTION {format.url} -{ url empty$ - { "" } - { new.block "URL \url{" url * "}" * } - if$ -} - -FUNCTION {format.doi} -{ doi empty$ - { "" } - { new.block "\doi{" doi * "}" * } - if$ -} - -FUNCTION {format.title} -{ title empty$ - { "" } - { title "t" change.case$ } - if$ -} - -FUNCTION {format.full.names} -{'s := - #1 'nameptr := - s num.names$ 'numnames := - numnames 'namesleft := - { namesleft #0 > } - { s nameptr - "{vv~}{ll}" format.name$ 't := - nameptr #1 > - { - namesleft #1 > - { ", " * t * } - { - numnames #2 > - { "," * } - 'skip$ - if$ - t "others" = - { " et~al." * } - { " and " * t * } - if$ - } - if$ - } - 't - if$ - nameptr #1 + 'nameptr := - namesleft #1 - 'namesleft := - } - while$ -} - -FUNCTION {author.editor.full} -{ author empty$ - { editor empty$ - { "" } - { editor format.full.names } - if$ - } - { author format.full.names } - if$ -} - -FUNCTION {author.full} -{ author empty$ - { "" } - { author format.full.names } - if$ -} - -FUNCTION {editor.full} -{ editor empty$ - { "" } - { editor format.full.names } - if$ -} - -FUNCTION {make.full.names} -{ type$ "book" = - type$ "inbook" = - or - 'author.editor.full - { type$ "proceedings" = - 'editor.full - 'author.full - if$ - } - if$ -} - -FUNCTION {output.bibitem} -{ newline$ - "\bibitem[" write$ - label write$ - ")" make.full.names duplicate$ short.list = - { pop$ } - { * } - if$ - "]{" * write$ - cite$ write$ - "}" write$ - newline$ - "" - before.all 'output.state := -} - -FUNCTION {n.dashify} -{ 't := - "" - { t empty$ not } - { t #1 #1 substring$ "-" = - { t #1 #2 substring$ "--" = not - { "--" * - t #2 global.max$ substring$ 't := - } - { { t #1 #1 substring$ "-" = } - { "-" * - t #2 global.max$ substring$ 't := - } - while$ - } - if$ - } - { t #1 #1 substring$ * - t #2 global.max$ substring$ 't := - } - if$ - } - while$ -} - -FUNCTION {format.date} -{ year duplicate$ empty$ - { "empty year in " cite$ * warning$ - pop$ "" } - 'skip$ - if$ - month empty$ - 'skip$ - { month - " " * swap$ * - } - if$ - extra.label * -} - -FUNCTION {format.btitle} -{ title emphasize -} - -FUNCTION {tie.or.space.connect} -{ duplicate$ text.length$ #3 < - { "~" } - { " " } - if$ - swap$ * * -} - -FUNCTION {either.or.check} -{ empty$ - 'pop$ - { "can't use both " swap$ * " fields in " * cite$ * warning$ } - if$ -} - -FUNCTION {format.bvolume} -{ volume empty$ - { "" } - { "volume" volume tie.or.space.connect - series empty$ - 'skip$ - { " of " * series emphasize * } - if$ - "volume and number" number either.or.check - } - if$ -} - -FUNCTION {format.number.series} -{ volume empty$ - { number empty$ - { series field.or.null } - { output.state mid.sentence = - { "number" } - { "Number" } - if$ - number tie.or.space.connect - series empty$ - { "there's a number but no series in " cite$ * warning$ } - { " in " * series * } - if$ - } - if$ - } - { "" } - if$ -} - -FUNCTION {format.edition} -{ edition empty$ - { "" } - { 
output.state mid.sentence = - { edition "l" change.case$ " edition" * } - { edition "t" change.case$ " edition" * } - if$ - } - if$ -} - -INTEGERS { multiresult } - -FUNCTION {multi.page.check} -{ 't := - #0 'multiresult := - { multiresult not - t empty$ not - and - } - { t #1 #1 substring$ - duplicate$ "-" = - swap$ duplicate$ "," = - swap$ "+" = - or or - { #1 'multiresult := } - { t #2 global.max$ substring$ 't := } - if$ - } - while$ - multiresult -} - -FUNCTION {format.pages} -{ pages empty$ - { "" } - { pages multi.page.check - { "pp.\ " pages n.dashify tie.or.space.connect } - { "pp.\ " pages tie.or.space.connect } - if$ - } - if$ -} - -FUNCTION {format.eid} -{ eid empty$ - { "" } - { "art." eid tie.or.space.connect } - if$ -} - -FUNCTION {format.vol.num.pages} -{ volume field.or.null - number empty$ - 'skip$ - { "\penalty0 (" number * ")" * * - volume empty$ - { "there's a number but no volume in " cite$ * warning$ } - 'skip$ - if$ - } - if$ - pages empty$ - 'skip$ - { duplicate$ empty$ - { pop$ format.pages } - { ":\penalty0 " * pages n.dashify * } - if$ - } - if$ -} - -FUNCTION {format.vol.num.eid} -{ volume field.or.null - number empty$ - 'skip$ - { "\penalty0 (" number * ")" * * - volume empty$ - { "there's a number but no volume in " cite$ * warning$ } - 'skip$ - if$ - } - if$ - eid empty$ - 'skip$ - { duplicate$ empty$ - { pop$ format.eid } - { ":\penalty0 " * eid * } - if$ - } - if$ -} - -FUNCTION {format.chapter.pages} -{ chapter empty$ - 'format.pages - { type empty$ - { "chapter" } - { type "l" change.case$ } - if$ - chapter tie.or.space.connect - pages empty$ - 'skip$ - { ", " * format.pages * } - if$ - } - if$ -} - -FUNCTION {format.in.ed.booktitle} -{ booktitle empty$ - { "" } - { editor empty$ - { "In " booktitle emphasize * } - { "In " format.editors * ", " * booktitle emphasize * } - if$ - } - if$ -} - -FUNCTION {empty.misc.check} -{ author empty$ title empty$ howpublished empty$ - month empty$ year empty$ note empty$ - and and and and and - key empty$ not and - { "all relevant fields are empty in " cite$ * warning$ } - 'skip$ - if$ -} - -FUNCTION {format.thesis.type} -{ type empty$ - 'skip$ - { pop$ - type "t" change.case$ - } - if$ -} - -FUNCTION {format.tr.number} -{ type empty$ - { "Technical Report" } - 'type - if$ - number empty$ - { "t" change.case$ } - { number tie.or.space.connect } - if$ -} - -FUNCTION {format.article.crossref} -{ key empty$ - { journal empty$ - { "need key or journal for " cite$ * " to crossref " * crossref * - warning$ - "" - } - { "In \emph{" journal * "}" * } - if$ - } - { "In " } - if$ - " \citet{" * crossref * "}" * -} - -FUNCTION {format.book.crossref} -{ volume empty$ - { "empty volume in " cite$ * "'s crossref of " * crossref * warning$ - "In " - } - { "Volume" volume tie.or.space.connect - " of " * - } - if$ - editor empty$ - editor field.or.null author field.or.null = - or - { key empty$ - { series empty$ - { "need editor, key, or series for " cite$ * " to crossref " * - crossref * warning$ - "" * - } - { "\emph{" * series * "}" * } - if$ - } - 'skip$ - if$ - } - 'skip$ - if$ - " \citet{" * crossref * "}" * -} - -FUNCTION {format.incoll.inproc.crossref} -{ editor empty$ - editor field.or.null author field.or.null = - or - { key empty$ - { booktitle empty$ - { "need editor, key, or booktitle for " cite$ * " to crossref " * - crossref * warning$ - "" - } - { "In \emph{" booktitle * "}" * } - if$ - } - { "In " } - if$ - } - { "In " } - if$ - " \citet{" * crossref * "}" * -} - -FUNCTION {article} -{ output.bibitem - format.authors 
"author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - crossref missing$ - { journal emphasize "journal" output.check - eid empty$ - { format.vol.num.pages output } - { format.vol.num.eid output } - if$ - format.date "year" output.check - } - { format.article.crossref output.nonnull - eid empty$ - { format.pages output } - { format.eid output } - if$ - } - if$ - format.issn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {book} -{ output.bibitem - author empty$ - { format.editors "author and editor" output.check - editor format.key output - } - { format.authors output.nonnull - crossref missing$ - { "author and editor" editor either.or.check } - 'skip$ - if$ - } - if$ - new.block - format.btitle "title" output.check - crossref missing$ - { format.bvolume output - new.block - format.number.series output - new.sentence - publisher "publisher" output.check - address output - } - { new.block - format.book.crossref output.nonnull - } - if$ - format.edition output - format.date "year" output.check - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {booklet} -{ output.bibitem - format.authors output - author format.key output - new.block - format.title "title" output.check - howpublished address new.block.checkb - howpublished output - address output - format.date output - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {inbook} -{ output.bibitem - author empty$ - { format.editors "author and editor" output.check - editor format.key output - } - { format.authors output.nonnull - crossref missing$ - { "author and editor" editor either.or.check } - 'skip$ - if$ - } - if$ - new.block - format.btitle "title" output.check - crossref missing$ - { format.bvolume output - format.chapter.pages "chapter and pages" output.check - new.block - format.number.series output - new.sentence - publisher "publisher" output.check - address output - } - { format.chapter.pages "chapter and pages" output.check - new.block - format.book.crossref output.nonnull - } - if$ - format.edition output - format.date "year" output.check - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {incollection} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - crossref missing$ - { format.in.ed.booktitle "booktitle" output.check - format.bvolume output - format.number.series output - format.chapter.pages output - new.sentence - publisher "publisher" output.check - address output - format.edition output - format.date "year" output.check - } - { format.incoll.inproc.crossref output.nonnull - format.chapter.pages output - } - if$ - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {inproceedings} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - crossref missing$ - { format.in.ed.booktitle "booktitle" output.check - format.bvolume output - format.number.series output - format.pages output - address empty$ - { organization publisher new.sentence.checkb - organization output - publisher output - format.date "year" output.check - } - { address output.nonnull - format.date "year" output.check - 
new.sentence - organization output - publisher output - } - if$ - } - { format.incoll.inproc.crossref output.nonnull - format.pages output - } - if$ - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {conference} { inproceedings } - -FUNCTION {manual} -{ output.bibitem - format.authors output - author format.key output - new.block - format.btitle "title" output.check - organization address new.block.checkb - organization output - address output - format.edition output - format.date output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {mastersthesis} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - "Master's thesis" format.thesis.type output.nonnull - school "school" output.check - address output - format.date "year" output.check - format.url output - new.block - note output - fin.entry -} - -FUNCTION {misc} -{ output.bibitem - format.authors output - author format.key output - title howpublished new.block.checkb - format.title output - howpublished new.block.checka - howpublished output - format.date output - format.issn output - format.url output - new.block - note output - fin.entry - empty.misc.check -} - -FUNCTION {phdthesis} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.btitle "title" output.check - new.block - "PhD thesis" format.thesis.type output.nonnull - school "school" output.check - address output - format.date "year" output.check - format.url output - new.block - note output - fin.entry -} - -FUNCTION {proceedings} -{ output.bibitem - format.editors output - editor format.key output - new.block - format.btitle "title" output.check - format.bvolume output - format.number.series output - address output - format.date "year" output.check - new.sentence - organization output - publisher output - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {techreport} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - format.tr.number output.nonnull - institution "institution" output.check - address output - format.date "year" output.check - format.url output - new.block - note output - fin.entry -} - -FUNCTION {unpublished} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - note "note" output.check - format.date output - format.url output - fin.entry -} - -FUNCTION {default.type} { misc } - - -MACRO {jan} {"January"} - -MACRO {feb} {"February"} - -MACRO {mar} {"March"} - -MACRO {apr} {"April"} - -MACRO {may} {"May"} - -MACRO {jun} {"June"} - -MACRO {jul} {"July"} - -MACRO {aug} {"August"} - -MACRO {sep} {"September"} - -MACRO {oct} {"October"} - -MACRO {nov} {"November"} - -MACRO {dec} {"December"} - - - -MACRO {acmcs} {"ACM Computing Surveys"} - -MACRO {acta} {"Acta Informatica"} - -MACRO {cacm} {"Communications of the ACM"} - -MACRO {ibmjrd} {"IBM Journal of Research and Development"} - -MACRO {ibmsj} {"IBM Systems Journal"} - -MACRO {ieeese} {"IEEE Transactions on Software Engineering"} - -MACRO {ieeetc} {"IEEE Transactions on Computers"} - -MACRO {ieeetcad} - {"IEEE Transactions on Computer-Aided Design of Integrated Circuits"} - -MACRO {ipl} {"Information Processing Letters"} - -MACRO {jacm} 
{"Journal of the ACM"} - -MACRO {jcss} {"Journal of Computer and System Sciences"} - -MACRO {scp} {"Science of Computer Programming"} - -MACRO {sicomp} {"SIAM Journal on Computing"} - -MACRO {tocs} {"ACM Transactions on Computer Systems"} - -MACRO {tods} {"ACM Transactions on Database Systems"} - -MACRO {tog} {"ACM Transactions on Graphics"} - -MACRO {toms} {"ACM Transactions on Mathematical Software"} - -MACRO {toois} {"ACM Transactions on Office Information Systems"} - -MACRO {toplas} {"ACM Transactions on Programming Languages and Systems"} - -MACRO {tcs} {"Theoretical Computer Science"} - - -READ - -FUNCTION {sortify} -{ purify$ - "l" change.case$ -} - -INTEGERS { len } - -FUNCTION {chop.word} -{ 's := - 'len := - s #1 len substring$ = - { s len #1 + global.max$ substring$ } - 's - if$ -} - -FUNCTION {format.lab.names} -{ 's := - s #1 "{vv~}{ll}" format.name$ - s num.names$ duplicate$ - #2 > - { pop$ " et~al." * } - { #2 < - 'skip$ - { s #2 "{ff }{vv }{ll}{ jj}" format.name$ "others" = - { " et~al." * } - { " \& " * s #2 "{vv~}{ll}" format.name$ * } - if$ - } - if$ - } - if$ -} - -FUNCTION {author.key.label} -{ author empty$ - { key empty$ - { cite$ #1 #3 substring$ } - 'key - if$ - } - { author format.lab.names } - if$ -} - -FUNCTION {author.editor.key.label} -{ author empty$ - { editor empty$ - { key empty$ - { cite$ #1 #3 substring$ } - 'key - if$ - } - { editor format.lab.names } - if$ - } - { author format.lab.names } - if$ -} - -FUNCTION {author.key.organization.label} -{ author empty$ - { key empty$ - { organization empty$ - { cite$ #1 #3 substring$ } - { "The " #4 organization chop.word #3 text.prefix$ } - if$ - } - 'key - if$ - } - { author format.lab.names } - if$ -} - -FUNCTION {editor.key.organization.label} -{ editor empty$ - { key empty$ - { organization empty$ - { cite$ #1 #3 substring$ } - { "The " #4 organization chop.word #3 text.prefix$ } - if$ - } - 'key - if$ - } - { editor format.lab.names } - if$ -} - -FUNCTION {calc.short.authors} -{ type$ "book" = - type$ "inbook" = - or - 'author.editor.key.label - { type$ "proceedings" = - 'editor.key.organization.label - { type$ "manual" = - 'author.key.organization.label - 'author.key.label - if$ - } - if$ - } - if$ - 'short.list := -} - -FUNCTION {calc.label} -{ calc.short.authors - short.list - "(" - * - year duplicate$ empty$ - short.list key field.or.null = or - { pop$ "" } - 'skip$ - if$ - * - 'label := -} - -FUNCTION {sort.format.names} -{ 's := - #1 'nameptr := - "" - s num.names$ 'numnames := - numnames 'namesleft := - { namesleft #0 > } - { - s nameptr "{vv{ } }{ll{ }}{ ff{ }}{ jj{ }}" format.name$ 't := - nameptr #1 > - { - " " * - namesleft #1 = t "others" = and - { "zzzzz" * } - { numnames #2 > nameptr #2 = and - { "zz" * year field.or.null * " " * } - 'skip$ - if$ - t sortify * - } - if$ - } - { t sortify * } - if$ - nameptr #1 + 'nameptr := - namesleft #1 - 'namesleft := - } - while$ -} - -FUNCTION {sort.format.title} -{ 't := - "A " #2 - "An " #3 - "The " #4 t chop.word - chop.word - chop.word - sortify - #1 global.max$ substring$ -} - -FUNCTION {author.sort} -{ author empty$ - { key empty$ - { "to sort, need author or key in " cite$ * warning$ - "" - } - { key sortify } - if$ - } - { author sort.format.names } - if$ -} - -FUNCTION {author.editor.sort} -{ author empty$ - { editor empty$ - { key empty$ - { "to sort, need author, editor, or key in " cite$ * warning$ - "" - } - { key sortify } - if$ - } - { editor sort.format.names } - if$ - } - { author sort.format.names } - if$ -} - -FUNCTION 
{author.organization.sort} -{ author empty$ - { organization empty$ - { key empty$ - { "to sort, need author, organization, or key in " cite$ * warning$ - "" - } - { key sortify } - if$ - } - { "The " #4 organization chop.word sortify } - if$ - } - { author sort.format.names } - if$ -} - -FUNCTION {editor.organization.sort} -{ editor empty$ - { organization empty$ - { key empty$ - { "to sort, need editor, organization, or key in " cite$ * warning$ - "" - } - { key sortify } - if$ - } - { "The " #4 organization chop.word sortify } - if$ - } - { editor sort.format.names } - if$ -} - - -FUNCTION {presort} -{ calc.label - label sortify - " " - * - type$ "book" = - type$ "inbook" = - or - 'author.editor.sort - { type$ "proceedings" = - 'editor.organization.sort - { type$ "manual" = - 'author.organization.sort - 'author.sort - if$ - } - if$ - } - if$ - " " - * - year field.or.null sortify - * - " " - * - cite$ - * - #1 entry.max$ substring$ - 'sort.label := - sort.label * - #1 entry.max$ substring$ - 'sort.key$ := -} - -ITERATE {presort} - -SORT - -STRINGS { longest.label last.label next.extra } - -INTEGERS { longest.label.width last.extra.num number.label } - -FUNCTION {initialize.longest.label} -{ "" 'longest.label := - #0 int.to.chr$ 'last.label := - "" 'next.extra := - #0 'longest.label.width := - #0 'last.extra.num := - #0 'number.label := -} - -FUNCTION {forward.pass} -{ last.label label = - { last.extra.num #1 + 'last.extra.num := - last.extra.num int.to.chr$ 'extra.label := - } - { "a" chr.to.int$ 'last.extra.num := - "" 'extra.label := - label 'last.label := - } - if$ - number.label #1 + 'number.label := -} - -FUNCTION {reverse.pass} -{ next.extra "b" = - { "a" 'extra.label := } - 'skip$ - if$ - extra.label 'next.extra := - extra.label - duplicate$ empty$ - 'skip$ - { "{\natexlab{" swap$ * "}}" * } - if$ - 'extra.label := - label extra.label * 'label := -} - -EXECUTE {initialize.longest.label} - -ITERATE {forward.pass} - -REVERSE {reverse.pass} - -FUNCTION {bib.sort.order} -{ sort.label 'sort.key$ := -} - -ITERATE {bib.sort.order} - -SORT - -FUNCTION {begin.bib} -{ preamble$ empty$ - 'skip$ - { preamble$ write$ newline$ } - if$ - "\begin{thebibliography}{" number.label int.to.str$ * "}" * - write$ newline$ - "\providecommand{\natexlab}[1]{#1}" - write$ newline$ - "\providecommand{\url}[1]{\texttt{#1}}" - write$ newline$ - "\expandafter\ifx\csname urlstyle\endcsname\relax" - write$ newline$ - " \providecommand{\doi}[1]{doi: #1}\else" - write$ newline$ - " \providecommand{\doi}{doi: \begingroup \urlstyle{rm}\Url}\fi" - write$ newline$ -} - -EXECUTE {begin.bib} - -EXECUTE {init.state.consts} - -ITERATE {call.type$} - -FUNCTION {end.bib} -{ newline$ - "\end{thebibliography}" write$ newline$ -} - -EXECUTE {end.bib} diff --git a/papers/aleksandar_makelov/arxiv_template.sty b/papers/aleksandar_makelov/arxiv_template.sty deleted file mode 100644 index 84c7b6b33e..0000000000 --- a/papers/aleksandar_makelov/arxiv_template.sty +++ /dev/null @@ -1,252 +0,0 @@ -%%%% COLM Macros (LaTex) -%%%% Adapted by Hugo Larochelle from the NIPS stylefile Macros -%%%% Style File -%%%% Dec 12, 1990 Rev Aug 14, 1991; Sept, 1995; April, 1997; April, 1999; October 2014 - -% This file can be used with Latex2e whether running in main mode, or -% 2.09 compatibility mode. 
-% -% If using main mode, you need to include the commands -% \documentclass{article} -% \usepackage{colm14submit_e} -% - -% Palatino font -\RequirePackage{tgpagella} % text only -\RequirePackage{mathpazo} % math & text - - -% Change the overall width of the page. If these parameters are -% changed, they will require corresponding changes in the -% maketitle section. -% -\usepackage{eso-pic} % used by \AddToShipoutPicture -\RequirePackage{fancyhdr} -\RequirePackage{natbib} - -% modification to natbib citations -\setcitestyle{authoryear,round,citesep={;},aysep={,},yysep={;}} - -\renewcommand{\topfraction}{0.95} % let figure take up nearly whole page -\renewcommand{\textfraction}{0.05} % let figure take up nearly whole page - -% Define colmfinal, set to true if colmfinalcopy is defined -\newif\ifcolmfinal -\colmfinalfalse -\def\colmfinalcopy{\colmfinaltrue} -\font\colmtenhv = phvb at 8pt - -% Specify the dimensions of each page - -\setlength{\paperheight}{11in} -\setlength{\paperwidth}{8.5in} - - -\oddsidemargin .5in % Note \oddsidemargin = \evensidemargin -\evensidemargin .5in -\marginparwidth 0.07 true in -%\marginparwidth 0.75 true in -%\topmargin 0 true pt % Nominal distance from top of page to top of -%\topmargin 0.125in -\topmargin -0.625in -\addtolength{\headsep}{0.25in} -\textheight 9.0 true in % Height of text (including footnotes & figures) -\textwidth 5.5 true in % Width of text line. -\widowpenalty=10000 -\clubpenalty=10000 - -% \thispagestyle{empty} \pagestyle{empty} -\flushbottom \sloppy - -% We're never going to need a table of contents, so just flush it to -% save space --- suggested by drstrip@sandia-2 -\def\addcontentsline#1#2#3{} - -% Title stuff, taken from deproc. -\def\maketitle{\par -\begingroup - \def\thefootnote{\fnsymbol{footnote}} - \def\@makefnmark{\hbox to 0pt{$^{\@thefnmark}$\hss}} % for perfect author - % name centering -% The footnote-mark was overlapping the footnote-text, -% added the following to fix this problem (MK) - \long\def\@makefntext##1{\parindent 1em\noindent - \hbox to1.8em{\hss $\m@th ^{\@thefnmark}$}##1} - \@maketitle \@thanks -\endgroup -\setcounter{footnote}{0} -\let\maketitle\relax \let\@maketitle\relax -\gdef\@thanks{}\gdef\@author{}\gdef\@title{}\let\thanks\relax} - -% The toptitlebar has been raised to top-justify the first page - -\usepackage{fancyhdr} -\pagestyle{fancy} -% \renewcommand{\headrulewidth}{1.5pt} -\renewcommand{\headrulewidth}{0pt} -\fancyhead{} - -% Title (includes both anonimized and non-anonimized versions) -\def\@maketitle{\vbox{\hsize\textwidth -%\linewidth\hsize \vskip 0.1in \toptitlebar \centering -{\Large\bf \@title\par} -%\bottomtitlebar % \vskip 0.1in % minus -\ifcolmfinal - % \lhead{Preprint. 
Under review.} - \def\And{\end{tabular}\hfil\linebreak[0]\hfil - \begin{tabular}[t]{l}\bf\rule{\z@}{24pt}\ignorespaces}% - \def\AND{\end{tabular}\hfil\linebreak[4]\hfil - \begin{tabular}[t]{l}\bf\rule{\z@}{24pt}\ignorespaces}% - \begin{tabular}[t]{l}\bf\rule{\z@}{24pt}\@author\end{tabular}% -\else - \lhead{Under review as a conference paper at COLM 2024} - \def\And{\end{tabular}\hfil\linebreak[0]\hfil - \begin{tabular}[t]{l}\bf\rule{\z@}{24pt}\ignorespaces}% - \def\AND{\end{tabular}\hfil\linebreak[4]\hfil - \begin{tabular}[t]{l}\bf\rule{\z@}{24pt}\ignorespaces}% - \begin{tabular}[t]{l}\bf\rule{\z@}{24pt}Anonymous authors\\Paper under double-blind review\end{tabular}% -\fi -\vskip 0.3in minus 0.1in}} - -\renewenvironment{abstract}{\vskip.075in\centerline{\large\bf -Abstract}\vspace{0.5ex}\begin{quote}}{\par\end{quote}\vskip 1ex} - -% sections with less space -\def\section{\@startsection {section}{1}{\z@}{-2.0ex plus - -0.5ex minus -.2ex}{1.5ex plus 0.3ex -minus0.2ex}{\large\bf\raggedright}} - -\def\subsection{\@startsection{subsection}{2}{\z@}{-1.8ex plus --0.5ex minus -.2ex}{0.8ex plus .2ex}{\normalsize\raggedright}} -\def\subsubsection{\@startsection{subsubsection}{3}{\z@}{-1.5ex -plus -0.5ex minus -.2ex}{0.5ex plus -.2ex}{\normalsize\raggedright}} -\def\paragraph{\@startsection{paragraph}{4}{\z@}{1.5ex plus -0.5ex minus .2ex}{-1em}{\normalsize\bf}} -\def\subparagraph{\@startsection{subparagraph}{5}{\z@}{1.5ex plus - 0.5ex minus .2ex}{-1em}{\normalsize}} -\def\subsubsubsection{\vskip -5pt{\noindent\normalsize\rm\raggedright}} - - -% Footnotes -\footnotesep 6.65pt % -\skip\footins 9pt plus 4pt minus 2pt -\def\footnoterule{\kern-3pt \hrule width 12pc \kern 2.6pt } -\setcounter{footnote}{0} - -% Lists and paragraphs -\parindent 0pt -\topsep 4pt plus 1pt minus 2pt -\partopsep 1pt plus 0.5pt minus 0.5pt -\itemsep 2pt plus 1pt minus 0.5pt -\parsep 2pt plus 1pt minus 0.5pt -\parskip .5pc - - -%\leftmargin2em -\leftmargin3pc -\leftmargini\leftmargin \leftmarginii 2em -\leftmarginiii 1.5em \leftmarginiv 1.0em \leftmarginv .5em - -%\labelsep \labelsep 5pt - -\def\@listi{\leftmargin\leftmargini} -\def\@listii{\leftmargin\leftmarginii - \labelwidth\leftmarginii\advance\labelwidth-\labelsep - \topsep 2pt plus 1pt minus 0.5pt - \parsep 1pt plus 0.5pt minus 0.5pt - \itemsep \parsep} -\def\@listiii{\leftmargin\leftmarginiii - \labelwidth\leftmarginiii\advance\labelwidth-\labelsep - \topsep 1pt plus 0.5pt minus 0.5pt - \parsep \z@ \partopsep 0.5pt plus 0pt minus 0.5pt - \itemsep \topsep} -\def\@listiv{\leftmargin\leftmarginiv - \labelwidth\leftmarginiv\advance\labelwidth-\labelsep} -\def\@listv{\leftmargin\leftmarginv - \labelwidth\leftmarginv\advance\labelwidth-\labelsep} -\def\@listvi{\leftmargin\leftmarginvi - \labelwidth\leftmarginvi\advance\labelwidth-\labelsep} - -\abovedisplayskip 7pt plus2pt minus5pt% -\belowdisplayskip \abovedisplayskip -\abovedisplayshortskip 0pt plus3pt% -\belowdisplayshortskip 4pt plus3pt minus3pt% - -% Less leading in most fonts (due to the narrow columns) -% The choices were between 1-pt and 1.5-pt leading -%\def\@normalsize{\@setsize\normalsize{11pt}\xpt\@xpt} % got rid of @ (MK) -\def\normalsize{\@setsize\normalsize{11pt}\xpt\@xpt} -\def\small{\@setsize\small{10pt}\ixpt\@ixpt} -\def\footnotesize{\@setsize\footnotesize{10pt}\ixpt\@ixpt} -\def\scriptsize{\@setsize\scriptsize{8pt}\viipt\@viipt} -\def\tiny{\@setsize\tiny{7pt}\vipt\@vipt} -\def\large{\@setsize\large{14pt}\xiipt\@xiipt} -\def\Large{\@setsize\Large{16pt}\xivpt\@xivpt} 
-\def\LARGE{\@setsize\LARGE{20pt}\xviipt\@xviipt} -\def\huge{\@setsize\huge{23pt}\xxpt\@xxpt} -\def\Huge{\@setsize\Huge{28pt}\xxvpt\@xxvpt} - -\def\toptitlebar{\hrule height4pt\vskip .25in\vskip-\parskip} - -\def\bottomtitlebar{\vskip .29in\vskip-\parskip\hrule height1pt\vskip -.09in} % -%Reduced second vskip to compensate for adding the strut in \@author - - -%% % Vertical Ruler -%% % This code is, largely, from the CVPR 2010 conference style file -%% % ----- define vruler -%% \makeatletter -%% \newbox\colmrulerbox -%% \newcount\colmrulercount -%% \newdimen\colmruleroffset -%% \newdimen\cv@lineheight -%% \newdimen\cv@boxheight -%% \newbox\cv@tmpbox -%% \newcount\cv@refno -%% \newcount\cv@tot -%% % NUMBER with left flushed zeros \fillzeros[] -%% \newcount\cv@tmpc@ \newcount\cv@tmpc -%% \def\fillzeros[#1]#2{\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi -%% \cv@tmpc=1 % -%% \loop\ifnum\cv@tmpc@<10 \else \divide\cv@tmpc@ by 10 \advance\cv@tmpc by 1 \fi -%% \ifnum\cv@tmpc@=10\relax\cv@tmpc@=11\relax\fi \ifnum\cv@tmpc@>10 \repeat -%% \ifnum#2<0\advance\cv@tmpc1\relax-\fi -%% \loop\ifnum\cv@tmpc<#1\relax0\advance\cv@tmpc1\relax\fi \ifnum\cv@tmpc<#1 \repeat -%% \cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi \relax\the\cv@tmpc@}% -%% % \makevruler[][][][][] -%% \def\makevruler[#1][#2][#3][#4][#5]{\begingroup\offinterlineskip -%% \textheight=#5\vbadness=10000\vfuzz=120ex\overfullrule=0pt% -%% \global\setbox\colmrulerbox=\vbox to \textheight{% -%% {\parskip=0pt\hfuzz=150em\cv@boxheight=\textheight -%% \cv@lineheight=#1\global\colmrulercount=#2% -%% \cv@tot\cv@boxheight\divide\cv@tot\cv@lineheight\advance\cv@tot2% -%% \cv@refno1\vskip-\cv@lineheight\vskip1ex% -%% \loop\setbox\cv@tmpbox=\hbox to0cm{{\colmtenhv\hfil\fillzeros[#4]\colmrulercount}}% -%% \ht\cv@tmpbox\cv@lineheight\dp\cv@tmpbox0pt\box\cv@tmpbox\break -%% \advance\cv@refno1\global\advance\colmrulercount#3\relax -%% \ifnum\cv@refno<\cv@tot\repeat}}\endgroup}% -%% \makeatother -%% % ----- end of vruler - -%% % \makevruler[][][][][] -%% \def\colmruler#1{\makevruler[12pt][#1][1][3][0.993\textheight]\usebox{\colmrulerbox}} -%% \AddToShipoutPicture{% -%% \ifcolmfinal\else -%% \colmruleroffset=\textheight -%% \advance\colmruleroffset by -3.7pt -%% \color[rgb]{.7,.7,.7} -%% \AtTextUpperLeft{% -%% \put(\LenToUnit{-35pt},\LenToUnit{-\colmruleroffset}){%left ruler -%% \colmruler{\colmrulercount}} -%% } -%% \fi -%% } -%%% To add a vertical bar on the side -%\AddToShipoutPicture{ -%\AtTextLowerLeft{ -%\hspace*{-1.8cm} -%\colorbox[rgb]{0.7,0.7,0.7}{\small \parbox[b][\textheight]{0.1cm}{}}} -%} diff --git a/papers/aleksandar_makelov/banner.png b/papers/aleksandar_makelov/banner.png index c5dd028e26..dc7a07a9ef 100644 Binary files a/papers/aleksandar_makelov/banner.png and b/papers/aleksandar_makelov/banner.png differ diff --git a/papers/aleksandar_makelov/fancyhdr.sty b/papers/aleksandar_makelov/fancyhdr.sty deleted file mode 100644 index 77ed4e3012..0000000000 --- a/papers/aleksandar_makelov/fancyhdr.sty +++ /dev/null @@ -1,485 +0,0 @@ -% fancyhdr.sty version 3.2 -% Fancy headers and footers for LaTeX. -% Piet van Oostrum, -% Dept of Computer and Information Sciences, University of Utrecht, -% Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands -% Telephone: +31 30 2532180. 
Email: piet@cs.uu.nl -% ======================================================================== -% LICENCE: -% This file may be distributed under the terms of the LaTeX Project Public -% License, as described in lppl.txt in the base LaTeX distribution. -% Either version 1 or, at your option, any later version. -% ======================================================================== -% MODIFICATION HISTORY: -% Sep 16, 1994 -% version 1.4: Correction for use with \reversemargin -% Sep 29, 1994: -% version 1.5: Added the \iftopfloat, \ifbotfloat and \iffloatpage commands -% Oct 4, 1994: -% version 1.6: Reset single spacing in headers/footers for use with -% setspace.sty or doublespace.sty -% Oct 4, 1994: -% version 1.7: changed \let\@mkboth\markboth to -% \def\@mkboth{\protect\markboth} to make it more robust -% Dec 5, 1994: -% version 1.8: corrections for amsbook/amsart: define \@chapapp and (more -% importantly) use the \chapter/sectionmark definitions from ps@headings if -% they exist (which should be true for all standard classes). -% May 31, 1995: -% version 1.9: The proposed \renewcommand{\headrulewidth}{\iffloatpage... -% construction in the doc did not work properly with the fancyplain style. -% June 1, 1995: -% version 1.91: The definition of \@mkboth wasn't restored on subsequent -% \pagestyle{fancy}'s. -% June 1, 1995: -% version 1.92: The sequence \pagestyle{fancyplain} \pagestyle{plain} -% \pagestyle{fancy} would erroneously select the plain version. -% June 1, 1995: -% version 1.93: \fancypagestyle command added. -% Dec 11, 1995: -% version 1.94: suggested by Conrad Hughes -% CJCH, Dec 11, 1995: added \footruleskip to allow control over footrule -% position (old hardcoded value of .3\normalbaselineskip is far too high -% when used with very small footer fonts). -% Jan 31, 1996: -% version 1.95: call \@normalsize in the reset code if that is defined, -% otherwise \normalsize. -% this is to solve a problem with ucthesis.cls, as this doesn't -% define \@currsize. Unfortunately for latex209 calling \normalsize doesn't -% work as this is optimized to do very little, so there \@normalsize should -% be called. Hopefully this code works for all versions of LaTeX known to -% mankind. -% April 25, 1996: -% version 1.96: initialize \headwidth to a magic (negative) value to catch -% most common cases that people change it before calling \pagestyle{fancy}. -% Note it can't be initialized when reading in this file, because -% \textwidth could be changed afterwards. This is quite probable. -% We also switch to \MakeUppercase rather than \uppercase and introduce a -% \nouppercase command for use in headers. and footers. -% May 3, 1996: -% version 1.97: Two changes: -% 1. Undo the change in version 1.8 (using the pagestyle{headings} defaults -% for the chapter and section marks. The current version of amsbook and -% amsart classes don't seem to need them anymore. Moreover the standard -% latex classes don't use \markboth if twoside isn't selected, and this is -% confusing as \leftmark doesn't work as expected. -% 2. include a call to \ps@empty in ps@@fancy. This is to solve a problem -% in the amsbook and amsart classes, that make global changes to \topskip, -% which are reset in \ps@empty. Hopefully this doesn't break other things. -% May 7, 1996: -% version 1.98: -% Added % after the line \def\nouppercase -% May 7, 1996: -% version 1.99: This is the alpha version of fancyhdr 2.0 -% Introduced the new commands \fancyhead, \fancyfoot, and \fancyhf. 
-% Changed \headrulewidth, \footrulewidth, \footruleskip to -% macros rather than length parameters, In this way they can be -% conditionalized and they don't consume length registers. There is no need -% to have them as length registers unless you want to do calculations with -% them, which is unlikely. Note that this may make some uses of them -% incompatible (i.e. if you have a file that uses \setlength or \xxxx=) -% May 10, 1996: -% version 1.99a: -% Added a few more % signs -% May 10, 1996: -% version 1.99b: -% Changed the syntax of \f@nfor to be resistent to catcode changes of := -% Removed the [1] from the defs of \lhead etc. because the parameter is -% consumed by the \@[xy]lhead etc. macros. -% June 24, 1997: -% version 1.99c: -% corrected \nouppercase to also include the protected form of \MakeUppercase -% \global added to manipulation of \headwidth. -% \iffootnote command added. -% Some comments added about \@fancyhead and \@fancyfoot. -% Aug 24, 1998 -% version 1.99d -% Changed the default \ps@empty to \ps@@empty in order to allow -% \fancypagestyle{empty} redefinition. -% Oct 11, 2000 -% version 2.0 -% Added LPPL license clause. -% -% A check for \headheight is added. An errormessage is given (once) if the -% header is too large. Empty headers don't generate the error even if -% \headheight is very small or even 0pt. -% Warning added for the use of 'E' option when twoside option is not used. -% In this case the 'E' fields will never be used. -% -% Mar 10, 2002 -% version 2.1beta -% New command: \fancyhfoffset[place]{length} -% defines offsets to be applied to the header/footer to let it stick into -% the margins (if length > 0). -% place is like in fancyhead, except that only E,O,L,R can be used. -% This replaces the old calculation based on \headwidth and the marginpar -% area. -% \headwidth will be dynamically calculated in the headers/footers when -% this is used. -% -% Mar 26, 2002 -% version 2.1beta2 -% \fancyhfoffset now also takes h,f as possible letters in the argument to -% allow the header and footer widths to be different. -% New commands \fancyheadoffset and \fancyfootoffset added comparable to -% \fancyhead and \fancyfoot. -% Errormessages and warnings have been made more informative. -% -% Dec 9, 2002 -% version 2.1 -% The defaults for \footrulewidth, \plainheadrulewidth and -% \plainfootrulewidth are changed from \z@skip to 0pt. In this way when -% someone inadvertantly uses \setlength to change any of these, the value -% of \z@skip will not be changed, rather an errormessage will be given. - -% March 3, 2004 -% Release of version 3.0 - -% Oct 7, 2004 -% version 3.1 -% Added '\endlinechar=13' to \fancy@reset to prevent problems with -% includegraphics in header when verbatiminput is active. - -% March 22, 2005 -% version 3.2 -% reset \everypar (the real one) in \fancy@reset because spanish.ldf does -% strange things with \everypar between << and >>. - -\def\ifancy@mpty#1{\def\temp@a{#1}\ifx\temp@a\@empty} - -\def\fancy@def#1#2{\ifancy@mpty{#2}\fancy@gbl\def#1{\leavevmode}\else - \fancy@gbl\def#1{#2\strut}\fi} - -\let\fancy@gbl\global - -\def\@fancyerrmsg#1{% - \ifx\PackageError\undefined - \errmessage{#1}\else - \PackageError{Fancyhdr}{#1}{}\fi} -\def\@fancywarning#1{% - \ifx\PackageWarning\undefined - \errmessage{#1}\else - \PackageWarning{Fancyhdr}{#1}{}\fi} - -% Usage: \@forc \var{charstring}{command to be executed for each char} -% This is similar to LaTeX's \@tfor, but expands the charstring. 
- -\def\@forc#1#2#3{\expandafter\f@rc\expandafter#1\expandafter{#2}{#3}} -\def\f@rc#1#2#3{\def\temp@ty{#2}\ifx\@empty\temp@ty\else - \f@@rc#1#2\f@@rc{#3}\fi} -\def\f@@rc#1#2#3\f@@rc#4{\def#1{#2}#4\f@rc#1{#3}{#4}} - -% Usage: \f@nfor\name:=list\do{body} -% Like LaTeX's \@for but an empty list is treated as a list with an empty -% element - -\newcommand{\f@nfor}[3]{\edef\@fortmp{#2}% - \expandafter\@forloop#2,\@nil,\@nil\@@#1{#3}} - -% Usage: \def@ult \cs{defaults}{argument} -% sets \cs to the characters from defaults appearing in argument -% or defaults if it would be empty. All characters are lowercased. - -\newcommand\def@ult[3]{% - \edef\temp@a{\lowercase{\edef\noexpand\temp@a{#3}}}\temp@a - \def#1{}% - \@forc\tmpf@ra{#2}% - {\expandafter\if@in\tmpf@ra\temp@a{\edef#1{#1\tmpf@ra}}{}}% - \ifx\@empty#1\def#1{#2}\fi} -% -% \if@in -% -\newcommand{\if@in}[4]{% - \edef\temp@a{#2}\def\temp@b##1#1##2\temp@b{\def\temp@b{##1}}% - \expandafter\temp@b#2#1\temp@b\ifx\temp@a\temp@b #4\else #3\fi} - -\newcommand{\fancyhead}{\@ifnextchar[{\f@ncyhf\fancyhead h}% - {\f@ncyhf\fancyhead h[]}} -\newcommand{\fancyfoot}{\@ifnextchar[{\f@ncyhf\fancyfoot f}% - {\f@ncyhf\fancyfoot f[]}} -\newcommand{\fancyhf}{\@ifnextchar[{\f@ncyhf\fancyhf{}}% - {\f@ncyhf\fancyhf{}[]}} - -% New commands for offsets added - -\newcommand{\fancyheadoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyheadoffset h}% - {\f@ncyhfoffs\fancyheadoffset h[]}} -\newcommand{\fancyfootoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyfootoffset f}% - {\f@ncyhfoffs\fancyfootoffset f[]}} -\newcommand{\fancyhfoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyhfoffset{}}% - {\f@ncyhfoffs\fancyhfoffset{}[]}} - -% The header and footer fields are stored in command sequences with -% names of the form: \f@ncy with for [eo], from [lcr] -% and from [hf]. - -\def\f@ncyhf#1#2[#3]#4{% - \def\temp@c{}% - \@forc\tmpf@ra{#3}% - {\expandafter\if@in\tmpf@ra{eolcrhf,EOLCRHF}% - {}{\edef\temp@c{\temp@c\tmpf@ra}}}% - \ifx\@empty\temp@c\else - \@fancyerrmsg{Illegal char `\temp@c' in \string#1 argument: - [#3]}% - \fi - \f@nfor\temp@c{#3}% - {\def@ult\f@@@eo{eo}\temp@c - \if@twoside\else - \if\f@@@eo e\@fancywarning - {\string#1's `E' option without twoside option is useless}\fi\fi - \def@ult\f@@@lcr{lcr}\temp@c - \def@ult\f@@@hf{hf}{#2\temp@c}% - \@forc\f@@eo\f@@@eo - {\@forc\f@@lcr\f@@@lcr - {\@forc\f@@hf\f@@@hf - {\expandafter\fancy@def\csname - f@ncy\f@@eo\f@@lcr\f@@hf\endcsname - {#4}}}}}} - -\def\f@ncyhfoffs#1#2[#3]#4{% - \def\temp@c{}% - \@forc\tmpf@ra{#3}% - {\expandafter\if@in\tmpf@ra{eolrhf,EOLRHF}% - {}{\edef\temp@c{\temp@c\tmpf@ra}}}% - \ifx\@empty\temp@c\else - \@fancyerrmsg{Illegal char `\temp@c' in \string#1 argument: - [#3]}% - \fi - \f@nfor\temp@c{#3}% - {\def@ult\f@@@eo{eo}\temp@c - \if@twoside\else - \if\f@@@eo e\@fancywarning - {\string#1's `E' option without twoside option is useless}\fi\fi - \def@ult\f@@@lcr{lr}\temp@c - \def@ult\f@@@hf{hf}{#2\temp@c}% - \@forc\f@@eo\f@@@eo - {\@forc\f@@lcr\f@@@lcr - {\@forc\f@@hf\f@@@hf - {\expandafter\setlength\csname - f@ncyO@\f@@eo\f@@lcr\f@@hf\endcsname - {#4}}}}}% - \fancy@setoffs} - -% Fancyheadings version 1 commands. These are more or less deprecated, -% but they continue to work. 
- -\newcommand{\lhead}{\@ifnextchar[{\@xlhead}{\@ylhead}} -\def\@xlhead[#1]#2{\fancy@def\f@ncyelh{#1}\fancy@def\f@ncyolh{#2}} -\def\@ylhead#1{\fancy@def\f@ncyelh{#1}\fancy@def\f@ncyolh{#1}} - -\newcommand{\chead}{\@ifnextchar[{\@xchead}{\@ychead}} -\def\@xchead[#1]#2{\fancy@def\f@ncyech{#1}\fancy@def\f@ncyoch{#2}} -\def\@ychead#1{\fancy@def\f@ncyech{#1}\fancy@def\f@ncyoch{#1}} - -\newcommand{\rhead}{\@ifnextchar[{\@xrhead}{\@yrhead}} -\def\@xrhead[#1]#2{\fancy@def\f@ncyerh{#1}\fancy@def\f@ncyorh{#2}} -\def\@yrhead#1{\fancy@def\f@ncyerh{#1}\fancy@def\f@ncyorh{#1}} - -\newcommand{\lfoot}{\@ifnextchar[{\@xlfoot}{\@ylfoot}} -\def\@xlfoot[#1]#2{\fancy@def\f@ncyelf{#1}\fancy@def\f@ncyolf{#2}} -\def\@ylfoot#1{\fancy@def\f@ncyelf{#1}\fancy@def\f@ncyolf{#1}} - -\newcommand{\cfoot}{\@ifnextchar[{\@xcfoot}{\@ycfoot}} -\def\@xcfoot[#1]#2{\fancy@def\f@ncyecf{#1}\fancy@def\f@ncyocf{#2}} -\def\@ycfoot#1{\fancy@def\f@ncyecf{#1}\fancy@def\f@ncyocf{#1}} - -\newcommand{\rfoot}{\@ifnextchar[{\@xrfoot}{\@yrfoot}} -\def\@xrfoot[#1]#2{\fancy@def\f@ncyerf{#1}\fancy@def\f@ncyorf{#2}} -\def\@yrfoot#1{\fancy@def\f@ncyerf{#1}\fancy@def\f@ncyorf{#1}} - -\newlength{\fancy@headwidth} -\let\headwidth\fancy@headwidth -\newlength{\f@ncyO@elh} -\newlength{\f@ncyO@erh} -\newlength{\f@ncyO@olh} -\newlength{\f@ncyO@orh} -\newlength{\f@ncyO@elf} -\newlength{\f@ncyO@erf} -\newlength{\f@ncyO@olf} -\newlength{\f@ncyO@orf} -\newcommand{\headrulewidth}{0.4pt} -\newcommand{\footrulewidth}{0pt} -\newcommand{\footruleskip}{.3\normalbaselineskip} - -% Fancyplain stuff shouldn't be used anymore (rather -% \fancypagestyle{plain} should be used), but it must be present for -% compatibility reasons. - -\newcommand{\plainheadrulewidth}{0pt} -\newcommand{\plainfootrulewidth}{0pt} -\newif\if@fancyplain \@fancyplainfalse -\def\fancyplain#1#2{\if@fancyplain#1\else#2\fi} - -\headwidth=-123456789sp %magic constant - -% Command to reset various things in the headers: -% a.o. single spacing (taken from setspace.sty) -% and the catcode of ^^M (so that epsf files in the header work if a -% verbatim crosses a page boundary) -% It also defines a \nouppercase command that disables \uppercase and -% \Makeuppercase. It can only be used in the headers and footers. -\let\fnch@everypar\everypar% save real \everypar because of spanish.ldf -\def\fancy@reset{\fnch@everypar{}\restorecr\endlinechar=13 - \def\baselinestretch{1}% - \def\nouppercase##1{{\let\uppercase\relax\let\MakeUppercase\relax - \expandafter\let\csname MakeUppercase \endcsname\relax##1}}% - \ifx\undefined\@newbaseline% NFSS not present; 2.09 or 2e - \ifx\@normalsize\undefined \normalsize % for ucthesis.cls - \else \@normalsize \fi - \else% NFSS (2.09) present - \@newbaseline% - \fi} - -% Initialization of the head and foot text. - -% The default values still contain \fancyplain for compatibility. -\fancyhf{} % clear all -% lefthead empty on ``plain'' pages, \rightmark on even, \leftmark on odd pages -% evenhead empty on ``plain'' pages, \leftmark on even, \rightmark on odd pages -\if@twoside - \fancyhead[el,or]{\fancyplain{}{\sl\rightmark}} - \fancyhead[er,ol]{\fancyplain{}{\sl\leftmark}} -\else - \fancyhead[l]{\fancyplain{}{\sl\rightmark}} - \fancyhead[r]{\fancyplain{}{\sl\leftmark}} -\fi -\fancyfoot[c]{\rm\thepage} % page number - -% Use box 0 as a temp box and dimen 0 as temp dimen. -% This can be done, because this code will always -% be used inside another box, and therefore the changes are local. 
- -\def\@fancyvbox#1#2{\setbox0\vbox{#2}\ifdim\ht0>#1\@fancywarning - {\string#1 is too small (\the#1): ^^J Make it at least \the\ht0.^^J - We now make it that large for the rest of the document.^^J - This may cause the page layout to be inconsistent, however\@gobble}% - \dimen0=#1\global\setlength{#1}{\ht0}\ht0=\dimen0\fi - \box0} - -% Put together a header or footer given the left, center and -% right text, fillers at left and right and a rule. -% The \lap commands put the text into an hbox of zero size, -% so overlapping text does not generate an errormessage. -% These macros have 5 parameters: -% 1. LEFTSIDE BEARING % This determines at which side the header will stick -% out. When \fancyhfoffset is used this calculates \headwidth, otherwise -% it is \hss or \relax (after expansion). -% 2. \f@ncyolh, \f@ncyelh, \f@ncyolf or \f@ncyelf. This is the left component. -% 3. \f@ncyoch, \f@ncyech, \f@ncyocf or \f@ncyecf. This is the middle comp. -% 4. \f@ncyorh, \f@ncyerh, \f@ncyorf or \f@ncyerf. This is the right component. -% 5. RIGHTSIDE BEARING. This is always \relax or \hss (after expansion). - -\def\@fancyhead#1#2#3#4#5{#1\hbox to\headwidth{\fancy@reset - \@fancyvbox\headheight{\hbox - {\rlap{\parbox[b]{\headwidth}{\raggedright#2}}\hfill - \parbox[b]{\headwidth}{\centering#3}\hfill - \llap{\parbox[b]{\headwidth}{\raggedleft#4}}}\headrule}}#5} - -\def\@fancyfoot#1#2#3#4#5{#1\hbox to\headwidth{\fancy@reset - \@fancyvbox\footskip{\footrule - \hbox{\rlap{\parbox[t]{\headwidth}{\raggedright#2}}\hfill - \parbox[t]{\headwidth}{\centering#3}\hfill - \llap{\parbox[t]{\headwidth}{\raggedleft#4}}}}}#5} - -\def\headrule{{\if@fancyplain\let\headrulewidth\plainheadrulewidth\fi - \hrule\@height\headrulewidth\@width\headwidth \vskip-\headrulewidth}} - -\def\footrule{{\if@fancyplain\let\footrulewidth\plainfootrulewidth\fi - \vskip-\footruleskip\vskip-\footrulewidth - \hrule\@width\headwidth\@height\footrulewidth\vskip\footruleskip}} - -\def\ps@fancy{% -\@ifundefined{@chapapp}{\let\@chapapp\chaptername}{}%for amsbook -% -% Define \MakeUppercase for old LaTeXen. -% Note: we used \def rather than \let, so that \let\uppercase\relax (from -% the version 1 documentation) will still work. -% -\@ifundefined{MakeUppercase}{\def\MakeUppercase{\uppercase}}{}% -\@ifundefined{chapter}{\def\sectionmark##1{\markboth -{\MakeUppercase{\ifnum \c@secnumdepth>\z@ - \thesection\hskip 1em\relax \fi ##1}}{}}% -\def\subsectionmark##1{\markright {\ifnum \c@secnumdepth >\@ne - \thesubsection\hskip 1em\relax \fi ##1}}}% -{\def\chaptermark##1{\markboth {\MakeUppercase{\ifnum \c@secnumdepth>\m@ne - \@chapapp\ \thechapter. \ \fi ##1}}{}}% -\def\sectionmark##1{\markright{\MakeUppercase{\ifnum \c@secnumdepth >\z@ - \thesection. \ \fi ##1}}}}% -%\csname ps@headings\endcsname % use \ps@headings defaults if they exist -\ps@@fancy -\gdef\ps@fancy{\@fancyplainfalse\ps@@fancy}% -% Initialize \headwidth if the user didn't -% -\ifdim\headwidth<0sp -% -% This catches the case that \headwidth hasn't been initialized and the -% case that the user added something to \headwidth in the expectation that -% it was initialized to \textwidth. We compensate this now. This loses if -% the user intended to multiply it by a factor. But that case is more -% likely done by saying something like \headwidth=1.2\textwidth. -% The doc says you have to change \headwidth after the first call to -% \pagestyle{fancy}. This code is just to catch the most common cases were -% that requirement is violated. 
-% - \global\advance\headwidth123456789sp\global\advance\headwidth\textwidth -\fi} -\def\ps@fancyplain{\ps@fancy \let\ps@plain\ps@plain@fancy} -\def\ps@plain@fancy{\@fancyplaintrue\ps@@fancy} -\let\ps@@empty\ps@empty -\def\ps@@fancy{% -\ps@@empty % This is for amsbook/amsart, which do strange things with \topskip -\def\@mkboth{\protect\markboth}% -\def\@oddhead{\@fancyhead\fancy@Oolh\f@ncyolh\f@ncyoch\f@ncyorh\fancy@Oorh}% -\def\@oddfoot{\@fancyfoot\fancy@Oolf\f@ncyolf\f@ncyocf\f@ncyorf\fancy@Oorf}% -\def\@evenhead{\@fancyhead\fancy@Oelh\f@ncyelh\f@ncyech\f@ncyerh\fancy@Oerh}% -\def\@evenfoot{\@fancyfoot\fancy@Oelf\f@ncyelf\f@ncyecf\f@ncyerf\fancy@Oerf}% -} -% Default definitions for compatibility mode: -% These cause the header/footer to take the defined \headwidth as width -% And to shift in the direction of the marginpar area - -\def\fancy@Oolh{\if@reversemargin\hss\else\relax\fi} -\def\fancy@Oorh{\if@reversemargin\relax\else\hss\fi} -\let\fancy@Oelh\fancy@Oorh -\let\fancy@Oerh\fancy@Oolh - -\let\fancy@Oolf\fancy@Oolh -\let\fancy@Oorf\fancy@Oorh -\let\fancy@Oelf\fancy@Oelh -\let\fancy@Oerf\fancy@Oerh - -% New definitions for the use of \fancyhfoffset -% These calculate the \headwidth from \textwidth and the specified offsets. - -\def\fancy@offsolh{\headwidth=\textwidth\advance\headwidth\f@ncyO@olh - \advance\headwidth\f@ncyO@orh\hskip-\f@ncyO@olh} -\def\fancy@offselh{\headwidth=\textwidth\advance\headwidth\f@ncyO@elh - \advance\headwidth\f@ncyO@erh\hskip-\f@ncyO@elh} - -\def\fancy@offsolf{\headwidth=\textwidth\advance\headwidth\f@ncyO@olf - \advance\headwidth\f@ncyO@orf\hskip-\f@ncyO@olf} -\def\fancy@offself{\headwidth=\textwidth\advance\headwidth\f@ncyO@elf - \advance\headwidth\f@ncyO@erf\hskip-\f@ncyO@elf} - -\def\fancy@setoffs{% -% Just in case \let\headwidth\textwidth was used - \fancy@gbl\let\headwidth\fancy@headwidth - \fancy@gbl\let\fancy@Oolh\fancy@offsolh - \fancy@gbl\let\fancy@Oelh\fancy@offselh - \fancy@gbl\let\fancy@Oorh\hss - \fancy@gbl\let\fancy@Oerh\hss - \fancy@gbl\let\fancy@Oolf\fancy@offsolf - \fancy@gbl\let\fancy@Oelf\fancy@offself - \fancy@gbl\let\fancy@Oorf\hss - \fancy@gbl\let\fancy@Oerf\hss} - -\newif\iffootnote -\let\latex@makecol\@makecol -\def\@makecol{\ifvoid\footins\footnotetrue\else\footnotefalse\fi -\let\topfloat\@toplist\let\botfloat\@botlist\latex@makecol} -\def\iftopfloat#1#2{\ifx\topfloat\empty #2\else #1\fi} -\def\ifbotfloat#1#2{\ifx\botfloat\empty #2\else #1\fi} -\def\iffloatpage#1#2{\if@fcolmade #1\else #2\fi} - -\newcommand{\fancypagestyle}[2]{% - \@namedef{ps@#1}{\let\fancy@gbl\relax#2\relax\ps@fancy}} diff --git a/papers/aleksandar_makelov/main.tex b/papers/aleksandar_makelov/main.tex index 73e687bfe2..9872dbc4b3 100644 --- a/papers/aleksandar_makelov/main.tex +++ b/papers/aleksandar_makelov/main.tex @@ -1,74 +1,13 @@ -\documentclass{article} % For LaTeX2e -\usepackage{arxiv_template} - -% Optional math commands from https://github.com/goodfeli/dlbook_notation. 
-% \input{math_commands.tex} - -\usepackage{microtype} -\usepackage{hyperref} -\usepackage{url} -\usepackage{graphicx} - -\usepackage{pgfplots} -\usepackage{pgfplotstable} -\pgfplotsset{compat=1.3} -\usepackage{tikz} -\usetikzlibrary{arrows.meta} -\usetikzlibrary{pgfplots.groupplots} -\usepackage{ragged2e} -\definecolor{mydarkblue}{rgb}{0,0.08,0.85} -\definecolor{mylightblue}{rgb}{0.06,0.56,1.0} -\definecolor{mylightorange}{rgb}{1.0,0.62,0.12} -\definecolor{mylightred}{rgb}{0.99,0.00,0.04} -\definecolor{mygreen}{HTML}{2F9E44} -\definecolor{myred}{HTML}{E03131} -\definecolor{myblue}{HTML}{1971C2} - -\usepackage{subcaption} -\usepackage{booktabs} -\usepackage{wrapfig} -\usepackage{changes} -\definecolor{myred}{HTML}{E03131} -\makeatletter -\@namedef{Changes@AuthorColor}{myred} -\colorlet{Changes@Color}{myred} -\makeatother -% \usepackage{floatrow} -\colmfinalcopy - -% \def\l{\left} -% \def\r{\right} - -\title{\texttt{mandala}: Compositional Memoization for Simple \& -Powerful Scientific Data Management} - -\newcommand{\fix}{\marginpar{FIX}} -\newcommand{\new}{\marginpar{NEW}} -% \newfloatcommand{capbtabbox}{table}[][\FBwidth] - -\usepackage{soul} -\usepackage{amsthm} -\usepackage{mathrsfs} -% \usepackage[outputdir=/home/amakelov/vscode_output/latex-aux]{minted} -\newtheorem{theorem}{Theorem}[section] -\newtheorem{lemma}[theorem]{Lemma} - - - -\begin{document} -\maketitle - - \begin{abstract} We present - \texttt{mandala}\footnote{\url{https://github.com/amakelov/mandala}}, a Python + \href{https://github.com/amakelov/mandala}{\texttt{mandala}}, a Python library that largely eliminates the accidental complexity of scientific data management and incremental computing. While most traditional and/or popular data management solutions are based on \emph{logging}, \texttt{mandala} takes a fundamentally different approach, using \emph{memoization} of function calls as the fundamental unit of saving, - loading, querying and deleting computational artifacts. - + loading, querying and deleting computational artifacts. + It does so by implementing a \emph{compositional} form of memoization, which keeps track of how memoized functions compose with one another. In this way: (1) complex computations are effectively memoized end-to-end, and become @@ -105,7 +44,7 @@ \section{Introduction} \citep{sandve2013ten,wilkinson2016fair}, but still require manual effort, attention to extraneous details, and discipline to follow. Researchers often operate under time pressure and/or the need to quickly iterate on code, which -makes these best `practices' a rather \emph{impractical} time investment. +makes these best `practices' a rather \emph{impractical} time investment. Thus, ideally we would like a system that (1) does not get in the way by imposing a complex new language/semantics/syntax, (2) provides powerful @@ -158,7 +97,7 @@ \section{Introduction} \emph{accidental complexity} (the data management tools necessary to implement the solution) \citep{Brooks1987NoSB}. The rest of this paper presents the design and main functionalities of -\texttt{mandala}, and is organized as follows: +\texttt{mandala}, and is organized as follows: \begin{itemize} \item In Section \ref{section:core-concepts}, we describe how memoization is designed, how this allows memoized calls to be composed and memoized results to @@ -206,12 +145,12 @@ \subsection{Memoization and the Computational Graph} \texttt{Ref}s and \texttt{Call}s are the two atomic data structures in \texttt{mandala}'s model of computations. 
When a call to an \texttt{@op}-decorated function \texttt{f} is executed inside a storage context, -this results in the creation of +this results in the creation of \begin{itemize} \item A \texttt{Ref} object for each input to the call. These wrap the `raw' values passed as inputs together with content IDs (hashes of the Python objects) and history IDs (hashes of the memoized calls that produced these values, if -any). +any). \begin{itemize} \item If an input to the call is already a \texttt{Ref} object, it is passed through as is; @@ -234,7 +173,7 @@ \subsection{Memoization and the Computational Graph} % \centering % \includegraphics[width=\linewidth]{img/comp-graph.pdf} % \caption{A part of the computaitonal graph built up by the calls in Figure -% \ref{fig:basic-usage}. +% \ref{fig:basic-usage}. % % The nodes are \texttt{Call} and \texttt{Ref} objects, % % and the edges are the inputs/output names connecting them. % } @@ -253,7 +192,7 @@ \subsection{Memoization and the Computational Graph} user composes memoized calls. \subsection{Motivation for the Design of Memoization} -\label{subsection:} +\label{subsection:design} \paragraph{Why content and history IDs?} The simultaneous use of content and history IDs has a few subtle advantages. @@ -318,7 +257,7 @@ \subsection{Retracing as a Versatile Imperative Interface to the Stored Computat which means stepping through memoized code with the purpose of resuming from a failure, loading intermediate values, or continuing from a particular point with new computations. A small example of retracing is shown in Figure -\ref{fig:basic-usage} (c). +\ref{fig:basic-usage} (c). This pattern is simple yet powerful, as it allows the user to interact with the stored computation graph in a way that is adapted to their use case, and to @@ -344,9 +283,9 @@ \section{Computation Frames} computations found.} \label{fig:figure1} \end{subfigure} - + \vspace{1em} - + \begin{subfigure}[b]{\textwidth} \centering \includegraphics[width=\textwidth]{img/fig5.pdf} @@ -424,16 +363,16 @@ \subsection{Formal Definition} An example is shown in Figure \ref{fig:cf} (c); \item \textbf{Groups of \texttt{Ref}s and \texttt{Call}s}: for each variable $v\in V$, a set of (history IDs of) \texttt{Ref}s $R_v$, and for each function -$f\in F$ with underlying \texttt{@op} $o_f$, a set of (history IDs of) \texttt{Call}s $C_f$; +$f\in F$ with underlying \texttt{@op} $o_f$, a set of (history IDs of) \texttt{Call}s $C_f$; \end{itemize} subject to the constraint that: for every call $c\in C_f$, if there's an input/output edge labeled $l$ connecting $f$ to some variable $v$, then if $c$ has a \texttt{Ref} $r_l$ corresponding to input/output name $l$, we have $r_l\in -R_v$. +R_v$. In other words, when we look at all calls in $f\in F$, their inputs/outputs must be present in the variables connected to $f$ under the respective input/output -name. +name. \subsection{Basic Usage} \label{subsection:cf-basic-usage} @@ -478,7 +417,7 @@ \subsection{Data Structures} \texttt{MList[int]} inheriting from \texttt{List[int]}, \ldots. By applying this type annotation, individual elements as well as the collection itself are memoized as \texttt{Ref}s (with the collection merely pointing to the -\texttt{Ref}s of its elements to avoid duplication). +\texttt{Ref}s of its elements to avoid duplication). 
\begin{wrapfigure}[18]{l}{0.45\textwidth} \centering @@ -537,7 +476,7 @@ \section{Related Work} \textbf{Memoization.} There are several memoization solutions for Python that lack the compositional nature of \texttt{mandala}, as well as the versioning and querying tools: the builtin \texttt{functools} module provides decorators such as \texttt{lru\_cache} for memoization; the \texttt{incpy} project \citep{guo2011using} enables automatic -persistent memoization of Python functions directly on the interpreter level; +persistent memoization of Python functions directly on the interpreter level; the \texttt{funsies} project \citep{lavigne2021funsies} is a memoization-based distributed workflow executor that uses a similar hashing approach to keep track of which computations have already been done; \texttt{koji} \citep{maymounkov2018koji} is a design for an incremental computation data processing framework that unifies over different resource types (files or services), and uses an analogous notion of hashing to keep track of computations. @@ -566,17 +505,17 @@ \section{Related Work} organized in a bare-bones \texttt{git} repository \citep{git}: it is a content-addressed tree, where each edge tracks a diff from the content at one endpoint to that at the other. Additional metadata indicates equivalence classes -of semantically equivalent contents. -% Semantic versioning \citep{semver} is another popular code versioning system. +of semantically equivalent contents. +% Semantic versioning \citep{semver} is another popular code versioning system. % \texttt{mandala} is similar to semver in % that it allows you to make backward-compatible changes to the interface and % logic of dependencies. It is different in that versions are still labeled by -% content, instead of `non-canonical' numbers. +% content, instead of `non-canonical' numbers. \section{Limitations} \label{sec:limitations} -\textbf{Computing deterministic content IDs of any Python object is difficult.} +\textbf{Computing deterministic content IDs of any Python object is difficult.} \texttt{mandala} uses the \texttt{joblib} library to serialize Python objects into byte strings, and then hashes these strings to get the content ID. This approach is not perfect, as it is not always possible to serialize Python @@ -586,7 +525,7 @@ \section{Limitations} sensitive to small changes in the input, such as numerical precision in floating point numbers. Finally, complex Python objects may contain state that is not intrinsically part of the object's identity, such as resource utilization data -(e.g., memory addresses). This can lead to different content IDs before and +(e.g., memory addresses). This can lead to different content IDs before and after a round trip through the storage backend. These issues don't come up often as long as all initial \texttt{Ref}s are created from simple Python objects: complex objects are hashed and saved once when returned from an \texttt{@op}, @@ -617,12 +556,12 @@ \section{Conclusion} \section*{Acknowledgements} First and foremost, I would like to thank my friend Stefan Krastanov for many -valuable conversations throughout the evolution and development of +valuable conversations throughout the evolution and development of \texttt{mandala}. Nobody could ask for a more enthusiastic collaborator and champion of their work. 
Second, I would also like to thank Nicholas Schiefer for some helpful feedback on an earlier version of the library, as well as suggestions and implementations for features to make it work in a distributed -setting, and for advertising \texttt{mandala} at his workplace. +setting, and for advertising \texttt{mandala} at his workplace. There have been far too many people over the years who have listened patiently to me talk about this project in its earlier stages; in particular, I'm grateful @@ -633,9 +572,3 @@ \section*{Acknowledgements} to me talk about this project, and who have given me valuable feedback and encouragement. Finally, I would like to thank my reviewers at the SciPy conference, especially Andrei Paleyes, for their helpful feedback on this paper. - -% bibliography -\bibliography{scipy} -\bibliographystyle{arxiv_template} - -\end{document} \ No newline at end of file diff --git a/papers/aleksandar_makelov/myst.yml b/papers/aleksandar_makelov/myst.yml index 08e5a6fe7e..d8133a0d80 100644 --- a/papers/aleksandar_makelov/myst.yml +++ b/papers/aleksandar_makelov/myst.yml @@ -1,9 +1,11 @@ version: 1 +extends: ../proceedings.yml project: + doi: 10.25080/JHPV7385 # Update this to match `scipy-2024-` the folder should be `` id: scipy-2024-aleksandar_makelov - title: "Mandala: Compositional Memoization for Simple & Powerful Scientific Data Management" - subtitle: LaTeX edition + title: 'Mandala: Compositional Memoization for Simple & Powerful Scientific Data Management' + description: We present mandala, a Python library that largely eliminates the accidental complexity of scientific data management and incremental computing. While most traditional and/or popular data management solutions are based on logging, mandala takes a fundamentally different approach, using memoization of function calls as the fundamental unit of saving, loading, querying and deleting computational artifacts. # Authors should have affiliations, emails and ORCIDs if available authors: - name: Aleksandar Makelov @@ -32,15 +34,5 @@ project: - maymounkov2018koji - semver - lozano2017unison - # A banner will be generated for you on publication, this is a placeholder - banner: banner.png - # The rest of the information shouldn't be modified - subject: Research Article - open_access: true - license: CC-BY-4.0 - venue: Scipy 2024 - date: 2024-07-10 - numbering: - headings: true site: template: article-theme diff --git a/papers/aleksandar_makelov/natbib.sty b/papers/aleksandar_makelov/natbib.sty deleted file mode 100644 index ff0d0b91b6..0000000000 --- a/papers/aleksandar_makelov/natbib.sty +++ /dev/null @@ -1,1246 +0,0 @@ -%% -%% This is file `natbib.sty', -%% generated with the docstrip utility. -%% -%% The original source files were: -%% -%% natbib.dtx (with options: `package,all') -%% ============================================= -%% IMPORTANT NOTICE: -%% -%% This program can be redistributed and/or modified under the terms -%% of the LaTeX Project Public License Distributed from CTAN -%% archives in directory macros/latex/base/lppl.txt; either -%% version 1 of the License, or any later version. -%% -%% This is a generated file. -%% It may not be distributed without the original source file natbib.dtx. -%% -%% Full documentation can be obtained by LaTeXing that original file. -%% Only a few abbreviated comments remain here to describe the usage. -%% ============================================= -%% Copyright 1993-2009 Patrick W Daly -%% Max-Planck-Institut f\"ur Sonnensystemforschung -%% Max-Planck-Str. 
2 -%% D-37191 Katlenburg-Lindau -%% Germany -%% E-mail: daly@mps.mpg.de -\NeedsTeXFormat{LaTeX2e}[1995/06/01] -\ProvidesPackage{natbib} - [2009/07/16 8.31 (PWD, AO)] - - % This package reimplements the LaTeX \cite command to be used for various - % citation styles, both author-year and numerical. It accepts BibTeX - % output intended for many other packages, and therefore acts as a - % general, all-purpose citation-style interface. - % - % With standard numerical .bst files, only numerical citations are - % possible. With an author-year .bst file, both numerical and - % author-year citations are possible. - % - % If author-year citations are selected, \bibitem must have one of the - % following forms: - % \bibitem[Jones et al.(1990)]{key}... - % \bibitem[Jones et al.(1990)Jones, Baker, and Williams]{key}... - % \bibitem[Jones et al., 1990]{key}... - % \bibitem[\protect\citeauthoryear{Jones, Baker, and Williams}{Jones - % et al.}{1990}]{key}... - % \bibitem[\protect\citeauthoryear{Jones et al.}{1990}]{key}... - % \bibitem[\protect\astroncite{Jones et al.}{1990}]{key}... - % \bibitem[\protect\citename{Jones et al., }1990]{key}... - % \harvarditem[Jones et al.]{Jones, Baker, and Williams}{1990}{key}... - % - % This is either to be made up manually, or to be generated by an - % appropriate .bst file with BibTeX. - % Author-year mode || Numerical mode - % Then, \citet{key} ==>> Jones et al. (1990) || Jones et al. [21] - % \citep{key} ==>> (Jones et al., 1990) || [21] - % Multiple citations as normal: - % \citep{key1,key2} ==>> (Jones et al., 1990; Smith, 1989) || [21,24] - % or (Jones et al., 1990, 1991) || [21,24] - % or (Jones et al., 1990a,b) || [21,24] - % \cite{key} is the equivalent of \citet{key} in author-year mode - % and of \citep{key} in numerical mode - % Full author lists may be forced with \citet* or \citep*, e.g. - % \citep*{key} ==>> (Jones, Baker, and Williams, 1990) - % Optional notes as: - % \citep[chap. 2]{key} ==>> (Jones et al., 1990, chap. 2) - % \citep[e.g.,][]{key} ==>> (e.g., Jones et al., 1990) - % \citep[see][pg. 34]{key}==>> (see Jones et al., 1990, pg. 34) - % (Note: in standard LaTeX, only one note is allowed, after the ref. - % Here, one note is like the standard, two make pre- and post-notes.) - % \citealt{key} ==>> Jones et al. 1990 - % \citealt*{key} ==>> Jones, Baker, and Williams 1990 - % \citealp{key} ==>> Jones et al., 1990 - % \citealp*{key} ==>> Jones, Baker, and Williams, 1990 - % Additional citation possibilities (both author-year and numerical modes) - % \citeauthor{key} ==>> Jones et al. - % \citeauthor*{key} ==>> Jones, Baker, and Williams - % \citeyear{key} ==>> 1990 - % \citeyearpar{key} ==>> (1990) - % \citetext{priv. comm.} ==>> (priv. comm.) - % \citenum{key} ==>> 11 [non-superscripted] - % Note: full author lists depends on whether the bib style supports them; - % if not, the abbreviated list is printed even when full requested. - % - % For names like della Robbia at the start of a sentence, use - % \Citet{dRob98} ==>> Della Robbia (1998) - % \Citep{dRob98} ==>> (Della Robbia, 1998) - % \Citeauthor{dRob98} ==>> Della Robbia - % - % - % Citation aliasing is achieved with - % \defcitealias{key}{text} - % \citetalias{key} ==>> text - % \citepalias{key} ==>> (text) - % - % Defining the citation mode and punctual (citation style) - % \setcitestyle{} - % Example: \setcitestyle{square,semicolon} - % Alternatively: - % Use \bibpunct with 6 mandatory arguments: - % 1. opening bracket for citation - % 2. closing bracket - % 3. 
citation separator (for multiple citations in one \cite) - % 4. the letter n for numerical styles, s for superscripts - % else anything for author-year - % 5. punctuation between authors and date - % 6. punctuation between years (or numbers) when common authors missing - % One optional argument is the character coming before post-notes. It - % appears in square braces before all other arguments. May be left off. - % Example (and default) \bibpunct[, ]{(}{)}{;}{a}{,}{,} - % - % To make this automatic for a given bib style, named newbib, say, make - % a local configuration file, natbib.cfg, with the definition - % \newcommand{\bibstyle@newbib}{\bibpunct...} - % Then the \bibliographystyle{newbib} will cause \bibstyle@newbib to - % be called on THE NEXT LATEX RUN (via the aux file). - % - % Such preprogrammed definitions may be invoked anywhere in the text - % by calling \citestyle{newbib}. This is only useful if the style specified - % differs from that in \bibliographystyle. - % - % With \citeindextrue and \citeindexfalse, one can control whether the - % \cite commands make an automatic entry of the citation in the .idx - % indexing file. For this, \makeindex must also be given in the preamble. - % - % Package Options: (for selecting punctuation) - % round - round parentheses are used (default) - % square - square brackets are used [option] - % curly - curly braces are used {option} - % angle - angle brackets are used