Identification of multivariate phenotypes most influenced by mutation: Drosophila serrata wings as a case study

Published in Evolution: 01 August 2025

This repository contains all the code and data used in the manuscript.

The history of the experiment can be found in Causes of variability in estimates of mutational variance from mutation accumulation experiments in Genetics.

Analysis

`MVN_M_6_UN.sas`

This SAS script analyses the Drosophila wing trait data from our mutation reduction experiment where the 12 populations were analysed separately. The outcome is the multivariate (MVN) unstructured (UN) mutation variance - covariance matrix (M) for six wing traits. Here, populations are labelled first by treatment and then sampled generation (e.g., treatment 1_1 small treatment, generation 1). The steps include:

Sorting the Data The dataset mrwings is sorted by Trait and Gen.
Standardising Scores The Score variable is standardised (mean = 0, std = 1) within each Trait group and saved as stdmrwings.
Linear Mixed Model using the mixed procedure (PROC MIXED) A mixed-effects model is fit using REML, applied to treatments seperately, where multivaraite outliers are excluded (e.g., Treat=1, Gen=1, excluding multiout=1):

_Fixed effect: Trait
_Random effects: Trait within Line, Trait within Vial, nested in Treat and Line
_Repeated measures: Animal nested in Treat, Line, and Vial

Output Tables The following outputs are saved for downstream analysis (within the folder un_sas_output):

_Covparms1_1: Covariance parameters
_FitStat1_1: Fit statistics
_Converge1_1: Convergence diagnostics
_Asycov1_1: Asymptotic covariance matrix

`randomise_wing_data.R`

There are four parts to randomise_wing_data.R. In the first part, Part 1. Prepare the data set to be shuffled, we create a unique ID for a line. This section also separates the experimental component of Generation + Treatment + Line (population_info) and the individual fly trait information (with a newly assigned unique line ID, line_trait_info) into two data frames.

In the second section (Part 2. Generate a data frame of sets of randomised unique line IDs), here are the steps where we randomly sample the list of unique line IDs, and then generate a data frame of the sampled line sets (all_sets, N = 1000), each column is a single set of sampled lines and each row, within a column, is a randomly sampled line. This is where you can set the number of data sets needed and whether you want to sample with or without replacement.

In the third part (Part 3. Write a function to merge randomised line data to trait info and save to .csv), we create the function create_randomise_wing_data that uses the data frames generated in parts 1 and 2. It takes a single column from all_sets (one randomised sample of lines), and does several things:

adds the randomised unique line IDs by pasting the all_sets column onto the data frame of the experimental component, population_info

The unique lines are now randomised across the 12 experimental populations of treatment X generation and 42 experimental lines.

then populates with the flies and their wing information by appending the assigned unique line ID in line_trait_info to the randomised unique line ID

This keeps replicate vials and individual flies intact within line, and at this step, this is our complete randomised dataset

we retain the column with the experimental design lines, numbered from 1 to 42, for SAS and remove the column with the unique line IDs, numbered from 1 to 493 (see Note below)
calculates the Mahalanobis Distance and determines multivariate outliers for this new data set to use in SAS
adds a column of the treatments as numbers for SAS, using the original notation: small is 1 and big is 3 (SB crosses were 2)
converts the new data set into long format for SAS
re-orders the data columns
sorts the data by Gen, Treatment, Line, Vial, Individual
saves the dataset as a .csv in the 'generated_datasets' subfolder

Finally, in the last section (Part 4. Implement function on sets of randomised lines) we implement the function created above on the data frame all_sets. In this section, to speed up the time to generate all 1000 data sets, the function is run in parallel using the foreach function (%dopar%).

Note

Our dataset had fewer 'unique' lines than the full experimental design (i.e., 6 gens X 2 treat X 42 lines = 504, Observed data = 493). Our code is written to keep the same number of 'unique' lines (N = 493), where shuffled genetic data is extended onto this. If sample with replacement equals:

False; same number of individuals (N = 5135)
True; the number of total individuals will vary

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
generated_dataset		generated_dataset
output_figures		output_figures
output_tables		output_tables
un_sas_output		un_sas_output
.gitignore		.gitignore
MVN_M_6_UN.sas		MVN_M_6_UN.sas
Multivariate_M_repeat_sampling.Rproj		Multivariate_M_repeat_sampling.Rproj
README.md		README.md
mr_wings_6traits.csv		mr_wings_6traits.csv
multivariate_M_analysis.R		multivariate_M_analysis.R
randomise_wing_data.R		randomise_wing_data.R
write_sas_randomised_data.R		write_sas_randomised_data.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Identification of multivariate phenotypes most influenced by mutation: Drosophila serrata wings as a case study

The history of the experiment can be found in Causes of variability in estimates of mutational variance from mutation accumulation experiments in Genetics.

Analysis

`MVN_M_6_UN.sas`

`randomise_wing_data.R`

About

Uh oh!

Releases

Packages

Uh oh!

Languages

CaraConradsen/Multivariate_M_repeat_sampling

Folders and files

Latest commit

History

Repository files navigation

Identification of multivariate phenotypes most influenced by mutation: Drosophila serrata wings as a case study

The history of the experiment can be found in Causes of variability in estimates of mutational variance from mutation accumulation experiments in Genetics.

Analysis

MVN_M_6_UN.sas

randomise_wing_data.R

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

`MVN_M_6_UN.sas`

`randomise_wing_data.R`

Packages