Case study
Note
The following tutorial is intended to be run entirely with Python 3.
Obtaining the files
In the case you haven’t created the working directory, run the following:
On Windows:
import subprocess
subprocess.run('wsl bash -c "mkdir /mnt/c/Agri-Plast"', shell=True, check=True)
On Linux/Mac:
import subprocess
subprocess.run('mkdir ~/Agri-Plast', shell=True, check=True)
In this case study we’ll be focusing on the following dataset: https://dmportal.biodata.pt/dataset.xhtml?persistentId=doi:10.34636/DMPortal/Q3YLV4
Take note of the doi:
doi:10.34636/DMPortal/Q3YLV4
Run the following to obtain the metadata file:
On Windows:
import subprocess
doi = "doi:10.34636/DMPortal/Q3YLV4"
subprocess.run(f'wsl bash -c "curl -L \\"https://dmportal.biodata.pt/api/datasets/:persistentId/?persistentId={doi}\\" -o /mnt/c/Agri-Plast/dataset.metadata"', shell=True, check=True)
On Linux/Mac:
import subprocess
doi = "doi:10.34636/DMPortal/Q3YLV4"
subprocess.run(f'curl -L "https://dmportal.biodata.pt/api/datasets/:persistentId/?persistentId={doi}" -o ~/Agri-Plast/dataset.metadata', shell=True, check=True)
Searching for the ID following "dataFile":{"id": the following IDs are identified: 1231 and 1232. Run the following to obtain both files:
On Windows:
import subprocess
id = "1231"
subprocess.run(f'wsl bash -c "curl -L \\"https://dmportal.biodata.pt/api/access/datafile/{id}?format=original\\" -o /mnt/c/Agri-Plast/file_{id}.csv"', shell=True, check=True)
id = "1232"
subprocess.run(f'wsl bash -c "curl -L \\"https://dmportal.biodata.pt/api/access/datafile/{id}?format=original\\" -o /mnt/c/Agri-Plast/file_{id}.csv"', shell=True, check=True)
On Linux/Mac:
import subprocess
id = "1231"
subprocess.run(f'curl -L "https://dmportal.biodata.pt/api/access/datafile/{id}?format=original" -o ~/Agri-Plast/file_{id}.csv', shell=True, check=True)
id = "1232"
subprocess.run(f'curl -L "https://dmportal.biodata.pt/api/access/datafile/{id}?format=original" -o ~/Agri-Plast/file_{id}.csv', shell=True, check=True)
We can now look at the first file:
import pandas as pd
dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
print(dataset.head(10))
Output:
ID Treatment Log Time Start Time Finish \
0 58 Control Log (4) 11h26 11h36
1 62 Control Log (5) 11h36 11h46
2 19 Control Log (8) 12h13 12h23
3 5 Control Log (9) 12h24 12h34
4 18 Control Log (14) 13h26 13h36
5 13 Control Log (15) 13h40 13h50
6 39 High [MPs] Log (1) 10h50 11h00
7 30 High [MPs] Log (7) 12h01 12h11
8 40 High [MPs] Log (10) 12h37 12h47
9 28 High [MPs] Log (13) 13h15 13h25
Gas Exchange (A) (µmol m⁻² s⁻¹) \
0 2.510
1 5.399
2 6.747
3 5.448
4 7.651
5 4.337
6 3.947
7 4.025
8 5.975
9 6.234
Stomatal conductance (Gsw) (µmol m⁻² s⁻¹) Date Observations
0 0.028545 03/07/2024 Before MPs
1 0.003452 03/07/2024 Before MPs
2 0.031398 03/07/2024 Before MPs
3 0.044850 03/07/2024 Before MPs
4 0.040487 03/07/2024 Before MPs
5 0.060281 03/07/2024 Before MPs
6 0.019934 03/07/2024 Before MPs
7 0.031952 03/07/2024 Before MPs
8 0.040976 03/07/2024 Before MPs
9 0.060094 03/07/2024 Before MPs
Stomatal Conductance
To start, a boxplot of the Stomatal conductance by Microplastic concentration will be generated by running the following lines:
import pandas as pd
import matplotlib.pyplot as plt
dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
order = ["Control", "Low [MPs]", "High [MPs]"]
dataset["Treatment"] = pd.Categorical(dataset["Treatment"], categories=order, ordered=True)
dataset.boxplot(
column="Stomatal conductance (Gsw) (µmol m⁻² s⁻¹)",
by="Treatment"
)
plt.title("Stomatal Conductance by Treatment")
plt.xlabel("Treatment")
plt.suptitle("")
plt.ylabel("Stomatal Conductance (Gsw)")
plt.show()
Output:
To address about the statistically significant difference between treatments, we need to use a Global test (followed by a Post-hoc if there are observed differences). But first, the normality of the data for each treatment must be studied to know which tests must be used. So, we first plot the distribution of the values per treatment to visually address about the data normality:
import matplotlib.pyplot as plt
dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
data = dataset[dataset["Treatment"] == "Control"]["Stomatal conductance (Gsw) (µmol m⁻² s⁻¹)"]
plt.figure(figsize=(6,5))
plt.hist(data, bins=20)
plt.title("Frequency of Stomatal Conductance - Control")
plt.xlabel("Stomatal Conductance (Gsw)")
plt.ylabel("Frequency")
plt.show()
Output:
import matplotlib.pyplot as plt
dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
data = dataset[dataset["Treatment"] == "Low [MPs]"]["Stomatal conductance (Gsw) (µmol m⁻² s⁻¹)"]
plt.figure(figsize=(6,5))
plt.hist(data, bins=20)
plt.title("Frequency of Stomatal Conductance - Low [MPs]")
plt.xlabel("Stomatal Conductance (Gsw)")
plt.ylabel("Frequency")
plt.show()
Output:
import matplotlib.pyplot as plt
dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
data = dataset[dataset["Treatment"] == "High [MPs]"]["Stomatal conductance (Gsw) (µmol m⁻² s⁻¹)"]
plt.figure(figsize=(6,5))
plt.hist(data, bins=20)
plt.title("Frequency of Stomatal Conductance - High [MPs]")
plt.xlabel("Stomatal Conductance (Gsw)")
plt.ylabel("Frequency")
plt.show()
Output:
The data does not appear to normally distributed. To be sure, we run the Shapiro’s Test.
For Control data:
from scipy.stats import shapiro
dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
data = dataset[dataset["Treatment"] == "Control"]["Stomatal conductance (Gsw) (µmol m⁻² s⁻¹)"]
stat, p_value = shapiro(data)
print(stat)
print(p_value)
Output:
0.8156227285188757
0.004425037569148782
For Low [MPs]:
from scipy.stats import shapiro
dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
data = dataset[dataset["Treatment"] == "Low [MPs]"]["Stomatal conductance (Gsw) (µmol m⁻² s⁻¹)"]
stat, p_value = shapiro(data)
print(stat)
print(p_value)
Output:
0.7881585746105204
0.0026013957337220357
For High [MPs]:
from scipy.stats import shapiro
dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
data = dataset[dataset["Treatment"] == "High [MPs]"]["Stomatal conductance (Gsw) (µmol m⁻² s⁻¹)"]
stat, p_value = shapiro(data)
print(stat)
print(p_value)
Output:
0.8724685354837892
0.03668397290204818
For all the groups, the test rejected the null hypothesis of normality (p_value < 0.05), indicating that stomatal conductance values are not normally distributed. Therefore, Kruskal-Wallis Global test should be used instead of One-Way ANOVA:
from scipy.stats import kruskal
import pandas as pd
dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
control = dataset[dataset["Treatment"] == "Control"]["Stomatal conductance (Gsw) (µmol m⁻² s⁻¹)"]
low = dataset[dataset["Treatment"] == "Low [MPs]"]["Stomatal conductance (Gsw) (µmol m⁻² s⁻¹)"]
high = dataset[dataset["Treatment"] == "High [MPs]"]["Stomatal conductance (Gsw) (µmol m⁻² s⁻¹)"]
H, p = kruskal(control, low, high)
print(H)
print(p)
Output:
0.21184088806657542
0.8994962055196596
The analysis revealed no statistically significant differences among the Control, Low [MPs], and High [MPs] groups (H = 0.21, p = 0.90), indicating that exposure to soil microplastics did not affect stomatal conductance under the conditions tested.
Gas Exchange vs Stomatal Conductance
To explore the relationship between photosynthetic activity and stomatal conductance, a simple linear regression is performed between Gas Exchange (A) and Stomatal conductance (Gsw). The resulting scatter plot shows the data points together with the fitted regression line:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress
dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
dataset_clean = dataset.dropna(subset=["Gas Exchange (A) (µmol m⁻² s⁻¹)", "Stomatal conductance (Gsw) (µmol m⁻² s⁻¹)"])
x = dataset_clean["Gas Exchange (A) (µmol m⁻² s⁻¹)"]
y = dataset_clean["Stomatal conductance (Gsw) (µmol m⁻² s⁻¹)"]
slope, intercept, r_value, p_value, std_err = linregress(x, y)
regression_line = slope * x + intercept
plt.scatter(dataset_clean["Gas Exchange (A) (µmol m⁻² s⁻¹)"], dataset_clean["Stomatal conductance (Gsw) (µmol m⁻² s⁻¹)"], alpha=0.6)
plt.plot(x, regression_line, color="red")
plt.xlabel("Gas Exchange (A)")
plt.ylabel("Stomatal conductance (Gsw)")
plt.title("Gas Exchange vs Stomatal Conductance")
plt.show()
print("y ="+str(slope)+"*x"+" + " + str(intercept))
Output:
y =0.005358690270373903*x + 0.005095107097022678