Case study

Note

The following tutorial is intended to be run entirely with Python 3.

Obtaining the files

In the case you haven’t created the working directory, run the following:

On Windows:

import subprocess
subprocess.run('wsl bash -c "mkdir /mnt/c/Agri-Plast"', shell=True, check=True)

On Linux/Mac:

import subprocess
subprocess.run('mkdir ~/Agri-Plast', shell=True, check=True)

In this case study we’ll be focusing on the following dataset: https://dmportal.biodata.pt/dataset.xhtml?persistentId=doi:10.34636/DMPortal/Q3YLV4

Take note of the doi:

doi:10.34636/DMPortal/Q3YLV4

Run the following to obtain the metadata file:

On Windows:

import subprocess
doi = "doi:10.34636/DMPortal/Q3YLV4"
subprocess.run(f'wsl bash -c "curl -L \\"https://dmportal.biodata.pt/api/datasets/:persistentId/?persistentId={doi}\\" -o /mnt/c/Agri-Plast/dataset.metadata"', shell=True, check=True)

On Linux/Mac:

import subprocess
doi = "doi:10.34636/DMPortal/Q3YLV4"
subprocess.run(f'curl -L "https://dmportal.biodata.pt/api/datasets/:persistentId/?persistentId={doi}" -o ~/Agri-Plast/dataset.metadata', shell=True, check=True)

Searching for the ID following "dataFile":{"id": the following IDs are identified: 1231 and 1232. Run the following to obtain both files:

On Windows:

import subprocess
id = "1231"
subprocess.run(f'wsl bash -c "curl -L \\"https://dmportal.biodata.pt/api/access/datafile/{id}?format=original\\" -o /mnt/c/Agri-Plast/file_{id}.csv"', shell=True, check=True)
id = "1232"
subprocess.run(f'wsl bash -c "curl -L \\"https://dmportal.biodata.pt/api/access/datafile/{id}?format=original\\" -o /mnt/c/Agri-Plast/file_{id}.csv"', shell=True, check=True)

On Linux/Mac:

import subprocess
id = "1231"
subprocess.run(f'curl -L "https://dmportal.biodata.pt/api/access/datafile/{id}?format=original" -o ~/Agri-Plast/file_{id}.csv', shell=True, check=True)
id = "1232"
subprocess.run(f'curl -L "https://dmportal.biodata.pt/api/access/datafile/{id}?format=original" -o ~/Agri-Plast/file_{id}.csv', shell=True, check=True)

We can now look at the first file:

import pandas as pd
dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
print(dataset.head(10))

Output:

   ID   Treatment       Log Time Start Time Finish  \
58     Control   Log (4)      11h26       11h36
62     Control   Log (5)      11h36       11h46
19     Control   Log (8)      12h13       12h23
 5     Control   Log (9)      12h24       12h34
18     Control  Log (14)      13h26       13h36
13     Control  Log (15)      13h40       13h50
39  High [MPs]   Log (1)      10h50       11h00
30  High [MPs]   Log (7)      12h01       12h11
40  High [MPs]  Log (10)      12h37       12h47
28  High [MPs]  Log (13)      13h15       13h25

   Gas Exchange (A) (µmol m⁻² s⁻¹)  \
                          2.510
                          5.399
                          6.747
                          5.448
                          7.651
                          4.337
                          3.947
                          4.025
                          5.975
                          6.234

   Stomatal conductance (Gsw)  (µmol m⁻² s⁻¹)        Date Observations
                                  0.028545  03/07/2024   Before MPs
                                  0.003452  03/07/2024   Before MPs
                                  0.031398  03/07/2024   Before MPs
                                  0.044850  03/07/2024   Before MPs
                                  0.040487  03/07/2024   Before MPs
                                  0.060281  03/07/2024   Before MPs
                                  0.019934  03/07/2024   Before MPs
                                  0.031952  03/07/2024   Before MPs
                                  0.040976  03/07/2024   Before MPs
                                  0.060094  03/07/2024   Before MPs

Stomatal Conductance

To start, a boxplot of the Stomatal conductance by Microplastic concentration will be generated by running the following lines:

import pandas as pd
import matplotlib.pyplot as plt

dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac

order = ["Control", "Low [MPs]", "High [MPs]"]
dataset["Treatment"] = pd.Categorical(dataset["Treatment"], categories=order, ordered=True)
dataset.boxplot(
    column="Stomatal conductance (Gsw)  (µmol m⁻² s⁻¹)",
    by="Treatment"
)
plt.title("Stomatal Conductance by Treatment")
plt.xlabel("Treatment")
plt.suptitle("")
plt.ylabel("Stomatal Conductance (Gsw)")
plt.show()

Output:

https://github.com/lmgoncalves94/Agri-Plast_API/blob/main/docs/source/1231_stom_boxplot.png?raw=true

To address about the statistically significant difference between treatments, we need to use a Global test (followed by a Post-hoc if there are observed differences). But first, the normality of the data for each treatment must be studied to know which tests must be used. So, we first plot the distribution of the values per treatment to visually address about the data normality:

import matplotlib.pyplot as plt

dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
data = dataset[dataset["Treatment"] == "Control"]["Stomatal conductance (Gsw)  (µmol m⁻² s⁻¹)"]

plt.figure(figsize=(6,5))
plt.hist(data, bins=20)
plt.title("Frequency of Stomatal Conductance - Control")
plt.xlabel("Stomatal Conductance (Gsw)")
plt.ylabel("Frequency")
plt.show()

Output:

https://github.com/lmgoncalves94/Agri-Plast_API/blob/main/docs/source/1231_stom_hist_control.png?raw=true

import matplotlib.pyplot as plt

dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
data = dataset[dataset["Treatment"] == "Low [MPs]"]["Stomatal conductance (Gsw)  (µmol m⁻² s⁻¹)"]

plt.figure(figsize=(6,5))
plt.hist(data, bins=20)
plt.title("Frequency of Stomatal Conductance - Low [MPs]")
plt.xlabel("Stomatal Conductance (Gsw)")
plt.ylabel("Frequency")
plt.show()

Output:

https://github.com/lmgoncalves94/Agri-Plast_API/blob/main/docs/source/1231_stom_hist_low.png?raw=true

import matplotlib.pyplot as plt

dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
data = dataset[dataset["Treatment"] == "High [MPs]"]["Stomatal conductance (Gsw)  (µmol m⁻² s⁻¹)"]

plt.figure(figsize=(6,5))
plt.hist(data, bins=20)
plt.title("Frequency of Stomatal Conductance - High [MPs]")
plt.xlabel("Stomatal Conductance (Gsw)")
plt.ylabel("Frequency")
plt.show()

Output:

https://github.com/lmgoncalves94/Agri-Plast_API/blob/main/docs/source/1231_stom_hist_high.png?raw=true

The data does not appear to normally distributed. To be sure, we run the Shapiro’s Test.

For Control data:

from scipy.stats import shapiro

dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
data = dataset[dataset["Treatment"] == "Control"]["Stomatal conductance (Gsw)  (µmol m⁻² s⁻¹)"]
stat, p_value = shapiro(data)
print(stat)
print(p_value)

Output:

0.8156227285188757
0.004425037569148782

For Low [MPs]:

from scipy.stats import shapiro

dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
data = dataset[dataset["Treatment"] == "Low [MPs]"]["Stomatal conductance (Gsw)  (µmol m⁻² s⁻¹)"]
stat, p_value = shapiro(data)
print(stat)
print(p_value)

Output:

0.7881585746105204
0.0026013957337220357

For High [MPs]:

from scipy.stats import shapiro

dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
data = dataset[dataset["Treatment"] == "High [MPs]"]["Stomatal conductance (Gsw)  (µmol m⁻² s⁻¹)"]
stat, p_value = shapiro(data)
print(stat)
print(p_value)

Output:

0.8724685354837892
0.03668397290204818

For all the groups, the test rejected the null hypothesis of normality (p_value < 0.05), indicating that stomatal conductance values are not normally distributed. Therefore, Kruskal-Wallis Global test should be used instead of One-Way ANOVA:

from scipy.stats import kruskal
import pandas as pd

dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac

control = dataset[dataset["Treatment"] == "Control"]["Stomatal conductance (Gsw)  (µmol m⁻² s⁻¹)"]
low = dataset[dataset["Treatment"] == "Low [MPs]"]["Stomatal conductance (Gsw)  (µmol m⁻² s⁻¹)"]
high = dataset[dataset["Treatment"] == "High [MPs]"]["Stomatal conductance (Gsw)  (µmol m⁻² s⁻¹)"]

H, p = kruskal(control, low, high)
print(H)
print(p)

Output:

0.21184088806657542
0.8994962055196596

The analysis revealed no statistically significant differences among the Control, Low [MPs], and High [MPs] groups (H = 0.21, p = 0.90), indicating that exposure to soil microplastics did not affect stomatal conductance under the conditions tested.

Gas Exchange vs Stomatal Conductance

To explore the relationship between photosynthetic activity and stomatal conductance, a simple linear regression is performed between Gas Exchange (A) and Stomatal conductance (Gsw). The resulting scatter plot shows the data points together with the fitted regression line:

import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress

dataset = pd.read_csv("C:\\Agri-Plast\\file_1231.csv") # Change to "~/Agri-Plast/file_1231.csv" on Linux/Mac
dataset_clean = dataset.dropna(subset=["Gas Exchange (A) (µmol m⁻² s⁻¹)", "Stomatal conductance (Gsw)  (µmol m⁻² s⁻¹)"])

x = dataset_clean["Gas Exchange (A) (µmol m⁻² s⁻¹)"]
y = dataset_clean["Stomatal conductance (Gsw)  (µmol m⁻² s⁻¹)"]

slope, intercept, r_value, p_value, std_err = linregress(x, y)
regression_line = slope * x + intercept

plt.scatter(dataset_clean["Gas Exchange (A) (µmol m⁻² s⁻¹)"], dataset_clean["Stomatal conductance (Gsw)  (µmol m⁻² s⁻¹)"], alpha=0.6)
plt.plot(x, regression_line, color="red")

plt.xlabel("Gas Exchange (A)")
plt.ylabel("Stomatal conductance (Gsw)")
plt.title("Gas Exchange vs Stomatal Conductance")
plt.show()
print("y ="+str(slope)+"*x"+" + " + str(intercept))

Output:

https://github.com/lmgoncalves94/Agri-Plast_API/blob/main/docs/source/1231_gas_stom_reg.png?raw=true

y =0.005358690270373903*x + 0.005095107097022678