Obtaining the data

Note

The following tutorial is intended to be run entirely with Python 3.

Installing WSL

To run the following tutorial on Windows 10/11, you need a Linux environment. The easiest way to achieve this is by using the Windows Subsystem for Linux (WSL). To install it, follow the steps below:

Go to Control Panel
Change View By to “Category”
Go to Programs
Go to Turn Windows features on or off
Check Windows Subsystem for Linux and click OK
Restart your computer
Go to Microsoft Store
Search for “Ubuntu”
Click on the first search result
Click on Get or Install
Open Ubuntu
Wait for installation and choose username and password
Done!
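Before moving on, it can be useful to confirm from Python that the wsl launcher is reachable. The snippet below is a minimal sketch: the helper name wsl_available is hypothetical, and it only checks that a wsl executable is on the PATH, not that a Linux distribution has finished installing.

```python
import shutil


def wsl_available(which=shutil.which):
    """Return True if a `wsl` executable can be found on the PATH.

    `which` is injectable only so the check can be exercised without WSL;
    by default it is the standard library's shutil.which.
    """
    return which("wsl") is not None


# Report the result on the current machine
print("wsl launcher found" if wsl_available() else "wsl launcher not found")
```

If the launcher is not found on Windows, repeating the installation steps above (including the restart) is the first thing to try.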

Creating working directory

The following commands create an Agri-Plast directory in C:/ on Windows, or in your home directory (for example, /home/user/) on Linux/Mac:

On Windows:

import subprocess
# -p avoids an error if the directory already exists
subprocess.run('wsl bash -c "mkdir -p /mnt/c/Agri-Plast"', shell=True, check=True)

On Linux/Mac:

import subprocess
# -p avoids an error if the directory already exists
subprocess.run('mkdir -p ~/Agri-Plast', shell=True, check=True)
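As an alternative to calling the shell, the directory can also be created with Python's standard pathlib module. This is a sketch rather than part of the original workflow: the helper make_workdir is hypothetical, and by default it uses the user's home directory on every platform, which on Windows is not the C:/Agri-Plast location used in the rest of this tutorial (pass base="C:/" to match it).

```python
from pathlib import Path


def make_workdir(base=None, name="Agri-Plast"):
    """Create the working directory under `base` and return its path.

    `base` defaults to the current user's home directory.
    """
    base = Path(base) if base is not None else Path.home()
    workdir = base / name
    # exist_ok=True makes the call safe to repeat
    workdir.mkdir(parents=True, exist_ok=True)
    return workdir


# Example (commented out so nothing is created on import):
# workdir = make_workdir()          # ~/Agri-Plast
# workdir = make_workdir("C:/")     # C:/Agri-Plast on Windows
```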

Downloading metadata file

In this tutorial, we will first focus on the following dataset: https://dmportal.biodata.pt/dataset.xhtml?persistentId=doi:10.34636/DMPortal/AWYIXC

From this link, take note of the DOI:

doi:10.34636/DMPortal/AWYIXC

Then assign it to the doi variable in the following command:

On Windows:

import subprocess
doi = "doi:10.34636/DMPortal/AWYIXC"
subprocess.run(f'wsl bash -c "curl -L \\"https://dmportal.biodata.pt/api/datasets/:persistentId/?persistentId={doi}\\" -o /mnt/c/Agri-Plast/dataset.metadata"', shell=True, check=True)

On Linux/Mac:

import subprocess
doi = "doi:10.34636/DMPortal/AWYIXC"
subprocess.run(f'curl -L "https://dmportal.biodata.pt/api/datasets/:persistentId/?persistentId={doi}" -o ~/Agri-Plast/dataset.metadata', shell=True, check=True)
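The same request can be made without curl, using only the Python standard library. The sketch below builds the same URL as the commands above; the function names metadata_url and download_metadata are hypothetical helpers, and the download itself is left as a commented-out call because it needs network access.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://dmportal.biodata.pt/api"


def metadata_url(doi):
    """Build the metadata URL for a dataset DOI, as used in the curl command."""
    # safe=":/" keeps the DOI readable instead of percent-encoding it
    query = urlencode({"persistentId": doi}, safe=":/")
    return f"{API_BASE}/datasets/:persistentId/?{query}"


def download_metadata(doi, dest):
    """Save the metadata response to `dest` (urlopen follows redirects)."""
    with urlopen(metadata_url(doi)) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


print(metadata_url("doi:10.34636/DMPortal/AWYIXC"))
# download_metadata("doi:10.34636/DMPortal/AWYIXC", "dataset.metadata")
```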

Obtaining File IDs

To obtain the ID of each file in the dataset:

Open dataset.metadata
Search for: "dataFile":{"id":
Take note of the number that follows the search term. Next to it you will find "filename": followed by the name of the file.
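The manual search above can also be done programmatically, assuming dataset.metadata is valid JSON. The sketch below walks the parsed document and collects every "dataFile" object that has an "id", without relying on the exact nesting of the response. The helper find_data_files is hypothetical, and SAMPLE is an illustrative fragment with made-up file names, not the real response.

```python
import json


def find_data_files(node):
    """Yield (id, filename) for every "dataFile" object in parsed JSON."""
    if isinstance(node, dict):
        data_file = node.get("dataFile")
        if isinstance(data_file, dict) and "id" in data_file:
            yield data_file["id"], data_file.get("filename")
        # Keep searching in nested objects
        for value in node.values():
            yield from find_data_files(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_data_files(item)


# Illustrative fragment only: structure and file names are assumptions
SAMPLE = """
{"data": {"files": [
    {"dataFile": {"id": 1228, "filename": "example_a.csv"}},
    {"dataFile": {"id": 1229, "filename": "example_b.csv"}}
]}}
"""

for file_id, filename in find_data_files(json.loads(SAMPLE)):
    print(file_id, filename)
```

To run it on the downloaded file, replace json.loads(SAMPLE) with json.load(open(path)) where path points to dataset.metadata.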

Downloading dataset files

In the dataset presented here, the IDs of the two files are 1228 and 1229. To download a file, set file_id to the desired ID in the following command:

On Windows:

import subprocess
file_id = "1229"  # named file_id to avoid shadowing Python's built-in id()
subprocess.run(f'wsl bash -c "curl -L \\"https://dmportal.biodata.pt/api/access/datafile/{file_id}?format=original\\" -o /mnt/c/Agri-Plast/file_{file_id}.csv"', shell=True, check=True)

On Linux/Mac:

import subprocess
file_id = "1229"  # named file_id to avoid shadowing Python's built-in id()
subprocess.run(f'curl -L "https://dmportal.biodata.pt/api/access/datafile/{file_id}?format=original" -o ~/Agri-Plast/file_{file_id}.csv', shell=True, check=True)
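Since this dataset has two files, the download can be repeated in a loop instead of editing the ID by hand. The sketch below uses only the standard library and the same URL and file naming pattern as the commands above. The helpers datafile_url, plan_downloads and download_all are hypothetical, and the actual download is left commented out because it needs network access.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def datafile_url(file_id):
    """Build the download URL for one file ID."""
    return f"https://dmportal.biodata.pt/api/access/datafile/{file_id}?format=original"


def plan_downloads(file_ids, workdir):
    """Return (url, destination path) pairs, one per file ID."""
    return [(datafile_url(i), Path(workdir) / f"file_{i}.csv") for i in file_ids]


def download_all(file_ids, workdir):
    """Download every file in `file_ids` into `workdir`."""
    for url, dest in plan_downloads(file_ids, workdir):
        with urlopen(url) as response, open(dest, "wb") as out:
            shutil.copyfileobj(response, out)


for url, dest in plan_downloads(["1228", "1229"], Path.home() / "Agri-Plast"):
    print(url, "->", dest)
# download_all(["1228", "1229"], Path.home() / "Agri-Plast")
```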

Inspecting the file

import pandas as pd
dataset = pd.read_csv("C:\\Agri-Plast\\file_1229.csv") # Change to "~/Agri-Plast/file_1229.csv" on Linux/Mac
print(dataset)

Output:

            Date      Time  Temperature (°C)  Humidity (%HR)
0     27/06/2024  11:27:00              31.6            92.8
1     27/06/2024  11:42:00              36.6            43.7
2     27/06/2024  11:57:00              38.1            41.4
3     27/06/2024  12:12:00              38.4            39.7
4     27/06/2024  12:27:00              38.8            39.5
...          ...       ...               ...             ...
7995  18/09/2024  18:12:00              26.2            60.5
7996  18/09/2024  18:27:00              26.0            63.2
7997  18/09/2024  18:42:00              25.8            65.7
7998  18/09/2024  18:57:00              25.5            66.5
7999  18/09/2024  19:12:00              25.3            67.3

[8000 rows x 4 columns]
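The Date and Time columns are read as plain text, and the dates are written day first. For time-based work it can help to combine them into a single timestamp. The sketch below does this on a small in-memory sample built from the first three rows shown above; the column name Timestamp is an assumption introduced here, and the sample assumes the file is comma-separated.

```python
import io

import pandas as pd

# First three rows of the output above, written as comma-separated text
SAMPLE = """Date,Time,Temperature (°C),Humidity (%HR)
27/06/2024,11:27:00,31.6,92.8
27/06/2024,11:42:00,36.6,43.7
27/06/2024,11:57:00,38.1,41.4
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# An explicit day-first format avoids reading 27/06 as month 27
df["Timestamp"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%d/%m/%Y %H:%M:%S"
)
df = df.set_index("Timestamp")
print(df.index)
```

The same two lines can be applied to the dataset variable loaded from file_1229.csv.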

A simple linear regression

import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress

dataset = pd.read_csv("C:\\Agri-Plast\\file_1229.csv") # Change to "~/Agri-Plast/file_1229.csv" on Linux/Mac
# Drop rows with missing temperature or humidity readings
dataset_clean = dataset.dropna(subset=["Temperature (°C)", "Humidity (%HR)"])

x = dataset_clean["Temperature (°C)"]
y = dataset_clean["Humidity (%HR)"]

# Fit humidity as a linear function of temperature
slope, intercept, r_value, p_value, std_err = linregress(x, y)
regression_line = slope * x + intercept

# Plot the observations and the fitted line
plt.scatter(x, y, alpha=0.6)
plt.plot(x, regression_line, color="red")

plt.xlabel("Temperature (°C)")
plt.ylabel("Humidity (%HR)")
plt.title("Temperature vs Humidity")
plt.show()
print("y ="+str(slope)+"*x"+" + " + str(intercept))

Output:

https://github.com/lmgoncalves94/Agri-Plast_API/blob/main/docs/source/1229_plot_reg.png?raw=true

y =-2.471288951824853*x + 128.68938924684124
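To make the meaning of the slope and intercept more concrete, the sketch below computes an ordinary least-squares line by hand: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the mean point. The helper least_squares is hypothetical and is shown for illustration only; it uses just the first five readings from the output above, so its result will differ from the full-dataset fit.

```python
from statistics import mean


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# First five readings from the dataset output (temperature, humidity)
temperature = [31.6, 36.6, 38.1, 38.4, 38.8]
humidity = [92.8, 43.7, 41.4, 39.7, 39.5]

slope, intercept = least_squares(temperature, humidity)
print("y =", slope, "* x +", intercept)
```

A negative slope, as in the full-dataset result above, means that humidity tends to fall as temperature rises.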