Phenology of Dipteryx¶
Students¶
Jose Pablo Jiménez
Dawa Méndez Álvarez
Description¶
The project consists of exploring, cleaning, processing, and analyzing a database of the genus Dipteryx in the Americas. We use several tools learned in the course to obtain useful information about the distribution of the genus and its form and function traits. The aim is to contribute to the sustainable use of the resource and to the conservation of the forest species in the genus.
Justification¶
The genus Dipteryx, in the family Fabaceae, is of great ecological and economic importance in the tropical regions of South and Central America. Dipteryx trees are fundamental to biodiversity conservation because they provide habitat and food for a variety of wildlife species. Their hard, durable wood is also valued in construction and cabinetmaking, which underlines its relevance to local economies. For these reasons, conserving the species of the genus Dipteryx is vital both for the ecosystems where they occur and for the human communities that depend on their many benefits.
Background¶
Leaf blade area, photosynthetic capacity, and other leaf traits are of great importance for determining the resilience of plant species to climatic conditions (Niinemets et al. 2001, Fyllas et al. 2009, Malhado et al. 2009). Similarly, the flowering period, the fruiting period, and the developmental stage of organisms are used to study the life cycles of plant species, their capacity to adapt to environmental change, and their mortality risk (Wright et al. 2011). However, collecting this information requires extensive fieldwork and long sampling periods. Moreover, comparing different sites or taxa requires reviewing the scientific literature and trying to recover the information from published articles, yet few publications include the "raw data". As a result, it used to be difficult for a single researcher to gather enough information to analyze a taxon comprehensively. With the advent of communication technologies and the Internet, this has become easier. Researchers can now share all their information transparently, and new repositories are being created to store it. The current problem lies not in access to information but in how to analyze the large volumes that are beginning to accumulate. Hence the importance of data science in biology.
Problem description¶
At the technical level, the database we are working with has an unusual layout that makes the data hard to analyze and interpret. It keeps a fixed number of columns, which hold the classifiers, while the variables are stored in the rows. As a result, a single observation or individual spans several rows (one per variable). Moreover, because the database compiles information from different projects without an established standard, some entries have more variables than others. This produces a large number of null values and makes it harder to find the information that matters for decisions about the use and conservation of the genus.
At the research level, we need a better understanding of how environmental variables, such as the solar radiation rate, influence the phenological patterns and phenotypic traits of species in the genus Dipteryx.
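The long layout described above (one row per variable, several rows per observation) can be reshaped into one row per observation. The following is a minimal sketch with made-up toy values; only the column names `ObservationID`, `DataName`, and `OrigValueStr` come from the dataset, and `aggfunc="first"` is an assumption about how to handle a variable that repeats within an observation.

```python
import pandas as pd

# Toy data in the same long layout as the TRY export (values are made up).
long_df = pd.DataFrame({
    "ObservationID": [1, 1, 1, 2, 2],
    "DataName": ["Latitude", "Longitude", "Altitude", "Latitude", "Longitude"],
    "OrigValueStr": ["10.77", "-84.03", "50", "-14.70", "-52.35"],
})

# One row per observation, one column per variable.
# aggfunc="first" keeps the first value if a variable repeats (an assumption).
wide_df = long_df.pivot_table(
    index="ObservationID",
    columns="DataName",
    values="OrigValueStr",
    aggfunc="first",
)

# Variables never recorded for an observation appear as NaN.
print(wide_df)
print(wide_df.isna().sum())
```

In the wide table, observation 2 has no altitude, which is how the uneven number of variables per entry turns into null values.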
Objective¶
Analyze data on the phenotypic traits of species in the genus Dipteryx and relate them to environmental variables and to their distribution across different regions of the Americas.
Description of the dataset, with a formal reference to the source¶
The data were downloaded from the TRY Plant Trait Database (Kattge et al., 2020; https://try-db.org/TryWeb/Home.php). Through this database, data can be requested for download by species or by trait. The information is delivered as a compressed text file (.txt) in Latin-1 encoding (listed as "latin1 swedish ci"). Columns are tab-delimited, and the first row contains the header. Each row corresponds to one entry, categorized as a trait, a covariate, or metadata. This results in multiple entries per record, i.e. several rows per individual.
The data used in this exercise can be accessed at the following link: https://drive.google.com/file/d/16yDnspiMg1bWuAfdfr7e2r0tEz9jlOXk/view?usp=sharing
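The file format described above (tab-delimited, Latin-1, header in the first row) can be read with `pd.read_csv`. The sketch below uses a made-up in-memory sample instead of the real file. It also assumes, without confirmation from the source, that every line ends with a trailing tab; that would be one plausible reason for an empty column such as `Unnamed: 28` appearing after loading.

```python
import io
import pandas as pd

# Made-up sample: header row, tab-separated, every line ends with a tab
# (the trailing tab is an assumption about the export, not a documented fact).
raw = (
    "SpeciesName\tDataName\tOrigValueStr\t\n"
    "Dipteryx alata\tLocation Region\tGoiás\t\n"
    "Dipteryx alata\tLatitude\t-14.70\t\n"
).encode("latin-1")

# Same arguments as for the real file: tab separator and Latin-1 encoding.
sample = pd.read_csv(io.BytesIO(raw), sep="\t", encoding="latin-1")
print(sample.columns.tolist())  # the trailing tab yields an extra 'Unnamed: 3' column

# Drop columns that are entirely empty.
sample = sample.dropna(axis=1, how="all")
print(sample)
```

Decoding with the wrong encoding would garble accented characters such as the "á" in the region name, which is why the encoding is passed explicitly.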
Install libraries
#Install the libraries that will be used
!pip install numpy
!pip install pandas
!pip install seaborn
!pip install scikit-learn
!pip install matplotlib
!pip install geopandas
!pip install ydata-profiling
Load packages
#Load the packages needed for the analysis
import numpy as np
import pandas as pd
import seaborn as sns
import geopandas as gpd
from sklearn import datasets
from ydata_profiling import ProfileReport
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
%matplotlib inline
Import the data
#Load the data from Google Drive
from google.colab import drive
drive.mount('/content/drive')
df = pd.read_csv('/content/drive/MyDrive/2_Cursos/Capacitacion/2024_redbioma_Python_para_Ciencia_de_Datos/ProyectoFinal/Dipteryx.txt', sep='\t', encoding='latin-1')
Explore the data
#Create a report with ydata-profiling
profile = ProfileReport(df, title="Dipteryx", explorative=True)
#Display the report in a notebook (Jupyter or similar)
profile.to_notebook_iframe()
Exploring the data shows that some columns hold information that is not relevant for later analyses, for example the names of the person who submitted the information to the database or some internal TRY codes used to classify records. Moreover, some columns carry redundant information. In addition, many spurious associations appear, artifacts of how the database is organized. Finally, there is a high percentage of null or missing data, again a product of how the database is organized. It is therefore necessary to clean the data by filtering out every column that is not useful, as well as the empty cells.
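The share of missing values per column can also be checked directly with pandas, independently of the profiling report. This is a small sketch on a made-up frame; the column names mirror the dataset, but the values are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up frame with the same kind of gaps as the real data.
toy = pd.DataFrame({
    "SpeciesName": ["Dipteryx alata"] * 4,
    "OrigValueStr": ["T", None, "10.77", None],
    "StdValue": [np.nan, np.nan, 10.77, np.nan],
    "Comment": [np.nan] * 4,
})

# Fraction of null cells per column, largest first.
null_share = toy.isna().mean().sort_values(ascending=False)
print(null_share)

# Columns with no information at all are candidates for removal.
empty_cols = null_share[null_share == 1.0].index.tolist()
print(empty_cols)
```

A table like `null_share` makes it easier to decide which columns to drop and which rows to filter before any plotting.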
Data cleaning
#Drop unnecessary columns to simplify the dataframe
df.drop(['LastName', 'FirstName', 'DatasetID', 'Dataset', 'AccSpeciesID',
         'ObsDataID', 'TraitID', 'TraitName', 'DataID', 'ValueKindName',
         'OrigUncertaintyStr', 'UncertaintyName', 'Replicates',
         'RelUncertaintyPercent', 'OrigObsDataID', 'ErrorRisk', 'Comment',
         'StdValueStr', 'Unnamed: 28', 'StdValue', 'UnitName', 'Reference'],
        axis=1, inplace=True)
#Drop rows with missing values (NaN); .copy() makes df_clean independent of df
df_clean = df.dropna(subset=['OrigValueStr']).copy()
#Correct the species name Coumarouna odorata to Dipteryx odorata
#(the mask is built from df_clean so that it matches the rows being edited)
df_clean.loc[df_clean['SpeciesName'] == 'Coumarouna odorata', 'SpeciesName'] = 'Dipteryx odorata'
#Display the simplified and corrected dataframe
df_clean
| SpeciesName | AccSpeciesName | ObservationID | DataName | OriglName | OrigValueStr | OrigUnitStr |
---|---|---|---|---|---|---|---|
0 | Dipteryx panamensis | Dipteryx oleifera | 221146 | Plant developmental status / plant age / matur... | Seedlings (True/False) | T | NaN |
1 | Dipteryx panamensis | Dipteryx oleifera | 221146 | Location Site ID | Geography | 10°46'N, 84°02'W | NaN |
2 | Dipteryx panamensis | Dipteryx oleifera | 221146 | Latitude | Latitude | 10.77 | dec |
3 | Dipteryx panamensis | Dipteryx oleifera | 221146 | Longitude | Longitude | -84.03 | dec |
4 | Dipteryx panamensis | Dipteryx oleifera | 221146 | Vegetation type / Biome | Community type | tropical mosit forest | NaN |
... | ... | ... | ... | ... | ... | ... | ... |
3762 | Dipteryx alata | Dipteryx alata | 6466292 | Location / Site Name | location | Bacaba Municipal Park | NaN |
3763 | Dipteryx alata | Dipteryx alata | 6466292 | Location City, municipality | city | Xavantina | NaN |
3764 | Dipteryx alata | Dipteryx alata | 6466292 | Location Region | region | Mato Grosso | NaN |
3765 | Dipteryx alata | Dipteryx alata | 6466292 | Location Country | country | Brazil | NaN |
3766 | Dipteryx alata | Dipteryx alata | 6466292 | Height of measurement: stem diameter, tree rin... | measurement height | 30 | cm |
3767 rows × 7 columns
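When values are assigned into a filtered DataFrame such as `df_clean`, pandas may emit a `SettingWithCopyWarning` unless the subset is an explicit copy. The sketch below illustrates the pattern on made-up rows; the frame `toy` and its values are hypothetical, and only the two species names come from the notebook.

```python
import pandas as pd

# Made-up rows; only the species names are taken from the notebook.
toy = pd.DataFrame({
    "SpeciesName": ["Coumarouna odorata", "Dipteryx alata", "Coumarouna odorata"],
    "OrigValueStr": ["12.5", None, "30"],
})

# Take an explicit copy of the filtered rows before editing them.
clean = toy.dropna(subset=["OrigValueStr"]).copy()

# Build the mask from the same frame that is being modified.
mask = clean["SpeciesName"] == "Coumarouna odorata"
clean.loc[mask, "SpeciesName"] = "Dipteryx odorata"

print(clean)
print(toy)  # the original frame is left unchanged
```

Working on a copy keeps the raw table intact, so the cleaning step can be rerun without reloading the file.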
#Show the number of unique values in each column of the dataframe
df_clean.nunique()
SpeciesName        10
AccSpeciesName      6
ObservationID     147
DataName          186
OriglName         273
OrigValueStr      954
OrigUnitStr        42
dtype: int64
#Show the unique values in the DataName column in order to select
#some variables to analyze and plot
valores_unicos = df_clean['DataName'].unique()
print("Unique values in 'DataName':", valores_unicos)
#Count the frequency of each value in the DataName column
frecuencia_valores = df_clean['DataName'].value_counts()
print("\nFrequency of each value in 'DataName':\n", frecuencia_valores)
Unique values in 'DataName': ['Plant developmental status / plant age / maturity / plant life stage'
 'Location Site ID'
 'Latitude'
 'Longitude'
 'Vegetation type / Biome'
 'Mean daily radiation'
 'Mean radiation percent of full sunlight'
 'SLA: undefined if petiole in- or excluded'
 'Altitude'
 'Soil type (soil order)'
 'Soil fertility index (qi_1)'
 'Plant height vegetative'
 'Leaf exposition'
 'Dataset (1)'
 'Reference / source'
 'Treatment: Exposition'
 'Family'
 'Genus'
 'Leaf area: in case of compound leaves leaf; petiole and rhachis included'
 'SLA disc: mid-vein, petiole and rhachis excluded; sun'
 'SLA: petiole included'
 'SLA lamina: petiole and rhachis excluded; sun'
 'Leaf area: in case of compound leaves leaf; petiole included (1)'
 'SLA disc: mid-vein, petiole and rhachis excluded; shade'
 'SLA leaf; shade'
 'SLA lamina: petiole and rhachis excluded; shade'
 'Mean sum of annual precipitation (PPT / MAP / TAP)'
 'Mean length of dry season'
 'Parent rock'
 'Site species richness'
 'Site tree density'
 'Identifier within contributed dataset (ID)'
 'Mean annual temperature (MAT)'
 'Mean annual VPD'
 'Mean Windspeed'
 'Class'
 'Subclass'
 'Canopy position: sun vers. Shade leaf qualifier, light exposure; canopy, understorey,'
 'SLA: petiole excluded'
 'Plant ID / Individual ID'
 'Leaf ID'
 'Exposition: position of plant in the canopy, canopy position, sun, shade'
 'Measurement date / sampling date'
 'Dispersal unit (diaspore): seed, fruit or spore'
 'Seed water content at seed weight measurement'
 'Seed mass comment'
 'Seed mass reference (sometimes diaspore (dispersal unit) type)'
 'Plant growth form reference'
 'Plant height maximum'
 'Plant height reference'
 'Altitude comments'
 'Net primary productivity of the site (NPP)'
 'Leaf area index of the site (LAI)'
 'Vegetation type / Biome ( 2)'
 'Dispersal unit (diaspore) detail'
 'Leaf area: in case of compound leaves leaf; undefined if petiole in- or excluded'
 'Leaf area: in case of compound leaves leaflet; undefined if petiole in- or excluded'
 'Months with sum of precipitation < 100 mm'
 'Location / Site Name'
 'Leaf area: in case of compound leaves undefined if leaf or leaflet; undefined if petiole and rhachis in- or excluded'
 'Health status of plants (vitality)'
 'Plot ID'
 'X coordinate within plot'
 'Y coordinate within plot'
 'Number of replicate organs per plant'
 'Confidence in species identification'
 'Is this individual to be included in analyses?'
 'Taxon ID in contributing dataset'
 'Height of measurement from the ground / height from which sample was collected / measurement height'
 'Was the individual fertile when sampled?'
 'Where is the botanical sample currently?'
 'Date of species identification update'
 'Who modified this individual most recently?'
 'Location Name'
 'Plot area, plot size'
 'Soil bedrock / geological substrate / parent material / lithology'
 'Mean diurnal temperature range'
 'Isothermality'
 'Temperature seasonality'
 'Max_temp_warmest_month'
 'Minimum temperature of coldest month'
 'Annual temperature range'
 'Mean_temp_wet_quarter'
 'Mean_temp_dry_quarter'
 'Mean temperature of warmest quarter'
 'Mean_temp_cold_quarter'
 'Precip_wet_month: Maximum monthly precipitation'
 'Precipitation of driest month: Minimum monthly precipitation'
 'Precipitation seasonality; rain seasonality'
 'Precipitation of wettest quarter'
 'Precipitation of driest quarter'
 'Precipitation of warmest quarter'
 'Precipitation of coldest quarter'
 'Plot name'
 'Habitat / site description'
 'Collectors'
 'Number of plant individuals at the plot'
 'Number of species at the plot'
 'Leaf area: in case of compound leaves leaf; petiole excluded'
 'Comments, notes, methods'
 'Location Country'
 'Atmospheric pressure'
 'Hemisphere where site is found'
 'Growth temperature'
 'Leaf temperature during measurement (Tleaf)'
 'Temperature during respiration measurements'
 'Leaf respiration: fixed Q10 for temperature standardization'
 'Leaf respiration: variable Q10 for temperature standardization'
 'Mean temperature of measuring month'
 'Wetness/Humidity/Aridity of area where samples were taken'
 'Mean annual sum of potential evapotranspiration (PET)'
 'Sampling or measurement date standardized'
 'Aci curve ID'
 'Leaf mass per area (LMA)'
 'Datatype'
 'Contact'
 'Study ID; external Dataset ID'
 'Height of measurement: stem diameter, tree rings, bark thickness'
 'Contact email'
 'Reference DOI (digital object identifier)'
 'Dataset reference (citation)'
 'Dataset DOI (digital object identifier) or url'
 'BAAD h.t measurement method'
 'BAAD h.c measurement method'
 'BAAD d.ba measurement method'
 'BAAD d.cr measurement method'
 'BAAD c.d measurement method'
 'Temperature: Mean Diurnal Range (Mean of monthly (max temp - min temp))'
 'Temperature: Isothermality (BIO2/BIO7) (* 100)'
 'Temperature: Seasonality (standard deviation *100)'
 'Temperature: Max Temperature of Warmest Month'
 'Temperature: Min Temperature of Coldest Month'
 'Temperature: Annual Range'
 'Temperature: Mean Temperature of Wettest Quarter'
 'Temperature: Mean Temperature of Driest Quarter'
 'Temperature: Mean Temperature of Warmest Quarter'
 'Temperature: Mean Temperature of Coldest Quarter'
 'Precipitation of Wettest Month'
 'Precipitation of Driest Month'
 'Precipitation Seasonality (Coefficient of Variation)'
 'Precipitation of Wettest Quarter'
 'Precipitation of Driest Quarter'
 'Precipitation of Warmest Quarter'
 'Precipitation of Coldest Quarter'
 'Solar radiation (kJ m-2 day-1)'
 'Water vapor pressure (kPa)'
 'Wind speed (m s-1)'
 'Dataset (2)'
 'Actual EvapoTranspiration'
 'Priestley-Taylor alpha coefficient'
 'Soil water content (SWC)'
 'Average annual relative humidity'
 'Cloud cover'
 'Average number of ground frost days per year (sum) (FRS)'
 'Mean number of wet days per year'
 'Mean cloud surface radiation budget from ERBE global: Global shortwave radiation budget data derived from 5 Years of ERBE measurements'
 'Mean clear-sky surface radiation budget from ERBE global: Global shortwave radiation budget data derived from 5 Years of ERBE measurements'
 'Mean cloud forcing surface radiation budget from ERBE global: Global shortwave radiation budget data derived from 5 Years of ERBE measurements'
 'Soil C content per ground area'
 'Soil N content per ground area'
 'Soil bulk density'
 'Soil field capacity'
 'Soil thermal capacity'
 'Soil ph'
 'Soil profile available water capacity'
 'Soil plant available water capacity of rooting zone (derived from remote sensing) 1'
 'Soil plant available water capacity of rooting zone (derived from remote sensing) 2'
 'Soil wilting point'
 'Ecosystem rooting depth'
 'Temperature sum of growing degree days (GDD)'
 'Maximum Green Vegetation Fraction'
 'NDVI of the site'
 'GPP of the site'
 'NPP of the site (2)'
 'Terrestrial chlorophyll index of the site'
 'Length of growing season (LGP)'
 'Order'
 'APG IV level 5'
 'APG IV level 4'
 'APG IV level 3'
 'APG IV level 2'
 'APG IV level 1'
 'Major Phylogenetic Group'
 'Fraction of absorbed photosynthetic active radiation (FAPAR) of the site'
 'Location City, municipality'
 'Location Region']

Frequency of each value in 'DataName':
 DataName
Latitude                                               124
Longitude                                              124
Family                                                 119
Mean sum of annual precipitation (PPT / MAP / TAP)     102
Altitude                                                71
                                                      ...
SLA lamina: petiole and rhachis excluded; shade          1
Growth temperature                                       1
Temperature during respiration measurements              1
Dispersal unit (diaspore) detail                         1
Mean temperature of measuring month                      1
Name: count, Length: 186, dtype: int64
Plot some variables
Solar radiation and leaf area index by site
#Filter the clean DataFrame by the variables of interest
df_radiacion = df_clean[df_clean['DataName'] == 'Solar radiation (kJ m-2 day-1)']
df_area = df_clean[df_clean['DataName'] == 'Leaf area index of the site (LAI)']
#Find the individuals that have both variables
individuos_radiacion = set(df_radiacion['ObservationID'])
individuos_area = set(df_area['ObservationID'])
#Intersection of individuals that have both records
individuos_ambos = individuos_radiacion & individuos_area
#Check whether there are individuals in common
if len(individuos_ambos) == 0:
    print("No individuals in common between 'Solar radiation (kJ m-2 day-1)' and 'Leaf area index of the site (LAI)'.")
else:
    #Filter the clean DataFrame to show only these individuals and the two variables
    df_filtrado = df_clean[(df_clean['ObservationID'].isin(individuos_ambos)) & (df_clean['DataName'].isin(['Solar radiation (kJ m-2 day-1)', 'Leaf area index of the site (LAI)']))]
    print("\nRecords of individuals with both variables (solar radiation and leaf area index):")
    print(df_filtrado)
Records of individuals with both variables (solar radiation and leaf area index):
| SpeciesName | AccSpeciesName | ObservationID | DataName | OriglName | OrigValueStr | OrigUnitStr |
---|---|---|---|---|---|---|---|
1772 | Dipteryx panamensis | Dipteryx oleifera | 3032484 | Solar radiation (kJ m-2 day-1) | SRAD | 19251.833333 | kJ m-2 day-1 |
1813 | Dipteryx panamensis | Dipteryx oleifera | 3032484 | Leaf area index of the site (LAI) | LAI | 4.0031943321 | m2/m2 |
1844 | Dipteryx panamensis | Dipteryx oleifera | 3032485 | Solar radiation (kJ m-2 day-1) | SRAD | 19251.833333 | kJ m-2 day-1 |
1885 | Dipteryx panamensis | Dipteryx oleifera | 3032485 | Leaf area index of the site (LAI) | LAI | 4.0031943321 | m2/m2 |
1916 | Dipteryx panamensis | Dipteryx oleifera | 3032486 | Solar radiation (kJ m-2 day-1) | SRAD | 19251.833333 | kJ m-2 day-1 |
1957 | Dipteryx panamensis | Dipteryx oleifera | 3032486 | Leaf area index of the site (LAI) | LAI | 4.0031943321 | m2/m2 |
2061 | Dipteryx panamensis | Dipteryx oleifera | 3055599 | Solar radiation (kJ m-2 day-1) | SRAD | 16831.333333 | kJ m-2 day-1 |
2102 | Dipteryx panamensis | Dipteryx oleifera | 3055599 | Leaf area index of the site (LAI) | LAI | 2.794444561 | m2/m2 |
3440 | Dipteryx oleifera | Dipteryx oleifera | 3114573 | Solar radiation (kJ m-2 day-1) | SRAD | 19301 | kJ m-2 day-1 |
3478 | Dipteryx oleifera | Dipteryx oleifera | 3114573 | Leaf area index of the site (LAI) | LAI | 2.9768054485 | m2/m2 |
3509 | Dipteryx odorata | Dipteryx odorata | 3124677 | Solar radiation (kJ m-2 day-1) | SRAD | 18574.333333 | kJ m-2 day-1 |
3550 | Dipteryx odorata | Dipteryx odorata | 3124677 | Leaf area index of the site (LAI) | LAI | 4.3365278244 | m2/m2 |
3582 | Dipteryx odorata | Dipteryx odorata | 3124679 | Solar radiation (kJ m-2 day-1) | SRAD | 18331.916667 | kJ m-2 day-1 |
3620 | Dipteryx odorata | Dipteryx odorata | 3124679 | Leaf area index of the site (LAI) | LAI | 3.9970834255 | m2/m2 |
3652 | Dipteryx punctata | Dipteryx punctata | 3124680 | Solar radiation (kJ m-2 day-1) | SRAD | 18331.916667 | kJ m-2 day-1 |
3690 | Dipteryx punctata | Dipteryx punctata | 3124680 | Leaf area index of the site (LAI) | LAI | 3.9970834255 | m2/m2 |
#Ensure the values in the 'OrigValueStr' column are interpreted as numeric,
#then round to two decimals (working on a copy avoids SettingWithCopyWarning)
df_filtrado = df_filtrado.copy()
df_filtrado['OrigValueStr'] = pd.to_numeric(df_filtrado['OrigValueStr'], errors='coerce').round(2)
#Create the vectors with the variables of interest
#(LAI and SRAD are paired by position: one row of each per ObservationID, in the same order)
LAI = df_filtrado.loc[df_filtrado['OriglName'] == 'LAI', 'OrigValueStr'].to_numpy()
SRAD = df_filtrado.loc[df_filtrado['OriglName'] == 'SRAD', 'OrigValueStr'].to_numpy()
#Sort the data in ascending order of LAI
sorted_indices = np.argsort(LAI)
x_sorted = LAI[sorted_indices]
y_sorted = SRAD[sorted_indices]
#Create the scatter plot, setting the title and the axis labels
plt.scatter(x_sorted, y_sorted)
plt.xlabel("Leaf area index (m2/m2)")
plt.ylabel("Solar radiation (kJ m-2 day-1)")
plt.title("Solar radiation vs. leaf area index")
plt.grid(True)
#Show the plot
plt.show()
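Pairing the two variables by position works only while every observation has exactly one LAI row and one SRAD row in the same order. A more defensive alternative, sketched here as an assumption and not as the notebook's method, is to pivot on `ObservationID` so that each pair is aligned by key. The values are copied from the printed records above, except that the LAI row of observation 3114573 is left out on purpose to show how incomplete pairs are dropped.

```python
import pandas as pd

# Values taken from the printed records; the LAI row of 3114573 is omitted on purpose.
toy = pd.DataFrame({
    "ObservationID": [3032484, 3032484, 3055599, 3055599, 3114573],
    "OriglName": ["SRAD", "LAI", "SRAD", "LAI", "SRAD"],
    "OrigValueStr": ["19251.833333", "4.0031943321", "16831.333333",
                     "2.794444561", "19301"],
})

# Convert text to numbers; unparseable entries become NaN.
toy["value"] = pd.to_numeric(toy["OrigValueStr"], errors="coerce")

# One row per observation with LAI and SRAD side by side, aligned by ObservationID.
paired = (
    toy.pivot_table(index="ObservationID", columns="OriglName",
                    values="value", aggfunc="first")
       .dropna(subset=["LAI", "SRAD"])  # keep only complete pairs
       .sort_values("LAI")              # ascending LAI, as in the plot
)
print(paired)
```

The columns `paired["LAI"]` and `paired["SRAD"]` could then be passed to `plt.scatter` directly.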
Phenological variables (dispersal unit)
# Filter the variable of interest: dispersal unit
df_dispersal = df_clean[(df_clean['DataName'] == 'Dispersal unit (diaspore): seed, fruit or spore') | (df_clean['DataName'] == 'Dispersal unit (diaspore) detail')]
# Count each dispersal category (assumed to use 'OrigValueStr', as in the developmental-status section)
dispersal_counts = df_dispersal['OrigValueStr'].value_counts()
# Custom colors for the bars
colors = ['#FF6347', '#4682B4']
# Create the bar chart
plt.figure(figsize=(10, 6))
bars = plt.bar(dispersal_counts.index, dispersal_counts.values, color=colors)
plt.title('Dispersal')
plt.xlabel('Dispersal units (seed, fruit, spore)')
plt.ylabel('Count')
plt.xticks(rotation=0)  # Keep the tick labels horizontal
plt.show()
Developmental status of the individual
# Filter the variable of interest (copy so the correction below modifies an independent DataFrame)
df_desarrollo = df_clean[df_clean['DataName'] == 'Plant developmental status / plant age / maturity / plant life stage'].copy()
# Correct the name of the growth category
df_desarrollo.loc[df_desarrollo['OrigValueStr'] == 'T', 'OrigValueStr'] = 'Seedling'
# Count the categories; the classification is stored in the 'OrigValueStr' column
desarrollo_counts = df_desarrollo['OrigValueStr'].value_counts()
# Custom colors for the bars
colors = ['#4682B4', '#32CD32', '#FF6347']
# Create the bar chart
plt.figure(figsize=(10, 6))
bars = plt.bar(desarrollo_counts.index, desarrollo_counts.values, color=colors)
plt.title('Development')
plt.xlabel('Developmental stage')
plt.ylabel('Number of individuals')
plt.xticks(rotation=0)
# Show the value above each bar
for bar in bars:
    yval = bar.get_height()
    # va='bottom' places the text just above the bar; ha='center' centers it horizontally
    plt.text(bar.get_x() + bar.get_width()/2.0, yval, int(yval), ha='center', va='bottom')
plt.show()
Locating the data geographically
# Filter the latitude records
df_lat = df[df['OriglName'].str.contains('latitude', case=False, na=False)].copy()
# Rename the column that holds the latitude values
df_lat.rename(columns={'OrigValueStr': 'Latitude'}, inplace=True)
# Filter the longitude records
df_lon = df[df['OriglName'].str.contains('longitude', case=False, na=False)].copy()
# Rename the column that holds the longitude values
df_lon.rename(columns={'OrigValueStr': 'Longitude'}, inplace=True)
# Combine both DataFrames side by side (rows are matched by position)
df_combined = pd.concat([df_lat.reset_index(drop=True), df_lon.reset_index(drop=True)], axis=1)
df_combined
SpeciesName | AccSpeciesName | ObservationID | DataName | OriglName | Latitude | OrigUnitStr | SpeciesName | AccSpeciesName | ObservationID | DataName | OriglName | Longitude | OrigUnitStr | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Dipteryx panamensis | Dipteryx oleifera | 221146 | Latitude | Latitude | 10.77 | dec | Dipteryx panamensis | Dipteryx oleifera | 221146 | Longitude | Longitude | -84.03 | dec |
1 | Dipteryx panamensis | Dipteryx oleifera | 221147 | Latitude | Latitude | 10.77 | dec | Dipteryx panamensis | Dipteryx oleifera | 221147 | Longitude | Longitude | -84.03 | dec |
2 | Dipteryx panamensis | Dipteryx oleifera | 221148 | Latitude | Latitude | 10.77 | dec | Dipteryx panamensis | Dipteryx oleifera | 221148 | Longitude | Longitude | -84.03 | dec |
3 | Dipteryx alata | Dipteryx alata | 251789 | Latitude | Latitude | -14.38556 | NaN | Dipteryx alata | Dipteryx alata | 251789 | Longitude | Longitude | -61.14778 | NaN |
4 | Dipteryx panamensis | Dipteryx oleifera | 1298920 | Latitude | decimal latitude | 9.15 | NaN | Dipteryx panamensis | Dipteryx oleifera | 1298920 | Longitude | decimal longitude | -79.85 | NaN |
5 | Dipteryx panamensis | Dipteryx oleifera | 1298949 | Latitude | decimal latitude | 9.15 | NaN | Dipteryx panamensis | Dipteryx oleifera | 1298949 | Longitude | decimal longitude | -79.85 | NaN |
6 | Dipteryx oleifera | Dipteryx oleifera | 1783916 | Latitude | Latitude | 8.38 | NaN | Dipteryx oleifera | Dipteryx oleifera | 1783916 | Longitude | Longitude | -80.1 | NaN |
7 | Dipteryx oleifera | Dipteryx oleifera | 1783917 | Latitude | Latitude | 9.1 | NaN | Dipteryx oleifera | Dipteryx oleifera | 1783917 | Longitude | Longitude | -79.6 | NaN |
8 | Dipteryx odorata (Aubl.) Willd. | Dipteryx odorata | 1803188 | Latitude | latitude | 5.54415 | NaN | Dipteryx odorata (Aubl.) Willd. | Dipteryx odorata | 1803188 | Longitude | longitude | -53.8132 | NaN |
9 | Dipteryx odorata (Aubl.) Willd. | Dipteryx odorata | 1803189 | Latitude | latitude | 4.08333 | NaN | Dipteryx odorata (Aubl.) Willd. | Dipteryx odorata | 1803189 | Longitude | longitude | -52.6833 | NaN |
10 | Dipteryx odorata (Aubl.) Willd. | Dipteryx odorata | 1803190 | Latitude | latitude | 5.27225 | NaN | Dipteryx odorata (Aubl.) Willd. | Dipteryx odorata | 1803190 | Longitude | longitude | -52.926 | NaN |
11 | Dipteryx punctata (Blake) Amsh. | Dipteryx punctata | 1803191 | Latitude | latitude | 5.27225 | NaN | Dipteryx punctata (Blake) Amsh. | Dipteryx punctata | 1803191 | Longitude | longitude | -52.926 | NaN |
12 | Dipteryx panamensis | Dipteryx oleifera | 2391946 | Latitude | Latitude | 10.430633 | NaN | Dipteryx panamensis | Dipteryx oleifera | 2391946 | Longitude | Longitude | -84.006964 | NaN |
13 | Dipteryx micrantha | Dipteryx micrantha | 2392297 | Latitude | Latitude | -0.783 | NaN | Dipteryx micrantha | Dipteryx micrantha | 2392297 | Longitude | Longitude | -76.042 | NaN |
14 | Dipteryx panamensis | Dipteryx oleifera | 2393079 | Latitude | Latitude | 9.167 | NaN | Dipteryx panamensis | Dipteryx oleifera | 2393079 | Longitude | Longitude | -79.85 | NaN |
15 | Coumarouna odorata | Dipteryx odorata | 2393379 | Latitude | Latitude | -1.45 | NaN | Coumarouna odorata | Dipteryx odorata | 2393379 | Longitude | Longitude | -48.45 | NaN |
16 | Dipteryx odorata | Dipteryx odorata | 2397821 | Latitude | Latitude | 6.935 | NaN | Dipteryx odorata | Dipteryx odorata | 2397821 | Longitude | Longitude | -61.345 | NaN |
17 | Dipteryx alata | Dipteryx alata | 2400021 | Latitude | Latitude | -14.38556 | NaN | Dipteryx alata | Dipteryx alata | 2400021 | Longitude | Longitude | -61.14778 | NaN |
18 | Dipteryx micrantha | Dipteryx micrantha | 2459908 | Latitude | Latitude | -3.9491 | degrees | Dipteryx micrantha | Dipteryx micrantha | 2459908 | Longitude | Longitude | -73.4346 | degrees |
19 | Dipteryx alata | Dipteryx alata | 2691946 | Latitude | Latitude | -12.83855 | NaN | Dipteryx alata | Dipteryx alata | 2691946 | Longitude | Longitude | -69.29602 | NaN |
20 | Dipteryx micrantha | Dipteryx micrantha | 2691985 | Latitude | Latitude | -3.94915 | NaN | Dipteryx micrantha | Dipteryx micrantha | 2691985 | Longitude | Longitude | -73.43463 | NaN |
21 | Dipteryx panamensis | Dipteryx oleifera | 3016160 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016160 | Longitude | longitude | -84.003 | deg |
22 | Dipteryx panamensis | Dipteryx oleifera | 3016161 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016161 | Longitude | longitude | -84.003 | deg |
23 | Dipteryx panamensis | Dipteryx oleifera | 3016162 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016162 | Longitude | longitude | -84.003 | deg |
24 | Dipteryx panamensis | Dipteryx oleifera | 3016163 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016163 | Longitude | longitude | -84.003 | deg |
25 | Dipteryx panamensis | Dipteryx oleifera | 3016164 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016164 | Longitude | longitude | -84.003 | deg |
26 | Dipteryx panamensis | Dipteryx oleifera | 3016165 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016165 | Longitude | longitude | -84.003 | deg |
27 | Dipteryx panamensis | Dipteryx oleifera | 3016166 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016166 | Longitude | longitude | -84.003 | deg |
28 | Dipteryx panamensis | Dipteryx oleifera | 3016167 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016167 | Longitude | longitude | -84.003 | deg |
29 | Dipteryx panamensis | Dipteryx oleifera | 3016168 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016168 | Longitude | longitude | -84.003 | deg |
30 | Dipteryx panamensis | Dipteryx oleifera | 3016169 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016169 | Longitude | longitude | -84.003 | deg |
31 | Dipteryx panamensis | Dipteryx oleifera | 3016170 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016170 | Longitude | longitude | -84.003 | deg |
32 | Dipteryx panamensis | Dipteryx oleifera | 3016171 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016171 | Longitude | longitude | -84.003 | deg |
33 | Dipteryx panamensis | Dipteryx oleifera | 3016172 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016172 | Longitude | longitude | -84.003 | deg |
34 | Dipteryx panamensis | Dipteryx oleifera | 3016173 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016173 | Longitude | longitude | -84.003 | deg |
35 | Dipteryx panamensis | Dipteryx oleifera | 3016174 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016174 | Longitude | longitude | -84.003 | deg |
36 | Dipteryx panamensis | Dipteryx oleifera | 3016175 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016175 | Longitude | longitude | -84.003 | deg |
37 | Dipteryx panamensis | Dipteryx oleifera | 3016176 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016176 | Longitude | longitude | -84.003 | deg |
38 | Dipteryx panamensis | Dipteryx oleifera | 3016177 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016177 | Longitude | longitude | -84.003 | deg |
39 | Dipteryx panamensis | Dipteryx oleifera | 3016178 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016178 | Longitude | longitude | -84.003 | deg |
40 | Dipteryx panamensis | Dipteryx oleifera | 3016179 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016179 | Longitude | longitude | -84.003 | deg |
41 | Dipteryx panamensis | Dipteryx oleifera | 3016180 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016180 | Longitude | longitude | -84.003 | deg |
42 | Dipteryx panamensis | Dipteryx oleifera | 3016181 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016181 | Longitude | longitude | -84.003 | deg |
43 | Dipteryx panamensis | Dipteryx oleifera | 3016182 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016182 | Longitude | longitude | -84.003 | deg |
44 | Dipteryx panamensis | Dipteryx oleifera | 3016183 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016183 | Longitude | longitude | -84.003 | deg |
45 | Dipteryx panamensis | Dipteryx oleifera | 3016184 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016184 | Longitude | longitude | -84.003 | deg |
46 | Dipteryx panamensis | Dipteryx oleifera | 3016185 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016185 | Longitude | longitude | -84.003 | deg |
47 | Dipteryx panamensis | Dipteryx oleifera | 3016186 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016186 | Longitude | longitude | -84.003 | deg |
48 | Dipteryx panamensis | Dipteryx oleifera | 3016187 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016187 | Longitude | longitude | -84.003 | deg |
49 | Dipteryx panamensis | Dipteryx oleifera | 3016188 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016188 | Longitude | longitude | -84.003 | deg |
50 | Dipteryx panamensis | Dipteryx oleifera | 3016189 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016189 | Longitude | longitude | -84.003 | deg |
51 | Dipteryx panamensis | Dipteryx oleifera | 3016190 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016190 | Longitude | longitude | -84.003 | deg |
52 | Dipteryx panamensis | Dipteryx oleifera | 3016191 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016191 | Longitude | longitude | -84.003 | deg |
53 | Dipteryx panamensis | Dipteryx oleifera | 3016192 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016192 | Longitude | longitude | -84.003 | deg |
54 | Dipteryx panamensis | Dipteryx oleifera | 3016193 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016193 | Longitude | longitude | -84.003 | deg |
55 | Dipteryx panamensis | Dipteryx oleifera | 3016194 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016194 | Longitude | longitude | -84.003 | deg |
56 | Dipteryx panamensis | Dipteryx oleifera | 3016195 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016195 | Longitude | longitude | -84.003 | deg |
57 | Dipteryx panamensis | Dipteryx oleifera | 3016196 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016196 | Longitude | longitude | -84.003 | deg |
58 | Dipteryx panamensis | Dipteryx oleifera | 3016197 | Latitude | latitude | 10.4321 | deg | Dipteryx panamensis | Dipteryx oleifera | 3016197 | Longitude | longitude | -84.003 | deg |
59 | Dipteryx panamensis | Dipteryx oleifera | 6231868 | Latitude | Latitude | 10.43 | NaN | Dipteryx panamensis | Dipteryx oleifera | 6231868 | Longitude | Longitude | -84.07 | NaN |
# Flag the first occurrence of each column name (duplicates are marked False)
columnas_unicas = ~df_combined.columns.duplicated()
# Keep the unique columns and drop the unneeded ones, producing a new DataFrame for mapping.
# Chaining .drop() returns a new object, which avoids SettingWithCopyWarning.
df_coord = df_combined.loc[:, columnas_unicas].drop(columns=['DataName', 'OriglName', 'OrigUnitStr'])
df_coord
SpeciesName | AccSpeciesName | ObservationID | Latitude | Longitude | |
---|---|---|---|---|---|
0 | Dipteryx panamensis | Dipteryx oleifera | 221146 | 10.77 | -84.03 |
1 | Dipteryx panamensis | Dipteryx oleifera | 221147 | 10.77 | -84.03 |
2 | Dipteryx panamensis | Dipteryx oleifera | 221148 | 10.77 | -84.03 |
3 | Dipteryx alata | Dipteryx alata | 251789 | -14.38556 | -61.14778 |
4 | Dipteryx panamensis | Dipteryx oleifera | 1298920 | 9.15 | -79.85 |
5 | Dipteryx panamensis | Dipteryx oleifera | 1298949 | 9.15 | -79.85 |
6 | Dipteryx oleifera | Dipteryx oleifera | 1783916 | 8.38 | -80.1 |
7 | Dipteryx oleifera | Dipteryx oleifera | 1783917 | 9.1 | -79.6 |
8 | Dipteryx odorata (Aubl.) Willd. | Dipteryx odorata | 1803188 | 5.54415 | -53.8132 |
9 | Dipteryx odorata (Aubl.) Willd. | Dipteryx odorata | 1803189 | 4.08333 | -52.6833 |
10 | Dipteryx odorata (Aubl.) Willd. | Dipteryx odorata | 1803190 | 5.27225 | -52.926 |
11 | Dipteryx punctata (Blake) Amsh. | Dipteryx punctata | 1803191 | 5.27225 | -52.926 |
12 | Dipteryx panamensis | Dipteryx oleifera | 2391946 | 10.430633 | -84.006964 |
13 | Dipteryx micrantha | Dipteryx micrantha | 2392297 | -0.783 | -76.042 |
14 | Dipteryx panamensis | Dipteryx oleifera | 2393079 | 9.167 | -79.85 |
15 | Coumarouna odorata | Dipteryx odorata | 2393379 | -1.45 | -48.45 |
16 | Dipteryx odorata | Dipteryx odorata | 2397821 | 6.935 | -61.345 |
17 | Dipteryx alata | Dipteryx alata | 2400021 | -14.38556 | -61.14778 |
18 | Dipteryx micrantha | Dipteryx micrantha | 2459908 | -3.9491 | -73.4346 |
19 | Dipteryx alata | Dipteryx alata | 2691946 | -12.83855 | -69.29602 |
20 | Dipteryx micrantha | Dipteryx micrantha | 2691985 | -3.94915 | -73.43463 |
21 | Dipteryx panamensis | Dipteryx oleifera | 3016160 | 10.4321 | -84.003 |
22 | Dipteryx panamensis | Dipteryx oleifera | 3016161 | 10.4321 | -84.003 |
23 | Dipteryx panamensis | Dipteryx oleifera | 3016162 | 10.4321 | -84.003 |
24 | Dipteryx panamensis | Dipteryx oleifera | 3016163 | 10.4321 | -84.003 |
25 | Dipteryx panamensis | Dipteryx oleifera | 3016164 | 10.4321 | -84.003 |
26 | Dipteryx panamensis | Dipteryx oleifera | 3016165 | 10.4321 | -84.003 |
27 | Dipteryx panamensis | Dipteryx oleifera | 3016166 | 10.4321 | -84.003 |
28 | Dipteryx panamensis | Dipteryx oleifera | 3016167 | 10.4321 | -84.003 |
29 | Dipteryx panamensis | Dipteryx oleifera | 3016168 | 10.4321 | -84.003 |
30 | Dipteryx panamensis | Dipteryx oleifera | 3016169 | 10.4321 | -84.003 |
31 | Dipteryx panamensis | Dipteryx oleifera | 3016170 | 10.4321 | -84.003 |
32 | Dipteryx panamensis | Dipteryx oleifera | 3016171 | 10.4321 | -84.003 |
33 | Dipteryx panamensis | Dipteryx oleifera | 3016172 | 10.4321 | -84.003 |
34 | Dipteryx panamensis | Dipteryx oleifera | 3016173 | 10.4321 | -84.003 |
35 | Dipteryx panamensis | Dipteryx oleifera | 3016174 | 10.4321 | -84.003 |
36 | Dipteryx panamensis | Dipteryx oleifera | 3016175 | 10.4321 | -84.003 |
37 | Dipteryx panamensis | Dipteryx oleifera | 3016176 | 10.4321 | -84.003 |
38 | Dipteryx panamensis | Dipteryx oleifera | 3016177 | 10.4321 | -84.003 |
39 | Dipteryx panamensis | Dipteryx oleifera | 3016178 | 10.4321 | -84.003 |
40 | Dipteryx panamensis | Dipteryx oleifera | 3016179 | 10.4321 | -84.003 |
41 | Dipteryx panamensis | Dipteryx oleifera | 3016180 | 10.4321 | -84.003 |
42 | Dipteryx panamensis | Dipteryx oleifera | 3016181 | 10.4321 | -84.003 |
43 | Dipteryx panamensis | Dipteryx oleifera | 3016182 | 10.4321 | -84.003 |
44 | Dipteryx panamensis | Dipteryx oleifera | 3016183 | 10.4321 | -84.003 |
45 | Dipteryx panamensis | Dipteryx oleifera | 3016184 | 10.4321 | -84.003 |
46 | Dipteryx panamensis | Dipteryx oleifera | 3016185 | 10.4321 | -84.003 |
47 | Dipteryx panamensis | Dipteryx oleifera | 3016186 | 10.4321 | -84.003 |
48 | Dipteryx panamensis | Dipteryx oleifera | 3016187 | 10.4321 | -84.003 |
49 | Dipteryx panamensis | Dipteryx oleifera | 3016188 | 10.4321 | -84.003 |
50 | Dipteryx panamensis | Dipteryx oleifera | 3016189 | 10.4321 | -84.003 |
51 | Dipteryx panamensis | Dipteryx oleifera | 3016190 | 10.4321 | -84.003 |
52 | Dipteryx panamensis | Dipteryx oleifera | 3016191 | 10.4321 | -84.003 |
53 | Dipteryx panamensis | Dipteryx oleifera | 3016192 | 10.4321 | -84.003 |
54 | Dipteryx panamensis | Dipteryx oleifera | 3016193 | 10.4321 | -84.003 |
55 | Dipteryx panamensis | Dipteryx oleifera | 3016194 | 10.4321 | -84.003 |
56 | Dipteryx panamensis | Dipteryx oleifera | 3016195 | 10.4321 | -84.003 |
57 | Dipteryx panamensis | Dipteryx oleifera | 3016196 | 10.4321 | -84.003 |
58 | Dipteryx panamensis | Dipteryx oleifera | 3016197 | 10.4321 | -84.003 |
59 | Dipteryx panamensis | Dipteryx oleifera | 6231868 | 10.43 | -84.07 |
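The coordinate table above was assembled by placing the latitude and longitude rows side by side by position, which works only if both subsets list the observations in the same order. A minimal sketch of an alternative, assuming each observation has at most one latitude and one longitude record, is to join the two subsets on `ObservationID`. The small `long_df` frame below is a hypothetical stand-in that reuses two coordinate pairs from the table; it is not the full dataset.

```python
import pandas as pd

# Hypothetical stand-in for the long-format data; the second observation
# is deliberately listed longitude-first to show that row order no longer matters.
long_df = pd.DataFrame({
    "ObservationID": [221146, 221146, 251789, 251789],
    "DataName": ["Latitude", "Longitude", "Longitude", "Latitude"],
    "OrigValueStr": ["10.77", "-84.03", "-61.14778", "-14.38556"],
})

# Split into one frame per coordinate, keeping only the key and the value.
lat = (long_df.loc[long_df["DataName"] == "Latitude", ["ObservationID", "OrigValueStr"]]
       .rename(columns={"OrigValueStr": "Latitude"}))
lon = (long_df.loc[long_df["DataName"] == "Longitude", ["ObservationID", "OrigValueStr"]]
       .rename(columns={"OrigValueStr": "Longitude"}))

# Join on the observation key; validate raises an error if a key repeats on either side.
coords = lat.merge(lon, on="ObservationID", how="inner", validate="one_to_one")

# Convert the text values to numbers (invalid entries become NaN).
coords[["Latitude", "Longitude"]] = coords[["Latitude", "Longitude"]].apply(
    pd.to_numeric, errors="coerce"
)
print(coords)
```

Because the join is keyed, no duplicated columns are produced, and observations that lack either coordinate are simply left out of the result.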
# Load a shapefile with geographic information (geopandas).
# Note: gpd.datasets is deprecated and is removed in GeoPandas 1.0.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
# Keep only the American continent (where the genus occurs).
# This dataset classes the Central American countries under 'North America'.
america = world[world['continent'].isin(['North America', 'South America'])]
# Convert df_coord into a GeoDataFrame
gdf = gpd.GeoDataFrame(df_coord, geometry=gpd.points_from_xy(df_coord.Longitude, df_coord.Latitude))
# Plot the base map
ax = america.plot(color='white', edgecolor='black')
# Plot the points of each species in a different color
for especie, color in zip(gdf['AccSpeciesName'].unique(), ['blue', 'green', 'red', 'yellow', 'orange']):
    gdf[gdf['AccSpeciesName'] == especie].plot(ax=ax, color=color, label=especie, markersize=10, alpha=0.5)
# Add the legend
plt.legend()
# Show the map
plt.show()
<ipython-input-100-3aecd5382e89>:2: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/. world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
Description of the results¶
Data cleaning and exploration: The tabular layout of the database proved harder to work with than expected, even after exploring and cleaning the data by removing columns and detecting possible errors such as inconsistent species names. Some columns hold a large number of variables: the "DataName" column contains more than 100 variables, all stored as text, while their numeric values sit in a separate column, "OrigValueStr". Moreover, there is no clear pattern that would allow the data to be reshaped (i.e., pivoting the rows into one column per variable), because the number of rows (variables) per record differs depending on the source of the information.
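As a rough illustration of the reshaping problem described above, the sketch below pivots a long table into one column per variable with `pandas.DataFrame.pivot_table`. The `long_df` frame and its values are hypothetical toy data, and the choice of `aggfunc="first"` (keep the first value when a variable repeats within an observation) is an assumption made for illustration, not the approach used in this project.

```python
import pandas as pd

# Hypothetical toy records in the long layout: one row per measurement.
# Observation 3 has no longitude, mimicking records with missing variables.
long_df = pd.DataFrame({
    "ObservationID": [1, 1, 2, 2, 3],
    "DataName": ["Latitude", "Longitude", "Latitude", "Longitude", "Latitude"],
    "OrigValueStr": ["10.77", "-84.03", "9.15", "-79.85", "8.38"],
})

# One row per observation, one column per variable.
wide_df = long_df.pivot_table(
    index="ObservationID",
    columns="DataName",
    values="OrigValueStr",
    aggfunc="first",
)

# Variables that were not recorded for an observation appear as NaN.
print(wide_df)
print(wide_df.isna().sum())
```

With more than 100 distinct `DataName` values and a different subset recorded per source, a table built this way would be mostly empty, which is consistent with the difficulty noted above.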
Plotting variables: The scatter plot of solar radiation against leaf area index shows no clear relationship between the two variables. A decrease in leaf area with increasing solar radiation was expected, but if anything the trend runs the opposite way. Although the dataset contains 147 independent observations (different individuals), very few have complete information for all variables. For example, the variables listed in the DataName column include few related to phenology, and those that exist are available for very few individuals.
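One way to put a number on the visual impression from the scatter plot is a Pearson correlation coefficient. The sketch below is illustrative only: it copies the eight SRAD/LAI pairs from the filtered records listed earlier in this notebook into a small frame (named `records` here for convenience) and pairs them by `ObservationID`. Several individuals share identical site-level values, so the effective number of independent points is smaller than eight and the coefficient should be read with caution.

```python
import pandas as pd

# SRAD (kJ m-2 day-1) and LAI (m2/m2) pairs copied from the filtered records,
# one row per ObservationID.
records = pd.DataFrame({
    "ObservationID": [3032484, 3032485, 3032486, 3055599,
                      3114573, 3124677, 3124679, 3124680],
    "SRAD": [19251.833333, 19251.833333, 19251.833333, 16831.333333,
             19301.0, 18574.333333, 18331.916667, 18331.916667],
    "LAI": [4.0031943321, 4.0031943321, 4.0031943321, 2.794444561,
            2.9768054485, 4.3365278244, 3.9970834255, 3.9970834255],
})

# Series.corr uses the Pearson method by default.
r = records["LAI"].corr(records["SRAD"])

# Count distinct (SRAD, LAI) combinations to see how many points are repeated.
n_distinct = len(records.drop_duplicates(subset=["SRAD", "LAI"]))

print(f"Pearson r = {r:.2f} (n = {len(records)}, distinct pairs = {n_distinct})")
```

A positive coefficient here would agree with the observation that leaf area index does not decrease with solar radiation in these records, although a sample this small cannot support a firm conclusion.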
Geographic distribution map: Five species of the genus Dipteryx are distributed across Central and South America. Dipteryx oleifera appears to be restricted to the Central American isthmus (Costa Rica and Panama), while the remaining species occur in South America. Unfortunately, only 60 individuals had latitude and longitude coordinates, so the complete dataset could not be mapped. The remaining observations either have no location data or include only references to the locality (provinces, districts, etc.).
Conclusions¶
The format of the database is crucial to its usability. Although TRY is a very valuable resource, a standard needs to be defined for all researchers who wish to contribute data; otherwise, extracting the information becomes very difficult. It is worth noting that R packages do exist specifically for exporting and analyzing TRY data. The same is not true for Python (MySQLdb might work, but this is beyond the scope of the course).
Data exploration is vital when working with databases, both to determine what type of analysis to perform and to obtain results that support decision-making.
Because of what the data exploration revealed, filters had to be applied before the variables could be used; even so, the database still requires extensive cleaning.
The genus Dipteryx has been defined as Neotropical. Species diversity is greater in South America, and only one species, D. oleifera, reaches the Central American isthmus. This may indicate that the speciation events that gave rise to D. oleifera were relatively recent and that the species has not yet had the opportunity to expand its range.
The proposed objective was not achieved because the database contains little information on the phenological parameters of interest.
References¶
Kattge, J., G. Bönisch, S. Díaz, et al. (2020). TRY plant trait database – enhanced coverage and open access. Global Change Biology 26:119-188.
Niinemets, Ü. (2001). Global-scale climatic controls of leaf dry mass per area, density, and thickness in trees and shrubs. Ecology 82:453-469.
Fyllas, N. M., S. Patiño, T. R. Baker, et al. (2009). Basin-wide variations in foliar properties of Amazonian forest: phylogeny, soils and climate. Biogeosciences 6:2677-2708.
Malhado, A. C. M., Y. Malhi, R. J. Whittaker, et al. (2009). Spatial trends in leaf size of Amazonian rainforest trees. Biogeosciences 6:1563-1576.
Wright, S. J., K. Kitajima, N. J. B. Kraft, et al. (2011). Functional traits and the growth-mortality tradeoff in tropical trees. Ecology 91:3664-3674.