
According to Communications Earth & Environment, the Yucatan Peninsula is one of the largest coastal and lowland karst regions worldwide. This groundwater-dependent region is highly vulnerable to contaminants that spread easily due to the karst environment. Here, the spatio-temporal patterns of major ions in 1528 water samples sourced from a government institution suggest the main factors triggering salinization in the aquifer system. The hydrogeochemical analysis, supported by dimensional reduction and network-based clustering, linked one-third of the samples to contamination outbreaks from seawater intrusion, extensive gypsum dissolution in the south, and nitrate pollution in the ubiquitous carbonate aquifer matrix. Temporal variations of water quality indicated changes in regional recharge trends and increasing human impact in recent decades. Moreover, ~23% of water samples from human-use sources exceeded acceptable sulfate and nitrate limits for drinking water. The study underscores the need for continuous water quality monitoring and enhanced regional knowledge to support management plans.
The comprehensive analysis of spatio-temporal water quality patterns in the Yucatan Peninsula, supported by unsupervised learning techniques, provides valuable insights into regional hydrogeochemical processes and contamination vulnerabilities. Examination of major ions reveals a heterogeneous and complex water quality profile driven by distinct processes. Key factors triggering salinization include seawater-groundwater mixing driven by seawater intrusion, with the coastal lowland being the most vulnerable to upconing due to the proximity of the seawater wedge and potential seawater surges. In contrast, gypsum dissolution acts regionally in the south of the peninsula, within the Elevated Interior Region. Furthermore, regional recharge to the coastal plain with calcium-sulfate groundwater type is suggested, influencing water quality in the predominantly carbonate aquifer. Additionally, the study highlights widespread nitrate contamination across the Yucatan Peninsula, attributed to inadequate waste management and the high vulnerability of the karst environment, which facilitates rapid infiltration and mixing.
The findings emphasize the importance of spatio-temporal monitoring of major ions to ensure safe water sources for human consumption in the Yucatan Peninsula. Approximately 23% of samples from water sources designated for human use exceeded acceptable limits for SO₄²⁻ or NO₃⁻, posing health risks. With anticipated population and tourism growth, anthropogenic pollution is expected to intensify, while climate change may disrupt hydrogeological patterns in the karst environment, affecting the rate of carbonate and gypsum dissolution. Furthermore, sea level rise could introduce the large seawater wedge into the aquifer due to the low hydraulic heads, further contributing to salinization. The study suggests future research directions to better understand aquifer behavior, including increasing knowledge of seawater-groundwater mixing diffusion zones at the regional scale through a regional flow model and advancing hydrogeological comprehension in the south of the peninsula. These efforts are essential for informed water resource management and sustainable development in the region.

Methods
Data and quality
The dataset analyzed in this study comprises recently published and unpublished reports of 1528 water quality samples collected between 1998 and 2021, focusing on major ions as measurement parameters (Ca²⁺, Mg²⁺, Na⁺, K⁺, SO₄²⁻, HCO₃⁻, Cl⁻, and NO₃⁻) [36] (Fig. 1d). The dataset includes 706 water samples from historical records (1998–2003), 356 samples at the end of the 2014 rainy season (October and November), 338 at the end of the 2014 dry season (April and May), and 128 samples at the end of the 2021 dry season (April and May). Of these, 78% were collected from tube wells after purging, while the remainder were from cenotes, springs and dug wells. Most of the samples from the wells, mainly intended for water supply, were pumped without certainty about the screening intervals. According to the Mexican water authority, wells have depths of up to 100 m [48]. The samples from the other sources were taken close to the phreatic surface, except for 63 samples from the 2021 campaign that were taken from depths up to 20 m.
Major ions were measured according to the INEGI guidelines for water quality analysis [61]. To ensure the reliability of the analysis, 117 of the initial 1645 samples were excluded, leaving 1528. Samples were excluded if their absolute Charge Balance Error, an indicator of major-ion measurement accuracy [62], exceeded the 5% threshold, if they showed inconsistencies in concentration values, or if their sampling sites lay outside the study area. Total Dissolved Solids (TDS) were calculated as the sum of the major ions for subsequent analysis. CO₃²⁻ was excluded from the multivariate analyses because it had null values in 64% of the samples and constitutes only 0.7% of TDS on average. The statistical description of the dataset is provided in Supplementary Table 1 and Supplementary Table 2.
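The charge-balance screening and the TDS sum can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes concentrations are reported in mg/L, uses rounded equivalent weights, and omits CO₃²⁻ for brevity. The function names and the sample values are hypothetical. Only the 5% threshold comes from the text.

```python
# Equivalent weights in mg per meq (molar mass / |charge|), rounded.
EQ_WEIGHT = {
    "Ca": 20.039, "Mg": 12.153, "Na": 22.990, "K": 39.098,
    "SO4": 48.031, "HCO3": 61.017, "Cl": 35.453, "NO3": 62.004,
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("SO4", "HCO3", "Cl", "NO3")
CBE_LIMIT = 5.0  # percent, the threshold stated in the text


def charge_balance_error(sample):
    """Signed charge balance error in percent, from mg/L concentrations."""
    cations = sum(sample.get(i, 0.0) / EQ_WEIGHT[i] for i in CATIONS)
    anions = sum(sample.get(i, 0.0) / EQ_WEIGHT[i] for i in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


def total_dissolved_solids(sample):
    """TDS in mg/L as the plain sum of the listed major ions (assumption)."""
    return sum(sample.get(i, 0.0) for i in EQ_WEIGHT)


def passes_quality_check(sample):
    """True when the absolute CBE is within the 5% limit."""
    return abs(charge_balance_error(sample)) <= CBE_LIMIT


# Hypothetical sample in mg/L, not taken from the dataset.
sample = {"Ca": 80.0, "Mg": 20.0, "Na": 30.0, "K": 2.0,
          "SO4": 40.0, "HCO3": 300.0, "Cl": 45.0, "NO3": 10.0}
print(round(charge_balance_error(sample), 2), total_dissolved_solids(sample))
print(passes_quality_check(sample))
```

Working in meq/L matters here: the balance compares charge, not mass, so each ion is divided by its equivalent weight before summing.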
Statistical analysis
Statistical analyses were conducted in the R programming environment using the stats package [37,63]. These included descriptive statistics and two-tailed Wilcoxon rank-sum tests [64], a non-parametric method, for assessing temporal differences between distributions. p values less than 0.005 were considered statistically significant.
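The study ran the rank-sum test in R. For readers who want to see the mechanics, the following is a minimal Python sketch of a two-sided Wilcoxon rank-sum test using the normal approximation with a tie correction and no continuity correction. It will not reproduce R's defaults exactly, in particular for small samples. The function name and the data are hypothetical. Only the 0.005 significance level comes from the text.

```python
import math

ALPHA = 0.005  # significance level stated in the text


def rank_sum_test(x, y):
    """Two-sided rank-sum test; returns (U statistic for x, z, p value)."""
    combined = sorted((v, g) for g, grp in enumerate((x, y)) for v in grp)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        mean_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = mean_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    w = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u = w - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u, z, p


# Hypothetical concentrations from two sampling campaigns.
early = [12.0, 15.5, 11.2, 14.8, 13.1, 12.7, 16.0, 14.2]
late = [18.3, 21.0, 19.4, 17.9, 22.5, 20.1, 18.8, 23.2]
u, z, p = rank_sum_test(early, late)
print(u, p < ALPHA)
```

The test compares ranks rather than raw values, which is why it suits skewed concentration data.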
Multivariate analysis
Clustering of water samples was achieved through the Edge Betweenness [65] community detection algorithm applied to an undirected and unweighted Nearest Neighbor Graph [66] of the INEGI dataset [36,37]. First, the Euclidean distance matrix of the standardized dataset was computed, and the K shortest distances for each sample were selected as the network’s connections to simplify the topology of the multivariate data. The Edge Betweenness community detection algorithm was then applied to the resulting graph to identify clusters (communities). For this analysis, K = 8 was chosen to minimize the number of clusters and maximize the modularity, which quantifies the strength of a network’s division into communities. Fewer clusters simplify the analysis, while higher modularity values indicate better-defined separation. Supplementary Table 3 shows how the number of clusters and the modularity vary with K. The R packages igraph [67] and cccd [68] were used to construct the network and perform community detection, and the Kamada-Kawai layout was used for network visualization [67].
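The clustering procedure can be sketched in plain Python as follows. This is an illustration of the general technique, a K-nearest-neighbor graph followed by Girvan-Newman style edge removal. It is not a port of the igraph or cccd routines used in the study. It assumes the input is already standardized and that the final partition is the one with the highest modularity. All function names and the toy data are hypothetical.

```python
import math
from collections import deque


def knn_edges(rows, k):
    """Undirected, unweighted edges linking each sample to its k nearest."""
    edges = set()
    for i, a in enumerate(rows):
        nearest = sorted((math.dist(a, b), j)
                         for j, b in enumerate(rows) if j != i)[:k]
        edges.update(frozenset((i, j)) for _, j in nearest)
    return edges


def adjacency(n, edges):
    adj = {v: set() for v in range(n)}
    for a, b in map(tuple, edges):
        adj[a].add(b)
        adj[b].add(a)
    return adj


def edge_betweenness(adj):
    """Brandes-style shortest-path betweenness for every edge."""
    score = {}
    for s in adj:
        dist, sigma = {s: 0}, {v: 0 for v in adj}
        sigma[s] = 1
        preds, order, queue = {v: [] for v in adj}, [], deque([s])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # accumulate dependencies backwards
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = frozenset((v, w))
                score[edge] = score.get(edge, 0.0) + share
                delta[v] += share
    return score


def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        comp, stack = [], [s]
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v] - seen:
                seen.add(w)
                stack.append(w)
        comps.append(sorted(comp))
    return comps


def modularity(comps, edges):
    """Modularity of a partition, measured on the original graph."""
    m = len(edges)  # assumes the graph has at least one edge
    q = 0.0
    for comp in comps:
        members = set(comp)
        inside = sum(1 for e in edges if e <= members)
        degree = sum(len(e & members) for e in edges)
        q += inside / m - (degree / (2 * m)) ** 2
    return q


def edge_betweenness_communities(n, edges):
    """Remove the top-betweenness edge repeatedly; keep the best partition."""
    work, best, best_q = set(edges), None, -1.0
    while True:
        comps = components(adjacency(n, work))
        q = modularity(comps, edges)
        if q > best_q:
            best, best_q = comps, q
        if not work:
            return best, best_q
        score = edge_betweenness(adjacency(n, work))
        work.remove(max(score, key=score.get))


# Hypothetical two-variable data with two obvious groups (equal scales).
data = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
        [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5]]
edges = knn_edges(data, k=2)
communities, q = edge_betweenness_communities(len(data), edges)
print(len(communities), round(q, 3))
```

Repeating the last three lines over a range of k values and recording the number of communities and the modularity gives the kind of comparison the text describes for choosing K. The pure-Python betweenness recomputation is slow and only suited to small examples.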
Principal components analysis (PCA) [69] was conducted on the INEGI dataset for multivariate data exploration and dimensional reduction [36,37]. Data from previous studies [19,23,24,32,33] were projected onto the PCA results by multiplying each standardized dataset by the PCA weight matrix. The means and standard deviations of the INEGI dataset were used for standardization. PCA was conducted using the R package psych [70].
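The projection step can be sketched in a few lines of Python. The sketch assumes the weight matrix has one row per variable and one column per component. The means, standard deviations, and weights shown are hypothetical placeholders, not values from the study.

```python
def project(samples, ref_means, ref_sds, weights):
    """Standardize with reference statistics, then apply PCA weights."""
    n_components = len(weights[0])
    scores = []
    for row in samples:
        z = [(v - m) / s for v, m, s in zip(row, ref_means, ref_sds)]
        scores.append([sum(zi * w[c] for zi, w in zip(z, weights))
                       for c in range(n_components)])
    return scores


# Hypothetical reference statistics and weights for two variables.
ref_means = [50.0, 20.0]
ref_sds = [10.0, 5.0]
weights = [[0.7071, 0.7071],
           [0.7071, -0.7071]]

# Hypothetical samples from an external dataset.
external = [[50.0, 20.0], [60.0, 20.0], [60.0, 25.0]]
for score in project(external, ref_means, ref_sds, weights):
    print([round(s, 4) for s in score])
```

Standardizing with the reference dataset's statistics, rather than each external dataset's own, keeps all projected points in the same coordinate system as the original PCA.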
Hydrogeochemical analysis
Water samples were plotted on a Piper diagram to observe the relative dominance of major ions and infer the main hydrogeochemical facies [71]. The seawater fraction and the excess of major ions were calculated assuming conservative mixing between freshwater and seawater, with chloride as the conservative tracer [72]. For this, the chemical compositions of rainwater [73] and seawater [32] in the Yucatan Peninsula were used as end-members. The 100·SO₄²⁻/Cl⁻ ratio was also calculated, as it has been highlighted in previous studies as a valuable tracer in the study area.
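The conservative-mixing calculation can be illustrated with a short Python sketch. It assumes concentrations in meq/L and uses the standard two-end-member form: the seawater fraction comes from chloride, and the excess of each ion is its measured value minus the value expected from mixing alone. The end-member compositions below are rounded placeholders, not the rainwater and seawater values used in the study. The function names are hypothetical.

```python
def seawater_fraction(cl_sample, cl_fresh, cl_sea):
    """Fraction of seawater implied by chloride, a conservative tracer."""
    return (cl_sample - cl_fresh) / (cl_sea - cl_fresh)


def ion_excess(sample, fresh, sea, tracer="Cl"):
    """Return the seawater fraction and each ion's excess over mixing."""
    f = seawater_fraction(sample[tracer], fresh[tracer], sea[tracer])
    excess = {}
    for ion, measured in sample.items():
        if ion == tracer:
            continue
        expected = f * sea[ion] + (1.0 - f) * fresh[ion]
        excess[ion] = measured - expected
    return f, excess


def sulfate_chloride_ratio(sample):
    """The 100*SO4/Cl ratio, with both ions in meq/L."""
    return 100.0 * sample["SO4"] / sample["Cl"]


# Placeholder end-members in meq/L (illustrative only).
FRESH = {"Cl": 0.1, "SO4": 0.05, "Ca": 0.1}
SEA = {"Cl": 550.0, "SO4": 57.0, "Ca": 21.0}

# Hypothetical sample: 10% seawater plus 4 meq/L of dissolved gypsum.
sample = {"Cl": 55.09, "SO4": 9.745, "Ca": 6.19}
f, excess = ion_excess(sample, FRESH, SEA)
print(round(f, 3), {k: round(v, 3) for k, v in excess.items()})
print(round(sulfate_chloride_ratio(sample), 1))
```

A positive excess of both Ca and SO₄ in similar equivalent amounts, together with a SO₄/Cl ratio above that of seawater, is the kind of signal that points to gypsum dissolution rather than seawater mixing alone.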
Saturation indices for minerals of interest were calculated to assess trends in mineral dissolution or precipitation. The indices were determined by the formula SI = log10(IAP/K), where IAP represents the ion activity product and K denotes the equilibrium constant for the specific mineral. The calculations were performed using the USGS PHREEQC 2.7 program [74], with default equilibrium constants set as follows: log10(K_calcite) = −8.48, log10(K_dolomite) = −17.09, and log10(K_gypsum) = −4.58.
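As a worked illustration of the formula, the Python sketch below computes SI from ion activities supplied by the user. It does not perform the aqueous speciation and activity corrections that PHREEQC carries out, so the activities are an assumed input. The log K values are those quoted in the text. The function names and the example activities are hypothetical.

```python
import math

# log10 equilibrium constants quoted in the text.
LOG_K = {"calcite": -8.48, "dolomite": -17.09, "gypsum": -4.58}


def ion_activity_product(mineral, a):
    """IAP from a dict of ion activities (water activity taken as 1)."""
    if mineral == "calcite":      # CaCO3
        return a["Ca"] * a["CO3"]
    if mineral == "dolomite":     # CaMg(CO3)2
        return a["Ca"] * a["Mg"] * a["CO3"] ** 2
    if mineral == "gypsum":       # CaSO4.2H2O
        return a["Ca"] * a["SO4"]
    raise ValueError(f"unknown mineral: {mineral}")


def saturation_index(mineral, activities):
    """SI = log10(IAP / K); negative means undersaturated."""
    iap = ion_activity_product(mineral, activities)
    return math.log10(iap) - LOG_K[mineral]


# Hypothetical ion activities (dimensionless).
activities = {"Ca": 1e-3, "Mg": 5e-4, "SO4": 1e-3, "CO3": 1e-6}
for mineral in LOG_K:
    print(mineral, round(saturation_index(mineral, activities), 2))
```

An SI near zero indicates equilibrium with the mineral, negative values indicate that the water can still dissolve it, and positive values indicate a tendency to precipitate.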
Source: Communications Earth & Environment



