FROM ALGORITHMS TO AGRICULTURE: APPLYING AI METHODS TO PREDICT KIDNEY DISEASE IN FARMWORKERS

E-Poster

https://storage.unitedwebnetwork.com/files/1099/4b7b1a818e3b3237c3e6a9f5ed7fb1b9.pdf

Co-author 1

Yusuf Ashktorab yusufashk@gmail.com Howard University College of Medicine Medicine Washington, D.C. United States -

Co-author 2

Santhushya Hewapathiranage santhushya@gmail.com National Hospital Kandy Center for Research Kandy Sri Lanka -

Co-author 3

Xue Yu xueyu@stanford.edu Stanford University Division of Nephrology, Department of Medicine Palo Alto, CA United States -

Co-author 4

Shuchi Anand sanand2@stanford.edu Stanford University Division of Nephrology, Department of Medicine Palo Alto, CA United States *

Co-author 5

Rohana Chandrajith rohanac@sci.pdn.ac.lk National Hospital Kandy Center for Research Kandy Sri Lanka -

Co-author 6

Nishantha Nanyakkara nishantha4313@gmail.com National Hospital Kandy Center for Research Kandy Sri Lanka -

Co-author 7

Maria Montez-Rath mmrath@stanford.edu Stanford University Division of Nephrology, Department of Medicine Palo Alto, CA United States -

Introduction

Chronic kidney disease of unknown etiology (CKDu) is a progressive form of kidney damage that disproportionately affects agricultural communities in Sri Lanka, Central America, and parts of India, with emerging hotspots also suspected in the southern and western United States. Despite extensive investigation, the underlying causes of CKDu remain unclear, partly because of a potentially long lag between exposure and symptomatic presentation of kidney disease. We applied machine learning (ML) and large language models (LLMs) to explore predictive tools for disease progression in a cohort of persons with CKDu, hypothesizing that early identification of persons likely to experience kidney function decline will facilitate investigation of proximal (causative) exposures.

Methods

We analyzed baseline data (laboratory values and demographics) from 244 participants enrolled in the Kidney Progression Project (KiPP), a prospective cohort of farmworkers with kidney disease in Sri Lanka launched in 2018. Among these participants, 22 progressed to the study’s primary outcome (estimated glomerular filtration rate [eGFR] < 15 mL/min/1.73 m²) by 2024. For model development, we used data from the first 2.5 years of follow-up.

We developed and optimized several ML classifiers, including logistic regression, support vector machines, random forests, and XGBoost. Each model was trained using stratified five-fold cross-validation, and class imbalance was addressed using SMOTE-based oversampling. To identify underlying patterns in the dataset, we applied unsupervised methods such as K-means clustering (k = 7), and the resulting cluster labels were included as categorical features in the supervised learning models. Model performance was evaluated using standard classification metrics, and interpretability was assessed using SHAP (SHapley Additive exPlanations).
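The interaction between stratified cross-validation and oversampling can be sketched in plain Python. This is an illustrative sketch only, not the authors' pipeline: the function names (`stratified_folds`, `smote_like`, `oversampled_training_sets`) and the neighbour count `k=5` are assumptions, and a real analysis would more likely rely on established library implementations of stratified splitting and SMOTE. The point it shows is that synthetic minority samples are generated from the training portion of each fold only, so the held-out fold keeps its original class balance.

```python
import math
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds so class ratios are roughly preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)  # deal indices round-robin
    return folds


def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> other
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic


def oversampled_training_sets(X, y, n_folds=5, seed=0):
    """Yield (X_train, y_train, test_indices); oversampling touches the
    training portion only, so test folds contain real samples alone."""
    folds = stratified_folds(y, n_folds, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        X_tr = [X[j] for j in train_idx]
        y_tr = [y[j] for j in train_idx]
        pos = [x for x, lab in zip(X_tr, y_tr) if lab == 1]
        n_neg = len(y_tr) - len(pos)
        new = smote_like(pos, n_neg - len(pos), seed=seed + i)
        yield X_tr + new, y_tr + [1] * len(new), test_idx


# Synthetic stand-in data with the cohort's class counts (22 of 244 positive);
# the feature values are random and carry no clinical meaning.
_rng = random.Random(1)
y_demo = [1] * 22 + [0] * 222
X_demo = [[_rng.gauss(label, 1.0), _rng.gauss(0.0, 1.0)] for label in y_demo]
splits = list(oversampled_training_sets(X_demo, y_demo))
```

Each element of `splits` holds a balanced training set and the indices of an untouched test fold; a classifier would be fit on the former and scored on the latter.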

Additionally, we used clinical vignettes from KiPP data to evaluate the predictive capabilities of LLMs. We tested five LLMs through SecureGPT on their ability to predict progression.
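A vignette-based evaluation of this kind might be organized roughly as follows. This is a hypothetical sketch: the abstract does not give the prompt wording, the vignette fields, or how replies were scored, so the record keys (`age`, `sex`, `egfr`, `uric_acid`), the prompt text, the example values, and both function names are assumptions made for illustration. No model is called here; the code only builds the prompt text, with or without baseline labs, and maps a free-text reply to a binary prediction.

```python
import re


def build_vignette(record, include_labs=True):
    """Render one participant record as a short clinical vignette prompt."""
    lines = [
        f"A {record['age']}-year-old {record['sex']} farmworker "
        "with kidney disease of unknown etiology."
    ]
    if include_labs:
        lines.append(
            f"Baseline eGFR: {record['egfr']} mL/min/1.73 m2. "
            f"Serum uric acid: {record['uric_acid']} mg/dL."
        )
    lines.append(
        "Will this person's eGFR fall below 15 mL/min/1.73 m2 "
        "during follow-up? Answer YES or NO."
    )
    return "\n".join(lines)


def parse_prediction(reply):
    """Map a model reply to 1 (progression), 0 (no progression), or None."""
    match = re.search(r"\b(yes|no)\b", reply, flags=re.IGNORECASE)
    if match is None:
        return None  # reply did not contain a usable answer
    return 1 if match.group(1).lower() == "yes" else 0


# Made-up example record, not taken from the study data.
example = {"age": 52, "sex": "male", "egfr": 38, "uric_acid": 7.9}
prompt_with_labs = build_vignette(example, include_labs=True)
prompt_without_labs = build_vignette(example, include_labs=False)
```

Returning `None` for unparseable replies keeps ambiguous answers out of the accuracy and F1 calculations instead of silently counting them as one class.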

Results

The best-performing ML model was an XGBoost classifier (ROC-AUC = 0.84; F1 score = 0.48). SHAP analysis identified cluster membership and serum uric acid as key predictors. Among the LLMs, GPT-4.5 performed best (accuracy = 87%; F1 score = 0.42). Both models outperformed the existing eGFR slope method (accuracy = 67%; F1 score = 0.23).
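Accuracy and F1 score can diverge sharply when the outcome is rare, which is why both are reported. The sketch below computes the standard metrics from confusion-matrix counts and applies them to a hypothetical classifier that never predicts progression. It assumes an evaluation set with the cohort's overall split of 22 progressors among 244 participants; the abstract does not state the composition of the evaluation set, so these counts are illustrative only.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0  # no true positives at all
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical "always predict no progression" classifier scored on
# 22 progressors and 222 non-progressors.
baseline = classification_metrics(tp=0, fp=0, fn=22, tn=222)
print(round(baseline["accuracy"], 3), baseline["f1"])  # 0.91 0.0
```

Under that assumption, a model that identifies no progressors still reaches about 91% accuracy while its F1 score is 0, so the F1 scores are the more informative comparison between methods.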

Figure 1. Bar plots comparing LLM results across the two prompts: with and without baseline labs.

Figure 2A. UMAP colored by K-means clusters, using baseline labs and eGFR values.

Figure 2B. Heatmap showing standardized feature differences across clusters, highlighting distinct profiles in Cluster 2.

Table 1. Results of the best machine learning model.

Conclusion

Our findings demonstrate that both traditional ML models and LLMs can predict CKDu progression from limited clinical data. Custom-tailored ML models slightly outperformed generative AI models such as GPT-4.5. With continued refinement, these tools may not only support early identification of high-risk individuals but also aid in identifying causes of CKDu. Future directions include external validation using ongoing international cohorts and incorporation of longitudinal laboratory data to generate clinically deployable AI tools.

This abstract has been submitted to the 2025 Interim AMA Poster Showcase.
