REINFORCEMENT LEARNING FOR DYNAMIC TREATMENT OF CHRONIC KIDNEY DISEASE MINERAL AND BONE DISORDER (CKD-MBD) AMONG PATIENTS WITH KIDNEY FAILURE ON HEMODIALYSIS: A DEMONSTRATION PROJECT

Co-author 1

Benjamin Lobo lobo@virginia.edu University of Virginia School of Data Science Data Science Charlottesville, VA United States -

Co-author 2

Xingbo Fu xf3av@virginia.edu University of Virginia School of Engineering Electrical and Computer Engineering Charlottesville, VA United States -

Co-author 3

Elizabeth Thompson et7gav@virginia.edu University of Virginia School of Engineering Systems Engineering Charlottesville, VA United States -

Co-author 4

Indika Mallawaarachchi imallawaarachchi@virginia.edu University of Virginia School of Medicine Public Health Sciences Charlottesville, VA United States -

Co-author 5

Varsha Pothula Venkata hjn7wc@uvahealth.org University of Virginia School of Medicine Medicine Charlottesville, VA United States -

Co-author 6

Lori Dunn lad2z@uvahealth.org University of Virginia School of Medicine Medicine Charlottesville, VA United States -

Co-author 7

Jochen Raimann jochen.raimann@rriny.com Fresenius Medical Care Renal Research Institute New York, NY United States -

Co-author 8

Hanjie Zhang hanjie.zhang@rriny.com Fresenius Medical Care Renal Research Institute New York, NY United States -

Co-author 9

Jundong Li jl6qk@virginia.edu University of Virginia School of Engineering Electrical and Computer Engineering Charlottesville, VA United States -

Co-author 10

Jennie Ma jzm4h@virginia.edu University of Virginia School of Medicine Public Health Sciences Charlottesville, VA United States -

Co-author 11

Adam Tashman apt4c@virginia.edu University of Virginia School of Data Science Data Science Charlottesville, VA United States -

Co-author 12

Julia Scialla js7rk@uvahealth.org University of Virginia School of Medicine Medicine Charlottesville, VA United States *

Introduction

Treatment of chronic kidney disease mineral and bone disorder (CKD-MBD) involves dynamic titration of multiple drug classes, including vitamin D agonists (vit D), calcimimetics (calm), and phosphorus binders. Dynamic treatment algorithms that guide titration could facilitate needed randomized controlled trials (RCTs) of alternative CKD-MBD treatment strategies.

Methods

In this demonstration project, we used monthly electronic health record data from adult patients receiving in-center hemodialysis (HD) at one of 12 units at our institution between 2006 and 2024, including all months after the first parathyroid hormone (PTH) value ≥400 pg/ml. CKD-MBD medication doses were harmonized to common daily equivalents. Months from 2006 to 2021 were split at the patient level into 60% training and 40% validation sets. All data from 2022 to 2024 were reserved as a held-out test set. Months with medication titrations were up-sampled 3 times in the training set to increase the number of informative examples. Behavioral cloning (BC) and reinforcement learning (RL) models were fit using a recurrent neural network to construct treatment policies over a matrix of three options (increase/initiate, maintain, or decrease/stop) for each of vit D and calm, yielding 9 potential actions. Covariates included CKD-MBD laboratories (albumin-corrected calcium, serum phosphorus, PTH), phosphorus binders (total and calcium-containing), baseline doses of vit D and calm, demographics, dialysis vintage, calendar year, and diabetes, each over the prior 3 months. RL was initialized with BC. It rewarded future CKD-MBD laboratories that were within range (i.e., PTH 400-600 pg/ml; calcium 8.4-10.0 mg/dl; phosphorus 3.5-5.0 mg/dl) or moving toward range (PTH) and penalized laboratories that were moving away from range (PTH) or out of range (calcium, phosphorus), each with discounting over time. A weighted solution with 90% RL and 10% BC in the loss function was selected, based on interim clinical review during model fitting, to prevent extreme deviations from practice. An ensemble of 25 models with distinct starting seeds was used to improve stability.
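The action matrix and reward scheme described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. The target ranges and the 3 x 3 action structure come from the Methods, but the reward magnitudes (`in_range`, `toward`, `penalty`), the discount factor `gamma`, and all function names are hypothetical values chosen only for illustration.

```python
from itertools import product

MOVES = ("decrease/stop", "maintain", "increase/initiate")

# 3 x 3 matrix of joint actions: (vit D move, calcimimetic move) -> 9 actions
ACTIONS = list(product(MOVES, MOVES))

# Target ranges quoted in the Methods (PTH in pg/ml; calcium, phosphorus in mg/dl)
RANGES = {"pth": (400.0, 600.0), "calcium": (8.4, 10.0), "phosphorus": (3.5, 5.0)}


def distance_to_range(value, low, high):
    """Return 0 inside [low, high], otherwise the distance to the nearest bound."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def monthly_reward(prev, curr, in_range=1.0, toward=0.5, penalty=-1.0):
    """One-month reward; the three magnitudes are ASSUMED, not from the abstract."""
    total = 0.0
    # PTH: reward in range or moving toward range, penalize moving away
    d_prev = distance_to_range(prev["pth"], *RANGES["pth"])
    d_curr = distance_to_range(curr["pth"], *RANGES["pth"])
    if d_curr == 0.0:
        total += in_range
    elif d_curr < d_prev:
        total += toward
    elif d_curr > d_prev:
        total += penalty
    # Calcium and phosphorus: reward in range, penalize out of range
    for lab in ("calcium", "phosphorus"):
        if distance_to_range(curr[lab], *RANGES[lab]) == 0.0:
            total += in_range
        else:
            total += penalty
    return total


def discounted_return(rewards, gamma=0.9):
    """Sum of rewards discounted over time; gamma is an ASSUMED value."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

For example, a month in which PTH falls from 800 to 700 pg/ml while calcium and phosphorus stay in range would score 0.5 + 1.0 + 1.0 under these assumed magnitudes.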

Results

Our weighted RL policy recommended maintaining both drug classes in fewer months (51%) and titrating calm in more months (12%) than actual practice, referred to as ground truth (GT; medications maintained in 71% of months; calm titrated in 5% of months; both p<0.001). The Figure depicts the odds of RL recommending a net increasing action (i.e., increase vit D, calm, or both), a net decreasing action (i.e., decrease vit D, calm, or both), or no net action (i.e., maintain both medications or one increased and one decreased) compared with GT across different CKD-MBD starting states in the validation dataset (n=1,047 patients with 36,799 observed months). CKD-MBD states are defined by albumin-corrected calcium [1: <8.0; 2: 8.0-8.4; 3: 8.4-10.0; 4: >10.0 mg/dl], serum phosphorus [1: <3.5; 2: 3.5-5.0; 3: 5.0-6.5; 4: 6.5-8.0; 5: >8.0 mg/dl], and PTH [1: <150; 2: 150-400; 3: 400-600; 4: 600-900; 5: >900 pg/ml]. The RL policy was evaluated in the held-out test set using a nearest neighbor matching approach to pair observations in which the GT actions taken were concordant versus discordant with the RL policy (n=1,109 patients with 20,591 observations prior to matching). For actions concordant with the RL policy, the odds of being out of range in the next month were reduced by 35% for PTH and 16% for calcium compared with discordant actions (both p<0.001).

Figure. Odds ratios of taking a net increasing action, a net decreasing action, or no net action, comparing reinforcement learning with ground truth clinical actions across CKD-MBD states in the validation set.
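The CKD-MBD state categories and the net action grouping used in the Results can be sketched in Python as follows. The cut points are those listed in the state definitions. The handling of values that fall exactly on a cut point (assigned here to the higher category) is an assumption, as are the function names and the -1/0/+1 encoding of decrease, maintain, and increase.

```python
from bisect import bisect_right

# Cut points from the state definitions (calcium, phosphorus in mg/dl; PTH in pg/ml)
CUTS = {
    "calcium": (8.0, 8.4, 10.0),
    "phosphorus": (3.5, 5.0, 6.5, 8.0),
    "pth": (150.0, 400.0, 600.0, 900.0),
}


def ckd_mbd_state(calcium, phosphorus, pth):
    """Map three lab values to 1-based category indices.

    ASSUMPTION: a value equal to a cut point goes to the higher category;
    the abstract does not say which side the boundaries belong to.
    """
    labs = {"calcium": calcium, "phosphorus": phosphorus, "pth": pth}
    return tuple(
        bisect_right(CUTS[name], labs[name]) + 1
        for name in ("calcium", "phosphorus", "pth")
    )


def net_action(vit_d, calm):
    """Classify a joint action; each move is -1 (decrease), 0 (maintain), +1 (increase).

    An increase in one drug paired with a decrease in the other sums to zero
    and is therefore counted as no net action, matching the Results text.
    """
    total = vit_d + calm
    if total > 0:
        return "net increasing"
    if total < 0:
        return "net decreasing"
    return "no net action"
```

With these definitions, a patient with calcium 9.0 mg/dl, phosphorus 4.0 mg/dl, and PTH 500 pg/ml falls in state (3, 2, 3), the combination in which all three laboratories are within the target ranges given in the Methods.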

Conclusion

We conclude that preliminary weighted RL-derived policies show promise for generating dynamic treatment algorithms to test in RCTs. This abstract was previously presented, in part, at the Gordon Research Conference on Physiology, Biology and Pathology of Phosphate in February 2025 and the Renal Research Institute Symposium in June 2025.
