TY - JOUR
T1 - CrysText
T2 - A Generative AI Approach for Text-Conditioned Crystal Structure Generation Using LLM
AU - Mohanty, Trupti
AU - Mehta, Maitrey
AU - Sayeed, Hasan M.
AU - Oded, Bat El
AU - Pitussi, Itay
AU - Borenstein, Arie
AU - Srikumar, Vivek
AU - Sparks, Taylor D.
N1 - Publisher Copyright:
© The Minerals, Metals & Materials Society 2025 2026.
PY - 2026
Y1 - 2026
AB - The ability to generate crystal structures directly from textual descriptions marks a pivotal advancement in materials informatics and underscores the emerging role of large language models (LLMs) in inverse design. In this work, we introduce CrysText, a text-conditioned framework that generates crystal structures in Crystallographic Information File (CIF) format from natural language prompts specifying composition and space group. Leveraging LLaMA-3.1-8B and Mistral-7B-v0.3 fine-tuned using Quantized Low-Rank Adaptation (QLoRA), our approach enables the efficient and scalable generation of CIF-formatted structures directly from input descriptions, with rapid inference and no need for post-processing. Evaluations on the MP-20 benchmark demonstrate high structural match rates and low RMSE values, confirming the model’s ability to generate physically consistent crystal structures aligned with compositional and symmetry constraints. By incorporating energy above the convex hull as a conditioning parameter, CrysText further demonstrates the ability to generate thermodynamically stable novel materials. We subsequently extend this framework with CrysText-RL, which integrates Group Relative Policy Optimization (GRPO) to provide reinforcement learning feedback directly on generated CIF outputs via group-based normalized rewards. CrysText-RL achieves additional improvements over the supervised CrysText model in terms of composition and space group satisfiability and structure match rate. This work establishes a scalable paradigm for text-driven crystal structure generation, demonstrating that both supervised fine-tuning and reinforcement learning enable a pathway towards accelerated materials discovery.
KW - Crystallographic Information File (CIF)
KW - Group Relative Policy Optimization (GRPO)
KW - Large language models (LLMs)
KW - Quantized Low-Rank Adaptation (QLoRA)
KW - Reinforcement learning
UR - https://www.scopus.com/pages/publications/105033891970
U2 - 10.1007/s40192-026-00451-8
DO - 10.1007/s40192-026-00451-8
M3 - Article
AN - SCOPUS:105033891970
SN - 2193-9764
JO - Integrating Materials and Manufacturing Innovation
JF - Integrating Materials and Manufacturing Innovation
ER -