Publications

Speculative Decoding for Verilog: Speed and Quality, All in One

Published in DAC, 2025

The rapid advancement of large language models (LLMs) has revolutionized code generation tasks across various programming languages. However, the unique characteristics of programming languages, particularly those like Verilog with specific syntax and lower representation in training datasets, pose significant challenges for conventional tokenization and decoding approaches. In this paper, we introduce a novel application of speculative decoding for Verilog code generation, showing that it can improve both inference speed and output quality, effectively achieving speed and quality all in one. Unlike standard LLM tokenization schemes, which often fragment meaningful code structures, our approach aligns decoding stops with syntactically significant tokens, making it easier for models to learn the token distribution. This refinement addresses inherent tokenization issues and enhances the model’s ability to capture Verilog’s logical constructs more effectively. Our experimental results show that our method achieves up to a 5.05x speedup in Verilog code generation and increases pass@10 functional accuracy on RTLLM by up to 17.19% compared to conventional training strategies. These findings highlight speculative decoding as a promising approach to bridge the quality gap in code generation for specialized programming languages.
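The draft-and-verify loop that underlies speculative decoding can be sketched in a few lines. The toy `draft_model` and `target_model_accepts` below are hypothetical stand-ins (fixed rules rather than real LLMs), and the sketch shows only the generic mechanism, not the paper's Verilog-aligned tokenization; the point is that many cheap draft tokens are proposed per expensive verification round.

```python
import random

random.seed(0)

# Toy stand-ins for a small draft model and a large target model.
# Both "models" here emit Verilog-ish tokens from fixed rules; in the
# paper these would be real LLMs sharing an aligned tokenization.
VOCAB = ["always", "@(", "posedge", "clk", ")", "begin", "end", ";"]

def draft_model(prefix):
    # Fast draft: propose the next token from a canned cyclic rule.
    return VOCAB[len(prefix) % len(VOCAB)]

def target_model_accepts(prefix, token):
    # Slow target: verify a proposed token (here, accept ~80% of drafts).
    return random.random() < 0.8

def speculative_decode(n_tokens, k=4):
    """Draft k tokens at a time, then verify them with the target model.
    Accepted tokens are kept; on the first rejection the target's own
    token is substituted and a fresh drafting round begins."""
    out = []
    while len(out) < n_tokens:
        drafts = []
        for _ in range(k):
            drafts.append(draft_model(out + drafts))
        for tok in drafts:
            if len(out) >= n_tokens:
                break
            if target_model_accepts(out, tok):
                out.append(tok)
            else:
                # Rejection: fall back to the target model's choice.
                out.append(VOCAB[len(out) % len(VOCAB)])
                break
    return out[:n_tokens]

tokens = speculative_decode(8)
print(len(tokens))  # 8
```

The speedup comes from amortization: each call to the expensive target model validates up to `k` cheap draft tokens instead of producing a single one.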

Recommended citation: Liu Y, Xu C, Zhou Y, et al. Speculative Decoding for Verilog: Speed and Quality, All in One. https://arxiv.org/abs/2503.14153

DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model

Published in ICLR, 2025

Recent advancements in large language models (LLMs) have shown significant potential for automating hardware description language (HDL) code generation from high-level natural language instructions. While fine-tuning has improved LLMs’ performance in hardware design tasks, prior efforts have largely focused on Verilog generation, overlooking the equally critical task of Verilog understanding. Furthermore, existing models suffer from weak alignment between natural language descriptions and Verilog code, hindering the generation of high-quality, synthesizable designs. To address these issues, we present DeepRTL, a unified representation model that excels in both Verilog understanding and generation. Based on CodeT5+, DeepRTL is fine-tuned on a comprehensive dataset that aligns Verilog code with rich, multi-level natural language descriptions. We also introduce the first benchmark for Verilog understanding and take the initiative to apply embedding similarity and GPT Score to evaluate the models’ understanding capabilities. These metrics capture semantic similarity more accurately than traditional methods like BLEU and ROUGE, which are limited to surface-level n-gram overlaps. By adapting curriculum learning to train DeepRTL, we enable it to significantly outperform GPT-4 in Verilog understanding tasks, while achieving performance on par with OpenAI’s o1-preview model in Verilog generation tasks.
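Embedding-similarity evaluation of the kind described above reduces to comparing vector representations of a reference description and a model-generated one. A minimal sketch, assuming toy hand-written embedding vectors in place of a real embedding model:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors: the dot product
    divided by the product of the vector norms, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings of a reference description and a generated one;
# a real pipeline would obtain these from an embedding model.
ref = [0.9, 0.1, 0.3]
gen = [0.8, 0.2, 0.4]
print(round(cosine_similarity(ref, gen), 3))  # close to 1.0 when aligned
```

Because the comparison happens in embedding space rather than over surface n-grams, a paraphrase that shares no words with the reference can still score highly, which is exactly what BLEU and ROUGE fail to capture.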

Recommended citation: Liu Y, Xu C, Zhou Y, et al. DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model. https://arxiv.org/abs/2502.15832

Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation

Published in ICCAD, 2024

Natural language interfaces have shown considerable potential for automating Verilog generation from high-level specifications with large language models, attracting much attention. However, this paper reveals that for spatially complex hardware structures, visual representations provide additional context critical to capturing design intent, and can outperform natural-language-only input. Building on this insight, our paper presents a benchmark of multi-modal generative models for Verilog synthesis from visual-linguistic inputs, encompassing both single modules and complex modules. Additionally, we introduce a visual and natural language Verilog query language to facilitate efficient, user-friendly multi-modal queries. To evaluate the performance of the proposed multi-modal hardware generative AI on Verilog generation tasks, we compare it with a popular method that relies solely on natural language. Our results demonstrate a significant accuracy improvement in the multi-modal generated Verilog over queries based solely on natural language. We hope to open a new direction in the era of large hardware design models, fostering a more diversified and effective approach to hardware design.

Recommended citation: Chang K, Chen Z, Zhou Y, et al. Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation. https://yyh-sjtu.github.io/files/natural_language_is_not_enough.pdf

Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework

Published in DAC, 2024

Recent advances in large language models have demonstrated their potential for automated generation of Verilog code from high-level prompts. Researchers have utilized fine-tuning to enhance the ability of these large language models (LLMs) in the field of chip design. However, the lack of Verilog data hinders further improvement in the quality of Verilog generation by LLMs. Additionally, the absence of a Verilog and EDA script data augmentation framework significantly increases the time required to prepare the training dataset for LLM trainers. In this paper, we propose an automated design-data augmentation framework that generates high-quality natural language descriptions of Verilog/EDA scripts. To evaluate the effectiveness of our data augmentation method, we fine-tune Llama2-13B and Llama2-7B models. The results demonstrate a significant improvement on the Verilog generation task compared to the general data augmentation method. Moreover, the accuracy of Verilog generation surpasses that of the current state-of-the-art open-source Verilog generation model, increasing from 58.8% to 70.6% on the same benchmark, and outperforms GPT-3.5 in Verilog repair and EDA script generation with only 13B parameters.
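One building block of such a design-data augmentation pipeline is auto-drafting a description skeleton for each Verilog module before an LLM refines it. A minimal sketch, assuming a simple regex over single-module sources (the `draft_description` helper is hypothetical, not the paper's framework):

```python
import re

def draft_description(verilog_src):
    """Auto-draft a natural-language skeleton for a Verilog module by
    extracting its name and port list; a real augmentation pipeline
    would have an LLM expand this into a rich, multi-level description
    paired with the code as a fine-tuning example."""
    m = re.search(r"module\s+(\w+)\s*\(([^)]*)\)", verilog_src)
    if not m:
        return None
    name = m.group(1)
    ports = [p.strip() for p in m.group(2).split(",")]
    return f"Module '{name}' with ports: {', '.join(ports)}."

src = "module adder(input a, input b, output sum); assign sum = a ^ b; endmodule"
# Instruction/code pair suitable for a fine-tuning dataset.
pair = {"instruction": draft_description(src), "code": src}
print(pair["instruction"])  # Module 'adder' with ports: input a, input b, output sum.
```

Pairing each harvested Verilog source with a generated description in this way is what turns raw code into the aligned instruction-code data that fine-tuning requires.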

Recommended citation: Chang K, Wang K, Yang N, et al. Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework. arXiv preprint arXiv:2403.11202, 2024. https://yyh-sjtu.github.io/files/data_is_all_you_need.pdf