Creating Effective Prompts for R Coding
R Programming
LLMs
Tutorials
A guide to constructing high-quality prompts for leveraging Large Language Models (LLMs) in R.
Fundamental Principles of Prompt Construction
Set Clear Goals
- Specify the task and desired outcome.
- Eliminate ambiguity in the requirements.
Bad Prompt
TASK: Text cleaning function
INPUT: character vector
REQUIREMENTS:
- Remove special characters
- Convert to lowercase
OUTPUT: cleaned vector
Good Prompt
Create an R function that cleans text data by removing special characters and converting to lowercase. The input should be a character vector.
Please provide the code and a brief explanation of your approach.
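As a rough illustration of what the Good Prompt above might yield, here is a minimal sketch in base R. The function name `clean_text` and the reading of "special characters" as anything other than ASCII letters, digits, and spaces are assumptions, not part of the original prompt.

```r
# Hypothetical helper: lowercases text, then strips everything except
# ASCII letters, digits, and spaces (one possible reading of
# "special characters").
clean_text <- function(x) {
  stopifnot(is.character(x))
  x <- tolower(x)
  gsub("[^a-z0-9 ]", "", x)
}

cleaned <- clean_text(c("Hello, World!", "R is #1"))
```

Note that this pattern also removes accented letters, which is the kind of detail a follow-up prompt could pin down.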
Use Positive Language
Frame requests in terms of “do this” rather than “don’t do this.” This encourages clarity and actionable guidance.
Natural Language over Rigid Formatting
Modern LLMs excel at understanding natural language. Keep prompts conversational but specific.
Provide Context and Specifications
- Specify the desired output format (e.g., R function, script, or data frame).
- Include relevant libraries or packages.
Good Prompt
Using the tidyverse ecosystem in R, write code to analyze my sales data with these specifications:
sales_data <- data.frame(
  date = c("2024-01-01", "2024-01-02", "2024-01-01"),
  region = c("North", "South", "North"),
  sales = c(1200, NA, 1500),
  units = c(50, 45, 60)
)
Requirements:
1. Calculate mean sales and total units by region and month.
2. Format dates appropriately.
3. Return results in a tidied data frame.
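A response to the prompt above could look roughly like the following sketch. It assumes dplyr is installed, that missing sales values should be dropped before averaging (`na.rm = TRUE`), and that "month" means a `YYYY-MM` label. The prompt fixes none of these choices, so treat them as one reasonable interpretation.

```r
library(dplyr)

sales_data <- data.frame(
  date = c("2024-01-01", "2024-01-02", "2024-01-01"),
  region = c("North", "South", "North"),
  sales = c(1200, NA, 1500),
  units = c(50, 45, 60)
)

sales_summary <- sales_data %>%
  mutate(
    date = as.Date(date),           # parse the character dates
    month = format(date, "%Y-%m")   # assumed month label, e.g. "2024-01"
  ) %>%
  group_by(region, month) %>%
  summarise(
    mean_sales = mean(sales, na.rm = TRUE),  # assumption: ignore NA sales
    total_units = sum(units),
    .groups = "drop"
  )
```

With this sample data, the South group has no non-missing sales, so its mean is `NaN`. That is a good candidate for a follow-up prompt asking how missing values should be handled.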
Bad Prompt
Average my data in R.
sales_data <- data.frame(
  date = c("2024-01-01", "2024-01-02", "2024-01-01"),
  region = c("North", "South", "North"),
  sales = c(1200, NA, 1500),
  units = c(50, 45, 60)
)
Encourage Reasoning and Explanation
- Request step-by-step explanations in addition to code.
- Ask for comments within the code to clarify its functionality.
- Request a summary of the approach or methodology used.
Optimize for R-Specific Tasks
Data Manipulation
Specify the preferred libraries or frameworks for data handling, such as dplyr.
Visualization
Explicitly mention visualization libraries like ggplot2 or plotly.
Good Prompt
Create a ggplot2 visualization in R that shows the relationship between `mpg` and `hp` from the mtcars dataset. The plot should:
- Use a scatter plot with a smooth trend line.
- Color the points by the `cyl` variable.
- Use a minimal theme.
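For reference, the plot described in the prompt above might be written along these lines. This is a sketch that assumes ggplot2 is installed. Putting `hp` on the x-axis, using a loess smoother, and treating `cyl` as a factor are choices the prompt leaves open.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(aes(color = factor(cyl))) +   # color points by cylinder count
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +  # trend line
  labs(color = "cyl") +
  theme_minimal()
```

Printing `p` renders the plot. Because the smoother is not mapped to `cyl`, it draws a single trend line across all points rather than one per group.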
Improve Code Quality
- Request performance optimization and adherence to established style guides.
- Suggest refactoring to ensure modular, reusable code.
Good Prompt
Refactor the following R code to improve performance and readability using the tidyverse style guide and vector operations.
my_function <- function(df) {
  result <- data.frame()
  for (i in 1:nrow(df)) {
    if (!is.na(df$value[i]) & df$category[i] == "A") {
      temp <- data.frame(
        id = df$id[i],
        transformed = sqrt(df$value[i]),
        group = df$category[i]
      )
      result <- rbind(result, temp)
    }
  }
  return(result)
}
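One plausible refactoring of the loop in the prompt above is sketched below in base R, as an illustration rather than the definitive answer an LLM would give. It replaces the row-by-row `rbind()` with a single logical filter. The extra `!is.na(df$category)` check is an added assumption, so that rows with a missing category are skipped rather than causing an error.

```r
# Vectorized sketch: build one logical index, then subset each column once.
my_function <- function(df) {
  keep <- !is.na(df$value) & !is.na(df$category) & df$category == "A"
  data.frame(
    id = df$id[keep],
    transformed = sqrt(df$value[keep]),
    group = df$category[keep]
  )
}
```

One behavioral difference worth checking in a follow-up: when no rows match, this version returns a zero-row data frame that still has the three named columns, whereas the loop returns an empty `data.frame()` with no columns.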
Leverage R’s Strengths
- Vectorization: Request the use of vectorized operations where applicable.
- Functional Programming: Encourage using apply functions or the purrr library for functional programming paradigms.
- Data Types: Specify the use of appropriate data structures like lists or data frames to improve efficiency.
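To make the points above concrete, the following base R sketch contrasts a vectorized call with an apply-style call. The variable names are illustrative only.

```r
values <- c(1, 4, 9, 16)

# Vectorization: sqrt() operates on the whole vector at once.
roots_vectorized <- sqrt(values)

# Functional style: vapply() applies a function to each element and
# checks that every result is a single numeric value.
# purrr::map_dbl(values, sqrt) would be the purrr equivalent.
roots_vapply <- vapply(values, sqrt, numeric(1))

# Data types: a named list holds vectors that need not share a length,
# and vapply() returns a named numeric vector of their means.
columns <- list(a = c(1, 2, 3), b = c(4, 5, 6))
column_means <- vapply(columns, mean, numeric(1))
```

When a function is already vectorized, as `sqrt()` is, the direct call is usually the simplest option. The apply family is most useful for functions that only accept one element or one group at a time.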
Iterative Improvement
- Start with a basic prompt and iterate based on feedback.
- Review the generated output and refine your requirements.
- Use specific follow-ups to clarify or improve aspects of the output.
- Restart with a refined prompt when the direction changes significantly.
Summary
Do:
- Clearly specify the required packages and dependencies.
- Include sample data or a detailed description of the expected data structure.
- Request robust error handling and input validation.
- Break down complex tasks into smaller, manageable prompts.
- Ask for explanations of choices or methods used.
Don’t:
- Provide ambiguous or incomplete requirements.
- Overload prompts with unrelated tasks.
- Assume implicit knowledge of specific packages without mentioning them.
- Neglect to specify data types or structures explicitly.
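As an illustration of the "robust error handling and input validation" item in the Do list, a prompt that asks for it might produce something like this sketch. The function `safe_mean` and its messages are hypothetical and exist only to show the pattern.

```r
# Hypothetical helper: validates input before computing a mean.
safe_mean <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be a numeric vector.", call. = FALSE)
  }
  if (length(x) == 0 || all(is.na(x))) {
    warning("`x` has no non-missing values; returning NA.", call. = FALSE)
    return(NA_real_)
  }
  mean(x, na.rm = TRUE)
}

# tryCatch() turns the error into a message the caller can inspect.
result <- tryCatch(
  safe_mean("not a number"),
  error = function(e) conditionMessage(e)
)
```

Stating the expected failure behavior in the prompt (error, warning, or a default value) tends to matter more than the exact wording of the messages.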