“hello world” (Some Intro-Level R Resources)


Data science proficiency goes hand-in-hand with Ph.D.-level research. Most of us, however, don’t enter graduate school with strong programming skills. Instead, we’re likely thrown into a two-in-one, programming-and-statistical-methods course during our first year, using any number of possible tools (MATLAB, R, SPSS, etc.). Personally, I’m of the belief that learning R is invaluable. I think its learning curve is steeper than that of other languages, but as you develop proficiency and confidence, I find it to be a dynamic language that can do almost anything you’ll need within the scope of a Ph.D. program. (A bonus: It’s heavily used in industry as well.)

Since proficiency comes with practice, I take coding workshops as frequently as they’re available. Doing so solidifies what I already know and keeps me from forgetting methods that I may not use regularly in my own research and analyses. Sometimes, I’ll learn more efficient ways to do things too. Learning from different teachers also means hearing concepts explained in different ways: things that seemed ‘fuzzy’ when explained by one professor may be crystal clear when explained in a different manner.

I advocate for a general literacy across a few languages — after all, you’ll have little control over the format of materials sent over by colleagues — but here I thought I’d focus on my favorite beginner-level resources for R programming.

The DataLab at UC Davis is a wealth of information on various languages for users at various skill levels. For those interested in ‘Intro to R’ material, they have recordings available for their four-part series on the basics. The recordings are from a 2021 Zoom workshop, include questions from students who participated in real time, and come with a ‘reader’ so that you can follow along with notes instead of pausing to write down material. This workshop can be done interactively too, meaning that you are able to follow along within RStudio. A link to the old registration page for the event can be found here. A link to Part 1 of the series is here (with the other three parts showing up in the sidebar). The reader for the series is here.

Codecademy isn’t quite as tailored to Ph.D. research, but if video-based learning isn’t your thing, this may be a good resource. Much of Codecademy’s data science material is behind a paywall, but the free content within their Learn R course is a good supplement to the work from DataLab. There are other R courses on Codecademy meant for beginners, but Learn R is the best place to start.

I think that the video lectures from DataLab explain things in a way that makes sense to me and are good about addressing my questions as I have them (even as an asynchronous viewer). If you are looking to do both, I would start with Codecademy and then watch the DataLab lectures. I think Codecademy will get you familiar with R in a way that makes the material from DataLab seem less overwhelming, though I believe that the DataLab lectures back-fill the context needed for a fully integrated understanding of what exactly R is doing.

I want to provide two non-course resources as well. One is familiar to most R users: RStudio Cheatsheets. In my opinion, these cheatsheets are great for those who may not remember how to do specific/basic actions within R, but are comfortable reading programming syntax. They’re less helpful for true newbies. That said, I know that beginners often struggle with finding the vocabulary to look up what they need help with online. I think these cheatsheets may give complete newcomers some hints as to what they can Google. (An aside: The aforementioned reader from the DataLab course can assist similarly even if you do not participate in their workshop series.)

The second resource is the website rseek.org. It’s akin to the in-R help function… just exponentially better. R’s base help function can sometimes/often/usually be cryptic at best. When you type your question into rseek, it searches other websites known to be about R (e.g., Stack Overflow, R Documentation, etc.) to curate your results. I think it’s a great tool for those who need assistance finding what they’re looking for amongst the plethora of webpages returned by a regular Google search.

Lastly, once you’ve completed Codecademy’s course or the DataLab’s beginner series, you have the prerequisite skills to participate in Tidy Tuesday! Congratulations! Tidy Tuesday is one of my favorite ways to practice R. Each week, the R for Data Science Online Learning Community works with a new dataset: cleaning it, analyzing it, etc. It’s a meaningful way to practice and to connect with the greater RStats community.

Hope these things help!

-A.Y.
