Blog Archives

How to Write a Git Commit Message, in 7 Steps

May 11, 2020

Version control is an essential tool for any software developer. Hence, any respectable data scientist has to make sure their analysis programs and machine learning pipelines are reproducible and maintainable through version control. Often, we use git for version control. If you don’t know what git is yet, I advise you to begin here. If you … Continue reading How...

Read more »

Predictive Power Score: Finding predictive patterns in your dataset

May 4, 2020

Last week, I shared this Medium blog on PPS, or Predictive Power Score, on my LinkedIn and got so many enthusiastic responses that I had to share it here too. Basically, the predictive power score is a normalized metric (values range from 0 to 1) that shows you to what extent you … Continue reading Predictive...

Read more »

Generative art: Let your computer design you a painting

May 2, 2020

I really like generative art, or so-called algorithmic art. Basically, it means you take a pattern or a complex system of rules, and apply it to create something new following those patterns/rules. When I finished my PhD, I got a beautiful poster in which the k-nearest neighbors algorithm was used to generate a set of … Continue reading Generative...

Read more »

Free Springer Books during COVID19

April 24, 2020

Publisher Springer just released over 400 book titles that can be downloaded free of charge following the coronavirus outbreak. Here’s the full overview: https://link.springer.com/search?facet-content-type=%22Book%22&package=mat-covid19_textbooks&facet-language=%22En%22&sortOrder=newestFirst&showAll=true Most of these books will normally set you back about $50 to $150, so this is a great deal! There are many titles on computer science, programming, business, psychology, and … Continue reading Free...

Read more »

Simulating and visualizing the Monty Hall problem in Python & R

April 14, 2020

I recently visited a data science meetup where one of the speakers spoke about playing out the Monty Hall problem with his kids. The Monty Hall problem is a probability puzzle, based on the American television game show Let’s Make a Deal and named after its host, Monty Hall: You’re given the choice of three doors. Behind one door sits a prize: a … Continue reading Simulating...

Read more »

Curated Regular Expression Resources

April 7, 2020

Regular expressions (often abbreviated to regex) really are a power tool any programmer should know. Regex was and is one of the things I most liked learning, as it provides you with immediate, godlike powers that can speed up your (data science) workflow tenfold. I’ve covered many regex-related topics on this blog already, but thought … Continue reading Curated...

Read more »

Visualizing decision tree partition and decision boundaries

March 31, 2020

Grant McDermott developed this new R package I had thought of: parttree. parttree includes a set of simple functions for visualizing decision tree partitions in R with ggplot2. The package is not yet on CRAN, but can be installed from GitHub using: Using the familiar ggplot2 syntax, we can simply add decision tree boundaries to a plot of … Continue reading Visualizing...

Read more »

How to standardize group colors in data visualizations in R

March 20, 2020

One best practice in visualization is to make your color scheme consistent across figures. For instance, if you’re making multiple plots of the same dataset (say, a group of 5 companies), you want each company to have the same coloring across all these plots. R has some great data visualization capabilities. Particularly … Continue reading How...

Read more »

paletteer: Hundreds of color palettes in R

March 17, 2020

Looking for just the right colors for your data visualization? I often cover tools to pick color palettes on my website (e.g. here, here, or here) and also host a comprehensive list of color packages in my R programming resources overview. However, paletteer is by far my favorite package for customizing your colors in R! … Continue reading paletteer:...

Read more »

Solutions to working with small sample sizes

March 10, 2020

Both in science and business, we often experience difficulties collecting enough data to test our hypotheses, either because target groups are small or hard to access, or because data collection entails prohibitive costs. Such obstacles may result in data sets that are too small for the complexity of the statistical model needed to answer the … Continue reading Solutions...

Read more »
