5 Ways to Subset a Data Frame in R

November 29, 2016
By

(This article was first published on (R)very Day, and kindly contributed to R-bloggers)

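Often, when you're working with a large data set, you will only be interested in a small portion of it for your particular analysis. So, how do you sort through all the extraneous variables and observations and extract only those you need? Well, R has several ways of doing this in a process it calls "subsetting."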

The most basic way of subsetting a data frame in R is by using square brackets, like so:

example[x,y]

Here, example is the data frame we want to subset, 'x' consists of the rows we want returned, and 'y' consists of the columns we want returned. Let's pull some data from the web and see how this is done on a real data set.
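To make the bracket notation concrete, here is a minimal sketch on a toy data frame. The data frame and its values are made up purely for illustration and are not part of the education data used in the rest of the post.

```r
# Toy data frame; names and values are hypothetical.
example <- data.frame(
  id    = 1:5,
  group = c("a", "b", "a", "c", "b"),
  score = c(10, 20, 30, 40, 50)
)

rows_and_cols <- example[2:3, c(1, 3)]        # rows 2-3, columns 1 and 3
all_columns   <- example[2:3, ]               # empty column slot keeps every column
all_rows      <- example[, c("id", "score")]  # empty row slot keeps every row

print(rows_and_cols)
```

Leaving either side of the comma empty means "keep everything" on that side, and columns can be referenced by position or by name.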

### import education expenditure data set and assign column names
education <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/robustbase/education.csv", stringsAsFactors = FALSE)
colnames(education) <- c("X","State","Region","Urban.Population","Per.Capita.Income","Minor.Population","Education.Expenditures")
View(education)

Here is what the first part of the data set looks like after importing the data and appropriately naming its columns.

[Image: edexp1]

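Now, let's suppose we oversee the Midwestern division of schools and that we are charged with calculating how much money was spent per child for each state in our region. We would need three variables: State, Minor.Population, and Education.Expenditures. However, we would only need the observations from the rows that correspond to Region 2. Here's the basic way to retrieve that data in R: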

ed_exp1 <- education[c(10:21),c(2,6:7)]

To create the new data frame 'ed_exp1,' we subsetted the 'education' data frame by extracting rows 10-21 and columns 2, 6, and 7. Pretty simple, right?
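Hard-coded positions are easy to get wrong, so it can help to check them before trusting the result. The sketch below uses a small stand-in data frame with hypothetical values (the real data set has 7 columns and 50 rows) and confirms that the chosen positions really point at the intended columns and region.

```r
# Stand-in for the education data; all values are hypothetical.
toy <- data.frame(
  X                      = 1:6,
  State                  = c("S1", "S2", "S3", "S4", "S5", "S6"),
  Region                 = c(1, 1, 2, 2, 2, 3),
  Minor.Population       = c(300, 310, 320, 330, 340, 350),
  Education.Expenditures = c(200, 210, 220, 230, 240, 250)
)

# Which columns do positions 2, 4 and 5 refer to?
picked_names <- names(toy)[c(2, 4:5)]

# Do rows 3-5 all belong to Region 2?
rows_ok <- all(toy$Region[3:5] == 2)

toy_exp1 <- toy[3:5, c(2, 4:5)]
print(toy_exp1)
```

If either check fails, the positions need adjusting before the subset is used.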

Another way to subset the data frame with brackets is by omitting row and column references. Take a look at this code:

ed_exp2 <- education[-c(1:9,22:50),-c(1,3:5)]

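Here, instead of subsetting the rows and columns we wanted returned, we listed the rows and columns we did not want returned and then omitted them with the "-" sign. If we now call ed_exp1 and ed_exp2, we can see that both data frames return the same subset of the original education data frame.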
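The equivalence of the two bracket approaches can be checked directly with identical(). This is a sketch on a small stand-in data frame with hypothetical values, not the real education data.

```r
# Stand-in for the education data; all values are hypothetical.
toy <- data.frame(
  X                      = 1:6,
  State                  = c("S1", "S2", "S3", "S4", "S5", "S6"),
  Region                 = c(1, 1, 2, 2, 2, 3),
  Minor.Population       = c(300, 310, 320, 330, 340, 350),
  Education.Expenditures = c(200, 210, 220, 230, 240, 250)
)

keep_wanted   <- toy[3:5, c(2, 4:5)]         # name what we want
drop_unwanted <- toy[-c(1:2, 6), -c(1, 3)]   # name what we do not want

same_result <- identical(keep_wanted, drop_unwanted)
print(same_result)
```

Which form to use is mostly a matter of which list is shorter: the things to keep or the things to drop.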

[Image: edexp2]

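Now, these basic ways of subsetting a data frame in R can become tedious with large data sets. You have to know the exact column and row references you want to extract. It's pretty easy with 7 columns and 50 rows, but what if you have 70 columns and 5,000 rows? How do you find which columns and rows you need in that case? Here's another way to subset a data frame in R…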

ed_exp3 <- education[which(education$Region == 2),names(education) %in% c("State","Minor.Population","Education.Expenditures")]

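Now, we have a few things going on here. First, we are using the same basic bracketing technique to subset the education data frame as we did with the first two examples. This time, however, we are extracting the rows we need by using the which() function. This function returns the indices where the Region column of the education data frame is 2. That gives us the rows we need. We retrieve the columns of the subset by using the %in% operator on the names of the education data frame.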
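It may help to look at the two pieces separately. The sketch below uses a stand-in data frame with hypothetical values and deliberately includes one missing Region, to show one reason which() is worth the extra typing: it drops NA comparisons, whereas a bare logical index turns them into an all-NA row.

```r
# Stand-in for the education data; all values are hypothetical.
toy <- data.frame(
  X                      = 1:6,
  State                  = c("S1", "S2", "S3", "S4", "S5", "S6"),
  Region                 = c(1, NA, 2, 2, 2, 3),
  Minor.Population       = c(300, 310, 320, 330, 340, 350),
  Education.Expenditures = c(200, 210, 220, 230, 240, 250)
)

row_idx  <- which(toy$Region == 2)   # integer positions: 3 4 5
col_keep <- names(toy) %in% c("State", "Minor.Population", "Education.Expenditures")
                                     # one TRUE/FALSE per column

with_which    <- toy[row_idx, col_keep]
without_which <- toy[toy$Region == 2, col_keep]  # the NA comparison adds an all-NA row

print(with_which)
print(nrow(without_which))
```

The row index is a vector of positions and the column index is a logical vector, and the brackets accept both.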

Now, you may look at this line of code and think that it's too complicated. There's got to be an easier way to do that. Well, you would be right. There is another basic function in R that allows us to subset a data frame without knowing the row and column references. The name? You guessed it: subset().

ed_exp4 <- subset(education, Region == 2, select = c("State","Minor.Population","Education.Expenditures"))

The subset() function takes 3 arguments: the data frame you want subsetted, the condition that picks out the rows you want, and the columns you want returned. In our case, we take a subset of education where "Region" is equal to 2 and then we select the "State," "Minor.Population," and "Education.Expenditures" columns.
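As a small sketch of the same call on a stand-in data frame (hypothetical values again), note that the select argument also accepts unquoted column names and name ranges, so adjacent columns do not have to be listed one by one.

```r
# Stand-in for the education data; all values are hypothetical.
toy <- data.frame(
  X                      = 1:6,
  State                  = c("S1", "S2", "S3", "S4", "S5", "S6"),
  Region                 = c(1, 1, 2, 2, 2, 3),
  Minor.Population       = c(300, 310, 320, 330, 340, 350),
  Education.Expenditures = c(200, 210, 220, 230, 240, 250)
)

# Quoted column names, as in the post.
quoted <- subset(toy, Region == 2,
                 select = c("State", "Minor.Population", "Education.Expenditures"))

# Unquoted names, with a range covering the two adjacent columns.
ranged <- subset(toy, Region == 2,
                 select = c(State, Minor.Population:Education.Expenditures))

print(identical(quoted, ranged))
```

The condition uses the comparison operator ==, with two equals signs, rather than a single =.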

When we subset the education data frame with either of the two aforementioned methods, we get the same result as we did with the first two methods:

[Image: edexp3]

Now, there's just one more method to share with you. This last method, once you've learned it well, will probably be the most useful for you in manipulating data. Let's take a look at the code and then we'll go over it…

install.packages("dplyr")
library(dplyr)
ed_exp5 <- select(filter(education, Region == 2), c(State, Minor.Population:Education.Expenditures))

This last method is not part of the basic R environment. To use it, you've got to download and install the dplyr package, then load it with library(). If you're going to be working with data in R, though, this is a package you will definitely want. It is among the most downloaded packages in the R environment and, as you start using it, you'll quickly see why.

So, once we've downloaded dplyr, we create a new data frame by using two different functions from this package:

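  • filter: the first argument is the data frame; the second argument is the condition by which we want it subsetted. The result is the entire data frame with only the rows we wanted.
  • select: the first argument is the data frame; the second argument is the names of the columns we want selected from it. We don't have to use the names() function, and we don't even have to use quotation marks. We simply list the column names as objects.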

In this example, we've wrapped the filter function in the select function to return our data frame. In other words, we've first taken the rows where the Region is 2 as a subset. Then, we took the columns we wanted from only those rows. The result gives us a data frame consisting of the data we need for our 12 states of interest:
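The nested call can also be written with dplyr's pipe operator, which some readers find easier to follow because the steps read in the order they happen. This is a sketch on a stand-in data frame with hypothetical values, and it assumes dplyr is already installed.

```r
library(dplyr)  # assumes the package is already installed

# Stand-in for the education data; all values are hypothetical.
toy <- data.frame(
  X                      = 1:6,
  State                  = c("S1", "S2", "S3", "S4", "S5", "S6"),
  Region                 = c(1, 1, 2, 2, 2, 3),
  Minor.Population       = c(300, 310, 320, 330, 340, 350),
  Education.Expenditures = c(200, 210, 220, 230, 240, 250)
)

# Nested form, as in the post: filter the rows first, then select the columns.
nested <- select(filter(toy, Region == 2),
                 c(State, Minor.Population:Education.Expenditures))

# Piped form: the same two steps, written left to right.
piped <- toy %>%
  filter(Region == 2) %>%
  select(State, Minor.Population:Education.Expenditures)

print(piped)
```

Both forms should return the same rows and columns; the pipe simply passes the result of each step in as the first argument of the next.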

[Image: edexp4]

So, to recap, here are 5 ways we can subset a data frame in R:

  1. Subset using brackets by extracting the rows and columns we want
  2. Subset using brackets by omitting the rows and columns we don't want
  3. Subset using brackets in combination with the which() function and the %in% operator
  4. Subset using the subset() function
  5. Subset using the filter() and select() functions from the dplyr package

That's it! Happy subsetting!
