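R Notes: Dummy Variables
Dummy variables (also called indicator variables) are an important concept in regression. Introducing them makes the regression model more complex, but it lets the model describe the problem more simply and in a way that is closer to reality.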
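To make the idea concrete, here is a minimal base R sketch on made-up data. A three-level variable is represented by two 0/1 indicator columns, and the omitted level acts as the reference. The vector race and its values are illustrative assumptions, not the data set used later in this note.

```r
# Toy data: a categorical variable with three levels (values are made up)
race <- c(1, 2, 3, 1, 3)

# One 0/1 indicator per non-reference level; level 1 is the reference
race_2 <- as.integer(race == 2)
race_3 <- as.integer(race == 3)

# Rows with race == 1 have 0 in both indicator columns
cbind(race, race_2, race_3)
```

A variable with k levels needs only k - 1 indicators, because the reference level is identified by all indicators being 0.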
dummy.c {misty}:
creates k - 1 dummy coded 0/1 variables for a vector with k distinct values.
dummy.c(x, ref = NULL, names = "d", as.na = NULL, check = TRUE)
x: a numeric vector with integer values, character vector or factor.
ref: a numeric value or character string indicating the reference group. By default, the last category is selected as reference group.
names: a character string or character vector indicating the names of the dummy variables. By default, variables are named "d" with the category compared to the reference category (e.g., "d1" and "d2"). Variable names can be specified using a character string (e.g., names = "dummy_" leads to dummy_1 and dummy_2) or a character vector matching the number of dummy coded variables (e.g., names = c("x.3_1", "x.3_2")), which is the number of unique categories minus one.
as.na: a numeric vector indicating user-defined missing values, i.e. these values are converted to NA before conducting the analysis.
check: logical. If TRUE, argument specification is checked.
library(readxl)
dumv<-read_excel("D:/Temp/bsrdata.xlsx")
library(misty)
dummy.c(dumv$race, ref = 1, names = "race_d")
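The variable race in dumv has now been dummy coded (the result is returned; dumv itself is not modified). The first level serves as the reference level, and the two resulting dummy variables are named race_d2 and race_d3.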
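Because dummy.c returns the dummy columns rather than changing the data, a typical next step is to bind them to the data frame. The sketch below does this on a made-up data frame named toy (an assumption for illustration, not the bsrdata file). It relies on the dummy.c signature quoted above, which may differ in other versions of misty, and it only calls the function when the package is installed.

```r
# Toy data frame; values are made up for illustration
toy <- data.frame(race = c(1, 2, 3, 1, 3, 2))

# Fall back to the unchanged data if misty is not available
toy_d <- toy
if (requireNamespace("misty", quietly = TRUE)) {
  # k = 3 distinct values, so k - 1 = 2 dummy columns; level 1 is the reference
  d <- misty::dummy.c(toy$race, ref = 1, names = "race_d")
  # Bind the dummy columns next to the original variable
  toy_d <- data.frame(toy, d)
}
toy_d
```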
model.matrix {stats}:
Converts the factor variables in an object into 0/1 dummy variables.
model.matrix(object, ...)
model.matrix(object, data = environment(object), contrasts.arg = NULL, xlev = NULL, ...)
object: an object of an appropriate class. For the default method, a model formula or a terms object.
data: a data frame created with model.frame. If another sort of object, model.frame is called first.
contrasts.arg: a list, whose entries are values (numeric matrices, functions or character strings naming functions) to be used as replacement values for the contrasts replacement function and whose names are the names of columns of data containing factors.
xlev: to be used as argument of model.frame if data is such that model.frame is called.
...: further arguments passed to or from other methods.
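As a hedged illustration of the contrasts.arg argument described above, the sketch below changes the reference level of a factor. It assumes R's default treatment contrasts for unordered factors and uses a made-up data frame named toy. Passing contr.treatment(3, base = 3) is one possible way to make the third level the reference, not the only one.

```r
# Toy factor with three levels (values are made up)
toy <- data.frame(race = factor(c(1, 2, 3, 1, 3, 2)))

# Default treatment contrasts: the first level is the reference
m_default <- model.matrix(~ race, data = toy)

# Use contrasts.arg to make the third level the reference instead
m_ref3 <- model.matrix(~ race, data = toy,
                       contrasts.arg = list(race = contr.treatment(3, base = 3)))

colnames(m_default)  # "(Intercept)" "race2" "race3"
m_ref3               # rows with race == 3 are 0 in both dummy columns
```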
library(readxl)
dumv<-read_excel("D:/Temp/bsrdata.xlsx")
dumv$race<-factor(dumv$race) # set the variable race as a factor
dumv$smoke<-factor(dumv$smoke)
dumv$ht<-factor(dumv$ht)
dumv$ui<-factor(dumv$ui)
dumlized<-model.matrix(bwt~age+lwt+race+smoke+ptl+ht+ui+ftv,data=dumv)
dumlized
All factor variables in the object have now been converted into dummy variables.
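The same behaviour can be checked on a small made-up data frame, which may help readers who do not have the bsrdata file. The data frame toy and its values are assumptions for illustration and reuse only a few of the variable names from the formula above. The sketch shows which columns model.matrix creates and how to drop the intercept column if only the predictors are wanted.

```r
# Toy data; values are made up and only mimic a few of the variables above
toy <- data.frame(
  bwt   = c(2500, 3100, 2800, 3300, 2950, 2700),
  age   = c(19, 33, 20, 21, 18, 25),
  race  = factor(c(1, 2, 3, 1, 3, 2)),
  smoke = factor(c(0, 1, 0, 1, 0, 0))
)

# Factors are expanded into 0/1 columns; numeric variables pass through as-is
mm <- model.matrix(bwt ~ age + race + smoke, data = toy)
colnames(mm)  # "(Intercept)" "age" "race2" "race3" "smoke1"

# Drop the intercept column to keep only the predictor columns
dummies <- mm[, -1]
```

The first level of each factor (race 1, smoke 0) is the reference level and therefore gets no column of its own.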