
Author: *The author of this computation has been verified*
R Software Module: rwasp_regression_trees1.wasp
Title produced by software: Recursive Partitioning (Regression Trees)
Date of computation: Mon, 13 Dec 2010 19:40:21 +0000
Cite this page as follows:
Statistical Computations at FreeStatistics.org, Office for Research Development and Education, URL https://freestatistics.org/blog/index.php?v=date/2010/Dec/13/t1292269279gz7153eqqeyx2to.htm/, Retrieved Mon, 06 May 2024 21:51:11 +0000
Statistical Computations at FreeStatistics.org, Office for Research Development and Education, URL https://freestatistics.org/blog/index.php?pk=109119, Retrieved Mon, 06 May 2024 21:51:11 +0000
Original text written by user:
IsPrivate? No (this computation is public)
User-defined keywords:
Estimated Impact: 155
Family? (F = Feedback message, R = changed R code, M = changed R Module, P = changed Parameters, D = changed Data)
-     [Recursive Partitioning (Regression Trees)] [] [2010-12-05 20:13:50] [b98453cac15ba1066b407e146608df68]
F   PD    [Recursive Partitioning (Regression Trees)] [Recursive Partiti...] [2010-12-13 19:40:21] [dfb0309aec67f282200eef05efe0d5bd] [Current]
Feedback Forum
2010-12-18 11:28:57 [00c625c7d009d84797af914265b614f9]
Correct.
The figure clearly shows that a student with a low learning competence has roughly a 90% chance of scoring high on 'doubts'.
When learning competence is high, anxiety also plays a role: with high anxiety the chance of a high score is roughly 60%.
Cross validation: from the table we infer that 74% of the students with a low score on doubt can be predicted correctly, and roughly 70% of the students with a high score on doubt. The training and testing values lie very close together, so there is certainly no overfitting.
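For reference, the percentages quoted in this feedback follow directly from the counts in the 10-fold cross-validation table further down the page. A minimal R sketch (not part of the archived computation) that recomputes them:

training <- matrix(c(629, 211, 157, 353), nrow = 2, byrow = TRUE,
                   dimnames = list(Actual = c('C1','C2'), Predicted = c('C1','C2')))
testing  <- matrix(c(74, 26, 15, 35), nrow = 2, byrow = TRUE,
                   dimnames = list(Actual = c('C1','C2'), Predicted = c('C1','C2')))
diag(training) / rowSums(training)   # 0.7488 and 0.6922: per-class rates on the training splits
diag(testing)  / rowSums(testing)    # 0.74 and 0.70: per-class rates on the held-out splits
sum(diag(training)) / sum(training)  # 0.7274: overall training rate
sum(diag(testing))  / sum(testing)   # 0.7267: overall testing rate, close to training, so no sign of overfitting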

Dataseries X:
0	13	26	9	6	25	25
0	16	20	9	6	25	24
0	19	21	9	13	19	21
1	15	31	14	8	18	23
0	14	21	8	7	18	17
0	13	18	8	9	22	19
0	19	26	11	5	29	18
0	15	22	10	8	26	27
0	14	22	9	9	25	23
0	15	29	15	11	23	23
1	16	15	14	8	23	29
0	16	16	11	11	23	21
1	16	24	14	12	24	26
0	17	17	6	8	30	25
1	15	19	20	7	19	25
1	15	22	9	9	24	23
0	20	31	10	12	32	26
1	18	28	8	20	30	20
0	16	38	11	7	29	29
1	16	26	14	8	17	24
0	19	25	11	8	25	23
0	16	25	16	16	26	24
1	17	29	14	10	26	30
0	17	28	11	6	25	22
1	16	15	11	8	23	22
0	15	18	12	9	21	13
1	14	21	9	9	19	24
0	15	25	7	11	35	17
1	12	23	13	12	19	24
0	14	23	10	8	20	21
0	16	19	9	7	21	23
1	14	18	9	8	21	24
1	7	18	13	9	24	24
1	10	26	16	4	23	24
1	14	18	12	8	19	23
0	16	18	6	8	17	26
1	16	28	14	8	24	24
1	16	17	14	6	15	21
0	14	29	10	8	25	23
1	20	12	4	4	27	28
1	14	25	12	7	29	23
0	14	28	12	14	27	22
0	11	20	14	10	18	24
0	15	17	9	9	25	21
0	16	17	9	6	22	23
1	14	20	10	8	26	23
0	16	31	14	11	23	20
1	14	21	10	8	16	23
1	12	19	9	8	27	21
0	16	23	14	10	25	27
1	9	15	8	8	14	12
0	14	24	9	10	19	15
0	16	28	8	7	20	22
0	16	16	9	8	16	21
1	15	19	9	7	18	21
0	16	21	9	9	22	20
1	12	21	15	5	21	24
1	16	20	8	7	22	24
0	16	16	10	7	22	29
0	14	25	8	7	32	25
0	16	30	14	9	23	14
1	17	29	11	5	31	30
0	18	22	10	8	18	19
1	18	19	12	8	23	29
0	12	33	14	8	26	25
1	16	17	9	9	24	25
1	10	9	13	6	19	25
0	14	14	15	8	14	16
0	18	15	8	6	20	25
1	18	12	7	4	22	28
1	16	21	10	6	24	24
0	16	20	10	4	25	25
0	16	29	13	12	21	21
1	13	33	11	6	28	22
1	16	21	8	11	24	20
1	16	15	12	8	20	25
1	20	19	9	10	21	27
0	16	23	10	10	23	21
1	15	20	11	4	13	13
0	15	20	11	8	24	26
0	16	18	10	9	21	26
1	14	31	16	9	21	25
0	15	18	16	7	17	22
0	12	13	8	7	14	19
0	17	9	6	11	29	23
0	16	20	11	8	25	25
0	15	18	12	8	16	15
0	13	23	14	7	25	21
0	16	17	9	5	25	23
0	16	17	11	7	21	25
0	16	16	8	9	23	24
1	16	31	8	8	22	24
1	14	15	7	6	19	21
0	16	28	16	8	24	24
1	16	26	13	10	26	22
0	20	20	8	10	25	24
1	15	19	11	8	20	28
0	16	25	14	11	22	21
1	13	18	10	8	14	17
0	17	20	10	8	20	28
1	16	33	14	6	32	24
0	12	24	14	20	21	10
0	16	22	10	6	22	20
0	16	32	12	12	28	22
0	17	31	9	9	25	19
1	13	13	16	5	17	22
0	12	18	8	10	21	22
1	18	17	9	5	23	26
0	14	29	16	6	27	24
0	14	22	13	10	22	22
0	13	18	13	6	19	20
0	16	22	8	10	20	20
0	13	25	14	5	17	15
0	16	20	11	13	24	20
0	13	20	9	7	21	20
0	16	17	8	9	21	24
0	15	21	13	11	23	22
0	16	26	13	8	24	29
1	15	10	10	5	19	23
0	17	15	8	4	22	24
0	15	20	7	9	26	22
0	12	14	11	7	17	16
1	16	16	11	5	17	23
1	10	23	14	5	19	27
0	16	11	6	4	15	16
1	14	19	10	7	17	21
0	15	30	9	9	27	26
1	13	21	12	8	19	22
1	15	20	11	8	21	23
0	11	22	14	11	25	19
0	12	30	12	10	19	18
1	8	25	14	9	22	24
0	16	28	8	12	18	24
1	15	23	14	10	20	29
0	17	23	8	10	15	22
1	16	21	11	7	20	24
0	10	30	12	10	29	22
0	18	22	9	6	19	12
1	13	32	16	6	29	26
0	15	22	11	11	24	18
1	16	15	11	8	23	22
0	16	21	12	9	22	24
0	14	27	15	9	23	21
0	10	22	13	13	22	15
0	17	9	6	11	29	23
0	13	29	11	4	26	22
0	15	20	7	9	26	22
0	16	16	8	5	21	24
0	12	16	8	4	18	23
0	13	16	9	9	10	13




Summary of computational transaction
Raw Input: view raw input (R code)
Raw Output: view raw output of R engine
Computing time: 8 seconds
R Server: 'George Udny Yule' @ 72.249.76.132

\begin{tabular}{lllllllll}
\hline
Summary of computational transaction \tabularnewline
Raw Input & view raw input (R code)  \tabularnewline
Raw Output & view raw output of R engine  \tabularnewline
Computing time & 8 seconds \tabularnewline
R Server & 'George Udny Yule' @ 72.249.76.132 \tabularnewline
\hline
\end{tabular}
%Source: https://freestatistics.org/blog/index.php?pk=109119&T=0

[TABLE]
[ROW][C]Summary of computational transaction[/C][/ROW]
[ROW][C]Raw Input[/C][C]view raw input (R code) [/C][/ROW]
[ROW][C]Raw Output[/C][C]view raw output of R engine [/C][/ROW]
[ROW][C]Computing time[/C][C]8 seconds[/C][/ROW]
[ROW][C]R Server[/C][C]'George Udny Yule' @ 72.249.76.132[/C][/ROW]
[/TABLE]
Source: https://freestatistics.org/blog/index.php?pk=109119&T=0

Globally Unique Identifier (entire table): ba.freestatistics.org/blog/index.php?pk=109119&T=0








10-Fold Cross Validation
        | Prediction (training)   | Prediction (testing)
Actual  | C1  | C2  | CV          | C1 | C2 | CV
C1      | 629 | 211 | 0.7488      | 74 | 26 | 0.74
C2      | 157 | 353 | 0.6922      | 15 | 35 | 0.7
Overall | -   | -   | 0.7274      | -  | -  | 0.7267

\begin{tabular}{lllllllll}
\hline
10-Fold Cross Validation \tabularnewline
 & Prediction (training) & Prediction (testing) \tabularnewline
Actual & C1 & C2 & CV & C1 & C2 & CV \tabularnewline
C1 & 629 & 211 & 0.7488 & 74 & 26 & 0.74 \tabularnewline
C2 & 157 & 353 & 0.6922 & 15 & 35 & 0.7 \tabularnewline
Overall & - & - & 0.7274 & - & - & 0.7267 \tabularnewline
\hline
\end{tabular}
%Source: https://freestatistics.org/blog/index.php?pk=109119&T=1

[TABLE]
[ROW][C]10-Fold Cross Validation[/C][/ROW]
[ROW][C][/C][C]Prediction (training)[/C][C]Prediction (testing)[/C][/ROW]
[ROW][C]Actual[/C][C]C1[/C][C]C2[/C][C]CV[/C][C]C1[/C][C]C2[/C][C]CV[/C][/ROW]
[ROW][C]C1[/C][C]629[/C][C]211[/C][C]0.7488[/C][C]74[/C][C]26[/C][C]0.74[/C][/ROW]
[ROW][C]C2[/C][C]157[/C][C]353[/C][C]0.6922[/C][C]15[/C][C]35[/C][C]0.7[/C][/ROW]
[ROW][C]Overall[/C][C]-[/C][C]-[/C][C]0.7274[/C][C]-[/C][C]-[/C][C]0.7267[/C][/ROW]
[/TABLE]
Source: https://freestatistics.org/blog/index.php?pk=109119&T=1

Globally Unique Identifier (entire table): ba.freestatistics.org/blog/index.php?pk=109119&T=1








Confusion Matrix (predicted in columns / actuals in rows)
   | C1 | C2
C1 | 71 | 23
C2 | 17 | 39

\begin{tabular}{lllllllll}
\hline
Confusion Matrix (predicted in columns / actuals in rows) \tabularnewline
 & C1 & C2 \tabularnewline
C1 & 71 & 23 \tabularnewline
C2 & 17 & 39 \tabularnewline
\hline
\end{tabular}
%Source: https://freestatistics.org/blog/index.php?pk=109119&T=2

[TABLE]
[ROW][C]Confusion Matrix (predicted in columns / actuals in rows)[/C][/ROW]
[ROW][C][/C][C]C1[/C][C]C2[/C][/ROW]
[ROW][C]C1[/C][C]71[/C][C]23[/C][/ROW]
[ROW][C]C2[/C][C]17[/C][C]39[/C][/ROW]
[/TABLE]
Source: https://freestatistics.org/blog/index.php?pk=109119&T=2

Globally Unique Identifier (entire table): ba.freestatistics.org/blog/index.php?pk=109119&T=2
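The overall in-sample classification rate implied by this confusion matrix is (71 + 39) / 150, about 0.733. A minimal R sketch (not part of the archived output):

cm <- matrix(c(71, 23, 17, 39), nrow = 2, byrow = TRUE,
             dimnames = list(Actual = c('C1','C2'), Predicted = c('C1','C2')))
sum(diag(cm)) / sum(cm)    # (71 + 39) / 150 = 0.7333
diag(cm) / rowSums(cm)     # per-class rates: 71/94 = 0.755 and 39/56 = 0.696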




Parameters (Session):
par1 = 2 ; par2 = none ; par3 = 3 ; par4 = no ;
Parameters (R input):
par1 = 4 ; par2 = quantiles ; par3 = 2 ; par4 = yes ;
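With these R-input parameters the endogenous column (par1 = 4, i.e. the fourth data column) is cut into par3 = 2 quantile groups with Hmisc::cut2 before the classification tree is fitted, and par4 = yes requests the cross-validation table shown above. A minimal sketch of the binning step; the short vector holds the first few values of the fourth data column, and the name 'doubts' is purely illustrative:

library(Hmisc)
doubts <- c(9, 9, 9, 14, 8, 8, 11)    # first few values of data column 4, for illustration only
cut2(doubts, g = 2)                   # two quantile groups, split at the sample median
table(cut2(doubts, g = 2))            # group sizes; the module labels these groups C1 and C2 in its tables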
R code (references can be found in the software module):
library(party)
library(Hmisc)
par1 <- as.numeric(par1)
par3 <- as.numeric(par3)
x <- data.frame(t(y))   # y holds the data series supplied by the engine; transpose into a data frame with one column per variable
is.data.frame(x)
x <- x[!is.na(x[,par1]),]
k <- length(x[1,])
n <- length(x[,1])
colnames(x)[par1]
x[,par1]
# Optional discretization of the endogenous variable: k-means clustering into par3 classes
if (par2 == 'kmeans') {
cl <- kmeans(x[,par1], par3)
print(cl)
clm <- matrix(cbind(cl$centers,1:par3),ncol=2)
clm <- clm[sort.list(clm[,1]),]
for (i in 1:par3) {
cl$cluster[cl$cluster==clm[i,2]] <- paste('C',i,sep='')
}
cl$cluster <- as.factor(cl$cluster)
print(cl$cluster)
x[,par1] <- cl$cluster
}
# Optional discretization: cut the endogenous variable into par3 quantile groups (Hmisc::cut2)
if (par2 == 'quantiles') {
x[,par1] <- cut2(x[,par1],g=par3)
}
# Optional discretization: hierarchical (centroid) clustering into par3 classes
if (par2 == 'hclust') {
hc <- hclust(dist(x[,par1])^2, 'cen')
print(hc)
memb <- cutree(hc, k = par3)
dum <- c(mean(x[memb==1,par1]))
for (i in 2:par3) {
dum <- c(dum, mean(x[memb==i,par1]))
}
hcm <- matrix(cbind(dum,1:par3),ncol=2)
hcm <- hcm[sort.list(hcm[,1]),]
for (i in 1:par3) {
memb[memb==hcm[i,2]] <- paste('C',i,sep='')
}
memb <- as.factor(memb)
print(memb)
x[,par1] <- memb
}
# Optional discretization: par3 equal-width intervals
if (par2=='equal') {
ed <- cut(as.numeric(x[,par1]),par3,labels=paste('C',1:par3,sep=''))
x[,par1] <- as.factor(ed)
}
table(x[,par1])
colnames(x)
colnames(x)[par1]
x[,par1]
# No discretization requested: fit a conditional inference regression tree
if (par2 == 'none') {
m <- ctree(as.formula(paste(colnames(x)[par1],' ~ .',sep='')),data = x)
}
load(file='createtable')   # workspace with the table.start/table.element/table.row.*/table.save helpers used below
if (par2 != 'none') {
# Discretized endogenous variable: fit a conditional inference classification tree
m <- ctree(as.formula(paste('as.factor(',colnames(x)[par1],') ~ .',sep='')),data = x)
if (par4=='yes') {
a<-table.start()
a<-table.row.start(a)
a<-table.element(a,'10-Fold Cross Validation',3+2*par3,TRUE)
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'',1,TRUE)
a<-table.element(a,'Prediction (training)',par3+1,TRUE)
a<-table.element(a,'Prediction (testing)',par3+1,TRUE)
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'Actual',1,TRUE)
for (jjj in 1:par3) a<-table.element(a,paste('C',jjj,sep=''),1,TRUE)
a<-table.element(a,'CV',1,TRUE)
for (jjj in 1:par3) a<-table.element(a,paste('C',jjj,sep=''),1,TRUE)
a<-table.element(a,'CV',1,TRUE)
a<-table.row.end(a)
# Cross validation: ten repetitions of a random (roughly 90/10) train/test split,
# accumulating predictions; note this is a repeated random split rather than a strict 10-fold partition
for (i in 1:10) {
ind <- sample(2, nrow(x), replace=T, prob=c(0.9,0.1))
m.ct <- ctree(as.formula(paste('as.factor(',colnames(x)[par1],') ~ .',sep='')),data =x[ind==1,])
if (i==1) {
m.ct.i.pred <- predict(m.ct, newdata=x[ind==1,])
m.ct.i.actu <- x[ind==1,par1]
m.ct.x.pred <- predict(m.ct, newdata=x[ind==2,])
m.ct.x.actu <- x[ind==2,par1]
} else {
m.ct.i.pred <- c(m.ct.i.pred,predict(m.ct, newdata=x[ind==1,]))
m.ct.i.actu <- c(m.ct.i.actu,x[ind==1,par1])
m.ct.x.pred <- c(m.ct.x.pred,predict(m.ct, newdata=x[ind==2,]))
m.ct.x.actu <- c(m.ct.x.actu,x[ind==2,par1])
}
}
print(m.ct.i.tab <- table(m.ct.i.actu,m.ct.i.pred))
numer <- 0
for (i in 1:par3) {
print(m.ct.i.tab[i,i] / sum(m.ct.i.tab[i,]))
numer <- numer + m.ct.i.tab[i,i]
}
print(m.ct.i.cp <- numer / sum(m.ct.i.tab))
print(m.ct.x.tab <- table(m.ct.x.actu,m.ct.x.pred))
numer <- 0
for (i in 1:par3) {
print(m.ct.x.tab[i,i] / sum(m.ct.x.tab[i,]))
numer <- numer + m.ct.x.tab[i,i]
}
print(m.ct.x.cp <- numer / sum(m.ct.x.tab))
for (i in 1:par3) {
a<-table.row.start(a)
a<-table.element(a,paste('C',i,sep=''),1,TRUE)
for (jjj in 1:par3) a<-table.element(a,m.ct.i.tab[i,jjj])
a<-table.element(a,round(m.ct.i.tab[i,i]/sum(m.ct.i.tab[i,]),4))
for (jjj in 1:par3) a<-table.element(a,m.ct.x.tab[i,jjj])
a<-table.element(a,round(m.ct.x.tab[i,i]/sum(m.ct.x.tab[i,]),4))
a<-table.row.end(a)
}
a<-table.row.start(a)
a<-table.element(a,'Overall',1,TRUE)
for (jjj in 1:par3) a<-table.element(a,'-')
a<-table.element(a,round(m.ct.i.cp,4))
for (jjj in 1:par3) a<-table.element(a,'-')
a<-table.element(a,round(m.ct.x.cp,4))
a<-table.row.end(a)
a<-table.end(a)
table.save(a,file='mytable3.tab')
}
}
m
# Plot the fitted tree
bitmap(file='test1.png')
plot(m)
dev.off()
bitmap(file='test1a.png')
plot(x[,par1] ~ as.factor(where(m)),main='Response by Terminal Node',xlab='Terminal Node',ylab='Response')
dev.off()
if (par2 == 'none') {
forec <- predict(m)
result <- as.data.frame(cbind(x[,par1],forec,x[,par1]-forec))
colnames(result) <- c('Actuals','Forecasts','Residuals')
print(result)
}
if (par2 != 'none') {
print(cbind(as.factor(x[,par1]),predict(m)))
myt <- table(as.factor(x[,par1]),predict(m))
print(myt)
}
# Diagnostic plots: densities and actuals-versus-predictions for the regression case,
# mosaic plot of the confusion matrix for the classification case
bitmap(file='test2.png')
if(par2=='none') {
op <- par(mfrow=c(2,2))
plot(density(result$Actuals),main='Kernel Density Plot of Actuals')
plot(density(result$Residuals),main='Kernel Density Plot of Residuals')
plot(result$Forecasts,result$Actuals,main='Actuals versus Predictions',xlab='Predictions',ylab='Actuals')
plot(density(result$Forecasts),main='Kernel Density Plot of Predictions')
par(op)
}
if(par2!='none') {
plot(myt,main='Confusion Matrix',xlab='Actual',ylab='Predicted')
}
dev.off()
# Regression case: goodness-of-fit table plus actuals, predictions, and residuals table
if (par2 == 'none') {
detcoef <- cor(result$Forecasts,result$Actuals)
a<-table.start()
a<-table.row.start(a)
a<-table.element(a,'Goodness of Fit',2,TRUE)
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'Correlation',1,TRUE)
a<-table.element(a,round(detcoef,4))
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'R-squared',1,TRUE)
a<-table.element(a,round(detcoef*detcoef,4))
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'RMSE',1,TRUE)
a<-table.element(a,round(sqrt(mean((result$Residuals)^2)),4))
a<-table.row.end(a)
a<-table.end(a)
table.save(a,file='mytable1.tab')
a<-table.start()
a<-table.row.start(a)
a<-table.element(a,'Actuals, Predictions, and Residuals',4,TRUE)
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'#',header=TRUE)
a<-table.element(a,'Actuals',header=TRUE)
a<-table.element(a,'Forecasts',header=TRUE)
a<-table.element(a,'Residuals',header=TRUE)
a<-table.row.end(a)
for (i in 1:length(result$Actuals)) {
a<-table.row.start(a)
a<-table.element(a,i,header=TRUE)
a<-table.element(a,result$Actuals[i])
a<-table.element(a,result$Forecasts[i])
a<-table.element(a,result$Residuals[i])
a<-table.row.end(a)
}
a<-table.end(a)
table.save(a,file='mytable.tab')
}
# Classification case: confusion matrix table (predicted in columns, actuals in rows)
if (par2 != 'none') {
a<-table.start()
a<-table.row.start(a)
a<-table.element(a,'Confusion Matrix (predicted in columns / actuals in rows)',par3+1,TRUE)
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'',1,TRUE)
for (i in 1:par3) {
a<-table.element(a,paste('C',i,sep=''),1,TRUE)
}
a<-table.row.end(a)
for (i in 1:par3) {
a<-table.row.start(a)
a<-table.element(a,paste('C',i,sep=''),1,TRUE)
for (j in 1:par3) {
a<-table.element(a,myt[i,j])
}
a<-table.row.end(a)
}
a<-table.end(a)
table.save(a,file='mytable2.tab')
}
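
To rerun the core of this module outside the FreeStatistics.org engine, the tree fit and the in-sample confusion matrix can be reproduced with party::ctree along the following lines. This is a minimal sketch under the parameters above; the file name 'dataseriesX.txt' and the column name 'endo' are placeholders, not part of the archived script:

library(party)
library(Hmisc)
# 'dataseriesX.txt' is a placeholder for a tab-separated file holding the seven
# columns of Dataseries X listed above; column 4 is the endogenous variable (par1 = 4)
mydata <- read.table('dataseriesX.txt', header = FALSE)
colnames(mydata)[4] <- 'endo'               # placeholder name for the endogenous column
mydata$endo <- cut2(mydata$endo, g = 2)     # par2 = 'quantiles', par3 = 2: split at the sample median
fit <- ctree(endo ~ ., data = mydata)       # conditional inference classification tree
plot(fit)                                   # the tree plotted by the module
table(Actual = mydata$endo, Predicted = predict(fit))   # in-sample confusion matrix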