Repository of Reproducible Computations

Free Statistics of Irreproducible Research!

Author's title

Author

*The author of this computation has been verified*

R Software Module

rwasp_edauni.wasp

Title produced by software

Univariate Explorative Data Analysis

Date of computation

Mon, 27 Oct 2008 15:00:15 -0600

Cite this page as follows

Statistical Computations at FreeStatistics.org, Office for Research Development and Education, URL https://freestatistics.org/blog/index.php?v=date/2008/Oct/27/t12251412562s1mlcuietexf4x.htm/, Retrieved Sun, 19 May 2024 15:22:20 +0000

Statistical Computations at FreeStatistics.org, Office for Research Development and Education, URL https://freestatistics.org/blog/index.php?pk=19618, Retrieved Sun, 19 May 2024 15:22:20 +0000

Original text written by user:

IsPrivate?

No (this computation is public)

User-defined keywords

Estimated Impact

137

Family? (F = Feedback message, R = changed R code, M = changed R Module, P = changed Parameters, D = changed Data)

F     [Univariate Explorative Data Analysis] [Investigation Dis...] [2007-10-21 17:06:37] [b9964c45117f7aac638ab9056d451faa]
F    D  [Univariate Explorative Data Analysis] [Workshop 3 q2] [2008-10-27 11:10:42] [d8fc2cb19a73ee9b4ebccccec0f2ad7f]
F           [Univariate Explorative Data Analysis] [Q2] [2008-10-27 21:00:15] [5e2b1e7aa808f9f0d23fd35605d4968f] [Current]
-   P         [Univariate Explorative Data Analysis] [Q2 lag 12] [2008-11-02 12:45:55] [299afd6311e4c20059ea2f05c8dd029d] 
-   P         [Univariate Explorative Data Analysis] [Q2 lag 36] [2008-11-02 12:55:59] [299afd6311e4c20059ea2f05c8dd029d] 

Feedback Forum

2008-11-01 17:01:20 [Olivier Uyttendaele] [reply] 
You used the correct model and the right data.
However, this question calls for a different solution.

The intention is that you work with the 4 assumptions here:
Assumption 1: Are the data autocorrelated?
Assumption 2: Is the random component generated by a fixed distribution?
Assumption 3: Is the deterministic component constant?
Assumption 4: Does the random component have a fixed variation?

You do need to reproduce the model and change the number of lags to 12 or 36; this gives you additional charts that make it possible to address assumption 1.
 
Assumption 1: Are the data autocorrelated?
For this you do not use the first chart you see, but the lag plot, or you examine the autocorrelation function.

A normal lag plot shows a plane in which the values are placed at random and therefore fill the whole plane. That is also the case in this lag plot.

For randomness, the correlation values must lie close to 0. In this autocorrelation function you can see that there is correlation, but of the seasonal type. Every year there is an extreme value in month 12 (change the lags to 36 to see this).
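
The check described for assumption 1 can be sketched in R as follows. This is a minimal illustration, not the module's own code: the series is simulated (the page's data are not shown), the month-12 bump is an assumption made to mimic the seasonal pattern mentioned above, and the names `lagged`, `acf_36`, and `bound` are hypothetical.

```r
# Hypothetical monthly series with an extra bump every 12th observation.
set.seed(1)
n <- 61
month <- (seq_len(n) - 1) %% 12 + 1
x <- ts(85 + rnorm(n, sd = 5) + ifelse(month == 12, 15, 0))

pdf(NULL)  # null graphics device; remove this line to see the plots

# Lag plot: each value against the value one step earlier.
lagged <- data.frame(previous = x[-n], current = x[-1])
plot(lagged, main = "Lag plot")
abline(lm(current ~ previous, data = lagged))

# Autocorrelation function up to lag 36, as suggested in the feedback.
acf_36 <- acf(x, lag.max = 36, main = "Autocorrelation Function")
invisible(dev.off())

# Approximate 95% band for the autocorrelations of a purely random series.
bound <- qnorm(0.975) / sqrt(n)

# Autocorrelations at lags 12, 24 and 36 (element 1 is lag 0).
seasonal_lags <- acf_36$acf[c(13, 25, 37)]
print(round(seasonal_lags, 2))
print(round(bound, 2))
```

Autocorrelations that fall outside `bound` at lags 12, 24, and 36 while the remaining lags stay inside it would point to the seasonal type of correlation the comment describes.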
 
Assumption 2: Is the random component generated by a fixed distribution?
To check whether there is a single distribution, look at the histogram and the density plot. You did mention these in the document. You can also look at the Q-Q plot. The points should then lie around the straight line. That is also the case here.

The distribution looks fairly normal here.
The nuance you add in the document is well observed. To me these values are deviating rather than truly extreme, so we can disregard them.
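
A minimal R sketch of the three charts named for assumption 2, assuming a simulated, roughly normal series in place of the real data. The correlation `qq_cor` is an added illustration of "points lying around the line", not something the module computes.

```r
# Hypothetical series; the real data are not reproduced on this page.
set.seed(2)
x <- rnorm(61, mean = 87, sd = 9)

pdf(NULL)  # null graphics device; remove this line to see the plots
hist(x, main = "Histogram")
plot(density(x), main = "Density Plot")
qq <- qqnorm(x)   # returns the theoretical (x) and sample (y) quantiles
qqline(x)         # reference line through the first and third quartiles
invisible(dev.off())

# Correlation between theoretical and sample quantiles:
# values near 1 mean the points lie close to a straight line.
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)
```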
 
Assumption 3: Is the deterministic component constant?

Here you have to use the Run Sequence Plot, so look at the first chart. To make a statement about this, we look at the level and whether it stays constant or not. Fluctuation is irrelevant here. In this case you can draw a straight line through the average level; it would show a downward trend.
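
One way to make the "line through the average level" concrete is to fit a straight line to the series and inspect its slope. The sketch below assumes a simulated series with a falling level; `t_index`, `trend`, and `slope` are hypothetical names, and the linear fit is an illustration rather than part of the module.

```r
# Hypothetical series whose level falls over time.
set.seed(3)
n <- 61
x <- ts(95 - 0.25 * seq_len(n) + rnorm(n, sd = 4))

# Least-squares line through the series.
t_index <- as.numeric(time(x))
trend <- lm(as.numeric(x) ~ t_index)

pdf(NULL)  # null graphics device; remove this line to see the plot
plot(x, type = "l", main = "Run Sequence Plot",
     xlab = "time or index", ylab = "value")
abline(h = mean(x), lty = 2)   # constant level, for comparison
abline(trend, col = "red")     # fitted trend line
invisible(dev.off())

# A clearly negative slope suggests the level is not constant.
slope <- unname(coef(trend)["t_index"])
print(slope)
```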
 
Assumption 4: Does the random component have a fixed variation?
What matters for this assumption is the random component. To answer it, you need to reproduce the Run Sequence Plot with a change to the R code.

The aim is to redraw the Run Sequence Plot without the forecast; in effect, we remove the forecast from the run sequence plot. The model is Y_t = C + E_t, so the forecast is F_t = Y_t - E_t = C (the constant is the forecast), and subtracting it leaves the random component E_t = Y_t - C.

Either the median or the mean can serve as the forecast to remove, depending on the number of outliers (many = median, few = mean).

In this case you should have put "x <- x - 86.69" in the R code and looked at the Run Sequence Plot again. Alternatively, subtract this mean from the series in Excel and recompute the model.

In both charts (with and without the mean) you can see that a distinction can be made between the first and second parts of the chart. The second part shows much more spread than the first.
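
The mean-removal step and the comparison of the two halves can be sketched in R as below. The series is simulated with a larger spread in its second half purely as an assumption for illustration; `e`, `half`, `sd_first`, and `sd_second` are hypothetical names, and comparing standard deviations is one simple way to quantify what the chart shows.

```r
# Hypothetical series: constant level, more spread in the second half.
set.seed(4)
n <- 61
half <- floor(n / 2)
x <- ts(87 + c(rnorm(half, sd = 3), rnorm(n - half, sd = 9)))

# Remove the constant forecast; use median(x) instead if there are many outliers.
e <- x - mean(x)

pdf(NULL)  # null graphics device; remove this line to see the plot
plot(e, type = "l", main = "Run Sequence Plot (mean removed)",
     xlab = "time or index", ylab = "value")
abline(h = 0, lty = 2)
invisible(dev.off())

# Spread of the random component in each half of the series.
sd_first <- sd(e[1:half])
sd_second <- sd(e[(half + 1):n])
print(c(first = sd_first, second = sd_second))
```

Subtracting a constant shifts the plot vertically but does not change the spread, which is why the difference between the two halves is visible with and without the mean.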
 
This assumption must therefore also be rejected.
Because not all assumptions are met (autocorrelation, spread, constant mean), this is not a valid model, as you said in your document.
 
2008-11-02 12:54:20 [a9d641c8b88cd97bdfe55e3671cf3c5a] [reply] 
I agree with the comment above.

Summary of computational transaction
Raw Input	view raw input (R code)
Raw Output	view raw output of R engine
Computing time	3 seconds
R Server	'George Udny Yule' @ 72.249.76.132

Descriptive Statistics
# observations	61
minimum	66.5
Q1	80.6
median	87.3
mean	86.8934426229508
Q3	94.1
maximum	109.7

Figures 1 to 6 (each available as PNG, Postscript, and PDF)

Parameters (Session):

par1 = 0 ; par2 = 0 ;

Parameters (R input):

par1 = 0 ; par2 = 0 ; par3 to par20 are empty

R code (references can be found in the software module):

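# x, par1 and par2 are supplied by the server.
# par1: bandwidth of the density plot (0 = default bandwidth)
# par2: maximum lag for the ACF (0 = skip the lag plot and the ACF)
par1 <- as.numeric(par1)
par2 <- as.numeric(par2)
x <- as.ts(x)
library(lattice)
# Figure: run sequence plot
bitmap(file='pic1.png')
plot(x,type='l',main='Run Sequence Plot',xlab='time or index',ylab='value')
grid()
dev.off()
# Figure: histogram
bitmap(file='pic2.png')
hist(x)
grid()
dev.off()
# Figure: density plot; print() makes sure the lattice plot is drawn
bitmap(file='pic3.png')
if (par1 > 0)
{
print(densityplot(~x,col='black',main=paste('Density Plot   bw = ',par1),bw=par1))
} else {
print(densityplot(~x,col='black',main='Density Plot'))
}
dev.off()
# Figure: normal Q-Q plot
bitmap(file='pic4.png')
qqnorm(x)
grid()
dev.off()
if (par2 > 0)
{
# Figure: lag plot; cbind() aligns the two series in time,
# so each row pairs a value with its neighbour one step away
bitmap(file='lagplot.png')
dum <- cbind(lag(x,k=1),x)
dum
# drop the first and last rows, which contain a missing value
dum1 <- dum[2:length(x),]
dum1
z <- as.data.frame(dum1)
z
plot(z,main=paste('Lag plot, lowess, and regression line'))
lines(lowess(z))
# regress the vertical-axis column on the horizontal-axis column
abline(lm(z[[2]] ~ z[[1]]))
dev.off()
# Figure: autocorrelation function up to lag par2
bitmap(file='pic5.png')
acf(x,lag.max=par2,main='Autocorrelation Function')
grid()
dev.off()
}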
summary(x)
load(file='createtable')
a<-table.start()
a<-table.row.start(a)
a<-table.element(a,'Descriptive Statistics',2,TRUE)
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'# observations',header=TRUE)
a<-table.element(a,length(x))
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'minimum',header=TRUE)
a<-table.element(a,min(x))
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'Q1',header=TRUE)
a<-table.element(a,quantile(x,0.25))
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'median',header=TRUE)
a<-table.element(a,median(x))
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'mean',header=TRUE)
a<-table.element(a,mean(x))
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'Q3',header=TRUE)
a<-table.element(a,quantile(x,0.75))
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'maximum',header=TRUE)
a<-table.element(a,max(x))
a<-table.row.end(a)
a<-table.end(a)
table.save(a,file='mytable.tab')
