Material based on:
The R–packages gRbase, gRain and gRim.
#gRbase dependencies
source("http://bioconductor.org/biocLite.R")
biocLite(c("graph", "RBGL", "Rgraphviz"))
install.packages('gRbase')
install.packages('gRain')
install.packages('gRim')
library(graph); library(RBGL); library(Rgraphviz)
library(gRbase); library(gRain); library(gRim)
In previous part we learned basic concepts behind graphical models (including DAGs), their construction and manipulation. In this part, we will learn how to use it in practice.
Consider the following narrative: Having flu (F) may cause elevated temperature (T). Elevated tempearture may cause a headache (H).
Your task: Illustrate this narrative as a DAG:
The output should look like:
We specify p(F), p(T|F) and p(H|T) as tables (1 =yes, 2 =no):
(p.F <- parray("F", levels=2, values=c(.01,.99)))
## F
## F1 F2
## 0.01 0.99
## attr(,"class")
## [1] "parray" "array"
(p.TgF <- parray(c("T","F"), levels=c(2,2), values=c(.95,.05, .001,.999)))
## F
## T F1 F2
## T1 0.95 0.001
## T2 0.05 0.999
## attr(,"class")
## [1] "parray" "array"
(p.HgT <- parray(c("H","T"), levels=c(2,2), values=c(.80,.20, .010,.990)))
## T
## H T1 T2
## H1 0.8 0.01
## H2 0.2 0.99
## attr(,"class")
## [1] "parray" "array"
p.FT <- tableMult(p.F, p.TgF)
p.FTH <- tableMult(p.FT, p.HgT)
as.data.frame.table(p.FTH)
## H T F Freq
## 1 H1 T1 F1 0.0076000
## 2 H2 T1 F1 0.0019000
## 3 H1 T2 F1 0.0000050
## 4 H2 T2 F1 0.0004950
## 5 H1 T1 F2 0.0007920
## 6 H2 T1 F2 0.0001980
## 7 H1 T2 F2 0.0098901
## 8 H2 T2 F2 0.9791199
p.FH <- tableMargin(p.FTH, margin=c('F','H'))
as.data.frame.table(p.FH)
## F H Freq
## 1 F1 H1 0.0076050
## 2 F2 H1 0.0106821
## 3 F1 H2 0.0023950
## 4 F2 H2 0.9793179
p.H <- tableMargin(p.FH, margin='H')
(p.FgH <- tableDiv(p.FH, p.H))
## F
## H F1 F2
## H1 0.415866923 0.5841331
## H2 0.002439613 0.9975604
So p(F = 1(yes)|H = 1(yes)) = 0.42 while p(F = 1(yes)) = 0.01.
However, this scheme is computationally prohibitive for large models:
Decomposable graphs
par(mfrow=c(1,3))
plot(ug(~1:2+2:3:4+3:4:5:6+6:7), "circo") # decomposable
plot(ug(~1:2+2:3+3:4:5+4:1),"circo") # not decomposable
plot(ug(~1:2:5+2:3:5+3:4:5+4:1:5),"circo") # not decomposable
Perfect ordering and maximum cardinality search
An undirected graph is triangulated (or chordal) if it has no cycles of length ≥ 4 without a chord. This is equivalent to that the vertices can be given a perfe ct ordering. A perfect ordering (if it exists) can be obtained with Maximum Cardinality Search. If character(0) is returned the graph is not triangulated. Otherwise a perfect ordering of the nodes is returned.
uG11<-ug(~a:b+b:c:d)
uG11m<-ug(~a:b+b:c:d,result="matrix") #matrix represeantation
uG12<-ug(c("a","b"),c("b","c","d") )
uG13<-ug(list(c("a","b"),c("b","c","d") ) )
mcs(uG11)
## [1] "a" "b" "c" "d"
mcs(uG11m)
## [1] "a" "b" "c" "d"
In some applications it is convenient to retain control over the orderi ng (if it exists). For example:
mcs(uG11,root=c("a","c"))
## [1] "a" "b" "c" "d"
The desired ordering (specified by root) is followed as far as possible (here only to the first variable “a”). Notice the output when applying mcs() to a directed graph:
daG11<-dag(~a:b+b:c:d)
daG12<-dag(c("a","b"),c("b","c","d") )
daG13<-dag(list(c("a","b"),c("b","c","d") ) )
mcs(daG11)
## character(0)
Task: play with uG12, uG13, daG12, daG13
Back to triangulation: Any undirected graph can be triangulated by adding edges to the graph, so called fill-ins:
uG<-ug(~a:b:c+c:d+d:e+a:e+f:g )
mcs(uG)
## character(0)
(tuG<-triangulate(uG))
## A graphNEL graph with undirected edges
## Number of Nodes = 7
## Number of Edges = 8
mcs(tuG)
## [1] "a" "b" "c" "e" "d" "f" "g"
par(mfrow=c(1,2));plot(uG);plot(tuG)
First, change the DAG into UG
ugFTH = moralize(FTH)
par(mfrow=c(1,2), oma=c(2,1,2,1))
plot(FTH)
plot(ugFTH)
(qFT <- tableMult(p.F, p.TgF))
## F
## T F1 F2
## T1 0.0095 0.00099
## T2 0.0005 0.98901
(qTH <- p.HgT)
## T
## H T1 T2
## H1 0.8 0.01
## H2 0.2 0.99
## attr(,"class")
## [1] "parray" "array"
(qT <- parray("T",levels=2, values=1))
## T
## T1 T2
## 1 1
## attr(,"class")
## [1] "parray" "array"
(qTs <- tableMargin(qFT, "T"))
## T
## T1 T2
## 0.01049 0.98951
(qTHs <- tableMult(qTH, tableDiv(qTs, qT)))
## H
## T H1 H2
## T1 0.0083920 0.0020980
## T2 0.0098951 0.9796149
(qTss <- tableMargin(qTHs, "T"))
## T
## T1 T2
## 0.01049 0.98951
(qFTs <- tableMult(qFT, tableDiv(qTss, qTs)))
## F
## T F1 F2
## T1 0.0095 0.00099
## T2 0.0005 0.98901
This leaves us with marginal distributions on all cliques and separators
qFTs
## F
## T F1 F2
## T1 0.0095 0.00099
## T2 0.0005 0.98901
qTHs
## H
## T H1 H2
## T1 0.0083920 0.0020980
## T2 0.0098951 0.9796149
qTs
## T
## T1 T2
## 0.01049 0.98951
From this we can get:
qTs # probability of temperature
## T
## T1 T2
## 0.01049 0.98951
tableMargin(qFT, "F") # probability of fever
## F
## F1 F2
## 0.01 0.99
tableMargin(qTH, "H") # probability of headache
## H
## H1 H2
## 0.81 1.19
– and we never need to calculate the joint distribution!
qTH
## T
## H T1 T2
## H1 0.8 0.01
## H2 0.2 0.99
## attr(,"class")
## [1] "parray" "array"
## Now important part: set finding H=H1
qTH[c(2,4)] <- 0
qTH
## T
## H T1 T2
## H1 0.8 0.01
## H2 0.0 0.00
## attr(,"class")
## [1] "parray" "array"
## Repeat everything
(qTs <- tableMargin(qFT, "T"))
## T
## T1 T2
## 0.01049 0.98951
(qTHs <- tableMult(qTH, tableDiv(qTs, qT)))
## H
## T H1 H2
## T1 0.0083920 0
## T2 0.0098951 0
(qTss <- tableMargin(qTHs, "T"))
## T
## T1 T2
## 0.0083920 0.0098951
(qFTs <- tableMult(qFT, tableDiv(qTss, qTs)))
## F
## T F1 F2
## T1 7.6e-03 0.0007920
## T2 5.0e-06 0.0098901
sum(qFTs)
## [1] 0.0182871
To get probability of fever we must normalize:
tableMargin(qFTs, "F")/sum(qFTs)
## F
## F1 F2
## 0.4158669 0.5841331
The important point of the computations: After working inwards and outwards in the junction tree, the clique potentials are consistent: They match on their separators:
tableMargin(qFTs, "T")
## T
## T1 T2
## 0.0083920 0.0098951
tableMargin(qTHs, "T")
## T
## T1 T2
## 0.0083920 0.0098951
qTss
## T
## T1 T2
## 0.0083920 0.0098951
Let’s consider (again) the following model:
This is example based on Lauritzen and Spiegehalter (1988) where:
Specify chest clinic network (but this times with some probabilities)
yn <- c("yes","no")
a <- cptable(~asia, values=c(1,99),levels=yn)
t.a <- cptable(~tub+asia, values=c(5,95,1,99),levels=yn)
s <- cptable(~smoke, values=c(5,5), levels=yn)
l.s <- cptable(~lung+smoke, values=c(1,9,1,99), levels=yn)
b.s <- cptable(~bronc+smoke, values=c(6,4,3,7), levels=yn)
e.lt <- cptable(~either+lung+tub,values=c(1,0,1,0,1,0,0,1),levels=yn)
x.e <- cptable(~xray+either, values=c(98,2,5,95), levels=yn)
d.be <- cptable(~dysp+bronc+either, values=c(9,1,7,3,8,2,1,9), levels=yn)
plist <- compileCPT(list(a, t.a, s, l.s, b.s, e.lt, x.e, d.be))
bnet <- grain(plist)
bnet
## Independence network: Compiled: FALSE Propagated: FALSE
## Nodes: chr [1:8] "asia" "tub" "smoke" "lung" "bronc" "either" "xray" "dysp"
plist #list all
## CPTspec with probabilities:
## P( asia )
## P( tub | asia )
## P( smoke )
## P( lung | smoke )
## P( bronc | smoke )
## P( either | lung tub )
## P( xray | either )
## P( dysp | bronc either )
plist$tub #list specific to 'tub'
## asia
## tub yes no
## yes 0.05 0.01
## no 0.95 0.99
## attr(,"class")
## [1] "parray" "array"
plot(bnet)
Now when we have the basic knowlage about graphs and some (usefull) DAG we can ask questions:
querygrain(bnet, nodes=c('lung','tub','bronc'))
## $tub
## tub
## yes no
## 0.0104 0.9896
##
## $lung
## lung
## yes no
## 0.055 0.945
##
## $bronc
## bronc
## yes no
## 0.45 0.55
bnet.f <- setFinding(bnet, nodes=c('asia','dysp'), state=c('yes','yes'))
bnet.f
## Independence network: Compiled: TRUE Propagated: TRUE
## Nodes: chr [1:8] "asia" "tub" "smoke" "lung" "bronc" "either" "xray" "dysp"
## Evidence:
## nodes is.hard.evidence hard.state
## 1 asia TRUE yes
## 2 dysp TRUE yes
## pEvidence: 0.004501
pFinding(bnet.f)
## [1] 0.004501375
querygrain(bnet.f, nodes=c('lung','tub','bronc'))
## $tub
## tub
## yes no
## 0.08775096 0.91224904
##
## $lung
## lung
## yes no
## 0.09952515 0.90047485
##
## $bronc
## bronc
## yes no
## 0.8114021 0.1885979
querygrain(bnet.f, nodes=c('lung','tub','bronc'), type='joint')
## , , bronc = yes
##
## lung
## tub yes no
## yes 0.003149038 0.04183722
## no 0.059831718 0.70658410
##
## , , bronc = no
##
## lung
## tub yes no
## yes 0.001827219 0.04093749
## no 0.034717170 0.11111605
##
## attr(,"class")
## [1] "parray" "array"
The computational scheme outlined above does not apply directly to the chest clinic example.
An extra step is needed: Triangulation of the moral graph (but we aready know how to do it).
par(mfrow=c(1,3))
plot(bnet$dag, main = 'DAG')
plot(moralize(bnet$dag), main = 'Moralized')
plot(triangulate(moralize(bnet$dag)), main = 'Triangulated')
Notice: We have not changed the fundamental model by these steps, but some conditional independencies are concealed in the triangulated graph. But the triangulated graph factorization allows efficient calculations.
gRain
grain()
Create a network from list of conditional probability tables; and do a few other things.setFinding()
: Set the values of some variables.querygrain()
: Get updated beliefs (conditional probabilities given findings) of some variablesUnder the hood there are two additional operations:
compile()
Create a clique potential representation (and a few other steps)propagate()
Turn clique potentials into clique marginals.Task: Make some new queries, set some findings and probability of findings for chest clicic DAG.