
Cumulative Binomial Probability Analysis with R and Shiny

In probability analysis, the two variables that determine the chance of an event happening at least once are N (the number of observations, or trials) and λ (lambda, our hit rate: the chance of occurrence in a single trial). When we talk about a cumulative binomial probability, we mean that the greater the number of trials, the higher the overall probability of the event occurring at least once.

probability = 1 – (1 – λ)^N

For instance, the probability of rolling a 6 on a fair die is 1/6. However, suppose that same die is rolled 10 times:

1 – (1 – 1/6)^10 ≈ 0.8385

We see that the probability of rolling at least one 6 now increases to 83.85%.
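
The die example can be checked directly in R. The sketch below computes the figure from the formula and, as a cross-check, from base R's binomial functions `dbinom` and `pbinom`; the variable names are illustrative only.

```r
# Probability of at least one 6 in 10 rolls of a fair die
lambda <- 1 / 6   # chance of a 6 on a single roll
n <- 10           # number of rolls

# Directly from the formula 1 - (1 - lambda)^N
by_formula <- 1 - (1 - lambda)^n

# Same quantity from the binomial distribution: 1 - P(zero sixes)
by_dbinom <- 1 - dbinom(0, size = n, prob = lambda)

# Or as an upper tail: P(number of sixes > 0)
by_pbinom <- pbinom(0, size = n, prob = lambda, lower.tail = FALSE)

round(c(by_formula, by_dbinom, by_pbinom), 4)
```

All three values should agree, since "at least one success" is the complement of "zero successes" in N independent trials.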

The larger the number of trials, the larger the probability of the event happening at least once, even if the probability within a single trial is very low. So, let us generate cumulative binomial probabilities to demonstrate how the probability increases with the number of trials.

Firstly, we define a function, with probabilities set at 2%, 4%, and 6% and the number of trials running from 0 to 100:
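
The same relationship can be turned around to ask how many trials are needed to reach a given overall probability. Solving 1 – (1 – λ)^N ≥ p for N gives N ≥ log(1 – p) / log(1 – λ). The helper below, `trials_needed`, is a hypothetical name used only for this sketch.

```r
# Hypothetical helper: smallest N such that 1 - (1 - lambda)^N >= target
trials_needed <- function(lambda, target) {
  # Both values must be strictly between 0 and 1 for the logs to be finite
  stopifnot(lambda > 0, lambda < 1, target > 0, target < 1)
  ceiling(log(1 - target) / log(1 - lambda))
}

# Trials needed for a 2% event to have at least a 50% chance of occurring
n_half <- trials_needed(0.02, 0.5)
n_half

# Check: the probability crosses 50% between n_half - 1 and n_half trials
1 - (1 - 0.02)^c(n_half - 1, n_half)
```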

par(bg = '#191661', fg = '#ffffff', col.main = '#ffffff', col.lab = '#ffffff', col.axis = '#ffffff')

#lambda = probability of event occurring in a single trial
#powers = number of trials
#mu = overall probability given n number of trials

muCalculation <- function(lambda, powers) {1 - ((1 - lambda)^powers)}
probability_at_lambda <- sapply(c(0.02, 0.04, 0.06), muCalculation, seq(0, 100, 1))
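
To see what `sapply` returns here, it can help to inspect the result before plotting. The sketch below repeats the definitions above so that it runs on its own; `lambdas` and `trials` are names introduced only for this illustration.

```r
muCalculation <- function(lambda, powers) {1 - ((1 - lambda)^powers)}

lambdas <- c(0.02, 0.04, 0.06)   # single-trial probabilities
trials <- seq(0, 100, 1)         # N = 0, 1, ..., 100

# sapply calls muCalculation once per lambda, passing trials as `powers`,
# and binds the three result vectors into the columns of a matrix
probability_at_lambda <- sapply(lambdas, muCalculation, trials)

dim(probability_at_lambda)               # one row per N, one column per lambda
probability_at_lambda[1, ]               # N = 0: no trials, so probability 0
round(probability_at_lambda[101, ], 4)   # N = 100
```

Note that row 1 corresponds to N = 0, so row i holds the probabilities for N = i – 1.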

Then, we can set up our data as a data frame and then plot as normal:

probability_at_lambdadf <- data.frame(probability_at_lambda)
col_headings <- c("probability1","probability2","probability3")
names(probability_at_lambdadf) <- col_headings
probability_at_lambdadf

# x values: number of trials, matching the seq(0, 100, 1) used above
trials <- seq(0, 100, 1)

# plot() draws the first curve and sets up the axes; lines() adds the others
plot(trials, probability_at_lambdadf$probability1, type="o", col="#b1aef4", xlab="N", ylab="Probability", xlim=c(0, 100), ylim=c(0.0, 1.0), pch=19)
lines(trials, probability_at_lambdadf$probability2, type="o", col="red", pch=19)
lines(trials, probability_at_lambdadf$probability3, type="o", col="green", pch=19)
title(main="Probability Chart")
grid(nx = NULL, ny = NULL, col = "lightgray", lty = "dotted",
     lwd = par("lwd"), equilogs = TRUE)
legend("bottomright", legend=c("probability_at_lambda_1","probability_at_lambda_2", "probability_at_lambda_3"), cex=0.6, col=c("#b1aef4","red","green"), pch=19, lty=1)
proc.time()

Sample Table

Here is a sample table with the calculated probabilities (probability_at_lambdadf):

[Image: sample table of the calculated probabilities]

Plot

Accordingly, here is a plot of the probabilities:

[Image: plot of the cumulative probabilities]

Analyse Cumulative Binomial Probability with a Shiny Web Application

This is an example of a Shiny Web application that can calculate cumulative binomial probabilities on the fly.

Recall that the R script above used a function to calculate cumulative binomial probabilities from lambda (the probability of the event happening in a single trial) and the power value (the number of trials).

The idea is that while the probability of an individual event happening may be low, the cumulative probability of the event happening increases with the number of trials.

1 – (1 – λ)^N
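
One way to build confidence in this formula is to compare it against a simulation. The sketch below draws many runs of independent trials with `rbinom` and counts how often the event occurs at least once; the seed, the 2% rate, the 50 trials, and the number of simulated runs are arbitrary choices made for illustration.

```r
set.seed(42)        # arbitrary seed, only so the run is reproducible

lambda <- 0.02      # single-trial probability
n_trials <- 50      # trials per simulated run
n_sims <- 100000    # number of simulated runs

# Number of hits in each run of n_trials independent trials
hits <- rbinom(n_sims, size = n_trials, prob = lambda)

# Share of runs with at least one hit, against the closed-form value
simulated <- mean(hits >= 1)
analytic <- 1 - (1 - lambda)^n_trials

c(simulated = simulated, analytic = analytic)
```

The two numbers should be close but not identical, because the simulated share carries sampling error that shrinks as `n_sims` grows.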

Here is an example of a Shiny Web App that allows us to manipulate the lambda values using a set of sliders and automatically update the probability curve.

To run this app, open RStudio and click File -> New File -> Shiny Web App, then select either Single File to paste the ui.R and server.R code together, or Multiple File to paste them separately.

[Image: cumulative probability Shiny app]

Additionally, if you are new to Shiny you can find my full tutorial on Sitepoint that describes how to build and run a Shiny app from scratch.

First, we set up the UI (user interface), which defines one slider for each of the three probabilities:

ui.R

library(shiny)

# Define UI for application that draws a probability plot
shinyUI(fluidPage(
  
  # Application title
  titlePanel("Cumulative Binomial Probability Plot"),
  
  # Sidebar with a slider input for value of lambda
  sidebarLayout(
    sidebarPanel(
      sliderInput("lambda",
                  "Probability 1:",
                  min = 0,
                  max = 1,
                  value = 0.01),
      sliderInput("lambda2",
                  "Probability 2:",
                  min = 0,
                  max = 1,
                  value = 0.01),
      sliderInput("lambda3",
                  "Probability 3:",
                  min = 0,
                  max = 1,
                  value = 0.01)
    ),
    
    # Show a plot of the generated probability plot
    mainPanel(
      plotOutput("ProbPlot")
    )
  )
))

Now, we set up the server. This is the part that takes the inputs and calculates the output that is eventually shown in the UI.

server.R

library(shiny)

# Shiny Application
shinyServer(function(input, output) {
  
  # Reactive expressions
  output$ProbPlot <- renderPlot({
    
    # number of trials on the x axis (N = 0 to 100)
    trials <- seq(0, 100, 1)
    
    # cumulative probability for each slider value (input$lambda etc. from ui.R)
    muCalculation <- function(lambda, powers) {1 - ((1 - lambda)^powers)}
    probability_at_lambda <- muCalculation(input$lambda, trials)
    probability_at_lambda2 <- muCalculation(input$lambda2, trials)
    probability_at_lambda3 <- muCalculation(input$lambda3, trials)
    
    # draw the probability: plot() sets up the axes, lines() adds the other curves
    par(bg = '#191661', fg = '#ffffff', col.main = '#ffffff', col.lab = '#ffffff', col.axis = '#ffffff')
    plot(trials, probability_at_lambda, type="o", col="#b1aef4", xlab="N", ylab="Probability", xlim=c(0, 100), ylim=c(0.0, 1.0), pch=19)
    lines(trials, probability_at_lambda2, type="o", col="red", pch=19)
    lines(trials, probability_at_lambda3, type="o", col="green", pch=19)
    title(main="Cumulative Binomial Probability")
  })
  
})
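
Because the calculation itself does not depend on Shiny, it can be pulled out into a plain function and checked on its own before being wired into the server. The sketch below is one possible way to do that; `cumulative_curves` and its column names are hypothetical and not part of the app above.

```r
# Hypothetical helper: one column of cumulative probabilities per lambda
cumulative_curves <- function(lambdas, max_trials = 100) {
  stopifnot(is.numeric(lambdas), all(lambdas >= 0), all(lambdas <= 1))
  trials <- seq(0, max_trials, by = 1)
  # outer() evaluates 1 - (1 - lambda)^N for every (N, lambda) pair
  curves <- outer(trials, lambdas, function(n, l) 1 - (1 - l)^n)
  colnames(curves) <- paste0("lambda_", seq_along(lambdas))
  data.frame(N = trials, curves)
}

# Example with three illustrative slider values
curves <- cumulative_curves(c(0.01, 0.05, 0.10))
head(curves, 3)
```

Inside renderPlot, the same function could be called with the three slider values, for example c(input$lambda, input$lambda2, input$lambda3), which keeps the reactive code limited to reading inputs and drawing.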

Conclusion

Today, you have learned how to: