Lecture
2023-08-28
We will move through this module (“fundamentals”) at a fairly brisk pace
Today
Packages in Julia
Effect of beer drinking on attractiveness to mosquitos
Probability distributions
Statistics
Wrapup
Julia has a built-in package manager for installing add-on functionality written in Julia. It can also install external libraries using your operating system’s standard system for doing so, or by compiling from source.
Each project has an environment, which is defined by the following files (do not edit them manually):
Project.toml: lists the specified dependencies of the project
Manifest.toml: lists the exact versions of the packages that are used in the project
The actual packages are stored on your computer and you don’t need to worry about them.
We activate a project to tell Julia that we want to use the packages in that project. These steps are equivalent:
using Pkg
Pkg.activate(".")
or type ] to enter the package manager, then:
activate .
We add a package to install it in the current project. Again, two equivalent ways:
using Pkg
Pkg.add("DataFrames")
or type ] to enter the package manager, then:
add DataFrames
When working with someone else’s project, we need to install the packages that they use.
activate does not install anything; it just tells Julia which packages to use.
instantiate is your friend to make sure an environment is ready to use. If there’s nothing to do, instantiate does nothing.
See the Pkg.jl docs on instantiate.
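A minimal sketch of what activating an environment does, using only the Pkg standard library. The temporary directory is a stand-in (an assumption for illustration) for a real project folder; with someone else's project you would pass the path that holds their Project.toml and Manifest.toml and then call Pkg.instantiate(), which is left as a comment here because it may contact the package registry.

```julia
using Pkg

# Hypothetical project folder; normally this would be "." or the path
# to the project you were given.
project_dir = mktempdir()

# Tell Julia which environment to use. This installs nothing.
Pkg.activate(project_dir)

# With a real project, this is where you would install its dependencies:
# Pkg.instantiate()

# Path of the Project.toml that is now active.
active = Base.active_project()
println(active)
```

Printing the active project is a quick way to confirm which environment later `add` or `instantiate` calls will act on.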
A common error:
ERROR: Jupyter kernel 'julia-1.9' not found.
The Julia kernel for Jupyter comes from the IJulia package. It is listed in Manifest.toml, but you need to instantiate the environment to install it. Then run Pkg.build("IJulia") in the REPL (after you activate and instantiate).
Effect of beer drinking on attractiveness to mosquitos
In this class we will use computation and simulation to build fundamental insight into statistical processes without dwelling on “agonizing” details.
Does drinking beer increase the likelihood of being bitten by mosquitos?
Create a variable called beer to hold the number of mosquito bites for beer drinkers:
25-element Vector{Int64}:
27
20
21
26
27
31
24
21
20
19
23
24
28
19
24
29
18
20
17
31
20
25
28
21
27
We can learn a bit more about beer:
Vector{Int64} (alias for Array{Int64, 1})
25
(25,)
23.6
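The code cells did not survive in this copy of the slides, so the following is a guess at calls that would produce the four outputs above, not the lecture's own code. The data are copied from the printed vector; `mean` is taken from the Statistics standard library.

```julia
using Statistics

# Mosquito bites for the 25 beer drinkers (values copied from the output above)
beer = [27, 20, 21, 26, 27, 31, 24, 21, 20, 19, 23, 24, 28,
        19, 24, 29, 18, 20, 17, 31, 20, 25, 28, 21, 27]

typeof(beer)   # the container type and element type
length(beer)   # number of elements: 25
size(beer)     # dimensions as a tuple: (25,)
mean(beer)     # sample average: 590 / 25 = 23.6
```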
We can do the same for water drinkers.
By adding ; at the end of our statement, we keep the notebook from showing the output.
Let’s calculate the difference between the average number of bites in each group.
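A sketch of this calculation. The water values are an assumption: they are hidden in the slides, so the vector below uses the values commonly published for this dataset. They do reproduce the group size (18) and the difference reported in the lecture output. `mean` comes from the Statistics standard library here rather than StatsBase, to avoid an extra dependency.

```julia
using Statistics

beer = [27, 20, 21, 26, 27, 31, 24, 21, 20, 19, 23, 24, 28,
        19, 24, 29, 18, 20, 17, 31, 20, 25, 28, 21, 27]

# ASSUMPTION: not shown in the slides; values from the commonly published dataset
water = [21, 22, 15, 12, 21, 16, 19, 15, 22, 24, 19, 23, 13, 22, 20, 24, 18, 20];

# Difference between the group averages (beer minus water)
observed_diff = mean(beer) - mean(water)
```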
4.37777777777778
We use the mean function from the StatsBase package.
The skeptic asks whether this might be random chance.
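One way to answer the skeptic is to shuffle the group labels and recompute the difference. The function below is a sketch of that idea: the name get_shuffled_difference appears in the slides, but its body is not shown, and the `rng` keyword is an addition made here for reproducibility.

```julia
using Random, Statistics

function get_shuffled_difference(y1, y2; rng=Random.default_rng())
    # Pool both groups and shuffle, which erases any real group effect
    y_all = shuffle(rng, vcat(y1, y2))
    n1 = length(y1)
    # Relabel: the first n1 values play group 1, the rest play group 2
    new_y1 = y_all[1:n1]
    new_y2 = y_all[(n1 + 1):end]
    return mean(new_y1) - mean(new_y2)
end
```

Each call such as `get_shuffled_difference(beer, water)` returns one difference that could arise if the labels "beer" and "water" were meaningless.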
1.224444444444444
We use the shuffle function from the Random package.
The shuffled values are split into two groups, y1 and y2.
end closes the function definition.
We want to learn about the sampling distribution of the group differences: repeat this experiment many times over and plot the results.
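A sketch of the repeated experiment under the same assumptions as before (the water values are not shown in the slides, and the function body is a reconstruction). Instead of a plot, it summarizes the shuffled differences with the fraction that exceed the observed one. The seed is arbitrary.

```julia
using Random, Statistics

beer = [27, 20, 21, 26, 27, 31, 24, 21, 20, 19, 23, 24, 28,
        19, 24, 29, 18, 20, 17, 31, 20, 25, 28, 21, 27]
# ASSUMPTION: not shown in the slides; values from the commonly published dataset
water = [21, 22, 15, 12, 21, 16, 19, 15, 22, 24, 19, 23, 13, 22, 20, 24, 18, 20]

function get_shuffled_difference(rng, y1, y2)
    y_all = shuffle(rng, vcat(y1, y2))   # pool and shuffle the labels away
    n1 = length(y1)
    return mean(y_all[1:n1]) - mean(y_all[(n1 + 1):end])
end

rng = MersenneTwister(2023)              # arbitrary seed
obs = mean(beer) - mean(water)           # observed difference
diffs = [get_shuffled_difference(rng, beer, water) for _ in 1:50_000]

length(diffs)                            # 50000
frac_larger = mean(diffs .> obs)         # share of shuffles beating the observed difference
```

A small `frac_larger` means that random relabeling rarely produces a gap as large as the one observed.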
50000
We call get_shuffled_difference each time.
length tells us the size of a vector.
We use the Plots package to make plots.
The plotting code takes diffs and obs.
histogram is a function from the Plots package.
The first argument is the diffs object. ; separates the positional arguments from the keyword arguments.
xlabel is a “keyword argument” specifying the text for the x-axis label.
We add a vertical line (vline!) at the observed difference.
We could have done this with a parametric test.
We use the HypothesisTests package.
The HypothesisTests. prefix is optional, but it adds clarity.
; suppresses output.
Two sample t-test (equal variance)
----------------------------------
Population details:
parameter of interest: Mean difference
value under h_0: 0
point estimate: 4.37778
95% confidence interval: (1.913, 6.843)
Test summary:
outcome with 95% confidence: reject h_0
two-sided p-value: 0.0009
Details:
number of observations: [25,18]
t-statistic: 3.5869843832143413
degrees of freedom: 41
empirical standard error: 1.220461900604875
Two sample t-test (unequal variance)
------------------------------------
Population details:
parameter of interest: Mean difference
value under h_0: 0
point estimate: 4.37778
95% confidence interval: (1.957, 6.798)
Test summary:
outcome with 95% confidence: reject h_0
two-sided p-value: 0.0007
Details:
number of observations: [25,18]
t-statistic: 3.658244539721401
degrees of freedom: 39.11341478045414
empirical standard error: 1.196688119190407
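The t-statistics and degrees of freedom reported above can be checked by hand with the Statistics standard library. This is a sketch of the textbook formulas, not the HypothesisTests implementation, and the water values are an assumption (they are not shown in the slides).

```julia
using Statistics

beer = [27, 20, 21, 26, 27, 31, 24, 21, 20, 19, 23, 24, 28,
        19, 24, 29, 18, 20, 17, 31, 20, 25, 28, 21, 27]
# ASSUMPTION: not shown in the slides; values from the commonly published dataset
water = [21, 22, 15, 12, 21, 16, 19, 15, 22, 24, 19, 23, 13, 22, 20, 24, 18, 20]

n1, n2 = length(beer), length(water)
d = mean(beer) - mean(water)

# Equal-variance version: pool the two sample variances
pooled_var = ((n1 - 1) * var(beer) + (n2 - 1) * var(water)) / (n1 + n2 - 2)
se_pooled = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = d / se_pooled
df_pooled = n1 + n2 - 2

# Unequal-variance (Welch) version: each group keeps its own variance
v1, v2 = var(beer) / n1, var(water) / n2
se_welch = sqrt(v1 + v2)
t_welch = d / se_welch
df_welch = (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
```

The two versions differ only in how the standard error is built, which is why their t-statistics are close but not identical.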
0.00052
. is the dot operator. It applies the function to each element of the vector individually.
Probability distributions
The Normal (Gaussian) distribution has probability density function:
\[ p(y | \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp \left( -\frac{1}{2}\left( \frac{y-\mu}{\sigma} \right)^{\!2} \, \right) \]
The central limit theorem says that the sum of many independent, identically distributed random variables with finite variance is approximately normally distributed.
We can see this with an example:
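A sketch of such an example using only the standard library. The choices here are assumptions for illustration: the non-Gaussian distribution is a unit exponential (drawn with `randexp`), each sample mean uses J = 100 draws, and the experiment is repeated N = 10,000 times.

```julia
using Random, Statistics

rng = MersenneTwister(543)   # arbitrary seed
J = 100                      # draws per sample mean
N = 10_000                   # number of sample means

# Each entry is the mean of J draws from a skewed, non-Gaussian distribution
ȳ = [mean(randexp(rng, J)) for _ ∈ 1:N]

# For a unit exponential, theory gives mean 1 and standard deviation 1/sqrt(J)
println("mean of ȳ: ", mean(ȳ))
println("std of ȳ:  ", std(ȳ), "  (theory: ", 1 / sqrt(J), ")")
```

A histogram of `ȳ` would look close to a bell curve even though a single exponential draw is strongly skewed.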
We take J draws from a non-Gaussian distribution \(\mathcal{D}\).
To type ȳ, type y then type \bar and hit tab. Julia allows unicode (or emojis) in variable names.
To type ∈, type \in and hit tab. The _ isn’t doing anything special and we could name it i or 😶 or whatever we want, but _ suggests it’s a throwaway.
We will get tired of writing
\[ p(y | \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp \left( -\frac{1}{2}\left( \frac{y-\mu}{\sigma} \right)^{\!2} \, \right) \]
Instead, we will often use shorthand:
\[ y \sim \mathcal{N}(\mu, \sigma^2) \]
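To connect the shorthand to the density, here is a small sketch. The function name `normal_pdf` and the values of μ and σ are illustrative choices, not from the lecture. It draws from the distribution by shifting and scaling standard normal draws, and checks numerically that the density integrates to about 1.

```julia
using Random, Statistics

# Normal density with mean μ and standard deviation σ
normal_pdf(y, μ, σ) = exp(-0.5 * ((y - μ) / σ)^2) / (σ * sqrt(2π))

μ, σ = 2.0, 0.5                      # illustrative values
rng = MersenneTwister(1)             # arbitrary seed

# Draws from N(μ, σ²): shift and scale standard normal draws
y = μ .+ σ .* randn(rng, 100_000)

# Riemann-sum check that the density integrates to about 1
grid = range(μ - 8σ, μ + 8σ; length=10_001)
area = sum(normal_pdf.(grid, μ, σ)) * step(grid)
```

Note that the shorthand is written with the variance σ², while the density formula uses the standard deviation σ.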
L"<string>" allows us to use LaTeX in strings.
A Bernoulli distribution models a coin flip.
5-element Vector{Bool}:
0
0
0
0
0
The parameter p is the probability of success.
A Binomial distribution models the number of successes in n consecutive flips of the same coin.
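The link between the two distributions can be sketched without any packages: a Binomial draw is just a count of Bernoulli draws. The helper names and the value p = 0.5 are assumptions for illustration. The lecture's own code is not shown and presumably uses a distributions package instead.

```julia
using Random

rng = MersenneTwister(1)   # arbitrary seed
p = 0.5                    # probability of heads (assumed value)
n = 5                      # flips per Binomial draw

# One Bernoulli draw: true with probability p
bernoulli_draw(rng, p) = rand(rng) < p

# Five coin flips
flips = [bernoulli_draw(rng, p) for _ in 1:5]

# One Binomial draw counts the heads in n flips of the same coin
binomial_draw(rng, n, p) = count(bernoulli_draw(rng, p) for _ in 1:n)

# Five Binomial draws, each between 0 and n
counts = [binomial_draw(rng, n, p) for _ in 1:5]
```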
5-element Vector{Int64}:
3
4
5
1
3
The Multinomial extends the Binomial to multiple categories. Note that p is a vector. If there are 2 categories (\(K=2\)), it’s just the binomial with \(p_\text{multinomial} = [p, 1-p]\).
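A from-scratch sketch of Multinomial sampling, to show what one draw means: each of n trials lands in one category, chosen with probabilities p, and the draw is the vector of category counts. The function name is hypothetical; the probabilities and the sizes (5 trials, 5 draws) match the example in these slides.

```julia
using Random

# One Multinomial draw: assign each of n trials to a category with probabilities p
function multinomial_draw(rng, n, p)
    counts = zeros(Int, length(p))
    cdf = cumsum(p)
    for _ in 1:n
        u = rand(rng) * cdf[end]
        k = searchsortedfirst(cdf, u)   # first category whose cumulative probability is ≥ u
        counts[k] += 1
    end
    return counts
end

rng = MersenneTwister(1)   # arbitrary seed
p = [0.5, 0.3, 0.2]

# 3×5 matrix: one column per draw, each column sums to 5
draws = hcat([multinomial_draw(rng, 5, p) for _ in 1:5]...)
```

With `p = [q, 1 - q]` the first row of such a matrix is a set of Binomial draws, which is the \(K=2\) case described above.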
3×5 Matrix{Int64}:
3 3 0 4 2
2 0 0 1 1
0 2 5 0 2
We could also have written this in one line as rand(Multinomial(5, [0.5, 0.3, 0.2]), 5). Which is more readable?
The Poisson distribution is used to model count data. It is the limit of a Binomial distribution with \(p=\lambda/N\), as \(N \rightarrow \infty\).
A Poisson distribution has mean and variance equal to \(\lambda\).
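The limit statement can be checked numerically. This sketch builds both probability mass functions with their standard recurrences (no packages needed) and reports the largest gap between them as N grows. The function names, λ = 3, and the cutoff kmax = 10 are illustrative choices.

```julia
# Binomial(N, p) probabilities for k = 0..kmax, via the ratio of consecutive terms
function binomial_pmf(N, p, kmax)
    pmf = zeros(kmax + 1)
    pmf[1] = (1 - p)^N
    for k in 0:(kmax - 1)
        pmf[k + 2] = pmf[k + 1] * (N - k) / (k + 1) * p / (1 - p)
    end
    return pmf
end

# Poisson(λ) probabilities for k = 0..kmax
function poisson_pmf(λ, kmax)
    pmf = zeros(kmax + 1)
    pmf[1] = exp(-λ)
    for k in 0:(kmax - 1)
        pmf[k + 2] = pmf[k + 1] * λ / (k + 1)
    end
    return pmf
end

λ = 3.0
kmax = 10
for N in (10, 100, 10_000)
    gap = maximum(abs.(binomial_pmf(N, λ / N, kmax) .- poisson_pmf(λ, kmax)))
    println("N = $N: largest pmf gap = $gap")
end
```

The printed gaps shrink as N increases, which is the convergence described above.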
10-element Vector{Int64}:
1
4
4
1
3
3
3
2
2
4
The NegativeBinomial distribution relaxes the Poisson’s assumption that \(\text{mean} = \text{variance}\).
This distribution models the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of failures (r) occurs, where each trial fails with probability p. For example, we can define rolling a 6 on a die as a failure, and rolling any other number as a success, and ask how many successful rolls will occur before we see the third failure (p = 1/6 and r = 3).
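The die example can be simulated directly with the standard library. This is a sketch with a hypothetical function name, and it keeps the wording used here: a 6 is a "failure" with probability 1/6, and we count the other rolls. Under that setup the theoretical mean is \(r(1-p)/p = 15\) and the variance is \(r(1-p)/p^2 = 90\), so the variance is well above the mean.

```julia
using Random, Statistics

# Count the non-6 rolls ("successes") seen before the r-th 6 ("failure")
function successes_before_r_failures(rng, r; p_fail=1/6)
    successes, failures = 0, 0
    while failures < r
        if rand(rng) < p_fail
            failures += 1
        else
            successes += 1
        end
    end
    return successes
end

rng = MersenneTwister(6)   # arbitrary seed
x = [successes_before_r_failures(rng, 3) for _ in 1:20_000]

println("sample mean:     ", mean(x))   # theory: 15
println("sample variance: ", var(x))    # theory: 90
```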
Statistics
The mean of a sample is just the sample average: \[ \bar{y} = \frac{1}{N} \sum_{i=1}^N y_i \]
The mean of a distribution is the expected value of the distribution: \[ \mathbb{E}(u) = \int u p(u) \, du \]
Variance measures how spread out points are around the mean.
You may be familiar with sample variance: \[ S^2 = \frac{\sum_{i=1}^N (y_i - \bar{y})^2}{N - 1} \]
For a distribution: \[ \mathbb{V}(u) = \int (u - \mathbb{E}(u))^2 p(u) \, du \] or, for a vector-valued \(u\) (the covariance matrix), \[ \mathbb{V}(u) = \int (u - \mathbb{E}(u)) (u - \mathbb{E}(u))^T p(u) \, du \]
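The sample formulas can be written out directly and compared with the built-in functions from the Statistics standard library. This sketch reuses the beer data from earlier in the lecture; the variable names are illustrative.

```julia
using Statistics

# Beer-drinker bite counts from earlier in the lecture
y = [27, 20, 21, 26, 27, 31, 24, 21, 20, 19, 23, 24, 28,
     19, 24, 29, 18, 20, 17, 31, 20, 25, 28, 21, 27]

N = length(y)

# Sample mean: add everything up and divide by N
ybar = sum(y) / N

# Sample variance: squared deviations from the mean, divided by N - 1
sample_var = sum((y .- ybar) .^ 2) / (N - 1)

# The built-in functions use the same definitions
println(ybar ≈ mean(y))
println(sample_var ≈ var(y))
```

Dividing by N - 1 rather than N is the default in `var`, which is why the two results agree.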
Wrapup
If you haven’t filled out the Doodle, please do so ASAP