Stochastic Thinking, Simulation and Sampling
1 Stochastic Process
A stochastic process is a collection of random variables representing a process that evolves over time.
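As a minimal illustration (not from the original notes), a one-dimensional random walk is a classic stochastic process: each step is a random variable, and the walk's position evolves over time.

```python
import numpy as np

def random_walk(num_steps, seed=None):
    """Simulate a simple symmetric random walk: each step moves +1 or -1."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([-1, 1], size=num_steps)   # one random variable per time step
    return np.cumsum(steps)                       # position of the walk at each time t

positions = random_walk(1000, seed=0)
print(positions[:5])
```

Here the collection of positions indexed by time forms the stochastic process.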
Here is code that simulates rolling a die and estimates the probability of obtaining a given sequence of faces:

import numpy as np

def simulate_dice(goal, num_trials, txt):
    total = 0
    for i in range(num_trials):
        result = ''
        for j in range(len(goal)):
            result += str(np.random.choice([1, 2, 3, 4, 5, 6]))
        if result == goal:
            total += 1
    print(f'Actual Probability of {txt} = {1/6**len(goal)}')
    print(f'Estimated Probability of {txt} = {total/num_trials}')
# Actual Prob. 0.0001286
# Estimated Prob. 0.0

2 Simulation and Sampling
Even so, simulation is often a practical alternative when an analytical solution is difficult to derive.
2.1 The Birthday Problem
What is the probability that at least two people in a group of $n$ share the same birthday?

If $n > 365$, then $P = 1$ by the pigeonhole principle. If $n \le 365$, the answer is

$$P(\text{at least two share}) = 1 - P(\text{all } n \text{ birthdays distinct}),$$

and we have

$$P(\text{all } n \text{ birthdays distinct}) = \frac{365!}{365^n\,(365-n)!}.$$

What if we ask for at least $n$ people sharing the same birthday among $m$ people? A closed form is harder, but simulation is easy:
import numpy as np
from collections import Counter

def birthday_problem(n, m, num_trials):
    total = 0
    for i in range(num_trials):
        birthdays = np.random.choice(range(365), m)
        # At least n people share a birthday iff some birthday occurs >= n times
        if max(Counter(birthdays).values()) >= n:
            total += 1
    print(f'Estimated Probability of at least {n} people having the same birthday = {total/num_trials}')

2.2 Simulation
- Simulation models are computational frameworks that imitate the behavior of a system, offering insight into its possible dynamics and outcomes.
2.3 Monte Carlo Simulation
Monte Carlo simulation is a computational technique that generates random variables for modeling risk or uncertainty of a system. It uses random sampling to obtain numerical results, based on principles of inferential statistics:
- Random Sampling: Randomly sample from a distribution.
- Population: The entire set of possible outcomes.
- Sample: A subset of the population.
Example: Estimate the value of $\pi$ using Monte Carlo simulation.
import numpy as np

def estimate_pi(num_points):
    # Sample points uniformly in the unit square; the quarter circle of
    # radius 1 has area pi/4, so the fraction of points inside approximates pi/4.
    points = np.random.rand(num_points, 2)
    inside = np.sum(np.linalg.norm(points, axis=1) < 1)
    return 4*inside/num_points

The estimated value of $\pi$ becomes more accurate as the number of points grows:

print(estimate_pi(10))      # 3.6
print(estimate_pi(100))     # 3.08
print(estimate_pi(1000))    # 3.144
print(estimate_pi(10000))   # 3.1416
print(estimate_pi(100000))  # 3.14152

Another Step-by-step Example: Toss Coins
Consider one flip. How confident would you be about predicting whether it will show heads? Half.
Consider two flips. Assume you know nothing about probability: what do you think is the chance that the next flip will show heads? Half.
Consider 100 flips, 1000 flips, 10000 flips... What do you think is the chance that the next flip will show heads?
Confidence in our estimate depends on two factors:
- Size of the sample.
- Variance of the sample. As the variance grows, we need larger samples to maintain the same level of confidence.
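Both factors can be seen numerically. In this sketch (my own illustration, with hypothetical parameters), we repeatedly estimate the mean of a normal distribution and measure how much the estimates spread: the spread shrinks as the sample size grows and widens as the population's variance grows.

```python
import numpy as np

rng = np.random.default_rng(42)

def estimate_spread(std, sample_size, num_trials=1000):
    """Estimate the mean of N(0, std^2) num_trials times; return the spread of the estimates."""
    estimates = rng.normal(0, std, size=(num_trials, sample_size)).mean(axis=1)
    return estimates.std()

# Larger samples -> tighter estimates; larger variance -> wider estimates.
print(estimate_spread(std=1, sample_size=10))    # roughly 1/sqrt(10)
print(estimate_spread(std=1, sample_size=1000))  # roughly 1/sqrt(1000)
print(estimate_spread(std=10, sample_size=10))   # roughly 10/sqrt(10)
```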
Not perfect but precise: sampling can never guarantee perfect accuracy, but some estimates are precise enough to be useful.
Question
How many simulations do we need to have justified confidence on our answer?
It depends on the variability of the results. The supporting theory includes the Law of Large Numbers, the Central Limit Theorem, and Confidence Intervals.
2.4 LLN, CLT and CI
Law of Large Numbers (LLN): As the number of trials increases, the average of the results converges to the expected value.
Central Limit Theorem (CLT): As the number of samples increases, the distribution of the sample means converges to a normal distribution, regardless of the shape of the underlying distribution.
Confidence Interval (CI): A range of values, calculated from the sample data, that is likely to contain the true value of an unknown population parameter:

$$\text{CI} = \bar{x} \pm \text{margin of error},$$

where the margin of error is calculated as

$$\text{margin of error} = \text{critical value} \times \text{standard error},$$

and the standard error is calculated as

$$\text{SE} = \frac{s}{\sqrt{n}},$$

where $s$ is the sample standard deviation and $n$ is the sample size. The critical value is determined by the desired confidence level. For example, for a 95% confidence level, the critical value is 1.96.
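The formulas above translate directly into code. This is a minimal sketch, assuming a z critical value of 1.96 and hypothetical normally distributed data:

```python
import numpy as np

def confidence_interval(sample, z=1.96):
    """95% confidence interval for the population mean (z = 1.96)."""
    mean = np.mean(sample)
    se = np.std(sample, ddof=1) / np.sqrt(len(sample))  # standard error s/sqrt(n)
    margin = z * se                                      # margin of error
    return mean - margin, mean + margin

rng = np.random.default_rng(1)
sample = rng.normal(10, 2, size=500)   # hypothetical data with true mean 10
lo, hi = confidence_interval(sample)
print(lo, hi)                          # an interval near 10
```

With 500 points and standard deviation 2, the margin of error is about $1.96 \cdot 2/\sqrt{500} \approx 0.18$, so the interval is narrow around the sample mean.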