How the Engineering Placement Quiz Works

The Engineering Placement Quiz was built by a group of five Management Engineering students for their Fourth Year Design Project.

The Machine Learning Algorithm Behind It

The quiz is powered by a Naive Bayes classifier, a simple, efficient and intuitive machine learning algorithm that scales well to large data sets.


The Naive Bayes algorithm uses Bayes’ Theorem to calculate the probability that a sample belongs to a certain category. It is called “naive” because it assumes that the features of a sample are independent of one another within each category. In terms of the quiz, it calculates the probability that you (the sample) belong in each of the fifteen engineering programs offered at the University of Waterloo (the categories), based on the ~2000 data points it has been trained on.

Bayes’ Theorem

In machine learning we are often interested in selecting the best hypothesis (H) given a set of data (D). Bayes’ Theorem provides a way to calculate the probability of a hypothesis given the available data.


Bayes’ Theorem Equation:

$$P(H \mid D) = \frac{P(D \mid H) \times P(H)}{P(D)}$$


Where:

P(H | D) is the probability of hypothesis H given the data D. This is called the posterior probability.

P(D | H) is the probability of data D given that the hypothesis H is true. This is called the likelihood.

P(H) is the probability of hypothesis H being true (regardless of the data). This is called the prior probability of H.

P(D) is the probability of the data (regardless of the hypothesis).
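
To make the four terms concrete, here is a minimal sketch of the equation as a function. The function name `posterior` and the input numbers are made up purely for illustration; they do not come from the quiz.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' Theorem: P(H | D) = P(D | H) * P(H) / P(D)."""
    if evidence <= 0:
        raise ValueError("P(D) must be greater than zero")
    return likelihood * prior / evidence


# Hypothetical inputs, chosen only to show the arithmetic.
p_d_given_h = 0.9  # P(D | H), the likelihood
p_h = 0.2          # P(H), the prior
p_d = 0.3          # P(D), the probability of the data

print(round(posterior(p_d_given_h, p_h, p_d), 3))  # prints 0.6
```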

A Simple Example

Let’s say we have data on 1000 pieces of fruit, each of which is a lemon, a mango or some other fruit. Imagine we also know three features of each fruit: whether it’s sour or not, round or not, and yellow or not. We’ve organized all this data in the table below.


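| Fruit | Sour | Round | Yellow | Total |
|-------|------|-------|--------|-------|
| Lemon | 400  | 350   | 450    | 500   |
| Mango | 0    | 150   | 300    | 300   |
| Other | 100  | 150   | 50     | 200   |
| Total | 500  | 650   | 800    | 1000  |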


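Just from looking at the table, we already know that:

50% of the fruits are lemons, 30% are mangoes and 20% are other fruits.

80% of the lemons are sour, 70% are round and 90% are yellow.

None of the mangoes are sour, 50% are round and all of them are yellow.

50% of the other fruits are sour, 75% are round and 25% are yellow.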
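
The probabilities that can be read off the table are each a count divided by a total. As a rough sketch (the dictionary layout and the names `counts`, `prior` and `likelihood` are illustrative assumptions, not part of the quiz’s code), they could be computed like this:

```python
# Counts copied from the fruit table: how many fruits in each class have each feature.
counts = {
    "Lemon": {"Sour": 400, "Round": 350, "Yellow": 450, "Total": 500},
    "Mango": {"Sour": 0, "Round": 150, "Yellow": 300, "Total": 300},
    "Other": {"Sour": 100, "Round": 150, "Yellow": 50, "Total": 200},
}
features = ["Sour", "Round", "Yellow"]
n_fruit = sum(row["Total"] for row in counts.values())  # 1000 pieces of fruit

# Prior P(class): the share of all fruit that belongs to each class.
prior = {fruit: row["Total"] / n_fruit for fruit, row in counts.items()}

# Likelihood P(feature | class): the share of a class that has the feature.
likelihood = {
    fruit: {f: row[f] / row["Total"] for f in features}
    for fruit, row in counts.items()
}

for fruit in counts:
    print(fruit, prior[fruit], likelihood[fruit])
```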


Let’s say we’re given the features of an additional piece of fruit and we want to predict what type of fruit it is (its class). We’re told that the fruit is sour, round and yellow. We can use Bayes’ Theorem to classify it as a lemon, a mango or some other fruit.


General Formula:

$$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}$$


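Lemon:

$$P(\text{Lemon} \mid \text{Sour}, \text{Round}, \text{Yellow}) = \frac{P(\text{Sour} \mid \text{Lemon}) \times P(\text{Round} \mid \text{Lemon}) \times P(\text{Yellow} \mid \text{Lemon}) \times P(\text{Lemon})}{P(\text{Sour}) \times P(\text{Round}) \times P(\text{Yellow})}$$

The denominator is the same for all three fruit classes, so it does not change which class scores highest. We can therefore compare the numerators alone:

$$P(\text{Lemon} \mid \text{Sour}, \text{Round}, \text{Yellow}) \propto (0.8) \times (0.7) \times (0.9) \times (0.5) = 0.252$$


Mango:

$$P(\text{Mango} \mid \text{Sour}, \text{Round}, \text{Yellow}) = \frac{P(\text{Sour} \mid \text{Mango}) \times P(\text{Round} \mid \text{Mango}) \times P(\text{Yellow} \mid \text{Mango}) \times P(\text{Mango})}{P(\text{Sour}) \times P(\text{Round}) \times P(\text{Yellow})}$$

$$P(\text{Mango} \mid \text{Sour}, \text{Round}, \text{Yellow}) \propto (0) \times (0.5) \times (1) \times (0.3) = 0$$


Other:

$$P(\text{Other} \mid \text{Sour}, \text{Round}, \text{Yellow}) = \frac{P(\text{Sour} \mid \text{Other}) \times P(\text{Round} \mid \text{Other}) \times P(\text{Yellow} \mid \text{Other}) \times P(\text{Other})}{P(\text{Sour}) \times P(\text{Round}) \times P(\text{Yellow})}$$

$$P(\text{Other} \mid \text{Sour}, \text{Round}, \text{Yellow}) \propto (0.5) \times (0.75) \times (0.25) \times (0.2) = 0.01875 \approx 0.019$$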


Therefore, based on the highest score (0.252 for lemon), we predict that this sour, round and yellow fruit is, in fact, a lemon.
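
The whole calculation can be reproduced in a few lines. The sketch below is an illustration rather than the quiz’s actual code: the `counts` layout and the `classify` function are assumed names. It scores each class as its prior times its per-feature likelihoods, leaving out the shared denominator because it is identical for every class. It then rescales the scores so that they sum to 1, a normalization step that the walkthrough above does not perform.

```python
from math import prod

# Counts copied from the fruit table.
counts = {
    "Lemon": {"Sour": 400, "Round": 350, "Yellow": 450, "Total": 500},
    "Mango": {"Sour": 0, "Round": 150, "Yellow": 300, "Total": 300},
    "Other": {"Sour": 100, "Round": 150, "Yellow": 50, "Total": 200},
}


def classify(observed):
    """Score each class for a fruit that has every feature in `observed`."""
    n_fruit = sum(row["Total"] for row in counts.values())
    scores = {}
    for fruit, row in counts.items():
        prior = row["Total"] / n_fruit                          # P(class)
        likelihoods = [row[f] / row["Total"] for f in observed]  # P(feature | class)
        scores[fruit] = prior * prod(likelihoods)
    best = max(scores, key=scores.get)  # class with the highest score
    return best, scores


best, scores = classify(["Sour", "Round", "Yellow"])

# Optional: rescale the scores so they can be read as probabilities.
total = sum(scores.values())
normalized = {fruit: score / total for fruit, score in scores.items()}

print(best)  # prints Lemon
print(scores)
print(normalized)
```

After rescaling, lemon receives roughly 0.93 of the total, which leads to the same prediction as comparing the raw scores.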