Cybercrime cases and the role of the confusion matrix

Tushar Joshi
5 min read · Jul 20, 2021

What is cybercrime?

Cybercrime is criminal activity that either targets or uses a computer, a computer network or a networked device.

Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. Cybercrime is carried out by individuals or organizations.

Some cybercriminals are organized, use advanced techniques and are highly technically skilled. Others are novice hackers.

Rarely, cybercrime aims to damage computers for reasons other than profit. These could be political or personal.

Types of cybercrime

Here are some specific examples of the different types of cybercrime:

  • Email and internet fraud.
  • Identity fraud (where personal information is stolen and used).
  • Theft of financial or card payment data.
  • Theft and sale of corporate data.
  • Cyberextortion (demanding money to prevent a threatened attack).
  • Ransomware attacks (a type of cyberextortion).
  • Cryptojacking (where hackers mine cryptocurrency using resources they do not own).
  • Cyberespionage (where hackers access government or company data).

Most cybercrime falls under two main categories:

  • Criminal activity that targets computers.
  • Criminal activity that uses computers to commit other crimes.

Cybercrime that targets computers often involves viruses and other types of malware.

Cybercriminals may infect computers with viruses and malware to damage devices or stop them working. They may also use malware to delete or steal data.

Use of ML in detecting cyber attacks:

Binary logistic regression

  1. One study assessed the significance of the determinants of cybercrime in Qatar using binary logistic regression analysis.
  2. Six binary variables, one for each of six types of cybercrime, were constructed, and six binary logistic regressions were run.
  3. The binary variables cover types of cybercrime such as website hacking, email cybercrime, and social media cybercrime. The decision on the statistical significance of each effect is based on the p-value (Sig.) at the 10% significance level.
  • The results reveal a statistically significant association between the motive of monetary gain and the probability of cybercrime related to online banking.
  • This relationship is positive and statistically significant at the 1% level. This finding confirms the validity of the Pressure segment of the conceptual framework based on the Fraud Triangle.
  • In other words, monetary motives drive people to commit cyber attacks on financial organisations or individual accounts.
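
As a rough illustration of this kind of analysis, the sketch below fits one binary logistic regression with Newton's method and checks the coefficient with a Wald z-test. The 80 observations are made up for illustration and are not the study's data, and the names `motive`, `banking_crime`, `score_and_information`, and `fit_logistic` are hypothetical.

```python
import math

def score_and_information(b0, b1, xs, ys):
    """Gradient of the log-likelihood and the 2x2 information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # predicted probability
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)

def fit_logistic(xs, ys, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x); return (b0, b1, standard error of b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        (g0, g1), (h00, h01, h11) = score_and_information(b0, b1, xs, ys)
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, (h00, h01, h11) = score_and_information(b0, b1, xs, ys)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, se_b1

# Made-up data: 1 = respondent reports a monetary motive, 0 = does not.
motive = [1] * 40 + [0] * 40
# Made-up outcome: 1 = online banking cybercrime, 0 = none.
banking_crime = [1] * 30 + [0] * 10 + [1] * 10 + [0] * 30

b0, b1, se_b1 = fit_logistic(motive, banking_crime)
z = b1 / se_b1
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided Wald test
significant_at_10_percent = p_value < 0.10

print(f"coefficient={b1:.3f}  std. error={se_b1:.3f}  p-value={p_value:.5f}")
```

With a single binary predictor the fitted coefficient is the log odds ratio, here ln(9) ≈ 2.20, so a positive and significant coefficient means the motive is associated with higher odds of that type of cybercrime. A real analysis would normally use a statistics package and several predictors.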

Confusion Matrix:

What is a confusion matrix?

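It is a 2×2 matrix for binary classification, with actual values on one axis and predicted values on the other. Each cell counts the test examples that fall into that combination of actual and predicted class.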
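
A minimal sketch of how such a matrix can be counted from two lists of labels. The function name `confusion_matrix` and the eight sample labels are assumptions made for illustration; rows are indexed by the actual class and columns by the predicted class.

```python
def confusion_matrix(actual, predicted):
    """Count binary outcomes; matrix[a][p] = examples with actual a, predicted p."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Illustrative labels only: 1 = positive class, 0 = negative class.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

matrix = confusion_matrix(actual, predicted)

print("             predicted 0  predicted 1")
print(f"actual 0     {matrix[0][0]:^11}  {matrix[0][1]:^11}")
print(f"actual 1     {matrix[1][0]:^11}  {matrix[1][1]:^11}")
```

The diagonal cells hold the correct predictions and the off-diagonal cells hold the two kinds of mistake.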

Let’s understand the confusing terms in the confusion matrix: true positive, true negative, false negative, and false positive with an example.

EXAMPLE:

A machine learning model is trained to predict whether patients have a tumor. The test dataset consists of 100 people.

True Positive (TP): the model correctly predicts the positive class (prediction and actual are both positive). In the above example, 10 people who have tumors are correctly predicted as positive.

True Negative (TN): the model correctly predicts the negative class (prediction and actual are both negative). In the above example, 60 people who don’t have tumors are correctly predicted as negative.

False Positive (FP): the model wrongly predicts the positive class (predicted positive, actual negative). In the above example, 22 people are predicted as having a tumor although they don’t have one. FP is also called a Type I error.

False Negative (FN): the model wrongly predicts the negative class (predicted negative, actual positive). In the above example, 8 people who have tumors are predicted as negative. FN is also called a Type II error.

With the help of these four values, we can calculate the True Positive Rate (TPR), False Positive Rate (FPR), True Negative Rate (TNR), and False Negative Rate (FNR).
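
The four rates follow directly from the counts. The sketch below uses the numbers from the tumor example (TP = 10, TN = 60, FP = 22, FN = 8); the variable names are chosen here for illustration.

```python
# Counts from the tumor example (100 people in the test set).
tp, tn, fp, fn = 10, 60, 22, 8

actual_positives = tp + fn  # 18 people really have a tumor
actual_negatives = tn + fp  # 82 people really do not

tpr = tp / actual_positives  # share of actual positives that were found
fnr = fn / actual_positives  # share of actual positives that were missed
tnr = tn / actual_negatives  # share of actual negatives correctly cleared
fpr = fp / actual_negatives  # share of actual negatives wrongly flagged

print(f"TPR={tpr:.3f}  FNR={fnr:.3f}  TNR={tnr:.3f}  FPR={fpr:.3f}")
```

TPR and FNR always add up to 1, as do TNR and FPR, because each pair splits the same group of actual positives or actual negatives.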

With the help of TP, TN, FN, and FP, other performance metrics can be calculated.

Precision, Recall

Both precision and recall are crucial in information retrieval, where the positive class matters more than the negative class. Why?

When you search for something on the web, results that are irrelevant and not retrieved (the true negatives) are of no interest. Therefore only TP, FP, and FN are used in precision and recall.

Precision

Out of all the predicted positives, what fraction is truly positive: Precision = TP / (TP + FP).

Recall

Out of all the actual positives, what fraction is predicted positive: Recall = TP / (TP + FN). It is the same as TPR (true positive rate).
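
A small sketch of both metrics applied to the tumor example (TP = 10, FP = 22, FN = 8). The function names are hypothetical, and returning 0.0 when a denominator is zero is just one common convention, assumed here.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    predicted_positives = tp + fp
    return tp / predicted_positives if predicted_positives else 0.0

def recall(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    actual_positives = tp + fn
    return tp / actual_positives if actual_positives else 0.0

# Counts from the tumor example.
tp, fp, fn = 10, 22, 8

precision_value = precision(tp, fp)  # 10 of 32 predicted positives are correct
recall_value = recall(tp, fn)        # 10 of 18 actual positives are found

print(f"precision={precision_value:.4f}  recall={recall_value:.4f}")
```

In this example fewer than a third of the positive predictions are correct, and a little over half of the real tumors are found.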

Conclusion:

The confusion matrix, precision, recall, and F1 score (the harmonic mean of precision and recall) provide better insight into a model's predictions than accuracy alone. Precision, recall, and F1 score are applied in information retrieval, word segmentation, named entity recognition, and many other tasks.
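
To make the comparison concrete, the sketch below computes accuracy and F1 score for the tumor example (TP = 10, TN = 60, FP = 22, FN = 8). The variable names are chosen here for illustration.

```python
# Counts from the tumor example.
tp, tn, fp, fn = 10, 60, 22, 8

accuracy = (tp + tn) / (tp + tn + fp + fn)

precision_value = tp / (tp + fp)
recall_value = tp / (tp + fn)

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision_value * recall_value / (precision_value + recall_value)

print(f"accuracy={accuracy:.2f}  F1={f1:.2f}")
```

Accuracy comes out at 0.70, which looks acceptable, while F1 is only 0.40. The gap arises because the 60 true negatives raise accuracy but play no part in precision, recall, or F1.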

Thank you for reading!

Keep learning, keep sharing!
