
Mercor Cheating Detection Kaggle Competition


This repository contains my solution for the Mercor Cheating Detection Kaggle competition. The goal of this competition is to predict whether a candidate is engaging in cheating behavior during online interviews, using anonymized behavioral features, platform activity signals, and a social graph. The evaluation metric is cost-sensitive, reflecting the real-world operational impact of false negatives, false positives, and manual review decisions.


📝 Overview

The submission requires predicting a cheating probability for each candidate, then choosing the decision thresholds that minimize total operational cost.
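Once held-out probabilities are available, a cost-minimizing threshold can be found with a simple grid scan. This is a minimal sketch: the cost weights `cost_fn` and `cost_fp` below are illustrative placeholders, not the competition's actual cost matrix.

```python
import numpy as np

def optimal_threshold(y_true, y_prob, cost_fn=10.0, cost_fp=1.0):
    """Scan candidate thresholds and return the one with the lowest
    total cost, where cost = cost_fn * (missed cheaters) +
    cost_fp * (honest candidates flagged)."""
    best_t, best_cost = 0.5, float("inf")
    for t in np.linspace(0.0, 1.0, 1001):
        pred = (y_prob >= t).astype(int)
        fn = np.sum((pred == 0) & (y_true == 1))  # missed cheaters
        fp = np.sum((pred == 1) & (y_true == 0))  # false alarms
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```

The same scan extends naturally to a second "manual review" threshold by adding a review cost term to the objective.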


📂 Repository Structure

```
├── Datasets/                                        # Raw and processed datasets (originals not included due to size)
│   ├── Graph_train.csv                              # Processed train dataset with graph features
│   ├── Graph_test.csv                               # Processed test dataset with graph features
│   ├── train.csv                                    # Original train dataset (download from Kaggle)
│   ├── test.csv                                     # Original test dataset (download from Kaggle)
│   └── referral_graph.csv                           # Edge list for the complete social network (download from Kaggle)
├── Notebooks/                                       # Experiment notebooks
│   ├── Mercor_Fraud_Add_Graph_Features.ipynb        # Feature engineering notebook
│   └── Mercor_Fraud_Models_Graph_Features.ipynb     # Model building notebook
└── README.md
```

🔧 Pipeline

1. Data Processing & Feature Engineering

2. Model Training & Stacking

3. Threshold Optimization & Evaluation

4. Submission
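Step 1's graph features can be sketched with networkx from the referral edge list. The column names (`referrer_id`, `candidate_id`) and the specific features are assumptions for illustration, not necessarily the schema or feature set used in the notebooks.

```python
import networkx as nx
import pandas as pd

# Toy edge list standing in for referral_graph.csv (assumed schema).
edges = pd.DataFrame({
    "referrer_id":  ["a", "a", "b", "c"],
    "candidate_id": ["b", "c", "d", "d"],
})

# Build a directed referral graph and derive per-node features.
G = nx.from_pandas_edgelist(edges, source="referrer_id",
                            target="candidate_id",
                            create_using=nx.DiGraph)
pagerank = nx.pagerank(G)

features = pd.DataFrame({
    "node":       list(G.nodes),
    "in_degree":  [G.in_degree(n) for n in G.nodes],   # referrals received
    "out_degree": [G.out_degree(n) for n in G.nodes],  # referrals made
    "pagerank":   [pagerank[n] for n in G.nodes],      # network centrality
})
```

Joining such a table onto `train.csv` / `test.csv` by candidate ID yields the `Graph_train.csv` / `Graph_test.csv` style of dataset described above.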


⚙️ Key Libraries

pandas, numpy, networkx, scikit-learn, catboost, lightgbm, xgboost
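Step 2's stacking can be illustrated with scikit-learn alone. The sketch below uses stand-in base learners; in the actual notebooks, CatBoost, LightGBM, and XGBoost models would slot into `estimators` the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the engineered feature table.
X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Base learners produce out-of-fold probabilities (cv=3) that a
# logistic-regression meta-model combines into the final prediction.
stack = StackingClassifier(
    estimators=[("gbm", GradientBoostingClassifier(random_state=42)),
                ("rf", RandomForestClassifier(random_state=42))],
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",
    cv=3,
)
stack.fit(X_tr, y_tr)
proba = stack.predict_proba(X_te)[:, 1]  # probabilities for thresholding
```

The resulting probabilities feed directly into the threshold-optimization step.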


💡 Highlights


📈 Results


🔗 References


👨‍💻 How to Run

Python version: Tested with Python 3.12.12 (used in Google Colab).

Datasets: Original datasets are not included due to size. You can download them from the official Kaggle competition page:
Mercor Cheating Detection - Data

If running locally:
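A typical local setup might look like the following; the package list is taken from the Key Libraries section above, and the notebook path from the repository structure (adjust as needed for your environment):

```shell
# Install dependencies (versions not pinned in this repo)
pip install pandas numpy networkx scikit-learn catboost lightgbm xgboost

# Place the Kaggle datasets under Datasets/, then open the notebooks
jupyter notebook Notebooks/Mercor_Fraud_Add_Graph_Features.ipynb
```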