Methodology

Technical details of data collection, analysis methods, and algorithms

Data Source

Data comes from Polymarket, a prediction market platform where users trade on event outcomes.

Dataset Overview
  • Sample Size: 825 traders
  • Collection Period: November 2024
  • Features: 20+ trading metrics
  • Completeness: 100% valid data

Clustering Analysis

K-Means Algorithm

K-Means clustering segments 825 traders into 3 groups by minimizing within-cluster variance.

Algorithm Parameters
  • Number of clusters (k): 3
  • Initialization: k-means++
  • Max iterations: 300
  • Convergence tolerance: 1e-4
  • Random seed: 42 (reproducibility)
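
The parameters above map directly onto scikit-learn's KMeans. The following is a minimal sketch, not the project's actual code: the feature matrix X is synthetic random data standing in for the real trader metrics, and n_init=10 is an assumption, since the number of restarts is not stated here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the trader feature matrix (825 traders x 20 features).
rng = np.random.default_rng(42)
X = rng.normal(size=(825, 20))

kmeans = KMeans(
    n_clusters=3,        # k = 3
    init="k-means++",    # initialization scheme
    n_init=10,           # assumption: number of restarts is not given in the text
    max_iter=300,        # max iterations
    tol=1e-4,            # convergence tolerance
    random_state=42,     # fixed seed for reproducibility
)
labels = kmeans.fit_predict(X)

# inertia_ is the within-cluster sum of squares that K-Means minimizes.
print(labels.shape, kmeans.cluster_centers_.shape, kmeans.inertia_)
```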

Quality Assessment

The Silhouette Score measures how well each trader fits its assigned cluster compared with the nearest other cluster (range: -1 to 1, higher is better).

Interpretation: Scores above 0.4 indicate reasonable cluster separation.
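
As an illustration of how such a score can be computed, the sketch below uses scikit-learn's silhouette_score on three synthetic, well-separated blobs. The data and blob centers are made up for the example, so the resulting value says nothing about the score of the actual trader clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Three synthetic, well-separated 2D blobs (illustrative data only).
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(size=(100, 2)) for c in centers])

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Mean silhouette coefficient over all samples, always within [-1, 1].
score = silhouette_score(X, labels)
print(f"silhouette score: {score:.3f}")
```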

Dimensionality Reduction

Principal Component Analysis (PCA)

PCA projects the high-dimensional feature space onto 2 dimensions while retaining as much of the variance as possible.

PCA Results
  • Input dimensions: 20 features
  • Output dimensions: 2 principal components
  • PC1 explained variance: ~45%
  • PC2 explained variance: ~25%
  • Cumulative variance: ~70%
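
A projection of this kind can be sketched with scikit-learn's PCA. The input here is random placeholder data, so the explained-variance ratios it prints will not match the figures above. Standardizing before PCA is also an assumption, because the order of preprocessing steps is not stated here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix (825 traders x 20 features).
rng = np.random.default_rng(42)
X = rng.normal(size=(825, 20))

# Assumption: features are standardized first so no single metric dominates.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
coords = pca.fit_transform(X_scaled)      # 2D coordinates, one row per trader

explained = pca.explained_variance_ratio_  # share of variance for PC1 and PC2
cumulative = explained.sum()               # share retained by the 2D projection
print(coords.shape, explained, cumulative)
```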

Data Normalization

Robust Scaling

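Robust Scaling rescales each feature using its 5th and 95th percentiles (p05, p95) instead of its minimum and maximum, so extreme outliers do not compress the rest of the distribution.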

Formula

scaled_value = (value - p05) / (p95 - p05)
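
A minimal NumPy sketch of this formula follows. The helper name robust_scale is hypothetical, and returning zeros for a constant feature is an assumption made to avoid division by zero. Values below p05 or above p95 map outside the 0 to 1 range. Whether they are clipped afterwards is not stated here, so the sketch leaves them unclipped.

```python
import numpy as np

def robust_scale(values):
    """Scale values so that p05 maps to 0 and p95 maps to 1 (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    p05, p95 = np.percentile(values, [5, 95])
    spread = p95 - p05
    if spread == 0:
        # Assumption: a constant feature carries no information, so return zeros.
        return np.zeros_like(values)
    return (values - p05) / spread

# Example: the integers 0..100 have p05 = 5 and p95 = 95.
scaled = robust_scale(np.arange(101))
print(scaled[[0, 5, 95, 100]])
```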

Z-Score Normalization

Z-score normalization standardizes each feature to mean=0, std=1:

Formula

z_score = (value - mean) / std
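
The same formula in NumPy, as a short sketch. The helper name z_score is hypothetical. The population standard deviation (ddof=0) is used, which matches scikit-learn's StandardScaler, but which variant the analysis uses is not stated here.

```python
import numpy as np

def z_score(values):
    """Standardize values to mean 0 and standard deviation 1 (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    std = values.std()  # population standard deviation (ddof=0)
    if std == 0:
        # Assumption: a constant feature is mapped to zeros.
        return np.zeros_like(values)
    return (values - values.mean()) / std

z = z_score([1, 2, 3, 4, 5])
print(z, z.mean(), z.std())
```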

Network Analysis

Network Construction: k-NN

k-Nearest Neighbors (k-NN) builds the network by linking each trader to its k closest traders, measured by Euclidean distance in feature space.
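
One way to build such a graph is with scikit-learn's NearestNeighbors and networkx, sketched below. The value k = 5, the synthetic data, and the choice to store distances as edge weights are all assumptions, since none of them is specified here.

```python
import numpy as np
import networkx as nx
from sklearn.neighbors import NearestNeighbors

# Placeholder feature matrix (50 traders x 5 features).
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 5))
k = 5  # assumption: the number of neighbors is not given in the text

# Ask for k + 1 neighbors because each point is returned as its own nearest neighbor.
nn = NearestNeighbors(n_neighbors=k + 1, metric="euclidean").fit(X)
distances, indices = nn.kneighbors(X)

G = nx.Graph()
G.add_nodes_from(range(len(X)))
for i, (dists, nbrs) in enumerate(zip(distances, indices)):
    for d, j in zip(dists, nbrs):
        if i != int(j):  # skip the self-match
            G.add_edge(i, int(j), weight=float(d))

print(G.number_of_nodes(), G.number_of_edges())
```

Because the graph is undirected, a trader can end up with more than k edges when other traders also select it as a neighbor.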

Community Detection: Louvain

The Louvain algorithm detects communities by optimizing modularity.
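
networkx provides a Louvain implementation as louvain_communities (available in networkx 2.8 and later). The sketch below runs it on a toy graph of two dense groups joined by a single edge. The graph and the seed are illustrative assumptions, not the trader network itself.

```python
from itertools import combinations

import networkx as nx

# Toy graph: two fully connected groups of 5 nodes, linked by one bridge edge.
G = nx.Graph()
G.add_edges_from(combinations(range(0, 5), 2))
G.add_edges_from(combinations(range(5, 10), 2))
G.add_edge(4, 5)

# Each community is returned as a set of node ids.
communities = nx.community.louvain_communities(G, seed=42)

# Modularity of the detected partition (the quantity Louvain tries to maximize).
score = nx.community.modularity(G, communities)
print(communities, score)
```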

Technology Stack

Frontend

  • HTML5 - Semantic markup
  • CSS3 - Design system with variables
  • JavaScript ES6+ - Modular architecture
  • ECharts 5.4.3 - Data visualization

Data Processing

  • Python - Data collection & analysis
  • scikit-learn - Machine learning
  • pandas - Data manipulation
  • networkx - Network analysis

Limitations

  • Data snapshot from November 2024
  • Sample size: 825 traders
  • Limited to available metrics
  • k=3 chosen empirically