Data Source
Data comes from Polymarket, a prediction market platform where users trade on the outcomes of events.
Dataset Overview
- Sample Size: 825 traders
- Collection Period: November 2024
- Features: 20+ trading metrics
- Completeness: 100% valid data
Clustering Analysis
K-Means Algorithm
K-Means clustering segments 825 traders into 3 groups by minimizing within-cluster variance.
Algorithm Parameters
- Number of clusters (k): 3
- Initialization: k-means++
- Max iterations: 300
- Convergence tolerance: 1e-4
- Random seed: 42 (for reproducibility)
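For illustration, here is a minimal from-scratch NumPy sketch of K-Means using the parameters above: k=3, k-means++ seeding, at most 300 iterations, tolerance 1e-4, and seed 42. It is not the project's code. The stack lists scikit-learn, whose `KMeans` class exposes the same settings (`n_clusters`, `init`, `max_iter`, `tol`, `random_state`), but its stopping rule differs in detail, so results need not match exactly. The three synthetic blobs are placeholder data standing in for trader features.

```python
import numpy as np


def kmeans_pp_init(X: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Pick k initial centroids with k-means++ seeding."""
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]  # first centroid: a uniformly random point
    for _ in range(1, k):
        # Squared distance from each point to its nearest already-chosen centroid.
        d2 = np.min(((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Sample the next centroid with probability proportional to d2.
        centroids.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans(X: np.ndarray, k: int = 3, max_iter: int = 300, tol: float = 1e-4, seed: int = 42):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(X, k, rng)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Update step: move each centroid to the mean of its members
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Assumption: stop when the absolute centroid shift falls below tol.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Final assignment against the final centroids.
    labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    inertia = float(((X - centroids[labels]) ** 2).sum())  # within-cluster sum of squares
    return labels, centroids, inertia


# Placeholder data: three well-separated 2-D blobs of 50 points each.
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(loc=c, scale=0.1, size=(50, 2)) for c in (0.0, 100.0, 200.0)])
labels, centroids, inertia = kmeans(X, k=3)
```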
Quality Assessment
The silhouette score measures clustering quality by comparing how close each trader is to its own cluster versus the nearest other cluster (range: -1 to 1).
Interpretation: Scores above 0.4 indicate reasonable cluster separation.
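For one trader, the silhouette coefficient is s = (b - a) / max(a, b). Here a is the mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The reported score is the mean of s over all traders. Below is a from-scratch NumPy sketch with a toy four-point example. scikit-learn's `sklearn.metrics.silhouette_score` computes the same quantity. Scoring singleton clusters as 0 follows scikit-learn's convention.

```python
import numpy as np


def silhouette_score(X: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette coefficient; requires at least two clusters."""
    # Pairwise Euclidean distances between all points.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: score stays 0
        # a: mean distance to the other members of i's cluster (D[i, i] is 0).
        a = D[i, same].sum() / (same.sum() - 1)
        # b: smallest mean distance to the members of another cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return float(scores.mean())


# Toy data: two tight pairs of points, far apart from each other.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
score = silhouette_score(X, np.array([0, 0, 1, 1]))  # well separated: close to 0.9
```

Grouping the same points as `[0, 1, 0, 1]` pairs each point with a distant partner and yields a negative score. That is the sense in which values near -1 signal misassigned points.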
Dimensionality Reduction
Principal Component Analysis (PCA)
PCA projects the high-dimensional features onto 2D, choosing the two directions that preserve as much of the variance as possible.
PCA Results
- Input dimensions: 20 features
- Output dimensions: 2 principal components
- PC1 explained variance: ~45%
- PC2 explained variance: ~25%
- Cumulative variance: ~70%
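PCA can be sketched with a singular value decomposition of the centred feature matrix. The squared singular values give each component's share of the variance, which is what the PC1/PC2 percentages above report. This assumes the features have already been scaled. The random 825 × 20 matrix is placeholder data, so its explained-variance figures will not resemble the ~45% / ~25% above.

```python
import numpy as np


def pca_2d(X: np.ndarray):
    """Project X onto its first two principal components via SVD."""
    Xc = X - X.mean(axis=0)                       # PCA works on centred data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratios = S**2 / np.sum(S**2)                  # explained-variance ratio per component
    coords = Xc @ Vt[:2].T                        # 2-D coordinates, e.g. for a scatter plot
    return coords, ratios[:2], float(ratios[:2].sum())


# Placeholder data with the same shape as the trader matrix (825 traders x 20 features).
rng = np.random.default_rng(42)
X = rng.normal(size=(825, 20))
coords, ratios, cumulative = pca_2d(X)
```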
Data Normalization
Robust Scaling
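Robust Scaling rescales each feature using its 5th and 95th percentiles (p05, p95) instead of its minimum and maximum, so a few extreme traders do not compress everyone else's values.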
Formula
scaled_value = (value - p05) / (p95 - p05)
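The formula applied to one feature column in NumPy. This is a percentile-based min-max rescaling. It is not scikit-learn's `RobustScaler`, which by default centres on the median and divides by the interquartile range. The formula above does not mention clipping, so values beyond the percentiles fall outside [0, 1]. Returning zeros for a constant column is an added assumption to avoid dividing by zero.

```python
import numpy as np


def robust_scale(values: np.ndarray) -> np.ndarray:
    """Map the 5th percentile to 0 and the 95th percentile to 1 (no clipping)."""
    p05, p95 = np.percentile(values, [5, 95])
    if p95 == p05:
        return np.zeros_like(values, dtype=float)  # constant feature: avoid division by zero
    return (values - p05) / (p95 - p05)


values = np.arange(101, dtype=float)  # 0, 1, ..., 100, so p05 = 5 and p95 = 95
scaled = robust_scale(values)
```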
Z-Score Normalization
Standardizes each feature to mean 0 and standard deviation 1:
Formula
z_score = (value - mean) / std
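A per-column version of the z-score formula in NumPy. It uses the population standard deviation (ddof=0), as scikit-learn's `StandardScaler` does. Mapping zero-variance columns to 0 is an added assumption to avoid dividing by zero.

```python
import numpy as np


def z_score(X: np.ndarray) -> np.ndarray:
    """Standardize each column to mean 0 and standard deviation 1."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)   # population standard deviation (ddof=0)
    std[std == 0] = 1.0   # constant columns become 0 instead of dividing by 0
    return (X - mean) / std


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z = z_score(X)  # both columns map to the same standardized values
```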
Network Analysis
Network Construction: k-NN
The network is built with k-nearest neighbors: each trader is connected to the k traders closest to it by Euclidean distance in feature space.
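The source does not state k or how the one-directional neighbor lists become an undirected network. This sketch therefore takes k as a parameter and keeps an edge if either endpoint lists the other among its k nearest neighbors (a union, or "OR", rule). Edges are unweighted. These choices are assumptions, not the project's confirmed construction. The one-dimensional toy points and k=1 are illustrative only.

```python
import numpy as np


def knn_edges(X: np.ndarray, k: int) -> set[tuple[int, int]]:
    """Undirected k-NN edges as (smaller index, larger index) pairs."""
    # Pairwise Euclidean distances in feature space.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbor
    edges = set()
    for i in range(len(X)):
        for j in np.argsort(D[i])[:k]:
            # Union rule: an edge exists if either endpoint picks the other.
            edges.add((min(i, int(j)), max(i, int(j))))
    return edges


X = np.array([[0.0], [1.0], [3.0], [10.0], [12.0]])
edges = knn_edges(X, k=1)  # hypothetical k for this toy example
```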
Community Detection: Louvain
The Louvain algorithm detects communities by greedily optimizing modularity.
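Louvain works in two alternating phases. First it moves individual nodes between communities, then it merges communities into super-nodes. It makes a change only when that change increases the modularity Q = (1/2m) Σ_ij [A_ij - k_i k_j / 2m] δ(c_i, c_j), where k_i is node i's degree and m is the edge count. The sketch below only computes Q for a given partition and does not run Louvain itself. networkx, listed in the stack, provides `louvain_communities` for that. The toy graph is two triangles joined by one edge.

```python
import numpy as np


def modularity(A: np.ndarray, communities: np.ndarray) -> float:
    """Modularity Q of a partition of an undirected, unweighted graph.

    A is a symmetric adjacency matrix; communities[i] is node i's community id.
    """
    degrees = A.sum(axis=1)
    two_m = degrees.sum()                                  # 2m: twice the number of edges
    same = communities[:, None] == communities[None, :]    # delta(c_i, c_j)
    expected = np.outer(degrees, degrees) / two_m          # k_i * k_j / 2m
    return float(((A - expected) * same).sum() / two_m)


# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
Q = modularity(A, np.array([0, 0, 0, 1, 1, 1]))  # 5/14, about 0.357
```

Putting every node in one community gives Q = 0. Louvain looks for partitions that push Q above that baseline.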
Technology Stack
Frontend
- HTML5 - Semantic markup
- CSS3 - Design system with variables
- JavaScript ES6+ - Modular architecture
- ECharts 5.4.3 - Data visualization
Data Processing
- Python - Data collection & analysis
- scikit-learn - Machine learning
- pandas - Data manipulation
- networkx - Network analysis
Limitations
- Data snapshot from November 2024
- Sample size: 825 traders
- Limited to available metrics
- k=3 chosen empirically