Paper Notes

Reading notes and analysis on papers I find interesting.

Object Detection

DINO Emerging Properties in Self-Supervised Vision Transformers
Stable DINO Stable self-distillation with no labels
SAM-DETR++ Semantic-Aligned Matching for DETR Convergence
VLDet Open-Vocabulary Detection with Vision-Language Pre-Training

Transformer Architecture

Revisiting [CLS] Token decoupling in Vision Transformers
Native Sparse Attention Hardware-aligned trainable sparse attention

Large Language Models

Opus Optimizer-Aware Dynamic Data Selection for LLMs
GLM-5 Report From Vibe Coding to Agent Engineering