The Role of Feature Normalization in IJEPA

https://news.ycombinator.com/rss Hits: 3
Summary

Elucidating the Role of Feature Normalization in IJEPA [arxiv]

How to run our code and reproduce our results

We use uv for dependency management.

Download the training datasets and NYU-Depth tar files:

    uv run download_dataset.py

This requires roughly 100GB of storage space.

Run the default training configuration, which trains a ~300m parameter ViT-Small with a patch size of 16 and a batch size of 320. This consumes ~22GB of VRAM and takes 116 hours (assuming validation logging is turned off):

    uv run main.py --config conf/small.yaml

Or resume a training run:

    uv run main.py --config /path/to/checkpoint/config.yaml --conf.resume_checkpoint_path /path/to/checkpoint/checkpointfile.pt

Or evaluate the IN1k validation performance of a pretrained model:

    uv run main.py --config /path/to/checkpoint/config.yaml --conf.resume_checkpoint_path /path/to/checkpoint/checkpointfile.pt --conf.mode validate

Or visualize features of a pretrained model:

    uv run main.py --config /path/to/checkpoint/config.yaml --conf.resume_checkpoint_path /path/to/checkpoint/checkpointfile.pt --conf.mode visualize-embeddings

Or plot the losses of a pretrained model:

    uv run main.py --config /path/to/checkpoint/config.yaml --conf.resume_checkpoint_path /path/to/checkpoint/checkpointfile.pt --conf.mode plot-sample-losses

Run tests:

    uv run python -m unittest

Gotchas

The code refers to token_ids. This is a LongTensor that contains 4 integers for each token: register ID, sample ID, height ID, and width ID.

Register ID is the index of the register if this token is a register and does not contain image data, or MASK_TOKEN_ID otherwise.

Sample ID is the unique index of the sample that this patch/register comes from.

Height ID is the index of this patch along the height of the patched image, or MASK_TOKEN_ID if this token is a register.

Width ID is the index of this patch along the width of the patched image, or MASK_TOKEN_ID if this token is a register.

We need to keep track of these IDs because unlike most ViT models, our model...
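
To make the token_ids layout described above concrete, here is a minimal sketch in plain Python. It is not the repository's code: plain lists stand in for the LongTensor, the value of MASK_TOKEN_ID (-1 here) is an assumption since the README does not state it, and the helper names build_token_ids and is_register, as well as the choice to place registers before patches, are hypothetical.

```python
# Assumed sentinel: the README names MASK_TOKEN_ID but does not give its value.
MASK_TOKEN_ID = -1


def build_token_ids(sample_id, num_registers, grid_height, grid_width):
    """Return one [register_id, sample_id, height_id, width_id] row per token.

    Hypothetical helper for illustration only; the real code stores these
    rows in a LongTensor.
    """
    rows = []
    # Register tokens carry no image data, so height and width ids are masked.
    for register_id in range(num_registers):
        rows.append([register_id, sample_id, MASK_TOKEN_ID, MASK_TOKEN_ID])
    # Patch tokens are not registers, so their register id is masked.
    for height_id in range(grid_height):
        for width_id in range(grid_width):
            rows.append([MASK_TOKEN_ID, sample_id, height_id, width_id])
    return rows


def is_register(row):
    """A token is a register exactly when its register id is not masked."""
    return row[0] != MASK_TOKEN_ID


# Two samples, each with 2 registers and a 2x2 grid of patches.
token_ids = build_token_ids(sample_id=0, num_registers=2, grid_height=2, grid_width=2)
token_ids += build_token_ids(sample_id=1, num_registers=2, grid_height=2, grid_width=2)

for row in token_ids:
    kind = "register" if is_register(row) else "patch"
    print(kind, row)
```

Because every row carries its own sample ID and position, tokens from several samples can sit in one flat list and still be traced back to their source image and grid location.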

First seen: 2025-08-16 01:24

Last seen: 2025-08-16 03:24