Object Embedding for Cluster Analysis

Before running cluster analysis on the selected fashion-related objects, we first need to vectorize them. For this, DreamSim, an open-source image embedding model, is used to extract visual embeddings from both the full image and each detected object.

Image source: DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data
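The extraction step (one embedding for the full image plus one per detected object) can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it assumes bounding boxes are absolute pixel coordinates `(x1, y1, x2, y2)` (the real schema of the `bboxes` JSON files is not shown here), and the helper names `crop_objects`, `embed_image_and_objects`, and `make_dreamsim_embed_fn` are hypothetical. The embedding function is passed in as a parameter so the cropping logic can be exercised without downloading the model; `make_dreamsim_embed_fn` wraps DreamSim following the `dreamsim(pretrained=True, ...)` / `model.embed(...)` usage from its README.

```python
import numpy as np


def crop_objects(image, boxes):
    """Crop each (x1, y1, x2, y2) pixel box out of an H x W x C image array.

    Box format is an assumption; coordinates are clipped to the image bounds.
    Degenerate boxes yield empty crops, which a real model cannot embed.
    """
    h, w = image.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        x1, x2 = max(0, int(x1)), min(w, int(x2))
        y1, y2 = max(0, int(y1)), min(h, int(y2))
        crops.append(image[y1:y2, x1:x2])
    return crops


def embed_image_and_objects(image, boxes, embed_fn):
    """Return a (1 + len(boxes), dim) array: row 0 is the full image,
    followed by one row per detected object in box order."""
    inputs = [image] + crop_objects(image, boxes)
    rows = [np.asarray(embed_fn(x), dtype=np.float32).reshape(-1) for x in inputs]
    return np.stack(rows)


def make_dreamsim_embed_fn(device="cpu"):
    """Build an embed_fn backed by DreamSim (requires `pip install dreamsim`)."""
    import torch
    from PIL import Image
    from dreamsim import dreamsim

    model, preprocess = dreamsim(pretrained=True, device=device)

    def embed_fn(array):
        # Convert the uint8 array back to a PIL image for DreamSim's preprocess.
        img = Image.fromarray(array).convert("RGB")
        with torch.no_grad():
            return model.embed(preprocess(img).to(device)).cpu().numpy()

    return embed_fn
```

Injecting `embed_fn` keeps the model out of the cropping code, so the same helper can be reused with a different embedding model if needed.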

According to its authors, DreamSim aligns more closely with human similarity judgments than existing perceptual metrics, which makes it well suited to downstream applications such as image retrieval.
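To make the retrieval use case concrete, here is a minimal sketch of nearest-neighbor lookup by cosine similarity over embedding vectors. It is an illustration under the assumption that embeddings are plain NumPy arrays (for example, rows loaded from the per-post `.npy` files); the function name `cosine_top_k` is hypothetical and not part of DreamSim.

```python
import numpy as np


def cosine_top_k(query, gallery, k=5):
    """Return indices and cosine similarities of the k gallery rows closest to query.

    query: (dim,) vector; gallery: (N, dim) matrix of embeddings.
    """
    q = np.asarray(query, dtype=np.float64)
    g = np.asarray(gallery, dtype=np.float64)
    q = q / np.linalg.norm(q)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    sims = g @ q                   # cosine similarity of each gallery row
    order = np.argsort(-sims)[:k]  # highest similarity first
    return order, sims[order]
```

For large galleries, a dedicated vector index would replace the brute-force matrix product, but the ranking logic is the same.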

This embedding step runs immediately after open-set object detection, before objects are grouped and filtered. Because the number of objects is manageable, embeddings can be extracted for every object first, with filtering applied afterward.

This ordering makes the pipeline more flexible: the filtering criteria can be adjusted at any time without recomputing embeddings. Had the embeddings been extracted after grouping and filtering, any change to the criteria could require extracting embeddings for objects that had previously been excluded.
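The embed-first, filter-later ordering can be sketched as selecting rows from a cached embedding matrix. This assumes each object carries a detector label (hypothetical per-object metadata), and `filter_cached` is an illustrative name; changing the `keep` predicate only re-indexes the cache, so no embeddings are recomputed.

```python
import numpy as np


def filter_cached(embeddings, labels, keep):
    """Select cached object embeddings whose label satisfies `keep`.

    embeddings: (N, dim) array computed once, right after detection.
    labels: N per-object labels (assumed to come from the detector).
    keep: predicate on a label; swapping it requires no re-embedding.
    Returns the selected rows and their original indices.
    """
    idx = [i for i, label in enumerate(labels) if keep(label)]
    return embeddings[idx], idx
```

For example, `filter_cached(emb, labels, lambda l: l == "dress")` and `filter_cached(emb, labels, lambda l: l in {"bag", "shoe"})` can both run against the same cached `emb`.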

For datasets with a very large number of objects, however, filtering before embedding may be more efficient, especially when the target objects make up only a small fraction of the total.
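The filter-first alternative can be sketched as calling the (expensive) embedding function only for target objects. As above, the per-object labels and the function name `embed_targets_only` are assumptions for illustration; results are keyed by the object's original index so they can still be matched back to its bounding box.

```python
def embed_targets_only(crops, labels, targets, embed_fn):
    """Embed only the crops whose label is in `targets`.

    Returns {original_object_index: embedding}; non-target objects are
    never passed to embed_fn, which is where the savings come from.
    """
    return {
        i: embed_fn(crop)
        for i, (crop, label) in enumerate(zip(crops, labels))
        if label in targets
    }
```

The trade-off is the one described in the text: widening `targets` later means embedding the newly included objects from scratch.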

Each embedding produced by this process has 1,792 dimensions. DreamSim must be set up according to the instructions in its repository before running the notebook. Fortunately, setup is straightforward, since the package can be installed with pip (`pip install dreamsim`).
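Since downstream clustering expects a fixed width, a cheap sanity check on each per-post array can catch setup mistakes early (for example, a different DreamSim variant producing another size). This is a hypothetical helper; the row layout it checks (row 0 for the full image, then one row per detected object) is an assumption about how the notebook stores results, not something documented here.

```python
import numpy as np

DREAMSIM_DIM = 1792  # embedding size stated for this pipeline


def check_embeddings(arr, n_objects, dim=DREAMSIM_DIM):
    """Verify an array holds one full-image row plus one row per object.

    Raises ValueError on a shape mismatch or non-finite values.
    """
    arr = np.asarray(arr)
    expected = (1 + n_objects, dim)
    if arr.shape != expected:
        raise ValueError(f"expected shape {expected}, got {arr.shape}")
    if not np.isfinite(arr).all():
        raise ValueError("embeddings contain NaN or inf")
    return arr
```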

Implementation Details

Input

Name	Description
`ins_posts/<username>/bboxes`	Open-set object bounding boxes of Instagram posts

Process

Code	Description
`codes/cluster_analysis/object_embedding.ipynb`	Extract visual embeddings for full images and objects

Output

Name	Description
`ins_posts/<username>/embeddings`	Visual embeddings of full images and objects

Folder Structure:

ins_posts_3
├── AylaDimitri
│   ├── AylaDimitri_posts.json
│   ├── AylaDimitri_profile.json
│   ├── bboxes
│   │   ├── 008a4a567a8c4a5f5f0cb06ec0dc92e8.json
│   │   ├── ...
│   │   └── fcff621d25e77ac24889686453e1befe.json
│   ├── embeddings
│   │   ├── 008a4a567a8c4a5f5f0cb06ec0dc92e8.npy
│   │   ├── ...
│   │   └── fcff621d25e77ac24889686453e1befe.npy
│   └── images
│       ├── 008a4a567a8c4a5f5f0cb06ec0dc92e8.jpg
│       ├── ...
│       └── fcff621d25e77ac24889686453e1befe.jpg
└── xeniaadonts
    ├── bboxes
    │   ├── 07d8c562f6d1ee6f1a2bdb1453e912d7.json
    │   ├── ...
    │   └── fbeab7c9d911db651ad4bc1d3bc25062.json
    ├── embeddings
    │   ├── 07d8c562f6d1ee6f1a2bdb1453e912d7.npy
    │   ├── ...
    │   └── fbeab7c9d911db651ad4bc1d3bc25062.npy
    ├── images
    │   ├── 07d8c562f6d1ee6f1a2bdb1453e912d7.jpg
    │   ├── ...
    │   └── fbeab7c9d911db651ad4bc1d3bc25062.jpg
    ├── xeniaadonts_posts.json
    └── xeniaadonts_profile.json
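The layout above pairs files by post ID across `bboxes/`, `images/`, and `embeddings/`. A minimal sketch of a per-user loop that follows this convention is shown below. It is not the notebook's code: `embed_user` and the `embed_post(image_path, bbox_data)` callback are hypothetical (the callback would wrap image loading, cropping, and DreamSim), the bbox JSON is passed through without assuming its schema, and the `.jpg` extension is taken from the tree. Skipping posts whose `.npy` already exists is a design choice so that reruns only process new posts.

```python
import json
from pathlib import Path

import numpy as np


def embed_user(user_dir, embed_post):
    """Write embeddings/<post_id>.npy for every bboxes/<post_id>.json in user_dir.

    embed_post(image_path, bbox_data) must return a (1 + N, dim) array.
    Returns the list of post IDs written in this call.
    """
    user_dir = Path(user_dir)
    out_dir = user_dir / "embeddings"
    out_dir.mkdir(exist_ok=True)
    written = []
    for bbox_path in sorted((user_dir / "bboxes").glob("*.json")):
        post_id = bbox_path.stem
        out_path = out_dir / f"{post_id}.npy"
        if out_path.exists():
            continue  # already embedded on a previous run
        bbox_data = json.loads(bbox_path.read_text())
        image_path = user_dir / "images" / f"{post_id}.jpg"
        arr = np.asarray(embed_post(image_path, bbox_data), dtype=np.float32)
        np.save(out_path, arr)
        written.append(post_id)
    return written
```

To cover all users, iterate over the subdirectories of the posts root (for example, `[embed_user(d, embed_post) for d in Path(root).iterdir() if d.is_dir()]`).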