Poster Generation

Select Layout for Each Group

For every identified group, the first step is determining the most suitable layout.

As detailed in the “Layout Selection Strategy” section, our objective is to align the aspect ratio distribution of the objects within the group with that of the layout. At the same time, we want to include as many objects as possible, so the two goals have to be balanced.

Evaluating Candidate Layouts

For each group, we evaluate every candidate layout. Since each layout has a known number of rectangles, we need a strategy for selecting that many objects from the group (described under “Object Selection Strategy” below).

The bounding boxes of the selected objects are used to compute the group’s aspect ratio distribution for that layout. We then calculate the cosine similarity between this distribution and the aspect ratio distribution of the layout’s rectangles.

Finally, layouts are ranked according to these similarity scores in descending order.
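A minimal sketch of this scoring step follows. The post does not say how the aspect ratio distribution is discretized, so this assumes a normalized histogram of width/height ratios over hypothetical bin edges (`AR_BIN_EDGES`); the function names and the `select` callback are illustrative, not taken from the notebook.

```python
import math
from bisect import bisect_right

# Hypothetical bin edges for width/height ratios; the real binning is not specified in the post.
AR_BIN_EDGES = [0.5, 0.75, 1.0, 1.33, 2.0]


def ar_histogram(sizes):
    """Normalized histogram of width/height ratios: one bin below the first edge,
    one per interval between edges, and one at or above the last edge."""
    counts = [0] * (len(AR_BIN_EDGES) + 1)
    for w, h in sizes:
        counts[bisect_right(AR_BIN_EDGES, w / h)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_layouts(objects, layouts, select):
    """objects: per-object records for one group (any type understood by `select`).
    layouts: dict mapping layout name -> list of (w, h) rectangles.
    select(objects, k): returns the (w, h) sizes of the k objects used to fill a layout.
    Returns (name, num_rects, similarity) tuples, highest similarity first."""
    results = []
    for name, rects in layouts.items():
        if len(rects) > len(objects):
            continue  # the group cannot fill this layout
        chosen = select(objects, len(rects))
        sim = cosine_similarity(ar_histogram(chosen), ar_histogram(rects))
        results.append((name, len(rects), sim))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

The returned tuples correspond to the Layout, Num_Rects, and AR_Sim columns of the per-group table shown further on.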

Object Selection Strategy

To determine the aspect ratio similarity between a group and a layout, we select a number of objects from the group equivalent to the number of rectangles in the layout.

Our approach is straightforward: we discard layouts that have more rectangles than the group has objects, and select the num_rects objects with the highest clip_iqa scores.
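The selection rule is simple enough to show directly. This sketch assumes each object is a dict carrying a `clip_iqa` score (the field name follows the text; the dict layout and the function name are hypothetical).

```python
import heapq


def select_top_objects(objects, num_rects):
    """Pick the num_rects objects with the highest clip_iqa score, highest first.
    Returns None when the group has fewer objects than the layout has rectangles,
    meaning the layout should be skipped."""
    if num_rects > len(objects):
        return None
    return heapq.nlargest(num_rects, objects, key=lambda obj: obj["clip_iqa"])
```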

We initially explored a more complex strategy: selecting the num_rects most diverse objects in the group by running k-means clustering with num_clusters equal to num_rects, then picking the image with the highest clip_iqa score within each cluster.

However, because it was time-consuming and had minimal impact on the results, we reverted to the simpler method.
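For reference, the abandoned k-means variant can be sketched as below. The post does not say which features were clustered, so `features` is assumed to be one embedding vector per object. A small NumPy implementation of Lloyd’s k-means keeps the example self-contained; a library implementation such as scikit-learn’s `KMeans` would be the usual choice. Running a clustering for every group and every candidate layout, instead of a single sort, is presumably where the extra cost came from.

```python
import numpy as np


def kmeans_labels(features, k, iters=20, seed=0):
    """Minimal Lloyd's k-means; returns a cluster label for every row of features."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign each object to its nearest center.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (empty clusters keep their old center).
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels


def select_diverse_objects(features, clip_iqa, k):
    """Cluster into k groups and keep the highest-clip_iqa object of each cluster.
    May return fewer than k indices if a cluster ends up empty."""
    clip_iqa = np.asarray(clip_iqa)
    labels = kmeans_labels(features, k)
    picked = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size:
            picked.append(int(idx[np.argmax(clip_iqa[idx])]))
    return picked
```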

Concluding the Layout Choice

After ranking, we favor layouts that have both a high similarity score and a large number of rectangles.

For each group we produce a table like the following, and the layout is chosen based on both the aspect ratio similarity score (AR_Sim) and the number of rectangles (Num_Rects):

Layout	Num_Rects	AR_Sim	Center_Indices
2014-08-24.jpg	21	0.288863	[49, 55, 65, 166, 15, 92, 117, 110, 68, 122, …]
2011-08-21.jpg	30	0.233713	[49, 55, 65, 166, 15, 92, 117, 110, 68, 122, …]
2014-05-04.jpg	26	0.216868	[49, 55, 65, 166, 15, 92, 117, 110, 68, 122, …]
2011-09-04.jpg	35	0.198838	[49, 55, 65, 166, 15, 92, 117, 110, 68, 122, …]
2012-03-25.jpg	43	0.196790	[49, 55, 65, 166, 15, 92, 117, 110, 68, 122, …]
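The post leaves the final trade-off to judgment. One hypothetical way to automate it is to keep every layout whose AR_Sim is within a tolerance of the best score and, among those, take the one with the most rectangles. The `tolerance` value and the function name are assumptions; the example rows are copied from the table above.

```python
def choose_layout(ranked, tolerance=0.05):
    """ranked: (layout, num_rects, ar_sim) tuples. Among layouts whose ar_sim is within
    `tolerance` of the best score, return the one with the most rectangles."""
    best_sim = max(sim for _, _, sim in ranked)
    candidates = [row for row in ranked if row[2] >= best_sim - tolerance]
    return max(candidates, key=lambda row: (row[1], row[2]))


rows = [
    ("2014-08-24.jpg", 21, 0.288863),
    ("2011-08-21.jpg", 30, 0.233713),
    ("2014-05-04.jpg", 26, 0.216868),
    ("2011-09-04.jpg", 35, 0.198838),
    ("2012-03-25.jpg", 43, 0.196790),
]
print(choose_layout(rows, tolerance=0.05)[0])  # 2014-08-24.jpg
print(choose_layout(rows, tolerance=0.06)[0])  # 2011-08-21.jpg
```

A small tolerance favors similarity; a larger one trades similarity for more rectangles (at 0.1, all five rows qualify and the 43-rectangle layout wins).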

Finally, the rectangles of the chosen layout are sorted by area in descending order, and the objects are sorted by clip_iqa score in descending order. Pairing the two lists in order ensures that larger rectangles receive images with higher clip_iqa scores.
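In code, this pairing amounts to two sorts and a zip. The `(x, y, w, h)` rectangle format and the object dicts are assumptions for illustration.

```python
def pair_rects_with_objects(rects, objects):
    """rects: (x, y, w, h) layout rectangles; objects: dicts with a 'clip_iqa' score.
    The largest rectangle gets the highest-scoring object, and so on down both lists."""
    rects_by_area = sorted(rects, key=lambda r: r[2] * r[3], reverse=True)
    objects_by_score = sorted(objects, key=lambda o: o["clip_iqa"], reverse=True)
    return list(zip(rects_by_area, objects_by_score))
```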

Refined Approach to Image Cropping

Rather than displaying the entire image within each rectangle, we want to highlight the target fashion objects. As outlined in the “Preprocess for Display” section, we use the human bounding box as a reference when cropping each object.

The trickiest part of this process is cropping the image so that it contains not only the object but also an appropriate portion of the human body. At the same time, the crop must have the same aspect ratio as its target rectangle, so that resizing it to fit does not distort the image.
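The notebook’s exact cropping rules are not reproduced here. As a rough illustration of the two constraints, the sketch below grows the object box part of the way toward the person box (the `context` fraction is a hypothetical knob), widens or heightens the result to the rectangle’s aspect ratio, shrinks it if it no longer fits the image, and finally shifts it inside the image bounds. Boxes are assumed to be `(x0, y0, x1, y1)` in pixels.

```python
def aspect_crop(obj_box, person_box, img_w, img_h, target_ar, context=0.25):
    """Return an (x0, y0, x1, y1) crop with width / height == target_ar that covers the
    object plus some of the person box, kept inside an img_w x img_h image.
    If the image is too small for that crop, the crop is shrunk and may cut the object."""
    # Move each edge of the object box `context` of the way toward the person box,
    # never shrinking the object box itself.
    x0 = min(obj_box[0], obj_box[0] + context * (person_box[0] - obj_box[0]))
    y0 = min(obj_box[1], obj_box[1] + context * (person_box[1] - obj_box[1]))
    x1 = max(obj_box[2], obj_box[2] + context * (person_box[2] - obj_box[2]))
    y1 = max(obj_box[3], obj_box[3] + context * (person_box[3] - obj_box[3]))
    w, h = x1 - x0, y1 - y0
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2

    # Grow the shorter dimension so the box matches the target aspect ratio.
    if w / h < target_ar:
        w = h * target_ar
    else:
        h = w / target_ar

    # Scale down uniformly if the box is larger than the image (keeps the ratio).
    scale = min(1.0, img_w / w, img_h / h)
    w, h = w * scale, h * scale

    # Center on the expanded region, then shift so the crop stays inside the image.
    x0 = min(max(cx - w / 2, 0), img_w - w)
    y0 = min(max(cy - h / 2, 0), img_h - h)
    return (x0, y0, x0 + w, y0 + h)
```

Because the crop already has the rectangle’s aspect ratio, resizing it to the rectangle only scales it, without stretching.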

This step required extensive adjustment and fine-tuning, and the full strategy is documented in the code. The final posters show the result: each crop keeps the intended focus on the fashion object and a balanced composition.

Enhanced Approach to Naming and Description

Leveraging Language Models for Style Mimicry

With the remarkable advancements in Large Language Models (LLMs), GPTs offer a natural way to craft names and descriptions that emulate a specific style.

Image source: Introducing GPTs

As described in the official introduction, GPTs let us create customized versions of ChatGPT that combine instructions, additional knowledge, and any combination of skills.

Vision Capability Integration

A key feature of GPTs is their vision capability. This makes it possible, for example, to build a chatbot that emulates the writing style of a notable figure such as Bill Cunningham.

When tasked with generating names and descriptions for posters, this chatbot can process the poster’s image alongside simple instructions. Utilizing its combined vision and language skills, the chatbot then crafts the desired name and description.

Creating “Robot Bill Cunningham”

I therefore created a GPT named “Robot Bill Cunningham” with the following configuration:

Utilizing a Dataset for Enhanced Knowledge

As discussed in the Layout Detection section, we extracted the text from each poster. Pairing each text with its poster gives us an (image, text) pair dataset.
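A minimal sketch of how such pairs could be serialized to CSV for upload as knowledge. The `image`/`text` column names, the input mapping, and the function name are hypothetical; the post only names the resulting files.

```python
import csv
import io


def pairs_to_csv(poster_texts):
    """poster_texts: mapping of poster file name -> text extracted from that poster.
    Returns CSV text with one (image, text) row per poster; column names are hypothetical."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["image", "text"])
    for name, text in sorted(poster_texts.items()):
        writer.writerow([name, text.strip()])  # csv quoting keeps multi-line text in one field
    return buf.getvalue()
```

Saving this output as `bill_cunningham_text.csv` and zipping the poster images (for instance with the standard `zipfile` module) would produce files like the two referenced in the instructions below.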

This dataset serves as the knowledge base for “Robot Bill Cunningham”, giving it real examples of Cunningham’s titles and descriptions to draw on so that its outputs are more accurate.

Instructions

The final step involves detailing specific instructions to guide “Robot Bill Cunningham” in generating content that aligns with our objectives.

Bill Cunningham, renowned for his “On the Street” column in The New York Times, was a master at capturing the essence of street fashion through his photography. Your role as Bill Cunningham is to analyze and learn from a collection of fashion posters and their corresponding descriptions provided in the files “bill_template.zip” and “bill_cunningham_text.csv”. These materials contain a wealth of examples from Cunningham’s work, showcasing his unique talent for creating engaging titles and insightful descriptions that encapsulate the spirit of each fashion poster. Your task is to use this knowledge to generate creative and fitting names for fashion posters and provide descriptions that reflect Cunningham’s keen eye for urban style and individual expression. Focus on understanding the nuances of his style, the way he connected fashion to the rhythm of city life, and how he highlighted the extraordinary in the ordinary. This will enable you to create content that truly resonates with the essence of Bill Cunningham’s iconic work.

Advancing Usage of GPTs: Flexibility and Precision

One of the significant advantages of using GPTs is the flexibility they offer in content creation.

Once the poster is generated, GPTs let us fine-tune the result by producing multiple versions of the name and description, from which the most suitable one is chosen.

Furthermore, if the text is too long to fit on the poster, the chatbot can be prompted to produce a shorter version that fits the available space.
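A rough way to decide when to ask for a shorter version is to wrap the text to a per-line character budget and count the lines. This is only an approximation, since real text width depends on the font (which the post does not specify); the `chars_per_line` and `max_lines` values and the prompt wording are hypothetical.

```python
import textwrap


def fits_text_box(text, chars_per_line, max_lines):
    """Approximate fit test: wrap to a fixed character width and count the lines."""
    return len(textwrap.wrap(text, width=chars_per_line)) <= max_lines


def shorten_prompt(text, chars_per_line, max_lines):
    """Follow-up message asking the chatbot for a version that fits the box."""
    budget = chars_per_line * max_lines
    return (
        "This description is too long for the poster. Rewrite it in the same voice "
        f"in at most {budget} characters:\n\n{text}"
    )
```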

When composing the instructions, keep in mind that a poster is far more complex than a typical single image, since it usually contains many objects.

To guide the chatbot effectively, it’s beneficial to provide details about the object categories present in the poster, as well as highlight the specific fashion trend or theme intended for emphasis. This detailed briefing enables the chatbot to tailor its creative output more accurately, reflecting the nuances and focal points of the poster.
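For instance, the per-poster message could be assembled from the group’s object categories and its theme, as in this hypothetical helper; the wording and parameters are illustrative, not the prompt actually used. Asking for several options at once also fits the workflow of generating multiple versions and picking the best.

```python
def build_poster_prompt(categories, theme, n_variants=3):
    """Per-poster message sent together with the poster image.
    categories: object categories present in the poster; theme: the trend to emphasize."""
    category_list = ", ".join(sorted(set(categories)))  # de-duplicate for a cleaner prompt
    return (
        f"This poster features {category_list}. "
        f"Emphasize the theme: {theme}. "
        f"Give {n_variants} options, each with a title and a short description "
        "in Bill Cunningham's style."
    )
```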

Example: “Futural Sunglasses”

To demonstrate the capability of this approach, below is an example of a name and description generated for the group titled “Futural Sunglasses”:

Title	Visionaries
Description	This week’s fashion narrative is framed not by the garments but by the avant-garde eyewear perched on our city’s pioneers of style. As sunlight dances off the eclectic shapes and tints of their futuristic sunglasses, each pair speaks to the audacity of personal expression amidst the concrete landscape.

Implementation Details

Input

Name	Description
`ins_posts_clusters_selected_group_objects_quality_person.csv`	csv file containing selected objects and their quality scores and person bounding boxes
`bill_template_processed_results_selected.json`	The selected layouts and their corresponding aspect ratio distributions

Process

Code	Description
`codes/poster_generation/poster_generation.ipynb`	Generate posters by selecting layout, cropping images, and generating name and description
GPTs	Generate name and description

Output

Name	Description
`posters`	A folder containing the generated posters