Poster Generation
Select Layout for Each Group
For every identified group, the first step is determining the most suitable layout.
As detailed in the “Layout Selection Strategy” section, our objective is to align the aspect ratio distribution of the objects within the group with that of the layout. At the same time, we want to include as many objects as possible, so the two goals must be balanced.
Evaluating Candidate Layouts
For each group, we evaluate all candidate layouts. Knowing the number of rectangles in each layout, we select a corresponding number of objects from the group.
These objects’ bounding boxes are then used to compute the aspect ratio distribution for the group with the chosen layout. Then we calculate the cosine similarity between the aspect ratio distribution of the layout and the group.
Finally, layouts are ranked according to these similarity scores in descending order.
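The ranking step can be sketched as follows. This is a minimal illustration, not the notebook's code: the histogram bin edges (`BIN_EDGES`), the helper names, and the `(width, height)` box format are all assumptions, since the text does not specify how the aspect ratio distribution is binned.

```python
import math

# Hypothetical bin edges for aspect ratio (width / height); not from the source.
BIN_EDGES = [0.5, 0.75, 1.0, 1.5, 2.0]


def aspect_ratio_histogram(boxes, edges=BIN_EDGES):
    """Normalized histogram of width/height ratios for (width, height) boxes."""
    counts = [0] * (len(edges) + 1)
    for width, height in boxes:
        ratio = width / height
        # The bin index is the number of edges the ratio has reached or passed.
        counts[sum(ratio >= edge for edge in edges)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_layouts(layouts, objects):
    """Rank layouts by aspect ratio similarity, highest first.

    layouts: dict mapping layout name -> list of (width, height) rectangles.
    objects: list of (width, height) object boxes, assumed sorted best first.
    """
    ranked = []
    for name, rects in layouts.items():
        if len(rects) > len(objects):
            continue  # the group cannot fill this layout
        layout_hist = aspect_ratio_histogram(rects)
        group_hist = aspect_ratio_histogram(objects[:len(rects)])
        ranked.append((name, cosine_similarity(layout_hist, group_hist)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Because the group histogram is recomputed for each layout from the first `len(rects)` objects, the same group can score differently against layouts with different rectangle counts.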
Object Selection Strategy
To determine the aspect ratio similarity between a group and a layout, we select a number of objects from the group equivalent to the number of rectangles in the layout.
Our approach is straightforward: we disregard layouts whose rectangle count exceeds the group’s object count, and select the top num_rects objects by clip_iqa score.
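A minimal sketch of this selection rule, assuming each object is a dict with a `clip_iqa` key (the record format and the function name `select_top_objects` are illustrative, not taken from the notebook):

```python
def select_top_objects(objects, num_rects):
    """Return the num_rects objects with the highest clip_iqa score.

    Returns None when the layout has more rectangles than the group has
    objects, which signals that the layout should be skipped.
    """
    if num_rects > len(objects):
        return None
    by_quality = sorted(objects, key=lambda obj: obj["clip_iqa"], reverse=True)
    return by_quality[:num_rects]
```

Returning `None` rather than a shorter list keeps "layout too large" distinct from a valid selection.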
We initially explored a more complex strategy: selecting the most diverse num_rects objects from the group through k-means clustering, with num_clusters equal to num_rects, and then picking the image with the highest clip_iqa score within each cluster.
However, because this was time-consuming and had minimal impact on the results, we reverted to the simpler method.
Concluding the Layout Choice
After ranking the layouts, we prefer those with a high similarity score and a substantial rectangle count.
Each group is presented with a table like the following, where layout selection is informed by both the aspect ratio similarity score and the number of rectangles:
Layout | Num_Rects | AR_Sim | Center_Indices |
---|---|---|---|
2014-08-24.jpg | 21 | 0.288863 | [49, 55, 65, 166, 15, 92, 117, 110, 68, 122, …] |
2011-08-21.jpg | 30 | 0.233713 | [49, 55, 65, 166, 15, 92, 117, 110, 68, 122, …] |
2014-05-04.jpg | 26 | 0.216868 | [49, 55, 65, 166, 15, 92, 117, 110, 68, 122, …] |
2011-09-04.jpg | 35 | 0.198838 | [49, 55, 65, 166, 15, 92, 117, 110, 68, 122, …] |
2012-03-25.jpg | 43 | 0.196790 | [49, 55, 65, 166, 15, 92, 117, 110, 68, 122, …] |
Finally, the rectangles of the chosen layout are sorted by area in descending order, and the objects are sorted by clip_iqa score in descending order. This pairs larger rectangles with higher-scoring images.
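The pairing can be sketched as below. The dict keys (`w`, `h`, `clip_iqa`) and the function name are assumptions made for illustration.

```python
def assign_objects_to_rects(rects, objects):
    """Pair the largest rectangles with the highest-quality objects.

    rects: list of dicts with "w" and "h" keys.
    objects: list of dicts with a "clip_iqa" key.
    Returns a list of (rect, object) pairs, largest rectangle first.
    """
    rects_sorted = sorted(rects, key=lambda r: r["w"] * r["h"], reverse=True)
    objects_sorted = sorted(objects, key=lambda o: o["clip_iqa"], reverse=True)
    # zip stops at the shorter list, so surplus objects are simply left out.
    return list(zip(rects_sorted, objects_sorted))
```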
Refined Approach to Image Cropping
Rather than displaying the entire image within each rectangle, our goal is to highlight the target fashion objects. As outlined in the “Preprocess for Display” section, we use the human bounding box as a reference for cropping each object.
The most intricate aspect of this process is cropping the image in such a way that it not only encompasses the object but also includes an appropriate section of the human body. Simultaneously, it’s crucial to maintain the same aspect ratio as the corresponding rectangle to ensure that resizing the image to fit does not lead to distortion.
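The aspect ratio constraint can be illustrated with a simplified sketch. It is not the tuned strategy from the notebook: it only grows the object box to the target ratio and keeps the result inside a bounding region, which could be the person box or the full image. The function name `fit_crop` and the `(x0, y0, x1, y1)` box format are assumptions.

```python
def fit_crop(obj_box, bounds, target_ratio):
    """Compute a crop (x0, y0, x1, y1) whose width/height equals target_ratio.

    The crop is the smallest box of that ratio covering obj_box, shrunk if
    it would not fit inside bounds, then shifted so it stays within bounds.
    """
    ox0, oy0, ox1, oy1 = obj_box
    bx0, by0, bx1, by1 = bounds
    obj_w, obj_h = ox1 - ox0, oy1 - oy0

    # Smallest size with the target ratio that still covers the object.
    width = max(obj_w, obj_h * target_ratio)
    height = width / target_ratio

    # Shrink uniformly if the crop is larger than the allowed region.
    scale = min(1.0, (bx1 - bx0) / width, (by1 - by0) / height)
    width, height = width * scale, height * scale

    # Center on the object, then clamp the top-left corner into bounds.
    center_x, center_y = (ox0 + ox1) / 2, (oy0 + oy1) / 2
    x0 = min(max(center_x - width / 2, bx0), bx1 - width)
    y0 = min(max(center_y - height / 2, by0), by1 - height)
    return (x0, y0, x0 + width, y0 + height)
```

Because the crop already has the rectangle's aspect ratio, resizing it to the rectangle scales both axes by the same factor and introduces no distortion.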
This task requires extensive adjustment and fine-tuning. We have detailed the strategy within the code. The effectiveness of our approach is evident in the final posters, where the cropped images accurately reflect the desired focus and composition.
Enhanced Approach to Naming and Description
Leveraging Language Models for Style Mimicry
With the recent advances in Large Language Models (LLMs), a practical way to craft names and descriptions that emulate a specific style is to use GPTs.
Image source: Introducing GPTs
As detailed in the official introduction, GPTs empower us to forge customized versions of ChatGPT, integrating instructions, additional knowledge, and various skill combinations.
Vision Capability Integration
A key feature of GPTs is their vision capability. This opens up exciting possibilities, such as creating a chatbot that can emulate the writing style of notable individuals like Bill Cunningham.
When tasked with generating names and descriptions for posters, this chatbot can process the poster’s image alongside simple instructions. Utilizing its combined vision and language skills, the chatbot then crafts the desired name and description.
Creating “Robot Bill Cunningham”
Consequently, I created a GPT named “Robot Bill Cunningham” with the following configuration:
Utilizing a Dataset for Enhanced Knowledge
As discussed in the Layout Detection section, we’ve extracted text from each poster. By pairing this text with its corresponding poster, we’ve compiled a dataset of (image, text) pairs.
This dataset acts as an enriched knowledge base for “Robot Bill Cunningham”, enabling more accurate outputs.
Instructions
The final step involves detailing specific instructions to guide “Robot Bill Cunningham” in generating content that aligns with our objectives.
Bill Cunningham, renowned for his “On the Street” column in The New York Times, was a master at capturing the essence of street fashion through his photography. Your role as Bill Cunningham is to analyze and learn from a collection of fashion posters and their corresponding descriptions provided in the files “bill_template.zip” and “bill_cunningham_text.csv”. These materials contain a wealth of examples from Cunningham’s work, showcasing his unique talent for creating engaging titles and insightful descriptions that encapsulate the spirit of each fashion poster. Your task is to use this knowledge to generate creative and fitting names for fashion posters and provide descriptions that reflect Cunningham’s keen eye for urban style and individual expression. Focus on understanding the nuances of his style, the way he connected fashion to the rhythm of city life, and how he highlighted the extraordinary in the ordinary. This will enable you to create content that truly resonates with the essence of Bill Cunningham’s iconic work.
Advancing Usage of GPTs: Flexibility and Precision
One significant advantage of GPTs is the flexibility they offer in content creation.
After the poster is generated, we can ask the chatbot for several candidate names and descriptions and choose the most suitable one.
Furthermore, if the text doesn’t fit the poster due to length constraints, the chatbot can be prompted to generate a shorter or longer version that fits the available space.
When composing the instructions, it’s crucial to consider the complexity of the poster, which often features an array of objects and is far more intricate than a typical single image.
To guide the chatbot effectively, it’s beneficial to provide details about the object categories present in the poster, as well as highlight the specific fashion trend or theme intended for emphasis. This detailed briefing enables the chatbot to tailor its creative output more accurately, reflecting the nuances and focal points of the poster.
Example: “Futural Sunglasses”
To demonstrate the capability of this approach, below is an example of a name and description generated for the group titled “Futural Sunglasses”:
Field | Content |
---|---|
Title | Visionaries |
Description | This week’s fashion narrative is framed not by the garments but by the avant-garde eyewear perched on our city’s pioneers of style. As sunlight dances off the eclectic shapes and tints of their futuristic sunglasses, each pair speaks to the audacity of personal expression amidst the concrete landscape. |
Implementation Details
Input
Name | Description |
---|---|
ins_posts_clusters_selected_group_objects_quality_person.csv | CSV file containing the selected objects with their quality scores and person bounding boxes |
bill_template_processed_results_selected.json | The selected layouts and their corresponding aspect ratio distributions |
Process
Code | Description |
---|---|
codes/poster_generation/poster_generation.ipynb | Generate posters by selecting layout, cropping images, and generating name and description |
GPTs | Generate name and description |
Output
Name | Description |
---|---|
posters | A folder containing the generated posters |