Building a Catalog from Book Images Using Alibaba Cloud OSS and Model Studio

Introduction

In today's digital age, effectively managing and extracting information from visual content is essential, especially for libraries, bookstores, and personal collections. Leveraging cloud storage and AI technology can streamline this process dramatically.

In this blog, we'll guide you through creating a catalog system that reads book cover images stored in an Alibaba Cloud OSS bucket and generates a CSV file (catalog) with details like book title, author, and publisher. This approach combines the power of Alibaba Cloud OSS for scalable storage and the Qwen-VL-Plus model in Model Studio for intelligent information extraction from images. To see how it works, please watch this video.

1. Setting Up Alibaba Cloud OSS

What is Alibaba Cloud OSS?
- Alibaba Cloud OSS is a scalable and secure cloud storage service that allows users to store large amounts of unstructured data called objects.
Creating an OSS Bucket
- Creating Bucket needs a globally unique name and selection of region etc., as shown below where I created a bucket named 'bookcatalog' to store all my book images."

Uploading Images
- Once the bucket is created, we can upload book cover images to the OSS bucket from our local device such as PC, Laptop or Phone. In this demo, I have created a directory named “um” where I have uploaded the cover images of books, as shown below.

2. Introducing the Model Studio

Alibaba Cloud Model Studio is an all-in-one platform designed for foundation model development and application building. It enables both developers and business professionals to quickly engage in creating and deploying foundation model applications. Readers are encouraged to explore various options of Model studio.

For this blog, we only need to get the API key from the model studio that we will use in our Python program to interact with Qwen-VL-Plus model. To get the API key, follow the steps shown in the following figures:

What is the Qwen-VL-Plus Model?

Alibaba Cloud Qwen-VL-Plus model offers enhanced text extraction, organization, and summarization capabilities, support a wider range of image resolutions and aspect ratios, and improve visual reasoning for advanced decision-making. Additionally, it can analyze photos to solve complex problems, including step-by-step solutions for homework questions. We can test various Qwen models using the GUI of Model Studio, as shown below. However, for this blog we will use Python script to generate API calls to Qwen-VL-Plus model.

3. Integrating OSS with the Qwen Model

For this demo, we will use VS Code to write Python script to extract the information such as book name, author and publisher names from the book title pages stored in OSS bucket named "bookcatalog" under the directory "um".

Setting Up the Environment
- We will use the dotenv Python library to load environment variables, in our case it is the API key to make our application more secure and manageable.
- The .env file is a simple text file used to store environment variables like API keys, Access Keys, and other sensitive or configurable information outside of your main codebase. Each line contains a variable and its value in KEY=value format as shown below.

Making API Calls
- Some part of the code snippets that shows the necessary python library and other variables such as bucket name, region, number of images in the bucket etc. which needs to be configured before running the code. To access the full python code, please click on this link.

4. Running the Program and final Results

Python Libaraires.
- Before we execute the code, we need to install dashcope and dotenv libraries using the following: pip install dashscope python-dotenv
- Additionally, make sure you have your .env file in place to store API keys taken from Model Studio.

Storing Results in CSV Format
- Run main.py either from the terminal (python3 main.py) or directly from VS Code GUI. Once the execution is complete, a CSV file (book_info.csv) will be generated and stored in the same folder where we have our main.py. This file contain all the necessary information extracted from the book images.

Just to verify the results, I am showing the actual image of the book5 stored in OSS bucket:

The output of Qwen-VL-Plus for this image is: "The Art of War, Sun Tzu, Vintage Books" which is perfect.

5. Final thoughts

I have set ACL of objects to "Public-Read" to make it simple and to avoid the use of Access Keys. However, it is recommended to consider proper security measures.
A well-crafted prompt can significantly enhance the quality of responses from AI models. So, it is recommended to try different prompts and see the response.
Pay specific attention to the output if the desired information is missing on the front page of the book.

6. Conclusion

In conclusion, combining Alibaba Cloud OSS with Model Studio's Qwen-VL-Plus model enables a streamlined, automated solution for cataloging book collections directly from cover images. By storing images in OSS and leveraging AI to extract essential book details, we can efficiently generate organized, structured catalogs in CSV format. This approach not only saves time and reduces manual data entry but also offers scalable potential for larger collections. As AI and cloud services continue to evolve, such integrations will become increasingly valuable for organizations and individuals seeking intelligent solutions to manage and organize vast amounts of visual information.

Community

Building a Catalog from Book Images Using Alibaba Cloud OSS and Model Studio

Introduction

1. Setting Up Alibaba Cloud OSS

2. Introducing the Model Studio

What is the Qwen-VL-Plus Model?

3. Integrating OSS with the Qwen Model

4. Running the Program and final Results

5. Final thoughts

6. Conclusion

Read previous post:

Read next post:

JwdShah

You may also like

Comments

JwdShah

Related Products

Hybrid Cloud Distributed Storage

OSS(Object Storage Service)

Storage Capacity Unit

EasyDispatch for Field Service Management