Can ChatGPT turn the tables with Ghibli?

On April 7, OpenAI was testing a watermark for images produced by its GPT-4o image generation model.

This ImageGen model, originally available only to ChatGPT Plus users, can render text inside images and produce realistic visuals. OpenAI said that the model achieves striking visual expression and good contextual understanding through joint training on large amounts of image and text data.

A week earlier, on April Fools' Day, OpenAI CEO Sam Altman had announced that ChatGPT image generation would be rolled out to all free users. Ghibli-style AI images quickly flooded the internet.

Immediately afterwards, Midjourney released its V7 image generation model in alpha testing. The new "draft mode" supports a conversational interface, real-time editing, and voice-driven generation. As a direct rival to OpenAI, Midjourney is unwilling to be outdone and is quietly competing with it.

"Ghibli" is the name of Hayao Miyazaki's animation studio and museum; the word refers to the hot wind that blows across the Sahara Desert. The style mainly combines gouache and watercolor. Its stories mostly revolve around nature, which it renders in muted, refined grays to create a light, gentle, comfortable, and quiet visual effect.

The style is also good at conveying mood through a single dominant color cast. In images where one color takes up most of the frame, it uses brushstrokes and subtle variations in hue to add depth. Character design emphasizes simplicity and a picture-book feel, with figures outlined in clean, economical lines.

OpenAI is testing watermarks on images generated by free users, while ChatGPT Plus users can save watermark-free images.

Today, let’s take a look at ChatGPT’s magic and put OpenAI’s Ghibli-style generation to the test.

prompt1: Beijing without round cypresses, spring, a sunny weekend, crowds moving through the streets, medium shot, Ghibli style

prompt2: Platform Nine and Three-Quarters shrouded in smoke, someone squinting, close-up, Ghibli style

prompt3: The Statue of Liberty working at a computer, wearing anti-blue-light glasses, her face showing the distress of an overworked wage laborer ("cows and horses" in Chinese slang), close-up, Ghibli style

The contenders in this evaluation are Jimeng, Keling, and ChatGPT. Along the way, we also look at each company's strengths.

Jimeng AI

Jimeng's text-to-image generation is very fast, averaging about 10 seconds.

It also supports adjusting the aspect ratio. After generation, you can select an image for editing, with functions such as upscaling, detail repair, partial repainting, video generation, image expansion, and eraser-based object removal.

The final generated images are as follows.

prompt1: Beijing without round cypresses, spring, a sunny weekend, crowds moving through the streets, medium shot, Ghibli style

prompt2: Platform Nine and Three-Quarters shrouded in smoke, someone squinting, close-up, Ghibli style

prompt3: The Statue of Liberty working at a computer, wearing anti-blue-light glasses, her face showing the distress of an overworked wage laborer, close-up, Ghibli style

Keling AI

Keling's wait is slightly longer than Jimeng's, at about 30 seconds per image.

However, Keling is better integrated with its surrounding ecosystem. There is a DeepSeek prompt optimization entry point in the upper right corner of the prompt input box, and once an image is generated, you can turn it into a video with one click. In other words, the path from text to image, and from image to video, is laid out clearly.

The final results are as follows.

prompt1: Beijing without round cypresses, spring, a sunny weekend, crowds moving through the streets, medium shot, Ghibli style

prompt2: Platform Nine and Three-Quarters shrouded in smoke, someone squinting, close-up, Ghibli style

prompt3: The Statue of Liberty working at a computer, wearing anti-blue-light glasses, her face showing the distress of an overworked wage laborer, close-up, Ghibli style

ChatGPT

According to OpenAI's official website, its text-to-image model DALL·E 3 is built natively on ChatGPT, which makes ChatGPT a natural place to brainstorm ideas. Just tell ChatGPT what you want to see, in anything from a simple sentence to a detailed paragraph.

Much as Keling does with DeepSeek, ChatGPT automatically generates tailored, detailed prompts for DALL·E 3.

It also supports small adjustments to an image: if you are broadly satisfied with a picture but some aspects are off, you can ask ChatGPT to fix them in a few words.

Click More on the right to see the option to create a picture. Select Create Picture and enter the prompt.

Overall, the operation is simple and the process is smooth. The basic look of the image emerges in about 30 seconds, but the whole process takes about 150 seconds on average.

Below are the results.

prompt1: Beijing without round cypresses, spring, a sunny weekend, crowds moving through the streets, medium shot, Ghibli style

prompt2: Platform Nine and Three-Quarters shrouded in smoke, someone squinting, close-up, Ghibli style

prompt3: The Statue of Liberty working at a computer, wearing anti-blue-light glasses, her face showing the distress of an overworked wage laborer, close-up, Ghibli style

Summary

Jimeng stands out with an average generation time of about 10 seconds, and this immediacy is a huge advantage for users who need to iterate on ideas quickly. However, speed often comes at the cost of control over detail. Judging by the results, Jimeng quickly captures the overall tone of the Ghibli style but falls somewhat short in conveying emotion and in layering the scene. This is especially clear in the complex "Beijing without round cypresses" scene of prompt1, where Jimeng's output fails to fully capture the delicate balance between muted, refined grays and a natural atmosphere.

In contrast, although Keling is slightly slower (about 30 seconds), it builds a complete pipeline from text to image to video through DeepSeek prompt optimization and its video generation capability. This ecosystem integration is especially suitable for users who need multimodal output, such as animation creators or short-video makers.

Judging by image quality, ChatGPT has the best grasp of the Ghibli style and controls both tone and emotion relatively accurately. For example, in prompt3, where the Statue of Liberty works at a computer, ChatGPT captured the subtle emotional tension between the "anti-blue-light glasses" and the distress of an overworked wage laborer, while keeping the lightness and gentleness of the Ghibli style.

This advantage stems from ChatGPT's prompt optimization mechanism. It automatically generates a more detailed description from the prompt the user enters, which improves the accuracy of the generated image. In addition, ChatGPT supports follow-up adjustments, letting users change details through simple natural-language descriptions, which further strengthens its competitiveness in creative expression.

As the examples on the official website show, the images ChatGPT generates are not limited to the Ghibli style; they also include highly detailed, imaginative, and creative pictures like those below.

As for copyright, images created with DALL·E 3 belong to the user and can be reprinted, sold, or merchandised without obtaining permission from OpenAI.

OpenAI has also confirmed that it is developing an ImageGen API. Developers will be able to use it to build their own products, such as educational tools and design assistance platforms, expanding the application scenarios of image generation models. Building this open ecosystem will promote the adoption of, and innovation in, AI image generation technology.

ChatGPT is taking a different approach this time. It seems to be showing that standing out in large language models alone is no longer enough, and that the time is right for a diversified strategy that integrates multiple ecosystems. With its products, it is cueing China's large models: it's your turn to play.

Special statement: The content above (including any videos, pictures, or audio) was uploaded and published by a user of Dafenghao, a self-media platform under Phoenix.com. The platform only provides information storage space services.

[Editor in charge: Peng Kunping PT135]
