MGIE: Apple's new AI tool launches prompt-based image editing

Apple's new open-source model, MLLM-Guided Image Editing (MGIE for short), lets you edit images by describing the changes you want. The artificial intelligence (AI) powered tool can also crop and brighten photos on demand, freeing users from having to learn sophisticated photo editing software.

MGIE is a model developed through a collaboration between Apple and researchers from the University of California, Santa Barbara. The model was presented in a paper accepted at the International Conference on Learning Representations (ICLR) 2024.

The paper reports that MGIE improves results on both automatic metrics and human evaluation while maintaining competitive inference efficiency.

The MGIE model can handle both simple and complex image editing tasks, such as changing the shape of objects in a photo or making them appear brighter. It combines two different functions of multimodal language models.

Firstly, it interprets user prompts, and secondly, it "imagines" what the edited image would look like. For instance, if a user requests a bluer sky in a photo, the model will increase the brightness of the sky portion of the image to create the desired effect.

Easy photo editing

Users can easily edit their photos with MGIE by simply typing out what they want to change about the picture.

For example, if a user wants to make an image of a pepperoni pizza look healthier, they can type “make it more healthy” and the model will add vegetable toppings. Similarly, if a user wants to brighten a photo of tigers in the Sahara, they can type “add more contrast to simulate more light” and the picture will appear brighter.

“Instead of brief but ambiguous guidance, MGIE derives explicit visual-aware intention and leads to reasonable image editing. We conduct extensive studies from various editing aspects and demonstrate that our MGIE effectively improves performance while maintaining competitive efficiency. We also believe the MLLM-guided framework can contribute to future vision-and-language research,” the researchers said in the paper.

According to VentureBeat, Apple has made MGIE available for download on GitHub and has released a web demo on Hugging Face Spaces. Its plans for the model beyond research are as yet unknown.

Many image generation platforms, such as OpenAI's DALL-E 3, can carry out basic photo editing tasks on the images they produce using text inputs. Additionally, Adobe, the creator of Photoshop, has its own AI editing model.

Apple's slow roll in the AI market

Apple, The Verge reports, has not been a significant player in the field of generative AI, unlike its competitors, Microsoft, Meta, and Google. However, Apple's CEO Tim Cook has stated that the company plans to introduce more AI features to its devices in the upcoming year.

In December, Apple researchers unveiled an open-source machine learning framework called MLX, which facilitates the training of AI models on Apple Silicon chips.

VentureBeat notes that although MGIE is considered a significant advancement, experts believe there is still a lot of work to be done to improve multimodal AI systems.

Nevertheless, development in this area is accelerating. If the buzz surrounding the release of MGIE is any indication, this form of assistive AI may soon become an essential creative partner.

Last week, Interesting Engineering reported that, shortly after the launch of the Apple Vision Pro, CEO Tim Cook hinted that the company will make a major artificial intelligence (AI) announcement later this year. He made the remarks during a call with analysts following the company's fiscal first-quarter earnings report.

“As we look ahead, we will continue to invest in these and other technologies that will shape the future,” said Cook.

“That includes artificial intelligence where we continue to spend a tremendous amount of time and effort, and we’re excited to share the details of our ongoing work in that space later this year.”

You can review the study for yourself on the preprint server arXiv.

Study abstract:

Instruction-based image editing improves the controllability and flexibility of image manipulation via natural commands without elaborate descriptions or regional masks. However, human instructions are sometimes too brief for current methods to capture and follow. Multimodal large language models (MLLMs) show promising capabilities in cross-modal understanding and visual-aware response generation via LMs. We investigate how MLLMs facilitate edit instructions and present MLLM-Guided Image Editing (MGIE). MGIE learns to derive expressive instructions and provides explicit guidance.

The editing model jointly captures this visual imagination and performs manipulation through end-to-end training. We evaluate various aspects of Photoshop-style modification, global photo optimization, and local editing. Extensive experimental results demonstrate that expressive instructions are crucial to instruction-based image editing, and our MGIE can lead to a notable improvement in automatic metrics and human evaluation while maintaining competitive inference efficiency.
