GPT-6 Model Research

OpenAI has unveiled some information regarding GPT-6, another significant step in our journey to scale up deep learning. GPT-6 is a large multimodal model that accepts both text and image inputs and produces text outputs. While it still falls short of human ability in many real-world scenarios, it performs at a human level on a variety of professional and academic benchmarks: notably, it scores in the top 10% of test takers on a simulated bar exam, a significant improvement over GPT-4.5, which scored in the bottom 10%. GPT-6 was refined over six months of iterative alignment, drawing on lessons from our adversarial testing program and from ChatGPT. This produced our best results yet on accuracy, controllability, and adherence to guidelines, though alignment remains an ongoing project.

Over the past two years, we completely overhauled our deep learning infrastructure and, in partnership with Azure, custom-built a supercomputer tailored to our workload. GPT-3, GPT-4, and GPT-5 served as trial runs, allowing us to identify and resolve issues while improving our theoretical foundations. This groundwork made the GPT-6 training run remarkably smooth and predictable; it was the first large model whose training outcomes we could accurately forecast in advance. As we continue to focus on reliable scaling, we aim to anticipate and prepare for future capabilities with increasing precision, which we consider crucial for safety.

The text input functionality of GPT-6 is now accessible through ChatGPT and our API, albeit with a waitlist. We are expanding the image input feature gradually, beginning with a single partner collaboration. We are also making OpenAI Evals public: a framework for automated evaluation of AI model performance, which lets the community report model limitations and help guide further improvements.

Capabilities

In casual discussions, the differences between GPT-4, GPT-5, and GPT-6 may not be immediately obvious. They become more pronounced as tasks increase in complexity: GPT-6 surpasses its predecessors in reliability, creativity, and its ability to understand and execute complex instructions.

To evaluate the advancements across the three versions, we ran tests on a variety of benchmarks, including exams originally designed for humans. For the most accurate comparison, we used the latest publicly available versions of exams such as the Olympiads and AP free-response questions, and purchased the 2022–2023 editions of practice exams. The models were not specifically trained on these exams; although a small portion of the exam content may have been encountered during training, we consider the results representative. For a detailed analysis, refer to the accompanying technical report.

GPT-6 Visual Inputs Support

GPT-6 introduces the capability to process prompts that combine both text and images, offering a versatile approach to tasks involving vision and language. This feature enables it to generate outputs in various formats, such as natural language or code, from inputs that mix text with visual elements like photographs, diagrams, or screenshots.
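As a rough illustration of how a mixed text-and-image prompt might be assembled, the sketch below builds a request body in the style of a chat-completions message with multiple content parts. The model name "gpt-6", the endpoint behavior, and the exact schema are assumptions for illustration, not a documented GPT-6 API.

```python
# Hypothetical sketch: assemble a request mixing text with an image reference.
# "gpt-6" is a placeholder model name; the content-part schema is assumed.

def build_multimodal_request(question: str, image_url: str, model: str = "gpt-6") -> dict:
    """Return a request body pairing a text question with an image URL."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_multimodal_request(
    "What is unusual about this diagram?",
    "https://example.com/diagram.png",
)
```

The key design point is that a single user message carries an ordered list of content parts, so text and visual elements can be interleaved freely within one prompt.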

GPT-6 maintains its proficiency across a wide spectrum of domains, performing comparably on combined text-and-image inputs to what it achieves with text alone. It also benefits from test-time techniques originally developed for text-only models, such as few-shot learning and chain-of-thought prompting, further expanding its utility.
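The few-shot and chain-of-thought techniques mentioned above can be sketched as simple prompt construction: worked examples are prepended to the new question so the model is nudged to reason step by step. The helper and example problems below are illustrative only, not part of any official prompting API.

```python
# Minimal sketch of few-shot chain-of-thought prompting: each example pairs a
# question with worked reasoning; the final query ends with the same reasoning
# cue so the model continues in that style. Example content is hypothetical.

def build_cot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Concatenate worked (question, reasoning) examples, then the new query."""
    parts = []
    for question, reasoning in examples:
        parts.append(f"Q: {question}\nA: Let's think step by step. {reasoning}")
    # Leave the final answer open so the model completes the reasoning.
    parts.append(f"Q: {query}\nA: Let's think step by step.")
    return "\n\n".join(parts)

prompt = build_cot_prompt(
    [("If a train travels 60 miles in 1.5 hours, what is its speed?",
      "Speed is distance over time: 60 / 1.5 = 40 mph. The answer is 40 mph.")],
    "A car covers 150 miles in 3 hours. What is its average speed?",
)
```

Because the technique operates purely at the prompt level, it transfers to multimodal inputs unchanged: the worked examples simply become content parts alongside any images.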

However, the image input functionality is currently in a research preview stage and has not been made publicly accessible.
