Sora: Generation is the Beginning

鲁大荒 @AISERLU
#Sora #World Model #AIGC

ChatGPT is not an AGI, yet

ChatGPT is an AI-powered language model developed by OpenAI, capable of generating human-like text based on context and past conversations. An AGI model is a hypothetical type of intelligent agent which, if realized, could learn to accomplish any intellectual task that human beings or animals can perform. ChatGPT is not able to do that, as it is limited by its training data and its specific domain of natural language processing. ChatGPT is an example of weak AI or narrow AI, which is able to solve one specific problem, but lacks general cognitive abilities.

Sora is not a world model

Sora is another OpenAI model that can create realistic and imaginative scenes from text instructions. A world model is a neural network that can learn to view the world at different levels of detail and predict future events based on past observations. While Sora and world models share some similarities, such as using generative models and latent variables, they are not exactly the same. Sora is more focused on producing high-quality videos from text, while world models are more concerned with learning a compressed representation of the environment and a policy to solve a task. Sora does not seem to have a mechanism to adjust the level of detail or abstraction of its output, nor does it have a way to interact with the environment or plan actions. Therefore, I would not consider Sora to be a world model, but rather a text-to-video model.
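To make the distinction concrete, below is a minimal, hypothetical sketch of the loop a world model supports. An encoder compresses an observation into a small latent state, a dynamics model predicts the next latent state given an action, and a policy chooses actions by imagining their outcomes. The one-dimensional environment, the hand-written dynamics (a real world model would learn them from data), and every function name are illustrative assumptions, not code from any real system.

```python
from dataclasses import dataclass

# Hypothetical toy world: an agent on a 1-D line tries to reach a goal position.

@dataclass
class Observation:
    position: float
    noise_pixels: list  # irrelevant detail that a compact representation discards

def encode(obs: Observation) -> float:
    """Compress a raw observation into a small latent state (here: rounded position)."""
    return round(obs.position, 1)

def predict_next(latent: float, action: float) -> float:
    """Toy dynamics model: predict the next latent state given an action.
    A real world model would learn this mapping from past observations."""
    return latent + action

def plan(latent: float, goal: float, actions=(-1.0, 0.0, 1.0)) -> float:
    """Policy: imagine each action's outcome with the dynamics model and
    pick the action whose predicted state lands closest to the goal."""
    return min(actions, key=lambda a: abs(predict_next(latent, a) - goal))

def rollout(start: float, goal: float, steps: int = 5) -> list:
    """Act in the simulated environment, re-encoding the observation each step."""
    position = start
    trajectory = [position]
    for _ in range(steps):
        latent = encode(Observation(position, noise_pixels=[0] * 16))
        action = plan(latent, goal)
        position += action  # the environment responds to the chosen action
        trajectory.append(position)
    return trajectory

print(rollout(start=0.0, goal=3.0))  # [0.0, 1.0, 2.0, 3.0, 3.0, 3.0]
```

Note the `action` argument and the `plan` step: a text-to-video model such as Sora takes a prompt rather than actions and has no equivalent of planning, which is the gap described above.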

Selling anxiety seems to be the main business of today’s media. Headlines proclaim that “reality no longer exists” and that Sora will disrupt the film and video industry.

Reality still exists, and Sora is just a new way of creating and experiencing videos. Sora does not change the facts of the world, but it can create simulations of different scenarios and perspectives. Sora can also help people express their creativity and imagination, and inspire new forms of storytelling and art.

However, Sora will have a significant impact on the film and video industry, both positive and negative. On the positive side, Sora can lower the barriers to entry and reduce the costs of filmmaking, allowing more people to create and share their videos. Sora can also enhance the quality and diversity of video content, and enable new genres and styles of filmmaking. Sora can also be a useful tool for education, entertainment, and research, as it can generate videos of various topics and domains.

On the negative side, Sora poses some challenges and risks to the film and video industry, such as:

Ethical issues: Sora can generate videos that contain harmful or misleading content, such as violence, nudity, propaganda, or fake news. It can also violate the privacy and consent of people who appear in the videos, or impersonate their likeness. And it can create videos that are offensive, biased, or inappropriate for certain audiences or contexts.
Legal issues: Sora can raise questions about the ownership and rights of the videos it generates, and the responsibility and liability of the users and creators. Sora can also infringe on the intellectual property and trademarks of existing films and videos, or create unfair competition and plagiarism.
Social issues: Sora can affect the perception and trust of the viewers and consumers of video content, and the reputation and credibility of the filmmakers and producers. Sora can also influence the culture and values of the society, and the norms and standards of the industry.

Therefore, Sora is a groundbreaking and disruptive technology, but it also needs to be used with caution and responsibility. Sora’s outputs need to be verified and moderated, and its users and creators need to follow ethical and legal guidelines. Sora also needs to be regulated and governed by the relevant authorities and stakeholders, and its development and deployment need to be transparent and accountable. Sora is not a threat to, or a substitute for, human intelligence and creativity, but a partner and a catalyst.

Sora: The AI That Can Make You Believe Anything

Sora can produce videos up to a minute long with high visual quality and fidelity to the user’s prompt. That’s right, a minute. That’s longer than most TikTok videos, and probably more entertaining too.

Sora works by first compressing videos into a lower-dimensional latent space and then cutting that representation into spacetime patches: small blocks that span a few frames and a small region of each frame. These patches are processed by a transformer network, a type of neural network that learns the context and relationships between patches, much as a language model learns the relationships between words. Sora is a diffusion model: during training it learns to recover clean patches from noisy ones, and at generation time it starts from pure noise and removes it step by step. Sora also conditions its output on a text prompt, which steers the generation toward the user’s instructions. Sounds complicated, right? Well, it is. But the bottom line is, Sora can make videos out of thin air, or rather, out of words.
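The patch idea can be sketched in a few lines. Below, a tiny synthetic video is cut into non-overlapping spacetime patches, each flattened into one vector that a transformer would treat like a token, and one patch is then noised the way a diffusion model’s training inputs are. This is an illustration under stated assumptions: OpenAI has not released Sora’s code, so the patch sizes, the `alpha_bar` noise level, and the function names are hypothetical, and the step that first compresses video into a latent space is skipped.

```python
import math
import random

# A toy "video": T frames of H x W grayscale values, stored as nested lists.
# Real systems patch a compressed latent representation; this sketch skips that step.

def make_video(t: int, h: int, w: int) -> list:
    return [[[float(f + y + x) for x in range(w)] for y in range(h)] for f in range(t)]

def to_spacetime_patches(video: list, pt: int, ph: int, pw: int) -> list:
    """Cut a T x H x W video into non-overlapping pt x ph x pw blocks and
    flatten each block into one vector (one 'token' for the transformer)."""
    t, h, w = len(video), len(video[0]), len(video[0][0])
    assert t % pt == 0 and h % ph == 0 and w % pw == 0, "dims must divide evenly"
    patches = []
    for f0 in range(0, t, pt):
        for y0 in range(0, h, ph):
            for x0 in range(0, w, pw):
                patch = [video[f][y][x]
                         for f in range(f0, f0 + pt)
                         for y in range(y0, y0 + ph)
                         for x in range(x0, x0 + pw)]
                patches.append(patch)
    return patches

def add_noise(patch: list, alpha_bar: float, rng: random.Random) -> list:
    """Forward diffusion: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps,
    with eps ~ N(0, 1). The model is trained to undo this and recover x_0."""
    return [math.sqrt(alpha_bar) * v + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
            for v in patch]

video = make_video(t=4, h=8, w=8)
patches = to_spacetime_patches(video, pt=2, ph=4, pw=4)
print(len(patches), len(patches[0]))  # 8 32
noisy = add_noise(patches[0], alpha_bar=0.5, rng=random.Random(0))
```

With `alpha_bar` close to 1 a patch stays nearly clean; close to 0 it becomes nearly pure noise. Generation runs this process in reverse, starting from noise.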

Sora is not only able to generate realistic videos, but also imaginative ones. For example, Sora can create videos of scenes that do not exist in reality, such as a papercraft world of a coral reef, or a giant wooly mammoth in a snowy meadow. Sora can also generate videos of different styles, such as animated, cinematic, or artistic. Sora can make your wildest dreams come true, or your worst nightmares, depending on what you ask for.

Sora is a powerful tool that can help people create and explore new visual worlds. However, it also has limitations. For instance, Sora may not always generate videos that are consistent with the text prompt or that follow the laws of physics. It may also generate harmful or misleading content, such as violence, nudity, propaganda, or fake news; violate the privacy and consent of people who appear in the videos or impersonate their likeness; or create videos that are offensive, biased, or inappropriate for certain audiences or contexts. Therefore, Sora needs to be used with caution and responsibility, and its outputs need to be verified and moderated. Sora is not a toy, but a tool. A very powerful and dangerous tool.

Sora is an impressive achievement of artificial intelligence, and a testament to the potential of video generation models. Sora demonstrates that AI can learn to understand and simulate the physical world in motion, and to create videos that are both realistic and imaginative. Sora is a step towards building general purpose simulators of the physical world, which could help people solve problems that require real-world interaction. Or, you know, just have fun.

However, Sora is not a magic wand that can create anything that the user desires. Sora is not a threat to human creativity, nor a source of unlimited entertainment. Sora is not a world model that can predict the future, nor a super intelligence that can surpass human intelligence. Sora is also a narrow AI, which is able to solve one specific problem, but lacks general cognitive abilities. Sora is not God, but a smart machine.

When will Sora be open for registration?

According to OpenAI, Sora is not yet available to the public, and the company has not specified when it will be released. The company is currently performing adversarial testing and seeking feedback from a small group of experts and creative professionals.

Is Sora free or paid? How will people get access to it?

OpenAI has not announced the pricing or the access method for Sora yet. However, based on the company’s previous products, such as ChatGPT and DALL·E, it is likely that Sora will be offered as a cloud-based service, with different tiers of pricing and usage limits depending on the user’s needs and preferences. OpenAI may also provide free or discounted access to researchers, educators, and non-profit organizations, as well as impose ethical and legal guidelines for using Sora.

How does Sora handle copyright in its training data, and the privacy and security of its users?

OpenAI has not disclosed the details of how Sora handles the copyright and privacy issues of the data sources and the user-generated videos. However, the company has stated that it is committed to ensuring the safety and responsibility of Sora, and that it is working with various stakeholders and authorities to address the potential challenges and risks of Sora. OpenAI may also implement measures such as watermarking, verification, moderation, and consent to prevent misuse and abuse of Sora.

What are some of the potential applications of Sora in different industries?

Sora has many potential applications in different industries, such as:

Filmmaking and entertainment: Sora can lower the barriers to entry and reduce the costs of filmmaking, allowing more people to create and share their videos. Sora can also enhance the quality and diversity of video content, and enable new genres and styles of filmmaking.
Advertising and marketing: Sora can help advertisers and marketers create engaging and personalized video campaigns that can capture the attention and interest of the target audience. Sora can also help test and optimize different video scenarios and strategies, and measure the impact and effectiveness of the video ads. Sora can also help generate video testimonials and reviews from customers, and showcase the features and benefits of the products or services.
Education and training: Sora can help educators and trainers create interactive and immersive video lessons and courses that can enhance the learning outcomes and experiences of the students and trainees. Sora can also help generate video simulations and scenarios that can test and improve the skills and knowledge of the learners. Sora can also help create video tutorials and guides that can explain and demonstrate complex concepts and procedures.
Journalism and media: Sora can help journalists and media outlets create compelling and informative video stories and reports that can convey the facts and opinions of the news and events. Sora can also help generate video interviews and documentaries that can showcase the perspectives and experiences of the people and places involved. Sora can also help create video summaries and highlights that can capture the essence and importance of the news and events.

These are some of the potential applications of Sora in different industries. However, Sora also poses some ethical issues that need to be considered and addressed, such as:

Disinformation and manipulation: Sora can generate videos that contain harmful or misleading content, such as violence, nudity, propaganda, or fake news. Sora can also violate the privacy and consent of people who appear in the videos, or impersonate their likeness. Sora can also create videos that are inconsistent with the text prompt, or that do not follow the laws of physics. Sora can affect the perception and trust of the viewers and consumers of video content, and the reputation and credibility of the filmmakers and producers. Sora can also influence the culture and values of the society, and the norms and standards of the industry.
Ownership and rights: Sora can raise questions about the ownership and rights of the videos it generates, and the responsibility and liability of the users and creators. Sora can also infringe on the intellectual property and trademarks of existing films and videos, or create unfair competition and plagiarism. Sora can also create videos that are offensive, biased, or inappropriate for certain audiences or contexts. Sora can also affect the economic and social value of video content, and the incentives and rewards of the video industry.
Safety and regulation: Sora can pose challenges and risks to the safety and security of the users and creators, and the public at large. Sora can also be misused and abused by malicious actors, such as hackers, criminals, or terrorists. Sora can also have unintended and unforeseen consequences, such as technical errors, biases, or feedback loops. Sora can also be difficult to monitor and control, due to the complexity and opacity of the AI system. Sora can also be subject to different laws and regulations in different countries and regions, creating conflicts and dilemmas.

These are some of the ethical issues that are not unique to Sora, but apply to any AI technology that can generate or manipulate digital media. Therefore, Sora needs to be used with caution and responsibility, and its outputs need to be verified and moderated. Sora also needs to be regulated and governed by the relevant authorities and stakeholders, and its development and deployment need to be transparent and accountable. Sora is a powerful and innovative technology, but it also needs to be ethical and beneficial.

Text-to-video generation requires massive amounts of video data for training. If Sora is opened to all OpenAI users, will user-generated content also become a data source for Sora? Who will manage and control this data, and how?

Good questions, but OpenAI has not disclosed how Sora handles its data sources or user-generated videos. Based on some web sources, here are some possible answers, though they may not be accurate or complete:

If Sora is open to all OpenAI users, will user-generated content also become Sora’s data source? According to OpenAI, Sora was trained on a large and diverse dataset of videos and images from the internet; it is not clear whether it will incorporate user-generated content as well. However, it is possible that Sora will use user-generated content to improve its performance and diversity, and to learn from user feedback and preferences. This could create a virtuous cycle, where more users create more videos, which in turn help Sora create better videos.

Who will manage and control this data? OpenAI has not announced who will manage and control the data sources and the user-generated videos. However, it is likely that OpenAI will have some oversight and authority over the data, as it is responsible for ensuring the safety and responsibility of Sora. OpenAI may also collaborate with other stakeholders and authorities, such as data providers, content creators, content consumers, regulators, and researchers, to address the potential challenges and risks of Sora. OpenAI may also implement measures such as watermarking, verification, moderation, and consent to prevent misuse and abuse of Sora.

How to control this data? OpenAI has not specified how the data sources and the user-generated videos will be controlled. However, it is possible that OpenAI will provide tools and guidelines for users and creators to control their data, such as choosing the data sources, setting the data quality and quantity, filtering the data content, and deleting or editing the data. OpenAI may also provide tools and guidelines for viewers and consumers, such as verifying where a video came from, checking its accuracy and reliability, reporting problems, and opting out of data collection.

Sora and Computer Graphics

Computer Graphics (CG) has emerged as a crucial division in the film industry, playing an indispensable role in the creation of both animated and live-action films. The advent of CG has revolutionized the way filmmakers tell stories, enabling them to realize their creative visions more efficiently and accurately. CG technology allows for the creation of synthetic images that can be manipulated to the filmmaker’s will to create intricate worlds, design fantastical creatures, and generate breathtaking special effects. These elements, which would be impossible to capture with a camera, can be crafted with precision and detail, bringing the filmmaker’s imagination to life on the screen.

While advances in technology have led to tools like Sora, a text-to-video generator, such tools cannot immediately replace CG in the film industry. Sora generates videos from text prompts, a process that is inherently random and hard to control.

The videos produced by Sora lack the precision and control offered by CG. They cannot be fine-tuned to match the filmmaker’s vision, nor can they create the kind of complex visuals and special effects that CG can. So while tools like Sora have their uses and offer exciting possibilities, they are not a substitute for the detailed, controlled environment that CG provides.

The Potential of Sora: World Simulator

Unreal Engine, a renowned game engine with built-in physics simulation, has revolutionized the way we create and interact with virtual worlds. It’s not just a tool for game developers, but also an asset in film production. Now, imagine if Sora, a text-to-video generator, could harness the same capabilities as Unreal Engine. It could potentially become the most powerful world simulator.

Such a tool would have far-reaching implications. It could be used in game development, film production, and even in fields like urban planning and disaster management. It could simulate how societies react to different scenarios, providing valuable insights for policymakers and researchers.

While Sora is currently a text-to-video generator, its potential as a world simulator is immense, but only if it incorporates the physics of world construction and the sociology of rule construction. Then it could revolutionize the way we create and interact with virtual worlds.

Is Sora good enough for spatial computing now?

Sora can create and simulate some aspects of the physical world in motion, but it may not be able to capture the full complexity and interactivity of spatial computing. Spatial computing is the field of computer science that deals with creating and manipulating digital representations of the physical world, such as 3D graphics, augmented reality, and virtual reality. It requires more than generating videos: it also requires immersive, responsive environments that can interact with the user and the real world.

The videos generated by Sora give AI a more realistic visual expression, but realism is not the only goal of filmmaking. Consider the films directed by Christopher Nolan, such as Inception and Tenet. The world of film is often surreal, anti-physical, and counter-intuitive, and that is where artistic expression can unleash unlimited imagination. Yet if AI generates such content, it is often seen as a mistake. How should we choose between surreal imagination and error tolerance?

This is an interesting question, and there is no definitive answer to it. Based on some web sources, here are some possible perspectives, though they may not be accurate or complete:

One perspective is that surreal imagination and error tolerance are not mutually exclusive, but complementary. Sora can generate videos that are both realistic and imaginative, and that can challenge and expand the conventional notions of physics and logic. Sora can also learn from user feedback and preferences, and improve its performance and diversity. Sora can be seen as a creative partner and a catalyst, rather than a competitor or a threat, to human intelligence and expression. Sora can also inspire new forms of artistic and cinematic experience, and find uses in education, entertainment, and research.
Another perspective is that surreal imagination and error tolerance are not equally desirable, but dependent on the context and the purpose. Sora can generate videos that are consistent with the user’s prompt, but the user’s prompt may not always be consistent with the reality or the intention. Sora can also generate videos that are inconsistent with the user’s prompt, but the user’s prompt may not always be clear or complete. Sora can be seen as a powerful and dangerous tool, rather than a faithful and obedient servant, to human intelligence and expression. Sora can also pose some ethical and social issues, such as disinformation and manipulation, privacy and consent, and ownership and rights.

Therefore, the choice between surreal imagination and error tolerance is not a simple or a fixed one, but a complex and a dynamic one.

What will people say if Sora generates a distorted self-portrait like the paintings of the artist Francis Bacon?

Francis Bacon was an Irish-born British artist who painted distorted and expressive figures, often based on the themes of crucifixion, violence, and alienation. If Sora generated a distorted self-portrait like one of Francis Bacon’s paintings, what would your reaction be?

Impressed and amazed
Confused and disturbed
Critical and sceptical

[Charts: Best-Selling Movies; Best-Selling Games; Number of ChatGPT Users, Feb 2024]

World Model: Generative AI is Changing the Media Landscape

A world model is a representation of the environment and its dynamics, learned by an AI system from data and experience. A world model can help an AI system to understand, predict, and interact with the world, and to generate new content and scenarios.

The media landscape is undergoing a radical transformation, thanks to the advances in artificial intelligence (AI). AI is not only changing the way we consume and create media, but also the way we understand and interact with the world. In this section, I will explore how AI is changing the media landscape.

Film: The Medium of Light and Grain

Film is one of the oldest and most influential forms of media, dating back to the late 19th century. Film is the medium of light and grain: it captures images on film stock, a thin strip of plastic coated with a light-sensitive emulsion. The emulsion contains tiny silver halide crystals, or grains, which are the smallest units of film; when exposed to light and developed, they form the image and give film its distinctive texture and quality.

Film is also the medium of art and expression, as it uses various techniques, such as cinematography, editing, sound, and music, to tell stories and convey emotions. Film is also the medium of culture and history, as it reflects and influences the society and the events of its time. Film is also the medium of innovation and experimentation, as it evolves and adapts to new technologies and trends.

However, film is also facing some challenges and limitations, such as the high cost and complexity of production, the degradation and deterioration of the film material, and the difficulty and inefficiency of distribution and preservation. Film is also being challenged and replaced by newer and more accessible forms of media, such as video and game.

Video: The Medium of Screen and Pixel

Video is one of the most popular and pervasive forms of media, dating back to the mid-20th century. Video is the medium of screen and pixel, as it uses electronic devices, such as cameras, computers, and televisions, to display and create images. Video is composed of pixels, which are the smallest units of display.

Video is also the medium of information and communication, as it uses various formats, such as news, documentaries, and interviews, to inform and educate the audience. Video is also the medium of entertainment and leisure, as it uses various genres, such as comedy, drama, and action, to amuse and engage the audience. Video is also the medium of diversity and accessibility, as it reaches and appeals to a wide and global audience.

However, video is also facing some challenges and limitations, such as the low quality and fidelity of the images, the lack of interactivity and immersion of the experience, and the abundance and overload of the content. Video is also being challenged and enhanced by newer and more advanced forms of media, such as game and virtual reality.

Game: The Medium of Space and Vector

Game is one of the most interactive and immersive forms of media, dating back to the late 20th century. Game is the medium of space and vector, as it uses digital environments, such as 3D graphics, augmented reality, and virtual reality, to create and manipulate images. Game is composed of vectors, which are the basic units of space. Vectors are mathematical entities that represent direction and magnitude, which define the shape and movement of the images in the game.
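As a small illustration of vectors defining both movement and shape, the sketch below moves a character with a velocity vector over a few frames of a simple game loop, then places a triangle at the new position. `Vec2`, the frame time, and the numbers are illustrative assumptions; real game engines ship their own, usually three-dimensional, vector types.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Vec2:
    """A 2-D vector: direction and magnitude packed into (x, y)."""
    x: float
    y: float

    def __add__(self, other: "Vec2") -> "Vec2":
        return Vec2(self.x + other.x, self.y + other.y)

    def scale(self, k: float) -> "Vec2":
        return Vec2(self.x * k, self.y * k)

    def magnitude(self) -> float:
        return math.hypot(self.x, self.y)

    def direction_degrees(self) -> float:
        return math.degrees(math.atan2(self.y, self.x))

# Movement: a character's velocity is 3 units right and 4 units up per second.
velocity = Vec2(3.0, 4.0)
position = Vec2(0.0, 0.0)
dt = 0.5  # hypothetical frame time in seconds

for _ in range(4):  # four frames of a simple game loop
    position = position + velocity.scale(dt)

# Shape: a triangle is a list of vertex vectors, translated to the new position.
triangle = [Vec2(0.0, 0.0), Vec2(1.0, 0.0), Vec2(0.0, 1.0)]
moved = [vertex + position for vertex in triangle]

print(position, velocity.magnitude())  # Vec2(x=6.0, y=8.0) 5.0
```

The velocity’s magnitude (5 units per second) is the speed, and its direction (about 53 degrees above the x-axis) is where the character is heading.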

Game is also the medium of challenge and fun, as it uses various mechanics, such as rules, goals, and feedback, to test and reward the player. Game is also the medium of choice and agency, as it uses various elements, such as characters, stories, and worlds, to empower and involve the player. Game is also the medium of creativity and learning, as it uses various tools, such as editors, modding, and sharing, to enable and inspire the player.

However, game is also facing some challenges and limitations, such as the high demands and expectations of players, the complexity and difficulty of development, and the ethical and social issues raised by its impact.

Generative AI: The Medium of Token

Generative AI is a term that refers to AI applications that can create new content or data, such as text, images, music, or code, based on some input or guidance. A token is a basic unit of meaning or representation in generative AI. Tokens can be words, characters, symbols, or segments of data that are used to encode and decode information. Tokens are essential for language models, which are foundation models that can generate natural language text. Language models tokenize the input text, breaking it down into smaller pieces that the neural network can process, and then detokenize their output, reassembling the generated tokens into coherent sentences.

Tokens have some constraints and limitations that affect the performance and capabilities of generative AI. One of them is the token limit, or context window: the maximum number of tokens a model can process at once. It determines the length and complexity of the text that a model can generate or understand. Another is the vocabulary size: the number of unique tokens a model can recognize and use. It affects the diversity and quality of the text that a model can generate or understand.
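The tokenize and detokenize round trip, the vocabulary limit, and the context window can all be seen in a toy example. This is a hypothetical word-level tokenizer written for illustration only; real language models use subword schemes such as byte-pair encoding (BPE), with far larger vocabularies and context windows. The corpus and all function names are assumptions.

```python
# Hypothetical toy word-level tokenizer; real LLMs use subword schemes such as BPE.

def build_vocab(corpus: list, max_size: int) -> dict:
    """Map the most frequent words to integer ids; id 0 is reserved for unknown words."""
    counts = {}
    for text in corpus:
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
    # Sort by descending frequency, then alphabetically, for a deterministic order.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    vocab = {"<unk>": 0}
    for word in ranked[: max_size - 1]:
        vocab[word] = len(vocab)
    return vocab

def tokenize(text: str, vocab: dict) -> list:
    """Break text into word tokens and map each to its id (unknown words -> 0)."""
    return [vocab.get(word, 0) for word in text.lower().split()]

def detokenize(ids: list, vocab: dict) -> str:
    """Reassemble token ids into text."""
    inverse = {i: w for w, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)

def fit_context(ids: list, context_window: int) -> list:
    """Keep only the most recent tokens that fit in the model's context window."""
    return ids[-context_window:] if context_window > 0 else []

corpus = ["sora turns text into video", "sora turns words into worlds"]
vocab = build_vocab(corpus, max_size=6)       # vocabulary size limit: 6 tokens
ids = tokenize("sora turns text into dreams", vocab)
print(ids)                                    # [2, 3, 4, 1, 0]
print(detokenize(ids, vocab))                 # sora turns text into <unk>
print(fit_context(ids, context_window=3))     # [4, 1, 0]
```

"dreams" falls outside the six-token vocabulary and collapses to `<unk>`, and the three-token context window drops the start of the sentence: the two limits described above, in miniature.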

However, generative AI is facing some challenges and limitations, such as the uncertainty and unpredictability of its output, the quality and coherence of the images, and the verification and moderation of the content. Generative AI is not only a powerful tool for existing and established forms of media, such as film, video, and game, but also a new medium in its own right.

World Model: Generation is the beginning

Generation is the beginning of a new era of media, in which AI uses generative models to simulate and understand the dynamics of the world. Mathematics and algorithms are the foundation on which hundreds of millions of users build the world brick by brick with prompt engineering and imagination. This is a topic of interest for many researchers and practitioners in the field of artificial intelligence.

This world model is not only a source of media, but also a medium of media, where users can consume and create media, as well as understand and interact with the world. This world model is not only a representation of reality, but also a supplement to reality, where users can challenge and expand their reality, as well as express and inspire their imagination. This world model is not only an opportunity, but also a responsibility, where users need to use it with caution and responsibility, and verify and moderate its outputs. This world model is not only a product of AI, but also an environment for AI, where users and AI can collaborate and participate in the creation and exploration of the world. This world model is not only a world of AI, but also a world of us.

I call this World Model created by all mankind the Ship of Theseus, a philosophical thought experiment that raises the question of whether an object that has had all of its components replaced remains fundamentally the same object. This name may be appropriate for world model, as it reflects the paradox and the possibility of identity and change across time and space. The world model is constantly changing and evolving, as new components are added and replaced by the people and the AI. The world model is also constantly the same and different, as it retains and transforms the essence and the appearance of the old world. The world model is a ship of Theseus, a ship of us.

Please credit the source when reprinting.

Tagged: Media, OpenAI, Ship of Theseus, Sora, World Model
