Pika Labs unveiled version 2 of its powerful AI video model last week, bringing not just improved motion and realism but also a suite of tools that make it one of the best platforms of its type I've tried in my time covering generative AI.
Pika Labs is no stranger to features aimed at making AI video creation easier, and the new additions in Pika 2 include "ingredients" you can add to the mix to create videos that more closely match your ideas, templates with pre-built structures, and more Pikaffects.
Pikaffects was the AI lab's first foray into this type of improved controllability and saw companies like Fenty and Balenciaga, as well as celebrities and individuals, share videos of products, landmarks, and objects being squished, exploded, and blown up.
On the surface, this might make it sound like Pika Labs is using tricks and gimmicks to disguise a lack of power in its underlying model, but nothing could be further from the truth. In tests I ran over the weekend, even without those features, Pika-generated videos were comparable to the best models on the scene, including Kling, MiniMax, Runway, and even Sora.
Testing Pika 2.0 called for a slightly different approach than I'd take with another model. Usually, when I put AI video tools to the test, I create a series of prompts -- some with images and some without -- and fire away. A lot of Pika's power, however, comes from these additional features.
I decided to start by seeing how well it handled a simple image-to-video prompt and then a text-to-video prompt. I gave it an image generated in Midjourney with a simple descriptive prompt and then used the same prompt I'd used in Midjourney to see how well Pika could create the visuals.
My favorite test prompt for AI video is: "A dog wearing sunglasses traveling on a train." This is because most models handle it fairly well but interpret it in different ways.
It also requires the model to create a realistic-looking dog wearing sunglasses -- an unusual combination. On top of that, it has to generate accurate, rapid motion outside the window while keeping the scene inside the carriage still.
Unlike Sora or Kling, Pika kept the dog static, sitting on the seat. It also generated a second shot within the five-second video, zooming in on the dog's face to show off those sunglasses.
It didn't do as well with a straight image-to-video prompt using a Midjourney picture, but when I supplied the same image as an ingredient rather than as the prompt image, the result was significantly better.
I wrote an article a while ago in which I used Freepik's consistent-character feature to fine-tune a model on pictures of myself. That let me put myself in various situations using image-to-video models, so I decided to try the same thing with Pika 2.0.
I started with a picture I'd generated of myself standing on a 1950s-style US high street with a stereotypical UFO visible in the background. I'm in a full suit, ready for action, and I gave the image to Pika 2.0 as an ingredient for the scene. I wasn't sure how it would interpret it, or whether it would simply take my likeness and ignore the rest of the visuals.
The model did a brilliant job, creating two camera movements -- first focusing on me and then zooming out for a wide shot that captured the moving UFO. It managed to keep multiple individual elements moving while retaining the aesthetic of the image throughout the short video clip.
I then tried something more complex, giving it a picture of AI-generated me against a white background (who needs to pose for photos when you can generate them?) and a generated image of the interior of a potential Mars base.
I gave it the two images as ingredients along with the prompt "working on Mars." It created a video of me smiling and walking around. I then gave it an image of a potential clothing item that might be worn by Mars settlers, but the model interpreted it as a robot and gave the suit a head. It still looked cool, though.
Finally, I decided to see how well it handled one of my first AI video prompts: "A cat on the moon wearing a spacesuit with Earthrise in the background." This is something all AI video models used to fail at miserably, and most image models also struggled with.
First, I generated an image in Ideogram using that prompt. It's now one of my all-time favourite images and one I plan to print as a poster. I then gave it to Pika 2.0 as an ingredient for AI video generation with no additional prompt. It came out looking like a studio ident for a new movie.
I tried the same prompt with text-to-video, and it didn't work as well, giving me a second super-Earth in the background, but it's still far better than these results used to be.
Pika 2.0 isn't just a significant upgrade on the previous-generation model; it has catapulted the AI video lab into prime position as one of the best platforms on the market.
Last week, when Sora was first announced, I wrote a guide to the best Sora alternatives and left Pika off the list. While the 1.5 model was good, especially when used with Pikaffects, it wasn't as strong as the alternatives. Now I feel like I need to write a guide to the best Pika alternatives, because in my view it's better than Sora.
Competition aside, I think it's amazing how far AI video has come in less than a year, going from two seconds of barely moving mush to content resembling something actually filmed with a real camera -- and with near-total control over the output.