As technology evolves, more of our future revolves around robots. Robots are being built to automate activities that put human life in danger. The idea is to replace humans with robots where the task is critical and demands high accuracy. Many firms, however, are also interested in developing robots that mimic human actions.

As humans, when we see two people meet, we can predict what happens next: a handshake, a greeting, or a hug. This ability to anticipate actions develops from our experience.

Machines, on the other hand, are not yet smart enough to make these kinds of judgments. In robotics, a computer system that can predict actions would open the door to new uses of robots in daily life, in emergency response systems, and in other smart systems that are expected to offer suggestions in different situations.

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are developing an algorithm that can anticipate interactions more accurately than earlier approaches. It is a step toward predictive vision, in which artificial intelligence extends what robots can do in a range of fields.

To train the system to anticipate future actions, the researchers fed the machine learning algorithm a variety of YouTube videos, TV series, and movies, so that it could learn patterns of human behavior and respond accordingly. The algorithm can also anticipate what object is likely to appear in a video five seconds later.

Carl Vondrick, a PhD student at CSAIL, says of his research, “Humans automatically learn to anticipate actions through experience, which is what made us interested in trying to imbue computers with the same sort of common sense. We wanted to show that just by watching large amounts of video, computers can gain enough knowledge to consistently make predictions about their surroundings.”

How Does It Actually Work?

Previously, there were two approaches to predicting future actions. The first looks at the individual pixels of an image and uses that knowledge to create a photorealistic “future” image. The second requires humans to label the scene for the computer in advance, which is impractical for predicting actions on a large scale.

Instead of following these strategies, Vondrick and the CSAIL team created an algorithm that predicts visual representations, which capture different versions of what the scene might look like. Vondrick says, “Rather than saying that one pixel value is blue, the next one is red, and so on, visual representations reveal information about the larger image, such as a certain collection of pixels that represents a human face.”
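To make the idea of predicting a representation rather than pixels more concrete, here is a minimal sketch in Python. It is not the CSAIL team's code: a plain least-squares linear map stands in for the deep network, random vectors stand in for frame features, and the names fit_future_predictor and predict_future are hypothetical. What it does share with the approach described above is that the training target is simply the representation of a later frame, so no human labels are needed.

```python
import numpy as np


def fit_future_predictor(current_feats, future_feats):
    """Fit a linear map W so that current_feats @ W approximates future_feats.

    Assumption: a linear least-squares fit stands in for the deep network.
    """
    W, *_ = np.linalg.lstsq(current_feats, future_feats, rcond=None)
    return W


def predict_future(W, current_feat):
    """Predict the representation of a later frame from the current one."""
    return current_feat @ W


# Toy data (assumption): 200 pairs of 8-dimensional frame features, where the
# "future" feature is a fixed linear function of the current one plus noise.
rng = np.random.default_rng(0)
true_map = rng.normal(size=(8, 8))
current = rng.normal(size=(200, 8))
future = current @ true_map + 0.01 * rng.normal(size=(200, 8))

W = fit_future_predictor(current, future)
error = float(np.mean((predict_future(W, current) - future) ** 2))
print(f"mean squared error on the toy data: {error:.6f}")
```

The supervision here comes entirely from the video itself: the pair (frame now, frame later) provides both the input and the target.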

The algorithm employs deep learning, a field of artificial intelligence that uses neural networks to teach computers to pore over massive amounts of data and find patterns on their own. The network predicts a representation and automatically classifies it as one of a list of possible actions. The system then merges those actions into the single one it uses as its prediction.
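The classify-then-merge step can be sketched as follows. This is an illustration under stated assumptions, not the published method: the action labels, the toy classifier weights, and the choice to merge by averaging class probabilities are all assumptions made for the example, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical action labels, borrowed from the greeting example earlier.
ACTIONS = ["handshake", "hug", "greeting"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def classify(representation, class_weights):
    """Score one predicted representation against every action."""
    return softmax(class_weights @ representation)


def merge_predictions(representations, class_weights):
    """Merge several candidate futures into one action.

    Assumption: merging is done by averaging the class probabilities.
    """
    probs = np.mean([classify(r, class_weights) for r in representations], axis=0)
    return ACTIONS[int(np.argmax(probs))], probs


# Toy inputs (assumption): three candidate future representations and an
# identity matrix as the classifier weights.
class_weights = np.eye(3)
candidates = [
    np.array([2.0, 0.5, 0.1]),
    np.array([1.5, 1.0, 0.2]),
    np.array([0.3, 2.0, 0.1]),
]

action, probs = merge_predictions(candidates, class_weights)
print(action, probs)
```

Averaging is only one way to combine candidates. A majority vote over the per-candidate labels would be another reasonable choice, and the source does not say which the researchers used.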

Vondrick says, “The future is inherently ambiguous, so it’s exciting to challenge ourselves to develop a system that uses these representations to anticipate all of the possibilities.” Handling that ambiguity remains an open challenge for artificial intelligence.

How Successful Is It? Did the Artificial Intelligence Approach Do Well at Predicting Future Actions?

After being trained on 600 hours of unlabeled video, the algorithm correctly predicted the action more than 43 percent of the time, a significant improvement over existing algorithms. Because it learns from real-life sample videos, this approach lets a machine predict the near future more accurately than before.

In a second test, the algorithm was shown a frame from a video and asked to predict what object would appear five seconds later. It predicted the object 30 percent more accurately than the other algorithms tested.

The researchers expect this approach to extend to more complex actions. Vondrick says, “We hope to be able to work off of this example to be able to soon predict even more complex tasks.”

The algorithm is not yet accurate enough to deploy, but the team says future versions could be used for everything from robots that develop better action plans to security cameras that can alert emergency responders when someone has fallen or gotten injured.

The research was supported by a grant from the National Science Foundation, along with a Google faculty research award for Torralba and a Google PhD fellowship for Vondrick. Further research is under way to make the algorithm ready for real-world deployment.