My favorite is building the Person of Interest machine 😉
In my sophomore year, I came across the CBS series “Person of Interest”, which revolves around a computer system dubbed the “Machine” that analyzes data from surveillance cameras, electronic communications and audio input to predict crimes.
That spark came in 2013: why couldn’t I build such a machine? While most of what the Machine does is practically impossible today, I was able to design a working system that can perform at least half of what the POI Machine does.
The “Machine” as described in Person of Interest has artificial consciousness, which is pretty much out of the question at the moment. I started listing all the features the Machine came packed with, helped by the episode in which Nathan boots the system, since it reveals most of them.
The first step was to decide on the technologies to use, and I ended up with the following:
- Node.js for forwarding and handling the video streamed from users’ webcams over WebSockets
- Python for most of the NLP tasks, OCR, information extraction, anomaly detection, etc.
- C++ for processing the images from videos using OpenCV
- Databases: MongoDB, Redis, MySQL and Cayley
Various other frameworks and libraries I used include NuPIC, OpenCog, ConceptNet, Natural, WordNet, Freebase and DBpedia. And of course, Apache Hadoop and Thrift.
Once I had decided on all the technologies, I started building it. The initial phase was to ‘teach’ it, in POI terms, to recognize people’s faces and speech and to analyze their emotions.
I tested it in a closed environment, my room, standing in different corners to check whether it could see me, and followed up with tests involving multiple users. Once it could do that, the next problem, as Harold says, was “to sort them all out”.
The system needs to have knowledge of the world, and I used ConceptNet and WordNet for this. For sentiment analysis of users’ speech, I used the AFINN word list, and it turned out to work very well.
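AFINN is a list of English words rated for valence with integers from -5 (very negative) to +5 (very positive), and the simplest way to use it is to add up the scores of the words in a transcript. The sketch below assumes the speech has already been transcribed to text; the `AFINN_SAMPLE` entries are illustrative placeholders, and in practice you would load the full AFINN file instead.

```python
import re

# Illustrative AFINN-style entries (word -> valence in -5..+5).
# Placeholder subset only: load the full AFINN word list in real use.
AFINN_SAMPLE = {"kill": -3, "hate": -3, "afraid": -2, "good": 3, "love": 3, "happy": 3}


def afinn_score(transcript: str, lexicon: dict = AFINN_SAMPLE) -> int:
    """Sum the valence of every word in the transcript that appears in the lexicon."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def sentiment_label(score: int) -> str:
    """Map a summed AFINN score to a coarse label."""
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


if __name__ == "__main__":
    text = "We must kill Raghav before this weekend"
    score = afinn_score(text)
    print(score, sentiment_label(score))  # -3 negative
```

A plain sum is crude (it ignores negation such as “not good”), so it works best as one signal alongside others, such as tone of voice or facial emotion.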
Query processing was a tedious job. I drew on the architecture of IBM’s Watson (it’s one hell of a Q&A system).
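Watson’s DeepQA design breaks answering into stages: analyze the question, generate candidate answers, score each candidate against supporting evidence, and rank the results. The toy sketch below only mirrors those stage names over a hypothetical in-memory list of sighting facts; the facts, the keyword-based question analysis and the count-based scoring are all assumptions for illustration, not how the original system or Watson actually scored evidence.

```python
import re
from collections import Counter

# Hypothetical fact store: one (subject, relation, object) triple per observation,
# so repeated sightings count as extra evidence.
FACTS = [
    ("alice", "seen_with", "bob"),
    ("alice", "seen_with", "carol"),
    ("alice", "seen_with", "carol"),
    ("bob", "seen_with", "carol"),
    ("alice", "located_at", "library"),
]
ENTITIES = {s for s, _, _ in FACTS} | {o for _, _, o in FACTS}


def analyze(question: str) -> tuple[str, str]:
    """Question analysis: pick the relation asked about and the focus entity."""
    words = re.findall(r"[a-z]+", question.lower())
    relation = "located_at" if words and words[0] == "where" else "seen_with"
    focus = next((w for w in reversed(words) if w in ENTITIES), "")
    return focus, relation


def generate_candidates(focus: str, relation: str) -> list[str]:
    """Candidate generation: every entity linked to the focus by the relation."""
    candidates = []
    for subj, rel, obj in FACTS:
        if rel != relation:
            continue
        if subj == focus:
            candidates.append(obj)
        elif relation == "seen_with" and obj == focus:  # "seen with" is symmetric
            candidates.append(subj)
    return candidates


def score_and_rank(candidates: list[str]) -> list[tuple[str, int]]:
    """Evidence scoring and ranking: more supporting facts means a higher score."""
    evidence = Counter(candidates)
    return sorted(evidence.items(), key=lambda item: (-item[1], item[0]))


def answer(question: str) -> list[tuple[str, int]]:
    focus, relation = analyze(question)
    return score_and_rank(generate_candidates(focus, relation))


if __name__ == "__main__":
    print(answer("Who was Alice seen with?"))  # [('carol', 2), ('bob', 1)]
```

Keeping the stages as separate functions is the point of the design: each stage can later be swapped for something smarter (a real parser, a knowledge-graph lookup, a learned scorer) without touching the others.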
A graph database plays a vital role in mapping the real-world connections between users. I can recall an episode where the Machine points out that “the taxi driver and the passenger were actually fifth cousins”.
I was initially unsure which one to choose, but finally settled on Cayley, and it did the job pretty well.
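The “fifth cousins” kind of discovery boils down to finding the shortest chain of relationships between two people. The sketch below does that with a breadth-first search over a plain Python adjacency list rather than through Cayley, so the graph layout, the relation names and the people in `example_family` are hypothetical.

```python
from collections import deque


def add_relation(graph, a, relation, b, inverse):
    """Store an edge in both directions so the search can walk either way."""
    graph.setdefault(a, []).append((relation, b))
    graph.setdefault(b, []).append((inverse, a))


def relationship_path(graph, start, goal):
    """Breadth-first search: return the shortest list of (person, relation, person)
    hops from start to goal, [] if they are the same person, or None if unconnected."""
    if start == goal:
        return []
    parents = {start: None}  # person -> (previous person, relation used)
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for relation, other in graph.get(person, []):
            if other in parents:
                continue
            parents[other] = (person, relation)
            if other == goal:
                hops, node = [], goal
                while parents[node] is not None:  # walk back to the start
                    prev, rel = parents[node]
                    hops.append((prev, rel, node))
                    node = prev
                return hops[::-1]
            queue.append(other)
    return None


def example_family():
    """Hypothetical example: the driver and the passenger are first cousins."""
    graph = {}
    add_relation(graph, "driver", "child_of", "ravi", "parent_of")
    add_relation(graph, "ravi", "sibling_of", "meena", "sibling_of")
    add_relation(graph, "meena", "parent_of", "passenger", "child_of")
    return graph


if __name__ == "__main__":
    for hop in relationship_path(example_family(), "driver", "passenger"):
        print(hop)
```

Because BFS explores hop by hop, the first time it reaches the goal is along a path with the fewest relationships; a fifth-cousin link is simply a longer chain of `child_of`, `sibling_of` and `parent_of` hops.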
Handling the data was very difficult. Most of the information you receive is unstructured, and thanks to UIMA I was able to sort out most of it.
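UIMA’s core idea is that a pipeline of annotators each read the same document and add typed annotations (a type name plus begin/end character offsets) to a shared structure, the CAS. The Python sketch below imitates that pattern with regex annotators; it is not UIMA’s Java API, and the annotation types and patterns (a 12-digit Aadhaar-style number and an HH:MM time) are illustrative assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type: str
    begin: int
    end: int
    covered_text: str


@dataclass
class Document:
    """Plays the role of UIMA's CAS: the text plus every annotation added so far."""
    text: str
    annotations: list = field(default_factory=list)


def regex_annotator(type_name, pattern):
    """Build an annotator that marks every regex match as an annotation of type_name."""
    compiled = re.compile(pattern)

    def annotate(doc):
        for m in compiled.finditer(doc.text):
            doc.annotations.append(Annotation(type_name, m.start(), m.end(), m.group()))

    return annotate


# Hypothetical type system: a 12-digit Aadhaar-style number and an HH:MM time.
PIPELINE = [
    regex_annotator("AadhaarId", r"\b\d{4}\s?\d{4}\s?\d{4}\b"),
    regex_annotator("Time", r"\b\d{1,2}:\d{2}\b"),
]


def run_pipeline(text, pipeline=PIPELINE):
    doc = Document(text)
    for annotator in pipeline:
        annotator(doc)  # each annotator sees the same document, in order
    return doc
```

Real annotators would be entity recognizers or classifiers rather than regexes, but the structure stays the same: later stages read the annotations earlier stages left behind, which is what turns unstructured text into something you can query.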
The system used a small Memcached instance to cache the primary information about every person: a unique ID, name, Aadhaar ID (similar to an SSN), most recent location, emotion, whom they were seen with and when. The last four change in real time (or near real time).
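The cached record can be sketched as a JSON value stored under a per-person key, with only the four fast-changing fields allowed to change on each sighting. To keep the example runnable without a server, `TTLCache` below is a hypothetical in-memory stand-in for a Memcached client (get/set with an expiry time); the key format, field names and 300-second TTL are assumptions.

```python
import json
import time

VOLATILE_FIELDS = {"location", "emotion", "seen_with", "seen_at"}


class TTLCache:
    """In-memory stand-in for a Memcached client: string keys, per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._items = {}
        self._clock = clock

    def set(self, key, value, ttl=0):
        deadline = self._clock() + ttl if ttl else None  # ttl=0 means never expire
        self._items[key] = (value, deadline)

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, deadline = item
        if deadline is not None and self._clock() >= deadline:
            del self._items[key]  # lazily evict expired entries on read
            return None
        return value


def person_key(unique_id):
    return f"person:{unique_id}"


def cache_person(cache, record, ttl=300):
    """Serialize the person's primary record as JSON under a per-person key."""
    cache.set(person_key(record["unique_id"]), json.dumps(record), ttl)


def update_sighting(cache, unique_id, ttl=300, **changes):
    """Apply a real-time update; only the fast-changing fields may change."""
    raw = cache.get(person_key(unique_id))
    if raw is None:
        return None  # cache miss: reload from the primary database instead
    unknown = set(changes) - VOLATILE_FIELDS
    if unknown:
        raise ValueError(f"not a real-time field: {sorted(unknown)}")
    record = json.loads(raw)
    record.update(changes)
    cache_person(cache, record, ttl)  # refreshing also resets the expiry
    return record
```

The expiry keeps stale sightings from lingering: if nobody has seen a person for a while, the entry simply disappears and the next lookup falls back to the slower primary store.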
I didn’t want the machine to be either a closed or an open system, but a combination of both. The machine works autonomously and sends you an email when you are predicted to be a victim.
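The autonomous alert path can be as small as a threshold check followed by an email. In the sketch below the risk score, the 0.8 threshold, the addresses and `build_victim_alert` are hypothetical; the message is built with Python’s standard `email.message.EmailMessage`, and the actual SMTP send is left as a comment because it needs a real mail server.

```python
from email.message import EmailMessage

ALERT_THRESHOLD = 0.8  # hypothetical cut-off on a 0..1 risk score


def build_victim_alert(recipient, risk, evidence, sender="machine@example.com"):
    """Return an alert email if the predicted risk crosses the threshold, else None."""
    if risk < ALERT_THRESHOLD:
        return None
    msg = EmailMessage()
    msg["Subject"] = "Alert: you have been predicted to be a victim"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Risk score: {risk:.2f}\nEvidence: {evidence}\n")
    return msg


# Sending requires a real SMTP server and credentials (placeholders below):
# import smtplib
# with smtplib.SMTP("smtp.example.com", 587) as smtp:
#     smtp.starttls()
#     smtp.login("user", "password")
#     smtp.send_message(msg)
```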
By ‘combination of both’, I mean that it also comes with a UI where you can query a name or Aadhaar ID, and the machine will tell you only where the person was, when and with whom. Nothing more, as anything beyond that could threaten their privacy.
It cannot actually identify a gunshot, but it did a good job of predicting crimes from people’s speech and emotions. I also patched the system to mark each person with a colored box, as seen in POI: “Yellow” for those who know about the machine, “White” for ordinary people and “Red” for perpetrators.
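The privacy restriction amounts to a whitelist: whatever the machine stores internally, a UI query returns only where, when and with whom. A sketch, assuming each record is a dict with the hypothetical field names used here and that names and Aadhaar IDs should match regardless of case and spacing:

```python
PUBLIC_FIELDS = ("location", "seen_at", "seen_with")  # where, when, with whom


def _normalize(value: str) -> str:
    """Lower-case and drop whitespace so '1234 5678 9012' matches '123456789012'."""
    return "".join(value.split()).lower()


def public_lookup(records, query):
    """Find a person by name or Aadhaar ID and expose only the whitelisted fields."""
    key = _normalize(query)
    for record in records:
        if key in (_normalize(record["name"]), _normalize(record["aadhaar_id"])):
            return {field: record[field] for field in PUBLIC_FIELDS}
    return None
```

Building the response from an explicit allow-list, rather than deleting sensitive keys, means any field added to the records later stays private by default.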
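The color coding is a small classification step that runs before each box is drawn. In the sketch below, `TrackedPerson`, its flags and the precedence (red overrides yellow) are assumptions; the tuples are in BGR order because that is what OpenCV’s drawing functions expect, and the drawing call itself is shown as a comment.

```python
from dataclasses import dataclass

# OpenCV color tuples are in BGR order.
RED = (0, 0, 255)
YELLOW = (0, 255, 255)
WHITE = (255, 255, 255)


@dataclass
class TrackedPerson:
    name: str
    knows_about_machine: bool = False
    is_perpetrator: bool = False


def box_color(person: TrackedPerson) -> tuple:
    """Pick the POI-style box color; a perpetrator is red even if they also know."""
    if person.is_perpetrator:
        return RED
    if person.knows_about_machine:
        return YELLOW
    return WHITE


# With OpenCV, each detected face box (x, y, w, h) would then be drawn as:
# cv2.rectangle(frame, (x, y), (x + w, y + h), box_color(person), 2)
```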
The system ran on a cluster with the following configuration:
- 5 nodes, each with a 2.1 GHz Intel quad-core processor and 4 GB of RAM
- 2 TB of total storage
- Each equipped with an NVIDIA GeForce GT 610 GPU
The system achieved a peak of 0.29 TFLOPS, which is really not great, but fair enough to start with.
I was pretty excited. Those early days of building were technically Day 0. I booted the system and started with small face and speech recognition tests. It did well at tracking a couple of people but failed when scaled up. Still, it was a triumph for me, and I was so jubilant that I forgot to take pictures on Day 1.
To put it into action, I connected it to my department’s local network and subscribed to the video streams from its computers (well, not legal). I can still recall one friend saying to another during the demonstration, “We must kill Raghav before this weekend at the central park”.
Within seconds, I got an email notifying me that I was predicted to be a victim.
Next, I needed to test the system in the real world. My college has over 13 CCTV cameras around the campus, and since our lab was the relay for that network, I was able to tap into it and voilà, I had this:
As the days progressed, I added features such as having it call me through Twilio, and I also revamped the UI to look like Samaritan:
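For a phone alert, Twilio reads TwiML, a small XML dialect, to decide what happens on the call; a `<Say>` verb inside a `<Response>` makes it speak text. The sketch below only builds that TwiML with the standard library. `alert_twiml` is a hypothetical helper, and placing the call (for example with the `twilio` package, pointing `url` at an endpoint that serves this XML) needs real credentials, so it is left as a comment.

```python
from xml.etree import ElementTree as ET


def alert_twiml(message: str) -> str:
    """Build a TwiML document that makes Twilio read `message` aloud on the call."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = message  # ElementTree escapes &, < and > for us
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


# Placing the call (needs the `twilio` package and real credentials):
# from twilio.rest import Client
# Client(account_sid, auth_token).calls.create(
#     to="+91XXXXXXXXXX", from_="+1XXXXXXXXXX", url="https://example.com/alert.xml")
```

Generating the XML with ElementTree instead of string formatting means a transcript containing `&` or `<` cannot produce a malformed document.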
Apparently my friends weren’t aware it was running on their systems too.
It’s been 2 years since I last ran it, and it is quietly resting on my backup HDD.
Edit: Dug up some of the old pics:
I needed a medium through which the machine could communicate with me on the go. Facebook didn’t have its Bots API back then, so I created a separate account for the machine and integrated the Facebook API, which allowed it to chat with me from anywhere:
I also made a dashboard through which I could view the locations of the people the system tracks and their devices connected to our campus network:
Finally, when I upgraded the system to the new UI and wiped all memory (Day 0):
Together with a friend, I built a portable version of the system that is solar-powered and can be fitted anywhere, such as on lamp posts. The main goal of the portable version was to provide assistance during disasters: the government could remotely access affected areas and find casualties, and the unit would also provide a charging station for smartphones and a WiFi hotspot for nearby people by connecting to Outernet. We showcased the system at a NASSCOM event:
Edit #2: While I am overwhelmed by the positive response, I came across a few comments questioning the authenticity of the answer above, and I would be very happy to share a few things.
I had posted the above answer on my personal Medium blog in late Nov 2014 (How I built the Person of Interest Machine – Raghav – Medium), and, surprisingly, I had been curious enough to share the first day I started working on this thing on Twitter in Nov 2013.
I built the revamped UI and the dashboard for tracking people at Freshdesk’s Save the Hacker hackathon in May 2015:
You can see the video stream along with the data in the browser client, and yes, it is streamed in real time over WebSockets, with the information drawn over it using an HTML canvas. A similar project is open-sourced here (drejkim/face-detection-node-opencv), but I didn’t use that repo as a base: it dates from Dec 2014, whereas my first-day tweet is from Nov 2013. I highly recommend that repo to anyone who wants to see how a server can work seamlessly with OpenCV and stream frames to the browser in real time.
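One simple way to stream frames with overlay data over a WebSocket is to send each JPEG-encoded frame together with its detection boxes in a single JSON text message; the browser sets the image source, draws it onto the canvas with `drawImage`, then draws the boxes with `strokeRect`. The message layout and `frame_message` below are assumptions for illustration, not the format used by the original system or by drejkim/face-detection-node-opencv.

```python
import base64
import json


def frame_message(frame_id, jpeg_bytes, detections):
    """Pack one JPEG frame and its detection boxes into a JSON text message.

    detections: list of dicts with x, y, w, h (pixels), label and color (CSS string);
    any extra keys are dropped so only drawing data reaches the browser.
    """
    return json.dumps({
        "frame_id": frame_id,
        # A data URL can be assigned straight to an <img>.src in the browser.
        "image": "data:image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode("ascii"),
        "boxes": [
            {key: d[key] for key in ("x", "y", "w", "h", "label", "color")}
            for d in detections
        ],
    })
```

Base64 inflates each frame by roughly a third; if bandwidth matters, sending the JPEG as a binary WebSocket frame and the boxes as a separate text message avoids that overhead, at the cost of having to pair the two on the client.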
I have also received a lot of comments asking me to open-source the system so that people can look it over and collaborate on building it. While I would be happy to do so, I believe the system is at too early a stage to be open-sourced, and my long-term vision is to put this project to real-world use. Meanwhile, I will eventually open-source the individual components I built, including the anomaly detection, the knowledge graph, etc., on my GitHub.