About

Who Am I?

Hi! I'm Falaah. I'm currently a Research Fellow in the CVIT Lab at IIIT-Hyderabad and an Artist in Residence at the Montreal AI Ethics Institute. I'm extremely lucky to get to do two things I absolutely love: fundamental research and creating scientific comics!

Previously, I worked as a Research Engineer at Dell EMC, Bangalore, where I designed and built data-driven models for Identity and Access Management (IAM). I really enjoyed working in the Security space, and Machine Learning presented itself as a natural fit for this domain. Security is fundamentally asymmetric: to break a system, an attacker needs to find only one weakness; to secure a system, we must identify (which is itself practically impossible) and then safeguard every possible vulnerability. Furthermore, when designing bastions of security, we must account not only for the attack capabilities we have seen in the past and continue to see in the present, but also for future capabilities. Machine learning presents itself as a desirable ally because it effectively detects patterns that would otherwise require decades of human expertise to discern.

My work in industry showed me firsthand the pressing challenges of building 'production-ready' models. When the prediction from a model could determine whether we authorize an 8-figure transaction, it ceases to matter what new acronym we’re giving the underlying algorithm (AI or ML or DL or AGI or !!!!!). Security is an especially challenging environment in which to deploy Machine Learning because the ground truth is extremely ambiguous. This is unlike other domains where, say, you’re looking at a picture of a commonplace object and the model labels it as something different: you immediately know that the model has gone wrong in its classification. In the middle of a Flash Sale, when all your alerts have gone off and bad actors have potentially penetrated your system, you need to be able to rely on model results. And if you’re not a domain expert, you might choose to trust the model’s predictions blindly. One simple oversight and you will have incorrectly declined millions of dollars’ worth of transactions and will forever go down in history as the developer who cried wolf.

Contrary to the media narrative around AI, we have yet to figure out how to build models that are robust, dependable, unbiased and designed to thrive in the wild. These challenges have informed my interest in exploring the theoretical foundations of generalization and robustness, and in translating these insights into algorithms with provable guarantees. I’m also interested in critically assessing how AI impacts, and is in turn impacted by, the underlying social setting in which it was formulated. Towards this end, I dabble in “AI Anthropology”, which follows a simple philosophy: the best way to understand models is to observe their behavior in the wild.

Curriculum Vitae | Google Scholar

My Work

Meta-Security Research

The fundamental research problem was to investigate the efficacy of a novel “who I am / how I behave” authentication paradigm. Conventional authentication works on a “what I know” (username/password) or “what I have” (device) model. Our system would instead study the user’s behavior while typing their username and use this activity profile as the key against which access is granted. This eliminates the need for the user to remember a password or have access to a registered device. At the same time, even if a password is cracked or a device is stolen, the bad actor would not be able to penetrate the system, because their behavior would intrinsically differ from that of the genuine user.
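
For a flavour of what such a system involves, here is a minimal, illustrative sketch (not the patented method): keystroke timings captured while the user types their username are turned into dwell-time and flight-time features, and a one-class model enrolled on genuine attempts flags impostor attempts as anomalies. The feature set and the choice of scikit-learn's OneClassSVM are assumptions made purely for illustration.

```python
# Illustrative sketch only: a keystroke-dynamics authenticator built on a
# per-user one-class model. The feature set and model choice are assumptions.
import numpy as np
from sklearn.svm import OneClassSVM


def timing_features(key_events):
    """Turn a sequence of (key, press_time, release_time) tuples into
    dwell times (how long each key is held) and flight times (gaps
    between consecutive keys). Assumes a fixed username, so every
    session yields a vector of the same length."""
    dwells = [release - press for _, press, release in key_events]
    flights = [key_events[i + 1][1] - key_events[i][2]
               for i in range(len(key_events) - 1)]
    return np.array(dwells + flights)


class BehaviouralAuthenticator:
    def __init__(self):
        # nu bounds the fraction of genuine samples treated as outliers
        self.model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)

    def enrol(self, genuine_sessions):
        """Fit the user's typing profile from several genuine login attempts."""
        X = np.vstack([timing_features(s) for s in genuine_sessions])
        self.model.fit(X)

    def verify(self, session):
        """Return True if the typing pattern matches the enrolled profile."""
        x = timing_features(session).reshape(1, -1)
        return self.model.predict(x)[0] == 1
```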

Paper: Arif Khan F., Kunhambu S., G K.C. (2019) Behavioral Biometrics and Machine Learning to Secure Website Logins

US Patent: Arif Khan, Falaah, Kunhambu, Sajin and Chakravarthy G, K. Behavioral Biometrics and Machine Learning to secure Website Logins. US Patent 16/257650, filed January 25, 2019

CAPTCHAs, short for Completely Automated Public Turing tests to tell Computers and Humans Apart, have been around since 2003 as the simplest human-user identification test. They can be understood as Reverse Turing Tests because, in solving a CAPTCHA challenge, it is a human subject who is trying to prove their human-ness to a computer program.

Over the years we have seen CAPTCHA challenges evolve from being a string of characters for the user to decipher, to being an image-selection challenge, to being as simple as ticking a checkbox. As each new CAPTCHA scheme hits the market, it is inevitably followed by research on new techniques to break it. Engineers must then go back to the drawing board and design a new, more secure CAPTCHA scheme, which, upon deployment and subsequent use, is again inevitably subjected to adversarial scrutiny. This arduous cycle of designing, breaking, and then redesigning to withstand subsequent breaking has become the de facto lifecycle of a secure CAPTCHA scheme. This raises the question: are our CAPTCHAs truly “Completely Automated”? Is the labor involved in designing each new secure scheme outweighed by the speed with which a suitable adversary can be designed? Is the fantasy of creating a truly automated Reverse Turing Test dead?

Reminding ourselves of why we count CAPTCHAs as such an essential tool in our security toolbox, we characterize CAPTCHAs along a robustness-user experience-feasibility trichotomy. With this characterization, we introduce a novel framework that leverages Adversarial Learning and Human-in-the-Loop Bayesian Inference to design CAPTCHA schemes that are truly automated. We apply our framework to character CAPTCHAs and show that it does in fact generate a scheme that steadily moves closer to our design objectives of maximizing robustness while maintaining user experience and minimizing allocated resources, without requiring manual redesign.
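
To make that loop concrete, here is a deliberately simplified sketch of the idea, not the patented framework: a parameterised CAPTCHA generator is hardened against an automated solver, while a Beta posterior over the human pass rate, updated from human-in-the-loop trials, keeps user experience in check. The generator, solver and human-trial hooks are placeholder assumptions.

```python
# Simplified sketch of an adversarial, human-in-the-loop CAPTCHA design loop.
# `generate`, `solver_accuracy` and `human_trials` are hypothetical callables
# standing in for the real components.
def adversarial_design_loop(generate, solver_accuracy, human_trials,
                            distortion=0.1, rounds=20,
                            min_human_pass=0.90, step=0.05):
    """Increase distortion while the estimated human pass rate stays
    acceptable; back off when user experience would suffer."""
    # Beta(1, 1) prior over the probability that a human solves the CAPTCHA
    alpha, beta = 1.0, 1.0
    for _ in range(rounds):
        challenges = [generate(distortion) for _ in range(100)]

        # Adversarial signal: how often does the automated solver win?
        bot_rate = solver_accuracy(challenges)

        # Human-in-the-loop signal: update the Beta posterior from a small
        # batch of human attempts on the same challenges.
        passes, fails = human_trials(challenges)
        alpha, beta = alpha + passes, beta + fails
        human_rate = alpha / (alpha + beta)

        # Harden the scheme only while humans still cope with it.
        if human_rate >= min_human_pass and bot_rate > 0.0:
            distortion += step
        elif human_rate < min_human_pass:
            distortion = max(0.0, distortion - step)
    return distortion
```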

US Patent: Arif Khan, Falaah and Sharma, Hari Surender. Framework to Design Completely Automated Reverse Turing Tests. US Patent 16/828520, filed March 24, 2020 and US Patent (Provisional) 62/979500, filed February 21, 2020

Threat modelling is the process of identifying vulnerabilities in an application. The standard practice today involves drawing out the architecture of the product, examining the structure and nature of the calls being made, and determining which components could be vulnerable to which kinds of attacks.

Threat modelling is an extremely important step in the software development lifecycle, but emerging practice shows that teams usually construct and evaluate the threat model only once, before deploying the application. Industrial offerings also cater to this approach by providing tools that generate static models, suitable for one-time reference. The major drawback of this approach is that software is not a static entity: it is subject to dynamic changes in the form of incremental feature enhancements and routine redesign for optimization. Threat modelling should therefore be imparted the same dynamism, and our work attempts to enable this.

Application logs are used to model the product as a weighted directed graph, where vertices are code elements and edges indicate function calls between elements. Unsupervised learning models are used to set edge weights as indicators of vulnerability to a specific attack. Graph filters are then created and nodes that pass through the filter form the vulnerable subgraph. Superimposing all the vulnerable subgraphs with respect to the different attacks gives rise to a threat model, which is dynamic in nature and evolves as the product grows.
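
As an illustration of that graph construction (not the exact production pipeline), here is a sketch using networkx; the per-attack edge-weighting function is a placeholder standing in for the unsupervised models trained on the application logs.

```python
# Sketch of the threat-model graph described above. `vulnerability_score`
# is a placeholder for the unsupervised, log-driven weighting step.
import networkx as nx


def build_call_graphs(log_records, vulnerability_score, attacks):
    """log_records: iterable of (caller, callee, call_metadata) tuples
    derived from application logs. Returns one weighted digraph per attack."""
    graphs = {}
    for attack in attacks:
        g = nx.DiGraph()
        for caller, callee, meta in log_records:
            # Edge weight = estimated vulnerability of this call to `attack`
            g.add_edge(caller, callee,
                       weight=vulnerability_score(meta, attack))
        graphs[attack] = g
    return graphs


def threat_model(graphs, threshold=0.5):
    """Filter each per-attack graph down to its vulnerable subgraph and
    superimpose them into a single, evolving threat model."""
    model = nx.DiGraph()
    for attack, g in graphs.items():
        vulnerable_edges = [(u, v) for u, v, w in g.edges(data="weight")
                            if w >= threshold]
        model = nx.compose(model, g.edge_subgraph(vulnerable_edges))
    return model
```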

The event-based search engine is an enhancement to conventional image search. When searching for an object, such as a person, an image search using facial recognition may not yield many results, especially if there are relatively few pictures of that person. We address this limitation by indexing objects based on their occurrence at events. Bipartite graphs are used for search optimization and complexity minimization, while propensity scoring models are used to maximize the precision of information retrieval performed on the graph.

As an example, a server hosting a search engine may receive a search query and determine a searched time interval, a searched object, and a searched event. The server may select, based on the searched time interval, a portion of an object-event bipartite graph that was created using information gathered from social media sites. The server may compare attributes of individual events in the portion with attributes of the searched event to identify a set of relevant events. The server may determine objects associated with the relevant events and compare attributes of individual objects with the attributes of the searched object to identify a set of relevant objects. The search engine may provide search results that include the set of relevant objects ordered according to their similarity to the searched object.
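
Here is a toy rendering of that query flow, with the bipartite graph held in plain dictionaries and attribute matching reduced to Jaccard similarity over tag sets. The real system uses propensity scoring models for ranking, so treat this purely as an illustration.

```python
# Toy sketch of the object-event bipartite search described above.
# Data layout and similarity measure are assumptions for illustration.
def jaccard(a, b):
    """Similarity between two sets of attribute tags."""
    return len(a & b) / len(a | b) if a | b else 0.0


def search(graph, events, objects, query_interval, query_event_tags,
           query_object_tags, event_threshold=0.3):
    """graph: dict mapping event id -> set of object ids (bipartite edges).
    events / objects: dicts of id -> {'time': t, 'tags': set(...)} metadata."""
    start, end = query_interval

    # 1. Keep only events inside the searched time interval...
    candidates = [e for e, meta in events.items()
                  if start <= meta["time"] <= end]

    # 2. ...whose attributes resemble the searched event.
    relevant_events = [e for e in candidates
                       if jaccard(events[e]["tags"], query_event_tags)
                       >= event_threshold]

    # 3. Walk the bipartite edges to collect objects seen at those events.
    candidate_objects = set()
    for e in relevant_events:
        candidate_objects |= graph.get(e, set())

    # 4. Rank objects by similarity to the searched object.
    return sorted(candidate_objects,
                  key=lambda o: jaccard(objects[o]["tags"], query_object_tags),
                  reverse=True)
```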

US Patent: Arif Khan, Falaah, Mohammed, Tousif, Gupta, Shubham, Dinh, Hung and Kannapan, Ramu. Event-Based Search Engine, US Patent 16/752775, filed January 27, 2020

Stuff

Articles, Talks and More!

Visit my blog https://thefaladox.wordpress.com/ for the entire archive of essays.

September 15, 2020 | Article

Hope Returns to the Machine Learning Universe

According to witnesses, Earth has been visited by the Superheroes of Deep Learning. What do they want? What powers do they possess? Will they fight for good or for evil? Read to learn more!

June 11th, 2020 | Interview

Interview with AI Hub

I sat down with the folks at AIHub to chat about my work and art. We talk (meta-)security, scientific comics and demystifying the hype around AI.

(BONUS!) What didn't make it into the transcript: ideating how we would conduct a global (Reverse) Turing Competition where it's GANs vs. artists, and pondering which problem humanity will solve first: creating AGI or disposing of the media hype.

February 20, 2020 | Talk

The Impossibility of Productizable AI: Problems and Potential Solutions

In my talk at the Sparks Tech Forum at Dell, Bangalore, I present a social and technical perspective on the most pressing problems in Machine Learning today, the sources of these problems and some potential solutions.

Slides

May 28, 2020 | Talk

The Hitchhiker's Guide to Technology: A Conversation on Careers in Tech

In this invited talk for the CETI Group's Career counselling initiative I share some friendly advice to undergraduate students from India on how to navigate the current industrial landscape, with special emphasis on prospects in AI/ML research.

Slides

March 24, 2020 | Talk

We Don't Need No Bot Infestation: Machine Learning for Cyber Security

In this talk for Dell's Technology and Innovation Pillar, I explore the applicability of machine intelligence and data-driven modelling for enterprise security and illustrate the best approaches to building 'intelligent security'.

January 4, 2020 | Article

Deep Learning Perspectives from Death Note: Another Approximately Inimitable Exegesis

Masked under a binge-worthy anime lies an adept critique of the ongoing deep learning craze in the industry. Here’s my commentary on the technical symbols in Death Note.

Get in Touch

Contact

Get in touch if you want to collaborate on an interesting research idea, want some custom cartoons for your presentations or some personalized art for your thesis/book covers, or simply want to discuss something wonderfully esoteric!