Experience

Senior Software Engineer

Dropbox

New York, NY
  • Member of the team that manages the previews-serving infrastructure, which converts uploaded files into previewable formats. The service handles up to 20k requests per second (QPS) at peak, performing conversions within jailed environments.
  • Migrated legacy HTTP routes to gRPC-based services by creating a wrapper service around legacy libraries, as part of the move toward a service-oriented architecture (SOA).
  • DRI (directly responsible individual) for migrating file metadata storage and the extraction pipeline to on-the-fly extraction. Gained alignment across multiple teams to deprecate defunct use cases, reducing costs by 400k/yr.
  • Worked on the new document conversion pipeline, which securely converts MS Office documents into previewable formats at scale within jailed environments.

Sr. Assoc Research Staff

NEC Labs America

Princeton, NJ
  • NGLA: An end-to-end log analytics service (Jan 2015 - Nov 2017)

    • Architected and led the design and development of streaming anomaly detection using a NoSQL database (Elasticsearch), Kafka, and Spark Streaming. Owned most components of the streaming analytics pipeline.
    • Collaborated on the design of complex time-series, stateful, and stateless log analytics in a multi-tier setup.
    • Designed a control interface for streaming analytics job management (tasks included model management, in-memory state, periodic anomaly checks, start/stop, and cleanup).
    • Modified core Apache Spark code to support on-the-fly updates of broadcast models, and used this to deploy a model control management interface in Spark Streaming.
    • Designed a prototype web interface for real-time visualization using Flask, JavaScript, and Bootstrap.
    • Founding member of an initial team of 3; worked on data cleaning, preprocessing, log pattern representation, and parsing, including multiple proof-of-concept (POC) trials on customer data.
  • Behavior Analysis Engine (BAE) (Jan 2017 - Nov 2017)

    • A semantic language framework for knowledge representation that expresses complex machine-learning log models and allows administrators to encode domain knowledge as “rules” and “behaviors”.
    • Worked in a two-person team writing the language grammar and execution operations using Spark SQL.
    • Developed (in progress) a RESTful API to convert BAE into an SOA with job and rule management.
  • CLUE: Distributed System Trace Analytics (Jan 2013 - May 2015)

    • Stitched kernel event logs together to generate end-to-end “transactions” that can give a “CLUE” to the root cause of bugs. CLUE uses data mining and transaction clustering to find potential anomalies.
    • Developed iProbe, a novel hybrid (static + dynamic) binary instrumentation tool with an order of magnitude better performance than state-of-the-art tools.
    • Collaborated on core engine development; designed the interface and data visualizations; and handled project management.
  • NetLogic (Jan 2015 - Dec 2015)

    • Built a software-defined data center and cloud environment by deploying OpenStack with Open vSwitch-based network management. Deployed and managed the OpenStack infrastructure, and wrote several wrappers to set up a small internal cloud.
    • Developed HybNET, a novel prototype network manager for hybrid network infrastructures containing both SDN and legacy switches. The controller enabled centralized network management despite only a partial transition to SDN switches.

Business Analyst

McKinsey and Co

New York, NY
Product Owner Proxy for the Scrum roll-out team (Agile software development) in McKinsey App-Dev. Also designed the architecture and a proof of concept for a trend analysis tool.

Graduate Research Assistant

Columbia University

New York, NY
  • Thesis: Developed Parikshan, an on-the-fly sandboxed debugging framework that lets developers debug SOA applications hosted in user-space containers using a cloned parallel container, without downtime or impact on the production-facing service.
  • Parikshan leverages live cloning (a modification of live migration) and a new network duplication proxy to enable on-the-fly cloning of OpenVZ containers.
  • Also worked on other lab projects: COMPASS, research in multi-core software engineering, binary/run-time instrumentation, static and dynamic program analysis, and recommender systems, as well as system administration and mentoring research students.

Research Consultant

Instituto de Soldadura e Qualidade

New York, NY
Designed a prototype Decision Support Tool, with an interactive interface, for a combined natural gas and hydrogen fuel being tested for use in pipelines across Europe. The tool was built in Visual Basic .NET.

Research Assistant

Indian Institute of Technology

New York, NY
Worked in the Computer Integrated Manufacturing (CIM) Lab comparing genetic algorithms, simulated annealing, and tabu search to evaluate their relative efficiency.

Projects

NGLA: Next Generation Log Analytics

Most modern-day software generates human-readable logs for developers/administrators to understand and determine the cause of any error …

CLUE

Modern computer systems, from single servers to large cloud deployments, generate billions of events that reflect the state and …

Publications

Short time-to-localize and time-to-fix for production bugs is extremely important for any 24x7 service-oriented application (SOA). …

Administrators of most user-facing systems depend on periodic log data to get an idea of the health and status of production …

Network virtualization has been propounded as a diversifying attribute of the future inter-networking paradigm. However, monitoring and …

Troubleshooting Software-Defined Networks requires a structured approach to detect mistranslations between high-level intent (policy) …

Performance bugs are frequently observed in commodity software. While profilers or source code-based tools can be used at development …

Unified tracing is the process of collecting trace logs across the boundary of kernel and user spaces, and has been used to understand …

In this paper, we present CLUE, a system event analytics tool for black-box performance diagnosis in production Cloud Computing …

To diagnose performance problems in production systems, many OS kernel-level monitoring and analysis tools have been proposed. Using …

Calling context provides important information for a large range of applications, such as event logging, profiling, debugging, anomaly …

The emergence of Software-Defined Networking (SDN) has led to a paradigm shift in network management. SDN has the capability to provide …

Application tracing in production systems requires dynamic and flexible instrumentation mechanisms with low overhead. Tracing tools may …

We present a tool BEST (Binary instrumentation-based Error-directed Symbolic Testing) for predicting concurrency violations. We …

Short time-to-bug localization is extremely important for any 24x7 service-oriented application. To this end, we introduce a new …

Recommender systems have become increasingly popular. Most research on recommender systems has focused on recommendation algorithms. …

Recommender systems have become increasingly popular. Most of the research on recommender systems has focused on recommendation …

The widespread adoption of multicores has renewed the emphasis on the use of parallelism to improve performance. The present and …

In this poster, we will describe research being done at Columbia University on a system called COMPASS: A community driven …

Patents

Issued:
  • USPTO - 14030 Path Selection in Hybrid Networks (8/9/2016)
  • USPTO - 13148 Dynamic Border Line Tracing for Tracking Message Flows Across Distributed Systems (1/3/2017)
  • USPTO - 13062 Transparent Performance Inference of Whole Software Layers and Context Sensitive Performance Debugging (6/14/2016)
  • USPTO - 13035 Method and Apparatus for Managing Hybrid Network Systems (9/20/2016)
  • USPTO - 12155 Guarding a Monitoring Scope and Interpreting Partial Control Flow (10/18/2016)
  • USPTO - 12082 Method and System for Computer Assisted Hot-Tracing Mechanism (11/8/2016)
  • USPTO - 12049 Blackbox Memory Monitoring with a Calling Context Memory Map and Semantic Extraction (4/7/2015)
  • USPTO - 12016 Efficient Unified Tracing of Kernel and User Events with Multi-Mode Stacking (11/25/2014)
  • USPTO - 12010 Method and Apparatus for Correlated Tracing with Automated Multi-Layer Function Instrumentation Localization (7/28/2015)
  • Japan Patent Office - 13035J Hybrid Network Management (11/10/2015)
Pending:

Pending patents available on request.

Contact

nipun<at>cs<dot>columbia<dot>edu