Arpit Tak

Big Data Developer

I work on building end-to-end architectures for big data applications and writing the code that runs on them (using Python, Java, Scala, and shell scripts). I have built scalable big data systems from scratch, including handling data from multiple structured/unstructured sources, web crawling, and search.



Education

  • Batch of 2013

    Govt. Center for Converging Technologies

    Integrated Bachelor & Master of Technology

Work Experience

  • 15 Sept 2016 – Present

    Quotient Technology Inc.

    Spark Developer

    • Designed and developed a consumer-behavior ETL pipeline to capture all events and understand shopping patterns, from Sign In through the Thank You page, for 10M users per day (~100M records).
    • This pipeline serves as the source of truth for media revenue, sales revenue, and reporting dashboards, and helps track $12M in revenue per month.
    • It supports monetization by powering campaigns and offers for different brands and CPGs, increasing sales by 1-2% month over month.
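A minimal sketch of the funnel analysis such a pipeline performs, in plain Python for illustration (the production pipeline runs on Spark, and the event names and schema here are hypothetical assumptions):

```python
from collections import defaultdict

# Hypothetical clickstream events: (user_id, event_name), in time order.
events = [
    ("u1", "sign_in"), ("u1", "view_offer"), ("u1", "thank_you"),
    ("u2", "sign_in"), ("u2", "view_offer"),
    ("u3", "sign_in"), ("u3", "thank_you"),
]

def funnel_completion(events):
    """Return the set of users who signed in and reached the Thank You page."""
    seen = defaultdict(set)
    for user, name in events:
        seen[user].add(name)
    return {u for u, names in seen.items() if {"sign_in", "thank_you"} <= names}

print(sorted(funnel_completion(events)))  # ['u1', 'u3']
```

At production scale the same grouping would be expressed as a Spark aggregation over the day's event log rather than an in-memory dictionary.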

  • Jan 2016 – Aug 2016

    Focus Analytics

    Spark Developer

    • Tracked user activity and sent notifications (discount coupons/offers) to users' mobile apps based on their current location.
    • The system surfaces patterns such as how many users visit shopping malls per city/region/state, and where they spend the most time within a mall (e.g., stores vs. the food court).
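The location trigger behind such notifications can be sketched as a simple geofence check (plain Python; the coordinates, radius, and function names below are illustrative assumptions, not the actual production logic):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def should_notify(user_pos, mall_pos, radius_km=0.5):
    """Trigger a coupon notification when the user is inside the mall geofence."""
    return haversine_km(*user_pos, *mall_pos) <= radius_km
```

A user standing at the mall's coordinates triggers a notification; a user in another city does not.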

  • 1 April 2015 – 30 Nov 2015

    Big Data Developer

    • This project re-targets users who drop off from websites (e-commerce, hotels, flights, apps) by reaching them on other (publisher) websites across the internet.
    • I have written more than 25 Pig scripts that build the re-targeting model, running every day as part of a daily Azkaban job and processing 2TB+ of data on a Hadoop cluster of more than 400 nodes.
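The drop-off selection at the heart of such a model can be sketched in a few lines (plain Python with a made-up session schema; the production version ran as Pig scripts on Hadoop):

```python
# Hypothetical session records: (user_id, reached_checkout, purchased).
sessions = [
    ("u1", True, False),
    ("u2", True, True),
    ("u3", False, False),
    ("u4", True, False),
]

def retarget_audience(sessions):
    """Users who reached checkout but dropped off without purchasing."""
    return [u for u, checkout, bought in sessions if checkout and not bought]

print(retarget_audience(sessions))  # ['u1', 'u4']
```

In Pig, the equivalent would be a FILTER on the joined session data, emitting the audience segment consumed by downstream ad-serving jobs.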

  • Dec 2013 - Sept 2014

    Sigmoid Analytics

    Software Developer

    • Wrote bootstrap code and scripts to automate infrastructure on AWS: the scripts install Hadoop, Spark, and Shark on the cluster, load 200+ GB of publicly available Wikipedia data from S3 into HDFS using s3cmd, and cache it automatically.
    • Builds a scalable 10-machine cluster from scratch in just 20 minutes, with 200 GB of data cached in Spark.
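The S3-to-HDFS staging step of such a bootstrap script can be sketched as a small command builder (plain Python; the bucket name, HDFS destination, and mount path are hypothetical, though `s3cmd get` and `hadoop fs -put` are the real CLI commands involved):

```python
def bootstrap_commands(bucket, hdfs_dest, mount="/mnt/data"):
    """Build the shell commands a bootstrap script might run to stage
    public data from an S3 bucket into HDFS (paths are hypothetical)."""
    return [
        f"s3cmd get --recursive s3://{bucket}/ {mount}/",   # pull from S3 to local disk
        f"hadoop fs -mkdir -p {hdfs_dest}",                 # create the HDFS target dir
        f"hadoop fs -put {mount}/* {hdfs_dest}/",           # load into HDFS
    ]

for cmd in bootstrap_commands("wikipedia-dumps", "/data/wikipedia"):
    print(cmd)
```

A real bootstrap script would run these on each new node (and then register an RDD over the HDFS path and cache it in Spark).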

  • Aug – Nov 2012

    Newgen Software Technologies

    Software Engineering Intern, New Delhi

    • Designed and built Credit Amendment and Loan Approval processes.
    • Data insertion and routing is done in SQL, with integration via a Java-based API.


I have expertise in the following technologies:

  • Apache Spark
  • Shark
  • Hadoop
  • Java
  • Spark SQL
  • Apache Hive
  • Apache Pig
  • HBase
  • Python
  • ElasticSearch
  • Scala
  • Azkaban
  • Information Retrieval

Technical Talks

I have given technical talks on big data to many corporate audiences online, mostly on the internals and technical aspects of Apache Spark. The most prominent ones are:

  • Yahoo- Bangalore Office

    Gave a talk on scheduling Hadoop and big data jobs, detecting and resolving Hadoop job failures, and the internals of Azkaban.

  • Yahoo

    In-memory cluster computing on 1 TB of Wikipedia data, demonstrating query response times of 20-30 seconds.

  • Kaggle

    In-memory cluster computing on 500 GB of Wikipedia data, demonstrating query response times of 15-20 seconds.

  • Sears Holdings

    Queries with bounded errors and bounded response times using BlinkDB on 500 million records taken from Berkeley AMPLab data.

Get In Touch

I'd love to hear from you. If you think I would be a good fit for your upcoming project, or you would just like to say hello, please fill out the form below.
