
Arpit Tak

Big Data Developer

I build end-to-end architectures for big data applications and write and run the code on them (using Python, Java, Scala, and shell scripts). I have built scalable big data systems from scratch, including handling data from multiple structured and unstructured sources, web crawling, and search.

Resume

Education

  • Batch of 2013

    Govt. Center for Converging Technologies

    Integrated Bachelor & Master of Technology

Work Experience

  • 15 Sept 2016 – Present

    Quotient Technology Inc.

    Spark Developer

    • Process and write big data Spark jobs (a minimal sketch of such a job follows this list).
    • Integrate shell, Python, Bash, and Hive jobs into Oozie workflows.
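
    A minimal sketch, in PySpark, of the kind of daily batch job that an Oozie workflow would schedule; the paths, column names, and aggregation logic here are illustrative placeholders, not the production job.

      # Minimal sketch of a daily Spark batch job of the kind an Oozie workflow
      # would schedule; all paths and column names are hypothetical.
      import sys
      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      def main(input_path, output_path):
          spark = SparkSession.builder.appName("daily-rollup").getOrCreate()

          # Read the day's raw events (assumed to be JSON lines on HDFS).
          events = spark.read.json(input_path)

          # Aggregate events per campaign for downstream Hive queries.
          rollup = (events
                    .groupBy("campaign_id")
                    .agg(F.count("*").alias("events"),
                         F.countDistinct("user_id").alias("unique_users")))

          rollup.write.mode("overwrite").parquet(output_path)
          spark.stop()

      if __name__ == "__main__":
          main(sys.argv[1], sys.argv[2])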

  • 1 April 2015 – 30 Nov 2015

    Vizury

    BigData Developer

    • This project re-targets users who drop off from websites (e-commerce, hotels, flights, apps) by showing them ads on other (publisher) websites across the internet.
    • I wrote more than 25 Pig scripts that build the re-targeting model. They run every day as part of a daily Azkaban job, processing more than 2 TB of data on a Hadoop cluster of over 400 nodes (an illustrative sketch of this kind of aggregation follows this list).
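
    Illustrative sketch only: the production pipeline was written in Pig and scheduled in Azkaban, and its real logic is not reproduced here. The PySpark analogue below just shows the general shape of a daily drop-off aggregation; every path and field name is hypothetical.

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("retargeting-dropoff").getOrCreate()

      # Raw browse/checkout events collected from advertiser sites (hypothetical path).
      events = spark.read.json("hdfs:///data/events/dt=2015-11-30")

      # Users who viewed a product but never purchased it are candidates
      # for re-targeting on publisher sites.
      viewed = events.filter(F.col("event") == "product_view").select("user_id", "product_id")
      bought = events.filter(F.col("event") == "purchase").select("user_id", "product_id")
      dropped_off = viewed.join(bought, ["user_id", "product_id"], "left_anti")

      # One row per user with the products to re-target them with.
      (dropped_off
       .groupBy("user_id")
       .agg(F.collect_set("product_id").alias("products"))
       .write.mode("overwrite")
       .parquet("hdfs:///models/retargeting/dt=2015-11-30"))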

  • July 2013 – March 2015

    Sigmoid Analytics

    Software Developer

    • Wrote bootstrap code and scripts to automate infrastructure on an AWS cluster: they install Hadoop, Spark, and Shark, load 200+ GB of publicly available Wikipedia data from S3 into HDFS using s3cmd, and cache it automatically.
    • The scripts build a scalable 10-machine cluster from scratch in about 20 minutes, with 200 GB of data cached in Spark (a rough sketch of the caching step follows this list).
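
    A rough sketch, assuming the public Wikipedia dump sits in an S3 bucket readable by the cluster; the bucket name is a placeholder, and the real bootstrap used shell scripts and s3cmd rather than PySpark.

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("wiki-cache").getOrCreate()

      # Pull the dump straight from S3 and pin it in cluster memory.
      wiki = spark.sparkContext.textFile("s3a://example-bucket/wikipedia/*.txt")
      wiki.cache()

      # The first action materializes the cache; later queries run from memory.
      print("lines cached:", wiki.count())
      print("lines mentioning Spark:",
            wiki.filter(lambda line: "Spark" in line).count())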

  • Aug 2012 – Nov 2012

    Newgen Software Technologies

    Software Engineering Intern, New Delhi

    • Designed and created the Credit Amendments and Loan Approval processes.
    • Data insertion and routing were implemented in SQL, with integration through a Java-based API.

Skills

I have expertise in the following technologies:

  • Apache Spark
  • Shark
  • Hadoop
  • Java
  • SparkSql
  • Apache Hive
  • Apache Pig
  • HBase
  • Python
  • ElasticSearch
  • Scala
  • Azkaban
  • Information Retrieval

Technical Talks

I have given technical talks on big data to many corporate executives online, mostly on insights into and the technical internals of Apache Spark. The most prominent ones are:

  • Yahoo – Bangalore Office

    Gave a talk on scheduling Hadoop and big data jobs, detecting and resolving Hadoop job failures, and the internals of Azkaban.

  • Yahoo

    In-memory cluster computing on 1 TB of Wikipedia data, demonstrating response times of roughly 20-30 seconds.

  • Kaggle

    In-memory cluster computing on 500 GB of Wikipedia data, demonstrating response times of roughly 15-20 seconds.

  • Sears Holdings

    Queries with bounded errors and bounded response times using BlinkDB on 500 million records taken from the Berkeley AMPLab dataset.

Get In Touch

I'd love to hear from you. If you think I would be a good fit for your upcoming project, or would just like to say hello, please fill out the form below.
