I work on building end-to-end architectures for big data applications and running code on them (using Python, Java, Scala, and shell scripts). I have built scalable big data systems from scratch, including handling data from multiple structured and unstructured sources, web crawling, and search.
15 Sept 2016 – Present
• Processing and writing big data Spark jobs.
• Integrating Shell, Python, Bash, and Hive jobs into Oozie workflows.
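In an Oozie workflow, each of these jobs becomes an action node with ok/error transitions between them. Purely as an illustration of that chaining, here is a minimal Python sketch of a sequential driver with fail-fast behavior; the step names and commands are placeholders, not the production jobs:

```python
import subprocess

# Ordered pipeline steps: (name, shell command).
# These commands are placeholders standing in for the real
# shell, Python, Bash, and Hive jobs chained in the workflow.
STEPS = [
    ("extract", ["echo", "extract done"]),
    ("transform", ["echo", "transform done"]),
    ("load", ["echo", "load done"]),
]

def run_pipeline(steps):
    """Run each step in order and record its exit code; stop at the
    first failure, mirroring Oozie's ok/error transitions."""
    results = {}
    for name, cmd in steps:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[name] = proc.returncode
        if proc.returncode != 0:
            break  # Oozie would route to the kill node here
    return results

if __name__ == "__main__":
    print(run_pipeline(STEPS))
```

Oozie expresses the same ordering declaratively in a workflow XML rather than imperatively as above; the sketch only shows the control flow.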
1 Apr 2015 – 30 Nov 2015
• This project aims to re-target users who drop off from websites (e-commerce, hotels, flights, apps) by reaching them on other (publisher) websites across the internet.
• I have written more than 25 Pig scripts that build the re-targeting model; they run every day as part of a daily Azkaban job, processing more than 2 TB of data on a Hadoop cluster of 400+ nodes.
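The core of a re-targeting model is identifying drop-off users: visitors who showed intent (e.g. viewed a product) but never converted. The Pig scripts express this kind of join-and-filter logic over HDFS-scale clickstream logs; as an illustration only, here is the same idea on a toy event list in plain Python (the event and field names are hypothetical, not the production schema):

```python
# Toy event log: (user_id, event). In production these would be
# HDFS-scale clickstream logs processed by Pig scripts.
events = [
    ("u1", "view_product"),
    ("u1", "add_to_cart"),
    ("u2", "view_product"),
    ("u2", "purchase"),
    ("u3", "view_product"),
]

def dropoff_users(events, intent=("view_product", "add_to_cart"),
                  convert="purchase"):
    """Users with at least one intent event and no conversion event --
    the audience a re-targeting model scores."""
    intent_users = {u for u, e in events if e in intent}
    converted = {u for u, e in events if e == convert}
    return intent_users - converted

print(sorted(dropoff_users(events)))  # → ['u1', 'u3']
```

u2 completed a purchase and is excluded; u1 and u3 dropped off and become re-targeting candidates.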
July 2013 - March 2015
• Wrote bootstrap code and scripts to automate infrastructure on an AWS cluster: they install Hadoop, Spark, and Shark, load 200+ GB of publicly available Wikipedia data from S3 into HDFS using s3cmd, and set up caching automatically.
• The scripts build a scalable 10-machine cluster from scratch in about 20 minutes, with 200 GB of data cached in Spark.
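The bootstrap flow above amounts to an ordered per-node command plan. A hedged Python sketch of that structure with a safe dry-run mode follows; the bucket path, install command, and cache command are illustrative placeholders (only the `s3cmd sync` and `hdfs dfs -put` invocations reflect real CLI usage), not the actual production scripts:

```python
import subprocess

def bootstrap_plan(num_nodes=10, s3_path="s3://example-bucket/wikipedia/"):
    """Ordered command plan for standing up the cluster and pre-loading
    data. The bucket path, install-stack, and spark-cache commands are
    placeholders for the real bootstrap scripts."""
    plan = []
    for node in range(num_nodes):
        # Install the stack on each node (placeholder installer command).
        plan.append((node, "install", "install-stack hadoop spark shark"))
    # Pull the public Wikipedia dump from S3 once, push it into HDFS,
    # then cache it in Spark (the last two run from one node).
    plan.append((0, "fetch", f"s3cmd sync {s3_path} /tmp/wikipedia/"))
    plan.append((0, "load", "hdfs dfs -put /tmp/wikipedia /data/"))
    plan.append((0, "cache", "spark-cache /data/wikipedia"))
    return plan

def run(plan, dry_run=True):
    """With dry_run=True (the default), print the plan instead of
    executing it, so the sketch is safe to run anywhere."""
    for node, step, cmd in plan:
        if dry_run:
            print(f"node={node} step={step}: {cmd}")
        else:
            subprocess.run(["sh", "-c", cmd], check=True)

run(bootstrap_plan(num_nodes=2))
```

Keeping the plan as data separates "what to run" from "how to run it", which is what makes the 20-minute automated build repeatable.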
Aug 2012 – Nov 2012
• Designed and created the Credit Amendments and Loan Approval process.
• Data insertion and routing are done in SQL, and integration is done through a Java-based API.
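The production flow used SQL with a Java-based API; purely as an illustration of the insert-and-route pattern, here is a minimal sketch using Python's built-in sqlite3 (the table, columns, and approval threshold are hypothetical, not the actual business rules):

```python
import sqlite3

# In-memory database standing in for the production SQL store;
# table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE loan_applications (
        id INTEGER PRIMARY KEY,
        amount REAL,
        status TEXT
    )
""")

def submit_application(conn, amount, auto_approve_limit=10000):
    """Insert an application and route it: small amounts are
    auto-approved, larger ones go to a credit-amendment review queue.
    The threshold is an illustrative placeholder."""
    status = "approved" if amount <= auto_approve_limit else "review"
    cur = conn.execute(
        "INSERT INTO loan_applications (amount, status) VALUES (?, ?)",
        (amount, status),
    )
    return cur.lastrowid, status

print(submit_application(conn, 5000))    # small loan: auto-approved
print(submit_application(conn, 250000))  # large loan: routed to review
```

In the real system the routing decision and the insert would sit behind the Java API rather than in application-side Python.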
I have expertise in the following applications:
I have given technical talks on big data to many corporate executives online, mostly on the insights and technical aspects of Apache Spark. The most prominent ones are:
I'd love to hear from you. If you think I would be a good fit for your upcoming project, or you would just like to say hello, please fill out the form below.