Responsibilities and Duties
1. Understand business requirements and help the development teams assess them.
2. Create high-quality documentation supporting design and coding tasks.
3. Participate in architecture and design discussions, and develop ETL/ELT pipelines using PySpark and Spark SQL (a pipeline sketch follows this list).
4. Conduct code and design reviews and provide feedback.
5. Identify areas of improvement in the framework and processes, and work to make them better.
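To make item 3 concrete, the following is a minimal sketch of the kind of PySpark / Spark SQL ETL pipeline this role involves; the paths, table, and column names are hypothetical placeholders, not details from this posting.

# Minimal PySpark ETL sketch: extract, transform with Spark SQL, load.
# All paths, tables, and columns below are hypothetical examples.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl_sketch").getOrCreate()

# Extract: read raw source data (hypothetical location and format).
orders = spark.read.parquet("/data/raw/orders")
orders.createOrReplaceTempView("orders")

# Transform: aggregate with Spark SQL.
daily_totals = spark.sql("""
    SELECT order_date,
           customer_id,
           SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date, customer_id
""")

# Load: write the curated result, partitioned by date.
daily_totals.write.mode("overwrite").partitionBy("order_date").parquet(
    "/data/curated/daily_totals")

spark.stop()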
Key Skills
Desired Skills:
1. Airflow (a DAG sketch follows this list)
2. Understanding of object-oriented programming
3. DevOps implementation knowledge
4. Git commands
5. Python modules such as Sphinx, Pandas, SQLAlchemy, McCabe, and unittest
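As a hedged illustration of the Airflow skill above, here is a minimal DAG sketch; the DAG id, schedule, and job path are hypothetical examples, and the task simply wraps a spark-submit call in a BashOperator.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="nightly_etl_sketch",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",   # run the pipeline once per day
    catchup=False,
) as dag:
    # Submit a (hypothetical) PySpark ETL job to the cluster.
    run_etl = BashOperator(
        task_id="run_spark_etl",
        bash_command="spark-submit /jobs/etl_sketch.py",
    )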
Required Experience and Qualifications
Qualifications:
1. At least 3 years working experience in a Big Data Environment
2. Knowledge of design and development best practices in data warehouse environments
3. Experience developing large-scale distributed computing systems
4. Knowledge of the Hadoop ecosystem and its components – HBase, Pig, Hive, Sqoop, Flume, Oozie, etc.
5. Experience with PySpark and Spark SQL
6. Experience integrating data from multiple sources
7. Experience implementing ETL processes in Hadoop (developing big data ETL jobs that ingest, integrate, and export data) and converting Teradata SQL to Spark SQL (a conversion sketch follows this list)
8. Experience with Presto, Kafka, and NiFi
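As a hedged example of the Teradata-to-Spark conversion work in item 7, the sketch below rewrites Teradata's QUALIFY clause, which has no direct equivalent in Spark SQL, as a window-function subquery; the table and column names are hypothetical, and the query assumes an "orders" view has already been registered.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("teradata_to_spark_sketch").getOrCreate()

# Teradata original (illustrative):
#   SEL customer_id, order_date, amount
#   FROM orders
#   QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id
#                              ORDER BY order_date DESC) = 1;

# Spark SQL rewrite: move the QUALIFY predicate into a subquery filter.
latest_orders = spark.sql("""
    SELECT customer_id, order_date, amount
    FROM (
        SELECT customer_id, order_date, amount,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY order_date DESC) AS rn
        FROM orders
    ) t
    WHERE rn = 1
""")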
Job Type: Contract
Salary: $60.00 to $65.00 per hour
Experience: