Responsibilities include:
Cleanse, manipulate, and analyze large structured and unstructured datasets (XML, JSON, PDF) on the Hadoop platform.
Develop Python, PySpark, and Spark scripts to filter, cleanse, map, and aggregate data.
Manage and implement data processes (data quality reports).
Develop data profiling, deduplication, and matching logic for analysis.
Program in Python, PySpark, and Spark for data ingestion.
Develop on the Hadoop big-data platform.
Present ideas and recommendations to management on the best use of Hadoop and other technologies.
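To illustrate the deduplication and matching work described above, here is a minimal plain-Python sketch. In practice this logic would typically be expressed as PySpark DataFrame operations on the cluster; the field names (`name`, `email`) and helper functions (`normalize`, `dedupe`) are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch of cleanse + dedupe/matching logic; field names
# and helpers are illustrative, not taken from the posting.
from typing import Dict, List


def normalize(record: Dict[str, str]) -> Dict[str, str]:
    # Cleanse step: trim whitespace and lowercase values so that
    # matching is insensitive to formatting differences.
    return {k: v.strip().lower() for k, v in record.items()}


def dedupe(records: List[Dict[str, str]], keys: List[str]) -> List[Dict[str, str]]:
    # Match on a composite key built from the normalized fields;
    # keep the first record seen for each distinct key.
    seen = set()
    unique = []
    for rec in records:
        norm = normalize(rec)
        match_key = tuple(norm.get(k, "") for k in keys)
        if match_key not in seen:
            seen.add(match_key)
            unique.append(rec)
    return unique


rows = [
    {"name": "Alice Smith ", "email": "ALICE@example.com"},
    {"name": "alice smith", "email": "alice@example.com"},
    {"name": "Bob Jones", "email": "bob@example.com"},
]
print(len(dedupe(rows, ["name", "email"])))  # 2 unique records
```

The same idea scales out in PySpark by normalizing columns and calling `dropDuplicates` on the match-key columns.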
Qualifications:
5+ years of experience processing large volumes and varieties of data (structured and unstructured: XML, JSON, PDF), including writing code for parallel processing.
3+ years of programming experience in Python and Spark for data processing and analysis.
Strong SQL experience is a must
3+ years of experience using the Hadoop platform and performing analysis; familiarity with Hadoop cluster environments and resource-management configuration for analysis workloads.