
Data Architect - Remote - Cloud/PySpark/Java or Scala (Inside IR35)

Remote | Inside IR35


A Data Architect with cloud (ideally GCP) and PySpark experience is required for a 6-month contract with a leading financial services organisation based in London. You will architect, design, estimate, develop and deploy cutting-edge software products and services that leverage large-scale data ingestion, processing, storage and querying, with in-stream and batch analytics for cloud and on-prem environments.

THIS ROLE IS FULLY REMOTE AND INSIDE IR35

Experience:

  • Extensive experience with data-related technologies, including knowledge of Big Data architecture patterns and cloud services (AWS/Azure/GCP)
  • GCP experience is desirable (BigQuery, Pub/Sub, Spanner)
  • Experience delivering end-to-end Big Data solutions on-premises and/or in the cloud
  • Knowledge of the pros and cons of various database technologies, such as relational, NoSQL, MPP and columnar databases
  • Expertise in the Hadoop ecosystem with one or more distributions, such as Cloudera or cloud-specific distributions
  • Proficiency in Java and Scala programming languages
  • Python experience
  • Expertise in one or more NoSQL databases (MongoDB, Cassandra, HBase, DynamoDB, Bigtable, etc.)
  • Experience of one or more big data ingestion tools (Sqoop, Flume, NiFi, etc.) and distributed messaging and ingestion frameworks (Kafka, Pulsar, Pub/Sub, etc.)
  • Expertise with at least one distributed data processing framework, e.g. Spark (Core, Streaming, SQL), Storm or Flink
  • Knowledge of flexible, scalable data models addressing a wide variety of consumption patterns, including random and sequential access, with the necessary optimisations such as bucketing, aggregating and sharding
  • Knowledge of performance tuning, optimisation and scaling of solutions from a storage/processing standpoint
  • Experience building DevOps pipelines for data solutions, including automated testing

Desirable:

  • Knowledge of containerisation, orchestration and Kubernetes
  • An understanding of how to set up Big Data cluster security (authorisation/authentication, security for data at rest and data in transit)
  • A basic understanding of how to manage and set up monitoring and alerting for Big Data clusters
  • Experience of orchestration tools - Oozie, Airflow, Control-M or similar
  • Experience of MPP-style query engines like Impala, Presto, Athena, etc.
  • Knowledge of multi-dimensional modelling, such as star schemas, snowflake schemas, and normalised and denormalised models
  • Exposure to data governance, catalogue, lineage and associated tools would be an added advantage
  • A certification in one or more cloud platforms or big data technologies
  • Any active participation in the Data Engineering thought community (e.g. blogs, keynote sessions, POV/POC, hackathons)

Role: Data Architect - Remote - Cloud/PySpark/Java or Scala
Job Type: Contract
Location: Not Specified

