
Remote Apache Spark contract jobs


4 remote Apache Spark contracts

Zaloni Administrator

3 hours ago
Remote · Matlen Silver

Zaloni Platform Administrator
New project requirement: building a Zaloni cluster for GHR.

  • Strong experience with data lakes

Skilled in at least a few of the following technologies:

  • Zaloni development or administration
  • Databases: Oracle, SQL Server, or other relational or non-relational databases
  • Knowledge of Hadoop
  • Apache Kafka, Apache Storm, Apache Spark
  • Familiarity with deploying applications using container technologies (Docker, Kubernetes, etc.)

Critical Skills/Niche Skills: Zaloni Admin
Experience Level: 5 to 7 years

Job Type: Contract


Benefits:

  • 401(k)
  • Health Insurance


Schedule:

  • Monday to Friday

COVID-19 considerations:
The applicant will work remotely until the organization decides to return to the Jersey City, NJ office; the return-to-office date is unknown.


Experience:

  • Zaloni: 1 year (Preferred)
  • Apache Spark, Apache Kafka, Apache Storm: 3 years (Preferred)
  • Hadoop Administration: 5 years (Preferred)

Work authorization:

  • United States (Required)

Contract Renewal:

  • Likely

Full Time Opportunity:

  • Yes

Work Remotely:

  • Temporarily due to COVID-19

Scientific Data Engineer

1 month ago
$55 - $70/hour (Estimated) · Remote · Allen Institute for Immunology

Bioinformatics Data Engineer

The mission of the Allen Institute is to unlock the complexities of bioscience and advance our knowledge to improve human health. Using an open science, multi-scale, team-oriented approach, the Allen Institute focuses on accelerating foundational research, developing standards and models, and cultivating new ideas to make a broad, transformational impact on science.

The goal of the Allen Institute for Immunology is to advance the fundamental understanding of human immunology through the study of immune health and disease where excessive or impaired immune responses drive pathological processes.

The Allen Institute for Immunology is seeking a Bioinformatics Data Engineer (Data Scientist) with broad experience in developing computer codes/scripts to automate the analysis of omics data, especially next generation sequencing (NGS) data, to join our Informatics and Computational Biology team.

You will be part of a multidisciplinary team and will be responsible for (i) development and implementation of data processing and analysis software as needed, (ii) assisting in both pipeline and exploratory analysis of data from diverse assays and sample types, (iii) working towards visualizations and reports for internal and external dissemination. As such, ideal candidates should have a good understanding of sequencing technologies, and a proven track record of development of analytical software packages. This role includes analysis and integration of “big data” types, and working in close collaboration with the software development team for deployment on our interactive cloud environment to ensure user accessibility and generation of actionable insights. You will also support technology development projects in collaboration with the Molecular Biology and Immunology teams.

Good judgment and problem-solving skills are required for recognizing anomalous data, identifying and fixing code bugs and participating in data-driven algorithm design and improvement. A successful candidate will have demonstrated success in big data science, code optimization and deployment. The Bioinformatics Data Engineer must have excellent attention to detail and the eagerness to work in a team science, deadline-driven atmosphere.

Essential Functions

  • Design and develop software programs to optimize scRNA-seq, scATAC-seq & CITE-seq processing pipelines and analysis algorithms including PCA and dimensionality reduction

  • Deploy automated pipelines in our interactive cloud environment with graphical user interface to facilitate user accessibility

  • Publish codebase or software as part of high impact publications or releases

  • Integrate multiple data streams for “Big Data” analysis (examples include scRNA-seq, scATAC-seq, flow cytometry, WGS)

  • Generate interactive data visualizations and work with end users to identify actionable insights

  • Exploratory data mining

  • Meet production deadlines for data analysis and be able to pivot between multiple projects
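As a toy illustration of the dimensionality-reduction step named in the functions above (this is not the Institute's actual pipeline; the data is made up), the top principal component of a dataset can be estimated by power iteration on its covariance matrix, here in pure Python for 2-D points:

```python
import math

def first_principal_component(data, iters=200):
    """Estimate the top principal component of 2-D points via power iteration."""
    n = len(data)
    # Center the data on its mean.
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    pts = [(x - mx, y - my) for x, y in data]
    # Entries of the 2x2 covariance matrix.
    cxx = sum(x * x for x, _ in pts) / n
    cyy = sum(y * y for _, y in pts) / n
    cxy = sum(x * y for x, y in pts) / n
    # Power iteration: repeatedly apply the covariance matrix and renormalize;
    # the vector converges to the eigenvector with the largest eigenvalue.
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    return v

# Points lying noisily along y = x: the top component comes out near (0.707, 0.707).
pc = first_principal_component([(0, 0), (1, 1.1), (2, 1.9), (3, 3.05), (4, 4.0)])
print(pc)
```

Production pipelines would of course use a library implementation (e.g., scikit-learn's `PCA`) on far higher-dimensional scRNA-seq matrices; the sketch only shows the underlying idea.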

Required Qualifications

  • Bachelor's degree in a big data computational field (e.g., Bioinformatics, Computer Science, Biostatistics, Physics, Mathematics) with a minimum of 2 years experience in analyzing omics data.

  • Demonstrated success in a multidisciplinary team environment.

  • Good understanding of sequencing technologies, data processing and integrative analysis

  • Fluency in Java, Python, R and Unix shell scripting.

  • Experience in Big Data analysis, code optimization, and parallel programming. Proven experience with big data technologies and languages such as Apache Spark, BigTable, Scala, or Rust.

  • Good knowledge of version control systems such as Git

  • Strong organizational, teamwork, and communication skills

  • Attention to detail, and good problem-solving skills

Preferred Qualifications

  • Masters or PhD in Bioinformatics/Computational Biology or similar

  • Familiarity with immunology

  • Understanding of Flow Cytometry and CyTOF analysis a plus

  • Familiarity with cloud computing

  • Ability to implement, test, and share new computational tools quickly, in an iterative manner, after feedback from experimental, data production, and analysis teams

  • Excellent work ethic displayed as a reliable, self-motivated, enthusiastic team player

  • Ability to learn new programming languages and packages

  • Eager to learn new skills

Work Environment

  • Working at a computer and using a mouse for extended periods of time

  • May need to work outside of standard working hours at times


  • Some travel may be required

Additional Details:

  • This role can currently work remotely full-time; this may change, and you may be required to work onsite as COVID-19 safety restrictions are lifted. You must be a Washington State resident to work remotely.

  • We are open to full-time, part-time, and/or contract work for this role. When you apply, please specify which work arrangement you desire. We are flexible.

Additional Comments

**Please note, this opportunity does sponsor work visas**

**Please note, this opportunity offers relocation assistance**

Data Engineer

1 month ago
Remote · Georgia IT Inc.

We are looking for strong Data Engineers, skilled in Hadoop, Scala, Spark, Kafka, Python, and AWS. I've included the job description below.
Here is what we are looking for:

Overall Responsibility:

  • Develop sustainable, data-driven solutions with current new-gen data technologies to meet the needs of our organization and business customers.
  • Apply domain-driven design practices to build out data applications; experience building conceptual and logical models.
  • Build out data consumption views and provision self-service reporting via demonstrated dimensional modeling skills.
  • Measure data quality and improve data standards, helping application teams publish data in the correct format so it is easy for downstream consumers.
  • Build Big Data applications using open-source frameworks such as Apache Spark, Scala, and Kafka on AWS, plus cloud-based data warehousing services such as Snowflake.
  • Build pipelines that provision features for machine learning models; familiarity with data science model-building concepts as well as consuming features from a data lake.
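As a small, stdlib-only sketch of the data-quality measurement mentioned above (the field names and records are hypothetical), one common first check is the per-field null rate over a batch of records:

```python
def null_rates(records, fields):
    """Return the fraction of records in which each field is missing or None."""
    total = len(records)
    rates = {}
    for f in fields:
        missing = sum(1 for r in records if r.get(f) is None)
        rates[f] = missing / total if total else 0.0
    return rates

# Hypothetical customer records; "email" is absent or null in 2 of 3.
batch = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3},
]
print(null_rates(batch, ["id", "email"]))
```

In practice such checks run inside Spark jobs over full partitions and feed the kind of data-quality dashboards the preferred qualifications below mention; the logic is the same, just distributed.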

Basic Qualifications:

  • At least 8 years of experience with the Software Development Life Cycle (SDLC)
  • At least 5 years of experience working on a big data platform
  • At least 3 years of experience working with unstructured datasets
  • At least 3 years of experience developing microservices: Python, Java, or Scala
  • At least 1 year of experience building data pipelines, CI/CD pipelines, and fit-for-purpose data stores
  • At least 1 year of experience in cloud technologies: AWS, Docker, Ansible, or Terraform
  • At least 1 year of Agile experience
  • At least 1 year of experience with a streaming data platform including Apache Kafka and Spark

Preferred Qualifications:

  • 5+ years of data modeling and data engineering skills
  • 3+ years of microservices architecture & RESTful web service frameworks
  • 3+ years of experience with JSON, Parquet, or Avro formats
  • 2+ years of creating data-quality dashboards and establishing data standards
  • 2+ years of experience with RDS, NoSQL, or graph databases
  • 2+ years of experience working with AWS platforms, services, and component technologies, including S3, RDS and Amazon EMR
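The format experience listed above (JSON vs. columnar formats like Parquet) comes down to row-oriented versus column-oriented layout. A stdlib-only sketch of the distinction, using toy records (Parquet itself requires a library such as pyarrow):

```python
import json

rows = [
    {"user": "a", "score": 10},
    {"user": "b", "score": 20},
]

# Row-oriented layout (JSON Lines): one whole record per line,
# the shape data usually has on a Kafka topic or in an S3 landing zone.
jsonl = "\n".join(json.dumps(r) for r in rows)

# Column-oriented layout (the idea behind Parquet): all values of one field
# stored together, which compresses well and lets analytical queries read
# only the columns they need.
columnar = {k: [r[k] for r in rows] for k in rows[0]}

print(jsonl)
print(columnar)
```

Avro sits on the row-oriented side with a richer schema; Parquet is the columnar one, which is why it dominates Spark/EMR analytics workloads.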

Job Type: Contract


Schedule:

  • Monday to Friday


Experience:

  • AWS: 1 year (Preferred)
  • Hadoop: 1 year (Required)
  • Spark: 1 year (Required)
  • Big Data: 1 year (Preferred)
  • Scala: 1 year (Preferred)
  • Data Engineering: 1 year (Required)

Contract Renewal:

  • Possible

Full Time Opportunity:

  • Yes

Work Location:

  • Fully Remote

Work Remotely:

  • Yes

BigData Solutions Engineer

21 days ago
$62 - $68/hour · Remote · Omnipoint Services Inc

Our Fortune-listed client is looking for a talented Solutions Engineer. This is one of our top clients, and we have been successful in building out entire teams for this organization. The role is temp-to-perm, 40 hours/week, paid at an hourly rate plus highly subsidized benefits. It will start remote, but once COVID restrictions are lifted the goal is to have this person onsite in Hartford, CT.

Requirements:

  • 6+ years as a Hortonworks HDP Solution Architect, helping re-solution migration projects from HDP 2.6 to 3.1.
  • Thorough understanding of the HDP 2.6 and 3.1 platforms and related tech stack.
  • Good documentation (Visio) and presentation (PPT) skills.
  • HDP 2.x and HDP 3.x


Responsibilities:

  • Review each project's current solution, document and review the proposed solution with the involved groups, and help engineering teams implement it end to end with low-level technical recommendations and code review.
  • Document existing and new solution patterns.

Tools involved:

  • Apache Hadoop 3.1.1 (Hadoop File System)
  • Apache HBase 2.0.0 (Java APIs)
  • Apache Hive 3.1.0 (Hive Query Language)
  • Apache Kafka 1.1.1 (Java/Python/Spark streaming APIs)
  • Apache Phoenix 5.0.0 (Standard SQL, JDBC, ODBC)
  • Apache Pig 0.16.0
  • Apache Ranger 1.1.0
  • Apache Spark 2.3.1 (Java, Scala, Python)
  • Apache Sqoop 1.4.7
  • Apache Tez 0.9.1

Java-based web services APIs and Python clients.

Job Types: Full-time, Contract

Pay: $62.00 - $68.00 per hour


Experience:

  • Apache Hive 3.1.0 (Hive Query Language): 4 years (Required)
  • Apache Kafka 1.1.1 (Java/Python/Spark streaming APIs): 4 years (Required)
  • Apache HBase 2.0.0 (Java APIs): 4 years (Required)
  • Apache Ranger 1.1.0: 4 years (Required)
  • Java-based web services APIs and Python clients: 2 years (Required)
  • Apache Spark 2.3.1 (Java, Scala, Python): 4 years (Required)
  • Hortonworks HDP Solution Architect: 8 years (Required)
  • Apache Pig 0.16.0: 4 years (Required)
  • Apache Hadoop 3.1.1 (Hadoop File System): 4 years (Required)
  • HDP 2.6 and 3.1 platforms and related tech stack: 5 years (Required)
  • Apache Phoenix 5.0.0 (Standard SQL, JDBC, ODBC): 4 years (Required)

Work Remotely:

  • No