Job Title: Java Developer – DataBricks & Python (PySpark)
Location: Remote (must be based in Portugal)
Contract Type: 6+ months (long-term project)

Overview:
We are looking for a highly skilled Java Developer with proven experience in DataBricks and Python (PySpark) to join our growing data engineering team.
This role is ideal for a developer who thrives in high-performance data environments and enjoys solving complex challenges in large-scale data processing.
You will play a key role in designing, developing, and optimizing robust data pipelines and analytics workflows, primarily within the DataBricks ecosystem on cloud platforms.

Key Responsibilities:

Data Engineering & Development:
- Design and build scalable Java-based data solutions for real-time and batch processing.
- Develop efficient PySpark scripts to transform and process large datasets using DataBricks.
- Optimize code for performance and maintainability in distributed data environments.
Cross-functional Collaboration:
- Partner with data scientists, analysts, and product teams to translate business needs into technical solutions.
- Work in an Agile setup, contributing to continuous delivery and improvement initiatives.
System Integration:
- Integrate Java applications with DataBricks, ensuring seamless data flow across systems.
- Build modular, reusable components to support cloud-native data engineering practices.
Performance Tuning & Optimization:
- Analyze and fine-tune existing data workflows for performance and scalability.
- Troubleshoot bottlenecks and ensure efficient processing in distributed systems.
Maintenance & Documentation:
- Maintain and enhance existing applications and data pipelines.
- Create and maintain thorough documentation covering data flows, system design, and operational processes.

Required Skills & Qualifications:

Technical Proficiency:
- Strong hands-on experience in Java development, writing clean and modular code.
- Expertise in DataBricks, including notebooks, jobs, clusters, and libraries.
- Proficiency in Python and PySpark for big data processing.
- Deep understanding of distributed computing and parallel data processing with Apache Spark.
- Experience with SQL and relational database systems.
- Familiarity with cloud platforms such as Azure or AWS.
Tools & Frameworks:
- Experience with version control systems (e.g., Git).
- Knowledge of CI/CD pipelines and best practices for automated testing and deployment.
Soft Skills:
- Excellent communication and interpersonal skills.
- Strong analytical mindset and ability to think critically under pressure.
- Self-driven and organized with a keen eye for detail.

Preferred (Nice to Have):
- Experience with machine learning frameworks within the DataBricks environment.
- Exposure to Docker and Kubernetes for containerization and orchestration.
- Background in real-time data processing and streaming technologies.

Education & Experience:
- Bachelor's degree in Computer Science, Engineering, or a related field.