Job Title: Java Developer – Databricks & Python (PySpark)
Location: Remote (must be based in Portugal)
Contract Type: 6+ Months (Long-term project)
Overview:
We are looking for a highly skilled Java Developer with proven experience in Databricks and Python (PySpark) to join our growing data engineering team. This role is ideal for a developer who thrives in high-performance data environments and enjoys solving complex challenges in large-scale data processing. You will play a key role in designing, developing, and optimizing robust data pipelines and analytics workflows, primarily within the Databricks ecosystem on cloud platforms.
Key Responsibilities:
Data Engineering & Development:
- Design and build scalable Java-based data solutions for real-time and batch processing.
- Develop efficient PySpark scripts to transform and process large datasets using Databricks.
- Optimize code for performance and maintainability in distributed data environments.
Cross-functional Collaboration:
- Partner with data scientists, analysts, and product teams to translate business needs into technical solutions.
- Work in an Agile setup, contributing to continuous delivery and improvement initiatives.
System Integration:
- Integrate Java applications with Databricks, ensuring seamless data flow across systems.
- Build modular, reusable components to support cloud-native data engineering practices.
Performance Tuning & Optimization:
- Analyze and fine-tune existing data workflows for performance and scalability.
- Troubleshoot bottlenecks and ensure efficient processing in distributed systems.
Maintenance & Documentation:
- Maintain and enhance existing applications and data pipelines.
- Create and maintain thorough documentation covering data flows, system design, and operational processes.
Required Skills & Qualifications:
Technical Proficiency:
- Strong hands-on experience in Java development, writing clean and modular code.
- Expertise in Databricks, including notebooks, jobs, clusters, and libraries.
- Proficiency in Python and PySpark for big data processing.
- Deep understanding of distributed computing and parallel data processing with Apache Spark.
- Experience with SQL and relational database systems.
- Familiarity with cloud platforms such as Azure or AWS.
Tools & Frameworks:
- Experience with version control systems (e.g., Git).
- Knowledge of CI/CD pipelines and best practices for automated testing and deployment.
Soft Skills:
- Excellent communication and interpersonal skills.
- Strong analytical mindset and ability to think critically under pressure.
- Self-driven and organized with a keen eye for detail.
Preferred (Nice to Have):
- Experience with machine learning frameworks within the Databricks environment.
- Exposure to Docker and Kubernetes for containerization and orchestration.
- Background in real-time data processing and streaming technologies.
Education & Experience:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 3–5+ years of Java development experience, including hands-on work with Databricks and PySpark.
Why Join Us?
- Work in a fast-paced, data-driven environment using the latest in big data and cloud technologies.
- Join a collaborative and forward-thinking team that values innovation and technical excellence.
- Opportunity to grow your skills in a long-term, cutting-edge data engineering project.
If you're passionate about building smart, scalable solutions in cloud-based big data environments, we'd love to hear from you!