The Data Engineer will be responsible for maintaining the existing cloud-based data lake and building the next version of the analytics data lake with advanced technologies.
Must have AWS, Python, and Big Data experience.
Experience / Training:
- 3+ years of experience with AWS cloud (AWS Solutions Architect or AWS Certified Data Analytics certification preferred)
- 5+ years of experience in software engineering and Big Data Analytics.
- Prior experience with AWS cloud services: EC2, Glue, Athena, S3, EKS, RDS, Redshift, Data Pipeline, EMR, DynamoDB, CloudWatch.
- Experience creating and maintaining a data lake on AWS.
- Experience with Big Data analytics tools such as Hadoop, Spark, and Kafka.
- Experience collecting data from different source systems and creating ETL pipelines that handle complex data sets and unexpected schema changes.
- Experience in Python programming and analytics libraries such as pandas and NumPy.
- Strong analytics skills and experience implementing complex SQL queries.
- Passionate about efficient, accurate code development and about optimizing the performance of the organization's data lake.
- Good experience with UNIX-based shell scripting.
- Support the data science team with data availability; extract and provide required data sets.
- Coordinate with various teams and clients to provide data based on specific requirements.
- Create and maintain optimal data lake pipeline architectures.
- Stay abreast of industry trends and enable successful data solutions by leveraging best practices.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, and redesigning infrastructure for greater scalability.
- Partner effectively with in-house Product, Data, and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using AWS "big data" technologies.
- Assemble large, complex data sets that meet functional / non-functional business requirements.
- Keep our data separated and secure within national boundaries through multiple AWS regions.
- Create data tools for analytics and data science team members that assist them in building and optimizing our product into an innovative industry leader.
- Bachelor's or Master's degree in IT, Computer Science, or Software Engineering, or relevant experience.
- Amazon Elastic Compute Cloud
- Amazon Redshift
- Amazon Relational Database Service
- Amazon S3
- Apache Hadoop
- Apache Kafka