Data Quality Engineer

Data Quality Engineer – Remote

Description

Founded in 2010, Scrapinghub is on pace to grow revenue more than 5x in the next 3 years, with the largest growth coming from our SaaS product business lines.

We are a fast-growing and diverse technology business turning web content into useful data with a cloud-based web crawling platform, off-the-shelf datasets, and turn-key web scraping services. Our comprehensive web crawling stack powers crawls of over 9 billion pages per month and is used by everyone from individual developers to Fortune 100 companies.

We’re a globally distributed team of over 190 Shubbers who are passionate about web data and data science.

About the Job:

Data QA is an important function within Scrapinghub. The Data QA team works to ensure that the quality and usability of the data scraped by our web scrapers meets and exceeds the expectations of our enterprise clients.

Are you passionate about data and data quality and integrity?

Do you enjoy using programming languages and tools to analyze and manipulate data, detect data quality issues, and visualize your findings?

Are you highly customer-focused with excellent attention to detail?

Owing to our growing business and the need for ever more sophisticated Data QA, we are looking for a talented Data Quality Engineer to join our team. As a Scrapinghub engineer, you will apply primarily automated and semi-automated data wrangling, data manipulation, and data visualisation techniques to verify and validate the quality of data extracted from the web.

Job Responsibilities:

  • Understand customer web scraping and data requirements; map these requirements to custom scripts in your language/tool of choice, with a view to establishing the degree of data quality and uncovering data quality issues.
  • Draw conclusions about data quality by producing descriptive and inferential statistics, summaries, and visualisations.
  • Supplement existing manual QA and schema validation techniques with advanced data wrangling and manipulation.
  • As needed, perform complementary manual and semi-automated verification.
  • Collaborate with developers to further troubleshoot and pinpoint solutions.
  • Present findings and conclusions to stakeholders at various levels (other members of the QA department, developers, project managers, account managers, customers).
  • Write high-quality, well-structured code that is maintainable and extensible.
  • Manage code using GitHub, Bitbucket, or other version control services as applicable.
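To give a flavour of the script-driven checks described above, here is a minimal pandas sketch of schema validation plus a descriptive summary over scraped records. The field names and validity rules are illustrative assumptions, not a real Scrapinghub schema:

```python
import pandas as pd

# Illustrative sample of scraped product records (field names are hypothetical).
records = pd.DataFrame({
    "url": ["https://example.com/a", "https://example.com/b", None],
    "title": ["Widget A", "", "Widget C"],
    "price": [19.99, -1.0, 24.50],
})

# Schema/completeness checks: required fields must be present and valid.
issues = pd.DataFrame({
    "missing_url": records["url"].isna(),
    "empty_title": records["title"].fillna("").str.strip() == "",
    "invalid_price": ~records["price"].between(0.01, 10_000),
})

# Descriptive summary: per-check issue counts and the overall defect rate.
summary = issues.sum()
defect_rate = issues.any(axis=1).mean()
print(summary)
print(f"records with at least one issue: {defect_rate:.0%}")
```

In practice a script like this would be parameterised per customer requirement and feed its summary into a visualisation or dashboard rather than just printing it.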

Requirements

  • Highly proficient in one or more of Pandas, SQL, R, Excel.
  • BS degree in Computer Science, Engineering, Mathematics, or equivalent.
  • Demonstrable programming knowledge and experience, minimum of 3 years (please provide code samples in your application – ideally pertaining to data analysis – via a link to GitHub or other publicly-accessible service).
  • Background in data profiling.
  • Strong analytical skills with unstructured data.
  • Experience in data management, data integration and data quality verification.
  • Experience visualising data quality and data quality issues.
  • Ability to work with very large datasets (into the millions of records).
  • Strong knowledge of software QA methodologies, tools, and processes.
  • Excellent level of written and spoken English; confident communicator; able to communicate on both technical and non-technical levels with various stakeholders on all matters of QA.
  • Outstanding attention to detail and ability to meet deadlines.

Desired Skills:

  • Prior experience in a Data QA role (where the focus was on verifying data quality, rather than testing application functionality).
  • Familiarity with Jupyter and JupyterLab.
  • Experience with dashboard and monitoring tools such as Grafana, Kibana, FineReport, etc.
  • Experience building your own dashboards.
  • Interest in and flair for Data Science concepts as they pertain to data analysis and data validation (machine learning, inferential statistics etc.); if you have ideas, mention them in your application.
  • Experience with Spark, BigQuery, and other big data technologies.
  • Knowledge of and experience in other technologies that support a modern cloud-based software service (Linux, AWS, Docker, Kafka, etc.).
  • Previous remote working experience.

Since you have read this far, the role clearly interests you. In your cover letter, describe in detail what appeals to you about the role, and some of your previous experience (ideally with public links to code or examples) in the area of data analysis, data visualization, and/or data quality verification.

Benefits

As a new Shubber, you will:

  • Become part of a self-motivated, progressive, multi-cultural team.
  • Have the freedom and flexibility to work from where you do your best work.
  • Attend conferences and meet with team members from across the globe.
  • Work with cutting-edge open source technologies and tools.
  • Receive 35 days of paid holidays.
  • Enroll in Scrapinghub’s Share Programme.

Apply Here!
