About the position – Senior Big Data Engineer/Technical Architect
We are seeking a Senior Big Data Engineer with a strong background in managing structured and unstructured data pipelines who thrives in a fast-paced, AI-focused environment. You will be instrumental in building and scaling our data lake architecture, supporting a platform that powers intelligent AI agents for data collection, labeling, and analytical reasoning. This includes integrating vector databases and optimizing retrieval-augmented generation (RAG) workflows deployed on AWS Bedrock and other AI stacks.
Responsibilities
Design and implement scalable ingestion pipelines for structured/unstructured data using AWS and Databricks Unity Catalog.
Build and maintain high-throughput ETL/ELT pipelines with Apache Airflow and Databricks.
Architect and manage data modeling, storage, and indexing strategies in PostgreSQL on Amazon RDS, ensuring compatibility with AI retrieval systems.
Integrate and manage vector databases to support fast semantic and embedding-based search in RAG pipelines.
Collaborate with AI engineers to ensure seamless compatibility with LangGraph and LangSmith agent systems.
Implement robust data validation, lineage, and governance systems using Unity Catalog.
Optimize performance across distributed compute environments (Databricks, EC2).
Deploy and maintain Lambda-based microservices for scalable, real-time data ingestion and enrichment.
Requirements
5+ years of experience with big data systems in production environments.
Proven expertise with Databricks, Unity Catalog, and Apache Spark.
Proficiency in Airflow, AWS stack (Lambda, EC2, RDS), and cloud-based data lake architectures.
Strong SQL and database design skills (PostgreSQL preferred).
Preferred Qualifications
Experience with AI agent pipelines or large-scale ML model support.
Working knowledge of vector databases (Chroma, Pinecone, FAISS).
Solid understanding of data lifecycle management in ML/AI contexts.
Bonus: Familiarity with LangGraph, LangSmith, LangChain, or similar agent orchestration tools.
Demonstrated emphasis on data observability, security, and lineage tracking.
Hands-on experience with RAG architecture, including vector storage and semantic retrieval.
Exposure to AWS Bedrock and model deployment orchestration.
Manpower