
    My Experience of the Databricks New Grad Interview Process in 2025

    Seraphina
    ·September 10, 2025
    ·10 min read
    I am sharing my firsthand experience of the Databricks new grad interview process in 2025 to help you prepare for your own.

    I went through the Databricks new grad interview process in 2025, which can be divided into several stages: HR calls, online assessments, technical screens, virtual onsites, and the hiring manager round. In the following sections, I’ll walk you through the types of questions I encountered at each stage, share actual interview questions, and include additional insights I gathered from other candidates’ experiences.

    If you want to perform better in interviews, I recommend using Linkjob. It not only offers AI mock interview sessions based on real interview questions, but also provides personalized AI-generated answers during live interviews to help you respond to follow-up questions from interviewers.

    Databricks New Grad Interview Stages

    Recruiter Call

    The first round focused on surface-level topics, mainly my past experience, technical background, and fit for the role. The overall atmosphere was fairly relaxed. This round was primarily to confirm whether my profile aligned with the team’s needs. Afterwards, the recruiter explained the upcoming interview process to me.

    Online Assessment

    This round was conducted on CodeSignal and consisted of four problems to be completed in 70 minutes. The overall difficulty was between easy and medium, with a combination of two easy and two medium problems. A camera was required, and only one browser could be open. You were allowed to use scratch paper for notes.

    Since answers could be submitted multiple times, it was possible to check them quickly. I used Linkjob’s screenshot feature to help me solve the problems. I had tested it with a friend beforehand and confirmed that it wouldn’t be detected when sharing my screen, so this part went very smoothly.


    Technical Phone Screen

    (This question is a variation of the LeetCode question 751. IP to CIDR. If you haven’t completed that question yet, it is recommended to solve it first.)

    An IP address is a formatted 32-bit unsigned integer where each group of 8 bits is printed as a decimal number, and the dot character . splits the groups. For example, the binary number 00001111 10001000 11111111 01101011 (spaces added for clarity) formatted as an IP address would be "15.136.255.107".

    A CIDR block is a format used to denote a specific set of IP addresses. It is a string consisting of a base IP address, followed by a slash /, followed by a prefix length k. The addresses it covers are all the IPs whose first k bits are the same as the base IP address.

    You are asked to design an IP firewall that determines whether an IPv4 address is allowed or denied based on an ordered list of “ALLOW” or “DENY” rules. Each rule specifies either a single IP or a CIDR block, and only the first matching rule determines the result. If no rule matches, deny by default.

    Example Rules and Queries:

    Rules = [
        {"ALLOW", "192.168.100.5/30"},
        {"DENY", "123.456.789.100/31"},
        {"ALLOW", "1.2.3.4"}
    ]
    • IP "192.168.100.4" matches the first rule → ALLOW

    • IP "123.456.789.100" matches the second rule → DENY

    • IP "1.2.3.4" matches the third rule → ALLOW

    Notes / Corner Cases:

    • If an IP matches multiple rules, the first matched rule determines the result.

    • Some rules may specify a single IP without a CIDR prefix (like "1.2.3.4"). This should be treated as a /32 prefix (single IP).

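    • All IP addresses and CIDR blocks can be assumed to be valid IPv4 addresses. (The 123.456.789.100 in the example above is not one, since an octet cannot exceed 255, so treat it as illustrative only.)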

    Implement the IpFirewall class:

    from typing import List, Tuple


    class IpFirewall:
        def __init__(self, rules: List[Tuple[str, str]]):
            """
            Initialize the firewall with a list of rules.
            Each rule is a tuple: (action, ip_or_cidr), where action is "ALLOW" or "DENY".
            """
            pass

        def query(self, ip: str) -> str:
            """
            Given an IP address, return "ALLOW" or "DENY" based on the first matching rule.
            """
            pass
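
    A minimal sketch of one way to fill in this class, not necessarily the solution the interviewer expected. It converts each address to a 32-bit integer and compares the masked values. The helper name ip_to_int is made up for illustration, and the example's second rule is swapped for 10.0.0.0/8 in the usage below because octets above 255 cannot be parsed as IPv4.

```python
from typing import List, Tuple


def ip_to_int(ip: str) -> int:
    """Pack a dotted-quad IPv4 string into a 32-bit integer (hypothetical helper)."""
    value = 0
    for part in ip.split("."):
        value = (value << 8) | int(part)
    return value


class IpFirewall:
    def __init__(self, rules: List[Tuple[str, str]]):
        # Precompute (action, mask, masked_base) for every rule, keeping rule order.
        self._rules = []
        for action, spec in rules:
            base, _, prefix = spec.partition("/")
            k = int(prefix) if prefix else 32  # a bare IP is treated as /32
            mask = (0xFFFFFFFF << (32 - k)) & 0xFFFFFFFF
            self._rules.append((action, mask, ip_to_int(base) & mask))

    def query(self, ip: str) -> str:
        addr = ip_to_int(ip)
        for action, mask, network in self._rules:
            if addr & mask == network:
                return action  # the first matching rule decides
        return "DENY"  # deny by default when nothing matches


firewall = IpFirewall([
    ("ALLOW", "192.168.100.5/30"),
    ("DENY", "10.0.0.0/8"),
    ("ALLOW", "1.2.3.4"),
])
print(firewall.query("192.168.100.4"))  # inside the /30 block of the first rule
```

    Each query scans the rules linearly, which keeps the first-match semantics obvious. A likely follow-up is to speed this up for very long rule lists.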

    Onsite Interviews

    The onsite stage was the most intense part of the Databricks new grad interview process. It included four rounds: Cross Functional + BQ (behavioral questions), Algorithm, System Design, and Coding.

    Cross Functional + BQ

    I had a discussion with the Engineering Manager (EM) about my experience, with a technical focus that delved into the details of my past projects. Having a well-prepared project story was important. In this round, the EM mainly focused on how I handle large-scale data processing, how I would optimize pipelines, and also asked some questions about team collaboration.

    Behavioral Questions:

    1. Tell me about the most technically complex system/project you've built or worked on. What made it complex and how did you handle it?

    2. Tell me about a time you had a strong technical disagreement with a colleague. How did you resolve it?

    3. Have you ever had to refactor a critical, messy piece of code? What was your approach and how did you ensure you didn't break anything?

    Algorithm

    In this round, I had to write code on the spot. The problems were similar to classic LeetCode questions, with a medium level of difficulty. The main focus was on fundamental coding skills and clear, logical thinking. The interviewers paid attention to whether the solution worked, and also to the approach, problem-solving process, and how clearly I could explain my reasoning while coding.

    System Design

    The question was about adding delete and trash functionality in a database server, and I was asked about production optimization strategies. My approach was to avoid physical deletion and instead add a boolean column to the original table to mark whether a record is in the trash. I combined this with parallel and asynchronous processing to improve performance and throughput.
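
    The soft-delete idea can be sketched with Python's built-in sqlite3 module. The table name documents, the column name in_trash, and the function names are all assumptions made for illustration. The purge step stands in for the asynchronous background cleanup, which is not modeled here.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        " id INTEGER PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " in_trash INTEGER NOT NULL DEFAULT 0)"  # boolean flag: 1 means trashed
    )
    return conn


def move_to_trash(conn: sqlite3.Connection, doc_id: int) -> None:
    # "Delete" only flips the flag, so it is cheap and reversible.
    conn.execute("UPDATE documents SET in_trash = 1 WHERE id = ?", (doc_id,))


def restore(conn: sqlite3.Connection, doc_id: int) -> None:
    conn.execute("UPDATE documents SET in_trash = 0 WHERE id = ?", (doc_id,))


def list_active(conn: sqlite3.Connection) -> list:
    rows = conn.execute("SELECT id FROM documents WHERE in_trash = 0 ORDER BY id")
    return [row[0] for row in rows]


def purge_trash(conn: sqlite3.Connection) -> int:
    """Physically remove trashed rows; in production this would run in the background."""
    cursor = conn.execute("DELETE FROM documents WHERE in_trash = 1")
    return cursor.rowcount
```

    Every normal read has to filter on the flag, so an index that covers in_trash is a natural optimization to bring up in the discussion.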

    Coding

    The task was to implement in-place incremental encoding of an array: the first number remains unchanged, and each subsequent number is stored as the difference from the previous number.

    Approach: Keep the first element unchanged, then iterate from the last element backward to the second element. For each element, subtract the previous element’s value from it. Finally, return the modified array. This avoids data overwrite issues that could occur when traversing from front to back.
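
    The backward traversal described above is only a few lines of Python. The function names are my own, and the decoder is included just to show that the transformation is reversible.

```python
from typing import List


def delta_encode_in_place(nums: List[int]) -> List[int]:
    # Walk from the end so nums[i - 1] still holds its original value when read.
    for i in range(len(nums) - 1, 0, -1):
        nums[i] -= nums[i - 1]
    return nums


def delta_decode_in_place(nums: List[int]) -> List[int]:
    # Decoding is a running sum, so it has to go front to back.
    for i in range(1, len(nums)):
        nums[i] += nums[i - 1]
    return nums


data = [3, 5, 9, 10]
print(delta_encode_in_place(data))  # [3, 2, 4, 1]
```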

    Follow-Up: If the array is extremely large and cannot fit into a single machine’s memory, how would you perform incremental encoding on a distributed cluster, such as using Spark?
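
    One possible answer to the follow-up, simulated in plain Python rather than Spark: if the array is split into ordered partitions, each partition only needs the last element of the partition before it. The sketch assumes partitions are already in order and models them as lists. How this maps onto specific Spark operations is left open here.

```python
from typing import List, Optional


def delta_encode_partitioned(partitions: List[List[int]]) -> List[List[int]]:
    # Step 1: collect, for each partition, the last original value that precedes it.
    # This is one small value per partition, cheap enough to gather centrally.
    boundaries: List[Optional[int]] = []
    previous: Optional[int] = None
    for part in partitions:
        boundaries.append(previous)
        if part:  # empty partitions just pass the boundary along
            previous = part[-1]

    # Step 2: each partition can now be encoded independently (in parallel on a cluster).
    encoded = []
    for part, boundary in zip(partitions, boundaries):
        out = []
        last = boundary
        for value in part:
            out.append(value if last is None else value - last)
            last = value
        encoded.append(out)
    return encoded
```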

    Hiring Manager Round

    This round was an interview with the manager. It mainly assessed my technical abilities, including programming and data engineering skills, and also included some behavioral questions as well as questions about the company’s products and culture.

    Databricks Interview Questions

    While preparing for the interview, I also collected some actual interview questions and experience. Here, I’ve organized and categorized them to share with you.

    Algorithm Problems

    Delete Interval

    Delete a specific index from a given interval.

    Max Area of Island (DFS, O(M*N))

    Equivalent to LeetCode 695 - Max Area of Island. Use DFS to explore connected land cells and calculate the maximum area.
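
    A short sketch of the DFS, written with an explicit stack so large grids do not hit Python's recursion limit. It assumes the grid holds 1 for land and 0 for water, with 4-directional connectivity as in the LeetCode version.

```python
from typing import List


def max_area_of_island(grid: List[List[int]]) -> int:
    if not grid or not grid[0]:
        return 0
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Start a DFS from every unvisited land cell.
            seen[r][c] = True
            stack = [(r, c)]
            area = 0
            while stack:
                cr, cc = stack.pop()
                area += 1
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            best = max(best, area)
    return best
```

    Each cell is pushed at most once, which is where the O(M*N) bound comes from.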

    SnapshotSet

    Implement a set that supports a versioned iterator (related to LeetCode 1146, Snapshot Array). The iterator should reflect the state at the time it was created; subsequent put or remove operations should not affect the iterator’s traversal.
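
    One way to get this behavior without copying the set for every iterator is to version each membership change. The sketch below is an assumption about the intended design, not a reported solution: it keeps, for every value, the version ranges during which it was present, and an iterator remembers the version at which it was created.

```python
from typing import Dict, Hashable, Iterator, List, Optional


class SnapshotSet:
    def __init__(self) -> None:
        self._version = 0
        self._order: List[Hashable] = []  # values in first-insertion order (append-only)
        # value -> list of [added_at, removed_at]; removed_at is None while present
        self._spans: Dict[Hashable, List[List[Optional[int]]]] = {}

    def put(self, value: Hashable) -> None:
        spans = self._spans.get(value)
        if spans is None:
            spans = self._spans[value] = []
            self._order.append(value)
        elif spans[-1][1] is None:
            return  # already present
        self._version += 1
        spans.append([self._version, None])

    def remove(self, value: Hashable) -> None:
        spans = self._spans.get(value)
        if not spans or spans[-1][1] is not None:
            return  # not present
        self._version += 1
        spans[-1][1] = self._version

    def contains(self, value: Hashable) -> bool:
        spans = self._spans.get(value)
        return bool(spans) and spans[-1][1] is None

    def iterator(self) -> Iterator[Hashable]:
        as_of = self._version      # version frozen at creation time
        limit = len(self._order)   # later insertions are never visited

        def walk() -> Iterator[Hashable]:
            for i in range(limit):
                value = self._order[i]
                for added, removed in self._spans[value]:
                    if added <= as_of and (removed is None or removed > as_of):
                        yield value
                        break

        return walk()
```

    The history is never pruned in this sketch. Cleaning up spans that no live iterator can see would be a reasonable follow-up discussion.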

    KV Store

    Based on LeetCode 981 (Time Based Key-Value Store). Design a key-value store that supports efficient get, put, and time-based queries. Key technical points: sliding window, and using a TreeMap or deque to handle timestamps efficiently.
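
    Python has no built-in TreeMap, so a sketch of the time-based lookup can use a sorted list of timestamps per key together with the bisect module. It assumes, as LeetCode 981 does, that timestamps for a key arrive in increasing order, and that a miss returns an empty string.

```python
import bisect
from collections import defaultdict
from typing import DefaultDict, List


class TimeMap:
    def __init__(self) -> None:
        self._times: DefaultDict[str, List[int]] = defaultdict(list)
        self._values: DefaultDict[str, List[str]] = defaultdict(list)

    def put(self, key: str, value: str, timestamp: int) -> None:
        # Assumes timestamps per key are non-decreasing, so the list stays sorted.
        self._times[key].append(timestamp)
        self._values[key].append(value)

    def get(self, key: str, timestamp: int) -> str:
        """Return the value with the largest stored timestamp <= the given one."""
        times = self._times.get(key)
        if not times:
            return ""
        i = bisect.bisect_right(times, timestamp)
        return self._values[key][i - 1] if i else ""
```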

    Anagrammed indexOf

    Similar to LeetCode 438. Determine if a substring is any permutation of a target string.
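
    A sliding-window sketch of this problem, assuming the expected result is the first index at which a window of the text is a permutation of the target, or -1 if there is none. The function name is my own.

```python
from collections import Counter


def anagram_index_of(text: str, target: str) -> int:
    m = len(target)
    if m == 0:
        return 0
    if m > len(text):
        return -1
    need = Counter(target)
    window = Counter(text[:m])
    if window == need:
        return 0
    for i in range(m, len(text)):
        # Slide the window one step: add the new character, drop the oldest one.
        window[text[i]] += 1
        left = text[i - m]
        window[left] -= 1
        if window[left] == 0:
            del window[left]  # keep the counter free of zero entries
        if window == need:
            return i - m + 1
    return -1
```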

    House Robber

    LeetCode 198: Classic House Robber problem.

    • Follow-up 1: LeetCode 213 - House Robber II (houses arranged in a circle).

    • Follow-up 2: Change the interval to skip k houses.
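
    A sketch covering the base problem and both follow-ups. The second follow-up is interpreted here as "after robbing a house, at least k houses must be skipped", which is an assumption; k = 1 gives the classic rule. House values are assumed to be non-negative.

```python
from typing import List


def rob_with_gap(nums: List[int], k: int = 1) -> int:
    n = len(nums)
    dp = [0] * (n + 1)  # dp[i] = best total using only the first i houses
    for i in range(1, n + 1):
        # Robbing house i-1 forces the previous k houses to be skipped.
        take = nums[i - 1] + dp[max(0, i - 1 - k)]
        dp[i] = max(dp[i - 1], take)
    return dp[n]


def rob_circle(nums: List[int]) -> int:
    # Houses in a circle: the first and last cannot both be robbed,
    # so solve the line problem twice, excluding one end each time.
    if len(nums) == 1:
        return nums[0]
    return max(rob_with_gap(nums[1:]), rob_with_gap(nums[:-1]))
```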

    Fibonacci Tree

    Given a Fibonacci tree (pre-order traversal node values), find the path connecting two given nodes.

    Optimal solution: O(log n) by calculating the path from root to each target using node values, then determine the LCA.

    Tic Tac Toe

    LeetCode 348. Determine the winner after each move, ignoring invalid cases.

    • The board can be any size, and the number of consecutive marks required to win is configurable.
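
    With a configurable win length, the usual row and column counters from LeetCode 348 no longer work directly. One alternative, sketched below, is to count consecutive marks through the cell that was just played. Players are assumed to be numbered 1 and 2, and moves are assumed to be valid.

```python
class TicTacToe:
    def __init__(self, n: int, k: int) -> None:
        self.n = n  # board is n x n
        self.k = k  # consecutive marks needed to win
        self.board = [[0] * n for _ in range(n)]

    def move(self, row: int, col: int, player: int) -> int:
        """Place the player's mark; return the player if this move wins, else 0."""
        self.board[row][col] = player
        # Check the horizontal, vertical, and both diagonal lines through the cell.
        for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
            run = (1 + self._count(row, col, dr, dc, player)
                   + self._count(row, col, -dr, -dc, player))
            if run >= self.k:
                return player
        return 0

    def _count(self, row: int, col: int, dr: int, dc: int, player: int) -> int:
        # Number of consecutive marks of this player, walking away from (row, col).
        count = 0
        r, c = row + dr, col + dc
        while 0 <= r < self.n and 0 <= c < self.n and self.board[r][c] == player:
            count += 1
            r, c = r + dr, c + dc
        return count
```

    Each move costs O(k) in practice, since a run longer than k already ends the game.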

    Path Finding

    LeetCode 787 (Cheapest Flights Within K Stops): find the cheapest path subject to a limit on the number of stops.
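
    A sketch of one standard approach to LeetCode 787, a Bellman-Ford relaxation limited to k + 1 rounds. Flights are (from, to, price) triples, as in the LeetCode statement.

```python
from typing import List


def find_cheapest_price(n: int, flights: List[List[int]], src: int, dst: int, k: int) -> int:
    INF = float("inf")
    cost = [INF] * n
    cost[src] = 0
    # At most k stops means at most k + 1 flights, so relax the edges k + 1 times.
    for _ in range(k + 1):
        nxt = cost[:]  # copy so each round extends paths by only one flight
        for u, v, w in flights:
            if cost[u] + w < nxt[v]:
                nxt[v] = cost[u] + w
        cost = nxt
    return -1 if cost[dst] == INF else cost[dst]
```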

    DB HQ to SF Commute

    Given start and end points with three commuting options, use BFS unless commuting methods can change dynamically; only then consider a priority queue.

    Hash Map with QPS

    Related to LeetCode 362 (Design Hit Counter): implement a hash map that also reports its queries per second (QPS) efficiently.
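
    A sketch of how a hit counter could be combined with a hash map. The class name, the explicit now argument (integer seconds, non-decreasing), and the choice to report average QPS over a fixed window are all assumptions, since the exact API was not specified.

```python
from collections import deque
from typing import Any, Deque, Dict, List


class QpsHashMap:
    def __init__(self, window_seconds: int = 300) -> None:
        self._data: Dict[Any, Any] = {}
        self._window = window_seconds
        self._buckets: Deque[List[int]] = deque()  # [second, hit_count], oldest first
        self._total = 0  # hits currently inside the window

    def _evict(self, now: int) -> None:
        # Drop buckets that have fallen out of the window (now - window, now].
        while self._buckets and self._buckets[0][0] <= now - self._window:
            self._total -= self._buckets.popleft()[1]

    def _record(self, now: int) -> None:
        self._evict(now)
        if self._buckets and self._buckets[-1][0] == now:
            self._buckets[-1][1] += 1
        else:
            self._buckets.append([now, 1])
        self._total += 1

    def put(self, key: Any, value: Any, now: int) -> None:
        self._data[key] = value
        self._record(now)

    def get(self, key: Any, now: int) -> Any:
        self._record(now)
        return self._data.get(key)

    def qps(self, now: int) -> float:
        self._evict(now)
        return self._total / self._window
```

    Bucketing hits per second keeps memory bounded by the window size rather than by the number of requests.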

    Coding Problems

    The focus of this round was concurrency, requiring live coding. Emphasis was on creating and managing threads.

    Revenue System

    Three operations:

    1. insert(revenue): Returns an auto-incremented customer ID.

    2. insert(revenue, referrer): Returns an auto-incremented customer ID; this new customer is referred by referrer.

    3. get_top_k_revenue(k, min_revenue) -> Set[int]: Returns the top k customers by revenue, where each customer’s revenue includes their own revenue plus the revenue of customers they directly referred. Only include customers with revenue ≥ min_revenue.
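
    A compact sketch of the three operations. It assumes customer IDs start at 0, folds the two insert variants into one method with an optional referrer, and credits only direct referrals, as described above. How ties are broken was not specified; here the larger ID wins.

```python
import heapq
from typing import List, Optional, Set


class RevenueSystem:
    def __init__(self) -> None:
        # Index = customer ID; value = own revenue plus revenue of direct referrals.
        self._total: List[int] = []

    def insert(self, revenue: int, referrer: Optional[int] = None) -> int:
        customer_id = len(self._total)  # auto-incremented ID
        self._total.append(revenue)
        if referrer is not None:
            self._total[referrer] += revenue  # credit the direct referrer only
        return customer_id

    def get_top_k_revenue(self, k: int, min_revenue: int) -> Set[int]:
        eligible = (
            (total, cid)
            for cid, total in enumerate(self._total)
            if total >= min_revenue
        )
        return {cid for _, cid in heapq.nlargest(k, eligible)}
```

    This version scans all customers on every query. If queries dominate, keeping the totals in a sorted structure would be the natural optimization to discuss.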

    Lazy Array

    An API was provided, and the task was to reason about its expected behavior.

    • Test laziness using a wrapper or mock to track function invocation.

    • Implement a lazy array supporting array.map(...).indexOf(...). Each map applies a function lazily; indexOf executes all pending functions to determine the index of the value.

    • Example:

    arr = LazyArray([10, 20, 30, 40, 50])
    arr.map(lambda x: x*2).indexOf(40)   # -> 1
    arr.map(lambda x: x*2).map(lambda x: x*3).indexOf(240)  # -> 3
    • Each map() must return a new LazyArray, as chains are independent.
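
    A minimal sketch that satisfies the example above. Storing the pending functions in a tuple is my own choice; it makes each map() call return an independent LazyArray without copying the data.

```python
from typing import Any, Callable, Sequence, Tuple


class LazyArray:
    def __init__(self, data: Sequence[Any], funcs: Tuple[Callable[[Any], Any], ...] = ()) -> None:
        self._data = data    # shared between chains and never mutated
        self._funcs = funcs  # pending map functions, in application order

    def map(self, fn: Callable[[Any], Any]) -> "LazyArray":
        # Nothing is computed here; the new array just remembers one more function.
        return LazyArray(self._data, self._funcs + (fn,))

    def indexOf(self, target: Any) -> int:
        # Apply the pending functions element by element and stop at the first match.
        for i, value in enumerate(self._data):
            for fn in self._funcs:
                value = fn(value)
            if value == target:
                return i
        return -1


arr = LazyArray([10, 20, 30, 40, 50])
print(arr.map(lambda x: x * 2).indexOf(40))                       # 1
print(arr.map(lambda x: x * 2).map(lambda x: x * 3).indexOf(240))  # 3
```

    To test laziness, pass a function that records its calls and check that the record stays empty until indexOf runs.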

    System Design

    Bookstore Problem

    Design a web service for selling books. A customer sends a book request along with their credit card information and a maximum acceptable price. The service should query hundreds of connected bookstores via their APIs to find the lowest price for the requested book. If the lowest price is below the customer’s maximum acceptable price, the transaction should go through and charge the credit card; otherwise, the service should return the lowest available price to the customer.

    • Latency requirement: 10–20 seconds.

    • Number of book sellers: 50–200.

    • Number of books: 1–2 million.

    • Scale: relatively small.

    The main focus is on designing efficient API calls, aggregating results quickly, and returning a reliable response under the latency constraint.
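
    The fan-out and aggregation step can be sketched with asyncio. Each bookstore is modeled as a hypothetical async function that returns a price or None; real store APIs, card charging, and retries are out of scope. Treating a price equal to the maximum as acceptable is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Tuple

# Hypothetical store client: takes a book identifier, returns a price or None.
Quote = Callable[[str], Awaitable[Optional[float]]]


async def lowest_price(book: str, stores: Dict[str, Quote],
                       timeout_s: float) -> Optional[Tuple[str, float]]:
    async def ask(name: str, fetch: Quote) -> Optional[Tuple[str, float]]:
        try:
            price = await asyncio.wait_for(fetch(book), timeout=timeout_s)
        except Exception:  # slow or failing stores are simply ignored
            return None
        return None if price is None else (name, price)

    # Query every store concurrently so total latency is about one timeout.
    results = await asyncio.gather(*(ask(n, f) for n, f in stores.items()))
    quotes = [r for r in results if r is not None]
    return min(quotes, key=lambda q: q[1]) if quotes else None


async def handle_request(book: str, max_price: float, stores: Dict[str, Quote],
                         timeout_s: float = 10.0) -> dict:
    best = await lowest_price(book, stores, timeout_s)
    if best is None:
        return {"status": "unavailable"}
    store, price = best
    if price <= max_price:
        # A real service would charge the card here, ideally idempotently.
        return {"status": "purchased", "store": store, "price": price}
    return {"status": "too_expensive", "lowest_price": price}
```

    A per-store timeout keeps one slow seller from consuming the whole latency budget.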

    Throttle System Safety

    Problem Description:

    Design a throttling system for serving infrastructure that handles both internal and external users. The data flow looks like this:

    Client -> HTTP Server (Gateway) -> API Server -> Database / 3rd Party Server / Another API Server

    • API servers are owned by internal teams (first-party services).

    • The system experiences bursts of traffic, causing services to fail in complex ways.

    • Engineers from various teams are paged frequently due to cascading failures.

    Scope:

    • Design a throttling mechanism to prevent overloads and make the system safer.

    • You should consider strategies for rate limiting, backpressure, circuit breakers, and prioritization of requests.

    • Debugging the current architecture is not in scope.

    The main goal is to ensure system reliability and reduce cascading failures during traffic spikes.
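
    For the rate-limiting part, a token bucket is one common building block because it tolerates short bursts while capping the sustained rate. The sketch below is illustrative only: the class and parameter names are my own, and it is not thread-safe.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate          # tokens added per second (sustained limit)
        self.capacity = capacity  # maximum burst size
        self._clock = clock       # injectable so tests can control time
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill based on elapsed time, never exceeding the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller should reject or queue the request
```

    In the data flow above, one bucket per client or per downstream dependency at the gateway would be a reasonable starting point for the discussion.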

    Payment Gateway System

    Design a payment gateway system that supports multiple types of credit or bank cards. Assume clients (merchants) will have POS machines to scan cards and call your API to submit payments.

    Scope:

    • Focus on designing the card validation part, not the full transaction processing.

    • High availability and low latency are required.

    Design Considerations:

    1. Routing: Based on the card number, route requests to the correct bank endpoint.

    2. API Schema: Design APIs for both the gateway and the bank endpoints, including request/response formats.

    3. Failure Handling: Define strategies for retries, fallback endpoints, error reporting, and transaction idempotency.

    4. Load Estimation: Estimate the expected number of requests per second and design servers to handle peak traffic.

    The goal is to design a scalable, fault-tolerant, and low-latency validation system that ensures correctness and reliability in real-world payment scenarios.
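
    Two small pieces of the validation path can be sketched in code: a Luhn checksum to reject mistyped card numbers before any network call, and longest-prefix routing on the card number. Neither was stated as part of the expected answer, and the endpoint names in the routing table are made up for illustration.

```python
from typing import Dict, Optional


def luhn_valid(card_number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if not (card_number.isascii() and card_number.isdigit()):
        return False
    total = 0
    for i, ch in enumerate(reversed(card_number)):
        digit = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def route(card_number: str, routes: Dict[str, str]) -> Optional[str]:
    # The longest matching prefix wins, so specific ranges override general ones.
    for length in range(len(card_number), 0, -1):
        endpoint = routes.get(card_number[:length])
        if endpoint is not None:
            return endpoint
    return None


# Hypothetical routing table: card-number prefix -> bank endpoint name.
ROUTES = {"4": "bank-a", "41": "bank-b"}
print(route("4111111111111111", ROUTES))  # bank-b
```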


    Databricks Interview Process Insights

    Overall, the Databricks interview has a relatively small question pool, so you have a good chance of encountering previously reported questions if you prepare well. However, each of these questions often comes with multiple follow-ups from different angles, so you need to understand the underlying algorithms and concepts thoroughly. For example, a common IP CIDR question is straightforward if you know subnet masks, but follow-up questions will test variations on the same concept. Being clear on the core knowledge allows you to solve both the main problem and its follow-ups efficiently.

    Coding Rounds

    The coding rounds focus on fundamental data structures and algorithms, typically medium-level LeetCode-style problems. Key areas include arrays, strings, hash maps, trees, and occasionally graphs. The main skill tested is not only writing a correct solution but also adapting to follow-up questions and optimizations. Practicing previously reported problems and understanding their core principles is essential.

    System Design Rounds

    Databricks separates system design into two types:

    1. High-Level System Design (Architectural Design) – This is the classic approach where you sketch workflows, data flow, and high-level architecture. It tests your ability to structure a scalable and maintainable system.

    2. Low-Level System Design – This is more hands-on and unique. You’re expected to write pseudocode, and ideally code that could run. It resembles a CS class mini-project but with the pressure of limited time. The challenge is to think through all components and implement a working design quickly.

    Behavioral / HR Rounds

    HR or recruiter calls generally focus on your past experiences, teamwork, and alignment with Databricks’ values. Common topics include handling tight deadlines, cross-team collaboration, and examples of ownership. STAR-based stories help you answer efficiently.

    Technical Screens / EM Rounds

    These interviews often dig deeper into your coding skills, system design ability, and data engineering experience. Expect questions that test handling large-scale data, pipeline optimizations, and design decisions under constraints. They may also include applied machine learning or data processing questions depending on the role.

    Virtual Onsite (VO)

    The VO phase combines multiple technical interviews, system design discussions, and behavioral questions. You may meet with multiple engineers, the hiring manager, and cross-functional team members. The goal is to test both depth and breadth: your coding skills, design thinking, and ability to communicate effectively.

    FAQ

    How long does the Databricks new grad interview process usually take?

    Answer: The process typically takes 3–4 weeks, from the initial recruiter call to the final hiring manager round. Timing may vary depending on scheduling and availability.

    What types of coding questions should I expect?

    Answer: Expect a mix of algorithmic and data structure problems, often similar to medium-level LeetCode questions. Key areas include arrays, strings, hash maps, and occasionally graph or tree traversal.

    Can I ask questions at the end of the interview?

    Answer: Yes, you can. Good questions include asking about team culture, current projects, how success is measured, and what technologies the team is excited to adopt next. It shows curiosity and engagement.