Live Schema Normalization round·Engineering·Medium·20 min

TCS Digital Trainee Interview — Live Schema Normalization

20 min · 1 credit · scorecard at the end
Field
Engineering
Company
Tata Consultancy Services
Role
Digital Software Engineer Trainee
Duration
20 min
Difficulty
Medium
Completions
New
Updated
2026-06-11

How to prepare

What this round tests, what strong and weak answers sound like, and the traps to sidestep.

What this round is about

  • Topic focus. You reason about database schema and transactions live: normalizing an un-normalized orders table to third normal form, the key family, ACID, isolation levels, and indexing tradeoffs.
  • Conversation dynamic. The interviewer hands you a messy table and pushes for the why behind every decomposition, never accepting a definition without asking how you know.
  • What gets tested. Whether you can point at a column and name the dependency it violates, not whether you can list the four normal forms from memory.
  • Round format. A twenty-minute database round with a coding-problem card showing the table and a shared whiteboard for the decomposition and a small entity-relationship sketch.

What strong answers look like

  • Dependency-named decomposition. You say which dependency each step removes, for example moving a column that depends on only part of a composite key into its own table for the second form.
  • Worked transaction reasoning. You walk ACID through a two-account money transfer that fails halfway and explain what atomicity and durability guarantee, rather than listing four words.
  • Anomaly-mapped isolation. You map a level to what it still permits, for example, read committed removes dirty reads but allows the same row to change on a second read.
  • Tradeoff-aware indexing. You name the write and storage cost of an index and connect it to whether the table is read-heavy or write-heavy.

What weak answers look like (and how to avoid them)

  • Form without dependency. Naming a normal form but not the column that violates it; fix it by pointing at the determinant on the card.
  • ACID as four words. Reciting atomicity, consistency, isolation, durability with no example; fix it by running a transfer that fails between the debit and the credit.
  • Index always good. Claiming indexes only help; fix it by stating what every insert, update and delete must now also maintain.
  • Key confusion. Treating a candidate key and a primary key as the same; fix it by saying a candidate key is minimal and the primary key is the one you chose.

Pre-interview checklist (2 minutes before you start)

  • Recall each normal-form trigger. Be able to name the dependency behind the first, second and third forms, not just the numbers.
  • Have a transfer example ready. Keep a two-account bank transfer in mind for the ACID walkthrough.
  • Identify the three read anomalies. Get dirty read, non-repeatable read and phantom read straight, and which level stops each.
  • Re-read the clustered versus non-clustered split. One sets physical row order, the other is a separate structure.
  • Pull up the whiteboard early. Plan to draw the decomposed tables and a small entity-relationship sketch rather than only talking.

How the AI behaves

  • Probes every claim. Asks how you know a table violates a form and which dependency you just removed, not the headline label.
  • No mid-interview praise. It will not say great answer; it names the specific thing you said and then pushes deeper.
  • Interrupts on narration without drawing. If you describe a schema without sketching it, it directs you to the whiteboard within the first couple of turns.
  • Cross-checks diagram and words. If your spoken answer and your drawing disagree, it points at the mismatch and asks which is right.

Common traps in this type of round

  • Label without locator. Saying a table is not in third normal form without naming the non-key column that determines another.
  • Silent freeze. Going quiet instead of reasoning aloud when the question gets concrete.
  • ACID recitation. Listing the four properties with no failing-transaction example.
  • Clustered confusion. Swapping clustered and non-clustered behaviour under follow-up.
  • Index absolutism. Claiming an index is free and never naming the write or storage cost.
  • Rigid normalization. Insisting normalization is always correct and refusing to acknowledge any denormalization case.

Sample problems you'll face

The 2 problems below are the same ones you'll work through in the live session — no surprises. Read the constraints carefully; the AI persona will refer you to the on-canvas card by problem number.

  1. 1Normalize the un-normalized orders table to 3NF

    This Orders table was built by a junior developer and stores everything in one flat sheet. Every row repeats the customer and product details, and the Items column packs multiple products into one cell. Identify the redundancy and the anomalies it causes, then decompose this table step by step to first, second, and third normal form. For each step, state which dependency you are removing (repeating group, partial dependency on part of the composite key, or transitive dependency) and name the primary and foreign keys of every resulting table.

    Example inputOrders(OrderID=1001, OrderDate=2026-06-10, CustID=C7, CustName=Anita Rao, CustCity=Pune, Items='P3:Keyboard:799 x2, P9:Mouse:399 x1', ProductCategory='Peripherals, Peripherals') -- repeated as new rows for every item and re-typed for every order by the same customer
    Example outputCustomer(CustID PK, CustName, CustCity) | Product(ProductID PK, ProductName, UnitPrice, CategoryID FK) | Category(CategoryID PK, CategoryName) | Orders(OrderID PK, OrderDate, CustID FK) | OrderLine(OrderID FK, ProductID FK, Quantity, PK(OrderID,ProductID)) -- Items split to atomic rows for 1NF; CustName/CustCity moved out as they depend only on CustID (partial dependency removed for 2NF); CategoryName moved out as it is determined by CategoryID not the product transitively (transitive dependency removed for 3NF)
    • Treat (OrderID, ProductID) as the natural composite key of the flattened line-item table when reasoning about partial dependencies.
    • Justify every decomposition by naming the exact dependency removed, not just the normal-form number.
    • On the whiteboard, draw the decomposed tables as labelled boxes with primary and foreign keys marked, then add a small entity-relationship sketch showing Customer to Orders to OrderLine to Product to Category with the relationship lines.
  2. 2Reason about a transfer under READ COMMITTED

    Two transactions run concurrently against the normalized schema. Transaction T1 transfers 500 from account A to account B as a single transaction. Transaction T2 is a reporting transaction running under the READ COMMITTED isolation level that reads the balance of account B, then reads it again a moment later within the same transaction. Walk through what T2 can and cannot observe: can it ever see T1's half-completed transfer, and can the two reads of account B return different values? Name the anomaly involved and state which isolation level would prevent it.

    Example inputT1: BEGIN; UPDATE acct SET bal=bal-500 WHERE id='A'; UPDATE acct SET bal=bal+500 WHERE id='B'; COMMIT; T2 (READ COMMITTED): BEGIN; SELECT bal FROM acct WHERE id='B'; -- (T1 commits here) -- SELECT bal FROM acct WHERE id='B'; COMMIT;
    Example outputUnder READ COMMITTED, T2 never sees T1's uncommitted intermediate state, so no dirty read of the half-done transfer. But T2's two reads of account B can differ because T1 committed between them: this is a non-repeatable read. REPEATABLE READ (or SERIALIZABLE) would hold B's value stable across both reads and prevent it.
    • Assume a standard SQL isolation-level model with the three read phenomena: dirty read, non-repeatable read, phantom read.
    • Distinguish what is prevented (dirty read) from what is still allowed (non-repeatable read) at READ COMMITTED.
    • On the whiteboard, draw a two-column timeline with T1 and T2 side by side and mark the exact point where T1 commits, annotating which read sees which value.
Reference

The full breakdown

How you're scored, the questions candidates ask most, and the research this interview is built on. Skim it — or just start the interview.

Interview framework

You will be scored on these 6 dimensions. The full rubric with definitions is below.

Decomposition Justification
Whether you name the specific dependency each normalization step removes, not just the form number, while decomposing the table.
25%
Key Family Precision
How accurately you separate super, candidate, primary and foreign keys, including that a candidate key is minimal.
15%
Transaction Reasoning
Whether you walk ACID through a failing transfer with rollback and durability, instead of listing the four properties.
20%
Isolation Anomaly Mapping
How precisely you map each isolation level to the dirty, non-repeatable, or phantom read it does and does not prevent.
20%
Indexing Tradeoff Judgment
Whether you name the write and storage cost of an index and tie it to read-heavy versus write-heavy workloads.
12%
Whiteboard Schema Clarity
How clearly your drawn tables and entity-relationship sketch match your spoken reasoning, with keys labelled.
8%

What we evaluate

Your final scorecard breaks down across these dimensions. The full rubric and tier criteria are revealed inside the interview itself.

  • Normalization Decomposition Justification24%
  • Key Family Precision14%
  • ACID Transaction Reasoning20%
  • Isolation Level Anomaly Mapping20%
  • Indexing Tradeoff Reasoning12%
  • Schema Design Tradeoff Judgment10%

Common questions

What does the TCS Digital DBMS technical round actually test?
It tests whether you can reason about a database schema, not just recite definitions. You are handed an un-normalized orders table and asked to decompose it to first, second and third normal form, naming the specific dependency each step removes. You then defend the key family, walk ACID through a bank transfer, map isolation levels to the read anomalies they prevent, and explain when an index hurts writes. The interviewer at Tata Consultancy Services consistently pushes for the why behind every answer, so a candidate who can only state the four normal forms as labels will not clear the bar.
How should I structure my answer when given the un-normalized table?
Start by looking before you act. Read the columns aloud and state the obvious redundancy, then move one normal form at a time. For each step, say which problem you are fixing: repeating groups and non-atomic columns for the first form, columns that depend on only part of a composite key for the second, and a non-key column that determines another non-key column for the third. Draw the resulting tables on the whiteboard and label the keys. Finishing each step with the anomaly it prevents is what separates a strong answer from a recitation.
What are the most common mistakes candidates make in this round?
The biggest one is naming a normal form but being unable to point at the column that violates it. Close behind is reciting ACID as four words with no worked example, confusing clustered and non-clustered indexes, and being unable to say why a primary key differs from a candidate key. Many candidates also claim indexes are always good and freeze when asked what an index slows down. Reciting textbook lines without applying them to the table on the card is the fastest way to a borderline result.
How is this AI interviewer different from a real TCS interviewer?
It behaves like a real database panelist and not a quiz bot. It asks one question at a time, acknowledges the specific thing you said, and always probes at least once before moving on. It never praises you mid-interview and it never hands you the answer, so when you stall it reframes the question rather than teaching the concept. It also reads your whiteboard, references the boxes you drew by name, and points out when your spoken answer and your diagram disagree, exactly as a campus interviewer would.
How is the scoring done in this mock interview?
You are scored from the transcript and the final whiteboard against named dimensions such as decomposition justification, key precision, transaction reasoning, isolation mapping and indexing tradeoffs. Each dimension has observable signals, so two evaluators would land within a few points of each other. The report quotes the exact moment a decomposition or anomaly went unjustified and shows what a stronger answer would have sounded like. Nothing is scored on accent, fluency or delivery style, only on the quality of your reasoning.
What should I do in the first two minutes before I start?
Recall the trigger for each normal form so you can name the dependency, not just the number. Have a two-account bank transfer ready as your ACID example. Get the three read anomalies straight in your head: dirty read, non-repeatable read and phantom read, and which isolation level stops each. Remind yourself that a clustered index sets physical row order while a non-clustered index is a separate structure. Finally, plan to use the whiteboard early, because narrating a schema without drawing it loses signal in this round.
How do I handle a question about which normal form a table violates?
Anchor on the dependencies, not the labels. Identify the candidate key first, then check each non-key column: if a column depends on only part of a composite key, the table breaks the second form; if a non-key column is determined by another non-key column, it breaks the third. Say how you know by naming the determinant and the dependent column on the card. If you assert a table is already in third normal form, prove it by showing no non-key column determines another, because the interviewer will ask you to.
What does a strong answer about isolation levels sound like?
A strong answer maps each level to the anomalies it does and does not prevent, with a concrete picture. For example, under read committed you only see committed data so dirty reads are gone, but the same row can change value if you read it twice, which is a non-repeatable read, and new rows can still appear in a range query, which is a phantom. Repeatable read holds the rows you read steady but phantoms can still slip in, and serializable removes all three. Tying the level to throughput cost shows real understanding.
Why does the interviewer keep asking when an index hurts?
Because claiming indexes are always good is a classic shallow answer. An index speeds reads by keeping lookups logarithmic through a B-tree, but every insert, update and delete must also maintain that index, and the index consumes storage. On a write-heavy table, over-indexing can slow the system down more than it helps. The interviewer at Tata Consultancy Services wants to hear you name that write cost and the storage cost out loud, and ideally connect it to whether the table is read-heavy or write-heavy.
Is normalization always the right choice, or is denormalization ever correct?
Denormalization is a legitimate engineering decision, not a mistake. Normalizing reduces redundancy and update anomalies, which is ideal for transactional systems where data changes often. Denormalization deliberately reintroduces redundancy to avoid expensive joins on read-heavy reporting workloads, trading slower or riskier writes for faster reads. A strong candidate states the tradeoff explicitly: what consistency risk you accept and which workload justifies it. Saying normalization is always correct, or that denormalization is always bad, is exactly the rigid answer the round is designed to catch.
How long is the round and what format does it take?
This is a focused twenty-minute database round. It opens with a live normalization task on an un-normalized orders table shown on a card, moves through the key family and functional dependencies, then ACID and isolation levels, and closes on indexing and a normalization-versus-denormalization tradeoff. You use a shared whiteboard to draw the decomposed tables and a small entity-relationship sketch. The pace mirrors a real Tata Consultancy Services technical interview where one topic flows into the next as your answers open new follow-ups.

Sources this interview is built on

Real candidate-report URLs (Glassdoor / AmbitionBox / PrepInsta / GeeksforGeeks / Medium) reviewed when authoring the questions, persona, and rubric. Verify the realism yourself.